Almost everyone is familiar with spreadsheet software. It's the de-facto standard tool for many industries and the first tool that comes to mind when someone thinks about data entry.
Most programmers when they want to use data from a spreadsheet, such as Excel, they automatically think of CSV (comma-separated value files). These files are quite simple, and as their name implies they are separating each column using a comma delimiter, and that's why they are preferred. Not all Excel files can be converted to CSV through. An Excel spreadsheet with multiple sheets will have to be converted into multiple CSV files, one for each sheet. This makes it extremely less convenient than having a single Excel file with multiple sheets.
What is not widely known, is that in Python at least, there are quite powerful tools that allow you to work with Excel files, both for reading and writing. Let's quickly see how to do that in Python.
Firstly, you would need to install Pandas, the famous data analytics library, along openpyxl, which actually does the heavy work of reading/writing Excel files, to your virtual environment (or globally, depending on your setup).
pip install pandas pip install openpyxl
import pandas as pd writer = pd.ExcelWriter('path/to/output') # 1. df1 = pd.DataFrame( [['value1', 'value2', ], ['value1', 'value2', ]], columns=['col1', 'col2', ], ) df1.to_excel(writer, sheet_name='Sheet 1', index=False) # 2. df2 = pd.DataFrame( [['value1', 'value2', ], ['value1', 'value2', ]], columns=['col1', 'col2', ], ) df2.to_excel(writer, sheet_name='Sheet 2', index=False) writer.save() # 3.
- This is the main object that represents the entire Excel file. Instead of the path, you can provide a stream (such as
- For each sheet you want your Excel file to have, create a DataFrame with the column and row data. The
to_excelmethod appends the sheet to the Excel file. The
index=Falseparameter is for the row labels not to be written on the sheet.
- This actually flushes the data to the disk and creates the file.
import pandas as pd xl = pd.ExcelFile('path/to/input') df = pd.read_excel(xl, sheet_name='Sheet 1', usecols=['col1', 'col2', ], ) # 1. for i in range(len(df)): # 2. col1 = df.loc[i, 'col1'] if not pd.isnull(col1): # 3. print(col1)
- For each sheet, you read one DataFrame where you define which columns you want to load.
- Loop through each row of the sheet.
- Check if the cell is empty, otherwise proceed with accessing its contents.
Hopefully, this was a quick and easy example of how to write and read Excel files in Python.