4 Ways to Work with Spreadsheets In Data Roles š
Sooner or later, you will need to deal with Spreadsheets as source in your Data processes. We'll talk how to tackle it with different tools!
Google Sheets might be one of the most debated tools in the field of Data Engineering.
The main reason is as follows:
āI create tons of ETL scripts and Dashboards for stakeholders to ask for spreadsheets exports.ā
Not everyone knows Python or SQL to process data at scale, so good all Google Sheets makes it easier for them.
Luckily, we have great tools to collect whatever data is being created within those spreadsheets and process it without much effort.
Stakeholders need their data fast and they canāt wait weeks for fancy pipelines and data integrations. So we need to figure out the right tool for the job.
Weāll explore 4 interesting ways to leverage Google Sheets data in your workflows.
šØš»āš» Sourcetable - The AI Spreadsheet
Recently Iāve been exploring on how to build some AI projects for some stakeholders who are heavy users of spreadsheets.
The baseline was a Slack Bot that could read Google Sheets, run SQL joins and answer questions.
But thereās always a but.
End users started asking for features:
Can it make charts?
Will it run complex analysis?
Can I chat the same way I do with ChatGPT?
Thatās when I realised I was the bottleneck.
I needed to stop being the techy guy who blocked sales team.
And thatās what [Partner] Sourcetable does.
I generated āThe Messy CRMā database, which has 3 tables: companies, leads, activities.
I asked questions that implied schema understanding, data analysis, calculations, correlations and chart creation.
I ended up with a really nice tab with useful reports.






Compared to ChatGPT, I donāt have to go back and forth into scattered conversations because I can just simply save the table outputs and charts wherever I want and use Sourcetable as a regular spreadsheet.
š ļø Google Sheets API
For full control, directly interacting with the Google Sheets API is a powerful option.
For all Google Sheets API scripts in the docs, you can identify this pattern:
Only the specific actions (e.g., read, write, append) will vary in the code, but its overall structure remains familiar.
All actions require:
A spreadsheet id (e.g. SPREADSHEET_ID = āJasw321K29182_12389HsaswdAā)
A range name (e.g. RANGE_NAME = ādata!A1:E12ā)
Values to update, append, write, etc (If applicable)
For each action you want to run, you will execute the service HTTP request to Google Sheets API.
Whenever you see '...', it indicates a placeholder for the previously described pattern.
You can read a spreadsheet like this:
Update values like this:
Appending values uses the same code snippet but replaces 'update' with 'append.'
You might wonder why someone would use this if Python wrappers exist? Flexibility.
Python wrappers cover the more typical use cases, but if you need something specific for an edge case, you might want to enhance them with the plain API.
Learn more about the possibilities in the Google Sheets API docs.
š Python āgspreadā package
Probably the best choice for Data Engineers: Remove the pain from plain Google Sheets API and use a wrapper.
It has a quick learning curve and can be easily extended.
The usual workflow to read a spreadsheet looks like this:
Then, you can make further processing with Pandas or DuckDB, see example:
If you want to expand functionalities, you can use gspread-pandas or gspread-dataframe. Hereās an example to save a dataframe into a worksheet or update a range of cells:
Learn more about the possibilities in the gspread pypi docs.
šļø How does Google Sheets expect to receive data?
Writing has its challenges, because the Google Sheets API requires a list of lists and is not compatible with DataFrames, needing additional formatting. This is how it looks in gspread:
Keep in mind that empty strings (nulls), infinite values, and other caveats must be addressed, as Google Sheets handles data types differently from DataFrames.
The previous example might not be enough, depending on the structure of your data.
Hereās an example of that required processing:
Many of these edge cases can be addressed using the built-in df_to_sheet method from the gspread-pandas package.
š¦ DuckDB, keep making things easier!
A couple weeks ago, I mentioned that some things were cooking for DuckDB
And now itās for real: DuckDB Google Sheets Extension. The greatest thing? itās SQL friendly, itās much easier because thereās no learning curve!
The main highlight is code reduction: we went from 20-30 lines with previous examples to 1 line for the same result.
Writing it is also easier, you donāt need to worry about formatting because itās handled by the extension.
Learn more about the possibilities in the DuckDB Sheets extension docs..
š Authentication with Service Accounts
Sometimes, the main blocker to get this workflow up and running is getting authentication right.
There are many options, such as OAuth, Service Account and others, but Iāll talk about the most powerful in my opinion.
Hereās a quick guide to set up a Service Account:
Head to Google Cloud Console and create a new project.
Navigate to APIs & Services ā Credentials and create a Service Account.
Download the JSON file for the service account and add it to your project.
Share the Service Account Email with the Google Sheets you want to access. This also works for the folder where the Spreadsheets are saved.
If you have a one shot use-case that doesnāt require an automation workflow and you donāt want/need to create a service account, you can just use Google Collab.
This is how gspread
looks without the authentication hassle:
You can keep using gspread or DuckDB to extend your workflows within the notebook environment.
If at some point you need to automate this, you will need to move away from Collab, install required packages and create your service account to handle things properly.
š TL;DR
šØš»āš» Sourcetable - The AI Spreadsheet: chat with your data, ask for charts, all in your typical spreadsheet workflows without creating technical bottlenecks.
š ļø Google Sheets API: full flexibility to extend methods as you please.
š Python āgspreadā package: simple python wrapper to speed up your development.
šļø How does Google Sheets expects to receive data?: Formatting tips and considerations.
š¦ DuckDB, keep making things easier!: SQL in memoryābest for handling and querying large datasets within Sheets.
š Authentication with Service Accounts: step-by-step guide to create your credentials for Google Sheets API.
If you enjoyed the content, hit the like ā¤ļø button, share, comment, repost, and all those nice things people do when like stuff these days. Glad to know you made it to this part!
Hi, I am Alejandro Aboy. I am currently working as a Data Engineer. I started in digital marketing at 19. I gained experience in website tracking, advertising, and analytics. I also founded my agency. In 2021, I found my passion for data engineering. So, I shifted my career focus, despite lacking a CS degree. I'm now pursuing this path, leveraging my diverse experience and willingness to learn.