🐳 Docker Explained: Your FINAL Stop for Understanding It All
I had been working in Data for a couple of years and thought Docker wasn’t strictly necessary for my workflow. I was wrong!
Not so long ago, I had no idea how to use tech like Docker.
To be honest, it seemed very complex.
There are too many keywords and concepts that can confuse you pretty easily.
But now, I cannot do anything without it.
In this article, I will give a demystified explanation of:
Containers
Images and Volumes
Dockerfiles
Docker Compose
⚠️ Disclaimer: all the examples apply to Data tools, but the explanation overall is generic.
🛠️ What is Docker?
Imagine you're an amateur chef who has crafted an amazing recipe and wants to show it to your friends.
A fish recipe won’t make sense if your friend only has special knives for chicken.
You will have to bring as much as you can or invite them home, where you have those utensils available.
The meal won’t be the same wherever you go.
Docker is like your personal kitchen-in-a-box. No matter where you cook, everything you need for your kitchen will always be available, the way you need it.
Docker is a platform that uses containers, images and volumes to package everything your application needs to run.
This includes code, packages, libraries and anything else the container relies on.
🍳 Docker Is Your Personal Kitchen
Here’s how Docker keeps your data kitchen in order:
Same Recipe, Same Results: With Docker, your container is like taking your oven, stove, and pans with you.
Run the code on your laptop, your friend’s machine, or a cloud server, and the results stay the same.
Clean Kitchen, Every Time: Without Docker, your kitchen often has leftovers from previous meals, which makes it really messy.
Run the code without conflicts, no matter if your computer has different or even older dependencies.
Try New Recipes Without Worry: Whenever you want to try something new, the kitchen appliances you have might not be suitable for the job. You might fear breaking them and having to buy new ones.
You can recreate containers as many times as you want, and nothing on your machine will be lost or broken, since each container is isolated.
👉 Example: You want to test an ETL pipeline using the latest PostgreSQL. Instead of upgrading your local installation, you can spin up a Docker container with the newest version and experiment inside it.
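The PostgreSQL example above could look something like the sketch below. It only builds the `docker run` command as a list of arguments, so you can read it before running anything. The container name `etl-test-db`, the password `example` and the host port are hypothetical values picked for illustration.

```python
import shlex


def postgres_run_command(tag="latest", name="etl-test-db",
                         password="example", host_port=5432):
    """Build the `docker run` arguments for a throwaway PostgreSQL container.

    All default values are illustrative assumptions, not recommendations.
    """
    return [
        "docker", "run",
        "--rm",                                  # remove the container when it stops
        "-d",                                    # run in the background
        "--name", name,
        "-e", f"POSTGRES_PASSWORD={password}",   # the official image needs a password
        "-p", f"{host_port}:5432",               # host port -> container port
        f"postgres:{tag}",
    ]


command = postgres_run_command()
print(shlex.join(command))
```

To actually start it, you could pass `command` to `subprocess.run(command, check=True)` on a machine with Docker installed. Stopping the container with `docker stop etl-test-db` throws it away, and your local PostgreSQL installation is never touched.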
🗂️ The Blueprint and the Fridge: Images and Volumes
Every time you use Docker, you need:
The blueprint of the kitchen you are going to set up → Image
The fridge to save your preparations while cooking → Volumes
🖼️ What’s an Image?
An image has all the details that you need for the kitchen to exist: utensils, ingredients, ovens, pans, etc.
But it’s not actually a working kitchen yet. The working kitchen is the container you get when you run the image.
📦 What’s a Volume?
When you shut down your kitchen (container), you can take your preparations home in a bag (a local folder) or store them in the fridge (a volume).
A volume allows you to save important data or files so that they persist, even if you tear down the container.
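As a rough sketch of the fridge in action, the commands below create a named volume, start a PostgreSQL container that stores its data there, remove the container, and start a fresh one on the same volume. The names `pgdata` and `etl-db` and the password are hypothetical. The mount path `/var/lib/postgresql/data` is the data directory assumed here for the official `postgres:16` image; other versions may use a different path, so check the image documentation.

```python
import shlex

VOLUME = "pgdata"      # hypothetical volume name
CONTAINER = "etl-db"   # hypothetical container name

run_with_volume = [
    "docker", "run", "-d",
    "--name", CONTAINER,
    "-e", "POSTGRES_PASSWORD=example",
    "-v", f"{VOLUME}:/var/lib/postgresql/data",  # volume:path-in-container
    "postgres:16",
]

steps = [
    ["docker", "volume", "create", VOLUME],   # 1. build the fridge
    run_with_volume,                          # 2. a kitchen that uses the fridge
    ["docker", "rm", "-f", CONTAINER],        # 3. tear the kitchen down
    run_with_volume,                          # 4. new kitchen, same fridge
]

for step in steps:
    print(shlex.join(step))
```

Step 3 deletes the container but not the named volume, so the container started in step 4 finds the database files left behind by the first one.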
🍕 Breaking Down a Dockerfile
Think of a Dockerfile as a recipe for your perfect kitchen setup. It will have the base configuration for your Image.
Each part of the Dockerfile sets up a different piece of your kitchen. Let’s break down the key components:
FROM: This defines your base environment. Pick whether you want a pizza or sushi kitchen.
COPY: Copies files from your local machine into the container. Bring ingredients into your kitchen.
WORKDIR: Sets your working directory, so you know where everything happens. Choose your dedicated spot in the kitchen.
RUN: Installs dependencies and prepares your environment. Chop and prepare your ingredients before cooking.
ENV: Sets environment variables to control behavior. Set kitchen rules, such as “wash hands before cooking.”
CMD: Defines the command that starts your application when the container runs. Turn on the oven and start the cooking.
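Putting those six instructions together, a small Dockerfile for a Python job might look like the text below. This is a minimal sketch, not a recommended setup: the base image `python:3.11-slim`, the `/app` folder, `requirements.txt` and the script `etl.py` are all assumptions made for illustration. The Dockerfile is kept in a Python string so a tiny helper can list the instructions in the order Docker would run them.

```python
# Hypothetical Dockerfile for a small Python ETL script.
DOCKERFILE = """\
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
ENV PYTHONUNBUFFERED=1
CMD ["python", "etl.py"]
"""


def instructions(dockerfile_text):
    """Return the instruction keyword of each non-empty, non-comment line.

    Sketch only: it does not handle lines continued with a backslash.
    """
    keywords = []
    for line in dockerfile_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            keywords.append(line.split()[0])
    return keywords


print(instructions(DOCKERFILE))
```

Here `WORKDIR` comes before `COPY` so that the relative destination `.` resolves to `/app`, and `requirements.txt` is copied on its own first so the `RUN pip install` layer can be reused while only the code changes.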
🔌 Docker vs. Docker Compose
If Docker is your personal kitchen, Docker Compose is like a group of food trucks.
It allows you to spin up multiple containers at once.
Need a PostgreSQL database, an Airflow server, and a Jupyter notebook?
Docker Compose lets you handle all these tools easily. You can also use multiple Dockerfiles if you want.
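A Compose file for that trio could be sketched roughly as below. Treat it as an outline under assumptions rather than a working stack: the image tags, the password, the volume name `pgdata` and the `./airflow` folder (which would hold its own Dockerfile) are hypothetical, and a real Airflow setup needs more configuration than this. The file is held in a Python string so a small indentation-based helper can list the services it declares.

```python
# Hypothetical docker-compose.yml with three services and one named volume.
COMPOSE_FILE = """\
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example
    volumes:
      - pgdata:/var/lib/postgresql/data
  jupyter:
    image: jupyter/base-notebook
    ports:
      - "8888:8888"
  airflow:
    build: ./airflow        # hypothetical folder with its own Dockerfile
    depends_on:
      - postgres

volumes:
  pgdata:
"""


def service_names(compose_text):
    """List the keys directly under `services:`.

    Indentation-based sketch for this two-space layout, not a YAML parser.
    """
    names = []
    in_services = False
    for line in compose_text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        if indent == 0:
            in_services = line.strip() == "services:"
        elif in_services and indent == 2:
            names.append(line.strip().split(":")[0])
    return names


print(service_names(COMPOSE_FILE))
```

Saved as `docker-compose.yml` next to an existing `./airflow` folder, `docker compose up -d` would start the services together and `docker compose down` would stop them.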
🤷♂️ Why Not Just Install Locally?
Sure, you can install everything on your local machine.
It works... until it doesn’t.
Chances are you will miss something, which is like fetching the ingredients one by one or realising your knives are blunt just when you need them.
Local Installation Headaches:
Different tools need different versions of libraries.
Need to tweak settings to match a standard setup.
Removing all traces of unused tools is a pain.
With Docker, you create an isolated environment that’s consistent, clean, and portable.
👉 Example: Let's say you have an ETL pipeline using Python, Airflow, and PostgreSQL. On your local machine, versions might differ and libraries might be missing. In Docker, all of these dependencies live together inside the container.
⚠️ Disclaimer: This only applies to Python users.
There is a good way to check whether someone uses Docker or not. Run this command in your terminal:
pip freeze
When I started my career in Data, running that command would have returned output like this:
beautifulsoup4==4.10.0
boto3==1.18.65
botocore==1.21.65
elasticsearch==7.13.4
fastapi==0.68.2
gunicorn==20.1.0
jsonschema==3.2.0
matplotlib==3.4.3
mlflow==1.19.0
numpy==1.21.2
pandas==1.3.3
... (the list continues)
It meant that my computer was flooded with packages that might need updating to match the requirements of newer projects.
And that’s the type of chaos Docker avoids!
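To make that chaos concrete, here is a small sketch that compares a few of the versions from the `pip freeze` output above with the pinned requirements of a newer project. The newer project's pins are hypothetical, and the helper names `parse_pins` and `conflicts` are made up for this example.

```python
def parse_pins(text):
    """Turn `name==version` lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.lower()] = version
    return pins


def conflicts(installed, required):
    """Return {package: (installed_version, required_version)} for mismatches."""
    return {
        name: (installed[name], wanted)
        for name, wanted in required.items()
        if name in installed and installed[name] != wanted
    }


# Versions taken from the `pip freeze` output above.
installed = parse_pins("""
numpy==1.21.2
pandas==1.3.3
matplotlib==3.4.3
""")

# Hypothetical requirements of a newer project.
new_project = parse_pins("""
numpy==1.26.4
pandas==2.2.2
matplotlib==3.4.3
""")

print(conflicts(installed, new_project))
```

In a shared global environment, every mismatch means upgrading for one project and possibly breaking another. Inside a container, each project installs its own pinned versions and never sees the others.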
🚀 Why You Need Docker
Consistency: you stop fixing different issues across environments.
Portability: you ship your projects faster without handling unnecessary conflicts.
Isolation: if something fails in one container, it won’t affect the others.
Next time you see this:
Image - Container - Volume
Think of this:
Blueprint - Kitchen - Fridge
📝 TL;DR
🛠️ What is Docker? - Your personal kitchen-in-a-box, ensuring consistency wherever you run your data projects.
🍳 Docker Is Your Personal Kitchen - Control, cleanliness, and freedom to experiment without breaking anything.
🗂️ The Blueprint and the Fridge: Images and Volumes - The instructions to build the kitchen you need, and the storage to save preparations you cook.
🍕 Breaking Down a Dockerfile - The recipe for your container, setting up everything from the base environment to starting the app.
🔌 Docker vs. Docker Compose - Orchestrate multiple services with one command.
🤷♂️ Why Not Just Install Locally? - Avoid dependency hell and inconsistent environments!
🚀 Why You Need Docker - Ensures consistency, portability, and isolation for your data engineering projects.
If you enjoyed the content, hit the like ❤️ button, share, comment, repost, and all those nice things people do when they like stuff these days. Glad to know you made it to this part!
Hi, I am Alejandro Aboy. I am currently working as a Data Engineer. I started in digital marketing at 19. I gained experience in website tracking, advertising, and analytics. I also founded my agency. In 2021, I found my passion for data engineering. So, I shifted my career focus, despite lacking a CS degree. I'm now pursuing this path, leveraging my diverse experience and willingness to learn.