TOOLS Learning Roadmap

This roadmap organizes 7 comprehensive guides into a structured learning path for production data engineering tools. Follow this sequence to master the complete stack for Le Wagon bootcamp and beyond.


📍 Quick Navigation

Start here based on your experience:

  • 🟢 New to containers: Start with TIER 1
  • 🟡 Know Docker, need orchestration: Start with TIER 2
  • 🔴 Need production patterns & scaling: Continue to TIER 3

🎯 TIER 1: Containerization Fundamentals (8-10 hours)

Package and run applications consistently everywhere.

| # | Topic | Page | Time | Learn |
|---|-------|------|------|-------|
| 1 | Docker Fundamentals | Docker-Fundamentals | 3h | Images, containers, volumes, networking |
| 2 | Docker Compose | Docker-Compose | 4h | Multi-container orchestration locally |
| 3 | Git & GitHub | Git-GitHub | 2h | Version control, collaboration, PRs |

Subtotal: ~9 hours
After TIER 1 you can: Package applications in Docker, run multi-container stacks locally, collaborate on code.


🚀 TIER 2: Orchestration & Transformation (10-12 hours)

Automate pipelines and transform data with industry-standard tools.

| # | Topic | Page | Time | Learn |
|---|-------|------|------|-------|
| 4 | Apache Airflow | Apache-Airflow | 5h | DAGs, operators, scheduling, monitoring |
| 5 | dbt (Data Build Tool) | dbt-Data-Build-Tool | 5h | SQL transformations, testing, lineage |

Subtotal: ~10 hours (cumulative: ~19 hours)
After TIER 2 you can: Orchestrate complex pipelines, test data quality, scale transformations.
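Data quality testing in dbt (page 5) is declared in YAML alongside your models. A minimal sketch, with a hypothetical model and column names (nothing here comes from the roadmap's pages):

```yaml
# models/staging/schema.yml - sketch only; stg_orders and its columns
# are placeholder names for illustration.
version: 2

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']
```

Run `dbt test` and every declared test becomes a pass/fail check against your warehouse.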


🔥 TIER 3: Scaling & Cloud (8-10 hours)

Process massive datasets and use managed cloud platforms.

| # | Topic | Page | Time | Learn |
|---|-------|------|------|-------|
| 6 | PySpark Fundamentals | PySpark-Fundamentals | 4h | Distributed computing, RDDs, DataFrames, SQL |
| 7 | Cloud Data Warehouses | Cloud-Data-Warehouses | 4h | Snowflake, BigQuery, Redshift architecture & setup |

Subtotal: ~8 hours (cumulative: ~27 hours)
After TIER 3 you can: Process terabyte-scale data, use cloud platforms, build enterprise-grade pipelines.


📊 Learning Path by Goal

Goal: “I need bootcamp basics (MVP)”

Follow TIER 1 + TIER 2 (~19 hours)
Time to complete: 2-3 weeks
Bootcamp readiness: 95%
Most important for Le Wagon

Goal: “I want to be job-ready post-bootcamp”

Follow TIER 1 + TIER 2 + TIER 3 (~27 hours)
Time to complete: 4-5 weeks
Bootcamp readiness: 100%
Job-market readiness: 85%


🗓️ Le Wagon Timeline Alignment

Before Bootcamp (Oct 31, 2026)

| Milestone | Date | What to Know | Pages | Priority |
|-----------|------|--------------|-------|----------|
| TIER 1 Complete | By Oct 8 | Docker, Compose, Git basics | 1-3 | 🔴 CRITICAL |
| TIER 2 Complete | By Oct 25 | Airflow DAGs, dbt models | 4-5 | 🔴 CRITICAL |
| TIER 3 Start | By Oct 31 | PySpark, Cloud intro | 6-7 | 🟡 Nice-to-have |

During Bootcamp (Nov 2026 - March 2027)

Weeks 1-2 (Container & Orchestration):
You already know Docker + Airflow → focus on production patterns
Reference: Docker-Compose, Apache-Airflow

Weeks 3-4 (Data Transformation):
Learn dbt in depth, data quality testing
Reference: dbt-Data-Build-Tool

Weeks 5-6 (Big Data & Cloud):
Introduction to Spark & cloud warehouses
Reference: PySpark-Fundamentals, Cloud-Data-Warehouses

Weeks 7+ (Capstone):
Build end-to-end pipeline with all tools integrated
Reference: All pages (integrated architecture)


📝 Daily Study Workflow (45 min/day)

Weeks 1-2 (TIER 1, Pages 1-3)

25 min: Read page + understand concepts
15 min: Follow along with examples locally (Docker, Git)
5 min: Notes & summary

Weeks 3-4 (TIER 2, Pages 4-5)

20 min: Read advanced concepts
20 min: Write code (DAG / dbt model)
5 min: Run & verify against data

Weeks 5-6 (TIER 3, Pages 6-7)

15 min: Read architecture & concepts
20 min: Set up environment (Spark, cloud credentials)
10 min: Run example query or transformation

🎓 Before You Start: Prerequisites

You should:

  • Have Docker installed (docker --version)
  • Understand Python basics (functions, loops, dicts)
  • Have git installed (git --version)
  • Know basic command line navigation
  • Have PostgreSQL or similar DB available locally

If not: Spend 2 hours on Prerequisites before starting TIER 1.
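The checks above can be scripted. A small helper that reports which of the listed tools are installed (tool names match the prerequisites; everything else is illustrative):

```python
import shutil
import subprocess

def tool_version(name: str):
    """Return a tool's --version output, or None if it isn't on PATH."""
    if shutil.which(name) is None:
        return None
    result = subprocess.run([name, "--version"], capture_output=True, text=True)
    # Some tools write the version to stderr rather than stdout.
    return (result.stdout or result.stderr).strip()

if __name__ == "__main__":
    for tool in ("docker", "docker-compose", "git", "python"):
        version = tool_version(tool)
        print(f"{tool:16s} {version or 'NOT FOUND - install before TIER 1'}")
```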


✅ Consolidation Checklist

After TIER 1

  • Build a Docker image from a Dockerfile
  • Run a container with port mapping and volumes
  • Define a docker-compose.yml with 2+ services
  • Create a GitHub repo and push code
  • Merge a pull request on GitHub

After TIER 2

  • Write an Airflow DAG with 3+ tasks
  • Schedule an Airflow pipeline to run daily
  • Write a dbt model with 3+ tests
  • Generate dbt documentation (dbt docs)
  • Build a complete ETL: Airflow → dbt → Postgres

After TIER 3

  • Run PySpark locally on 1GB+ dataset
  • Query a cloud warehouse (Snowflake/BigQuery)
  • Load data into cloud warehouse via Airflow
  • Build PySpark transformation in dbt
  • Create a project spanning all 7 tools

🚨 Common Pitfalls (Don’t Fall Into These!)

| Pitfall | Why It Matters | Fix |
|---------|----------------|-----|
| Not using volumes in Docker | Data lost when container stops | Always mount volumes for persistence |
| Hardcoding credentials | Security risk | Use .env files; never commit secrets |
| Circular DAG dependencies | Airflow rejects the DAG at parse time | Chain tasks in one direction with >> |
| No dbt tests | Data quality fails in production | Write tests before deployment |
| Large commits | Impossible to review | Commit small changes frequently |
| Using pandas for GB+ datasets | Memory crashes, slow runs | Use PySpark or cloud warehouses |
| Not monitoring cloud costs | Surprise bills | Enable cost alerts on Snowflake/BigQuery |
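For the hardcoded-credentials pitfall, read secrets from the environment (populated from an untracked .env file by docker-compose or python-dotenv). A minimal helper; the variable name in the usage comment is a placeholder:

```python
import os

def require_env(name: str) -> str:
    """Fetch a required secret from the environment; fail loudly if missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set - add it to your .env file")
    return value

# Usage (DB_PASSWORD is a placeholder variable name):
# password = require_env("DB_PASSWORD")
```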

📚 Full Page Directory

| # | Title | Description | Time | Difficulty | When |
|---|-------|-------------|------|------------|------|
| 1 | Docker-Fundamentals | Images, containers, volumes, networking | 3h | 🟢 Beginner | Pre-bootcamp |
| 2 | Docker-Compose | Multi-container apps, services, networks | 4h | 🟡 Intermediate | Pre-bootcamp |
| 3 | Git-GitHub | Version control, branches, pull requests | 2h | 🟢 Beginner | Pre-bootcamp |
| 4 | Apache-Airflow | DAGs, operators, scheduling, monitoring | 5h | 🟡 Intermediate | Pre-bootcamp |
| 5 | dbt-Data-Build-Tool | Models, tests, documentation, lineage | 5h | 🟡 Intermediate | Pre-bootcamp |
| 6 | PySpark-Fundamentals | RDDs, DataFrames, SQL, distributed computing | 4h | 🟡 Intermediate | Week 5+ (bootcamp) |
| 7 | Cloud-Data-Warehouses | Snowflake, BigQuery, Redshift, architecture | 4h | 🟡 Intermediate | Week 5+ (bootcamp) |

Total: ~27 hours
Recommended pace: ~45-60 min/day over 4-5 weeks


🔗 Integration with Your Knowledge Base

These TOOLS pages connect with your existing knowledge:

Python Fundamentals (1_PYTHON)
        ↓
   SQL Basics (2_SQL)
        ↓
Docker (package code + DB)
        ↓
Docker Compose (local multi-container stack)
        ↓
Git (version control everything)
        ↓
Airflow (orchestrate workflows)
        ↓
dbt (transform in warehouse)
        ↓
PySpark (parallel processing)
        ↓
Cloud Warehouse (production platform)

🎯 Your First Project (After TIER 1)

Build a containerized pipeline:

  1. Create Dockerfile for Python ETL script
  2. Add docker-compose.yml with Postgres + app
  3. Push to GitHub with clear commits
  4. Run locally: docker-compose up

Time: 3-4 hours
Deliverable: GitHub repo with working stack
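Step 2 above can be sketched as a compose file. Service names, image tags, and variables are placeholders; keep the real password in an untracked .env file:

```yaml
# docker-compose.yml - sketch for the first project; adjust names and versions.
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}  # read from .env, never committed
    volumes:
      - pgdata:/var/lib/postgresql/data        # persist data across restarts
  app:
    build: .                                   # uses the Dockerfile from step 1
    depends_on:
      - db
    environment:
      DATABASE_URL: postgresql://postgres:${POSTGRES_PASSWORD}@db:5432/postgres

volumes:
  pgdata:
```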


🎉 Your Second Project (After TIER 2)

Build complete ETL pipeline:

  1. Extract: Airflow task pulls from API
  2. Load: Lands in Postgres staging
  3. Transform: dbt models clean and aggregate
  4. Test: dbt tests validate quality
  5. Orchestrate: Airflow schedules daily runs
  6. Version control: Everything in Git

Time: 8-10 hours
Deliverable: End-to-end pipeline, scheduled, tested
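Steps 1-2 can be prototyped before wiring up Airflow. A hedged sketch: sqlite3 stands in for the Postgres staging database so it runs anywhere, and the API response is simulated; swap in psycopg2 and a real API call for the actual pipeline.

```python
import json
import sqlite3

def load_staging(records, conn):
    """Land raw API records in a staging table, idempotently keyed by id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging_events (id INTEGER PRIMARY KEY, payload TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO staging_events (id, payload) VALUES (?, ?)",
        [(r["id"], json.dumps(r)) for r in records],
    )
    conn.commit()

# Simulated extract step; a real Airflow task would call the API here.
records = [{"id": 1, "value": 10}, {"id": 2, "value": 20}]
conn = sqlite3.connect(":memory:")
load_staging(records, conn)
load_staging(records, conn)  # re-running must not duplicate rows
count = conn.execute("SELECT COUNT(*) FROM staging_events").fetchone()[0]
print(count)  # 2
```

Idempotent loads (here via `INSERT OR REPLACE`) matter because Airflow retries failed tasks; a retry should not double-load the staging table.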


🌟 Your Capstone (After TIER 3)

Production-ready data pipeline:

  1. Extract: Airflow pulls from multiple APIs
  2. Load: Lands in cloud warehouse (Snowflake/BigQuery)
  3. Transform: dbt + PySpark for complex computations
  4. Test: Comprehensive data quality checks
  5. Scale: Process 1GB+ dataset efficiently
  6. Monitor: Airflow tracking + cost monitoring
  7. Deploy: CI/CD via GitHub + dbt Cloud

Time: 20+ hours
Deliverable: Job-ready portfolio project


📞 Getting Help

When stuck:

  1. Check the page’s “Tips & Gotchas” section
  2. Run diagnostics:
    • Docker: docker ps, docker logs
    • Git: git log, git status
    • Airflow: airflow dags list, check logs
    • dbt: dbt debug, dbt run --select model_name
    • Spark: Check Spark UI at localhost:4040
  3. Search the exact error message online (most tool errors are descriptive)
  4. Check official docs:
    • Docker: docs.docker.com
    • Airflow: airflow.apache.org
    • dbt: docs.getdbt.com
    • PySpark: spark.apache.org/docs/latest/api/python
    • Cloud DW: cloud.google.com/bigquery/docs (or Snowflake/Redshift equivalent)

🏆 Final Goals

By the end of this roadmap, you’ll be able to:

  • Package Python code in Docker
  • Compose multi-service stacks locally
  • Orchestrate complex pipelines with Airflow
  • Transform data with dbt & testing
  • Collaborate professionally with Git
  • Scale to terabyte datasets with PySpark
  • Deploy on cloud warehouses (Snowflake/BigQuery)
  • Build production-ready ETL pipelines
  • Follow Le Wagon bootcamp from Day 1 with confidence
  • Land jobs with modern data engineering skills


🚀 Next Steps

  1. Install prerequisites:

    docker --version
    docker-compose --version
    git --version
    python --version
  2. Start TIER 1: Open Docker-Fundamentals

  3. Build as you learn: Every page has hands-on examples

  4. Push to GitHub: Version control your learning

  5. Track progress: Check off Consolidation Checklist after each TIER


Ready to start? → Begin with Docker-Fundamentals

Already know Docker? → Jump to Apache-Airflow

Want to scale? → See PySpark-Fundamentals and Cloud-Data-Warehouses

Questions? Review the “Related” section at the bottom of each page for cross-references.


This roadmap is designed for data engineers preparing for Le Wagon bootcamp and job market entry. Total time investment: ~27 hours to master production-grade modern data engineering tools.

Last updated: Jan 22, 2026