ETL, ELT & Data Pipelines
How data quietly moves, gets cleaned, and is made ready before it ever reaches a report.
What you'll learn
- Explain what a data pipeline does in plain English
- Tell ETL and ELT apart and know why it matters
- Understand why pipelines run on schedules and sometimes break
Data almost never arrives at a dashboard in the shape it started. Between the system that first recorded it and the report a leader reads, it passes through a series of automated steps that move it, tidy it, and stitch it together. Those steps are the hidden plumbing behind every dashboard. Together they are called a data pipeline, and the two common recipes for running one are known as ETL and ELT. Knowing the difference demystifies a surprising number of conversations about why a report is late, wrong, or expensive.
From raw source to finished report — the pipeline is everything in between.
What a pipeline actually does
A data pipeline is an automated route that carries data from where it is created to where it is useful. Along the way it does three jobs you would otherwise do by hand. First it moves the data — pulling sales from one system, web visits from another, finance from a third. Then it cleans it — fixing dates that are written five different ways, removing duplicates, filling gaps, and making sure “GB” and “United Kingdom” are treated as the same place. Finally it combines and prepares the data so a single report can show numbers that genuinely line up.
The reason this matters to you is reliability. When a colleague says “the figures don’t match,” the cause is usually somewhere in this cleaning-and-combining work, not in the dashboard itself.
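For readers who like to see the moving parts, the three jobs above (move, clean, combine) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source lists, the column names, and the helper functions `parse_date`, `clean`, and `combine` are all hypothetical, and a real pipeline would read from live systems rather than hard-coded rows.

```python
from datetime import datetime

# Hypothetical raw records standing in for two separate source systems.
sales_system = [
    {"order_id": 1, "date": "2024-03-01", "country": "GB", "amount": 120.0},
    {"order_id": 2, "date": "01/03/2024", "country": "United Kingdom", "amount": 80.0},
    {"order_id": 2, "date": "01/03/2024", "country": "United Kingdom", "amount": 80.0},  # duplicate
]
web_system = [
    {"date": "2024-03-01", "country": "GB", "visits": 950},
]

# Assumed lookup tables for this sketch only.
COUNTRY_NAMES = {"GB": "United Kingdom"}
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Try each known date format until one fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date: {text!r}")


def clean(rows):
    """Standardise dates and country names, then drop exact duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        fixed = dict(row)
        fixed["date"] = parse_date(row["date"])
        fixed["country"] = COUNTRY_NAMES.get(row["country"], row["country"])
        key = tuple(sorted(fixed.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(fixed)
    return cleaned


def combine(sales, visits):
    """Total sales and visits per (date, country) so the numbers line up."""
    report = {}
    for row in sales:
        entry = report.setdefault((row["date"], row["country"]), {"sales": 0.0, "visits": 0})
        entry["sales"] += row["amount"]
    for row in visits:
        entry = report.setdefault((row["date"], row["country"]), {"sales": 0.0, "visits": 0})
        entry["visits"] += row["visits"]
    return report


# Move (the lists above), clean, then combine into one report.
report = combine(clean(sales_system), clean(web_system))
print(report)
```

In this sketch the duplicate order is dropped and both date styles and both country spellings collapse into one, so sales and web visits end up on the same row of the report. If either cleaning rule were missing, the same day would show up as two separate rows, which is the kind of mismatch described above.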
ETL: clean first, then store
ETL stands for Extract, Transform, Load. You extract the data from its sources, transform it — clean and reshape it — and only then load the finished result into the warehouse. The big advantage is that only polished, trustworthy data ever lands in the place people query. It is the classic, careful approach, and it suits situations where you know exactly what the final shape should be.
ELT: store first, then clean
ELT swaps the last two steps: Extract, Load, Transform. You pour the raw data into a powerful cloud store first, then do the cleaning afterwards, right where it sits. Modern cloud platforms have enough storage and computing power that this is often the more practical choice. It is also flexible: because you keep the raw data, nothing has been thrown away if you later need it shaped a different way. That makes ELT a natural fit for data lakes and lakehouses.
ETL cleans on the way in; ELT cleans once it has arrived. Same ingredients, different order of cooking.
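The difference in ordering can be made concrete with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for the warehouse, which is an assumption made purely for illustration; the table names, the sample rows, and the `tidy` rule are hypothetical too. The point is only where the cleaning runs: in the pipeline code for ETL, inside the store for ELT.

```python
import sqlite3

# Hypothetical raw rows; the stray space and lower case are the "mess" to clean.
raw_rows = [("gb ", 120.0), ("GB", 80.0), ("fr", 50.0)]


def tidy(country):
    """The transform step: trim whitespace and upper-case the country code."""
    return country.strip().upper()


def run_etl(conn):
    """Extract, Transform, Load: clean in the pipeline, store only the result."""
    cleaned = [(tidy(country), amount) for country, amount in raw_rows]
    conn.execute("CREATE TABLE sales (country TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)


def run_elt(conn):
    """Extract, Load, Transform: store the raw rows, then clean inside the store."""
    conn.execute("CREATE TABLE raw_sales (country TEXT, amount REAL)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", raw_rows)
    conn.execute(
        "CREATE TABLE sales AS "
        "SELECT UPPER(TRIM(country)) AS country, amount FROM raw_sales"
    )


def totals(conn):
    """Total sales per country from the cleaned table."""
    return conn.execute(
        "SELECT country, SUM(amount) FROM sales GROUP BY country ORDER BY country"
    ).fetchall()


etl_store = sqlite3.connect(":memory:")
elt_store = sqlite3.connect(":memory:")
run_etl(etl_store)
run_elt(elt_store)

etl_totals = totals(etl_store)
elt_totals = totals(elt_store)
# Only the ELT store still holds the untouched raw rows.
raw_kept = elt_store.execute("SELECT COUNT(*) FROM raw_sales").fetchone()[0]
print(etl_totals, elt_totals, raw_kept)
```

Both routes produce the same cleaned totals. The ELT store also still holds the raw table, so a different cleaning rule could be applied later without going back to the source.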
Why pipelines run overnight
Many pipelines run on a schedule — a classic example is the overnight job that refreshes the warehouse while everyone is asleep, so the morning’s reports are ready and the live systems are not slowed during business hours. This is exactly why a dashboard can be a few hours behind the live app: it is showing you last night’s refresh, not this second’s reality. That is normal and usually desirable.
Why they break, and why it costs you
Pipelines are automated, but they are not magic. A source system changes a column name, a file arrives late, a date format shifts — and the job either fails or, worse, quietly loads bad data. A broken pipeline is one of the most common reasons a report is suddenly empty, frozen, or obviously wrong. Good teams build in checks and alerts so a failure is caught before it reaches the people relying on it, rather than after.
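To give a feel for what "checks and alerts" can mean in practice, here is a minimal sketch of a check that runs before anything is loaded. The expected column names, the ISO date format, and the `check_batch` helper are assumptions chosen for the example; real teams typically use dedicated data-quality tooling, but the idea is the same: refuse to load a batch that does not look right, and tell someone.

```python
from datetime import datetime

EXPECTED_COLUMNS = {"order_id", "date", "amount"}  # hypothetical schema


def check_batch(rows):
    """Return a list of problems; an empty list means the batch looks safe to load."""
    problems = []
    if not rows:
        problems.append("no rows received (did the file arrive late?)")
    for index, row in enumerate(rows):
        missing = EXPECTED_COLUMNS - set(row)
        if missing:
            problems.append(f"row {index}: missing columns {sorted(missing)}")
            continue
        try:
            datetime.strptime(row["date"], "%Y-%m-%d")
        except ValueError:
            problems.append(f"row {index}: unexpected date format {row['date']!r}")
    return problems


good_batch = [{"order_id": 1, "date": "2024-03-01", "amount": 120.0}]
renamed_batch = [{"order_id": 1, "date": "2024-03-01", "total": 120.0}]  # "amount" renamed upstream
shifted_batch = [{"order_id": 1, "date": "01/03/2024", "amount": 120.0}]  # date format changed

batches = [("good", good_batch), ("renamed", renamed_batch), ("shifted", shifted_batch), ("late", [])]
for name, batch in batches:
    problems = check_batch(batch)
    if problems:
        # A real pipeline would send an alert here instead of printing.
        print(f"{name}: ALERT, not loading:", "; ".join(problems))
    else:
        print(f"{name}: checks passed, safe to load")
```

The three failing batches mirror the three causes named above: a renamed column, a shifted date format, and a file that did not arrive. In each case the job stops with a clear message rather than quietly loading bad data.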
Sort the pipeline concepts
Here's where each one goes:
- Cleans data before it ever lands in the warehouse → ETL — transformation happens before the load step.
- Pours raw data in first, then transforms in the cloud store → ELT — load comes before transform.
- Suits situations where the final shape is well known upfront → ETL — the classic careful approach when requirements are clear.
- Keeps the raw copy so you can reshape it later → ELT — nothing is lost because raw data is kept in the store.
- A column rename quietly loads wrong values → Pipeline risk — source changes that break a pipeline often fail silently.
- Overnight job reprocesses all history instead of just new records → Pipeline risk — a logic error that looks fine but drives up cost and time.
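The last item in the list, reprocessing all history instead of just new records, is easier to picture with a small sketch. The rows, the `recorded` field, and both function names are hypothetical; the technique shown is a simple "high-water mark", where the job remembers the latest date it has already loaded and only picks up rows after it.

```python
# Hypothetical source rows, each stamped with the day it was recorded.
source_rows = [
    {"order_id": 1, "recorded": "2024-03-01"},
    {"order_id": 2, "recorded": "2024-03-02"},
    {"order_id": 3, "recorded": "2024-03-03"},
]


def full_reload(rows):
    """Reprocess everything, every night: simple, but the work grows with history."""
    return list(rows)


def incremental_load(rows, last_loaded):
    """Process only rows recorded after the previous run's high-water mark.

    Dates written as YYYY-MM-DD sort correctly as plain text, so a string
    comparison is enough for this sketch.
    """
    return [row for row in rows if row["recorded"] > last_loaded]


# Assume the previous run finished with data up to 2 March.
last_loaded = "2024-03-02"

full = full_reload(source_rows)
new_only = incremental_load(source_rows, last_loaded)
print(len(full), "rows handled by the full reload")
print(len(new_only), "rows handled by the incremental load")
```

With three rows the difference is trivial, but with years of history the full reload handles every row every night while the incremental load handles only the newest, which is where the extra cost and time come from.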
How to use it
You will not write a pipeline, but you can ask the questions that keep one honest. Try: “Is this an ETL or ELT setup, and where does the cleaning happen?” or “What time does the overnight refresh finish, so I know how fresh the morning report is?” or “If the report looks wrong, can we check whether last night’s pipeline ran successfully?” When something looks off, resist blaming the dashboard first. Usually the story is upstream, in the quiet plumbing that moved and cleaned the data while you were getting on with your day.
Quick check
1. A data pipeline mainly exists to…
2. In ETL, the data is cleaned…
3. A dashboard showing yesterday's numbers is most likely…