Azure Data Factory & Pipelines
The tool that moves and reshapes data on a schedule — explained without a line of code.
What you'll learn
- Explain what Data Factory does and why it exists
- Describe pipelines, activities and triggers
- Know what copy, transform and integration runtime mean
Data rarely starts out where you need it or in the shape you want. It might sit in a sales system, an old database and a stack of files, all in different formats. Azure Data Factory is the tool that fetches that scattered data, tidies it up and delivers it somewhere useful — automatically, on a schedule, without anyone copying and pasting. If you’ve ever wondered who or what keeps the morning report fresh, the answer is often a Data Factory job quietly running overnight.
What Data Factory actually does
Think of Data Factory as a factory conveyor belt for data. Raw material goes in at one end, moves through a series of stations, and a finished product comes out at the other. The technical name for this job is orchestration: coordinating the steps so each one happens in the right order, at the right time, without a person watching. Crucially, Data Factory is about moving and arranging data, not analysing it. It's the logistics department, not the brains. It picks data up, carries it across, reshapes it, and drops it where the analysis tools can use it.
A pipeline is the conveyor; each station is an activity; a trigger says when the belt starts.
Pipelines and activities: the belt and its stations
A pipeline is one complete conveyor belt — a named sequence of steps that, run start to finish, gets a job done (“refresh the sales data every night”). Each individual step on that belt is an activity. One activity might grab files from a source, the next might clean them up, the next might drop them in the data lake. String the activities together in order and you have a pipeline. This breakdown is handy because if something goes wrong, the team can point to the exact station that jammed rather than scrapping the whole line.
The two activities you’ll hear about most are copy and transform. A copy activity simply moves data from one place to another, unchanged — like carrying a box from the loading dock to the shelf. A transform activity reshapes the data along the way: combining columns, filtering out junk, converting formats, joining two sources into one. Copy is “move it”; transform is “fix it as it moves”.
Data Factory moves and reshapes data; it doesn’t analyse it. Think conveyor belt, not calculator.
Triggers: deciding when it runs
A pipeline doesn't run itself. Something has to press "go", and that something is a trigger. The most common kind is a schedule ("run every night at 2am"), so reports are ready before anyone arrives. Triggers can also fire when a new file lands in storage, and a pipeline can be kicked off by hand for a one-off run. The point is that triggers make the whole thing automatic: set the schedule once and the data keeps flowing without anyone lifting a finger. When someone says "the pipeline didn't fire this morning", they usually mean the trigger didn't go off, so the data is stale.
Integration runtime: the engine room
One more term comes up in conversations about Data Factory: integration runtime. It sounds intimidating but means something simple: the engine that actually does the work behind a pipeline. It's the muscle that reaches out, picks up the data and carries it across. There's a version for data already in the cloud and a version that can reach back into your company's own network to fetch data sitting on internal systems. You'll rarely touch it, but now you know it's just "the worker doing the lifting", not a mysterious extra product.
Sort the pipeline concepts
Here's where each one goes:
- A named sequence of steps end to end → Pipeline — the pipeline is the complete conveyor belt for one job.
- One individual station on the belt → Activity — each step (copy, transform, load) is an activity.
- What decides when the belt starts → Trigger — triggers fire the pipeline on a schedule or on an event.
- Moving data from the CRM to the lake every night → Pipeline — the full overnight refresh is a pipeline.
- Combining two tables and filtering out nulls → Activity — that's a transform activity reshaping data mid-flow.
- Set to "run every Monday at 6am" once and forgotten → Trigger — a scheduled trigger automates the run without anyone pressing go.
How to use it
You don't need to build pipelines to benefit from understanding them. When the dashboard is out of date, you can ask the right question: "Did the pipeline run, or did the trigger miss?" When a new data source appears, you'll understand "we'll add a copy activity to pull it in". And when someone promises "we'll transform it in Data Factory before it hits the report", you know they mean the cleaning happens on the conveyor belt, not in your spreadsheet. Recognising Data Factory as the logistics layer, the part that moves and shapes data so the analysis tools get clean material, is the single most useful thing to carry into the rest of this course.
Quick check
1. Azure Data Factory mainly…
2. A single step inside a pipeline is called an…
3. A trigger is what…