Fundamentals of Azure Data Factory

In this blog, you will learn the fundamentals of Azure Data Factory, the key components required to perform data-driven workflows, and how to automate Azure Data Factory pipelines through triggers.

Introduction to Azure Data Factory

  • Azure Data Factory is a managed data integration service that lives in the Azure cloud.
  • Azure Data Factory is used to build complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects.
  • ADF services allow you to create data-driven workflows in the cloud for automating data movement and data transformation.
  • Example:
    • Imagine a survey company that collects product feedback in many world regions. Surveys come in through many channels, such as manual entry, online forms, and social media, so the data ends up in relational, non-relational, and unorganized stores.
    • The company wants to analyze the feedback to increase sales, drive business growth, and provide a better experience to its customers.
    • The company therefore joins the data from the different sources and transforms it into a cloud data warehouse, such as Azure SQL Data Warehouse, so it can easily build reports on top of it.
    • You can automate this workflow and monitor and manage it on a daily schedule.
  • So using ADF, you can perform data integration across all your data sources, whether they are on Azure, on-premises, or on other public clouds such as AWS or Google Cloud.
  • ADF supports more than 72 data sources.

Overview of key components – Data Factory

In any Azure Data Factory instance, there are a few key components required to perform data-driven workflows:

Azure Data Factory Components

1. Pipeline

ADF can have one or more pipelines. A pipeline is a logical grouping, or container, of activities that together perform a task, so you can manage the activities as a set instead of each one individually.
For example, you can deploy and schedule a single pipeline that performs multiple activities.
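Under the hood, a pipeline is authored as a JSON document. A minimal sketch, built here as a Python dictionary so the shape is easy to see (the pipeline and activity names are hypothetical):

```python
import json

# A pipeline is a named container holding an ordered set of activities.
# All names here are illustrative, not a real deployed pipeline.
pipeline = {
    "name": "SurveyFeedbackPipeline",
    "properties": {
        "description": "Copy raw survey feedback, then transform it.",
        "activities": [
            {"name": "CopyRawFeedback", "type": "Copy"},
            {"name": "TransformFeedback", "type": "HDInsightHive"},
        ],
    },
}

# Deploying and scheduling this one pipeline manages both activities as a set.
print(json.dumps(pipeline, indent=2))
```

Scheduling the pipeline runs every activity inside it; you never schedule the two activities separately.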

2. Activities

Data Factory supports three types of activities:

  1. Data movement activities
    • Copy Activity in Data Factory copies data from a source data store to a sink data store.
    • Data from any source can be written to any sink.
    • ADF supports more than 72 data sources.
  2. Data transformation activities
    • Examples: Hive, U-SQL, custom code, stored procedures, Spark, etc.
  3. Control activities.
    • Examples: Execute Pipeline, ForEach, Lookup, If Condition, Wait, etc.
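The Copy activity above can be sketched the same way: it names a source dataset to read from and a sink dataset to write to. This is a minimal sketch; the dataset names are hypothetical:

```python
import json

# Sketch of a Copy activity: data moves from a source data store
# to a sink data store, each referenced through a dataset.
copy_activity = {
    "name": "CopyBlobToSql",
    "type": "Copy",
    "inputs": [{"referenceName": "RawFeedbackBlob", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "FeedbackSqlTable", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "BlobSource"},  # read side
        "sink": {"type": "SqlSink"},       # write side
    },
}

print(json.dumps(copy_activity, indent=2))
```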

3. Datasets

Datasets represent data structures, or metadata, within the data stores; they simply point to or reference the data you want to use in your activities as inputs or outputs.
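A dataset holds no data itself; it only points at data inside a store that is reached through a linked service. A minimal sketch (all names, paths, and files are hypothetical):

```python
import json

# Sketch of a blob dataset: a pointer to data, not the data itself.
# The linked service name, folder, and file are placeholders.
blob_dataset = {
    "name": "RawFeedbackBlob",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": {
            "referenceName": "SurveyStorageLinkedService",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "folderPath": "surveys/raw",
            "fileName": "feedback.csv",
        },
    },
}

print(json.dumps(blob_dataset, indent=2))
```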

4. Linked services

Linked services are much like connection strings, which define the connection information that’s needed for Data Factory to connect to external resources.

Think of it this way: a linked service defines the connection to the data source, and a dataset represents the structure of the data. For example, an Azure Storage linked service specifies a connection string to connect to the Azure Storage account.

An Azure Blob dataset then specifies the blob container and the folder that contains the data.
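Sketched out, an Azure Storage linked service really does look like a wrapped connection string. The account name and key below are placeholders, not real credentials:

```python
import json

# Sketch of an Azure Storage linked service: the connection
# information Data Factory needs to reach the external store.
linked_service = {
    "name": "SurveyStorageLinkedService",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": (
                "DefaultEndpointsProtocol=https;"
                "AccountName=<account>;AccountKey=<key>"  # placeholders
            )
        },
    },
}

print(json.dumps(linked_service, indent=2))
```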

5. Triggers

You can schedule a pipeline so that it is invoked automatically at the scheduled time. You can also execute a pipeline manually if required.
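A schedule trigger ties a recurrence to a pipeline. A minimal sketch of one that fires daily (the trigger name, start time, and pipeline name are illustrative):

```python
import json

# Sketch of a schedule trigger that invokes a pipeline once a day.
trigger = {
    "name": "DailyFeedbackTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",  # run daily
                "interval": 1,
                "startTime": "2020-01-01T00:00:00Z",
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "SurveyFeedbackPipeline",
                    "type": "PipelineReference",
                }
            }
        ],
    },
}

print(json.dumps(trigger, indent=2))
```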

What next?

  • I will write blogs on:
    • Overview – Data Factory Navigation
    • How to build your first ADF Pipeline?
    • How to copy data from Azure SQL to Blob Storage in Azure Data Factory?

About the author

Nirav Gandhi

Hi, my name is Nirav Gandhi. This blog is dedicated to providing database solutions and helping people learn about database technology.

