To learn Azure Data Factory REST API invocation and configuration, start by understanding the key components involved.
The Azure Data Factory REST API is a powerful tool for automating data integration tasks.
To invoke the API, you use the Azure Data Factory Management API, a REST-based interface for creating, updating, and deleting data factories, pipelines, datasets, and other related entities.
The API uses standard HTTP verbs like GET, POST, PUT, and DELETE to perform operations.
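For example, the management API call that lists a factory's pipelines can be composed as follows. This is a minimal sketch: the subscription, resource group, factory name, and bearer token are placeholders you must supply, and the request is built but not actually sent.

```python
import urllib.request

def pipelines_url(subscription_id: str, resource_group: str, factory_name: str) -> str:
    """Build the List Pipelines URL of the Data Factory management API."""
    return (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.DataFactory"
        f"/factories/{factory_name}"
        "/pipelines?api-version=2018-06-01"
    )

def list_pipelines_request(url: str, bearer_token: str) -> urllib.request.Request:
    # GET is the HTTP verb for listing entities; the token comes from Azure AD.
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {bearer_token}"}, method="GET"
    )

url = pipelines_url("<subscription-id>", "<resource-group>", "<factory-name>")
req = list_pipelines_request(url, "<token>")
# urllib.request.urlopen(req) would send the call; omitted here.
```

The same URL pattern, with POST, PUT, or DELETE, covers the other operations the management API exposes.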
Linked Service Setup
To set up a linked service in Azure Data Factory for API calls, create a new linked service of type REST. When calling Microsoft Graph, for example, set the Authentication Type to System Assigned Managed Identity and the AAD resource to https://graph.microsoft.com/.
There are two ways to create a REST linked service, either through the Azure portal UI or by providing specific properties. To create a REST linked service using the UI, browse to the Manage tab in your Azure Data Factory or Synapse workspace and select Linked Services, then select New.
The following properties are supported for the REST linked service: type, url, enableServerCertificateValidation, authenticationType, authHeaders, and connectVia. The type property must be set to RestService and url is the base URL of the REST service.
The supported authentication types are Anonymous, Basic, AadServicePrincipal, OAuth2ClientCredential, and ManagedServiceIdentity.
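Putting those properties together, a REST linked service definition might look like the following sketch, expressed as a Python dictionary serialized to the JSON that Data Factory expects. The name and service URL are placeholders; only the properties listed above are shown.

```python
import json

# Minimal REST linked service definition using the properties described above.
rest_linked_service = {
    "name": "RestServiceLinkedService",            # placeholder name
    "properties": {
        "type": "RestService",                     # required: must be RestService
        "typeProperties": {
            "url": "https://example.com/api/",     # required: base URL of the REST service
            "enableServerCertificateValidation": True,
            "authenticationType": "Anonymous",     # simplest of the supported types
        },
    },
}

payload = json.dumps(rest_linked_service, indent=2)
```

The optional connectVia property, pointing at an integration runtime, would sit alongside typeProperties when the service is not reachable from the default Azure runtime.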
Authentication
Authentication is a crucial aspect of Azure Data Factory's REST API. You can use various authentication methods to connect to your data sources, including Managed Identity, Basic Authentication, Service Principal, OAuth2 Client Credential, and Anonymous Authentication.
When you create a data factory, Azure automatically creates an app for it in Microsoft Entra ID (formerly Azure Active Directory), making it easy to grant the factory access to the Graph API or other services. This is known as Managed Identity.
To use Managed Identity, you need to set the authenticationType property to ManagedServiceIdentity and specify the Microsoft Entra resource you are requesting for authorization, such as https://management.core.windows.net.
You can also use user-assigned managed identity, which requires specifying the user-assigned managed identity as the credential object, in addition to the aadResourceId.
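The two managed identity variants described above might be sketched like this. The URL and credential name are placeholders, and property names such as aadResourceId and the credential reference follow the connector's documented pattern, so verify them against the current reference.

```python
# System-assigned managed identity: set the type and the Entra resource.
msi_type_properties = {
    "url": "https://example.com/api/",              # placeholder base URL
    "authenticationType": "ManagedServiceIdentity",
    "aadResourceId": "https://management.core.windows.net",
}

# User-assigned managed identity additionally references a credential object
# (the credential name is a placeholder).
user_msi_type_properties = dict(
    msi_type_properties,
    credential={"referenceName": "myUserAssignedIdentity", "type": "CredentialReference"},
)
```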
Basic Authentication involves setting the authenticationType property to Basic and specifying the user name and password to access the REST endpoint. You can store the password securely in Data Factory as a SecureString.
Service Principal authentication requires setting the authenticationType property to AadServicePrincipal and specifying the Microsoft Entra application's client ID and tenant information. You can also use a service principal key or certificate for authentication.
OAuth2 Client Credential authentication involves setting the authenticationType property to OAuth2ClientCredential and specifying the token endpoint, client ID, client secret, and scope of the access required.
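The two credential-based types described above might be sketched as follows. All IDs, secrets, and endpoints are placeholders, and the field names follow the connector documentation's pattern; check them against the current reference before use.

```python
# Service principal authentication: client ID, key, and tenant information.
aad_service_principal = {
    "authenticationType": "AadServicePrincipal",
    "servicePrincipalId": "<client-id>",
    "servicePrincipalKey": {"type": "SecureString", "value": "<key>"},
    "tenant": "<tenant-id>",
    "aadResourceId": "<resource>",
}

# OAuth2 client credential flow: token endpoint plus client ID, secret, and scope.
oauth2_client_credential = {
    "authenticationType": "OAuth2ClientCredential",
    "tokenEndpoint": "<token-endpoint-url>",
    "clientId": "<client-id>",
    "clientSecret": {"type": "SecureString", "value": "<secret>"},
    "scope": "<scope>",
}
```

A certificate can replace the service principal key; either way, the secret belongs in a SecureString or an Azure Key Vault reference, never in plain text.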
By using the correct authentication method and specifying the required properties, you can securely connect to your data sources and perform API calls using Azure Data Factory's REST API.
API Invocation
API Invocation is a crucial step in working with Azure Data Factory's REST API. To invoke the API, you specify the base URL of the REST service in the url property, which is required.
Authentication headers can also be specified as additional HTTP request headers. This is useful for schemes like API key authentication, where you select Anonymous as the authentication type and pass the API key in a header.
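For instance, API key authentication can be expressed with the Anonymous type plus an authHeaders entry. This is a sketch: the header name x-api-key and its value are placeholders specific to your service.

```python
# Anonymous authentication combined with a custom API-key header via authHeaders.
api_key_type_properties = {
    "url": "https://example.com/api/",        # placeholder base URL
    "authenticationType": "Anonymous",
    "authHeaders": {"x-api-key": "<your-api-key>"},
}
```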
Invoking REST API
You can use the REST connector as a source or a sink in Azure Data Factory to invoke REST APIs as part of a pipeline.
The REST connector as a source requires the type property to be set to RestSource. This is a must-have, as it tells the connector that you're using a REST API as the source. The requestMethod property can be set to either GET or POST, but GET is the default.
Additional headers can be added using the additionalHeaders property, but the REST connector ignores any "Accept" header specified in these headers. Instead, it will auto-generate a header of Accept: application/json, as it only supports responses in JSON.
When pagination is used, a response body that is a JSON array of objects is not supported. You can, however, use the paginationRules property to compose the requests for subsequent pages.
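A copy activity source using those properties might be sketched like this. The custom header and the $.paging.next pagination path are placeholders that depend on the target API.

```python
# REST connector as a copy activity source.
rest_source = {
    "type": "RestSource",                        # required for a REST source
    "requestMethod": "GET",                      # GET is the default; POST also allowed
    "additionalHeaders": {"x-custom": "value"},  # any "Accept" header here is ignored
    "paginationRules": {"AbsoluteUrl": "$.paging.next"},  # next-page link in the body
}
```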
For the REST connector as a sink, the type property must be set to RestSink. This is a must-have, as it tells the connector that you're using a REST API as the sink. The requestMethod property can be set to POST, PUT, or PATCH, with POST being the default.
The httpCompressionType property can be set to either none or gzip, but none is the default. The writeBatchSize property determines the number of records to write to the REST sink per batch, with a default value of 10000.
When using the REST connector as a sink, the data will be sent in JSON with a specific pattern. You can use the copy activity schema mapping to reshape the source data to conform to the expected payload by the REST API.
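The sink counterpart might look like this sketch, using the defaults described above:

```python
# REST connector as a copy activity sink.
rest_sink = {
    "type": "RestSink",             # required for a REST sink
    "requestMethod": "POST",        # default; PUT and PATCH are also supported
    "httpCompressionType": "none",  # default; "gzip" is the alternative
    "writeBatchSize": 10000,        # default number of records per batch
}
```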
Copy Activity Properties
For a copy activity, the source section supports the REST source properties described above, which control how data is retrieved, and the sink section supports the REST sink properties, which control how data is written. Configuring these properties correctly is what makes the transfer between a REST endpoint and your data store work.
Configuration
To configure the Azure Data Factory REST API, you need to define the connector configuration details.
For basic authentication, set the authenticationType property to Basic and specify a user name and password. The userName property holds the value used to access the REST endpoint and is required. To store the password securely, mark the field as a SecureString or reference a secret stored in Azure Key Vault.
Use Basic
To use basic authentication, you need to set the authenticationType property to Basic. This is a straightforward process that requires a few key properties to be specified.
The userName property must be set to the user name you want to use to access the REST endpoint. This is a required property, so don't forget to include it in your configuration.
The password property is also required and must be set to the password for the user. To store the password securely, mark this field as a SecureString type in Data Factory. You can also reference a secret stored in Azure Key Vault.
By following these simple steps, you can configure basic authentication in your Data Factory and start accessing your REST endpoint securely.
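The steps above can be sketched as the following typeProperties fragment. The URL, user name, and password are placeholders; in a real definition the password would reference Azure Key Vault rather than appear inline.

```python
# Basic authentication properties for a REST linked service.
basic_auth_type_properties = {
    "url": "https://example.com/api/",   # placeholder base URL
    "authenticationType": "Basic",
    "userName": "<user-name>",           # required
    "password": {"type": "SecureString", "value": "<password>"},  # or a Key Vault reference
}
```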
Dataset Properties
To configure your dataset, set the type property to RestResource; this is required and is the only type supported by REST datasets. To copy data from a specific resource, specify a relative URL in the dataset. The relativeUrl property is optional: it is combined with the base URL from the linked service definition, so you can omit it when the linked service URL alone points at the data you need.
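A REST dataset with a relative URL might be sketched like this. The names and the relative path are placeholders, and the linked service reference points at a REST linked service defined separately.

```python
# REST dataset: RestResource type plus an optional relative URL.
rest_dataset = {
    "name": "RestDataset",
    "properties": {
        "type": "RestResource",            # required; the only supported type
        "typeProperties": {
            "relativeUrl": "customers/1",  # optional; appended to the linked service URL
        },
        "linkedServiceName": {
            "referenceName": "RestServiceLinkedService",
            "type": "LinkedServiceReference",
        },
    },
}
```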
Mapping Properties
Mapping properties is a crucial step in configuration. The copy activity source and sink sections support the REST source and sink properties described earlier. In data flows, REST is supported for both integration datasets and inline datasets. To copy data from a REST endpoint to a tabular sink, refer to the schema mapping section, which covers reshaping the hierarchical JSON response into rows and columns.
Source Transformation
In the source transformation, you specify the HTTP method, which must be either GET or POST; this is a required property and determines how data is retrieved from the source. You can also specify an optional relative URL to the resource that contains the data; it is combined with the URL in the linked service definition to form the full request URL. If you need additional HTTP request headers, for example to authenticate or authorize the request, you can add them in the source transformation as well. A timeout sets how long to wait for the HTTP response (not how long it takes to write the data); the default value is 00:01:40. Finally, to avoid overwhelming the source, you can specify an interval between successive requests, in milliseconds, with a value between 10 and 60000.
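To make the constraints concrete, here is a small checker over an illustrative settings dictionary. The property names are assumptions for the sketch, not the actual data flow script syntax.

```python
def validate_source_settings(settings: dict) -> None:
    """Check the source-transformation constraints described above."""
    assert settings["httpMethod"] in ("GET", "POST"), "method must be GET or POST"
    interval = settings.get("requestInterval", 10)   # milliseconds between requests
    assert 10 <= interval <= 60000, "interval must be between 10 and 60000 ms"

example_settings = {
    "httpMethod": "GET",
    "relativeUrl": "customers",                 # optional, combined with the base URL
    "additionalHeaders": {"x-custom": "value"},
    "timeout": "00:01:40",                      # default response timeout
    "requestInterval": 1000,
}
validate_source_settings(example_settings)
```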
Pagination and Response Handling
Pagination and Response Handling is a crucial aspect of working with Azure Data Factory's REST API. The generic REST connector supports various pagination patterns, including using the next request's absolute or relative URL, query parameter, or header based on values in the current response body or headers.
To configure pagination, you'll need to define a dictionary of pagination rules in your dataset. This dictionary contains case-sensitive key-value pairs that the connector will use to generate requests starting from the second page.
The connector stops iterating when it receives HTTP status code 204 (No Content) or when a JSONPath expression in the pagination rules returns null. Each pagination rule is a key-value pair: the key identifies where the next-page value is placed in the outgoing request (the absolute or relative URL, a query parameter, or a header), and the value identifies where that value comes from (a constant, a response header, or a JSONPath expression into the response body).
In mapping data flows, pagination rules are defined differently than in copy activity: Range is not supported, "{}" rather than "[]" is used to escape special characters, and although an end condition is supported, its syntax differs, using "body" to refer to the response body and "header" to refer to a response header.
When handling the response from a List By Factory endpoint, you can iterate over the results and process them as needed for your workflow. To avoid endless requests when the range rule is not defined, you can set an end condition, such as referring to the "paging.next" URL in the response structure.
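The pagination behavior described above can be mimicked client-side to make it concrete. This sketch is an analogue of the connector's rule, not connector syntax: it follows a paging.next link in each response body until the link is null or a request cap is reached.

```python
def follow_pages(fetch, first_url, max_requests=100):
    """Follow paging.next links until they run out, like an AbsoluteUrl-style rule."""
    url, pages = first_url, []
    for _ in range(max_requests):    # cap avoids endless requests
        body = fetch(url)
        pages.append(body)
        url = (body.get("paging") or {}).get("next")
        if not url:                  # end condition: no next link in the response
            break
    return pages

# Demo with canned responses standing in for a REST endpoint.
demo_responses = {
    "page1": {"value": [1, 2], "paging": {"next": "page2"}},
    "page2": {"value": [3], "paging": {"next": None}},
}
pages = follow_pages(demo_responses.get, "page1")
```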