Azure Data Factory (ADF) is a cloud-based data integration service that allows you to create, schedule, and manage data pipelines. It's a powerful tool for integrating data from various sources, including Oracle databases.
ADF supports Oracle as a source and sink data store, enabling you to extract data from Oracle databases and load it into other data stores like Azure SQL Database or Azure Blob Storage. You can also use ADF to migrate data from Oracle to Azure.
To get started with ADF and Oracle, you'll need to create an ADF pipeline that includes an Oracle source dataset and a sink dataset. This will allow you to define the data transformation and movement between the two datasets.
Prerequisites
Before connecting Azure Data Factory to Oracle, decide which integration runtime will reach your data store.
If your data store is located inside an on-premises network, you'll need to configure a self-hosted integration runtime to connect to it.
If your data store is a managed cloud data service, you can use the Azure Integration Runtime instead.
If access to that service is restricted to IPs approved in the firewall rules, you can add the Azure Integration Runtime IPs to the allow list.
The managed virtual network integration runtime feature in Azure Data Factory can also be used to access the on-premises network without installing a self-hosted integration runtime.
For more information about the network security mechanisms and options supported by Data Factory, see the Data access strategies article in the Data Factory documentation.
Create a Linked Service
To create a linked service to Oracle in Azure Data Factory, start by browsing to the Manage tab and selecting Linked Services, then click New. You can also create a linked service to Oracle Cloud Storage by following similar steps.
To create a linked service to Oracle, you need to search for Oracle and select the Oracle connector. Configure the service details, test the connection, and create the new linked service.
When creating a linked service to Oracle, you'll need to specify the type property, which must be set to Oracle. The connectionString property requires the information needed to connect to the Oracle Database instance.
You can also store the password in Azure Key Vault and pull the password configuration out of the connection string. To enable encryption on the Oracle connection, you can either use Triple-DES Encryption (3DES) and Advanced Encryption Standard (AES) on the Oracle server side, or set up a truststore for SSL server authentication.
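The main properties of an Oracle linked service are:
- type: must be set to Oracle (required).
- connectionString: the information needed to connect to the Oracle Database instance (required). The password can be kept in Azure Key Vault instead of in the string.
- connectVia: the integration runtime used to connect to the data store. If it is not specified, the default Azure Integration Runtime is used.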
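As an illustration, the sketch below builds an Oracle linked service definition as a Python dictionary and prints it as JSON. It assumes the connection-string style of the connector described in this section; the helper name, host, SID, user, and the Key Vault linked service and secret names are all hypothetical placeholders.

```python
import json


def oracle_linked_service(name, host, port, sid, user,
                          vault_linked_service, secret_name, ir_name=None):
    """Return an Oracle linked service definition (hypothetical helper)."""
    # The password is deliberately left out of the connection string;
    # it is referenced from Azure Key Vault instead.
    connection_string = f"Host={host};Port={port};Sid={sid};User Id={user};"
    properties = {
        "type": "Oracle",  # required, must be "Oracle"
        "typeProperties": {
            "connectionString": connection_string,
            "password": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": vault_linked_service,
                    "type": "LinkedServiceReference",
                },
                "secretName": secret_name,
            },
        },
    }
    if ir_name:
        # Only needed when a specific (e.g. self-hosted) runtime is used.
        properties["connectVia"] = {
            "referenceName": ir_name,
            "type": "IntegrationRuntimeReference",
        }
    return {"name": name, "properties": properties}


# All values below are placeholders for illustration only.
definition = oracle_linked_service(
    "OracleLinkedService", "oracle.example.internal", 1521, "ORCL",
    "adf_reader", "MyKeyVaultLinkedService", "oracle-password",
    ir_name="MySelfHostedIR",
)
print(json.dumps(definition, indent=2))
```

The resulting JSON can be pasted into the code view of a new linked service, or used as a starting point and adjusted to your environment.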
For Oracle Cloud Storage, you'll need to specify the type property, which must be set to OracleCloudStorage. The accessKeyId and secretAccessKey properties are also required, along with the serviceUrl property.
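A minimal sketch of an Oracle Cloud Storage linked service follows, again built as a Python dictionary. The helper name and all values are hypothetical, and wrapping the secret as a SecureString is an assumption about how you would supply it inline rather than from Key Vault.

```python
import json


def oracle_cloud_storage_linked_service(name, access_key_id,
                                        secret_access_key, service_url):
    """Return an Oracle Cloud Storage linked service definition (hypothetical helper)."""
    return {
        "name": name,
        "properties": {
            "type": "OracleCloudStorage",  # required, must be "OracleCloudStorage"
            "typeProperties": {
                "accessKeyId": access_key_id,
                # Assumption: the secret is passed inline as a SecureString.
                "secretAccessKey": {
                    "type": "SecureString",
                    "value": secret_access_key,
                },
                "serviceUrl": service_url,
            },
        },
    }


# Placeholder values; replace them with your own key pair and endpoint.
ocs = oracle_cloud_storage_linked_service(
    "OracleCloudStorageLinkedService",
    "<access key id>",
    "<secret access key>",
    "https://<namespace>.compat.objectstorage.<region identifier>.oraclecloud.com",
)
print(json.dumps(ocs, indent=2))
```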
Source and Destination
Azure Data Factory (ADF) moves Oracle data with a copy activity that reads from a source and writes to a destination.
The source and destination determine the flow of data. The source is the data store or service that contains the data you want to process, while the destination (called a sink in ADF) is the location where the processed data is written.
To set up an Oracle source in ADF, you define a linked service that holds the connection details, such as the server name, username, and password, and a dataset that points at the table to read. This allows ADF to connect to the Oracle database and retrieve the data you need.
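To make the relationship concrete, here is a small sketch of an Oracle dataset definition that points at a table through a linked service. The helper, dataset, linked service, schema, and table names are hypothetical.

```python
import json


def oracle_table_dataset(name, linked_service, schema, table):
    """Return an Oracle dataset definition (hypothetical helper)."""
    return {
        "name": name,
        "properties": {
            "type": "OracleTable",
            # The dataset carries no credentials; it only references
            # the linked service that holds the connection details.
            "linkedServiceName": {
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {"schema": schema, "table": table},
        },
    }


# Placeholder names for illustration.
dataset = oracle_table_dataset(
    "OracleOrders", "OracleLinkedService", "SALES", "ORDERS"
)
print(json.dumps(dataset, indent=2))
```

Keeping credentials in the linked service and table details in the dataset means several datasets can share one connection.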
Source
To load data from Oracle, set the type property of the copy activity source to OracleSource. This property is required.
You can use a custom SQL query to read data from Oracle by setting the oracleReaderQuery property. For example, you could use the query "SELECT * FROM MyTable" to retrieve all rows from a table named MyTable.
If you're using a custom query, you can enable partitioned load by referencing the corresponding built-in partition parameters in the query. This is especially useful when you're dealing with large datasets.
If you set the convertDecimalToInteger property to true, an Oracle NUMBER type with zero or unspecified scale is converted to the corresponding integer type.
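Leaving partitioning aside, the source settings above can be sketched as a copy activity definition. The helper, activity name, and dataset names are hypothetical, and the AzureSqlSink sink type is only an example that assumes the destination is Azure SQL Database.

```python
import json


def oracle_copy_activity(name, input_dataset, output_dataset, query,
                         sink_type, convert_decimal_to_integer=False):
    """Return a copy activity that reads from Oracle (hypothetical helper)."""
    source = {
        "type": "OracleSource",      # required for an Oracle source
        "oracleReaderQuery": query,  # custom SQL query
    }
    if convert_decimal_to_integer:
        # NUMBER with zero or unspecified scale becomes an integer type.
        source["convertDecimalToInteger"] = True
    return {
        "name": name,
        "type": "Copy",
        "inputs": [{"referenceName": input_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": output_dataset, "type": "DatasetReference"}],
        "typeProperties": {"source": source, "sink": {"type": sink_type}},
    }


# Placeholder names; AzureSqlSink assumes an Azure SQL Database destination.
activity = oracle_copy_activity(
    "CopyFromOracle", "OracleOrders", "AzureSqlOrders",
    "SELECT * FROM MyTable", "AzureSqlSink",
    convert_decimal_to_integer=True,
)
print(json.dumps(activity, indent=2))
```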
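The following properties specify the data partitioning options:
- partitionOption: the partitioning mode, one of None (the default), PhysicalPartitionsOfTable, or DynamicRange.
- partitionSettings: the group of settings for the chosen mode.
- partitionNames: the list of physical partitions to copy; applies to PhysicalPartitionsOfTable. In a custom query, reference ?AdfTabularPartitionName.
- partitionColumnName: the integer-type column used for range partitioning; applies to DynamicRange. In a custom query, reference ?AdfRangePartitionColumnName.
- partitionUpperBound and partitionLowerBound: the maximum and minimum values of the partition column to copy; apply to DynamicRange. In a custom query, reference ?AdfRangePartitionUpbound and ?AdfRangePartitionLowbound.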
The degree of parallelism to concurrently load data from an Oracle database is controlled by the parallelCopies setting on the copy activity when a partition option is enabled.
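The sketch below shows one way a dynamic-range partitioned load might be expressed, assuming the built-in range parameters ?AdfRangePartitionColumnName, ?AdfRangePartitionUpbound, and ?AdfRangePartitionLowbound are referenced in the custom query. The helper, table, and column names, the bounds, and the sink type are hypothetical.

```python
import json


def partitioned_type_properties(table, column, lower, upper,
                                sink_type, parallel_copies=4):
    """Return copy activity typeProperties for a dynamic-range load (hypothetical helper)."""
    # The service substitutes the ?Adf... parameters once per partition.
    query = (
        f"SELECT * FROM {table} "
        "WHERE ?AdfRangePartitionColumnName <= ?AdfRangePartitionUpbound "
        "AND ?AdfRangePartitionColumnName >= ?AdfRangePartitionLowbound"
    )
    return {
        "source": {
            "type": "OracleSource",
            "oracleReaderQuery": query,
            "partitionOption": "DynamicRange",
            "partitionSettings": {
                "partitionColumnName": column,
                "partitionUpperBound": str(upper),
                "partitionLowerBound": str(lower),
            },
        },
        "sink": {"type": sink_type},
        # Degree of parallelism across partitions.
        "parallelCopies": parallel_copies,
    }


# Placeholder table, column, and bounds.
props = partitioned_type_properties(
    "SALES.ORDERS", "ORDER_ID", 1, 1000000, "AzureSqlSink", parallel_copies=8
)
print(json.dumps(props, indent=2))
```

With parallelCopies set to 8, up to eight range queries run concurrently, so choose a value the Oracle server can sustain.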
Destination
A destination, called a sink in ADF, is the endpoint of a data flow: the data store where the processed data is written.
A destination can be a database, a data warehouse, or a file or object store. Data is written to it in the format the sink dataset defines.
The destination is configured independently of the source system, but its schema and data types must be compatible with what the source produces so that the data is written correctly and efficiently.
Common destinations for Oracle data include Azure SQL Database and Azure Blob Storage. Oracle itself can also act as a destination, since the connector supports it as a sink.
In a data pipeline, the destination is the final step of the copy, after which the data is available for analysis, reporting, or other purposes.
Type Mapping and Capabilities
The Azure Data Factory Oracle connector supports Oracle data types such as BFILE, BLOB, CHAR, and CLOB. During a copy, these are mapped to interim data types used by the service, such as Byte[] and String.
The Oracle Cloud Storage connector works with files rather than tables: it supports copying files as is or parsing files with supported file formats and compression codecs. This is made possible by Oracle Cloud Storage's S3-compatible interoperability.
The Azure Data Factory Oracle connector is supported for the copy activity (as source and sink) and the lookup activity, on both ① the Azure integration runtime and ② the self-hosted integration runtime.
Type Mapping
When copying data from and to Oracle, the service uses interim data type mappings. Oracle data types are first converted to these interim data types, which the service then uses internally.
- BFILE: Byte[]
- BLOB: Byte[] (only supported on Oracle 10g and higher)
- CHAR: String
- CLOB: String
- DATE: DateTime
- FLOAT: Decimal, or String if precision > 28
- INTEGER: Decimal, or String if precision > 28
- LONG: String
The .NET Framework type mappings match these, except that FLOAT and INTEGER are listed simply as Decimal.
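The mappings described above can be captured in a small lookup, shown here as a sketch. It covers only the types named in this section, and the function name is hypothetical.

```python
# Interim types for the Oracle types discussed in this section only.
INTERIM_TYPES = {
    "BFILE": "Byte[]",
    "BLOB": "Byte[]",   # only supported on Oracle 10g and higher
    "CHAR": "String",
    "CLOB": "String",
    "DATE": "DateTime",
    "FLOAT": "Decimal",
    "INTEGER": "Decimal",
    "LONG": "String",
}

# Types that fall back to String when precision exceeds 28.
PRECISION_SENSITIVE = {"FLOAT", "INTEGER"}


def interim_type(oracle_type, precision=None):
    """Return the interim data type for an Oracle type name."""
    name = oracle_type.upper()
    if name in PRECISION_SENSITIVE and precision is not None and precision > 28:
        return "String"
    return INTERIM_TYPES[name]  # raises KeyError for unlisted types


print(interim_type("DATE"))
print(interim_type("FLOAT", precision=30))
```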
The following table shows the mapping of ANSI data types to Oracle data types using the ODBC interface:
Supported Capabilities
The Oracle Cloud Storage connector is supported for the copy activity, lookup activity, GetMetadata activity, and delete activity.
It takes advantage of Oracle Cloud Storage's S3-compatible interoperability, which means you can copy files as is or parse files with supported file formats and compression codecs.
The connector supports both the Azure integration runtime and the self-hosted integration runtime.
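As a sketch of copying files as is, the definitions below describe a binary dataset on Oracle Cloud Storage and the matching copy source. They assume the OracleCloudStorageLocation and OracleCloudStorageReadSettings type names used by this connector; the helper names, bucket, and folder are hypothetical.

```python
import json


def ocs_binary_dataset(name, linked_service, bucket, folder_path=None):
    """Return a Binary dataset located in Oracle Cloud Storage (hypothetical helper)."""
    location = {"type": "OracleCloudStorageLocation", "bucketName": bucket}
    if folder_path:
        location["folderPath"] = folder_path
    return {
        "name": name,
        "properties": {
            "type": "Binary",  # Binary datasets copy files without parsing
            "linkedServiceName": {
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {"location": location},
        },
    }


def ocs_binary_source(recursive=True):
    """Return the copy activity source for the dataset above."""
    return {
        "type": "BinarySource",
        "storeSettings": {
            "type": "OracleCloudStorageReadSettings",
            "recursive": recursive,
        },
    }


# Placeholder bucket and folder.
files = ocs_binary_dataset(
    "OcsExports", "OracleCloudStorageLinkedService", "exports", "daily"
)
print(json.dumps(files, indent=2))
print(json.dumps(ocs_binary_source(), indent=2))
```

To parse files instead of copying them as is, the dataset would use a format type such as delimited text rather than Binary.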
Installation and Setup
To reach an on-premises Oracle database, Azure Data Factory needs an agent inside your network. Older (version 1) documentation calls this agent the Data Management Gateway; in current Azure Data Factory it is the self-hosted integration runtime. Install it on the machine that hosts the database or, to avoid competing with the database for resources, on a separate machine. It connects on-premises data sources to cloud services in a secure and managed way.
The older gateway-based setup also required Oracle Data Access Components (ODAC) for Windows on the host machine where the gateway is installed. The current Oracle connector ships with a built-in driver, so you don't need to install one manually.
In summary, the legacy setup requires these installations:
- Data Management Gateway (now the self-hosted integration runtime)
- Oracle Data Access Components (ODAC) for Windows (legacy setup only)
Integration Runtime
The integration runtime is the compute infrastructure that Azure Data Factory and Synapse pipelines use to provide data integration capabilities across different network environments. To set one up, sign in to the Azure portal with your credentials.
Search for Azure Data Factory in the Azure portal, select your data factory, and open Azure Data Factory Studio.
In the Manage tab, select Integration runtimes and then New.
Next, select the type of integration runtime. The type determines its capabilities: for an Oracle database inside an on-premises network, choose the self-hosted type; for a managed cloud data service, the Azure type is sufficient.
Enter a name, which identifies the integration runtime in the data factory, and click the "Create" button. A self-hosted integration runtime must also be installed and registered on a machine in your network before it can be used.
The integration runtime is now created. You can verify this in the list of integration runtimes in the Manage tab.
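For reference, the portal steps above produce a small definition that can also be written out by hand. The sketch below shows a self-hosted integration runtime and how a linked service would point at it; the helper names, runtime name, and description are hypothetical.

```python
import json


def self_hosted_ir(name, description=""):
    """Return a self-hosted integration runtime definition (hypothetical helper)."""
    return {
        "name": name,
        "properties": {"type": "SelfHosted", "description": description},
    }


def connect_via(ir_name):
    """Return the connectVia block a linked service uses to select a runtime."""
    return {"referenceName": ir_name, "type": "IntegrationRuntimeReference"}


# Placeholder name and description.
runtime = self_hosted_ir("MySelfHostedIR", "Reaches the on-premises Oracle database")
print(json.dumps(runtime, indent=2))
print(json.dumps(connect_via(runtime["name"]), indent=2))
```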
Frequently Asked Questions
How to connect ADF to Oracle?
To connect ADF to Oracle, navigate to the Manage tab in your Azure Data Factory or Synapse workspace and create a new Linked Service using the Oracle connector. (The Oracle Service Cloud connector is a separate connector for a different Oracle product.) This establishes a connection between ADF and your Oracle database.
What is Azure Data Factory based on?
Azure Data Factory is a managed cloud service on Microsoft Azure for data integration and automation. Its pipelines run on integration runtimes, which provide the compute for data movement and transformation.