Integrate Spark Data in Your Informatica Cloud Instance

Use the CData JDBC Driver for Spark with the Informatica Cloud Secure Agent to access live Spark data from Informatica Cloud.

Informatica Cloud allows you to perform extract, transform, and load (ETL) tasks in the cloud. With the Cloud Secure Agent and the CData JDBC Driver for Spark, you get live access to Spark data directly within Informatica Cloud. This article walks through downloading and registering the Cloud Secure Agent, connecting to Spark through the JDBC Driver, and generating a mapping that can be used in any Informatica Cloud process.

Informatica Cloud Secure Agent

To work with Spark data through the JDBC Driver, first install the Cloud Secure Agent.

  1. Navigate to the Administrator page in Informatica Cloud
  2. Select the Runtime Environments tab
  3. Click "Download Secure Agent"
  4. Make note of the Install Token
  5. Run the installer on the client machine and register the Cloud Secure Agent with your username and install token

NOTE: It may take some time for all of the Cloud Secure Agent services to get up and running.

Connecting to the Spark JDBC Driver

With the Cloud Secure Agent installed and running, you are ready to connect to Spark through the JDBC Driver. Start by clicking the Connections tab and clicking New Connection. Fill in the following properties for the connection:

  • Connection Name: Name your connection (e.g., CData Spark Connection)
  • Type: Select "JDBC_IC (Informatica Cloud)"
  • Runtime Environment: Select the runtime environment where you installed the Cloud Secure Agent
  • JDBC Connection URL: Set this to the JDBC URL for Spark. Your URL will look similar to the following:

    jdbc:sparksql:Server=127.0.0.1;

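    Connecting to SparkSQL

    To establish a connection to SparkSQL, specify the following:

    • Server: Set this to the host name or IP address of the server hosting SparkSQL.
    • Port: Set this to the port used to connect to the SparkSQL instance.
    • TransportMode: The transport mode used to communicate with the SparkSQL server. Valid values are BINARY and HTTP. The default is BINARY.
    • AuthScheme: The authentication scheme to use. Valid values are PLAIN, LDAP, NOSASL, and KERBEROS. The default is PLAIN.

    Connecting to Databricks

    To connect to a Databricks cluster, set the properties as described below. Note: You can find the required values in your Databricks instance by navigating to Clusters, selecting the desired cluster, and selecting the JDBC/ODBC tab under Advanced Options.

    • Server: Set this to the server hostname of your Databricks cluster.
    • Port: 443
    • TransportMode: HTTP
    • HTTPPath: Set this to the HTTP path of your Databricks cluster.
    • UseSSL: True
    • AuthScheme: PLAIN
    • User: Set this to 'token'.
    • Password: Set this to your personal access token (you can obtain the value by navigating to the User Settings page of your Databricks instance and selecting the Access Tokens tab).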

    Built-In Connection String Designer

    For assistance in constructing the JDBC URL, use the connection string designer built into the Spark JDBC Driver. Either double-click the .jar file or execute the .jar file from the command line:

    java -jar cdata.jdbc.sparksql.jar

    Fill in the connection properties and copy the connection string to the clipboard.

  • JDBC Jar Directory: Set this to the lib folder under the installation location of the JDBC Driver (on Windows, the installation location is typically C:\Program Files\CData\CData JDBC Driver for Spark\)
  • Driver Class: Set this to cdata.jdbc.sparksql.SparkSQLDriver
  • Username: Set this to a placeholder value (since Spark does not require a username)
  • Password: Set this to a placeholder value (since Spark does not require a password)
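
The JDBC URL format used above (the jdbc:sparksql: prefix followed by Name=Value; pairs) can be assembled programmatically before pasting it into the JDBC Connection URL field. The following is a minimal sketch, not part of the CData driver or of Informatica Cloud: the SparkJdbcUrl class and its methods are hypothetical helpers, the Databricks host, HTTP path, and token are placeholder values, and the sketch assumes property values contain no ';' or '=' characters (it does no escaping). It also includes a small check that the class named in the Driver Class property can be loaded, which only succeeds when the driver .jar is on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (an illustration only, not a CData or Informatica API):
// assembles a "jdbc:sparksql:" URL from connection properties.
final class SparkJdbcUrl {
    static final String PREFIX = "jdbc:sparksql:";
    // Driver class name as given in the Driver Class property.
    static final String DRIVER_CLASS = "cdata.jdbc.sparksql.SparkSQLDriver";

    // Joins the properties as Name=Value; pairs, preserving insertion order.
    // Assumption: values contain no ';' or '=' (no escaping is performed).
    static String build(Map<String, String> props) {
        StringBuilder url = new StringBuilder(PREFIX);
        for (Map.Entry<String, String> e : props.entrySet()) {
            url.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return url.toString();
    }

    // Returns true if the named class can be loaded from the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // SparkSQL: only Server is set, as in the sample URL.
        Map<String, String> spark = new LinkedHashMap<>();
        spark.put("Server", "127.0.0.1");
        System.out.println(build(spark));

        // Databricks: fixed values from the property list, placeholders elsewhere.
        Map<String, String> databricks = new LinkedHashMap<>();
        databricks.put("Server", "MY_DATABRICKS_HOST"); // placeholder
        databricks.put("Port", "443");
        databricks.put("TransportMode", "HTTP");
        databricks.put("HTTPPath", "MY_HTTP_PATH");     // placeholder
        databricks.put("UseSSL", "True");
        databricks.put("AuthScheme", "PLAIN");
        databricks.put("User", "token");
        databricks.put("Password", "MY_ACCESS_TOKEN");  // placeholder
        System.out.println(build(databricks));

        // False unless the driver .jar has been added to the classpath.
        System.out.println("Driver loadable: " + isOnClasspath(DRIVER_CLASS));
    }
}
```

Using an insertion-ordered map keeps the properties in the order they were added, which makes the generated URL easy to compare against the one produced by the built-in connection string designer.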

Create a Mapping for Spark Data

With the connection to Spark configured, you can now access Spark data in any Informatica process. The steps below walk through creating a mapping from Spark to another data target.

  1. Navigate to the Data Integration page
  2. Click New... and select Mapping from the Mappings tab
  3. Click the Source Object and in the Source tab, select the Connection and set the Source Type
  4. Click "Select" to choose the table to map
  5. In the Fields tab, select the fields from the Spark table to map
  6. Click the Target object and configure the Target source, table and fields. In the Field Mapping tab, map the source fields to the target fields.

With the mapping configured, you are ready to start integrating live Spark data with any of the supported connections in Informatica Cloud. Download a free, 30-day trial of the CData JDBC Driver for Spark and start working with your live Spark data in Informatica Cloud today.