How to create an RPA flow for HDFS Data in UiPath Studio



Use the HDFS ODBC Driver to create workflows that access real-time HDFS data without any coding.

UiPath is a Robotic Process Automation (RPA) platform with rich features and an easy-to-use UI that enables non-developers to create process automation. By using UiPath Studio, you can build an RPA program just like drawing a diagram. With the CData ODBC Driver for HDFS, users can embed HDFS data in the workflow.

This article walks through using the HDFS ODBC Driver in UiPath Studio to create an RPA program that accesses HDFS data.

Configure the Connection to HDFS

If you have not already, first specify connection properties in an ODBC DSN (data source name). This is the last step of the driver installation. You can use the Microsoft ODBC Data Source Administrator to create and configure ODBC DSNs.

In order to authenticate, set the following connection properties:

  • Host: Set this value to the host of your HDFS installation.
  • Port: Set this value to the port of your HDFS installation. Default port: 50070
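
For readers who want to see how these properties travel to the driver, the following is a minimal Python sketch of the two usual forms of an ODBC connection string: one that names the DSN, and one that passes Host and Port directly. The helper odbc_connection_string, the host name hdfs.example.com, and the driver name in braces are assumptions made for illustration; check the ODBC Data Source Administrator for the exact driver name registered on your machine.

```python
def odbc_connection_string(**props):
    """Join key/value pairs into a semicolon-delimited ODBC connection string."""
    return ";".join(f"{key}={value}" for key, value in props.items()) + ";"


# DSN-based form: Host and Port are read from the saved DSN.
dsn_string = odbc_connection_string(DSN="CData HDFS Source")

# DSN-less form: the driver name and host below are assumed, not confirmed.
direct_string = odbc_connection_string(
    Driver="{CData ODBC Driver for HDFS}",
    Host="hdfs.example.com",
    Port=50070,
)

print(dsn_string)
print(direct_string)
```

The DSN-based form is the one the UiPath steps below rely on, since the Connection Wizard only asks for the DSN name.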

Connect UiPath Studio to HDFS Data

You are now ready to use the HDFS ODBC DSN in UiPath Studio. Follow these steps:

  1. From the Start page, click Blank to create a New Project.
  2. Click Manage Packages then search for and install UiPath.Database.Activities.
  3. Navigate to the Activities and drop a Flowchart (Workflow -> Flowchart -> Flowchart) onto the process.
  4. Drop a database Connect activity (App Integration -> Database -> Connect) after the Start activity.
  5. Double-click the Connect activity and configure the Connection.
    1. Click the Connection Wizard
    2. Select "Microsoft ODBC Data Source"
    3. In Connection Properties, select your DSN (CData HDFS Source) and click OK
  6. To store the connection, create a variable and bind it to the DatabaseConnection field under Output in the Properties section.

Create an Execute Query Activity

With the connection configured, we are ready to query HDFS data in our RPA.

  1. From the Activities navigation, select Execute Query and drop it on the Flowchart.
  2. Double-click the Execute Query activity and set the properties as follows:
    • ExistingDbConnection: Your Connection variable
    • Sql: A SELECT statement, such as SELECT FileId, ChildrenNum FROM Files WHERE FileId = '119116'
    • DataTable: Create and use a variable with the Type System.Data.DataTable
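
As a rough picture of what the Execute Query activity does, the sketch below runs the same SELECT through a generic database connection and returns the column names and rows, which is loosely what ends up in the DataTable variable. It is not UiPath's implementation. The function execute_query is hypothetical, and an in-memory SQLite table with made-up values stands in for HDFS so the example runs without a cluster. With the driver installed, a connection from a library such as pyodbc opened against the DSN could presumably be passed in instead.

```python
import sqlite3


def execute_query(connection, sql, params=()):
    """Run a SELECT and return (column_names, rows), loosely like a DataTable."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql, params)
        columns = [col[0] for col in cursor.description]
        rows = [tuple(row) for row in cursor.fetchall()]
    finally:
        cursor.close()
    return columns, rows


# Stand-in data (invented values) so the sketch runs without an HDFS cluster.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Files (FileId TEXT, ChildrenNum INTEGER)")
demo.executemany(
    "INSERT INTO Files VALUES (?, ?)",
    [("119116", 3), ("119117", 0)],
)

columns, rows = execute_query(
    demo,
    "SELECT FileId, ChildrenNum FROM Files WHERE FileId = ?",
    ("119116",),
)
print(columns)
print(rows)
```

Passing the FileId as a parameter rather than embedding it in the SQL text is a design choice of this sketch; the activity itself accepts the literal statement shown above.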

Create a Write CSV Activity

With the Connection and Execute Query activities configured, we are ready to add a Write CSV activity to the Flowchart to replicate the HDFS data.

  1. From the Activities navigation, select Write CSV and drop it after the Execute Query activity.
  2. Double-click the Write CSV activity and set the properties as follows:
    • FilePath: Set to a file (new or existing) on disk (e.g., C:\UiPath[id]-data.csv)
    • DataTable: Set to the DataTable variable you created earlier
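
The effect of the Write CSV step can be approximated in a few lines of Python: write a header row, then one line per row of the query result. The function write_csv, the sample row, and the temporary output path are assumptions for illustration only; the actual activity writes to whatever FilePath you configured.

```python
import csv
import os
import tempfile


def write_csv(file_path, columns, rows):
    """Write a header row followed by the data rows to a CSV file."""
    with open(file_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)
        writer.writerows(rows)


# Sample result set (invented values) standing in for the DataTable variable.
columns = ["FileId", "ChildrenNum"]
rows = [("119116", 3)]

# A temporary directory is used here instead of a fixed path on disk.
out_path = os.path.join(tempfile.mkdtemp(), "hdfs-data.csv")
write_csv(out_path, columns, rows)

# Read the file back to confirm what was written.
with open(out_path, newline="", encoding="utf-8") as handle:
    written = list(csv.reader(handle))
print(written)
```

Note that values read back from a CSV file are strings, so numeric columns need to be converted again by whatever consumes the file.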

Connect the Activities and Run the Flowchart

If they are not already connected, connect each Activity that you created to complete the RPA project for extracting HDFS data and exporting it to CSV.

Click Run to extract HDFS data and create a CSV file.

In this article, we used the CData ODBC Driver for HDFS to create an automation flow that accesses HDFS data in UiPath Studio. Download a free, 30-day trial of the ODBC Driver and start working with live HDFS data in UiPath Studio today!

