Transfer Data from Excel to Spark

This article explains how to transfer data from Excel to Spark using the Excel Add-In for Spark.

The CData Excel Add-In for Spark enables you to edit and save Spark data directly from Excel. This technique is useful if you want to work on Spark data in Excel and write your changes back, or if you have a whole spreadsheet you want to import into Spark. This example uses the Customers table; however, the same process works for any table that the CData Excel Add-In can retrieve.

Establish a Connection

If you have not already done so, create a new Spark connection by clicking From Spark on the ribbon.

Set the Server, Database, User, and Password connection properties to connect to Spark SQL.
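
As a rough illustration of how these four properties fit together, the sketch below joins them into a semicolon-delimited string. The helper name build_connection_string, the "Key=Value;" layout, and all sample values are assumptions made for this example; in the add-in itself you enter the properties in the connection dialog.

```python
def build_connection_string(server, database, user, password):
    """Join the four connection properties into a 'Key=Value;' string.

    The string layout is an assumption for illustration only; the add-in
    collects these values through its connection dialog.
    """
    props = {
        "Server": server,
        "Database": database,
        "User": user,
        "Password": password,
    }
    return ";".join(f"{key}={value}" for key, value in props.items()) + ";"


# Placeholder values, not real credentials.
conn_str = build_connection_string("127.0.0.1", "default", "admin", "secret")
print(conn_str)
```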

Retrieve Data from Spark

To insert data into Spark, first retrieve data from the Spark table you want to add to. This links the Excel spreadsheet to the selected Spark table. After you retrieve data, any changes you make to it are highlighted in red.

  1. Click the From Spark button on the CData ribbon. The Data Selection wizard is displayed.
  2. In the Table or View menu, select the Customers table.
  3. In the Maximum Rows menu, select the number of rows you want to retrieve. If you only want to insert rows, retrieving a single row is enough. The Query box then displays the SQL query that corresponds to your request.
  4. In the Sheet Name box, enter the name for the sheet that will be populated. By default, the add-in creates a new sheet with the name of the table.
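
The relationship between the wizard selections and the generated query can be sketched as follows. This is a minimal example, assuming the query is a plain SELECT with an optional LIMIT clause; the helper build_select_query is hypothetical, and the exact text shown in the Query box may differ.

```python
def build_select_query(table, max_rows=None):
    """Build a SELECT statement for a table with an optional row limit.

    Hypothetical helper: it only mimics the kind of query the wizard shows.
    """
    # Accept simple identifiers only, so the name can be embedded safely.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unsupported table name: {table!r}")
    query = f"SELECT * FROM {table}"
    if max_rows is not None:
        if max_rows < 1:
            raise ValueError("max_rows must be at least 1")
        query += f" LIMIT {max_rows}"
    return query


# Retrieving a single row is enough when the goal is to insert new rows.
query = build_select_query("Customers", max_rows=1)
print(query)
```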

Insert Rows into Spark

After retrieving data, you can add data from an existing spreadsheet in Excel.

  1. In a cell after the last row, enter a formula referencing the corresponding cell from the other spreadsheet; for example, =MyCustomersSheetInExcel!A1.
  2. After using a formula to reference the cells you want to add to Spark, select the cells that you are inserting data into and drag the formula down as far as needed. The referenced values you want to add will be displayed on the Customers sheet.
  3. Highlight the rows you want to insert and click the Update Rows button.
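
Dragging a formula shifts its relative reference by one row for each row dragged down and by one column for each column dragged across. The sketch below reproduces that fill pattern so you can see which source cells end up referenced. The function names are made up for this example, and the sheet name is the one used in step 1.

```python
def column_letter(index):
    """Convert a 1-based column index to Excel letters (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def fill_formulas(sheet, n_rows, n_cols, first_row=1, first_col=1):
    """Return the grid of formulas produced by filling a relative reference."""
    return [
        [f"={sheet}!{column_letter(first_col + c)}{first_row + r}"
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Three rows and two columns, starting from cell A1 of the source sheet.
formulas = fill_formulas("MyCustomersSheetInExcel", n_rows=3, n_cols=2)
for row in formulas:
    print(row)
```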

As each row is inserted, the Id value will appear in the Id column and the row's text will change to black, indicating that the record has been inserted.
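
The write-back pattern described here, where each pending row receives its Id once the insert succeeds, can be sketched with Python's built-in sqlite3 module. Note that SQLite is only a stand-in for Spark in this example, and the Name and City columns and the sample values are hypothetical; the add-in performs the real inserts when you click Update Rows.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the Spark table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    "Id INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT, City TEXT)"
)

# Rows waiting to be inserted; Id is empty until the insert succeeds.
pending_rows = [
    {"Id": None, "Name": "Ana", "City": "Lisbon"},
    {"Id": None, "Name": "Ben", "City": "Oslo"},
]

for row in pending_rows:
    cursor = conn.execute(
        "INSERT INTO Customers (Name, City) VALUES (?, ?)",
        (row["Name"], row["City"]),
    )
    # Copy the generated key back, as the sheet does for its Id column.
    row["Id"] = cursor.lastrowid
conn.commit()

inserted_count = conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]
print(pending_rows, inserted_count)
```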