Connect to Spark Data as Federated Tables in MySQL

Use the CData Cloud Hub to set up federated tables for Spark data in MySQL.

The CData Cloud Hub provides a MySQL interface for Spark, which lets you set up federated tables in MySQL for Spark data. After you configure a virtual MySQL database for Spark, you can create a server and tables in MySQL using the FEDERATED storage engine. You can then work with Spark data just as you would with local MySQL tables.

The CData Cloud Hub provides a pure MySQL, cloud-to-cloud interface for Spark, so you can query live Spark data alongside existing MySQL data without replicating it. With optimized data processing built in, the Cloud Hub pushes all supported SQL operations (filters, JOINs, etc.) directly to Spark, relying on server-side processing to return Spark data quickly.

Create a Virtual MySQL Database for Spark Data

You can use any MySQL client to connect to the CData Cloud Hub and create virtual databases.

  1. Connect to the CData Cloud Hub:
    mysql --host --user admin --password
  2. Once authenticated, create the virtual database for Spark:
    mysql> CREATE DATABASE sparkdb
        -> DRIVER = "SparkSQL",
        -> DBURL = "Server=;";
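If you create virtual databases from a script rather than an interactive session, the statement in step 2 can be assembled from a dictionary of connection properties. The Python sketch below assumes the DBURL value is a series of Key=Value pairs, each terminated by a semicolon, as the Server= entry above suggests. The helper names and the host spark.example.com are hypothetical, and Server is the only property taken from this article. The sketch only produces the statement text; you still run it from your MySQL client.

```python
def build_dburl(props: dict[str, str]) -> str:
    """Join connection properties into 'Key=Value;' pairs (format assumed from Server=...;)."""
    parts = []
    for key, value in props.items():
        # ';' separates properties and '"' would close the SQL string literal,
        # so refuse both rather than guess at an escaping rule.
        if any(ch in key + value for ch in ';"'):
            raise ValueError(f"unsupported character in property {key!r}")
        parts.append(f"{key}={value};")
    return "".join(parts)


def create_database_sql(name: str, driver: str, props: dict[str, str]) -> str:
    """Assemble the Cloud Hub CREATE DATABASE statement (hypothetical helper)."""
    return (
        f"CREATE DATABASE {name}\n"
        f'DRIVER = "{driver}",\n'
        f'DBURL = "{build_dburl(props)}";'
    )


if __name__ == "__main__":
    # spark.example.com is a placeholder; use your own Spark server address.
    print(create_database_sql("sparkdb", "SparkSQL", {"Server": "spark.example.com"}))
```

Rejecting separator characters keeps the sketch honest: it never emits a DBURL whose meaning depends on an escaping convention this article does not document.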

With the virtual database created, you are ready to connect to Spark data from any MySQL client.

Create a FEDERATED Server and Tables for Spark Data

After you have configured and started the service, create a FEDERATED server to simplify the process of creating FEDERATED tables:

Create a FEDERATED Server

The following statement creates a FEDERATED server based on the Cloud Hub. The server name (fedSpark here) is your choice; the username and password must match a user account you defined on the Cloud Hub.

CREATE SERVER fedSpark FOREIGN DATA WRAPPER mysql
OPTIONS (USER 'cloud_hub_user', PASSWORD 'cloud_hub_passwd', HOST '', PORT 3306, DATABASE 'sparkdb');
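Creating a server pays off in the CONNECTION string that each FEDERATED table needs. Without a server, MySQL expects the full form scheme://user:password@host:port/database/table in every table definition; with one, the server name and the table name are enough. The Python sketch below builds both strings for comparison. The function names and the host hub.example.com are hypothetical, and the helper refuses URL delimiter characters instead of guessing at an escaping rule.

```python
def connection_url(user: str, password: str, host: str, port: int,
                   database: str, table: str) -> str:
    """Full CONNECTION form that a table needs when no server is defined."""
    for part in (user, password, host, database, table):
        # These characters delimit the URL (or end the SQL literal), so refuse them.
        if any(ch in part for ch in "@:/'"):
            raise ValueError("delimiter character in a connection component")
    return f"mysql://{user}:{password}@{host}:{port}/{database}/{table}"


def connection_via_server(server: str, table: str) -> str:
    """Short form: the server definition supplies credentials, host, port, and database."""
    return f"{server}/{table}"


if __name__ == "__main__":
    # hub.example.com is a placeholder for your Cloud Hub host.
    print(connection_url("cloud_hub_user", "cloud_hub_passwd",
                         "hub.example.com", 3306, "sparkdb", "Customers"))
    print(connection_via_server("fedSpark", "Customers"))
```

Either string goes in a table's CONNECTION='...' clause; the server form keeps the credentials in one place instead of repeating them in every table definition.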

Create a FEDERATED Table

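To create a FEDERATED table using the new server, use the CONNECTION keyword and pass the name of the FEDERATED server (fedSpark in this example) and the remote table (Customers), separated by a slash. Refer to the following template for the statement to create a FEDERATED table, replacing TYPE(LEN) with each column's actual type:

CREATE TABLE fed_customers (
  city  TYPE(LEN),
  balance  TYPE(LEN)
  -- ...add the remaining Customers columns here
)
ENGINE=FEDERATED CONNECTION='fedSpark/Customers';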

NOTE: The table schema for the FEDERATED table must match the remote table schema exactly. You can always connect directly to the Cloud Hub using any MySQL client and run SHOW COLUMNS FROM Customers to get the table schema.
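Because the FEDERATED definition must mirror the remote schema, you may prefer to generate it from the SHOW COLUMNS output rather than typing it by hand. The Python sketch below assumes you have already fetched the rows as (Field, Type, Null, Key, Default, Extra) tuples and that the Cloud Hub returns them in that order, as MySQL does. Only the name, type, and nullability are carried over; keys, defaults, and extras are not, so review the result before running it. The helper names, the server name fedSpark, and the sample column types are illustrative assumptions.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def federated_ddl(local_table: str, server: str, remote_table: str,
                  columns: list[tuple]) -> str:
    """Build a FEDERATED CREATE TABLE from SHOW COLUMNS rows.

    Each row is (Field, Type, Null, Key, Default, Extra); only Field, Type,
    and Null are used, so keys, defaults, and extras are not reproduced.
    """
    if not columns:
        raise ValueError("the remote table has no columns")
    lines = []
    for field, col_type, nullable, *_rest in columns:
        definition = f"  {quote_ident(field)} {col_type}"
        if nullable == "NO":
            definition += " NOT NULL"
        lines.append(definition)
    body = ",\n".join(lines)
    return (
        f"CREATE TABLE {quote_ident(local_table)} (\n{body}\n)\n"
        "ENGINE=FEDERATED\n"
        f"CONNECTION='{server}/{remote_table}';"
    )


if __name__ == "__main__":
    # Hypothetical SHOW COLUMNS FROM Customers rows; query the Cloud Hub for the real ones.
    rows = [
        ("city", "varchar(255)", "YES", "", None, ""),
        ("balance", "double", "YES", "", None, ""),
    ]
    print(federated_ddl("fed_customers", "fedSpark", "Customers", rows))
```

Generating the column list from the live schema avoids the most common failure with FEDERATED tables: a hand-typed definition that drifts from the remote table.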

Execute Queries

You can now query the Spark FEDERATED tables from any tool that can connect to MySQL. This is particularly useful when you need to JOIN data from a local table with data from Spark, as in the following example:

SELECT fed_customers.city, fed_customers.balance, local_table.*
FROM local_table JOIN fed_customers ON local_table.foreign_city = fed_customers.city;
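A join of this kind can also be run from application code. The Python sketch below sends the query through any DB-API 2.0 connection to the local MySQL server that holds both local_table and fed_customers. The function name is hypothetical, and the selected columns (fed_customers.city and fed_customers.balance) are only for illustration.

```python
# Join a local table with the FEDERATED (Spark-backed) table.
JOIN_SQL = """
SELECT fed_customers.city, fed_customers.balance
FROM local_table
JOIN fed_customers ON local_table.foreign_city = fed_customers.city
"""


def fetch_joined_rows(conn):
    """Run the local/Spark join over any DB-API 2.0 connection and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(JOIN_SQL)
        return cur.fetchall()
    finally:
        # Release the cursor even if the query fails.
        cur.close()
```

With PyMySQL, for example, you would pass the object returned by pymysql.connect(host=..., user=..., password=..., database=...) for your local MySQL server. Taking the connection as a parameter keeps the sketch independent of a particular driver.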