Apache Spark Python Connector

Name: CData Python Connector for Apache Spark
Price: 599 USD
Rating: 5 (142 reviews)
Author: CData Software

Read, Write, and Update Spark with Python

Easily connect Python-based Data Access, Visualization, ORM, ETL, AI/ML, and Custom Apps with Apache Spark SQL!

Python Connector Libraries for Apache Spark Data Connectivity. Integrate Apache Spark with popular Python tools like Pandas, SQLAlchemy, Dash & petl. Easy-to-use Python Database API (DB-API) Modules connect Spark data with Python and any Python-based applications.

Features

- Maps SQL to Spark SQL, enabling direct standard SQL-92 access to Apache Spark
- Fully compatible with the Databricks Enterprise Platform
- Connects to live Apache Spark SQL data for real-time data access
- Full support for data aggregation and complex JOINs in SQL queries
- Secure connectivity through modern cryptography, including TLS 1.2, SHA-256, and ECC
- Seamless integration with leading BI, reporting, and ETL tools and with custom applications

Specifications

- Python Database API (DB-API) modules for Spark.
- Write SQL, get Apache Spark SQL data. Access Spark through standard Python database connectivity.
- Integration with popular Python tools like Pandas, SQLAlchemy, Dash, and petl.
- Maps SQL to Spark SQL, enabling direct standard SQL-92 access to Apache Spark.
- Full Unicode support for data, parameters, and metadata.

CData Python Connectors in Action!

Watch the video overview for a first-hand look at the powerful data integration capabilities included in the CData Python Connectors.

Python Connectivity with Apache Spark SQL

Full-featured and consistent SQL access to any supported data source through Python

Universal Python Spark Connectivity

Easily connect to Spark data from common Python-based frameworks, including:
- Data Analysis/Visualization: Jupyter Notebook, pandas, Matplotlib
- ORM: SQLAlchemy, SQLObject, Storm
- Web Applications: Dash, Django
- ETL: Apache Airflow, Luigi, Bonobo, Bubbles, petl

Popular Tooling Integration

The Spark Connector integrates seamlessly with popular data science and developer tooling like Anaconda, Visual Studio Python IDE, PyCharm, and more.

Replication and Caching

Our replication and caching commands make it easy to copy data to local and cloud data stores such as Oracle, SQL Server, Google Cloud SQL, etc. The replication commands include many features that allow for intelligent incremental updates to cached data.

String, Date, Numeric SQL Functions

The Spark Connector includes a library of more than 50 functions that can manipulate column values into the desired result. Popular examples include Regex, JSON, and XML processing functions.

Collaborative Query Processing

Our Python Connector enhances the capabilities of Spark with additional client-side processing, when needed, to enable analytic summaries of data such as SUM, AVG, MAX, MIN, etc.
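
As a rough illustration of what client-side processing means here, the sketch below computes summary values in Python over rows that have already been fetched. It is not the connector's implementation; the function name summarize and the sample rows are hypothetical.

```python
from statistics import mean

def summarize(rows, column):
    """Compute SUM, AVG, MAX, and MIN over one column of fetched rows.

    `rows` is a list of tuples shaped like the result of a DB-API
    cursor.fetchall(); `column` is the zero-based column position.
    NULL values (None) are skipped, as SQL aggregates do.
    """
    values = [row[column] for row in rows if row[column] is not None]
    if not values:
        return {"SUM": None, "AVG": None, "MAX": None, "MIN": None}
    return {
        "SUM": sum(values),
        "AVG": mean(values),
        "MAX": max(values),
        "MIN": min(values),
    }

# Hypothetical rows, shaped like a cursor result set
rows = [("Raleigh", 10), ("Durham", 30), ("Cary", None), ("Apex", 20)]
summary = summarize(rows, column=1)
print(summary)
```

When the server can evaluate the aggregate itself, pushing it down in the SQL query is normally cheaper than fetching every row; a client-side pass like this only makes sense as a fallback.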

Easily Customizable and Configurable

The data model exposed by our Spark Connector can easily be customized to add or remove tables/columns, change data types, etc. without requiring a new build. These customizations are supported at runtime using human-readable schema files that are easy to edit.

Enterprise-class Secure Connectivity

Includes standard enterprise-class security features such as TLS/SSL data encryption for all client-server communications.

Connecting to Spark with Python

CData Python Connectors leverage the Database API (DB-API) interface to make it easy to work with Spark from a wide range of standard Python data tools. Connecting to and working with your data in Python follows a basic pattern, regardless of data source:

1. Configure the connection properties to Spark.
2. Query Spark to retrieve or update data.
3. Connect your Spark data with Python data tools.
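
The three steps above can be sketched with any DB-API 2.0 module. The example below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in, so it runs without a Spark cluster or the connector installed. The table name ApacheSpark matches the snippets on this page, while the City and Balance columns and the rows are made up for illustration.

```python
import sqlite3
from contextlib import closing

# Step 1: configure the connection (stand-in: an in-memory SQLite database)
with closing(sqlite3.connect(":memory:")) as conn:
    with closing(conn.cursor()) as cur:
        # Illustrative data only; a real Spark table would already exist
        cur.execute("CREATE TABLE ApacheSpark (City TEXT, Balance REAL)")
        cur.executemany(
            "INSERT INTO ApacheSpark (City, Balance) VALUES (?, ?)",
            [("Raleigh", 120.0), ("Durham", 80.5)],
        )
        # Step 2: query to retrieve data
        cur.execute("SELECT City, Balance FROM ApacheSpark ORDER BY City")
        rows = cur.fetchall()

# Step 3: hand the fetched rows to other Python tools
for city, balance in rows:
    print(city, balance)
```

The `?` placeholders follow sqlite3's parameter style; other DB-API modules may use a different style, so check the module's `paramstyle` attribute before reusing the statements.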

Connecting to Spark in Python

To connect to your data from Python, import the extension and create a connection:

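import cdata.sparksql as mod

# The user and password values are placeholders; the address in the
# original snippet was obscured.
conn = mod.connect("User=user@example.com; Password=password;")

# Create a cursor, run the query, and iterate over the results
cur = conn.cursor()
cur.execute("SELECT * FROM ApacheSpark")

rs = cur.fetchall()

for row in rs:
    print(row)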

Once you import the extension, you can work with all of your enterprise data using the Python modules and toolkits that you already know and love, quickly building apps that help you drive business.
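
The connection string in the snippet above is a semicolon-separated list of Name=Value pairs. A small helper can assemble one from a dictionary, which keeps credentials out of string literals. This is only a sketch: build_connection_string is a hypothetical helper, not part of the connector, and the SPARK_USER and SPARK_PASSWORD environment variable names are assumptions.

```python
import os

def build_connection_string(properties):
    """Join a mapping of property names to values as 'Name=Value;' pairs."""
    return " ".join(f"{name}={value};" for name, value in properties.items())

# Hypothetical environment variable names, with placeholder fallbacks
properties = {
    "User": os.environ.get("SPARK_USER", "user"),
    "Password": os.environ.get("SPARK_PASSWORD", "password"),
}
conn_str = build_connection_string(properties)
# conn_str can then be passed to the connector's connect() call
```

The helper does no escaping, so values that contain semicolons would need extra handling.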

Visualize Spark Data with pandas

The data-centric interfaces of the Spark Python Connector make it easy to integrate with popular tools like pandas and SQLAlchemy to visualize data in real time.

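from sqlalchemy import create_engine
import pandas
import matplotlib.pyplot as plt

# SQLAlchemy URL for the connector's dialect; credentials are placeholders
engine = create_engine("sparksql:///?Password=password&User=user")

df = pandas.read_sql("SELECT * FROM ApacheSpark", engine)

df.plot()
plt.show()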
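
Before plotting, it often helps to shape the result set with pandas. The sketch below assumes pandas is installed and uses an in-memory SQLite connection as a stand-in for the Spark engine so that it runs anywhere. The City and Balance columns and their values are illustrative, not part of any real Spark schema.

```python
import sqlite3
import pandas

# Stand-in data source: an in-memory SQLite database with made-up rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ApacheSpark (City TEXT, Balance REAL)")
conn.executemany(
    "INSERT INTO ApacheSpark VALUES (?, ?)",
    [("Raleigh", 100.0), ("Durham", 50.0), ("Raleigh", 25.0)],
)
conn.commit()

# Load the query result into a DataFrame, then total Balance per City
df = pandas.read_sql("SELECT City, Balance FROM ApacheSpark", conn)
totals = df.groupby("City")["Balance"].sum().sort_values(ascending=False)
print(totals)
conn.close()
```

A Series like totals can then be drawn with `totals.plot(kind="bar")` followed by `plt.show()`, in the same way as the DataFrame above.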