Databricks: Option to Disable SELECT version() Execution
The CData Driver for Databricks now includes an option to disable the SELECT version() query that the driver runs when it establishes a connection. Skipping this query can improve performance in high-throughput environments by removing a round trip from every new connection.
Background
By default, the CData Databricks driver executes a SELECT version() query each time it opens a connection, to verify that the SQL Warehouse is active and responsive. This check improves connection reliability, but the extra query adds latency to every new connection, which can affect performance in environments with:
- High-frequency connection establishment
- Multiple concurrent threads
- Performance-sensitive applications where connection speed is critical
Connection Performance Benefits
Disabling the SQL Warehouse availability check provides the following advantages:
- Reduced Connection Time: Eliminates the overhead of version query execution
- Improved Throughput: Faster connection establishment in multi-threaded applications
- Lower Resource Usage: Reduces query load on the Databricks SQL Warehouse
- Enhanced Scalability: Better performance when establishing many concurrent connections
Configuration
Set the new CheckSQLWarehouseAvailability property to False to disable the SELECT version() query. The property is defined as follows:
- Property: CheckSQLWarehouseAvailability
- Type: Boolean
- Default: True
- Location: Advanced Properties
- Description: Specifies whether to check if the Databricks SQL Warehouse is available during connection
When set to False, the driver does not execute the SELECT version() query, so connections open faster; however, the SQL Warehouse's availability is no longer verified when the connection is established.
Connection String Configuration
Add the CheckSQLWarehouseAvailability property to your connection string:
jdbc:databricks://[ServerHostname]:443/[HTTPPath];Token=[PersonalAccessToken];CheckSQLWarehouseAvailability=False;
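If an application assembles its connection string from several sources, the string may already contain a CheckSQLWarehouseAvailability entry. The following Python sketch sets the property without duplicating it. set_connection_property is a hypothetical helper, not part of the CData driver; it assumes the simple Key=Value; layout shown above, with no quoted values containing semicolons, and it treats property names as case-insensitive (an assumption).

```python
def set_connection_property(conn_str: str, name: str, value) -> str:
    """Return conn_str with name=value set, replacing any existing entry for name.

    Assumptions (hypothetical helper, not part of the CData driver):
    - entries are separated by ';' and no value contains a quoted ';'
    - property names are compared case-insensitively
    """
    # Render Python booleans as the True/False literals used in connection strings.
    if isinstance(value, bool):
        value = "True" if value else "False"
    # Split into segments, dropping the empty piece left by a trailing ';'.
    segments = [s for s in conn_str.split(";") if s.strip()]
    entry = f"{name}={value}"
    replaced = False
    for i, seg in enumerate(segments):
        key, sep, _ = seg.partition("=")
        # The URL prefix has no '=', so only Key=Value segments are compared.
        if sep and key.strip().lower() == name.lower():
            segments[i] = entry  # overwrite an existing setting
            replaced = True
    if not replaced:
        segments.append(entry)
    return ";".join(segments) + ";"


base = "jdbc:databricks://[ServerHostname]:443/[HTTPPath];Token=[PersonalAccessToken];"
print(set_connection_property(base, "CheckSQLWarehouseAvailability", False))
# prints: jdbc:databricks://[ServerHostname]:443/[HTTPPath];Token=[PersonalAccessToken];CheckSQLWarehouseAvailability=False;
```

Replacing an existing entry, rather than appending a second one, avoids relying on how the driver resolves a property that appears twice.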
DSN and Other Wizard Configuration
For ODBC connections and other platforms with a Connection Wizard, set the property in the configuration UI:
- (ODBC) Open the ODBC Data Source Administrator
- Select your Databricks data source and click Configure (or your tool's equivalent Edit option)
- Navigate to the Advanced section
- Set CheckSQLWarehouseAvailability to False
- Click OK (or Save) to save the configuration
Implementation Examples
JDBC Connection Example
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
public class DatabricksConnection {
    public static void main(String[] args) {
        // CheckSQLWarehouseAvailability=False skips the SELECT version() check on connect
        String url = "jdbc:databricks://your-server:443/your-http-path;" +
                "Token=your-personal-access-token;" +
                "CheckSQLWarehouseAvailability=False;";
        // try-with-resources closes the connection automatically
        try (Connection conn = DriverManager.getConnection(url)) {
            // Connection established without the SELECT version() check
            System.out.println("Connected successfully with optimized performance");
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
Python Connection Example
import jaydebeapi
# Connection parameters; CheckSQLWarehouseAvailability=False skips SELECT version()
connection_url = "jdbc:databricks://your-server:443/your-http-path"
connection_properties = {
    "Token": "your-personal-access-token",
    "CheckSQLWarehouseAvailability": "False"
}
# Establish the connection: driver class, JDBC URL, driver properties, path to the driver JAR
conn = jaydebeapi.connect(
    "cdata.jdbc.databricks.DatabricksDriver",
    connection_url,
    connection_properties,
    "path/to/cdata.jdbc.databricks.jar"
)
try:
    print("Connection established with enhanced performance")
finally:
    # Always release the connection, even if the work above raises
    conn.close()
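Whether disabling the check helps noticeably depends on your network and workload, so it is worth measuring. The Python sketch below times how long a caller-supplied connect function takes to return, averaged over several attempts, so you can run it once with CheckSQLWarehouseAvailability set to True and once with False. mean_connect_seconds and its warmup parameter are hypothetical; the sketch assumes only that the returned connection has a close() method, as jaydebeapi connections do, and it times connection opening only.

```python
import statistics
import time
from typing import Callable, Protocol


class Closeable(Protocol):
    """Anything with a close() method, such as a jaydebeapi connection."""
    def close(self) -> None: ...


def mean_connect_seconds(connect: Callable[[], Closeable], attempts: int = 10, warmup: int = 1) -> float:
    """Return the mean wall-clock time, in seconds, that connect() takes to return.

    Hypothetical helper. Warm-up connections are opened and closed first but not
    timed, because the first jaydebeapi.connect() call in a process also starts the JVM.
    Only opening is timed; close() runs outside the measured interval.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for _ in range(warmup):
        connect().close()
    durations = []
    for _ in range(attempts):
        start = time.perf_counter()
        conn = connect()
        durations.append(time.perf_counter() - start)
        conn.close()
    return statistics.mean(durations)


# Hypothetical usage with the jaydebeapi example above (requires the driver JAR and a workspace):
#   from functools import partial
#   import jaydebeapi
#   for flag in ("True", "False"):
#       props = {"Token": "your-personal-access-token", "CheckSQLWarehouseAvailability": flag}
#       connect = partial(jaydebeapi.connect, "cdata.jdbc.databricks.DatabricksDriver",
#                         "jdbc:databricks://your-server:443/your-http-path", props,
#                         "path/to/cdata.jdbc.databricks.jar")
#       print(flag, mean_connect_seconds(connect))
```

Connections are opened sequentially so that each measurement reflects a single connection; a multi-threaded benchmark would also capture contention effects.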
Use Cases
This optimization is particularly beneficial for:
- ETL Processes: High-volume data processing applications
- Microservices: Applications that establish frequent short-lived connections
- Connection Pooling: Environments where connection pool initialization speed is critical
- Multi-tenant Applications: Systems serving multiple concurrent users
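For the connection pooling case, the Python sketch below illustrates where the per-connection cost is paid: a fixed-size pool opens all of its connections up front, so any setup query runs once per pooled connection during initialization. SimplePool is hypothetical and is not a CData or jaydebeapi API; its factory would be a zero-argument function that calls jaydebeapi.connect with the arguments shown in the Python example, with CheckSQLWarehouseAvailability set to False. A production application would normally use an established pooling library that also validates and replaces broken connections.

```python
import queue
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class SimplePool(Generic[T]):
    """Fixed-size connection pool sketch (hypothetical, for illustration only)."""

    def __init__(self, factory: Callable[[], T], size: int) -> None:
        if size < 1:
            raise ValueError("size must be at least 1")
        # queue.Queue is thread-safe, so acquire/release may be called from many threads.
        self._idle: "queue.Queue[T]" = queue.Queue()
        # Pool initialization: every factory() call opens one connection, so any
        # per-connection setup cost (such as a version check) is paid `size` times here.
        for _ in range(size):
            self._idle.put(factory())

    def acquire(self, timeout: Optional[float] = None) -> T:
        """Borrow a connection; blocks until one is free, raising queue.Empty on timeout."""
        return self._idle.get(timeout=timeout)

    def release(self, conn: T) -> None:
        """Return a borrowed connection to the pool."""
        self._idle.put(conn)

    def close_all(self) -> None:
        """Close every idle connection (connections still borrowed are not touched)."""
        while True:
            try:
                conn = self._idle.get_nowait()
            except queue.Empty:
                break
            conn.close()
```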
Try It Yourself
CData enables seamless connectivity to Databricks from reporting tools, databases, and custom applications through best-in-class standards-based drivers. Easily integrate Databricks data with BI, Reporting, Analytics, ETL tools, and custom solutions. Download the free 30-day trial for the CData Databricks Driver & get started today!