Query Hugging Face Data in DataGrip

Jerod Johnson
Director, Technology Evangelism
Create a Data Source for Hugging Face in DataGrip and use SQL to query live Hugging Face data.

DataGrip is a database IDE that allows SQL developers to query, create, and manage databases. When paired with the CData API Driver for JDBC, DataGrip can work with live Hugging Face data. This article shows how to establish a connection to Hugging Face data in DataGrip.

Create a New Driver Definition for Hugging Face

The steps below describe how to create a new driver definition in DataGrip for Hugging Face.

  1. In DataGrip, click File -> New -> Project and name the project.
  2. In the Database Explorer, click the plus icon (+) and select Driver.
  3. In the Driver tab:
    • Set Name to a user-friendly name (e.g. "CData Hugging Face Driver")
    • Set Driver Files to the appropriate JAR file. To add the file, click the plus (+), select "Add Files," navigate to the "lib" folder in the driver's installation directory and select the JAR file (e.g. cdata.jdbc.api.jar).
    • Set Class to cdata.jdbc.api.APIDriver
    Additionally, on the Advanced tab you can change driver properties and other settings such as VM Options, VM environment, VM home path, and DBMS.
    • In most cases, change the DBMS type to "Unknown" in Expert options so that DataGrip does not send native SQL Server (Transact-SQL) queries, which can result in an invalid function error.
  4. Click "Apply" then "OK" to save the driver definition.
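If DataGrip reports that the driver class cannot be found, it can help to confirm the class name outside the IDE first. The following is a minimal sketch, not part of the CData tooling: the helper name isOnClasspath is hypothetical, and the default class name cdata.jdbc.api.APIDriver is an assumption that should be checked against the driver's own documentation.

```java
public class DriverClassCheck {

    // Returns true if the named class can be located on the current classpath.
    // The class is looked up without being initialized, so no driver code runs.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, DriverClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed default class name; pass a different one as the first argument.
        String driverClass = args.length > 0 ? args[0] : "cdata.jdbc.api.APIDriver";
        System.out.println(driverClass + " found: " + isOnClasspath(driverClass));
    }
}
```

Run it with the driver JAR on the classpath (for example, java -cp cdata.jdbc.api.jar;. DriverClassCheck on Windows) so the lookup sees the same file that was added under Driver Files.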

Configure a Connection to Hugging Face

  1. Once the driver definition is saved, click the plus (+), then "Data Source," then "CData Hugging Face Driver" to create a new Hugging Face Data Source.
  2. In the new window, configure the connection to Hugging Face with a JDBC URL.

    Built-in Connection String Designer

    For assistance in constructing the JDBC URL, use the connection string designer built into the Hugging Face JDBC Driver. Either double-click the JAR file or run it from the command line.

          java -jar cdata.jdbc.api.jar
        

    Fill in the connection properties and copy the connection string to the clipboard.

    HuggingFace Hub uses token-based authentication to enable access to its API. The API provides access to machine learning models, datasets, spaces, papers, and other resources on the HuggingFace Hub platform.

    Using API Key Authentication

    To authenticate to HuggingFace Hub, you will need to provide an API Key (Access Token). To obtain your access token:

    1. Log in to your HuggingFace account at https://huggingface.co
    2. Navigate to Settings > Access Tokens
    3. Click "New token" to create a new access token
    4. Select the appropriate permissions (read or write)
    5. Copy the token value

    After obtaining your access token, set the following connection properties:

    • AuthScheme: Set this to APIKey.
    • APIKey: Set this to your HuggingFace access token.

    Example connection string

    Profile=C:\profiles\HuggingFace.apip;ProfileSettings='APIKey=hf_xxxxxxxxxxxxxxxxxxxx';
    
  3. Set URL to the connection string, e.g.,
    jdbc:api:Profile=C:\profiles\HuggingFace.apip;ProfileSettings='APIKey=hf_xxxxxxxxxxxxxxxxxxxx';
  4. Click "Apply" and "OK" to save the connection string
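The URL format used in the steps above can also be assembled in code, which keeps the access token out of hand-edited strings. This is a minimal sketch: the class and method names are hypothetical, reading the token from an HF_TOKEN environment variable is an assumption made for illustration, and the profile path is the example path from this article.

```java
public class HuggingFaceJdbcUrl {

    // Builds a JDBC URL in the same shape as the example connection string:
    // jdbc:api:Profile=<path>;ProfileSettings='APIKey=<token>';
    static String buildUrl(String profilePath, String apiKey) {
        return "jdbc:api:Profile=" + profilePath
                + ";ProfileSettings='APIKey=" + apiKey + "';";
    }

    public static void main(String[] args) {
        // Assumption: the token is supplied through an environment variable.
        // The fallback is the placeholder value used in this article.
        String apiKey = System.getenv().getOrDefault("HF_TOKEN", "hf_xxxxxxxxxxxxxxxxxxxx");
        System.out.println(buildUrl("C:\\profiles\\HuggingFace.apip", apiKey));
    }
}
```

The printed string can be pasted into the URL field in DataGrip.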

At this point, you will see the data source in the Database Explorer.

Execute SQL Queries Against Hugging Face

To browse through the Hugging Face entities (available as tables) accessible through the JDBC Driver, expand the Data Source.

To execute queries, right-click any table and select "New" -> "Query Console."

In the Console, write the SQL query you wish to execute. For example:

SELECT <column1>, <column2> FROM Collections WHERE <column> = '<value>'
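The same kind of query can be issued from Java through the standard JDBC API once the driver JAR is on the classpath. The sketch below rests on several assumptions: the class and method names are hypothetical, column names are passed in as arguments because this article does not list them, and it assumes the driver accepts a parameterized PreparedStatement. Only the Collections table name comes from the example above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.List;

public class CollectionsQuery {

    // Builds "SELECT a, b FROM table WHERE c = ?" from caller-supplied names.
    // Names are concatenated as-is, so they must come from a trusted source.
    static String buildSelect(String table, List<String> columns, String filterColumn) {
        return "SELECT " + String.join(", ", columns)
                + " FROM " + table + " WHERE " + filterColumn + " = ?";
    }

    // Runs the query and prints each row as tab-separated values.
    static void run(String jdbcUrl, String sql, String filterValue) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, filterValue);
            try (ResultSet rs = stmt.executeQuery()) {
                int columnCount = rs.getMetaData().getColumnCount();
                while (rs.next()) {
                    StringBuilder row = new StringBuilder();
                    for (int i = 1; i <= columnCount; i++) {
                        if (i > 1) row.append('\t');
                        row.append(rs.getString(i));
                    }
                    System.out.println(row);
                }
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 4) {
            System.err.println("usage: CollectionsQuery <jdbcUrl> <filterColumn> <filterValue> <column>...");
            return;
        }
        List<String> columns = Arrays.asList(args).subList(3, args.length);
        run(args[0], buildSelect("Collections", columns, args[1]), args[2]);
    }
}
```

Binding the filter value through a parameter rather than concatenating it into the SQL text avoids quoting problems when the value contains an apostrophe.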

Download a free, 30-day trial of the CData API Driver for JDBC and start working with your live Hugging Face data in DataGrip. Reach out to our Support Team if you have any questions.
