Replicate Hugging Face Data to Multiple Databases via the CData Sync CLI

Jerod Johnson
Director, Technology Evangelism
Replicate Hugging Face data to disparate databases with a single configuration.

Always-on applications rely on automatic failover capabilities and real-time access to data. CData Sync for Hugging Face integrates live Hugging Face data into your mirrored databases, always-on cloud databases, and other databases such as your reporting server. It synchronizes automatically with remote Hugging Face data from Windows or any machine running Java.

Sync's command-line interface (CLI) lets you control almost all aspects of replication. With the CLI, you can replicate Hugging Face data to one or many databases without changing your configuration.

Connect to Hugging Face Data

You can save connection strings and other settings like email notifications in XML configuration files.

The following examples show how to replicate to SQLite, first on Windows and then on Java.

Windows

<?xml version="1.0" encoding="UTF-8" ?>
<CDataSync>
  <DatabaseType>SQLite</DatabaseType>
  <DatabaseProvider>System.Data.SQLite</DatabaseProvider>
  <ConnectionString>Profile=C:\profiles\HuggingFace.apip;ProfileSettings='APIKey=hf_xxxxxxxxxxxxxxxxxxxx';</ConnectionString>
  <ReplicateAll>False</ReplicateAll>
  <NotificationUserName></NotificationUserName>
  <DatabaseConnectionString>Data Source=C:\my.db</DatabaseConnectionString>
  <TaskSchedulerStartTime>09:51</TaskSchedulerStartTime>
  <TaskSchedulerInterval>Never</TaskSchedulerInterval>
</CDataSync>

Java

<?xml version="1.0" encoding="UTF-8" ?>
<CDataSync>
  <DatabaseType>SQLite</DatabaseType>
  <DatabaseProvider>org.sqlite.JDBC</DatabaseProvider>
  <ConnectionString>Profile=C:\profiles\HuggingFace.apip;ProfileSettings='APIKey=hf_xxxxxxxxxxxxxxxxxxxx';</ConnectionString>
  <ReplicateAll>False</ReplicateAll>
  <NotificationUserName></NotificationUserName>
  <DatabaseConnectionString>Data Source=C:\my.db</DatabaseConnectionString>
</CDataSync>
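
Because each destination database gets its own configuration file, it can be convenient to generate those files from one template. The following is a minimal Python sketch of that idea, not part of CData Sync itself. It reuses only the element names that appear in the SQLite examples above and does not validate them against any schema. The second destination (C:\reporting.db) and the file name ReportingConfig.xml are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8" ?>'


def build_sync_config(database_type, database_provider,
                      source_connection, destination_connection):
    """Return a Sync-style configuration document as a string.

    Element names are copied from the SQLite examples in this article;
    they are not checked against any official schema.
    """
    root = ET.Element("CDataSync")
    settings = [
        ("DatabaseType", database_type),
        ("DatabaseProvider", database_provider),
        ("ConnectionString", source_connection),
        ("ReplicateAll", "False"),
        ("NotificationUserName", ""),
        ("DatabaseConnectionString", destination_connection),
    ]
    for tag, value in settings:
        ET.SubElement(root, tag).text = value
    ET.indent(root)  # pretty-print; requires Python 3.9 or later
    body = ET.tostring(root, encoding="unicode", short_empty_elements=False)
    return XML_DECLARATION + "\n" + body + "\n"


if __name__ == "__main__":
    source = (r"Profile=C:\profiles\HuggingFace.apip;"
              r"ProfileSettings='APIKey=hf_xxxxxxxxxxxxxxxxxxxx';")
    # Hypothetical destinations: two SQLite files receiving the same data.
    destinations = {
        "MySQLiteConfig.xml": r"Data Source=C:\my.db",
        "ReportingConfig.xml": r"Data Source=C:\reporting.db",
    }
    for file_name, destination in destinations.items():
        print(f"--- {file_name} ---")
        print(build_sync_config("SQLite", "System.Data.SQLite",
                                source, destination))
```

The script only prints the generated documents. Saving each one under its file name gives you one configuration per destination while the source connection string stays identical.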

Hugging Face Hub uses token-based authentication to control access to its API. The API provides access to machine learning models, datasets, spaces, papers, and other resources on the Hugging Face Hub platform.

Using API Key Authentication

To authenticate to Hugging Face Hub, provide an API key (access token). To obtain your access token:

  1. Log in to your Hugging Face account at https://huggingface.co
  2. Navigate to Settings > Access Tokens
  3. Click "New token" to create a new access token
  4. Select the appropriate permissions (read or write)
  5. Copy the token value

After obtaining your access token, set the following connection properties:

  • AuthScheme: Set this to APIKey.
  • APIKey: Set this to your Hugging Face access token.

Example connection string

Profile=C:\profiles\HuggingFace.apip;ProfileSettings='APIKey=hf_xxxxxxxxxxxxxxxxxxxx';
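
To keep the access token out of files you share, you could assemble the connection string at run time. The sketch below is one way to do that in Python, assuming the token is stored in an environment variable named HF_TOKEN (an assumption, not something Sync requires). The helper name build_connection_string is hypothetical, and the output format simply mirrors the example connection string above.

```python
import os


def build_connection_string(profile_path, api_key):
    """Format a connection string shaped like the example in this article."""
    if not api_key:
        raise ValueError("An access token is required.")
    # A quote or semicolon in the token would break the quoted
    # ProfileSettings value, so reject those characters up front.
    if "'" in api_key or ";" in api_key:
        raise ValueError("The access token contains an unsupported character.")
    return f"Profile={profile_path};ProfileSettings='APIKey={api_key}';"


if __name__ == "__main__":
    # HF_TOKEN is an assumed variable name; the fallback is a placeholder.
    token = os.environ.get("HF_TOKEN", "hf_xxxxxxxxxxxxxxxxxxxx")
    print(build_connection_string(r"C:\profiles\HuggingFace.apip", token))
```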

Configure Replication Queries

Sync enables you to control replication with standard SQL. The REPLICATE statement is a high-level command that caches and maintains a table in your database. You can define any SELECT query supported by the Hugging Face API. The statement below caches and incrementally updates a table of Hugging Face data:

REPLICATE Collections;

You can specify a file containing the replication queries. This enables you to use the same replication queries to replicate to several databases.
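
As an illustration, a query file can be produced from a list of table names. This Python sketch assumes one REPLICATE statement per line, each terminated by a semicolon as in the statement above. Collections is the only table name taken from this article; any other name you pass must match a table that your connection actually exposes. The function names are hypothetical.

```python
import re

# Accept only simple identifiers so that nothing unexpected ends up
# in the generated query file.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def render_replication_queries(tables):
    """Return the text of a query file with one REPLICATE statement per table."""
    statements = []
    for table in tables:
        if not _IDENTIFIER.fullmatch(table):
            raise ValueError(f"Not a simple table name: {table!r}")
        statements.append(f"REPLICATE {table};")
    return "\n".join(statements) + "\n"


def write_query_file(path, tables):
    """Write the rendered statements to the given path."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(render_replication_queries(tables))


if __name__ == "__main__":
    # Prints the file contents; call write_query_file("APISync.sql", [...])
    # to save them under the file name used in the commands in this article.
    print(render_replication_queries(["Collections"]), end="")
```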

Run Sync

After you have configured the connection strings and replication queries, you can run Sync with the following command-line options:

Windows

APISync.exe -g MySQLiteConfig.xml -f APISync.sql

Java

java -Xbootclasspath/p:c:\sqlitejdbc.jar -jar APISync.jar -g MySQLiteConfig.xml -f APISync.sql
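
To replicate to several databases, you run the same command once per configuration file while keeping the query file fixed. The Python sketch below wraps the Windows command shown above in a loop. It uses only the -g and -f options from this article. The second configuration file name is hypothetical, and treating the process exit code as a success indicator is an assumption.

```python
import subprocess


def build_sync_command(config_file, query_file, executable="APISync.exe"):
    """Build the argument list for one Sync run (options as shown above)."""
    return [executable, "-g", config_file, "-f", query_file]


def replicate_to_all(config_files, query_file, runner=subprocess.run):
    """Run Sync once per configuration file and collect the exit codes."""
    results = {}
    for config_file in config_files:
        command = build_sync_command(config_file, query_file)
        completed = runner(command, check=False)
        results[config_file] = completed.returncode
    return results


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    # ReportingConfig.xml is a hypothetical second destination.
    for config in ["MySQLiteConfig.xml", "ReportingConfig.xml"]:
        print(" ".join(build_sync_command(config, "APISync.sql")))
```

Calling replicate_to_all with the same list of configuration files executes the commands in sequence. The runner parameter exists so the loop can be exercised without Sync installed.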

Ready to get started?

Learn more or sign up for a free trial:

CData Sync