Access Databricks Data with Entity Framework 6

This article shows how to access Databricks data using an Entity Framework code-first approach. Entity Framework 6 is available in .NET 4.5 and above.

Entity Framework is an object-relational mapping framework that lets you work with data as objects. You can run the ADO.NET Entity Data Model wizard in Visual Studio to generate the Entity Model, but that wizard-driven approach can put you at a disadvantage if your data source changes or if you want more control over how the entities operate. In this article you will use the code-first approach to access Databricks data through the CData ADO.NET Provider.

  1. Open Visual Studio and create a new Windows Forms Application. This article uses a C# project with .NET 4.5.
  2. Run the command 'Install-Package EntityFramework' in the Package Manager Console in Visual Studio to install the latest release of Entity Framework.
  3. Modify the App.config file in the project to add a reference to the Databricks Entity Framework 6 assembly and the connection string.

    To connect to a Databricks cluster, set the properties as described below.

    Note: You can find the required values in your Databricks instance by navigating to Clusters, selecting the desired cluster, and opening the JDBC/ODBC tab under Advanced Options.

    • Server: Set to the Server Hostname of your Databricks cluster.
    • HTTPPath: Set to the HTTP Path of your Databricks cluster.
    • Token: Set to your personal access token (this value can be obtained by navigating to the User Settings page of your Databricks instance and selecting the Access Tokens tab).

    <configuration>
      ...
      <connectionStrings>
        <add name="DatabricksContext" connectionString="Offline=False;Server=127.0.0.1;Port=443;TransportMode=HTTP;HTTPPath=MyHTTPPath;UseSSL=True;User=MyUser;Password=MyPassword;" providerName="System.Data.CData.Databricks" />
      </connectionStrings>
      <entityFramework>
        <providers>
          ...
          <provider invariantName="System.Data.CData.Databricks" type="System.Data.CData.Databricks.DatabricksProviderServices, System.Data.CData.Databricks.Entities.EF6" />
        </providers>
      </entityFramework>
    </configuration>

  4. Add a reference to System.Data.CData.Databricks.Entities.EF6.dll, located in the lib -> 4.0 subfolder in the installation directory.
  5. Build the project at this point to ensure everything is working correctly. Once that's done, you can start coding using Entity Framework.
  6. Add a new .cs file to the project and add a class to it. This will be your database context, and it will extend the DbContext class. In the example, this class is named DatabricksContext. The following code example overrides the OnModelCreating method to make the following changes:
    • Remove PluralizingTableNameConvention from the ModelBuilder Conventions.
    • Remove requests to the MigrationHistory table.

    using System.Data.Entity;
    using System.Data.Entity.Infrastructure;
    using System.Data.Entity.ModelConfiguration.Conventions;

    class DatabricksContext : DbContext
    {
        public DatabricksContext() { }

        protected override void OnModelCreating(DbModelBuilder modelBuilder)
        {
            // To remove the requests to the Migration History table
            Database.SetInitializer<DatabricksContext>(null);
            // To remove the plural names
            modelBuilder.Conventions.Remove<PluralizingTableNameConvention>();
        }
    }

  7. Create another .cs file and name it after the Databricks entity you are retrieving, for example, Customers. In this file, define both the Entity and the Entity Configuration, which will resemble the example below:

    using System.Data.Entity.ModelConfiguration;
    using System.ComponentModel.DataAnnotations.Schema;

    [System.ComponentModel.DataAnnotations.Schema.Table("Customers")]
    public class Customers
    {
        [System.ComponentModel.DataAnnotations.Key]
        public System.String City { get; set; }
        public System.String CompanyName { get; set; }
    }

  8. Now that you have created an entity, add the entity to your context class:

    public DbSet<Customers> Customers { get; set; }

  9. With the context and entity finished, you are now ready to query the data in a separate class. For example:

    DatabricksContext context = new DatabricksContext();
    context.Configuration.UseDatabaseNullSemantics = true;
    var query = from line in context.Customers select line;
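
The query in the last step only defines a LINQ expression; nothing is fetched until the results are enumerated. The following is a minimal sketch of filtering and enumerating such a query. It is an illustration under stated assumptions, not part of the provider: the helper name CompanyNamesInCity and the sample rows are hypothetical, and an in-memory list stands in for context.Customers so the example runs without a Databricks connection. The helper accepts any IQueryable<Customers>, so passing context.Customers should work the same way, assuming the provider can translate the filter and ordering.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the Customers entity from step 7 (mapping attributes omitted
// so the sketch compiles without Entity Framework).
public class Customers
{
    public string City { get; set; }
    public string CompanyName { get; set; }
}

public static class CustomerQueries
{
    // Hypothetical helper: returns the company names for one city, sorted by name.
    public static List<string> CompanyNamesInCity(IQueryable<Customers> customers, string city)
    {
        var query = from line in customers
                    where line.City == city
                    orderby line.CompanyName
                    select line.CompanyName;

        // The query is executed here, when the results are materialized.
        return query.ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // In-memory stand-in for context.Customers; the rows are made up.
        IQueryable<Customers> customers = new List<Customers>
        {
            new Customers { City = "Raleigh", CompanyName = "Contoso" },
            new Customers { City = "Raleigh", CompanyName = "Adventure Works" },
            new Customers { City = "Chapel Hill", CompanyName = "Fabrikam" },
        }.AsQueryable();

        foreach (string name in CustomerQueries.CompanyNamesInCity(customers, "Raleigh"))
        {
            Console.WriteLine(name);
        }
    }
}
```

With a real context, the call would look like CustomerQueries.CompanyNamesInCity(context.Customers, "Raleigh"). Wrapping the context in a using block disposes the underlying connection once the results have been read.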