by Haley Burton | September 07, 2021 | Last Updated: December 04, 2023

Cloud Data Crossing the Firewall: Connect Cloud, Hybrid Cloud, and On-Premises Data

While there are many reasons for cloud migration, most organizations migrating to the cloud continue to retain some of their existing on-premises applications and data, adopting a hybrid approach. In fact, according to a survey of 200 enterprises conducted by Everest Group, 72% of businesses described their cloud strategy as hybrid first.

What is a hybrid cloud environment?

While there is no single definition of a hybrid environment, the term refers to some mix of private cloud, public cloud, and on-premises data storage and IT systems.

Despite the many opportunities afforded by cloud data systems and SaaS applications, many organizations turn to a hybrid environment to comply with data privacy regulations and secure their data. For example, a healthcare provider may wish to store sensitive data on internal servers or on a private cloud network that leverages Amazon Web Services (AWS), Azure, or Google Cloud Platform (GCP). Then, they can protect that data behind a firewall to address security concerns or data sovereignty regulations requiring them to store sensitive data in the country of origin.

But when these companies need to report on data (usually non-sensitive or anonymized) using modern cloud reporting tools, such as Looker Studio (formerly Google Data Studio) or Tableau Online, they often need to integrate data between their on-premises and cloud environments.

The benefits of a hybrid cloud approach

A hybrid cloud approach offers businesses a strategic advantage by combining the strengths of on-premises infrastructure with the flexibility of cloud services. One key benefit lies in the dynamic scalability it provides. Organizations can efficiently manage varying workloads by utilizing on-premises resources for consistent or predictable demands, and seamlessly expand into the cloud for growing data needs. This scalability ensures optimal performance and allows for cost savings, as resources can be provisioned and de-provisioned in response to changing requirements.

Another benefit is in security and compliance. Critical or sensitive workloads can be kept on-premises, providing organizations with greater control over security protocols and data privacy. Meanwhile, less sensitive tasks can be brought to the cloud, providing flexibility and ease of access. This tailored approach to data security infrastructure empowers organizations to maintain data sovereignty while navigating the changing requirements of the modern data landscape.

The challenges of hybrid cloud data integration

One of the main stumbling blocks in hybrid cloud data integration is managing connectivity across corporate firewalls. Modern firewalls are designed to prevent external access to internal resources. Because firewalls block inbound requests, most cloud-based integration solutions are simply unable to traverse the firewall and move data between cloud and on-premises systems.

While some pure-cloud ETL providers suggest that customers should open ports on their corporate firewall to internal systems, or set up complex tunneling, these solutions are hardly realistic. IT teams are loath to make updates to firewall policies or implement tunneling. Other providers suggest the installation of internal agents for on-premises access, but ultimately whenever you expose the ability for external requests to internal resources, you expose a new security vulnerability in the process.

All pure-cloud data movement tools are equally limited. Even in the rare case that you find a tool that can move data across cloud and on-premises environments, it typically will not let you securely control and granularly specify which data is transferred.

Moreover, cloud solutions do not allow organizations to move data outside the firewall selectively. They use an ELT strategy that extracts data, loads it all into a data warehouse, and then transforms the data as needed. To enable such a cloud-based solution to connect and grab data from inside the firewall, organizations would need to set up rules within the firewall, use a VPN connection, or install an agent inside the firewall that can connect to their cloud solution.

That's why we built CData Sync. CData Sync offers a secure and easy-to-manage way to build data pipelines for enterprises grappling with data management, on-premises or in the cloud, without the need for complex security configuration.

As a lightweight ETL/ELT solution, CData Sync can be installed anywhere and can automatically and intelligently push data, from any data source, to your cloud data lake or data warehouse of choice.

Hybrid cloud integration with CData Sync

Organizations need to circumvent the hurdle of traversing the firewall while controlling which data moves where. CData Sync addresses both issues.

CData Sync provides a data integration solution that can run anywhere. You can install Sync in the cloud to integrate data between cloud solutions. Or you can install it inside your firewall and use it to connect out to your cloud-based applications to push and pull data between on-premises and cloud solutions.
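Why installing inside the firewall helps can be shown with a toy model. The Python sketch below is a deliberate simplification of a stateful firewall that denies unsolicited inbound traffic and allows outbound connections and their replies. The class and host names are hypothetical, and the code says nothing about how CData Sync or any real firewall is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class StatefulFirewall:
    """Toy model: deny unsolicited inbound traffic, allow outbound and its replies."""

    # (internal_host, external_host) pairs for connections opened from inside.
    established: set = field(default_factory=set)

    def outbound(self, internal_host: str, external_host: str) -> bool:
        """An internal host opens a connection out; remember it so replies pass."""
        self.established.add((internal_host, external_host))
        return True

    def inbound(self, external_host: str, internal_host: str) -> bool:
        """Inbound traffic passes only if it belongs to a connection opened from inside."""
        return (internal_host, external_host) in self.established


fw = StatefulFirewall()

# A cloud-hosted tool trying to reach the internal database is refused.
cloud_tool_reaches_db = fw.inbound("cloud-etl", "sqlserver")

# A sync process inside the firewall connects out to the warehouse instead.
sync_connects_out = fw.outbound("sync-host", "snowflake")

# Replies on that connection are allowed back in; the database stays unreachable.
reply_allowed = fw.inbound("snowflake", "sync-host")
db_still_blocked = fw.inbound("snowflake", "sqlserver")
```

In this model the internal database is never reachable from outside, and the only traffic that crosses the firewall belongs to a connection that started inside it.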

Example: SQL Server-Snowflake integration

For example, say you have SQL Server inside your firewall and want to move the data into a cloud Snowflake data warehouse. You simply install CData Sync inside your firewall, where it can connect to SQL Server without having to traverse the firewall. It can then connect out to Snowflake or some other cloud-based service and push data out to it. There's no need to set up a VPN or open a hole in the firewall to enable access to your data. CData Sync is easy enough to use that you need not involve IT to implement – all you need are credentials to connect to your applications and the database.

Enterprise-grade security

Unlike other ETL/ELT data integration solutions, CData Sync never sees your data. CData Sync is a pure data pipeline with an unmatched breadth of connectivity to hundreds of data sources and databases. The tool never sees or stores your data, instead enabling direct connections between your sources and target destinations.

Control the data you move for regulatory compliance

CData Sync enables you to meet regulatory and security requirements for data movement by allowing you to perform complex ETL (extract, transform, load) operations. CData Sync can perform transformations before data is loaded into the data warehouse, which enables you to move only the data you wish to move outside the firewall.
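To make the idea of transforming before loading concrete, here is a minimal Python sketch of a pre-load step that drops sensitive columns and pseudonymizes an identifier before any row leaves the firewall. The column names, the salt, and the functions are assumptions made for illustration, not CData Sync configuration or APIs. Note that a salted hash is pseudonymization rather than full anonymization.

```python
import hashlib

# Hypothetical column policy for illustration; not CData Sync configuration.
SENSITIVE_COLUMNS = {"ssn", "patient_name"}  # never leave the firewall
PSEUDONYMIZE_COLUMNS = {"patient_id"}        # replaced with a salted hash


def transform_row(row: dict, salt: str) -> dict:
    """Return a copy of `row` containing only what may leave the firewall."""
    safe = {}
    for column, value in row.items():
        if column in SENSITIVE_COLUMNS:
            continue  # drop the column entirely
        if column in PSEUDONYMIZE_COLUMNS:
            digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
            safe[column] = digest[:16]  # shortened, stable pseudonym
        else:
            safe[column] = value
    return safe


def prepare_batch(rows: list, salt: str) -> list:
    """Apply the transform to every row before the load step runs."""
    return [transform_row(row, salt) for row in rows]
```

Because the filtering happens before the load, the destination never receives the dropped columns, which is the property that matters for data sovereignty rules.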

Though CData Sync provides powerful data transformations through traditional ETL, it's also a modern, lightweight solution that can push data transformations down to the underlying destination database when required. Its lightning-fast ELT (extract, load, transform) process delivers the highest performance data integration in the market.

Reduce latency - inside the firewall

CData Sync's ability to run next to your data source, whether that's on-premises or in the cloud, also has implications for latency.

Solutions like Fivetran that run in the cloud and have a VPN connection into your network must move the data from your cloud solution into the Fivetran cloud where it can be read, and then traverse the firewall, making two significant jumps.

Because CData Sync runs close to SQL Server, it can pull data into SQL Server much faster. CData Sync further improves performance by performing incremental loads. Whenever it moves data from the cloud to on-premises or vice versa, it only moves the data that has changed rather than moving a snapshot of the entire database.
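One common way to implement incremental loads is a high-water mark on a last-modified column. The Python sketch below illustrates that general technique using two in-memory SQLite databases as stand-ins for the source and destination. The `orders` table, its `updated_at` column, and the function names are hypothetical, and this is not a description of how CData Sync detects changes internally.

```python
import sqlite3

# Hypothetical table used by both the stand-in source and destination.
SCHEMA = "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"


def open_db() -> sqlite3.Connection:
    """Open an in-memory database containing the hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def incremental_sync(source, dest, watermark: str):
    """Copy rows changed after `watermark`; return (rows_moved, new_watermark)."""
    rows = source.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # INSERT OR REPLACE overwrites a destination row that has the same primary key.
    dest.executemany(
        "INSERT OR REPLACE INTO orders (id, amount, updated_at) VALUES (?, ?, ?)",
        rows,
    )
    dest.commit()
    # Advance the watermark to the newest timestamp seen, or keep it if nothing moved.
    new_watermark = rows[-1][2] if rows else watermark
    return len(rows), new_watermark
```

Starting from an empty watermark (`""`) copies every row once. Each later run passes in the watermark returned by the previous run, so only rows with a newer `updated_at` are moved. A production pipeline would also need to handle deletes and rows that share the watermark timestamp, which this sketch ignores.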

Traverse the firewall with CData Sync

With CData Sync, you no longer have to struggle with firewall rules, VPNs, or agents to integrate data across your hybrid applications. Simply install CData Sync wherever you need it, on-premises or in the cloud, without the help of IT, and you can move the right data quickly and easily wherever you need it.

To get started, download a free trial of CData Sync.