by CData Software | December 15, 2023

Generative AI in Data Integration


There’s absolutely no doubt that 2023 has been a breakthrough year for generative Artificial Intelligence (AI). It is everywhere and is already transforming business by assisting people, improving productivity, and automating tasks.

Unlocking generative AI: What is it and why does it matter?

So, what is generative AI? Generative Artificial Intelligence (AI) is a subset of deep learning in which multi-layer neural network models generate new content such as text, images, audio, video, code, and synthetic data in response to natural language prompts, based on the patterns they have learned from the content they were trained on. Examples of these generative AI models are large language models, the most well-known of which is probably OpenAI’s GPT-4, which has been popularized by the emergence of OpenAI’s ChatGPT service. Others also exist, such as Google’s PaLM model. These generative AI models are sometimes called ‘foundation models’. They get this name because they are trained on vast amounts of content, much of it public content from the internet, and serve as a base for many different tasks.

Real-world benefits: How generative AI boosts productivity

The benefits of generative AI are significant. One of the main benefits is improved productivity: work gets done faster. For example, AI-powered conversational search, text generation, code generation, assisted metadata curation, and AI-automated actions all help to get things done quicker in response to natural language prompts.

The rise of prompt-based language UIs: A game changer for user experience

It is not surprising, therefore, to see prompt-based natural language user interfaces now emerging everywhere in tools and applications to make them much easier to use. This broadens the use of tools within the enterprise by opening them up to less-skilled business users who previously may have found the user interfaces of those tools too difficult to use. With natural language, more people can get involved, which is especially important when there is a shortage of highly skilled professionals to meet business demand. It also enables work to be delegated to AI ‘assistants’ that generate content or code and act on the user’s behalf. This applies whether the work involves business tasks, data management tasks, or explaining business insights and their business impact to drive value-added actions in everyday business.

As you would expect, generative AI is already emerging in almost every area of data management as shown in the figure below. 

[Figure: generative AI across areas of data management]

Revolutionizing data management: The role of generative AI

This includes data engineering, data virtualization, data catalogs, business glossaries, data marketplaces, data governance and more. The introduction of generative AI in data management software tools makes the following possible:

  • Conversational data search to find data and other artefacts in data catalogs
  • AI-generated descriptions for automated metadata enrichment to accelerate curation in data catalogs
  • Prompt-based data engineering
  • Automated code generation e.g., to validate, transform, integrate and access data
  • AI-generated explanations of code 
  • AI-generated explanations of data products in a data marketplace
  • AI-generated policies to help govern data
  • Automated synthetic sample data generation

Bridging the skills gap: How generative AI is paving the way for citizen data engineers

One area where there is a shortage of highly skilled people is data integration, and growth in demand for AI is set to cause an even bigger shortage in this area because AI needs data and data engineering. Most companies already have a bottleneck here because of a shortage of trained professional IT data engineers. It is often the case that central IT data engineering teams can no longer keep pace with demand for more data from business users who want to add to what they already know. Companies therefore need to get more non-technical people involved in data integration across the business to deal with this bottleneck.

There is clear evidence emerging that the demand for data integration skills is set to become more acute. A survey by Pluralsight showed that artificial intelligence and machine learning skills are the most in-demand cloud skills (23%) in 2023, up 16% on last year, but that the largest cloud skills gaps exist in data, analytics, and engineering (42%). The problem here is that AI adoption is slowed down by the lack of data engineering skills. Also, Randstad, the largest talent company in the world, recently posted their Work Monitor Survey, which shows their own AI postings have increased by 2,000% since March 2023! A recent article in Computerworld also shows a surge in AI skills job postings. Given this demand, citizen data engineers are needed to fill an already increasing gap in data integration skills.

The bottleneck of data integration: Where human skill meets AI potential

In addition, even in professional circles, data integration has always been one of the most time-consuming tasks since the birth of the data warehouse over thirty years ago. Therefore, improvements in productivity are needed and people are looking at generative AI as a way to help enable that, especially among non-technical ‘citizen’ data engineers. 

With so many data sources available, it is inevitable that almost every data scientist and non-technical business user will need to integrate data at some point. However, skill sets vary among non-technical people. Therefore, they need assistance to ensure that what they create correctly integrates the data they need and performs well. This assistance comes in the form of generative AI such as ChatGPT, which can be used to help people:

  • Find data
  • Generate code 
  • Explain data pipelines in natural language so they better understand what’s happening
  • Debug and optimize any code

This is true whether you are integrating data to build virtual or physical data products, machine learning models, or any analytical data store such as a data warehouse or a data mart. 
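As a rough illustration of the "generate code" and "explain" items above, the sketch below assembles a natural language prompt that a tool could send to a large language model. It is a minimal sketch under stated assumptions, not a description of any vendor's assistant: the `Table` class, the `build_sql_prompt` function, the prompt wording, and the sample tables are all hypothetical, and no model is actually called.

```python
from dataclasses import dataclass


@dataclass
class Table:
    """Hypothetical description of one table the assistant may query."""
    name: str
    columns: dict[str, str]  # column name -> SQL type


def build_sql_prompt(tables: list[Table], request: str) -> str:
    """Assemble a text prompt asking a language model for one SQL query.

    The wording of the prompt is an assumption made for illustration.
    """
    schema_lines = []
    for table in tables:
        cols = ", ".join(
            f"{col} {sql_type}" for col, sql_type in table.columns.items()
        )
        schema_lines.append(f"- {table.name}({cols})")
    return "\n".join([
        "You are a SQL assistant. Use only the tables listed below.",
        "Schema:",
        *schema_lines,
        f"Request: {request}",
        "Return one SQL query, then a short plain-language explanation.",
    ])


if __name__ == "__main__":
    # Hypothetical sample schema; a real tool would read this from a catalog.
    sample_tables = [
        Table("customers", {"id": "INTEGER", "region": "VARCHAR"}),
        Table("orders", {"id": "INTEGER", "customer_id": "INTEGER",
                         "total": "DECIMAL"}),
    ]
    print(build_sql_prompt(sample_tables, "Total order value per region"))
```

Including the schema in the prompt is one common way to keep generated SQL tied to tables that really exist. Asking for an explanation alongside the query gives a non-technical user something to check before running the result.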

Beyond data creation: Generative AI in data consumption

These are uses of generative AI when producing data, but with the emergence of data products and data mesh, generative AI is also important on the data consumption side. For example, non-technical business users who consume data may want natural language explanations of the data in one or more data products, or an explanation of the complex query transformations that produced these data products. They also want assistance to help them quickly produce their own queries if they need to integrate that data to create new virtual views and produce insights in BI tools. In this case, AI-generated natural language descriptions of physical and virtual data products, together with AI-generated natural language explanations of queries, make a huge difference in improving the efficiency of non-technical users and their ability to develop error-free, optimized queries.


The future is bright: The next big milestones in generative AI

Generative AI is an exciting new technology that is helping to democratize and accelerate data management tasks including data engineering. It helps to lower the skills bar, broaden inclusion, accelerate development, make complex code in data pipelines understandable, and accelerate metadata creation and data governance. It also helps explain business insights and the business impact while shortening the time to value and the time to act.

Furthermore, tool vendors will likely implement this together with reinforcement learning so that ultimately everyone will get a self-learning AI assistant. 

To this end, we are excited to announce the beta release of our SQL AI Assistant powered by ChatGPT, integrated into our SaaS solution. We encourage you to test it and provide your feedback and ideas. Start your free 30-day trial and enable it in Preferences.

Want more insights? Check out our webinar on Navigating AI and Data Management

In generative AI and data management, there’s a whole world of possibilities waiting to be uncovered. While AI is all the buzz these days, putting it to work in organizations is still a work in progress. To help you bridge the gap and get a handle on AI’s true power, we invite you to check out our one-hour webinar recording with Mike Ferguson and Nick Golovin. We’ll go deep into the practical use cases of AI and show you how it can transform your business.

Watch the recording