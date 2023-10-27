Lalit Ahuja is Chief Product and Customer Officer at GridGain Systems.

In a previous article, I discussed redefining the challenge facing companies that want to become data-driven. The way most people think about this problem – and the most commonly proposed solution – is to keep all the data in one place, such as a data lake.

There are challenges to this strategy, the biggest of which is that while a data lake makes storing data economical, retrieving and analyzing that data can be slow and cumbersome. This makes the data lake impractical for low-latency analytical needs.

Instead, let’s think of the problem simply as the need for real-time access to all relevant data, from across the enterprise and from external sources, so that we can create a cross-sectional view and analyze the entire set of data needed to make informed decisions.

With this approach, there are only two requirements: 1. the ability to access data from multiple internal and external sources in real time, and 2. the ability to curate relevant subsets of this data and access them quickly.
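To make the second requirement concrete, here is a minimal sketch of curating a cross-sectional view: the records that several sources hold for the same customer ID are merged into one view. The source names (`crm`, `billing`, `web_events`) and the helper `cross_sectional_view` are hypothetical, and plain dictionaries stand in for real internal and external systems.

```python
from typing import Dict, List

# Hypothetical sources: each maps a customer ID to the fields that system owns.
crm: Dict[str, dict] = {"c1": {"name": "Ada", "segment": "enterprise"}}
billing: Dict[str, dict] = {"c1": {"balance": 120.0}, "c2": {"balance": 5.0}}
web_events: Dict[str, dict] = {"c1": {"last_visit": "2023-10-26"}}


def cross_sectional_view(customer_id: str, sources: List[Dict[str, dict]]) -> dict:
    """Merge the fields each source holds for one customer into a single view.

    If two sources define the same field, the later source in the list wins.
    """
    view = {"customer_id": customer_id}
    for source in sources:
        # A source that knows nothing about this customer contributes nothing.
        view.update(source.get(customer_id, {}))
    return view


print(cross_sectional_view("c1", [crm, billing, web_events]))
# {'customer_id': 'c1', 'name': 'Ada', 'segment': 'enterprise', 'balance': 120.0, 'last_visit': '2023-10-26'}
```

In a real deployment the lookups would go to live systems rather than in-process dictionaries, but the shape of the operation (join by a shared key, keep only the relevant fields) is the same.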

In this article, I will discuss one strategy to address these needs: a Data Integration Hub (DIH).

What is a data integration hub?

Data integration hubs have been around for some time and have been successfully adopted by companies with extreme data processing speed and scale requirements, particularly in the financial services and insurance industries.

The DIH architecture creates a common data-access layer that aggregates different types of data from multiple on-premises, cloud-based, and streaming sources. Multiple business applications can then access relevant parts of the aggregated data – ideally, cached in an in-memory data grid for real-time processing.
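A minimal sketch of this common access layer, assuming each back-end source can be represented as a lookup function: reads are routed to the registered source on the first request and then served from an in-memory cache. The class `DataIntegrationHub` and the source names are hypothetical, and a plain dict stands in for the in-memory data grid, which in practice would be partitioned across many nodes.

```python
from typing import Callable, Dict, Optional, Tuple

# A "source" is any callable that fetches a record by key from a system of record.
Source = Callable[[str], Optional[dict]]


class DataIntegrationHub:
    """Hypothetical common access layer: routes reads to registered sources
    and keeps results in an in-memory cache (a stand-in for a data grid)."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}
        self._cache: Dict[Tuple[str, str], dict] = {}
        self.source_reads = 0  # counts trips to the back-end systems

    def register(self, name: str, source: Source) -> None:
        self._sources[name] = source

    def get(self, source_name: str, key: str) -> Optional[dict]:
        cache_key = (source_name, key)
        if cache_key in self._cache:
            return self._cache[cache_key]  # served from memory
        self.source_reads += 1
        record = self._sources[source_name](key)
        if record is not None:
            self._cache[cache_key] = record  # read-through caching
        return record


# Usage: an on-premises database and a cloud API, both simulated with dicts.
on_prem_orders = {"o1": {"amount": 42}}
cloud_profiles = {"u1": {"tier": "gold"}}

hub = DataIntegrationHub()
hub.register("orders", on_prem_orders.get)
hub.register("profiles", cloud_profiles.get)

hub.get("orders", "o1")
hub.get("orders", "o1")  # second read is served from the cache
print(hub.source_reads)  # 1
```

Read-through caching is only one possible loading policy; a hub could instead preload data or ingest change streams from its sources.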

There are several capabilities underpinning the DIH architecture:

• A multi-model datastore with a standards-based API layer that synchronizes data with disparate back-end sources or systems of record.

• A high-performance and scalable data access layer that supports all types of data-interaction APIs, including SQL; non-SQL APIs for languages such as Java, C#, Python, and Scala; and RESTful APIs.

• A data-integrity management mechanism, such as ACID transaction support.

• A robust security and access control framework to support secure and controlled data access by different audiences.
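Two of these capabilities, synchronization with systems of record and controlled access, can be sketched together. The sketch below assumes a simple write-through policy (each write is applied to the system of record before the hub's copy is updated) and a hypothetical role-to-permission table; `SecuredStore` and the role names are illustrative only, and a real DIH would also wrap updates in transactions to provide the integrity guarantees listed above.

```python
from typing import Dict, Set


class PermissionDenied(Exception):
    pass


# Hypothetical role model: which operations each audience may perform.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "analyst": {"read"},
    "service": {"read", "write"},
}


class SecuredStore:
    """Sketch of a hub-side store that writes through to a system of record
    and checks the caller's role before every operation."""

    def __init__(self, system_of_record: Dict[str, dict]) -> None:
        self._sor = system_of_record
        self._cache: Dict[str, dict] = {}

    def _check(self, role: str, op: str) -> None:
        # Unknown roles get no permissions at all.
        if op not in ROLE_PERMISSIONS.get(role, set()):
            raise PermissionDenied(f"role {role!r} may not {op}")

    def read(self, role: str, key: str) -> dict:
        self._check(role, "read")
        if key not in self._cache:
            self._cache[key] = self._sor[key]  # load from the system of record
        return self._cache[key]

    def write(self, role: str, key: str, value: dict) -> None:
        self._check(role, "write")
        self._sor[key] = value    # the system of record is updated first...
        self._cache[key] = value  # ...then the hub's copy, keeping both in sync


sor = {"acct-1": {"balance": 100}}
store = SecuredStore(sor)
store.write("service", "acct-1", {"balance": 80})
print(store.read("analyst", "acct-1"))  # {'balance': 80}
try:
    store.write("analyst", "acct-1", {"balance": 0})
except PermissionDenied as exc:
    print(exc)  # role 'analyst' may not write
```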

A distributed, in-memory platform that combines the capabilities listed above is one example of a DIH that can address the business need for low-latency access to large amounts of data in an enterprise data ecosystem.

Why deploy a data integration hub?

Today, the need for extreme data processing speed and scale is spreading to a large number of companies in healthcare, telecommunications, retail, logistics and travel, as well as many other industries.

The reasons for this proliferation are simple:

1. More data is now available to enterprises from many different sources, and it has become possible to process such large amounts of data quickly.

2. Within a company, the number of real-time processing use cases tends to grow once the first use case proves successful.

3. More recently, almost every company has started looking for ways to exploit AI and generative AI to accelerate innovation, improve productivity, and enhance customer experiences.

A DIH architecture can help achieve these cross-sectional data access goals at scale for very large volumes of information. The DIH layer decouples systems of record from consuming applications, allowing the applications and the underlying systems to evolve at their own pace without depending on or affecting other technology components.

This capability is fundamental to another important initiative in many organizations: the ability to migrate individual on-premises components to the cloud, or to switch cloud service providers, without disrupting the applications that consume their data.
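The decoupling described above can be illustrated with a small sketch: the consuming application depends only on an abstract source interface exposed through the hub, so a system of record can be migrated (here from a hypothetical on-premises database to a hypothetical cloud service) by rebinding a single dataset entry. All class and function names are invented for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict


class RecordSource(ABC):
    """Interface the hub exposes; consumers never see concrete back ends."""

    @abstractmethod
    def fetch(self, key: str) -> dict: ...


class OnPremDatabase(RecordSource):
    def __init__(self, rows: Dict[str, dict]) -> None:
        self._rows = rows

    def fetch(self, key: str) -> dict:
        return {**self._rows[key], "served_by": "on-prem"}


class CloudService(RecordSource):
    def __init__(self, rows: Dict[str, dict]) -> None:
        self._rows = rows

    def fetch(self, key: str) -> dict:
        return {**self._rows[key], "served_by": "cloud"}


class Hub:
    def __init__(self) -> None:
        self._bindings: Dict[str, RecordSource] = {}

    def bind(self, dataset: str, source: RecordSource) -> None:
        self._bindings[dataset] = source  # rebinding = migrating the back end

    def get(self, dataset: str, key: str) -> dict:
        return self._bindings[dataset].fetch(key)


def consumer_report(hub: Hub) -> str:
    # The consuming application knows dataset names, not systems of record.
    return f"{hub.get('inventory', 'sku-9')['qty']} units"


rows = {"sku-9": {"qty": 7}}
hub = Hub()
hub.bind("inventory", OnPremDatabase(rows))
print(consumer_report(hub))  # 7 units
hub.bind("inventory", CloudService(rows))  # migrate; consumer code is unchanged
print(consumer_report(hub))  # 7 units
```

The design choice being illustrated is dependency on an interface rather than on a concrete system; in practice the migration would also involve moving the data itself, which this sketch sidesteps by sharing `rows`.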

What are the limitations of a data integration hub?

By definition, a DIH creates a low-latency data access layer across multiple systems of record. This naturally separates data read workloads (queries) from transactional data writes (changes to data). This separation, known as Command Query Responsibility Segregation (CQRS), introduces a degree of latency, as well as some complexity in keeping data synchronized between the two systems.
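The synchronization lag this separation introduces can be shown with a toy model. It assumes that writes land in the system of record and are propagated to the hub's read model through a change queue that is drained asynchronously; until `sync()` runs, queries see stale data. `CommandSide`, `QuerySide`, and `sync` are illustrative names, not a specific product's API.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class CommandSide:
    """System of record: accepts writes and emits change events."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self.changes: Deque[Tuple[str, int]] = deque()

    def update(self, key: str, value: int) -> None:
        self.state[key] = value
        self.changes.append((key, value))  # published for the read side


class QuerySide:
    """Read model in the hub: only as fresh as the last sync."""

    def __init__(self) -> None:
        self.view: Dict[str, int] = {}

    def query(self, key: str) -> Optional[int]:
        return self.view.get(key)


def sync(cmd: CommandSide, qry: QuerySide) -> int:
    """Apply pending change events to the read model; return how many."""
    applied = 0
    while cmd.changes:
        key, value = cmd.changes.popleft()
        qry.view[key] = value
        applied += 1
    return applied


cmd, qry = CommandSide(), QuerySide()
cmd.update("stock", 10)
print(qry.query("stock"))  # None (stale: the change has not been propagated yet)
sync(cmd, qry)
print(qry.query("stock"))  # 10
```

The window between `update()` and `sync()` is the latency the paragraph above refers to; shrinking it (for example, by streaming changes continuously) is where the synchronization complexity comes from.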

Data integration hubs are also primarily a source of cross-enterprise data: the data still has to be moved from the hub to wherever the actual processing and analysis take place. This means a DIH may fall slightly short of what today’s real-time data use cases require, not only in the speed, scale, and performance of data access but also in processing large amounts of data.

Conclusion

Think of a DIH as a “data bus” similar to an enterprise service bus (ESB), essentially creating a scalable and flexible plug-and-play architecture.

The first step to becoming data-driven is to gain access to all available data; once you have access, you have the opportunity to observe, evaluate, and analyze that data. If your organization is on a journey to becoming data-driven and is facing challenges in accessing data at speed and scale, a DIH is a potential first step: a minimally intrusive, decoupled architecture for high-speed access to relevant data, no matter where that data lives.

It is now also possible to overcome the DIH limitations described above. I’ve used the term “enterprise data ecosystem” a few times in this article. In my next article, I will examine what an enterprise data ecosystem looks like and how to address DIH limitations to tackle a new type of data challenge facing enterprises.

