The most impactful data-driven insights come from connecting the dots between all your data sources, including departments, services, on-premises tools, and third-party applications. But typically, connecting data requires complex extract, transform, and load (ETL) pipelines, which take hours or days. That is too slow for decision making. ETL needs to be simplified and sometimes eliminated.

AWS is investing in several ways to address this. First, for common use cases where ETL is repeated with little value-add, we are integrating services to reduce or eliminate the need for ETL. Second, organizations still need transformations such as cleaning, deduplication, and combining datasets for analytics and machine learning (ML). For these, AWS Glue provides fast, scalable data transformation. Third, AWS continues to add support for more data sources including software as a service (SaaS) applications, on-premises applications, and connections to other clouds so organizations can act on their data.

In this post, we discuss how we are complementing these investments with a number of data integration innovations spanning AWS databases, analytics, business intelligence (BI), and ML services.

Amazon Aurora MySQL zero-ETL integration with Amazon Redshift now generally available

In June 2023, we announced the public preview of Amazon Aurora MySQL-Compatible Edition zero-ETL integration with Amazon Redshift. We are pleased to announce that this zero-ETL integration is now generally available. The integration processes more than 1 million transactions per minute, enabling near real-time analytics. Within seconds of new data arriving in Amazon Aurora MySQL, the data is replicated to Amazon Redshift. Updates to Amazon Aurora MySQL are automatically and continuously propagated to Amazon Redshift. Customers and partners can save significant time by avoiding traditional ETL work. They can now analyze business metrics in near real time and make data-driven decisions faster.

For example, in the retail industry, Infosys wanted to gain faster insight into its business based on transactions in the store management system, such as best-selling products and high-revenue stores. To achieve this they used Amazon Aurora MySQL zero-ETL integration with Amazon Redshift. With this integration, Infosys replicated Aurora data into Amazon Redshift and created Amazon QuickSight dashboards for product managers and channel leaders in just seconds instead of several hours. Now, as part of the Infosys Cobalt and Infosys Topaz blueprint, enterprises can have near real-time analytics on transactional data, which can help them make informed decisions related to store management. – Sunil Senan, SVP and global head of data, analytics and AI, Infosys
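For readers who prefer to script the setup, the following is a minimal sketch of how such an integration might be requested with the AWS SDK for Python (boto3), assuming the RDS `create_integration` operation. The ARNs, the integration name, and the helper function names are hypothetical placeholders, and the sketch leaves out prerequisites such as cluster parameter settings and the Redshift resource policy.

```python
def build_integration_request(source_cluster_arn, target_namespace_arn, name):
    """Assemble keyword arguments for the RDS CreateIntegration call."""
    for label, arn in (("source", source_cluster_arn), ("target", target_namespace_arn)):
        if not arn.startswith("arn:"):
            raise ValueError(f"{label} must be a full ARN, got {arn!r}")
    return {
        "SourceArn": source_cluster_arn,    # Aurora MySQL cluster to replicate from
        "TargetArn": target_namespace_arn,  # Amazon Redshift namespace to replicate into
        "IntegrationName": name,
    }


def create_zero_etl_integration(request, region="us-east-1"):
    """Send the request to AWS. Requires credentials and suitable permissions."""
    import boto3  # imported lazily so the builder above works without the SDK

    rds = boto3.client("rds", region_name=region)
    return rds.create_integration(**request)


# Hypothetical identifiers, for illustration only.
request = build_integration_request(
    "arn:aws:rds:us-east-1:123456789012:cluster:example-aurora-cluster",
    "arn:aws:redshift-serverless:us-east-1:123456789012:namespace/example-namespace",
    "example-zero-etl-integration",
)
print(request)
# create_zero_etl_integration(request)  # uncomment to call AWS
```

Keeping the request builder separate from the network call makes it easy to review or log the parameters before anything is created.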

Amazon SageMaker Canvas integration with Amazon QuickSight

We’re empowering business analysts to create predictive, interactive dashboards by combining Amazon SageMaker Canvas, our no-code ML service, with Amazon QuickSight, our BI service. Business analysts use SageMaker Canvas to build ML models and generate predictions without the need to write code. They can then seamlessly integrate these predictions into QuickSight to create interactive dashboards that can be shared across their organization. This enables the democratization of predictive insights for better decision making.

Additionally, we’ve enabled deep, bidirectional integration between SageMaker Canvas and QuickSight. Business analysts can send ML models from SageMaker Canvas to QuickSight and run predictions from within QuickSight. Analysts can also send data directly from QuickSight to SageMaker Canvas with just a few clicks to build ML models faster using a simple point-and-click interface, without the need to create or maintain complex data pipelines between the two services. This integration empowers users to go from data to predictions and visualizations faster than ever before.

Connecting to SaaS applications

AWS services already connect to hundreds of AWS and third-party data sources. Data engineers can use services like Amazon AppFlow and AWS Glue to quickly access data from different sources. This enables organizations to gain integrated insights across entire datasets. We recently added new Amazon AppFlow and AWS Glue integrations to our existing portfolio.

Amazon AppFlow now supports concurrent processing for data transfers from SAP applications

Amazon AppFlow, a fully managed integration service that helps you securely move data between AWS services and SaaS applications, now supports concurrent processing and configurable page sizes for faster data transfers from SAP applications. This reduces the time taken to migrate SAP data to AWS data and artificial intelligence (AI) services.
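As an illustration of where these two settings might appear, the sketch below builds the source section of an AppFlow flow definition for an SAP OData source. It assumes the `parallelismConfig` and `paginationConfig` fields of the SAP OData source properties in the AppFlow API; verify the field names and allowed ranges against the current API reference. The profile name, object path, default values, and helper function name are hypothetical.

```python
def sap_odata_source_config(connector_profile, object_path,
                            max_parallelism=4, max_page_size=1000):
    """Build a sourceFlowConfig dict for an SAP OData source.

    max_parallelism: number of concurrent processes used for the transfer.
    max_page_size:   number of records fetched per page.
    The defaults here are arbitrary illustration values, not recommendations.
    """
    if max_parallelism < 1 or max_page_size < 1:
        raise ValueError("max_parallelism and max_page_size must be positive")
    return {
        "connectorType": "SAPOData",
        "connectorProfileName": connector_profile,
        "sourceConnectorProperties": {
            "SAPOData": {
                "objectPath": object_path,
                "parallelismConfig": {"maxParallelism": max_parallelism},
                "paginationConfig": {"maxPageSize": max_page_size},
            }
        },
    }


# Hypothetical profile and object path, for illustration only.
source = sap_odata_source_config(
    "example-sap-profile",
    "/sap/opu/odata/sap/EXAMPLE_SRV/ExampleSet",
    max_parallelism=8,
    max_page_size=5000,
)
print(source["sourceConnectorProperties"]["SAPOData"])
```

The resulting dictionary could be passed as the `sourceFlowConfig` argument when creating a flow with boto3. Suitable values for parallelism and page size depend on the SAP system's capacity.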

Google BigQuery connectivity to AWS Glue for Apache Spark now generally available

AWS Glue for Apache Spark adds native connectivity to Google BigQuery, enabling you to read and write BigQuery data directly without the need to install or manage libraries. You can now add BigQuery as a source or target in the visual interface of AWS Glue Studio or directly in an AWS Glue ETL script.
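In an AWS Glue ETL script, reading a BigQuery table might look like the following sketch. It assumes the `bigquery` connection type and the `connectionName`, `parentProject`, `sourceType`, and `table` connection options, and it expects a `GlueContext` created by the job itself. The connection name, project, table, and helper function names are hypothetical.

```python
def bigquery_read_options(connection_name, parent_project, table):
    """Build connection options for reading one BigQuery table."""
    return {
        "connectionName": connection_name,  # AWS Glue connection holding the Google credentials
        "parentProject": parent_project,    # Google Cloud project that owns the table
        "sourceType": "table",
        "table": table,                     # in the form "dataset.table"
    }


def read_bigquery_table(glue_context, options):
    """Return a DynamicFrame for the table described by `options`.

    `glue_context` is the awsglue GlueContext available inside a Glue job.
    """
    return glue_context.create_dynamic_frame.from_options(
        connection_type="bigquery",
        connection_options=options,
    )


# Hypothetical names, for illustration only.
options = bigquery_read_options(
    "example-bigquery-connection", "example-gcp-project", "example_dataset.orders"
)
print(options)
```

Inside a Glue job, `read_bigquery_table(glueContext, options)` would return a DynamicFrame that can be transformed and written to a target such as Amazon S3 or Amazon Redshift.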

Summary

The data integration innovations we’ve highlighted reflect our commitment to empowering organizations to easily connect their data. Whether it’s gaining real-time insights, democratizing predictive analytics, or connecting diverse data sources, we’re focused on helping you get more value from your data. With new capabilities from Amazon Aurora MySQL, Amazon Redshift, SageMaker Canvas, QuickSight, Amazon AppFlow, and AWS Glue, data engineers and business analysts can break down data silos to uncover insights.

About the authors

Rahul Pathak is the Vice President of Relational Database Engines, leading Amazon Aurora, Amazon Redshift, and Amazon QLDB. Prior to his current role, he was VP of Analytics at AWS, where he worked across the entire AWS database portfolio. He has co-founded two companies, one focused on digital media analytics and the other focused on IP-geolocation.

G2 Krishnamurthy is the Vice President of Analytics, leading AWS Data Lake services, data integration, Amazon OpenSearch Service, and Amazon QuickSight. Prior to his current role, G2 built and ran the analytics and ML platform at Facebook/Meta, and built various parts of SQL Server Database, Azure Analytics, and Azure ML at Microsoft.

