AWS Big Data Blog

Architecture

Develop a business chargeback model within your organization using Amazon Redshift multi-warehouse writes

Now, we are announcing general availability (GA) of Amazon Redshift multi-data warehouse writes through data sharing. This new capability allows you to scale your write workloads and achieve better performance for extract, transform, and load (ETL) workloads by using different warehouses of different types and sizes based on your workload needs.

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

In this post, we explore how to use Aurora MySQL-Compatible Edition Zero-ETL integration with Amazon Redshift and dbt Cloud to enable near real-time analytics. By using dbt Cloud for data transformation, data teams can focus on writing business rules to drive insights from their transaction data to respond effectively to critical, time sensitive events.

Intel Accelerators on Amazon OpenSearch Service improve price-performance on vector search by up to 51%

OpenSearch Service is a managed service for the OpenSearch search and analytics suite, which includes support for vector search. By running your OpenSearch 2.17+ domains on C/M/R 7i instances, you can achieve up to a 51% price-performance gain compared to the past R5 instances on OpenSearch Service. As we discuss in this post, this launch offers improvements to your infrastructure total cost of ownership (TCO) and savings.

Run Apache XTable in AWS Lambda for background conversion of open table formats

In this post, we explore how Apache XTable, combined with the AWS Glue Data Catalog, enables background conversions between open table formats residing on Amazon S3-based data lakes, with minimal to no changes to existing pipelines, in a scalable and cost-effective way.

Introducing generative AI troubleshooting for Apache Spark in AWS Glue (preview)

This post demonstrates how generative AI troubleshooting for Spark in AWS Glue helps your day-to-day Spark application debugging. It simplifies the debugging process for your Spark applications by using generative AI to automatically identify the root cause of failures and provides actionable recommendations to resolve the issues.

Introducing generative AI upgrades for Apache Spark in AWS Glue (preview)

Today, we are excited to announce the preview of generative AI upgrades for Spark, a new capability that enables data practitioners to quickly upgrade and modernize their Spark applications running on AWS. Starting with Spark jobs in AWS Glue, this feature allows you to upgrade from an older AWS Glue version to AWS Glue version 4.0. This new capability reduces the time data engineers spend on modernizing their Spark applications, allowing them to focus on building new data pipelines and getting valuable analytics faster.

Accelerate your data workflows with Amazon Redshift Data API persistent sessions

In this post, we’ll walk through an example ETL process that uses session reuse to efficiently create, populate, and query temporary staging tables across the full data transformation workflow—all within the same persistent Amazon Redshift database session. You’ll learn best practices for optimizing ETL orchestration code, reducing job runtimes by eliminating connection overhead, and simplifying pipeline complexity

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

We are excited to announce that the dbt adapter for Amazon Athena is now officially supported in dbt Cloud. This integration enables data teams to efficiently transform and manage data using Athena with dbt Cloud’s robust features, enhancing the overall data workflow experience. In this post, we discuss the advantages of dbt Cloud over dbt Core, common use cases, and how to get started with Amazon Athena using the dbt adapter.