AWS Glue Data Catalog now supports automatic compaction of Apache Iceberg tables

Posted on November 15, 2023 By Editorial Team

Today, we’re making available a new capability of AWS Glue Data Catalog to allow automatic compaction of transactional tables in the Apache Iceberg format. This allows you to keep your transactional data lake tables always performant.

Data lakes were initially designed primarily for storing vast amounts of raw, unstructured, or semi-structured data at a low cost, and they were commonly associated with big data and analytics use cases. Over time, the number of use cases for data lakes has grown as organizations have recognized the potential to use them for more than just reporting, which requires transactional capabilities to ensure data consistency.

Data lakes also play a pivotal role in data quality, governance, and compliance, particularly as they store increasing volumes of critical business data, which often requires updates or deletion. Data-driven organizations also need to keep their back-end analytics systems in near real-time sync with customer applications. This scenario requires transactional capabilities on your data lake to support concurrent writes and reads without compromising data integrity. Finally, data lakes now serve as integration points, necessitating transactions for safe and reliable data movement between various sources.

To support transactional semantics on data lake tables, organizations adopted an open table format (OTF), such as Apache Iceberg. Adopting an OTF comes with its own set of challenges: transforming existing data lake tables from Parquet or Avro formats to an OTF, managing a large number of small files as each transaction generates a new file on Amazon Simple Storage Service (Amazon S3), or managing object and metadata versioning at scale, just to name a few. Organizations typically build and manage their own data pipelines to address these challenges, leading to additional undifferentiated work on infrastructure. You need to write code, deploy Spark clusters to run your code, scale the cluster, manage errors, and so on.

When talking with our customers, we learned that the most challenging aspect is the compaction of individual small files produced by each transactional write on tables into a few large files. Large files are faster to read and scan, making your analytics jobs and queries faster to execute. Compaction optimizes the table storage with larger-sized files. It changes the storage for the table from a large number of small files to a small number of larger files. It reduces metadata overhead, lowers network round trips to S3, and improves performance. When you use engines that charge for the compute, the performance improvement is also beneficial to the cost of usage as the queries require less compute capacity to run.
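To make the effect of compaction concrete, here is a minimal sketch of a greedy planner that groups many small files into a few larger ones. It is only an illustration: the function name `plan_compaction` and the 512 MB target size are assumptions for this example, and the managed service does not document its internal planning here.

```python
from typing import List


def plan_compaction(file_sizes_mb: List[float], target_mb: float = 512.0) -> List[List[float]]:
    """Greedily group files so that each group is at most target_mb in total.

    A single file larger than target_mb ends up in a group of its own.
    target_mb is a hypothetical value chosen for this illustration.
    """
    groups: List[List[float]] = []
    current: List[float] = []
    current_size = 0.0
    # Largest files first, so oversized files are isolated early.
    for size in sorted(file_sizes_mb, reverse=True):
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0.0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups


# 200 small files of 8 MB each, as many transactional writes might produce.
small_files = [8.0] * 200
groups = plan_compaction(small_files, target_mb=512.0)
print(len(small_files), "small files would be rewritten as", len(groups), "larger files")
```

Each resulting group corresponds to one rewritten file, so a query engine opens far fewer objects to read the same data.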

But building custom pipelines to compact and optimize Iceberg tables is time-consuming and expensive. You have to manage the planning, provision infrastructure, and schedule and monitor the compaction jobs. This is why we are launching automatic compaction today.

Let’s see how it works
To show you how to enable and monitor automatic compaction on Iceberg tables, I start from the AWS Lake Formation page or the AWS Glue page of the AWS Management Console. I have an existing database with tables in the Iceberg format. I execute transactions on one of these tables over the course of a couple of days, and the table starts to fragment into small files on the underlying S3 bucket.

I select the table on which I want to enable compaction, and then I select Enable compaction.

An IAM role is required to pass permissions to the Lake Formation service to access my AWS Glue tables, S3 buckets, and CloudWatch log streams. I can either create a new IAM role or select an existing one. An existing role must have lakeformation:GetDataAccess and glue:UpdateTable permissions on the table. The role also needs logs:CreateLogGroup, logs:CreateLogStream, and logs:PutLogEvents on "arn:aws:logs:*:your_account_id:log-group:/aws-lakeformation-acceleration/compaction/logs:*". The role’s trust policy must name glue.amazonaws.com as the trusted service.
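The role requirements above can be sketched as two IAM policy documents built in Python. This is a hedged illustration, not an official policy: the function names are made up for this example, the Glue resource ARNs are supplied by the caller, and granting lakeformation:GetDataAccess on "*" is an assumption you should check against your own security requirements.

```python
import json
from typing import List


def build_trust_policy() -> dict:
    """Trust policy that lets the Glue service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_permissions_policy(account_id: str, glue_resources: List[str]) -> dict:
    """Permissions listed in the post; resource scoping is an assumption."""
    log_arn = (
        f"arn:aws:logs:*:{account_id}:"
        "log-group:/aws-lakeformation-acceleration/compaction/logs:*"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Assumption: granted on all resources; adjust to your needs.
            {"Effect": "Allow", "Action": ["lakeformation:GetDataAccess"], "Resource": "*"},
            # glue_resources is caller-supplied (for example, the ARNs that cover the table).
            {"Effect": "Allow", "Action": ["glue:UpdateTable"], "Resource": glue_resources},
            {
                "Effect": "Allow",
                "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": log_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and table ARN, for illustration only.
    policy = build_permissions_policy(
        "123456789012",
        ["arn:aws:glue:us-east-1:123456789012:table/my_database/my_table"],
    )
    print(json.dumps(build_trust_policy(), indent=2))
    print(json.dumps(policy, indent=2))
```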

Then, I select Turn on compaction. Et voilà! Compaction is automatic; there is nothing to manage on your side.

The service starts to measure the table’s rate of change. As Iceberg tables can have multiple partitions, the service calculates this change rate for each partition and schedules managed jobs to compact the partitions where this rate of change breaches a threshold value.
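The per-partition decision described above can be pictured with a small sketch. The service does not publish how it measures the rate of change or what threshold it uses, so the measure used here (files added per partition within an observation window), the function name, and the threshold value are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def partitions_to_compact(
    commits: Iterable[Tuple[str, int]], threshold: int
) -> List[str]:
    """Return partitions whose number of newly added files exceeds threshold.

    commits is a sequence of (partition, files_added) pairs observed in
    some window. Both the measure and the threshold are illustrative.
    """
    added = Counter()
    for partition, files_added in commits:
        added[partition] += files_added
    return sorted(p for p, count in added.items() if count > threshold)


observed = [
    ("day=2023-11-13", 40),
    ("day=2023-11-14", 5),
    ("day=2023-11-13", 75),
    ("day=2023-11-15", 100),
]
print(partitions_to_compact(observed, threshold=100))
```

Only the partition that breaches the threshold is scheduled, so quiet partitions are left untouched.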

When the table accumulates a high number of changes, you will be able to view the Compaction history under the Optimization tab in the console.

You can also monitor the whole process either by observing the number of files on your S3 bucket (use the NumberOfObjects metric) or one of the two new Lake Formation metrics: numberOfBytesCompacted or numberOfFilesCompacted.
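As a sketch of the first monitoring option, the following builds a CloudWatch query for the S3 NumberOfObjects metric. The helper names are illustrative, and the `cloudwatch` argument is assumed to be a boto3 CloudWatch client (for example `boto3.client("cloudwatch")`). Note that this is a daily, bucket-level storage metric, so it reflects every object in the bucket and not only the files of one table. The namespace of the two Lake Formation metrics is not stated in this post, so they are left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Tuple


def number_of_objects_query(bucket: str, end: datetime, days: int = 7) -> Dict[str, Any]:
    """Build parameters for CloudWatch get_metric_statistics."""
    return {
        "Namespace": "AWS/S3",
        "MetricName": "NumberOfObjects",
        "Dimensions": [
            {"Name": "BucketName", "Value": bucket},
            {"Name": "StorageType", "Value": "AllStorageTypes"},
        ],
        "StartTime": end - timedelta(days=days),
        "EndTime": end,
        "Period": 86400,  # S3 storage metrics are reported once per day
        "Statistics": ["Average"],
    }


def daily_object_counts(cloudwatch: Any, bucket: str, end: datetime, days: int = 7) -> List[Tuple[datetime, float]]:
    """Return (timestamp, object count) pairs, oldest first."""
    response = cloudwatch.get_metric_statistics(**number_of_objects_query(bucket, end, days))
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p["Average"]) for p in points]


if __name__ == "__main__":
    # "my-data-lake-bucket" is a placeholder bucket name.
    query = number_of_objects_query("my-data-lake-bucket", datetime.now(timezone.utc))
    print(query["MetricName"], query["StartTime"], query["EndTime"])
```

A falling object count after compaction runs is a rough signal that small files are being merged.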

In addition to the AWS console, there are six new APIs that expose this new capability: CreateTableOptimizer, BatchGetTableOptimizer, UpdateTableOptimizer, DeleteTableOptimizer, GetTableOptimizer, and ListTableOptimizerRuns. These APIs are available in the AWS SDKs and AWS Command Line Interface (AWS CLI). As usual, don’t forget to update the SDK or the CLI to their latest versions to get access to these new APIs.
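Here is a minimal sketch of calling two of these APIs from Python. It assumes `glue` is a boto3 Glue client from a recent SDK version (for example `boto3.client("glue")`). The wrapper function names are illustrative, and the request and response field names reflect my understanding of the SDK, so check them against the current API reference before relying on them.

```python
from typing import Any, Dict, List


def enable_compaction(glue: Any, catalog_id: str, database: str, table: str, role_arn: str) -> None:
    """Turn on the compaction optimizer for one Iceberg table."""
    glue.create_table_optimizer(
        CatalogId=catalog_id,  # the AWS account ID that owns the catalog
        DatabaseName=database,
        TableName=table,
        Type="compaction",
        TableOptimizerConfiguration={"roleArn": role_arn, "enabled": True},
    )


def recent_compaction_runs(
    glue: Any, catalog_id: str, database: str, table: str, max_results: int = 10
) -> List[Dict[str, Any]]:
    """Fetch the most recent compaction runs for one table."""
    response = glue.list_table_optimizer_runs(
        CatalogId=catalog_id,
        DatabaseName=database,
        TableName=table,
        Type="compaction",
        MaxResults=max_results,
    )
    return response.get("TableOptimizerRuns", [])
```

Passing the client in as an argument keeps the helpers easy to exercise with a stub, without credentials or network access.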

Things to know
As we launch this new capability today, there are a few additional points I’d like to share with you:

  • Compaction will not merge delete files. Tables with deleted data will be compacted, but data files that have delete files associated with them will be skipped.
  • S3 buckets configured for exclusive access from a VPC through VPC endpoints are not supported.
  • Apache Iceberg tables using Apache Parquet to store the data can be compacted.
  • Compaction works on buckets encrypted with the default server-side encryption (SSE-S3) or server-side encryption with KMS managed keys (SSE-KMS).
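The first and third points can be illustrated with a small filter that picks the data files a compaction pass would consider. The `DataFile` type and its fields are hypothetical and exist only for this sketch; they are not part of the Iceberg or AWS APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFile:
    """Hypothetical description of one data file in an Iceberg table."""
    path: str
    file_format: str          # for example "parquet" or "avro"
    has_delete_files: bool    # True if delete files reference this data file


def eligible_for_compaction(files: List[DataFile]) -> List[DataFile]:
    """Keep Parquet data files that have no associated delete files."""
    return [
        f for f in files
        if f.file_format.lower() == "parquet" and not f.has_delete_files
    ]


files = [
    DataFile("data/a.parquet", "parquet", False),
    DataFile("data/b.parquet", "parquet", True),   # skipped: has delete files
    DataFile("data/c.avro", "avro", False),        # skipped: not Parquet
]
print([f.path for f in eligible_for_compaction(files)])
```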

Availability
This new capability is available in Asia Pacific (Tokyo), US East (N. Virginia), US East (Ohio), US West (Oregon) and Europe (Ireland).

The pricing metric is the data processing unit (DPU), a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. You are charged per DPU-hour, metered by the second, with a one-minute minimum.
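The billing rule can be worked through with a short calculation. The per-DPU-hour rate is not given in this post, so it is a parameter here, and the rates in the example calls are placeholders and not actual prices. Applying the one-minute minimum per run is also an assumption.

```python
def compaction_cost(dpus: float, runtime_seconds: float, price_per_dpu_hour: float) -> float:
    """Estimate the charge for one run: per-second metering, 60-second minimum."""
    billed_seconds = max(runtime_seconds, 60.0)
    return dpus * (billed_seconds / 3600.0) * price_per_dpu_hour


# Placeholder rate of 1.0 per DPU-hour, for illustration only.
half_hour_run = compaction_cost(dpus=4, runtime_seconds=1800, price_per_dpu_hour=1.0)
short_run = compaction_cost(dpus=4, runtime_seconds=10, price_per_dpu_hour=1.0)
print(half_hour_run, short_run)
```

A 10-second run is billed as a full minute, so very short runs cost the same as a 60-second one.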

Now it’s time to decommission your existing compaction data pipeline and switch to this new, entirely managed capability today.

— seb

Amazon Simple Storage Service (S3), Analytics, Announcements, AWS Glue, AWS Lake Formation, News
