Delta Lake Essential Fundamentals: Part 1 - ACID
🎉 Welcome to the first part of Delta Lake essential fundamentals! 🎉
What is Delta Lake?
Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads.
The Delta Lake open-source project consists of 3 repositories:
- delta - Delta Lake core, written in Scala.
- delta-rs - Rust library with bindings for Python and Ruby.
- connectors - Connectors to popular big data engines outside Spark, written mostly in Scala.
Delta gives us the ability to "travel back in time" to previous versions of our data. It also provides scalable metadata handling: when a large set of raw data is stored in a data lake, that metadata gives us the flexibility needed for analytics and data exploration. In addition, Delta offers a mechanism to unify streaming and batch data processing.
Schema enforcement handles schema variations to prevent the insertion of bad or non-compliant records, and ACID transactions ensure that readers never see inconsistent data.
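To build intuition for how time travel works, here is a toy sketch (not Delta's actual implementation) of the idea behind it: Delta records every commit as a numbered JSON file in a `_delta_log` directory, and replaying the log up to version N reconstructs the table as it looked at that version. The file naming and JSON shape below are simplified assumptions for illustration.

```python
import json
import os
import tempfile

# Toy model: each commit is a numbered JSON file listing the rows it adds.
# Replaying commits 0..N rebuilds the table "as of" version N (time travel).

def commit(log_dir, version, added_rows):
    """Write a commit file (zero-padded version number) for this transaction."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    with open(path, "w") as f:
        json.dump({"add": added_rows}, f)

def read_as_of(log_dir, version):
    """Rebuild the table state by replaying all commits up to `version`."""
    rows = []
    for v in range(version + 1):
        path = os.path.join(log_dir, f"{v:020d}.json")
        with open(path) as f:
            rows.extend(json.load(f)["add"])
    return rows

log_dir = tempfile.mkdtemp()  # stand-in for a table's _delta_log directory
commit(log_dir, 0, [{"id": 1}])
commit(log_dir, 1, [{"id": 2}])

print(read_as_of(log_dir, 0))  # the table as of version 0
print(read_as_of(log_dir, 1))  # the latest version
```

In real Delta on Spark, the same idea surfaces as a read option, e.g. `spark.read.format("delta").option("versionAsOf", 0)`.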
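The following is a minimal sketch of the schema-enforcement idea, not Delta's actual code: every incoming record is validated against the table schema before the write is committed, and a single bad record rejects the whole batch, so readers never observe partially written, non-compliant data. The schema and helper names here are hypothetical.

```python
# Hypothetical table schema: column name -> expected Python type.
TABLE_SCHEMA = {"id": int, "name": str}

def validate_batch(records, schema):
    """Raise ValueError if any record deviates from the schema."""
    for record in records:
        if set(record) != set(schema):
            raise ValueError(f"schema mismatch: {record}")
        for col, col_type in schema.items():
            if not isinstance(record[col], col_type):
                raise ValueError(f"bad type for column {col!r}: {record[col]!r}")

def write_batch(table, records, schema):
    """All-or-nothing append: validate first, commit only if everything passes."""
    validate_batch(records, schema)  # any bad record rejects the whole batch
    table.extend(records)            # reached only when every record is valid

table = []
write_batch(table, [{"id": 1, "name": "a"}], TABLE_SCHEMA)

try:
    # Second record has a string id, so the entire batch is rejected.
    write_batch(table, [{"id": 2, "name": "b"},
                        {"id": "oops", "name": "c"}], TABLE_SCHEMA)
except ValueError:
    pass  # the table is unchanged by the failed write

print(table)  # only the first, valid batch was committed
```

In real Delta on Spark, writing a DataFrame whose schema does not match the table raises an error unless you explicitly opt in to schema evolution (e.g. the `mergeSchema` write option).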
