Delta Lake Essential Fundamentals: Part 1 - ACID
🎉 Welcome to the first part of Delta Lake essential fundamentals! 🎉
What is Delta Lake?
Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads.
The Delta Lake open-source project consists of 3 repositories:
- delta - Delta Lake core, written in Scala.
- delta-rs - Rust library with bindings for Python and Ruby.
- connectors - Connectors to popular big data engines outside Spark, written mostly in Scala.
Delta gives us the ability to "travel back in time" to previous versions of our data. It also provides scalable metadata handling: when a large set of raw data is stored in a data lake, that metadata gives us the flexibility needed for analytics and data exploration. In addition, Delta offers a mechanism to unify streaming and batch data processing.
Schema enforcement handles schema variations to prevent the insertion of bad or non-compliant records, and ACID transactions ensure that readers never see inconsistent data.
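To build intuition for how time travel works, here is a toy sketch (not Delta's actual implementation) of the idea behind it: Delta records every commit as a numbered JSON file in a `_delta_log` directory, and replaying the log up to version N reconstructs the table as it looked at that version. The file naming and JSON shape below are simplified assumptions for illustration.

```python
import json
import os
import tempfile

# Toy model: each commit is a numbered JSON file listing the rows it adds.
# Replaying commits 0..N rebuilds the table "as of" version N (time travel).

def commit(log_dir, version, added_rows):
    """Write a commit file (zero-padded version number) for this transaction."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    with open(path, "w") as f:
        json.dump({"add": added_rows}, f)

def read_as_of(log_dir, version):
    """Rebuild the table state by replaying all commits up to `version`."""
    rows = []
    for v in range(version + 1):
        path = os.path.join(log_dir, f"{v:020d}.json")
        with open(path) as f:
            rows.extend(json.load(f)["add"])
    return rows

log_dir = tempfile.mkdtemp()  # stand-in for a table's _delta_log directory
commit(log_dir, 0, [{"id": 1}])
commit(log_dir, 1, [{"id": 2}])

print(read_as_of(log_dir, 0))  # the table as of version 0
print(read_as_of(log_dir, 1))  # the latest version
```

In real Delta on Spark, the same idea surfaces as a read option, e.g. `spark.read.format("delta").option("versionAsOf", 0)`.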
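The following is a minimal sketch of the schema-enforcement idea, not Delta's actual code: every incoming record is validated against the table schema before the write is committed, and a single bad record rejects the whole batch, so readers never observe partially written, non-compliant data. The schema and helper names here are hypothetical.

```python
# Hypothetical table schema: column name -> expected Python type.
TABLE_SCHEMA = {"id": int, "name": str}

def validate_batch(records, schema):
    """Raise ValueError if any record deviates from the schema."""
    for record in records:
        if set(record) != set(schema):
            raise ValueError(f"schema mismatch: {record}")
        for col, col_type in schema.items():
            if not isinstance(record[col], col_type):
                raise ValueError(f"bad type for column {col!r}: {record[col]!r}")

def write_batch(table, records, schema):
    """All-or-nothing append: validate first, commit only if everything passes."""
    validate_batch(records, schema)  # any bad record rejects the whole batch
    table.extend(records)            # reached only when every record is valid

table = []
write_batch(table, [{"id": 1, "name": "a"}], TABLE_SCHEMA)

try:
    # Second record has a string id, so the entire batch is rejected.
    write_batch(table, [{"id": 2, "name": "b"},
                        {"id": "oops", "name": "c"}], TABLE_SCHEMA)
except ValueError:
    pass  # the table is unchanged by the failed write

print(table)  # only the first, valid batch was committed
```

In real Delta on Spark, writing a DataFrame whose schema does not match the table raises an error unless you explicitly opt in to schema evolution (e.g. the `mergeSchema` write option).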
