A Review of “Designing Data-Intensive Applications”

Chapter 3 discusses the storage and retrieval of data, and I think it is one of the best chapters for explaining how many databases work. Big-O notation is introduced to explain the computational complexity of algorithms. Append-only logs, B-trees, Bloom filters, hash maps, sorted string tables (SSTables), and log-structured merge-trees (LSM-trees) all come up. Storage-engine implementation details, such as how to handle deleted records, crash recovery, partially written records, and concurrency control, are covered as well. The chapter also explains how these ideas play a role in systems such as Google's Bigtable, HBase, Cassandra, and Elasticsearch, to name a few. There is a fun section where the world's "simplest" database, a key-value store, is implemented using two functions in bash (a sketch in that spirit appears at the end of this post).

Page 88 onward does a good job of contrasting OLTP and OLAP systems and uses this as a segue into data warehousing. Data cubes, ETL, column-oriented storage, star and snowflake schemas, fact and dimension tables, sort order, and aggregation are all discussed. Teradata, Vertica, SAP HANA, ParAccel, Redshift, and Hadoop are mentioned as systems that incorporate these concepts into their offerings.

The fourth chapter discusses data encoding techniques and how data can be stored so that its structure can evolve. Martin is a contributor to Apache Avro, a file format project started by Doug Cutting, the creator of Hadoop, and he does an amazing job of explaining how Avro files have "writer's" and "reader's" schemas.
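
On the "simplest database" example from Chapter 3: the book builds it from two shell functions, one that appends a key-value pair to a file and one that reads the latest value back. Below is a minimal sketch in that spirit; the function names db_set and db_get and the database file name are from my recollection of the listing, so the exact code in the book may differ slightly.

```bash
#!/bin/bash
# Append-only key-value store: db_set appends "key,value" to a log file,
# db_get scans the file and returns the most recent value for a key.

db_set () {
    echo "$1,$2" >> database
}

db_get () {
    grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
}
```

Writes are cheap because they only append to the end of the file, but every read scans the whole file, so lookups are O(n); that trade-off is what motivates the chapter's discussion of indexes such as hash indexes, SSTables, and B-trees.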
