ClickHouse is an open source, column-oriented database. It has a sweet spot where hundreds of analysts can query non-rolled-up /…
cab_type
Convert CSVs to ORC Faster
Every analytical database I've used converts imported data into a form that is quicker to read. Often this means storing…
1.1 Billion Taxi Rides on ClickHouse & an Intel Core i5
$ clickhouse-client

CREATE TABLE trips (
    trip_id UInt32,
    vendor_id String,
    pickup_datetime DateTime,
    dropoff_datetime Nullable(DateTime),
    store_and_fwd_flag Nullable(FixedString(1)),
    rate_code_id Nullable(UInt8),
    pickup_longitude Nullable(Float64),…
A Billion Taxi Rides on Amazon EMR running Spark
$ hive

CREATE EXTERNAL TABLE trips_orc (
    trip_id INT,
    vendor_id STRING,
    pickup_datetime TIMESTAMP,
    dropoff_datetime TIMESTAMP,
    store_and_fwd_flag STRING,
    rate_code_id SMALLINT,
    pickup_longitude…
A Billion Taxi Rides: AWS S3 versus HDFS
$ hive

CREATE EXTERNAL TABLE trips_orc_s3 (
    trip_id INT,
    vendor_id STRING,
    pickup_datetime TIMESTAMP,
    dropoff_datetime TIMESTAMP,
    store_and_fwd_flag STRING,
    rate_code_id SMALLINT,
    pickup_longitude…