ClickHouse is an open-source, column-oriented database. It has a sweet spot where hundreds of analysts can query non-rolled-up /…
passenger_count
50-node Presto Cluster on Amazon EMR
$ hive CREATE EXTERNAL TABLE trips_orc ( trip_id INT, vendor_id STRING, pickup_datetime TIMESTAMP, dropoff_datetime TIMESTAMP, store_and_fwd_flag STRING, rate_code_id SMALLINT, pickup_longitude…
1.1 Billion Taxi Rides on AWS EMR 5.3.0 & Spark 2.1.0
SELECT passenger_count, year(pickup_datetime), count(*) FROM trips_parquet GROUP BY passenger_count, year(pickup_datetime); The following completed in 85.942 seconds. SELECT passenger_count, year(pickup_datetime) trip_year, round(trip_distance),…
1.1 Billion Taxi Rides on ClickHouse & an Intel Core i5
$ clickhouse-client CREATE TABLE trips ( trip_id UInt32, vendor_id String, pickup_datetime DateTime, dropoff_datetime Nullable(DateTime), store_and_fwd_flag Nullable(FixedString(1)), rate_code_id Nullable(UInt8), pickup_longitude Nullable(Float64),…
A Billion Taxi Rides on Amazon EMR running Spark
$ hive CREATE EXTERNAL TABLE trips_orc ( trip_id INT, vendor_id STRING, pickup_datetime TIMESTAMP, dropoff_datetime TIMESTAMP, store_and_fwd_flag STRING, rate_code_id SMALLINT, pickup_longitude…
1.1 Billion Taxi Rides: EC2 versus EMR
SELECT passenger_count, year(pickup_datetime), count(*) FROM trips_orc GROUP BY passenger_count, year(pickup_datetime); The following completed in 65 seconds. SELECT passenger_count, year(pickup_datetime) trip_year, round(trip_distance),…
1.1 Billion Taxi Rides with SQLite, Parquet & HDFS
CREATE VIEW query_3_view AS SELECT passenger_count, STRFTIME('%Y', DATETIME(pickup_datetime / 1000, 'unixepoch')) AS pickup_year, COUNT(*) AS num_records FROM trips_0 GROUP BY…
1.1 Billion Taxi Rides on Amazon Athena
CREATE EXTERNAL TABLE trips_parquet ( trip_id INT, vendor_id STRING, pickup_datetime TIMESTAMP, dropoff_datetime TIMESTAMP, store_and_fwd_flag STRING, rate_code_id SMALLINT, pickup_longitude DOUBLE, pickup_latitude…
A Billion Taxi Rides on Amazon EMR running Presto
$ screen $ hive CREATE EXTERNAL TABLE trips_csv ( trip_id INT, vendor_id VARCHAR(3), pickup_datetime TIMESTAMP, dropoff_datetime TIMESTAMP, store_and_fwd_flag VARCHAR(1), rate_code_id…
A Billion Taxi Rides: AWS S3 versus HDFS
$ hive CREATE EXTERNAL TABLE trips_orc_s3 ( trip_id INT, vendor_id STRING, pickup_datetime TIMESTAMP, dropoff_datetime TIMESTAMP, store_and_fwd_flag STRING, rate_code_id SMALLINT, pickup_longitude…