For each crawl (there is usually one a month) there can be upwards of 60,000 warc.gz files. These are all…
A Billion Taxi Rides on Amazon EMR running Presto
$ screen
$ hive

CREATE EXTERNAL TABLE trips_csv (
    trip_id INT,
    vendor_id VARCHAR(3),
    pickup_datetime TIMESTAMP,
    dropoff_datetime TIMESTAMP,
    store_and_fwd_flag VARCHAR(1),
    rate_code_id…