1.1 Billion Taxi Rides with MapD & 8 Nvidia Tesla K80s

| |===============================+======================+======================| | 0 Tesla K80 On | 0000:06:00.0 Off | Off | | N/A 48C P0 67W / 149W | 2748MiB / 12287MiB | 0% Default | +——————————-+———————-+———————-+ | 1 Tesla K80 On | 0000:07:00.0 Off | Off | | N/A 36C P0 78W / 149W | 2714MiB / 12287MiB | 0% Default | +——————————-+———————-+———————-+ | 2 Tesla K80 On | 0000:0A:00.0 Off | Off | | N/A 47C P0 66W / 149W | 2714MiB / 12287MiB | 0% Default | +——————————-+———————-+———————-+ | 3 Tesla K80 On | 0000:0B:00.0 Off | Off | | N/A 35C P0 77W / 149W | 2714MiB / 12287MiB | 0% Default | +——————————-+———————-+———————-+ | 4 Tesla K80 On | 0000:10:00.0 Off | Off | | N/A 44C P0 64W / 149W | 2714MiB / 12287MiB | 0% Default | +——————————-+———————-+———————-+ | 5 Tesla K80 On | 0000:11:00.0 Off | Off | | N/A 33C P0 76W / 149W | 2714MiB / 12287MiB | 0% Default | +——————————-+———————-+———————-+ | 6 Tesla K80 On | 0000:14:00.0 Off | Off | | N/A 47C P0 64W / 149W | 2714MiB / 12287MiB | 0% Default | +——————————-+———————-+———————-+ | 7 Tesla K80 On | 0000:15:00.0 Off | Off | | N/A 36C P0 74W / 149W | 2714MiB / 12287MiB | 0% Default | +——————————-+———————-+———————-+ | 8 Tesla K80 On | 0000:86:00.0 Off | Off | | N/A 43C P0 61W / 149W | 2714MiB / 12287MiB | 0% Default | +——————————-+———————-+———————-+ | 9 Tesla K80 On | 0000:87:00.0 Off | Off | | N/A 34C P0 77W / 149W | 2714MiB / 12287MiB | 0% Default | +——————————-+———————-+———————-+ | 10 Tesla K80 On | 0000:8A:00.0 Off | Off | | N/A 47C P0 66W / 149W | 2714MiB / 12287MiB | 0% Default | +——————————-+———————-+———————-+ | 11 Tesla K80 On | 0000:8B:00.0 Off | Off | | N/A 34C P0 78W / 149W | 2714MiB / 12287MiB | 0% Default | +——————————-+———————-+———————-+ | 12 Tesla K80 On | 0000:90:00.0 Off | Off | | N/A 44C P0 64W / 149W | 2714MiB / 12287MiB | 0% Default | +——————————-+———————-+———————-+ | 13 Tesla K80 On | 0000:91:00.0 Off | Off | | N/A 36C P0 77W / 149W | 2714MiB / 12287MiB | 0% Default | +——————————-+———————-+———————-+ | 14 Tesla K80 On | 0000:94:00.0 Off | Off | | N/A 45C P0 65W / 149W | 2714MiB / 12287MiB | 0% Default | +——————————-+———————-+———————-+ | 15 Tesla K80 On | 0000:95:00.0 Off | Off | | N/A 36C P0 76W / 149W | 2714MiB / 12287MiB | 0% Default | +——————————-+———————-+———————-+ +—————————————————————————–+ | Processes: GPU Memory | | GPU PID Type Process name Usage | |=============================================================================| | 0 7542 C+G /raidStorage/prod/mapd/bin/mapd_server 2666MiB | | 0 10718 G /usr/bin/Xorg 16MiB | | 1 7542 C+G /raidStorage/prod/mapd/bin/mapd_server 2632MiB | | 1 10718 G /usr/bin/Xorg 16MiB | | 2 7542 C+G /raidStorage/prod/mapd/bin/mapd_server 2632MiB | | 2 10718 G /usr/bin/Xorg 16MiB | | 3 7542 C+G /raidStorage/prod/mapd/bin/mapd_server 2632MiB | | 3 10718 G /usr/bin/Xorg 16MiB | | 4 7542 C+G /raidStorage/prod/mapd/bin/mapd_server 2632MiB | | 4 10718 G /usr/bin/Xorg 16MiB | | 5 7542 C+G /raidStorage/prod/mapd/bin/mapd_server 2632MiB | | 5 10718 G /usr/bin/Xorg 16MiB | | 6 7542 C+G /raidStorage/prod/mapd/bin/mapd_server 2632MiB | | 6 10718 G /usr/bin/Xorg 16MiB | | 7 7542 C+G /raidStorage/prod/mapd/bin/mapd_server 2632MiB | | 7 10718 G /usr/bin/Xorg 16MiB | | 8 7542 C+G /raidStorage/prod/mapd/bin/mapd_server 2632MiB | | 8 10718 G /usr/bin/Xorg 16MiB | | 9 7542 C+G /raidStorage/prod/mapd/bin/mapd_server 2632MiB | | 9 10718 G /usr/bin/Xorg 16MiB | | 10 7542 C+G /raidStorage/prod/mapd/bin/mapd_server 2632MiB | | 10 10718 G /usr/bin/Xorg 16MiB | | 11 7542 C+G /raidStorage/prod/mapd/bin/mapd_server 2632MiB | | 11 10718 G /usr/bin/Xorg 16MiB | | 12 7542 C+G /raidStorage/prod/mapd/bin/mapd_server 2632MiB | | 12 10718 G /usr/bin/Xorg 16MiB | | 13 7542 C+G /raidStorage/prod/mapd/bin/mapd_server 2632MiB | | 13 10718 G /usr/bin/Xorg 16MiB | | 14 7542 C+G /raidStorage/prod/mapd/bin/mapd_server 2632MiB | | 14 10718 G /usr/bin/Xorg 16MiB | | 15 7542 C+G /raidStorage/prod/mapd/bin/mapd_server 2632MiB | | 15 10718 G /usr/bin/Xorg 16MiB | +—————————————————————————–+ The machine is running CentOS 7.2.1511 on a 16-core, 32-thread Intel Xeon E5-2667 v3 clocked at 3.2 GHz with 792 GB of RAM..Ill be using a RAID array to store both the raw CSV files and MapDs internal columnar-based files it uses to represent the dataset..The RAID array is comprised of 4 x Samsung EVO 2 TB SSDs in a RAID 10 configuration with a LSI Logic / Symbios Logic MegaRAID SAS2108 RAID bus controller making for a total of 4 TB of usable storage..This configuration should see sequential read speeds of up to 500 MB/s..Loading 1.1 Billion Trips into MapD To start, Ill download the 104 GB of CSV data I created in my Billion Taxi Rides in Redshift blog post..This data sits in 56 GZIP files and decompresses into around 500 GB of raw CSV data..$ cd /raidStorage/mark/ $ vi urls.txt https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaa.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xab.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xac.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xad.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xae.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaf.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xag.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xah.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xai.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaj.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xak.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xal.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xam.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xan.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xao.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xap.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaq.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xar.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xas.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xat.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xau.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xav.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaw.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xax.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xay.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaz.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xba.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbb.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbc.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbd.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbe.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbf.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbg.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbh.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbi.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbj.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbk.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbl.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbm.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbn.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbo.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbp.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbq.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbr.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbs.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbt.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbu.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbv.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbw.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbx.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xby.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbz.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xca.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xcb.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xcc.csv.gz https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xcd.csv.gz $ cat urls.txt | xargs -n 1 -P 6 wget Ill then create a schema for my trips table..The SQL looks very much like what youd use with most RDBMS offerings..Two items which will stand out is the dictionary encoding on string columns with low cardinality and a fragment size parameter..The fragment size helps split the 1.1 billion records amongst all of the 8 K80 cards so they can work in parallel with roughly equal workloads..$ vi create_trips_table.sql CREATE TABLE trips ( trip_id INTEGER, vendor_id VARCHAR(3) ENCODING DICT, pickup_datetime TIMESTAMP, dropoff_datetime TIMESTAMP, store_and_fwd_flag VARCHAR(1) ENCODING DICT, rate_code_id SMALLINT, pickup_longitude DECIMAL(14,2), pickup_latitude DECIMAL(14,2), dropoff_longitude DECIMAL(14,2), dropoff_latitude DECIMAL(14,2), passenger_count SMALLINT, trip_distance DECIMAL(14,2), fare_amount DECIMAL(14,2), extra DECIMAL(14,2), mta_tax DECIMAL(14,2), tip_amount DECIMAL(14,2), tolls_amount DECIMAL(14,2), ehail_fee DECIMAL(14,2), improvement_surcharge DECIMAL(14,2), total_amount DECIMAL(14,2), payment_type VARCHAR(3) ENCODING DICT, trip_type SMALLINT, pickup VARCHAR(50) ENCODING DICT, dropoff VARCHAR(50) ENCODING DICT, cab_type VARCHAR(6) ENCODING DICT, precipitation SMALLINT, snow_depth SMALLINT, snowfall SMALLINT, max_temperature SMALLINT, min_temperature SMALLINT, average_wind_speed SMALLINT, pickup_nyct2010_gid SMALLINT, pickup_ctlabel VARCHAR(10) ENCODING DICT, pickup_borocode SMALLINT, pickup_boroname VARCHAR(13) ENCODING DICT, pickup_ct2010 VARCHAR(6) ENCODING DICT, pickup_boroct2010 VARCHAR(7) ENCODING DICT, pickup_cdeligibil VARCHAR(1) ENCODING DICT, pickup_ntacode VARCHAR(4) ENCODING DICT, pickup_ntaname VARCHAR(56) ENCODING DICT, pickup_puma VARCHAR(4) ENCODING DICT, dropoff_nyct2010_gid SMALLINT, dropoff_ctlabel VARCHAR(10) ENCODING DICT, dropoff_borocode SMALLINT, dropoff_boroname VARCHAR(13) ENCODING DICT, dropoff_ct2010 VARCHAR(6) ENCODING DICT, dropoff_boroct2010 VARCHAR(7) ENCODING DICT, dropoff_cdeligibil VARCHAR(1) ENCODING DICT, dropoff_ntacode VARCHAR(4) ENCODING DICT, dropoff_ntaname VARCHAR(56) ENCODING DICT, dropoff_puma VARCHAR(4) ENCODING DICT ) with (fragment_size=35000000); Ill create environment variables with my credentials for MapD..$ read MAPD_USERNAME $ read MAPD_PASSWORD $ export MAPD_USERNAME $ export MAPD_PASSWORD The following will create the table schema using the mapdql cli tool.. More details

Leave a Reply