1.1 Billion Taxi Rides with BrytlytDB 2.0 & 2 GPU-Powered p2.16xlarge EC2 Instances

| |===============================+======================+======================| | 0 Tesla K80 Off | 00000000:00:0F.0 Off | 0 | | N/A 65C P0 57W / 149W | 11240MiB / 11439MiB | 0% Default | +——————————-+———————-+———————-+ | 1 Tesla K80 Off | 00000000:00:10.0 Off | 0 | | N/A 54C P0 73W / 149W | 11240MiB / 11439MiB | 0% Default | +——————————-+———————-+———————-+ | 2 Tesla K80 Off | 00000000:00:11.0 Off | 0 | | N/A 73C P0 61W / 149W | 11240MiB / 11439MiB | 0% Default | +——————————-+———————-+———————-+ | 3 Tesla K80 Off | 00000000:00:12.0 Off | 0 | | N/A 55C P0 71W / 149W | 11240MiB / 11439MiB | 0% Default | +——————————-+———————-+———————-+ | 4 Tesla K80 Off | 00000000:00:13.0 Off | 0 | | N/A 71C P0 60W / 149W | 11240MiB / 11439MiB | 0% Default | +——————————-+———————-+———————-+ | 5 Tesla K80 Off | 00000000:00:14.0 Off | 0 | | N/A 56C P0 74W / 149W | 11240MiB / 11439MiB | 0% Default | +——————————-+———————-+———————-+ | 6 Tesla K80 Off | 00000000:00:15.0 Off | 0 | | N/A 74C P0 61W / 149W | 11240MiB / 11439MiB | 0% Default | +——————————-+———————-+———————-+ | 7 Tesla K80 Off | 00000000:00:16.0 Off | 0 | | N/A 57C P0 71W / 149W | 11240MiB / 11439MiB | 0% Default | +——————————-+———————-+———————-+ | 8 Tesla K80 Off | 00000000:00:17.0 Off | 0 | | N/A 65C P0 59W / 149W | 11240MiB / 11439MiB | 0% Default | +——————————-+———————-+———————-+ | 9 Tesla K80 Off | 00000000:00:18.0 Off | 0 | | N/A 53C P0 72W / 149W | 11240MiB / 11439MiB | 0% Default | +——————————-+———————-+———————-+ | 10 Tesla K80 Off | 00000000:00:19.0 Off | 0 | | N/A 72C P0 61W / 149W | 11240MiB / 11439MiB | 0% Default | +——————————-+———————-+———————-+ | 11 Tesla K80 Off | 00000000:00:1A.0 Off | 0 | | N/A 57C P0 71W / 149W | 11240MiB / 11439MiB | 0% Default | +——————————-+———————-+———————-+ | 12 Tesla K80 Off | 00000000:00:1B.0 Off | 0 | | N/A 70C P0 59W / 149W | 11240MiB / 11439MiB | 0% Default | +——————————-+———————-+———————-+ | 13 Tesla K80 Off | 00000000:00:1C.0 Off | 0 | | N/A 57C P0 74W / 149W | 11240MiB / 11439MiB | 0% Default | +——————————-+———————-+———————-+ | 14 Tesla K80 Off | 00000000:00:1D.0 Off | 0 | | N/A 74C P0 58W / 149W | 11240MiB / 11439MiB | 0% Default | +——————————-+———————-+———————-+ | 15 Tesla K80 Off | 00000000:00:1E.0 Off | 0 | | N/A 57C P0 71W / 149W | 11240MiB / 11439MiB | 0% Default | +——————————-+———————-+———————-+ The first machine has an IP address of 52.91.174.189 and acts as the Coordinator and host to the first 16 data nodes..The second machine has an IP address of 34.235.138.233 and hosts the second set of 16 data nodes..TCP port 5432 is open between the two machines for communicating via PostgreSQLs wire protocol, TCP port 7777 is open for global transaction manager communication and ports 15,432 through to 20,000 are open for data node communication..In addition to the default 20 GB EBS volumes on each EC2 instance there are six 500 GB General Purpose SSDs attached which offer a baseline of 100 IOPS each and can burst to 3,000 IOPS if need be..These drives are setup in a RAID 0 configuration on each instance..This gives each instance around 2.2 TB of formatted storage capacity in addition to the 32 GB boot drives..The Marketplace AMI sets up the RAID formation automatically..$ lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT xvda 202:0 0 32G 0 disk └─xvda1 202:1 0 32G 0 part / xvdb 202:16 0 500G 0 disk └─md0 9:0 0 3T 0 raid0 /usr/local/data xvdc 202:32 0 500G 0 disk └─md0 9:0 0 3T 0 raid0 /usr/local/data xvdd 202:48 0 500G 0 disk └─md0 9:0 0 3T 0 raid0 /usr/local/data xvde 202:64 0 500G 0 disk └─md0 9:0 0 3T 0 raid0 /usr/local/data xvdf 202:80 0 500G 0 disk └─md0 9:0 0 3T 0 raid0 /usr/local/data xvdg 202:96 0 500G 0 disk └─md0 9:0 0 3T 0 raid0 /usr/local/data BrytlytDB Up & Running The BrytlytDB software will be installed and running when the EC2 instances launch but since Ill be using the two machines as a cluster Ill register the second instances data nodes on the first instance using the brytlyt PostgreSQL user account..$ vi register.sql CREATE NODE d16 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15432); CREATE NODE d17 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15433); CREATE NODE d18 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15434); CREATE NODE d19 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15435); CREATE NODE d20 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15436); CREATE NODE d21 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15437); CREATE NODE d22 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15438); CREATE NODE d23 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15439); CREATE NODE d24 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15440); CREATE NODE d25 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15441); CREATE NODE d26 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15442); CREATE NODE d27 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15443); CREATE NODE d28 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15444); CREATE NODE d29 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15445); CREATE NODE d30 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15446); CREATE NODE d31 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15447); SELECT pgxc_pool_reload(); $ psql -U brytlyt -f register.sql Once thats run I can check that the data nodes have all registered..SELECT * FROM pgxc_node; node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id ———-+———–+———–+—————-+—————-+——————+————- c0 | C | 5432 | localhost | f | f | 546080270 d0 | D | 15432 | 127.0.0.1 | f | f | -2088056588 d1 | D | 15433 | 127.0.0.1 | f | f | 342786568 d2 | D | 15434 | 127.0.0.1 | f | f | 823431008 d3 | D | 15435 | 127.0.0.1 | f | f | 907287964 d4 | D | 15436 | 127.0.0.1 | f | f | -1005442993 d5 | D | 15437 | 127.0.0.1 | f | f | -2088175618 d6 | D | 15438 | 127.0.0.1 | f | f | 653419373 d7 | D | 15439 | 127.0.0.1 | f | f | -2114583294 d8 | D | 15440 | 127.0.0.1 | f | f | -1181003975 d9 | D | 15441 | 127.0.0.1 | f | f | 1701757799 d10 | D | 15442 | 127.0.0.1 | f | f | 2938326 d11 | D | 15443 | 127.0.0.1 | f | f | -340513757 d12 | D | 15444 | 127.0.0.1 | f | f | 899853678 d13 | D | 15445 | 127.0.0.1 | f | f | -1529184863 d14 | D | 15446 | 127.0.0.1 | f | f | 297400288 d15 | D | 15447 | 127.0.0.1 | f | f | -2050394442 d16 | D | 15432 | 34.235.138.233 | f | f | -951284347 d17 | D | 15433 | 34.235.138.233 | f | f | 940304836 d18 | D | 15434 | 34.235.138.233 | f | f | 1871723707 d19 | D | 15435 | 34.235.138.233 | f | f | 1351481950 d20 | D | 15436 | 34.235.138.233 | f | f | -958755150 d21 | D | 15437 | 34.235.138.233 | f | f | 1823411759 d22 | D | 15438 | 34.235.138.233 | f | f | 1397545445 d23 | D | 15439 | 34.235.138.233 | f | f | 68184193 d24 | D | 15440 | 34.235.138.233 | f | f | -965766358 d25 | D | 15441 | 34.235.138.233 | f | f | 1469379654 d26 | D | 15442 | 34.235.138.233 | f | f | -2015650531 d27 | D | 15443 | 34.235.138.233 | f | f | 14563042 d28 | D | 15444 | 34.235.138.233 | f | f | 1604721461 d29 | D | 15445 | 34.235.138.233 | f | f | -557897158 d30 | D | 15446 | 34.235.138.233 | f | f | 1910249780 d31 | D | 15447 | 34.235.138.233 | f | f | -1701917292 Loading 1.1 Billion Trips Into BrytlytDB Ill first define the table schema for the 1.1 billion taxi trip records..$ psql -U brytlyt CREATE FOREIGN TABLE trips ( trip_id INTEGER, vendor_id VARCHAR(3), pickup_datetime DATE, dropoff_datetime DATE, store_and_fwd_flag VARCHAR(1), rate_code_id INTEGER, pickup_longitude DOUBLE PRECISION, pickup_latitude DOUBLE PRECISION, dropoff_longitude DOUBLE PRECISION, dropoff_latitude DOUBLE PRECISION, passenger_count INTEGER, trip_distance DOUBLE PRECISION, fare_amount DOUBLE PRECISION, extra DOUBLE PRECISION, mta_tax DOUBLE PRECISION, tip_amount DOUBLE PRECISION, tolls_amount DOUBLE PRECISION, ehail_fee DOUBLE PRECISION, improvement_surcharge DOUBLE PRECISION, total_amount DOUBLE PRECISION, payment_type VARCHAR(10), trip_type INTEGER, pickup VARCHAR(50), dropoff VARCHAR(50), cab_type VARCHAR(6), precipitation INTEGER, snow_depth INTEGER, snowfall INTEGER, max_temperature INTEGER, min_temperature INTEGER, average_wind_speed INTEGER, pickup_nyct2010_gid INTEGER, pickup_ctlabel VARCHAR(10), pickup_borocode INTEGER, pickup_boroname VARCHAR(13), pickup_ct2010 VARCHAR(6), pickup_boroct2010 VARCHAR(7), pickup_cdeligibil VARCHAR(1), pickup_ntacode VARCHAR(4), pickup_ntaname VARCHAR(56), pickup_puma VARCHAR(4), dropoff_nyct2010_gid INTEGER, dropoff_ctlabel VARCHAR(10), dropoff_borocode INTEGER, dropoff_boroname VARCHAR(13), dropoff_ct2010 VARCHAR(6), dropoff_boroct2010 VARCHAR(7), dropoff_cdeligibil VARCHAR(1), dropoff_ntacode VARCHAR(4), dropoff_ntaname VARCHAR(56), dropoff_puma VARCHAR(4) ) SERVER gm_fdw_server OPTIONS (max_size '35000000', is_random 'false', index '10,24') DISTRIBUTE BY roundrobin; Note in the above there are indices on the passenger count and cab type columns..Also, the timestamp columns are using a DATE column type.. More details

Leave a Reply