Alenka: A GPU-Driven, Open Source Database

I've yet to audit the results but, for the record, here they are:

|37 |14.1900 |1 |
|208 |7.0416 |1508 |
|19 |5.0000 |1 |
|137 |59.6400 |1 |
|38 |7.2900 |1 |
|158 |14.4400 |1 |
|255 |17.9890 |10 |
|249 |9.5000 |1 |
|58 |25.9350 |2 |
|223 |9.5000 |1 |
|33 |8.5950 |2 |
|25 |7.5900 |1 |
|2 |13.7109 |161755340 |
|49 |3.2457 |26 |
|70 |10.6900 |1 |
|155 |90.2300 |1 |
|113 |13.3000 |1 |
|125 |16.6000 |1 |
|0 |10.6676 |3902029 |
|34 |16.8000 |1 |
|250 |12.5666 |3 |
|163 |15.5300 |1 |
|97 |9.9000 |1 |
|177 |17.0000 |1 |
|6 |14.3061 |23796601 |
|211 |7.0000 |1 |
|254 |6.5000 |1 |
|8 |25.4042 |876 |
|7 |25.7082 |913 |
|5 |13.1016 |77761602 |
|129 |8.7857 |7 |
|165 |12.1400 |1 |
|53 |7.2900 |1 |
|134 |55.1400 |1 |
|133 |10.3000 |1 |
|66 |19.3000 |1 |
|84 |43.8400 |1 |
|3 |13.3259 |48313914 |
|225 |16.0000 |1 |
|141 |18.9400 |1 |
|69 |5.7900 |1 |
|4 |13.4157 |23325370 |
|36 |61.5400 |1 |
|61 |31.3400 |1 |
|1 |13.1786 |772743590 |
|13 |31.5000 |1 |
|213 |2.5000 |4 |
|65 |23.3600 |3 |
|17 |39.9500 |1 |
|247 |19.4400 |1 |
|47 |9.0000 |1 |
|9 |41.7145 |422 |
|10 |42.4800 |16 |
|164 |62.1400 |1 |
|160 |15.3400 |1 |
|15 |12.0500 |2 |
|193 |7.5000 |1 |

Query 3

    A := SELECT passenger_count AS pac,
                YEAR(pickup_datetime) AS pickup_year,
                COUNT(passenger_count) AS pc
         FROM trips
         GROUP BY passenger_count, pickup_year;
    DISPLAY A USING ('|');

This query crashes Alenka after 11.77 seconds with the following complaint:

    terminate called after throwing an instance of 'thrust::system::detail::bad_alloc'
      what():  std::bad_alloc: out of memory

This is an interesting issue. During execution, Alenka allocates all of the remaining free memory on the GPU before terminating. I suspect the data is being loaded onto the GPU in one go and there isn't enough memory for the columns being worked with. I also suspect that streaming the data in chunks and combining the partial results could help this query finish properly.

Query 4

    A := SELECT passenger_count AS pac,
                YEAR(pickup_datetime) AS pickup_year,
                CAST_TO_INT(trip_distance) AS distance,
                COUNT(passenger_count) AS the_count
         FROM trips
         GROUP BY passenger_count, pickup_year, distance;
    B := ORDER A BY pickup_year ASC, the_count desc;
    DISPLAY B USING ('|');

This query crashes Alenka after 19.7 seconds with the same "out of memory" complaint from the Thrust library.

Closing Thoughts

I've read of others seeing good execution times with their datasets on Alenka, so I don't think the fact that I haven't yet managed to complete my 1.1 billion taxi trips benchmark should put people off trying this software out.

When I first started looking at Alenka I wanted to dig into the architecture of a GPU-driven database. These last eight weeks have been a real learning experience and I'm hoping a few things will come of this blog post.

The first objective is to promote GPUs as a data platform.

The second objective is to encourage people to build data tools that run on GPUs, as I think there is a lot of room for innovation in this space.

The third objective is to repay Anton with a shout-out for all his help and hard work over the past eight weeks. He was coding up patches at all hours on the weekends and responded to every one of the emails I flooded his inbox with.
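As a footnote to the out-of-memory crashes in Queries 3 and 4, here is a minimal sketch of the chunk-and-combine idea I suspect would help. It is a plain CPU-side Python illustration, not Alenka's code and not a description of how Alenka processes data internally. The function names, the toy rows and the chunk size are all hypothetical, and it assumes there are no NULL passenger counts, so counting rows matches COUNT(passenger_count).

```python
from collections import Counter
from itertools import islice


def chunks(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def partial_counts(batch):
    """Group one chunk by (passenger_count, pickup_year) and count its rows."""
    return Counter((pac, year) for pac, year in batch)


def chunked_group_count(rows, chunk_size):
    """Aggregate chunk by chunk so only one chunk is held at a time,
    then merge each partial result into a running total."""
    total = Counter()
    for batch in chunks(rows, chunk_size):
        total.update(partial_counts(batch))  # Counter.update adds counts
    return total


# Hypothetical toy rows: (passenger_count, pickup_year)
trips = [(1, 2014), (2, 2014), (1, 2014), (1, 2015), (2, 2014)]

result = chunked_group_count(trips, chunk_size=2)
for (pac, year), pc in sorted(result.items()):
    print(f"{pac}|{year}|{pc}")
```

This works because COUNT is a decomposable aggregate: per-chunk counts can simply be added together. Peak memory then depends on the chunk size and the number of distinct groups rather than on the total number of rows.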
