Presto, Parquet & Airpal

EXPLAIN SELECT year, count(*) FROM track_metadata_csv GROUP BY year ORDER BY year; Query Plan ———————————————————————————————————————————————- – Output[year, _col1] => [year:bigint, count:bigint] _col1 := count – Sort[year ASC_NULLS_LAST] => [year:bigint, count:bigint] – Exchange[GATHER] => year:bigint, count:bigint – Aggregate(FINAL)[year] => [year:bigint, count:bigint] count := "count"("count_9") – Exchange[REPARTITION] => year:bigint, count_9:bigint – Aggregate(PARTIAL)[year] => [year:bigint, count_9:bigint] count_9 := "count"(*) – TableScan[hive:hive:default:track_metadata_csv, originalConstraint = true] => [year:bigint] LAYOUT: hive year := HiveColumnHandle{clientId=hive, name=year, hiveType=smallint, hiveColumnIndex=5, partitionKey=false} EXPLAIN SELECT year, count(*) FROM track_metadata_parquet GROUP BY year ORDER BY year; Query Plan ———————————————————————————————————————————————- – Output[year, _col1] => [year:bigint, count:bigint] _col1 := count – Sort[year ASC_NULLS_LAST] => [year:bigint, count:bigint] – Exchange[GATHER] => year:bigint, count:bigint – Aggregate(FINAL)[year] => [year:bigint, count:bigint] count := "count"("count_9") – Exchange[REPARTITION] => year:bigint, count_9:bigint – Aggregate(PARTIAL)[year] => [year:bigint, count_9:bigint] count_9 := "count"(*) – TableScan[hive:hive:default:track_metadata_parquet, originalConstraint = true] => [year:bigint] LAYOUT: hive year := HiveColumnHandle{clientId=hive, name=year, hiveType=smallint, hiveColumnIndex=5, partitionKey=false} Airpal: A Web Interface for Presto In March 2015 AirBNB announced Airpal, a web-based query tool that works with Presto..Beyond a visual interface to run queries it offers access controls, metadata exploration, query progress tracking and CSV exporting of results..These are the steps I took to install this software and launch it within the same Docker container as Presto..The first thing I had to do was to get a copy of Airpals git repository and build the Airpal JAR file..$ apt-get install build-essential git gradle mysql-server $ git clone https://github.com/airbnb/airpal.git $ cd airpal $ ./gradlew clean shadowJar I then created a MySQL database for Airpal to store its configuration and logs..$ mysql -uroot -proot -e'CREATE DATABASE `airpal` CHARACTER SET utf8 COLLATE utf8_general_ci;' Airpal needs a configuration file to tell its JAR file how to, among other things, connect with the MySQL data backend..$ vi reference.yml # Logging settings logging: loggers: org.apache.shiro: INFO # The default level of all loggers..Can be OFF, ERROR, WARN, INFO, DEBUG, TRACE, or ALL..level: INFO # HTTP-specific options..server: applicationConnectors: – type: http port: 8081 idleTimeout: 10 seconds adminConnectors: – type: http port: 8082 shiro: iniConfigs: ["classpath:shiro_allow_all.ini"] dataSourceFactory: driverClass: com.mysql.jdbc.Driver user: root password: root url: jdbc:mysql://127.0.0.1:3306/airpal # The URL to the Presto coordinator..prestoCoordinator: http://127.0.0.1:8080 With that in place I setup Airpals database schema and launched the server in the background..$ java -Duser.timezone=UTC -cp build/libs/airpal-*-all.jar com.airbnb.airpal.AirpalApplication db migrate reference.yml $ java -server -Duser.timezone=UTC -cp build/libs/airpal-*-all.jar com.airbnb.airpal.AirpalApplication server reference.yml & Their is a health check endpoint on port 8082.. More details

Leave a Reply