BigQuery without a credit card: Discover, learn and share

If you ever had trouble signing up for BigQuery, worry no more — now it’s easier than ever to sign up and start querying.

The new sandbox mode even includes free storage, no credit card required.

Felipe Hoffa · Feb 7

See the official blog post “Query without a credit card: introducing BigQuery sandbox” for more details.

Here we are going to focus on you getting started and querying as quickly as possible.

Step 0: Create an account

Get into the BigQuery web UI and then follow the prompts.

In 60 seconds and less than 4 steps you’ll be ready to start querying.

Step 1: My first query

Let’s find out who the most famous Alan is, according to Wikipedia.

Enter this query into the new BigQuery web UI, and then click ‘run’:

SELECT title, SUM(views) views
FROM `fh-bigquery.wikipedia_v3.pageviews_2019`
WHERE DATE(datehour) BETWEEN '2019-01-01' AND '2019-01-10'
AND wiki='en'
AND title LIKE r'Alan\_%'
GROUP BY title
ORDER BY views DESC
LIMIT 10

Turns out Alan Turing was the Alan with the largest number of pageviews in the English Wikipedia during the first 10 days of 2019.

Try looking for other names or time periods.

Can you guess who’s the most famous Steve?

Let’s dissect the query, in case you’re new to BigQuery’s standard SQL:

SELECT title, SUM(views) views: This gives me the total number of pageviews for each title.

FROM `fh-bigquery.wikipedia_v3.pageviews_2019`: Scanning Wikipedia’s 2019 pageviews table, which I share publicly.

WHERE DATE(datehour) BETWEEN '2019-01-01' AND '2019-01-10': We are only going to scan the first 10 days of 2019.

AND wiki='en': There are many Wikipedias — and I want to focus only on the English one. This filter works well because the table’s primary clustering key is the Wikipedia being queried. Oh, and had I filtered for en.m instead — the English Wikipedia for mobile — I would have received different results.

AND title LIKE r'Alan\_%': Asking for all Wikipedia pages whose titles start with “Alan_”. This filter works well with the secondary clustering by title. Note that I need to escape the _ with a backslash when doing a LIKE, since an unescaped _ matches any single character.

GROUP BY title: We are going to get the SUM(views) for each title.

ORDER BY views DESC: Sorting results by which page got the most views.

LIMIT 10: We only want to see the top 10.
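The LIKE-escaping point above is easy to get wrong. Here is a small sketch using Python’s built-in sqlite3 as a stand-in engine (an assumption: BigQuery treats the backslash in a raw string as the LIKE escape character by default, while SQLite needs an explicit ESCAPE clause):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (title TEXT)")
conn.executemany(
    "INSERT INTO pages VALUES (?)",
    [("Alan_Turing",), ("Alanis_Morissette",), ("Alan",)],
)

# Unescaped: '_' is a single-character wildcard, so 'Alan_%'
# also matches 'Alanis_Morissette' (the '_' matches the 'i').
loose = [r[0] for r in conn.execute(
    "SELECT title FROM pages WHERE title LIKE 'Alan_%' ORDER BY title")]

# Escaped: '\_' matches only a literal underscore.
strict = [r[0] for r in conn.execute(
    r"SELECT title FROM pages WHERE title LIKE 'Alan\_%' ESCAPE '\'")]

print(loose)   # ['Alan_Turing', 'Alanis_Morissette']
print(strict)  # ['Alan_Turing']
```

Without the escape, the hypothetical page “Alanis_Morissette” would sneak into our Alan ranking.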

Some interesting things happened here:

During these 10 days Wikipedia had more than 5.6 billion pageviews. Inside BigQuery this is represented by more than 72 GB of data. At a larger scale, the table for all 2018 Wikipedia pageviews is 2.25 TB.

The cost of using BigQuery is proportional to the size of the columns scanned — and we have good news here: this gets much better when using tables that have been date-partitioned and clustered.

For example, the Alan query would have cost me 73 GB, but thanks to partitions and clusters, it ended up scanning only 2 GB.

This is a huge difference — it means I can perform 500 queries like this for free every month, instead of only 12.
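A quick back-of-the-envelope check of that free-tier math (a sketch; it assumes BigQuery’s free tier of 1 TB of query processing per month, with 1 TB taken as 10**12 bytes):

```python
# Monthly free tier for query processing, in bytes (assumed: 1 TB).
FREE_TIER_BYTES = 10**12

def queries_per_month(bytes_per_query: int) -> int:
    """How many equally sized queries fit in the monthly free tier."""
    return FREE_TIER_BYTES // bytes_per_query

full_scan = 73 * 10**9      # the Alan query without partitioning/clustering
clustered_scan = 2 * 10**9  # bytes actually scanned thanks to partitions + clusters

print(queries_per_month(clustered_scan))  # 500
print(queries_per_month(full_scan))       # only about a dozen
```

The ratio, not the absolute cost, is the point: clustering buys roughly 40x more queries per month for free.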

Step 2: Create your own tables

Let’s say we want to dive into all questions about Tensorflow on Stack Overflow.

We can write a query like this:

SELECT view_count, answer_count, DATE(creation_date) date, title
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE 'tensorflow' IN UNNEST(SPLIT(tags, '|'))
ORDER BY view_count DESC
LIMIT 10

Top ten Tensorflow questions on Stack Overflow, by total # of views

This query used 1.66 GB of our free quota — and each similar query will have a similar cost.
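What that WHERE clause does, expressed in Python terms (a sketch): Stack Overflow stores tags as a single '|'-delimited string, and 'tensorflow' IN UNNEST(SPLIT(tags, '|')) checks exact membership after splitting, not a substring match.

```python
def has_tag(tags: str, tag: str) -> bool:
    # Equivalent of: tag IN UNNEST(SPLIT(tags, '|'))
    return tag in tags.split("|")

print(has_tag("python|tensorflow|keras", "tensorflow"))   # True
print(has_tag("tensorflow.js|javascript", "tensorflow"))  # False: exact match only
```

A LIKE '%tensorflow%' filter would instead also match related tags such as a hypothetical 'tensorflow.js', which is usually not what you want.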

We can do better: Extract the data you are interested in to a new table.

With BigQuery’s sandbox mode now you also get 10 GB of storage for free.

So instead of running new queries every time over the whole dataset, we can extract all Tensorflow questions to a new table.

To create a new table, first create a dataset. Note that without a credit card associated with your account, BigQuery will limit the lifetime of any table to 60 days.

Create a new dataset inside your BigQuery project. The lifetime of a table will be limited to 60 days in sandbox mode.

To create a new table out of our previous query, BigQuery now supports DDL and DML SQL commands:

CREATE TABLE `deleting.tensorflow_questions`
AS
SELECT view_count, answer_count, DATE(creation_date) date, title
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE 'tensorflow' IN UNNEST(SPLIT(tags, '|'))

Now I can write queries like this over my new table:

SELECT view_count, answer_count, date, title
FROM `deleting.tensorflow_questions`
ORDER BY view_count DESC

Good news: This query now scans only 3 MB, which gives me a lot more freedom to experiment.

I can execute more than 300,000 queries like this for free every month!

Step 3: Share

With BigQuery you can share your results and findings with your closest friends or the whole world.

This is still not implemented on the new web UI, but it’s really easy on the BigQuery classic web UI:

How to share a BigQuery table on the classic UI

Visualize with Data Studio

Multiple tools and frameworks can connect straight to BigQuery — we love them all.

And now in the new BigQuery web UI you have a quick way to get your results into Data Studio:

Visualizing query results with Data Studio

Go further

Subscribe to /r/bigquery to keep in touch with all the latest BigQuery news.

Check out our official public datasets and also some unofficial ones.

Stuck? Ask the community on Stack Overflow.

Missing me? Follow my latest posts on Medium and on Twitter @felipehoffa.
