5. Interactive data exploration

Catch up from previous chapters if necessary

If you didn't go through Chapters 2-4, the simplest way to catch up is to copy data from my bucket:

Go to the 02_ingest folder of the repo, run the program ./ingest_from_crsbucket.sh and specify your bucket name.
Go to the 04_streaming folder of the repo, run the program ./ingest_from_crsbucket.sh and specify your bucket name.
Create a dataset named "flights" in BigQuery by typing:
```
 bq mk flights
```

Load the CSV files created in Chapter 3 into BigQuery

Open Cloud Shell and navigate to the 05_bqnotebook folder.
Run the script to load data into BigQuery:
```
 bash load_into_bq.sh <BUCKET-NAME>
```

Visit the BigQuery console to query the new table:

 SELECT
   *
 FROM (
   SELECT
     ORIGIN,
     AVG(DEP_DELAY) AS dep_delay,
     AVG(ARR_DELAY) AS arr_delay,
     COUNT(ARR_DELAY) AS num_flights
   FROM
     flights.tzcorr
   GROUP BY
     ORIGIN )
 WHERE
   num_flights > 3650
 ORDER BY
   dep_delay DESC
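
To make the query's semantics concrete, here is a minimal Python sketch of the same group, aggregate, filter, and sort steps on a handful of made-up rows. The rows, the `airport_delays` helper, and the tiny threshold are hypothetical; only the column names and the shape of the query come from the SQL above. The sketch also reflects the standard SQL behavior that `AVG` and `COUNT(ARR_DELAY)` skip NULL values, so num_flights counts only flights that have an arrival delay recorded.

```python
from collections import defaultdict
from statistics import mean


def airport_delays(rows, min_flights):
    """Mimic the query: per-ORIGIN averages, keep busy airports, sort by dep_delay."""
    by_origin = defaultdict(list)
    for row in rows:
        by_origin[row["ORIGIN"]].append(row)

    result = []
    for origin, flights in by_origin.items():
        # SQL AVG and COUNT(column) ignore NULLs; None plays the role of NULL here.
        dep = [f["DEP_DELAY"] for f in flights if f["DEP_DELAY"] is not None]
        arr = [f["ARR_DELAY"] for f in flights if f["ARR_DELAY"] is not None]
        result.append({
            "ORIGIN": origin,
            "dep_delay": mean(dep) if dep else None,
            "arr_delay": mean(arr) if arr else None,
            "num_flights": len(arr),
        })

    # Outer WHERE num_flights > min_flights, then ORDER BY dep_delay DESC.
    kept = [r for r in result if r["num_flights"] > min_flights]
    # Groups without a dep_delay average sort last.
    return sorted(
        kept,
        key=lambda r: float("-inf") if r["dep_delay"] is None else r["dep_delay"],
        reverse=True,
    )


# Made-up rows for illustration; airport codes are placeholders.
rows = [
    {"ORIGIN": "AAA", "DEP_DELAY": 10.0, "ARR_DELAY": 5.0},
    {"ORIGIN": "AAA", "DEP_DELAY": 20.0, "ARR_DELAY": None},  # not counted in num_flights
    {"ORIGIN": "AAA", "DEP_DELAY": 30.0, "ARR_DELAY": 25.0},
    {"ORIGIN": "BBB", "DEP_DELAY": 40.0, "ARR_DELAY": 35.0},
    {"ORIGIN": "CCC", "DEP_DELAY": 2.0, "ARR_DELAY": 1.0},
    {"ORIGIN": "CCC", "DEP_DELAY": 4.0, "ARR_DELAY": 3.0},
]
summary = airport_delays(rows, min_flights=1)
for r in summary:
    print(r)
```

In this toy run, BBB has the largest average departure delay but is dropped because it has too few flights, which is the role the `num_flights > 3650` filter plays in the real query.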

In BigQuery, run this query to create a table named trainday that marks roughly 70% of the distinct flight dates as training days:

 CREATE OR REPLACE TABLE flights.trainday AS
 SELECT
   FL_DATE,
   IF(MOD(ABS(FARM_FINGERPRINT(CAST(FL_DATE AS STRING))), 100) < 70, 'True', 'False') AS is_train_day
 FROM (
   SELECT
     DISTINCT(FL_DATE) AS FL_DATE
   FROM
     `flights.tzcorr`)
 ORDER BY
   FL_DATE

This table will be used throughout the rest of the book.
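
The query assigns each distinct date to the training set by hashing the date string and checking whether the hash modulo 100 falls below 70. Because the assignment depends only on the date, rerunning the query gives the same split, and all flights on a given day land on the same side. The sketch below illustrates the idea in Python. It substitutes SHA-256 for FARM_FINGERPRINT, which is not in the Python standard library, so the `is_train_day` function, the date range, and the resulting set of days are assumptions for illustration and will not match the BigQuery table.

```python
import hashlib
from datetime import date, timedelta


def is_train_day(fl_date, train_percent=70):
    """Assign a date to the training set based on a hash of its string form."""
    # Stand-in hash: SHA-256 instead of BigQuery's FARM_FINGERPRINT, so the
    # specific dates chosen here will NOT match those in flights.trainday.
    digest = hashlib.sha256(fl_date.isoformat().encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < train_percent


# Hypothetical date range, for illustration only.
start = date(2015, 1, 1)
days = [start + timedelta(days=i) for i in range(365)]
train_days = [d for d in days if is_train_day(d)]
train_fraction = len(train_days) / len(days)
print(f"{train_fraction:.0%} of days fall in the training set")
```

The fraction is only approximately 70%, because the hash spreads dates across the 100 buckets without guaranteeing an exact count in each.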

In Cloud Shell, create a Datalab instance (change the zone to match where your bucket is located):
```
 datalab create --zone us-central1-a dsongcp
```
Once you get the message that the instance is reachable on localhost:8081, navigate to the web page using the Web Preview button on the top-right of Cloud Shell.
Within Datalab, start a new notebook. Then, copy and paste cells from exploration.ipynb and click Run to execute the code.
