Puy Web
Profile Blog
EN TH
Blog Engineer Data for Predictive Modeling with BigQuery ML: Challenge Lab
Engineer Data for Predictive Modeling with BigQuery ML: Challenge Lab
Technology Apr 14, 2026

Engineer Data for Predictive Modeling with BigQuery ML: Challenge Lab

Link Lab: https://www.skills.google/course_templates/627/labs/612995

Challenge scenario

You have started a new role as a Data Engineer for TaxiCab Inc. You are expected to import some historical data to a working BigQuery dataset, and build a basic model that predicts fares based on information available when a new ride starts. Leadership is interested in building an app and estimating for users how much a ride will cost. The source data will be provided in your project.

You are expected to have the skills and knowledge for these tasks, so don't expect step-by-step guides to be provided.

Task 1: Clean Your Training Data

We need to copy data from the raw table into a clean training table. We will filter out anomalies like trips with zero distance, zero passengers, impossible fares, and coordinates that fall into the ocean (Null Island). We also combine tolls and fares to create the target fare_amount_617 column (The lab will be specific column name also table name and the conditions).

1. Open the BigQuery SQL Workspace.

2. Paste the following query and click Run. (Don't forget to check the lab specific require for clean the data).

CREATE OR REPLACE TABLE `taxirides.taxi_training_data_243` AS
SELECT
  pickup_datetime,
  pickup_longitude AS pickuplon,
  pickup_latitude AS pickuplat,
  dropoff_longitude AS dropofflon,
  dropoff_latitude AS dropofflat,
  passenger_count AS passengers,
  (tolls_amount + fare_amount) AS fare_amount_617
FROM
  `taxirides.historical_taxi_rides_raw`
WHERE
  RAND() < 0.001
  AND trip_distance > 0 
  AND fare_amount < 3
  AND passenger_count > 0
  AND pickup_longitude > -78 AND pickup_longitude < -70
  AND dropoff_longitude > -78 AND dropoff_longitude < -70
  AND pickup_latitude > 37 AND pickup_latitude < 45
  AND dropoff_latitude > 37 AND dropoff_latitude < 45

Task 2: Create a BigQuery ML Model

Now, we will train a Linear Regression model on clean data. Notice how we use a WITH clause to engineer the euclidean distance feature on the fly before feeding it into the model. With also need to use TRANSFORM to reduce RMSE. Don't forgot to change the fare_model to follow the lab specific.

1. Clear your SQL Editor.

2. Paste the following query and click Run.

CREATE OR REPLACE MODEL `taxirides.fare_model_251`
TRANSFORM(
  fare_amount_617,
  CAST(EXTRACT(DAYOFWEEK FROM pickup_datetime) AS STRING) AS dayofweek,
  CAST(EXTRACT(HOUR FROM pickup_datetime) AS STRING) AS hourofday,
  ST_Distance(ST_GeogPoint(pickuplon, pickuplat), ST_GeogPoint(dropofflon, dropofflat)) AS euclidean
)
OPTIONS(
  model_type='linear_reg',
  input_label_cols=['fare_amount_617']
) AS
SELECT * FROM `taxirides.taxi_training_data_243`;

To check the RMSE, we can use this query to verify. (Don't forgot to change fare_model and taxi_training_data name)

SELECT
  SQRT(mean_squared_error) AS rmse
FROM
  ML.EVALUATE(MODEL `taxirides.fare_model_946`,
    (SELECT * FROM `taxirides.taxi_training_data_243`)
  );

Task 3: Perform Batch Predictions on New Data

Finally, leadership wants we to predict fares for a new batch of 2015 data. We will use the ML.PREDICT function. Crucially, we must apply the exact same euclidean distance calculation to this prediction data so our model knows how to read it.

1. Clear your SQL Editor.

2. Paste the following query and click Run. (Don't forgot to change the fare_model to match the lab require.)

CREATE OR REPLACE TABLE `taxirides.2015_fare_amount_predictions` AS
SELECT * FROM ML.PREDICT(MODEL `taxirides.fare_model_946`, 
  (SELECT * FROM `taxirides.report_prediction_data`)
);

Congratulations! Complete the lab.

Engineer Data for Predictive Modeling with BigQuery ML badge

Share this article:

Related Articles

Journey Log: Setup & Deploy Web App / API / Let's Encrypt on VPS with Dockers and Auto Deploy from GitLab Container Registry
Technology
Apr 20, 2026

Journey Log: Setup & Deploy Web App / API / Let's Encrypt on VPS with Dockers and Auto Deploy from GitLab Container Registry

This blog is a content for setup & deploy web application and api service with dockerize on VPS, with auto deploy if found new image registry.

Read More
Implement Multimodal Vector Search with BigQuery: Challenge Lab
Technology
Apr 13, 2026

Implement Multimodal Vector Search with BigQuery: Challenge Lab

Let learn through the challenge lab.

Read More
Perform Predictive Data Analysis in BigQuery: Challenge Lab
Technology
Apr 12, 2026

Perform Predictive Data Analysis in BigQuery: Challenge Lab

Let learn throught the challenge lab.

Read More
Manage Kubernetes in Google Cloud: Challenge Lab
Technology
Apr 11, 2026

Manage Kubernetes in Google Cloud: Challenge Lab

Let learn throught the challenge lab.

Read More