Puy Web
Profile Blog
EN TH
Blog Implement Multimodal Vector Search with BigQuery: Challenge Lab
Implement Multimodal Vector Search with BigQuery: Challenge Lab
Technology Apr 13, 2026

Implement Multimodal Vector Search with BigQuery: Challenge Lab

Lab Link: https://www.skills.google/course_templates/1232/labs/598657

Challenge Scenario

You are a data scientist at Cymbal, an online retail store. You want to build a pipeline to constantly search for similar products on the market to inform a marketing comparison study. You have a few challenges:

  • How to handle multimodal data: The data you have collected is multimodal including text, image, and video, including some files in Cloud Storage.

  • How to perform a semantic similarity search instead of a keyword search: You want to find similar products across multiple dimensions (e.g., image, description, and specific features), where keyword search may not be effective.

  • How to use BigQuery to do it: Since most of your data is already in BigQuery, using the same tool could minimize the learning curve.

To address these challenges, you decide to implement multimodal vector search with BigQuery.

Task 1: Set up the Environment & Connection

0. Go to BigQuery, then Click on Connections.

1. Create Connection: In the BigQuery, click Create Connection.

* Connection type: Vertex AI remote models, remote functions and BigLake (Cloud Resource).

* Connection ID: vector_conn (or as specified in the lab).

* Location Type: us-west1 (or as specified in the lab).

2. Grant Permissions: Once created, click on the connection ID. Copy the Service Account ID (looks like an email).

3. IAM Policy: Go to IAM & Admin > IAM. Click Grant Access.

* New principals: Paste the Service Account ID.

* Role: BigQuery Data Owner.

* Role: Vertex AI User.

* Role: Storage Object Viewer.

Task 2: Create the Object Table

Create an object table that allows BigQuery to treat files in a GCS buckets as rows in a table. Lab will be provided the query to use. Then replace [] value in the query. Run the query in BigQuery SQL Editor.

Create or replace external table `[PROJECT_ID].[DATASET_NAME].[OBJECT_TABLE_NAME]`
with connection `[PROJECT_ID].[CONNECTION_REGION].[CONNECTION_NAME]`
options
(
object_metadata='SIMPLE',
uris=['gs://[PROJECT_ID]/*']
)

Task 3: Generate Embeddings

Part 1: Connect to the multimodal embeddings model

1. Open the BigQuery SQL Editor.

2. Paste the following query.

3. Replace [] to the lab specific.

Create or replace model
`[PROJECT_ID].[DATASET_NAME].[MODEL_NAME]`
remote with connection `[PROJECT_ID].[CONNECTION_REGION].[CONNECTION_NAME]`
options(
   endpoint = `[ENDPOINT_MODEL_NAME]`
);

Example.

Create or replace model
`[PROJECT_ID].gcc_bqml_dataset.gcc_embedding`
remote with connection `[PROJECT_ID].us-west1.vector_conn`
options(
    endpoint = 'multimodalembedding@001'
);

4. Click Run.

Part 2: Generate embeddings

1. Clear BigQuery SQL Editor.

2. Paste the following query.

3. Replace [] to the lab specific.

Create or replace table `[PROJECT_ID].[DATASET_NAME].[EMBEDDINGS_TABLE_NAME]`
as select *, REGEXP_EXTRACT(uri, r'[^/]+$') as product_name
from [EMBEDDINGS_FUNCTION]
(
MODEL `[PROJECT_ID].[DATASET_NAME].[MODEL_NAME]`,
TABLE `[PROJECT_ID].[DATASET_NAME].[OBJECT_TABLE_NAME]`
)

Example.

Create or replace table `[PROJECT_ID].gcc_bqml_dataset.gcc_retail_store_embeddings`
as select *, REGEXP_EXTRACT(uri, r'[^/]+$') as product_name
from ML.GENERATE_EMBEDDING
(
MODEL `[PROJECT_ID].gcc_bqml_dataset.gcc_embedding`,
TABLE `[PROJECT_ID].gcc_bqml_dataset.gcc_image_object_table`
)

4. Click Run.

Task 4. Run a vector search

1. Clear BigQuery SQL Editor.

2. Paste the following query.

3. Replace [] to the lab specific.

Create or replace table `[PROJECT_ID].[DATASET_NAME].[SEARCH_RESULTS_TABLE]` AS
select base.uri,
base.product_name,
base.content_type,
distance
 from
[VECTOR_SEARCH_FUNCTION](table [DATASET_NAME].[EMBEDDINGS_TABLE_NAME],'ml_generate_embedding_result',
(
SELECT ml_generate_embedding_result as embedding_col
FROM
 [EMBEDDINGS_FUNCTION]
 (
   MODEL `[DATASET_NAME].[MODEL_NAME]`,
   (select 'Men Sweaters' as content),
   STRUCT(TRUE AS flatten_json_output)
 )
),
  [STATEMENT_TO_SELECT_TOP_2_RESULTS],
  distance_type => 'COSINE'
);

Example.

Create or replace table `[PROJECT_ID].gcc_bqml_dataset.gcc_vector_search_table` AS
select base.uri,
base.product_name,
base.content_type,
distance
 from
VECTOR_SEARCH(table gcc_bqml_dataset.gcc_retail_store_embeddings,'ml_generate_embedding_result',
(
SELECT ml_generate_embedding_result as embedding_col
FROM
 ML.GENERATE_EMBEDDING
 (
   MODEL `gcc_bqml_dataset.gcc_embedding`,
   (select 'Men Sweaters' as content),
   STRUCT(TRUE AS flatten_json_output)
 )
),
  top_k => 2,
  distance_type => 'COSINE'
);

Done the lab.

Badge for Implement Multimodal Vector Search with BigQuery
Share this article:

Related Articles

Journey Log: Setup & Deploy Web App / API / Let's Encrypt on VPS with Dockers and Auto Deploy from GitLab Container Registry
Technology
Apr 20, 2026

Journey Log: Setup & Deploy Web App / API / Let's Encrypt on VPS with Dockers and Auto Deploy from GitLab Container Registry

This blog is a content for setup & deploy web application and api service with dockerize on VPS, with auto deploy if found new image registry.

Read More
Engineer Data for Predictive Modeling with BigQuery ML: Challenge Lab
Technology
Apr 14, 2026

Engineer Data for Predictive Modeling with BigQuery ML: Challenge Lab

Let learn through the lab.

Read More
Perform Predictive Data Analysis in BigQuery: Challenge Lab
Technology
Apr 12, 2026

Perform Predictive Data Analysis in BigQuery: Challenge Lab

Let learn throught the challenge lab.

Read More
Manage Kubernetes in Google Cloud: Challenge Lab
Technology
Apr 11, 2026

Manage Kubernetes in Google Cloud: Challenge Lab

Let learn throught the challenge lab.

Read More