From 08a0569e77e3c177a36a906f67d86f08b522793b Mon Sep 17 00:00:00 2001 From: Adam Seering Date: Wed, 16 Aug 2023 03:09:39 +0000 Subject: [PATCH] Add Text Embeddings example --- samples/README.md | 4 ++++ samples/Text Embeddings using Spanner's DBAPI Driver.ipynb | 1 + 2 files changed, 5 insertions(+) create mode 100644 samples/Text Embeddings using Spanner's DBAPI Driver.ipynb diff --git a/samples/README.md b/samples/README.md index 2d5090b..49c7f97 100644 --- a/samples/README.md +++ b/samples/README.md @@ -8,3 +8,7 @@ machine learning on Spanner. This example walks through using the extensions in this repository to query Spanner and train a scikit-learn model based on data from Spanner. +* Text Embeddings using Spanner's DBAPI Driver ([Colab](https://colab.research.google.com/github/cloudspannerecosystem/Text%20Embeddings%20using%20Spanner's%20DBAPI%20Driver.ipynb)|[.ipynb](Text%20Embeddings%20using%20Spanner's%20DBAPI%20Driver.ipynb)) + +This example shows how to register a Vertex AI embedding model with Spanner, and +how to ingest, embed, and retrieve data to/from Spanner. diff --git a/samples/Text Embeddings using Spanner's DBAPI Driver.ipynb b/samples/Text Embeddings using Spanner's DBAPI Driver.ipynb new file mode 100644 index 0000000..80ba716 --- /dev/null +++ b/samples/Text Embeddings using Spanner's DBAPI Driver.ipynb @@ -0,0 +1 @@ +{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[],"authorship_tag":"ABX9TyPtuRJeCTjJIjRN1tBCs2HA"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["# Text Embeddings using Spanner's DBAPI Driver\n","\n","Spanner has its own Python Client with a variety of Spanner-specific extensions and concepts. However, for some applications, it's simpler to use Spanner's standards-compliant DBAPI Driver.\n","\n","This driver provides the same Python API as is implemented by most other database engines' Python drivers. 
So it can be easier to use in a mixed-database environment, or for developers who are coming from other database systems."],"metadata":{"id":"yGtDYwb0xwU5"}},{"cell_type":"markdown","source":["## Step 1: Install Dependencies\n","\n","Spanner's DBAPI Driver is bundled into Spanner's client package. Let's go ahead and install that package.\n","\n","Let's also install Pandas, a popular library for manipulating datasets (dataframes) in Python. We'll also install Scikit Learn because it comes with a useful collection of example datasets for ML purposes. You can substitute the example dataset for your own data if you prefer."],"metadata":{"id":"YCUz3Ib9ywGb"}},{"cell_type":"code","source":["!pip install google-cloud-spanner pandas scikit-learn\n","\n","# Let's go ahead and import Pandas, since we'll use it in several places below.\n","import pandas as pd"],"metadata":{"id":"O4ig6T1ExwEw","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1692047538915,"user_tz":240,"elapsed":3982,"user":{"displayName":"Adam Seering","userId":"08779482826874381672"}},"outputId":"5f082068-cebc-4fdf-eb01-6357e0e30a28"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Requirement already satisfied: google-cloud-spanner in /usr/local/lib/python3.10/dist-packages (3.40.0)\n","Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (1.5.3)\n","Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-packages (1.2.2)\n","Requirement already satisfied: google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.0 in /usr/local/lib/python3.10/dist-packages (from google-cloud-spanner) (2.11.1)\n","Requirement already satisfied: google-cloud-core<3.0dev,>=1.4.1 in /usr/local/lib/python3.10/dist-packages (from google-cloud-spanner) (2.3.3)\n","Requirement already satisfied: grpc-google-iam-v1<1.0.0dev,>=0.12.4 in 
/usr/local/lib/python3.10/dist-packages (from google-cloud-spanner) (0.12.6)\n","Requirement already satisfied: proto-plus<2.0.0dev,>=1.22.0 in /usr/local/lib/python3.10/dist-packages (from google-cloud-spanner) (1.22.3)\n","Requirement already satisfied: sqlparse>=0.4.4 in /usr/local/lib/python3.10/dist-packages (from google-cloud-spanner) (0.4.4)\n","Requirement already satisfied: protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5 in /usr/local/lib/python3.10/dist-packages (from google-cloud-spanner) (3.20.3)\n","Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.10/dist-packages (from pandas) (2.8.2)\n","Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas) (2023.3)\n","Requirement already satisfied: numpy>=1.21.0 in /usr/local/lib/python3.10/dist-packages (from pandas) (1.23.5)\n","Requirement already satisfied: scipy>=1.3.2 in /usr/local/lib/python3.10/dist-packages (from scikit-learn) (1.10.1)\n","Requirement already satisfied: joblib>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from scikit-learn) (1.3.2)\n","Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn) (3.2.0)\n","Requirement already satisfied: googleapis-common-protos<2.0.dev0,>=1.56.2 in /usr/local/lib/python3.10/dist-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.0->google-cloud-spanner) (1.60.0)\n","Requirement already satisfied: google-auth<3.0.dev0,>=2.14.1 in /usr/local/lib/python3.10/dist-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.0->google-cloud-spanner) (2.17.3)\n","Requirement already satisfied: requests<3.0.0.dev0,>=2.18.0 in /usr/local/lib/python3.10/dist-packages (from 
google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.0->google-cloud-spanner) (2.31.0)\n","Requirement already satisfied: grpcio<2.0dev,>=1.33.2 in /usr/local/lib/python3.10/dist-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.0->google-cloud-spanner) (1.57.0)\n","Requirement already satisfied: grpcio-status<2.0.dev0,>=1.33.2 in /usr/local/lib/python3.10/dist-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.0->google-cloud-spanner) (1.48.2)\n","Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.1->pandas) (1.16.0)\n","Requirement already satisfied: cachetools<6.0,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from google-auth<3.0.dev0,>=2.14.1->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.0->google-cloud-spanner) (5.3.1)\n","Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.10/dist-packages (from google-auth<3.0.dev0,>=2.14.1->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.0->google-cloud-spanner) (0.3.0)\n","Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.10/dist-packages (from google-auth<3.0.dev0,>=2.14.1->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.0->google-cloud-spanner) (4.9)\n","Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from 
requests<3.0.0.dev0,>=2.18.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.0->google-cloud-spanner) (3.2.0)\n","Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0.dev0,>=2.18.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.0->google-cloud-spanner) (3.4)\n","Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0.dev0,>=2.18.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.0->google-cloud-spanner) (2.0.4)\n","Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0.dev0,>=2.18.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.0->google-cloud-spanner) (2023.7.22)\n","Requirement already satisfied: pyasn1<0.6.0,>=0.4.6 in /usr/local/lib/python3.10/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3.0.dev0,>=2.14.1->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.0->google-cloud-spanner) (0.5.0)\n"]}]},{"cell_type":"markdown","source":["## Step 2: Authenticate to GCP\n","\n","Google offers a variety of options for authenticating to GCP. Please see the [documentation](https://googleapis.dev/python/google-api-core/latest/auth.html) for more details.\n","\n","Google's hosted Notebook offerings provide a convenient built-in authentication method, as illustrated below. 
This method will open a pop-up window asking you to authenticate this notebook using your Google credentials."],"metadata":{"id":"6okXrOCvzBuY"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"TjhOWSOTxE1M"},"outputs":[],"source":["from google.colab import auth\n","auth.authenticate_user()"]},{"cell_type":"markdown","source":["## Step 3: Connecting to Cloud Spanner\n","\n","Now that we're authenticated, let's establish a connection to Cloud Spanner. This connection goes directly to your production Spanner instance and uses the compute allocated to that instance. (It doesn't use [DataBoost](https://cloud.google.com/spanner/docs/databoost/databoost-overview), which has great advantages for supported queries but does not support DML, DDL, or non-root-partitioned SELECT.)\n","\n","Please modify this example to specify your own project, instance, and database IDs."],"metadata":{"id":"kPpkm4IozH_S"}},{"cell_type":"code","source":["import os\n","\n","PROJECT_ID = os.environ.get(\"PROJECT_ID\") or \"span-cloud-testing\"\n","INSTANCE_ID = os.environ.get(\"INSTANCE_ID\") or \"aseering-us-east4\"\n","DATABASE_ID = os.environ.get(\"DATABASE_ID\") or \"gsql-test\"\n","\n","from google.cloud.spanner_dbapi import connect\n","\n","connection = connect(INSTANCE_ID, DATABASE_ID, project=PROJECT_ID)\n","cursor = connection.cursor()\n","\n","# The connection stays in its default (non-autocommit) mode, so the cells below\n","# call connection.commit() explicitly. (`autocommit` is a property of the\n","# connection; setting it on the cursor has no effect.)"],"metadata":{"id":"4aCc4yZw1EiV"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## Step 4: Load some Data\n","\n","Now that we're connected, let's create a table and load some data into it. We'll use the \"20newsgroups\" example dataset provided by scikit-learn.\n","\n","Spanner is often used to host production applications that generate their own data. If you already have tables, feel free to skip this step and update the following steps to point at your own tables. Otherwise, this section is a bit dense; hang onto your hats! 
Or just run it and skip ahead, assuming that you now have some data in your database."],"metadata":{"id":"cVcDaAvY1lNA"}},{"cell_type":"code","source":["# Import a bunch of dependencies\n","import email.parser\n","import sklearn.datasets\n","import time\n","import uuid\n","\n","# Download the sklearn \"20newsgroups\" dataset into a local variable\n","newsgroups = sklearn.datasets.fetch_20newsgroups(subset=\"all\")\n","\n","# The newsgroups are NNTP messages. NNTP messages are structured like e-mails.\n","# Parse them; then construct a DataFrame from a hardcoded list of fields.\n","# Spanner needs a unique ID for each record, so add a UUID column.\n","# A header may be missing from a message; default it to '' because the table\n","# columns below are declared NOT NULL.\n","parser = email.parser.Parser()\n","newsgroup_messages = [parser.parsestr(message) for message in newsgroups.data]\n","df = pd.DataFrame({\n"," 'id': [str(uuid.uuid4()) for _ in newsgroup_messages],\n"," 'from': [message.get('From', '') for message in newsgroup_messages],\n"," 'subject': [message.get('Subject', '') for message in newsgroup_messages],\n"," 'nntp_posting_host': [message.get('Nntp-Posting-Host', '')\n"," for message in newsgroup_messages],\n"," 'organization': [message.get('Organization', '') for message in newsgroup_messages],\n"," 'body': [message.get_payload() for message in newsgroup_messages],\n","})\n","\n","# Create a table with columns corresponding to common fields above.\n","# Treat all of the fields as unbounded strings for now. 
There's no cost to\n","# doing so, and we don't know how long future values might be.\n","# (Constrain the `id` field; it should be a valid UUID.)\n","cursor.execute(\"\"\"\n","DROP TABLE IF EXISTS spanner_ml_example_20newsgroups;\n","CREATE TABLE spanner_ml_example_20newsgroups (\n"," `id` STRING(36) NOT NULL,\n"," `from` STRING(MAX) NOT NULL,\n"," `subject` STRING(MAX) NOT NULL,\n"," `nntp_posting_host` STRING(MAX) NOT NULL,\n"," `organization` STRING(MAX) NOT NULL,\n"," `body` STRING(MAX) NOT NULL\n",") PRIMARY KEY (id)\n","\"\"\")\n","\n","# Spanner DDL statements don't require a commit on the Spanner backend, but\n","# this causes Spanner's DBAPI Driver to flush queued-up DDL changes to the\n","# backend and to refresh its local transaction pool.\n","connection.commit()\n","\n","# Load the data from the dataframe into Spanner.\n","# Slice and load several rows of data at a time for slightly better parallelism\n","# within Spanner, and to avoid exceeding Spanner transaction size limits.\n","BATCH_SIZE = 1000\n","for i in range(0, len(df), BATCH_SIZE):\n"," cursor.executemany(\"\"\"\n"," INSERT INTO spanner_ml_example_20newsgroups (\n"," `id`, `from`, `subject`, `nntp_posting_host`, `organization`, `body`\n"," ) VALUES (\n"," %(id)s, %(from)s, %(subject)s, %(nntp_posting_host)s, %(organization)s, %(body)s\n"," )\n"," \"\"\", df[i:i+BATCH_SIZE].to_dict(orient=\"records\"))\n"," connection.commit()"],"metadata":{"id":"eT_MhO_I6M29"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## Step 5: Register a Model\n","\n","Spanner supports both custom and pre-trained ML models. It uses models that are managed by [Vertex AI](https://cloud.google.com/vertex-ai).\n","\n","Vertex AI provides a powerful suite of tools for managing models. 
For now, let's just use Vertex AI's pre-trained [\"Gecko\" PaLM embedding model](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings).\n","\n","This model takes a string (such as a message body) as an argument, and returns a struct of `{statistics, values}`:\n","\n","* `values` - the actual embedding\n","* `statistics` - counts and other metadata, gathered while generating the embedding\n","\n","It's automatically available in all GCP projects, published at the URL in the example below. The model should be available to Spanner, but depending on your IAM configuration, you may be prompted to enable permissions to allow access."],"metadata":{"id":"0mhVIKoUfzXn"}},{"cell_type":"code","source":["cursor.execute(\"\"\"\n","CREATE OR REPLACE MODEL spanner_ml_example_textembedding_gecko\n","INPUT (\n"," content STRING(MAX)\n",")\n","OUTPUT (\n"," embeddings STRUCT<\n"," statistics STRUCT<\n"," truncated BOOL, token_count DOUBLE\n"," >,\n"," values ARRAY<FLOAT64>\n"," >\n",")\n","REMOTE OPTIONS (\n"," endpoint = '//aiplatform.googleapis.com/projects/{PROJECT_ID}/locations/us-central1/publishers/google/models/textembedding-gecko'\n",")\n","\"\"\".format(PROJECT_ID=connection.instance._client.project))\n","connection.commit()\n"],"metadata":{"id":"xUbghft4BTH4","colab":{"base_uri":"https://localhost:8080/","height":426},"executionInfo":{"status":"error","timestamp":1692050094144,"user_tz":240,"elapsed":2027,"user":{"displayName":"Adam Seering","userId":"08779482826874381672"}},"outputId":"19962e5b-a58e-4ceb-fe73-f0d06daed391"},"execution_count":null,"outputs":[{"output_type":"error","ename":"FailedPrecondition","evalue":"ignored","traceback":["\u001b[0;31m---------------------------------------------------------------------------\u001b[0m","\u001b[0;31mFailedPrecondition\u001b[0m Traceback (most recent call last)","\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 16\u001b[0m )\n\u001b[1;32m 17\u001b[0m 
\"\"\".format(PROJECT_ID=connection.instance._client.project))\n\u001b[0;32m---> 18\u001b[0;31m \u001b[0mconnection\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcommit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m","\u001b[0;32m/usr/local/lib/python3.10/dist-packages/google/cloud/spanner_dbapi/connection.py\u001b[0m in \u001b[0;36mcommit\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 391\u001b[0m \u001b[0;32mreturn\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 392\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 393\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun_prior_DDL_statements\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 394\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0minside_transaction\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 395\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.10/dist-packages/google/cloud/spanner_dbapi/connection.py\u001b[0m in \u001b[0;36mwrapper\u001b[0;34m(connection, *args, **kwargs)\u001b[0m\n\u001b[1;32m 51\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mInterfaceError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Connection is already closed\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 52\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 53\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfunction\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mconnection\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 
54\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 55\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mwrapper\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.10/dist-packages/google/cloud/spanner_dbapi/connection.py\u001b[0m in \u001b[0;36mrun_prior_DDL_statements\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 433\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_ddl_statements\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 434\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 435\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdatabase\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mupdate_ddl\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mddl_statements\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mresult\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 436\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 437\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mrun_statement\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstatement\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mretried\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.10/dist-packages/google/api_core/future/polling.py\u001b[0m in \u001b[0;36mresult\u001b[0;34m(self, timeout, retry, polling)\u001b[0m\n\u001b[1;32m 259\u001b[0m \u001b[0;31m# pylint: disable=raising-bad-type\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 260\u001b[0m \u001b[0;31m# Pylint doesn't recognize that this is valid in this case.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 261\u001b[0;31m \u001b[0;32mraise\u001b[0m 
\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_exception\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 262\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 263\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_result\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;31mFailedPrecondition\u001b[0m: 400 Model spanner_ml_example_textembedding_gecko has invalid endpoint option: //aiplatform.googleapis.com/projects/span-cloud-testing/locations/us-central1/publishers/google/models/textembedding-gecko. Expected suffix projects/*/locations/*/endpoints/* 9: Model spanner_ml_example_textembedding_gecko has invalid endpoint option: //aiplatform.googleapis.com/projects/span-cloud-testing/locations/us-central1/publishers/google/models/textembedding-gecko. Expected suffix projects/*/locations/*/endpoints/*"]}]},{"cell_type":"markdown","source":["## Step 6: Add computed embedding column\n","\n","Let's say you want to be able to build a tool to enable quick searching for relevant messages. Now that we have registered a text-embedding model with Spanner, we can use that model to have Spanner automatically calculate and store the embedding for messages in the dataset.\n","\n","With this approach, the embedding will automatically (and transactionally) be updated whenever messages are inserted or modified."],"metadata":{"id":"0ZnuYP8mx0OY"}},{"cell_type":"code","source":["cursor.execute(\"\"\"\n","ALTER TABLE spanner_ml_example_20newsgroups\n","ADD COLUMN body_embedding ARRAY NOT NULL\n"," GENERATED AS spanner_ml_example_textembedding_gecko(body).values STORED\n","\"\"\")\n","connection.commit()"],"metadata":{"id":"grMvK3dTzQrv"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## Step 7: Read back embeddings\n","\n","Typically once embeddings are generated, your application would then be updated to use them. 
This way, Spanner will maintain the embeddings over time, and your application can read an up-to-date embedding whenever it needs one.\n","\n","As a simple example, let's read back the embeddings that we just generated and add them to our DataFrame."],"metadata":{"id":"hUkwRs3s1zU_"}},{"cell_type":"code","source":["# Read embeddings back from Spanner.\n","# Read back the row ID for each embedding as well, so we can match up the\n","# returned embeddings with the rows that we already have.\n","cursor.execute(\"\"\"\n","SELECT `id`, `body_embedding` FROM spanner_ml_example_20newsgroups\n","\"\"\")\n","\n","# Generate a dictionary from the result set.\n","# Map each row's ID to its embedding. DBAPI rows are positional sequences,\n","# so unpack the two selected columns by position rather than by name.\n","embeddings = {row_id: embedding for row_id, embedding in cursor}\n","\n","# For each ID in our dataframe,\n","# insert the corresponding embedding into a new \"body_embedding\" field\n","# in the dataframe.\n","df[\"body_embedding\"] = [embeddings[x] for x in df[\"id\"]]"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":773},"id":"5z3CLYm82c7S","executionInfo":{"status":"error","timestamp":1692050264688,"user_tz":240,"elapsed":459,"user":{"displayName":"Adam Seering","userId":"08779482826874381672"}},"outputId":"a3d7c009-0003-4080-8062-2b630a69df69"},"execution_count":null,"outputs":[{"output_type":"error","ename":"ProgrammingError","evalue":"ignored","traceback":["\u001b[0;31m---------------------------------------------------------------------------\u001b[0m","\u001b[0;31m_MultiThreadedRendezvous\u001b[0m Traceback (most recent call last)","\u001b[0;32m/usr/local/lib/python3.10/dist-packages/google/api_core/grpc_helpers.py\u001b[0m in \u001b[0;36merror_remapped_callable\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 161\u001b[0m \u001b[0mprefetch_first\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcallable_\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0;34m\"_prefetch_first_result_\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 162\u001b[0;31m return _StreamingResponseIterator(\n\u001b[0m\u001b[1;32m 163\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mprefetch_first_result\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mprefetch_first\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.10/dist-packages/google/api_core/grpc_helpers.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, wrapped, prefetch_first_result)\u001b[0m\n\u001b[1;32m 87\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mprefetch_first_result\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 88\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_stored_first_result\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_wrapped\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 89\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.10/dist-packages/grpc/_channel.py\u001b[0m in \u001b[0;36m__next__\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 540\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__next__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 541\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_next\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 542\u001b[0m 
\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.10/dist-packages/grpc/_channel.py\u001b[0m in \u001b[0;36m_next\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 966\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_state\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcode\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 967\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 968\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;31m_MultiThreadedRendezvous\u001b[0m: <_MultiThreadedRendezvous of RPC that terminated with:\n\tstatus = StatusCode.INVALID_ARGUMENT\n\tdetails = \"Unrecognized name: body_embedding [at 2:12]\\nSELECT id, body_embedding FROM spanner_ml_example_20newsgroups\\n ^\"\n\tdebug_error_string = \"UNKNOWN:Error received from peer ipv4:173.194.213.95:443 {created_time:\"2023-08-14T21:57:44.679087403+00:00\", grpc_status:3, grpc_message:\"Unrecognized name: body_embedding [at 2:12]\\\\nSELECT id, body_embedding FROM spanner_ml_example_20newsgroups\\\\n ^\"}\"\n>","\nThe above exception was the direct cause of the following exception:\n","\u001b[0;31mInvalidArgument\u001b[0m Traceback (most recent call last)","\u001b[0;32m/usr/local/lib/python3.10/dist-packages/google/cloud/spanner_dbapi/cursor.py\u001b[0m in \u001b[0;36mexecute\u001b[0;34m(self, sql, args)\u001b[0m\n\u001b[1;32m 272\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 273\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_itr\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0mPeekIterator\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_result_set\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 274\u001b[0m \u001b[0;32mbreak\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.10/dist-packages/google/cloud/spanner_dbapi/utils.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, source)\u001b[0m\n\u001b[1;32m 37\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 38\u001b[0;31m \u001b[0mhead\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mitr_src\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 39\u001b[0m \u001b[0;31m# Restitch and prepare to read from multiple iterators.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.10/dist-packages/google/cloud/spanner_v1/streamed.py\u001b[0m in \u001b[0;36m__iter__\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 144\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 145\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_consume_next\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 146\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mStopIteration\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.10/dist-packages/google/cloud/spanner_v1/streamed.py\u001b[0m in \u001b[0;36m_consume_next\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 116\u001b[0m \"\"\"\n\u001b[0;32m--> 117\u001b[0;31m \u001b[0mresponse\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_response_iterator\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 118\u001b[0m \u001b[0mresponse_pb\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mPartialResultSet\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpb\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mresponse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.10/dist-packages/google/cloud/spanner_v1/snapshot.py\u001b[0m in \u001b[0;36m_restart_on_unavailable\u001b[0;34m(method, request, trace_name, session, attributes, transaction, transaction_selector)\u001b[0m\n\u001b[1;32m 87\u001b[0m \u001b[0;32mwith\u001b[0m \u001b[0mtrace_call\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtrace_name\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msession\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mattributes\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 88\u001b[0;31m \u001b[0miterator\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mmethod\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrequest\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mrequest\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 89\u001b[0m \u001b[0;32mwhile\u001b[0m \u001b[0;32mTrue\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.10/dist-packages/google/cloud/spanner_v1/services/spanner/client.py\u001b[0m in \u001b[0;36mexecute_streaming_sql\u001b[0;34m(self, request, retry, timeout, metadata)\u001b[0m\n\u001b[1;32m 1203\u001b[0m \u001b[0;31m# Send the request.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1204\u001b[0;31m response = rpc(\n\u001b[0m\u001b[1;32m 1205\u001b[0m 
\u001b[0mrequest\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.10/dist-packages/google/api_core/gapic_v1/method.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self, timeout, retry, *args, **kwargs)\u001b[0m\n\u001b[1;32m 112\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 113\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mwrapped_func\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 114\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.10/dist-packages/google/api_core/timeout.py\u001b[0m in \u001b[0;36mfunc_with_timeout\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 119\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 120\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 121\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.10/dist-packages/google/api_core/grpc_helpers.py\u001b[0m in \u001b[0;36merror_remapped_callable\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 165\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mgrpc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mRpcError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mexc\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 166\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mexceptions\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfrom_grpc_error\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mexc\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfrom\u001b[0m 
\u001b[0mexc\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 167\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;31mInvalidArgument\u001b[0m: 400 Unrecognized name: body_embedding [at 2:12]\\nSELECT id, body_embedding FROM spanner_ml_example_20newsgroups\\n ^ [locale: \"en-US\"\nmessage: \"Unrecognized name: body_embedding [at 2:12]\\nSELECT id, body_embedding FROM spanner_ml_example_20newsgroups\\n ^\"\n]","\nThe above exception was the direct cause of the following exception:\n","\u001b[0;31mProgrammingError\u001b[0m Traceback (most recent call last)","\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;31m# Read back the row ID for each embedding as well, so we can match up the\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;31m# returned embeddings with the rows that we already have.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m cursor.execute(\"\"\"\n\u001b[0m\u001b[1;32m 5\u001b[0m \u001b[0mSELECT\u001b[0m \u001b[0mid\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbody_embedding\u001b[0m \u001b[0mFROM\u001b[0m \u001b[0mspanner_ml_example_20newsgroups\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \"\"\")\n","\u001b[0;32m/usr/local/lib/python3.10/dist-packages/google/cloud/spanner_dbapi/cursor.py\u001b[0m in \u001b[0;36mwrapper\u001b[0;34m(cursor, *args, **kwargs)\u001b[0m\n\u001b[1;32m 68\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mInterfaceError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Cursor and/or connection is already closed.\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 69\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 70\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfunction\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcursor\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 71\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 72\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mwrapper\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.10/dist-packages/google/cloud/spanner_dbapi/cursor.py\u001b[0m in \u001b[0;36mexecute\u001b[0;34m(self, sql, args)\u001b[0m\n\u001b[1;32m 288\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mIntegrityError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0me\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"details\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 289\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mInvalidArgument\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 290\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mProgrammingError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0me\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"details\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 291\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mInternalServerError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 292\u001b[0m \u001b[0;32mraise\u001b[0m 
\u001b[0mOperationalError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0me\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"details\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;31mProgrammingError\u001b[0m: [locale: \"en-US\"\nmessage: \"Unrecognized name: body_embedding [at 2:12]\\nSELECT id, body_embedding FROM spanner_ml_example_20newsgroups\\n ^\"\n]"]}]},{"cell_type":"markdown","source":["And there you have it! Spanner generates and maintains the embeddings for your data automatically, and you can read them from Python with an ordinary query."],"metadata":{"id":"_JvZvOT93wZE"}},{"cell_type":"code","source":["df.head(5)"],"metadata":{"id":"6OeLWcsQ3v-y"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## Step 7.1: Construct and read embeddings dynamically\n","\n","What if you don't want to store an embedding and only want to generate it on demand? Spanner can also invoke the new model as part of a query."],"metadata":{"id":"eNBjlrxLKXOd"}},{"cell_type":"code","source":["# Alias the computed column so it can be read back as `body_embedding` below.\n","cursor.execute(\"\"\"\n","SELECT id, spanner_ml_example_textembedding_gecko(body).values AS body_embedding\n","FROM spanner_ml_example_20newsgroups\n","\"\"\")\n","\n","# The query computes and returns the same values as the stored embeddings.\n","assert {row[\"id\"]: row[\"body_embedding\"] for row in cursor} == embeddings\n","\n","# Embed an arbitrary string without reading from or writing to any table.\n","cursor.execute(\"\"\"\n","SELECT spanner_ml_example_textembedding_gecko(%(text)s).values AS embedding\n","\"\"\", {'text': \"Hello World!\"})\n","next(cursor)['embedding']"],"metadata":{"id":"i46sfj8mKW9T"},"execution_count":null,"outputs":[]}]} \ No newline at end of file