Using Cohere to Generate Vector Embeddings
When using Vector Databases, or the Vector Embedding features in most modern multi-model Databases, you can gain additional insights from your data. Vector Embeddings can be utilised for semantic searches, where you find related data or information based on how close the vector values are to each other. One of the challenges is generating suitable vectors, and there are many options available for doing so. In this post, I'll illustrate how to generate Vector Embeddings using Cohere, and in a future post I'll illustrate using OpenAI. Each solution has its advantages and disadvantages, and I'll point out some of these for both Cohere and OpenAI, using their Python API libraries.
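To give a sense of how a semantic search uses these vectors, here is a minimal sketch of comparing two embeddings with cosine similarity. The vectors below are made-up, tiny examples purely for illustration; the closer the score is to 1, the more related the underlying text is taken to be.

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: values near 1.0 mean the vectors point in the same
    # direction (related text); values near 0 mean the texts are unrelated
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional vectors for illustration only; real Cohere embeddings
# have 1024 dimensions
review_vec = [0.12, 0.87, 0.33, 0.05]
query_vec  = [0.10, 0.80, 0.40, 0.02]
print(cosine_similarity(review_vec, query_vec))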
The first step is to install the Cohere Python library:
pip install cohere
Next, you need to create an account with Cohere to get an API key. You can get a Trial API key but, as with most free or trial keys, you will be restricted in the number of calls you can make. This is commonly referred to as Rate Limiting. The trial key for the embedding models allows up to 40 calls per minute. This is very limited and each call is very slow. (I'll discuss related rate-limiting issues with OpenAI in another post.)
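If you want to stay within the trial limit, one approach is to pace your calls. The sketch below is illustrative only and is not part of the script later in this post; it assumes 'co' is a cohere.Client and uses the 40 calls/minute figure mentioned above.

import time

CALLS_PER_MINUTE = 40                            # Cohere trial key limit for embedding calls
SLEEP_BETWEEN_CALLS = 60.0 / CALLS_PER_MINUTE    # ~1.5 seconds per call

def paced_embed(co, text, model="embed-english-v3.0", input_type="search_query"):
    # Make one embedding call, then sleep long enough to stay under the limit
    res = co.embed(texts=[text], model=model, input_type=input_type)
    time.sleep(SLEEP_BETWEEN_CALLS)
    return res.embeddings[0]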
The dataset I'll be using is the Wine Reviews 130K dataset (dropbox link), which is widely available on many sites. I want to create Vector Embeddings for the 'description' field in this dataset, which contains a review of each wine. Some columns contain no values, and these need to be handled. For each wine review, I'll create a SQL INSERT statement and write it out to a file. The resulting file will contain an INSERT statement for each wine review, including its vector embedding.
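Before generating any embeddings, it's worth checking which columns contain missing values. A quick way to do this with pandas (the file path is just a placeholder for wherever you've saved the dataset):

import pandas as pd

df = pd.read_csv("winemag-data-130k-v2.csv")
print(df.shape)              # number of rows and columns
print(df.isnull().sum())     # count of missing values per column
# One option for text columns is to replace missing values with an empty
# string, e.g. df['region_2'] = df['region_2'].fillna('')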
Here's the code (you'll need to enter your API key and change the directory for the data file):
import numpy as np
import os
import time
import pandas as pd
import cohere

co = cohere.Client(api_key="...")

data_file = ".../VectorDatabase/winemag-data-130k-v2.csv"
df = pd.read_csv(data_file)

print_every = 200
rate_limit = 1000   # number of records to process (Cohere trial key limits you to 40 API calls per minute)

print("Input file :", data_file)

# Output file has the same name as the input file, with a .cohere extension
v_file = os.path.splitext(data_file)[0] + '.cohere'
print(v_file)

# Open file with write (over-writes previous file)
f = open(v_file, "w")

for index, row in df.head(rate_limit).iterrows():
    # co.embed() expects a list of texts, so wrap the description in a
    # single-element list (list(row['description']) would split it into characters)
    phrases = [row['description']]
    model = "embed-english-v3.0"
    input_type = "search_query"
    #####
    res = co.embed(texts=phrases,
                   model=model,
                   input_type=input_type)  # ,
                                           # embedding_types=['float'])

    v_embedding = str(res.embeddings[0])

    # Build the INSERT statement for this wine review, including its embedding
    tab_insert = "INSERT into WINE_REVIEWS_130K VALUES (" + str(row["Seq"]) + "," \
                 + '"' + str(row["description"]) + '",' \
                 + '"' + str(row["designation"]) + '",' \
                 + str(row["points"]) + "," \
                 + '"' + str(row["province"]) + '",' \
                 + str(row["price"]) + "," \
                 + '"' + str(row["region_1"]) + '",' \
                 + '"' + str(row["region_2"]) + '",' \
                 + '"' + str(row["taster_name"]) + '",' \
                 + '"' + str(row["taster_twitter_handle"]) + '",' \
                 + '"' + str(row["title"]) + '",' \
                 + '"' + str(row["variety"]) + '",' \
                 + '"' + str(row["winery"]) + '",' \
                 + "'" + v_embedding + "'" + ");\n"

    f.write(tab_insert)

    if (index % print_every == 0):
        print(f'Processed {index} vectors ', time.strftime("%H:%M:%S", time.localtime()))

# Close vector file
f.close()
print(f"Finished writing file with Vector data [{index+1} vectors]", time.strftime("%H:%M:%S", time.localtime()))
Each vector generated has 1024 dimensions. At the time of writing, there isn't a parameter to change or reduce the number of dimensions.
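You can confirm this by checking the length of a returned embedding, assuming 'res' is the response from the co.embed() call in the script above:

print(len(res.embeddings[0]))   # prints 1024 for embed-english-v3.0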
The output file can now be run against your database, assuming you've created a table called WINE_REVIEWS_130K with a column of an appropriate data type (e.g. VECTOR) for the embedding.
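As an illustration of running the generated file from Python, here is a sketch using a generic DB-API style connection. The connection itself is a placeholder; substitute whichever driver your database uses, and note that the simple split on ";\n" assumes the statements look like the ones written by the script above.

def run_sql_file(conn, sql_file):
    # 'conn' is assumed to be an open DB-API connection from your own driver
    cur = conn.cursor()
    with open(sql_file) as f:
        # Each INSERT statement written by the script ends with ";\n"
        for statement in f.read().split(";\n"):
            if statement.strip():
                cur.execute(statement)
    conn.commit()
    cur.close()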
Warnings: when using the Cohere API with a trial key you are limited to a maximum of 40 calls per minute. In practice I found this figure to be optimistic; for me it was more like 38 calls. I also found the 'per minute' part didn't hold: I had to wait several minutes, sometimes up to five, before I could attempt another run.
In an attempt to overcome this, I created a Production API key. This involved giving some payment details and should, in theory, remove the 'per minute' rate limit, among other things. Unfortunately, this was not a good experience for me, as it took multiple attempts to process 1000 records before I had a successful outcome. I experienced multiple Server 500 errors and other errors relating to problems on Cohere's servers.
I wasn't able to process more than 600 records before these errors occurred, so I couldn't generate embeddings for a larger portion of the dataset.
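One way to make the script a little more resilient to these intermittent server errors is to wrap the embed call in a simple retry with backoff. This is only a sketch; it won't fix Cohere-side outages, and the retry counts and delays below are arbitrary choices.

import time

def embed_with_retry(co, texts, model="embed-english-v3.0",
                     input_type="search_query", max_retries=5):
    # Retry the embed call a few times, doubling the wait after each failure
    delay = 5
    for attempt in range(max_retries):
        try:
            return co.embed(texts=texts, model=model, input_type=input_type)
        except Exception as err:
            print(f"Embed call failed (attempt {attempt + 1}): {err}")
            time.sleep(delay)
            delay *= 2
    raise RuntimeError("Embedding failed after retries")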
An additional issue was the response time from Cohere: it was taking approximately 5 minutes to process 200 API calls.
So overall, a rather poor experience. I then switched to OpenAI and had a slightly different experience; check out that post for more details.