Vector Databases – Part 8 – Creating Vector Indexes
Building on the previous posts, this post explores the different types of Vector Indexes and how to create them. In the next post I'll demonstrate how to explore wine reviews using Vector Embeddings and Vector Indexes.
Before we start working with Vector Indexes, we need to allocate some memory within the database where these Vector Indexes will be located. To do this, we need to log into the container database and change the relevant parameter. I'm using a 23ai (version 23.5) VM. On the VM, run the following:
sqlplus system/oracle@free
This connects you to the container database. Now run the following to change the memory allocation. Be careful not to set this too high, particularly on the VM: 512M is probably the most you would want, and in my case I've set it to a smaller value.
alter system set vector_memory_size = 200M scope=spfile;
You then need to restart (bounce) the database for the change to take effect. You can do this using SQL commands; if you're not sure how, just restart the VM.
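A minimal sketch of the SQL route, assuming you can connect to the container database as SYSDBA using operating system authentication on the VM. Shutting down requires SYSDBA (or SYSOPER), so the SYSTEM connection used above is not enough. The PDB check at the end is a precaution, since pluggable databases are not always reopened automatically after a restart.

```sql
-- From the VM's OS prompt, connect with: sqlplus / as sysdba
-- (assumes OS authentication is set up for the oracle user on the VM)
SHUTDOWN IMMEDIATE
STARTUP
-- Check the open mode of the pluggable databases after the restart
SHOW PDBS
-- Only needed if your PDB is listed as MOUNTED rather than READ WRITE
ALTER PLUGGABLE DATABASE ALL OPEN;
```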
After the restart, log into your schema and run the following to see if the parameter has been set correctly.
show parameter vector_memory_size;
NAME                 TYPE        VALUE
-------------------- ----------- -----
vector_memory_size   big integer 208M
The results show that the 200M has been allocated (208M, to be exact, as the value gets rounded up). If you see 0, something went wrong; typically this is because the command was not run at the container database level.
You only need to set this once for the database.
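Besides SHOW PARAMETER, 23ai exposes the vector memory pool through the V$VECTOR_MEMORY_POOL view. A minimal sketch, assuming your user has privileges to query V$ views (SYSTEM does). The used figures should grow once an in-memory (HNSW) index has been created.

```sql
-- Allocated and used vector pool memory, in MB, per container
SELECT con_id,
       pool,
       alloc_bytes / 1024 / 1024 AS alloc_mb,
       used_bytes  / 1024 / 1024 AS used_mb
FROM   v$vector_memory_pool
ORDER  BY con_id, pool;
```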
There are two types of Vector Indexes:
- Neighbor Partition Vector Index. Inverted File Flat (IVF) is the only type of Neighbor Partition vector index supported. IVF (also called IVF Flat) is a partition-based index that balances high search quality with reasonable speed. This index is typically disk-based.
- In-Memory Neighbor Graph Vector Index. Hierarchical Navigable Small World (HNSW) is the only type of In-Memory Neighbor Graph vector index supported. HNSW graphs are very efficient indexes for approximate vector similarity search. They are structured using principles from small-world networks combined with a layered, hierarchical organization. As the name suggests, these indexes reside in memory, in the vector memory allocated in the step above.
Let’s have a look at creating each of these.
CREATE VECTOR INDEX wine_desc_ivf_idx
ON wine_reviews_130k (embedding)
ORGANIZATION NEIGHBOR PARTITIONS
DISTANCE COSINE
WITH TARGET ACCURACY 90 PARAMETERS (type IVF, neighbor partitions 10);
As with a typical CREATE INDEX, you specify the table and the column; the column must have the VECTOR data type. You then specify the distance metric, the target accuracy and any additional parameters. The PARAMETERS clause is optional; if you omit it, the defaults are used.
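To get a feel for the DISTANCE COSINE setting, you can call VECTOR_DISTANCE directly on small literal vectors. Cosine distance is 1 minus the cosine similarity, so only the direction of the vectors matters: vectors pointing the same way are at distance 0, orthogonal vectors at 1 and opposite vectors at 2. A minimal sketch using made-up two-dimensional vectors:

```sql
-- Cosine distance = 1 - cosine similarity (vector length is ignored)
SELECT VECTOR_DISTANCE(TO_VECTOR('[1, 0]'), TO_VECTOR('[2, 0]'),  COSINE) AS same_direction, -- distance 0
       VECTOR_DISTANCE(TO_VECTOR('[1, 0]'), TO_VECTOR('[0, 1]'),  COSINE) AS orthogonal,     -- distance 1
       VECTOR_DISTANCE(TO_VECTOR('[1, 0]'), TO_VECTOR('[-1, 0]'), COSINE) AS opposite        -- distance 2
FROM   dual;
```

The same metric should be used in your queries as in the index definition, otherwise the index cannot be used to answer them.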
For the other type, the in-memory (HNSW) index, we have:
CREATE VECTOR INDEX wine_desc_hnsw_idx ON wine_reviews_130k (embedding)
ORGANIZATION INMEMORY NEIGHBOR GRAPH
DISTANCE COSINE
WITH TARGET ACCURACY 95;
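If you want to control how the HNSW graph is built rather than relying on the defaults, the PARAMETERS clause applies here too. A sketch with illustrative, untuned values: NEIGHBORS is the maximum number of neighbors a vector can have on any layer of the graph, and EFCONSTRUCTION is the number of closest candidates considered while inserting each vector. Higher values generally improve accuracy at the cost of more memory and a longer build. Drop the HNSW index created above before trying this.

```sql
-- Illustrative values only: tune NEIGHBORS and EFCONSTRUCTION for your own data
CREATE VECTOR INDEX wine_desc_hnsw_idx ON wine_reviews_130k (embedding)
ORGANIZATION INMEMORY NEIGHBOR GRAPH
DISTANCE COSINE
WITH TARGET ACCURACY 95
PARAMETERS (TYPE HNSW, NEIGHBORS 32, EFCONSTRUCTION 200);
```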
You can test the accuracy of a Vector Index. You'll need a test string that has been converted into a vector by the same embedding model that was used on the original data. See my previous posts for examples of how to do this.
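SET SERVEROUTPUT ON
-- :query_vector holds the query embedding in text form, e.g. '[n1, n2, n3]'
declare
  q_v    VECTOR;
  report varchar2(128);
begin
  -- convert the bind variable into a VECTOR value
  q_v := to_vector(:query_vector);
  -- compare the index (approximate) results with an exact search for this query
  report := dbms_vector.index_accuracy_query(
              owner_name      => 'VECTORAI',
              index_name      => 'WINE_DESC_HNSW_IDX',
              qv              => q_v,
              top_k           => 10,
              target_accuracy => 90);
  dbms_output.put_line(report);
end;
/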
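To see roughly what such an accuracy figure measures, you can compare an exact search with an approximate search yourself and count how many of the exact top 10 rows the approximate search also returned. A minimal sketch, assuming wine_reviews_130k has a unique key column called id (a hypothetical name; substitute your own) and that :query_vector is bound as in the block above:

```sql
-- Exact top 10: computes the distance for every row, no index used
WITH exact_top AS (
  SELECT id
  FROM   wine_reviews_130k
  ORDER  BY VECTOR_DISTANCE(embedding, TO_VECTOR(:query_vector), COSINE)
  FETCH  EXACT FIRST 10 ROWS ONLY
),
-- Approximate top 10: allows (but does not force) use of the vector index
approx_top AS (
  SELECT id
  FROM   wine_reviews_130k
  ORDER  BY VECTOR_DISTANCE(embedding, TO_VECTOR(:query_vector), COSINE)
  FETCH  APPROX FIRST 10 ROWS ONLY WITH TARGET ACCURACY 90
)
-- Percentage of the exact top 10 also found by the approximate search
SELECT COUNT(*) * 100 / 10 AS pct_found
FROM   exact_top
JOIN   approx_top USING (id);
```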