2023-05-15

Metadata in Pinecone

Introduction

Metadata is a powerful and flexible feature of Pinecone that adds context to the vector representations of data objects. By attaching key-value metadata to the vectors in an index, you can make vector search more precise and better suited to your use case. This article covers the supported metadata types and size limit, the filter query language, and the role metadata plays when inserting, querying, and deleting vectors.

Filtering with Metadata

The primary purpose of metadata in Pinecone is to restrict a vector search to vectors that meet certain conditions. You do this by passing a filter expression with the query. A filtered query returns exactly the requested number (top_k) of nearest neighbors that match the filter, as long as enough matching vectors exist, rather than a shorter list trimmed after the search.

A key advantage of metadata filters is that, in most cases, filtered searches have even lower latency than unfiltered ones, without sacrificing accuracy or relevance. This makes filters an efficient way to narrow down vector searches in Pinecone.
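To see why returning exactly top_k matching results matters, here is a toy brute-force comparison in plain Python. It is not how Pinecone implements filtering; it only contrasts ranking first and filtering afterwards (post-filtering) with restricting the candidates before ranking. The ITEMS data and both function names are hypothetical.

```python
import math

# Toy in-memory "index": id -> (vector, metadata). Illustrative data only.
ITEMS = {
    "a": ([0.0, 0.0], {"genre": "drama"}),
    "b": ([0.1, 0.0], {"genre": "drama"}),
    "c": ([0.2, 0.0], {"genre": "drama"}),
    "d": ([0.9, 0.0], {"genre": "comedy"}),
    "e": ([1.0, 0.0], {"genre": "comedy"}),
}


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.dist(u, v)


def post_filtered_search(query, genre, top_k):
    """Rank everything, keep the top_k, THEN drop non-matching items."""
    ranked = sorted(ITEMS, key=lambda i: distance(query, ITEMS[i][0]))
    return [i for i in ranked[:top_k] if ITEMS[i][1]["genre"] == genre]


def filtered_search(query, genre, top_k):
    """Restrict the candidates to matching items first, then take the top_k."""
    candidates = [i for i in ITEMS if ITEMS[i][1]["genre"] == genre]
    candidates.sort(key=lambda i: distance(query, ITEMS[i][0]))
    return candidates[:top_k]


print(post_filtered_search([0.0, 0.0], "comedy", top_k=2))  # []
print(filtered_search([0.0, 0.0], "comedy", top_k=2))       # ['d', 'e']
```

With post-filtering, the two nearest neighbors are both dramas, so a comedy query comes back empty even though comedies exist; restricting the candidates first returns the two closest comedies.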

Supported Metadata Types

In Pinecone, each vector in an index can be associated with a metadata payload. This payload is a JSON object consisting of key-value pairs where keys are strings, and values can be of the following types:

  • String
  • Number (integer or floating point; stored as a 64-bit floating-point number)
  • Boolean (true, false)
  • List of strings
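A minimal sketch, assuming you want to check payloads on the client side before upserting: the functions below (hypothetical names, not part of the Pinecone client) accept only the four types listed above and mirror the conversion of numbers to 64-bit floats, which is what Python's float already is.

```python
def is_valid_metadata_value(value):
    """True if value is a string, number, boolean, or list of strings."""
    if isinstance(value, (str, bool, int, float)):
        return True
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


def is_valid_metadata(payload):
    """True if payload is a dict with string keys and supported values."""
    return isinstance(payload, dict) and all(
        isinstance(key, str) and is_valid_metadata_value(value)
        for key, value in payload.items()
    )


def normalize_numbers(payload):
    """Convert numbers to float (a 64-bit double in Python).

    Booleans are left unchanged even though bool is a subclass of int.
    """
    return {
        key: float(value)
        if isinstance(value, (int, float)) and not isinstance(value, bool)
        else value
        for key, value in payload.items()
    }


movie = {"genre": "action", "year": 2020, "length_hrs": 1.5}
print(is_valid_metadata(movie))            # True
print(is_valid_metadata({"ids": [1, 2]}))  # False: only lists of strings
print(normalize_numbers(movie))            # {'genre': 'action', 'year': 2020.0, 'length_hrs': 1.5}
```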

Examples of Valid Metadata Payloads

Here are a couple of examples of valid metadata payloads in Pinecone:

{
  "genre": "action",
  "year": 2020,
  "length_hrs": 1.5
}

In this example, the metadata is related to a movie. Each key-value pair represents a different attribute of the movie.

{
  "color": "blue",
  "fit": "straight",
  "price": 29.99,
  "is_jeans": true
}

In the second example, the metadata is related to a clothing item. The key-value pairs describe the color, fit, price, and type of the item.

Supported Metadata Size

Pinecone supports up to 40 KB of metadata per vector. This cap keeps search and retrieval efficient while still leaving ample room for descriptive metadata.
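The sketch below estimates a payload's size as compact UTF-8 JSON and compares it with 40 × 1024 bytes. Both the measurement method and the exact byte count are assumptions; Pinecone's own accounting may differ, so treat this as a rough pre-flight check rather than the service's rule.

```python
import json

# Assumption: 40 KB is read as 40 * 1024 bytes, and size is measured as
# compact UTF-8 JSON. Pinecone's own accounting may differ.
METADATA_LIMIT_BYTES = 40 * 1024


def metadata_size_bytes(payload):
    """Size of the payload serialized as compact UTF-8 JSON."""
    text = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def fits_metadata_limit(payload, limit=METADATA_LIMIT_BYTES):
    """True if the estimated size is within the (assumed) limit."""
    return metadata_size_bytes(payload) <= limit


print(metadata_size_bytes({"genre": "action", "year": 2020, "length_hrs": 1.5}))  # 47
print(fits_metadata_limit({"text": "x" * 50_000}))                                # False
```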

Metadata Query Language

Pinecone's metadata query language lets you combine filters with the logical operators $and and $or. The following comparison operators are available:

  • $eq: Equal to (applies to numbers, strings, booleans)
  • $ne: Not equal to (applies to numbers, strings, booleans)
  • $gt: Greater than (applies to numbers)
  • $gte: Greater than or equal to (applies to numbers)
  • $lt: Less than (applies to numbers)
  • $lte: Less than or equal to (applies to numbers)
  • $in: In array (applies to strings or numbers)
  • $nin: Not in array (applies to strings or numbers)
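To make the operator semantics concrete, here is a minimal local evaluator for filter expressions in plain Python. It is a hypothetical sketch for scalar values only, not Pinecone's implementation: several keys in one filter are combined with AND, a bare value is shorthand for $eq, and treating a missing key as a non-match is a choice made here.

```python
_MISSING = object()  # sentinel for keys absent from the metadata

_RANGE_OPS = ("$gt", "$gte", "$lt", "$lte")


def _check(value, op, operand):
    """Evaluate one comparison operator against a stored scalar value."""
    if value is _MISSING:
        return False  # assumption of this sketch: absent keys never match
    if op in _RANGE_OPS and not isinstance(value, (int, float)):
        return False  # range operators apply to numbers only
    if op == "$eq":
        return value == operand
    if op == "$ne":
        return value != operand
    if op == "$gt":
        return value > operand
    if op == "$gte":
        return value >= operand
    if op == "$lt":
        return value < operand
    if op == "$lte":
        return value <= operand
    if op == "$in":
        return value in operand
    if op == "$nin":
        return value not in operand
    raise ValueError(f"unsupported operator: {op}")


def matches(metadata, flt):
    """Return True if a metadata dict satisfies a filter expression."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in cond):
                return False
        else:
            value = metadata.get(key, _MISSING)
            # A bare value is shorthand for {"$eq": value}.
            ops = cond if isinstance(cond, dict) else {"$eq": cond}
            if not all(_check(value, op, arg) for op, arg in ops.items()):
                return False
    return True


movie = {"genre": "drama", "year": 2020}
print(matches(movie, {"$or": [{"genre": "comedy"}, {"year": {"$gte": 2019}}]}))  # True
print(matches(movie, {"genre": {"$in": ["comedy", "documentary"]}}))            # False
```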

Using Arrays of Strings as Metadata Values or as Metadata Filters

In Pinecone, you can use arrays of strings both as metadata values and in metadata filters. An equality or $in condition on a list-valued field matches when at least one element of the list matches, which is useful for multi-valued attributes such as tags or categories.

For instance, consider a vector with the following metadata payload:

{ "genre": ["comedy", "documentary"] }

Here are some queries with filters that will match the above vector:

{"genre":"comedy"}

{"genre": {"$in":["documentary","action"]}}

{"$and": [{"genre": "comedy"}, {"genre":"documentary"}]}

Conversely, a query with the following filter will not match the vector:

{ "$and": [{ "genre": "comedy" }, { "genre": "drama" }] }
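The matching rule behind these examples is that a list-valued field satisfies an equality or $in condition when any of its elements does. A minimal sketch that models only the equality shorthand, $eq, $in, and $and (function names are hypothetical, not Pinecone's API) reproduces the four results above:

```python
def field_matches(stored, condition):
    """Match one field, treating a list value as "any element matches".

    Only the equality shorthand, $eq and $in are modeled in this sketch.
    """
    values = stored if isinstance(stored, list) else [stored]
    if not isinstance(condition, dict):
        condition = {"$eq": condition}  # bare value is shorthand for $eq
    for op, operand in condition.items():
        if op == "$eq":
            ok = operand in values
        elif op == "$in":
            ok = any(v in operand for v in values)
        else:
            raise ValueError(f"operator not modeled in this sketch: {op}")
        if not ok:
            return False
    return True


def matches(metadata, flt):
    """Evaluate an implicit AND across keys, plus an explicit $and."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif key not in metadata or not field_matches(metadata[key], cond):
            return False
    return True


doc = {"genre": ["comedy", "documentary"]}
print(matches(doc, {"genre": "comedy"}))                                        # True
print(matches(doc, {"genre": {"$in": ["documentary", "action"]}}))              # True
print(matches(doc, {"$and": [{"genre": "comedy"}, {"genre": "documentary"}]}))  # True
print(matches(doc, {"$and": [{"genre": "comedy"}, {"genre": "drama"}]}))        # False
```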

Inserting Metadata into an Index

Metadata can be included in upsert requests as you insert your vectors.

Here is an example of inserting vectors with associated metadata, representing movies, into a Pinecone index using the Python client:

```python
import pinecone

# Connect with your API key and the environment your project runs in.
pinecone.init(api_key="your-api-key", environment="us-west1-gcp")

# The index must already exist with dimension 8 to accept these vectors.
index = pinecone.Index("example-index")

# Each tuple is (id, vector values, metadata).
index.upsert([
    ("A", [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1], {"genre": "comedy", "year": 2020}),
    ("B", [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2], {"genre": "documentary", "year": 2019}),
    ("C", [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3], {"genre": "comedy", "year": 2019}),
    ("D", [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4], {"genre": "drama"}),
    ("E", [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5], {"genre": "drama"})
])
```
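If your data starts out as records rather than tuples, a small hypothetical helper like the one below can build the tuples used above and catch dimension mismatches before any request is sent. The record layout (keys "id", "values", and an optional "metadata") is an assumption for illustration, and emitting a 2-tuple (id, values) for records without metadata assumes the client's tuple form without metadata.

```python
def build_upsert_batch(records, dimension):
    """Turn record dicts into upsert tuples, validating vector length.

    Hypothetical helper: the record keys "id", "values" and optional
    "metadata" are an assumed input layout, not a Pinecone format.
    """
    batch = []
    for rec in records:
        values = [float(x) for x in rec["values"]]
        if len(values) != dimension:
            raise ValueError(
                f"vector {rec['id']!r} has {len(values)} values, expected {dimension}"
            )
        metadata = rec.get("metadata")
        # (id, values, metadata) when metadata is present, else (id, values).
        batch.append((rec["id"], values, metadata) if metadata else (rec["id"], values))
    return batch


records = [
    {"id": "A", "values": [0.1] * 8, "metadata": {"genre": "comedy", "year": 2020}},
    {"id": "D", "values": [0.4] * 8, "metadata": {"genre": "drama"}},
    {"id": "F", "values": [0.6] * 8},
]
batch = build_upsert_batch(records, dimension=8)
# index.upsert(batch)  # run against a live index
```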

Querying an Index with Metadata Filters

With metadata attached to your vectors, you can refine your searches by including metadata filter expressions with your queries. This allows you to limit the search to only those vectors that match the filter expression.

For example, you can search the movies index from the previous section for documentaries from 2019. The example also sets include_metadata=True so that each match's metadata is returned in the response.

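```python
# Find the closest documentary from 2019 and return its metadata.
# Keys in a filter are combined with AND.
index.query(
    vector=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    filter={
        "genre": {"$eq": "documentary"},
        "year": 2019
    },
    top_k=1,
    include_metadata=True
)
```

The response:

```
{'matches': [{'id': 'B',
              'metadata': {'genre': 'documentary', 'year': 2019.0},
              'score': 0.0800000429,
              'values': []}],
 'namespace': ''}
```

Note that year is returned as 2019.0 because numeric metadata is stored as a 64-bit floating-point number.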
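The response can be reproduced locally as a sanity check. Assuming the index uses the euclidean metric, the score is consistent with the squared Euclidean distance between the query and vector B: 8 × (0.2 − 0.1)² = 0.08, which matches the returned 0.0800000429 up to floating-point rounding. The sketch below filters the five movies from the upsert example (with the $eq condition written in its shorthand form) and ranks the survivors by that distance; the function names are hypothetical.

```python
# The five movies from the upsert example: id -> (values, metadata).
MOVIES = {
    "A": ([0.1] * 8, {"genre": "comedy", "year": 2020}),
    "B": ([0.2] * 8, {"genre": "documentary", "year": 2019}),
    "C": ([0.3] * 8, {"genre": "comedy", "year": 2019}),
    "D": ([0.4] * 8, {"genre": "drama"}),
    "E": ([0.5] * 8, {"genre": "drama"}),
}


def squared_l2(u, v):
    """Squared Euclidean distance; smaller means more similar."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def local_query(vector, flt, top_k):
    """Keep vectors whose metadata equals every filter value, then rank them.

    Only plain equality filters are modeled in this sketch.
    """
    hits = [
        {"id": vid, "score": squared_l2(vector, values), "metadata": meta}
        for vid, (values, meta) in MOVIES.items()
        if all(meta.get(key) == expected for key, expected in flt.items())
    ]
    hits.sort(key=lambda hit: hit["score"])
    return hits[:top_k]


print(local_query([0.1] * 8, {"genre": "documentary", "year": 2019}, top_k=1))
```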

For performance reasons, avoid returning vector values and metadata when top_k is greater than 1000: queries with top_k over 1000 should not set include_metadata=True or include_values=True.
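One way to follow this recommendation in application code is a small guard that refuses to request metadata or values for a large top_k. The helper name and the choice to raise an error rather than silently dropping the flags are assumptions of this sketch; vector, top_k, include_metadata, include_values, and filter are keyword arguments of index.query in the Python client.

```python
TOP_K_PAYLOAD_LIMIT = 1000  # threshold from the recommendation above


def query_kwargs(vector, top_k, include_metadata=False, include_values=False, **extra):
    """Build keyword arguments for index.query(), enforcing the top_k guidance.

    Hypothetical helper: it raises instead of silently dropping the flags,
    so callers notice when a large query cannot return payloads.
    """
    if top_k > TOP_K_PAYLOAD_LIMIT and (include_metadata or include_values):
        raise ValueError(
            f"top_k={top_k} exceeds {TOP_K_PAYLOAD_LIMIT}; "
            "do not request metadata or values for this query"
        )
    return {
        "vector": vector,
        "top_k": top_k,
        "include_metadata": include_metadata,
        "include_values": include_values,
        **extra,
    }
```

With a live index, unpack the result into the call, for example index.query(**query_kwargs([0.1] * 8, top_k=10, include_metadata=True, filter={"year": 2019})).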

Deleting Vectors by Metadata Filter

You can specify vectors to be deleted by their metadata values by passing a metadata filter expression to the delete operation. This functionality allows for the efficient removal of all vectors matching the metadata filter expression.

Here's an example that deletes all vectors with a genre of "documentary" and a year of 2019 from an index:

```python
# Delete every vector whose metadata has genre "documentary" AND year 2019.
index.delete(
    filter={
        "genre": {"$eq": "documentary"},
        "year": 2019
    }
)
```
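Because a delete by filter removes every matching vector at once, it can help to check the filter against a local copy of your metadata first. The sketch below (a hypothetical function that models equality conditions only) applies the same AND semantics to an in-memory dict and reports which IDs would be removed.

```python
def delete_matching(store, flt):
    """Delete entries whose metadata satisfies every equality condition.

    store maps vector id -> metadata dict. A condition may be a bare value
    or {"$eq": value}; other operators are not modeled in this sketch.
    Returns the list of deleted ids.
    """
    def satisfied(meta, key, cond):
        expected = cond["$eq"] if isinstance(cond, dict) else cond
        return key in meta and meta[key] == expected

    doomed = [
        vid for vid, meta in store.items()
        if all(satisfied(meta, key, cond) for key, cond in flt.items())
    ]
    for vid in doomed:
        del store[vid]
    return doomed


store = {
    "A": {"genre": "comedy", "year": 2020},
    "B": {"genre": "documentary", "year": 2019},
    "C": {"genre": "comedy", "year": 2019},
    "D": {"genre": "drama"},
}
print(delete_matching(store, {"genre": {"$eq": "documentary"}, "year": 2019}))  # ['B']
```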

Ryusei Kakujo

Focusing on data science for mobility

Bench Press 100kg!