2023-05-05

How to Choose the Right Pod Type and Size in Pinecone

Introduction

Choosing the right pod type and size is a crucial step in planning your Pinecone deployment. This article provides an introduction to pod selection.

Pods in Pinecone

I will explain the concept of pods in Pinecone, their types and sizes, and how they influence the performance of your service.

Pods, Pod Types, and Pod Sizes

A pod, in the context of Pinecone, is a pre-configured unit of hardware dedicated to running a Pinecone service. Each index in your service runs on one or more pods. The number of pods used in your deployment directly influences the storage capacity, latency, and throughput of your service. A higher number of pods generally leads to increased storage capacity, lower latency, and higher throughput. Different pod sizes are also available, providing you with further flexibility in optimizing your service performance.

Note that once an index has been created with a specific pod type, the pod type cannot be changed.
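Because the pod type is fixed at creation time, it is worth settling on it before calling the client. The sketch below assumes the 2023-era pinecone-client API (`pinecone.init` / `pinecone.create_index`); the `index_spec` helper is mine, not part of the SDK.

```python
# Collect create_index arguments up front; pod_type cannot be changed once
# the index exists. Helper name and defaults are illustrative.

def index_spec(name, dimension, pod_type="p1.x1", pods=1, metric="cosine"):
    """Bundle the arguments for pinecone.create_index."""
    return {
        "name": name,
        "dimension": dimension,
        "metric": metric,
        "pod_type": pod_type,
        "pods": pods,
    }

spec = index_spec("news-search", dimension=768, pod_type="p1.x1")
# import pinecone
# pinecone.init(api_key="...", environment="...")
# pinecone.create_index(**spec)  # the pod type is locked in from here on
```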

Pod Types

There are different types of pods within Pinecone, each with its own advantages and use cases. The main pod types include:

s1 Pods

The s1 pods are storage-optimized pods that provide a large storage capacity and lower overall costs with slightly higher query latencies than p1 pods. They are ideal for very large indexes with moderate or relaxed latency requirements.

Each s1 pod has enough capacity for around 5M vectors of 768 dimensions.

p1 Pods

p1 pods are performance-optimized pods designed to provide extremely low query latencies. However, they hold fewer vectors per pod compared to s1 pods. p1 pods are ideally suited for applications with stringent latency requirements (less than 100ms). Each p1 pod can comfortably hold around 1M vectors of 768 dimensions.

p2 Pods

p2 pods offer a balanced combination of higher query throughput and lower latency. Especially suited for vectors with fewer than 128 dimensions and queries where the topK value is less than 50, p2 pods can support up to 200 Queries Per Second (QPS) per replica and return queries in less than 10ms. Each p2 pod can hold about 1M vectors of 768 dimensions, but this capacity may vary with the dimensionality of the vectors. However, the data ingestion rate for p2 pods is slower than that of p1 pods. It's important to note that p2 pods do not support sparse vector values.
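The p2 guidance above can be condensed into a quick suitability check. The function name and thresholds below are my own heuristic restatement of the article's criteria (fewer than 128 dimensions, top_k under 50, no sparse vectors), not a Pinecone API.

```python
# Hedged heuristic: does a workload match the p2 sweet spot described above?

def p2_is_a_good_fit(dimensions, top_k, uses_sparse_vectors):
    """True if the workload matches p2's strengths."""
    if uses_sparse_vectors:  # p2 pods do not support sparse vector values
        return False
    return dimensions < 128 and top_k < 50

print(p2_is_a_good_fit(dimensions=96, top_k=10, uses_sparse_vectors=False))   # True
print(p2_is_a_good_fit(dimensions=768, top_k=100, uses_sparse_vectors=False)) # False
```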

Pod Size

The performance of a pod is not solely dependent on its type but also on its size. Each pod type supports four pod sizes:

  • x1
  • x2
  • x4
  • x8

The storage and compute capacity of your index doubles for each size step. The default pod size is x1, but you can increase the size of a pod after the creation of the index.
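Since each size step doubles capacity, scaling a baseline estimate is straightforward. The sketch below uses the article's rule of thumb of roughly 1M vectors per p1.x1 pod as the baseline; the numbers are estimates, not guarantees.

```python
# Each pod size step (x1 -> x2 -> x4 -> x8) doubles storage and compute.

POD_SIZE_MULTIPLIER = {"x1": 1, "x2": 2, "x4": 4, "x8": 8}

def capacity_for_size(base_vectors, size):
    """Scale a pod's baseline vector capacity by its size multiplier."""
    return base_vectors * POD_SIZE_MULTIPLIER[size]

for size in POD_SIZE_MULTIPLIER:
    print(size, capacity_for_size(1_000_000, size))
```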

Choosing the Right Pinecone Index

There are five main considerations when deciding how to configure your Pinecone index:

  • Number of vectors
  • Dimensionality of your vectors
  • Size of metadata on each vector
  • QPS throughput
  • Cardinality of indexed metadata

Each of these considerations comes with requirements for index size, pod type, and replication strategy.

Number of vectors

The first and most critical consideration in sizing is the number of vectors you plan on handling. A rough rule of thumb is that a single p1 pod can store approximately 1M vectors, while an s1 pod can store up to 5M vectors. However, this capacity can be influenced by other factors like vector dimensionality and metadata.

Dimensionality of vectors

The capacity estimates above assume that each vector has 768 dimensions. Depending on your use case, the dimensionality of your vectors may vary, and so will the space required to store them.

Each vector dimension consumes 4 bytes of memory and storage, so if you expect to have 1M vectors with 768 dimensions each, that's about 3GB of storage without factoring in metadata or other overhead. The table below gives some examples of the typical pod size and number needed for a given index.
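The 4-bytes-per-dimension estimate works out as follows (metadata and index overhead excluded):

```python
# Raw float32 storage for the vectors alone, in gigabytes.

def vector_storage_gb(num_vectors, dimensions, bytes_per_dim=4):
    """num_vectors * dimensions * 4 bytes, converted to GB."""
    return num_vectors * dimensions * bytes_per_dim / 1024**3

# 1M vectors of 768 dimensions -> roughly the "about 3GB" quoted above.
print(round(vector_storage_gb(1_000_000, 768), 2))  # 2.86
```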

Pod type   Dimensions   Estimated max vectors per pod
p1         512          1,250,000
p1         768          1,000,000
p1         1024         675,000
p2         512          1,250,000
p2         768          1,100,000
p2         1024         1,000,000
s1         512          8,000,000
s1         768          5,000,000
s1         1024         4,000,000

Queries per second (QPS)

Query speed, measured in QPS, is determined by a combination of the pod type, number of replicas, and the top_k value of queries. As different pod types are optimized for varying use cases, the pod type becomes a major driver of QPS.

As a rule, a single p1 pod with 1M vectors of 768 dimensions each and no replicas can handle about 20 QPS. It's possible to get greater or lesser speeds depending on the size of your metadata, the number of vectors, the dimensionality of your vectors, and the top_k value for your search.

Pod type   top_k 10   top_k 250   top_k 1000
p1         30         25          20
p2         150        50          20
s1         10         10          10

The QPS values in the table above represent baseline QPS with 1M vectors and 768 dimensions.

Adding replicas is the simplest way to increase your QPS. Each replica increases the throughput potential by roughly the same QPS, so aiming for 150 QPS using p1 pods means using the primary pod and 5 replicas. Using threading or multiprocessing in your application is also important, as issuing single queries sequentially still subjects you to delays from any underlying latency. The Pinecone gRPC client can also be used to increase throughput of upserts.
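Since throughput scales roughly linearly with replicas, the replica count for a target QPS is simple division. The sketch below assumes the ~30 QPS per-pod baseline for p1 at top_k=10 from the table above; real throughput varies with metadata, vector count, dimensionality, and top_k.

```python
import math

def replicas_for_qps(target_qps, qps_per_replica):
    """Rough number of pods (primary included) needed for a target throughput."""
    return math.ceil(target_qps / qps_per_replica)

# 150 QPS at the ~30 QPS p1 baseline (top_k=10):
print(replicas_for_qps(150, 30))  # 5
```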

Metadata cardinality and size

The final consideration when planning your indexes is the cardinality and size of your metadata. While the increases in storage requirement may seem negligible for a few million vectors, they can have a real impact as your application scales up to handle hundreds of millions or even billions of vectors.

Indexes with high cardinality, like those storing a unique user ID on each vector, can have substantial memory requirements. This can result in fewer vectors fitting per pod. Additionally, if the size of metadata per vector is larger, the index requires more storage.

Example Applications and Sizing

I will illustrate how the guidelines and principles discussed so far can be applied in real-world scenarios, referencing two example applications from the official documentation to demonstrate how to choose the appropriate type, size, and number of pods for your index.

Example 1: Semantic Search of News Articles

Suppose we're working with a dataset of 204,135 vectors, each with 300 dimensions, for semantic search of news articles. The dimensionality is well under the 768-dimension baseline used in the estimates above. Given the rule of thumb that a single p1 pod can accommodate up to 1M vectors, we could comfortably run this application on a single p1.x1 pod. Even though the pod is far from full, the demand for low latency and rapid query responses in a semantic search application makes the p1 pod type the most appropriate choice.

Example 2: Facial Recognition

Let's consider a more complex case of a facial recognition application. Suppose you're building an application to identify customers using facial recognition for a secure banking app. The vectors used for facial recognition can work with as few as 128 dimensions, but in this case, we aim for higher precision due to its use in financial security, thus choosing 2048 dimensions per vector. Furthermore, we are planning to cater to 100M customers.

To estimate the required pods, let's first consider the typical configuration fitting 1M vectors with 768 dimensions in a p1.x1 pod. Using this as a baseline, we can divide the desired configuration to get our pod estimate:

100M / 1M = 100 base p1 pods
2048 / 768 = 2.667 dimension ratio
100 * 2.667 = 267 pods (rounded up)

The calculations yield a requirement of 267 p1.x1 pods. However, we could reduce this by switching to s1 pods, which prioritize storage capacity over latency. They hold five times the storage of p1.x1 pods, hence the new calculation would be:

267 / 5 = 54 pods (rounded up)

Hence, our estimate suggests that we need 54 s1.x1 pods to store the high-dimensional facial data for each of the bank's 100M customers.
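The worked example above generalizes into a reusable sketch: scale the 1M-vectors-per-p1.x1 baseline by vector count and dimension ratio, then divide by 5 for s1 pods, which hold roughly five times as much. This is a back-of-the-envelope estimate, not an official Pinecone sizing formula.

```python
import math

def estimate_pods(num_vectors, dimensions,
                  base_vectors=1_000_000, base_dims=768, s1=False):
    """Rough pod-count estimate from the article's rules of thumb."""
    pods = (num_vectors / base_vectors) * (dimensions / base_dims)
    if s1:
        pods /= 5  # s1 pods hold ~5x the vectors of p1 pods
    return math.ceil(pods)

print(estimate_pods(100_000_000, 2048))          # 267 p1.x1 pods
print(estimate_pods(100_000_000, 2048, s1=True)) # 54 s1.x1 pods
```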

References

https://docs.pinecone.io/docs/choosing-index-type-and-size
https://docs.pinecone.io/docs/indexes

Ryusei Kakujo
