2023-03-10

Pydantic for Data Validation

What is Pydantic

Pydantic is a Python library that simplifies the process of data validation and settings management. It provides a simple and intuitive way to define data models, validate input data, and manage application settings, all while maintaining strict typing and providing helpful error messages.

Pydantic is best known as the validation layer behind FastAPI, but it can be used in any Python application that requires data validation or settings management.

One of the key benefits of Pydantic is its ability to reduce boilerplate code and make data validation more declarative. Developers can define their data models using a simple class syntax, and Pydantic takes care of the rest, including type checking, validation, and error handling.

Installing Pydantic

To install Pydantic, run the following command:

bash
$ pip install pydantic

Creating Pydantic Models

Creating Pydantic models is a straightforward process that involves defining the fields of the model and adding any necessary validators.

Here are the steps to create Pydantic models:

  1. Import the BaseModel class from the pydantic module:
python
from pydantic import BaseModel
  2. Define your model by subclassing BaseModel and adding fields to it:
python
class User(BaseModel):
    id: int
    name: str
    email: str

In this example, we've defined a User model with three fields: id, name, and email. Note that we've specified the type of each field using Python annotations.

  3. Constrain fields as needed. Pydantic ships with ready-made types and helpers, such as EmailStr, constr, and conint, that ensure input data meets specific requirements. For example:
python
from pydantic import BaseModel, EmailStr

class User(BaseModel):
    id: int
    name: str
    email: EmailStr

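In this updated example, we've changed the type of the email field from str to EmailStr, which automatically checks that the input is a valid email address. EmailStr depends on the separate email-validator package, which you can install with pip install "pydantic[email]".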

  4. Optionally, mark fields as optional using Optional from the typing module together with a default value. Fields without a default are required. For example:
python
from typing import Optional

from pydantic import BaseModel, EmailStr

class User(BaseModel):
    id: int
    name: str
    email: Optional[EmailStr] = None

In this final example, we've made the email field optional by annotating it as Optional[EmailStr] and giving it a default of None. This means that input data can be valid even if the email field is not included.

By following these steps, you can easily create Pydantic models that validate input data and ensure that your Python applications run smoothly.
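As a minimal sketch of what the finished model does at runtime, the snippet below builds a User without an email. It swaps EmailStr for a plain str so it runs without the email-validator package; that substitution is an assumption made for illustration only.

```python
from typing import Optional

from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str
    # Plain str instead of EmailStr so the sketch needs no extra package.
    email: Optional[str] = None


# In Pydantic's default mode the numeric string "42" is coerced to the int 42.
user = User(id="42", name="Alice")

print(user.id)     # 42
print(user.email)  # None
```

The omitted email falls back to its default, while id and name remain required.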

Using Pydantic Models for Data Validation

Pydantic makes it easy to validate data in Python applications by providing a simple way to define and use data models. In this article, I'll look at how you can use Pydantic models for data validation.

Validating Input Data

To validate input data using Pydantic, you simply need to create a Pydantic model that defines the expected structure of the input data. You can then use this model to validate any input data that your application receives.

Here's an example of how to validate input data using Pydantic:

python
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str

def create_user(user_data: dict):
    user = User(**user_data)
    # The above line validates the input data and raises pydantic.ValidationError
    # (a subclass of ValueError) if it's invalid
    # If the input data is valid, you can continue with creating the user
    # ...

In this example, we define a User Pydantic model with three fields: id, name, and email. We then define a create_user function that accepts a dictionary of input data as its argument.

To validate the input data, we create a new instance of the User model using the ** syntax to unpack the user_data dictionary. If the input data is valid, the User instance is created successfully. If the input data is invalid, Pydantic raises a ValidationError, which is a subclass of ValueError and lists every field that failed.
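The failure path can be sketched as follows. The bad_input dictionary and the failed_fields variable are hypothetical names used only for this illustration.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    id: int
    name: str
    email: str


def create_user(user_data: dict) -> User:
    # Raises pydantic.ValidationError if user_data does not fit the model.
    return User(**user_data)


# Hypothetical input whose id cannot be parsed as an integer.
bad_input = {"id": "not-a-number", "name": "Alice", "email": "alice@example.com"}

try:
    create_user(bad_input)
    failed_fields = []
except ValidationError as exc:
    # errors() returns one dict per problem; "loc" is the path to the field.
    failed_fields = [error["loc"] for error in exc.errors()]

print(failed_fields)
```

Because ValidationError subclasses ValueError, existing except ValueError handlers also catch it, but catching ValidationError directly gives access to the structured error list.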

Validating Output Data

In addition to validating input data, Pydantic can also be used to validate output data. This is useful when you want to ensure that the data returned by your application meets certain requirements.

Here's an example of how to validate output data using Pydantic:

python
from typing import List
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str

def get_users() -> List[User]:
    # Get a list of users from the database
    users_data = [
        {"id": 1, "name": "Alice", "email": "alice@example.com"},
        {"id": 2, "name": "Bob", "email": "bob@example.com"}
    ]
    # Convert the list of user data dictionaries to a list of User instances
    users = [User(**user_data) for user_data in users_data]
    # Each User was already validated when it was constructed above
    return users

In this example, we define a get_users function that returns a list of User instances. We convert the list of data dictionaries to a list of User instances using a list comprehension.

No separate validation step is needed before returning. Pydantic validates each User as it is constructed and raises a ValidationError if any record is invalid, so the function can only return well-formed users.

Built-in Validators

Pydantic comes with a number of built-in validators that you can use to ensure that your data models meet certain requirements. In this article, I'll go through some of the most commonly used validators and provide examples of how to use them.

EmailStr

This validator ensures that a string is a valid email address.

python
from pydantic import BaseModel, EmailStr

class User(BaseModel):
    name: str
    email: EmailStr

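HttpUrl

This type ensures that a string is a valid HTTP or HTTPS URL. UrlStr, the name used in early Pydantic releases, is no longer available; use HttpUrl or the more general AnyUrl instead.

python
from pydantic import BaseModel, HttpUrl

class Website(BaseModel):
    name: str
    url: HttpUrl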

Length

This validator ensures that a string has a minimum and/or maximum length.

python
from pydantic import BaseModel, constr

class User(BaseModel):
    name: constr(min_length=1, max_length=50)
    bio: constr(max_length=500)

MinMaxValue

The conint type constrains an integer with lower and upper bounds: gt and ge set an exclusive or inclusive minimum, and lt and le set an exclusive or inclusive maximum.

python
from pydantic import BaseModel, conint

class Product(BaseModel):
    name: str
    price: conint(gt=0, le=10000)
    quantity: conint(ge=0)
    discount: conint(ge=0, le=50)

Regex

This validator ensures that a string matches a regular expression pattern. The regex argument shown here is the Pydantic v1 spelling; Pydantic v2 renames it to pattern.

python
from pydantic import BaseModel, constr

class User(BaseModel):
    name: constr(regex=r'^[a-zA-Z ]+$')

ConstrainedStr

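This type ensures that a string satisfies a set of constraints declared as class attributes on a ConstrainedStr subclass (Pydantic v1).

python
from pydantic import BaseModel, ConstrainedStr, EmailStr

class Password(ConstrainedStr):
    min_length = 8
    max_length = 50
    # Anchor the pattern so the whole string must match, not just a prefix
    regex = r'^[a-zA-Z0-9_-]+$'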

class User(BaseModel):
    email: EmailStr
    password: Password

ConstrainedInt

This type ensures that an integer satisfies a set of constraints declared as class attributes on a ConstrainedInt subclass (Pydantic v1).

python
from pydantic import BaseModel, ConstrainedInt

class Age(ConstrainedInt):
    ge = 0
    le = 120

class User(BaseModel):
    name: str
    age: Age

Decimal

This validator ensures that a numeric value is a decimal with a given number of decimal places.

python
from pydantic import BaseModel, condecimal

class Product(BaseModel):
    price: condecimal(ge=0, le=10000, max_digits=8, decimal_places=2)

Uuid

This type ensures that a value is a valid version 4 UUID. String input is converted to a uuid.UUID object.

python
from pydantic import BaseModel, UUID4

class Order(BaseModel):
    id: UUID4
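A brief sketch of the conversion, assuming the Order model above; raw_id is a freshly generated value used only for illustration.

```python
import uuid

from pydantic import BaseModel, UUID4, ValidationError


class Order(BaseModel):
    id: UUID4


# uuid.uuid4() produces a version 4 UUID; pass it in as a plain string.
raw_id = str(uuid.uuid4())
order = Order(id=raw_id)

# The field holds a uuid.UUID object, not the original string.
is_uuid = isinstance(order.id, uuid.UUID)

try:
    Order(id="not-a-uuid")
    rejected = False
except ValidationError:
    rejected = True

print(is_uuid, rejected)
```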

IPv4Address

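This type ensures that a value is a valid IPv4 address. IPv4Address comes from the standard library's ipaddress module, and Pydantic validates any field annotated with it.

python
from ipaddress import IPv4Address

from pydantic import BaseModel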

class NetworkInterface(BaseModel):
    ip_address: IPv4Address

DirectoryPath

This type ensures that a value is a path to an existing directory.

python
from pydantic import BaseModel, DirectoryPath

class Config(BaseModel):
    data_dir: DirectoryPath

FilePath

This type ensures that a value is a path to an existing file.

python
from pydantic import BaseModel, FilePath

class Config(BaseModel):
    data_file: FilePath
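Both path types check the filesystem at validation time. The sketch below combines them in one hypothetical Config model and uses a temporary directory so that it can run anywhere; the file names are invented for the example.

```python
import tempfile
from pathlib import Path

from pydantic import BaseModel, DirectoryPath, FilePath, ValidationError


class Config(BaseModel):
    data_dir: DirectoryPath
    data_file: FilePath


with tempfile.TemporaryDirectory() as tmp:
    existing_file = Path(tmp) / "data.csv"
    existing_file.write_text("id,name\n")

    # Both paths exist, so validation succeeds and strings become Path objects.
    config = Config(data_dir=tmp, data_file=str(existing_file))
    parsed_as_path = isinstance(config.data_dir, Path)

    try:
        # missing.csv was never created, so FilePath rejects it.
        Config(data_dir=tmp, data_file=str(Path(tmp) / "missing.csv"))
        missing_rejected = False
    except ValidationError:
        missing_rejected = True

print(parsed_as_path, missing_rejected)
```

The check happens only when the model is built, so a path can still disappear afterwards.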

Advanced Features of Pydantic

Pydantic provides several advanced features for creating powerful and flexible data models. Here are some of the key features:

Creating Custom Validators

Pydantic allows you to create custom validators for fields, giving you fine-grained control over how your data is validated. You can define your own validator functions and then use them in your data models.

python
from pydantic import BaseModel, validator

class User(BaseModel):
    name: str
    age: int

    @validator('age')
    def validate_age(cls, age):
        if age < 0 or age > 120:
            raise ValueError('Invalid age')
        return age

In this example, we define a custom validator for the age field of the User model that checks if the age is between 0 and 120.
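The following sketch triggers that validator with an out-of-range age. It uses the v1-style validator decorator shown above; Pydantic v2 still accepts it but marks it as deprecated in favor of field_validator. The message variable is a name chosen for this example.

```python
from pydantic import BaseModel, ValidationError, validator


class User(BaseModel):
    name: str
    age: int

    @validator("age")
    def validate_age(cls, age):
        if age < 0 or age > 120:
            raise ValueError("Invalid age")
        return age


valid_user = User(name="Alice", age=30)

try:
    User(name="Bob", age=150)
    message = None
except ValidationError as exc:
    # The ValueError raised in the validator is wrapped in a ValidationError;
    # its text appears in the "msg" entry of the error details.
    message = exc.errors()[0]["msg"]

print(message)
```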

Validating Two Fields

One common use case for custom validators is to validate two fields together, ensuring that the combination of their values meets certain constraints.

For example, let's say you have a Pydantic model that represents a shipping order, with two fields representing the shipping address: street_address and zip_code. You want to ensure that the zip_code corresponds to the correct street_address, based on a mapping of zip codes to street addresses.

Pydantic runs field validators in the order the fields are declared, and the values dictionary passed to a validator contains only the fields that have already been validated. To check the combination, attach the validator to the field declared last (zip_code), where street_address is already available:

python
from pydantic import BaseModel, validator

def is_valid_zip_code(street_address, zip_code):
    # perform validation logic, e.g. using a mapping of zip codes to street addresses
    if zip_code == '12345' and street_address != '123 Main St.':
        return False

    return True

class ShippingOrder(BaseModel):
    street_address: str
    zip_code: str

    @validator('zip_code')
    def validate_zip_code(cls, value, values):
        # street_address is declared before zip_code, so it has already been
        # validated; it is missing from values only if its own validation failed
        street_address = values.get('street_address')

        # perform validation logic
        if not is_valid_zip_code(street_address, value):
            raise ValueError('Invalid zip code for street address')

        return value

In this example, we define one custom validator on the zip_code field. Besides the class (cls), it takes two arguments: the value being validated (value) and a dictionary of the previously validated fields (values). A second validator on street_address would not help, because zip_code has not been validated yet when it runs and is therefore absent from values.

The validator calls a separate is_valid_zip_code function to perform the actual validation logic. In this case, the is_valid_zip_code function checks if the combination of street_address and zip_code is valid, based on a mapping of zip codes to street addresses. If the combination is not valid, the validator raises a ValueError with an error message.

Working with Enums

Pydantic provides built-in support for enums, allowing you to define fields that can only take on certain values. This keeps unexpected values out of your data models.

python
from enum import Enum
from pydantic import BaseModel

class Gender(Enum):
    MALE = 'male'
    FEMALE = 'female'

class User(BaseModel):
    name: str
    gender: Gender

In this example, we define an Enum for Gender and use it in the User model to ensure that the gender field can only take on the values male or female.
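A short sketch of how the enum field behaves with raw input, assuming the Gender and User definitions above; the input values are illustrative.

```python
from enum import Enum

from pydantic import BaseModel, ValidationError


class Gender(Enum):
    MALE = "male"
    FEMALE = "female"


class User(BaseModel):
    name: str
    gender: Gender


# The raw value "female" is converted to the matching enum member.
user = User(name="Alice", gender="female")

try:
    # "unknown" is not a value of any Gender member.
    User(name="Bob", gender="unknown")
    rejected = False
except ValidationError:
    rejected = True

print(user.gender, rejected)
```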

Handling Nested Models

Pydantic allows you to define nested models, which can be used to represent complex data structures.

python
from pydantic import BaseModel

class Location(BaseModel):
    latitude: float
    longitude: float

class User(BaseModel):
    name: str
    location: Location

In this example, we define a Location model and use it as a field in the User model to represent a user's location.
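As a rough illustration, nested dictionaries are validated recursively, and errors point at the nested field. The coordinates and the error_path variable are made up for this sketch.

```python
from pydantic import BaseModel, ValidationError


class Location(BaseModel):
    latitude: float
    longitude: float


class User(BaseModel):
    name: str
    location: Location


# A plain nested dict is converted into a Location instance.
user = User(name="Alice", location={"latitude": 35.68, "longitude": 139.77})
is_location = isinstance(user.location, Location)

try:
    User(name="Bob", location={"latitude": "north", "longitude": 0})
    error_path = None
except ValidationError as exc:
    # "loc" gives the full path from the outer model down to the bad field.
    error_path = exc.errors()[0]["loc"]

print(is_location, error_path)
```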

Using Pydantic with ORM Libraries

Pydantic can be used with popular ORM libraries such as SQLAlchemy and Tortoise-ORM, allowing you to easily convert between your database models and Pydantic models.

python
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from pydantic import BaseModel

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    age = Column(Integer)

class UserIn(BaseModel):
    name: str
    age: int

class UserOut(UserIn):
    id: int

    class Config:
        orm_mode = True

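In this example, we define an SQLAlchemy model for User and then define Pydantic models for input and output. The Config class with orm_mode = True lets Pydantic read data from an object's attributes rather than only from a dictionary, so UserOut.from_orm(db_user) builds a UserOut from a SQLAlchemy User instance. The conversion is one-way: to create a database row from a Pydantic model, pass its fields to the SQLAlchemy constructor, for example User(**user_in.dict()).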

References

https://docs.pydantic.dev/

Ryusei Kakujo
