2023-03-31

Protocol in Python

What is Protocol

Protocol is a powerful feature of Python's type system that enables more flexible and expressive type hinting. They are part of the typing module, which is used to annotate types in Python code. Protocols are built on the concept of structural typing, focusing on the behavior or structure of a type, rather than its explicit inheritance hierarchy. This allows you to define interfaces for your classes and create more robust and maintainable code.

Using Protocols in Python can greatly improve the readability and maintainability of your code. By providing explicit information about the expected input and output types of functions and methods, you make your code more self-documenting and easier to understand for other developers. Additionally, this information can be leveraged by static type checkers like Mypy to catch potential bugs and type mismatches before they lead to runtime errors.

How to Use Protocol

To work with Protocol, you first need to understand the typing.Protocol class. The typing.Protocol class is a metaclass that is used for defining new protocol classes. To create a custom protocol, you need to subclass the typing.Protocol class and define the required attributes and methods within the new class.

Custom protocols are created by defining a new class that inherits from typing.Protocol. This class should include the desired methods and properties, which are then used to express the expected behavior of any type implementing this protocol.

For example, to define a custom protocol for a two-dimensional point, you could create a new class called Point2D that inherits from typing.Protocol:

python

from typing import Protocol

class Point2D(Protocol):
    x: float
    y: float

    def distance_to_origin(self) -> float:
        ...

In this example, the Point2D protocol specifies that any type implementing it should have x and y properties of type float, and a method called distance_to_origin that returns a float. When you use the Point2D protocol as a type hint in your code, you indicate that the expected type should have these properties and methods, regardless of its actual inheritance hierarchy.

By creating and using custom Protocols, you can write more expressive and flexible type hints in your Python code, ultimately leading to more robust and maintainable projects.

Comparison to Abstract Base Classes (abc)

While both typing.Protocol and abstract base classes (ABCs) help define interfaces for Python classes, they differ in some key aspects:

typing.Protocol focus on structural subtyping, while ABCs emphasize nominal subtyping. This means that protocols focus on the structure and behavior of a type, while ABCs are based on explicit inheritance relationships.

Protocols are more flexible than ABCs, as they don't enforce implementation. A class can conform to a protocol without explicitly inheriting from it, as long as it has the required attributes and methods.

ABCs can enforce implementation of their abstract methods at runtime, while protocols don't have this capability by default. However, you can use the runtime_checkable decorator for runtime checks with protocols.

When to Use Protocol vs Abstract Base Classes

typing.Protocol are generally preferred when you want to focus on the structure and behavior of a type, without enforcing inheritance relationships. They are especially useful for creating adapter protocols for third-party libraries, defining file-like objects, and creating custom iterable and iterator interfaces.

ABCs, on the other hand, are better suited for cases where you want to enforce implementation of certain methods at runtime and establish explicit inheritance hierarchies.

Scikit-learn Example with typing.Protocol

Scikit-learn provides a wide range of machine learning estimators. You can define a custom estimator protocol that outlines the required methods for any scikit-learn compatible estimator:

python

from typing import Any, Protocol
import numpy as np

class SklearnEstimator(Protocol):
    def fit(self, X: np.ndarray, y: np.ndarray, **kwargs: Any) -> "SklearnEstimator":
        ...

    def predict(self, X: np.ndarray) -> np.ndarray:
        ...

    def score(self, X: np.ndarray, y: np.ndarray) -> float:
        ...

With the custom estimator protocol in place, you can implement a new estimator that follows the protocol:

python

from sklearn.base import BaseEstimator, ClassifierMixin
import numpy as np

class CustomEstimator(BaseEstimator, ClassifierMixin):
    def fit(self, X: np.ndarray, y: np.ndarray) -> "CustomEstimator":
        # Implementation of the fit method
        ...

    def predict(self, X: np.ndarray) -> np.ndarray:
        # Implementation of the predict method
        ...

    def score(self, X: np.ndarray, y: np.ndarray) -> float:
        # Implementation of the score method
        ...

Using the custom estimator protocol, you can create functions that work with any scikit-learn compatible estimator:

python

from sklearn.model_selection import cross_val_score
from typing import List

def evaluate_estimators(estimators: List[SklearnEstimator], X: np.ndarray, y: np.ndarray) -> None:
    for estimator in estimators:
        scores = cross_val_score(estimator, X, y, cv=5)

    print(f"{estimator.__class__.__name__}:")
    print(f"  Mean cross-validation score: {scores.mean():.3f}")
    print(f"  Standard deviation: {scores.std():.3f}\n")

In this example, the evaluate_estimators function accepts a list of estimators that conform to the SklearnEstimator protocol, and it uses the cross_val_score function from scikit-learn to evaluate each estimator's performance.

By leveraging the custom SklearnEstimator protocol, you can ensure that any estimator passed to the evaluate_estimators function has the required fit, predict, and score methods, thus making your code more robust and maintainable.

Protocol in Python

What is Protocol

How to Use Protocol

Comparison to Abstract Base Classes (abc)

When to Use Protocol vs Abstract Base Classes

Scikit-learn Example with typing.Protocol

References

Type Annotation in Python

defaultdict in Python

Ryusei Kakujo