What is Protocol
Protocol is a powerful feature of Python's type system that enables more flexible and expressive type hinting. They are part of the typing
module, which is used to annotate types in Python code. Protocols are built on the concept of structural typing, focusing on the behavior or structure of a type, rather than its explicit inheritance hierarchy. This allows you to define interfaces for your classes and create more robust and maintainable code.
Using Protocols in Python can greatly improve the readability and maintainability of your code. By providing explicit information about the expected input and output types of functions and methods, you make your code more self-documenting and easier to understand for other developers. Additionally, this information can be leveraged by static type checkers like Mypy to catch potential bugs and type mismatches before they lead to runtime errors.
How to Use Protocol
To work with Protocol, you first need to understand the typing.Protocol
class. The typing.Protocol
class is a metaclass that is used for defining new protocol classes. To create a custom protocol, you need to subclass the typing.Protocol
class and define the required attributes and methods within the new class.
Custom protocols are created by defining a new class that inherits from typing.Protocol
. This class should include the desired methods and properties, which are then used to express the expected behavior of any type implementing this protocol.
For example, to define a custom protocol for a two-dimensional point, you could create a new class called Point2D
that inherits from typing.Protocol
:
from typing import Protocol
class Point2D(Protocol):
x: float
y: float
def distance_to_origin(self) -> float:
...
In this example, the Point2D
protocol specifies that any type implementing it should have x
and y
properties of type float
, and a method called distance_to_origin
that returns a float
. When you use the Point2D
protocol as a type hint in your code, you indicate that the expected type should have these properties and methods, regardless of its actual inheritance hierarchy.
By creating and using custom Protocols, you can write more expressive and flexible type hints in your Python code, ultimately leading to more robust and maintainable projects.
Comparison to Abstract Base Classes (abc)
While both typing.Protocol
and abstract base classes (ABCs) help define interfaces for Python classes, they differ in some key aspects:
typing.Protocol
focus on structural subtyping, while ABCs emphasize nominal subtyping. This means that protocols focus on the structure and behavior of a type, while ABCs are based on explicit inheritance relationships.
Protocols are more flexible than ABCs, as they don't enforce implementation. A class can conform to a protocol without explicitly inheriting from it, as long as it has the required attributes and methods.
ABCs can enforce implementation of their abstract methods at runtime, while protocols don't have this capability by default. However, you can use the runtime_checkable decorator for runtime checks with protocols.
When to Use Protocol vs Abstract Base Classes
typing.Protocol
are generally preferred when you want to focus on the structure and behavior of a type, without enforcing inheritance relationships. They are especially useful for creating adapter protocols for third-party libraries, defining file-like objects, and creating custom iterable and iterator interfaces.
ABCs, on the other hand, are better suited for cases where you want to enforce implementation of certain methods at runtime and establish explicit inheritance hierarchies.
Scikit-learn Example with typing.Protocol
Scikit-learn provides a wide range of machine learning estimators. You can define a custom estimator protocol that outlines the required methods for any scikit-learn compatible estimator:
from typing import Any, Protocol
import numpy as np
class SklearnEstimator(Protocol):
def fit(self, X: np.ndarray, y: np.ndarray, **kwargs: Any) -> "SklearnEstimator":
...
def predict(self, X: np.ndarray) -> np.ndarray:
...
def score(self, X: np.ndarray, y: np.ndarray) -> float:
...
With the custom estimator protocol in place, you can implement a new estimator that follows the protocol:
from sklearn.base import BaseEstimator, ClassifierMixin
import numpy as np
class CustomEstimator(BaseEstimator, ClassifierMixin):
def fit(self, X: np.ndarray, y: np.ndarray) -> "CustomEstimator":
# Implementation of the fit method
...
def predict(self, X: np.ndarray) -> np.ndarray:
# Implementation of the predict method
...
def score(self, X: np.ndarray, y: np.ndarray) -> float:
# Implementation of the score method
...
Using the custom estimator protocol, you can create functions that work with any scikit-learn compatible estimator:
from sklearn.model_selection import cross_val_score
from typing import List
def evaluate_estimators(estimators: List[SklearnEstimator], X: np.ndarray, y: np.ndarray) -> None:
for estimator in estimators:
scores = cross_val_score(estimator, X, y, cv=5)
print(f"{estimator.__class__.__name__}:")
print(f" Mean cross-validation score: {scores.mean():.3f}")
print(f" Standard deviation: {scores.std():.3f}\n")
In this example, the evaluate_estimators
function accepts a list of estimators that conform to the SklearnEstimator
protocol, and it uses the cross_val_score
function from scikit-learn to evaluate each estimator's performance.
By leveraging the custom SklearnEstimator
protocol, you can ensure that any estimator passed to the evaluate_estimators
function has the required fit
, predict
, and score
methods, thus making your code more robust and maintainable.
References