Hugging Face Transformers Pipeline
Hugging Face Transformers' Pipeline allows you to perform NLP tasks with just a few lines of code.
The Pipeline internally performs the following three steps when it receives raw text data (a rough code equivalent is sketched after the list).
- Tokenizer: preprocessing is performed to convert the raw text into the model's input format.
- Model: the converted input text is fed into the model.
- Post Processing: the model's inference results are post-processed into a more manageable form for output.
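To make these steps concrete, here is a minimal sketch of what a text-classification Pipeline roughly does under the hood. The checkpoint name is an assumption (it is commonly used as the default sentiment-analysis model), and the real Pipeline additionally handles details such as batching and device placement.

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"  # assumed checkpoint

# 1. Tokenizer: convert raw text into model inputs (input_ids, attention_mask)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer("This restaurant is awesome", return_tensors="pt")

# 2. Model: feed the converted inputs into the model
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
with torch.no_grad():
    outputs = model(**inputs)

# 3. Post Processing: turn the raw logits into a readable label and score
probs = torch.softmax(outputs.logits, dim=-1)
label_id = probs.argmax(dim=-1).item()
print(model.config.id2label[label_id], probs[0, label_id].item())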
How to use Pipeline
Install Hugging Face Transformers with the following command.
$ pip install transformers
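To confirm that the installation worked, you can check the installed version from Python.

import transformers

# Prints the installed version if the package imports correctly
print(transformers.__version__)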
Specify the name of the task you want to perform in pipeline(), such as pipeline("question-answering"). Tasks include, for example:
- feature-extraction (get the vector representation of a text)
- fill-mask
- ner (named entity recognition)
- question-answering
- sentiment-analysis
- summarization
- text-generation
- translation
- zero-shot-classification
More information on the supported tasks can be found in the Hugging Face Transformers documentation.
For example, if you want to perform text classification, write the following.
from transformers import pipeline
pipe = pipeline("text-classification")
pipe("This restaurant is awesome")
The following results are returned.
[{'label': 'POSITIVE', 'score': 0.9998743534088135}]
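The other tasks listed above follow the same pattern. As a minimal sketch (the question and context strings here are made up for illustration), a question-answering pipeline takes both a question and a context to extract the answer from.

from transformers import pipeline

qa = pipeline("question-answering")
# Returns a dict with 'answer', 'score', 'start' and 'end'
qa(
    question="What does the pipeline do with raw text?",
    context="The pipeline tokenizes the text, feeds it to the model, and post-processes the output.",
)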
If you want to use a specific model from the Hub, you can pass the model name instead; the task name can be omitted as long as the model on the Hub already defines its task.
from transformers import pipeline
pipe = pipeline(model="roberta-large-mnli")
pipe("This restaurant is awesome")
>> [{'label': 'NEUTRAL', 'score': 0.7313136458396912}]
It is also possible to pass a list of texts as input.
from transformers import pipeline
pipe = pipeline("text-classification")
pipe(["This restaurant is awesome", "This restaurant is awful"])
>> [{'label': 'POSITIVE', 'score': 0.9998743534088135},
>> {'label': 'NEGATIVE', 'score': 0.9996669292449951}]
A custom Pipeline can also be defined.
from transformers import TextClassificationPipeline, pipeline

class MyPipeline(TextClassificationPipeline):
    def postprocess(self, model_outputs, **kwargs):
        # Your code goes here: e.g. turn the logits into percentage scores
        scores = model_outputs["logits"].softmax(-1)[0] * 100
        best = scores.argmax().item()
        # And here: return the result in whatever form you need
        return {"label": self.model.config.id2label[best], "score": scores[best].item()}

my_pipeline = MyPipeline(model=model, tokenizer=tokenizer, ...)
# or if you use the pipeline() function, then:
my_pipeline = pipeline(model="xxxx", pipeline_class=MyPipeline)
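As a rough usage sketch (the checkpoint name here is an assumption, not part of the original example), the custom pipeline is then called like any other.

from transformers import pipeline

# Assumed checkpoint; any text-classification model from the Hub works
my_pipeline = pipeline(
    model="distilbert-base-uncased-finetuned-sst-2-english",
    pipeline_class=MyPipeline,
)
# With the postprocess override above, the score is returned on a 0-100 scale
my_pipeline("This restaurant is awesome")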
Pipeline examples
Here are examples of a few NLP tasks.
- Zero-shot classification
- Text generation
- Mask filling
Zero-shot classification
Zero-shot classification is a task that requires no labeled text; instead, you provide the candidate labels directly to the Pipeline, which returns a score for each label. Annotating text is usually time-consuming and requires specialized knowledge, so zero-shot classification is useful in such cases.
from transformers import pipeline
classifier = pipeline("zero-shot-classification")
classifier(
"This is a course about the Transformers library",
candidate_labels=["education", "politics", "business"],
)
{'sequence': 'This is a course about the Transformers library',
'labels': ['education', 'business', 'politics'],
'scores': [0.8445988297462463, 0.11197440326213837, 0.04342682659626007]}
Text generation
Text generation is a task where, given a prompt, the model auto-completes it by generating the rest of the text.
from transformers import pipeline
generator = pipeline("text-generation")
generator("In this course, we will teach you how to")
[{'generated_text': 'In this course, we will teach you how to understand and use '
'data flow and data interchange when handling user data. We '
'will be working with one or more of the most commonly used '
'data flows — data flows of various types, as seen by the '
'HTTP'}]
It is also possible to select a specific model from the Hub.
from transformers import pipeline
generator = pipeline("text-generation", model="distilgpt2")
generator(
"In this course, we will teach you how to",
max_length=30,
num_return_sequences=2,
)
[{'generated_text': 'In this course, we will teach you how to manipulate the world and '
'move your mental and physical capabilities to your advantage.'},
{'generated_text': 'In this course, we will teach you how to become an expert and '
'practice realtime, and with a hands on experience on both real '
'time and real'}]
Mask filling
Mask filling is the task of predicting the words that fill in the blanks (mask tokens) of a given text.
from transformers import pipeline
unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)
[{'sequence': 'This course will teach you all about mathematical models.',
'score': 0.19619831442832947,
'token': 30412,
'token_str': ' mathematical'},
{'sequence': 'This course will teach you all about computational models.',
'score': 0.04052725434303284,
'token': 38163,
'token_str': ' computational'}]
References