2023-02-03

Hugging Face Transformers: Pipeline

Hugging Face Transformers' Pipeline lets you perform NLP tasks with just a few lines of code.

When it receives raw text, the Pipeline internally performs the following three steps.

  1. Tokenizer: the raw text is preprocessed into the model's input format.
  2. Model: the preprocessed input is fed into the model.
  3. Post-processing: the model's inference results are post-processed into a more readable output.
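The three steps above can be sketched manually with the lower-level Auto classes. This is a minimal sketch, assuming the checkpoint name below (the default sentiment model the text-classification pipeline resolves to at the time of writing):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed checkpoint: the default model behind pipeline("text-classification")
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"

# 1. Tokenizer: raw text -> model inputs (input_ids, attention_mask)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer("This restaurant is awesome", return_tensors="pt")

# 2. Model: forward pass produces raw logits
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
with torch.no_grad():
    logits = model(**inputs).logits

# 3. Post-processing: logits -> probabilities -> human-readable label
probs = torch.softmax(logits, dim=-1)[0]
label_id = int(probs.argmax())
result = {"label": model.config.id2label[label_id], "score": float(probs[label_id])}
print(result)
```

Running this reproduces what `pipeline("text-classification")` does in one call.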

[Figure: Pipeline flow, from the Hugging Face course chapter "Behind the pipeline"]

How to use Pipeline

Install Hugging Face Transformers with the following command.

$ pip install transformers

Specify the name of the task you want to perform when creating a pipeline, e.g. pipeline("question-answering"). Available tasks include:

  • feature-extraction (get the vector representation of a text)
  • fill-mask
  • ner (named entity recognition)
  • question-answering
  • sentiment-analysis
  • summarization
  • text-generation
  • translation
  • zero-shot-classification

More information can be found at the following link.

https://huggingface.co/docs/transformers/main_classes/pipelines

For example, if you want to perform text classification, write the following.

from transformers import pipeline

pipe = pipeline("text-classification")
pipe("This restaurant is awesome")

The following result is returned.

[{'label': 'POSITIVE', 'score': 0.9998743534088135}]

If you want to use a specific model from the Hub, pass it with the model argument; the task name can be omitted when the model on the Hub already defines its task.

from transformers import pipeline

pipe = pipeline(model="roberta-large-mnli")
pipe("This restaurant is awesome")

>> [{'label': 'NEUTRAL', 'score': 0.7313136458396912}]

It is also possible to pass a list for input.

from transformers import pipeline

pipe = pipeline("text-classification")
pipe(["This restaurant is awesome", "This restaurant is awful"])

>> [{'label': 'POSITIVE', 'score': 0.9998743534088135},
>>  {'label': 'NEGATIVE', 'score': 0.9996669292449951}]

A custom Pipeline can also be defined.

from transformers import TextClassificationPipeline, pipeline

class MyPipeline(TextClassificationPipeline):
    def postprocess(self, model_outputs, **kwargs):
        # Your code goes here
        scores = super().postprocess(model_outputs, **kwargs)
        # And here
        return scores

my_pipeline = MyPipeline(model=model, tokenizer=tokenizer, ...)
# or, if you use the pipeline() function:
my_pipeline = pipeline(model="xxxx", pipeline_class=MyPipeline)

Pipeline examples

Here are examples of some NLP tasks.

  • Zero-shot classification
  • Text generation
  • Mask filling

Zero-shot classification

Zero-shot classification is a task that requires no labeled text; instead, you provide the candidate labels directly to the Pipeline, which returns inference results for those labels. Annotating text is usually time-consuming and requires specialized knowledge, so zero-shot classification is useful in such cases.

from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)
{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445988297462463, 0.11197440326213837, 0.04342682659626007]}

Text generation

Text generation is a task where, given a prompt, the model autocompletes it by generating the rest of the text.

from transformers import pipeline

generator = pipeline("text-generation")
generator("In this course, we will teach you how to")
[{'generated_text': 'In this course, we will teach you how to understand and use '
                    'data flow and data interchange when handling user data. We '
                    'will be working with one or more of the most commonly used '
                    'data flows — data flows of various types, as seen by the '
                    'HTTP'}]

It is also possible to select a specific model from the Hub.

from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)
[{'generated_text': 'In this course, we will teach you how to manipulate the world and '
                    'move your mental and physical capabilities to your advantage.'},
 {'generated_text': 'In this course, we will teach you how to become an expert and '
                    'practice realtime, and with a hands on experience on both real '
                    'time and real'}]

Mask filling

Mask filling is the task of filling in the blanks of a given text.

from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)
[{'sequence': 'This course will teach you all about mathematical models.',
  'score': 0.19619831442832947,
  'token': 30412,
  'token_str': ' mathematical'},
 {'sequence': 'This course will teach you all about computational models.',
  'score': 0.04052725434303284,
  'token': 38163,
  'token_str': ' computational'}]

References

https://huggingface.co/course/chapter1/3
https://huggingface.co/course/chapter2/2?fw=pt
https://huggingface.co/docs/transformers/main_classes/pipelines
https://huggingface.co/docs/transformers/quicktour#use-another-model-and-tokenizer-in-the-pipeline

Ryusei Kakujo


Focusing on data science for mobility

Bench Press 100kg!