Hugging Face Transformers Pipeline
Hugging Face Transformers' Pipeline allows you to perform NLP tasks with just a few lines of code.
The Pipeline internally performs the following three steps when it receives raw text data (a rough code equivalent is sketched after the list).
- Tokenizer: preprocessing is performed to convert the raw text into the model's input format.
- Model: the converted input text is fed into the model.
- Post Processing: the model's inference results are post-processed into a more manageable form for output.
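To make these steps concrete, here is a minimal sketch of what a text-classification Pipeline roughly does under the hood. The checkpoint name is an assumption (it is commonly used as the default sentiment-analysis model), and the real Pipeline additionally handles details such as batching and device placement.

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"  # assumed checkpoint

# 1. Tokenizer: convert raw text into model inputs (input_ids, attention_mask)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer("This restaurant is awesome", return_tensors="pt")

# 2. Model: feed the converted inputs into the model
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
with torch.no_grad():
    outputs = model(**inputs)

# 3. Post Processing: turn the raw logits into a readable label and score
probs = torch.softmax(outputs.logits, dim=-1)
label_id = probs.argmax(dim=-1).item()
print(model.config.id2label[label_id], probs[0, label_id].item())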
How to use Pipeline
Install Hugging Face Transformers with the following command.
$ pip install transformers
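To confirm that the installation worked, you can check the installed version from Python.

import transformers

# Prints the installed version if the package imports correctly
print(transformers.__version__)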
Specify the name of the task you want to perform in pipeline(), such as pipeline("question-answering"). Tasks include, for example:
- feature-extraction (get the vector representation of a text)
- fill-mask
- ner (named entity recognition)
- question-answering
- sentiment-analysis
- summarization
- text-generation
- translation
- zero-shot-classification
More information on the supported tasks can be found in the Hugging Face Transformers documentation.
For example, if you want to perform text classification, write the following.
from transformers import pipeline
pipe = pipeline("text-classification")
pipe("This restaurant is awesome")
The following results are returned.
[{'label': 'POSITIVE', 'score': 0.9998743534088135}]
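The other tasks listed above follow the same pattern. As a minimal sketch (the question and context strings here are made up for illustration), a question-answering pipeline takes both a question and a context to extract the answer from.

from transformers import pipeline

qa = pipeline("question-answering")
# Returns a dict with 'answer', 'score', 'start' and 'end'
qa(
    question="What does the pipeline do with raw text?",
    context="The pipeline tokenizes the text, feeds it to the model, and post-processes the output.",
)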
If you want to use a specific model from the Hub, you can pass the model name instead; the task name can be omitted as long as the model on the Hub already defines its task.
from transformers import pipeline
pipe = pipeline(model="roberta-large-mnli")
pipe("This restaurant is awesome")
>> [{'label': 'NEUTRAL', 'score': 0.7313136458396912}]
It is also possible to pass a list of texts as input.
from transformers import pipeline
pipe = pipeline("text-classification")
pipe(["This restaurant is awesome", "This restaurant is awful"])
>> [{'label': 'POSITIVE', 'score': 0.9998743534088135},
>> {'label': 'NEGATIVE', 'score': 0.9996669292449951}]
A custom Pipeline can also be defined.
from transformers import TextClassificationPipeline, pipeline

class MyPipeline(TextClassificationPipeline):
    def postprocess(self, model_outputs, **kwargs):
        # Your code goes here: e.g. turn the logits into percentage scores
        scores = model_outputs["logits"].softmax(-1)[0] * 100
        best = scores.argmax().item()
        # And here: return the result in whatever form you need
        return {"label": self.model.config.id2label[best], "score": scores[best].item()}

my_pipeline = MyPipeline(model=model, tokenizer=tokenizer, ...)
# or if you use the pipeline() function, then:
my_pipeline = pipeline(model="xxxx", pipeline_class=MyPipeline)
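As a rough usage sketch (the checkpoint name here is an assumption, not part of the original example), the custom pipeline is then called like any other.

from transformers import pipeline

# Assumed checkpoint; any text-classification model from the Hub works
my_pipeline = pipeline(
    model="distilbert-base-uncased-finetuned-sst-2-english",
    pipeline_class=MyPipeline,
)
# With the postprocess override above, the score is returned on a 0-100 scale
my_pipeline("This restaurant is awesome")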
Pipeline examples
Here are examples of a few NLP tasks.
- Zero-shot classification
- Text generation
- Mask filling
Zero-shot classification
Zero-shot classification is a task that requires no labeled text; instead, you provide the candidate labels directly to the Pipeline, which returns a score for each label. Annotating text is usually time-consuming and requires specialized knowledge, so zero-shot classification is useful in such cases.
from transformers import pipeline
classifier = pipeline("zero-shot-classification")
classifier(
"This is a course about the Transformers library",
candidate_labels=["education", "politics", "business"],
)
{'sequence': 'This is a course about the Transformers library',
'labels': ['education', 'business', 'politics'],
'scores': [0.8445988297462463, 0.11197440326213837, 0.04342682659626007]}
Text generation
Text generation is a task where, given a prompt, the model auto-completes it by generating the rest of the text.
from transformers import pipeline
generator = pipeline("text-generation")
generator("In this course, we will teach you how to")
[{'generated_text': 'In this course, we will teach you how to understand and use '
'data flow and data interchange when handling user data. We '
'will be working with one or more of the most commonly used '
'data flows — data flows of various types, as seen by the '
'HTTP'}]
It is also possible to select a specific model from the Hub.
from transformers import pipeline
generator = pipeline("text-generation", model="distilgpt2")
generator(
"In this course, we will teach you how to",
max_length=30,
num_return_sequences=2,
)
[{'generated_text': 'In this course, we will teach you how to manipulate the world and '
'move your mental and physical capabilities to your advantage.'},
{'generated_text': 'In this course, we will teach you how to become an expert and '
'practice realtime, and with a hands on experience on both real '
'time and real'}]
Mask filling
Mask filling is the task of predicting the words that fill in the blanks (mask tokens) of a given text.
from transformers import pipeline
unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)
[{'sequence': 'This course will teach you all about mathematical models.',
'score': 0.19619831442832947,
'token': 30412,
'token_str': ' mathematical'},
{'sequence': 'This course will teach you all about computational models.',
'score': 0.04052725434303284,
'token': 38163,
'token_str': ' computational'}]
References