From 4370b595be548e5b3dfe54a076cbef50e8cf026c Mon Sep 17 00:00:00 2001
From: Steven
Date: Fri, 5 Aug 2022 13:41:34 -0700
Subject: [PATCH 1/3] =?UTF-8?q?=20=F0=9F=93=9D=20add=20example=20of=20mult?=
 =?UTF-8?q?imodal=20usage=20to=20pipeline=20tutorial?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/source/en/pipeline_tutorial.mdx | 39 ++++++++++++++++++++++------
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/docs/source/en/pipeline_tutorial.mdx b/docs/source/en/pipeline_tutorial.mdx
index 274f97f0d6cc92..6f34681911b8f8 100644
--- a/docs/source/en/pipeline_tutorial.mdx
+++ b/docs/source/en/pipeline_tutorial.mdx
@@ -12,21 +12,21 @@ specific language governing permissions and limitations under the License.
 
 # Pipelines for inference
 
-The [`pipeline`] makes it simple to use any model from the [Model Hub](https://huggingface.co/models) for inference on a variety of tasks such as text generation, image segmentation and audio classification. Even if you don't have experience with a specific modality or understand the code powering the models, you can still use them with the [`pipeline`]! This tutorial will teach you to:
+The [`pipeline`] makes it simple to use any model from the [Hub](https://huggingface.co/models) for inference on tasks such as text generation, image segmentation, audio classification, and multimodal tasks like visual question answering. Even if you don't have experience with a specific modality or familiar with the underlying code behind the models, you can still use them for inference with the [`pipeline`]! This tutorial will teach you to:
 
 * Use a [`pipeline`] for inference.
 * Use a specific tokenizer or model.
-* Use a [`pipeline`] for audio and vision tasks.
+* Use a [`pipeline`] for audio, vision, and multimodal tasks.
 
 <Tip>
 
-Take a look at the [`pipeline`] documentation for a complete list of supported tasks.
+Take a look at the [`pipeline`] documentation for a complete list of supported tasks and available parameters.
 
 </Tip>
 
 ## Pipeline usage
 
-While each task has an associated [`pipeline`], it is simpler to use the general [`pipeline`] abstraction which contains all the specific task pipelines. The [`pipeline`] automatically loads a default model and tokenizer capable of inference for your task.
+While each task has an associated [`pipeline`], it is simpler to use the general [`pipeline`] abstraction which contains all the task specific pipelines. The [`pipeline`] automatically loads a default model and tokenizer capable of inference for your task.
 
 1. Start by creating a [`pipeline`] and specify an inference task:
 
@@ -67,7 +67,7 @@ Any additional parameters for your task can also be included in the [`pipeline`]
 
 ### Choose a model and tokenizer
 
-The [`pipeline`] accepts any model from the [Model Hub](https://huggingface.co/models). There are tags on the Model Hub that allow you to filter for a model you'd like to use for your task. Once you've picked an appropriate model, load it with the corresponding `AutoModelFor` and [`AutoTokenizer'] class. For example, load the [`AutoModelForCausalLM`] class for a causal language modeling task:
+The [`pipeline`] accepts any model from the [Hub](https://huggingface.co/models). There are tags on the Hub that allow you to filter for a model you'd like to use for your task. Once you've picked an appropriate model, load it with the corresponding `AutoModelFor` and [`AutoTokenizer`] class. For example, load the [`AutoModelForCausalLM`] class for a causal language modeling task:
 
 ```py
 >>> from transformers import AutoTokenizer, AutoModelForCausalLM
@@ -95,7 +95,7 @@ Pass your input text to the [`pipeline`] to generate some text:
 
 ## Audio pipeline
 
-The flexibility of the [`pipeline`] means it can also be extended to audio tasks.
+The [`pipeline`] also supports audio tasks like audio classification and automatic speech recognition.
 
 For example, let's classify the emotion in this audio clip:
 
@@ -129,9 +129,9 @@ Pass the audio file to the [`pipeline`]:
 
 ## Vision pipeline
 
-Finally, using a [`pipeline`] for vision tasks is practically identical.
+Using a [`pipeline`] for vision tasks is practically identical.
 
-Specify your vision task and pass your image to the classifier. The imaage can be a link or a local path to the image. For example, what species of cat is shown below?
+Specify your task and pass your image to the classifier. The image can be a link or a local path to the image. For example, what species of cat is shown below?
 
 ![pipeline-cat-chonk](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg)
 
@@ -146,3 +146,26 @@ Specify your vision task and pass your image to the classifier. The imaage can b
 >>> preds
 [{'score': 0.4335, 'label': 'lynx, catamount'}, {'score': 0.0348, 'label': 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor'}, {'score': 0.0324, 'label': 'snow leopard, ounce, Panthera uncia'}, {'score': 0.0239, 'label': 'Egyptian cat'}, {'score': 0.0229, 'label': 'tiger cat'}]
 ```
+
+## Multimodal pipeline
+
+The [`pipeline`] supports more than one modality. For example, a visual question answering (VQA) task combines text and image. Feel free to use any image link you like and a question you want to ask about the image. The image can be a URL or a local path to the image.
+
+For example, if you use the same image from the vision pipeline above:
+
+```py
+>>> image = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
+>>> question = "how chonky is this cat?"
+```
+
+Create a pipeline for `vqa` and pass it the image and question:
+
+```py
+>>> from transformers import pipeline
+
+>>> vqa = pipeline(task="vqa")
+>>> preds = vqa(image=image, question=question)
+>>> preds = [{"score": round(pred["score"], 4), "answer": pred["answer"]} for pred in preds]
+>>> preds
+[{'score': 0.43120142817497253, 'answer': 'very'}, {'score': 0.11782129853963852, 'answer': 'not very'}, {'score': 0.03669029846787453, 'answer': 'not at all'}, {'score': 0.024968842044472694, 'answer': 'old'}, {'score': 0.019012143835425377, 'answer': 'medium'}]
+```
\ No newline at end of file

From 768732192a805e612a8c2d63e9dc2f7ee68cc6d9 Mon Sep 17 00:00:00 2001
From: Steven
Date: Fri, 5 Aug 2022 15:52:41 -0700
Subject: [PATCH 2/3] =?UTF-8?q?=20=F0=9F=96=8D=20apply=20feedbacks?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/source/en/pipeline_tutorial.mdx | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/source/en/pipeline_tutorial.mdx b/docs/source/en/pipeline_tutorial.mdx
index 6f34681911b8f8..cc9b53bb2b24a2 100644
--- a/docs/source/en/pipeline_tutorial.mdx
+++ b/docs/source/en/pipeline_tutorial.mdx
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
 
 # Pipelines for inference
 
-The [`pipeline`] makes it simple to use any model from the [Hub](https://huggingface.co/models) for inference on tasks such as text generation, image segmentation, audio classification, and multimodal tasks like visual question answering. Even if you don't have experience with a specific modality or familiar with the underlying code behind the models, you can still use them for inference with the [`pipeline`]! This tutorial will teach you to:
+The [`pipeline`] makes it simple to use any model from the [Hub](https://huggingface.co/models) for inference on any language, computer vision, speech, and multimodal tasks. Even if you don't have experience with a specific modality or familiar with the underlying code behind the models, you can still use them for inference with the [`pipeline`]! This tutorial will teach you to:
 
 * Use a [`pipeline`] for inference.
 * Use a specific tokenizer or model.
@@ -26,7 +26,7 @@ Take a look at the [`pipeline`] documentation for a complete list of supported t
 
 ## Pipeline usage
 
-While each task has an associated [`pipeline`], it is simpler to use the general [`pipeline`] abstraction which contains all the task specific pipelines. The [`pipeline`] automatically loads a default model and tokenizer capable of inference for your task.
+While each task has an associated [`pipeline`], it is simpler to use the general [`pipeline`] abstraction which contains all the task-specific pipelines. The [`pipeline`] automatically loads a default model and tokenizer capable of inference for your task.
 
 1. Start by creating a [`pipeline`] and specify an inference task:
 
@@ -155,7 +155,7 @@ For example, if you use the same image from the vision pipeline above:
 
 ```py
 >>> image = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
->>> question = "how chonky is this cat?"
+>>> question = "Where is the cat?"
 ```
 
 Create a pipeline for `vqa` and pass it the image and question:
@@ -167,5 +167,5 @@ Create a pipeline for `vqa` and pass it the image and question:
 >>> preds = vqa(image=image, question=question)
 >>> preds = [{"score": round(pred["score"], 4), "answer": pred["answer"]} for pred in preds]
 >>> preds
-[{'score': 0.43120142817497253, 'answer': 'very'}, {'score': 0.11782129853963852, 'answer': 'not very'}, {'score': 0.03669029846787453, 'answer': 'not at all'}, {'score': 0.024968842044472694, 'answer': 'old'}, {'score': 0.019012143835425377, 'answer': 'medium'}]
+[{'score': 0.9112, 'answer': 'snow'}, {'score': 0.8796, 'answer': 'in snow'}, {'score': 0.6717, 'answer': 'outside'}, {'score': 0.0291, 'answer': 'on ground'}, {'score': 0.027, 'answer': 'ground'}]
 ```
\ No newline at end of file

From 875afa72ee817eecd05e8777970a510e0ce34361 Mon Sep 17 00:00:00 2001
From: Steven
Date: Mon, 8 Aug 2022 09:02:36 -0700
Subject: [PATCH 3/3] =?UTF-8?q?=20=F0=9F=96=8D=20apply=20niels=20feedback?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/source/en/pipeline_tutorial.mdx | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/en/pipeline_tutorial.mdx b/docs/source/en/pipeline_tutorial.mdx
index cc9b53bb2b24a2..95585b64359f49 100644
--- a/docs/source/en/pipeline_tutorial.mdx
+++ b/docs/source/en/pipeline_tutorial.mdx
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
 
 # Pipelines for inference
 
-The [`pipeline`] makes it simple to use any model from the [Hub](https://huggingface.co/models) for inference on any language, computer vision, speech, and multimodal tasks. Even if you don't have experience with a specific modality or familiar with the underlying code behind the models, you can still use them for inference with the [`pipeline`]! This tutorial will teach you to:
+The [`pipeline`] makes it simple to use any model from the [Hub](https://huggingface.co/models) for inference on any language, computer vision, speech, and multimodal tasks. Even if you don't have experience with a specific modality or aren't familiar with the underlying code behind the models, you can still use them for inference with the [`pipeline`]! This tutorial will teach you to:
 
 * Use a [`pipeline`] for inference.
 * Use a specific tokenizer or model.
@@ -26,7 +26,7 @@ Take a look at the [`pipeline`] documentation for a complete list of supported t
 
 ## Pipeline usage
 
-While each task has an associated [`pipeline`], it is simpler to use the general [`pipeline`] abstraction which contains all the task-specific pipelines. The [`pipeline`] automatically loads a default model and tokenizer capable of inference for your task.
+While each task has an associated [`pipeline`], it is simpler to use the general [`pipeline`] abstraction which contains all the task-specific pipelines. The [`pipeline`] automatically loads a default model and a preprocessing class capable of inference for your task.
 
 1. Start by creating a [`pipeline`] and specify an inference task:
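
A usage note on the example this series lands on: as the tutorial's "Choose a model and tokenizer" section suggests, the `vqa` pipeline can also be pinned to an explicit checkpoint instead of the task default. The sketch below is illustrative rather than part of the patch; it assumes the `dandelin/vilt-b32-finetuned-vqa` checkpoint is available on the Hub, and any VQA-capable model can be swapped in:

```py
from transformers import pipeline

# Sketch only: pin an explicit checkpoint rather than relying on the task
# default. The model id below is an assumption; substitute any Hub checkpoint
# that supports visual question answering.
vqa = pipeline(task="vqa", model="dandelin/vilt-b32-finetuned-vqa")

preds = vqa(
    image="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg",
    question="Where is the cat?",
)

# Each prediction is a dict with "score" and "answer" keys.
for pred in preds:
    print(round(pred["score"], 4), pred["answer"])
```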