
Add support for vision chat completion models #557

Closed
wants to merge 3 commits into from

Conversation

pufferffish

@pufferffish pufferffish commented Nov 9, 2023

Describe the change
This PR lets you pass in image_url so you can send images to vision models like gpt-4-vision-preview.

Describe your solution
The definition of ChatCompletionMessage is changed so that Content can be a struct containing text and images instead of a string.

Tests
This PR has been tested in example/chatvision

Issue: #539


codecov bot commented Nov 9, 2023

Codecov Report

Merging #557 (9095b38) into master (08c167f) will decrease coverage by 1.27%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master     #557      +/-   ##
==========================================
- Coverage   97.88%   96.61%   -1.27%     
==========================================
  Files          20       20              
  Lines         991     1004      +13     
==========================================
  Hits          970      970              
- Misses         15       28      +13     
  Partials        6        6              
Files Coverage Δ
chat.go 61.76% <0.00%> (-38.24%) ⬇️

@pufferffish pufferffish changed the title add structs for image_url content Add support for vision chat completion models Nov 9, 2023
@pufferffish pufferffish marked this pull request as ready for review November 9, 2023 06:57
type ChatCompletionMessage struct {
Role string `json:"role"`
Content []ChatMessageContent `json:"content"`
Contributor


This decision is not within my jurisdiction, but these modifications break compatibility. I have already suggested a less intrusive alternative here. I am awaiting the maintainer's involvement to facilitate a discussion on the matter.


The OpenAI API still accepts string values for Content. Ideally this wrapper conforms to their API contract.

Contributor

@rkintzi rkintzi Nov 14, 2023


@sashabaranov
Do you have any idea how to deal with backward compatibility? I summed-up the possible solutions (that I see) here.

@@ -51,7 +56,31 @@ type PromptAnnotation struct {
ContentFilterResults ContentFilterResults `json:"content_filter_results,omitempty"`
}

type ChatMessageImageURL struct {
URL string `json:"url"`


I made it like this

type ImageURLDetail string

const (
	ImageURLDetailLow  ImageURLDetail = "low"
	ImageURLDetailHigh ImageURLDetail = "high"
)

type ChatMessageImageURL struct {
	Detail ImageURLDetail `json:"detail"`
	URL    string         `json:"url"`
}

Comment on lines +14 to +33
req := openai.ChatCompletionRequest{
Model: openai.GPT4VisionPreview,
Messages: []openai.ChatCompletionMessage{
{
Role: openai.ChatMessageRoleUser,
Content: []openai.ChatMessageContent{
{
Type: openai.ChatMessageContentTypeText,
Text: "What's in this image",
},
{
Type: openai.ChatMessageContentTypeImage,
ImageURL: &openai.ChatMessageImageURL{
URL: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
},
},
},
},
},
}


You could also include an example with a local image, like this one.

	imagePath := "path/to/image.png"

	// Read the image file
	imgData, err := os.ReadFile(imagePath)
	if err != nil {
		fmt.Println("Error reading image file:", err)
		os.Exit(1)
	}

	// Encode to base64
	base64Str := base64.StdEncoding.EncodeToString(imgData)
	req := openai.ChatCompletionRequest{
		Model:     openai.GPT4VisionPreview,
		MaxTokens: 4096,
		Messages: []openai.ChatCompletionMessage{
			{
				Role: openai.ChatMessageRoleUser,
				Content: []openai.ChatMessageContent{
					{
						Type: openai.ChatMessageContentTypeText,
						Text: "What's in this image",
					},
					{
						Type: openai.ChatMessageContentTypeImage,
						ImageURL: &openai.ChatMessageImageURL{
							Detail: openai.ImageURLDetailHigh,
							URL:    "data:image/png;base64," + base64Str,
						},
					},
				},
			},
		},
	}

@arif599

arif599 commented Nov 17, 2023

Hey! is it possible to stream with gpt-4-vision-preview?

@arif599

arif599 commented Nov 17, 2023

Hey! is it possible to stream with gpt-4-vision-preview?

I think I figured it out. However, the stream sometimes receives EOF in the middle of a sentence.
Any idea why? @AlexandrosKyriakakis @rkintzi

package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"os"

	openai "github.com/sashabaranov/go-openai"
)

func main() {
	client := openai.NewClient(os.Getenv("OPENAI_API_KEY"))
	ctx := context.Background()

	req := openai.ChatCompletionRequest{
		Model: openai.GPT4VisionPreview,
		Messages: []openai.ChatCompletionMessage{
			{
				Role: openai.ChatMessageRoleUser,
				Content: []openai.ChatMessageContent{
					{
						Type: openai.ChatMessageContentTypeText,
						Text: "What's in this image",
					},
					{
						Type: openai.ChatMessageContentTypeImage,
						ImageURL: &openai.ChatMessageImageURL{
							URL: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
						},
					},
				},
			},
		},
		Stream: true,
	}

	stream, err := client.CreateChatCompletionStream(ctx, req)
	if err != nil {
		fmt.Printf("ChatCompletionStream error: %v\n", err)
		return
	}
	defer stream.Close()

	fmt.Printf("Stream response: ")
	for {
		response, err := stream.Recv()
		if errors.Is(err, io.EOF) {
			fmt.Println("\nStream finished")
			return
		}

		if err != nil {
			fmt.Printf("\nStream error: %v\n", err)
			return
		}

		fmt.Print(response.Choices[0].Delta.Content)
	}
}

@rkintzi
Contributor

rkintzi commented Nov 17, 2023

A few days ago, I encountered the same issue. Typically, I receive a sentence and a half, regardless of whether I send a URL with the https:// prefix or with an embedded image. I haven't tried without streaming enabled but will do so in a few hours. Have you noticed any differences in results when using streaming versus not using it?

@arif599

@AlexandrosKyriakakis

@arif599 I believe you need to set MaxTokens to a higher value, or the maximum (4096).
I guess it's similar to this issue -> https://community.openai.com/t/gpt-4-vision-preview-finish-details/475911

@arif599

arif599 commented Nov 18, 2023

@arif599 I believe you need to set the MaxTokens to be higher value or highest(= 4096). I guess it similar to this issue -> https://community.openai.com/t/gpt-4-vision-preview-finish-details/475911

Added MaxTokens: 4096 to ChatCompletionRequest and it's working fine, thanks :)

@arif599

arif599 commented Nov 18, 2023

A few days ago, I encountered the same issue. Typically, I receive a sentence and a half, regardless of whether I send a URL with the https:// prefix or with an embedded image. I haven't tried without streaming enabled but will do so in a few hours. Have you noticed any differences in results when using streaming versus not using it?

@arif599

Haven't noticed any difference with the quality of results but feel free to test it out yourself.
For streaming, just add MaxTokens: 4096 to ChatCompletionRequest in the example I provided.
So now it would be

package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"os"

	openai "github.com/sashabaranov/go-openai"
)

func main() {
	client := openai.NewClient(os.Getenv("OPENAI_API_KEY"))
	ctx := context.Background()

	req := openai.ChatCompletionRequest{
		Model: openai.GPT4VisionPreview,
		Messages: []openai.ChatCompletionMessage{
			{
				Role: openai.ChatMessageRoleUser,
				Content: []openai.ChatMessageContent{
					{
						Type: openai.ChatMessageContentTypeText,
						Text: "What's in this image",
					},
					{
						Type: openai.ChatMessageContentTypeImage,
						ImageURL: &openai.ChatMessageImageURL{
							URL: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
						},
					},
				},
			},
		},
		Stream:    true,
		MaxTokens: 4096,
	}

	stream, err := client.CreateChatCompletionStream(ctx, req)
	if err != nil {
		fmt.Printf("ChatCompletionStream error: %v\n", err)
		return
	}
	defer stream.Close()

	fmt.Printf("Stream response: ")
	for {
		response, err := stream.Recv()
		if errors.Is(err, io.EOF) {
			fmt.Println("\nStream finished")
			return
		}

		if err != nil {
			fmt.Printf("\nStream error: %v\n", err)
			return
		}

		fmt.Print(response.Choices[0].Delta.Content)
	}
}

@arif599

arif599 commented Nov 18, 2023

@rkintzi can we send base64-encoded images?
Like https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images

I wrote this encodeImage function in Go:

func encodeImage(imagePath string) string {
	imageData, err := os.ReadFile(imagePath)
	if err != nil {
		fmt.Println("Error:", err)
		return ""
	}
	base64String := base64.StdEncoding.EncodeToString(imageData)
	return base64String
}

but when I pass the encoded image into ImageURL

	req := gpt.ChatCompletionRequest{
		Model: gpt.GPT4VisionPreview,
		Messages: []gpt.ChatCompletionMessage{
			{
				Role: gpt.ChatMessageRoleUser,
				Content: []gpt.ChatMessageContent{
					{
						Type: gpt.ChatMessageContentTypeText,
						Text: prompt,
					},
					{
						Type: gpt.ChatMessageContentTypeImage,
						ImageURL: &gpt.ChatMessageImageURL{
							URL: encodeImage("example.jpg"),
						},
					},
				},
			},
		},
		Stream:    true,
		MaxTokens: 4096,
	}

I get an error: status code: 400, message: Invalid image.

UPDATE: I needed to prefix it with the data URI scheme, i.e. "data:image/jpeg;base64," + encodeImage("o.jpg"); it works now.

@AlexandrosKyriakakis

@sashabaranov @pufferffish closing this?

@pufferffish
Author

Closed, duplicate of #580
