
Add support for vision chat completion models #557

Closed
wants to merge 3 commits into from

Conversation

pufferffish

@pufferffish pufferffish commented Nov 9, 2023

Describe the change
This PR lets you pass in image_url so you can send images to vision models like gpt-4-vision-preview.

Describe your solution
The definition of ChatCompletionMessage is changed so that Content can be a struct containing text and images instead of a string.

Tests
This PR has been tested in example/chatvision

Issue: #539


codecov bot commented Nov 9, 2023

Codecov Report

Merging #557 (9095b38) into master (08c167f) will decrease coverage by 1.27%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master     #557      +/-   ##
==========================================
- Coverage   97.88%   96.61%   -1.27%     
==========================================
  Files          20       20              
  Lines         991     1004      +13     
==========================================
  Hits          970      970              
- Misses         15       28      +13     
  Partials        6        6              
Files Coverage Δ
chat.go 61.76% <0.00%> (-38.24%) ⬇️

@pufferffish pufferffish changed the title add structs for image_url content Add support for vision chat completion models Nov 9, 2023
@pufferffish pufferffish marked this pull request as ready for review November 9, 2023 06:57
type ChatCompletionMessage struct {
Role string `json:"role"`
Content []ChatMessageContent `json:"content"`
Contributor


This decision is not within my jurisdiction, but these modifications break compatibility. I have already suggested a less intrusive alternative here. I am awaiting the maintainer's involvement to facilitate a discussion on the matter.


The OpenAI API still accepts string values for Content. Ideally this wrapper conforms to their API contract.

Contributor

@rkintzi rkintzi Nov 14, 2023


@sashabaranov
Do you have any idea how to deal with backward compatibility? I summed-up the possible solutions (that I see) here.

@@ -51,7 +56,31 @@ type PromptAnnotation struct {
ContentFilterResults ContentFilterResults `json:"content_filter_results,omitempty"`
}

type ChatMessageImageURL struct {
URL string `json:"url"`


I made it like this

type ImageURLDetail string

const (
	ImageURLDetailLow  ImageURLDetail = "low"
	ImageURLDetailHigh ImageURLDetail = "high"
)

type ChatMessageImageURL struct {
	Detail ImageURLDetail `json:"detail"`
	URL    string         `json:"url"`
}

Comment on lines +14 to +33
req := openai.ChatCompletionRequest{
Model: openai.GPT4VisionPreview,
Messages: []openai.ChatCompletionMessage{
{
Role: openai.ChatMessageRoleUser,
Content: []openai.ChatMessageContent{
{
Type: openai.ChatMessageContentTypeText,
Text: "What's in this image",
},
{
Type: openai.ChatMessageContentTypeImage,
ImageURL: &openai.ChatMessageImageURL{
URL: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
},
},
},
},
},
}


You could also include an example with a local image, like this one.

	imagePath := "path/to/image.png"

	// Read the image file
	imgData, err := os.ReadFile(imagePath)
	if err != nil {
		fmt.Println("Error reading image file:", err)
		os.Exit(1)
	}

	// Encode to base64
	base64Str := base64.StdEncoding.EncodeToString(imgData)
	req := openai.ChatCompletionRequest{
		Model:     openai.GPT4VisionPreview,
		MaxTokens: 4096,
		Messages: []openai.ChatCompletionMessage{
			{
				Role: openai.ChatMessageRoleUser,
				Content: []openai.ChatMessageContent{
					{
						Type: openai.ChatMessageContentTypeText,
						Text: "What's in this image",
					},
					{
						Type: openai.ChatMessageContentTypeImage,
						ImageURL: &openai.ChatMessageImageURL{
							Detail: openai.ImageURLDetailHigh,
							URL:    "data:image/png;base64," + base64Str,
						},
					},
				},
			},
		},
	}

@arif599

arif599 commented Nov 17, 2023

Hey! is it possible to stream with gpt-4-vision-preview?

@arif599

arif599 commented Nov 17, 2023

Hey! is it possible to stream with gpt-4-vision-preview?

I think I figured it out. However, the stream sometimes receives EOF in the middle of a sentence.
Any idea why? @AlexandrosKyriakakis @rkintzi

package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"os"

	openai "github.com/sashabaranov/go-openai"
)

func main() {
	client := openai.NewClient(os.Getenv("OPENAI_API_KEY"))
	ctx := context.Background()

	req := openai.ChatCompletionRequest{
		Model: openai.GPT4VisionPreview,
		Messages: []openai.ChatCompletionMessage{
			{
				Role: openai.ChatMessageRoleUser,
				Content: []openai.ChatMessageContent{
					{
						Type: openai.ChatMessageContentTypeText,
						Text: "What's in this image",
					},
					{
						Type: openai.ChatMessageContentTypeImage,
						ImageURL: &openai.ChatMessageImageURL{
							URL: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
						},
					},
				},
			},
		},
		Stream: true,
	}

	stream, err := client.CreateChatCompletionStream(ctx, req)
	if err != nil {
		fmt.Printf("ChatCompletionStream error: %v\n", err)
		return
	}
	defer stream.Close()

	fmt.Printf("Stream response: ")
	for {
		response, err := stream.Recv()
		if errors.Is(err, io.EOF) {
			fmt.Println("\nStream finished")
			return
		}

		if err != nil {
			fmt.Printf("\nStream error: %v\n", err)
			return
		}

		fmt.Print(response.Choices[0].Delta.Content)
	}
}

@rkintzi
Contributor

rkintzi commented Nov 17, 2023

A few days ago, I encountered the same issue. Typically, I receive a sentence and a half, regardless of whether I send a URL with the https:// prefix or with an embedded image. I haven't tried without streaming enabled but will do so in a few hours. Have you noticed any differences in results when using streaming versus not using it?

@arif599

@AlexandrosKyriakakis

@arif599 I believe you need to set MaxTokens to a higher value, or the maximum (4096).
I guess it's similar to this issue -> https://community.openai.com/t/gpt-4-vision-preview-finish-details/475911

@arif599

arif599 commented Nov 18, 2023

@arif599 I believe you need to set the MaxTokens to be higher value or highest(= 4096). I guess it similar to this issue -> https://community.openai.com/t/gpt-4-vision-preview-finish-details/475911

Added MaxTokens: 4096 to ChatCompletionRequest and it's working fine, thanks :)

@arif599

arif599 commented Nov 18, 2023

A few days ago, I encountered the same issue. Typically, I receive a sentence and a half, regardless of whether I send a URL with the https:// prefix or with an embedded image. I haven't tried without streaming enabled but will do so in a few hours. Have you noticed any differences in results when using streaming versus not using it?

@arif599

Haven't noticed any difference with the quality of results but feel free to test it out yourself.
For streaming, just add MaxTokens: 4096 to ChatCompletionRequest in the example I provided.
So now it would be

package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"os"

	openai "github.com/sashabaranov/go-openai"
)

func main() {
	client := openai.NewClient(os.Getenv("OPENAI_API_KEY"))
	ctx := context.Background()

	req := openai.ChatCompletionRequest{
		Model: openai.GPT4VisionPreview,
		Messages: []openai.ChatCompletionMessage{
			{
				Role: openai.ChatMessageRoleUser,
				Content: []openai.ChatMessageContent{
					{
						Type: openai.ChatMessageContentTypeText,
						Text: "What's in this image",
					},
					{
						Type: openai.ChatMessageContentTypeImage,
						ImageURL: &openai.ChatMessageImageURL{
							URL: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
						},
					},
				},
			},
		},
		Stream:    true,
		MaxTokens: 4096,
	}

	stream, err := client.CreateChatCompletionStream(ctx, req)
	if err != nil {
		fmt.Printf("ChatCompletionStream error: %v\n", err)
		return
	}
	defer stream.Close()

	fmt.Printf("Stream response: ")
	for {
		response, err := stream.Recv()
		if errors.Is(err, io.EOF) {
			fmt.Println("\nStream finished")
			return
		}

		if err != nil {
			fmt.Printf("\nStream error: %v\n", err)
			return
		}

		fmt.Print(response.Choices[0].Delta.Content)
	}
}

@arif599

arif599 commented Nov 18, 2023

@rkintzi can we send base64-encoded images?
Like https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images

I wrote this encodeImage function in Go:

func encodeImage(imagePath string) string {
	imageData, err := os.ReadFile(imagePath)
	if err != nil {
		fmt.Println("Error:", err)
		return ""
	}
	base64String := base64.StdEncoding.EncodeToString(imageData)
	return base64String
}

but when I pass the encoded image into ImageURL

	req := gpt.ChatCompletionRequest{
		Model: gpt.GPT4VisionPreview,
		Messages: []gpt.ChatCompletionMessage{
			{
				Role: gpt.ChatMessageRoleUser,
				Content: []gpt.ChatMessageContent{
					{
						Type: gpt.ChatMessageContentTypeText,
						Text: prompt,
					},
					{
						Type: gpt.ChatMessageContentTypeImage,
						ImageURL: &gpt.ChatMessageImageURL{
							URL: encodeImage("example.jpg"),
						},
					},
				},
			},
		},
		Stream:    true,
		MaxTokens: 4096,
	}

I get an error: status code: 400, message: Invalid image.

UPDATE: I needed to prefix it with the data URI scheme, i.e. "data:image/jpeg;base64," + encodeImage("o.jpg"); it works now.

@AlexandrosKyriakakis

@sashabaranov @pufferffish closing this?

@pufferffish
Author

Closed, duplicate of #580
