Add support for vision chat completion models #557
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master     #557      +/-   ##
==========================================
- Coverage   97.88%   96.61%   -1.27%
==========================================
  Files          20       20
  Lines         991     1004      +13
==========================================
  Hits          970      970
- Misses         15       28      +13
  Partials        6        6
type ChatCompletionMessage struct {
	Role    string               `json:"role"`
	Content []ChatMessageContent `json:"content"`
}
This decision is not within my jurisdiction, but these modifications break backward compatibility. I have already suggested a less intrusive alternative here. I am awaiting the maintainer's involvement to facilitate a discussion on the matter.
The OpenAI API still accepts string values for Content. Ideally this wrapper conforms to their API contract.
@sashabaranov Do you have any idea how to deal with backward compatibility? I summed up the possible solutions (that I see) here.
@@ -51,7 +56,31 @@ type PromptAnnotation struct {
	ContentFilterResults ContentFilterResults `json:"content_filter_results,omitempty"`
}

type ChatMessageImageURL struct {
	URL string `json:"url"`
}
You can also add a "detail" attribute: https://platform.openai.com/docs/guides/vision/calculating-costs
I made it like this:

type ImageURLDetail string

const (
	ImageURLDetailLow  ImageURLDetail = "low"
	ImageURLDetailHigh ImageURLDetail = "high"
)

type ChatMessageImageURL struct {
	Detail ImageURLDetail `json:"detail"`
	URL    string         `json:"url"`
}
req := openai.ChatCompletionRequest{
	Model: openai.GPT4VisionPreview,
	Messages: []openai.ChatCompletionMessage{
		{
			Role: openai.ChatMessageRoleUser,
			Content: []openai.ChatMessageContent{
				{
					Type: openai.ChatMessageContentTypeText,
					Text: "What's in this image",
				},
				{
					Type: openai.ChatMessageContentTypeImage,
					ImageURL: &openai.ChatMessageImageURL{
						URL: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
					},
				},
			},
		},
	},
}
You could also include an example with a local image, like this one:
imagePath := "path/to/image.png"

// Read the image file
imgData, err := os.ReadFile(imagePath)
if err != nil {
	fmt.Println("Error reading image file:", err)
	os.Exit(1)
}

// Encode to base64
base64Str := base64.StdEncoding.EncodeToString(imgData)

req := openai.ChatCompletionRequest{
	Model:     openai.GPT4VisionPreview,
	MaxTokens: 4096,
	Messages: []openai.ChatCompletionMessage{
		{
			Role: openai.ChatMessageRoleUser,
			Content: []openai.ChatMessageContent{
				{
					Type: openai.ChatMessageContentTypeText,
					Text: "What's in this image",
				},
				{
					Type: openai.ChatMessageContentTypeImage,
					ImageURL: &openai.ChatMessageImageURL{
						Detail: openai.ImageURLDetailHigh,
						URL:    "data:image/png;base64," + base64Str,
					},
				},
			},
		},
	},
}
Hey! Is it possible to stream with gpt-4-vision-preview?

I think I figured it out. However, the stream sometimes receives an EOF in the middle of a sentence.
A few days ago, I encountered the same issue. Typically, I receive a sentence and a half, regardless of whether I send a URL with the https:// prefix or with an embedded image. I haven't tried without streaming enabled but will do so in a few hours. Have you noticed any differences in results when using streaming versus not using it?
@arif599 I believe you need to set MaxTokens to a higher value, or the maximum (4096).
Added.

Haven't noticed any difference in the quality of results, but feel free to test it out yourself.
@rkintzi Can we send base64-encoded images? I wrote this encodeImage function in Go, but when I pass the encoded image into ImgURL I get: error, status code: 400, message: Invalid image.

UPDATE: I needed to add the data-URI prefix, "data:image/jpeg;base64," + encodeImage("o.jpg"), and now it works.
@sashabaranov @pufferffish closing this?

Closed, duplicate of #580
Describe the change

This PR lets you pass in image_url, so we can pass images to vision models like gpt-4-vision-preview.

Describe your solution

The definition of ChatCompletionMessage is changed so that Content can be a list of parts containing text and images, instead of a single string.

Tests

This PR has been tested in example/chatvision.

Issue: #539