
⚡️ perf: Optimize the image upload size for gpt-4-vision #669

Merged
merged 7 commits into lobehub:main from pref/compress_image on Dec 18, 2023

Conversation

mushan0x0
Contributor

@mushan0x0 commented Dec 15, 2023

💻 Change Type

  • ✨ feat
  • 🐛 fix
  • ♻️ refactor
  • 💄 style
  • 🔨 chore
  • 📝 docs

🔀 Description of Change

Cap the image's maximum width or height at 2K, then convert the image format to WebP.

Close #668
Close #646
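
For context, here is a minimal sketch of the approach described above, assuming a browser canvas pipeline and that 2K means 2048 px; the helper and constant names are illustrative, and the actual code in src/services/file.ts may differ:

```ts
// Illustrative sketch only: cap the longer edge at 2K, then re-encode as WebP.
const MAX_SIZE = 2048;

export const compressImage = (dataUrl: string): Promise<string> =>
  new Promise((resolve, reject) => {
    const img = new Image();
    img.onload = () => {
      // Scale down so the longer edge is at most MAX_SIZE, keeping the aspect ratio.
      const scale = Math.min(1, MAX_SIZE / Math.max(img.width, img.height));
      const canvas = document.createElement('canvas');
      canvas.width = Math.round(img.width * scale);
      canvas.height = Math.round(img.height * scale);
      const ctx = canvas.getContext('2d');
      if (!ctx) return reject(new Error('2d context unavailable'));
      ctx.drawImage(img, 0, 0, canvas.width, canvas.height);
      // Re-encode as WebP; browsers without WebP encoding fall back to PNG.
      resolve(canvas.toDataURL('image/webp'));
    };
    img.onerror = reject;
    img.src = dataUrl;
  });
```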

📝 Additional Information

After compression:
[image]

Before compression:
[image]


vercel bot commented Dec 15, 2023

@mushan0x0 is attempting to deploy a commit to the LobeHub Team on Vercel.

A member of the Team first needs to authorize it.

@lobehubbot
Member

👍 @mushan0x0

Thank you for raising your pull request and contributing to our community.
Please make sure you have followed our contributing guidelines. We will review it as soon as possible.
If you encounter any problems, please feel free to contact us.

src/services/file.ts (resolved review thread)
@arvinxx
Contributor

arvinxx commented Dec 15, 2023

Also, on the implementation approach, it is worth deciding whether to store only the compressed thumbnail, or to keep the raw image alongside the thumbnail.


@canisminor1990
Member

There is an existing compression helper for custom avatars: https://github.com/lobehub/lobe-chat/blob/main/src/utils/imageToBase64.ts
It feels like the two could be merged.

@mushan0x0
Contributor Author

> There is an existing compression helper for custom avatars: https://github.com/lobehub/lobe-chat/blob/main/src/utils/imageToBase64.ts It feels like the two could be merged.

I started with that one, but it contains image-centering logic; rather than merging them, it is cleaner to extract a separate file.


codecov bot commented Dec 15, 2023

Codecov Report

Attention: 23 lines in your changes are missing coverage. Please review.

Comparison is base (b142c17) 87.55% compared to head (be7a692) 87.36%.
Report is 1 commit behind head on main.

Files | Patch % | Lines
src/services/file.ts | 30.30% | 23 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #669      +/-   ##
==========================================
- Coverage   87.55%   87.36%   -0.19%     
==========================================
  Files         171      172       +1     
  Lines        8045     8107      +62     
  Branches      719      724       +5     
==========================================
+ Hits         7044     7083      +39     
- Misses       1001     1024      +23     

☔ View full report in Codecov by Sentry.

@mushan0x0 force-pushed the pref/compress_image branch 2 times, most recently from 9fb1840 to 016acd9, on December 15, 2023 at 14:10
@mushan0x0 changed the title from "⚡️ feat: Optimize the image upload size for gpt-4-vision" to "⚡️ pref: Optimize the image upload size for gpt-4-vision" on Dec 15, 2023
@mushan0x0 changed the title from "⚡️ pref: Optimize the image upload size for gpt-4-vision" to "⚡️ perf: Optimize the image upload size for gpt-4-vision" on Dec 15, 2023
async uploadFile(file: DB_File) {
// Skip the image upload test
@mushan0x0
Contributor Author

The image compression here can only be skipped in the tests; it would be very hard to mock.

@arvinxx
Contributor

arvinxx commented Dec 16, 2023

You can use the LobeChat Test Engineer to help you write unit tests.

Many of the unit tests in LobeChat today were written with its help: https://shareg.pt/xHPM9NJ
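
(For illustration only: one way the canvas path could be stubbed under Vitest with a jsdom environment. This is a hypothetical sketch; the compressImage helper and its import path are assumed names, not necessarily what this PR uses.)

```ts
import { describe, expect, it, vi } from 'vitest';

describe('image compression', () => {
  it('re-encodes the uploaded image as WebP', async () => {
    // Stub the canvas APIs a compression helper would touch.
    vi.spyOn(HTMLCanvasElement.prototype, 'getContext').mockReturnValue({
      drawImage: vi.fn(),
    } as unknown as CanvasRenderingContext2D);
    vi.spyOn(HTMLCanvasElement.prototype, 'toDataURL').mockReturnValue(
      'data:image/webp;base64,AAAA',
    );

    // Fake image decoding: fire onload as soon as src is assigned.
    vi.stubGlobal(
      'Image',
      class {
        width = 4096;
        height = 4096;
        onload: (() => void) | null = null;
        set src(_value: string) {
          queueMicrotask(() => this.onload?.());
        }
      },
    );

    // `compressImage` and its path are hypothetical names for this sketch.
    const { compressImage } = await import('./compressImage');
    const result = await compressImage('data:image/png;base64,BBBB');
    expect(result.startsWith('data:image/webp')).toBe(true);
  });
});
```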

@mushan0x0
Contributor Author

> Also, on the implementation approach, it is worth deciding whether to store only the compressed thumbnail, or to keep the raw image alongside the thumbnail.

There is no need to store the original image; it takes up space and is a hassle, and the WebP version is not much different from the original anyway.


@canisminor1990
Member

canisminor1990 commented Dec 15, 2023

One more thing to consider: compression is currently triggered by resolution. Would it be better to decide based on a maximum blob.size instead? In theory we should keep the higher-resolution image whenever conditions allow. Just an idea; implementing it might need several compression passes, so performance would probably suffer.
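
(A rough sketch, for illustration, of what the size-based loop being described might look like; this is hypothetical and not what the PR implements.)

```ts
// Hypothetical sketch of size-based compression: lower the WebP quality step
// by step until the encoded blob fits under maxBytes. Each pass re-encodes
// the whole image, which is why several loops would hurt performance.
const compressToMaxBytes = async (
  canvas: HTMLCanvasElement,
  maxBytes: number,
): Promise<Blob> => {
  let quality = 0.92;
  let blob = await encode(canvas, quality);
  while (blob.size > maxBytes && quality > 0.3) {
    quality -= 0.1;
    blob = await encode(canvas, quality);
  }
  return blob;
};

const encode = (canvas: HTMLCanvasElement, quality: number): Promise<Blob> =>
  new Promise((resolve, reject) => {
    canvas.toBlob(
      (blob) => (blob ? resolve(blob) : reject(new Error('encode failed'))),
      'image/webp',
      quality,
    );
  });
```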


@Wxh16144
Contributor

I would also like to get involved, so I have added some test cases on this PR. Please take a look and see if they work 😁


@arvinxx
Contributor

arvinxx commented Dec 16, 2023

> One more thing to consider: compression is currently triggered by resolution. Would it be better to decide based on a maximum blob.size instead? In theory we should keep the higher-resolution image whenever conditions allow. Just an idea; implementing it might need several compression passes, so performance would probably suffer.

@canisminor1990 I don't think so; it should be based on resolution. The reason is that the GPT-4V model charges by resolution:

[image]

refs: https://platform.openai.com/docs/guides/vision/low-or-high-fidelity-image-understanding

My current implementation hard-codes auto mode, so the model determines the detail level from the resolution on its own. For example, a 512x512 image sent over will cost 65 tokens.

Take an image with a resolution of 12800x25600: if we cap the file size but not the resolution, it might only shrink to 5120x12800, which satisfies our size limit, but once it is sent to GPT-4V it wastes far more tokens. A lower resolution may well be enough to describe the content clearly.
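
(For reference, a small helper that applies the high-detail token formula from the linked OpenAI vision guide, 85 base tokens plus 170 per 512 px tile after the documented downscaling, to show why resolution rather than byte size drives the cost; the function is an illustration, not code from this PR.)

```ts
// Illustrative token-cost estimate for GPT-4V high-detail images, following
// the formula in the OpenAI vision guide: scale to fit within 2048x2048,
// scale the shortest side to 768, count 512 px tiles at 170 tokens each,
// then add 85 base tokens.
const highDetailTokens = (width: number, height: number): number => {
  // Fit within a 2048 x 2048 square.
  const fit = Math.min(1, 2048 / Math.max(width, height));
  let w = width * fit;
  let h = height * fit;
  // Scale so the shortest side is at most 768 px.
  const shrink = Math.min(1, 768 / Math.min(w, h));
  w *= shrink;
  h *= shrink;
  const tiles = Math.ceil(w / 512) * Math.ceil(h / 512);
  return 85 + 170 * tiles;
};

// Example: a 5120x12800 image scales to 768x1920, giving 2x4 tiles and 1445
// tokens, whereas a lower resolution (or low-detail mode at a flat 85 tokens)
// would be far cheaper, which is why the cap is on resolution rather than bytes.
console.log(highDetailTokens(5120, 12800)); // 1445
```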

@lobehubbot
Copy link
Member

Bot detected the issue body's language is not English, translate it automatically. 👯👭🏻🧑‍🤝‍🧑👫🧑🏿‍🤝‍🧑🏻👩🏾‍🤝‍👨🏿👬🏿


Another point to think about is that currently compressed images are judged by resolution. Is it better to judge by the maximum value of blob.size? In theory, higher-definition images should be retained when conditions permit. This is just an idea. The implementation may trigger multiple compression cycles, and the performance may not be very good.

@canisminor1990 I don’t think so, it should be based on the resolution. The reason is that the GPT-4v model charges based on resolution:

image

At present, the auto mode is hard-coded in my implementation, so the model will automatically recognize the resolution. For example, if a 512x512 picture is sent, it will cost 65 tokens.

If it is a picture with a resolution of 12800x25600, if the size is limited but not the resolution, it may be reduced to 5120x 12800, which meets our size requirements. However, it is easy to cause more waste of tokens after issuing it to 4v. After all, it is possible to describe clearly using low resolution.

@canisminor1990
Member

> @canisminor1990 I don't think so; it should be based on resolution. The reason is that the GPT-4V model charges by resolution: [...]

I see, I had not looked into that in detail before.


@arvinxx
Contributor

arvinxx commented Dec 17, 2023

@mushan0x0 Could you merge main? There are conflicts.


@mushan0x0
Contributor Author

OK, I will deal with it later.


@arvinxx merged commit d038d24 into lobehub:main on Dec 18, 2023
2 of 5 checks passed
@lobehubbot
Member

❤️ Great PR @mushan0x0 ❤️

The growth of the project is inseparable from user feedback and contributions; thanks for your contribution! If you are interested in the LobeHub developer community, please join our Discord and then DM @arvinxx or @canisminor1990. They will invite you to our private developer channel, where we discuss lobe-chat development and share AI news from around the world.

@lobehubbot
Member

🎉 This PR is included in version 0.114.3 🎉

The release is available on:

Your semantic-release bot 📦🚀

@mushan0x0 deleted the pref/compress_image branch on December 28, 2023 at 15:14