Feature Request: add LLava as image recognizer on web pages #33

Open
amonpaike opened this issue Mar 31, 2024 · 9 comments
Labels
enhancement New feature or request

Comments

@amonpaike

It would be nice to add LLaVA to recognize images in web pages, so that it works together with the RAG models. LLaVA is also a text model, though, so it could be used as the main model to describe pages, which would avoid the extra loading time of a second model. It just needs to be able to "see" the various images.

On Hugging Face there are also models that combine Mistral 7B with LLaVA.

@n4ze3m added the enhancement label Mar 31, 2024
@amonpaike
Author

amonpaike commented Mar 31, 2024

LLaVA already works if you copy and paste the link of an image that is in a web page.

image
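
For readers who want to try the same thing outside the extension, here is a minimal sketch of a request to a vision model. It assumes a local Ollama server with a `llava` model, which this thread does not state, and `build_vision_request` is a hypothetical helper. The body follows the shape of Ollama's `/api/generate` endpoint, which takes base64-encoded image data rather than URLs, so an image link would have to be downloaded first.

```python
import base64


def build_vision_request(image_bytes: bytes, prompt: str, model: str = "llava") -> dict:
    """Build a JSON-serializable body for a vision request.

    Assumption: the target is Ollama's /api/generate endpoint, which expects
    images as base64 strings in the "images" list.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one complete response instead of chunks
    }
```

The body would then be sent as JSON with an HTTP POST; that step is left out so the sketch stays free of network calls.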

@n4ze3m
Owner

n4ze3m commented Mar 31, 2024

Yes, both the web UI and the side panel support uploading images.

image

The side panel also supports:

  • Chat with YouTube video (with transcript)
  • Chat with a PDF opened on a website

@amonpaike
Author

amonpaike commented Mar 31, 2024

Nice! I assume it will therefore be much less complicated to make the RAG AI model interact with the LLaVA description of the images to create a single summary analysis. I hope this is the case; it would be a wonderful feature.
image

@amonpaike
Author

I think the work you will have to do is even less complicated. Already now, if I open an image from the web as its own page, LLaVA can see it and describe it. At this point the only thing you will have to do is pass LLaVA the links of the relevant images in the page and mix its descriptions with the text that the RAG provides to the AI. :) It seems easy, but obviously I don't know anything about it, it's just a guess.

image

@amonpaike
Author

More fun! In fact, the AI with the RAG is already capable of extracting the links of the images! They just need to be sent to LLaVA so it can write a description of each image.
image
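
The link extraction mentioned above does not strictly need a model. As a rough sketch under my own assumptions (this is not how Page Assist does it, and `extract_image_urls` is a hypothetical name), the image links can be collected from the page HTML with Python's standard library:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collect absolute URLs from the src attribute of <img> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.image_urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative links against the page URL and skip duplicates.
            url = urljoin(self.base_url, src)
            if url not in self.image_urls:
                self.image_urls.append(url)


def extract_image_urls(html: str, base_url: str) -> list[str]:
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return parser.image_urls
```

Each returned URL could then be handed to the vision model for a description.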

@n4ze3m
Owner

n4ze3m commented Mar 31, 2024

Yes, it's possible, I guess. I will add a setting to configure the model with vision for the image URL in the upcoming release :)
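
One possible shape for such a setting, as a sketch only: the names `ModelSettings`, `vision_model`, and `pick_model` are hypothetical and not taken from Page Assist, and the model names in the usage note are just the ones mentioned in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelSettings:
    chat_model: str
    vision_model: Optional[str] = None  # None means no vision model is configured


def pick_model(settings: ModelSettings, has_image_urls: bool) -> str:
    """Use the vision model only when image URLs are present and one is configured."""
    if has_image_urls and settings.vision_model:
        return settings.vision_model
    return settings.chat_model
```

With `ModelSettings(chat_model="mistral", vision_model="llava")`, a message containing image URLs would go to `llava` and everything else to `mistral`.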

@amonpaike
Author

Bingo, mate! Now it's your turn to do the magic here...
LLaVA with the RAG is already capable of extracting the links of the images and describing them.
(All you have to do is invoke LLaVA like you did for the RAG and mix everything.)

image
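
The "mix everything" step could be as simple as appending the image descriptions to the page text before it goes to the model. A minimal sketch, assuming the descriptions have already been produced by a vision model; `build_context` is a hypothetical helper, not an existing Page Assist function:

```python
def build_context(page_text: str, image_descriptions: dict[str, str]) -> str:
    """Append one line per described image to the page text used as model context."""
    lines = [page_text.strip()]
    if image_descriptions:
        lines.append("")
        lines.append("Images on this page:")
        for url, description in image_descriptions.items():
            lines.append(f"- {url}: {description.strip()}")
    return "\n".join(lines)
```

Keeping the URL next to each description lets the model refer back to a specific image in its summary.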

@n4ze3m
Owner

n4ze3m commented Mar 31, 2024

Thank you for the suggestions. We will improve on them in the upcoming version. By the way, Page Assist has a ChatGPT-like web UI. Have you tried it? If not, click on the Page Assist icon.

The web UI supports internet search and many more features. :)

image
image

@amonpaike
Author

Yes, I tried it a little, but at the moment I'm excited by the concept of a private co-pilot next to the web pages.

2 participants