Feature Request: add LLava as image recognizer on web pages #33

amonpaike · 2024-03-31T16:15:37Z

It would be nice to add LLaVA to recognize images in web pages so that it works together with the RAG models. LLaVA is also a text model, however, so it could be used as the main model to describe pages, which would avoid adding to loading times. It just needs to be able to "see" the various images.

On Hugging Face there are also models such as Mistral 7B combined with LLaVA.

amonpaike · 2024-03-31T17:34:35Z

LLaVA already works if you copy and paste the link of an image that is on a web page.

n4ze3m · 2024-03-31T18:11:08Z

Yes, both the web UI and the side panel support uploading images.

The side panel also supports a few more things:

Chat with YouTube video (with transcript)
Chat with a PDF opened on a website

amonpaike · 2024-03-31T18:51:34Z

Nice! I assume it will therefore be much less complicated to make the RAG model interact with LLaVA's description of the images to create a single summary analysis. I hope this is the case; it would be a wonderful feature.

amonpaike · 2024-03-31T19:07:37Z

I think the work you will have to do is even less complicated. Already now, if I open an image from the web as its own page, LLaVA can see it and describe it. At this point the only thing you would have to do is pass LLaVA the links of the relevant images in the page and mix its descriptions with the text that the RAG provides to the AI. :) It seems easy, but obviously I don't know anything about it; it's just a guess.

amonpaike · 2024-03-31T19:21:52Z

More fun! In reality the AI with the RAG is already capable of extracting the links of the images! They just need to be sent to LLaVA so it can write the description of each image.
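
For illustration only, collecting the image links from a page's HTML could look roughly like the sketch below. It is not Page Assist code: `ImageLinkCollector`, `extract_image_urls`, and the example URLs are hypothetical names, and it assumes the page HTML is already available as a string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkCollector(HTMLParser):
    """Collects absolute image URLs from <img src="..."> tags (hypothetical helper)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        # Skip images without a source and inline data URIs, which are not links.
        if not src or src.startswith("data:"):
            return
        absolute = urljoin(self.base_url, src)
        # Keep first-seen order and drop duplicates.
        if absolute not in self.urls:
            self.urls.append(absolute)


def extract_image_urls(html: str, base_url: str) -> list[str]:
    """Return the de-duplicated absolute image URLs found in `html`."""
    collector = ImageLinkCollector(base_url)
    collector.feed(html)
    return collector.urls
```

Each returned URL could then be handed to the vision model, one image at a time.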

n4ze3m · 2024-03-31T19:24:28Z

Yes, it's possible, I guess. I will add a setting to configure the model with vision for the image URL in the upcoming release :)

amonpaike · 2024-03-31T19:30:00Z

Bingo, mate! Now it's your turn to do the magic here...
LLaVA with the RAG is already capable of extracting the links of the images and describing them.
(All you have to do is invoke LLaVA like you did for the RAG and mix everything together.)
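
A rough sketch of the "mix everything" step, under stated assumptions: `build_page_prompt` is a hypothetical helper, and `describe_image` stands in for whatever call asks the vision model about one image URL. How Page Assist actually wires this up is not shown in this thread.

```python
from typing import Callable, Iterable


def build_page_prompt(
    page_text: str,
    image_urls: Iterable[str],
    describe_image: Callable[[str], str],
    question: str,
) -> str:
    """Merge page text and per-image descriptions into one prompt (illustrative only)."""
    lines = ["Page text:", page_text.strip(), "", "Images on the page:"]
    for index, url in enumerate(image_urls, start=1):
        # describe_image is an assumption: any function mapping an image URL to a caption.
        description = describe_image(url).strip()
        lines.append(f"{index}. {url}: {description}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)
```

Passing the describer in as a function keeps the merging logic independent of which vision model is configured.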

n4ze3m · 2024-03-31T19:34:19Z

Thank you for the suggestions. We will improve on them in the upcoming version. By the way, Page Assist has a ChatGPT-like web UI. Have you tried it? If not, click on the Page Assist icon.

The web UI supports internet search and many more features. :)

amonpaike · 2024-03-31T19:38:57Z

Yes, I tried it a little, but at the moment I'm excited by the concept of a private co-pilot next to the web pages.

n4ze3m added the "enhancement" (New feature or request) label · Mar 31, 2024
