Aries-ckt
changed the title
使用chroma库和bge-large-zh-v1.5模型,对某些词语召回时,召回的却是完全不相关的切片
Using the chroma library and the bge-large-zh-v1.5 model, when recalling certain words, completely irrelevant slices are recalled.
Jun 19, 2024
Search before asking
Operating system information
Linux
Python version information
3.10
DB-GPT version
main
Related scenes
Installation Information
Installation From Source
Docker Installation
Docker Compose Installation
Cluster Installation
AutoDL Image
Other
Device information
GPU 96G
Models information
bge-large-zh-v1.5
What happened
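I am using the bge-large-zh-v1.5 model with the Chroma vector store. For certain query terms, the retrieved chunks have high scores but are completely unrelated to the term. This happens for only one specific term; retrieval for the vast majority of other terms is fairly accurate.
The knowledge base currently contains PDF, CSV, and Word documents, roughly 6,000 chunks in total.
Example:
Term: "水资源" (water resources)
20 documents (900 chunks) contain the term 水资源 verbatim. None of the other documents contain these three characters.
However, when querying for 水资源, every retrieved chunk is unrelated to it.
So far I have not found any other term with this problem.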
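To put a number on the symptom, one option is to measure what fraction of the top-k results actually contain the query term. The following is a minimal sketch, not DB-GPT code: `keyword_hit_rate` is a hypothetical helper, and the sample chunks and scores are invented for illustration.

```python
from typing import List, Tuple


def keyword_hit_rate(retrieved: List[Tuple[str, float]], keyword: str) -> float:
    """Return the fraction of retrieved chunks whose text contains the keyword."""
    if not retrieved:
        return 0.0
    hits = sum(1 for text, _score in retrieved if keyword in text)
    return hits / len(retrieved)


# Hypothetical top-k results as (chunk_text, score); not taken from the real knowledge base.
sample = [
    ("水资源管理条例第一章", 0.81),
    ("年度财务报表摘要", 0.92),
    ("设备采购清单", 0.90),
    ("水资源保护规划", 0.78),
]
rate = keyword_hit_rate(sample, "水资源")
print(f"keyword hit rate: {rate:.2f}")
```

A hit rate near zero for 水资源, alongside a high rate for other terms, would confirm that the problem is specific to this query rather than to the index as a whole.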
What you expected to happen
Retrieval should reasonably draw from the chunks in which the term actually appears.
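As a possible workaround until the root cause is found, candidates could be re-ranked so that chunks containing the term come first. This is a sketch under assumptions: `keyword_first` is a hypothetical helper, a higher score is assumed to mean a closer match, and the candidate list is invented for illustration.

```python
from typing import List, Tuple


def keyword_first(
    candidates: List[Tuple[str, float]], keyword: str, top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank chunks containing the keyword ahead of the rest.

    Within each group, chunks are ordered by descending score
    (assumption: a higher score means a closer match).
    """
    ranked = sorted(candidates, key=lambda c: (keyword not in c[0], -c[1]))
    return ranked[:top_k]


# Hypothetical candidates as (chunk_text, score) from a vector search.
candidates = [
    ("年度财务报表摘要", 0.92),
    ("水资源保护规划", 0.78),
    ("设备采购清单", 0.90),
    ("水资源管理条例", 0.81),
]
top = keyword_first(candidates, "水资源", top_k=3)
for text, score in top:
    print(text, score)
```

If the store is queried directly, Chroma's `where_document={"$contains": ...}` filter on `collection.query` is one way to apply the substring condition inside the database instead of in Python.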
How to reproduce
No known way to reproduce.
Additional context
No response
Are you willing to submit PR?