Using the chroma library and the bge-large-zh-v1.5 model, when recalling certain words, completely irrelevant slices are recalled. #1646

Open
2 of 15 tasks
tanghaichen opened this issue Jun 19, 2024 · 1 comment
Labels
bug (Something isn't working), Waiting for reply

Comments

@tanghaichen

Search before asking

  • I had searched in the issues and found no similar issues.

Operating system information

Linux

Python version information

3.10

DB-GPT version

main

Related scenes

  • Chat Data
  • Chat Excel
  • Chat DB
  • Chat Knowledge
  • Model Management
  • Dashboard
  • Plugins

Installation Information

Device information

GPU 96G

Models information

bge-large-zh-v1.5

What happened

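I am using the bge-large-zh-v1.5 model with the Chroma vector store. When retrieving certain terms, the recalled chunks have high scores but are completely unrelated to the term. This happens for only one particular term; recall for the vast majority of other terms is fairly accurate.
The knowledge base currently contains PDF, CSV, and Word documents, about 6,000 chunks in total.
Example:
Term: "水资源" (water resources)
There are 20 documents, 900 chunks in total, in which the term "水资源" appears directly. None of the other documents contain these three characters.
However, when asking about "水资源", all of the recalled chunks are unrelated to it.
So far I have not found any other term with this problem.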
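The symptom described above can be quantified by checking how many of the recalled chunks literally contain the query term. The following is a minimal sketch, assuming the recalled chunks are available as (text, score) pairs; the helper `keyword_hit_report` and the sample data are hypothetical and are not part of DB-GPT or Chroma.

```python
from typing import List, Optional, Tuple


def keyword_hit_report(term: str, recalled: List[Tuple[str, float]]) -> dict:
    """Summarize how many recalled chunks literally contain `term`.

    `recalled` is a list of (chunk_text, score) pairs, where a higher
    score is assumed to mean a closer match.
    """
    hits = [pair for pair in recalled if term in pair[0]]
    misses = [pair for pair in recalled if term not in pair[0]]
    # Highest-scoring chunk that does NOT contain the term, if any.
    top_miss: Optional[Tuple[str, float]] = (
        max(misses, key=lambda pair: pair[1]) if misses else None
    )
    return {
        "term": term,
        "recalled": len(recalled),
        "hits": len(hits),
        "hit_rate": len(hits) / len(recalled) if recalled else 0.0,
        "top_miss": top_miss,
    }


# Hypothetical recall result, for illustration only.
sample = [
    ("年度财务报表摘要", 0.91),
    ("水资源管理条例第一章", 0.88),
    ("员工考勤制度", 0.87),
]
report = keyword_hit_report("水资源", sample)
```

A low `hit_rate` together with a high-scoring `top_miss` would be a concrete bad case to attach to the issue.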

What you expected to happen

Normally, recall should draw from the chunks in which the term actually appears; that would be the reasonable behavior.
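The expectation above amounts to preferring chunks that contain the term literally. A minimal sketch of that idea as a post-processing step over whatever the vector store returns is shown below. It is an illustration only, not DB-GPT's retrieval logic: the function `rerank_keyword_first` and the sample data are hypothetical, and scores are assumed to be similarities where higher is better (if the store returns distances, the sign would need to be flipped).

```python
from typing import List, Tuple


def rerank_keyword_first(
    term: str, recalled: List[Tuple[str, float]]
) -> List[Tuple[str, float]]:
    """Order chunks so that literal matches of `term` come first.

    Within each group (match / no match), chunks are ordered by
    descending score.
    """
    # `term not in text` is False for matches, and False sorts before True.
    return sorted(recalled, key=lambda pair: (term not in pair[0], -pair[1]))


# Hypothetical recall result, for illustration only.
candidates = [
    ("年度财务报表摘要", 0.91),
    ("水资源管理条例第一章", 0.88),
    ("员工考勤制度", 0.87),
    ("地区水资源公报", 0.80),
]
reranked = rerank_keyword_first("水资源", candidates)
```

An alternative is to filter at query time instead of reranking afterwards; Chroma's `collection.query` accepts a `where_document={"$contains": ...}` filter that may serve this purpose, though whether DB-GPT exposes it is not something this issue confirms.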

How to reproduce

Unknown; I have not found a way to reproduce it.

Additional context

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
@tanghaichen added the bug and Waiting for reply labels Jun 19, 2024
@Aries-ckt changed the title from "使用chroma库和bge-large-zh-v1.5模型,对某些词语召回时,召回的却是完全不相关的切片" to "Using the chroma library and the bge-large-zh-v1.5 model, when recalling certain words, completely irrelevant slices are recalled." Jun 19, 2024
@Aries-ckt
Collaborator

What types of documents are you using, and could you show us some bad cases?
