forked from labring/FastGPT
Optimize the file storage structure of the knowledge base (labring#386)
Showing 41 changed files with 591 additions and 231 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
--- | ||
weight: 540 | ||
title: "Design Plans" | ||
description: "Design plans for parts of FastGPT" | ||
icon: public | ||
draft: false | ||
images: [] | ||
--- |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
--- | ||
weight: 541 | ||
title: "Dataset" | ||
description: "Design of files and data in FastGPT datasets" | ||
icon: dataset | ||
draft: false | ||
images: [] | ||
--- | ||
|
||
## Relationship between files and data | ||
|
||
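In FastGPT, files are stored in MongoDB's FS, while the actual data is stored in PostgreSQL. Each row in PG has a `file_id` column that links it to its file. To stay compatible with older versions and to cover manually entered and manually annotated data, `file_id` also accepts a few special values: | ||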
|
||
- manual: manually entered data | ||
- mark: manually annotated data | ||
|
||
Note: `file_id` is written only when a row is inserted and cannot be changed by later updates. | ||
|
||
## File import flow | ||
|
||
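1. Upload the file to MongoDB's FS and obtain a file_id; at this point the file is marked as `unused`. | ||
2. The browser parses the file to extract its text and chunks. | ||
3. Tag each chunk with the file_id. | ||
4. Click "Upload data": the file's status changes to `used`, and the data is pushed to the mongo `training` collection to wait for training. | ||
5. A training thread takes the data from mongo, obtains the vectors, and inserts the result into pg. |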
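The five steps above can be sketched as a small state machine. This is only an illustration under loud assumptions: the `Map` and arrays stand in for MongoDB's FS, the mongo `training` collection, and pg, and every name here (`uploadFile`, `toChunks`, `pushData`, `trainAll`, the fixed-size chunking) is hypothetical rather than FastGPT's actual API.

```typescript
// Hypothetical in-memory stand-ins for MongoDB FS, the `training` collection, and pg.
type FileStatus = 'unused' | 'used';

interface StoredFile { id: string; name: string; status: FileStatus; }
interface Chunk { file_id: string; text: string; }
interface VectorRow extends Chunk { vector: number[]; }

const files = new Map<string, StoredFile>();
const trainingQueue: Chunk[] = [];
const pgRows: VectorRow[] = [];

// Step 1: store the file and mark it `unused` until its data is pushed.
function uploadFile(name: string): string {
  const id = `file_${files.size + 1}`;
  files.set(id, { id, name, status: 'unused' });
  return id;
}

// Steps 2-3: split the parsed text into chunks and tag each one with file_id.
// Fixed-size slicing is an assumption; the real splitter is not shown in the source.
function toChunks(fileId: string, text: string, size: number): Chunk[] {
  const chunks: Chunk[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push({ file_id: fileId, text: text.slice(i, i + size) });
  }
  return chunks;
}

// Step 4: flip the file to `used` and queue its chunks for training.
function pushData(fileId: string, chunks: Chunk[]): void {
  const file = files.get(fileId);
  if (!file) throw new Error(`unknown file_id: ${fileId}`);
  file.status = 'used';
  trainingQueue.push(...chunks);
}

// Step 5: drain the queue, compute a vector per chunk, and insert the row.
function trainAll(embed: (text: string) => number[]): void {
  let chunk = trainingQueue.shift();
  while (chunk) {
    pgRows.push({ ...chunk, vector: embed(chunk.text) });
    chunk = trainingQueue.shift();
  }
}
```

Because `file_id` is attached in steps 2-3 and copied into the final row unchanged, the sketch also reflects the rule that `file_id` is set only at insert time.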
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
--- | ||
title: 'V4.4.7' | ||
description: 'FastGPT V4.4.7 update (upgrade script required)' | ||
icon: 'upgrade' | ||
draft: false | ||
toc: true | ||
weight: 840 | ||
--- | ||
|
||
## Run the initialization API | ||
|
||
Send 1 HTTP request (replace {{rootkey}} with the `rootkey` from your environment variables and {{host}} with your own domain). | ||
|
||
1. https://xxxxx/api/admin/initv447 | ||
|
||
```bash | ||
curl --location --request POST 'https://{{host}}/api/admin/initv447' \ | ||
--header 'rootkey: {{rootkey}}' \ | ||
--header 'Content-Type: application/json' | ||
``` | ||
|
||
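This initializes the pg indexes and converts empty `file_id` values to `manual`. With a large amount of data it may take a long time; you can follow the progress in the logs. | ||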
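For callers who prefer a script over curl, the same request can be described in TypeScript. This is a minimal sketch: `buildInitRequest` and `InitRequest` are hypothetical names, and the helper only assembles the request so that it can be inspected before anything is sent.

```typescript
// Hypothetical helper: assembles the same POST request as the curl command above.
interface InitRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
}

function buildInitRequest(host: string, rootkey: string): InitRequest {
  return {
    url: `https://${host}/api/admin/initv447`,
    method: 'POST',
    headers: { rootkey, 'Content-Type': 'application/json' }
  };
}

// To send it with the standard fetch API:
//   const req = buildInitRequest('your.domain', 'your-rootkey');
//   await fetch(req.url, { method: req.method, headers: req.headers });
```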
|
||
## Features | ||
|
||
### FastGPT V4.4.7 | ||
|
||
1. Optimized CRUD for database files. | ||
2. Added support for reading links and using them as a source. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
import { strIsLink } from './str'; | ||
|
||
export const fileImgs = [ | ||
{ suffix: 'pdf', src: '/imgs/files/pdf.svg' }, | ||
{ suffix: 'csv', src: '/imgs/files/csv.svg' }, | ||
{ suffix: '(doc|docs)', src: '/imgs/files/doc.svg' }, | ||
{ suffix: 'txt', src: '/imgs/files/txt.svg' }, | ||
{ suffix: 'md', src: '/imgs/files/markdown.svg' }, | ||
{ suffix: '.', src: '/imgs/files/file.svg' } | ||
]; | ||
|
||
export function getFileIcon(name = '') { | ||
return fileImgs.find((item) => new RegExp(item.suffix, 'gi').test(name))?.src; | ||
} | ||
export function getSpecialFileIcon(name = '') { | ||
if (name === 'manual') { | ||
return '/imgs/files/manual.svg'; | ||
} else if (name === 'mark') { | ||
return '/imgs/files/mark.svg'; | ||
} else if (strIsLink(name)) { | ||
return '/imgs/files/link.svg'; | ||
} | ||
} |
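One detail of `getFileIcon` is worth noting: each `suffix` is compiled into an unanchored regular expression, so it matches anywhere in the name. A file called `pdf-notes.txt`, for example, resolves to the PDF icon because the `pdf` entry is tested first. The sketch below shows one possible stricter variant that matches only the extension. It is not part of the commit; `getFileIconStrict`, `iconBySuffix`, and the `docx?` pattern are assumptions made for illustration.

```typescript
// Hypothetical stricter variant: patterns are anchored to the end of the name.
const iconBySuffix: { pattern: RegExp; src: string }[] = [
  { pattern: /\.pdf$/i, src: '/imgs/files/pdf.svg' },
  { pattern: /\.csv$/i, src: '/imgs/files/csv.svg' },
  { pattern: /\.docx?$/i, src: '/imgs/files/doc.svg' }, // assumption: doc and docx
  { pattern: /\.txt$/i, src: '/imgs/files/txt.svg' },
  { pattern: /\.md$/i, src: '/imgs/files/markdown.svg' }
];
const fallbackIcon = '/imgs/files/file.svg';

// Returns the icon for the file extension, or the generic file icon otherwise.
function getFileIconStrict(name = ''): string {
  return iconBySuffix.find((item) => item.pattern.test(name))?.src ?? fallbackIcon;
}
```

The patterns carry no `g` flag, so `test` keeps no state between calls, and the explicit fallback means the function always returns a string, even for an empty name.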
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
export function strIsLink(str?: string) { | ||
if (!str) return false; | ||
if (/^((http|https)?:\/\/|www\.|\/)[^\s/$.?#].[^\s]*$/i.test(str)) return true; | ||
return false; | ||
} |
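To make the accepted inputs concrete, the snippet below repeats the function as it appears in the diff and runs it over a few sample strings. The samples are illustrative and do not come from the commit.

```typescript
// Copy of strIsLink from the diff above, repeated so the snippet is self-contained.
function strIsLink(str?: string): boolean {
  if (!str) return false;
  if (/^((http|https)?:\/\/|www\.|\/)[^\s/$.?#].[^\s]*$/i.test(str)) return true;
  return false;
}

// Illustrative inputs (assumptions, not taken from the repository).
const samples = [
  'https://example.com/a.md', // true: http(s) scheme
  'www.example.com',          // true: bare www. prefix
  '/local/path',              // true: a leading slash also counts as a link
  'manual',                   // false: no recognised prefix
  'ftp://example.com',        // false: only http and https schemes are accepted
  ''                          // false: empty input
];

const linkResults = samples.map((s) => strIsLink(s));
```

Note that the third alternative in the pattern means any string starting with `/` is treated as a link, which matters wherever `strIsLink` is used to classify a `file_id`.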
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
export enum DatasetSpecialIdEnum { | ||
manual = 'manual', | ||
mark = 'mark' | ||
} | ||
export const datasetSpecialIdMap = { | ||
[DatasetSpecialIdEnum.manual]: { | ||
name: 'kb.Manual Data', | ||
sourceName: 'kb.Manual Input' | ||
}, | ||
[DatasetSpecialIdEnum.mark]: { | ||
name: 'kb.Mark Data', | ||
sourceName: 'kb.Manual Mark' | ||
} | ||
}; | ||
export const datasetSpecialIds: string[] = [DatasetSpecialIdEnum.manual, DatasetSpecialIdEnum.mark]; |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
import { datasetSpecialIds } from './constant'; | ||
import { strIsLink } from '@fastgpt/common/tools/str'; | ||
|
||
export function isSpecialFileId(id: string) { | ||
if (datasetSpecialIds.includes(id)) return true; | ||
if (strIsLink(id)) return true; | ||
return false; | ||
} |
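Taken together with the special values, `isSpecialFileId` splits the `file_id` space into special ids, links, and everything else. A minimal sketch of that partition follows, assuming that any id that is neither special nor a link refers to a stored file. `classifyFileId` and `FileIdKind` are hypothetical names, and the sample ids are made up.

```typescript
type FileIdKind = 'manual' | 'mark' | 'link' | 'file';

// Same check as strIsLink in the diff, repeated so the snippet is self-contained.
function strIsLink(str?: string): boolean {
  if (!str) return false;
  return /^((http|https)?:\/\/|www\.|\/)[^\s/$.?#].[^\s]*$/i.test(str);
}

// Hypothetical helper: decides how a file_id should be interpreted.
function classifyFileId(id: string): FileIdKind {
  if (id === 'manual' || id === 'mark') return id; // special values
  if (strIsLink(id)) return 'link';                // source is a URL or path
  return 'file';                                   // assumed to be a stored file id
}
```

A caller could switch on the result to pick an icon or a display name, instead of repeating the individual checks.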
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,7 @@ | ||
{ | ||
"name": "@fastgpt/support", | ||
- "version": "1.0.0" | ||
+ "version": "1.0.0", | ||
+ "dependencies": { | ||
+ "@fastgpt/common": "workspace:*" | ||
+ } | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
{ | ||
"name": "app", | ||
- "version": "4.4.6", | ||
+ "version": "4.4.7", | ||
"private": false, | ||
"scripts": { | ||
"dev": "next dev", | ||
|