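I. Description of added features
1. Extract all embedded images from .pdf, .ppt/.pptx, .doc/.docx, and .xls/.xlsx files.
2. Save the extracted images in .png or .jpg format to the designated storage system, and generate metadata for each image (such as the source file and a description of its position in the original document) that is stored in the knowledge base index.
3. Retrieval and display: in the conversation interface, when a user's question involves image content, the system returns the corresponding image and displays it correctly.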
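
As a rough illustration of feature 1, the sketch below pulls embedded images out of the zip-based Office formats (.docx, .pptx, .xlsx), which keep their media under `word/media/`, `ppt/media/`, and `xl/media/`. This is a swapped-in, standard-library technique for illustration only, not the method used in this PR (which relies on the "hi_res" strategy for .doc, .docx, and .pdf). It does not cover .pdf or the legacy .doc/.ppt/.xls formats. The function name and the metadata keys (`source_file`, `position`, `content`) are hypothetical.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Media folders used inside Office Open XML archives (.docx, .pptx, .xlsx).
MEDIA_PREFIXES = ("word/media/", "ppt/media/", "xl/media/")
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def extract_ooxml_images(data: bytes, source_name: str) -> list[dict]:
    """Return embedded images with minimal metadata (hypothetical keys)."""
    images = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for member in archive.namelist():
            suffix = PurePosixPath(member).suffix.lower()
            if member.startswith(MEDIA_PREFIXES) and suffix in IMAGE_SUFFIXES:
                images.append({
                    "source_file": source_name,       # file the image came from
                    "position": member,               # path inside the archive
                    "content": archive.read(member),  # raw image bytes
                })
    return images
```

Other embedded formats (for example .emf or .gif) are skipped here; converting them to .png or .jpg would need an imaging library and is left out of the sketch.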
II. Implementation workflow with screenshots
1. Deployment stage



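1.1 A new column "is_multimodal" is added to the knowledge_record_t database table (it indicates whether the model named in the "embedding_model_name" column is multimodal).
1.2 During deployment you can choose whether to download the model (used for image extraction from .pdf, .doc, and .docx files). In the screenshot below, the new option is at the top and the download progress is visible in the middle.
1.3 The model is downloaded into a new folder named model under nxent-data.
1.4 The model file path is written to .env.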
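
The schema change in the first deployment step can be sketched as an idempotent migration. The example uses SQLite purely so that it is self-contained; the database engine actually used by the project is not stated here, and the column type and the default of false for existing rows are assumptions. Only the table name knowledge_record_t and the column names is_multimodal and embedding_model_name come from this description.

```python
import sqlite3


def add_is_multimodal_column(conn: sqlite3.Connection) -> None:
    """Add is_multimodal to knowledge_record_t if it is not already present."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(knowledge_record_t)")]
    if "is_multimodal" not in columns:
        # Assumption: existing knowledge bases default to non-multimodal (0).
        conn.execute(
            "ALTER TABLE knowledge_record_t "
            "ADD COLUMN is_multimodal BOOLEAN NOT NULL DEFAULT 0"
        )
        conn.commit()
```

Checking for the column first lets the migration run safely on every deployment, including redeployments over an already upgraded database.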
2. Knowledge base creation stage





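2.1 The "Multimodal" toggle in the upper-right corner is clickable: green means the multimodal embedding model is used, and black (the default) means the regular embedding model is used.
2.2 When files are uploaded to the created index, the extracted images are stored in MinIO, in a newly created images_in_attachments folder. Images in .doc, .docx, and .pdf files are extracted with the "hi_res" strategy; if the model was not downloaded, images cannot be extracted from those file types.
2.3 Each image's metadata is stored in a JSON object and uploaded to ES as the content of a file chunk; the image's vector is stored in the "multi_embedding" field of that ES chunk.
An example of the metadata format is shown below:
2.4 After a successful upload, the knowledge base list shows a "multimodal" model-type label (the label is not shown for knowledge bases that do not use a multimodal embedding model).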
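
To make the step that stores image metadata more concrete, here is a minimal sketch of how one image could be turned into a chunk document. Only the images_in_attachments folder and the multi_embedding field name are taken from this description. The function name, the metadata keys (`source_file`, `position`, `image_path`), and the `content` key are hypothetical and may differ from the PR's actual schema.

```python
import json


def build_image_chunk(source_file: str, position: str, object_name: str,
                      embedding: list[float]) -> dict:
    """Build a chunk document for one extracted image (hypothetical schema)."""
    metadata = {
        "source_file": source_file,  # file the image was extracted from
        "position": position,        # where the image sits in that file
        "image_path": f"images_in_attachments/{object_name}",  # object key in storage
    }
    return {
        # The metadata JSON is stored as the chunk's text content.
        "content": json.dumps(metadata, ensure_ascii=False),
        # The image vector goes into the dedicated multimodal field.
        "multi_embedding": embedding,
    }
```

Keeping the metadata as serialized JSON in the chunk content means a retrieval hit on the image can be decoded back into its source file and storage location without a second lookup.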
3. Knowledge base retrieval tool configuration stage

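3.1 A new "multimodal" field is added to the configuration parameters. It can be set to true or false and indicates whether the multimodal embedding model is used. The index section also shows the corresponding "model mismatch" and "multimodal" labels. In testing, retrieval returns correct results: both the text of the file and the image metadata are returned.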
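
One plausible reading of the label logic is sketched below: an index is tagged "multimodal" when it was built with a multimodal embedding model, and "model mismatch" when the tool's multimodal setting disagrees with the index. The exact rule used in the PR is not stated, so this comparison, the function name, and the parameter names are all assumptions.

```python
def index_labels(tool_multimodal: bool, index_is_multimodal: bool) -> list[str]:
    """Return the labels to show next to an index (assumed rule)."""
    labels = []
    if index_is_multimodal:
        labels.append("multimodal")
    # Assumption: a mismatch means the tool setting and the index disagree.
    if tool_multimodal != index_is_multimodal:
        labels.append("model mismatch")
    return labels
```

Under this rule, a tool configured with multimodal set to false and pointed at a multimodal index would show both labels, signalling that the query vectors and the stored vectors come from different models.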
4. Conversation



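4.1 The model successfully outputs image xxx from file xxx, and the retrieved images are also shown in the image area of the sources section.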