
In the web demo, inference only supports language-model chat; there is no multimodal inference #461

Closed
a2382625920 opened this issue Feb 27, 2024 · 5 comments

@a2382625920

For example:
I have already fine-tuned Yi-VL-6B and want to chat with it through the web-UI inference page, but that page has no dedicated image-input option and only supports plain chat. Please consider improving this.

@a2382625920
Author

> For example: I have already fine-tuned Yi-VL-6B and want to chat with it through the web-UI inference page, but that page has no dedicated image-input option and only supports plain chat. Please consider improving this.

Correction: without an image input, Yi-VL-6B cannot chat at all, so the web-UI inference only works for pure language models.

@Jintao-Huang
Collaborator

Yes, the web UI does not support multimodal-model inference for now; you can use `swift infer` for command-line inference.
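
For reference, the same command-line inference can also be driven from Python. Below is a minimal sketch, assuming the `swift.llm` entry points (`InferArguments`, `infer_main`) exposed by ms-swift at the time; the `model_type` string and checkpoint path are placeholders to adapt to your setup:

```python
# Minimal sketch: programmatic equivalent of `swift infer` via ms-swift's
# Python API. The model_type value and ckpt_dir are assumptions; check
# `swift infer --help` and your installed ms-swift version.
from swift.llm import InferArguments, infer_main

args = InferArguments(
    model_type='yi-vl-6b-chat',  # assumed model_type name for Yi-VL-6B
    ckpt_dir='output/yi-vl-6b-chat/vx-xxx/checkpoint-xxx',  # your fine-tuned checkpoint
)
infer_main(args)  # starts an interactive command-line chat session
```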

@Jintao-Huang Jintao-Huang added the enhancement New feature or request label Feb 28, 2024
@a2382625920
Author

> Yes, the web UI does not support multimodal-model inference for now; you can use `swift infer` for command-line inference.

OK, thanks. Then would it be possible to loop over a JSON file, extract the key fields from it, and run inference once for every record in the JSON?

@a2382625920
Author

a2382625920 commented Feb 28, 2024

Something like this (this is the batch-prediction code for Qwen-VL):
```python
import json
import time

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

DEFAULT_CKPT_PATH = 'your/checkpoint/path'  # fine-tuned output directory


def _load_model_tokenizer():
    tokenizer = AutoTokenizer.from_pretrained(
        DEFAULT_CKPT_PATH, trust_remote_code=True, resume_download=True,
    )
    model = AutoPeftModelForCausalLM.from_pretrained(
        DEFAULT_CKPT_PATH,  # path to the output directory
        device_map="cuda",
        trust_remote_code=True
    ).eval()
    # model.generation_config = GenerationConfig.from_pretrained(
    #     DEFAULT_CKPT_PATH, trust_remote_code=True, resume_download=True,
    # )
    return model, tokenizer


def _parse_text(text):
    # HTML-escaping helper for display (as in Qwen's web demo).
    lines = text.split("\n")
    lines = [line for line in lines if line != ""]
    count = 0
    for i, line in enumerate(lines):
        if "```" in line:
            count += 1
            items = line.split("`")
            if count % 2 == 1:
                lines[i] = f'<pre><code class="language-{items[-1]}">'
            else:
                lines[i] = "<br></code></pre>"
        else:
            if i > 0:
                if count % 2 == 1:
                    line = line.replace("`", r"\`")
                    line = line.replace("<", "&lt;")
                    line = line.replace(">", "&gt;")
                    line = line.replace(" ", "&nbsp;")
                    line = line.replace("*", "&ast;")
                    line = line.replace("_", "&lowbar;")
                    line = line.replace("-", "&#45;")
                    line = line.replace(".", "&#46;")
                    line = line.replace("!", "&#33;")
                    line = line.replace("(", "&#40;")
                    line = line.replace(")", "&#41;")
                    line = line.replace("$", "&#36;")
                lines[i] = "<br>" + line
    text = "".join(lines)
    return text


def predict(message):
    start = time.time()
    message = _parse_text(message)
    print("User: " + message)
    history = []
    response, history = model.chat(tokenizer, message, history=history)
    full_response = _parse_text(response)
    print("Qwen-VL-Chat: " + full_response)
    print(f"Elapsed: {time.time() - start}s")
    return full_response


model, tokenizer = _load_model_tokenizer()

if __name__ == '__main__':
    with open(r'your/path', 'r', encoding='utf-8') as f:
        valid_data = json.load(f)
    data_list = []
    start = time.time()
    for index, i in enumerate(valid_data):
        image_id = i.get("id")
        print(f'Progress {index + 1}/{len(valid_data)}; {image_id}')
        conversations = i.get("conversations")
        query = conversations[0].get("value")
        true_ = conversations[1].get("value")
        full_response = predict(query)
        print("Ground truth: ", true_)
        data = {"image_id": image_id, "ground_truth": true_, "prediction": full_response, "query": query}
        data_list.append(data)
    print(f"Elapsed: {time.time() - start}s")

    output_path = 'your/predict.json'
    with open(output_path, 'w', encoding='utf-8') as f:
        json.dump(data_list, f, ensure_ascii=False, indent=2)

    print(f"Output saved to {output_path}")
```

@GUOGUO-lab

GUOGUO-lab commented Mar 22, 2024

> Yes, the web UI does not support multimodal-model inference for now; you can use `swift infer` for command-line inference.

In the earlier 1.5.4 version, qwen-vl-chat could be loaded and image inference worked by putting `<img>xxx.jpg</img>` in the input; as of 1.7.0 this is no longer supported either. Could this be restored?
