
How to batch-export ModelScope (MS) datasets to a format that swift can load #1041

Closed
WSC741606 opened this issue Jun 1, 2024 · 2 comments

Comments

@WSC741606

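As the title says: to train a model offline, I have to download everything on a machine with internet access and then migrate it. After migration, however, the default location that swift reads from inside the image is not under my user folder, so how do I move the data there? How can I export a ModelScope dataset to jsonl or csv so that it can be loaded in custom-dataset mode?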

@wangxingjun778
Collaborator

wangxingjun778 commented Aug 28, 2024

Try the following:

  1. Download the dataset (shell):
     modelscope download --dataset 'your_namespace/your_dataset_name' --local_dir './your_data_dir'
  2. Load it from the local directory (Python):
     from modelscope import MsDataset
     ds = MsDataset.load('imagefolder', data_dir='./your_data_dir')  # 'imagefolder' can also be 'csv', 'json', etc.
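Once the data loads, exporting it to jsonl or csv is a plain iteration over its rows. The sketch below uses only the standard library and assumes that iterating the loaded dataset yields one dict per row, as a Hugging Face `datasets.Dataset` does. The helper names `write_jsonl` and `write_csv` are hypothetical and are not part of ModelScope or swift.

```python
import csv
import io
import json
from typing import Iterable, Mapping, TextIO


def write_jsonl(rows: Iterable[Mapping], fp: TextIO) -> int:
    """Write one JSON object per line to fp; return the number of rows written."""
    count = 0
    for row in rows:
        # ensure_ascii=False keeps Chinese text human-readable in the output
        fp.write(json.dumps(dict(row), ensure_ascii=False) + "\n")
        count += 1
    return count


def write_csv(rows: Iterable[Mapping], fp: TextIO) -> int:
    """Write rows as CSV to fp, taking the header from the first row's keys."""
    rows = [dict(r) for r in rows]
    if not rows:
        return 0
    # Non-string values (including nested lists/dicts) are written via str().
    # Keys missing from a row are written as ""; keys not in the first row raise ValueError.
    writer = csv.DictWriter(fp, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    # Stand-in rows; with ModelScope these would come from iterating the loaded dataset.
    sample = [{"query": "你好", "response": "hello"}, {"query": "1+1=?", "response": "2"}]
    buf = io.StringIO()
    write_jsonl(sample, buf)
    print(buf.getvalue(), end="")
```

To write a real file, open it with `open('./train.jsonl', 'w', encoding='utf-8')` and pass the loaded dataset as `rows`. JSONL is the safer choice when rows contain nested fields such as a conversation history, because CSV flattens them to strings.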

For details, see the documentation:

  1. Downloading datasets: https://modelscope.cn/docs/%E6%95%B0%E6%8D%AE%E9%9B%86%E7%9A%84%E4%B8%8B%E8%BD%BD
  2. Dataset usage guide: https://modelscope.cn/docs/%E6%95%B0%E6%8D%AE%E9%9B%86%E4%BD%BF%E7%94%A8%E6%8C%87%E5%8D%97

@WSC741606
Author

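Thanks for the reply. I already know this download method, but the downloaded files are usually not in a format Swift can read directly (such as the formats listed in https://github.com/modelscope/ms-swift/blob/main/docs/source/LLM/%E8%87%AA%E5%AE%9A%E4%B9%89%E4%B8%8E%E6%8B%93%E5%B1%95.md), so I can't pass them on the command line via --dataset ***. I'd like to export them to jsonl/csv instead, which would also make any further filtering easier.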
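One way to bridge the gap is to rename the source columns into swift's field names while filtering, then dump the result as JSONL. The sketch below assumes the query/response layout, one of the formats in the linked swift document; check that document for the exact field names your swift version expects. The source column names `question`/`answer` and the helpers `to_swift_rows`/`dump_jsonl` are hypothetical placeholders, and iterating the loaded dataset is assumed to yield dict rows.

```python
import json
from typing import Callable, Iterable, Iterator, Mapping, Optional


def to_swift_rows(
    rows: Iterable[Mapping],
    query_key: str = "question",   # hypothetical source column name
    response_key: str = "answer",  # hypothetical source column name
    keep: Optional[Callable[[Mapping], bool]] = None,
) -> Iterator[dict]:
    """Map source rows to {"query", "response"}, dropping filtered or incomplete rows."""
    for row in rows:
        if keep is not None and not keep(row):
            continue  # user-supplied filter rejected this row
        query, response = row.get(query_key), row.get(response_key)
        if not query or not response:
            continue  # skip samples with an empty or missing field
        yield {"query": str(query), "response": str(response)}


def dump_jsonl(rows: Iterable[dict], path: str) -> int:
    """Write rows to path as UTF-8 JSONL; return the number of rows written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
            count += 1
    return count
```

For example, `dump_jsonl(to_swift_rows(ds, keep=lambda r: len(r["answer"]) < 2000), './train.jsonl')` would keep only rows with shorter answers, where `ds` is the dataset loaded as in the reply above. Filtering happens lazily in the generator, so large datasets are not held in memory twice.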
