AutoNLP TrainerBase and Text Classification #3728
auto_trainer.train(
    train_ds,
    dev_ds,
    num_cpus=1,

Could heterogeneous devices end up computing at the same time here? I ask because both num_cpus and num_gpus are exposed.
From the perspective of PaddlePaddle's own roadmap, more hardware will follow, such as the Kunlun XPU, and this naming scheme would limit how easily new AI hardware can be plugged in later.

Here, CPU means ordinary CPU cores and GPU specifically means CUDA devices. Future NPUs, XPUs, and the like can be limited with a similar approach: https://docs.ray.io/en/latest/tune/faq.html#how-do-i-set-resources
Do you have any suggestions for this part?

Ray is also open source under the Apache 2.0 license, so if we need customization we can push it upstream. They did similar work for TPU before: https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/gcp/tpu.yaml
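
To make the extensibility point concrete, here is a minimal sketch of how CPU and GPU counts could sit alongside other accelerators in one resource request. It is not code from this PR: `build_trial_resources` and the `xpu` key are hypothetical, and the dict layout (`cpu`, `gpu`, `custom_resources`) is assumed to match what the linked Ray Tune FAQ describes.

```python
from typing import Dict, Union

Number = Union[int, float]


def build_trial_resources(
    num_cpus: Number = 1,
    num_gpus: Number = 0,
    **custom_accelerators: Number,
) -> Dict[str, object]:
    """Build a per-trial resource request (hypothetical helper, not part of the PR)."""
    requested = {"num_cpus": num_cpus, "num_gpus": num_gpus, **custom_accelerators}
    for name, amount in requested.items():
        if amount < 0:
            raise ValueError(f"{name} must be non-negative, got {amount}")

    # CPU cores and CUDA devices use fixed keys (assumed layout).
    resources: Dict[str, object] = {"cpu": num_cpus, "gpu": num_gpus}

    # Any other accelerator (for example a hypothetical "xpu") goes under
    # custom resources, so no new num_* argument is needed per device type.
    custom = {name: amount for name, amount in custom_accelerators.items() if amount > 0}
    if custom:
        resources["custom_resources"] = custom
    return resources
```

With this shape, `build_trial_resources(2, 0, xpu=1)` describes a trial that needs two CPU cores and one unit of a custom `xpu` resource, without changing the function signature.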
    time_budget_s=time_budget_s,
    max_concurrent_trials=max_concurrent_trials,
)
tuner = tune.Tuner(

It depends on how heavy Ray's dependencies are. If they are heavy, could Ray be kept as an optional extra that users install only when they need this feature, rather than being a default dependency?

It is already kept outside the default dependencies. See setup.py, #3724.
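
One common way to keep a heavy package optional is to import it only when the feature is used and to fail with an actionable message otherwise. The sketch below illustrates that pattern under stated assumptions: `require_optional` is a hypothetical helper, not something this PR defines, and the install hint simply reuses the requirements path that appears in the Makefile change.

```python
import importlib
from types import ModuleType

# Install hint reused from the Makefile hunk in this PR.
AUTONLP_INSTALL_HINT = (
    "Install it with: pip install -r paddlenlp/experimental/autonlp/requirements.txt"
)


def require_optional(module_name: str, install_hint: str = AUTONLP_INSTALL_HINT) -> ModuleType:
    """Import an optional dependency lazily (hypothetical helper)."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Re-raise with a message that tells the user how to enable the feature.
        raise ImportError(
            f"'{module_name}' is required for this feature. {install_hint}"
        ) from err
```

A call such as `ray = require_optional("ray")` placed inside the AutoNLP entry point would leave a plain `import paddlenlp` unaffected when Ray is not installed.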
@@ -1,4 +1,4 @@
-paddlepaddle>=2.3.0,<2.4.0
+paddlepaddle==2.4.0rc0

At present, dynamic-to-static conversion of the prompt model only works on 2.4.0 or later.
@@ -45,6 +45,7 @@ unit-test:
install:
	pip install -r requirements-dev.txt
	pip install -r requirements.txt
	pip install -r paddlenlp/experimental/autonlp/requirements.txt

Strictly speaking, this should not be part of make install, because most development work does not need the autonlp dependencies. It is added for now only so that the unit tests can run.

Please add a link in the docs and standardize the spelling of the AI Studio brand name. Otherwise no issues.
train_dataset=train_ds,
eval_dataset=dev_ds,
label_column="labels",
text_column="sentence",

Some text classification tasks take two input columns. Is that supported here?

That is intentionally not supported. A dedicated AutoTrainerForSemanticSearch will follow.
```python
auto_trainer = AutoTrainerForTextClassification(
    train_dataset=train_ds,

On the input side, do we want to expose the datasets concept to users here?

Yes, that is the design for now: users convert their data to the datasets format themselves. First, the trainer already requires Datasets, so this lines up naturally. Second, the MVP should stay as lightweight as possible and avoid non-core features. If there is demand later, conversion from pd.DataFrame and similar formats can be supported.
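- num_models (int, required): number of model trials to run.
- num_gpus (str, optional): number of GPUs used by the experiment. By default, this is set from the detected GPUs.
- num_cpus (str, optional): number of CPUs used by the experiment. By default, this is set from the detected vCPUs.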

vCPU -> CPU

This really is vCPU. The original English term is "virtual core", and it comes from the underlying Ray call.
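- num_gpus (str, optional): number of GPUs used by the experiment. By default, this is set from the detected GPUs.
- num_cpus (str, optional): number of CPUs used by the experiment. By default, this is set from the detected vCPUs.
- max_concurrent_trials (int, optional): maximum number of trials to run concurrently. Must be non-negative. If None or 0, no limit is applied. Defaults to None.
- time_budget_s: (int|float|datetime.timedelta, optional) global time budget in seconds; all model trials are stopped once it is exceeded.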

The time limit does not need to accept this many types. int and float would be enough.

This comes from the underlying Ray configuration. Since Ray already supports these types, I am happy to expose them.
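
For reference, a minimal sketch of how the three accepted types could be reduced to a single number of seconds. `time_budget_to_seconds` is a hypothetical helper written for illustration; it is not taken from this PR or from Ray, and the validation rules (rejecting negatives and booleans) are assumptions.

```python
import datetime
from typing import Optional, Union

TimeBudget = Union[int, float, datetime.timedelta]


def time_budget_to_seconds(time_budget_s: Optional[TimeBudget]) -> Optional[float]:
    """Normalize a time budget to seconds (hypothetical helper)."""
    if time_budget_s is None:
        # No budget means no global time limit.
        return None
    if isinstance(time_budget_s, datetime.timedelta):
        seconds = time_budget_s.total_seconds()
    elif isinstance(time_budget_s, bool) or not isinstance(time_budget_s, (int, float)):
        # bool is a subclass of int, so it is rejected explicitly.
        raise TypeError(
            f"time_budget_s must be int, float, or timedelta, got {type(time_budget_s).__name__}"
        )
    else:
        seconds = float(time_budget_s)
    if seconds < 0:
        raise ValueError(f"time_budget_s must be non-negative, got {seconds}")
    return seconds
```

Accepting `datetime.timedelta` costs one extra branch, which is the trade-off being discussed: a slightly wider signature in exchange for inputs such as `datetime.timedelta(minutes=30)`.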
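- hp_overrides: (dict[str, Any], optional): (advanced users only) overrides the hyperparameters of every candidate model, for example `{"TrainingArguments.max_steps":5}`.
- custom_model_candiates: (dict[str, Any], optional): (advanced users only) runs user-provided candidate models instead of PaddleNLP's default candidates. See the `._model_candidates` attribute for reference.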
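
As an illustration of what "overrides the hyperparameters of every candidate model" could mean in practice, the sketch below merges an override dict into each candidate. It assumes each candidate is a flat dict keyed by dotted names such as `TrainingArguments.max_steps`; the PR's internal representation may differ, and `apply_hp_overrides` and the `preset` key are hypothetical.

```python
import copy
from typing import Any, Dict, List, Optional


def apply_hp_overrides(
    candidates: List[Dict[str, Any]],
    hp_overrides: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Return new candidate configs with the overrides applied (hypothetical helper)."""
    overrides = hp_overrides or {}
    updated = []
    for candidate in candidates:
        # Copy first so the caller's candidate list is never mutated.
        merged = copy.deepcopy(candidate)
        # Override values win over whatever the candidate defined.
        merged.update(overrides)
        updated.append(merged)
    return updated


# Two illustrative candidates; only the first defines max_steps itself.
candidates = [
    {"preset": "a", "TrainingArguments.max_steps": 100},
    {"preset": "b"},
]
overridden = apply_hp_overrides(candidates, {"TrainingArguments.max_steps": 5})
```

After the call, both entries in `overridden` carry `TrainingArguments.max_steps` set to 5, while `candidates` keeps its original values.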

I still have a question here. AutoNLP does not seem to have a single public entry point; instead there is one unified interface per task. What is the reasoning behind this?

There will be one later. The current MVP only has the single classification class, so it is not set up yet. Later versions will support something like Taskflow, with an added task_type.
"""
model_result = self._get_model_result(trial_id=trial_id)
exported_model_path = os.path.join(model_result.log_dir, self.export_path)
shutil.copytree(exported_model_path, export_path, dirs_exist_ok=True)

The static-graph model export could be included in this export step.

For now this still exports the dynamic-graph model, to keep some flexibility. A later version will add a unified to_static stage.
from hyperopt import hp
from paddle.io import Dataset
from scipy.special import expit as sigmoid
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

sklearn does not appear to be listed in requirements.

paddlenlp already brings in sklearn; it comes in through seqeval.
)

@property
def _model_candidates(self) -> List[Dict[str, Any]]:

For now we only consider pretrained models, right?

Yes.

LGTM
PR types
New features
PR changes
APIs
Description
AutoNLP TrainerBase and Text Classification
AI Studio example: https://aistudio.baidu.com/aistudio/projectdetail/4994688?contributionType=1