$ torchrun --nproc_per_node=4 run_image_classification.py --dataset_name beans --output_dir ./beans_outputs/ --remove_unused_columns False --do_train --do_eval --learning_rate 2e-5 --num_train_epochs 5 --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --logging_strategy steps --logging_steps 10 --evaluation_strategy epoch --save_strategy epoch --load_best_model_at_end True --save_total_limit 3 --seed 1337 --fsdp "full_shard auto_wrap" --fsdp_min_num_params 20000
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
08/11/2022 03:14:10 - WARNING - __main__ - Process rank: 2, device: cuda:2, n_gpu: 1distributed training: True, 16-bits training: False
08/11/2022 03:14:10 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: False
08/11/2022 03:14:10 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=epoch,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[<FSDPOption.FULL_SHARD: 'full_shard'>, <FSDPOption.AUTO_WRAP: 'auto_wrap'>],
fsdp_min_num_params=20000,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=False,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=True,
local_rank=0,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=./beans_outputs/runs/Aug11_03-14-10_ip-172-31-58-202,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=10,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=loss,
mp_parameters=,
no_cuda=False,
num_train_epochs=5.0,
optim=adamw_hf,
output_dir=./beans_outputs/,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=8,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=False,
report_to=[],
resume_from_checkpoint=None,
run_name=./beans_outputs/,
save_on_each_node=False,
save_steps=500,
save_strategy=epoch,
save_total_limit=3,
seed=1337,
sharded_ddp=[],
skip_memory_metrics=True,
tf32=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
08/11/2022 03:14:10 - WARNING - __main__ - Process rank: 3, device: cuda:3, n_gpu: 1distributed training: True, 16-bits training: False
08/11/2022 03:14:10 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1distributed training: True, 16-bits training: False
08/11/2022 03:14:10 - WARNING - datasets.builder - Using custom data configuration default
08/11/2022 03:14:10 - WARNING - datasets.builder - Using custom data configuration default
08/11/2022 03:14:10 - WARNING - datasets.builder - Using custom data configuration default
08/11/2022 03:14:10 - WARNING - datasets.builder - Using custom data configuration default
08/11/2022 03:14:10 - WARNING - datasets.builder - Reusing dataset beans (/home/ubuntu/.cache/huggingface/datasets/beans/default/0.0.0/90c755fb6db1c0ccdad02e897a37969dbf070bed3755d4391e269ff70642d791)
  0%|          | 0/3 [00:00<?, ?it/s]
>> loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--google--vit-base-patch16-224-in21k/snapshots/1ba429d32753f33a0660b80ac6f43a3c80c18938/config.json
[INFO|configuration_utils.py:695] 2022-08-11 03:14:11,818 >> Model config ViTConfig {
  "_name_or_path": "google/vit-base-patch16-224-in21k",
  "architectures": [
    "ViTModel"
  ],
  "attention_probs_dropout_prob": 0.0,
  "encoder_stride": 16,
  "finetuning_task": "image-classification",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.0,
  "hidden_size": 768,
  "id2label": {
    "0": "angular_leaf_spot",
    "1": "bean_rust",
    "2": "healthy"
  },
  "image_size": 224,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "angular_leaf_spot": "0",
    "bean_rust": "1",
    "healthy": "2"
  },
  "layer_norm_eps": 1e-12,
  "model_type": "vit",
  "num_attention_heads": 12,
  "num_channels": 3,
  "num_hidden_layers": 12,
  "patch_size": 16,
  "qkv_bias": true,
  "transformers_version": "4.22.0.dev0"
}

[INFO|modeling_utils.py:2067] 2022-08-11 03:14:11,834 >> loading weights file pytorch_model.bin from cache at /home/ubuntu/.cache/huggingface/hub/models--google--vit-base-patch16-224-in21k/snapshots/1ba429d32753f33a0660b80ac6f43a3c80c18938/pytorch_model.bin
[WARNING|modeling_utils.py:2491] 2022-08-11 03:14:12,999 >> Some weights of the model checkpoint at google/vit-base-patch16-224-in21k were not used when initializing ViTForImageClassification: ['pooler.dense.weight', 'pooler.dense.bias']
- This IS expected if you are initializing ViTForImageClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ViTForImageClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:2503] 2022-08-11 03:14:12,999 >> Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224-in21k and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[WARNING|modeling_utils.py:2491] 2022-08-11 03:14:13,003 >> Some weights of the model checkpoint at google/vit-base-patch16-224-in21k were not used when initializing ViTForImageClassification: ['pooler.dense.weight', 'pooler.dense.bias']
- This IS expected if you are initializing ViTForImageClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ViTForImageClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:2503] 2022-08-11 03:14:13,003 >> Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224-in21k and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[WARNING|modeling_utils.py:2491] 2022-08-11 03:14:13,005 >> Some weights of the model checkpoint at google/vit-base-patch16-224-in21k were not used when initializing ViTForImageClassification: ['pooler.dense.weight', 'pooler.dense.bias']
- This IS expected if you are initializing ViTForImageClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ViTForImageClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:2503] 2022-08-11 03:14:13,005 >> Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224-in21k and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[WARNING|modeling_utils.py:2491] 2022-08-11 03:14:13,006 >> Some weights of the model checkpoint at google/vit-base-patch16-224-in21k were not used when initializing ViTForImageClassification: ['pooler.dense.weight', 'pooler.dense.bias']
- This IS expected if you are initializing ViTForImageClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ViTForImageClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:2503] 2022-08-11 03:14:13,007 >> Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224-in21k and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[INFO|feature_extraction_utils.py:432] 2022-08-11 03:14:13,271 >> loading configuration file preprocessor_config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--google--vit-base-patch16-224-in21k/snapshots/1ba429d32753f33a0660b80ac6f43a3c80c18938/preprocessor_config.json
[INFO|configuration_utils.py:643] 2022-08-11 03:14:13,535 >> loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--google--vit-base-patch16-224-in21k/snapshots/1ba429d32753f33a0660b80ac6f43a3c80c18938/config.json
[INFO|configuration_utils.py:695] 2022-08-11 03:14:13,536 >> Model config ViTConfig {
  "_name_or_path": "google/vit-base-patch16-224-in21k",
  "architectures": [
    "ViTModel"
  ],
  "attention_probs_dropout_prob": 0.0,
  "encoder_stride": 16,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.0,
  "hidden_size": 768,
  "image_size": 224,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "model_type": "vit",
  "num_attention_heads": 12,
  "num_channels": 3,
  "num_hidden_layers": 12,
  "patch_size": 16,
  "qkv_bias": true,
  "transformers_version": "4.22.0.dev0"
}

[INFO|feature_extraction_utils.py:469] 2022-08-11 03:14:13,537 >> Feature extractor ViTFeatureExtractor {
  "do_normalize": true,
  "do_resize": true,
  "feature_extractor_type": "ViTFeatureExtractor",
  "image_mean": [
    0.5,
    0.5,
    0.5
  ],
  "image_std": [
    0.5,
    0.5,
    0.5
  ],
  "resample": 2,
  "size": 224
}

/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:929: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication.
  warnings.warn(
/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:929: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication.
  warnings.warn(
/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:929: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication.
  warnings.warn(
/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:929: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication.
  warnings.warn(
/home/ubuntu/.local/lib/python3.9/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
/home/ubuntu/.local/lib/python3.9/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
/home/ubuntu/.local/lib/python3.9/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
/home/ubuntu/.local/lib/python3.9/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
[INFO|trainer.py:1612] 2022-08-11 03:14:15,840 >> ***** Running training *****
[INFO|trainer.py:1613] 2022-08-11 03:14:15,840 >> Num examples = 1034
[INFO|trainer.py:1614] 2022-08-11 03:14:15,840 >> Num Epochs = 5
[INFO|trainer.py:1615] 2022-08-11 03:14:15,840 >> Instantaneous batch size per device = 8
[INFO|trainer.py:1616] 2022-08-11 03:14:15,840 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:1617] 2022-08-11 03:14:15,840 >> Gradient Accumulation steps = 1
[INFO|trainer.py:1618] 2022-08-11 03:14:15,840 >> Total optimization steps = 165
{'loss': 1.0569, 'learning_rate': 1.8787878787878792e-05, 'epoch': 0.3}
{'loss': 0.9069, 'learning_rate': 1.7575757575757576e-05, 'epoch': 0.61}
{'loss': 0.7793, 'learning_rate': 1.6363636363636366e-05, 'epoch': 0.91}
 20%|█████████████████████████████████▏                  | 33/165 [00:38<00:31,  4.17it/s]
[INFO|trainer.py:2898] 2022-08-11 03:14:53,953 >> ***** Running Evaluation *****
[INFO|trainer.py:2900] 2022-08-11 03:14:53,953 >> Num examples = 133
[INFO|trainer.py:2903] 2022-08-11 03:14:53,953 >> Batch size = 8
{'eval_loss': 0.658123791217804, 'eval_accuracy': 0.9624060150375939, 'eval_runtime': 0.866, 'eval_samples_per_second': 153.573, 'eval_steps_per_second': 5.773, 'epoch': 1.0}
 20%|█████████████████████████████████▏                  | 33/165 [00:38<00:31,  4.17it/s]
[INFO|trainer.py:2647] 2022-08-11 03:14:54,945 >> Saving model checkpoint to ./beans_outputs/checkpoint-33
[INFO|configuration_utils.py:440] 2022-08-11 03:14:54,946 >> Configuration saved in ./beans_outputs/checkpoint-33/config.json
[INFO|modeling_utils.py:1569] 2022-08-11 03:14:55,542 >> Model weights saved in ./beans_outputs/checkpoint-33/pytorch_model.bin
[INFO|feature_extraction_utils.py:339] 2022-08-11 03:14:55,543 >> Feature extractor saved in ./beans_outputs/checkpoint-33/preprocessor_config.json
{'loss': 0.6662, 'learning_rate': 1.5151515151515153e-05, 'epoch': 1.21}
{'loss': 0.5955, 'learning_rate': 1.3939393939393942e-05, 'epoch': 1.52}
{'loss': 0.4857, 'learning_rate': 1.2727272727272728e-05, 'epoch': 1.82}
 40%|██████████████████████████████████████████████████████████████████▍                  | 66/165 [00:48<00:23,  4.20it/s]
[INFO|trainer.py:2898] 2022-08-11 03:15:04,193 >> ***** Running Evaluation *****
[INFO|trainer.py:2900] 2022-08-11 03:15:04,193 >> Num examples = 133
[INFO|trainer.py:2903] 2022-08-11 03:15:04,193 >> Batch size = 8
{'eval_loss': 0.39150580763816833, 'eval_accuracy': 0.9774436090225563, 'eval_runtime': 0.7883, 'eval_samples_per_second': 168.711, 'eval_steps_per_second': 6.343, 'epoch': 2.0}
 40%|██████████████████████████████████████████████████████████████████▍                  | 66/165 [00:49<00:23,  4.20it/s]
[INFO|trainer.py:2647] 2022-08-11 03:15:05,114 >> Saving model checkpoint to ./beans_outputs/checkpoint-66
[INFO|configuration_utils.py:440] 2022-08-11 03:15:05,115 >> Configuration saved in ./beans_outputs/checkpoint-66/config.json
[INFO|modeling_utils.py:1569] 2022-08-11 03:15:05,614 >> Model weights saved in ./beans_outputs/checkpoint-66/pytorch_model.bin
[INFO|feature_extraction_utils.py:339] 2022-08-11 03:15:05,614 >> Feature extractor saved in ./beans_outputs/checkpoint-66/preprocessor_config.json
{'loss': 0.4285, 'learning_rate': 1.1515151515151517e-05, 'epoch': 2.12}
{'loss': 0.3621, 'learning_rate': 1.0303030303030304e-05, 'epoch': 2.42}
{'loss': 0.3357, 'learning_rate': 9.090909090909091e-06, 'epoch': 2.73}
 60%|███████████████████████████████████████████████████████████████████████████████████████████████████▌                  | 99/165 [00:58<00:16,  4.10it/s]
[INFO|trainer.py:2898] 2022-08-11 03:15:14,177 >> ***** Running Evaluation *****
[INFO|trainer.py:2900] 2022-08-11 03:15:14,178 >> Num examples = 133
[INFO|trainer.py:2903] 2022-08-11 03:15:14,178 >> Batch size = 8
{'eval_loss': 0.25688496232032776, 'eval_accuracy': 0.9849624060150376, 'eval_runtime': 0.7837, 'eval_samples_per_second': 169.71, 'eval_steps_per_second': 6.38, 'epoch': 3.0}
 60%|███████████████████████████████████████████████████████████████████████████████████████████████████▌                  | 99/165 [00:59<00:16,  4.10it/s]
[INFO|trainer.py:2647] 2022-08-11 03:15:15,092 >> Saving model checkpoint to ./beans_outputs/checkpoint-99
[INFO|configuration_utils.py:440] 2022-08-11 03:15:15,093 >> Configuration saved in ./beans_outputs/checkpoint-99/config.json
[INFO|modeling_utils.py:1569] 2022-08-11 03:15:15,593 >> Model weights saved in ./beans_outputs/checkpoint-99/pytorch_model.bin
[INFO|feature_extraction_utils.py:339] 2022-08-11 03:15:15,593 >> Feature extractor saved in ./beans_outputs/checkpoint-99/preprocessor_config.json
{'loss': 0.3146, 'learning_rate': 7.87878787878788e-06, 'epoch': 3.03}
{'loss': 0.2688, 'learning_rate': 6.666666666666667e-06, 'epoch': 3.33}
{'loss': 0.2645, 'learning_rate': 5.4545454545454545e-06, 'epoch': 3.64}
{'loss': 0.2509, 'learning_rate': 4.242424242424243e-06, 'epoch': 3.94}
 80%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                  | 132/165 [01:08<00:07,  4.27it/s]
[INFO|trainer.py:2898] 2022-08-11 03:15:24,157 >> ***** Running Evaluation *****
[INFO|trainer.py:2900] 2022-08-11 03:15:24,157 >> Num examples = 133
[INFO|trainer.py:2903] 2022-08-11 03:15:24,157 >> Batch size = 8
{'eval_loss': 0.22308172285556793, 'eval_accuracy': 0.9774436090225563, 'eval_runtime': 0.7834, 'eval_samples_per_second': 169.783, 'eval_steps_per_second': 6.383, 'epoch': 4.0}
 80%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                  | 132/165 [01:09<00:07,  4.27it/s]
[INFO|trainer.py:2647] 2022-08-11 03:15:25,071 >> Saving model checkpoint to ./beans_outputs/checkpoint-132
[INFO|configuration_utils.py:440] 2022-08-11 03:15:25,072 >> Configuration saved in ./beans_outputs/checkpoint-132/config.json
[INFO|modeling_utils.py:1569] 2022-08-11 03:15:25,572 >> Model weights saved in ./beans_outputs/checkpoint-132/pytorch_model.bin
[INFO|feature_extraction_utils.py:339] 2022-08-11 03:15:25,572 >> Feature extractor saved in ./beans_outputs/checkpoint-132/preprocessor_config.json
[INFO|trainer.py:2725] 2022-08-11 03:15:25,839 >> Deleting older checkpoint [beans_outputs/checkpoint-33] due to args.save_total_limit
{'loss': 0.226, 'learning_rate': 3.0303030303030305e-06, 'epoch': 4.24}
{'loss': 0.255, 'learning_rate': 1.8181818181818183e-06, 'epoch': 4.55}
{'loss': 0.2252, 'learning_rate': 6.060606060606061e-07, 'epoch': 4.85}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 165/165 [01:18<00:00,  4.26it/s]
[INFO|trainer.py:2898] 2022-08-11 03:15:34,092 >> ***** Running Evaluation *****
[INFO|trainer.py:2900] 2022-08-11 03:15:34,092 >> Num examples = 133
[INFO|trainer.py:2903] 2022-08-11 03:15:34,092 >> Batch size = 8
{'eval_loss': 0.20757310092449188, 'eval_accuracy': 0.9849624060150376, 'eval_runtime': 0.7852, 'eval_samples_per_second': 169.383, 'eval_steps_per_second': 6.368, 'epoch': 5.0}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 165/165 [01:19<00:00,  4.26it/s]
[INFO|trainer.py:2647] 2022-08-11 03:15:35,007 >> Saving model checkpoint to ./beans_outputs/checkpoint-165
[INFO|configuration_utils.py:440] 2022-08-11 03:15:35,008 >> Configuration saved in ./beans_outputs/checkpoint-165/config.json
[INFO|modeling_utils.py:1569] 2022-08-11 03:15:35,509 >> Model weights saved in ./beans_outputs/checkpoint-165/pytorch_model.bin
[INFO|feature_extraction_utils.py:339] 2022-08-11 03:15:35,509 >> Feature extractor saved in ./beans_outputs/checkpoint-165/preprocessor_config.json
[INFO|trainer.py:2725] 2022-08-11 03:15:35,775 >> Deleting older checkpoint [beans_outputs/checkpoint-66] due to args.save_total_limit
[INFO|trainer.py:1857] 2022-08-11 03:15:35,854 >>

Training completed. Do not forget to share your model on huggingface.co/models =)

[INFO|trainer.py:1949] 2022-08-11 03:15:35,854 >> Loading best model from ./beans_outputs/checkpoint-165 (score: 0.20757310092449188).
{'train_runtime': 80.4298, 'train_samples_per_second': 64.28, 'train_steps_per_second': 2.051, 'train_loss': 0.45646645228068033, 'epoch': 5.0}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 165/165 [01:20<00:00,  2.05it/s]
[INFO|trainer.py:2647] 2022-08-11 03:15:36,366 >> Saving model checkpoint to ./beans_outputs/
[INFO|configuration_utils.py:440] 2022-08-11 03:15:36,367 >> Configuration saved in ./beans_outputs/config.json
[INFO|modeling_utils.py:1569] 2022-08-11 03:15:36,871 >> Model weights saved in ./beans_outputs/pytorch_model.bin
[INFO|feature_extraction_utils.py:339] 2022-08-11 03:15:36,872 >> Feature extractor saved in ./beans_outputs/preprocessor_config.json
***** train metrics *****
  epoch                    =        5.0
  train_loss               =     0.4565
  train_runtime            = 0:01:20.42
  train_samples_per_second =      64.28
  train_steps_per_second   =      2.051
[INFO|trainer.py:2898] 2022-08-11 03:15:36,875 >> ***** Running Evaluation *****
[INFO|trainer.py:2900] 2022-08-11 03:15:36,875 >> Num examples = 133
[INFO|trainer.py:2903] 2022-08-11 03:15:36,875 >> Batch size = 8
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00,  7.72it/s]
***** eval metrics *****
  epoch                   =        5.0
  eval_accuracy           =      0.985
  eval_loss               =     0.2076
  eval_runtime            = 0:00:00.78
  eval_samples_per_second =    170.463
  eval_steps_per_second   =      6.408
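As the log shows, the final model, configuration, and feature extractor are exported to `./beans_outputs/` (`config.json`, `pytorch_model.bin`, `preprocessor_config.json`), so the fine-tuned classifier can be loaded back with the regular `from_pretrained` API. Below is a minimal inference sketch under that assumption; the test image path `leaf.jpg` is a placeholder, not a file produced by the run.

```python
# Minimal sketch: load the checkpoint written to ./beans_outputs/ by the run
# above and classify a single leaf image. The image path is hypothetical.
import torch
from PIL import Image
from transformers import AutoFeatureExtractor, AutoModelForImageClassification

checkpoint = "./beans_outputs/"
feature_extractor = AutoFeatureExtractor.from_pretrained(checkpoint)
model = AutoModelForImageClassification.from_pretrained(checkpoint)
model.eval()

# Preprocess one image the same way the feature extractor did during training.
image = Image.open("leaf.jpg").convert("RGB")  # placeholder path
inputs = feature_extractor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_id = logits.argmax(-1).item()
# id2label comes from the saved config: angular_leaf_spot, bean_rust, healthy.
print(model.config.id2label[predicted_id])
```

Note that inference here runs on a single process; the FSDP sharding only applies during the distributed training run launched with torchrun.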