$ torchrun --nproc_per_node=4 run_image_classification.py --dataset_name beans --output_dir ./beans_outputs/ --remove_unused_columns False --do_train --do_eval --learning_rate 2e-5 --num_train_epochs 5 --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --logging_strategy steps --logging_steps 10 --evaluation_strategy epoch --save_strategy epoch --load_best_model_at_end True --save_total_limit 3 --seed 1337 --fsdp "full_shard auto_wrap" --fsdp_min_num_params 20000
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
08/11/2022 03:14:10 - WARNING - __main__ - Process rank: 2, device: cuda:2, n_gpu: 1distributed training: True, 16-bits training: False
08/11/2022 03:14:10 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: False
08/11/2022 03:14:10 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=epoch,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[<FSDPOption.FULL_SHARD: 'full_shard'>, <FSDPOption.AUTO_WRAP: 'auto_wrap'>],
fsdp_min_num_params=20000,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=False,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=True,
local_rank=0,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=./beans_outputs/runs/Aug11_03-14-10_ip-172-31-58-202,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=10,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=loss,
mp_parameters=,
no_cuda=False,
num_train_epochs=5.0,
optim=adamw_hf,
output_dir=./beans_outputs/,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=8,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=False,
report_to=[],
resume_from_checkpoint=None,
run_name=./beans_outputs/,
save_on_each_node=False,
save_steps=500,
save_strategy=epoch,
save_total_limit=3,
seed=1337,
sharded_ddp=[],
skip_memory_metrics=True,
tf32=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
08/11/2022 03:14:10 - WARNING - __main__ - Process rank: 3, device: cuda:3, n_gpu: 1distributed training: True, 16-bits training: False
08/11/2022 03:14:10 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1distributed training: True, 16-bits training: False
08/11/2022 03:14:10 - WARNING - datasets.builder - Using custom data configuration default
08/11/2022 03:14:10 - WARNING - datasets.builder - Using custom data configuration default
08/11/2022 03:14:10 - WARNING - datasets.builder - Using custom data configuration default
08/11/2022 03:14:10 - WARNING - datasets.builder - Using custom data configuration default
08/11/2022 03:14:10 - WARNING - datasets.builder - Reusing dataset beans (/home/ubuntu/.cache/huggingface/datasets/beans/default/0.0.0/90c755fb6db1c0ccdad02e897a37969dbf070bed3755d4391e269ff70642d791)
  0%|          | 0/3 [00:00<?, ?it/s]
>> loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--google--vit-base-patch16-224-in21k/snapshots/1ba429d32753f33a0660b80ac6f43a3c80c18938/config.json
[INFO|configuration_utils.py:695] 2022-08-11 03:14:11,818 >> Model config ViTConfig {
  "_name_or_path": "google/vit-base-patch16-224-in21k",
  "architectures": [
    "ViTModel"
  ],
  "attention_probs_dropout_prob": 0.0,
  "encoder_stride": 16,
  "finetuning_task": "image-classification",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.0,
  "hidden_size": 768,
  "id2label": {
    "0": "angular_leaf_spot",
    "1": "bean_rust",
    "2": "healthy"
  },
  "image_size": 224,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "angular_leaf_spot": "0",
    "bean_rust": "1",
    "healthy": "2"
  },
  "layer_norm_eps": 1e-12,
  "model_type": "vit",
  "num_attention_heads": 12,
  "num_channels": 3,
  "num_hidden_layers": 12,
  "patch_size": 16,
  "qkv_bias": true,
  "transformers_version": "4.22.0.dev0"
}

[INFO|modeling_utils.py:2067] 2022-08-11 03:14:11,834 >> loading weights file pytorch_model.bin from cache at /home/ubuntu/.cache/huggingface/hub/models--google--vit-base-patch16-224-in21k/snapshots/1ba429d32753f33a0660b80ac6f43a3c80c18938/pytorch_model.bin
[WARNING|modeling_utils.py:2491] 2022-08-11 03:14:12,999 >> Some weights of the model checkpoint at google/vit-base-patch16-224-in21k were not used when initializing ViTForImageClassification: ['pooler.dense.weight', 'pooler.dense.bias']
- This IS expected if you are initializing ViTForImageClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ViTForImageClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:2503] 2022-08-11 03:14:12,999 >> Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224-in21k and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[WARNING|modeling_utils.py:2491] 2022-08-11 03:14:13,003 >> Some weights of the model checkpoint at google/vit-base-patch16-224-in21k were not used when initializing ViTForImageClassification: ['pooler.dense.weight', 'pooler.dense.bias']
- This IS expected if you are initializing ViTForImageClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ViTForImageClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:2503] 2022-08-11 03:14:13,003 >> Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224-in21k and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[WARNING|modeling_utils.py:2491] 2022-08-11 03:14:13,005 >> Some weights of the model checkpoint at google/vit-base-patch16-224-in21k were not used when initializing ViTForImageClassification: ['pooler.dense.weight', 'pooler.dense.bias']
- This IS expected if you are initializing ViTForImageClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ViTForImageClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:2503] 2022-08-11 03:14:13,005 >> Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224-in21k and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[WARNING|modeling_utils.py:2491] 2022-08-11 03:14:13,006 >> Some weights of the model checkpoint at google/vit-base-patch16-224-in21k were not used when initializing ViTForImageClassification: ['pooler.dense.weight', 'pooler.dense.bias']
- This IS expected if you are initializing ViTForImageClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ViTForImageClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:2503] 2022-08-11 03:14:13,007 >> Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224-in21k and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[INFO|feature_extraction_utils.py:432] 2022-08-11 03:14:13,271 >> loading configuration file preprocessor_config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--google--vit-base-patch16-224-in21k/snapshots/1ba429d32753f33a0660b80ac6f43a3c80c18938/preprocessor_config.json
[INFO|configuration_utils.py:643] 2022-08-11 03:14:13,535 >> loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--google--vit-base-patch16-224-in21k/snapshots/1ba429d32753f33a0660b80ac6f43a3c80c18938/config.json
[INFO|configuration_utils.py:695] 2022-08-11 03:14:13,536 >> Model config ViTConfig {
  "_name_or_path": "google/vit-base-patch16-224-in21k",
  "architectures": [
    "ViTModel"
  ],
  "attention_probs_dropout_prob": 0.0,
  "encoder_stride": 16,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.0,
  "hidden_size": 768,
  "image_size": 224,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "model_type": "vit",
  "num_attention_heads": 12,
  "num_channels": 3,
  "num_hidden_layers": 12,
  "patch_size": 16,
  "qkv_bias": true,
  "transformers_version": "4.22.0.dev0"
}

[INFO|feature_extraction_utils.py:469] 2022-08-11 03:14:13,537 >> Feature extractor ViTFeatureExtractor {
  "do_normalize": true,
  "do_resize": true,
  "feature_extractor_type": "ViTFeatureExtractor",
  "image_mean": [
    0.5,
    0.5,
    0.5
  ],
  "image_std": [
    0.5,
    0.5,
    0.5
  ],
  "resample": 2,
  "size": 224
}

/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:929: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication.
  warnings.warn(
/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:929: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication.
  warnings.warn(
/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:929: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication.
  warnings.warn(
/opt/conda/envs/pytorch/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:929: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication.
  warnings.warn(
/home/ubuntu/.local/lib/python3.9/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
/home/ubuntu/.local/lib/python3.9/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
/home/ubuntu/.local/lib/python3.9/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
/home/ubuntu/.local/lib/python3.9/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
[INFO|trainer.py:1612] 2022-08-11 03:14:15,840 >> ***** Running training *****
[INFO|trainer.py:1613] 2022-08-11 03:14:15,840 >> Num examples = 1034
[INFO|trainer.py:1614] 2022-08-11 03:14:15,840 >> Num Epochs = 5
[INFO|trainer.py:1615] 2022-08-11 03:14:15,840 >> Instantaneous batch size per device = 8
[INFO|trainer.py:1616] 2022-08-11 03:14:15,840 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:1617] 2022-08-11 03:14:15,840 >> Gradient Accumulation steps = 1
[INFO|trainer.py:1618] 2022-08-11 03:14:15,840 >> Total optimization steps = 165
{'loss': 1.0569, 'learning_rate': 1.8787878787878792e-05, 'epoch': 0.3}
{'loss': 0.9069, 'learning_rate': 1.7575757575757576e-05, 'epoch': 0.61}
{'loss': 0.7793, 'learning_rate': 1.6363636363636366e-05, 'epoch': 0.91}
 20%|█████████████████████████████████▏                  | 33/165 [00:38<00:31,  4.17it/s]
[INFO|trainer.py:2898] 2022-08-11 03:14:53,953 >> ***** Running Evaluation *****
[INFO|trainer.py:2900] 2022-08-11 03:14:53,953 >> Num examples = 133
[INFO|trainer.py:2903] 2022-08-11 03:14:53,953 >> Batch size = 8
{'eval_loss': 0.658123791217804, 'eval_accuracy': 0.9624060150375939, 'eval_runtime': 0.866, 'eval_samples_per_second': 153.573, 'eval_steps_per_second': 5.773, 'epoch': 1.0}
 20%|█████████████████████████████████▏                  | 33/165 [00:38<00:31,  4.17it/s]
[INFO|trainer.py:2647] 2022-08-11 03:14:54,945 >> Saving model checkpoint to ./beans_outputs/checkpoint-33
[INFO|configuration_utils.py:440] 2022-08-11 03:14:54,946 >> Configuration saved in ./beans_outputs/checkpoint-33/config.json
[INFO|modeling_utils.py:1569] 2022-08-11 03:14:55,542 >> Model weights saved in ./beans_outputs/checkpoint-33/pytorch_model.bin
[INFO|feature_extraction_utils.py:339] 2022-08-11 03:14:55,543 >> Feature extractor saved in ./beans_outputs/checkpoint-33/preprocessor_config.json
{'loss': 0.6662, 'learning_rate': 1.5151515151515153e-05, 'epoch': 1.21}
{'loss': 0.5955, 'learning_rate': 1.3939393939393942e-05, 'epoch': 1.52}
{'loss': 0.4857, 'learning_rate': 1.2727272727272728e-05, 'epoch': 1.82}
 40%|██████████████████████████████████████████████████████████████████▍                  | 66/165 [00:48<00:23,  4.20it/s]
[INFO|trainer.py:2898] 2022-08-11 03:15:04,193 >> ***** Running Evaluation *****
[INFO|trainer.py:2900] 2022-08-11 03:15:04,193 >> Num examples = 133
[INFO|trainer.py:2903] 2022-08-11 03:15:04,193 >> Batch size = 8
{'eval_loss': 0.39150580763816833, 'eval_accuracy': 0.9774436090225563, 'eval_runtime': 0.7883, 'eval_samples_per_second': 168.711, 'eval_steps_per_second': 6.343, 'epoch': 2.0}
 40%|██████████████████████████████████████████████████████████████████▍                  | 66/165 [00:49<00:23,  4.20it/s]
[INFO|trainer.py:2647] 2022-08-11 03:15:05,114 >> Saving model checkpoint to ./beans_outputs/checkpoint-66
[INFO|configuration_utils.py:440] 2022-08-11 03:15:05,115 >> Configuration saved in ./beans_outputs/checkpoint-66/config.json
[INFO|modeling_utils.py:1569] 2022-08-11 03:15:05,614 >> Model weights saved in ./beans_outputs/checkpoint-66/pytorch_model.bin
[INFO|feature_extraction_utils.py:339] 2022-08-11 03:15:05,614 >> Feature extractor saved in ./beans_outputs/checkpoint-66/preprocessor_config.json
{'loss': 0.4285, 'learning_rate': 1.1515151515151517e-05, 'epoch': 2.12}
{'loss': 0.3621, 'learning_rate': 1.0303030303030304e-05, 'epoch': 2.42}
{'loss': 0.3357, 'learning_rate': 9.090909090909091e-06, 'epoch': 2.73}
 60%|███████████████████████████████████████████████████████████████████████████████████████████████████▌                  | 99/165 [00:58<00:16,  4.10it/s]
[INFO|trainer.py:2898] 2022-08-11 03:15:14,177 >> ***** Running Evaluation *****
[INFO|trainer.py:2900] 2022-08-11 03:15:14,178 >> Num examples = 133
[INFO|trainer.py:2903] 2022-08-11 03:15:14,178 >> Batch size = 8
{'eval_loss': 0.25688496232032776, 'eval_accuracy': 0.9849624060150376, 'eval_runtime': 0.7837, 'eval_samples_per_second': 169.71, 'eval_steps_per_second': 6.38, 'epoch': 3.0}
 60%|███████████████████████████████████████████████████████████████████████████████████████████████████▌                  | 99/165 [00:59<00:16,  4.10it/s]
[INFO|trainer.py:2647] 2022-08-11 03:15:15,092 >> Saving model checkpoint to ./beans_outputs/checkpoint-99
[INFO|configuration_utils.py:440] 2022-08-11 03:15:15,093 >> Configuration saved in ./beans_outputs/checkpoint-99/config.json
[INFO|modeling_utils.py:1569] 2022-08-11 03:15:15,593 >> Model weights saved in ./beans_outputs/checkpoint-99/pytorch_model.bin
[INFO|feature_extraction_utils.py:339] 2022-08-11 03:15:15,593 >> Feature extractor saved in ./beans_outputs/checkpoint-99/preprocessor_config.json
{'loss': 0.3146, 'learning_rate': 7.87878787878788e-06, 'epoch': 3.03}
{'loss': 0.2688, 'learning_rate': 6.666666666666667e-06, 'epoch': 3.33}
{'loss': 0.2645, 'learning_rate': 5.4545454545454545e-06, 'epoch': 3.64}
{'loss': 0.2509, 'learning_rate': 4.242424242424243e-06, 'epoch': 3.94}
 80%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                  | 132/165 [01:08<00:07,  4.27it/s]
[INFO|trainer.py:2898] 2022-08-11 03:15:24,157 >> ***** Running Evaluation *****
[INFO|trainer.py:2900] 2022-08-11 03:15:24,157 >> Num examples = 133
[INFO|trainer.py:2903] 2022-08-11 03:15:24,157 >> Batch size = 8
{'eval_loss': 0.22308172285556793, 'eval_accuracy': 0.9774436090225563, 'eval_runtime': 0.7834, 'eval_samples_per_second': 169.783, 'eval_steps_per_second': 6.383, 'epoch': 4.0}
 80%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                  | 132/165 [01:09<00:07,  4.27it/s]
[INFO|trainer.py:2647] 2022-08-11 03:15:25,071 >> Saving model checkpoint to ./beans_outputs/checkpoint-132
[INFO|configuration_utils.py:440] 2022-08-11 03:15:25,072 >> Configuration saved in ./beans_outputs/checkpoint-132/config.json
[INFO|modeling_utils.py:1569] 2022-08-11 03:15:25,572 >> Model weights saved in ./beans_outputs/checkpoint-132/pytorch_model.bin
[INFO|feature_extraction_utils.py:339] 2022-08-11 03:15:25,572 >> Feature extractor saved in ./beans_outputs/checkpoint-132/preprocessor_config.json
[INFO|trainer.py:2725] 2022-08-11 03:15:25,839 >> Deleting older checkpoint [beans_outputs/checkpoint-33] due to args.save_total_limit
{'loss': 0.226, 'learning_rate': 3.0303030303030305e-06, 'epoch': 4.24}
{'loss': 0.255, 'learning_rate': 1.8181818181818183e-06, 'epoch': 4.55}
{'loss': 0.2252, 'learning_rate': 6.060606060606061e-07, 'epoch': 4.85}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 165/165 [01:18<00:00,  4.26it/s]
[INFO|trainer.py:2898] 2022-08-11 03:15:34,092 >> ***** Running Evaluation *****
[INFO|trainer.py:2900] 2022-08-11 03:15:34,092 >> Num examples = 133
[INFO|trainer.py:2903] 2022-08-11 03:15:34,092 >> Batch size = 8
{'eval_loss': 0.20757310092449188, 'eval_accuracy': 0.9849624060150376, 'eval_runtime': 0.7852, 'eval_samples_per_second': 169.383, 'eval_steps_per_second': 6.368, 'epoch': 5.0}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 165/165 [01:19<00:00,  4.26it/s]
[INFO|trainer.py:2647] 2022-08-11 03:15:35,007 >> Saving model checkpoint to ./beans_outputs/checkpoint-165
[INFO|configuration_utils.py:440] 2022-08-11 03:15:35,008 >> Configuration saved in ./beans_outputs/checkpoint-165/config.json
[INFO|modeling_utils.py:1569] 2022-08-11 03:15:35,509 >> Model weights saved in ./beans_outputs/checkpoint-165/pytorch_model.bin
[INFO|feature_extraction_utils.py:339] 2022-08-11 03:15:35,509 >> Feature extractor saved in ./beans_outputs/checkpoint-165/preprocessor_config.json
[INFO|trainer.py:2725] 2022-08-11 03:15:35,775 >> Deleting older checkpoint [beans_outputs/checkpoint-66] due to args.save_total_limit
[INFO|trainer.py:1857] 2022-08-11 03:15:35,854 >>

Training completed. Do not forget to share your model on huggingface.co/models =)

[INFO|trainer.py:1949] 2022-08-11 03:15:35,854 >> Loading best model from ./beans_outputs/checkpoint-165 (score: 0.20757310092449188).
{'train_runtime': 80.4298, 'train_samples_per_second': 64.28, 'train_steps_per_second': 2.051, 'train_loss': 0.45646645228068033, 'epoch': 5.0}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 165/165 [01:20<00:00,  2.05it/s]
[INFO|trainer.py:2647] 2022-08-11 03:15:36,366 >> Saving model checkpoint to ./beans_outputs/
[INFO|configuration_utils.py:440] 2022-08-11 03:15:36,367 >> Configuration saved in ./beans_outputs/config.json
[INFO|modeling_utils.py:1569] 2022-08-11 03:15:36,871 >> Model weights saved in ./beans_outputs/pytorch_model.bin
[INFO|feature_extraction_utils.py:339] 2022-08-11 03:15:36,872 >> Feature extractor saved in ./beans_outputs/preprocessor_config.json
***** train metrics *****
  epoch                    =        5.0
  train_loss               =     0.4565
  train_runtime            = 0:01:20.42
  train_samples_per_second =      64.28
  train_steps_per_second   =      2.051
[INFO|trainer.py:2898] 2022-08-11 03:15:36,875 >> ***** Running Evaluation *****
[INFO|trainer.py:2900] 2022-08-11 03:15:36,875 >> Num examples = 133
[INFO|trainer.py:2903] 2022-08-11 03:15:36,875 >> Batch size = 8
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00,  7.72it/s]
***** eval metrics *****
  epoch                   =        5.0
  eval_accuracy           =      0.985
  eval_loss               =     0.2076
  eval_runtime            = 0:00:00.78
  eval_samples_per_second =    170.463
  eval_steps_per_second   =      6.408
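As the log shows, the final model, configuration, and feature extractor are exported to `./beans_outputs/` (`config.json`, `pytorch_model.bin`, `preprocessor_config.json`), so the fine-tuned classifier can be loaded back with the regular `from_pretrained` API. Below is a minimal inference sketch under that assumption; the test image path `leaf.jpg` is a placeholder, not a file produced by the run.

```python
# Minimal sketch: load the checkpoint written to ./beans_outputs/ by the run
# above and classify a single leaf image. The image path is hypothetical.
import torch
from PIL import Image
from transformers import AutoFeatureExtractor, AutoModelForImageClassification

checkpoint = "./beans_outputs/"
feature_extractor = AutoFeatureExtractor.from_pretrained(checkpoint)
model = AutoModelForImageClassification.from_pretrained(checkpoint)
model.eval()

# Preprocess one image the same way the feature extractor did during training.
image = Image.open("leaf.jpg").convert("RGB")  # placeholder path
inputs = feature_extractor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_id = logits.argmax(-1).item()
# id2label comes from the saved config: angular_leaf_spot, bean_rust, healthy.
print(model.config.id2label[predicted_id])
```

Note that inference here runs on a single process; the FSDP sharding only applies during the distributed training run launched with torchrun.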