diff --git a/Makefile b/Makefile
new file mode 100644
index 0000000..4cb613e
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,20 @@
+APP_NAME=ngxbac/pytorch_cv:kaggle_cell
+CONTAINER_NAME=kaggle_cell
+DATA_DIR=/raid/data/kaggle/recursion-cell
+OUT_DIR=/raid/bac/kaggle/logs/recursion-cell
+
+run: ## Run container
+	nvidia-docker run \
+		-e DISPLAY=unix${DISPLAY} -v /tmp/.X11-unix:/tmp/.X11-unix --privileged \
+		--ipc=host \
+		-itd \
+		--name=${CONTAINER_NAME} \
+		-v $(DATA_DIR):/data \
+		-v $(OUT_DIR):/logs \
+		-v $(shell pwd):/kaggle-cell $(APP_NAME) bash
+
+exec: ## Run a bash in a running container
+	nvidia-docker exec -it ${CONTAINER_NAME} bash
+
+stop: ## Stop and remove a running container
+	docker stop ${CONTAINER_NAME}; docker rm ${CONTAINER_NAME}
\ No newline at end of file
diff --git a/PERFORMANCE.md b/PERFORMANCE.md
deleted file mode 100644
index 76af0c6..0000000
--- a/PERFORMANCE.md
+++ /dev/null
@@ -1,63 +0,0 @@
-# Performances
-
-- Baseline
-  - Model: se_resnext50_32x4d
-  - image_size: 512x512
-  - batch_size: 64
-  - grad_accum: 2
-  - augmentations:
-  ```pythonstub
-  def train_aug(image_size=512):
-      return Compose([
-          Resize(image_size, image_size),
-          RandomRotate90(),
-          Flip(),
-          Transpose(),
-      ], p=1)
-
-
-  def valid_aug(image_size=512):
-      return Compose([
-          # CenterCrop(448, 448),
-          Resize(image_size, image_size)
-          # Normalize(),
-      ], p=1)
-  ```
-
-  - Optimizers:
-  ```yaml
-  criterion_params:
-    criterion: CrossEntropyLoss
-
-  optimizer_params:
-    optimizer: Adam
-    lr: 0.0003
-    weight_decay: 0.0001
-
-  scheduler_params:
-    scheduler: MultiStepLR
-    milestones: [25, 30, 40]
-    gamma: 0.5
-
-  data_params:
-    batch_size: 64
-    num_workers: 4
-    drop_last: False
-
-    image_size: &image_size 512
-    train_csv: "./csv/train_0.csv"
-    valid_csv: "./csv/valid_0.csv"
-    root: "/raid/data/kaggle/recursion-cellular-image-classification/"
-    site: 2
-    channels: [1, 2, 3]
-  ```
-  Results: (fold 0)
-
-  | Experiment | CV | LB |
-  |:---------|----|---:|
-  |c123_s1| 42.9%| 30.6%|
-  |c123_s2| 41%| 23.6%|
-  |Ensemble 0.7 * c123_s1 + 0.3 * c123_s2 | - | 32.5 |
-
-  c123_s1: using channels=[1,2,3] and site = 1
-
\ No newline at end of file
diff --git a/README.md b/README.md
index c043f01..8c7f5b5 100644
--- a/README.md
+++ b/README.md
@@ -1,44 +1,467 @@
-# Requirements
+# Overview
+This repository is used for the Recursion Cellular Image Classification competition.
+The writeup can be found [here](https://www.kaggle.com/c/recursion-cellular-image-classification/discussion/110337).
-- torch == 1.1.0
-- cnn_finetune == 0.5.3
-- albumentations == 0.2.3
-- catalyst == 19.06.5
+The pipeline of this repository is shown below.
+![Pipeline](images/pipeline.png)
-# How to train
-## Pretrained with controls
+
+There are 3 main parts:
+* I. Pretrain on the control images, which cover 31 siRNAs.
+* II. Finetune the models on the main image dataset, which has 1108 siRNAs.
+* III. Finetune the models on the main image dataset plus pseudo labels.
+
+
+# Getting started
+Things you should know about the project:
+* We run experiments via bash scripts located in the `bin` folder.
+* The config files (`yml`) are located in the `configs` folder; each config corresponds to one bash script.
+
+  Ex: `train_control.sh` goes with `config_control.yml`.
+
+* Settings in the yml config file can be changed either from the bash scripts (for flexible settings) or by editing the file directly
+(for fixed settings).
+  Ex: `stages/data_params/train_csv` can be `./csv/train_0.csv, ./csv/train_2.csv,... etc`, so when training K folds we
+  simply loop over the folds (see the sketch below).
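A minimal sketch of that per-fold loop, mirroring `bin/train.sh`; the log directory and csv paths here are only illustrative:

```bash
#!/usr/bin/env bash
# Illustrative per-fold loop: each fold overrides the csv paths of the yml config
# via catalyst-dl flags, as the scripts in bin/ do. Paths are examples only.
RUN_CONFIG=config.yml
for fold in 0 1 2 3 4; do
    LOGDIR=/logs/non_pseudo/fold_$fold/
    catalyst-dl run \
        --config=./configs/${RUN_CONFIG} \
        --logdir=$LOGDIR \
        --out_dir=$LOGDIR:str \
        --stages/data_params/train_csv=./csv/train_$fold.csv:str \
        --stages/data_params/valid_csv=./csv/valid_$fold.csv:str \
        --verbose
done
```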
+
+# Common settings
+
+The common settings in the `yml` config files.
+
+1. Define the model
+```yml
+model_params:
+  model: cell_senet
+  n_channels: 5
+  num_classes: 1108
+  model_name: "se_resnext50_32x4d"
+```
+
+* model: a model function (callable) which returns the model used for training. It can be found in the `src/models/` package.
+All the settings below `model_params/model` are treated as parameters of that function.
+  Ex: `cell_senet` has the default parameters `model_name='se_resnext50_32x4d', num_classes=1108, n_channels=6, weight=None`.
+  These parameters can be set/overridden as in the config above.
+
+2. Metric monitoring
+  We use MAP@3 for monitoring.
+  ```
+  state_params:
+    main_metric: &reduce_metric accuracy03
+    minimize_metric: False
+  ```
+3. Loss
+  `LabelSmoothingCrossEntropy` is used.
+  ```
+  criterion_params:
+    criterion: LabelSmoothingCrossEntropy
+  ```
+4. Data settings
+  ```
+  batch_size: 64
+  num_workers: 8
+  drop_last: False
+
+  image_size: &image_size 512
+  train_csv: "./csv/train_0.csv"
+  valid_csv: "./csv/valid_0.csv"
+  dataset: "non_pseudo"
+  root: "/data/"
+  sites: [1]
+  channels: [1,2,3,4,5,6]
+  ```
+
+  * train_csv: path to the train csv.
+  * valid_csv: path to the valid csv.
+  * dataset: can be `control`, `non_pseudo` or `pseudo`. `control` trains on the control images (Part I), `non_pseudo` trains on the dataset without pseudo labels (Part II) and `pseudo` trains on the dataset with pseudo labels (Part III).
+  * root: path to the data root. Default: `/data`.
+  * channels: the channel combination to use. Ex: [1,2,3], [4,5,6], etc.
+
+5. Optimizer and learning rate
+  ```
+  optimizer_params:
+    optimizer: Nadam
+    lr: 0.001
+  ```
+
+6. Scheduler
+
+  OneCycleLR.
+
+  ```
+  scheduler_params:
+    scheduler: OneCycleLR
+    num_steps: &num_epochs 40
+    lr_range: [0.0005, 0.00001]
+    warmup_steps: 5
+    momentum_range: [0.85, 0.95]
+  ```
+
+# Build docker
 ```bash
-bash bin/train_control.sh
+cd docker
+docker build . -t ngxbac/pytorch_cv:kaggle_cell
 ```
-Pretrained models are saved at:
-`/raid/bac/kaggle/logs/recursion_cell/pretrained_controls/$channels/se_resnext50_32x4d/`
-where `channels` can be: `[1,2,3,4,5], etc`
+# Run container
+In `Makefile`, change:
+* `DATA_DIR`: path to the data from kaggle.
+```bash
+|-- pixel_stats.csv
+|-- pixel_stats.csv.zip
+|-- recursion_dataset_license.pdf
+|-- sample_submission.csv
+|-- test
+|-- test.csv
+|-- test.zip
+|-- test_controls.csv
+|-- train
+|-- train.csv
+|-- train.csv.zip
+|-- train.zip
+`-- train_controls.csv
+```
+
+* `OUT_DIR`: path to the folder that will contain the logs and checkpoints.
+
+Run the commands:
-## Train with pseudo data
 ```bash
-bash bin/train_pseudo.sh
+make run
+make exec
+cd /kaggle-cell/
 ```
-* `PRETRAINED_CONTROL`: is the root folder of pretrained models above
+# Part I. Train from control images
+
+```bash
+bash bin/train_control.sh
+```
+
+In this part, we use all the control images from both train and test.
-* `--model_params/weight=
-$PRETRAINED_CONTROL/$channels/se_resnext50_32x4d/checkpoints/best.pth:str \` is the weight of
-corresponding model pretrained on controls dataset.
+* Input:
+  * `model_name`: name of the model.
+  In our solution, we train:
+    * se_resnext50_32x4d and se_resnext101_32x4d for `cell_senet`.
+    * densenet121 for `cell_densenet`.
+* Output:
+The default output folder is `/logs/pretrained_controls/`, which stores the models trained on control images.
+Below is an example after training `se_resnext50_32x4d` with 6 channel combinations.
```bash
+/logs/pretrained_controls/
+|-- [1,2,3,4,5]
+|   `-- se_resnext50_32x4d
+|-- [1,2,3,4,6]
+|   `-- se_resnext50_32x4d
+|-- [1,2,3,5,6]
+|   `-- se_resnext50_32x4d
+|-- [1,2,4,5,6]
+|   `-- se_resnext50_32x4d
+|-- [1,3,4,5,6]
+|   `-- se_resnext50_32x4d
+`-- [2,3,4,5,6]
+    `-- se_resnext50_32x4d
+```
-## Train with usually data
-Similar to `Train with pseudo data` part
+# Part II. Finetuning without pseudo labels
 ```bash
 bash bin/train.sh
+```
+* Input:
+  * `PRETRAINED_CONTROL`: the folder that stores the models trained on control images. Default: `/logs/pretrained_controls/`
+  * `model_name`: name of the model.
+  * `TRAIN_CSV/VALID_CSV`: train and valid csv files for each fold. They are changed automatically for each fold.
+
+* Output:
+  The default output folder is `/logs/non_pseudo/`. Below is an example after training K-fold `se_resnext50_32x4d` with 6 channel combinations.
+
+  ```
+  /logs/non_pseudo/
+  |-- [1,2,3,4,5]
+  |   |-- fold_0
+  |   |   `-- se_resnext50_32x4d
+  |   |-- fold_1
+  |   |   `-- se_resnext50_32x4d
+  |   |-- fold_2
+  |   |   `-- se_resnext50_32x4d
+  |   |-- fold_3
+  |   |   `-- se_resnext50_32x4d
+  |   `-- fold_4
+  |       `-- se_resnext50_32x4d
+  |-- [1,2,3,4,6]
+  |   |-- fold_0
+  |   |   `-- se_resnext50_32x4d
+  |   |-- fold_1
+  |   |   `-- se_resnext50_32x4d
+  |   |-- fold_2
+  |   |   `-- se_resnext50_32x4d
+  |   |-- fold_3
+  |   |   `-- se_resnext50_32x4d
+  |   `-- fold_4
+  |       `-- se_resnext50_32x4d
+  |-- [1,2,3,5,6]
+  |   |-- fold_0
+  |   |   `-- se_resnext50_32x4d
+  |   |-- fold_1
+  |   |   `-- se_resnext50_32x4d
+  |   |-- fold_2
+  |   |   `-- se_resnext50_32x4d
+  |   |-- fold_3
+  |   |   `-- se_resnext50_32x4d
+  |   `-- fold_4
+  |       `-- se_resnext50_32x4d
+  |-- [1,2,4,5,6]
+  |   |-- fold_0
+  |   |   `-- se_resnext50_32x4d
+  |   |-- fold_1
+  |   |   `-- se_resnext50_32x4d
+  |   |-- fold_2
+  |   |   `-- se_resnext50_32x4d
+  |   |-- fold_3
+  |   |   `-- se_resnext50_32x4d
+  |   `-- fold_4
+  |       `-- se_resnext50_32x4d
+  |-- [1,3,4,5,6]
+  |   |-- fold_0
+  |   |   `-- se_resnext50_32x4d
+  |   |-- fold_1
+  |   |   `-- se_resnext50_32x4d
+  |   |-- fold_2
+  |   |   `-- se_resnext50_32x4d
+  |   |-- fold_3
+  |   |   `-- se_resnext50_32x4d
+  |   `-- fold_4
+  |       `-- se_resnext50_32x4d
+  `-- [2,3,4,5,6]
+      |-- fold_0
+      |   `-- se_resnext50_32x4d
+      |-- fold_1
+      |   `-- se_resnext50_32x4d
+      |-- fold_2
+      |   `-- se_resnext50_32x4d
+      |-- fold_3
+      |   `-- se_resnext50_32x4d
+      `-- fold_4
+          `-- se_resnext50_32x4d
+  ```
+
+# Part III. Finetuning with pseudo labels
+
+The only difference between Part III and Part II is the train/valid csv input files.
+
+```bash
+bash bin/train_pseudo.sh
+```
+* Input:
+  * `PRETRAINED_CONTROL`: the folder that stores the models trained on control images. Default: `/logs/pretrained_controls/`
+  * `model_name`: name of the model.
+  * `TRAIN_CSV/VALID_CSV`: train and valid csv files for each fold. They are changed automatically for each fold.
+
+* Output:
+  The default output folder is `/logs/pseudo/`. Below is an example after training K-fold `se_resnext50_32x4d` with 6 channel combinations.
+ + ``` + /logs/pseudo/ + |-- [1,2,3,4,5] + | |-- fold_0 + | | `-- se_resnext50_32x4d + | |-- fold_1 + | | `-- se_resnext50_32x4d + | |-- fold_2 + | | `-- se_resnext50_32x4d + | |-- fold_3 + | | `-- se_resnext50_32x4d + | `-- fold_4 + | `-- se_resnext50_32x4d + |-- [1,2,3,4,6] + | |-- fold_0 + | | `-- se_resnext50_32x4d + | |-- fold_1 + | | `-- se_resnext50_32x4d + | |-- fold_2 + | | `-- se_resnext50_32x4d + | |-- fold_3 + | | `-- se_resnext50_32x4d + | `-- fold_4 + | `-- se_resnext50_32x4d + |-- [1,2,3,5,6] + | |-- fold_0 + | | `-- se_resnext50_32x4d + | |-- fold_1 + | | `-- se_resnext50_32x4d + | |-- fold_2 + | | `-- se_resnext50_32x4d + | |-- fold_3 + | | `-- se_resnext50_32x4d + | `-- fold_4 + | `-- se_resnext50_32x4d + |-- [1,2,4,5,6] + | |-- fold_0 + | | `-- se_resnext50_32x4d + | |-- fold_1 + | | `-- se_resnext50_32x4d + | |-- fold_2 + | | `-- se_resnext50_32x4d + | |-- fold_3 + | | `-- se_resnext50_32x4d + | `-- fold_4 + | `-- se_resnext50_32x4d + |-- [1,3,4,5,6] + | |-- fold_0 + | | `-- se_resnext50_32x4d + | |-- fold_1 + | | `-- se_resnext50_32x4d + | |-- fold_2 + | | `-- se_resnext50_32x4d + | |-- fold_3 + | | `-- se_resnext50_32x4d + | `-- fold_4 + | `-- se_resnext50_32x4d + `-- [2,3,4,5,6] + |-- fold_0 + | `-- se_resnext50_32x4d + |-- fold_1 + | `-- se_resnext50_32x4d + |-- fold_2 + | `-- se_resnext50_32x4d + |-- fold_3 + | `-- se_resnext50_32x4d + `-- fold_4 + `-- se_resnext50_32x4d + ``` + +# Predict + +```bash +export LC_ALL=C.UTF-8 +export LANG=C.UTF-8 + +CUDA_VISIBLE_DEVICES=2,3 python src/inference.py predict-all --data_root=/data/ --model_root=/logs/pseudo/ --model_name=se_resnext50_32x4d --out_dir /predictions/pseudo/ +``` +Where: +* `data_root`: path to the data from kaggle. +* `model_root`: path to the log folders (Ex: `/logs/pseudo/`, `/log/non_pseudo/`) +* `model_name`: can be `se_resnext50_32x4d`, `se_resnext101_32x4d` or `densenet121`. +* `out_dir`: folder where stores the logit files. 
+ +The `out_dir` will have the structure as follows: +``` +/predictions/pseudo/ +|-- [1,2,3,4,5] +| |-- fold_0 +| | `-- se_resnext50_32x4d +| | `-- pred_test.npy +| |-- fold_1 +| | `-- se_resnext50_32x4d +| | `-- pred_test.npy +| |-- fold_2 +| | `-- se_resnext50_32x4d +| | `-- pred_test.npy +| |-- fold_3 +| | `-- se_resnext50_32x4d +| | `-- pred_test.npy +| `-- fold_4 +| `-- se_resnext50_32x4d +| `-- pred_test.npy +|-- [1,2,3,4,6] +| |-- fold_0 +| | `-- se_resnext50_32x4d +| | `-- pred_test.npy +| |-- fold_1 +| | `-- se_resnext50_32x4d +| | `-- pred_test.npy +| |-- fold_2 +| | `-- se_resnext50_32x4d +| | `-- pred_test.npy +| |-- fold_3 +| | `-- se_resnext50_32x4d +| | `-- pred_test.npy +| `-- fold_4 +| `-- se_resnext50_32x4d +| `-- pred_test.npy +|-- [1,2,3,5,6] +| |-- fold_0 +| | `-- se_resnext50_32x4d +| | `-- pred_test.npy +| |-- fold_1 +| | `-- se_resnext50_32x4d +| | `-- pred_test.npy +| |-- fold_2 +| | `-- se_resnext50_32x4d +| | `-- pred_test.npy +| |-- fold_3 +| | `-- se_resnext50_32x4d +| | `-- pred_test.npy +| `-- fold_4 +| `-- se_resnext50_32x4d +| `-- pred_test.npy +|-- [1,2,4,5,6] +| |-- fold_0 +| | `-- se_resnext50_32x4d +| | `-- pred_test.npy +| |-- fold_1 +| | `-- se_resnext50_32x4d +| | `-- pred_test.npy +| |-- fold_2 +| | `-- se_resnext50_32x4d +| | `-- pred_test.npy +| |-- fold_3 +| | `-- se_resnext50_32x4d +| | `-- pred_test.npy +| `-- fold_4 +| `-- se_resnext50_32x4d +| `-- pred_test.npy +|-- [1,3,4,5,6] +| |-- fold_0 +| | `-- se_resnext50_32x4d +| | `-- pred_test.npy +| |-- fold_1 +| | `-- se_resnext50_32x4d +| | `-- pred_test.npy +| |-- fold_2 +| | `-- se_resnext50_32x4d +| | `-- pred_test.npy +| |-- fold_3 +| | `-- se_resnext50_32x4d +| | `-- pred_test.npy +| `-- fold_4 +| `-- se_resnext50_32x4d +| `-- pred_test.npy +`-- [2,3,4,5,6] + |-- fold_0 + | `-- se_resnext50_32x4d + | `-- pred_test.npy + |-- fold_1 + | `-- se_resnext50_32x4d + | `-- pred_test.npy + |-- fold_2 + | `-- se_resnext50_32x4d + | `-- pred_test.npy + |-- fold_3 + | `-- se_resnext50_32x4d + | `-- pred_test.npy + `-- fold_4 + `-- se_resnext50_32x4d + `-- pred_test.npy ``` -* `PRETRAINED_CONTROL`: is the root folder of pretrained models above +# Ensemble -* `--model_params/weight= -$PRETRAINED_CONTROL/$channels/se_resnext50_32x4d/checkpoints/best.pth:str \` is the weight of -corresponding model pretrained on controls dataset. +In [`src/ensemble.py`](src/ensemble.py#L47), `model_names` is the list of model that be used for ensemble. + Ex: `model_names=['se_resnext50_32x4d', 'se_resnext101_32x4d', 'densenet121']` +```bash +export LC_ALL=C.UTF-8 +export LANG=C.UTF-8 + +python src/ensemble.py ensemble --data_root /data/ --predict_root /predictions/pseudo/ --group_json group.json +``` +Where: +* `data_root`: path to the data from kaggle. +* `predict_root`: folder where stores the logit files. +* `group_json`: JSON file stores the plate groups of test set. -- Run: `python src/make_submission.csv` \ No newline at end of file +Output: +The `submission.csv` will be located at `${predict_root}/submission.csv`. 
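For reference, a minimal sketch of the per-plate assignment step that `src/ensemble.py` performs after the fold/channel logits are averaged and softmaxed. The helper name `assign_plate` and its inputs are illustrative: `plate_probs` is the softmax score matrix of one test plate, and `group_labels` is that plate's siRNA group taken from `group.json`.

```python
# Illustrative sketch of the per-plate sirna assignment used in src/ensemble.py.
import numpy as np
from scipy.optimize import linear_sum_assignment


def assign_plate(plate_probs: np.ndarray, group_labels: list) -> np.ndarray:
    """plate_probs: (n_wells, 1108) softmax scores for one plate;
    group_labels: the sirna ids allowed on this plate (one group from group.json)."""
    # Keep only the sirna columns that can appear on this plate.
    probs = plate_probs[:, group_labels]
    # Normalize each sirna column over the plate (each sirna occurs once per plate).
    probs = probs / probs.sum(axis=0, keepdims=True)
    # Hungarian assignment: give every well a distinct sirna with maximal total probability.
    _, col_ind = linear_sum_assignment(1 - probs)
    return np.asarray(group_labels)[col_ind]
```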
diff --git a/bin/image_to_arr.sh b/bin/image_to_arr.sh deleted file mode 100755 index 1e8e6e2..0000000 --- a/bin/image_to_arr.sh +++ /dev/null @@ -1,11 +0,0 @@ -#!/usr/bin/env bash - -dataset=train -csv=/raid/data/kaggle/recursion-cellular-image-classification/${dataset}.csv -base_path=/raid/data/kaggle/recursion-cellular-image-classification/ -output=/raid/data/kaggle/recursion-cellular-image-classification/array/ - -python preprocessing/image_to_arr.py image-to-arr --csv=$csv \ - --base_path=$base_path \ - --output=$output \ - --dataset=$dataset \ No newline at end of file diff --git a/bin/train.sh b/bin/train.sh index ca0cbe1..7f569e6 100755 --- a/bin/train.sh +++ b/bin/train.sh @@ -3,20 +3,22 @@ export CUDA_VISIBLE_DEVICES=2,3 RUN_CONFIG=config.yml -# [1,2,3,4,5] [1,2,3,4,6] [1,2,3,5,6] -PRETRAINED_CONTROL=/raid/bac/kaggle/logs/recursion_cell/pretrained_controls/ -for channels in [1,2,3,4,5]; do +PRETRAINED_CONTROL=/logs/pretrained_controls/ +model=se_resnext50_32x4d +for channels in [1,2,3,4,5] [1,2,3,4,6] [1,2,3,5,6] [1,2,4,5,6] [1,3,4,5,6] [2,3,4,5,6]; do for fold in 0 1 2 3 4; do - LOGDIR=/raid/bac/kaggle/logs/recursion_cell/normal_from_control/$channels/fold_$fold/densenet121/ + TRAIN_CSV=./csv/train_$fold.csv + VALID_CSV=./csv/valid_$fold.csv + LOGDIR=/logs/non_pseudo/$channels/fold_$fold/$model/ catalyst-dl run \ --config=./configs/${RUN_CONFIG} \ --logdir=$LOGDIR \ --out_dir=$LOGDIR:str \ --stages/data_params/channels=$channels:list \ - --stages/data_params/train_csv=./csv/train_$fold.csv:str \ - --stages/data_params/valid_csv=./csv/valid_$fold.csv:str \ - --model_params/weight=$PRETRAINED_CONTROL/$channels/densenet121/checkpoints/best.pth:str \ + --stages/data_params/train_csv=$TRAIN_CSV:str \ + --stages/data_params/valid_csv=$VALID_CSV:str \ + --model_params/weight=$PRETRAINED_CONTROL/$channels/$model/checkpoints/best.pth:str \ --verbose done done \ No newline at end of file diff --git a/bin/train_control.sh b/bin/train_control.sh index ec8b692..204cf4d 100755 --- a/bin/train_control.sh +++ b/bin/train_control.sh @@ -1,16 +1,18 @@ #!/usr/bin/env bash -export CUDA_VISIBLE_DEVICES=0,1 +export CUDA_VISIBLE_DEVICES=2,3 RUN_CONFIG=config_control.yml #[1,2,4,5,6] [1,3,4,5,6] [2,3,4,5,6] -for channels in [1,2,3,4,5]; do - LOGDIR=/raid/bac/kaggle/logs/recursion_cell/pretrained_controls/$channels/se_resnext101_32x4d/ +model_name=se_resnext50_32x4d +for channels in [1,2,3,4,5] [1,2,3,4,6] [1,2,3,5,6] [1,2,4,5,6] [1,3,4,5,6] [2,3,4,5,6]; do + LOGDIR=/logs/pretrained_controls/${channels}/${model_name}/ catalyst-dl run \ --config=./configs/${RUN_CONFIG} \ --logdir=$LOGDIR \ --out_dir=$LOGDIR:str \ --stages/data_params/channels=$channels:list \ + --model_params/model_name=${model_name}:str \ --verbose done \ No newline at end of file diff --git a/bin/train_ds.sh b/bin/train_ds.sh deleted file mode 100755 index e688bd4..0000000 --- a/bin/train_ds.sh +++ /dev/null @@ -1,19 +0,0 @@ -#!/usr/bin/env bash - -export CUDA_VISIBLE_DEVICES=2,3 -RUN_CONFIG=config_ds.yml - - -for channels in [1,2,3,4,5,6]; do - for fold in 0; do - LOGDIR=/raid/bac/kaggle/logs/recursion_cell/DS/fold_$fold/DSResnet/ - catalyst-dl run \ - --config=./configs/${RUN_CONFIG} \ - --logdir=$LOGDIR \ - --out_dir=$LOGDIR:str \ - --stages/data_params/channels=$channels:list \ - --stages/data_params/train_csv=./csv/train_$fold.csv:str \ - --stages/data_params/valid_csv=./csv/valid_$fold.csv:str \ - --verbose - done -done \ No newline at end of file diff --git a/bin/train_multi_channels.sh b/bin/train_multi_channels.sh deleted file 
mode 100755 index 6629934..0000000 --- a/bin/train_multi_channels.sh +++ /dev/null @@ -1,20 +0,0 @@ -#!/usr/bin/env bash - -export CUDA_VISIBLE_DEVICES=2,3 -RUN_CONFIG=config.yml - - -# for channels in [1,2,3,5] [1,2,3,6] [1,2,4,5] [1,2,4,6] [1,2,5,6] [1,3,4,5] [1,3,4,6] [1,3,5,6] [1,4,5,6] [2,3,4,5] [2,3,4,6] [2,3,5,6] [2,4,5,6] [3,4,5,6]; do -for channels in [1,2,3,4]; do - for fold in 1 2 3; do - LOGDIR=/raid/bac/kaggle/logs/recursion_cell/search_channels/fold_$fold/$channels/se_resnext50_32x4d/ - catalyst-dl run \ - --config=./configs/${RUN_CONFIG} \ - --logdir=$LOGDIR \ - --out_dir=$LOGDIR:str \ - --stages/data_params/channels=$channels:list \ - --stages/data_params/train_csv=./csv/train_$fold.csv:str \ - --stages/data_params/valid_csv=./csv/valid_$fold.csv:str \ - --verbose - done -done \ No newline at end of file diff --git a/bin/train_pseudo.sh b/bin/train_pseudo.sh index 3f25bb4..aa4fcc9 100755 --- a/bin/train_pseudo.sh +++ b/bin/train_pseudo.sh @@ -1,60 +1,24 @@ #!/usr/bin/env bash -export CUDA_VISIBLE_DEVICES=2,3 +export CUDA_VISIBLE_DEVICES=0,1 RUN_CONFIG=config_pseudo.yml -#export MASTER_PORT=9669 -#export MASTER_ADDR="127.0.0.1" -#export WORLD_SIZE=2 -#export RANK=0 - -PRETRAINED_CONTROL=/raid/bac/kaggle/logs/recursion_cell/pretrained_controls/ +PRETRAINED_CONTROL=/logs/pretrained_controls/ model=se_resnext50_32x4d -for channels in [1,2,3,4,5]; do - for fold in 0 1 2; do - LOGDIR=/raid/bac/kaggle/logs/recursion_cell/pseudoall2_from_control/$channels/fold_$fold/$model/ +for channels in [1,2,3,4,5] [1,2,3,4,6] [1,2,3,5,6] [1,2,4,5,6] [1,3,4,5,6] [2,3,4,5,6]; do + for fold in 0 1 2 3 4; do + TRAIN_CSV=./csv/pseudo/train_$fold.csv + VALID_CSV=./csv/pseudo/valid_$fold.csv + LOGDIR=/logs/pseudo/$channels/fold_$fold/$model/ catalyst-dl run \ --config=./configs/${RUN_CONFIG} \ --logdir=$LOGDIR \ --out_dir=$LOGDIR:str \ --stages/data_params/channels=$channels:list \ - --stages/data_params/train_csv=./csv/pseudo_all2/train_$fold.csv:str \ - --stages/data_params/valid_csv=./csv/pseudo_all2/valid_$fold.csv:str \ + --stages/data_params/train_csv=$TRAIN_CSV:str \ + --stages/data_params/valid_csv=$VALID_CSV:str \ --model_params/weight=$PRETRAINED_CONTROL/$channels/$model/checkpoints/best.pth:str \ --verbose - done -done - - -#PRETRAINED_CONTROL=/raid/bac/kaggle/logs/recursion_cell/pretrained_controls/ -#model=se_resnext50_32x4d -#for channels in [1,2,3,4,5]; do -# for fold in 0; do -# LOGDIR=/raid/bac/kaggle/logs/recursion_cell/pseudo_from_control_sync/$channels/fold_$fold/$model/ -# RANK=0 LOCAL_RANK=0 catalyst-dl run \ -# --config=./configs/${RUN_CONFIG} \ -# --logdir=$LOGDIR \ -# --out_dir=$LOGDIR:str \ -# --stages/data_params/channels=$channels:list \ -# --stages/data_params/train_csv=./csv/pseudo/train_$fold.csv:str \ -# --stages/data_params/valid_csv=./csv/pseudo/valid_$fold.csv:str \ -# --model_params/weight=$PRETRAINED_CONTROL/$channels/$model/checkpoints/best.pth:str \ -# --verbose \ -# --distributed_params/rank=0:int -# -# sleep 5 -# -# RANK=1 LOCAL_RANK=1 catalyst-dl run \ -# --config=./configs/${RUN_CONFIG} \ -# --logdir=$LOGDIR \ -# --out_dir=$LOGDIR:str \ -# --stages/data_params/channels=$channels:list \ -# --stages/data_params/train_csv=./csv/pseudo/train_$fold.csv:str \ -# --stages/data_params/valid_csv=./csv/pseudo/valid_$fold.csv:str \ -# --model_params/weight=$PRETRAINED_CONTROL/$channels/$model/checkpoints/best.pth:str \ -# --verbose \ -# --distributed_params/rank=1:int -# done -#done \ No newline at end of file +done \ No newline at end of file diff --git 
a/bin/train_pseudo_2.sh b/bin/train_pseudo_2.sh deleted file mode 100755 index afb7b96..0000000 --- a/bin/train_pseudo_2.sh +++ /dev/null @@ -1,23 +0,0 @@ -#!/usr/bin/env bash - -export CUDA_VISIBLE_DEVICES=0,1 -RUN_CONFIG=config_pseudo.yml - - -PRETRAINED_CONTROL=/raid/bac/kaggle/logs/recursion_cell/pretrained_controls/ -model=se_resnext50_32x4d -for channels in [1,2,3,4,5]; do - for fold in 3 4; do - LOGDIR=/raid/bac/kaggle/logs/recursion_cell/pseudoall_from_control/$channels/fold_$fold/$model/ - catalyst-dl run \ - --config=./configs/${RUN_CONFIG} \ - --logdir=$LOGDIR \ - --out_dir=$LOGDIR:str \ - --stages/data_params/channels=$channels:list \ - --stages/data_params/train_csv=./csv/pseudo_all/train_$fold.csv:str \ - --stages/data_params/valid_csv=./csv/pseudo_all/valid_$fold.csv:str \ - --model_params/weight=$PRETRAINED_CONTROL/$channels/$model/checkpoints/best.pth:str \ - --verbose - - done -done \ No newline at end of file diff --git a/configs/config.yml b/configs/config.yml index 87ccb16..eb2227a 100644 --- a/configs/config.yml +++ b/configs/config.yml @@ -1,9 +1,8 @@ model_params: - model: cell_densenet + model: cell_senet n_channels: 5 num_classes: 1108 - model_name: "densenet121" -# pretrained: "/raid/bac/pretrained_models/pytorch/fishnet150_ckpt.tar" + model_name: "se_resnext50_32x4d" args: expdir: "src" @@ -20,19 +19,18 @@ stages: minimize_metric: False criterion_params: -# criterion: CrossEntropyLoss criterion: LabelSmoothingCrossEntropy data_params: batch_size: 64 num_workers: 8 drop_last: False - # drop_last: True image_size: &image_size 512 train_csv: "./csv/train_0.csv" valid_csv: "./csv/valid_0.csv" - root: "/raid/data/kaggle/recursion-cellular-image-classification/" + dataset: "non_pseudo" + root: "/data/" sites: [1] channels: [1,2,3,4,5,6] @@ -52,8 +50,6 @@ stages: callbacks_params: &callback_params loss: - # callback: CriterionCallback -# callback: TwoHeadsCriterionCallback callback: LabelSmoothCriterionCallback optimizer: callback: OptimizerCallback @@ -66,10 +62,6 @@ stages: reduce_metric: *reduce_metric saver: callback: CheckpointCallback -# slack: -# callback: SlackLogger -# channel: logs_cell -# url: "https://hooks.slack.com/services/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" stage1: @@ -81,7 +73,6 @@ stages: scheduler: OneCycleLR num_steps: &num_epochs 40 lr_range: [0.0005, 0.00001] - # lr_range: [0.0015, 0.00003] warmup_steps: 5 momentum_range: [0.85, 0.95] diff --git a/configs/config_control.yml b/configs/config_control.yml index a69e280..d1812ce 100644 --- a/configs/config_control.yml +++ b/configs/config_control.yml @@ -25,14 +25,13 @@ stages: batch_size: 32 num_workers: 8 drop_last: False - # drop_last: True image_size: &image_size 512 train_csv: "./csv/train_0.csv" valid_csv: "./csv/valid_0.csv" dataset: "control" site_mode: "two" - root: "/raid/data/kaggle/recursion-cellular-image-classification/" + root: "/data/" sites: [1] channels: [1,2,3,4,5,6] @@ -52,8 +51,6 @@ stages: callbacks_params: &callback_params loss: - # callback: CriterionCallback -# callback: TwoHeadsCriterionCallback callback: LabelSmoothCriterionCallback optimizer: callback: OptimizerCallback @@ -66,10 +63,6 @@ stages: reduce_metric: *reduce_metric saver: callback: CheckpointCallback -# slack: -# callback: SlackLogger -# channel: logs_cell -# url: "https://hooks.slack.com/services/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" stage1: @@ -81,7 +74,6 @@ stages: scheduler: OneCycleLR num_steps: &num_epochs 10 lr_range: [0.0005, 0.00001] - # lr_range: [0.0015, 0.00003] warmup_steps: 5 
momentum_range: [0.85, 0.95] diff --git a/configs/config_ds.yml b/configs/config_ds.yml deleted file mode 100644 index 595acea..0000000 --- a/configs/config_ds.yml +++ /dev/null @@ -1,86 +0,0 @@ -model_params: - model: DSResnet - n_channels: 6 - num_classes: 1108 - -args: - expdir: "src" - logdir: &logdir "./logs/cell" - baselogdir: "./logs/cell" - -distributed_params: - opt_level: O1 - -stages: - - state_params: - main_metric: &reduce_metric acc_final - minimize_metric: False - - criterion_params: -# criterion: CrossEntropyLoss - criterion: LabelSmoothingCrossEntropy - - data_params: - batch_size: 64 - num_workers: 8 - drop_last: False - # drop_last: True - - image_size: &image_size 512 - train_csv: "./csv/kfold5/train_0.csv" - valid_csv: "./csv/kfold5/valid_0.csv" - root: "/raid/data/kaggle/recursion-cellular-image-classification/" - sites: [1] - channels: [1,2,3,4,5,6] - - stage0: - - optimizer_params: - optimizer: Nadam - lr: 0.001 - - scheduler_params: - scheduler: MultiStepLR - milestones: [10] - gamma: 0.3 - - state_params: - num_epochs: 2 - - callbacks_params: &callback_params - loss: - callback: DSMixupCallback - loss_weights: [0.3, 0.3, 1.0] - fields: ["images"] - optimizer: - callback: OptimizerCallback - accumulation_steps: 2 - accuracy: - callback: DSAccuracyCallback - logit_names: ["m2", "m3", "final"] - scheduler: - callback: SchedulerCallback - reduce_metric: *reduce_metric - mode: "batch" - saver: - callback: CheckpointCallback - - stage1: - - optimizer_params: - optimizer: Nadam - lr: 0.0001 - - scheduler_params: - scheduler: OneCycleLR - num_steps: &num_epochs 40 - lr_range: [0.0005, 0.00001] - # lr_range: [0.0015, 0.00003] - warmup_steps: 5 - momentum_range: [0.85, 0.95] - - state_params: - num_epochs: *num_epochs - - callbacks_params: *callback_params \ No newline at end of file diff --git a/configs/config_pseudo.yml b/configs/config_pseudo.yml index 572e8ce..5a71ed1 100644 --- a/configs/config_pseudo.yml +++ b/configs/config_pseudo.yml @@ -3,13 +3,11 @@ model_params: n_channels: 5 num_classes: 1108 model_name: "se_resnext50_32x4d" - weight: "/raid/bac/kaggle/logs/recursion_cell/test/190826/controls/se_resnext50_32x4d/checkpoints/best.pth" args: expdir: "src" logdir: &logdir "./logs/cell" baselogdir: "./logs/cell" - # resume: "/raid/bac/kaggle/logs/recursion_cell/test/190826/pseudo_from_control/fold_0/se_resnext50_32x4d/checkpoints/best.pth" distributed_params: opt_level: O1 @@ -21,20 +19,18 @@ stages: minimize_metric: False criterion_params: -# criterion: CrossEntropyLoss criterion: LabelSmoothingCrossEntropy data_params: batch_size: 64 num_workers: 8 drop_last: False - # drop_last: True image_size: &image_size 512 train_csv: "./csv/train_0.csv" valid_csv: "./csv/valid_0.csv" dataset: "pseudo" - root: "/raid/data/kaggle/recursion-cellular-image-classification/" + root: "/data/" sites: [1] channels: [1,2,3,4,5,6] @@ -54,8 +50,6 @@ stages: callbacks_params: &callback_params loss: - # callback: CriterionCallback -# callback: TwoHeadsCriterionCallback callback: LabelSmoothCriterionCallback optimizer: callback: OptimizerCallback @@ -68,10 +62,6 @@ stages: reduce_metric: *reduce_metric saver: callback: CheckpointCallback -# slack: -# callback: SlackLogger -# channel: logs_cell -# url: "https://hooks.slack.com/services/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" stage1: @@ -83,7 +73,6 @@ stages: scheduler: OneCycleLR num_steps: &num_epochs 40 lr_range: [0.0005, 0.00001] - # lr_range: [0.0015, 0.00003] warmup_steps: 5 momentum_range: [0.85, 0.95] diff --git 
a/docker/Dockerfile b/docker/Dockerfile new file mode 100644 index 0000000..d3f5ede --- /dev/null +++ b/docker/Dockerfile @@ -0,0 +1,39 @@ +FROM pytorch/pytorch:1.1.0-cuda10.0-cudnn7.5-devel + +RUN apt-get update && apt-get install -y \ + build-essential \ + libsm6 \ + libxext6 \ + libfontconfig1 \ + libxrender1 \ + libswscale-dev \ + libtbb2 \ + libtbb-dev \ + libjpeg-dev \ + libpng-dev \ + libtiff-dev \ + libjasper-dev \ + libavformat-dev \ + libpq-dev \ + libturbojpeg \ + software-properties-common \ + && apt-get clean \ + && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* + +RUN pip install catalyst==19.06.5 +RUN pip install albumentations==0.3.0 +RUN pip install cnn_finetune==0.5.3 +RUN pip install timm +RUN pip install click +RUN pip install pandas + +WORKDIR /tmp/unique_for_apex +# uninstall Apex if present, twice to make absolutely sure :) +RUN pip uninstall -y apex || : +RUN pip uninstall -y apex || : +# SHA is something the user can touch to force recreation of this Docker layer, +# and therefore force cloning of the latest version of Apex +RUN SHA=ToUcHMe git clone https://github.com/NVIDIA/apex.git +WORKDIR /tmp/unique_for_apex/apex +RUN pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" . +WORKDIR /workspace diff --git a/images/pipeline.png b/images/pipeline.png new file mode 100644 index 0000000..aba2cd1 Binary files /dev/null and b/images/pipeline.png differ diff --git a/output.json b/output.json new file mode 100644 index 0000000..665dde3 --- /dev/null +++ b/output.json @@ -0,0 +1 @@ +{"test_plate_id_to_group_id": {"HEPG2-08_1": 3, "HEPG2-08_2": 0, "HEPG2-08_3": 2, "HEPG2-08_4": 1, "HEPG2-09_1": 1, "HEPG2-09_2": 3, "HEPG2-09_3": 0, "HEPG2-09_4": 2, "HEPG2-10_1": 0, "HEPG2-10_2": 2, "HEPG2-10_3": 1, "HEPG2-10_4": 3, "HEPG2-11_1": 0, "HEPG2-11_2": 2, "HEPG2-11_3": 1, "HEPG2-11_4": 3, "HUVEC-17_1": 0, "HUVEC-17_2": 2, "HUVEC-17_3": 1, "HUVEC-17_4": 3, "HUVEC-18_1": 0, "HUVEC-18_2": 2, "HUVEC-18_3": 1, "HUVEC-18_4": 3, "HUVEC-19_1": 2, "HUVEC-19_2": 1, "HUVEC-19_3": 3, "HUVEC-19_4": 0, "HUVEC-20_1": 2, "HUVEC-20_2": 1, "HUVEC-20_3": 3, "HUVEC-20_4": 0, "HUVEC-21_1": 3, "HUVEC-21_2": 0, "HUVEC-21_3": 2, "HUVEC-21_4": 1, "HUVEC-22_1": 0, "HUVEC-22_2": 2, "HUVEC-22_3": 1, "HUVEC-22_4": 3, "HUVEC-23_1": 0, "HUVEC-23_2": 2, "HUVEC-23_3": 1, "HUVEC-23_4": 3, "HUVEC-24_1": 3, "HUVEC-24_2": 0, "HUVEC-24_3": 2, "HUVEC-24_4": 1, "RPE-08_1": 1, "RPE-08_2": 3, "RPE-08_3": 0, "RPE-08_4": 2, "RPE-09_1": 0, "RPE-09_2": 2, "RPE-09_3": 1, "RPE-09_4": 3, "RPE-10_1": 0, "RPE-10_2": 2, "RPE-10_3": 1, "RPE-10_4": 3, "RPE-11_1": 0, "RPE-11_2": 2, "RPE-11_3": 1, "RPE-11_4": 3, "U2OS-04_1": 2, "U2OS-04_2": 1, "U2OS-04_3": 3, "U2OS-04_4": 0, "U2OS-05_1": 3, "U2OS-05_2": 0, "U2OS-05_3": 2, "U2OS-05_4": 1}, "label_group_list": [[1, 3, 5, 6, 8, 10, 26, 33, 40, 46, 48, 55, 56, 59, 70, 74, 77, 80, 88, 89, 91, 94, 102, 105, 109, 112, 113, 124, 126, 133, 136, 144, 147, 155, 157, 158, 162, 172, 177, 181, 188, 192, 195, 203, 211, 212, 217, 221, 222, 224, 238, 239, 242, 253, 254, 256, 258, 266, 268, 272, 274, 276, 277, 281, 282, 287, 290, 293, 295, 296, 301, 302, 308, 309, 312, 313, 320, 324, 339, 340, 341, 345, 346, 348, 350, 352, 355, 360, 361, 366, 371, 372, 373, 379, 385, 388, 389, 391, 392, 395, 397, 398, 399, 400, 411, 413, 415, 417, 431, 436, 439, 442, 445, 450, 456, 459, 460, 471, 478, 481, 482, 486, 489, 497, 500, 503, 505, 506, 508, 513, 514, 523, 532, 537, 538, 541, 543, 547, 549, 552, 555, 556, 557, 559, 565, 566, 567, 572, 576, 577, 581, 582, 585, 588, 598, 604, 
605, 606, 607, 611, 613, 618, 620, 622, 624, 626, 647, 653, 655, 657, 660, 670, 673, 676, 681, 687, 689, 692, 695, 699, 700, 703, 704, 705, 708, 715, 734, 742, 743, 744, 745, 750, 757, 760, 762, 764, 765, 770, 775, 784, 786, 791, 793, 797, 798, 806, 810, 811, 825, 828, 831, 838, 840, 845, 847, 853, 868, 870, 875, 877, 881, 883, 884, 889, 890, 891, 894, 897, 900, 907, 910, 914, 921, 938, 939, 945, 950, 959, 960, 962, 972, 974, 978, 981, 982, 983, 984, 986, 987, 1006, 1008, 1013, 1015, 1020, 1023, 1025, 1027, 1028, 1031, 1039, 1041, 1042, 1056, 1060, 1062, 1065, 1066, 1067, 1076, 1084, 1089, 1096, 1100, 1101, 1102, 1104, 1106], [4, 21, 24, 29, 32, 34, 35, 38, 39, 49, 51, 53, 62, 65, 76, 84, 86, 90, 96, 98, 99, 114, 115, 119, 120, 121, 122, 123, 125, 127, 129, 134, 135, 142, 145, 148, 151, 159, 163, 167, 168, 170, 171, 173, 174, 180, 183, 184, 185, 186, 190, 191, 197, 198, 210, 214, 215, 229, 234, 237, 244, 246, 248, 250, 252, 255, 259, 265, 267, 271, 278, 279, 283, 289, 291, 300, 307, 310, 311, 314, 319, 322, 326, 335, 337, 338, 347, 359, 362, 363, 365, 368, 369, 375, 377, 380, 382, 387, 394, 396, 401, 402, 405, 410, 422, 432, 435, 437, 438, 443, 444, 446, 448, 449, 453, 463, 464, 465, 469, 477, 480, 484, 485, 487, 490, 491, 494, 501, 504, 510, 512, 515, 516, 520, 525, 527, 530, 533, 545, 546, 548, 551, 554, 564, 570, 578, 583, 586, 590, 593, 595, 599, 603, 608, 614, 616, 617, 625, 627, 629, 632, 633, 636, 637, 643, 644, 646, 648, 649, 652, 658, 664, 665, 666, 671, 672, 674, 677, 678, 679, 686, 690, 691, 707, 713, 716, 718, 719, 727, 738, 740, 748, 749, 755, 759, 763, 768, 769, 774, 777, 782, 795, 796, 800, 802, 805, 807, 808, 809, 815, 816, 818, 819, 820, 821, 829, 834, 842, 848, 858, 860, 861, 863, 864, 866, 869, 873, 886, 888, 895, 901, 908, 916, 919, 922, 926, 933, 936, 937, 940, 942, 943, 944, 947, 955, 961, 963, 969, 970, 977, 989, 990, 992, 995, 999, 1000, 1007, 1012, 1014, 1018, 1029, 1034, 1051, 1053, 1057, 1059, 1068, 1071, 1074, 1079, 1085, 1086, 1087, 1091, 1093, 1099, 1105], [2, 7, 11, 19, 20, 22, 25, 27, 28, 41, 42, 43, 44, 47, 52, 57, 60, 61, 63, 64, 66, 72, 75, 78, 81, 83, 87, 92, 93, 95, 100, 101, 108, 111, 117, 118, 130, 131, 132, 138, 146, 149, 153, 160, 164, 165, 166, 169, 175, 178, 182, 189, 194, 196, 206, 207, 216, 218, 219, 225, 228, 232, 233, 236, 245, 257, 262, 263, 269, 284, 285, 299, 303, 305, 315, 317, 327, 329, 330, 333, 334, 336, 342, 349, 351, 353, 357, 364, 376, 381, 383, 384, 390, 393, 404, 407, 408, 409, 412, 414, 421, 423, 424, 425, 428, 429, 434, 447, 452, 454, 457, 461, 462, 466, 467, 473, 475, 479, 483, 492, 495, 499, 502, 507, 511, 517, 521, 529, 535, 536, 539, 542, 544, 550, 553, 558, 562, 563, 569, 571, 579, 580, 584, 597, 600, 601, 615, 623, 639, 640, 642, 650, 651, 654, 659, 661, 662, 667, 669, 682, 683, 685, 694, 697, 701, 706, 709, 714, 717, 721, 722, 723, 724, 726, 729, 730, 731, 732, 733, 735, 736, 739, 747, 751, 752, 753, 754, 761, 766, 776, 779, 781, 783, 787, 788, 789, 790, 801, 804, 813, 826, 827, 830, 837, 844, 849, 851, 852, 856, 857, 859, 865, 867, 871, 874, 876, 879, 887, 893, 896, 898, 902, 903, 905, 909, 913, 915, 917, 918, 920, 923, 927, 932, 934, 941, 946, 951, 952, 956, 957, 965, 966, 967, 968, 975, 988, 991, 996, 1002, 1003, 1016, 1017, 1019, 1024, 1030, 1032, 1033, 1035, 1040, 1043, 1044, 1046, 1047, 1049, 1052, 1058, 1063, 1064, 1070, 1077, 1078, 1080, 1083, 1088, 1090, 1092, 1098], [0, 9, 12, 13, 14, 15, 16, 17, 18, 23, 30, 31, 36, 37, 45, 50, 54, 58, 67, 68, 69, 71, 73, 79, 82, 85, 97, 103, 104, 106, 107, 110, 116, 128, 137, 
139, 140, 141, 143, 150, 152, 154, 156, 161, 176, 179, 187, 193, 199, 200, 201, 202, 204, 205, 208, 209, 213, 220, 223, 226, 227, 230, 231, 235, 240, 241, 243, 247, 249, 251, 260, 261, 264, 270, 273, 275, 280, 286, 288, 292, 294, 297, 298, 304, 306, 316, 318, 321, 323, 325, 328, 331, 332, 343, 344, 354, 356, 358, 367, 370, 374, 378, 386, 403, 406, 416, 418, 419, 420, 426, 427, 430, 433, 440, 441, 451, 455, 458, 468, 470, 472, 474, 476, 488, 493, 496, 498, 509, 518, 519, 522, 524, 526, 528, 531, 534, 540, 560, 561, 568, 573, 574, 575, 587, 589, 591, 592, 594, 596, 602, 609, 610, 612, 619, 621, 628, 630, 631, 634, 635, 638, 641, 645, 656, 663, 668, 675, 680, 684, 688, 693, 696, 698, 702, 710, 711, 712, 720, 725, 728, 737, 741, 746, 756, 758, 767, 771, 772, 773, 778, 780, 785, 792, 794, 799, 803, 812, 814, 817, 822, 823, 824, 832, 833, 835, 836, 839, 841, 843, 846, 850, 854, 855, 862, 872, 878, 880, 882, 885, 892, 899, 904, 906, 911, 912, 924, 925, 928, 929, 930, 931, 935, 948, 949, 953, 954, 958, 964, 971, 973, 976, 979, 980, 985, 993, 994, 997, 998, 1001, 1004, 1005, 1009, 1010, 1011, 1021, 1022, 1026, 1036, 1037, 1038, 1045, 1048, 1050, 1054, 1055, 1061, 1069, 1072, 1073, 1075, 1081, 1082, 1094, 1095, 1097, 1103, 1107]]} \ No newline at end of file diff --git a/preprocessing/combinations.py b/preprocessing/combinations.py deleted file mode 100644 index ded9d4f..0000000 --- a/preprocessing/combinations.py +++ /dev/null @@ -1,6 +0,0 @@ -from itertools import combinations - -a = [1, 2, 3, 4, 5, 6] -result = [list(i) for i in combinations(a, 3)] -print(result) -print(len(result)) \ No newline at end of file diff --git a/preprocessing/convert_weight.py b/preprocessing/convert_weight.py deleted file mode 100644 index e512209..0000000 --- a/preprocessing/convert_weight.py +++ /dev/null @@ -1,6 +0,0 @@ -import torch - -import pdb -pdb.set_trace() -checkpoint = torch.load("../pretrained/results/se_resnext50.attention.per_image_norm.1024/checkpoint/swa.10.022.pth", map_location='cpu') - diff --git a/preprocessing/image_to_arr.py b/preprocessing/image_to_arr.py deleted file mode 100644 index 60c62dd..0000000 --- a/preprocessing/image_to_arr.py +++ /dev/null @@ -1,109 +0,0 @@ -import pandas as pd -import numpy as np -import cv2 -import os - -import click -from tqdm import * - - -def load_image(path): - image = cv2.imread(path, 0) - return image - - -def image_path(dataset, - experiment, - plate, - address, - site, - channel, - base_path): - """ - Returns the path of a channel image. 
- Parameters - ---------- - dataset : str - what subset of the data: train, test - experiment : str - experiment name - plate : int - plate number - address : str - plate address - site : int - sites number - channel : int - channel number - base_path : str - the base path of the raw images - Returns - ------- - str the path of image - """ - return os.path.join(base_path, dataset, experiment, "Plate{}".format(plate), - "{}_s{}_w{}.png".format(address, site, channel)) - - -def load_images_as_tensor(image_paths, dtype=np.uint8): - n_channels = len(image_paths) - - data = np.ndarray(shape=(512, 512, n_channels), dtype=dtype) - - for ix, img_path in enumerate(image_paths): - data[:, :, ix] = load_image(img_path) - - return data - - -@click.group() -def cli(): - print("Convert images to array") - - -@cli.command() -@click.option('--csv', type=str) -@click.option('--base_path', type=str) -@click.option('--output', type=str) -@click.option('--dataset', type=str) -def image_to_arr( - csv, - base_path, - output, - dataset, - ): - channels = [1, 2, 3, 4, 5, 6] - df = pd.read_csv(csv) - experiments = df['experiment'].values - plates = df['plate'].values - wells = df['well'].values - - import pdb - pdb.set_trace() - - for experiment, plate, well in tqdm(zip(experiments, plates, wells), total=len(experiments)): - for site in [1, 2]: - channel_paths = [ - image_path( - dataset=dataset, - experiment=experiment, - plate=plate, - address=well, - channel=channel, - site=site, - base_path=base_path, - ) for channel in channels - ] - image = load_images_as_tensor(channel_paths, dtype=np.float32) - os.makedirs( - os.path.join(output, dataset, experiment, "Plate{}".format(plate)), - exist_ok=True - ) - np.save( - os.path.join(output, dataset, experiment, "Plate{}".format(plate), "{}_s{}.npy".format(well, site)), - image - ) - - -if __name__ == '__main__': - cli() diff --git a/src/__init__.py b/src/__init__.py index 335376e..1b36341 100644 --- a/src/__init__.py +++ b/src/__init__.py @@ -6,35 +6,15 @@ from losses import * from callbacks import * from optimizers import * -from schedulers import * # Register models registry.Model(ResNet) registry.Model(cell_senet) registry.Model(cell_densenet) -registry.Model(SENetGrouplevel) -registry.Model(EfficientNet) -registry.Model(SENetTIMM) -registry.Model(InceptionV3TIMM) -registry.Model(GluonResnetTIMM) -registry.Model(DSInceptionV3) -registry.Model(DSSENet) -registry.Model(DSResnet) -registry.Model(ResNet50CutMix) -registry.Model(Fishnet) -registry.Model(SENetCellType) -registry.Model(SENetCellMultipleDropout) -registry.Model(MixNet) # Register callbacks registry.Callback(LabelSmoothCriterionCallback) -registry.Callback(SmoothMixupCallback) -registry.Callback(DSAccuracyCallback) -registry.Callback(DSCriterionCallback) -registry.Callback(SlackLogger) -registry.Callback(TwoHeadsCriterionCallback) -registry.Callback(DSMixupCallback) # Register criterions registry.Criterion(LabelSmoothingCrossEntropy) @@ -42,6 +22,4 @@ # Register optimizers registry.Optimizer(AdamW) registry.Optimizer(Nadam) -registry.Optimizer(RAdam) - -registry.Scheduler(CyclicLRFix) \ No newline at end of file +registry.Optimizer(RAdam) \ No newline at end of file diff --git a/src/callbacks.py b/src/callbacks.py index 4fd5638..d2cc6a5 100644 --- a/src/callbacks.py +++ b/src/callbacks.py @@ -1,54 +1,5 @@ from catalyst.dl.core import Callback, RunnerState -from catalyst.dl.callbacks import CriterionCallback -from catalyst.dl.utils.criterion import accuracy -import torch import torch.nn as nn -import numpy 
as np -from typing import List -import logging -from slack_logger import SlackHandler, SlackFormatter - - -class SlackLogger(Callback): - """ - Logger callback, translates state.metrics to console and text file - """ - - def __init__(self, url, channel): - self.logger = None - self.url = url - self.channel = channel - - @staticmethod - def _get_logger(url, channel): - logger = logging.getLogger("metrics") - logger.setLevel(logging.INFO) - - slackhandler = SlackHandler( - username='logger', - icon_emoji=':robot_face:', - url=url, - channel=channel - ) - slackhandler.setLevel(logging.INFO) - - formater = SlackFormatter() - slackhandler.setFormatter(formater) - logger.addHandler(slackhandler) - - return logger - - def on_stage_start(self, state: RunnerState): - self.logger = self._get_logger(self.url, self.channel) - - def on_stage_end(self, state): - self.logger.handlers = [] - - def on_epoch_end(self, state): - pass - # import pdb - # pdb.set_trace() - # self.logger.info("", extra={"state": state}) class LabelSmoothCriterionCallback(Callback): @@ -109,325 +60,3 @@ def on_batch_end(self, state: RunnerState): }) self._add_loss_to_state(state, loss) - - -class SmoothMixupCallback(LabelSmoothCriterionCallback): - """ - Callback to do mixup augmentation. - Paper: https://arxiv.org/abs/1710.09412 - Note: - MixupCallback is inherited from CriterionCallback and - does its work. - You may not use them together. - """ - - def __init__( - self, - fields: List[str] = ("images",), - alpha=0.5, - on_train_only=True, - **kwargs - ): - """ - Args: - fields (List[str]): list of features which must be affected. - alpha (float): beta distribution a=b parameters. - Must be >=0. The more alpha closer to zero - the less effect of the mixup. - on_train_only (bool): Apply to train only. - As the mixup use the proxy inputs, the targets are also proxy. - We are not interested in them, are we? - So, if on_train_only is True, use a standard output/metric - for validation. 
- """ - assert len(fields) > 0, \ - "At least one field for MixupCallback is required" - assert alpha >= 0, "alpha must be>=0" - - super().__init__(**kwargs) - - self.on_train_only = on_train_only - self.fields = fields - self.alpha = alpha - self.lam = 1 - self.index = None - self.is_needed = True - - def on_loader_start(self, state: RunnerState): - self.is_needed = not self.on_train_only or \ - state.loader_name.startswith("train") - - def on_batch_start(self, state: RunnerState): - if not self.is_needed: - return - - if self.alpha > 0: - self.lam = np.random.beta(self.alpha, self.alpha) - else: - self.lam = 1 - - self.index = torch.randperm(state.input[self.fields[0]].shape[0]) - self.index.to(state.device) - - for f in self.fields: - state.input[f] = self.lam * state.input[f] + \ - (1 - self.lam) * state.input[f][self.index] - - def _compute_loss(self, state: RunnerState, criterion): - if not self.is_needed: - return super()._compute_loss(state, criterion) - - pred = state.output[self.output_key] - y_a = state.input[self.input_key] - y_b = state.input[self.input_key][self.index] - - loss = self.lam * criterion(pred, y_a) + \ - (1 - self.lam) * criterion(pred, y_b) - return loss - - -class DSCriterionCallback(Callback): - def __init__( - self, - input_key: str = "targets", - output_key: str = "logits", - prefix: str = "loss", - criterion_key: str = None, - loss_key: str = None, - multiplier: float = 1.0, - loss_weights: List[float] = None, - ): - self.input_key = input_key - self.output_key = output_key - self.prefix = prefix - self.criterion_key = criterion_key - self.loss_key = loss_key - self.multiplier = multiplier - self.loss_weights = loss_weights - - def _add_loss_to_state(self, state: RunnerState, loss): - if self.loss_key is None: - if state.loss is not None: - if isinstance(state.loss, list): - state.loss.append(loss) - else: - state.loss = [state.loss, loss] - else: - state.loss = loss - else: - if state.loss is not None: - assert isinstance(state.loss, dict) - state.loss[self.loss_key] = loss - else: - state.loss = {self.loss_key: loss} - - def _compute_loss(self, state: RunnerState, criterion): - outputs = state.output[self.output_key] - input = state.input[self.input_key] - assert len(self.loss_weights) == len(outputs) - loss = 0 - for i, output in enumerate(outputs): - loss += criterion(output, input) * self.loss_weights[i] - return loss - - def on_stage_start(self, state: RunnerState): - assert state.criterion is not None - - def on_batch_end(self, state: RunnerState): - if state.loader_name.startswith("train"): - criterion = state.get_key( - key="criterion", inner_key=self.criterion_key - ) - else: - criterion = nn.CrossEntropyLoss() - - loss = self._compute_loss(state, criterion) * self.multiplier - - state.metrics.add_batch_value(metrics_dict={ - self.prefix: loss.item(), - }) - - self._add_loss_to_state(state, loss) - - -class DSMixupCallback(DSCriterionCallback): - """ - Callback to do mixup augmentation. - - Paper: https://arxiv.org/abs/1710.09412 - - Note: - MixupCallback is inherited from CriterionCallback and - does its work. - - You may not use them together. - """ - - def __init__( - self, - fields: List[str] = ("features",), - alpha=1.0, - on_train_only=True, - **kwargs - ): - """ - Args: - fields (List[str]): list of features which must be affected. - alpha (float): beta distribution a=b parameters. - Must be >=0. The more alpha closer to zero - the less effect of the mixup. - on_train_only (bool): Apply to train only. 
- As the mixup use the proxy inputs, the targets are also proxy. - We are not interested in them, are we? - So, if on_train_only is True, use a standard output/metric - for validation. - """ - assert len(fields) > 0, \ - "At least one field for MixupCallback is required" - assert alpha >= 0, "alpha must be>=0" - - super().__init__(**kwargs) - - self.on_train_only = on_train_only - self.fields = fields - self.alpha = alpha - self.lam = 1 - self.index = None - self.is_needed = True - - def on_loader_start(self, state: RunnerState): - self.is_needed = not self.on_train_only or \ - state.loader_name.startswith("train") - - def on_batch_start(self, state: RunnerState): - if not self.is_needed: - return - - if self.alpha > 0: - self.lam = np.random.beta(self.alpha, self.alpha) - else: - self.lam = 1 - - self.index = torch.randperm(state.input[self.fields[0]].shape[0]) - self.index.to(state.device) - - for f in self.fields: - state.input[f] = self.lam * state.input[f] + \ - (1 - self.lam) * state.input[f][self.index] - - def _compute_loss(self, state: RunnerState, criterion): - if not self.is_needed: - return super()._compute_loss(state, criterion) - - outputs = state.output[self.output_key] - input_a = state.input[self.input_key] - input_b = state.input[self.input_key][self.index] - assert len(self.loss_weights) == len(outputs) - loss = 0 - for i, output in enumerate(outputs): - loss_ = self.lam * criterion(output, input_a) + \ - (1 - self.lam) * criterion(output, input_b) - loss += loss_ * self.loss_weights[i] - - return loss - - -class TwoHeadsCriterionCallback(Callback): - def __init__( - self, - input_key: str = "targets", - output_key: str = "logits", - prefix: str = "loss", - criterion_key: str = None, - loss_key: str = None, - multiplier: float = 1.0, - loss_weights: List[float] = None, - ): - self.input_key = input_key - self.output_key = output_key - self.prefix = prefix - self.criterion_key = criterion_key - self.loss_key = loss_key - self.multiplier = multiplier - self.loss_weights = loss_weights - - def _add_loss_to_state(self, state: RunnerState, loss): - if self.loss_key is None: - if state.loss is not None: - if isinstance(state.loss, list): - state.loss.append(loss) - else: - state.loss = [state.loss, loss] - else: - state.loss = loss - else: - if state.loss is not None: - assert isinstance(state.loss, dict) - state.loss[self.loss_key] = loss - else: - state.loss = {self.loss_key: loss} - - def _compute_loss(self, state: RunnerState, criterion): - outputs = state.output[self.output_key] - outputs1 = state.output["logits1"] - input_sirna = state.input[self.input_key] - input_cell = state.input['cell_type'] - loss = 0 - - loss += criterion(outputs, input_sirna) - loss += nn.CrossEntropyLoss()(outputs1, input_cell) - - return loss - - def on_stage_start(self, state: RunnerState): - assert state.criterion is not None - - def on_batch_end(self, state: RunnerState): - if state.loader_name.startswith("train"): - criterion = state.get_key( - key="criterion", inner_key=self.criterion_key - ) - else: - criterion = nn.CrossEntropyLoss() - - loss = self._compute_loss(state, criterion) * self.multiplier - - state.metrics.add_batch_value(metrics_dict={ - self.prefix: loss.item(), - }) - - self._add_loss_to_state(state, loss) - - -class DSAccuracyCallback(Callback): - """ - Accuracy metric callback. 
- """ - - def __init__( - self, - input_key: str = "targets", - output_key: str = "logits", - prefix: str = "acc", - logit_names: List[str] = None, - ): - self.prefix = prefix - self.metric_fn = accuracy - self.input_key = input_key - self.output_key = output_key - self.logit_names = logit_names - - def on_batch_end(self, state: RunnerState): - outputs = state.output[self.output_key] - targets = state.input[self.input_key] - - assert len(outputs) == len(self.logit_names) - - batch_metrics = {} - - for logit_name, output in zip(self.logit_names, outputs): - metric = self.metric_fn(output, targets) - key = f"{self.prefix}_{logit_name}" - batch_metrics[key] = metric[0] - - state.metrics.add_batch_value(metrics_dict=batch_metrics) diff --git a/src/dataset.py b/src/dataset.py index 33812e4..b4b7dfe 100644 --- a/src/dataset.py +++ b/src/dataset.py @@ -489,12 +489,26 @@ def __init__(self, channels=[1, 2, 3, 4, 5, 6], site_mode='random' ): + """ + Dataset to train with control images + :param csv_file: Does not matter. + :param root: Data root. + :param transform: albumentation composed functions. + :param sites: Site of image. Can be 1 or 2. + :param mode: Train or validation or test mode. + :param channels: The channels to be used. Ex: [1,2,3,4,5] + :param site_mode: can be ["one", "two", "random"] + If site_mode is "one", dataset load the `sites` of the image. + If site_mode is "two", dataset load both sides of the image. Now param `sites` does not matter. + If site_mode is "random", dataset load random site [1] or [2]. + """ print("Channels ", channels) print("sites ", sites) print(csv_file) assert site_mode in ['random', 'two', 'one'] # df = pd.read_csv(csv_file, nrows=None) df = combine_metadata(base_path=root) + # df = df.sample(1000) df = df[ (df.well_type != 'treatment') & (df.site != 1) ] diff --git a/src/ensemble.py b/src/ensemble.py index 76761db..d2014f1 100644 --- a/src/ensemble.py +++ b/src/ensemble.py @@ -1,146 +1,18 @@ import pandas as pd -import torch import numpy as np -from sklearn.metrics import accuracy_score +import click import json from scipy.optimize import linear_sum_assignment from scipy.special import softmax -from tqdm import tqdm -expect_dist_dict = {} -for i in range(1108): - expect_dist_dict[i] = 1 +@click.group() +def cli(): + print("Ensemble") -def get_jaccard_sim(str1, str2): - a = set(str1.split()) - b = set(str2.split()) - c = a.intersection(b) - return float(len(c)) / (len(a) + len(b) - len(c)) - - -def get_group(sirna_group, group_label_dict): - scores = [] - for k, v in group_label_dict.items(): - scores.append(get_jaccard_sim(sirna_group, v)) - return np.argmax(scores) - - -def get_group_score(sirna_group, group_label_dict): - scores = [] - for k, v in group_label_dict.items(): - scores.append(get_jaccard_sim(sirna_group, v)) - return np.max(scores) - - -from ortools.graph import pywrapgraph -def mcf_cal(X, dict_dist): - X = X / X.sum(axis=0) - m = X * 1000000000 - m = m.astype(np.int64) - nb_rows, nb_classes = X.shape[0], X.shape[1] - mcf = pywrapgraph.SimpleMinCostFlow() - # Suppliers: distribution - for j in range(nb_classes): - mcf.SetNodeSupply(j + nb_rows, int(dict_dist[j])) - # Rows - for i in range(nb_rows): - mcf.SetNodeSupply(i, -1) - for j in range(nb_classes): - mcf.AddArcWithCapacityAndUnitCost(j + nb_rows, i, 1, int(-m[i][j])) - mcf.SolveMaxFlowWithMinCost() - - assignment = np.zeros(nb_rows, dtype=np.int32) - for i in range(mcf.NumArcs()): - if mcf.Flow(i) > 0: - assignment[mcf.Head(i)] = mcf.Tail(i) - nb_rows - return assignment - - 
-public_experiments = [ - "HEPG2-08", - "HUVEC-17", - "RPE-08", - "U2OS-04", -] - - -def leaderboard_jaccard(df, leaderboard='public'): - print() - print("**" * 50) - if leaderboard == 'public': - gb = df[df.experiment.isin(public_experiments)] - else: - gb = df[~df.experiment.isin(public_experiments)] - - avg_jaccard = gb['group_label_score'].mean() - print("Leader board: {}, average jaccard: {}".format(leaderboard, avg_jaccard)) - gb = gb.sort_values(by="group_label_score") - for exp, plate, score in zip(gb.experiment, gb.plate, gb.group_label_score): - print(f"Experiment: {exp}, plate: {plate}, score: {score}") - - -def _get_predicts(predicts, coefficients): - return torch.einsum("ij,j->ij", (predicts, coefficients)) - - -def _get_labels_distribution(predicts, coefficients): - predicts = _get_predicts(predicts, coefficients) - labels = predicts.argmax(dim=-1) - counter = torch.bincount(labels, minlength=predicts.shape[1]) - return counter - - -def _compute_score_with_coefficients(predicts, coefficients): - counter = _get_labels_distribution(predicts, coefficients).float() - counter = counter * 100 / len(predicts) - max_scores = torch.ones(len(coefficients)).cuda().float() * 100 / len(coefficients) - result, _ = torch.min(torch.cat([counter.unsqueeze(0), max_scores.unsqueeze(0)], dim=0), dim=0) - - return float(result.sum().cpu()) - - -def _find_best_coefficients(predicts, coefficients, alpha=0.001, iterations=100): - best_coefficients = coefficients.clone() - best_score = _compute_score_with_coefficients(predicts, coefficients) - - for _ in range(iterations): - counter = _get_labels_distribution(predicts, coefficients) - label = int(torch.argmax(counter).cpu()) - coefficients[label] -= alpha - score = _compute_score_with_coefficients(predicts, coefficients) - if score > best_score: - best_score = score - best_coefficients = coefficients.clone() - - return best_coefficients - - -def pavel_calib(y): - alpha = 0.01 - - coefs = torch.ones(y.shape[1]).cuda().float() - last_score = _compute_score_with_coefficients(y, coefs) - print("Start score", last_score) - - while alpha >= 0.0001: - coefs = _find_best_coefficients(y, coefs, iterations=3000, alpha=alpha) - new_score = _compute_score_with_coefficients(y, coefs) - - if new_score <= last_score: - alpha *= 0.5 - - last_score = new_score - print("Score: {}, alpha: {}".format(last_score, alpha)) - - predicts = _get_predicts(y, coefs) - - return predicts - - -def load_one_fold_6C5(model, fold): - test_pred_6C5 = [] +def load_one_fold(predict_root, model_name, fold): + test_preds = [] for channel in [ "[1,2,3,4,5]", "[1,2,3,4,6]", @@ -149,127 +21,41 @@ def load_one_fold_6C5(model, fold): "[1,3,4,5,6]", "[2,3,4,5,6]", ]: - pred = np.load(f"./prediction_6C5/fold_{fold}/{model}_{channel}_test.npy") - test_pred_6C5.append(pred) - test_pred_6C5 = np.asarray(test_pred_6C5) - test_pred_6C5 = test_pred_6C5.mean(axis=0) - return test_pred_6C5 - - -def load_one_fold_6C6(model, fold): - pred = np.load(f"../prediction_6channels/{model}_6_channel_fold{fold}_Adam.npy") - return pred + pred = np.load(f"{predict_root}/{channel}/fold_{fold}/{model_name}/pred_test.npy") + test_preds.append(pred) + test_preds = np.asarray(test_preds) + test_preds = test_preds.mean(axis=0) + return test_preds -def load_one_fold_6C6_1139(model, fold): - pred = np.load(f"./prediction_6channels_1139/fold_{fold}/{model}_[1,2,3,4,5,6]_test.npy") - return pred[:, :1108] - - -def load_kfold_6C6(model): - pred = 0 - for fold in range(5): - pred += load_one_fold_6C6(model, fold) - return pred 
/ 5 - - -def load_kfold_6C6_1139(model): - pred = 0 +def load_kfold(predict_root, model_name): + preds = 0 for fold in range(5): - pred += load_one_fold_6C6_1139(model, fold) - return pred / 5 - - -def load_kfold_6C5(model): - pred = 0 - for fold in range(5): - pred += load_one_fold_6C5(model, fold) - return pred / 5 - - -def load_pseudo_kfold(): - pred = 0 - for fold in range(5): - pred += np.load(f"./prediction/pseudo/fold_{fold}/se_resnext50_32x4d_[1,2,3,4,5]_test.npy") - pred = pred / 5 - return pred - - -def load_baseline_kfold(): - pred = 0 - for fold in range(5): - pred += np.load(f"./prediction_6C5/fold_{fold}/se_resnext50_32x4d_[1,2,3,4,5]_test.npy") - pred = pred / 5 - return pred - - -if __name__ == '__main__': - model = "se_resnext50_32x4d" - - resnext101_6C4_new = np.load("test_pred_se_resnext101_32x4d_14runs_fold4.npy") - kfold_6channels = load_kfold_6C6(model) - kfold_6C5 = load_kfold_6C5(model) - posneg_6C6_1139 = load_kfold_6C6_1139(model) - posneg_1139 = np.load("../submission/se_resnext50_32x4d_1139classes_1108.npy") - posneg_1139_2 = np.load("../submission/se_resnext50_32x4d_c1234_s1_affine_warmup_1139.npy") - posneg_1139_2 = posneg_1139_2[:, :1108] - - yu4u_logits = 0 - yu4u_logits += np.load("../yu4u_logits/seed_0.npy") - yu4u_logits += np.load("../yu4u_logits/seed_1.npy") - yu4u_logits += np.load("../yu4u_logits/seed_2.npy") - yu4u_logits /= 3 - - kfold_pseudo_12345 = load_pseudo_kfold() - kfold_baseline_12345 = load_baseline_kfold() - - baseline = np.load("./prediction_6C5/fold_0/se_resnext50_32x4d_[1,2,3,4,5]_test.npy") - pseudo_0 = np.load("./submission/se_resnext50_32x4d_pseudo.npy") - dontdrop = np.load("./submission/se_resnext50_32x4d_dont_drop_1.npy") - - test_df = pd.read_csv("/raid/data/kaggle/recursion-cellular-image-classification/test.csv") - test_pred = softmax(kfold_pseudo_12345, axis=1) - - # Load group label - group_labels = np.load("group_labels.npy", allow_pickle=True) - group_label_dict = {} - for i, group_label in enumerate(group_labels): - group_label_dict[i] = group_label - - expect_dist_dict = {} - for i in range(1108): - expect_dist_dict[i] = 1 - - test_exp = test_df.experiment.unique() - test_pred_no = np.zeros((test_df.shape[0],)) - test_pred_cal = np.zeros((test_df.shape[0],)) - - for exp in tqdm.tqdm(test_exp, total=len(test_exp), desc='Experiment level calibration '): - exp_df = test_df[test_df.experiment == exp] - exp_pred = test_pred[exp_df.index] - exp_pred_cls = np.argmax(exp_pred, axis=1) - test_pred_no[exp_df.index] = exp_pred_cls - calib_cls = mcf_cal(exp_pred, expect_dist_dict) - test_pred_cal[exp_df.index] = calib_cls - - test_df["sirna_pred"] = test_pred_no.astype(int) - test_df["sirna_cali"] = test_pred_cal.astype(int) - - gb = test_df.groupby(['plate', 'experiment']).agg({ - 'sirna_cali': ['unique'] - }).reset_index() - gb.columns = ["plate", "experiment", 'sirna_unique'] - gb['sirna_unique'] = gb['sirna_unique'].apply(lambda x: " ".join([str(i) for i in np.sort(x)])) - gb["group_label"] = gb["sirna_unique"].apply(lambda x: get_group(x, group_label_dict)) - gb["group_label_score"] = gb["sirna_unique"].apply(lambda x: get_group_score(x, group_label_dict)) - - # Evaludate public jaccard scores - leaderboard_jaccard(gb, leaderboard='public') - leaderboard_jaccard(gb, leaderboard='private') - - prob = test_pred - - with open('output.json', 'r') as f: + preds += load_one_fold(predict_root, model_name, fold) / 5 + return preds + + +@cli.command() +@click.option('--data_root', type=str, default='/data/') 
+@click.option('--predict_root', type=str, default='/logs/pseudo/') +@click.option('--group_json', type=str, default='group.json') +def ensemble( + data_root='/data/', + predict_root='/logs/pseudo/', + group_json="group.json", +): + model_names = ['se_resnext50_32x4d'] + ensemble_preds = 0 + for model_name in model_names: + ensemble_preds += load_kfold(predict_root, model_name) + + # Just a maigc + ensemble_preds = ensemble_preds / 121 + + test_df = pd.read_csv(f"{data_root}/test.csv") + ensemble_preds = softmax(ensemble_preds, axis=1) + + with open(group_json, 'r') as f: m = json.load(f) id_codes = test_df.id_code.values @@ -287,11 +73,8 @@ def load_baseline_kfold(): test_plate_id = id_codes[start_id][:-4] label_group_id = test_plate_id_to_group_id[test_plate_id] group_labels = label_group_list[label_group_id] - plate_prob = prob[start_id:end_id, group_labels] - # plate_prob = softmax(plate_prob, axis=1) + plate_prob = ensemble_preds[start_id:end_id, group_labels] plate_prob = plate_prob / plate_prob.sum(axis=0, keepdims=True) - # TODO: adjust normalization degree - # print(plate_prob.shape) row_ind, col_ind = linear_sum_assignment(1 - plate_prob) col_ind = np.array(group_labels)[col_ind] sirnas.extend(col_ind) @@ -299,4 +82,8 @@ def load_baseline_kfold(): sub = pd.DataFrame.from_dict( data={"id_code": id_codes, "sirna": sirnas} ) - sub.to_csv("./submission/kfold_pseudo_12345.csv", index=False) + sub.to_csv(f"{predict_root}/submission.csv", index=False) + + +if __name__ == '__main__': + cli() diff --git a/src/experiment.py b/src/experiment.py index 4d3c10c..7498cd4 100644 --- a/src/experiment.py +++ b/src/experiment.py @@ -57,13 +57,13 @@ def get_datasets(self, stage: str, **kwargs): channels = kwargs.get('channels', [1, 2, 3, 4, 5, 6]) site_mode = kwargs.get('site_mode', 'random') root = kwargs.get('root', None) - dataset = kwargs.get('dataset', "normal") + dataset = kwargs.get('dataset', "non_pseudo") if dataset == 'pseudo': dataset_function = RecursionCellularPseudo print("Using pseudo dataset") - elif dataset == 'normal': + elif dataset == 'non_pseudo': dataset_function = RecursionCellularSite - print("Using normal dataset") + print("Using non pseudo dataset") elif dataset == 'control': dataset_function = RecursionCellularControl print("Using Control dataset") diff --git a/src/inference.py b/src/inference.py index 3d8254a..ac74966 100644 --- a/src/inference.py +++ b/src/inference.py @@ -1,12 +1,9 @@ -import pandas as pd import numpy as np import torch import torch.nn as nn -import torch.nn.functional as Ftorch from torch.utils.data import DataLoader import os -import glob import click from tqdm import * @@ -18,6 +15,11 @@ device = torch.device('cuda') +@click.group() +def cli(): + print("Inference") + + def predict(model, loader): model.eval() preds = [] @@ -33,156 +35,85 @@ def predict(model, loader): return preds -def predict_ds(model, loader): - model.eval() - preds = [] - with torch.no_grad(): - for dct in tqdm(loader, total=len(loader)): - images = dct['images'].to(device) - pred = model(images) - pred = [p.detach().cpu().numpy() for p in pred] - preds.append(pred) - - preds = np.concatenate(preds, axis=1) - print(preds.shape) - return preds - - -def predict_all(): - test_csv = '/raid/data/kaggle/recursion-cellular-image-classification/test.csv' - # test_csv = "./csv/pseudo_all/valid_0" - # test_csv = './csv/valid_0.csv' - model_name = 'se_resnext50_32x4d' - - channel_str = "[1,2,3,4,5]" - - scheme = "pseudoall2_from_control" - - for fold in [ - 0 - ]: - - log_dir = 
f"/raid/bac/kaggle/logs/recursion_cell/{scheme}/{channel_str}/fold_{fold}/{model_name}/" - root = "/raid/data/kaggle/recursion-cellular-image-classification/" - sites = [1] - channels = [int(i) for i in channel_str[1:-1].split(',')] - - ckp = os.path.join(log_dir, "checkpoints/best.pth") - model = cell_senet( - model_name=model_name, - num_classes=1108, - n_channels=len(channels) * len(sites), - # weight=f"/raid/bac/kaggle/logs/recursion_cell/test/190826/controls/{model_name}/checkpoints/best.pth", - ) - - checkpoint = f"{log_dir}/checkpoints/best.pth" - checkpoint = torch.load(checkpoint) - model.load_state_dict(checkpoint['model_state_dict']) - model = model.to(device) - model = nn.DataParallel(model) - - print("*" * 50) - print(f"checkpoint: {ckp}") - print(f"Channel: {channel_str}") - preds = [] - for site in [1, 2]: - # Dataset - dataset = RecursionCellularSite( - csv_file=test_csv, - root=root, - transform=valid_aug(512), - mode='test', - sites=[site], - channels=channels, - site_mode="one", - ) - - loader = DataLoader( - dataset=dataset, - batch_size=128, - shuffle=False, - num_workers=4, - ) - - pred = predict(model, loader) - preds.append(pred) - - preds = np.asarray(preds).mean(axis=0) - all_preds = np.argmax(preds, axis=1) - df = pd.read_csv(test_csv) - submission = df.copy() - submission['sirna'] = all_preds.astype(int) - os.makedirs(f"./prediction2/{scheme}/fold_{fold}/", exist_ok=True) - submission.to_csv(f'./prediction2/{scheme}/fold_{fold}/{model_name}_{channel_str}_test.csv', index=False, columns=['id_code', 'sirna']) - np.save(f"./prediction2/{scheme}/fold_{fold}/{model_name}_{channel_str}_test.npy", preds) - +@cli.command() +@click.option('--data_root', type=str, default='/data/') +@click.option('--model_root', type=str, default='/logs/pseudo/') +@click.option('--model_name', type=str, default='se_resnext50_32x4d') +@click.option('--out_dir', type=str, default='./predictions/pseudo/') +def predict_all( + data_root='/data/', + model_root='/logs/pseudo/', + model_name='se_resnext50_32x4d', + out_dir='./predictions/pseudo/' +): + test_csv = f'{data_root}/test.csv' -def predict_deepsupervision(): - test_csv = '/raid/data/kaggle/recursion-cellular-image-classification/test.csv' - # test_csv = './csv/valid_0.csv' - model_name = 'DSInceptionV3' - experiment = '6channels_sgd' + assert model_name in ["se_resnext50_32x4d", "se_resnext101_32x4d", "densenet121"] for channel_str in [ - "[1,2,3,4,5,6]", + "[1,2,3,4,5]", + "[1,2,3,4,6]", + "[1,2,3,5,6]", + "[1,2,4,5,6]", + "[1,3,4,5,6]", + "[2,3,4,5,6]", ]: - - log_dir = f"/raid/bac/kaggle/logs/recursion_cell/test/190731/fold_0/{model_name}/" - root = "/raid/data/kaggle/recursion-cellular-image-classification/" - sites = [1] - channels = [int(i) for i in channel_str[1:-1].split(',')] - - # log_dir = log_dir.replace('[', '[[]') - # log_dir = log_dir.replace(']', '[]]') - - ckp = os.path.join(log_dir, "checkpoints/best.pth") - model = DSInceptionV3( - num_classes=1108, - n_channels=len(channels) * len(sites) - ) - - checkpoint = torch.load(ckp) - model.load_state_dict(checkpoint['model_state_dict']) - model = model.to(device) - # model = nn.DataParallel(model) - - print("*" * 50) - print(f"checkpoint: {ckp}") - print(f"Channel: {channel_str}") - preds = [] - for site in [1, 2]: - # Dataset - dataset = RecursionCellularSite( - csv_file=test_csv, - root=root, - transform=valid_aug(512), - mode='test', - sites=[site], - channels=channels - ) - - loader = DataLoader( - dataset=dataset, - batch_size=128, - shuffle=False, - num_workers=8, - ) 
- - pred = predict_ds(model, loader) - preds.append(pred) - - preds = np.asarray(preds)#.mean(axis=0) - print(preds.shape) - # all_preds = np.argmax(preds, axis=1) - df = pd.read_csv(test_csv) - submission = df.copy() - # submission['sirna'] = all_preds.astype(int) - os.makedirs("./prediction/DS/", exist_ok=True) - # submission.to_csv(f'./prediction/DS/{model_name}_test.csv', index=False, columns=['id_code', 'sirna']) - np.save(f"./prediction/DS/{model_name}_{experiment}.npy", preds) + for fold in [0, 1, 2, 3, 4]: + log_dir = f"{model_root}/{channel_str}/fold_{fold}/{model_name}/" + sites = [1] + channels = [int(i) for i in channel_str[1:-1].split(',')] + + ckp = os.path.join(log_dir, "checkpoints/best.pth") + + if model_name in ["se_resnext50_32x4d", "se_resnext101_32x4d"]: + model = cell_senet( + model_name=model_name, + num_classes=1108, + n_channels=len(channels) * len(sites), + ) + else: + model = cell_densenet( + model_name=model_name, + num_classes=1108, + n_channels=len(channels) * len(sites), + ) + + checkpoint = f"{log_dir}/checkpoints/best.pth" + checkpoint = torch.load(checkpoint) + model.load_state_dict(checkpoint['model_state_dict']) + model = model.to(device) + model = nn.DataParallel(model) + + print("*" * 50) + print(f"checkpoint: {ckp}") + # print(f"Channel: {channel_str}") + preds = [] + for site in [1, 2]: + # Dataset + dataset = RecursionCellularSite( + csv_file=test_csv, + root=data_root, + transform=valid_aug(512), + mode='test', + sites=[site], + channels=channels, + site_mode="one", + ) + + loader = DataLoader( + dataset=dataset, + batch_size=128, + shuffle=False, + num_workers=4, + ) + + pred = predict(model, loader) + preds.append(pred) + + preds = np.asarray(preds).mean(axis=0) + os.makedirs(f"{out_dir}/{channel_str}/fold_{fold}/{model_name}/", exist_ok=True) + np.save(f"{out_dir}/{channel_str}/fold_{fold}/{model_name}/pred_test.npy", preds) if __name__ == '__main__': - # predict_all() - # predict_deepsupervision() - predict_all() + cli() diff --git a/src/models/__init__.py b/src/models/__init__.py index 6f7ae3e..83ad6a3 100644 --- a/src/models/__init__.py +++ b/src/models/__init__.py @@ -1,9 +1,3 @@ from .resnet import ResNet, ResNet50CutMix -from .senet import cell_senet, SENetTIMM, SENetGrouplevel, SENetCellType, SENetCellMultipleDropout -from .densenet import cell_densenet -from .efficientnet import EfficientNet -from .inceptionv3 import InceptionV3TIMM -from .gluon_resnet import GluonResnetTIMM -from .deepsupervision import DSInceptionV3, DSSENet, DSResnet -from .fish_net import Fishnet -from .mixnet import MixNet \ No newline at end of file +from .senet import cell_senet +from .densenet import cell_densenet \ No newline at end of file diff --git a/src/models/deepsupervision/__init__.py b/src/models/deepsupervision/__init__.py deleted file mode 100644 index 70ef4b9..0000000 --- a/src/models/deepsupervision/__init__.py +++ /dev/null @@ -1,3 +0,0 @@ -from .inception_v3 import DSInceptionV3 -from .senet import DSSENet -from .resnet import DSResnet \ No newline at end of file diff --git a/src/models/deepsupervision/inception_v3.py b/src/models/deepsupervision/inception_v3.py deleted file mode 100644 index 746d424..0000000 --- a/src/models/deepsupervision/inception_v3.py +++ /dev/null @@ -1,175 +0,0 @@ -import torch -import torch.nn as nn -import torch.nn.functional as F -from torchvision import models -from catalyst.contrib.modules.common import Flatten -from catalyst.contrib.modules.pooling import GlobalConcatPool2d - - -class DSInceptionV3(nn.Module): - def 
__init__( - self, - num_classes=6, - pretrained=True, - n_channels=4, - - ): - super(DSInceptionV3, self).__init__() - self.model = models.inception_v3( - pretrained=pretrained, - transform_input=False, - # aux_logits=False - ) - - # Adapt number of channels - conv1 = self.model.Conv2d_1a_3x3.conv - self.model.Conv2d_1a_3x3.conv = nn.Conv2d(in_channels=n_channels, - out_channels=conv1.out_channels, - kernel_size=conv1.kernel_size, - stride=conv1.stride, - padding=conv1.padding, - bias=conv1.bias) - - # copy pretrained weights - # self.model.Conv2d_1a_3x3.conv.weight.data[:, :3, :, :] = conv1.weight.data - # self.model.Conv2d_1a_3x3.conv.weight.data[:, 3:n_channels, :, :] = conv1.weight.data[:, :int(n_channels - 3), :, :] - - self.deepsuper_2 = nn.Sequential( - GlobalConcatPool2d(), - Flatten(), - nn.BatchNorm1d(288 * 2), - nn.Linear(288 * 2, num_classes) - ) - - self.deepsuper_4 = nn.Sequential( - GlobalConcatPool2d(), - Flatten(), - nn.BatchNorm1d(768 * 2), - nn.Linear(768 * 2, num_classes) - ) - - self.deepsuper_6 = nn.Sequential( - GlobalConcatPool2d(), - Flatten(), - nn.BatchNorm1d(768 * 2), - nn.Linear(768 * 2, num_classes) - ) - - self.deepsuper_8 = nn.Sequential( - GlobalConcatPool2d(), - Flatten(), - nn.BatchNorm1d(1280 * 2), - nn.Linear(1280 * 2, num_classes) - ) - - self.deepsuper_9 = nn.Sequential( - GlobalConcatPool2d(), - Flatten(), - nn.BatchNorm1d(2048 * 2), - nn.Linear(2048 * 2, num_classes) - ) - - self.deepsuper_10 = nn.Sequential( - GlobalConcatPool2d(), - Flatten(), - nn.BatchNorm1d(2048 * 2), - nn.Linear(2048 * 2, num_classes) - ) - - # WARNING: should adapt the Linear layer to be suitable for each image size !!! - self.fc = nn.Sequential( - nn.Conv2d(in_channels=2048, out_channels=128, kernel_size=(1, 1)), - nn.ReLU(), - Flatten(), - nn.Linear(25088, 1024), # Take care here: 3200 for 224x224, 25088 for 512x512 - nn.ReLU(), - nn.Dropout(0.3), - nn.Linear(1024, num_classes) - ) - - self.is_infer = False - - def freeze_base(self): - # pass - for param in self.model.parameters(): - param.requires_grad = False - - def unfreeze_base(self): - # pass - for param in self.model.parameters(): - param.requires_grad = True - - def forward(self, x): - if self.model.transform_input: - x = x.clone() - x[:, 0] = x[:, 0] * (0.229 / 0.5) + (0.485 - 0.5) / 0.5 - x[:, 1] = x[:, 1] * (0.224 / 0.5) + (0.456 - 0.5) / 0.5 - x[:, 2] = x[:, 2] * (0.225 / 0.5) + (0.406 - 0.5) / 0.5 - # 299 x 299 x 3 - x = self.model.Conv2d_1a_3x3(x) - # 149 x 149 x 32 - x = self.model.Conv2d_2a_3x3(x) - # 147 x 147 x 32 - x = self.model.Conv2d_2b_3x3(x) - # 147 x 147 x 64 - x = F.max_pool2d(x, kernel_size=3, stride=2) - # 73 x 73 x 64 - x = self.model.Conv2d_3b_1x1(x) - # 73 x 73 x 80 - x = self.model.Conv2d_4a_3x3(x) - # 71 x 71 x 192 - x = F.max_pool2d(x, kernel_size=3, stride=2) # => Finish first convs - - # 35 x 35 x 192 - x = self.model.Mixed_5b(x) # => Finish mixed 0 - # 35 x 35 x 256 - x = self.model.Mixed_5c(x) # => Finish mixed 1 - # 35 x 35 x 288 - x = self.model.Mixed_5d(x) # => Finish mixed 2 - # import pdb - # pdb.set_trace() - x_mix_2 = self.deepsuper_2(x) - # 35 x 35 x 288 - x = self.model.Mixed_6a(x) # => Finish mixed 3 - # 17 x 17 x 768 - x = self.model.Mixed_6b(x) # => Finish mixed 4 - x_mix_4 = self.deepsuper_4(x) - # 17 x 17 x 768 - x = self.model.Mixed_6c(x) # => Finish mixed 5 - # 17 x 17 x 768 - x = self.model.Mixed_6d(x) # => Finish mixed 6 - x_mix_6 = self.deepsuper_6(x) - # 17 x 17 x 768 - x = self.model.Mixed_6e(x) # => Finish mixed 7 - # 17 x 17 x 768 - # if self.model.training and 
self.model.aux_logits: - # aux = self.model.AuxLogits(x) - # 17 x 17 x 768 - x = self.model.Mixed_7a(x) # => Finish mixed 8 - x_mix_8 = self.deepsuper_8(x) - # 8 x 8 x 1280 - x = self.model.Mixed_7b(x) # => Finish mixed 9 - x_mix_9 = self.deepsuper_9(x) - # 8 x 8 x 2048 - x = self.model.Mixed_7c(x) # => Finish mixed 10 - # 8 x 8 x 2048 - - # here is the model output - x_mix_10 = self.deepsuper_10(x) - x_final = self.fc(x) - - return x_mix_2, x_mix_4, x_mix_6, x_mix_8, x_mix_9, x_mix_10, x_final - - def freeze(self): - # Freeze all the backbone - for param in self.model.parameters(): - param.requires_grad = True - - def unfreeze(self): - # Unfreeze all the backbone - for param in self.model.parameters(): - param.requires_grad = True - - -if __name__ == '__main__': - model = DSInceptionV3() \ No newline at end of file diff --git a/src/models/deepsupervision/resnet.py b/src/models/deepsupervision/resnet.py deleted file mode 100644 index e8974d3..0000000 --- a/src/models/deepsupervision/resnet.py +++ /dev/null @@ -1,110 +0,0 @@ -import torch -import torch.nn as nn -import torch.nn.functional as F -from torchvision import models -from catalyst.contrib.modules.common import Flatten -from catalyst.contrib.modules.pooling import GlobalConcatPool2d -from cnn_finetune import make_model - - -class DSResnet(nn.Module): - def __init__( - self, - model_name='resnet50', - num_classes=6, - pretrained=True, - n_channels=4, - - ): - super(DSResnet, self).__init__() - self.model = make_model( - model_name=model_name, - num_classes=num_classes, - pretrained=True, - dropout_p=0.3 - ) - # print(self.model) - conv1 = self.model._features[0] - self.model._features[0] = nn.Conv2d(in_channels=n_channels, - out_channels=conv1.out_channels, - kernel_size=conv1.kernel_size, - stride=conv1.stride, - padding=conv1.padding, - bias=conv1.bias) - - # copy pretrained weights - self.model._features[0].weight.data[:,:3,:,:] = conv1.weight.data - self.model._features[0].weight.data[:,3:n_channels,:,:] = conv1.weight.data[:,:int(n_channels-3),:,:] - - # self.deepsuper_1 = nn.Sequential( - # nn.AdaptiveAvgPool2d(), - # Flatten(), - # nn.BatchNorm1d(256), - # nn.Linear(256, num_classes) - # ) - - self.deepsuper_2 = nn.Sequential( - nn.AdaptiveAvgPool2d(1), - Flatten(), - nn.BatchNorm1d(512), - nn.Linear(512, num_classes) - ) - - self.deepsuper_3 = nn.Sequential( - nn.AdaptiveAvgPool2d(1), - Flatten(), - nn.BatchNorm1d(1024), - nn.Linear(1024, num_classes) - ) - - self.is_infer = False - - def freeze_base(self): - # pass - for param in self.model.parameters(): - param.requires_grad = False - - def unfreeze_base(self): - # pass - for param in self.model.parameters(): - param.requires_grad = True - - def forward(self, x): - # block 0 - x = self.model._features[0](x) - x = self.model._features[1](x) - x = self.model._features[2](x) - x = self.model._features[3](x) - # x_1 = self.deepsuper_1(x) - - # block 1 - x = self.model._features[4](x) - # block 2 - x = self.model._features[5](x) - x_2 = self.deepsuper_2(x) - # block 3 - x = self.model._features[6](x) - x_3 = self.deepsuper_3(x) - # block 4 - x = self.model._features[7](x) - x = self.model.pool(x) - x = x.view(x.size(0), -1) - x_final = self.model._classifier(x) - - return x_2, x_3, x_final - - def freeze(self): - # Freeze all the backbone - for param in self.model.parameters(): - param.requires_grad = False - - def unfreeze(self): - # Unfreeze all the backbone - for param in self.model.parameters(): - param.requires_grad = True - - -if __name__ == '__main__': - x = 
torch.zeros((2, 4, 512, 512)) - model = DSResnet() - out = model(x) \ No newline at end of file diff --git a/src/models/deepsupervision/senet.py b/src/models/deepsupervision/senet.py deleted file mode 100644 index ecc4821..0000000 --- a/src/models/deepsupervision/senet.py +++ /dev/null @@ -1,110 +0,0 @@ -import torch -import torch.nn as nn -import torch.nn.functional as F -from torchvision import models -from catalyst.contrib.modules.common import Flatten -from catalyst.contrib.modules.pooling import GlobalConcatPool2d -from cnn_finetune import make_model - - -class DSSENet(nn.Module): - def __init__( - self, - model_name='se_resnext50_32x4d', - num_classes=6, - pretrained=True, - n_channels=4, - - ): - super(DSSENet, self).__init__() - self.model = make_model( - model_name=model_name, - num_classes=num_classes, - pretrained=True, - dropout_p=0.3 - ) - # print(self.model) - conv1 = self.model._features[0].conv1 - self.model._features[0].conv1 = nn.Conv2d(in_channels=n_channels, - out_channels=conv1.out_channels, - kernel_size=conv1.kernel_size, - stride=conv1.stride, - padding=conv1.padding, - bias=conv1.bias) - - # copy pretrained weights - self.model._features[0].conv1.weight.data[:,:3,:,:] = conv1.weight.data - self.model._features[0].conv1.weight.data[:,3:n_channels,:,:] = conv1.weight.data[:,:int(n_channels-3),:,:] - - self.deepsuper_1 = nn.Sequential( - GlobalConcatPool2d(), - Flatten(), - nn.BatchNorm1d(256 * 2), - nn.Linear(256 * 2, num_classes) - ) - - self.deepsuper_2 = nn.Sequential( - GlobalConcatPool2d(), - Flatten(), - nn.BatchNorm1d(512 * 2), - nn.Linear(512 * 2, num_classes) - ) - - self.deepsuper_3 = nn.Sequential( - GlobalConcatPool2d(), - Flatten(), - nn.BatchNorm1d(1024 * 2), - nn.Linear(1024 * 2, num_classes) - ) - - # WARNING: should adapt the Linear layer to be suitable for each image size !!! 
- self.fc = nn.Sequential( - nn.Conv2d(in_channels=2048, out_channels=128, kernel_size=(1, 1)), - nn.ReLU(), - Flatten(), - nn.Linear(32768, 1024), # Take care here: 3200 for 224x224, 25088 for 512x512 - nn.ReLU(), - nn.Dropout(0.3), - nn.Linear(1024, num_classes) - ) - - self.is_infer = False - - def freeze_base(self): - # pass - for param in self.model.parameters(): - param.requires_grad = False - - def unfreeze_base(self): - # pass - for param in self.model.parameters(): - param.requires_grad = True - - def forward(self, x): - x = self.model._features[0](x) - x = self.model._features[1](x) - x_1 = self.deepsuper_1(x) - x = self.model._features[2](x) - x_2 = self.deepsuper_2(x) - x = self.model._features[3](x) - x_3 = self.deepsuper_3(x) - x = self.model._features[4](x) - x_final = self.fc(x) - - return x_1, x_2, x_3, x_final - - def freeze(self): - # Freeze all the backbone - for param in self.model.parameters(): - param.requires_grad = False - - def unfreeze(self): - # Unfreeze all the backbone - for param in self.model.parameters(): - param.requires_grad = True - - -if __name__ == '__main__': - x = torch.zeros((2, 4, 512, 512)) - model = DSSENet() - out = model(x) \ No newline at end of file diff --git a/src/models/efficientnet.py b/src/models/efficientnet.py deleted file mode 100644 index e9880d6..0000000 --- a/src/models/efficientnet.py +++ /dev/null @@ -1,39 +0,0 @@ -import torch.nn as nn -import pretrainedmodels -from cnn_finetune import make_model -import timm -from .utils import * - - -class EfficientNet(nn.Module): - def __init__(self, model_name="tf_efficientnet_b5", - num_classes=1108, - n_channels=6): - super(EfficientNet, self).__init__() - - self.model = timm.create_model(model_name, pretrained=True, num_classes=num_classes) - conv1 = self.model.conv_stem - self.model.conv_stem = nn.Conv2d(in_channels=n_channels, - out_channels=conv1.out_channels, - kernel_size=conv1.kernel_size, - stride=conv1.stride, - padding=conv1.padding, - bias=conv1.bias) - - # copy pretrained weights - self.model.conv_stem.weight.data[:, :3, :, :] = conv1.weight.data - self.model.conv_stem.weight.data[:, 3:n_channels, :, :] = conv1.weight.data[:, :int(n_channels - 3), :, :] - - def forward(self, x): - return self.model(x) - - def freeze(self): - for param in self.model.parameters(): - param.requires_grad = False - - for param in self.model.classifier.parameters(): - param.requires_grad = True - - def unfreeze(self): - for param in self.model.parameters(): - param.requires_grad = True diff --git a/src/models/fish_net.py b/src/models/fish_net.py deleted file mode 100644 index 4d802c9..0000000 --- a/src/models/fish_net.py +++ /dev/null @@ -1,53 +0,0 @@ -from .fishnet import fishnet99, fishnet150 -from cnn_finetune import make_model -from .utils import * - - -class Fishnet(nn.Module): - def __init__(self, model_name="fishnet99", - pretrained=None, - num_classes=1108, - n_channels=6): - super(Fishnet, self).__init__() - - if model_name == 'fishnet99': - self.model = fishnet99( - pretrained=pretrained, - n_class=num_classes - ) - self.fc = self.model.fish.fish[9][4][1] - elif model_name == 'fishnet150': - self.model = fishnet150( - pretrained=pretrained, - n_class=num_classes - ) - self.fc = self.model.fish.fish[9][4][1] - else: - raise Exception("Invalid model name !") - - conv1 = self.model.conv1[0] - self.model.conv1[0] = nn.Conv2d(in_channels=n_channels, - out_channels=conv1.out_channels, - kernel_size=conv1.kernel_size, - stride=conv1.stride, - padding=conv1.padding, - bias=conv1.bias) - - # copy 
pretrained weights - self.model.conv1[0].weight.data[:, :3, :, :] = conv1.weight.data - self.model.conv1[0].weight.data[:, 3:n_channels, :, :] = conv1.weight.data[:, :int(n_channels - 3), :, :] - - def forward(self, x): - out, score_feat = self.model(x) - return out - - def freeze(self): - for param in self.model.parameters(): - param.requires_grad = False - - for param in self.fc.parameters(): - param.requires_grad = True - - def unfreeze(self): - for param in self.fc.parameters(): - param.requires_grad = True diff --git a/src/models/fishnet/__init__.py b/src/models/fishnet/__init__.py deleted file mode 100644 index a0759cb..0000000 --- a/src/models/fishnet/__init__.py +++ /dev/null @@ -1,2 +0,0 @@ -from .net_factory import * -from torchvision.models import * \ No newline at end of file diff --git a/src/models/fishnet/fish_block.py b/src/models/fishnet/fish_block.py deleted file mode 100644 index 64a18e7..0000000 --- a/src/models/fishnet/fish_block.py +++ /dev/null @@ -1,73 +0,0 @@ -import torch.nn as nn - - -class Bottleneck(nn.Module): - def __init__(self, inplanes, planes, stride=1, mode='NORM', k=1, dilation=1): - """ - Pre-act residual block, the middle transformations are bottle-necked - :param inplanes: - :param planes: - :param stride: - :param downsample: - :param mode: NORM | UP - :param k: times of additive - """ - - super(Bottleneck, self).__init__() - self.mode = mode - self.relu = nn.ReLU(inplace=True) - self.k = k - - btnk_ch = planes // 4 - self.bn1 = nn.BatchNorm2d(inplanes) - self.conv1 = nn.Conv2d(inplanes, btnk_ch, kernel_size=1, bias=False) - - self.bn2 = nn.BatchNorm2d(btnk_ch) - self.conv2 = nn.Conv2d(btnk_ch, btnk_ch, kernel_size=3, stride=stride, padding=dilation, - dilation=dilation, bias=False) - - self.bn3 = nn.BatchNorm2d(btnk_ch) - self.conv3 = nn.Conv2d(btnk_ch, planes, kernel_size=1, bias=False) - - if mode == 'UP': - self.shortcut = None - elif inplanes != planes or stride > 1: - self.shortcut = nn.Sequential( - nn.BatchNorm2d(inplanes), - self.relu, - nn.Conv2d(inplanes, planes, kernel_size=1, stride=stride, bias=False) - ) - else: - self.shortcut = None - - def _pre_act_forward(self, x): - residual = x - - out = self.bn1(x) - out = self.relu(out) - out = self.conv1(out) - - out = self.bn2(out) - out = self.relu(out) - out = self.conv2(out) - - out = self.bn3(out) - out = self.relu(out) - out = self.conv3(out) - - if self.mode == 'UP': - residual = self.squeeze_idt(x) - elif self.shortcut is not None: - residual = self.shortcut(residual) - - out += residual - - return out - - def squeeze_idt(self, idt): - n, c, h, w = idt.size() - return idt.view(n, c // self.k, self.k, h, w).sum(2) - - def forward(self, x): - out = self._pre_act_forward(x) - return out \ No newline at end of file diff --git a/src/models/fishnet/fishnet.py b/src/models/fishnet/fishnet.py deleted file mode 100644 index df2263a..0000000 --- a/src/models/fishnet/fishnet.py +++ /dev/null @@ -1,224 +0,0 @@ -''' -FishNet -Author: Shuyang Sun -''' -from __future__ import division -import torch -import math -from .fish_block import * - - -__all__ = ['fish'] - - -class Fish(nn.Module): - def __init__(self, block, num_cls=1000, num_down_sample=5, num_up_sample=3, trans_map=(2, 1, 0, 6, 5, 4), - network_planes=None, num_res_blks=None, num_trans_blks=None): - super(Fish, self).__init__() - self.block = block - self.trans_map = trans_map - self.upsample = nn.Upsample(scale_factor=2) - self.down_sample = nn.MaxPool2d(2, stride=2) - self.num_cls = num_cls - self.num_down = num_down_sample - self.num_up 
= num_up_sample - self.network_planes = network_planes[1:] - self.depth = len(self.network_planes) - self.num_trans_blks = num_trans_blks - self.num_res_blks = num_res_blks - self.fish = self._make_fish(network_planes[0]) - - def _make_score(self, in_ch, out_ch=1000, has_pool=False): - bn = nn.BatchNorm2d(in_ch) - relu = nn.ReLU(inplace=True) - conv_trans = nn.Conv2d(in_ch, in_ch // 2, kernel_size=1, bias=False) - bn_out = nn.BatchNorm2d(in_ch // 2) - conv = nn.Sequential(bn, relu, conv_trans, bn_out, relu) - if has_pool: - fc = nn.Sequential( - nn.AdaptiveAvgPool2d(1), - nn.Conv2d(in_ch // 2, out_ch, kernel_size=1, bias=True)) - else: - fc = nn.Conv2d(in_ch // 2, out_ch, kernel_size=1, bias=True) - return [conv, fc] - - def _make_se_block(self, in_ch, out_ch): - bn = nn.BatchNorm2d(in_ch) - sq_conv = nn.Conv2d(in_ch, out_ch // 16, kernel_size=1) - ex_conv = nn.Conv2d(out_ch // 16, out_ch, kernel_size=1) - return nn.Sequential(bn, - nn.ReLU(inplace=True), - nn.AdaptiveAvgPool2d(1), - sq_conv, - nn.ReLU(inplace=True), - ex_conv, - nn.Sigmoid()) - - def _make_residual_block(self, inplanes, outplanes, nstage, is_up=False, k=1, dilation=1): - layers = [] - - if is_up: - layers.append(self.block(inplanes, outplanes, mode='UP', dilation=dilation, k=k)) - else: - layers.append(self.block(inplanes, outplanes, stride=1)) - for i in range(1, nstage): - layers.append(self.block(outplanes, outplanes, stride=1, dilation=dilation)) - return nn.Sequential(*layers) - - def _make_stage(self, is_down_sample, inplanes, outplanes, n_blk, has_trans=True, - has_score=False, trans_planes=0, no_sampling=False, num_trans=2, **kwargs): - sample_block = [] - if has_score: - sample_block.extend(self._make_score(outplanes, outplanes * 2, has_pool=False)) - - if no_sampling or is_down_sample: - res_block = self._make_residual_block(inplanes, outplanes, n_blk, **kwargs) - else: - res_block = self._make_residual_block(inplanes, outplanes, n_blk, is_up=True, **kwargs) - - sample_block.append(res_block) - - if has_trans: - trans_in_planes = self.in_planes if trans_planes == 0 else trans_planes - sample_block.append(self._make_residual_block(trans_in_planes, trans_in_planes, num_trans)) - - if not no_sampling and is_down_sample: - sample_block.append(self.down_sample) - elif not no_sampling: # Up-Sample - sample_block.append(self.upsample) - - return nn.ModuleList(sample_block) - - def _make_fish(self, in_planes): - def get_trans_planes(index): - map_id = self.trans_map[index-self.num_down-1] - 1 - p = in_planes if map_id == -1 else cated_planes[map_id] - return p - - def get_trans_blk(index): - return self.num_trans_blks[index-self.num_down-1] - - def get_cur_planes(index): - return self.network_planes[index] - - def get_blk_num(index): - return self.num_res_blks[index] - - cated_planes, fish = [in_planes] * self.depth, [] - for i in range(self.depth): - # even num for down-sample, odd for up-sample - is_down, has_trans, no_sampling = i not in range(self.num_down, self.num_down+self.num_up+1),\ - i > self.num_down, i == self.num_down - cur_planes, trans_planes, cur_blocks, num_trans =\ - get_cur_planes(i), get_trans_planes(i), get_blk_num(i), get_trans_blk(i) - - stg_args = [is_down, cated_planes[i - 1], cur_planes, cur_blocks] - - if is_down or no_sampling: - k, dilation = 1, 1 - else: - k, dilation = cated_planes[i - 1] // cur_planes, 2 ** (i-self.num_down-1) - - sample_block = self._make_stage(*stg_args, has_trans=has_trans, trans_planes=trans_planes, - has_score=(i==self.num_down), num_trans=num_trans, k=k, 
dilation=dilation, - no_sampling=no_sampling) - if i == self.depth - 1: - sample_block.extend(self._make_score(cur_planes + trans_planes, out_ch=self.num_cls, has_pool=True)) - elif i == self.num_down: - sample_block.append(nn.Sequential(self._make_se_block(cur_planes*2, cur_planes))) - - if i == self.num_down-1: - cated_planes[i] = cur_planes * 2 - elif has_trans: - cated_planes[i] = cur_planes + trans_planes - else: - cated_planes[i] = cur_planes - fish.append(sample_block) - return nn.ModuleList(fish) - - def _fish_forward(self, all_feat): - def _concat(a, b): - return torch.cat([a, b], dim=1) - - def stage_factory(*blks): - def stage_forward(*inputs): - if stg_id < self.num_down: # tail - tail_blk = nn.Sequential(*blks[:2]) - return tail_blk(*inputs) - elif stg_id == self.num_down: - score_blks = nn.Sequential(*blks[:2]) - score_feat = score_blks(inputs[0]) - att_feat = blks[3](score_feat) - return blks[2](score_feat) * att_feat + att_feat - else: # refine - feat_trunk = blks[2](blks[0](inputs[0])) - feat_branch = blks[1](inputs[1]) - return _concat(feat_trunk, feat_branch) - return stage_forward - - stg_id = 0 - # tail: - while stg_id < self.depth: - stg_blk = stage_factory(*self.fish[stg_id]) - if stg_id <= self.num_down: - in_feat = [all_feat[stg_id]] - else: - trans_id = self.trans_map[stg_id-self.num_down-1] - in_feat = [all_feat[stg_id], all_feat[trans_id]] - - all_feat[stg_id + 1] = stg_blk(*in_feat) - stg_id += 1 - # loop exit - if stg_id == self.depth: - score_feat = self.fish[self.depth-1][-2](all_feat[-1]) - score = self.fish[self.depth-1][-1](score_feat) - return score, score_feat - - def forward(self, x): - all_feat = [None] * (self.depth + 1) - all_feat[0] = x - return self._fish_forward(all_feat) - - -class FishNet(nn.Module): - def __init__(self, block, **kwargs): - super(FishNet, self).__init__() - - inplanes = kwargs['network_planes'][0] - # resolution: 224x224 - self.conv1 = self._conv_bn_relu(3, inplanes // 2, stride=2) - self.conv2 = self._conv_bn_relu(inplanes // 2, inplanes // 2) - self.conv3 = self._conv_bn_relu(inplanes // 2, inplanes) - self.pool1 = nn.MaxPool2d(3, padding=1, stride=2) - # construct fish, resolution 56x56 - self.fish = Fish(block, **kwargs) - self._init_weights() - - def _conv_bn_relu(self, in_ch, out_ch, stride=1): - return nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, stride=stride, bias=False), - nn.BatchNorm2d(out_ch), - nn.ReLU(inplace=True)) - - def _init_weights(self): - for m in self.modules(): - if isinstance(m, nn.Conv2d): - n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels - m.weight.data.normal_(0, math.sqrt(2. 
/ n)) - elif isinstance(m, nn.BatchNorm2d): - m.weight.data.fill_(1) - m.bias.data.zero_() - - def forward(self, x): - x = self.conv1(x) - x = self.conv2(x) - x = self.conv3(x) - x = self.pool1(x) - score, score_feat = self.fish(x) - # 1*1 output - out = score.view(x.size(0), -1) - - return out, score_feat - - -def fish(**kwargs): - return FishNet(Bottleneck, **kwargs) \ No newline at end of file diff --git a/src/models/fishnet/net_factory.py b/src/models/fishnet/net_factory.py deleted file mode 100644 index 1af3462..0000000 --- a/src/models/fishnet/net_factory.py +++ /dev/null @@ -1,70 +0,0 @@ -import torch -import torch.nn as nn -from .fishnet import fish - - -def fishnet150(pretrained, n_class, **kwargs): - """ - :return: - """ - net_cfg = { - # input size: [224, 56, 28, 14 | 7, 14, 28 | 56, 28, 14] - # output size: [56, 28, 14, 7 | 14, 28, 56 | 28, 14, 7] - # | | | | | | | | | | - 'network_planes': [64, 128, 256, 512, 512, 512, 384, 256, 320, 832, 1600], - 'num_res_blks': [2, 4, 8, 4, 2, 2, 2, 2, 2, 4], - 'num_trans_blks': [2, 2, 2, 2, 2, 4], - 'num_cls': 1000, - 'num_down_sample': 3, - 'num_up_sample': 3, - } - cfg = {**net_cfg, **kwargs} - model = fish(**cfg) - - if pretrained: - state_dict = torch.load(pretrained)['state_dict'] - new_state_dict = {} - for k, v in state_dict.items(): - # Remove module prefix - k = k.replace('module.', '') - new_state_dict[k] = v - - model.load_state_dict(new_state_dict, strict=True) - - model.fish.fish[9][4][1] = nn.Conv2d(1056, n_class, kernel_size=(1, 1), stride=(1, 1)) - - return model - - -def fishnet99(pretrained, n_class, **kwargs): - """ - :return: - """ - net_cfg = { - # input size: [224, 56, 28, 14 | 7, 14, 28 | 56, 28, 14] - # output size: [56, 28, 14, 7 | 14, 28, 56 | 28, 14, 7] - # | | | | | | | | | | - 'network_planes': [64, 128, 256, 512, 512, 512, 384, 256, 320, 832, 1600], - 'num_res_blks': [2, 2, 6, 2, 1, 1, 1, 1, 2, 2], - 'num_trans_blks': [1, 1, 1, 1, 1, 4], - 'num_cls': 1000, - 'num_down_sample': 3, - 'num_up_sample': 3, - } - cfg = {**net_cfg, **kwargs} - - model = fish(**cfg) - - if pretrained: - state_dict = torch.load(pretrained)['state_dict'] - new_state_dict = {} - for k, v in state_dict.items(): - # Remove module prefix - k = k.replace('module.', '') - new_state_dict[k] = v - - model.load_state_dict(new_state_dict, strict=True) - - model.fish.fish[9][4][1] = nn.Conv2d(1056, n_class, kernel_size=(1, 1), stride=(1, 1)) - - return model \ No newline at end of file diff --git a/src/models/gluon_resnet.py b/src/models/gluon_resnet.py deleted file mode 100644 index 21f4164..0000000 --- a/src/models/gluon_resnet.py +++ /dev/null @@ -1,38 +0,0 @@ -from cnn_finetune import make_model -import timm -from .utils import * - - -class GluonResnetTIMM(nn.Module): - def __init__(self, model_name="gluon_resnet50_v1d", - num_classes=1108, - n_channels=6): - super(GluonResnetTIMM, self).__init__() - - self.model = timm.create_model(model_name, pretrained=True, num_classes=num_classes) - print(self.model) - conv1 = self.model.conv1 - self.model.conv1 = nn.Conv2d(in_channels=n_channels, - out_channels=conv1.out_channels, - kernel_size=conv1.kernel_size, - stride=conv1.stride, - padding=conv1.padding, - bias=conv1.bias) - - # copy pretrained weights - self.model.conv1.weight.data[:, :3, :, :] = conv1.weight.data - self.model.conv1.weight.data[:, 3:n_channels, :, :] = conv1.weight.data[:, :int(n_channels - 3), :, :] - - def forward(self, x): - return self.model(x) - - def freeze(self): - for param in self.model.parameters(): - param.requires_grad 
= False - - for param in self.model.get_classifier().parameters(): - param.requires_grad = True - - def unfreeze(self): - for param in self.model.get_classifier().parameters(): - param.requires_grad = True \ No newline at end of file diff --git a/src/models/inceptionv3.py b/src/models/inceptionv3.py deleted file mode 100644 index 37d4093..0000000 --- a/src/models/inceptionv3.py +++ /dev/null @@ -1,37 +0,0 @@ -from cnn_finetune import make_model -import timm -from .utils import * - - -class InceptionV3TIMM(nn.Module): - def __init__(self, model_name="gluon_inception_v3", - num_classes=1108, - n_channels=6): - super(InceptionV3TIMM, self).__init__() - - self.model = timm.create_model(model_name, pretrained=True, num_classes=num_classes) - conv1 = self.model.Conv2d_1a_3x3.conv - self.model.Conv2d_1a_3x3.conv = nn.Conv2d(in_channels=n_channels, - out_channels=conv1.out_channels, - kernel_size=conv1.kernel_size, - stride=conv1.stride, - padding=conv1.padding, - bias=conv1.bias) - - # copy pretrained weights - self.model.Conv2d_1a_3x3.conv.weight.data[:, :3, :, :] = conv1.weight.data - self.model.Conv2d_1a_3x3.conv.weight.data[:, 3:n_channels, :, :] = conv1.weight.data[:, :int(n_channels - 3), :, :] - - def forward(self, x): - return self.model(x) - - def freeze(self): - for param in self.model.parameters(): - param.requires_grad = False - - for param in self.model.fc.parameters(): - param.requires_grad = True - - def unfreeze(self): - for param in self.model.fc.parameters(): - param.requires_grad = True \ No newline at end of file diff --git a/src/models/mixnet.py b/src/models/mixnet.py deleted file mode 100644 index 44e9691..0000000 --- a/src/models/mixnet.py +++ /dev/null @@ -1,64 +0,0 @@ -import torch.nn as nn -import torch -import pretrainedmodels -from cnn_finetune import make_model -import timm - - -class MixNet(nn.Module): - def __init__(self, model_name="tf_efficientnet_b5", - num_classes=1108, - n_channels=6, - weight=None): - super(MixNet, self).__init__() - - self.model = timm.create_model( - model_name=model_name, - pretrained=True, - num_classes=31 - ) - conv1 = self.model.conv_stem - self.model.conv_stem = nn.Conv2d(in_channels=n_channels, - out_channels=conv1.out_channels, - kernel_size=conv1.kernel_size, - stride=conv1.stride, - padding=conv1.padding, - bias=conv1.bias) - - # copy pretrained weights - self.model.conv_stem.weight.data[:, :3, :, :] = conv1.weight.data - self.model.conv_stem.weight.data[:, 3:n_channels, :, :] = conv1.weight.data[:, :int(n_channels - 3), :, :] - - if weight: - model_state_dict = torch.load(weight)['model_state_dict'] - new_model_state_dict = {} - for k, v in model_state_dict.items(): - new_model_state_dict[k[6:]] = v - self.model.load_state_dict(new_model_state_dict) - print(f"\n\n******************************* Loaded checkpoint {weight}") - - in_features = self.model.classifier.in_features - self.model.classifier = nn.Linear( - in_features=in_features, out_features=num_classes - ) - - def forward(self, x): - return self.model(x) - - def freeze(self): - for param in self.model.parameters(): - param.requires_grad = False - - for param in self.model.classifier.parameters(): - param.requires_grad = True - - def unfreeze(self): - for param in self.model.parameters(): - param.requires_grad = True - - -if __name__ == '__main__': - import torch - model = MixNet(model_name='mixnet_xl') - x = torch.randn((1, 6, 320, 320)) - y = model(x) diff --git a/src/models/senet.py b/src/models/senet.py index 16468e3..7f17840 100644 --- a/src/models/senet.py +++ 
b/src/models/senet.py @@ -4,213 +4,6 @@ from .utils import * -class SENetTIMM(nn.Module): - def __init__(self, model_name="seresnext26_32x4d", - num_classes=1108, - n_channels=6): - super(SENetTIMM, self).__init__() - - self.model = timm.create_model(model_name, pretrained=True, num_classes=num_classes) - conv1 = self.model.layer0.conv1 - self.model.layer0.conv1 = nn.Conv2d(in_channels=n_channels, - out_channels=conv1.out_channels, - kernel_size=conv1.kernel_size, - stride=conv1.stride, - padding=conv1.padding, - bias=conv1.bias) - - # copy pretrained weights - self.model.layer0.conv1.weight.data[:, :3, :, :] = conv1.weight.data - self.model.layer0.conv1.weight.data[:, 3:n_channels, :, :] = conv1.weight.data[:, :int(n_channels - 3), :, :] - - def forward(self, x): - return self.model(x) - - def freeze(self): - for param in self.model.parameters(): - param.requires_grad = False - - for param in self.model.get_classifier().parameters(): - param.requires_grad = True - - def unfreeze(self): - for param in self.model.get_classifier().parameters(): - param.requires_grad = True - - -class SENetGrouplevel(nn.Module): - def __init__(self, model_name="seresnext26_32x4d", - num_classes=1108, - n_channels=6): - super(SENetGrouplevel, self).__init__() - - self.model = make_model( - model_name=model_name, - num_classes=num_classes, - pretrained=True, - dropout_p=0.3 - ) - print("*" * 100) - print("SENetGrouplevel") - conv1 = self.model._features[0].conv1 - self.model._features[0].conv1 = nn.Conv2d(in_channels=n_channels, - out_channels=conv1.out_channels, - kernel_size=conv1.kernel_size, - stride=conv1.stride, - padding=conv1.padding, - bias=conv1.bias) - - # copy pretrained weights - self.model._features[0].conv1.weight.data[:, :3, :, :] = conv1.weight.data - self.model._features[0].conv1.weight.data[:, 3:n_channels, :, :] = conv1.weight.data[:, :int(n_channels - 3), :, :] - - self.group_label_embedding = nn.Embedding(num_embeddings=4, embedding_dim=8) - - in_features = self.model._classifier.in_features - self.final_fc = nn.Linear( - in_features=in_features + 8, out_features=num_classes - ) - - def forward(self, x, group_label): - features = self.model._features(x) - features = self.model.pool(features) - features = features.view(features.size(0), -1) - - group_embedding = self.group_label_embedding(group_label) - features = torch.cat([ - features, group_embedding - ], 1) - - return self.final_fc(features) - - def freeze(self): - for param in self.model.parameters(): - param.requires_grad = False - - def unfreeze(self): - for param in self.model.parameters(): - param.requires_grad = True - - -class SENetCellType(nn.Module): - def __init__(self, model_name="seresnext26_32x4d", - num_classes=1108, - n_channels=6): - super(SENetCellType, self).__init__() - - self.model = make_model( - model_name=model_name, - num_classes=num_classes, - pretrained=True, - dropout_p=0.3 - ) - print("*" * 100) - print("SENetGrouplevel") - conv1 = self.model._features[0].conv1 - self.model._features[0].conv1 = nn.Conv2d(in_channels=n_channels, - out_channels=conv1.out_channels, - kernel_size=conv1.kernel_size, - stride=conv1.stride, - padding=conv1.padding, - bias=conv1.bias) - - # copy pretrained weights - self.model._features[0].conv1.weight.data[:, :3, :, :] = conv1.weight.data - self.model._features[0].conv1.weight.data[:, 3:n_channels, :, :] = conv1.weight.data[:, :int(n_channels - 3), :, :] - - in_features = self.model._classifier.in_features - self.final_sirna = nn.Linear( - in_features=in_features, 
out_features=num_classes - ) - - self.final_cell_type = nn.Linear( - in_features=in_features, out_features=4 - ) - - def forward(self, x): - features = self.model._features(x) - features = self.model.pool(features) - features = features.view(features.size(0), -1) - - return self.final_sirna(features), self.final_cell_type(features) - - def freeze(self): - for param in self.model.parameters(): - param.requires_grad = False - - def unfreeze(self): - for param in self.model.parameters(): - param.requires_grad = True - - -class SENetCellMultipleDropout(nn.Module): - def __init__(self, model_name="seresnext26_32x4d", - num_classes=1108, - n_channels=6, - num_samples=4, - weight=None): - super(SENetCellMultipleDropout, self).__init__() - - self.model = make_model( - model_name=model_name, - num_classes=31, - pretrained=True - ) - print("*" * 100) - print("SENetGrouplevel") - conv1 = self.model._features[0].conv1 - self.model._features[0].conv1 = nn.Conv2d(in_channels=n_channels, - out_channels=conv1.out_channels, - kernel_size=conv1.kernel_size, - stride=conv1.stride, - padding=conv1.padding, - bias=conv1.bias) - - # copy pretrained weights - self.model._features[0].conv1.weight.data[:, :3, :, :] = conv1.weight.data - self.model._features[0].conv1.weight.data[:, 3:n_channels, :, :] = conv1.weight.data[:, :int(n_channels - 3), :, :] - - if weight: - model_state_dict = torch.load(weight)['model_state_dict'] - self.model.load_state_dict(model_state_dict) - print(f"\n\n******************************* Loaded checkpoint {weight}") - - in_features = self.model._classifier.in_features - self.num_samples = num_samples - - self.classifier = nn.Linear( - in_features, num_classes - ) - - def forward(self, x): - features = self.model._features(x) - features_flip = torch.flip(features, dims=[3]) - - features_flip = self.model.pool(features_flip) - features_flip = features_flip.view(features_flip.size(0), -1) - - features = self.model.pool(features) - features = features.view(features.size(0), -1) - - out_logits = [] - for i in range(self.num_samples): - if i % 2 == 0: - feature = F.dropout(features, p=0.3) - else: - feature = F.dropout(features_flip, p=0.3) - logits = self.classifier(feature) - out_logits.append(logits) - return out_logits - - def freeze(self): - for param in self.model.parameters(): - param.requires_grad = False - - def unfreeze(self): - for param in self.model.parameters(): - param.requires_grad = True - - def cell_senet(model_name='se_resnext50_32x4d', num_classes=1108, n_channels=6, weight=None): model = make_model( model_name=model_name, diff --git a/src/runner.py b/src/runner.py index 27b490a..f1fa606 100644 --- a/src/runner.py +++ b/src/runner.py @@ -4,34 +4,7 @@ class ModelRunner(Runner): def predict_batch(self, batch: Mapping[str, Any]): - # import pdb - # pdb.set_trace() - if 'group_labels' in batch: - output = self.model(batch["images"], batch['group_labels']) - else: - output = self.model(batch["images"]) + output = self.model(batch["images"]) return { "logits": output } - - # def _run_stage(self, stage: str): - # self._prepare_state(stage) - # loaders = self.experiment.get_loaders(stage) - # self.callbacks = self.experiment.get_callbacks(stage) - # - # self._run_event("stage_start") - # for epoch in range(18, self.state.num_epochs): - # self.state.stage_epoch = epoch - # - # self._run_event("epoch_start") - # self._run_epoch(loaders) - # self._run_event("epoch_end") - # - # if self._check_run and self.state.epoch >= 3: - # break - # if self.state.early_stop: - # 
self.state.early_stop = False - # break - # - # self.state.epoch += 1 - # self._run_event("stage_end") diff --git a/src/rxrxio.py b/src/rxrxio.py deleted file mode 100644 index ee4c713..0000000 --- a/src/rxrxio.py +++ /dev/null @@ -1,249 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os - -import numpy as np -from skimage.io import imread -import pandas as pd - -import tensorflow as tf - -DEFAULT_BASE_PATH = 'gs://rxrx1-us-central1' -DEFAULT_METADATA_BASE_PATH = os.path.join(DEFAULT_BASE_PATH, 'metadata') -DEFAULT_IMAGES_BASE_PATH = os.path.join(DEFAULT_BASE_PATH, 'images') -DEFAULT_CHANNELS = (1, 2, 3, 4, 5, 6) -RGB_MAP = { - 1: { - 'rgb': np.array([19, 0, 249]), - 'range': [0, 51] - }, - 2: { - 'rgb': np.array([42, 255, 31]), - 'range': [0, 107] - }, - 3: { - 'rgb': np.array([255, 0, 25]), - 'range': [0, 64] - }, - 4: { - 'rgb': np.array([45, 255, 252]), - 'range': [0, 191] - }, - 5: { - 'rgb': np.array([250, 0, 253]), - 'range': [0, 89] - }, - 6: { - 'rgb': np.array([254, 255, 40]), - 'range': [0, 191] - } -} - - -def load_image(image_path): - with tf.io.gfile.GFile(image_path, 'rb') as f: - return imread(f, format='png') - - -def load_images_as_tensor(image_paths, dtype=np.uint8): - n_channels = len(image_paths) - - data = np.ndarray(shape=(512, 512, n_channels), dtype=dtype) - - for ix, img_path in enumerate(image_paths): - data[:, :, ix] = load_image(img_path) - - return data - - -def convert_tensor_to_rgb(t, channels=DEFAULT_CHANNELS, vmax=255, rgb_map=RGB_MAP): - """ - Converts and returns the image data as RGB image - Parameters - ---------- - t : np.ndarray - original image data - channels : list of int - channels to include - vmax : int - the max value used for scaling - rgb_map : dict - the color mapping for each channel - See rxrx.io.RGB_MAP to see what the defaults are. - Returns - ------- - np.ndarray the image data of the sites as RGB channels - """ - colored_channels = [] - for i, channel in enumerate(channels): - x = (t[:, :, i] / vmax) / \ - ((rgb_map[channel]['range'][1] - rgb_map[channel]['range'][0]) / 255) + \ - rgb_map[channel]['range'][0] / 255 - x = np.where(x > 1., 1., x) - x_rgb = np.array( - np.outer(x, rgb_map[channel]['rgb']).reshape(512, 512, 3), - dtype=int) - colored_channels.append(x_rgb) - im = np.array(np.array(colored_channels).sum(axis=0), dtype=int) - im = np.where(im > 255, 255, im) - return im - - -def image_path(dataset, - experiment, - plate, - address, - site, - channel, - base_path=DEFAULT_IMAGES_BASE_PATH): - """ - Returns the path of a channel image. 
- Parameters - ---------- - dataset : str - what subset of the data: train, test - experiment : str - experiment name - plate : int - plate number - address : str - plate address - site : int - sites number - channel : int - channel number - base_path : str - the base path of the raw images - Returns - ------- - str the path of image - """ - return os.path.join(base_path, dataset, experiment, "Plate{}".format(plate), - "{}_s{}_w{}.png".format(address, site, channel)) - - -def load_site(dataset, - experiment, - plate, - well, - site, - channels=DEFAULT_CHANNELS, - base_path=DEFAULT_IMAGES_BASE_PATH): - """ - Returns the image data of a sites - Parameters - ---------- - dataset : str - what subset of the data: train, test - experiment : str - experiment name - plate : int - plate number - address : str - plate address - site : int - sites number - channels : list of int - channels to include - base_path : str - the base path of the raw images - Returns - ------- - np.ndarray the image data of the sites - """ - channel_paths = [ - image_path( - dataset, experiment, plate, well, site, c, base_path=base_path) - for c in channels - ] - return load_images_as_tensor(channel_paths) - - -def load_site_as_rgb(dataset, - experiment, - plate, - well, - site, - channels=DEFAULT_CHANNELS, - base_path=DEFAULT_IMAGES_BASE_PATH, - rgb_map=RGB_MAP): - """ - Loads and returns the image data as RGB image - Parameters - ---------- - dataset : str - what subset of the data: train, test - experiment : str - experiment name - plate : int - plate number - address : str - plate address - site : int - sites number - channels : list of int - channels to include - base_path : str - the base path of the raw images - rgb_map : dict - the color mapping for each channel - See rxrx.io.RGB_MAP to see what the defaults are. - Returns - ------- - np.ndarray the image data of the sites as RGB channels - """ - x = load_site(dataset, experiment, plate, well, site, channels, base_path) - return convert_tensor_to_rgb(x, channels, rgb_map=rgb_map) - - -def _tf_read_csv(path): - with tf.io.gfile.GFile(path, 'rb') as f: - return pd.read_csv(f) - - -def _load_dataset(base_path, dataset, include_controls=True): - df = _tf_read_csv(os.path.join(base_path, dataset + '.csv')) - if include_controls: - controls = _tf_read_csv( - os.path.join(base_path, dataset + '_controls.csv')) - df['well_type'] = 'treatment' - df = pd.concat([controls, df], sort=True) - df['cell_type'] = df.experiment.str.split("-").apply(lambda a: a[0]) - df['dataset'] = dataset - dfs = [] - for site in (1, 2): - df = df.copy() - df['sites'] = site - dfs.append(df) - res = pd.concat(dfs).sort_values( - by=['id_code', 'sites']).set_index('id_code') - return res - - -def combine_metadata(base_path=DEFAULT_METADATA_BASE_PATH, - include_controls=True): - """ - Combines all metadata files into a single dataframe and - expands it to include sites, not just wells. - Note, that the dtype of sirna is a float due to the missing - test values but it should be treated as an int. 
- Parameters - ---------- - base_path : str - where the metadata files from Kaggle live - include_controls : bool - indicate if you want the controls included in the dataframe - Returns - ------- - pandas.DataFrame the combined metadata - """ - df = pd.concat( - [ - _load_dataset( - base_path, dataset, include_controls=include_controls) - for dataset in ['test', 'train'] - ], - sort=True) - return df \ No newline at end of file diff --git a/src/schedulers.py b/src/schedulers.py deleted file mode 100644 index 0a0755b..0000000 --- a/src/schedulers.py +++ /dev/null @@ -1,213 +0,0 @@ -import math -from torch.optim.optimizer import Optimizer -from torch.optim.lr_scheduler import _LRScheduler - - -class CyclicLRFix(_LRScheduler): - """Sets the learning rate of each parameter group according to - cyclical learning rate policy (CLR). The policy cycles the learning - rate between two boundaries with a constant frequency, as detailed in - the paper `Cyclical Learning Rates for Training Neural Networks`_. - The distance between the two boundaries can be scaled on a per-iteration - or per-cycle basis. - Cyclical learning rate policy changes the learning rate after every batch. - `step` should be called after a batch has been used for training. - This class has three built-in policies, as put forth in the paper: - "triangular": - A basic triangular cycle w/ no amplitude scaling. - "triangular2": - A basic triangular cycle that scales initial amplitude by half each cycle. - "exp_range": - A cycle that scales initial amplitude by gamma**(cycle iterations) at each - cycle iteration. - This implementation was adapted from the github repo: `bckenstler/CLR`_ - Args: - optimizer (Optimizer): Wrapped optimizer. - base_lr (float or list): Initial learning rate which is the - lower boundary in the cycle for each parameter group. - max_lr (float or list): Upper learning rate boundaries in the cycle - for each parameter group. Functionally, - it defines the cycle amplitude (max_lr - base_lr). - The lr at any cycle is the sum of base_lr - and some scaling of the amplitude; therefore - max_lr may not actually be reached depending on - scaling function. - step_size_up (int): Number of training iterations in the - increasing half of a cycle. Default: 2000 - step_size_down (int): Number of training iterations in the - decreasing half of a cycle. If step_size_down is None, - it is set to step_size_up. Default: None - mode (str): One of {triangular, triangular2, exp_range}. - Values correspond to policies detailed above. - If scale_fn is not None, this argument is ignored. - Default: 'triangular' - gamma (float): Constant in 'exp_range' scaling function: - gamma**(cycle iterations) - Default: 1.0 - scale_fn (function): Custom scaling policy defined by a single - argument lambda function, where - 0 <= scale_fn(x) <= 1 for all x >= 0. - If specified, then 'mode' is ignored. - Default: None - scale_mode (str): {'cycle', 'iterations'}. - Defines whether scale_fn is evaluated on - cycle number or cycle iterations (training - iterations since start of cycle). - Default: 'cycle' - cycle_momentum (bool): If ``True``, momentum is cycled inversely - to learning rate between 'base_momentum' and 'max_momentum'. - Default: True - base_momentum (float or list): Lower momentum boundaries in the cycle - for each parameter group. Note that momentum is cycled inversely - to learning rate; at the peak of a cycle, momentum is - 'base_momentum' and learning rate is 'max_lr'. 
-            Default: 0.8
-        max_momentum (float or list): Upper momentum boundaries in the cycle
-            for each parameter group. Functionally,
-            it defines the cycle amplitude (max_momentum - base_momentum).
-            The momentum at any cycle is the difference of max_momentum
-            and some scaling of the amplitude; therefore
-            base_momentum may not actually be reached depending on
-            scaling function. Note that momentum is cycled inversely
-            to learning rate; at the start of a cycle, momentum is 'max_momentum'
-            and learning rate is 'base_lr'
-            Default: 0.9
-        last_epoch (int): The index of the last batch. This parameter is used when
-            resuming a training job. Since `step()` should be invoked after each
-            batch instead of after each epoch, this number represents the total
-            number of *batches* computed, not the total number of epochs computed.
-            When last_epoch=-1, the schedule is started from the beginning.
-            Default: -1
-    Example:
-        >>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
-        >>> scheduler = CyclicLRFix(optimizer, base_lr=0.01, max_lr=0.1)
-        >>> data_loader = torch.utils.data.DataLoader(...)
-        >>> for epoch in range(10):
-        >>>     for batch in data_loader:
-        >>>         train_batch(...)
-        >>>         scheduler.step()
-    .. _Cyclical Learning Rates for Training Neural Networks: https://arxiv.org/abs/1506.01186
-    .. _bckenstler/CLR: https://github.com/bckenstler/CLR
-    """
-
-    def __init__(self,
-                 optimizer,
-                 base_lr,
-                 max_lr,
-                 step_size_up=2000,
-                 step_size_down=None,
-                 mode='triangular',
-                 gamma=1.,
-                 scale_fn=None,
-                 scale_mode='cycle',
-                 cycle_momentum=True,
-                 base_momentum=0.8,
-                 max_momentum=0.9,
-                 last_epoch=-1):
-
-        if not isinstance(optimizer, Optimizer):
-            raise TypeError('{} is not an Optimizer'.format(
-                type(optimizer).__name__))
-        self.optimizer = optimizer
-
-        base_lrs = self._format_param('base_lr', optimizer, base_lr)
-        if last_epoch == -1:
-            for lr, group in zip(base_lrs, optimizer.param_groups):
-                group['lr'] = lr
-
-        self.max_lrs = self._format_param('max_lr', optimizer, max_lr)
-
-        step_size_up = float(step_size_up)
-        step_size_down = float(step_size_down) if step_size_down is not None else step_size_up
-        self.total_size = step_size_up + step_size_down
-        self.step_ratio = step_size_up / self.total_size
-
-        if mode not in ['triangular', 'triangular2', 'exp_range'] \
-                and scale_fn is None:
-            raise ValueError('mode is invalid and scale_fn is None')
-
-        self.mode = mode
-        self.gamma = gamma
-
-        if scale_fn is None:
-            if self.mode == 'triangular':
-                self.scale_fn = self._triangular_scale_fn
-                self.scale_mode = 'cycle'
-            elif self.mode == 'triangular2':
-                self.scale_fn = self._triangular2_scale_fn
-                self.scale_mode = 'cycle'
-            elif self.mode == 'exp_range':
-                self.scale_fn = self._exp_range_scale_fn
-                self.scale_mode = 'iterations'
-        else:
-            self.scale_fn = scale_fn
-            self.scale_mode = scale_mode
-
-        self.cycle_momentum = cycle_momentum
-        if cycle_momentum:
-            if 'momentum' not in optimizer.defaults:
-                raise ValueError('optimizer must support momentum with `cycle_momentum` option enabled')
-
-            base_momentums = self._format_param('base_momentum', optimizer, base_momentum)
-            if last_epoch == -1:
-                for momentum, group in zip(base_momentums, optimizer.param_groups):
-                    group['momentum'] = momentum
-            self.base_momentums = list(map(lambda group: group['momentum'], optimizer.param_groups))
-            self.max_momentums = self._format_param('max_momentum', optimizer, max_momentum)
-
-        super(CyclicLRFix, self).__init__(optimizer, last_epoch)
-
-    def _format_param(self, name, optimizer,
-                      param):
-        """Return correctly formatted lr/momentum for each param group."""
-        if isinstance(param, (list, tuple)):
-            if len(param) != len(optimizer.param_groups):
-                raise ValueError("expected {} values for {}, got {}".format(
-                    len(optimizer.param_groups), name, len(param)))
-            return param
-        else:
-            return [param] * len(optimizer.param_groups)
-
-    def _triangular_scale_fn(self, x):
-        return 1.
-
-    def _triangular2_scale_fn(self, x):
-        return 1 / (2. ** (x - 1))
-
-    def _exp_range_scale_fn(self, x):
-        return self.gamma**(x)
-
-    def get_lr(self):
-        """Calculates the learning rate at batch index. This function treats
-        `self.last_epoch` as the last batch index.
-        If `self.cycle_momentum` is ``True``, this function has a side effect of
-        updating the optimizer's momentum.
-        """
-        cycle = math.floor(1 + self.last_epoch / self.total_size)
-        x = 1. + self.last_epoch / self.total_size - cycle
-        if x <= self.step_ratio:
-            scale_factor = x / self.step_ratio
-        else:
-            scale_factor = (x - 1) / (self.step_ratio - 1)
-
-        lrs = []
-        for base_lr, max_lr in zip(self.base_lrs, self.max_lrs):
-            base_height = (max_lr - base_lr) * scale_factor
-            if self.scale_mode == 'cycle':
-                lr = base_lr + base_height * self.scale_fn(cycle)
-            else:
-                lr = base_lr + base_height * self.scale_fn(self.last_epoch)
-            lrs.append(lr)
-
-        if self.cycle_momentum:
-            momentums = []
-            for base_momentum, max_momentum in zip(self.base_momentums, self.max_momentums):
-                base_height = (max_momentum - base_momentum) * scale_factor
-                if self.scale_mode == 'cycle':
-                    momentum = max_momentum - base_height * self.scale_fn(cycle)
-                else:
-                    momentum = max_momentum - base_height * self.scale_fn(self.last_epoch)
-                momentums.append(momentum)
-            for param_group, momentum in zip(self.optimizer.param_groups, momentums):
-                param_group['momentum'] = momentum
-
-        return lrs
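
For reference, below is a minimal sketch of how a batch-level cyclical scheduler such as the removed `CyclicLRFix` is typically driven. The toy model, the dummy data, and the `from src.schedulers import CyclicLRFix` import are assumptions for illustration only (they presume a local copy of the deleted module is still importable); the key point, per the docstring above, is that `step()` is called once per batch, not once per epoch.

```python
import torch
from torch import nn, optim

# Assumption: a local copy of the removed src/schedulers.py is still importable.
from src.schedulers import CyclicLRFix

# Toy model and an SGD optimizer with momentum (momentum is required
# because cycle_momentum defaults to True).
model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Cycle the LR between 0.01 and 0.1: 100 batches up, 100 batches down per cycle.
scheduler = CyclicLRFix(optimizer, base_lr=0.01, max_lr=0.1, step_size_up=100)

for epoch in range(2):
    for _ in range(200):  # stand-in for iterating over a real DataLoader
        optimizer.zero_grad()
        loss = model(torch.randn(8, 10)).sum()  # dummy forward pass and loss
        loss.backward()
        optimizer.step()
        scheduler.step()  # CLR steps after every batch
```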