Minor changes for 4.1 in tub conversion script and developer doc #708

Merged · 4 commits · Dec 23, 2020
41 changes: 30 additions & 11 deletions docs/dev_guide/model.md
@@ -1,9 +1,14 @@
# How to build your own model

---
**Note:** _This requires version >= 4.1.X_

---

* [Overview](model.md#overview)
* [Constructor](model.md#constructor)
* [Training Interface](model.md#training-interface)
* [Parts Interface](model.md#parts-interface)
* [Example](model.md#example)

## Overview
@@ -62,6 +67,8 @@ only the image as input.

The function returns a single data item if the model has only one input. You
need to return a tuple if your model uses more input data.


**Note:** _If your model has more inputs, the tuple needs to have the image in
the first place._
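As an illustration, a pilot that combines the camera image with an extra sensor array might implement `x_transform` like the toy sketch below. The record keys and the class name are made up for illustration and are not part of the donkeycar API:

```python
import numpy as np

class SensorPilotSketch:
    """Illustrative pilot that uses an image plus extra sensor data."""

    def x_transform(self, record):
        # The key names here are hypothetical; a real TubRecord is
        # accessed through its own API.
        img_arr = record['cam/image_array']
        sensor_arr = np.array(record['sensors'])
        # The image must come first in the tuple.
        return img_arr, sensor_arr

pilot = SensorPilotSketch()
record = {'cam/image_array': np.zeros((120, 160, 3)),
          'sensors': [0.1, 0.2]}
x = pilot.x_transform(record)
print(type(x), x[0].shape)
```

The returned tuple is then turned into a dictionary by `x_translate`.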

@@ -84,9 +91,11 @@ be fed into `tf.data`. Note, `tf.data` expects a dictionary if the model has
more than one input variable, so we have chosen to use dictionaries also in the
one-argument case for consistency. Above we have shown the implementation in the
base class which works for all models that have only the image as input. You
don't have to override either `x_transform` or `x_translate` if your
model only uses the image as input data.


**Note:** _the keys of the dictionary must match the name of the **input**
layers in the model._
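For instance, for a model with input layers named `img_in` and `sensor_in` (donkeycar's image input layer is conventionally called `img_in`; the sensor layer name is an assumption for this sketch), a matching translation could look like:

```python
import numpy as np

def x_translate(x):
    # x is the tuple produced by x_transform, image first
    img_arr, sensor_arr = x
    # The keys must match the names of the model's input layers
    return {'img_in': img_arr, 'sensor_in': sensor_arr}

batch = x_translate((np.zeros((120, 160, 3)), np.array([0.5, 0.1])))
print(sorted(batch.keys()))
```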

@@ -100,6 +109,8 @@
```python
def y_translate(self, y: XY) -> Dict[str, Union[float, np.ndarray]]:
    ...
```
Similar to the above, this provides the translation of the `y` data into the
dictionary required for `tf.data`. This example shows the implementation of
`KerasLinear`.


**Note:** _the keys of the dictionary must match the name of the **output**
layers in the model._
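As a sketch, for the two output layers `n_outputs0` and `n_outputs1` that `KerasLinear` uses for angle and throttle, the translation boils down to:

```python
def y_translate(y):
    # y holds (angle, throttle) as produced by y_transform
    angle, throttle = y
    # The keys must match the names of the model's output layers
    return {'n_outputs0': angle, 'n_outputs1': throttle}

d = y_translate((0.25, 0.5))
print(d)
```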

@@ -115,8 +126,12 @@ def output_shapes(self):
This function returns a tuple of _two_ dictionaries that tells tensorflow which
shapes are used in the model. We have shown the example of the
`KerasCategorical` model here.


**Note 1:** _As above, the keys of the two dictionaries must match the name
of the **input** and **output** layers in the model._


**Note 2:** _Where the model returns scalar numbers, the corresponding
type has to be `tf.TensorShape([])`._

@@ -141,6 +156,8 @@ Here we are showing the implementation of the linear model. Please note that
the input tensor shape always contains the batch dimension in the first
place, hence the shape of the input image is adjusted from
`(120, 160, 3) -> (1, 120, 160, 3)`.


**Note:** _If you are passing another array in the `other_arr` variable, you will
have to do a similar re-shaping._
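The batch dimension can be prepended with plain `numpy`, for example:

```python
import numpy as np

img_arr = np.zeros((120, 160, 3))       # a single input image
other_arr = np.array([0.1, 0.2, 0.3])   # hypothetical extra sensor array

# prepend the batch dimension that the model expects at inference time
img_batch = img_arr.reshape((1,) + img_arr.shape)        # (1, 120, 160, 3)
other_batch = other_arr.reshape((1,) + other_arr.shape)  # (1, 3)
print(img_batch.shape, other_batch.shape)
```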

@@ -177,7 +194,7 @@ class KerasSensors(KerasPilot):
```python
        sensor_in = Input(shape=(self.num_sensors, ), name='sensor_in')
        y = sensor_in
        z = concatenate([x, y])
        # here we add two more dense layers
        z = Dense(50, activation='relu', name='dense_3')(z)
        z = Dropout(drop)(z)
        z = Dense(50, activation='relu', name='dense_4')(z)
```
@@ -237,9 +254,11 @@ class KerasSensors(KerasPilot):
```python
                   'n_outputs1': tf.TensorShape([])})
        return shapes
```
We could have inherited from `KerasLinear` which already provides the
implementation of `y_transform(), y_translate(), compile()`. However, to
make it explicit for the general case we have implemented all functions here.
The model requires the sensor data to be an array in the TubRecord with key
`"sensor"`.

### Creating a tub

101 changes: 98 additions & 3 deletions docs/utility/donkey.md
@@ -66,13 +66,13 @@ donkey tubclean <folder containing tubs>
* Hit `Ctrl + C` to exit

## Train the model
**Note:** _This section only applies to version >= 4.1_
This command trains the model.
```bash
donkey train --tub=<tub_path> [--config=<config.py>] [--model=<model path>] [--model_type=(linear|categorical|inferred)]
```
The `createcar` command still creates a `train.py` file for backward
compatibility, but it's not required for training.


## Make Movie from Tub
@@ -95,6 +95,52 @@ donkey makemovie --tub=<tub_path> [--out=<tub_movie.mp4>] [--config=<config.py>]
* optional `--start` and/or `--end` can specify a range of frame numbers to use.
* `--scale` will cause the output image to be scaled by this amount

## Check Tub

This command allows you to see how many records are contained in any/all tubs. It will also open each record and ensure that the data is readable and intact. If not, it will allow you to remove corrupt records.

> Note: This should be moved from manage.py to donkey command

Usage:

```bash
donkey tubcheck <tub_path> [--fix]
```

* Run on the host computer or the robot
* It will print a summary of the record count and the channels recorded for each tub
* It will print the records that throw an exception while reading
* The optional `--fix` will delete records that have problems

## Augment Tub

This command allows you to perform data augmentation on a tub or set of tubs directly. The same augmentation is also available in training via the `--aug` flag. Preprocessing the tub can speed up training, since augmentation can take some time. You can also train with the unmodified tub and the augmented tub joined together.

Usage:

```bash
donkey tubaugment <tub_path> [--inplace]
```

* Run on the host computer or the robot
* The optional `--inplace` flag replaces the original tub images. Otherwise `tub_XX_YY-MM-DD` is copied to a new tub `tub_XX_aug_YY-MM-DD` and the original data remains unchanged
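The actual augmentations donkeycar applies are configured elsewhere; purely to illustrate the kind of transform involved, a minimal brightness augmentation on an image array could look like this sketch:

```python
import numpy as np

def augment_brightness(img_arr, factor):
    """Scale pixel intensities, clipping to the valid 0-255 range."""
    out = img_arr.astype(np.float32) * factor
    return np.clip(out, 0, 255).astype(np.uint8)

img = np.full((120, 160, 3), 100, dtype=np.uint8)
brighter = augment_brightness(img, 1.5)
print(brighter[0, 0, 0])  # 100 * 1.5 = 150
```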


## Histogram

This command will show a pop-up window showing the histogram of record values in a given tub.

> Note: This should be moved from manage.py to donkey command

Usage:

```bash
donkey tubhist <tub_path> --rec=<"user/angle">
```

* Run on the host computer

* When the `--tub` is omitted, it will check all tubs in the default data dir
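Under the hood this amounts to collecting one field from every record and binning the values; a self-contained sketch with made-up records (the real command reads the tub and plots the result):

```python
import numpy as np

# hypothetical records, each holding a steering angle in [-1, 1]
records = [{'user/angle': a} for a in (-0.9, -0.2, 0.0, 0.1, 0.8, 0.9)]
values = [r['user/angle'] for r in records]

# bin the values exactly as a histogram plot would
counts, edges = np.histogram(values, bins=4, range=(-1.0, 1.0))
print(counts)
```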

## Plot Predictions

@@ -113,6 +159,55 @@ donkey tubplot <tub_path> [--model=<model_path>]
* Will show a pop-up window showing the plot of steering values in a given tub compared to NN predictions from the trained model
* When the `--tub` is omitted, it will check all tubs in the default data dir

## Continuous Rsync

This command uses rsync to copy files from your pi to your host. It does so in a loop, continuously copying files. By default, it will also delete any files
on the host that are deleted on the pi. This allows your PS3 Triangle edits to affect the files on both machines.

Usage:

```bash
donkey consync [--dir=<data_path>] [--delete=<y|n>]
```

* Run on the host computer
* First copy your public key to the pi so you don't need a password for each rsync:

```bash
cat ~/.ssh/id_rsa.pub | ssh pi@<your pi ip> 'cat >> .ssh/authorized_keys'
```

* If you don't have an `id_rsa.pub`, then google how to make one
* Edit your config.py and make sure the fields `PI_USERNAME`, `PI_HOSTNAME`, `PI_DONKEY_ROOT` are set up. Only on Windows do you need to set `PI_PASSWD`.
* This command may be run from `~/mycar` dir
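Conceptually, the loop just re-runs an `rsync` invocation like the one composed below (host, paths, and the exact flags are illustrative assumptions, not the precise command donkeycar builds):

```python
import shlex

def build_rsync_cmd(user, host, remote_dir, local_dir, delete=True):
    """Compose one rsync invocation of the kind consync runs in a loop."""
    cmd = ['rsync', '-aW', '--progress']
    if delete:
        # mirror deletions from the pi onto the host
        cmd.append('--delete')
    cmd += [f'{user}@{host}:{remote_dir}', local_dir]
    return cmd

cmd = build_rsync_cmd('pi', 'donkeypi.local', '~/mycar/data/', './data/')
print(shlex.join(cmd))
```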

## Continuous Train

This command starts the Keras training in a mode where it continuously looks for new data at the end of every epoch.

Usage:

```bash
donkey contrain [--tub=<data_path>] [--model=<path to model>] [--transfer=<path to model>] [--type=<linear|categorical|rnn|imu|behavior|3d>] [--aug]
```

* This command may be run from `~/mycar` dir
* Run on the host computer
* First copy your public key to the pi so you don't need a password for each rsync:

```bash
cat ~/.ssh/id_rsa.pub | ssh pi@<your pi ip> 'cat >> .ssh/authorized_keys'
```

* If you don't have an `id_rsa.pub`, then google how to make one
* Edit your config.py and make sure the fields `PI_USERNAME`, `PI_HOSTNAME`, `PI_DONKEY_ROOT` are set up. Only on Windows do you need to set `PI_PASSWD`.
* Optionally it can send the model file to your pi when it achieves a best loss. In config.py set `SEND_BEST_MODEL_TO_PI = True`.
* Your pi drive loop will autoload the weights file when it changes. This works best if the car is started with `.json` weights like:

```bash
python manage.py drive --model models/drive.json
```

## Joystick Wizard

This command line wizard will walk you through the steps to create a custom controller.
17 changes: 9 additions & 8 deletions donkeycar/parts/tub_v2.py
@@ -9,10 +9,11 @@


```python
class Tub(object):
    """
    A datastore to store sensor data in a key, value format. \n
    Accepts str, int, float, image_array, image, and array data types.
    """

    def __init__(self, base_path, inputs=[], types=[], metadata=[],
                 max_catalog_len=1000, read_only=False):
        self.base_path = base_path
```
@@ -28,15 +29,15 @@ def __init__(self, base_path, inputs=[], types=[], metadata=[],
```python
        if not os.path.exists(self.images_base_path):
            os.makedirs(self.images_base_path, exist_ok=True)

    def write_record(self, record=None):
        """
        Can handle various data types including images.
        """
        contents = dict()
        for key, value in record.items():
            if value is None:
                continue
            elif key not in self.input_types:
                continue
            else:
                input_type = self.input_types[key]
```
@@ -99,9 +100,9 @@ def _image_file_name(cls, index, key, extension='.jpg'):


```python
class TubWriter(object):
    """
    A Donkey part, which can write records to the datastore.
    """
    def __init__(self, base_path, inputs=[], types=[], metadata=[],
                 max_catalog_len=1000):
        self.tub = Tub(base_path, inputs, types, metadata, max_catalog_len)
```
34 changes: 29 additions & 5 deletions scripts/convert_to_tub_v2.py
@@ -22,29 +22,53 @@


```python
def convert_to_tub_v2(paths, output_path):
    empty_record = {'__empty__': True}

    if type(paths) is str:
        paths = [paths]
    legacy_tubs = [LegacyTub(path) for path in paths]
    output_tub = None
    print(f'Total number of tubs: {len(legacy_tubs)}')

    for legacy_tub in legacy_tubs:
        if not output_tub:
            # add input and type for empty records recording
            inputs = legacy_tub.inputs + ['__empty__']
            types = legacy_tub.types + ['boolean']
            output_tub = Tub(output_path, inputs, types,
                             list(legacy_tub.meta.items()))

        record_paths = legacy_tub.gather_records()
        bar = IncrementalBar('Converting', max=len(record_paths))

        previous_index = None
        for record_path in record_paths:
            try:
                contents = Path(record_path).read_text()
                record = json.loads(contents)
                image_path = record['cam/image_array']
                current_index = int(image_path.split('_')[0])
                image_path = os.path.join(legacy_tub.path, image_path)
                image_data = Image.open(image_path)
                record['cam/image_array'] = image_data
                # first record or they are continuous, just append
                if not previous_index or current_index == previous_index + 1:
                    output_tub.write_record(record)
                    previous_index = current_index
                # otherwise fill the gap with dummy records
                else:
                    # Skipping over previous record here because it has
                    # already been written.
                    previous_index += 1
                    # Adding empty record nodes, and marking them deleted
                    # until the next valid record.
                    while previous_index < current_index:
                        idx = output_tub.manifest.current_index
                        output_tub.write_record(empty_record)
                        output_tub.delete_record(idx)
                        previous_index += 1
                bar.next()
            except Exception as exception:
                print(f'Ignoring record path {record_path}\n', exception)
                traceback.print_exc()
```
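The gap-filling logic of the conversion script can be illustrated in isolation: whenever consecutive record indices have a hole, a placeholder is written for every missing index. A toy model of that behaviour (not the real `Tub` datastore):

```python
def fill_gaps(indices):
    """Return the written sequence: real indices, with '__empty__'
    placeholders standing in for every missing index in between."""
    written = []
    previous = None
    for current in indices:
        if previous is None or current == previous + 1:
            # first record, or continuous with the last one
            written.append(current)
        else:
            # fill the hole with dummy records, then the real one
            for _missing in range(previous + 1, current):
                written.append('__empty__')
            written.append(current)
        previous = current
    return written

print(fill_gaps([1, 2, 5, 6]))  # [1, 2, '__empty__', '__empty__', 5, 6]
```

This keeps record indices and image file names aligned in the converted tub, at the cost of a few deleted dummy records.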

