Learn Split Model

Processing Flow of Single Shot MultiBox Detector: SSD

SSD consists of three parts:

  1. Prepare. (Resize the image to 300x300.)
  2. Detection. (Find candidate boxes.)
  3. Non-Maximum Suppression. (Final detection.)

The split model divides the graph into the parts before and after non-maximum suppression.
This split position provides excellent performance and works around the poor performance of tf.where on the GPU.
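To recap what the post-split CPU part has to compute, a greedy single-class NMS can be sketched in plain NumPy. This is a hypothetical helper for illustration only, not the repository's code; TensorFlow's BatchMultiClassNonMaxSuppression runs this per class and per batch element.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy single-class non-maximum suppression.
    boxes: (N, 4) as [y1, x1, y2, x2]; scores: (N,). Returns kept indices."""
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top box with the remaining boxes
        yy1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        xx1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        yy2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        xx2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, yy2 - yy1) * np.maximum(0, xx2 - xx1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                  (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + area_r - inter)
        # Drop boxes overlapping the kept box too much
        order = order[1:][iou <= iou_threshold]
    return keep
```

The data-dependent loop and boolean masking are exactly the kind of control flow (tf.where) that runs poorly on the GPU, which is why this stage is moved to the CPU.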

The first split model was created by @wkelongws.

He did nice work!
For real-time object detection, this split is the most important part.
tensorflow/models#3270
Before the split, ssd_mobilenet_v1_coco_2017_11_17:

After the split, ssd_mobilenet_v1_coco_2017_11_17:

Learn how to divide the ssd_mobilenet_v1_coco_2017_11_17 model.

First point: Non-Maximum Suppression.

This has two input nodes.


Second point: the two input nodes, ExpandDims_1 and convert_scores.

Postprocessor/ExpandDims_1

The shape of ExpandDims_1 is ?x1917x1x4 (see the output shape).
"?" means the input array length is not fixed.

At training time, this array length is the mini-batch size, "24".
At prediction time, the input image is passed as the array [[image]], so the input array length is "1".
(At prediction time, you can also predict multiple images at once.)

Because the input array length differs between training time and prediction time, the input is defined with tf.placeholder and its first dimension is set to "None" (meaning a non-fixed array length).

That "None" appears as "?".
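The variable batch dimension can be illustrated with plain NumPy arrays (a hypothetical sketch; the shapes follow the text above):

```python
import numpy as np

image = np.zeros((300, 300, 3), dtype=np.float32)  # one resized frame

predict_batch = np.array([image])        # [[image]] -> array length "1"
train_batch = np.stack([image] * 24)     # mini-batch size "24" at training time

# A placeholder whose leading dimension is None ("?") accepts either length.
```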


Divide here.
Write the definition of this division point in the source code:lib/load_graph_nms_v1.py as follows.

        """ SPLIT TARGET NAME """
        SPLIT_TARGET_NAME = ['Postprocessor/convert_scores',
                             'Postprocessor/ExpandDims_1',
        ]

Postprocessor/convert_scores

The shape of convert_scores is ?x1917x90 (see the output shape).


Divide here.
Write the definition of this division point in the source code:lib/load_graph_nms_v1.py as follows.

        """ SPLIT TARGET NAME """
        SPLIT_TARGET_NAME = ['Postprocessor/convert_scores',
                             'Postprocessor/ExpandDims_1',
        ]

Programming

Add new inputs (score_in, expand_in) for the secondary graph (CPU part).

Write the new inputs into the default graph with tf.placeholder. source code:lib/load_graph_nms_v1.py

        tf.reset_default_graph()

        """ ADD CPU INPUT """
        target_in = [tf.placeholder(tf.float32, shape=(None, split_shape, num_classes), name=SPLIT_TARGET_NAME[0]),
                     tf.placeholder(tf.float32, shape=(None, split_shape, 1, 4), name=SPLIT_TARGET_NAME[1]),
        ]

First, I reset the default graph; I wrote it to make clear that the graph is empty at this point.
The shapes come from the previous graph diagrams.
Set the same names for the name argument. Because these names already exist when the two graph parts are imported together, "_1" is appended to each new input name automatically; use those suffixed names.

Get the graph_def of the new inputs.

The new inputs now exist in the default graph, so get the graph def from there.
After getting the graph def of the new inputs, reset the default graph. The tf.placeholder inputs were created only to obtain their graph def; they are not needed anymore.

        """
        Load placeholder's graph_def.
        """
        target_def = []
        for node in tf.get_default_graph().as_graph_def().node:
            for stn in SPLIT_TARGET_NAME:
                if node.name == stn:
                    target_def += [node]
        tf.reset_default_graph()

Load Frozen Graph.

Load the frozen graph into the graph_def variable.

        graph_def = tf.GraphDef()
        with tf.gfile.GFile(model_path, 'rb') as fid:
            serialized_graph = fid.read()
            graph_def.ParseFromString(serialized_graph)

For a non-split model, the loaded graph_def is simply imported into the default graph, and the default graph is returned.

    def load_frozen_graph_without_split(self):
        """
        Load frozen_graph.
        """
        model_path = self.cfg['model_path']

        tf.reset_default_graph()

        graph_def = tf.GraphDef()
        with tf.gfile.GFile(model_path, 'rb') as fid:
            serialized_graph = fid.read()
            graph_def.ParseFromString(serialized_graph)
            tf.import_graph_def(graph_def, name='')
        return tf.get_default_graph()

In the split model, however, processing continues.


Next comes the most important code for operating on the model.
Load the inputs of every node and record them in edges[NODE_NAME].

            """
            Check the connection of all nodes.
            edges[] variable has input information for all nodes.
            """
            edges = {}
            name_to_node_map = {}
            node_seq = {}
            seq = 0
            for node in graph_def.node:
                n = self.node_name(node.name)
                if n in SPLIT_TARGET_NAME:
                     print(node)
                name_to_node_map[n] = node
                edges[n] = [self.node_name(x) for x in node.input]
                if n in SPLIT_TARGET_NAME:
                     print(edges[n])
                node_seq[n] = seq
                seq += 1
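The node_name() helper is not shown here; presumably it normalizes a GraphDef input string by stripping the control-dependency marker "^" and any output index such as ":0". A minimal sketch of that assumption:

```python
def node_name(name):
    """Strip the '^' control-input prefix and the ':<n>' output index
    so that a GraphDef input string becomes a bare node name."""
    if name.startswith('^'):
        name = name[1:]
    return name.split(':')[0]
```

This is why edges[] contains plain node names even though GraphDef inputs may carry suffixes.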

The node 'Postprocessor/ExpandDims_1' has 2 inputs.
Node of Postprocessor/ExpandDims_1:

        name: "Postprocessor/ExpandDims_1"
        op: "ExpandDims"
        input: "Postprocessor/Reshape_2"
        input: "Postprocessor/ExpandDims_1/dim"
        attr {
          key: "T"
          value {
            type: DT_FLOAT
          }
        }
        attr {
          key: "Tdim"
          value {
            type: DT_INT32
          }
        }

Therefore, edges['Postprocessor/ExpandDims_1'] has 2 input node names. Edge of Postprocessor/ExpandDims_1:

        ['Postprocessor/Reshape_2', 'Postprocessor/ExpandDims_1/dim']

The node 'Postprocessor/convert_scores' has 1 input.
Node of Postprocessor/convert_scores:

        name: "Postprocessor/convert_scores"
        op: "Sigmoid"
        input: "Postprocessor/scale_logits"
        attr {
          key: "T"
          value {
            type: DT_FLOAT
          }
        }

Therefore, edges['Postprocessor/convert_scores'] has 1 input node name. Edge of Postprocessor/convert_scores:

        ['Postprocessor/scale_logits']

As you can see, the edges[] variable has input information for all nodes.


Alert if a split target is not in the graph.
Raising an error is also fine.

            """
            Alert if split target is not in the graph.
            """
            dest_nodes = SPLIT_TARGET_NAME
            for d in dest_nodes:
                assert d in name_to_node_map, "%s is not in graph" % d

Follow all input nodes from the split point and add them to the keep list. This becomes the GPU part.

            """
            Making GPU part.
            Follow all input nodes from the split point and add it into keep_list.
            """
            nodes_to_keep = set()
            next_to_visit = dest_nodes

            while next_to_visit:
                n = next_to_visit[0]
                del next_to_visit[0]
                if n in nodes_to_keep:
                    continue
                nodes_to_keep.add(n)
                next_to_visit += edges[n]

            nodes_to_keep_list = sorted(list(nodes_to_keep), key=lambda n: node_seq[n])

            keep = graph_pb2.GraphDef()
            for n in nodes_to_keep_list:
                keep.node.extend([copy.deepcopy(name_to_node_map[n])])
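The traversal above is a plain breadth-first walk over the input edges. With a hypothetical four-node edge map (names invented for illustration), it behaves like this:

```python
# Toy edge map: each node lists its inputs (hypothetical node names).
edges = {
    'image_tensor': [],
    'FeatureExtractor': ['image_tensor'],
    'Postprocessor/convert_scores': ['FeatureExtractor'],
    'BatchMultiClassNonMaxSuppression': ['Postprocessor/convert_scores'],
}

dest_nodes = ['Postprocessor/convert_scores']   # the split target
nodes_to_keep = set()
next_to_visit = list(dest_nodes)
while next_to_visit:
    n = next_to_visit[0]
    del next_to_visit[0]
    if n in nodes_to_keep:
        continue
    nodes_to_keep.add(n)
    next_to_visit += edges[n]   # walk upstream through the inputs

# Everything upstream of the split point is kept; the NMS node is not.
```

Because only inputs are followed, anything downstream of the split targets (the NMS subgraph) never enters the keep set.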

Making the CPU part is simple: remove the GPU part from the loaded graph and add the new inputs.

            """
            Making CPU part.
            It removes GPU part from loaded graph and add new inputs.
            """
            nodes_to_remove = set()
            for n in node_seq:
                if n in nodes_to_keep_list: continue
                nodes_to_remove.add(n)
            nodes_to_remove_list = sorted(list(nodes_to_remove), key=lambda n: node_seq[n])

            remove = graph_pb2.GraphDef()
            for td in target_def:
                remove.node.extend([td])
            for n in nodes_to_remove_list:
                remove.node.extend([copy.deepcopy(name_to_node_map[n])])
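The CPU part is just the set complement of the keep list, kept in the original GraphDef order via node_seq. A small sketch with a hypothetical node order:

```python
# Hypothetical node order as recorded in node_seq while scanning the GraphDef.
node_seq = {'image_tensor': 0,
            'FeatureExtractor': 1,
            'Postprocessor/convert_scores': 2,
            'BatchMultiClassNonMaxSuppression': 3}
nodes_to_keep_list = ['image_tensor', 'FeatureExtractor',
                      'Postprocessor/convert_scores']

# Everything not kept for the GPU part belongs to the CPU part.
nodes_to_remove = set(n for n in node_seq if n not in nodes_to_keep_list)
nodes_to_remove_list = sorted(nodes_to_remove, key=lambda n: node_seq[n])
```

Sorting by node_seq preserves the original definition order, which keeps the rebuilt GraphDef valid (inputs defined before the nodes that consume them).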

Finally, add the device info, import both parts into the default graph, and return the default graph.

            """
            Import graph_def into default graph.
            """
            with tf.device('/gpu:0'):
                tf.import_graph_def(keep, name='')
            with tf.device('/cpu:0'):
                tf.import_graph_def(remove, name='')

        return tf.get_default_graph()

Use the split model.

The input of the primary graph (GPU part) does not change; it is the image array. The output operation names are ExpandDims_1 and convert_scores.

The inputs of the secondary graph (CPU part) become expand_in and score_in, the placeholders created with tf.placeholder. The output operation names do not change; they are detection_boxes, detection_scores, detection_classes and num_detections.

If load_graph() returned expand_in and score_in, I could use them as the secondary graph's input tensors. But I wrote it with graph.get_tensor_by_name(), like any other operation.
source code:lib/detection_nms_v1.py

        if SPLIT_MODEL:
            SPLIT_TARGET_NAME = ['Postprocessor/convert_scores',
                                 'Postprocessor/ExpandDims_1',
            ]
            split_out = []
            split_in = []
            for stn in SPLIT_TARGET_NAME:
                split_out += [graph.get_tensor_by_name(stn+':0')]
                split_in += [graph.get_tensor_by_name(stn+'_1:0')]
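No TensorFlow is needed to see the naming convention at work: the split outputs keep the original op names, while the re-created placeholders carry the automatic "_1" suffix. A standalone sketch of that wiring:

```python
SPLIT_TARGET_NAME = ['Postprocessor/convert_scores',
                     'Postprocessor/ExpandDims_1']

# GPU-part outputs: the first output tensor (':0') of the original ops.
split_out_names = [stn + ':0' for stn in SPLIT_TARGET_NAME]
# CPU-part inputs: the placeholders auto-renamed with '_1' at import time.
split_in_names = [stn + '_1:0' for stn in SPLIT_TARGET_NAME]
```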

Diagram of the split model.

New outputs: ExpandDims_1 and convert_scores.


New inputs: ExpandDims_1_1 and convert_scores_1.



Split model for new Non-Maximum Suppression.

In 2018, we know that ssd_mobilenet_v1 changed somewhat, and that ssd_mobilenet_v2 was uploaded.
Ok, let's check ssd_mobilenet_v2 first.

Looking at the graph, I can see that there are three inputs.
Graph diagram of ssd_mobilenet_v2_2018_03_29:


Let's look at these input nodes.

See the type and output shape of each.
ExpandDims_1 is the same as the previous one.
And what is Slice? It seems to be convert_scores, just renamed.
And what is stack_1? This one is a new face!

ExpandDims_1:

Slice:

stack_1:

Write code and build graph.

stack_1 seems to be an array of floats, so it becomes a tf.placeholder whose shape is None.
source code:lib/load_graph_nms_v2.py

        """ SPLIT TARGET NAME """
        SPLIT_TARGET_NAME = ['Postprocessor/Slice', # Tensor
                             'Postprocessor/ExpandDims_1', # Tensor
                             'Postprocessor/stack_1', # Float array
        ]
        """ ADD CPU INPUT """
        target_in = [tf.placeholder(tf.float32, shape=(None, split_shape, num_classes), name=SPLIT_TARGET_NAME[0]),
                     tf.placeholder(tf.float32, shape=(None, split_shape, 1, 4), name=SPLIT_TARGET_NAME[1]), # shape=output shape
                     tf.placeholder(tf.float32, shape=(None), name=SPLIT_TARGET_NAME[2]), # array of float
        ]
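A small Python aside on the third placeholder: `(None)` is just `None` (only a trailing comma makes a tuple), so `shape=(None)` actually declares a fully unknown shape rather than a rank-1 shape. It still accepts the float array either way:

```python
# (None) is not a one-element tuple; parentheses alone are just grouping.
assert (None) is None
# A trailing comma is what makes the tuple:
rank1_shape = (None,)
assert isinstance(rank1_shape, tuple) and len(rank1_shape) == 1
```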

Build split graph.

Write code and run.

Operations.
source code:lib/detection_nms_v2.py

        if SPLIT_MODEL:
            SPLIT_TARGET_NAME = ['Postprocessor/Slice',
                                 'Postprocessor/ExpandDims_1',
                                 'Postprocessor/stack_1'
            ]
            split_out = []
            split_in = []
            for stn in SPLIT_TARGET_NAME:
                split_out += [graph.get_tensor_by_name(stn+':0')]
                split_in += [graph.get_tensor_by_name(stn+'_1:0')]

Of course, the arguments and return values of sess.run() use these tensors.

Check Other Models.

  • ssdlite_mobilenet_v2_coco_2018_05_09
  • ssd_inception_v2_coco_2018_01_28
  • ssd_mobilenet_v1_coco_2018_01_28

These have the same BatchMultiClassNonMaxSuppression inputs as ssd_mobilenet_v2_coco_2018_03_29.
ssdlite_mobilenet_v2_coco_2018_05_09:

ssd_inception_v2_coco_2018_01_28:

ssd_mobilenet_v1_coco_2018_01_28: