[Solved] YAD2K CoreML conversion fails due to Lambda layer

I'm trying to convert a Darknet Yolo v2 model to Keras and then to CoreML using Apple's coremltools:
https://github.com/apple/coremltools/

This procedure apparently used to work according to this tutorial:
https://github.com/hollance/YOLO-CoreML-MPSNNGraph

I'm kind of a noob, but from what I understand, Lambda layers just let you run arbitrary code (which, unsurprisingly, is not supported by Apple's API). It looks like this is where it happens:

yad2k.py

elif section.startswith('reorg'):
    block_size = int(cfg_parser[section]['stride'])
    assert block_size == 2, 'Only reorg with stride 2 supported.'
    all_layers.append(
        Lambda(
            space_to_depth_x2,
            output_shape=space_to_depth_x2_output_shape,
            name='space_to_depth_x2')(prev_layer))
    prev_layer = all_layers[-1]

Is there a way to do space_to_depth with the keras API so the conversion is supported? I'm really out of my depth (pun intended) here and don't really understand what's going on. Any help would be appreciated. :)

23 Answers

✔️Accepted Answer

In general @pchensoftware described everything correctly, but I'll give a more detailed explanation for those who want to convert the network themselves.

I did all of this on Linux; some corrections may be needed for macOS or Windows. There are a lot of dirty hacks here. I assume you want it to just fucking work by putting a kludge in the heart of your code, not a proper solution with tests, CI, agile, etc. If so, then:

Darknet to Keras

So you have trained a full YOLO and want to run it on CoreML. First, convert it to the Keras format using

./yad2k.py yolo.cfg yolo.weights yolo-voc.h5

The name yolo here is just an example; replace it with the names of your weights and config files.

If you trained your own network, it may use a slightly different header format for the .weights file. You can tell by looking at the output of yad2k.py.

At the end it writes Read 50768311 of 50768311.0 from Darknet weights. If those two numbers differ by one, your model probably uses the newer .weights header format, where the seen counter in the header is 64 bits instead of the previous 32. If, and only if, that is the case, apply the following patch to yad2k.py:

Replace
shape=(4, ), dtype='int32', buffer=weights_file.read(16))
with
shape=(5, ), dtype='int32', buffer=weights_file.read(20))

In my case, as you can see, the numbers are equal, so nothing has to be done.
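
For reference, a minimal sketch of what that patch changes; the filename is just an example, and the 64-bit (newer) header case is shown:

import numpy as np

with open('yolo.weights', 'rb') as weights_file:
    # Old Darknet header: major, minor, revision, seen -- four int32s (16 bytes).
    # Newer header: the seen counter became int64, so the header is 20 bytes,
    # which yad2k.py reads as five int32s.
    header = np.ndarray(shape=(5, ), dtype='int32', buffer=weights_file.read(20))
    print('Weights header:', header)
    # Everything after the header is raw float32 weight data.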

In the end you should have your yolo-voc.h5 converted. Try it with

./test_yolo.py --anchors_path ... --classes_path ... model_data/yolo-voc.h5

In the images/out directory you should see your images with the detected objects drawn on them.

Keras to CoreML

First of all, get the source of coremltools; you will need to patch it.
Build it from source using the instructions on their site.

Create a file convert.py in the directory where you cloned the source with this content:

#!/usr/bin/env python
from coremltools._scripts.converter import _main

_main()

This is needed just to run the conversion. Maybe there is some other way, who knows. In the end you will run all of this with the following command:

./convert.py --srcModelPath yolo-voc.h5 --dstModelPath yolo-voc.mlmodel --inputNames image --imageInputNames image --outputNames grid --scale 0.00392156885937

The scale parameter is important; don't forget to set it. The value is just 1/255 (up to float rounding), which maps 8-bit pixel values into [0, 1].

But before you can run it, let's fix a few things. First of all, coremltools is available only for Python 2. If you know how to run it with Python 3, skip what I describe in the Patching Keras chapter; for me it was easier to kludge the code so it just works.

Patching Coremltools

Make a virtualenv
virtualenv -p python2 --system-site-packages venv2

Activate it
source venv2/bin/activate

Install some libs

pip install h5py==2.7.1
pip install keras==2.0.6

Then patch the main code. Here is the diff:

diff --git a/coremltools/converters/keras/_keras2_converter.py b/coremltools/converters/keras/_keras2_converter.py
index 530c8bf..8e3cc95 100644
--- a/coremltools/converters/keras/_keras2_converter.py
+++ b/coremltools/converters/keras/_keras2_converter.py
@@ -69,6 +69,8 @@ if _HAS_KERAS2_TF:
         
         _keras.applications.mobilenet.DepthwiseConv2D:_layers2.convert_convolution,
 
+        _keras.layers.core.Lambda: _layers2.convert_reorganize,
+
     }
     
 
diff --git a/coremltools/converters/keras/_layers2.py b/coremltools/converters/keras/_layers2.py
index 01d2bdd..900af43 100644
--- a/coremltools/converters/keras/_layers2.py
+++ b/coremltools/converters/keras/_layers2.py
@@ -866,6 +866,12 @@ def convert_reshape(builder, layer, input_names, output_names, keras_layer):
     else:
         _utils.raise_error_unsupported_categorical_option('input_shape', str(input_shape), 'reshape', layer)
 
+def convert_reorganize(builder, layer, input_names, output_names, keras_layer):
+
+    input_name, output_name = (input_names[0], output_names[0])
+
+    builder.add_reorganize_data(name = layer, input_name = input_name, output_name=output_name, block_size=2)
+
 def convert_simple_rnn(builder, layer, input_names, output_names, keras_layer):
     """
     Convert an SimpleRNN layer from keras to coreml.
     

Patching Keras

Don't do this if you have coremltools running under Python 3; it is only needed for the out-of-the-box Python 2 setup.

You also need to patch the Keras code a bit. Edit venv2/local/lib/python2.7/site-packages/keras/utils/generic_utils.py and add these two functions:

def space_to_depth_x2(x):
    """Thin wrapper for Tensorflow space_to_depth with block_size=2."""
    # Import currently required to make Lambda work.
    # See: https://github.com/fchollet/keras/issues/5088#issuecomment-273851273
    import tensorflow as tf
    return tf.space_to_depth(x, block_size=2)


def space_to_depth_x2_output_shape(input_shape):
    """Determine space_to_depth output shape for block_size=2.

    Note: For Lambda with TensorFlow backend, output shape may not be needed.
    """
    return (input_shape[0], input_shape[1] // 2, input_shape[2] // 2, 4 *
            input_shape[3]) if input_shape[1] else (input_shape[0], None, None,
                                                    4 * input_shape[3])

and instead of the line marshal.loads(code.encode('raw_unicode_escape')), put:

    if len(code) == 422:
        code = space_to_depth_x2.__code__
    else:
        code = space_to_depth_x2_output_shape.__code__

The len may differ in your case, or maybe not; remember, this is a very dirty hack that works on exactly one PC. In general this code is executed twice: the first time you have to pass code = space_to_depth_x2.__code__, and the second time code = space_to_depth_x2_output_shape.__code__. How you distinguish the two calls is up to you.
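
For orientation, the line you are replacing lives in func_load in generic_utils.py. Here is a sketch of how the patched function looks under Keras 2.0.6; this is reconstructed from memory, so the surrounding lines may differ slightly in your copy:

import types as python_types

def func_load(code, defaults=None, closure=None, globs=None):
    """Deserializes a user-defined function (dirty hack applied)."""
    if isinstance(code, (tuple, list)):  # unpack previous dump
        code, defaults, closure = code
    # Original line, which fails for the YAD2K Lambdas here:
    # code = marshal.loads(code.encode('raw_unicode_escape'))
    # Hack: substitute our own compiled code objects, telling the two
    # serialized Lambdas apart by the length of the marshalled blob.
    if len(code) == 422:
        code = space_to_depth_x2.__code__
    else:
        code = space_to_depth_x2_output_shape.__code__
    if globs is None:
        globs = globals()
    return python_types.FunctionType(code, globs,
                                     name=code.co_name,
                                     argdefs=defaults,
                                     closure=closure)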

Enjoy!

./convert.py --srcModelPath yolo-voc.h5 --dstModelPath yolo-voc.mlmodel --inputNames image --imageInputNames image --outputNames grid --scale 0.00392156885937

After that I can run my full YOLO on an iPhone.

Other Answers:

A quick and dirty solution can be applied directly on the cfg file to remove the reorg layer.

The reorg layer in the keras_yolo.py code uses TensorFlow's space_to_depth function. It moves data from the height and width dimensions into the depth dimension, reducing height and width without losing information.
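
To make this concrete, here is a minimal NumPy sketch of space_to_depth with block size 2, mirroring what YAD2K's space_to_depth_x2 Lambda computes, using the shapes discussed below:

import numpy as np

def space_to_depth_x2(x):
    """Rearrange an (H, W, C) map into (H/2, W/2, 4C): each 2x2 spatial
    block is stacked along the channel axis, so no values are discarded."""
    h, w, c = x.shape
    return (x.reshape(h // 2, 2, w // 2, 2, c)
             .transpose(0, 2, 1, 3, 4)
             .reshape(h // 2, w // 2, 4 * c))

x = np.random.rand(38, 38, 64).astype(np.float32)
print(space_to_depth_x2(x).shape)  # (19, 19, 256)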

In the cfg file, the part to modify is the following:

[reorg]                                                                                                            
stride=2                                                                                                           

[route]                                                                                                            
layers=-1,-4 

This applies space_to_depth to the convolutional layer just above it (from 38x38x64 to 19x19x256) and concatenates the result with a previous convolutional layer (19x19x1024) to produce an output of size 19x19x1280.

It is possible to replace these four cfg lines with:

[maxpool]
size=2
stride=2

[route]
layers=-2

[maxpool]
size=2
stride=2

[route]
layers=-4

[maxpool]
size=2
stride=2

[route]
layers=-6

[maxpool]
size=2
stride=2

[route]
layers=-1,-3,-5,-7

[route]
layers=-1, -11

This transforms the output of the convolutional layer (38x38x64) into a smaller one (19x19x64) via max pooling. That output is duplicated four times, and the copies are concatenated to match the expected shape (19x19x256). The final route then concatenates the stacked feature maps with the previous convolutional layer to produce an output with the correct shape (19x19x1280).
EDIT: the Keras to CoreML converter doesn't allow a [route] that references the same layer multiple times, so it is not possible to just use layers=-1,-1,-1,-1 instead of the four independent routes.
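
For comparison with the space_to_depth sketch above, here is a minimal NumPy sketch of what this cfg trick computes instead; the four concatenated copies are identical, which is exactly where the information loss comes from:

import numpy as np

def maxpool_2x2(x):
    """2x2 max pooling with stride 2 over an (H, W, C) feature map."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

x = np.random.rand(38, 38, 64).astype(np.float32)
approx = np.concatenate([maxpool_2x2(x)] * 4, axis=-1)
print(approx.shape)  # (19, 19, 256), but each location holds four identical copies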

If you don't match the correct shapes, the trained weights won't line up with the corresponding layers and the model will produce wrong results.

Using the specific postprocessing presented here with a classification confidence threshold of 0.3 and an IoU threshold of 0.4, the results are not as good as before but they look correct over the images.

Using the 2017 COCO validation dataset and the Python COCO API to compute the mAP scores, we obtain the following results before removing the reorg layer:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.195
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.421
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.154

The results after the quick and dirty solution:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.159
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.368
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.110

A few points are lost because we should not be using four duplicates of the max-pooled output; it needs to be the entire 38x38x64 layer, just reshaped to 19x19x256.
I couldn't find a quick way to achieve this by modifying only the cfg file.

A longer solution is to construct the entire graph in TensorFlow (or another framework) with a reshape tensor at the needed place (between batch_normalization_20 and conv2d_22). You then load the weights so they perfectly match the graph and export it as a protobuf file. Finally, you convert the protobuf file to a Keras model and from there into a CoreML model.
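
If you take that route, the key piece is a real space_to_depth node at the right spot. A minimal TensorFlow 1.x sketch, where the placeholder stands in for the output of batch_normalization_20:

import tensorflow as tf

# Stand-in for the output of batch_normalization_20 in the rebuilt graph.
x = tf.placeholder(tf.float32, shape=(None, 38, 38, 64))
# The true reorg operation, instead of the maxpool approximation above.
y = tf.space_to_depth(x, block_size=2)
print(y.shape)  # (None, 19, 19, 256); this sits between the two named layers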
