How to Get Core ML to Produce Images as Output

As I was working on my image processing application with heavy machine-learning content, I discovered some issues when converting my machine-learning models from Keras into Core ML so that I could run them in my iOS app. The conversion tools provided by Apple have decent support for machine-learning models that take images as input. However, support for models that produce images as output is a bit lacking.

Specifically, the Keras model converter shipped as part of coremltools version 2.1 can’t seem to create Core ML models that output images, even though its support for images as input is pretty good. You can take a Keras model that accepts an n×m×3 tensor as input and convert it so that the resulting Core ML model takes a CVPixelBuffer instance as input instead. There are options to specify a normalization factor, per-channel biases, and even whether the model expects RGB or BGR images. However, there’s practically no equivalent support at the output end of the model. There’s no easy way to configure the converter to create a Core ML model that outputs a CVPixelBuffer instance ready for use by other iOS or macOS system functions.

Core ML Conversion

The documentation on this isn’t very clear either, and browsing Apple’s forums only yields a partial answer. Fortunately, after many hours of searching Google, digging through Apple’s own developer forums, and some independent poking around, I finally found a solution. Now you don’t need to go through the same ordeal that I did. Interested? Read on.

The Problem

Most machine-learning models that process bitmap images expect pixels as floating-point values ranging 0..1 inclusive. This likely stems from the nature of many machine-learning primitives, which work best on small values; take the sigmoid function as an example. On top of that, since these algorithms usually work with 32-bit floating-point values, keeping values small means the representable numbers are packed more densely, so there is finer precision to the right of the decimal point. This also suits functions that are most responsive to inputs close to zero.
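
To put numbers on the precision and sigmoid arguments, here is a small standard-library Python sketch. It computes the gap between adjacent 32-bit floats at a given value (assuming a positive, normal number) and the slope of the sigmoid at that value. The helper names are mine, purely for illustration.

```python
import math


def float32_spacing(x):
    """Gap between x and the next larger float32, for a positive normal x."""
    # math.frexp returns (m, e) with x == m * 2**e and 0.5 <= m < 1,
    # so the float32 exponent is e - 1 and the gap is 2**(e - 1 - 23).
    _, e = math.frexp(x)
    return math.ldexp(1.0, e - 24)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_slope(x):
    # The derivative of the sigmoid is s * (1 - s).
    s = sigmoid(x)
    return s * (1.0 - s)


for value in (0.5, 1.0, 255.0):
    print(f"x={value:>6}: float32 spacing={float32_spacing(value):.3e}, "
          f"sigmoid slope={sigmoid_slope(value):.3e}")
```

Around 0.5 the sigmoid’s slope is still about 0.235, while at 255 the sigmoid has saturated to exactly 1.0 in double precision and its slope is zero. Likewise, the float32 gap at 255 (2^-16) is 256 times coarser than at 0.5 (2^-24).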

However, the more common bitmap image formats store color values as integers. Typically each pixel is a triplet of integers, each ranging 0..255 inclusive, representing the red, green, and blue channels. The Keras converter has options to convert RGB images with 8-bit channels into the floating-point channels that many machine-learning models expect as input, including mapping the 0..255 range of integers into 0..1 floating-point values. However, there’s no such option when you want the output to be an integer tensor ranging 0..255.
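
The input-side option boils down to a per-channel affine map, value × scale + bias. The plain-Python sketch below only illustrates that arithmetic; it is not coremltools code. Its parameter names mirror the Keras converter’s image_scale, red_bias, green_bias, blue_bias, and is_bgr options, and emitting the channels in reverse order for BGR is my simplification of what that flag means.

```python
def preprocess_pixel(rgb, image_scale=1 / 255.0,
                     red_bias=0.0, green_bias=0.0, blue_bias=0.0,
                     is_bgr=False):
    """Map one 8-bit (r, g, b) pixel to the floats the model would see.

    Each channel becomes value * image_scale + that channel's bias.
    With is_bgr=True the result is returned in B, G, R order.
    """
    r, g, b = rgb
    scaled = (r * image_scale + red_bias,
              g * image_scale + green_bias,
              b * image_scale + blue_bias)
    return scaled[::-1] if is_bgr else scaled


print(preprocess_pixel((255, 128, 0)))
print(preprocess_pixel((255, 128, 0), is_bgr=True))
```

Passing image_scale=2/255.0 with every bias set to -1.0 would instead give the -1..1 range that some models expect.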

You can post-process the output tensor yourself and convert it into an image. Just beware that naive post-processing may not be optimal. If you iterate through every element of the MLMultiArray instance and extract each channel value of each pixel through the subscript operator, be aware of these overheads:

  • Each access is an Objective-C dynamically dispatched method call, with all the associated overhead of looking up methods at runtime.
  • Each access produces an NSNumber instance, which may involve an object allocation, unless you are lucky enough that every value can be inlined as a tagged pointer.
  • If you use the multi-dimensional version of the subscript operator, you also need object allocations to specify the tensor index, at least for the index array itself.
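
For reference, here is roughly what that manual post-processing has to do, written as a plain Python loop so that each per-element step is visible. It assumes a channel-first (3, H, W) tensor of 0..1 floats, the layout Core ML uses for image-like multiarrays; the clamping and rounding are my own choices for the sketch, and tensor_to_rgb8 is a hypothetical name.

```python
def tensor_to_rgb8(tensor):
    """Convert a (3, H, W) nested list of 0..1 floats into rows of (r, g, b) bytes.

    Every element is visited individually: scale by 255, round to the
    nearest integer, clamp to 0..255, and regroup the channels per pixel.
    """
    channels = len(tensor)
    if channels != 3:
        raise ValueError(f"expected 3 channels, got {channels}")
    height, width = len(tensor[0]), len(tensor[0][0])
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            pixel = tuple(
                min(255, max(0, round(tensor[c][y][x] * 255)))
                for c in range(channels)
            )
            row.append(pixel)
        rows.append(row)
    return rows


sample = [
    [[0.0, 1.0]],   # red plane, 1 x 2 pixels
    [[0.5, 1.0]],   # green plane
    [[1.2, -0.1]],  # blue plane, with out-of-range values to clamp
]
print(tensor_to_rgb8(sample))
```

For a 512 × 512 RGB image that is 786,432 element reads, each of which would pay the costs listed above if it went through MLMultiArray’s subscript operator.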

Moreover, converting each pixel value this way means that you are not taking advantage of acceleration by the GPU or even the vector processor. But optimizing this conversion yourself often means duplicating some of the work that Core ML is already doing. Fortunately, there is a way to get Core ML to do this post-processing so that you can do away with MLMultiArray altogether and get a ready-to-use CVPixelBuffer instead.

The Solution

To create a Core ML model that outputs an image, you need to perform some surgery on the model after the conversion process. First, convert your machine-learning model from whatever format you originally trained it in into the Core ML format. Then use the coremltools Python library to add a post-processing layer on top of your original model’s output layer. Finally, save the result as another Core ML model, ready for use in your Xcode project.

The following is an overview of the surgery procedure.  I recommend you do this surgery in the Jupyter Notebook environment so that you can “go slow” and check the result of each command.

  1. Load the converted Core ML model.
  2. Add a new ActivationLinear layer at the end of the model, using alpha=255 and beta=0.
  3. Mark the new layer as an image output layer.
  4. Save the changed model as a new Core ML model file.
  5. Test the new model using samples from the original training data.

Loading the Core ML model

Use the coremltools library to load the model into the Python environment. A Core ML model file is in the protocol buffer format, so you could probably edit it with other protobuf tools, but it is easier to use the tools that Apple provides.

Furthermore, an MLModel object isn’t meant to be modified. To change the model, obtain its spec with get_spec() and mutate that instead. When you’re done, create another MLModel object from the modified spec and save that new object into a file.

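import coremltools

coreml_model = coremltools.models.MLModel('my_model.mlmodel')
spec = coreml_model.get_spec()
# The layers live under spec.neuralNetwork (or its classifier/regressor variant)
spec_layers = getattr(spec, spec.WhichOneof("Type")).layers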

Adding a Linear Activation Output Layer

In this step you find the original model’s output layer and append a new activation layer after it. This activation layer scales the tensor from a range of 0..1 into 0..255, ready for use as an image. The new layer becomes the model’s output layer.

# find the current output layer and save it for later reference
last_layer = spec_layers[-1]

# add the post-processing layer
new_layer = spec_layers.add()
new_layer.name = 'convert_to_image'

# Configure it as an activation layer
new_layer.activation.linear.alpha = 255
new_layer.activation.linear.beta = 0

# Use the original model's output as input to this layer
new_layer.input.append(last_layer.output[0])

# Name the output for later reference when saving the model
new_layer.output.append('image_output')

# Find the original model's output description
output_description = next(x for x in spec.description.output if x.name == last_layer.output[0])

# Rename it to the new layer's output blob so that Core ML reads the model's output from that layer
output_description.name = new_layer.output[0]
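
Numerically, Core ML’s linear activation computes alpha * x + beta for every element, so with alpha=255 and beta=0 the new layer is the inverse of an input-side scale of 1/255. The plain-Python sketch below (no coremltools involved; the function names are illustrative) checks that round trip on a few 8-bit values.

```python
def linear_activation(values, alpha=255.0, beta=0.0):
    """Element-wise alpha * x + beta, as the appended activation layer computes."""
    return [alpha * v + beta for v in values]


def normalize(values, image_scale=1 / 255.0):
    """Input-side scaling that maps 0..255 into 0..1."""
    return [v * image_scale for v in values]


pixels = [0, 1, 64, 128, 200, 255]
restored = linear_activation(normalize(pixels))
# The floating-point error is tiny, so rounding recovers the original bytes.
assert [round(v) for v in restored] == pixels
print(restored)
```

The rounding step is there because 1/255 is not exactly representable in binary floating point, so the restored values can be off by a tiny fraction.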

Marking the Output Layer as an Image

Next, let Core ML know that your new output layer represents an image. To do this, call convert_multiarray_output_to_image on the new layer’s output feature. I found this function in Apple’s developer forums, posted by an Apple staff member; I’m not sure why it isn’t integrated into coremltools. Nevertheless, the function is not a complete solution without the additional linear activation output layer described earlier.

# Function to mark a multiarray output as an image
# https://forums.developer.apple.com/thread/81571#241998
from coremltools.proto import FeatureTypes_pb2 as ft

def convert_multiarray_output_to_image(spec, feature_name, is_bgr=False):
    """
    Convert an output multiarray to be represented as an image.
    This will modify the Model_pb spec passed in.
    Example:
        model = coremltools.models.MLModel('MyNeuralNetwork.mlmodel')
        spec = model.get_spec()
        convert_multiarray_output_to_image(spec, 'imageOutput', is_bgr=False)
        newModel = coremltools.models.MLModel(spec)
        newModel.save('MyNeuralNetworkWithImageOutput.mlmodel')
    Parameters
    ----------
    spec: Model_pb
        The specification containing the output feature to convert
    feature_name: str
        The name of the multiarray output feature you want to convert
    is_bgr: boolean
        If the multiarray has 3 channels, set to True for BGR pixel order or False for RGB
    """
    for output in spec.description.output:
        if output.name != feature_name:
            continue
        if output.type.WhichOneof('Type') != 'multiArrayType':
            raise ValueError("%s is not a multiarray type" % output.name)
        # Read the (channels, height, width) shape first: writing to imageType
        # switches the protobuf oneof and clears multiArrayType
        array_shape = tuple(output.type.multiArrayType.shape)
        channels, height, width = array_shape
        if channels == 1:
            output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('GRAYSCALE')
        elif channels == 3:
            if is_bgr:
                output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('BGR')
            else:
                output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('RGB')
        else:
            raise ValueError("Channel Value %d not supported for image outputs" % channels)
        output.type.imageType.width = width
        output.type.imageType.height = height

# Mark the new layer as image
convert_multiarray_output_to_image(spec, output_description.name, is_bgr=False)
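
convert_multiarray_output_to_image unpacks the output shape as exactly (channels, height, width), so if the output description’s shape is empty or has a different rank you only get a terse unpacking error. A pre-flight check such as the hypothetical helper below, written in plain Python so that it can be fed tuple(output_description.type.multiArrayType.shape), gives a clearer message and shows which color space each channel count maps to.

```python
# Channel counts that can be presented as an image, per the forum function's rules
COLOR_SPACES = {1: "GRAYSCALE", 3: "RGB"}


def image_output_layout(shape, is_bgr=False):
    """Return (color_space, width, height) for a (C, H, W) multiarray shape.

    Hypothetical pre-flight check mirroring convert_multiarray_output_to_image.
    """
    shape = tuple(shape)
    if len(shape) != 3:
        raise ValueError(f"expected a (C, H, W) shape, got {shape!r}")
    channels, height, width = shape
    if channels not in COLOR_SPACES:
        raise ValueError(f"{channels} channels cannot be presented as an image")
    space = "BGR" if (channels == 3 and is_bgr) else COLOR_SPACES[channels]
    return space, width, height


print(image_output_layout((3, 256, 512)))
```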

Saving the Updated Model

With the new layer in place, save the model into a new file. To do this, create a new MLModel object from the spec object you’ve been working on and then save that object into a file.

updated_model = coremltools.models.MLModel(spec)

updated_model.author = 'John Doe'
updated_model.license = 'Do as You Please'
updated_model.short_description = 'Sample Model'
updated_model.input_description['image'] = 'Input Image'
updated_model.output_description[output_description.name] = 'Predicted Image'

model_file_name = 'updated_model.mlmodel'
updated_model.save(model_file_name)

Testing the Updated Model

To make sure that your surgery is correct, take a sample from the original training set and run a prediction with the updated Core ML model. If all goes well, the prediction should roughly match the ground truth in your training data. Remember that at this point you are not testing the model’s quality but making sure that the surgery went as expected. That’s why I recommend using samples from the training set.

loaded_model = coremltools.models.MLModel(model_file_name)
# predict() only works on macOS; for an image input, pass a PIL Image
y_hat = loaded_model.predict({'image': ... })
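
What “roughly match” means is up to you. One common yardstick is the peak signal-to-noise ratio (PSNR) between the predicted and ground-truth images. The sketch below assumes you have already flattened both images into equal-length sequences of 0..255 integers; the function names, and any pass/fail threshold you pick, are your own.

```python
import math


def mean_absolute_error(predicted, truth):
    """Average absolute difference per 8-bit value."""
    if len(predicted) != len(truth):
        raise ValueError("images must have the same number of values")
    return sum(abs(p - t) for p, t in zip(predicted, truth)) / len(truth)


def psnr(predicted, truth, max_value=255):
    """Peak signal-to-noise ratio in decibels; higher means closer."""
    if len(predicted) != len(truth):
        raise ValueError("images must have the same number of values")
    mse = sum((p - t) ** 2 for p, t in zip(predicted, truth)) / len(truth)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)


truth = [0, 64, 128, 255]
predicted = [2, 60, 130, 250]
print(mean_absolute_error(predicted, truth), psnr(predicted, truth))
```

If the ×255 layer were missing, the output would most likely come out nearly black and the PSNR would be very low, rather than showing a small drift.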

Final Words

Hopefully this is useful for you.  It took me a few weeks to figure this out – time that you don’t have to spend yourself.  In any case, I do feel that coremltools should have this feature built-in without all of this manual surgery. But then again, there could be other options or corner cases that I haven’t considered.

That’s all for now. Until next time.


