Image recognition on Arm Cortex-M with CMSIS-NN in 5 steps

    In this video we're going to show a demo of a neural network model
    performing real-time image recognition on a Cortex-M7 processor using Arm's CMSIS-NN library.
    This highlights the fact that you don't need a high-spec machine or cloud compute to do real-time ML tasks – we've made it so you can do them fast and efficiently on embedded devices.
    We'll briefly touch on all the steps that we took to get this up and running
    And we've made all the files available on GitHub here
    for you to do this yourself
    in a Linux environment.
    So here we're using an STM32F7 development board
    and we've got an ST camera connected.
    As with any peripheral, we'll need the basic program for the camera
    to interact with the board - we'll come back to this later on.
    As we're only interested in optimizing our model with CMSIS-NN,
    the steps we're going to show here apply to any board with a Cortex-M processor.
    So the first step in our demo is selecting and training a model.
    And for a model to fit and run on a constrained device, it needs to be small.
    So here we're using a model trained with Caffe
    on the CIFAR-10 dataset, available here.
    The CIFAR-10 dataset consists of sixty thousand
    32x32 colour images in these 10 classes.
    We're using a 3-layer convolutional neural network,
    which we can see illustrated here. We've already trained our model,
    so we're just going to show you how to get it to run on a Cortex-M device.
    It's important to note that CMSIS-NN's optimizations
    make use of SIMD instructions.
    And because only Cortex-M4 and M7 cores support SIMD,
    you'll only see the performance benefits with these two cores.
    The next step in the process is to quantize the model.
    Now this is a key step for being able to deploy a model on a resource-constrained
    device like a microcontroller, as it greatly reduces the size
    of the model by converting the 32-bit floating-point model
    to an 8-bit fixed-point model, as well as improving
    the overall compute performance.
    This only has a very small impact on the accuracy of the model.
    From 80.3% accuracy when unquantized
    to 79.9% accuracy after quantization.
    So here we navigate to the directory that we've downloaded from github,
    and then in the CMSIS-NN folder we see all the scripts
    and files that you need to generate a quantized model that you can deploy.
    Using this command here
    we're running this python script
    to quantize our CIFAR-10 model for Cortex-M7.
    Here we've specified the weights and also the location
    of where we'll save the quantized model. Now we run that,
    and this is a good time to take a break and do something else, as it can take a couple of hours
    to complete on just a CPU.
    Now the script parses the network graph connectivity, and then
    finds the right quantization parameters.
    So when the quantization has completed,
    we need to transform the model operations and network graph connectivity
    and generate the code we need consisting of neural network function calls.
    Essentially we're transforming the model from a Caffe format
    to a C format.
    So we run the transform python script on our quantized model.
    Here we specify an output directory
    and this generates these files.
    So let's take a quick look in our main file,
    and we can see that the transformation has defined all of the layers needed for this particular network,
    and it's generated all of these function calls
    that call the CMSIS-NN library functions.
    These functions are for the different layers extracted from our trained Caffe model with the different weights
    And then here we have a mock main function.
    This shows the run_nn call, that runs all of the
    different layers and saves the output on the buffer.
    But because it's a mock function, we need to incorporate this function into code
    that actually captures images from the camera and displays them on the screen.
    So for simplicity, we've combined part of the C program,
    the run_nn function, with the basic program
    that comes with the camera.
    And if you want to run this with a different application,
    you'll need to do this with the program that you use.
    After we've combined the code,
    we run a makefile to compile the combined code.
    So from this, we've generated two files, a hex file and a bin file.
    And we can see both of these in our build folder.
    And now the final step is to upload one of these files to the board
    and then we can see the program in action.
    So here, we've got the final program running on the board
    and we can see the image classification in real-time
    on our Cortex-M7 board, and it's performing very consistently with a high degree of accuracy.
    And we can see that the time taken to compute the result of an image through the network
    is very fast.
    So again, we've made all of the scripts available that you need to quantize your model and
    convert it to C code
    here on GitHub
    And for detailed step-by-step instructions on how to run this demo yourself,
    you can read the guide that we've put together, on our developer website by
    following the link in the description.