Azure Project Overhead Part 2

An Azure Custom Vision model can be brought to the Edge for faster inference!!

Following on from Overhead Part 1, in Part 2 I wanted the Raspberry Pi to assess what it was seeing and report the result as text in the reply SMS. My hope was to eliminate, or at least reduce, the need to send the image to an email address and have the user open that email as a separate task.

I discovered that the application could take the image, assess it and give it an output name or label. Image classification rests on the idea that, with enough training data organised into the labels or classes we want in the domain, a machine learning algorithm can produce a model that gives a highly accurate classification confidence (prediction score) when we introduce a brand new image from the same domain. We can then keep improving the model by re-training it with additional training data so that it works better with new images. With the confidence score expressed as a percentage, my application could return it to the user as part of the SMS reply.

I added functionality for the .NET application to predict what the camera has seen in terms that are easy to understand, such as 'Spot 1 Free', 'Spot 2 Free', 'Pillar Spot Free' etc. This was possible by taking around 1200 images of the car park in its different states, labelling them with our labels ('Spot 1 Free', 'Spot 2 Free' etc.) and training a model using Custom Vision (on the Free tier, using a 'General' model). After training a model with the free Quick Training option, we can call the resulting REST API with the prediction key and our image payload (as the byte array itself or a URL) to get a prediction result as a confidence score. (The Standard tier training-by-hour option will produce a higher quality model but costs about £7 per hour of training.)

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using Newtonsoft.Json;

public class PredictionSequence
{
    private static HttpClient client { get; set; }
    private static string predictionKey { get; set; }
    private static string predictionEndPointUrl { get; set; }

    public static string PredictImage(byte[] byteData)
    {
        // For a Custom Vision API prediction the image data must be less than 4MB.
        // Reduce the image resolution or use compression to stay under 4MB.
        using (var content = new ByteArrayContent(byteData))
        {
            var uri = new Uri(predictionEndPointUrl);
            var request = new HttpRequestMessage(HttpMethod.Post, uri);
            request.Headers.Add("Prediction-Key", predictionKey);
            content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
            request.Content = content;

            // The synchronous Send method needs .NET 5 or later
            HttpResponseMessage response = client.Send(request);
            string json = response.Content.ReadAsStringAsync().Result;
            CustomVisionResponse cvResponse = JsonConvert.DeserializeObject<CustomVisionResponse>(json);

            // Sort by confidence score, from highest to lowest
            cvResponse.predictions.Sort((a, b) => b.probability.CompareTo(a.probability));

            // Get the best 2 prediction scores, formatted to two decimal places
            string prediction1 = cvResponse.predictions[0].tagName + ": " + (cvResponse.predictions[0].probability * 100).ToString("F2") + "% confidence.";
            string prediction2 = cvResponse.predictions[1].tagName + ": " + (cvResponse.predictions[1].probability * 100).ToString("F2") + "% confidence.";
            return prediction1 + " " + prediction2;
        }
    }

    // Create one instance at startup so the static members are set before PredictImage is called
    public PredictionSequence()
    {
        client = new HttpClient();
        // Keep the Custom Vision prediction key and prediction endpoint as secrets in Key Vault
        predictionKey = Imager.AccessKeyVault("PredictionKey").Result;
        predictionEndPointUrl = Imager.AccessKeyVault("PredictionEndpoint").Result;
    }
}

public class CustomVisionResponse
{
    public string id { get; set; }
    public string project { get; set; }
    public string iteration { get; set; }
    public DateTime created { get; set; }
    public List<Prediction> predictions { get; set; }

    public class Prediction
    {
        public string tagId { get; set; }
        public string tagName { get; set; }
        public double probability { get; set; }
    }
}
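
The sort-and-report step can also be tried on its own, without a camera or a prediction key. The following is only a minimal sketch: the sample JSON is made up for illustration (its field names follow the CustomVisionResponse class), the helper name FormatTopPredictions is hypothetical, and it uses System.Text.Json rather than the Newtonsoft.Json calls in the application.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.Json;

// Hypothetical sample response; only the fields needed for the report are included.
string sampleJson = @"{
  ""predictions"": [
    { ""tagName"": ""Spot 2 Free"", ""probability"": 0.0421 },
    { ""tagName"": ""Spot 1 Free"", ""probability"": 0.9312 },
    { ""tagName"": ""Pillar Spot Free"", ""probability"": 0.0267 }
  ]
}";

// Returns up to `count` predictions, highest probability first, as SMS-friendly text.
static string FormatTopPredictions(string json, int count)
{
    using JsonDocument doc = JsonDocument.Parse(json);
    var lines = doc.RootElement.GetProperty("predictions")
        .EnumerateArray()
        .Select(p => (Tag: p.GetProperty("tagName").GetString(),
                      Probability: p.GetProperty("probability").GetDouble()))
        .OrderByDescending(p => p.Probability)
        .Take(count) // Take copes with responses that hold fewer than `count` tags
        .Select(p => p.Tag + ": "
            + (p.Probability * 100).ToString("F2", CultureInfo.InvariantCulture)
            + "% confidence.")
        .ToList(); // materialise before the JsonDocument is disposed
    return string.Join(" ", lines);
}

Console.WriteLine(FormatTopPredictions(sampleJson, 2));
// Spot 1 Free: 93.12% confidence. Spot 2 Free: 4.21% confidence.
```

Using Take instead of indexing positions 0 and 1 means a model with a single tag would still produce a reply rather than an exception.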


As seen in Part 1, we specifically used an image resolution of 2700 x 2300. This keeps the input image under 4MB without resorting to further JPEG compression, which can affect the output accuracy. With this new scoring in place, we can update our previous methods for sending the text result back to the user so that they include the prediction score:

var prediction = PredictionSequence.PredictImage(imageData); // add prediction call

PostCompletionMessage(recordedblockBlobName, prediction, logicAppUrl); // PostCompletionMessage now takes an additional prediction parameter

private static void SendTextMessage(string prediction)
{
    // MessageResource comes from the Twilio.Rest.Api.V2010.Account namespace
    TwilioClient.Init(accountSid, authToken);

    MessageResource.Create(
        body: "Imaging Complete: " + prediction,
        from: new Twilio.Types.PhoneNumber(fromNumber),
        to: new Twilio.Types.PhoneNumber(toNumber)
    );
}

private static void PostCompletionMessage(string recordedblockBlobName, string prediction, string logicAppUrl)
{
    var uri = new Uri(logicAppUrl);
    var request = new HttpRequestMessage(HttpMethod.Post, uri);
    // The JSON body now carries the prediction alongside the file name
    request.Content = new StringContent("{" + $"\"fileName\":\"{recordedblockBlobName}\",\"prediction\":\"{prediction}\"" + "}", Encoding.UTF8, "application/json");
    // ...the rest of the method is unchanged from the previous version
}

After my initial implementation, the results were pretty good: most predictions were accurate, coming back with confidence levels of 83% to 94% where the model was actually correct. However, I felt that the predictions were slow to come back from Custom Vision. There is a way to run the prediction sequence as a Docker container on the device itself for hopefully faster inference. With Custom Vision, we do this by changing the training domain to 'General Compact' in the project settings, training again, then downloading the appropriate Dockerfile (currently there are Windows, Linux and ARM options). The ARM Dockerfile is specifically designed for the RPi but can also be modified to run on a Jetson Nano; I was unable to get this to work on my Nano, as I ran into h5py install error hell when trying to complete the TensorFlow install step. Running the inference at the Edge also means that prediction results are completely free, with no usage counted against the 10 000 predictions per month limit we would have on the Free tier for the Custom Vision REST API.

I chose ARM, copied the extracted files from the zip file to the Pi, switched to the location of the Dockerfile and then ran:

sudo docker build -t myimage .

This builds the Docker image. I then ran the following to start the local Flask server that receives the prediction requests:

sudo docker run -p <host-ip>:<host-port>:<container-port> -d myimage

Then I made a call to the new local endpoint with a sample image (image still needs to be under 4MB) from my training set with:

curl -X POST <local-endpoint-url> -F imageData=@'locationOfTestFile.jpg'

Nice. This worked with a 93% confidence score but wait... this Edge prediction on a Rpi 3b took about 8 seconds to return a result. WHAT'S HAPPENING!! That's even worse than the REST API call, which I thought 'should' take longer because it needs to go to the Cloud first and then come back. Using Ethernet was not an option for me considering the location of the Rpi. Here is what I discovered compared to other devices (Desktop PC and Surface both used Linux containers on Windows):


[Table: prediction time by device. Columns: Desktop (Intel Core i5-4690k), Surface Book 2, Raspberry Pi 3b, Raspberry Pi 4. Rows: Edge inference; REST API call (all Wi-Fi for fairness). The timing values are missing.]
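
Timings like the ones compared here can be gathered with a small helper that wraps any prediction call in a stopwatch. This is a sketch only: the helper name AverageMillisecondsAsync is hypothetical, and Task.Delay stands in for the real edge or REST API request so that the example runs without a server.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Runs `call` `runs` times and returns the average wall-clock time in milliseconds.
static async Task<double> AverageMillisecondsAsync(Func<Task> call, int runs)
{
    var timings = new double[runs];
    for (int i = 0; i < runs; i++)
    {
        var stopwatch = Stopwatch.StartNew();
        await call();
        stopwatch.Stop();
        timings[i] = stopwatch.Elapsed.TotalMilliseconds;
    }
    return timings.Average();
}

// Stand-in for a prediction request; swap in the real call to compare devices.
double average = await AverageMillisecondsAsync(() => Task.Delay(50), 3);
Console.WriteLine($"Average over 3 runs: {average:F0} ms");
```

Averaging a few runs smooths out one-off delays such as a cold container start or a Wi-Fi hiccup.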










It turns out, on taking a further look at the supplied ARM Dockerfile, that there are additional lines that can be enabled to allow the use of OpenCV image resizing, which is faster than the default CVS image resizing:

##make sure these lines are uncommented in the Docker file and save
RUN echo "deb jessie/updates main" >> /etc/apt/sources.list & apt update -y
RUN apt install -y  zlib1g-dev libjpeg-dev gcc libglib2.0-bin libsm6 libxext6 libxrender1 libjasper-dev libpng16-16 libopenexr23 libgstreamer1.0-0 libavcodec58 libavformat58 libswscale5 libqtgui4 libqt4-test libqtcore4
RUN pip install opencv-python --extra-index-url ''

##add this line to properly add opencv to the image for faster resizing
RUN apt-get install python-opencv -y

By building a new image that uses the faster OpenCV image resizing, the edge predictions are now much faster, averaging around 3 seconds on both the Rpi 3b and 4 :). Unlimited inference at the edge!! Thinking on a wider horizon, to make these faster still we can leverage any more capable edge device in our environment: run inference on that high compute node and then supply the results to the lower compute nodes placed in a variety of locations. In an industrial setting, you would use something like an Azure Stack Edge to do inference and pass data around your estate as needed. In my environment, my desktop machine could run the Linux Docker container locally, bound to its network IP address like so, and wait for requests:

##download and unpack the Linux or Windows zip from Custom Vision and build it on your PC, the Dockerfile this time does not need any further editing
sudo docker build -t largecompute .

##<host-ip> being the address of your PC, the machine with plenty of CPU resources
sudo docker run -p <host-ip>:<host-port>:<container-port> -d largecompute

The low compute device (e.g. a Rpi) could then make its requests to the edge inference server running on the large compute machine (e.g. a powerful PC):

curl -X POST <inference-server-url> -F imageData=@'testimagelocation.jpg'

The result: a Rpi that now gets prediction results in under 300ms!! This hybrid approach is not advisable for production scenarios where network security is a concern, but the idea of being able to return prediction results quickly from relatively slow machines is very powerful.
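
The same request can be made from the .NET application instead of curl. The sketch below is illustrative rather than the code I ran: the helper names BuildImageForm and PredictAtEdgeAsync are hypothetical, the endpoint URL is passed in as an argument because it depends on how the container was started, and the only detail taken from the curl call is the form field name imageData.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Builds the kind of multipart form body that `curl -F imageData=@file` sends.
static MultipartFormDataContent BuildImageForm(byte[] imageBytes, string fileName)
{
    var imagePart = new ByteArrayContent(imageBytes);
    imagePart.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
    var form = new MultipartFormDataContent();
    form.Add(imagePart, "imageData", fileName); // field name taken from the curl call
    return form;
}

// Posts the image to the inference server and returns the raw JSON reply.
static async Task<string> PredictAtEdgeAsync(HttpClient client, Uri endpoint, byte[] imageBytes, string fileName)
{
    using var form = BuildImageForm(imageBytes, fileName);
    using HttpResponseMessage response = await client.PostAsync(endpoint, form);
    response.EnsureSuccessStatusCode();
    return await response.Content.ReadAsStringAsync();
}

// Usage: dotnet run -- <inference-server-url> <image-path>
// Nothing is sent unless both arguments are supplied.
if (args.Length == 2)
{
    using var client = new HttpClient();
    byte[] bytes = await File.ReadAllBytesAsync(args[1]);
    string reply = await PredictAtEdgeAsync(client, new Uri(args[0]), bytes, Path.GetFileName(args[1]));
    Console.WriteLine(reply);
}
```

Because the endpoint is just a parameter, the same code can point at the container on the Pi itself or at the one on the larger machine.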


When I looked for a fast way to do inference at the edge, I explored several avenues, but each time I ran into issues that led me to try the next available option. Here are some of them:

ONNX Inference - runs blazingly fast, I was getting results in 60ms... on an x64 Windows machine, and there only seems to be support for the Universal Windows Platform project type, which requires Windows. Even if we find a way to call into the ONNX APIs from our desired .NET Core/.NET 5 project type (which is needed to run on a Raspberry Pi), we still need to get the ONNX runtime properly installed on the Rpi. To get the ONNX runtime to run on ARMv7 (for the Rpi 3b), you have to build it and use Raspbian Stretch, but I could not build it, either via the Docker image on a Windows machine or natively on the Rpi itself (which was building OK for a whole 19 hours... then just failed because a whl file could not be built). The other way round this, if we still want to use C#, is to write the ONNX calling code in the application and then use Windows IoT Core as the operating system for the Pi. From my past experience with it, this OS needs a significantly faster microSD card to give a good enough experience. You then also have to use a compatible USB camera (which I didn't have), since there is no CSI camera support in Windows IoT Core.

ML.NET - I got really excited here because I could train my models locally on my Desktop for unlimited iterations and longer periods of time, and with GPU training support added in later versions, I felt that the GTX 960 could be put to good use and I could get a high quality model fast and deploy it to my ARM Rpi. The model reported about 86% accuracy and predicted images in well under 1 second, but as you guessed it... ML.NET inference is only supported on x86 and x64 machines, so this simply cannot be deployed to ARM systems.

Azure ML - another solution that I didn't specifically try here, but it still means using the Cloud to get prediction results, so the return times would be similar to the Custom Vision REST API calls. It is also a rather expensive solution in general.

The optimum solution therefore, from start to finish, for a Raspberry Pi with a good response time for prediction results and good practicality is to use Edge inference, which gives unlimited inference calls and saves on costs beyond the 10 000 free predictions per month of the cloud REST API; to enable OpenCV resizing in the Dockerfile for quicker image resizing/processing; and to utilise Azure IoT Edge (not covered here) as a much more efficient way to continuously deploy newer/better trained models back onto the device. The hybrid solution I had, with a local PC handling the requests, works and is even faster, but it can be impractical for some because this high end machine can use a large amount of power over time, which is wasteful during times of low traffic.