Deep Learning is not only about kitties on mobile phones, or how we detected defects in locomotive bogies

I'd like to mention that, thanks to the specifics of the task, we were able to use TTA (test-time augmentation), a spectacular example of a Kaggle-style competition hack that usually transfers badly to production, together with semantic segmentation on an se_resnext50 encoder, which gave outstanding mask prediction accuracy.

Task description.

The task was to build a hardware and software system that detects brake pad defects and reports the results to the shift worker.

Task background.

It turned out that the vast majority of brake pads, about 80%, are replaced at LTSPs (locomotive technical servicing points), and each locomotive goes through such servicing every 72 hours.

Most of the checks at an LTSP are visual inspections of the outer part of the locomotive bogie by an expert.

Task solving plan:

Equipment selection
Data collection
Model training
Development of the server with a REST API
Development of the client for an Android tablet
Design and assembly of the mount for the camera and lights
Pilot tests

Equipment selection.

Probably the hardest problem was choosing the camera, lens and lighting with a limited budget and little time: the MVP had to be finished within 1.5 months.

Google made me an expert in machine vision hardware in a couple of days.

Finally, I chose Basler cameras and 6k-lumen pulsed illumination synchronized with the camera.

What counted in favor of Basler (70 fps, resolution up to 1920x1024) was its Python API, which made integrating all the system components much easier; the only downside was the price of the cameras, about $1200.

Choosing the lens was complicated because the required focal length and viewing angle were not obvious, so we had to take a risk, but a lens calculator and some luck let us get it right.
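To give an idea of what a lens calculator does, here is a minimal sketch based on the pinhole approximation (focal length is roughly sensor width times working distance divided by the width of the scene). The function names and all numbers are hypothetical illustrations, not the values used in this project.

```python
import math


def focal_length_mm(sensor_width_mm, working_distance_mm, field_width_mm):
    """Pinhole approximation; reasonable when the working distance
    is much larger than the focal length."""
    return sensor_width_mm * working_distance_mm / field_width_mm


def horizontal_view_angle_deg(sensor_width_mm, focal_mm):
    """Horizontal angle of view for a given sensor width and focal length."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_mm)))


# Hypothetical setup: 8.8 mm wide sensor, camera 1 m away from the bogie,
# and a scene about 1.1 m wide that has to fit into the frame.
f = focal_length_mm(sensor_width_mm=8.8, working_distance_mm=1000, field_width_mm=1100)
angle = horizontal_view_angle_deg(8.8, f)
print(f"focal length ~ {f:.1f} mm, view angle ~ {angle:.1f} degrees")
```

In practice the result is rounded to the nearest lens that is actually on sale, which is where the luck comes in.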

Lighting: the required LED pulse duration, the LED types and the lens characteristics were determined by trial and error.

Three LED lens variants were tested, with angles of 30, 45 and 60 degrees.

Finally, I chose a matte lens with a 45-degree angle.

Assembly and check of the control signal for the camera's pulsed illumination.

For the server hardware I chose an Intel Core i7-7740X Kaby Lake, 46 GB of RAM, a 1 TB SSD and 3x1080Ti; that is enough to run predictions for 2 three-section locomotives in no more than 2 minutes.

Primitive cooling of the video card sandwich takes off 10 degrees.

Data collection.

Building a dataset is a delicate matter that you can't delegate to someone else, so I was sent to a very distant and little-known city in the depths of our huge motherland.

I took photos of about 400 pads with my mobile phone (!!!).

Getting ahead of myself, let me tell you that the brave train yard employees, perhaps fearing the arrival of an inspector from Moscow, replaced all the locomotive pads with new ones and covered them with a fresh coat of paint; seeing this was both funny and scary.

I expected the worst, but I had another 400 photos of other brake pads, which I took at a train yard in Moscow.

The only thing left was to believe in a miracle, lean heavily on augmentation, and invent a heuristic for removing false segments, of which there were a few because I hadn't thought about negative examples.

Expectations:

Reality:

Here I should mention that there were no examples of seriously worn brake pads.

Model training.

The model with an se_resnext50 encoder and a decoder with scse blocks from this repository showed the best results, but the scse blocks (a PyTorch implementation) had to be removed to speed up prediction, since prediction had to finish within a minute.

The model was trained with PyTorch 1.1, using a large number of augmentations from albumentations and a custom Horizontal Flip augmentation that remaps the class according to a mapping.
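Why a custom flip is needed: if the classes encode pad orientation (class 1 for right, class 2 for left, as described further on in this article), mirroring the picture also has to swap the labels. Below is a minimal sketch of the idea in plain Python, not the actual albumentations-based code; the mapping and the function name are assumptions made for illustration.

```python
# Hypothetical mapping: 0 is background, 1 is a pad facing right, 2 is a pad facing left.
FLIP_CLASS_MAP = {0: 0, 1: 2, 2: 1}


def hflip_with_class_swap(image, mask, class_map=FLIP_CLASS_MAP):
    """Mirror image and mask left to right and remap the mask classes.

    image and mask are lists of rows (lists) of the same shape.
    """
    flipped_image = [row[::-1] for row in image]
    flipped_mask = [[class_map[c] for c in row[::-1]] for row in mask]
    return flipped_image, flipped_mask


image = [[10, 20, 30],
         [40, 50, 60]]
mask = [[0, 1, 1],
        [0, 0, 2]]
flipped_image, flipped_mask = hflip_with_class_swap(image, mask)
print(flipped_mask)
```

Applying the function twice returns the original image and mask, which is a handy sanity check for any mapping of this kind.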

As the loss function I chose the Lovász-Softmax loss; it performed almost the same as BCE + Jaccard, but better than plain BCE, which overfits to the annotation.
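For reference, here is a minimal sketch of the BCE + Jaccard baseline mentioned above (not the Lovász-Softmax loss itself). It uses one common formulation, binary cross-entropy combined with the negative log of a soft Jaccard index; the weighting and the function name are assumptions, and real training code would operate on PyTorch tensors rather than Python lists.

```python
import math


def bce_jaccard_loss(probs, targets, jaccard_weight=0.5, eps=1e-7):
    """BCE combined with the negative log of a soft Jaccard index.

    probs   -- flat list of predicted foreground probabilities in [0, 1]
    targets -- flat list of ground-truth labels, 0 or 1
    """
    n = len(probs)
    # Binary cross-entropy, with probabilities clipped away from zero.
    bce = -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
        for p, t in zip(probs, targets)
    ) / n
    # Soft Jaccard: intersection over union computed on probabilities.
    intersection = sum(p * t for p, t in zip(probs, targets))
    union = sum(probs) + sum(targets) - intersection
    soft_jaccard = (intersection + eps) / (union + eps)
    return (1 - jaccard_weight) * bce - jaccard_weight * math.log(soft_jaccard)


targets = [1, 1, 0, 0]
good = bce_jaccard_loss([0.9, 0.9, 0.1, 0.1], targets)
bad = bce_jaccard_loss([0.1, 0.1, 0.9, 0.9], targets)
print(good < bad)
```

The Jaccard term rewards overlap of the whole mask rather than per-pixel agreement, which is one reason such combined losses are less sensitive to sloppy annotation borders than BCE alone.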

Determining the sequence numbers of wheel pairs and brake pads was also a challenge. There were some options involving metric learning, but I had to show results quickly, so I got the idea of labeling pads as classes 1 and 2, where 1 meant facing right and 2 meant facing left.

The network began to predict not only the mask but also the orientation.

With simple heuristics I managed to identify the sequence numbers of wheel pairs and pads reliably enough. By then averaging the predictions, I am effectively using TTA, with a slight shift of the object to the left as it moves and varying light angles; this gives good mask accuracy even at 320x320 resolution.
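The averaging step can be sketched as follows. This is a simplified illustration that assumes the probability masks of the same pad from several frames have already been aligned to each other; the function name and the threshold are hypothetical.

```python
def average_masks(prob_masks, threshold=0.5):
    """Average per-pixel probabilities over several aligned frames, then binarize."""
    n = len(prob_masks)
    height, width = len(prob_masks[0]), len(prob_masks[0][0])
    mean = [
        [sum(m[y][x] for m in prob_masks) / n for x in range(width)]
        for y in range(height)
    ]
    binary = [[1 if p >= threshold else 0 for p in row] for row in mean]
    return mean, binary


# Three toy 2x2 predictions of the same pad under different lighting.
frame_a = [[0.9, 0.2], [0.7, 0.1]]
frame_b = [[0.8, 0.4], [0.4, 0.0]]  # this frame alone would miss the lower-left pixel
frame_c = [[1.0, 0.6], [0.7, 0.2]]
mean, binary = average_masks([frame_a, frame_b, frame_c])
print(binary)
```

A single badly lit frame no longer decides the outcome: the lower-left pixel falls below the threshold in one frame but is kept after averaging.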

A separate problem was detecting the wedge-shaped wear defect of a pad. I had plenty of ideas, from the Hough transform to marking the pad corners and borders with dots or lines of different classes.

In the end, the winning option was the one that mirrored how the workers do it: step back 5 cm from the thin edge and measure the width; if it is within normal limits, the pad passes.
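A minimal sketch of that rule applied to a predicted mask. It assumes a binary mask in which the long axis of the pad runs along the rows, a known pixels-per-centimeter scale, and a minimum allowed width; all three, as well as the function names, are hypothetical and only illustrate the idea.

```python
def row_width(mask, y):
    """Number of pad pixels in row y of a binary mask."""
    return sum(mask[y])


def wedge_check(mask, pixels_per_cm, min_width_cm, offset_cm=5.0):
    """Measure the pad width offset_cm away from its thin end.

    Returns (width in cm, True if the width is within the limit).
    """
    pad_rows = [y for y, row in enumerate(mask) if any(row)]
    if not pad_rows:
        raise ValueError("mask contains no pad pixels")
    top, bottom = pad_rows[0], pad_rows[-1]
    offset_px = round(offset_cm * pixels_per_cm)
    # The thin edge is whichever end of the pad has fewer pixels.
    if row_width(mask, top) <= row_width(mask, bottom):
        y = min(top + offset_px, bottom)
    else:
        y = max(bottom - offset_px, top)
    width_cm = row_width(mask, y) / pixels_per_cm
    return width_cm, width_cm >= min_width_cm


# Toy wedge: the pad gets wider from top to bottom, at a scale of 1 pixel per cm.
mask = [[1] * w + [0] * (4 - w) for w in (1, 1, 2, 2, 3, 3, 4, 4)]
width_cm, ok = wedge_check(mask, pixels_per_cm=1.0, min_width_cm=2.0)
print(width_cm, ok)
```

Measuring at a fixed offset instead of at the very edge makes the check less sensitive to ragged mask borders.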

The training pipeline was taken from MICCAI 2017 Robotic Instrument Segmentation.

The training process consists of 3 stages: training with a frozen encoder, training the whole network, and training with CosineAnnealingLR.

ReduceLROnPlateau was used in the first two stages.
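To show what the last stage does to the learning rate, here is a small sketch of the cosine annealing schedule written as a standalone function. It follows the closed-form formula documented for PyTorch's CosineAnnealingLR; the base learning rate and the number of steps are hypothetical values, not the ones from this project.

```python
import math


def cosine_annealing_lr(step, t_max, base_lr, eta_min=0.0):
    """Learning rate after `step` steps of a cosine schedule of length t_max."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))


# Hypothetical settings: start at 1e-4 and anneal to zero over 10 epochs.
schedule = [cosine_annealing_lr(epoch, t_max=10, base_lr=1e-4) for epoch in range(11)]
for epoch, lr in enumerate(schedule):
    print(epoch, f"{lr:.2e}")
```

The rate drops slowly at first, fastest in the middle and slowly again at the end, which gives the network a gentle final polish after the plateau-driven stages.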

REST server and Android client programming.

For the REST server I chose Flask, the simplest option there is; getting started takes 2 minutes.

I decided to make a homemade data store in the form of a simple folder structure with an active file.

The tablet application was built in Android Studio; luckily, the latest versions are a real paradise for the developer.

Design and assembly of the mount for the camera and lights.

I cast my mind back to when I was building electric vehicle charging stations, and that experience helped a lot: we decided to use frames made of structural aluminium, with parts printed on a 3D printer.

Tests are starting!

The result exceeded all expectations.

Computer vision experts may find this problem pretty plain and simple.

However, I was a bit skeptical about two things: firstly, the training data was not very diverse and did not contain critical cases such as very thin brake pads; secondly, the tests were conducted under quite different shooting and lighting conditions.

Jaccard reaches 0.96 on validation and the brake pads are segmented cleanly; if we add averaging over several photos, we get really good accuracy when estimating pad width.

During the tests it turned out that the system could work with different locomotive bogies, but the cameras would need to be faster.

In conclusion, I would like to say that the technology showed very good results, and I think it has high potential for eliminating the human factor, reducing locomotive waiting time and forecasting.

Gratitude.

I am thankful to the ods.ai community; without your help I would not have managed this in such a short time! Many thanks to n01z3, who gave me the idea to get into DL, for his priceless advice and extraordinary professionalism! I am also very grateful to my mastermind Visiliy Manko (CEO, Aurorai company) and the best designer, Tatyana Brusova.

Hope to see you in the next episode of my story!

Aurorai, llc
