Image Classification for Robotic Plastering with Convolutional Neural Networks
Inspecting robotically fabricated objects to detect and classify discrepancies between virtual target models and as-built realities is one of the challenges facing robotic fabrication. Industrial-grade computer vision methods have been widely used to detect manufacturing flaws in mass-production lines. However, in mass customization, a versatile and robust method must be flexible enough to ignore construction tolerances while detecting specified flaws in varied parts. This study aims to leverage recent developments in machine learning and convolutional neural networks to improve the resiliency and accuracy of surface inspections in architectural robotics. Under a supervised learning scenario, the authors compared two approaches: 1) transfer learning on a general-purpose Convolutional Neural Network (CNN) image classifier, and 2) designing and training a CNN from scratch to detect and categorize flaws in a robotic plastering workflow. Both CNNs were combined with conventional search methods to improve the accuracy and efficiency of the system. A web-based graphical user interface and a real-time video projection method were also developed to facilitate user interaction and control over the workflow.
Surface finishing is an essential domain of architectural construction practice that requires highly skilled workers and demands accurate quality control procedures. By way of example, the authors have developed a robotic workflow that uses industrial robots for decorative plastering techniques. One of the remaining challenges in this workflow is implementing an automated, precise, and reliable quality control pipeline that guarantees satisfactory results through a touch-up scenario. The touch-up procedure lets the user automatically inspect the surface, detect any unwanted fabrication artifacts, and command the robot to correct them.
Our approach requires a vision-based solution to detect texture flaws (e.g., scratches, bubbles) and small-scale 3D finishing issues (e.g., holes, uncovered patches). It uses a single camera, without 3D reconstruction, as the main input for the quality-check workflow. This results in a simpler hardware setup, a faster workflow, and lower costs. The approach can also be useful for other fabrication workflows, for example subtractive and deformation-based manufacturing.
The proposed system takes advantage of state-of-the-art computer vision methods based on Convolutional Neural Networks (CNNs, or ConvNets) for image classification and object detection.
A significant trade-off of using transfer learning is the heavy model it entails. Trained to classify one thousand classes of objects, the pre-trained CNN occupies hundreds of megabytes of storage and requires expensive computation to process a single image.
However, in our case, most of the captured images are low-contrast, with primarily white backgrounds and subtle changes in color. This color space requires different feature layers for efficient classification. Accordingly, the authors designed and trained a sequential multi-layer CNN. This architecture has already proven its performance in several state-of-the-art models, including AlexNet and, later, VGGNet. The proposed architecture is significantly simpler than that of Inception, resulting in a speed boost.
We designed and tested a series of CNNs using Keras with a TensorFlow back-end to find an optimal architecture. Several combinations of convolutional, dropout, and fully connected layers were tested. In each architecture, all models were trained for a fixed number of epochs and the model with the highest F1 score was selected. The results from each architecture were then compared with each other to select the optimal one. The selected architecture demonstrated the highest F1 score on both the 5-class and 3-class classification tasks, while the others either failed to reach the same F1 score or took more epochs to converge to it.
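The selection step above reduces to computing an F1 score for each trained model on a held-out split and keeping the best one. A minimal sketch of that comparison, using scikit-learn's `f1_score`; the label arrays here are hypothetical placeholders, not data from the actual panels:

```python
from sklearn.metrics import f1_score

# Hypothetical ground-truth and predicted labels for the 3-class task
# (0 = pass, 1 = fail, 2 = markup); in the real workflow these would
# come from the held-out test split of the panel images.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2]

# Macro-averaged F1 treats all classes equally, which matters when
# "fail" and "markup" samples are rarer than "pass" samples.
score = f1_score(y_true, y_pred, average="macro")
print(round(score, 3))  # → 0.822
```

Repeating this for each candidate architecture and keeping the highest-scoring checkpoint reproduces the selection procedure described above.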
The proposed architecture consists of four convolutional layers (3x3 kernel), each paired with a ReLU activation function and max pooling (2x2), followed by three fully connected layers and a softmax at the end. To reduce overfitting, it also leverages dropout to prevent inter-dependencies between hidden-layer nodes. We then compared it against transfer learning on VGG16.
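In Keras, the described stack can be sketched as follows. The layer types match the description (four 3x3 conv + ReLU + 2x2 max-pooling blocks, three fully connected layers, softmax, dropout), but the filter counts, dense width, and dropout rate are illustrative assumptions, as is the 28x28x3 input size of the training patches:

```python
from tensorflow.keras import layers, models

def build_model(num_classes=3):
    """Sketch of the sequential CNN described above (widths assumed)."""
    model = models.Sequential()
    model.add(layers.Input(shape=(28, 28, 3)))  # 28x28x3 training patches
    # Four conv (3x3, ReLU) + max-pooling (2x2) pairs.
    for filters in (32, 32, 64, 64):
        model.add(layers.Conv2D(filters, (3, 3), padding="same",
                                activation="relu"))
        model.add(layers.MaxPooling2D((2, 2)))
    # Three fully connected layers, ending in a softmax classifier.
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation="relu"))
    model.add(layers.Dropout(0.5))  # reduces co-adaptation of hidden nodes
    model.add(layers.Dense(64, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model

model = build_model()
```

`padding="same"` keeps the spatial size through each convolution, so the four pooling steps reduce 28x28 to 1x1 before the dense layers.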
The data set consists of images taken from a series of plaster finishes applied by a robot on drywall test panels. To collect the training samples, a GoPro camera was used to take 5-megapixel images of the available plastered panels. Due to the GoPro camera's significant lens distortion, an image calibration method was applied using OpenCV, and only the central 1024px x 1024px region of each image was used. Images were manually cropped into smaller sections and labeled as one of three main classes: 1) pass, 2) fail, or 3) markup. The same data set was then re-categorized into five classes: perfect or near-perfect plaster regions were labeled as 1) pass, while images containing fabrication flaws were split by flaw type into 2) holes, 3) scratches, and 4) unfinished surfaces. The 5) markup class was left intact.
The markup class was dedicated to hand-drawn characters that users could sketch on the work surface to communicate with the robot. Markup training samples were taken from hand-drawn marks on a white surface under the same lighting conditions as the plastered panels.
We leveraged data augmentation to increase the data set size and improve the model's resiliency against small variations in the input data. Training and test samples were resized to the same dimensions (28x28x3) beforehand. The model was trained in two scenarios: the first with the 1) pass, 2) fail, and 3) markup labels, and the second trained to distinguish different types of fail, including 1) bad finish, 2) hole, and 3) rough finish.
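Since the project already uses Keras, the augmentation step can be sketched with `ImageDataGenerator`. The specific transform ranges below are assumptions; the idea is small shifts, rotations, and flips that a plaster patch could plausibly undergo without changing its class label:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random transforms applied on the fly during training (ranges assumed).
augmenter = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    vertical_flip=True,
)

# Hypothetical stand-in for the 28x28x3 training patches and labels.
x = np.random.rand(16, 28, 28, 3).astype("float32")
y = np.zeros(16)

# Each call yields a freshly transformed batch of the same patch size.
batch_x, batch_y = next(augmenter.flow(x, y, batch_size=8))
```

Because the transforms are sampled per epoch, the effective data set size grows without collecting additional panel images.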
Markups are simple user-defined drawings, e.g., circles and crosses, that can be used to communicate with the system.
The interaction model consists of a real-time image projection over the working area. Users can jog the robot to a specific region of the surface and trigger the feedback process. A mounted projector then highlights the detected flaws directly on the surface in real time (Figure 7). In this model of interaction, users do not need a separate computer unit, making the system more usable for on-site applications. Users can also interact with the robot by drawing a series of pre-defined markups on the surface. Using such markups, users can override the feedback-loop results and force the robot to add or remove specific regions from the error/pass cases (Figure 8).
This project was presented and published in Robotic Fabrication in Architecture, Art and Design (ROBARCH) 2018. You can access the full paper from my publications page.
This project was developed at dFab, Carnegie Mellon University School of Architecture under the supervision of Josh Bard.