PYNQ_301: OBJECT DETECTION¶


The aim of this notebook is to:

  • Understand what YOLOv3 and VOC are.
  • Learn how to use the YOLOv3 model to detect objects in an image and a webcam feed.
  • Learn how to display information on an OLED.

Useful Term Cheat Sheet:¶

Here is a cheat sheet of some useful terms that may pop up in this notebook. As always, feel free to ask me (Aamir) any questions!

| Term | Description |
| --- | --- |
| DPU | Deep-learning Processing Unit. The AI accelerator, running on the FPGA, that executes our model. |
| YOLO | You Only Look Once. This is the object detection algorithm we are using! |
| Overlay | Overlays are hardware designs that are loaded onto the FPGA. |
| FPGA | Field Programmable Gate Array. This is a special kind of chip that we can reprogram to do many different tasks very efficiently! |
| Array | Another word for list: multiple variables or items stored under one variable name. |
| Tensor | A multidimensional array (arrays within arrays). A 2-D tensor is also known as a matrix. |
| Anchors | A set of predefined bounding boxes of certain heights and widths. |
| OLED | Organic Light-Emitting Diode. We will use an OLED display to show some useful information after predictions. |

image.png

What is YOLO?¶


image-2.png

You Only Look Once (YOLO) is a Convolutional Neural Network (CNN) for performing object detection in real time. CNNs are classifier-based systems that process input images as structured arrays of data and recognize patterns within them.

What is VOC?¶


VOC is a dataset which contains data from the PASCAL Visual Object Classes Challenge. It includes a total of 11,540 images, where each image contains a set of objects drawn from 20 different classes, for a total of 27,450 annotated objects.
image-3.png
image-4.png

Hardware Setup¶


1. KRIA KV260 Board
image-2.png

2. PYNQ Grove Adapter
image.png

3. OLED Display
image-3.png

4. Webcam
image-5.png

Let's get started¶


1. Prepare the overlay¶

We will load the DPU overlay onto the board's FPGA.

In [ ]:
from pynq_dpu import DpuOverlay
from pynq_peripherals import PmodGroveAdapter
overlay = DpuOverlay("dpu.bit")
In [ ]:
# Initialize the Pmod adapter. G4 indicates that the OLED is connected to the G4 slot on the PYNQ Grove Adapter
adapter = PmodGroveAdapter(overlay.PMODA, G4='grove_oled')

2. Utility functions¶

In this section, we will prepare a few functions for later use.

In [ ]:
# Let's import some libraries we will need later on:
import os                                  # The os library provides functions for interacting with your operating system
import time                                # The time library provides time-related functions
import numpy as np                         # numpy (np) is a library for processing numerical data
import cv2                                 # cv2 (OpenCV) is a library for image processing which we need later
import random                              # This module provides pseudo-random number generation
import colorsys                            # This module provides bidirectional conversions between color systems
from matplotlib.patches import Rectangle   # This class is used to plot rectangles
import matplotlib.pyplot as plt            # Used to create figures and plot areas and lines within them
from IPython.display import display, Image # Imports public APIs for display tools in IPython

# This line enables matplotlib graphs to be included in the notebook, next to the code
%matplotlib inline                        

Now we'll load the YOLOv3 xmodel, which is trained on the VOC dataset.

In [ ]:
overlay.load_model("tf_yolov3_voc.xmodel")

The YOLOv3 model predicts offsets from a predetermined set of boxes with particular height-width ratios; these predetermined boxes are the anchor boxes. Let's define them.

In [ ]:
anchor_list = [10,13,16,30,33,23,30,61,62,45,59,119,116,90,156,198,373,326]
anchor_float = [float(x) for x in anchor_list]
anchors = np.array(anchor_float).reshape(-1, 2)
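
YOLOv3 emits predictions at three scales, and each scale uses three of these nine anchors (this pairing is encoded in the `anchor_mask` used by `evaluate` later in this notebook). A quick look at how the reshape groups them:

```python
import numpy as np

anchor_list = [10,13,16,30,33,23,30,61,62,45,59,119,116,90,156,198,373,326]
anchors = np.array(anchor_list, dtype=np.float32).reshape(-1, 2)

# Three anchors per output scale: the coarse 13x13 grid gets the largest
# boxes, the fine 52x52 grid the smallest.
anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
for mask, grid in zip(anchor_mask, ("13x13", "26x26", "52x52")):
    print(grid, anchors[mask].tolist())
```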

The VOC dataset consists of 20 classes of objects that can be detected; the voc_classes.txt file holds the list of class names.

image.png

In [ ]:
# Get Model Classification Information
def get_class(classes_path):
    with open(classes_path) as f:
        class_names = f.readlines()
    class_names = [c.strip() for c in class_names]
    return class_names
    
classes_path = "img/voc_classes.txt"
class_names = get_class(classes_path)

To interpret the model's output more easily, we want to draw a bounding box around each detected object and display a score representing the probability that the detected object belongs to a specific class.¶

image.png

We can associate each class with a specific color. The cell below does this for us.¶

In [ ]:
# Define unique colors for each class
num_classes = len(class_names)
hsv_tuples = [(1.0 * x / num_classes, 1., 1.) for x in range(num_classes)]
colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
colors = list(map(lambda x: 
                  (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), 
                  colors))
random.seed(0)
random.shuffle(colors)
random.seed(None)

Let's define some useful functions which will be used later in the notebook.¶

In [ ]:
# This function resizes the image with unchanged aspect ratio using padding.
def letterbox_image(image, size):
    ih, iw, _ = image.shape
    w, h = size
    scale = min(w/iw, h/ih)
    
    nw = int(iw*scale)
    nh = int(ih*scale)

    image = cv2.resize(image, (nw,nh), interpolation=cv2.INTER_LINEAR)
    new_image = np.ones((h,w,3), np.uint8) * 128
    h_start = (h-nh)//2
    w_start = (w-nw)//2
    new_image[h_start:h_start+nh, w_start:w_start+nw, :] = image
    return new_image

# This function pre-processes the image: converts BGR to RGB, letterboxes it,
# scales pixel values to [0, 1] and adds a batch dimension so it can be fed to the model.
def pre_process(image, model_image_size):
    image = image[...,::-1]
    image_h, image_w, _ = image.shape
 
    if model_image_size != (None, None):
        assert model_image_size[0]%32 == 0, 'Multiples of 32 required'
        assert model_image_size[1]%32 == 0, 'Multiples of 32 required'
        boxed_image = letterbox_image(image, tuple(reversed(model_image_size)))
    else:
        new_image_size = (image_w - (image_w % 32), image_h - (image_h % 32))
        boxed_image = letterbox_image(image, new_image_size)
    image_data = np.array(boxed_image, dtype='float32')
    image_data /= 255.
    image_data = np.expand_dims(image_data, 0) 
    return image_data
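
# A quick sanity check on the letterbox arithmetic above, using a
# hypothetical 640x480 webcam frame (a sketch, not part of the pipeline):
frame_h, frame_w = 480, 640
net_w, net_h = 416, 416
demo_scale = min(net_w / frame_w, net_h / frame_h)   # 0.65
demo_nw, demo_nh = int(frame_w * demo_scale), int(frame_h * demo_scale)
# The content becomes 416x312; (416-312)//2 = 52 px of grey padding is
# added above and below to fill the 416x416 network input.
print(demo_nw, demo_nh, (net_h - demo_nh) // 2)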

# This function gets information on box position, its size along with confidence and box class probabilities
def _get_feats(feats, anchors, num_classes, input_shape):
    num_anchors = len(anchors)
    anchors_tensor = np.reshape(np.array(anchors, dtype=np.float32), [1, 1, 1, num_anchors, 2])
    grid_size = np.shape(feats)[1:3]
    nu = num_classes + 5
    predictions = np.reshape(feats, [-1, grid_size[0], grid_size[1], num_anchors, nu])
    grid_y = np.tile(np.reshape(np.arange(grid_size[0]), [-1, 1, 1, 1]), [1, grid_size[1], 1, 1])
    grid_x = np.tile(np.reshape(np.arange(grid_size[1]), [1, -1, 1, 1]), [grid_size[0], 1, 1, 1])
    grid = np.concatenate([grid_x, grid_y], axis = -1)
    grid = np.array(grid, dtype=np.float32)

    box_xy = (1/(1+np.exp(-predictions[..., :2])) + grid) / np.array(grid_size[::-1], dtype=np.float32)
    box_wh = np.exp(predictions[..., 2:4]) * anchors_tensor / np.array(input_shape[::-1], dtype=np.float32)
    box_confidence = 1/(1+np.exp(-predictions[..., 4:5]))
    box_class_probs = 1/(1+np.exp(-predictions[..., 5:]))
    return box_xy, box_wh, box_confidence, box_class_probs
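
# The raw network outputs are unbounded, so _get_feats applies a sigmoid,
# 1/(1+exp(-x)), to squash the x/y offsets, confidence and class scores
# into (0, 1). A toy illustration with made-up logits:
import math   # stdlib; used only for this sketch
for demo_x in (-2.0, 0.0, 2.0):
    print(round(1 / (1 + math.exp(-demo_x)), 3))   # 0.119, 0.5, 0.881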


# This function is used to correct the bounding box position by scaling it
def correct_boxes(box_xy, box_wh, input_shape, image_shape):
    box_yx = box_xy[..., ::-1]
    box_hw = box_wh[..., ::-1]
    input_shape = np.array(input_shape, dtype = np.float32)
    image_shape = np.array(image_shape, dtype = np.float32)
    new_shape = np.around(image_shape * np.min(input_shape / image_shape))
    offset = (input_shape - new_shape) / 2. / input_shape
    scale = input_shape / new_shape
    box_yx = (box_yx - offset) * scale
    box_hw *= scale

    box_mins = box_yx - (box_hw / 2.)
    box_maxes = box_yx + (box_hw / 2.)
    boxes = np.concatenate([
        box_mins[..., 0:1],
        box_mins[..., 1:2],
        box_maxes[..., 0:1],
        box_maxes[..., 1:2]
    ], axis = -1)
    boxes *= np.concatenate([image_shape, image_shape], axis = -1)
    return boxes

# This function is used to get information on the valid objects detected and their scores
def boxes_and_scores(feats, anchors, classes_num, input_shape, image_shape):
    box_xy, box_wh, box_confidence, box_class_probs = _get_feats(feats, anchors, classes_num, input_shape)
    boxes = correct_boxes(box_xy, box_wh, input_shape, image_shape)
    boxes = np.reshape(boxes, [-1, 4])
    box_scores = box_confidence * box_class_probs
    box_scores = np.reshape(box_scores, [-1, classes_num])
    return boxes, box_scores


# This function performs non-maximum suppression: boxes that overlap a
# higher-scoring box by more than the IoU threshold are discarded.
def nms_boxes(boxes, scores):
    """Suppress non-maximal boxes.

    # Arguments
        boxes: ndarray, boxes of objects.
        scores: ndarray, scores of objects.

    # Returns
        keep: ndarray, index of effective boxes.
    """
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]

    areas = (x2-x1+1)*(y2-y1+1)
    order = scores.argsort()[::-1]

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)

        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        w1 = np.maximum(0.0, xx2 - xx1 + 1)
        h1 = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w1 * h1

        ovr = inter / (areas[i] + areas[order[1:]] - inter)
        inds = np.where(ovr <= 0.55)[0]  # threshold
        order = order[inds + 1]

    return keep
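
# Toy illustration of the 0.55 IoU threshold used above: two boxes that
# overlap heavily, so NMS would keep only the higher-scoring one.
# (A sketch with hypothetical coordinates, not part of the pipeline.)
demo_a = [0.0, 0.0, 100.0, 100.0]   # x1, y1, x2, y2
demo_b = [10.0, 10.0, 110.0, 110.0]
demo_iw = max(0.0, min(demo_a[2], demo_b[2]) - max(demo_a[0], demo_b[0]) + 1)
demo_ih = max(0.0, min(demo_a[3], demo_b[3]) - max(demo_a[1], demo_b[1]) + 1)
demo_area = (demo_a[2] - demo_a[0] + 1) * (demo_a[3] - demo_a[1] + 1)  # both boxes have equal area
demo_iou = demo_iw * demo_ih / (2 * demo_area - demo_iw * demo_ih)
print(round(demo_iou, 3))   # well above 0.55, so the overlap is suppressed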

# This function gives essential information about the objects detected like bounding box information, score of the object 
# detected and the class associated with it
def evaluate(yolo_outputs, image_shape, class_names, anchors):
    score_thresh = 0.2
    anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
    boxes = []
    box_scores = []
    input_shape = np.shape(yolo_outputs[0])[1 : 3]
    input_shape = np.array(input_shape)*32

    for i in range(len(yolo_outputs)):
        _boxes, _box_scores = boxes_and_scores(
            yolo_outputs[i], anchors[anchor_mask[i]], len(class_names), 
            input_shape, image_shape)
        boxes.append(_boxes)
        box_scores.append(_box_scores)
    boxes = np.concatenate(boxes, axis = 0)
    box_scores = np.concatenate(box_scores, axis = 0)

    mask = box_scores >= score_thresh
    boxes_ = []
    scores_ = []
    classes_ = []
    for c in range(len(class_names)):
        class_boxes_np = boxes[mask[:, c]]
        class_box_scores_np = box_scores[:, c]
        class_box_scores_np = class_box_scores_np[mask[:, c]]
        nms_index_np = nms_boxes(class_boxes_np, class_box_scores_np) 
        class_boxes_np = class_boxes_np[nms_index_np]
        class_box_scores_np = class_box_scores_np[nms_index_np]
        classes_np = np.ones_like(class_box_scores_np, dtype = np.int32) * c
        boxes_.append(class_boxes_np)
        scores_.append(class_box_scores_np)
        classes_.append(classes_np)
    boxes_ = np.concatenate(boxes_, axis = 0)
    scores_ = np.concatenate(scores_, axis = 0)
    classes_ = np.concatenate(classes_, axis = 0)

    return boxes_, scores_, classes_

# This function is used to draw boxes around objects post prediction.
def draw_boxes(image, boxes, scores, classes):
    _, ax = plt.subplots(1)
    ax.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    image_h, image_w, _ = image.shape

    for i, bbox in enumerate(boxes):
        [top, left, bottom, right] = bbox
        width, height = right - left, bottom - top
        center_x, center_y = left + width*0.5, top + height*0.5
        score, class_index = scores[i], classes[i]
        label = '{}: {:.4f}'.format(class_names[class_index], score) 
        color = tuple([color/255 for color in colors[class_index]])
        ax.add_patch(Rectangle((left, top), width, height,
                               edgecolor=color, facecolor='none'))
        ax.annotate(label, (center_x, center_y), color=color, weight='bold', 
                    fontsize=12, ha='center', va='center')
    return ax
In [ ]:
# Collect the images with the "JPEG" extension in the 'img' directory and count them
image_folder = 'img'
original_images = [i for i in os.listdir(image_folder) if i.endswith("JPEG")]
total_images = len(original_images)
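
`os.listdir` returns every entry in the folder, so the comprehension above keeps only the JPEG files. A sketch with hypothetical filenames (not the actual contents of `img/`):

```python
# Hypothetical directory listing -- the notebook uses os.listdir('img')
entries = ["cat.JPEG", "voc_classes.txt", "dog.JPEG", "oled.png"]
jpegs = [i for i in entries if i.endswith("JPEG")]
print(jpegs, len(jpegs))   # only the JPEG frames survive
```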

3. Object Detection on an Image from the Filesystem¶

The next few cells help us in processing data for our predictions.

In [ ]:
# Setup input and output tensors
dpu = overlay.runner
inputTensors = dpu.get_input_tensors()
outputTensors = dpu.get_output_tensors()

shapeIn = tuple(inputTensors[0].dims)

shapeOut0 = (tuple(outputTensors[0].dims)) # (1, 13, 13, 75)
shapeOut1 = (tuple(outputTensors[1].dims)) # (1, 26, 26, 75)
shapeOut2 = (tuple(outputTensors[2].dims)) # (1, 52, 52, 75)

outputSize0 = int(outputTensors[0].get_data_size() / shapeIn[0]) # 12675
outputSize1 = int(outputTensors[1].get_data_size() / shapeIn[0]) # 50700
outputSize2 = int(outputTensors[2].get_data_size() / shapeIn[0]) # 202800

# Setup Buffers
input_data = [np.empty(shapeIn, dtype=np.float32, order="C")]
output_data = [np.empty(shapeOut0, dtype=np.float32, order="C"), 
               np.empty(shapeOut1, dtype=np.float32, order="C"),
               np.empty(shapeOut2, dtype=np.float32, order="C")]
image = input_data[0]

The function defined below is the main function: it pre-processes a frame, runs the model, and decodes the output.¶

In [ ]:
# Function to perform pre-processing, model predictions and decoding output
def run(frame, display=False):
    
    # Pre-processing
    image_size = frame.shape[:2]
    image_data = np.array(pre_process(frame, (416, 416)), dtype=np.float32)
    
    # Send the input data to the DPU and trigger execution
    image[0,...] = image_data.reshape(shapeIn[1:])
    job_id = dpu.execute_async(input_data, output_data)
    dpu.wait(job_id)
    
    # Retrieve output data
    conv_out0 = np.reshape(output_data[0], shapeOut0)
    conv_out1 = np.reshape(output_data[1], shapeOut1)
    conv_out2 = np.reshape(output_data[2], shapeOut2)
    yolo_outputs = [conv_out0, conv_out1, conv_out2]
    
    # Decode output from YOLOv3
    boxes, scores, classes = evaluate(yolo_outputs, image_size, class_names, anchors)
    
    if display:
        _ = draw_boxes(frame, boxes, scores, classes)
        
    return boxes, scores, classes
In [ ]:
# Read an input image from the "img" directory
input_image = cv2.imread(os.path.join(image_folder, original_images[4]))
In [ ]:
# Perform pre-processing, model predictions and decode the output from the image
run(input_image, display=True)

4. Object Detection by Using a Webcam¶

In [ ]:
# Start capturing a video 
videoIn = cv2.VideoCapture(0)
videoIn.set(cv2.CAP_PROP_BUFFERSIZE, 1)  # Disable buffering
videoIn.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
videoIn.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

print("Capture device is open: " + str(videoIn.isOpened()))

Running the next cell captures a single frame from the webcam¶

In [ ]:
# Extract the frame from the video
ret, frame = videoIn.read()
In [ ]:
# Perform predictions on the frame
boxes, scores, classes = run(frame, display=True)
In [ ]:
# Find the index of the detection with the highest score, then look up its class name
best_score = np.argmax(scores)
class_names[classes[best_score]]
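
Note that `np.argmax` returns the *index* of the largest score; `classes` is parallel to `scores`, so the same index selects the matching class. A toy example with made-up values:

```python
import numpy as np

# Toy stand-ins (hypothetical values, not the model's output):
demo_scores = np.array([0.31, 0.92, 0.45])
demo_classes = np.array([11, 14, 7], dtype=np.int32)

# argmax returns the *index* of the highest score; the same index then
# picks out the matching entry in the parallel classes array.
demo_best = int(np.argmax(demo_scores))
print(demo_best, int(demo_classes[demo_best]))   # index 1 -> class 14
```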

We will now print the object with the highest score on the OLED.

In [ ]:
oled = adapter.G4
oled.set_default_config()
oled.set_normal_display()
oled.put_string("Detected") 
oled.set_position(2, 0)
oled.put_string(f"{class_names[classes[best_score]]}")
In [ ]:
videoIn.release()    # Releasing the video capture object

Example OLED output when a monitor is detected:¶

image.png

5. Real Time Object Detection from webcam¶

In [ ]:
cap = cv2.VideoCapture(0)    # Start the video capture
cap.set(cv2.CAP_PROP_BUFFERSIZE, 1)  # Disable buffering
In [ ]:
display_handle = display(None, display_id=True)    # Create a display handle so frames can be updated in place

We keep track of only the best-scoring object in each frame...

In [ ]:
last_object = "None"

# while True:
for i in range(200):
    _, frame = cap.read()
    boxes, scores, classes = run(frame)
    
    if scores.any():

        best_score = np.argmax(scores)

        # Draw bounding box
        y_min,x_min,y_max,x_max = map(int, boxes[best_score])
        frame = cv2.rectangle(frame, (x_min,y_min), (x_max, y_max), color=255)

        # Label
        text = f"{class_names[classes[best_score]]}: {scores[best_score]:.2f}"
        text_size = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 2)[0]
        frame = cv2.putText(frame, text, (x_min, y_min - text_size[1]), cv2.FONT_HERSHEY_SIMPLEX, 0.5, 255, 1, cv2.LINE_AA)

        _, frame = cv2.imencode('.jpeg', frame)
        display_handle.update(Image(data=frame.tobytes()))
        
        if class_names[classes[best_score]] == last_object:
            pass
        else:
            oled.clear_display()
            oled.put_string(class_names[classes[best_score]])
            last_object = class_names[classes[best_score]]
In [ ]:
cap.release()    # Releasing the video capture object

END OF NOTEBOOK


CHALLENGES:¶

Modify the code in section 5 above in incremental steps:¶

  • Draw a bounding box only when a person is detected.
  • Draw a bounding box when a person is detected with a score of 0.8 and above.

Hint¶

Check the return type of classes.
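
For instance, since `classes` comes back as a NumPy integer array, you can combine a class mask with a score mask. A minimal sketch with toy detections (hypothetical values, not model output):

```python
import numpy as np

demo_names = ["cat", "dog", "person"]               # hypothetical ordering
demo_classes = np.array([2, 0, 2], dtype=np.int32)  # toy detections
demo_scores = np.array([0.91, 0.64, 0.52])

person = demo_names.index("person")
demo_mask = (demo_classes == person) & (demo_scores >= 0.8)
print(demo_mask.tolist())   # only the confident person detection survives
```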

Bonus Challenge: Display multiple objects detected in a video¶

In [ ]:
cap = cv2.VideoCapture(0)    # Start the video capture
cap.set(cv2.CAP_PROP_BUFFERSIZE, 1)  # Disable buffering
In [ ]:
display_handle=display(None, display_id=True)    # This displays the video once the frame is updated
In [ ]:
# Enter Code Here..




        
cap.release()    # Releasing the video capture object

Let's clear the overlay and the DPU-specific data:

In [ ]:
del overlay # Clean up code