13.3. Object Detection and Bounding Boxes

In earlier sections (e.g., Section 7.1 to Section 7.4), we introduced various models for image classification. In image classification tasks, we assume that there is only one major object in the image and we only focus on how to recognize its category. However, there are often multiple objects in the image of interest. We not only want to know their categories, but also their specific positions in the image. In computer vision, we refer to such tasks as object detection (or object recognition).

Object detection has been widely applied in many fields. For example, self-driving needs to plan traveling routes by detecting the positions of vehicles, pedestrians, roads, and obstacles in the captured video images. Besides, robots may use this technique to detect and localize objects of interest throughout their navigation of an environment. Moreover, security systems may need to detect abnormal objects, such as intruders or bombs.

In the next few sections, we will introduce several deep learning methods for object detection. We will begin with an introduction to positions (or locations) of objects.

    # MXNet
    %matplotlib inline
    from mxnet import image, np, npx
    from d2l import mxnet as d2l

    npx.set_np()

    # PyTorch
    %matplotlib inline
    import torch
    from d2l import torch as d2l

    # TensorFlow
    %matplotlib inline
    import tensorflow as tf
    from d2l import tensorflow as d2l

We will load the sample image to be used in this section. We can see that there is a dog on the left side of the image and a cat on the right. They are the two major objects in this image.

    # MXNet
    d2l.set_figsize()
    img = image.imread('../img/catdog.jpg').asnumpy()
    d2l.plt.imshow(img);

../_images/output_bounding-box_d6b70e_15_0.svg

    # PyTorch
    d2l.set_figsize()
    img = d2l.plt.imread('../img/catdog.jpg')
    d2l.plt.imshow(img);

../_images/output_bounding-box_d6b70e_18_0.svg

    # TensorFlow
    d2l.set_figsize()
    img = d2l.plt.imread('../img/catdog.jpg')
    d2l.plt.imshow(img);

../_images/output_bounding-box_d6b70e_21_0.svg

13.3.1. Bounding Boxes

In object detection, we usually use a bounding box to describe the spatial location of an object. The bounding box is rectangular and is determined by the \(x\) and \(y\) coordinates of the upper-left corner of the rectangle and the corresponding coordinates of the lower-right corner. Another commonly used bounding box representation is the \((x, y)\)-axis coordinates of the bounding box center, together with the width and height of the box.
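Written out, the two representations are related by simple arithmetic. The symbols below are chosen here for illustration and mirror the variable names used in the conversion functions of this section: \((x_1, y_1)\) is the upper-left corner, \((x_2, y_2)\) the lower-right corner, \((c_x, c_y)\) the center, and \(w\), \(h\) the width and height.

```latex
\[
c_x = \frac{x_1 + x_2}{2}, \qquad
c_y = \frac{y_1 + y_2}{2}, \qquad
w = x_2 - x_1, \qquad
h = y_2 - y_1
\]
\[
x_1 = c_x - \frac{w}{2}, \qquad
y_1 = c_y - \frac{h}{2}, \qquad
x_2 = c_x + \frac{w}{2}, \qquad
y_2 = c_y + \frac{h}{2}
\]
```

Substituting the first line into the second recovers the original corners, which is why converting in one direction and then back should return the input.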

Here we define functions to convert between these two representations: box_corner_to_center converts from the two-corner representation to the center-width-height representation, and box_center_to_corner does the reverse. The input argument boxes should be a two-dimensional tensor of shape (\(n\), 4), where \(n\) is the number of bounding boxes.

    # MXNet
    #@save
    def box_corner_to_center(boxes):
        """Convert from (upper-left, lower-right) to (center, width, height)."""
        x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
        cx = (x1 + x2) / 2
        cy = (y1 + y2) / 2
        w = x2 - x1
        h = y2 - y1
        boxes = np.stack((cx, cy, w, h), axis=-1)
        return boxes

    #@save
    def box_center_to_corner(boxes):
        """Convert from (center, width, height) to (upper-left, lower-right)."""
        cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
        x1 = cx - 0.5 * w
        y1 = cy - 0.5 * h
        x2 = cx + 0.5 * w
        y2 = cy + 0.5 * h
        boxes = np.stack((x1, y1, x2, y2), axis=-1)
        return boxes

    # PyTorch
    #@save
    def box_corner_to_center(boxes):
        """Convert from (upper-left, lower-right) to (center, width, height)."""
        x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
        cx = (x1 + x2) / 2
        cy = (y1 + y2) / 2
        w = x2 - x1
        h = y2 - y1
        boxes = torch.stack((cx, cy, w, h), axis=-1)
        return boxes

    #@save
    def box_center_to_corner(boxes):
        """Convert from (center, width, height) to (upper-left, lower-right)."""
        cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
        x1 = cx - 0.5 * w
        y1 = cy - 0.5 * h
        x2 = cx + 0.5 * w
        y2 = cy + 0.5 * h
        boxes = torch.stack((x1, y1, x2, y2), axis=-1)
        return boxes

    # TensorFlow
    #@save
    def box_corner_to_center(boxes):
        """Convert from (upper-left, lower-right) to (center, width, height)."""
        x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
        cx = (x1 + x2) / 2
        cy = (y1 + y2) / 2
        w = x2 - x1
        h = y2 - y1
        boxes = tf.stack((cx, cy, w, h), axis=-1)
        return boxes

    #@save
    def box_center_to_corner(boxes):
        """Convert from (center, width, height) to (upper-left, lower-right)."""
        cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
        x1 = cx - 0.5 * w
        y1 = cy - 0.5 * h
        x2 = cx + 0.5 * w
        y2 = cy + 0.5 * h
        boxes = tf.stack((x1, y1, x2, y2), axis=-1)
        return boxes

We will define the bounding boxes of the dog and the cat in the image based on the coordinate information. The origin of the coordinates in the image is the upper-left corner of the image, and to the right and down are the positive directions of the \(x\) and \(y\) axes, respectively.

    # Here `bbox` is the abbreviation for bounding box
    dog_bbox, cat_bbox = [60.0, 45.0, 378.0, 516.0], [400.0, 112.0, 655.0, 493.0]
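Because the origin is the upper-left corner and \(y\) grows downward, a corner-format box maps directly onto array slicing: rows follow \(y\) and columns follow \(x\). The sketch below illustrates this with plain NumPy. The helper name `crop_box` and the blank 600 x 700 placeholder array are assumptions made for illustration, not part of the section's code or the actual sample image.

```python
import numpy as np


def crop_box(img, bbox):
    """Crop the region covered by a corner-format box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = (int(round(v)) for v in bbox)
    # Rows index the y axis (downward); columns index the x axis (rightward).
    return img[y1:y2, x1:x2]


# Hypothetical stand-in for an image: height 600, width 700, 3 channels.
placeholder = np.zeros((600, 700, 3))
dog_crop = crop_box(placeholder, [60.0, 45.0, 378.0, 516.0])
print(dog_crop.shape)  # (471, 318, 3): height 516 - 45, width 378 - 60
```

Note the order swap: the box lists \(x\) before \(y\), while the array is indexed with the row (\(y\)) first.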

We can verify the correctness of the two bounding box conversion functions by converting twice.

    # MXNet
    boxes = np.array((dog_bbox, cat_bbox))
    box_center_to_corner(box_corner_to_center(boxes)) == boxes

    array([[ True,  True,  True,  True],
           [ True,  True,  True,  True]])

    # PyTorch
    boxes = torch.tensor((dog_bbox, cat_bbox))
    box_center_to_corner(box_corner_to_center(boxes)) == boxes

    tensor([[True, True, True, True],
            [True, True, True, True]])

    # TensorFlow
    boxes = tf.constant((dog_bbox, cat_bbox))
    box_center_to_corner(box_corner_to_center(boxes)) == boxes

    <tf.Tensor: shape=(2, 4), dtype=bool, numpy=
    array([[ True,  True,  True,  True],
           [ True,  True,  True,  True]])>

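The exact `==` comparison works for these sample boxes because their coordinates are whole numbers whose sums, differences, and halves are represented exactly in floating point. For arbitrary coordinates, rounding error may break exact equality, so a tolerance-based check is usually safer. The following is a minimal sketch in plain NumPy. The function names `corner_to_center` and `center_to_corner` are hypothetical stand-ins for the section's functions, and the fractional box is made-up test data.

```python
import numpy as np


def corner_to_center(boxes):
    """Convert (x1, y1, x2, y2) rows to (cx, cy, w, h) rows."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    return np.stack(((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1), axis=-1)


def center_to_corner(boxes):
    """Convert (cx, cy, w, h) rows to (x1, y1, x2, y2) rows."""
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    return np.stack((cx - 0.5 * w, cy - 0.5 * h,
                     cx + 0.5 * w, cy + 0.5 * h), axis=-1)


# One made-up fractional box plus the dog box from this section.
boxes = np.array([[0.1, 0.2, 0.7, 0.9],
                  [60.0, 45.0, 378.0, 516.0]], dtype=np.float32)
round_trip = center_to_corner(corner_to_center(boxes))

# Compare within a tolerance instead of requiring bitwise equality.
print(np.allclose(round_trip, boxes))  # True
```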
Let us draw the bounding boxes in the image to check if they are accurate. Before drawing, we will define a helper function bbox_to_rect. It represents the bounding box in the bounding box format of the matplotlib package.

    #@save
    def bbox_to_rect(bbox, color):
        """Convert bounding box to matplotlib format."""
        # Convert the bounding box (upper-left x, upper-left y, lower-right x,
        # lower-right y) format to the matplotlib format: ((upper-left x,
        # upper-left y), width, height)
        return d2l.plt.Rectangle(
            xy=(bbox[0], bbox[1]), width=bbox[2] - bbox[0],
            height=bbox[3] - bbox[1], fill=False, edgecolor=color,
            linewidth=2)
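For readers who want to try the same idea without the d2l package, here is a minimal standalone sketch that uses matplotlib directly. The helper name `corner_box_to_patch`, the blank placeholder image, and the `Agg` backend choice are assumptions made so the example runs on its own; it is not the book's helper.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Rectangle


def corner_box_to_patch(bbox, color):
    """Build a matplotlib Rectangle from a corner-format box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = bbox
    # matplotlib expects the anchor point plus a width and a height.
    return Rectangle(xy=(x1, y1), width=x2 - x1, height=y2 - y1,
                     fill=False, edgecolor=color, linewidth=2)


fig, ax = plt.subplots()
ax.imshow(np.zeros((600, 700, 3)))  # blank placeholder instead of a photo
patch = ax.add_patch(corner_box_to_patch([60.0, 45.0, 378.0, 516.0], "blue"))
print(patch.get_xy(), patch.get_width(), patch.get_height())
```

With `imshow`, the vertical axis points downward by default, so the rectangle's anchor `xy` coincides with the box's upper-left corner in image coordinates.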

After adding the bounding boxes to the image, we can see that the main outlines of the two objects are basically inside the two boxes.

    # MXNet
    fig = d2l.plt.imshow(img)
    fig.axes.add_patch(bbox_to_rect(dog_bbox, 'blue'))
    fig.axes.add_patch(bbox_to_rect(cat_bbox, 'red'));

../_images/output_bounding-box_d6b70e_55_0.svg

    # PyTorch
    fig = d2l.plt.imshow(img)
    fig.axes.add_patch(bbox_to_rect(dog_bbox, 'blue'))
    fig.axes.add_patch(bbox_to_rect(cat_bbox, 'red'));

../_images/output_bounding-box_d6b70e_58_0.svg

    # TensorFlow
    fig = d2l.plt.imshow(img)
    fig.axes.add_patch(bbox_to_rect(dog_bbox, 'blue'))
    fig.axes.add_patch(bbox_to_rect(cat_bbox, 'red'));

../_images/output_bounding-box_d6b70e_61_0.svg

13.3.2. Summary

  • Object detection not only recognizes all the objects of interest in the image, but also their positions. The position is generally represented by a rectangular bounding box.

  • We can convert between two commonly used bounding box representations.

13.3.3. Exercises

  1. Find another image and try to label a bounding box that contains the object. Compare labeling bounding boxes and categories: which usually takes longer?

  2. Why is the innermost dimension of the input argument boxes of box_corner_to_center and box_center_to_corner always 4?