
EdgeDuet: Tiling Small Object Detection for Edge Assisted Autonomous Mobile Vision

Xu Wang*, Zheng Yang*, Jiahang Wu*, Yi Zhao*, Zimu Zhou‡

*School of Software and BNRist, Tsinghua University, China

‡School of Information Systems, Singapore Management University, Singapore

Motivation

• Various autonomous mobile vision applications
– All deploy HD cameras and must continuously detect objects in complex scenes to make decisions on the go, where small objects are common.

• An ideal object detection engine:
– Accuracy
– Real-Time
– Resource-Efficient

Examples: robot dogs for patrolling, drones for traffic surveillance, humanoid robots for working. Small objects are common.

Existing Object Recognition Solutions

• Run a light object detection model locally on-board
– Model compression techniques reduce the workload of deep learning models.
– Low-resolution inputs reduce the consumption of computation and memory resources.

Missing most small objects

Existing Object Recognition Solutions

• Offload the high-resolution video to the edge and run a heavy model
– Accurate detection on the edge ≠ Accurate detection on the device

View changes when uploading high-resolution frames

Existing Object Recognition Solutions

• Pioneering studies leverage a "detect + track" strategy to support real-time object detection and reduce the influence of network delay
– Offload key frames to the edge.
– Track objects in the current frame using the cached detection results of previous frames.

• They still face a long network delay for each offloaded high-resolution frame.

Glimpse, SenSys '15; EAAR, MobiCom '19

Question

• Since neither an on-device light model nor an on-edge heavy model alone works for accurate object detection in autonomous mobile vision, could we join the two systems together and offload only small object detection to the edge?
– Large objects incur no transmission delay and can be tracked immediately.
– Only the part of the high-resolution frame that contains small objects needs to be uploaded to the edge, so the transmission delay is low.
– Parallelism is more efficient when the image is split into sub-images without crossing object boundaries.

Question

• What to offload to the edge, and how?
– No prior knowledge about small objects
– Parallel processing

• How to aggregate the detection results and track all objects in real time?
– Duplicate detection results
– Multiple objects to track

Our System

• EdgeDuet: an edge-device collaborative framework for enhancing small object detection with tile-level parallelism
– Low Latency
– High Accuracy

System Overview


Offloading Module

Local Object Detection Module

Real-time Tracking Module

The Offloading Module


• EdgeDuet exploits RoI frame encoding to compress video frames, and content-prioritized tile offloading for highly parallel object detection at the edge.

RoI Frame Encoding

• Goals
– Compress pixel blocks containing small objects in high quality.
– Compress the rest of the frame in low quality.

• Determine blocks containing small objects

• Determine compression levels

RoI Frame Encoding

• Determine blocks containing small objects
– An object is "small" if the local object detector cannot detect it but the remote object detector can.
– We characterize the capability of the local detector on small objects with its recall curve over object size.
– The size threshold of each class is defined as the object size below which the recall is less than 90%.
– Approximate locations are estimated from the locations of small objects in the previous frame.

Class-dependent size threshold for small objects
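As a rough illustration of the thresholding rule above, the following sketch derives a per-class size threshold from a recall curve. Only the 90% recall criterion comes from the slides; the function name, the sample sizes, and the recall values are made up, and the sketch assumes recall does not decrease as object size grows.

```python
from typing import Sequence


def size_threshold(sizes: Sequence[float], recalls: Sequence[float],
                   min_recall: float = 0.90) -> float:
    """Return the smallest measured size at which recall reaches min_recall.

    sizes must be sorted in ascending order, and recalls[i] is the local
    detector's recall for objects of size sizes[i] (hypothetical data).
    Objects smaller than the returned value are treated as "small".
    """
    for size, recall in zip(sizes, recalls):
        if recall >= min_recall:
            return size
    # Recall never reaches the target: every measured size counts as small.
    return float("inf")


# Hypothetical recall curves: object size in pixels -> recall, per class.
curves = {
    "car":        ([10, 20, 30, 40, 50], [0.35, 0.72, 0.91, 0.95, 0.97]),
    "pedestrian": ([10, 20, 30, 40, 50], [0.20, 0.55, 0.80, 0.92, 0.96]),
}
thresholds = {cls: size_threshold(s, r) for cls, (s, r) in curves.items()}
print(thresholds)
```

With these invented curves the threshold differs by class, which is the point of making it class-dependent.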

RoI Frame Encoding

• Determine compression levels
– The high quality level should trade off object detection accuracy against the transmitted data size.
– The low quality level should not be so low that new objects are missed.
– The low quality level is chosen such that the remote object detector outputs low confidence scores on the compressed blocks but does not fail to locate objects.
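One way to picture RoI frame encoding is as a per-block quality map that an encoder could consume. The sketch below is not EdgeDuet's implementation: the 64-pixel block size and the two QP values are arbitrary assumptions, and the function name is hypothetical. It relies only on the general HEVC convention that a lower quantization parameter (QP) means higher quality.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels, x2/y2 exclusive


def roi_quality_map(width: int, height: int, small_boxes: List[Box],
                    block: int = 64, qp_high_quality: int = 22,
                    qp_low_quality: int = 40) -> List[List[int]]:
    """Build a per-block QP map from the previous frame's small-object boxes.

    Blocks touched by a small object get the low QP (high quality); all
    other blocks get the high QP (low quality).
    """
    cols = (width + block - 1) // block
    rows = (height + block - 1) // block
    qp = [[qp_low_quality] * cols for _ in range(rows)]
    for x1, y1, x2, y2 in small_boxes:
        # Clamp the box to the frame, then mark every block it touches.
        x1, y1 = max(x1, 0), max(y1, 0)
        x2, y2 = min(x2, width), min(y2, height)
        if x1 >= x2 or y1 >= y2:
            continue
        for r in range(y1 // block, (y2 - 1) // block + 1):
            for c in range(x1 // block, (x2 - 1) // block + 1):
                qp[r][c] = qp_high_quality
    return qp


# A 256x128 frame with one small object straddling the first two blocks.
qp_map = roi_quality_map(256, 128, [(60, 10, 70, 30)])
for row in qp_map:
    print(row)
```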

Content-Prioritized Tile Offloading

• We offload the whole frame in units of tiles

3x2 tiles


Per-tile pipeline stages: E = Tile Encode, U = Tile Upload, D = Tile Decode, I = Inference, R = BBox Return.
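A toy timing model shows why streaming tiles through the five stages (encode, upload, decode, inference, bounding-box return) can cut latency. The per-tile stage times below are invented for illustration, and the model assumes each stage serves one tile at a time; neither is a measurement from EdgeDuet.

```python
from typing import Sequence


def frame_level_latency(stage_ms: Sequence[float], n_tiles: int) -> float:
    """Each stage handles all tiles of the frame before the next stage starts."""
    return sum(t * n_tiles for t in stage_ms)


def tile_pipeline_latency(stage_ms: Sequence[float], n_tiles: int) -> float:
    """Tiles stream through the stages; a stage serves one tile at a time."""
    done = [0.0] * len(stage_ms)  # when each stage finished its latest tile
    for _ in range(n_tiles):
        ready = 0.0  # when the current tile leaves the previous stage
        for s, t in enumerate(stage_ms):
            start = max(ready, done[s])
            done[s] = start + t
            ready = done[s]
    return done[-1]


# Made-up per-tile times in milliseconds for E, U, D, I, R.
stages = [4.0, 10.0, 3.0, 8.0, 1.0]
print(frame_level_latency(stages, 6))    # whole frame per stage
print(tile_pipeline_latency(stages, 6))  # tile-level pipelining
```

In this model the pipelined latency is the time for one tile to traverse all stages plus the slowest stage repeated for each remaining tile, so the slowest stage (upload, with these numbers) dominates.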


• Enable tile-level parallelism
– Modify the frame encoding, frame decoding, and object detection stages to eliminate dependencies among tiles.


• Tile-level encoding
– Current HEVC video encoders support encoding tiles in parallel; however, the bit-stream is not uploaded until the whole frame is encoded.
– We modify the open-source video encoder Kvazaar [3] to support tile-level encoding.

[3] Kvazaar. https://github.com/ultravideo/kvazaar


• Tile-level decoding
– Existing video decoders depend on the first tile to locate the other tiles.
– We disguise each tile as a "first tile" by modifying the bit-stream in the video encoder and, accordingly, the HEVC parser in the video decoder.

The output of the video decoder


• Object detection
– Performing object detection on each tile separately may miss objects that cross the boundaries of adjacent tiles.
– Overlap-tiling: split each frame into primary tiles and overlap tiles, and group each primary tile with its surrounding overlap tiles for small object detection.
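The grouping idea can be sketched with plain rectangle arithmetic. This is a simplification, not the authors' layout: it models the overlap tiles around a primary tile as a fixed pixel margin, and the function name, grid size, and margin are assumptions chosen for illustration.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x1, y1, x2, y2), x2/y2 exclusive


def tile_groups(width: int, height: int, cols: int, rows: int,
                overlap: int) -> List[Tuple[Rect, Rect]]:
    """Return (primary_tile, detection_region) pairs for a cols x rows grid.

    The detection region is the primary tile grown by `overlap` pixels on
    every side and clamped to the frame, standing in for the primary tile
    plus its surrounding overlap tiles.
    """
    groups = []
    for r in range(rows):
        for c in range(cols):
            x1, x2 = c * width // cols, (c + 1) * width // cols
            y1, y2 = r * height // rows, (r + 1) * height // rows
            region = (max(x1 - overlap, 0), max(y1 - overlap, 0),
                      min(x2 + overlap, width), min(y2 + overlap, height))
            groups.append(((x1, y1, x2, y2), region))
    return groups


# A 1920x1080 frame split into 3x2 primary tiles with a 64-pixel margin.
groups = tile_groups(1920, 1080, cols=3, rows=2, overlap=64)
for primary, region in groups:
    print(primary, "->", region)
```

An object that straddles a primary-tile boundary then lies fully inside at least one detection region, provided it is smaller than the margin.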


• Enable content-based priority
– Modify the task scheduling module in Kvazaar.
– On receiving a frame to encode, Kvazaar splits the frame into tiles and submits the tasks of each tile to a task queue.
– Add a dynamic priority mapping module to change the order of tasks in the queue.
– The priority is the number of small objects in the corresponding tile group.
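The reordering can be illustrated with an ordinary priority queue. The sketch below is a Python stand-in rather than Kvazaar's C scheduler; the function name and the per-tile counts are hypothetical, and breaking ties by tile ID is an assumption made only to keep the order deterministic.

```python
import heapq
from typing import Dict, List


def encode_order(small_objects_per_tile: Dict[int, int]) -> List[int]:
    """Return tile IDs ordered by descending small-object count.

    heapq is a min-heap, so the count is negated; the tile ID breaks ties.
    """
    heap = [(-count, tile_id)
            for tile_id, count in small_objects_per_tile.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        _, tile_id = heapq.heappop(heap)
        order.append(tile_id)
    return order


# Hypothetical small-object counts for the six tile groups of one frame.
order = encode_order({0: 1, 1: 5, 2: 0, 3: 5, 4: 2, 5: 0})
print(order)
```

Tiles with the most small objects are encoded and uploaded first, so the edge can start detecting on the most useful content while the remaining tiles are still in flight.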

Local Object Detector

• The local object detector aims to detect medium- to large-sized objects in the video frames locally on the mobile device.
– The local object detector should balance offline accuracy and latency to achieve high online accuracy.
– We choose YOLOv3 FP16 (640x640) as the local object detector.

Performance of local detector on VisDrone dataset

Real-time Tracking

• The module aggregates the offloaded and the local detection results into the cache and tracks all objects with multiple single-object trackers.
– To avoid duplicate results, we drop the local detector's results for small objects and the remote detector's results for medium- to large-sized objects.
– Adaptively update the tracking results based on the speed of the objects.

General workflow using multiple single-object trackers
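The de-duplication rule can be written as a simple size filter over the two result sets. In this sketch the detection tuple layout, the definition of object size as the square root of the box area, and the threshold value are all assumptions for illustration; the slides only state which detector's results are dropped for which size range.

```python
from typing import Dict, List, Tuple

Detection = Tuple[str, float, float, float, float]  # (class, x1, y1, x2, y2)


def box_size(det: Detection) -> float:
    """Object size as the square root of the box area (assumed definition)."""
    _, x1, y1, x2, y2 = det
    return (max(x2 - x1, 0.0) * max(y2 - y1, 0.0)) ** 0.5


def aggregate(local: List[Detection], remote: List[Detection],
              thresholds: Dict[str, float]) -> List[Detection]:
    """Keep local results for medium/large objects, remote results for small."""
    keep_local = [d for d in local if box_size(d) >= thresholds[d[0]]]
    keep_remote = [d for d in remote if box_size(d) < thresholds[d[0]]]
    return keep_local + keep_remote


# Both detectors report the same two cars; each duplicate is dropped once.
size_limits = {"car": 30.0}  # hypothetical class-dependent threshold
local_dets = [("car", 0, 0, 100, 100), ("car", 200, 200, 220, 220)]
remote_dets = [("car", 1, 1, 101, 101), ("car", 200, 200, 221, 221)]
merged = aggregate(local_dets, remote_dets, size_limits)
print(merged)
```

Each surviving detection would then seed or refresh one single-object tracker.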

Experiment

• Dataset
– VisDrone [4]

• Compared methods
– Glimpse
– EAAR
– LaT

• Network settings:
– 4G
– Wi-Fi 2.4 GHz
– Wi-Fi 5 GHz

• Metrics:
– Latency
– IoU accuracy

[4] Vision Meets Drones: Past, Present and Future.

• 2 x Intel Xeon CPU E5-2560 v4
• 2 x GTX 2080ti GPU
• 256 GB memory

• iPhone 11 with the A13 Bionic chip
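The slides do not spell out how IoU accuracy is computed, so the sketch below only shows the standard intersection-over-union between two axis-aligned boxes, on which such a metric is typically built. The box format and function names are assumptions.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def area(box: Box) -> float:
    """Area of a box; degenerate boxes have zero area."""
    return max(box[2] - box[0], 0.0) * max(box[3] - box[1], 0.0)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes shifted by half their width overlap on a 5x10 strip.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

In a detect-and-track setting, a tracked box that lags behind the true object position yields a lower IoU, which is how offloading delay shows up in the accuracy metric.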

Experiment

• Overall performance
– EdgeDuet notably outperforms the two offloading schemes, Glimpse and EAAR, in both accuracy and latency under all three network conditions.
– LaT is the fastest because it only performs local detection. However, pure local detection has the worst accuracy.

Experiment

• The accuracy of small objects
– EdgeDuet achieves 161.5%, 245.0%, and 292.4% improvements in small object detection accuracy under the three network conditions.

Experiment

• Benefits of individual modules in EdgeDuet
– EdgeDuet has a smaller frame size than EAAR and Glimpse.
– EdgeDuet achieves 12.2% and 5.1% latency improvements over Frame-Level and Tile-Level, respectively.
– EdgeDuet improves the overall accuracy by 4.2% with adaptive tracker configuration.

Panels: RoI Frame Encoding; Content-Prioritized Tile Offloading; Adaptive tracker

Conclusion & Contribution

• EdgeDuet is the first framework that enhances small object detection in crowded scenes via collaboration between the edge and the mobile device.

• We push the state-of-the-art offloaded object detection studies from task-level parallelism to tile-level parallelism, which notably reduces the offloading latency. EdgeDuet is a systematic design that enables accurate, real-time object detection on mobile devices even in the case of low network bandwidth.

• We implement EdgeDuet as a cross-platform framework. Evaluations on VisDrone show that EdgeDuet improves the overall accuracy by 44.7% and the end-to-end latency by 34.2% over the state-of-the-art object detection offloading schemes.


Xu Wang, Tsinghua University

[email protected]