
A Deep Learning-Based In-field Fruit Counting Method Using Video Sequences

Jiaqi Wang1[0000-0002-1708-3573], Wenli Zhang1[0000-0003-3151-5755], Kaizhen Chen1[0000-0001-6871-4091], Huibin Li2[0000-0002-4901-2104], Yun Shi2[0000-0002-6294-0124], and Wei Guo3[0000-0002-3017-5464]

1 Beijing University of Technology

2 Chinese Academy of Agricultural Sciences

3 The University of Tokyo

1 Introduction

In recent years, computer vision-based fruit counting in orchards has become a hot research topic in smart agriculture. Modern farms have started to benefit from this technology in fruit yield estimation and precision marketing decisions. Developing such techniques involves two main tasks: precise fruit detection and fruit counting from orchard images.

For the fruit detection task, researchers have proposed deep learning-based image detection algorithms [1–4], but these do not address the simultaneous presence of small-scale targets. For fruit localization and counting, researchers have proposed methods based on static images and on video sequences [1, 3, 5–7]. Video-based counting collects fruit images from multiple viewpoints and is considered an efficient solution for fruit counting. However, current video-based methods do not discuss the complex occlusion situations that may exist in global video sequences, which result in the loss of tracking targets.

Therefore, using oranges as a case study, we propose the following solutions to the above two tasks: 1) We propose an improved Yolov3 [8] detection model based on the principle of matching the feature map's receptive field to the target scale [9]. 2) We first analyze the complex occlusion of orange fruits and define a counting region in each frame of the global video sequence. Then, we use the multi-object tracking algorithm Sort [10] to count only the fruits that appear in the pre-defined region.

Fig. 1. The Improved-Yolov3 Network Structure

2 Method

In this study, the video sequence was captured with a DJI Osmo Action camera (DJI Technology Co., Ltd., Shenzhen, China) in an orange orchard in Sichuan Province, China. The proposed video-based fruit detection and counting method includes two steps: fruit detection and tracking-based fruit counting.


Table 1. Fruit Detection Performance

Method            Precision  Recall  F1-score  AP     FPPI
Yolov3            0.926      0.90    0.911     0.960  2.294
improved-Yolov3   0.926      0.926   0.926     0.968  2.35

Table 2. Fruit Counting Performance

Counting Method                   Number of fruit counts  Inference time
manual counting                   90                      30 s
improved-Yolov3 (No Track)        900                     0.02 s
improved-Yolov3+Sort (proposed)   102                     0.08 s

Fig. 2. Visualization of Fruit Detection

Step 1. Fruit detection method based on improved-Yolov3: First, we calculate the size of the receptive field [11] of the Yolov3 network and cluster the orange dataset to obtain the scale distribution of the oranges. Second, we design a shallow prediction layer for detecting oranges based on the principle of matching the feature map's receptive field to the target scale. Then, we use a multi-level fusion strategy to fuse the shallow-layer features with the deep-layer features, which enhances the semantic features of the shallow feature map. Finally, the fused features are used to detect small-scale oranges in each image frame. The improved-Yolov3 network structure is shown in Figure 1, where the yellow region indicates the shallow prediction layer.
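The receptive-field calculation in Step 1 can be illustrated with the standard recurrence for stacked convolution layers described in [11]. The following is a minimal sketch, not the authors' code: the layer list is a hypothetical toy stack rather than the actual Yolov3 backbone, and picking the layer whose receptive field is closest to a target size is only one assumed way of "matching" receptive field to target scale.

```python
from typing import List, Tuple

Layer = Tuple[int, int]  # (kernel_size, stride), listed from input to output


def receptive_fields(layers: List[Layer]) -> List[int]:
    """Return the theoretical receptive field (in input pixels) after each layer."""
    rf = 1    # a single input pixel sees itself
    jump = 1  # spacing, in input pixels, between adjacent units of the current map
    result = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each extra kernel tap widens the field by one jump
        jump *= stride             # striding spreads output units further apart
        result.append(rf)
    return result


def closest_layer(layers: List[Layer], target_size: float) -> int:
    """Index of the layer whose receptive field is closest to target_size.

    The 'closest' criterion is an assumption made for illustration only.
    """
    rfs = receptive_fields(layers)
    return min(range(len(rfs)), key=lambda i: abs(rfs[i] - target_size))


# Hypothetical toy stack of 3x3 convolutions (NOT the Yolov3 architecture).
toy_stack = [(3, 1), (3, 2), (3, 1), (3, 2)]
print(receptive_fields(toy_stack))   # [3, 5, 9, 13]
print(closest_layer(toy_stack, 10))  # 2
```

In practice the target size would come from the clustered scale distribution of the labelled oranges, and the layer list from the real network definition.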

Step 2. Fruit tracking and counting method based on a specified area: First, the orange detection results from Step 1 are input to the tracking algorithm Sort, and we determine whether these oranges are in the specified counting area. If a fruit is in the counting area, it is assigned a unique number and tracked frame by frame until it leaves the counting area. Finally, the number of unique numbers assigned is taken as the final orange count.
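The region-based counting logic of Step 2 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it does not implement Sort itself but assumes a tracker has already produced per-frame lists of (track_id, box) pairs, it assumes a rectangular counting region, and it treats a fruit as "in the region" when its box center lies inside it. All names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Track = Tuple[int, Box]                  # (track_id, box) as emitted by a tracker


def center_in_region(box: Box, region: Box) -> bool:
    """True if the center of box lies inside the rectangular region (assumed test)."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    rx1, ry1, rx2, ry2 = region
    return rx1 <= cx <= rx2 and ry1 <= cy <= ry2


def count_in_region(frames: Iterable[List[Track]], region: Box) -> int:
    """Count distinct track IDs that are ever observed inside the counting region."""
    counted: Set[int] = set()
    for tracks in frames:
        for track_id, box in tracks:
            if center_in_region(box, region):
                counted.add(track_id)  # a set keeps each fruit counted only once
    return len(counted)


# Hypothetical tracker output for three frames.
frames = [
    [(1, (10, 10, 20, 20)), (2, (200, 10, 210, 20))],  # only fruit 1 is inside
    [(1, (30, 10, 40, 20)), (2, (90, 10, 100, 20))],   # fruit 2 has moved inside
    [(3, (300, 300, 310, 310))],                       # fruit 3 never enters
]
print(count_in_region(frames, region=(0, 0, 100, 100)))  # 2
```

Counting distinct IDs rather than per-frame detections is what prevents the same fruit from being counted again in every frame in which it is visible.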

3 Results and Discussion

In this study, we used 330 orange images, divided into a training set and a test set at a ratio of 8:2. Table 1 compares the improved-Yolov3 with the original Yolov3 on five metrics: Precision, Recall, F1-score, AP, and FPPI. Figure 2 shows the detection results of the improved-Yolov3, where the red boxes correspond to the ground truth and the blue boxes correspond to detection results. The orange counting results are shown in Table 2: the proposed improved-Yolov3 with the tracking algorithm counts 102 oranges at 0.08 s per frame, which is close to the manual count of 90.
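To make the reported numbers easier to compare, the short sketch below recomputes the F1-score from precision and recall (the standard harmonic-mean definition) and expresses each counting result as a relative error against the manual count. The relative-error measure is an assumption introduced here for illustration; the source reports only the raw counts in Table 2.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def relative_count_error(predicted: int, reference: int) -> float:
    """Absolute counting error as a fraction of the reference (manual) count."""
    return abs(predicted - reference) / reference


# Values taken from Table 1 (improved-Yolov3 row) and Table 2.
print(round(f1_score(0.926, 0.926), 3))            # 0.926
print(f"{relative_count_error(102, 90):.1%}")      # 13.3%  (with Sort tracking)
print(f"{relative_count_error(900, 90):.1%}")      # 900.0% (no tracking)
```

Under this measure, per-frame detection without tracking overcounts by an order of magnitude because the same fruit is detected in many frames, while the tracked count lies within roughly 13% of the manual count.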


References

1. A Koirala, KB Walsh, Z Wang, and C McCarthy. Deep learning for real-time fruit detection and orchard fruit load estimation: Benchmarking of 'MangoYOLO'. Precision Agriculture, 20(6):1107–1135, 2019.

2. Orly Enrique Apolo Apolo, Jorge Martínez Guanter, Gregorio Egea Cegarra, Purushothaman Raja, and Manuel Perez Ruiz. Deep learning techniques for estimation of the yield and size of citrus fruits using a UAV. European Journal of Agronomy, 115(4):183–194, 2020.

3. Ramesh Kestur, Avadesh Meduri, and Omkar Narasipura. MangoNet: A deep semantic segmentation architecture for a method to detect and count mangoes in an open orchard. Engineering Applications of Artificial Intelligence, 77:59–69, 2019.

4. Nicolai Hani, Pravakar Roy, and Volkan Isler. A comparative study of fruit detection and counting methods for yield mapping in apple orchards. Journal of Field Robotics, 37(2):263–282, 2020.

5. Zhenglin Wang, Kerry Walsh, and Anand Koirala. Mango fruit load estimation using a video based MangoYOLO-Kalman filter-Hungarian algorithm method. Sensors, 19(12):2742, 2019.

6. Xu Liu, Steven W Chen, Shreyas Aditya, Nivedha Sivakumar, Sandeep Dcunha, Chao Qu, Camillo J Taylor, Jnaneshwar Das, and Vijay Kumar. Robust fruit counting: Combining deep learning, tracking, and structure from motion. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1045–1052. IEEE, 2018.

7. Xu Liu, Steven W Chen, Chenhao Liu, Shreyas S Shivakumar, Jnaneshwar Das, Camillo J Taylor, James Underwood, and Vijay Kumar. Monocular camera based fruit counting and mapping with semantic data association. IEEE Robotics and Automation Letters, 4(3):2296–2303, 2019.

8. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.

9. Wenjie Luo, Yujia Li, Raquel Urtasun, and Richard Zemel. Understanding the effective receptive field in deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 4898–4906, 2016.

10. Alex Bewley, Zongyuan Ge, Lionel Ott, Fabio Ramos, and Ben Upcroft. Simple online and realtime tracking. In 2016 IEEE International Conference on Image Processing (ICIP), pages 3464–3468. IEEE, 2016.

11. Vincent Dumoulin and Francesco Visin. A guide to convolution arithmetic for deep learning. arXiv preprint arXiv:1603.07285, 2016.
