
Visual recognition in the real world SKT services

Date post: 15-Jul-2020
Visual recognition in the real world SKT services, 박병관, SK Telecom AI Center / Visual Recognition Technology Cell, 2019.07.02
Transcript
Page 1

Visual recognition

in the real world SKT services

박병관

SK Telecom

AI Center / Visual Recognition Technology Cell

2019.07.02

Page 2

SKT Services

Page 3

Contents

1. T map road traffic information recognition

a. Service Overview

b. Core Engine Architecture

c. Core Engine

d. Multi-frame Integration

e. Evaluation

2. NUGU nemo visual recognition

a. Hand Posture game

b. OksusuKids viewing guide

Page 4

T map road traffic information recognition

Page 5

1. Service Overview

Page 6

Service Overview

● Goal

○ Automatically recognize road guide signs and speed camera information in video from data-collection cameras

○ Convert the results into road data to verify existing data and generate new data

● Expected benefits

○ Agile updates

■ Fast reflection of VoC and new/changed road information

○ Wider coverage

■ Capture coverage ≈ verification coverage

Page 7

Service Overview (Example)

Page 8

2. Core Engine Architectures

Page 9

Core Engine Architectures

[Figure: pipeline diagram] Original Image → mid-size Image → Road Sign Detector → Crop & Resize → Text Detector → Crop, Warp & Resize → Language Classifier → Text Recognition → recognized text (examples: 인천국제공항..., 청중로봉오대로)
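The stage order in the diagram can be sketched as a chain of calls. This is a minimal illustration only: every name below (`run_pipeline`, `WordResult`, the callables passed in) is hypothetical, the actual detectors and recognizers are not shown in the slides, and routing to a per-language recognizer is an assumption about what the Language Classifier is used for.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h); illustrative box type


@dataclass
class WordResult:
    sign_box: Box   # where the sign was found in the mid-size image
    text_box: Box   # where the word was found inside the sign crop
    language: str   # output of the language classifier
    text: str       # output of the text recognizer


def run_pipeline(
    image: Any,
    detect_signs: Callable[[Any], List[Box]],
    detect_text: Callable[[Any], List[Box]],
    classify_language: Callable[[Any], str],
    recognizers: Dict[str, Callable[[Any], str]],
    crop: Callable[[Any, Box], Any],
) -> List[WordResult]:
    """Run the stages in the order shown on the slide (all stages are stubs)."""
    results: List[WordResult] = []
    for sign_box in detect_signs(image):
        sign_img = crop(image, sign_box)          # "Crop & Resize"
        for text_box in detect_text(sign_img):
            word_img = crop(sign_img, text_box)   # "Crop, Warp & Resize"
            lang = classify_language(word_img)
            text = recognizers[lang](word_img)    # assumed: one recognizer per language
            results.append(WordResult(sign_box, text_box, lang, text))
    return results
```

Passing the stages in as callables keeps the sketch independent of any particular detection or recognition library.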

Page 10

Practical Issue #1 (diverse outdoor environments)

Lighting, Blur, Occlusion

Page 11

Practical Issue #2 (non-standard signs)

Page 12

3. Core Engine

Page 13

Road Sign Detection

● Reliably detect the meaningful signs in road video

○ Select the main targets among the many signs

■ Speed limits / road traffic signs / speed cameras, etc.

■ Similar-looking signs are also included in training

○ Object detection based on a two-stage R-CNN

[Figure labels: detection target / not a detection target / detection target / not a detection target]

Page 14

Text Detection (I)

● Overview of text detection

○ Evolving from 4 DoF → 5 DoF → 8 DoF → 2N DoF

○ Text on signs follows a fixed standard (no arbitrary shapes)

■ but vehicle motion makes the sign appear rotated

○ Text on signs is detected with 5 DoF, then warped and passed to the text recognition engine

TYPE                  RECT              RBOX                 Polygon

Degrees of Freedom    4 (x, y, w, h)    5 (x, y, w, h, θ)    2N (x1, y1, …, xN, yN)

Example               [Figure: one example image per box type]
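To make the 5 DoF RBOX concrete, the sketch below converts (x, y, w, h, θ) into four corner points, which is the form a warping step would typically need. It assumes (x, y) is the box center and θ is a rotation about that center in degrees; the slide does not state either convention, and `rbox_corners` is a hypothetical helper name.

```python
import math
from typing import List, Tuple


def rbox_corners(cx: float, cy: float, w: float, h: float,
                 theta_deg: float) -> List[Tuple[float, float]]:
    """Corners of a rotated box, ordered top-left, top-right,
    bottom-right, bottom-left before rotation.

    Assumes (cx, cy) is the center and theta_deg rotates about it.
    """
    t = math.radians(theta_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Standard 2D rotation of each corner offset, then translate to the center.
    return [(cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
            for dx, dy in offsets]
```

With θ = 0 this reduces to the 4 DoF RECT case. The four corners could then be mapped to an upright w × h rectangle by an affine or perspective warp before recognition.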

Page 15

Text Recognition (I)

● Accurately recognize the detected text

○ CNN + RNN based Text Recognition Engine

○ Customized CNN + Attentive RNN designed for the complexity of Hangul

A B C D E F G... (English letters/digits/special characters: about 80 classes, phonemic characters)

VS

닮닳쏘쪼개걔흥홍훙흉횽 (Hangul/English letters/digits/special characters: about 2,400 classes, phonemic/syllabic characters)

Page 16

Text Recognition (II)

● Accurately recognize the detected text

○ Requires a large and expensive training DB

■ The complexity of Hangul calls for a large training DB

■ But... labeling Hangul is very expensive

● Estimated cost of labeling 5 million five-syllable Hangul words

● 5 million × 5 × 10 = 250 million KRW (10 KRW per typed syllable)

○ Use a target-customized synthetic DB

■ Generation == Labeling

■ Augmentation by 3D effect

■ Can also simulate jittering of the text detection box, etc.
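The cost estimate and the "generation == labeling" idea can be sketched as follows. This is an illustration under assumptions: image rendering (fonts, 3D effects) is not shown, the box sizes and jitter ranges are made-up values, and all function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def labeling_cost_krw(num_words: int, syllables_per_word: int,
                      krw_per_syllable: int) -> int:
    """Manual labeling cost: words x syllables per word x price per syllable."""
    return num_words * syllables_per_word * krw_per_syllable


def jitter_box(box: Box, max_shift: float, max_scale: float,
               rng: random.Random) -> Box:
    """Perturb a tight box to imitate an imperfect text detector."""
    x, y, w, h = box
    dx = rng.uniform(-max_shift, max_shift)
    dy = rng.uniform(-max_shift, max_shift)
    sw = 1.0 + rng.uniform(-max_scale, max_scale)
    sh = 1.0 + rng.uniform(-max_scale, max_scale)
    return (x + dx, y + dy, w * sw, h * sh)


def make_synthetic_sample(vocabulary: List[str],
                          rng: random.Random) -> Dict[str, object]:
    """Pick the text first, so the label is known by construction."""
    text = rng.choice(vocabulary)
    # Hypothetical layout: 32 px per character, 32 px tall.
    tight_box: Box = (0.0, 0.0, 32.0 * len(text), 32.0)
    return {"text": text, "crop_box": jitter_box(tight_box, 3.0, 0.1, rng)}
```

Calling `labeling_cost_krw(5_000_000, 5, 10)` reproduces the 250 million KRW figure from the slide, while each synthetic sample costs no typing at all because its text is chosen before the image is made.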

Page 17

3D Plate Modeling for Text Detection

Page 18

4. Multi-frame Integration

Page 19

Multi-frame Integration

● Recognize one sign across multiple frames to improve per-sign recognition accuracy

○ Integrating results from multiple frames mitigates degradation caused by misrecognition or occlusion in some frames

○ A sign that appears in multiple frames must be integrated into a single result

○ Scene Splitting → Tracking → Word Integration → Word Refining
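One simple way to realize the Word Integration step is majority voting over the per-frame readings of each tracked word. The slides do not say which integration rule is actually used, so voting here is a stand-in technique, and `integrate_words` and the track-id representation are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def integrate_words(
    observations: Iterable[Tuple[Hashable, str]]
) -> Dict[Hashable, str]:
    """Merge per-frame readings into one result per tracked word.

    observations: (track_id, recognized_text) pairs, one per frame in which
    the tracker saw that word. The most frequent reading wins; ties go to
    the reading that appeared first.
    """
    by_track: Dict[Hashable, List[str]] = defaultdict(list)
    for track_id, text in observations:
        by_track[track_id].append(text)
    return {track_id: Counter(texts).most_common(1)[0][0]
            for track_id, texts in by_track.items()}
```

A single misread frame is outvoted as long as most frames of the same track agree, which matches the stated goal of reducing the impact of per-frame errors and occlusion.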

Page 20

5. Quantitative/Qualitative Evaluation


Page 25

Quantitative/Qualitative Evaluation

● The evaluation set is split into a Hard set and a Normal set

[Figure: hard set examples by case: back light, blur, occlusion, exposure]

Case        Hard      Normal    Total

E2E Acc.    90.32%    95.65%    95.18%
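If the Total accuracy is a sample-weighted average of the Hard and Normal accuracies (an assumption; the slide does not say how it was computed), the share of hard samples p follows from p·hard + (1 − p)·normal = total. The helper name below is hypothetical.

```python
def hard_fraction(acc_hard: float, acc_normal: float, acc_total: float) -> float:
    """Solve p * acc_hard + (1 - p) * acc_normal = acc_total for p.

    Only meaningful if the total is a sample-weighted mean of the two subsets.
    """
    if acc_normal == acc_hard:
        raise ValueError("subset accuracies are equal; fraction is undetermined")
    return (acc_normal - acc_total) / (acc_normal - acc_hard)


# Values taken from the table above.
p = hard_fraction(90.32, 95.65, 95.18)
```

Under that assumption p comes out at roughly 0.09, meaning the hard set would make up a small part of the evaluation data.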

Page 26

Quantitative/Qualitative Evaluation

Page 27

NUGU nemo visual recognition

Page 28

Smart Display Speaker (with Camera)

Released April 26, 2019 (first in Korea)

with visual recognition

Page 29

Hand Posture brain game

Page 30

반짝반짝 두뇌게임 (Twinkle Twinkle Brain Game)

● Hand - Natural User Interface


Page 33

반짝반짝 두뇌게임

Input : 2D image

● 2D key points

○ OpenPose (CMU)

● 3D key points

○ Learning to Estimate 3D Hand Pose from Single RGB Images (ICCV 2017)

○ GANerated Hands for Real-Time 3D Hand Tracking from Monocular RGB (CVPR 2018)

Input : 3D depth image

● 3D key points

○ Augmented Skeleton Space Transfer for Depth-based Hand Pose Estimation (CVPR 2018)

○ Occlusion-aware Hand Pose Estimation Using Hierarchical Mixture Density Network (ECCV 2018)


Page 35

반짝반짝 두뇌게임

Output : Posture

● Input : Static One Frame

○ How many classes do you need to classify?

■ Hard to label

Output : Gesture

● Input : Dynamic Varying Frames

○ Real Time Processing with Tracking

Page 36

반짝반짝 두뇌게임

● Training

○ Which classes should be trained?

■ Unambiguous hand postures; classes that overlap as little as possible in appearance with other classes

■ 7 classes + 1 negative = 8 classes

● The negative hand class is important

Page 37

반짝반짝 두뇌게임

● Choices

○ 2D (R, G, B) image vs 3D depth image

○ posture vs gesture

○ key point vs detection

■ rock, paper, scissors (3 types)

■ V pose, heart, palm, okay, thumbs up, thumbs down (6 types)
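A sketch of how an 8-way output with a negative class might be turned into a posture decision. Several things here are assumptions and not taken from the slides: the list of seven class names (it treats paper and palm, and scissors and V pose, as the same posture to arrive at seven), the confidence threshold, and the function names.

```python
import math
from typing import List, Optional

# Assumed class list: 7 postures + 1 negative (not confirmed by the slides).
CLASSES = ["rock", "paper", "scissors", "heart", "okay",
           "thumbs_up", "thumbs_down", "negative"]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_posture(logits: List[float],
                    min_confidence: float = 0.6) -> Optional[str]:
    """Return a posture name, or None for 'no valid posture'.

    None is returned when the negative class wins or when the winning
    probability is below min_confidence (a hypothetical threshold).
    """
    if len(logits) != len(CLASSES):
        raise ValueError("expected one logit per class")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if CLASSES[best] == "negative" or probs[best] < min_confidence:
        return None
    return CLASSES[best]
```

The explicit negative class gives the model somewhere to put hands that are visible but not making any of the game postures, so those frames do not get forced into one of the seven labels.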


Page 42

반짝반짝 두뇌게임

● Problems

○ Deciding where to draw the boundary

■ Up to what rotation angle should be accepted?

○ Pose variation

■ Which poses should be accepted?

Page 43

반짝반짝 두뇌게임

● Solution

○ Learning by Failure

■ A perfect engine cannot be built at the outset.

■ An easy (plain) hand posture DB does not help training.

■ The engine's problems are most reliably found through real users.

■ Continuous iteration of failure-case analysis through CBT and engine improvement (CBT has run through 8 rounds)

Page 44

반짝반짝 두뇌게임

● Performance

○ Identified the baseline engine's problems through the first 4 CBT rounds

■ Pose Variation

● Children's diverse hand movements

● Reinforced the DB with pose variation in the roll, pitch, and yaw directions

○ After the 6th test

■ Scale Variation

● Recognition rate is relatively lower at close range (20 cm or less)

● Reinforced the DB with diverse scales

○ CBT continues after launch to keep improving performance

Page 45

Face detection: OKSUSU Kids viewing habits

Page 46

OksusuKids viewing guide service

● Visual recognition service for children's viewing habits

○ When the display is used at a distance within 15 cm, pause the VoD and show a 'move back' prompt

○ Active from 1 minute after the VoD starts; after one 'move back' prompt, the service becomes active again 5 minutes later
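The timing rules above can be sketched as a small state holder. The thresholds (15 cm, 1 minute, 5 minutes) come from the slide; the class name, method names, and the use of `None` for "no face detected" are hypothetical choices for illustration.

```python
from typing import Optional


class ViewingGuide:
    """Decides when to pause the VoD and show the 'move back' prompt."""

    def __init__(self, near_cm: float = 15.0,
                 start_delay_s: float = 60.0,
                 cooldown_s: float = 300.0) -> None:
        self.near_cm = near_cm
        self.cooldown_s = cooldown_s
        # Time (seconds since VoD start) from which monitoring is active.
        self._active_from_s = start_delay_s

    def update(self, t_s: float, distance_cm: Optional[float]) -> bool:
        """Return True if the prompt should fire at time t_s.

        distance_cm is the estimated face-to-display distance,
        or None when no face is detected.
        """
        if t_s < self._active_from_s:
            return False  # still in the start delay or the cooldown
        if distance_cm is None or distance_cm > self.near_cm:
            return False  # nobody too close
        self._active_from_s = t_s + self.cooldown_s  # re-arm after cooldown
        return True
```

For example, a child sitting 10 cm away triggers nothing at 30 s, triggers the prompt at 61 s, and is not prompted again until at least 361 s.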

Page 47

OksusuKids viewing guide service

● Estimate the distance between the display and the face using embedded face detection

● Why embedded

○ Privacy concerns

○ Server cost

○ Prompt response
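The slides do not say how distance is derived from the detected face. One common approach, shown here purely as an assumption, is the pinhole camera relation: distance ≈ focal length (in pixels) × real face width ÷ face width in pixels. The default face width and the focal length used in the example are placeholder values, not measured ones.

```python
def estimate_distance_cm(face_width_px: float,
                         focal_length_px: float,
                         real_face_width_cm: float = 12.0) -> float:
    """Pinhole-model distance estimate from a face bounding box width.

    real_face_width_cm is a placeholder for an average face width;
    focal_length_px would come from camera calibration.
    """
    if face_width_px <= 0:
        raise ValueError("face_width_px must be positive")
    return focal_length_px * real_face_width_cm / face_width_px


# With a hypothetical 500 px focal length, a 400 px wide face is about 15 cm away.
distance = estimate_distance_cm(400, 500)
```

Because only the bounding box width is needed, this kind of estimate can run on the device right after face detection, which fits the privacy, cost, and latency reasons listed above.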

Page 48

Legacy Face Detector

● Legacy embedded face detector

○ Shallow learning based (Runs 9fps @ NUGU nemo)

○ We need to go deeper...

Page 49

Limitation/Performance

● NVIDIA GTX 1080 Ti vs NUGU nemo GPU

○ 11.34 TFLOPS vs 0.007 TFLOPS
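A quick calculation makes the gap concrete. The two TFLOPS figures are from the slide; the 10 fps target in the example is a hypothetical value, and peak FLOPS is only an upper bound on what a real model can use.

```python
def compute_ratio(big_tflops: float, small_tflops: float) -> float:
    """How many times more peak compute the larger device has."""
    return big_tflops / small_tflops


def flops_budget_per_frame(tflops: float, fps: float) -> float:
    """Upper bound on floating-point operations available per frame."""
    return tflops * 1e12 / fps


ratio = compute_ratio(11.34, 0.007)            # about 1620x
budget = flops_budget_per_frame(0.007, 10.0)   # about 0.7 GFLOPs per frame at 10 fps
```

A detector designed for a desktop GPU therefore has to shrink by roughly three orders of magnitude in compute to run at interactive rates on the device.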

Page 50

Current Face Detection @ NUGU

Page 51

Wrap Up

Page 52

Infra for Visual Recognition

Training GPU Infra : DGX-1V

Inference GPU Infra : V100

Page 53

Closing remarks

● The road to applying visual recognition in a service

○ Secure a Training DB and a Test DB suited to the service before launch

○ A structure that allows continuous updates after launch

○ Beyond Open Source and Paper

■ Adaptation / Modification beyond publicly available networks

○ Ample GPU infrastructure

○ Affection and passion for the service (a mindset that can love even the VoC)

