Stream upload and asynchronous job processing in large-scale systems

Post on 11-Apr-2017

Stream Upload And Asynchronous Job Processing System

Lê Bá Minh – minhlb@vng.com.vn
Technical Manager – Zalo Team - VNG

Agenda

• 1/ Why do we need an Asynchronous Job Processing System?
• 2/ How does it work?
• 3/ Applications
• 4/ Q&A

Parallel Stream Upload

• Data is split into chunks that are uploaded in parallel
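
As a rough illustration of chunked upload, the sketch below splits a payload into indexed chunks, "uploads" them concurrently, and reassembles them on the receiving side. The chunk size, the function names, and the `upload_chunk` stand-in (which sends nothing over the network) are assumptions made for this example; the slides do not describe Zalo's actual chunk format.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size; the slides do not state one


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Return a list of (index, total, payload) tuples covering data in order."""
    total = max(1, -(-len(data) // chunk_size))  # ceiling division, at least one chunk
    return [
        (i, total, data[i * chunk_size:(i + 1) * chunk_size])
        for i in range(total)
    ]


def upload_chunk(chunk):
    """Stand-in for a network send (assumption): returns the chunk unchanged."""
    return chunk


def parallel_upload(data: bytes, chunk_size: int = CHUNK_SIZE, workers: int = 4):
    """Split data and hand each chunk to a thread pool."""
    chunks = split_into_chunks(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(upload_chunk, chunks))


def reassemble(chunks):
    """Rebuild the payload from chunks that may arrive in any order."""
    ordered = sorted(chunks, key=lambda c: c[0])
    return b"".join(payload for _, _, payload in ordered)
```

Carrying the index and total with every chunk lets the receiver detect when an upload is complete and restore order regardless of arrival sequence.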

Facts

• Zalo Stream Upload
  • Background continuous voice upload
  • Background image upload
  • …

• Facts (now)
  • 1M voices/day
  • 800K images/day
  • Peak: 500 chunks/second

• Expected:
  • Scalable (more than 5,000 chunks/second)
  • High performance

What we need

• Asynchronous Job Processing System

[Diagram: two flows built from the stages Collect Data, Processing Data, and Response; the second flow adds Workers.]

What we need

• Asynchronous Job Processing System
  • Batch jobs
  • Big data jobs
  • Highly reliable: no job is lost
  • Distributed job processing workers
  • High performance
  • Persistent
  • Load balancing, failover, recoverable

Open-source solutions

• Shared-memory workers
  • All workers run on one physical server
  • No failover
  • Not scalable

• Gearman
  • Good, but does not fully fit our requirements
  • No batch job support
  • Not fully reliable (jobs can be lost)
  • Load balancing is incomplete
  • Unstable above 2,000 jobs/second

Zalo Async Job Processing System

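[Architecture diagram. Clients and workers (Worker 1, Worker 2, Worker 3) connect to the Job Server over TCP; the connections are labeled Short Connection and Long Connection. Job Server components: Worker Manager, Job Caching, Job Manager, Persistent Manager, Job Clean-Up. Storage: Z Database.]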

Implementation

• C/C++ for Job Server
• C/C++, Java for client and workers
• Binary Protocol
• Z-Database
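
The slides do not specify the wire format, so the following is only a sketch of what a binary protocol over TCP commonly looks like: length-prefixed frames. The header layout (4-byte payload length plus 1-byte message type) and the message type constants are hypothetical, chosen for this example.

```python
import struct

# Hypothetical header: 4-byte big-endian payload length, then 1-byte message type.
HEADER = struct.Struct("!IB")

MSG_SUBMIT_JOB = 1    # hypothetical message type
MSG_ACK_FINISHED = 2  # hypothetical message type


def encode_frame(msg_type: int, payload: bytes) -> bytes:
    """Prefix the payload with its length and message type."""
    return HEADER.pack(len(payload), msg_type) + payload


def decode_frames(buffer: bytes):
    """Parse every complete frame in buffer.

    Returns (frames, remaining): frames is a list of (msg_type, payload),
    remaining holds trailing bytes of a frame that has not fully arrived.
    """
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        length, msg_type = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # partial frame: wait for more bytes from the socket
        frames.append((msg_type, buffer[offset + HEADER.size:end]))
        offset = end
    return frames, buffer[offset:]
```

Because TCP is a byte stream, the receiver has to keep the unparsed remainder and prepend it to the next read; the length prefix is what marks where each message ends.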

Job State

[State diagram]

• States: Queuing, Processing, Failed / Time Out, Finished
• Transitions: Started, Deliver to Worker, Worker ACK Failed, Worker ACK Finished, No ACK
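
The state and transition labels above suggest a small state machine. The sketch below is one plausible reading, not the documented behavior: the arrows are inferred from the label names, Failed and Time Out are modeled as separate states, and the `requeue` event, which puts failed or timed-out jobs back into the queue, is an assumption motivated by the "no job is lost" requirement.

```python
from enum import Enum


class JobState(Enum):
    QUEUING = "Queuing"
    PROCESSING = "Processing"
    FAILED = "Failed"
    TIMED_OUT = "Time Out"
    FINISHED = "Finished"


# Assumed wiring, inferred from the diagram labels.
TRANSITIONS = {
    (JobState.QUEUING, "deliver_to_worker"): JobState.PROCESSING,
    (JobState.PROCESSING, "worker_ack_finished"): JobState.FINISHED,
    (JobState.PROCESSING, "worker_ack_failed"): JobState.FAILED,
    (JobState.PROCESSING, "no_ack"): JobState.TIMED_OUT,
    # Assumption: unsuccessful jobs go back to the queue for redelivery.
    (JobState.FAILED, "requeue"): JobState.QUEUING,
    (JobState.TIMED_OUT, "requeue"): JobState.QUEUING,
}


class Job:
    def __init__(self, job_id):
        self.job_id = job_id
        self.state = JobState.QUEUING  # "Started": a new job enters the queue
        self.history = [self.state]

    def apply(self, event):
        """Move to the next state, rejecting events that are invalid here."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} is not allowed in state {self.state.value}")
        self.state = TRANSITIONS[key]
        self.history.append(self.state)
        return self.state
```

Under this reading, a job whose worker finishes the work but whose ACK never arrives is redelivered, which would be one source of duplicate jobs.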

Job Type

• Single Job
  • Simple task
  • Delivered immediately

• Batch Job
  • Multiple tasks
  • Delivered once all tasks have been received
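
The difference between the two job types can be sketched as follows: a single job is handed over right away, while the tasks of a batch job are held until the last one arrives. The `BatchCollector` class, its method names, and the `(batch_id, index, total)` addressing are hypothetical; the slides do not describe how tasks are identified.

```python
class BatchCollector:
    """Holds batch tasks until the batch is complete, then delivers them together."""

    def __init__(self, deliver):
        self._deliver = deliver  # callback that receives a list of task payloads
        self._pending = {}       # batch_id -> {task index: payload}

    def submit_single(self, payload):
        """Single job: deliver immediately."""
        self._deliver([payload])

    def submit_batch_task(self, batch_id, index, total, payload):
        """Batch job: store the task; deliver only once all tasks are present.

        Assumes indexes run from 0 to total - 1. Returns True when this call
        completed the batch and triggered delivery.
        """
        tasks = self._pending.setdefault(batch_id, {})
        tasks[index] = payload  # a resent task overwrites the earlier copy
        if len(tasks) < total:
            return False
        del self._pending[batch_id]
        self._deliver([tasks[i] for i in range(total)])
        return True
```

For an upload, each chunk could be one task of a batch, so a worker only sees the job once the whole file is available.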

Deployment

[Deployment diagram: Job Server 1 and Job Server 2 (Synchronized); Business Server; Worker 1, Worker 2, Worker 3.]

Applications

• Used for all asynchronous job processing in Zalo: voice upload, image upload, feed processing…
• Benchmark (single server)
  • 50K images/second (640x480)
  • 50K voices/second (30 s)

• Advantages
  • Batch jobs
  • Jobs are never lost
  • Workers can restart or stop at any time
  • Failover, load balancing, quick recovery after failure

• Issue
  • Job duplication (handled by the worker)
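
Since a job may reach a worker more than once, the worker has to make processing idempotent. A minimal sketch, assuming every job carries a unique ID and that remembering finished IDs in memory is acceptable; the `IdempotentWorker` class is hypothetical, and a real worker would need a persistent or shared store for this record.

```python
class IdempotentWorker:
    """Runs the handler at most once per job ID and replays the stored result."""

    def __init__(self, handler):
        self._handler = handler
        self._results = {}  # job_id -> result (in-memory only, an assumption)

    def process(self, job_id, payload):
        if job_id in self._results:
            # Duplicate delivery: return the earlier result without redoing the work.
            return self._results[job_id]
        result = self._handler(payload)
        self._results[job_id] = result
        return result
```

Returning the stored result instead of an error lets the worker acknowledge the duplicate normally, so the job server can mark the job as finished.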

Q&A