Architecture of Falcon, a new chat messaging backend system built on Scala


Yusuke Yasuda, ChatWork

2017/02/26


2017/02/27 © ChatWork All rights reserved.

Goal of Architecture

• Scalability:
  • linear increase of throughput by adding nodes
  • keep latency low and stable
• High Performance:
  • handle 100 times the current load without further architectural changes
• Resiliency:
  • avoid chain reactions of failures
  • fast recovery from partial failure
• Low cost:
  • keep the cluster as small as possible
  • absorb temporary load spikes without additional resources
  • high performance/resource ratio
• Legacy system integration:
  • keep consistency without transactions






Architecture Overview

• “Write API” exposes an asynchronous API. It persists each event and immediately returns `202 Accepted`. It performs no queries and no mutations. Its storage is Kafka.
• “Read API” can only query the read model; it performs no mutations. Both query by key and query by key range are supported. Its storage is HBase.
• ReadModelUpdater is a Kafka consumer that builds the read model queried by Read API from the events generated by Write API.
• PostProcessorForwarder is a Kafka consumer that notifies the legacy PHP system to execute the remaining transactions, e.g. push notifications.
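The Write API contract can be sketched in Scala. This is a minimal in-memory sketch, not Falcon's code: `EventLog` is a hypothetical stand-in for the Kafka topic, and `MessagePosted`, `Response` and `postMessage` are illustrative names. It only shows the shape of the contract: the handler appends an event and answers `202 Accepted`, leaving all further work to the consumers.

```scala
import scala.collection.mutable.ArrayBuffer

object WriteSide {
  // Hypothetical immutable event describing a user action.
  final case class MessagePosted(roomId: Long, messageId: Long, body: String)

  // Minimal HTTP-like response; 202 means "accepted, processed later".
  final case class Response(status: Int)

  // Hypothetical stand-in for the command-side storage (Kafka in Falcon).
  // Events are only appended; nothing is updated or deleted.
  final class EventLog {
    private val events = ArrayBuffer.empty[MessagePosted]

    // Append an event and return its offset in the log.
    def append(e: MessagePosted): Long = synchronized {
      events += e
      (events.size - 1).toLong
    }

    // Read every event from the given offset onwards (what a consumer does).
    def readFrom(offset: Long): Vector[MessagePosted] = synchronized {
      events.drop(offset.toInt).toVector
    }
  }

  // The handler persists the event and returns immediately.
  // It never queries or mutates a read model.
  def postMessage(log: EventLog, roomId: Long, messageId: Long, body: String): Response = {
    log.append(MessagePosted(roomId, messageId, body))
    Response(202)
  }
}
```

Because the request path does nothing but append, its latency does not depend on HBase or on the PHP system.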



CQRS: Command Query Responsibility Segregation

• Command and query responsibilities are segregated at the system level.
• Specialized responsibilities keep each system simple.
• Each system uses a different model:
  • “Write API” uses immutable events to represent the history of user actions.
  • “Read API” uses read models optimized for queries.
• The command system and the query system each use dedicated storage.
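The split between the two models can be illustrated with a small sketch. It assumes, hypothetically, a message event keyed by room ID and message ID. `MessageView`, `project` and the `(roomId, messageId)` composite key are illustrative names, and an immutable `TreeMap` stands in for the HBase table: both are sorted by key, so a key-range scan returns one room's messages in order.

```scala
import scala.collection.immutable.TreeMap

object ReadSide {
  // Command-side model: an immutable fact about what a user did.
  final case class MessagePosted(roomId: Long, messageId: Long, body: String)

  // Query-side model: a row shaped for lookups.
  final case class MessageView(roomId: Long, messageId: Long, body: String)

  // Hypothetical composite row key; sorting by (roomId, messageId) keeps
  // all messages of one room contiguous, like an HBase row key.
  type Key = (Long, Long)
  type ReadModel = TreeMap[Key, MessageView]

  val empty: ReadModel = TreeMap.empty[Key, MessageView]

  // ReadModelUpdater role: project one event into the read model.
  def project(model: ReadModel, e: MessagePosted): ReadModel =
    model.updated((e.roomId, e.messageId), MessageView(e.roomId, e.messageId, e.body))

  // Read API role: query by key.
  def get(model: ReadModel, roomId: Long, messageId: Long): Option[MessageView] =
    model.get((roomId, messageId))

  // Read API role: query by key range, messages in [fromId, untilId) of one room.
  def range(model: ReadModel, roomId: Long, fromId: Long, untilId: Long): List[MessageView] =
    model.range((roomId, fromId), (roomId, untilId)).values.toList
}
```

Folding the event stream with `project` is, in miniature, what ReadModelUpdater does; the Read API never sees the events themselves.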


Convergent Evolution of Technology

“Convergent evolution is the independent evolution of similar features in species of different lineages.”

https://en.wikipedia.org/wiki/Convergent_evolution

• DDD: fighting against the complexity of domain models with Event Sourcing
• Big Data: fighting against the complexity of big data with log processing

https://www.infoq.com/news/2016/05/event-sourcing-stream-processing

The two communities invented similar techniques independently. Falcon is influenced by the knowledge of both communities.




Inter-system Synchronization

• Falcon subsystems and the PHP system are so-called “microservices”.
• Microservices do not share persistent storage.
• Event Sourcing is used to synchronize the systems, with the following properties:
  • No events are lost (within the retention period).
  • The order of message events is preserved within a chat room.
  • Events are processed in an at-least-once manner.
  • Processing the same event twice has no additional effect (idempotent).
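At-least-once delivery plus idempotent processing can be sketched as follows. The sketch assumes, hypothetically, that each event carries a per-room sequence number; since order is preserved within a chat room, a consumer can skip any event whose sequence number it has already applied. `Event`, `State` and `handle` are illustrative names, and counting messages per room stands in for real read-model work.

```scala
object IdempotentConsumer {
  // Hypothetical event with a per-room, monotonically increasing sequence number.
  final case class Event(roomId: Long, seq: Long, body: String)

  // Last applied sequence number per room, plus a toy derived value.
  final case class State(lastSeq: Map[Long, Long], messageCount: Map[Long, Int])

  val initial: State = State(Map.empty, Map.empty)

  // Apply an event at most once per (roomId, seq). A redelivered event
  // (seq not greater than the last applied one) leaves the state unchanged.
  def handle(s: State, e: Event): State = {
    val last = s.lastSeq.getOrElse(e.roomId, 0L)
    if (e.seq <= last) s
    else State(
      s.lastSeq.updated(e.roomId, e.seq),
      s.messageCount.updated(e.roomId, s.messageCount.getOrElse(e.roomId, 0) + 1)
    )
  }
}
```

Redeliveries after a consumer crash or rebalance then cost a lookup, not a double count.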




Kafka features helpful for Event Sourcing

• Auto-sharding:
  • Events are partitioned so that they can be processed in parallel.
• Strong consistency:
  • Each partition is processed by a single consumer.
  • Consumers can keep internal state.
• Resilience:
  • A partition assigned to a crashed consumer is automatically rebalanced to another consumer.
• Easy to connect services:
  • Events can be forwarded to the next topic.

[Figure: events forwarded between topics 1, 2, 3 and 4]
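The following sketch shows how key-based partitioning keeps per-room ordering and how reassigning partitions models a rebalance. It is a simplification: Kafka's default partitioner hashes the serialized record key with murmur2, whereas `partitionFor` here uses `java.lang.Long.hashCode` for brevity. `assign` is a plain round-robin rather than one of Kafka's assignment strategies. All names are hypothetical.

```scala
object Partitioning {
  final case class Event(roomId: Long, seq: Long)

  // Stand-in for key-based partitioning: the same room always maps to the
  // same partition. (Kafka itself uses murmur2 on the key bytes.)
  def partitionFor(roomId: Long, numPartitions: Int): Int =
    Math.floorMod(java.lang.Long.hashCode(roomId), numPartitions)

  // Split a stream into partitions, keeping arrival order inside each one.
  def partition(events: Seq[Event], numPartitions: Int): Map[Int, Vector[Event]] =
    events.foldLeft(Map.empty[Int, Vector[Event]]) { (acc, e) =>
      val p = partitionFor(e.roomId, numPartitions)
      acc.updated(p, acc.getOrElse(p, Vector.empty[Event]) :+ e)
    }

  // Round-robin assignment of partitions to live consumers. Calling it again
  // after a consumer disappears models an automatic rebalance.
  def assign(numPartitions: Int, consumers: Vector[String]): Map[Int, String] =
    (0 until numPartitions).map(p => p -> consumers(p % consumers.size)).toMap
}
```

Because one room's events land in one partition, and one partition has one consumer, per-room order survives parallelism; after a rebalance the new owner continues from the same ordered partition.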


Defending Subsystems from Unpredictable Load

• Subsystems may temporarily perform poorly because of:
  • load spikes
  • HBase compaction
  • failures of the legacy PHP system caused by process saturation
  • AWS component failures
• Using Kafka as command-side storage helps defend the subsystems:
  • Kafka can easily absorb events produced at up to 40 times the normal load without scaling out.
  • Kafka consumers consume events at a stable throughput, so each subsystem deals with a predictable load.
  • Throttling Kafka consumption enforces an upper limit on throughput.



1. SQL query latency increased on the Amazon Aurora database of the PHP system.

2. PostProcessorForwarder timed out when calling the PHP system.

3. Events stacked up in the Kafka queue.

4. Event-processing throughput decreased. Once the subsystem recovered from the failure, throughput rose to consume the stacked events, but it never exceeded the upper limit thanks to throttling.
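The incident above can be reproduced with a toy simulation. It assumes, hypothetically, that time advances in discrete ticks and that the consumer processes at most `maxPerTick` events per tick. While the downstream system is failing nothing is processed and events wait in the log; after recovery the backlog drains at the capped rate instead of in a burst. `Tick` and `simulate` are illustrative names.

```scala
object ThrottledConsumer {
  // One simulation step: how many events arrived, and whether the
  // downstream subsystem (e.g. the PHP system) is healthy.
  final case class Tick(arrivals: Int, downstreamUp: Boolean)

  // Returns the number of events processed in each tick. Unprocessed events
  // stay in the log (the backlog) instead of being pushed onto the
  // struggling subsystem, and no tick ever exceeds maxPerTick.
  def simulate(ticks: Seq[Tick], maxPerTick: Int): Vector[Int] = {
    var backlog = 0
    ticks.toVector.map { t =>
      backlog += t.arrivals
      val processed = if (t.downstreamUp) math.min(backlog, maxPerTick) else 0
      backlog -= processed
      processed
    }
  }
}
```

For example, with `maxPerTick = 20`, a burst of 50 events followed by two failed ticks is drained over several ticks of at most 20 events each, mirroring steps 3 and 4.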



ACID semantics of Falcon

• Atomicity: No atomicity between posting a message and its associated operations, e.g. unread count calculation. Intermediate states can be observed.
• Consistency: Eventual consistency. See “Consistency Model”.
• Isolation: The same record is never mutated concurrently, because events are processed sequentially. No isolation is needed.
• Durability: Yes. No messages are lost.
• Visibility: No guarantee. For a short period, a posted message may not be observable. We try to keep this period as short as possible.

ACID does not provide high availability and scalability. Falcon does not have ACID properties.

http://people.eecs.berkeley.edu/~brewer/cs262b/TACC.pdf




Consistency Model

Choose C or A for each subsystem based on the CAP theorem.





• Availability for the user-facing subsystems, Write API and Read API:
  • Ensure they are always writable and readable. Losing availability means the service is down.
• Consistency for the background subsystems, ReadModelUpdater and PostProcessorForwarder:
  • Ensure internal state consistency. Losing availability is not noticeable to users.



Recovery from human errors

• Falcon can recover from data corruption without stopping the service.
• The only subsystem that might damage data if it malfunctions is ReadModelUpdater. “Write API”, “Read API” and “PostProcessorForwarder” cannot mutate data.
• Since input events are preserved in Kafka, output can be recalculated by resetting the offsets of the Kafka consumer.
• Stopping ReadModelUpdater does not affect the availability of the service.
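Recovery by replay can be sketched as a pure fold over the retained events. `Replay`, the per-room message-count read model, and the `buggy`/`fixed` projections are hypothetical. The sketch only shows that, because the input log is immutable and retained, resetting the consumer offset to 0 and replaying with a corrected projection recomputes the output from scratch.

```scala
object Replay {
  final case class Event(roomId: Long, body: String)

  // Toy read model: number of messages per room.
  type Model = Map[Long, Int]

  // Rebuild a read model by replaying the retained log from `fromOffset`
  // on top of `start`. Offset 0 with an empty model means "from scratch".
  def replay(log: Vector[Event], fromOffset: Int, start: Model)(project: (Model, Event) => Model): Model =
    log.drop(fromOffset).foldLeft(start)(project)

  // A malfunctioning projection that corrupts the counts.
  val buggy: (Model, Event) => Model =
    (m, e) => m.updated(e.roomId, 0)

  // The corrected projection.
  val fixed: (Model, Event) => Model =
    (m, e) => m.updated(e.roomId, m.getOrElse(e.roomId, 0) + 1)
}
```

The corrupted output is simply discarded and regenerated; the events themselves were never touched, which is why the other subsystems keep serving while the rebuild runs.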


Date post:	21-Mar-2017
Category:	Engineering
Upload:	tanukkii