APACHE SLING & FRIENDS TECH MEETUP BERLIN, 23-25 SEPTEMBER 2013
Distributed Eventing and Jobs
Carsten Ziegeler | Adobe Research Switzerland
About
§ RnD Team at Adobe Research Switzerland
§ Co-founder Adobe Granite
§ OSGi Core Platform and Enterprise Expert Groups
§ Member of the ASF
§ Current PMC Chair of Apache Sling
§ Apache Sling, Felix, ACE
§ Conference Speaker
§ Technical Reviewer
§ Article/Book Author
§ [email protected]
§ @cziegeler
Overview
[Diagram: From Eventing to Job Processing. Components: OSGi Event Admin (Felix), Discovery (Sling), Job Distribution (Sling)]
OSGi Event Admin Publish Subscribe Model
[Diagram: Component A publishes an event to the OSGi Event Admin, which delivers it to Component X and Component Y]
OSGi Event Admin Publish Subscribe Model
§ OSGi event is a data object with
  § Topic (hierarchical namespace)
  § Properties (key-value pairs)
§ Resource Event
  § Topic: org/apache/sling/api/resource/Resource/ADDED
  § Properties: path, resource type etc.
OSGi Event Admin Publish Subscribe Model
§ Publisher creates event object
  § Sends event through EventAdmin service
  § Either sync or async delivery
§ Subscriber is an OSGi service (EventHandler)
  § Service registration properties
    § Interested topic(s), e.g. org/apache/sling/api/resource/Resource/*
    § Additional filters (optional), e.g. (path=/libs/*)
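The publish/subscribe flow above can be modelled in a few lines of plain Java. This is only a sketch and deliberately not the OSGi Event Admin API: the class `PubSubSketch` and all of its members are hypothetical, the LDAP filter `(path=/libs/*)` is replaced by an ordinary `Predicate`, and the wildcard rule assumes that a trailing `/*` covers all sub-topics but not the prefix itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Plain-Java model of topic-based publish/subscribe; not the OSGi Event Admin API. */
final class PubSubSketch {

    /** An event is a topic plus key-value properties. */
    static final class Event {
        final String topic;
        final Map<String, Object> properties;

        Event(String topic, Map<String, Object> properties) {
            this.topic = topic;
            this.properties = properties;
        }
    }

    /** A subscriber names a topic pattern and a property filter. */
    static final class Subscription {
        final String topicPattern;
        final Predicate<Map<String, Object>> filter;
        final List<Event> received = new ArrayList<>();

        Subscription(String topicPattern, Predicate<Map<String, Object>> filter) {
            this.topicPattern = topicPattern;
            this.filter = filter;
        }
    }

    private final List<Subscription> subscriptions = new ArrayList<>();

    /** True if the pattern is "*", equals the topic, or is a "prefix/*" wildcard covering it. */
    static boolean topicMatches(String pattern, String topic) {
        if (pattern.equals("*") || pattern.equals(topic)) {
            return true;
        }
        if (pattern.endsWith("/*")) {
            // Keep the trailing slash so "a/b/*" matches "a/b/c" but not "a/b" or "a/bc".
            String prefix = pattern.substring(0, pattern.length() - 1);
            return topic.startsWith(prefix);
        }
        return false;
    }

    Subscription subscribe(String topicPattern, Predicate<Map<String, Object>> filter) {
        Subscription subscription = new Subscription(topicPattern, filter);
        subscriptions.add(subscription);
        return subscription;
    }

    /** Synchronous delivery to the subscribers registered right now; nothing is stored. */
    void send(Event event) {
        for (Subscription s : subscriptions) {
            if (topicMatches(s.topicPattern, event.topic) && s.filter.test(event.properties)) {
                s.received.add(event);
            }
        }
    }

    public static void main(String[] args) {
        PubSubSketch bus = new PubSubSketch();
        Subscription libsOnly = bus.subscribe(
            "org/apache/sling/api/resource/Resource/*",
            props -> String.valueOf(props.get("path")).startsWith("/libs/"));
        String added = "org/apache/sling/api/resource/Resource/ADDED";
        bus.send(new Event(added, Map.of("path", "/libs/foo")));
        bus.send(new Event(added, Map.of("path", "/apps/bar")));
        System.out.println(libsOnly.received.size()); // prints 1
    }
}
```

A subscriber that registers after `send` returns never sees the event, which is the "no guarantee of delivery" limitation that motivates job handling.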
OSGi Event Admin Publish Subscribe Model
§ Immediate delivery to available subscribers
§ No guarantee of delivery
§ No distributed delivery
Installation Scenarios
[Diagram: two installation scenarios. A single instance with its own JCR, and Instance 1, Instance 2 and Instance 3 sharing a clustered JCR]
Topologies I Apache Sling Discovery
§ Instance: Unique Id (Sling ID)
[Diagram: the single instance and the three clustered instances each carry their own Sling ID (A, X, 42, 1)]
§ Cluster: Unique Id and leader
[Diagram: the instances are grouped into Cluster 99 and Cluster 35, each with a leader]
§ Topology: Set of clusters
[Diagram: Cluster 99 and Cluster 35, each with its leader, joined into one topology]
Topologies II Apache Sling Discovery
§ Instance
  § Sling ID
  § Optional: Name and description
  § Belongs to a cluster
  § Might be the cluster leader
  § Additional distributed properties
    § Extensible through own services (PropertyProvider)
    § E.g. data center, region or enabled job topics
[Diagram: Instance 3 (ID: 42) in Cluster 99, marked as leader, within the topology]
§ Cluster
  § Elects (stable) leader
  § Stable instance ordering
§ TopologyEventListener
  § Receives events on topology changes
    § Topology is changing
    § Topology changed
    § Properties changed
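One way a component might react to the three kinds of topology event is sketched below. It is a plain-Java model rather than the Sling Discovery API: `ChangeType`, `Listener` and `Worker` are hypothetical stand-ins, and pausing work while the topology is changing is an assumption about how a listener could behave, not something the slides prescribe.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of reacting to topology events; all names are hypothetical stand-ins. */
final class TopologyListenerSketch {

    /** The three kinds of change listed above. */
    enum ChangeType { TOPOLOGY_CHANGING, TOPOLOGY_CHANGED, PROPERTIES_CHANGED }

    interface Listener {
        void onTopologyEvent(ChangeType type, List<String> slingIds);
    }

    /** Pauses topology-dependent work while the view is unstable and resumes once it settles. */
    static final class Worker implements Listener {
        boolean active = true;
        List<String> knownInstances = new ArrayList<>();

        @Override
        public void onTopologyEvent(ChangeType type, List<String> slingIds) {
            switch (type) {
                case TOPOLOGY_CHANGING:
                    active = false; // the old view can no longer be trusted
                    break;
                case TOPOLOGY_CHANGED:
                    knownInstances = new ArrayList<>(slingIds);
                    active = true; // a new, stable view is available
                    break;
                case PROPERTIES_CHANGED:
                    // same instances, only announced properties differ; nothing to pause
                    break;
            }
        }
    }

    public static void main(String[] args) {
        Worker worker = new Worker();
        worker.onTopologyEvent(ChangeType.TOPOLOGY_CHANGING, List.of());
        System.out.println(worker.active);
        worker.onTopologyEvent(ChangeType.TOPOLOGY_CHANGED, List.of("A", "X", "42", "1"));
        System.out.println(worker.active + " " + worker.knownInstances);
    }
}
```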
Job Handling I Apache Sling Job Handling
§ Job: Guaranteed processing, exactly once
  § Exactly one job consumer
§ Started by client code, e.g. for replication, workflow...
  § Job topic
  § Payload is a serializable map
Job Handling I Apache Sling Job Handling
§ Sling Job Manager handles and distributes jobs
  § Delivers job to a job consumer…
  § …and waits for response
  § Retry and failover
  § Notification listeners (fail, retry, success)
Starting / Processing a Job I Apache Sling Job Handling
Starting a job

public interface JobManager {
    Job addJob(String topic, String optionalName, Map<String, Object> properties);
    …
}

Processing a job

public interface JobConsumer {
    String PROPERTY_TOPICS = "job.topics";
    enum JobResult { OK, FAILED, CANCEL, ASYNC }
    JobResult process(Job job);
}

Note: Starting/processing of jobs through Event Admin is deprecated but still supported
Starting / Processing a Job II Apache Sling Job Handling
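import org.apache.felix.scr.annotations.Component;
import org.apache.felix.scr.annotations.Property;
import org.apache.felix.scr.annotations.Service;
import org.apache.sling.event.jobs.Job;
import org.apache.sling.event.jobs.consumer.JobConsumer;

// Registered as an OSGi service; the topic property tells the Job Manager
// which jobs this consumer processes.
@Component
@Service(value={JobConsumer.class})
@Property(name=JobConsumer.PROPERTY_TOPICS, value="org/apache/sling/jobs/backup")
public class BackupJobConsumer implements JobConsumer {

    @Override
    public JobResult process(final Job job) {
        // do backup
        return JobResult.OK;
    }
}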
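The client side of the same exchange, starting a job for the backup topic, might look like the sketch below. To stay runnable without Sling on the classpath it declares local stand-ins shaped like the interfaces shown on the previous slide; `InMemoryJobManager`, its `register` method and the `path` property are hypothetical. The stand-in calls the consumer directly and skips the persistence, queueing and distribution that the real Job Manager performs.

```java
import java.util.HashMap;
import java.util.Map;

/** Local stand-ins shaped like the interfaces on the earlier slide; not the Sling classes. */
final class StartJobSketch {

    interface Job {
        String getTopic();
        Object getProperty(String name);
    }

    interface JobConsumer {
        enum JobResult { OK, FAILED, CANCEL, ASYNC }
        JobResult process(Job job);
    }

    /** Hypothetical in-memory manager: finds the consumer for the topic and calls it at once. */
    static final class InMemoryJobManager {
        private final Map<String, JobConsumer> consumers = new HashMap<>();
        JobConsumer.JobResult lastResult;

        void register(String topic, JobConsumer consumer) {
            consumers.put(topic, consumer);
        }

        Job addJob(final String topic, String optionalName, final Map<String, Object> properties) {
            Job job = new Job() {
                @Override public String getTopic() { return topic; }
                @Override public Object getProperty(String name) { return properties.get(name); }
            };
            JobConsumer consumer = consumers.get(topic);
            if (consumer != null) {
                lastResult = consumer.process(job); // exactly one consumer per job
            }
            return job;
        }
    }

    public static void main(String[] args) {
        InMemoryJobManager jobManager = new InMemoryJobManager();
        jobManager.register("org/apache/sling/jobs/backup", job -> {
            System.out.println("backing up " + job.getProperty("path"));
            return JobConsumer.JobResult.OK;
        });
        // Client code: topic, no name, and a map as payload.
        jobManager.addJob("org/apache/sling/jobs/backup", null, Map.of("path", "/content"));
        System.out.println(jobManager.lastResult); // prints OK
    }
}
```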
Job Handling I Apache Sling Job Handling
• New jobs are immediately persisted
• Jobs are “pushed” to the processing instance
• Processing instances use different queues
  • Associated with job topic(s)
  • Main queue
  • 0..n custom queues
Job Queue I Apache Sling Job Handling
• Queue is configurable
• Queue is started on demand in own thread
• And stopped if unused for some time
Job Queue II Apache Sling Job Handling
• Queue Types
  • Ordered queue
  • Parallel queues: Plain and Topic Round Robin
Job Queue III Apache Sling Job Handling
• Limit for parallel threads per queue
• Number of retries (-1 = endless)
• Retry delay
• Thread priority
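How the retry count and retry delay could interact is sketched below as a plain-Java loop. The class `RetrySketch` and its types are hypothetical and this is not the Sling implementation. The sketch assumes that the configured number counts retries after the first attempt (the slides do not say whether the first attempt is included) and treats an interrupted wait as giving up.

```java
import java.util.function.Supplier;

/** Plain-Java model of a queue's retry settings; not the Sling implementation. */
final class RetrySketch {

    enum Result { OK, FAILED, CANCEL }

    enum Outcome { SUCCEEDED, CANCELLED, GAVE_UP }

    static final class Report {
        final Outcome outcome;
        final int attempts;

        Report(Outcome outcome, int attempts) {
            this.outcome = outcome;
            this.attempts = attempts;
        }
    }

    /**
     * Calls the consumer until it returns OK or CANCEL, or until the retries are used up.
     * maxRetries counts retries after the first attempt; -1 means retry without limit.
     */
    static Report run(Supplier<Result> consumer, int maxRetries, long retryDelayMillis) {
        int attempts = 0;
        while (true) {
            attempts++;
            Result result = consumer.get();
            if (result == Result.OK) {
                return new Report(Outcome.SUCCEEDED, attempts);
            }
            if (result == Result.CANCEL) {
                return new Report(Outcome.CANCELLED, attempts); // cancelled jobs are never retried
            }
            boolean endless = maxRetries < 0;
            if (!endless && attempts > maxRetries) {
                return new Report(Outcome.GAVE_UP, attempts);
            }
            try {
                Thread.sleep(retryDelayMillis); // wait before the next attempt
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return new Report(Outcome.GAVE_UP, attempts);
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails twice, then succeeds on the third attempt.
        Report report = run(() -> ++calls[0] < 3 ? Result.FAILED : Result.OK, 10, 5);
        System.out.println(report.outcome + " after " + report.attempts + " attempts");
    }
}
```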
Additional Configurations Apache Sling Job Handling
• Job Manager Configuration = Main Queue Configuration
  • Maximum parallel jobs (15)
  • Retries (10)
  • Retry Delay
• Eventing Thread Pool Configuration
  • Used by all queues
  • Pool size (35) = Maximum parallel jobs for a single instance
Monitoring – Web Console Apache Sling Job Handling
Job Distribution I Apache Sling Job Distribution
• Each instance determines enabled job topics
  • Derived from Job Consumers (new API required)
  • Can be whitelisted/blacklisted (in Job Consumer Manager)
  • Announced through Topology
Job Distribution II Apache Sling Job Distribution
• Job Distribution depends on enabled job topics and queue type
  • Potential set of instances derived from topology (enabled job topics)
  • Ordered: processing on leader only, one job after the other
  • Parallel: Round robin distribution on all potential instances
    § Local cluster instances have preference
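The selection rules above can be written down as a small plain-Java model. Everything in `DistributionSketch` is hypothetical and simplified. Two points are assumptions rather than facts from the slides: an ordered job goes to the first capable instance flagged as leader, and "preference" for the local cluster is modelled as strict, so remote instances are used only when no local instance has the topic enabled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Plain-Java model of choosing a target instance for a job; not the Sling implementation. */
final class DistributionSketch {

    enum QueueType { ORDERED, PARALLEL }

    static final class Instance {
        final String slingId;
        final String clusterId;
        final boolean leader;
        final Set<String> enabledTopics;

        Instance(String slingId, String clusterId, boolean leader, Set<String> enabledTopics) {
            this.slingId = slingId;
            this.clusterId = clusterId;
            this.leader = leader;
            this.enabledTopics = enabledTopics;
        }
    }

    private int next = 0; // round-robin position shared by all parallel jobs in this model

    /** Returns the instance that should process the job, or null if no instance qualifies. */
    Instance select(List<Instance> topology, String topic, QueueType type, String localClusterId) {
        // Potential set: instances that announced the topic as enabled.
        List<Instance> potential = new ArrayList<>();
        for (Instance i : topology) {
            if (i.enabledTopics.contains(topic)) {
                potential.add(i);
            }
        }
        if (type == QueueType.ORDERED) {
            for (Instance i : potential) {
                if (i.leader) {
                    return i; // ordered jobs run on a leader only
                }
            }
            return null;
        }
        // Parallel: prefer instances of the local cluster, else use all potential ones.
        List<Instance> local = new ArrayList<>();
        for (Instance i : potential) {
            if (i.clusterId.equals(localClusterId)) {
                local.add(i);
            }
        }
        List<Instance> pool = local.isEmpty() ? potential : local;
        if (pool.isEmpty()) {
            return null;
        }
        return pool.get(Math.floorMod(next++, pool.size()));
    }

    public static void main(String[] args) {
        List<Instance> topology = List.of(
            new Instance("1", "99", true, Set.of("A")),
            new Instance("2", "99", false, Set.of("A", "B")),
            new Instance("3", "35", true, Set.of("A")));
        DistributionSketch distribution = new DistributionSketch();
        for (int n = 0; n < 3; n++) {
            System.out.println(distribution.select(topology, "A", QueueType.PARALLEL, "99").slingId);
        }
    }
}
```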
Job Distribution III Apache Sling Job Distribution
• Failover
  • Instance crash: leader redistributes jobs to available instances
    § Leader change taken into account
  • On enabled job topics changes: potential redistribution
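Redistribution after a crash can be pictured as reassigning every job of the lost instance to a surviving instance that has the job's topic enabled. The sketch below is a plain-Java model with hypothetical names (`FailoverSketch`, `redistribute`), not the Sling implementation. It assumes round-robin reassignment over the capable instances in sorted Sling ID order, and it leaves a job unassigned when no capable instance remains.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Plain-Java model of failover after an instance crash; not the Sling implementation. */
final class FailoverSketch {

    static final class Job {
        final String id;
        final String topic;
        String assignedTo; // Sling ID of the processing instance, or null if unassigned

        Job(String id, String topic, String assignedTo) {
            this.id = id;
            this.topic = topic;
            this.assignedTo = assignedTo;
        }
    }

    /** Moves the crashed instance's jobs to capable survivors; returns how many were moved. */
    static int redistribute(List<Job> jobs, String crashedId, Map<String, Set<String>> enabledTopics) {
        int moved = 0;
        int next = 0;
        for (Job job : jobs) {
            if (!crashedId.equals(job.assignedTo)) {
                continue; // jobs on healthy instances stay where they are
            }
            List<String> capable = new ArrayList<>();
            for (Map.Entry<String, Set<String>> entry : enabledTopics.entrySet()) {
                if (!entry.getKey().equals(crashedId) && entry.getValue().contains(job.topic)) {
                    capable.add(entry.getKey());
                }
            }
            Collections.sort(capable); // stable ordering, independent of map iteration order
            if (capable.isEmpty()) {
                job.assignedTo = null; // waits until a capable instance appears
            } else {
                job.assignedTo = capable.get(next % capable.size());
                next++;
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        List<Job> jobs = List.of(new Job("j1", "A", "2"), new Job("j2", "A", "2"), new Job("j3", "B", "3"));
        Map<String, Set<String>> topics = Map.of("1", Set.of("A"), "2", Set.of("A"), "3", Set.of("B"), "4", Set.of("A"));
        System.out.println(redistribute(jobs, "2", topics) + " jobs moved");
        for (Job job : jobs) {
            System.out.println(job.id + " -> " + job.assignedTo);
        }
    }
}
```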
Sling Job Distribution
[Diagram sequence: a topology of four instances (Sling ID 1 to 4), each with a Job Manager, and job consumers for topics A, B and C. A job for topic A is labelled A:2, a job for topic B is labelled B:3, and a job for topic C is labelled C:4 and marked with a question mark]
Discovery and Eventing
Discovery and Eventing – What’s Next?
New OSGi Specification
• Distributed Eventing
• Cloud Computing
Job Distribution
• Improved load balancing
• Pull based distribution
One More Thing…
Job Progress Tracking Apache Sling Job Processing
§ Jobs can inform about
  § Progress (percentage)
  § ETA
  § Additional informational messages
§ All information is persisted
NEW
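A percentage and an ETA of this kind can be derived from a step counter. The sketch below is a plain-Java illustration with hypothetical names (`ProgressSketch`, `stepDone`, `eta`), not the Sling API. It assumes that all steps take roughly the same time, so the remaining time is extrapolated linearly from the time spent so far.

```java
/** Plain-Java model of progress and ETA reporting; not the Sling API. */
final class ProgressSketch {

    private final long startMillis;
    private final int totalSteps;
    private int finishedSteps;

    ProgressSketch(long startMillis, int totalSteps) {
        if (totalSteps <= 0) {
            throw new IllegalArgumentException("totalSteps must be positive");
        }
        this.startMillis = startMillis;
        this.totalSteps = totalSteps;
    }

    /** Records one finished step; extra calls beyond the total are ignored. */
    void stepDone() {
        if (finishedSteps < totalSteps) {
            finishedSteps++;
        }
    }

    /** Progress as a whole percentage between 0 and 100. */
    int percent() {
        return (int) (100L * finishedSteps / totalSteps);
    }

    /** Estimated finish time in epoch milliseconds, or -1 while no step has finished. */
    long eta(long nowMillis) {
        if (finishedSteps == 0) {
            return -1;
        }
        long elapsed = nowMillis - startMillis;
        long remaining = elapsed * (totalSteps - finishedSteps) / finishedSteps;
        return nowMillis + remaining;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        ProgressSketch progress = new ProgressSketch(start, 4);
        progress.stepDone();
        System.out.println(progress.percent() + "% done");
        System.out.println("ETA in ms from now: " + (progress.eta(start + 1000) - (start + 1000)));
    }
}
```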
Improved Failure Handling I Apache Sling Job Processing
§ Currently, no history of jobs
  § Immediately removed once
    § Job succeeds
    § Job is cancelled
§ What happened?
§ What went wrong?
Improved Failure Handling II Apache Sling Job Processing
§ Cancelled jobs are kept
  § With a reason and log
  § Can be retried
§ Successful jobs can be kept
  § With a message and log
NEW