Scaling Massive Content Stores in the Cloud CloudExpo New York June 2016 @johnnewton Alfresco Founder & CTO
Page 1: Scaling in the Cloud - SYS-CON Media, res.cdn.sys-con.com/session/3182/John_Newton_v2.pdf, 2016-06-17

Scaling Massive Content Stores in the Cloud

CloudExpo New York, June 2016

@johnnewton – Alfresco Founder & CTO

Page 2:
Page 3:

Alfresco Customers

Government, Financial Services, Healthcare, Manufacturing, Corporate

Page 4:

Somewhere in a secret underground location

Page 5:

someone is trying to store…

One Billion Documents!!!

http://www.warnerbros.com/austin-powers-international-man-mystery

Page 6:

Some have attempted before … and failed

Page 7:

Content Use Cases at Scale

• Enterprise Document Library
• Loans & Policies
• Claims & Case Processing
• Transaction & Logistics Records
• Research & Analysis
• Real-time Video
• Internet of Things
• Medical & Personnel Records
• Government Records & Archives
• Discovery & Litigation

Page 8:

Content Management Applications

• Document Library
• Image Management
• File Sync & Share
• Search & Retrieval
• Business Process Management
• Records Management
• Case Management
• Media Management
• Information Archiving

Page 9:

Content vs. Data vs. Files vs. EFSS

Data, Files, EFSS, Content and ECM

Page 10:

Content Architecture as a Big Data Problem

[Diagram: Content Object and related elements]

• Files / Renditions
• Metadata
• Directory, Categories, Relationships
• Indexes, Search
• Activities
• Security, People
• APIs
• Processes / Tasks
• Rules
• Semantics
• Types
• Access, Context
• Create, Manage, Distribute, Use

Database, Distributed FS, Database, Solr / ElasticSearch

Page 11:

Content at Scale in the Enterprise

• Users at Scale
• Concurrency
• Content Count
• Read/Write Throughput
• Geographic Distribution
• Volume Size

Page 12:

The Problem with Traditional Approaches

• Provisioning and Administration
• Geographic Distribution
• Lack of Agility
• Lack of Redundancy
• Lack of Elasticity

Page 13:

Content Management Architecture

• Alfresco Share
• Alfresco Repository
• Alfresco SOLR
• Activiti Workflow Engine
• Database
• FS Content Store
• Indexes

AWS services shown: S3, RDS, EBS or Ephemeral, PIOPS EBS, (or Glacier), EC2

Page 14:

Scaling in Tiers

• Alfresco Transformation Server x 2
• Alfresco Solr with Alfresco Local Repo (Index Tracking) x 2
• Alfresco Repository x 2
• Alfresco Share x 2
• Alfresco Activiti Suite x 2

Page 15:

Data Meta-Model

[Diagram with three panels: Model, Metadata, Organization]

• Nodes: A, B, C, D; Folder, Folder, Doc, Doc, rendition
• Class, Type, Aspect, Property, Association, Constraint, Child Association
• Folder, Document, contains, name, name, content, Auditable (who by, when), rendition
• Labels: Type, Child Association, Type, Association, Property, Property, Property, Aspect

1 Billion, 15 Billion
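
The meta-model terms on this slide (Type, Aspect, Property, Child Association) can be illustrated with a minimal Python sketch. Every class, field, and value below is hypothetical and chosen only for illustration; it is not Alfresco's actual API or schema.

```python
from dataclasses import dataclass, field


@dataclass
class Aspect:
    """A reusable bundle of properties that can be attached to any node."""
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Node:
    """A content object: one type, its own properties, aspects, and children."""
    type_name: str                                  # e.g. "Folder" or "Document"
    properties: dict = field(default_factory=dict)
    aspects: list = field(default_factory=list)
    children: list = field(default_factory=list)    # child associations

    def add_child(self, child: "Node") -> "Node":
        """Record a 'contains' child association and return the child."""
        self.children.append(child)
        return child

    def all_properties(self) -> dict:
        """Merge the node's own properties with those of its aspects."""
        merged = dict(self.properties)
        for aspect in self.aspects:
            merged.update(aspect.properties)
        return merged


def count_nodes(node: Node) -> int:
    """Count a node and everything reachable through child associations."""
    return 1 + sum(count_nodes(child) for child in node.children)


# Hypothetical example: a Folder contains a Document, and the Document
# carries an Auditable aspect. All values are placeholders.
folder = Node("Folder", {"name": "A"})
doc = folder.add_child(
    Node("Document", {"name": "C", "content": "placeholder-content-ref"},
         aspects=[Aspect("Auditable", {"who": "jdoe", "when": "2016-06-01"})])
)
```

Keeping aspects separate from types is what lets a cross-cutting concern such as auditing be attached to folders and documents alike without changing either type.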

Page 16:

Next Generation Relational Architectures

• Highly-available: synchronous vs. asynchronous replication
• Significantly more efficient use of network I/O
• Self-healing, fault-tolerant, instant crash recovery

[Diagram: MySQL with standby] AZ 1, AZ 2; Primary Instance, Standby Instance; EBS, EBS mirror, EBS mirror; Amazon Elastic Block Store (EBS); Amazon S3

[Diagram: Next Generation DBMS] AZ 1, AZ 2, AZ 3; Primary Instance, Replica Instance; Amazon S3

Diagram labels: async, 4/6 quorum, PiTR, sequential write (x 2), distributed writes
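
The "4/6 quorum" label can be read as: a write counts as durable once four of six storage copies have acknowledged it. The following is a minimal sketch under that reading. The function names are hypothetical and the acknowledgements are simulated booleans rather than real network calls, so this shows only the counting rule, not any database's implementation.

```python
def quorum_reached(acks, write_quorum=4):
    """Return True when enough storage copies acknowledged the write.

    acks: iterable of booleans, one per storage copy (True = acknowledged).
    write_quorum: acknowledgements required (4 of 6 on the slide).
    """
    return sum(1 for ack in acks if ack) >= write_quorum


def tolerated_failures(copies=6, write_quorum=4):
    """Number of copies that can be unavailable while writes still succeed."""
    return copies - write_quorum


# Two of six copies are down: the write still reaches quorum.
print(quorum_reached([True, True, True, True, False, False]))   # True
# Three of six copies are down: the write cannot reach quorum.
print(quorum_reached([True, True, True, False, False, False]))  # False
print(tolerated_failures())                                      # 2
```

Because the writer only waits for the fastest four acknowledgements, one slow or failed copy does not stall the write path.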

Page 17:

Index and Search Architecture

• Full-Text Query
• Metadata Query
• Facets & Buckets
• Security Filters
• Results Processing

• Text Extraction
• Metadata Injection & Path Processing
• Shingles
• ACL Processing
• Results Process
• Term-hit Highlighting

Credit: Ryan Tobora, ThinkBig, Teradata, http://thinkbig.teradata.com/solrcloud-terminology/

x 20 instances
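
"Shingles" in a search index are usually overlapping word n-grams that are indexed alongside single terms to speed up phrase matching. The sketch below generates word shingles in plain Python. The function name is hypothetical, and this illustrates the idea only; it is not the search engine's own filter.

```python
def shingles(text, size=2):
    """Return overlapping word n-grams ("shingles") of the given size."""
    if size < 1:
        raise ValueError("size must be at least 1")
    tokens = text.lower().split()
    # A text with fewer tokens than `size` yields an empty list.
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


print(shingles("Scaling Massive Content Stores"))
# ['scaling massive', 'massive content', 'content stores']
```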

Page 18:

File Storage Architecture

• In Place, AWS Import/Export, Direct Streaming
• File System Protocols, APIs
• Storage Layer: Aurora, EBS (Metadata, Content)
• Archive Layer: S3, Amazon Glacier (Metadata, Content)

Page 19:

BM4 Test Execution Environment: 1.2B Docs

• UI Test x 20 m3.2xlarge: simulate 500 users
  • Selenium / Firefox
  • 1 hour constant load
  • 10 sec think time
• ELB
• Alfresco x 10 c3.2xlarge: Alfresco with Share and Repo
• Solr x 20 m3.2xlarge: Sharded Solr Cloud
• Aurora x 1 db.r3.xlarge
• Simulate AWS Import/Export (in place)

sites     folders      files            transactions   dbSize GB
10,804    1,168,206    1,168,206,000    15,475,064     3,185
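
The figures on this slide can be sanity-checked with a little arithmetic. The sketch below derives files per folder, folders per site, and database bytes per file from the table, and estimates the request rate generated by 500 users with a 10 second think time using Little's law. The 1 second response time is an assumed illustrative value, and the helper name is hypothetical.

```python
# Figures copied from the slide's table.
sites = 10_804
folders = 1_168_206
files = 1_168_206_000
db_size_gb = 3_185

files_per_folder = files / folders            # exactly 1000 in this data set
folders_per_site = folders / sites            # roughly 108
db_bytes_per_file = db_size_gb * 1e9 / files  # roughly 2.7 KB, taking 1 GB = 10^9 bytes


def closed_loop_rate(users, think_time_s, response_time_s):
    """Requests per second from a fixed user population (Little's law)."""
    return users / (think_time_s + response_time_s)


# 500 simulated users with 10 s think time; the 1 s response time is an
# assumed illustrative value, not a measured one.
rate = closed_loop_rate(users=500, think_time_s=10, response_time_s=1)

print(f"{files_per_folder:.0f} files per folder")
print(f"{folders_per_site:.1f} folders per site")
print(f"{db_bytes_per_file:.0f} database bytes per file")
print(f"{rate:.1f} requests per second")
```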

Page 20:

Benchmark Results

• Document load rate: 1000 documents per second (with 10 nodes)
  • 3 Million per Hour!
• Load rate was consistent even after passing the 1 billionth document
• Sub-second login times and good responses for other actions
  • Open Library: 4.5s
  • Page Results: 1s
  • Navigate to Site: 2.3s
• Aurora indexes used efficiently at 3.2TB
• No indications of any size-related bottlenecks with 1.1 Billion Documents
• CPU loads:
  • Database: 8-10%
  • Alfresco (each of 10 nodes): 25-30%
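
As a rough check on the load figures, the sketch below converts the quoted rate into documents per hour and estimates how long a full load would take. It assumes the 1000 documents per second rate is sustained for the whole load and takes the total file count from the test environment table; a sustained 1000 per second works out to 3.6 million per hour, which the slide quotes as "3 Million per Hour".

```python
from datetime import timedelta

DOCS_PER_SECOND = 1000          # load rate quoted on the slide (10 nodes)
TOTAL_DOCS = 1_168_206_000      # file count from the test environment table
NODES = 10

docs_per_hour = DOCS_PER_SECOND * 3600
docs_per_node_per_second = DOCS_PER_SECOND / NODES
# Assumes the rate stays constant for the entire load.
load_time = timedelta(seconds=TOTAL_DOCS / DOCS_PER_SECOND)

print(docs_per_hour)                 # 3600000
print(docs_per_node_per_second)      # 100.0
print(load_time.days)                # 13
```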

Page 21:

What a Difference

Traditional: Load Balancer; ECM x 3; Search x 3; FS x 3; HSM x 3; Hardware x 3; DR Plan
• 3-6 Months
• Questionable Scale
• Little Redundancy
• Lots of $$$

Cloud: ELB; Alfresco x 3; Solr x 3; EBS x 3; S3; EC2 x 3; AZ1, AZ2, AZ3
• < 30 mins
• 10x Faster
• Fault-Tolerant
• Open, Cost Effective

Page 22:

Well, what am I supposed to do with all this frickin’ hardware?!!

Page 23:

Thank You

[email protected]

@johnnewton