Date post: 20-May-2020
@jtahoyle #IPEXPO mirrorweb.com Confidential and Proprietary. Copyright © by MirrorWeb Limited. All Rights Reserved. The Power of Serverless: How we bet big on the cloud with transformational results Jamie Hoyle, MirrorWeb
Transcript
Page 1:

The Power of Serverless: How we bet big on the cloud with transformational results

Jamie Hoyle, MirrorWeb

Page 2:

Who am I?

• VP, Product at MirrorWeb

• Responsible for facilitating product direction and developing future product strategy

• Digital transformation advocate - which is why I’m here!

• Lifelong Bury FC fan hoping I’ll have a team to support on Saturday

Page 3:

What do we do?

• Digital archiving and compliance specialist

• Web archiving, social media archiving, customer journey, data analytics

• Wide collection of clients - public sector, brand preservation, FS&I regulatory

Page 4:

Client base

Large FS&I Firms

Page 5:

The engineering challenge

Capture, processing, and preservation of Actual Big Data

FS&I regulatory compliance requirements

Page 6:

The engineering challenge

• We deal with huge amounts of data

• There are lots of ways that we have to collect data, and we have to store all that data in its original format until the end of time*

• We have to use lots of different methods to ingest data, which means we have to maintain a lot of different applications

• We have to process all that information - analytics, screening, full-text search

• Our product has to have a robust pipeline to ensure these different content types are correctly processed and made accessible

• Our data ingest and workload requirements vary on a day-by-day basis

• We need flexible capacity and have to be able to scale to any size very quickly

Page 7:

The engineering challenge

• We offer regulatory compliance services

• We can never, ever, ever lose data, or fail to collect any data

• We have to be always-on and always-reliable - HA clusters, no maintenance windows

• We have to be ready to provide any data in court-admissible formats in any jurisdiction at any time

• All our data has to be available all the time - no cold stores or tapes

• Our capture surface for FS&I changes frequently depending on market trends. Instagram, Snapchat, WhatsApp…

• We have to iterate fast and ship constantly to keep up with demand

Page 8:

“We have to be always on, all the time, our data has to be constantly warm, we need to do instant analysis on that data, and we need to ship to production on a daily basis.”

Page 11:

The solution?

Page 12:

Serverless.

Page 13:

Serverless.

What does it do? Does it do things? Let’s find out!

Page 14:

What is serverless?

• Your cloud provider, rather than you, handles the allocation and management of servers.

• Yes, there are still servers somewhere… you just don’t have to deal with them!

• Pay-per-request, not a fixed monthly cost

• Often manifested as a microservice architecture

Page 15:

Why did we turn to serverless?

• Easy to maintain

• Deployments are a dream

• Built-in fault tolerance by design

• For all of our use cases, cheaper than traditional infrastructure… and almost certainly cheaper for all of your use cases too

• Vast scale

Page 16:

Isn’t it just code?

• It can be…

• …but you won’t get very far.

• Deploying serverless as code doesn’t matter if your only database instance goes down for 12 hours.

• Service resiliency is only as good as your single point of failure.

Page 17:

Understanding the paradigm

• Serverless isn’t a magic bullet. You can still write bad serverless functions that will still fail.

• Don’t try to reuse monolithic codebases. Design your codebase for the architecture that you’re using.

• You can just run your Python WSGI or Node Express.js app as a single serverless function… but why would you?

• Remember: serverless platforms are designed for running lots of small things.

Page 18:

Understanding the paradigm

• Serverless puts us back in a land of limited compute resources

• The more efficient your code, the fewer resources you use…

• …and you’re paying for resource usage now, not having a server sit there 24/7!

• Saving a few hundred milliseconds per execution adds up to big savings. Micro-optimisations matter again!

• Profiling tools help - New Relic, Datadog, etc.
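
To put rough numbers on the point above, here is a minimal sketch that estimates compute cost as GB-seconds consumed multiplied by a unit price. The price, invocation count, durations, and memory size are all placeholder assumptions for illustration, not quoted rates or MirrorWeb figures; substitute your own provider's pricing.

```python
def monthly_compute_cost(invocations, avg_duration_ms, memory_mb, price_per_gb_second):
    """Estimate compute cost as GB-seconds consumed times a unit price."""
    gb_seconds = invocations * (avg_duration_ms / 1000.0) * (memory_mb / 1024.0)
    return gb_seconds * price_per_gb_second


# Placeholder unit price, for illustration only.
PRICE = 0.00002

# Same workload before and after trimming 300 ms from each execution.
before = monthly_compute_cost(10_000_000, 800, 512, PRICE)
after = monthly_compute_cost(10_000_000, 500, 512, PRICE)
saving_fraction = (before - after) / before

print(f"before={before:.2f} after={after:.2f} saving={saving_fraction:.1%}")
```

Because cost scales linearly with duration in this model, the relative saving equals the relative reduction in execution time, whatever the unit price is.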

Page 19:

“Done properly, parenting is a heroic act. Done properly.” (with “parenting” replaced by “serverless”)

– Edna Mode, Incredibles 2

Page 20:

Building resilient serverless applications

• Your cloud service provider has vast capacity to scale, and resources to support their technology that far outstrip yours

• Think about the rest of your architecture - what interfaces with your serverless code? Is it as resilient as your serverless tech?

• Use managed services as much as you can - but make sure you know how to recover when things go wrong
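
One common way to recover from a transient failure in a managed service is to retry with exponential backoff. The sketch below is a generic illustration of that idea, not something taken from the talk; the function name, attempt count, and delays are assumptions.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call `operation`, retrying failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * (2 ** attempt))
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. Retries only make sense for operations that are safe to repeat.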

Page 21:

Building efficient serverless applications

• Use microservices

• Split your codebases into as many different serverless functions as is reasonable

• Share core functions as libraries (or Lambda Layers on AWS)

• Choose your language wisely - AWS Lambda now supports Node.js, Golang, Python…

• Continuous improvement.

Page 22:

Case study: MirrorWeb Social Media

Page 23:

Page 24:

(other clouds are available)

Page 25:

SMA - Platform Support

YOUR PLATFORM HERE

Page 26:

SMA - the numbers

• 1,000+ social media accounts archived

• 6,000+ posts archived a day

• 6 core platforms, new platforms added all the time

Page 27:

SMA - the MVP

https://blog.crisp.se/2016/01/25/henrikkniberg/making-sense-of-mvp

Page 28:

SMA - the MVP

https://blog.crisp.se/2016/01/25/henrikkniberg/making-sense-of-mvp

Launch Partner

450,000 tweets, 17,000 videos

15% of accounts no longer available… this is why we need archives!

Page 29:

1. Triggering a crawl

[Diagram: Amazon CloudWatch Event, Scheduler Lambda Function]

Server Count: 0
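
As a rough sketch of step 1, a scheduler function woken by a scheduled event could fan out one small job per account. Everything named here (`ACCOUNTS`, `build_crawl_jobs`, `scheduler_handler`, the `dispatch` parameter) is hypothetical and only illustrates the shape; the talk does not show the actual implementation.

```python
import json

# Hypothetical list of accounts due for capture; a real scheduler would
# load this from a data store.
ACCOUNTS = [
    {"platform": "twitter", "account": "example_one"},
    {"platform": "instagram", "account": "example_two"},
]


def build_crawl_jobs(accounts):
    """Turn each account into one small, independent crawl job payload."""
    return [
        json.dumps({"platform": a["platform"], "account": a["account"]})
        for a in accounts
    ]


def scheduler_handler(event, context, dispatch=print):
    """Entry point for a scheduled event: fan out one job per account.

    `dispatch` stands in for whatever starts the crawl function
    (for example an asynchronous function invocation).
    """
    jobs = build_crawl_jobs(ACCOUNTS)
    for job in jobs:
        dispatch(job)
    return {"dispatched": len(jobs)}
```

Keeping one job per account means a slow or failing account does not hold up the others.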

Page 30:

2. Getting content

[Diagram: Amazon CloudWatch Event, Scheduler Lambda Function, Social Crawl Lambda Function, Social DynamoDB table]

Server Count: 0
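
A minimal sketch of the storage half of step 2, assuming each post is written as its own item with the original payload kept unmodified. The key layout (`account`, `post_id`) and the helper names are assumptions; `InMemoryTable` is a test double that mirrors the `put_item(Item=...)` call shape of a boto3 DynamoDB `Table` so the example runs without AWS access.

```python
class InMemoryTable:
    """Test double with the same put_item(Item=...) shape as a boto3 Table."""

    def __init__(self):
        self.items = {}

    def put_item(self, Item):
        # Key on (account, post_id) so re-crawling the same post overwrites it.
        self.items[(Item["account"], Item["post_id"])] = Item


def store_posts(table, account, posts):
    """Write each fetched post as its own item, keeping the raw payload."""
    for post in posts:
        table.put_item(Item={
            "account": account,
            "post_id": post["id"],
            "raw": post,  # preserved as fetched
        })
    return len(posts)
```

Writing by a stable key makes the crawl idempotent: running it twice over the same posts leaves one item per post.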

Page 31:

3. Getting media

[Diagram: Amazon CloudWatch Event, Scheduler Lambda Function, Social Crawl Lambda Function, Media D/L Lambda Function, Social DynamoDB table, Social Media S3 Bucket]

Server Count: 0
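
For step 3, one plausible approach is to derive a deterministic object key from the media URL so that repeat downloads overwrite rather than duplicate. This is a sketch under that assumption; `media_object_key`, `archive_media`, and the injected `fetch` and `put_object` callables are hypothetical names, not the talk's code.

```python
import hashlib
import posixpath
from urllib.parse import urlparse


def media_object_key(account, media_url):
    """Build a deterministic object key from the account and media URL."""
    digest = hashlib.sha256(media_url.encode("utf-8")).hexdigest()
    # Keep the file extension (lower-cased), ignoring any query string.
    ext = posixpath.splitext(urlparse(media_url).path)[1].lower()
    return f"{account}/{digest}{ext}"


def archive_media(account, media_url, fetch, put_object):
    """Download one media file and store its bytes unmodified."""
    body = fetch(media_url)
    key = media_object_key(account, media_url)
    put_object(key, body)
    return key
```

In a real function, `fetch` would be an HTTP download and `put_object` a write to the media bucket; injecting them keeps the logic testable on its own.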

Page 32:

4. Adding webhook support

[Diagram: Amazon CloudWatch Event, Scheduler Lambda Function, Social Crawl Lambda Function, Media D/L Lambda Function, Amazon Aurora DB cluster, Social Media S3 Bucket, API Gateway (HTTP request)]

Managed Count: 4
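
A webhook endpoint behind API Gateway can be a very small function. The sketch below assumes the proxy-style event, where the request body arrives as a string in `event["body"]` and the function returns a `statusCode` and `body`. The handler name and the `enqueue` parameter are hypothetical, and signature verification, which real webhook receivers usually need, is left out for brevity.

```python
import base64
import json


def webhook_handler(event, context, enqueue=lambda post: None):
    """Handle an HTTP request carrying a pushed social media post."""
    raw = event.get("body") or ""
    if event.get("isBase64Encoded"):
        raw = base64.b64decode(raw).decode("utf-8")
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON"})}
    # Hand the post to the same storage path the crawler uses.
    enqueue(payload)
    return {"statusCode": 202, "body": json.dumps({"accepted": True})}
```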

Page 33:

5. Full text social search

[Diagram: Amazon CloudWatch Event, Scheduler Lambda Function, Social Crawl Lambda Function, Media D/L Lambda Function, Social Post Indexing Function, Amazon Aurora DB cluster, Amazon Aurora INS/UPD trigger, Social Media S3 Bucket, Amazon API Gateway (HTTP(S) request), Elasticsearch Service]

Managed Count: 8
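
The indexing function in step 5 can be sketched as a mapping from stored rows to search documents. The event shape (`{"rows": [...]}`), the field names, and the injected `index` callable are all assumptions made for illustration; the slides do not show what the database trigger actually sends.

```python
def to_search_document(row):
    """Map a stored post row to the fields exposed to full-text search."""
    return {
        "account": row["account"],
        "platform": row["platform"],
        "text": row.get("text", ""),
        "posted_at": row.get("posted_at"),
    }


def indexing_handler(event, context, index=lambda doc_id, doc: None):
    """Index every row delivered by the insert/update trigger.

    Using the post ID as the document ID makes an update overwrite the
    earlier version instead of creating a duplicate.
    """
    rows = event.get("rows", [])
    for row in rows:
        index(row["post_id"], to_search_document(row))
    return {"indexed": len(rows)}
```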

Page 34:

6. Image tagging

[Diagram: Amazon CloudWatch Event, Scheduler Lambda Function, Social Crawl Lambda Function, Media D/L Lambda Function, Social Post Indexing Function, Amazon Aurora DB cluster, Social Media S3 Bucket, Social Media S3 Bucket Trigger, Amazon Rekognition (+Lambda), Amazon API Gateway (HTTP(S) request), Elasticsearch Service]

Managed Count: 8
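
For step 6, a function fired by the bucket trigger needs to read the bucket name and object key out of the S3 event notification and pass images on for labelling. In this sketch the `detect_labels` parameter is a hypothetical stand-in for the call to the image recognition service, and the suffix filter is an assumption about which objects count as images.

```python
from urllib.parse import unquote_plus

IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png")


def s3_objects(event):
    """Yield (bucket, key) pairs from an S3 event notification."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        yield bucket, key


def tagging_handler(event, context, detect_labels=lambda bucket, key: []):
    """Tag each newly stored image; skip objects that are not images."""
    tags = {}
    for bucket, key in s3_objects(event):
        if key.lower().endswith(IMAGE_SUFFIXES):
            tags[key] = detect_labels(bucket, key)
    return tags
```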

Page 35:

Look at where we started…

[Diagram: Amazon CloudWatch Event, Scheduler Lambda Function, Social Crawl Lambda Function, Media D/L Lambda Function, Social DynamoDB table, Social Media S3 Bucket]

Server Count: 0

Page 36:

Look at where we are…

[Diagram: Amazon CloudWatch Event, Scheduler Lambda Function, Social Crawl Lambda Function, Media D/L Lambda Function, Social Post Indexing Function, Amazon Aurora DB cluster, Social Media S3 Bucket, Amazon Rekognition (+Lambda), Amazon API Gateway (HTTP(S) request), Elasticsearch Service, Multi-cloud S3 DR bucket]

Managed Count: 8

Page 37:

SMA - why serverless?

• Serverless is perfect for event-driven architecture - all social media posts are events

• Unpredictable requirements - needs to scale with client archiving demands on a per-hour basis

• Excellent fault tolerance - a non-core part of the service being unavailable doesn’t impact our ability to archive

• Easy to extend - individual features remain as separate codebases that we’ve glued together
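
The fault-tolerance point can be illustrated in miniature: archive first, then run the optional steps so that a failure in any one of them is recorded but never undoes the capture. In the architecture described here that isolation comes from separate functions connected by events; the single-process sketch below, with hypothetical names throughout, only shows the principle.

```python
def process_post(post, archive, enrichments):
    """Archive first; then run optional steps, isolating their failures."""
    archive(post)  # core step: an error here should propagate
    failed = []
    for name, step in enrichments.items():
        try:
            step(post)
        except Exception:
            # Record and carry on; the post is already archived.
            failed.append(name)
    return failed
```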

Page 38:

SMA - what would have happened?

• In a world without serverless, we’d be running monolithic social media capture servers…

• …storage RAID arrays across multiple datacenters and cloud providers

• HTTP load balancers for webhooks…

• …and we probably wouldn’t have a full extract-transform-load (ETL) pipeline for data analytics.

• It’s transformed our business and won us clients.

Page 39:

In summary

• Serverless isn’t just code – it’s the rest of your architecture, too. Design your systems with that in mind.

• Fully commit to the paradigm. Separate out your code into microservices so you can ship fast and ship often.

• Micro-optimisations matter again. Write efficient code to see the full benefits of the architecture.

• Your cloud service provider has much greater capacity for scale than you do. Make full use of it.

Page 40:

Q&A

Jamie Hoyle

[email protected] @jtahoyle

