Best Practices: IMS CDC to Modern Streaming Platforms

Date post: 13-Jul-2020
Page 1: Best Practices: IMS CDC to Modern Streaming Platforms · Best Practices: IMS CDC to Modern Streaming Platforms Scott Quillicy SQData – a Syncsort Company November 2019 Session HA

Best Practices: IMS CDC to Modern Streaming Platforms

Scott Quillicy

SQData – a Syncsort Company

November 2019

Session HA


Page 2

Topics

Introductions

Concepts

Architectural Overview: Modern Streaming Platforms; IMS Changed Data Capture (CDC) Flows

Best Practices: Overall Approach; Performance & Throughput

Conclusion

Q&A

Page 3

About Me: Scott Quillicy

38 Years of Database Experience:

Database Performance

Software Development

Mainframe Changed Data Capture / Replication Since the Early 90’s

Started With: IMS Version 1.2, DB2 V1.3

Founded SQData in 2000 to Provide Customers with:

A Better Way of Replicating Mainframe Data → Particularly IMS

Solutions that Combine Consulting Expertise with Technology

Technology Built Around Best Practices

SQData Acquired by Syncsort in August 2019:

OEM Supplier of Mainframe CDC Connectors

Now Part of the Connect Family of Products

Page 4

Objectives

1) Practical Overview of Streaming IMS CDC to Modern Platforms

2) Focus → IMS System / DBA Perspective

3) How to Minimize Impact on the Mainframe

4) Early Detection of Red Flags

5) How to Avoid a Data Replication Time Sink

Page 5

First Hurdle → The Great Divide

Page 6

Today’s Popular Streaming Platforms

Apache Kafka

Azure Event Hubs

Amazon Kinesis

Google Pub/Sub

Page 7

Streaming Platforms → Interest Over Time

Page 8

Streaming Platform Architecture

Page 9

Recommended Streaming Deployment Model

[Diagram: Capture & Ingestion; Initial Data Landing, Highly Secure; Best of Class Transform & Sanitization; Sanitized Data Ready for Consumption]
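The deployment model separates a highly secure landing area from a sanitized area that is ready for consumption. The following is a minimal sketch of what the sanitization step might do to a CDC record shaped like the JSON sample later in this deck. The sensitive-field list and the salted SHA-256 pseudonymization are illustrative assumptions, not part of the original model.

```python
import copy
import hashlib

# Hypothetical set of sensitive field names; a real list comes from data governance.
SENSITIVE_FIELDS = {"fname", "lname"}


def sanitize(record: dict, salt: str) -> dict:
    """Return a copy of a CDC record with sensitive image fields pseudonymized."""
    clean = copy.deepcopy(record)  # never modify the landed (raw) record in place
    for image in ("after_image", "before_image"):
        fields = clean.get(image)
        if not fields:
            continue
        # keys() & set builds a new set, so updating values while looping is safe
        for name in fields.keys() & SENSITIVE_FIELDS:
            # Salted SHA-256: same input -> same token, so values stay joinable
            digest = hashlib.sha256((salt + str(fields[name])).encode("utf-8")).hexdigest()
            fields[name] = digest[:16]
    return clean
```

Because the token is deterministic for a given salt, downstream consumers can still join on a masked field without ever seeing the original value.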

Page 10

General Best Practices

Avoid Data Collection Overkill

Help Bridge the Great Divide

Approach with a Comprehensive Strategy

Involve the Business Unit(s) from the Beginning

Be Aware of Application Release Cycle

Page 11

Data Collection Overkill

Major Red Flag → Everything Does Not Have to Be In “The Lake”

You Need to Start Small:

Minimize Data Volume

Deploy in Small Increments → Realize Success Early; Adjust Infrastructure; Predictable Costs & Duration

Adapt More Easily to New Targets

Page 12

Bridging The Great Divide

Realize that they are going to take your data – Make the best of it

Be able to translate mainframe-speak to distributed / cloud

Mentor them on working with copybooks and data types

Help them understand IMS data structures and keys
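Copybook data types are often the first stumbling block for the distributed side. As an illustration only (not any product's converter), here is a minimal Python sketch that decodes a COMP-3 (packed decimal) field and a PIC X text field. It assumes EBCDIC code page 037. Real copybooks may use other code pages, zoned decimal, or binary fields, and those need separate handling.

```python
from decimal import Decimal

# Sign nibbles in the low half of the last byte of a packed-decimal field.
POSITIVE_SIGNS = {0xA, 0xC, 0xE, 0xF}
NEGATIVE_SIGNS = {0xB, 0xD}


def decode_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a COMP-3 field; `scale` is the number of implied decimal places (the V in PIC 9(n)V99)."""
    if not raw:
        raise ValueError("empty packed-decimal field")
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)     # high-order digit
        nibbles.append(byte & 0x0F)   # low-order digit (or sign, for the last byte)
    sign = nibbles.pop()
    if any(d > 9 for d in nibbles):
        raise ValueError("invalid digit nibble in packed-decimal field")
    value = int("".join(str(d) for d in nibbles))
    if sign in NEGATIVE_SIGNS:
        value = -value
    elif sign not in POSITIVE_SIGNS:
        raise ValueError(f"invalid sign nibble 0x{sign:X}")
    return Decimal(value).scaleb(-scale)


def decode_ebcdic_text(raw: bytes) -> str:
    """Decode a PIC X field, assuming EBCDIC code page 037, and drop trailing blanks."""
    return raw.decode("cp037").rstrip()
```

For example, the four bytes `04 08 76 6C` with two implied decimals decode to `4087.66`, the kind of value that shows up as `amount` in the sample CDC record later in this deck.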

Page 13

Technical Best Practices Summary

Keep it Simple & Efficient → Minimize the Number of Moving Parts

Avoid Overscaling → Start with Basic Configuration & Work Up

Keep Full Extracts / Initial Loads on Standby:

Needed if the ‘Point of No Return’ is Reached

Must be Able to Run Against Live Databases

Discovery / Planning:

Data Sources → CDC and Bulk Loads

Latency Requirements

Data Volume → CDC and Bulk Loads

Peak Transaction Arrival Rate

Ensure Proper Monitoring / Alerts are Set

Goal → ‘Set & Forget’ Deployment
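The discovery items combine into a simple sizing estimate of what the pipeline must sustain at peak and how much it carries per day. Below is a back-of-the-envelope sketch. The `update_fraction` parameter is an assumption that reflects the later point that updates can carry both before and after images.

```python
SECONDS_PER_DAY = 86_400


def cdc_bytes_per_second(tps: float, records_per_tx: float, avg_record_bytes: float,
                         update_fraction: float = 0.0) -> float:
    """CDC volume in bytes/second for a given transaction arrival rate.

    update_fraction is the share of records that are updates; each of those is
    counted twice when before images are published alongside after images.
    """
    per_record = avg_record_bytes * (1 + update_fraction)
    return tps * records_per_tx * per_record


def cdc_bytes_per_day(avg_tps: float, records_per_tx: float, avg_record_bytes: float,
                      update_fraction: float = 0.0) -> float:
    """Approximate daily CDC volume at an average (not peak) transaction rate."""
    return cdc_bytes_per_second(avg_tps, records_per_tx, avg_record_bytes,
                                update_fraction) * SECONDS_PER_DAY
```

With hypothetical figures of a 2,000 tx/s peak, 5 records per transaction, and 1,000-byte records, the peak is 10,000,000 bytes/s (about 10 MB/s). It rises to 15 MB/s if half the records are updates that publish before images. At an average of 500 tx/s, the same shape comes to about 216 GB per day.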

Page 14

Streaming Performance Aspects

Throughput / Latency Depend Primarily on the Speed of the Target

Fortunately, Streaming Platforms are Very Fast Targets

Top Items Affecting Performance / Throughput:

Message Size

IMS CDC Streaming → Transaction Size and Arrival Rate

Target Replication Scaling Factor → Affects Acknowledgement Response

IMS Initial Loads → Overall Data Volume

Streaming Platform Configuration → Must be Tuned for Source Workload
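On Kafka, tuning for the source workload usually starts with the producer settings. The property names below are standard librdkafka / confluent-kafka producer settings. The values are illustrative starting points, not recommendations from this session.

```python
def cdc_producer_config(bootstrap_servers: str, durable: bool = True) -> dict:
    """Build a librdkafka-style producer configuration for a CDC stream."""
    config = {
        "bootstrap.servers": bootstrap_servers,
        # Wait a few ms to batch small CDC messages: a little latency for more throughput.
        "linger.ms": 5,
        # Compression reduces bytes on the wire, which matters for large IMS segments.
        "compression.type": "lz4",
    }
    if durable:
        # acks=all waits for every in-sync replica, so the topic's replication
        # factor (and min.insync.replicas) directly affects acknowledgement time.
        config["acks"] = "all"
        # Idempotence avoids duplicates on producer retries (requires acks=all).
        config["enable.idempotence"] = True
    else:
        # Leader-only acknowledgement: faster, but records can be lost if the leader fails.
        config["acks"] = "1"
    return config
```

The resulting dict can be passed to `confluent_kafka.Producer(...)`. The `durable` switch makes the trade-off in the slide explicit: a higher replication factor improves resilience but lengthens each acknowledgement under `acks=all`.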

Page 15

Key Monitoring Metrics

Operational:

Component Down Unexpectedly

Latency → How Far Behind You Are in Publishing

Latency → When Did You Last Hear from the Engine(s)

Informational:

Overall Throughput → Records / Bytes per Second

Workload Patterns / Peak Transaction Arrival Rate

Number of CDC Records by Database / Segment

Number of Transactions

zIIP Offload Statistics
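The two operational latency metrics reduce to simple timestamp arithmetic. Here is a minimal sketch. It assumes the publisher exposes the source commit time of the newest published record and the time of the last engine acknowledgement; both field names and the thresholds are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class PublisherStatus:
    # Hypothetical fields; a real capture/publisher exposes its own statistics.
    last_published_commit: datetime  # source commit time of the newest record published
    last_engine_ack: datetime        # when an ingest engine last acknowledged data


def latency_alerts(status: PublisherStatus, now: datetime,
                   max_publish_lag: timedelta, max_ack_silence: timedelta) -> List[str]:
    """Return alert messages for the two operational latency metrics."""
    alerts = []
    publish_lag = now - status.last_published_commit   # how far behind in publishing
    if publish_lag > max_publish_lag:
        alerts.append(f"publishing is {publish_lag} behind the source")
    ack_silence = now - status.last_engine_ack          # when we last heard from an engine
    if ack_silence > max_ack_silence:
        alerts.append(f"no engine acknowledgement for {ack_silence}")
    return alerts
```

One caveat applies. When the source is idle, `now - last_published_commit` grows even though nothing is pending, so a production check would compare against the newest captured commit rather than the wall clock.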

Page 16

Message Size

Small Messages Perform Better than Larger Ones (a bit obvious)

It Boils Down to the Number of Bytes per Message

IMS Segments can be Large (redefines, arrays, etc.)

Updates can Contain Before and After Images

So... What to Do? Suggest Avro as the Target Data Format:

A Compact Binary Format (Schemas are Defined in JSON)

JSON Typically Used for Data Validation Only → Can be Easily Read

Avro Messages are Roughly the Size of the Source Segment (x 2 for Updates)

Reduce the Number of Fields in the CDC Message

Evaluate the Requirement for Publishing Before Images of Updates

Page 17

Sample IMS CDC Record in JSON Format

{
  "object_name": "IMSDB01.SEG02",
  "stck": "d4c4b51993db0000",
  "timestamp": "2018-08-12T11:11:18Z",
  "change_op": "U",
  "seq": "2",
  "parent_key": { "seg1.key1": 12345 },
  "after_image": {
    "fname": "MARY",
    "lname": "JOHNSON",
    "city": "CHICAGO",
    "amount": "4087.66"
  },
  "before_image": {
    "fname": "MARY",
    "lname": "JOHNSON",
    "city": "CHICAGO",
    "amount": "2964.32"
  }
}
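Two suggestions from the Message Size slide are easy to see on a record like this sample: reduce the number of fields, and question whether before images are needed. The sketch below is illustrative and does not describe how any particular product trims messages. It publishes only the changed fields of an update and optionally keeps their before values.

```python
import json


def changed_fields(before: dict, after: dict) -> dict:
    """After-image fields whose values differ from the before image."""
    return {name: value for name, value in after.items() if before.get(name) != value}


def trim_update(record: dict, keep_before_image: bool = False) -> dict:
    """Return a smaller copy of an update record; other records are returned unchanged."""
    if record.get("change_op") != "U" or "before_image" not in record:
        return record
    trimmed = {k: v for k, v in record.items() if k not in ("after_image", "before_image")}
    trimmed["after_image"] = changed_fields(record["before_image"], record["after_image"])
    if keep_before_image:
        # Keep only the before values of the fields that actually changed.
        trimmed["before_image"] = {name: record["before_image"].get(name)
                                   for name in trimmed["after_image"]}
    return trimmed


def size_bytes(record: dict) -> int:
    """Compact JSON size, used here as a rough proxy for message size."""
    return len(json.dumps(record, separators=(",", ":")).encode("utf-8"))
```

For the sample record, only `amount` changed, so the trimmed message carries one field instead of eight. The trade-off is that consumers must then merge partial after images into their own copy of the segment. That only pays off if they can do so reliably.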

Page 18

Deployment Architecture

Page 19

IMS CDC Streams → Methods of Capture:

Log Based (x’99’s)

Data Capture Exit

Vendor Proprietary (over the top)

Page 20

IMS CDC - Architecture to Avoid

IMS Logger Exit:

Significant Performance Impact on IMS

Definite No-Go for High Volume IMS Shops

Severe Choke Point → Intrusive

Can Cause IMS to Crash

[Diagram: IMS Log Datasets (OLDS), IMS Log Capture Exit (DFSFLGX0), Publisher, IMS Unloader]

Page 21

IMS CDC Streaming Illustration

Start with a Basic Configuration and Scale Up as Required

Need to be Able to Scale on Both the Source and the Target Sides

Suggestion → Use a Separate Set of Topics for the CDC Streams vs Initial Loads

[Diagram (z/OS and Linux): Capture, Transient Storage, Publisher, IMS Unloader, Ingest Engine, CDC Topics, Schema Registry]

Page 22

IMS CDC Streams → Target Side Scaling

Start Here First

Parallelize the Apply/Ingest Engines – One (1) Publisher Subscription

Subscription Order May Matter → Engines Need to be Able to Maintain Transaction Order

Desired Behavior:

Minimal Back Pressure on the Source Side

Publisher Latency Should be Within a Tolerable Limit

[Diagram (z/OS and Linux): Capture, Transient Storage, Publisher, IMS Unloader, Ingest Engine, CDC Topics, Schema Registry]
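Once the apply/ingest engines are parallelized, records have to be distributed so that order is kept where it matters. A common approach is to key each message and hash the key to a partition. The sketch below assumes the parent key identifies the unit that must stay ordered, and it uses CRC32 for illustration. Kafka's default partitioner uses murmur2 instead.

```python
import json
import zlib
from collections import defaultdict


def key_of(record: dict) -> str:
    """Hypothetical message key: the record's parent (root) key, serialized canonically."""
    return json.dumps(record["parent_key"], sort_keys=True)


def partition_for(key: str, num_partitions: int) -> int:
    """Deterministic key -> partition mapping (CRC32 here, for illustration only)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(records, num_partitions: int) -> dict:
    """Group records by partition, preserving arrival order within each partition."""
    partitions = defaultdict(list)
    for record in records:
        partitions[partition_for(key_of(record), num_partitions)].append(record)
    return partitions
```

This preserves order per key only. If the business needs full transaction order across keys, the stream has to stay on a single ordered path, or the engines must apply whole transactions. That is why the slide says subscription order may matter.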

Page 23

IMS CDC Streams → Source Side Scaling

Multiple Publisher Subscriptions → Split by Database / Partition / FP Area

Parallel Ingest Engines on Target Side – One (1) per Subscription

Desired Behavior:

Minimal Back Pressure on Source Side

Minimal CPU Consumption

[Diagram (z/OS and Linux): Capture, Transient Storage, Publisher, IMS Unloader, three Ingest Engines, CDC Topics, Schema Registry]
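Splitting the publisher into multiple subscriptions raises the question of which databases go where. One simple option is a greedy balance driven by per-database CDC record counts, the “Number of CDC Records by Database / Segment” metric. This approach is an assumption here, not taken from the session.

```python
import heapq
from typing import Dict, List


def assign_subscriptions(volume_by_db: Dict[str, int], num_subscriptions: int) -> List[List[str]]:
    """Greedy assignment: the largest remaining database goes to the least-loaded subscription."""
    if num_subscriptions < 1:
        raise ValueError("need at least one subscription")
    # Heap entries are (current load, subscription index, assigned databases);
    # the unique index breaks ties, so the lists themselves are never compared.
    heap = [(0, i, []) for i in range(num_subscriptions)]
    heapq.heapify(heap)
    for db, volume in sorted(volume_by_db.items(), key=lambda kv: kv[1], reverse=True):
        load, idx, dbs = heapq.heappop(heap)
        dbs.append(db)
        heapq.heappush(heap, (load + volume, idx, dbs))
    return [dbs for _, _, dbs in sorted(heap, key=lambda entry: entry[1])]
```

Volume is not the only constraint. Databases whose changes must stay in relative order (for example, related databases updated in the same transaction) should land in the same subscription, whatever the balance says.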

Page 24

IMS CDC Streams → Source Side Scaling (continued)

Consider for Extreme CDC Data Volume → 1TB+ per Day

Multiple Captures / Publishers → Split by IMS Subsystem / Data Sharing Partner

Suggestion → Combine Online SSIDs and Split Out Batch SSIDs

[Diagram (z/OS and Linux): three Captures, Transient Storage, Publishers, three IMS Unloaders, six Ingest Engines, CDC Topics, Schema Registry]

Page 25

The Initial Load

Required to Prime the Target(s) and to Re-Materialize if the Point of No Return is Reached

Load via Mass Inserts → There are No Traditional Utilities on the Target Side

Alternative: Online Snapshots → FTP → Transform → Ingest into Kafka

Important: Need to be Able to Run Unloads Against Live Databases

Recommend → Use a Separate Set of Topics for the Initial Load vs CDC

[Diagram (z/OS and Linux): IMS Unloader, Transient Storage, Publisher, Ingest Engine, Initial Load Topics, Schema Registry]
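Because the target has no traditional load utilities, the initial load becomes a stream of mass inserts. Below is a minimal sketch of the two mechanical pieces: batching unloaded records, and keeping initial-load topics separate from CDC topics. The `ims.<kind>.<database>` topic naming convention is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(records: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield lists of up to batch_size records for mass inserts; the last may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def topic_for(kind: str, database: str) -> str:
    """Hypothetical naming convention that keeps initial-load and CDC topics apart."""
    if kind not in ("load", "cdc"):
        raise ValueError(f"unknown topic kind: {kind}")
    return f"ims.{kind}.{database.lower()}"
```

Separate topics let a reload be re-run, or its topics discarded, without touching the live CDC stream that consumers are already reading.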

Page 26

The Initial Load → Target Side Scaling

Same Model as for CDC Streams

Parallelize the Ingest Engine – One (1) Publisher Subscription

Desired Behavior:

Minimal Back Pressure on the Source Side

Minimize Time and CPU Required to Complete the Initial Loads

[Diagram (z/OS and Linux): IMS Unloader, Transient Storage, Publisher, Initial Load Topics, Schema Registry, Ingest Engine]

Page 27

The Initial Load → Source Side Scaling

Multiple Publisher Subscriptions → Split by Database / Partition / FP Area

Parallel Ingest Engines on Target Side – One (1) per Subscription

Desired Behavior:

Minimal Back Pressure on the Source Side

Minimize Time and CPU Required to Complete the Initial Loads

[Diagram (z/OS and Linux): IMS Unloader, Transient Storage, Publisher, three Ingest Engines, Initial Load Topics, Schema Registry]

Page 28

The Initial Load → Source Side Scaling (continued)

Consider for Extreme Data Volume → 5TB+ IMS Data

Multiple Unloads / Publishers → Split by IMS Subsystem / Data Sharing Partner

Goal → Minimize Time & CPU Required to Complete the Initial Loads

[Diagram (z/OS and Linux): three IMS Unloaders, Transient Storage, Publishers, six Ingest Engines, Initial Load Topics, Schema Registry]
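A quick estimate helps decide whether an extreme-volume initial load needs to be split across multiple unloads and publishers. The calculation assumes the work divides evenly across streams and the target keeps up, which are both optimistic. Under those assumptions, load time is total bytes divided by aggregate throughput.

```python
def load_hours(total_bytes: float, bytes_per_sec_per_stream: float, streams: int) -> float:
    """Wall-clock hours for an initial load split evenly across parallel unload/publish streams."""
    if streams < 1 or bytes_per_sec_per_stream <= 0:
        raise ValueError("need at least one stream and a positive per-stream rate")
    seconds = total_bytes / (bytes_per_sec_per_stream * streams)
    return seconds / 3600
```

With hypothetical figures of 5 TB and 50 MB/s per stream, one stream needs about 27.8 hours and four streams need about 6.9 hours. In practice, uneven database sizes and target-side back pressure push the real figure higher.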

Page 29

Common Operational Situations

Page 30

Target Streaming Platform Outage

Target Unavailable to Receive Data → Ingest Engine(s) Should Stop and Disconnect

Data will Start Backing Up on the Source Side

Option 1: Wait Until the Cluster is Available and Restart

OK if for a Short Period of Time – Depends on Source Transaction Rate

Eventually, You May Reach the Point of No Return

Option 2: Spin Off CDC Data to File(s) – Process when the Target Comes Back Online

[Diagram (z/OS and Linux): Capture, Transient Storage, Publisher, IMS Unloader, Ingest Engine, CDC Topics, Schema Registry, CDC Data File(s); outage marker (X); Back Pressure Building Up]
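Option 2 can be sketched as a spill file that preserves order. In this sketch, `send` is a hypothetical callable that raises `ConnectionError` when the target is unavailable. Once one send fails, every later record goes to the file so that nothing overtakes it, and the file is replayed in order when the target is back.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def publish_or_spill(records: Iterable[dict], send: Callable[[dict], None],
                     spill_path: Path) -> int:
    """Send records in order; after the first failure, append the rest to a JSON-lines file."""
    spilled = 0
    target_down = False
    with spill_path.open("a", encoding="utf-8") as spill:
        for record in records:
            if not target_down:
                try:
                    send(record)
                    continue
                except ConnectionError:
                    target_down = True  # keep order: nothing after this goes to the target
            spill.write(json.dumps(record) + "\n")
            spilled += 1
    return spilled


def replay(spill_path: Path, send: Callable[[dict], None]) -> int:
    """Re-send spilled records in their original order once the target is back online."""
    count = 0
    with spill_path.open(encoding="utf-8") as spill:
        for line in spill:
            send(json.loads(line))
            count += 1
    return count
```

The replay must finish before live records resume, or order is lost. A crash partway through replay can also re-send records, so this gives at-least-once delivery and consumers should tolerate duplicates.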

Page 31

Target Streaming Platform Slow

Throughput Measured with a Calendar → Back Pressure on Source Increases

Common Cause → Target Configuration:

One (1) or More Brokers Down – Correct and Restart

Out of Memory: Set queued.max.requests at a Reasonable Value (more can be trouble)

Memory Buffers: Should be Entirely in RAM

Log Files: Review Strategy and Settings – There are Many

Important: Leverage Target Monitoring Tools such as the Confluent Control Center

May Need to Back Off CDC Until Tuning has been Optimized

[Diagram (z/OS and Linux): Capture, Transient Storage, Publisher, IMS Unloader, Ingest Engine, CDC Topics, Schema Registry; CDC Data Backing Up]

Page 32

Compressed PRILOG in RECON

Physical SLDS Exist, but Entries Disappear from the RECON

IMS Considers the SLDS to be Inactive → CDC Thinks Otherwise

Can be Avoided by Increasing the DBRC Log Retention Period

Capture Agent Should Provide a Method of Recovery

If Capture Cannot Handle It → Unwelcome Manual Intervention & Perhaps a Reload

[Diagram: IMS Unloader, SLDS, IMS RECON, Compressed PRILOG]
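Compressed PRILOG entries are easiest to deal with before they happen. The following is a hypothetical monitoring check. Its inputs and safety margin are assumptions, not DBRC output. It flags the capture when the oldest log it still needs approaches the edge of the DBRC log retention window.

```python
from datetime import datetime, timedelta


def prilog_at_risk(restart_log_time: datetime, now: datetime,
                   log_retention: timedelta,
                   margin: timedelta = timedelta(hours=1)) -> bool:
    """True if the oldest log the capture still needs is near the edge of DBRC log retention.

    restart_log_time: hypothetical input, start time of the oldest SLDS the capture still
    needs to read. log_retention: the DBRC log retention period configured for the RECON.
    """
    oldest_retained = now - log_retention
    return restart_log_time <= oldest_retained + margin
```

A true result is the cue to either let the capture catch up or lengthen the retention period, before the RECON entries it depends on are compressed away.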

Page 33

IMS Capture / Publisher Slow

Target Not Doing Much → Issue Appears to be on the Source Side (but probably not)

Common Causes:

Busy System – Processes are Running at Low Priority

Abnormally Large Units-of-Work – Exclude ‘Purgers’ (mass deletes)

Monitor: Capture Latency vs Publisher Latency

Capture Current → Look at the Publisher

Publisher → Check Data Flow Rate and Last Ack Time from Engine

[Diagram (z/OS and Linux): IMS Unloader, Publisher, Ingest Engine, CDC Topics, Schema Registry, Capture]
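The monitoring advice on this slide reads naturally as a small decision procedure. Here is a rough sketch of it. The lag inputs and the single `tolerable` threshold are hypothetical simplifications.

```python
from datetime import timedelta


def triage(capture_lag: timedelta, publisher_lag: timedelta,
           since_last_ack: timedelta, tolerable: timedelta) -> str:
    """Rough first-look diagnosis when CDC appears slow on the source side."""
    if capture_lag > tolerable:
        # Capture itself is behind: look at dispatching priority and huge units-of-work.
        return "capture behind: check system load/priority and large units-of-work"
    if publisher_lag > tolerable:
        # Capture is current, so the bottleneck is at or after the publisher.
        if since_last_ack > tolerable:
            return "publisher waiting on engine acknowledgements: check the target side"
        return "publisher behind: check its data flow rate"
    return "ok"
```

The order of the checks mirrors the slide. Rule out the capture first. If capture is current, look at the publisher's flow rate and at when an engine last acknowledged data.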

Page 34

In Conclusion...

Be Patient with The Great Divide... It will be Challenging

Avoid Data Collection Overkill...At All Costs

Keep Things Simple → Minimize the Number of Moving Parts

Scale Only as Required → Meet Throughput / Latency Requirements; Minimize Back Pressure on the Source

Involve the Business from the Beginning

Encourage / Insist that the Distributed Team Have a Target Outage Mitigation Process:

Spinning Off CDC Records to Disk on the Target Side

Reloading Source Data

Don’t Skimp on Planning / Discovery

Page 36

Please submit your session feedback!

• Do it online at http://conferences.gse.org.uk/2019/feedback/nn

• This session is HA

