Date post: 01-Aug-2020
Best Practices for Monitoring Distributed In-Memory Computing
Denis Mekhanikov
July 31, 2019
2019 © GridGain Systems
What communication with GridGain support often looks like

Customer: The cluster is hanging.

GG: Please send logs.

Customer: We don’t have logs.

GG: Did you take thread dumps?

Customer: Nope.

GG: The problem is probably in GC. What is the memory consumption level?

Customer: ...

Why should we monitor?

• Check if everything is fine
• Prevent upcoming issues
• Discover and react to the issues that already happened
• Find a reason for an issue and prevent it from happening again

Dashboarding

Logging

What to monitor?

• Every node in isolation
• Connection between nodes
• System as a whole

Every node is...

• Hardware (hypervisor)
• Operating System
• Virtual machine
• Application

Hardware / Hypervisor / OS

• CPU
• Memory
• Disk
• System logs
• Cloud Provider’s logs

Network

• Ping monitoring
• Network hardware monitoring
• TCP dumps
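
As an illustration of the ping-style check listed above, here is a minimal sketch that times a TCP connect to another node. It is a TCP-level check, not an ICMP ping, and the class name TcpProbe, the timeout, and the local self-check listener are all hypothetical choices made for this example; in a real setup the targets would be the addresses and ports that the nodes actually listen on.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical helper: times a TCP connect to a peer node. */
final class TcpProbe {
    /**
     * Returns the connect time in milliseconds, or -1 if the peer
     * could not be reached within timeoutMs.
     */
    static long connectMillis(String host, int port, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            // Refused, unreachable, or timed out: report as "not reachable".
            return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        // Self-check against a throwaway local listener on a free port.
        try (ServerSocket listener = new ServerSocket(0)) {
            long ms = connectMillis("127.0.0.1", listener.getLocalPort(), 1000);
            System.out.println("connect time, ms: " + ms);
        }
    }
}
```

Running such a probe on a schedule between every pair of nodes, and recording the result as a metric, would cover the "connection between nodes" part of the picture.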

JVM

• GC logs
• JMX
• Java Flight Recorder
• Thread Dumps
• Heap Dumps

● java -XX:+HeapDumpOnOutOfMemoryError ...
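
Thread dumps and memory figures like those asked for in the support dialog can also be collected from inside the JVM through the standard java.lang.management API. The following is only a sketch: the class name JvmSnapshot and the output format are assumptions for this example, and external tools such as jstack give more detail.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

/** Hypothetical helper: in-process thread dump plus heap and GC figures. */
final class JvmSnapshot {
    /** Formats all live threads with full stack traces. */
    static String threadDump() {
        StringBuilder out = new StringBuilder();
        // false, false: skip lock details, which not every JVM supports.
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace())
                out.append("    at ").append(frame).append('\n');
            out.append('\n');
        }
        return out.toString();
    }

    /** Currently used heap, in bytes. */
    static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** Accumulated collection time over all collectors, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 when the collector does not report it
            if (time > 0)
                total += time;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("heap used, bytes: " + heapUsedBytes());
        System.out.println("total GC time, ms: " + totalGcMillis());
        System.out.print(threadDump());
    }
}
```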

Application

• Logs
• JMX
• Throughput / Latency
• Test queries
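
One way to combine the latency and test-query items above is to run a fixed query repeatedly and track latency percentiles. The sketch below assumes a hypothetical LatencyProbe class, and the map lookup in main is only a stand-in for a real test query against the cluster.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: times a repeated test query and reports percentiles. */
final class LatencyProbe {
    /** Runs the query `runs` times (runs > 0) and returns latencies in nanoseconds, sorted ascending. */
    static long[] measureNanos(Runnable query, int runs) {
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples;
    }

    /** Nearest-rank percentile over sorted, non-empty samples; p in (0, 100]. */
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Stand-in for a real test query, for example a cache get by a known key.
        Map<Integer, String> data = new HashMap<>();
        data.put(1, "one");

        long[] sorted = measureNanos(() -> data.get(1), 1_000);
        System.out.println("p50, ns: " + percentile(sorted, 50));
        System.out.println("p99, ns: " + percentile(sorted, 99));
    }
}
```

Percentiles are used here instead of an average because a few slow calls, which are often the interesting ones, barely move the mean.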

Tools

Tools

Metrics

Tools

Logs

Tools

JVM

MAT

Java Flight Recorder

Tools

Network

Tools

Benchmarking

Tools

GridGain

GridGain

OS

JVM

GridGain

Hardware

GridGain: Cache Metrics

CacheMetricsMXBean
• CacheGets
• AverageGetTime
• AverageTxCommitTime
• ...

CacheGroupMetricsMXBean
• LocalNodeMovingPartitionsCount
• ClusterMovingPartitionsCount
• ClusterOwningPartitionsCount
• ...
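
MXBean attributes like the ones above can be read with the standard JMX API. The sketch below is not GridGain-specific: JmxReader is a hypothetical helper that reads one attribute from every MBean matching an ObjectName pattern in the local JVM's platform MBean server. The "org.apache:*" pattern in main is an assumption about how the node registers its beans; the actual ObjectName layout should be checked in JConsole for the version in use.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/** Hypothetical helper: reads one attribute from all MBeans matching a pattern. */
final class JmxReader {
    /** Returns attribute values keyed by the canonical name of each matching bean. */
    static Map<String, Object> read(String pattern, String attribute) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Map<String, Object> values = new TreeMap<>();
        try {
            for (ObjectName name : server.queryNames(new ObjectName(pattern), null)) {
                try {
                    values.put(name.getCanonicalName(), server.getAttribute(name, attribute));
                } catch (JMException e) {
                    // The bean matched the pattern but has no such attribute; skip it.
                }
            }
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Bad ObjectName pattern: " + pattern, e);
        }
        return values;
    }

    public static void main(String[] args) {
        // Works in any JVM: the platform Threading bean always exists.
        System.out.println(read("java.lang:type=Threading", "ThreadCount"));

        // Assumed pattern; prints an empty map unless a node runs in this JVM.
        System.out.println(read("org.apache:*", "CacheGets"));
    }
}
```

This reads from the JVM the code runs in, so it would be embedded in the node process; reading from another process requires a remote JMX connection instead.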

GridGain: Cache Metrics

How to enable cache metrics

CacheConfiguration<K, V> cacheCfg = new CacheConfiguration<>("cache");

// Enable metrics.
cacheCfg.setStatisticsEnabled(true);

ignite.createCache(cacheCfg);

GridGain: Discovery and Communication

TcpDiscoverySpiMBean
• MessageWorkerQueueSize
• AvgMessageProcessingTime
• Coordinator
• NodesFailed
• ...

TcpCommunicationSpiMBean
• OutboundMessagesQueueSize
• SentMessagesCount
• ReceivedMessagesCount
• ...

GridGain: Data Storage

Ram

Disk

WAL

GridGain: Data Storage Metrics

Data volume

DataStorageMetricsMXBean
• WalTotalSize
• TotalAllocatedSize
• OffheapUsedSize
• ...

DataRegionMetricsMXBean
• TotalAllocatedPages
• AllocationRate
• PagesFillFactor
• ...

GridGain: Data Storage Metrics

Checkpoints

DataStorageMetricsMXBean
• DirtyPages
• CheckpointTotalTime
• LastCheckpointDuration
• UsedCheckpointBufferSize
• LastCheckpointPagesWriteDuration
• LastCheckpointMarkDuration
• LastCheckpointTotalPagesNumber
• ...

Checkpoint marker

Ram

Disk

WAL

GridGain: Data Storage Metrics

Page replacement

DataRegionMetricsMXBean
• PagesReplaceRate
• PagesReplaceAge
• PagesReplaced

Ram

Disk

R/W

GridGain: Data Storage Metrics

How to enable data storage metrics

DataStorageConfiguration storageCfg = new DataStorageConfiguration();
DataRegionConfiguration regionCfg = new DataRegionConfiguration();
regionCfg.setName("myDataRegion");

// Enable metrics.
storageCfg.setMetricsEnabled(true); // Metrics for data storage.
regionCfg.setMetricsEnabled(true); // Metrics for a particular data region.

storageCfg.setDataRegionConfigurations(regionCfg);

GridGain: IO metrics

Coming in 2.8

IoStatisticsMetricsMXBean
• CacheGroupLogicalReads
• CacheGroupPhysicalReads
• IndexLogicalReads
• IndexPhysicalReads
• ...

GridGain: WebConsole

GridGain Monitoring

Demo

Checklist for monitoring

• CPU / Memory / Disk / Network
• GC logs
• Application logs

+ Problematic places specific to your setup

Q&A

https://github.com/dmekhanikov/ignite-elk/

https://console.gridgain.com/ [email protected]

