Observability: A Guide for Buyers


Stay on top of the IT industry

Subscribe to the ITOps Times Weekly News Digest to get the latest news, news analysis and commentary delivered to your inbox.

• Reports on the technologies affecting IT operators — APM, Data Center Optimization, Multi-Cloud, ITSM and Storage

• Insights into the practices and innovations reshaping IT Ops such as AIOps, Automation, Containers, DevOps, Edge Computing and more

• The latest news from the IT providers, industry consortia, open source projects and research institutions

www.ITOpsTimes.com

Subscribe today to keep up with everything happening in the IT Ops industry.



Contents


Observability makes reactive operation teams proactive
How COVID-19 impacts the need for observability
How does Micro Focus help companies with observability?
Application Performance Monitoring: What it means in today’s complex software world
Gartner’s 3 requirements for APM
A guide to observability tools


Observability makes reactive operation teams proactive

By Jenna Sargent


You’ve likely heard the term observability being passed around for the past few years, and you might have assumed that it is just another marketing buzzword for monitoring. And you wouldn’t be alone in that thinking. Some experts would say that “observability” and “AIOps” and “Application Performance Monitoring (APM)” are just terms used to distinguish between products on the market.

But others would argue that there’s a concrete difference between all these terms. Wes Cooper, Product Marketing Manager at Micro Focus, is among them.

Cooper believes that a big differentiator between observability and monitoring is that observability is proactive, while monitoring is reactive. In other words, observability aims to tell you why something broke, while monitoring just tells you what is broken. This differentiator is key.

There are a number of other differences between the two, but they all tie back into this idea of being proactive versus being reactive. According to Cooper, observability is good for looking into the unknown, acts as a complement to DevOps, and uses automation to help fix problems. Monitoring, on the other hand, identifies known problems, is siloed in ops teams, and spots problems but doesn’t fix them, he explained.

How to move from reactive to proactive

Getting from reactive to proactive states of operation isn’t as simple as flipping a switch. According to Cooper, there are a number of things companies need to do in order to get from the “what” state of monitoring to the “why” state of observability.

First, an organization needs to be collecting data from all data points. This means looking at metrics, events, logs, topology, and any changes. They also need to collect information from all data domains, including on-premises, private clouds, public clouds, and containers. Lastly, they also need to be looking at things from all perspectives, such as the system and the end users.
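As a rough sketch of what collecting every data type into one stream could look like, consider the following; the types and field names are illustrative assumptions for this guide, not any vendor’s actual schema:

```typescript
// Hypothetical unified intake for all the data types Cooper lists;
// every name here is invented for illustration.
type TelemetryKind = "metric" | "event" | "log" | "topology" | "change";

interface TelemetryRecord {
  kind: TelemetryKind;
  timestamp: string;                      // ISO-8601
  domain: "on-prem" | "private-cloud" | "public-cloud" | "container";
  perspective: "system" | "end-user";     // look from all perspectives
  payload: Record<string, unknown>;
}

const pipeline: TelemetryRecord[] = [];

function ingest(record: TelemetryRecord): void {
  // One stream for every data type, so downstream analytics can
  // correlate across domains instead of reading per-silo feeds.
  pipeline.push(record);
}

ingest({
  kind: "metric",
  timestamp: new Date().toISOString(),
  domain: "public-cloud",
  perspective: "system",
  payload: { name: "cpu.utilization", value: 0.82 },
});
```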

In addition to gathering all of this data, companies also have to utilize machine learning and big data analytics in order to actually gain insights from all of this data. In other words, companies have to adopt AIOps, a methodology and technology that introduces AI, machine learning, and automation into IT operations.

AIOps is a clear break from the monitoring of the past. AIOps takes into account not just the application, but also the infrastructure — how the cloud is performing, how the network is performing, etc. With APM, you’re only looking at the application itself and the data tied to that application, explained Stephen Elliot, program director of I&O at research firm IDC.

“I think now that one of the big differences is not only do you have to have that data, but it’s a much broader set of data — logs, metrics, traces — that is collected in near real-time or real-time with streaming architectures,” said Elliot.

The three pillars of observability

Cindy Sridharan’s popular “Distributed Systems Observability” book, published by O’Reilly, claims that logs, metrics, and traces are the three pillars of observability.

According to Sridharan, an event log is a record of events that contains both a timestamp and a payload of content. Event logs come in three forms (a short sketch contrasting them follows the list):

• Plaintext: A log record stored in plaintext is the most commonly used type of log.

• Structured: A log record that is typically stored in JSON format, and widely advocated as the form to use.

• Binary: Examples of binary event logs include Protobuf-formatted logs, MySQL binlogs, systemd journal logs, etc.
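To make the contrast concrete, here is a small sketch of the same event emitted in plaintext and structured form; the field names are invented for illustration, and no particular logging library is implied:

```typescript
// The same event in plaintext vs. structured form.
const logEvent = {
  ts: new Date().toISOString(),
  level: "error",
  service: "checkout",
  msg: "payment gateway timeout",
};

// Plaintext: easy for humans to read, harder to query reliably.
console.log(`${logEvent.ts} [${logEvent.level}] ${logEvent.service}: ${logEvent.msg}`);

// Structured: one JSON object per line, trivially machine-parseable,
// which is why this form is so widely advocated.
console.log(JSON.stringify(logEvent));

// Binary forms (Protobuf-encoded logs, MySQL binlogs, the systemd
// journal) trade human readability for compactness and need tooling
// to decode.
```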

Logs can be useful in identifying unpredictable behavior in a system. Sridharan explained that distributed systems often experience failures not because of one specific event happening, but because of a series of possible triggers. In order to pin down the cause of an event, operations teams need to be able to start with a symptom pinpointed by a metric or log, infer the life cycle of a request across various system components, and iteratively ask questions about interactions between parts of that system.

Logs are the base of the three pillars, and both metrics and traces are built on top of them, Sridharan wrote.

Sridharan defines metrics as numeric representations of data measured across time intervals. They are useful in observability because they can be used by machine learning algorithms to gain insights into the behavior of a system over time. According to Sridharan, their numerical nature also allows for longer retention of data and easier querying, making them well suited for building dashboards that show historical trends.
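Their suitability for aggregation is easy to see in code. A minimal sketch, assuming nothing more than (timestamp, value) samples, of the kind of per-interval rollup a trend dashboard would chart:

```typescript
// Roll raw metric samples up into per-interval averages: the sort of
// series a dashboard of historical trends is built from.
interface Sample {
  ts: number;     // epoch milliseconds
  value: number;
}

function rollup(samples: Sample[], intervalMs: number): Map<number, number> {
  const buckets = new Map<number, { sum: number; n: number }>();
  for (const s of samples) {
    const bucket = Math.floor(s.ts / intervalMs) * intervalMs;
    const agg = buckets.get(bucket) ?? { sum: 0, n: 0 };
    agg.sum += s.value;
    agg.n += 1;
    buckets.set(bucket, agg);
  }
  // Average each bucket; numeric data compresses naturally this way,
  // which is what makes long retention and cheap querying possible.
  return new Map(
    [...buckets].map(([b, a]) => [b, a.sum / a.n] as [number, number]),
  );
}
```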

Traces are the final pillar of observability. According to Sridharan, a trace is “a representation of a series of causally related distributed events that encode the end-to-end request flow through a distributed system.” They can provide visibility into the path that a request took and the structure of that request. Traces can help uncover the unintentional effects of a request, making them particularly well suited for complex environments, like microservices.
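A trace is commonly modeled as a set of spans that share a trace ID and point back at a parent span. A rough sketch, with field names loosely in the spirit of tracing systems rather than any specific product:

```typescript
// Spans share a traceId and reference a parent, encoding the causal,
// end-to-end path of a single request across services.
interface Span {
  traceId: string;
  spanId: string;
  parentId?: string;   // absent on the root span
  service: string;
  operation: string;
  startMs: number;     // offset from the start of the request
  durationMs: number;
}

const trace: Span[] = [
  { traceId: "t1", spanId: "a", service: "api-gateway", operation: "POST /order", startMs: 0, durationMs: 140 },
  { traceId: "t1", spanId: "b", parentId: "a", service: "orders", operation: "createOrder", startMs: 10, durationMs: 90 },
  { traceId: "t1", spanId: "c", parentId: "b", service: "payments", operation: "charge", startMs: 25, durationMs: 60 },
];

// Walking the parentId links reconstructs the path a request took and
// the structure of that request, which is the visibility Sridharan
// describes.
```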

Monitoring the right data

Organizations can sometimes fall into the trap of monitoring everything. According to Cooper, it’s possible for a company to get to a point where they’re monitoring too much. It’s important to be able to determine what data to observe, and what to ignore.

Rather than taking data in from every single domain, Cooper recommends focusing on what’s important — metrics, events, logs, and change data. “Part of getting to being more proactive or more observable is being able to collect and store all those different data types,” he said. “So machine data coming in from metrics, events, logs, topology, are all really important to paint that full picture.”

Infrastructure complexity adds to the need for observability

According to Cooper, the evolution of observability has mostly happened within the past few years. He believes that up until the past few years, monitoring was sufficient at most companies. For a long time it was typical to have one application running on one server. If an organization had 20 applications, they could have one or two people looking after those, using basic monitoring tools to get performance statistics on those servers.

That just isn’t the case anymore. Now, organizations have workloads that aren’t just running on physical servers they have control over. Workloads are running all over the place: across different public cloud vendors, in private clouds, in containerized environments, etc.

Those technologies all help increase velocity and reduce the friction created through the process of getting code into production. But that increased velocity comes at a price: greater complexity.

“The fact now is you have a lot more data that’s being produced in these environments every single day,” said Cooper. “And so having the ability to ingest all that data, but also make sense of that data, is something that’s taking a ton of operators’ time. And they don’t have the ability to go through that manually every day. They literally are to the point where you have to enlist analytics and machine learning to help comb through all that data and make sense of it.”

Operators’ jobs have become not only more complicated, but more important. For example, something simple like a bad configuration or change can throw off an entire service. “Really being able to enlist tools that can take in all this information, all this performance information, all the user data, and being able to paint a picture of that across all these different environments has really made monitoring a lot more complex, and that’s why organizations have to get more proactive today,” said Cooper.

This is where the need for AIOps is clear. With all of this new complexity, you need to be monitoring not just the application, but also the network and the infrastructure.


“All the user data, and being able to paint a picture of that across all these different environments has really made monitoring a lot more complex.”
— Wes Cooper, Product Marketing Manager at Micro Focus


AIOps is also important because it allows operators to train the system to reconfigure itself in order to accommodate changing loads or provision data storage as needed.

Goodbye, blame culture

As with any new technology, shifting to AIOps and observability will require a culture change at the organization. Joe Butson, co-founder of consulting company Big Deal Digital, believes that the automation in AIOps will eliminate blame cultures, where fingers are pointed when there is an incident. AIOps will lead to an acceptance that problems are going to happen. “One of the things about the culture change that’s underway is one where you move away from blaming people when things go down to, we are going to have problems; let’s not look for root cause analysis as to why something went down, but what are the inputs? The safety culture is very different. We tended to root cause it down to ‘you didn’t do this,’ and someone gets reprimanded and fired, but that didn’t prove to be as helpful, and we’re moving to a generative culture, where we know there will be problems and we look to the future,” said Butson.

This ties back into the move from reactive to proactive advocated for by Cooper. When you’re proactive about detecting issues, rather than only reacting to issues, that blame culture goes away.

“It’s critical to success because you can anticipate a problem and fix it. In a perfect world, you don’t even have to intervene; you just have to monitor the intervention, and new resources are being added as needed,” Butson said. “You can have the monitoring automated, so you can auto-scale and auto-descale. You’d study traffic based on having all this data, and you are able to automate it. Once in a while, something will break and you’ll get a call in the middle of the night. But for the most part, by automating it, being able to take something down, or roll back if you’re trying to bring out some new features to the market and it doesn’t work, being able to roll back to the last best configuration, is all automated.”
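As a loose illustration of the kind of automated policy Butson describes (scale on load, roll back on a failed rollout), consider this sketch; the thresholds, signal shape, and action names are all invented for the example:

```typescript
// Invented health signal and actions; a real system would wire these
// to actual deployment and autoscaling APIs.
interface HealthSignal {
  errorRate: number;    // fraction of failed requests
  load: number;         // utilization, 0..1
}

function remediate(s: HealthSignal): string {
  if (s.errorRate > 0.05) {
    // A bad release: return to the last best configuration.
    return "rollback:last-good-config";
  }
  if (s.load > 0.8) {
    return "scale-out:+2";   // add resources before users notice
  }
  if (s.load < 0.2) {
    return "scale-in:-1";    // auto-descale when traffic drops
  }
  return "noop";
}
```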

The state of observability and monitoring

The promises made by observability and AIOps are enticing, but Charley Rich, research director at Gartner, cautions against declaring observability the holy grail of monitoring just yet. According to Rich, APM is a mature market, and is past the bump on the Gartner hype cycle. “AIOps, on the other hand, is just climbing up the mountain of hype,” Rich said. “Very, very different. What that means in plain English is that what’s said about AIOps today is just not quite true. You have to look at it from the perspective of maturity.”

Rich believes that there are a lot of companies out there making assumptions about AIOps that aren’t quite true. For example, some believe that AIOps will automatically solve problems on its own, and a number of vendors market their AIOps solutions using terms like self-healing, a capability that Rich says simply doesn’t exist yet. “You know, you and I go out and have a cocktail while the computer’s doing all the work,” said Rich. “It sounds good; it’s certainly aspirational and it’s what everyone wants, but today, the solutions that run things to fix are all deterministic. Somewhere there’s a script with a bunch of if-then rules and there’s a hard-coded script that says, ‘if this happens, do that.’ Well, we’ve had that capability for 30 years. They’re just dressing it up and taking it to town.”

But Rich isn’t entirely dismissive of the hype around AIOps. “It’s very exciting, and I think we’re going to get there,” said Rich. “It’s just, we’re early, and right now, today, AIOps has been used very effectively for event correlation — better than traditional methods — and it’s been very good for outlier and anomaly detection. We’re starting to see in ITSM tools more use of natural language processing and chatbots and virtual support assistants. That’s an area that doesn’t get talked about a lot. Putting natural language processing in front of workflows is a way of democratizing them and making complex things much more easily accessible to less-skilled IT workers, which improves productivity.”

It’s important to be vigilant about breaking down what a solution promises versus what it actually does. According to Rich, there are plenty of solutions out there using machine learning algorithms to help them self-adjust. But leveraging machine learning doesn’t automatically make a solution an AIOps solution.

“Everybody’s doing this,” Rich said. “We in the last market guide segmented the market of solutions into domain-centric and domain-agnostic AIOps solutions. So domain-centric might be an APM solution that’s got a lot of machine learning in it, but it’s all focused on the domain, like APM, not on some other thing. Domain-agnostic is more general-purpose, bringing in data from other tools. Usually a domain-agnostic tool doesn’t collect, like a monitoring tool does. It relies on collectors from monitoring tools. And then, at least in concept, it can look across different data streams, different tools, and come up with a cross-domain analysis. That’s the difference there.”

How COVID-19 impacts the need for observability

It’s difficult these days to talk about trends and predictions without framing things in terms of COVID-19. The pandemic is going to have impacts — whether they be large or small — on every aspect of business, and observability is no exception.

According to Cooper, the shift to remote work has forced companies to question what they need to be monitoring. For example, a lot of customers are now connecting to their companies’ networks from home through a virtual private network (VPN). This adds another layer of technologies that need to be monitored that might not have even been a consideration in an office that doesn’t normally offer remote work. In addition to monitoring them to make sure they’re up and running, operations teams should also be ensuring that they’re configured properly so that proprietary information doesn’t get out, Cooper explained.

A lot of operations teams have been working on creating dashboards that display real-time views of how their VPNs are performing, Cooper said. This includes things like who is using the VPN and how quickly they are able to access services on that VPN.

There are also a lot of collaboration tools being used to facilitate remote work that need to be looked at. Companies need to be monitoring tools like Skype for Business, Zoom, or Microsoft Teams to ensure that they’re performing well under increased load. “We’ve got 15,000 people in our company and we’ve noticed from a Teams perspective that there is excess load that’s on their servers right now. So I think that from this whole, even when we talk about what’s relevant with COVID, we’ve seen a lot of monitoring or operations teams on the scramble. With everything going remote, we need to make some adjustments in how we’re doing monitoring. There’s definitely some relevancy,” said Cooper.

How does Micro Focus help companies with observability?

Micro Focus enables teams to monitor across all domains and then consolidate that data. It helps with the collection of data and then stores that data in a single data lake architecture.

In addition to allowing companies to observe across different domains, Micro Focus takes observability to the next level with machine learning and analytics. After bringing data into that data lake architecture, it provides machine learning capabilities for that data so that operators and users can better understand what the data actually means. By using Micro Focus’ tools, operators can see where patterns are occurring and then address them.

Micro Focus then provides a layer of automation on top of that so that teams can automate on top of what they’re observing. It’s one thing to be able to look at an environment proactively and be able to spot problems and trends. Being able to go out and automate the process of remediation is the other half of the equation.

In light of the pandemic, Micro Focus is also tailoring its solutions to work for remote-first workplaces. Its customers have identified three primary requirements for remote IT operations: visibility into the health of collaboration tools, monitoring of VPN and Virtual Desktop Infrastructure (VDI) solutions, and keeping user experience, continuity, and security top of mind.

Remote-first IT operations teams can utilize Micro Focus’ solutions to address those challenges.


Application Performance Monitoring: What it means in today’s complex software world

By David Rubinstein

Software continues to grow as the driver of today’s global economy, and how a company’s applications perform is critical to retaining customer loyalty and business. People now demand instant gratification and will not tolerate latency — not even a little bit.

As a result, application performance monitoring is perhaps more important than ever to companies looking to remain competitive in this digital economy. But today’s APM doesn’t look much like the APM of a decade ago. Performance monitoring then was more about the application itself, and very specific to the data tied to that application. Back then, applications ran in datacenters on-premises, and were written as monoliths, largely in Java, tied to a single database. With that simple n-tier architecture, organizations were able to easily collect all the data they needed, which was then displayed in Network Operations Centers to systems administrators. The hard work came from command-line launching of monitoring tools — requiring systems administration experts — sifting through log files to see what was real and what was a false alarm, and from reaching the right people to remediate the problem.

In today’s world, doing APM efficiently is a much greater challenge. Applications are cobbled together, not written as monoliths. Some of those components might be running on-premises while others are likely to be cloud services, written as microservices and running in containers. Data is coming from the application, from containers, Kubernetes, service meshes, mobile and edge devices, APIs and more. The complexities of modern software architectures broaden the definition of what it means to do performance monitoring.

“APM solutions have adapted and adjusted greatly over the last 10 years. You wouldn’t recognize them at all from what they were when this market was first defined,” said Charley Rich, a research director at Gartner and lead author of the APM Magic Quadrant, as well as the lead author on Gartner’s AIOps market guide.


So, although APM is a mature practice, organizations are having to look beyond the application — to multiple clouds and data sources, to the network, to the IT infrastructure — to get the big picture of what’s going on with their applications. And we’re hearing talk of automation, machine learning and being proactive about problem remediation, rather than being reactive.

“APM, a few years ago, started expanding broadly both downstream and upstream to incorporate infrastructure monitoring into the products,” Rich said. “Many times, there’s a problem on a server, or a VM, or a container, and that’s the root cause of the problem. If you don’t have that infrastructure data, you can only infer.”

Rekha Singha, the Software-Computing Systems research area head at Tata Consultancy Services, sees two major monitoring challenges that modern software architectures present.

First, she said, is multi-layered distributed deployment using big data technologies, such as Kafka, Hadoop and HDFS. The second is that modern software, also called Software 2.0, is a mix of traditional task-driven programs and data-driven machine learning models. “The distributed deployment brings additional performance monitoring challenges due to cascaded failures, staggered processes and global clock synchronization for correlating events across the cluster,” she explained. “Further, a Software 2.0 architecture may need a tightly integrated pipeline from development to production to ensure good accuracy for data-driven models. Performance definitions for Software 2.0 architectures are extended to both system performance and model performance.”

Moreover, she added, modern applications are largely deployed on heterogeneous architectures, including CPUs, GPUs, FPGAs and ASICs. “We still do not have mechanisms to monitor performance of these hardware accelerators and the applications executing on them,” she noted.

The new culture of APM

Despite these mechanisms for total monitoring not being available, companies today need to compete to be more responsive to customer needs. And to do so, they have to be proactive. Joe Butson, co-founder of consulting company Big Deal Digital, said, “We’re moving from a culture of responding ‘our hair’s on fire,’ to being proactive. We have a lot more data … and we have to get that information into some sort of a visualization tool. And, we have to prioritize what we’re watching. What this has done is change the culture of the people looking at this information and trying to monitor and trying to move from a reactive to proactive mode.”

In the earlier days of APM, when things in an application slowed or broke, people would get paged. Butson said, “It’s fine if it happens from 9 to 5, you have lots of people in the office, but then, some poor person’s got the pager that night, and that just didn’t work because of what it meant in the MTTR — mean time to recovery; depending upon when the event occurred, it took a long time to recover. In a very digitized world, if you’re down, it makes it into the press, so you have a lot of risk, from an organizational perspective, and there’s reputation risk.”

High-performing companies are looking at data and anticipating what could happen. And that’s a really big change, Butson said. “Organizations that do this well are winning in the marketplace.”

Whose job is it, anyway?

With all of this data being generated and collected, more people in more parts of the enterprise need access to this information. “I think the big thing is, 10-15 years ago, there were a lot of app support teams doing monitoring, I&O teams, who were very relegated to this task,” said Stephen Elliot, program vice president for I&O at research firm IDC. “You know, ‘identify the problem, go solve it.’ Then the war rooms were created. Now, with agile and DevOps, we have [site reliability engineers], we have DevOps engineers; there is a lot broader set of people that might own the responsibility, or have to be part of the broader process discussion.”

And that’s a cultural change. “In the NOCs, we would have had operations engineers and sysadmins looking at things,” Butson said. “We’re moving across the silos and have the development people and their managers looking at refined views, because they can’t consume it all.”

It’s up to each segment of the organization looking at data to prioritize what they’re looking at. “The dev world comes at it a little differently than the operations people,” Butson continued. “Operations people are looking for stability. The development people really care about speed. And now that you’re bringing security people into it, they look at their own things in their own way. When you’re talking about operations and engineering and the business people getting together, that’s not a natural thing, but it’s far better to have the end-to-end shared vision than to have silos. You want to have a shared understanding. You want people working together in a cross-functional way.”

Enterprises are thinking through the question of who owns responsibility for the performance and availability of a service. According to IDC’s Elliot, there is a modern approach to performance and availability. He said at modern companies, the thinking is, “ ‘We’ve got a DevOps team, and when they write the service, they own the service; they have full end-to-end responsibilities, including security, performance and availability.’ That’s a modern, advanced way to think.”

In the vast majority of companies, ownership for performance and availability lies with particular groups having different responsibilities. This can be based on the enterprise’s organizational structure, and the skills and maturity level that each team has. For instance, an infrastructure and operations group might own performance tuning. Elliot said, “We’ve talked to clients who have a cloud CoE that actually has responsibility for that particular cloud. While they may be using utilities from a cloud provider, like AWS CloudWatch or CloudTrail, they also have the idea that they have to not only trust their data but then they have to validate it. They might have an additional observability tool to help validate the performance they’re expecting from that public cloud provider.”

In those modern organizations, site reliability engineers (SREs) often have that responsibility. But again, Elliot stressed skill sets. “When we talk to customers about an SRE, it’s really dependent on, where did these folks come from?” he said. “Were they reallocated internally? Are they a combination of skills from ops and dev and business? Typically, these folks reside more along the lines of IT operations teams, and generally they have operating history with performance management, change management, monitoring. They also start thinking about: are these the right tasks for these folks to own? Do they have the skills to execute it properly?”

Organizations also have to balance that out with the notion of applying development practices to traditional I&O principles, and bringing a software engineering mindset to systems admin disciplines. And, according to Elliot, “it’s a hard transition.”

Compound all that with the growing complexity of applications, running in the cloud as containerized microservices, managed by Kubernetes using, say, an Istio service mesh in a multicloud environment.

TCS’ Singha explained that containers are not permanent, and microservices deployments have shorter execution times. Therefore, any instrumentation in these types of deployments could affect the guarantee of application performance, she said. As for functions as a service, which are stateless, application states need to be maintained explicitly for performance analysis, she continued.

It is these changes in software architectures and infrastructure that are forcing organizations to rethink how they approach performance monitoring, from a culture standpoint and from a tooling standpoint.

APM vendors are adding capabilities to do infrastructure monitoring, which encompasses server monitoring, some amount of log file analysis, and some amount of network performance monitoring, Gartner’s Rich said. Others are adding or have added capabilities to map out business processes and relate the milestones in a business process to what the APM solution is monitoring. “All the data’s there,” Rich said. “It’s in the payloads; it’s accessible through APIs.” He said this ability to visualize data can show you, for instance, why Boston users are abandoning their carts at a rate 20% greater than New York users over the last three days, and come up with something in the application that explains it.

Gartner’s 3 requirements for APM

APM, as Gartner defines it in its Magic Quadrant criteria, is based on three broad sets of capabilities, and in order to be considered an APM vendor by Gartner, you have to have all three. Charley Rich, Gartner research director and lead author of its APM Magic Quadrant, explained:

The first one is digital experience monitoring (DXM). That, Rich said, is “the ability to do real user monitoring, injecting JavaScript in a browser, and synthetic transactions — the recording of those playbacks from different geographical points of presence.” This is critical for the last mile of a transaction and allows you to isolate and use analytics to figure out what’s normal and what is not, and understand the impact of latency. But, he cautioned, you can’t get to the root cause of issues with DXM alone, because it’s just the last mile.

Digital experience monitoring, as defined by Gartner, is to capture the UX latency errors — the spinner or hourglass you see on a mobile app, where it’s just waiting and nothing happens — and find out why.

Rich said this is done by doing real user monitoring — for web apps, that means injecting JavaScript into the browser to break down the load times of everything on your page as well as background calls. It also requires the ability to capture screenshots automatically, and capture entire user sessions. This, he said, “can get a movie of your interactions, so when they’re doing problem resolution, not only do they have the log data, actual data from what you said when a ticket was opened, and other performance metrics, but they can see what you saw, and play it back in slow-motion, which often provides clues you don’t know.”
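A minimal sketch of that browser-side piece, using the standard Navigation Timing API; the beacon endpoint is hypothetical, and this covers only load timings, not the screenshot or session-replay capture Rich describes:

```typescript
// Runs in the browser. Reads the page's navigation timings and ships
// them to a (hypothetical) collection endpoint.
const [nav] = performance.getEntriesByType(
  "navigation",
) as PerformanceNavigationTiming[];

if (nav) {
  const beacon = {
    url: location.href,
    ttfbMs: nav.responseStart - nav.requestStart,     // server latency
    domCompleteMs: nav.domComplete - nav.startTime,   // page fully built
    loadMs: nav.loadEventEnd - nav.startTime,         // total load time
  };
  // sendBeacon survives page unloads, so even abandoned sessions report.
  navigator.sendBeacon("/rum/collect", JSON.stringify(beacon));
}
```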

The second component of a Gartner-defined APM solution is application discovery, diagnostics and tracing. This is the technology to deploy agents out to the different applications, VMs, containers, and the like. With this, Rich said, you can “discover all the applications, profile all their usage, all of their connections, and then stitch that together to what we learn from digital experience to represent the end-to-end transaction, with all of the points of latency and bottlenecks and errors, so we understand the entire thing from the web browser all the way through application servers, middleware and databases.”

The final component is analytics. Using AI, machine-learning analytics applied to application performance monitoring solutions can do event correlation, reduce false alarms, do anomaly detection to find outliers, and then do root cause analysis driven by algorithms and graph analysis.
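The outlier-detection piece can be illustrated with something as simple as a z-score test, a toy stand-in for the far richer algorithms these products actually use:

```typescript
// Flag values more than `threshold` standard deviations from the mean.
function zScoreOutliers(values: number[], threshold = 3): number[] {
  if (values.length === 0) return [];
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance =
    values.reduce((a, b) => a + (b - mean) ** 2, 0) / values.length;
  const std = Math.sqrt(variance) || 1;   // avoid divide-by-zero
  return values.filter((v) => Math.abs(v - mean) / std > threshold);
}

// e.g. zScoreOutliers([12, 11, 13, 12, 98, 12], 2) -> [98]
```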

— David Rubinstein


A guide to observability tools

FEATURED PROVIDER

• Micro Focus: As more services are delivered through more channels, monitoring and resolving issues becomes exponentially more difficult. Micro Focus Operations Bridge cuts through the complexity of hybrid IT, so you can keep services running. Make the shift to automated, AI-based, business-focused delivery — powered by AIOps. You’ll monitor applications, detect anomalies, and fix problems with new speed and insight. That’s how you satisfy users in the digital enterprise.

• AppDynamics: The AppDynamics Application Intelligence Platform provides a real-time, end-to-end view of application performance and its impact on digital customer experience, from end-user devices through the back-end ecosystem — lines of code, infrastructure, user sessions and business transactions. The platform was built to handle the most complex, heterogeneous, distributed application environments; to support rapid identification and resolution of application issues before they impact users; and to deliver real-time insights into the correlation between application and business performance.

• Dynatrace provides software intelligence to simplify enterprise cloud complexity and accelerate digital transformation. With AI and complete automation, our all-in-one platform provides answers, not just data, about the performance of applications, the underlying infrastructure and the experience of all users. We help companies mature existing enterprise processes from CI to CD to DevOps, and bridge the gap from DevOps to hybrid-to-native AIOps.

• IBM helps organizations modernize their IT operations management with its AIOps solution. It helps organizations see patterns and contexts that aren’t obvious, helping them avoid investigation, improve responsiveness, and lower operations costs. It also automates IT tasks to minimize the need for human intervention. IBM’s solution can be incorporated no matter what stage of the digital transformation journey a customer is at.

• InfluxData: APM can be performed using InfluxData’s platform InfluxDB. InfluxDB is a purpose-built time series database, real-time analytics engine and visualization pane. It is a central platform where all metrics, events, logs and tracing data can be integrated and centrally monitored. InfluxDB also comes built in with Flux: a scripting and query language for complex operations across measurements.

• Instana is a fully automatic Application Performance Monitoring (APM) solution that makes it easy to visualize and manage the performance of your business applications and services. The only APM solution built specifically for cloud-native microservice architectures, Instana leverages automation and AI to deliver immediate actionable information to DevOps. For developers, Instana’s AutoTrace technology automatically captures context, mapping all your applications and microservices without continuous additional engineering.

• Moogsoft is a pioneer and leading provider of AIOps solutions that help IT teams work faster and smarter. With patented AI analyzing billions of events daily across the world’s most complex IT environments, the Moogsoft AIOps platform helps the world’s top enterprises avoid outages, automate service assurance, and accelerate digital transformation initiatives. Founded in 2011, Moogsoft has more than 120 customers worldwide and strategic partnerships with leading managed service providers and outsourcing organizations.

• Optanix: The Optanix AIOps platform was developed from the ground up, rather than just adding an analytics engine or machine learning capabilities to an existing platform. The solution offers full-stack detection and monitoring, predictive analysis and smart analytics, true and actionable root cause analysis, and business service prioritization.



• Plumbr: Plumbr is a modern monitoring solution designed to be used in microservice-ready environments. Using Plumbr, engineering teams can govern microservice application quality by using data from web application performance monitoring. Plumbr unifies the data from infrastructure, applications, and clients to expose the experience of a user. This makes it possible to discover, verify, fix and prevent issues. Plumbr puts engineering-driven organizations firmly on the path to providing a faster and more reliable digital experience for their users.

• ScienceLogic offers a “context-infused” AIOps platform that helps organizations discover and understand the relationship between infrastructure, applications, and business services. It also allows them to integrate and share data across different technologies in real time, and apply multi-directional integrations for automating responsive and proactive actions in the cloud.

• Splunk provides real-time visibility across the enterprise. Its Data-to-Everything Platform enables IT operations teams to prevent problems before they impact customers. It offers observability across silos and enables users to investigate deeper when needed and use predictive analytics to anticipate outages.

• StackState’s AIOps platform helps IT operations teams break down silos in their teams and tools. Its solution combines logs, events, metrics, and traces in real time in order to help customers resolve issues faster. With StackState, organizations can consolidate all of their data into a single platform with an understandable UI.


Information You Need: ITOps Times

Every business today is a software company, and executing on rapid-fire releases while adopting new technologies such as containers, infrastructure as code, and software-defined networks is paramount for success. ITOps Times is aimed at IT managers who need to stay on top of rapid, massive changes to how software is deployed, managed and updated.

www.ITOpsTimes.com

NEWS AS IT HAPPENS, AND WHAT IT MEANS.

