Post on 04-Aug-2020
transcript
Kirill: This is episode number 11, with aspiring data scientist Garth
Zoller.
(background music plays)
Welcome to the SuperDataScience podcast. My name is Kirill
Eremenko, data science coach and lifestyle entrepreneur. And
each week, we bring you inspiring people and ideas to help
you build your successful career in data science. Thanks for
being here today and now let’s make the complex simple.
(background music plays)
Welcome to this episode of the SuperDataScience podcast. I'm
very excited about this episode because I have a very
interesting and inspirational guest for you. Today we've got
Garth Zoller, who is a student of mine on SuperDataScience.
And why I'm so excited about this episode is because Garth is
just starting out into data science. Even though he's got a
background in databases, he's only getting into the field of
data science and he's learning very rapidly, very quickly. He's
learning all of the things he needs to know about data science.
And being at that start of the journey, he is sharing the
methodologies, the approaches that he is using to get into
data science.
So those of you who are also just starting out into data
science, who maybe don't know how to tackle this very broad
and complex field, you will find this episode very, very
valuable and also very inspirational. So in this episode, we
talk about how a right brain person can get into data science
and how they can leverage their skills in data science. And
moreover, Garth uses a litmus test saying that if he, being a
right brain person, can get into data science, then you can
too. So you can already tell this is going to be an inspiring
episode.
And here also, Garth shares the tools that he's learning, how
he's learned a little bit about stats, how he's learning about R
programming, Tableau, Microsoft Power BI. So you'll get like a
sequence of tools that he's been learning and that he's been
aspiring to master. Also, we'll talk about the most important
skill of data science, which in Garth's view, is how to define a
problem, how to think about a problem. And not only will we
talk about that, Garth will actually give us a walkthrough of
one of his case studies. So one of those projects that he's
currently doing at work, we'll actually get to see how he
thought about the problem and what approach he came up
with to tackle that problem. That is going to be super valuable
for those of you who want to know more about how to solve
data science problems.
Also, we'll talk about a culture of data science in an
organisation, and how that is important for you to be able to
learn those skills, and if that culture doesn't exist, what steps
you can take to prompt the organisation to start being more
open to allowing you to learn more data science and bring
more value into the organisation.
And also, we'll talk about data exploration tools. So we'll talk
about using data visualisation tools for data exploration. And
finally, we will talk about giving back to the community and
how you can give back to the community and give back to
those other people out there who also want to learn data
science.
So all in all, it's a very exciting episode. Can't wait to get
started. And without further ado, I bring to you Garth Zoller.
(background music plays)
Hello everybody, and welcome to this episode of the
SuperDataScience podcast. Today I've got one of the top
students of SuperDataScience, Garth Zoller, here on the show
with me. Garth, thank you for joining me. How are you today?
Garth: I'm very well, Kirill. Thank you for having me.
Kirill: It's great to hear you and so right now, I'm sitting in Brisbane,
just so that our listeners can paint a picture. I'm sitting in
Brisbane, and Garth, you're in Durham, in North Carolina. Is
that correct?
Garth: That is correct.
Kirill: How's the weather there?
Garth: We've just exited our humid summer. So we're getting into the
best season ever, which is our nice, cool fall.
Kirill: Nice. Very nice. And it's very early morning, isn't it? Like 7 am
or something?
Garth: Yup.
Kirill: It's really crazy, the time difference, right? In Brisbane
right now, it's already 9 pm on Thursday. And you're only
entering Thursday. It's like 7 am.
Garth: And we're both tired! We're at opposite ends of the day. That's
right.
Kirill: Yeah, yeah, totally. But it's near the end of the week, weekend
coming up. So it'll be fun. So for those of you who don't know
-- probably most listeners won't know this, but Garth is a very
avid student on SuperDataScience, and recently we had a
showcase of our new features, and Garth missed it, the actual
presentation that I was running, and then he emailed me
afterwards saying all the things that he thought about the
presentation, giving some great suggestions of how we can
improve the customer experience. I was so surprised at the
feedback he had, and some great, really good suggestions on
how we can really deliver great experiences for those who are
learning data science. And so I jumped on a call right away
with Garth to ask a bit more about what he thinks, and after
seeing the passion, Garth, that you have for data science, I
was sure that we have to have this podcast.
So I'm really excited about talking about specifically what
you're doing in this space, and how you're getting into the
data science space. Are you excited as well?
Garth: I'm super excited!
Kirill: Totally, it's going to be fun! Alright, so. I've been looking at
your LinkedIn, and you've worked at a company called NetApp
for the past 14 years, with like a little gap in between. But 14
years at NetApp. So what exactly does NetApp do?
Garth: NetApp roughly is a data storage solutions company. And they
focus on designing the hardware and software for managing
large sets of data. They don't store other people's data, but
those systems will efficiently and in a high-performance way
manage data sets.
Kirill: So it's not like a OneDrive, or a Google Drive, or a Dropbox.
It's not that type of service.
Garth: Correct. So companies would come to us and buy this
hardware/software combination and then implement it on
premise in their data infrastructure.
Kirill: Ok, so is that similar to what EMC do?
Garth: Conceptually. We like to think that NetApp is the one that we
would mention first, but that's ok.
Kirill: I gotcha, I gotcha. Alright. NetApp. Ok, and you have gone
through multiple roles in NetApp. I'm just counting them on
your LinkedIn profile. There's like 5, 6, 7, 8 different roles in
NetApp. So can you take us through that journey since you've
started in the company back in 2000? How have you moved
through the roles, and what has this meant for you and your
professional development?
Garth: This has really been a surprise. So I'll start with a general
framework that I actually heard about in my Executive MBA
program, and I wish I had known about it a long time ago. I
might have made different choices. But it's this concept of 2
years up or out. So you stay in a role for 2 years, and if you're
not being promoted, then find somewhere else to go, either in
the company or to another company. And I'd never heard that
before! So my model at NetApp has always been, and just even
in my career in general, is just find interesting work to do.
And yeah, if I go up, ok, fantastic. That's not a bad goal. People
have different levels of ambition and whatever in that respect,
but I place a much greater value on doing work that I
personally find meaningful and interesting and challenging.
So that's how my choices have gone at NetApp.
I came in with a background in databases to do technical
support, third line technical support. And then at a time that
the company was just starting to put databases on their
storage. So there was a great need for that knowledge. And
then I ended up managing that team; it was a small local
team. And then that led to managing a larger global team in a
different product line. And after that, I thought, well, let's look
at some other areas. I was interested in leaving the support
context and expanding out into the business and strategic
side. I wasn't exactly sure how to do that at first, but program
management seemed to be the way to go. And so I started
getting involved in the -- at that time, we had business units
in the programming office, and was given a particular product
to be the program manager for that area.
But my background in databases just kept rearing its head in
terms of the kinds of metrics and analytics that the business
unit GMs were asking for. And I had that knowledge. And so
rather than spend hours and hours in Excel trying to
accomplish something, I just whipped it into SQL Server and
popped it out in a couple minutes! But that also became a real
value add for that business area and then that led to other
types of activities, from beyond basic "how many widgets did
I sell?" to "hey, can you do forecasting?" Well, at that time, I
had no idea how to do forecasting, but sure, I'll give it a go,
why not! And then you just start self motivating in what areas
you want to expand into next.
Kirill: So you came into NetApp, and you had this database
experience, you progressed through the different roles. And
the question is, why data science? When did data science
enter your professional career, and why did you choose to
expand your knowledge and expertise into this field?
Garth: Program management was very interesting, and they're great
skills that I still use, even in data science today in terms of
structuring the problem definition, and having a single source
of truth to refer back to. But I couldn't stop my interest and
fascination in what was happening in the data. There's
something mysterious about data, like a puzzle, like a
mystery. And I want to find out, what is it telling us? What
can I do with it? And ultimately, how can I make better
decisions based on it? And so that led to opportunities to go
into roles that we call Business Analyst. You could
equivalently call that Data Scientist, but we call it Business
Analyst. And their primary objective is metrics, dashboard
reporting, month end, quarter end, and that kind of thing. But
there's tons of ad hoc activities that come in all the time, of
"hey, we'd like to put together a new kind of product or
packaging solution of some sort. Can we look at historical
data for like products and see how might we position for
pricing? Or how might we position for volume, forecasting of
the volume, things of that nature?" So I entered into that
space and then kind of built a competency in that area.
Kirill: Yeah, and right now, you're a Business Analyst in the
marketing operations. So specifically, this is B2B, right? So
you're dealing with B2B marketing. Is that correct?
Garth: That's correct. Both from a direct sales and a channel sales
perspective.
Kirill: I can totally see how data science skills would be valuable
there, because I personally did some work in customer
experience and insights. But there was a lot of work that we
did with the marketing team in data science. But it's a bit
different though, business to business marketing. What
would you say are any specifics about business to business
marketing and how you use data science in there?
Garth: On some level, there is a lot of similarity, surprisingly. And
there are some differences. So for example, if you were doing
some analysis with respect to margins and discounts, if you're
talking about the channel, you don't have as much control
there. They're taking your raw parts and building the solution,
and they're going to control what kinds of margins and
discounts they're going after. You can provide guidance, but
ultimately, they'll make a choice.
But even in direct sales, I think I see actually more similarity.
So for example, from a B2C perspective, ultimately you're
trying to satisfy the needs of your customers. That's the same
in B2B as well, and the methodologies you might employ from
a data science perspective to do that can be similar. It doesn't
have to be as fancy as something like a conjoint analysis. But
this idea of, you don't want to just get your feature
development roadmap because you talked to your top 2
customers. That's insufficient. You're going to miss it. So in
that sense, I see a lot more similarities than dissimilarities. I
don't know if I've necessarily, other than the channel VAR
perspective that I just mentioned, I don't think I can
remember something off the top of my head that sticks out as
being, "this is a squarely B2B challenge that I've had to
address."
Kirill: That's really cool. And to me, it means that wow, so my B2C
experience, I can actually apply it into the B2B space. And I'm
sure a lot of our listeners will be either that way or vice versa.
So if you have B2B experience, you now know, from this, we
can assume that it's transferrable skills. And that's great to
know.
And I want to move on a little bit to the thing that fascinates
me most about you, which is that when we were speaking,
you mentioned all these different areas and ways that you're
improving your data science skill set. I don't think I've met
anybody who is so passionate about learning about data
science, and just like you are getting as much knowledge as
you can from all of these different sources that we spoke
about. What drives you? What keeps you motivated to learn
about data science in such huge strides?
Garth: Fear? No. Because at this point I have a really clear
understanding of kind of how I learn and what I want to
accomplish with data science. I didn’t always know that and
in fact, I’d say even though I’ve been using data science in
various capacities, it wasn’t until really recently that the light
bulb kind of came on and I said, "Ah, okay. Now I’ve got a
clear path that I’m going to follow."
But rather than being what I kind of envision as a typical data
scientist, somebody who is really good with numbers and kind
of very left-brain, I’m actually a right-brain person that has
lived in the left-brain space for my career. So I do a lot of
context switching between kind of the visual graphic
connection idea space to, "Okay, now I’ve got to translate that
to the formulas and algorithm execution space." So for me to
come at a particular problem like data science or challenge, I
need to come at it from as many different angles as possible
because my brain is going to start making those neural
science learning connections that help form more of a systems
model type of view. To folks who aren’t familiar with that
concept, it’s basically, rather than just trying to solve the
problem you’re not—well, I shouldn’t say never, but probably
rarely solving a problem in isolation. Your problem exists in a
larger system and so thinking about things in a system model,
you’re going to draw and say, "Well, here’s how we’re going to
solve this problem but what’s the impact over there?" Or
"What’s their disincentive over there and how does that
impact my model?" So as I’m learning data science and having
that passion for data science, I’m trying to come at it from as
many angles as possible so that I can incorporate that in my
solutions.
Kirill: Yeah, that totally makes sense. And the more angles you
attack it from, the more opportunities you have to learn. And
just for the benefit of our listeners, can you mention a couple
of sources, if you don’t mind sharing these? Where are you
learning data science from?
Garth: SuperDataScience, of course! Thank you, thank you! So I had
first found you on Udemy and some of your courses in
Tableau and data science there. And I should mention too, I
would classify myself kind of at the beginning of my data
science journey. There’s so much more I have to learn. So I
look at every source as a possible opportunity. Coursera – I
have a course that’s being done by Duke in stats right now
that I’m going through that I’m finding tremendously helpful
not only as a review but also picking up some new things,
books. Pretty much anywhere that I can find, I’m going to
leverage.
Kirill: Yeah, totally. And I really appreciate that you’re honest about
being able to admit that you’re at the beginning of this data
science journey; and I think people come to data science from
different backgrounds. We’ve had guests that came from
chemical engineering, from physics, from neuroscience,
people that have come from a strategy perspective, people that
come from economics, and so on. And you’re coming from the
space of databases but still, you can appreciate that the data
science field that you’re entering into is quite broad and you
do need to undertake some education. And I think a lot of our
listeners will be able to relate to that because it’s one thing
listening and learning from somebody who is very well
established in the field of data science and learning from their
experiences and successes. It’s a whole different story being
able to relate and to learn from somebody who’s just venturing
into the field of data science and to this space, and just
embarking on their journey. I’m sure you’ll share some very
intriguing and valuable insights with us on this podcast
episode.
Garth: Absolutely. That’s sort of my whole model in life. Again,
harkening back to this idea that who I am and knowing who
you are, the breadth and the vast array of different kinds of
experiences is really what I do well. And I’ve always been
fascinated by people that were just the opposite. They’re very
deep, they’re so knowledgeable, they’re so skilful. And I
thought, "Boy, you kind of wish -- and think that the grass is
greener over there," but you are who you are and it’s best, as
you mentioned in a prior podcast, to focus on your strengths
and enjoy those. So from a learning perspective, yeah, I’ve
listened to those other podcasts and I’m thinking, "Boy, those
guys are so smart," or as I’m learning, I might find a blog
somewhere or an instructor in whatever course and think,
"Oh, look how accomplished they are. Look at how much
they’ve done in terms of their deliverable, their quantifiable
output, what they’re talking about, the depth of their
knowledge." It just seems like the mountain that’s too great to
climb and it can feel very defeating at some level and
discouraging, but my experience and my encouragement to
listeners who are in a similar spot to myself is, do not be
defeated by that. There’s no need to be defeated by that. It’s
so doable to focus on one thing at a time. You don’t have to
solve the whole mountain, just find the rock that’s in front of
your foot and solve that one.
And an example might be, let’s say you know that or you’ve
heard that R is useful for data science. Well, okay. But maybe
you haven’t quite solidified your knowledge of powerful stats-like
functions in Excel. Start there instead. And then once
you’ve got that, then the next step might be go to R and try
and accomplish those same things in R. And that gives you a
nice focused scope. The net-net for me is the 20/80 rule.
Focus on the 20% of activities that accomplish 80% of the
outcome. Do it faithfully, do it consistently, and you and your
organisation will be light years ahead of wherever you’re at
from a data science perspective.
Kirill: Yeah, totally. I absolutely agree with that. It’s kind of like data
science is—I wouldn’t say it’s unique in that way; but it has
that advantage and it is so broad that you can learn a lot of
different elements. You can go deep, like a lot of specialists
do, but also you can learn a lot of different elements about
data science and then pick the ones that you’re good at, pick
the ones that you’re interested in, pick the ones that excite
you the most and slowly deepen your knowledge in those, and
then pick up something else, pick up something else. And
through this breadth of knowledge you will still develop a
great expertise. And just speaking of that, once you started
learning data science, where did you decide to start?
Garth: I’d have to think about that.
Kirill: Was it R programming? You said you already had some
experience with SQL, so that doesn’t count. That’s your
background. But when you started venturing into data
science, was it machine learning algorithms, was it maybe
linear regressions, was it R programming, was it the
visualisation with Tableau? Like, what was your first step into
this vast, vast field which is data science?
Garth: If I remember correctly, I think it was during school and it was
a project we were doing – some sort of marketing or market
segment analysis of some sort. And I think it was actually
using Statspack, which at the time was an add-in to Excel
that allows you to do some correlations, and p-values, and
R-squared values, and things of that nature. And I had used
that largely because it was ubiquitously available and free
because I already had it. But the next step after that was
dipping my toe into R and starting to look at that. I kind of
segued a little bit into Python. I’m not totally clear at this point
how I can make that meaningful to me yet so I’m still kind of
squarely in the R space and looking at that.
Kirill: Okay. That’s interesting. So would you say that—so, you went
from a database background into the Statspack Excel add-in,
which I’m assuming would require some statistical
knowledge, some understanding of p-values, and maybe some
distributions. Would you say that this statistical knowledge is
essential for somebody to get into the space of data science?
Garth: At least a basic level. So statistics – just like data science, you
can go incredibly deep. And at some point – again, I’m going
to stress this – you really have to learn who you are. Because
at some point you’re going to have to embrace either things
you don’t know, and be comfortable with that. Or the fact that
you’ll never know it. Like, you’re just never going to be that
super-genius and whatever, because most of us – if we think
about it in a standard, normal distribution – most of us are in
the first standard deviation, we’re all in the 68% of average.
And that’s hugely powerful. That’s where most of us are; that
doesn’t stop you from being excellent at what you do. But you
have to decide for yourself and get comfortable with that so
that you can then make that kind of linear progression in
terms of what you want to accomplish next and start defining
what those steps are.
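
The 68% figure Garth mentions is the share of a normal distribution that lies within one standard deviation of the mean. As a minimal sketch (the function name `fraction_within` is made up for this illustration, not something from the episode), it can be checked with Python's standard library:

```python
import math

def fraction_within(k: float) -> float:
    """Share of a normal distribution within k standard deviations of the mean."""
    # For a normal distribution, P(|X - mean| <= k * sd) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```

For k = 1 this comes out at roughly 68%, which is the number quoted above; k = 2 and k = 3 give roughly 95% and 99.7%.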
Kirill: Yeah. I’m just going to repeat that notion that was in the other
podcast—I think it was in podcast #4 with Brendan Hogan --
where we—what you already mentioned, that most successful
people in the world are not people who constantly work on
their weaknesses. It’s the people who understand what their
strengths are, understand what their weaknesses are. And
they focus on their strengths and they’re ignoring their
weaknesses. That’s not my quote, it’s something that the CEO
of Deloitte Australia, Giam Swiegers, kept telling us when we
were aspiring to become better consultants at Deloitte. He
said focus on your strengths. Ignore your weaknesses. That’s
exactly what you’re mentioning now, that if you understand
your weaknesses and you understand who you are, what
you’re good at and what you’re not good at, maybe it’s not
worth getting good at something you don’t want to be good at.
Just focus on the things you love and focus on the things that
you’re passionate about, and your existing strengths, make
them even stronger. That’s what will make you valuable and
that’s what will make you respect yourself and push yourself
even further.
Garth: Absolutely! That’s 100% correct. And while you were speaking
about that, I thought what if a person that’s listening says,
"Well, I’m not all that good at stats. Then what?" Well, again,
to close on that question, the net-net is you do need, I think,
a basic knowledge of stats. But that’s very achievable. And I
do use myself as kind of a litmus test. This will sound strange.
I do have a positive self-image, but I like to acknowledge the
fact I’m not the shiniest apple in the barrel. I’m in the barrel.
So if I can get it, and again, coming from a right brain art
background to a left brain world of application and having to
make all these context switches: if I can do it, I can absolutely
help you, or know that you can do it, and those basics are
achievable. It’s okay if it’s not perfectly crystal clear the very
first time you do it. There’s stuff that I’ll hit and I’ll think, "Oh,
geez. I’m going to have to go again for the third time to try and
understand this concept." And maybe at that point, I’m
looking at it from somebody else describing it, trying to get a
different view of it. And suddenly it will click for who knows
why. But just don’t stop, don’t let that stop you. You can
accomplish the basics and again, that 20/80 rule. You just
get that 20%, and you’re going to start making powerful,
powerful impact.
Kirill: Totally. I totally agree. That’s what a lot of things in life are
about. You do not stop. If you fall, you get up and you do it
again. If you have to do it the second, the third, the fourth
time, you keep doing it. And eventually you will find a way to
make it click. That’s what we are as humans and resilience is
a huge factor. In terms of self-respect, in terms of learning
new things, it’s very valuable. That was great that you
commented on that. I was actually going to ask you about
that, how was it being a right brain person getting into the
field of stats, but you pretty much answered that question.
And I also like this litmus test analogy, that you’re using
yourself as a litmus test. That if a person with a right brain
mentality can get into stats and understand that basic 20%
that are required for 80% of the things, then—yeah, it is
doable, it is possible and nobody has an excuse not to do it.
So what was your next step? So you mentioned you learned
some stats, you started on R programming. So how deep did
you get into R programming? Do you use it on a daily basis
and what was your next step after that, or what is going to be
your next step?
Garth: I don’t yet use it on a daily basis. I’m planning that in the next
one to two quarters, to try and implement that into my work.
I’m still in the learning journey. So I would say that it’s not
like, "Oh, I’ve accomplished R." I mean certainly, I’ve gone
through some courses and kind of gotten a sense for it. But
again, that idea of wanting or needing, in some cases, to go
back multiple times and transition from the, "Okay, I know
the mechanics of how the functions work," but I think the
most incredibly important skill from a data scientist's
perspective is to think about how to think about a problem.
The mechanics of any given tool, you can learn those and get
either super-great or be okay at it.
But if you can rightly think about how to frame the problem
and how to attack the problem, that is, in my opinion, the
most valuable thing. And with respect to work in terms of next
steps, part of it is I’m letting, kind of in parallel, the tasks that
were given to me by my executive management, their ad hoc
requests, kind of drive what I might need to learn next because
it’s so vast, who knows what you’re supposed to do. There’s
no one way. But also, just because I have this learning path,
I know that there are some basic skills, whether it’s R, or
Python, or whatever. You know, I can always continue to
make continuous improvement in those areas. But I should
mention, in addition to that, we do have particular
visualisation tools that I’ve had to spend some time learning
as well. We use Tableau right now at NetApp. We also
are exploring and have a lot of interest in Power BI. I use
Microsoft Power BI on my own when I’m not doing work
projects. So those types of tools take some time to learn and
you’ll know what you want to do and if you can’t figure out
how to make the tool do it, you’ll kind of spend some cycles
working through things like that.
Kirill: Totally. I really like your idea of the most important skill being
how to be able to think about a problem, how to frame a
problem. I think a lot of the time, that is disregarded. People
spend so much time focusing on, actually, the tools. And a lot
of our listeners who are in this phase, that are starting out
into the data science journey, I highly encourage you to
consider this. That a lot of the time that you’re learning
something new, you spend your efforts in understanding how
to code in R, or how to use Tableau, or how to understand the
statistics that you’re using, those methodologies and so on.
But what slips past your attention is how to actually—what
you’re working on, what is the problem, what is the end goal.
And then in your head identifying how you're going to tackle
it, what are the steps that you’re going to take.
Yes, of course, you need to know the tools to understand the
steps in detail. But even just having a general picture, getting
a general sense, developing that intuition for how to attack
these problems, that is – like you correctly mentioned, Garth
– that is probably one of the most important aspects of a data
scientist’s job. And just following on that question, I wanted to
ask you, can you give us an example of when you recently had
to—again, if you can disclose this information—when you
recently had a business problem and there was a specific way
that you went about thinking about it that helped you solve
the problem, or tackle it in a more efficient way, or actually
come up with a solution to this problem?
Garth: I do, actually. It’s going on now at the moment.
Kirill: All right, that’s interesting. Let’s hear it.
Garth: I don’t have it fully solved yet but I’ll tell you how I've
approached it thus far. So I was asked to do an analysis. I’m
able to share at least this point: they were looking at what
they call master purchase agreements. Basically if you have a
large customer and you kind of put into a contract for some
period of time whatever the kind of standard discounting that
you’re going to get based on who you are, or your planned
purchase run rate, or whatever. The VP had a sense that there
were some inefficiencies there. And he couldn’t put his finger
on it, that was just his intuition. I’d like to stress that there’s
a large value in intuition. You can really hone that and use
that to your benefit. It’s just you shouldn’t necessarily make
every business decision only based on intuition. That’s what
data science helps with. So he had this idea that there might
be some inefficiencies in our master purchase agreement but
he wasn’t sure. So he said, "Hey, Garth, you’re the data
science guy. Look at the data and see what you can find." And
that was the assignment. (Laughter)
Kirill: Lovely. I love those. They come to you without a business
problem. They’re like, "Just look at the data and tell me what
you can find." Wonderful! Fantastic!
Garth: It’s so vague it’s fantastic. You know, through no fault of their
own, a lot of the business leaders that you’re going to work
with, they don’t necessarily know. They may not know the
data, what they’ve got, they might not even know what fields
are available. But they certainly don’t know—you know,
you’re in the data all the time—what kinds of relationships
might exist already as they’re trying to think about the
problem. So here’s this nice vague problem. Of course, I had
no idea at first what to do and I thought at first, "Well, what
can I do?" When I don’t know what to do, I ask what can I do.
What do I know how to do? What can I start when something
is ill-defined, and who knows what the next step is. And the
first step I thought of is, if nothing else, I can visualise the
data and just look at the shapes of the data.
Kirill: Interesting.
Garth: To inform my next step. Again, Mr Visual—if I can get a sense
for the shape of the Big Data, that might then help me
understand what I might next look at. Whether that next thing
is right or wrong, I don’t know yet, but it helps me get a sense
for the overall picture. So I—
Kirill: Hold on, hold on. Sorry, I’m going to pause you here. What do
you mean by "shapes"? So that our listeners can get a better
understanding of your thinking process, what do you mean
by "shapes" of the data?
Garth: Definitely. So what I did is I took that data and put it into a
histogram. And just visualised the data. In this case it’s
Tableau. It could have been any tool, doesn’t really matter.
But I was looking for things like—and if you’re studying your
basic stats you’ll learn this or you know it already, but the
idea of—what does the distribution of the data look like? Is it
a normal distribution? Is it skewed left? Is it skewed right? I
might throw in some box plots, or actually, before I leave that,
I might also look at the modality. You know, is it uni-modal,
is it bi-modal, is it multi-modal? Meaning how many major
peaks does the data have? Is it very spiky and I have multiple
peaks? Well, then I don’t have just one mean anymore. I
probably have two means, and I might want to look and decide
— well, for the start I might look at the peak on the right, or
the peak on the left. And that’s another model for
troubleshooting and solving problems. It’s the "Divide by 50
Rule". Just pick a problem, pick a spot in the middle and say,
forget about the stuff on the left, or right, whatever, pick
whatever one you want, and go focus on the other half and
just say, "Can I solve the problem with that half?" And if I
can’t, then find the mid-point of that one and split that in half.
Sooner or later you’re either going to exhaust that whole path,
or you’re going to find the problem.
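The "shape" checks described here (skew direction and number of peaks) can also be done numerically. The following is a minimal sketch using only the Python standard library. The sample values, the bin count, and the function names `skewness`, `histogram` and `count_peaks` are illustrative assumptions, not anything from Garth's actual analysis, which was done visually in Tableau.

```python
from statistics import mean, pstdev

def skewness(values):
    """Population skewness: positive means the tail runs to the right."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation; must be non-zero
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def histogram(values, bins):
    """Count values into equal-width bins between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid dividing by zero for constant data
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # max value goes in last bin
        counts[idx] += 1
    return counts

def count_peaks(counts):
    """Count bins strictly taller than both neighbours.

    Flat-topped peaks are not counted in this simple sketch.
    """
    padded = [0] + list(counts) + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )

# Hypothetical data: most values are small, a few are large (right skew).
amounts = [1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 8, 15]
# Hypothetical data with two separate clusters (bi-modal).
two_groups = [1, 2, 2, 2, 3, 7, 8, 8, 8, 9]

print("skewness:", skewness(amounts))
print("peaks:", count_peaks(histogram(two_groups, bins=4)))
```

The number of peaks found depends heavily on the bin count, so in practice it is worth trying a few bin widths before concluding that data is bi-modal.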
So looking at this and the modalities and the distribution of
the data, I instantly found that in this particular problem, the
data that I was looking at was very skewed to the right. Which
is the reverse in statistics to what you might think it means,
that the bulk, the majority of your data is actually stacked on
the left, and that the small tail runs to the right. That’s a right
skew. And when I looked at that again, I didn’t have some
profound insight of, "Aha, there is the problem." But it did
suggest some things about, "Well, in this particular model
that I’m looking at, there is a heavy concentration here that
might not be ideal. And there might be some other things in
the business that we can look at or do to address that
concentration." I also then thought, "Okay, here I’ve got this
visual. Let me put some lines in there for the 20/80 rule as
well," which is approximately, generally speaking, 20% of your
products or 20% of your customers account for 80% of your
revenues as a rule of thumb. I don’t know if that’s necessarily
always true or it’s true in this case, but let me model that.
And I saw variance there.
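Checking the 20/80 rule of thumb comes down to one number: the share of revenue that comes from the top 20% of customers. A minimal sketch follows, assuming a plain list of per-customer revenue figures. The figures and the function name `top_share` are hypothetical.

```python
def top_share(revenues, top_fraction=0.2):
    """Fraction of total revenue contributed by the top `top_fraction` of customers."""
    ordered = sorted(revenues, reverse=True)
    k = max(1, round(len(ordered) * top_fraction))  # at least one customer
    return sum(ordered[:k]) / sum(ordered)

# Hypothetical revenue per customer.
revenue_by_customer = [500, 300, 50, 40, 30, 25, 20, 15, 10, 10]

share = top_share(revenue_by_customer)
print(f"Top 20% of customers account for {share:.0%} of revenue")
```

A result well above or below 0.8 is the kind of variance mentioned above, and it is a prompt for the next question rather than an answer in itself.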
Then I started to think about, "Okay, why did I experience
variance there, and what could that mean, and is it for all of
my customers that this is true? Or is it just particular
customers?" And suddenly you can see kind of a waterfall
effect of what I can next look at as I traverse down that path.
But what I want to highlight here too, and this is so useful as
a technique, I had some quick visuals, that quick sense, the
general feel or shapes of the data. We went back to that
executive sponsor who asked the question within the first day
or two. Again, no particular insights, just "this is what we’re
seeing in terms of the shape. Are we going in the right
direction?" is what was asked. And it’s kind of like a
negotiation. All of a sudden he starts, "Oh, well look at this,"
and "I think this," and he’s including his experience and his
background and things that he knows that he never shared
with you ahead of time. But that has then informed a whole
other series of directions to take.
So the next step is based on his feedback from that data shape
analysis. Now I have the next step of, "Okay, I have a direction
for classifying my customers in a particular way. And then I’m
going to do some random sampling in each category to see if
the behaviour is unique to a particular class or if it’s
consistent across all classes of customers." And you can see
now where this is going in terms of how it’s becoming more
meaningful to informing the strategy of this master purchase
agreement.
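The step of classifying customers and then drawing a random sample from each class can be sketched as below. The `segment` field, the customer records, and the function name `sample_per_class` are assumptions for illustration. The transcript does not say how the customers were actually classified.

```python
import random
from collections import defaultdict

def sample_per_class(records, key, n, seed=0):
    """Draw up to `n` random records from each class, identified by `records[i][key]`."""
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return {
        cls: rng.sample(rows, min(n, len(rows)))
        for cls, rows in groups.items()
    }

# Hypothetical customers with a hypothetical classification.
customers = [
    {"id": 1, "segment": "enterprise"},
    {"id": 2, "segment": "enterprise"},
    {"id": 3, "segment": "enterprise"},
    {"id": 4, "segment": "midsize"},
    {"id": 5, "segment": "midsize"},
    {"id": 6, "segment": "small"},
]

samples = sample_per_class(customers, key="segment", n=2)
for segment, rows in samples.items():
    print(segment, [row["id"] for row in rows])
```

Comparing the same metric across the sampled groups then shows whether a behaviour is specific to one class or consistent across all of them.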
Kirill: I kid you not, you kept me on the edge of my seat this whole
time you were speaking. I was like, "What happened? What
did that executive sponsor say? What did you come up with?"
That is so cool. And it’s such a good walkthrough. Thank you
so much for walking us through this case study. It's like an
actual case study, a real life project, and how you got a very
ambiguous challenge. And again, as you said, through no fault of
the person that is requesting it. Sometimes that happens in
life, when the executives might have a gut feel for something,
and it's much better than having no feeling whatsoever, no
understanding, no intuition about what’s going on. So when
you’re supplied with this intuition or gut feel, that at least
guides you in the right direction. And then you were able to
break down such an ambiguous problem into simple steps
that you undertook.
Sometimes, like in this case, you don’t even have the ends in
mind. You know that you want to find something. You don’t
even know what it is. But still, every single step that you were
taking had its own end in mind. And that whole finding the
shapes of the data, understanding the skewness, and then
applying the 80/20 rule and seeing if it applies. I actually did
that myself in one of my projects, just check the 80/20 rule.
So things that you know should be true, just check if they’re
true or not, and that might give you some ideas. And also what
was very valuable, and I want to repeat this for the benefit of
our listeners, is working with stakeholders. Whether or not
the challenge is ambiguous, or if it’s defined, it’s so important
to go back. Even if it’s a very straightforward task like, "Within
three weeks we need this visualisation with this dashboard,
or these insights conveyed." It’s still so powerful to go back
and check with the stakeholder whether you’re 30% through
their project or you’ve delivered your first major insight. You
go back and check. And even if it’s what they wanted that
you’re delivering, they might change their mind. They might
see the insights and they might come up—together you might
come up with a much better way to solve the problem. Or you
might come up with even more valuable insights that they
have now decided that they don’t want those previous
insights. What’s going to be more valuable for their
organisation is a different insight. So would you agree with
that, Garth, that even if it’s not an ambiguous project, it’s
important to go back to the stakeholder and discuss with
them and work with them closely in solving this challenge?
Garth: I agree 100%. In fact, I’d extend it a little bit. So I think that
there’s—particularly if you’re in the early part of your learning
journey -- there's a tendency to kind of view yourself as the
person who looks at the data and reports the result. The value
of a data scientist is so much more than that. In this case,
having those discussions with the stakeholders ultimately
leads me to start thinking about the broader system and the
larger question and what other questions might need to be
answered and offering those into the discussion. And you’ll
see light bulbs going. So not every stakeholder that you have,
whatever level they’re at – director, VP, executive VP, CEO,
whatever – they may not be a data scientist and so as you’re
describing these results, you always have to think about how
to pose this back into the context of the business problem,
rather than just, "Look at this awesome regression," or "Look
at the linearity of this trend line."
Kirill: You’ve got to speak their language, yeah?
Garth: So a good example of that is—there’s another piece of data
that I was looking at where—and this isn’t particularly
proprietary. This is true for any business, really. You know,
you’re kind of looking at your distribution of margin to
burden, or to discount. You know, how much profit are you
making, and how much discount are you doing. And I saw a
little blip in the data where I’m like, "Okay, you know what?
There seems to be a pattern here where some of these
bookings, or margins, or profits are coming from kind of low
margin but high discount." And that’s kind of like a worst case
scenario. You don’t really want that.
But because I was thinking about it in the context of the
business problem, and I was going back to that stakeholder
and having this conversation, just through that discussion,
we started brainstorming and connecting the dots of the
system and saying, "Well, I guess on one point, we could look
at the data and just say, okay, this is not ideal. We should
immediately address this 'problem'." But then we thought,
"No, that may not be a problem. What if those are basically
kind of your cherry pick accounts?" You don’t have to spend
a lot of time as a salesperson doing anything to cultivate them,
they’re just kind of recurring, always there, in which case that
low margin/high discount, those dollars are golden. Don’t
touch them. You know, you’re not spending much time there
anyway.
Another part of the analysis, we saw areas where we had some
cluster of data points where there was higher margin and low
discounting, all part of the same dataset. And I thought, "Well,
let’s highlight this group of data points and talk to the reps
and find out." You know, what’s the same about those kinds
of accounts. Are those accounts a particular kind of profile,
whether it’s the industry, the size, the kinds of business
challenges that they’re facing, their customers. What’s the
same about those accounts? What’s the same about the
messaging and how the reps positioned products and services
to those accounts? Take that from those successful areas and
go run a short experiment. Go talk to sales in the "problem
area" and say, "Hey, look, this is what we’ve learned from
other areas. We’d like to run an experiment." You know,
maybe take the worst performing salesperson in that area.
They’d love to try an experiment to help their numbers, right?
Kirill: Yeah.
Garth: And maybe work out a deal with the executives saying, "Hey,
don’t let this guy go if he participates in our experiment this
quarter." And then see do those messages, do those profiles,
does that work in this particular area? If it does, then you’ve
just scaled data science knowledge and learning across your
organisation and had profound value impact.
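The two groups discussed above, low margin with high discount and high margin with low discount, can be pulled out of a bookings table with two cutoffs. This is only a sketch: the cutoff values, field names and sample bookings are hypothetical, and deciding what each group means is still the conversation with the stakeholder.

```python
from collections import defaultdict

def classify_booking(margin, discount, margin_cutoff=0.30, discount_cutoff=0.20):
    """Label a booking by where it falls relative to the two (assumed) cutoffs."""
    high_margin = margin >= margin_cutoff
    high_discount = discount >= discount_cutoff
    if not high_margin and high_discount:
        return "low margin / high discount"
    if high_margin and not high_discount:
        return "high margin / low discount"
    return "other"

def group_accounts(bookings, **cutoffs):
    """Collect account names under the label of each of their bookings."""
    groups = defaultdict(list)
    for booking in bookings:
        label = classify_booking(booking["margin"], booking["discount"], **cutoffs)
        groups[label].append(booking["account"])
    return dict(groups)

# Hypothetical bookings; margin and discount are fractions of list price.
bookings = [
    {"account": "A", "margin": 0.10, "discount": 0.35},
    {"account": "B", "margin": 0.45, "discount": 0.05},
    {"account": "C", "margin": 0.40, "discount": 0.30},
    {"account": "D", "margin": 0.12, "discount": 0.40},
]

groups = group_accounts(bookings)
for label, accounts in groups.items():
    print(label, accounts)
```

The accounts in the "high margin / low discount" group are the ones to profile and compare, as described above, before running an experiment elsewhere.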
Kirill: Very powerful. I totally love it. And just out of curiosity, and
for the benefit of those listening to the podcast, what tool do
you use for data exploration?
Garth: Probably for quick hits—I’m not opposed to just kind of
digging in using SQL.
Kirill: Well, that’s given your background in—
Garth: That’s my background. Although, and I should mention too,
as a data scientist, basic SQL is important because you’re
going to spend a lot of time transforming and cleaning up data
before you ever get to visualisation. But to answer your
question about visualisation — probably just because my day
to day isn't — we have Tableau, I do use Tableau. I have a
personal preference for Power BI. It is a growing tool. It itself
would not say that it’s a direct competitor today, but it
absolutely accomplishes the 80/20. It’s readily available for
free and I can accomplish my visualisations very quickly with
it so I’m a big supporter of that.
Kirill: Yeah. And from that I take that Tableau, Microsoft Power BI
or maybe some other organisation might have QlikView.
From that, what I was actually going at was that data
exploration doesn’t have to be complex. You don’t have to
know R, or you don’t have to use statistical tools to perform
data exploration. Data exploration, and this is one of my
favourite parts about data science, can be very visual. And it
should be very visual. So, tools like Microsoft Power BI,
Tableau, QlikView – their whole mission and their point is to
visualise the data, to help humans see the data and
understand the data and that’s what data exploration is
about.
And when you were giving that example of looking at the
shape of the data, that’s exactly it. It’s not about writing some
complex code in R to visualise the distribution, or even just
get statistical information about the distribution. It’s about
doing just a couple of drag and drops in one of these
visualisation tools and there you go, that’s your data
exploration. Personally, I think anybody can get into this
space, which is one of the most exciting spaces about data
science. You can get into it very quickly, and if you don’t know
where to start, if you don’t know where you want to start with
stats or R or Tableau or some other tools or databases, then
one of my favourite places as a recommendation to start is
these visualisation tools that not only allow you to visualise
data, but also allow you to perform data exploration.
On that note, I actually had an interesting question for you.
From what you’ve described so far, in NetApp it looks like
they’ve created this awesome culture and I’ve got a feeling a
lot of our listeners are going to look at job postings by NetApp
after this podcast so they should expect an influx of new data
scientists. But it looks like NetApp have created this amazing
culture where data scientists can grow and learn and
strengthen their skills. Would you say that you are getting
support from your organisation when you decide to undertake
a new algorithm or learning in data science or experiment
with something? Would you say that your managers and the
organisation as a whole supports these initiatives?
Garth: That question is so key. You aren’t able to be as successful
without the support of the organisation and it does take an
organisation approach. So you’ll start small. You’ll start in
your immediate area, and hopefully you can help your
immediate boss understand the value of data science, but
you’ve got to extend that out. So when I’m thinking about
solving a data science problem, I do have in mind that what I
ultimately deliver isn’t just about the pretty picture. I
absolutely have to tie it back to the business and how to make
it more efficient, how to make it more profitable, how to do
something that matters and affects that bottom line, and
ideally also ties to the goals of what the organisation is trying
to drive. In our case, we’re probably like a lot of companies.
Data science is such an explosive, in the best possible way,
area of growth, that we’re looking to say, "We’ve got pockets
of data science in the company. How do we bring that together
in kind of a standardised, cohesive way so that we’re all
benefitting from what’s being learned? A lot of problems are
similar, so there’s no need for everyone in the organisation to
reinvent the wheel on their own. Let’s look at what’s worked
in the past for those similar kinds of problems."
So in that sense it kind of ties back to what I mentioned before
and the idea of how you communicate the data science results
back in business problem language that executives and
sponsors understand will help them better see the value of
data science. And it will be a slow roll. And it can be also very
threatening. Please understand that too. When you start
finding patterns and data that don’t particularly favour
current behaviour you get a target on your back and all kinds
of anti—and this isn’t unique to NetApp. This is every
organisation I’ve ever been involved in, which is—you know,
people defend, they’re scared. "That’s new. That’s questioning
my skills and abilities to manage my business," whatever. And
I think you also mentioned this in another podcast, which is,
you don’t have to present it in a threatening way. It’s like,
"Guys, here’s an opportunity here, and it’s not so much that
this is pointing out something you’ve done wrong, but it could
absolutely free your cycles, whether your people or your time,
to go and be awesome in so many other areas that you don’t
even have time to explore right now."
Kirill: Exactly.
Garth: So that kind of framing really helps drive the kind of organic
growth of data science in an organisation.
Kirill: Fantastic. I was actually going to ask you what your
recommendation is for those who want to start learning data
science, get that buy-in from their organisation, but you
basically answered that question. You take it step by step.
And you make sure that you’re aware of the challenges and
the threats that that can pose to you, but you pose it in a
beneficial way. You always find the business value in the
activities that you’re doing. And it would be, for lack of a better
word, it would be just silly for a business not to leverage those
insights that you’re all of a sudden providing and—like, you
can learn data science in your own free time, especially if
you’re passionate about it, and then you could bring those
skills to your business, and sooner or later you’ll find ways
you can apply this knowledge.
And once you start driving value, once you position it in a way
that’s going to bring value to the business, the smart people
at the top that are driving this business are going to pick up
on it and they’re going to slowly start encouraging it. And
hopefully you can get them to create this culture where you’ll
be encouraged to further learn data science.
That was very interesting, and as we’re slowly wrapping up
this podcast, we’ve got some interesting questions. But before
we move on to those, I noticed on your LinkedIn—and I think
I want to talk about this because even though it’s not related
to data science, I think it’s quite important. You’ve highlighted
that you were a part of a volunteer experience and cause. You
were in the Comfort Zone Camp. You were a Big Buddy there
back in 2012. Can you tell us a little bit more about this
experience because I think giving back to the community is
always important and, you know, even if we spend just two
minutes talking about this it can be very valuable.
Garth: Oh, definitely. So Comfort Zone Camp is a local organisation
that’s basically a bereavement camp for kids up to the age of
18 who’ve lost either a sibling or a parent or other immediate
relative. We disconnect from technology, we’re out in the
woods and it’s a space to have some guided counselling with
professional counsellors to help with the grieving process. It
helps them feel comfortable to talk through and work through
some of those emotions. Oftentimes their parents are also
grieving, and they aren’t always able to give that support or
meet the needs of where the child’s at at that time. So this
camp helps provide that space to do it.
But to your other point too—that’s one way. Even in data
science, there’s plenty of opportunities to give back. I mean,
I’m always—it’s funny, it’s probably human nature to think,
"Well, I’m not so good at it; so what can I possibly contribute?"
Whatever knowledge you know, you have something to
contribute right now. Find somebody who wants to learn what
you know and help them learn it. Whether it’s provide the
direction, provide some feedback, whatever it is, the
opportunities are endless to give back. So I always, always try
to think of how can I help somebody else achieve what they’re
trying to achieve, whether it’s in data science or elsewhere.
Kirill: That is very powerful and I totally agree with that. So there’s
lots of ways you can give back to communities, to friends, to
families, to people in general. Thank you very much for
pointing that out and for your contributions. It’s very
inspiring to see somebody let—like, I can relate to you, you’re
so passionate about data science, and I can also see that
you’re a kind and nice, generous person, and you find the time
to give back. So thank you so much for being a person like
that.
Garth: My pleasure.
Kirill: All right. So moving back to our wrap-up questions, what
would you say—and this is going to be an interesting one—
what would you say is your one most favourite thing about
being a data scientist?
Garth: I would say it’s just that whole discovery process and—
actually, no, it’s twofold. It’s the discovery process, because I
do think data is—hopefully in reruns it was global, but we had
this cartoon back in the days called "Scooby Doo" and it’s like
this mystery cartoon. I was endlessly fascinated by it. They
were always solving mysteries and that to me is what data
science is. You’ve got this mystery, this problem, and it's
discovery. And it’s just fascinating to see well, what’s
possible? What’s not possible? The other half to me,
harkening back to my right brain side, is that—like, I use
sketch notes extensively to understand complex problems
and remember it. But when it comes to expressing of visuals
and getting those messages back to the execs, the visuals,
Tableau or Power BI or QlikView or all those create—that’s
one piece. But it’s a part of a larger presentation, and I’ll start
applying infographics to help facilitate the discussions
around those visualisations. And so I kind of get to marry art
and data science and logic and then this piece together. To
me that’s been kind of a dream come true in a sense.
Kirill: Fantastic. I can imagine how for right brain persons such as
yourself, how that’s that additional creative part. We talked
about that creative part within the actual data science, the
technology. But this other creative part, actually bringing the
insights to life and making them speak the language of your
audience, I can imagine how that would be super exciting as
well. And from where you are now, from what you are learning
about data science, or from how far you’ve gotten into the
field, from what you see now where do you think the field of
data science is going and what would you recommend to our
listeners to look into to prepare for the future of data science?
Garth: Based on what I’ve seen and what I’m getting a sense of, is it
feels very much like the late ’90s/early 2000s with the
technology boom that happened at that time. That was largely
centred around network administration and system
administration. That’s kind of where data science is to me
today in the industry. It is the thing. And what to me is a little
different about data science compared to that earlier time;
that earlier time is about the hardware or the software. Data
science, we have some tools but again, it’s really about that
way of thinking about problems. And being disciplined in
how you create little miniature, quick experiments to vet out
proof points of whether something is valid or statistically
meaningful.
That extends well beyond a tool. And well beyond a particular
timeframe. To me, it’s more like maybe personal computers
were back when they started. Can you really get sustainable
employment if you don’t have some basic computer skills
today? It’s going to be difficult. Data science to me is like that
going forward. It will be the differentiator between
organisations. You know, the organisations that have a deep
concentrated knowledge of data science skills are going to
have a competitive advantage. And whatever you’re doing,
even if you’re not a data scientist – maybe you’re a people
manager or whatever else – applying data science to what you
do will differentiate you from your peers. It ultimately always
affects the relationship you have with your customer. You’re
able to create a more authentic and provable relationship,
giving them exactly what they need based on the data, and
also driving the bottom line.
Kirill: Fantastic. Love it! Data science across the board. Everybody
needs to know data science to a lesser or greater extent,
depending on your role, depending on how deep you want to
get into it. But I totally agree. It’s a great analogy: it’s like
computers back in the 90s. You thought computers were this
chic thing that maybe some people would need to know, some
people wouldn’t. Kind of like a non-compulsory thing, and
people could use them if they wanted to, but now everybody
uses a computer. It’s natural, right? Same thing about data
science. I totally agree. Thank you very much, Garth, for
coming on the show and sharing all of this valuable
knowledge. If any of our listeners would like to contact you,
or follow you, or find you, or follow your career, how can they
do that?
Garth: LinkedIn is a good way to find me. I don’t yet have a thing like
a blog. Maybe as I feel like I have something to say, I’ll add
one, but for now LinkedIn is a good way to find me.
Kirill: For sure. We’ll include the URL to your LinkedIn. But
definitely I highly encourage you to start a blog. And any of
our listeners who are passionate about learning data science,
I think—as you said, teaching and sharing and giving back, it
will help you improve your skills. But it’s also a very valuable
thing to do. And if you ever start a blog, we’ll definitely include
it in the show notes and let our listeners know about it. One
final question for you today is what is your favourite book that
can help our listeners become better data scientists?
Garth: Harkening back to something I mentioned earlier again, I’m a
fan of Microsoft Power BI largely because it leverages what I
already knew about Excel and functions there, and adds a
little bit of SQL. And it’s just a powerful, inexpensive tool, so
I highly recommend a book called "Introducing Microsoft
Power BI" by Alberto Ferrari and Marco Russo. They’re so good
at being able to describe how to use that tool in a very clear
and easy to understand way. So I’m a big fan of that one.
Another book that I really strongly recommend, and this has
to do with the idea of data presentation, is called "Show Me
the Numbers" by Stephen Few. He does such a good job of
walking through, or actually helping you avoid the temptation
of just letting a tool tell you, "Well this is the visualisation."
No, it’s going to make some assumptions that aren’t
necessarily always right. Stephen Few walks you through,
why would you want to choose one visualisation versus
another type of visualisation based on the problem you’re
trying to solve? What are some of the artistic elements that
really matter around the use of white space? Or around the
use of colours? Or colour intensity to highlight different parts
of your data? Such a great book that I just can’t recommend
it enough.
Kirill: Fantastic. Sounds like a great book already. I can feel the
enthusiasm in your voice. So there we have it. We’ve got two
books: "Introducing Microsoft Power BI" by Alberto Ferrari
and Marco Russo, and "Show Me the Numbers" by Stephen
Few. So we’ll leave the links to those books in the show notes,
so we’ll definitely check them out and pick them up. I’m
personally very interested in Power BI after our conversation
yesterday and after our chat today, so I will put those onto my
"to read" list. And finally, again, thank you so much, Garth,
for coming on the show. Really appreciate you taking the time
to share this, and I’m so happy that so many of our listeners
are going to get value out of this, especially those who are just
starting out into the field of data science. I think this interview
has been a great inspiration to those people. Thank you so
much.
Garth: Oh, my pleasure. I hope so, and look for me in the community.
Kirill: All right. Talk to you soon. Bye.
Garth: Thank you, bye bye!
Kirill: So there you have it. I hope you enjoyed this episode and
probably you could tell how inspirational it was, how energetic
this episode was, and I’m sure you picked up lots of very
interesting skills. Personally, for me it was very motivational
to see somebody like Garth learning about data science and
just pushing the boundaries, always constantly finding new
materials, finding ways to improve his knowledge, and that
even inspires me to go and start learning more and more
myself. So definitely take this as an inspiration that people
are learning data science, people are finding ways to go about
it. It is a complex, broad field to get into, but it is possible to
take those baby steps one step at a time and learn those
things that you need to learn, set yourselves challenges and
goals, and you will get into it and you will become a successful
data scientist.
And don’t forget that you can get the show notes for this
episode at superdatascience.com/11, so just the number 11,
and there you’ll get the transcript for this episode, you’ll get
links to the books that we mentioned, and you’ll also be able
to follow Garth and get a link to his LinkedIn. And if you’re
listening to us on iTunes, then please rate this podcast. It’s
quite a new podcast, and any ratings that you can submit,
especially if you like the show, will be very, very beneficial. I
can’t wait to see you next time. Until then, happy analysing.