SDS PODCAST
EPISODE 231:
DATA VISUALIZERS:
THE
STORYTELLERS
OF DATA SCIENCE
Kirill Eremenko: This is episode number 231 with Data Visualizer,
Mollie Pettit.
Kirill Eremenko: Welcome to the SuperDataScience Podcast. My name
is Kirill Eremenko, Data Science Coach and Lifestyle
Entrepreneur. Each week we bring you inspiring
people and ideas to help you build your successful
career in data science. Thanks for being here today
and now let's make the complex simple.
Kirill Eremenko: Welcome back to SuperDataScience Podcast, ladies
and gentlemen. Super excited to have you on the show
today. Today we've got a very exciting, lively, and
energetic guest joining us for the episode, Mollie Pettit.
Mollie was one of our speakers at DataScienceGo
2018. She did a fantastic job. The audience totally
loved her presentation and what you need to know
about Mollie is that she is a data visualizer. A
professional data visualizer. And right now you might
be wondering why am I stressing that she's a data
visualizer and how is that different to a data scientist.
Well, in this episode you will find out exactly why and
how those two terms are slightly different.
Kirill Eremenko: Also in this podcast we will talk a lot about D3.js, a
JavaScript library for creating outstanding,
phenomenal, mind blowing visualizations for data
science projects. You'll find out exactly when to use
D3, exactly when not to use D3, and what are the
advantages and disadvantages of this tool. Mollie uses
D3 quite a lot.
Kirill Eremenko: Also in this podcast you'll get a case study. We'll
discuss one of Mollie's case studies, which is about
Illinois traffic stops: how police officers pull people
over, what kinds of biases may or may not exist there, and
how she went about exploring it. I will provide a link
where you can actually look at this project as you
listen to this podcast or after you listen to this
podcast.
Kirill Eremenko: And finally, we'll talk about using data science for good
and how Mollie participates in those projects and how
you can get involved as well. So, a podcast saturated
with lots of topics, lots of interesting things that we're
going to discuss. Can't wait for you to check it out.
Without further ado, I bring to you Mollie Pettit, a
professional data visualizer.
Kirill Eremenko: Welcome to the SuperDataScience Podcast, ladies and
gentlemen. Today I've got a super exciting guest on the
show with us, Mollie Pettit. Mollie, how are you doing
today?
Mollie Pettit: I'm doing great. How about you?
Kirill Eremenko: Doing well as well and how's Chicago these days?
That's where you are today, right?
Mollie Pettit: Yeah, that's right. Chicago is great. A little bit chilly
right now, as opposed to where you're at. It just started to
snow.
Kirill Eremenko: Wonderful. Wonderful. So, you haven't always been in
Chicago, right? You moved there a few years ago.
Mollie Pettit: Yeah, that's right. I moved here three years ago. Before
that I was actually living in Abu Dhabi for a few years
and before that California for grad school.
Kirill Eremenko: Wow. Such a crazy story. 'Cause we met at
DataScienceGo, and although I wasn't able to attend
your talk, I just watched it on the DataScienceGo
recordings, and you really have a crazy story, like how
you went into geology and then visualization and
things like that. So, I'm really excited to dive into this
and learn about it and share it with all our listeners.
Mollie Pettit: Sure.
Kirill Eremenko: Before we get started on that, tell us a bit about who is
Mollie Pettit. Like, what would you say to somebody
you meet for the first time? How would you describe
what you do professionally right now?
Mollie Pettit: Who is Mollie Pettit? Yeah. So, I am a freelancer. I do
data science and data visualization. I do a variety of
projects. Nowadays my focus is mostly on doing a lot
of interactive data visualization projects, but I still do
... but a lot of it sometimes involves analysis before the
visualization. Then others are just straight up data
science analysis projects. So, that's a lot of what I do.
Kirill Eremenko: Okay. Okay. Wonderful. So, you moved to Chicago for
your job, is that correct?
Mollie Pettit: I did, yeah.
Kirill Eremenko: Okay. So, do you mind sharing with us what company
do you work for right now?
Mollie Pettit: Sure. So, right now I actually just work for myself. I
started a one person LLC, which is not creatively
named Mollie Pettit, LLC for the moment. I actually
originally moved to Chicago to work with Datascope
Analytics, who I worked with for a couple years, a
data science consultancy, which has actually since
then been bought by, well, acquired by IDEO. So,
Datascope Analytics became the data science team at
IDEO and it's growing.
Kirill Eremenko: Okay. Okay. Cool. I didn't know that you started your
own business. How's that going? How's the Mollie
Pettit, LLC going?
Mollie Pettit: Yeah, it's going well. There's a few things that are
about to be live which are exciting and kind of more
projects coming up, about to be started. It's going well.
It's nice. It's enjoyable to have the freedom to kind of
have the hours that you want and work from where
you like. I like that.
Kirill Eremenko: In the meantime you've also been very busy attending
and speaking at conferences, right? How many did you
attend this year? It's crazy. Like, you were at
DataScienceGo, Data Science Salon, Tapestry. Is it,
what, tens, hundreds? How many did you attend?
Mollie Pettit: No. Not quite that high. I think I've been to maybe
three or four this year. And I'm about to go to another
one. So, I spoke at Data Science Salon in New York,
Data Science Salon in Miami, DataScienceGo. I feel like
maybe I'm missing something, but I can't remember.
Like, I spoke at Data Science Salon in LA, but that was
the end of last year. Then I'll be going to Tapestry in a
couple weeks. So, I'm looking forward to that.
Kirill Eremenko: Okay. Nice. Nice, nice. What inspires you to speak at
conferences? It obviously takes a lot of your time. Why
do you do it?
Mollie Pettit: I think there's a lot of really nice things about
speaking at a conference. I think one, it gives you the
ability to tell people about something that you're
working on, something that you're really excited about.
It gives you really good opportunity to meet a lot of
other people who are also in data science or in data
visualization or are looking to get into it. A lot of really
interesting conversations. You get to learn about what
other people are doing. Yeah, I think those are a lot of
the reasons that I enjoy. Also, there's that added
benefit of just getting to travel. Travel around.
Kirill Eremenko: Yeah, that's true. That's true. And kind of like what
you said, it broadens your horizons, helps you not just
think outside the box, but sometimes in order to think
outside the box we need some external input which is
already outside the box in order to like start thinking
like that.
Mollie Pettit: Sure. Yeah.
Kirill Eremenko: Cool. Well, I'm very excited to talk about, probably to
start our conversation with what we were debating
about just before the podcast: data science and
visualization. Are these the same thing? Or two
adjacent fields? I'd totally love and appreciate your
opinion on that. Can you share with us more about why
your position is that data science
and visualization are actually quite different areas, as
far as I understand?
Mollie Pettit: I think there's overlap and I think that I would say
that data visualization is an important part of data
science. But I think that when you start getting into
interactive kind of front end of data visualization,
which is a lot of what I do now, the reason that it is a
bit different is because it requires the use of different
tools and languages. For example, when I do data
science I tend to use Python, whereas when I'm doing
front end data visualization I use D3.js, which is a
JavaScript library.
Mollie Pettit: So, there are different languages being used and I think
that there's a lot of overlap. If there was a Venn
diagram, they would definitely cross in the center,
right?
Kirill Eremenko: Uh-huh. (affirmative).
Mollie Pettit: Because something that's really great about data
visualization is once you've done data science and you
have the interesting insights and you have these
things that you want to then get across to an
audience, which could be a massive public audience or
perhaps it's just an internal audience, data
visualization is something that can then be used to tell
that story really well. I think that having a data
science background is very helpful in doing data
visualization. But when you're doing data visualization
versus data science, you have just different focuses.
With data science you're trying to really uncover these
interesting insights, if you're doing EDA, for
example. Whereas with data visualization, you are
trying to display those insights in a way that it's very
easy to understand.
Kirill Eremenko: Gotcha. What's EDA?
Mollie Pettit: Oh, exploratory data analysis.
Kirill Eremenko: Uh-huh. (affirmative). Okay. Cool.
Mollie Pettit: All right.
Kirill Eremenko: That's okay. I actually also identify that visualization
can be used for two things. That you can use it for, I
call it visual data mining, VDM.
Mollie Pettit: Oh, for sure.
Kirill Eremenko: And the other thing is obviously presenting your
insights and creating these beautiful visualizations.
Mollie Pettit: Yeah.
Kirill Eremenko: And I like how in your talk you mention what D3 is
good at and before you describe what it's good at you
actually said what it's not good for. One of the things
it's not ideal for is when you want to do that
exploratory data analysis. When you want to
quickly put something together, identify what are the
insights, what are the trends. It doesn't have to be
attractive. It doesn't have to be super presentable. Just
get some quick insights from the data.
Mollie Pettit: Yeah. Yes, exactly. Like you said, there are multiple
reasons to do data viz and some of them are much
more tied into data science like using data
visualization for this exploratory aspect. Were you
wanting me to get into what D3 is and what it's not
good for and what it is good for?
Kirill Eremenko: That's a good point. Yeah. Let's do that, because I
think we've heard about D3 on the podcast before from some
guests, especially when Nadieh came on the podcast.
We're talking about Nadieh Bremer here. Yeah, give us a
guide. It looks like D3's a tool that's used most often.
Is that about right?
Mollie Pettit: Yeah, it's a really popular tool and there's a few
reasons for that. D3's a little bit more complicated. It
has a steeper learning curve than some other tools
that someone might use. For instance, people
sometimes might use a wrapper that will allow them to
still use Python to create some of these visualizations,
but the benefit of using D3 itself is that it is really
flexible and customizable and you can make these
visualizations do exactly what you want with a lot of
different interactions. Hover and click and various
things like that. So, it's extremely customizable. It lets
you tell the story that you want to tell.
Kirill Eremenko: I love D3 myself. I tried it back when I was at Deloitte.
We had the option of picking a tool for a project and we
didn't end up using D3, because it was too complex,
but nevertheless, my director and I decided to have
a challenge: who can learn D3 the best in, like, two or
three weeks. And we each had to come up with a
visualization. It was really fun. D3 is kind of like
working with the webpage.
Mollie Pettit: Yeah.
Kirill Eremenko: On a webpage you right click and we've all probably
done this back then. You right click and click "view
page source" and you look into the HTML and see the
tags and so on. So, D3 actually manipulates all of that
dynamically to place different objects on the screen
and so it's really cool because it's so structured. Even
though it's a programming library, it's structured
in the way that HTML is structured. I found it
fascinating. You're right, it has a steep learning curve,
but it's so fun to try to do that because instantly you
get feedback, right? You see a rectangle on your screen
and then all of a sudden it turns into a circle. It all
happens dynamically that whole library.
Mollie Pettit: Sure.
Kirill Eremenko: So, smooth. I like the smoothness of it.
Mollie Pettit: Yeah, it is. It's a steeper learning curve, but once
you've gotten over that hump, you're able to do so
much.
Kirill Eremenko: Yeah. That's true. True. When did you first encounter
D3?
Mollie Pettit: Actually, I have a question for you. Did you enter that
challenge and how did you do?
Kirill Eremenko: Oh, yeah. It was just my director and I and nobody
else wanted to join because it was too complex
apparently or something. He was visualizing some
client data about trains or something like that and I
was visualizing ... see, what I did was I took our team,
we had like 15 people on the team, or 12 or
something, and I got the data internally about the
billable hours: how many hours they're billing,
how many hours they're spending on training, and how
many hours they're spending on something else, like
admin work. And I put those into like ... and I called it
the Pie Factory, because I created a pie chart for every
person. And you could click on it and all this
information would pop up. You know, what clients
they've been working on, how much money they've
billed. It really put into perspective how
much money everybody's bringing into the business.
Kirill Eremenko: Personally, I think I won, because I finished mine on
time even though it was simpler than his. I finished on
time, but his was more complex and it was very ... also
had some cool dynamic visualizations in there. It was
great fun in there. This was something I found in your
talk very interesting. At the end actually, you got some
questions and one of the questions was: how do you
learn the tools? How do you choose what to learn? And
what you said was that you don't actually pick the
tools you want to learn; you pick the project you want
to do, like a pet project or a work project. Then
along the way you see what
tools you need to accomplish the task
at hand and you actually go and learn those tools as
you're doing the project. I thought that was amazing
advice.
Mollie Pettit: Yeah. I think really often people when they get a new
project or task that they're going to try to tackle they
think about, "Okay. Well, what do I know that can help
me tackle this?" But I think it's nice and better to go at
it in terms of what's the best way that this can be
tackled? Do I know how to do that yet? If I don't,
maybe is this a good opportunity for me to learn that
thing to tackle this problem?
Kirill Eremenko: Yeah. And you also mention in your talk that ... what
was that company, Datascope that you worked for?
Mollie Pettit: Datascope, yep.
Kirill Eremenko: Yeah, at Datascope they had this philosophy:
if you have a project, you need to use the best tool for
that project as opposed to a tool that might be good
enough that you know really well. So, even if you know
five tools that might be good enough, maybe you
should use the one that's the best. If you don't know
it, doesn't matter. Go learn it. I love that.
Mollie Pettit: Yeah. It's a good opportunity to learn it. That was
something I really enjoyed about working at that
company. I think that it's easy to have the other
mentality of I'm gonna do what I know and I think
working there really kind of got that out of me and got
me to a point where I felt way more comfortable being
like, "Oh, yeah. I don't know this thing. Let's figure it
out."
Kirill Eremenko: Yeah and that should be the mentality of a data
scientist, right?
Mollie Pettit: Uh-huh. (affirmative).
Kirill Eremenko: Like, constant curiosity. Anyway, let's jump back to
D3. So, what is D3? What does the abbreviation stand
for? What is triple D?
Mollie Pettit: Yeah, D3 stands for data driven documents.
Kirill Eremenko: Okay and what does that mean?
Mollie Pettit: Data driven documents. Data is, you
know, the data that you're going to be putting into
some sort of visualization. Documents is your web
document. So, your website. Driven would just be the
act of, I guess, putting that into the website. So, using
data to make stuff on the web.
Kirill Eremenko: Nice. Nice.
Mollie Pettit: Is basically what that means, yeah.
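As a rough illustration of "using data to make stuff on the web", here is a minimal sketch in plain Python (not D3 itself) that turns a list of numbers into SVG markup, one element per data point. The function name `bars_to_svg`, the sizing parameters, and the sample values are assumptions made up for this example.

```python
def bars_to_svg(values, bar_width=20, gap=5, height=100):
    """Build SVG markup with one <rect> per (non-negative) data value."""
    largest = max(values, default=0)
    # Scale so the largest value fills the full chart height.
    scale = height / largest if largest > 0 else 0
    rects = []
    for index, value in enumerate(values):
        bar_height = value * scale
        x = index * (bar_width + gap)
        y = height - bar_height  # SVG's y axis points down
        rects.append(
            f'<rect x="{x}" y="{y:.1f}" width="{bar_width}" '
            f'height="{bar_height:.1f}"/>'
        )
    width = max(len(values) * (bar_width + gap) - gap, 0)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">' + "".join(rects) + "</svg>"
    )


if __name__ == "__main__":
    # Hypothetical sample data: three values become three bars.
    print(bars_to_svg([1, 2, 4]))
```

D3 does this kind of data-to-element mapping inside the live page and keeps the elements in sync as the data changes; the sketch above only shows the one-way mapping from data to markup.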
Kirill Eremenko: So, when was the first time you encountered D3?
Mollie Pettit: I think the first time I encountered D3 was early on at
Datascope, actually. So, when I first [crosstalk
00:18:41].
Kirill Eremenko: Was it a project?
Mollie Pettit: No. So, when I first started at Datascope, they used to
have this set up where when somebody was new at the
company rather than going right onto a client project,
they would have an opportunity to do a pet project.
They would dabble, they would kind of slowly get
involved in client projects, but this kind of gave them
an opportunity to get settled and to learn something
new that they wanted to learn. So, when I first started
I decided to do a pet project that was a network app,
a web app that would be a network diagram of Star
Trek characters, because I am a Trekkie. So, I scraped
every single Star Trek episode transcript and movie
transcript and put together this app where people
could select any combination of episodes and movies
and hit "engage" and a network diagram would appear
using D3 that would show the connections between
the various characters in that selection of episodes
and movies.
Kirill Eremenko: Wow. Wow. Very nerdy.
Mollie Pettit: Very nerdy, yeah. And then once that diagram
appeared people could click on a node to focus on it
and have it highlighted and its connections and choose
particular characters they were interested in. So, it
was fun.
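One way a character network like this could be built is by counting how often two characters appear in the same scene. The sketch below is an assumption about the general technique, not a description of how the actual app worked: the function names, the scene format (a list of speaker names per scene), and the sample scenes are all made up for illustration.

```python
from collections import Counter
from itertools import combinations


def build_network(scenes):
    """Count how many scenes each pair of characters shares."""
    edges = Counter()
    for speakers in scenes:
        # Deduplicate and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(speakers)), 2):
            edges[(a, b)] += 1
    return edges


def to_node_link(edges):
    """Convert edge counts to a nodes/links dict, a common input shape
    for network diagrams."""
    names = sorted({name for pair in edges for name in pair})
    links = [
        {"source": a, "target": b, "weight": weight}
        for (a, b), weight in sorted(edges.items())
    ]
    return {"nodes": [{"id": name} for name in names], "links": links}


if __name__ == "__main__":
    # Hypothetical scenes, not taken from real transcripts.
    sample_scenes = [
        ["Picard", "Riker", "Picard"],
        ["Picard", "Data"],
        ["Riker", "Picard"],
    ]
    print(to_node_link(build_network(sample_scenes)))
```

Restricting the input to the scenes of the selected episodes and movies would give the "any combination" behavior described above.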
Kirill Eremenko: So, how long did that take you?
Mollie Pettit: The actual visualization part I'm not sure. The whole
project took a couple of months, but that was ... I
mean, I was not just doing that. There were other
things happening at the same time. That also though
involved a lot of things in preparing the data to be
visualized. Like the scraping of all the transcripts and
getting everything set up in such a way that it would
be usable in a visualization. So, there were a lot of
different steps for that project.
Kirill Eremenko: Gotcha. What I love about that approach is that by the
end of those, it sounds like quite a lot, a few months,
by the end of those few months, you have a
brand new skill. You might not be the expert at D3,
but you know that there's certain things that you
know how to do. Like, in three months you might be
70% up to speed or 80% up to speed with what D3 is
all about and how to use it. So, you build up so much
confidence in that time, wouldn't you say?
Mollie Pettit: Yeah. Yeah, for sure. It was definitely a great
introduction to D3 and also I mean, I hadn't even
actually done a huge amount of web scraping at that
point, so that also was a very good crash course in
that, because these were not straightforwardly set up
sites. They were very inconsistent. So, there were a lot
of exceptions to account for.
Kirill Eremenko: Okay. Gotcha.
Mollie Pettit: So, that was good to do. There's a lot of different things
that I had to do for this project, so I learned a lot along
the way.
Kirill Eremenko: You're kind of in both fields. You're in
data science and you've done data science work, and
you're in visualization. As I understand you're doing
more and more visualization work now.
Mollie Pettit: Yes.
Kirill Eremenko: Why the shift? Why did you decide to move away from
the data science, I guess the web scraping, the
algorithms and so on and move more into the space of
visualization?
Mollie Pettit: It's not because I don't enjoy data science, I do. And I
still enjoy that I still get to do it when I'm doing data
visualization projects sometimes and I like having the
occasional straight up data science project, but I think
the reason I like to focus on data visualization is
honestly I just find it really fun. I really enjoy creating
this ability to tell stories really well. An ability to
highlight things that are really interesting and also
coding when you're creating something in D3 for
instance, you know, you write a few more lines of code
and you hit "refresh" and you get to see this new thing
that you added. So, that's really nice too.
Kirill Eremenko: Yeah. More room for ... it's kind of like quicker
feedback. You get the results faster.
Mollie Pettit: Yeah. Yeah.
Kirill Eremenko: Rather than waiting a few months. Okay. All right.
Would you recommend this path to data scientists?
Maybe listeners who are tuning into this podcast who
are not yet sure if they want to do data science,
visualization, how would somebody make up their
mind of which way they want to go?
Mollie Pettit: Pick a project and do it. That's the best way I can ever
think of to figure out if you like something. I think that
if people really enjoy kind of the visual and design
aspect but still want to use some data science I think
in order to understand which way you want to go, you
really just have to pick some projects and do them. I
think that's how I learned what direction I wanted to
go every kind of step of the way is I just kept doing
things. I kept learning new things and once I started
kind of getting into D3 and visualization I realized I
really loved it. I started ... well, I, while still at
Datascope, started asking to be on more visualization
projects and by doing more and more of them I
realized I just really liked that and I kind of started
focusing more on that direction. I think the way to
know if you like something is to do it.
Kirill Eremenko: Gotcha. I can see that D3 and from my experience with
it and from the visualizations I've seen ... there's, by
the way, there's a really cool library by Michael
Bostock. It's called, what is it called? Blocks. Bl.ocks.
Or something like that?
Mollie Pettit: Oh, yeah. The website. Yeah.
Kirill Eremenko: Blocks.org. Like that, but it's like bl.ocks.org or
something like that.
Mollie Pettit: Yeah.
Kirill Eremenko: We'll put it in the show notes. There's some really
amazing D3 visualizations and templates that you can
use and copy and adjust and just explore, all open
source. So, I can see that D3 is way ahead of other tools
in terms of capabilities. Like, even Tableau,
which I love dearly, great tool, but it's more agile. It's
more drag and drop. It allows you to create
visualizations fast, but at the same time, even
though it has a lot of flexibility, it's nowhere near what
D3 offers. The price you pay in D3 is you have to code.
You have to design your visualization -
Mollie Pettit: Right, yeah.
Kirill Eremenko: - very carefully. So, what I want to ask you is, what do
you see in the future? Do you see that D3 has a
future? It's been around for a couple of years and it's
had a really interesting path, but do you see other
tools edging it out and more people moving to tools like
Tableau and more drag and drop, self-serve analytics
type of tools? Or do you see that there is a market,
there's a place for a more sophisticated tool like D3 in
the space of data visualization?
Mollie Pettit: Yeah. I think that there's room for both and I think
they have different applications and different reasons
to be used. Like you said Tableau is really great and
something that's nice about it is you don't have to
learn a whole language. Yeah, you don't have to code.
You can very quickly make some really beautiful
things. Because you're not actually coding though, you
have less control. So, if you're trying to do something
very complex, you may eventually kind of hit a
roadblock and hit the end of the capabilities of being
able to customize the way you want. D3 is more
complicated to learn and is harder to learn, but it is
much more customizable and flexible and you are able
to customize things in the way that you want. You
don't really hit these roadblocks that you might hit
with Tableau.
Mollie Pettit: So, I think that they both are very great and they have
different strengths and different weaknesses. So, I
think they're both going to stick around.
Kirill Eremenko: That's good, because in one of the previous podcasts
one of the guests made a good comment that it's
also important to understand what the future of a
tool is before you go and learn it. You know, like is this
tool going to be around?
Mollie Pettit: Sure.
Kirill Eremenko: And by the sound of it, D3 is going to be around. But
by the way -
Mollie Pettit: Yep. That's how I -
Kirill Eremenko: - how is the community of D3?
Mollie Pettit: Sorry. How's the community?
Kirill Eremenko: Yeah, is there a community in D3? People, like when
you have a question or somebody has questions, do
they post it online and is it easy to get answers and
help and guidance?
Mollie Pettit: Oh, yeah. That's a good question. So, one thing that's
really nice is Bl.ocks, which you've mentioned.
A lot of times if you have something that you're
trying to make, especially when you're first starting,
you can often find an example for it in Bl.ocks. So,
what Bl.ocks is really nice for is you not only get to see
this interactive visualization right in front of you, but
the whole code is right below it. There's also, let me
make sure I have this right, blockbuilder.org. And
something that's nice about blockbuilder.org is you
can access any of the posts that are posted on Bl.ocks,
but it allows you to, right there, edit them, and what
that's good for is -
Kirill Eremenko: Nice.
Mollie Pettit: Yeah. What that's good for is, let's say you're looking
at some code and you're like, "Hm. I'm not sure exactly
what this line does." And you can edit it and see if you
break it or see if the color does change. You know, you
can do things straight in there to very quickly get an
understanding of what things are doing. So, that's
really nice and then also I don't know if you've
actually, have you heard of Observable?
Kirill Eremenko: Nope. No, I haven't heard of it. What is that?
Mollie Pettit: So, Observable. It's kind of like a Jupyter Notebook,
but for D3.
Kirill Eremenko: Oh, nice.
Mollie Pettit: So, Observable is a website and it was also started by
Mike Bostock. But yeah, it has that kind of set up
where you can easily kind of like tell a story, but then
within that story have code and have a working,
interactive visualization in the middle of it. Very much
like a Jupyter Notebook, but specific for kind of front
end interactive stuff.
Kirill Eremenko: Wow.
Mollie Pettit: Yeah, so I think that there are some, like you can
definitely find some D3 answers on Stack Overflow,
but I think something that's really nice about D3 is
you can also just find a lot of examples. So, even if you
can't necessarily find someone who's asked the same
question, you can probably find someone who's done
the thing you're trying to do.
Kirill Eremenko: Gotcha. Gotcha.
Mollie Pettit: Yeah.
Kirill Eremenko: There's even a conference in San Francisco about D3,
right?
Mollie Pettit: There is, yeah. There's D3.unconf. The last one was
last September. It didn't happen this year. But it will
... I'm pretty sure it's gonna be happening next year.
I'm not involved in planning that, so I don't have
specific details. As far as a community, there's also a
D3 Slack that I'm a part of that has upwards of, let's
see, I'm looking at it now, about four thousand
members. There's a help section in there. So,
sometimes people will post in there and say, "I'm
trying to do this thing, but it's not working. How do I
do it?" And people will respond there.
Kirill Eremenko: Gotcha. Gotcha. What's an unconference, by the way,
while we touch on this?
Mollie Pettit: That's a good question. I can tell you a little bit about
what it was. So, at least this particular
unconf wasn't full of talks. There were only one or
two talks. I believe they were done by
Nadieh Bremer as well as Sarah Drasner. Those
were at the very beginning of the unconf. The rest of it
were these discussion sessions where there would be
maybe four different discussions going on at the same
time and you would choose a room to go to and you
would discuss that topic. Sometimes that would
involve someone being at a computer and kind of
pulling up things that people were talking about that
were either D3 related or just visualization related. It
was just kind of these guided conversations and a way
for people to kind of meet other people who were doing
a similar thing. So, it was less talks and more
discussion.
Kirill Eremenko: Wow, interesting. And how big was this discussion?
Was it like hundreds of people?
Mollie Pettit: Not in each discussion, no. I'm not even ... I'm trying
to think how many people were there total. Probably
within a couple hundred total and each discussion
probably had upwards of 50 or so people in it.
Kirill Eremenko: Interesting. Interesting. I first heard about unconferences
from Pablos Holman, who was at DataScienceGo
as well. I have never been to one, but I find it's quite an
interesting concept. I gotta check it out.
Mollie Pettit: Yeah. I really enjoyed it. It was my first unconf, but it
was great.
Kirill Eremenko: Okay. All right. Cool. Well, thanks for that overview of
D3 and the future. I hope all the listeners are pretty
excited and I can personally vouch for it. It's a really
fun experience. I don't use it anymore, but what I
learned in the process of learning it really was
fascinating and helped me even improve the way I
understand websites. The way I understand
interactivity and what's possible with visualization.
Mollie Pettit: Yes. It definitely improves that knowledge, for sure.
Kirill Eremenko: And next I wanted to talk a bit about the case study
that you shared with us at DataScienceGo.
Mollie Pettit: Oh, sure.
Kirill Eremenko: The case study of Illinois traffic stops. I found it very
interesting how, like, police officers pull people over, and
you were actually investigating whether there's bias,
specifically racial bias, in how police officers pick the
cars that they pull over, the cars that they search, and
then the citations that they hand out. That was a really
cool project. How did that all start?
Mollie Pettit: The way that started was I went to a meeting and I
don't remember what the meeting was called. This was
I think a little over a year ago. This meeting was for
people in tech who wanted to use their knowledge and
use what they could do to help in some way. So, the
people that were at this meeting were people in tech
who wanted to find some way to volunteer and help
out and then also organizations that wanted that help.
Mollie Pettit: So, at that meeting I ended up meeting Karen Sheley
or Shelley. I would like to check on that. So, at that
meeting I met Karen Sheley who works for the ACLU.
She had mentioned that she really needed or at least
really wanted to have some sort of a data contact,
because they were trying to put together a traffic stops
report that would just go through the analysis of this
traffic stops data. Who police are pulling over and
searching and citing, et cetera, in different law
enforcement agencies across Illinois. What they were
really just looking for is somebody that they could call
on for help. Like, if they had questions about the
analysis or the data. I was there with a colleague and
we were like, "We can do more than that. We can help
with the analysis." The people who were doing the
analysis at the time, it was mostly some simple Excel
stuff that was being done. We wanted to kind of help
them do something more complicated with this so that
they could have an even more in depth report.
Mollie Pettit: So, we worked with them to do this analysis and look
at the search rates, et cetera across different agencies.
Then it eventually evolved and I started working with
them on a website that would walk people through this
analysis that had been done and they could look at
these data visualizations that would be interactive.
They could choose different agencies. They could click
on things and get more information and it would really
tell the story of what these racial disparities in traffic
stops look like in different agencies.
Kirill Eremenko: Gotcha. You mentioned that the website at the time
when we were recording this is not yet live, but it's
about to go up. So, by the time recording is live, it's
definitely out there already. What's the website? Where
can people go maybe right now while they're listening
to this podcast?
Mollie Pettit: Yeah. If you go to illinoistrafficstops.com, you'll be able
to find it.
Kirill Eremenko: Nice. Nice. So, it's similar to the visualizations that you
shared at DataScienceGo, right?
Mollie Pettit: Yes. I shared some of the visualizations at
DataScienceGo. Yeah, I think I had a bit of a Chicago
focus, but on this particular website you can look at
any of the agencies.
Kirill Eremenko: Okay. Fantastic. All right. So, that's how you guys met
and that's what you helped them or decided to help
them out with. So, how did the project go? So, like you
got this idea, then what happened? Like, was this part
of ... you obviously had a job at the same time. So, this
was like a free time project that you were doing?
Mollie Pettit: Yeah. So, when I first started it was. It was a free time
project. It was something I was doing when there was
time. But actually something that was really nice is I
was able to incorporate it into Datascope at the time.
So, as a consultancy, sometimes there is downtime,
right?
Kirill Eremenko: Uh-huh. (affirmative). Yep.
Mollie Pettit: Sometimes you just finished a project and you're
gonna start another project in a week and you're
waiting for that to start. So, I convinced everyone there
that we kind of bring this in internally and when
people had a down week if they wanted to work on
this, they could. So, for a little bit it was kind of an
internal project at Datascope. That was really great,
because then we were able to utilize this time that
would have been downtime anyway to do something
that we thought was really exciting to work on and
important. After the acquisition, one of my former
colleagues at Datascope and I kind of kept up with it.
Chris Kucharczyk. So, him and I have been the main
people kind of working on it this past year. Then more
recently a good friend of mine, Alex Alleavitch came on
as a front end engineer.
Kirill Eremenko: Okay. Gotcha. All right. So, now we've got the picture
painted and this is our... Super excited and impatient
to find out what is this project all about, how did it go.
So, tell us the starting point of the project. What kind
of data do you have? Where does it come from? And
then we'll go from there.
Mollie Pettit: Sure. Yeah. So, the data that we have is whenever a
person is pulled over in Illinois, the law enforcement
officer is required to fill out a form and that form
details information about who was pulled over, what
was their gender and race, information from their
driver's license, why was that person pulled over. Once
they were pulled over, did the officer search that
person? If that person was searched, was contraband
found or not? Then what was the result of that stop?
Was that person cited? Or given a verbal or written
warning? So, that's the data that we're working with.
What the data looked like raw was one, you know, line
of data for each stop that occurred.
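As a rough illustration of the "one line of data for each stop" layout described above, a single record might be modeled as below. The field names and example values are hypothetical and are not taken from the actual Illinois form or dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StopRecord:
    """One row of raw data: a single traffic stop (all field names are hypothetical)."""
    agency: str                       # law enforcement agency that made the stop
    driver_race: str                  # race as recorded by the officer on the form
    driver_gender: str                # gender as recorded on the form
    reason: str                       # why the driver was pulled over
    searched: bool                    # whether the officer conducted a search
    contraband_found: Optional[bool]  # None when no search took place
    outcome: str                      # e.g. "citation", "written warning", "verbal warning"


# A made-up example row, for illustration only.
example = StopRecord(
    agency="Example PD",
    driver_race="White",
    driver_gender="F",
    reason="moving violation",
    searched=False,
    contraband_found=None,
    outcome="verbal warning",
)
print(example.outcome)
```

Keeping `contraband_found` optional separates "searched, nothing found" from "never searched", which matters once rates are computed only over searched stops.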
Kirill Eremenko: Uh-huh. (affirmative). Uh-huh. (affirmative). Gotcha.
And just to clarify, the officer had to guess the gender,
the race of the person.
Mollie Pettit: Gender would be on the driver's license, but the race
they needed to guess, yeah.
Kirill Eremenko: Uh-huh. (affirmative). Okay. Gotcha. So, then you
visualize that. Unfortunately, we can't share the
visualization on the podcast, but we'll include a link to
the website, the illinoistrafficstops.com, is that right?
Is that the URL?
Mollie Pettit: Yes, illinoistrafficstops.com. Uh-huh. (affirmative).
Kirill Eremenko: Yeah, we'll include a link in the show notes and people
can check it out there. But basically you have this
visualization of what different races the police officers
would stop, and where do you go from there?
Mollie Pettit: The first thing that we looked at was who was stopped.
We didn't end up focusing on a stop rate metric
though, because there's a few reasons and I kind of
talked about this in the talk, but some of the reasons
why we decided not to do that was because it's not a
metric that's very accurate, because if you were going
to do a stop -
Kirill Eremenko: Tell us first of all, what is a stop rate? Like, I found
that part of your talk very interesting, 'cause that's the
first thing I would jump at, right? You're thinking
through all these reasons that you mention just now,
the stop rate is indeed the first thing that comes to
mind. So, what is a stop rate? And then why did you
decide not to go with that part?
Mollie Pettit: Sure, yeah. So, the stop rate would refer to the metric
calculated by dividing a race's stopped population by
its driving population. So, of the drivers of a particular
race, how often are they stopped is the stop rate. And
... oh, sorry. Go ahead.
Kirill Eremenko: So, for instance, if you have let's say, I don't know,
let's say you have a hundred thousand white people in
a city and over that period of time, over a year, or
whatever period of time you're looking at, if ten
thousand white people are pulled over by police, then
the stop rate would be ten percent. Ten thousand
divided by a hundred thousand. But if you have let's
say 50 thousand African American people in the city
and they were also stopped ten thousand times, then
the stop rate there would be greater, it would be 20%.
Ten thousand over 50 thousand. Is that right?
Mollie Pettit: Sure. Uh-huh. (affirmative).
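The arithmetic in that example can be written out as a small sketch. The function name is an assumption made for illustration, and the numbers are the hypothetical ones from the conversation, not real data.

```python
def stop_rate(stops: int, driving_population: int) -> float:
    """Stops of one group divided by that group's (assumed) driving population."""
    if driving_population <= 0:
        raise ValueError("driving_population must be positive")
    return stops / driving_population


# Hypothetical figures from the conversation above.
white_rate = stop_rate(10_000, 100_000)
black_rate = stop_rate(10_000, 50_000)

print(f"{white_rate:.0%}")  # 10%
print(f"{black_rate:.0%}")  # 20%
```

The denominator is the weak point: the calculation only means something if the driving population of each group is actually known.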
Kirill Eremenko: Okay. So, that's your stop rate. But this is the part I
found really interesting. It's not the best metric,
because we unknowingly actually make some
assumptions about these two data sets by calculating
the stop rate. Can you tell us about these assumptions
we make? Once you uncover them in the video I was
like, wow, indeed this is true. That does make sense
why it wouldn't be so accurate. So, what would you
suggest are the assumptions?
Mollie Pettit: In the talk that I gave, one of the first things I did was
I kind of show the stops demographics of Chicago and
then compare that to the stops demographics, or
sorry, the population demographics of Chicago and
show the differences there. So, what people often want
to do is they want to take the population of a city and
they want to assume that that's the driving population
and then create a stop rate from that, but there's a few
issues with that. One is that you don't actually know
what the driving population is of a city. You don't
know who drives to work. Maybe some people drive
much further to work or take the train or walk or
maybe people are driving through other cities in order
to get to work. So, the driving population through a
city might be very different or like a town, I think
that's a lot more relevant for small towns that the
people who are actually driving through that town,
that population might be different than the town itself.
So, comparing those two things isn't all that accurate,
because you don't really know what the driving
population was. So, that's one.
Mollie Pettit: And then another thing that was kind of an issue is
that on the traffic stops form, the traffic stops form
and the census are a bit different. So, on the traffic
stops form, Hispanic/Latino is listed as a race along
with Black, Asian, White, et cetera. Whereas on the
census form Hispanic/Latino is listed as an ethnicity
and then races are separate. So, you choose one, are
you Hispanic, Latino, or not and then also what's your
race. So, that makes comparing these two forms
tricky.
Mollie Pettit: Then another thing is that when someone's filling out
the census they are self reporting, whereas an officer
who has pulled somebody over is making an educated
guess of the race of that person. So, there's a lot of
things that makes it hard to compare this data for an
actually accurate metric.
Kirill Eremenko: Gotcha. Makes sense. That's very, very insightful. And
so what did you do instead?
Mollie Pettit: Yeah, so instead what we decided to do was to focus
on after a person was already stopped, what
happened? So, a big focus is looking at the search
rates. So, of all of the stops that involved Black
drivers, what's the percentage of those stops that
resulted in a search? So, looking at that, you can
compare what are the search rates for each race, how
do the search rates of Black and Hispanic drivers
compare to that of White drivers in that particular
agency. That's where you can I think get a much more
accurate read on various disparities, racial disparities
within the data.
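A minimal sketch of that search rate calculation, using a handful of made-up rows for a single agency. The row layout and the function name are assumptions for illustration and do not reflect the project's actual code or data.

```python
from collections import defaultdict

# Made-up rows for one agency: (driver_race, was_searched)
stops = [
    ("Black", True), ("Black", False), ("Black", False), ("Black", True),
    ("White", False), ("White", False), ("White", True), ("White", False),
    ("Hispanic", True), ("Hispanic", False),
]


def search_rates(rows):
    """Share of stops that resulted in a search, per recorded race."""
    totals = defaultdict(int)
    searched = defaultdict(int)
    for race, was_searched in rows:
        totals[race] += 1
        if was_searched:
            searched[race] += 1
    return {race: searched[race] / totals[race] for race in totals}


rates = search_rates(stops)

# Express each rate relative to the White search rate in the same agency.
ratios = {race: rate / rates["White"] for race, rate in rates.items()}

for race in rates:
    print(race, rates[race], ratios[race])
```

Because the denominator is the number of stops rather than a census population, this metric avoids the driving-population assumption discussed earlier.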
Kirill Eremenko: Uh-huh. (affirmative). Uh-huh. (affirmative). Okay.
Gotcha. And then you actually developed another
metric which is to do a benchmarking, right?
Mollie Pettit: Yeah, that's right. Uh-huh. (affirmative).
Kirill Eremenko: Tell me about it.
Mollie Pettit: Yeah, so ... oh, go ahead.
Kirill Eremenko: Pardon. No, just tell us how that works, if you don't
mind.
Mollie Pettit: Yeah. So, a common critique of the application of this
text is that the rate at which drivers are searched.
Some people think, "Well, maybe that's not a good
indicator of bias." Perhaps an officer in his line of work
has noticed particular trends that causes him to
search a particular group of people more. So, then he
would just be doing appropriate police work, because
he's using his experience to inform his decisions. So
what we also did is we looked at what are the search
hit rates for various races. And what I mean by a hit
rate is, was contraband found or not. And what we've
found by looking at hit rates, is that in general across
agencies there were very few agencies where there was
a significant difference, like a statistically significant
difference between the hit rate of White drivers and
minority drivers. In the cases where there were
significant differences, it was often that the minorities
had a lower hit rate than the White drivers.
Mollie Pettit: So, in Chicago, if you're looking at consent search
rates, Black and Hispanic drivers are searched about
three times more than White drivers. But if you then
look at the hit rates, Black drivers actually have a
lower hit rate and Hispanic drivers is about equal, but
neither of them are actually that significantly different
than the White hit rate. So, their search rates are
much higher, but the hit rates are not.
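To make the distinction between the two metrics concrete, here is a small sketch with invented counts. They are chosen only to mimic the pattern described (a search rate about three times higher, a hit rate that is not higher) and are not the Chicago figures.

```python
def rate(numerator: int, denominator: int) -> float:
    """Simple proportion with a guard against an empty denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


# Invented counts for illustration only.
counts = {
    "White": {"stops": 1000, "searches": 20, "hits": 6},
    "Black": {"stops": 1000, "searches": 60, "hits": 15},
}

summary = {
    race: {
        # Search rate: of all stops, how many led to a search?
        "search_rate": rate(c["searches"], c["stops"]),
        # Hit rate: of all searches, how many found contraband?
        "hit_rate": rate(c["hits"], c["searches"]),
    }
    for race, c in counts.items()
}

for race, s in summary.items():
    print(f"{race}: search rate {s['search_rate']:.1%}, hit rate {s['hit_rate']:.1%}")
```

The two rates use different denominators: stops for the search rate, searches for the hit rate. That is why a group can be searched far more often without contraband being found more often.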
Kirill Eremenko: Uh-huh. (affirmative). Gotcha. And I was very
impressed and I think this is something that we need
to all do more of that in your visualizations you
actually presented statistical significance. I think you
came up with a very eloquent way to do it. You just
make something more transparent, like a data point or
a part of the visualization more transparent, less opaque
if it's not statistically significant or if it's less
statistically significant than the other dots.
Mollie Pettit: Yeah.
Kirill Eremenko: That seems really clear. How did you come up with
that idea?
Mollie Pettit: You know, that was something we were wracking our
brains with for a while. We were realizing that being
able to show the statistical significance would be really
important in this, because if you're not showing what's
significant and what isn't, you're only telling a part of
the story and it can lead also to making conclusions
that aren't quite right, because you're assuming that
all of these are equally important. So, over time we
kind of came up with this idea of just trying out using
opacity. So, yeah, as you said, things that are
statistically significantly different. So, if you're looking
at a plot, if a rate for that particular race is statistically
significantly different than the White rate for that
particular agency, it'll be fully opaque and otherwise
it's gonna be a lot lighter, a lot more transparent.
Kirill Eremenko: So, what's -
Mollie Pettit: As soon as we implemented it and we could see what it
looked like, we're like, "Ah, this is it."
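The opacity idea can be sketched as a simple mapping from a test result to an opacity value. The 0.05 threshold and the 0.25 faded value are assumptions for illustration; the conversation does not say which values the site uses. In a D3 chart, the returned number would typically be applied as each mark's opacity.

```python
def point_opacity(p_value: float, alpha: float = 0.05, faded: float = 0.25) -> float:
    """Return 1.0 (fully opaque) when a rate differs significantly from the
    White rate for the same agency, and a lighter value otherwise.

    alpha and faded are illustrative defaults, not values from the project.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    return 1.0 if p_value < alpha else faded


print(point_opacity(0.01))  # 1.0
print(point_opacity(0.40))  # 0.25
```

A two-level scheme like this keeps every data point on the chart while steering the eye toward the comparisons that are actually supported by the data.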
Kirill Eremenko: Yeah. It's a great technique I think. I think it's a good
tip as well for our listeners to take away. Once they see
your visualizations they will be convinced that that's
one of the best ways. What would you say -
Mollie Pettit: Thank you.
Kirill Eremenko: Thank you. What is the test that you use for statistical
significance? Let's talk a bit about that, because a lot
of data scientists, especially if you're starting out don't
even consider the importance of doing statistical
significance tests.
Mollie Pettit: Sure. So, we used the Z test for two population
proportions. That's what it's called.
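For readers who have not met it, here is a minimal standard-library sketch of a pooled two-proportion z test. This is the textbook form of the test, not necessarily the exact implementation used in the project, and the counts are invented. The test relies on a normal approximation, so it is only trustworthy when both groups have reasonably large counts.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Two-sided z test for the difference between two proportions.

    x1/n1 and x2/n2 are the observed proportions (e.g. searches / stops).
    Returns (z, p_value) using the pooled standard error.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # no variation at all, so no detectable difference
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value


# Invented counts: the same 50% vs 25% rates, with very different sample sizes.
z_small, p_small = two_proportion_z_test(1, 2, 250, 1000)      # only 2 stops
z_large, p_large = two_proportion_z_test(100, 200, 250, 1000)  # 200 stops

print(f"small sample: z={z_small:.2f}, p={p_small:.3f}")
print(f"large sample: z={z_large:.2f}, p={p_large:.2g}")
```

With identical observed rates, only the larger sample yields a small p-value, which is the reason a rate based on a couple of stops should not be read as evidence of a difference.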
Kirill Eremenko: Okay. And so in a nutshell, what does it allow you to
do?
Mollie Pettit: It allows, oh gosh, let's see. I haven't had to talk about
this.
Kirill Eremenko: Just in short, why do you need to do a statistical
significance test? What is the risk if you don't do one?
Mollie Pettit: Oh, sure. So, one of the things that we're showing in
this visualization is we're comparing the rates of two
races. We're comparing the search rates of Black
drivers by this agency versus White drivers. But let's
say you're looking at a town and only two people were
pulled over or only two Black drivers were pulled over.
Those rates are going to be less significant, because
there's not enough data. There's not enough
information. If you pull over two Asian drivers and you
search one of them, that means 50% of the Asian
drivers in that city were searched. That's high.
Kirill Eremenko: Yeah. Yeah.
Mollie Pettit: But when you realize, only two people were pulled
over, like that's not a statistically significant
comparison.
Kirill Eremenko: Gotcha. Gotcha. That's a great example. So, basically
it shows you need more data. Like, there's not enough
data to make conclusive or any statistically significant
conclusions from that [crosstalk 00:52:02] to derive
any conclusive results.
Mollie Pettit: Sure, because that number is still a valid number,
right? It's still exactly the rate that is existent. It's just
not enough to say that there is a difference when
you're comparing it to the other rates.
Kirill Eremenko: Uh-huh. (affirmative). Totally, totally agree. It's cool to
see somebody in the space of visualization doing that,
because sometimes even practitioners in the space of
like machine learning don't do that. I've seen models
being deployed that haven't been checked for
statistical significance. Whereas in visualization it's
even easier to forget about that. So, it's a great ...
you're leading by example so other people... Even
when you're doing visualization it's important to test
these things.
Mollie Pettit: Yeah. Thanks.
Kirill Eremenko: Okay. So, another thing. So, with this bias, right? I
liked what you said in your presentation that you're
not doing this to point fingers at people and say,
"You're biased." Or "You're biased."
Mollie Pettit: Right.
Kirill Eremenko: Sometimes this bias happens unconsciously or
subconsciously and by looking at the data, because
this is like an important ethical consideration, right?
Mollie Pettit: Uh-huh. (affirmative).
Kirill Eremenko: While looking at that data, we can at least shed light
on this bias and people become more aware of things
they might be doing unconsciously. I think that was a
very nice way of putting it. That data science isn't here
to shame people or here to cause, provoke people to
more conflict. It's here to point out what is the state of
things. Let's shed some light on -
Mollie Pettit: Yeah, exactly. Exactly. Like, what does the data
actually say? What is actually happening? Yeah,
exactly. Exactly what you said. The whole purpose is
not to point fingers. The purpose of doing the analysis
and doing the website, we're just really hoping it's
going to act as an informational tool both for the
public, but also I'm hoping that officers at agencies
across Illinois might look their own agency up and if
there are disparities in the data then they might think
about why that is and how maybe they can fix it. I
think it's a really helpful tool just to bring these
disparities to light so that the law enforcement
agencies of Illinois can make informed improvements
in their agency.
Kirill Eremenko: Yeah. Totally, totally agree. We're getting close to the
end of the podcast. I want to kind of leave this thought
with our listeners, a quote that you mentioned in your
talk. I don't know if you actually had this thought
written down, but it came out really well and you said,
"It's hard to fix problems when you don't know what
the problems are and it's hard to know what the
problems are if you don't have the data."
Mollie Pettit: Uh-huh. (affirmative).
Kirill Eremenko: I think that was really cool. So, in general racial bias is
something we want to fix and it's a problem, right? But
you can't really know what's ... sometimes these things
happen, sometimes we don't know the details of these
things. You can't know the problem in full unless you
actually go and analyze the data, which I think you've
done quite successfully with this project of yours.
Mollie Pettit: Thank you.
Kirill Eremenko: The Illinois traffic stops. Do you have any plans on
doing any more similar projects where, you know, like
pet projects where you help organizations that need,
that need to use data to do good in the world?
Mollie Pettit: Yeah. Honestly, I would love that. I would love if I
could spend the majority of my time on projects like
this. I don't know that I will ever be able to be
spending all of my time on projects like this, because
they don't always pay. This was mostly a volunteer
project. It's something that I just really wanted to do. It
started out kind of on the side and at some point I
decided to take a break from work and just focus on it
for a month and finish it.
Kirill Eremenko: So, you obviously did a very successful project with
this Illinois traffic stops initiative and I'm sure it will
help lots of people. Do you have any plans on doing
more projects like that where you help organizations
that use data and data science for good?
Mollie Pettit: Yes. Ideally, that's something I would really love to do.
So, there's first of all this project could be expanded.
There's a lot more things that could be added to the
site and more things that could be dug into. But
additionally outside of that, really wanting to do as
much of this kind of work as possible. In fact, the
people that I worked on this particular project with, if
you do go to illinoistrafficstops.com and go to the
bottom, you'll see that there's a little section, a little
support section basically detailing how a lot of
volunteer hours have gone into creating this. Despite
wanting to do it full time, you know, wallets don't
always allow that. So, there's a place where if people
want to contribute to the continuation of this project
as well as other social good projects, they can donate
with that link. Anything donated will only go to
basically pay for the creation of more projects either
this one or similar that are all social good focused.
Kirill Eremenko: Fantastic. I commend you guys on that. That's an
amazing idea. In fact, I'll be one of the first people to
donate. I, honestly, this is one of the first things I'm
going to do after this podcast. I often, like, I want to
help in the world, but oftentimes I kind of like stop,
because I hear stories that with a lot of organizations
that you donate to, you don't know where the money's
going. You don't know if it's going towards the admin
or is it going somewhere else. You know, in certain
countries it might be going in exactly the opposite
direction than what you think. But if these little
initiatives, little projects that are run by people that I
personally know, I know that this is going to be used
for good that is going to actually help contribute to the
world. So, thank you so much for doing that.
Mollie Pettit: Yeah, exactly.
Kirill Eremenko: You have me on board with that already.
Mollie Pettit: Fabulous.
Kirill Eremenko: Awesome. Okay. Well, Mollie, thank you so much for
coming today on the show. Being fantastic. I loved
your talk. I loved our conversation today. Before I let
you go, where would you say our listeners can best
find you, get in touch, follow you and your amazing
visualizations and projects?
Mollie Pettit: Yeah. So, the best place to find and follow and interact
with me would probably be Twitter. My handle is Mollzmp,
which is M-O-L-L-Z-M-P.
Kirill Eremenko: Uh-huh. (affirmative). Mollzmp. Okay. Gotcha.
Mollie Pettit: Yep.
Kirill Eremenko: Okay. We will include that in the show notes and yeah.
We have one more final question for you today. What's
your favorite book that you can recommend to our
listeners to help them become better at their careers?
Mollie Pettit: I have a book in mind. It's D3 specific. So, if you're out
there listening and D3 is something that you are
interested in learning about and trying your hand at,
my favorite book to recommend people for getting into
learning is called Interactive Data Visualization for the
Web and that is by Scott Murray.
Kirill Eremenko: So there you go, ladies and gentlemen. Interactive
Data Visualization for the Web is your book
recommended by Mollie. Mollie, thanks again for
coming on the show today. Had a fantastic time with
you and I'm sure lots of listeners will get amazing
insights from our chat today. Thanks so much.
Mollie Pettit: Well, thank you.
Kirill Eremenko: So, there you have it. That was Mollie Pettit and I hope
you enjoyed this episode as much as I did. Lots of
great energy, lots of laughs, and lots of interesting things that
we talked about such as D3, the case study about
Illinois traffic stops, and using data science for good.
So, make sure to check out the illinoistrafficstops.com
website where you can play around with this case study
and actually see the interactivity of D3 in action on the
website. Also, if you can afford it, then at the bottom
there's a link where you can support Mollie's effort of
doing data science for good. I think that's a great way
to give back to the community. These projects often
are very helpful, but there's no funding for them. We
can all help like that. Or on the other hand you can
use your own data science skills to create your own
data science for good project or participate in one and
look out for those. I think it's a wonderful, fantastic
thing. A fantastic way of giving back to the world
through your data science skills or if you don't have
the time, through supporting others.
Kirill Eremenko: Also, Mollie asked me to mention that she has a Meet
Up in Chicago. So, if you're in Chicago and you want
to go to Mollie's Meet Up, then you can find the link to
this Meet Up in the show notes or you can go to
meetup.com and look for Chicago Data Viz
Community. Otherwise all of the links for this episode
will be in the show notes at
www.superdatascience.com/231. That's
superdatascience.com/231. You can get the link to the
Meet Up in Chicago, the Illinois traffic stop URL, the
Twitter handle for Mollie's Twitter. Make sure to follow
her there. And all the other items that we mentioned in
this podcast.
Kirill Eremenko: On that note, thanks so much for being here. I look
forward to seeing you back here next time and until
then, happy analyzing.