
SDS PODCAST

EPISODE 231: DATA VISUALIZERS: THE STORYTELLERS OF DATA SCIENCE

http://www.superdatascience.com/231

Kirill Eremenko: This is episode number 231 with Data Visualizer,

Mollie Pettit.

Kirill Eremenko: Welcome to the SuperDataScience Podcast. My name

is Kirill Eremenko, Data Science Coach and Lifestyle

Entrepreneur. Each week we bring you inspiring

people and ideas to help you build your successful

career in data science. Thanks for being here today

and now let's make the complex simple.

Kirill Eremenko: Welcome back to SuperDataScience Podcast, ladies

and gentlemen. Super excited to have you on the show

today. Today we got a very exciting, lively, and

energetic guest joining us for the episode, Mollie Pettit.

Mollie was one of our speakers at DataScienceGo

2018. She did a fantastic job. The audience totally

loved her presentation and what you need to know

about Mollie is that she is a data visualizer. A

professional data visualizer. And right now you might

be wondering why am I stressing that she's a data

visualizer and how is that different to a data scientist.

Well, in this episode you will find out exactly why and

how those two terms are slightly different.

Kirill Eremenko: Also in this podcast we will talk a lot about D3.js, a

JavaScript library for creating outstanding,

phenomenal, mind blowing visualizations for data

science projects. You'll find out exactly when to use

D3, exactly when not to use D3, and what are the

advantages and disadvantages of this tool. Mollie uses

D3 quite a lot.

Kirill Eremenko: Also in this podcast you'll get a case study. We'll

discuss one of Mollie's case studies, which is about

Illinois traffic stops, how police officers pull people over,

and what kind of biases may or may not exist there, and

how she went about exploring it. I will provide a link

where you can actually look at this project as you

listen to this podcast or after you listen to this

podcast.

Kirill Eremenko: And finally, we'll talk about using data science for good

and how Mollie participates in those projects and how

you can get involved as well. So, a podcast saturated

with lots of topics, lots of interesting things that we're

going to discuss. Can't wait for you to check it out.

Without further ado, I bring to you Mollie Pettit, a

professional data visualizer.

Kirill Eremenko: Welcome to the SuperDataScience Podcast, ladies and

gentlemen. Today I've got a super exciting guest on the

show with us, Mollie Pettit. Mollie, how are you doing

today?

Mollie Pettit: I'm doing great. How about you?

Kirill Eremenko: Doing well as well and how's Chicago these days?

That's where you are today, right?

Mollie Pettit: Yeah, that's right. Chicago is great. A little bit chilly

right now, as opposed to where you're at. It just started to

snow.

Kirill Eremenko: Wonderful. Wonderful. So, you haven't always been in

Chicago, right? You moved there a few years ago.

Mollie Pettit: Yeah, that's right. I moved here three years ago. Before

that I was actually living in Abu Dhabi for a few years

and before that California for grad school.


Kirill Eremenko: Wow. Such a crazy story. 'Cause we met at

DataScienceGo and as much as I wasn't able to attend

your talk, but I just watched it on the DataScienceGo

recordings and you really have a crazy story, like how

you went into geology and then visualization and

things like that. So, I'm really excited to dive into this

and learn about and share with all our listeners.

Mollie Pettit: Sure.

Kirill Eremenko: Before we get started on that, tell us a bit about who is

Mollie Pettit. Like, what would you say to somebody

you meet for the first time? How would you describe

what you do professionally right now?

Mollie Pettit: Who is Mollie Pettit? Yeah. So, I am a freelancer. I do

data science and data visualization. I do a variety of

projects. Nowadays my focus is mostly on doing a lot

of interactive data visualization projects, but I still do

... but a lot of it sometimes involves analysis before the

visualization. Then others are just straight up data

science analysis projects. So, that's a lot of what I do.

Kirill Eremenko: Okay. Okay. Wonderful. So, you moved to Chicago for

your job, is that correct?

Mollie Pettit: I did, yeah.

Kirill Eremenko: Okay. So, do you mind sharing with us what company

do you work for right now?

Mollie Pettit: Sure. So, right now I actually just work for myself. I

started a one person LLC, which is not creatively

named Mollie Pettit, LLC for the moment. I actually

originally moved to Chicago to work with Datascope

Analytics, who I worked with for a couple years, a

data science consultancy, which has actually since

then been bought by, well acquired by IDEO. So, the

Datascope Analytics became the data science team at

IDEO and it's growing.

Kirill Eremenko: Okay. Okay. Cool. I didn't know that you started your

own business. How's that going? How's the Mollie

Pettit, LLC going?

Mollie Pettit: Yeah, it's going well. There's a few things that are

about to be live which are exciting and kind of more

projects coming up, about to be started. It's going well.

It's nice. It's enjoyable to have the freedom to kind of

have the hours that you want and work from where

you like. I like that.

Kirill Eremenko: In the meantime you've also been very busy attending

and speaking at conferences, right? How many did you

attend this year? It's crazy. Like, you were at

DataScienceGo, Data Science Salon, the Tapestry, is it

what tens, hundreds? How many did you attend?

Mollie Pettit: No. Not quite that high. I think I've been to maybe

three or four this year. And I'm about to go to another

one. So, I spoke at Data Science Salon in New York,

Data Science Salon in Miami, DataScienceGo. I feel like

maybe I'm missing something, but I can't remember.

Like, I spoke at Data Science Salon in LA, but that was

the end of last year. Then I'll be going to Tapestry in a

couple weeks. So, I'm looking forward to that.

Kirill Eremenko: Okay. Nice. Nice, nice. What inspires you to speak at

conferences? It obviously takes a lot of your time. Why

do you do it?


Mollie Pettit: I think there's a lot of really nice things about

speaking at a conference. I think one, it gives you the

ability to tell people about something that you're

working on, something that you're really excited about.

It gives you really good opportunity to meet a lot of

other people who are also in data science or in data

visualization or are looking to get into it. A lot of really

interesting conversations. You get to learn about what

other people are doing. Yeah, I think those are a lot of

the reasons that I enjoy. Also, there's that added

benefit of just getting to travel. Travel around.

Kirill Eremenko: Yeah, that's true. That's true. And kind of like what

you said, it broadens your horizons, helps you not just

think outside the box, but sometimes in order to think

outside the box we need some external input which is

already outside the box in order to like start thinking

like that.

Mollie Pettit: Sure. Yeah.

Kirill Eremenko: Cool. Well, I'm very excited to talk about, probably to

start our conversation with what we were debating

about just before the podcast: data science and

visualization. Are these the same thing? Or two

adjacent fields? I'd totally love and appreciate your

opinion on that. Can you share more with us about why

you think, or why your position is, that data science

and visualization are actually quite different areas? As

far as I understand.

Mollie Pettit: I think there's overlap and I think that I would say

that data visualization is an important part of data

science. But I think that when you start getting into


interactive kind of front end of data visualization,

which is a lot of what I do now, the reason that it is a

bit different is because it requires the use of different

tools and languages. For example, when I do data

science I tend to use Python, whereas when I'm doing

front end data visualization I use D3.js, which is a

JavaScript library.

Mollie Pettit: So, there are different languages being used and I think

that there's a lot of overlap. If there was a Venn

diagram, they would definitely cross in the center,

right?

Kirill Eremenko: Uh-huh. (affirmative).

Mollie Pettit: Because something that's really great about data

visualization is once you've done data science and you

have the interesting insights and you have these

things that you want to then get across to an

audience, which could be a massive public audience or

perhaps it's just an internal audience, data

visualization is something that can then be used to tell

that story really well. I think that having a data

science background is very helpful in doing data

visualization. But when you're doing data visualization

versus data science, you have just different focuses.

With data science you're trying to really uncover these

interesting insights and if you're doing EDA, for

example. Whereas with data visualization, you are

trying to display those insights in a way that it's very

easy to understand.

Kirill Eremenko: Gotcha. What's EDA?

Mollie Pettit: Oh, exploratory data analysis.


Kirill Eremenko: Uh-huh. (affirmative). Okay. Cool.

Mollie Pettit: All right.

Kirill Eremenko: That's okay. I actually also identify that visualization

can be used for two things. That you can use it for, I

call it visual data mining, VDM.

Mollie Pettit: Oh, for sure.

Kirill Eremenko: And the other thing is obviously presenting your

insights and creating these beautiful visualizations.

Mollie Pettit: Yeah.

Kirill Eremenko: And I like how in your talk you mention what D3 is

good at and before you describe what it's good at you

actually said what it's not good for. One of the things

it's not ideal for is when you want to do that

exploratory data analysis. When you want to

quickly put something together, identify what are the

insights, what are the trends. It doesn't have to be

attractive. It doesn't have to be super presentable. Just

get some quick insights from the data.

Mollie Pettit: Yeah. Yes, exactly. Like you said, there are multiple

reasons to do data viz and some of them are much

more tied into data science like using data

visualization for this exploratory aspect. Were you

wanting me to get into what D3 is and what it's not

good for and what it is good for?

Kirill Eremenko: That's a good point. Yeah. Let's do that, because I

think we've heard D3 on the podcast before by some

speakers especially with Nadieh coming to the podcast.

We talk about Nadieh Bremer here. Yeah, give us a


guide. It looks like D3's a tool that's used most often.

Is that about right?

Mollie Pettit: Yeah, it's a really popular tool and there's a few

reasons for that. D3's a little bit more complicated. It

has a more steep learning curve than some other tools

that someone might use. For instance, people

sometimes might use a wrapper that will allow them to

still use Python to create some of these visualizations,

but the benefit of using D3 itself is that it is really

flexible and customizable and you can make these

visualizations do exactly what you want with a lot of

different interactions. Hover and click and various

things like that. So, it's extremely customizable. It lets

you tell the story that you want to tell.

Kirill Eremenko: I love D3 myself. I tried it when I was back in Deloitte

we had an option of picking a tool for a project and we

didn't end up using D3, because it was too complex,

but nevertheless, my director and I we decided to have

a challenge who can learn D3 the best in like two or

three weeks it was. And we had to come up with a

visualization. It was really fun. D3 is kind of like

working with the webpage.


Kirill Eremenko: On a webpage you right click and we've all probably

done this back then. You right click and click "view

page source" and you look into the HTML and see the

sets and so on. So, D3 actually manipulates all of that

dynamically to place different objects on the screen

and so it's really cool because it's so structured. Even

though it's a programming language, it's so structured


in the way that HTML is so structured. I found it

fascinating. You're right, it has a steep learning curve,

but it's so fun to try to do that because instantly you

get feedback, right? You see a rectangle on your screen

and then all of a sudden it turns into a circle. It all

happens dynamically that whole library.


Kirill Eremenko: So, smooth. I like the smoothness of it.

Mollie Pettit: Yeah, it is. It's a steeper learning curve, but once

you've gotten over that hump, you're able to do so

much.

Kirill Eremenko: Yeah. That's true. True. When did you first encounter

D3?

Mollie Pettit: Actually, I have a question for you. Did you enter that

challenge and how did you do?

Kirill Eremenko: Oh, yeah. It was just my director and I and nobody

else wanted to join because it was too complex

apparently or something. He was visualizing some

client data about trains or something like that and I

was visualizing ... see what I did was I took our team,

it was like we had 15 people on the team or 12 or

something, and I got the data internally about the

billable hours, how much hours they're billing and

how much hours they're spending on training and how

much hours they're spending on something else, like

admin work. And I put those into like ... and I called it

the Pie Factory, because I created a pie chart for every

person. And you could like click on it and all this

information would pop up. You know, what clients


they've been working on, how much money they've

billed. You had to really put into perspective how

much money everybody's bringing into the business.

Kirill Eremenko: Personally, I think I won, because I finished mine on

time even though it was simpler than his. I finished on

time, but his was more complex and it was very ... also

had some cool dynamic visualizations in there. It was

great fun in there. This was something I found in your

talk very interesting. At the end actually, you got some

questions and one of the questions was: how do you

learn the tools? How do you choose what to learn? And

what you said was that you don't actually pick the

tools you want to learn you pick the project you want

to do. Like a pet project or a work project and then

you find along the way you just decide or you see what

you need, what tools you need to accomplish the task

at hand and you actually go and learn those tools as

you're doing a project. I thought that was amazing

advice.

Mollie Pettit: Yeah. I think really often people when they get a new

project or task that they're going to try to tackle they

think about, "Okay. Well, what do I know that can help

me tackle this?" But I think it's nice and better to go at

it in terms of what's the best way that this can be

tackled? Do I know how to do that yet? If I don't,

maybe is this a good opportunity for me to learn that

thing to tackle this problem?

Kirill Eremenko: Yeah. And you also mention in your talk that ... what

was that company, Datascope that you worked for?

Mollie Pettit: Datascope, yep.


Kirill Eremenko: Yeah, in Datascope that they had their philosophy. It's

if you have a project, you need to use the best tool for

that project as opposed to a tool that might be good

enough that you know really well. So, even if you know

five tools that might be good enough, maybe you

should use the one that's the best. If you don't know

it, doesn't matter. Go learn it. I love that.

Mollie Pettit: Yeah. It's a good opportunity to learn it. That was

something I really enjoyed about working at that

company. I think that it's easy to have the other

mentality of I'm gonna do what I know and I think

working there really kind of got that out of me and got

me to a point where I felt way more comfortable being

like, "Oh, yeah. I don't know this thing. Let's figure it

out."

Kirill Eremenko: Yeah and that should be the mentality of a data

scientist, right?

Mollie Pettit: Uh-huh. (affirmative).

Kirill Eremenko: Like, constant curiosity. Anyway, let's jump back to

D3. So, what is D3? What does the abbreviation stand

for? What is triple D?

Mollie Pettit: Yeah, D3 stands for data driven documents.

Kirill Eremenko: Okay and what does that mean?

Mollie Pettit: Data driven documents. Data is what you're going, you

know, the data that you're going to be putting into

some sort of visualization. Documents is your web

document. So, your website. Driven would just be the

act of I guess putting that into the website. So, using

data to make stuff on the web.


Kirill Eremenko: Nice. Nice.

Mollie Pettit: Is basically what that means, yeah.
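
As a rough illustration of the "data driven documents" idea described above, the sketch below turns a list of numbers into SVG markup, one rectangle per value. It is written in Python with only the standard library, not in D3 itself, and the function name bars_to_svg, the sizes, and the sample values are all made up for illustration.

```python
from html import escape


def bars_to_svg(values, bar_width=30, gap=10, height=120):
    """Return an SVG string with one <rect> per data value (hypothetical helper)."""
    if not values:
        return '<svg xmlns="http://www.w3.org/2000/svg" width="0" height="0"></svg>'
    top = max(values)
    rects = []
    for i, value in enumerate(values):
        # Scale the value into a pixel height so the largest bar fills the chart.
        bar_height = round(height * value / top) if top else 0
        x = i * (bar_width + gap)
        y = height - bar_height  # SVG's y axis points down, so bars hang from the top
        rects.append(
            f'<rect x="{x}" y="{y}" width="{bar_width}" height="{bar_height}">'
            f"<title>{escape(str(value))}</title></rect>"
        )
    width = len(values) * (bar_width + gap) - gap
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(rects)
        + "</svg>"
    )


# Placeholder values, not real data.
svg = bars_to_svg([4, 8, 15])
print(svg)
```

The point is only that the elements in the document are derived from the data: change the list and the markup changes with it. D3 does this against the live page and adds transitions and interaction on top.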

Kirill Eremenko: So, when was the first time you encountered D3?

Mollie Pettit: I think the first time I encountered D3 was early on at

Datascope, actually. So, when I first [crosstalk

00:18:41].

Kirill Eremenko: Was it a project?

Mollie Pettit: No. So, when I first started at Datascope, they used to

have this set up where when somebody was new at the

company rather than going right onto a client project,

they would have an opportunity to do a pet project.

They would dabble, they would kind of slowly get

involved in client projects, but this kind of gave them

an opportunity to get settled then to learn something

new that they wanted to learn. So, when I first started

I decided to do a pet project that was a network app,

this web app that would be a network diagram of Star

Trek characters, because I am a Trekkie. So, I scraped

every single Star Trek episode transcript and movie

transcript and put together this app where people

could select any combination of episodes and movies

and hit "engage" and a network diagram would appear

using D3 that would show the connections between

the various characters in that selection of episodes

and movies.

Kirill Eremenko: Wow. Wow. Very nerdy.

Mollie Pettit: Very nerdy, yeah. And then once that diagram

appeared people could click on a node to focus on it

and have it highlighted and its connections and choose


particular characters they were interested in. So, it

was fun.
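
The episode does not say how a "connection" between two characters was defined in this project. One common approach, shown here purely as an assumption, is to count how often two characters appear in the same scene and emit a nodes-and-links structure of the kind often fed to a network diagram. The function name build_network and the tiny scene list are hypothetical placeholders, not the project's data.

```python
import json
from collections import Counter
from itertools import combinations


def build_network(scenes):
    """Turn scenes (lists of character names) into nodes and weighted links."""
    pair_counts = Counter()
    characters = set()
    for scene in scenes:
        present = sorted(set(scene))  # de-duplicate and fix the pair order
        characters.update(present)
        for a, b in combinations(present, 2):
            pair_counts[(a, b)] += 1
    nodes = [{"id": name} for name in sorted(characters)]
    links = [
        {"source": a, "target": b, "value": count}
        for (a, b), count in sorted(pair_counts.items())
    ]
    return {"nodes": nodes, "links": links}


# Made-up scenes for illustration only.
scenes = [
    ["Picard", "Riker", "Data"],
    ["Picard", "Data"],
    ["Riker", "Troi"],
]
network = build_network(scenes)
print(json.dumps(network, indent=2))
```

Filtering the scene list to a chosen set of episodes before calling the function would give the "select any combination and hit engage" behavior described above, with the drawing itself left to the front end.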

Kirill Eremenko: So, how long did that take you?

Mollie Pettit: The actual visualization part I'm not sure. The whole

project took a couple of months, but that was ... I

mean, I was not just doing that. There were other

things happening at the same time. That also though

involved a lot of things in preparing the data to be

visualized. Like the scraping of all the transcripts and

getting everything set up in such a way that it would

be usable in a visualization. So, there was a lot of

different steps for that project.

Kirill Eremenko: Gotcha. What I love about approaching that is by the

end of those, it sounds like quite a lot, a few months,

by the end of those few months, you have a super

brand new skill. You might not be the expert at D3,

but you know that there's certain things that you

know how to do. Like, in three months you might be

70% up to speed or 80% up to speed with what D3 is

all about and how to use it. So, you build up so much

confidence in that time, wouldn't you say?

Mollie Pettit: Yeah. Yeah, for sure. It was definitely a great

introduction to D3 and also I mean, I hadn't even

actually done a huge amount of web scraping at that

point, so that also was a very good crash course in

that, because these were not straightforward set up

sites. They were very inconsistent. So, there was a lot

of exceptions to account for.
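
To give a feel for the kind of exceptions mentioned here, this is a minimal Python sketch that normalizes speaker lines written in a few different styles. The line formats, the regular expression, and the function name parse_line are assumptions made for illustration; the actual transcript sites and their quirks are not described in the episode.

```python
import re

# Matches lines such as "PICARD: ...", "  riker [OC] : ..." or "Dr. Crusher (V.O.): ...".
SPEAKER_LINE = re.compile(
    r"""^\s*
        (?P<name>[A-Za-z][A-Za-z .'-]*?)   # speaker name, in any casing
        \s*(?:\[[^\]]*\]|\([^)]*\))?        # optional stage note in brackets or parentheses
        \s*:\s*
        (?P<text>.+)$""",
    re.VERBOSE,
)


def parse_line(line):
    """Return (speaker, text) for a dialogue line, or None for anything else."""
    match = SPEAKER_LINE.match(line)
    if match is None:
        return None
    # Collapse stray whitespace and unify the casing of the speaker name.
    name = " ".join(match.group("name").split()).title()
    return name, match.group("text").strip()


# Made-up example lines.
for raw in ["PICARD: Make it so.", "  riker [OC] : On my way.", "[Bridge]"]:
    print(parse_line(raw))
```

In practice each new site tends to add another branch or another pattern, which is why scraping inconsistent pages takes longer than the visualization that follows it.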

Kirill Eremenko: Okay. Gotcha.


Mollie Pettit: So, that was good to do. There's a lot of different things

that I had to do for this project, so I learned a lot along

the way.

Kirill Eremenko: You were kind of like in both fields. You are both in

data science and you've done data science work and

you're in visualization. As I understand you're doing

more and more visualization work now.

Mollie Pettit: Yes.

Kirill Eremenko: Why the shift? Why did you decide to move away from

the data science, I guess the web scraping, the

algorithms and so on and move more into the space of

visualization?

Mollie Pettit: It's not because I don't enjoy data science, I do. And I

still enjoy that I still get to do it when I'm doing data

visualization projects sometimes and I like having the

occasional straight up data science project, but I think

the reason I like to focus on data visualization is

honestly I just find it really fun. I really enjoy creating

this ability to tell stories really well. An ability to

highlight things that are really interesting and also

coding when you're creating something in D3 for

instance, you know, you write a few more lines of code

and you hit "refresh" and you get to see this new thing

that you added. So, that's really nice too.

Kirill Eremenko: Yeah. More room for ... it's kind of like quicker

feedback. You get the results faster.

Mollie Pettit: Yeah. Yeah.

Kirill Eremenko: Rather than waiting a few months. Okay. All right.

Would you recommend this path to data scientists?


Maybe listeners who are tuning into this podcast who

are not yet sure if they want to do data science,

visualization, how would somebody make up their

mind of which way they want to go?

Mollie Pettit: Pick a project and do it. That's the best way I can ever

think of to figure out if you like something. I think that

if people really enjoy kind of the visual and design

aspect but still want to use some data science I think

in order to understand which way you want to go, you

really just have to pick some projects and do them. I

think that's how I learned what direction I wanted to

go every kind of step of the way is I just kept doing

things. I kept learning new things and once I started

kind of getting into D3 and visualization I realized I

really loved it. I started ... well, I, while still at

Datascope, started asking to be on more visualization

projects and by doing more and more of them I

realized I just really liked that and I kind of started

focusing more on that direction. I think the way to

know if you like something is to do it.

Kirill Eremenko: Gotcha. I can see that D3 and from my experience with

it and from the visualizations I've seen ... there's, by

the way, there's a really cool library by Michael

Bostock. It's called, what is it called? Blocks. Bl.ocks.

Or something like that?

Mollie Pettit: Oh, yeah. The website. Yeah.

Kirill Eremenko: Blocks.org. Like that, but it's like bl.ocks.org or

something like that.



Kirill Eremenko: We'll put it in the show notes. There's some really

amazing D3 visualizations and templates that you can

use and copy and adjust and just explore all open

source. So, I can see that D3 is way ahead in terms of

the capabilities than other tools. Like, even Tableau,

which I love dearly, great tool, but it's more agile. It's

more drag and drop. It allows you to create

visualizations that are fast, but at the same time even

though it has a lot of flexibility, nowhere near to what

D3 offers. The price you pay in D3 is you have to code.

You have to design your visualization -

Mollie Pettit: Right, yeah.

Kirill Eremenko: - very carefully. So, what I want to ask you is, what do

you see in the future? Do you see that D3 has a

future? It's been around for a couple of years and it's

had a really interesting path, but do you see other

tools edging it out and more people moving to tools like

Tableau and more drag and drop, self-serve analytics

type of tools? Or do you see that there is a market,

there's a place for more sophisticated tool like D3 in

the space of data visualization?

Mollie Pettit: Yeah. I think that there's room for both and I think

they have different applications and different reasons

to be used. Like you said Tableau is really great and

something that's nice about it is you don't have to

learn a whole language. Yeah, you don't have to code.

You can very quickly make some really beautiful

things. Because you're not actually coding though, you

have less control. So, if you're trying to do something

very complex, you may eventually kind of hit a

roadblock and hit the end of the capabilities of being


able to customize the way you want. D3 is more

complicated to learn and is harder to learn, but it is

much more customizable and flexible and you are able

to customize things in the way that you want. You

don't really hit these roadblocks that you might hit

with Tableau.

Mollie Pettit: So, I think that they both are very great and they have

different strengths and different weaknesses. So, I

think they're both going to stick around.

Kirill Eremenko: That's good, because in one of the previous podcasts I

had one of the guests made a good comment that it's

important to understand also what is the future of a

tool before you go and learn it. You know, like is this

tool going to be around?


Kirill Eremenko: And by the sound of it, D3 is going to be around. But

by the way -

Mollie Pettit: Yep. That's how I -

Kirill Eremenko: - how is the community of D3?

Mollie Pettit: Sorry. How's the community?

Kirill Eremenko: Yeah, is there a community in D3? People, like when

you have a question or somebody has questions, do

they post it online and is it easy to get answers and

help and guidance?

Mollie Pettit: Oh, yeah. That's a good question. So, one thing that's

really nice is Bl.ocks, which you've mentioned. Which

is a lot of times if you have something that you're

trying to make, especially when you're first starting,


you can often find an example for it in Bl.ocks. So,

what Bl.ocks is really nice for is you not only get to see

this interactive visualization right in front of you, but

the whole code is right below it. There's also, let me

make sure I have this right, blockbuilder.org. And

something that's nice about blockbuilder.org is you

can access any of the posts that are posted on Bl.ocks,

but it allows you to, right there, edit them, and what

that's good for is -

Kirill Eremenko: Nice.

Mollie Pettit: Yeah. What that's good for is, let's say you're looking

at some code and you're like, "Hm. I'm not sure exactly

what this line does." And you can edit it and see if you

break it or see if the color does change. You know, you

can do things straight in there to very quickly get an

understanding of what things are doing. So, that's

really nice and then also I don't know if you've

actually, have you heard of Observable?

Kirill Eremenko: Nope. No, I haven't heard of it. What is that?

Mollie Pettit: So, Observable. It's kind of like a Jupyter Notebook,

but for D3.

Kirill Eremenko: Oh, nice.

Mollie Pettit: So, Observable is a website and it was also started by

Mike Bostock. But yeah, it has that kind of set up

where you can easily kind of like tell a story, but then

within that story have code and have a working,

interactive visualization in the middle of it. Very much

like a Jupyter Notebook, but specific for kind of front

end interactive stuff.


Kirill Eremenko: Wow.

Mollie Pettit: Yeah, so I think that there are some, like you can

definitely find some D3 answers on Stack Overflow,

but I think something that's really nice about D3 is

you can also just find a lot of examples. So, even if you

can't necessarily find someone who's asked the same

question, you can probably find someone who's done

the thing you're trying to do.

Kirill Eremenko: Gotcha. Gotcha.


Kirill Eremenko: There's even a conference in San Francisco about D3,

right?

Mollie Pettit: There is, yeah. There's D3.unconf. The last one was

last September. It didn't happen this year. But it will

... I'm pretty sure it's gonna be happening next year.

I'm not involved in planning that, so I don't have

specific details. As far as a community, there's also a

D3 Slack that I'm a part of that has upwards of, let's

see, I'm looking at it now, about four thousand

members. There's a help section in there. So,

sometimes people will post in there and say, "I'm

trying to do this thing, but it's not working. How do I

do it?" And people will respond there.

Kirill Eremenko: Gotcha. Gotcha. What's an unconference, by the way,

while we touch on this?

Mollie Pettit: That's a good question. I can tell you a little bit about

what it was. So, it's a lot less, at least this particular

unconf, it wasn't full of talks. So, there was only one or

two talks. I believe they were done by Nadieh,

Nadieh Bremer, as well as Sarah Drasner. Those

were at the very beginning of the unconf. The rest of it

were these discussion sessions where there would be

maybe four different discussions going on at the same

time and you would choose a room to go to and you

would discuss that topic. Sometimes that would

involve someone being at a computer and kind of

pulling up things that people were talking about that

were either D3 related or just visualization related. It

was just kind of these guided conversations and a way

for people to kind of meet other people who were doing

a similar thing. So, it was less talks and more

discussion.

Kirill Eremenko: Wow, interesting. And how big was this discussion?

Was it like hundreds of people?

Mollie Pettit: Not in each discussion, no. I'm not even ... I'm trying

to think how many people were there total. Probably

within a couple hundred total and each discussion

probably had upwards of 50 or so people in it.

Kirill Eremenko: Interesting. Interesting. I first heard about unconferences

from Pablos Holman, who was at DataScienceGo

as well. I have never been to one, but I find it's a quite

interesting concept. I gotta check it out.

Mollie Pettit: Yeah. I really enjoyed it. It was my first unconf, but it

was great.

Kirill Eremenko: Okay. All right. Cool. Well, thanks for that overview of

D3 and the future. I hope all the listeners are pretty

excited and I can personally vouch for it. It's a really

fun experience. I don't use it anymore, but what I

learned in the process of learning it really was


fascinating and helped me even improve the way I

understand websites. The way I understand

interactivity and what's possible with visualization.

Mollie Pettit: Yes. It definitely improves that knowledge, for sure.

Kirill Eremenko: And next I wanted to talk a bit about the case study

that you shared with us at DataScienceGo.

Mollie Pettit: Oh, sure.

Kirill Eremenko: The case study of Illinois traffic. I found that very

interesting how like policemen pull over people and

you were actually investigating whether there's bias,

specifically racial bias and how police officers pick the

cars that they pull over, the cars that they search, and

then the citations that they hand out. That was a really

cool project. How did that all start?

Mollie Pettit: The way that started was I went to a meeting and I

don't remember what the meeting was called. This was

I think a little over a year ago. This meeting was for

people in tech who wanted to use their knowledge and

use what they could do to help in some way. So, the

people that were at this meeting were people in tech

who wanted to find some way to volunteer and help

out and then also organizations that wanted that help.

Mollie Pettit: So, at that meeting I ended up meeting Karen Sheley

or Shelley. I would like to check on that. So, at that

meeting I met Karen Sheley who works for the ACLU.

She had mentioned that she really needed or at least

really wanted to have some sort of a data contact,

because they were trying to put together a traffic stops

report that would just go through the analysis of this


traffic stops data. Who police are pulling over and

searching and citing, et cetera, in different law

enforcement agencies across Illinois. What they were

really just looking for is somebody that they could call

on for help. Like, if they had questions about the

analysis or the data. I was there with a colleague and

we were like, "We can do more than that. We can help

with the analysis." The people who were doing the

analysis at the time, it was mostly some simple Excel

stuff that was being done. We wanted to kind of help

them do something more complicated with this so that

they could have an even more in depth report.

Mollie Pettit: So, we worked with them to do this analysis and look

at the search rates, et cetera across different agencies.

Then it eventually evolved and I started working with

them on a website that would walk people through this

analysis that had been done and they could look at

these data visualizations that would be interactive.

They could choose different agencies. They could click

on things and get more information and it would really

tell the story of what these racial disparities in traffic

stops look like in different agencies.

Kirill Eremenko: Gotcha. You mentioned that the website at the time

when we were recording this is not yet live, but it's

about to go up. So, by the time recording is live, it's

definitely out there already. What's the website? Where

can people go maybe right now while they're listening

to this podcast?

Mollie Pettit: Yeah. If you go to illinoistrafficstops.com, you'll be able

to find it.


Kirill Eremenko: Nice. Nice. So, it's similar to the visualizations that you

shared at DataScienceGo, right?

Mollie Pettit: Yes. I shared some of the visualizations at

DataScienceGo. Yeah, I think I had a bit of a Chicago

focus, but on this particular website you can look at

any of the agencies.

Kirill Eremenko: Okay. Fantastic. All right. So, that's how you guys met

and that's what you helped them or decided to help

them out with. So, how did the project go? So, like you

got this idea, then what happened? Like, was this part

of ... you obviously had a job at the same time. So, this

was like a free time project that you were doing?

Mollie Pettit: Yeah. So, when I first started it was. It was a free time

project. It was something I was doing when there was

time. But actually something that was really nice is I

was able to incorporate it into Datascope at the time.

So, as a consultancy, sometimes there is downtime,

right?

Kirill Eremenko: Uh-huh. (affirmative). Yep.

Mollie Pettit: Sometimes you just finished a project and you're

gonna start another project in a week and you're

waiting for that to start. So, I convinced everyone there

that we kind of bring this in internally and when

people had a down week if they wanted to work on

this, they could. So, for a little bit it was kind of an

internal project at Datascope. That was really great,

because then we were able to utilize this time that

would have been downtime anyway to do something

that we thought was really exciting to work on and

important. After the acquisition, one of my former


colleagues at Datascope and I kind of kept up with it.

Chris Kucharczyk. So, him and I have been the main

people kind of working on it this past year. Then more

recently a good friend of mine, Alex Alleavitch came on

as a front end engineer.

Kirill Eremenko: Okay. Gotcha. All right. So, now we've got the picture

painted and this is our... Super excited and impatient

to find out what is this project all about, how did it go.

So, tell us the starting point of the project. What kind

of data do you have? Where does it come from? And

then we'll go from there.

Mollie Pettit: Sure. Yeah. So, the data that we have is whenever a

person is pulled over in Illinois, the law enforcement

officer is required to fill out a form and that form

details information about who was pulled over, what

was their gender and race, information from their

driver's license, why was that person pulled over. Once

they were pulled over, did the officer search that

person? If that person was searched, was contraband

found or not? Then what was the result of that stop?

Was that person cited? Or given a verbal or written

warning? So, that's the data that we're working with.

What the data looked like raw was one, you know, line

of data for each stop that occurred.
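
One way to picture "one line of data for each stop" is as a small record type. This is only a sketch: the field names below are hypothetical and chosen to mirror the form as described here, not the actual column names of the Illinois dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrafficStop:
    """One row of the raw data: a single traffic stop (hypothetical schema)."""
    agency: str                        # law enforcement agency that made the stop
    driver_race: str                   # race as recorded by the officer on the form
    driver_gender: str
    reason_for_stop: str
    searched: bool                     # did the officer search the person?
    contraband_found: Optional[bool]   # None when no search took place
    outcome: str                       # e.g. "citation", "written warning", "verbal warning"


# A made-up example row, for illustration only.
example = TrafficStop(
    agency="Example PD",
    driver_race="White",
    driver_gender="F",
    reason_for_stop="moving violation",
    searched=False,
    contraband_found=None,
    outcome="verbal warning",
)
print(example)
```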

Kirill Eremenko: Uh-huh. (affirmative). Uh-huh. (affirmative). Gotcha.

And just to clarify, the officer had to guess the gender,

the race of the person.

Mollie Pettit: Gender would be on the driver's license, but the race

they needed to guess, yeah.


Kirill Eremenko: Uh-huh. (affirmative). Okay. Gotcha. So, then you

visualize that. Unfortunately, we can't share the

visualization on the podcast, but we'll include a link to

the website, the illinoistrafficstops.com, is that right?

Is that the URL?

Mollie Pettit: Yes, illinoistrafficstops.com. Uh-huh. (affirmative).

Kirill Eremenko: Yeah, we'll include a link in the show notes and people

can check it out there. But basically you have this

visualization of what different races the police officers

would stop, and where do you go from there?

Mollie Pettit: The first thing that we looked at was who was stopped.

We didn't end up focusing on a stop rate metric

though, because there's a few reasons and I kind of

talked about this in the talk, but some of the reasons

why we decided not to do that was because it's not a

metric that's very accurate, because if you were going

to do a stop -

Kirill Eremenko: Tell us first of all, what is a stop rate? Like, I found

that part of your talk very interesting, 'cause that's the

first thing I would jump at, right? You're thinking

through all these reasons that you mention just now,

the stop rate is indeed the first thing that comes to

mind. So, what is a stop rate? And then why did you

decide not to go with that part?

Mollie Pettit: Sure, yeah. So, the stop rate would refer to the metric

calculated by dividing a race's stopped population by

its driving population. So, of the drivers of a particular

race, how often are they stopped is the stop rate. And

... oh, sorry. Go ahead.


Kirill Eremenko: So, for instance, if you have let's say, I don't know,

let's say you have a hundred thousand white people in

a city and over that period of time, over a year, or

whatever period of time you're looking at, if ten

thousand white people are pulled over by police, then

the stop rate would be ten percent. Ten thousand

divided by a hundred thousand. But if you have let's

say 50 thousand African American people in the city

and they were also stopped ten thousand times, then

the stop rate there would be greater, it would be 20%.

Ten thousand over 50 thousand. Is that right?

Mollie Pettit: Sure. Uh-huh. (affirmative).
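
The arithmetic in that example can be written out as a short sketch. The function name is an assumption made for illustration, and the numbers are the hypothetical ones from the conversation, not real figures.

```python
def stop_rate(stops: int, driving_population: int) -> float:
    """Stops divided by the (assumed) driving population of a group."""
    if driving_population <= 0:
        raise ValueError("driving_population must be positive")
    return stops / driving_population


# Hypothetical city from the example above.
white_rate = stop_rate(10_000, 100_000)
black_rate = stop_rate(10_000, 50_000)

print(f"White stop rate: {white_rate:.0%}")  # 10%
print(f"Black stop rate: {black_rate:.0%}")  # 20%
```

Note that the denominator is exactly the quantity that is hard to know in practice, which is the weakness discussed next.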

Kirill Eremenko: Okay. So, that's your stop rate. But this is the part I

found really interesting. It's not the best metric,

because we unknowingly actually make some

assumptions about these two data sets by calculating

the stop rate. Can you tell us about these assumptions

we make? Once you uncover them in the video I was

like, wow, indeed this is true. That does make sense

why it wouldn't be so accurate. So, what would you

suggest are the assumptions?

Mollie Pettit: In the talk that I gave, one of the first things I did was

I kind of show the stops demographics of Chicago and

then compare that to the stops demographics, or

sorry, the population demographics of Chicago and

show the differences there. So, what people often want

to do is they want to take the population of a city and

they want to assume that that's the driving population

and then create a stop rate from that, but there's a few

issues with that. One is that you don't actually know

what the driving population is of a city. You don't


know who drives to work. Maybe some people drive

much further to work or take the train or walk or

maybe people are driving through other cities in order

to get to work. So, the driving population through a

city might be very different or like a town, I think

that's a lot more relevant for small towns that the

people who are actually driving through that town,

that population might be different than the town itself.

So, comparing those two things isn't all that accurate,

because you don't really know what the driving

population was. So, that's one.

Mollie Pettit: And then another thing that was kind of an issue is

that on the traffic stops form, the traffic stops form

and the census are a bit different. So, on the traffic

stops form, Hispanic/Latino is listed as a race along

with Black, Asian, White, et cetera. Whereas on the

census form Hispanic/Latino is listed as an ethnicity

and then races are separate. So, you choose one, are

you Hispanic, Latino, or not and then also what's your

race. So, that makes comparing these two forms

tricky.

Mollie Pettit: Then another thing is that when someone's filling out

the census they are self reporting, whereas an officer

who has pulled somebody over is making an educated

guess of the race of that person. So, there's a lot of

things that makes it hard to compare this data for an

actually accurate metric.

Kirill Eremenko: Gotcha. Makes sense. That's very, very insightful. And

so what did you do instead?


Mollie Pettit: Yeah, so instead what we decided to do was to focus

on after a person was already stopped, what

happened? So, a big focus is looking at the search

rates. So, of all of the stops that involved Black

drivers, what's the percentage of those stops that

resulted in a search? So, looking at that, you can

compare what are the search rates for each race, how

do the search rates of Black and Hispanic drivers

compare to that of White drivers in that particular

agency. That's where you can I think get a much more

accurate read on various disparities, racial disparities

within the data.
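
A minimal sketch of that search-rate calculation, assuming the stops for one agency are available as (race, searched) pairs. The function name and the sample data are made up for illustration.

```python
from collections import defaultdict


def search_rates(stops):
    """Return {race: share of that race's stops that led to a search}.

    `stops` is an iterable of (race, was_searched) pairs for one agency.
    """
    totals = defaultdict(int)
    searched = defaultdict(int)
    for race, was_searched in stops:
        totals[race] += 1
        if was_searched:
            searched[race] += 1
    return {race: searched[race] / totals[race] for race in totals}


# Made-up stops: 4 Black drivers (1 searched), 10 White drivers (1 searched).
stops = (
    [("Black", True)] + [("Black", False)] * 3
    + [("White", True)] + [("White", False)] * 9
)
rates = search_rates(stops)
print(rates)
```

Because the denominator is the number of stops rather than a guessed driving population, both numbers come from the same form.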

Kirill Eremenko: Uh-huh. (affirmative). Uh-huh. (affirmative). Okay.

Gotcha. And then you actually developed another

metric which is to do a benchmarking, right?

Mollie Pettit: Yeah, that's right. Uh-huh. (affirmative).

Kirill Eremenko: Tell me about it.

Mollie Pettit: Yeah, so ... oh, go ahead.

Kirill Eremenko: Pardon. No, just tell us how that works, if you don't

mind.

Mollie Pettit: Yeah. So, a common critique of the application of this

test concerns the rate at which drivers are searched.

Some people think, "Well, maybe that's not a good

indicator of bias." Perhaps an officer in his line of work

has noticed particular trends that causes him to

search a particular group of people more. So, then he

would just be doing appropriate police work, because

he's using his experience to inform his decisions. So

what we also did is we looked at what are the search


hit rates for various races. And what I mean by a hit

rate is, was contraband found or not. And what we've

found by looking at hit rates, is that in general across

agencies there were very few agencies where there was

a significant difference, like a statistically significant

difference between the hit rate of White drivers and

minority drivers. In the cases where there were

significant differences, it was often that the minorities

had a lower hit rate than the White drivers.

Mollie Pettit: So, in Chicago, if you're looking at consent search

rates, Black and Hispanic drivers are searched about

three times more than White drivers. But if you then

look at the hit rates, Black drivers actually have a

lower hit rate and Hispanic drivers is about equal, but

neither of them are actually that significantly different

than the White hit rate. So, their search rates are

much higher, but the hit rates are not.
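
The two metrics can be put side by side in a short sketch. The counts below are invented to mimic the pattern described (roughly three times the search rate, with a hit rate that is no higher); they are not the Chicago figures.

```python
def rate(numerator: int, denominator: int) -> float:
    """Plain ratio with a guard against an empty denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


# Made-up counts for one hypothetical agency.
counts = {
    "White": {"stops": 10_000, "searches": 100, "hits": 30},
    "Black": {"stops": 10_000, "searches": 300, "hits": 80},
}

summary = {
    race: {
        # share of stops that led to a search
        "search_rate": rate(c["searches"], c["stops"]),
        # share of searches in which contraband was found
        "hit_rate": rate(c["hits"], c["searches"]),
    }
    for race, c in counts.items()
}

for race, s in summary.items():
    print(f"{race}: search rate {s['search_rate']:.1%}, hit rate {s['hit_rate']:.1%}")
```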

Kirill Eremenko: Uh-huh. (affirmative). Gotcha. And I was very

impressed and I think this is something that we need

to all do more of that in your visualizations you

actually presented statistical significance. I think you

came up with a very eloquent way to do it. You just

make something more transparent, like a data point or

a part of the visualization more transparent, less opaque

if it's not statistically significant or if it's less

statistically significant than the other dots.


Kirill Eremenko: That seems really clear. How did you come up with

that idea?


Mollie Pettit: You know, that was something we were wracking our

brains with for a while. We were realizing that being

able to show the statistical significance would be really

important in this, because if you're not showing what's

significant and what isn't, you're only telling a part of

the story and it can lead also to making conclusions

that aren't quite right, because you're assuming that

all of these are equally important. So, over time we

kind of came up with this idea of just trying out using

opacity. So, yeah, as you said, things that are

statistically significantly different. So, if you're looking

at a plot, if a rate for that particular race is statistically

significantly different than the White rate for that

particular agency, it'll be fully opaque and otherwise

it's gonna be a lot lighter, a lot more transparent.

Kirill Eremenko: So, what's -

Mollie Pettit: As soon as we implemented it and we could see what it

looked like, we're like, "Ah, this is it."
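
The opacity idea can be sketched as a tiny mapping from a p-value to an opacity. The 0.05 threshold and the 0.25 "faded" value are assumptions for illustration; the conversation does not say which values the site uses.

```python
def point_opacity(p_value: float, alpha: float = 0.05, faded: float = 0.25) -> float:
    """Fully opaque when the difference from the White rate is significant,
    mostly transparent otherwise."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    return 1.0 if p_value < alpha else faded


# Made-up p-values for one agency, each comparing a group's rate to the White rate.
p_values = {"Black": 0.001, "Hispanic": 0.40, "Asian": 0.07}
opacities = {race: point_opacity(p) for race, p in p_values.items()}
print(opacities)
```

The resulting number could then be passed to whatever the plotting layer calls opacity, for example the SVG `fill-opacity` attribute or matplotlib's `alpha` argument.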

Kirill Eremenko: Yeah. It's a great technique I think. I think it's a good

tip as well for our listeners to take away. Once they see

your visualizations they will be convinced that that's

one of the best ways. What would you say -

Mollie Pettit: Thank you.

Kirill Eremenko: Thank you. What is the test that you use for statistical

significance? Let's talk a bit about that, because a lot

of data scientists, especially if you're starting out don't

even consider the importance of doing statistical

significance tests.


Mollie Pettit: Sure. So, we used the Z test for two population

proportions. That's what it's called.
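
For listeners who have not met it, here is a minimal sketch of a two-sided Z test for two population proportions, using the pooled standard error and only the standard library. The function name and the counts are assumptions for illustration; this is the textbook form of the test, not necessarily the exact code used in the project.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Return (z, two-sided p-value) for H0: the two proportions are equal.

    x1, x2 are the counts of "successes" (e.g. searches);
    n1, n2 are the group sizes (e.g. stops).
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("group sizes must be positive")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("test is undefined when the pooled proportion is 0 or 1")
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 300 searches in 10,000 stops versus 100 searches in 10,000 stops.
z, p = two_proportion_z_test(300, 10_000, 100, 10_000)
print(f"z = {z:.2f}, p = {p:.3g}")
```

The test relies on a normal approximation, so it is best suited to groups with a reasonable number of stops.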

Kirill Eremenko: Okay. And so in a nutshell, what does it allow you to

do?

Mollie Pettit: It allows, oh gosh, let's see. I haven't had to talk about

this.

Kirill Eremenko: Just in short, why do you need to do a statistical

significance test? What is the risk if you don't do one?

Mollie Pettit: Oh, sure. So, one of the things that we're showing in

this visualization is we're comparing the rates of two

races. We're comparing the search rates of Black

Drivers by this agency versus White drivers. But let's

say you're looking at a town and only two people were

pulled over or only two Black drivers were pulled over.

Those rates are going to be less significant, because

there's not enough data. There's not enough

information. If you pull over two Asian drivers and you

search one of them, that means 50% of the Asian

drivers in that city were searched. That's high.

Kirill Eremenko: Yeah. Yeah.

Mollie Pettit: But when you realize, only two people were pulled

over, like that's not a statistically significant

comparison.

Kirill Eremenko: Gotcha. Gotcha. That's a great example. So, basically

it shows you need more data. Like, there's not enough

data to make conclusive or any statistically significant

conclusions from that [crosstalk 00:52:02] to derive

any conclusive results.


Mollie Pettit: Sure, because that number is still a valid number,

right? It's still exactly the rate that is existent. It's just

not enough to say that there is a difference when

you're comparing it to the other rates.
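
To see why the same 50% means very different things at different sample sizes, here is a rough sketch of the uncertainty around an observed rate, using the usual normal-approximation margin of error. This is an illustrative assumption rather than the method used on the site, and the approximation itself is crude for a sample of two.

```python
import math


def margin_of_error(successes: int, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for an observed proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    return z * math.sqrt(p * (1 - p) / n)


# Same observed rate of 50%, very different amounts of data.
small = margin_of_error(1, 2)          # 1 of 2 drivers searched
large = margin_of_error(1000, 2000)    # 1,000 of 2,000 drivers searched

print(f"n=2:    50% plus or minus {small:.0%}")
print(f"n=2000: 50% plus or minus {large:.1%}")
```

With only two stops the margin is wider than the rate itself, so the number is still true but says almost nothing about a difference from other groups.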

Kirill Eremenko: Uh-huh. (affirmative). Totally, totally agree. It's cool to

see somebody in the space of visualization doing that,

because sometimes even practitioners in the space of

like machine learning don't do that. I've seen models

being deployed that haven't been checked for

statistical significance. Whereas in visualization it's

even easier to forget about that. So, it's a great ...

you're leading by example so other people... Even

when you're doing visualization it's important to test

these things.

Mollie Pettit: Yeah. Thanks.

Kirill Eremenko: Okay. So, another thing. So, with this bias, right? I

liked what you said in your presentation that you're

not doing this to point fingers at people and say,

"You're biased." Or "You're biased."

Mollie Pettit: Right.

Kirill Eremenko: Sometimes this bias happens unconsciously or

subconsciously and by looking at the data, because

this is like an important ethical consideration, right?


Kirill Eremenko: While looking at that data, we can at least shed light

on this bias and people become more aware of things

they might be doing unconsciously. I think that was a

very nice way of putting it. That data science isn't here

to shame people or here to cause, provoke people to


more conflict. It's here to point out what is the state of

things. Let's shed some light on -

Mollie Pettit: Yeah, exactly. Exactly. Like, what does the data

actually say? What is actually happening? Yeah,

exactly. Exactly what you said. The whole purpose is

not to point fingers. The purpose of doing the analysis

and doing the website, we're just really hoping it's

going to act as an informational tool both for the

public, but also I'm hoping that officers at agencies

across Illinois might look their own agency up and if

there are disparities in the data then they might think

about why that is and how they can fix it. I

think it's a really helpful tool just to bring these

disparities to light so that the law enforcement

agencies of Illinois can make informed improvements

in their agency.

Kirill Eremenko: Yeah. Totally, totally agree. We're getting close to the

end of the podcast. I want to kind of leave this thought

with our listeners, a quote that you mentioned in your

talk. I don't know if you actually had this thought

written down, but it came out really well and you said,

"It's hard to fix problems when you don't know what

the problems are and it's hard to know what the

problems are if you don't have the data."


Kirill Eremenko: I think that was really cool. So, in general racial bias is

something we want to fix and it's a problem, right? But

you can't really know what's ... sometimes these things

happen, sometimes we don't know the details of these

things. You can't know the problem in full unless you


actually go and analyze the data, which I think you've

done quite successfully with this project of yours.

Mollie Pettit: Thank you.

Kirill Eremenko: The Illinois traffic stops. Do you have any plans on

doing any more similar projects where, you know, like

pet projects where you help organizations that

need to use data to do good in the world?

Mollie Pettit: Yeah. Honestly, I would love that. I would love if I

could spend the majority of my time on projects like

this. I don't know that I will ever be able to be

spending all of my time on projects like this, because

they don't always pay. This was mostly a volunteer

project. It's something that I just really wanted to do. It

started out kind of on the side and at some point I

decided to take a break from work and just focus on it

for a month and finish it.

Kirill Eremenko: So, you obviously did a very successful project with

this Illinois traffic stops initiative and I'm sure it will

help lots of people. Do you have any plans on doing

more projects like that where you help organizations

that use data and data science for good?

Mollie Pettit: Yes. Ideally, that's something I would really love to do.

So, there's first of all this project could be expanded.

There's a lot more things that could be added to the

site and more things that could be dug into. But

additionally outside of that, really wanting to do as

much of this kind of work as possible. In fact, the

people that I worked on this particular project with, if

you do go to illinoistrafficstops.com and go to the

bottom, you'll see that there's a little section, a little


support section basically detailing how a lot of

volunteer hours have gone into creating this. Despite

wanting to do it full time, you know, wallets don't

always allow that. So, there's a place where if people

want to contribute to the continuation of this project

as well as other social good projects, they can donate

with that link. Anything donated will only go to

basically pay for the creation of more projects either

this one or similar that are all social good focused.

Kirill Eremenko: Fantastic. I commend you guys on that. That's an

amazing idea. In fact, I'll be one of the first people to

donate. I, honestly, this is one of the first things I'm

going to do after this podcast. I often, like, I want to

help in the world, but oftentimes I kind of like stop,

because I hear stories that with a lot of organizations

that you donate to, you don't know where the money's

going. You don't know if it's going towards the admin

or is it going somewhere else. You know, in certain

countries it might be going in exactly the opposite

direction than what you think. But if these little

initiatives, little projects that are run by people that I

personally know, I know that this is going to be used

for good that is going to actually help contribute to the

world. So, thank you so much for doing that.

Mollie Pettit: Yeah, exactly.

Kirill Eremenko: You have me on board with that already.

Mollie Pettit: Fabulous.

Kirill Eremenko: Awesome. Okay. Well, Mollie, thank you so much for

coming today on the show. Being fantastic. I loved

your talk. I loved our conversation today. Before I let


you go, where would you say our listeners can best

find you, get in touch, follow you and your amazing

visualizations and projects?

Mollie Pettit: Yeah. So, the best place to find, follow, and interact with

me would probably be Twitter. My handle is Mollzmp,

which is M-O-L-L-Z-M-P.

Kirill Eremenko: Uh-huh. (affirmative). Mollzmp. Okay. Gotcha.

Mollie Pettit: Yep.

Kirill Eremenko: Okay. We will include that in the show notes and yeah.

We have one more final question for you today. What's

your favorite book that you can recommend to our

listeners to help them become better at their careers?

Mollie Pettit: I have a book in mind. It's D3 specific. So, if you're out

there listening and D3 is something that you are

interested in learning about and trying your hand at,

my favorite book to recommend people for getting into

learning is called Interactive Data Visualization for the

Web and that is by Scott Murray.

Kirill Eremenko: So there you go, ladies and gentlemen. Interactive

Data Visualization for the Web is your book

recommended by Mollie. Mollie, thanks again for

coming on the show today. Had a fantastic time with

you and I'm sure lots of listeners will get amazing

insights from our today's chat. Thanks so much.

Mollie Pettit: Well, thank you.

Kirill Eremenko: So, there you have it. That was Mollie Pettit and I hope

you enjoyed this episode as much as I did. Lots of

great energy, lots of laughs, and lots of interesting things that


we talked about such as D3, the case study about

Illinois traffic stops, and using data science for good.

So, make sure to check out the illinoistrafficstops.com

website where you can play around with this case study

and actually see the interactivity of D3 in action on the

website. Also, if you can afford it, then at the bottom

there's a link where you can support Mollie's effort of

doing data science for good. I think that's a great way

to give back to the community. These projects often

are very helpful, but there's no funding for them. We

can all help like that. Or on the other hand you can

use your own data science skills to create your own

data science for good project or participate in one and

look out for those. I think it's a wonderful, fantastic

thing. A fantastic way of giving back to the world

through your data science skills or if you don't have

the time, through supporting others.

Kirill Eremenko: Also, Mollie asked me to mention that she has a Meet

Up in Chicago. So, if you're in Chicago and you want

to go to Mollie's Meet Up, then you can find the link to

this Meet Up in the show notes or you can go to

meetup.com and look for Chicago Data Viz

Community. Otherwise all of the links for this episode

will be in the show notes at

www.superdatascience.com/231. That's

superdatascience.com/231. You can get the link to the

Meet Up in Chicago, the Illinois traffic stop URL, the

Twitter handle for Mollie's Twitter. Make sure to follow

her there. And all the other items that we mentioned in

this podcast.


Kirill Eremenko: On that note, thanks so much for being here. I look

forward to seeing you back here next time and until

then, happy analyzing.


Date post:	12-Mar-2020