SDS PODCAST
EPISODE 354: DJ PATIL ON HARNESSING THE POWER OF DATA SCIENCE COMMUNITY
Kirill Eremenko: This is episode number 355 with Ex-Chief Data
Scientist of the United States, DJ Patil.
Kirill Eremenko: Welcome to the SuperDataScience podcast. My name
is Kirill Eremenko, Data Science Coach and Lifestyle
Entrepreneur. And each week we bring you inspiring
people and ideas to help you build your successful
career in data science. Thanks for being here today
and now let's make the complex simple.
Kirill Eremenko: Welcome back to the SuperDataScience podcast
everybody. Super excited to have you back here on this
show and I am very inspired today. Why's that? Well,
because 15 minutes ago I got off the phone with DJ
Patil and we recorded the episode you're about to hear.
It's a super exciting episode with one of the most
famous if not the most well-known person in the space
of data science. So if you don't know, if you haven't
heard of DJ Patil, he is the person who co-authored
the Harvard Business Review article called "Data
Scientist: The Sexiest Job of the 21st Century."
If you haven't read it, read it. Make sure to check it
out. That gave rise to the popularity of data science.
He also coined the term data scientist. That came
originally from when he was working at LinkedIn. He
didn't know what to call himself. He and Jeff
Hammerbacher didn't know what to call themselves
and they came up with "data scientist." That's why we
have data science right now. And also he is the ex-
chief data scientist of the United States. How amazing
is that?
Kirill Eremenko: And on top of that, in this episode, we covered some
very interesting and important topics. So here are a
couple of examples of what you will hear about: data
privacy and ethics, data in healthcare and biotech,
DJ's work at the White House and some of his most
memorable moments while he was there, his current
mission at Devoted Health and what they're doing,
how much progress they're making, the future of data
science, data science for good versus data science for
bad or evil, and data science communities. So those
are just a couple of topics we're going to cover. I'm
sure you're going to love this chat, this conversation,
and by the end of it, you're going to be super-inspired
about data science and your career in the field. So
without further ado, I bring to you the ex-chief data
scientist of the United States, DJ Patil.
Kirill Eremenko: Welcome back to the SuperDataScience podcast,
everybody. Super excited to have you back here on
board, and today's guest is none other than DJ Patil.
DJ, how are you going? Welcome to the show.
DJ Patil: Thanks for having me.
Kirill Eremenko: Super excited to have you here. Everybody's heard
about you as the person who started this whole data
science movement. It's a huge honor to have you so
probably like first question would be how does it feel to
be the person who created data science as we know it
right now?
DJ Patil: Well, I think the way I really think about data science
and the movement is this is a community. A lot of
times when people look to a particular person and say
that person started it or this person really is the most
seminal person behind it, I think a better way to
think about it is that what I got to do was help steer the
community toward a set of problems. But the thing
that is probably more interesting than anything else is
that this community has been going on a long time. If
we go back far enough, you get the Mayans and the
Indians and Chinese astrologers and astronomers. You
move to Kepler and Copernicus doing really amazing
things with data and very difficult calculations. You
get to people like George Washington who was doing
cartography and maybe should be argued as the first
real chief data scientist of the United States.
DJ Patil: And we've had a movement and that movement has
really manifested in the next wave of people who have
unbridled enthusiasm to use data, have incredible
technical skills, we have computational power that
we've never had the likes of that are easily accessible
through the cloud, we have storage and we have the
ability to collaborate just like we are miles apart. And
so that has manifested in ways that we can apply
technology in approaches that we had not thought
before. And so I really think of it as I've had the
opportunity to be more of a community organizer than
anything around saying this is how data science
should be.
DJ Patil: I think what we can say, if anything, is there are
certain things that data science, I would hope, isn't,
and we can talk about some of those and what I think
some of the challenges are if we do data science in the
wrong way and the impact. But I also think that we
should also not get to a place where we are so
regimented that we say data science is this one narrow
thing. We should really think about data science as a
team sport and we all have different roles to play that
use data to make really fascinating good things
happen.
Kirill Eremenko: Yeah. I love it. I was reading an interview you had with
The Observer and you mentioned there that you are
generally opposed to trying to define data science too
rigorously. But it would be interesting to hear your
thoughts on what you just mentioned. What data
science is not. What are your comments there?
DJ Patil: Yeah, so I think the first thing, when we say what
data science is not, is the question, in its most
extreme form, of when we should not be using data, or
using data in ways that possibly cause harm. And there are a
number of ways to look at this and this is why we've
been so active, people like myself, Hilary Mason and
Mike Loukides. We actually published a book on this
around data science and ethics. It's a small ebook. We
made it free for everybody because we want everyone
to kind of take away the ideas around how do we
actually start having this conversation about the
ethical use of data. When we think about it historically
and we ask where some of the most egregious human
atrocities have taken place... Take the Nazis. One of
the most egregious cases is the phone book. The
phone book is a database.
DJ Patil: And so as we head into this next wave of technology
being able to do things, what does it look like where we
might possibly do harm to people? And it's very easy
for us to say, "Oh, no, that's not going to happen
again." But remember, we've also had a history of
biomedical research, particularly in the United States
as well as the Western World, where we've had issues
like Henrietta Lacks and Tuskegee syphilis
experiments where we've had breaches of the way we
do things in an ethical manner. Right now we're faced,
in this time, with the question of how do we use
technology to ensure that it is implemented with
the values we would like? There's conversations where
people are using data that is scraped from websites
like social media, Instagram and Facebook, to create
the basis for facial recognition technology for police
departments and maybe parts of the government. Is
that acceptable? Should we allow people to do that?
DJ Patil: When we think about voting and using data to
disenfranchise voters, that's a bad problem, in my
mind, for what data scientists should be. We haven't
figured out how to self-police. Other communities have
figured out how to self-police. If somebody works on
genomic research and it isn't considered acceptable,
the community knows how to address that situation
and then there's legal ramifications on top of that. We
have to get to the place as a community of asking
ourselves what is acceptable. And the specific way that
that was actually implemented as the US chief data
scientist is the mission statement of that role, and
that is to responsibly unleash the power of data to
benefit all Americans.
DJ Patil: And I think data scientists should take this to the
statement of how do we responsibly unleash the power
of data to benefit everyone? Just because we can,
doesn't mean we should. That's part of the
responsibility. I think we should extend that to
everybody to make sure we are using data to empower
every single person.
Kirill Eremenko: That's a very interesting and valid point of view and
here I'd like to refer to what I mentioned before the
podcast. I was listening to your open hearing on
developing and deploying next-generation technologies
to, I think it was to the Congress? I don't know enough
about politics to understand the dynamics there. One
of the concerns that came up there was I noticed a lot
of the questions were around China, around how the
US is competing with China for domination in the
space of artificial intelligence and other exponential
technologies.
Kirill Eremenko: While these ethical considerations are extremely
important, they are crucial, one of the issues that I
can see and what I also heard in this open hearing is
that they are usually limited to one single jurisdiction
or certain set of countries, maybe like the Western
World or America or Europe or China and so on. And
so what are your comments on how imposing these ethical
considerations and certain restrictions on the development
of data science, while absolutely important, can inevitably
slow down or inhibit the rate of progress that the US
or the Western World will have, as opposed to what will
happen in China, where they have their own ethical
considerations which might be very different and they
can get much further ahead? What kind of
consequences can that carry?
DJ Patil: Great question. One of the reasons I think everyone is
fixated on China is largely due to how aggressively
they are investing. And it gets to a place where we can
easily point the finger and say, "China is doing all this
stuff and so we should slow them down." I think the
better way to look at it is why aren't we investing as
aggressively in our own societies to continue to keep
up our pace and our competitive edge? We have
dropped the amount of funding with which we
support our basic sciences every year. We continue
to have questions, even right now, around the Centers
for Disease Control, the CDC, about funding. This
current administration wants to cut the funding to
those groups and yet we're seeing the ramifications
when we don't fund research as well as these groups.
And that's not just the US, that's the Western World.
China is increasing the funding. We are entering a
space where, within the next 30 years, we will no
longer have singular dominance that we've seen.
DJ Patil: As that develops, one of the questions that's inherent
is values and what does it look like with western
values? Part of the reason why western values are
important is it's about democratic process. But when
we think about science and we think about areas like
cloning humans, we have a framework that has been
developed through a lot of hardship. Much of that has
been in Europe through the Nuremberg trials that
turned into Nuremberg code that turned into bioethics
after WWII. And we've realized that certain things,
including experimenting on humans, not only have negative
repercussions for society and take away
human dignity, but are actually a road down which
you get into all sorts of thorny issues that we have
realized are just not acceptable for people when
they haven't given consent.
DJ Patil: In China, and it doesn't have to be China, it can be other
countries with totalitarian regimes, you run into
the same aspects. So when we think about the power
that is about to be unleashed through technology and
data, we have to ensure that that technology works for
us rather than against us. And when we look at some
of the technology deployments that are being done
where you have groups that are being persecuted
through the use of technology, facial recognition or
other things, that's a problem and we have to figure
out how, as a society, we are going to make sure that
the technology and the focus of how we implement
those technologies is really on the side of democratic
values.
Kirill Eremenko: Mm-hmm. Gotcha.
Kirill Eremenko: Hey everybody, I hope you are enjoying this amazing
episode with DJ Patil. This is a quick announcement
and we'll get right back to it. We are hiring at
SuperDataScience. With the recent pandemic and the
coronavirus, we all know how a lot of people have lost
their jobs and their source of income, so hopefully this
will be a breath of fresh air for some people out there.
We are a 100% remote team, we all work online, we
continue to grow and I've just, literally just published
10 new positions at SuperDataScience, which might be
suitable to you.
Kirill Eremenko: And even if they are not suitable to you, check them
out, they are at superdatascience.com/careers, check
them out and send them to somebody you know who
may have been displaced by this pandemic and all the
lockdowns, who may have lost their job and source of
income. You could change their life. We are creating
opportunities for people to do their best work, to
contribute, to create amazing products, to create
amazing experiences for people studying data science.
Kirill Eremenko: So here are some of the positions that have just been
released: VP of Marketing, Product Designer, General
Manager, VP of Sales, Junior Media Creator, Sales
Representative, B2B Events Sales Representative,
Event Marketer, B2B Sales Representative and
Marketing Strategist. And those are just some of the
initial positions that we have available right now. More
will come soon, so keep an eye out at
superdatascience.com/careers. Maybe we'll even post
a data scientist position in the near future.
Kirill Eremenko: But even if none of these are relevant to you
specifically, if you know somebody who's in marketing,
or in sales, or who's a great general manager, who's
great at creating amazing products in education and
learning experiences, or who's great at running events
or somebody who is amazing at creating animated
videos, if you know any of these people, any people
with the right talents and skills, please send them this
link, superdatascience.com/careers. This could
change their life or career, especially in these difficult
times. Thank you very much for your help and let's get
back to the episode with DJ Patil.
Kirill Eremenko: One of your co-panelists on this open hearing, Mr.
Chris Darby from In-Q-Tel, he had an interesting
comment. He said, "All roads lead to two places..." in
technology, I'm assuming, "... microelectronics and
biotechnology." And data science is at the core of all
technologies right now, in my perspective, because it's
data, right? And then he proceeded to quote a
scientist, as he mentioned, a scientist from China, and
he said that according to the scientist, the quote was,
"The Europeans won the industrial revolution, the
Americans won the IT revolution, and in China, we're
going to win the bio-revolution." What are your
thoughts on that and how can America and the
Western World compete with China in the space of the
bio-revolution?
DJ Patil: So I think it's very easy to try to just highlight China
as the bad guy in this kind of situation. And it's more
useful, I think, to ask who are we really competing
against? To me, we're competing against cancer. We're
competing against the pandemic that is already here.
We're going to have far too many people that are going
to be killed by this disease because we weren't able to
use data efficiently to know where it is, to test
appropriately, and develop strategies to get ahead of
the typical infection curve that is the exponential rate
of infections. So when I look at that, I look at what's
holding back a cure. Well, one, we have the best data
sets right now in the United States and across Europe
because we have not only genetic diversity but we have
great electronic medical records. The problem is the
data is fragmented over thousands of databases and
there's no ability to easily pull that data together.
DJ Patil: Earlier this week, new rules were passed by the
administration to actually make sure that the data
remains a patient's data and you can take your data
and move it, and that includes to researchers. The
reason that's so powerful is, if we're able to bring that
data together and you have fantastic data scientists
working on that data, maybe there's cures already out
there we just haven't realized are cures. And when we
partner with epidemiologists, researchers, and the
traditional drug discovery units, maybe we'll find
something that could be put to off-label use. It's
not already used for one thing but if we use it there it's
going to have fantastic impact. Maybe it's going to help
us identify new forms of disease vectors that we hadn't
thought about and then when we look at them we'll go,
"Oh, wow. How amazing is it that we now have this
targeted population that if we find a cure for, we're
going to give them disproportionate value added for
life."
DJ Patil: We look at something like ALS, Lou Gehrig's disease.
We look at Alzheimer's. We look at all these things.
These diseases don't care what race you are. They
don't care where you live. These are problems of a
species. What I look at as a country, and this is why
it was so important when President Obama
launched the Precision Medicine Initiative and put Joe
Biden in charge of the Cancer Moonshot, is that we
have to put data together along with all sorts of other
things, microelectronics, biotech, new sensor designs,
all these things together to find new ways to think
about these diseases. We cannot be thinking about
them in the ways of the previous few decades. Central
to that thesis are going to be the data scientists. The
data scientists are going to be the ones that are going
to unlock this. Whether you call them a data scientist,
you call them an epidemiologist, that person who is
looking at data right now, that person is going to be
key for helping us get ahead of this pandemic that is
here now called COVID-19.
Kirill Eremenko: Yeah, that's definitely a big problem. I saw recently
that Johns Hopkins University released data to the
public that you can go and analyze about COVID-19.
As you say, maybe somebody will come up with a
solution along the way.
DJ Patil: Well, this is why transparency of data is so critical.
Right now, we don't have great transparency of data
between countries. China has been far too slow in
releasing the data. That was true during SARS. We've
seen this also during MERS that there wasn't enough
data sharing. And in the Ebola incident, one of the most
powerful things that was used to help get ahead of
it was Google Docs, because otherwise people would
share their data as spreadsheets and you didn't know
when that spreadsheet was last really updated and by
whom. So having it real-time, with somebody filling in the
data that they saw in their town and updating it daily
gave everyone a clear indicator of where the disease
was moving and propagating and allowed us to get
infrastructure in place to make sure that you could
start helping people.
DJ Patil: That transparency is not happening fast enough right
now in the United States. For example, where is the
total number of tests? How many are administered?
How many are positive? With all of this, if there were very
aggressive data sharing across the federal system, across
the states, across the cities, across the towns, we'd
have a much better, more realistic picture. And then we
could start developing strategies very quickly. We
could learn from the Chinese because they've dealt
with this first. We could learn from the Italians. And
then we could share with countries that are going to
be impacted that don't have the quality of healthcare
system that we do so the number of deaths in those
societies is going to be substantially higher. We could
save a lot more lives if we had people just doing
something very simple with just data sharing.
DJ Patil: One of the things that's really important, that I
have found in my experience around these things, is that we
often look to the AI solution right away. A lot of times,
we could just go with the tiny, bare-bones, just share
some data and you'll find a huge amount of lift in the
problem. That's not to say we shouldn't do the AI
solution. I'm not saying that at all. I'm saying let's
focus just on some of the basics. Can I give you a
concrete example?
Kirill Eremenko: Sure. Sure.
DJ Patil: In Miami-Dade, Florida, they realized that they have, as
many places in the United States do, this
problem of too many people in jail. And one of the root
causes of that is mental health issues. People with
mental health issues get taken to jail rather than
actually getting to the treatment centers that they
should. Same with drug addiction. So if you see a
person who's constantly getting picked up for mental
health issues, why do we keep taking them to jail?
That's kind of crazy.
Kirill Eremenko: Mm-hmm.
DJ Patil: So, instead, what they decided was to say
let's share data between our public health system and
our criminal justice system but in a super-secure way
that respects privacy. The data can only flow from
criminal justice to the health system, not the other
way around. And when somebody gets picked up, they
check in with the public health system and if they see
that person they don't take them to jail. It cost
something like a million, million and a half dollars, to
get this going. In the first year alone it saved 10 million
dollars.
Kirill Eremenko: Wow.
DJ Patil: But the real value is it closed a full jail. Then a little
later on they closed a second jail. All that was done
there is sharing data. It's the spreadsheet. It's literally
a spreadsheet, now with a lot of safeguards in place,
but a spreadsheet.
Kirill Eremenko: Very interesting. Wow. Yeah, so that really shows
the importance of having this role in the government. It
was very exciting to hear when you got the chief data
scientist position which was created for the first time
by Obama and you were the first chief data scientist
for the US. I think it's very important. Is there a chief
data scientist at the moment in the US?
DJ Patil: They are looking for one, is what they tell me. So
maybe somebody here will apply. As much as I might
be harsh on this administration, there also have been
a number of really good things this administration has
done around data. For example, recently President
Trump did sign an executive order that basically asked
for a reevaluation of how we look at organ donations. The
fact, right now, is too many people in this country go
without an organ when they could easily receive a
kidney or a liver or a heart or something that would
give them an incredible number of days left in their
lives. They would be able to take that.
DJ Patil: But the reason that happens is there aren't any quality
measures that actually assess when people are doing a
good job of actually making sure those organs get to
the right person. And so, as a result, many times these
groups that actually have the responsibility to do this,
they let the organs expire. They're left in the body for
too long. They're not picked up in time. They're
mishandled. And so a person who's waiting on the
operating table to receive their kidney doesn't get it.
And that's just a tragedy when it could be so easy to
do.
DJ Patil: We're not talking, again, about any sophisticated AI. We're
talking about just measuring something and having a
dashboard that allows us to ask ourselves are we
doing a good job or not, and continuously improve it.
Kirill Eremenko: Gotcha. No. Totally agree. Totally agree. In the interest
of time, let's proceed to our little experiment that we
did on LinkedIn asking people for questions. So, as
you saw, there are dozens of questions posted for you
from people. Very excited to hear from you. Maybe let's
have a few. What is your favorite question out of the
ones that you saw on LinkedIn?
DJ Patil: Oh, boy. I can't pull it up here simultaneously.
Kirill Eremenko: Oh, okay. No worries. One of my favorite questions
was from Akshey who asked, "What do you think
makes a good data scientist and how do you approach
any data science problem?"
DJ Patil: The thing that I have found, Akshey, time and time
again is the best data scientists have curiosity. They're
the people that just have this ability to go, "What
about this? What about this? What about this?" And
the question I used to literally give back in the days of
LinkedIn is I used to say, "Pretend you had all of
LinkedIn's data. What would you be interested in
knowing? What would be the first thing you would
want to know?" And you'd be surprised how many
people would just stare at me blankly. The best data
scientists, they would just start and they'd have idea
after idea after idea. And they would just keep going
until I was like, "Okay. Okay. We're good." The best
ones, oftentimes, would be like, "Have you thought
about this or this?" and I'd be like, "Oh my gosh, no. I
haven't." Or they'd say, "What if you combined data
from LinkedIn with this other data set? Have you
thought about that? And what about this? Have you
tried this? Or could we turn this into a product that
would have value this way?"
DJ Patil: That curiosity plus passion is something that you
develop especially at the intersection of multi-
disciplinary sciences. So, myself, I was working in
non-linear dynamics. Was doing math but was also
doing a tremendous amount of weather data. And so
you kind of have to sit at these intersections and
you're just trying to find data sets. You're trying to
figure out things. What I tell a lot of data scientists is
you need to play with a lot of data sets to just develop
intuition, to develop curiosity. Be very fast at plotting
something, trying something, getting a sense of what's
going on in the data. For me, sometimes when I get a
data set, the first thing I love to do is just kind of tab
through the data and just get a sense. There's this
moment like if you use Unix or Linux you're using the
more command and you're just seeing what's in this
file. Are there characters? Are there just numbers? Are
the numbers decimal? You just let it blur and you just
get a sense of what's in there. It just starts to expose.
DJ Patil: And then I'm trying to find lots of ways to just visualize
it. Visualization, for me, oftentimes, is just histograms
to get a sense of what's in this and then trying to go,
"What if? What about this? What about that?" The
more you can develop that the better I think you're
going to be at being really fast at helping find solutions
for another person.
Kirill Eremenko: Gotcha. Curiosity. Wonderful answer. I love it. Suman
asks, "What are the new challenges where data science
is heading towards? What is your vision for data
science in the next five years?"
DJ Patil: Wow, Suman, great question. So the first is I think
there are so many areas I'm so excited about data
science impacting. I think data scientists are one of
the new forms of first responders. You know, when
there's an earthquake in a remote area of the world,
before people can even get in to help, first responders
now have the ability to look at satellite imagery, drone
footage, being able to tell which roads are washed out
or bridges have been wiped out. If it's a hurricane we
could use drones plus just a little bit of computer
vision to actually tell which houses people are on.
Could we then route boats to quickly get to all those
people, just like Uber or Lyft or UPS uses
routing algorithms?
DJ Patil: Then there are the biological fields, trying to understand
how disease manifests, using large data sets to find
that basis, like the Precision Medicine Initiative. I think
about the world of understanding new chemicals and
particularly about material sciences and using data
science to help understand how to get better
manufacturing. That's a fantastic area.
DJ Patil: I look at the world of how do we create tailored
education and help people learn faster? Myself, I was
such a bad student. I think tailored education
would've really helped someone like me. I could go on
and on. If there's one thing that I'm most
excited about for the data science field over the next
five years, it is that this is central to the success of every
institution and every organization from nonprofit to
for-profit, the government, everybody will have to have
some notion of data. And everybody that's being
trained in undergraduate curriculum will have some
element of data literacy.
Kirill Eremenko: Mm-hmm. Absolutely. Absolutely. Like
Andrew Ng says, "AI is the new electricity." Right? I
can't even think of a single business that doesn't use
electricity right now, whereas 100 years ago I think the
residential electrification of the US was around 50%.
So it's massive.
Kirill Eremenko: Okay, next one is a fun one from Abhishek. "Which
was your most memorable work memory when you
were at the White House?"
DJ Patil: Oh, boy. What's my most memorable? The White
House is phenomenal in the way that there are
moments where things are incredibly, astonishingly
positive and astonishingly sad. That's just the
reflection of how complex the world is.
Kirill Eremenko: For example, what do you mean?
DJ Patil: On a positive, I remember so many positive ones. One
that always stands out in my mind was the day the
President was flying back from being in the south
where he was doing a memorial for a number of people
who were shot in a church. And he was flying back but
that was the same day the Supreme Court ruled that
anybody can get married to whoever they like because
love is love. We put the colors on the White House as a
rainbow. And I remember the President's helicopter
coming in from such a tragedy, circling around, and
we were thinking about the juxtaposition of such
vicious hatred at one moment that the President is
having to console people over and the next moment
having these amazing crowds there to celebrate such a
phenomenal activity.
DJ Patil: So many times meeting with people who have rare
diseases and are looking for hope and realizing that
they cannot wait. They can't wait for bureaucracy to
figure out how this is to work. They need the data in
people's hands who are going to figure out how to find
this cure for something that their loved one has or
they have. Time is so essential. What data science is,
is it is an accelerant to solutions. If we're not careful, it
is an accelerant to entropy. It can cause incredible
harm. But when used and wielded correctly, it is an
accelerant to help to deliver solutions very effectively.
Kirill Eremenko: Wonderful. Thank you. Siddharth asks a question.
Something we touched on already, in this podcast, I
think. Maybe we could elaborate. Quite a long-winded
question but I think it's an important one. "Data
science seems to enforce centralized power rather than
decentralized power in multiple contexts. The best
consumer companies driven by data science are
monopolies like Facebook and Amazon. The best
enterprise data science companies are like Palantir
and Databricks which primarily serve the largest
companies in the world as their customers. Data
seems to do much more to help the Chinese
surveillance state than it does to democratize and
improve the way we vote. How can we use data science
as an equalizing force for society rather than a
centralizing force? Is that even possible?"
DJ Patil: Yeah, so one of the most important things that just
happened this last week is the belief that a patient's
data is theirs. It doesn't belong to the hospital. It
doesn't belong to the doctor. It belongs to the actual
human. For quite some time, the hospitals and
physicians have believed that they own the data. They
should not. Now, it is codified that it's your data and
you have access to it. You want to move it? By all
means, you should be able to get access to it and you
should be able to take it to where you want. You want
to donate it? Great. Good for you. Donate it. It's giving
you control. That's part one.
DJ Patil: Part two is what we have to ensure is that there's
transparency of data. You have to be able to access it.
We still don't have enough reporting requirements for
people to know: what data is being collected? Who's got
my data? Who sold my data? We're starting to see
elements of that in different policies, some of which are
in Europe under what's called GDPR and in California
under the California Consumer Privacy Act, CCPA. But
we need more of that. Right now there are many data
brokers who can suck up data and use it without you
knowing it. Some of those data sets have real
implications for the population. For example, data sets
that are collected and used in loans have been shown to
actually negatively impact the black population. How
do we ensure that safeguard? We need that form of
watchdog. Somebody who's actually looking over the
shoulders for people to actually make sure that people
are using data in an acceptable way for society.
DJ Patil: The other part here is how do we train data scientists?
As we go forward and we think about the companies
and we think about who is there, what's fascinating is
we always talk about data interviews but we never
actually talk about giving people an ethics interview
around data. So one of the things is that anybody who
interviews with me will go through an ethics
interview with me because I view ethics as part of
asking a question around cultural fit. If we can't see
eye to eye on how we think about the importance of
ethical issues, then how do we deal with it?
DJ Patil: I'll give everyone an example of one because it's not
hard. Suppose we're working on a problem and we
know we're not supposed to use race but we find a
proxy for race. We also find that if we use this proxy
for race, we're going to help a lot of people. What's
your next step?
Kirill Eremenko: Oh. That's a tough... What answers do you normally
get to that?
DJ Patil: Well, I think the real answer that's interesting is, first,
as an organization, what safeguards do you have to
make sure I have the resources to be able to address
this problem correctly? Is the organization prepared
because what if I don't know the answer? How do we
adjudicate this? Who do I ask? Do we culturally have
this? Everybody that is interviewing at a company
should ask their company how do you handle ethical
issues around data and technology? If everyone asked
that question when they interviewed at Facebook or
Google or any of the other companies that were called
out, you would start to see a material change in their
approach.
Kirill Eremenko: Yeah, wow. It's so tempting not to ask, right? You just
want the job. You just want the high salary. You have
to put the global best interest, the greater good, ahead
of yourself in order to ask that question.
DJ Patil: This is a thing that we have to grapple with as a
community. We want the salaries. We want the power.
We want the prestige. Where is responsibility in that
conversation? To be empowered by society to do things
with data and technology means that we have to lead
the way, also, on responsibility. We should be leading
from the front. We shouldn't have to have civic groups
push us and say, "Have you thought about this or
this? What about these issues?" We shouldn't have
regulators saying, "Hey, how are you doing this?" We
should be going to them and saying, "Hey, we have the
following concerns. We're not sure we have all the right
answers. What should the answers be? Can we work
together to figure it out?"
DJ Patil: We need to push society to understand the
implications of what we are developing, the positive,
the negative. Otherwise, if we do not, what will happen
is that data sets will be harder to access. There will be
more restrictions on it. Progress will slow. That also
means that people won't have as many jobs. But more
importantly than all of that is that somebody who
needs a cure, somebody who needs help in a disaster,
somebody who is relying on a technological
breakthrough to happen to improve their quality of life
or a loved one's life, will not get the solution in the
time they need.
Kirill Eremenko: Gotcha. Well, I can feel how you're passionate about
this. Now it makes sense to me how or why from
working at the White House and doing public service
you moved into the healthcare space and doing data
there.
DJ Patil: Yeah, well, the reason I moved into healthcare is, a big
part of my portfolio that President Obama had set up
intentionally was healthcare. And I think rightly so
because he realized that people who are typically in
technology don't work on national security problems or
something else. We don't often gravitate to healthcare.
Or that people have been working in healthcare for a
while but they haven't had access to some of the newer
techniques that we really pioneered in the consumer
and enterprise companies. So what happens if we get
people together to do that? That genesis and looking at
that left us with the question that we had a chance
when we left the administration to ask well, what are
we going to spend our time doing?
DJ Patil: Well, if you look at that, one of the greatest challenges
that we have is how do we ensure that people have
access to the care they need, they want, they deserve.
And so we said the only way that this is going to
happen is if we actually show the way forward in what
we believe is true. And so we said we were going to do
this, and the only way to actually make it work, in
our model, is through a corporate enterprise. And so
we started Devoted Health and the mission is to build
a healthcare system that takes care of everyone like
their own family.
DJ Patil: Literally, we have something that we call the prime
directive which is if you're not sure what decision to
make, close your eyes, visualize literally in front of you
the person that you think of the most, your loved one.
What's the decision that you would want to make for
them? And when you have that, run it by other people
to make sure it's legal, it's safe, it doesn't have
downsides. Then take the action. In healthcare, time is
of the essence and so we have to build those solutions.
We have to build those technologies.
DJ Patil: And parts of it are already proving. We find every day
somebody who is in a situation where our job is to
figure out how to unstick something in the healthcare
system for them. And it's not rocket science. A lot of
times it's just finding out something very obvious and
trying to figure out how do they actually get an answer
from somebody? Why do they have a drug interaction?
Why have they been prescribed drugs that are going to
cause some kind of interaction? Has anybody looked?
Has anybody double checked with them? Those simple
things.
Kirill Eremenko: Wow. Sounds like you're making massive progress
with Devoted Health and I hope that it goes really well
and we all see results, especially-
DJ Patil: We hope so. It's not a winner-take-all market. We're
excited that more people are coming to work on these
problems. We need more people in this country to
work on these things. If we have more people working
on these problems together, the "we" wins. It's what is
behind when we say we, the people. We, the people,
isn't just a whole bunch of individuals. It's we as
collective people, as citizens, as community, as
companies, as nonprofits, as religious groups. When
we all come together against a problem and we decide
people should have not only access to healthcare, they
should have access to good quality healthcare, and it
should be affordable, then we're going to see the
change happen.
Kirill Eremenko: Yeah. Gotcha. Amazing to hear this trajectory and the
progress that's being made. I know you have to go, DJ.
DJ Patil: Can I give one more thing?
Kirill Eremenko: Of course.
DJ Patil: Yep. What I would tell people to think about a lot of
times when we're thinking about data science and
we're thinking about the problems that we pick. As
data scientists, we get to pick our problems that we
want to work on these days. Ask yourself, what is
going to move the needle the most for your children
and your children's children? Because we're in that
inflection point as a society that if we pick the
problems that move the needle for our children and
our children's children, we will select a set of problems
that will deliver outsized value for decades to come.
DJ Patil: When that impact manifests and we look back at our
careers and we look back at what we've done and how
many people we've helped along the way, then we can
rest easy. If we look back and we only say, "Gee, that
only benefited me," what good is that at the end of the
day? It doesn't matter if you wrote the fastest
algorithm in the world, you're traveling alone. And
that's a sad lonely place you could be and it's a wasted
set of skills, in my opinion, because everybody that is
working in the data science field has such phenomenal
opportunity to have an impact now. And society
cannot wait for the impact that every one of you can
provide.
Kirill Eremenko: So much leverage. Data science provides so much
leverage.
DJ Patil: It's leverage and that's why we have to do it as a team.
It is a team sport and all of us have to be on that team
together collaboratively to make this happen. This is
why the community that you're putting together is so
important. Without that community, where are we
supposed to talk about these hard things? Where are
we supposed to have dialogue? Where are we supposed
to push each other? Where are we supposed to learn
from each other? We have to create those
communities. And it's not just one community, it's
going to be different kinds for different types. Who
knows where it's going to evolve? But without us as a
community, we're going to be struggling to actually be
on the right side of this equation over the long arc of
history.
Kirill Eremenko: Gotcha. Wow. Thank you very much, DJ. I think we
can wrap on that. I know you have to go but that was
very inspiring. I feel so inspired just listening to you
right now.
DJ Patil: Yeah. Well, thank you for everything you're doing for
the community. It's very much appreciated.
Kirill Eremenko: Thank you very much.
Kirill Eremenko: So there you have it, everybody. Thank you so much
for being here today and being part of the
SuperDataScience community. As you heard from DJ
Patil himself, communities in data science are
ultra-important because where are we going to discuss
these critical issues, the ethics, privacy, and future of
technology issues that are on everybody's mind and are
dictating where this field and the world are going? Because
data is underpinning all technologies that are
revolutionizing the world and data science is the way
to deal with data. And on that, I hope you enjoyed this
episode. My personal favorite part was when DJ was
talking about the importance of doing data science not
just for yourself. Being in the field not just for the
purpose of benefitting yourself, but instead, thinking
about others. How you're impacting the world, the
communities around you, people around you.
Because, as data scientists, we have so much leverage
to create impact, in DJ's words, "it would be such a
waste of our skills to just think about ourselves and
not think about others." I found that very inspiring. I
hope you did, too.
Kirill Eremenko: And if you enjoyed this episode, I highly encourage you
to follow DJ on LinkedIn where he has over 700,000
followers as well as other social media. We're going to
include all of the relevant links in the show notes, as
always, and you can find them at
superdatascience.com/355. That's
superdatascience.com/355. And one thing I would like
to ask of you, if you did enjoy this episode, please
share it with your friends and colleagues. Let's spread
the word about data science and what missions we
have as data scientists across the community. If you
know a data scientist, if you know a data science
manager, data science leader, data science
practitioner, somebody who is getting into the field of
data science, send them this episode. It's very easy to
share, just send them the link
superdatascience.com/355. And on that note, my
friends, I really appreciate you being here today. Can't
wait to see you back here next time. And until then,
happy analyzing.