Data and Business Team Collaboration

Transcript

Bruno Wu

Data Scientist @Move

Prompt:

We have two kinds of interfaces: (1) technical (building models, turning them into APIs, embedding them into products) and (2) business (translating business problems into data problems). How can we repeatedly take outputs from models and translate them into value for the business through interventions, experiments, and new product features? What tools do we need to create to do this again and again? (How do we keep things from getting “Lost in Translation”?)

I have to confess

Split Personality

Goals/values of business and technical interfaces

Towards a Common Framework

Problem -> Data -> Model v1 -> Testing -> Release v1 -> Data/Feedback -> Model v2 -> Testing -> Release v2 … -> IMPACT

Goal: Increase Velocity of the Vortex

Identify Stage-Transition Tasks (STTs)


- Problem definitions / scoping

- Find / acquire data and labels

- Feature engineering / selection

- Algorithm training / selection

- Data engineering

- Testing / Optimization


Dissecting Stage-Transition Tasks (STTs)

1. Problem definition/scoping

2. Find/acquire data and labels

3. Feature engineering/selection

4. Algorithm training/selection

5. Testing and Optimization

Speed

Automate

Standardize

Collaborate

Define Problem and Scope (Problem -> Data)

- Scoping document and data product roadmap wiki

Well-Defined Problems (Problem -> Data)

- Well-defined problems. Example: “Lookyloos”

- Ratio = Revenue Lost Per Lead Submitter (from shrinking lead form) ÷ Revenue Gained Per Lead Submitter (from ad impressions)
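
The ratio above can be read as a simple net-impact check. A minimal sketch of the calculation, assuming hypothetical per-lead-submitter revenue figures (the function name and numbers below are illustrative, not from the talk):

    def lookyloo_tradeoff_ratio(revenue_lost_per_lead, revenue_gained_per_lead):
        """Revenue lost per lead submitter (from shrinking the lead form)
        divided by revenue gained per lead submitter (from ad impressions).
        A ratio below 1 suggests the change is a net win."""
        return revenue_lost_per_lead / revenue_gained_per_lead

    # Illustrative placeholder numbers only:
    ratio = lookyloo_tradeoff_ratio(revenue_lost_per_lead=0.40,
                                    revenue_gained_per_lead=0.55)
    print(round(ratio, 2))  # 0.73 -> the gain outweighs the loss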

Not So Well-Defined Problems (Problem -> Data)

- Unfortunately, many business problems of real value are also difficult to define:

- Who are Potential Sellers or Millennials on Realtor.com?

- What constitutes similar neighborhoods?

- Increase collaboration with domain experts and expand options for data collection.

Acquiring Labels (Problem -> Data)

Collecting labels is often the most critical yet most difficult step.

- Implicit vs. explicit labels – users’ actions vs. surveys or registrations

- Tools:

1. Guidelines and budget to create more APIs/products for automating label generation (e.g., content, widgets, games)

2. “Human-in-the-Loop” services, e.g., CrowdFlower, Amazon Mechanical Turk (e.g., evaluating relevance for neighborhoods, recommendations, and image tagging)

Collecting Labels
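
One way to make the implicit-vs-explicit distinction concrete is to derive implicit labels from user action logs and join explicit labels from surveys separately. A hedged sketch using pandas, with a hypothetical event log whose column names and event types are placeholders:

    import pandas as pd

    # Hypothetical clickstream log; column names and event types are placeholders.
    events = pd.DataFrame({
        "user_id":    [1, 1, 2, 2, 3],
        "event":      ["view", "lead_submit", "view", "view", "save_listing"],
        "listing_id": [10, 10, 11, 12, 13],
    })

    # Implicit label: infer engagement from actions the user actually took.
    implicit = (events
                .assign(engaged=events["event"].isin(["lead_submit", "save_listing"]))
                .groupby("user_id")["engaged"].max()
                .rename("implicit_label"))

    # Explicit label: e.g. from a survey or registration form (placeholder answers).
    explicit = pd.Series({1: True, 3: False}, name="explicit_label")

    labels = pd.concat([implicit, explicit], axis=1)
    print(labels)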

Acquiring Features (Problem -> Data)

Data Enrichment products or services

- Develop a data acquisition strategy / guidelines that let data scientists subscribe to new sources and grow the feature space

- Tools:

1. Census data appears to add predictive power; use it to expand the feature space (e.g., PolicyMap)

2. Cross-device tracking services to link users across platforms to increase feature space.

Data Enrichment
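
As a sketch of how such enrichment might work in practice, the listings and census tables below are hypothetical placeholders joined on a shared ZIP-code key; a PolicyMap-style export or a cross-device identity graph would plug in the same way:

    import pandas as pd

    # Hypothetical internal listings data (placeholder columns).
    listings = pd.DataFrame({
        "listing_id": [101, 102, 103],
        "zip":        ["94103", "94110", "10001"],
        "price":      [950_000, 1_200_000, 800_000],
    })

    # Hypothetical census extract keyed by ZIP code.
    census = pd.DataFrame({
        "zip":           ["94103", "94110", "10001"],
        "median_income": [104_000, 118_000, 96_000],
        "pct_age_25_34": [0.28, 0.24, 0.31],
    })

    # Expand the feature space with a left join on the shared key.
    enriched = listings.merge(census, on="zip", how="left")
    print(enriched.columns.tolist())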

Feature Engineering / Selection (Data -> Model)

- Increasing standardization and rigor of the feature engineering and selection process will help to speed things up:

- Tools: Google BigQuery/Python/R

- Categorization and cataloguing of features

- Create derivative features

- Systematic tests to measure feature importance

- Systematic procedures for feature selection
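
A minimal sketch of what “systematic tests to measure feature importance” and “systematic procedures for feature selection” could look like in Python with scikit-learn; the synthetic data and the median threshold are placeholder choices, not Move’s actual pipeline:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectFromModel
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    # Placeholder data standing in for a catalogued feature table.
    X, y = make_classification(n_samples=2000, n_features=20, n_informative=6, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

    # Systematic importance test: permutation importance on held-out data.
    imp = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
    print("Features ranked by importance:", np.argsort(imp.importances_mean)[::-1][:5])

    # Systematic selection: keep features above a fixed importance threshold.
    selector = SelectFromModel(RandomForestClassifier(n_estimators=200, random_state=0),
                               threshold="median").fit(X_train, y_train)
    print("Selected feature count:", selector.transform(X_train).shape[1])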

Feature Engineering / Selection (Data -> Model)

- Needs the most time and input from domain experts.

- Collaboration is crucial for this task; otherwise data scientists are just making educated guesses.

- Extensive collaboration from the business team on feature engineering is still missing at the moment.


Algorithm Training / Selection (Data -> Model)

- Lost in Translation (Part 1)

- On the one hand: models are well understood by data scientists but a black box to other stakeholders.

Algorithm Training / Selection (Data -> Model)

- Lost in Translation (Part 2)

- On the other hand: stakeholders need more open-mindedness.

- Often, effective models are not simple heuristics based on one strong signal but a mix of weak signals.

- “Life is messy” and “wisdom of the crowd” analogies.

- Embrace nuance and be comfortable adapting to it, rather than viewing simplification as paramount.
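
To illustrate the “mix of weak signals” point, a hedged sketch on synthetic data (not a claim about any Realtor.com model) comparing a single-rule heuristic against a simple model that pools many weak features:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Synthetic data: several moderately informative features, no single dominant one.
    X, y = make_classification(n_samples=5000, n_features=15, n_informative=10,
                               n_redundant=0, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # "Simple heuristic": threshold the single most correlated feature.
    corr = [np.corrcoef(X_tr[:, j], y_tr)[0, 1] for j in range(X_tr.shape[1])]
    best = int(np.argmax(np.abs(corr)))
    sign = 1 if corr[best] > 0 else -1
    heuristic = ((sign * X_te[:, best]) > np.median(sign * X_tr[:, best])).astype(int)

    # Model pooling all the weak signals.
    pooled = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict(X_te)

    print("Single-rule heuristic accuracy:", round(accuracy_score(y_te, heuristic), 3))
    print("Weak-signal model accuracy:    ", round(accuracy_score(y_te, pooled), 3))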

Testing and Optimization (Model -> Testing)

- Embrace quantity: AirBnB has ~100 tests running at any point in time. How?

- If tools and guidelines are sufficiently in place, we should aim to remove barriers for testing as much as possible.

Testing and Optimization (Model -> Testing)

- This is beginning to happen @Move.

- Possible guidelines and tools for improving collaboration, standardization, and automation for testing:

- Not looking under the hood

- Tracking tools for internal APIs

- Require users to clarify a hypothesis / create a query to measure the right metrics (see the sketch below)

- Add experiment process to onboarding

- Make experiment documentation more discoverable on the wiki
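
For the “create a query to measure the right metrics” guideline, one option is to template the metric query so every experiment readout pulls the same definition. A sketch using the BigQuery Python client (the table, columns, and experiment ID are hypothetical placeholders):

    from google.cloud import bigquery

    # Hypothetical experiment-events table and columns; adjust to the real schema.
    METRIC_QUERY = """
        SELECT
          variant,
          COUNT(DISTINCT user_id)                                  AS users,
          COUNTIF(event = 'lead_submit') / COUNT(DISTINCT user_id) AS leads_per_user
        FROM `project.dataset.experiment_events`
        WHERE experiment_id = @experiment_id
        GROUP BY variant
    """

    client = bigquery.Client()
    job = client.query(
        METRIC_QUERY,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("experiment_id", "STRING", "exp_123"),
            ]
        ),
    )
    print(job.to_dataframe())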

Testing and Optimization (Model -> Testing)

- Currently, we utilize both third-party tools and a proprietary API for testing.

- Tools: Optimizely, proprietary REST API

- Using the same tools / languages across data science, testing, and production speeds up experimentation and deployment: implement optimized versions only when needed, and reduce the chances of getting “lost in translation” between experimentation, testing, and production.
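
In the same spirit of shared tools across data science, testing, and production, the experiment readout itself can be a small shared Python step. A sketch with made-up counts (in practice the exposure and conversion numbers would come from Optimizely or the internal API):

    from statsmodels.stats.proportion import proportions_ztest

    # Made-up counts; real numbers would come from the experiment tracking tool/API.
    conversions = [510, 462]       # variant, control
    exposures = [10_000, 10_000]

    z_stat, p_value = proportions_ztest(count=conversions, nobs=exposures)
    print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
    # A pre-registered hypothesis and significance threshold keep ~100 concurrent
    # tests from turning into post-hoc fishing.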

Three Things To Remember

1. Embrace nuance – it helps counter biases (e.g., confirmation bias, selection bias)

2. Set up systems/guidelines in order to remove barriers for frequent testing

3. Collaborate at critical points where domain experts can add the most value

