Dataset Gap Analysis/Prioritization Plan

Michelle Gierach, PO.DAAC Project Scientist

2012 PO.DAAC User Working Group (UWG) Meeting

March 7-8, 2012

Dataset Gap Analysis

Rec. 6. Do dataset gap analysis and create a report.

Rec. 19. PO.DAAC provide climatologies, anomalies, indices, and various dataset statistics for selected datasets.

Status:

• A dataset gap analysis document was created that details datasets currently available and those that will soon be available in the ocean community.

• Available climatologies, anomalies, and other value-added products (e.g., fluxes, frontal gradients) were included in the document. 

• Approx. 100 datasets were listed based upon input from the PO.DAAC User Working Group (UWG), Project Science Team (PST), and Data Engineers (DEs).

Future Plans:

• Request for information twice a year from the UWG, PST, DEs, and NASA science teams regarding additional datasets.

• After this initial phase of acquiring available datasets, the next step in FY13 will be to see where gaps still exist and either work with the community to create additional climatologies, anomalies, and indices or create them ourselves within PO.DAAC.

Dataset Gap Analysis Prioritization

Now that we have a document that lists ~100 datasets that would be of benefit to our users, how do we prioritize?

Past prioritization has been subjective and ad hoc.

Need a system that is unbiased and provides quantitative measures to assess a dataset’s significance.

Dataset Lifecycle Phases

Identify a Dataset of Interest

Green-Light the Dataset

Tailor the Dataset Policy

Ingest the Dataset

Archive the Dataset

Register/Catalog the Dataset

Distribute the Dataset

Verify the Dataset

Rollout the Dataset

Maintain the Dataset

Prioritization Plan Flow Chart: NASA Process

Prioritization Plan Flow Chart

[Flow chart. Box labels: Identify Datasets; Dataset List; Dataset Classification; Obligated; Non-Obligated; Significance; Ranked Dataset List; Cost Analysis; Dataset List; Recommendation; Archive; Remote Link; Reject Dataset]

Comments/Thoughts/Questions?

Prioritization Plan Flow Chart: Step 1

[Flow chart repeated, labeled Step 1, Step 1a, and Step 1b]

Prioritization Plan Flow Chart: Step 2

[Flow chart repeated, labeled Step 2]

Prioritization Plan Flow Chart: Step 3

[Flow chart repeated, labeled Step 3]

Prioritization Plan Flow Chart: Step 4

[Flow chart repeated, labeled Step 4]

Prioritization Plan Flow Chart: Step 1

[Flow chart repeated, labeled Step 1, Step 1a, and Step 1b]

Step 1: Dataset Identification/Classification

Approx. 100 datasets were identified within the oceanographic community.

Seven of these were classified as PO.DAAC obligations, including:

• L2B reprocessed QuikSCAT data (JPL)

• L2C QuikSCAT data (JPL)

• MEaSUREs CCMP-like product (Bourassa)

• GHRSST Pathfinder 5.2 SST

• GHRSST Global Ocean Sea Surface Temperature Multi Product Ensemble (GMPE)

• GHRSST Global Ocean OSTIA Sea Surface Temperature Anomaly

• GHRSST Global Ocean OSTIA Sea Surface Temperature Anomaly Reanalysis

First priority is given to datasets labeled as PO.DAAC obligations.
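
The classification in Step 1 can be sketched as a simple split of the candidate list. This is only an illustration, not PO.DAAC's actual tooling: the `Dataset` class, the `obligated` flag, and the `classify` function are hypothetical names, and the non-obligated entry is a made-up placeholder.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    obligated: bool  # True if classified as a PO.DAAC obligation (assumed flag)


def classify(datasets):
    """Split datasets into (obligated, non_obligated), preserving input order."""
    obligated = [d for d in datasets if d.obligated]
    non_obligated = [d for d in datasets if not d.obligated]
    return obligated, non_obligated


candidates = [
    Dataset("Hypothetical non-obligated dataset A", obligated=False),
    Dataset("GHRSST Pathfinder 5.2 SST", obligated=True),
    Dataset("L2C QuikSCAT data (JPL)", obligated=True),
]

# Obligated datasets get first priority; the rest go on to Step 2 for scoring.
first_priority, to_be_scored = classify(candidates)
for d in first_priority:
    print("obligated:", d.name)
for d in to_be_scored:
    print("non-obligated:", d.name)
```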

Prioritization Plan Flow Chart: Step 2

[Flow chart repeated, labeled Step 2]

Decisional Criteria

• Community Assessment: Papers written / number of citations; # of Likes; # of downloads/views

• Technical Quality: QQC+; Latency / Gappiness; Accuracy; Sampling issues? Caveats/known issues identified?

• Processing: Has it been manipulated? Cal/Val state? Verification state?

• Provenance: Maturity of platform/instrument/sensor; Maturity of Program; Parent datasets identified (if applicable); Is the sensor fully described? Is the context of the reading(s) fully described? State-of-the-Art technology?

• Documentation: What is the state of the documentation? Is the documentation captured (archived)?

• Adherence to Process Guidelines: Did it get fast-tracked? Tons of waivers? Were all exit criteria met satisfactorily? Consistent use of units?

• Access: Readily available? Foreign repository? Behind firewalls or open FTP?

• Toolkits: Data visualization routine? Data reader? Verified reader/subroutine?

• Relationships: Sibling/child datasets identified? Motivation/justification identified?

• Rarity: Hard-to-find data? Atypical sensor/resolution/etc.?

• Specification: Resolution (spatial / temporal); Spatial coverage; Start time; End time; Data format? Exotic data structure? Sizing / volume expectation?


Step 2: Significance of Non-Obligated Datasets


Prioritization criteria to assess a non-obligated dataset’s significance:

• Source: A particular dataset’s association.

    • PO.DAAC-centric NASA mission/project (1)

    • Non-PO.DAAC-centric NASA mission/project (0.75)

    • Domestic (non-NASA) mission/project (0.5)

    • International mission/project (0.25)

• Uniqueness: Would this be a new and/or one-of-a-kind dataset to PO.DAAC?

    • Yes/No (1/0)

• Desirability: Is there a need/want for this dataset in the community?

    • High/Medium/Low (1/0.5/0)

• Maturity (1st order): Community recognition? Technical Quality? Dataset Specifics?

    • High/Medium/Low (1/0.5/0)

Step 2: Example

Score = (Source_Score*25) + (Unique_Score*20) + (Desirability_Score*30) + (Maturity_Score*25)

4 Prioritization Groups: 1st tier (green); 2nd tier (yellow); 3rd tier (orange); 4th tier (pink)
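
To make the arithmetic concrete, here is a minimal sketch of the Step 2 score using the weights (25/20/30/25) and the criterion values listed for Source, Uniqueness, Desirability, and Maturity. The function name, the dictionary keys, and the sample dataset names are hypothetical. The slides do not state the score cut-offs that separate the four tiers, so the sketch only ranks by score and does not assign tiers.

```python
# Weights from the Step 2 formula; they sum to 100.
WEIGHTS = {"source": 25, "uniqueness": 20, "desirability": 30, "maturity": 25}

# Criterion values from the Step 2 criteria (key names are assumptions).
SOURCE_SCORES = {
    "podaac_nasa": 1.0,         # PO.DAAC-centric NASA mission/project
    "other_nasa": 0.75,         # Non-PO.DAAC-centric NASA mission/project
    "domestic_non_nasa": 0.5,   # Domestic (non-NASA) mission/project
    "international": 0.25,      # International mission/project
}
LEVEL_SCORES = {"high": 1.0, "medium": 0.5, "low": 0.0}


def significance_score(source, unique, desirability, maturity):
    """Weighted significance score on a 0-100 scale."""
    return (
        SOURCE_SCORES[source] * WEIGHTS["source"]
        + (1.0 if unique else 0.0) * WEIGHTS["uniqueness"]
        + LEVEL_SCORES[desirability] * WEIGHTS["desirability"]
        + LEVEL_SCORES[maturity] * WEIGHTS["maturity"]
    )


# Hypothetical candidates: (name, source, unique, desirability, maturity).
candidates = [
    ("Hypothetical dataset A", "other_nasa", True, "medium", "medium"),
    ("Hypothetical dataset B", "international", False, "high", "low"),
    ("Hypothetical dataset C", "podaac_nasa", True, "high", "high"),
]

# Rank from highest to lowest score.
ranked = sorted(
    ((name, significance_score(*args)) for name, *args in candidates),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{score:6.2f}  {name}")
```

For dataset A the terms are 0.75*25 + 1*20 + 0.5*30 + 0.5*25 = 18.75 + 20 + 15 + 12.5 = 66.25. The highest possible score is 100, and with these criterion values the lowest is 6.25, since the minimum Source value is 0.25.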


Prioritization Plan Flow Chart: Step 3

[Flow chart repeated, labeled Step 3]

Cost Decisional Criteria

• Community Assessment: Papers written / number of citations; # of Likes; # of downloads/views

• Technical Quality: QQC+; Latency / Gappiness; Accuracy; Sampling issues? Caveats/known issues identified?

• Processing: Has it been manipulated? Cal/Val state? Verification state?

• Provenance: Maturity of platform/instrument/sensor; Maturity of Program; Parent datasets identified (if applicable); Is the sensor fully described? Is the context of the reading(s) fully described? State-of-the-Art technology?

• Documentation: What is the state of the documentation? Is the documentation captured (archived)?

• Adherence to Process Guidelines: Did it get fast-tracked? Tons of waivers? Were all exit criteria met satisfactorily? Consistent use of units?

• Access: Readily available? Foreign repository? Behind firewalls or open FTP?

• Toolkits: Data visualization routine? Data reader? Verified reader/subroutine?

• Relationships: Sibling/child datasets identified? Motivation/justification identified?

• Rarity: Hard-to-find data? Atypical sensor/resolution/etc.?

• Specification: Resolution (spatial / temporal); Spatial coverage; Start time; End time; Data format? Exotic structure? Sizing / volume expectation?


Step 3: Cost Analysis

Prioritization Plan Flow Chart: Step 4

[Flow chart repeated, labeled Step 4]

