Dataset Gap Analysis/Prioritization Plan
Michelle Gierach, PO.DAAC Project Scientist
2012 PO.DAAC User Working Group (UWG) Meeting
March 7-8, 2012
Dataset Gap Analysis
Rec. 6. Do dataset gap analysis and create a report.
Rec. 19. PO.DAAC provide climatologies, anomalies, indices, and various dataset statistics for selected datasets.
Status:
• A dataset gap analysis document was created that details datasets currently available and those that will soon be available in the ocean community.
• Available climatologies, anomalies, and other value-added products (e.g., fluxes, frontal gradients) were included in the document.
• Approx. 100 datasets were listed based upon input from the PO.DAAC User Working Group (UWG), Project Science Team (PST), and Data Engineers (DEs).
Future Plans:
• Request for information twice a year from the UWG, PST, DEs, and NASA science teams regarding additional datasets.
• After this initial phase of acquiring available datasets, the next step in FY13 will be to see where gaps still exist and either work with the community to create additional climatologies, anomalies, and indices or create them ourselves within PO.DAAC.
Dataset Gap Analysis Prioritization
Now that we have a document that lists ~100 datasets that would be of benefit to our users, how do we prioritize?
Past prioritization has been subjective and ad-hoc.
Need a system that is unbiased and provides quantitative measures to assess a dataset’s significance.
Dataset Lifecycle Phases
Identify a Dataset of Interest
Green-Light the Dataset
Tailor the Dataset Policy
Ingest the Dataset
Archive the Dataset
Register/Catalog the Dataset
Distribute the Dataset
Verify the Dataset
Rollout the Dataset
Maintain the Dataset
Prioritization Plan Flow Chart
Flow chart labels: Identify Datasets; Dataset Classification; Obligated; Non-Obligated; Dataset List; Significance; Ranked Dataset List; Cost Analysis; Recommendation; Archive; Remote Link; Reject Dataset.
Comments/Thoughts/Questions?
Prioritization Plan Flow Chart (repeated with step labels: Step 1 with Steps 1a and 1b, Step 2, Step 3, Step 4)
Prioritization Plan Flow Chart (Step 1, with Steps 1a and 1b, marked)
Step 1: Dataset Identification/Classification
Approx. 100 datasets were identified within the oceanographic community.
Seven of these were classified as PO.DAAC obligations, including:
• L2B reprocessed QuikSCAT data (JPL)
• L2C QuikSCAT data (JPL)
• MEaSUREs CCMP-like product (Bourassa)
• GHRSST Pathfinder 5.2 SST
• GHRSST Global Ocean Sea Surface Temperature Multi Product Ensemble (GMPE)
• GHRSST Global Ocean OSTIA Sea Surface Temperature Anomaly
• GHRSST Global Ocean OSTIA Sea Surface Temperature Anomaly Reanalysis
First priority is given to datasets labeled as PO.DAAC obligations.
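The obligated-first rule can be sketched in a few lines. This is only an illustration: the `Dataset` record, its `obligated` flag, and the `classify` helper are hypothetical names, not part of the plan, and the non-obligated entry is a made-up placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    obligated: bool  # hypothetical flag: True if it is a PO.DAAC obligation


def classify(datasets):
    """Split datasets into (obligated, non_obligated), preserving input order."""
    obligated = [d for d in datasets if d.obligated]
    non_obligated = [d for d in datasets if not d.obligated]
    return obligated, non_obligated


candidates = [
    Dataset("Hypothetical non-obligated product", obligated=False),
    Dataset("L2C QuikSCAT data (JPL)", obligated=True),
    Dataset("GHRSST Pathfinder 5.2 SST", obligated=True),
]

obligated, non_obligated = classify(candidates)

# Obligated datasets come first; the rest go on to the significance step.
work_queue = obligated + non_obligated
for d in work_queue:
    label = "obligated" if d.obligated else "non-obligated"
    print(f"{label}: {d.name}")
```

Only the non-obligated list would need the significance scoring described in Step 2.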
Prioritization Plan Flow Chart (Step 2 marked)
Decisional Criteria
• Community Assessment: Papers written / number of citations; # of Likes; # of downloads/views
• Technical Quality: QQC+; Latency / Gappiness; Accuracy; Sampling issues? Caveats/known issues identified?
• Processing: Has it been manipulated? Cal/Val state? Verification state?
• Provenance: Maturity of platform/instrument/sensor; Maturity of Program; Parent datasets identified (if applicable); Is the sensor fully described? Is the context of the reading(s) fully described? State-of-the-Art technology?
• Documentation: What is the state of the documentation? Is the documentation captured (archived)?
• Adherence to Process Guidelines: Did it get fast-tracked? Tons of waivers? Were all exit criteria met satisfactorily? Consistent use of units?
• Access: Readily available? Foreign repository? Behind firewalls or open FTP?
• Toolkits: Data visualization routine? Data reader? Verified reader/subroutine?
• Relationships: Sibling/child datasets identified? Motivation/justification identified?
• Rarity: Hard-to-find data? Atypical sensor/resolution/etc.?
• Specification: Resolution (spatial / temporal); Spatial coverage; Start time; End time; Data format? Exotic data structure? Sizing / volume expectation?
Comments/Thoughts/Questions?
Step 2: Significance of Non-Obligated Datasets
Prioritization criteria to assess a non-obligated dataset’s significance:
• Source: A particular dataset’s association.
    • PO.DAAC-centric NASA mission/project (1)
    • Non-PO.DAAC-centric NASA mission/project (0.75)
    • Domestic (non-NASA) mission/project (0.5)
    • International mission/project (0.25)
• Uniqueness: Would this be a new and/or one-of-a-kind dataset to PO.DAAC?
    • Yes/No (1/0)
• Desirability: Is there a need/want for this dataset in the community?
    • High/Medium/Low (1/0.5/0)
• Maturity (1st order): Community recognition? Technical Quality? Dataset Specifics?
    • High/Medium/Low (1/0.5/0)
Step 2: Example
Score = (Source_Score*25) + (Unique_Score*20) + (Desirability_Score*30) + (Maturity_Score*25)
4 Prioritization Groups: 1st tier (green); 2nd tier (yellow); 3rd tier (orange); 4th tier (pink)
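As a rough illustration of the scoring formula, the sketch below maps each criterion to its numeric value and ranks a few candidates. The dictionary keys, the function name `significance_score`, and the three sample datasets are hypothetical. Because the cut-off scores for the four tiers are not stated here, the sketch stops at a ranked list and does not assign tiers.

```python
# Criterion values taken from the Step 2 criteria; the key names are assumptions.
SOURCE = {
    "podaac_nasa": 1.0,     # PO.DAAC-centric NASA mission/project
    "other_nasa": 0.75,     # Non-PO.DAAC-centric NASA mission/project
    "domestic": 0.5,        # Domestic (non-NASA) mission/project
    "international": 0.25,  # International mission/project
}
LEVEL = {"high": 1.0, "medium": 0.5, "low": 0.0}


def significance_score(source, unique, desirability, maturity):
    """Weighted sum with weights 25/20/30/25, which total 100."""
    return (
        SOURCE[source] * 25
        + (1.0 if unique else 0.0) * 20
        + LEVEL[desirability] * 30
        + LEVEL[maturity] * 25
    )


# Made-up candidates: (source, unique, desirability, maturity)
candidates = {
    "dataset_a": ("podaac_nasa", True, "high", "high"),
    "dataset_b": ("international", False, "medium", "high"),
    "dataset_c": ("domestic", True, "low", "medium"),
}

scores = {name: significance_score(*args) for name, args in candidates.items()}
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

for name, score in ranked:
    print(f"{name}: {score:.2f}")
# dataset_a: 100.00
# dataset_b: 46.25
# dataset_c: 45.00
```

Since the weights sum to 100, a dataset that scores the maximum on every criterion receives 100.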
Comments/Thoughts/Questions?
Prioritization Plan Flow Chart (Step 3 marked)
Cost Decisional Criteria
• Community Assessment: Papers written / number of citations; # of Likes; # of downloads/views
• Technical Quality: QQC+; Latency / Gappiness; Accuracy; Sampling issues? Caveats/known issues identified?
• Processing: Has it been manipulated? Cal/Val state? Verification state?
• Provenance: Maturity of platform/instrument/sensor; Maturity of Program; Parent datasets identified (if applicable); Is the sensor fully described? Is the context of the reading(s) fully described? State-of-the-Art technology?
• Documentation: What is the state of the documentation? Is the documentation captured (archived)?
• Adherence to Process Guidelines: Did it get fast-tracked? Tons of waivers? Were all exit criteria met satisfactorily? Consistent use of units?
• Access: Readily available? Foreign repository? Behind firewalls or open FTP?
• Toolkits: Data visualization routine? Data reader? Verified reader/subroutine?
• Relationships: Sibling/child datasets identified? Motivation/justification identified?
• Rarity: Hard-to-find data? Atypical sensor/resolution/etc.?
• Specification: Resolution (spatial / temporal); Spatial coverage; Start time; End time; Data format? Exotic structure? Sizing / volume expectation?
Comments/Thoughts/Questions?
Step 3: Cost Analysis