
Machine Learning for Societal Applications

Date post: 27-May-2015
Upload: david-lary
Description:
Machine Learning for Societal Applications for European Space Agency 2014 Summer School in Remote Sensing
Page 1: Machine Learning for Societal Applications

Machine Learning for Societal Applications


Monday, August 11, 14

Page 2: Machine Learning for Societal Applications

Satellite Observations, Meteorological Analyses, Population Density, In-situ Observations, Social Media

Multiple Big Data Sets transferred over High Speed Networks

Combined Using Machine Learning to Provide High-Resolution Global Products

Combined with Electronic Health Records to provide:
1. Real time personal health alerts
2. Physician Decision Support Tools
3. Logistical Planning for Emergency Rooms
4. Improved Policy Decisions

Next Generation of High Speed Networks to Facilitate the Next Generation of Proactive Smart Health Care Applications

Veteran’s Administration: Country’s Largest Health Care Provider

Requires ultra-low latency gigabit to the end user

Local cloud computing coupled with widely distributed national and global sensor networks

Multiple global high-resolution datasets

Prof. David Lary


Page 3: Machine Learning for Societal Applications

Next Generation of High Speed Low latency Networks to Facilitate the Next Generation of Smart Fire Detection & Water Conservation Applications

Requires ultra-low latency wireless gigabit for very-high resolution hyperspectral video imagery for real time flight control of aerial vehicles

Eleven drought-ridden western and central states have just been declared primary natural disaster areas, seriously threatening US food security. Further, between $1 billion and $2 billion is spent every year on fire suppression costs alone.

A fleet of low-cost aerial vehicles works together autonomously, using uncompressed very-high resolution hyperspectral video imagery. The geo-tagged imagery is streamed over high-speed low-latency wireless networks to a powerful cloud computing cluster running machine learning and image processing algorithms, which direct the optimal flight patterns in real time and deliver early warning for timely interventions.

Fire: Appropriate preemptive fire prevention can lead to massive savings in fire control costs, loss of life, and property damage.

Agriculture: Appropriate and timely early warning of crop infestations, infections and/or water stress can prevent massive avoidable losses.


20 lb Airborne 385 channel hyperspectral imaging system


Page 4: Machine Learning for Societal Applications

Next Generation of High Speed Networks to Facilitate the Next Generation of Smart Water Management Applications

With Drought Disaster Declarations in 11 western and central states, smart water management is now more critical than ever for sustainable water conservation and US Food Security. Coupling high resolution remote sensing from satellites, with machine learning, and the next generation of high speed low latency networks is facilitating the next generation of smart water management systems. These systems will benefit individual home owners, farmers, corporate campuses, golf courses, etc. and allow optimum monitoring and control of irrigation using mobile devices.

Sports fields

blown valves lead to flooding

uneven irrigation


Page 5: Machine Learning for Societal Applications


Page 6: Machine Learning for Societal Applications

Unprecedented levels of air pollution in Singapore and Malaysia in June led to respiratory illnesses, school closings, and grounded aircraft.  This year it was so bad that in some affected areas there was a 100 percent rise in the number of asthma cases, and the government of Malaysia distributed gas masks.

MODIS Aqua July 21, 2013.


Page 7: Machine Learning for Societal Applications

Air pollution in Ulaanbaatar, Mongolia


Page 8: Machine Learning for Societal Applications


Page 9: Machine Learning for Societal Applications

This is a BigData Problem of Great Societal Relevance

• Collecting data in real time from national and global networks requires bandwidth.

• With the next generation of wearable sensors and the internet of things this data volume will rapidly increase.

• A variety of applications are enabled by BigData, higher bandwidth and cloud processing.

• Future finer granularity and two-way communication will dramatically increase the size of the data, bringing air quality to the micro scale, just like weather data.

Time Taken             10 Mbps     20 Mbps     50 Mbps     1 Gbps
40 TB training data    185 days    93 days     37 days     1 day 21 hours
4 Gb update            54m         27m         11m         32s
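The transfer times in the table above follow from dividing the payload size by the link rate. The sketch below is a minimal illustration under stated assumptions: decimal units, an ideal link with no protocol overhead, and the update read as 4 gigabytes (the slide writes "4 Gb"). The function names are hypothetical. With these assumptions the update row reproduces the slide's figures after rounding, while the 40 TB row comes out at roughly twice the slide's values (about 370 days at 10 Mbps), so the slide presumably used a different assumption that the transcript does not state.

```python
# Link rates from the slide, in bits per second (decimal units assumed).
RATES_BPS = {"10 Mbps": 10e6, "20 Mbps": 20e6, "50 Mbps": 50e6, "1 Gbps": 1e9}


def transfer_time_seconds(size_bytes, rate_bps):
    """Ideal transfer time: payload bits divided by line rate, no overhead."""
    return size_bytes * 8 / rate_bps


def format_duration(seconds):
    """Render a duration in seconds as days, hours, minutes and seconds."""
    total = int(round(seconds))
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"


if __name__ == "__main__":
    # Payload sizes are assumptions: 40 TB = 40e12 bytes, 4 GB = 4e9 bytes.
    for label, size_bytes in (("40 TB", 40e12), ("4 GB", 4e9)):
        for name, rate in RATES_BPS.items():
            seconds = transfer_time_seconds(size_bytes, rate)
            print(f"{label} at {name}: {format_duration(seconds)}")
```

The same two functions can be reused to size any other dataset against a candidate link before committing to a network upgrade.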


Page 10: Machine Learning for Societal Applications

Think Big: Holistic & Comprehensive Informatics

Bio Informatics

Medical Informatics

Environmental Informatics

THRIVE: Multiple Big Data + EMR + Social Media + Machine Learning + Causality

A Cross-cutting Platform for Comprehensive Informatics for Data Driven Decisions in Patient Centered Care facilitated by High Speed Low-Latency networks, multiple massive datasets from large distributed sensor networks, EMR, and local cloud computing.


Page 11: Machine Learning for Societal Applications

PM2.5 Invisible Killer


Page 12: Machine Learning for Societal Applications

Using Multiple Big Datasets and Machine Learning to Produce a New Global Particulate Dataset

A Technology Challenge Case Study

David Lary
Hanson Center for Space Science
University of Texas at Dallas


Page 13: Machine Learning for Societal Applications

What?


Page 14: Machine Learning for Societal Applications

[Figure: particle size chart on a logarithmic axis from 0.0001 μm to 1000 μm, with 0.1 mm and 1 mm also marked]

Row groups: Types of biological Material; Types of Dust; Types of Particulates; Gas Molecules

Items shown: Pollen, Mold Spores, House Dust Mite Allergens, Bacteria, Cat Allergens, Viruses, Heavy Dust, Settling Dust, Suspended Atmospheric Dust, Cement Dust, Fly Ash, Oil Smoke, Smog, Tobacco Smoke, Soot, Gas Molecules

Health effects by particle size:
Decreased Lung Function < 10 μm
Skin & Eye Disease < 2.5 μm
Tumors < 1 μm
Cardiovascular Disease < 0.1 μm

Size references: Hair, Pin, Cell

Size classes: PM10 particles; PM2.5 particles; PM0.1 ultra fine particles; PM10-2.5 coarse fraction
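The size classes and health thresholds on the chart above can be read as simple cut-offs on particle diameter. The following is a minimal sketch, assuming that each PM class includes its upper limit and that the health thresholds are strict, as the "<" signs on the slide suggest. The function and constant names are hypothetical, and the code only restates the slide's labels; it is not a health model.

```python
# Upper diameter limits in micrometres for each size class named on the slide.
PM_CLASSES = (("PM10", 10.0), ("PM2.5", 2.5), ("PM0.1", 0.1))

# Health effects and the diameter below which the slide lists them.
HEALTH_THRESHOLDS_UM = (
    ("Decreased Lung Function", 10.0),
    ("Skin & Eye Disease", 2.5),
    ("Tumors", 1.0),
    ("Cardiovascular Disease", 0.1),
)


def pm_classes(diameter_um):
    """Return every size class a particle of this diameter belongs to."""
    classes = [name for name, limit in PM_CLASSES if diameter_um <= limit]
    # The coarse fraction is the part of PM10 that is not also PM2.5.
    if 2.5 < diameter_um <= 10.0:
        classes.append("PM10-2.5")
    return classes


def health_effects(diameter_um):
    """Return the slide's health effects listed for particles this small."""
    return [name for name, limit in HEALTH_THRESHOLDS_UM if diameter_um < limit]


if __name__ == "__main__":
    for diameter in (50.0, 5.0, 0.5, 0.05):
        print(diameter, pm_classes(diameter), health_effects(diameter))
```

Note that the classes nest: any PM0.1 particle is also counted in PM2.5 and PM10.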


Page 15: Machine Learning for Societal Applications


Table 1. PM and health outcomes (modified from Ruckerl et al. (2006)).

                                            Short-term Studies      Long-term Studies
Health Outcomes                             PM10   PM2.5  UFP       PM10   PM2.5  UFP

Mortality
    All causes                              xxx    xxx    x         xx     xx     x
    Cardiovascular                          xxx    xxx    x         xx     xx     x
    Pulmonary                               xxx    xxx    x         xx     xx     x
Pulmonary effects
    Lung function, e.g., PEF                xxx    xxx    xx        xxx    xxx    -
    Lung function growth                    -      -      -         xxx    xxx    -
Asthma and COPD exacerbation
    Acute respiratory symptoms              -      xx     x         xxx    xxx    -
    Medication use                          -      -      x         -      -      -
    Hospital admission                      xx     xxx    x         -      -      -
Lung cancer
    Cohort                                  -      -      -         xx     xx     x
    Hospital admission                      -      -      -         xx     xx     x
Cardiovascular effects
    Hospital admission                      xxx    xxx    -         x      x      -
ECG-related endpoints
    Autonomic nervous system                xxx    xxx    xx        -      -      -
    Myocardial substrate and vulnerability  -      xx     x         -      -      -
Vascular function
    Blood pressure                          xx     xxx    x         -      -      -
    Endothelial function                    x      xx     x         -      -      -
Blood markers
    Pro inflammatory mediators              xx     xx     xx        -      -      -
    Coagulation blood markers               xx     xx     xx        -      -      -
    Diabetes                                x      xx     x         -      -      -
    Endothelial function                    x      x      xx        -      -      -
Reproduction
    Premature birth                         x      x      -         -      -      -
    Birth weight                            xx     x      -         -      -      -
    IUR/SGA                                 x      x      -         -      -      -
Fetal growth
    Birth defects                           x      -      -         -      -      -
    Infant mortality                        xx     x      -         -      -      -
    Sperm quality                           x      x      -         -      -      -
Neurotoxic effects
    Central nervous system                  -      x      xx        -      -      -

x, few studies; xx, many studies; xxx, large number of studies; -, no entry.

Why?


Page 16: Machine Learning for Societal Applications

Why?


Page 17: Machine Learning for Societal Applications

How?

Around 40 different BigData sets from satellites, meteorology, demographics, scraped websites and social media were used to estimate PM2.5. The plot below shows the average of 5,935 days from August 1, 1997 to the present.


Page 18: Machine Learning for Societal Applications

Which Platform?

Requirements:
1. Large persistent storage for multiple BigData sets, 100 TB+ (otherwise the scratch space time limit expires before there has been time to process the massive datasets)
2. High bandwidth connections
3. Ability to harvest social media (e.g. Twitter) and scrape websites for data
4. High level language with a wide range of optimized toolboxes (Matlab)
5. Algorithms capable of dealing with massive non-linear, non-parametric, non-Gaussian multivariate datasets (13,000+ variables)
6. Easy to make use of multiple GPUs and CPUs
7. Ability to schedule tasks at precise times and time intervals to automate workflows (in this case tasks executed at intervals of 5 minutes, 15 minutes, 1 hour, 3 hours, 1 day)
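The scheduling requirement above amounts to finding, for each task interval, the next clock-aligned time at which a task should run. The sketch below is one possible way to do that in Python rather than the system's actual scheduler, which the slides do not describe. It assumes that runs are aligned to boundaries counted from midnight, and the names `INTERVALS`, `next_run` and `schedule` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# The five intervals quoted in the requirement; the labels are illustrative keys.
INTERVALS = {
    "5 minutes": timedelta(minutes=5),
    "15 minutes": timedelta(minutes=15),
    "1 hour": timedelta(hours=1),
    "3 hours": timedelta(hours=3),
    "1 day": timedelta(days=1),
}


def next_run(after, interval):
    """First boundary of `interval` strictly after `after`, counted from midnight."""
    midnight = after.replace(hour=0, minute=0, second=0, microsecond=0)
    # Whole intervals already elapsed today, plus one to move to the next boundary.
    steps = (after - midnight) // interval + 1
    return midnight + steps * interval


def schedule(after):
    """Map each interval label to its next aligned run time."""
    return {label: next_run(after, step) for label, step in INTERVALS.items()}


if __name__ == "__main__":
    now = datetime(2014, 8, 11, 10, 7, 30, tzinfo=timezone.utc)
    for label, when in schedule(now).items():
        print(f"every {label}: next run at {when.isoformat()}")
```

Aligning to clock boundaries, rather than sleeping for a fixed delay after each run, keeps tasks from drifting when a run takes longer than expected.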


Page 19: Machine Learning for Societal Applications

How?

Existing
• Social Media
• Socioeconomic, Census
• News feeds
• Environmental
• Weather
• Satellite
• Sensors
• Health
• Economic

New
• UAVs
• Smart Dust
• Autonomous Cars
• Sensors

Simulation
• Global Weather Models
• Economic Models
• Earthquake Models

Insight

Machine Learning

Data

Same approach highly relevant for the validation and optimal exploitation of the next generation of satellites, e.g. the upcoming NASA Decadal Survey Missions.


Page 20: Machine Learning for Societal Applications

How?

California Children Example


Page 21: Machine Learning for Societal Applications

REMOTE SENSING, MACHINE LEARNING AND PM2.5

Random Forests, etc.) that can provide multi-variate non-linear non-parametric regression or classification based on a training dataset. We have tried all of these approaches for estimating PM2.5 and found the best by far to be Random Forests.

B. Random Forests

In this paper we use one of the most accurate machine learning approaches currently available, namely Random Forests [53], [54]. Random forests are composed of an ensemble of decision trees [55]. Random forests have many advantages including their ability to work efficiently with large datasets, accommodate thousands of input variables, provide a measure of the relative importance of the input variables in the regression, and effectively handle datasets containing missing data.

Each tree in the random forest is a decision tree. A decision tree is a tree-like graph that can be used for classification or regression. Given a training dataset, a decision tree can be grown to predict the value of a particular output variable based on a set of input variables [55]. The performance of the decision tree regression can be improved upon if, instead of using a single decision tree, we use an ensemble of independent trees, namely, a random forest [53], [54]. This approach is referred to as tree bootstrap aggregation, or tree bagging for short.

Bootstrapping is a simple way to assign a measure of accuracy to a sample estimate or a distribution. This is achieved by repeatedly randomly resampling the original dataset to provide an ensemble of independently resampled datasets. Each member of the ensemble of independently resampled datasets is then used to grow an independent decision tree.

The statistics of random sampling mean that any given tree is trained on approximately 66% of the training dataset and so approximately 33% of the training dataset is not used in training any given tree. Which 66% is used is different for each of the trees in the random forest. This is a very rigorous independent sampling strategy that helps minimize overfitting of the training dataset (e.g. learning the noise). In addition, in our implementation we keep back a random sample of data not used in the training for independent validation and uncertainty estimation.

The members of the original training dataset not used in a given bootstrap resample are referred to as out of bag for this tree. The final regression estimate that is provided by the random forest is simply the average of the ensemble of individual predictions in the random forest.

A further advantage of decision trees is that they can provide the relative importance of each of the inputs in constructing the final multi-variate non-linear non-parametric regression model (e.g. Tables II and III).

C. Datasets Used in Machine Learning Regression

1) PM2.5 Data: As many hourly PM2.5 observations as possible that were available from the launch of Terra and Aqua to the present were used in this study. For the United States this data came from the EPA Air Quality System (AQS) http://www.epa.gov/ttn/airs/airsaqs/detaildata/downloadaqsdata.htm and AirNOW http://www.airnow.gov. In Canada the data came from http://www.etc-cte.ec.gc.ca/napsdata/main.aspx. In Europe the data came from AirBase, the European air quality database maintained by the European Environment Agency and the Euro-

Table II. Variables used in the machine learning estimate of PM2.5 for the MODIS Collection 5.1 products for the Terra and Aqua Deep Blue algorithm, sorted by their importance. The most important variable for a given regression is placed first with a rank of 1.

Terra DeepBlue

Rank  Source                   Variable                                Type
1                              Population Density                      Input
2     Satellite Product        Tropospheric NO2 Column                 Input
3     Meteorological Analyses  Surface Specific Humidity               Input
4     Satellite Product        Solar Azimuth                           Input
5     Meteorological Analyses  Surface Wind Speed                      Input
6     Satellite Product        White-sky Albedo at 2,130 nm            Input
7     Satellite Product        White-sky Albedo at 555 nm              Input
8     Meteorological Analyses  Surface Air Temperature                 Input
9     Meteorological Analyses  Surface Layer Height                    Input
10    Meteorological Analyses  Surface Ventilation Velocity            Input
11    Meteorological Analyses  Total Precipitation                     Input
12    Satellite Product        Solar Zenith                            Input
13    Meteorological Analyses  Air Density at Surface                  Input
14    Satellite Product        Cloud Mask Qa                           Input
15    Satellite Product        Deep Blue Aerosol Optical Depth 470 nm  Input
16    Satellite Product        Sensor Zenith                           Input
17    Satellite Product        White-sky Albedo at 858 nm              Input
18    Meteorological Analyses  Surface Velocity Scale                  Input
19    Satellite Product        White-sky Albedo at 470 nm              Input
20    Satellite Product        Deep Blue Angstrom Exponent Land        Input
21    Satellite Product        White-sky Albedo at 1,240 nm            Input
22    Satellite Product        Scattering Angle                        Input
23    Satellite Product        Sensor Azimuth                          Input
24    Satellite Product        Deep Blue Surface Reflectance 412 nm    Input
25    Satellite Product        White-sky Albedo at 1,640 nm            Input
26    Satellite Product        Deep Blue Aerosol Optical Depth 660 nm  Input
27    Satellite Product        White-sky Albedo at 648 nm              Input
28    Satellite Product        Deep Blue Surface Reflectance 660 nm    Input
29    Satellite Product        Cloud Fraction Land                     Input
30    Satellite Product        Deep Blue Surface Reflectance 470 nm    Input
31    Satellite Product        Deep Blue Aerosol Optical Depth 550 nm  Input
32    Satellite Product        Deep Blue Aerosol Optical Depth 412 nm  Input
      In-situ Observation      PM2.5                                   Target

Aqua DeepBlue

Rank  Source                   Variable                                Type
1     Satellite Product        Tropospheric NO2 Column                 Input
2     Satellite Product        Solar Azimuth                           Input
3     Meteorological Analyses  Air Density at Surface                  Input
4     Satellite Product        Sensor Zenith                           Input
5     Satellite Product        White-sky Albedo at 470 nm              Input
6                              Population Density                      Input
7     Satellite Product        Deep Blue Surface Reflectance 470 nm    Input
8     Meteorological Analyses  Surface Air Temperature                 Input
9     Meteorological Analyses  Surface Ventilation Velocity            Input
10    Meteorological Analyses  Surface Wind Speed                      Input
11    Satellite Product        White-sky Albedo at 858 nm              Input
12    Satellite Product        White-sky Albedo at 2,130 nm            Input
13    Satellite Product        Solar Zenith                            Input
14    Meteorological Analyses  Surface Layer Height                    Input
15    Satellite Product        White-sky Albedo at 1,240 nm            Input
16    Satellite Product        Deep Blue Surface Reflectance 660 nm    Input
17    Satellite Product        Deep Blue Surface Reflectance 412 nm    Input
18    Satellite Product        White-sky Albedo at 1,640 nm            Input
19    Satellite Product        Sensor Azimuth                          Input
20    Satellite Product        Scattering Angle                        Input
21    Meteorological Analyses  Surface Velocity Scale                  Input
22    Satellite Product        Cloud Mask Qa                           Input
23    Satellite Product        White-sky Albedo at 555 nm              Input
24    Satellite Product        Deep Blue Aerosol Optical Depth 550 nm  Input
25    Satellite Product        Deep Blue Aerosol Optical Depth 660 nm  Input
26    Satellite Product        Deep Blue Aerosol Optical Depth 412 nm  Input
27    Meteorological Analyses  Total Precipitation                     Input
28    Satellite Product        White-sky Albedo at 648 nm              Input
29    Satellite Product        Deep Blue Aerosol Optical Depth 470 nm  Input
30    Satellite Product        Deep Blue Angstrom Exponent Land        Input
31    Meteorological Analyses  Surface Specific Humidity               Input
32    Satellite Product        Cloud Fraction Land                     Input
      In-situ Observation      PM2.5                                   Target
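The bootstrap resampling and averaging described in the Random Forests excerpt above can be illustrated with a few lines of standard-library Python. This is a sketch of the resampling statistics only, not the authors' implementation (which the slides say was written in Matlab), and all function names are hypothetical. Drawing n indices with replacement leaves each member out of a given resample with probability close to 1/e, so the simulated out-of-bag share lands near 37%, in line with the excerpt's rounded figure of roughly one third.

```python
import random


def bootstrap_sample(n, rng):
    """Draw n indices from range(n) with replacement (one bootstrap resample)."""
    return [rng.randrange(n) for _ in range(n)]


def out_of_bag_indices(n, in_bag):
    """Indices never drawn in this resample: the out-of-bag members for one tree."""
    drawn = set(in_bag)
    return [i for i in range(n) if i not in drawn]


def mean_out_of_bag_fraction(n, n_trees, seed=0):
    """Average share of the training set left out of bag across n_trees resamples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trees):
        total += len(out_of_bag_indices(n, bootstrap_sample(n, rng))) / n
    return total / n_trees


def forest_estimate(tree_predictions):
    """Final regression estimate: the plain average of the individual tree predictions."""
    return sum(tree_predictions) / len(tree_predictions)


if __name__ == "__main__":
    print("mean out-of-bag fraction:", round(mean_out_of_bag_fraction(2000, 50), 3))
    print("forest estimate:", forest_estimate([10.0, 12.0, 14.0]))
```

In a full random forest each resample would be used to grow one decision tree, and the out-of-bag members of that tree would give it an independent check.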


Page 22: Machine Learning for Societal Applications


Page 23: Machine Learning for Societal Applications

Hourly Measurements from 55 countries and more than 8,000 measurement sites from 1997-present

A lot of measurements, but notice the large gaps!


Page 24: Machine Learning for Societal Applications

Gaps are inevitable because of the infrastructure and cost associated with making the measurements.

Hourly Measurements from 55 countries and more than 8,000 measurement sites from 1997-present


Page 25: Machine Learning for Societal Applications

Challenge 1: Obtaining the in-situ PM2.5 data

Real time data from:

1. EPA AirNow data for USA and Canada

2. EEA data for Europe

3. Tasmania and Australia

4. Israel

5. Russia

6. Asia and Latin America by scraping http://aqicn.org/map/

7. Harvesting social media (Twitter feeds from US Embassies)

Relatively low bandwidth from multiple sites every 5 minutes


Page 26: Machine Learning for Societal Applications

Challenge 2: (Easier) Obtaining the Satellite & Meteorological Data

Real time data from:

1. Multiple satellites: MODIS Terra, MODIS Aqua, SeaWiFS, VIIRS NPP, etc.

2. Global Meteorological Analyses

High bandwidth from few sites every 1 to 3 hours


Page 27: Machine Learning for Societal Applications

Challenge 3: Combine multiple BigData Sets with Machine Learning

Large member machine learning ensemble using massively parallel computing to produce PM2.5 data product

Algorithms capable of dealing with massive non-linear, non-parametric, non-Gaussian multivariate datasets (13,000+ variables)

Drastically reduced development time by using a high level language (Matlab) that can easily exploit parallel execution using both multiple CPUs and GPUs.

Massively parallel every 3 hours

High level language which can readily use CPUs and GPUs


Page 28: Machine Learning for Societal Applications

Challenge 4: Continual Performance Improvement

Currently on around the 400th version of the system.

Have been making continuous improvements in:

1. Coverage of in-situ training data set

2. Inclusion of new satellite sensors

3. Additional BigData sets that help improve fidelity of the non-linear, non-parametric, non-Gaussian multivariate machine learning fits

4. Using many alternative machine learning strategies

5. Estimate uncertainties.

6. This requires frequent reprocessing of the entire multi-year record from 1997-present

Persistent massive data storage, much more than usual scratch space at HPC centers


Page 29: Machine Learning for Societal Applications

Fully Automated Workflow

Requires ability to schedule automated tasks


Page 30: Machine Learning for Societal Applications

Requires ability to disseminate results in multiple formats including ftp and as web and map services


Page 31: Machine Learning for Societal Applications


Page 32: Machine Learning for Societal Applications


Page 33: Machine Learning for Societal Applications


Page 34: Machine Learning for Societal Applications


Page 35: Machine Learning for Societal Applications

Lake Eyre after a long dry period is a region of high PM2.5 abundance.

Lake Eyre after heavy rains is a region of lower PM2.5 abundance than usual


Page 36: Machine Learning for Societal Applications

Figure 6. Example timeseries: The large top panel is for El Paso, TX, showing the periodic dust events. The dust events typically come from the Chihuahuan and Big Bend Deserts. The inset shows the major dust event of 15 April 2003 documented by Rivera et al. (2009) that was faithfully captured by our analysis. The lower panels show timeseries for various cities around the world.

[Inset plot: horizontal axis 04/04 to 04/29, April 2003; vertical axis 0 to 100]


Page 37: Machine Learning for Societal Applications

[Plot: horizontal axis 04/04 to 04/29, April 2003; vertical axis 0 to 100]


Page 38: Machine Learning for Societal Applications

Key System Requirements: Not always available on current HPC systems

Requirements:
1. Large persistent storage for multiple BigData sets, 100 TB+ (otherwise the scratch space time limit expires before there has been time to process the massive datasets)
2. High bandwidth connections
3. Ability to harvest social media (e.g. Twitter) and scrape websites for data
4. High level language with a wide range of optimized toolboxes (Matlab)
5. Algorithms capable of dealing with massive non-linear, non-parametric, non-Gaussian multivariate datasets (13,000+ variables)
6. Easy to make use of multiple GPUs and CPUs
7. Ability to schedule tasks at precise times and time intervals to automate workflows (in this case tasks executed at intervals of 5 minutes, 15 minutes, 1 hour, 3 hours, 1 day)


Page 39: Machine Learning for Societal Applications

THRIVE
Timely Health indicators using Remote sensing & Innovation for the Vitality of the Environment

Prevention is better than cure

[email protected]

Address key societal issues


Page 40: Machine Learning for Societal Applications


VA Decision Support Tools

More Than 40 Data Products from In-situ Observations, NASA Earth Observations, Earth System Models, Population Density & Emission Inventories

Personalized Alerts

Dr. Watson

Staffing & Resource Management

Machine Learning

Daily Global Air Quality Estimates

NASA Earth Observation Data

NASA Earth System Model Products

Population Density and Other Related Products

ER Admissions

All ICD Codes

All Prescriptions

Machine Learning

Machine Learning

THRIVE Medical Environment Analytics Engine


Page 41: Machine Learning for Societal Applications

Scenario Simulator
A US-Ignite Keystone Application


Page 42: Machine Learning for Societal Applications

Environmental Impact

Have the environmental conditions for any day, anywhere on the planet from 1997-present as a context and simulate the likely health outcomes

Midday


Page 43: Machine Learning for Societal Applications


Page 44: Machine Learning for Societal Applications

June 27, 2012; July 21, 2012; September 14, 2012; July 5, 2011

Set training in any time or place and retrieve the actual environmental conditions, visibility, weather, and airborne particulates, and simulate the health outcomes. ALL driven by data.


Page 45: Machine Learning for Societal Applications

Water
The Motive

Do well with the Details by embracing the Big Picture

Prof. David Lary
+1 (972) 489-2059

http://[email protected]


Page 46: Machine Learning for Societal Applications

Western U.S. Drought Prompts Disaster Declarations In 11 States
By MICHELLE RINDELS 01/16/14 07:51 PM ET EST

LAS VEGAS (AP) — Federal officials have designated portions of 11 drought-ridden western and central states as primary natural disaster areas, highlighting the financial strain the lack of rain is likely to bring to farmers in those regions.

The announcement by the U.S. Department of Agriculture on Wednesday included counties in Colorado, New Mexico, Nevada, Kansas, Texas, Utah, Arkansas, Hawaii, Idaho, Oklahoma and California.

Rancher Ralph Miller, 79, checks on one of many “stock tanks” of water that are receding due to the severe drought. “I’d say it’s just about as bad as it can get.”

Barnhart, Texas


Page 47: Machine Learning for Societal Applications

“Water is the new oil”
Jim Rogers, chief executive of Duke Energy... and many others

Water crisis in California, Texas threatens US food security
Western water scarcity issues becoming more severe
Western Farm Press, Jun. 5, 2012
University of Texas at Austin

California and Texas produced agricultural products worth $56 billion in 2007, accounting for much of the nation's food production. They also account for half of all groundwater depletion in the U.S., mainly as a result of irrigating crops.

The nation’s food supply may be vulnerable to rapid groundwater depletion from irrigated agriculture, according to a new study by researchers at The University of Texas at Austin and elsewhere.

http://westernfarmpress.com/irrigation/water-crisis-california-texas-threatens-us-food-security


Page 48: Machine Learning for Societal Applications

Since 1980 the population of Texas has more than doubled, but the reservoir capacity has remained almost unchanged.

In 2011, reservoir levels during September to December were the lowest they have been for those months since 1990.

In 2014 we are starting out with lower levels than 2013.


Page 49: Machine Learning for Societal Applications

Smarter irrigation control is invaluable!

If we can use existing infrastructure it is even better!

.... from farm, to corporate campus, to golf course, to your back yard.


Page 50: Machine Learning for Societal Applications

When a great societal need meets an appropriate scalable solution, there is much societal and economic benefit to be gained.


Page 51: Machine Learning for Societal Applications

How?

California Children Example

http://holistics3.com


Page 52: Machine Learning for Societal Applications


Page 53: Machine Learning for Societal Applications

DATE: 12-Feb-2010

DOC NO: 0115056

ISSUE: 02

DMC DATA PRODUCT MANUAL

STATUS: FINAL

Page 16 of 127

Figure 4: Detector and channel layout of the SLIM-6-22 imager

Imager Bank 0

Channel 6 Green

Channel 5 Red

Channel 4 NIR

Imager Bank 1

Channel 1 NIR

Channel 2 Red

Channel 3 Green

Pixel 1

Pixel 14436

Pixel 14436

Pixel 1


Page 54: Machine Learning for Societal Applications

20 lb Airborne hyperspectral imaging system
385 channels between 400-1,700 nm

Hyperspectral data cube
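A hyperspectral data cube stores a full spectrum for every pixel, so it can be indexed either by position (one spectrum) or by band (one image). The sketch below is an illustration under stated assumptions: it treats the 385 channels as evenly spaced between 400 nm and 1,700 nm, which the slide does not state, and it represents the cube as nested lists indexed as cube[row][column][band]. All names are hypothetical.

```python
def band_centers(n_bands=385, start_nm=400.0, stop_nm=1700.0):
    """Centre wavelength of each band, assuming evenly spaced channels."""
    span = stop_nm - start_nm
    return [start_nm + span * i / (n_bands - 1) for i in range(n_bands)]


def nearest_band(centers, wavelength_nm):
    """Index of the band whose centre is closest to the requested wavelength."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - wavelength_nm))


def spectrum_at(cube, row, col):
    """Full spectrum (one value per band) for a single pixel."""
    return cube[row][col]


def band_image(cube, band):
    """Two-dimensional image holding one band's value for every pixel."""
    return [[pixel[band] for pixel in line] for line in cube]


if __name__ == "__main__":
    centers = band_centers()
    red = nearest_band(centers, 650.0)
    print("band nearest 650 nm:", red, "centred at", round(centers[red], 1), "nm")

    # Tiny synthetic cube: 2 rows x 2 columns x 3 bands, for illustration only.
    cube = [[[r * 10 + c + b for b in range(3)] for c in range(2)] for r in range(2)]
    print("spectrum at (1, 0):", spectrum_at(cube, 1, 0))
    print("band 1 image:", band_image(cube, 1))
```

A real cube from such a sensor would normally be held in an array library rather than nested lists; the indexing idea is the same.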


Page 55: Machine Learning for Societal Applications


Page 56: Machine Learning for Societal Applications

uneven irrigation

blown valves lead to flooding

Sports fields

agricultural test plots

22 m resolution


Page 57: Machine Learning for Societal Applications

On average, systems have water losses of about 17 percent.


Page 58: Machine Learning for Societal Applications

Can you tell which grass has had more water?

Zooming in


Page 59: Machine Learning for Societal Applications

Neighborhood

Trees

‘green’ pond

golf course

gated community

dry grass

Trees

Trees

Sports Field


Page 60: Machine Learning for Societal Applications

J S Famiglietti, and M Rodell Science 2013;340:1300-1301

www.sciencemag.org SCIENCE VOL 340 14 JUNE 2013 1301

PERSPECTIVES

an accuracy of 1.5 cm equivalent water height.Because GRACE measures changes in

total water storage, it integrates the impacts of natural climate fl uctuations, global change, and human water use, including groundwater extraction, which in many parts of the world is unmeasured and unmanaged. GRACE-derived rates of groundwater losses in the world’s major aquifer systems ( 4– 6) under-score the critical need to improve monitor-ing and regulation of groundwater systems before they run dry.

Regional fl ooding and drought are driven by the surplus or defi cit of water in a river basin or an aquifer, yet few hydrologic observing networks yield suffi cient data for comprehensive monitoring of changes in the total amount of water stored in a region. GRACE observations have helped to fill this gap. They have been used to character-ize regional fl ood potential ( 8) and to assess water storage deficits in the U.S. Drought Monitor ( 9) and are included in annual State of the Climate reports ( 10). As an integrated measure of all surface and groundwater stor-age changes, GRACE data implicitly contain a record of seasonal to interannual water stor-age variations that can likely be exploited to lengthen early warning periods for regional fl ood and drought prediction (see the fi gure).

The lack of comprehensive measurements also makes large-scale hydrological models,

key tools for predicting future water avail-ability, diffi cult to validate. Low-resolution GRACE data, when combined with higher-resolution model simulations, provide an independent constraint on simulated water balances, while also adding spatial detail to GRACE’s low-resolution perspective ( 11). They are widely used to evaluate land surface models used by weather and climate forecast-ing centers around the world ( 12).

Evapotranspiration is a key factor in interbasin water allocations, yet because it disperses into the atmosphere in the vapor phase, it confounds standard measurement techniques. The ability of GRACE to weigh changes in water stored in an entire river basin allows evapotranspiration to be esti-mated in a water balance framework ( 13).

Transboundary water availability issues require sharing hydrologic data across politi-cal boundaries. However, national hydrolog-ical records are often withheld for political, socioeconomic, and defense purposes, com-plicating regional water management discus-sions. Several studies have used GRACE data to circumvent international data denial prac-tices, including in those involving lakes ( 14), river basins ( 6), and aquifers ( 4, 6). Likewise, regional and global maps of emerging trends in water availability (see the figure) can underpin discussions of geopolitical water security, confl ict, and water diplomacy ( 6).

Although it still collects 10 months of data per year, GRACE has long outlived its planned 5-year life span. The GRACE Follow-On (GRACE-FO) mission, planned for launch in 2017, should enable continued collection of critical water and related climate observations for at least a decade, forestalling potential data gaps before a more advanced satellite gravimetry system is developed and launched, as tentatively planned for the 2020s.

For GRACE and its successors to maximize their value for water management, key issues must be addressed. First, the current 2- to 6-month latency before GRACE data are released must be substantially reduced to enable their use in seasonal prediction. Second, GRACE data should be better integrated into the modeling and decision support systems used by operational water management centers. Finally, next-generation missions beyond GRACE-FO should aim to achieve higher spatial (<50,000 km²) and temporal (weekly or biweekly) resolution, for example through novel orbital configurations, so that smaller river basins and aquifers can be observed directly. The availability of GRACE data at these finer scales, at which most planning decisions are made, would likely ensure their broader use in water management.

The GRACE-FO mission is on schedule for a 2017 launch, but a next-generation, improved GRACE mission is still under design and as yet unconfirmed. Given its demonstrated contributions to date and the potential for much more, a future without a GRACE mission in orbit would be an unfortunate and unnecessarily risky backward step for regional water management.

References

1. P. J. Durack et al., Science 336, 455 (2012).
2. K. E. Trenberth, Clim. Res. 47, 123 (2011).
3. I. M. Held, B. J. Soden, J. Clim. 19, 5686 (2006).
4. V. M. Tiwari, J. Wahr, S. Swenson, Geophys. Res. Lett. 36, L18401 (2009).
5. B. R. Scanlon et al., Proc. Natl. Acad. Sci. U.S.A. 109, 9320 (2012).
6. K. A. Voss et al., Water Resour. Res. 49, 904 (2013).
7. B. D. Tapley et al., Science 305, 503 (2004).
8. J. T. Reager, J. S. Famiglietti, Geophys. Res. Lett. 36, L23402 (2009).
9. R. Houborg et al., Water Resour. Res. 48, W07525 (2012).
10. J. Blunden, D. S. Arndt, Eds., Bull. Am. Meteorol. Soc. 93, S1 (2012).
11. B. F. Zaitchik et al., J. Hydrometeorol. 9, 535 (2008).
12. S. C. Swenson, P. C. D. Milly, Water Resour. Res. 42, W03201 (2006).
13. G. Ramillien et al., Water Resour. Res. 42, W10403 (2006).
14. S. Swenson, J. Wahr, J. Hydrol. 370, 163 (2009).
15. J. S. Famiglietti, Abstract GC31D-01, fall meeting, AGU, San Francisco, 3 to 7 December 2012.

Supplementary Materials www.sciencemag.org/cgi/content/full/science.1236460/DC1 Fig. S1

Figure: Map (30°N to 50°N, 120°W to 70°W) of water storage trends, H2O (cm/year), on a color scale from −3 to 3, with hot spots numbered 1 to 6. Credit: Caroline de Linage/Univ. of California, Irvine.

Mixed picture. Between 2003 and 2012, GRACE data show water losses in agricultural regions such as California’s Central Valley (1) (−1.5 ± 0.1 cm/year) and the Southern High Plains Aquifer (2) (−2.5 ± 0.2 cm/year), caused by overreliance on groundwater to supply irrigation water. Regions where groundwater is being depleted as a result of prolonged drought include Houston (3) (−2.3 ± 0.6 cm/year), Alabama (4) (−2.1 ± 0.8 cm/year), and the Mid-Atlantic states (5) (−1.8 ± 0.6 cm/year). Water storage is increasing in the flood-prone Upper Missouri River basin (6) (2.5 ± 0.2 cm/year). See fig. S1 for monthly time series for all hot spots. Data from (15) and from GRACE data release CSR RL05.

10.1126/science.1236460

Published by AAAS

Page 61: Machine Learning for Societal Applications

Summary

• Vegetation Index is dependent on the amount of irrigation

• Regular (weekly) remote sensing inspection could allow us to:

• Appropriate irrigation zones

• Help identify regions of overwatering

• Help identify any burst pipes/valves

• Optimize irrigation patterns

• Automate sprinkler system controls

• Progressively more benefit as a specific history of the plots/site is built up

Page 62: Machine Learning for Societal Applications

Stage 2

Page 63: Machine Learning for Societal Applications

FUTURE Water Management

Diagram labels: Water Mgmt. “Smart-GRID*”; Delivery; Models; Basin Geodata; Water/Crop Status & Forecast; Water Need Status & Forecast; Water Agric + Others Use & Forecast

*Dept. of Energy 2013

Alfonso Torres

Page 64: Machine Learning for Societal Applications

FUTURE Water Management

Why Agriculture? ~80% water use US (USDA 2013)

Challenges: Climate change, Drought, Population, non-ag water uses.

Water Use Efficiency: ~50% US (USDA 2004)

Diagram labels: Water Mgmt. “Smart-GRID*”; Delivery; Models; Basin Geodata; Water/Crop Status & Forecast; Water Need Status & Forecast; Water Agric + Others Status & Forecast

Page 65: Machine Learning for Societal Applications

CURRENT Water Management

Diagram labels: Current Water Mgmt.; Delivery; Models; Basin Geodata; Water/Crop Status & Forecast; Water Status & Forecast; Water Agric + Others Status & Forecast

Alfonso Torres

Page 66: Machine Learning for Societal Applications

CURRENT Water Management

Diagram labels: Current Water Mgmt.; Delivery; Models; Basin Geodata; Water/Crop Status & Forecast; Water Status & Forecast; Water Agric + Others Status & Forecast

Water use based on:

• Experience

• Limited estimations

• No related info

Page 67: Machine Learning for Societal Applications

CWMIS Case Example: Water Use vs. Delivery

TOP: crop water use vs. water delivery (ac-ft). BOTTOM: water use difference (ac-ft).

Typically saves at least 10%. Can be done on a field-by-field, campus-by-campus, home-by-home, or golf-course-by-golf-course basis, or for an entire basin.

Alfonso Torres
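
The comparison in the two panels can be sketched as below. This is only an illustration: the sign convention (delivery minus crop use), the function names, and the sample values are assumptions, not taken from CWMIS itself.

```python
def delivery_minus_use(delivered_acft, crop_use_acft):
    """Per-period difference in ac-ft; positive means more water was
    delivered than the crop used (assumed sign convention)."""
    if len(delivered_acft) != len(crop_use_acft):
        raise ValueError("series must have the same length")
    return [d - u for d, u in zip(delivered_acft, crop_use_acft)]


def excess_fraction(delivered_acft, crop_use_acft):
    """Share of total delivery that exceeded crop use, counting only
    the periods with a surplus."""
    diffs = delivery_minus_use(delivered_acft, crop_use_acft)
    surplus = sum(d for d in diffs if d > 0)
    total = sum(delivered_acft)
    return surplus / total if total else 0.0


# Hypothetical values for three delivery periods (ac-ft).
delivered = [100, 120, 80]
crop_use = [90, 105, 85]
print(delivery_minus_use(delivered, crop_use))  # [10, 15, -5]
print(round(excess_fraction(delivered, crop_use), 3))  # 0.083
```

The same two functions apply unchanged whether the series describe a single field, a campus, or a whole basin.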

Page 68: Machine Learning for Societal Applications

Culex tarsalis

West Nile Virus

The same data infrastructure can also be used to help combat West Nile Virus by identifying breeding sites.

Page 69: Machine Learning for Societal Applications

P. vivax is carried by the female Anopheles mosquito

Page 70: Machine Learning for Societal Applications

Plasmodium vivax is a protozoal parasite and a human pathogen. The most frequent and widely distributed cause of recurring (benign tertian) malaria, P. vivax is one of the six species of malaria parasites that commonly infect humans.[1] It is less virulent than Plasmodium falciparum, the deadliest of the six, but vivax malaria can lead to severe disease and death.[2][3] P. vivax is carried by the female Anopheles mosquito, since only the female of the species bites.

Plasmodium vivax

Plasmodium falciparum http://www.worldmalariareport.org/

Page 71: Machine Learning for Societal Applications

Seasonal climatic suitability for malaria transmission (CSMT)

Climatic conditions are considered to be suitable for transmission when the monthly precipitation accumulation is at least 80 mm, the monthly mean temperature is between 18°C and 32°C, and the monthly relative humidity is at least 60%. These thresholds are based on a consensus of the literature. In practice, the optimal and limiting conditions for transmission are dependent on the particular species of the parasite and vector.

Commentary: Web-based climate information resources for malaria control in Africa
Emily K Grover-Kopec, M Benno Blumenthal, Pietro Ceccato, Tufa Dinku, Judy A Omumbo and Stephen J Connor*
Malaria Journal 2006, 5:38 doi:10.1186/1475-2875-5-38
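
The three CSMT thresholds translate directly into a monthly test. The sketch below uses the values stated on the slide; treating the temperature bounds as inclusive, along with the function names and sample records, is an assumption made for illustration.

```python
def is_suitable_month(precip_mm, mean_temp_c, rel_humidity_pct):
    """True when all three CSMT thresholds from the slide are met.

    Temperature bounds are treated as inclusive (an assumption).
    """
    return (
        precip_mm >= 80.0
        and 18.0 <= mean_temp_c <= 32.0
        and rel_humidity_pct >= 60.0
    )


def suitable_months(records):
    """Return the labels of the months that pass the suitability test.

    Each record is (label, precip_mm, mean_temp_c, rel_humidity_pct).
    """
    return [
        label
        for label, precip, temp, humidity in records
        if is_suitable_month(precip, temp, humidity)
    ]


# Hypothetical monthly climate summaries for one location.
records = [
    ("Jan", 120.0, 25.0, 70.0),  # all thresholds met
    ("Feb", 60.0, 25.0, 70.0),   # too dry
    ("Mar", 100.0, 35.0, 80.0),  # too hot
    ("Apr", 80.0, 18.0, 60.0),   # exactly on the thresholds
]
print(suitable_months(records))  # ['Jan', 'Apr']
```

As the slide notes, the optimal and limiting conditions depend on the parasite and vector species, so the fixed thresholds here are a first-pass screen rather than a transmission model.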

Page 72: Machine Learning for Societal Applications

Page 73: Machine Learning for Societal Applications

Vectorial Capacity in Zones with Malaria Epidemic Potential

05 August - 12 August 2013

Map legend: VCAP values 0, 0 - 2, 2 - 4, 4 - 6, 6 - 8, 8 - 10, 10 - 15, 15 - 20, > 20; country boundaries. Scale bar: 0 to 1,000 km. Map produced by USGS/EROS.

Page 74: Machine Learning for Societal Applications

Satellite imagery can be used to track mosquito habitats.

High-resolution (5 m) satellite images can identify very small water bodies, wetlands and other malaria-relevant land-cover types.

Of the 225 million cases of the disease reported annually, 212 million occur in Africa. Of the 800,000 malaria-related deaths each year, 90% occur in sub-Saharan Africa.

http://www.itweb.co.za/index.php?option=com_content&view=article&id=52695

Page 75: Machine Learning for Societal Applications

US Ignite Enables

Innovative “Big Data” Application Development for an Application Development Hub

Don Hicks

