Water Quality Data Analysis & R Programming Internship Central Coast Water Quality Preservation, Inc.

March 2013 – February 2014

Megan Gehrke, Graduate Student, CSU Monterey Bay

Advisor: Sarah Lopez, Central Coast Water Quality Preservation, Inc.

Submitted: May 2014

Table of Contents

Acknowledgements
Executive Summary
Introduction
Project Objectives
Project Approach
Project Outcomes
    Sample Completeness
    Literature Review
    “R” Code and Outputs
Conclusion
References
Appendix – Samples of Figures and Graphs Produced

Acknowledgements

This project was supported by Agriculture and Food Research Initiative Competitive Grant no. 2011-38422-31204 from the USDA National Institute of Food and Agriculture.

Executive Summary

Central Coast Water Quality Preservation, Inc. (CCWQP) is charged with implementing a Cooperative Monitoring Program (CMP), in which monthly water quality samples are collected from 50 sites in agricultural watersheds of the Central Coast. The major goal of the CMP is to show changes in water quality over time, ideally changes related to advances in agricultural practices that protect water quality. This internship focused on developing code for the “R” statistical software to increase automation in the production of figures, tables, and statistical results for the CMP annual report.

My specific objectives for this internship were to enhance my knowledge and skills pertaining to agricultural water quality issues and water quality data analysis, and to gain further professional experience in this field. I met these objectives by working with a large agricultural water quality dataset and learning to interpret it visually and quantitatively, with guidance from my advisor at CCWQP. The internship tasks included verifying dataset completeness, researching statistical trend analyses for water quality data, and creating “R” code for trend analysis, data characterization, and data summaries.

This experience has helped prepare me for a future career in water quality and environmental analysis, which could include future work with the USDA under the Forest Service or the NRCS.

Introduction

Central Coast Water Quality Preservation, Inc. (CCWQP) is charged with implementing a Cooperative Monitoring Program (CMP), in which monthly water quality samples are collected from 50 sites in agricultural watersheds of the Central Coast. The samples are analyzed for ammonia, chlorophyll a, conductivity, total dissolved solids, nitrate, dissolved oxygen and oxygen saturation, pH, salinity, and turbidity, in addition to in-field measurements of air temperature, water temperature, and flow. Samples are also analyzed four times per year for toxicity to invertebrates, fish, and algae. The results are reported to the Regional Water Quality Control Board (RWQCB) on behalf of farmers enrolled in the Conditional Waiver for Irrigated Lands.

The major goal of the CMP is to show changes in water quality over time, ideally changes related to advances in agricultural practices that protect water quality. Past CMP reports were used by farmers, regulators, conservation agencies, researchers, and environmental organizations. This internship focused on developing code for the “R” statistical software package (R Core Team 2012) to increase automation in the production of figures, tables, and statistical trend analysis results for the CMP annual report.

Project Objectives

The main objective for this internship was to develop and implement “R” code to provide figures and results for the 2013 CMP report, and to provide a means of automating the production of these figures and results for future annual reports. A personal goal for this internship was to gain professional experience relevant to watershed science and policy. Specifically, through this internship I hoped to enhance my knowledge of the field of agricultural water quality, as well as my skills in data analysis, statistical analysis, and reporting. Having these skills and this experience will greatly improve my ability to pursue my desired career path, especially at agencies such as the USDA.

Project Approach

The internship was structured according to the following tasks:

1. Data preparation and sample completeness check – “R” was used to review 7 annual data files for completeness, and to format and collate these files as necessary to support further data analysis.

2. Literature review – A review of water quality trend analysis literature was performed.

3. Trend analyses – Code was developed for use within “R” to test for trends within the 7-year dataset. Trend tests used include the Seasonal Mann-Kendall test and Bayesian change point tests. Additionally, routines were created to provide summary figures of the Seasonal Mann-Kendall trend analysis results.

4. Time series and data characterization – Code was developed for use within “R” to perform data summaries and to create time series plots, box plots, stacked proportional bar plots, and pie charts of regulatory exceedances.

These tasks were completed with instruction and guidance from my supervisor at CCWQP. Additionally, a great deal of independent work and research was required to improve my skills in “R” and my knowledge of environmental statistical analysis techniques.

Project Outcomes

Through this project I gained extensive experience with analyzing a large water quality dataset using numerous data and statistical analysis techniques. Additionally, I was able to broaden my knowledge of agricultural water quality issues as well as my technical skills in Excel and “R”.

Sample Completeness

The dataset was tested for completeness using “R”. The check identified where values were missing from the dataset. Sampling event summaries were then reviewed for each missing value to determine whether or not the value was truly missing from the dataset.
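The completeness check described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code written for the internship: the long-format layout, the column names (`site`, `month`, `analyte`, `value`), the sample values, and the helper `find_missing()` are all hypothetical.

```r
# Hypothetical long-format monitoring records: one row per site, month, and analyte.
# Column names and values are assumptions made for this sketch.
records <- data.frame(
  site    = c("A", "A", "A", "B", "B"),
  month   = c("2012-01", "2012-02", "2012-01", "2012-01", "2012-02"),
  analyte = c("nitrate", "nitrate", "turbidity", "nitrate", "turbidity"),
  value   = c(12.1, NA, 4.0, 8.3, 6.5),
  stringsAsFactors = FALSE
)

# List every site/month/analyte combination that has no usable value.
find_missing <- function(records) {
  # Every combination that a complete dataset would contain.
  expected <- expand.grid(
    site    = unique(records$site),
    month   = unique(records$month),
    analyte = unique(records$analyte),
    stringsAsFactors = FALSE
  )
  # Left join: combinations with no matching record get NA for value.
  merged <- merge(expected, records,
                  by = c("site", "month", "analyte"), all.x = TRUE)
  missing <- merged[is.na(merged$value), c("site", "month", "analyte")]
  missing <- missing[order(missing$site, missing$month, missing$analyte), ]
  rownames(missing) <- NULL
  missing
}

missing <- find_missing(records)
print(missing)
```

Each row of `missing` is a candidate gap. As in the report, a gap found this way would still need to be checked against the sampling event summaries before concluding that the value is truly missing.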

Literature Review

A literature review of work on water quality trend analysis was performed, with focus on agricultural water quality and the Seasonal Mann-Kendall trend test (Table 1).

Table 1. Results of a literature review of work in water quality trend analysis, with focus on agricultural water quality and the Seasonal Mann-Kendall trend test.

Citation: Antonopoulos VZ, Papamichail DM, Mitsiou KA. 2001. Statistical and trend analysis of water quality and quantity data for the Strymon River in Greece. Hydrology and Earth System Sciences, 5(4):679-691.
Summary: Monthly water quality and discharge data were analyzed for trends and evaluated for best-fit models. The relationships of concentrations and loads with discharge were also examined, using simple regression. The relation between concentration and discharge was weak, while the relation between load and discharge was very strong. Trends were detected using the non-parametric Spearman's criterion.

Citation: Bekele A, McFarland A. 2004. Regression-based flow adjustment procedures for trend analysis of water quality data. Transactions of the ASAE, 47(4):1093-1104.
Summary: Used the non-parametric Kendall's tau trend test to test for monotonic trends in data adjusted for flow in 3 different ways. The test is suitable when data are non-normal, have missing values, or are censored; non-parametric tests are preferred for data sets of moderate length. Kendall's tau is based on the rank order statistic (it compares ranks rather than actual values). The objective was to determine the best flow adjustment method (OLS or LOWESS). The LOWESS method was found to be more appropriate than the OLS method because it was better able to define relationships between constituent concentration and flow. The default f-value of 0.5 was found to be adequate for reducing variability in constituent concentrations due to flow.

Citation: Berryman D, Bobee B, Cluis D, Haemmerli J. 1988. Non-parametric tests for trend detection in water quality time series. Water Resources Bulletin, 24(3):545-556.
Summary: Discusses methods for choosing trend detection tests based on the identification of sources of serial dependence in the time series. The Spearman and Kendall tests for monotonic trends and the Mann-Whitney test for the detection of steps are identified as powerful non-parametric tests.

Citation: Bouraoui F, Turpin N, Boerlen P. 1999. Trend analysis of nutrient concentrations and loads in surface water in an intensively fertilized watershed. J. Environmental Quality, 28(6):1878-1885.
Summary: Analyzed nutrient concentrations and loads at the surface water outlet of a heavily fertilized watershed. A non-parametric statistical analysis (seasonal Mann-Kendall) was performed on mean monthly and mean annual data, which detected no trend. Next, data from the same month of each year were compared, and both decreasing and increasing trends were detected for certain constituents. The Mann-Kendall test was chosen based upon reviews of trend analysis of water quality by Walker (1994), Hirsch et al. (1991), and Berryman et al. (1988).

Citation: Crain AS, Martin GR. 2009. Trends in surface water quality at selected ambient-monitoring network stations in Kentucky, 1979-2004. Scientific Investigations Report 2009-5027, USDOI and USGS.
Summary: Used the S-Plus statistical software program (designed to detect monotonic trends) to perform trend analyses on water quality data. Tests used were the Seasonal Kendall non-parametric test and the Tobit-regression parametric test. One of these tests was selected for each constituent. Flow-adjustment methods provided with the S-Plus software were used to eliminate effects of flow on water quality variability.

Citation: Hirsch RM, Slack JR, Smith RA. 1982. Techniques of trend analysis for monthly water quality data. Water Resources Research, 18(1):107-121.
Summary: Presents techniques for analysis of monotonic water quality trends that account for non-normal distributions, seasonality, flow-relatedness, censored values, and serial correlation. The Seasonal Kendall test, the Kendall slope estimator (an estimator of trend magnitude for skewed data), and flow-adjusted constituent concentrations coupled with the Seasonal Kendall test were explored. Concluded that the methods explored were useful for long time series, as it is useful to have a set of objective procedures that are powerful over a wide range of situations for identifying trends.

Citation: McLeod AI, Hipel KW, Bodo BA. 1991. Trend analysis methodology for water quality time series. Environmetrics, 2(2):169-200.
Summary: Developed a general trend analysis methodology for water quality time series, designed for use with non-normal, positively skewed data with seasonal variation and interdependence of water quality variables and flow. The approach is divided into two categories: graphical studies and trend tests. Trend tests included Mann-Kendall, Kruskal-Wallis, and Spearman's partial-rank correlation. It was found that (1) the Spearman test has high power for water quality trend testing with seasonality, (2) flow-adjusted water quality data can eliminate sampling bias, and (3) it is important to test for seasonality before applying a test such as the seasonal Mann-Kendall test (which is less powerful when seasonality is not present).

Citation: Renwick WH, Vanni MJ, Zhang Q, Patton J. 2008. Water quality trends and changing agricultural practices in a Midwest U.S. watershed, 1994-2006. J. Environmental Quality, 37:1862-1874.
Summary: Analyzed changes in farm management practices that were likely to have an effect on water quality. Used an auto-regressive moving average model to include effects of discharge and season on constituent concentrations. Also used LOWESS plots and analyses of changes in the relation between discharge and concentration.

Citation: Yu Y, Zou S, Whittemore D. 1993. Non-parametric trend analysis of water quality data of rivers in Kansas. J. Hydrology, 150:61-80.
Summary: Four different non-parametric trend detection methods (Mann-Kendall, Seasonal Kendall, Sen's T test, Van Belle and Hughes Chi-square test) were used for a 9-year water quality dataset. The different methods were compared and were found to have practically equal power for datasets of at least 9 years in length. Lays out the steps for preliminary analyses (distribution, dependence, seasonality, and flow-relatedness tests).

“R” Code and Outputs

“R” code was developed to automate the creation of boxplots, time-series plots, precipitation and flow plots, turbidity and flow plots, stacked barplots for toxicity results, pie charts of regulatory exceedances, and summary statistics tables (see Appendix for samples of these products).
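As a rough illustration of the kind of summary that feeds a summary statistics table or an exceedance pie chart, the base R sketch below computes per-site counts, medians, and the proportion of results above a limit. The data, the column names, and the threshold of 10 are hypothetical and are not taken from the report or from any regulatory standard.

```r
# Hypothetical results for one analyte at two sites (values are made up).
results <- data.frame(
  site  = c("A", "A", "A", "A", "B", "B", "B", "B"),
  value = c(5, 12, 15, 8, 2, 3, 11, 4)
)
threshold <- 10  # assumed limit for illustration only

# Proportion of samples above the threshold at each site:
# the mean of a logical vector is the fraction of TRUE values.
exceed_prop <- tapply(results$value > threshold, results$site, mean)

# Small per-site summary table.
summary_table <- data.frame(
  site        = names(exceed_prop),
  n           = as.vector(tapply(results$value, results$site, length)),
  median      = as.vector(tapply(results$value, results$site, median)),
  exceed_prop = as.vector(exceed_prop)
)
print(summary_table)
```

The `exceed_prop` column could then be passed to base graphics functions such as `pie()` or `barplot()` to draw the corresponding figure for each site.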

Additionally, code was developed to analyze water quality trends using the Seasonal Mann-Kendall test from the Kendall package (McLeod 2011). For sampling sites with sufficient data, each analyte was tested for long-term monotonic trends using these routines. The data were tested for trends across the full dataset and across the dataset divided into wet and dry months. The dataset was divided by season because seasonality can affect which trends are detected in agricultural water quality data: irrigation during the dry season and precipitation during the wet season may have differing effects on water quality. Another “R” routine was developed to create summary figures of the Seasonal Mann-Kendall test results for each hydrologic unit.
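To make the test concrete, here is a minimal base R sketch of the seasonal Kendall statistic in the general form described by Hirsch et al. (1982): a Mann-Kendall S statistic is computed within each month, and the statistics and their variances are summed across months. It is not the Kendall package routine used for the report, and its handling of ties and its continuity correction may differ from that package. The function names, the synthetic data, and the choice of November through April as "wet" months are assumptions made for illustration.

```r
# Mann-Kendall S statistic and its variance for one season (NAs dropped).
mk_season <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  if (n < 2) return(c(S = 0, var = 0))
  d <- outer(x, x, "-")             # d[i, j] = x[i] - x[j]
  S <- sum(sign(d[lower.tri(d)]))   # lower triangle: later value minus earlier value
  ties <- as.vector(table(x))       # sizes of groups of tied values
  v <- (n * (n - 1) * (2 * n + 5) -
          sum(ties * (ties - 1) * (2 * ties + 5))) / 18
  c(S = S, var = v)
}

# Seasonal test: sum S and its variance over seasons, treating seasons as independent.
seasonal_mk <- function(values, season) {
  parts <- sapply(split(values, season), mk_season)  # rows "S" and "var"
  S <- sum(parts["S", ])
  v <- sum(parts["var", ])
  # Normal approximation with continuity correction.
  z <- if (S > 0) (S - 1) / sqrt(v) else if (S < 0) (S + 1) / sqrt(v) else 0
  list(S = S, var = v, z = z, p.value = 2 * pnorm(-abs(z)))
}

# Synthetic 7-year monthly series: seasonal cycle plus a steady upward trend.
years  <- rep(2006:2012, each = 12)
month  <- rep(1:12, times = 7)
values <- 10 + 5 * sin(2 * pi * month / 12) + 0.8 * (years - 2006)

res <- seasonal_mk(values, month)

# Same test restricted to assumed wet months (November through April).
wet <- month %in% c(11, 12, 1, 2, 3, 4)
res_wet <- seasonal_mk(values[wet], month[wet])

print(res$S)
print(res$p.value)
```

Because each month is compared only with the same month in other years, the seasonal cycle in the synthetic series does not mask the underlying trend. Restricting the input to a subset of months, as done for `res_wet`, mirrors the wet and dry split described above.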

The analyses, figures, and tables discussed above were produced for inclusion in the 2013 CMP report. The “R” routines themselves will be used to reproduce these items in future annual reports.

Conclusion

Through this internship I was able to gain and develop valuable knowledge, skills, and experience in my field of interest. I learned a great deal about water quality within agricultural watersheds on the Central Coast and gained the skills necessary to evaluate a large water quality dataset in terms of general data characterization, statistical trend analysis, and summary statistics. Through all of this, I gained further expertise in the use of “R” and am more confident in my technical skills and knowledge. These skills are extremely valuable to my professional development and to the enhancement of future career opportunities.

References

McLeod AI (2011). Kendall: Kendall rank correlation and Mann-Kendall trend test. R package version 2.2. http://CRAN.R-project.org/package=Kendall

R Core Team (2012). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL http://www.R-project.org/

Appendix – Samples of Figures and Graphs Produced

A. Precipitation and flow time series plot.
B. Toxicity stacked proportional bar plot.
C. Turbidity and flow plots.
D. Pie charts showing proportions of regulatory exceedances.
E. Trend analysis summary.
F. Analyte boxplots.
G. Analyte time series plots.

