
WithinReachFinalReport

Date post: 17-Aug-2015
Upload: jinyang-luo
Discovering Links Between Health & Hunger Team 2: Jinyang Luo, Drew Hess & Brandi Riddle

Discovering Links Between Health & Hunger

Team 2: Jinyang Luo, Drew Hess & Brandi Riddle

Executive Summary: Discovering Links Between Health & Hunger

Goals of the Project

The goal of the project was to find tenable correlations between the eight (or more) topics provided by WithinReach. Correlations were found by finding county level data related to the variables, performing statistical analysis, and then symbolizing the relationships most likely to be backed by accessible peer reviewed research. The maps and data are meant to provide WithinReach with information to present to policy and decision making bodies. These organizations can then write legislation, create or fund programs or produce partnerships between the organizations and WithinReach.

Objective

The project collected data on several factors in Washington State (risk factors, spatial locations, differences of occurrences among groups) and initiated the building of a database from which WithinReach can draw for future projects. The best way to illustrate these factors in the database is by creating maps of single variables for reference and then relationship maps for organizational use. Additionally, the project hopes that the database created through this project will be flexible enough to accommodate more data and readable enough to be used both by people at WithinReach and for computational analysis. See Appendix C for more information on the contents of the database as well as the processes used to add to or modify it.

Action Report

1. The team began by reading through the articles found during Internet research, then came up with ideas on where the best data for the task could be found. Writing a few sentences on how the views within each reading relate to the objectives at hand was useful for deciding where to find more data.

2. The group then searched the web for data. Most of the data came from nonprofit sites and government organizations, and the sources are listed in the “Final Data Source” file.

3. The data, however, had to be organized to fit the requirements for correlation (common rows, similar units). The study decided to organize data by county, modifying data to fit this unit and discarding it if it did not exist at that level. Additionally, data in percent form was chiefly used for the analysis itself, and some data had to be converted from “per 100,000” to percent format.

4. The study then developed hypotheses for correlations, and did so by finding supporting research where possible.

5. The team then used statistical analysis with SPSS and ArcGIS to compare the data. In particular, Pearson’s correlation was computed for each pair of factors, compared at the county level.

6. The team then created maps symbolizing the single variables in a simple way, and then created maps showing relationships in the most easily understandable way, illustrating the greatest similarities/differences.

7. One idea that has been posed is to produce as many maps or graphs as our positively correlated hypotheses allow and then pick the hypothesis most supported by research and put it in online format (CartoDB, MapBox etc.) as an example.
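Steps 3 and 5 above can be sketched in a few lines of Python. This is a minimal illustration only, not the team's SPSS workflow: the function names per_100k_to_percent and pearson_r are hypothetical, and the code assumes each factor is a plain list of county values in the same county order.

```python
import math


def per_100k_to_percent(rate_per_100k):
    """Convert a rate given per 100,000 people into a percentage."""
    # Dividing by 100,000 gives a fraction; multiplying by 100 gives percent.
    return rate_per_100k / 100_000 * 100


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a factor is constant")
    return cov / math.sqrt(var_x * var_y)
```

A coefficient near +1 or -1 indicates a strong linear relationship between the two factors across counties, and a value near 0 indicates little or none.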

Detailed Steps:

1. Mined data from various web resources and publications. This involved converting PDFs into a readable format such as Microsoft Excel. Documents or whole web pages were also used; these were run through data exporters such as import.io, or their tables were copied and pasted into Excel. Data that appeared only in images was entered into spreadsheets manually, as it is difficult to extract data from images without coding prowess.

● American Community Survey
● US Census
● Washington State Department of Health
● WAGDA
● King, Pierce, Thurston, Kitsap, Spokane County etc. GIS databases
● Federal Data Resources: such as data gov., the dataweb, USDA gis data etc.
● Other University Research Publications for: UCLA library, Oregon library etc.

Found information on our variables.

And then check:
● Puget Sound Regional Council
● King, Pierce, Thurston, Kitsap, Spokane County etc. GIS databases
● Organizational Websites related to food security, food policy and distribution points in Washington State

From research and articles in our reading list, we were able to determine the type of variables that will stand in for adverse childhood experiences. They are:
● Single parent; no-parent households
● Single, unemployed parent household
● Low income household

2. Found supporting publications, articles, journals, studies or institutional documents and websites supporting the data, to have a basis for creating hypotheses.

3. Created hypotheses to orient our analysis.

Potential Hypotheses and requested hypotheses:
● If the cost of care for a procedure or treatment increases then the cost of coverage rises.
● If a household/county has higher rate of food insecurity their healthcare enrollment rates are likely to be lower.
● The number of hospitalization events related to diabetes or obesity decreases with increased insurance coverage.
● If a county is food insecure then it is likely to have higher rates of obesity.
● If a county has high rates of breastfeeding then the county will have lower rates of obesity.
● If the county has higher rates of breastfeeding then it will have lower rates of food insecurity.
● If a county has higher rates of breastfeeding it will have a lower rate of low income households/single parent households.
● If a county has a higher rate of single parent households then it has a higher rate of food insecurity.

Though most of these may not be strongly correlated, they may all share the underlying common factor of poverty, which raises the possibility of correlation.

4. Compiled and created spreadsheet of found data.

5. Ensured all data is formatted similarly to allow for input into programs such as SPSS.

6. A group member familiar with Python took a script and modified it to take all of the data into a single spreadsheet and find the correlation levels between every single variable and every other variable at the same time.

7. Chose highly correlated variables, created a list of the most tenable correlations, and created another document to display them.

8. Created basemap layer of Washington State counties in ArcMap:

Took a country level basemap that shows counties, then selected by attributes: 'STATE = Washington'. Then turned the selection into a layer and removed the country layer.

Ensured the data is set to the following spatial reference and projection:

NAD 1983 StatePlane Washington North FIPS 4601 Feet
Projection: Lambert Conformal Conic
Datum: North American Datum 1983
Units: Foot US

9. Joined the spreadsheet of data to the spatial data (WA state counties layer) so spreadsheet data now has a reference.

10. Produced nine single variable maps and represented the rates with a graduated color map.

11. Produced nine double variable maps to show the most interesting correlations and used graduated symbols combined with graduated colors to compare the two variables within each county. Used similar colors to represent the positive correlations, and contrasting colors to represent the negative correlations.
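Step 6 describes correlating every variable with every other variable in one pass. The team's actual script is not reproduced in this report, so the following is only a sketch of the general idea under stated assumptions: the table is a hypothetical dictionary mapping each variable name to a list of county values (all in the same county order), and the function names are illustrative.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlate_all_pairs(table):
    """Return (name_a, name_b, r) for every unordered pair of variables.

    `table` maps a variable name to its list of county values; the layout
    is an assumption made for this sketch.
    """
    results = []
    # combinations() yields each pair once, so A-B is not repeated as B-A.
    for name_a, name_b in combinations(sorted(table), 2):
        results.append((name_a, name_b, pearson_r(table[name_a], table[name_b])))
    # Strongest relationships first, regardless of sign.
    results.sort(key=lambda row: abs(row[2]), reverse=True)
    return results
```

With n variables this produces n(n-1)/2 pairs, which is why an automated pass is more practical than running each comparison by hand.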

Key Findings:

Overview of Key Findings - There are many possible correlations to explore. Most of the correlations found involve breastfeeding or diabetes paired with other variables. There is a need for further study.

Overview Key Resources Developed -
1. Database
2. Single data table
3. County level shape file with joined data
4. Maps in picture format
5. Python script for future analysis in SPSS

Recommendations/Next-Steps -
1. Go through to find correlations that best suit WithinReach
2. Search for more studies to back up correlations
3. Compile studies
4. Create online map or platform to display information easily or according to needs.
5. Present to policy makers

Appendix A: Original Implementation Plan (April 2015)

Goals:

The goal of the project is to find relationships, or actual correlations, between the 8 (or more) topics provided by producing map packages that overlay each element's data with one another. The resulting maps and data will provide WithinReach with information to present to policy and decision making bodies. These organizations can then write legislation, create programs or produce partnerships between the organizations and WithinReach.

Objectives:

To collect data on several factors in King County (Risk factors, spatial locations, differences of occurrences among groups) and initiate the building of a database from which Within Reach can draw for future projects. In the end, it will be necessary to create maps portraying these factors in King County, the greater Seattle area, or perhaps Washington State.

Outputs:

The results of the study will be many maps comparing one factor described below with another factor. Not every combination will be used, but the factor combinations with the highest amounts of correlation (as confirmed through statistical analysis) will be showcased through the maps sent to Within Reach and presented at the Symposium. Additionally, the organization hopes to be able to collect and make use of a geographic database with which more studies involving these factors will be possible. Some work might have to be done with the geodatabase to make sure it can be used and interpreted by people who do not have experience with GIS analysis.

Data and analysis needs:

The following are topics that our output data will derive from. In the future, we will combine the data our team has collected with the data team AA has collected.

1. Obesity Rates (Drew)
2. Breastfeeding Rates - The rate of women who are still breastfeeding at 6 months (Regina)
3. Food Insecurity / High Food Insecurity (Brandi)
4. Adverse Childhood Experiences (ACES) (Brandi)
5. Rates of Diabetes (Drew)
6. Health care costs (we are unsure what would be a good measure for this; one idea was the cost of hospitalization for diabetes for a night. We are open to ideas) (Brandi)
7. Health Insurance Coverage Rates (Regina)
8. Enrollment Rate in Medicaid (Drew)

To collect data on all of these factors, it will be necessary to browse a multitude of different websites. Public, government-collected data such as the U.S. Census or the American Community Survey (ACS) will be useful for general overviews of factors, but the data will not be complete without sources collected from efforts such as non-profit organizations or volunteers. The project’s liaison, Carrie Glover, has an idea of where much of this data will be located, and it may not be publicly accessible.

When the data has been collected, it may need to be converted into a proper form to use in GIS analysis. For example, a list of addresses will need to be geocoded into a set of spatial coordinates in order for them to be displayed on a map. Additional use of websites such as online geocoders may be useful for completing this step.

We do not yet know what data types are available for every topic. We will update the plan accordingly as new information comes in. From each of these categories we will produce general overlay maps with data we found. If there are any data we couldn’t find online, we will reach out to Carrie Glover, of Within Reach, to provide us more resources.

Technology needs & acquisition/setup tasks:

The majority of the planning and communication will take place in a folder the team has created in Google Drive. This folder contains ideas on how to proceed with the project and links to data sources. Google Drive is useful as it allows simultaneous collaboration on text and spreadsheet documents. In the future, if space limits permit, Google Drive may be used for storage of the geodatabase as well as the data used in the analysis. If this is not feasible, websites such as GitHub can be used to store and share the database.

Once data has been collected, it may need to be converted into a different format to be compatible with geospatial analysis. Online websites such as the GPS Visualizer’s Easy Batch Geocoder (available for free at http://www.gpsvisualizer.com/geocoder/) will be useful for creation of new spatial data that can be positioned on a map and illustrated with one or more GIS programs.

We will be using GIS programs such as ArcGIS 10.2 or QGIS 2.8 to perform spatial analysis as well as to create the maps themselves for distribution. ArcGIS will be the best tool to use for performing the spatial analysis itself, while QGIS will be more useful for creating aesthetically pleasing maps. However, care must be taken to ensure data will be easily transferable between the two programs, if this approach is used.

In addition, SPSS will be used to compare the relations of two statistics to figure out if they are statistically significant or not. The paired t-test will be used for this calculation if possible. However, for this approach to work, each statistic must have a corresponding statistic in the same general area: for example, statistics on food insecurity in Capitol Hill must have a corresponding statistic on health care costs in Capitol Hill. It may be necessary to perform join and spatial join operations on this data in order to find pairs of data.

The methods of geospatial analysis will differ depending on the type of data collected. With zonal data, such as data derived from the U.S. Census, it will be possible to overlay data illustrating two different factors “on top” of each other, resulting in correlations being found across these two factors. ArcGIS’s “normalization” option will be useful for this, allowing illustration of one variable divided by another variable to be possible. In the future, a more complex cartographic model should be built, which would make it easier to visualize what should be done in regards to geospatial analysis.

Action Plan:

1. Read through the articles located from Internet research, then come up with ideas on where the best data for the task should be found. Writing a few sentences on how the views within each reading relate to the objectives at hand will be useful for deciding where to find more data.

2. Mine the following:
a. American Community Survey
b. US Census
c. Washington State Department of Health
d. WAGDA
e. King, Pierce, Thurston, Kitsap, Spokane County etc. GIS databases
f. Federal Data Resources: such as data gov., the dataweb, USDA gis data etc.
g. Other University Research Publications for: UCLA library, Oregon library etc.

- Information on rates of breastfeeding, health care costs, insurance rates and enrollment, rates of diabetes and rates of obesity in the state of Washington.

And then check:
a. Puget Sound Regional Council
b. King, Pierce, Thurston, Kitsap, Spokane County etc. GIS databases
c. Organizational Websites related to food security, food policy and distribution points in Washington state

3. Overlay each factor with each other factor (including those collected by group AA).

4. Use statistical analysis with SPSS/ArcGIS to compare the data, create series of maps illustrating the greatest similarities/differences.

5. Creating a map of every combination would involve making many maps, so perhaps we will have to implement a plan to convert to CartoDB, Mapbox or some other online mapping platform. This is all contingent on the type of data we can find and on whether we can convert all the data to a usable format in the allotted time. The hope is to have a series of general overlay maps that we can input to an online platform and produce clickable, easily presentable maps.

We’ll need to get data then:
a. geocode, create; points, lines, polygons etc.
b. convert to shapefile or mapbox or cartodb extensions
c. pick a mapping platform; ArcMap, QGIS, CartoDB, MapBox (one client is most comfortable with)
   i. look for one that will present the information in the way the client feels fits best
d. find baselayer of county level
e. add created data to map
f. depending on data type (rates, percentages, numbers etc.) find best way to represent individual data and then begin overlay process.
   i. example: breastfeeding rates, with obesity rates
g. determine which maps are most essential or present the most credible correlations based on research (since correlation does not always equal causation)
h. create map package
i. style maps
j. create prints of most essential relationship maps
k. transfer to interactive map format
   i. make clickable, searchable, allow overlays for other variables that didn’t make the print out cut
l. hand off map package, database and readings

Work Schedule:

April 27 - May 1: Data Collection, Sorting and building/converting of data into map data (shapefile, geoJSON etc.)
May 2 - 3: Continued data collection/creation
May 4 - May 8: Completion of any extra data collection and creation. Begin analysis of found variable to search for correlations.
May 9 - 10: continue analysis on home computers when possible
May 11 - 15: look at maps made and determine best to cross; find relationships
May 16 - 17: continue previous
May 18 - 22: perform analysis for new map sets with potential for best impact and highest credible correlation according to research and articles
May 23 - 24: send for review and updates to client mid stage; focus on data integrity and analysis
May 25 - 29: Create final map packages and write instructions for replication on mapping platform
May 30 - 31: send to client for review for presentation input and style standards to match branding
June 1 - 5: Create map designs and styles for presentation; if time allows build online interactive forms; finalize and send to client for review final stages
June 6 - 7: edit as necessary; Present findings

Appendix B: Guide to Digital Data

This portion of the report will describe the files and sections of the data with a brief explanation of what each contains.

1. ArcMap GIS Files - contains the county shapefiles with joined data attribute table

2. Correlation Analysis - Contains the files showing the results of inputting the data into SPSS (Statistical Package for the Social Sciences).
   a. Correlation sorting - is the Python script used to automatically perform analysis on a table of similarly formatted sets of data
   b. Correlations Original Table - is the entirety of the analysis in a single table. Nothing is broken down
   c. Results 90%, 95% 99% Confidence - are the variable combinations sorted into their respective statistical confidence levels
   d. Most Interesting Correlations - this file contains a list of the correlations with the highest level of confidence or that matched (or did not match) any of our suggested hypotheses.

3. Data in Spreadsheets - This file contains the data for each variable before they were formatted into a single Excel spreadsheet. Databasev3.xls is the entire data set in one Excel sheet

4. Relationship maps are standard resolution JPEGs of some of the more interesting statistically significant correlations the SPSS run came up with. The maps with contrasting colors for variables indicate a negative correlation. Maps with same or similar colors show a positive correlation.

5. Scatterplots - shows the line of best fit, regression analysis, for the relationship of the variables. By looking at how data falls along the line, we can determine whether the correlation is strong or weak. Despite how strongly some variables correlate, the line of best fit provides us a quick way to see if the relationship could be meaningful.

6. Single Variable Map - this file contains basic maps that visualize all of our assigned variables on Washington State counties.

7. Data Source - is a file that contains the web url where we obtained the data sets for each variable. The sources are from reputable organizations that have their own techniques to gather this information at various levels of government.
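The line of best fit mentioned for the scatterplots can be computed with ordinary least squares. The sketch below is an illustration only and is not taken from the project files; the function name best_fit_line is hypothetical, and it assumes two equal-length lists of county values.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    slope = sxy / sxx
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

The sign of the slope matches the sign of the Pearson coefficient for the same pair of variables, and how tightly the points cluster around the line reflects the strength of the correlation.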

Appendix C: User documentation

The files “DATABASEv3.xls” and “DATABASEv3.sav” are the final compilations of the data used in the analyses, already formatted for Excel and SPSS respectively. This data, taken from many sources, can be used simply as reference material for county-wide data within Washington State.

The data has been processed using SPSS and organized into combinations of factors which are statistically significant. These results can be found in the files “Results - 90% Confidence,” “Results - 95% Confidence,” and “Results - 99% Confidence.” Correlations in these files are significant at the 0.10, 0.05, and 0.01 levels respectively: if the two factors were truly unrelated, a correlation at least this strong would be expected to arise by chance less than 10%, 5%, and 1% of the time. Future studies that need greater certainty should draw on the latter files.

Each correlation in the data has five columns: factor1, factor2, pearson, significance, and positive_correlation. Factors 1 and 2 are the names of the factors in the DATABASEv3 table that are correlated. The Pearson column is the Pearson’s coefficient, which ranges from -1 to 1: the closer its absolute value is to 1, the more strongly correlated the factors are. The Significance column is the “p-value” of each correlation: the lower this number is, the less likely it is that a correlation this strong would appear by chance alone. The Positive Correlation column is the polarity of each correlation. A 1 signifies a positive correlation, and a 0 signifies a negative correlation.
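To make the five-column layout concrete, the following Python sketch builds rows in that shape and selects those meeting a chosen confidence level. It is an assumption-laden illustration rather than the project's script: the function names are hypothetical, and the rows are assumed to be dictionaries keyed by the column names listed above.

```python
def make_row(factor1, factor2, pearson, significance):
    """Build one result row with the five columns described in this appendix."""
    return {
        "factor1": factor1,
        "factor2": factor2,
        "pearson": pearson,
        "significance": significance,
        # 1 for a positive correlation, 0 otherwise (a coefficient of
        # exactly 0 is treated as 0 here, which is a choice of this sketch).
        "positive_correlation": 1 if pearson > 0 else 0,
    }


def filter_by_confidence(rows, confidence=0.90):
    """Keep rows whose p-value is below 1 - confidence, strongest first."""
    # Rounding avoids floating-point noise such as 1 - 0.9 = 0.0999...98.
    alpha = round(1.0 - confidence, 10)
    kept = [row for row in rows if row["significance"] < alpha]
    return sorted(kept, key=lambda row: abs(row["pearson"]), reverse=True)
```

Calling filter_by_confidence(rows, 0.95) or filter_by_confidence(rows, 0.99) would give stricter subsets comparable in spirit to the three “Results” files.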

The file “Results - 90% Confidence” was used to make the file “Most Interesting Correlations.” This file contains relations that have been deemed by the study to be interesting correlations to study, although these are entirely subjective. If the organization wishes to use the data in any specific detail, they should browse the “Results” files directly.

Incorporating additional data for correlation and organization is possible with every file used in this folder, although without specific programs, it may not be feasible. Data with one entry per county can be inserted into the database file and can be used in analysis. If SPSS is available, running a Pearson’s Correlation test on “DATABASEv3.sav” will provide a table that can easily be exported into Excel format. It is possible to save this file as a comma-separated values list (CSV) within Excel, allowing the final result to be organized into a readable format with every statistically significant correlation within given confidence values (90% by default).
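If SPSS is not available, a new one-entry-per-county variable could also be appended to a CSV export of the database with a short script. The sketch below is only one possible approach: the key column name "county", the function name add_column, and the CSV layout are all assumptions, since the exact structure of DATABASEv3 is not reproduced in this report.

```python
import csv
import io


def add_column(csv_text, new_name, values_by_county, key="county"):
    """Return CSV text with one extra column filled in from values_by_county.

    `key` is the name of the column holding county names (assumed here).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + [new_name]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        county = row[key]
        # Fail loudly rather than leave a silent gap that would skew analysis.
        if county not in values_by_county:
            raise KeyError(f"no value supplied for county {county!r}")
        row[new_name] = values_by_county[county]
        writer.writerow(row)
    return out.getvalue()
```

Raising an error on a missing county mirrors the rule used in the study that every variable needs one value per county before it can be correlated.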

Appendix D: Hard Copy Maps

On the following two pages are some of the maps created during the project. The results are not limited to these maps, however; different factors can be compared using the data within the Data in Spreadsheets folder.

Appendix E: Findings

The following are screenshots of correlation tests done between variables. To get the full analysis please see the data document, "Correlations - Original Table" or any other file in the Correlation Folder.

Some of the most interesting correlations occurred between breastfeeding and diabetes with other variables. The big takeaway for us was the need for more research, studies and articles to back up the findings. The great thing is that the correlations we found have the potential to be a really great basis for further research. If WithinReach chooses to convince other organizations about the need for further study, they can use this analysis to advocate for it. Since WithinReach is primarily a pathway to Primary Resources for health and nutritional needs in Washington, putting out suggestions for further study could encourage outside groups or institutions to perform them.

Appendix F: Additional Materials - Tutorials

Working With SPSS: http://www.spss-tutorials.com/spss-main-goals/
QGIS (if ArcMap is unavailable): http://www.qgistutorials.com/en/