Ryan A. Meier Central Michigan University 2015 IMAGIN Conference The Use of Affinity Propagation to Cluster U.S. Socio-economic Census Data
Page 1: The Use of Affinity Propagation to Cluster Socio-economic ...Non-Metric Affinity Propagation for Un-Supervised Image Categorization. In: Proceedings, 11th IEEE International Conference,

Ryan A. Meier

Central Michigan University

2015 IMAGIN Conference

The Use of Affinity Propagation to Cluster U.S. Socio-economic Census Data

Page 2

• Socioeconomic characteristics

• Data that represents the people

• Population

• Age

• Gender

• Race Identity

• Education

• Family Size

• Household Size

• Employment Sector

• Income

• Marital status

• Nativity

• Language Spoken

WHAT COMMUNITIES BEST REPRESENT THE UNITED STATES?

Page 3

• Middletown – Lynd and Lynd (1929)

-- Introduction of Data Mining –

• PRIZM – Jonathan Robbin (1978)

• The Clustering of America – Michael Weiss (1988)

• Our Patchwork Nation – Chinni and Gimpel (2010)

PREVIOUS STUDIES

Page 4

OBJECTIVE

• Map U.S. census socio-demographic data using affinity propagation to group zip codes into meaningful clusters.

• Identify exemplar locations in the U.S. to be used as ideal sample sites in future research.

• Combine GIS techniques with a novel statistical analysis.

• Demonstrate an objective method for the generalization and analysis of large data sets.

Page 5

• Frey Labs, University of Toronto

• Clustering algorithm that identifies exemplars

− The most representative data point in each cluster

• Initially considers all data points as potential exemplars

• Parameters:

− Dissimilarity Matrix

− Preference Value

• Previous Studies

AFFINITY PROPAGATION

A visualization of AP cluster classification with exemplar data points (Bodenhofer, 2013, p. 3).
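The message-passing idea behind AP can be sketched in a few lines. This is a minimal illustration in plain Python, not the APCluster code used in the study: the function names, the damping value, the iteration count, the toy 1-D points, and the preference of -1.0 are all assumptions chosen for the example. The update rules follow the responsibility and availability messages described by Frey and Dueck (2007).

```python
def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for each point, the index of its exemplar.

    S is an n x n list of similarities (n >= 2); S[k][k] holds the preference.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # Responsibility: how well k suits i compared with i's best rival.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            runner_up = max(scores[k] for k in range(n) if k != best)
            for k in range(n):
                rival = runner_up if k == best else scores[best]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # Availability: accumulated support for k acting as an exemplar.
        for k in range(n):
            support = [max(0.0, R[i][k]) for i in range(n)]
            support[k] = R[k][k]  # self-responsibility is not clipped
            total = sum(support)
            for i in range(n):
                if i == k:
                    new = total - R[k][k]
                else:
                    new = min(0.0, total - support[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each point picks the candidate with the largest combined message.
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


def toy_similarities(points, preference):
    """Negative absolute difference between 1-D points; preference on the diagonal."""
    return [[preference if i == j else -abs(a - b)
             for j, b in enumerate(points)]
            for i, a in enumerate(points)]


points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]  # toy data, not census values
exemplars = affinity_propagation(toy_similarities(points, preference=-1.0))
print(exemplars)
```

Every point starts as a candidate exemplar; the preference on the diagonal controls how readily a point nominates itself, which is why a more negative preference tends to produce fewer clusters.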

Page 6

• Cardille and Lambois (2010)

• Objectively identify signature landscapes of the U.S.

• Aid in ecosystem management

• Land use/land cover satellite imagery

• 17 distinct landscapes

• Interesting insight into exemplar landscapes

− Human signature in almost every exemplar

FROM THE REDWOOD FOREST TO THE GULF STREAM WATERS: HUMAN SIGNATURE NEARLY UBIQUITOUS IN REPRESENTATIVE US LANDSCAPES

Page 7

METHODS

• 2010 U.S. Census data and 2008-2012 American Community Survey (ACS) five-year estimates

• ZIP Code Tabulation Areas (ZCTAs™)

• 40 different attributes:

Population density, age, gender, race identity, educational attainment, family size, household size, employment sector, income, marital status, nativity and place of birth, and language spoken at home.

Workflow: Download Data from U.S. Census Bureau

Page 8

METHODS

• Null data values replaced or removed.

• The z-score for each variable was calculated to standardize the dataset.

Workflow: Download Data from U.S. Census Bureau → Format Data into Spreadsheet and Calculate Z-scores
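The standardization step can be illustrated with a short sketch. The function name and the income figures are hypothetical, and using the population standard deviation is an assumption; the slides do not say which form was applied.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize one variable to mean 0 and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


median_income = [42000, 55000, 61000, 38000, 90000]  # hypothetical ZCTA values
standardized = z_scores(median_income)
print(standardized)
```

Standardizing each attribute this way keeps variables measured in large units, such as income, from dominating the distance calculation that follows.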

Page 9

METHODS

• Principal component analysis (PCA) reduced the data size by 35% while retaining 95% of the information.

• Eliminated correlated data.

• Increased RAM efficiency.

• Decreased overall running time.

Workflow: Download Data from U.S. Census Bureau → Format Data into Spreadsheet and Calculate Z-scores → Run PCA on Entire Dataset
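One common way to decide how many principal components to keep is to retain the smallest number whose cumulative share of variance reaches a threshold. The sketch below assumes that "95% of information" means 95% of total variance, which the slides do not state explicitly; the function name and the eigenvalues in the usage line are hypothetical.

```python
def components_needed(eigenvalues, threshold=0.95):
    """Smallest number of leading components whose variance share meets the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return count
    return len(ordered)


# Hypothetical eigenvalues: cumulative shares are 0.4, 0.7, 0.9, 1.0.
print(components_needed([4, 3, 2, 1], threshold=0.90))
```

If the 35% reduction refers to the number of columns, 40 standardized attributes would shrink to 26 components (40 × 0.65); that reading is an inference, not something the slides confirm.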

Page 10

METHODS

• Matrix of how ‘different’ each point is from every other point.

• Size of the matrix grows quadratically with the number of data points.

• 1,052,418,481 pairwise dissimilarities for 32,441 ZCTAs.

• Negative weighted Euclidean distance between z-scores in n-dimensional space.

Workflow: Download Data from U.S. Census Bureau → Format Data into Spreadsheet and Calculate Z-scores → Run PCA on Entire Dataset → Create a Dissimilarity Matrix from PCA Results

$$-\sqrt{\sum_{j=1}^{J} w_j \,(x_j - y_j)^2}$$
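The negative weighted Euclidean distance and the size of the resulting matrix can be checked with a small sketch. The function names, the sample vectors, and the weights are illustrative assumptions; the slides do not say which weights were used (weighting components by their explained variance would be one option).

```python
import math


def negative_weighted_distance(x, y, weights):
    """Negative weighted Euclidean distance between two score vectors."""
    return -math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def pairwise_matrix(rows, weights):
    """Full n x n matrix of negative weighted distances (quadratic in n)."""
    return [[negative_weighted_distance(r, c, weights) for c in rows] for r in rows]


# Two hypothetical points with unit weights: a 3-4-5 triangle.
print(negative_weighted_distance([0, 0], [3, 4], [1, 1]))

# The matrix has n * n entries, matching the figure quoted for 32,441 ZCTAs.
n_zctas = 32_441
print(n_zctas * n_zctas)
```

Values closer to zero mean two ZCTAs are more alike, so the negated distance can be passed to AP directly as a similarity.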

Page 11

METHODS

• R package APCluster (Bodenhofer et al., 2011).

• Approximate run time was 20 hours, using 50 GB of RAM on a 3.5 GHz Xeon processor.

• AP runs on only one processor.

Workflow: Download Data from U.S. Census Bureau → Format Data into Spreadsheet and Calculate Z-scores → Run PCA on Entire Dataset → Create a Dissimilarity Matrix from PCA Results → Run AP Using Dissimilarity Matrix and Preference Value
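A rough back-of-the-envelope estimate helps put the reported memory use in context. The sketch assumes 8-byte floating-point values and a dense implementation that holds three n × n matrices (similarities, responsibilities, availabilities); whether APCluster allocates exactly this is not stated in the slides, and the function name is hypothetical.

```python
def dense_matrix_bytes(n_points, bytes_per_value=8):
    """Bytes needed for one dense n x n matrix (8-byte values assumed)."""
    return n_points * n_points * bytes_per_value


n_zctas = 32_441
one_matrix = dense_matrix_bytes(n_zctas)
three_matrices = 3 * one_matrix  # assumed: similarity, responsibility, availability
print(f"{one_matrix / 1e9:.2f} GB per matrix, {three_matrices / 1e9:.2f} GB for three")
```

Under these assumptions a single matrix needs about 8.42 GB and three need about 25.26 GB before any temporary copies, which is the same order of magnitude as the 50 GB reported.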

Page 12

METHODS

• Mapped using ArcMap 10.2.

Workflow: Download Data from U.S. Census Bureau → Format Data into Spreadsheet and Calculate Z-scores → Run PCA on Entire Dataset → Create a Dissimilarity Matrix from PCA Results → Run AP Using Dissimilarity Matrix and Preference Value → Map Results

Page 13

RESULTS

• 22 unique clusters and exemplars

• Appearance of regions and spatial patterns

Page 14

WHY 22 CLUSTERS?

A graph showing the number of resulting clusters based on the preference value, starting with -11.705, the minimum value in the dissimilarity matrix. The red dot indicates the chosen preference value of -234.1.
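The two numbers in the caption are related: the chosen preference of -234.1 is 20 times the reported minimum of -11.705. The slides do not say how the candidate preference values were generated, so stepping through multiples of the minimum, as in the sketch below, is an assumption, and the function name and multiplier list are hypothetical.

```python
def preference_candidates(min_similarity, multipliers):
    """Candidate preference values expressed as multiples of the minimum similarity."""
    return [min_similarity * m for m in multipliers]


candidates = preference_candidates(-11.705, [1, 5, 10, 20])
print([round(c, 3) for c in candidates])
```

Each candidate would be passed to a separate AP run and the resulting cluster count recorded; more negative preferences generally yield fewer clusters.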

Page 15

22 Clusters of America

1. Worcester, MA
2. Newtonville, MA
3. Easton, ME
4. Lodi, NJ
5. Greenwich, NJ
6. Edison, NJ
7. Penn Yan, NY
8. Savannah, GA
9. Hagerhill, KY
10. Columbus, OH
11. Fort Wayne, IN
12. Wabash, IN
13. Northome, MN
14. Aurora, IL
15. Downers Grove, IL
16. Pierce City, MO
17. Hardy, NE
18. Lafayette, LA
19. Sanger, TX
20. Lockhart, TX
21. Desert Hot Springs, CA
22. Lower Kalskag, AK

Page 16

CLUSTER REGIONALITY AND PATTERNS

Page 17

Clusters 10 & 18: The South

10. Columbus, OH
18. Lafayette, LA

Page 18

Clusters 10 & 18: The South

Page 19

Cluster 9: Native-Born Caucasian

9. Hagerhill, KY

Page 20

Cluster 22: American Indian and Alaska Native

22. Lower Kalskag, AK

Page 21

Cluster 22: American Indian and Alaska Native

22. Lower Kalskag, AK

Page 22

Clusters 14 & 21: Hispanic and Latino

14. Aurora, IL
21. Desert Hot Springs, CA

Page 23

Clusters 1, 2, 6, & 15: Cities and Suburban Areas

1. Worcester, MA
2. Newtonville, MA
6. Edison, NJ
15. Downers Grove, IL

Page 24

Clusters 1, 2, 6, & 15: Cities and Suburban Areas

1. Worcester, MA
2. Newtonville, MA
6. Edison, NJ
15. Downers Grove, IL

Page 25

Clusters 5, 7, 12, & 13: Rural Areas

5. Greenwich, NJ
7. Penn Yan, NY
12. Wabash, IN
13. Northome, MN

Page 26

DEGREE OF CLUSTERING

Page 27

Degree of Clustering Spatial Analysis

Page 28

DISCUSSION

Page 29

COMPARED TO PREVIOUS STUDIES

• Easier to see regions

• Looks cleaner

• Shows more heterogeneity

• Greater detail of urban areas

Chinni and Gimpel (2010)

Page 30

22 Clusters of America

1. Worcester, MA
2. Newtonville, MA
3. Easton, ME
4. Lodi, NJ
5. Greenwich, NJ
6. Edison, NJ
7. Penn Yan, NY
8. Savannah, GA
9. Hagerhill, KY
10. Columbus, OH
11. Fort Wayne, IN
12. Wabash, IN
13. Northome, MN
14. Aurora, IL
15. Downers Grove, IL
16. Pierce City, MO
17. Hardy, NE
18. Lafayette, LA
19. Sanger, TX
20. Lockhart, TX
21. Desert Hot Springs, CA
22. Lower Kalskag, AK

Page 31

CONCLUSION

Page 32

Questions?

Page 33

REFERENCES

Anderson, M. J. 1988. The American Census: A Social History. New Haven, CT: Yale University Press. pp.

Bodenhofer, U., and Kothmeier, A., 2011. APCluster: An R Package for Affinity Propagation Clustering. In: Bioinformatics, 27: pp. 2463-2464.

Cardille, J. A., and Lambois, M., 2010. From The Redwood Forest to the Gulf Stream Waters: Human Signature Nearly Ubiquitous in Representative US Landscapes. In: Frontiers in Ecology and the Environment, 8(3): pp. 130-134.

Chang, C-J., and Shyue, S-W., 2009. A Study on the Application of Data Mining to Disadvantaged Social Classes in Taiwan’s Population Census. In: Expert Systems with Applications, 36(1): pp. 510-518.

Chinni, D., and Gimpel, J., 2010. Our Patchwork Nation. New York, USA: Penguin Group Inc.

Dueck, D., and Frey, B. J., 2007. Non-Metric Affinity Propagation for Un-Supervised Image Categorization. In: Proceedings, 11th IEEE International Conference, Rio de Janeiro, Brazil, Computer Vision, pp. 1-8.

Fan, B., 2009. A Hybrid Spatial Data Clustering Method for Site Selection: The Data Driven Approach of GIS Mining. In: Expert Systems with Applications, 36(2 part II): pp. 3923-3936.

Fligstein, N., 1981. Going North: Migration of Blacks and Whites from the South, 1900-1950. New York, USA: Academic Press, Inc.

Frey, B. J., and Dueck, D., 2007. Clustering by Passing Messages Between Data Points. In: Science 315: pp. 972-76.

Furse, D. H., Punj, G. N., and Stewart, D. W., 1984. A Typology of Individual Search Strategies Among Purchasers of New Automobiles. In: Journal of Consumer Research, 10(4): pp. 417-431.

Goss, J., 1995. “We Know Who You Are and We Know Where You Live”: The Instrumental Rationality of Geodemographic Systems. In: Economic Geography, 71(2): pp. 171-198.

Page 34

REFERENCES cont.

Green, P. E., Frank, R. E., and Robinson, P. J., 1967. Cluster Analysis in Test Market Selection. In: Management Science (pre-1986), 13(8): pp. B387 (14).

Greenacre, M., 2005. Weighted Metric Multidimensional Scaling. In: Studies in Classification, Data Analysis, and Knowledge Organization: pp. 141-149.

Hanson, Sandra L., 2004. Classic Book Reviews: The Past Revived. In: Journal of Marriage and Family 62, 3: pp. 847-49.

Helsen, K., and Green, P. E., 1991. A Computational Study of Replicated Clustering with an Application to Market Segmentation. In: Decision Sciences, 22(5): pp. 1124-1141.

Karimipour, F., Delavar, M. R., and Kinaie, M., 2005. Water Quality Management Using GIS Data Mining. In: Journal of Environmental Informatics, 5(2): pp. 61-72.

Keim, D. A., Panse, C., Sips, M., and North, S. C., 2004. Pixel Based Visual Data Mining of Geo-spatial Data. In: Computers & Graphics, 28: pp. 327-344.

Kopanakis, I., and Theodoulidis, B., 2003. Visual Data Mining Modeling Techniques for the Visualization of Mining Outcomes. In: Journal of Visual Languages & Computing, 14(6): pp. 543-589.

Lê, J. S., and Husson, F., 2008. FactoMineR: An R Package for Multivariate Analysis. In: Journal of Statistical Software, 25(1): pp. 1-18.

Lynd, R. S., and Lynd, H. M., 1929. Middletown. New York, USA: Harcourt, Brace & World, Inc. pp. 3-9.

Mennis, J., and Guo, D., 2009. Spatial Data Mining and Geographic Knowledge Discovery: An Introduction. In: Computers, Environment and Urban Systems, 33(6): pp. 403-408.

Mines, R., 1981. Developing a Community Tradition of Migration: A Field Study in Rural Zacatecas, Mexico, and California Settlement Areas. In: Center for U.S.-Mexican Studies. UC San Diego.

Page 35

REFERENCES cont.

Murray, C., Kulkarni, S., Michaud, C., Tomijima, N., Bulzacchelli, M., Iandiorio, T., and Ezzati, M., 2006. Eight Americas: Investigating Mortality Disparities Across Counties, and Race-Counties in the United States. In: PLoS Medicine, 3(9): pp. 1513-1524.

Paasi, A., 2004. Place and Region: Looking Through the Prism of Scale. In: Progress in Human Geography 28(4): pp. 536-546.

Punj, G., and Stewart, D. W., 1983. Cluster Analysis in Marketing Research: Review and Suggestions for Application. In: Journal of Marketing Research, 20(2): pp. 134-148.

Rouse, R., 1991. Mexican Migration and the Social Space of Postmodernism. In: Diaspora: A Journal of Transnational Studies, 1(1): pp. 8-23.

Slocum, T., McMaster, R., Kessler, F., and Howard, H., 2009. Data Classification. In: Thematic Cartography and Geovisualization, 3rd ed., Upper Saddle River, NJ: Pearson Education Inc. pp. 57-75.

Spielman, S. E., and Thill, J-C., 2008. Social Area Analysis, Data Mining, and GIS. In: Computers, Environment and Urban Systems, 32(2): pp. 110-122.

Weiss, M. J., 1988. The Clustering of America. New York, USA: Harper & Row, Publishers.

Winkle, K., 1991. The U.S. Census as a Source in Political History. In: Social Science History, 15(4): pp. 565-57.

