Spatial Analysis of Surnames in Great Britain

James Cheshire
Department of Geography and CASA, UCL

jamescheshire.co.uk

“It may be thought by some that the investigation of the distribution of names is an idle amusement, productive of no utility of man. I have come to think, however...that it is a matter of much importance to the antiquarian, the historian, the ethnologist and also to the more practical politician” Henry Guppy, 1890.

Outline
- Surnames in Great Britain*.
- Surnames and Geography.
- Research aims.
- Surnames and Genetics.
- Unearthing Great Britain’s surname regions.
- Effects of scale.
- 2 interesting examples.
- Surname regions in Great Britain?
- Future research.

* I will be talking about every surname registered in Great Britain. The majority would have originated in Britain; these remain the dominant driver of surname regions and will therefore be the focus of the contextual information that follows. When I refer to Great British surnames I am, however, referring to those registered, not necessarily those originating in Britain.

What are surnames in Great Britain?
- In 1066 the Normans “brought with them a new, upper class fashion for surnames” (Miles, 2005).
- Main purpose was to clarify the right to ownership of land.
- Indicated family place of origin in France or land acquired in England.

What are surnames in Great Britain?

It took around 300 years for surnames to be widely adopted, with people taking naming inspiration from every aspect of their lives:

Category                               Example          Explanation

Occupational (Metonyms)
  Profession                           Smith            Blacksmith/metal worker
  Office/Trade                         Reeve            Chief magistrate/overseer
  Rank/Status                          Knight           A knighted person
  Occupation features                  Falconer         One who kept/trained falcons

Local Surnames (50% of surnames)
  Toponymic (from landscape)           Rivers           Dweller near river
  Toponymic (from village/region)      Cornwall         Man from Cornwall
  Habitation (residence)               Gate             Habitation at/near a gate
  Habitation (work)                    Hall             A worker at the hall

Surnames of Relationship
  From personal name (patronymic)      Johnson/Jones    Son of John
  From personal name (metronymic)      Margaretson      Son of Margaret
  Personal name from other relative    Also: Johnson    Related to John
  Personal name from diminutive        Dickens          Son of Dick (Richard)
  Clan or tribal names                 MacBain          Related to the MacBain clan

Nicknames
  From animals                         Fox              Slyness or other attributes
  From characteristic traits           Careless         Free from care/responsibility
  From objects                         Shorthose        Someone who wore short boots
  From physical features               Little           A small person
  From times and seasons               Pasque           Person born at Easter
  From iconic description              Drinkwater       Heavy drinker

- With greater recording of the population (starting with the Domesday Book of 1086) surnames became patronymic (inherited from the father).

- They became fixed to a family lineage rather than location and could move with their “owners”.

- Many names originated in only one place/ region due to different conventions throughout Britain.

Surnames and Geography

Is it the case that these places of origin remain the areas of highest concentration for these names?

...or over the 1000 years since surnames arrived in Britain has population movement (including international migrants) caused spatial mixing of surnames?

Surnames and geography - some examples: individual surnames

[Maps: distributions of the surnames Lewis, Smith, Macleod and Buckley]

Surnames and geography - some examples: groups of surnames

[Maps: genitival “s” names; patronymic/metronymic names. Source: Schurer, K. 2004]

Surnames and Geography

- The surnames of Britain appear to exhibit a clear geography.

- This presents an interesting regionalisation problem.

- It also has broader cultural significance.

Aims

- Aggregate the multiple surname distributions to establish broad regional variations.

- Undertake the first study of this kind on two “complete” population registers.

- Establish the extent to which the derived regions are genetic/ cultural.

- Develop a methodological framework for the future spatial analysis of names.

- Demonstrate the inherently spatial nature of surnames and their utility as a resource.

Data

1881 Census
- 29 million people
- 425,793 surnames
- 345,781 with <10 occurrences
- Principal level of geography: 657 Registration Districts

2001 Enhanced Electoral Roll
- 45.6 million people
- 1,597,805 surnames
- 1,457,681 with <10 occurrences
- Principal level of geography: 410 Districts* (excl. N. Ireland)
- Additional analysis on: approx. 10,650 Wards (inc. N. Ireland)

* In the analysis the 32 London Boroughs have been aggregated to a single district (leaving 379 districts) as their high dissimilarity in comparison with the rest of Britain and each other was distorting the results of the regionalisation.

1881 Surname Frequencies (top 500 names)

2001 Surname Frequencies (top 500 names)

Genes and Surnames

- If surnames are inherited then they behave much like a genetic attribute.

- Obviously only really works for men (unless women keep their maiden names).

Genes and Surnames

King and Jobling, 2009

Genes and Surnames

- Previous diagram does not account for geography (the surname possessors could live anywhere).

- The fact that many surnames have stayed concentrated around their point of origin suggests that the groups of people possessing them haven't moved much. They are therefore even more likely to be related.

Isonymy (“same name”)

- This concept forms the basis of this analysis.

- George Darwin (son of Charles) was interested in isonymous marriages.
- His perspective was a genetic one. He wanted to quantify the effects of inbreeding between cousins*.

* His father and mother were cousins so he had a vested interest!

Coefficient of Isonymy

“The probability of members of two populations or subpopulations having genes in common by descent as estimated from sharing the same surnames” (Lasker, 1985:142).

where Si1 is the number of occurrences of the ith surname in a sample from Area 1 and Si2 is the number of occurrences of the same surname in a sample from Area 2.

The resulting values can be considered as the proportional correspondence, in terms of a shared surname pool, between a particular place and all others in the country.
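The formula itself is not reproduced in this text. As a minimal sketch, the Python below assumes the form usually attributed to Lasker, the sum over surnames of Si1 × Si2 divided by twice the product of the two sample sizes. The function name and the toy surname lists are hypothetical and are not taken from the registers used in the talk.

```python
from collections import Counter


def isonymy_coefficient(area1, area2):
    """Coefficient of isonymy between two lists of surname records.

    Assumed form (not confirmed by the slides):
        sum_i(S_i1 * S_i2) / (2 * sum_i(S_i1) * sum_i(S_i2))
    where S_i1 and S_i2 are the counts of surname i in each area.
    """
    counts1, counts2 = Counter(area1), Counter(area2)
    n1, n2 = sum(counts1.values()), sum(counts2.values())
    if n1 == 0 or n2 == 0:
        raise ValueError("both areas need at least one surname record")
    # Only surnames present in both areas contribute to the numerator.
    shared = sum(counts1[name] * counts2[name]
                 for name in counts1.keys() & counts2.keys())
    return shared / (2 * n1 * n2)


# Hypothetical toy data: one list entry per person.
area1 = ["Smith", "Smith", "Jones", "Evans"]
area2 = ["Smith", "Jones", "Jones", "Macleod"]
print(isonymy_coefficient(area1, area2))
```

Two areas with no surnames in common score zero, and the value rises as the two surname pools overlap more heavily.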

Coefficient of Isonymy

[Figure: three example districts, A, B and C, each containing a set of surnames drawn from: Cheshire, Mateos, Singleton, Longley, O’Brien, Adnan, Lewis, Smith, Dormandy, Evans, Pope, Rohde, Penny, Buckle, Richards, Whitfield, Johns, Dolan]

Take Cheshire from A, probability of removing Cheshire from:
- B = 1 in 17
- C = 1 in 17

Take Mateos from B, probability of removing Mateos from:
- A = 2 in 20
- C = 1 in 17

Take Johns from C, probability of removing Johns from:
- A = 0
- B = 1 in 17

Repeat this process for each name and sum the probabilities for each comparison...

[Figure: example districts A, B and C and the surnames they contain]

Coefficient of isonymy between districts A, B and C:

        A                         B                         C
A       1                         1/17 + 1/10 + 0 = 0.16    1/17 + 0 + 0 = 0.06
B       1/17 + 1/10 + 0 = 0.16    1                         1/17 + 1/17 = 0.12
C       1/17 + 0 + 0 = 0.06       1/17 + 1/17 = 0.12        1

Lasker Distance

where L is the Lasker distance and i and j are two separate populations

- This takes the Coefficient of Isonymy values and does the following:
  - Turns them from very small numbers to larger ones.
  - Inverts them so that smaller values represent greater similarity (rather than greater difference).
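The equation for L is not reproduced in this text. A transformation with both properties listed above is the negative natural logarithm of the coefficient, which is how the Lasker distance is commonly written. The sketch below assumes that form; conventions differ on whether the coefficient is first multiplied by two, so treat the scaling as an assumption. The input fractions are those of the A, B, C example.

```python
import math


def lasker_distance(isonymy):
    """Negative natural log of an isonymy coefficient (assumed form)."""
    if isonymy == 0:
        return math.inf  # no shared surnames: infinitely far apart
    if not 0 < isonymy <= 1:
        raise ValueError("isonymy coefficient must lie in [0, 1]")
    return -math.log(isonymy)


# Pairwise coefficients from the three-district example.
coefficients = {
    ("A", "B"): 1 / 17 + 1 / 10,
    ("A", "C"): 1 / 17,
    ("B", "C"): 1 / 17 + 1 / 17,
}

distances = {pair: lasker_distance(value) for pair, value in coefficients.items()}

# Most similar pair first (smallest distance).
for pair, distance in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(distance, 3))
```

A coefficient of 1 (a district compared with itself) maps to a distance of 0, and the pair with the largest coefficient ends up with the smallest distance.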

Lasker Distance Matrices

2001 Matrix

        95Z       99ZZ      OOLN
00BL    7.520982  7.336616  7.219516
00BM    7.428889  7.315671  7.425037
00BN    7.347616  7.356772  7.394888
00BP    7.452982  7.299915  7.330886
00BQ    7.410027  7.300150  7.387787

1881 Matrix

              Yarmouth  Yeovil    York
Aberayron     6.389540  6.289929  6.438361
Aberdeen      6.356152  7.019357  6.213222
Abergavenny   6.412893  6.361753  6.566717
Aberystwith   6.327093  6.319481  6.467985
Abingdon      6.353814  6.559106  6.621873

Can be thought of as placing the districts in “surname space”.

Analysing the Lasker Distance Matrix

The purpose is to group/ split the data by surname similarity.

- Clustering
- Multidimensional Scaling

[Diagram: matrix with rows and columns labelled “District i or j” and cells holding Lasker’s Distance]

Clustering: K-Means

- The K-means algorithm randomly allocates a set of k seeds within the data matrix and then allocates all data points to their nearest seed.

- A new mean cluster centroid is then calculated for each cluster, and a new partitioning of the data points is made based on the new nearest centroid.

- Centroids are then recalculated for the new clusters, and the algorithm repeats these steps until no more switching takes place.
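The three steps above can be sketched in plain Python. This is an illustration, not the implementation used for the maps: it assumes each district is a row of Lasker distances treated as a point in “surname space”, it draws the k seeds from the data points themselves, and the six toy rows are hypothetical.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, seed=0, max_iter=100):
    """Cluster points by repeated assign-to-nearest and recompute-centroid steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: seeds are k of the data points
    labels = None
    for _ in range(max_iter):
        # Allocate every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no more switching: stop
            break
        labels = new_labels
        # Recalculate each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean_point(members)
    return labels, centroids


# Hypothetical rows of a distance matrix: two loose groups of districts.
points = [(6.30, 6.40), (6.35, 6.45), (6.32, 6.41),
          (7.40, 7.30), (7.45, 7.35), (7.42, 7.31)]
labels, centroids = k_means(points, k=2)
print(labels)
```

Because the seeds are random, the cluster numbers themselves are arbitrary; only the grouping is meaningful, and on real data different seeds can give different groupings.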

K-Means Clustering (K=15) 1881

K-Means Clustering (K=15) 2001

Clustering: Ward’s Hierarchical Clustering

- Considers union of every cluster pair.

- The two clusters with the minimum increase in ‘information loss’ are combined.

- Information loss is defined by Ward in terms of an error sum-of-squares criterion.
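A naive sketch of this merging rule in plain Python follows. It relies on the standard identity that merging clusters a and b raises the error sum of squares by |a||b| / (|a| + |b|) times the squared distance between their centroids. The function names and toy points are hypothetical, and the exhaustive pair search shown here would be too slow for hundreds of districts.

```python
def centroid(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def ward_cost(a, b):
    """Increase in the error sum of squares if clusters a and b were merged."""
    ca, cb = centroid(a), centroid(b)
    gap = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * gap


def ward_clustering(points, k):
    """Merge the cheapest pair of clusters until only k clusters remain."""
    clusters = [[p] for p in points]  # start with every point on its own
    while len(clusters) > k:
        # Consider the union of every cluster pair and keep the cheapest.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical rows of a distance matrix: two loose groups of districts.
points = [(6.30, 6.40), (6.35, 6.45), (6.32, 6.41),
          (7.40, 7.30), (7.45, 7.35), (7.42, 7.31)]
clusters = ward_clustering(points, k=2)
print(clusters)
```

Stopping at k clusters corresponds to cutting the hierarchy at one level; recording each merge instead would give the full tree.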

Ward’s Hierarchical Clustering: 1881 and 2001

Ward’s Hierarchical Clustering (K=15): 1881 and 2001

Multidimensional Scaling: 1881 and 2001

Summary
- There is undoubtedly a regionalisation to Great British surnames.
- The underlying causes appear to be cultural rather than explicitly environmental: i.e. surname dissemination does not appear to be related to topographic barriers.
- The Scotland/England transition is a lot more discrete than the Wales/England transition.

- To what extent is this patterning an artefact of the spatial units used in the Lasker Distance calculations...?

Higher Resolution Analysis
- Does calculating the Lasker Distance between smaller areas create a different picture of the surname regions in Britain?
- Is small scale variation sufficient to mask broader trends/effects?
- These questions are explored with 2001 CAS Wards.
- Some considerations:
  - Data size: at Ward level the Lasker Distance calculation involves 1,597,805 × 10,500 × 10,500 cells of data.
  - Small numbers problem.
  - Key advantage is the reduced influence of London (accounts for only 6% of the units of analysis instead of 13%). It can therefore be included in the cluster analysis.
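To put the data-size consideration in concrete terms, the short calculation below multiplies out the figures quoted above. The 8 bytes per cell is an assumption (one 64-bit value per cell) used only to illustrate the scale.

```python
SURNAMES = 1_597_805  # distinct surnames in the 2001 register (from the slide)
WARDS = 10_500        # approximate number of wards (from the slide)
BYTES_PER_CELL = 8    # assumption: one 64-bit value per cell

cells = SURNAMES * WARDS * WARDS
terabytes = cells * BYTES_PER_CELL / 1e12
print(f"{cells:,} cells, roughly {terabytes:,.0f} TB if stored densely")
```

Most surnames are absent from most wards, so the large majority of those cells would be zero.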

Higher Resolution Analysis: CAS Wards

Corby

[Figure: Corby in the 1881 and 2001 results for MDS, Ward’s and K-Means]

- In 1932 Stewarts and Lloyds built a new iron and steel works in Corby.

- Labour sourced from closing Scottish steelworks, mainly in Lanarkshire.

- Into the 1970s, 50% of the incoming population was Scottish.

- Transformed population from 1,500 to 34,000.

- Annual Highland Games.


Danelaw

[Maps: Danelaw; 1881, 2001 and 2001 Ward level]

“It might appear...that the family of nomenclature of Englishmen was for the most part in a confused jumble, and that on account of the rapid means of inter-communication, which we enjoy in the present Century, most of the distinctions that existed in the past would have been lost in the whirl and bustle of the industrial era in which we live. It might have seemed...that chance had played such a part in the intermingling of inhabitants of different counties and districts, that it would seem a hopeless task to unravel the entangled skein...I found it was yet possible to pick up the threads. By this means I have found order where I expected disorder and method where I only looked for chance.” Henry Guppy, 1890.

Surname Regions in Great Britain?

Surname Regions in Britain?

- Multiple levels from broad contiguous regions to small areas of intra-region similarities.
- Each level representing a different slice through time?
- Likely to reflect areas of genetic and cultural similarities/difference.

Spatial Analysis of Surnames

[Diagram with three headings: Methods, Applications, Augmentation. Items shown: Clustering; Visualisation; Surname Sampling; Surface Analysis; Geodemographics; Genetic Characteristics; Functional/Uniform Regions?; Population Sampling; Geo-Genealogy; Hypothesis generation; Migration flows; Temporal Analysis; A Population Geology of the UK?]

Effective Population Sampling
- Using surname regions to inform sample design for regions of Britain:
  - For example, there is little point in sampling a person from Corby if you wish to genetically characterise the Northamptonshire population.
  - Equally, the Corby population may have unrepresentative views on, say, Scottish devolution.
- Do the sub-regional groups show more allegiance to each other than the broader regions they fall within?

Conclusions

- Surname regions exist in contemporary Britain.
- To a remarkable degree they remain unchanged from their conception nearly 1000 years ago.
- Unearthing these regions by establishing a clear methodological framework and utilising complete population registers provides a firm basis for future research.

Date post:	30-Dec-2015