Towards an Enhanced UK Spatial Interaction Data Service

Towards an Enhanced UK Spatial Interaction Data Service. Adam Dennett, Oliver Duke-Williams and John Stillwell, School of Geography, University of Leeds. Presentation for British Society for Population Studies, University of St Andrews, 11-13 September 2007.

School of Geography, Faculty of Environment

Towards an Enhanced UK Spatial Interaction Data Service

Adam Dennett, Oliver Duke-Williams and John Stillwell
School of Geography, University of Leeds

Presentation for British Society for Population Studies, University of St Andrews, 11-13 September 2007

• Introduction: relevant background on interaction data and CIDER and WICID

• Audit of Interaction Data Sources: a brief overview of the variety of interaction data sources available in the UK

• What were the recommendations of the audit?

• How do we propose to take things forward to create an enhanced UK spatial interaction data service?

• The new INTERACTION system: overview of the issues and challenges involved

• The new data: an overview of the individual characteristics of each of the new proposed datasets

• CIDER: the Centre for Interaction Data Estimation and Research

• Based now, principally, at the University of Leeds though software runs at Manchester

• Data Support Unit: part of the ESRC-funded UK Census Programme

Based at Edina at the University of Edinburgh. Provides access to digital boundary data associated with census outputs, as well as look-up tables for geographical conversion.

Based at the University of Manchester. Provides access to census aggregate outputs from 1981 to 2001 through the interface.

Based principally in the School of Geography at the University of Leeds. Provides access to interaction datasets through the interface.

SARs for small samples of households and individuals are supported by the Cathie Marsh Centre for Census and Survey Research (CCSR) based at the University of Manchester.

CeLSIUS, based at the London School of Hygiene and Tropical Medicine, provides access to the Longitudinal Study dataset, comprising linked records for 1% of the population of England and Wales from 1971.

Based at the University of St Andrews, the Scottish LS is a replica of the England and Wales LS, although it samples 5.3% of the Scottish population.

Census Registration Service (University of Essex)

Access to Census Data and the Census Data Support Units

Public access to key statistics, census area statistics and standard tables through National Statistics and NOMIS

Provide essentially the same data for 2001, although CDU gives access to data from 1981 and 1991 as well.

Introduction - CIDER

Migration

Commuting

• We currently administer interaction (flow) data from the 1981, 1991 and 2001 Censuses

• 2001 Census: Special Workplace Statistics (SWS) (Levels 1, 2 & 3)

• 2001 Census: Special Travel Statistics (STS) (Scotland Levels 1,2 & 3 and Level 2 Scottish postal sectors)

• 2001 Census: Special Migration Statistics (SMS) (Levels 1,2 & 3)

• Also comparable datasets from 1991 and 1981

• As well as the standard District, Ward and OA geographies, different aggregations of these basic units and various bespoke geographies are available for different data years

Introduction - WICID

CIDER’s objectives of relevance to this presentation:

• To gather/estimate further UK census-based data sets and include them in the system

• To expand the WICID system to incorporate a range of UK interaction data sets from outside of the census

• To undertake research based on the current and future interaction data sets held within the software system

Purpose of the Audit:

• Before adding new datasets to WICID, we needed to know what was out there!

• To identify and evaluate sources of interaction data in the UK that might complement the current census datasets held in WICID

• To make recommendations relating to the inclusion of the most useful datasets in a new, expanded version of WICID called INTERACTION

Additional data should be included in the new system from the following four sources:

• 2001 Census: the large and more complex matrices of migration and commuting flows commissioned from ONS that have national coverage at district and sub-district spatial scales

• NHSCR: annual flows, from 1975 to 1998, of NHSCR patient re-registration movements between 100 FHSA-based zones, disaggregated by age and sex; and

annual flows, from 1998/99 onwards, of NHS patient movements between HAs, disaggregated by age and sex

• HESA: annual flows, from 2001 onwards, of student movements between MLSOA of parental domicile and HEI, disaggregated by various characteristics

• NHS IC: annual flows, from 2001 onwards, of hospital patients from LLSOA or MLSOA of residence to hospital, disaggregated by various attributes

• CIDER is currently in negotiation with the custodians of these targeted data sets to see if incorporation of the data into an extended version of WICID is possible.

• All current indications are positive, but due to the differing availability and cost of particular data sets, it is likely that the acquisition and incorporation of some data will happen before others.

• Securing additional funding via the Census Development Programme should allow for the purchase of data and trial of a new improved INTERACTION data system which incorporates these new data sources.

• Overview of the issues and challenges involved with adding new non-census datasets to the new INTERACTION system.

• The new data: A more detailed look at the individual characteristics of each of the new proposed datasets.

Session Data

Meta Data

Database server: PostgreSQL 8.2.4

Web server: Apache 2.0.59, supporting PHP 5

Web client: IE, Firefox, Opera, etc…

Interaction Data

• System originally designed to handle a variety of primary (migration) data

• Metadata is key as it describes the primary data held in the database. The system relies on this metadata to recognise the range of primary data stored

• The system has very few ‘hardcoded’ assumptions about the data – it is all looked up whenever a data page on the user’s browser is produced

• Data need only have a single origin and destination identifier, with a set of fields (generally a set of counts disaggregating the flow)
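
The pairwise layout described above can be sketched as follows. This is a minimal illustration only: the table name data_example_flows, the column names and the counts are hypothetical, and Python's built-in sqlite3 module stands in for the PostgreSQL server that the system actually uses.

```python
import sqlite3

# In-memory stand-in for the database server; names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE data_example_flows (
        origin      INTEGER NOT NULL,  -- single origin identifier
        destination INTEGER NOT NULL,  -- single destination identifier
        variable_1  INTEGER,           -- counts disaggregating the flow
        variable_2  INTEGER,
        variable_3  INTEGER,
        PRIMARY KEY (origin, destination)
    )
    """
)

# Illustrative rows: one row per origin/destination pair.
rows = [
    (1, 1, 7906, 1307, 5746),
    (2, 1, 99, 16, 60),
    (3, 1, 47, 10, 26),
]
conn.executemany("INSERT INTO data_example_flows VALUES (?, ?, ?, ?, ?)", rows)


def flows_into(destination):
    """Return (origin, variable_1) pairs for every flow ending at `destination`."""
    cursor = conn.execute(
        "SELECT origin, variable_1 FROM data_example_flows "
        "WHERE destination = ? ORDER BY origin",
        (destination,),
    )
    return cursor.fetchall()


inflows = flows_into(1)
```

Because every dataset reduces to the same origin, destination and count columns, the same extraction logic can serve migration and commuting tables alike.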

DATA TABLE

2001 data: Total Migrants by Religion

Origin – England ST Ward

Destination – England ST Ward

datatype:
1. 1991 SMS set 1
5. 2001 SMS level 1
6. 2001 SMS level 2
10. 2001 SWS level 1
45. 2001 Migrants by Religion

orig_geogtype:
1. 1991 SMS counties
7. UK interaction data wards 2001
11. UK interaction data districts 2001
22. UK Standard table wards 2001
36. 1981 SMS foreign origins

dest_geogtype:
1. 1991 SMS counties
7. UK interaction data wards 2001
11. UK interaction data districts 2001
22. UK Standard table wards 2001
36. 1981 SMS foreign origins

relname:
1. data_1991sms1
7. data_2001sms3
22. data_c0648_religion

familytype:
1. commuting data
2. migration data

orig_label | origin | orig_geogtype | dest_label | destination | dest_geogtype | variable_1 | variable_2 | variable_3
00EB Hartlepool UA | 1 | 28 | 00EB Hartlepool UA | 1 | 28 | 7906 | 1307 | 5746
00EC Middlesbrough UA | 2 | 28 | 00EB Hartlepool UA | 1 | 28 | 99 | 16 | 60
00EE Redcar and Cleveland UA | 3 | 28 | 00EB Hartlepool UA | 1 | 28 | 47 | 10 | 26
00EF Stockton-on-Tees UA | 4 | 28 | 00EB Hartlepool UA | 1 | 28 | 235 | 36 | 176
00EH Darlington UA | 5 | 28 | 00EB Hartlepool UA | 1 | 28 | 36 | 7 | 23
20UB Chester-le-Street | 6 | 28 | 00EB Hartlepool UA | 1 | 28 | 20 | 4 | 13
20UD Derwentside | 7 | 28 | 00EB Hartlepool UA | 1 | 28 | 13 | 3 | 10
20UE Durham | 8 | 28 | 00EB Hartlepool UA | 1 | 28 | 25 | 6 | 16
20UF Easington | 9 | 28 | 00EB Hartlepool UA | 1 | 28 | 196 | 40 | 135
20UG Sedgefield | 10 | 28 | 00EB Hartlepool UA | 1 | 28 | 41 | 7 | 29
20UH Teesdale | 11 | 28 | 00EB Hartlepool UA | 1 | 28 | 4 | 0 | 4

Unique origin and destination identifiers which allow for swift look-up and extraction

Variables = counts of Christian, Muslim, Jewish etc…

Table in database given name, e.g. data_c0648_religion

Pairwise table – generally set up origin/destination metadata at this stage.

Rest of the metadata is set up through the web-interface

Essentially the process involves letting the system know there is a new table in the database, and the structure of this table in terms of the variables included

Once the metadata is set up, defining the structure of the table in WICID, (in terms of the type of data, the variables included, the geographies that apply), the web-based query builder can then be used to extract any data selected by the user from the database
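
A minimal sketch of how a metadata-driven query builder of this kind might work is given below. The slides do not show the real metadata schema, so the METADATA dictionary, its keys, the variable labels and the build_query function are all assumptions made for illustration; only the table name data_c0648_religion and the type labels come from the presentation.

```python
# Hypothetical metadata registry describing one table held in the database.
METADATA = {
    "data_c0648_religion": {
        "datatype": "2001 Migrants by Religion",
        "familytype": "migration data",
        # Variable labels are placeholders, not the real column meanings.
        "variables": {
            "variable_1": "count 1",
            "variable_2": "count 2",
            "variable_3": "count 3",
        },
    }
}


def build_query(relname, variables, origins, destinations):
    """Build a parameterised SELECT using only names found in the metadata."""
    meta = METADATA[relname]  # raises KeyError for an unregistered table
    unknown = [v for v in variables if v not in meta["variables"]]
    if unknown:
        raise ValueError(f"variables not described in metadata: {unknown}")
    columns = ", ".join(["origin", "destination"] + list(variables))
    origin_marks = ", ".join("?" for _ in origins)
    dest_marks = ", ".join("?" for _ in destinations)
    sql = (
        f"SELECT {columns} FROM {relname} "
        f"WHERE origin IN ({origin_marks}) AND destination IN ({dest_marks})"
    )
    return sql, list(origins) + list(destinations)


sql, params = build_query("data_c0648_religion", ["variable_1"], [1, 2], [1])
```

Nothing about a particular dataset is hardcoded in build_query: registering a new table in the metadata is enough to make it queryable, which is the property the slides attribute to the system.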

• Flexible nature of the current WICID system should allow for the addition of non-census datasets as long as the data is prepared in the required pair-wise origin, destination, variable format

Main challenges:

• Re-designing the interface to handle time-series data. Current data are discrete, cross-sectional data

• Some of the datasets (HES for example) present issues related to geographies: currently, the HES destination is a specific point, rather than an area

• Metadata redesign to clearly identify different datasets and characteristics for users

• Incorporation of ‘on-the-fly’ disclosure control routines for datasets like HESA

Current data selection layout

Possible future data selection layout

• Output complexities will need to be solved, with extra dimensions to the data output

e.g. Current: origin/destination by age by sex

Could be: origin/destination by age by sex by year
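
One way to picture the extra output dimension is a record keyed by origin, destination, age, sex and year, which can be collapsed over whichever dimensions the user does not request. The sketch below is illustrative only: the field names, the records and the aggregate function are hypothetical and are not taken from the system itself.

```python
from collections import defaultdict

# Hypothetical records: (origin, destination, age_group, sex, year, count)
records = [
    (1, 2, "16-24", "F", 2001, 10),
    (1, 2, "16-24", "M", 2001, 7),
    (1, 2, "16-24", "F", 2002, 12),
    (2, 1, "25-34", "M", 2002, 5),
]

FIELDS = ("origin", "destination", "age_group", "sex", "year")


def aggregate(records, keep):
    """Sum counts over every dimension not listed in `keep`."""
    positions = [FIELDS.index(name) for name in keep]
    totals = defaultdict(int)
    for record in records:
        key = tuple(record[p] for p in positions)
        totals[key] += record[-1]  # the count is the last element
    return dict(totals)


# Output by origin/destination by year, summed over age and sex.
by_year = aggregate(records, ("origin", "destination", "year"))
```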

• Currently, census data supplied to us has already been subjected to statistical disclosure control methods, such that small counts are suppressed before the data is put onto the system - this can affect the accuracy of query results

• Where some new datasets will be supplied in primary unit form, this offers us the opportunity to only apply statistical disclosure control where it is necessary, thus increasing data accuracy for the end user

• Different techniques will need to be trialled and evaluated before data is made widely available
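
As one illustration of disclosure control applied at query time, the sketch below suppresses small non-zero counts only after a result has been aggregated. Threshold suppression is a stand-in chosen for simplicity, not the technique CIDER has selected, and the threshold of 3 and the function name are assumptions.

```python
def suppress_small_counts(flows, threshold=3):
    """Return a copy of `flows` with non-zero counts below `threshold` set to None."""
    protected = {}
    for pair, count in flows.items():
        if 0 < count < threshold:
            protected[pair] = None  # suppressed: too small to release
        else:
            protected[pair] = count
    return protected


# Hypothetical aggregated query result keyed by (origin, destination).
query_result = {(1, 2): 17, (2, 1): 2, (3, 1): 0, (4, 1): 3}
released = suppress_small_counts(query_result)
```

Applying the rule to the aggregated result rather than to each stored cell means that large totals built from many small cells are released unaltered, which is the accuracy gain the slides describe.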

Three new non-census data sets would be included in INTERACTION:

• National Health Service Central Register (NHSCR) data from 1975 to present

• Hospital Episode Statistics (HES) data from 2001 to present

• Higher Education Statistics Agency (HESA) student data from 2001 to present

• NHSCR data will be available as a time series for a consistent set of 100 Zones based on the FHSA geography from 1975 to 1998

• Post-1998 data will be available for Health Authority areas in England and Wales and equivalent areas in Scotland and Northern Ireland

• Variables will be restricted to broad age and sex categories

Figure: Changing patterns of net migration as shown by NHSCR data, 1980-82 and 1988-90. Source: Stillwell (1994), Environment and Planning A
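
Net migration for a zone is its total inflow minus its total outflow, so it can be derived directly from an origin/destination flow matrix. The following is a minimal sketch with made-up zone names and counts, not NHSCR figures.

```python
def net_migration(flows):
    """Map each zone to inflow minus outflow, given {(origin, destination): count}."""
    net = {}
    for (origin, destination), count in flows.items():
        if origin == destination:
            continue  # within-zone moves do not change the net balance
        net[origin] = net.get(origin, 0) - count
        net[destination] = net.get(destination, 0) + count
    return net


# Hypothetical flows between three zones.
example = {
    ("A", "B"): 120,
    ("B", "A"): 80,
    ("A", "C"): 30,
    ("C", "A"): 50,
    ("A", "A"): 500,
}
balance = net_migration(example)
```

In a closed system the balances sum to zero, which gives a quick consistency check on any flow matrix.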

• We would be aiming to include HES data from 2001 until the present

• Data contains information on all in-patient episodes relating to Hospitals in England

• Origins are as detailed as Ward or SOA. Destinations are available down to Postcode Unit level

• The ‘journey to hospital’ data can be disaggregated by a huge variety of variables, including:

• Age (at end and start of hospital episode)
• Sex
• Ethnicity
• Duration of episode
• Type of episode (related to treatment given)
• Diagnosis category (International Classification of Diseases and related health problems [ICD-10] classification) – contains information on every known illness/disease/injury
• Separate classifications for maternity and mental health episodes
• Type of operation (if applicable)

Total number of in-patient visits (including repeats) made to Yeovil and Weston hospitals from England in 2000/01

• Hospital Episode Statistics provide a unique opportunity to study hospital catchment areas in relation to specific treatments and enable measurements of ‘market penetration’ – something becoming more relevant under the new NHS Patient Choice directive which allows patients more choice over where they are treated

• Spatial interaction modelling will enable analyses of the frictional effect of distance on the ‘commute’ to hospital, and the testing of ‘what if’ scenarios in relation to the opening and closing of hospitals

• Optimum locations for new hospitals or treatment centres in relation to demand could be explored through location-allocation modelling
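
The frictional effect of distance can be illustrated with an origin-constrained spatial interaction model using exponential distance decay, T_ij = A_i O_i W_j exp(-beta d_ij), where the balancing factor A_i ensures that flows from each origin sum to O_i. This is a generic textbook form rather than the model the authors intend to fit, and the zone names, attractiveness weights, distances and beta value below are all hypothetical.

```python
import math


def origin_constrained_flows(origin_totals, attractiveness, distance, beta):
    """Estimate T_ij = A_i * O_i * W_j * exp(-beta * d_ij), with sum_j T_ij = O_i."""
    flows = {}
    for i, o_i in origin_totals.items():
        # Unscaled pull of each destination j on origin i.
        weights = {
            j: w_j * math.exp(-beta * distance[(i, j)])
            for j, w_j in attractiveness.items()
        }
        total = sum(weights.values())  # 1 / A_i, the balancing factor
        for j, weight in weights.items():
            flows[(i, j)] = o_i * weight / total
    return flows


# Hypothetical example: 100 patients in zone X choosing between two hospitals.
origin_totals = {"X": 100.0}
attractiveness = {"H1": 1.0, "H2": 1.0}
distance = {("X", "H1"): 5.0, ("X", "H2"): 20.0}

with_decay = origin_constrained_flows(origin_totals, attractiveness, distance, beta=0.1)
no_decay = origin_constrained_flows(origin_totals, attractiveness, distance, beta=0.0)
```

With beta set to zero, distance has no effect and the two equally attractive hospitals share patients equally; raising beta shifts patients towards the nearer hospital. A 'what if' scenario such as a hospital closure can be explored by removing a destination and re-running the function.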

• We would be aiming to include HESA data from 2001 until the present

• Data contains information on the home address and destination of higher or further education institution

• Origins could be as detailed as MLSOA, with destinations only as accurate as the location of the HE institution attended – there is no way to ascertain exactly where the student is living

• Student migrations can be disaggregated by:

• Age group (5 years)
• Disability (disabled/not known to be disabled/not known)
• Ethnicity (white/non-white/unknown)
• Domicile (middle layer Super Output Area)
• Postcode of HEI headquarters
• Level of study (postgraduate, first degree, other undergraduate)
• Subject area
• Term-time accommodation
• Major source of tuition fees
• Mode of study (full-time/part-time)
• Gender

• Students are the section of the population most actively involved in internal migration in Britain

• Increasing numbers of students are entering into higher education, with large numbers of students becoming features of many of Britain’s major urban centres

• Students have significant social, cultural, economic and environmental impacts on the areas they live in, with issues such as ‘studentification’ becoming active topics of political debate

• Time series and cross-sectional analysis of student migration data in Britain should allow for greater understanding and prediction of student in-migration impacts

• An extensive audit of interaction data in the UK led to CIDER identifying a number of key sources that could be incorporated into an updated version of the WICID system

• New data sources would complement existing census-based interaction datasets and would move CIDER towards providing a more complete interaction data service

• A number of technical challenges will need to be overcome as we move from WICID to INTERACTION

• Easy access to new interaction data sources will provide unique opportunities for substantive research to be carried out in relation to internal migration in the UK

Thank you

Adam Dennett,
Centre for Interaction Data Estimation and Research,
School of Geography,
University of Leeds

a.r.dennett@leeds.ac.uk
http://www.geog.leeds.ac.uk/people/a.dennett/

For the full audit: http://www.geog.leeds.ac.uk/wpapers/index.html