Implementing Coding Tools for a New Classification

John Perry, UK Office for National Statistics

Operation 2007 - The players:

• In the UK: The Standard Industrial Classification of Economic Activities (SIC) (current version SIC 2003)
• In Europe: NACE, the Nomenclature générale des activités économiques dans les Communautés européennes (current version NACE Rev 1.1)
• In the UN: ISIC, the International Standard Industrial Classification of all Economic Activities (current version ISIC Rev 3.1)

The UK SIC

• is a 5 digit classification system
• is required, by EU legislation, to be identical to NACE down to and including the 4 digit Class level
• contains a national 5th digit level which does not exist in NACE

The Results – changes in structure

                          SIC 2003   SIC 2007
NACE Classes                   514        615
NACE Classes not split         414        537
UK Sub Class splits            285        191
Total Sub Classes              699        728

ACTR as an aid to coding

• ACTR – Automatic Coding by Text Recognition
• Developed by Statistics Canada
• ONS standard tool for coding, initially industry and occupation
• Replaces Precision Data Coder for industry coding
• Determines a code from a text description
• Extent of automation of process is controlled by parameters

Knowledge Bases – SIC2003

• ACTR relies heavily on indexes of standard descriptions:
  – Business descriptions from responses to the Business Register Survey
  – Published index for the SIC2003
  – The short descriptions for each SIC2003 code
  – Standard descriptions for construction industry statistics
  – Trade code descriptions for PAYE (Pay As You Earn Tax) employers
  – Farm type descriptions
• With a total of > 30,000 standard descriptions

How ACTR works

• Each input description is converted to a standard form
• This is compared with the standard forms of descriptions held in the knowledge base
• The closeness is presented as a score between 0 and 10
• The system has rules to determine whether the score is sufficient to confirm a match:
  – Requires a score of more than 7.5 to code automatically (our setting which may differ for other data sets)
  – Lower scores are passed through interactive coding
• Coding does not depend on the order in which the knowledge bases are checked
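
The routing rule above can be sketched in a few lines. This is only an illustration of the threshold logic, assuming the score has already been produced by ACTR; the function name `route` and the constant name are hypothetical and are not part of ACTR. It follows the "more than 7.5" wording, so a score of exactly 7.5 goes to interactive coding.

```python
# Threshold quoted on the slide as the ONS setting; other data sets may use another value.
AUTO_CODE_THRESHOLD = 7.5


def route(score: float, threshold: float = AUTO_CODE_THRESHOLD) -> str:
    """Decide how a description is coded, given its match score (hypothetical helper)."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("scores are expected on a 0 to 10 scale")
    # Strictly greater than the threshold codes automatically;
    # everything else is passed to clerical staff.
    return "automatic" if score > threshold else "interactive"


print(route(6.911))  # interactive
print(route(8.2))    # automatic
```

Keeping the threshold as a parameter mirrors the point that the extent of automation is controlled by settings rather than fixed in the tool.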

Extract from Business Register Survey Questionnaire

ACTR Process

• Supplied text: Horticultural services
  – Standard form: HORTICULTURAL SERVICE
• Best fit index entry: Sales and service of horticultural machinery
  – Standard form: HORTICULTURAL MACHINERY SALE SERVICE
• Score is 6.911 (out of 10)
• ACTR prefers SIC 2003 code: 51880 (Wholesale of agricultural machinery and accessories)
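
A toy sketch of the standard-form step, assuming it amounts to upper-casing, dropping filler words, removing plurals and sorting the remaining words. That assumption is inferred only from the two standard forms shown above; ACTR's real parsing rules and scoring weights are not given in these slides. The stop-word list, the plural rule and `toy_score` (a plain word-overlap measure) are hypothetical stand-ins, so the toy score comes out at about 6.67 rather than ACTR's 6.911.

```python
import re

# Hypothetical stop-word list, chosen only so the slide's example works.
STOP_WORDS = {"AND", "OF", "THE"}


def standard_form(text: str) -> str:
    """Upper-case, drop stop words, strip a plural 'S', then sort the words."""
    words = set()
    for word in re.findall(r"[A-Z]+", text.upper()):
        if word in STOP_WORDS:
            continue
        # Naive singularisation: SERVICES -> SERVICE, but BUSINESS is left alone.
        if len(word) > 3 and word.endswith("S") and not word.endswith("SS"):
            word = word[:-1]
        words.add(word)
    return " ".join(sorted(words))


def toy_score(a: str, b: str) -> float:
    """Dice word overlap scaled to 0-10. This is NOT the ACTR scoring formula."""
    words_a, words_b = set(a.split()), set(b.split())
    if not words_a or not words_b:
        return 0.0
    return 10 * 2 * len(words_a & words_b) / (len(words_a) + len(words_b))


supplied = standard_form("Horticultural services")
entry = standard_form("Sales and service of horticultural machinery")
print(supplied)  # HORTICULTURAL SERVICE
print(entry)     # HORTICULTURAL MACHINERY SALE SERVICE
print(round(toy_score(supplied, entry), 2))
```

Either score is below 7.5, so the description would go to interactive coding rather than being coded automatically.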

Interactive coding

• Scores below 7.5 are passed to clerical staff for coding interactively
• The system presents options in descending order of score
• If none of the choices appears good, staff modify the description
• Once a decision is made, the person coding confirms the choice
• The index description is then held on the IDBR (Inter-Departmental Business Register)

Introducing the SIC2007 (NACE Rev 2)

• New index files:
  – SIC2007 headings
  – SIC2007 index
• Initially code forward from the SIC2003 using bridging codes – these are codes for each knowledge base entry that link the SIC2003 and SIC2007
• Later will change to code backwards from the SIC2007
• Eventually dual coding will cease

Impact of ACTR on IDBR at Micro Level

• Existing SIC 2003 is 01120 (Growing of vegetables etc)
• The preferred ACTR SIC 2003 is 51880 (Wholesale of agricultural machinery and accessories)
• The SIC 2007 comes from the bridging code
  – SIC 2003: 51880
  – Bridging code: MTOLR
  – SIC 2007: 46610
• SIC 2003 code will change but only when agreed
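
One way to picture a bridging code is as a key that resolves to a code in each classification. The sketch below assumes a simple lookup table; how ACTR or the IDBR actually store bridging codes is not described in these slides, and the names `Bridge`, `BRIDGES` and `dual_code` are hypothetical. The single entry is the example from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bridge:
    """Codes that one bridging code resolves to (hypothetical structure)."""
    sic2003: str
    sic2007: str


# Only the example from the slide; a real table would hold one row per bridging code.
BRIDGES = {
    "MTOLR": Bridge(sic2003="51880", sic2007="46610"),
}


def dual_code(bridging_code: str) -> tuple:
    """Return (SIC 2003, SIC 2007) for a matched knowledge base entry."""
    bridge = BRIDGES[bridging_code]  # raises KeyError for an unknown bridging code
    return bridge.sic2003, bridge.sic2007


print(dual_code("MTOLR"))  # ('51880', '46610')
```

Codes are kept as strings so that leading zeros, as in 01120, are not lost.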

Conversion to SIC2007

• ACTR will deal with units that have a suitable business description
• Conversion tables will deal with:
  – Units with descriptions that ACTR is unable to code (vague descriptions)
  – Units without a description
  – Units supplied through administrative sources (existing VAT traders, PAYE employers, Registered Companies)

Creation of Conversion Tables

• Tables have been created to convert units from SIC2003 to SIC2007:
  – Using ACTR bridging codes
  – Coding existing data through ACTR
  – Producing cross-tabulation of SIC2003 to SIC2007
  – Allocating on a probability basis rounded to nearest 5%
  – Validating relationships against the acceptable range of industries
• Best fit tables also produced for users who cannot accommodate probability based conversion
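
The cross-tabulation, probability and best-fit steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the ONS procedure: the input is assumed to be a list of (SIC 2003, SIC 2007) pairs for units that ACTR coded to both, the codes "OLD1", "NEWA" and "NEWB" are made up, and all function names are hypothetical. The slides do not say how shares that no longer sum to 100% after rounding were handled, so the sketch leaves them as they fall.

```python
import math
from collections import Counter, defaultdict


def crosstab(pairs):
    """Count units for every (SIC 2003, SIC 2007) combination."""
    table = defaultdict(Counter)
    for old, new in pairs:
        table[old][new] += 1
    return table


def round_to_5pct(share: float) -> float:
    """Round a share to the nearest 0.05, rounding halves up."""
    return math.floor(share * 20 + 0.5) / 20


def probability_table(pairs):
    """For each SIC 2003 code, the rounded share going to each SIC 2007 code."""
    result = {}
    for old, counts in crosstab(pairs).items():
        total = sum(counts.values())
        result[old] = {new: round_to_5pct(n / total) for new, n in counts.items()}
    return result


def best_fit_table(pairs):
    """For each SIC 2003 code, the single most frequent SIC 2007 code."""
    return {old: counts.most_common(1)[0][0] for old, counts in crosstab(pairs).items()}


# Made-up data: 62 units move from OLD1 to NEWA and 38 from OLD1 to NEWB.
pairs = [("OLD1", "NEWA")] * 62 + [("OLD1", "NEWB")] * 38
print(probability_table(pairs))
print(best_fit_table(pairs))
```

With this made-up input the shares 62% and 38% round to 60% and 40%, and the best fit for OLD1 is NEWA.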

Coding process

Impact on the IDBR at the Macro Level

• Impact on SIC 2003 is only on those reporting units that have business descriptions for local units, where ACTR can code.
  – ACTR codes: 620,000
  – ACTR does not code: 210,000
  – No business description: 340,000
  – Administrative data only: 1,660,000
  – Total local units: 2,830,000
• SIC 2007 comes from the bridging codes only where ACTR codes – otherwise SIC 2007 comes from conversion from SIC 2003
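
A quick arithmetic check of the figures above, together with the rule for where each unit's SIC 2007 code comes from. The counts are those on the slide; the dictionary and function names are hypothetical.

```python
# Local unit counts from the slide.
LOCAL_UNITS = {
    "ACTR codes": 620_000,
    "ACTR does not code": 210_000,
    "No business description": 340_000,
    "Administrative data only": 1_660_000,
}
TOTAL_LOCAL_UNITS = 2_830_000


def sic2007_source(category: str) -> str:
    """Bridging codes apply only where ACTR codes; all other units are converted."""
    if category not in LOCAL_UNITS:
        raise KeyError(category)
    return "bridging code" if category == "ACTR codes" else "conversion from SIC 2003"


def share(category: str) -> float:
    """Share of all local units that fall in the given category."""
    return LOCAL_UNITS[category] / TOTAL_LOCAL_UNITS


print(sum(LOCAL_UNITS.values()) == TOTAL_LOCAL_UNITS)  # True
print(f"{share('ACTR codes'):.1%}")  # 21.9%
```

On these figures roughly one local unit in five receives its SIC 2007 code through a bridging code; the rest depend on the conversion tables.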

SIC 2003

A AGRICULTURE, HUNTING AND FORESTRY

B FISHING

C MINING AND QUARRYING

D MANUFACTURING

E ELECTRICITY, GAS AND WATER SUPPLY

F CONSTRUCTION

G WHOLESALE AND RETAIL TRADE; REPAIR OF MOTOR VEHICLES

H HOTELS AND RESTAURANTS

I TRANSPORT, STORAGE AND COMMUNICATION

J FINANCIAL INTERMEDIATION

K REAL ESTATE, RENTING AND BUSINESS ACTIVITIES

L PUBLIC ADMINISTRATION AND DEFENCE; COMPULSORY SOCIAL

M EDUCATION

N HEALTH AND SOCIAL WORK

O OTHER COMMUNITY, SOCIAL AND PERSONAL SERVICE ACTIVITIES

P PRIVATE HOUSEHOLDS EMPLOYING STAFF AND UNDIFFERENTIATED

Q EXTRA-TERRITORIAL ORGANISATION AND BODIES

Impact at SIC 2003 broad industry level (provisional counts)

Section       Starting stock      In     Out   Net Change
A & B                167,000    0.5%    0.6%        -0.1%
C, D and E           180,000    5.9%    5.2%        +0.7%
F                    260,000    1.4%    0.9%        +0.5%
G                    530,000    2.4%    2.5%        -0.1%
H                    188,000    2.3%    1.6%        +0.7%
I                    116,000    2.7%    2.4%        +0.3%
J                     58,000    6.5%    3.3%        +3.2%
K                    872,000    1.2%    1.3%        -0.1%
L                     29,000   10.4%   11.1%        -0.7%
M, N and O           432,000    2.9%    3.8%        -0.9%

SIC 2007

A Agriculture, Forestry And Fishing

B Mining And Quarrying

C Manufacture

D Electricity, Gas, Steam And Air Conditioning Supply

E Water Supply; Sewage, Waste Management And Remediation Activities

F Construction

G Wholesale And Retail Trade; Repair Of Motor Vehicles And Motorcycles

H Transportation And Storage

I Accommodation And Food Service Activities

J Information And Communication

K Financial And Insurance Activities

L Real Estate Activities

M Professional, Scientific And Technical Activities

N Administrative And Support Service Activities

O Public Administration And Defence; Compulsory Social Security

P Education

Q Human Health And Social Work Activities

R Arts, Entertainment And Recreation

S Other Service Activities

T Activities Of Households

U Activities Of Extraterritorial Organisations And Bodies

Correspondence between SIC 2003 and SIC 2007 for local units coded by ACTR

Rows are SIC 2007 sections; columns are SIC 2003 sections.

SIC2007       A      B      C      D      E      F      G      H      I      J      K      L      M      N      O      Q
A          2147    473      .      2      .      .      .      .      .      .      .      .      .      .      .      .
B             .      .    684      .      .      .      .      .      .      .      .      .      .      .      .      .
C             2      .     18  29525      .      .      2      .      .      .     23      .      .      .      .      .
D             .      .      .      .    807      .      .      .      .      .      .      .      .      .      .      .
E             .      .      .    203    856      .      .      .      .      .      .      .      .      .   2557      .
F             .      .      .     48      .  27719      .      .      .      .   3046      .      .      .      .      .
G             .      .      .      .      .      . 157840      .      .      .      .      .      .      .      .      .
H             .      .      .      .      .      .    180      .  20437      .      .      .      .      .      .      .
I             .      .      .      .      .      .      .  55729      .      .      .      .      .      .      .      .
J             .      .      .   1819      .      .      .      .   2911      .   8264      .      .      .   1499      .
K             .      .      .      .      .      .      .      .      .  28198    855      .      .      .      .      .
L             .      .      .      .      .      .      .      .      .      .  18285      .      .      .      .      .
M             .      .      .      .      .      .      .      .      .      .  34010      .      .   2440      4      .
N           851      .      .      .      .      6      .      .   6142      .  44819      .      .      .    111      .
O             .      .      .      .      .      .      .      .      .      .      .  19552      .      .      .      .
P             .      .      .      .      .      .      .      .      1      .     23      .  34575      .    389      .
Q             .      .      .      .      .      .      .      .      .      .      .      .      .  62873      .      .
R             .      .      .      .      .      .      .      .      .      .      .      .      .      .  24662      .
S             .      .      .    138      .      .   1376      .      .      .    320      .      .      .  25549      .
U             .      .      .      .      .      .      .      .      .      .      .      .      .      .      .      8

Implementation timetable

December 2006    NACE published
January 2007     SIC 2007 is published on NS website
February 2007    Development and tuning of data coder (ACTR) – first release on 2007 basis, subject to revision
June 2007        Re-coding using ACTR
August 2007      New release of ACTR, using SIC 2007 index
November 2007    SIC 2007 Index published (consistent with ACTR August 2007)
January 2008     SIC 2007 fully implemented on the Register
2008 ????        ACTR SIC 2003 overwrites historic SIC 2003

Conclusions

• The ACTR tool delivers considerable savings in terms of cost and burden on businesses compared to traditional survey approaches.
• The knowledge base is portable (i.e. independent of the coding engine), so it can be shared with any interested parties, e.g. administrative data suppliers, to increase the consistency of coding.
• The use of bridging codes permits simultaneous coding to multiple classification systems, essential if periods of dual-coding are required.
• The knowledge base approach can help to inform the development of future versions of a classification, by providing a reference frame of business activity descriptions.

