
India Census Data Processing

Date post: 02-Nov-2014
Upload: diwakergupta
Transcript
Page 1: India Census Data Processing

DATA CAPTURE IN CENSUS OF INDIA

Registrar General & Census Commissioner, India

Visit Our Website at www.censusindia.gov.in

Page 2: India Census Data Processing

FEATURES OF INDIAN CENSUS

• India is a large country with a population of more than a billion; its Census is therefore one of the world's largest administrative and statistical exercises.

• Diversity in languages: schedules are filled in 16 languages.

• 2 million enumerators were deployed in the 2001 Census, a number likely to increase further in the 2011 Census.

Page 3: India Census Data Processing

FEATURES OF INDIAN CENSUS (Contd..)

• The Census, conducted using the ‘canvasser’ method, is carried out in two phases:
– House-listing
– Population Enumeration

• The Census Organization has experimented with new IT innovations since the beginning.

• Technology is required particularly for data capture/processing, mainly because of the large volume and to speed up the tabulation and release of Census results.

Page 4: India Census Data Processing

MODE FOR DATA CAPTURE & PROCESSING SINCE 1961

Census         1961          1971          1981          1991          2001
Population     439 million   548 million   683 million   846 million   1,028 million
Collection %   100           100           100           100           100
Capture %      5             15            25            45            100
Mode           Hand Punch    Key Punch     Data Entry    Data Entry    Scanning/ICR
Time taken     8-9 years     8-9 years     8-9 years     7-8 years     3-5 years

Page 5: India Census Data Processing

DATA CAPTURE & PROCESSING IN 2001 CENSUS

Important Considerations

• Conventional data entry was not suitable for the large volume of data (228 million schedules for a population of 1,028 million).

• Availability of advanced IT tools and techniques.

• The need to capture and process all the collected information.

• Complexities in data entry due to the multiplicity of languages/responses and the size (A3) of the Census schedule.

Page 6: India Census Data Processing

DATA CAPTURE & PROCESSING IN 2001 CENSUS

Important Considerations (Contd..)

• Retrieving original documents for correction is labor-intensive.

• Reduce the time span from 5-8 years to 3-5 years.

• A compact, reliable and efficient archival system.

• Better workflow management.

Page 7: India Census Data Processing

DATA CAPTURE & PROCESSING IN 2001 CENSUS

Selection and Consequent Action

• Evaluation of various available technologies (OMR/OCR/ICR):
– Trial run with NCS and DRS OMR.
– Trial runs with various ICR vendors.

• Opted for ICR technology (TIS eFlow).

• IT infrastructure in all 15 Data Centers upgraded to meet the new requirement.

Page 8: India Census Data Processing

DATA CAPTURE & PROCESSING IN 2001 CENSUS

Model Conceived for Implementation

• The services of a System Integrator (SI) were hired to guide and assist in the implementation of ICR technology.

• A unique model for outsourcing:
– SI to work on our premises for better communication and control, and to maintain data security, safety and confidentiality
– Capacity building (training and guiding IT staff)
– Production-linked payment to the SI

Page 9: India Census Data Processing

DATA CAPTURE & PROCESSING IN 2001 CENSUS

Workflow at ORGI (Office of the Registrar General, India): characteristics of TIS eFlow

Designs the data capture workflow

Presents a graphical view of the system

Monitors the processing and workflow in real time

Enables customization of applications and addition of custom features

Page 10: India Census Data Processing

DATA CAPTURE & PROCESSING IN 2001 CENSUS

Workflow Modules

Scan Portal, File Portal, Controller

FormID, Manual FormID, RC

Processing [OCR/ICR]

Tile, Completion, CAC & Exception

Export

Page 11: India Census Data Processing

DATA CAPTURE & PROCESSING IN 2001 CENSUS

ORGI Workflow Stages

Prepare Batch → Scanning → Recognition → Tiling → Completion → Exception → Export/Archival → ASCII FILE → Server

Page 12: India Census Data Processing

DATA CAPTURE & PROCESSING IN 2001 CENSUS
LAN SETUP - ORGI DATA CENTERS

Stations on the LAN: Server, Controller station, Scanning station, Recognition stations, Tiling & Completion stations, Exception stations, Export station

• Forms are fed through the SCANNER(S) batch by batch.
• Character images are automatically RECOGNISED field by field.
• Tile/Correction stations: unrecognised characters are corrected by OPERATORS.
• Supervisors handle exceptional cases referred by operators.
• Supervisors export completed batches as ASCII files for further processing.
• Supervisors monitor the workflow and balance the load at the different stages of operation.
• Form IMAGES are stored on the network DISK.
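Load balancing across stages was left to supervisors here, and a later Difficulties Experienced slide notes the absence of powerful tools for doing it online. Purely as an illustration, the Python sketch below assumes a hypothetical snapshot of each stage (batches waiting, operators assigned) and greedily gives each spare operator to the stage with the most waiting batches per operator. `StageLoad` and `assign_spare_operators` are invented names; this is one simple policy, not the eFlow Controller's logic.

```python
from dataclasses import dataclass


@dataclass
class StageLoad:
    """Hypothetical snapshot of one workflow stage on a data-centre LAN."""
    name: str
    backlog: int    # batches waiting at this stage
    operators: int  # operators currently assigned to it


def assign_spare_operators(stages: list[StageLoad], spare: int) -> dict[str, int]:
    """Greedily give each spare operator to the stage with the most waiting
    batches per assigned operator; returns how many were added per stage."""
    added = {s.name: 0 for s in stages}
    for _ in range(spare):
        # max(..., 1) keeps an unstaffed stage from dividing by zero
        busiest = max(stages, key=lambda s: s.backlog / max(s.operators, 1))
        busiest.operators += 1
        added[busiest.name] += 1
    return added


if __name__ == "__main__":
    demo = [StageLoad("Tile", 12, 2), StageLoad("Completion", 40, 4), StageLoad("Exception", 7, 1)]
    print(assign_spare_operators(demo, 3))
```

For example, with backlogs of 12, 40 and 7 batches at Tile, Completion and Exception staffed by 2, 4 and 1 operators, three spare operators would go two to Completion and one to Exception. A real controller would also weigh each stage's throughput, not only its queue length.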

Page 13: India Census Data Processing

DATA CAPTURE & PROCESSING IN 2001 CENSUS

eFlow customization

• Customization of the scanning software for batching the images

• Optimization of batch size for network movement of images and data

• Customization of workflow management to reduce the workload on the Manual Identification station
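Choosing a batch size trades fixed per-batch costs (network movement, registration, handling) against rework, since a later slide notes that even a single unrecognised image forced the whole batch to be redone. A rough sketch of that trade-off, under an entirely hypothetical model: a fixed overhead per batch, a constant time per image, independent image failures with probability `p_fail`, and a failed batch retried in full with the same odds. The function names and all figures are illustrative assumptions.

```python
def expected_seconds_per_image(n: int, overhead: float, per_image: float, p_fail: float) -> float:
    """Expected handling time per image when images travel in batches of n.

    Hypothetical model: a batch costs `overhead` seconds plus `per_image`
    seconds per image; each image fails independently with probability
    `p_fail`, and any failure means the whole batch is redone."""
    if not 0.0 <= p_fail < 1.0:
        raise ValueError("p_fail must be in [0, 1)")
    p_batch_ok = (1.0 - p_fail) ** n
    expected_attempts = 1.0 / p_batch_ok  # mean of a geometric distribution
    return (overhead + n * per_image) * expected_attempts / n


def best_batch_size(max_n: int, overhead: float, per_image: float, p_fail: float) -> int:
    """Batch size in 1..max_n with the lowest expected time per image."""
    return min(range(1, max_n + 1),
               key=lambda n: expected_seconds_per_image(n, overhead, per_image, p_fail))


if __name__ == "__main__":
    print(best_batch_size(500, overhead=30.0, per_image=1.0, p_fail=0.01))
```

With 30 s of overhead per batch, 1 s per image and a 1% failure rate, the expected cost per image bottoms out at around 40 images per batch; with no failures the largest batch always wins, so in this model the best batch size depends strongly on the failure rate.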

Page 14: India Census Data Processing

DATA CAPTURE & PROCESSING IN 2001 CENSUS

eFlow customization (Contd..)

• Development of new Management Information tools for operators, daily production status, etc.

• Creation of JUSTICR.mdb to recognize the writing patterns of Indian enumerators

• Creation and implementation of various static and dynamic dictionaries for CAC
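A static dictionary holds the official code list, while a dynamic dictionary can grow from operators' decisions so that a spelling coded once is coded automatically next time. The sketch below assumes this division of labour; the class name, the fuzzy matching through Python's standard `difflib.get_close_matches`, and the codes in the example are all hypothetical, not the actual CAC package.

```python
import difflib


class CodingDictionary:
    """Hypothetical computer-assisted coding helper: a static code list plus a
    dynamic dictionary learned from operators' confirmed choices."""

    def __init__(self, static: dict[str, str]):
        self.static = {text.upper(): code for text, code in static.items()}
        self.dynamic: dict[str, str] = {}  # spelling variant -> confirmed code

    def suggest(self, response: str, n: int = 3) -> list[tuple[str, str]]:
        """Candidate (entry, code) pairs for a keyed-in textual response."""
        key = response.strip().upper()
        if key in self.dynamic:  # learned from an earlier confirmation
            return [(key, self.dynamic[key])]
        if key in self.static:   # exact match with the official list
            return [(key, self.static[key])]
        close = difflib.get_close_matches(key, list(self.static), n=n, cutoff=0.6)
        return [(entry, self.static[entry]) for entry in close]

    def confirm(self, response: str, code: str) -> None:
        """Remember the operator's choice so this spelling is auto-coded next time."""
        self.dynamic[response.strip().upper()] = code


if __name__ == "__main__":
    cac = CodingDictionary({"Hindi": "L01", "Bengali": "L02", "Marathi": "L03"})
    print(cac.suggest("Hindee"))
```

Here a misspelt "Hindee" is offered the closest static entry; once the operator confirms it, the variant goes straight to its code on later schedules.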

Page 15: India Census Data Processing

DATA CAPTURE & PROCESSING IN 2001 CENSUS

Results Achieved

• For the first time, 100% of the data was captured, processed and released within five years of the Census

• Auto-recognition rate of 90% with false positives below 2%

• Considerable financial saving

• Assimilation of IT skills internally in the organisation.
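The 90% recognition rate and under-2% false-positive figures depend on how each rate is counted, which the slides do not spell out. The sketch below assumes one common reading: the recognition rate is the share of characters accepted without operator review, and the false-positive rate is the share of those auto-accepted characters that later proved wrong. Both definitions and all names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class RecognitionCounts:
    total: int           # characters presented to the ICR engines
    auto_accepted: int   # characters accepted without any operator review
    accepted_wrong: int  # auto-accepted characters later found to be wrong


def recognition_rate(c: RecognitionCounts) -> float:
    return c.auto_accepted / c.total if c.total else 0.0


def false_positive_rate(c: RecognitionCounts) -> float:
    # Assumed definition: errors among the characters the system accepted on its own.
    return c.accepted_wrong / c.auto_accepted if c.auto_accepted else 0.0


def meets_target(c: RecognitionCounts, min_rate: float = 0.90, max_fp: float = 0.02) -> bool:
    """True when the 90% recognition / below-2% false-positive target is met."""
    return recognition_rate(c) >= min_rate and false_positive_rate(c) < max_fp
```

Under these definitions, 9,100 auto-accepted characters out of 10,000, with 150 later found wrong, give a 91% recognition rate and about 1.6% false positives, meeting the target.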

Page 16: India Census Data Processing

DATA CAPTURE & PROCESSING IN 2001 CENSUS

Results Achieved (Contd..)

• Manual coding was replaced by Computer Assisted Coding (CAC) for:

Scheduled Caste/Scheduled Tribe, languages spoken, education level, migration particulars, NIC and NCO

• Indigenous data capture for other projects:

Economic Census, Sample Registration System, Verbal Autopsy

Page 17: India Census Data Processing

DATA CAPTURE & PROCESSING IN 2001 CENSUS

Difficulties Experienced

• Unable to use color drop-out at the scanning stage

• Difficult to handle bad images during the scanning stages

• Bad/back images due to variation in paper/print quality

• Overwriting and use of whitener; grid lines recognized as '1'

• Limitations in recognizing Indian languages affected the throughput

Page 18: India Census Data Processing

DATA CAPTURE & PROCESSING IN 2001 CENSUS

Difficulties Experienced (Contd..)

• Operational constraints in Manual Identification

• No powerful tools for online load balancing among the various stages of eFlow

• Lack of concurrent quality checks at each stage of eFlow

• Lack of auto-coding features for textual responses

• Non-recognition of even a single image leads to the whole batch being redone

Page 19: India Census Data Processing

LESSONS LEARNT FOR FUTURE

• Outsourcing in a controlled environment is beneficial and cost-effective

• Good-quality paper

• ICR-friendly form design

• Use of bar codes for better workflow and inventory management

• Good-quality printing

Page 20: India Census Data Processing

LESSONS LEARNT FOR FUTURE (Contd..)

• Special training for enumerators in filling in the forms

• For CAC, use knowledge-based dictionaries to increase throughput

• Use of concurrent quality-check procedures along the lines of those used in the USA and UK

Page 21: India Census Data Processing

DATA CAPTURE & PROCESSING
Technology for 2011 Census

• Continuation of ICR technology:
– International and national experience shows that, as of date, there is no better substitute for scanning and ICR technology.
– The expertise and competence gained in using ICR technology are available within the organization.

Page 22: India Census Data Processing

DATA CAPTURE & PROCESSING
Technology for 2011 Census (contd..)

• Use more efficient scanners with facilities for image enhancement, noise removal, color drop-out, better throughput, and on-the-spot detection and correction of bad images (through built-in software).

• Use an improved version of the ICR software with better recognition and built-in, enhanced workflow management capability.

• Use new Auto/Computer Assisted Coding features in the ICR software.

Page 23: India Census Data Processing

Thank you.


Page 24: India Census Data Processing

Steps involved in e-Flow Process

• Intelligent Character Recognition (ICR) technology is used to extract handwritten/machine-printed (typeset) characters from the scanned images to generate a computer-processable data file. In brief, the following steps are involved in using ICR technology:

• Scanning: paper-based forms are scanned to create bitmap image files.

• File Portal: an image file registration module in eFlow; the registered files are the input to the next activity.

• Form Identification: automatically identifies the images of the various schedules based on the Empty Form Image (EFI) templates created during the design stage.
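Form Identification compares each scanned image with the Empty Form Image templates. As a toy illustration only, the sketch below represents bitmaps as strings of `#` (ink) and `.` (paper), scores each hypothetical template by the share of its printed pixels that are also inked on the form, and returns `None` when no score clears a threshold, which corresponds to sending the form to Manual Identification. Real form identification must also cope with skew, shifts and noise, and this simple score would accept a page that is inked everywhere; the function names and the 0.9 threshold are assumptions.

```python
from typing import Optional

Bitmap = list[str]  # rows of '#' (ink) and '.' (paper); a hypothetical toy format


def ink_overlap(form: Bitmap, template: Bitmap) -> float:
    """Share of the template's printed pixels that are also inked on the form."""
    template_ink = matched = 0
    for form_row, tmpl_row in zip(form, template):
        for f, t in zip(form_row, tmpl_row):
            if t == "#":
                template_ink += 1
                if f == "#":
                    matched += 1
    return matched / template_ink if template_ink else 0.0


def identify_form(form: Bitmap, templates: dict[str, Bitmap],
                  threshold: float = 0.9) -> Optional[str]:
    """Name of the best-matching EFI template, or None to route the form to
    Manual Identification when no template matches well enough."""
    best_name, best_score = None, 0.0
    for name, template in templates.items():
        score = ink_overlap(form, template)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

Handwriting adds ink outside the template, so the score looks only at pixels the template expects to be printed.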

Page 25: India Census Data Processing

Steps involved in e-Flow Process (Contd..)

• Manual Identification: forms left unidentified because of bad images are matched manually by the operator on the computer with the help of the EFIs.

• Processing: this module is the heart and brain of the ICR technology. It automatically recognizes the data (numerals/alphabetic characters) from the images with the help of various engines (CGK, AEG, KADMOS, TISICR, etc.).

• Tile: this module displays together the images of characters recognized as the same digit, so that any character wrongly recognized by the system can be spotted and sent for correction; this enhances the accuracy and quality of the data.
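The Tile module's idea of showing together every image read as the same digit can be sketched as a grouping step plus a rejection step. The image identifiers, function names and the convention that a rejected character loses its value (so that it goes to Completion) are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Optional

Reading = tuple[str, Optional[str]]  # (character-image id, recognized value)


def build_tiles(readings: Iterable[Reading]) -> dict[str, list[str]]:
    """Group character images by the value the ICR assigned, one tile page per value."""
    tiles: dict[str, list[str]] = defaultdict(list)
    for image_id, value in readings:
        if value is not None:
            tiles[value].append(image_id)
    return dict(sorted(tiles.items()))


def reject_on_tiles(readings: list[Reading], rejected_ids: Iterable[str]) -> list[Reading]:
    """Clear the value of every image the operator flagged on a tile page, so
    those characters are sent to Completion for keying."""
    rejected = set(rejected_ids)
    return [(image_id, None if image_id in rejected else value)
            for image_id, value in readings]
```

Because every image on a tile page should look like the same digit, an odd one out stands out at a glance, which is faster than checking characters form by form.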

Page 26: India Census Data Processing

STEPS INVOLVED IN eFLOW PROCESS (Contd..)

• Completion: characters that were unrecognized, or marked as wrongly recognized during Tiling, are presented for correction with their images displayed alongside.

• Exception: if the operator at the Completion station cannot make out a character image, it is corrected at the Exception station by an officer competent to make the decision.

• Export: the system exports the data generated in the above steps to the server for further processing such as editing, aggregation and tabulation.
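The routing implied by these steps can be summarised as a small state machine. In this sketch, Form Identification diverts unidentified forms to Manual Identification and Completion diverts characters the operator cannot resolve to Exception; the enum, the function names and the two boolean conditions are assumptions, and real eFlow routing works per form and per field rather than for a whole batch at once.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    SCANNING = "Scanning"
    FILE_PORTAL = "File Portal"
    FORM_ID = "Form Identification"
    MANUAL_ID = "Manual Identification"
    PROCESSING = "Processing"
    TILE = "Tile"
    COMPLETION = "Completion"
    EXCEPTION = "Exception"
    EXPORT = "Export"


def next_stage(stage: Stage, *, form_identified: bool = True,
               operator_unsure: bool = False) -> Optional[Stage]:
    """Next stage for work leaving `stage`; None after Export."""
    if stage is Stage.FORM_ID:
        return Stage.PROCESSING if form_identified else Stage.MANUAL_ID
    if stage is Stage.COMPLETION:
        return Stage.EXCEPTION if operator_unsure else Stage.EXPORT
    fixed = {
        Stage.SCANNING: Stage.FILE_PORTAL,
        Stage.FILE_PORTAL: Stage.FORM_ID,
        Stage.MANUAL_ID: Stage.PROCESSING,
        Stage.PROCESSING: Stage.TILE,
        Stage.TILE: Stage.COMPLETION,
        Stage.EXCEPTION: Stage.EXPORT,
    }
    return fixed.get(stage)  # Export has no successor


def trace(**conditions: bool) -> list[str]:
    """Stage names visited under the given routing conditions."""
    path = [Stage.SCANNING]
    while (nxt := next_stage(path[-1], **conditions)) is not None:
        path.append(nxt)
    return [s.value for s in path]


if __name__ == "__main__":
    print(" -> ".join(trace(form_identified=False, operator_unsure=True)))
```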

Page 27: India Census Data Processing

eFLOW CONTROLLER

Page 28: India Census Data Processing

e-FLOW WORKFLOW FOR ORGI

Page 29: India Census Data Processing

EXAMPLE – BACK IMAGE

Page 30: India Census Data Processing

EXAMPLE – IMPROPER GRID LINES

Page 31: India Census Data Processing

EXAMPLE – USE OF WHITENER

Casual writing pattern

Page 32: India Census Data Processing

CAC OF MOTHER TONGUE

Page 33: India Census Data Processing

CAC OF HIGHEST EDUCATION LEVEL ATTAINED

Page 34: India Census Data Processing

CAC OF NATIONAL INDUSTRIAL CLASSIFICATION (NIC)

Page 35: India Census Data Processing

HOUSEHOLD SCHEDULE IMAGE OF SIDE A

Page 36: India Census Data Processing

HOUSEHOLD SCHEDULE IMAGE OF SIDE B

Page 37: India Census Data Processing

FORM-ID STATION

Page 38: India Census Data Processing

MANUAL-ID STATION

Page 39: India Census Data Processing

IMAGE AFTER FORMOUT IN PROCESSING

Page 40: India Census Data Processing

SEGMENTATION OF A FIELD IN PROCESSING

Page 41: India Census Data Processing

VOTING IN PROCESSING

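Four ICR engines (ICR 1 to ICR 4) read the same character as 3, 3, 8 and 3.

Majority vote = 3; Unanimous vote = ? (no result, since the engines do not all agree)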
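The slide contrasts two voting rules over the engines' readings. A minimal sketch, assuming "majority" means more than half of the engines (some systems use a plain plurality instead) and that a `None` result means the character is rejected for manual handling; the function names are illustrative.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(readings: Sequence[str]) -> Optional[str]:
    """Value reported by more than half of the engines, else None (reject)."""
    if not readings:
        return None
    value, count = Counter(readings).most_common(1)[0]
    return value if count > len(readings) / 2 else None


def unanimous_vote(readings: Sequence[str]) -> Optional[str]:
    """Value only if every engine agrees, else None (the slide's '?')."""
    if not readings or len(set(readings)) != 1:
        return None
    return readings[0]


if __name__ == "__main__":
    engines = ["3", "3", "8", "3"]  # the four readings shown on the slide
    print(majority_vote(engines), unanimous_vote(engines))
```

For the slide's readings, majority voting returns 3 while unanimous voting rejects the character: the stricter rule sends more characters to operators in exchange for fewer false positives.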

Page 42: India Census Data Processing

FINAL RESULT IN PROCESSING

Page 43: India Census Data Processing

TILING STATION

Page 44: India Census Data Processing

COMPLETION STATION

[Field mode display]

Page 45: India Census Data Processing

EXCEPTION STATION

Form, Field, Date

Original Form Image Viewer

Exception Area

Page 46: India Census Data Processing

EXPORT STATION

Page 47: India Census Data Processing

HOUSEHOLD SCHEDULE - SIDE A

Mother Tongue & Other Languages

Name of SC/ST, Education, Religion

Page 48: India Census Data Processing

HOUSEHOLD SCHEDULE - SIDE B

NCO, NIC, Place of Birth & Last Residence

Page 49: India Census Data Processing

DATA CAPTURE & PROCESSING

Selection of technology (OMR/OCR/ICR) in 2001

Recognition of handwritten descriptive entries in different languages is beyond the capabilities of known ICR software, so a conscious decision was taken to recognize only numeric characters, leaving the rest to be handled through image-enabled Computer Assisted Coding (CAC). The following key features were introduced in the data capture solution.

Parameters for selecting the ICR software

• Highest recognition rate and lowest percentage of false positives, with customization and assured support & training.

• Facility for an organized workflow in a LAN environment with centralized controls and a Computer Assisted Coding facility.

• Built-in quality enhancement tools to trap wrongly recognized characters so as to facilitate corrective action.

• Use of multiple engines with a voting algorithm; ability to incorporate validation rules to trap inconsistent entries/wrong recognition; learning capabilities of the engines.
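Validation rules of the kind listed above can be expressed as small checks on each recognised record. The field names (`age`, `sex`, `age_at_marriage`), the code list and the limits below are hypothetical examples, not the Census edit rules.

```python
def _is_number(text: str) -> bool:
    """True for a non-empty string of ASCII digits, as numeric ICR fields produce."""
    return text.isascii() and text.isdigit()


def validate(record: dict[str, str]) -> list[str]:
    """Apply hypothetical range, code-list and consistency rules to the numeric
    fields of one person's record; an empty list means the record passes."""
    problems = []
    age = record.get("age", "")
    if not _is_number(age) or int(age) > 120:  # range check
        problems.append("age: not a number in 0-120")
    if record.get("sex", "") not in {"1", "2", "3"}:  # hypothetical code list
        problems.append("sex: code not in allowed set")
    marriage_age = record.get("age_at_marriage", "")
    if marriage_age:
        if not _is_number(marriage_age):
            problems.append("age_at_marriage: not a number")
        elif _is_number(age) and int(marriage_age) > int(age):  # consistency check
            problems.append("age_at_marriage: greater than age")
    return problems
```

A record that fails any rule can be routed back for correction, which is how such rules help trap wrong recognition as well as genuinely inconsistent entries.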

Page 50: India Census Data Processing

DATA CAPTURE & PROCESSING

• Parameters for selecting the scanner

– Speed to match our volume

– Duty cycle (life and production tolerance)

– Must support duplex scanning

– Minimum resolution of 200 dpi

– Image enhancement facilities such as noise removal, deskewing, cropping, contrast

– Hopper size and scanning path (U, J or flat belt)

– Maintenance & training services
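The 200 dpi minimum has a direct cost in image size. A quick calculation, assuming the A3 schedule mentioned earlier (297 × 420 mm), bitonal (1 bit per pixel) scanning and no compression; actual eFlow image files would normally be compressed and considerably smaller. The function names are illustrative.

```python
MM_PER_INCH = 25.4


def pixel_size(width_mm: float, height_mm: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page scanned at `dpi`."""
    return round(width_mm / MM_PER_INCH * dpi), round(height_mm / MM_PER_INCH * dpi)


def bitonal_bytes(width_px: int, height_px: int) -> int:
    """Uncompressed size of a 1-bit-per-pixel image, each row padded to a whole byte."""
    return (width_px + 7) // 8 * height_px


w, h = pixel_size(297, 420, 200)  # A3 schedule at the 200 dpi minimum
per_side = bitonal_bytes(w, h)
per_sheet = 2 * per_side          # duplex: both sides of the schedule
print(w, h, per_side, per_sheet)
```

This works out to 2339 × 3307 pixels, roughly 0.97 MB per side and 1.9 MB per duplex sheet before compression, which puts the batch-size and network-movement concerns in perspective.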

Page 51: India Census Data Processing

DATA CAPTURE & PROCESSING

Selection of Scanner/Hardware/ICR software

• A high-level technical committee evaluated and selected the above items on the basis of the capabilities demonstrated by various vendors.

• As a result, CMC was selected as System Integrator, and ACER and HP for computer hardware with Windows NT 4.0 as the operating system.

• Kodak Model 7520 scanners, and TIS for the ICR software

• National Informatics Centre carried out the LAN cabling and inspection of hardware

• Upgradation of 15 Data Centers

Page 52: India Census Data Processing

SETUP AT D.P. DIVISION (HQ)

HARDWARE

• Server: P-III, 800 MHz, 512 MB, 6 × 36 GB HDD, CD & 1.44 MB floppy drive
• 40/80 GB DLT drives
• 100 MB Zip drives
• CD writer
• Local Area Network
• Intelligent workstations: P-III, 800 MHz, 128 MB, 9 GB HDD, CD & 1.44 MB floppy drive
• Laser & line matrix printers

SOFTWARE

Operating Systems: Windows 98, Windows NT

Latest Software Packages: IMPS, MS-Office, MS Visual Studio, MS SQL Server, ISM Publisher (Hindi, English), Adobe Publishing Collection

Page 53: India Census Data Processing

SETUP AT D.D.E. CENTRES: 15 Locations (State Capitals)

HARDWARE

• High-speed scanners: 24
• Servers (45): P-III, 800 MHz, 512 MB, 6 × 36 GB HDD, CD & 1.44 MB floppy drive
• 40/80 GB DLT drives
• 100 MB Zip drives, CD writer
• Local Area Network: 24 workstations with each server
• Intelligent workstations: P-III, 800 MHz, 128 MB, 9 GB HDD
• Laser & line matrix printers

SOFTWARE

Operating Systems: Windows NT, Windows 98

Latest Software Packages: E-FLOW, MS-OFFICE, Software Package for Computer Assisted Coding

Page 54: India Census Data Processing

SNAPSHOTS OF HARDWARE RESOURCES

Distribution of PCs for various stages of Form Processing using e-FLOW - HHOLD PROJECT

Un-manned PCs: FP = File Portal, FID = FormID, Proc = Processing, Exp = Export, Mrg = MERGE, Ctl = Controller
Supervisory staff PCs: Scan, RC, MID = Manual ID, Exc = Exception
Operators' PCs: Tile, Comp = Completion
Sub = group sub-total; "-" = blank cell

Sl  Location               FP  FID Proc  Exp  Mrg  Ctl  Sub | Scan   RC  MID  Exc  Sub | Tile Comp  Sub | Total
 1  Ahmedabad               1    3    6    1    1    1   13 |    1    3    2    5   11 |    7   26   33 |    57
 2  Bangalore               1    3    6    1    1    1   13 |    1    3    2    6   12 |    7   27   34 |    59
 3  Bhopal
      eflow1                1    2    3    1    1    1    9 |    1    1    1    3    6 |    3   13   16 |    31
      eflow2                1    2    3    1    1    1    9 |    1    1    1    3    6 |    3   13   16 |    31
      eflow3                1    2    3    1    1    1    9 |    -    1    1    3    5 |    3   13   16 |    30
 4  Bhubaneswar             1    2    4    1    1    1   10 |    1    3    1    4    9 |    5   19   24 |    43
 5  Chandigarh
      eflow1                1    1    4    1    1    1    9 |    1    3    1    3    8 |    4   15   19 |    36
      eflow2                1    1    4    1    1    1    9 |    1    3    1    3    8 |    4   15   19 |    36
 6  Chennai
      eflow1                1    2    4    1    1    1   10 |    1    3    1    3    8 |    4   15   19 |    37
      eflow2                1    2    4    1    1    1   10 |    1    3    1    3    8 |    4   16   20 |    38
 7  Delhi
      eflow1                1    2    4    1    1    1   10 |    1    3    1    3    8 |    4   15   19 |    37
      eflow2                1    2    4    1    1    1   10 |    1    3    1    3    8 |    4   15   19 |    37
      eflow3                1    1    3    1    1    1    8 |    -    3    1    3    7 |    4   13   17 |    32
      eflow4
 8  Guwahati                1    2    5    1    1    1   11 |    1    3    1    4    9 |    6   20   26 |    46

Page 55: India Census Data Processing

SNAPSHOTS OF HARDWARE RESOURCES

Distribution of PCs for various stages of Form Processing using e-FLOW - HHOLD PROJECT (Contd..)

Un-manned PCs: FP = File Portal, FID = FormID, Proc = Processing, Exp = Export, Mrg = MERGE, Ctl = Controller
Supervisory staff PCs: Scan, RC, MID = Manual ID, Exc = Exception
Operators' PCs: Tile, Comp = Completion
Sub = group sub-total; "-" = blank cell

Sl  Location               FP  FID Proc  Exp  Mrg  Ctl  Sub | Scan   RC  MID  Exc  Sub | Tile Comp  Sub | Total
 9  Hyderabad
      eflow1                1    2    5    1    1    1   11 |    1    3    1    4    9 |    5   19   24 |    44
      eflow2                1    2    4    1    1    1   10 |    1    3    1    4    9 |    5   19   24 |    43
10  Jaipur                  1    3    7    1    1    1   14 |    1    3    2    6   12 |    7   28   35 |    61
11  Kolkata
      eflow1                1    1    3    1    1    1    8 |    1    3    1    3    8 |    4   14   18 |    34
      eflow2                1    1    3    1    1    1    8 |    1    3    1    3    8 |    4   13   17 |    33
12  Lucknow
      eflow1                1    2    4    1    1    1   10 |    1    3    1    4    9 |    5   18   23 |    42
      eflow2                1    2    4    1    1    1   10 |    1    3    1    4    9 |    5   18   23 |    42
      eflow3                1    2    5    1    1    1   11 |    -    3    1    4    8 |    5   19   24 |    43
13  Mumbai
      eflow1                1    2    3    1    1    1    9 |    1    3    1    3    8 |    4   15   19 |    36
      eflow2                1    2    3    1    1    1    9 |    1    3    1    3    8 |    4   15   19 |    36
      eflow3                1    2    4    1    1    1   10 |    -    3    1    3    7 |    4   15   19 |    36
14  Patna
      eflow1                1    2    4    1    1    1   10 |    1    3    1    4    9 |    5   18   23 |    42
      eflow2                1    2    4    1    1    1   10 |    1    3    1    4    9 |    5   18   23 |    42
15  Trivandrum              1    2    4    1    1    1   10 |    1    3    1    3    8 |    4   16   20 |    38
    Total                  28   54  114   28   28   28  280 |   24   78   31  101  234 |  128  480  608 |  1122
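The PC-distribution figures above are internally consistent, which can be checked mechanically. The sketch below re-keys the 28 populated rows (Delhi's eflow4 row has no figures and is left out, and a blank Scan cell is read as 0), verifies each row's sub-totals and total, and compares the column sums with the reported Total row. The row labels and this encoding are illustrative only.

```python
# Columns: FP FID Proc Exp Mrg Ctl Sub | Scan RC MID Exc Sub | Tile Comp Sub | Total
TABLE = """\
Ahmedabad 1 3 6 1 1 1 13 1 3 2 5 11 7 26 33 57
Bangalore 1 3 6 1 1 1 13 1 3 2 6 12 7 27 34 59
Bhopal-1 1 2 3 1 1 1 9 1 1 1 3 6 3 13 16 31
Bhopal-2 1 2 3 1 1 1 9 1 1 1 3 6 3 13 16 31
Bhopal-3 1 2 3 1 1 1 9 - 1 1 3 5 3 13 16 30
Bhubaneswar 1 2 4 1 1 1 10 1 3 1 4 9 5 19 24 43
Chandigarh-1 1 1 4 1 1 1 9 1 3 1 3 8 4 15 19 36
Chandigarh-2 1 1 4 1 1 1 9 1 3 1 3 8 4 15 19 36
Chennai-1 1 2 4 1 1 1 10 1 3 1 3 8 4 15 19 37
Chennai-2 1 2 4 1 1 1 10 1 3 1 3 8 4 16 20 38
Delhi-1 1 2 4 1 1 1 10 1 3 1 3 8 4 15 19 37
Delhi-2 1 2 4 1 1 1 10 1 3 1 3 8 4 15 19 37
Delhi-3 1 1 3 1 1 1 8 - 3 1 3 7 4 13 17 32
Guwahati 1 2 5 1 1 1 11 1 3 1 4 9 6 20 26 46
Hyderabad-1 1 2 5 1 1 1 11 1 3 1 4 9 5 19 24 44
Hyderabad-2 1 2 4 1 1 1 10 1 3 1 4 9 5 19 24 43
Jaipur 1 3 7 1 1 1 14 1 3 2 6 12 7 28 35 61
Kolkata-1 1 1 3 1 1 1 8 1 3 1 3 8 4 14 18 34
Kolkata-2 1 1 3 1 1 1 8 1 3 1 3 8 4 13 17 33
Lucknow-1 1 2 4 1 1 1 10 1 3 1 4 9 5 18 23 42
Lucknow-2 1 2 4 1 1 1 10 1 3 1 4 9 5 18 23 42
Lucknow-3 1 2 5 1 1 1 11 - 3 1 4 8 5 19 24 43
Mumbai-1 1 2 3 1 1 1 9 1 3 1 3 8 4 15 19 36
Mumbai-2 1 2 3 1 1 1 9 1 3 1 3 8 4 15 19 36
Mumbai-3 1 2 4 1 1 1 10 - 3 1 3 7 4 15 19 36
Patna-1 1 2 4 1 1 1 10 1 3 1 4 9 5 18 23 42
Patna-2 1 2 4 1 1 1 10 1 3 1 4 9 5 18 23 42
Trivandrum 1 2 4 1 1 1 10 1 3 1 3 8 4 16 20 38
"""
REPORTED_TOTAL = [28, 54, 114, 28, 28, 28, 280, 24, 78, 31, 101, 234, 128, 480, 608, 1122]


def parse(table: str) -> dict[str, list[int]]:
    rows = {}
    for line in table.splitlines():
        if not line.strip():
            continue
        name, *cells = line.split()
        rows[name] = [0 if c == "-" else int(c) for c in cells]
    return rows


def row_errors(rows: dict[str, list[int]]) -> list[str]:
    """Names of rows whose group sub-totals or overall total do not add up."""
    bad = []
    for name, v in rows.items():
        unmanned, supervisory, operators = sum(v[0:6]), sum(v[7:11]), sum(v[12:14])
        if (v[6], v[11], v[14], v[15]) != (unmanned, supervisory, operators, v[6] + v[11] + v[14]):
            bad.append(name)
    return bad


def column_totals(rows: dict[str, list[int]]) -> list[int]:
    return [sum(col) for col in zip(*rows.values())]


rows = parse(TABLE)
print(row_errors(rows), column_totals(rows) == REPORTED_TOTAL)
```

The Scan column also sums to 24, matching the 24 high-speed scanners listed for the data centres.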

Page 56: India Census Data Processing

DATA CAPTURE & PROCESSING

Role of the Integrator

• Supply, Installation and On-site Maintenance of SCANNERS.

• Supply and installation of form processing software.

• Manage the LAN and load balancing from one stage to another.

• Provide a software core team centrally at ORGI HQ.

• Impart operational training to the staff at each location.

• Provide software personnel at each site.

• Provide scanner operators and carry out scanning operations.

• Achieve > 90% recognition rate and < 2% false positives.

