Lina Ahlin and Olof Ejermo lina.ahlin@circle.lu.se olof.ejermo@circle.lu.se

Posted on 23-Feb-2016




CIRCLE, Centre for Innovation, Research and Competence in the Learning Economy

LUND UNIVERSITY, P.O. Box 117, SE-221 00 Lund, Sweden

Swedish inventors: matching to registers and descriptive data

Presentation at APE-INV, Brussels, September 5th 2011

Lina Ahlin and Olof Ejermo
lina.ahlin@circle.lu.se
olof.ejermo@circle.lu.se

On the agenda

• What is so special about Swedish data?
• 1st matching
• 2nd matching
• Future: how to reach a 100% match rate?
• (Results)

Linking inventors to registers

• EPO patent applications 1978-2009 by inventors with addresses in Sweden.

• Matching done on name-home address combinations

• Problem 1: different inventors may have the same name

• Problem 2: addresses may be outdated

• How to verify a person's identity and connect to Swedish register data?

Swedish data

Q: What makes Swedish data so exciting (and why do we want a high match rate)?

A: Through Statistics Sweden it is possible to connect individuals to register data, which links several levels of information relevant for innovation studies:
• Individual level: field/level of education, age, income, gender, workplace
• Regional level: workplace, home municipality
• Sectoral level: sector, firm size, level of R&D...

This can give a multifaceted view of innovation, but it requires the personal identifier ”personnummer”, e.g. 19500131-3422:
 – 19500131: birth date, Jan 31st, 1950
 – 3422: the second-to-last digit (2) is even = female
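The personnummer structure described above can be illustrated with a short Python sketch. It assumes the 12-digit YYYYMMDD-NNNN form used in the example. The function name `parse_personnummer` is hypothetical, and the sketch does not validate the final check digit.

```python
from datetime import date
from typing import Tuple


def parse_personnummer(pnr: str) -> Tuple[date, str]:
    """Split a 12-digit personnummer (YYYYMMDD-NNNN) into birth date and sex.

    Assumes the plain 12-digit form shown on the slide. The second-to-last
    digit encodes sex (even = female, odd = male); the final check digit
    is not validated in this sketch.
    """
    digits = pnr.replace("-", "")
    if len(digits) != 12 or not digits.isdigit():
        raise ValueError(f"expected YYYYMMDD-NNNN, got {pnr!r}")
    birth = date(int(digits[0:4]), int(digits[4:6]), int(digits[6:8]))
    sex = "female" if int(digits[10]) % 2 == 0 else "male"
    return birth, sex


print(parse_personnummer("19500131-3422"))  # (datetime.date(1950, 1, 31), 'female')
```

Only the birth-date part is needed for the later matching steps. The last four digits are what make the identifier unique.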

1st matching (Oct-Dec 2010)
• All Swedes (incl. personnummer) are listed in the address register ”SPAR”
• Addresses matched through InfoTorg, which stores addresses and address changes for the latest 3 years; personnummer added to the matches
 – Individuals under 16 not matched

• Old patents added under the assumption that an identical name and address means the same person:
 Register 2008-2010: Sven Ivar Johanson, Storgatan 1, 111 00 Stockholm
 = Patent applied for in 1992: Sven Ivar Johanson, Storgatan 1, 111 00 Stockholm

• InfoTorg returned a 56% match rate
• Manual check (visual, no robot): + 8%

64% match rate

Match rate: 64% of inventor-patent pairs, ranging from a low of 23% in 1978 to a high of 93% in 2008. The difference reflects inventor mobility: the older the patent, the more likely it is that the inventor has moved since the application.
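The first matching step rests on exact agreement of name and address. Below is a minimal Python sketch of that logic. It assumes the register is available as an in-memory list, which is a hypothetical stand-in for the SPAR/InfoTorg lookup; that interface is not described here. `make_key` and `match_exact` are illustrative names. The match rate is computed over inventor-patent pairs, as on the slide.

```python
from collections import Counter


def make_key(name, street, postcode, city):
    """Normalize case, spacing and postcode format into one name-address key."""
    parts = [name, street, postcode.replace(" ", ""), city]
    return "|".join(" ".join(p.lower().split()) for p in parts)


def match_exact(patent_rows, register_rows):
    """Match inventor-patent rows to register rows with an identical key.

    patent_rows:   list of (name, street, postcode, city), one per inventor-patent pair
    register_rows: list of (name, street, postcode, city, person_id)
    Keys shared by several register persons are skipped as ambiguous
    (Problem 1: different people can share a name).
    """
    keys = [make_key(*r[:4]) for r in register_rows]
    counts = Counter(keys)
    lookup = {k: r[4] for k, r in zip(keys, register_rows) if counts[k] == 1}
    matches = {}
    for i, row in enumerate(patent_rows):
        k = make_key(*row)
        if k in lookup:
            matches[i] = lookup[k]
    return matches


patents = [("Sven Ivar Johanson", "Storgatan 1", "111 00", "Stockholm"),
           ("Anna Berg", "Lilla gatan 2", "222 22", "Lund")]
register = [("Sven Ivar  Johanson", "storgatan 1", "11100", "Stockholm", "P001")]
matches = match_exact(patents, register)
print(matches, f"match rate {len(matches) / len(patents):.0%}")  # {0: 'P001'} match rate 50%
```

Exact keys fail as soon as an inventor has moved. This is why the rate falls for older patents.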

[Figure: match rate by patent application year, 1978-2008 (y-axis 0-100%); series: Fractions 64%]

1985-2005: present access to individual registers at Statistics Sweden
2006-2009: additions as of Sep. 30th 2011

2nd matching (April-Sep 2011)

• Use publicly accessible registers (Swedish genealogical association)
 – CDs of the Swedish population 1990 (and 1980), published with old addresses and birth dates
 – CD ”Book of dead” 1901-2009: address at death + personnummer
• Match birth date + name to personnummer using InfoTorg's service or online sources

Methodology

• Extract data from the Swedish death book and Swedish genealogy records for 1990 (to some extent also 1980) on all individuals in the population, letter by letter

• Generate a variable containing name, address and postal address for all individuals in the population as well as for inventors who are not fully matched

Normalized Levenshtein (”strgroup”) in Stata

• An example of the ”name-address” string:
 ”Sven Ivar Johanson, Storgatan 1, 111 00 Stockholm” (from EPO)
 = ”Sven Ifwar Johanson, Storgatan 1, 111 00 Stockholm” (from Swedish population 1990)
• Two edits (replace v with f, insert w) make the strings equal
• Divide by the length of the shorter string (49 characters)
• 2/49 ≈ 0.041 (= a good hit)
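The normalized Levenshtein score can be sketched in Python. This is not the code of the Stata command strgroup. It is an assumed re-implementation of the scoring described on the slide: the edit distance divided by the length of the shorter string, so 0 means identical. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb))) # substitute (free if equal)
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance divided by the length of the shorter string (0 = identical)."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return float(a != b)  # convention for empty input: 0 if both empty, else 1
    return levenshtein(a, b) / shorter


epo = "Sven Ivar Johanson, Storgatan 1, 111 00 Stockholm"
pop = "Sven Ifwar Johanson, Storgatan 1, 111 00 Stockholm"
print(levenshtein(epo, pop), len(epo), round(normalized_levenshtein(epo, pop), 4))  # 2 49 0.0408
```

Applied to the example strings, the sketch finds two edits over a shorter length of 49 characters, about 0.041. Dividing by the shorter string keeps the score comparable across short and long name-address strings.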

Adding date of birth

1. 1990: Levenshtein on names and addresses
2. 1990: Levenshtein on unique names
3. Levenshtein on the CD ”Book of dead” 1901-2009: names and addresses
4. Strgroup: similarity on name-address hits from steps 1-3
5. Some manual additions and minor changes
6. 1980: Levenshtein on names and addresses (letters D & H)
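Step 2 restricts name-only matching to names that are unique in the population extract. The sketch below shows that rule. To keep it short, it compares normalized names exactly instead of using Levenshtein distance. `match_unique_names` and the input layout are hypothetical.

```python
from collections import defaultdict


def norm(name):
    """Lower-case and collapse whitespace."""
    return " ".join(name.lower().split())


def match_unique_names(inventor_names, population):
    """Accept a name-only match only when the name occurs exactly once in the population.

    inventor_names: dict inventor_id -> full name
    population:     list of (full name, birth date) from the population extract
    Returns dict inventor_id -> birth date for uniquely named inventors.
    """
    by_name = defaultdict(list)
    for name, birth in population:
        by_name[norm(name)].append(birth)
    result = {}
    for inv_id, name in inventor_names.items():
        births = by_name.get(norm(name), [])
        if len(births) == 1:  # unique name: low risk of picking the wrong person
            result[inv_id] = births[0]
    return result


pop = [("Sven Ivar Johanson", "1950-01-31"),
       ("Anna Berg", "1960-05-01"), ("Anna Berg", "1972-11-12")]
print(match_unique_names({"A1": "Sven Ivar Johanson", "A2": "Anna Berg"}, pop))
```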

Methodology: continued

• Manually examine each match to check whether the Levenshtein command matched correctly

• Some hits discarded, including ambiguous name matches
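The screening before the manual check can be sketched as follows. The sketch assumes each inventor comes with a list of candidate persons and their normalized distances. An inventor with exactly one candidate under the cut-off is kept. Several close candidates are flagged as ambiguous, which mirrors the discarded ambiguous hits. The threshold of 0.1 and the function name are hypothetical.

```python
def classify_hits(hits, threshold=0.1):
    """Split candidate hits into accepted, ambiguous and unmatched inventors.

    hits: dict inventor_id -> list of (person_id, normalized distance)
    threshold: hypothetical cut-off; a lower distance means a closer match.
    """
    accepted, ambiguous, unmatched = {}, [], []
    for inv_id, cands in hits.items():
        close = [pid for pid, d in cands if d <= threshold]
        if len(close) == 1:
            accepted[inv_id] = close[0]
        elif len(close) > 1:
            ambiguous.append(inv_id)  # several plausible persons: discard or check manually
        else:
            unmatched.append(inv_id)
    return accepted, ambiguous, unmatched


hits = {"A1": [("P1", 0.04), ("P2", 0.30)], "A2": [("P3", 0.05), ("P4", 0.06)]}
print(classify_hits(hits))
```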

New match rate 80%

[Figure: match rate by patent application year, 1978-2009 (y-axis 0-100%); series: Fractions 64%, Fractions 80%]

Adding personnummer (ongoing)

New match rate 80%, but not all matches have a full personnummer. What to do?
1. Use the date-of-birth part of the personal number for fully matched inventors
2. Join all possible combinations of birth dates for those fully matched and those with only birth dates
3. Run Levenshtein distance on inventor names
4. Small Levenshtein distance: accept that the inventors are the same, since name and birth date match
5. Large Levenshtein distance: reject
6. Manually check the remaining inventors; look at addresses for further confirmation if uncertain
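Steps 2-6 above can be sketched in Python. The sketch assumes both groups are lists of (id, name, birth date). The two distance thresholds are hypothetical, because the slide only distinguishes small from large distances. Pairs in between are returned for the manual check in step 6.

```python
from collections import defaultdict


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def link_by_birthdate(full, partial, accept=1, reject=4):
    """Link inventors with only a birth date to fully matched inventors.

    full:    list of (person_id, name, birth_date) for fully matched inventors
    partial: list of (inventor_id, name, birth_date) for inventors with only a birth date
    accept, reject: hypothetical edit-distance thresholds
    Returns (accepted pairs, pairs left for manual checking).
    """
    by_date = defaultdict(list)
    for pid, name, born in full:
        by_date[born].append((pid, name))
    accepted, manual = [], []
    for inv_id, name, born in partial:
        for pid, full_name in by_date.get(born, []):  # step 2: pairs sharing a birth date
            d = levenshtein(name.lower(), full_name.lower())  # step 3
            if d <= accept:
                accepted.append((inv_id, pid))  # step 4: small distance, accept
            elif d < reject:
                manual.append((inv_id, pid))  # step 6: in between, check manually
            # d >= reject: step 5, rejected
    return accepted, manual


full = [("P1", "Sven Ivar Johanson", "1950-01-31")]
partial = [("A1", "Sven Ivar Johansson", "1950-01-31")]
print(link_by_birthdate(full, partial))
```

Joining on birth date first keeps the number of name comparisons small, because only inventors born on the same day are compared.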

Adding personnummer ctd.

• Use the death book for 1975-2009; use the date-of-birth part of the personal numbers

• Re-run steps 2-6 from the previous slide

Adding personnummer ctd.

Problem: inventors who were not previously identified lack the last four digits. Two options to get full personal numbers from birth dates:
1. Use InfoTorg again with name + the added parameter ”birthdate”
2. Manually add the last four digits using an internet service (www.upplysning.se)

Some matching problems

• Difficult to match individuals who change their last name (mainly women), or who have common names and move a lot.

• Two people with the same name can live at the same address (e.g. a father who names his son after himself), so the wrong person may be matched. If detected, the older person is chosen.

• For inventors at some firms, such as AstraZeneca, the company address is given instead of a home address.
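The tie-breaking rule for same-name, same-address cases can be sketched as below, assuming register rows carry a birth date. The function name and the row layout are hypothetical. Collisions are also returned, so they can be reviewed rather than resolved silently.

```python
from collections import defaultdict
from datetime import date


def resolve_same_name_same_address(register_rows):
    """Map each (name, address) to one person, keeping the oldest on collisions.

    register_rows: list of (name, address, person_id, birth_date)
    Returns (dict key -> person_id, set of keys that had more than one person).
    """
    groups = defaultdict(list)
    for name, address, pid, born in register_rows:
        groups[(name.lower(), address.lower())].append((born, pid))
    chosen, collisions = {}, set()
    for key, people in groups.items():
        if len(people) > 1:
            collisions.add(key)  # e.g. a father and a son with identical names
        chosen[key] = min(people)[1]  # earliest birth date = oldest person
    return chosen, collisions


rows = [("Sven Johanson", "Storgatan 1", "P_father", date(1948, 6, 1)),
        ("Sven Johanson", "Storgatan 1", "P_son", date(1975, 3, 2))]
print(resolve_same_name_same_address(rows))
```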

Towards 100%
• Idea: scoring methods based on identified inventors
 – Name
 – Identified co-inventors
 – Technology class
 – City
 – Postal code
 – Which algorithm?

• Statistics Sweden for validating the parent/child name similarity problem?

• Use the 1980 population CD?
• Strategy of focusing on highly productive unmatched inventors?
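One possible form of the proposed scoring method is sketched below in Python. Several elements are assumptions: the weights, the `min_score` cut-off, and the use of difflib's `SequenceMatcher.ratio` for name similarity as a stand-in for Levenshtein. The slide leaves the choice of algorithm open.

```python
from difflib import SequenceMatcher

# Hypothetical weights; the slide does not specify any.
WEIGHTS = {"name": 3.0, "coinventors": 2.0, "tech_class": 1.0, "city": 1.0, "postcode": 1.0}


def score(unmatched, candidate):
    """Score how well an unmatched patent inventor fits an identified inventor.

    Both arguments are dicts with keys: name, coinventors (set of person ids),
    tech_class (set of class codes), city, postcode.
    """
    s = WEIGHTS["name"] * SequenceMatcher(
        None, unmatched["name"].lower(), candidate["name"].lower()).ratio()
    if unmatched["coinventors"] & candidate["coinventors"]:
        s += WEIGHTS["coinventors"]
    if unmatched["tech_class"] & candidate["tech_class"]:
        s += WEIGHTS["tech_class"]
    if unmatched["city"].lower() == candidate["city"].lower():
        s += WEIGHTS["city"]
    if unmatched["postcode"].replace(" ", "") == candidate["postcode"].replace(" ", ""):
        s += WEIGHTS["postcode"]
    return s


def best_match(unmatched, identified, min_score=5.0):
    """Return the id of the highest-scoring identified inventor, or None below min_score."""
    best_id, best = None, min_score
    for pid, cand in identified.items():
        s = score(unmatched, cand)
        if s >= best:
            best_id, best = pid, s
    return best_id
```

A shared co-inventor is weighted highly here on the assumption that it is strong evidence when the name is common. In practice, the weights would have to be tuned on the inventors already identified.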

Suggestions/questions

Patent distribution by sector

Patent distribution in manufacturing (share of total patenting)

Patent distribution in services (share of total patenting).

Education level among inventors

Percentile distribution of inventors’ patent productivity.

Percentile | All patents | Contribution (all patents) | Patents 2004-07 | Contribution 2004-07
1% | 1 | 0.12 | 1 | 0.11
5% | 1 | 0.20 | 1 | 0.17
10% | 1 | 0.25 | 1 | 0.20
25% | 1 | 0.33 | 1 | 0.33
50% | 1 | 0.83 | 1 | 0.50
75% | 3 | 1.50 | 2 | 1.00
90% | 6 | 3.00 | 4 | 2.00
95% | 9 | 5.00 | 6 | 3.00
99% | 21 | 11.50 | 12 | 5.83
Mean/inventor | 2.81 | 1.40 | 2.06 | 0.97
Number of inventors | 18 489 | 18 489 | 8 526 | 8 526
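The per-inventor counts behind the table can be sketched as follows. The footnote to the sector table describes ”Contribution” as patent fractions that adjust for co-inventorship. This sketch assumes the common equal split, in which each of n inventors gets 1/n of a patent. The nearest-rank percentile is one of several conventions, so it need not reproduce the table's exact values. The function names are hypothetical.

```python
import math
from collections import defaultdict


def inventor_counts(patents):
    """Count whole patents and fractional contributions per inventor.

    patents: dict patent_id -> list of inventor ids on that patent.
    Assumes equal fractional counting: each of n co-inventors gets 1/n.
    """
    whole, frac = defaultdict(int), defaultdict(float)
    for inventors in patents.values():
        share = 1 / len(inventors)
        for inv in inventors:
            whole[inv] += 1
            frac[inv] += share
    return whole, frac


def percentile(values, p):
    """Nearest-rank percentile (one of several common conventions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


patents = {"EP1": ["a", "b"], "EP2": ["a"], "EP3": ["a", "b", "c"]}
whole, frac = inventor_counts(patents)
print(percentile(whole.values(), 50), sum(whole.values()) / len(whole))  # 2 2.0
```

Under equal splitting, the contributions sum to the number of patents. This explains why the mean contribution per inventor (1.40) is well below the mean patent count (2.81).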

Sectors, SNI92 codes, number of inventors and contribution, 2004-2005

Sector | SNI92 codes | Unique inventors, mean/year 2004-2005 | Contribution*, mean 2004-2005 | % cross-sector cooperation 1994-1995 | % cross-sector cooperation 2004-2005
Primary | 1000-14999 | 8.5 | 5.9 | 28% | 28%
Manufacturing | 15000-37999 | 1567 | 749.9 | 11% | 11%
Services | 38000-74999, 80410, 80423-80425, 80427-80429, 85200, 85325, 91111-91330, 92110-92130, 92310, 92330-92400, 92611-92614, 92621-99000 | 806.5 | 411.1 | 23% | 23%
Academia | 80301-80309 and ** | 190 | 72.6 | 54% | 54%
Public sector | 75000-80299, 80421-80422, 80426, 85000-85140, 85311-85324, 90000-90008, 92200, 92320, 92511/92530, 92615 | 62.5 | 28.4 | 67% | 67%

* ”Contribution” counts patent fractions, which adjust for co-inventorship.
** ”Academia” can in a few cases also be found in the sectors R&D in technical and natural sciences (73101-73104) and technical testing and analysis (74300).
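The SNI92 code ranges in the table above translate directly into a lookup. In this sketch, ”92511/92530” is read as the two codes 92511 and 92530, which is an assumption. The academic units coded 73101-73104 or 74300 (note **) cannot be identified from the code alone, so they fall under Services here. The names `SECTOR_RANGES` and `sector` are hypothetical.

```python
from typing import Optional

# SNI92 code ranges per sector, copied from the table (single codes as (c, c)).
SECTOR_RANGES = {
    "Primary": [(1000, 14999)],
    "Manufacturing": [(15000, 37999)],
    "Academia": [(80301, 80309)],
    "Public sector": [(75000, 80299), (80421, 80422), (80426, 80426),
                      (85000, 85140), (85311, 85324), (90000, 90008),
                      (92200, 92200), (92320, 92320), (92511, 92511),
                      (92530, 92530), (92615, 92615)],
    "Services": [(38000, 74999), (80410, 80410), (80423, 80425), (80427, 80429),
                 (85200, 85200), (85325, 85325), (91111, 91330), (92110, 92130),
                 (92310, 92310), (92330, 92400), (92611, 92614), (92621, 99000)],
}


def sector(sni92: int) -> Optional[str]:
    """Return the sector for a 5-digit SNI92 code, or None if no range matches."""
    for name, ranges in SECTOR_RANGES.items():
        if any(lo <= sni92 <= hi for lo, hi in ranges):
            return name
    return None


print(sector(24420), sector(80302), sector(92615))  # Manufacturing Academia Public sector
```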

Cooperation by sector, 2004-05
Columns: Primary, Manufacturing, Services, Academia, Public sector, Sum

Primary: 43% 57% 0% 0% 100%
Manufacturing: 1% 77% 17% 5% 100%
Services: 1% 66% 24% 9% 100%
Academia: 0% 29% 48% 22% 100%
Public sector: 0% 18% 37% 45% 100%

The most important patenting academic institutions, 2004-2005

Univ/institute | Contributions/year | Share | Patents/billion SEK research revenue | Patents/thousand FTE, NTM
Lund | 20.3 | 23% | 6.3 | 15.0
Uppsala | 11.6 | 13% | 4.2 | 9.7
Karolinska | 11.6 | 13% | 3.9 | 9.3
KTH | 9.8 | 11% | 5.7 | 8.7
Göteborg | 9.0 | 10% | 3.7 | 10.9
Linköping | 7.9 | 9% | 6.4 | 10.3
Chalmers | 7.2 | 8% | 5.1 | 8.6
Stockholm | 2.9 | 3% | 1.7 | 4.1
Umeå | 2.3 | 3% | 1.5 | 2.8
Sum | 82.6 | 94% | 4.4 | 9.3
Others (13) | 5.0 | 6% | 1.3 | 1.8