Scaling in the Geography of US Computer Science Rui Carvalho and Michael Batty

Page 1: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

PHYSICS AND THE CITY. Carvalho and Batty: Scaling in the Geography of US Computer Science

Scaling in the Geography of US Computer Science

Rui Carvalho and Michael Batty
University College London

[email protected] [email protected]
http://www.casa.ucl.ac.uk/

Thanks: Michael Gastner (SFI), Isaac Councill (PSU), Chris Brunsdon (Leicester), Ben Gimpert (UCL)

Page 2: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Motivation

• Why Geography?
– Scientists: who can I collaborate with in my city/country?
– Funding agencies: where are new research centres emerging? Is the regional distribution of funds optimal?
– Scientometrics: distinguish between J. Smith (PSU) and J. Smith (UCL).
• Preprint server challenges:
– [USA] NIH-funded investigators are required to submit their papers to PubMed within 1 year of publication (effective May 2, 2005);
– [UK] Wellcome Trust-funded papers will in future have to be placed in a central public archive within six months of publication.
• Data mining challenges:
– Processing large databases promises to uncover knowledge hidden behind the mass of available data;
– It can dramatically speed up achievements formerly reached solely by human effort and provide new results that could not have been reached by humans unaided.
• Statistical challenges:
– Conventional wisdom holds that (geographical) spatial point processes have characteristic scales...
– Yet most “real world” phenomena are often far from equilibrium.

PNAS, 6 April 2004

Page 3: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Plan

• Open Archives Datasets:
– Citeseer (Computer Science);
– arXiv.org (mainly Physics, but also Maths and CS).
• Geographical Datasets:
– The US Census Bureau makes datasets for geocoding available on the web, but Europe lacks a unified ‘open-access’ database.
• Plan:
– Extract ZIP codes from authors’ addresses;
– Map research centres geographically.
• Questions about the research centres:
– How productive are they?
– Are there non-trivial spatial structures at a geographical scale?
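
The ZIP-extraction step in the plan above could look roughly like the following. This is a minimal sketch, not the authors' parser: the regular expression, the function name `extract_zip`, and the sample address strings (including the ZIP+4 suffix) are assumptions made for illustration.

```python
import re
from typing import Optional

# Assumed pattern: a two-letter state code followed by a 5-digit ZIP,
# optionally with a ZIP+4 suffix (e.g. "PA 15213-1234").
ZIP_PATTERN = re.compile(r"\b([A-Z]{2})[ ,]+(\d{5})(?:-\d{4})?\b")

def extract_zip(address: str) -> Optional[str]:
    """Return the last 5-digit ZIP that follows a state code, or None."""
    matches = ZIP_PATTERN.findall(address)
    return matches[-1][1] if matches else None

# Illustrative address strings, not records from the Citeseer metadata.
addresses = [
    "School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213",
    "MIT Laboratory for Computer Science, Cambridge, MA 02139-4307",
    "CASA, University College London, London WC1E 6BT, UK",
]
zips = [extract_zip(a) for a in addresses]
```

Requiring a state code in front of the digits is a deliberate choice: it keeps street numbers and non-US postcodes from being mistaken for ZIP codes, at the cost of missing addresses that omit the state.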

Page 4: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Can Statistical Physics Help?

Page 5: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

What is Citeseer?

• Founded by Steve Lawrence and C. Lee Giles in 1997 (NEC);
• Now at Penn State: http://citeseer.ist.psu.edu/
• Archive of computer science research papers harvested from the web and submitted by users;
• Currently (Dec 2005) contains over 730,000 documents;
• Citeseer was developed as a model for Autonomous Citation Indexing, i.e. citation indexes are created automatically;
• Can search content in PostScript and PDF files.

Page 6: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Data Collecting and Parsing

• Citeseer metadata:
– 525,055 computer science research papers;
– 399,757 (76.14%) of which are unique;
– 103,172 (25.81%) of the unique papers have one or more US authors;
– 2,975 different ZIP codes in the unique papers belong to the US conterminous states (48 states, plus the District of Columbia).
• 5 most productive ZIP codes:
1. Count: 3950 Zip: 15213 Carnegie Mellon Univ, Pittsburgh, PA;
2. Count: 3403 Zip: 02139 MIT, Cambridge, MA;
3. Count: 2954 Zip: 94305 Stanford Univ, CA;
4. Count: 2691 Zip: 94720 Univ California at Berkeley, CA;
5. Count: 2309 Zip: 61801 Univ Illinois at Urbana Champaign, IL.
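
Per-ZIP counts like those above might be tallied as sketched below. The slides do not state the counting rule, so this assumes that duplicate records are dropped and that each unique paper adds at most one count to each ZIP code; the records are made up. The last two lines only re-derive the percentages quoted on the slide from its own totals.

```python
from collections import Counter

# Hypothetical records: (paper_id, ZIP codes of the paper's US authors).
papers = [
    ("p1", ["15213", "02139"]),
    ("p2", ["15213"]),
    ("p3", ["94305", "94305"]),  # two authors at the same ZIP
    ("p1", ["15213", "02139"]),  # duplicate record of p1
]

def zip_counts(records):
    """Count unique papers per ZIP code (assumed rule: one count per paper per ZIP)."""
    seen = set()
    counts = Counter()
    for paper_id, zips in records:
        if paper_id in seen:
            continue  # skip duplicate records of the same paper
        seen.add(paper_id)
        counts.update(set(zips))  # a ZIP counts once per paper
    return counts

counts = zip_counts(papers)

# Shares quoted on the slide, recomputed from its totals.
unique_share = round(100 * 399757 / 525055, 2)  # unique papers among all papers
us_share = round(100 * 103172 / 399757, 2)      # unique papers with a US author
```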

Page 7: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Q1: How productive are the research centres?

Page 8: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Q2: Non-trivial spatial structures?

Page 9: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

The Geography of Citeseer

Page 10: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Cartograms

Diffusion-based method for producing density-equalizing maps, Michael T. Gastner and M. E. J. Newman, Proc. Nat. Acad. Sci. USA, 101, 7499-7504 (2004)

Density-equalizing map projections: Diffusion-based algorithm and applications, Michael T. Gastner and M. E. J. Newman, Geocomputation 2005 (to appear)

Page 11: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Cartograms


Page 12: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Cartograms

Page 13: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Cartograms

Page 14: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Spatial Point Processes

• Moments:
– First moment: ρ, the expected number of points per unit area;
– Second moment: Ripley’s K function. ρK(r) is the expected number of further points within distance r of a given point.
• For a Poisson process, $K(r) = \pi r^2$;
• But neither the first nor the second moment gives a feel for the way in which the spatial distribution changes within an area.
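
As an illustration of the second moment, the sketch below estimates K(r) for points drawn uniformly in a unit square and compares it with the Poisson value πr². It is a naive estimator without edge correction, so it slightly underestimates K near the boundary; the function name `ripley_k` and the test pattern are assumptions, not the code used for the talk.

```python
import math
import random

def ripley_k(points, r, area):
    """Naive estimate of Ripley's K(r), without edge correction."""
    n = len(points)
    density = n / area  # first moment, rho
    ordered_pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= r:
                ordered_pairs += 1
    # rho * K(r) is the mean number of other points within distance r of a point.
    return ordered_pairs / (n * density)

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(500)]
r = 0.05
k_hat = ripley_k(pts, r, area=1.0)
k_poisson = math.pi * r ** 2  # value expected for a Poisson process
```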

Page 15: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

The Two-Point Correlation Function

• The two-point correlation function g(r), defined by

$\rho_2(x_1, x_2)\, dV(x_1)\, dV(x_2) = \rho^2\, g(r)\, dV(x_1)\, dV(x_2)$,

describes the probability of finding a point in volume dV(x1) and another point in dV(x2) at distance r = |x1-x2|;

• For a Poisson process g(r)=1;

• Edge corrections (Ripley’s weights): take a circle centred on point x passing through another point y. If the circle lies entirely within the domain, D, the point is counted once. If only a proportion p(x,y) of the circle’s circumference lies within D, the point is counted as 1/p points.
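
The edge-correction weight can be approximated numerically. The sketch below assumes a rectangular domain D = [0, width] × [0, height] and estimates p(x,y) by sampling points on the circle. The study itself concerns the conterminous US, whose border is an irregular polygon, so this is a simplified stand-in; the function names are hypothetical.

```python
import math

def circle_fraction_inside(cx, cy, radius, width, height, samples=3600):
    """Fraction p of a circle's circumference inside [0, width] x [0, height]."""
    inside = 0
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        px = cx + radius * math.cos(theta)
        py = cy + radius * math.sin(theta)
        if 0.0 <= px <= width and 0.0 <= py <= height:
            inside += 1
    return inside / samples

def ripley_weight(x, y, width, height):
    """Weight 1/p for the pair (x, y): circle centred on x, passing through y."""
    radius = math.hypot(x[0] - y[0], x[1] - y[1])
    p = circle_fraction_inside(x[0], x[1], radius, width, height)
    if p == 0.0:
        raise ValueError("no sampled point of the circle lies inside the domain")
    return 1.0 / p

w_interior = ripley_weight((0.5, 0.5), (0.6, 0.5), 1.0, 1.0)  # circle fully inside D
w_edge = ripley_weight((0.0, 0.5), (0.1, 0.5), 1.0, 1.0)      # centre on the left border
```

With the centre on the border, about half of the circle lies outside D, so the neighbouring point is counted roughly twice.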

Page 16: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Computation of the Two-Point Correlation Function

Intersection with border gives more than one polygon

Geographical range at which the two-point correlation function can be approximated by a power-law
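
One common way to check a power-law approximation over a given range is a least-squares fit in log-log coordinates. The sketch below assumes the form g(r) ≈ c·r^(−γ), which the slide does not spell out, and uses synthetic values rather than results from the talk; `fit_power_law` is a hypothetical helper.

```python
import math

def fit_power_law(r_values, g_values):
    """Least-squares fit of log g = log c - gamma * log r; returns (c, gamma)."""
    xs = [math.log(r) for r in r_values]
    ys = [math.log(g) for g in g_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic data generated from g(r) = 3 r^(-0.75); not values from the talk.
radii = [10.0, 20.0, 50.0, 100.0, 200.0]
g = [3.0 * r ** -0.75 for r in radii]
c, gamma = fit_power_law(radii, g)
```

On real estimates the fit would be restricted to the range of r over which the log-log plot looks linear, since a straight-line fit says nothing about behaviour outside that range.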

Page 17: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Two-Point Correlation Function

Page 18: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Speculation: knowledge diffusion?

Page 19: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Speculation: Universality?

Page 20: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

To find out more

• http://www.casa.ucl.ac.uk/

• Spatially Embedded Complex Systems Engineering (SECSE): http://www.secse.net/
– Members: UCL, Leeds, Southampton, Sussex

[email protected] [email protected]

Page 22: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Plot of state R&D expenditure (NSF) vs population

Page 23: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Poisson Point Process

• We say that a spatial process is completely random iff:
– The number of events in any planar region A with area |A| follows a Poisson distribution with mean λ|A|, where λ is the density of points;
– For any two disjoint regions A and B, the random variables N(A) and N(B) are independent.
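
The two conditions above suggest a direct way to simulate a completely random pattern in a rectangle: draw the number of points from a Poisson distribution with mean λ|A|, then place them independently and uniformly. The following is a minimal sketch under those assumptions, using only the standard library; the function names and the value λ = 200 are illustrative.

```python
import random

def poisson_count(mean, rng):
    """Draw N ~ Poisson(mean) by counting unit-rate exponential arrivals in [0, mean]."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def simulate_poisson_process(lam, width, height, rng):
    """Homogeneous Poisson process of density lam on [0, width] x [0, height]."""
    n = poisson_count(lam * width * height, rng)
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height)) for _ in range(n)]

rng = random.Random(42)
points = simulate_poisson_process(lam=200.0, width=1.0, height=1.0, rng=rng)

# Counts in the two disjoint halves; each is Poisson with mean 100.
left = sum(1 for x, _ in points if x < 0.5)
right = len(points) - left
```

Such simulated patterns are a natural baseline for the statistics on the earlier slides, since for them g(r) = 1 and K(r) = πr².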

