PHYSICS AND THE CITY. Carvalho and Batty: Scaling in the Geography of US Computer Science
Scaling in the Geography of US Computer Science
Rui Carvalho and Michael Batty
University College London
http://www.casa.ucl.ac.uk/
Thanks: Michael Gastner (SFI), Isaac Councill (PSU), Chris Brunsdon (Leicester), Ben Gimpert (UCL)
Motivation
• Why Geography?
– Scientists: who can I collaborate with in my city/country?
– Funding Agencies: where are new research centres emerging? Is the regional distribution of funds optimal?
– Scientometrics: distinguish between J. Smith (PSU) and J. Smith (UCL).
• Preprint server challenges:
– [USA] NIH-funded investigators are required to submit their papers to PubMed within 1 year of publication (effective May 2, 2005);
– [UK] Wellcome Trust-funded papers will in future have to be placed in a central public archive within six months of publication.
• Data mining challenges:
– Processing of large databases promises to uncover knowledge hidden behind the mass of available data;
– It could dramatically speed up achievements formerly reached solely by human effort and provide new results that could not have been reached by humans unaided.
• Statistical Challenges:
– Conventional wisdom holds that (geographical) spatial point processes have characteristic scales...
– Yet most “real world” phenomena are far from equilibrium.
PNAS, 6 April 2004
Plan
• Open Archives Datasets:
– Citeseer (Computer Science);
– arXiv.org (mainly Physics, but also Maths and CS).
• Geographical Datasets:
– The US Census Bureau makes datasets for geocoding available on the web, but Europe lacks a unified ‘open-access’ database.
• Plan:
– Extract ZIP codes from authors’ addresses;
– Map research centres geographically.
• Questions about the research centres:
– How productive are they?
– Are there non-trivial spatial structures at a geographical scale?
Can Statistical Physics Help?
What is Citeseer?
• Founded by Steve Lawrence and C. Lee Giles in 1997 (NEC);
• Now at Penn State: http://citeseer.ist.psu.edu/
• Archive of computer science research papers harvested from the web and submitted by users;
• Currently (Dec 2005) contains over 730,000 documents;
• Citeseer was developed as a model for Autonomous Citation Indexing, i.e. citation indexes are created automatically;
• Can search content in PostScript and PDF files.
Data Collecting and Parsing
• Citeseer metadata:
– 525,055 computer science research papers;
– 399,757 (76.14%) of which are unique;
– 103,172 (25.81%) of the unique papers have one or more US authors;
– 2,975 different ZIP codes in the unique papers belong to the US conterminous states (48 states, plus the District of Columbia).
• 5 most productive ZIP codes:
1. Count: 3950 Zip: 15213 Carnegie Mellon Univ, Pittsburgh, PA;
2. Count: 3403 Zip: 02139 MIT, Cambridge, MA;
3. Count: 2954 Zip: 94305 Stanford Univ, CA;
4. Count: 2691 Zip: 94720 Univ California at Berkeley, CA;
5. Count: 2309 Zip: 61801 Univ Illinois at Urbana-Champaign, IL.
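The ZIP-code extraction and counting step could be sketched roughly as follows. This is a minimal illustration, not the authors' parser: the regular expression, the helper names `extract_zip` and `count_zip_codes`, and the sample addresses are all assumptions made for the example, and real affiliation strings are far messier.

```python
import re
from collections import Counter

# Assumed pattern: a two-letter state code followed by a 5-digit ZIP,
# optionally with a ZIP+4 suffix. Real affiliation strings need more care.
ZIP_PATTERN = re.compile(r"\b[A-Z]{2}\s+(\d{5})(?:-\d{4})?\b")

def extract_zip(address):
    """Return the first 5-digit ZIP code found in an address, or None."""
    match = ZIP_PATTERN.search(address)
    return match.group(1) if match else None

def count_zip_codes(addresses):
    """Count how many addresses map to each ZIP code."""
    zips = (extract_zip(a) for a in addresses)
    return Counter(z for z in zips if z is not None)

# Made-up example addresses, for illustration only.
sample_addresses = [
    "School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213",
    "MIT Laboratory for Computer Science, Cambridge, MA 02139",
    "Robotics Institute, Carnegie Mellon University, Pittsburgh PA 15213-3890",
    "University College London, Gower Street, London WC1E 6BT, UK",
]

counts = count_zip_codes(sample_addresses)
for zip_code, count in counts.most_common():
    print(f"Count: {count} Zip: {zip_code}")
```

Keeping ZIP codes as strings preserves leading zeros such as in 02139, which would be lost if they were parsed as integers.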
Q1: How productive are the research centres?
Q2: Non-trivial spatial structures?
The Geography of Citeseer
Cartograms
Diffusion-based method for producing density-equalizing maps, Michael T. Gastner and M. E. J. Newman, Proc. Nat. Acad. Sci. USA, 101, 7499-7504 (2004)
Density-equalizing map projections: Diffusion-based algorithm and applications, Michael T. Gastner and M. E. J. Newman, Geocomputation 2005 (to appear)
Spatial Point Processes
• Moments:
– First moment: ρ, the expected number of points per unit area;
– Second moment: Ripley’s K function. ρK(r) is the expected number of points within distance r of a point.
• For a Poisson process, $K(r) = \pi r^2$;
• But neither the first nor the second moment gives a feel for the way in which the spatial distribution changes within an area.
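As a rough numerical check of the Poisson benchmark for Ripley's function (K equal to π r²), the sketch below estimates K(r) for uniformly scattered points. It is a toy illustration under stated assumptions: the function name `ripley_k_torus` is hypothetical, the domain is the unit square with periodic (torus) distances so that no edge correction is needed, and the number of points is fixed rather than Poisson-distributed.

```python
import math
import random

def ripley_k_torus(points, r, width=1.0, height=1.0):
    """Naive estimate of Ripley's K(r) on a rectangle with periodic boundaries.

    Wrapping distances around the edges (a torus) sidesteps edge effects,
    so no edge-correction weights are needed in this toy setting.
    """
    n = len(points)
    area = width * height
    pairs = 0
    for i in range(n):
        xi, yi = points[i]
        for j in range(i + 1, n):
            xj, yj = points[j]
            dx = abs(xi - xj)
            dy = abs(yi - yj)
            dx = min(dx, width - dx)   # shortest separation around the torus
            dy = min(dy, height - dy)
            if dx * dx + dy * dy <= r * r:
                pairs += 1
    # Each unordered pair counts twice (once from each end point).
    return area * 2 * pairs / (n * (n - 1))

# Uniform random points with a fixed seed (illustrative values only).
rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(500)]

for r in (0.02, 0.05, 0.1):
    k_hat = ripley_k_torus(points, r)
    print(f"r={r:.2f}  K_hat={k_hat:.5f}  pi*r^2={math.pi * r * r:.5f}")
```

For clustered data the estimate would rise above π r² at the scales where clustering occurs, which is the kind of departure the second moment is meant to detect.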
The Two-Point Correlation Function
• The two-point correlation function g(r), defined by
$$\rho_2(x_1, x_2)\, dV(x_1)\, dV(x_2) = \rho^2\, g(r)\, dV(x_1)\, dV(x_2),$$
describes the probability of finding a point in volume dV(x1) and another point in dV(x2) at distance r = |x1-x2|;
• For a Poisson process g(r)=1;
• Edge corrections (Ripley’s Weights): take a circle centred on point x passing through another point y. If the circle lies entirely within the domain, D, the point y is counted once. If a proportion p(x,y) of the circle lies within D, the point is counted as 1/p points.
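The edge-correction weight 1/p can be approximated numerically. The sketch below is an illustration only and assumes a rectangular domain, whereas the talk's domain is the US border, where the intersection is more involved. The function name `ripley_weight` and the angular sampling scheme are assumptions of this example.

```python
import math

def ripley_weight(x, y, xmin=0.0, ymin=0.0, xmax=1.0, ymax=1.0, samples=3600):
    """Approximate Ripley's edge-correction weight 1/p for the pair (x, y).

    p is the fraction of the circle centred on x and passing through y that
    lies inside the rectangular domain; it is estimated here by sampling
    points evenly along the circle.
    """
    cx, cy = x
    radius = math.hypot(y[0] - cx, y[1] - cy)
    inside = 0
    for k in range(samples):
        # Half-step offset so no sample sits exactly on an axis direction.
        theta = 2.0 * math.pi * (k + 0.5) / samples
        px = cx + radius * math.cos(theta)
        py = cy + radius * math.sin(theta)
        if xmin <= px <= xmax and ymin <= py <= ymax:
            inside += 1
    if inside == 0:
        raise ValueError("no sampled part of the circle lies inside the domain")
    return samples / inside   # equals 1 / p

# Illustrative cases on the unit square.
print(ripley_weight((0.5, 0.5), (0.6, 0.5)))  # circle fully inside the domain
print(ripley_weight((0.5, 0.0), (0.6, 0.0)))  # centre on an edge
print(ripley_weight((0.0, 0.0), (0.1, 0.0)))  # centre on a corner
```

Points near the border see only part of their neighbourhood, so each observed neighbour is weighted up to stand in for the unobserved part of the circle.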
Computation of the Two-Point Correlation Function
Intersection with border gives more than one polygon
Geographical range at which the two-point correlation function can be approximated by a power-law
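One simple way to check a power-law range is a straight-line fit in log-log coordinates. The sketch below is an assumption-laden illustration, not the fitting procedure used in the talk: the function name `fit_power_law` is hypothetical, the data are synthetic, and ordinary least squares on logarithms is only one of several possible estimators.

```python
import math

def fit_power_law(r_values, g_values):
    """Least-squares fit of log g = log(a) - gamma * log r.

    Returns (a, gamma) for the model g(r) ~ a * r**(-gamma).
    """
    xs = [math.log(r) for r in r_values]
    ys = [math.log(g) for g in g_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic values, not data from the talk: g(r) = 3 * r**(-1.2).
r_values = [10.0, 20.0, 50.0, 100.0, 200.0, 500.0]
g_values = [3.0 * r ** (-1.2) for r in r_values]

a, gamma = fit_power_law(r_values, g_values)
print(f"a = {a:.3f}, gamma = {gamma:.3f}")
```

In practice the fit would be restricted to the range of r over which the log-log plot looks linear, and the choice of that range affects the estimated exponent.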
Two-Point Correlation Function
Speculation: knowledge diffusion?
Speculation: Universality?
To find out more
• http://www.casa.ucl.ac.uk/
• Spatially Embedded Complex Systems Engineering (SECSE): http://www.secse.net/
members: UCL, Leeds, Southampton, Sussex
Plot of state R&D expenditure (NSF) vs population
Poisson Point Process
• We say that a spatial process is completely random iff:
– The number of events in any planar region A with area |A| follows a Poisson distribution with mean λ |A|, where λ is the density of points;
– For any two disjoint regions A and B, the random variables N(A) and N(B) are independent.
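The two defining properties suggest a direct way to simulate such a process on a rectangle: draw the total count from a Poisson distribution with mean λ |A|, then place that many points uniformly. The sketch below is a minimal illustration. The function names, the parameter values and the use of Knuth's multiplication method for the Poisson draw are choices made for this example.

```python
import math
import random

def sample_poisson(mean, rng):
    """Draw a Poisson variate by Knuth's multiplication method (fine for modest means)."""
    limit = math.exp(-mean)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

def simulate_poisson_process(lam, width, height, rng):
    """Simulate a homogeneous Poisson process of density lam on a rectangle."""
    n = sample_poisson(lam * width * height, rng)
    return [(rng.random() * width, rng.random() * height) for _ in range(n)]

rng = random.Random(0)
lam, width, height = 50.0, 2.0, 1.0   # illustrative values only

# The left and right halves of the rectangle are disjoint regions A and B.
left_counts, right_counts = [], []
for _ in range(2000):
    pts = simulate_poisson_process(lam, width, height, rng)
    n_left = sum(1 for p in pts if p[0] < width / 2)
    left_counts.append(n_left)
    right_counts.append(len(pts) - n_left)

mean_left = sum(left_counts) / len(left_counts)
var_left = sum((c - mean_left) ** 2 for c in left_counts) / (len(left_counts) - 1)
print(f"mean N(A) = {mean_left:.2f}, variance = {var_left:.2f}, "
      f"expected both near {lam * width * height / 2:.2f}")
```

For a Poisson count the mean and variance coincide, so comparing the two printed values gives a quick sanity check on the simulation.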