Post on 16-Mar-2020
Computational thinking and the pedagogy of network science
Aaron Clauset
@aaronclauset
Computer Science Dept. & BioFrontiers Institute
University of Colorado, Boulder
External Faculty, Santa Fe Institute
20 June 2017
who are network scientists?
Physicists
Computer Scientists
Applied Mathematicians
Statisticians
Biologists
Ecologists
Sociologists
Political Scientists
it’s a big community!
• different traditions
• different tools
• different questions
increasingly, not ONE community, but MANY, only loosely interacting communities
who are network scientists?
Physicists: phase transitions, universality
Computer Scientists: data / algorithm oriented, predictions
Applied Mathematicians: dynamical systems, diff. eq.
Statisticians: inference, consistency, covariates
Biologists: experiments, causality, molecules
Ecologists: observation, experiments, species
Sociologists: individuals, differences, causality
Political Scientists: rationality, influence, conflict
how do we teach networks?
what are core concepts?
what are core tools?
these vary by field!
for students:
• what topics should be covered?
• how should they be taught?
• how should evaluations be structured?
• what is introductory vs. what is advanced?
http://santafe.edu/~aaronc/courses/5352/
Network Analysis and Modeling
Instructor: Aaron Clauset
This graduate-level course will examine modern techniques for analyzing and modeling the structure and dynamics of complex networks. The focus will be on statistical algorithms and methods, and both lectures and assignments will emphasize model interpretability and understanding the processes that generate real data. Applications will be drawn from computational biology and computational social science. No biological or social science training is required. (Note: this is not a scientific computing course, but there will be plenty of computing for science.)
Full lecture notes online (~150 pages in PDF)
how do we teach networks?
• graduate level Computer Science course
• 4 instances since 2013
• roughly 140 students (mostly CS, some APPM, PHYS, etc.)
• regular lectures
• 6 problem sets + 1 team project (of their choosing; team = 2-3 students)
• 150 pages of textbook style lecture notes
• textbooks:
how do we teach networks?
Network Analysis and Modeling
learning goals:
1. develop a network intuition for reasoning about how structural patterns are related, and how they influence dynamics in / on networks
2. master basic terminology and concepts
3. master practical tools for analyzing / modeling structure of network data
4. build familiarity with advanced techniques for exploring / testing hypotheses about networks
how do we teach networks?
Network Analysis and Modeling
current format (lectures):
1. network basics
2. centrality measures
3. random graphs (simple)
4. configuration model
5. large-scale structure (communities, hierarchies, etc.)
6. probabilistic generative models (SBMs)
7. metadata, label and link prediction
8. spreading processes (social, biological, SI-type)
9. data wrangling + data sampling (artifacts)
10. role of statistics in hypothesis generation / testing
11. spatial networks
12. citation networks, dynamics, preferential attachment
13. temporal networks
14. student project presentations
lecture groups (as labeled on the slide): building intuition; basic concepts, tools; practical tools; advanced tools
network data for assignments
lessons learned
what’s difficult:
1. students need to know many different things:
• some probability: Erdős-Rényi, configuration, calculations
• some mathematics: physics-style calculations, phase transitions
• some statistics: basic data analysis, correlations, distributions
• some machine learning: prediction, likelihoods, features, estimation algorithms
• some programming: data wrangling, coding up measures and algorithms
2. can’t teach all of these things to all types of students!
• vast amounts of advanced material in each of these directions
• students have little experience / intuition of what makes good science
lessons learned
what works well:
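1. simple mathematical problems: build intuition + practice with concepts
example exercises (figures omitted; labels show groups of sizes n_A and n_B and vertices A and B):
• calculate the diameter
• closeness centrality
• betweenness of A
• modularity Q(r) of a line graph split into parts of r and n − r nodes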
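The modularity exercise can be checked numerically. The sketch below is only an illustration, not the course's solution: it assumes "line graph" means n nodes in a chain, split into the first r nodes and the remaining n − r, and uses the standard definition Q = Σ_c [e_c/m − (d_c/2m)²]. The function names are hypothetical.

```python
def modularity(edges, group):
    """Q = sum over groups c of [e_c / m - (d_c / (2m))^2]."""
    m = len(edges)
    internal = {}    # e_c: edges with both endpoints in group c
    degree_sum = {}  # d_c: total degree of the nodes in group c
    for u, v in edges:
        degree_sum[group[u]] = degree_sum.get(group[u], 0) + 1
        degree_sum[group[v]] = degree_sum.get(group[v], 0) + 1
        if group[u] == group[v]:
            internal[group[u]] = internal.get(group[u], 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


def line_graph_modularity(n, r):
    """Chain of n nodes, cut into the first r nodes and the last n - r."""
    edges = [(i, i + 1) for i in range(n - 1)]
    group = [0 if i < r else 1 for i in range(n)]
    return modularity(edges, group)


def closed_form(n, r):
    """Pen-and-paper answer for 1 <= r <= n - 1, with m = n - 1 edges."""
    m = n - 1
    d1, d2 = 2 * r - 1, 2 * (n - r) - 1  # degree sums of the two parts
    return (n - 2) / m - (d1 ** 2 + d2 ** 2) / (4 * m * m)


if __name__ == "__main__":
    n = 10
    for r in range(1, n):
        print(r, round(line_graph_modularity(n, r), 4), round(closed_form(n, r), 4))
```

Comparing the two columns for each r is a quick way for students to confirm a hand calculation; for even n the score peaks at the balanced cut r = n/2.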
lessons learned
what works well:
2. analyze real networks: test understanding + practice with implementing methods
[figure: mean geodesic path length vs. network size, n; labeled points: USF, Haverford, Caltech, Penn] mean geodesics and O(log n)
[figure: harmonic centrality vs. vertex label for the Karate club; configuration model vs. real-world network] node centrality vs. configuration model (when is a pattern interesting?)
[figure: density vs. assortativity (gender)] attribute assortativity
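The "when is a pattern interesting?" comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the assignment code: the toy graph (two triangles joined by one edge) is hypothetical and stands in for the Karate club data, and the configuration model here is plain stub matching with self-loops and multi-edges dropped, which is a simplification.

```python
import random
from collections import deque


def harmonic_centrality(adj):
    """Average of 1/d(i, j) over all other nodes j; unreachable pairs add 0."""
    n = len(adj)
    scores = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        scores[s] = sum(1.0 / d for d in dist.values() if d > 0) / (n - 1)
    return scores


def configuration_model(degrees, rng):
    """Random graph with (roughly) the given degrees, via stub matching.
    Simplification: self-loops and repeated edges are discarded."""
    stubs = [node for node, k in degrees.items() for _ in range(k)]
    rng.shuffle(stubs)
    adj = {node: set() for node in degrees}
    for u, v in zip(stubs[::2], stubs[1::2]):
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


if __name__ == "__main__":
    # Hypothetical toy network: two triangles joined by the edge (2, 3).
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
    adj = {u: set() for e in edges for u in e}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    degrees = {u: len(nbrs) for u, nbrs in adj.items()}

    observed = harmonic_centrality(adj)
    rng = random.Random(0)
    draws = [harmonic_centrality(configuration_model(degrees, rng))
             for _ in range(500)]
    for u in sorted(adj):
        null = sorted(d[u] for d in draws)
        low, high = null[int(0.05 * len(null))], null[int(0.95 * len(null)) - 1]
        flag = "" if low <= observed[u] <= high else "  <- outside null range"
        print(u, round(observed[u], 3), (round(low, 3), round(high, 3)), flag)
```

A node whose observed score falls inside the null range is explained by its degree alone; only scores outside the range point to structure beyond the degree sequence.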
lessons learned
what works well:
3. simple prediction tasks: test intuition + run numerical experiments
[figure: fraction of correct label predictions vs. fraction of labels observed, f; networks: malaria genes, HVR5 and Norwegian boards, net1m-2011-08-01] label prediction via homophily
[figure: AUC vs. fraction of edges observed, f, on the HVR5 malaria genes network; predictors: degree product, Jaccard coefficient, shortest path, baseline (guessing)] link prediction via heuristic
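A link prediction experiment of this kind might look like the sketch below. It is an illustration only: the network is a small synthetic two-group graph (an assumption, since the malaria and board data are not reproduced here), only two of the heuristics named on the slide are implemented, and the AUC is computed by brute force as the chance that a held-out edge outscores a non-edge. The function names are hypothetical.

```python
import random
from itertools import combinations


def jaccard(adj, u, v):
    """Shared neighbors divided by total distinct neighbors."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def degree_product(adj, u, v):
    return len(adj[u]) * len(adj[v])


def auc(scores_pos, scores_neg):
    """Probability that a held-out edge outscores a non-edge (ties count 1/2)."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(scores_pos) * len(scores_neg))


def evaluate(edges, nodes, f, score, rng):
    """Observe a fraction f of the edges, then score held-out edges vs. non-edges."""
    edges = list(edges)
    rng.shuffle(edges)
    n_obs = int(f * len(edges))
    observed, held_out = edges[:n_obs], edges[n_obs:]
    adj = {u: set() for u in nodes}
    for u, v in observed:
        adj[u].add(v)
        adj[v].add(u)
    true_edges = {frozenset(e) for e in edges}
    non_edges = [p for p in combinations(nodes, 2)
                 if frozenset(p) not in true_edges]
    return auc([score(adj, u, v) for u, v in held_out],
               [score(adj, u, v) for u, v in non_edges])


if __name__ == "__main__":
    # Synthetic stand-in network: two groups of 15 nodes, dense inside, sparse between.
    rng = random.Random(0)
    nodes = list(range(30))
    edges = [(u, v) for u, v in combinations(nodes, 2)
             if rng.random() < (0.5 if (u < 15) == (v < 15) else 0.05)]
    for f in (0.5, 0.8):
        for name, score in (("Jaccard", jaccard), ("degree product", degree_product)):
            print(f, name, round(evaluate(edges, nodes, f, score, rng), 3))
```

An AUC near 0.5 matches the guessing baseline, so plotting AUC against f for each heuristic shows how much each one gains from more observed edges.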
lessons learned
what works well:
4. simple simulations: explore dynamics vs. structure + numerical experiments
[figure: surface plot with axes labeled c_in − c_out, p, and l] simulate epidemics (SIR) on planted partitions
[figure: Pr(K ≥ k_in) vs. in-degree, k_in, on log-log axes; curves: r=1, r=4, no preferential attachment] simulate Price’s model
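A simulation of this type could be sketched as below. The details are assumptions rather than the course's specification: two equal groups, edge probabilities p_in = c_in/n and p_out = c_out/n for n total nodes, a discrete-time SIR process in which each infected node gets one chance to infect each susceptible neighbor with probability p and then recovers, and epidemic length measured in time steps. All names and parameter values are hypothetical.

```python
import random
from itertools import combinations


def planted_partition(n, c_in, c_out, rng):
    """Two equal groups of n/2 nodes; edge probability c_in/n within a group
    and c_out/n between groups (assumed parameterization)."""
    half = n // 2
    adj = {u: set() for u in range(n)}
    for u, v in combinations(range(n), 2):
        same = (u < half) == (v < half)
        if rng.random() < (c_in if same else c_out) / n:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def simulate_sir(adj, p, seed_node, rng):
    """Return (epidemic size, epidemic length in steps) for one outbreak."""
    state = {u: "S" for u in adj}
    state[seed_node] = "I"
    infected = [seed_node]
    size, length = 1, 0
    while infected:
        newly = []
        for u in infected:
            for v in adj[u]:
                if state[v] == "S" and rng.random() < p:
                    state[v] = "I"
                    newly.append(v)
            state[u] = "R"  # each node is infectious for exactly one step
        size += len(newly)
        length += 1
        infected = newly
    return size, length


if __name__ == "__main__":
    rng = random.Random(0)
    n, c, p, trials = 200, 8.0, 0.3, 20  # c = (c_in + c_out) / 2 is held fixed
    for delta in (0.0, 8.0, 16.0):       # delta = c_in - c_out
        c_in, c_out = c + delta / 2, c - delta / 2
        sizes, lengths = [], []
        for _ in range(trials):
            adj = planted_partition(n, c_in, c_out, rng)
            s, l = simulate_sir(adj, p, rng.randrange(n), rng)
            sizes.append(s)
            lengths.append(l)
        print(delta, sum(sizes) / trials, sum(lengths) / trials)
```

Sweeping delta and p while holding the mean degree fixed isolates the effect of community structure on the outbreak, which is the "dynamics vs. structure" comparison the slide describes.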
lessons learned
what works well:
5. team projects: teamwork + exploring their own ideas
key takeaways
[figure: sequence alignment example with panels A to D and steps: calculate alignment scores; convert to alignment indicators; remove short aligned regions; extract highly variable regions]
• network intuition is hard to develop!
good intuition draws on many skills (probability, statistics, computation, causal dynamics, etc.)
• best results for students seem to come from
1. exercises to get practice with calculations
2. practice analyzing diverse real-world networks
3. carrying out numerical experiments & simulations
• students loved practical tasks (link and label prediction very popular)
• centrality measures seem useless… (why do we teach them?)
• configuration model as go-to null model for checking if pattern is "interesting"
• idea: teach concepts using link prediction as common theme?
thanks
http://santafe.edu/~aaronc/courses/5352/
fin