Shivkumar Kalyanaraman, Rensselaer Polytechnic Institute
Graphing to visualize data
Satish [email protected]
Shiv Kalyanaraman (Google: “Shiv RPI”)
[email protected]
http://www.ecse.rpi.edu/Homepages/shivkuma
Overview
  Issues with graphing
  Types of graphs
  Examples of graph usage and what you get out of them
  Art: how to choose what graph to use?
  Graphing tools
  Pitfalls and mistakes in graphing
  Advanced: visualization
  In-class work: reviewing graphing use in selected technical papers
Thoughts on Presentation Styles
Primary purpose: illustrate to help understand
“The goal of simulation is intuition, not numbers.” - R.W. Hamming
Corollary: don’t dump data on the reader. Distill it into presentations that give insight instead…
Descriptive Statistics
Involves:
  Collecting data
  Presenting data
  Characterizing data
  Understanding data: distill insights!
Statistics obtained from data: mean $\bar{X} = 30.5$, variance $S^2 = 113$
[Figure: chart of $ (axis 0 to 50) for quarters Q1 to Q4]
Insights: somewhat skewed bell shape; perhaps a Poisson distribution would fit?
To graph or not to graph
Use graphs when:
  Trends in the data are not obvious
  It is hard to explain the X-Y relationship in words
Consider tables if:
  The number of data points is small
  The reader might find the exact values of the data points useful
Summary Table: Frequencies
1. Lists categories and the number of elements in each category
2. Obtained by tallying responses in each category
3. May show frequencies (counts), %, or both

Each row is a category:

Major        Count
Accounting   130
Economics    20
Management   50
Total        200
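A minimal sketch of how such a table could be built by tallying, using only the Python standard library. The raw response list is hypothetical and is constructed so that its tally matches the counts shown on the slide; the function name is also an assumption.

```python
from collections import Counter

def frequency_table(responses):
    """Tally responses and return rows of (category, count, percent)."""
    counts = Counter(responses)
    total = sum(counts.values())
    rows = [(cat, n, 100.0 * n / total) for cat, n in counts.most_common()]
    rows.append(("Total", total, 100.0))
    return rows

# Hypothetical raw responses, built to match the counts in the table.
responses = ["Accounting"] * 130 + ["Economics"] * 20 + ["Management"] * 50
table = frequency_table(responses)
for category, count, percent in table:
    print(f"{category:<12}{count:>6}{percent:>7.1f}%")
```

Showing counts and percentages side by side covers point 3 of the list: the reader gets both the raw tally and the relative share.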
Example Tables from Networking
SACK (Multiple Sources)
LT-TCP (Multiple Sources)
What kind of graph?
  Pie charts to depict a “fraction of a whole”
  Bar charts when data points are few and a table is not suitable
  Line plots when there are a lot of data points
  Box plots if statistical inference is drawn: they show the 1st, 2nd, and 3rd quartiles for each point
  Scatter plots and 3-D plots only if necessary
  AVOID complex graphs
Pie Chart
1. Shows breakdown of a quantity into categories
2. Useful for showing relative differences
3. Angle size = (360°) x (percent)
[Figure: pie chart of majors: Acct. 65%, Mgmt. 25%, Econ. 10%. The Econ. slice spans (360°) (10%) = 36°.]
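The angle rule, (360°) x (percent), can be checked with a short sketch. The percentages are the ones on the slide; the function name is an assumption made for illustration.

```python
def pie_angles(percents):
    """Map each category's percent share to its slice angle in degrees."""
    total = sum(percents.values())
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"percent shares sum to {total}, expected 100")
    return {name: 360.0 * pct / 100.0 for name, pct in percents.items()}

majors = {"Acct.": 65, "Mgmt.": 25, "Econ.": 10}
angles = pie_angles(majors)
for name, angle in angles.items():
    print(f"{name:<6} {angle:5.1f} degrees")
```

The check on the total guards against a common slip: shares that do not add up to a whole cannot be drawn as a pie.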
Pie Chart Networking Example
Source: http://www.caida.org/~bhuffake/papers/skitviz/
Another example: VPN classification
Bar Chart
  Horizontal bars for categorical variables
  Bar length shows frequency or %
  1/2 to 1 bar width
  Equal bar widths
  Zero point
  Percent used also
[Figure: horizontal bar chart. Horizontal axis: Frequency (0 to 150). Vertical axis: Major (Acct., Econ., Mgmt.).]
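As a rough illustration of these conventions (equal bar widths, a common zero point, length proportional to frequency), here is a text-only sketch that needs no plotting library. The counts are the ones from the majors table; the function name and the `#` rendering are assumptions.

```python
def text_bar_chart(counts, width=40):
    """Render horizontal bars whose lengths are proportional to the counts.

    Every bar starts at zero, so bar length alone encodes the value.
    """
    largest = max(counts.values())
    label_width = max(len(label) for label in counts)
    lines = []
    for label, value in counts.items():
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return lines

major_counts = {"Acct.": 130, "Econ.": 20, "Mgmt.": 50}
chart = text_bar_chart(major_counts)
print("\n".join(chart))
```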
Networking Example Bar Chart
Example Analysis with Bar Charts
LT-TCP is able to:
  reduce timeouts drastically
  keep the queue non-empty, maximizing throughput and capacity utilization
  minimize use of FEC to the level needed
Histogram: for “distributions”
  Bars touch
  The horizontal axis marks the lower boundary of each class
  The vertical axis shows the count (frequency, relative frequency, or percent)

Class          Freq.
15 but < 25    3
25 but < 35    5
35 but < 45    2

[Figure: histogram. Horizontal axis: lower class boundaries 0, 15, 25, 35, 45, 55. Vertical axis: Count, 0 to 5.]
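A small sketch of the class-counting step behind a histogram, where each class includes its lower boundary and excludes its upper one ("15 but < 25"). The observations are hypothetical, chosen only so that the class frequencies come out as 3, 5 and 2.

```python
def class_frequencies(values, lower, width, n_classes):
    """Count values per class, where class i covers
    lower + i*width <= x < lower + (i+1)*width."""
    counts = [0] * n_classes
    for x in values:
        index = int((x - lower) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts

# Hypothetical observations chosen to reproduce the slide's frequencies.
data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
freqs = class_frequencies(data, lower=15, width=10, n_classes=3)
for i, count in enumerate(freqs):
    low = 15 + 10 * i
    print(f"{low} but < {low + 10}: {count}")
```

Values outside the covered range are silently skipped in this sketch; a real analysis would probably want to report them.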
Recall: Real Example Histogram
  What is the fairness between TCP goodputs when we use different queuing policies?
  What is the confidence interval around your estimates of mean file size?
  Note: a “distribution” need not be just a probability/frequency distribution
[Figure: two panels of throughput (Mbps) versus flow number (1 to 31). FQ: vertical axis 0 to 2. RED: vertical axis 0 to 10.]
Dot Chart or Scatterplots
  Horizontal lines for categorical variables
  Line length shows frequency or %
  Equal spacing
  Like a horizontal bar chart
  Zero point
  Percent used also
[Figure: dot chart. Horizontal axis: Frequency (0 to 150). Vertical axis: Major (Acct., Econ., Mgmt.).]
Scatter Plots
Scatter plots with trends
WiFi Analysis: Scatter Plots
http://www.sigcomm.org/sigcomm2004/papers/p442-aguayo1111.pdf
Line Charts: Example
Comparative Performance
Note: also plots confidence intervals!
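One way the error bars for such a line chart might be computed is sketched below. It uses the normal approximation from the Python standard library, which is an assumption of this sketch rather than the method behind the plotted example; the throughput samples are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(samples, confidence=0.95):
    """Return (mean, half_width) using the normal approximation.

    For small sample counts a Student-t quantile would give a wider,
    more appropriate interval; the normal quantile keeps this stdlib-only.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * stdev(samples) / n ** 0.5
    return mean(samples), half_width

# Hypothetical throughput (Mbps) from repeated runs at one x-axis point.
runs = [9.1, 9.4, 8.8, 9.0, 9.3, 9.2]
center, half = mean_confidence_interval(runs)
print(f"{center:.2f} +/- {half:.2f} Mbps")
```

Each plotted point would then be drawn at `center` with an error bar extending `half` above and below it.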
Line Plots for Distributions: Example
Hop count and RTT distributions
Source: http://www.caida.org/~bhuffake/papers/skitviz/
Recall: Distribution Shape
1. Describes how data are distributed
2. Measures of shape: skew (lack of symmetry)
  Left-skewed: Mean < Median < Mode
  Symmetric: Mean = Median = Mode
  Right-skewed: Mode < Median < Mean
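The ordering of mean and median gives a quick, rough shape check. The sketch below is a heuristic under that assumption, not a formal skewness measure, and all three data sets are hypothetical.

```python
from statistics import mean, median

def describe_shape(values, tolerance=0.0):
    """Classify shape from the mean/median ordering (a rough heuristic)."""
    gap = mean(values) - median(values)
    if gap > tolerance:
        return "right-skewed"
    if gap < -tolerance:
        return "left-skewed"
    return "symmetric"

# Hypothetical data sets: one balanced, one with a long right tail,
# one with a long left tail.
balanced = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_tail = [1, 2, 2, 3, 3, 3, 4, 4, 20]
left_tail = [-14, 2, 2, 3, 3, 3, 4, 4, 5]
for name, values in [("balanced", balanced), ("right_tail", right_tail),
                     ("left_tail", left_tail)]:
    print(name, "->", describe_shape(values))
```

With real measurements the mean and median rarely match exactly, so a non-zero `tolerance` is usually needed before calling a data set symmetric.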
Box Plot
Graphical display of data using the 5-number summary: $X_{smallest}$, $Q_1$, median, $Q_3$, $X_{largest}$
[Figure: box plot on an axis from 4 to 12]
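A minimal sketch of computing the 5-number summary that a box plot displays, using the Python standard library. The sample is hypothetical, and the quartile convention is the library's default, which is an assumption; other conventions can shift Q1 and Q3 slightly.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest).

    statistics.quantiles uses its default "exclusive" method here; other
    quartile conventions can give slightly different Q1 and Q3.
    """
    q1, q2, q3 = quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)

# Hypothetical sample spanning the 4-to-12 axis range of the figure.
sample = [4, 5, 6, 7, 8, 9, 10, 11, 12]
summary = five_number_summary(sample)
print(summary)
```

The box spans Q1 to Q3 with a line at the median, and the whiskers reach the smallest and largest values.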
3D Graphs Example
Illustrates a complex parameter response surface ...
3D Plots: Network Example: Code Red Worm Analysis
http://www.prism.uvsq.fr/users/qst/Tomography/Articles_jmf/renesys_bgp_instabilities2001.pdf
http://www.caida.org/outreach/isma/0112/talks/andyo/index.pdf
http://www.renesys.com/resource_library/Renesys-NANOG23.pdf
Tools: Gnuplot
  Use with data-generating programs for repetitive plotting
  E.g., generate the plot of throughput for every 1-hour interval in the last week
  http://www.gnuplot.info
  TIP: Export gnuplot plots as “.fig” files and edit them in xfig for greater flexibility
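Repetitive plotting of this kind is usually scripted. The sketch below generates one gnuplot script per 1-hour window for a week. The data file name, its two-column layout (seconds since the start of the trace, then throughput), the output names and the `png` terminal are all assumptions; it only builds the script text and does not run gnuplot.

```python
def hourly_plot_scripts(data_file, hours, terminal="png"):
    """Build one gnuplot script per 1-hour window.

    Assumes data_file has two columns: time in seconds since the start of
    the trace, and throughput. File names and labels are hypothetical.
    """
    scripts = {}
    for hour in range(hours):
        start, end = hour * 3600, (hour + 1) * 3600
        output = f"throughput_hour{hour:03d}.{terminal}"
        scripts[output] = "\n".join([
            f"set terminal {terminal}",
            f'set output "{output}"',
            'set xlabel "Time (s)"',
            'set ylabel "Throughput (Mbps)"',
            f"set xrange [{start}:{end}]",
            f'plot "{data_file}" using 1:2 with lines title "hour {hour}"',
        ])
    return scripts

scripts = hourly_plot_scripts("throughput.dat", hours=7 * 24)
print(len(scripts), "scripts generated")
print(scripts["throughput_hour000.png"])
```

Each script can be saved to a file and passed to `gnuplot` on the command line. Whether a `fig` terminal is available for the xfig workflow depends on how gnuplot was built.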
Tools: XmGrace
  For more intricate details (e.g., creating error bars, different shades for bar charts); GUI-driven and very user-friendly
  http://plasma-gate.weizmann.ac.il/Grace/
  Exports images to EPS (good for LaTeX documents), PNG (good for PowerPoint), etc.
  Can also run on Windows on top of Cygwin!
Tools: MATLAB
  For complex 3-D plots and statistical plots such as box plots and scatter plots, and in general when enormous quantities of data are involved
  http://www.mathworks.com
Tools: Excel Data Presentations
Open up Excel to a new worksheet. Code a data set as below:
Blue 34
White 68
Red 25
Green 50
Explore simple data presentation possibilities…
Graphs: things to watch out for
  Purpose: illustrate the entire time series or response distribution
  Label the x- and y-axes
  Check what units the x- and y-axes are in (not “goats” or “sheep”!)
  Check whether either scale is logarithmic (this changes the meaning)
  Check where the origin (or zero point) is for each axis!
  After understanding WHAT is being plotted, close your eyes and ask: what would different patterns on this graph imply (relative to what I want to understand)?
  See whether the relative performance is over- or under-emphasized (if two systems are being compared)
  Several examples are in the Jain textbook
Errors in Presenting Data
1. Using ‘Chart Junk’
2. No Relative Basis in Comparing Data Batches
3. Compressing the Vertical Axis
4. No Zero Point on the Vertical Axis
‘Chart Junk’
[Figure: minimum wage shown two ways, labeled bad presentation and good presentation. Values: 1960: $1.00; 1970: $1.60; 1980: $3.10; 1990: $3.80. Axes: $ (0 to 4) versus year (1960 to 1990).]
No Relative Basis
[Figure: A’s by class (FR, SO, JR, SR). Bad presentation: raw frequency on the vertical axis (0 to 300). Good presentation: percentage on the vertical axis (0% to 30%).]
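To see why raw frequencies can mislead when batches differ in size, consider the sketch below. The numbers of A's and the class sizes are hypothetical and are not taken from the figure; they are chosen so that the raw counts and the percentages rank the classes in opposite orders.

```python
def percent_of_batch(counts, batch_sizes):
    """Convert raw counts to percentages of each batch's size."""
    return {k: 100.0 * counts[k] / batch_sizes[k] for k in counts}

# Hypothetical numbers of A's and class sizes; not taken from the slide.
a_grades = {"FR": 240, "SO": 180, "JR": 120, "SR": 90}
class_size = {"FR": 1500, "SO": 900, "JR": 480, "SR": 300}

shares = percent_of_batch(a_grades, class_size)
for cls in a_grades:
    print(f"{cls}: {a_grades[cls]:>4} A's, {shares[cls]:4.1f}% of class")
```

Here freshmen earn the most A's in absolute terms but the smallest share, which only the percentage view reveals.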
Compressing Vertical Axis
[Figure: quarterly sales ($) for Q1 to Q4. Bad presentation: vertical axis from 0 to 200. Good presentation: vertical axis from 0 to 50.]
No Zero Point on Vertical Axis
[Figure: monthly sales ($) for J, M, M, J, S, N. Bad presentation: vertical axis from 36 to 45. Good presentation: vertical axis from 0 to 60.]
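The effect of dropping the zero point can be quantified with a small sketch. The two sales values are hypothetical; 36 is used as the axis start because it is the lowest tick on the truncated axis.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of drawn bar heights for values a and b when the vertical
    axis starts at axis_start instead of zero."""
    if min(a, b) <= axis_start:
        raise ValueError("both values must lie above the axis start")
    return (a - axis_start) / (b - axis_start)

# Hypothetical monthly sales; 36 is the lowest tick of the truncated axis.
high, low = 42.0, 39.0
true_ratio = apparent_ratio(high, low)          # axis starts at 0
drawn_ratio = apparent_ratio(high, low, 36.0)   # axis starts at 36
print(f"true ratio:  {true_ratio:.2f}")
print(f"drawn ratio: {drawn_ratio:.2f}")
```

A difference of under 8% in the data is drawn as one bar twice the height of the other once the axis starts at 36.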
Graphing Practices: In pictures
Checklist: In textbook
More Complex Visualizations
Internet topology aspects: CAIDA skitter project
http://www.caida.org/tools/measurement/skitter/visualizations.xml
The End