
Visualizing data · y = b + mx. Log-log plots 10 100 1000 10000 1 100 10000 1000000 100000000...

Date post: 23-Apr-2020
Visualizing data Why it's important How to do it well
Transcript
Page 1

Visualizing data

Why it's important
How to do it well

Page 2

Comparison is the primary occupation of scientists

● Compare treatment groups to control
● Compare treatments to one another
● Correct conclusions are only possible when the correct comparison is made
● Visualizing experimental data makes comparison easy

Page 3

Not just pretty pictures

● Visualization is an important part of correct data analysis
▬ Deriving knowledge from information
▬ Deriving meaning from data, especially large data sets

● Visualization is an important part of communicating results to others
▬ We're visual creatures
▬ A picture is worth a thousand words

Page 4

Example: what is the relationship between mass and wing length?

● Expect bigger birds to have bigger wing spans

● Flat line – mass is not related to wing length in this bird

● What's wrong with this picture?

Page 5

Grouped by sex

● Group by sex, and fit a line to each sex's data swarm

● Now the pattern is apparent

Page 6

Using graphs to understand nature of relationship between variables

● You can use graphical methods to get an idea of the functional relationship between variables

● If we want to predict how a response variable changes when we make a change in a predictor, we need to know the correct functional form

● Different functions are straight lines on plots with logarithmic axes

▬ Log-log plots – both x and y are on log scales; power functions will be linear

▬ Semi-log plots – one linear axis, one log-scale axis
– Exponential relationships are linear when the y-axis is log-scale
– Logarithmic relationships are linear when the x-axis is log-scale
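
The rules above can be turned into a rough numerical check: transform the axes each way, fit a straight line, and see which transformation makes the data most linear. The following is only a sketch under stated assumptions. The function names (`r_squared`, `diagnose`) and the sample data are invented for illustration and do not come from the slides, and all values are assumed to be positive so that logs are defined.

```python
import math

def r_squared(xs, ys):
    """Squared Pearson correlation: 1.0 means the points lie on a straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

def diagnose(xs, ys):
    """Return the axis scaling under which the data look most linear."""
    log_x = [math.log10(x) for x in xs]
    log_y = [math.log10(y) for y in ys]
    scores = {
        "linear (plain axes)": r_squared(xs, ys),
        "power (log-log)": r_squared(log_x, log_y),
        "exponential (log y)": r_squared(xs, log_y),
        "logarithmic (log x)": r_squared(log_x, ys),
    }
    return max(scores, key=scores.get), scores

# Made-up data that follow y = 3 * x^2
xs = [1, 2, 5, 10, 20, 50, 100]
ys = [3 * x ** 2 for x in xs]
best, scores = diagnose(xs, ys)
print(best)
for name, score in scores.items():
    print(f"{name:22s} R^2 = {score:.4f}")
```

Note that a proportional relationship (a power function with exponent 1 and no intercept) is a straight line on both plain and log-log axes, so the two scores can tie. A plot is still the better way to judge curvature.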

Page 7

Three power function relationships between x and y

Y = aX^b

[Figure: three panels of Y against X on linear axes: "X and Y linearly related" (Y = aX^1), "Y = aX^2", and "Y = aX^3"]

Page 8

Log Y scales linearly with log X

Y = aX^b

log(Y) = log(a) + b log(X)

This matches the straight-line form y = b + mx, with log(Y) as y, log(X) as x, log(a) as the intercept, and the exponent b as the slope m
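
As a worked illustration of this identity, a straight-line fit to log(X) and log(Y) recovers the power function: the slope is the exponent and the intercept is log(a). This is a minimal sketch using ordinary least squares. The helper names (`fit_line`, `fit_power`) and the data are hypothetical, and base-10 logs are an assumption.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_power(xs, ys):
    """Fit Y = a * X^b by regressing log10(Y) on log10(X)."""
    intercept, slope = fit_line([math.log10(x) for x in xs],
                                [math.log10(y) for y in ys])
    return 10 ** intercept, slope  # a = 10^intercept, b = slope

# Made-up data that follow Y = 2 * X^3
xs = [10, 100, 1000, 10000]
ys = [2 * x ** 3 for x in xs]
a, b = fit_power(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
```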

Page 9

Log-log plots

[Figure: "Three functions on a log-log scale". Axes: Log (X) and Log (Y). Series: Linear (slope = 1), Squared (slope = 2), Cubed (slope = 3)]

Both axes on a log scale

If data are a straight line on a log-log plot, then the relationship is a power function

Page 10

Exponential relationships

Y = a·10^(bX)

[Figure: two panels of Y against X on linear axes, one with b = 2 and one with b = 3]

Page 11

Log of Y is linear with X

Y = a·10^(bX)

log(Y) = log(a) + bX
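
The same idea gives a way to estimate an exponential relationship: regress log(Y) on the untransformed X, then read b from the slope and a from the intercept. This is a sketch under assumptions. The function name `fit_exponential` and the data are made up, and base-10 logs are used to match the 10^(bX) form.

```python
import math

def fit_exponential(xs, ys):
    """Fit Y = a * 10^(b*X) by least squares on (X, log10(Y))."""
    log_ys = [math.log10(y) for y in ys]
    n = len(xs)
    mx, my = sum(xs) / n, sum(log_ys) / n
    sxy = sum((x - mx) * (ly - my) for x, ly in zip(xs, log_ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx              # slope of the semi-log line
    a = 10 ** (my - b * mx)    # intercept is log10(a)
    return a, b

# Made-up data that follow Y = 5 * 10^(2X)
xs = [0, 2, 4, 6, 8, 10]
ys = [5 * 10 ** (2 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
```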

Page 12

On a semi-log plot – y-axis on log scale

[Figure: "All three on a semi-log plot". X on a linear axis, Y on a log axis. Series: Linear, b = 2, b = 3]

X is on a linear scale

Page 13

Logarithmic relationships

10^Y = aX^b

Page 14

Y is related to the log of X

10^Y = aX^b

Y = log(a) + b log(X)

Y is on a linear scale, X is logarithmic
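
For a logarithmic relationship the roles are reversed: Y stays on its original scale and only X is logged before fitting the line. The sketch below assumes base-10 logs. The name `fit_logarithmic` and the sample data are illustrative, not from the slides.

```python
import math

def fit_logarithmic(xs, ys):
    """Fit Y = log10(a) + b * log10(X) by least squares on (log10(X), Y)."""
    log_xs = [math.log10(x) for x in xs]
    n = len(ys)
    mx, my = sum(log_xs) / n, sum(ys) / n
    sxy = sum((lx - mx) * (y - my) for lx, y in zip(log_xs, ys))
    sxx = sum((lx - mx) ** 2 for lx in log_xs)
    b = sxy / sxx              # slope against log10(X)
    a = 10 ** (my - b * mx)    # intercept is log10(a)
    return a, b

# Made-up data with a = 100 and b = 0.5, so Y = 2 + 0.5 * log10(X)
xs = [1, 10, 100, 1000]
ys = [math.log10(100) + 0.5 * math.log10(x) for x in xs]
a, b = fit_logarithmic(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
```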

Page 15

Wrong axis scales for the data lead to curved lines

[Figure: "Power functions in a semi-log plot". Series: Linear, Squared, Cubed]

[Figure: "Exponential functions in a log-log plot". Axes: X and Y. Series: Linear, b = 2, b = 3]

Thus, changing the scale from linear to logarithmic can help you diagnose the functional relationship between variables

Once we know the relationship, we can have Excel give us the equation for the line

Page 16

Common graph types

● The graph you should use depends on the type of data you will display

● Common graph types (i.e. those supported by Excel) cover most of the basic data display tasks

● Less common, task-specific graph types (not supported by Excel) are used in various fields of biology

▬ Statistical packages
▬ Dedicated graphing software

Page 17

Excel's graph types

Graph type   Use
Column       A numeric variable plotted at levels of a categorical variable
Bar          A horizontal column chart
Histogram    Displaying distribution of data (not a distinct graph type in Excel)
Line         Values of a numeric variable displayed at the same levels of a categorical variable
Pie          Proportions, percentages
Area         Line graph with the area below the lines shaded
Scatter      Relationship between two numeric variables
Surface      A three-dimensional surface, with categorical x, y and numeric z
Bubble       A scatter plot with symbol size set to display a third variable
Radar        Each numeric variable is a ray, each observation is plotted on each ray, with points connected

Page 18

Column chart

One categorical variable (grouping)
One numeric variable (response)

Data in Excel

Pivot table for graphing

Graph of pivot table data
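
The pivot step can be sketched outside Excel as well: group the raw rows by the categorical variable and summarize the response for each group. The example below is an assumption-laden illustration. The data are invented, and using the mean as the summary is a guess, since the slide does not say which statistic the pivot table computes.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: one (group, response) pair per row
rows = [("control", 4.0), ("control", 6.0),
        ("treated", 9.0), ("treated", 11.0)]

def pivot_means(rows):
    """Mean response for each level of the grouping variable."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {group: mean(values) for group, values in groups.items()}

summary = pivot_means(rows)

# Crude text version of the column chart: one bar per group
for group, avg in summary.items():
    print(f"{group:10s} {'#' * round(avg)} {avg}")
```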

Page 19

Grouped column chart

Two categorical variables (treatment group, plant type)
One numeric variable (response)

Data in Excel

Pivot table for graphing

Graph of pivot table data
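
With two categorical variables the pivot table becomes two-way: one variable defines the clusters of columns and the other defines the columns within each cluster. The following sketch assumes the pivot table holds cell means. The treatment names, plant types, and values are all made up.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: (treatment group, plant type, response)
rows = [
    ("control", "grass", 3.0), ("control", "grass", 5.0),
    ("control", "shrub", 6.0),
    ("fertilized", "grass", 8.0),
    ("fertilized", "shrub", 10.0), ("fertilized", "shrub", 12.0),
]

def pivot_two_way(rows):
    """Mean response for each (treatment, plant type) combination."""
    cells = defaultdict(list)
    for treatment, plant, value in rows:
        cells[(treatment, plant)].append(value)
    table = defaultdict(dict)
    for (treatment, plant), values in cells.items():
        table[treatment][plant] = mean(values)
    return dict(table)

table = pivot_two_way(rows)
for treatment, by_plant in table.items():
    print(treatment, by_plant)
```

Each outer key corresponds to one cluster on the x-axis, and each inner key to one column series in the legend.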

Page 20

Histogram

Data in Excel

Bins, midpoints, and frequencies

Bar chart of frequencies, no gap between bars

Excel is not a good choice for histograms – use MINITAB, or equivalent
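
The bins, midpoints, and frequencies behind a histogram are simple to compute directly. This is a minimal sketch with equal-width bins. The function name `histogram`, its parameters, and the data are assumptions for illustration, and every value is assumed to be at or above `start`.

```python
def histogram(values, bin_width, start):
    """Return (edges, midpoints, frequencies) for equal-width bins.

    A value v falls in bin i when start + i*w <= v < start + (i+1)*w.
    """
    n_bins = int((max(values) - start) // bin_width) + 1
    freqs = [0] * n_bins
    for v in values:
        freqs[int((v - start) // bin_width)] += 1
    edges = [start + i * bin_width for i in range(n_bins + 1)]
    midpoints = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
    return edges, midpoints, freqs

# Made-up measurements
values = [1, 2, 2, 3, 5, 6, 6, 7, 9]
edges, mids, freqs = histogram(values, bin_width=2, start=0)

# One text bar per bin, labeled with the bin midpoint
for mid, count in zip(mids, freqs):
    print(f"{mid:4.1f} {'#' * count}")
```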

Page 21

Line

Data in Excel

Graph

X axis can be numbers, dates, ordinal categories

But regardless, the axis is treated as a categorical axis

Order on the graph is the same as the order in the sheet

Page 22

Line graphs are easy to misuse

Data in Excel

Graph

X categories are not in order

Amount of spacing between them is not even, but they're plotted as though the spacing is the same
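
The problem can be made concrete by comparing where each point lands on the x-axis under the two chart types. The sketch below uses invented data. Sorting the rows for the scatter plot is an extra step added here for illustration only.

```python
# Hypothetical sheet: (day, value) rows, unevenly spaced and out of order
sheet = [(10, 2.5), (1, 1.0), (2, 1.4), (5, 2.0)]

# Line chart: the x position is just the row's position in the sheet
line_chart_x = list(range(len(sheet)))

# Scatter plot: the x position is the numeric value itself
scatter_points = sorted(sheet)
scatter_x = [day for day, _ in scatter_points]

gaps_line = [b - a for a, b in zip(line_chart_x, line_chart_x[1:])]
gaps_scatter = [b - a for a, b in zip(scatter_x, scatter_x[1:])]

print("line chart order:", [day for day, _ in sheet], "gaps:", gaps_line)
print("scatter order:   ", scatter_x, "gaps:", gaps_scatter)
```

The line chart spaces every point one step apart in sheet order, while the scatter plot preserves the real distances between days.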

Page 23

Scatter

Without connecting lines
With connecting lines

Page 24

Line graphs vs. scatter plots with connected points?

Line graph
Scatter plot

Page 25

Pie chart

Data in Excel

Graph of percentages

Proportions of total
Percentages
Relative frequencies

Counts will be converted to proportions of the total (okay)

Any other numbers used will be converted to proportions (not okay)
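
The conversion a pie chart applies is just each value divided by the total, which is why it only makes sense for counts or other parts of a whole. A small sketch with made-up numbers follows. The `proportions` helper is hypothetical.

```python
def proportions(values):
    """Convert each value to its share of the total, as a pie chart does."""
    total = sum(values.values())
    return {label: value / total for label, value in values.items()}

# Counts: the shares are meaningful (okay)
flower_counts = {"red": 10, "white": 30, "pink": 60}
shares = proportions(flower_counts)
print(shares)

# Group means: the "shares" are computed the same way but mean nothing (not okay)
mean_heights_cm = {"control": 10.0, "treated": 30.0}
bogus = proportions(mean_heights_cm)
print(bogus)
```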

Page 26

High order data sets

● Some data sets have many different variables
● Flat screens only have 2 dimensions – two variables are easy to display
● Adding variables means adding dimensions we don't have
▬ 3-d graphs use depth cueing (perspective tricks)
▬ Plot “slices” through the data
▬ Use symbols/lines

Page 27

Surface plot – depth cueing

Data in Excel

Graph of values in body of matrix

Three dimensions
X and Y are row and column labels, treated as categorical
Z is numeric, in body of matrix

Page 28

Plotting slices across a third variable

● Can group data based on levels of a third variable

● You can see the effect of grouping by comparing the groups

● Example: plotting the length, width, area data as a set of lines

Page 29

Slices through the data – lines defined by lengths

Data in Excel

Each length is a series, width on x-axis

Line colors selected to indicate lengths
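
Arranging the data as slices amounts to building one series per level of the third variable. The sketch below assumes that area is simply length times width (a rectangle), which the slides do not state, and uses made-up lengths and widths.

```python
# Hypothetical levels of the two predictor variables
lengths = [1, 2, 3]
widths = [1, 2, 3, 4]

# One series per length: area at each width (assumes area = length * width)
series = {length: [length * w for w in widths] for length in lengths}

for length, areas in series.items():
    print(f"length = {length}: widths {widths} -> areas {areas}")
```

Each dictionary entry corresponds to one line on the chart, with width on the x-axis and a color chosen to indicate the length.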

Page 30

Symbol properties

● Can subset the data and display with:
▬ Symbol size
▬ Symbol color
▬ Symbol type

Page 31

Bubble chart

Data in Excel

Chart – symbol size is proportional to petal length

Page 32

Radar charts – multiple numeric axes

● Each ray is a different variable
● Each data point is plotted on each ray, with lines connecting
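
Geometrically, a radar chart places the rays at equal angles around a center and plots each variable's value as a distance along its ray. The sketch below computes the vertices for one observation. The function name, the choice to scale each variable by its own maximum, and the data are all assumptions made for illustration.

```python
import math

def radar_points(values, maxima):
    """Return the (x, y) vertex on each ray for one observation.

    Rays are evenly spaced; the first points straight up and the rest
    follow clockwise. Each value is scaled to 0..1 along its own ray.
    """
    n = len(values)
    points = []
    for k, (v, vmax) in enumerate(zip(values, maxima)):
        angle = math.pi / 2 - 2 * math.pi * k / n
        r = v / vmax
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points

# One made-up observation on four variables, with a maximum for each variable
points = radar_points(values=[5, 3, 4, 1], maxima=[10, 6, 4, 2])
for x, y in points:
    print(f"({x:+.2f}, {y:+.2f})")
```

Connecting the returned points in order, and closing back to the first, gives the polygon drawn for that observation.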

Page 33

Radar plot with four axes

Data in Excel

Chart

