Data visualisations quality aspects

Date post: 03-Aug-2015
Upload: european-environment-agency
Data visualisation: DaViz, quality and interoperability

About me

● Web technology manager (EEA)
● M.Sc. in Computer Science (Lund University, SWE)
● Surveyor (ITA)
● 15 years in IT and web development (programming and project management)
● Junior Researcher: Machine vision for surveillance cameras at Axis
● E-commerce websites for telecom industry
● Product Owner of DaViz and many powerful Plone Add-ons
● Technical manager for the EEA main portal and CMS
● Data Visualisation, Data Science, Open Data, Statistics, Semantic Web, Linked Data, Usability and User Experience, Artificial Intelligence, Agile/Lean management…

DaViz, what and why

desktop-based

web-based

Remove any visual clutter

before

after

Unsorted (Don’t) Sorted (Do)
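Sorting is a single step before plotting. A minimal sketch in plain Python, assuming the data arrive as (label, value) pairs; the country codes and values below are made up for illustration.

```python
# Hypothetical (label, value) pairs; the numbers are illustrative only.
data = [("AT", 12.5), ("BE", 30.1), ("CZ", 7.8), ("DK", 21.4)]

def sort_for_bar_chart(pairs, descending=True):
    """Return the pairs ordered by value so that the bars form a ranking."""
    return sorted(pairs, key=lambda pair: pair[1], reverse=descending)

sorted_data = sort_for_bar_chart(data)
labels = [label for label, _ in sorted_data]
values = [value for _, value in sorted_data]
print(labels)  # ['BE', 'DK', 'AT', 'CZ']
```

The sorted labels and values can then be passed to whatever charting tool draws the bars.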

Remove legend when not needed

There is no need to have a legend when there is only one data category shown. What is measured can be added to the title or axis.
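The rule can be applied automatically when a chart specification is generated. The sketch below builds a plain dictionary; the key names (`title`, `series`, `show_legend`, `y_axis_title`) are hypothetical and do not correspond to the DaViz API or to any particular charting library.

```python
def build_chart_spec(title, series):
    """Build a hypothetical chart spec; `series` maps series name -> list of values."""
    spec = {"title": title, "series": series}
    if len(series) == 1:
        # Only one data category: drop the legend and move what is
        # measured into the axis title instead.
        (only_name,) = series.keys()
        spec["show_legend"] = False
        spec["y_axis_title"] = only_name
    else:
        spec["show_legend"] = True
    return spec

spec = build_chart_spec("Emissions by country", {"CO2 emissions (Mt)": [4.1, 3.2]})
print(spec["show_legend"], spec["y_axis_title"])  # False CO2 emissions (Mt)
```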

Avoid pie charts and donuts

The human eye is good at linear comparisons: we can easily compare the lengths or heights of line segments, but most of us judge angles and areas poorly.

Do you see what works best?

Avoid stacked bar charts

Don’t Do

Correlation does not imply causation

● See also: "Superimposing time series is the biggest source of silly theories"

Per capita consumption of cheese correlates with the number of people who died by becoming tangled in their bedsheets
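Two series that merely share a trend over time can show a very high correlation. A minimal sketch with synthetic random data (not the cheese/bedsheet figures): both series grow with time but have independent noise. Correlating the levels gives an r close to 1, while correlating the step-to-step changes (first differences, one common way to remove a shared trend) does not.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def diff(series):
    """First differences: change from one time step to the next."""
    return [b - a for a, b in zip(series, series[1:])]

rng = random.Random(42)  # fixed seed so the example is repeatable
years = range(50)
# Two unrelated series: each is a linear trend plus its own independent noise.
series_a = [1.0 * t + rng.gauss(0, 1) for t in years]
series_b = [2.0 * t + rng.gauss(0, 1) for t in years]

r_levels = pearson(series_a, series_b)
r_changes = pearson(diff(series_a), diff(series_b))
print(f"levels: r = {r_levels:.2f}, changes: r = {r_changes:.2f}")
```

The high r on the levels comes entirely from the shared time trend; it says nothing about one series causing the other.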

Use a map only when needed

The map on the right is just trying to show too much information at once.

Moreover, the data would be much easier to compare with a basic bar chart (below).

Bar charts placed on a map are difficult to compare because they are not aligned. A single bar chart would make comparing countries much easier and more precise.

Countries with a relatively small area are hidden, while countries with a large area are made more prominent (intentionally?). Is a country’s area really relevant here? Is the geographical distribution important? How can the values be compared properly?

Colors

● Different colors should be used for different categories (e.g., male/female, types of fruit), not for different values in a range (e.g., age, temperature).

● Do not use rainbows for range values.

● If you want color to show a numerical value, use a range that goes from white to a highly saturated color in one of the universal color categories (no rainbows).
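A minimal sketch of such a single-hue scale: linear interpolation in RGB from white to one saturated color. The target blue `(8, 69, 148)` is an arbitrary choice for illustration, and plain RGB interpolation is a simplification (perceptually uniform color spaces give more even-looking steps).

```python
def sequential_color(value, vmin, vmax, target=(8, 69, 148)):
    """Map a number to a hex color on a white -> `target` scale (linear in RGB)."""
    if vmax == vmin:
        t = 1.0
    else:
        # Clamp to [0, 1] so out-of-range values get the end colors.
        t = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)
    white = (255, 255, 255)
    rgb = [round(w + t * (c - w)) for w, c in zip(white, target)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)

for temp in (0, 10, 20):
    print(temp, sequential_color(temp, vmin=0, vmax=20))
```

The lowest value maps to white (`#ffffff`) and the highest to the saturated end (`#084594`), so darker always means more.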

Don’t Do

Don’t forget 7%-10% of your male audience (color deficiency)

What color-deficient people see

Original chart

Use Vischeck to test your images. If the chart is readable in black and white, even better!
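Vischeck simulates color-deficient vision; the simpler black-and-white test can be approximated in code by converting each palette color to a gray level and checking that the levels stay far apart. A rough sketch, assuming Rec. 709 luma weights applied directly to sRGB values (an approximation) and an arbitrary threshold of 40 gray levels; this is not what Vischeck does.

```python
from itertools import combinations

def gray_level(rgb):
    """Approximate gray level (0-255) of an sRGB color using Rec. 709 luma weights."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def indistinct_in_grayscale(palette, min_diff=40):
    """Return pairs of palette colors whose gray levels differ by less than min_diff."""
    return [
        (name_a, name_b)
        for (name_a, a), (name_b, b) in combinations(palette.items(), 2)
        if abs(gray_level(a) - gray_level(b)) < min_diff
    ]

palette = {"red": (255, 0, 0), "green": (0, 76, 0), "light grey": (200, 200, 200)}
print(indistinct_in_grayscale(palette))  # [('red', 'green')]
```

Here red and dark green look very different in color but become almost the same gray when printed in black and white.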

Choose your chart type wisely

Online tools such as the Data Visualization Catalogue, or a decision diagram [2006, A. Abela], help you find the right chart for your data.

Data provenance, trust, legitimacy

● Adding data source information gives your chart credibility and builds trust in it

● When adding source information to your chart, distinguish the data source from the figure source

● Disclose who financed the data visualisation work and the data collection

● Disclose your data and methodology, so that your work is reproducible and verifiable
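One way to keep the data source separate from the figure source is to store them as distinct fields in the chart's metadata and generate the caption from them. The `ChartProvenance` class, its field names and the example values are hypothetical, chosen only for this sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChartProvenance:
    """Hypothetical metadata kept alongside a published chart."""
    data_source: str                       # who produced the underlying data
    figure_source: str                     # who produced this visualisation
    funded_by: Optional[str] = None        # who financed data collection / chart work
    methodology_url: Optional[str] = None  # where data and methods are disclosed

    def caption(self) -> str:
        parts = [f"Data source: {self.data_source}", f"Figure: {self.figure_source}"]
        if self.funded_by:
            parts.append(f"Funded by: {self.funded_by}")
        if self.methodology_url:
            parts.append(f"Data and methodology: {self.methodology_url}")
        return ". ".join(parts) + "."

meta = ChartProvenance(data_source="National statistical offices", figure_source="Example agency")
print(meta.caption())  # Data source: National statistical offices. Figure: Example agency.
```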

from: “Legitimacy, transparency, reproducibility”, Andrea Saltelli, JRC, Head of the Econometrics and Applied Statistics Unit

Show the level of confidence, build trust

Ask these questions before publishing your chart, and be prepared for criticism:

1. What was the source of your data?
2. How well do the sample data represent the population?
3. Does your data distribution include outliers? How did they affect the results?
4. What assumptions are behind your analysis? Might certain conditions render your assumptions and your model invalid?
5. Why did you decide on that particular analytical approach? What alternatives did you consider?
6. How likely is it that the independent variables are actually causing the changes in the dependent variable? Might other analyses establish causality more clearly?
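For question 3, a common first screen is Tukey's fences: flag values more than 1.5 interquartile ranges below the first quartile or above the third. A minimal sketch using the standard library; the factor 1.5 is the conventional default, and the sample values are made up.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

sample = [12, 14, 13, 15, 14, 13, 16, 15, 48]
print(tukey_outliers(sample))  # [48]
```

Flagging a value is only the start: the chart should then say whether it was kept, corrected or excluded, and why.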

Typical statistical error - EU trends

See online example

It is not statistically correct to perform a trend analysis of data over time when the data in question (or the sample) is not representative of the whole. For example, EU12 is not representative of EU25 or EU28, so its data cannot be used to state a trend for the entire EU as it is in 2014: the EU has changed!

very important info!
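One way to avoid this error is to compute the trend only over members that report in every year being compared, so that the aggregate always covers the same set of countries. A minimal sketch with made-up member codes (A to D) and made-up values; real EU aggregates may also need proper weighting, which this sketch ignores.

```python
def constant_composition_totals(data):
    """Sum values per year over only the members present in *every* year.

    `data` maps year -> {member: value}. Returns (members_used, {year: total}).
    """
    years = sorted(data)
    common = set.intersection(*(set(data[y]) for y in years))
    totals = {y: sum(data[y][m] for m in common) for y in years}
    return sorted(common), totals

# Hypothetical data: members "C" and "D" only join in 2005.
data = {
    2000: {"A": 10, "B": 20},
    2005: {"A": 11, "B": 19, "C": 30, "D": 25},
    2010: {"A": 12, "B": 18, "C": 28, "D": 26},
}

naive = {y: sum(v.values()) for y, v in data.items()}
members, consistent = constant_composition_totals(data)
print(naive)                # {2000: 30, 2005: 85, 2010: 84}
print(members, consistent)  # ['A', 'B'] {2000: 30, 2005: 30, 2010: 30}
```

The naive totals show a large jump between 2000 and 2005 that comes only from new members joining; over a constant set of members there is no such change.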

Typical statistical error - including no data

See online example

We cannot say “20.9% of our colleagues are male”. We can say “20.9% of the sample we met are male”, but this says little about the entire population (the whole staff).

Typical statistical error - including no data

See online example

If we use a proper sampling technique, e.g. randomly selecting staff members, we obtain a sample (580 people) that is representative of the whole (1000 people) with a 95% confidence level and a margin of error of 2.64%. We can now say that 39.7% ± 2.64% of our colleagues are male, with a 95% confidence level, which is very different from what we said on the previous slide (20.9%)!

Sample size calculator: https://www.checkmarket.com/market-research-resources/sample-size-calculator/
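The 2.64% figure can be reproduced with the usual margin-of-error formula for a proportion from a simple random sample, MOE = z · √(p(1 − p)/n) · √((N − n)/(N − 1)), using the worst case p = 0.5, z = 1.96 for 95% confidence, and the finite population correction (the second square root), which matters here because the sample is more than half of the 1000 staff. A minimal sketch:

```python
import math

def margin_of_error(n, population, p=0.5, z=1.96):
    """Margin of error for a proportion from a simple random sample.

    n: sample size, population: population size N, p: assumed proportion
    (0.5 is the worst case), z: 1.96 for a 95% confidence level.
    Includes the finite population correction sqrt((N - n) / (N - 1)).
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc

moe = margin_of_error(n=580, population=1000)
print(f"margin of error: {moe:.2%}")  # margin of error: 2.64%
```

Using the observed proportion p = 0.397 instead of 0.5 gives a slightly smaller margin (about 2.58%), so quoting the worst case is the conservative choice.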

Show the level of confidence

Tell your audience how confident you are in your assertions by including error bars any time you use data to make an argument.

source: “The importance of uncertainty”, Berkeley Science Review. http://sciencereview.berkeley.edu/importance-uncertainty/
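A minimal sketch of how an error bar for a mean can be computed: the half-width of an approximate 95% confidence interval is 1.96 standard errors. The normal approximation is an assumption here; for small samples like this one, a Student's t multiplier (about 2.36 for 8 values) would be more appropriate. The measurements are made up.

```python
import math
import statistics

def mean_with_error_bar(values, z=1.96):
    """Return (mean, half_width) of an approximate 95% confidence interval.

    half_width = z * s / sqrt(n), where s is the sample standard deviation.
    """
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, z * standard_error

measurements = [4.1, 3.8, 4.4, 4.0, 3.9, 4.2, 4.3, 3.7]
mean, half_width = mean_with_error_bar(measurements)
print(f"{mean:.2f} ± {half_width:.2f}")
```

The bar is then drawn from `mean - half_width` to `mean + half_width`, and the chart should state which interval it shows.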

Get it professionally reviewed

Have a statistician review your analysis and your representation. You will be surprised by how many corrections and improvements you can achieve.

Welcome to data science!

source: http://sciencereview.berkeley.edu/article/first-rule-data-science/

I shall not use visualization to intentionally hide or confuse the truth which it is intended to portray. I will respect the great power visualization has in garnering wisdom and misleading the uninformed. I accept this responsibility willfully and without reservation, and promise to defend this oath against all enemies, both domestic and foreign.

Hippocratic oath for data scientists

VisWeek 2011, Jason Moore, A code of ethics for data visualisation professionals

THANK YOU!

More resources: http://www.eea.europa.eu/data-and-maps/daviz/learn-more/

