Date posted: 13-Jan-2015
Category: Education
Uploaded by: tony-hirst
Visualising Activity Data
Dept of Communication and Systems, The Open University
Scattered puzzle pieces next to solved fragment by Horia Varlan
Tony Hirst
Today’s link shortener is bit.ly
Read: [ jlKwGq ] as: http://bit.ly/jlKwGq
Visual Analysis
vs.
Presentation Graphics
This is NOT a presentation about:
- data discovery
- data preparation
- data cleansing
BUT…
ScraperWiki [ aGhJtK ]
Search and replace…
…add regular expressions and you have search and replace “on steroids”
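As a rough illustration of what regular expressions add to plain search and replace, the sketch below uses Python's standard `re.sub`. The function names and sample strings are invented for illustration; they are not taken from ScraperWiki, Google Refine or the slides.

```python
import re

def normalise_whitespace(text):
    """Collapse runs of spaces/tabs into a single space and trim both ends."""
    return re.sub(r"[ \t]+", " ", text).strip()

def swap_name_order(text):
    """Rewrite 'Surname, Forename' as 'Forename Surname' using capture groups."""
    return re.sub(r"^(\w+),\s*(\w+)$", r"\2 \1", text)

print(normalise_whitespace("  Open    University "))
print(swap_name_order("Hirst, Tony"))
```

A plain search and replace could not do the second job, because the replacement depends on what was matched.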
Google Refine [ aq1jUE ]
Example: walkthrough (@jenit) [ awGQPT ]
Example: merging two tables by column [ pWK3C0 ]
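Merging two tables by a shared column can be sketched in a few lines of Python. This is not how Google Refine does it internally; it is a minimal left join over lists of dicts, and the column names and rows are hypothetical.

```python
def merge_by_column(left_rows, right_rows, key):
    """Left join: add the fields of the matching right-hand row to each left-hand row."""
    # Index the right-hand table by the shared column (a later duplicate key wins).
    lookup = {row[key]: row for row in right_rows}
    merged = []
    for row in left_rows:
        combined = dict(row)
        for name, value in lookup.get(row[key], {}).items():
            if name != key:
                combined[name] = value
        merged.append(combined)
    return merged

# Hypothetical sample tables sharing a "code" column
courses = [{"code": "T101", "title": "Intro"}, {"code": "T202", "title": "Networks"}]
counts = [{"code": "T101", "views": 12}]
print(merge_by_column(courses, counts, "code"))
```

Rows with no match on the right are kept unchanged, which makes gaps in the second table easy to spot.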
DataWrangler [ gmE3yz ]
Data has shape and structure
Hierarchical Data
Treemaps
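To show how a treemap turns hierarchical data into nested rectangles, here is a sketch of the simple "slice-and-dice" layout in Python. It is an assumption that this is a useful stand-in: real tools may use other tiling algorithms, and the tree format (dicts with `name`, `size` and `children` keys) is invented for this example.

```python
def total_size(node):
    """Size of a node: its own size for a leaf, else the sum over its children."""
    if "children" in node:
        return sum(total_size(child) for child in node["children"])
    return node["size"]

def layout(node, x, y, w, h, split_x=True, out=None):
    """Return (name, x, y, width, height) for each leaf, alternating split direction per level."""
    if out is None:
        out = []
    children = node.get("children")
    if not children:
        out.append((node["name"], x, y, w, h))
        return out
    total = total_size(node)  # assumed to be non-zero
    offset = 0.0
    for child in children:
        frac = total_size(child) / total
        if split_x:
            layout(child, x + offset * w, y, frac * w, h, not split_x, out)
        else:
            layout(child, x, y + offset * h, w, frac * h, not split_x, out)
        offset += frac
    return out

tree = {"name": "root", "children": [
    {"name": "a", "size": 1},
    {"name": "b", "children": [{"name": "c", "size": 1}, {"name": "d", "size": 2}]},
]}
for rect in layout(tree, 0, 0, 4, 3):
    print(rect)
```

Each leaf's area is proportional to its size, and each branch's children tile exactly the rectangle given to that branch.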
Many Eyes [ qY5786 ]
plot srcfile using ($1):(column(focusCar) -$2) with lines title "VET", \
     srcfile using ($1):(column(focusCar) -$3) with lines title "WEB", \
     srcfile using ($1):(column(focusCar) -$4) with lines title "HAM", \
     srcfile using ($1):(column(focusCar) -$5) with lines title "BUT", \
     srcfile using ($1):(column(focusCar) -$6) with lines title "ALO", \
     srcfile using ($1):(column(focusCar) -$7) with lines title "MAS", \
     srcfile using ($1):(column(focusCar) -$8) with lines title "SCH", \
     srcfile using ($1):(column(focusCar) -$9) with lines title "ROS", …
Or heatmaps in R: [ qXmPgs ]
Text processing with Unix tools [ m5tz63 ] [ lOVySX ]
Count number of lines in a file: wc -l L2sample.csv
View first few lines in a file: head L2sample.csv or head -n 4 L2sample.csv
View last few lines in a file: tail L2sample.csv or tail -n 15 L2sample.csv
Sample contiguous rows from start or end of file:
head -n 1 L2sample.csv > headers.csv
tail -n 20 L2sample.csv > subSample.csv
cat headers.csv subSample.csv > subSampleWithHeaders.csv
Sample contiguous rows from middle of file:
head -n 15 L2sample.csv | tail -n 6 > middleSample.csv
Split large file into smaller files:
split -l 15 L2sample.csv subSamples
Search for lines containing a term:
grep mendeley L2sample.csv
grep EBSCO L2sample.csv > rowsContainingEBSCO.csv
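The `head -n 15 … | tail -n 6` pipeline keeps rows 10 to 15: the last 6 of the first 15. For readers more at home in Python, the same idea can be sketched with the standard library; the function name and the sample rows below are hypothetical.

```python
from collections import deque
from itertools import islice

def middle_sample(lines, head_n, tail_n):
    """Mimic `head -n head_n | tail -n tail_n` on any iterable of lines."""
    first = islice(lines, head_n)            # like head: stop after head_n lines
    return list(deque(first, maxlen=tail_n)) # like tail: keep only the last tail_n of those

# Stand-in for the lines of a file such as L2sample.csv
rows = ["row%d" % i for i in range(1, 21)]
print(middle_sample(rows, 15, 6))
```

Because `islice` stops reading early, this also works on an open file handle without loading the whole file.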
More text processing tricks
Extract columns:
cut -f 3 L2sample.csv
cut -f 1,2,14,17 L2sample.csv > columnSample.csv
Sort data in a column:
cut -f 40 L2sample.csv | sort
Identify distinct entries in a column:
cut -f 40 L2sample.csv | sort | uniq
Count how many times each distinct term appears in a column:
cut -f 40 L2sample.csv | sort | uniq -c
Sort can also sort by column (-k), numerically (-n) and in reverse order (-r):
cut -f 40 L2_2011-04.csv | sort | uniq -c | sort -k 1 -n -r > uniqueSID.csv
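The `cut | sort | uniq -c | sort -r` idiom amounts to "count each distinct value in a column, most frequent first". A rough Python equivalent using `collections.Counter` is sketched below. It assumes tab-separated fields (the default delimiter for `cut`) and, as a simplification, skips rows that are too short; the sample rows are invented.

```python
import csv
from collections import Counter

def column_counts(lines, column, delimiter="\t"):
    """Count distinct values in a 1-based column, most frequent first."""
    reader = csv.reader(lines, delimiter=delimiter)
    counts = Counter(row[column - 1] for row in reader if len(row) >= column)
    return counts.most_common()

# Hypothetical tab-separated rows; a file opened with open(...) works the same way
sample = ["a\tEBSCO", "b\tmendeley", "c\tEBSCO"]
for value, n in column_counts(sample, 2):
    print(n, value)
```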
[ dAdIo3 ]
Time series data
aka “seasonal subseries”
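A seasonal subseries view regroups a time series so that all observations from the same point in the cycle (every January, every Monday, and so on) sit together. A minimal sketch of that regrouping, assuming evenly spaced observations and a known period, with made-up numbers:

```python
def seasonal_subseries(values, period):
    """Group observations by their position within the seasonal cycle."""
    groups = {phase: [] for phase in range(period)}
    for i, value in enumerate(values):
        groups[i % period].append(value)
    return groups

# Two cycles of hypothetical quarterly data
quarterly = [10, 20, 30, 40, 11, 21, 31, 41]
print(seasonal_subseries(quarterly, 4))
```

Each group can then be plotted as its own small series to compare seasons with one another.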
[ j3HODr ]
Trends
import numpy as np
#time series data in d
#first difference
fd=np.diff(d)
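For a one-dimensional series, the first difference is just each value minus the one before it. The dependency-free sketch below computes the same result as `np.diff` gives for a 1-D input; the example numbers are invented.

```python
def first_difference(d):
    """Return d[1]-d[0], d[2]-d[1], ... (one element shorter than d)."""
    return [later - earlier for earlier, later in zip(d, d[1:])]

# A steadily rising series has a constant first difference
print(first_difference([3, 5, 7, 9]))
# A quadratic series has a linearly rising first difference
print(first_difference([1, 4, 9, 16]))
```

Differencing removes a linear trend, which is why it is a common first step before looking for other structure in the data.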
Autocorrelation
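Autocorrelation measures how strongly a series resembles a shifted copy of itself. The sketch below uses one common definition of the sample autocorrelation at lag k, the lagged products of mean deviations divided by the total sum of squared deviations. Plotting libraries may normalise slightly differently, so treat this as illustrative.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a sequence at the given non-negative lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return num / denom

# A made-up series that repeats every 2 steps
signal = [1, 2, 1, 2, 1, 2, 1, 2]
for k in range(3):
    print(k, autocorrelation(signal, k))
```

The alternating series correlates negatively with itself at lag 1 and positively at lag 2, which is the signature of a period-2 pattern.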
matplotlib [ qSIcrV ]
Graphs and Networks
digraph test {
  CSV [shape=box]
  KML [shape=box]
  JSON [shape=box]
  XML [shape=box]
  RDF [shape=box]
  HTML [shape=box]
  GoogleSpreadsheet [shape=Msquare]
  RDFTripleStore [shape=Msquare]
  "[SPARQL]" [shape=diamond]
  "[YQL]" [shape=diamond]
  "[GoogleVizDataAPI]" [shape=diamond]
  "<GoogleGadgets>" [shape=doubleoctagon]
  "<GoogleVizDataCharts>" [shape=doubleoctagon]
  "<GoogleMaps>" [shape=doubleoctagon]
  "<GoogleEarth>" [shape=doubleoctagon]
  "<JQueryCharts_etc>" [shape=doubleoctagon]

  "[SPARQL]"->RDF;
  "[SPARQL]"->XML;
  "[SPARQL]"->CSV;
  "[SPARQL]"->JSON;
  JSON->"<JQueryCharts_etc>";
  CSV->"{GoogleRefine}"
  CSV->ScraperWiki
  JSON->ScraperWiki
  "[YQL]"->ScraperWiki
  ScraperWiki->CSV
  HTML->ScraperWiki
  HTML->"[YQL]"
  "[SPARQL]"->"[YQL]"
  "{GoogleRefine}"->CSV [style=dashed]
  CSV->"<Gephi>" [style=dashed]
  "<Gephi>"->CSV [style=dashed]
  RDF->"[YQL]"
}
Graphviz
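When the edges of a graph already live in a data file, writing the DOT description by hand gets tedious. As a sketch, the edge list can be turned into DOT text with a few lines of Python; the helper names are hypothetical and only double quotes inside node names are escaped.

```python
def quote(name):
    """Wrap a node name in double quotes so names like <Gephi> or [YQL] are valid DOT IDs."""
    return '"' + name.replace('"', '\\"') + '"'

def to_dot(graph_name, edges, dashed=()):
    """Build DOT source for a directed graph from (source, target) pairs."""
    lines = ["digraph %s {" % graph_name]
    for source, target in edges:
        attrs = " [style=dashed]" if (source, target) in dashed else ""
        lines.append("  %s -> %s%s;" % (quote(source), quote(target), attrs))
    lines.append("}")
    return "\n".join(lines)

edges = [("CSV", "ScraperWiki"), ("HTML", "[YQL]"), ("CSV", "<Gephi>")]
print(to_dot("test", edges, dashed={("CSV", "<Gephi>")}))
```

The printed text can be saved to a file and rendered with the Graphviz `dot` command.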
Gephi
[ nKoB4b ]
Statistical Graphs
R
Graphics Libraries
Protovis
Processing
I hope that’s been
ouseful.info ….?