© Fabio Nelli 2018 F. Nelli, Python Data Analytics, https://doi.org/10.1007/978-1-4842-3913-1
APPENDIX A
Writing Mathematical Expressions with LaTeX
LaTeX is extensively used in Python. This appendix collects examples that are useful
for representing LaTeX expressions inside Python code. The same
information can be found at http://matplotlib.org/users/mathtext.html.
With matplotlib
You can enter the LaTeX expression directly as an argument of the various functions that
accept it, for example the title() function, which draws a chart title.
import matplotlib.pyplot as plt
%matplotlib inline
plt.title(r'$\alpha > \beta$')
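The r prefix in the call above marks a raw string, which matters because LaTeX commands begin with a backslash. The following sketch uses only built-in Python to show what would happen to \alpha and \beta without the prefix; the variable names are illustrative.

```python
# Without the r prefix, Python interprets some backslash sequences itself:
# '\a' becomes the BEL control character and '\b' becomes a backspace,
# so the LaTeX commands never reach the renderer.
plain = '$\alpha > \beta$'
raw = r'$\alpha > \beta$'

print(repr(plain))
print(repr(raw))

# Only the raw string still contains backslashes.
print('\\' in plain, '\\' in raw)  # False True
```

Using raw strings for every expression avoids having to remember which letters form escape sequences.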
With IPython Notebook in a Markdown Cell
You can enter the LaTeX expression between two '$$'.
$$c = \sqrt{a^2 + b^2}$$
With IPython Notebook in a Python 2 Cell
You can enter the LaTeX expression within the Math() function.
from IPython.display import display, Math, Latex
display(Math(r'F(k) = \int_{-\infty}^{\infty} f(x) e^{2\pi i k} dx'))
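Outside a notebook there is nothing for Math() to display to, but the same expression can be rendered with matplotlib's mathtext engine. The sketch below assumes matplotlib is installed and uses matplotlib.mathtext.math_to_image to write the formula to an in-memory PNG; the buffer name and the dpi value are arbitrary choices. Note that mathtext needs the surrounding $ signs, whereas Math() does not.

```python
import io

from matplotlib.mathtext import math_to_image

# Same expression as above, wrapped in $...$ for mathtext.
expr = r'$F(k) = \int_{-\infty}^{\infty} f(x) e^{2\pi i k} dx$'

# Render into a bytes buffer; a file name could be passed instead.
buffer = io.BytesIO()
math_to_image(expr, buffer, dpi=200, format='png')

png_bytes = buffer.getvalue()
print(len(png_bytes), 'bytes of PNG data')
```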
Subscripts and Superscripts
To make subscripts and superscripts, use the ‘_’ and ‘^’ symbols:
r'$\alpha_i > \beta_i$'
This is very useful when you have to write summations:
r'$\sum_{i=0}^\infty x_i$'
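When labels are generated in a loop, the index has to be substituted into the expression. A minimal sketch with raw f-strings follows (Python 3.6 or later is assumed); literal LaTeX braces must be doubled so that the f-string does not treat them as placeholders. The helper name indexed_symbol is hypothetical.

```python
def indexed_symbol(symbol, index):
    """Return a mathtext string such as $\\alpha_{10}$ for the given index."""
    # {{ and }} produce literal braces; {index} is the substituted value.
    # The braces keep multi-digit indices together: x_10 would only
    # subscript the 1, while x_{10} subscripts the whole number.
    return rf'${symbol}_{{{index}}}$'


labels = [indexed_symbol(r'\alpha', i) for i in (0, 1, 10)]
print(labels)
```

The resulting strings can be passed to any matplotlib function that accepts text, such as title() or legend labels.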
Fractions, Binomials, and Stacked Numbers
Fractions, binomials, and stacked numbers can be created with the \frac{}{}, \binom{}{},
and \stackrel{}{} commands, respectively:
r'$\frac{3}{4} \binom{3}{4} \stackrel{3}{4}$'
Fractions can be arbitrarily nested:
r'$\frac{5 - \frac{1}{x}}{4}$'
Note that special care needs to be taken to place parentheses and brackets around
fractions. You have to insert \left and \right before the brackets to tell the
parser that they encompass the entire object:
r'$\left(\frac{5 - \frac{1}{x}}{4}\right)$'
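The effect of \left and \right can be inspected without drawing a chart. The sketch below assumes matplotlib is installed and uses its MathTextParser to lay out a nested fraction with plain and with auto-sized parentheses, then prints the size of each layout. The helper name layout_size is hypothetical, and the exact numbers depend on the matplotlib version and fonts.

```python
from matplotlib.mathtext import MathTextParser

# The 'path' output only computes the layout; nothing is drawn.
parser = MathTextParser('path')


def layout_size(expr):
    """Return (width, height) of the laid-out expression at 72 dpi."""
    width, height, *_ = parser.parse(expr, dpi=72)
    return width, height


plain = r'$(\frac{5 - \frac{1}{x}}{4})$'
sized = r'$\left(\frac{5 - \frac{1}{x}}{4}\right)$'

for name, expr in (('plain', plain), ('sized', sized)):
    width, height = layout_size(expr)
    print(f'{name}: {width:.1f} x {height:.1f}')
```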
Radicals
Radicals can be produced with the \sqrt[]{} command.
r'$\sqrt{2}$'
Fonts
The default font for mathematical symbols is italics. To change fonts, for example to set
trigonometric functions such as sin in roman type, use the font commands:
r'$s(t) = \mathcal{A}\mathrm{sin}(2 \omega t)$'
The font choices available are:
from IPython.display import display, Math, Latex
display(Math(r'\mathrm{Roman}'))
display(Math(r'\mathit{Italic}'))
display(Math(r'\mathtt{Typewriter}'))
display(Math(r'\mathcal{CALLIGRAPHY}'))
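The font commands are mostly used in chart labels. As a sketch, assuming NumPy and matplotlib are installed, the following plots s(t) = A sin(2ωt) and writes the formula in the title with a calligraphic amplitude and a roman sin. The amplitude and frequency values are arbitrary, and the Agg backend is selected so that the script also runs without a display.

```python
import io

import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

amplitude, omega = 1.5, 2.0  # arbitrary illustrative values
t = np.linspace(0, 2 * np.pi, 200)
s = amplitude * np.sin(2 * omega * t)

fig, ax = plt.subplots()
ax.plot(t, s)
# \mathcal gives the calligraphic A, \mathrm keeps "sin" upright.
ax.set_title(r'$s(t) = \mathcal{A}\mathrm{sin}(2 \omega t)$')
ax.set_xlabel(r'$t$')
ax.set_ylabel(r'$s(t)$')

# Render to memory; pass a file name instead to write the chart to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format='png')
```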
Accents
An accent command may precede any symbol to add an accent above it. There are long
and short forms for some of them.
\acute a or \'a
\bar a
\breve a
\ddot a or \"a
\dot a or \.a
\grave a or \`a
\hat a or \^a
\tilde a or \~a
\vec a
\overline{abc}
Symbols
You can also use a large number of the TeX symbols.
Lowercase Greek
\alpha \beta \chi \delta \digamma
\epsilon \eta \gamma \iota \kappa
\lambda \mu \nu \omega \phi
\pi \psi \rho \sigma \tau
\theta \upsilon \varepsilon \varkappa \varphi
\varpi \varrho \varsigma \vartheta \xi
\zeta
Uppercase Greek
\Delta \Gamma \Lambda \Omega \Phi \Pi
\Psi \Sigma \Theta \Upsilon \Xi \mho
\nabla
Hebrew
\aleph \beth \daleth \gimel
Delimiters
/ [ \Downarrow \Uparrow \Vert \backslash
\downarrow \langle \lceil \lfloor \llcorner \lrcorner
\rangle \rceil \rfloor \ulcorner \uparrow \urcorner
\vert \{ \| \} ] |
Big Symbols
\bigcap \bigcup \bigodot \bigoplus \bigotimes
\biguplus \bigvee \bigwedge \coprod \int
\oint \prod \sum
Standard Function Names
\Pr \arccos \arcsin \arctan
\arg \cos \cosh \cot
\coth \csc \deg \det
\dim \exp \gcd \hom
\inf \ker \lg \lim
\liminf \limsup \ln \log
\max \min \sec \sin
\sinh \sup \tan \tanh
Binary Operation and Relation Symbols
\Bumpeq \Cap \Cup
\Doteq \Join \Subset
\Supset \Vdash \Vvdash
\approx \approxeq \ast
\asymp \backepsilon \backsim
\backsimeq \barwedge \because
\between \bigcirc \bigtriangledown
\bigtriangleup \blacktriangleleft \blacktriangleright
\bot \bowtie \boxdot
\boxminus \boxplus \boxtimes
\bullet \bumpeq \cap
\cdot \circ \circeq
\coloneq \cong \cup
\curlyeqprec \curlyeqsucc \curlyvee
\curlywedge \dag \dashv
\ddag \diamond \div
\divideontimes \doteq \doteqdot
\dotplus \doublebarwedge \eqcirc
\eqcolon \eqsim \eqslantgtr
\eqslantless \equiv \fallingdotseq
\frown \geq \geqq
\geqslant \gg \ggg
\gnapprox \gneqq \gnsim
\gtrapprox \gtrdot \gtreqless
\gtreqqless \gtrless \gtrsim
\in \intercal \leftthreetimes
\leq \leqq \leqslant
\lessapprox \lessdot \lesseqgtr
\lesseqqgtr \lessgtr \lesssim
\ll \lll \lnapprox
\lneqq \lnsim \ltimes
\mid \models \mp
\nVDash \nVdash \napprox
\ncong \ne \neq
\neq \nequiv \ngeq
\ngtr \ni \nleq
\nless \nmid \notin
\nparallel \nprec \nsim
\nsubset \nsubseteq \nsucc
\nsupset \nsupseteq \ntriangleleft
\ntrianglelefteq \ntriangleright \ntrianglerighteq
\nvDash \nvdash \odot
\ominus \oplus \oslash
\otimes \parallel \perp
\pitchfork \pm \prec
\precapprox \preccurlyeq \preceq
\precnapprox \precnsim \precsim
\propto \rightthreetimes \risingdotseq
\rtimes \sim \simeq
\slash \smile \sqcap
\sqcup \sqsubset \sqsubset
\sqsubseteq \sqsupset \sqsupset
\sqsupseteq \star \subset
\subseteq \subseteqq \subsetneq
\subsetneqq \succ \succapprox
\succcurlyeq \succeq \succnapprox
\succnsim \succsim \supset
\supseteq \supseteqq \supsetneq
\supsetneqq \therefore \times
\top \triangleleft \trianglelefteq
\triangleq \triangleright \trianglerighteq
\uplus \vDash \varpropto
\vartriangleleft \vartriangleright \vdash
\vee \veebar \wedge
\wr
Arrow Symbols
\Downarrow \Leftarrow
\Leftrightarrow \Lleftarrow
\Longleftarrow \Longleftrightarrow
\Longrightarrow \Lsh
\Nearrow \Nwarrow
\Rightarrow \Rrightarrow
\Rsh \Searrow
\Swarrow \Uparrow
\Updownarrow \circlearrowleft
\circlearrowright \curvearrowleft
\curvearrowright \dashleftarrow
\dashrightarrow \downarrow
\downdownarrows \downharpoonleft
\downharpoonright \hookleftarrow
\hookrightarrow \leadsto
\leftarrow \leftarrowtail
\leftharpoondown \leftharpoonup
\leftleftarrows \leftrightarrow
\leftrightarrows \leftrightharpoons
\leftrightsquigarrow \leftsquigarrow
\longleftarrow \longleftrightarrow
\longmapsto \longrightarrow
\looparrowleft \looparrowright
\mapsto \multimap
\nLeftarrow \nLeftrightarrow
\nRightarrow \nearrow
\nleftarrow \nleftrightarrow
\nrightarrow \nwarrow
\rightarrow \rightarrowtail
\rightharpoondown \rightharpoonup
\rightleftarrows \rightleftarrows
\rightleftharpoons \rightleftharpoons
\rightrightarrows \rightrightarrows
\rightsquigarrow \searrow
\swarrow \to
\twoheadleftarrow \twoheadrightarrow
\uparrow \updownarrow
\updownarrow \upharpoonleft
\upharpoonright \upuparrows
Miscellaneous Symbols
\$ \AA \Finv
\Game \Im \P
\Re \S \angle
\backprime \bigstar \blacksquare
\blacktriangle \blacktriangledown \cdots
\checkmark \circledR \circledS
\clubsuit \complement \copyright
\ddots \diamondsuit \ell
\emptyset \eth \exists
\flat \forall \hbar
\heartsuit \hslash \iiint
\iint \iint \imath
\infty \jmath \ldots
\measuredangle \natural \neg
\nexists \oiiint \partial
\prime \sharp \spadesuit
\sphericalangle \ss \triangledown
\varnothing \vartriangle \vdots
\wp \yen
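Command names are case sensitive, so \Gamma and \gamma are different symbols and a misspelled name is an error. As a sketch, assuming matplotlib is installed, a name from the tables above can be checked against the mathtext parser before it is used in a chart. The helper is_valid_mathtext is hypothetical, and it relies on the parser raising ValueError for expressions it cannot handle.

```python
from matplotlib.mathtext import MathTextParser

# The 'path' output only parses and lays out; nothing is drawn.
_parser = MathTextParser('path')


def is_valid_mathtext(expr):
    """Return True if matplotlib's mathtext parser accepts expr."""
    try:
        _parser.parse(expr)
    except ValueError:
        return False
    return True


for candidate in (r'$\Gamma$', r'$\gamma$', r'$\GAMMA$'):
    print(candidate, is_valid_mathtext(candidate))
```

A check of this kind is handy when symbol names come from user input or a configuration file rather than from the source code.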
APPENDIX B
Open Data Sources
Political and Government Data
Data.gov
http://data.gov
This is the resource for most government-related data.
Socrata
http://www.socrata.com/resources/
Socrata is a good place to explore government-related data. Furthermore, it provides
some visualization tools for exploring data.
US Census Bureau
http://www.census.gov/data.html
This site provides information about US citizens covering population data, geographic
data, and education.
UNdata
https://data.un.org/
UNdata is an Internet-based data service that brings together UN statistical databases.
European Union Open Data Portal
http://open-data.europa.eu/en/data/
This site provides a lot of data from European Union institutions.
Data.gov.uk
http://data.gov.uk/
This site of the UK Government includes the British National Bibliography: metadata
on all UK books and publications since 1950.
The CIA World Factbook
https://www.cia.gov/library/publications/the-world-factbook/
This site of the Central Intelligence Agency provides a lot of information on history,
population, economy, government, infrastructure, and military of 267 countries.
Health Data
Healthdata.gov
https://www.healthdata.gov/
This site provides medical data about epidemiology and population statistics.
NHS Health and Social Care Information Centre
http://www.hscic.gov.uk/home
Health data sets from the UK National Health Service.
Social Data
Facebook Graph
https://developers.facebook.com/docs/graph-api
Facebook provides this API which allows you to query the huge amount of
information that users are sharing with the world.
Topsy
http://topsy.com/
Topsy provides a searchable database of public tweets going back to 2006 as well as
several tools to analyze the conversations.
Google Trends
http://www.google.com/trends/explore
Statistics on search volume (as a proportion of total search) for any given term,
since 2004.
Likebutton
http://likebutton.com/
Mines Facebook’s public data—globally and from your own network—to give an
overview of what people “Like” at the moment.
Miscellaneous and Public Data Sets
Amazon Web Services public datasets
http://aws.amazon.com/datasets
The public data sets on Amazon Web Services provide a centralized repository of
public data sets. An interesting dataset is the 1000 Genomes Project, an attempt to build
the most comprehensive database of human genetic information. A NASA database
of satellite imagery of Earth is also available.
DBPedia
http://wiki.dbpedia.org
Wikipedia contains millions of pieces of data, structured and unstructured, on
every subject. DBPedia is an ambitious project to catalogue and create a public, freely
distributable database allowing anyone to analyze this data.
Freebase
http://www.freebase.com/
This community database provides information about several topics, with over 45
million entries.
Gapminder
http://www.gapminder.org/data/
This site provides data coming from the World Health Organization and World Bank
covering economic, medical, and social statistics from around the world.
Financial Data
Google Finance
https://www.google.com/finance
Forty years’ worth of stock market data, updated in real time.
Climatic Data
National Climatic Data Center
http://www.ncdc.noaa.gov/data-access/quick-links#loc-clim
Huge collection of environmental, meteorological, and climate data sets from the US
National Climatic Data Center. The world’s largest archive of weather data.
WeatherBase
http://www.weatherbase.com/
This site provides climate averages, forecasts, and current conditions for over 40,000
cities worldwide.
Wunderground
http://www.wunderground.com/
This site provides climatic data from satellites and weather stations, allowing you to
get all information about the temperature, wind, and other climatic measurements.
Sports Data
Pro-Football-Reference
http://www.pro-football-reference.com/
This site provides data about football and several other sports.
Publications, Newspapers, and Books
New York Times
http://developer.nytimes.com/docs
Searchable, indexed archive of news articles going back to 1851.
Google Books Ngrams
http://storage.googleapis.com/books/ngrams/books/datasetsv2.html
This source searches and analyzes the full text of any of the millions of books
digitized as part of the Google Books project.
Musical Data
Million Song Data Set
http://aws.amazon.com/datasets/6468931156960467
Metadata on over a million songs and pieces of music. Part of Amazon Web Services.