Statistical challenges in Big Data and Data Mining
Manuel Febrero–Bande
Dept. of Statistics, Mathematical Analysis and Optimization, Univ. de Santiago de Compostela
Data Science Workshop (Jornada Data Science), INAMAT, Pamplona, 25 September 2017
Introduction Big Data Analysis New challenges, almost same solutions Conclusion
Facts about Big Data
- Big Data is about how to solve statistical problems with the available (and perhaps limited) computational resources, so it changes as time evolves: a Big Data problem of the 1990s is a toy example today.
- The final goal is to obtain automatic inferences from the data (without the participation of a statistician), as it has been for the last 30 years under different names.
- Big Data is the new magic word. Like "Abracadabra", it opens the door to a new fantasy world where everything is possible, in the hands of the new Master of Ceremonies: the data scientist.
- Big Data promises better decisions (as Statistics did in the last century) and big profits (this is new) after a huge investment in technology.
M. Febrero–Bande Retos Big Data
Table of Contents
1 Introduction
2 Big Data Analysis
3 New challenges, almost same solutions
  Exploratory Data Analysis
  Client profile
  A test example
  A classification example
4 Conclusion
Introduction
Several names have been used for Statistics in the computer era:
- Expert systems (AI, 80s-90s)
- Machine Learning (AI, 90s onwards)
- Data (Stream) Mining (CS, 1995 onwards)
- Knowledge Discovery in Databases (KDD, 1995 onwards)
- Pattern Recognition (ML, 2000 onwards)
- Structured Data Mining, Graph Mining (2003 onwards)
- Business Analytics (BIDW, 2005 onwards)
- Big Data (2011 onwards)
The final goal is to obtain better and faster inferences from the data.
Data Mining and Machine Learning
Dimensions of Big Data
The V characteristics of Big Data are:
- Volume. Huge (and perhaps increasing) size of the collected data.
- Velocity. Processing data and producing results in limited time with limited computing resources.
- Variety. Heterogeneous and complex data representations.
- Veracity. Quality of the data and of its preprocessing.
- Value. New economic value that supports better decisions.
- Variability. Changes in the structure of the data as time goes on.
Risky assertions about Big Data
- Traditional statistics will not remain as relevant as it used to be.
  - sample and selection biases
  - limits of data sets
  - assumptions about data
- Correlations should replace models: the data speak for themselves.
  - spurious correlations
  - causality
- Precision of the results is not as essential as it was previously believed to be.
  - robustness
  - precision/complexity
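The "spurious correlations" objection above can be made concrete with a small simulation. This is a sketch with invented sizes (`n`, `p`) and pure-noise data, not an analysis from the talk: when thousands of unrelated variables are screened against a response, some of them correlate noticeably with it by chance alone. The threshold `2 / sqrt(n)` is only a rough 5% cut-off for a single correlation under independence.

```r
# Simulation with made-up sizes: all predictors are pure noise,
# so every correlation with y is spurious by construction.
set.seed(1)
n <- 100     # observations
p <- 5000    # candidate variables ("big P")
y <- rnorm(n)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# One correlation per column of X
r <- as.vector(cor(X, y))
best <- which.max(abs(r))

# Rough 5% threshold for |r| under independence: about 2 / sqrt(n)
threshold <- 2 / sqrt(n)
cat("largest |r| among", p, "noise variables:", round(abs(r[best]), 3), "\n")
cat("share of variables with |r| >", threshold, ":",
    round(mean(abs(r) > threshold), 3), "\n")
```

Roughly 5% of the noise variables should pass the naive threshold, and the "best" one looks like a useful predictor even though none is: the data do not speak for themselves once the number of candidate variables grows.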
On Rigor in Science (Del rigor en la ciencia)
In that Empire, the Art of Cartography attained such perfection that the map of a single province occupied an entire city, and the map of the Empire an entire province. In time, those oversized maps no longer satisfied, and the Colleges of Cartographers drew up a Map of the Empire that was the size of the Empire and coincided with it point for point. Less devoted to the study of Cartography, the following generations understood that this vast Map was useless and, not without impiety, abandoned it to the inclemencies of the sun and the winters. In the Deserts of the West there still remain tattered ruins of the Map, inhabited by animals and beggars; in the whole country there is no other relic of the Geographic Disciplines.
Jorge Luis Borges
Aspects of Big Data
- Big Data management
  - Hardware. How to store such huge amounts of data?
  - Computation. How to collect and filter the sources of data?
- Big Data processing
  - Preprocessing. How to structure the data so that it is readily available?
  - Scheduling
  - Parallelization
- Big Data techniques and algorithms
  - Three components: Statistics, Statistics and Statistics
- Big Data ethics
Steps in a Big Data Analysis
- DATA (from hundreds to billions)
- Segmentation (regions, subject, category, ...)
- Dimension reduction (feature extraction, PC, PLS)
- Identification of groups (kNN, K-means, clustering, ...)
- Descriptive analysis: summarize and visualize, outliers, missing data
- Predictive analysis: regression, classification, ...
- Inferential analysis: estimation, testing, ...
- Training and validation sets, comparison criteria
- Specific issues in real-time data: outliers, regression, classification, pattern recognition, association measures, simulation scenarios
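Two of the steps listed above, dimension reduction and identification of groups, can be chained in a few lines of base R. The sketch below uses simulated data (three hypothetical latent groups seen through 20 noisy variables), principal components via `prcomp` and k-means via `kmeans`; the number of components and clusters are assumptions that would have to be chosen with validation criteria on real data.

```r
# Simulated data (hypothetical): 3 latent groups in 2 dimensions,
# observed through 20 noisy variables.
set.seed(42)
n_per    <- 200
centers  <- rbind(c(0, 0), c(4, 0), c(0, 4))
truth    <- rep(1:3, each = n_per)
latent   <- centers[truth, ] + matrix(rnorm(3 * n_per * 2), ncol = 2)
loadings <- matrix(rnorm(2 * 20), nrow = 2)
X <- latent %*% loadings + matrix(rnorm(3 * n_per * 20, sd = 0.5), ncol = 20)

# Dimension reduction: principal components, keep the first two scores
pc     <- prcomp(X, center = TRUE, scale. = TRUE)
scores <- pc$x[, 1:2]

# Identification of groups: k-means on the reduced data
km <- kmeans(scores, centers = 3, nstart = 20)

# Compare the clusters found with the simulated groups
tab    <- table(truth, cluster = km$cluster)
purity <- sum(apply(tab, 2, max)) / sum(tab)
print(tab)
cat("purity:", round(purity, 3), "\n")
```

Running k-means on a handful of scores instead of all 20 columns is what makes the grouping step cheap when the number of variables is large.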
Do we need Big Data?
- The Big Data fever rests on the business gains that Big Data tools can provide, but those gains can be achieved with both Big Data and small data.
- Big Data is sold as the means of answering every question. But first the right question must be asked, and then the answer must be provided using the appropriate information/data.
- More data does not imply better information. The information is in the data, but sometimes we are unable to extract it.
- More data does not imply more accurate data or analyses.
- Big Data is unlikely to solve a problem of interest unless it has been specifically designed to solve it (as usual).
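The claim that more data does not imply a more accurate analysis is easy to check by simulation. In this sketch the population (exponential amounts with mean 40) and the selection mechanism (larger amounts are more likely to be recorded) are both invented: a biased sample estimates the wrong quantity ever more precisely as it grows, while a small simple random sample stays close to the population mean.

```r
# Hypothetical population of amounts (exponential, mean 40: an assumption)
set.seed(7)
N <- 1e6
population <- rexp(N, rate = 1 / 40)
true_mean  <- mean(population)

# Hypothetical selection mechanism: the larger the amount, the more likely
# it is to be recorded (probability capped at 1 from 100 onwards)
recorded    <- runif(N) < pmin(1, population / 100)
biased_pool <- population[recorded]

# Larger biased samples only estimate the wrong quantity more precisely
for (n in c(1e3, 1e4, 1e5)) {
  cat(sprintf("biased sample, n = %6d: mean = %6.2f\n",
              as.integer(n), mean(biased_pool[seq_len(n)])))
}

# A small simple random sample is close to the truth
srs <- sample(population, 1000)
cat(sprintf("random sample, n =   1000: mean = %6.2f (population mean %5.2f)\n",
            mean(srs), true_mean))
```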
Issues about Big Data
Usual statistical questions are also valid in Big Data:
- Why Big? Big N, small P; small N, big P; or big N, big P?
- Are the right variables available, and are the available variables accurate enough?
- Does the data represent the population we wish to make inferences about (or predict)?
- Time frame?
- Redundancies?
- Homogeneity over time?
- Missing data?
Basic Toolbox for Big Data
- Aggregation, grouping and blocking. Summarizing transactions by day or by hour can be enough, and the groups can be homogeneous at certain levels.
- Compression and sparsity exploitation. Sometimes the information can be compressed; for example, a functional trajectory can be summarized by a few coefficients of a basis.
- Sufficient statistics. What information is enough to be stored?
- Fragmentation and divisibility (divide et impera). Many small problems can be solved faster than the big one.
- Recursive vs global estimation. Recursive estimation can dramatically reduce the amount of memory a problem needs.
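Three items of the toolbox above (sufficient statistics, divide et impera and recursive estimation) combine naturally. A minimal sketch, on simulated skewed amounts: each chunk is reduced to the triple (count, mean, sum of squared deviations), and triples are merged with the standard pairwise update of Chan, Golub and LeVeque, so the global mean and variance are obtained without holding all the raw data at once. With chunks of size one, the same merge reduces to Welford's recursive algorithm. The function names are illustrative, not from any package.

```r
# Summary of one chunk: count, mean and sum of squared deviations (m2)
chunk_stats <- function(x) {
  m <- mean(x)
  list(n = length(x), mean = m, m2 = sum((x - m)^2))
}

# Merge two summaries without revisiting the raw data (pairwise update)
merge_stats <- function(a, b) {
  n     <- a$n + b$n
  delta <- b$mean - a$mean
  list(n    = n,
       mean = a$mean + delta * b$n / n,
       m2   = a$m2 + b$m2 + delta^2 * a$n * b$n / n)
}

set.seed(123)
x      <- rlnorm(1e6, meanlog = 3)                   # hypothetical amounts
chunks <- split(x, ceiling(seq_along(x) / 1e5))      # 10 chunks of 1e5 values

# Each chunk could be processed on a different machine or at a different time
total <- Reduce(merge_stats, lapply(chunks, chunk_stats))
print(c(mean = total$mean, var = total$m2 / (total$n - 1)))
```

Only three numbers per chunk need to be stored, and the merged result agrees with `mean(x)` and `var(x)` up to rounding.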
Why is everything so tough with Big Data? I
- The development stage is tedious because every single try takes a long time.
- How can bias in the data be detected?
- Data must be purged automatically: robustness vs efficiency.
- Possible effects of Simpson's paradox in numerical summaries.
- Diagnostic plots are usually not useful because of the sheer number of graphic elements: graphical displays must be rethought.
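Simpson's paradox in numerical summaries, mentioned above, appears whenever an overall mean is a weighted average over groups with different compositions. The numbers below are invented for illustration: region A has the higher mean amount in every category, yet region B has the higher overall mean because a larger share of its transactions fall in the expensive category.

```r
# Hypothetical per-group summaries (invented numbers, not the talk's data):
# number of transactions and mean amount by region and category.
grp <- data.frame(
  region   = c("A", "A", "B", "B"),
  category = c("travel", "other", "travel", "other"),
  n        = c(10, 990, 200, 800),
  mean     = c(500, 30, 400, 25)
)

# Within each category, region A has the higher mean amount ...
within <- tapply(grp$mean, list(grp$region, grp$category), sum)
print(within)

# ... but the overall mean is a count-weighted average of category means,
# and region B spends much more often in the expensive category.
overall <- sapply(split(grp, grp$region),
                  function(d) sum(d$n * d$mean) / sum(d$n))
print(overall)   # overall means: A = 34.7, B = 100
```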
Why is everything so tough with Big Data? II
load("total.RData")
> Records: 14710663
>   sexo            importe             localidad
>   M   :10589903   Min.   :-9010.00   Coruña    :6623162
>   V   : 4120327   1st Qu.:   12.00   Lugo      :1013069
>   NA's:     433   Median :   23.00   Ourense   : 893682
>                   Mean   :   36.70   Pontevedra:4619785
>                   3rd Qu.:   42.85   Resto     :1560965
>                   Max.   :25000.00
>    user  system elapsed
>    1.59    0.20    1.79
(Variables: sexo = sex, importe = amount, localidad = location.)
A couple of examples I
system.time(mimpor <- mean(importe))
>    user  system elapsed
>    0.77    0.09    1.19
system.time(medimpor <- median(importe))
>    user  system elapsed
>    0.31    0.01    0.32
system.time(mtrimpor <- mean(importe, trim = 0.05))
>    user  system elapsed
>    0.39    0.05    0.43
> Mean: 39.28  Median: 24.32  5% trimmed mean: 30.94
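The gap between the mean, the median and the 5% trimmed mean above is what one expects from skewed amounts with a few gross errors. The sketch below simulates that situation: a log-normal "clean" sample plus a few copies of the extreme values seen in the summary's Min and Max (how many copies there are is invented). A handful of bad records moves the mean noticeably, while the median and `mean(v, trim = 0.05)` barely change, which is why robust summaries are attractive when data cannot be purged by hand.

```r
# Hypothetical amounts: a skewed "clean" sample plus a few gross errors.
# The values 25000 and -9010 mimic the extreme Max and Min in the summary;
# the number of such records is invented.
set.seed(2015)
x     <- rlnorm(1e5, meanlog = log(25), sdlog = 0.8)
x_bad <- c(x, rep(25000, 50), rep(-9010, 10))

estimates <- function(v) {
  c(mean = mean(v), median = median(v), trim5 = mean(v, trim = 0.05))
}
res <- rbind(clean = estimates(x), contaminated = estimates(x_bad))
print(round(res, 2))
```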
A couple of examples II
> Mean of importe
>    Coruña    Lugo  Ourense  Pontevedra   Resto
>     37.68   40.47    40.03       38.73   46.71
>
>            Coruña    Lugo  Ourense  Pontevedra   Resto
> AViaj      651.99  471.15   515.08      563.40  245.77
> Otros       34.43   35.95    35.14       35.91   42.98
> Tienda      46.75   50.14    50.01       45.72   53.51
> Ocio        34.79   68.93    50.45       48.20   45.11
> Super       29.18   32.02    28.90       28.76   27.23
>
> Cases
>            Coruña    Lugo  Ourense  Pontevedra   Resto
> AViaj        5026    1238      983        3501   17305
> Otros     4625995  654861   555500     3060637 1150450
> Tienda    1491502  278025   262802     1167781  232553
> Ocio        45926    1453     1539       33442   28332
> Super      173902   36256    35011      169909   35250
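Tables like the ones above (means and counts of `importe` by category and location) are one `tapply` and one `table` call away. The sketch below uses a simulated stand-in: the column name `categoria` and all frequencies and amounts are assumptions, with AViaj amounts made much larger to mimic the slide. It also checks that the one-way mean per location equals the count-weighted average of the category means, so a location's overall mean depends on its category mix as much as on its prices.

```r
# Hypothetical stand-in for the transaction table: the column 'categoria'
# and all simulated frequencies and amounts are assumptions.
set.seed(3)
n <- 2e5
provinces  <- c("Coruña", "Lugo", "Ourense", "Pontevedra", "Resto")
categories <- c("AViaj", "Otros", "Tienda", "Ocio", "Super")
df <- data.frame(
  localidad = factor(sample(provinces, n, replace = TRUE,
                            prob = c(45, 7, 6, 31, 11))),
  categoria = factor(sample(categories, n, replace = TRUE,
                            prob = c(0.2, 67, 23, 0.7, 3)))
)
# AViaj amounts are made much larger on average, mimicking the slide
df$importe <- rlnorm(n, meanlog = log(25) + 2.5 * (df$categoria == "AViaj"))

# Two-way tables of means and counts
means  <- with(df, tapply(importe, list(categoria, localidad), mean))
counts <- with(df, table(categoria, localidad))
print(round(means, 2))
print(counts)

# One-way mean per location vs count-weighted average of category means
marginal <- with(df, tapply(importe, localidad, mean))
weighted <- colSums(means * counts) / colSums(counts)
print(round(rbind(marginal = as.vector(marginal), weighted = weighted), 2))
```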
timePlot
[Figure: normalised level (y axis, 0.8 to 1.4) of "Suma" (sum) and "Media" (mean) of importe along the year; x axis ticks ene, abr, jul, oct (Jan, Apr, Jul, Oct).]
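Yearly curves such as the ones in this time plot are functional trajectories, and the toolbox's compression idea applies to them: a whole year of daily values can be summarized by a few coefficients of a basis. In this sketch the daily series is simulated (invented yearly and weekly shapes plus noise) and the basis, three yearly Fourier harmonics plus one weekly harmonic, is an arbitrary choice fitted by least squares with `lm`.

```r
# Hypothetical daily series for one year: yearly seasonality, a weekly
# cycle and noise (all shapes and sizes invented for the sketch).
set.seed(10)
day <- 1:365
y <- 1e6 * (1 + 0.2 * sin(2 * pi * day / 365) + 0.1 * cos(2 * pi * day / 7)) +
  rnorm(365, sd = 5e4)

# Sine/cosine pairs for harmonics 1..K of a given period
fourier_basis <- function(t, K, period) {
  do.call(cbind, lapply(seq_len(K), function(k)
    cbind(sin(2 * pi * k * t / period), cos(2 * pi * k * t / period))))
}

# Assumed basis: 3 yearly harmonics plus the first weekly harmonic
B   <- cbind(fourier_basis(day, K = 3, period = 365),
             fourier_basis(day, K = 1, period = 7))
fit <- lm(y ~ B)

coefs <- coef(fit)   # intercept + 8 basis coefficients replace 365 values
cat("stored coefficients:", length(coefs), "\n")
cat("share of variance explained:", round(summary(fit)$r.squared, 3), "\n")
```

The number of harmonics trades compression against fidelity; on real data it would be chosen by a validation criterion rather than fixed in advance.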
Calendar Plot
[Figure: calendar plot of Sum(importe) in 2015, one panel per month (January to December), days laid out by weekday (l m m j v s d = Monday to Sunday); colour scale from 5e+05 to 3e+06.]
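Each cell of a calendar plot like this one is a daily aggregate, placed by weekday and week of the year. The sketch below shows that data preparation on simulated transactions for 2015 (dates and amounts are invented): `tapply` aggregates to one total per day, and `format` with `%u` (weekday, Monday = 1) and `%W` (week of the year, weeks starting on Monday) gives the grid coordinates. Replacing `sum` with `mean` in the first `tapply` would give the data for a calendar of mean amounts.

```r
# Hypothetical transactions for 2015: a random date and an amount each
set.seed(2015)
dates   <- seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "day")
n       <- 2e5
tx_date <- sample(dates, n, replace = TRUE)
importe <- rlnorm(n, meanlog = log(25), sdlog = 0.9)

# Aggregation: one total per day (the quantity coloured in each cell)
daily     <- tapply(importe, tx_date, sum)
day_dates <- as.Date(names(daily))

# Calendar coordinates: weekday (1 = Monday ... 7 = Sunday) and week of
# the year with weeks starting on Monday (%W starts at 0, shifted to 1)
wday <- as.integer(format(day_dates, "%u"))
week <- as.integer(format(day_dates, "%W")) + 1L

# Week-by-weekday grid; cells with no day of 2015 stay NA
grid <- tapply(as.vector(daily), list(week, wday), sum)
print(dim(grid))
```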
Calendar Plot II
[Figure: calendar plot of Mean(importe) in 2015, one panel per month, days laid out by weekday (l m m j v s d = Monday to Sunday).]
22 23 24 25 26 27 28
15 16 17 18 19 20 21
8 9 10 11 12 13 14
1 2 3 4 5 6 7
25 26 27 28 29 30 31
29 30 1 2 3 4 5
22 23 24 25 26 27 28
15 16 17 18 19 20 21
8 9 10 11 12 13 14
1 2 3 4 5 6 7
25 26 27 28 29 30 31
29 30 1 2 3 4 5
22 23 24 25 26 27 28
15 16 17 18 19 20 21
8 9 10 11 12 13 14
1 2 3 4 5 6 7
25 26 27 28 29 30 31
29 30 1 2 3 4 5
22 23 24 25 26 27 28
15 16 17 18 19 20 21
8 9 10 11 12 13 14
1 2 3 4 5 6 7
25 26 27 28 29 30 31
29 30 1 2 3 4 5
22 23 24 25 26 27 28
15 16 17 18 19 20 21
8 9 10 11 12 13 14
1 2 3 4 5 6 7
25 26 27 28 29 30 31
29 30 1 2 3 4 5
22 23 24 25 26 27 28
15 16 17 18 19 20 21
8 9 10 11 12 13 14
1 2 3 4 5 6 7
25 26 27 28 29 30 31
junio
l m m j v s d
3 4 5 6 7 8 9
27 28 29 30 31 1 2
20 21 22 23 24 25 26
13 14 15 16 17 18 19
6 7 8 9 10 11 12
29 30 1 2 3 4 5
3 4 5 6 7 8 9
27 28 29 30 31 1 2
20 21 22 23 24 25 26
13 14 15 16 17 18 19
6 7 8 9 10 11 12
29 30 1 2 3 4 5
3 4 5 6 7 8 9
27 28 29 30 31 1 2
20 21 22 23 24 25 26
13 14 15 16 17 18 19
6 7 8 9 10 11 12
29 30 1 2 3 4 5
3 4 5 6 7 8 9
27 28 29 30 31 1 2
20 21 22 23 24 25 26
13 14 15 16 17 18 19
6 7 8 9 10 11 12
29 30 1 2 3 4 5
3 4 5 6 7 8 9
27 28 29 30 31 1 2
20 21 22 23 24 25 26
13 14 15 16 17 18 19
6 7 8 9 10 11 12
29 30 1 2 3 4 5
3 4 5 6 7 8 9
27 28 29 30 31 1 2
20 21 22 23 24 25 26
13 14 15 16 17 18 19
6 7 8 9 10 11 12
29 30 1 2 3 4 5
3 4 5 6 7 8 9
27 28 29 30 31 1 2
20 21 22 23 24 25 26
13 14 15 16 17 18 19
6 7 8 9 10 11 12
29 30 1 2 3 4 5
3 4 5 6 7 8 9
27 28 29 30 31 1 2
20 21 22 23 24 25 26
13 14 15 16 17 18 19
6 7 8 9 10 11 12
29 30 1 2 3 4 5
3 4 5 6 7 8 9
27 28 29 30 31 1 2
20 21 22 23 24 25 26
13 14 15 16 17 18 19
6 7 8 9 10 11 12
29 30 1 2 3 4 5
3 4 5 6 7 8 9
27 28 29 30 31 1 2
20 21 22 23 24 25 26
13 14 15 16 17 18 19
6 7 8 9 10 11 12
29 30 1 2 3 4 5
3 4 5 6 7 8 9
27 28 29 30 31 1 2
20 21 22 23 24 25 26
13 14 15 16 17 18 19
6 7 8 9 10 11 12
29 30 1 2 3 4 5
3 4 5 6 7 8 9
27 28 29 30 31 1 2
20 21 22 23 24 25 26
13 14 15 16 17 18 19
6 7 8 9 10 11 12
29 30 1 2 3 4 5
julio
l m m j v s d
31 1 2 3 4 5 6
24 25 26 27 28 29 30
17 18 19 20 21 22 23
10 11 12 13 14 15 16
3 4 5 6 7 8 9
27 28 29 30 31 1 2
31 1 2 3 4 5 6
24 25 26 27 28 29 30
17 18 19 20 21 22 23
10 11 12 13 14 15 16
3 4 5 6 7 8 9
27 28 29 30 31 1 2
31 1 2 3 4 5 6
24 25 26 27 28 29 30
17 18 19 20 21 22 23
10 11 12 13 14 15 16
3 4 5 6 7 8 9
27 28 29 30 31 1 2
31 1 2 3 4 5 6
24 25 26 27 28 29 30
17 18 19 20 21 22 23
10 11 12 13 14 15 16
3 4 5 6 7 8 9
27 28 29 30 31 1 2
31 1 2 3 4 5 6
24 25 26 27 28 29 30
17 18 19 20 21 22 23
10 11 12 13 14 15 16
3 4 5 6 7 8 9
27 28 29 30 31 1 2
31 1 2 3 4 5 6
24 25 26 27 28 29 30
17 18 19 20 21 22 23
10 11 12 13 14 15 16
3 4 5 6 7 8 9
27 28 29 30 31 1 2
31 1 2 3 4 5 6
24 25 26 27 28 29 30
17 18 19 20 21 22 23
10 11 12 13 14 15 16
3 4 5 6 7 8 9
27 28 29 30 31 1 2
31 1 2 3 4 5 6
24 25 26 27 28 29 30
17 18 19 20 21 22 23
10 11 12 13 14 15 16
3 4 5 6 7 8 9
27 28 29 30 31 1 2
31 1 2 3 4 5 6
24 25 26 27 28 29 30
17 18 19 20 21 22 23
10 11 12 13 14 15 16
3 4 5 6 7 8 9
27 28 29 30 31 1 2
31 1 2 3 4 5 6
24 25 26 27 28 29 30
17 18 19 20 21 22 23
10 11 12 13 14 15 16
3 4 5 6 7 8 9
27 28 29 30 31 1 2
31 1 2 3 4 5 6
24 25 26 27 28 29 30
17 18 19 20 21 22 23
10 11 12 13 14 15 16
3 4 5 6 7 8 9
27 28 29 30 31 1 2
31 1 2 3 4 5 6
24 25 26 27 28 29 30
17 18 19 20 21 22 23
10 11 12 13 14 15 16
3 4 5 6 7 8 9
27 28 29 30 31 1 2
agosto
l m m j v s d
5 6 7 8 9 10 11
28 29 30 1 2 3 4
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
31 1 2 3 4 5 6
5 6 7 8 9 10 11
28 29 30 1 2 3 4
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
31 1 2 3 4 5 6
5 6 7 8 9 10 11
28 29 30 1 2 3 4
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
31 1 2 3 4 5 6
5 6 7 8 9 10 11
28 29 30 1 2 3 4
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
31 1 2 3 4 5 6
5 6 7 8 9 10 11
28 29 30 1 2 3 4
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
31 1 2 3 4 5 6
5 6 7 8 9 10 11
28 29 30 1 2 3 4
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
31 1 2 3 4 5 6
5 6 7 8 9 10 11
28 29 30 1 2 3 4
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
31 1 2 3 4 5 6
5 6 7 8 9 10 11
28 29 30 1 2 3 4
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
31 1 2 3 4 5 6
5 6 7 8 9 10 11
28 29 30 1 2 3 4
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
31 1 2 3 4 5 6
5 6 7 8 9 10 11
28 29 30 1 2 3 4
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
31 1 2 3 4 5 6
5 6 7 8 9 10 11
28 29 30 1 2 3 4
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
31 1 2 3 4 5 6
5 6 7 8 9 10 11
28 29 30 1 2 3 4
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
31 1 2 3 4 5 6
septiembre
l m m j v s d
2 3 4 5 6 7 8
26 27 28 29 30 31 1
19 20 21 22 23 24 25
12 13 14 15 16 17 18
5 6 7 8 9 10 11
28 29 30 1 2 3 4
2 3 4 5 6 7 8
26 27 28 29 30 31 1
19 20 21 22 23 24 25
12 13 14 15 16 17 18
5 6 7 8 9 10 11
28 29 30 1 2 3 4
2 3 4 5 6 7 8
26 27 28 29 30 31 1
19 20 21 22 23 24 25
12 13 14 15 16 17 18
5 6 7 8 9 10 11
28 29 30 1 2 3 4
2 3 4 5 6 7 8
26 27 28 29 30 31 1
19 20 21 22 23 24 25
12 13 14 15 16 17 18
5 6 7 8 9 10 11
28 29 30 1 2 3 4
2 3 4 5 6 7 8
26 27 28 29 30 31 1
19 20 21 22 23 24 25
12 13 14 15 16 17 18
5 6 7 8 9 10 11
28 29 30 1 2 3 4
2 3 4 5 6 7 8
26 27 28 29 30 31 1
19 20 21 22 23 24 25
12 13 14 15 16 17 18
5 6 7 8 9 10 11
28 29 30 1 2 3 4
2 3 4 5 6 7 8
26 27 28 29 30 31 1
19 20 21 22 23 24 25
12 13 14 15 16 17 18
5 6 7 8 9 10 11
28 29 30 1 2 3 4
2 3 4 5 6 7 8
26 27 28 29 30 31 1
19 20 21 22 23 24 25
12 13 14 15 16 17 18
5 6 7 8 9 10 11
28 29 30 1 2 3 4
2 3 4 5 6 7 8
26 27 28 29 30 31 1
19 20 21 22 23 24 25
12 13 14 15 16 17 18
5 6 7 8 9 10 11
28 29 30 1 2 3 4
2 3 4 5 6 7 8
26 27 28 29 30 31 1
19 20 21 22 23 24 25
12 13 14 15 16 17 18
5 6 7 8 9 10 11
28 29 30 1 2 3 4
2 3 4 5 6 7 8
26 27 28 29 30 31 1
19 20 21 22 23 24 25
12 13 14 15 16 17 18
5 6 7 8 9 10 11
28 29 30 1 2 3 4
2 3 4 5 6 7 8
26 27 28 29 30 31 1
19 20 21 22 23 24 25
12 13 14 15 16 17 18
5 6 7 8 9 10 11
28 29 30 1 2 3 4
octubre
l m m j v s d
30 1 2 3 4 5 6
23 24 25 26 27 28 29
16 17 18 19 20 21 22
9 10 11 12 13 14 15
2 3 4 5 6 7 8
26 27 28 29 30 31 1
30 1 2 3 4 5 6
23 24 25 26 27 28 29
16 17 18 19 20 21 22
9 10 11 12 13 14 15
2 3 4 5 6 7 8
26 27 28 29 30 31 1
30 1 2 3 4 5 6
23 24 25 26 27 28 29
16 17 18 19 20 21 22
9 10 11 12 13 14 15
2 3 4 5 6 7 8
26 27 28 29 30 31 1
30 1 2 3 4 5 6
23 24 25 26 27 28 29
16 17 18 19 20 21 22
9 10 11 12 13 14 15
2 3 4 5 6 7 8
26 27 28 29 30 31 1
30 1 2 3 4 5 6
23 24 25 26 27 28 29
16 17 18 19 20 21 22
9 10 11 12 13 14 15
2 3 4 5 6 7 8
26 27 28 29 30 31 1
30 1 2 3 4 5 6
23 24 25 26 27 28 29
16 17 18 19 20 21 22
9 10 11 12 13 14 15
2 3 4 5 6 7 8
26 27 28 29 30 31 1
30 1 2 3 4 5 6
23 24 25 26 27 28 29
16 17 18 19 20 21 22
9 10 11 12 13 14 15
2 3 4 5 6 7 8
26 27 28 29 30 31 1
30 1 2 3 4 5 6
23 24 25 26 27 28 29
16 17 18 19 20 21 22
9 10 11 12 13 14 15
2 3 4 5 6 7 8
26 27 28 29 30 31 1
30 1 2 3 4 5 6
23 24 25 26 27 28 29
16 17 18 19 20 21 22
9 10 11 12 13 14 15
2 3 4 5 6 7 8
26 27 28 29 30 31 1
30 1 2 3 4 5 6
23 24 25 26 27 28 29
16 17 18 19 20 21 22
9 10 11 12 13 14 15
2 3 4 5 6 7 8
26 27 28 29 30 31 1
30 1 2 3 4 5 6
23 24 25 26 27 28 29
16 17 18 19 20 21 22
9 10 11 12 13 14 15
2 3 4 5 6 7 8
26 27 28 29 30 31 1
30 1 2 3 4 5 6
23 24 25 26 27 28 29
16 17 18 19 20 21 22
9 10 11 12 13 14 15
2 3 4 5 6 7 8
26 27 28 29 30 31 1
noviembre
l m m j v s d
4 5 6 7 8 9 10
28 29 30 31 1 2 3
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
30 1 2 3 4 5 6
4 5 6 7 8 9 10
28 29 30 31 1 2 3
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
30 1 2 3 4 5 6
4 5 6 7 8 9 10
28 29 30 31 1 2 3
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
30 1 2 3 4 5 6
4 5 6 7 8 9 10
28 29 30 31 1 2 3
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
30 1 2 3 4 5 6
4 5 6 7 8 9 10
28 29 30 31 1 2 3
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
30 1 2 3 4 5 6
4 5 6 7 8 9 10
28 29 30 31 1 2 3
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
30 1 2 3 4 5 6
4 5 6 7 8 9 10
28 29 30 31 1 2 3
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
30 1 2 3 4 5 6
4 5 6 7 8 9 10
28 29 30 31 1 2 3
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
30 1 2 3 4 5 6
4 5 6 7 8 9 10
28 29 30 31 1 2 3
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
30 1 2 3 4 5 6
4 5 6 7 8 9 10
28 29 30 31 1 2 3
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
30 1 2 3 4 5 6
4 5 6 7 8 9 10
28 29 30 31 1 2 3
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
30 1 2 3 4 5 6
4 5 6 7 8 9 10
28 29 30 31 1 2 3
21 22 23 24 25 26 27
14 15 16 17 18 19 20
7 8 9 10 11 12 13
30 1 2 3 4 5 6
diciembre
34
36
38
40
42
44
46
48
50
Regression I
• Even the simplest techniques are hard to implement (in terms of computing time).
• We must adapt our inferences to the new paradigm (POPULATION BIG DATA).
• The idea of a single simple model for your data that fulfils the usual hypotheses is (probably) wrong.
> [1] "Significative variables in more than 30% of repetitions"
> [1] "(Intercept)"            "sexoV"
> [3] "segmento_edadAdultos 2" "segmento_edadJovenes"
> [5] "segmento_edadSenior"    "localidadLugo"
> [7] "localidadOurense"       "localidadResto"
> [9] "semana01"
> [1] "..."
> [1] "semana50"
> [2] "semana51"
> [3] "semana52"
> [4] "sexoV:localidadPontevedra"
> [5] "segmento_edadAdultos 2:localidadResto"
> [6] "segmento_edadJovenes:localidadResto"
> [7] "segmento_edadSenior:localidadResto"
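The list above keeps the terms that were significant in more than 30% of the repetitions. A minimal sketch of that counting scheme, assuming each repetition draws a random subsample, tests every term at the 5% level and tallies how often it is significant. To stay self-contained it swaps the slide's full linear model for one simple regression per predictor (a marginal t-test), so it illustrates the bookkeeping, not the author's model; the data and the names `selection_frequency` and `slope_t_stat` are hypothetical.

```python
import math
import random
import statistics


def slope_t_stat(x, y):
    """t statistic of the slope b in the simple regression y = a + b*x."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return b / math.sqrt(rss / (n - 2) / sxx)


def selection_frequency(columns, y, n_sub, n_rep, crit=1.96, seed=1):
    """Share of random subsamples in which each predictor has |t| > crit.

    columns: dict name -> list of predictor values (same length as y).
    Simplification: one marginal regression per predictor, not a joint model.
    """
    rng = random.Random(seed)
    counts = dict.fromkeys(columns, 0)
    for _ in range(n_rep):
        sub = rng.sample(range(len(y)), n_sub)  # subsample without replacement
        ys = [y[i] for i in sub]
        for name, col in columns.items():
            if abs(slope_t_stat([col[i] for i in sub], ys)) > crit:
                counts[name] += 1
    return {name: c / n_rep for name, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical data: a weak real effect and a pure-noise predictor.
    rng = random.Random(0)
    n = 10_000
    signal = [rng.gauss(0, 1) for _ in range(n)]
    noise = [rng.gauss(0, 1) for _ in range(n)]
    y = [0.1 * s + rng.gauss(0, 1) for s in signal]
    freq = selection_frequency({"signal": signal, "noise": noise}, y, n_sub=1000, n_rep=100)
    print(freq)
    print("kept (> 30%):", [k for k, v in freq.items() if v > 0.30])
```

With all the data almost every term tends to be significant; counting across moderate subsamples separates stable effects from those that only appear because N is huge.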
Regression II
[Figure: Importe by week (Semana, 0 to 50), vertical axis from 28 to 40; series SV, SS, BF]
Effects
[Figure: kernel density estimates (N = 500 each) for semana11, semana12, semana13, semana14, semana49, semana50, semana51 and semana52, with bandwidths 0.2753, 0.3022, 0.3112, 0.2884, 0.2866, 0.2903, 0.2899 and 0.3216; horizontal axis 30 to 40, Density 0 to 0.4]
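The "N = 500 Bandwidth = ..." subtitles match what R prints under `plot(density(x))`, whose default bandwidth is Silverman's rule of thumb (`bw.nrd0`). Assuming these panels used that default, a minimal Gaussian kernel density sketch with the same rule is below; the sample in the usage block is hypothetical.

```python
import math
import statistics


def bw_nrd0(x):
    """Silverman's rule of thumb as in R's bw.nrd0: 0.9 * min(sd, IQR/1.34) * n^(-1/5)."""
    sd = statistics.stdev(x)
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")  # R's default quantile type 7
    # Same fall-backs as R when the spread estimate is zero.
    lo = min(sd, (q3 - q1) / 1.34) or sd or abs(x[0]) or 1.0
    return 0.9 * lo * len(x) ** (-0.2)


def gaussian_kde(x, grid, bw=None):
    """Gaussian kernel density estimate of sample x at each point of grid."""
    h = bw_nrd0(x) if bw is None else bw
    norm = 1.0 / (len(x) * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - xi) / h) ** 2) for xi in x) for g in grid]


if __name__ == "__main__":
    sample = [30.2, 31.5, 32.0, 32.4, 33.1, 33.3, 34.0, 35.8]  # hypothetical values
    print("bandwidth:", round(bw_nrd0(sample), 4))
    print([round(d, 3) for d in gaussian_kde(sample, [30, 32, 34, 36])])
```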
Client profiles
• The filtering process must be carefully automated.
• Building each profile is an easily parallelized task.
• In this case, as a proof of concept, we reduce the complexity of the analysis by increasing the complexity of the statistical objects (functional data analysis).
• Because of the new nature of the objects, classical inference and numerical methods must be adapted accordingly.
[Figure: log(importe), 20 sample curves X(t), t from 0 to 50]
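Because each client's profile depends only on that client's records, the construction parallelizes trivially. A minimal sketch, assuming hypothetical transaction records `(client_id, week, amount)` and a 52-week grid; it uses `log1p` rather than a plain log so that weeks without spending stay finite, which is a choice of this sketch and not necessarily of the original analysis.

```python
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

N_WEEKS = 52  # assumption: one profile value per calendar week


def group_by_client(transactions):
    """Split (client_id, week, amount) records into one list per client."""
    by_client = defaultdict(list)
    for client, week, amount in transactions:
        by_client[client].append((week, amount))
    return by_client


def build_profile(records):
    """Weekly curve X(t) = log(1 + total amount in week t), t = 1..N_WEEKS."""
    totals = [0.0] * N_WEEKS
    for week, amount in records:
        if 1 <= week <= N_WEEKS and amount >= 0:  # simple automated filter
            totals[week - 1] += amount
    return [math.log1p(v) for v in totals]


def build_all_profiles(transactions, max_workers=4):
    """Build every client's profile independently, one task per client."""
    by_client = group_by_client(transactions)
    clients = sorted(by_client)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        curves = list(pool.map(build_profile, [by_client[c] for c in clients]))
    return dict(zip(clients, curves))


if __name__ == "__main__":
    tx = [("c1", 1, 25.0), ("c1", 1, 10.0), ("c1", 2, 40.0), ("c2", 3, 12.5), ("c2", 60, 99.0)]
    profiles = build_all_profiles(tx)
    print({c: [round(v, 2) for v in p[:3]] for c, p in profiles.items()})
```

Threads keep the example portable; for CPU-heavy profiles, `ProcessPoolExecutor` (same interface) avoids Python's global interpreter lock.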
Functional Means
[Figure: four panels of mean curves X(t) of log(Importe), t from 0 to 50: by Sexo (M, V); by Localidad (Coruña, Lugo, Ourense, Pontevedra, Resto); by Actividad (Inactivo, Ocupado, Otras, Parado); by Segmento Edad (Adultos 1, Adultos 2, Jovenes, Senior)]
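Each panel shows the pointwise mean of the curves within each level of a factor. A sketch of that computation, assuming the curves are already sampled on a common weekly grid (the function name and the example curves are hypothetical):

```python
def functional_means(curves, labels):
    """Pointwise mean curve for each group level.

    curves: equally long lists, each curve X_i(t) sampled on a common grid.
    labels: the group level of each curve (e.g. 'M' or 'V' for sexo).
    """
    sums, counts = {}, {}
    for curve, lab in zip(curves, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(curve)
            counts[lab] = 0
        sums[lab] = [s + v for s, v in zip(sums[lab], curve)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


if __name__ == "__main__":
    curves = [[3.5, 3.7, 3.9], [3.9, 4.1, 4.0], [3.2, 3.4, 3.3]]  # hypothetical log(importe) curves
    print(functional_means(curves, ["M", "M", "V"]))
```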
Inferences about the mean
[Figure: two panels of X(t), t from 0 to 50, vertical axis 3.5 to 4.2: "N=62547-Boot(500)" and "N=5000-Sample(500)"]
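The two panels contrast resampling the full sample (N = 62547, 500 bootstrap replicates) with drawing 500 subsamples of N = 5000. A minimal sketch of both, assuming curves on a common grid; the band is a pointwise percentile band, not a simultaneous one, and the slide does not state whether the original used exactly this construction. Function names and data are hypothetical.

```python
import random


def mean_curve(curves):
    """Pointwise mean of curves sampled on a common grid."""
    n = len(curves)
    return [sum(values) / n for values in zip(*curves)]


def bootstrap_mean_band(curves, n_boot=500, level=0.95, seed=0):
    """Mean curve plus a pointwise percentile bootstrap band (resampling whole curves)."""
    rng = random.Random(seed)
    n = len(curves)
    boot = [mean_curve(rng.choices(curves, k=n)) for _ in range(n_boot)]
    alpha = (1.0 - level) / 2.0
    lo_idx = int(alpha * (n_boot - 1))
    hi_idx = int((1.0 - alpha) * (n_boot - 1))
    lower, upper = [], []
    for values_at_t in zip(*boot):  # bootstrap distribution at each grid point
        ordered = sorted(values_at_t)
        lower.append(ordered[lo_idx])
        upper.append(ordered[hi_idx])
    return mean_curve(curves), lower, upper


def subsample_mean_curves(curves, m, n_rep=500, seed=0):
    """Mean curves of n_rep subsamples of size m drawn without replacement."""
    rng = random.Random(seed)
    return [mean_curve(rng.sample(curves, m)) for _ in range(n_rep)]


if __name__ == "__main__":
    rng = random.Random(1)
    curves = [[3.8 + rng.gauss(0, 0.5) for _ in range(5)] for _ in range(300)]  # hypothetical
    mean, lower, upper = bootstrap_mean_band(curves, n_boot=200)
    print([round(v, 3) for v in lower])
    print([round(v, 3) for v in upper])
```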
Comparing means
[Figure: Means by Localidad, importe by semana (0 to 50), vertical axis 3.4 to 4.2; legend C, LU, OU, PO]
Example of a testing procedure
library(fda.usc)  # anova.RPm(): ANOVA for functional data via random projections
rtest = anova.RPm(fdaobj, ~ sexo + sedad + act + sexo:sedad + sexo:act + sedad:act,
                  data.fac = dfdatos[, -c(1, 5)])
> [1] "Pvalues with all data"
>      sexo sedad   act sexo:sedad sexo:act sedad:act
> RP30    0     0 0.024       0.01    0.564     0.001
> [1] "P(pvalue<=0.05), 500 replicas of N=10000"
>  sexo sedad   act sexo:sedad sexo:act
> 0.970 1.000 0.156      0.148    0.078
> sedad:act
>     0.204
> [1] "Mean(pvalue), 500 replicas of N=10000"
>  sexo sedad   act sexo:sedad sexo:act
> 0.008 0.000 0.328      0.385    0.493
> sedad:act
>     0.341
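The output contrasts p-values on all the data with two summaries over 500 subsamples of N = 10000: the proportion of p-values at or below 0.05 and their mean. A sketch of that summarizing step, assuming a simple two-group z-test as a stand-in for `anova.RPm` (which the sketch does not reproduce); data and function names are hypothetical.

```python
import math
import random
import statistics


def z_test_pvalue(a, b):
    """Two-sided p-value of a large-sample z-test for equal means of a and b."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))


def subsample_pvalues(values, groups, m, n_rep=500, seed=0):
    """z-test p-values on n_rep random subsamples of size m (groups coded 0/1)."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_rep):
        sub = rng.sample(range(len(values)), m)
        a = [values[i] for i in sub if groups[i] == 0]
        b = [values[i] for i in sub if groups[i] == 1]
        pvals.append(z_test_pvalue(a, b))
    return pvals


def summarize(pvals, alpha=0.05):
    """The two summaries printed on the slide: P(pvalue <= alpha) and Mean(pvalue)."""
    return sum(p <= alpha for p in pvals) / len(pvals), statistics.fmean(pvals)


if __name__ == "__main__":
    # Hypothetical data: two groups whose means differ by a tiny 0.03.
    rng = random.Random(0)
    n = 100_000
    groups = [i % 2 for i in range(n)]
    values = [rng.gauss(0.03 * g, 1.0) for g in groups]
    full = z_test_pvalue([v for v, g in zip(values, groups) if g == 0],
                         [v for v, g in zip(values, groups) if g == 1])
    print("p-value with all data:", full)
    print("subsamples of 5000:", summarize(subsample_pvalues(values, groups, m=5000, n_rep=200)))
```

A difference that is irrelevant in practice is highly significant with all the data but is detected only occasionally in the subsamples, the same pattern shown by the `act` and interaction terms above.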
A difficult scenario
$P(Y = i \mid X) = m(X_1, X_2)$ with $X = (X_1 | X_2 | Z_1 | \dots | Z_{48})$
Training: 2000, Validation: 1000
[Figure: the sample in the (X1, X2) plane, both axes from -1 to 1]
Classical tools
BIG problem: How to select the information?
> Best with Linear Corr: Z7 Z35
> Best with Dist. Corr: X1 X2 Z7 Z28
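Linear correlation misses X1 and X2 while distance correlation finds them, which is what one expects when the dependence is non-linear. A minimal O(n²) sketch of the sample distance correlation next to Pearson's correlation, checked on a hypothetical y = x² example (not the slide's data):

```python
import math


def _double_centered(v):
    """Double-centred matrix of pairwise distances |v_i - v_j|."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row_means = [sum(r) / n for r in d]  # equal to column means (d is symmetric)
    grand = sum(row_means) / n
    return [[d[i][j] - row_means[i] - row_means[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation of two numeric samples (0 = no dependence detected)."""
    n = len(x)
    A, B = _double_centered(x), _double_centered(y)
    dcov2 = sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(a * a for row in A for a in row) / n**2
    dvar_y = sum(b * b for row in B for b in row) / n**2
    if dvar_x == 0 or dvar_y == 0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))


def pearson(x, y):
    """Ordinary (linear) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    x = [(i - 50) / 50 for i in range(101)]  # grid on [-1, 1]
    y = [v * v for v in x]                   # purely non-linear dependence
    print("linear corr:", round(pearson(x, y), 6))
    print("distance corr:", round(distance_correlation(x, y), 3))
```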
library(MASS)  # lda(): linear discriminant analysis
res.lda1 = lda(X, grupo)
>    grupo
>       1   2
>   1 572 454
>   2 435 539
res.lda2 = lda(X[, l2], grupo)  # l2: the selected columns
>    grupo
>       1   2
>   1 578 506
>   2 429 487
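The error rate of each classifier can be read off its confusion table. A small helper, assuming `table[i][j]` counts cases of true class i assigned to class j, as in `randomForest`'s confusion matrix. Accuracy and overall error do not depend on that orientation, but the per-class errors do; in the `lda` tables above `grupo` appears to label the columns, so those tables are transposed first. The function names are hypothetical.

```python
def confusion_summary(table):
    """Accuracy, overall error and per-class error of a square confusion table.

    Assumes table[i][j] = number of cases of true class i assigned to class j.
    """
    total = sum(sum(row) for row in table)
    correct = sum(table[i][i] for i in range(len(table)))
    class_error = [(sum(row) - row[i]) / sum(row) for i, row in enumerate(table)]
    return correct / total, (total - correct) / total, class_error


def transpose(table):
    """Swap rows and columns, e.g. for tables printed as table(predicted, truth)."""
    return [list(col) for col in zip(*table)]


if __name__ == "__main__":
    rf = [[982, 25], [46, 947]]                  # randomForest OOB confusion matrix from the slides
    lda1 = transpose([[572, 454], [435, 539]])   # lda table, grupo in the columns
    print(confusion_summary(rf))
    print(confusion_summary(lda1))
```

Applied to the slide's counts, the two `lda` fits misclassify (454 + 435)/2000 = 44.45% and (506 + 429)/2000 = 46.75% of the training cases, against the 3.55% OOB error reported for the random forest.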
GLM, GAM and Random Forest I
> [1] "GLM:X1 + X2 + Z7 + Z28"
>      1   2    1   2
> 1  573 453  578 506
> 2  434 540  429 487
> [1] "GAM:s(X1) + s(X2) + s(Z7) + s(Z28)"
>      1   2    1   2
> 1 1007   0 1007   0
> 2    0 993    0 993
GLM, GAM and Random Forest II
>
> Call:
>  randomForest(x = X[, l2], y = grupo, ntree = 2000, importance = TRUE)
>                Type of random forest: classification
>                      Number of trees: 2000
> No. of variables tried at each split: 2
>
>         OOB estimate of  error rate: 3.55%
> Confusion matrix:
>     1   2 class.error
> 1 982  25  0.02482622
> 2  46 947  0.04632427
> Error rate0: 0.216
GLM, GAM and Random Forest III
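[Figure: random forest variable importance (MeanDecreaseAccuracy). Left, all variables (importance scale 0 to 80): Z33, Z19, Z21, Z27, Z37, Z39, Z5, Z45, Z8, Z14, Z12, Z29, Z31, Z43, Z20, Z32, Z3, Z41, Z9, Z25, Z42, Z16, Z7, Z10, Z18, Z35, Z47, Z46, X1, X2. Right, selected variables (scale 0 to 400): Z7, Z28, X1, X2]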
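`MeanDecreaseAccuracy` measures how much accuracy drops when one variable is randomly permuted. randomForest computes it tree by tree on the out-of-bag cases; the sketch below is a simplified version on a single evaluation set, with a hypothetical `rule` function standing in for the fitted forest and hypothetical data.

```python
import random


def accuracy(predict, X, y):
    """Share of rows whose prediction equals the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each column of X is randomly permuted.

    predict: any fitted classifier mapping a row (list) to a label.
    """
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    importance = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between column j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(base - accuracy(predict, X_perm, y))
        importance.append(sum(drops) / n_repeats)
    return importance


def rule(row):
    """Hypothetical classifier standing in for a fitted forest: uses columns 0 and 1 only."""
    return int(row[0] ** 2 + row[1] ** 2 < 0.5)


if __name__ == "__main__":
    rng = random.Random(2)
    X = [[rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(500)]
    y = [rule(r) for r in X]  # the third column is pure noise
    print([round(v, 3) for v in permutation_importance(rule, X, y)])
```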
Neural Networks and SVM I
library(nnet)  # nnet(): single-hidden-layer neural networks
res.nn0 = nnet(ff0, data = Xe, size = 15, rang = 0.2, maxit = 1000, trace = FALSE)
res.nn = nnet(ff1, data = Xe, size = 15, rang = 0.2, maxit = 500, trace = FALSE)
> [1] "Number of weights:91"
cbind(table(grnn0, grupo), table(grnn, grupo))
>      1   2    1   2
> 1  956  52 1007   0
> 2   51 941    0 993
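The reported 91 weights fit a single hidden layer with biases: (p + 1) · 15 input-to-hidden weights plus 15 + 1 hidden-to-output weights for one output unit. Solving (p + 1) · 15 + 16 = 91 gives p = 4, consistent with `ff1` using the four selected variables; this is an inference from the count, not something stated on the slide. The helper name is hypothetical.

```python
def n_weights_single_hidden(n_inputs, n_hidden, n_outputs=1, skip=False):
    """Weights of a one-hidden-layer network with a bias unit in each layer.

    (n_inputs + 1) * n_hidden   input -> hidden (including biases)
    (n_hidden + 1) * n_outputs  hidden -> output (including biases)
    n_inputs * n_outputs        optional direct input -> output links
    """
    weights = (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs
    if skip:
        weights += n_inputs * n_outputs
    return weights


if __name__ == "__main__":
    print(n_weights_single_hidden(4, 15))   # X1, X2, Z7, Z28 with size = 15
    print(n_weights_single_hidden(50, 15))  # X1, X2, Z1..Z48, if all of them enter the model
```

The second call illustrates the point made later about NN and SVM: with all 50 inputs the same architecture would already need 781 weights.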
Neural Networks and SVM II
library(e1071)  # svm(): support vector machines
res.svm = svm(ff1, data = Xe, cross = 5)
> [1] "Number of SV:821"
cbind(table(res.svm0$fitted, grupo), table(res.svm$fitted, grupo))
>      1   2   1   2
> 1  948  96 983  29
> 2   59 897  24 964
[Figure: four panels in the (X1, X2) plane: GAM, Random Forest, Neural Network and SVM; axes from -1 to 1 (SVM panel: -1.5 to 1.5)]
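In e1071's `svm()`, `cross = 5` requests a 5-fold cross-validation on the training data. A generic sketch of that procedure, with a toy nearest-centroid classifier (not an SVM) as a hypothetical stand-in for the fitted model; it averages per-fold accuracies, which may differ slightly from how e1071 aggregates them.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Randomly partition range(n) into k folds of (almost) equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_val_accuracy(fit, X, y, k=5, seed=0):
    """Mean held-out accuracy over k folds.

    fit(X_train, y_train) must return a function mapping one row to a label.
    """
    scores = []
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(sum(predict(X[i]) == y[i] for i in fold) / len(fold))
    return sum(scores) / k


def nearest_centroid_fit(X, y):
    """Toy stand-in classifier (not an SVM): assign a row to the closest class mean."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(row):
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))

    return predict


if __name__ == "__main__":
    rng = random.Random(3)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
    y = [int(a + b > 0) for a, b in X]
    print("5-fold CV accuracy:", cross_val_accuracy(nearest_centroid_fit, X, y, k=5))
```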
Predictive performance I
       GLM.1 GLM.2 GAM.1 GAM.2 RF.1 RF.2 NN.1 NN.2 SVM.1 SVM.2
G1(0)    262   266   513    15  466   62  245  283   287   241
G2(0)    240   232     9   463  161  311  222  250   220   252
G1(S)    249   279   527     1  509   19  517   11   508    20
G2(S)    251   221     5   467   20  452   13  459    16   456
Table: Predictive performance
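Misclassification rates make the table easier to compare. A sketch, assuming each row gives the true group (G1, G2) for one of two fits, presumably with all variables `(0)` and with the selected ones `(S)`, and that the `.1`/`.2` suffix is the predicted class; every model and variant then sums to the 1000 validation cases, which supports that reading.

```python
MODELS = ["GLM", "GAM", "RF", "NN", "SVM"]

# Counts copied from the "Predictive performance" table. Assumed reading:
# row = true group (G1, G2) for fit "(0)" or "(S)"; column suffix .1 / .2 = predicted class.
TABLE = {
    "G1(0)": [262, 266, 513, 15, 466, 62, 245, 283, 287, 241],
    "G2(0)": [240, 232, 9, 463, 161, 311, 222, 250, 220, 252],
    "G1(S)": [249, 279, 527, 1, 509, 19, 517, 11, 508, 20],
    "G2(S)": [251, 221, 5, 467, 20, 452, 13, 459, 16, 456],
}


def validation_error(variant):
    """Misclassification rate of each model for variant '0' or 'S'."""
    g1, g2 = TABLE[f"G1({variant})"], TABLE[f"G2({variant})"]
    errors = {}
    for k, model in enumerate(MODELS):
        wrong = g1[2 * k + 1] + g2[2 * k]  # G1 predicted as 2 plus G2 predicted as 1
        total = g1[2 * k] + g1[2 * k + 1] + g2[2 * k] + g2[2 * k + 1]
        errors[model] = wrong / total
    return errors


if __name__ == "__main__":
    for variant in ("0", "S"):
        print(variant, validation_error(variant))
```

Under that reading the `(0)` errors are GLM 0.506, GAM 0.024, RF 0.223, NN 0.505 and SVM 0.461, and the `(S)` errors are 0.530, 0.006, 0.039, 0.024 and 0.036.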
What's better (in this case)?
• Although the performance of GAM seems optimal, none of its variables is flagged as significant.
• The Random Forest procedure seems the best at detecting the significant variables.
• No useful inferences can be obtained from the NN and SVM procedures: their good predictive ability comes at the cost of a huge number of parameters.
Energy example I
[Figure: four panels, one curve per year from 2008 to 2014. Top: Y = Demanda-Real (actual demand, day t) and X = Demanda-Programada (scheduled demand, day t), in MW, for t = 1/144, ..., 1 (10-minute resolution), roughly 20000 to 35000 MW. Bottom: hourly Energy (MWh, hours 1 to 24) and hourly Price (EUR/MWh, 0 to 150)]
Where is the information? I
>       En.h6 En.h12 En.h18 En.h24 Pr.h6 Pr.h12 Pr.h18 Pr.h24
> dsem  0.010  0.137  0.129  0.026 0.004  0.029  0.025  0.006
> year  0.095  0.059  0.044  0.120 0.114  0.169  0.148  0.185
> mes   0.052  0.026  0.029  0.059 0.052  0.046  0.070  0.042
> tmax  0.099  0.023  0.016  0.088 0.139  0.056  0.020  0.029
> tmin  0.058  0.019  0.009  0.078 0.083  0.043  0.029  0.026
> tmed  0.086  0.024  0.016  0.092 0.120  0.054  0.026  0.030
> velm     NA     NA     NA     NA    NA     NA     NA     NA
> rtemp 0.071  0.008  0.006  0.025 0.128  0.052  0.014  0.032
Where is the information? II
>         En.h6 En.h12 En.h18 En.h24 Pr.h6 Pr.h12 Pr.h18 Pr.h24
> En1.h1  0.465  0.204  0.210  0.481 0.030  0.029  0.025  0.030
> En1.h2  0.508  0.188  0.187  0.423 0.042  0.039  0.030  0.037
> En1.h3  0.552  0.197  0.184  0.391 0.052  0.046  0.035  0.040
> En1.h4  0.590  0.218  0.197  0.400 0.060  0.051  0.041  0.046
> En1.h5  0.611  0.229  0.201  0.405 0.066  0.053  0.042  0.047
> En1.h6  0.636  0.252  0.219  0.420 0.065  0.051  0.042  0.047
> En1.h7  0.633  0.310  0.265  0.421 0.047  0.043  0.040  0.041
> En1.h8  0.456  0.349  0.283  0.370 0.019  0.027  0.034  0.023
> En1.h9  0.331  0.346  0.281  0.306 0.008  0.016  0.021  0.013
> En1.h10 0.336  0.351  0.283  0.330 0.007  0.014  0.020  0.013
> En1.h11 0.327  0.361  0.303  0.334 0.006  0.015  0.018  0.012
> En1.h12 0.356  0.380  0.322  0.359 0.007  0.017  0.021  0.014
> En1.h13 0.349  0.385  0.346  0.372 0.008  0.015  0.019  0.014
> En1.h14 0.345  0.372  0.358  0.361 0.009  0.016  0.018  0.012
> En1.h15 0.387  0.377  0.380  0.408 0.013  0.018  0.021  0.015
> En1.h16 0.351  0.371  0.391  0.363 0.012  0.015  0.015  0.014
> En1.h17 0.346  0.374  0.394  0.349 0.011  0.016  0.017  0.012
> En1.h18 0.356  0.380  0.391  0.379 0.010  0.017  0.022  0.011
> En1.h19 0.400  0.383  0.368  0.478 0.010  0.020  0.041  0.014
> En1.h20 0.425  0.372  0.328  0.536 0.010  0.018  0.046  0.014
> En1.h21 0.448  0.389  0.324  0.585 0.011  0.022  0.050  0.019
> En1.h22 0.488  0.417  0.353  0.679 0.012  0.022  0.041  0.024
> En1.h23 0.521  0.434  0.395  0.759 0.013  0.021  0.039  0.024
> En1.h24 0.561  0.369  0.351  0.698 0.019  0.023  0.034  0.027
Where is the information? III
>         En.h6 En.h12 En.h18 En.h24 Pr.h6 Pr.h12 Pr.h18 Pr.h24
> Pr1.h1  0.063  0.014  0.010  0.014 0.412  0.420  0.334  0.494
> Pr1.h2  0.068  0.012  0.008  0.010 0.447  0.383  0.277  0.457
> Pr1.h3  0.074  0.013  0.010  0.010 0.460  0.339  0.210  0.386
> Pr1.h4  0.073  0.014  0.011  0.012 0.468  0.332  0.196  0.371
> Pr1.h5  0.070  0.013  0.011  0.012 0.482  0.334  0.192  0.363
> Pr1.h6  0.079  0.011  0.011  0.013 0.532  0.386  0.230  0.392
> Pr1.h7  0.083  0.011  0.013  0.017 0.564  0.453  0.281  0.409
> Pr1.h8  0.050  0.013  0.013  0.013 0.464  0.476  0.354  0.400
> Pr1.h9  0.041  0.016  0.014  0.014 0.426  0.474  0.350  0.369
> Pr1.h10 0.042  0.016  0.013  0.018 0.425  0.496  0.382  0.409
> Pr1.h11 0.053  0.017  0.014  0.022 0.467  0.549  0.411  0.459
> Pr1.h12 0.059  0.017  0.014  0.024 0.499  0.587  0.440  0.488
> Pr1.h13 0.058  0.018  0.017  0.026 0.492  0.595  0.460  0.487
> Pr1.h14 0.064  0.016  0.015  0.025 0.530  0.613  0.458  0.508
> Pr1.h15 0.065  0.015  0.014  0.021 0.545  0.606  0.474  0.522
> Pr1.h16 0.067  0.014  0.014  0.019 0.562  0.593  0.447  0.489
> Pr1.h17 0.063  0.015  0.015  0.019 0.552  0.593  0.464  0.473
> Pr1.h18 0.056  0.019  0.017  0.025 0.503  0.591  0.520  0.482
> Pr1.h19 0.048  0.023  0.017  0.040 0.376  0.529  0.595  0.460
> Pr1.h20 0.053  0.030  0.020  0.067 0.251  0.398  0.522  0.357
> Pr1.h21 0.045  0.021  0.013  0.045 0.274  0.436  0.512  0.389
> Pr1.h22 0.058  0.029  0.024  0.065 0.297  0.463  0.489  0.409
> Pr1.h23 0.069  0.022  0.018  0.041 0.473  0.638  0.571  0.583
> Pr1.h24 0.093  0.019  0.015  0.025 0.588  0.622  0.510  0.633
Where is the information? IV
>         En.h6 En.h12 En.h18 En.h24 Pr.h6 Pr.h12 Pr.h18 Pr.h24
> y1      0.136  0.182  0.162  0.188 0.016  0.023  0.051  0.010
> y7      0.104  0.399  0.374  0.247 0.013  0.056  0.088  0.010
> x0      0.127  0.556  0.526  0.334 0.011  0.090  0.135  0.009
> nu0     0.013  0.012  0.008  0.026 0.010  0.013  0.019  0.015
> fu0     0.137  0.113  0.097  0.158 0.025  0.056  0.050  0.031
> fu7     0.114  0.089  0.073  0.138 0.026  0.053  0.048  0.034
> cc0     0.106  0.161  0.132  0.231 0.055  0.047  0.043  0.039
> hi0     0.059  0.018  0.013  0.041 0.134  0.142  0.118  0.140
> eo0     0.036  0.006  0.005  0.006 0.136  0.060  0.016  0.031
> price1  0.083  0.025  0.021  0.036 0.586  0.622  0.511  0.558
> price7  0.048  0.055  0.050  0.040 0.325  0.523  0.486  0.438
> energy1 0.531  0.421  0.382  0.536 0.029  0.032  0.039  0.028
> energy7 0.359  0.529  0.491  0.516 0.022  0.047  0.057  0.030
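These tables score each candidate predictor (rows) against each response (columns); the slides do not name the statistic. Assuming larger values mean more information, a simple screening step ranks the predictors per response. The sketch uses the values of table IV; the helper name is hypothetical.

```python
RESPONSES = ["En.h6", "En.h12", "En.h18", "En.h24", "Pr.h6", "Pr.h12", "Pr.h18", "Pr.h24"]

# Values copied from "Where is the information? IV" (row = candidate predictor).
TABLE_IV = {
    "y1":      [0.136, 0.182, 0.162, 0.188, 0.016, 0.023, 0.051, 0.010],
    "y7":      [0.104, 0.399, 0.374, 0.247, 0.013, 0.056, 0.088, 0.010],
    "x0":      [0.127, 0.556, 0.526, 0.334, 0.011, 0.090, 0.135, 0.009],
    "nu0":     [0.013, 0.012, 0.008, 0.026, 0.010, 0.013, 0.019, 0.015],
    "fu0":     [0.137, 0.113, 0.097, 0.158, 0.025, 0.056, 0.050, 0.031],
    "fu7":     [0.114, 0.089, 0.073, 0.138, 0.026, 0.053, 0.048, 0.034],
    "cc0":     [0.106, 0.161, 0.132, 0.231, 0.055, 0.047, 0.043, 0.039],
    "hi0":     [0.059, 0.018, 0.013, 0.041, 0.134, 0.142, 0.118, 0.140],
    "eo0":     [0.036, 0.006, 0.005, 0.006, 0.136, 0.060, 0.016, 0.031],
    "price1":  [0.083, 0.025, 0.021, 0.036, 0.586, 0.622, 0.511, 0.558],
    "price7":  [0.048, 0.055, 0.050, 0.040, 0.325, 0.523, 0.486, 0.438],
    "energy1": [0.531, 0.421, 0.382, 0.536, 0.029, 0.032, 0.039, 0.028],
    "energy7": [0.359, 0.529, 0.491, 0.516, 0.022, 0.047, 0.057, 0.030],
}


def top_predictors(table, k=3):
    """For each response, the k predictors with the largest scores."""
    ranking = {}
    for j, resp in enumerate(RESPONSES):
        ranked = sorted(table, key=lambda name: table[name][j], reverse=True)
        ranking[resp] = ranked[:k]
    return ranking


if __name__ == "__main__":
    for resp, names in top_predictors(TABLE_IV).items():
        print(resp, names)
```

For example, x0 ranks first for En.h12 and En.h18, energy1 for En.h6 and En.h24, and price1 for every price horizon.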
Summary I
• A Big Data analyst is a white unicorn who combines a statistician, a computer scientist and a business analyst with good communication skills.
• Practical implementation in a company usually requires an interdisciplinary team.
• The era of Big Data is a big opportunity for data scientists, who must develop smarter procedures for analysis.
• Big Data Analysis is not a universal, automatic answer to every question. It must be tailored and guided by humans: no machine can select or transform the variables without expert guidance.
• Always balance the complexities (of the data, of the methods and models, of the restrictions).
Summary II
• Make sure that the scale of the solution matches the size of your problem.
• What's the question? Are you sure that you need a Big Data platform to answer THAT question?
• Big Data is the new universe: use it as you used the population before.