Date post: | 07-Jul-2018 |
Category: |
Documents |
Upload: | najim-essa |
8/18/2019 Rapport BI
ACKNOWLEDGEMENTS
“Silent gratitude isn’t much use to anyone.”
[Gladys Bronwyn Stern]
Our first words of recognition and gratitude are owed to
Mr. Kamel SMAILI, who did us the honor of being our
teacher and supervisor. He responded to our
numerous requests, and his professionalism
and experience helped us do a better job. He managed to
make us appreciate the "beauty" of Business Intelligence,
and through his marvelous way of teaching, we were able to
approach the fascinating world of Data Warehousing and
Data Mining.
We would also like to express our appreciation to all the
students of this master's program; we have been fortunate to work in
such an excellent atmosphere. Thank you for your
unending support, your devotion, and for making this year
so rewarding and enjoyable.
Many thanks to Mr. Abdelatif Bouhlal, who’s no longer
teaching us, but from whom we’ve learned so much
regarding the art of eloquence. We’re deeply grateful to
you sir and we’ll truly never forget you.
Finally, the time spent with all these people is
carved in our memories. They remain the example to
follow, and we hope that someday we will be able to
pass on, in our turn, as much as we received.
TABLE OF CONTENTS
Acknowledgments
Table of contents
Life before Business Intelligence
BI at a glance
INSEE Presentation
Phase 1: Identifying the project perimeter
  Project goals
  Project progress
Phase 2: Data Sourcing and ETL
  Data Understanding
  Data cleaning
  Data acquisition & ETL Process
  Pentaho’s ETL tool
Phase 3: Conceiving the Data Warehouse
  Data warehousing
  Conceptual Data Model
  Physical Data Model
  The Data marts
  Tools used
Phase 4: Operating and dissecting the data Marts
  Data format & Calculated fields
  Reports generation
Phase 5: Data Mining Phase
  Data Mining Presentation
  Analysing Data
LIFE BEFORE BI
In the beginning was the data, and the data was hidden
away somewhere deep in the bowels of the corporate
databases, where only an elite of highly trained users were
able to reach it.
When access to this data was needed, the only way to get
at it was to ask, or even beg, one of those highly trained
elite users for help (especially if the person asking for
the information was not a computer scientist). But when
the query finally made its way to the top of Mr. Elite
User’s in-tray, often several months later, the information
that trickled down, in the form of a spreadsheet or even a
printed report, would be horrendously out of date.
As for whether Mr. Elite User was likely to understand the
business requirements in the first place, and so avoid
supplying wrong (or at best irrelevant) information, that
was another matter entirely.
Business intelligence is the solution to this
problem: not only does it provide easy access to business
data through its architecture and its collection of integrated
operational and decision-support applications, it also
improves the ability to study past behaviors and actions in
order to better understand where the organization or
company stands.
Put simply, BI lets you make better business decisions
because it gives you access to the right information at the
right time.
BI AT A GLANCE
Many vague terms are tossed around to define
Business Intelligence. To one business person, it means
market research, something we would call “competitive
intelligence.” To another person, “reporting” may be a
better term, even though business intelligence goes well
beyond accessing a static report. “Reporting” and
“analysis” are terms frequently used to describe business
intelligence too. Others will use terms such as “business
analytics” or “decision support,” both with varying
degrees of appropriateness.
How these terms differ matters little, unless you are
trying to compare market shares for different
technologies. What matters most is to use the terminology
that is most familiar to the intended users and that has a
positive connotation. No matter which terminology you
use, keep the ultimate value of business intelligence in
mind: providing pertinent insight, so that you can
measure performance and take action while
it is still possible to reach your goals.
Best of all, it lets you do it all yourself, rather than having
to depend on IT professionals to provide you with the data
you need at a time that suits their schedule. It also lets you
track, understand and manage your business through
several capabilities, such as:
Reporting: Reporting, as its name suggests, enables
you to format and deliver information to large
audiences both inside and outside your organization
in the form of reports.
Query and analysis: Query and analysis tools provide
you with a means of interacting with business
information (by performing your own ad hoc queries)
without having to understand the often complex
data that lies underneath this information.
Performance management: Performance
management tools let you keep track of and analyze
key performance indicators and goals using
dashboards, scorecards, and analytics.
What Business Intelligence is not:
BI is neither a product nor a system.
A data warehouse may or may not be a component
of your business intelligence architecture, but a data
warehouse is not synonymous with business
intelligence.
INSEE PRESENTATION
France's National Institute of Statistics and Economic
Studies (Institut National de la Statistique et des Études
Économiques: INSEE) is a Directorate General of the
Ministry of the Economy, Finance, and Employment. It is
therefore a government agency whose personnel are
government employees, although not all belong to the
civil service. INSEE operates under government accounting
rules: it receives its funding from the State's general
budget.
Getting to know INSEE
Main goal and missions, legislative framework, INSEE in
the European statistical system, brief history, INSEE
resources, working at INSEE.
Official Statistics
The official statistical system collects the data needed to
compile quantitative results. In this capacity, it undertakes
censuses and surveys, manages databases, and also draws
on administrative sources.
Quality at the INSEE
This quality rubric describes the rules, methods and
resources that enable official statistics to meet quality
requirements as well as possible. Such a description draws
direct inspiration from the fifteen principles and related
indicators from the European Statistics Code of Practice.
French, European and International statistical sites
Statistics production is conducted under a program, which
is a "decision" applicable to the Member States. INSEE
helps to design and implement multilateral cooperation
programs under the aegis of international organizations
such as Eurostat, U.N. institutions, the World Bank, and
the International Monetary Fund (IMF).
Seminars, conferences and fairs
Conferences and seminars organized by Insee or in which
Insee has participated.
PHASE 1:
ACHIEVEMENT CONTEXT
PROJECT GOALS & OBJECTIVES
The first thing we had to do was to define the
objectives and goals of this BI project, because it is
practically impossible to carry out a valid
project without a solid understanding of its scope.
The main objectives of this BI project are:
To transform data into meaningful information
that supports effective decisions, by improving its
quality, consistency and completeness.
To build a data warehouse based on INSEE results
(the distribution of employees, the distribution of
student loans, population statistics, rates
of death, birth, marriage …) to set the stage for
successful and effective data mining.
To deploy and exploit the data warehouse
with the appropriate tools.
To generate specific and flexible reports.
PROJECT PROGRESS
As the first and most important step in our BI project,
we started by identifying the project
perimeter: we analyzed and tried to understand all
the data in the given spreadsheets, then set the
goals and ultimate objectives, relying on every
specified note.
After identifying the project context and boundaries, we
cleaned and filtered the data using the
appropriate ETL tools. ETL, which stands for EXTRACT,
TRANSFORM AND LOAD, is a set of functions combined
into one solution that makes it possible to “extract” data from
numerous databases, sources, applications and systems,
“transform” it as appropriate, and “load” it into another
database, a data mart or a data warehouse for analysis, or
send it along to another operational system to support a
business process. Creating a Data Warehouse was the
next phase: we tried to keep in mind that a DW is most
likely to succeed if it is highly organized and flexible.
Then, we exploited and analyzed all the Data Marts using
the options offered by Cognos; we also generated several
reports and adjusted the value formats.
This stage was followed by the Data Mining phase,
devoted to drawing benefit from collections of data in order to
improve the business by predicting and understanding
behaviors. Finally, as BI aims to respond to all types
of issues, in this last phase we inferred descriptive or
explanatory models and interpreted all
the results.
PHASE 2: DATA SOURCING & ETL
DATA UNDERSTANDING
After setting our BI project’s perimeter and goals, we
proceeded with a central step: data
understanding. There are several things to be learned
about the data, even after creating the Data Warehouse or
mining it, such as identifying entities and the meanings of
individual attributes.
Fortunately, we did not have to collect the data ourselves, a crucial
phase, especially when several sources are involved: the Excel
spreadsheets we were given were largely sufficient. We did,
however, have our share of problems, mainly related
to data comprehension: some information
was missing (mostly DOM data) and some was misplaced,
for example townships and township fractions. We had
to make sense of the confusing or ambiguous combinations, and it
took us a long time to do so.
DATA CLEANING
Data understanding is not an obligatory step, but it is useful
in many respects. The main role of surveying the data at this
stage is to find out, from its general structure,
whether the extracted or given data holds a useful amount of
information, which led us to the
data cleaning phase. Basic as it is, its purpose is to obtain
healthy data that can improve the final modeling results. This
included checking the consistency of individual attribute
values and types, checking quantities, removing redundancy and
finding outliers. We detected a few anomalies, such as
a slight difference between the number of
recruits and the number of candidates accepted in internal contests,
especially when there is no free intake test (concours HF
file).
Checking in this phase deals with the completeness and
correctness of the data. Completeness describes the proportion
and regularity of missing values in the data. Correctness
relates to the discovery of erroneous values present in the data,
their extent and possible remedies.
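The two checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column name `recrutes`, the sample rows and the two-standard-deviation threshold are hypothetical and are not taken from the INSEE files or from the tools used in the project.

```python
from statistics import mean, stdev


def completeness(rows, column):
    """Proportion of rows whose value for `column` is present (not None or blank)."""
    if not rows:
        return 0.0
    present = sum(1 for row in rows if row.get(column) not in (None, ""))
    return present / len(rows)


def outliers(values, threshold=2.0):
    """Values lying more than `threshold` sample standard deviations from the mean."""
    if len(values) < 2:
        return []
    centre, spread = mean(values), stdev(values)
    if spread == 0:
        return []
    return [v for v in values if abs(v - centre) / spread > threshold]


# Hypothetical sample: every value is text, as in the spreadsheets described above.
rows = [
    {"departement": "A", "recrutes": "12"},
    {"departement": "B", "recrutes": ""},
    {"departement": "C", "recrutes": "15"},
    {"departement": "D", "recrutes": "14"},
]
print(completeness(rows, "recrutes"))                # 0.75: one value out of four is missing
print(outliers([10, 11, 12, 11, 10, 12, 11, 100]))   # the value 100 stands out
```

A simple z-score rule like this one is sensitive to the outliers it is looking for, so on small samples a median-based rule may be a better choice.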
DATA ACQUISITION & ETL
PROCESS
Extracting the desired data can become very difficult, and it is easy to
implement something that either misses the user’s
expectations or only partially satisfies them. Data
acquisition, or the extract, transform, and load (ETL) process,
is a complex set of activities whose sole purpose is to
obtain the most accurate and integrated data possible and
make it accessible to the enterprise through the data
warehouse.
It includes the following subprocesses:
Extracting: copying the parts that we
needed from INSEE’s Excel spreadsheets to the data staging area
for further work, and purging the data that would
not be used.
Transforming: Once the data was extracted into the data
staging area, we applied as many transformations as
we could, including correcting misspellings, parsing the
data into standard formats (for example, the GDP, or PIB, had to be
converted from ME to euros) and changing data into the
appropriate type. The major problem with the data given was
that all the attributes and values were stored as text, which
makes little sense since dates and numeric
values are involved. We also had to combine the sources by matching
and aggregating information that shares the same context
or even the same structure.
Loading: At the end of the transformation process, we
loaded the data into CSV files, so that it would be easy to
import into the database to be created.
We nevertheless carried out about 80% of the ETL process manually,
the reason being that we wanted the cleanest DW
possible; the remaining 20% was handled by an ETL tool called
Pentaho, which is presented in the next chapter.
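The three subprocesses above can be sketched as a small Python pipeline. It is a minimal illustration, not the procedure actually followed in the project: the column names (`region`, `annee`, `pib_me`), the sample figures and the assumption that the source amounts are expressed in millions of euros are all hypothetical.

```python
import csv
import io

# Hypothetical raw export: semicolon-separated, every value stored as text.
RAW = """region;annee;pib_me
 Alsace ;2006;48000
Bretagne;2006;75000,5
Corse;2006;
"""


def extract(text):
    """Extract: read the raw rows into dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def transform(rows):
    """Transform: trim labels, cast text to proper types, convert to euros."""
    cleaned = []
    for row in rows:
        raw_pib = row["pib_me"].strip().replace(",", ".")
        if not raw_pib:
            continue  # purge rows that carry no usable measure
        cleaned.append({
            "region": row["region"].strip(),
            "annee": int(row["annee"]),
            # assumption: the source unit is millions of euros
            "pib_euros": round(float(raw_pib) * 1_000_000),
        })
    return cleaned


def load(rows):
    """Load: serialise the cleaned rows as CSV text, ready for a database import."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["region", "annee", "pib_euros"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


print(load(transform(extract(RAW))))
```

Keeping the three steps as separate functions makes it easy to run part of the pipeline by hand and part with a tool, which is how the work was split here.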
PENTAHO’S ETL TOOL
We have come a long way from the days when all these
activities had to be done manually: the BI industry has
developed a plethora of tools and technologies to support
the data acquisition process. For our BI
project we chose Pentaho Data Integration, which is presented as the
first fully unified ETL, modeling and data visualization development
environment for Agile BI.
Here is a preview of the Pentaho interface while using it for
data transformation:
PHASE 3: DESIGNING THE DATA WAREHOUSE
DATA WAREHOUSING
Data warehouses collect relevant data from multiple
different data sources, then rationalize, summarize and
catalog it in large, consistent, stable, accurate, long-term
data stores. This not only allows all types of
questions to be answered, but also provides insight into the data
so that the same question can be asked in multiple different
ways to support the decision-making process.
Although specific vocabularies vary from organization to
organization, the data warehousing industry is in
agreement that the data warehouse lifecycle model is
fundamentally as described in the following
diagram.
The model, which is a cycle rather than a serialized
timeline, consists of five major phases:
Design: Practically speaking, the best data warehousing
practitioners are those who combine data with indicators
and other critical business metrics.
Prototype: Developing a working model of a
data warehouse or data mart design, suitable for actual
use. The purpose is to allow a back and forth between
design and prototype.
Deploy: It is at this phase that the single most often
neglected component can undermine the whole process.
Operation: The day-to-day maintenance of the data
warehouse or mart, the data delivery services it
provides to analysts, and the work needed to keep the warehouse or mart
current.
Enhancement: Required in cases where external business
conditions change discontinuously or organizations
themselves undergo discontinuous changes.
CONCEPTUAL DATA MODEL
The following diagram illustrates and defines the portions
that our Data Warehouse will contain.
Part 1:
We built this portion of the conceptual model according
to the following management rules:
A contest refers to one and only one category, type and intake
type, but a category may have several contests; the
same applies to intake types and contest
types.
A socio-professional category has at least one specific
equipment and, conversely, an equipment may have several
socio-professional categories.
Part 2:
To set up this part of our CDM, we relied on these
rules:
A Superior Class belongs to one precise category, but a
category contains many superior classes. There is an
association between class categories, gender and the BAC
option that carries a headcount and a percentage for a
precise date.
Part 3:
This is by far the largest part of our CDM, and it covers
most of the entities that we have. We should mention,
however, that we merged some of the data because
they share the same structure, such as domiciled births
and deaths.
Sample of data Items
PHYSICAL DATA MODEL
A Physical Data Model includes all the database
entities/tables/views, attributes/columns/fields and the
relationships between the entities that we have defined.
Database performance, indexing strategy, physical storage
and denormalization are important considerations when
creating the physical data model. How the database is
created depends on all the constraints implemented in
the PDM.
DATA MARTS
To design our data marts, we first had to define
our dimensions and our fact tables. We started by
denormalizing the physical implementation, so that
one fact can be stored in several places.
Above all, this improves usability by grouping all the
associated attributes in one table, thus significantly reducing
the total number of tables that a user faces.
Our dimensions are as follows:
Activite Dimension: Merge of two tables (Activité and sous
activité); this dimension presents the activities and
sub-activities related to each field.
BAC Dimension: Merge of two tables (bac and branche)
that lists all the bac options.
Classes Dimension: Merge of two tables (classes
supérieures and categorie_classe) which provides all the
superior classes and their categories.
Commune Dimension: Combination of several tables
(communes associées, cantons, fractions cantonales,
arrondissement, commune); this is the
geographic dimension, which specifies the territory.
Equipement, sexe, categorie_sp, type-recrutement, etat
dimensions: they refer respectively to the equipment,
gender, socio-professional category, recruitment type, and
the state of the data (whether the GDP is final, semi-final or
provisional).
These dimensions, however, would be of little use without a
specific kind of table that is central to every dimensional model
and contains the most useful facts: the
fact table.
Every fact table represents a many-to-many relationship
and contains a set of two or more foreign
keys that join to their respective dimension tables.
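The relationship between a fact table and its dimensions can be illustrated with a small star schema built in an in-memory SQLite database. This is a sketch only: the table names (`dim_region`, `dim_annee`, `fact_mariage`), the columns and the figures are hypothetical and do not reproduce the project's MySQL or Access schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, nom TEXT NOT NULL);
CREATE TABLE dim_annee  (annee_id  INTEGER PRIMARY KEY, annee INTEGER NOT NULL);
-- The fact table holds one measure and a foreign key to each dimension.
CREATE TABLE fact_mariage (
    region_id   INTEGER NOT NULL REFERENCES dim_region(region_id),
    annee_id    INTEGER NOT NULL REFERENCES dim_annee(annee_id),
    nb_mariages INTEGER NOT NULL,
    PRIMARY KEY (region_id, annee_id)
);
""")

# Illustrative rows only, not INSEE figures.
conn.executemany("INSERT INTO dim_region VALUES (?, ?)",
                 [(1, "Region A"), (2, "Region B")])
conn.executemany("INSERT INTO dim_annee VALUES (?, ?)",
                 [(1, 2006), (2, 2007)])
conn.executemany("INSERT INTO fact_mariage VALUES (?, ?, ?)",
                 [(1, 1, 100), (1, 2, 110), (2, 1, 40), (2, 2, 45)])


def mariages_par_region(connection):
    """Total of the measure per region, obtained by joining fact and dimension."""
    return connection.execute("""
        SELECT r.nom, SUM(f.nb_mariages)
        FROM fact_mariage AS f
        JOIN dim_region AS r ON r.region_id = f.region_id
        GROUP BY r.nom
        ORDER BY r.nom
    """).fetchall()


print(mariages_par_region(conn))
```

The composite primary key on the two foreign keys is what makes the fact table express a many-to-many relationship between regions and years.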
This is a list of all the fact tables that we gathered and
designed:
Etablissement: represents the number of
companies in each activity by year and district.
Etablissement details: represents the number of
companies in each sub-activity.
Serie_Bac: gives the number of students and the
percentage of girls per district.
Bibliotheque: gives the loans and the rate of registered
members by region and year.
Effectif class_sup: gives the number of students in a
specific category by gender and
school year.
Concours: gives the number of candidates present, admitted or
recruited for a contest, as well as
the percentage of women.
Pourcentage: shows the percentage of equipment
used in every socio-professional category.
Mortalities: represents mortality rates per region per
year.
Marriage: gives the number of weddings by
department and year.
Nb_naissances_deces: this fact table gives the
number of domiciled births and deaths.
PIB_Region: gives the GDP, the GDP per person and the
GDP per job for all the regions.
PIB_Departement: gives the GDP, the GDP per
person, and the GDP per job for all the departments.
Population: this final fact table presents the municipal
population and the population counted separately for all
the municipalities per year.
Designing the data warehouse environment
usually involves replicating the dimension tables
and fact tables, and sometimes presenting these tables as
logical subsets, or complete “pie wedges”, of the overall
model, known as data marts.
Our data warehouse includes three data marts,
organized by domain or context:
The Demographic Data Mart
This data mart covers everything related to
demography, such as weddings, mortality, and domiciled deaths
and births.
The Economic Data Mart
This one covers all the economic values and
measures, such as the GDP, the percentage of equipment
used by activity, the number of companies…
The Education Data Mart
The last data mart shows how the dimensions related
to education are managed, such as contests, types of contests,
rates of loans…
THE TOOLS USED
In terms of tools, the choice was difficult, given
the rapid progress of information technologies.
The choice was made carefully and was as follows:
At first, we used XAMPP to create our MySQL database
because of the advantages that a MySQL database offers,
such as:
A consolidated view of the database.
Quick testing of the reliability, security and
performance of the tables and queries.
The robustness and ease of use of such a
database management system.
But since we used Cognos 7, we had to convert our
database to an Access one, because unfortunately Cognos did
not support a MySQL database. We therefore had to
export it to an XML file, which gave rise to a format
problem: all the different data types were converted to a
text type, so we had to repair each field.
Even after we repaired the database, we faced
several issues, especially regarding the robustness of
Access and Cognos: concerning Access, every time, we had
PHASE 4: OPERATING & DISSECTING THE DM
DATA FORMAT & CALCULATED
FIELDS
Once we had completed all the steps of designing
the data warehouse and loaded some data, we
had to query it. First, though, we converted the data
into its appropriate format: GDP to a monetary type,
assigning the percentage sign, the Euro sign….
We also customized several fields to make them easier
to understand and interpret, above all for
reporting, which is covered in the next chapter.
Calculated fields were a real help and relief: we did not
have to change our queries or create new ones to obtain
the data. We used them most in the data mart related to
studies, where we could view the result of a formula that uses
information from other fields in the cube.
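The idea of a calculated field can be shown outside any particular reporting tool. The sketch below is plain Python and does not use the Cognos API; the helper `add_calculated_field`, the field names and the figures are hypothetical.

```python
def add_calculated_field(rows, name, formula):
    """Return new rows with an extra field derived from the existing ones."""
    return [{**row, name: formula(row)} for row in rows]


# Illustrative headcounts only, not INSEE figures.
effectifs = [
    {"categorie": "X", "total": 200, "filles": 90},
    {"categorie": "Y", "total": 50, "filles": 30},
]

# The percentage is derived on the fly; the underlying query is left untouched.
with_pct = add_calculated_field(
    effectifs,
    "pct_filles",
    lambda row: round(100 * row["filles"] / row["total"], 1),
)
for row in with_pct:
    print(row["categorie"], row["pct_filles"])
```

Because the source rows are copied rather than modified, several calculated fields can be layered on the same query result.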
REPORTS GENERATION
Cognos provides among other options the ability to create,
deploy and manage interactive, tabular or even graphical
reports, from multiple data sources. We tried to generate
the essential ones.
The two reports present the births and deaths domiciled in
France: the first one is general, while the second concerns
only the DOM departments for 2006 and 2007.
The first two graphs refer to the
number of weddings in 200- and 2007. As we can see,
there is only a slight difference between the two graphs, with Ile
de France remaining the region with the highest
number. The other report presents the GDP of all 26
regions, with Rhone Alpes as the first region in terms of GDP.
The last report shows its evolution for Metropolitan
France (2000 to 2007), and shows that the GDP did not
decline at all.
PHASE 5: DATA MINING PHASE
DATA MINING PRESENTATION
According to the Gartner Group, “Data mining is the
process of discovering meaningful new correlations,
patterns and trends by sifting through large amounts of
data stored in repositories, using pattern recognition
technologies as well as statistical and mathematical
techniques.” There are other definitions:
“Data mining is the analysis of observational data
sets to find unsuspected relationships and to
summarize the data in new ways”.
“Data mining is an interdisciplinary field bringing
together techniques from machine learning, pattern
recognition, statistics, databases, and visualization
to address the issue of information extraction from
large data bases”.
However, we tried as hard as we could to describe,
estimate, predict, classify, cluster and associate the data
that we had.
ANALYSING DATA
To analyze the data, we chose to work with the tool
“WEKA”. One advantage is that this tool is written
in Java and runs reasonably fast. We also found it
reliable. It offers a wide range of classification, clustering
and search algorithms, and it provides many options
for drawing graphs.
After configuring Weka correctly and establishing the
connection, we retrieved the data that we wanted to
analyze, using the Explorer interface of the tool.
TEST METHODS
We were interested in decision trees and the k-means
method. We started with decision trees, applying the J48
algorithm (Weka's implementation of Quinlan's C4.5
algorithm).
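To give an idea of how such a tree chooses where to split a numeric attribute, here is a simplified sketch of an entropy-based split search. It uses plain information gain, whereas C4.5 normalises it into a gain ratio, and it is not Weka's J48 code; the sample values and the labels `low` and `high` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting on `value <= threshold`."""
    left = [lab for val, lab in zip(values, labels) if val <= threshold]
    right = [lab for val, lab in zip(values, labels) if val > threshold]
    if not left or not right:
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


def best_threshold(values, labels):
    """Candidate threshold with the highest information gain, or None."""
    candidates = sorted(set(values))[:-1]  # splitting above the maximum is useless
    if not candidates:
        return None
    return max(candidates, key=lambda t: information_gain(values, labels, t))


# Hypothetical attribute values with two classes.
values = [1, 2, 3, 10, 11, 12]
labels = ["low", "low", "low", "high", "high", "high"]
print(best_threshold(values, labels))
```

A full tree learner repeats this search on each resulting subset until the subsets are pure or too small to split.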
A- Decision trees
The figure below is a decision tree grouping similar
departments in terms of births and deaths.
The second example concerns the decision tree
classification of departments by GDP
value.
B- K-Means
In statistics and machine learning, k-means clustering is a
method of cluster analysis which aims to partition n
observations into k clusters in which each observation
belongs to the cluster with the nearest mean. It is similar
to the expectation-maximization algorithm for mixtures of
Gaussians in that both attempt to find the centers of
natural clusters in the data, and both use an iterative
refinement approach.
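The iterative refinement described above can be sketched for a single numeric attribute. This is a minimal illustration and not Weka's SimpleKMeans: it starts from evenly spaced values instead of a seeded random choice, it does not normalise the attributes, and the sample values are hypothetical.

```python
def kmeans_1d(values, k, max_iter=500):
    """Lloyd's algorithm on scalars; returns (centroids, assignment)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    # Deterministic start: k evenly spaced values taken from the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each value joins the cluster with the nearest mean.
        new_assignment = [
            min(range(k), key=lambda c: (v - centroids[c]) ** 2) for v in values
        ]
        if new_assignment == assignment:
            break  # converged: no value changed cluster
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(values, assignment) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assignment


def within_cluster_sse(values, centroids, assignment):
    """Sum of squared distances between each value and its centroid."""
    return sum((v - centroids[a]) ** 2 for v, a in zip(values, assignment))


data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, assignment = kmeans_1d(data, 2)
print(centroids, assignment, within_cluster_sse(data, centroids, assignment))
```

The within-cluster sum of squared errors is the quantity to compare across values of k: it can only decrease as k grows, so a sharp flattening of the curve suggests a reasonable number of clusters.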
We used this algorithm on our data for
GDP and for the number of marriages in the departments.
The results are:
GDP (Gross domestic product)
K=2
=== Run information ===
Scheme: weka.clusterers.SimpleKMeans -N 2 -A "weka.core.EuclideanDistance -R first-last" -I 500 -S 10
Relation: QueryResult
Instances: 96
Attributes: 2
    pib
    nom_departement
Test mode: evaluate on training data
=== Model and evaluation on training set ===
kMeans
======
Number of iterations: 7
Within cluster sum of squared errors: 94.93524745601803
Missing values globally replaced with mean/mode
Cluster centroids:
Cluster      Instances   pib                 nom_departement
Full Data    96          17670552083.3333    Ain
0            78          10561371794.8718    Aisne
1            18          48477000000         Ain
Clustered Instances
0    78 (81%)
1    18 (19%)
K=3
kMeans
======
Number of iterations: 14
Within cluster sum of squared errors: 93.52052880969502
Missing values globally replaced with mean/mode
Cluster centroids:
Cluster      Instances   pib                 nom_departement
Full Data    96          17670552083.3333    Ain
0            72          9343222222.2222     Aisne
1            21          34178333333.3333    Ain
2            3           101972000000        Alpes-Maritimes
Clustered Instances
0    72 (75%)
1    21 (22%)
2    3 (3%)
K=4
kMeans
======
Number of iterations: 14
Within cluster sum of squared errors: 93.52052880969502
Missing values globally replaced with mean/mode
Cluster centroids:
Cluster      Instances   pib                 nom_departement
Full Data    96          17670552083.3333    Ain
0            72          9343222222.2222     Aisne
1            21          34178333333.3333    Ain
2            3           101972000000        Alpes-Maritimes
Clustered Instances
0    72 (75%)
1    21 (22%)
2    3 (3%)
K=6
kMeans
======
Number of iterations: 11
Within cluster sum of squared errors: 90.27658195695966
Missing values globally replaced with mean/mode
Cluster centroids:
Cluster      Instances   pib                 nom_departement
Full Data    96          17670552083.3333    Ain
0            25          7723840000          Aisne
1            27          14707333333.3333    Ain
2            7           43717571428.5714    Alpes-Maritimes
3            20          3708000000          Alpes-de-Haute-Provence
4            3           110004333333.3333   Bouches-du-Rhône
5            14          28284500000         Finistère
Clustered Instances
0    25 (26%)
1    27 (28%)
2    7 (7%)
3    20 (21%)
4    3 (3%)
5    14 (15%)
K=7
kMeans
======
Number of iterations: 12
Within cluster sum of squared errors: 89.27102393425096
Missing values globally replaced with mean/mode
Cluster centroids:
Cluster      Instances   pib                 nom_departement
Full Data    96          17670552083.3333    Ain
0            23          7465304347.8261     Allier
1            18          12792333333.3333    Ain
2            7           43717571428.5714    Alpes-Maritimes
3            20          3708000000          Alpes-de-Haute-Provence
4            3           110004333333.3333   Bouches-du-Rhône
5            11          29749727272.7273    Finistère
6            14          18354714285.7143    Calvados
Clustered Instances
0    23 (24%)
1    18 (19%)
2    7 (7%)
3    20 (21%)
4    3 (3%)
5    11 (11%)
6    14 (15%)
And K=10
kMeans
======
Number of iterations: 10
Within cluster sum of squared errors: 86.26315726326193
Missing values globally replaced with mean/mode
Cluster centroids:
Cluster      Instances   pib                 nom_departement
Full Data    96          17670552083.3333    Ain
0            16          6707000000          Allier
1            8           13605250000         Ain
2            7           43717571428.5714    Alpes-Maritimes
3            19          3614684210.5263     Alpes-de-Haute-Provence
4            3           110004333333.3333   Bouches-du-Rhône
5            9           31099777777.7778    Haute-Garonne
6            12          16968916666.6667    Calvados
7            9           11780333333.3333    Aisne
8            5           23217000000         Finistère
9            8           8733875000          Charente
Clustered Instances
0    16 (17%)
1    8 (8%)
2    7 (7%)
3    19 (20%)
4    3 (3%)
5    9 (9%)
6    12 (13%)
7    9 (9%)
8    5 (5%)
9    8 (8%)
Marriages
K=2
kMeans
======
Number of iterations: 6
Within cluster sum of squared errors: 204.48640349553273
Missing values globally replaced with mean/mode
Cluster centroids:
Cluster      Instances   nbrdecesdomicillie   nbrnaissancesvivantedomicillie   nbrmariages   nom_departement
Full Data    200         5273.02              8232.795                         2738.765      Val-d'Oise
0            150         3728.2133            4988.06                          1862.4467     Val-d'Oise
1            50          9907.44              17967                            5367.72       Val-de-Marne
Clustered Instances
0    150 (75%)
1    50 (25%)
K=4
kMeans
======
Number of iterations: 12
Within cluster sum of squared errors: 196.85641391574097
Missing values globally replaced with mean/mode
Cluster centroids:
Cluster      Instances   nbrdecesdomicillie   nbrnaissancesvivantedomicillie   nbrmariages   nom_departement
Full Data    200         5273.02              8232.795                         2738.765      Val-d'Oise
0            38          8254.8421            13968.2105                       5206.1579     Val-d'Oise
1            16          12836.3125           26404.75                         5695.125      Val-de-Marne
2            71          4866.5493            6736.1127                        2441.3239     Marne
3            75          2533.52              2867.0267                        1139.5067     Haute-Marne
Clustered Instances
0    38 (19%)
1    16 (8%)
2    71 (36%)
3    75 (38%)
...
K=10
CONCLUSION
This project certainly gave us a lot of trouble: some
problems were encountered while designing the Data
Warehouse and analyzing it. These problems
were overcome, mainly thanks to the
support and assistance of the team members.
The project also allowed us to highlight the fact that
teamwork is the cornerstone of every endeavor.
Finally, we greatly appreciate the opportunity that was
given to us, since we could address issues of knowledge,
skills, adaptability, context and values.