Rapport BI

Date post: 07-Jul-2018
Upload: najim-essa
  • 8/18/2019 Rapport BI

    ACKNOWLEDGEMENTS

    “Silent gratitude isn’t much use to anyone.”

    [Gladys Bronwyn Stern]

    Our first words of recognition and gratitude are owed to
    Mr. Kamel SMAILI, who did us the honor of being our teacher
    and supervisor. He responded to our numerous requests, and his
    professionalism and experience helped us do a better job. He
    managed to make us appreciate the "beauty" of Business
    Intelligence, and through his marvelous way of teaching we were
    able to approach the fascinating world of Data Warehousing and
    Data Mining.

    We would also like to express our appreciation to all the
    students of this master's program; we have been fortunate to
    work in such an impeccable atmosphere. Thank you for your
    unending support, your devotion, and for making this year so
    rewarding and enjoyable.

    Many thanks to Mr. Abdelatif Bouhlal, who is no longer teaching
    us, but from whom we learned so much about the art of
    eloquence. We are deeply grateful to you, sir, and we will never
    forget you.

    Finally, the time spent alongside all these people is carved in
    our memories. They remain the example to follow, and we hope
    that someday we will be able to pass on, in our turn, as much as
    we received.

    TABLE OF CONTENTS

    Acknowledgments
    Table of contents
    Life before Business Intelligence
    BI at a glance
    INSEE Presentation
    Phase 1: Identifying the project perimeter
        Project goals
        Project progress
    Phase 2: Data Sourcing and ETL
        Data Understanding
        Data cleaning
        Data acquisition & ETL Process
        Pentaho’s ETL tool
    Phase 3: Conceiving the Data Warehouse
        Data warehousing
        Conceptual Data Model
        Physical Data Model
        The Data marts
        Tools used
    Phase 4: Operating and dissecting the Data Marts
        Data format & Calculated fields
        Reports generation
    Phase 5: Data Mining Phase
        Data Mining Presentation
        Analysing Data

    LIFE BEFORE BI

    In the beginning was the data, and the data was hidden away
    somewhere deep in the bowels of the corporate databases, where
    only an elite of highly trained users were able to reach it.

    When access to this data was needed, the only way to get at it
    was to ask, or even beg, one of those highly trained elite users
    for help (especially if the person asking for the information
    was not a computer scientist). But when the query finally made
    its way to the top of Mr. Elite User’s in-tray, often several
    months later, the information that trickled down, in the form of
    a spreadsheet or even a printed report, would be horrendously
    out of date.

    As for whether Mr. Elite User was likely to understand the
    business requirements in the first place, and so avoid supplying
    wrong (or at best irrelevant) information, that was another
    matter entirely.

    Business intelligence is the solution to this problem. With its
    architecture and its collection of integrated operational and
    decision-support applications, it not only provides easy access
    to business data, but also improves the ability to study past
    behaviors and actions in order to better understand where the
    organization or company stands.

    Put simply, BI lets you make better business decisions because
    it gives you access to the right information at the right time.

    BI AT A GLANCE

    Many vague terms are tossed around to define Business
    Intelligence. To one business person, it means market research,
    something we would call “competitive intelligence.” To another,
    “reporting” may be a better term, even though business
    intelligence goes well beyond accessing a static report.
    “Reporting” and “analysis” are also terms frequently used to
    describe business intelligence. Others use terms such as
    “business analytics” or “decision support,” both with varying
    degrees of appropriateness.

    How these terms differ matters little, unless you are trying to
    compare market shares for different technologies. What matters
    most is to use the terminology that is most familiar to the
    intended users and that has a positive connotation. Whichever
    terminology you use, keep the ultimate value of business
    intelligence in mind: providing pertinent insight, so that you
    can measure performance and take action while it is still
    possible, and eventually reach your goals.

    Best of all, BI lets you do it all yourself, rather than
    depending on IT professionals to provide the data you need at a
    time that suits their schedule. It also allows you to track,
    understand and manage your business, and offers several other
    capabilities, such as:

    Reporting: Reporting, as its name suggests, enables you to
    format and deliver information to large audiences, both inside
    and outside your organization, in the form of reports.

    Query and analysis: Query and analysis tools give you a means of
    interacting with business information (by performing your own
    ad hoc queries) without having to understand the often complex
    data that lies underneath this information.

    Performance management: Performance management tools let you
    keep track of and analyze key performance indicators and goals
    using Dashboards, Scorecards, and Analytics.

    What Business Intelligence is not:

    BI is neither a product nor a system.

    A data warehouse may or may not be a component of your business
    intelligence architecture, but a data warehouse is not
    synonymous with business intelligence.

    INSEE PRESENTATION

    France's National Institute of Statistics and Economic Studies
    (Institut National de la Statistique et des Études Économiques:
    INSEE) is a Directorate General of the Ministry of the Economy,
    Finance, and Employment. It is therefore a government agency
    whose personnel are government employees, although not all
    belong to the civil service. INSEE operates under government
    accounting rules: it receives its funding from the State's
    general budget.

    Getting to know INSEE

    Main goal and missions, legislative framework, INSEE in the
    European statistical system, brief history, INSEE resources,
    working at INSEE.

    Official Statistics

    The official statistical system collects the data needed to
    compile quantitative results. In this capacity, it undertakes
    censuses and surveys, manages databases, and also draws on
    administrative sources.

    Quality at the INSEE

    This quality rubric describes the rules, methods and

    resources that enable official statistics to meet quality

    requirements as well as possible. Such a description draws

    direct inspiration from the fifteen principles and related

    indicators from the European Statistics Code of Practice.

    French, European and International statistical sites

    Statistics production is conducted under a program, which

    is a "decision" applicable to the Member States. INSEE

    helps to design and implement multilateral cooperation

    programs under the aegis of international organizations

    such as Eurostat, U.N. institutions, the World Bank, and

    the International Monetary Fund (IMF).

    Seminars, conferences and fairs

    Conferences and seminars organized by Insee or in which

    Insee has participated.

    PHASE 1:

    ACHIEVEMENT CONTEXT

    PROJECT GOALS & OBJECTIVES

    The first thing we had to do was define the objectives and goals
    of this BI project, because it is practically impossible to
    carry out a valid project without a solid understanding of its
    scope.

    The main objectives of this BI project are:

    To transform data into meaningful information that supports
    effective decisions, by improving its quality, consistency and
    completeness.

    To build a data warehouse based on INSEE results (the
    distribution of employees, the distribution of student loans,
    population statistics, rates of death, birth and wedding, …) to
    set the stage for successful and effective data mining.

    To deploy and exploit the data warehouse with the appropriate
    tools.

    To generate specific and flexible reports.

    PROJECT PROGRESS

    As the first and most important step in our BI project, we
    identified the project perimeter: we analyzed and tried to
    understand all the data in the given spreadsheets, then set the
    goals and ultimate objectives, relying on every specified note.

    After identifying the project context and boundary, we cleaned
    and filtered the data using the appropriate ETL tools. ETL,
    which stands for Extract, Transform and Load, is a set of
    functions combined into one solution that makes it possible to
    “extract” data from numerous databases, sources, applications
    and systems, “transform” it as appropriate, and “load” it into
    another database, a data mart or a data warehouse for analysis,
    or send it along to another operational system to support a
    business process. Creating the Data Warehouse was the next
    phase: we kept in mind that a DW is most likely to succeed if it
    is highly organized and flexible.

    Then we exploited and analyzed all the Data Marts using the
    options offered by Cognos; we also generated several reports and
    adjusted the value formats.

    Next came the Data Mining phase, devoted to extracting value
    from collections of data in order to improve business by
    predicting and understanding behaviors. Finally, as BI aims to
    respond to all types of issues, in this last phase we inferred
    descriptive or explanatory models and interpreted all the
    results.

    PHASE 2: DATA SOURCING & ETL

    DATA UNDERSTANDING

    After setting our BI project’s perimeter and goals, we proceeded
    with a central step: data understanding. There are several
    things to be learned about the data, even after creating the
    Data Warehouse or mining it, such as identifying the entities
    and the meaning of individual attributes.

    Fortunately, we did not have to collect the data, a crucial
    phase, especially when several sources are involved: the Excel
    spreadsheets we were given were more than enough. We did,
    however, have our share of problems, mainly related to data
    comprehension: some information was missing (mostly the data for
    the DOM, the overseas departments), and some was misplaced, for
    example townships and township fractions. We had to make sense
    of confusing or ambiguous combinations, and it took us a long
    time to grasp them.

    DATA CLEANING

    Data understanding is not an obligatory step, but it is useful
    in many respects. The main role of data surveying at this stage
    is to find out, from the general structure of the data, whether
    the extracted or given data holds a useful amount of
    information, which leads us to the data cleaning phase. Basic as
    it is, its purpose is to obtain healthy data that can improve
    the final modeling results. This included checking the
    consistency of individual attribute values and types, checking
    quantities, removing redundancy and finding outliers. We did
    detect a few anomalies: a slight difference between the number
    of recruits and the number of candidates accepted in internal
    contests, especially when there is no free intake test (concours
    HF file).

    Checking in this phase deals with the completeness and
    correctness of the data. Completeness describes the proportion
    and regularity of missing values in the data. Correctness
    relates to discovering erroneous values in the data, their
    extent and possible remedies.
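
    The two checks described above can be illustrated with a short
    Python sketch. It is not the procedure used in the project, which
    was mostly manual. The column names ("admitted", "recruited") and
    the rule that recruits should not exceed admitted candidates are
    assumptions made for this example.

```python
def completeness(rows, column):
    """Share of rows whose value in `column` is present (not None or blank)."""
    if not rows:
        return 0.0
    present = sum(1 for r in rows if r.get(column) not in (None, ""))
    return present / len(rows)


def suspicious_rows(rows, recruited="recruited", admitted="admitted"):
    """Correctness check (hypothetical rule): flag rows where the number of
    recruits exceeds the number admitted, or where a value is unusable."""
    flagged = []
    for r in rows:
        try:
            if int(r[recruited]) > int(r[admitted]):
                flagged.append(r)
        except (KeyError, TypeError, ValueError):
            # Missing or non-numeric values are flagged for manual review.
            flagged.append(r)
    return flagged
```

    A completeness value well below 1.0 for a column, or a non-empty
    list of flagged rows, indicates where manual inspection of the
    spreadsheets would be worthwhile.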

    DATA ACQUISITION & ETL PROCESS

    It can be very difficult to extract the desired data, and it is
    easy to implement something that either misses the user’s
    expectations or only partially satisfies them. Data acquisition,
    or the extract, transform and load (ETL) process, is a complex
    set of activities whose sole purpose is to attain the most
    accurate and integrated data possible and make it accessible to
    the enterprise through the data warehouse.

    It includes the following subprocesses:

    Extracting: copying the parts that we needed from INSEE’s Excel
    spreadsheets to the data staging area for further work, and
    purging the data that would not be used.

    Transforming: once the data was extracted into the data staging
    area, we applied as many transformations as we could, including
    correcting misspellings, parsing the data into standard formats
    (for the PIB, the French term for GDP, we had to convert the
    values from the ME, presumably millions of euros, to the euro)
    and changing data into the appropriate type. The major problem
    with the given data was that all the attributes and values were
    text, which makes little sense since dates and numeric values
    are involved. We also had to combine the sources by matching and
    aggregating the information that has the same context, or even
    the same structure.

    Loading: at the end of the transformation process, we loaded the
    data into CSV files, so that it could easily be imported into
    the database to be created.

    We nevertheless carried out 80% of the ETL process manually, the
    reason being that we wanted the cleanest DW possible; the
    remaining 20% was handled by an ETL tool called Pentaho, which
    is explained in the next chapter.
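
    As an illustration of the three subprocesses, here is a minimal
    Python sketch that reads all-text rows, converts them to proper
    types and writes a CSV file. It is not the project's actual
    pipeline: the column names (departement, annee, pib_millions) and
    the assumption that GDP arrives in millions of euros with a comma
    as decimal separator are hypothetical.

```python
import csv
import io


def transform(row):
    """Turn an all-text row into typed values (column names are hypothetical)."""
    return {
        "departement": row["departement"].strip(),
        "annee": int(row["annee"]),
        # Assumption: GDP is given in millions of euros; store it in euros.
        "pib_euros": float(row["pib_millions"].replace(",", ".")) * 1_000_000,
    }


def run_etl(source, target):
    """Extract rows from a CSV stream, transform them, load them into another."""
    reader = csv.DictReader(source)
    writer = csv.DictWriter(target, fieldnames=["departement", "annee", "pib_euros"])
    writer.writeheader()
    count = 0
    for raw in reader:
        writer.writerow(transform(raw))
        count += 1
    return count


if __name__ == "__main__":
    # Made-up sample row, used only to show the call.
    src = io.StringIO('departement,annee,pib_millions\n Ain ,2007,"15000,5"\n')
    dst = io.StringIO()
    print(run_etl(src, dst), "row(s) loaded")
    print(dst.getvalue())
```

    Keeping the typing rules in a single transform function makes the
    text-to-type conversions explicit and easy to review, which matters
    when every source column starts out as text.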

    PENTAHO’S ETL TOOL

    We have come a long way from the days when all these activities
    had to be done manually: the BI industry has developed a
    plethora of tools and technologies to support the data
    acquisition process. For our BI project we chose Pentaho Data
    Integration, which offers a unified ETL, modeling and data
    visualization development environment for Agile BI.

    Here is a preview of the Pentaho interface while using it for
    data transformation:

    PHASE 3: DESIGNING THE DATA WAREHOUSE

    DATA WAREHOUSING

    Data warehouses collect relevant data from multiple different
    data sources, then rationalize, summarize and catalog it in
    large, consistent, stable, accurate, long-term data stores. This
    not only allows all types of questions to be answered, but also
    provides insight into the data, so that the same question can be
    asked in multiple different ways to support the decision-making
    process.

    Although specific vocabularies vary from organization to
    organization, the data warehousing industry agrees that the data
    warehouse lifecycle model is fundamentally as described in the
    diagram on the next page.

    The model, which is a cycle rather than a serialized timeline,
    consists of five major phases:

    Design: practically speaking, the best data warehousing
    practitioners are those who combine data with indicators and
    other critical business metrics.

    Prototype: developing a working model of a data warehouse or
    data mart design, suitable for actual use. The purpose is to
    allow a back and forth between design and prototype.

    Deploy: it is at this phase that the single most often neglected
    component can undermine the whole process.

    Operation: the day-to-day maintenance of the data warehouse or
    mart, and the data delivery services provided to analysts to
    keep the warehouse or mart current.

    Enhancement: required in cases where external business
    conditions change discontinuously or organizations themselves
    undergo discontinuous changes.

    CONCEPTUAL DATA MODEL

    The following diagram illustrates and defines the portions that
    our Data Warehouse will contain.

    Part 1:

    We built this portion of the conceptual model according to the
    following management rules:

    A contest refers to one and only one category, contest type and
    intake type, but a category may have several contests; the same
    applies to intake types and contest types.

    A socio-professional category has at least one specific
    equipment and, conversely, an equipment may have several
    socio-professional categories.

    Part 2:

    To set up this part of our CDM, we relied on these rules:

    A Superior Class belongs to one precise category, but a category
    contains many superior classes. There is an association between
    class categories, gender and the BAC option that carries a
    headcount and a percentage for a precise date.

    Part 3:

    This is the largest part of our CDM, and it covers most of the
    entities that we have. We should mention, however, that we
    merged some of the data that share the same structure, such as
    domiciled births and deaths.

    Sample of data Items

    PHYSICAL DATA MODEL

    A Physical Data Model includes all the database entities
    (tables, views), attributes (columns, fields) and the
    relationships between the entities that we have defined.
    Database performance, indexing strategy, physical storage and
    denormalization are important considerations when creating the
    physical data model. How the database is created depends on all
    the constraints implemented in the PDM.

    DATA MARTS

    In order to design our data marts, we first had to form our
    dimensions and our fact tables. We started by denormalizing the
    physical implementation, so that one fact can be stored in
    several places.

    Above all, this improves usability by grouping all the
    associated attributes in one table, which significantly reduces
    the total number of tables a user faces.

    Our dimensions are as follows:

    Activite dimension: a merge of two tables (Activité and sous
    activité); this dimension presents the activities and
    sub-activities related to each sector.

    BAC dimension: a merge of two tables (bac and branche) that
    lists all the bac options.

    Classes dimension: a merge of two tables (classes supérieures
    and categorie_classe) that presents all the superior classes and
    their categories.

    Commune dimension: a combination of several tables (communes
    associées, cantons, fractions cantonales, arrondissement,
    commune); this is the geographic dimension, which specifies the
    territory.

    Equipement, sexe, categorie_sp, type-recrutement and etat
    dimensions: they refer respectively to the equipment, gender,
    socio-professional category, recruitment type, and the state of
    the data (whether the GDP is final, semi-final or provisional).

    These dimensions would not be of much use, however, without a
    specific kind of table that is central to every dimensional
    model and contains the most useful facts: the fact table.

    Every fact table represents a many-to-many relationship, and
    every fact table contains a set of two or more foreign keys that
    join to their respective dimension tables.
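
    To make the relationship between a fact table and its dimensions
    concrete, here is a small sketch using Python's built-in sqlite3
    module. It is not the project's schema: the table and column names
    (dim_departement, dim_annee, fact_mariage) are hypothetical and the
    figures are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_departement (id INTEGER PRIMARY KEY, nom TEXT);
CREATE TABLE dim_annee (id INTEGER PRIMARY KEY, annee INTEGER);
-- The fact table holds one measure plus one foreign key per dimension.
CREATE TABLE fact_mariage (
    departement_id INTEGER REFERENCES dim_departement(id),
    annee_id INTEGER REFERENCES dim_annee(id),
    nb_mariages INTEGER,
    PRIMARY KEY (departement_id, annee_id)
);
""")

# Made-up sample rows, for illustration only.
conn.executemany("INSERT INTO dim_departement VALUES (?, ?)",
                 [(1, "Ain"), (2, "Aisne")])
conn.executemany("INSERT INTO dim_annee VALUES (?, ?)",
                 [(1, 2006), (2, 2007)])
conn.executemany("INSERT INTO fact_mariage VALUES (?, ?, ?)",
                 [(1, 1, 100), (1, 2, 110), (2, 1, 90), (2, 2, 95)])


def mariages_par_annee(connection):
    """Aggregate the measure by joining the fact table to a dimension."""
    return connection.execute("""
        SELECT a.annee, SUM(f.nb_mariages)
        FROM fact_mariage AS f
        JOIN dim_annee AS a ON a.id = f.annee_id
        GROUP BY a.annee
        ORDER BY a.annee
    """).fetchall()
```

    The composite primary key on the two foreign keys is what makes the
    fact table a many-to-many link between departments and years.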

    This is the list of all the fact tables that we designed:

    Etablissement: represents the number of companies in each
    activity, by year and district.

    Etablissement details: represents the number of companies in
    each sub-activity.

    Serie_Bac: gives the number of students and the percentage of
    girls per district.

    Bibliotheque: gives the loans and the rate of registered members
    by region and year.

    Effectif class_sup: gives the number of students in a specific
    category by gender, for a given school year.

    Concours: presents the number of persons admitted, present or
    recruited for a contest, as well as the percentage of women.

    Pourcentage: shows the percentage of equipment used in every
    socio-professional category.

    Mortalities: represents mortality rates per region and per year.

    Marriage: gives the number of weddings by department and year.

    Nb_naissances_deces: gives the number of domiciled births and
    deaths.

    PIB_Region: gives the GDP, the GDP per person and the GDP per
    job for all the regions.

    PIB_Departement: gives the GDP, the GDP per person and the GDP
    per job for all the departments.

    Population: this final fact table presents, per year and for all
    the municipalities, the municipal population and the population
    counted separately.

    Designing the data warehouse environment usually takes the form
    of replicating the dimension tables and fact tables, and
    sometimes presenting these tables as logical subsets, or
    complete “pie wedges”, of the overall model, known as data
    marts.

    Our data warehouse includes three data marts, organized by
    domain:

    The Demographic Data Mart

    This data mart covers everything related to demography, such as
    weddings, mortality, and domiciled deaths and births.

    The Economic Data Mart

    This one covers all the economic values and measures, such as
    the GDP, the percentage of equipment used by activities, and the
    number of companies.

    The Education Data Mart

    The last data mart shows how the dimensions related to education
    are managed, such as contests, types of contests and rate of
    loans.

    THE TOOLS USED

    In terms of tools, the choice was difficult, given the pace at
    which information technologies advance. The choice was made
    carefully and was as follows:

    At first, we used XAMPP to create our MySQL database, because of
    the advantages that a MySQL database offers, such as:

    A consolidated view of the database.

    Quick testing of the reliability, security and performance of
    the tables and the queries.

    The robustness and ease of use of such a database management
    system.

    But since we used Cognos 7, we had to convert our database to an
    Access one, because unfortunately Cognos does not support a
    MySQL database. We therefore had to export it to an XML file,
    which gave rise to a format problem: all the different data
    types were converted to a text type, so we had to repair each
    field.

    Even though we repaired the database, we faced several issues,
    especially regarding the robustness of Access and Cognos:
    concerning Access, every time, we had

    PHASE 4: OPERATING & DISSECTING THE DM

    DATA FORMAT & CALCULATED FIELDS

    Once we had completed all the steps of designing the data
    warehouse and loaded some data, we had to query it. We first
    converted the data into its appropriate format: GDP to a
    monetary type, adding the percentage sign, the euro sign, and so
    on.

    We also customized several fields to make them easier to
    understand and interpret, above all for reporting, which is
    covered in the next chapter.

    Calculated fields were a real help: we did not have to change
    our queries or create new ones to obtain data. We used them most
    in the data mart related to studies, where we could view the
    result of a formula that uses information from other fields in
    the cube.

    REPORTS GENERATION

    Cognos provides, among other options, the ability to create,
    deploy and manage interactive, tabular or graphical reports from
    multiple data sources. We tried to generate the essential ones.

    The two reports present the births and deaths domiciled in
    France: the first one is general, while the second covers only
    the overseas departments (DOM) for 2006 and 2007.

    The first two graphs on the previous page show the number of
    weddings in 200- and 2007. As we can see, there is only a slight
    difference between the two graphs, with Île-de-France remaining
    the region with the highest number. The other report presents
    the GDP of all 26 regions, with Rhône-Alpes as the first region
    in terms of GDP. The last report shows its evolution for
    Metropolitan France (2000 to 2007), and shows that the GDP did
    not decline at any point.

    PHASE 5: DATA MINING PHASE

    DATA MINING PRESENTATION

    According to the Gartner Group, “Data mining is the process of
    discovering meaningful new correlations, patterns and trends by
    sifting through large amounts of data stored in repositories,
    using pattern recognition technologies as well as statistical
    and mathematical techniques.” There are other definitions:

    “Data mining is the analysis of observational data sets to find
    unsuspected relationships and to summarize the data in new
    ways”.

    “Data mining is an interdisciplinary field bringing together
    techniques from machine learning, pattern recognition,
    statistics, databases, and visualization to address the issue of
    information extraction from large data bases”.

    For our part, we tried as hard as we could to describe,
    estimate, predict, classify, cluster and associate the data that
    we had.

    ANALYSING DATA

    To analyze the data, we chose to work with the tool “WEKA”. Its
    advantage is that it is written in Java and, in our use,
    relatively fast. It is also very reliable, provides a wide range
    of classification and search algorithms, and offers many options
    for building graphs.

    After configuring WEKA correctly and establishing the
    connection, we retrieved the data that we wanted to analyze,
    using the Explorer interface of the tool.

    TEST METHODS

    We were interested in decision trees and in the K-Means method.
    We started with decision trees, applying the J48 algorithm
    (WEKA's implementation of Quinlan's C4.5 algorithm).
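
    C4.5-style trees choose their splits using entropy-based measures
    such as the gain ratio. The Python sketch below shows that
    calculation for a single numeric attribute and a binary threshold.
    It is a simplified illustration, not WEKA's J48 code: the function
    names are hypothetical, and details such as pruning and C4.5's
    exact threshold selection are left out.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold`."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    n = len(labels)
    remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    gain = entropy(labels) - remainder
    # Split information: entropy of the partition sizes themselves.
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain / split_info


def best_threshold(values, labels):
    """Candidate threshold with the highest gain ratio (simplified search)."""
    candidates = sorted(set(values))[:-1]
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))
```

    A tree builder would apply this choice recursively to each
    resulting subset until the leaves are pure or too small to split.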

    Decision trees:

    The figure below is a decision tree grouping similar departments
    in terms of births and deaths.

    The second example is a decision tree classifying departments by
    GDP according to their values.

    B- K-Means

    In statistics and machine learning, k-means clustering is a
    method of cluster analysis which aims to partition n
    observations into k clusters in which each observation belongs
    to the cluster with the nearest mean. It is similar to the
    expectation-maximization algorithm for mixtures of Gaussians in
    that both attempt to find the centers of natural clusters in the
    data, and both use an iterative refinement approach.
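
    The iterative refinement described above can be sketched in a few
    lines of Python for one-dimensional data such as GDP values. This
    is a minimal illustration under stated assumptions, not WEKA's
    SimpleKMeans: the initial centers are picked deterministically
    from the sorted data, whereas WEKA seeds them randomly, and the
    function names are hypothetical.

```python
def assign(values, centers):
    """Put each value in the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        clusters[nearest].append(v)
    return clusters


def kmeans_1d(values, k, max_iter=500):
    """Return (centers, clusters, sse) for k clusters of 1-D values."""
    data = sorted(values)
    # Assumption: evenly spaced starting centers taken from the sorted data.
    step = (len(data) - 1) / max(k - 1, 1)
    centers = [data[round(i * step)] for i in range(k)]
    for _ in range(max_iter):
        clusters = assign(data, centers)
        # Move each center to the mean of its cluster (keep it if empty).
        updated = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
        if updated == centers:
            break  # converged: no center moved
        centers = updated
    clusters = assign(data, centers)
    # Within-cluster sum of squared errors.
    sse = sum((v - centers[i]) ** 2 for i, c in enumerate(clusters) for v in c)
    return centers, clusters, sse
```

    The returned sse plays the same role as the "Within cluster sum of
    squared errors" line in WEKA's output, although the absolute
    values are not comparable because this sketch works on raw,
    unscaled numbers.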

    We used this algorithm to test our data for GDP and the number
    of marriages in the departments.

    The results are:

    GDP (Gross domestic product)

    K=2

    === Run information ===

    Scheme:       weka.clusterers.SimpleKMeans -N 2 -A "weka.core.EuclideanDistance -R first-last" -I 500 -S 10
    Relation:     QueryResult
    Instances:    96
    Attributes:   2
                  pib
                  nom_departement
    Test mode:    evaluate on training data

    === Model and evaluation on training set ===

    kMeans
    ======

    Number of iterations: 7
    Within cluster sum of squared errors: 94.93524745601803
    Missing values globally replaced with mean/mode

    Cluster centroids:
                                       Cluster#
    Attribute        Full Data         0                 1
                     (96)              (78)              (18)
    ==============================================================
    pib              17670552083.3333  10561371794.8718  48477000000
    nom_departement  Ain               Aisne             Ain

    Clustered Instances

    0      78 ( 81%)
    1      18 ( 19%)

    K=3

    kMeans
    ======

    Number of iterations: 14
    Within cluster sum of squared errors: 93.52052880969502
    Missing values globally replaced with mean/mode

    Cluster centroids:
                                       Cluster#
    Attribute        Full Data         0                1                 2
                     (96)              (72)             (21)              (3)
    ==================================================================================
    pib              17670552083.3333  9343222222.2222  34178333333.3333  101972000000
    nom_departement  Ain               Aisne            Ain               Alpes-Maritimes

    Clustered Instances

    0      72 ( 75%)
    1      21 ( 22%)
    2       3 (  3%)

  • 8/18/2019 Rapport BI

    60/66

    K=4

    (Note: the output reproduced below lists only three clusters and is identical to the K=3 run.)

    kMeans
    ======

    Number of iterations: 14
    Within cluster sum of squared errors: 93.52052880969502
    Missing values globally replaced with mean/mode

    Cluster centroids (one row per cluster):

    Cluster     Instances  pib                nom_departement
    =================================================================
    Full Data   (96)       17670552083.3333   Ain
    0           (72)       9343222222.2222    Aisne
    1           (21)       34178333333.3333   Ain
    2           (3)        101972000000       Alpes-Maritimes

    Clustered Instances

    0    72 (75%)
    1    21 (22%)
    2     3 ( 3%)

    K=6

    kMeans
    ======

    Number of iterations: 11
    Within cluster sum of squared errors: 90.27658195695966
    Missing values globally replaced with mean/mode

    Cluster centroids (one row per cluster):

    Cluster     Instances  pib                nom_departement
    =================================================================
    Full Data   (96)       17670552083.3333   Ain
    0           (25)       7723840000         Aisne
    1           (27)       14707333333.3333   Ain
    2           (7)        43717571428.5714   Alpes-Maritimes
    3           (20)       3708000000         Alpes-de-Haute-Provence
    4           (3)        110004333333.3333  Bouches-du-Rhône
    5           (14)       28284500000        Finistère

    Clustered Instances

    0    25 (26%)
    1    27 (28%)
    2     7 ( 7%)
    3    20 (21%)
    4     3 ( 3%)
    5    14 (15%)

    K=7

    kMeans
    ======

    Number of iterations: 12
    Within cluster sum of squared errors: 89.27102393425096
    Missing values globally replaced with mean/mode

    Cluster centroids (one row per cluster):

    Cluster     Instances  pib                nom_departement
    =================================================================
    Full Data   (96)       17670552083.3333   Ain
    0           (23)       7465304347.8261    Allier
    1           (18)       12792333333.3333   Ain
    2           (7)        43717571428.5714   Alpes-Maritimes
    3           (20)       3708000000         Alpes-de-Haute-Provence
    4           (3)        110004333333.3333  Bouches-du-Rhône
    5           (11)       29749727272.7273   Finistère
    6           (14)       18354714285.7143   Calvados

    Clustered Instances

    0    23 (24%)
    1    18 (19%)
    2     7 ( 7%)
    3    20 (21%)
    4     3 ( 3%)
    5    11 (11%)
    6    14 (15%)

    And K=10

    kMeans
    ======

    Number of iterations: 10
    Within cluster sum of squared errors: 86.26315726326193
    Missing values globally replaced with mean/mode

    Cluster centroids (one row per cluster):

    Cluster     Instances  pib                nom_departement
    =================================================================
    Full Data   (96)       17670552083.3333   Ain
    0           (16)       6707000000         Allier
    1           (8)        13605250000        Ain
    2           (7)        43717571428.5714   Alpes-Maritimes
    3           (19)       3614684210.5263    Alpes-de-Haute-Provence
    4           (3)        110004333333.3333  Bouches-du-Rhône
    5           (9)        31099777777.7778   Haute-Garonne
    6           (12)       16968916666.6667   Calvados
    7           (9)        11780333333.3333   Aisne
    8           (5)        23217000000        Finistère
    9           (8)        8733875000         Charente

    Clustered Instances

    0    16 (17%)
    1     8 ( 8%)
    2     7 ( 7%)
    3    19 (20%)
    4     3 ( 3%)
    5     9 ( 9%)
    6    12 (13%)
    7     9 ( 9%)
    8     5 ( 5%)
    9     8 ( 8%)
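
    One way to read the GDP runs together is to compare the within-cluster sum of squared errors across the values of K. The Python sketch below is only an illustration built from the figures reported above. The K=4 run is left out because its listing repeats the K=3 output, and the "drop per added cluster" measure is our own choice rather than a Weka statistic.

```python
# Within-cluster sum of squared errors reported for the GDP runs.
# K=4 is omitted: its listing in the report repeats the K=3 output.
wcss = {
    2: 94.93524745601803,
    3: 93.52052880969502,
    6: 90.27658195695966,
    7: 89.27102393425096,
    10: 86.26315726326193,
}

ks = sorted(wcss)
# Average decrease in error for each cluster added between consecutive runs.
drops = [(wcss[a] - wcss[b]) / (b - a) for a, b in zip(ks, ks[1:])]

for (a, b), d in zip(zip(ks, ks[1:]), drops):
    print(f"K={a} -> K={b}: {d:.3f} per added cluster")
```

    With these figures the error decreases steadily as K grows, by roughly one to one and a half units per added cluster, with the largest step between K=2 and K=3.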

    Marriages

    K=2

    kMeans
    ======

    Number of iterations: 6
    Within cluster sum of squared errors: 204.48640349553273
    Missing values globally replaced with mean/mode

    Cluster centroids (one row per cluster):

    Cluster     Instances  nbrdecesdomicilie  nbrnaissancesvivantedomicilie  nbrmariages  nom_departement
    ====================================================================================================
    Full Data   (200)      5273.02            8232.795                       2738.765     Val-d'Oise
    0           (150)      3728.2133          4988.06                        1862.4467    Val-d'Oise
    1           (50)       9907.44            17967                          5367.72      Val-de-Marne

    Clustered Instances

    0    150 (75%)
    1     50 (25%)

    K=4

    kMeans
    ======

    Number of iterations: 12
    Within cluster sum of squared errors: 196.85641391574097
    Missing values globally replaced with mean/mode

    Cluster centroids (one row per cluster):

    Cluster     Instances  nbrdecesdomicilie  nbrnaissancesvivantedomicilie  nbrmariages  nom_departement
    ====================================================================================================
    Full Data   (200)      5273.02            8232.795                       2738.765     Val-d'Oise
    0           (38)       8254.8421          13968.2105                     5206.1579    Val-d'Oise
    1           (16)       12836.3125         26404.75                       5695.125     Val-de-Marne
    2           (71)       4866.5493          6736.1127                      2441.3239    Marne
    3           (75)       2533.52            2867.0267                      1139.5067    Haute-Marne

    Clustered Instances

    0    38 (19%)
    1    16 ( 8%)
    2    71 (36%)
    3    75 (38%)

    ...

    K=10

    CONCLUSION

    This project certainly gave us a lot of trouble: we ran

    into problems while designing the Data Warehouse and

    analyzing it. These problems were overcome, mainly

    thanks to the support and assistance of the team members.

    The project also showed us that teamwork is the

    cornerstone of every undertaking.

    Finally, we greatly appreciate the opportunity that was

    given to us, since we could address issues of knowledge,

    skills, adaptability, context and values.

