Date post: | 27-Feb-2018 |
Upload: | venkatesh-gardas |
7/25/2019 Datamining With Big Data_Siva
ABSTRACT
Big Data concern large-volume, complex, growing data sets with multiple,
autonomous sources. With the fast development of networking, data storage, and
the data collection capacity, Big Data are now rapidly expanding in all
science and engineering domains, including physical, biological and biomedical
sciences. This paper presents a HACE theorem that characterizes the features of
the Big Data revolution, and proposes a Big Data processing model, from the
data mining perspective. This data-driven model involves demand-driven
aggregation of information sources, mining and analysis, user interest
modelling, and security and privacy considerations. We analyse the challenging
issues in the data-driven model and also in the Big Data revolution.
1. INTRODUCTION
Introduction Data Mining
Structure of Data Mining
Generally, data mining (sometimes called data or knowledge discovery) is the
process of analyzing data from different perspectives and summarizing it into
useful information - information that can be used to increase revenue, cut costs,
or both. Data mining software is one of a number of analytical tools for
analyzing data. It allows users to analyze data from many different dimensions
or angles, categorize it, and summarize the relationships identified. Technically,
data mining is the process of finding correlations or patterns among dozens of
fields in large relational databases.
Data Mining Works
While large-scale information technology has been evolving separate
transaction and analytical systems, data mining provides the link between the
two. Data mining software analyzes relationships and patterns in stored
transaction data based on open-ended user queries. Several types of analytical
software are available: statistical, machine learning, and neural networks.
Generally, any of four types of relationships are sought:
Classes: Stored data is used to locate data in predetermined groups. For
example, a restaurant chain could mine customer purchase data to
determine when customers visit and what they typically order. This
information could be used to increase traffic by having daily specials.
Clusters: Data items are grouped according to logical relationships or
consumer preferences. For example, data can be mined to identify market
segments or consumer affinities.
Associations: Data can be mined to identify associations. The beer-diaper
example is an example of associative mining.
Sequential patterns: Data is mined to anticipate behavior patterns and
trends. For example, an outdoor equipment retailer could predict the
likelihood of a backpack being purchased based on a consumer's purchase
of sleeping bags and hiking shoes.
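As a rough illustration of associative mining, the sketch below computes the two measures commonly attached to a rule such as "diapers -> beer": support (how often the items occur together) and confidence (how often beer appears in baskets that already contain diapers). The class name AssociationSketch, its method names and the four sample baskets are invented for this illustration and are not taken from the project.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AssociationSketch {

    // Support: fraction of baskets that contain every item in `items`.
    static double support(List<Set<String>> baskets, Set<String> items) {
        int hits = 0;
        for (Set<String> basket : baskets) {
            if (basket.containsAll(items)) {
                hits++;
            }
        }
        return (double) hits / baskets.size();
    }

    // Confidence of the rule "antecedent -> consequent":
    // support(antecedent and consequent together) / support(antecedent).
    static double confidence(List<Set<String>> baskets,
                             Set<String> antecedent, Set<String> consequent) {
        Set<String> both = new HashSet<>(antecedent);
        both.addAll(consequent);
        return support(baskets, both) / support(baskets, antecedent);
    }

    // Small helper to build one basket from item names.
    static Set<String> basket(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    // Hypothetical sample data: four shopping baskets.
    static List<Set<String>> sampleBaskets() {
        return Arrays.asList(
            basket("diapers", "beer", "chips"),
            basket("diapers", "beer"),
            basket("diapers", "milk"),
            basket("bread", "milk"));
    }

    public static void main(String[] args) {
        List<Set<String>> baskets = sampleBaskets();
        System.out.println("support(diapers, beer) = "
            + support(baskets, basket("diapers", "beer")));
        System.out.println("confidence(diapers -> beer) = "
            + confidence(baskets, basket("diapers"), basket("beer")));
    }
}
```

With this sample data, two of the four baskets contain both items (support 0.5) and two of the three diaper baskets also contain beer (confidence about 0.67). A real miner would search over many candidate item sets rather than score one fixed rule.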
Data mining consists of five major elements:
1) Extract, transform, and load transaction data onto the data warehouse
system.
2) Store and manage the data in a multidimensional database system.
3) Provide data access to business analysts and information technology
professionals.
4) Analyze the data by application software.
5) Present the data in a useful format, such as a graph or table.
Different levels of analysis are available:
Artificial neural networks: Non-linear predictive models that learn
through training and resemble biological neural networks in structure.
Genetic algorithms: Optimization techniques that use processes such as
genetic combination, mutation, and natural selection in a design based on
the concepts of natural evolution.
Decision trees: Tree-shaped structures that represent sets of decisions.
These decisions generate rules for the classification of a dataset. Specific
decision tree methods include Classification and Regression Trees
(CART) and Chi Square Automatic Interaction Detection (CHAID).
CART and CHAID are decision tree techniques used for classification of
a dataset. They provide a set of rules that you can apply to a new
(unclassified) dataset to predict which records will have a given outcome.
CART segments a dataset by creating 2-way splits while CHAID
segments using chi square tests to create multi-way splits. CART
typically requires less data preparation than CHAID.
Nearest neighbor method: A technique that classifies each record in a
dataset based on a combination of the classes of the k record(s) most
similar to it in a historical dataset (where k >= 1). Sometimes called the
k-nearest neighbor technique.
Rule induction: The extraction of useful if-then rules from data based on
statistical significance.
Data visualization: The visual interpretation of complex relationships in
multidimensional data. Graphics tools are used to illustrate data
relationships.
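The nearest neighbor method described above can be sketched in a few lines. The example below is only a minimal illustration under stated assumptions: records are numeric feature vectors, similarity is squared Euclidean distance, and the class is chosen by majority vote among the k closest historical records. The class NearestNeighborSketch, the Record type and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NearestNeighborSketch {

    // One already-classified record from the historical dataset.
    static final class Record {
        final String label;
        final double[] features;

        Record(String label, double... features) {
            this.label = label;
            this.features = features;
        }
    }

    // Squared Euclidean distance; sufficient for ranking neighbours.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Majority vote among the k records closest to `query` (k >= 1).
    // On a tied vote, the label that reached the top count first wins.
    static String classify(List<Record> history, int k, double... query) {
        List<Record> byDistance = new ArrayList<>(history);
        byDistance.sort(Comparator.comparingDouble(
            (Record r) -> squaredDistance(r.features, query)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        int limit = Math.min(k, byDistance.size());
        for (Record r : byDistance.subList(0, limit)) {
            int v = votes.merge(r.label, 1, Integer::sum);
            if (v > bestVotes) {
                bestVotes = v;
                best = r.label;
            }
        }
        return best;
    }

    // Hypothetical historical dataset with two classes.
    static List<Record> sampleHistory() {
        return Arrays.asList(
            new Record("low", 1.0, 1.0),
            new Record("low", 1.0, 2.0),
            new Record("low", 2.0, 1.0),
            new Record("high", 8.0, 8.0),
            new Record("high", 9.0, 8.0));
    }

    public static void main(String[] args) {
        System.out.println(classify(sampleHistory(), 3, 1.5, 1.5));
        System.out.println(classify(sampleHistory(), 3, 8.5, 8.0));
    }
}
```

Sorting the whole history for every query keeps the sketch short; for large datasets an index structure or a partial selection of the k smallest distances would be the more usual choice.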
Characteristics of Data Mining:
Large quantities of data: The volume of data is so great that it has to be
analyzed by automated techniques, e.g. satellite information, credit card
transactions etc.
Noisy, incomplete data: Imprecise data is characteristic of all data
collection.
Complex data structure: Conventional statistical analysis is not possible.
Heterogeneous data stored in legacy systems
Benefits of Data Mining:
1) It's one of the most effective services available today. With the
help of data mining, one can discover valuable information about
customers and their behavior for a specific set of products, and evaluate,
analyze, store, mine and load data related to them.
2) An analytical CRM model and strategic business decisions can be
made with the help of data mining, as it provides a complete
synopsis of customers.
3) Many organizations have installed data mining projects, which have
helped them make unprecedented improvements in their marketing
strategies (campaigns).
4) Data mining is generally used by organizations with a solid customer
focus. Because it is flexible in where it can be applied, it is widely
used in applications that forecast crucial data, including industry
analysis and consumer buying behavior.
5) Fast, prompt access to data along with economical processing
techniques have made data mining one of the most suitable services that a
company can seek.
Advantages of Data Mining:
1. Marketing / Retail:
Data mining helps marketing companies build models based on historical
data to predict who will respond to new marketing campaigns such as direct
mail, online marketing campaigns, etc. Using the results, marketers can take
an appropriate approach to selling profitable products to targeted customers.
Data mining brings a lot of benefits to retail companies in the same way as
marketing. Through market basket analysis, a store can arrange its products
so that customers can conveniently buy frequently purchased products
together. In addition, it also helps retail companies offer certain
discounts on particular products that will attract more customers.
2. Finance / Banking
Data mining gives financial institutions information about loans
and credit reporting. By building a model from historical customer data,
banks and financial institutions can distinguish good and bad loans. In addition,
data mining helps banks detect fraudulent credit card transactions to protect
credit card owners.
3. Manufacturing
By applying data mining to operational engineering data, manufacturers can
detect faulty equipment and determine optimal control parameters. For
example, semiconductor manufacturers face the challenge that even when the
conditions of manufacturing environments at different wafer production plants
are similar, the quality of the wafers is not the same, and some have defects
for unknown reasons. Data mining has been applied to determine the ranges of
control parameters that lead to the production of golden wafers. Those optimal
control parameters are then used to manufacture wafers with the desired quality.
4. Governments
Data mining helps government agencies by digging through and analyzing records
of financial transactions to build patterns that can detect money laundering or
criminal activities.
5. Law enforcement:
Data mining can aid law enforcers in identifying criminal suspects as well as
apprehending these criminals by examining trends in location, crime type,
habit, and other patterns of behavior.
6. Researchers:
Data mining can assist researchers by speeding up their data analysis
process, thus allowing them more time to work on other projects.
LITERATURE SURVEY
1) Algorithms for Mining the Evolution of Conserved Relational States in
Dynamic Networks
AUTHORS: R. Ahmed and G. Karypis
Dynamic networks have recently been recognized as a powerful abstraction to
model and represent the temporal changes and dynamic aspects of the data
underlying many complex systems. Significant insights regarding the stable
relational patterns among the entities can be gained by analyzing temporal
evolution of the complex entity relations. This can help identify the transitions
from one conserved state to the next and may provide evidence to the existence
of external factors that are responsible for changing the stable relational patterns
in these networks. This paper presents a new data mining method that analyzes
the time-persistent relations or states between the entities of the dynamic
networks and captures all maximal non-redundant evolution paths of the stable
relational states. Experimental results based on multiple datasets from
real-world applications show that the method is efficient and scalable.
2) Novel Approaches to Crawling Important Pages Early
AUTHORS: M.H. Alam, J.W. Ha, and S.K. Lee
Web crawlers are essential to many Web applications, such as Web search
engines, Web archives, and Web directories, which maintain Web pages in their
local repositories. In this paper, we study the problem of crawl scheduling that
biases crawl ordering toward important pages. We propose a set of crawling
algorithms for effective and efficient crawl ordering by prioritizing important
pages with the well-known PageRank as the importance metric. In order to
score URLs, the proposed algorithms utilize various features, including partial
link structure, inter-host links, page titles, and topic relevance. We conduct a
large-scale experiment using publicly available data sets to examine the effect
of each feature on crawl ordering and evaluate the performance of many
algorithms. The experimental results verify the efficacy of our schemes. In
particular, compared with the representative RankMass crawler, the
FPR-title-host algorithm reduces computational overhead by a factor as great as
three in running time while improving effectiveness by 5% in cumulative PageRank.
3) Identifying Influential and Susceptible Members of Social Networks
AUTHORS: S. Aral and D. Walker
Identifying social influence in networks is critical to understanding how
behaviors spread. We present a method that uses in vivo randomized
experimentation to identify influence and susceptibility in networks while
avoiding the biases inherent in traditional estimates of social contagion.
Estimation in a representative sample of 1.3 million Facebook users showed that
younger users are more susceptible to influence than older users, men are more
influential than women, women influence men more than they influence other
women, and married individuals are the least susceptible to influence in the
decision to adopt the product offered. Analysis of influence and susceptibility
together with network structure revealed that influential individuals are less
susceptible to influence than noninfluential individuals and that they cluster in
the network while susceptible individuals do not, which suggests that influential
people with influential friends may be instrumental in the spread of this product
in the network.
4) Big Privacy: Protecting Confidentiality in Big Data
AUTHORS: A. Machanavajjhala and J.P. Reiter
A tremendous amount of data about individuals (e.g., demographic
information, internet activity, energy usage, communication patterns and social
interactions) is being collected and analyzed by many national statistical
agencies, survey organizations, medical centers, and Web and social networking
companies. Wide dissemination of microdata (data at the granularity of
individuals) facilitates advances in science and public policy, helps citizens to
learn about their societies, and enables students to develop skills at data
analysis. Often, however, data producers cannot release microdata as collected,
because doing so could reveal data subjects' identities or values of sensitive
attributes. Failing to protect confidentiality (when promised) is unethical and
can cause harm to data subjects and the data provider. It even may be illegal,
especially in government and research settings. For example, if one reveals
confidential data covered by the U.S. Confidential Information Protection and
Statistical Efficiency Act, one is subject to a maximum of $250,000 in fines and
a five year prison term.
5) Analyzing Collective Behavior from Blogs Using Swarm Intelligence
AUTHORS: S. Banerjee and N. Agarwal
With the rapid growth of the availability and popularity of interpersonal and
behavior-rich resources such as blogs and other social media avenues, emerging
opportunities and challenges arise as people now can, and do, actively use
computational intelligence to seek out and understand the opinions of others.
The study of collective behavior of individuals has implications to business
intelligence, predictive analytics, customer relationship management, and
examining online collective action as manifested by various flash mobs, the
Arab Spring (2011) and other such events. In this article, we introduce a
nature-inspired theory to model collective behavior from the observed data on
blogs using swarm intelligence, where the goal is to accurately model and predict
the future behavior of a large population after observing their interactions during
a training phase. Specifically, an ant colony optimization model is trained with
behavioral trend from the blog data and is tested over real-world blogs.
Promising results were obtained in trend prediction using ant colony based
pheromone classifier and CHI statistical measure. We provide empirical
guidelines for selecting suitable parameters for the model, conclude with
interesting observations, and envision future research directions.
2. SYSTEM STUDY
FEASIBILITY STUDY
The feasibility of the project is analyzed in this phase and a business
proposal is put forth with a very general plan for the project and some cost
estimates. During system analysis the feasibility study of the proposed system is
to be carried out. This is to ensure that the proposed system is not a burden to
the company. For feasibility analysis, some understanding of the major
requirements for the system is essential.
Three key considerations involved in the feasibility analysis are
ECONOMICAL FEASIBILITY
TECHNICAL FEASIBILITY
SOCIAL FEASIBILITY
ECONOMICAL FEASIBILITY
This study is carried out to check the economic impact that the system
will have on the organization. The amount of funds that the company can pour
into the research and development of the system is limited. The expenditures
must be justified. The developed system is well within the budget, and this
was achieved because most of the technologies used are freely available. Only
the customized products had to be purchased.
TECHNICAL FEASIBILITY
This study is carried out to check the technical feasibility, that is, the
technical requirements of the system. Any system developed must not place a
high demand on the available technical resources, as this would lead to high
demands being placed on the client. The developed system must have modest
requirements, as only minimal or null changes are required for implementing this
system.
SOCIAL FEASIBILITY
This aspect of the study checks the level of acceptance of the system by
the user. This includes the process of training the user to use the system
efficiently. The user must not feel threatened by the system, but must instead
accept it as a necessity. The level of acceptance by the users solely depends on
the methods that are employed to educate the users about the system and to make
them familiar with it. Their level of confidence must be raised so that they are
also able to make constructive criticism, which is welcomed, as they are the
final users of the system.
3. SYSTEM REQUIREMENTS
Monitor: 15 VGA Colour.
Mouse: Logitech.
Ram: 512 Mb.
SOFTWARE REQUIREMENTS
Object oriented
Portable
Distributed
High performance
Interpreted
Multithreaded
Robust
Dynamic
Secure
With most programming languages, you either compile or interpret a program
so that you can run it on your computer. The Java programming language is
unusual in that a program is both compiled and interpreted. With the compiler,
first you translate a program into an intermediate language called Java byte
codes, the platform-independent codes interpreted by the interpreter on the
Java platform. The interpreter parses and runs each Java byte code instruction
on the computer. Compilation happens just once; interpretation occurs each time
the program is executed. The following figure illustrates how this works.
You can think of Java byte codes as the machine code instructions for the
Java Virtual Machine (Java VM). Every Java interpreter, whether it's a
development tool or a Web browser that can run applets, is an implementation
of the Java VM. Java byte codes help make "write once, run anywhere"
possible. You can compile your program into byte codes on any platform that
has a Java compiler. The byte codes can then be run on any implementation of
the Java VM. That means that as long as a computer has a Java VM, the same
program written in the Java programming language can run on Windows 2000,
a Solaris workstation, or on an iMac.
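The compile-once, run-anywhere idea can be tried with a very small program. The sketch below (the class name WhereAmI is made up for this illustration) is compiled a single time with javac WhereAmI.java; the resulting WhereAmI.class byte code file can then be copied to another operating system and started there with java WhereAmI, where it reports the host system and the virtual machine it is running on.

```java
class WhereAmI {

    // Reads two standard system properties that differ from platform to
    // platform even though the byte codes being executed are identical.
    static String describe() {
        String os = System.getProperty("os.name");
        String vm = System.getProperty("java.vm.name");
        return os + " / " + vm;
    }

    public static void main(String[] args) {
        System.out.println("Running on " + describe());
    }
}
```

The printed text depends on the machine, which is the point of the example: only the Java VM underneath changes, not the compiled program.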
The Java Platform
A platform is the hardware or software environment in which a
program runs. We've already mentioned some of the most popular
platforms like Windows 2000, Linux, Solaris, and MacOS. Most
platforms can be described as a combination of the operating system and
hardware. The Java platform differs from most other platforms in that it's
a software-only platform that runs on top of other hardware-based
platforms.
The Java platform has two components:
The Java Virtual Machine (Java VM)
The Java Application Programming Interface (Java API)
You've already been introduced to the Java VM. It's the base for the
Java platform and is ported onto various hardware-based platforms.
The Java API is a large collection of ready-made software
components that provide many useful capabilities, such as graphical user
interface (GUI) widgets. The Java API is grouped into libraries of related
classes and interfaces; these libraries are known as packages. The next
section, What Can Java Technology Do?, highlights what functionality
some of the packages in the Java API provide.
The following figure depicts a program that's running on the Java
platform. As the figure shows, the Java API and the virtual machine
insulate the program from the hardware.
Native code is code that, once compiled, runs on a specific hardware
platform. As a platform-independent environment, the Java
platform can be a bit slower than native code. However, smart compilers,
well-tuned interpreters, and just-in-time byte code compilers can bring
performance close to that of native code without threatening portability.
What Can Java Technology Do?
The most common types of programs written in the Java programming
language are applets and applications. If you've surfed the Web, you're
probably already familiar with applets. An applet is a program that
adheres to certain conventions that allow it to run within a Java-enabled
browser.
However, the Java programming language is not just for writing cute,
entertaining applets for the Web. The general-purpose, high-level Java
programming language is also a powerful software platform. Using the
generous API, you can write many types of programs.
An application is a standalone program that runs directly on the Java
platform. A special kind of application known as a server serves and
supports clients on a network. Examples of servers are Web servers,
proxy servers, mail servers, and print servers. Another specialized
program is a servlet. A servlet can almost be thought of as an applet that
runs on the server side. Java Servlets are a popular choice for building
interactive web applications, replacing the use of CGI scripts. Servlets
are similar to applets in that they are runtime extensions of applications.
Instead of working in browsers, though, servlets run within Java Web
servers, configuring or tailoring the server.
How does the API support all these kinds of programs? It does so with
packages of software components that provide a wide range of
functionality. Every full implementation of the Java platform gives you
the following features:
The essentials: Objects, strings, threads, numbers, input and
output, data structures, system properties, date and time, and so on.
Applets: The set of conventions used by applets.
Networking: URLs, TCP (Transmission Control Protocol), UDP
(User Datagram Protocol) sockets, and IP (Internet Protocol)
addresses.
Internationalization: Help for writing programs that can be
localized for users worldwide. Programs can automatically adapt to
specific locales and be displayed in the appropriate language.
Security: Both low level and high level, including electronic
signatures, public and private key management, access control, and
certificates.
Software components: Known as JavaBeans(TM), can plug into
existing component architectures.
Object serialization: Allows lightweight persistence and
communication via Remote Method Invocation (RMI).
Java Database Connectivity (JDBC(TM)): Provides uniform access
to a wide range of relational databases.
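Of the features listed above, object serialization is the easiest to show in isolation. The following minimal sketch writes an object to a byte array and reads it back using the standard java.io object streams; the same byte form is what allows objects to be stored or passed between Java VMs. The class SerializationSketch and the Customer type are hypothetical, and RMI itself is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationSketch {

    // A hypothetical value object; implementing Serializable is all that
    // is needed for the default serialization mechanism.
    static final class Customer implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int visits;

        Customer(String name, int visits) {
            this.name = name;
            this.visits = visits;
        }
    }

    // Turns an object graph into bytes (lightweight persistence).
    static byte[] toBytes(Serializable object) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
        return buffer.toByteArray();
    }

    // Rebuilds the object from its byte form.
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deserialization failed", e);
        }
    }

    public static void main(String[] args) {
        Customer copy = (Customer) fromBytes(toBytes(new Customer("Asha", 3)));
        System.out.println(copy.name + " visited " + copy.visits + " times");
    }
}
```

Writing to a file or a socket instead of a byte array only changes the underlying stream; the writeObject and readObject calls stay the same.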
The Java platform also has APIs for 2D and 3D graphics, accessibility,
servers, collaboration, telephony, speech, animation, and more. The
following figure depicts what is included in the Java 2 SDK.
How Will Java Technology Change My Life?
We can't promise you fame, fortune, or even a job if you learn the Java
programming language. Still, it is likely to make your programs better and
require less effort than other languages. We believe that Java technology will
help you do the following:
Get started quickly: Although the Java programming language is
a powerful object-oriented language, it's easy to learn, especially
for programmers already familiar with C or C++.
Write less code: Comparisons of program metrics (class counts,
method counts, and so on) suggest that a program written in the
Java programming language can be four times smaller than the
same program in C++.
Write better code: The Java programming language encourages
good coding practices, and its garbage collection helps you avoid
memory leaks. Its object orientation, its JavaBeans component
architecture, and its wide-ranging, easily extendible API let you
reuse other people's tested code and introduce fewer bugs.
Develop programs more quickly: Your development time may be
as much as twice as fast versus writing the same program in C++.
Why? You write fewer lines of code and it is a simpler
programming language than C++.
Avoid platform dependencies with 100% Pure Java: You can
keep your program portable by avoiding the use of libraries written
in other languages. The 100% Pure Java(TM) Product Certification
Program has a repository of historical process manuals, white
papers, brochures, and similar materials online.
Write once, run anywhere: Because 100% Pure Java programs
are compiled into machine-independent byte codes, they run
consistently on any Java platform.
Distribute software more easily: You can upgrade applets easily
from a central server. Applets take advantage of the feature of
allowing new classes to be loaded "on the fly," without
recompiling the entire program.
ODBC
Microsoft Open Database Connectivity (ODBC) is a standard
programming interface for application developers and database systems
providers. Before ODBC became a de facto standard for Windows programs to
interface with database systems, programmers had to use proprietary languages
for each database they wanted to connect to. Now, ODBC has made the choice
of the database system almost irrelevant from a coding perspective, which is as
it should be. Application developers have much more important things to worry
about than the syntax that is needed to port their program from one database to
another when business needs suddenly change.
Through the ODBC Administrator in Control Panel, you can specify the
particular database that is associated with a data source that an ODBC
application program is written to use. Think of an ODBC data source as a door
with a name on it. Each door will lead you to a particular database. For
example, the data source named Sales Figures might be a SQL Server database,
whereas the Accounts Payable data source could refer to an Access database.
The physical database referred to by a data source can reside anywhere on the
LAN.
The ODBC system files are not installed on your system by Windows 95.
Rather, they are installed when you set up a separate database application, such
as SQL Server Client or Visual Basic 4.0. When the ODBC icon is installed in
Control Panel, it uses a file called ODBCINST.DLL. It is also possible to
administer your ODBC data sources through a stand-alone program called
ODBCADM.EXE. There is a 16-bit and a 32-bit version of this program and
each maintains a separate list of ODBC data sources.
From a programming perspective, the beauty of ODBC is that the
application can be written to use the same set of function calls to interface with
any data source, regardless of the database vendor. The source code of the
application doesn't change whether it talks to Oracle or SQL Server. We only
mention these two as an example. There are ODBC drivers available for several
dozen popular database systems. Even Excel spreadsheets and plain text files
can be turned into data sources. The operating system uses the Registry
information written by ODBC Administrator to determine which low-level
ODBC drivers are needed to talk to the data source (such as the interface to
Oracle or SQL Server). The loading of the ODBC drivers is transparent to the
ODBC application program. In a client/server environment, the ODBC API
even handles many of the network issues for the application programmer.
The advantages of this scheme are so numerous that you are probably
thinking there must be some catch. The only disadvantage of ODBC is that it
isn't as efficient as talking directly to the native database interface. ODBC has
had many detractors make the charge that it is too slow. Microsoft has always
claimed that the critical factor in performance is the quality of the driver
software that is used. In our humble opinion, this is true. The availability of
good ODBC drivers has improved a great deal recently. And anyway, the
criticism about performance is somewhat analogous to those who said that
compilers would never match the speed of pure assembly language. Maybe not,
but the compiler (or ODBC) gives you the opportunity to write cleaner
programs, which means you finish sooner. Meanwhile, computers get faster
every year.
JDBC
In an effort to set an independent database standard API for Java, Sun
Microsystems developed Java Database Connectivity, or JDBC. JDBC offers a
generic SQL database access mechanism that provides a consistent interface to a
variety of RDBMSs. This consistent interface is achieved through the use of
"plug-in" database connectivity modules, or drivers. If a database vendor wishes
to have JDBC support, he or she must provide the driver for each platform that
the database and Java run on.
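The "plug-in" driver idea can be seen in how an application asks for a connection. In the sketch below the application only names a JDBC URL; java.sql.DriverManager then looks for a registered driver that accepts it. The URL jdbc:exampledb://localhost/sales and the class JdbcDriverSketch are hypothetical: no such driver exists, so on a machine without a matching driver the call fails with an SQLException and the method returns false. With a real vendor driver on the class path and that vendor's URL, the same code would open a connection unchanged.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

class JdbcDriverSketch {

    // Returns true if some registered JDBC driver accepts the URL and a
    // connection can be opened; false if no driver is suitable or the
    // database cannot be reached.
    static boolean canConnect(String url) {
        try (Connection connection = DriverManager.getConnection(url)) {
            return connection != null;
        } catch (SQLException e) {
            // Typical cause here: no driver registered for this URL scheme.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical URL used only to show the lookup; replace it with a
        // real vendor URL once that vendor's driver is on the class path.
        String url = "jdbc:exampledb://localhost/sales";
        System.out.println(url + " reachable: " + canConnect(url));
    }
}
```

The application code never mentions a vendor class, which is what lets the database be swapped by changing the URL and the driver on the class path.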
To gain a wider acceptance of JDBC, Sun based JDBC's framework on
ODBC. As you discovered earlier in this chapter, ODBC has widespread
support on a variety of platforms. Basing JDBC on ODBC will allow vendors to
bring JDBC drivers to market much faster than developing a completely new
connectivity solution.
JDBC was announced in March of 1996. It was released for a 90 day
public review that ended June 8, 1996. Because of user input, the final JDBC
v1.0 specification was released soon after.
The remainder of this section will cover enough information about JDBC for
you to know what it is about and how to use it effectively. This is by no means a
complete overview of JDBC. That would fill an entire book.
JDBC Goals
Few software packages are designed without goals in mind. JDBC is one
that, because of its many goals, drove the development of the API. These goals,
in conjunction with early reviewer feedback, have finalized the JDBC class
library into a solid framework for building database applications in Java.
The goals that were set for JDBC are important. They will give you some
insight as to why certain classes and functionalities behave the way they do. The
eight design goals for JDBC are as follows:
1. SQL Level API
The designers felt that their main goal was to define a SQL interface for
Java. Although not the lowest database interface level possible, it is at a low
enough level for higher-level tools and APIs to be created. Conversely, it is
at a high enough level for application programmers to use it confidently.
Attaining this goal allows for future tool vendors to "generate" JDBC code
and to hide many of JDBC's complexities from the end user.
2. SQL Conformance
SQL syntax varies as you move from database vendor to database vendor.
In an effort to support a wide variety of vendors, JDBC will allow any query
statement to be passed through it to the underlying database driver. This
allows the connectivity module to handle non-standard functionality in a
manner that is suitable for its users.
3. JDBC must be implementable on top of common database interfaces
The JDBC SQL API must "sit" on top of other common SQL level
APIs. This goal allows JDBC to use existing ODBC level drivers by the
use of a software interface. This interface would translate JDBC calls to
ODBC and vice versa.
4. Provide a Java interface that is consistent with the rest of the Java system
Because of Java's acceptance in the user community thus far, the
designers feel that they should not stray from the current design of the core
Java system.
5. Keep it simple
This goal probably appears in all software design goal listings. JDBC is
no exception. Sun felt that the design of JDBC should be very simple,
allowing for only one method of completing a task per mechanism. Allowing
duplicate functionality only serves to confuse the users of the API.
6. Use strong, static typing wherever possible
Strong typing allows for more error checking to be done at compile time
[...] JDBC. However, more complex
SQL statements should also be possible.
Java is two things: a programming language and a platform. Java is a
high-level programming language that is all of the following:
Simple            Architecture-neutral
Object-oriented   Portable
Distributed       High-performance
Interpreted       Multithreaded
Robust            Dynamic
Secure
Java is also unusual in that each Java program is both compiled and
interpreted. With a compiler, you translate a Java program into an
intermediate language called Java byte codes, the platform-independent
codes interpreted by the Java interpreter. With an interpreter, each Java
byte code instruction is parsed and run on the computer.
Compilation happens just once; interpretation occurs each time the
program is executed. The figure illustrates how this works.
25
7/25/2019 Datamining With Big Data_Siva
26/69
JavaProgram
Compilers
Interpreter
My Program
Eou can think of >ava byte codes as the machine code instructions
for the >ava Firtual &achine (>ava F&). #very >ava interpreter,
whether it9s a >ava development tool or a Web browser that can run
>ava applets, is an implementation of the >ava F&. The >ava F& can
also be implemented in hardware.
>ava byte codes help make Lwrite once, run anywhereM possible.
Eou can compile your >ava program into byte codes on my platformthat has a >ava compiler. The byte codes can then be run any
implementation of the >ava F&. or example, the same >ava program
can run Windows 5T, %olaris, and &acintosh.
Networking
TCP/IP stack
The TCP/IP stack is shorter than the OSI one.
TCP is a connection-oriented protocol; UDP (User Datagram
Protocol) is a connectionless protocol.
IP datagrams
The IP layer provides a connectionless and unreliable delivery
system. It considers each datagram independently of the others. Any
association between datagrams must be supplied by the higher layers.
The IP layer supplies a checksum that includes its own header. The
header includes the source and destination addresses. The IP layer
handles routing through an Internet. It is also responsible for breaking
up large datagrams into smaller ones for transmission and reassembling
them at the other end.
UDP
UDP is also connectionless and unreliable. What it adds to IP is a
checksum for the contents of the datagram and port numbers. These are
used to give a client/server model - see later.
TCP
TCP supplies logic to give a reliable connection-oriented protocol
above IP. It provides a virtual circuit that two processes can use to
communicate.
Internet addresses
In order to use a service, you must be able to find it. The Internet
uses an address scheme for machines so that they can be located. The
address is a 32 bit integer which gives the IP address. This encodes a
network ID and more addressing. The network ID falls into various
classes according to the size of the network address.
Network address
Class A uses 8 bits for the network address with 24 bits left over for
other addressing. Class B uses 16 bit network addressing. Class C uses
24 bit network addressing and class D uses all 32.
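The class-based split described above can be sketched in Java. This is a minimal illustration, not part of the original project: the class name `AddressClass` and its methods are hypothetical, and the first-octet ranges (0-127, 128-191, 192-223, 224-239) are the usual classful convention, which the text does not spell out.

```java
/** Hypothetical helper illustrating classful IPv4 addressing. */
final class AddressClass {
    /** Returns 'A'..'E' for an IPv4 address held in a 32-bit int. */
    static char classOf(int address) {
        int firstOctet = (address >>> 24) & 0xFF; // top 8 bits of the address
        if (firstOctet < 128) return 'A';         // leading bit  0
        if (firstOctet < 192) return 'B';         // leading bits 10
        if (firstOctet < 224) return 'C';         // leading bits 110
        if (firstOctet < 240) return 'D';         // leading bits 1110
        return 'E';                               // remaining range
    }

    /** Network-address width per class, following the figures in the text. */
    static int networkBits(char addressClass) {
        switch (addressClass) {
            case 'A': return 8;
            case 'B': return 16;
            case 'C': return 24;
            case 'D': return 32;
            default:  return -1; // not covered by the text
        }
    }
}
```

For example, `AddressClass.classOf(0x0A000001)` (10.0.0.1) falls in class A, leaving 24 bits for other addressing.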
Subnet address
Internally, the UNIX network is divided into sub networks. Building
11 is currently on one sub network and uses 10-bit addressing, allowing
1024 different hosts.
Host address
8 bits are finally used for host addresses within our subnet. This
places a limit of 256 machines that can be on the subnet.
Total address
The 32 bit address is usually written as 4 integers separated by dots.
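As a small illustration of the dotted notation, the sketch below converts between a 32-bit integer and its four-integer form. The class name `DottedQuad` and its methods are made up for this example.

```java
/** Hypothetical helper: 32-bit IPv4 address <-> "a.b.c.d" text. */
final class DottedQuad {
    /** Formats the four bytes of the address, most significant first. */
    static String format(int address) {
        return ((address >>> 24) & 0xFF) + "." + ((address >>> 16) & 0xFF) + "."
             + ((address >>> 8) & 0xFF) + "." + (address & 0xFF);
    }

    /** Parses "a.b.c.d" back into a 32-bit integer. */
    static int parse(String text) {
        String[] parts = text.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected 4 integers: " + text);
        }
        int address = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + part);
            }
            address = (address << 8) | octet; // shift previous octets up, append this one
        }
        return address;
    }
}
```

With 8 bits per integer, each of the four fields ranges from 0 to 255, which is also why an 8-bit host field limits a subnet to 256 addresses.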
Port addresses
A service exists on a host, and is identified by its port. This is a 16
bit number. To send a message to a server, you send it to the port for
that service of the host that it is running on. This is not location
transparency! Certain of these ports are "well known".
Sockets
A socket is a data structure maintained by the system to handle
network connections. A socket is created using the call socket. It returns
an integer that is like a file descriptor. In fact, under Windows, this
handle can be used with ReadFile and WriteFile functions.
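In Java the same idea is exposed through `java.net.ServerSocket` and `java.net.Socket` rather than an integer descriptor. The following is a minimal sketch, assuming a loopback connection and a one-line echo exchange; the class `LoopbackEcho` and its methods are hypothetical and are not taken from the project code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical demo: a TCP client and a one-shot echo server on the loopback interface. */
final class LoopbackEcho {
    /** Sends one line to a throwaway echo server and returns the line it sends back. */
    static String echoOnce(String message) {
        // Port 0 asks the operating system to pick any free port.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> serveOne(server));
            serverThread.start();
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream()))) {
                out.println(message);          // write to the server's port
                String reply = in.readLine();  // read the echoed line
                serverThread.join();
                return reply;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Accepts a single connection and echoes one line back to the peer. */
    private static void serveOne(ServerSocket server) {
        try (Socket peer = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(peer.getInputStream()));
             PrintWriter out = new PrintWriter(peer.getOutputStream(), true)) {
            out.println(in.readLine());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The client only needs the host address and the port of the service, which matches the port addressing described above.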
#include
#include
int socket(int family, int type, int protocol);
JFreeChart is a free 100% Java chart library that makes it easy for
developers to display professional quality charts in their applications.
JFreeChart's extensive feature set includes:
A consistent and well-documented API, supporting a wide range of
chart types [...]
JavaScript is an object-oriented scripting language primarily used in client-side
interfaces for web applications. Ajax (Asynchronous JavaScript and XML) is a
Web 2.0 technique that allows changes to occur in a web page without the need
to perform a page refresh. JavaScript toolkits can be leveraged to implement
Ajax-enabled components and functionality in web pages.
Web Server and Client
A web server is software that can process a client request and send the
response back to the client. For example, Apache is one of the most widely used
web servers. A web server runs on some physical machine and listens for client
requests on a specific port.
A web client is software that helps in communicating with the server. Some of
the most widely used web clients are Firefox, Google Chrome, Safari etc. When
we request something from a server (through a URL), the web client takes care of
creating a request and sending it to the server, and then parsing the server response
and presenting it to the user.
HTML and HTTP
The web server and web client are two separate pieces of software, so there should be some
common language for communication. HTML is the common language between
server and client and stands for HyperText Markup Language.
The web server and client also need a common communication protocol. HTTP
(HyperText Transfer Protocol) is the communication protocol between server
and client. HTTP runs on top of the TCP/IP communication protocol.
Some of the important parts of an HTTP request are:
HTTP Method: action to be performed, usually GET, POST, PUT etc.
URL: page to access
Form Parameters: similar to arguments in a Java method, for example
user, password details from a login page.
Sample HTTP Request:
GET /FirstServletProject/jsps/hello.jsp HTTP/1.1
Host: localhost:8080
Cache-Control: no-cache
Some of the important parts of an HTTP response are:
Status Code: an integer to indicate whether the request was a success or
not. Some of the well-known status codes are 200 for success, 404 for Not
Found and 403 for Access Forbidden.
Content Type: text, html, image, pdf etc. Also known as MIME type.
Content: actual data that is rendered by the client and shown to the user.
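The request and response parts listed above can be picked apart with plain string handling. The sketch below is illustrative only: `HttpSketch` and its methods are hypothetical names, and a real server would rely on the container or an HTTP library instead of hand parsing.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper that splits the textual parts of an HTTP exchange. */
final class HttpSketch {
    /** Splits "GET /path HTTP/1.1" into {method, target, version}. */
    static String[] parseRequestLine(String line) {
        String[] parts = line.trim().split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("malformed request line: " + line);
        }
        return parts;
    }

    /** Parses "Name: value" lines into a map keyed by lower-cased header name. */
    static Map<String, String> parseHeaders(String... lines) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : lines) {
            int colon = line.indexOf(':'); // the first colon separates name and value
            if (colon < 0) {
                throw new IllegalArgumentException("malformed header: " + line);
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            headers.put(name, line.substring(colon + 1).trim());
        }
        return headers;
    }

    /** Reason phrases for the three status codes mentioned in the text. */
    static String reasonPhrase(int statusCode) {
        switch (statusCode) {
            case 200: return "OK";
            case 403: return "Forbidden";
            case 404: return "Not Found";
            default:  return "Unknown";
        }
    }
}
```

Applied to the sample request, `parseRequestLine` yields the method `GET` and the target `/FirstServletProject/jsps/hello.jsp`, while `parseHeaders("Host: localhost:8080")` keeps the port as part of the header value.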
MIME Type or Content Type: The HTTP response header contains the field
"Content-Type". It's also called the MIME type, and the server sends it
to the client to let it know the kind of data being sent. It helps the client in
rendering the data for the user. Some of the most used MIME types are text/html,
text/xml, application/xml etc.
Understanding URL
URL is the acronym of Uniform Resource Locator and it's used to locate the
server and resource. Every resource on the web has its own unique address.
Let's see the parts of a URL with an example.
http://localhost:8080/FirstServletProject/jsps/hello.jsp
http:// - This is the first part of the URL and provides the communication protocol
to be used in server-client communication.
localhost - The unique address of the server; most of the time it's the hostname
of the server that maps to a unique IP address. Sometimes multiple hostnames
point to the same IP address, and the web server virtual host takes care of sending
the request to the particular server instance.
8080 - This is the port on which the server is listening. It's optional, and if we don't
provide it in the URL then the request goes to the default port of the protocol. Port
numbers 0 to 1023 are reserved for well-known services, for example 80
for HTTP and 443 for HTTPS.
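The URL parts described above map directly onto the accessors of `java.net.URI`. The following is a minimal sketch; the wrapper class `UrlParts` is a hypothetical name, and the fallback ports 80 and 443 are the standard defaults for HTTP and HTTPS.

```java
import java.net.URI;

/** Hypothetical helper that splits a URL into protocol, host, port and path. */
final class UrlParts {
    /** Returns the explicit port, or the protocol default when the URL omits it. */
    static int effectivePort(URI uri) {
        if (uri.getPort() != -1) return uri.getPort(); // -1 means "not given"
        if ("http".equalsIgnoreCase(uri.getScheme())) return 80;
        if ("https".equalsIgnoreCase(uri.getScheme())) return 443;
        return -1; // unknown protocol: no default assumed
    }

    /** Renders the parts as "protocol | host | port | path". */
    static String describe(String url) {
        URI uri = URI.create(url);
        return uri.getScheme() + " | " + uri.getHost() + " | "
             + effectivePort(uri) + " | " + uri.getPath();
    }
}
```

For the example URL, `describe` separates `http`, `localhost`, `8080` and `/FirstServletProject/jsps/hello.jsp`; dropping `:8080` from the URL would make `effectivePort` fall back to 80.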
Web servers are good for static content such as HTML pages, but they don't know how
to generate dynamic content or how to save data into databases, so we need
another tool that we can use to generate dynamic content. There are several
programming languages for dynamic content, like PHP, Python, Ruby on Rails,
Java Servlets and JSPs.
Java Servlets and JSPs are server-side technologies that extend the capability of
web servers by providing support for dynamic responses and data persistence.
Web Container
Tomcat is a web container. When a request is made from a client to the web server, it
passes the request to the web container, and it's the web container's job to find the
correct resource to handle the request (servlet or JSP) and then use the
response from the resource to generate the response and provide it to the web
server. Then the web server sends the response back to the client.
When the web container gets the request and it's for a servlet, the container
creates two objects, HttpServletRequest and HttpServletResponse. Then it
finds the correct servlet based on the URL and creates a thread for the request.
Then it invokes the servlet's service() method, and based on the HTTP method,
service() invokes the doGet() or doPost() method. The servlet methods
generate the dynamic page and write it to the response. Once the servlet thread is
complete, the container converts the response to an HTTP response and sends it back to the
client.
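The dispatch sequence above (find the servlet by URL, call service(), branch to doGet() or doPost()) can be imitated in a few lines. This is a simplified stand-in written for illustration: `MiniServlet`, `HelloServlet` and `MiniContainer` are hypothetical classes, not the javax.servlet API, and the sketch leaves out the request and response objects and the per-request thread.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for a servlet; NOT the real javax.servlet.http.HttpServlet. */
abstract class MiniServlet {
    /** Chooses the handler method from the HTTP method, as service() does. */
    final String service(String httpMethod, String path) {
        switch (httpMethod.toUpperCase(Locale.ROOT)) {
            case "GET":  return doGet(path);
            case "POST": return doPost(path);
            default:     return "405 Method Not Allowed";
        }
    }

    // Subclasses override only the methods they support.
    protected String doGet(String path)  { return "405 Method Not Allowed"; }
    protected String doPost(String path) { return "405 Method Not Allowed"; }
}

/** Example servlet that only answers GET requests. */
final class HelloServlet extends MiniServlet {
    @Override
    protected String doGet(String path) {
        return "<h1>Hello from " + path + "</h1>";
    }
}

/** Hypothetical container: maps URL paths to servlets and dispatches requests. */
final class MiniContainer {
    private final Map<String, MiniServlet> mappings = new HashMap<>();

    void map(String path, MiniServlet servlet) {
        mappings.put(path, servlet);
    }

    String handle(String httpMethod, String path) {
        MiniServlet servlet = mappings.get(path); // find the servlet for this URL
        if (servlet == null) return "404 Not Found";
        return servlet.service(httpMethod, path); // a real container runs this on a request thread
    }
}
```

Mapping `/hello` to a `HelloServlet` and calling `handle("GET", "/hello")` returns the generated page, while an unmapped path yields the 404 text.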
Some of the important work done by the web container:
Communication Support: The container provides an easy way of
communication between the web server and the servlets and JSPs. Because of
the container, we don't need to build a server socket to listen for requests
from the web server, parse the request and generate the response. All these
important and complex tasks are done by the container, and all we need to
focus on is the business logic of our applications.
Lifecycle and Resource Management: The container takes care of
managing the life cycle of a servlet: loading the
servlets into memory, initializing them, invoking servlet methods and
destroying them. The container also provides utilities like JNDI for resource
pooling and management.
Multithreading Support: The container creates a new thread for every
request to the servlet, and when the request is processed the thread dies. So servlets
are not initialized for each request, which saves time and memory.
JSP Support: JSPs don't look like normal Java classes, and the web
container provides support for JSP. Every JSP in the application is
compiled by the container and converted to a servlet, and then the container
manages it like other servlets.
Miscellaneous Tasks: The web container manages the resource pool, does
memory optimizations, runs the garbage collector, provides security
configurations, support for multiple applications, hot deployment and
several other tasks behind the scenes that make our life easier.
Web Application Directory Structure
Java web applications are packaged as a Web Archive (WAR), which has a
defined structure. You can export a dynamic web project as a WAR file and
unzip it to check the hierarchy.
[Figure: web application directory structure]
Deployment Descriptor
The web.xml file is the deployment descriptor of the web application and contains
mappings for servlets (prior to 3.0), welcome pages, security configurations,
session timeout settings etc.
MySQL
A relational database stores data in separate tables rather than putting
all the data in one big storeroom. The database structures are organized
into physical files optimized for speed. The logical model, with objects
such as databases, tables, views, rows, and columns, offers a flexible
programming environment. You set up rules governing the relationships
between different data fields, such as one-to-one, one-to-many, unique,
required or optional, and "pointers" between different tables. The
database enforces these rules, so that with a well-designed database,
your application never sees inconsistent, duplicate, orphan, out-of-date,
or missing data.
The SQL part of "MySQL" stands for "Structured Query Language".
SQL is the most common standardized language used to access
databases. Depending on your programming environment, you might
enter SQL directly (for example, to generate reports), embed SQL
statements into code written in another language, or use a
language-specific API that hides the SQL syntax.
SQL is defined by the ANSI/ISO SQL Standard. The SQL standard has
been evolving since 1986 and several versions exist. In this manual,
[...] the standard released in 1999 [...]
[...] and use it without paying anything. If you wish, you may study the source
code and change it to suit your needs. The MySQL software uses the GPL
(GNU General Public License), http://www.fsf.org/licenses/, to define
what you may and may not do with the software in different situations. If
you feel uncomfortable with the GPL or need to embed MySQL code into
a commercial application, you can buy a commercially licensed version
from us. See the MySQL Licensing Overview for more information
(http://www.mysql.com/company/legal/licensing/).
The MySQL Database Server is very fast, reliable, scalable, and easy to
use.
If that is what you are looking for, you should give it a try. MySQL Server
can run comfortably on a desktop or laptop, alongside your other
applications, web servers, and so on, requiring little or no attention. If
you dedicate an entire machine to MySQL, you can adjust the settings to
take advantage of all the memory, CPU power, and I/O capacity
available. MySQL can also scale up to clusters of machines, networked
together.
You can find a performance comparison of MySQL Server with other
database managers on our benchmark page.
MySQL Server was originally developed to handle large databases much
faster than existing solutions and has been successfully used in highly
demanding production environments for several years. Although under
constant development, MySQL Server today offers a rich and useful set of
functions. Its connectivity, speed, and security make MySQL Server
highly suited for accessing databases on the Internet.
MySQL Server works in client/server or embedded systems.
The MySQL Database Software is a client/server system that consists of a
multithreaded SQL server that supports different backends, several
different client programs and libraries, administrative tools, and a wide
range of application programming interfaces (APIs).
We also provide MySQL Server as an embedded multithreaded library
that you can link into your application to get a smaller, faster,
easier-to-manage standalone product.
A large amount of contributed MySQL software is available.
MySQL Server has a practical set of features developed in close
cooperation with our users. It is very likely that your favorite application
or language supports the MySQL Database Server.
The official way to pronounce "MySQL" is "My Ess Que Ell" (not "my
sequel"), but we do not mind if you pronounce it as "my sequel" or in some
other localized way.
SYSTEM DESIGN DEVELOPMENT
SYSTEM ARCHITECTURE:
DATA FLOW DIAGRAM:
1. The DFD is also called a bubble chart. It is a simple graphical formalism
that can be used to represent a system in terms of the input data to the
system, the various processing carried out on this data, and the output data
generated by this system.
2. The data flow diagram (DFD) is one of the most important modeling
tools. It is used to model the system components. These components are
the system process, the data used by the process, the external entities that
interact with the system and the information flows in the system.
3. The DFD shows how the information moves through the system and how it is
modified by a series of transformations. It is a graphical technique that
depicts information flow and the transformations that are applied as data
moves from input to output.
4. A DFD may be used to represent a system at any level of abstraction.
A DFD may be partitioned into levels
that represent increasing information flow and functional detail.
[Figure: data flow diagram. Manager/Master; Twitter account; Verify account (1.2.1); Location (1.2.2); Hash tags (1.2.3); Tweet count (1.2.4); Forming cluster reports (2.3); Location info]
UML DIAGRAMS
UML stands for Unified Modeling Language. UML is a standardized
general-purpose modeling language in the field of object-oriented software
engineering. The standard is managed, and was created by, the Object
Management Group.
The goal is for UML to become a common language for creating models
of object-oriented computer software. In its current form UML comprises
two major components: a meta-model and a notation. In the future, some form
of method or process may also be added to, or associated with, UML.
The Unified Modeling Language is a standard language for specifying,
visualizing, constructing and documenting the artifacts of a software system, as
well as for business modeling and other non-software systems.
The UML represents a collection of best engineering practices that have
proven successful in the modeling of large and complex systems.
The UML is a very important part of developing object-oriented
software and the software development process. The UML uses mostly
graphical notations to express the design of software projects.
GOALS:
The primary goals in the design of the UML are as follows:
1. Provide users a ready-to-use, expressive visual modeling language so
that they can develop and exchange meaningful models.
2. Provide extensibility and specialization mechanisms to extend the core
concepts.
3. Be independent of particular programming languages and development
processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the OO tools market.
6. Support higher-level development concepts such as collaborations,
frameworks, patterns and components.
7. Integrate best practices.
USE CASE DIAGRAM:
A use case diagram in the Unified Modeling Language (UML) is a type
of behavioral diagram defined by and created from a use-case analysis. Its
purpose is to present a graphical overview of the functionality provided by a
system in terms of actors, their goals (represented as use cases), and any
dependencies between those use cases. The main purpose of a use case diagram
is to show what system functions are performed for which actor. Roles of the
actors in the system can be depicted.
CLASS DIAGRAM:
In software engineering, a class diagram in the Unified Modeling Language
(UML) is a type of static structure diagram that describes the structure of a
system by showing the system's classes, their attributes, operations (or
methods), and the relationships among the classes. It explains which class
contains information.
SEQUENCE DIAGRAM:
[...]
ACTIVITY DIAGRAM:
Activity diagrams are graphical representations of workflows of stepwise
activities and actions with support for choice, iteration and concurrency. In the
Unified Modeling Language, activity diagrams can be used to describe the
business and operational step-by-step workflows of components in a system. An
activity diagram shows the overall flow of control.
5. SCREEN LAYOUT
[Figures: screen layouts]
6. SYSTEM TESTING
The purpose of testing is to discover errors. Testing is the process of
trying to discover every conceivable fault or weakness in a work product. It
provides a way to check the functionality of components, sub-assemblies,
assemblies and/or a finished product. It is the process of exercising software
with the intent of ensuring that the
software system meets its requirements and user expectations and does not fail
in an unacceptable manner. There are various types of test. Each test type
addresses a specific testing requirement.
TYPES OF TESTS
Unit testing
Unit testing involves the design of test cases that validate that the internal
program logic is functioning properly, and that program inputs produce valid
outputs. All decision branches and internal code flow should be validated. It is
the testing of individual software units of the application. It is done after the
completion of an individual unit, before integration. This is structural testing
that relies on knowledge of the unit's construction and is invasive. Unit tests perform
basic tests at component level and test a specific business process, application,
and/or system configuration. Unit tests ensure that each unique path of a
business process performs accurately to the documented specifications and
contains clearly defined inputs and expected results.
Integration testing
Integration tests are designed to test integrated software components to
determine if they actually run as one program. Testing is event driven and is
more concerned with the basic outcome of screens or fields. Integration tests
demonstrate that although the components were individually satisfactory, as
shown by successful unit testing, the combination of components is correct
and consistent. Integration testing is specifically aimed at exposing the
problems that arise from the combination of components.
Functional test
Functional tests provide systematic demonstrations that functions tested are
available as specified by the business and technical requirements, system
documentation, and user manuals.
Functional testing is centered on the following items:
Valid Input: identified classes of valid input must be accepted.
Invalid Input: identified classes of invalid input must be rejected.
Functions: identified functions must be exercised.
Output: identified classes of application outputs must be
exercised.
Systems/Procedures: interfacing systems or procedures must be invoked.
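The valid-input and invalid-input items can be expressed as a small table-driven check. The sketch below is an assumption-laden illustration: the report does not define the format of a location ID, so the rule used here (1 to 10 decimal digits) and the class names `LocationIdValidator` and `LocationIdFunctionalTest` are hypothetical.

```java
/** Hypothetical validator; the 1-to-10-digit rule is an assumption for this example. */
final class LocationIdValidator {
    static boolean isValid(String input) {
        return input != null && input.matches("[0-9]{1,10}");
    }
}

/** Table-driven functional check: valid classes are accepted, invalid ones rejected. */
final class LocationIdFunctionalTest {
    static void run() {
        // Includes the boundary cases of 1 digit and 10 digits.
        String[] validInputs = {"7", "12345", "0000000001"};
        // Null, empty, non-numeric, signed, and one digit over the limit.
        String[] invalidInputs = {null, "", "12a", "-5", "12345678901"};

        for (String input : validInputs) {
            if (!LocationIdValidator.isValid(input)) {
                throw new AssertionError("valid input was rejected: " + input);
            }
        }
        for (String input : invalidInputs) {
            if (LocationIdValidator.isValid(input)) {
                throw new AssertionError("invalid input was accepted: " + input);
            }
        }
    }
}
```

Calling `LocationIdFunctionalTest.run()` completes silently when every class of input behaves as expected and throws on the first violation.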
Organization and preparation of functional tests is focused on requirements,
key functions, or special test cases. In addition, systematic coverage pertaining
to identified business process flows, data fields, predefined processes, and
successive processes must be considered for testing. Before functional testing is
complete, additional tests are identified and the effective value of current tests is
determined.
System Test
System testing ensures that the entire integrated software system meets
requirements. It tests a configuration to ensure known and predictable results.
An example of system testing is the configuration-oriented system integration
test. System testing is based on process descriptions and flows, emphasizing
pre-driven process links and integration points.
White Box Testing
White box testing is testing in which the software tester has
knowledge of the inner workings, structure and language of the software, or at
least its purpose. It is used to test areas that cannot be reached
from a black box level.
Black Box Testing
Black box testing is testing the software without any knowledge of the
inner workings, structure or language of the module being tested. Black box
tests, as most other kinds of tests, must be written from a definitive source
document, such as a specification or requirements document. It is testing in
which the software under test is treated as a black box: you cannot "see" into it.
The test provides
inputs and responds to outputs without considering how the software works.
6.1 Unit Testing:
Unit testing is usually conducted as part of a combined code and unit test
phase of the software lifecycle, although it is not uncommon for coding and unit
testing to be conducted as two distinct phases.
Test strategy and approach
Field testing will be performed manually and functional tests will be
written in detail.
Test objectives
All field entries must work properly.
Pages must be activated from the identified link.
The entry screen, messages and responses must not be delayed.
Features to be tested
Verify that the entries are of the correct format.
No duplicate entries should be allowed.
All links should take the user to the correct page.
6.2 Integration Testing
Software integration testing is the incremental integration testing of two
or more integrated software components on a single platform to produce failures
caused by interface defects.
The task of the integration test is to check that components or software
applications, e.g. components in a software system or, one step up, software
applications at the company level, interact without error.
Test Results: All the test cases mentioned above passed successfully. No
defects encountered.
6.3 Acceptance Testing
User Acceptance Testing is a critical phase of any project and requires
significant participation by the end user. It also ensures that the system meets
the functional requirements.
Test Results: All the test cases mentioned above passed successfully. No
defects encountered.
Test Case Report 1
(Use one template for each test case)
GENERAL INFORMATION
Test Stage: Unit / Functionality / Interface / Performance / Acceptance
Test Date: 09/09/2010
System Date, if applicable: 09/03/2015
Tester: Janardhan
Test Case Number: 1
Test Case Description: Unit testing focuses on verifying the effort on the
smallest unit of software, the module. The local data structure is examined to
ensure that the data stored temporarily maintains its integrity during all steps
in the algorithm's execution. Boundary conditions are tested to ensure that the
module operates properly at boundaries established to limit or restrict processing.
Results: Pass (OK) / Fail
INTRODUCTION
Requirement(s) to be tested: Getting Twitter account
Roles and Responsibilities: Gathering the requirements of the project, designing and testing.
Set Up Procedures: By installing Eclipse.
ENVIRONMENTAL NEEDS
Hardware: PC with minimum 20 GB hard disk and 1 GB RAM.
Software: Windows XP/2000, Oracle, Eclipse.
TEST
Test Items and Features: Location ID and retweet count.
Procedural Steps: If the user enters the location ID, it will be redirected to
another appropriate page so that we can confirm the test is accepted.
Expected Results of Case: If the page is redirected, we can confirm that this
test case has succeeded.
Test Case Report 2
(Use one template for each test case)
GENERAL INFORMATION
Test Stage: Unit / Functionality / Interface / Performance / Acceptance
Test Date: 09/09/2010
System Date, if applicable: 09/02/2015
Tester: Janardhan
Test Case Number: 2
Test Case Description: Unit testing focuses on verifying the effort on the
smallest unit of software, the module. The local data structure is examined to
ensure that the data stored temporarily maintains its integrity during all steps
in the algorithm's execution. Boundary conditions are tested to ensure that the
module operates properly at boundaries established to limit or restrict processing.
Results: Pass (OK) / Fail
INTRODUCTION
Requirement(s) to be tested: After getting [...]
[...] characteristics of the Big Data are 1) huge with heterogeneous and diverse data
sources, 2) autonomous with distributed and decentralized control, and 3)
complex and evolving in data and knowledge associations. Such combined
characteristics suggest that Big Data require a "big mind" to consolidate data for
maximum values [27].
To explore Big Data, we have analyzed several challenges at the data, model,
and system levels. To support Big Data mining, high-performance computing
platforms are required, which impose systematic designs to unleash the full
power of the Big Data. At the data level, the autonomous information sources
and the variety of the data collection environments often result in data with
complicated conditions, such as missing/uncertain values. In other situations,
privacy concerns, noise, and errors can be introduced into the data, to produce
altered data copies. Developing a safe and sound information sharing protocol is
a major challenge. At the model level, the key challenge is to generate global
models by combining locally discovered patterns to form a unifying view. This
requires carefully designed algorithms to analyze model correlations between
distributed sites, and fuse decisions from multiple sources to gain a best model
out of the Big Data. At the system level, the essential challenge is that a Big
Data mining framework needs to consider complex relationships between
samples, models, and data sources, along with their evolving changes with time
and other possible factors. A system needs to be carefully designed so that
unstructured data can be linked through their complex relationships to form
useful patterns, and the growth of data volumes and item relationships should
help form legitimate patterns to predict the trend and future.
We regard Big Data as an emerging trend and the need for Big Data mining is
arising in all science and engineering domains. With Big Data technologies, we
will hopefully be able to provide most relevant and most accurate social sensing
feedback to better understand our society at real time. We can further stimulate
the participation of the public audiences in the data production circle for societal
and economical events. The era of Big Data has arrived.
BIB-IOGRA;/9
V/ 7. !hmed and '. =arypis, L!lgorithms for &ining the #volution of
"onserved 7elational %tates in Dynamic 5etworks,M =nowledge and
*nformation %ystems, vol. 11, no. 1, pp. Q1-Q1, Dec. 0/0.
V0 &.. !lam, >.W. a, and %.=. ?ee, L5ovel !pproaches to "rawling
*mportant 2ages #arly,M =nowledge and *nformation %ystems, vol. 11, no. 1, pp
II-I13, Dec. 0/0.
V1 %. !ral and D. Walker, L*dentifying *nfluential and %usceptible &embers of
%ocial 5etworks,M %cience, vol. 11I, pp. 11I-13/, 0/0.
V3 !. &achanava::hala and >.2. 7eiter, LBig 2rivacy 2rotecting "onfidentiality
in Big Data,M !"& "rossroads, vol. /P, no. /, pp. 0-01, 0/0.
V4 %. Baner:ee and 5. !garwal, L!naly$ing "ollective Behavior from Blogs
@sing %warm *ntelligence,M =nowledge and *nformation %ystems, vol. 11, no.
62
7/25/2019 Datamining With Big Data_Siva
63/69
1, pp. 401-43I, Dec. 0/0.
VQ #. Birney, LThe &aking of #5"6D# ?essons for Big-Data 2ro:ects,M
5ature, vol. 3RP, pp. 3P-4/, 0/0.
VI >. Bollen, . &ao, and G. Xeng, LTwitter &ood 2redicts the %tock &arket,M
>. "omputational %cience, vol. 0, no. /, pp. /-R, 0//.
VR %. Borgatti, !. &ehra, D. Brass, and '. ?abianca, L5etwork !nalysis in the
%ocial %ciences,M %cience, vol. 101, pp. RP0-RP4, 0P.
VP >. Bughin, &. "hui, and >. &anyika, "louds, Big Data, and %mart !ssets
Ten Tech-#nabled Business Trends to Watch. &c=in%ey Juarterly, 0/.
V/ D. "entola, LThe %pread of Behavior in an 6nline %ocial 5etwork
#xperiment,M %cience, vol. 10P, pp. //P3-//PI, 0/.
V// #.E. "hang, . Bai, and =. Xhu, L2arallel !lgorithms for &ining ?arge-
%cale 7ich-&edia Data,M 2roc. /Ith !"& *nt9l "onf. &ultimedia, (&& 9P,)
pp. P/I-P/R, 0P.
V/0 7. "hen, =. %ivakumar, and . =argupta, L"ollective &ining of Bayesian5etworks from Distributed eterogeneous Data,M =nowledge and *nformation
%ystems, vol. Q, no. 0, pp. /Q3-/RI, 03.
V/1 E.-". "hen, W.-". 2eng, and %.-E. ?ee, L#fficient !lgorithms for *nfluence
&aximi$ation in %ocial 5etworks,M =nowledge and *nformation %ystems, vol.
11, no. 1, pp. 4II-Q/, Dec. 0/0.
[14] C.T. Chu, S.K. Kim, Y.A. Lin, Y. Yu, G.R. Bradski, A.Y. Ng, and K.
Olukotun, "Map-Reduce for Machine Learning on Multicore," Proc. 20th Ann.
Conf. Neural Information Processing Systems (NIPS '06), pp. 281-288, 2006.
[15] G. Cormode and D. Srivastava, "Anonymized Data: Generation, Models,
Usage," Proc. ACM SIGMOD Int'l Conf. Management Data, pp. 1015-1018,
2009.
[16] S. Das, Y. Sismanis, K.S. Beyer, R. Gemulla, P.J. Haas, and J. McPherson,
"Ricardo: Integrating R and Hadoop," Proc. ACM SIGMOD Int'l Conf.
Management Data (SIGMOD '10), pp. 987-998, 2010.
[17] P. Dewdney, P. Hall, R. Schilizzi, and J. Lazio, "The Square Kilometre
Array," Proc. IEEE, vol. 97, no. 8, pp. 1482-1496, Aug. 2009.
[18] P. Domingos and G. Hulten, "Mining High-Speed Data Streams," Proc.
Sixth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD
'00), pp. 71-80, 2000.
[19] G. Duncan, "Privacy by Design," Science, vol. 317, pp. 1178-1179, 2007.
[20] B. Efron, "Missing Data, Imputation, and the Bootstrap," J. Am. Statistical
Assoc., vol. 89, no. 426, pp. 463-475, 1994.
[21] A. Ghoting and E. Pednault, "Hadoop-ML: An Infrastructure for the Rapid
Implementation of Parallel Reusable Analytics," Proc. Large-Scale Machine
Learning: Parallelism and Massive Data Sets Workshop (NIPS '09), 2009.
[22] D. Gillick, A. Faria, and J. DeNero, MapReduce: Distributed Computing
for Machine Learning, Berkeley, Dec. 2006.
[23] M. Helft, "Google Uses Searches to Track Flu's Spread," The New York
Times, http://www.nytimes.com/2008/11/12/technology/internet/12flu.html,
2008.
[24] D. Howe et al., "Big Data: The Future of Biocuration," Nature, vol. 455,
pp. 47-50, Sept. 2008.
[25] B. Huberman, "Sociology of Science: Big Data Deserve a Bigger
Audience," Nature, vol. 482, p. 308, 2012.
[26] "IBM What Is Big Data: Bring Big Data to the Enterprise,"
http://www-01.ibm.com/software/data/bigdata/, IBM, 2012.
[27] A. Jacobs, "The Pathologies of Big Data," Comm. ACM, vol. 52, no. 8,
pp. 36-44, 2009.
[28] I. Kopanas, N. Avouris, and S. Daskalaki, "The Role of Domain
Knowledge in a Large Scale Data Mining Project," Proc. Second Hellenic Conf.
AI: Methods and Applications of Artificial Intelligence, I.P. Vlahavas, C.D.
Spyropoulos, eds., pp. 288-299, 2002.
[29] A. Labrinidis and H. Jagadish, "Challenges and Opportunities with Big
Data," Proc. VLDB Endowment, vol. 5, no. 12, pp. 2032-2033, 2012.
[30] Y. Lindell and B. Pinkas, "Privacy Preserving Data Mining," J. Cryptology,
vol. 15, no. 3, pp. 177-206, 2002.
[31] W. Liu and T. Wang, "Online Active Multi-Field Learning for Efficient
Email Spam Filtering," Knowledge and Information Systems, vol. 33, no. 1, pp.
117-136, Oct. 2012.
[32] J. Lorch, B. Parno, J. Mickens, M. Raykova, and J. Schiffman, "Shroud:
Ensuring Private Access to Large-Scale Data in the Data Center," Proc. 11th
USENIX Conf. File and Storage Technologies (FAST '13), 2013.
[33] D. Luo, C. Ding, and H. Huang, "Parallelization with Multiplicative
Algorithms for Big Data Mining," Proc. IEEE 12th Int'l Conf. Data Mining, pp.
489-498, 2012.
[34] J. Mervis, "U.S. Science Policy: Agencies Rally to Tackle Big Data,"
Science, vol. 336, no. 6077, p. 22, 2012.
[35] F. Michel, "How Many Photos Are Uploaded to Flickr Every Day and
Month?" http://www.flickr.com/photos/franckmichel/6855169886/, 2012.
[36] T. Mitchell, "Mining Our Reality," Science, vol. 326, pp. 1644-1645, 2009.
[37] Nature Editorial, "Community Cleverness Required," Nature, vol. 455, no.
7209, p. 1, Sept. 2008.
[38] S. Papadimitriou and J. Sun, "Disco: Distributed Co-Clustering with Map-
Reduce: A Case Study Towards Petabyte-Scale End-to-End Mining," Proc.
IEEE Eighth Int'l Conf. Data Mining (ICDM '08), pp. 512-521, 2008.
[39] C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis,
"Evaluating MapReduce for Multi-Core and Multiprocessor Systems," Proc.
IEEE 13th Int'l Symp. High Performance Computer Architecture (HPCA '07),
pp. 13-24, 2007.
[40] A. Rajaraman and J. Ullman, Mining of Massive Data Sets. Cambridge
Univ. Press, 2011.
[41] C. Reed, D. Thompson, W. Majid, and K. Wagstaff, "Real Time Machine
Learning to Find Fast Transient Radio Anomalies: A Semi-Supervised Approach
Combining Detection and RFI Excision," Proc. Int'l Astronomical Union Symp.
Time Domain Astronomy, Sept. 2011.
[42] E. Schadt, "The Changing Privacy Landscape in the Era of Big Data,"
Molecular Systems, vol. 8, article 612, 2012.
[43] J. Shafer, R. Agrawal, and M. Mehta, "SPRINT: A Scalable Parallel
Classifier for Data Mining," Proc. 22nd VLDB Conf., 1996.
[44] A. da Silva, R. Chiky, and G. Hébrail, "A Clustering Approach for
Sampling Data Streams in Sensor Networks," Knowledge and Information
Systems, vol. 32, no. 1, pp. 1-23, July 2012.
[45] K. Su, H. Huang, X. Wu, and S. Zhang, "A Logical Framework for
Identifying Quality Knowledge from Different Data Sources," Decision Support
Systems, vol. 42, no. 3, pp. 1673-1683, 2006.
[46] "Twitter Blog, Dispatch from the Denver Debate,"
http://blog.twitter.com/2012/10/dispatch-from-denver-debate.html, Oct. 2012.
[47] D. Wegener, M. Mock, D. Adranale, and S. Wrobel, "Toolkit-Based High-
Performance Data Mining of Large Data on MapReduce Clusters," Proc. Int'l
Conf. Data Mining Workshops (ICDMW '09), pp. 296-301, 2009.
[48] C. Wang, S.S.M. Chow, Q. Wang, K. Ren, and W. Lou, "Privacy-
Preserving Public Auditing for Secure Cloud Storage," IEEE Trans. Computers,
vol. 62, no. 2, pp. 362-375, Feb. 2013.
[49] X. Wu and X. Zhu, "Mining with Noise Knowledge: Error-Aware Data
Mining," IEEE Trans. Systems, Man and Cybernetics, Part A, vol. 38, no. 4, pp.
917-932, July 2008.
[50] X. Wu and S. Zhang, "Synthesizing High-Frequency Rules from Different
Data Sources," IEEE Trans. Knowledge and Data Eng., vol. 15, no. 2, pp. 353-367,
Mar./Apr. 2003.
[51] X. Wu, C. Zhang, and S. Zhang, "Database Classification for Multi-
Database Mining," Information Systems, vol. 30, no. 1, pp. 71-88, 2005.
[52] X. Wu, "Building Intelligent Learning Database Systems," AI Magazine,
vol. 21, no. 3, pp. 61-67, 2000.
[53] X. Wu, K. Yu, W. Ding, H. Wang, and X. Zhu, "Online Feature Selection
with Streaming Features," IEEE Trans. Pattern Analysis and Machine
Intelligence, vol. 35, no. 5, pp. 1178-1192, May 2013.
[54] A. Yao, "How to Generate and Exchange Secrets," Proc. 27th Ann. Symp.
Foundations Computer Science (FOCS) Conf., pp. 162-167, 1986.
[55] M. Ye, X. Wu, X. Hu, and D. Hu, "Anonymizing Classification Data Using
Rough Set Theory," Knowledge-Based Systems, vol. 43, pp. 82-94, 2013.
[56] J. Zhao, J. Wu, X. Feng, H. Xiong, and K. Xu, "Information Propagation in
Online Social Networks: A Tie-Strength Perspective," Knowledge and
Information Systems, vol. 32, no. 3, pp. 589-608, Sept. 2012.
[57] X. Zhu, P. Zhang, X. Lin, and Y. Shi, "Active Learning From Stream Data
Using Optimal Weight Classifier Ensemble," IEEE Trans. Systems, Man, and
Cybernetics, Part B, vol. 40, no. 6, pp. 1607-1621, Dec. 2010.