7/28/2019 Modeler Server Performance, Optimization, And Sizing
Technical report
PASW Modeler Server Performance, Optimization, and Sizing
SPSS is a registered trademark and the other SPSS Inc. products named are trademarks of SPSS Inc. All other names are trademarks of their respective owners. © 2009 SPSS Inc. All rights reserved. CSWP-0209
Table of contents
Introduction
High performance out-of-the-box
Scaling the data mining process with SPSS Predictive Enterprise Services
Performance optimization
Advanced performance optimization
Scoping and sizing PASW Modeler Server
Conclusion
About SPSS Inc.
Introduction
Data mining offers organizations many benefits, including a more detailed view of their customers, along with a clearer view
of current conditions and deeper insight into future events. By choosing a high-performance data mining tool, organizations
can mine their data more efficiently and gain a significant return on investment (ROI). PASW Modeler*, the leading data mining
workbench from SPSS Inc., enables organizations to easily and quickly mine many types of data, including large datasets.
The result: more business value than other solutions can offer.
PASW Modeler uses a scalable, three-tiered architecture to improve modeling productivity and deployment when working with
large datasets. The PASW Modeler Client tier passes data mining processes to the PASW Modeler Server. Then PASW Modeler
Server** analyzes these tasks to determine which ones should be executed within the database. After the database processes
those tasks, it passes only the relevant aggregate or summary data to PASW Modeler Server. Since data pre-processing
(typically 80-90 percent of the data mining effort) occurs in the database tier, users will accelerate modeling, maximize
resources, and minimize network traffic.
Data mining is an exploratory and interactive process requiring immediate feedback, so high-performance tools like PASW
Modeler Server are essential. PASW Modeler Server provides increased productivity and faster access to results. When
analytical results are deployed into operational systems, the impact of performance is even more significant because of high
data volumes and real-time constraints.
Data mining is a core process involved in predictive analytics, which combines advanced analytic techniques and decision
optimization to inform and direct decision making. The value of predictive analytics is that it gives your organization the ability
to act on the results, and PASW Modeler Server's high performance is crucial to timely action. This technical brief serves as
a guide for understanding and maximizing PASW Modeler Server's already high performance. It focuses on PASW Modeler
Server's out-of-the-box performance, scalability, and performance optimization, as well as its scoping and sizing requirements.
* PASW Modeler, formerly called Clementine, is part of SPSS Inc.'s Predictive Analytics Software portfolio.
** PASW Modeler Server, formerly called Clementine Server, is part of SPSS Inc.'s Predictive Analytics Software portfolio.
High performance out-of-the-box
PASW Modeler Server has been designed and developed to provide high performance and scalability for all data mining tasks.
SQL generation and parallel processing, for example, are performed automatically. As a result, PASW Modeler users don't need
to make any changes to the way they work to get consistently high performance.
In our benchmark tests of PASW Modeler Server performance¹, we measured the ability of PASW Modeler to carry out the
common tasks of model building, model scoring, and data preparation.
Figure 1: This stream was used in tests of model building performance.
Model building: 16 million records in under five minutes
PASW Modeler Server was able to build a logistic
regression model from approximately 16 million records²
in less than five minutes (see Figure 1).
This dataset is larger than those typically used for model
building. Against a more modest-sized dataset of 500,000
records, all of the model types were built in less than two
minutes (see Figure 2).
PASW Modeler Server transforms a time-consuming
process into an iterative one and vastly reduces the time
required to build models and to find the best model.
Figure 2: The elapsed time taken to build a model using different algorithms³.
1 Test environment: 2 x Intel Xeon 3.6GHz (hyperthreaded), 8GB RAM, 36GB RAID 1 System disk, 440GB RAID 0 Data disk, Microsoft Windows Server 2003 Enterprise x64 SP1, Microsoft SQL Server 2000 SP4, and Clementine 10.0.
2 21 fields used, mixture of data types.
3 Neural network build time is affected by randomization in the selection of records to prevent overtraining.
Figure 3: This stream was used in tests of model scoring performance.
Figure 4: The elapsed time taken to score a C&RT decision tree model.
Figure 5: This stream was used in tests of data preparation performance.
4 21 fields used, mixture of data types.
Model scoring: 32 million records in close to eight minutes
In a test scoring records against a classification model
(see Figures 3 and 4), PASW Modeler Server accessed
data from a table of 32 million records⁴, scored the data
against a decision tree model, and wrote the scores to a
new database table in less than eight minutes.
This scoring was achieved at a sustained rate of close
to 65,000 records per second, equivalent to 225 million
records per hour.
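As a quick check on how the per-second and per-hour figures relate, the following Python sketch converts between the two units. The function names are illustrative only and are not part of any PASW Modeler interface.

```python
SECONDS_PER_HOUR = 3600


def per_hour_to_per_second(records_per_hour: float) -> float:
    """Convert an hourly throughput into records per second."""
    return records_per_hour / SECONDS_PER_HOUR


def per_second_to_per_hour(records_per_second: float) -> float:
    """Convert a per-second throughput into records per hour."""
    return records_per_second * SECONDS_PER_HOUR


# 225 million records per hour, the hourly rate quoted in the text.
print(per_hour_to_per_second(225_000_000))  # 62500.0
print(per_second_to_per_hour(62_500))       # 225000000
```

The hourly figure therefore corresponds to 62,500 records per second, which is the "close to 65,000" rate described above.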
Data preparation: 16 million customer records processed against 42 million products in eight minutes
Data mining is about more than model
building and scoring. A large part of the data
mining process involves preparing the data. As
seen in Figure 5, our tests of data preparation
involved the performance of multiple, common
data preparation steps, including joining
customer data to a product dataset of nearly
three times its size.
PASW Modeler Server ran the stream against 16 million
customer records in approximately eight minutes for an
overall rate of over 33,000 customers per second (see
Figure 6).
Figure 6: The elapsed time taken to perform data preparation steps.
Scaling the data mining process with SPSS Predictive Enterprise Services
Raw data processing speed is not the only factor affecting
performance. Frequently, the volume of models, rather
than the volume of data, is the bottleneck hampering
data mining productivity. In many organizations, the
number of data miners, analysts, and others involved
in the process can also have a very significant impact
on performance.
Generating real performance from data mining activities often depends more on an organization's ability to manage its
analytical assets and complex, multi-part analytical processes than on raw data processing performance alone. For
example, powerful servers are often underutilized when organizations are unable to put the right models in the right place
and effectively schedule their execution.
However, with SPSS Predictive Enterprise Services,
organizations receive a complete, enterprise solution
to the problems of analytical asset and process
management. SPSS Predictive Enterprise Services uses
an advanced, service-oriented architecture to improve
the management of predictive models and related
analytical processes within your organization's business
operations. It extends PASW Modeler's rapid model
development and deployment capabilities to create
more manageable predictive analytics solutions.
By using PASW Modeler Server with SPSS Predictive
Enterprise Services, one financial services organization
optimized its operational analytics, reducing the time
taken to execute a key analytical process by a factor of
80. This resulted in major, quantifiable savings.
By providing an integrated way to centralize and organize predictive models (and also automate predictive analytics
processes), SPSS Predictive Enterprise Services helps organizations improve analytical asset and process management.
Analytical asset management
The resources that are involved in a predictive analytics process may include:
• PASW Modeler streams, models, and outputs
• Documentation
• External scripts for data preparation or report generation
• Resources from other predictive analytics tools, such as PASW Statistics syntax and outputs, and SAS code
Figure 7: Predictive Enterprise Manager allows users to create and schedule multi-part, multi-tool, analytical processes via a visual workflow interface.
These are analytical assets: the tangible results of the efforts of data mining teams. SPSS Predictive Enterprise Services
provides a centralized repository that offers:
• Security and access control
• Version control and labeling
• Audit and tracking capabilities
• Advanced data mining-aware organization and search facilities
• Direct integration with PASW Modeler and also with PASW Statistics tools
Managing analytical assets provides a foundation for data mining processes, enabling these processes to scale to the
enterprise level.
Analytical process management
Developing robust processes for data mining activities such as model building, scoring, and validation is integral to delivering
high performance on an enterprise scale. These processes often involve the combination of multiple tools and technologies.
SPSS Predictive Enterprise Services provides a visual workflow user interface, Predictive Enterprise Manager, which allows a
full, end-to-end process to be defined using assets stored in the repository and a mix of technologies (see Figure 7).
Analytical processes are fully integrated with the repository, automatically extracting the required objects and versions, and
storing the results. A scheduling service allows these processes to be executed at regular intervals, and a notification service
provides e-mail tracking.
Performance optimization
Most of PASW Modeler Server's high performance is achieved through performance optimizations that are switched on by
default. Many PASW Modeler operations can be further improved by fine-tuning performance parameters.
Maximize performance with in-database mining
One of the key benefits of PASW Modeler Server is that it allows organizations to fully utilize their investments in high-
performance database systems. Many organizations have invested heavily in a database infrastructure and business
intelligence systems, but these systems are often under-utilized by the analytical tools that use them.
PASW Modeler Server improves performance when mining large datasets by maximizing in-database mining. For example, you
can delegate as many operations as possible to your IBM DB2 Data Warehouse database or Oracle Database 10g, taking
advantage of database optimization and reducing data movement.
With PASW Modeler Server, processing is executed in
the database via SQL queries. Any operation that
cannot be represented using SQL queries is performed
by the server itself. Only relevant results are passed
back to the client; perhaps more importantly, data
transfer between the database and PASW Modeler
Server is minimized.
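The effect of pushing work into the database can be illustrated with a small Python sketch. It uses the standard-library sqlite3 module as a stand-in database; the table, column names, and data are invented for illustration, and this is not a description of how PASW Modeler Server is implemented. The first function transfers every row and aggregates in the application; the second sends an aggregate query so that only summary rows leave the database.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical transactions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (customer_id INTEGER, amount REAL)")
    rows = [(i % 100, float(i % 7)) for i in range(10_000)]
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)
    conn.commit()
    return conn


def totals_in_application(conn):
    """Transfer every row, then aggregate outside the database."""
    rows = conn.execute("SELECT customer_id, amount FROM transactions").fetchall()
    totals = {}
    for customer_id, amount in rows:
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return totals, len(rows)


def totals_in_database(conn):
    """Let the database aggregate; only one row per customer is transferred."""
    rows = conn.execute(
        "SELECT customer_id, SUM(amount) FROM transactions GROUP BY customer_id"
    ).fetchall()
    return dict(rows), len(rows)


conn = build_demo_db()
app_totals, app_rows = totals_in_application(conn)
db_totals, db_rows = totals_in_database(conn)
print(app_rows, db_rows)  # 10000 rows transferred versus 100
```

Both approaches produce the same totals; the difference is in how many rows cross the boundary between the database and the application.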
Another advantage of PASW Modeler Server's in-database mining is that it minimizes, and can even eliminate, data transfer
costs. In a test measuring the impact of in-database mining (see Figure 8), the same PASW Modeler stream was executed
with full SQL generation, no SQL generation, and a scoring-only SQL generation (which executed the scoring in-database but
performed transfer of data to and from the database).
While SQL generation of the scoring was approximately
10 percent quicker than scoring in the application,
the biggest factor in performance is data transfer, which
accounts for more than 85 percent of the elapsed time
for scoring.
The only way to manage the data transfer bottleneck
is to ensure that less data is transferred. PASW Modeler
Server's SQL generation reduces data transfer to a
minimum and leverages your investment in
high-performance databases.
Figure 8: Scoring stream executed with full SQL generation, SQL generation of scoring only, and no SQL generation
Data transfer costs are the most significant factor affecting
performance. For example, over 85 percent of the time
allotted to score a model can be attributed to data transfer
between the database and the scoring application.
SQL feedback, previewing, and viewing
There will be times when analysts will want more control over the optimization of PASW Modeler streams. PASW Modeler
Server supports this by providing immediate feedback: upon execution, every PASW Modeler node that can be fully translated
to SQL is highlighted (see Figure 9).
Figure 9: SQL generation and highlighting in a PASW Modeler stream
In Figure 9, the PASW Modeler stream is executed using SQL generation. Many nodes are purple, rather than the usual
white, during execution. Purple nodes mean that the operations represented by those nodes have been translated into SQL
and executed in-database. This feedback helps an analyst ensure that as much of the stream as possible is executed in the
database. Additional options allow the user to examine the SQL that is generated.
Stream optimization relies on intelligent SQL generation and stream execution
SQL generation is a powerful capability, but it depends on analysts understanding how PASW Modeler operations can be
executed on a database. Analysts, however, are focused on solving business problems rather than on optimizing their PASW
Modeler streams for performance.
For this reason, PASW Modeler Server features advanced optimization that intelligently re-orders operations in the PASW
Modeler stream to maximize performance without altering results. Data miners can organize streams in a way that makes
sense to them, and PASW Modeler Server will reorganize those same operations in a way that makes sense to the database.
Figure 10: Stream optimization
Figure 11: Setting a cache on a node that is likely to be re-executed will store the data in a temporary table on the database, when possible. Executing streams from that cached node will allow further in-database operations.
In Figure 10, the derive node contains an operation that
cannot be carried out in the database. PASW Modeler
optimizes the process so that the select operation is
performed before the derive operation, thereby reducing
data transfer and improving performance.
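The reordering described for Figure 10 can be sketched in Python, again with sqlite3 as a stand-in database. The derive function, table, and column names are hypothetical, and the sketch assumes the select condition does not depend on the derived field (otherwise the two steps could not be swapped).

```python
import sqlite3


def derive_segment(age: int) -> str:
    """Hypothetical derive step, assumed not to be expressible in SQL."""
    return "senior" if age >= 65 else "standard"


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, region TEXT, age INTEGER)")
    rows = [(i, "north" if i % 4 == 0 else "south", 20 + i % 60) for i in range(1000)]
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    return conn


def derive_then_select(conn):
    """Stream as drawn: derive on every row, then select region = 'north'."""
    rows = conn.execute("SELECT id, region, age FROM customers ORDER BY id").fetchall()
    derived = [(rid, region, derive_segment(age)) for rid, region, age in rows]
    result = [(rid, seg) for rid, region, seg in derived if region == "north"]
    return result, len(rows)


def select_then_derive(conn):
    """Reordered stream: the select runs in SQL, the derive on what remains."""
    rows = conn.execute(
        "SELECT id, age FROM customers WHERE region = 'north' ORDER BY id"
    ).fetchall()
    result = [(rid, derive_segment(age)) for rid, age in rows]
    return result, len(rows)


conn = make_db()
naive_result, naive_rows = derive_then_select(conn)
fast_result, fast_rows = select_then_derive(conn)
print(naive_result == fast_result, naive_rows, fast_rows)  # True 1000 250
```

The results are identical, but the reordered version transfers a quarter of the rows in this invented dataset.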
In-database caching
One common user optimization is to set up a cache on
a node. The next time data is passed through that node,
the cache is filled with that data. From then on, the data
is read from the cache rather than from the data source.
This can be a useful way to ensure that expensive data
processing is only executed once.
Normally, the cache is stored as a temporary file on the
file system, but PASW Modeler Server also supports
the caching of this data into a temporary table in the
database. When combined with SQL optimization,
this may result in significant gains in performance.
As illustrated in Figure 11, the output from a stream
that merges multiple tables to create a data mining
view may be cached and reused as needed.
Plus, by automatically generating SQL for all downstream nodes, performance can be improved further. In Figure 11,
the select operation is highlighted, indicating that the operation is being executed in the database from the filled
database cache.
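A minimal sketch of the idea, assuming sqlite3 as the database and invented table names: the expensive merge is materialized once as a temporary table, and a downstream select then runs in SQL against that cached table. It illustrates the general technique rather than PASW Modeler Server's own caching mechanism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'north'), (2, 'south'), (3, 'north');
    INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 7.5), (3, 2.5);
    """
)

# Fill the cache once: the merged data mining view is stored as a temporary
# table inside the database rather than as a file on the application side.
conn.execute(
    """
    CREATE TEMP TABLE mining_view_cache AS
    SELECT c.id, c.region, SUM(o.amount) AS total_spend
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id, c.region
    """
)


def select_from_cache(region: str):
    """Downstream select that reads the cached table and still runs in SQL."""
    return conn.execute(
        "SELECT id, total_spend FROM mining_view_cache WHERE region = ? ORDER BY id",
        (region,),
    ).fetchall()


print(select_from_cache("north"))  # [(1, 15.0), (3, 2.5)]
```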
In-database model building
PASW Modeler Server supports integration with data mining algorithms that are available from other database vendors.
Organizations can use PASW Modeler to manage the entire data mining process while modeling with the database-native
algorithms provided by these vendors. Using in-database modeling ensures that data transfer is minimized, even during
the model building phase. It also helps organizations leverage their existing investments in IBM DB2 Intelligent Miner,
Microsoft SQL Server 2005, and Oracle Data Mining.
Advanced performance optimization
In addition to in-database mining, PASW Modeler Server provides a number of capabilities that allow users to optimize the
performance of their streams.
Database bulk-loading
Data movement is often a bottleneck in performance, especially when writing data to a database. PASW Modeler Server
provides a number of features to optimize this process for large data volumes.
Figure 12: Database export advanced options allow bulk loading to database via ODBC or through an external loader.
Figure 13: Create indexes on database tables to improve database performance.
By default, writing data to a database is performed on
a row-by-row basis. While this prevents errors and
provides data security, it slows performance. Allowing
the PASW Modeler Server to commit multiple rows at
a time is a good way to ensure more reasonable
performance, and this option is available by default.
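The difference between row-by-row and batched commits can be sketched as follows. The example uses sqlite3 and a hypothetical scores table; the batch size of 1,000 is an arbitrary illustrative value, not a PASW Modeler Server default.

```python
import sqlite3


def write_row_by_row(conn, rows):
    """Commit after every row: one transaction per record."""
    commits = 0
    for row in rows:
        conn.execute("INSERT INTO scores VALUES (?, ?)", row)
        conn.commit()
        commits += 1
    return commits


def write_in_batches(conn, rows, batch_size=1000):
    """Commit once per batch of rows, reducing transaction overhead."""
    commits = 0
    for start in range(0, len(rows), batch_size):
        conn.executemany("INSERT INTO scores VALUES (?, ?)", rows[start:start + batch_size])
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (customer_id INTEGER, score REAL)")
rows = [(i, i / 5000) for i in range(5000)]
row_commits = write_row_by_row(conn, rows)
batch_commits = write_in_batches(conn, rows)
print(row_commits, batch_commits)  # 5000 5
```

The trade-off is the one the text describes: a failure in the batched version can lose up to a whole batch of uncommitted rows, in exchange for far fewer transactions.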
In addition to the batch committal of records, PASW
Modeler Server supports two types of bulk loading,
as shown in Figure 12.
The first is provided through ODBC bulk loading facilities.
The second type uses an external bulk loading tool to
allow a database-native solution. External bulk loading
scripts are provided for Microsoft SQL Server, Oracle Data
Mining, IBM DB2 Intelligent Miner, Netezza Performance
Server, Teradata Warehouse, and IBM Redbrick
Warehouse databases. These scripts can be customized,
and custom scripts may be written for other databases.
Database indexing
Indexing database tables maintains the performance of
in-database options. Correct indexing significantly impacts
many subsequent database operations.
As shown in Figure 13, PASW Modeler Server enables
users to create indexes on tables exported from PASW
Modeler. Simple indexes can be created easily, and PASW
Modeler also allows you to customize the SQL statement
used to create the index (for instance, to create a BITMAP,
UNIQUE, or FILLFACTOR index).
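As an illustration of a simple index versus a customized index statement, the sketch below creates both on a hypothetical exported table using sqlite3. Only the UNIQUE variant is shown, because BITMAP and FILLFACTOR are vendor-specific options that SQLite does not support; the table and index names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exported_scores (customer_id INTEGER, score REAL)")
conn.executemany(
    "INSERT INTO exported_scores VALUES (?, ?)",
    [(i, i / 1000) for i in range(1000)],
)

# A simple index, and a customized statement that adds the UNIQUE keyword.
conn.execute("CREATE INDEX idx_score ON exported_scores (score)")
conn.execute("CREATE UNIQUE INDEX idx_customer ON exported_scores (customer_id)")

# List the indexes that now exist on the table (column 1 is the index name).
index_names = sorted(
    row[1] for row in conn.execute("PRAGMA index_list('exported_scores')")
)
print(index_names)  # ['idx_customer', 'idx_score']

# Ask the database how it would run a later lookup on the indexed column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT score FROM exported_scores WHERE customer_id = 42"
).fetchall()
print(plan)
```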
Optimized joins and sorts
By default, PASW Modeler has to make assumptions
about the state of data in the system. For example,
PASW Modeler cannot assume that any data has already
been sorted, so many operations ensure that a sort
is performed when required, even if such a sort is
redundant. PASW Modeler allows the user to optimize
a sort or join operation by specifying any existing sorts
on the data. This eliminates redundancy and improves
performance, as shown in Figure 14.
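Why declaring an existing sort order helps can be seen in a short Python sketch. When nothing is known about the inputs, combining them requires a full sort; when both are known to be sorted, one linear merge pass is enough. The function names are illustrative, and the sketch shows the general principle rather than PASW Modeler's internal algorithm.

```python
import heapq


def combine_unsorted(a, b):
    """No assumption about order: sort everything, O(n log n)."""
    return sorted(a + b)


def combine_presorted(a, b):
    """Inputs declared as already sorted: a single linear merge pass suffices."""
    return list(heapq.merge(a, b))


left = [1, 4, 9, 12]
right = [2, 3, 10, 15]
print(combine_presorted(left, right) == combine_unsorted(left, right))  # True
```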
Users can also optimize the performance of PASW
Modeler Server through special case algorithms for joins.
PASW Modeler's default join algorithm is designed to
perform optimally when joining datasets of similar size.
In some very common operations, such as when using a
join to connect an ID in one table to a label or description
from another (e.g., joining a product code in a table of
transactions to a product name in a look-up table), the
default join is inefficient.
PASW Modeler offers an alternate join algorithm for these
situations that significantly boosts performance,
as can be seen in Figure 15.
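The report does not describe how the alternate join works. One common technique for joining a large table to a small look-up table is a hash join, sketched below in Python as an assumption about the general approach rather than as PASW Modeler's actual algorithm. The small table is held in memory as a dictionary, and each record of the large table needs only a single look-up.

```python
def lookup_join(transactions, product_names):
    """Join a large table to a small look-up table via an in-memory dict.

    `transactions` is an iterable of (transaction_id, product_code) pairs and
    `product_names` a small iterable of (product_code, name) pairs.
    """
    names = dict(product_names)  # small table held entirely in memory
    for transaction_id, product_code in transactions:
        # One constant-time probe per record; the large table is never sorted.
        if product_code in names:
            yield transaction_id, product_code, names[product_code]


products = [("P1", "Kettle"), ("P2", "Toaster")]
sales = [(1, "P2"), (2, "P1"), (3, "P2"), (4, "P9")]
print(list(lookup_join(sales, products)))
# [(1, 'P2', 'Toaster'), (2, 'P1', 'Kettle'), (3, 'P2', 'Toaster')]
```

Records with no matching product code (transaction 4 here) are dropped, which corresponds to an inner join.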
Figure 14: Impact of pre-sorting optimization on sort performance
Figure 15: Impact of specialized join when joining a large table to a small table (250,000 records)
High performance through parallel data processing
Multithreading is a method by which an application's
process can perform more than one task at the same
time. Threads share the same memory space, and
must synchronize at certain points within their execution to access shared resources safely. Operating systems provide
low-level mechanisms to support this synchronization. If an application uses more than one thread to execute, it is said
to be multithreaded.
Symmetric multiprocessing (SMP) machines are widely used and available for all platforms supported by PASW Modeler
Server. They comprise multiple CPUs sharing access to the same memory, disk, network, and other I/O resources. When a
multithreaded application runs on an SMP box, threads may be distributed across the CPUs and execute truly in parallel.
Application processes and individual threads can usually migrate dynamically between CPUs to balance processor load.
This is generally handled transparently by the operating system.
PASW Modeler Server employs parallel processing to improve performance in both data processing and modeling operations.
Parallel data processing
PASW Modeler Server uses a parallel data-sorting algorithm to improve the performance of a number of data processing
operations. Sorting is used by many PASW Modeler operations, including binning, model evaluation, merge and, of course,
the sort operation itself. All of these operations benefit from the parallelization of the sort operation.
The parallelized sort algorithm uses a technique called
record parallelism. This technique distributes records
across a number of separate sorting processes. Each process
sorts its own subset of records and then the results are joined.
Figure 16 shows the effect of running a parallelized sort on
multiprocessor hardware. At high data volumes, sort times
can be reduced by more than 30 percent.
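Record parallelism as described above can be sketched in Python: the records are split into chunks, each chunk is sorted by a separate worker, and the sorted chunks are merged. This is a simplified illustration under stated assumptions. It uses a thread pool for portability, so in standard CPython it shows the structure of the technique rather than a real speed-up; a process pool would be closer to the separate sorting processes the text mentions. The worker count of 4 is arbitrary.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def parallel_sort(records, workers=4):
    """Split records into chunks, sort each chunk concurrently, then merge."""
    if not records:
        return []
    chunk_size = -(-len(records) // workers)  # ceiling division
    chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # Join the results: a k-way merge of the already sorted chunks.
    return list(heapq.merge(*sorted_chunks))


data = [(i * 7919) % 1000 for i in range(1000)]
print(parallel_sort(data) == sorted(data))  # True
```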
Figure 16: Impact of multiple CPUs on data sorting performance
Parallel predictive model building
Parallel processing techniques are also used by PASW
Modeler's C5.0 decision tree algorithm and can improve
performance in building decision trees and rule sets. The
benefits depend largely on dataset size (both the number
of records and the number of fields), but they can provide
a useful boost to what can be a time-consuming process.
Scoping and sizing PASW Modeler Server
Many factors must be considered when scoping hardware requirements for a PASW Modeler Server installation. The breadth
of PASW Modeler operations and differences in data volumes make it difficult to estimate performance for any specific
hardware configuration.
Impact of CPUs on performance
Obviously, the core speed of any individual CPU will impact data mining performance. Almost all data mining operations,
especially modeling, are heavily processor dependent, so an increase in CPU speed will produce a proportional increase
in performance for many PASW Modeler processes.
The main benefits of multiple CPUs (or multicore CPUs) occur when running multiple streams. This means that the number of
users will often be the deciding factor in determining the optimum number of CPUs. Multiple CPUs will also benefit parallelized
operations, but the main benefits will be from supporting multiple users.
Table 1: Recommended number of CPUs per number of users
For a production server running scheduled data mining via SPSS Predictive Enterprise Services, the number of CPUs
should be determined by the number of separate processes to be performed simultaneously. Maximum performance
can be achieved, for instance, by splitting a model scoring process across multiple CPUs or building multiple
models simultaneously.
Impact of physical memory on performance
Most PASW Modeler operations can be performed on large volumes of data with minimal memory usage. Only certain
operations, such as sorting, joining, and modeling, require data to be temporarily stored in memory. If not enough memory is
available, these operations will store part of the data as virtual memory on disk. This can affect performance, since disk access
is significantly slower than memory access.
As with CPU usage, the number of users impacts the required memory for normal operation. Memory requirements depend on
data volume. Typical minimum requirements can be found in Table 2.
Table 2: Minimum RAM for number of users in normal use
Large volume model building
Model building is one of the more memory-intensive operations in the data mining process. This is because the model-
building algorithms require access to the entire modeling dataset, often making multiple passes at the data.
For this reason, model building is usually performed on subsets or samples of data. It is normally more productive to build
different models on a small subset of the data and then choose the best model, rather than to build a single model on a larger
dataset. This type of model building can usually be performed within minimal memory requirements.
Table 2 data:
Number of users    Minimum RAM
1-2                1GB
3-4                2GB
5-10               4GB
11-20              8GB
21+                16GB
Table 1 data:
Number of users    Number of CPUs
1-2                1
3-4                2
5-10               4
11-20              8
21+                16
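The two sizing tables can be combined into a small look-up function. The Python sketch below encodes the values from Tables 1 and 2 of this report; the function and constant names are illustrative.

```python
# (upper bound on users, CPUs from Table 1, minimum RAM in GB from Table 2)
SIZING_TIERS = [
    (2, 1, 1),
    (4, 2, 2),
    (10, 4, 4),
    (20, 8, 8),
]
LARGEST_TIER = (16, 16)  # 21 or more users


def recommend_hardware(users: int):
    """Return (cpus, min_ram_gb) for a given number of users."""
    if users < 1:
        raise ValueError("users must be at least 1")
    for max_users, cpus, ram_gb in SIZING_TIERS:
        if users <= max_users:
            return cpus, ram_gb
    return LARGEST_TIER


print(recommend_hardware(7))   # (4, 4)
print(recommend_hardware(25))  # (16, 16)
```

These are the report's minimum figures for normal use; data volume and model building on large datasets can raise the memory requirement.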
Using more data rarely improves the predictive accuracy of a model. However, if model building on larger volumes is required,
additional memory can help performance.
Table 3: Estimated RAM required (GB) to avoid disk-caching during model building⁵
5 Estimates based on neural network, Kohonen, and K-means algorithm memory requirements. Maximum physical memory may also be limited by the operating system.
Table 3 provides guidance on the memory required to avoid disk-caching on model building operations, based on the memory
usage of the neural network, K-means, and Kohonen modeling algorithms.
Memory configuration
PASW Modeler Server will, by default, limit the amount of physical memory used by any single process to ensure that other
simultaneous processes aren't affected. A maximum of 25 percent of available memory will be allocated for model building,
and approximately 10 percent will be available for sorting operations. The sorting figure is lower because a single stream
may contain multiple sorts. The PASW Modeler Server administrator can modify these settings.
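To make the default percentages concrete, the following Python sketch computes the per-process limits for a given amount of physical memory. The function name and dictionary keys are illustrative and do not correspond to actual PASW Modeler Server configuration settings.

```python
def default_memory_limits(physical_memory_gb: float):
    """Per-process limits implied by the default percentages quoted above."""
    return {
        "model_building_gb": physical_memory_gb * 0.25,  # at most 25 percent
        "sorting_gb": physical_memory_gb * 0.10,         # approximately 10 percent
    }


limits = default_memory_limits(8)
print(limits)
```

On a server with 8GB of memory, for example, this gives 2GB for a model building operation and roughly 0.8GB for each sort.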
Impact of disk space on performance
Before addressing disk space requirements, it is important to understand the volume of data that is likely to be used for
the actual data mining. Most organizations store many terabytes of data, especially transactional data, but this amount
will rarely be used. Normally the data is aggregated, selected, or sampled before it is used for analysis. While large data
volumes are typically used in model scoring, the model scoring processes usually rely on operations that don't use a lot
of system resources.
When trying to maximize performance, disk usage for data processing steps can be relatively high. The user often caches data
to minimize execution times, and some operations will spill to disk when physical memory is unavailable. In addition, some
operations may produce a dataset larger than the raw input data, further increasing disk requirements.
Table 3 data (estimated RAM in GB):
                   Columns
Rows (millions)    10     20     50     100    500    1000
0.1                0.5    0.5    0.5    0.5    2      4
0.5                0.5    0.5    0.5    1      4      8
1                  0.5    0.5    1      2      8      16
2                  0.5    0.5    2      4      16     32
4                  0.5    1      4      8      32     -
8                  1      2      8      16     -      -
16                 2      4      16     32     -      -
32                 4      8      32     -      -      -
64                 8      16     -      -      -      -
To understand disk usage, a series of tests was performed based upon the PASW Modeler Application Template for customer
relationship management (CRM). This template consists of streams that demonstrate data mining techniques used for CRM.
The source dataset was 72MB in size, representing a sample of 140,000 customers and 360,000 transactions, plus other
associated data.
6 SQL generation typically reduces the disk space requirements for PASW Modeler Server since many of the data preparation steps can be carried out on the database.
7 Estimates based on 1 million rows/10 columns requiring 100MB disk (high estimate) and a working multiplier of 5 times (high estimate for single user).
Figure 17: Percentage of original disk space required for data mining stream operations.
The data was stored in text files and all operations
were carried out by PASW Modeler Server; no SQL
generation was required⁶.
As shown in Figure 17, the tests measured the maximum
amount of disk space needed to execute over 100
separate execution streams. The vast majority of streams
required little disk usage, but others used over four times
the disk space of the source data.
Given that these data preparation steps are typically
executed infrequently (it's a best practice to store the
results of such processing as intermediate files or tables),
a conservative rule of thumb is to reserve
three to five times the disk space required to store the
original data.
Table 4: Estimated disk space required (GB) for data mining (1-5 users)⁷
                   Columns
Rows (millions)    10     20     50     100    500    1000
1                  0.5    1      2.5    5      25     50
2                  1      2      5      10     50     100
4                  2      4      10     20     100    200
8                  4      8      20     40     200    400
16                 8      16     40     80     400    800
32                 16     32     80     160    800    1600
64                 32     64     160    320    1600   3200
This rule holds for small numbers of users because users will rarely perform high disk-usage operations simultaneously. In
addition, organizations can minimize overall disk usage by scheduling expensive data preparation steps during times of low
system usage.
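The values in Table 4 follow from the assumptions stated in footnote 7, and the calculation can be written as a short Python function. The constant and function names are illustrative; the use of 1,000MB per GB is an assumption inferred from the table values.

```python
MB_PER_MILLION_ROWS_PER_COLUMN = 10  # footnote 7: 1 million rows x 10 columns = 100MB
WORKING_MULTIPLIER = 5               # footnote 7: high estimate for a single user
MB_PER_GB = 1000                     # assumption: the table appears to use decimal GB


def estimated_disk_gb(rows_millions: float, columns: int) -> float:
    """Estimated working disk space in GB for a dataset of the given shape."""
    raw_mb = rows_millions * columns * MB_PER_MILLION_ROWS_PER_COLUMN
    return raw_mb * WORKING_MULTIPLIER / MB_PER_GB


print(estimated_disk_gb(1, 10))     # 0.5
print(estimated_disk_gb(16, 100))   # 80.0
print(estimated_disk_gb(64, 1000))  # 3200.0
```

Substituting a multiplier of 3 gives the lower end of the three-to-five-times rule of thumb.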
Conclusion
The ever-growing amount of data created by organizations presents opportunities and challenges for data mining.
The PASW Modeler data mining solution makes it easy to use business knowledge to quickly develop, update, and deploy
predictive models.
Furthermore, PASW Modeler Server's combination of high performance, scalability, performance optimization options, and
flexible hardware requirements enables it to handle large and complex data mining projects. With PASW Modeler Server,
your organization can:
• Utilize your investment in high-performance databases for all data mining tasks, ensuring high performance and
minimizing data transfer costs
• Maximize your use of multiple CPUs (or multicore CPUs) in your operating environment by using parallel processing
during a number of data preparation and model-building operations
• Use in-database caching, database write-back with indexing, and optimized merging to join tables outside of the database
Scaling the entire data mining process with PASW Modeler Server makes it possible for your organization to analyze large
volumes of data efficiently, shortening the time needed to turn data into better business decisions that boost your ROI.
About SPSS Inc.
SPSS Inc. (NASDAQ: SPSS) is a leading global provider of predictive analytics software and solutions. The company's
predictive analytics technology improves business processes by giving organizations consistent control over decisions made
every day. By incorporating predictive analytics into their daily operations, organizations become Predictive Enterprises, able
to direct and automate decisions to meet business goals and achieve measurable competitive advantage.
More than 250,000 public sector, academic, and commercial customers rely on SPSS Inc. technology to help increase
revenue, reduce costs, and detect and prevent fraud. Founded in 1968, SPSS Inc. is headquartered in Chicago, Illinois. For
additional information, please visit www.spss.com.
To learn more, please visit www.spss.com. For SPSS Inc. office locations and telephone numbers, go to www.spss.com/worldwide.