L. Briand and C. Williams (Eds.): MoDELS 2005, LNCS 3713, pp. 355-366, 2005. Springer-Verlag Berlin Heidelberg 2005

Model-Based Scalability Estimation in Inception-Phase Software Architecture

Steve Masticola1, Andre Bondi1, and Mark Hettish2

1Siemens Corporate Research, Inc. 755 College Road East

Princeton, NJ 08520 [email protected]

[email protected]

2Siemens Communications, Inc. 1700 Technology Drive

San Jose, CA 95110 [email protected]

Abstract. Scalability is one of the crucial nonfunctional requirements that must be evaluated in the Inception Phase of the Rational Unified Process [9]. This is the phase in which the least information is generally available to form a principled evaluation. We demonstrate how an estimate of user scalability can be formed using sequence diagrams of the common user scenarios, together with experimentation (ranging from simple timing measurements to more complex architectural prototypes), published study data, and performance data from baseline systems. Despite being quite inexpensive, the techniques used by our team enabled us to identify and guide corrective actions for major bottlenecks before they became serious design flaws in the Elaboration and Construction phases of the Unified Process. The same techniques also allowed us to quickly evaluate the effects of high-level architecture and technology alternatives on user scalability and response time.

1. Problem Statement

This study concerns a large-scale commercial server-based software product (denoted by LSCSP1) based on the Microsoft C# platform [6]. The system is partitioned into concurrent processes that communicate via socket-based transmission of SOAP messages.

An effort is underway to develop version N+1 of LSCSP based on Java technology, including elements of J2EE. The migration to Java was undertaken for two reasons.

1 The name of the product, the scenarios it supports, and all other identifying information, have been changed to protect Siemens intellectual property.

First, the LSCSP marketers wanted to reach customers who find the C# platform unacceptable. Second, it was hoped that the migration could increase the number of users supported by each server. The LSCSP architects thought that the SOAP messaging scheme was inefficient and that a more tightly-coupled messaging mechanism could yield significant gains in scalability with respect to the number of users ("user scalability"). In this context, user scalability has two components: load scalability, which concerns scalability with respect to the use of active resources such as processors, bandwidth, and I/O devices, and space scalability, which concerns scalability with respect to passive resources such as memory [8]. The LSCSP architects were primarily concerned with load scalability.

The architecture team realized early that, to achieve these gains, LSCSP's modules would have to be mapped onto Java containers of various types (servlet/JSP, EJB, etc.) differently from the way they were mapped onto .NET processes and threads. The architecture team identified at least ten reasonable ways to do this. They needed to evaluate the effect that each mapping would have on user scalability. Additionally, within each possible mapping, several technology options were also possible for communications between different pairs of modules. These, too, could potentially affect the scalability of the system.

The two problems we faced, then, were:

• to estimate user scalability for each of the reasonable process-to-container mappings and for each of the possible communication technology alternatives,

• to estimate the increase in user scalability that could be expected from the migration to the new architecture and the Java platform.

This paper presents the lessons learned in creating these estimates. Our purpose is to show a set of useful estimation techniques, rather than to present normative performance data or experimental studies. We therefore omit all detailed description of the experimental procedure, which was intended purely to provide "first-look" data for our own use. Any data here is shown only for descriptive purposes, and should not be taken as normative.

2. Nonfunctional Requirements in the Forthcoming System

The forthcoming version of LSCSP had several ambitious performance goals as nonfunctional requirements. Determining whether these goals could be met was the major motivation for doing performance analysis.

First, LSCSP N+1 had a goal of increasing the number of users supported on a “standard server” by an order of magnitude. An example of a “standard server” is a dual Pentium PC with a substantial amount of RAM and hard disk.

In addition, enterprise-level scalability was desired. The intent was to scale to systems of collaborating servers to support increasing numbers of users. Another goal was to support failover between servers with minimal interruption in service.

Achieving both of these goals will require system resources and thus affect user scalability.

3. Anticipated Performance Impacts of Some Implementation Choices

LSCSP relies on "server push" technology2 to periodically update one particular frame. Server push requires the client to poll the server periodically for updates, which adds to the server workload. It was thought that server push would cause serious scaling problems. We therefore wanted to investigate updating technologies other than server push, and their effect on user scalability.

A certain amount of off-server traffic was expected during normal operation, both for cross-server request handling and for the data replication needed to support failover. We wished to estimate how much extra load server-to-server communication would cause in a large-scale scenario. We acknowledge that the resource utilization of these scenarios should be modeled, but we have not yet examined this in detail.

3.1. The Need for Model Parameterization

Early in the scalability estimation effort, we decided to develop a spreadsheet-based model of user scalability. This would allow us to decide at the Inception Phase of the Rational Unified Process [9] (or at a similarly early stage in other processes) whether the nonfunctional requirements could likely be simultaneously met, or whether the architecture and/or choice of technologies needed to be revised.

Additionally, the team recognized that there were other factors that could not easily be predicted or determined through experimentation or existing data. These included, but were not limited to, per-user resource demands for various usage scenarios, the performance gain in the LSCSP business logic from parallel processing, and the language-dependent performance of Java versus C#. We parameterized the spreadsheet model to allow architects to see the performance impacts of different choices of technologies under different sets of assumptions about their associated processing costs. In the end, our model had twenty-three different parameters.
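In outline, such a parameterized model reduces to summing per-user resource demands over the scenario mix. The sketch below is illustrative only: the scenario rates, per-message costs, and the `language_factor` parameter are hypothetical stand-ins for the twenty-three parameters of the actual spreadsheet, not the team's real model.

```python
# Illustrative sketch of a parameterized user-scalability model.
# All parameter names and values are hypothetical.

def server_utilization(users, scenarios, params):
    """CPU utilization of one server for a given user population.

    scenarios: per-scenario rate (executions per user per second),
    business-logic demand (seconds), and message counts by technology.
    params: engineering estimates that architects can vary.
    """
    demand = 0.0
    for s in scenarios:
        msg_cost = sum(count * params["msg_cost"][tech]
                       for tech, count in s["messages"].items())
        logic_cost = s["logic_seconds"] * params["language_factor"]
        demand += s["rate_per_user"] * (logic_cost + msg_cost)
    return users * demand  # utilization; must stay below ~1.0

def max_users(scenarios, params, target_utilization=0.7):
    """Largest user population keeping utilization under the target."""
    per_user = server_utilization(1, scenarios, params)
    return int(target_utilization / per_user)

scenarios = [
    {"rate_per_user": 0.01, "logic_seconds": 0.004,
     "messages": {"lws": 6, "soap_tcp": 2}},
    {"rate_per_user": 0.002, "logic_seconds": 0.020,
     "messages": {"http": 1, "lws": 10}},
]
params = {"language_factor": 1.0,
          "msg_cost": {"lws": 0.00019, "soap_tcp": 0.00204, "http": 0.0086}}

print(max_users(scenarios, params))
```

Varying `language_factor` or the per-technology message costs shows immediately how sensitive the user-scalability estimate is to each engineering assumption, which is the point of parameterizing the model.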

4. Procedure

Our procedure to conduct the analysis was to specify each architectural alternative under consideration in sufficient detail that UML sequence diagrams of the most common scenarios, and experimental data from architectural prototyping, could be used to derive expected consumptions of server resources. Our methodology was very similar to that of Smith and Williams [7]. We adapted their techniques in two key ways: we used no special-purpose performance analysis extensions to the UML, and we employed only commercial UML modeling tools (mainly the UML features of Microsoft Visio), rather than special-purpose tools for performance analysis such as Smith and Williams' SPE*ED.

2 This is the conventional terminology, though in fact the client pulls the content.

[Figure 1: flowchart of the estimation process. Activities: Specify Platform Alternatives; Specify Architectural Alternatives; Specify Messaging Technology Alternatives; Model the Scenarios of the Important Transactions; Obtain Experimental Data; Obtain Business Logic Data; Combine; Combine and Normalize. Artifacts: Candidate Message Technologies; Intercomponent Messaging Architecture; Message Sequence Diagrams for Each Scenario; Scenario Rate per User per Second; Resource Use Data for Each Message Technology; Business Logic Resource Use for Each Scenario; Annotated MSDs; Scenario Latency Estimates; User Scalability Estimate.]

Figure 1: The model-based scalability estimation process used by the team.

Figure 1 shows an abstract overview of the process that was used to estimate user scalability. The team modeled each of the important transaction use cases of LSCSP, producing a message sequence diagram (MSD) for each scenario and an estimated scenario repetition rate (transaction rate) per user per second. Once the scenarios were modeled, resource use data was obtained from the LSCSP performance analysts for most of the business logic used in the scenarios.

Simultaneously, the team specified the platform, architectural, and messaging technology alternatives that they wished to evaluate. This effort produced a list of candidate messaging technologies, and a set of inter-component messaging architectures (i.e., maps from the communication relationships between components to these technologies). The latter was combined with the scenario MSDs to produce MSDs that had been annotated with the size and technology of each message.

Once the candidate message technologies had been identified, the relationship between message size and resource usage could be determined experimentally for each technology. The annotated MSDs could then be combined with this data to produce an estimate of resource usage for each instance of each scenario. Business logic resource usage could be added as well. With these data in hand, we would be in a position to determine the maximum sustainable transaction rates for given mixes of scenarios.

We note that other performance measures of interest, e.g., a lower bound on the latency for the scenarios, can be estimated using the same data, by finding the length of the critical path through the MSD.
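The critical-path computation amounts to finding the longest path through the message-dependency graph of an MSD. The following is a minimal sketch; the toy MSD and the per-message latencies are hypothetical, loosely based on the magnitudes measured in our experiments.

```python
# Lower bound on scenario latency: longest (most expensive) path through
# the message dependency DAG of an annotated MSD.
from functools import lru_cache

def critical_path(edges, start):
    """edges: {node: [(successor, cost_seconds), ...]} describing a DAG."""
    @lru_cache(maxsize=None)
    def longest(node):
        return max((cost + longest(nxt)
                    for nxt, cost in edges.get(node, [])), default=0.0)
    return longest(start)

# Toy MSD: client -> presentation -> business logic -> resource tier.
msd = {
    "client":       [("presentation", 0.0086)],   # HTTP, ~1 KB payload
    "presentation": [("logic", 0.00019)],         # light-weight call
    "logic":        [("resource", 0.00034),       # serialized obj / TCP
                     ("presentation_reply", 0.00019)],
}
print(critical_path(msd, "client"))
```

The same traversal, with edge costs taken from the annotated MSDs, yields the latency lower bound mentioned above for each scenario.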

4.1. Architectural Alternatives Under Consideration

We wished to analyze the user scalability of several different possible architectural alternatives. In each alternative, a choice was made for platform technologies, mappings of system components to those platforms, and inter-component communications technologies within each mapping.

4.1.1. Platform Technologies

LSCSP version N is based on a tiered architecture, consisting of client, presentation, business logic, integration, and resource tiers. Many of the tiers would undergo changes to their platform technology in version N+1.

• The presentation tier platform on LSCSP version N is Microsoft Internet Information Server, including Active Server Pages. In version N+1, this would probably change to Apache Tomcat and Java Server Pages. There is also an option for a tight integration of the presentation tier platform with the business logic tier platform.

• The business logic tier platform in LSCSP version N is simply the Windows runtime, since the LSCSP components run as processes. In LSCSP N+1, the business tier would run on EJB, a lightweight platform (called LWP here), or some architectural alternative that combines the two. Additionally, some non-real-time business logic could be implemented as servlets and run in the servlet container.

• In LSCSP N, there is no integration tier as such. The Java Connector Framework could potentially serve as an integration tier in LSCSP N+1.

The major decisions on platform technology for LSCSP N+1 involved mapping the components of LSCSP to these technologies, possibly with some repartitioning. Additionally, there was a question of whether to use Tomcat standalone or some other technology.

4.1.2. High-Level Architecture Options

We considered eight proposed LSCSP package architectures. Ad-hoc diagrams showed the embedding and communication of the business-logic software components within and between the proposed container technologies. These diagrams

established partial constraints on the mappings from inter-component messages to specific technologies.

4.1.3. Inter-component Communications Technology Options

Within each high-level architecture option, it was clear that an LSCSP implementation could use many different communication technologies. We wanted to evaluate the effect of each of these technologies upon system performance. The technologies identified for study included:

• HTTP (for communicating between the client and presentation tiers)
• Java Messaging Service
• EJB calls (local and remote)
• Lightweight component-to-component calls
• Serialized Java objects over TCP
• SOAP-serialized objects over TCP
• Web services invocation via JBoss MBean technology

4.2. Scenario Modeling

For the Inception Phase performance modeling, the architecture team extrapolated the inter-component call sequences of LSCSP Version N to LSCSP Version N+1. These sequences were captured in the form of UML sequence diagrams. Figure 2 is an example of one of these scenario diagrams.

4.3. Experimentation

While part of the architecture team was capturing scenarios as MSDs, a second part of the team started a program of experimentation with the communication technologies listed in Section 4.1.3. These experiments were an early phase of architectural prototyping, and were intended to produce rough timings for internal use rather than benchmarks for engineering and sizing purposes. Creating a publishable benchmark was outside the scope of our activities.

To obtain reasonably accurate timings on a Windows XP platform from inside of Java, we used Vladimir Roubtsov's com.vladium.utils timing utilities [1], [2]. Most of our timing experiments were timed using the sub-microsecond PC wall-clock timer. We standardized on using the wall-clock time to execute a scenario as the basis for resource consumption, for two reasons. First, using CPU time alone would hide idle time and delays due to non-CPU resource utilization. Second, CPU time as measured on Windows XP includes only the immediate process and kernel time, and would not include CPU used by system processes that are called into action indirectly while executing the scenario.
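The wall-clock versus CPU-time distinction can be demonstrated with any pair of timers; the sketch below uses Python's standard-library equivalents purely for illustration (the original experiments used the Java timing utilities cited above).

```python
# Wall-clock time (perf_counter) includes blocking, waiting, and work
# performed by other processes on the scenario's behalf; the process's
# own CPU time (process_time) does not.
import time

def measure(fn):
    """Return (wall_seconds, cpu_seconds) for one call of fn."""
    w0, c0 = time.perf_counter(), time.process_time()
    fn()
    return time.perf_counter() - w0, time.process_time() - c0

wall, cpu = measure(lambda: time.sleep(0.05))  # pure waiting
print(wall, cpu)  # wall is ~0.05 s; CPU time hides the delay entirely
```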

Figure 2: Sample MSD from one LSCSP scenario (S1).

Message transmission timings were performed by sampling the performance timer at six instants in the handling of each message: (sender side) start of process, marshalling completed, send done; (receiver side) start of reception, message received, message unmarshaled. Synthetic messages with payload lengths varying from one byte to one megabyte were generated and marshaled. Ten messages of each length were sent and received.

Two special concerns were queuing artifacts and JVM optimization. To avoid message queue problems, transmission of any test message was held off until the previous test message had been received and unmarshaled. Since the beneficial effect of JVM optimization only comes into play after the corresponding code has been executing without being optimized, a "warm-up" run was completed before the test run was performed with measurement turned on.
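The shape of this measurement protocol can be sketched as follows. Here Python's pickle stands in for message marshalling purely for illustration (warm-up pass, strictly sequential round-trips, synthetic payloads of increasing size); this is not the team's Java test code.

```python
# Sketch of the measurement protocol: synthetic payloads, an unmeasured
# warm-up pass, and each message fully marshaled and unmarshaled before
# the next is sent, so no queue can build up between samples.
import pickle
import time

def time_roundtrip(payload, repetitions=10, warmup=True):
    if warmup:                      # let caches/JITs settle, unmeasured
        for _ in range(repetitions):
            pickle.loads(pickle.dumps(payload))
    samples = []
    for _ in range(repetitions):    # strictly sequential: one message
        t0 = time.perf_counter()    # in flight at a time
        wire = pickle.dumps(payload)
        received = pickle.loads(wire)
        samples.append(time.perf_counter() - t0)
        assert received == payload
    return sum(samples) / len(samples)

for size in (1, 1024, 1024 * 1024):          # 1 byte up to 1 MB
    print(size, time_roundtrip(b"x" * size))
```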

Figure 3 is an example plot of experimental data for a light-weight service call, which confirmed our belief that this mechanism is fairly efficient. The upper plot shows that the average elapsed time for handling a light-weight service call is of the same order of magnitude for all message payload sizes. The lower plot contains the same data; its vertical scale has been expanded to show that the processing time of the light-weight message handling mechanism is quite insensitive to the size of the payload field. Moreover, the average observed processing time is visibly low compared with the displayed points, indicating that the distribution of values is skewed below the average. We did observe some spikes in wall-clock time; we believe they might be caused by uncontrolled activity by other processes executing under Windows XP.

[Figure 3: two panels, "Lightweight Service Call - Overview" and "Lightweight Service Call - Detail", plotting average time (milliseconds, series TimeAvg) against message length (bytes, 1 to 100000, logarithmic scale).]

Figure 3: Sample data for a light-weight service call. Error bars are at one standard deviation from the mean.

Mechanism                                  | Latency (msec) with 1024 bytes | with 2048 bytes | with 4096 bytes | Latency grows with message size?
-------------------------------------------|--------------------------------|-----------------|-----------------|---------------------------------
HTTP server-side response (Tomcat)         | 8.60                           | 14.70           | 27.07           | Yes
Java local call (intraunit)                | 0.05                           | 0.05            | 0.05            | No
Java Messaging Service (JBoss)             | 511.80                         | 494.38          | 468.43          | Yes3
Light-weight service call (LWS-Impl)       | 0.19                           | 0.11            | 0.19            | No
Remote session EJB, same container (JBoss) | 0.20                           | 0.24            | *0.17           | No
Local session EJB, same container (JBoss)  | *0.25                          | 0.25            | *0.15           | No
Serialized objects over TCP                | 0.34                           | 0.74            | 0.36            | No
SOAP messages over TCP (send time)         | 0.78                           | 1.19            | 1.66            | Yes
SOAP messages over TCP (receive time)      | 1.26                           | 2.13            | 2.18            | Yes
Web services using JBoss.NET MBeans        | 10.04                          | 10.80           | 11.86           | Yes

Table 1: Summary of experimental results at 1024, 2048, and 4096 bytes.

3 The JBoss implementation of JMS showed three different operating regions with respect to message size.

Table 1 shows a summary of the experiment results with message payload sizes of 1024, 2048, and 4096 bytes (common expected message sizes) for each of the messaging technologies we considered.

For some messaging mechanisms, such as HTTP and SOAP over TCP, we saw that the elapsed time increased as the message length increased. For other messaging mechanisms, such as EJB inter-bean calls in JBoss and light-weight inter-service calls in LWS-Impl, there was no such trend. These results are indicated in the "Latency grows with message size?" column of Table 1. The asterisks indicate that the data may reflect unexplained experimental error for those measurements and that we have used a conservative estimate instead. (Again, these latency values should not be viewed as normative, since that was not their intent.)

Interestingly, we found that the light-weight service technology LWS-Impl and JBoss EJB [4] had about the same message transmission time, probably because they use similar underlying mechanisms for separating Java namespaces and passing object references between them. The data also shows that the Java Messaging Service (JMS) would not be fast enough to use as an internal communication mechanism within LSCSP. This observation alone saved the team from taking a wrong direction, since JMS was being advocated within the team as a high-speed inter-component communication mechanism.

It is worth noting that the latency for transmitting serialized Java objects over TCP within the same server did not grow much with increasing message size. We conjecture that there is some optimization within the platform (at the Java and possibly the Windows XP layers) that avoids memory copying for these messages.

4.4. Use of Published Data

A literature search was undertaken early in the Inception Phase to find any published timing benchmark data that would be relevant to the project. In particular, we wanted to find any existing benchmarks relating business logic performance in Java and C#. Many claims and counterclaims have been made about the performance of these platforms by their proponents, but little data is available comparing communications mechanisms and business logic. One exception is [3]; while this data did not pertain directly to our needs, it served to reinforce our belief that the two languages would have roughly equivalent performance in business logic. Our experimental data later supported this belief.

4.5. Baselining the Existing System

Baseline data on CPU usage was available for LSCSP N for each of several scenarios. The usage figures included inter-component messaging, which we wished to exclude in order to baseline the cost of handling business logic irrespective of which option we chose. We therefore ran an experiment to estimate the CPU overhead in messaging.

Following this, the CPU time on the LSCSP N experimental machine was normalized to wall-clock time on the LSCSP N+1 experimental machine, using ratios of CPU clock rate on the two machines and Amdahl’s Law [5] to normalize with respect to the number of processors. The ratio of CPU time to wall-clock time and the fraction of parallelizable business logic are engineering estimates that are settable parameters in the spreadsheet model.
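The normalization step can be written down directly. In the sketch below the clock rates, processor counts, and the parallelizable fraction p are hypothetical; in the real model they are settable parameters.

```python
# Normalizing a CPU-time baseline across machines: scale by the
# clock-rate ratio, then apply Amdahl's law for the processor counts.

def amdahl_speedup(p, n):
    """Speedup on n processors when a fraction p of the work is parallel."""
    return 1.0 / ((1.0 - p) + p / n)

def normalize_cpu_time(t_old, clock_old_ghz, clock_new_ghz,
                       cpus_old, cpus_new, p):
    """Estimate the equivalent time on the new machine (illustrative)."""
    t = t_old * clock_old_ghz / clock_new_ghz          # faster clock
    t = t * amdahl_speedup(p, cpus_old) / amdahl_speedup(p, cpus_new)
    return t

# Hypothetical: 100 ms on a 2.0 GHz dual-CPU box, target a 2.8 GHz quad.
print(normalize_cpu_time(0.100, 2.0, 2.8, 2, 4, p=0.5))
```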

We also needed to understand how the performance of the application in C# would compare with that in Java, other things being equal. For this purpose, we created a non-recursive "Towers of Hanoi" program in both languages and ran it with tower sizes from one to twenty. While the experiment is certainly not normative, it reinforced our impression that the business logic would probably not get a significant performance increase in the migration from C# to Java.
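A non-recursive Towers of Hanoi can be written with an explicit work stack. This Python version only illustrates the shape of such a benchmark (CPU-bound, no I/O); the actual programs were written in C# and Java.

```python
# Non-recursive Towers of Hanoi using an explicit work stack, suitable
# as a CPU-bound micro-benchmark.

def hanoi_moves(n, src=0, dst=2, via=1):
    """Count the moves needed for an n-disk tower (should be 2^n - 1)."""
    moves = 0
    stack = [(n, src, dst, via)]
    while stack:
        n, src, dst, via = stack.pop()
        if n == 1:
            moves += 1
        else:
            # Emulate the recursion: move n-1 disks to 'via', one disk
            # to 'dst', then n-1 disks from 'via' to 'dst'. Items are
            # pushed in reverse so they pop in the correct order.
            stack.append((n - 1, via, dst, src))
            stack.append((1, src, dst, via))
            stack.append((n - 1, src, via, dst))
    return moves

for size in range(1, 21):          # tower sizes 1..20, as in the paper
    assert hanoi_moves(size) == 2 ** size - 1
print(hanoi_moves(20))
```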

4.6. Spreadsheet Model

A spreadsheet model was created to summarize the scenarios of Section 4.2. Each worksheet in the spreadsheet corresponded to one scenario. One line on each sheet counted all the messages of a given communication technology and approximate message length in one of the MSDs in the scenario. Since the process of translation from the Visio MSDs to the spreadsheet was manual, checksums for the number of messages were included to check for errors in the transcription of the diagrams.
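The checksum idea is simple enough to sketch: each worksheet line carries a (technology, size-class, count) triple, and an independently recorded per-scenario total guards against transcription slips. The data below is hypothetical.

```python
# Sketch of the transcription check: an independently counted total per
# scenario acts as a checksum against manual copying errors.

def check_transcription(lines, expected_total):
    """lines: [(technology, size_class, count), ...] for one scenario."""
    total = sum(count for _tech, _size, count in lines)
    if total != expected_total:
        raise ValueError(
            f"checksum mismatch: counted {total}, expected {expected_total}")
    return total

s1 = [("lws", "small", 6), ("soap_tcp", "small", 2), ("http", "medium", 1)]
print(check_transcription(s1, expected_total=9))
```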

5. Discussion of the Performance Results and Their Architectural Implications

The pie chart in Figure 4 illustrates how the data gathered in the spreadsheet contributed to our understanding of resource usage for a particular group of parameter settings. It clearly shows that the business logic in two particular scenarios (labeled S1 and S2 in Figure 4) would be the biggest contributors to system load in LSCSP N+1's expected operation, for a total of 89% of the system load. The total contribution of all messaging in all scenarios to the system load was less than one tenth of the total CPU usage.

We used a partial UML model of LSCSP N+1 (the MSDs of the most important scenarios) along with other experimental and published data and engineering estimates to approximately predict single-server user scalability with good effect. Moreover, we constructed a model which allows architects to vary the engineering estimates as parameters and derive best-case and worst-case user scalability estimates.

[Figure 4: pie chart, "Percent of Total Load by Scenario". Segments: S1 business logic 42%, S1 messaging 2%, S2 business logic 47%, S2 messaging 1%, S3 business logic 0%, S3 messaging 2%, S4 business logic 2%, S4 messaging 1%, S5 business logic 0%, S5 messaging 0%, S6 business logic 0%, S6 messaging 0%, S7 messaging 0%, S8 business logic 0%, S9 business logic 3%.]

Figure 4: Breakdown of load by scenario and load type (inter-component messaging vs. business logic).

The effort to construct these inception-phase estimates, excluding the experimentation, was less than one person-month. Getting the experimental data for message transmission time comprised most of the effort involved in the estimation. This data can now be reused for other projects, greatly reducing the effort needed to get scalability estimates.4 Moreover, the same technique can be applied to estimate other architectural information of interest in the Inception Phase, such as latency and enterprise-wide user scalability.

Examples of two useful results that came out of our effort were:

• the identification of JMS as being unsuitable for use in LSCSP on performance grounds, and
• the observation that LSCSP N+1 would be bottlenecked by business logic (especially scenarios S1 and S2), rather than by inter-module communications as was previously expected. This implies that the project goal of a five-fold improvement in users per server would require major attention to business logic performance.

These timely and inexpensively obtained results prevented later embarrassment. They could be used to focus architecture and development activity in LSCSP N+1.

4 The experimentation required about five person-months. It must be emphasized, though, that the data can be re-used for other estimation. Therefore, the cost of the experiments should be amortized over other estimations.

6. Future Work

This paper demonstrates the feasibility of model-based scalability estimation based on industry-standard tools, but it could be made far more efficient with improved tool support. In particular, estimating the resource usage of the MSD edges was tedious and error-prone. Given the fact that the commonly-used UML diagramming tools all support plug-ins, it shouldn’t be too difficult to automate this step.

By annotating activity diagrams with branch probabilities, it should also be possible to form quantitative performance models in a manner similar to that suggested by Smith and Williams [7]. In this way, it should be possible to use activity diagrams as inputs to an analytic queuing model that can help evaluate performance over all possible scenarios for which the system is designed, rather than just an enumerated subset.
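Such a model could look like the following sketch: branch probabilities from an annotated activity diagram weight the per-branch service demands, and the aggregate demand feeds a simple M/M/1 response-time formula. All probabilities and demands shown are hypothetical; a real model would cover every branch in the diagram.

```python
# Sketch: expected service demand over diagram branches, fed into the
# classic M/M/1 mean-response-time formula R = D / (1 - utilization).

def expected_demand(branches):
    """branches: [(probability, service_demand_seconds), ...]."""
    assert abs(sum(p for p, _ in branches) - 1.0) < 1e-9
    return sum(p * d for p, d in branches)

def mm1_response_time(arrival_rate, demand):
    """Mean response time of an M/M/1 queue (requires utilization < 1)."""
    utilization = arrival_rate * demand
    assert utilization < 1.0, "server saturated"
    return demand / (1.0 - utilization)

d = expected_demand([(0.7, 0.004), (0.25, 0.020), (0.05, 0.050)])
print(mm1_response_time(arrival_rate=50.0, demand=d))
```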

7. References

[1] Roubtsov, Vladimir. "My kingdom for a good timer! Reach submillisecond timing in Java." JavaWorld, January 10, 2003.

[2] Roubtsov, Vladimir. "Profiling CPU usage from within a Java application." JavaWorld, November 8, 2002.

[3] Wilson, Matthew. "C# Performance: Comparison with C++, C, D, and Java, Part 1." Windows Developer Network, Fall 2003.

[4] JBoss, Inc. "JBoss Administration and Development Guide, JBoss 3.2.6." http://docs.jboss.org/jbossas/admindevel326/html/ , 2004.

[5] Gunther, Neil. The Practical Performance Analyst. iUniverse Inc., 2000.

[6] Robinson, S., et al. Professional C#. Wrox Press, 2001. ISBN 1861004990.

[7] Smith, C.U. and Williams, L.G. Performance Solutions: A Practical Guide to Creating Responsive, Scalable Software. Addison-Wesley, Boston, 2002. ISBN 0-201-72229-1.

[8] Bondi, A.B. "Characteristics of scalability and their impact on performance." Proc. WOSP 2000, pp. 195-200, Ottawa, September 2000.

[9] Kruchten, Philippe. The Rational Unified Process: An Introduction, Third Edition. Addison-Wesley, 2003. ISBN 0-321-19770-4.

