Performance Based Hierarchical Memory Architecture: Analytical Modeling and Analysis

Marwan Sleiman
Department of Computer Science and Engineering
University of Connecticut, Storrs, CT 06269-2115

e-mail: [email protected]

Outline

Abstract

1- Introduction

2- Previous Work

3- Motivation

4- Model Architecture

5- Work Done
5.1 Power-tailed behavior and moments of the memory access time
5.2 The P-K Formula and Queuing Time
5.3 Interdependence of the Hit Ratios
5.4 Calculations and Results

6- Proposed Work

7- Conclusion

References

Abstract

As the modern computing environment expands the memory hierarchy from CPU registers and local memory to network storage, the goal of a computer architect becomes to design a memory hierarchy that maximizes the overall performance of the machine at minimal cost. This requires deciding on the number, speed, and size of the hierarchical layers. As the gap between processor and memory speed grows exponentially, it becomes ever more important to develop an analytical model that captures all these hierarchical levels and optimizes the memory access time so as to make good use of both CPU and memory. In this work we study the performance of systems with multi-level hierarchical memory through the mean and variance of the access time, which helps the designer optimize the cost and response of the hierarchical memory system. We show the importance of the variance of the access time and study its behavior as the number and size of intermediate memory levels increase. We use a linear-algebraic queuing theory approach to derive the mean and variance of the memory access time, and we explain why previous attempts failed to provide accurate models. An unbounded variance of the memory access time can lead to extremely long wait queues, and a particular class of distributions with this property is the power-tail distributions. We show that, under reasonable assumptions on the parameters of the memory levels, the response time of such a hierarchical memory system is power-tailed. Our model differs from all previous related work by being global and general, by using probabilistic equations that capture the interdependence between the different levels, and by using the P-K formula to distinguish between the memory access time and the queuing time; this makes it independent of the application using the memory, whereas classical approaches were program dependent. Moreover, our model achieves higher accuracy while being expandable to multiple levels.

1. Introduction:

Storage is the main bottleneck in the performance of modern computing systems because the gap between processor and memory speed has been growing exponentially for the last two decades: CPU speed improves by a factor of 1.55 per year while RAM speed improves by only 7% [1]. Researchers therefore created hierarchical memories with high-speed cache memory to bridge this gap, so that the CPU can fetch data from the intermediate fast memory instead of the slower main memory. Advances in modern computer architecture and networking technologies have expanded this hierarchy beyond the local machine to middle-tier and network storage; storage systems are nowadays found at several hierarchical levels, starting with registers and expanding to caches, main memory, disks, tapes, storage area networks (SAN) [2], and so on. Each storage system provides the basic function of storing data and holding it until it is retrieved at a later time. The main differences between the various storage systems are their speed, cost, size, and volatility. The closer a memory is to the CPU, the faster but smaller and more expensive it becomes. Thus, when designing a memory hierarchy, the aim is to get the fastest design while maintaining a compromise between size, speed, and cost. The CPU can spend a considerable amount of time waiting for data to be accessed, and this negatively impacts overall performance because it increases the CPI in a pipelined processor [3] and increases the query execution time in a database application [4].

Unfortunately, few practical rules or metrics exist for systems designers and administrators to optimize their memory hierarchies. Exhaustive simulation takes far too long, particularly as memory hierarchies become more complex [5]; trial and error on running systems is usually impossible; and prior mathematical analyses either fall short of providing intuitive insight into memory sizing [6] or assume the availability of memory technologies with arbitrary speeds and costs [7].

Having found that previous approaches to studying memory access time were limited in the number of memory levels they could represent, depended on the application, and did not achieve high accuracy, we built an analytical model based on Markov chain analysis. This model is independent of the application-dependent distribution of memory requests. We represent the hierarchical memory as an M/G/1 queue [8] and calculate the access time using linear algebraic queuing theory. We first showed that the hierarchical memory access time is power-tailed [9]; we then expanded our work [10] to consider the queuing time of consecutive memory requests, as can occur in a database application for example, and showed the difference between the access time and the queuing time. This difference explains the inaccuracy of previous approaches in predicting the memory access time. We also study the behavior of the variance of the access time, which is critical because it can dramatically affect the miss ratio of the memory system and its performance. Our model is more general than the classical models because it accepts different input parameters, such as access time, hit ratio, cost, and memory request distribution. We plan to use and expand this analytical model to optimize memory design, aiming for a memory with minimal response time and variance at minimal cost.

2. Previous Work:

The hierarchical memory access time has been studied by several researchers, but all previous approaches to modeling and optimizing it were limited. The limitations result either from the dependence of the models on the application or from analytical models that cannot represent deep hierarchies. For example, Balasubramonian et al. explain that recent memory hierarchy organizations do not match application requirements, which results in degraded performance [11]. Jin et al. develop a limited analytical model that captures only a two-level cache [12], and their work shows a large discrepancy between predicted and measured memory performance. Many researchers have studied memory hierarchy, and some developed analytical models for designing memory hierarchies. Jacob et al. [13] proposed an analytical model to determine the optimal cache size that minimizes the average memory reference time. Trace-driven simulations are used extensively by researchers to investigate such aspects of cache performance as multiprocessor cache coherence and replacement strategies. Trace-driven studies are valuable for understanding cache behavior on specific workloads, but they are not easily applied to other workloads [5]. Unlike traces, mathematical analysis lends itself well to understanding cache behavior on general workloads, though such generality usually leads to less accurate results. Du et al. [14] showed that the depth of the memory hierarchy is a primary factor affecting execution time on a cluster of workstations, but their results depend on the workload type, which may vary with the application. Abraham et al. [15] show that using a cache simulator may take up to 466 days to simulate some cases, which is clearly infeasible. Chow [16, 17] studied how cache size scales with the number of levels and concluded that the optimal number of cache levels scales with the logarithm of the capacity of the cache hierarchy. Garcia-Molina et al. [18] and Rege [19] demonstrated that for effective utilization of memory it is more suitable to use more of the slower devices and less of the faster ones. Rege [19] showed that up to a 3:1 performance advantage can be achieved by using a three-level rather than a two-level hierarchy at the same total cost. Welch [20] showed that the optimal speed of each level should be proportional to the amount of time spent servicing requests at that level. Jacob [13] pointed out that these studies have two major shortcomings:

(i) They assume the availability of memory technologies with arbitrary speeds and costs, (ii) They do not apply their analyses to a specific model of workload locality.

Being able to create and use technologies on a continuum of characteristics is convenient, but it is too idealistic a viewpoint to be usable by system builders. Failing to apply a specific model of workload locality makes it impossible to provide an easily used, closed-form solution for the optimal cache configuration [13], so the results of these papers contain dependencies on the cache configuration: the number of levels, or the sizes and hit rates of the levels. Most researchers focused on two-level memory, as in [21] and [22], and we see no work that focuses sufficiently on deep memory hierarchies. None of the previous researchers addressed the variance of the access time of the memory hierarchy. This variance is of primary importance because a high variance in the access time corresponds to a higher miss ratio and unexpected delays, which is undesirable.

3. Motivation

We provide a general model to study the memory access time. First we motivate our formulation through an example. We consider the memory model depicted in Figure 1 in the form of a state diagram. We compare the average memory access time with and without the intermediate level cache memory.

Figure 1. Two-level cache memory represented as a state diagram: the first-level cache hit ratio is denoted h and the miss ratio is 1−h. We assume here that the memory read time increases from Level 1 to Level 2 by a factor γ. The upper memory level is the cache memory, denoted Level 1, while the lower level is the main memory, denoted Level 2.

C1 represents the memory access by the CPU; this access has a hit ratio h and an average lookup time T1. In the case of a hit, C2 represents the read operation, with a time equal to τ; in the case of a miss, the request accesses the next level of memory, denoted by C3. At the second level, we assume that the average lookup time is T2 and the memory read time is γτ. Here γ > 1 is the memory speed factor between two consecutive levels. The memory access time without the intermediate level memory is T2 + γτ. Now, if we introduce the intermediate level cache memory as shown in Fig. 1, the access time is:

T1 + hτ + (1−h)(T2 + γτ + τ) = T1 + T2 + γτ + τ − h(T2 + γτ)

where the extra τ on the miss path accounts for reading the data from Level 1 once it has been brought in (C4 → C2 in Fig. 1). The system with the intermediate cache memory has a lower average memory access time than the single-level system if:

T1 + T2 + γτ + τ − h(T2 + γτ) < T2 + γτ, which reduces to h > (T1 + τ)/(T2 + γτ).

Similar results can easily be derived for systems with more memory levels. As the memory hierarchy grows to more levels, there are more parameters and more constraints to take into consideration, and selecting appropriate values for each parameter becomes more challenging as we try to optimize the design with respect to time and cost.
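As a quick illustration with made-up numbers (not taken from our experiments): for T1 = 1, τ = 2, T2 = 5 and γ = 10, the intermediate cache pays off whenever h > (T1 + τ)/(T2 + γτ) = (1 + 2)/(5 + 20) = 0.12, a threshold that realistic caches easily exceed.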

4. Model Architecture and Analytical Model

[Figure 1 diagram: the CPU issues a request to C1; a hit (probability h) goes to the read state C2, while a miss (probability 1−h) goes to the second-level states C3 and C4; the lookup times T1 and T2 label the two levels.]

We consider a hierarchical memory that consists of L intermediate levels and a lowest memory level m, as shown in Figure 2. We model this hierarchy as a state diagram, as shown in Figure 3. Each physical memory level i in Figure 2 corresponds to two states in Figure 3: the upper state C2i−1 corresponds to the lookup, while the lower state C2i corresponds to the memory read. The first state S0 corresponds to the memory request from the processor, while the last state Cm corresponds to the lowest memory in the hierarchy.

Figure 2. Hierarchical Memory Model: CPU with L intermediate levels of memory and main memory m.

Figure 3. State diagram of a hierarchical memory system with L intermediate levels: Intermediate memory levels are represented by two states. The figure shows the hit ratio hi in every level and the access time Ti at every state.

This hierarchical memory system is an M/G/1 queue. In order to build the linear algebraic model for this system [8], we define the following terms:

X is the random variable representing the system time, which corresponds to the total memory access time through all memory stages.

P is the sub-stochastic matrix of transition probabilities from one state to another.

p is the entrance vector; it corresponds to the state of the system at the first memory request. p is a row vector of size 2L + 1 (two states per intermediate level plus the lowest memory level), where L is the number of intermediate levels.

ε′ is the unit column vector of size 2L + 1.

M is the transition-rate matrix; it corresponds to the rates of leaving each state. M is a diagonal matrix of the same size as P.

I is the identity matrix of the same dimension as P and M.

B = M(I − P)

V = B⁻¹ is the inverse of B.

Following Figure 3, the non-zero entries of P are, for each level i, P(C2i−1, C2i) = hi (a hit leads to the read state) and P(C2i−1, C2i+1) = 1 − hi (a miss leads to the lookup state of the next level); the read states pass the data back toward the processor. M = diag(1/T1, 1/T2, …), where Tj is the mean time spent in state Cj.
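To make the construction concrete, here is a minimal sketch in Python/NumPy for a two-level hierarchy (all parameter values are illustrative, not taken from our experiments):

    import numpy as np

    # States: C1 (lookup L1), C2 (read L1), C3 (lookup L2), C4 (read L2).
    h = 0.9                                  # hit ratio of level 1 (illustrative)
    T = np.array([1.0, 2.0, 5.0, 20.0])      # mean holding time of C1..C4

    # Sub-stochastic transition matrix P: a hit in C1 goes to the read state C2;
    # a miss goes to the level-2 lookup C3, then the read C4, and the data is
    # finally read from level 1 (C4 -> C2). C2 exits the system.
    P = np.zeros((4, 4))
    P[0, 1] = h
    P[0, 2] = 1 - h
    P[2, 3] = 1.0
    P[3, 1] = 1.0

    M = np.diag(1.0 / T)                     # rates of leaving each state
    p = np.array([1.0, 0.0, 0.0, 0.0])       # every request enters at C1
    eps = np.ones(4)                         # unit column vector eps'

    B = M @ (np.eye(4) - P)
    V = np.linalg.inv(B)
    print("E[X] =", p @ V @ eps)             # mean access time, equation (1) below

For these values the sketch prints E[X] = 5.5, which agrees with the closed form T1 + τ + (1−h)(T2 + γτ) of Section 3.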

5. Work Done

Having a good analytical model to represent our memory hierarchy, we first studied the mean and variance of the memory access time. We derived the moments of the memory access time in [9] and showed that the access time is power-tailed for an infinite memory hierarchy; we then used these results to study the effect of adding more levels, and of increasing the size of each level, on the mean and variance of the memory access time.

5.1 Power-tailed behavior and moments of the memory access time

Recently, ill-behaved distributions have been the object of study for performance modeling of various aspects of computing systems. One such distribution is the power-tail distribution. According to Greiner et al. [23], a power-tail distribution is one for which the reliability function, R(x), is of the form R(x) = c/x^α, where α > 0. These power-tailed distributions have the singular property of a finite mean but an unbounded variance (i.e., σ²(X) = ∞) for 1 < α ≤ 2; more generally, all moments of order α or higher are unbounded. Such distributions are abundant in the traffic arrivals of local and wide area networks (video services, FTP data connections, NNTP, and WWW packet arrivals), and CPU times and file sizes have also been shown to behave power-tailed [24, 25]. Our main question of concern was therefore whether the hierarchical nature of the memory produces such power-tailed distributions. In [9], we took a linear algebraic approach to answer this question.
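The effect is easy to see numerically. The sketch below (Python/NumPy, with a Pareto distribution standing in for a generic power tail at α = 1.5) shows the sample mean settling while the sample variance keeps growing:

    import numpy as np

    rng = np.random.default_rng(0)
    alpha = 1.5                                  # 1 < alpha < 2: finite mean, unbounded variance
    x = rng.pareto(alpha, size=1_000_000) + 1.0  # classical Pareto: R(x) = 1/x**alpha, x >= 1

    for n in (10**3, 10**4, 10**5, 10**6):
        print(n, round(x[:n].mean(), 2), round(x[:n].var(), 1))
    # the running mean approaches alpha/(alpha - 1) = 3, while the
    # sample variance does not converge as n grows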

We have shown in [9] that the mean memory access time is given by the first moment of V and is independent of the distribution of both the memory access requests (which depends on the compiler) and the service times of the nodes (which depend on the hardware specifications of the memory levels). The mean memory access time is given by:

E[X] = p V ε′   (1)

However, the variance of the access time depends on the distribution of the service times of the memory nodes; it involves the first and second moments of V. For exponential service-time distributions it is given by:

σ²(X) = 2 p V² ε′ − (p V ε′)²   (2)

For non-exponential distributions, the variance is given by:

σ²(X) = 2 p V² ε′ − (p V ε′)² + p V Γ T ε′, where Γ = diag(C²v1 − 1, C²v2 − 1, …, C²vL − 1),

T = M⁻¹ is the diagonal matrix of mean state holding times, and C²vi is the squared coefficient of variation of the service time of state i in Fig. 3.

In an application with multiple consecutive memory requests, such as a database application, the memory requests are queued and must wait for service from a busy memory, so neither the previous models nor the model above suffices to predict the exact time. We therefore use the P-K formula, in the next section, to find the exact queuing time.

5.2 The P-K Formula and Queuing Time

The Pollaczek-Khinchine formula (the P-K formula) [26] gives the expected number of customers in queue and in service in M/G/1 queues. The P-K formula, combined with Little's theorem [8], shows that the mean time spent by a customer in an M/G/1 queue is:

E[T] = E[X] + ρ E[X] (1 + C²v) / (2(1 − ρ))   (3)

where C²v = σ²(X)/E[X]² is the squared coefficient of variation, ρ = λ E[X] is the utilization factor, and λ is the arrival rate.
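A small helper makes the comparison between E(X) and E(T) in the following sections easy to reproduce (a sketch; the function name and the example values are ours, not from any library):

    def pk_mean_time(mean_x: float, var_x: float, lam: float) -> float:
        """Mean time in an M/G/1 queue: P-K formula plus Little's theorem, eq. (3)."""
        cv2 = var_x / mean_x**2        # squared coefficient of variation
        rho = lam * mean_x             # utilization factor, must stay below 1
        return mean_x + rho * mean_x * (1 + cv2) / (2 * (1 - rho))

    print(pk_mean_time(5.5, 30.0, 0.05))   # illustrative values only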

We used equation (3) to predict the queuing time for our hierarchical memory system of Figure 3, which then becomes the system shown in Figure 4. Queuing occurs when we have multiple consecutive memory requests that cannot be serviced by the sequential memory at the same time, so the memory requests are buffered in a queue; this can be the case for a shared memory on a parallel machine, a simple database access application, or a pipelined CPU. In the last two cases the queuing causes a bottleneck and affects the performance of the system because it increases the CPI in a pipelined processor [3] and increases the query execution time in a database application [4].

The variance of the queuing time for the model in Figure 4 is the same as that of the model in the previous section, given in equation (2). It is clear from equations (1) and (3) that the mean memory access and queuing times depend on several parameters, including the hit ratio and access time at each memory level.

Figure 4. Queuing diagram of a hierarchical memory system with L intermediate levels: the memory requests now arrive with rate λ and are queued before the CPU.

5.3 Interdependence of the Hit Ratios

Przybylski [11] used Agarwal's cache miss model [12] to show that the cache miss ratio is inversely proportional to the cache size; the hit ratio of each memory level, being the probabilistic complement of the miss ratio, is therefore tied to the level's size and thus to its cost. Extending this observation to multi-level memory is more complicated and requires more calculation, so we define the following parameters for the system in Fig. 2:

hi is the probability of finding the data in intermediate memory level i (its hit ratio).

Si is the size of intermediate memory level i.

ci is the cost per unit of size of intermediate memory level i.

1 − hi = k/Si, where k is a constant, so that the miss ratio is inversely proportional to the size.

The total cost of the L-level hierarchical system becomes:

C = Σ(i=1..L) ci Si   (4)
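For instance (illustrative numbers only): two intermediate levels with unit costs c1 = 10 and c2 = 1 per MB and sizes S1 = 1 MB and S2 = 64 MB give a total cost C = 10·1 + 1·64 = 74 cost units.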

For the memory hierarchy in Figure 3, we define the following terms:

L is the number of intermediate memory levels in the hierarchy.

hi is the hit ratio at memory level i, Mi; it is the probability of finding the data at that level.

Si is the size of memory level i, Mi.

c1, c2 and c3 are the costs per unit of size of memory levels M1, M2 and M3, respectively.

The hit ratios at each level can then be derived from these parameters by a simple calculation.

It is clear from equations (1) and (3) and from the above derivations that the mean time is a function of the hit ratio and size of each intermediate level; to optimize the mean access time, we must therefore optimize (1) and (3) with respect to these parameters.

5.4 Calculations and Results

In order to show the difference between the mean memory access time of equation (1) and the mean queuing time for memory input/output requests of equation (3), we applied both equations to several cases and compared the results. Our aim was to show that the two means are not the same and have different minima. We supposed that we are building a hierarchical memory system with a fixed cost C. We wrote Matlab code to implement our equations and verify this claim, and we carried out an exhaustive set of program runs over several parameters. Since the results are consistent with each other, we present only a few here.

We first considered the 2-level memory system of Figure 4 and assumed that it has an arbitrary fixed cost C. To study the mean response time versus S, the size of memory level 2, we let S range over an arbitrary interval (4.4 < S < 7.8). Sc, the size of memory level 1, then follows from the cost constraint (4): Sc = (C − c2 S)/c1.

We plot both the mean memory access time, E(X), and the queuing time, E(T), versus the size of the level 2 memory in Figure 5. We remark that they have different minima, which confirms our claim about the difference between them.
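The sweep itself is easy to reproduce in spirit. The sketch below (Python; every constant is hypothetical, chosen only to exercise the cost constraint and the size/hit-ratio relation of Section 5.3, and not our actual Matlab parameters) computes E(X) and E(T) across the range of S:

    import numpy as np

    # hypothetical budget, unit costs, locality constant and timings
    C, c1, c2, k = 10.0, 1.0, 0.2, 2.0
    T1, tau, T2, gamma, Tm = 1.0, 2.0, 5.0, 10.0, 50.0
    lam = 0.05                                # arrival rate of memory requests

    for S in np.linspace(4.4, 7.8, 8):        # size of the level-2 memory
        Sc = (C - c2 * S) / c1                # level-1 size from the cost constraint (4)
        h1, h2 = 1 - k / Sc, 1 - k / S        # hit ratios from 1 - h = k/S (Section 5.3)
        # mean access time: lookup + read at level 1, plus the miss paths
        EX = T1 + tau + (1 - h1) * (T2 + gamma * tau + (1 - h2) * Tm)
        rho = lam * EX                        # utilization
        ET = EX + rho * EX * (1 + 1.0) / (2 * (1 - rho))   # eq. (3), C²v = 1 assumed
        print(f"S={S:4.2f}  E(X)={EX:6.3f}  E(T)={ET:6.3f}")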

Figure 5. Mean memory access time E(X) and mean queuing time E(T) versus the size S of the Level 2 memory, M2, for a 2-level hierarchical memory system. E(X) has its minimum at S = 6.4, while E(T) has its minimum at S = 6.8.

We then plotted, in Figure 6, the variance of the memory access time obtained from equation (2) for the two-level memory system. The behavior of the variance is as important as that of the mean memory and queuing times because it causes deviation from the mean. Deviation on either side of the mean is undesirable: finishing too late obviously hurts performance (for example, by increasing the CPI in a pipelined processor), but finishing too early is also undesirable because it wastes memory resources. We remark that, while the mean memory time is convex in the memory size, the variance decreases as the memory size increases; this is expected because the variance depends on the second moment of the memory access time [9], which changes faster than the mean as the memory size increases.

To emphasize the difference between E(X) and E(T), we calculated the difference between the minimum value of E(T) and the value of E(X) at the same value of S; we call this difference DiffT. We plot DiffT versus the input rate λ in Figure 7. We selected the values of λ so as to keep the system utilization factor between zero and one [8].

Figure 6. Variance of the memory access time versus the size S of the Level 2 memory, M2, for a 2-Level hierarchical memory system. The variance decreases as the memory size increases.

We remark in Figure 7 that the difference becomes more significant as the traffic input rate and the system utilization increase; it reaches 11.25% of the minimum value of the mean time for a utilization factor close to 1.

Figure 7. Difference between the minimum of E(T) and the value of E(X) at the same value of S, for different values of the input rate λ, for a 2-level hierarchical memory system.

To show that our results are independent of the number of levels in the hierarchy, we repeated the 2-level cache memory experiment for a 3-level cache memory. The system is now more complicated because there are more variables. Here too we assumed that the system has an arbitrary cost C, and we assumed that S2 and S3, the sizes of the second and third memory levels, lie in arbitrary intervals (1 < S2 < 6 and 6 < S3 < 18). S1, the size of memory level 1, is then given by the cost constraint (4): S1 = (C − c2 S2 − c3 S3)/c1.

We plotted both the mean memory access time, E(X), and the queuing time, E(T), versus the sizes of memory levels 2 and 3 in Figure 8. Here too both surfaces are similar and have different minima.

Figure 8. Mean memory access time E(X) and mean queuing time E(T) versus the sizes of memory levels 2 and 3 for a 3-level hierarchical memory system. Both surfaces look convex. E(X) has its minimum of 12.59 at S2 = 4.5 and S3 = 16, while E(T) has its minimum of 13.68 at S2 = 4.75 and S3 = 16.

The plot in Figure 9 of the difference between the minimum of E(T) and the value of E(X) versus λ shows an even more significant difference, equal to 21% of the minimal memory access time. This difference explains the gap between predicted and measured performance that Jin et al. report in their paper on performance prediction for shared-memory programs [12]: in effect, the authors there calculate the mean access time while the values they measure are queuing times, which is why they see a 20% difference.

Finally, to compare the performance of the two memory systems, we plotted the variance of the memory access time for the three-level system in Figure 10. We remark in Figure 10 that the standard deviation is more sensitive to the upper-level memory and decreases faster as we increase Sb, the size of the upper level. Comparing Figure 10 to Figure 6, the standard deviation is higher in Figure 10, which means that adding one more level to the hierarchy increases the variance. This observation is important: the computer architect must take it into consideration when designing systems that are sensitive to the variance.

Figure 9. Difference between the minimum of E(T) and the value of E(X) at the same value of S, for different values of the input rate λ, for a 3-level hierarchical memory system.

Figure 10. Variance of the memory access time versus the sizes of the intermediate levels for a 3-level hierarchical memory system. The variance decreases as the memory sizes increase.

6. CONCLUSION

We have developed an analytical model to evaluate the mean and variance of the access time for memory requests in a hierarchical multi-level memory environment. We have shown and explained the difference between the mean memory access time and the queuing time of memory requests; this difference explains the discrepancy between the analytical and practical values obtained by previous researchers. We have also shown how the variance of the access time behaves as we enlarge the levels and add more levels to the hierarchical memory system: increasing the size of a memory reduces the variance, whereas adding more levels increases it. This observation helps the designer decide whether to use bigger memory levels or more levels in a design. Our analytical model, shown in equation (3), is a universal model and can represent deep memory hierarchies that extend beyond local machine storage to network storage. The model uses Markov chain analysis and can take different types of memory request distributions. It is also well suited to optimization because it can take different inputs such as the hit ratio, size, cost, and speed parameters of the intermediate memory levels. This flexibility makes it easy to expand the model to capture any level of hierarchy.

7. FUTURE WORK

This work is part of a larger effort examining the performance of hierarchical memory systems with both analytical and simulation techniques. There are many topics we are either already investigating or hope to investigate soon. We intend to validate our performance model by comparing the predicted access times against execution times measured on real machines using benchmarks. We also plan to study the behavior of the variance of the memory access time in more depth, for both the exponential and non-exponential cases. We are currently working on proving the convexity of the mean times obtained in equations (1) and (3), and we plan to apply several optimization techniques to equation (3) to optimize the design of multi-level hierarchical memories and come out with the fastest possible cost-effective system. We also want to verify our model by testing it against real values.

References

[1] D. A. Patterson and J. L. Hennessy, "Computer Architecture: A Quantitative Approach," 2nd edition. San Francisco, CA: Morgan Kaufmann Publishers, 1996.
[2] J. W. Toigo, "The Holy Grail of Network Storage Management," Prentice Hall, 2004.
[3] A. Smith, "Disk cache-miss ratio analysis and design considerations," ACM Transactions on Computer Systems, 3(3):161-203, 1985.
[4] I. MacIntyre and B. Preiss, "The effect of cache on the performance of a multi-threaded pipelined RISC processor," Engineering Institute of Canada, 1991.
[5] A. Smith, "Cache memories," Computing Surveys, 14(3):473-530, 1982.
[6] C. Chow, "Determination of cache's capacity and its matching storage hierarchy," IEEE Transactions on Computers, 25(2):157-164, 1976.
[7] T. Welch, "Memory hierarchy configuration analysis," IEEE Transactions on Computers, 27(5):408-413, 1978.
[8] L. Lipsky, "Queueing Theory: A Linear Algebraic Approach," Maxwell Macmillan International, 1992.
[9] K. M. Konwar, L. Lipsky, and M. Sleiman, "Moments of memory access time for systems with hierarchical memories," 21st International Conference on Computers and Their Applications (CATA-2006), Seattle, WA, March 2006.
[10] M. Sleiman, L. Lipsky, and K. Konwar, "Performance modeling of hierarchical memories," submitted for review to the 19th International Conference on Computer Applications in Industry and Engineering (CAINE-2006), November 13-15, 2006, Las Vegas, NV, USA.
[11] R. Balasubramonian, D. Albonesi, A. Buyuktosunoglu, and S. Dwarkadas, "Dynamic memory hierarchy performance optimization," 27th International Symposium on Computer Architecture, June 2000.
[12] R. Jin and G. Agrawal, "Performance prediction for random write reductions: a case study in modeling shared memory programs," Proceedings of the 2002 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Marina Del Rey, CA, pages 117-128, 2002.
[13] B. Jacob, P. Chen, S. Silverman, and T. Mudge, "An analytical model for designing memory hierarchies," IEEE Transactions on Computers, 45(10):100-114, 1996.
[14] X. Du, X. Zhang, and Z. Zhu, "Memory hierarchy considerations for cost-effective cluster computing," IEEE Transactions on Computers, 49(9):915-933, 2000.
[15] S. Abraham and S. Mahlke, "Automatic and efficient evaluation of memory hierarchies for embedded systems," Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture, pages 114-125, 1999.
[16] C. Chow, "On optimization of storage hierarchies," IBM Journal of Research and Development, pages 194-203, 1974.
[17] C. Chow, "Determination of cache's capacity and its matching storage hierarchy," IEEE Transactions on Computers, 25(2):157-164, 1976.
[18] H. Garcia-Molina, A. Park, and L. Rogers, "Performance through memory," pages 122-131, 1987.
[19] S. Rege, "Cost, performance and size trade-offs for different levels of memory hierarchy," IEEE Computer, 19:43-51, 1976.
[20] T. Welch, "Memory hierarchy configuration analysis," IEEE Transactions on Computers, 27(5):408-413, 1978.
[21] A. Smith, "Cache memories," Computing Surveys, 14(3):473-530, 1982.
[22] A. Smith, "Disk cache-miss ratio analysis and design considerations," ACM Transactions on Computer Systems, 3(3):161-203, 1985.

[23] M. Greiner, M. Jobmann, and L. Lipsky, "The importance of power-tail distributions for modeling queueing systems," Operations Research, 47(2), 1999.
[24] S. Garg, L. Lipsky, and M. Robert, "The effect of power-tail distributions on the behavior of time sharing computer systems," ACM SIGAPP Symposium on Applied Computing, Kansas City, MO, 1992.
[25] V. Paxson and S. Floyd, "Wide area traffic: the failure of Poisson modeling," IEEE/ACM Transactions on Networking, 3, 1995.
[26] D. P. Heyman and M. J. Sobel, "Stochastic Models in Operations Research: Stochastic Processes and Operating Characteristics," Courier Dover Publications, 2004.

Biography:

Marwan Sleiman received his BS and MS degrees in computer engineering from the University of Balamand, Lebanon, in 1997 and 1999 respectively. He worked as a computer and telecom engineer for several years before joining the University of Connecticut in January 2004 to enroll in the PhD program in computer science and engineering.

