Tuesday, August 08, 2006 

More on Quantitative Design Methods

In a recent post, I discussed the idea of using a performance model at design time. Unfortunately, good models tend to be somewhat complicated. Recently I've been reading a paper by Dick Brodine ("Mathematical Server Sizing", IEEE Computer, July 2006), where he proposed a simple model based on queuing theory.
The model is quite simple, meaning:
- the input parameters can be measured on an early prototype or estimated easily
- the math involved is simple enough to be implemented in Excel :-)
Still, the model accounts for the non-linearity of load. More specifically, it acknowledges that response time degrades when the server is serving several requests concurrently. So, although very simple, the model can give a first-cut approximation of response time, useful for early server sizing.
The model is too simple to account for the dynamics of any specific server / container. For instance, there is no way to account for massive swapping due to a large memory footprint when too many requests are served, or for the dynamics of garbage collection, and so on. Still, it's much better than throwing a few numbers into a linear model, or just implementing the system without any attention to performance issues.
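To make the shape of the result concrete, here is a minimal sketch of the kind of calculation such a model performs. I'm assuming a textbook M/M/c queue (Poisson arrivals at rate lambda, exponential service at rate mu, c identical servers) solved with the Erlang-C formula; the exact formulation in Brodine's paper may differ.

    # A minimal sketch of a queuing calculation like the one the paper proposes:
    # an M/M/c queue with Poisson arrivals (rate lam), exponential service
    # (rate mu per server) and c identical servers. The Erlang-C formula gives
    # the probability that a request must wait, from which the mean response
    # time follows. Simple enough to port to an Excel sheet.
    from math import factorial

    def erlang_c(c: int, a: float) -> float:
        """Probability that an arriving request has to queue (a = lam/mu < c)."""
        top = a ** c / factorial(c)
        bottom = top + (1 - a / c) * sum(a ** k / factorial(k) for k in range(c))
        return top / bottom

    def mean_response_time(lam: float, mu: float, c: int) -> float:
        """Mean response time (queueing delay plus service) for an M/M/c queue."""
        a = lam / mu                               # offered load in Erlangs
        if a >= c:
            raise ValueError("unstable: lam/mu must stay below c")
        wait = erlang_c(c, a) / (c * mu - lam)     # mean time spent queueing
        return wait + 1.0 / mu                     # plus mean service time

    # The non-linearity in action: with 10 servers and mu = 1 request/sec each,
    # response time is flat at low load and explodes as load nears capacity.
    for lam in (2.0, 5.0, 8.0, 9.5):
        print(f"lambda = {lam:4.1f}/s -> response {mean_response_time(lam, 1.0, 10):.2f} s")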
Comments:
The method is simple, just like many other queuing models. However, the math behind the solution involves simultaneous differential equations, which are hardly trivial.

I don't see why this method can't be used at any point in time, whether that be for an initial sizing for a proposed server software acquisition or down the road, when new user load is anticipated.

The key to using this method is the proper gathering of data to estimate lambda and mu. If the distributional assumptions are met, and the measurements for input are correct, the results will be valid.

Note that a model for a server farm where processors are not identical does not currently exist. If you take a close look at that problem, you'll realize the mathematical complexity of this technique.
 
One other comment. If you have particular circumstances that affect server response time, e.g. garbage collection, estimate and measure the time for that type of event and include those cases in your set of runs that determine lambda and mu. For instance, if we have a user transaction profile that consists of .50 map requests, .25 query requests, .20 feature info requests, and .05 network trace requests (exponential), run each of these types of requests and calculate mu. There is no reason that you can't make map requests, say, .48 and include .02 for time-consuming garbage collections. So mu in this case would equal

1.0 / (.48 * avg map secs + .02 * avg garbage secs + .25 * avg query secs + .20 * avg feature info secs + .05 * avg network trace secs)

Once again, if you measure these items correctly (with WAS, for instance), your performance estimates should be right on the money.
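For what it's worth, here's that mu calculation spelled out; the mix percentages come from the example above, while the per-request timings are made up for illustration.

    # A sketch of the weighted-mix mu calculation above. The shares are the
    # transaction profile from the comment; the average seconds per request
    # type are assumed numbers, stand-ins for measured values.
    mix = {
        # request type: (share of requests, assumed avg seconds per request)
        "map":           (0.48, 0.30),
        "gc":            (0.02, 2.00),
        "query":         (0.25, 0.50),
        "feature_info":  (0.20, 0.40),
        "network_trace": (0.05, 1.50),
    }
    avg_service_time = sum(share * secs for share, secs in mix.values())
    mu = 1.0 / avg_service_time        # service rate, requests per second
    print(f"mu = {mu:.3f} requests/sec")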
 
Hi Dick,
Nice to see you posting here :-).

Please note that simplicity was meant as a positive trait of your model. Complex models with hard-to-estimate parameters are seldom used in practice. Models that require custom programming to get the job done are not in widespread use either. So, simplicity adds significant value to any prediction technique.

The ability to use the model early was also meant with a positive connotation. At design time, we often have different choices for the granularity of services offered. Different granularity means, in the end, different lambda and mu. Having a simple model to estimate the performance profile adds to our design arsenal (of course, performance isn't the only concern), and that's a good thing.
Besides, during initial sizing the problem of heterogeneous servers in a farm is often not so important (we can assume we'll have homogeneous servers), while down the road it's more of a concern, as we are probably adding newer, more powerful machines to an existing farm.

About garbage collection and other phenomena (e.g. swapping / paging, resource contention, etc.), I'm afraid it's more complicated than that. We don't pay a small amount for garbage collection at every hit, especially with generational collectors. We end up paying big once in a while, when the collector has to scan the large generations. Unfortunately, under heavy load that happens more frequently, but the cost isn't really spread over each request. Swapping / paging is even more obvious: for reasonable loads, we basically have no paging. At some point, paging kicks in and adds a significant amount to every request. So it's not a constant cost that can be accounted for easily in your model.
In other models (usually more complicated, like "An Analytical Model for Multitier Internet Services and Its Applications", SIGMETRICS’05) those problems are somehow accounted for (although indirectly, see 3.3 in that paper). What is also useful in that model is that we can provide concurrency limits at every tier.

That said, more realistic models are usually much more complex, and we don't necessarily want to use them, especially early on. In fact, what I would really need is a simple model for multi-tier architectures. Several models I've seen require the estimation of lambda-i (lambda at tier i), which is somewhat wrong. We should only estimate lambda at the outermost tier (that is, users' behavior). Knowledge of the arrival rate at the other tiers (given the various mu-i) must be built into the model itself.
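Here's a toy sketch of what I mean, assuming a tandem network of M/M/1 tiers; the visit ratios and service rates are invented, and a real model would be more sophisticated, but note that only the outermost lambda is estimated.

    # Only the user-facing arrival rate (lam) is estimated. Each tier i is
    # visited visits[i] times per user request (its visit ratio), so the
    # inner arrival rates are derived by the model, not estimated separately.
    lam = 10.0                      # user requests/sec at the outermost tier
    visits = [1.0, 2.0, 4.0]        # assumed visits to each tier per user request
    mus = [50.0, 80.0, 120.0]       # assumed service rate of each tier (req/s)

    total = 0.0
    for v, mu in zip(visits, mus):
        lam_i = lam * v                        # arrival rate at this tier: derived
        if lam_i >= mu:
            raise ValueError("tier saturated")
        r_i = 1.0 / (mu - lam_i)               # M/M/1 mean response time per visit
        total += v * r_i                       # each user request pays v visits
        print(f"tier: lambda_i = {lam_i:.1f}/s, per-visit response = {r_i * 1000:.1f} ms")
    print(f"end-to-end response = {total * 1000:.1f} ms")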
 
So you're saying that the probability of garbage collection and paging increases with load.
If my formulation were expanded to handle the general case where all servers are not identical, then we would measure mu at various levels of load. That would capture the garbage collection/paging in the measures of mu, and the expected response should be accurate. Perhaps Trivedi or Harris and Gross are interested in solving this one?

Incidentally, I can write a stand-alone program that pages like crazy. Also, garbage collection can occur repeatedly in a single program running by itself, depending upon how variables go in and out of scope. It depends on how you write the code. However, I'll agree that paging/garbage collection are a function of load. But I'm not sure of the extent to which it underestimates mu.
 
Here's an idea.

My tool should gather the density functions of paging and garbage collection as a function of load.
The user might enter something like this:

Load/Server    Paging Time
 5             .05 sec
10             .15 sec
15             .25 sec

Load/Server    Garbage Collection Time
 5             .25 sec
10             .50 sec
15             .75 sec

A regression could produce continuous curves for each of these densities. Then, when calculating mu in the tool, I calculate mu as usual, varying load, and add on Paging Time and Garbage Collection Time to mu. This might work. But can users actually determine these numbers? How can they go about gathering this data?
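As a sketch of that regression step (reading "add on ... to mu" as folding the extra time into the service time behind mu), here is how the tables above could be turned into continuous curves; the base service time at zero load is an assumption.

    # Fit the paging and GC tables above with a linear regression, then fold
    # the load-dependent overhead into the service time whose reciprocal is mu.
    # The 0.2 s base service time (mu = 5/s with no overhead) is made up.
    import numpy as np

    load = np.array([5.0, 10.0, 15.0])
    paging = np.array([0.05, 0.15, 0.25])     # seconds, from the table
    gc = np.array([0.25, 0.50, 0.75])         # seconds, from the table

    paging_fit = np.polyfit(load, paging, 1)  # slope and intercept
    gc_fit = np.polyfit(load, gc, 1)

    def effective_mu(n: float, base_service: float = 0.2) -> float:
        """Service rate once load-dependent paging and GC time are included."""
        extra = np.polyval(paging_fit, n) + np.polyval(gc_fit, n)
        return 1.0 / (base_service + extra)

    for n in (5, 10, 15, 20):
        print(f"load {n:2d}: effective mu = {effective_mu(n):.2f} req/s")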
 
Well, if you have a working system you can measure the time spent in GC and paging with platform-dependent tools (for instance, the performance monitor on Windows), although it gets a little complicated.
Unfortunately, at design time it's quite hard to estimate the impact of GC and paging. It's very platform dependent.

Anyway, I would probably stay with a simpler model, at least at design time. Or go with an approach like the one in "Design-Level Performance Prediction of Component-Based Architecture", IEEE TSE, November 2005 (I referenced this paper a few posts ago) where you add specific knowledge of the target platform behavior into the model. But that's no longer a simple model, I'm afraid.

Speaking of tools: is the tool you described in the paper freely available? You provided all the details to create one, but I'm sure more than a few people won't mind getting yours :-)).
 
I'm debating whether to make my tool freely available. Actually, I'm hoping that someone from Microsoft, IBM, or Sun purchases the rights to the tool. The formula is public domain. The tool is copyrighted.

Here's another comment. Trivedi's original model was for a mainframe single-processor system. In this model, average request response time varies with load. In terms of the math, the expected service time increases with load. So, in effect, he captures changes in traffic. So what does increasing traffic mean? It could certainly represent paging and context switches between multiple concurrent tasks. That was true of mainframe OSes years ago. The factor that is different is GC.

Should we attribute an increase in expected response time to specific factors like GC or paging? Or just mathematically describe the phenomenon, that is, service time increases with load? I think my model may be correct. In testing at GE, my predicted number of servers was within 1 server of the actual number of servers achieving a certain level of performance. So I'm going to contend that the traffic factor in Trivedi's model, and my enhancement to his formulation, account for GC and paging. Testing should verify this contention. Unfortunately, my current assignment is software development, so I don't have the resources or the time to formally validate my model.
 
Here's an update on the server sizing tool. It is being used to size two projects at Level 3 Communications. The first project is for an unnamed country and involves an estimated 3500 servers. The second project is for the Department of Defense. In both cases, the estimates from the tool are within .05 of the estimates that Level 3 came up with via other methods.
 