The technical effort and publicity that Microsoft is putting into this campaign are remarkable. It first came to my attention through a blog at a site that caters to IBM Power System customers:
Quoting from one response - "what I find humorous is the use of the term 'ground-breaking benchmark study'. What it really amounts to is statistical smoke geared to those who don't have a clue about systems. Their target is mainly those in management positions who are unfamiliar with the hardware and will be captured by the dollar signs."
That response reminded me of an experience early in my career, while working in the IT department of a Fortune 100 company, which illustrates the danger of putting too much focus on a single metric. In our case, the IT department fell under the VP of Finance, whose personal performance appraisal included the metric "return on assets". That led to a top-down policy of leasing nearly all computer equipment. Exceptions were allowed, but required special approval - so managers who were lower down on the organizational chart just found it easier to lease.
My immediate supervisor finally did a comprehensive study of the cost of buying vs. leasing which favored buying - to the tune of several hundred thousand dollars per year. Unfortunately the leasing policy negatively impacted the company for many years due to a clueless oversight at the executive level - too much focus on a simple metric.
One irony of Microsoft's elaborate multi-year effort is that they could have simply written a CPU-bound benchmark - say a dozen lines of code that iterate through a string concatenation algorithm - then launched several instances on an IBM Power Server and several instances on a blade server, and produced about the same results: you accumulate more iterations per second on a 16-CPU Wintel blade than on an 8-CPU Power Server, for a fraction of the cost.
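To make the point concrete, a throwaway benchmark of that kind might look like the following Python sketch. The function name, the inner loop of 1000 concatenations, and the one-second duration are illustrative assumptions, not anything taken from Microsoft's report.

```python
import time

def concat_iterations(duration_s: float) -> int:
    """Count how many string-building passes complete in duration_s seconds."""
    deadline = time.perf_counter() + duration_s
    iterations = 0
    while time.perf_counter() < deadline:
        s = ""
        for i in range(1000):   # purely CPU-bound: no disk, network, or database
            s += str(i)
        iterations += 1
    return iterations

if __name__ == "__main__":
    duration = 1.0
    print(f"{concat_iterations(duration) / duration:.0f} iterations per second")
```

Launching one copy of the script per CPU and adding up the printed rates gives the "iterations per second" figure for a box. It says nothing about how that box handles a mixed workload.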
Of course, no one would give any heed to such a ridiculous benchmark - it's too far removed from any real-world workload - it would be meaningless. Microsoft's elaborate web-services workload is not like that! Or is it?
People who develop and deploy applications on IBM Power Systems are kind of puzzled that Microsoft would even sponsor such a configuration - placing an IBM model 570 solely in a web application server role, front-ending a database server. Why not use it the way it was designed - to handle both application server and database server roles at the same time?
Occasionally IBM Power Servers are used as database servers supporting Wintel application server farms, but for Microsoft to cast a Power Server only in the role of an application server suggests that the real intent of Microsoft's benchmark report was to cast IBM Power Servers in the most unfavorable light possible.
What would be the result of using the Power Server as both an application server and database server? In the case of the stock trading web service the database workload appears to represent a small fraction of the overall workload. It would probably have relatively little impact on overall throughput of the Power Server. But Microsoft would then be in the unfavorable position of needing to include the cost of its database servers in the report. And database servers tend to be costly.
Placing the database on the Power Server would pose another dilemma for Microsoft. Rather than a fantasy configuration of using two (2) databases on two (2) blade servers, they'd need to consolidate that, somehow. How real-world is it to maintain and update the same customer accounts on two separate databases on two separate blade servers, anyway? Replication services may be an exception. Let's get real.
The Power Server that Microsoft used in their study was running AIX - a version of Unix. A shop that runs Websphere under AIX is normally adding a web services workload to an existing AIX workload. In the real world, how do you run a Unix workload on an HP blade server? On a Power Server you just combine the workloads - not even a need to partition the box. On a blade server you need more blades - more OS instances - cost goes up.
What about load balancing? When running multiple application server instances on multiple blade servers you normally front-end them with load-balancers. The benchmark report didn't mention any load-balancers. A Power Server doesn't need any - just configure more HTTP Server / Websphere threads. Did Microsoft bypass the load-balancer requirement by configuring different Load Runner clients to access different IIS servers? How real-world is that?
The value of a Power Server like the one featured in the report is that it supports running many complex workloads concurrently. In addition to Websphere and database serving it might also be handling PHP, CGI, FTP, Telnet, Lotus Notes, Domino, business intelligence, batch reports, file serving, user authentication, etc.
Conventional wisdom is that complex workloads tend to destabilize Windows. So you find a lot of data centers around the world hosting Wintel server farms, taking up a lot of space, burning a lot of electricity, lots of cabling, running at only 15% capacity, maintained by a relatively large team of technicians.
Rather than debating about how Microsoft may NOT have tuned a Websphere workload appropriately, perhaps the debate should be about centralized architecture vs. distributed architecture. Granted that J2EE and ASP.Net are both distributed architectures. But a Power Server is geared for consolidated, centralized workloads.
Under distributed architecture you see sets of domain controllers / active directory servers handling network authentication and authorization - while databases and other products authenticate against multiple sources - including within the product itself.
Under centralized architecture you see applications and database servers, file servers and so forth running under OS defined user profiles - you don't have to go across a network to a different server to check authority.
Under distributed architecture you see large teams of technicians setting up servers and trying to tune performance through numerous manual settings - though not necessarily the type of tuning that Microsoft indicated in this report. Real-world tuning has more to do with analyzing and synchronizing workloads on different servers - lots of human effort.
Under distributed architecture you see different applications running on different servers. Someone has to keep track of software inventories, needed updates, and so forth - let alone hardware requirements.
Under a centralized architecture, low-level task dispatching in the underlying OS balances workloads across CPUs. That may sound a bit trite, but what it amounts to is that you have one server running thousands of different tasks, hundreds of different workloads, with say 60-90% CPU utilization, and good overall response times - without manual synchronization.
Under a blade architecture, a request may be dispatched to a server that's running a virus scan, or garbage collection, and users begin complaining about erratic response times. A request dispatcher may not have any idea what a CPU is doing.
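A toy model shows how that plays out. The sketch below assumes a plain round-robin dispatcher and a made-up rule that response time scales inversely with the CPU share left free on a blade; the class, the 100 ms base time, and the 25% figure are all hypothetical, and queueing is ignored.

```python
from dataclasses import dataclass
from itertools import cycle

@dataclass
class Blade:
    name: str
    free_cpu: float  # fraction of CPU not taken by a virus scan, GC, etc. (hypothetical)

    def response_time_ms(self, base_ms: float) -> float:
        # Toy model: a request only gets the CPU share that is left over.
        return base_ms / self.free_cpu

def round_robin(blades, n_requests, base_ms=100.0):
    """Dispatch requests in turn, with no knowledge of what each CPU is doing."""
    picker = cycle(blades)
    return [next(picker).response_time_ms(base_ms) for _ in range(n_requests)]

if __name__ == "__main__":
    blades = [Blade("blade-1", 1.0), Blade("blade-2", 0.25), Blade("blade-3", 1.0)]
    print(round_robin(blades, 6))
```

With one of three blades busy, every third request takes four times as long as its neighbors, which is the erratic pattern users notice.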
The stock trader benchmarks don't appear to have complex business rules. SPEC J2EE benchmarks are a bit more varied (interesting) - showing breakouts for browse, purchase, manage, and manufacturing. Average response times for three of the categories run in the three iterations per second range, but the average response for a manufacturing transaction is in the 2-3 seconds range, which is more common for many Web workloads.
In a different thread in this forum, I recently posted links to SPEC benchmarks which pitted HP-Unix-Intel against IBM-AIX-Power. They reported higher throughput for HP-Intel on purchase, browse, and manage, but higher throughput for IBM-Power on manufacturing - such that overall the benchmark fell in IBM's favor - by a significant margin. The lesson is to beware of looking at a single metric, or even a few metrics of performance. BTW, that thread was a request for a Load Runner report used in this benchmark, which seems to have been ignored.
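The arithmetic behind that kind of outcome is easy to sketch. The figures below are invented for illustration and are NOT taken from any SPEC report, and the plain sum used as the overall score is an assumption rather than SPEC's actual scoring rule.

```python
# Hypothetical throughput per category (operations per second).
# These numbers are made up to illustrate the point; they are not SPEC results.
system_a = {"purchase": 110, "browse": 220, "manage": 105, "manufacturing": 150}
system_b = {"purchase": 100, "browse": 200, "manage": 100, "manufacturing": 260}

def category_winners(a, b):
    """Name the system with the higher throughput in each category."""
    return {cat: ("A" if a[cat] > b[cat] else "B") for cat in a}

def overall(scores):
    # Assumption: the overall score is a simple sum across categories.
    return sum(scores.values())

if __name__ == "__main__":
    print(category_winners(system_a, system_b))
    print("A overall:", overall(system_a), "B overall:", overall(system_b))
```

System A wins three of the four categories yet loses overall, 585 to 660, so a report that quoted only purchase, browse, and manage would point to the wrong box.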
The stock trader benchmark appears to be misleading, in that the average reader might misinterpret it as an example of a mixed workload, when it's actually three separate snapshots of essentially the same workload - not a good example of real-world web services. IBM may have come up with the original workload, but that doesn't justify it. It doesn't even begin to approximate real-world workloads. A comprehensive ERP system consists of hundreds of different applications - hundreds of different database tables - thousands of URLs.
It's pretty common for organizations that deploy web services workloads on IBM Power Servers to have Websphere handle browser I/O, but use components running in the native environment to handle complex business rules and database I/O. Native languages and the native interfaces are more efficient. You don't see that with ASP.Net - where you see just about everything running under managed code - less efficient.
Some Power Server shops deploy all application components natively and achieve remarkable throughput. Native interfaces are more efficient. Granted that this thread deals specifically with Web application servers - but the real world is not limited to that.
Stock trader appears to rely on connection pooling managed by database servers. That seems to work in a really simple application. But with complex systems involving hundreds of database tables, that leads to memory leaks over time as more and more database processes open more and more database tables and views. In the real world, DB connection pools should be managed at the application level - applications should allocate and de-allocate resources according to end-user input - such as clicking the "Exit" link.
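As a rough illustration of application-managed pooling, here is a minimal Python sketch. It uses the standard-library sqlite3 module only as a stand-in for a real database driver, and the class and method names are hypothetical, not part of Websphere, ASP.Net, or the stock trader code.

```python
import queue
import sqlite3

class AppConnectionPool:
    """Fixed-size pool owned by the application rather than the database server."""

    def __init__(self, database: str, size: int):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets a pooled connection be handed to
            # whichever worker thread serves the next request.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    def acquire(self, timeout: float = 5.0) -> sqlite3.Connection:
        """Take a connection for a user session; raises queue.Empty on timeout."""
        return self._idle.get(timeout=timeout)

    def release(self, conn: sqlite3.Connection) -> None:
        """Give the connection back, e.g. from the handler behind an "Exit" link."""
        conn.rollback()  # discard any uncommitted work before the next user gets it
        self._idle.put(conn)

    def idle_count(self) -> int:
        return self._idle.qsize()

    def close_all(self) -> None:
        """Close every idle connection at application shutdown."""
        while True:
            try:
                self._idle.get_nowait().close()
            except queue.Empty:
                break
```

The pool size puts a hard cap on how many database processes can exist, and the application decides when a session's resources go back, instead of leaving that to the database server.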
Bottom line - beware of basing decisions on simplistic metrics - the ones reported here appear to be exceedingly misleading.
Nathan Andelin