|
Could Microsoft make available a detailed performance report from LoadRunner backing up the published benchmarks, specifically the one comparing WebSphere on Power 570, WebSphere on Blade, and .NET on Blade? At the least, the numbers for WebSphere on the Power 570 server would be most helpful. It would help if the report included HTTP response codes, bytes transferred, and response times. Microsoft published 8,016 TPS for Trade Web under WebSphere on Power 570, which has 8 CPUs IIRC. Are we to understand that transaction response times average 1 millisecond per CPU? That's 8,016 / 8 CPUs / 1000 milliseconds? Thanks, Nathan Andelin
| | Nathan Andelin | Just to follow-up, I was kind of hoping for a more detailed performance report - something perhaps comparable to the SPEC Java Application Server report format.
Here's one from HP:
Here's one from IBM:
I pulled these reports because the hardware & software configurations are quite comparable (4 cores, 2 chips, 2 cores per chip, 16 gig Memory, 2 app server instances, both running a version of Unix, & published the same year).
The main difference in software seems to be IBM Websphere/DB2 vs Oracle equivalents, while the main difference in hardware seems to be IBM Power vs. HP-Intel.
IBM reports 1,197 operations/second while HP reports 874, but the average/maximum/90th percentile response times were even more interesting to me.
Thanks,
Nathan Andelin.
| | Nathan Andelin | Nathan - I noted your persistence in terms of experience and background in some of the other threads. You can actually use the provided capacity tool to generate like results. The only reason why the whole Mercury file is not provided is that 1: full code is provided, 2: detail of scripts and what was enabled is provided, and 3: a small majority have a license that will allow the number of concurrent connections per the particular test tool.
The major item you bring up is the date of 2004. It is 2009, and that encapsulates at least one major polished change in Microsoft's Server Platform codebase / offerings, and numerous iterations of changes in CPU architecture. Five years of changes, and in most ways - as detailed by the results - a closing to a particular mark. You can reverse engineer everything you want to prove with the tools provided in the download, or duplicate it on any testing platform you wish, as the scripts are provided and easily translated to most popular web capacity/stress tools (Empirix, Mercury, etc.) as well as the one that is included. That is why all the extra is provided - prove it for yourself, since you have been bitten by the "promises" before. Everything you want to prove or disprove is already right there - you don't need the full Mercury logs because the data has been provided and / or can be extrapolated. Do you have a license for Mercury?
Jody | | CN2 Technology | Jody,
The SPEC reports I referenced are from February and August 2007 test dates - not 2004. IIRC, August 2007 was the same date that Microsoft published their Stock Trader benchmark. So the time frames seem relevant to me.
No, I don't have a Mercury license. And it would take a lot of effort for me to set up an environment to run my own tests. Unfortunately, I don't have the resources to pursue that.
It appears to be true that Microsoft has provided the source code and tools for individuals to pursue their own testing, but considering the effort involved, most organizations won't be doing that. It could be that individual test results may be substantially incongruent with Microsoft publications.
So it would be helpful if Microsoft backed up its brochure-style reports with actual details. Microsoft never clearly defined what "Transaction" means, for example.
Nathan.
| | Nathan Andelin | Transaction (as noted in paper) means:
1) For Web app test: a page returned to user in browser. The test script flow is fully documented; one transaction equates to multiple business service calls (local); db calls (to remote db); depending on the logic of the page. The number of such calls for equivalent pages in the app are the same for WebSphere and .NET implementations. But in this workload, the "transaction" is the entire page being returned to browser/user.
2) For the other two web service workloads; a "transaction" is simply a completed web service request; so more granular than for the Web app; except in this case all web service requests ("transactions") are business service ops called as remote web service requests from the included Capacity Planner tool.
-Greg Greg Leake, Microsoft | | Gregory Leake | Transaction ... means ... a page returned to user in browser.
See, that's part of the confusion. You're reporting 8,016 pages per second from a Java Application Server that has only 8 CPUs. That would be slightly more than one page per millisecond per CPU. That level of throughput is so far removed from any real-world workload that people have a hard time with the numbers. Nobody gets one-millisecond response times from Web Application / Database servers running on separate hardware tiers.
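The arithmetic behind that estimate can be sketched as follows. This is a minimal Python sketch using only the figures quoted in this thread; the 8-CPU count is recalled from memory ("IIRC") rather than confirmed, and the variable names are illustrative.

```python
# Figures quoted in this thread. The CPU count is an unconfirmed recollection.
published_tps = 8016   # Trade Web pages per second, WebSphere on Power 570
cpu_count = 8          # assumed number of CPUs in the Power 570

# Throughput each CPU would have to sustain.
pages_per_sec_per_cpu = published_tps / cpu_count

# CPU time available per page at that throughput, in milliseconds.
cpu_ms_per_page = 1000 / pages_per_sec_per_cpu

print(f"{pages_per_sec_per_cpu:.0f} pages/sec per CPU")
print(f"{cpu_ms_per_page:.3f} ms of CPU budget per page")
```

Strictly, this is the CPU time available per page at that throughput. The response time a client sees can be longer than this when many requests are in flight at once.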
Perhaps a detailed performance report is asking too much. For anyone having an instance of the Stock Trader, how about sampling byte counts and response times using Microsoft Fiddler, and indicating some results here. Fiddler is a great free tool from Microsoft that reports numbers that ordinary folks can relate to.
Thanks,
Nathan.
| | Nathan Andelin | Transaction ... means ... a page returned to user in browser.
I'm still having difficulty with the 1 page per millisecond per CPU indicated in the report. How could that be correct? I was hoping that someone else might try to confirm that, using Fiddler or Firebug or something, since I don't have Stock Trader running.
When I monitor other applications (including Microsoft applications), I don't see that type of response time. For example, Fiddler indicates that it takes about 2.2 seconds to return this page from social.msdn.microsoft.com. The page itself takes about 400 ms, but spawns 5 additional asynchronous requests, evidently to dynamically update parts of the page. That's still an average of 366 ms per request. And the page is not very complex.
The total byte count returned is about 12K, which seems normal for an application of this type. Can't blame the 2.2 second response time on the Internet. It only takes about 60 ms for me to download that amount of data. The majority of time appears to be spent at the site hosting the application.
- Nathan
| | Nathan Andelin | Hi Nathan,
I am working on getting you the data you request. Basically, I will do two complete runs (one .NET, one WebSphere 7) in Mercury, and break these out in a paper with screenshots and tabular data: min, max, and avg response times; total bytes and total bytes/sec; response codes (error rates); etc.
I am doing these runs on Win Server 2008, on HP DL380 dual-quad core setup (64-bit for all stacks, OS and WebSphere/.NET); with a single DL380 HP dual-quad core database with appropriate RAID arrays for fast data access (reads and writes). It will still take me a few days to complete; this is a supplemental report, not meant as a replacement at all for the current paper, for which all data remains valid. However, not all of that equipment is still available to me; hence getting you data on the setup listed above.
This includes one additional tuning setting that IBM has recommended (-Xgcthread8 on JVM arguments). I am doing this to verify data on a modified script another customer is asking about, which concentrates more on buys/sells (with the series of transactions these spawn in both apps); so the results are not directly comparable to the full scripts run in the complete paper, although if you insist I could run these as well on the equipment available to me.
One thing to keep in mind wrt response times: we are testing on a dedicated 1GB backbone, driving load from 1.GHz clients (32 physical machines). In the tests, which are designed to stress the app server, not the network, Mercury is getting back the app-server-generated HTML, but is specifically set not to return images from each page. The web app workload does stress the app server in a typical way, with multiple business logic calls from the JSP/ASP.NET pages, and multiple database calls (JDBC vs. ADO.NET) per page, with each page returned marked as a transaction. Think times are 1 second, as before, but of course the 1 second is not part of the response time, which I believe Mercury marks as time to last byte received on a page request.
In both cases (for WebSphere and .NET), a fairly typical response time pattern is observed; that is to say, response times remain flat up to server saturation, then elbow up as further requests must be queued by each app server. The intent is to measure each app server at peak TPS rates, meaning at a concurrent user load just at saturation of the app server, which happens at or near 100% CPU utilization of the app server tier.
In a nutshell, yes, the response times as tracked by Mercury can be as low as .001 seconds (even lower) on average before server saturation is hit, on our dedicated backbone, which consists of clients on one switch, linked to a separate LinkSys switch to which all servers are wired. The communication to the DB, from the app tier, happens on a separate dedicated subnet on dedicated switches to which app servers and database are wired.
Both apps are extremely fast; both do the same amount of work, db queries, transactions, etc. And I believe the workload is realistic for a properly coded, data-driven web app. Browser render time is not captured, as Mercury clients are not rendering the HTML received, just getting it back and making sure response codes are correct. Consistency checking is also done post testing, to ensure that the number of registered users, buys, and sells actually reaching the database matches what Mercury reports.
I appreciate your interest in this benchmark; I still 100% stand behind the results, and the comparisons made in the paper, and I hope the extra detail I am working on helps convince you of the same.
-Greg Greg Leake, Microsoft | | Gregory Leake | Hi Greg,
Thank you for your follow-up on this request. The idea of you testing on scaled-back hardware also appeals to me. Pairing a set of quad-core application servers with a set of quad-core database servers seems to be a more realistic configuration for small-to-medium sized organizations - I relate better to that.
If it's not too late, I'd also suggest that you run against a single data store as opposed to having separate application servers updating the same data in separate data stores. In a real-world setting you wouldn't want to store an individual's account information on two separate DB instances, except for backup purposes.
-Nathan
| | Nathan Andelin | Yep; I am running against a single database instance as you suggest. Look for data soon; I am going to try to finish this up over the holiday weekend. Note we used two databases in the original report because both the Power 570 server and the 4 HP Blades w/ Win Server are capable of such high throughput rates as to shift the bottleneck to the database tier with a single database instance. This allowed us to ensure we were properly measuring true app server peak throughput rates.
On the scaled back middle tier (single HP Intel/Win Server 2008 app server); this will not be the case; a single database server on fast raid arrays will suffice.
-Greg Greg Leake, Microsoft | | Gregory Leake |
Nathan,
Here are the stats for two runs each (WebSphere and .NET) for the Web App Test. Note these results are not to be directly compared to the published paper since they are on different hardware (single quad-core) and use a different script (shorter, more emphasis on register, buy, sell, logout). The original results in the paper are accurate; this is just to provide more detail that Nathan requested. The results are recorded using LoadRunner operating on 32 physical client machines (as before) with a 1-second think time between requests. A transaction response time is defined as the time it takes for a full page (less images) to be returned to the agent; in other words, each tx is a page request by the user. The first two runs below are for .NET StockTrader 2.04 and WebSphere 7 Trade at 2,000 concurrent users, which is below server saturation for both app servers. You can see extremely low response times. The last two runs are at 2,900 concurrent users, which has just hit server saturation (~100% CPU) for both app servers. Response times are just beginning to climb. Adding more users beyond 2,900 would result, for both app servers, in no increase in TPS, but rapidly spiking response times as further concurrent requests need to be queued by both app servers. Bytes per second are higher for .NET because the HTML is more stylized; avg bytes of HTML per page is slightly higher because of the extra layout tags. This gives a slight artificial edge to IBM WebSphere, although the byte size per page is quite close to .NET. In this shorter script, .NET and WebSphere test out about the same in peak TPS rates. One interesting observation is that .NET actually processes buys and sells more quickly than WebSphere; however, since the script is shorter, there are more logins (register does a login) and logouts per script iteration.
These turn out to be the most expensive ops in the .NET StockTrader based on its design, since both do a Response.Redirect from the server after an ASP.NET Web Form postback, meaning the ASP.NET pipeline is entered twice, as if two pages are being executed instead of just one. Register redirects to the trade home page on successful completion; logout destroys the forms auth session, then redirects to the login page. This design could be changed, but seems quite typical for such ASP.NET ops where the end result of a postback is to go to another page in the app. At any rate, the results are very equivalent for .NET and WebSphere 7 on this shorter script/test bed setup.
App Server: 1 HP/Intel Quad-Core @ 3.00 GHz
Database: 1 HP/Dual Quad-Core @ 2.7 GHz (SANS Array Storage, two controllers)

DOTNET @ 2000 Users
Tx Response Times (seconds)
            Min     Avg     Max     Median  Std Dev
Register:   .002    .005    .01     .004    .002
Quotes:     .001    .001    .003    .001    .001
Buy:        .001    .002    .005    .002    .001
Sell:       .001    .002    .005    .002    .001
Logout:     .001    .001    .006    .001    .001
Bytes/Sec:  20,658,415
Failed Txs: 0
Tx/Sec:     1,987.30

WebSphere 7 Trade @ 2000 Users
Tx Response Times (seconds)
            Min     Avg     Max     Median  Std Dev
Register:   .001    .002    .006    .002    .001
Quotes:     .001    .001    .004    .001    .001
Buy:        .006    .008    .015    .008    .002
Sell:       .006    .008    .014    .008    .001
Logout:     .001    .001    .004    .001    .001
Bytes/Sec:  17,266,031
Failed Txs: 0
Tx/Sec:     1,983.20

DOTNET @ 2900 Users
Tx Response Times (seconds)
            Min     Avg     Max     Median  Std Dev
Register:   .091    .113    .135    .112    .011
Quotes:     .039    .05     .061    .05     .006
Buy:        .05     .061    .071    .06     .005
Sell:       .049    .06     .071    .06     .005
Logout:     .081    .103    .125    .102    .011
Bytes/Sec:  28,467,761
Failed Txs: 0
Tx/Sec:     2,693

WebSphere 7 Trade @ 2900 Users
Tx Response Times (seconds)
            Min     Avg     Max     Median  Std Dev
Register:   .031    .034    .039    .033    .002
Quotes:     .018    .02     .024    .02     .001
Buy:        .045    .048    .053    .048    .002
Sell:       .037    .039    .043    .039    .002
Logout:     .016    .019    .022    .018    .001
Bytes/Sec:  24,364,131
Failed Txs: 0
Tx/Sec:     2,796
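
As a rough cross-check of the four runs above, the user counts, the 1-second think time, and the reported Tx/Sec can be related through the standard closed-loop relation R = N / X - Z (users divided by throughput, minus think time). The following is a minimal Python sketch, not part of the LoadRunner output. It assumes every virtual user strictly alternates between one request and one 1-second think, and it ignores ramp-up. The function name is illustrative.

```python
def implied_response_time(users: int, tps: float, think_time_s: float = 1.0) -> float:
    """Average response time implied by a closed load loop: R = N / X - Z."""
    return users / tps - think_time_s

# (label, concurrent users, reported Tx/Sec), copied from the runs above.
runs = [
    ("DOTNET @ 2000", 2000, 1987.30),
    ("WebSphere 7 Trade @ 2000", 2000, 1983.20),
    ("DOTNET @ 2900", 2900, 2693.0),
    ("WebSphere 7 Trade @ 2900", 2900, 2796.0),
]

for label, users, tps in runs:
    implied_s = implied_response_time(users, tps)
    print(f"{label:<26} implied avg response: {implied_s * 1000:6.1f} ms")
```

Under those assumptions the implied averages come out in the single-digit milliseconds at 2,000 users and in the tens of milliseconds at 2,900 users, which is the same range as the per-operation averages listed above.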
Greg Leake, Microsoft | | Gregory Leake | Thanks for the detail, Greg. That gives me an idea of where you're coming from.
This discussion motivated me to download a trial version of HP LoadRunner, to do some stress testing against some of my Web applications, and to examine their performance. I discovered that LoadRunner can generate a formatted document, to which I added text for context. Following is a link to the report:
I don't really know enough about StockTrader to draw decent comparisons, but my report considers the price/performance of some real-world applications which run under IBM i OS & IBM Power Servers. I calculated a price/performance ratio of $12.54 for my applications.
You might be able to roughly compare that to the $32.45, $7.92, and $3.99 figures that you reported in your benchmark for WebSphere on Power, WebSphere on HP Blade, and .NET on HP Blade.
But note that the $12.54 in my report includes the cost of the application server and database server combined, since the database is part of IBM i OS. Consolidating workloads on a single server tends to reduce cost of ownership over time, too.
In the case of the .Net application, the $3.99 price/performance ratio included the $50K price of the application server, but left out the $300K (ballpark cost) for the database servers. By adding in $300K, the $3.99 .Net figure would go to $27.83.
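A minimal sketch of that adjustment in Python, assuming the ratio is dollars of acquisition cost per unit of peak throughput (the unit is not stated in this post) and using the rounded $50K and $300K figures. All variable names are illustrative.

```python
# Rounded figures from this post; the per-TPS unit is an assumption.
app_server_cost = 50_000    # price of the .NET application server
db_server_cost = 300_000    # ballpark cost of the database servers
reported_ratio = 3.99       # published price/performance for .NET on HP Blade

# Throughput implied by the published ratio and the app server price alone.
implied_throughput = app_server_cost / reported_ratio

# Same throughput, with the database servers added to the cost.
combined_ratio = (app_server_cost + db_server_cost) / implied_throughput

print(f"implied throughput: {implied_throughput:,.0f}")
print(f"combined price/performance: ${combined_ratio:.2f}")
```

With these rounded inputs the combined ratio is simply 3.99 scaled by 350/50, which lands about ten cents from the $27.83 quoted above. The small gap presumably comes from unrounded prices.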
Hope that helps.
-Nathan
| | Nathan Andelin | Hi Nathan,
That is a start. Some questions though:
1) I do think you need to compare equivalent workloads. Without this, the comparison is likely not very valid.
2) Please break out your ~300K cost in database servers, including DB2 and SQL Server. This would be interesting to add to the equation. Not sure what DB software you are pricing, and how. Most of the cost in database, I believe, is in the storage (SANS array), not the server running the database itself, or necessarily in the DB software. I do believe, but am willing to be proven wrong, that in terms of software, SQL Server Std or Enterprise really beats Oracle and DB/2 in terms of overall price, especially price-perf ratios.
With that said, this benchmark was meant to measure middle tier (not Database) hardware + software acquisition costs; inclusive of overall perf; price paid for; and price-perf metrics.
More comments welcome! The debate around cost and overall perf can only benefit customers in the long run (and, obviously, I think MSFT, since we tend to compete on cost!)
-Greg Greg Leake, Microsoft | | Gregory Leake | Hi Greg,
I understand that the workloads are not equivalent, which makes a price/performance comparison pretty rough, but there's some indication in the study that my workloads may be more CPU & DB I/O intensive than yours. You have some transactions that run in the 1-2 millisecond range, while all mine use more CPU than that.
For example, just to generate a list of menu items under OnePoint Portal requires a lot of DB I/O because the Portal only shows items that the current user is authorized to. For every item on the menu, the Portal checks to see if the user is authorized to it, or if the user belongs to a group which is authorized to it. Otherwise the item is not even output to the screen. None of that is cached I/O; an administrator could change user or group authority real time, and the results would be immediately reflected on the screen. These applications are scoped for real-world use, not just for benchmark studies.
Regarding the $300K ballpark cost of your database servers, I wasn't even considering the cost of the SANS array, now that you mention it. But your report indicated that you were running SQL Server Enterprise Edition on 8 processors & 32 cores, so that cost alone would run about 8 * $23.5K = $188K. You priced your HP blade servers at 50K, and indicated that you were using a couple of even higher capacity HP blade servers to process DB I/O.
I understand the problem of doing rough comparisons of different Web applications, but I don't understand your rationale for leaving out database server costs. Your applications obviously require a database.
-Nathan
| | Nathan Andelin | Hi Nathan,
The rationale for leaving the DB pricing out of the cost is the same as for leaving the DB out of the system under test (it is not a limiting resource; the focus is purely on the application middle tier). It's fine to do it either way, of course; however, given the capacity of the databases, it would be common to have other separate middle tier servers and apps running against the same database(s), since the DB is not a limiting resource by design. If you do want to roll the database costs into the overall cost (even though the dbs have capacity to simultaneously support other apps, and could be downsized/down-priced):
1) The hardware cost for SQL Server 2008 and DB2 9.5/9.7 is the same since both are running on exactly the same hardware. I have not priced the disk arrays; but the two 32-core Intel/HP (quad quad-cores) blades are each about $20K. So this same amount would be added to both IBM WebSphere costs and MS .NET costs.
2) Then, the only difference becomes the price of DB2 vs. SQL Server. I do not think there is much difference there either, although each would need to be priced out.
But the bottom line in these tests is that we focus on the middle tier, including performance and cost. Database access is involved, to make the workloads realistic; but the databases are super-sized to ensure they are not a bottleneck in the tests, and we are truly measuring peak TPS of the app server tier. In both cases (WebSphere and .NET), the database hardware and software cost is not included for those reasons, although you could add it in if you want, to both the WebSphere (with DB2) and .NET (with SQL Server 2008) costs.
-Greg Greg Leake, Microsoft | | Gregory Leake |
|