We just got a new server at work that will be used only for running SAS code, and I've been asked to run some tests to make sure it's performing better than our other servers. I'm not an expert at this, so I want to avoid naive mistakes that would keep me from properly measuring the server's performance. My header looks like this:
options fullstimer;
%LET BenchStartTime = %sysfunc(datetime(),22.);
I use that datetime as a check against the "real time" reported in the log. I have a vague understanding of the difference between "user cpu time" and "system cpu time", but if anyone wants to offer up additional information on that, it would be helpful.
Anyway, the main point of this post is that I want to know whether there are any standard benchmark tests I should be using to see if this new server is better than the old ones. Currently I'm using something I found online that just appends a bunch of copies of sashelp.class, but I think this might be a bad idea: pulling from the C: drive and loading it onto a different drive might perform the same across all servers, right? If the C: drive is the slowest drive, doesn't that become the bottleneck? I'm also using code that basically generates a bunch of random data of a fixed size and compares runtimes. Is this the correct approach? Is there something else I should be doing? How many times should I run these benchmark tests to make sure the result isn't a fluke?
Thanks for your help!
I would test by doing the things that you normally do. If you run large merges, then you're basically talking I/O, so just make a very large dataset, write it out, read it in, etc., and perform the same test on the other machine. Run each test a few times, each time in a fresh SAS session.
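For example, something along these lines (a sketch only: the row count, variables, and library path are placeholders you'd size to look like your real work). With fullstimer on, each step writes its own real/user/system times to the log:

options fullstimer;

/* 1. Generate a big dataset of random values in WORK (row count is a placeholder) */
data work.bench_raw;
    do i = 1 to 20000000;
        x = ranuni(0);
        y = rannor(0);
        output;
    end;
run;

/* 2. Write it out to the disk you actually want to test (path is a placeholder) */
libname bench "/path/on/new/server";
data bench.big;
    set work.bench_raw;
run;

/* 3. Read it back and make SAS work: a sort is both I/O- and CPU-bound */
proc sort data=bench.big out=work.bench_sorted;
    by y x;
run;

Compare the FULLSTIMER output for each step across servers.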
Further, it sounds like you need to make sure the new server can handle multiple concurrent sessions. You can simulate this in part by submitting many connections from one computer using SAS/CONNECT, which lets you start multiple concurrent sessions. For example, create a script that starts a local SAS session, signs on to the server, and rsubmits a job of normal difficulty that takes maybe 5 to 10 minutes; then run that script 20 times concurrently (you can script this or just double-click a .bat file 20 times). See how the new server handles it compared to the other server.
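The SAS/CONNECT part of that script might look roughly like this (the server name, port, and options are placeholders, and how you authenticate depends on your setup):

options comamid=tcp;
%let newsrv = your-new-server 7551;   /* host and port are placeholders */

signon newsrv;

rsubmit newsrv;
    /* the "normal difficulty" 5-10 minute job goes here */
endrsubmit;

signoff newsrv;

Save it as a .sas program (any name you like), have the .bat file launch it with sas.exe -sysin, and then start that .bat 20 times.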
On SAS 9.2 and later you can use the %SASHOME%/SASFoundation/9.x/sasiotest.exe utility. You will find some further guidance in Support note 51659.
Edit:
A similar utility can be downloaded for Unix platforms; details are in Support note 51660.
Is there a Progress profiling tool that allows me to see the queries executing against an OpenEdge database?
We're doing a migration from an OpenEdge database into a SQL database. In order to map the data correctly we'd like to run certain application reports on the OpenEdge database and see what database queries are being executed to retrieve the data.
Is this possible with some kind of Progress profiling tool (a la SQL Server Profiler)? Preferably free...
Progress is record oriented, not set oriented like SQL, so your reports aren't a single query or a set of queries; they are more likely a lot of record lookups combined with what you'd consider query-like operations.
Depending on the version you're running, there is a way to send a signal to the client to see what it is currently doing; however, doing so will almost certainly not give you enough information to discern what's going on "under the hood."
Long story short, your first option is to get a DataServer product so you can attach the Progress client to an SQL database - this will enable you to use an SQL database without losing the Progress functionality. The second option is to get a copy of the program's source code to find out how the reports are structured.
Tim is quite right -- without the source code, looking at the queries is unlikely to provide you with much insight.
Nonetheless, there are some tools and capabilities that will provide information about queries. Probably the most useful for your purpose would be to specify something similar to:
-logentrytypes QryInfo -logginglevel 3 -clientlog "mylog.log"
at session startup.
You can use session triggers to identify almost anything done by any program, without modifying or having access to the source of those programs. Setting this up may be more work than it is worth for your purpose. We have a testing system built around this idea. One big flaw: triggers cannot be fired for CAN-FIND.
I am using SQLite from C++ and I want to implement a progress bar that will inform the user about the progress of a search.
Using sqlite3_progress_handler I can set a callback to be called every N virtual machine instructions. This is fine for an indeterminate progress bar that tells the user the app is still working.
What I need is progress from 0 to 100%. Can this be done?
I realize that this is a bit late, and that this question already has an accepted answer, but I think a little more information would be useful.
As the OP noted in the question, the sqlite3_progress_handler can be configured to call the callback function every N VM instructions. This is not a time progress monitor or a query statement progress monitor, but a monitor for the VM instructions that the query planner has calculated for the query (which will do in a pinch).
By prefacing the query with 'EXPLAIN QUERY PLAN' (or just 'EXPLAIN' for short) and stepping the results to get a count, you will know how many VM instructions are in the query plan. There's your 100% figure.
BE SURE to read the caveats on the SQLite.org website about the EXPLAIN QUERY PLAN command, especially the part about not relying on the output format. But for this situation, we're not concerned with the information in the results, but only the number of instructions.
As of SQLite version 3.24.0 (2018-06-04), the output format for the EXPLAIN QUERY PLAN command diverged significantly from the EXPLAIN command. EXPLAIN QUERY PLAN is no longer suitable for this use case; you have to specify the EXPLAIN command itself.
To be clear, the EXPLAIN command is not published as supporting the sqlite3_progress_handler() API. The fact that it can be used in this case is purely coincidental. You should always test that this works on whichever version of SQLite you are using.
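For reference, here is a minimal C++ sketch of that approach. It is my own illustration, not tested against every SQLite version; error handling is trimmed, and per the caveat above you should verify that plain EXPLAIN behaves this way on your build. Because loops re-execute opcodes, treat the percentage as an estimate and clamp it.

#include <sqlite3.h>
#include <cstdio>
#include <string>

struct Progress {
    int total_ops = 0;   // opcodes in the compiled program, counted via EXPLAIN
    int period    = 50;  // N passed to sqlite3_progress_handler
    int calls     = 0;   // how many times the handler has fired
};

static int on_progress(void* arg) {
    Progress* p = static_cast<Progress*>(arg);
    ++p->calls;
    int pct = p->total_ops ? (p->calls * p->period * 100) / p->total_ops : 0;
    if (pct > 100) pct = 100;   // loops re-execute opcodes, so this is only an estimate
    std::printf("\rprogress: %d%%", pct);
    return 0;                   // returning non-zero would abort the query
}

void run_with_progress(sqlite3* db, const std::string& sql) {
    Progress p;
    sqlite3_stmt* stmt = nullptr;

    // 1. Count the VM instructions: each row returned by EXPLAIN is one opcode.
    std::string explain_sql = "EXPLAIN " + sql;
    if (sqlite3_prepare_v2(db, explain_sql.c_str(), -1, &stmt, nullptr) == SQLITE_OK) {
        while (sqlite3_step(stmt) == SQLITE_ROW) ++p.total_ops;
    }
    sqlite3_finalize(stmt);

    // 2. Run the real query with the progress handler installed.
    sqlite3_progress_handler(db, p.period, on_progress, &p);
    if (sqlite3_prepare_v2(db, sql.c_str(), -1, &stmt, nullptr) == SQLITE_OK) {
        while (sqlite3_step(stmt) == SQLITE_ROW) { /* consume rows here */ }
    }
    sqlite3_finalize(stmt);
    sqlite3_progress_handler(db, 0, nullptr, nullptr);   // uninstall the handler
    std::printf("\n");
}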
It is not possible for the database to predict how much time (or how many VM instructions) a query will need.
I currently have a Django site, and it's kind of slow, so I want to understand what's going on. How can I profile it so as to differentiate between:
effect of the network
effect of the hosting I'm using
effect of the javascript
effect of the server side execution (python code) and sql access.
any other effect I am not considering due to the massive headache I happen to have tonight.
Of course, for some of them I can use Firebug, but some effects are correlated (e.g. JavaScript could appear slow because it's doing slow network access).
Thanks
client side:
Check with Firebug which page components take long to load, and how long the browser needs to render the page after loading is completed. If everything loads quickly but rendering takes its time, then your HTML/CSS/JS is probably the problem; otherwise it's server side.
server side (I assume you sit on some Unix-like server):
Check the web server with some small static content (a small GIF or a little HTML page), using Apache Bench (ab, part of the Apache web server package) or httperf. The server should be able to answer at least 100 requests per second (of course this depends heavily on the size of your test content, web server type, hardware and other stuff, so don't take that 100 too seriously). If that looks good,
test Django with ab or httperf on a "static view" (one that doesn't use a database object). If that's slow, it's a hint that you need more CPU power; check CPU utilization on the server with top. If that's OK, the problem might be in the way the web server executes the Python code.
If serving semi-static content is OK, your problem might be the database or I/O-bound. Database problems are a wide field; here is some general advice:
Check I/O throughput with iostat. If you see lots of writes, then you have to get a better disk subsystem (faster RAID, SSD drives...) or optimize your application to write less.
If it's lots of reads, the host might not have enough RAM dedicated as file system buffer, or your database queries might not be optimized.
If I/O looks OK, then the database might not be suited for your workload or not correctly configured. Logging slow queries and monitoring database activity, locks, etc. might give you some idea.
If you let us know what hardware/software you use, I might be able to give more detailed advice.
Edit/PS: forgot one thing: of course, your app might have a bad design and do lots of unnecessary/inefficient things...
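If you want a quick way to separate "Python time" from "SQL time" per request without extra tooling, a small timing middleware is enough. This is only a minimal sketch: it uses the newer callable middleware style (older Django versions would need process_request/process_response instead) and relies on DEBUG=True so that connection.queries is populated.

import logging
import time

from django.db import connection

logger = logging.getLogger(__name__)

class TimingMiddleware:
    """Log total request time and time spent in SQL (requires DEBUG=True)."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        start = time.time()
        response = self.get_response(request)
        total = time.time() - start
        # connection.queries holds {'sql': ..., 'time': '0.002'} dicts when DEBUG=True
        sql_time = sum(float(q["time"]) for q in connection.queries)
        logger.info("%s took %.3fs total, %d queries, %.3fs in SQL",
                    request.path, total, len(connection.queries), sql_time)
        return response

Add it to your middleware settings; if the SQL time dominates, look at your queries and indexes, otherwise the time is going to the Python side (or further up the stack).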
Take a look at the Django Debug Toolbar - that'll help you with the server-side code (e.g. what database queries ran and how long they took), and it's generally a great resource for Django development.
The other, non-Django-specific bits you could profile with YSlow.
There are various tools, but problems like this are not hard to find because they are big.
You have a problem, and when you remove it, you will experience a speedup. Suppose that speedup is some factor, like 2x. That means the program is spending 50% of its time waiting for the slow part. What I do is just stop it a few times and see what it's waiting for. In this case, I would see the problem 50% of the times I stop it.
First I would do this on the client side. If I see that the 50% is spent waiting for the server, then I would try stopping it on the server side. Then if I see it is waiting for SQL queries, I could look at those.
What I'm almost certain to find out is that more work is being requested than is actually needed. It is not usually something esoteric like a "hotspot" or an "algorithm". It is usually something dumb, like doing multiple queries when one would have been sufficient, so as to avoid having to write the code to save the result from the first query.
Here's an example.
First things first; make sure you know which pages are slow. You might be surprised. I recommend django_dumpslow.
SQLite is a great little database, but I am having an issue with it on Windows. It can take up to 50 seconds to perform a query on a 100MB database the first time the application is launched. Subsequent loads take 10% of that time.
After some discussions on the SQLite mailing list, I am told
"The bug is in Windows. It aggressively pre-caches big database files
-- reads in big chunks of the files -- to make it look as if programs
like Outlook are better than they really are. Unfortunately although
this speeds up some programs it makes others act jerky because they
have no control over how much is read when they ask for just a few
bytes of file."
This problem is compounded because there is no way to get progress information from SQLite while all this is happening, so my users think something is broken. (I could display a dummy progress report, but that is really cheesy for a sharp tool.)
I believe there is a way to turn the pre-caching off globally, but is there some way around this programmatically?
I don't know how to fix the caching problem, but 50 seconds sounds extreme. If the query itself takes 10% of that, that means 45 seconds to load a 100MB file. Even if Windows does read in the entire file in one go, that shouldn't take more than a couple of seconds given normal hard drive speeds.
Is the file very fragmented or something?
It sounds to me like there's more than just precaching at play here.
I too am having the same problem with my first query. The problem returns after not querying the database for a long time; it seems to be a memory caching problem. My software runs 24/7, and every once in a while the user performs the SELECT query. I am also performing the query on a database of the same size.
I want to stress test a web service method by calling it several thousand times in quick succession. The method has a single string parameter that I will vary on each call.
I'm planning on writing a PowerShell script to loop and call this method a number of times.
Is there a better way to do this?
If you run call after call, it's not going to help you much, as it will not show you how the service behaves under a heavy load of many simultaneous connections.
Go with some multi-threaded solution (I do not know if PowerShell has this).
Some open-source testing tools are listed here. Just set your web service to accept GET requests as well, not only SOAP (the default), so you can form the URLs.
For these situations I would use JMeter. You need to play around with it first, but it is very flexible: it will run requests in different threads, display the results graphically, and let you script your jobs.
I would also recommend not running it on the same machine as the server and, if possible, starting two or more instances on different machines to generate the load.
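Once you have a test plan saved, you can also run JMeter headless from the command line on each load-generating machine (the file names here are just placeholders):

jmeter -n -t webservice-plan.jmx -l results.jtl

-n runs it without the GUI, -t points to the test plan, and -l is where the sample results are written.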
Personally, I'd use something like openSTA.
This allows a session with a web site to be recorded and then played back via a relatively simple script language.
You can also easily test web services and write your own scripts.
It allows you to put scripts together in a test in any way you want and configure the number of iterations, the number of users in each iteration, the ramp up time to introduce each new user and the delay between each iteration. Tests can also be scheduled in the future.
It's open source and free.
It produces a number of reports which can be saved to a spreadsheet. We then use a pivot table to easily analyse and graph the results.
It really depends on whether there are other requirements, such as logging, history, etc.
If quick and dirty is all you need, then you're good.
If you need something more robust, then you can look at either custom-building a test harness in the language of your choice or using things such as Mercury, MS Team Tester, NUnit, or the like.
For those who stumble upon this and don't want to use the third-party options already mentioned: as the accepted answer notes, a multi-threaded solution is best, and this can be achieved in PowerShell:
ForEach -Parallel ($item in $collection) { }
https://learn.microsoft.com/en-us/powershell/module/psworkflow/about/about_foreach-parallel
Although the number of threads could be limited to 5: Does powershell's parallel foreach use at most 5 thread?
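A minimal sketch of that approach (my own example, not from the linked answer): it assumes PowerShell 3.0+ with workflow support, that the service accepts a simple GET with the string parameter (as suggested above), and that the URL and values below are placeholders. Note that foreach -parallel is only valid inside a workflow, and -throttlelimit needs PowerShell 4.0 or later.

workflow Invoke-StressTest {
    param([string[]]$Values)
    foreach -parallel -throttlelimit 20 ($value in $Values) {
        # InlineScript runs ordinary PowerShell; $using: pulls in the workflow variable
        InlineScript {
            Invoke-WebRequest -Uri "http://yourserver/Service.asmx/YourMethod?input=$using:value" -UseBasicParsing | Out-Null
        }
    }
}

$inputs = 1..2000 | ForEach-Object { "test$_" }
Invoke-StressTest -Values $inputs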