I've been tasked with optimizing some scheduled tasks that run for hours. One of the tasks runs through data from 1995 to the present. My first thought (besides revising queries, etc.) was to create a cfloop through all the years and start a thread for each year.
Is this a good approach? Is there a better way to divide up the workload on such a task?
You really need to work out what is slow before you optimize anything. Otherwise you may spend a lot of time tweaking code for relatively small gains. Databases are often the bottleneck but you need to find that out first.
At a simple level, you can enable debugging (on a dev machine) and see where time is being spent. There are also tools like Fusion Reactor which will give you more insight. Alternatively, you can just add some <cflog> calls to your script and then analyze them to identify the slow blocks. Whichever way you decide to do it, you need to know where your effort is best spent.
Some other thoughts....
Does the data change?
If not, then compile the data once and store it, so that the scheduled tasks don't have to redo the work each time.
Which version of CF are you on?
You could run out of threads if you are not careful - which would be particularly bad if your server is running other stuff. But yes, threads could be part of your solution.
I'm writing a high-performance application (a raytracer) in C++ using Visual Studio, and I just spent two days trying to root out a performance drop I witnessed after refactoring the code. The reason it took so long was that the drop was smaller than the normal variation in execution time I see from run to run.
Not sure if this is normal, but sometimes the program may run at around 33fps pretty consistently, then if you close and rerun, it may run at 37fps. This means that in order to test any new change, I had to manually run and rerun until I witnessed peak performance (and this could require up to ten runs). Simply running it for some large number of frames and measuring the time doesn't fix this variability. For example, if the program runs for 40 seconds on average, it will nevertheless vary by 1-2 seconds, which makes this test nearly useless for detecting the 1 millisecond per frame performance loss I was dealing with.
Visual Studio's profiling tools also didn't help find an issue this small, because they are subject to the same variation, and in any case they won't necessarily point to the exact offending line, so I have to test solutions, and the profiler is not very effective at confirming a proposed solution's efficacy.
I realize this all may sound like premature optimization, but I don't think it is because I'm optimizing only after finishing complete features; I'm just trying to monitor changes in performance regularly so that issues like the above don't slip in and just get added to the apparent cost of the new feature.
Anyways, my question is simply whether there's a way to objectively determine the "real" speed of an application, discounting the effect of variation. Or, failing that, how do developers deal with such issues? I doubt that my current process is the ideal one.
There are lots of profilers for both C++ and OpenGL. For those who just need the links, here they are.
OpenGL debugger-profiler
C++ profilers, but I recommend Google Orbit because it has a dark theme.
My eyes stopped at
Objectively measure performance
As you mentioned, the speed varies from run to run because the system is too complex. It helps if the scope is small and only tests a few key algorithms. It is worth automating the tests and collecting some reference data. As every scientist says, one test is not a test; you should rely on regular tests in controlled environments.
And here are some tricks that can be used to measure performance (a short code sketch of a few of them follows the list).
As others said in the comments, an average over several runs may help you. It softens the noise from outside.
Process priority or processor affinity could help you control the environment. By giving lower priority to other processes, your program gains more resources.
Measure the whole execution of a test and compare it against processor time. Since several processes run at the same time, processor time may differ from wall-clock time.
Update your reference values when you do a software update. Perhaps one update comes with a performance boost while another comes with a security patch.
Give a performance range for your program instead of one specific number. Perhaps thermal throttling lowered the clock speed and skewed one measurement.
If a test runs too fast to measure, execute the most critical part several times in a test case. What counts as too fast depends on how accurately you can measure. On a millisecond basis it's really hard to decide whether a test that executed in 2 ms instead of 1 ms is a failure or not. However, executed 1000 times, 1033 ms compared to 1000 ms gives you much better insight.
Only time the critical section. Set up the environment and start the stopwatch only when everything is ready. System startup could be a separate test.
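A minimal C++ sketch combining a few of these tricks: repeating a fast critical section, several runs, wall-clock versus CPU time, and reporting a range instead of one number. critical_section is a hypothetical stand-in for your own code.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <ctime>
#include <vector>

volatile double sink = 0.0;

// Hypothetical stand-in for the code under test.
void critical_section() {
    for (int i = 0; i < 10000; ++i) sink = sink + i * 0.5;
}

int main() {
    const int kRuns = 30;        // several runs to soften run-to-run noise
    const int kRepeats = 1000;   // repeat fast code so it becomes measurable
    std::vector<double> wall_ms;

    for (int run = 0; run < kRuns; ++run) {
        auto t0 = std::chrono::steady_clock::now();
        std::clock_t c0 = std::clock();              // CPU time, not wall time
        for (int i = 0; i < kRepeats; ++i)
            critical_section();
        double cpu = 1000.0 * double(std::clock() - c0) / CLOCKS_PER_SEC;
        double wall = std::chrono::duration<double, std::milli>(
                          std::chrono::steady_clock::now() - t0).count();
        wall_ms.push_back(wall);
        std::printf("run %2d: wall %8.2f ms, cpu %8.2f ms\n", run, wall, cpu);
    }

    // Report a range rather than one specific number.
    std::sort(wall_ms.begin(), wall_ms.end());
    std::printf("min %.2f ms, median %.2f ms, max %.2f ms\n",
                wall_ms.front(), wall_ms[wall_ms.size() / 2], wall_ms.back());
}
```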
I have a system that I need to profile.
It is made up of tens of processes, mostly C++, some of them multi-threaded, that communicate with the network and with one another through various system calls.
I know there are performance bottlenecks sometimes, but no one has put in the time/effort to check where they are: they may be in userspace code, inefficient use of syscalls, or something else.
What would be the best way to approach profiling a system like this?
I have thought of the following strategy:
Manually logging the round-trip times of various code sequences (for example, processing an incoming packet or a CLI command) and seeing which process takes the most time. After that, profiling that process, fixing the problem, and repeating.
This method seems sorta hacky and guess-worky. I don't like it.
How would you suggest approaching this problem?
Are there tools that would help me out (multi-process profiler?)?
What I'm looking for is more of a strategy than just specific tools.
Should I profile every process separately and look for problems? If so, how do I approach this?
Do I try to isolate the problematic processes and go from there? If so, how do I isolate them?
Are there other options?
I don't think there is a single answer to this sort of question. Every type of issue has its own problems and solutions.
Generally, the first step is to figure out WHERE in the big system is the time spent. Is it CPU-bound or I/O-bound?
If the problem is CPU-bound, a system-wide profiling tool can be useful to determine where in the system the time is spent. The next question is, of course, whether that time is actually necessary. No automated tool can tell the difference between a badly written piece of code that performs a million completely useless processing steps and one that multiplies a million-element matrix very efficiently: both take the same amount of CPU time, but only one actually achieves anything. Still, knowing which program takes most of the time in a multi-program system is a good starting point for figuring out whether that code is well written or can be improved.
If the system is I/O-bound, such as network or disk I/O, then there are tools for analysing disk and network traffic that can help. But again, expecting the tool to tell you what packet response or disk access time you should expect is a different matter: whether you contact Google to search for "kerflerp" or a local webserver a metre away has a dramatic impact on what counts as a reasonable response time.
There are lots of other issues. Running two pieces of code in parallel that both use LOTS of memory can make both run slower than if they ran in sequence, because the high memory usage causes swapping, or because the OS can't use the spare memory for caching file I/O, for example.
On the other hand, two or more simple processes that use very little memory will benefit quite a lot from running in parallel on a multiprocessor system.
Adding logging to your applications so that you can see WHERE they spend their time is another method that works reasonably well, particularly if you KNOW which use-case takes the time.
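One lightweight way to do that in C++ is an RAII scope timer that logs when the scope exits; handle_packet is a hypothetical stand-in for whichever use-case you already suspect.

```cpp
#include <chrono>
#include <cstdio>

// RAII timer: logs how long the enclosing scope took when it is destroyed.
struct ScopedTimer {
    const char* label;
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    explicit ScopedTimer(const char* l) : label(l) {}
    ~ScopedTimer() {
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                      std::chrono::steady_clock::now() - start).count();
        std::fprintf(stderr, "%s: %lld us\n", label, static_cast<long long>(us));
    }
};

void handle_packet() {          // hypothetical use-case you KNOW takes time
    ScopedTimer t("handle_packet");
    // ... real work ...
}

int main() { handle_packet(); }
```

The logged numbers also give you a baseline for the kind of pre-/post-commit check described below.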
If you have a use-case where you know "this should take no more than X seconds", running a regular pre- or post-commit test to check that the code still behaves as expected, and that no one has added code that slows it down, would also be useful.
I am looking for some tools to profile where the time is spent. I have looked at oprofile, but that doesn't really give me what I need.
I was looking at callgrind, specifically using the CALLGRIND_START_INSTRUMENTATION and CALLGRIND_STOP_INSTRUMENTATION macros. I don't want the tool to slow down the app too much, like valgrind does in general. But that doesn't really work, because Valgrind seems to serialize everything onto one single thread.
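For reference, the macros are wired in like this (a minimal sketch; hot_path is a hypothetical stand-in for the code of interest):

```cpp
#include <valgrind/callgrind.h>

volatile long sink = 0;
void hot_path() {                        // hypothetical function to measure
    for (long i = 0; i < 1000000; ++i) sink = sink + i;
}

int main() {
    // Launch with instrumentation off so only the bracketed region is costed:
    //   valgrind --tool=callgrind --instr-atstart=no ./app
    CALLGRIND_START_INSTRUMENTATION;     // begin collecting costs here
    hot_path();
    CALLGRIND_STOP_INSTRUMENTATION;      // stop collecting costs
}
```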
For example, if fn A calls fn B which calls fn C, and back to B and A, I want to know how much time was spent where. I have some mutex tools that I am using, but a good timing tool would be extremely useful to see exactly where the time is being spent, so that I can concentrate on those paths. Short of adding something myself, is there any tool I can use for this task? It's a C++ app, btw. I cannot use valgrind because of its single-threadedness, as mentioned above. Also, my app spends a bunch of time waiting, so plain CPU profilers are not really helping much.
You might care to take a look at point 3 of this post.
It suggests not asking where the time is spent, but why.
There is a qualitative difference between assuming you are looking for some method that "spends too much time" and asking (by studying stack samples, not summarizing them) what the program is actually trying to accomplish at a small sample of time points.
That approach will find anything you can find by measuring methods, and a lot more.
If applied repeatedly, it can result in large factors of speedup.
In a multi-thread situation, you can identify the threads that are not idle, and apply it to them.
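If you want to try that without any tooling, one cheap approach on Linux/glibc is sketched below; in practice, repeatedly attaching gdb and running thread apply all bt while the program is busy amounts to the same thing.

```cpp
#include <csignal>
#include <execinfo.h>
#include <unistd.h>

// On SIGUSR1, dump the interrupted thread's stack to stderr.
static void dump_stack(int) {
    void* frames[64];
    int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);  // no malloc, unlike backtrace_symbols()
}

int main() {
    // backtrace() lazily loads libgcc on first use, which can malloc; calling
    // it once up front keeps the signal handler out of trouble later.
    void* warmup[1];
    backtrace(warmup, 1);
    std::signal(SIGUSR1, dump_stack);
    // ... application runs; sample it with: kill -USR1 <pid> ...
    for (;;) pause();                    // placeholder for the real main loop
}
```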
What I want to do
I have a computationally intensive OCaml application and I'd like it to run in the background without disturbing normal computer usage. I'd like to present the users with two options:
(1) the application only runs when CPU usage is virtually 0%;
(2) the application only uses "free" processing power (e.g. if other processes add up to 100%, the OCaml application pauses; if other processes are virtually 0%, then there are no restrictions for the OCaml application; if other processes add up to, say, 50% then OCaml will use up to 50%).
Some thoughts
My idea is to check CPU usage at various check points in the code and pause execution if necessary.
In (1), we just check if CPU usage is below, say, 2% and, if not, pause until it drops below 2% again.
In (2), things are trickier. When no restrictions are present, the application always consumes 100% CPU and checkpoints are quite frequent; so to reduce CPU usage to, say, half, I just have to delay at every checkpoint by exactly the time that elapsed since the previous checkpoint. If checkpoints are frequent, this would be similar to using 50% CPU, I'd say. For other percentages we can do something similar by suspending for appropriate periods of time. However, this looks very contrived and full of overhead, and above all, I'm not sure it really does what I want. A better alternative could be to invoke Unix.nice n with some appropriate integer at the start of the application. I suppose that setting n=15 would probably be right.
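The checkpoint idea from (2), sketched in C++ for illustration; the logic ports directly to OCaml (Unix.gettimeofday for the clock, Thread.delay for the pause), and do_some_work is a hypothetical stand-in.

```cpp
#include <chrono>
#include <thread>

volatile double sink = 0.0;
void do_some_work() {                    // hypothetical work between checkpoints
    for (int i = 0; i < 1000000; ++i) sink = sink + i;
}

// Hold the loop near `share` of one core: at each checkpoint, sleep in
// proportion to the work just done, so busy / (busy + slept) ~= share.
void run_throttled(double share) {       // 0 < share <= 1
    using clock = std::chrono::steady_clock;
    for (;;) {
        auto t0 = clock::now();
        do_some_work();
        auto busy = clock::now() - t0;
        std::this_thread::sleep_for(busy * ((1.0 - share) / share));
    }
}

int main() { run_throttled(0.5); }       // aim for roughly 50% CPU
```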
My questions
(Q1) How can I know from within my OCaml application what the CPU usage for the application process is? (I'd like to do this with an OCaml function and not by invoking "ps" or something similar on the command line...)
(Q2) Do you see problems with my idea for achieving (2)? What are the practical differences compared to changing the niceness of the process?
(Q3) Do you have any other suggestions for (2)?
Use Caml's Unix library to periodically capture your CPU times and your elapsed times. Your CPU usage is the ratio. Try Unix.gettimeofday and Unix.times. N.B. You'll need to link with the -lunix option.
I too would just run the process under nice and be done with it.
Get your PID, then parse the contents of /proc/<PID>/stat to get info about your process and /proc/stat to get global CPU info. They both have a bunch of statistics that you can use to decide when to do work and when to sleep. Do man proc to see the documentation for all the fields (it's long). Related question with good info: stackoverflow.com/questions/1420426
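The parsing is the same in whatever language you end up using; here is a sketch of the per-process half in C++, assuming Linux (the system-wide numbers come from the cpu line of /proc/stat in the same way).

```cpp
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>
#include <unistd.h>

// CPU time (utime + stime) this process has consumed, in seconds,
// read from /proc/self/stat.
double process_cpu_seconds() {
    std::ifstream f("/proc/self/stat");
    std::string line;
    std::getline(f, line);
    // Field 2 (comm) may contain spaces, so parse after the closing ')'.
    std::istringstream in(line.substr(line.rfind(')') + 1));
    std::string state;
    in >> state;                                 // field 3
    long skip;
    for (int i = 0; i < 10; ++i) in >> skip;     // fields 4-13
    long utime, stime;
    in >> utime >> stime;                        // fields 14-15, in clock ticks
    return double(utime + stime) / sysconf(_SC_CLK_TCK);
}

int main() {
    std::printf("CPU consumed so far: %.3f s\n", process_cpu_seconds());
}
```

Sample it twice, divide the CPU-time delta by the wall-time delta, and you have the usage figure to compare against your threshold.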
Setting niceness is easy and reliable. Doing things yourself is much more work but potentially gives you more control. If your actual goal is to just run as a background task, I would go with nice and be done with it.
I'm currently developing a custom search engine with a built-in web crawler. For some reason I'm not into multi-threading, so thus far my indexer has been coded in a single-threaded manner. Now I have a small dilemma with the crawler I'm building. Can anybody suggest which is better: crawl one page and then index it, or crawl 1000+ pages, cache them, and then index?
Networks are slow (relative to the CPU). You will see a significant speed increase by parallelizing your crawler. Otherwise, your app will spend the majority of its time waiting on network IO to complete. You can either use multiple threads and blocking IO or a single thread with asynchronous IO.
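A minimal sketch of the multiple-threads-with-blocking-IO variant; the URLs and the fetch stub are placeholders for your real downloader.

```cpp
#include <cstdio>
#include <deque>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Hypothetical blocking fetch; a real crawler would issue an HTTP GET here.
std::string fetch(const std::string& url) { return "<html>...</html>"; }

int main() {
    std::queue<std::string> urls(std::deque<std::string>{
        "http://a.example/", "http://b.example/", "http://c.example/"});
    std::mutex m;

    auto worker = [&] {
        for (;;) {
            std::string url;
            {
                std::lock_guard<std::mutex> lock(m);
                if (urls.empty()) return;        // queue drained, thread exits
                url = std::move(urls.front());
                urls.pop();
            }
            std::string page = fetch(url);       // threads overlap network waits
            std::printf("fetched %s (%zu bytes)\n", url.c_str(), page.size());
        }
    };

    std::vector<std::thread> pool;
    for (int i = 0; i < 8; ++i) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
}
```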
Also, most indexing algorithms will perform better on batches of documents versus indexing one document at a time.
Better? In terms of what? In terms of speed I can't foresee a noticeable difference. In terms of robustness (recovering from a catastrophic failure) it's probably better to index each page as you crawl it.
I would strongly suggest getting "in" to multi-threading if you are serious about your crawler. Basically, you would want to have at least one indexer and at least one crawler (potentially multitudes of both) running at all times. Among other things, this minimizes start-up and shutdown overhead (e.g. initializing and freeing data structures).
Not using threads is OK.
However, if you still want performance, you need to deal with asynchronous IO.
I would recommend checking out Boost.ASIO. Using asynchronous IO will make your dilemma "irrelevant", as it would not matter. Also, as a bonus, if in future you do decide to use threads, it's trivial to tell Boost.Asio to apply multiple threads to the problem.
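A sketch of that last point, assuming Boost 1.66 or newer: the same io_context can be serviced by one thread today and by several later, without touching the async operations posted onto it.

```cpp
#include <boost/asio.hpp>
#include <thread>
#include <vector>

int main() {
    boost::asio::io_context io;
    // Keep run() from returning while the crawler is idle between operations.
    auto guard = boost::asio::make_work_guard(io);

    // ... post async resolve/connect/read operations onto `io` here ...

    // The "trivial" multi-threading: N threads all run the same io_context,
    // and completion handlers are dispatched across them.
    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i)
        pool.emplace_back([&io] { io.run(); });

    guard.reset();                       // let run() return once work dries up
    for (auto& t : pool) t.join();
}
```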