Profiling on Windows

I started trying out things with Racket a few days ago, and now I'm stuck with the profiler.
The function I want to profile is called my-fn.
The first thing I tried was to add (require profile) and then call (profile-thunk (thunk (my-fn arg1))).
When I hit the run button in DrRacket, it shows
Profiling results
-----------------
Total cpu time observed: 0ms (out of 2ms)
Number of samples taken: 0 (once every 0ms)
====================================
Caller
Idx    Total       Self      Name+src              Local%
       ms(pct)     ms(pct)     Callee
====================================
and then it prints the return value of `my-fn`.
I then tried to make my function slower by giving it more items in its arguments, and to add the `#:repeat` option to the profiler:
(profile-thunk (thunk (my-fn arg1)) #:repeat 5000)
The result is now:
Profiling results
-----------------
Total cpu time observed: 29488ms (out of 31641ms)
Number of samples taken: 1687 (once every 17ms)
====================================
Caller
Idx    Total       Self      Name+src              Local%
       ms(pct)     ms(pct)     Callee
====================================
This seems to have done something, but I still can't see any results.
I then tried to use the command line (after having added raco to the %PATH%):
raco profile .\test.rkt
but I still get nothing useful:
Profiling results
-----------------
Total cpu time observed: 234ms (out of 281ms)
Number of samples taken: 4 (once every 59ms)
====================================
Caller
Idx    Total       Self      Name+src              Local%
       ms(pct)     ms(pct)     Callee
====================================
I'm all out of ideas, so could anyone share the right way to do this, please?
As a side note, is it possible to have the profiler results come AFTER the return value of the function in DrRacket? Mine is a long list of numbers, and I need to scroll up to see the (empty) profiler result. (It doesn't happen on the command line, so it's not very important.)
Thanks
P.S.
I also tried to change the renderer with (require profile/render-text)
and then (profile-thunk (thunk (my-fn arg1)) #:repeat 5000 #:renderer render)
or (profile-thunk (thunk (my-fn arg1)) #:repeat 5000 #:renderer text:render)
or (profile-thunk (thunk (my-fn arg1)) #:repeat 5000 #:renderer #'render)
but I get error messages about the renderer not being a function.
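(Note: the profile library's documentation lists the keyword as #:render rather than #:renderer, and profile/render-text exports its renderer simply as render, so the intended call was presumably something like the following sketch:)
(require profile
         profile/render-text)
;; A sketch based on the documented API: the keyword is #:render, not
;; #:renderer, and the text renderer is exported as `render`.
(profile-thunk (thunk (my-fn arg1))
               #:repeat 5000
               #:render render)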

Not exactly what I was hoping for, but I found a fix on the mailing list.
To enable profiling in DrRacket:
Go to Language | Choose Language
Click on Show Details
At the top-right, select "Debugging and profiling". (The default setting seems to be debugging only.)
Then run some code in the console or with the "Run" button, and then:
Go to View | Show profile
and profile information shows up at the bottom of the screen. As far as I can see it doesn't handle threads very well (it prints <<unknown>>37:28 for the line with the thread-joining touch command), but it is still better than no profiler at all.
I still haven't worked out why I couldn't get the profile-thunk function, the profile macro, or raco's profile command to work; it might be related to Windows or to my install.


Nextjs export timeout configuration

I am building a website with NextJS that takes some time to build. It has to create a big dictionary, so when I run next dev it takes around 2 minutes to build.
The issue is, when I run next export to get a static version of the website, there is a timeout problem, because the build takes (as I said before) 2 minutes, which exceeds the 60-second limit pre-configured in Next.
The Next.js documentation (https://nextjs.org/docs/messages/static-page-generation-timeout) explains that you can increase the timeout limit, whose default is 60 seconds: "Increase the timeout by changing the staticPageGenerationTimeout configuration option (default 60 in seconds)."
However, it does not specify WHERE you can set that configuration option. In next.config.json? In package.json?
I could not find this information anywhere, and my blind tries of putting this parameter in some of the files mentioned before did not work out at all. So, does anybody know how to set the timeout of next export? Thank you in advance.
They were a bit clearer in the basic-features/data-fetching part of the docs: it should be placed in next.config.js.
I added this to mine and it worked (it got rid of the build error Error: Collecting page data for /path/[pk] is still timing out after 2 attempts. See more info here https://nextjs.org/docs/messages/page-data-collection-timeout):
// next.config.js
module.exports = {
  // time in seconds of no pages generating during static
  // generation before timing out
  staticPageGenerationTimeout: 1000,
}

How to send a code to the parallel port in exact sync with a visual stimulus in Psychopy

I am new to Python and PsychoPy; however, I have vast experience in programming and in designing experiments (using Matlab and E-Prime). I am running an RSVP (rapid serial visual presentation) experiment which displays a different visual stimulus every X ms (X is an experimental variable and can be from 100 ms to 1000 ms). As this is a physiological experiment, I need to send triggers over the parallel port exactly on stimulus onset. I test the sync between triggers and visual onset using an oscilloscope and a photosensor. However, whether I send my trigger before or after the win.flip(), even with the window's waitBlanking=False parameter, I still get a difference between the onset of the stimulus and the onset of the code.
Attached is my code:
im = []
for pic in picnames:
    im.append(visual.ImageStim(myWin, image=pic, pos=[0, 0], autoLog=True))

myWin.flip()  # to get to the next vertical blank
while tm < len(im) and t < len(codes):  # assuming the bound on tm is len(im)
    im[tm].draw()
    parallel.setData(codes[t])  # before
    myWin.flip()
    # parallel.setData(codes[t])  # after
    ttime.append(myClock.getTime())
    core.wait(0.01)
    parallel.setData(0)
    dur = (myClock.getTime() - ttime[t]) * 1000
    while dur < stimDur - frameDurAvg + 1:
        dur = (myClock.getTime() - ttime[t]) * 1000
    t = t + 1
    tm = tm + 1
myWin.flip()
How can I sync my stimulus onset to the trigger? I'm not sure if this is a graphics card issue (I'm using an LCD Acer screen with the onboard Intel graphics card). Many thanks,
Shani
win.flip() waits for the next monitor update. This means that the next line after win.flip() is executed almost exactly when the monitor begins drawing the frame. That's where you want to send your trigger. The line just before win.flip() runs potentially almost one frame earlier, e.g. 16.7 ms on a 60 Hz monitor, so your trigger would arrive too early.
There are two almost identical ways to do it. Let's start with the most explicit:
for i in range(10):
    win.flip()
    # On the first flip
    if i == 0:
        parallel.setData(255)
        core.wait(0.01)
        parallel.setData(0)
... so the signal is sent just after the image has been pushed to the monitor.
The slightly more timing-accurate way to do it will save you something like 0.01 ms (plus or minus an order of magnitude). Somewhere early in the script, define:
def sendTrigger(code):
    parallel.setData(code)
    core.wait(0.01)
    parallel.setData(0)
Then do
win.callOnFlip(sendTrigger, code=255)
for i in range(10):
    win.flip()
This will call the function just after the first flip, before psychopy does a bit of housecleaning. So the function could have been called win.callOnNextFlip, since it's only executed on the first following flip.
Again, this difference in timing is so minuscule compared to other factors that this is not really a question of performance but rather of style preferences.
There is a hidden timing variable that is usually ignored: the monitor's input lag, and I think this is the reason for the delay. Put simply, the monitor needs some time to display the image even after getting the input from the graphics card. This delay has nothing to do with the refresh rate (how many times per second the screen switches buffers) or the response time of the monitor.
On my monitor, I find a delay of 23 ms when I send a trigger with callOnFlip(). How I correct for it: floor(23/16.667) = 1, and 23 % 16.667 = 6.333. So I call callOnFlip on the second frame, wait 6.3 ms, and trigger the port. This works. I haven't tried with waitBlanking=True, which waits for the blanking start from the graphics card, as that gives me some more time to prepare the next buffer. However, I think that even with waitBlanking=True the effect will be there. (More after testing!)
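As a rough sketch of that correction (assuming the question's win, parallel, core, and stimulus objects, a 60 Hz display with 16.667 ms frames, and the 23 ms of input lag measured above):
input_lag_ms = 23.0                              # measured with photosensor + oscilloscope
frame_ms = 16.667                                # one frame on a 60 Hz display
extra_frames = int(input_lag_ms // frame_ms)     # 1 full frame of lag
residual_s = (input_lag_ms % frame_ms) / 1000.0  # ~6.3 ms remainder

stimulus.draw()
win.flip()                   # frame that pushes the stimulus out
for _ in range(extra_frames):
    stimulus.draw()          # keep the stimulus on screen
    win.flip()               # let a full frame of input lag pass
core.wait(residual_s)        # wait out the remaining ~6.3 ms
parallel.setData(255)        # trigger now lands near the physical onset
core.wait(0.01)
parallel.setData(0)          # clear the port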
Best,
Suddha
There is at least one routine that you can use to normalize the trigger delay to your screen's refresh rate. I just tested it with a photosensor cell, and I went from a mean delay of 13 milliseconds (sd = 3.5 ms) between the trigger and the stimulus display to a mean delay of 4.8 milliseconds (sd = 3.1 ms).
The procedure is the following:
Compute the mean duration between two displays. Say your screen has a refresh rate of 85.05 Hz (this is my case). This means that there is a mean duration of 1000/85.05 = 11.76 milliseconds between two refreshes.
Just after you have called win.flip(), wait for this average delay before you send your trigger: core.wait(0.01176).
This will not ensure that all your delays now equal zero, since you cannot control the synchronization between the win.flip() command and the current state of your screen, but it will center the delay around zero. At least, it did for me.
So the code could be updated as follows:
refr_rate = 85.05
mean_delay_ms = (1000 / refr_rate)
mean_delay_sec = mean_delay_ms / 1000 # Psychopy needs timing values in seconds
def send_trigger(port, value):
    core.wait(mean_delay_sec)
    parallel.setData(value)
    core.wait(0.001)
    parallel.setData(0)
[...]
stimulus.draw()
win.flip()
send_trigger(port, value)
[...]

Log Slow Pages Taking Longer Than [n] Seconds In ColdFusion with Details

(ACF9)
Unless there's an option I'm missing, the "Log Slow Pages Taking Longer Than [n] Seconds" setting isn't useful for front-controller based sites (e.g., Model-Glue, FW/1, Fusebox, Mach-II, etc.).
For instance, in a Mura/Framework-One site, I just end up with:
"Warning","jrpp-186","04/25/13","15:26:36",,"Thread: jrpp-186, processing template: /home/mysite/public_html_cms/wwwroot/index.cfm, completed in 11 seconds, exceeding the 10 second warning limit"
"Warning","jrpp-196","04/25/13","15:27:11",,"Thread: jrpp-196, processing template: /home/mysite/public_html_cms/wwwroot/index.cfm, completed in 59 seconds, exceeding the 10 second warning limit"
"Warning","jrpp-214","04/25/13","15:28:56",,"Thread: jrpp-214, processing template: /home/mysite/public_html_cms/wwwroot/index.cfm, completed in 32 seconds, exceeding the 10 second warning limit"
"Warning","jrpp-134","04/25/13","15:31:53",,"Thread: jrpp-134, processing template: /home/mysite/public_html_cms/wwwroot/index.cfm, completed in 11 seconds, exceeding the 10 second warning limit"
Is there some way to get query string or post details in there, or is there another way to get what I'm after?
You can easily add some logging to your application for any requests that take longer than 10 seconds.
In onRequestStart():
request.startTime = getTickCount();
In onRequestEnd():
request.endTime = getTickCount();
if (request.endTime - request.startTime > 10000){
    writeLog(cgi.QUERY_STRING);
}
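Putting those two pieces together, a minimal Application.cfc might look roughly like this (the "slowpages" log name and the serialized FORM capture are illustrative additions, not prescribed):
// Application.cfc -- a minimal sketch; "slowpages" and the FORM capture
// are illustrative, not part of the setting being discussed.
component {
    public void function onRequestStart(required string targetPage) {
        request.startTime = getTickCount();
    }

    public void function onRequestEnd(required string targetPage) {
        request.endTime = getTickCount();
        if (request.endTime - request.startTime > 10000) {
            writeLog(
                file = "slowpages",
                text = "#arguments.targetPage# took " &
                       "#request.endTime - request.startTime# ms; " &
                       "query=#cgi.query_string#; form=#serializeJSON(form)#"
            );
        }
    }
}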
If you're writing a Mach-II, FW/1 or ColdBox application, it's trivial to write a "plugin" that runs on every request, captures the URL or FORM variables passed in the request, and stores them in a simple database table or log file. (You can even capture session.userID or IP address or whatever you may need.) If you're capturing to a database table, you'll probably want to skip indexes to keep inserts fast, and you'll need to rotate that table so you're not trying to do high-speed inserts on a table with tens of millions of rows.
In Mach-II, you'd write a plugin.
In FW/1, you'd put a call to a controller which handles this into setupRequest() in your application.cfc.
In ColdBox, you'd write an interceptor.
The idea is that the log just tells you which pages are consistently slow so you can do your own performance tuning.
For a start, turn on debugging for further details.

Can oprofile ignore calls to external functions and instead accumulate the time to the caller?

I'm currently calling oprofile with these parameters:
operf --callgraph --vmlinux /usr/lib/debug/boot/vmlinux-$(uname -r) <BINARY>
opreport -a -l <BINARY>
As an example, the output is:
CPU: Core 2, speed 2e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 90000
samples  cum. samples  %        cum. %   image name                symbol name
12635    12635         27.7674  27.7674  libc-2.15.so              __memset_sse2
9404     22039         20.6668  48.4342  vmlinux-3.5.0-21-generic  get_page_from_freelist
4381     26420          9.6279  58.0621  vmlinux-3.5.0-21-generic  native_flush_tlb_single
3684     30104          8.0962  66.1583  vmlinux-3.5.0-21-generic  page_fault
701      30805          1.5406  67.6988  vmlinux-3.5.0-21-generic  handle_pte_fault
You can see that most of the time is spent within __memset_sse2, but it is not obvious which part of my own code should be optimized. At least, not from the output above.
In my specific case, I was able to quickly locate the source of the problem by using some kind of poor man's profiler. I ran the program in a debugger, stopped it from time to time and looked at the call stacks of each thread.
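(Roughly, in shell, assuming gdb is installed and $PID stands in for the process of interest, that idea looks like this:)
# Poor man's profiler: snapshot all thread stacks a few times and look
# for frames that keep recurring. $PID is a placeholder.
for i in 1 2 3 4 5; do
    gdb -batch -p "$PID" -ex 'thread apply all bt'
    sleep 1
done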
Is it possible to get the same results directly from the output of oprofile? The strategy that I used will most likely fail if the performance bottleneck is not that obvious as it was in my example.
Is there an option to ignore all calls to external functions (e.g., to the kernel or libc) and just accumulate the time to the caller? For example:
void foo() {
    // some expensive call to memset...
}
Here, it would be more insightful for me to see foo at the top of the profiling output, not memset.
(I tried opreport --exclude-dependent but found it not helpful as it seems only to skip the external functions in the output.)

How to profile the time of calling initializer functions when dyld loads an image?

I am now trying to optimize the launch time of an application, and currently want to reduce the time spent by the OS image loader.
From dyld(1), I found two environment variables: DYLD_PRINT_STATISTICS and DYLD_PRINT_INITIALIZERS.
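(For reference, both can be set for a single launch from a shell; the application path below is a placeholder:)
# Placeholder path; setting the variables enables dyld's diagnostic output.
DYLD_PRINT_STATISTICS=1 DYLD_PRINT_INITIALIZERS=1 \
    /path/to/MyApp.app/Contents/MacOS/MyApp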
DYLD_PRINT_STATISTICS causes dyld to print statistics about how dyld spent its time. This is its output when I launched the application after a purge command.
total time: 5.9 seconds (100.0%)
total images loaded: 130 (101 from dyld shared cache, 0 needed no fixups)
total segments mapped: 105, into 12914 pages with 840 pages pre-fetched
total images loading time: 1.8 seconds (31.0%)
total dtrace DOF registration time: 0.09 milliseconds (0.0%)
total rebase fixups: 137,658
total rebase fixups time: 1.1 seconds (19.4%)
total binding fixups: 28,502
total binding symbol lookups: 1,127, average images searched per symbol: 0.1
total binding fixups time: 452.28 milliseconds (7.6%)
total weak binding fixups time: 188.52 milliseconds (3.1%)
total bindings lazily fixed up: 130 of 1,164
total initializer time: 2.3 seconds (38.7%)
total symbol trie searches: 2611
total symbol table binary searches: 118
total images defining/using weak symbols: 13/65
Apparently, calling initializer functions takes a lot of time. So I tried DYLD_PRINT_INITIALIZERS to get the list of initializer functions.
I actually got them, but there was no time information on them, such as a timestamp, just tons of lines like the following:
dyld: calling initializer function 0x25170 in MY_APPLICATION_PATH
This is useless for finding the initializers I should optimize.
So, my questions are:
Is there any method to make dyld print timestamps?
If it is impossible, could I use something else to profile the time dyld spends calling initializer functions? Is DTrace or Instruments applicable here?
Any advice on the above statistics is appreciated.
Although I am now working on OSX, any tools or helpful information for Windows are also welcome.
Thanks
Can you use this technique?
I've seen slow startup of a large app which was loading a lot of DLLs and doing a lot of initializing. Sampling will tell you what's going on that you can fix, and it will probably surprise you.
For example, what I never would have guessed is the amount of time spent setting up internationalized character strings, many of which did not need to be. Another example: not only creating and initializing data structures, but needless destruction and re-creation of them as things were being inserted into menus, tree views, etc.