With SDL2, how do I reliably get the initial positions of the joystick axes? - sdl

I'm using the SDL2 event-based joystick API to obtain the positions of the analog joystick axes, and this is mostly working well.
However, SDL2 fails to reliably deliver those positions when the software is starting up. Even when forcing the generation of axis events using SDL_JoystickUpdate(), the corresponding event reports the default position (0) instead of a value reflecting the actual, physical position of the attached stick.
Aside from calling SDL_JoystickUpdate(), I also tried setting the SDL_HINT_JOYSTICK_ALLOW_BACKGROUND_EVENTS hint prior to initialization, as well as calling SDL_JoystickGetAxis(). Stepping into the source code of the latter function, I discovered that it does not actively query the axis position as I expected, but instead reports a stored value.
Through experimentation, I finally came up with a rather ugly hack that seems to "do the trick":
// My joystick axis querying function
SDL_JoystickUpdate();                  // force a poll of all opened joysticks
for (auto i = 0U; i < 100; i++) {
    SDL_Delay(1);                      // give the driver time to deliver data
    SDL_PumpEvents();                  // process any pending joystick events
}
return SDL_JoystickGetAxis((SDL_Joystick*)handle, axis);
I have no idea why this is working or what is going on behind the scenes. I'd be very thankful for a better solution.
UPDATE 2020-09-03: I still do not have a better solution, but I experimented with SDL's source code, specifically the function SDL_DINPUT_JoystickUpdate() in SDL_dinputjoystick.c. If I inject an artificial delay of 50ms between IDirectInputDevice8_Poll() and the call to UpdateDINPUTJoystickState_Buffered(), which in turn calls IDirectInputDevice8_GetDeviceData() to collect the data, I get the correct values. (Unfortunately this cannot be used as a solution, as SDL_DINPUT_JoystickUpdate() is being called constantly.) This would seem to indicate that there is a delay between a call to Poll() and the data becoming available.
UPDATE 2020-09-03 #2: I have submitted a tentative (not very satisfactory) patch proposal to SDL's Bugzilla.

The issue here is that the buffer tracks changes. If nothing has been touched since init, there are no changes, so we don't know what the initial state is.
UpdateDINPUTJoystickState_Buffered() won't update the initial state until a buffered axis change happens.
UpdateDINPUTJoystickState_Polled() never gets called, and without it, the initial value never gets set until the joystick is moved and released.
SDL_DINPUT_JoystickUpdate() does call IDirectInputDevice8_Poll(), so it DOES wake up the device if it happens to need it, but if the joystick has a buffer, it still won't update the initial state on the SDL side until there's an axis change. And of course with a hardware deadzone or a fake analog, there isn't one.
The fault is clearly in SDL here, and I can see no obvious solution besides the suggested patch: call IDirectInputDevice8_Poll() during init and sleep for 50 ms to force the device to output its current state into the buffer.
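For reference, a minimal sketch of what that patch could look like at the end of SDL_DINPUT_JoystickOpen() in SDL_dinputjoystick.c, after the device has been acquired. The field names follow SDL's internal DirectInput code, and the 50 ms figure is the experimentally found delay from the question, not a documented constant:
/* Hypothetical patch sketch: poll once at open time so the device
   pushes its initial state into the buffer. */
HRESULT result = IDirectInputDevice8_Poll(joystick->hwdata->InputDevice);
if (result == DIERR_INPUTLOST || result == DIERR_NOTACQUIRED) {
    IDirectInputDevice8_Acquire(joystick->hwdata->InputDevice);
    IDirectInputDevice8_Poll(joystick->hwdata->InputDevice);
}
SDL_Delay(50); /* give the hardware time to report its current state */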

Related

What is the accepted timing strategy when using Vertical Synchronisation?

Coming from a basic understanding of OpenGL programming, my picture is that all required drawing operations are performed in a sequence, once per frame redraw. The performance of the hardware essentially dictates how fast this happens. As I understand it, a game will attempt to draw as quickly as possible, so redraw operations are essentially wrapped in a while loop. The graphics operations (graphics engine) are then optimised to ensure the frame rate is acceptable for the application.
Graphics hardware supporting Vertical Synchronisation however locks frame rates to the display rate. A first question would be how should a graphics engine interact with the hardware synchronisation? Is this even possible, or does the renderer work at maximum speed while the hardware selectively picks up the latest frame, discarding all unused previous frames?
The motivation for this question is not that I am immediately intending to write a graphics engine; instead I am debugging an issue with an existing system where the graphics of a moving scene appear to stutter onscreen. Symptomatically, the stutter is slight when VSync is turned off; when it is turned on, either there is a significant and periodic stutter or the stutter is resolved entirely. I am somewhat clutching at straws as to what is happening or why, and want to understand some more background information on graphics systems.
In summary, the question is how one is expected to interact with hardware redraw events, and whether that is even possible. However, any additional information would be welcome.
A first question would be how should a graphics engine interact with the hardware synchronisation?
To avoid flicker, modern rendering systems use double buffering: there are two color buffers, and after drawing to one has finished, the display readout pointer is switched to the finished buffer. This buffer swap can happen synchronized or unsynchronized. With V-Sync enabled the buffer swap is synchronized, and the rendering thread blocks until the swap has happened.
Since double buffering mandates buffer swaps, it implicitly introduces a synchronization mechanism. This is how interactive rendering systems lock onto the display refresh.
Symptomatically, the stutter is slight when VSync is turned off; when it is turned on, either there is a significant and periodic stutter or the stutter is resolved entirely.
This sounds like a badly written animation loop that assumes a constant frame rate locked onto the display refresh rate, based on the assumption that frames render faster than a display refresh interval, so that the buffer swap can be issued in time for the next retrace.
The only robust way to deal with vertical synchronization is to actually measure the time between frame renderings and advance the rendering loop by that amount of time.
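As an illustration, here is a minimal sketch of such a variable-timestep loop; update(), render(), and swapBuffers() are hypothetical application hooks standing in for whatever your engine provides:
#include <chrono>

extern bool running;              // hypothetical: your quit flag
void update(double dtSeconds);    // hypothetical: advance the simulation
void render();                    // hypothetical: draw the current state
void swapBuffers();               // blocks until the retrace when V-Sync is on

void runLoop() {
    using clock = std::chrono::steady_clock;
    auto previous = clock::now();
    while (running) {
        const auto now = clock::now();
        const double dt = std::chrono::duration<double>(now - previous).count();
        previous = now;
        update(dt);   // advance by the measured time, not by a fixed 1/60 s
        render();
        swapBuffers();
    }
}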
This is a guess, but:
The Problem Isn't Vertical Synchronization
I don't know what OS you're working with, but there are various ways to get information about the monitor and how fast the screen is refreshing (for the purposes of this answer, we'll assume your monitor is somewhat recent and redraws at a rate of 60 Hz, or 60 times every second, or once every 16.66666... milliseconds).
Renderers are usually paired up with a "Logic" side to the application: input, UI calculations, simulation running, etc. It seems like the logic side of your application is running fast enough, but the Rendering side - i.e., the Draw Call, as it's commonly summed up - is bounding the speed of your application.
Vertical Synchronization can exacerbate this: if your Draw Call is made to happen every 16.67 milliseconds, but it takes much longer than that, then you perceive a frame rate drop (i.e. frames "stutter" because a single frame takes too long to produce). VSync - and the enabling or disabling thereof - is not something that bottlenecks your code: it just says "hey, since the hardware is only going to take one frame from us every 16.67 milliseconds, why make more draw calls than one in that interval? As long as we do one draw call per interval, our application will look as fluid as possible, and we don't have to waste time making more calls than that!"
The problem is that this assumes your code will run fast enough to make it in those 16.67 milliseconds. If it does not, stuttering, lagging, visual artifacts, frozen frames, and other glitches manifest themselves on screen.
When you turn off VSync, you're telling your Render Call to be called as often as possible, as fast as possible. This may give it some extra wiggle room alongside the Logic call to get a frame rendered, so that when the hardware says "I'm gonna take a picture and put it on the screen now!" the frame is all prettied up, just in time, to get into posture and say cheese (though by what you say, it barely makes it).
What To Do:
Start by profiling your code. Find out which functions are taking the most time. Judging by the stutter, something in your code is taking longer than expected and giving you undesirable performance. Profile first to find the critical sections where you're burning away time, then figure out how to keep the code correct while making it faster. You may want to figure out what's being called in the Render Call and time one complete cycle of it specifically. Then time the Logic call(s) and see how long those take to execute as well. Then, chop away.
Good luck!

OpenGL window systems screen tearing prevention

In my OpenGL application I want to prevent screen tearing for obvious reasons. So far I have been using vsync. But I would like to replace it with a page flipping buffer swap (changing a pointer to the monitor's data instead of copying the values) to improve performance. My question is: do the important windowing systems (Windows, Cocoa, X11) support this kind of buffer swap at all, and does it need to be requested explicitly or is it the default behavior?
V-Sync is the "vertical retrace synchronization". If V-Sync is enabled, it means that the double buffers are exchanged in the timespan when the display is not drawing. It's a term inherited from the era of CRT displays, where an electron beam was used to draw the image line by line from the top left to the bottom. When the beam reached the bottom right it had to be returned to the top left. The electron beam was steered using two pairs of electromagnetic coils and (unlike the electrostatic deflectors in an oscilloscope) could not operate beyond a certain slew rate. That retrace interval is the V-Sync.
Today, displays still receive their data line by line into a buffer internal to the display. At the end of a whole frame a small pause is inserted.
So the "vertical retrace" is the timespan in which you can update the data in your display framebuffer without interfering with the drawing process.
So far I have been using vsync.
No, you didn't "use" vsync. You use double buffering, whose buffer exchange is synchronized by the V-Sync.
But I would like to replace it with a page flipping buffer swap
This is not your decision to make. Which method is used is chosen by the graphics hardware and its driver. Your program lives in userspace and can't even talk to the hardware at that low a level. And normally the method that performs best in the situation is used.
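The one knob userspace does control is the swap interval. As an illustration, here is a minimal sketch of requesting synchronized swaps on Windows via the WGL_EXT_swap_control extension (it assumes an OpenGL context is already current; in production you should also check that the extension is present):
#include <windows.h>

typedef BOOL (WINAPI *PFNWGLSWAPINTERVALEXTPROC)(int interval);

void enableVSync() {
    // wglSwapIntervalEXT is exposed by the WGL_EXT_swap_control extension
    PFNWGLSWAPINTERVALEXTPROC wglSwapIntervalEXT =
        (PFNWGLSWAPINTERVALEXTPROC)wglGetProcAddress("wglSwapIntervalEXT");
    if (wglSwapIntervalEXT)
        wglSwapIntervalEXT(1); // swap at most once per vertical retrace
}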

How to do exactly one render per vertical sync (no repeating, no skipping)?

I'm trying to do vertically synced renders so that exactly one render is done per vertical sync, without skipping or repeating any frames. I would need this to work under Windows 7 and (in the future) Windows 8.
It would basically consist of drawing a sequence of QUADS that would fit the screen so that a pixel from the original images matches 1:1 a pixel on the screen. The rendering part is not a problem, either with OpenGL or DirectX. The problem is the correct syncing.
I previously tried using OpenGL, with the WGL_EXT_swap_control extension, by drawing and then calling
SwapBuffers(g_hDC);
glFinish();
I tried all combinations and permutations of these two instructions, along with glFlush(), and it was not reliable.
I then tried with Direct3D 10, by drawing and then calling
g_pSwapChain->Present(1, 0);
pOutput->WaitForVBlank();
where g_pSwapChain is a IDXGISwapChain* and pOutput is the IDXGIOutput* associated to that SwapChain.
Both versions, OpenGL and Direct3D, result in the same: the first sequence of, say, 60 frames doesn't last what it should (instead of about 1000 ms at 60 Hz, it lasts something like 1030 or 1050 ms); the following ones seem to work fine (about 1000.40 ms), but every now and then it seems to skip a frame. I do the measuring with QueryPerformanceCounter.
On Direct3D, trying a loop of just the WaitForVBlank, the duration of 1000 iterations is consistently 1000.40 with little variation.
So the trouble here is not knowing exactly when each of the called functions returns, and whether the swap is done during the vertical sync (not earlier, to avoid tearing).
Ideally (if I'm not mistaken), what I want would be: perform one render, wait until the sync starts, swap during the sync, then wait until the sync is done. How do I do that with OpenGL or DirectX?
Edit:
A test loop of just WaitForVBlank() 60x takes consistently from 1000.30 ms to 1000.50 ms.
The same loop with Present(1, 0) before WaitForVBlank(), with nothing else - no rendering - takes the same time, but sometimes it fails and takes 1017 ms, as if a frame had been repeated. There's no rendering, so something is wrong here.
I have the same problem in DX11. I want to guarantee that my frame rendering code takes an exact multiple of the monitor's refresh period, to avoid multi-buffering latency.
Just calling pSwapChain->Present(1, 0) is not sufficient. That will prevent tearing in fullscreen mode, but it does not wait for the vblank to happen. The Present call is asynchronous and returns right away if there are frame buffers remaining to be filled. So if your render code produces a new frame very quickly (say 10 ms to render everything) and the user has set the driver's "Maximum pre-rendered frames" to 4, then you will be rendering four frames ahead of what the user sees. This means 4*16.7 = 67 ms of latency between mouse action and screen response, which is unacceptable. Note that the driver's setting wins - even if your app asked for SetMaximumFrameLatency(1), you'll get 4 frames regardless. So the only way to guarantee no mouse lag regardless of the driver setting is for your render loop to voluntarily wait until the next vertical refresh interval, so that you never use those extra frame buffers.
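(For reference, a minimal sketch of requesting that latency cap; the classic API for it is IDXGIDevice1::SetMaximumFrameLatency, and this assumes an existing ID3D11Device*. As noted above, the driver may override the request.)
#include <d3d11.h>
#include <dxgi.h>

void limitFrameLatency(ID3D11Device* device) {
    IDXGIDevice1* dxgiDevice = nullptr;
    if (SUCCEEDED(device->QueryInterface(__uuidof(IDXGIDevice1),
                                         (void**)&dxgiDevice))) {
        dxgiDevice->SetMaximumFrameLatency(1); // request at most one queued frame
        dxgiDevice->Release();
    }
}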
IDXGIOutput::WaitForVBlank() is intended for exactly that waiting. But it does not work! When I call the following
<render something in ~10ms>
pSwapChain->Present(1, 0);
pOutput->WaitForVBlank();
and measure the time it takes for the WaitForVBlank() call to return, I see it alternate between roughly 6 ms and 22 ms.
How can that happen? How could WaitForVBlank() ever take longer than 16.7 ms to complete? In DX9 we solved this problem by using GetRasterStatus() to implement our own, much more accurate version of WaitForVBlank. But that call was deprecated in DX11.
Is there any other way to guarantee that my frame is exactly aligned with the monitor's refresh rate? Is there another way to spy on the current scanline, as GetRasterStatus used to allow?
I previously tried using OpenGL, with the WGL_EXT_swap_control extension, by drawing and then calling
SwapBuffers(g_hDC);
glFinish();
That glFinish() or glFlush() is superfluous - SwapBuffers() implies a glFinish().
Could it be that in your graphics driver settings you have set "force V-Blank / V-Sync off"?
We use DX9 currently and want to switch to DX11. We currently use GetRasterStatus() to manually sync to the screen. That goes away in DX11, but I've found that creating a DirectDraw7 device doesn't seem to disrupt DX11. So just add this to your code and you should be able to get the scanline position.
#include <ddraw.h>   // link against ddraw.lib and dxguid.lib

IDirectDraw7* ddraw = nullptr;
DirectDrawCreateEx(NULL, reinterpret_cast<LPVOID*>(&ddraw), IID_IDirectDraw7, NULL);
DWORD scanline = 0;
ddraw->GetScanLine(&scanline);   // scanline currently being drawn; returns
                                 // DDERR_VERTICALBLANKINPROGRESS during vblank
On Windows 8.1 and Windows 10, you can make use of the DXGI 1.3 flag DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT. See MSDN. The sample here is for Windows 8 Store apps, but it should be adaptable to classic Win32 window swap chains as well.
You may find this video useful as well.
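A minimal sketch of that waitable-object pattern, assuming an existing IDXGIFactory2*, ID3D11Device*, and HWND (error handling omitted):
#include <windows.h>
#include <d3d11.h>
#include <dxgi1_3.h>

void createWaitableSwapChain(IDXGIFactory2* factory, ID3D11Device* device, HWND hwnd) {
    DXGI_SWAP_CHAIN_DESC1 desc = {};
    desc.Format = DXGI_FORMAT_B8G8R8A8_UNORM;
    desc.SampleDesc.Count = 1;
    desc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
    desc.BufferCount = 2;
    desc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL;
    desc.Flags = DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT;

    IDXGISwapChain1* swapChain1 = nullptr;
    factory->CreateSwapChainForHwnd(device, hwnd, &desc, nullptr, nullptr, &swapChain1);

    IDXGISwapChain2* swapChain2 = nullptr;
    swapChain1->QueryInterface(__uuidof(IDXGISwapChain2), (void**)&swapChain2);
    swapChain2->SetMaximumFrameLatency(1);
    HANDLE waitable = swapChain2->GetFrameLatencyWaitableObject();

    // Per frame: block until DXGI is ready to accept a new frame,
    // then render and present.
    WaitForSingleObjectEx(waitable, 1000, TRUE);
    // ... render ...
    swapChain2->Present(1, 0);
}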
When creating a Direct3D device, set the PresentationInterval parameter of the D3DPRESENT_PARAMETERS structure to D3DPRESENT_INTERVAL_DEFAULT.
If you run in kernel mode or ring-0, you can attempt to read bit 3 of the VGA input status register (03BAh/03DAh). The information is quite old, and although it was hinted here that the bit might have changed location or been obsoleted in later versions of Windows 2000 and up, I actually doubt this. The second link has some very old source code that attempts to expose the vblank signal for old Windows versions. It no longer runs, but in theory rebuilding it with the latest Windows SDK should fix this.
The difficult part is building and registering a device driver that exposes this information reliably and then fetching it from your application.
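The bit test itself is tiny; hypothetically, in an MSVC kernel-mode build where port I/O is permitted, it would look something like this:
#include <intrin.h>

/* Poll the VGA Input Status Register 1 at port 0x3DA;
   bit 3 is set while the vertical retrace is in progress. */
int inVerticalRetrace(void) {
    unsigned char status = __inbyte(0x3DA);
    return (status & 0x08) != 0;
}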

Direct3D 11 missing GetRasterStatus, how do I detect the vertical blank period?

I'm updating an application in which measurement of the time of presentation of a stimulus on a screen requires the greatest amount of accuracy. It is currently written with DirectDraw, which got put out to pasture a long while ago, and there's a need to update our graphics library.
The way in which we measure the presentation time relies on detecting the end of the Vertical Blank period. Specifically, I need to know, with the greatest possible accuracy, when whatever was flipped onto the primary surface (or presented in the swap chain) is actually being drawn by the screen. Detecting the scan line can increase the certainty of that measurement, but I would be able to work with only detecting when the Vertical Blank period ended immediately after the Flip or Present was called.
Direct 3D 9 has the IDirect3DDevice9::GetRasterStatus Method that returns a D3DRASTER_STATUS struct which includes a InVBlank boolean, that describes if the device is in a vertical blank, as well as the current scan line. DirectDraw has similar functions (IDirectDraw::GetVerticalBlankStatus, also IDirectDraw::GetScanLine which returns DDERR_VERTICALBLANKINPROGRESS during Vertical Blank can be used to detect the VB).
However I have not been able to find any similar function in Direct3D11. Does anyone know if this functionality was moved or removed between Direct3D9 and Direct3D11, and if the latter, why?
Sorry for the late reply, but I notice there is still no accepted answer so perhaps you never found one that worked. Nowadays on Windows, the DesktopWindowManager service (dwm.exe) coordinates everything and can't really be bypassed. Ever since Windows 8, this service can't be disabled.
So DWM is always going to control the frame rate, render queue management, and final composition for all of the various IDXGISurface(n) objects and IDXGIOutput(n) monitors and there isn't much use in tracking VSync for an offscreen render target, unless I'm missing something (no sarcasm intended). As for your question, I wasn't sure if your goal was to:
obtain extremely precise timing info, but just for diagnostic, profiling, or informational use, or
whether the app was then going to (attempt to) use those results to (attempt to) schedule its own present cycles.
If it's the latter, I believe you can effectively only do this if the D3D app is running in full-screen exclusive mode. That's the only case where the DWM (in the guise of DXGI) will truly trust a client to handle its own Present timing.
The (barely) good news here is that if your interest in VSync is informational only—which is to say that you fall into bullet category (1.) from above—then you can indeed get all the timing data you'd ever want, and at QueryPerformanceFrequency resolution, which is typically around 320 ns.¹
Here's how to get that high-res video timing info. But again, just to be clear: despite the apparent success in obtaining the information as shown below, any attempt to use these interesting results (for example, to condition some deterministic, and thus potentially useful, outcome on the readings you obtain) is destined to fail, that is, entirely thwarted by DWM intermediation:
DWM_TIMING_INFO
Specifies Desktop Window Manager (DWM) composition timing information. Used by the DwmGetCompositionTimingInfo function.
typedef struct _DWM_TIMING_INFO
{
UINT32 cbSize; // size of this DWM_TIMING_INFO structure
URATIO rateRefresh; // monitor refresh rate
QPC_TIME qpcRefreshPeriod; // monitor refresh period
URATIO rateCompose; // composition rate
QPC_TIME qpcVBlank; // query performance counter value before the vertical blank
CFRAMES cRefresh; // DWM refresh counter
UINT cDXRefresh; // DirectX refresh counter
QPC_TIME qpcCompose; // query performance counter value for a frame composition
CFRAMES cFrame; // frame number that was composed at qpcCompose
UINT cDXPresent; // DirectX present number used to identify rendering frames
CFRAMES cRefreshFrame; // refresh count of the frame that was composed at qpcCompose
CFRAMES cFrameSubmitted; // DWM frame number that was last submitted
UINT cDXPresentSubmitted; // DirectX present number that was last submitted
CFRAMES cFrameConfirmed; // DWM frame number that was last confirmed as presented
UINT cDXPresentConfirmed; // DirectX present number that was last confirmed as presented
CFRAMES cRefreshConfirmed; // target refresh count of the last frame confirmed as completed by the GPU
UINT cDXRefreshConfirmed; // DirectX refresh count when the frame was confirmed as presented
CFRAMES cFramesLate; // number of frames the DWM presented late
UINT cFramesOutstanding; // number of composition frames that have been issued but have not been confirmed as completed
CFRAMES cFrameDisplayed; // last frame displayed
QPC_TIME qpcFrameDisplayed; // QPC time of the composition pass when the frame was displayed
CFRAMES cRefreshFrameDisplayed; // vertical refresh count when the frame should have become visible
CFRAMES cFrameComplete; // ID of the last frame marked as completed
QPC_TIME qpcFrameComplete; // QPC time when the last frame was marked as completed
CFRAMES cFramePending; // ID of the last frame marked as pending
QPC_TIME qpcFramePending; // QPC time when the last frame was marked as pending
CFRAMES cFramesDisplayed; // number of unique frames displayed
CFRAMES cFramesComplete; // number of new completed frames that have been received
CFRAMES cFramesPending; // number of new frames submitted to DirectX but not yet completed
CFRAMES cFramesAvailable; // number of frames available but not displayed, used, or dropped
CFRAMES cFramesDropped; // number of rendered frames that were never displayed because composition occurred too late
CFRAMES cFramesMissed; // number of times an old frame was composed when a new frame should have been used but was not available
CFRAMES cRefreshNextDisplayed; // frame count at which the next frame is scheduled to be displayed
CFRAMES cRefreshNextPresented; // frame count at which the next DirectX present is scheduled to be displayed
CFRAMES cRefreshesDisplayed; // total number of refreshes that have been displayed for the application since the DwmSetPresentParameters function was last called
CFRAMES cRefreshesPresented; // total number of refreshes that have been presented by the application since DwmSetPresentParameters was last called
CFRAMES cRefreshStarted; // refresh number when content for this window started to be displayed
ULONGLONG cPixelsReceived; // total number of pixels DirectX redirected to the DWM
ULONGLONG cPixelsDrawn; // number of pixels drawn
CFRAMES cBuffersEmpty; // number of empty buffers in the flip chain
}
DWM_TIMING_INFO;
(Note: To horizontally compress the above source code for display on this website, assume the following abbreviations are prepended:)
typedef UNSIGNED_RATIO URATIO;
typedef DWM_FRAME_COUNT CFRAMES;
Now for apps running in windowed mode, you can certainly grab this detailed information as often as you like. If you only need it for passive profiling, then getting the data from DwmGetCompositionTimingInfo is the modern way to do it.
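A minimal sketch of that call (link against dwmapi.lib; note that since Windows 8.1 the hwnd argument must be NULL, which queries the global composition timing):
#include <windows.h>
#include <dwmapi.h>

void queryDwmTiming() {
    DWM_TIMING_INFO timing = {};
    timing.cbSize = sizeof(timing);  // must be set before the call
    if (SUCCEEDED(DwmGetCompositionTimingInfo(NULL, &timing))) {
        // qpcVBlank and qpcRefreshPeriod are in QueryPerformanceCounter
        // units; divide by QueryPerformanceFrequency for seconds.
    }
}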
And speaking of modern, since the question hinted at modernizing, you'll want to consider using an IDXGISwapChain1 obtained from IDXGIFactory2::CreateSwapChainForComposition to enable the use of the new DirectComposition component.
DirectComposition enables rich and fluid transitions by achieving a high framerate, using graphics hardware, and operating independently of the UI thread. DirectComposition can accept bitmap content drawn by different rendering libraries, including Microsoft DirectX bitmaps, and bitmaps rendered to a window (HWND bitmaps). Also, DirectComposition supports a variety of transformations, such as 2D affine transforms and 3D perspective transforms, as well as basic effects such as clipping and opacity.
Anyway, it seems unlikely that detailed timing information can usefully inform an app's runtime behavior; maybe it will help you predict your next VSync, but one does wonder what significance "keen awareness of the blanking period" might have for some particular DWM-subjugated offscreen swap chain.
Because your app's surface is just one of many that the DWM is juggling, the DWM is going to be doing all kinds of dynamic adaptation of its own, under the assumption that each client behaves consistently. Unpredictable adaptations are uncooperative in such a regime, and will likely just end up confounding both parties.
Notes: 1. The resolution of QPC is many orders of magnitude higher than that of the DateTime tick, despite the latter's suggestive use of a 100 ns unit denomination. Think of DateTime.Now.Ticks as a repackaging of the (millisecond-denominated) Environment.TickCount, merely converted to 100 ns units. For the highest possible resolution, use the static method Stopwatch.GetTimestamp() instead of DateTime.Now.Ticks.
Another alternative:
There's D3DKMTGetScanLine() which works with D3D9, D3D10, D3D11, D3D12, and even OpenGL.
It's actually a GDI32 function, so you piggyback off Windows' existing graphics adapter handle (hAdapter) to poll the VBlank/scanline - no need to create a Direct3D frame buffer. That's why this API works fine with OpenGL, Mantle, and non-Direct3D renderers too, despite the D3D prefix of the API call.
It also tells you VBlank status & Raster scan line.
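A hedged sketch of the call (the structures live in the WDK header d3dkmthk.h, which recent Windows SDKs also ship; on older SDKs you may have to declare them yourself and GetProcAddress the entry points from gdi32.dll):
#include <windows.h>
#include <d3dkmthk.h>

BOOL queryScanLine(HDC monitorDc, UINT* scanLine, BOOL* inVBlank) {
    D3DKMT_OPENADAPTERFROMHDC open = {};
    open.hDc = monitorDc;                         // HDC of the monitor to poll
    if (D3DKMTOpenAdapterFromHdc(&open) != 0)     // 0 == STATUS_SUCCESS
        return FALSE;

    D3DKMT_GETSCANLINE query = {};
    query.hAdapter = open.hAdapter;
    query.VidPnSourceId = open.VidPnSourceId;
    NTSTATUS status = D3DKMTGetScanLine(&query);

    D3DKMT_CLOSEADAPTER close = { open.hAdapter };
    D3DKMTCloseAdapter(&close);

    if (status != 0)
        return FALSE;
    *scanLine = query.ScanLine;
    *inVBlank = (query.InVerticalBlank != 0);
    return TRUE;
}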
The call is useful for beam racing in extreme "latency is critical" applications. Some virtual reality renderers use beam racing, where even a mere 20 ms of lag can mean the difference between pleasant VR and dizzying, pukeworthy VR.
Beam racing means rendering on the fly, following the scanout of the display. In specialized latency-critical applications, you can reduce the latency from Direct3D Present() to pixels hitting your eyeballs to an absolute minimum (as little as 3 ms).
To understand what beam racing is, see https://www.wired.com/2009/03/racing-the-beam/ - it was common back in the day when graphics chips had no frame buffers, which made beam racing necessary for improved graphics on the Atari 2600, Nintendo, Commodore 64, etc.
For a more modern implementation of beam racing, see Lagless VSYNC ON Algorithm for Emulators.
"Specifically I need to know with, the greatest possible accuracy, when whatever was flipped onto the primary surface (or presented in the swap chain) is actually being drawn by the screen."
Good luck.
There is actually no guarantee that anything you put into the present queue will ever be shown on screen (!!); you can manually drop frames with buffer-sequencing present flags, or NVIDIA can do it for you (... thanks?)
Buffer Sequencing in DXGI
The DXGI swap chain's flip queue is generally FIFO, but popular new driver overrides (e.g. FastSync), which users concerned with latency will most assuredly have enabled, favor CPU-side throughput over such trivial things as displaying any of the frames you draw :)
Normally you could count on IDXGISwapChain::Present (...) to begin blocking when the swap chain is full of undisplayed images and the driver is staging commands n-many frames ahead of the GPU, but with FastSync forced, Present never blocks, and the render-ahead queue flushes its work by overwriting any completed frames in the swap chain that are waiting on VBLANK.
Back-to-back presents that complete faster than the screen refresh are under no obligation to (and will not) scan out, so their status in relation to VBLANK is meaningless.
Unless you implement rate limiting yourself to prevent the CPU from immediately staging the next frame after any call to Present, you need a different paradigm for measuring frame status altogether.
D3D9Ex / DXGI Supports Presentation Statistics in Flip / Fullscreen Exclusive:
Frames do not actually present to a user unless the following APIs say they do:
IDXGISwapChain::GetFrameStatistics (...) and IDXGISwapChain::GetLastPresentCount (...)
You can use frame stats to compute the length of the render queue / present latency in real time, and your timing goals can likely be satisfied by tracking a present number against the accounting information for successfully synced frames.
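For illustration, a minimal sketch of reading those statistics, assuming an existing IDXGISwapChain* on a flip-model or fullscreen-exclusive swap chain (GetFrameStatistics returns DXGI_ERROR_FRAME_STATISTICS_DISJOINT until valid data is available, which a real app must handle):
#include <dxgi.h>
#include <cstdio>

void reportPresentQueue(IDXGISwapChain* swapChain) {
    UINT lastPresentCount = 0;
    DXGI_FRAME_STATISTICS stats = {};
    if (SUCCEEDED(swapChain->GetLastPresentCount(&lastPresentCount)) &&
        SUCCEEDED(swapChain->GetFrameStatistics(&stats))) {
        // Presents issued minus presents confirmed on-screen = queue depth.
        UINT framesInFlight = lastPresentCount - stats.PresentCount;
        // stats.SyncQPCTime holds the QPC timestamp of the vblank for the
        // last confirmed present; compare with QueryPerformanceCounter.
        std::printf("frames in flight: %u\n", framesInFlight);
    }
}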
The question here is: why? It looks like you want to solve a symptom of your issue; maybe that's a distraction from your real issue. Waiting for vsync was a useful technique on the Amiga or under DOS. It is totally wrong on any compositing or multithreaded OS.
First, what do you want to achieve? Tearing-free rendering is done by setting a swap interval on either D3D or OpenGL. It is harmful to try to do better than the OS there. Just think about cases like multiple monitors or what happens if more than one app tries to sync.
If you are a client of some other process and want to run your timing on VSync, Windows unfortunately offers no object to wait on as far as I know. Your best bet is to still rely on the Present call and estimate what is happening.
There are two cases: you are either rendering (presenting) faster or slower than vsync. If you are faster, Present should block for you already. If Present never waits and your time between calls is more than 1/60 s, you probably want to render less often.
The most common case where people care about VSync is video. You can render a lot faster than vsync but want to wait for just the right time to present. The only thing to do there is to run a few frames as fast as you can and estimate your frame timing from that. Use some jitter and feedback... or use the built-in hardware video path, which is happy enough to be kernel friends with the video driver.

box2d + cocos2d: Why is there a delay when manipulating objects in box 2d using mouseJoint

When I drag an object in my game, the object is never directly under the finger. There is this lag/delay that I cannot get rid of; it follows my finger instead of being directly underneath it. You can try it in the Testbed as well: try moving an object really fast, and the object is never underneath the mouse/finger.
Is this a weakness in Box2D, or am I missing something obvious?
Thanks in advance
That's because a mouse joint is similar to a distance joint (a spring). There is a maxForce parameter you can specify to minimize the delay - it makes the spring stiffer.
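A minimal sketch of raising maxForce when creating the joint, assuming Box2D 2.x, an existing b2World* world, a static groundBody to anchor against, and the b2Body* being dragged:
#include <Box2D/Box2D.h>

b2MouseJoint* createTightMouseJoint(b2World* world, b2Body* groundBody,
                                    b2Body* dragged, const b2Vec2& target) {
    b2MouseJointDef def;
    def.bodyA = groundBody;      // static anchor body
    def.bodyB = dragged;         // the body that follows the finger
    def.target = target;         // initial grab point in world coordinates
    // Scaling maxForce by mass makes the body track the finger closely;
    // higher values mean a stiffer spring and harsher corrections.
    def.maxForce = 1000.0f * dragged->GetMass();
    return static_cast<b2MouseJoint*>(world->CreateJoint(&def));
}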
EDIT:
You can also move your object directly by setting its position to your finger position. But if the object collides with something, it will behave non-physically, because the velocity of the body will be zero.
So to move it correctly (if there will be collisions) you should specify its velocity or acceleration (as the mouse joint does). But evaluating your finger's velocity takes some time, so a delay will remain.
Most of it has to do with latency in the hardware. If your timing is completely perfect, there will be 16 ms of lag caused by the iPhone's GPU, ~20 ms of lag from the touchscreen, and then however long the processing of your scene takes. Those add up to anywhere between 36 and 70 ms of lag. Also, there is a small amount of damping applied to the mouse joint in Box2D, for the stability of the physics simulation.