XAudio2 playing delay - c++

I'm trying to play raw-pcm data with xaudio and i have huge delays in playing (>=5ms). I'm doing this with next code:
bool Play(uint8_t *data, size_t size)
{
_xaudio_buffer.AudioBytes = size;
_xaudio_buffer.pAudioData = data;
Time t1;
if (_g_source->SubmitSourceBuffer(&_xaudio_buffer) != S_OK)
return false;
if(WaitForSingleObjectEx(_voice_callback.hBufferEndEvent,INFINITE,true) != WAIT_OBJECT_0)
return false;
Time t2;
printf("%d\n",t2-t1);
}
Time class is just a wrapper under GetTickCount, the resulting t2-t1 will return difference in milliseconds. I've checked that my Time class doesn't produce any additional delay.
It is not so hard to calculate play time in milliseconds:
play_time = size*1000 / (channels*(bits_per_sample/8) * frequency)
So for data with size 4608 bytes, 48 khz, 2 channels, 16 bit per sample it should take nearly 24ms for playing such chunk. Instead, code that i showed above needs about >= 31 ms for playing such chunk. What is producing such delay? How to deal with it, if i'm writing a video-player and i obtaining data from real-time stream (i already have synchronization function, but 5ms delay for such small sample produces not ideal sound)?
Also, I've tested this code on 2 computers with Windows 7 with different hardware. Both producing same delay.

Related

Inconsistent chrono::high_resolution_clock delay

I'm trying to implement a MIDI-like clocked sample player.
There is a timer, which increments pulse counter, and every 480 pulses is a quarter, so pulse period is 1041667 ns for 120 beats per minute.
Timer is not sleep-based and running in separate thread, but it seems like delay time is inconsistent: period between samples played in a test file is fluctuating +- 20 ms (in some occasions period is OK and steady, I can't find out dependency of this effect).
Audio backend influence is excluded: i've tried OpenAL as well as SDL_mixer.
void Timer_class::sleep_ns(uint64_t ns){
auto start = std::chrono::high_resolution_clock::now();
bool sleep = true;
while(sleep)
{
auto now = std::chrono::high_resolution_clock::now();
auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(now - start);
if (elapsed.count() >= ns) {
TestTime = elapsed.count();
sleep = false;
//break;
}
}
}
void Timer_class::Runner(void){
// this running as thread
while(1){
sleep_ns(BPMns);
if (Run) Transport.IncPlaybackMarker(); // marker increment
if (Transport.GetPlaybackMarker() == Transport.GetPlaybackEnd()){ // check if timer have reached end, which is 480 pulses
Transport.SetPlaybackMarker(Transport.GetPlaybackStart());
Player.PlayFile(1); // period of this event fluctuates severely
}
}
};
void Player_class::PlayFile(int FileNumber){
#ifdef AUDIO_SDL_MIXER
if(Mix_PlayChannel(-1, WaveData[FileNumber], 0)==-1) {
printf("Mix_PlayChannel: %s\n",Mix_GetError());
}
#endif // AUDIO_SDL_MIXER
}
Am i doing something wrong in terms of an approach? Is there any better way to implement timer of this kind?
Deviation higher than 4-5 ms is too much in case of audio.
I see a large error and a small error. The large error is that your code assumes that the main processing in Runner consistently takes zero time:
if (Run) Transport.IncPlaybackMarker(); // marker increment
if (Transport.GetPlaybackMarker() == Transport.GetPlaybackEnd()){ // check if timer have reached end, which is 480 pulses
Transport.SetPlaybackMarker(Transport.GetPlaybackStart());
Player.PlayFile(1); // period of this event fluctuates severely
}
That is, you're "sleeping" for the time you want your loop iteration to take, and then you're doing processing on top of that.
The small error is presuming that you can represent your ideal loop iteration time with an integral number of nanoseconds. This error is so small that it doesn't really matter. However I amuse myself by showing people how they can get rid of this error too. :-)
First lets correct the small error by exactly representing the idealized loop iteration time:
using quarterPeriod = std::ratio<1, 2>;
using iterationPeriod = std::ratio_divide<quarterPeriod, std::ratio<480>>;
using iteration_time = std::chrono::duration<std::int64_t, iterationPeriod>;
I know nothing of music, but I'm guessing the above code is right because if you convert iteration_time{1} to nanoseconds, you get approximately 1041667ns. iteration_time{1} is intended to be the precise amount of time you want each iteration of your loop in Timer_class::Runner to take.
To correct the large error, you need to sleep until a time_point, as opposed to sleeping for a duration. Here's a generic utility to help you do that:
template <class Clock, class Duration>
void
delay_until(std::chrono::time_point<Clock, Duration> tp)
{
while (Clock::now() < tp)
;
}
Now if you code Timer_class::Runner to use delay_until instead of sleep_ns, I think you'll get better results:
void
Timer_class::Runner()
{
auto next_start = std::chrono::steady_clock::now() + iteration_time{1};
while (true)
{
if (Run) Transport.IncPlaybackMarker(); // marker increment
if (Transport.GetPlaybackMarker() == Transport.GetPlaybackEnd()){ // check if timer have reached end, which is 480 pulses
Transport.SetPlaybackMarker(Transport.GetPlaybackStart());
Player.PlayFile(1);
}
delay_until(next_start);
next_start += iteration_time{1};
}
}
I ended up using #howard-hinnant version of delay, and reducing buffer size in openal-soft, that's what made a huge difference, fluctuations is now about +-5 ms for 1/16th at 120BPM (125 ms period) and +-1 ms for quarter beats. Leaves a lot to be desired, but i guess it's okay

pthread_cond_timedwait timing out late when large load put on CPU

In writing unit tests for an object, I am noticing that a pthread_cond_timedwait does not timeout soon enough when large loads are put upon the CPU. If these loads are not put on the CPU, everything works fine. When loads are put on to the system, however, I find that no matter the amount of time I set the timeout to, the true delay is off by about 50-100ms.
For example, here is a printout from a single interval of the program, where the last and current times are found using the function GetTimeInMs.
// Printout, values are in ms
Last: 89799240
Current: 89799440
Period Length: 200
Expected Period: 100
From all I have read this issue is usually caused by using relative times instead of absolute times, but as far as I can tell we are using absolute times correctly. If you wonderful people could help me figure out what is being done wrong here I would be very grateful.
The function utilizing timedwait is shown here. Note that based off of timing debugging I have done, I know the extra time generated is done via the timedwait call, so I have not included other code that would not be necessary.
bool func(unsigned long long int time = 100) // ms
{
struct timespec ts;
pthread_mutex_lock(&m_Mutex);
if (0 == m_CurrentCount)
{
// Current time + delay in ns
unsigned long long int absnanotime = (GetTimeInMs()+time)*1000000;
struct timespec ts;
ts.tv_nsec = absnanotime % 1000000000ULL;
ts.tv_sec = absnanotime / 1000000000ULL;
do
{
if (0 != pthread_cond_timedwait(&m_Condition, &m_Mutex, &ts))
{
// In the case I am testing, I hope to get here via timeout in 100 ms
pthread_mutex_unlock(&m_Mutex);
return false;
}
}
while (!m_CurrentCount);
}
pthread_mutex_unlock(&m_Mutex);
return true;
}
unsigned long long int GetTimeInMs()
{
unsigned long long int time;
struct timespec ts;
clock_gettime(CLOCK_MONOTONIC, &ts);
time = ts.tv_nsec + ts.tv_sec * 1000000000ULL;
time = time / 1000000ULL; // Converts to ms
return time;
}
The code used to initialize the class variables used in func.
void init()
{
pthread_mutex_init(&m_Mutex, NULL);
pthread_condattr_init(&m_Attr);
pthread_condattr_setclock(&m_Attr, CLOCK_MONOTONIC);
pthread_cond_init(&m_Condition, &m_Attr);
}
The CPU eater thread which simulates CPU load is running the following while loop.
void cpuEatingThread()
{
while (false == m_ShutdownRequested);
{
// m_UselessFoo is of type float*
m_UselessFoo = new float(1.23423525);
delete m_UselessFoo;
}
}
It's likely that, when the wait times out, the thread becomes ready without any priority boost or any other such action/s. If the box is loaded up, then the ready thread may not become running immediately.
It's common to apply temporary priority boosts to thread that become ready on signals - this tends to improve overall performance in the 'usual' case where the signal arrives before the timeout. The timeout is often more of an 'unusual' event, often signaling some sort of failure that will not be repeated and so threads becoming ready on timeout can wait their turn:)
For timed waits in general, the requirement is that they will wait at least as long as their argument. If you want precise times, this is not the right tool; you'll need something that guarantees particular times, and that's generally only available in a real-time operating system (RTOS).

The best way to measure milliseconds for deltatime on Mac OS?

I have found a function to get milliseconds since the Mac was started:
U32 Platform::getRealMilliseconds()
{
// Duration is a S32 value.
// if negative, it is in microseconds.
// if positive, it is in milliseconds.
Duration durTime = AbsoluteToDuration(UpTime());
U32 ret;
if( durTime < 0 )
ret = durTime / -1000;
else
ret = durTime;
return ret;
}
The problem is that after ~20 days AbsoluteToDuration returns INT_MAX all the time until the Mac is rebooted.
I have tried to use method below, it worked, but looks like gettimeofday takes more time and slows down the game a bit:
timeval tim;
gettimeofday(&tim, NULL);
U32 ret = ((tim.tv_sec) * 1000 + tim.tv_usec/1000.0) + 0.5;
Is there a better way to get number of milliseconds elapsed since some epoch (preferably since the app started)?
Thanks!
Your real problem is that you are trying to fit an uptime-in-milliseconds value into a 32-bit integer. If you do that your value will always wrap back to zero (or saturate) in 49 days or less, no matter how you obtain the value.
One possible solution would be to track time values with a 64-bit integer instead; that way the day of reckoning gets postponed for a few hundred years and so you don't have to worry about the problem. Here's a MacOS/X implementation of that:
uint64_t GetTimeInMillisecondsSinceBoot()
{
return UnsignedWideToUInt64(AbsoluteToNanoseconds(UpTime()))/1000000;
}
... or if you don't want to return a 64-bit time value, the next-best thing would be to record the current time-in-milliseconds value when your program starts, and then always subtract that value from the values you return. That way things won't break until your own program has been running for at least 49 days, which I suppose is unlikely for a game.
uint32_t GetTimeInMillisecondsSinceProgramStart()
{
static uint64_t _firstTimeMillis = GetTimeInMillisecondsSinceBoot();
uint64_t nowMillis = GetTimeInMillisecondsSinceBoot();
return (uint32_t) (nowMillis-_firstTimeMillis);
}
My preferred method is mach_absolute_time - see this tech note - I use the second method, i.e. mach_absolute_time to get time stamps and mach_timebase_info to get the constants needed to convert the difference between time stamps into an actual time value (with nanosecond resolution).

Win32 API timers

I was using the system timer (clock() function, see time.h) to time some serial and USB comms. All I needed was approx 1ms accurace. The first thing I noticed is that individual times can be out (plus or minus) 10ms. Timing a number of smaller events led to less accurate timing as events went by. Aggregate timing was slightly better. After a bit of a root on MSDN etc I stumbled across the timer in windows multi-media library (timeGetTime(), see MMSystem.h). This was much better with decent accuracy to the 1ms level.
Weirdness then ensued, after initially working flawlessy for days (lovely logs with useful timings) it all went pearshaped as this API also started showing this odd granularity (instead of a bunch of small comms messages taking 3ms,2ms,3ms,2ms, 3ms etc. it came out as 0ms, 0ms, 0ms, 0ms, 15ms etc. Rebooting the PC restored nomal accuarce but at some indeterminate time (after an hour or so) the anomoly returned.
Anyone got any idea or suggestions of how to get this level of timing accuracy on Windows XP (32bit Pro, using Visual Studio 2008).
My little timing class:
class TMMTimer
{
public:
TMMTimer( unsigned long msec);
TMMTimer();
void Clear() { is_set = false; }
void Set( unsigned long msec=0);
bool Expired();
unsigned long Elapsed();
private:
unsigned long when;
int roll_over;
bool is_set;
};
/** Main constructor.
*/
TMMTimer::TMMTimer()
{
is_set = false;
}
/** Main constructor.
*/
TMMTimer::TMMTimer( unsigned long msec)
{
Set( msec);
}
/** Set the timer.
*
* #note This sets the timer to some point in the future.
* Because the timer can wrap around the function sets a
* rollover flag for this condition which is checked by the
* Expired member function.
*/
void TMMTimer::Set( unsigned long msec /*=0*/)
{
unsigned long now = timeGetTime(); // System millisecond counter.
when = now + msec;
if (when < now)
roll_over = 1;
else
roll_over = 0;
is_set = true;
}
/** Check if timer expired.
*
* #return Returns true if expired, else false.
*
* #note Also returns true if timer was never set. Note that this
* function can handle the situation when the system timer
* rolls over (approx every 47.9 days).
*/
bool TMMTimer::Expired()
{
if (!is_set)
return true;
unsigned long now = timeGetTime(); // System millisecond counter.
if (now > when)
{
if (!roll_over)
{
is_set = false;
return true;
}
}
else
{
if (roll_over)
roll_over = false;
}
return false;
}
/** Returns time elapsed since timer expired.
*
* #return Time in milliseconds, 0 if timer was never set.
*/
unsigned long TMMTimer::Elapsed()
{
if (!is_set)
return 0;
return timeGetTime()-when;
}
Did you call timeBeginPeriod(1); to set the multimedia resolution to 1 millisecond? The multimedia timer resolution is system-global, so if you didn't set it yourself, chances are that you started after something else had called it, then when that something else called timeEndPeriod(), the resolution went back to the system default (which is normally 10 ms, if memory serves).
Others have advised using QueryPerformanceCounter(). This does have much higher resolution, but you still need to be careful. Depending on the kernel involved, it can/will use the x86 RDTSC function, which is a 64-bit counter of instruction cycles. For better or worse, on a CPU whose clock rate varies (which started on laptops, but is now common almost everywhere) the relationship between the clock count and wall time varies right along with the clock speed. If memory serves, if you force Windows to install with the assumption that there are multiple physical processors (not just multiple cores), you'll get a kernel in which QueryPerformanceCounter() will read the motherboard's 1.024 MHz clock instead. This reduces resolution compared to the CPU clock, but at least the speed is constant (and if you only need 1 ms resolution, it should be more than adequate anyway).
If you want high resolution timing on Windows, you should consider using QueryPerformanceFrequency and QueryPerformanceCounter.
These will provide the most accurate timings available on Windows, and should be much more "stable" over time. QPF gives you the number of counts/second, and QPC gives you the current count. You can use this to do very high resolution timings on most systems (fraction of ms).
Check out the high resolution timers from the Win32 API.
http://msdn.microsoft.com/en-us/library/ms644904(VS.85).aspx
You can use it to usually get microsecond resolution timers.

C++ windows time

I have a problem in using time.
I want to use and get microseconds on windows using C++.
I can't find the way.
The "canonical" answer was given by unwind :
One popular way is using the QueryPerformanceCounter() call.
There are however few problems with this method:
it's intended for measurement of time intervals, not time. This means you have to write code to establish "epoch time" from which you will measure precise intervals. This is called calibration.
As you calibrate your clock, you also need to periodically adjust it so it's never too much out of sync (this is called drift) with your system clock.
QueryPerformanceCounter is not implemented in user space; this means context switch is needed to call kernel side of implementation, and that is relatively expensive (around 0.7 microsecond). This seems to be required to support legacy hardware.
Not all is lost, though. Points 1. and 2. are something you can do with a bit of coding, 3. can be replaced with direct call to RDTSC (available in newer versions of Visual C++ via __rdtsc() intrinsic), as long as you know accurate CPU clock frequency. Although, on older CPUs, such call would be susceptible to changes in cpu internal clock speed, in all newer Intel and AMD CPUs it is guaranteed to give fairly accurate results and won't be affected by changes in CPU clock (e.g. power saving features).
Lets get started with 1. Here is data structure to hold calibration data:
struct init
{
long long stamp; // last adjustment time
long long epoch; // last sync time as FILETIME
long long start; // counter ticks to match epoch
long long freq; // counter frequency (ticks per 10ms)
void sync(int sleep);
};
init data_[2] = {};
const init* volatile init_ = &data_[0];
Here is code for initial calibration; it has to be given time (in milliseconds) to wait for the clock to move; I've found that 500 milliseconds give pretty good results (the shorter time, the less accurate calibration). For the purpose of callibration we are going to use QueryPerformanceCounter() etc. You only need to call it for data_[0], since data_[1] will be updated by periodic clock adjustment (below).
void init::sync(int sleep)
{
LARGE_INTEGER t1, t2, p1, p2, r1, r2, f;
int cpu[4] = {};
// prepare for rdtsc calibration - affinity and priority
SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);
SetThreadAffinityMask(GetCurrentThread(), 2);
Sleep(10);
// frequency for time measurement during calibration
QueryPerformanceFrequency(&f);
// for explanation why RDTSC is safe on modern CPUs, look for "Constant TSC" and "Invariant TSC" in
// Intel(R) 64 and IA-32 Architectures Software Developer’s Manual (document 253668.pdf)
__cpuid(cpu, 0); // flush CPU pipeline
r1.QuadPart = __rdtsc();
__cpuid(cpu, 0);
QueryPerformanceCounter(&p1);
// sleep some time, doesn't matter it's not accurate.
Sleep(sleep);
// wait for the system clock to move, so we have exact epoch
GetSystemTimeAsFileTime((FILETIME*) (&t1.u));
do
{
Sleep(0);
GetSystemTimeAsFileTime((FILETIME*) (&t2.u));
__cpuid(cpu, 0); // flush CPU pipeline
r2.QuadPart = __rdtsc();
} while(t2.QuadPart == t1.QuadPart);
// measure how much time has passed exactly, using more expensive QPC
__cpuid(cpu, 0);
QueryPerformanceCounter(&p2);
stamp = t2.QuadPart;
epoch = t2.QuadPart;
start = r2.QuadPart;
// calculate counter ticks per 10ms
freq = f.QuadPart * (r2.QuadPart-r1.QuadPart) / 100 / (p2.QuadPart-p1.QuadPart);
SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_NORMAL);
SetThreadAffinityMask(GetCurrentThread(), 0xFF);
}
With good calibration data you can calculate exact time from cheap RDTSC (I measured the call and calculation to be ~25 nanoseconds on my machine). There are three things to note:
return type is binary compatible with FILETIME structure and is precise to 100ns , unlike GetSystemTimeAsFileTime (which increments in 10-30ms or so intervals, or 1 millisecond at best).
in order to avoid expensive conversions integer to double to integer, the whole calculation is performed in 64 bit integers. Even though these can hold huge numbers, there is real risk of integer overflow, and so start must be brought forward periodically to avoid it. This is done in clock adjustment.
we are making a copy of calibration data, because it might have been updated during our call by clock adjustement in another thread.
Here is the code to read current time with high precision. Return value is binary compatible with FILETIME, i.e. number of 100-nanosecond intervals since Jan 1, 1601.
long long now()
{
// must make a copy
const init* it = init_;
// __cpuid(cpu, 0) - no need to flush CPU pipeline here
const long long p = __rdtsc();
// time passed from epoch in counter ticks
long long d = (p - it->start);
if (d > 0x80000000000ll)
{
// closing to integer overflow, must adjust now
adjust();
}
// convert 10ms to 100ns periods
d *= 100000ll;
d /= it->freq;
// and add to epoch, so we have proper FILETIME
d += it->epoch;
return d;
}
For clock adjustment, we need to capture exact time (as provided by system clock) and compare it against our clock; this will give us drift value. Next we use simple formula to calculate "adjusted" CPU frequency, to make our clock meet system clock at the time of next adjustment. Thus it is important that adjustments are called on regular intervals; I've found that it works well when called in 15 minutes intervals. I use CreateTimerQueueTimer, called once at program startup to schedule adjustment calls (not demonstrated here).
The slight problem with capturing accurate system time (for the purpose of calculating drift) is that we need to wait for the system clock to move, and that can take up to 30 milliseconds or so (it's a long time). If adjustment is not performed, it would risk integer overflow inside function now(), not to mention uncorrected drift from system clock. There is builtin protection against overflow in now(), but we really don't want to trigger it synchronously in a thread which happened to call now() at the wrong moment.
Here is the code for periodic clock adjustment, clock drift is in r->epoch - r->stamp:
void adjust()
{
// must make a copy
const init* it = init_;
init* r = (init_ == &data_[0] ? &data_[1] : &data_[0]);
LARGE_INTEGER t1, t2;
// wait for the system clock to move, so we have exact time to compare against
GetSystemTimeAsFileTime((FILETIME*) (&t1.u));
long long p = 0;
int cpu[4] = {};
do
{
Sleep(0);
GetSystemTimeAsFileTime((FILETIME*) (&t2.u));
__cpuid(cpu, 0); // flush CPU pipeline
p = __rdtsc();
} while (t2.QuadPart == t1.QuadPart);
long long d = (p - it->start);
// convert 10ms to 100ns periods
d *= 100000ll;
d /= it->freq;
r->start = p;
r->epoch = d + it->epoch;
r->stamp = t2.QuadPart;
const long long dt1 = t2.QuadPart - it->epoch;
const long long dt2 = t2.QuadPart - it->stamp;
const double s1 = (double) d / dt1;
const double s2 = (double) d / dt2;
r->freq = (long long) (it->freq * (s1 + s2 - 1) + 0.5);
InterlockedExchangePointer((volatile PVOID*) &init_, r);
// if you have log output, here is good point to log calibration results
}
Lastly two utility functions. One will convert FILETIME (including output from now()) to SYSTEMTIME while preserving microseconds to separate int. Other will return frequency, so your program can use __rdtsc() directly for accurate measurements of time intervals (with nanosecond precision).
void convert(SYSTEMTIME& s, int &us, long long f)
{
LARGE_INTEGER i;
i.QuadPart = f;
FileTimeToSystemTime((FILETIME*) (&i.u), &s);
s.wMilliseconds = 0;
LARGE_INTEGER t;
SystemTimeToFileTime(&s, (FILETIME*) (&t.u));
us = (int) (i.QuadPart - t.QuadPart)/10;
}
long long frequency()
{
// must make a copy
const init* it = init_;
return it->freq * 100;
}
Well of course none of the above is more accurate than your system clock, which is unlikely to be more accurate than few hundred milliseconds. The purpose of precise clock (as opposed to accurate) as implemented above, is to provide single measure which can be used for both:
cheap and very accurate measurement of time intervals (not wall time),
much less accurate, but monotonous and consistent with the above, measure of wall time
I think it does it pretty well. Example use are logs, where one can use timestamps not only to find time of events, but also reason about internal program timings, latency (in microseconds) etc.
I leave the plumbing (call to initial calibration, scheduling adjustment) as an exercise for gentle readers.
You can use boost date time library.
You can use boost::posix_time::hours, boost::posix_time::minutes,
boost::posix_time::seconds, boost::posix_time::millisec, boost::posix_time::nanosec
http://www.boost.org/doc/libs/1_39_0/doc/html/date_time.html
One popular way is using the QueryPerformanceCounter() call. This is useful if you need high-precision timing, such as for measuring durations that only take on the order of microseconds. I believe this is implemented using the RDTSC machine instruction.
There might be issues though, such as the counter frequency varying with power-saving, and synchronization between multiple cores. See the Wikipedia link above for details on these issues.
Take a look at the Windows APIs GetSystemTime() / GetLocalTime() or GetSystemTimeAsFileTime().
GetSystemTimeAsFileTime() expresses time in 100 nanosecond intervals, that is 1/10 of a microsecond. All functions provide the current time with in millisecond accuracy.
EDIT:
Keep in mind, that on most Windows systems the system time is only updated about every 1 millisecond. So even representing your time with microsecond accuracy makes it still necessary to acquire the time with such a precision.
Take a look at this: http://www.decompile.com/cpp/faq/windows_timer_api.htm
May be this can help:
NTSTATUS WINAPI NtQuerySystemTime(__out PLARGE_INTEGER SystemTime);
SystemTime [out] - a pointer to a LARGE_INTEGER structure that receives the system time. This is a 64-bit value representing the number of 100-nanosecond intervals since January 1, 1601 (UTC).