How to get below 10ms latency using WASAPI shared mode? - c++

According to Microsoft, starting with Windows 10, applications using shared-mode WASAPI can request buffer sizes smaller than 10ms (see https://msdn.microsoft.com/en-us/library/windows/hardware/mt298187%28v=vs.85%29.aspx).
According to the article, achieving such low latencies requires some driver updates, which I did. Using an exclusive-mode render and capture stream, I measured a total round-trip latency (using a hardware loopback cable) of around 13ms. This suggests to me that at least one of the endpoints successfully achieves a latency of < 10ms. (Is this assumption correct?)
The article mentions that applications can use the new IAudioClient3 interface to query the minimum buffer size supported by the Windows audio engine using IAudioClient3::GetSharedModeEnginePeriod(). However, this function always returns 10ms on my system, and any attempt to initialize an audio stream using either IAudioClient::Initialize() or IAudioClient3::InitializeSharedAudioStream() with a period lower than 10ms always results in AUDCLNT_E_INVALID_DEVICE_PERIOD.
Just to be sure, I also disabled any effects processing in the audio drivers.
What am I missing? Is it even possible to get low latency from shared mode?
See below for some sample code.
#include <windows.h>
#include <atlbase.h>
#include <mmdeviceapi.h>
#include <audioclient.h>
#include <iostream>
#define VERIFY(hr) do { \
auto temp = (hr); \
if(FAILED(temp)) { \
std::cout << "Error: " << #hr << ": " << temp << "\n"; \
goto error; \
} \
} while(0)
int main(int argc, char** argv) {
HRESULT hr;
CComPtr<IMMDevice> device;
AudioClientProperties props;
CComPtr<IAudioClient> client;
CComPtr<IAudioClient2> client2;
CComPtr<IAudioClient3> client3;
CComHeapPtr<WAVEFORMATEX> format;
CComPtr<IMMDeviceEnumerator> enumerator;
REFERENCE_TIME minTime, maxTime, engineTime;
UINT32 min, max, fundamental, default_, current;
VERIFY(CoInitializeEx(nullptr, COINIT_APARTMENTTHREADED));
VERIFY(enumerator.CoCreateInstance(__uuidof(MMDeviceEnumerator)));
VERIFY(enumerator->GetDefaultAudioEndpoint(eRender, eMultimedia, &device));
VERIFY(device->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr, reinterpret_cast<void**>(&client)));
VERIFY(client->QueryInterface(&client2));
VERIFY(client->QueryInterface(&client3));
VERIFY(client3->GetCurrentSharedModeEnginePeriod(&format, &current));
// Always fails with AUDCLNT_E_OFFLOAD_MODE_ONLY.
hr = client2->GetBufferSizeLimits(format, TRUE, &minTime, &maxTime);
if(hr == AUDCLNT_E_OFFLOAD_MODE_ONLY)
std::cout << "GetBufferSizeLimits returned AUDCLNT_E_OFFLOAD_MODE_ONLY.\n";
else if(SUCCEEDED(hr))
std::cout << "hw min = " << (minTime / 10000.0) << " hw max = " << (maxTime / 10000.0) << "\n";
else
VERIFY(hr);
// Correctly? reports a minimum hardware period of 3ms and audio engine period of 10ms.
VERIFY(client->GetDevicePeriod(&engineTime, &minTime));
std::cout << "hw min = " << (minTime / 10000.0) << " engine = " << (engineTime / 10000.0) << "\n";
// All values are set to a number of frames corresponding to 10ms.
// This does not change if i change the device's sampling rate in the control panel.
VERIFY(client3->GetSharedModeEnginePeriod(format, &default_, &fundamental, &min, &max));
std::cout << "default = " << default_
<< " fundamental = " << fundamental
<< " min = " << min
<< " max = " << max
<< " current = " << current << "\n";
props.bIsOffload = FALSE;
props.cbSize = sizeof(props);
props.eCategory = AudioCategory_ForegroundOnlyMedia;
props.Options = AUDCLNT_STREAMOPTIONS_RAW | AUDCLNT_STREAMOPTIONS_MATCH_FORMAT;
// Doesn't seem to have any effect regardless of category/options values.
VERIFY(client2->SetClientProperties(&props));
format.Free();
VERIFY(client3->GetCurrentSharedModeEnginePeriod(&format, &current));
VERIFY(client3->GetSharedModeEnginePeriod(format, &default_, &fundamental, &min, &max));
std::cout << "default = " << default_
<< " fundamental = " << fundamental
<< " min = " << min
<< " max = " << max
<< " current = " << current << "\n";
error:
CoUninitialize();
return 0;
}

Per Hans in the comment above, double-check that you've followed Microsoft's Low Latency Audio instructions.
I'd reboot the machine just to be sure; Windows can be a bit finicky with that kind of thing.
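When comparing the numbers printed by the sample code, keep in mind that they use different units: IAudioClient::GetDevicePeriod() reports REFERENCE_TIME values (100-nanosecond units), while IAudioClient3::GetSharedModeEnginePeriod() reports periods in frames. A minimal sketch of the conversions follows; the helper names are hypothetical and not part of WASAPI.

```cpp
#include <cstdint>

// REFERENCE_TIME counts 100-nanosecond units, so 10000 units make 1 ms.
constexpr double kHnsPerMillisecond = 10000.0;

// Convert a REFERENCE_TIME-style value (100 ns units) to milliseconds.
constexpr double hnsToMilliseconds(std::int64_t hns) {
    return static_cast<double>(hns) / kHnsPerMillisecond;
}

// Convert a period expressed in frames to milliseconds at a given sample rate.
constexpr double framesToMilliseconds(std::uint32_t frames, std::uint32_t sampleRate) {
    return 1000.0 * frames / sampleRate;
}

// Convert a period in milliseconds to frames, rounded to the nearest frame.
constexpr std::uint32_t millisecondsToFrames(double ms, std::uint32_t sampleRate) {
    return static_cast<std::uint32_t>(ms * sampleRate / 1000.0 + 0.5);
}
```

For example, at 48000 Hz a 10 ms engine period corresponds to 480 frames, so a request below 10 ms would need a period of fewer than 480 frames to be accepted.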

Related

How to read values from memory address with DLL injected?

I successfully added Discord Rich Presence to an old game through DLL injection.
When I open the game the RPC kicks in, and it successfully updates my Discord Status.
I want to go further and make the RPC dynamic, reading values from the client itself.
I got the memory addresses from the client using Cheat Engine; these addresses stay the same each time I close and reopen the game.
Now, I have tried several approaches to get this done.
As I am using an internal dll, I can directly read the values from the memory.
I want to read the player's level.
In the following code I'm trying to obtain the value from the address that stores the level of the player.
I'm using three different GetModuleHandles, just to see which could work.
DWORD moduleBase = (DWORD)GetModuleHandle("1.exe"); //GetModule1
DWORD anothermethod = (DWORD)GetModuleHandleA(0); //GetModule2
uintptr_t* p = (uintptr_t*)((uintptr_t)ExeBaseAddress + 0xD35240);
uintptr_t ModuleBaseAdrs = (DWORD&)*p;
printf("ModBaseAdrsLoc - %p, ModuleBaseAdrs - %X\n", p, ModuleBaseAdrs);
int* level = (int*)GetPointerAddress(moduleBase + 0xD35240, {});
DWORD level2DWORD = (DWORD)(anothermethod + 0x0D35240);
size_t ModuleBase = (size_t)GetModuleHandle("1.exe"); //GetModule3
size_t* Adr = reinterpret_cast<size_t*>((*reinterpret_cast<size_t*>(ModuleBase + 0xD35240)));
DWORD* levellevel = (DWORD*)0x0D35240;
Sleep(2000);
int PID = find("1.exe");
std::cout << "PID: " << PID << std::endl;
std::cout << "level2DWORD: " << level2DWORD << std::endl; //GetModule2
std::cout << "Level: " << level << std::endl; //GetModule1
std::cout << "LevelLevel: " << levellevel << std::endl;
std::cout << "Adr: " << Adr << std::endl; //GetModule3
This is what I get on the console
As you can see, I have tried different approaches, but none of them produces the desired output.
Could you please light my path?
How can I successfully read from the memory addresses?
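One thing worth noting about the code above: streaming an int* or DWORD* to std::cout prints the address held by the pointer, not the value stored there, and level2DWORD is itself just base plus offset. A minimal sketch of reading the value instead is shown below. The helper name readAt is hypothetical, and it assumes that 0xD35240 is an offset relative to the module base; if Cheat Engine reported an absolute address, the base would be 0.

```cpp
#include <cstdint>
#include <cstring>

// Read a value of type T located at (base + offset) in the current process.
// memcpy is used so the read is well-defined even for unaligned addresses.
template <typename T>
T readAt(std::uintptr_t base, std::uintptr_t offset) {
    T value;
    std::memcpy(&value, reinterpret_cast<const void*>(base + offset), sizeof(T));
    return value;
}

// Inside the injected DLL (Windows only, shown as comments to keep this
// sketch portable; the offset being module-relative is an assumption):
//   auto base = reinterpret_cast<std::uintptr_t>(GetModuleHandleA(nullptr));
//   int level = readAt<int>(base, 0xD35240);
//   std::cout << "Level: " << level << std::endl;  // prints the value
```

Reading an address that is not mapped in the process will crash it, so it may be worth confirming the address in a debugger first.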

Minor Page Faults when writing to mmaped file buffer

I'm noticing minor page faults when writing to a mmapped file buffer where the file is backed by disk.
My understanding of mmap is that for file mappings, the page cache has the file's data, and the page table will be updated to point to the file data in the page cache. This means that on the first write to the mmapped buffer, the page table will have to be updated to point to the page cache, and we may see minor page faults. However, as my benchmark below shows, even after pre-faulting the mmapped buffer, I still see minor page faults when doing random writes.
Note that these minor page faults only show up if I write to a random buffer (buf in the benchmark below) in-between writing to the mmapped buffer. Also note that these minor page faults do not seem to happen when using a tmpfs which is not disk backed.
So my question is why do we see these minor page faults when writing to a disk-backed file?
Here is the benchmark:
#include <iostream>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <fstream>
#include <sys/mman.h>
#include <sys/resource.h>
int main(int argc, char** argv) {
// Command line parsing.
if (argc != 2) {
std::cout << "Usage: ./bench <path to file>" << std::endl;
exit(1);
}
std::string filepath = argv[1];
// Open and truncate the file to be of size `FILE_LEN`.
int fd = open(filepath.c_str(), O_CREAT | O_TRUNC | O_RDWR, 0664);
const size_t FILE_LEN = (1 << 26); // 64MiB
if (fd < 0) {
std::cout << "open failed: " << strerror(errno) << std::endl;
exit(1);
}
if (ftruncate(fd, FILE_LEN) < 0) {
std::cout << "ftruncate failed: " << strerror(errno) << std::endl;
exit(1);
}
// `mmap` the file and pre-fault it.
char* ptr = static_cast<char*>(mmap(nullptr, FILE_LEN, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, fd, 0));
if(ptr == MAP_FAILED) {
std::cout << "mmap failed: " << strerror(errno) << std::endl;
exit(1);
}
memset(ptr, 'A', FILE_LEN);
// Create a temporary buffer to clear the cache.
constexpr size_t BUF_LEN = (1 << 22); // 4MiB
char* buf = new char[BUF_LEN];
memset(buf, 'B', BUF_LEN);
std::cout << "Opened file " << fd << ", pre faulted " << ptr[rand() % FILE_LEN] << " " << buf[rand() % BUF_LEN]<< std::endl;
// Run the actual benchmark
rusage usage0, usage1;
getrusage(RUSAGE_THREAD, &usage0);
unsigned int x = 0;
for (size_t i = 0; i < 1000; ++i) {
char c = i % 26 + 'a';
const size_t WRITE_SIZE = 1024;
size_t start = i*WRITE_SIZE;
if (start + WRITE_SIZE >= FILE_LEN) {
start %= FILE_LEN;
}
memset(ptr + start, c, WRITE_SIZE);
x += ptr[start];
char d = (c * 142) % 26 + 'a';
for (size_t k = 0; k < BUF_LEN; ++k) {
buf[k] = d;
}
x += buf[int(d)];
}
std::cout << "Using the buffers " << ptr[rand() % FILE_LEN] << " " << buf[rand() % BUF_LEN] << " " << x << std::endl;
getrusage(RUSAGE_THREAD, &usage1);
std::cout << "========================" << std::endl;
std::cout << "Minor page faults = " << usage1.ru_minflt - usage0.ru_minflt << std::endl;
std::cout << "========================" << std::endl;
return 0;
}
Running ./bench "/dev/shm/test.txt" where /dev/shm/ uses the tmpfs filesystem, the benchmark always shows 0 minor page faults.
Running ./bench "/home/username/test.txt", the benchmark above shows ~200 minor page faults.
i.e. I see output like this with the above command:
========================
Minor page faults = 231
========================
Note that the number of minor page faults grows with the number of iterations (e.g. changing the number of iterations from 1000 to 2000 results in ~450 minor page faults).
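Because the benchmark computes start as i*WRITE_SIZE, the writes walk through the mapping sequentially, so it is easy to count how many distinct pages they touch. The sketch below does that arithmetic; the function name is hypothetical, and the 4096-byte page size is an assumption (the real value comes from sysconf(_SC_PAGESIZE)).

```cpp
#include <cstddef>

// Number of distinct pages covered by `iterations` back-to-back writes of
// `writeSize` bytes each, starting at offset 0 of a page-aligned mapping.
constexpr std::size_t pagesTouched(std::size_t iterations,
                                   std::size_t writeSize,
                                   std::size_t pageSize) {
    return (iterations * writeSize + pageSize - 1) / pageSize;
}
```

With 1024-byte writes and 4096-byte pages, 1000 iterations touch 250 pages and 2000 iterations touch 500 pages. The observed ~231 and ~450 faults are close to those counts, which would be consistent with roughly one fault per page written, although that by itself does not explain why the faults occur.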

What is the condition for completion handler of basic_stream_socket::async_write_some to be called?

The documentation for basic_stream_socket::async_write_some states that the completion handler is called "when the write operation completes". But what exactly does that mean? I can think of at least two things:
when the kernel (the networking stack in particular) takes responsibility for the data
when the networking stack on the remote end has acknowledged (as in TCP ACK) the data that was sent, and the acknowledgement has reached the local networking stack
Tried to come up with a test program:
#include <boost/asio.hpp>
#include <chrono>
#include <iostream>
#include <thread>
using namespace boost::asio;
using ip::tcp;
using boost::system::error_code;
using namespace std::chrono_literals;
auto now = &std::chrono::high_resolution_clock::now;
auto sleep_for = [](auto dur) { std::this_thread::sleep_for(dur); };
auto timestamp = [start = now()] { return (now() - start)/1.0ms; };
int main() {
static constexpr size_t msglen = 16 << 20; // 16 mib
thread_pool io(1);
tcp::acceptor a(io, {{}, 7878});
a.set_option(tcp::acceptor::reuse_address(true));
a.listen();
#define CHECKED_OPTION(s, name, requested) do { \
tcp::socket::name option(requested); \
/*s.set_option(option);*/ \
s.get_option(option); \
std::cout << " " << #name << ":" << option.value(); \
} while (0)
a.async_accept([=](error_code ec, tcp::socket&& con) {
std::cout << timestamp() << "ms accept " << ec.message();
std::cout << " " << con.remote_endpoint(ec);
con.set_option(tcp::no_delay(true));
CHECKED_OPTION(con, receive_buffer_size, 100);
CHECKED_OPTION(con, send_buffer_size, 100);
std::cout << std::endl;
if (!ec) {
sleep_for(1s);
std::cout << timestamp() << "ms start write" << std::endl;
auto xfr = con.write_some(buffer(std::string(msglen, '*')), ec);
std::cout << timestamp() << "ms write completed: " << xfr << "/" << msglen << " (" << ec.message() << ")" << std::endl;
}
});
{
tcp::socket s(io);
sleep_for(1s);
std::cout << timestamp() << "ms connecting" << std::endl;
s.connect({{}, 7878});
std::cout << timestamp() << "ms connected";
CHECKED_OPTION(s, receive_buffer_size, 100);
CHECKED_OPTION(s, send_buffer_size, 100);
std::cout << std::endl;
sleep_for(3s);
std::cout << timestamp() << "ms disconnecting" << std::endl;
}
std::cout << timestamp() << "ms disconnected" << std::endl;
a.cancel();
io.join();
}
Note how we send far more data than is being read, to saturate any buffering involved. (We actually do not read any data from the client socket at all.)
It prints (on Coliru):
1000.48ms connecting
1001.07ms connected receive_buffer_size:530904 send_buffer_size:1313280
1001.23ms accept Success 127.0.0.1:41614 receive_buffer_size:531000 send_buffer_size:1313280
2001.64ms start write
4001.37ms disconnecting
4001.62ms disconnected
4013.33ms write completed: 4481610/16777216 (Success)
It is clear that:
write_some only completes once the packets are ACKed, and it returns the actual number of bytes transferred.
the kernel ACKs packets independently of the application layer
In fact, packets may arrive out of order, in which case the kernel ACKs them individually before sequencing them for the application to read via the socket API.
Buffering
Buffering is inevitable, but can be tuned within limits. E.g. uncommenting this line from the CHECKED_OPTION macro:
s.set_option(option); \
Gives different output (Live On Coliru):
1000.44ms connecting
1001.08ms connected receive_buffer_size:1152 send_buffer_size:2304
1001.31ms accept Success 127.0.0.1:41618 receive_buffer_size:1152 send_buffer_size:2304
2001.88ms start write
4001.42ms disconnecting
4001.61ms disconnected
4008.21ms write completed: 43776/16777216 (Success)
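Both runs show write_some transferring only part of the buffer (4481610 and 43776 of 16777216 bytes), so a caller that needs the whole message sent has to loop. The sketch below shows that loop in isolation; writeAll and its writeSome parameter are hypothetical stand-ins so the example runs without a socket. In real Asio code the boost::asio::write free function performs this loop for you.

```cpp
#include <cstddef>
#include <functional>

// Keep calling writeSome until every byte is accepted or no progress is made.
// writeSome(data, size) must return how many bytes it accepted (0 on failure).
std::size_t writeAll(const char* data, std::size_t size,
                     const std::function<std::size_t(const char*, std::size_t)>& writeSome) {
    std::size_t total = 0;
    while (total < size) {
        const std::size_t n = writeSome(data + total, size - total);
        if (n == 0) {
            break;  // treat zero progress as an error and stop
        }
        total += n;
    }
    return total;  // equals size only if everything was written
}
```

The return value lets the caller distinguish a complete write from one that stopped early, just as the xfr value does in the test program.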

Bad Allocation while trying to create USRP Object in c++

I'm getting an error - "bad allocation" while working with the UHD library.
I'm trying to compile some basic code to learn more about the UHD library. After compiling the program I'm getting an error.
Code:
int UHD_SAFE_MAIN(int argc, char *argv[]) {
uhd::set_thread_priority_safe();
std::string device_args("addr=192.168.10.2");
std::string subdev("A:0");
std::string ant("TX/RX");
std::string ref("internal");
double rate(1e6);
double freq(915e6);
double gain(10);
double bw(1e6);
//create a usrp device
std::cout << std::endl;
std::cout << boost::format("Creating the usrp device with: %s...") %device_args << std::endl;
uhd::usrp::multi_usrp::sptr usrp = uhd::usrp::multi_usrp::make(device_args);
// Lock mboard clocks
std::cout << boost::format("Lock mboard clocks: %f") % ref << std::endl;
usrp->set_clock_source(ref);
//always select the subdevice first, the channel mapping affects the other settings
std::cout << boost::format("subdev set to: %f") % subdev << std::endl;
usrp->set_rx_subdev_spec(subdev);
std::cout << boost::format("Using Device: %s") % usrp->get_pp_string() << std::endl;
//set the sample rate
if (rate <= 0.0) {
std::cerr << "Please specify a valid sample rate" << std::endl;
return ~0;
}
// set sample rate
std::cout << boost::format("Setting RX Rate: %f Msps...") % (rate / 1e6) << std::endl;
usrp->set_rx_rate(rate);
std::cout << boost::format("Actual RX Rate: %f Msps...") % (usrp->get_rx_rate() / 1e6) << std::endl << std::endl;
// set freq
std::cout << boost::format("Setting RX Freq: %f MHz...") % (freq / 1e6) << std::endl;
uhd::tune_request_t tune_request(freq);
usrp->set_rx_freq(tune_request);
std::cout << boost::format("Actual RX Freq: %f MHz...") % (usrp->get_rx_freq() / 1e6) << std::endl << std::endl;
// set the rf gain
std::cout << boost::format("Setting RX Gain: %f dB...") % gain << std::endl;
usrp->set_rx_gain(gain);
std::cout << boost::format("Actual RX Gain: %f dB...") % usrp->get_rx_gain() << std::endl << std::endl;
// set the IF filter bandwidth
std::cout << boost::format("Setting RX Bandwidth: %f MHz...") % (bw / 1e6) << std::endl;
usrp->set_rx_bandwidth(bw);
std::cout << boost::format("Actual RX Bandwidth: %f MHz...") % (usrp->get_rx_bandwidth() / 1e6) << std::endl << std::endl;
// set the antenna
std::cout << boost::format("Setting RX Antenna: %s") % ant << std::endl;
usrp->set_rx_antenna(ant);
std::cout << boost::format("Actual RX Antenna: %s") % usrp->get_rx_antenna() << std::endl << std::endl;
return EXIT_SUCCESS;
}
Part of the code where the error occurs:
//create a usrp device
std::cout << std::endl;
std::cout << boost::format("Creating the usrp device with: %s...") %device_args << std::endl;
uhd::usrp::multi_usrp::sptr usrp = uhd::usrp::multi_usrp::make(device_args);
Error: [screenshot not preserved]
I'm using:
Microsoft Visual C++ Express 2010
C++ language
UHD library, Win32_VS2010.exe, 003.007.003-release
Boost library 1_63_0
I do not connect any USRP device to my computer.
I don't know whether the error comes from the UHD library or from my C++ code. I tried compiling this program with different versions of Microsoft Visual Studio and different versions of the UHD library, including the latest one. I even tried compiling it on a different PC, but the result was similar: no error interrupted the program, but the string "error: bad allocation" appeared in the console and the program stopped at the same spot.
When I first compiled this program (with UHD_003.004.000-release), I did not get the "bad allocation" error. Instead I got an error that said "Error: LookupError: KeyError: No device found for ----->". I then upgraded UHD to a newer version (003.007.003), and the bad allocation error started occurring. I tried reinstalling the previous version, but it didn't help.
I was trying to change type of device_args, from string to uhd::device_addr_t, like it is said in manual on http://files.ettus.com/manual, but the error didn't disappear.
Any help would be appreciated.
"I do not connect any URSP device to my computer."
You cannot execute this code without a USRP connected to the computer you are running it on.
When you call uhd::usrp::multi_usrp::make(device_args), UHD tries to connect to a USRP at the IP address you specified in device_args.
Try connecting a USRP to your computer and running it again.
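If you want the program to report a missing device cleanly instead of stopping at the make call, one option is to wrap device creation in a try/catch. The sketch below is generic C++ rather than UHD code: tryCreate is a hypothetical helper, and it assumes the factory returns a smart pointer (as multi_usrp::make does with its sptr type) and reports failures through exceptions derived from std::exception.

```cpp
#include <exception>
#include <iostream>
#include <memory>
#include <new>

// Call factory() and return its result, or an empty pointer if it throws.
template <typename Factory>
auto tryCreate(Factory&& factory) -> decltype(factory()) {
    try {
        return factory();
    } catch (const std::bad_alloc& e) {
        std::cerr << "Allocation failed while creating device: " << e.what() << std::endl;
    } catch (const std::exception& e) {
        std::cerr << "Device creation failed: " << e.what() << std::endl;
    }
    return decltype(factory())();  // default-constructed (empty) pointer
}

// Hypothetical usage with the code from the question:
//   auto usrp = tryCreate([&] { return uhd::usrp::multi_usrp::make(device_args); });
//   if (!usrp) return EXIT_FAILURE;
```

This only changes how the failure is reported; the device still has to be reachable for the rest of the program to do anything useful.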

Qt C++ and QSerialDevice: Windows 7 USB->Serial Port Reading/Writing

I am attempting to read from/write to an RS-232 capable device. This works without issue on Linux. The device is connected via a Digitus USB/Serial Adapter.
The device shows up in Device Manager as COM4.
void PayLife::run() {
this->sendingData = 0;
this->running = true;
qDebug() << "Starting PayLife Thread";
this->port = new AbstractSerial();
this->port->setDeviceName(this->addy);
QByteArray ba;
if (port->open(AbstractSerial::ReadWrite| AbstractSerial::Unbuffered)) {
if (!port->setBaudRate(AbstractSerial::BaudRate19200)) {
qDebug() << "Set baud rate " << AbstractSerial::BaudRate19200 << " error.";
goto end_thread;
};
if (!port->setDataBits(AbstractSerial::DataBits7)) {
qDebug() << "Set data bits " << AbstractSerial::DataBits7 << " error.";
goto end_thread;
}
if (!port->setParity(AbstractSerial::ParityEven)) {
qDebug() << "Set parity " << AbstractSerial::ParityEven << " error.";
goto end_thread;
}
if (!port->setStopBits(AbstractSerial::StopBits1)) {
qDebug() << "Set stop bits " << AbstractSerial::StopBits1 << " error.";
goto end_thread;
}
if (!port->setFlowControl(AbstractSerial::FlowControlOff)) {
qDebug() << "Set flow " << AbstractSerial::FlowControlOff << " error.";
goto end_thread;
}
while(this->running) {
if ((port->bytesAvailable() > 0) || port->waitForReadyRead(900)) {
ba.clear();
ba = port->read(1024);
qDebug() << "Readed is : " << ba.size() << " bytes";
}
else {
qDebug() << "Timeout read data in time : " << QTime::currentTime();
}
}
}
end_thread:
this->running = false;
}
On Linux, I don't use QSerialDevice, just regular serial reading/writing.
No matter what, I always get:
Starting PayLife Thread
Readed is : 0 bytes
Timeout read data in time : QTime("16:27:43")
Timeout read data in time : QTime("16:27:44")
Timeout read data in time : QTime("16:27:45")
Timeout read data in time : QTime("16:27:46")
I am not exactly sure why.
Note: I first tried regular Windows API reading and writing, with the same results, i.e. it just doesn't read any data from the device.
I am 100% sure that there is always something to read from the device, as it spams ENQ across the connection.
You should generate the doxygen documentation of QSerialDevice if you haven't already done so. The problem seems to be explained there.
On Windows in unbuffered mode:
Necessary to avoid the values of CharIntervalTimeout and
TotalReadConstantTimeout equal to 0. In theory, it was planned that at
zero values of timeouts method AbstractSerial::read() will read the
data which are in the buffer device driver (not to be confused with
the buffer AbstractSerial!) and return them immediately. But for
unknown reasons, this reading always returns 0, not depending on
whether or not a ready-made data in the buffer.
Because read waits for the data in unbuffered mode, I guess waitForReadyRead doesn't do anything useful in that mode.