Continuous WASAPI Ring-Buffer Sampling - C++

How can I use WASAPI (or something like it) to continuously sample audio into a thread-safe ring buffer, so that a consumer thread can read from that buffer at a set interval?
Currently we have a .sample() method that returns a chunk of samples after a set sampling interval, but it has quite a lot of overhead due to memory allocation etc. Maybe this method could be optimized; I'm pretty sure we're doing it wrong.
std::vector<short> sampler2::sample()
{
    // prepare the wave header
    waveInPrepareHeader(hWaveIn, &WaveInHdr, sizeof(WAVEHDR));
    // queue the wave input buffer
    waveInAddBuffer(hWaveIn, &WaveInHdr, sizeof(WAVEHDR));
    // start capturing input
    waveInStart(hWaveIn);
    // sleep for the duration of one sampling interval
    std::this_thread::sleep_for(milliseconds(SAMPLE_INTERVAL));
    // copy the recorded samples into a new vector and return it
    std::vector<short> samplesChunk(&waveIn[0], &waveIn[0] + NUMPTS);
    return samplesChunk;
}
GitHub Links: sampler2.h & sampler2.cpp
The code is admittedly very rough and we have no clue how to properly use WASAPI. Our goal was to (quickly) create a sampler class that can work with a sampling interval of >10 ms.
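For clarity, this is roughly the kind of thread-safe ring buffer we have in mind on the consumer side. It is only a minimal mutex-guarded sketch (class and method names are made up for illustration), not our actual code:
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <vector>

// Minimal single-producer/single-consumer ring buffer: the capture thread
// calls write(), the consumer thread calls read() at its own interval.
// No allocation happens after construction.
class AudioRingBuffer
{
public:
    explicit AudioRingBuffer(std::size_t capacity) : buf_(capacity) {}

    // Producer side: append samples, overwriting the oldest data on overflow.
    void write(const short* data, std::size_t count)
    {
        std::lock_guard<std::mutex> lock(mtx_);
        for (std::size_t i = 0; i < count; ++i)
        {
            buf_[head_] = data[i];
            head_ = (head_ + 1) % buf_.size();
            if (size_ == buf_.size())
                tail_ = (tail_ + 1) % buf_.size();  // overwrite oldest sample
            else
                ++size_;
        }
    }

    // Consumer side: copy up to maxCount samples into out, return the number read.
    std::size_t read(short* out, std::size_t maxCount)
    {
        std::lock_guard<std::mutex> lock(mtx_);
        const std::size_t n = std::min(maxCount, size_);
        for (std::size_t i = 0; i < n; ++i)
        {
            out[i] = buf_[tail_];
            tail_ = (tail_ + 1) % buf_.size();
        }
        size_ -= n;
        return n;
    }

private:
    std::vector<short> buf_;
    std::size_t head_ = 0, tail_ = 0, size_ = 0;
    std::mutex mtx_;
};
A lock-free SPSC queue would avoid the mutex, but something like this seems like the simplest correct starting point.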

Your sample uses the legacy winmm wave API (waveIn*), not WASAPI. You can check MSDN for the WASAPI reference and usage.
Here is the basic description of WASAPI usage:
The client calls the methods in the IAudioRenderClient interface to write rendering data to an endpoint buffer. To request an endpoint buffer of a particular size, the client calls the IAudioClient::Initialize method. To get the size of the allocated buffer, which might be different from the requested size, the client calls the IAudioClient::GetBufferSize method.
To move a stream of rendering data through the endpoint buffer, the client alternately calls the IAudioRenderClient::GetBuffer method and the IAudioRenderClient::ReleaseBuffer method. The client accesses the data in the endpoint buffer as a series of data packets. The GetBuffer call retrieves the next packet so that the client can fill it with rendering data. After writing the data to the packet, the client calls ReleaseBuffer to add the completed packet to the rendering queue.
There is also this Microsoft C++ WASAPI example.
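Since your use case is capture rather than rendering, the capture-side equivalent of the description above is IAudioClient plus IAudioCaptureClient. Here is a rough, heavily simplified sketch (shared-mode capture of the default device, no error handling, and note that the shared-mode mix format is usually 32-bit float rather than 16-bit PCM); it is only meant to show where your ring-buffer write would go:
#include <windows.h>
#include <mmdeviceapi.h>
#include <audioclient.h>

// Rough shared-mode capture sketch: initialize the default capture device,
// then repeatedly drain packets from the endpoint buffer on a capture thread.
void CaptureLoop()
{
    CoInitializeEx(nullptr, COINIT_MULTITHREADED);

    IMMDeviceEnumerator* enumerator = nullptr;
    CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr, CLSCTX_ALL,
                     __uuidof(IMMDeviceEnumerator), (void**)&enumerator);

    IMMDevice* device = nullptr;
    enumerator->GetDefaultAudioEndpoint(eCapture, eConsole, &device);

    IAudioClient* audioClient = nullptr;
    device->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr, (void**)&audioClient);

    WAVEFORMATEX* format = nullptr;
    audioClient->GetMixFormat(&format);

    // Request a ~100 ms endpoint buffer (REFERENCE_TIME is in 100-ns units);
    // GetBufferSize() would report what was actually allocated.
    audioClient->Initialize(AUDCLNT_SHAREMODE_SHARED, 0, 1000000, 0, format, nullptr);

    IAudioCaptureClient* captureClient = nullptr;
    audioClient->GetService(__uuidof(IAudioCaptureClient), (void**)&captureClient);

    audioClient->Start();

    for (;;)  // a real implementation needs a stop flag and cleanup
    {
        Sleep(5);  // poll more often than the consumer's read interval

        UINT32 frames = 0;
        captureClient->GetNextPacketSize(&frames);
        while (frames != 0)
        {
            BYTE* data = nullptr;
            DWORD flags = 0;
            captureClient->GetBuffer(&data, &frames, &flags, nullptr, nullptr);

            // Convert/copy 'frames' frames from 'data' into your ring buffer here.

            captureClient->ReleaseBuffer(frames);
            captureClient->GetNextPacketSize(&frames);
        }
    }
}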

Related

Send audio data from usermode to Sysvad (virtual audio driver) use IOCTL

In my application (user mode), I receive audio data and save it using this function:
VOID CSoundRecDlg::ProcessHeader(WAVEHDR* pHdr)
{
    MMRESULT mRes = 0;
    TRACE("%d", pHdr->dwUser);
    if (WHDR_DONE == (WHDR_DONE & pHdr->dwFlags))
    {
        mmioWrite(m_hOPFile, pHdr->lpData, pHdr->dwBytesRecorded);
        mRes = waveInAddBuffer(m_hWaveIn, pHdr, sizeof(WAVEHDR));
        if (mRes != 0)
            StoreError(mRes, TRUE, "File: %s, Line Number: %d", __FILE__, __LINE__);
    }
}
pHdr points to the audio data (byte[11025]).
How can I get this data into sysvad using an IOCTL? Thanks for any help.
If I understand correctly, you have an audio buffer that you want to send for output in sysvad. In that scenario you would have to write the buffer in using "WriteBytes".
Please look at this example for more in-depth details:
https://github.com/microsoft/Windows-driver-samples/blob/master/audio/sysvad/EndpointsCommon/minwavertstream.cpp
UPDATE
In answer to your comment:
A circular buffer is not a must; it really depends on the implementation you want to do. The main point is to get the buffer into memory; writing it is simply like this:
adapterObject->WriteEtwEvent(eMINIPORT_LAST_BUFFER_RENDERED,
    m_ullLinearPosition + ByteDisplacement,  // Current linear buffer position
    m_ulCurrentWritePosition,  // The very last WaveRtBufferWritePosition that the driver received
    0,
    0);
Ideally you would use separation of concerns, with the logic for reading and writing independent from each other and the buffer object simply passed between them.
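As for the user-mode half of the question: there is no such IOCTL in sysvad out of the box, so the control code, device handle and driver-side handler in the sketch below are all hypothetical. It only illustrates the usual DeviceIoControl shape for pushing a chunk of audio down to a driver you have extended accordingly:
#include <windows.h>
#include <winioctl.h>

// Hypothetical IOCTL: sysvad does not define this. You would add a matching
// handler in the driver that copies the incoming buffer into its own memory.
#define IOCTL_SYSVAD_WRITE_AUDIO \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)

bool SendAudioChunk(HANDLE hDevice, const BYTE* data, DWORD bytes)
{
    // hDevice would come from CreateFile() on the driver's device interface.
    DWORD returned = 0;
    return DeviceIoControl(hDevice, IOCTL_SYSVAD_WRITE_AUDIO,
                           (LPVOID)data, bytes,  // input: the audio chunk
                           nullptr, 0,           // no output buffer expected
                           &returned, nullptr) != FALSE;
}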

How to access audio waveform buffer when recording using WASAPI in c++?

I've used the winmm library to access the waveform data before, with syntax like this: waveInAddBuffer(hwi, &wh[i], sizeof(WAVEHDR)); so I pass a pointer to the memory block (&wh[i]) where I want the waveform data stored, and to access it I just read wh[i].lpData.
Is there a similar function in WASAPI?
It's the IAudioCaptureClient::GetBuffer method:
Retrieves a pointer to the next available packet of data in the capture endpoint buffer.
[…]
BYTE **ppData
Pointer to a pointer variable into which the method writes the starting address of the next data packet that is available for the client to read.
See also:
For a code example that calls the GetBuffer method, see Capturing a Stream.
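So where winmm hands you wh[i].lpData after waveInAddBuffer, in WASAPI the pointer comes back from GetBuffer and must be handed back with ReleaseBuffer. A minimal sketch, assuming captureClient is an already-initialized IAudioCaptureClient* (obtained via IAudioClient::GetService):
BYTE*  pData     = nullptr;  // plays the role of wh[i].lpData
UINT32 numFrames = 0;
DWORD  flags     = 0;

HRESULT hr = captureClient->GetBuffer(&pData, &numFrames, &flags, nullptr, nullptr);
if (SUCCEEDED(hr) && hr != AUDCLNT_S_BUFFER_EMPTY)
{
    // Read numFrames frames of audio starting at pData here.
    captureClient->ReleaseBuffer(numFrames);  // return the packet to WASAPI
}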

QTcpSocket data transfer stops when read buffer is full and does not resume when it frees up

I have a server-client Qt application where the client sends data packets to the server and the server reads them at set time intervals. It happens that the client sends data faster than the server can read it, thus filling all the memory on the server side. I am using QAbstractSocket::setReadBufferSize(size) to set a maximum read buffer size on the server side, and when it fills up, data transfer stops and data is buffered on the client side, which is what I want. The problem is that when the server's QTcpSocket internal read buffer frees up (is no longer full), the transfer between client and server does not resume.
I've tried QAbstractSocket::resume(), which seems to work, but the Qt 5.10 documentation says:
Continues data transfer on the socket. This method should only be used
after the socket has been set to pause upon notifications and a
notification has been received. The only notification currently
supported is QSslSocket::sslErrors(). Calling this method if the
socket is not paused results in undefined behavior.
I feel like I should not use that function in this situation, but is there any other solution? How do I know if the socket is paused? Why doesn't the data transfer continue automatically when QTcpSocket's internal read buffer is no longer full?
EDIT 1:
I have downloaded the Qt (5.10.0) sources and PDBs to debug this, and I can see that the internal function QAbstractSocket::readData() contains the line "d->socketEngine->setReadNotificationEnabled(true)", which re-enables data transfer, but QAbstractSocket::readData() gets called only when the QTcpSocket internal read buffer is empty (qiodevice.cpp; QIODevicePrivate::read(); line 1176), and in my situation it is never empty, because I read it only when it holds enough data for a complete packet.
Shouldn't QAbstractSocket::readData() be called when the read buffer is no longer full, rather than only when it is completely empty? Or am I doing something wrong?
Found a Workaround!
In the Qt 5.10 sources I can clearly see that QTcpSocket internal read notifications are disabled (qabstractsocket.cpp; bool QAbstractSocketPrivate::canReadNotification(); line 697) when the read buffer is full, and to re-enable read notifications you either need to read the whole buffer so it becomes empty OR call QAbstractSocket::setReadBufferSize(newSize), which internally re-enables read notifications WHEN newSize is not 0 (unlimited) and not equal to the old size (qabstractsocket.cpp; void QAbstractSocket::setReadBufferSize(qint64 size); line 2824).
Here's a short function for that:
QTcpSocket socket;
qint64 readBufferSize;                  // Current max read buffer size.
bool flag = false;                      // Flag for alternating the max read buffer size.
bool isReadBufferLimitReached = false;

void App::CheckReadBufferLimitReached()
{
    if (readBufferSize <= socket.bytesAvailable())
    {
        isReadBufferLimitReached = true;
    }
    else if (isReadBufferLimitReached)
    {
        // Nudge the size by +/-1 so setReadBufferSize() sees a different,
        // non-zero value and re-enables the internal read notifications.
        readBufferSize += flag ? 1 : -1;
        flag = !flag;
        socket.setReadBufferSize(readBufferSize);
        isReadBufferLimitReached = false;
    }
}
In the function that reads data from the QTcpSocket at the set intervals, BEFORE reading data I call this function, which checks whether the read buffer is full and sets isReadBufferLimitReached if it is. Then I read the needed amount of data from the QTcpSocket, and AT THE END I call the function again; if the buffer was full before, it calls QAbstractSocket::setReadBufferSize(size) to set the new buffer size and re-enable internal read notifications. Changing the read buffer size by +/-1 should be safe, because you read at least 1 byte from the socket.
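To illustrate that call order, here is a hypothetical periodic read slot (packetSize and ProcessPacket() are placeholders, not Qt API; socket and readBufferSize are the variables from the snippet above):
void App::OnReadTimer()
{
    CheckReadBufferLimitReached();  // note whether the buffer was full

    // Read only complete packets, as described in the question.
    while (socket.bytesAvailable() >= packetSize)
        ProcessPacket(socket.read(packetSize));

    CheckReadBufferLimitReached();  // if it was full, nudge setReadBufferSize()
                                    // so read notifications are re-enabled
}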

Poco WebSocket can't read large data

I use Net::SocketReactor to process the connection. When data arrives on the socket, something like the following code is called:
int WebSocketWrapper::DoRecieve(void* buf)
{
    try
    {
        int flags;
        const auto size = m_sock.available();
        const auto ret = m_sock.receiveFrame(buf, size, flags);
        if (size != ret)
        {
            logger.warning("Read less than available");
        }
        return ret;
    }
    catch (WebSocketException& exc)
    {
        logger.log(exc);
        switch (exc.code())
        {
        case pnet::WebSocket::WS_ERR_HANDSHAKE_UNSUPPORTED_VERSION:
            logger.debug("unsupported version");
            break;
        case pnet::WebSocket::WS_ERR_NO_HANDSHAKE:
        case pnet::WebSocket::WS_ERR_HANDSHAKE_NO_VERSION:
        case pnet::WebSocket::WS_ERR_HANDSHAKE_NO_KEY:
            logger.debug("Bad request");
            break;
        }
    }
    return 0;
}
It works fine when the data size is less than 1400 bytes and the TCP packets are not fragmented. But when I try to send more than 1400 bytes I get a WebSocketException: "Insufficient buffer for payload size". I explored the Poco::Net::WebSocket source code and found the conflict: receiveFrame analyzes the payload size from the frame header, but at that point I have only part of the frame, because I can only request as many bytes as StreamSocket::available() reports.
How do I read large data from a WebSocket?
WebSockets operate in frames and you will always receive a whole frame or nothing. With that said, don't bother figuring out the amount of available data (you're probably hitting the Ethernet 1500-byte MTU); instead provide storage to accommodate the largest frame you expect to receive and call receiveFrame(). If messages are fragmented across multiple frames, you'll have to deal with that at the application level. See the documentation:
Receives a frame from the socket and stores it
in buffer. Up to length bytes are received. If
the frame's payload is larger, a WebSocketException
is thrown and the WebSocket connection must be
terminated.
The upcoming 1.7 release will have a receiveFrame() that resizes the buffer automatically to accommodate the frame.
To understand fragmented messages, see Receiving Data in RFC 6455. While WebSockets are conceived as a messaging protocol, some musings on whether they are really messaging or streaming can be found here.
Also, the code you posted does not compile and the idea of writing an unknown number of bytes in a buffer of unknown size seems hazardous, to put it mildly.
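To make the suggestion concrete, here is a rough sketch of sizing the buffer for the largest frame you expect instead of for what available() happens to report (MAX_FRAME_SIZE is an assumed application-specific limit, not a Poco constant):
#include <Poco/Net/WebSocket.h>
#include <vector>

const std::size_t MAX_FRAME_SIZE = 64 * 1024;  // largest frame we ever expect

int ReceiveWholeFrame(Poco::Net::WebSocket& ws, std::vector<char>& storage, int& flags)
{
    storage.resize(MAX_FRAME_SIZE);
    // Throws WebSocketException if the incoming payload exceeds MAX_FRAME_SIZE,
    // in which case the connection has to be terminated (see the docs above).
    const int n = ws.receiveFrame(&storage[0], static_cast<int>(storage.size()), flags);
    storage.resize(n > 0 ? static_cast<std::size_t>(n) : 0);
    return n;
}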

Calculating socket upload speed

I'm wondering if anyone knows how to calculate the upload speed of a Berkeley socket in C++. My send call isn't blocking and takes 0.001 seconds to send 5 megabytes of data, but takes a while to recv the response (so I know it's uploading).
This is a TCP socket to an HTTP server and I need to asynchronously check how many bytes of data have been uploaded / are remaining. However, I can't find any API functions for this in Winsock, so I'm stumped.
Any help would be greatly appreciated.
EDIT: I've found the solution, and will be posting as an answer as soon as possible!
EDIT 2: Proper solution added as answer, will be added as solution in 4 hours.
I solved my issue thanks to bdolan suggesting to reduce SO_SNDBUF. However, to use this code you must note that it requires Winsock 2 (for overlapped sockets and WSASend). In addition, your SOCKET handle must have been created similarly to:
SOCKET sock = WSASocket(AF_INET, SOCK_STREAM, IPPROTO_TCP, NULL, 0, WSA_FLAG_OVERLAPPED);
Note the WSA_FLAG_OVERLAPPED flag as the final parameter.
In this answer I will go through the stages of uploading data to a TCP server and tracking each upload chunk and its completion status. The approach requires splitting your upload buffer into chunks (minimal modification to existing code required), uploading it piece by piece, and then tracking each chunk.
My code flow
Global variables
Your code must define the following global variables:
#define UPLOAD_CHUNK_SIZE 4096
int g_nUploadChunks = 0;
int g_nChunksCompleted = 0;
WSAOVERLAPPED *g_pSendOverlapped = NULL;
int g_nBytesSent = 0;
float g_flLastUploadTimeReset = 0.0f;
Note: in my tests, decreasing UPLOAD_CHUNK_SIZE results in increased upload speed accuracy, but decreases overall upload speed. Increasing UPLOAD_CHUNK_SIZE results in decreased upload speed accuracy, but increases overall upload speed. 4 kilobytes (4096 bytes) was a good compromise for a file ~500 kB in size.
Callback function
This function increments the bytes sent and chunks completed variables (called after a chunk has been completely uploaded to the server)
void CALLBACK SendCompletionCallback(DWORD dwError, DWORD cbTransferred, LPWSAOVERLAPPED lpOverlapped, DWORD dwFlags)
{
    g_nChunksCompleted++;
    g_nBytesSent += cbTransferred;
}
Prepare socket
Initially, the socket must be prepared by reducing SO_SNDBUF to 0.
Note: In my tests, any value greater than 0 will result in undesirable behaviour.
int nSndBuf = 0;
setsockopt(sock, SOL_SOCKET, SO_SNDBUF, (char*)&nSndBuf, sizeof(nSndBuf));
Create WSAOVERLAPPED array
An array of WSAOVERLAPPED structures must be created to hold the overlapped status of all of our upload chunks. To do this I simply:
// Calculate the amount of upload chunks we will have to create.
// nDataBytes is the size of data you wish to upload
g_nUploadChunks = ceil(nDataBytes / float(UPLOAD_CHUNK_SIZE));
// Overlapped array, should be delete'd after all uploads have completed
g_pSendOverlapped = new WSAOVERLAPPED[g_nUploadChunks];
memset(g_pSendOverlapped, 0, sizeof(WSAOVERLAPPED) * g_nUploadChunks);
Upload data
All of the data that needs to be sent is, for example purposes, held in a variable called pszData. Then, using WSASend, the data is sent in blocks of the size defined by the constant UPLOAD_CHUNK_SIZE.
WSABUF dataBuf;
DWORD dwBytesSent = 0;
int err;
int i, j;
for(i = 0, j = 0; i < nDataBytes; i += UPLOAD_CHUNK_SIZE, j++)
{
    int nTransferBytes = min(nDataBytes - i, UPLOAD_CHUNK_SIZE);
    dataBuf.buf = &pszData[i];
    dataBuf.len = nTransferBytes;
    // Now upload the data
    int rc = WSASend(sock, &dataBuf, 1, &dwBytesSent, 0, &g_pSendOverlapped[j], SendCompletionCallback);
    if ((rc == SOCKET_ERROR) && (WSA_IO_PENDING != (err = WSAGetLastError())))
    {
        fprintf(stderr, "WSASend failed: %d\n", err);
        exit(EXIT_FAILURE);
    }
}
The waiting game
Now we can do whatever we wish while all of the chunks upload.
Note: the thread which called WSASend must regularly be put into an alertable state, so that our 'transfer completed' callback (SendCompletionCallback) is dequeued from the APC (Asynchronous Procedure Call) list.
In my code, I continuously looped until g_nUploadChunks == g_nChunksCompleted. This is to show the end-user upload progress and speed (can be modified to show estimated completion time, elapsed time, etc.)
Note 2: this code uses Plat_FloatTime as a seconds counter; replace it with whatever seconds timer your code uses (or adjust accordingly).
g_flLastUploadTimeReset = Plat_FloatTime();
// Clear the line on the screen with some default data
printf("(0 chunks of %d) Upload speed: ???? KiB/sec", g_nUploadChunks);
// Keep looping until ALL upload chunks have completed
while(g_nChunksCompleted < g_nUploadChunks)
{
    // Wait for 10ms so we aren't repeatedly updating the screen
    SleepEx(10, TRUE);
    // Update chunk count
    printf("\r(%d chunks of %d) ", g_nChunksCompleted, g_nUploadChunks);
    // Not enough time passed?
    if(g_flLastUploadTimeReset + 1 > Plat_FloatTime())
        continue;
    // Reset timer
    g_flLastUploadTimeReset = Plat_FloatTime();
    // Calculate how many kibibytes have been transmitted in the last second
    float flByteRate = g_nBytesSent/1024.0f;
    printf("Upload speed: %.2f KiB/sec", flByteRate);
    // Reset byte count
    g_nBytesSent = 0;
}
// Delete overlapped data (not used anymore)
delete [] g_pSendOverlapped;
// Note that the transfer has completed
Msg("\nTransfer completed successfully!\n");
Conclusion
I really hope this helps somebody in the future who wishes to calculate upload speed on their TCP sockets without any server-side modifications. I have no idea how performance-detrimental SO_SNDBUF = 0 is, although I'm sure a socket guru will point that out.
You can get a lower bound on the amount of data received and acknowledged by subtracting the value of the SO_SNDBUF socket option from the number of bytes you have written to the socket. This buffer may be adjusted using setsockopt, although in some cases the OS may choose a length smaller or larger than you specify, so you must re-check after setting it.
To get more precise than that, however, you must have the remote side inform you of progress, as winsock does not expose an API to retrieve the amount of data currently pending in the send buffer.
Alternately, you could implement your own transport protocol on UDP, but implementing rate control for such a protocol can be quite complex.
Since you don't have control over the remote side, and you want to do it in code, I'd suggest a very simple approximation. I assume a long-lived program/connection. One-shot uploads would be too skewed by ARP, DNS lookups, socket buffering, TCP slow start, etc.
Have two counters - length of the outstanding queue in bytes (OB), and number of bytes sent (SB):
increment OB by number of bytes to be sent every time you enqueue a chunk for upload,
decrement OB and increment SB by the number returned from send(2) (modulo -1 cases),
on a timer sample both OB and SB - either store them, log them, or compute running average,
compute outstanding bytes per second/minute/whatever, and the same for sent bytes.
Network stack does buffering and TCP does retransmission and flow control, but that doesn't really matter. These two counters will tell you the rate your app produces data with, and the rate it is able to push it to the network. It's not the method to find out the real link speed, but a way to keep useful indicators about how good the app is doing.
If the data production rate is below the network output rate, everything is fine. If it's the other way around and the network cannot keep up with the app, there's a problem: you need either a faster network, a slower app, or a different design.
For one-time experiments just take periodic snapshots of netstat -sp tcp output (or whatever that is on Windows) and calculate the send-rate manually.
Hope this helps.
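A bare-bones illustration of those two counters (names and the sampling mechanism are ours, not a library API):
#include <atomic>
#include <cstddef>
#include <cstdint>

// OB: bytes the app has queued but not yet handed to the kernel.
// SB: bytes that send(2) has actually accepted.
std::atomic<std::int64_t> g_outstandingBytes{0};  // OB
std::atomic<std::int64_t> g_sentBytes{0};         // SB

void OnEnqueue(std::size_t chunkBytes)
{
    g_outstandingBytes += static_cast<std::int64_t>(chunkBytes);
}

void OnSendReturned(int sendResult)  // pass the return value of send(2)
{
    if (sendResult > 0)
    {
        g_outstandingBytes -= sendResult;
        g_sentBytes        += sendResult;
    }
    // -1 (would-block or error) leaves both counters untouched
}

// A periodic timer (e.g. once per second) then samples both counters; the
// deltas between samples give the production rate and the push-out rate.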
If your app uses packet headers like
0001234DT
where 000123 is the packet length for a single packet, you can consider using MSG_PEEK + recv() to get the length of the packet before you actually read it with recv().
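A rough sketch of that peek-then-read pattern (the fixed 6-digit ASCII length prefix is just the example format from above; a real implementation would also loop, since recv() may return fewer bytes than requested):
#include <winsock2.h>
#include <cstdlib>

int ReadOnePacket(SOCKET s, char* out, int outCap)
{
    char header[7] = {0};
    // Peek at the length prefix without removing it from the receive queue.
    if (recv(s, header, 6, MSG_PEEK) < 6)
        return -1;                      // not enough data yet (or an error)

    const int packetLen = std::atoi(header);  // "000123" -> 123
    if (packetLen > outCap)
        return -1;                      // caller's buffer is too small

    recv(s, header, 6, 0);              // consume the prefix itself
    return recv(s, out, packetLen, 0);  // then read the payload (may be partial)
}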
The problem is send() is NOT doing what you think - it is buffered by the kernel.
int flag = 0;
socklen_t sz = sizeof(int);
ERR_CHK(getsockopt(sockfd, SOL_SOCKET, SO_SNDBUF, &flag, &sz));
fprintf(stdout, "%s: listener socket send buffer = %d\n", now(), flag);
sz = sizeof(int);
ERR_CHK(getsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &flag, &sz));
fprintf(stdout, "%s: listener socket recv buffer = %d\n", now(), flag);
See what these show for you.
When you recv() on a non-blocking socket that has data, it normally does not have megabytes of data parked in the buffer ready to recv. Most of what I have experienced is that the socket has ~1500 bytes of data per recv. Since you are probably reading on a blocking socket, it takes a while for the recv() to complete.
Socket buffer size is probably the single best predictor of socket throughput. setsockopt() lets you alter socket buffer size, up to a point. Note: these buffers are shared among sockets in a lot of OSes, like Solaris. You can kill performance by twiddling these settings too much.
Also, I don't think you are measuring what you think you are measuring. The real efficiency of send() is the measure of throughput on the recv() end. Not the send() end.
IMO.