Make printf write faster to the Windows commandline - c++

I'm analyzing some high resolution midi data. I'm writing it to the stdout but since there is so much data coming in it takes seconds for them all to display after I did the actual action.
Currently this line writes to the commandline:
std::vector<unsigned char> message;
...
printf("W 1 = %03d, W 2 = %03d, W 3 = %03d \n",(int)message[2],(int)message2[1],(int)message2[2]);

There's a good chance that this is a video driver issue - video card manufacturers probably don't always pay a lot of attention to console window performance. I've had rigs with painfully slow - I mean tooth-extraction painful - console windows that had probably 100 times improvement in that area by updating the video driver.

why dnt you use a string builder class like this one here and append all your output string and write to the output at the end?
What do you think?

Related

Speed up data logging code

I have a device that outputs 64 bits of binary data at a rate of 1KHz. I am reading the device over USB via a 3rd party DLL, converting the binary data into a float, timestamping it, and writing to file.
I have the following setup at the moment:
int main(int argc, char* argv[])
{
unsigned char Message_Rx[64];
USHORT Bytes_Read=0;
std::ofstream out(argv[1]);
do
{
Result = Comms.USBRead(&Message_Rx[0],&Bytes_Read);
unsigned long now = getTickCount(start);
if(Result != 0)
{
uint16_t msb (Message_Rx[11] & 0xff) \\leftshited 8;
uint16_t lsb (Message_Rx[12] & 0xff);
uint16_t rate = msb | lsb;
char outstring[1024];
sprintf(outstring, "%d\t%.7f", now, (float)rate*0.03125);
out << outstring << "\n";
}
}while(!kbhit());
out.close();
}
(Sorry, formatting gets messed up with >> or <<).
This produces perfectly good results on my desktop. There doesn't appear to be any data missing and the timestamps are continuous and 1ms apart.
143379582 -0.5937500
143379583 -1.5312500
143379584 -1.6250000
143379585 -1.4062500
143379586 -1.1875000
143379587 -1.3437500
143379588 -1.3125000
143379589 -1.3125000
143379590 -1.1562500
But when I run this on the old laptop that I need to use I get timestamps that appear in blocks and it looks like there must be some data missing:
143379582 -0.5937500
143379582 -1.5312500
143379582 -1.6250000
143379582 -1.4062500
143379582 -1.1875000
143379593 -1.3437500
143379593 -1.3125000
143379593 -1.3125000
143379593 -1.1562500
Is there a way to achieve a speedup of my code so that I won't lose data?
To say this loud and clear: for any PC that is not a Intel 486SX, 64kb/s is a utmost laughable rate. Getting a few Mb/s over USB is very doable with small, Dollar-a-piece microcontrollers without any optimization.
Whatever goes wrong needs investigation much more than your code does.
I don't know the Comms library, but that's where I'd look for the place where time is spent.
Other than that, your printing stuff to the screen should take orders of magnitude more time than your processing, but still shouldn't be a problem. As mentioned, 1kS/s * 64 b/S is nothing for modern (read: last twenty years) PC hardware.
I recommend storing the raw data until the key is hit. After the key is pressed, output the data.
You want to remove formatting and output from high performance code areas.
Paraphrasing a song, There will be time enough for printing when the data's done.
Edit 1:
An array-based circular queue is a good data structure to hold the incoming data. This gives you the last N data samples.
Whenever you have issues with performance, your first step should be to profile the code to see what parts of it are taking up time.
However, for your code, I would say that the printing and string handling are unnecessary for the main loop. I would have a separate array of timestamps and within my main loop only acquire data.
After a key is hit, you no longer have timing restrictions and can deal with the somewhat expensive operation of file I/O and building up of the strings.
A final note is that your OS might be stealing CPU cycles from you. You may want to try to run your code with higher priorities to rule out scheduling.
With all that said, as was mentioned above, your data rate should be sustainable unless you're running on some really vintage hardware.

Irregular file writing performance in c++

I am writing an app which receives a binary data stream wtih a simple function call like put(DataBLock, dateTime); where each data package is 4 MB
I have to write these datablocks to seperate files for future use with some additional data like id, insertion time, tag etc...
So I both tried these two methods:
first with FILE:
data.id = seedFileId;
seedFileId++;
std::string fileName = getFileName(data.id);
char *fNameArray = (char*)fileName.c_str();
FILE* pFile;
pFile = fopen(fNameArray,"wb");
fwrite(reinterpret_cast<const char *>(&data.dataTime), 1, sizeof(data.dataTime), pFile);
data.dataInsertionTime = time(0);
fwrite(reinterpret_cast<const char *>(&data.dataInsertionTime), 1, sizeof(data.dataInsertionTime), pFile);
fwrite(reinterpret_cast<const char *>(&data.id), 1, sizeof(long), pFile);
fwrite(reinterpret_cast<const char *>(&data.tag), 1, sizeof(data.tag), pFile);
fwrite(reinterpret_cast<const char *>(&data.data_block[0]), 1, data.data_block.size() * sizeof(int), pFile);
fclose(pFile);
second with ostream:
ofstream fout;
data.id = seedFileId;
seedFileId++;
std::string fileName = getFileName(data.id);
char *fNameArray = (char*)fileName.c_str();
fout.open(fNameArray, ios::out| ios::binary | ios::app);
fout.write(reinterpret_cast<const char *>(&data.dataTime), sizeof(data.dataTime));
data.dataInsertionTime = time(0);
fout.write(reinterpret_cast<const char *>(&data.dataInsertionTime), sizeof(data.dataInsertionTime));
fout.write(reinterpret_cast<const char *>(&data.id), sizeof(long));
fout.write(reinterpret_cast<const char *>(&data.tag), sizeof(data.tag));
fout.write(reinterpret_cast<const char *>(&data.data_block[0]), data.data_block.size() * sizeof(int));
fout.close();
In my tests the first methods looks faster, but my main problem is in both ways at first everythings goes fine, for every file writing operation it tooks almost the same time (like 20 milliseconds), but after the 250 - 300th package it starts to make some peaks like 150 to 300 milliseconds and then goes down to 20 milliseconds and then again 150 ms and so on... So it becomes very unpredictable.
When I put some timers to the code I figured out that the main reason for these peaks are because of the fout.open(...) and pfile = fopen(...) lines. I have no idea if this is because of the operating system, hard drive, any kind of cache or buffer mechanism etc...
So the question is; why these file opening lines become problematic after some time, and is there a way to make file writing operation stable, I mean fixed time?
Thanks.
NOTE: I'm using Visual studio 2008 vc++, Windows 7 x64. (I tried also for 32 bit configuration but the result is same)
EDIT: After some point writing speed slows down as well even if the opening file time is minimum. I tried with different package sizes so here are the results:
For 2 MB packages it takes double time to slow down, I mean after ~ 600th item slowing down begins
For 4 MB packages almost 300th item
For 8 MB packages almost 150th item
So it seems to me it is some sort of caching problem or something? (in hard drive or OS). But I also tried with disabling hard drive cache but nothing changed...
Any idea?
This is all perfectly normal, you are observing the behavior of the file system cache. Which is a chunk of RAM that's is set aside by the operating system to buffer disk data. It is normally a fat gigabyte, can be much more if your machine has lots of RAM. Sounds like you've got 4 GB installed, not that much for a 64-bit operating system. Depends however on the RAM needs of other processes that run on the machine.
Your calls to fwrite() or ofstream::write() write to a small buffer created by the CRT, it in turns make operating system calls to flush full buffers. The OS writes normally completely very quickly, it is a simple memory-to-memory copy going from the CRT buffer to the file system cache. Effective write speed is in excess of a gigabyte/second.
The file system driver lazily writes the file system cache data to the disk. Optimized to minimize the seek time on the write head, by far the most expensive operation on the disk drive. Effective write speed is determined by the rotational speed of the disk platter and the time needed to position the write head. Typical is around 30 megabytes/second for consumer-level drives, give or take a factor of 2.
Perhaps you see the fire-hose problem here. You are writing to the file cache a lot faster than it can be emptied. This does hit the wall eventually, you'll manage to fill the cache to capacity and suddenly see the perf of your program fall off a cliff. Your program must now wait until space opens up in the cache so the write can complete, effective write speed is now throttled by disk write speeds.
The 20 msec delays you observe are normal as well. That's typically how long it takes to open a file. That is a time that's completely dominated by disk head seek times, it needs to travel to the file system index to write the directory entry. Nominal times are between 20 and 50 msec, you are on the low end of that already.
Clearly there is very little you can do in your code to improve this. What CRT functions you use certainly don't make any difference, as you found out. At best you could increase the size of the files you write, that reduces the overhead spent on creating the file.
Buying more RAM is always a good idea. But it of course merely delays the moment where the firehose overflows the bucket. You need better drive hardware to get ahead. An SSD is pretty nice, so is a striped raid array. Best thing to do is to simply not wait for your program to complete :)
So the question is; why these file opening lines become problematic
after some time, and is there a way to make file writing operation
stable, I mean fixed time?
This observation(.i.e. varying time taken in write operation) does not mean that there is problem in OS or File System.There could be various reason behind your observation. One possible reason could be the delayed write may be used by kernel to write the data to disk. Sometime kernel cache it(buffer) in case another process should read or write it soon so that extra disk operation can be avoided.
This situation may lead to inconsistency in the time taken in different write call for same size of data/buffer.
File I/O is bit complex and complicated topic and depends on various other factors. For complete information on internal algorithm on File System, you may want to refer the great great classic book "The Design Of UNIX Operating System" By Maurice J Bach which describes these concepts and the implementation in detailed way.
Having said that, you may want to use the flush call immediately after your write call in both version of your program(.i.e. C and C++). This way you may get the consistent time in your file I/O write time. Otherwise your programs behaviour look correct to me.
//C program
fwrite(data,fp);
fflush(fp);
//C++ Program
fout.write(data);
fout.flush();
It's possible that the spikes are not related to I/O itself, but NTFS metadata: when your file count reach some limit, some NTFS AVL-like data structure needs some refactoring and... bump!
To check it you should preallocate the file entries, for example creating all the files with zero size, and then opening them when writing, just for testing: if my theory is correct you shouldn't see your spikes anymore.
UHH - and you must disable file indexing (Windows search service) there! Just remembered of it... see here.

playing wav audio file in C++ and QT

I am reading a .wav file in C and then I am trying to play the audio file using some of the QT functions. Here is how I read the file:
FILE *fhandle=fopen("myAudioFile.wav","rb");
fread(ChunkID,1,4,fhandle);
fread(&ChunkSize,4,1,fhandle);
fread(Format,1,4,fhandle);
fread(Subchunk1ID,1,4,fhandle);
fread(&Subchunk1Size,4,1,fhandle);
fread(&AudioFormat,2,1,fhandle);
fread(&NumChannels,2,1,fhandle);
fread(&SampleRate,4,1,fhandle);
fread(&ByteRate,4,1,fhandle);
fread(&BlockAlign,2,1,fhandle);
fread(&BitsPerSample,2,1,fhandle);
fread(&Subchunk2ID,1,4,fhandle);
fread(&Subchunk2Size,4,1,fhandle);
Data=new quint16 [Subchunk2Size/(BitsPerSample/8)];
fread(Data,BitsPerSample/8,Subchunk2Size/(BitsPerSample/8),fhandle);
fclose(fhandle);
So my audio file is inside Data. Each element of Data is unsigned 16-bit Integer.
To play the sound I divide each 16-bit unsigned Integer into two characters and then every 3 ms (using a timer) I send 256 characters to the audio card.
Assume myData is a character array of 256 characters I do this (every 3 ms) to play the sound:
m_output->write(myData, 256);
Also m_output is defined as:
m_output = m_audioOutput->start();
and m_audioOutput is defined as:
m_audioOutput = new QAudioOutput(m_Outputdevice, m_format, this);
And the audio format is set correctly as:
m_format.setFrequency(44100);
m_format.setChannels(2);
m_format.setSampleSize(16);
m_format.setSampleType(QAudioFormat::UnSignedInt );
m_format.setByteOrder(QAudioFormat::LittleEndian);
m_format.setCodec("audio/pcm");
However, when I try to run the code I hear some noise which is very different from the real audio file.
Is there anything I am doing wronge?
Thanks,
TJ
I think the problem is that you are using QTimer. QTimer is absolutely not going to allow you to run code every three milliseconds exactly, regardless of the platform you're using. And if you're off by just one sample, your audio is going to sound horrible. According to the QTimer docs:
...they are not guaranteed to time out at the exact value specified. In
many situations, they may time out late by a period of time that
depends on the accuracy of the system timers.
and
...the accuracy of the timer will not equal [1 ms] in many real-world situations.
As much as I love Qt, I wouldn't try to use it for signal processing. I would use another framework such as JUCE.

Rendering some sound data into one new sound data?

I'm creating an application that will read a unique format that contains sound "bank" and offsets when the sound must be played.
Imagine something like..
Sound bank: (ID on the left-hand and file name on the right-hand side)
0 kick.wav
1 hit.wav
2 flute.wav
And the offsets: (Time in ms on the left-hand and sound ID on the right-hand side)
1000 0
2000 1
3000 2
And the application will generate a new sound file (ie. wav, for later conversion to other formats) that plays a kick at first sec, a hit at second sec, and flute at third sec.
I completely have no idea on where to begin.
I usually use FMOD for audio playbacks, but never did something like this before.
I'm using C++ and wxWidgets on a MSVC++ Express Edition environment, and LGPL libraries would be fine.
If I understand correctly, you want to generate a new wave file by mixing wavs from a soundbank. You may not need a sound API at all for this, especially if all your input wavs are in the same format.
Simply load each wav file into a buffer. For SampleRate*secondsUntilStartTime samples, for each buffer in the ActiveList, add buffer[bufferIdx++] into the output buffer. If bufferIdx == bufferLen, remove this buffer from the ActiveList. At StartTime, add the next buffer the ActiveList, and repeat.
If FMOD supports output to a file instead of the sound hardware, you can do this same thing with the streaming API. Just keep track of elapsed samples in the StreamCallback, and start mixing in new files whenever you reach their start offsets.

How to use ALSA's snd_pcm_writei()?

Can someone explain how snd_pcm_writei
snd_pcm_sframes_t snd_pcm_writei(snd_pcm_t *pcm, const void *buffer,
snd_pcm_uframes_t size)
works?
I have used it like so:
for (int i = 0; i < 1; i++) {
f = snd_pcm_writei(handle, buffer, frames);
...
}
Full source code at http://pastebin.com/m2f28b578
Does this mean, that I shouldn't give snd_pcm_writei() the number of
all the frames in buffer, but only
sample_rate * latency = frames
?
So if I e.g. have:
sample_rate = 44100
latency = 0.5 [s]
all_frames = 100000
The number of frames that I should give to snd_pcm_writei() would be
sample_rate * latency = frames
44100*0.5 = 22050
and the number of iterations the for-loop should be?:
(int) 100000/22050 = 4; with frames=22050
and one extra, but only with
100000 mod 22050 = 11800
frames?
Is that how it works?
Louise
http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m.html#gf13067c0ebde29118ca05af76e5b17a9
frames should be the number of frames (samples) you want to write from the buffer. Your system's sound driver will start transferring those samples to the sound card right away, and they will be played at a constant rate.
The latency is introduced in several places. There's latency from the data buffered by the driver while waiting to be transferred to the card. There's at least one buffer full of data that's being transferred to the card at any given moment, and there's buffering on the application side, which is what you seem to be concerned about.
To reduce latency on the application side you need to write the smallest buffer that will work for you. If your application performs a DSP task, that's typically one window's worth of data.
There's no advantage in writing small buffers in a loop - just go ahead and write everything in one go - but there's an important point to understand: to minimize latency, your application should write to the driver no faster than the driver is writing data to the sound card, or you'll end up piling up more data and accumulating more and more latency.
For a design that makes producing data in lockstep with the sound driver relatively easy, look at jack (http://jackaudio.org/) which is based on registering a callback function with the sound playback engine. In fact, you're probably just better off using jack instead of trying to do it yourself if you're really concerned about latency.
I think the reason for the "premature" device closure is that you need to call snd_pcm_drain(handle); prior to snd_pcm_close(handle); to ensure that all data is played before the device is closed.
I did some testing to determine why snd_pcm_writei() didn't seem to work for me using several examples I found in the ALSA tutorials and what I concluded was that the simple examples were doing a snd_pcm_close () before the sound device could play the complete stream sent it to it.
I set the rate to 11025, used a 128 byte random buffer, and for looped snd_pcm_writei() for 11025/128 for each second of sound. Two seconds required 86*2 calls snd_pcm_write() to get two seconds of sound.
In order to give the device sufficient time to convert the data to audio, I put used a for loop after the snd_pcm_writei() loop to delay execution of the snd_pcm_close() function.
After testing, I had to conclude that the sample code didn't supply enough samples to overcome the device latency before the snd_pcm_close function was called which implies that the close function has less latency than the snd_pcm_write() function.
If the ALSA driver's start threshold is not set properly (if in your case it is about 2s), then you will need to call snd_pcm_start() to start the data rendering immediately after snd_pcm_writei().
Or you may set appropriate threshold in the SW params of ALSA device.
ref:
http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m.html
http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m___s_w___params.html