I'm attempting to make a console side-scrolling shooter. I know this isn't the ideal medium for it, but I set myself a bit of a challenge.
The problem is that whenever it updates the frame, the entire console flickers. Is there any way to get around this?
I have used an array to hold all of the characters to be output; here is my updateFrame function. Yes, I know system("cls") is lazy, but unless that's the cause of the problem I'm not fussed for this purpose.
void updateFrame()
{
    for (int y = 0; y < MAX_Y; y++) {
        for (int x = 0; x < MAX_X; x++)
            std::cout << battleField[x][y];
        std::cout << std::endl;
    }
}

Ah, this brings back the good old days. I did similar things in high school :-)
You're going to run into performance problems. Console I/O, especially on Windows, is slow. Very, very slow (sometimes slower than writing to disk, even). In fact, you'll quickly become amazed how much other work you can do without it affecting the latency of your game loop, since the I/O will tend to dominate everything else. So the golden rule is simply to minimize the amount of I/O you do, above all else.
First, I suggest getting rid of the system("cls") and replacing it with calls to the actual Win32 console subsystem functions that cls wraps (docs):
#define NOMINMAX
#include <Windows.h>
#include <iostream>

void cls()
{
    // Get the Win32 handle representing standard output.
    // This generally only has to be done once, so we make it static.
    static const HANDLE hOut = GetStdHandle(STD_OUTPUT_HANDLE);

    CONSOLE_SCREEN_BUFFER_INFO csbi;
    COORD topLeft = { 0, 0 };

    // std::cout uses a buffer to batch writes to the underlying console.
    // We need to flush that to the console because we're circumventing
    // std::cout entirely; after we clear the console, we don't want
    // stale buffered text to randomly be written out.
    std::cout.flush();

    // Figure out the current width and height of the console window
    if (!GetConsoleScreenBufferInfo(hOut, &csbi)) {
        // TODO: Handle failure!
        return;
    }
    DWORD length = csbi.dwSize.X * csbi.dwSize.Y;
    DWORD written;

    // Flood-fill the console with spaces to clear it
    FillConsoleOutputCharacter(hOut, TEXT(' '), length, topLeft, &written);

    // Reset the attributes of every character to the default.
    // This clears all background colour formatting, if any.
    FillConsoleOutputAttribute(hOut, csbi.wAttributes, length, topLeft, &written);

    // Move the cursor back to the top left for the next sequence of writes
    SetConsoleCursorPosition(hOut, topLeft);
}
Indeed, instead of redrawing the entire "frame" every time, you're much better off drawing (or erasing, by overwriting them with a space) individual characters at a time:
// x is the column, y is the row. The origin (0,0) is top-left.
void setCursorPosition(int x, int y)
{
    static const HANDLE hOut = GetStdHandle(STD_OUTPUT_HANDLE);
    std::cout.flush();  // Write out pending text before the cursor moves
    COORD coord = { (SHORT)x, (SHORT)y };
    SetConsoleCursorPosition(hOut, coord);
}
// Step through with a debugger, or insert sleeps, to see the effect.
setCursorPosition(10, 5);
std::cout << "CHEESE";
setCursorPosition(10, 5);
std::cout << 'W';
setCursorPosition(14, 5);
std::cout << 'Z';
setCursorPosition(10, 5);
std::cout << "     "; // Overwrite characters with spaces to "erase" them
// Voilà, 'CHEESE' converted to 'WHEEZE', then all but the last 'E' erased
Note that this eliminates the flicker, too, since there's no longer any need to clear the screen completely before redrawing -- you can simply change what needs changing without doing an intermediate clear, so the previous frame is incrementally updated, persisting until it's completely up to date.
I suggest using a double-buffering technique: Have one buffer in memory that represents the "current" state of the console screen, initially populated with spaces. Then have another buffer that represents the "next" state of the screen. Your game update logic will modify the "next" state (exactly like it does with your battleField array right now). When it comes time to draw the frame, don't erase everything first. Instead, go through both buffers in parallel, and write out only the changes from the previous state (the "current" buffer at that point contains the previous state). Then, copy the "next" buffer into the "current" buffer to set up for your next frame.
char prevBattleField[MAX_X][MAX_Y];
std::memset((char*)prevBattleField, 0, MAX_X * MAX_Y);
// ...
for (int y = 0; y != MAX_Y; ++y) {
    for (int x = 0; x != MAX_X; ++x) {
        if (battleField[x][y] == prevBattleField[x][y]) {
            continue;  // Unchanged since the last frame: nothing to write
        }
        setCursorPosition(x, y);
        std::cout << battleField[x][y];
    }
}
std::cout.flush();
std::memcpy((char*)prevBattleField, (char const*)battleField, MAX_X * MAX_Y);
You can even go one step further and batch runs of changes together into a single I/O call (which is significantly cheaper than many calls for individual character writes, but still proportionally more expensive the more characters are written).
// Note: This requires you to invert the dimensions of `battleField` (and
// `prevBattleField`) in order for rows of characters to be contiguous in memory.
for (int y = 0; y != MAX_Y; ++y) {
    int runStart = -1;
    for (int x = 0; x != MAX_X; ++x) {
        if (battleField[y][x] == prevBattleField[y][x]) {
            if (runStart != -1) {
                setCursorPosition(runStart, y);
                std::cout.write(&battleField[y][runStart], x - runStart);
                runStart = -1;
            }
        }
        else if (runStart == -1) {
            runStart = x;
        }
    }
    if (runStart != -1) {
        setCursorPosition(runStart, y);
        std::cout.write(&battleField[y][runStart], MAX_X - runStart);
    }
}
std::memcpy((char*)prevBattleField, (char const*)battleField, MAX_X * MAX_Y);
In theory, that will run a lot faster than the first loop; however in practice it probably won't make a difference since std::cout is already buffering writes anyway. But it's a good example (and a common pattern that shows up a lot when there is no buffer in the underlying system), so I included it anyway.
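The run-detection part of that loop can also be pulled out into a pure function, which makes it easy to test without a console. This is only a sketch: it assumes each row is held in a std::string of equal length, and the names Run and findChangedRuns are invented for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A contiguous span of changed characters within one row (hypothetical helper type).
struct Run {
    std::size_t start;   // column of the first changed character
    std::size_t length;  // number of consecutive changed characters
};

// Compares the previous and next contents of one row and returns the spans that differ.
// Assumes both rows have the same length.
std::vector<Run> findChangedRuns(const std::string& prev, const std::string& next)
{
    std::vector<Run> runs;
    std::size_t x = 0;
    while (x < next.size()) {
        if (prev[x] == next[x]) {
            ++x;  // unchanged cell: nothing to write
            continue;
        }
        const std::size_t start = x;
        while (x < next.size() && prev[x] != next[x]) {
            ++x;  // extend the run over every consecutive changed cell
        }
        runs.push_back({start, x - start});
    }
    return runs;
}
```

Each Run would then map to one setCursorPosition(run.start, y) call followed by a single std::cout.write(&next[run.start], run.length).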
Finally, note that you can reduce your sleep to 1 millisecond. Windows will often actually sleep longer, typically up to 15 ms, but the sleep will prevent your CPU core from reaching 100% usage with a minimum of additional latency.
Note that this is not at all the way "real" games do things; they almost always clear the buffer and redraw everything every frame. They don't get flickering because they use the equivalent of a double-buffer on the GPU, where the previous frame stays visible until the new frame is completely finished being drawn.
Bonus: You can change the colour to any of 8 different system colours, and the background too:
void setConsoleColour(unsigned short colour)
{
    static const HANDLE hOut = GetStdHandle(STD_OUTPUT_HANDLE);
    std::cout.flush();  // Write out pending text before the colour changes
    SetConsoleTextAttribute(hOut, colour);
}
// Example:
const unsigned short DARK_BLUE = FOREGROUND_BLUE;
std::cout << "Hello ";
setConsoleColour(DARK_BLUE);
std::cout << "world";
std::cout << "!" << std::endl;

system("cls") is the cause of your problem. To update a frame, your program has to spawn another process, which then loads and executes another program. This is quite expensive.
cls clears your screen, which means for a small amount of the time (until control returns to your main process) it displays completely nothing. That's where flickering comes from.
You should use some library like ncurses which allows you to display the "scene", then move your cursor position to <0,0> without modifying anything on the screen and redisplay your scene "over" the old one. This way you'll avoid flickering, because your scene will always display something, without 'completely blank screen' step.
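The same "redraw over the old frame" idea can be sketched without ncurses by sending the ANSI escape sequence ESC [ H (cursor to top-left) instead of clearing. This is a swapped-in technique, not ncurses itself. It assumes a terminal that interprets ANSI sequences (on recent Windows versions, virtual terminal processing has to be enabled on the console first), and composeFrame and presentFrame are hypothetical names.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Builds one string holding the whole frame, prefixed with "cursor home".
// Nothing is cleared, so the old frame stays visible until it is overwritten.
std::string composeFrame(const std::vector<std::string>& rows)
{
    std::string out = "\x1b[H";  // ANSI: move the cursor to row 1, column 1
    for (const std::string& row : rows) {
        out += row;
        out += '\n';
    }
    return out;
}

// Writes the frame with a single output call and flushes once.
void presentFrame(const std::vector<std::string>& rows)
{
    const std::string frame = composeFrame(rows);
    std::cout.write(frame.data(), static_cast<std::streamsize>(frame.size()));
    std::cout.flush();
}
```

Because every row is rewritten in full, each frame must be as wide and as tall as the previous one, or stale characters will remain on screen.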

One method is to write the formatted data to a string (or buffer) then block write the buffer to the console.
Every call to a function has an overhead, so try to get more done per call. For your output, this means writing a lot of text per output request.
For example:
static char buffer[2048];  // Must hold at least MAX_Y * (MAX_X + 1) + 1 characters
char * p_next_write = &buffer[0];
for (int y = 0; y < MAX_Y; y++) {
    for (int x = 0; x < MAX_X; x++) {
        *p_next_write++ = battleField[x][y];
    }
    *p_next_write++ = '\n';
}
*p_next_write = '\0'; // "Insurance" for C-Style strings.
std::cout.write(&buffer[0], p_next_write - &buffer[0]);
I/O operations are expensive (execution-wise), so the best use is to maximize the data per output request.

With the accepted answer the rendering would still be flickering if your updated area is big enough. Even if you animate a single horizontal line to move from top to bottom you'll most of the time see it like this:
This happens because you see the previous frame in the process of being overwritten by a newer one. For complex scenes like video or 3D rendering, this is barely acceptable. The proper way to do it is to use the double buffering technique: draw all the "pixels" into an off-screen buffer and, when done, display it all at once. Fortunately, the Windows console supports this approach pretty well. The full double-buffering example follows:
#include <chrono>
#include <thread>
#include <Windows.h>
#include <vector>
const unsigned FPS = 25;
std::vector<char> frameData;
short cursor = 0;
// Get the initial console buffer.
auto firstBuffer = GetStdHandle(STD_OUTPUT_HANDLE);
// Create an additional buffer for switching.
auto secondBuffer = CreateConsoleScreenBuffer(
    GENERIC_READ | GENERIC_WRITE,
    FILE_SHARE_READ | FILE_SHARE_WRITE,
    nullptr,
    CONSOLE_TEXTMODE_BUFFER,
    nullptr);
// Assign switchable back buffer.
HANDLE backBuffer = secondBuffer;
bool bufferSwitch = true;
// Returns current window size in rows and columns.
COORD getScreenSize()
{
    CONSOLE_SCREEN_BUFFER_INFO bufferInfo;
    GetConsoleScreenBufferInfo(firstBuffer, &bufferInfo);
    const auto newScreenWidth = bufferInfo.srWindow.Right - bufferInfo.srWindow.Left + 1;
    const auto newscreenHeight = bufferInfo.srWindow.Bottom - bufferInfo.srWindow.Top + 1;
    return COORD{ static_cast<short>(newScreenWidth), static_cast<short>(newscreenHeight) };
}
// Switches back buffer as active.
void swapBuffers()
{
    WriteConsole(backBuffer, &frameData.front(), static_cast<DWORD>(frameData.size()), nullptr, nullptr);
    SetConsoleActiveScreenBuffer(backBuffer);
    backBuffer = bufferSwitch ? firstBuffer : secondBuffer;
    bufferSwitch = !bufferSwitch;
    std::this_thread::sleep_for(std::chrono::milliseconds(1000 / FPS));
}
// Draw horizontal line moving from top to bottom.
void drawFrame(COORD screenSize)
{
    for (auto i = 0; i < screenSize.Y; i++) {
        for (auto j = 0; j < screenSize.X; j++) {
            if (cursor == i)
                frameData[i * screenSize.X + j] = '#';
            else
                frameData[i * screenSize.X + j] = ' ';
        }
    }
    ++cursor;  // Advance the line one row per frame
    if (cursor >= screenSize.Y)
        cursor = 0;
}
int main()
{
    const auto screenSize = getScreenSize();
    SetConsoleScreenBufferSize(firstBuffer, screenSize);
    SetConsoleScreenBufferSize(secondBuffer, screenSize);
    frameData.resize(screenSize.X * screenSize.Y);
    // Main rendering loop:
    // 1. Draw frame to the back buffer.
    // 2. Set back buffer as active.
    while (true) {
        drawFrame(screenSize);
        swapBuffers();
    }
}
In this example, I went with a static FPS value for the sake of simplicity. You may also want to introduce some functionality to stabilize frame frequency output by counting the actual FPS. That would make your animation run smoothly independent of the console throughput.
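One possible way to stabilize the frame rate is to sleep until an absolute deadline rather than for a fixed duration, so that the time spent drawing is absorbed into the frame period. The sketch below uses only std::chrono and std::this_thread; FramePacer is a hypothetical helper and is not part of the example above.

```cpp
#include <chrono>
#include <thread>

// Paces a loop to a fixed frame period using absolute deadlines (hypothetical helper).
class FramePacer {
public:
    using Clock = std::chrono::steady_clock;

    explicit FramePacer(unsigned fps)
        : period_(std::chrono::duration_cast<Clock::duration>(std::chrono::seconds(1)) /
                  static_cast<Clock::rep>(fps == 0 ? 1 : fps)),
          next_(Clock::now() + period_) {}

    // Blocks until the current frame's deadline, then schedules the next one.
    void waitForNextFrame()
    {
        std::this_thread::sleep_until(next_);
        next_ += period_;
        // If a frame overran badly, resynchronize instead of bursting to catch up.
        const Clock::time_point now = Clock::now();
        if (next_ < now) {
            next_ = now + period_;
        }
    }

    Clock::duration period() const { return period_; }

private:
    Clock::duration period_;
    Clock::time_point next_;
};
```

In the rendering loop this would replace the fixed sleep_for inside swapBuffers: construct one FramePacer with the target FPS before the loop and call waitForNextFrame() once per iteration.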


this_thread::sleep_for / SDL Rendering Skips instructions

I'm trying to make a sorting visualizer with SDL2. Everything works except one thing: the wait time.
The sorting visualizer has a delay that I can change to whatever I want, but when I set it to around 1ms it skips some instructions.
Here is 10ms vs 1ms:
10ms delay
1ms delay
The video shows how the 1ms delay doesn't actually finish sorting:
Picture of 1ms delay algorithm completion.
I suspect the problem is the wait function I use. I'm trying to make this program multi-platform, so there are few alternatives.
Here's a snippet of the code:
Selection Sort Code (Shown in videos):
void selectionSort(void)
{
    int minimum;
    // One by one move boundary of unsorted subarray
    for (int i = 0; i < totalValue-1; i++)
    {
        // Find the minimum element in unsorted array
        minimum = i;
        for (int j = i+1; j < totalValue; j++){
            if (randArray[j] < randArray[minimum]){
                minimum = j;
                lineColoration[j] = 2;
            }
        }
        lineColoration[i] = 1;
        // Swap the found minimum element with the first element
        swap(randArray[minimum], randArray[i]);
    }
}
Some variables need explanation:
totalValue is the amount of values to be sorted (user input)
randArray is a vector that stores all the values
waitTime is the amount of milliseconds the computer will wait each time (user input)
I've cut the code down and removed the other algorithms to make a reproducible example. Not rendering and using cout instead seems to work, but I still can't pin down whether the issue is the render or the wait function:
#include <algorithm>
#include <chrono>
#include <iostream>
#include <random>
#include <thread>
#include <vector>
#include <math.h>
#include <SDL2/SDL.h>
SDL_Window* window;
SDL_Renderer* renderer;
using namespace std;
vector<int> randArray;
int totalValue= 100;
auto waitTime= 1ms;
vector<int> lineColoration;
int lineSize;
int lineHeight;
Uint32 ticks= 0;
void OrganizeVariables()
{
    for(int i= 0; i < totalValue; i++)
    {
        randArray.push_back(i + 1);
    }
    auto rng= default_random_engine{};
    shuffle(begin(randArray), end(randArray), rng);
}
int create_window(void)
{
    window= SDL_CreateWindow("Sorting Visualizer", SDL_WINDOWPOS_UNDEFINED, SDL_WINDOWPOS_UNDEFINED, 1800, 900, SDL_WINDOW_SHOWN);
    return window != NULL;
}
int create_renderer(void)
renderer= SDL_CreateRenderer(
return renderer != NULL;
int init(void)
if(SDL_Init(SDL_INIT_VIDEO) != 0)
goto bad_exit;
if(create_window() == 0)
goto quit_sdl;
if(create_renderer() == 0)
goto destroy_window;
cout << "All safety checks passed succesfully" << endl;
return 1;
return 0;
void cleanup(void)
void render(void)
SDL_SetRenderDrawColor(renderer, 0, 0, 0, 255);
//This is used to only render when 16ms hits (60fps), if true, will set the ticks variable to GetTicks() + 16
if(SDL_GetTicks() > ticks) {
for(int i= 0; i < totalValue - 1; i++) {
// SDL_Rect image_pos = {i*4, 100, 3, randArray[i]*2};
SDL_Rect fill_pos= {i * (1 + lineSize), 100, lineSize,randArray[i] * lineHeight};
switch(lineColoration[i]) {
case 0:
case 1:
case 2:
cout << "Error, drawing color not defined, exting...";
cout << "Unkown Color ID: " << lineColoration[i];
SDL_RenderFillRect(renderer, &fill_pos);
ticks= SDL_GetTicks() + 16;
void selectionSort(void)
{
    int minimum;
    // One by one move boundary of unsorted subarray
    for (int i = 0; i < totalValue-1; i++) {
        // Find the minimum element in unsorted array
        minimum = i;
        for (int j = i+1; j < totalValue; j++) {
            if (randArray[j] < randArray[minimum]) {
                minimum = j;
                lineColoration[j] = 2;
            }
        }
        lineColoration[i] = 1;
        // Swap the found minimum element with the first element
        swap(randArray[minimum], randArray[i]);
    }
}
int main(int argc, char** argv)
//Rough estimate of screen size
lineSize= 1100 / totalValue;
lineHeight= 700 / totalValue;
The problem is the line ticks= SDL_GetTicks() + 16;. Sixteen ticks is too long an interval for a one-millisecond wait, so the if(SDL_GetTicks() > ticks) condition is false most of the time.
With a 1ms wait and ticks= SDL_GetTicks() + 5, it will work.
If, in the selectionSort loop, the if(SDL_GetTicks() > ticks) check skips the drawing during, say, the last eight iterations, the loop may well finish and leave some drawings pending.
The algorithm does complete; it simply finishes before SDL_GetTicks() has passed ticks, so the final state is never drawn.
The main problem is that you are dropping updates to the screen by making all rendering dependent on an if condition:
if(SDL_GetTicks() > ticks)
My tests have shown that only about every 70th call to the function render actually gets rendered. All other calls are filtered by this if condition.
This extremely high number is because you are calling the function render not only in your outer loop, but also in the inner loop. I see no reason why it should also be called in the inner loop. In my opinion, it should only be called in the outer loop.
If you only call it in the outer loop, then about every 16th call to the function is actually rendered.
However, this still means that the last call to the render function only has a 1 in 16 chance of being rendered. Therefore, it is not surprising that the last render of your program does not represent the last sorting step.
If you want to ensure that the last sorting step gets rendered, you could simply execute the rendering code once unconditionally, after the sorting has finished. However, this may not be the ideal solution, because I believe you should first make a more fundamental decision on how your program should behave:
In your question, you are using delays of 1ms between calls to render. This means that your program is designed to render 1000 frames per second. However, your monitor can probably only display about 60 frames per second (some gaming monitors can display more). In that case, every displayed frame lasts for at least 16.7 milliseconds.
Therefore, you must decide how you want your program to behave with regard to the monitor. You could make your program
1. sort faster than your monitor can display individual sorting steps, so that not all of the sorting steps are rendered, or
2. sort slower than your monitor can display individual sorting steps, so that all sorting steps are displayed by the monitor for at least one frame, possibly several frames, or
3. sort at exactly the same speed as your monitor can display, so that one sorting step is displayed for exactly one frame by the monitor.
Implementing #3 is the easiest of all. Because you have enabled VSYNC in the function call to SDL_CreateRenderer, SDL will automatically limit the number of renders to the display rate of your monitor. Therefore, you don't have to perform any additional waiting in your code and can remove the call to this_thread::sleep_for from the function selectionSort. Also, since SDL knows better than you whether your monitor is ready for the next frame to be drawn, it does not seem appropriate that you try to limit the number of frames yourself. So you can remove the line
if(SDL_GetTicks() > ticks) {
and the corresponding closing brace from the function render.
On the other hand, it may be better to keep the if statement to prevent the massively high frame rates in case SDL doesn't limit them properly. In that case, the frame rate limiter should probably be set well above 60 fps, though (maybe 100-200 fps), to ensure that the frames are passed fast enough to SDL.
Implementing #1 is harder, as it actually requires you to select which sorting steps to render and which ones not to render. Therefore, in order to implement #1, you will probably need to keep the if statement mentioned above, so that rendering only occurs conditionally.
However, it does not seem meaningful to make the if statement dependent on the elapsed time since the last render, because while waiting, the sorting will continue at full speed, and it is therefore possible that all of the sorting will be completed with only one frame of rendering. You are currently preventing this from happening by slowing down the sort with the call to this_thread::sleep_for in the function selectionSort. But this does not seem like an ideal solution, but rather a stopgap measure.
Instead of making the if condition dependent on time, it would be easier to make it dependent on the number of sorting steps since the last render. That way, you could, for example, program it in such a way that every 5th sorting step gets rendered. In that case, there would be no need anymore to additionally slow down the actual sorting and your code would be simpler.
As already described above, when implementing #1, you will also have to ensure that you do not drop the last rendering step, or that you at least render the last frame after the sorting is finished. Otherwise, the last frame will likely not display the completed sort, but rather an intermediate sorting step.
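The step-counting idea, including the guaranteed final frame, could be sketched independently of SDL as follows. StepThrottle is a hypothetical helper, and the draw callback stands in for the body of the question's render function.

```cpp
#include <functional>
#include <utility>

// Renders every Nth sorting step and guarantees one final render (hypothetical helper).
class StepThrottle {
public:
    StepThrottle(unsigned stepsPerFrame, std::function<void()> draw)
        : stepsPerFrame_(stepsPerFrame == 0 ? 1 : stepsPerFrame),
          draw_(std::move(draw)) {}

    // Call once per sorting step; draws only on every Nth call.
    void onStep()
    {
        dirty_ = true;
        if (++sinceLastDraw_ >= stepsPerFrame_) {
            drawNow();
        }
    }

    // Call once after sorting has finished, so the completed state is shown.
    void finish()
    {
        if (dirty_) {
            drawNow();
        }
    }

private:
    void drawNow()
    {
        draw_();
        sinceLastDraw_ = 0;
        dirty_ = false;
    }

    unsigned stepsPerFrame_;
    std::function<void()> draw_;
    unsigned sinceLastDraw_ = 0;
    bool dirty_ = false;
};
```

With this in place, selectionSort would call onStep() wherever it currently calls render, and the caller would invoke finish() once the sort returns.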
Implementing #2 is similar to implementing #1, except that you will have to use SDL_Delay (which is equivalent to this_thread::sleep_for) or SDL_AddTimer to determine when it is time to render the next sorting step.
Using SDL_AddTimer would require you to handle SDL Events. However, I would recommend that you do this anyway, because that way, you will also be able to handle SDL_QUIT events, so that you can close your program by closing the window. This would also make the line
this_thread::sleep_for( 5000ms );
at the end of your program unnecessary, because you could instead wait for the user to close the window, like this:
for (;;)
{
    SDL_Event event;
    SDL_WaitEvent( &event );
    if ( event.type == SDL_QUIT ) break;
}
However, it would probably be better if you restructured your entire program, so that you only have one message loop, which responds to both SDL Timer and SDL_QUIT events.

Waveform Widget Touchgfx

I’m creating a waveform widget in TouchGFX, but I'm unsure how best to loop the waveform back to zero at the end. Because there are three frame buffers, you have to invalidate an area three times or you get flickering. How would you handle looping the array back to the start (x=0)?
The main issue is that my code originally assumed there was only one frame buffer. I think my code needs to be refactored for three frame buffers, or needs the ability to write directly to the frame buffer. Any hints would be greatly appreciated.
bool Graph::drawCanvasWidget(const Rect& invalidatedArea) const
if (numPoints < 2)
// A graph line with a single (or not even a single) point is invisible
return true;
Canvas canvas(this, invalidatedArea);
for (int index = 0; index < (numPoints-1); index++)
return canvas.render(); // Shape above automatically closed
return true;
void Graph::newPoint(int y)
}else if ((maxPoints-numPoints)<=20){
points[numPoints].x = numPoints;
points[numPoints].y = y;
Rect minimalRect(480,0,20,100);
points[numPoints].x = numPoints;
points[numPoints].y = y;
Rect minimalRect(numPoints-3,0,20,100);
With TouchGFX 4.15.0 (just out) the TouchGFX Designer now supports a Graph widget (previously only found in source code in demos) which can be used to produce your waveforms. It has some more elegant ways of inserting points which may suit your needs.

QueryPerformanceCounter limiting/speeding up slide speed

I have a thread that waits on a std::condition_variable and then loops until it is done.
I'm trying to slide a rect that is drawn in OpenGL.
Everything works fine without using a delta, but I would like my rect to slide at the same speed no matter what computer it is run on.
At the moment it jumps about halfway, then slides really slowly.
If I don't use my delta, it does not run at the same speed on slower computers.
I'm not sure whether I should instead have an if statement that checks whether enough time has passed and only then do the sliding, without using a delta.
auto toolbarGL::Slide() -> void
LARGE_INTEGER then, now, freq;
while (true)
// Waits to be ready to slide
// Keeps looping till stopped then starts to wait again
float delta_time_sec = (float)(now.QuadPart - then.QuadPart) / freq.QuadPart;
if (slideDir == SlideFlag::Right)
if (this->x < 0)
this->x += 10 * delta_time_sec;
else if (slideDir == SlideFlag::Left)
if (this->x > -90)
this->x -= 10 * delta_time_sec;
then = now;
If you want your rectangle to move at a steady speed no matter what, I suggest a different approach -- instead of relying on your code executing at a particular time and causing a side effect (like x += 10) each time, come up with a function that will tell you what the rectangle's location should be at any given time. That way, no matter when your Paint() method is called, it will always draw the rectangle at the location that corresponds to that time.
For example:
#include <windows.h>
#include <cstdio>

// Returns the current time, in microseconds-since-some-arbitrary-time-zero
unsigned long long GetCurrentTimeMicroseconds()
{
    static unsigned long long _ticksPerSecond = 0;
    if (_ticksPerSecond == 0)
    {
        LARGE_INTEGER tps;
        _ticksPerSecond = (QueryPerformanceFrequency(&tps)) ? tps.QuadPart : 0;
    }
    LARGE_INTEGER curTicks;
    if ((_ticksPerSecond > 0)&&(QueryPerformanceCounter(&curTicks)))
    {
        return (curTicks.QuadPart*1000000)/_ticksPerSecond;
    }
    printf("GetCurrentTimeMicroseconds() failed, oh dear\n");
    return 0;
}
// A particular location on the screen
int startPositionX = 0;
// A clock-value at which the rectangle was known to be at that location
unsigned long long timeStampAtStartPosition = GetCurrentTimeMicroseconds();
// The rectangle's current velocity, in pixels-per-second
int speedInPixelsPerSecond = 10;
// Given any clock-value (in microseconds), returns the expected position of the rectangle at that time
int GetXAtTime(unsigned long long currentTimeInMicroseconds)
{
    const long long timeSinceMicroseconds = currentTimeInMicroseconds-timeStampAtStartPosition;
    return startPositionX + (int)((speedInPixelsPerSecond*timeSinceMicroseconds)/1000000);
}
void PaintScene()
{
    const int rectX = GetXAtTime(GetCurrentTimeMicroseconds());
    // code to paint the rectangle at position (rectX) goes here...
}
Given the above, your program can call PaintScene() as seldom or as often as it wants, and your rectangle's on-screen speed will not change (although the animation will look more or less smooth, depending on how often you call it).
Then if you want the rectangle to change its direction of motion, you can just do something like this:
const unsigned long long now = GetCurrentTimeMicroseconds();
startPositionX = GetXAtTime(now);
speedInPixelsPerSecond = -speedInPixelsPerSecond; // reverse course!
The above example uses a simple y=mx+b-style equation that provides linear motion, but you can get many different types of motion, by using different parametric equations that take a time-value argument and return a corresponding position-value.
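As one example of a non-linear parametric equation, the sketch below slides from a start position to an end position over a fixed duration with an ease-out curve, and stays at the end position afterwards. The function name slideXAt, the cubic curve and the -90 to 0 range used in the usage note (borrowed from the bounds in the question) are illustrative assumptions, and std::chrono is used here in place of QueryPerformanceCounter.

```cpp
#include <algorithm>
#include <chrono>

// Position of a slide animation as a pure function of elapsed time (hypothetical helper).
// Moves from startX to endX over `duration`, fast at first and slowing near the end.
double slideXAt(std::chrono::microseconds elapsed,
                std::chrono::microseconds duration,
                double startX, double endX)
{
    // Normalized progress in [0, 1]; clamping keeps the rectangle at endX afterwards.
    const double ratio = static_cast<double>(elapsed.count()) /
                         static_cast<double>(duration.count());
    const double t = std::min(1.0, std::max(0.0, ratio));
    const double inv = 1.0 - t;
    const double eased = 1.0 - inv * inv * inv;  // cubic ease-out
    return startX + (endX - startX) * eased;
}
```

A paint routine could record a std::chrono::steady_clock timestamp when the slide starts and then call, for example, slideXAt(elapsedSinceStart, oneSecond, -90.0, 0.0) each time it draws; the result depends only on the elapsed time, not on how often painting happens.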

Mouse cursor position deviations when scrolling

I'm trying to write a little project for a programming class. It's a simple graphics library that uses only ASCII characters and runs in the Windows console (I use Win7 64-bit). The problems occur when I try to add mouse handling. Here's the code:
void importantMouseThings()
{
    DWORD numEvents = 0;
    DWORD numEventsRead = 0;
    GetNumberOfConsoleInputEvents( cgWindow::inputHandle, &numEvents);
    if (numEvents != 0)
    {
        INPUT_RECORD *eventBuffer = new INPUT_RECORD[numEvents];
        ReadConsoleInput(cgWindow::inputHandle, eventBuffer, numEvents, &numEventsRead);
        for (DWORD i = 0; i < numEventsRead; i++)
        {
            if (eventBuffer[i].EventType == MOUSE_EVENT)
            {
                int mousex = eventBuffer[i].Event.MouseEvent.dwMousePosition.X;
                int mousey = eventBuffer[i].Event.MouseEvent.dwMousePosition.Y;
                std::cout << mousex << " " << mousey << std::endl;
            }
        }
        delete[] eventBuffer;
    }
}
The problem is that when I move the cursor to the upper-left corner of the console window, cout writes "0 0", but when I use the scroll wheel (scrolling up or down) the values change to something like "20 14". When I trigger another mouse event, simply by moving the cursor a bit, the values go back to "0 0".
Maybe I just don't understand what dwMousePosition is, or maybe it has something to do with the console window (both window and buffer sizes are set to 80x80, so there are no visible scrollbars).
Okay, I just realized that the change in values depends on the window position. If the window is on the left side of the screen, the X divergence is very small, and it grows as I move the window to the right. Any ideas what is wrong?

Realtime audio application, improving performance

I am currently writing a C++ real-time audio application which roughly consists of:
reading frames from a buffer
interpolating frames with Hermite interpolation
filtering every frame with two biquad filters (and updating their coefficients every frame)
a 3 band crossover containing 18 biquad calculations
a FreeVerb algorithm from the STK library
I think my PC should be able to handle this, but I get buffer underflows every so often, so I would like to improve the performance of my application. I have a bunch of questions that I hope you can answer. :)
1) Operator Overloading
Instead of working directly with my float samples and doing calculations for every sample,
I pack my floats in a Frame class which contains the left and the right sample. The class overloads some operators for addition, subtraction and multiplication with float.
The filters (mostly biquads) and the reverb work with floats and don't use this class, but the Hermite interpolator and every multiplication and addition for volume control and mixing use the class.
Does this have an impact on performance, and would it be better to work with the left and right samples directly?
2) std::function
The callback function from the audio I/O library PortAudio calls a std::function. I use this to encapsulate everything related to PortAudio. So the "user" sets his own callback function with std::bind
std::bind( &AudioController::processAudio,
Since the CPU has to find the right function for every callback (however this works...), does this have an impact, and would it be better to define a class the user has to inherit from?
3) virtual functions
I use a class called AudioProcessor which declares a virtual function:
virtual void tick(Frame *buffer, int frameCout) = 0;
This function always processes a number of frames at once: depending on the driver, between 200 and 1000 frames per call.
Within the signal processing path, I call this function 6 times from multiple derived classes. I remember that this is done with lookup tables so the CPU knows exactly which function it has to call. So does calling a "virtual" (derived) function have an impact on performance?
The nice thing about this is the structure it gives the source code, but using only inline functions might improve performance.
These are all questions for now. I have some more about Qt's event loop because I think that my GUI uses quite a bit of CPU time as well. But this is another topic I guess. :)
Thanks in advance!
These are all the relevant function calls within the signal processing. Some of them are from the STK library.
The biquad functions are from STK and should perform fine. This goes for the freeverb algorithm as well.
// ################################ AudioController Function ############################
void AudioController::processAudio(int frameCount, float *output) {
    Frame * leftFrameBuffer = (Frame*) output;
    if(leftLoaded) { // the left processor is loaded
        leftProcessor->tick(leftFrameBuffer, frameCount); //(TrackProcessor::tick()
    } else {
        for(int i = 0; i < frameCount; i++) {
            leftFrameBuffer[i].leftSample = 0.0f;
            leftFrameBuffer[i].rightSample = 0.0f;
        }
    }
    if(rightLoaded) { // the right processor is loaded
        // the rightFrameBuffer is allocated once and ensured to have enough space for frameCount Frames
        rightProcessor->tick(rightFrameBuffer, frameCount); //(TrackProcessor::tick()
    } else {
        for(int i = 0; i < frameCount; i++) {
            rightFrameBuffer[i].leftSample = 0.0f;
            rightFrameBuffer[i].rightSample = 0.0f;
        }
    }
    // MIX
    for(int i = 0; i < frameCount; i++ ) {
        leftFrameBuffer[i] = volume * (leftRightMix * leftFrameBuffer[i] + (1.0 - leftRightMix) * rightFrameBuffer[i]);
    }
}
// ################################ TrackProcessor Function ############################
void TrackProcessor::tick(Frame *frames, int frameNum) {
    if(bufferLoaded && playback) {
        for(int i = 0; i < frameNum; i++) {
            // read from buffer
            frames[i] = bufferPlayer->tick();
            // filter coeffs
            caltulateFilterCoeffs(lowCutoffFilter->tick(), highCutoffFilter->tick());
            // filter
            frames[i].leftSample = lpFilterL->tick(hpFilterL->tick(frames[i].leftSample));
            frames[i].rightSample = lpFilterR->tick(hpFilterR->tick(frames[i].rightSample));
        }
    } else {
        for(int i = 0; i < frameNum; i++) {
            frames[i] = Frame(0,0);
        }
    }
    // Effect 1, Equalizer
    if(effsActive[0]) {
        insEffProcessors[0]->tick(frames, frameNum);
    }
    // Effect 2, Reverb
    if(effsActive[1]) {
        insEffProcessors[1]->tick(frames, frameNum);
    }
    // Volume
    for(int i = 0; i < frameNum; i++) {
        frames[i].leftSample *= volume;
        frames[i].rightSample *= volume;
    }
}
// ################################ Equalizer ############################
void EqualizerProcessor::tick(Frame *frames, int frameNum) {
    if(active) {
        Frame lowCross;
        Frame highCross;
        for(int f = 0; f < frameNum; f++) {
            lowAmp = lowAmpFilter->tick();
            midAmp = midAmpFilter->tick();
            highAmp = highAmpFilter->tick();
            lowCross = highLPF->tick(frames[f]);
            highCross = highHPF->tick(frames[f]);
            frames[f] = lowAmp * lowLPF->tick(lowCross)
                      + midAmp * lowHPF->tick(lowCross)
                      + highAmp * lowAPF->tick(highCross);
        }
    }
}
// ################################ Reverb ############################
// This function just calls the stk::FreeVerb tick function for every frame
// The FreeVerb implementation can't really be optimised so I will take it as it is.
void ReverbProcessor::tick(Frame *frames, int frameNum) {
    if(active) {
        for(int i = 0; i < frameNum; i++) {
            frames[i].leftSample = reverb->tick(frames[i].leftSample, frames[i].rightSample);
            frames[i].rightSample = reverb->lastOut(1);
        }
    }
}
// ################################ Buffer Playback (BufferPlayer) ############################
Frame BufferPlayer::tick() {
    // adjust read position based on loop status
    if(inLoop) {
        while(readPos > loopEndPos) {
            readPos = loopStartPos + (readPos - loopEndPos);
        }
    }
    int x1 = readPos;
    float t = readPos - x1;
    Frame f = interpolate(buffer->frameAt(x1-1),
    readPos += stepSize;
    return f;
}
// interpolation:
Frame BufferPlayer::interpolate(Frame x0, Frame x1, Frame x2, Frame x3, float t) {
    Frame c0 = x1;
    Frame c1 = 0.5f * (x2 - x0);
    Frame c2 = x0 - (2.5f * x1) + (2.0f * x2) - (0.5f * x3);
    Frame c3 = (0.5f * (x3 - x0)) + (1.5f * (x1 - x2));
    return (((((c3 * t) + c2) * t) + c1) * t) + c0;
}
inline Frame BufferPlayer::frameAt(int pos) {
    if(pos < 0) {
        pos = 0;
    } else if (pos >= frames) {
        pos = frames -1;
    }
    // get chunk and relative Sample
    int chunk = pos/ChunkSize;
    int chunkSample = pos%ChunkSize;
    return Frame(leftChunks[chunk][chunkSample], rightChunks[chunk][chunkSample]);
}
Some suggestions on performance improvement:
Optimize Data Cache Usage
Review your functions that operate on a lot of data (e.g. arrays). The functions should load data into cache, operate on the data, then store back into memory.
The data should be organized to best fit into the data cache. Break up the data into smaller blocks if it doesn't fit. Search the web for "data-oriented design" and "cache optimizations".
In one project, performing data smoothing, I changed the layout of data and gained 70% performance.
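To make the layout idea concrete, here is a minimal sketch of one possible change: copying the interleaved frames into one contiguous array per channel, so that a per-channel stage walks memory front to back. The Frame struct mirrors the one in the question; PlanarBlock, deinterleave, applyGain and interleave are hypothetical names made up for this illustration, and whether this layout actually helps has to be measured on the real code.

```cpp
#include <cstddef>
#include <vector>

// Mirrors the Frame layout used in the question (two floats per frame).
struct Frame {
    float leftSample;
    float rightSample;
};

// Hypothetical planar ("structure of arrays") block:
// each channel lives in its own contiguous array.
struct PlanarBlock {
    std::vector<float> left;
    std::vector<float> right;
};

// Copy interleaved frames into two contiguous channel arrays.
// In a real audio callback the vectors would be preallocated once,
// not resized on every block.
PlanarBlock deinterleave(const Frame* frames, std::size_t frameNum) {
    PlanarBlock block;
    block.left.resize(frameNum);
    block.right.resize(frameNum);
    for (std::size_t i = 0; i < frameNum; ++i) {
        block.left[i] = frames[i].leftSample;
        block.right[i] = frames[i].rightSample;
    }
    return block;
}

// A per-channel stage: touches one contiguous array, front to back.
void applyGain(std::vector<float>& channel, float gain) {
    for (float& s : channel) {
        s *= gain;
    }
}

// Write the processed channels back into the interleaved frames.
void interleave(const PlanarBlock& block, Frame* frames) {
    for (std::size_t i = 0; i < block.left.size(); ++i) {
        frames[i].leftSample = block.left[i];
        frames[i].rightSample = block.right[i];
    }
}
```

The same idea applies to running each stage (playback, filter, effect) over a whole block before moving to the next stage, instead of pushing one frame through every stage in turn.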
Use Multiple Threads
In the big picture, you may be able to use at least three dedicated threads: input, processing and output. The input thread obtains the data and stores it in buffer(s); search the Web for "double buffering". The second thread gets data from the input buffer, processes it, then writes to an output buffer. The third thread writes data from the output buffer to the file.
You may also benefit from using threads for left and right samples. For example, while one thread is processing the left sample, another thread could be processing the right sample. If you could put the threads on different cores, you may see even more performance benefit.
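As a rough sketch of the left/right split, the following processes one channel on a worker thread while the calling thread handles the other. scaleChannel and processStereo are hypothetical names, and the gain stage merely stands in for real per-channel work. Starting a thread for every block is far too expensive for a real-time audio callback, so treat this only as an illustration of the split; a real design would keep a persistent worker and hand it blocks.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Stand-in for any per-channel processing stage.
void scaleChannel(std::vector<float>& channel, float gain) {
    for (std::size_t i = 0; i < channel.size(); ++i) {
        channel[i] *= gain;
    }
}

// Process the left channel on a worker thread while the calling
// thread processes the right channel, then wait for the worker.
// The two threads touch disjoint vectors, so no locking is needed.
void processStereo(std::vector<float>& left,
                   std::vector<float>& right,
                   float gain) {
    std::thread worker(scaleChannel, std::ref(left), gain);
    scaleChannel(right, gain);
    worker.join();
}
```

This only pays off when the per-channel work is heavy enough to outweigh the cost of synchronizing the threads.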
Use GPU Processing
Many modern Graphics Processing Units (GPUs) have a large number of cores that can process floating-point values. You might be able to delegate some of the filtering or analysis functions to those cores. Be aware that moving data to and from the GPU adds overhead; to see a benefit, the processing itself has to be expensive enough to outweigh that overhead.
Reduce Branching
Processors prefer manipulating data to branching. A branch can stall execution while the processor figures out where to fetch the next instruction from. Some processors have instruction caches large enough to hold small loops, but there is still a penalty for branching back to the top of the loop. See "loop unrolling". Also check your compiler's optimization settings and pick the highest level that optimizes for speed. Many compilers will unroll loops for you when the circumstances are right.
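Two small sketches of what reducing branches can look like. The first clamps an index with min/max instead of an if/else chain, which compilers can often turn into conditional moves (not guaranteed). The second is a volume loop unrolled by hand four samples at a time. clampIndex and applyVolumeUnrolled are hypothetical names for illustration; an optimizing compiler may already do both for you, so check the generated code or measure before keeping either.

```cpp
#include <algorithm>
#include <cstddef>

// Clamp pos into [0, frames - 1] without an explicit if/else chain.
// Assumes frames >= 1.
inline int clampIndex(int pos, int frames) {
    return std::max(0, std::min(pos, frames - 1));
}

// Apply a volume factor with the loop unrolled by four: one loop
// branch per four samples instead of one per sample.
void applyVolumeUnrolled(float* samples, std::size_t count, float volume) {
    std::size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        samples[i]     *= volume;
        samples[i + 1] *= volume;
        samples[i + 2] *= volume;
        samples[i + 3] *= volume;
    }
    // Handle the remaining 0 to 3 samples.
    for (; i < count; ++i) {
        samples[i] *= volume;
    }
}
```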
Reduce the Amount of Processing
Do you need to process the entire sample, or only portions of it? For example, in video processing, much of the frame doesn't change from one frame to the next; only small portions do, so the entire frame doesn't need to be processed. Can the audio channels be isolated so that only a few channels are processed rather than the entire spectrum?
Coding to Help the Compiler Optimize
You can help the compiler optimize by using the const modifier. The compiler may be able to generate different code for values that never change than for ones that do. For example, a const value known at compile time can be embedded directly in the executable code, whereas a non-const value generally has to be read from memory.
Combining static and const can help too. static usually implies that there is only one instance; const implies that it doesn't change. So if there is a single instance of a variable that never changes, the compiler can place it in the executable or in read-only memory and optimize the code that uses it more aggressively.
Loading several variables that sit next to each other in memory at the same time can help too. The processor can pull them into the cache together, and the compiler may be able to use specialized instructions for fetching sequential data.
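Here is a small sketch of the const/static idea applied to a chunk lookup like the one in frameAt. The value 4096 and the names kChunkSize, ChunkPos, locate and sumOfSquares are assumptions for illustration, not taken from the question. With a compile-time constant that is a power of two, the compiler can replace the division and modulo with a shift and a mask; with a runtime variable it has to emit a real division.

```cpp
#include <cstddef>

// Assumed chunk size: a compile-time constant and a power of two.
static constexpr std::size_t kChunkSize = 4096;

// Hypothetical result type: which chunk, and which sample inside it.
struct ChunkPos {
    std::size_t chunk;
    std::size_t sample;
};

// Because kChunkSize is a constant power of two and pos is unsigned,
// "/" and "%" can be compiled down to a shift and a bit mask.
inline ChunkPos locate(std::size_t pos) {
    return { pos / kChunkSize, pos % kChunkSize };
}

// const on the pointed-to data documents that the input is read-only
// here and lets the compiler reject accidental writes.
float sumOfSquares(const float* samples, std::size_t count) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        sum += samples[i] * samples[i];
    }
    return sum;
}
```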