Image to ASCII art conversion - c++

Prologue
This subject pops up on Stack Overflow from time to time, but the questions are usually removed for being poorly written. I have seen many such questions, followed by silence from the OP (usually low rep) when additional information is requested. From time to time, if the input is good enough, I decide to respond with an answer, and it usually gets a few up-votes per day while active, but then after a few weeks the question gets removed/deleted and everything starts from the beginning. So I decided to write this Q&A so that I can reference such questions directly, without rewriting the answer over and over again …
Another reason is this meta thread targeted at me, so if you have additional input, feel free to comment.
Question
How can I convert a bitmap image to ASCII art using C++?
Some constraints:
gray scale images
using mono-spaced fonts
keeping it simple (not using too advanced stuff for beginner level programmers)
Here is a related Wikipedia page on ASCII art (thanks to @RogerRowland).
Here is a similar maze-to-ASCII-art conversion Q&A.

There are more approaches for image to ASCII art conversion, mostly based on using mono-spaced fonts. For simplicity, I stick to the basics:
Pixel/area intensity based (shading)
This approach handles each pixel (or an area of pixels) as a single dot. The idea is to compute the average gray-scale intensity of this dot and then replace it with the character whose intensity is closest to the computed one. For that we need a list of usable characters, each with a precomputed intensity. Let's call it a character map. To choose more quickly which character is best for which intensity, there are two ways:
Linearly distributed intensity character map
So we use only characters whose intensities differ by the same step. In other words, when sorted ascending:
intensity_of(map[i])=intensity_of(map[i-1])+constant;
Also, when our character map is sorted, we can compute the character directly from the intensity (no search needed):
character = map[intensity_of(dot)/constant];
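In code, the direct lookup can be sketched like this (a minimal sketch; the example map string and the 8-bit intensity range are assumptions for illustration, not part of the answer above):

```cpp
#include <string>

// Example map: characters sorted ascending by intensity (dense glyphs first).
const std::string charMap = "#%xo;:,. ";

// Direct lookup: constant = 256 / charMap.size(), so index = intensity / constant.
char charFor(int intensity) // intensity in 0..255
{
    return charMap[(intensity * (int)charMap.size()) / 256];
}
```

With nine characters the step is about 28 intensity levels per character; `charFor(0)` returns `'#'` and `charFor(255)` returns `' '`.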
Arbitrary distributed intensity character map
So we have an array of usable characters and their intensities, and we need to find the intensity closest to intensity_of(dot). If we sort the map[], we can use binary search; otherwise we need an O(n) minimum-distance search loop or an O(1) dictionary. Sometimes, for simplicity, the character map[] can be handled as if it were linearly distributed, causing a slight gamma distortion, usually invisible in the result unless you know what to look for.
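For the sorted case, the binary search over (intensity, character) pairs might look like this (a sketch; the map values are made up for illustration):

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// (intensity, character) pairs sorted ascending by intensity (example values).
std::vector<std::pair<int, char>> charMap = {
    {0, ' '}, {40, '.'}, {90, ':'}, {150, 'o'}, {200, '%'}, {255, '#'}
};

// Binary search for the entry whose intensity is closest to i.
char closestChar(int i)
{
    // First entry with intensity >= i ('\0' sorts before any real character).
    auto it = std::lower_bound(charMap.begin(), charMap.end(),
                               std::make_pair(i, '\0'));
    if (it == charMap.end())   return charMap.back().second;
    if (it == charMap.begin()) return it->second;
    auto prev = it - 1;        // neighbor below i; pick the nearer of the two
    return (i - prev->first <= it->first - i) ? prev->second : it->second;
}
```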
Intensity-based conversion also works well for gray-scale images (not just black and white). If you select the dot as a single pixel, the result gets large (one pixel -> one character), so for larger images an area (a multiple of the font size) is selected instead, to preserve the aspect ratio and avoid enlarging too much.
How to do it:
Evenly divide the image into dots: single (gray-scale) pixels or (rectangular) areas
Compute the intensity of each pixel/area
Replace it with the character from the character map that has the closest intensity
For the character map you can use any characters, but the result gets better if the pixels of each character are dispersed evenly over the character area. For starters you can use:
char map[10]=" .,:;ox%##";
sorted descending and treated as linearly distributed.
So if the intensity of the pixel/area is i = <0-255>, then the replacement character will be
map[(255-i)*10/256];
If i==0 then the pixel/area is black, if i==127 then the pixel/area is gray, and if i==255 then the pixel/area is white. You can experiment with different characters inside map[] ...
Here is an ancient example of mine in C++ and VCL:
AnsiString m = " .,:;ox%##";
Graphics::TBitmap *bmp = new Graphics::TBitmap;
bmp->LoadFromFile("pic.bmp");
bmp->HandleType = bmDIB;
bmp->PixelFormat = pf24bit;

int x, y, i, l;
BYTE *p;
AnsiString s, endl;
endl = char(13); endl += char(10);
l = m.Length();
s = "";
for (y = 0; y < bmp->Height; y++)
{
    p = (BYTE*)bmp->ScanLine[y];
    for (x = 0; x < bmp->Width; x++)
    {
        i  = p[x + x + x + 0];  // B
        i += p[x + x + x + 1];  // G
        i += p[x + x + x + 2];  // R
        i  = (i * l) / 768;     // sum of B+G+R -> map index 0..l-1
        s += m[l - i];          // AnsiString is indexed from 1
    }
    s += endl;
}
mm_log->Lines->Text = s;
mm_log->Lines->SaveToFile("pic.txt");
delete bmp;
You need to replace/ignore the VCL stuff unless you use the Borland/Embarcadero environment.
mm_log is the memo where the text is output
bmp is the input bitmap
AnsiString is a VCL string type indexed from 1, not from 0 like char*!!!
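For reference, the same loop without any VCL dependencies could be sketched like this (the Image struct and its layout are assumptions for illustration; it expects an already-loaded 8-bit grayscale buffer):

```cpp
#include <string>
#include <vector>

// Hypothetical grayscale image: 8-bit intensities, row-major layout.
struct Image {
    int w, h;
    std::vector<unsigned char> pix; // size w*h, 0 = black, 255 = white
};

std::string toAscii(const Image &img)
{
    const std::string m = " .,:;ox%##"; // light characters first
    std::string out;
    for (int y = 0; y < img.h; y++) {
        for (int x = 0; x < img.w; x++) {
            int i = img.pix[y * img.w + x];               // 0..255
            out += m[((255 - i) * (int)m.size()) / 256];  // bright pixel -> sparse char
        }
        out += '\n';
    }
    return out;
}
```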
This is the result: Slightly NSFW intensity example image
On the left is the ASCII art output (font size 5 pixels), and on the right the input image zoomed a few times. As you can see, the output is larger (one pixel -> one character). If you use larger areas instead of pixels, the zoom is smaller, but of course the output is less visually pleasing. This approach is very easy and fast to code/process.
When you add more advanced things like:
automated map computations
automatic pixel/area size selection
aspect ratio corrections
Then you can process more complex images with better results:
Here is the result in a 1:1 ratio (zoom to see the characters):
Of course, for area sampling you lose the small details. This is an image of the same size as the first example sampled with areas:
Slightly NSFW intensity advanced example image
As you can see, this is more suited for bigger images.
Character fitting (hybrid between shading and solid ASCII art)
This approach tries to replace each area (no more single-pixel dots) with a character of similar intensity and shape. This leads to better results, even with bigger fonts, compared to the previous approach. On the other hand, this approach is of course a bit slower. There are more ways to do this, but the main idea is to compute the difference (distance) between an image area (dot) and a rendered character. You can start with a naive sum of absolute differences between pixels, but that will not lead to very good results, because even a one-pixel shift makes the distance big. Instead, you can use correlation or other metrics. The overall algorithm is almost the same as in the previous approach:
Evenly divide the image into (gray-scale) rectangular area dots
ideally with the same aspect ratio as the rendered font characters (this preserves the aspect ratio; do not forget that characters usually overlap a bit on the x-axis)
Compute the intensity of each area (dot)
Replace it by a character from the character map with the closest intensity/shape
How can we compute the distance between a character and a dot? That is the hardest part of this approach. While experimenting, I developed this compromise between speed, quality, and simplicity:
Divide the character area into zones
Compute a separate intensity for the left, right, up, down, and center zones of each character from your conversion alphabet (map).
Normalize all intensities so they are independent of the area size: i = (i*256)/(xs*ys).
Process the source image in rectangle areas
(with the same aspect ratio as the target font)
For each area, compute the intensity in the same manner as in bullet #1
Find the closest match from intensities in the conversion alphabet
Output the fitted character
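The per-zone distance used in the last two steps can be written portably like this (a sketch; the Zones struct is a made-up container for the five normalized zone intensities):

```cpp
#include <cstdlib>

// Five normalized zone intensities of one character cell (hypothetical struct).
struct Zones { int left, right, up, down, center; };

// Sum of absolute differences between an image area and a rendered character;
// the smallest distance over the alphabet picks the fitted character.
int zoneDistance(const Zones &a, const Zones &b)
{
    return std::abs(a.left   - b.left)
         + std::abs(a.right  - b.right)
         + std::abs(a.up     - b.up)
         + std::abs(a.down   - b.down)
         + std::abs(a.center - b.center);
}
```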
This is the result for a font size of 7 pixels:
As you can see, the output is visually pleasing, even with a bigger font size used (the previous approach example was with a 5 pixel font size). The output is roughly the same size as the input image (no zoom). The better results are achieved because the characters are closer to the original image, not only by intensity, but also by overall shape, and therefore you can use larger fonts and still preserve details (up to a point of course).
Here is the complete code for the VCL-based conversion application:
//---------------------------------------------------------------------------
#include <vcl.h>
#pragma hdrstop
#include "win_main.h"
//---------------------------------------------------------------------------
#pragma package(smart_init)
#pragma resource "*.dfm"
TForm1 *Form1;
Graphics::TBitmap *bmp=new Graphics::TBitmap;
//---------------------------------------------------------------------------
class intensity
{
public:
    char c;                     // Character
    int il, ir, iu, id, ic;     // Intensity of part: left, right, up, down, center
    intensity() { c = 0; reset(); }
    void reset() { il = 0; ir = 0; iu = 0; id = 0; ic = 0; }
    void compute(DWORD **p, int xs, int ys, int xx, int yy) // p source image, (xs,ys) area size, (xx,yy) area position
    {
        int x0 = xs >> 2, y0 = ys >> 2;
        int x1 = xs - x0, y1 = ys - y0;
        int x, y, i;
        reset();
        for (y = 0; y < ys; y++)
            for (x = 0; x < xs; x++)
            {
                i = (p[yy + y][xx + x] & 255);
                if (x <= x0) il += i;
                if (x >= x1) ir += i;
                if (y <= y0) iu += i;   // compare y against the y zone bounds
                if (y >= y1) id += i;
                if ((x >= x0) && (x <= x1) &&
                    (y >= y0) && (y <= y1))
                    ic += i;
            }
        // Normalize
        i = xs * ys;
        il = (il << 8) / i;
        ir = (ir << 8) / i;
        iu = (iu << 8) / i;
        id = (id << 8) / i;
        ic = (ic << 8) / i;
    }
};
//---------------------------------------------------------------------------
AnsiString bmp2txt_big(Graphics::TBitmap *bmp,TFont *font) // Character sized areas
{
int i, i0, d, d0;
int xs, ys, xf, yf, x, xx, y, yy;
DWORD **p = NULL,**q = NULL; // Bitmap direct pixel access
Graphics::TBitmap *tmp; // Temporary bitmap for single character
AnsiString txt = ""; // Output ASCII art text
AnsiString eol = "\r\n"; // End of line sequence
intensity map[97]; // Character map
intensity gfx;
// Input image size
xs = bmp->Width;
ys = bmp->Height;
// Output font size
xf = font->Size;   if (xf < 0) xf = -xf;
yf = font->Height; if (yf < 0) yf = -yf;
for (;;) // Loop to simplify the dynamic allocation error handling
{
// Allocate and initialise buffers
tmp = new Graphics::TBitmap;
if (tmp==NULL)
break;
// Allow 32 bit pixel access as DWORD/int pointer
tmp->HandleType = bmDIB; bmp->HandleType = bmDIB;
tmp->PixelFormat = pf32bit; bmp->PixelFormat = pf32bit;
// Copy target font properties to tmp
tmp->Canvas->Font->Assign(font);
tmp->SetSize(xf, yf);
tmp->Canvas->Font ->Color = clBlack;
tmp->Canvas->Pen ->Color = clWhite;
tmp->Canvas->Brush->Color = clWhite;
xf = tmp->Width;
yf = tmp->Height;
// Direct pixel access to bitmaps
p = new DWORD*[ys];
if (p == NULL) break;
for (y=0; y<ys; y++)
p[y] = (DWORD*)bmp->ScanLine[y];
q = new DWORD*[yf];
if (q == NULL) break;
for (y=0; y<yf; y++)
q[y] = (DWORD*)tmp->ScanLine[y];
// Create character map
for (x=0, d=32; d<128; d++, x++)
{
map[x].c = char(DWORD(d));
// Clear tmp
tmp->Canvas->FillRect(TRect(0, 0, xf, yf));
// Render tested character to tmp
tmp->Canvas->TextOutA(0, 0, map[x].c);
// Compute intensity
map[x].compute(q, xf, yf, 0, 0);
}
map[x].c = 0;
// Loop through the image by zoomed character size step
xf -= xf/3; // Characters are usually overlapping by 1/3
xs -= xs % xf;
ys -= ys % yf;
for (y=0; y<ys; y+=yf, txt += eol)
for (x=0; x<xs; x+=xf)
{
// Compute intensity
gfx.compute(p, xf, yf, x, y);
// Find the closest match in map[]
i0 = 0; d0 = -1;
for (i=0; map[i].c; i++)
{
d = abs(map[i].il-gfx.il) +
abs(map[i].ir-gfx.ir) +
abs(map[i].iu-gfx.iu) +
abs(map[i].id-gfx.id) +
abs(map[i].ic-gfx.ic);
if ((d0<0)||(d0>d)) {
d0=d; i0=i;
}
}
// Add fitted character to output
txt += map[i0].c;
}
break;
}
// Free buffers
if (tmp) delete tmp;
if (p) delete[] p;
if (q) delete[] q;
return txt;
}
//---------------------------------------------------------------------------
AnsiString bmp2txt_small(Graphics::TBitmap *bmp) // Pixel-sized areas
{
    AnsiString m = " `'.,:;i+o*%&$##"; // Constant character map
    int x, y, i, l;
    BYTE *p;
    AnsiString txt = "", eol = "\r\n";
    l = m.Length();
    bmp->HandleType = bmDIB;
    bmp->PixelFormat = pf32bit;
    for (y = 0; y < bmp->Height; y++)
    {
        p = (BYTE*)bmp->ScanLine[y];
        for (x = 0; x < bmp->Width; x++)
        {
            i  = p[(x << 2) + 0];
            i += p[(x << 2) + 1];
            i += p[(x << 2) + 2];
            i  = (i * l) / 768;
            txt += m[l - i];
        }
        txt += eol;
    }
    return txt;
}
//---------------------------------------------------------------------------
void update()
{
int x0, x1, y0, y1, i, l;
x0 = bmp->Width;
y0 = bmp->Height;
if ((x0<64)||(y0<64)) Form1->mm_txt->Text = bmp2txt_small(bmp);
else Form1->mm_txt->Text = bmp2txt_big (bmp, Form1->mm_txt->Font);
Form1->mm_txt->Lines->SaveToFile("pic.txt");
for (x1 = 0, i = 1, l = Form1->mm_txt->Text.Length(); i <= l; i++) if (Form1->mm_txt->Text[i] == 13) { x1 = i - 1; break; }
for (y1 = 0, i = 1, l = Form1->mm_txt->Text.Length(); i <= l; i++) if (Form1->mm_txt->Text[i] == 13) y1++;
x1 *= abs(Form1->mm_txt->Font->Size);
y1 *= abs(Form1->mm_txt->Font->Height);
if (y0 < y1) y0 = y1;
x0 += x1 + 48;
Form1->ClientWidth = x0;
Form1->ClientHeight = y0;
Form1->Caption = AnsiString().sprintf("Picture -> Text (Font %ix%i)", abs(Form1->mm_txt->Font->Size), abs(Form1->mm_txt->Font->Height));
}
//---------------------------------------------------------------------------
void draw()
{
Form1->ptb_gfx->Canvas->Draw(0, 0, bmp);
}
//---------------------------------------------------------------------------
void load(AnsiString name)
{
bmp->LoadFromFile(name);
bmp->HandleType = bmDIB;
bmp->PixelFormat = pf32bit;
Form1->ptb_gfx->Width = bmp->Width;
Form1->ClientHeight = bmp->Height;
Form1->ClientWidth = (bmp->Width << 1) + 32;
}
//---------------------------------------------------------------------------
__fastcall TForm1::TForm1(TComponent* Owner):TForm(Owner)
{
load("pic.bmp");
update();
}
//---------------------------------------------------------------------------
void __fastcall TForm1::FormDestroy(TObject *Sender)
{
delete bmp;
}
//---------------------------------------------------------------------------
void __fastcall TForm1::FormPaint(TObject *Sender)
{
draw();
}
//---------------------------------------------------------------------------
void __fastcall TForm1::FormMouseWheel(TObject *Sender, TShiftState Shift, int WheelDelta, TPoint &MousePos, bool &Handled)
{
int s = abs(mm_txt->Font->Size);
if (WheelDelta<0) s--;
if (WheelDelta>0) s++;
mm_txt->Font->Size = s;
update();
}
//---------------------------------------------------------------------------
It is a simple form application (Form1) with a single TMemo mm_txt in it. It loads the image "pic.bmp", then according to the resolution chooses which approach to use for the conversion to text, which is saved to "pic.txt" and sent to the memo for visualization.
For those without VCL, ignore the VCL stuff and replace AnsiString with any string type you have, and Graphics::TBitmap with any bitmap or image class you have at your disposal with pixel-access capability.
A very important note is that this uses the settings of mm_txt->Font, so make sure you set:
Font->Pitch = fpFixed
Font->Charset = OEM_CHARSET
Font->Name = "System"
to make this work properly; otherwise the font will not be handled as mono-spaced. The mouse wheel just changes the font size up/down so you can see the results at different font sizes.
[Notes]
See Word Portraits visualization
Use a language with bitmap/file access and text output capabilities
I strongly recommend starting with the first approach, as it is very easy and straightforward, and only then moving to the second (which can be done as a modification of the first, so most of the code stays as is anyway)
It is a good idea to compute with inverted intensity (black pixels have the maximum value), because the standard text preview is on a white background, which leads to much better results.
You can experiment with the size, count, and layout of the subdivision zones, or use a grid like 3x3 instead.
Comparison
Finally here is a comparison between the two approaches on the same input:
The images marked with a green dot are done with approach #2 and the red ones with #1, all with a six-pixel font size. As you can see, on the light bulb image the shape-sensitive approach is much better (even though #1 is done on a 2x-zoomed source image).
Cool application
While reading today's new questions, I got an idea for a cool application that grabs a selected region of the desktop, continuously feeds it to the ASCII art converter, and shows the result. After an hour of coding, it was done, and I am so satisfied with the result that I simply have to add it here.
OK, the application consists of just two windows. The first, master window is basically my old converter window without the image selection and preview (all the stuff above is in it). It has just the ASCII preview and conversion settings. The second window is an empty form with a transparent interior for selecting the grab area (no functionality whatsoever).
Now, on a timer, I grab the area selected by the selection form, pass it to the conversion, and preview the ASCII art.
So you enclose the area you want to convert with the selection window and view the result in the master window. It can be a game, a video player, etc. It looks like this:
So now I can even watch videos as ASCII art, for fun. Some look really nice :).
If you want to try to implement this in GLSL, take a look at this:
Convert floating-point numbers to decimal digits in GLSL?

Related

How can I make the faces of my cube smoothly transition between all colors of the rainbow?

I have a program in Visual Studio that is correctly rendering a 3D cube that is slowly spinning. I have a working FillTriangle() function that fills in the faces of the cube with any color whose hex code I enter as a parameter (for example, 0x00ae00ff for purple). I have set the color of each face to start at red (0xFF000000), and then I have a while loop in main() that updates the scene and draws new pixels every frame. I also have a Timer class that handles all sorts of time-related things, including the Update() method that updates things every frame. I want to make it so that the colors of the faces smoothly transition from one color to the next, through every color of the rainbow, and I want it to loop and do it as long as the program is running. Right now, it smoothly transitions between a few colors before suddenly jumping to another color. For example, it might smoothly transition from yellow to orange to red, but then suddenly jump to green. Here is the code that is doing that right now:
...
main()
{
...
float d = 0.0f; //float for the timer to increment
//screenPixels is the array of all pixels on the screen, numOfPixels is the number of pixels being displayed
while(Update(screenPixels, numOfPixels))
{
...
timer.Signal(); //change in time between the last 2 signals
d += timer.Delta(); //timer.Delta() is the average current time
if(d > (1.0f/30.0f)) // 1 divided by number of frames (float to avoid integer division)
{
//Reset timer
d = 0.0f;
//Add to current pixel color being displayed
pixelColor += 0x010101FF;
}
...
}
...
}
Is there a better way to approach this? Adding to the current pixel color was the first thing that came to my mind, and it's kind of working, but it keeps skipping colors for some reason.
That constant is going to overflow with each addition, not just as a whole number, but across each component of the color: R, G, and B.
You need to break your pixelColor into separate red, green, and blue components and do the math on each byte independently. Leave alpha fixed at 255 (fully opaque), and check for overflow/underflow along the way. When you reach an overflow or underflow, just change direction from incrementing to decrementing.
Also, I wouldn't increment each component by the same value (1) on each step. With the same increment on R, G, and B, you'd just be adding "more white" to the color. If you want a more natural rainbow loop, you can do something like the following:
Change this:
pixelColor += 0x010101FF;
To this:
// I'm assuming pixelColor is RGBA
int r = (pixelColor >> 24) & 0x0ff;
int g = (pixelColor >> 16) & 0x0ff;
int b = (pixelColor >> 8) & 0x0ff;
r = Increment(r, &redInc);
g = Increment(g, &greenInc);
b = Increment(b, &blueInc);
pixelColor = (r << 24) | (g << 16) | (b << 8) | 0x0ff;
Where redInc, greenInc, and blueInc are defined and initialized outside your main while loop as follows:
int redInc = -1;
int greenInc = 2;
int blueInc = 4;
And the increment function is something like this:
int Increment(int color, int* increment) {
    color += *increment;
    if (color < 0) {
        color = 0;
        *increment = (rand() % 4 + 1);
    } else if (color > 255) {
        color = 255;
        *increment = -(rand() % 4 + 1);
    }
    return color;
}
That should cycle through the colors in a more natural fashion (from darker to brighter to darker again) with a bit of randomness so it's never the same pattern twice. You can play with the randomness by adjusting the initial colorInc constants at initialization time as well as how the *increment value gets updated in the Increment function.
If you see any weird color flickering, it's quite possible that you have the alpha byte in the wrong position; it might be the high byte, not the low byte. Similarly, some systems order the colors in the integer as RGBA, others as ARGB, and quite possibly RGB is flipped to BGR.
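An alternative that avoids the overflow bookkeeping entirely is to cycle the hue in HSV space and convert to RGB every frame. This is a sketch of the standard HSV-to-RGB conversion (with saturation and value fixed at 1), not code from the answer above:

```cpp
#include <cmath>
#include <cstdint>

// hue in degrees [0, 360), full saturation and value -> packed RGBA pixel
uint32_t rainbow(float hue)
{
    float c = 1.0f;                                   // chroma (s = v = 1)
    float x = c * (1.0f - std::fabs(std::fmod(hue / 60.0f, 2.0f) - 1.0f));
    float r = 0, g = 0, b = 0;
    if      (hue <  60) { r = c; g = x; }
    else if (hue < 120) { r = x; g = c; }
    else if (hue < 180) { g = c; b = x; }
    else if (hue < 240) { g = x; b = c; }
    else if (hue < 300) { r = x; b = c; }
    else                { r = c; b = x; }
    return ((uint32_t)(r * 255) << 24) | ((uint32_t)(g * 255) << 16)
         | ((uint32_t)(b * 255) << 8)  | 0xFF;        // RGBA, alpha opaque
}
// Each frame: hue = fmod(hue + step, 360.0f); pixelColor = rainbow(hue);
```

Because the hue wraps around 360 degrees, the color loops through the full rainbow forever with no overflow or underflow checks.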

Fast, good quality pixel interpolation for extreme image downscaling

In my program, I am downscaling an image of 500px or larger to an extreme level of approx 16px-32px. The source image is user-specified so I do not have control over its size. As you can imagine, few pixel interpolations hold up and inevitably the result is heavily aliased.
I've tried bilinear, bicubic and square average sampling. The square average sampling actually provides the most decent results but the smaller it gets, the larger the sampling radius has to be. As a result, it gets quite slow - slower than the other interpolation methods.
I have also tried an adaptive square average sampling so that the smaller it gets the greater the sampling radius, while the closer it is to its original size, the smaller the sampling radius. However, it produces problems and I am not convinced this is the best approach.
So the question is: What is the recommended type of pixel interpolation that is fast and works well on such extreme levels of downscaling?
I do not wish to use a library so I will need something that I can code by hand and isn't too complex. I am working in C++ with VS 2012.
Here's some example code I've tried, as requested (hopefully without errors from my pseudo-code cut and paste). This performs a 7x7 average downscale, and although it gives a better result than bilinear or bicubic interpolation, it also takes quite a performance hit:
// Sizing control
ctl(0): "Resize",Range=(0,800),Val=100
// Variables
float fracx, fracy;
int Xnew, Ynew, p, q, Calc;
int x, y, z, p1, q1, i, j;
// New image dimensions
Xnew = image->width*ctl(0)/100;
Ynew = image->height*ctl(0)/100;
for (y = 0; y < image->height; y++){ // rows
    for (x = 0; x < image->width; x++){ // columns
        p1 = (int)x*image->width/Xnew;
        q1 = (int)y*image->height/Ynew;
        for (z = 0; z < 3; z++){ // channels
            Calc = 0; // reset the accumulator for each channel
            for (i = -3; i <= 3; i++) {
                for (j = -3; j <= 3; j++) {
                    Calc += (int)(src(p1-i, q1-j, z));
                } //j
            } //i
            Calc /= 49;
            pset(x, y, z, Calc);
        } // channels
    } // columns
} // rows
Thanks!
The first point is to use pointers to your data. Never use indexed access at every pixel. When you write src(p1-i,q1-j,z) or pset(x, y, z, Calc), how much computation is being done? Use pointers to the data and manipulate those.
Second: your algorithm is wrong. You don't want an average filter; you want to lay a grid over your source image and, for every grid cell, compute the average and put it in the corresponding pixel of the output image.
The specific solution should be tailored to your data representation, but it could be something like this:
std::vector<uint32_t> accum(Xnew);
std::vector<uint32_t> count(Xnew);
uint32_t *paccum, *pcount;
uint8_t* pin = /*pointer to input data*/;
uint8_t* pout = /*pointer to output data*/;
for (int dr = 0, sr = 0, w = image->width, h = image->height; sr < h; ++dr) {
memset(paccum = accum.data(), 0, Xnew*4);
memset(pcount = count.data(), 0, Xnew*4);
while (sr * Ynew / h == dr) {
paccum = accum.data();
pcount = count.data();
for (int dc = 0, sc = 0; sc < w; ++sc) {
*paccum += *pin;
*pcount += 1;
++pin;
if (sc * Xnew / w > dc) {
++dc;
++paccum;
++pcount;
}
}
sr++;
}
std::transform(begin(accum), end(accum), begin(count), pout, std::divides<uint32_t>());
pout += Xnew;
}
This was written using my own library (still in development) and it seems to work, but later I changed the variables names in order to make it simpler here, so I don't guarantee anything!
The idea is to have a local buffer of 32 bit ints which can hold the partial sum of all pixels in the rows which fall in a row of the output image. Then you divide by the cell count and save the output to the final image.
The first thing you should do is set up a performance evaluation system, so you can measure how much any change impacts performance.
As said previously, you should not use indexes but pointers, for a (probably) substantial speed-up, and you should not simply average, as a basic averaging of pixels is basically a blur filter.
I would highly advise you to rework your code to use "kernels": the matrix representing the weight of each pixel used. That way, you will be able to test different strategies and optimize for quality.
Example of kernels:
https://en.wikipedia.org/wiki/Kernel_(image_processing)
Upsampling/downsampling kernel:
http://www.johncostella.com/magic/
Note: the code as posted actually applies a 7x7 box kernel. The same kind of kernel in 3x3 form would be:
[1 1 1]
[1 1 1] * 1/9
[1 1 1]
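For illustration, applying such a normalized box kernel to an 8-bit grayscale buffer might look like this (a sketch; the row-major layout and border clamping are assumptions, not part of the answer above):

```cpp
#include <algorithm>
#include <vector>

// Apply a 3x3 box kernel (all weights 1/9) to an 8-bit grayscale image,
// clamping sample coordinates at the image borders.
std::vector<unsigned char> box3x3(const std::vector<unsigned char> &src,
                                  int w, int h)
{
    std::vector<unsigned char> dst(src.size());
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int sum = 0;
            for (int j = -1; j <= 1; j++)
                for (int i = -1; i <= 1; i++) {
                    int sx = std::min(std::max(x + i, 0), w - 1); // clamp column
                    int sy = std::min(std::max(y + j, 0), h - 1); // clamp row
                    sum += src[sy * w + sx];
                }
            dst[y * w + x] = (unsigned char)(sum / 9); // normalize by kernel weight
        }
    return dst;
}
```

Swapping in a different kernel only changes the weight table and the normalization divisor, which is exactly why the kernel formulation makes experimenting with quality so easy.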

Rogue line being drawn to window

I am making a graphing program in C++ using the SFML library. So far I have been able to draw a function to the screen. I have run into two problems along the way.
The first is a line which seems to return to the origin of the plane, starting from the end of my function.
You can see it in this image:
As you can see, this "rogue" line seems to change colour as it nears the origin. My first question is: what is this line, and how may I eradicate it from my window?
The second problem which is slightly unrelated and more mathematical can be seen in this image:
As you can see, the asymptotes (points where the graph is undefined or discontinuous) are being drawn. This leads me to my second question: is there a way (in code) to identify an asymptote and not draw it to the window?
My code for anything drawn to the window is:
VertexArray axis(Lines, 4);
VertexArray curve(PrimitiveType::LinesStrip, 1000);
axis[0].position = Vector2f(100000, 0);
axis[1].position = Vector2f(-100000, 0);
axis[2].position = Vector2f(0, -100000);
axis[3].position = Vector2f(0, 100000);
float x;
for (x = -pi; x < pi; x += .0005f)
{
curve.append(Vertex(Vector2f(x, -tan(x)), Color::Green));
}
I would very much appreciate any input : )
Update:
Thanks to the input of numerous people this code seems to work fine in fixing the asymptote problem:
for (x = -30*pi; x < 30*pi; x += .0005f)
{
x0 = x1; y0 = y1;
x1 = x; y1 = -1/sin(x);
a = 0;
a = fabs(atan2(y1 - y0, x1 - x0));
if (a > .499f*pi)
{
curve.append(Vertex(Vector2f(x1, y1), Color::Transparent));
}
else
{
curve.append(Vertex(Vector2f(x1, y1), Color::Green));
}
}
Update 2:
The following code gets rid of the rogue line:
VertexArray curve(Lines, 1000);
float x,y;
for (x = -30 * pi; x < 30 * pi; x += .0005f)
{
y = -asin(x);
curve.append(Vertex(Vector2f(x, y)));
}
for (x = -30 * pi + .0005f; x < 30 * pi; x += .0005f)
{
y = -asin(x);
curve.append(Vertex(Vector2f(x, y)));
}
The first problem looks like wrong polyline/curve handling. I don't know what API you are using for rendering, but some, like GDI, need the pen position to be set properly first. For example, if you draw like this:
Canvas->LineTo(x[0],y[0]);
Canvas->LineTo(x[1],y[1]);
Canvas->LineTo(x[2],y[2]);
Canvas->LineTo(x[3],y[3]);
...
Then you should do this instead:
Canvas->MoveTo(x[0],y[0]);
Canvas->LineTo(x[1],y[1]);
Canvas->LineTo(x[2],y[2]);
Canvas->LineTo(x[3],y[3]);
...
If your API needs a MoveTo command and you are not issuing it, then the last position is used (or the default (0,0)), which will connect the start of your curve with a straight line from the last drawn or default pen position.
Second problem
For continuous data you can threshold the asymptotes or discontinuities by checking consecutive y values. If your curve render looks like this:
Canvas->MoveTo(x[0],y[0]);
for (i=1;i<n;i++) Canvas->LineTo(x[i],y[i]);
Then you can change it to something like this:
y0 = y[0] + 2*threshold;
for (i = 0; i < n; i++)
{
    if (y[i] - y0 >= threshold) Canvas->MoveTo(x[i], y[i]);
    else Canvas->LineTo(x[i], y[i]);
    y0 = y[i];
}
The problem is the selection of the threshold, because it depends on the x density of the sampled points and on the first derivative of your y data by x (the slope angles).
If you are stacking up more functions, the curve append will create your unwanted line... instead, handle each function as a separate draw, or put a MoveTo command between them.
[Edit1]
I see it like this (fake split):
double x0, y0, x1, y1, a;
int e;
for (e = 1, x = -pi; x < pi; x += .0005f)
{
    // last point
    x0 = x1; y0 = y1;
    // your actual point
    x1 = x; y1 = -tan(x);
    // test discontinuity
    if (e) { a = 0; e = 0; } else a = fabs(atan2(y1 - y0, x1 - x0));
    if (a > 0.499*M_PI) curve.append(Vertex(Vector2f(x1, y1), Color::Black));
    else curve.append(Vertex(Vector2f(x1, y1), Color::Green));
}
The 0.499*M_PI is your threshold; the closer it is to 0.5*M_PI, the bigger the jumps it detects... I faked the curve split with black color (the background); it will create gaps at axis intersections (unless transparency is used)... but there is no need for a list of curves.
Those artifacts are due to the way sf::PrimitiveType::LinesStrip works (or more specific lines strips in general).
In your second example, visualizing y = -tan(x), you're jumping from positive infinity to negative infinity, which is the line you're seeing. You can't get rid of this, unless you're using a different primitive type or splitting your rendering into multiple draw calls.
Imagine a line strip as one long thread you're pinning with pushpins (representing your vertices). There's no (safe) way to go from positive infinity to negative infinity without those artifacts. Of course you could move outside the visible area, but then again that's really specific to this one function.

Plot an audio waveform in C++/Qt

I have a university assignment which consists of displaying the waveform of an audio file using C++/Qt. We should be able to modify the scale that we use to display it (expressed in audio samples per screen pixel).
So far, I am able to:
open the audio file
read the samples
plot the samples at a given scale
To plot the samples at a given scale, I have tried two strategies. Let's assume that N is the value of the scale:
for i going from 0 to the width of my window, plot the (i * N)th audio sample at screen pixel i. This is very fast and constant in time, because we always access the same number of audio data points.
However, it does not represent the waveform correctly, as we use the value of only 1 point to represent N points.
for i going from 0 to N * width, plot the ith audio sample at the screen position i / (N * width) and let Qt figure out how to represent that correctly on physical screen pixels.
That plots very beautiful waveforms, but it takes a hell of a lot of time to access the data. For instance, if I want to display 500 samples per pixel and the width of my window is 100px, I have to access 50 000 points, which are then plotted by Qt as 100 physical points (pixels).
So, how can I get a correct plot of my audio data, which can be calculated fast? Should I calculate the average of N samples for each physical pixel? Should I do some curve fitting?
In other words, what kind of operation is involved when Qt/Matplotlib/Matlab/etc plot thousands of data point to a very limited amount of physical pixels?
Since I do know how to do it and I have already asked something similar on Stack Overflow, I'll reference this. I'll provide code later.
Drawing waveforms is a real problem. I tried to figure this out for more than half a year!
To sum this up:
According to the Audacity Documentation:
The waveform view uses two shades of blue, one darker and one lighter.
The dark blue part of the waveform displays the tallest peak in the area that pixel represents. At default zoom level Audacity will display many samples within that pixel width, so this pixel represents the value of the loudest sample in the group.
The light blue part of the waveform displays the average RMS (Root Mean Square) value for the same group of samples. This is a rough guide to how loud this area might sound, but there is no way to extract or use this RMS part of the waveform separately.
So you simply try to get the important information out of a chunk of data. If you do this over and over you'll have multiple stages which can be used for drawing.
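A minimal sketch of that reduction step (the `Peak` struct and the names are my own; the chunk size would come from your zoom level):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// One display column summarises a chunk of samples by its extremes
// (Audacity's dark blue) and its RMS (Audacity's light blue).
struct Peak {
    float min, max, rms;
};

// Reduce `samples` into one Peak per `chunkSize` samples.
std::vector<Peak> reduce(const std::vector<float>& samples, std::size_t chunkSize)
{
    std::vector<Peak> peaks;
    for (std::size_t i = 0; i < samples.size(); i += chunkSize) {
        Peak p{samples[i], samples[i], 0.0f};
        double sumSquares = 0.0;
        const std::size_t end = std::min(samples.size(), i + chunkSize);
        for (std::size_t j = i; j < end; ++j) {
            p.min = std::min(p.min, samples[j]);
            p.max = std::max(p.max, samples[j]);
            sumSquares += double(samples[j]) * samples[j];
        }
        p.rms = float(std::sqrt(sumSquares / double(end - i)));
        peaks.push_back(p);
    }
    return peaks;
}
```

Running this once per zoom level gives you the "multiple stages" mentioned above, so drawing only ever touches one Peak per pixel.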
I'll provide some code here; please bear with me, it's in development:
template<typename T>
class CacheHandler {
public:
    std::vector<T> data;
    vector2d<T> min, max, rms;
    int sampleSizeInBits;

    CacheHandler(std::vector<T>& data, int sampleSizeInBits) throw(std::exception);

    void addData(std::vector<T>& samples);

    /*
     * Irreversibly removes data.
     * Fails if the end index is greater than the data length.
     */
    void removeData(int endIndex);
    void removeData(int startIndex, int endIndex);
};
using this:
template<typename T>
inline WaveformPane::CacheHandler<T>::CacheHandler(std::vector<T>& data, int sampleSizeInBits) throw(std::exception)
{
    this->data = data;
    this->sampleSizeInBits = sampleSizeInBits;
    int N = log(data.size()) / log(2);
    rms.resize(N); min.resize(N); max.resize(N);
    rms[0] = calcRMSSegments(data, 2);
    min[0] = getMinPitchSegments(data, 2);
    max[0] = getMaxPitchSegments(data, 2);
    for (int i = 1; i < N; i++) {
        rms[i] = calcRMSSegments(rms[i - 1], 2);
        min[i] = getMinPitchSegments(min[i - 1], 2);
        max[i] = getMaxPitchSegments(max[i - 1], 2);
    }
}
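The helpers above (calcRMSSegments and friends) aren't shown. As a rough guess at what such a segment reduction could look like (my own signature, returning a plain vector rather than the vector2d used above):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// A guess at the missing helper: collapse `input` into segments of
// `segmentSize` elements, keeping the RMS of each segment. The min/max
// variants would be identical except for the per-segment operation.
template<typename T>
std::vector<T> calcRMSSegments(const std::vector<T>& input, std::size_t segmentSize)
{
    std::vector<T> out;
    for (std::size_t i = 0; i < input.size(); i += segmentSize) {
        double sumSq = 0.0;
        const std::size_t end = std::min(input.size(), i + segmentSize);
        for (std::size_t j = i; j < end; ++j)
            sumSq += double(input[j]) * input[j];
        out.push_back(T(std::sqrt(sumSq / double(end - i))));
    }
    return out;
}
```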
What I'd suggest is something like this:
Given totalNumSamples audio samples in your audio file, and widgetWidth pixels of width in your display widget, you can calculate which samples are to be represented by each pixel:
// Given an x value (in pixels), returns the appropriate corresponding
// offset into the audio-samples array that represents the
// first sample that should be included in that pixel.
int GetFirstSampleIndexForPixel(int x, int widgetWidth, int totalNumSamples)
{
    return (totalNumSamples*x)/widgetWidth;
}

virtual void paintEvent(QPaintEvent * e)
{
    QPainter p(this);
    for (int x=0; x<widgetWidth; x++)
    {
        const int firstSampleIndexForPixel = GetFirstSampleIndexForPixel(x, widgetWidth, totalNumSamples);
        const int lastSampleIndexForPixel  = GetFirstSampleIndexForPixel(x+1, widgetWidth, totalNumSamples)-1;
        const int largestSampleValueForPixel  = GetMaximumSampleValueInRange(firstSampleIndexForPixel, lastSampleIndexForPixel);
        const int smallestSampleValueForPixel = GetMinimumSampleValueInRange(firstSampleIndexForPixel, lastSampleIndexForPixel);

        // Draw a vertical line spanning all sample values that are contained in this pixel
        p.drawLine(x, GetYValueForSampleValue(largestSampleValueForPixel), x, GetYValueForSampleValue(smallestSampleValueForPixel));
    }
}
Note that I didn't include source code for GetMinimumSampleValueInRange(), GetMaximumSampleValueInRange(), or GetYValueForSampleValue(), since hopefully what they do is obvious from their names, but if not, let me know and I can explain them.
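For the curious, here is a rough sketch of what those helpers might look like. Everything here is an assumption on my part: a global `samples` vector of decoded 16-bit values, and an extra widgetHeight parameter on GetYValueForSampleValue to keep the snippet self-contained:

```cpp
#include <algorithm>
#include <vector>

static std::vector<int> samples;  // assumed to hold the decoded audio

int GetMaximumSampleValueInRange(int first, int last)
{
    int maxVal = samples[first];
    for (int i = first + 1; i <= last; ++i)
        maxVal = std::max(maxVal, samples[i]);
    return maxVal;
}

int GetMinimumSampleValueInRange(int first, int last)
{
    int minVal = samples[first];
    for (int i = first + 1; i <= last; ++i)
        minVal = std::min(minVal, samples[i]);
    return minVal;
}

// Map a sample value (assumed 16-bit signed, -32768..32767) to a y pixel,
// with the loudest positive sample at the top of the widget.
int GetYValueForSampleValue(int sampleValue, int widgetHeight)
{
    return ((32767 - sampleValue) * (widgetHeight - 1)) / 65535;
}
```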
Once you have the above working reasonably well (i.e. drawing a waveform that shows the entire file into your widget), you can start working on adding in zoom-and-pan functionality. Horizontal zoom can be implemented by modifying the behavior of GetFirstSampleIndexForPixel(), e.g.:
int GetFirstSampleIndexForPixel(int x, int widgetWidth, int sampleIndexAtLeftEdgeOfWidget, int sampleIndexAfterRightEdgeOfWidget)
{
    int numSamplesToDisplay = sampleIndexAfterRightEdgeOfWidget-sampleIndexAtLeftEdgeOfWidget;
    return sampleIndexAtLeftEdgeOfWidget+((numSamplesToDisplay*x)/widgetWidth);
}
With that, you can zoom/pan simply by passing in different values for sampleIndexAtLeftEdgeOfWidget and sampleIndexAfterRightEdgeOfWidget that together indicate the subrange of the file you want to display.
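For example, a cursor-centered zoom could update those two values like this (a sketch; the names and the clamping policy are mine):

```cpp
// Zoom in or out around pixel x by `factor` (>1 zooms in), updating the
// visible sample range [left, right). Clamping the result to the file's
// bounds is left to the caller.
void zoomAround(int x, int widgetWidth, double factor, long long& left, long long& right)
{
    const long long pivot = left + ((right - left) * x) / widgetWidth; // sample under the cursor
    long long newSpan = (long long)((right - left) / factor);
    if (newSpan < 1) newSpan = 1;
    // Keep the pivot sample at the same on-screen position.
    left  = pivot - (newSpan * x) / widgetWidth;
    right = left + newSpan;
}
```

Panning is the same idea without changing the span: add the same sample offset to both endpoints.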

Vertically flipping a char array: is there a more efficient way?

Let's start with some code:
QByteArray OpenGLWidget::modifyImage(QByteArray imageArray, const int width, const int height){
    if (vertFlip){
        /* Each pixel consists of four unsigned chars: Red Green Blue Alpha.
         * The field is normally 640*480, which means that the whole picture is in fact 640*4 uChars wide.
         * The whole ByteArray is one-dimensional, so index 640*4 is the red of the first pixel of the second row.
         * This function is EXTREMELY SLOW.
         */
        QByteArray tempArray = imageArray;
        for (int h = 0; h < height; ++h){
            for (int w = 0; w < width/2; ++w){
                for (int i = 0; i < 4; ++i){
                    imageArray.data()[h*width*4 + 4*w + i] = tempArray.data()[h*width*4 + 4*(width - 1 - w) + i];
                    imageArray.data()[h*width*4 + 4*(width - 1 - w) + i] = tempArray.data()[h*width*4 + 4*w + i];
                }
            }
        }
    }
    return imageArray;
}
This is the code I use right now to vertically flip an image which is 640*480 (The image is actually not guaranteed to be 640*480, but it mostly is). The color encoding is RGBA, which means that the total array size is 640*480*4. I get the images with 30 FPS, and I want to show them on the screen with the same FPS.
On an older CPU (Athlon X2) this code is just too much: the CPU is racing to keep up with the 30 FPS, so the question is: can I do this more efficiently?
I am also working with OpenGL. Does that have a gimmick I am not aware of that can flip images with relatively low CPU/GPU usage?
According to this question, you can flip an image in OpenGL by scaling it by (1,-1,1). This question explains how to do transformations and scaling.
You can improve on it at least by doing it blockwise, making use of the cache architecture. In your example, one of the accesses (either the read or the write) will be off-cache.
For a start it can help to "capture scanlines" if you're using two loops to loop through the pixels of an image, like so:
for (int y = 0; y < height; ++y)
{
    // Capture scanline.
    char* scanline = imageArray.data() + y*width*4;
    for (int x = 0; x < width/2; ++x)
    {
        const int flipped_x = width - x - 1;
        for (int i = 0; i < 4; ++i)
            std::swap(scanline[x*4 + i], scanline[flipped_x*4 + i]);
    }
}
Another thing to note is that I used swap instead of a temporary image. That'll tend to be more efficient since you can just swap using registers instead of loading pixels from a copy of the entire image.
But it also generally helps to use a 32-bit integer instead of working one byte at a time if you're going to be doing anything like this. If you're working with pixels with 8-bit types but know that each pixel is 32 bits, as in your case, you can generally get away with a cast to uint32_t*, e.g.:
for (int y = 0; y < height; ++y)
{
    uint32_t* scanline = (uint32_t*)imageArray.data() + y*width;
    std::reverse(scanline, scanline + width);
}
At this point you might parallelize the y loop. Flipping an image horizontally (it should be "horizontal" if I understood your original code correctly) in this way is a little bit tricky with the access patterns, but you should be able to get quite a decent boost using the above techniques.
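The per-scanline std::reverse idea can be wrapped up and sanity-checked on a tiny image (a sketch; flipHorizontal is my own name):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Horizontally flip a width x height image of 32-bit RGBA pixels in place,
// one scanline at a time. Each pixel moves as a whole uint32_t, so the
// R/G/B/A bytes stay in order within the pixel.
void flipHorizontal(std::vector<uint32_t>& pixels, int width, int height)
{
    for (int y = 0; y < height; ++y) {
        uint32_t* scanline = pixels.data() + (std::size_t)y * width;
        std::reverse(scanline, scanline + width);
    }
}
```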
I am also working with OpenGL. Does that have a gimmick I am not aware of that can flip images with relatively low CPU/GPU usage?
Naturally the fastest way to flip images is to not touch their pixels at all and just save the flipping for the final part of the pipeline when you render the result. For this you might render a texture in OGL with negative scaling instead of modifying the pixels of a texture.
Another thing that's really useful in video and image processing is to represent an image to process like this for all your image operations:
struct Image32
{
    uint32_t* pixels;
    int32_t width;
    int32_t height;
    int32_t x_stride;
    int32_t y_stride;
};
The stride fields are what you use to get from one scanline (row) of an image to the next vertically and from one column to the next horizontally. When you use this representation, you can use negative values for the strides and offset the pixels pointer accordingly. You can also use the stride fields to, say, render only every other scanline of an image for fast interactive half-res scanline previews by using y_stride=width*2 and height/=2. You can quarter-res an image by setting the x stride to 2 and the y stride to 2*width and then halving the width and height. You can render a cropped image without making your blit functions accept a boatload of parameters by just modifying these fields, keeping the y stride at the original width to get from one row of the cropped section of the image to the next:
// Using the stride representation of Image32, this can now
// blit a cropped source, a horizontally flipped source,
// a vertically flipped source, a source flipped both ways,
// a half-res source, a quarter-res source, a quarter-res
// source that is horizontally flipped and cropped, etc.,
// and all without modifying the source image in advance
// or having to accept all kinds of extra drawing parameters.
void blit(int dst_x, int dst_y, Image32 dst, Image32 src);

// We don't have to do things like this (and I think I lost
// some capabilities with this version below, but it hurts my
// brain too much to think about what capabilities were lost):
void blit_gross(int dst_x, int dst_y, int dst_w, int dst_h, uint32_t* dst,
                int src_x, int src_y, int src_w, int src_h,
                const uint32_t* src, bool flip_x, bool flip_y);
By using negative stride values and passing the image to an operation (e.g., a blit operation), the result will naturally be flipped without having to actually flip the image. It'll end up being "drawn flipped", so to speak, just as with the case of using OGL with a negative scaling transformation matrix.
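To make the stride idea concrete, here is a small sketch (the pixelAt and flippedVertically helpers are my own names, and the struct is repeated so the snippet stands alone) showing how a negative y stride plus an adjusted pixels pointer gives a vertically flipped view with no copying:

```cpp
#include <cstdint>

struct Image32
{
    uint32_t* pixels;
    int32_t width;
    int32_t height;
    int32_t x_stride;  // pixels to step one column to the right
    int32_t y_stride;  // pixels to step one row down
};

// Address a pixel through the strides instead of assuming row-major layout.
inline uint32_t& pixelAt(const Image32& img, int x, int y)
{
    return img.pixels[(int64_t)y * img.y_stride + (int64_t)x * img.x_stride];
}

// A vertically flipped view of the same storage: point at the start of the
// last row and walk upward with a negative y stride. No pixels are copied.
inline Image32 flippedVertically(Image32 img)
{
    img.pixels += (int64_t)(img.height - 1) * img.y_stride;
    img.y_stride = -img.y_stride;
    return img;
}
```

Any blit that reads its source through pixelAt then draws the flipped view for free.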