Change Perlin noise algorithm to work with continuous procedural generation - c++

Right now I have a Perlin noise function to which I pass a buffer of seeds and a second buffer that the function fills with noise values. I am using this to procedurally generate the heights of the vertices in a terrain. The problem is that the terrain is limited to the size of the buffer, but I want to continuously generate chunks that are consistent with each other, and I don't see how to do that with the function I am using. Here is the code for the algorithm. Is there anything I can change to make it work?
inline void perlInNoise2D(int nWidth,int nHeight, float *Seed, int nOctaves, float fBias, float *fOutput)
{
    for(int x = 0; x < nWidth; x++)
    {
        for(int y = 0; y < nHeight; y++)
        {
            float fNoise = 0.0f;
            float fScale = 1.0f;
            float fScaleAccum = 0.0f;
            for(int o = 0; o < nOctaves;o++)
            {
                int nPitch = nWidth >> o;
                int sampleX1 = (x / nPitch) * nPitch;
                int sampleY1 = (y / nPitch) * nPitch;
                int sampleX2 = (sampleX1 + nPitch) % nWidth;
                int sampleY2 = (sampleY1 + nPitch) % nWidth;
                float fBlendX = (float)(x - sampleX1) / (float) nPitch;
                float fBlendY = (float)(y - sampleY1) / (float) nPitch;
                float fSampleT = (1.0f - fBlendX) * Seed[sampleY1 * nWidth + sampleX1] + fBlendX * Seed[sampleY1 * nWidth + sampleX2];
                float fSampleB = (1.0f - fBlendX) * Seed[sampleY2 * nWidth + sampleX1] + fBlendX * Seed[sampleY2 * nWidth + sampleX2];
                fNoise += (fBlendY * (fSampleB - fSampleT) + fSampleT) * fScale;
                fScaleAccum += fScale;
                fScale = fScale / fBias;
            }
            fOutput[(y * nWidth) + x] = fNoise / fScaleAccum;
        }
    }
}

Presumably this is tied in to a "map reveal" mechanism?
A common technique is to generate overlapping chunks and average them together. As a simple example, you generate chunks of 2*nWidth by 2*nHeight, offset from each other by nWidth and nHeight. You then have 4 overlapping chunks at any XY position. At the edge of the map there is a strip where not all chunks have been generated yet. When this part of the map needs to be revealed, you generate those chunks on the fly. This moves the edge outwards.
The averaging process already smooths out the boundary effects. You can make this more effective by fading out each individual chunk near its edges. Since the chunk edges do not coincide, the fade regions of different chunks do not coincide either. A simple triangle window could be sufficient (i.e. the weight is 1 in the middle, 0 at the edge, and linear in between), but you could also use a Gaussian or any other function that peaks in the middle and gradually falls off towards the chunk edge.
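As a rough sketch of the overlap-and-blend idea above (not the poster's code): the snippet assumes square chunks of side 2*stride that start every stride units, so four chunks cover each point, and it weights each chunk with a triangle window. The names triangleWindow, ChunkSampler and blendedHeight are hypothetical, and the sampler callback stands in for whatever returns a generated chunk's noise value at a local position.

```cpp
#include <cmath>
#include <functional>

// Triangle window over a chunk of side `size`: 0 at the edges, 1 in the middle.
inline float triangleWindow(float t, float size)
{
    return 1.0f - std::fabs(2.0f * t / size - 1.0f);
}

// Hypothetical callback: noise value of chunk (cx, cy) at local position (lx, ly).
using ChunkSampler = std::function<float(int cx, int cy, float lx, float ly)>;

// Assumption: chunk (cx, cy) starts at (cx * stride, cy * stride) and has side 2 * stride.
inline float blendedHeight(float x, float y, float stride, const ChunkSampler& sample)
{
    const float size = 2.0f * stride;
    const int cx0 = static_cast<int>(std::floor(x / stride));
    const int cy0 = static_cast<int>(std::floor(y / stride));
    float sum = 0.0f;
    float weightSum = 0.0f;
    // The point lies in chunks cx0-1 and cx0 horizontally, cy0-1 and cy0 vertically.
    for (int dy = -1; dy <= 0; ++dy) {
        for (int dx = -1; dx <= 0; ++dx) {
            const int cx = cx0 + dx;
            const int cy = cy0 + dy;
            const float lx = x - cx * stride;  // local position inside this chunk
            const float ly = y - cy * stride;
            const float w = triangleWindow(lx, size) * triangleWindow(ly, size);
            sum += w * sample(cx, cy, lx, ly);
            weightSum += w;
        }
    }
    return weightSum > 0.0f ? sum / weightSum : 0.0f;
}
```

With this spacing the four triangle weights add up to 1 at every point, so the division by weightSum mostly guards against rounding. Each chunk would still need its own seed data derived from its chunk coordinates so that regenerating a chunk gives the same values.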

Related

Optimizing raster algorithm in OpenCL 32seconds for a cube on nVidia RTX 3080

I'm new to OpenCL. I wrote an OpenCL software rasterizer to rasterize triangles. Rendering a cube currently takes 32 seconds, which is far too long. I'm testing on an nVidia RTX 3080 laptop.
The result looks very weird and it's too slow.
Here is the kernel:
__kernel void fragment_shader(__global struct Fragment* fragments, __global struct Triangle_* triangles, int triCount)
{
    size_t px = get_global_id(0); // triCount
    //size_t py = get_global_id(1); // triCount
    int imageWidth = 256;
    int imageHeight = 256;
    if(px < triCount)
    {
        float3 v0Raster = (float3)(triangles[px].v[0].pos[0], triangles[px].v[0].pos[1], triangles[px].v[0].pos[2]);
        float3 v1Raster = (float3)(triangles[px].v[1].pos[0], triangles[px].v[1].pos[1], triangles[px].v[1].pos[2]);
        float3 v2Raster = (float3)(triangles[px].v[2].pos[0], triangles[px].v[2].pos[1], triangles[px].v[2].pos[2]);
        float xmin = min3(v0Raster.x, v1Raster.x, v2Raster.x);
        float ymin = min3(v0Raster.y, v1Raster.y, v2Raster.y);
        float xmax = max3(v0Raster.x, v1Raster.x, v2Raster.x);
        float ymax = max3(v0Raster.y, v1Raster.y, v2Raster.y);
        float slope = (ymax - ymin) / (xmax - xmin);
        // be careful xmin/xmax/ymin/ymax can be negative. Don't cast to uint32_t
        unsigned int x0 = max((uint)0, (uint)(floor(xmin)));
        unsigned int x1 = min((uint)(imageWidth) - 1, (uint)(floor(xmax)));
        unsigned int y0 = max((uint)0, (uint)(floor(ymin)));
        unsigned int y1 = min((uint)(imageHeight) - 1, (uint)(floor(ymax)));
        float3 v0 = v0Raster;
        float3 v1 = v1Raster;
        float3 v2 = v2Raster;
        float area = edgeFunction(v0Raster, v1Raster, v2Raster);
        for (unsigned int y = y0; y <= y1; ++y) {
            for (unsigned int x = x0; x <= x1; ++x) {
                float3 p = { x + 0.5f, y + 0.5f, 0 };
                float w0 = edgeFunction(v1Raster, v2Raster, p);
                float w1 = edgeFunction(v2Raster, v0Raster, p);
                float w2 = edgeFunction(v0Raster, v1Raster, p);
                if (w0 >= 0 && w1 >= 0 && w2 >= 0) {
                    fragments[y * 256 + x].col[0] = 1.0f;
                    fragments[y * 256 + x].col[1] = 0;
                    fragments[y * 256 + x].col[2] = 0;
                }
            }
        }
    }
}
The kernel is supposed to run once per triangle; it does a bounding-box test and rasterizes the covered pixels.
Here is how I invoke it:
global_size[0] = triCount-1;
auto time_start = std::chrono::high_resolution_clock::now();
err = clEnqueueNDRangeKernel(commandQueue, kernel_fragmentShader, 1, NULL, global_size,
    NULL, 0, NULL, NULL);
if (err < 0) {
    perror("Couldn't enqueue the kernel_fragmentShader");
    exit(1);
}
I tried omitting lighting and everything else, but it still takes around 20 seconds to render a cube.
This kind of approach is well suited to massively parallel rendering, such as on a GPU. I assume you are doing this on the CPU side, so performance is poor: you have little or no parallelization and little or no hardware support for the operations used. On a GPU you get SIMD instructions for most of what is needed, and a lot is done in hardware instead of in code.
To gain speed on the CPU side, see "how to rasterize rotated rectangle"; this was the standard way of software rendering back in the days before GPUs. The method simply renders the edges of a convex polygon (or triangle) as lines into 2 buffers (start and end points per horizontal line) and then just fills or interpolates the horizontal lines. This uses far fewer operations per pixel.
Your method runs a point-in-triangle test for every pixel of the bounding box, which means many more pixels are processed, and each pixel needs several fairly expensive operations, which kills performance.
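A minimal CPU-side sketch of the two-buffer scanline method described above, under a few assumptions: the image is a row-major 8-bit buffer, pixel centers sit at (x + 0.5, y + 0.5), and the triangle coordinates are of reasonable magnitude. Point2, scanEdge and fillTriangle are hypothetical names, not part of the poster's code.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <utility>
#include <vector>

struct Point2 { float x, y; };

// Record where edge a-b crosses each row of pixel centers, widening that row's span.
static void scanEdge(Point2 a, Point2 b, std::vector<float>& left, std::vector<float>& right)
{
    if (a.y == b.y) return;  // horizontal edges add no crossings
    if (a.y > b.y) std::swap(a, b);
    const int height = static_cast<int>(left.size());
    // Rows whose center y + 0.5 lies in [a.y, b.y).
    const int yStart = std::max(0, static_cast<int>(std::ceil(a.y - 0.5f)));
    const int yEnd = std::min(height - 1, static_cast<int>(std::ceil(b.y - 0.5f)) - 1);
    const float dxdy = (b.x - a.x) / (b.y - a.y);
    for (int y = yStart; y <= yEnd; ++y) {
        const float x = a.x + (y + 0.5f - a.y) * dxdy;
        left[y] = std::min(left[y], x);
        right[y] = std::max(right[y], x);
    }
}

void fillTriangle(std::vector<unsigned char>& image, int width, int height,
                  Point2 v0, Point2 v1, Point2 v2, unsigned char value)
{
    // Start/end buffers: one entry per horizontal line.
    std::vector<float> left(height, std::numeric_limits<float>::max());
    std::vector<float> right(height, std::numeric_limits<float>::lowest());
    scanEdge(v0, v1, left, right);
    scanEdge(v1, v2, left, right);
    scanEdge(v2, v0, left, right);
    for (int y = 0; y < height; ++y) {
        if (left[y] > right[y]) continue;  // row not touched by the triangle
        // Pixels whose center x + 0.5 lies in [left, right).
        const int xStart = std::max(0, static_cast<int>(std::ceil(left[y] - 0.5f)));
        const int xEnd = std::min(width - 1, static_cast<int>(std::ceil(right[y] - 0.5f)) - 1);
        for (int x = xStart; x <= xEnd; ++x)
            image[y * width + x] = value;
    }
}
```

The per-pixel work in the inner loop is a single store; all the edge math happens once per row per edge.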
On top of this, your code is not optimized. For example:
fragments[y * 256 + x].col[0] = 1.0f;
fragments[y * 256 + x].col[1] = 0;
fragments[y * 256 + x].col[2] = 0;
Why are you computing y * 256 + x 3 times? I would also feel better with (y<<8)+x, but nowadays compilers should do that for you. You can also just add 256 to the starting address for each new row instead of multiplying.
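For illustration, the index computation can be hoisted like this. It is only a sketch in plain C++: the Fragment layout (three floats named col) is assumed from the kernel in the question, and fillSpanRed is a hypothetical helper.

```cpp
// Assumed layout, mirroring fragments[...].col[0..2] in the question.
struct Fragment { float col[3]; };

// Write red into pixels x0..x1 of row y, computing the row address once.
inline void fillSpanRed(Fragment* fragments, int imageWidth, int y, int x0, int x1)
{
    Fragment* row = fragments + y * imageWidth;  // one multiply per row
    for (int x = x0; x <= x1; ++x) {
        Fragment& f = row[x];                    // one index per pixel
        f.col[0] = 1.0f;
        f.col[1] = 0.0f;
        f.col[2] = 0.0f;
    }
}
```

An optimizing compiler will usually do this common-subexpression elimination on its own, so treat it as a readability point more than a guaranteed speedup.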
I do not code in OpenCL (IIRC it's for computer vision and DIP, not for rendering), so I hope you have direct access to fragments[] and not some constrained access with additional tests, which hurts performance a lot (similar to putpixel, setpixel, pixel[][], etc. on most gfx APIs, which can kill performance by up to 10000x).

how to implement a c++ function which creates a swirl on an image

imageData = new double*[imageHeight];
for(int i = 0; i < imageHeight; i++) {
    imageData[i] = new double[imageWidth];
    for(int j = 0; j < imageWidth; j++) {
        // compute the distance and angle from the swirl center:
        double pixelX = (double)i - swirlCenterX;
        double pixelY = (double)j - swirlCenterY;
        double pixelDistance = pow(pow(pixelX, 2) + pow(pixelY, 2), 0.5);
        double pixelAngle = atan2(pixelX, pixelY);
        // double swirlAmount = 1.0 - (pixelDistance/swirlRadius);
        // if(swirlAmount > 0.0) {
        // double twistAngle = swirlTwists * swirlAmount * PI * 2.0;
        double twistAngle = swirlTwists * pixelDistance * PI * 2.0;
        // adjust the pixel angle and compute the adjusted pixel co-ordinates:
        pixelAngle += twistAngle;
        pixelX = cos(pixelAngle) * pixelDistance;
        pixelY = sin(pixelAngle) * pixelDistance;
        // }
        (this)->setPixel(i, j, tempMatrix[(int)(swirlCenterX + pixelX)][(int)(swirlCenterY + pixelY)]);
    }
}
I am trying to implement a C++ function (code above), based on the following pseudo-code, that is supposed to create a swirl on an image, but I have some continuity problems at the borders.
The function I have at the moment is able to apply the swirl to a disk of a given size and to deform it almost as I wished, but its influence doesn't decrease as we get close to the border. I tried multiplying the angle of rotation by a factor of 1 - (r/R) (with r the distance between the current pixel and the center of the swirl, and R the radius of the swirl), but this doesn't give the effect I hoped for.
Moreover, I noticed that at some parts of the border a thin white line appears (which means that the values of the pixels there are equal to 1), and I can't exactly explain why.
Maybe some of the problems I have are linked to the C++ standard function atan2.
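A hedged sketch of the 1 - (r/R) falloff described above, written as a stand-alone coordinate mapping; it is not a confirmed fix. SourcePos, swirlSource and clampIndex are hypothetical names. Two assumptions are worth stating: the sketch pairs atan2(dy, dx) with cos for x and sin for y (the posted code calls atan2(pixelX, pixelY) and then uses cos for X, which swaps the two coordinates even when the twist is zero), and it clamps source indices because reading outside the image is one possible cause of stray border pixels.

```cpp
#include <algorithm>
#include <cmath>

struct SourcePos { double x, y; };

// For destination pixel (px, py), return the source position to read from.
// The twist is strongest at the center and fades linearly to zero at `radius`.
inline SourcePos swirlSource(double px, double py, double cx, double cy,
                             double radius, double twists)
{
    const double dx = px - cx;
    const double dy = py - cy;
    const double r = std::hypot(dx, dy);
    if (r >= radius) return {px, py};  // outside the swirl: unchanged
    const double kTwoPi = 6.283185307179586;
    const double amount = 1.0 - r / radius;
    const double angle = std::atan2(dy, dx) + twists * amount * kTwoPi;
    return {cx + std::cos(angle) * r, cy + std::sin(angle) * r};
}

// Clamp a continuous coordinate to a valid index in [0, size - 1].
inline int clampIndex(double v, int size)
{
    const int i = static_cast<int>(std::floor(v));
    return std::min(std::max(i, 0), size - 1);
}
```

Because the mapping only rotates around the center, the source position always has the same distance from the center as the destination pixel, and the mapping becomes the identity exactly at the swirl radius.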

Character recognition from an image C++

*Note: while this post is pretty much asking about bilinear interpolation, I kept the title more general and included extra information in case someone has any ideas on how I can do this better.
I have been having trouble implementing a way to identify letters from an image in order to create a word search solving program. For mainly educational but also portability purposes, I have been attempting this without the use of a library. It can be assumed that the image the characters are picked from contains nothing but the puzzle. Although this page only recognizes a small set of characters, I have been using it to guide my efforts, along with this one as well.
As the article suggested, I have an image of each letter scaled down to 5x5 to compare each unknown letter to. I have had the best success by scaling the unknown down to 5x5 using bilinear resampling and summing the squares of the differences in intensity of each corresponding pixel in the known and unknown images. To attempt to get more accurate results, I also added the square of the difference in width:height ratios and in white:black pixel ratios of the top half and bottom half of each image. The known image with the lowest "difference score" against the unknown image is then considered the unknown letter. The problem is that this seems to have only about 50% accuracy.
To improve this I tried using larger samples (15x15 instead of 5x5), but this proved even less effective. I also tried going through the known and unknown images looking for features and shapes, and determining a match based on two images having about the same number of the same features. For example, shapes like the following were identified and counted up (where ■ represents a black pixel). This proved less effective than the original method.
■ ■ ■ ■
■ ■
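The pixel-difference score described above can be sketched as follows. This covers only the sum-of-squared-differences part on 5x5 intensities, not the extra ratio terms; Glyph, KnownGlyph, squaredDifference and closestLetter are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <limits>
#include <vector>

using Glyph = std::array<float, 25>;  // 5x5 intensities, row-major

// Sum of squared per-pixel intensity differences; lower means more similar.
inline float squaredDifference(const Glyph& a, const Glyph& b)
{
    float score = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        score += d * d;
    }
    return score;
}

struct KnownGlyph {
    char letter;
    Glyph pixels;
};

// Return the letter of the known sample with the lowest score, or '?' if there are none.
inline char closestLetter(const Glyph& unknown, const std::vector<KnownGlyph>& known)
{
    char best = '?';
    float bestScore = std::numeric_limits<float>::max();
    for (const KnownGlyph& k : known) {
        const float score = squaredDifference(unknown, k.pixels);
        if (score < bestScore) {
            bestScore = score;
            best = k.letter;
        }
    }
    return best;
}
```

Keeping several known samples per letter in the same vector works unchanged, since the function only looks for the single lowest score.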
So here is an example: the following image gets loaded:
The program then converts it to monochrome by determining if each pixel has an intensity above or below the average intensity of an 11x11 square using a summed area table, fixes the skew and picks out the letters by identifying an area of relatively equal spacing. I then use the intersecting horizontal and vertical spaces to get a general idea of where each character is. Next I make sure that the entire letter is contained in each square picked out by going line by line, above, below, left and right of the original square until the square's border detects no dark pixels on it.
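The local-average thresholding step mentioned above could look roughly like this. It is a sketch under assumptions: an 8-bit row-major grayscale input, a window clipped at the image borders, and "dark" meaning strictly below the window mean. A radius of 5 gives the 11x11 square; buildSummedAreaTable and thresholdByLocalMean are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

// Summed-area table padded with a zero row and column:
// sat[(y + 1) * (width + 1) + (x + 1)] holds the sum of gray over rows 0..y, columns 0..x.
inline std::vector<long long> buildSummedAreaTable(const std::vector<unsigned char>& gray,
                                                   int width, int height)
{
    const int w1 = width + 1;
    std::vector<long long> sat(static_cast<std::size_t>(w1) * (height + 1), 0);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            sat[(y + 1) * w1 + (x + 1)] = gray[y * width + x]
                + sat[y * w1 + (x + 1)]
                + sat[(y + 1) * w1 + x]
                - sat[y * w1 + x];
    return sat;
}

// Returns 1 for pixels darker than the mean of their (2 * radius + 1)^2 window, else 0.
inline std::vector<unsigned char> thresholdByLocalMean(const std::vector<unsigned char>& gray,
                                                       int width, int height, int radius)
{
    const std::vector<long long> sat = buildSummedAreaTable(gray, width, height);
    const int w1 = width + 1;
    std::vector<unsigned char> dark(gray.size(), 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const int x0 = std::max(0, x - radius), x1 = std::min(width - 1, x + radius);
            const int y0 = std::max(0, y - radius), y1 = std::min(height - 1, y + radius);
            // Window sum from four table lookups.
            const long long sum = sat[(y1 + 1) * w1 + (x1 + 1)] - sat[y0 * w1 + (x1 + 1)]
                                - sat[(y1 + 1) * w1 + x0] + sat[y0 * w1 + x0];
            const long long count = static_cast<long long>(x1 - x0 + 1) * (y1 - y0 + 1);
            // Compare pixel * count with the sum to avoid a division.
            dark[y * width + x] = (gray[y * width + x] * count < sum) ? 1 : 0;
        }
    }
    return dark;
}
```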
Then I take each letter, resample it and compare it to the known images.
*Note: the known samples use Arial font size 12, rescaled in Photoshop to 5x5 using bilinear interpolation.
Here is an example of a successful match. The following letter is picked out: [image] It is scaled down to: [image] which looks like [image] from afar. This is successfully matched to the known N sample: [image]
Here is a failed match: [image] is picked out and scaled down to: [image] which, to no real surprise, does not match the known R sample.
I changed how images are picked out, so that the letter is not cut off as you can see in the above images so I believe the issue comes from scaling the images down. Currently I am using bilinear interpolation to resample the image. To understand how exactly this works with downsampling I referred to the second answer in this post and came up with the following code. Previously I have tested that this code works (at least to a "this looks ok" point) so it could be a combination of factors causing problems.
void Image::scaleTo(int width, int height)
{
int originalWidth = this->width;
int originalHeight = this->height;
Image * originalData = new Image(this->width, this->height, 0, 0);
for (int i = 0; i < this->width * this->height; i++) {
int x = i % this->width;
int y = i / this->width;
originalData->setPixel(x, y, this->getPixel(x, y));
}
this->resize(width, height); //simply resizes the image, after the resize it is just a black bmp.
double factorX = (double)originalWidth / width;
double factorY = (double)originalHeight / height;
float * xCenters = new float[originalWidth]; //the following stores the "centers" of each pixel.
float * yCenters = new float[originalHeight];
float * newXCenters = new float[width];
float * newYCenters = new float[height];
//1 represents one of the originally sized pixel's side length
for (int i = 0; i < originalWidth; i++)
xCenters[i] = i + 0.5;
for (int i = 0; i < width; i++)
newXCenters[i] = (factorX * i) + (factorX / 2.0);
for (int i = 0; i < height; i++)
newYCenters[i] = (factorY * i) + (factorY / 2.0);
for (int i = 0; i < originalHeight; i++)
yCenters[i] = i + 0.5;
/* p[0] p[1]
p
p[2] p[3] */
//the following will find the closest points to the sampled pixel that still remain in this order
for (int x = 0; x < width; x++) {
for (int y = 0; y < height; y++) {
POINT p[4]; //POINT used is the Win32 struct POINT
float pDists[4] = { FLT_MAX, FLT_MAX, FLT_MAX, FLT_MAX };
float xDists[4];
float yDists[4];
for (int i = 0; i < originalWidth; i++) {
for (int j = 0; j < originalHeight; j++) {
float xDist = abs(xCenters[i] - newXCenters[x]);
float yDist = abs(yCenters[j] - newYCenters[y]);
float dist = sqrt(xDist * xDist + yDist * yDist);
if (xCenters[i] < newXCenters[x] && yCenters[j] < newYCenters[y] && dist < pDists[0]) {
p[0] = { i, j };
pDists[0] = dist;
xDists[0] = xDist;
yDists[0] = yDist;
}
else if (xCenters[i] > newXCenters[x] && yCenters[j] < newYCenters[y] && dist < pDists[1]) {
p[1] = { i, j };
pDists[1] = dist;
xDists[1] = xDist;
yDists[1] = yDist;
}
else if (xCenters[i] < newXCenters[x] && yCenters[j] > newYCenters[y] && dist < pDists[2]) {
p[2] = { i, j };
pDists[2] = dist;
xDists[2] = xDist;
yDists[2] = yDist;
}
else if (xCenters[i] > newXCenters[x] && yCenters[j] > newYCenters[y] && dist < pDists[3]) {
p[3] = { i, j };
pDists[3] = dist;
xDists[3] = xDist;
yDists[3] = yDist;
}
}
}
//channel is a typedef for unsigned char
//getOPixel(point) is a macro for originalData->getPixel(point.x, point.y)
float r1 = (xDists[3] / (xDists[2] + xDists[3])) * getOPixel(p[2]).r + (xDists[2] / (xDists[2] + xDists[3])) * getOPixel(p[3]).r;
float r2 = (xDists[1] / (xDists[0] + xDists[1])) * getOPixel(p[0]).r + (xDists[0] / (xDists[0] + xDists[1])) * getOPixel(p[1]).r;
float interpolated = (yDists[0] / (yDists[0] + yDists[3])) * r1 + (yDists[3] / (yDists[0] + yDists[3])) * r2;
channel r = (channel)round(interpolated);
r1 = (xDists[3] / (xDists[2] + xDists[3])) * getOPixel(p[2]).g + (xDists[2] / (xDists[2] + xDists[3])) * getOPixel(p[3]).g; //yDist[3]
r2 = (xDists[1] / (xDists[0] + xDists[1])) * getOPixel(p[0]).g + (xDists[0] / (xDists[0] + xDists[1])) * getOPixel(p[1]).g; //yDist[0]
interpolated = (yDists[0] / (yDists[0] + yDists[3])) * r1 + (yDists[3] / (yDists[0] + yDists[3])) * r2;
channel g = (channel)round(interpolated);
r1 = (xDists[3] / (xDists[2] + xDists[3])) * getOPixel(p[2]).b + (xDists[2] / (xDists[2] + xDists[3])) * getOPixel(p[3]).b; //yDist[3]
r2 = (xDists[1] / (xDists[0] + xDists[1])) * getOPixel(p[0]).b + (xDists[0] / (xDists[0] + xDists[1])) * getOPixel(p[1]).b; //yDist[0]
interpolated = (yDists[0] / (yDists[0] + yDists[3])) * r1 + (yDists[3] / (yDists[0] + yDists[3])) * r2;
channel b = (channel)round(interpolated);
this->setPixel(x, y, { r, g, b });
}
}
delete[] xCenters;
delete[] yCenters;
delete[] newXCenters;
delete[] newYCenters;
delete originalData;
}
I have the utmost respect for anyone even remotely willing to sift through this to try and help. Any and all suggestions will be greatly appreciated.
UPDATE:
As suggested, I started augmenting the known data set with scaled-down letters from word searches. This greatly improved accuracy, from about 50% to 70% (percentages calculated from a very small sample size, so take the numbers lightly). Basically I'm using the original set of chars as a base (this original set was actually the most accurate of the sets I've tried, e.g. a set calculated using the same resampling algorithm, a set using a different font, etc.), and I am manually adding knowns to that set: I manually assign the first 20 or so images picked out in a search their corresponding letter and save them into the known set folder. I am still choosing the closest match out of the entire known set. Would this still be a good method, or should some kind of change be made?
I also implemented a feature where, if a letter is about a 90% match with a known letter, I assume the match is correct and add the current "unknown" to the list of knowns. I could see this going both ways: it could either (a) make the program more accurate over time or (b) solidify the original guess and possibly make the program less accurate over time. I have not actually noticed this cause a change (either for the better or for the worse). Am I on the right track with this? I'm not going to call this solved until I get the accuracy a little higher and test the program on more examples.

Perlin Noise getting wrong values in Y axis (C++)

Issue
I'm trying to implement the Perlin Noise algorithm in 2D with a single octave with a size of 16x16. I'm using this as heightmap data for a terrain, however it only seems to work in one axis. Whenever the sample point moves to a new Y section in the Perlin Noise grid, the gradient is very different from what I expect (for example, it often flips from 0.98 to -0.97, which is a very sudden change).
This image shows the staggered terrain in the z direction (which is the y axis in the 2D Perlin Noise grid)
Code
I've put the code that calculates which sample point to use at the end since it's quite long and I believe it's not where the issue is, but essentially I scale down the terrain to match the Perlin Noise grid (16x16) and then sample through all the points.
Gradient At Point
The code that calculates the gradient at a sample point is the following:
// Find the gradient at a certain sample point
float PerlinNoise::gradientAt(Vector2 point)
{
    // Decimal part of float
    float relativeX = point.x - (int)point.x;
    float relativeY = point.y - (int)point.y;
    Vector2 relativePoint = Vector2(relativeX, relativeY);
    vector<float> weights(4);
    // Find the weights of the 4 surrounding points
    weights = surroundingWeights(point);
    float fadeX = fadeFunction(relativePoint.x);
    float fadeY = fadeFunction(relativePoint.y);
    float lerpA = MathUtils::lerp(weights[0], weights[1], fadeX);
    float lerpB = MathUtils::lerp(weights[2], weights[3], fadeX);
    float lerpC = MathUtils::lerp(lerpA, lerpB, fadeY);
    return lerpC;
}
Surrounding Weights of Point
I believe the issue is somewhere here, in the function that calculates the weights for the 4 surrounding points of a sample point, but I can't seem to figure out what is wrong since all the values seem sensible in the function when stepping through it.
// Find the surrounding weight of a point
vector<float> PerlinNoise::surroundingWeights(Vector2 point){
    // Produces correct values
    vector<Vector2> surroundingPoints = surroundingPointsOf(point);
    vector<float> weights;
    for (unsigned i = 0; i < surroundingPoints.size(); ++i) {
        // The corner to the sample point
        Vector2 cornerToPoint = surroundingPoints[i].toVector(point);
        // Getting the seeded vector from the grid
        float x = surroundingPoints[i].x;
        float y = surroundingPoints[i].y;
        Vector2 seededVector = baseGrid[x][y];
        // Dot product between the seededVector and corner to the sample point vector
        float dotProduct = cornerToPoint.dot(seededVector);
        weights.push_back(dotProduct);
    }
    return weights;
}
OpenGL Setup and Sample Point
Setting up the heightmap and getting the sample point. The variables 'wrongA' and 'wrongB' are an example of where the gradient flips and changes suddenly.
void HeightMap::GenerateRandomTerrain() {
int perlinGridSize = 16;
PerlinNoise perlin_noise = PerlinNoise(perlinGridSize, perlinGridSize);
numVertices = RAW_WIDTH * RAW_HEIGHT;
numIndices = (RAW_WIDTH - 1) * (RAW_HEIGHT - 1) * 6;
vertices = new Vector3[numVertices];
textureCoords = new Vector2[numVertices];
indices = new GLuint[numIndices];
float perlinScale = RAW_HEIGHT/ (float) (perlinGridSize -1);
float height = 50;
float wrongA = perlin_noise.gradientAt(Vector2(0, 68.0f / perlinScale));
float wrongB = perlin_noise.gradientAt(Vector2(0, 69.0f / perlinScale));
for (int x = 0; x < RAW_WIDTH; ++x) {
for (int z = 0; z < RAW_HEIGHT; ++z) {
int offset = (x* RAW_WIDTH) + z;
float xVal = (float)x / perlinScale;
float yVal = (float)z / perlinScale;
float noise = perlin_noise.gradientAt(Vector2( xVal , yVal));
vertices[offset] = Vector3(x * HEIGHTMAP_X, noise * height, z * HEIGHTMAP_Z);
textureCoords[offset] = Vector2(x * HEIGHTMAP_TEX_X, z * HEIGHTMAP_TEX_Z);
}
}
numIndices = 0;
for (int x = 0; x < RAW_WIDTH - 1; ++x) {
for (int z = 0; z < RAW_HEIGHT - 1; ++z) {
int a = (x * (RAW_WIDTH)) + z;
int b = ((x + 1)* (RAW_WIDTH)) + z;
int c = ((x + 1)* (RAW_WIDTH)) + (z + 1);
int d = (x * (RAW_WIDTH)) + (z + 1);
indices[numIndices++] = c;
indices[numIndices++] = b;
indices[numIndices++] = a;
indices[numIndices++] = a;
indices[numIndices++] = d;
indices[numIndices++] = c;
}
}
BufferData();
}
It turned out the issue was in the interpolation stage:
float lerpA = MathUtils::lerp(weights[0], weights[1], fadeX);
float lerpB = MathUtils::lerp(weights[2], weights[3], fadeX);
float lerpC = MathUtils::lerp(lerpA, lerpB, fadeY);
I had the interpolation in the y axis the wrong way around, so it should have been:
lerp(lerpB, lerpA, fadeY)
Instead of:
lerp(lerpA, lerpB, fadeY)
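To make the ordering issue concrete, here is a small sketch of bilinear interpolation with the corner order spelled out. The corner naming is an assumption for illustration only: the order returned by surroundingPointsOf is not shown in the question, so which pair belongs to the lower y row in that code cannot be read off from here. lerp and bilerp below are stand-alone hypothetical helpers, not MathUtils::lerp.

```cpp
// Linear interpolation: returns a at t = 0 and b at t = 1.
inline float lerp(float a, float b, float t)
{
    return a + t * (b - a);
}

// Assumed corner naming: wXY is the weight at corner (x0 + X, y0 + Y).
// The row passed first to the final lerp must be the row at y0 (ty = 0).
inline float bilerp(float w00, float w10, float w01, float w11, float tx, float ty)
{
    const float rowY0 = lerp(w00, w10, tx);  // along x at y0
    const float rowY1 = lerp(w01, w11, tx);  // along x at y0 + 1
    return lerp(rowY0, rowY1, ty);
}
```

If the two rows are passed in the opposite order, the result at ty = 0 comes from the y0 + 1 corners, so the value jumps whenever the sample crosses into the next grid cell in y, which matches the sudden flips described in the question.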

How to rotate the contents of 16x16 bitmap array using only maths (no scaling, just clip the corners off)

Another question from me about bitmaps! A quick intro: I'm working on a university project where I have no external libraries, only basic Windows/C++, so this bitmap rotation must be done entirely by modifying pixels in an array.
I have a 16x16 bitmap (it's just a COLORREF array of 16x16 elements) and I want to rotate it about the centre point (or any point, actually).
I have some code that almost works; it rotates the bitmap about the top-left corner, so I know I'm close. I just don't know what to edit to offset that by 8 pixels, as everything I can think of results in overflowing the 16x16 area.
Here's the code I currently have (which I grabbed from DrDobbs and modified a bit; it had a scaling parameter (the (1.0) parts) which I didn't need).
void Sprite::DrawAt(Render* render, int x, int y, double angle)
{
    COLORREF* tmp = new COLORREF[width * height];
    int u, v;
    for (int i = 0; i<height; i++)
    {
        for (int j = 0; j<width; j++)
        {
            u = cos(-angle) * j * (1.0) + sin(-angle) * i * (1.0);
            v = -sin(-angle) * j * (1.0) + cos(-angle) * i * (1.0);
            tmp[(i * width) + j] = bitmap[(v * width) + u];
        }
    }
    // x-(width/2) renders it at the centre point instead of the top-left
    render->BlockShiftBitmap(tmp, x - (width/2), y - (height/2), width, height, -1);
    delete[] tmp;
}
(Excuse some of the bad coding habits here, I'm only interested in the topic at hand, everything else will get cleaned up another time).
That code results in this:
http://puu.sh/hp4nB/8279cd83dd.gif
It rotates around the top-left corner, and it also grabs out of bounds memory too. I could do with a solution that rotates around the centre (or any point, that would come in handy later on for things such as doors!) and also clips off the corners and ensures no random bits of memory end up in the resulting bitmap.
The result should hopefully look something like this with the black pixels turned white:
http://puu.sh/hp4uc/594dca91da.gif
(don't ask what the hell that creature is! he's some kind of red-eared debug-lizard)
Thanks, you awesome people here have helped quite a bit on this little project of mine!
Could you try subtracting 8 from the i's and j's?
u = cos(-angle) * (j-8) * (1.0) + sin(-angle) * (i-8) * (1.0);
v = -sin(-angle) * (j-8) * (1.0) + cos(-angle) * (i-8) * (1.0);
To rotate around an origin (ox, oy), first subtract these coordinates, then rotate, and then add them again.
// Choose the center as the origin
int ox = width / 2;
int oy = height / 2;
// Rotate around the origin by angle
u = cos(-angle) * (j-ox) + sin(-angle) * (i-oy) + ox;
v = -sin(-angle) * (j-ox) + cos(-angle) * (i-oy) + oy;
Then, add a bounds check before accessing your image, and use a replacement color for the "background" in case the coordinates are not within the bounds:
if (u >= 0 && u < width && v >= 0 && v < height)
    tmp[(i * width) + j] = bitmap[(v * width) + u];
else
    tmp[(i * width) + j] = 0; // However you represent white...
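Putting the two steps together, a self-contained sketch might look like the following. It is an illustration under assumptions, not the original Sprite code: pixels are stored as std::uint32_t rather than COLORREF, the background value is passed in by the caller, rotateBitmap is a hypothetical name, and source coordinates are rounded to the nearest pixel instead of truncated as in the snippets above.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Rotate a row-major bitmap by `angle` radians around (ox, oy).
// Destination pixels whose source falls outside the bitmap get `background`.
inline std::vector<std::uint32_t> rotateBitmap(const std::vector<std::uint32_t>& src,
                                               int width, int height, double angle,
                                               double ox, double oy,
                                               std::uint32_t background)
{
    std::vector<std::uint32_t> dst(src.size(), background);
    const double c = std::cos(-angle);
    const double s = std::sin(-angle);
    for (int i = 0; i < height; ++i) {
        for (int j = 0; j < width; ++j) {
            // Translate to the origin, rotate, translate back.
            const double dx = j - ox;
            const double dy = i - oy;
            const int u = static_cast<int>(std::lround(c * dx + s * dy + ox));
            const int v = static_cast<int>(std::lround(-s * dx + c * dy + oy));
            // Bounds check: never read outside the source bitmap.
            if (u >= 0 && u < width && v >= 0 && v < height)
                dst[i * width + j] = src[v * width + u];
        }
    }
    return dst;
}
```

For a 16x16 sprite, passing (width - 1) / 2.0 = 7.5 as the origin rotates around the exact middle of the pixel grid, whereas width / 2 = 8 is the center of pixel column 8; either works with the bounds check in place.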