Butterfly pattern appears in random walk using srand(), why? - c++

About 3 years ago I coded a 2D random walk togheter with a coleague in C++, first it seemed to work properly as we obtained a diferent pattern each time. But whenever we decided to increase the number of steps above some threshold an apparent butterfly pattern appeared, we noticed that with each run of the code the pattern would repeat but starting on a different place of the butterfly. We concluded and reported then that it was due to the pseudorandom generator associated with srand() function, but today I found again this report and there are still some things that I would like to understand. I would like to understand better how the pseudorandom generator works in order to obtain this sort of symmetry and ciclic pattern. The pattern I'm talking about is this (The steps are color coded in a rainbow sequence to apreciate the progression of the walk):
EDIT:
I'm adding the code used to obtain this figure:
#include<iostream>
#include<cmath>
#include<stdlib.h>
#include<time.h>
#include <fstream>
#include <string.h>
#include <string>
#include <iomanip>
using namespace std;
int main ()
{
srand(time(NULL));
int num1,n=250000;
ofstream rnd_coordinates("Random2D.txt");
float x=0,y=0,sumx_f=0,sumy_f=0,sum_d=0,d_m,X,t,d;
float x_m,y_m;
x=0;
y=0;
for(int i=0;i<n;i++){
t=i;
num1= rand()%4;
if(num1==0){
x++;
}
if(num1==1){
x--;
}
if(num1==2){
y++;
}
if(num1==3){
y--;
}
rnd_coordinates<<x<<','<<y<<','<<t<<endl;
}
rnd_coordinates.close();
return 0;
}

You never hit rand()'s period, but keep in mind you don't actually use the entire rand() range that in its entirety guarantees a 2^32 period.
With that in mind, you have 2 options:
Use all the bits. rand() returns 2 bytes (16 bits), and you need 2 bits (for 4 possible values). Split that 16 bit output into chunks of 2 bits and use them all in sequence.
At the very least if you insist on using the lazy %n way, choose a modulo that's not a divisor of your period. For example choose 5 instead of 4, since 5 is prime, and if you get the 5th value reroll.

The code below constitutes a complete compileable example.
Your issue is with dropping bits from the random generator. Lets's see how one could write a source of random bit pairs that doesn't drop bits. It requires that RAND_MAX is of the form 2^n−1, but the idea could be extended to support any RAND_MAX >= 3.
#include <cassert>
#include <cstdint>
#include <cstdlib>
class RandomBitSource {
int64_t bits = rand();
int64_t bitMask = RAND_MAX;
static_assert((int64_t(RAND_MAX + 1) & RAND_MAX) == 0, "No support for RAND_MAX != 2^(n-1)");
public:
auto get2Bits() {
if (!bitMask) // got 0 bits
bits = rand(), bitMask = RAND_MAX;
else if (bitMask == 1) // got 1 bit
bits = (bits * (RAND_MAX+1)) | rand(), bitMask = (RAND_MAX+1) | RAND_MAX;
assert(bitMask & 3);
bitMask >>= 2;
int result = bits & 3;
bits >>= 2;
return result;
}
};
Then, the random walk implementation could be as follows. Note that the ' digit separator is a C++14 feature - quite handy.
#include <vector>
using num_t = int;
struct Coord { num_t x, y; };
struct Walk {
std::vector<Coord> points;
num_t min_x = {}, max_x = {}, min_y = {}, max_y = {};
Walk(size_t n) : points(n) {}
};
auto makeWalk(size_t n = 250'000)
{
Walk walk { n };
RandomBitSource src;
num_t x = 0, y = 0;
for (auto& point : walk.points)
{
const int bits = src.get2Bits(), b0 = bits & 1, b1 = bits >> 1;
x = x + (((~b0 & ~b1) & 1) - ((b0 & ~b1) & 1));
y = y + (((~b0 & b1) & 1) - ((b0 & b1) & 1));
if (x < walk.min_x)
walk.min_x = x;
else if (x > walk.max_x)
walk.max_x = x;
if (y < walk.min_y)
walk.min_y = y;
else if (y > walk.max_y)
walk.max_y = y;
point = { x, y };
}
return walk;
}
With a bit more effort, we can make this into an interactive Qt application. Pressing Return generates a new image.
The image is viewed at the native resolution of the screen it's displayed on, i.e. it maps to physical device pixels. The image is not scaled. Instead, it is rotated when needed to better fit into the screen's orientation (portrait vs landscape). That's for portrait monitor aficionados :)
#include <QtWidgets>
QImage renderWalk(const Walk& walk, Qt::ScreenOrientation orient)
{
using std::swap;
auto width = walk.max_x - walk.min_x + 3;
auto height = walk.max_y - walk.min_y + 3;
bool const rotated = (width < height) == (orient == Qt::LandscapeOrientation);
if (rotated) swap(width, height);
QImage image(width, height, QPixmap(1, 1).toImage().format());
image.fill(Qt::black);
QPainter p(&image);
if (rotated) {
p.translate(width, 0);
p.rotate(90);
}
p.translate(-walk.min_x, -walk.min_y);
auto constexpr hueStep = 1.0/720.0;
qreal hue = 0;
int const huePeriod = walk.points.size() * hueStep;
int i = 0;
for (auto& point : walk.points) {
if (!i--) {
p.setPen(QColor::fromHsvF(hue, 1.0, 1.0, 0.5));
hue += hueStep;
i = huePeriod;
}
p.drawPoint(point.x, point.y);
}
return image;
}
#include <ctime>
int main(int argc, char* argv[])
{
srand(time(NULL));
QApplication a(argc, argv);
QLabel view;
view.setAlignment(Qt::AlignCenter);
view.setStyleSheet("QLabel {background-color: black;}");
view.show();
auto const refresh = [&view] {
auto *screen = view.screen();
auto orientation = screen->orientation();
auto pixmap = QPixmap::fromImage(renderWalk(makeWalk(), orientation));
pixmap.setDevicePixelRatio(screen->devicePixelRatio());
view.setPixmap(pixmap);
view.resize(view.size().expandedTo(pixmap.size()));
};
refresh();
QShortcut enter(Qt::Key_Return, &view);
enter.setContext(Qt::ApplicationShortcut);
QObject::connect(&enter, &QShortcut::activated, &view, refresh);
return a.exec();
}

Every pseudorandom generator is a cycle of some sequence of numbers. One of the ways we distinguish "good" prngs from "bad" prngs is the length of this sequence. There is some state associated with the generator, so the maximum period is bounded by how many distinct states there are.
Your implementation has a "short" period, because it repeats in less than the age of the universe. It probably has 32 bits of state, so the period is at most 2^32.
As you are using C++, you can try again using a randomly seeded std::mt19937, and you won't see repeats.

You might want to look at my answer to another question here about older rand() implementations. Sometimes with the old rand() and srand() functions the lower order bits are much less random than the higher order bits. Some of these older implementations still persist, it's possible you used one.

Related

Fast method to access random image pixels and at most once

I'm learning OpenCV (C++) and as a simple practice, I designed a simple effect which makes some of image pixels black or white. I want each pixel to be edited at most once; so I added address of all pixels to a vector. But it made my code very slow; specially for large images or high amounts of effect. Here is my code:
void effect1(Mat& img, float amount) // 100 ≥ amount ≥ 0
{
vector<uchar*> addresses;
int channels = img.channels();
uchar* lastAddress = img.ptr<uchar>(0) + img.total() * channels;
for (uchar* i = img.ptr<uchar>(0); i < lastAddress; i += channels) addresses.push_back(i); //Fast Enough
size_t count = img.total() * amount / 100 / 2;
for (size_t i = 0; i < count; i++)
{
size_t addressIndex = xor128() % addresses.size(); //Fast Enough, xor128() is a fast random number generator
for (size_t j = 0; j < channels; j++)
{
*(addresses[addressIndex] + j) = 255;
} //Fast Enough
addresses.erase(addresses.begin() + addressIndex); // MAKES CODE EXTREMELY SLOW
}
for (size_t i = 0; i < count; i++)
{
size_t addressIndex = xor128() % addresses.size(); //Fast Enough, xor128() is a fast random number generator
for (size_t j = 0; j < channels; j++)
{
*(addresses[addressIndex] + j) = 0;
} //Fast Enough
addresses.erase(addresses.begin() + addressIndex); // MAKES CODE EXTREMELY SLOW
}
}
I think rearranging vector items after erasing an item is what makes my code slow (if I remove addresses.erase, code will run fast).
Is there any fast method to select each random item from a collection (or a number range) at most once?
Also: I'm pretty sure such effect already exists. Does anyone know the name of it?
This answer assumes you have a random bit generator function, since std::random_shuffle requires that. I don't know how xor128 works, so I'll use the functionality of the <random> library.
If we have a population of N items, and we want to select groups of size j and k randomly from that population with no overlap, we can write down the index of each item on a card, shuffle the deck, draw j cards, and then draw k cards. Everything left over is discarded. We can achieve this with the <random> library. Answer pending on how to incorporate a custom PRNG like you implemented with xor128.
This assumes that random_device won't work on your system (many compilers implement it in a way that it will always return the same sequence) so we seed the random generator with current time like the good old fashioned srand our mother used to make.
Untested since I don't know how to use OpenCV. Anyone with a lick of experience with that please edit as appropriate.
#include <ctime> // for std::time
#include <numeric> // for std::iota
#include <random>
#include <vector>
void effect1(Mat& img, float amount, std::mt19937 g) // 0.0 ≥ amount ≥ 1.00
{
std::vector<cv::Size> ind(img.total());
std::iota(ind.begin(), ind.end(), 0); // fills with 0, 1, 2, ...
std::random_shuffle(ind.begin(), ind.end(), g);
cv::Size count = img.total() * amount;
auto white = get_white<Mat>(); // template function to return this matrix' concept of white
// could easily replace with cv::Vec3d(255,255,255)
// if all your matrices are 3 channel?
auto black = get_black<Mat>(); // same but... opposite
auto end = ind.begin() + count;
for (auto it = ind.begin(), it != end; ++it)
{
img.at(*it) = white;
}
end = (ind.begin() + 2 * count) > ind.end() ?
ind.end() :
ind.begin() + 2 * count;
for (auto it = ind.begin() + count; it != end; ++it)
{
img.at(*it) = black;
}
}
int main()
{
std::mt19937 g(std::time(nullptr)); // you normally see this seeded with random_device
// but that's broken on some implementations
// adjust as necessary for your needs
cv::Mat mat = ... // make your cv objects
effect1(mat, 0.1, g);
// display it here
}
Another approach
Instead of shuffling indices and drawing cards from a deck, assume each pixel has a random probability of switching to white, switching to black, or staying the same. If your amount is 0.4, then select a random number between 0.0 and 1.0, any result between 0.0 and 0.4 flips the pixel black, and betwen 0.4 and 0.8 flips it white, otherwise it stays the same.
General algorithm:
given probability of flipping -> f
for each pixel in image -> p:
get next random float([0.0, 1.0)) -> r
if r < f
then p <- BLACK
else if r < 2*f
then p <- WHITE
You won't get the same number of white/black pixels each time, but that's randomness! We're generating a random number for each pixel anyway for the shuffling algorithm. This has the same complexity unless I'm mistaken.
Also: I'm pretty sure such effect already exists. Does anyone know the name of it?
The effect you're describing is called salt and pepper noise. There is no direct implementation in OpenCV that I know of though.
I think rearranging vector items after erasing an item is what makes
my code slow (if I remove addresses.erase, code will run fast).
Im not sure why you add your pixels to a vector in your code, it would make much more sense and also be much more performant to directly work on the Mat object and change the pixel value directly. You could use OpenCVs inbuild Mat.at() function to directly change the pixel values to either 0 or 255.
I would create a single loop which generates random indexes in the range of your image dimension and manipulate the image pixels directly. That way you are in O(n) for your noise addition. You could also just search for "OpenCV" and "salt and pepper noise", I am sure there already are a lot of really performant implementations.
I also post a simpler code:
void saltAndPepper(Mat& img, float amount)
{
vector<size_t> pixels(img.total()); // size_t = unsigned long long
uchar channels = img.channels();
iota(pixels.begin(), pixels.end(), 0); // Fill vector with 0, 1, 2, ...
shuffle(pixels.begin(), pixels.end(), mt19937(time(nullptr))); // Shuffle the vector
size_t count = img.total() * amount / 100 / 2;
for (size_t i = 0; i < count; i++)
{
for (size_t j = 0; j < channels; j++) // Set all pixel channels (e.g. Grayscale with 1 channel or BGR with 3 channels) to 255
{
*(img.ptr<uchar>(0) + (pixels[i] * channels) + j) = 255;
}
}
for (size_t i = count; i < count*2; i++)
{
for (size_t j = 0; j < channels; j++) // Set all pixel channels (e.g. Grayscale with 1 channel or BGR with 3 channels) to 0
{
*(img.ptr<uchar>(0) + (pixels[i] * channels) + j) = 0;
}
}
}

Arithmetic Coding FPAQ0 (a simple order-0 arithmetic file compressor )

I am trying to understand the code of fpaq0 aritmetic compressor but I am not able to fully understand it.Here is the link to the code -fpaq0.cpp
I am not able to understand exactly the how ct[512]['2] and cxt are working.Also I am not very much clear how decoder is working.Why before encoding every charater e.encode(0) is being called.
NOTE; I have understood the arithmetic coder presented in the link-Data Compression with Arithmetic Encoding
void update(int y) {
if (++ct[cxt][y] > 65534) {
ct[cxt][0] >>= 1;
ct[cxt][1] >>= 1;
}
if ((cxt+=cxt+y) >= 512)
cxt=1;
}
// Assume a stationary order 0 stream of 9-bit symbols
int p() const {
return 4096*(ct[cxt][1]+1)/(ct[cxt][0]+ct[cxt][1]+2);
}
inline void Encoder::encode(int y) {
// Update the range
const U32 xmid = x1 + ((x2-x1) >> 12) * predictor.p();
assert(xmid >= x1 && xmid < x2);
if (y)
x2=xmid;
else
x1=xmid+1;
predictor.update(y);
// Shift equal MSB's out
while (((x1^x2)&0xff000000)==0) {
putc(x2>>24, archive);
x1<<=8;
x2=(x2<<8)+255;
}
}
inline int Encoder::decode() {
// Update the range
const U32 xmid = x1 + ((x2-x1) >> 12) * predictor.p();
assert(xmid >= x1 && xmid < x2);
int y=0;
if (x<=xmid) {
y=1;
x2=xmid;
}
else
x1=xmid+1;
predictor.update(y);
// Shift equal MSB's out
while (((x1^x2)&0xff000000)==0) {
x1<<=8;
x2=(x2<<8)+255;
int c=getc(archive);
if (c==EOF) c=0;
x=(x<<8)+c;
}
return y;
}
fpaq0 is a file compressor which uses an order-0 bitwise model for modeling and uses 12-bits carry-less arithmetic coder for entropy coding stage. ct[512][2] stores counters for each contexts to compute symbol probabilities. The context (order-0 in fpaq0) is calculated with partial bits with a leading one (to simplify calculations).
For more easy explanation, let's skip EOF symbol for now. Order-0 context calculated as follow without EOF symbol (simplified):
// Full byte encoding
int cxt = 1; // context starts with leading one
for (int i = 0; i < 8; ++i) {
// Encoding part
int y = ReadNextBit();
int p = GetProbability(ctx);
EncodeBit(y, p);
// Model updating
UpdateCounter(cxt, y); // Update related counter
cxt = (cxt << 1) | y; // shift left and insert new bit
}
For decoding, context is used without EOF symbol like following (simplified):
// Full byte decoding
int cxt = 1; // context starts with leading one
for (int i = 0; i < 8; ++i) {
// Decoding part
int p = GetProbability(ctx);
int y = DecodeBit(p);
WriteBit(y);
// Model updating
UpdateCounter(cxt, y); // Update related counter
cxt = (cxt << 1) | y; // shift left and insert new bit
}
fpaq0 is designed as a streaming compressor. Meaning that it doesn't need to know exact length of the input stream. So, the question how decoder should know when to stop? EOF symbol used exactly for that. While encoding every single byte, a zero bit is encoded as a flag to indicate there is more data to follow. One indicates we reached the end of stream. So, decoder knows when to stop. That's the reason why our context model is 9-bits (EOF flag + 8 bits data).
Now, the last part: probability calculation. fpaq0 uses just counts of past symbols under order-0 context to calculate final probability.
n0 = count of 0
n1 = count of 1
p = n1 / (n0 + n1)
There are two implementation details that should be addressed: counter overflow and division by zero.
Counter overflow is addressed by halving both counts when they reach a threshold. Since, we're dealing with p, it makes sense.
Division by zero is addressed by inserting one into formula for each variables. So,
p = (n1 + 1) / ((n0 + 1) + (n1 + 1))

multiply numbers on all paths and get a number with minimum number of zeros

I have m*n table which each entry have a value .
start position is at top left corner and I can go right or down until I reach lower right corner.
I want a path that if I multiply numbers on that path I get a number that have minimum number of zeros in it's right side .
example:
1 2 100
5 5 4
possible paths :
1*2*100*4=800
1*2*5*4= 40
1*5*5*4= 100
Solution : 1*2*5*4= 40 because 40 have 1 zero but other paths have 2 zero.
easiest way is using dfs and calculate all paths. but it's not efficient.
I'm looking for an optimal substructure for solving it using dynammic programming.
After thinking for a while I came up to this equation :
T(i,j) = CountZeros(T(i-1,j)*table[i,j]) < CountZeros(T(i,j-1)*table[i,j]) ?
T(i-1,j)*table[i,j] : T(i,j-1)*table[i,j]
Code :
#include <iostream>
#include <vector>
#include <algorithm>
#include <numeric>
using namespace std;
using Table = vector<vector<int>>;
const int rows = 2;
const int cols = 3;
Table memo(rows, vector<int>(cols, -1));
int CountZeros(int number)
{
if (number < 0)
return numeric_limits<int>::max();
int res = 0;
while (number != 0)
{
if (number % 10 == 0)
res++;
else break;
number /= 10;
}
return res;
}
int solve(int i, int j, const Table& table)
{
if (i < 0 || j < 0)
return -1;
if (memo[i][j] != -1)
return memo[i][j];
int up = solve(i - 1, j, table)*table[i][j];
int left = solve(i, j - 1, table)*table[i][j];
memo[i][j] = CountZeros(up) < CountZeros(left) ? up : left;
return memo[i][j];
}
int main()
{
Table table =
{
{ 1, 2, 100 },
{ 5, 5, 4 }
};
memo[0][0] = table[0][0];
cout << solve(1, 2, table);
}
(Run )
But it is not optimal (for example in above example it give 100 )
Any idea for better optimal sub-structure ? can I solve it with dynammic programming ?!
Let's reconsider the Bellman optimality equation for your task. I consider this as a systematic approach to such problems (whereas I often don't understand DP one-liners). My reference is the book of Sutton and Barto.
The state in which your system is can be described by a triple of integer numbers (i,j,r) (which is modeled as a std::array<int,3>). Here, i and j denote column and row in your rectangle M = m_{i,j}, whereas r denotes the multiplication result.
Your actions in state (i,j,r) are given by going right, with which you end in state (i, j+1, r*m_{i,j+1}) or by going down which leads to the state (i+1, j, r*m_{i+1,j}).
Then, the Bellman equation is given by
v(i,j,r) = min{ NullsIn(r*m_{i+1,j}) - NullsIn(r) + v_(i+1,j, r*m_{i+1,j})
NullsIn(r*m_{i,j+1}) - NullsIn(r) + v_(i,j+1, r*m_{i,j+1}) }
The rationale behind this equation is the following: NullsIn(r*m_{i+1,j}) - NullsIn(r) denotes the zeros you have to add when you take one of the two actions, i.e. the instant penalty. v_(i+1,j, r*m_{i+1,j}) denotes the zeros in the state you get to when you take this action. Now one wants to take the action which minimizes both contributions.
What you need further is only a function int NullsIn(int) which returns the nulls in a given integer. Here is my attempt:
int NullsIn(int r)
{
int ret=0;
for(int j=10; j<=r; j*=10)
{
if((r/j) * j == r)
++ret;
}
return ret;
}
For convenience I further defined a NullsDifference function:
int NullsDifference(int r, int m)
{
return NullsIn(r*m) - NullsIn(r);
}
Now, one has to do a backwards iteration starting from the initial state in the right bottom element of the matrix.
int backwardIteration(std::array<int,3> state, std::vector<std::vector<int> > const& m)
{
static std::map<std::array<int,3>, int> memoization;
auto it=memoization.find(state);
if(it!=memoization.end())
return it->second;
int i=state[0];
int j=state[1];
int r=state[2];
int ret=0;
if(i>0 && j>0)
{
int inew=i-1;
int jnew=j-1;
ret=std::min(NullsDifference(r, m[inew][j]) + backwardIteration({inew,j,r*m[inew][j]}, m),
NullsDifference(r, m[i][jnew]) + backwardIteration({i,jnew,r*m[i][jnew]}, m));
}
else if(i>0)
{
int inew=i-1;
ret= NullsDifference(r, m[inew][j]) + backwardIteration({inew,j,r*m[inew][j]}, m);
}
else if(j>0)
{
int jnew=j-1;
ret= NullsDifference(r, m[i][jnew]) + backwardIteration({i,jnew,r*m[i][jnew]}, m);
}
memoization[state]=ret;
return ret;
}
This routine is called via
int main()
{
int ncols=2;
int nrows=3;
std::vector<std::vector<int> > m={{1,2,100}, {5,5,4}};
std::array<int,3> initialState = {ncols-1, nrows -1, m[ncols-1][nrows - 1]};
std::cout<<"Minimum number of zeros: "backwardIteration(initialState, m)<<"\n"<<std::endl;
}
For your array, it prints out the desired 1 for the number of zeros.
Here is a live demo on Coliru.
EDIT
Here is an important thing: in production, you usually don't call backwardIteration as I did because it takes an exponentially increasing number of recursive calls. Rather, you start in the top left and call it, then store the result. Next you go left and down and each time call backwardIteration where you now use the previously stored result. And so on.
In order to do this, one needs a memoization concept within the function backwardIteration, which returns the already stored result instead of invoking another recursive call.
I've added memoization in the function call above. Now you can loop through the array from left top to right bottom in any way you like -- but prefereably take small steps, such as row-by-row, column-by-column, or rectangle-for-rectangle.
In fact, this and only this is the spirit of Dynamic Programming.

Drawing circle, OpenGL style

I have a 13 x 13 array of pixels, and I am using a function to draw a circle onto them. (The screen is 13 * 13, which may seem strange, but its an array of LED's so that explains it.)
unsigned char matrix[13][13];
const unsigned char ON = 0x01;
const unsigned char OFF = 0x00;
Here is the first implementation I thought up. (It's inefficient, which is a particular problem as this is an embedded systems project, 80 MHz processor.)
// Draw a circle
// mode is 'ON' or 'OFF'
inline void drawCircle(float rad, unsigned char mode)
{
for(int ix = 0; ix < 13; ++ ix)
{
for(int jx = 0; jx < 13; ++ jx)
{
float r; // Radial
float s; // Angular ("theta")
matrix_to_polar(ix, jx, &r, &s); // Converts polar coordinates
// specified by r and s, where
// s is the angle, to index coordinates
// specified by ix and jx.
// This function just converts to
// cartesian and then translates by 6.0.
if(r < rad)
{
matrix[ix][jx] = mode; // Turn pixel in matrix 'ON' or 'OFF'
}
}
}
}
I hope that's clear. It's pretty simple, but then I programmed it so I know how it's supposed to work. If you'd like more info / explanation then I can add some more code / comments.
It can be considered that drawing several circles, eg 4 to 6, is very slow... Hence I'm asking for advice on a more efficient algorithm to draw the circles.
EDIT: Managed to double the performance by making the following modification:
The function calling the drawing used to look like this:
for(;;)
{
clearAll(); // Clear matrix
for(int ix = 0; ix < 6; ++ ix)
{
rad[ix] += rad_incr_step;
drawRing(rad[ix], rad[ix] - rad_width);
}
if(rad[5] >= 7.0)
{
for(int ix = 0; ix < 6; ++ ix)
{
rad[ix] = rad_space_step * (float)(-ix);
}
}
writeAll(); // Write
}
I added the following check:
if(rad[ix] - rad_width < 7.0)
drawRing(rad[ix], rad[ix] - rad_width);
This increased the performance by a factor of about 2, but ideally I'd like to make the circle drawing more efficient to increase it further. This checks to see if the ring is completely outside of the screen.
EDIT 2: Similarly adding the reverse check increased performance further.
if(rad[ix] >= 0.0)
drawRing(rad[ix], rad[ix] - rad_width);
Performance is now pretty good, but again I have made no modifications to the actual drawing code of the circles and this is what I was intending to focus on with this question.
Edit 3: Matrix to polar:
inline void matrix_to_polar(int i, int j, float* r, float* s)
{
float x, y;
matrix_to_cartesian(i, j, &x, &y);
calcPolar(x, y, r, s);
}
inline void matrix_to_cartesian(int i, int j, float* x, float* y)
{
*x = getX(i);
*y = getY(j);
}
inline void calcPolar(float x, float y, float* r, float* s)
{
*r = sqrt(x * x + y * y);
*s = atan2(y, x);
}
inline float getX(int xc)
{
return (float(xc) - 6.0);
}
inline float getY(int yc)
{
return (float(yc) - 6.0);
}
In response to Clifford that's actually a lot of function calls if they are not inlined.
Edit 4: drawRing just draws 2 circles, firstly an outer circle with mode ON and then an inner circle with mode OFF. I am fairly confident that there is a more efficient method of drawing such a shape too, but that distracts from the question.
You're doing a lot of calculations that aren't really needed. For example, you're calculating the angle of the polar coordinates, but never use it. The square root can also easily be avoided by comparing the square of the values.
Without doing anything fancy, something like this should be a good start:
int intRad = (int)rad;
int intRadSqr = (int)(rad * rad);
for (int ix = 0; ix <= intRad; ++ix)
{
for (int jx = 0; jx <= intRad; ++jx)
{
if (ix * ix + jx * jx <= radSqr)
{
matrix[6 - ix][6 - jx] = mode;
matrix[6 - ix][6 + jx] = mode;
matrix[6 + ix][6 - jx] = mode;
matrix[6 + ix][6 + jx] = mode;
}
}
}
This does all the math in integer format, and takes advantage of the circle symmetry.
Variation of the above, based on feedback in the comments:
int intRad = (int)rad;
int intRadSqr = (int)(rad * rad);
for (int ix = 0; ix <= intRad; ++ix)
{
for (int jx = 0; ix * ix + jx * jx <= radSqr; ++jx)
{
matrix[6 - ix][6 - jx] = mode;
matrix[6 - ix][6 + jx] = mode;
matrix[6 + ix][6 - jx] = mode;
matrix[6 + ix][6 + jx] = mode;
}
}
Don't underestimate the cost of even basic arithmetic using floating point on a processor with no FPU. It seems unlikely that floating point is necessary, but the details of its use are hidden in your matrix_to_polar() implementation.
Your current implementation considers every pixel as a candidate - that is also unnecessary.
Using the equation y = cy ± √[rad2 - (x-cx)2] where cx, cy is the centre (7, 7 in this case), and a suitable integer square root implementation, the circle can be drawn thus:
void drawCircle( int rad, unsigned char mode )
{
int r2 = rad * rad ;
for( int x = 7 - rad; x <= 7 + rad; x++ )
{
int dx = x - 7 ;
int dy = isqrt( r2 - dx * dx ) ;
matrix[x][7 - dy] = mode ;
matrix[x][7 + dy] = mode ;
}
}
In my test I used the isqrt() below based on code from here, but given that the maximum r2 necessary is 169 (132, you could implement a 16 or even 8 bit optimised version if necessary. If your processor is 32 bit, this is probably fine.
uint32_t isqrt(uint32_t n)
{
uint32_t root = 0, bit, trial;
bit = (n >= 0x10000) ? 1<<30 : 1<<14;
do
{
trial = root+bit;
if (n >= trial)
{
n -= trial;
root = trial+bit;
}
root >>= 1;
bit >>= 2;
} while (bit);
return root;
}
All that said, on such a low resolution device, you will probably get better quality circles and faster performance by hand generating bitmap lookup tables for each radius required. If memory is an issue, then a single circle needs only 7 bytes to describe a 7 x 7 quadrant that you can reflect to all three quadrants, or for greater performance you could use 7 x 16 bit words to describe a semi-circle (since reversing bit order is more expensive than reversing array access - unless you are using an ARM Cortex-M with bit-banding). Using semi-circle look-ups, 13 circles would need 13 x 7 x 2 bytes (182 bytes), quadrant look-ups would be 7 x 8 x 13 (91 bytes) - you may find that is fewer bytes that the code space required to calculate the circles.
For a slow embedded device with only a 13x13 element display, you should really just make a look-up table. For example:
struct ComputedCircle
{
float rMax;
char col[13][2];
};
Where the draw routine uses rMax to determine which LUT element to use. For example, if you have 2 elements with one rMax = 1.4f, the other = 1.7f, then any radius between 1.4f and 1.7f will use that entry.
The column elements would specify zero, one, or two line segments per row, which can be encoded in the lower and upper 4 bits of each char. -1 can be used as a sentinel value for nothing-at-this-row. It is up to you how many look-up table entries to use, but with a 13x13 grid you should be able to encode every possible outcome of pixels with well under 100 entries, and a reasonable approximation using only 10 or so. You can also trade off compression for draw speed as well, e.g. putting the col[13][2] matrix in a flat list and encoding the number of rows defined.
I would accept MooseBoy's answer if only he explained the method he proposes better. Here's my take on the lookup table approach.
Solve it with a lookup table
The 13x13 display is quite small, and if you only need circles which are fully visible within this pixel count, you will get around with a quite small table. Even if you need larger circles, it should be still better than any algorithmic way if you need it to be fast (and have the ROM to store it).
How to do it
You basically need to define how each possible circle looks like on the 13x13 display. It is not sufficient to just produce snapshots for the 13x13 display, as it is likely you would like to plot the circles at arbitrary positions. My take for a table entry would look like this:
struct circle_entry_s{
unsigned int diameter;
unsigned int offset;
};
The entry would map a given diameter in pixels to offsets in a large byte table containing the shape of the circles. For example for diameter 9, the byte sequence would look like this:
0x1CU, 0x00U, /* 000111000 */
0x63U, 0x00U, /* 011000110 */
0x41U, 0x00U, /* 010000010 */
0x80U, 0x80U, /* 100000001 */
0x80U, 0x80U, /* 100000001 */
0x80U, 0x80U, /* 100000001 */
0x41U, 0x00U, /* 010000010 */
0x63U, 0x00U, /* 011000110 */
0x1CU, 0x00U, /* 000111000 */
The diameter specifies how many bytes of the table belong to the circle: one row of pixels are generated from (diameter + 7) >> 3 bytes, and the number of rows correspond to the diameter. The output code of these can be made quite fast, while the lookup table is sufficiently compact to get even larger than the 13x13 display circles defined in it if needed.
Note that defining circles this way for odd and even diameters may or may not appeal you when output by a centre location. The odd diameter circles will appear to have a centre in the "middle" of a pixel, while the even diameter circles will appear to have their centre on the "corner" of a pixel.
You may also find it nice later to refine the overall method so having multiple circles of different apparent sizes, but having the same pixel radius. Depends on what is your goal: if you want some kind of smooth animation, you may get there eventually.
Algorithmic solutions I think mostly will perform poorly here, since with this limited display surface really every pixel's state counts for the appearance.

Arduino mega queue

I wrote this simple code which reads a length from the Sharp infrared sensor, end presents the average meter in cm (unit) by serial.
When write this code for the Arduino Mega board, the Arduino starts a blinking LED (pin 13) and the program does nothing. Where is the bug in this code?
#include <QueueList.h>
const int ANALOG_SHARP = 0; //Set pin data from sharp.
QueueList <float> queuea;
float cm;
float qu1;
float qu2;
float qu3;
float qu4;
float qu5;
void setup() {
Serial.begin(9600);
}
void loop() {
cm = read_gp2d12_range(ANALOG_SHARP); //Convert to cm (unit).
queuea.push(cm); //Add item to queue, when I add only this line Arduino crash.
if ( 5 <= queuea.peek()) {
Serial.println(average());
}
}
float read_gp2d12_range(byte pin) { //Function converting to cm (unit).
int tmp;
tmp = analogRead(pin);
if (tmp < 3)
return -1; // Invalid value.
return (6787.0 /((float)tmp - 3.0)) - 4.0;
}
float average() { //Calculate average length
qu1 += queuea.pop();
qu2 += queuea.pop();
qu3 += queuea.pop();
qu4 += queuea.pop();
qu5 += queuea.pop();
float aver = ((qu1+qu2+qu3+qu4+qu5)/5);
return aver;
}
I agree with the peek() -> count() error listed by vhallac. But I'll also point out that you should consider averaging by powers of 2 unless there is a strong case to do otherwise.
The reason is that on microcontrollers, division is slow. By averaging over a power of 2 (2,4,8,16,etc.) you can simply calculate the sum and then bitshift it.
To calculate the average of 2: (v1 + v2) >> 1
To calculate the average of 4: (v1 + v2 + v3 + v4) >> 2
To calculate the average of n values (where n is a power of 2) just right bitshift the sum right by [log2(n)].
As long as the datatype for your sum variable is big enough and won't overflow, this is much easier and much faster.
Note: this won't work for floats in general. In fact, microcontrollers aren't optimized for floats. You should consider converting from int (what I'm assuming you're ADC is reading) to float at the end after the averaging rather than before.
By converting from int to float and then averaging floats you are losing more precision than averaging ints than converting the int to a float.
Other:
You're using the += operator without initializing the variables (qu1, qu2, etc.) -- it's good practice to initialize them if you're going to use += but it looks as if = would work fine.
For floats, I'd have written the average function as:
float average(QueueList<float> & q, int n)
{
float sum = 0;
for(int i=0; i<n; i++)
{
sum += q.pop();
}
return (sum / (float) n);
}
And called it: average(queuea, 5);
You could use this to average any number of sensor readings and later use the same code to later average floats in a completely different QueueList. Passing the number of readings to average as a parameter will really come in handy in the case that you need to tweak it.
TL;DR:
Here's how I would have done it:
#include <QueueList.h>
const int ANALOG_SHARP=0; // set pin data from sharp
const int AvgPower = 2; // 1 for 2 readings, 2 for 4 readings, 3 for 8, etc.
const int AvgCount = pow(2,AvgPow);
QueueList <int> SensorReadings;
void setup(){
Serial.begin(9600);
}
void loop()
{
int reading = analogRead(ANALOG_SHARP);
SensorReadings.push(reading);
if(SensorReadings.count() > AvgCount)
{
int avg = average2(SensorReadings, AvgPower);
Serial.println(gpd12_to_cm(avg));
}
}
float gp2d12_to_cm(int reading)
{
if(reading <= 3){ return -1; }
return((6787.0 /((float)reading - 3.0)) - 4.0);
}
int average2(QueueList<int> & q, int AvgPower)
{
int AvgCount = pow(2, AvgPower);
long sum = 0;
for(int i=0; i<AvgCount; i++)
{
sum += q.pop();
}
return (sum >> AvgPower);
}
You are using queuea.peek() to obtain the count. This will only return the last element in queue. You should use queuea.count() instead.
Also you might consider changing the condition tmp < 3 to tmp <= 3. If tmp is 3, you divide by zero.
Great improvement jedwards, however the first question I have is why use queuelist instead of an int array.
As an example I would do the following:
int average(int analog_reading)
{
#define NUM_OF_AVG 5
static int readings[NUM_OF_AVG];
static int next_position;
static int sum;
if (++next_position >= NUM_OF_AVG)
{
next_position=0;
}
reading[next_position]=analog_reading;
for(int i=0; i<NUM_OF_AVG; i++)
{
sum += reading[i];
}
average = sum/NUM_OF_AVG
}
Now I compute a new rolling average with every reading and it eliminates all the issues related to dynamic memory allocation (memory fragmentation, no available memory, memory leaks) in a embedded device.
I appreciate and understand the use of shifting for a division by 2,4 or 8, however I would stay away from that technique for two reasons.
I think readability and maintainability of the source code is more important then saving a little bit of time with a shift instead of a divide unless you can test and verify the divide is a bottleneck.
Second, I believe most current optimizing compilers will do a shift if possible, I know GCC does.
I will leave refactoring out the for loop for the next guy.