Undefined behavior with deterministic procedure - C++

I am currently trying to implement "cave generation" on a 2D array, following the ideas of the "Game of Life". The idea is as follows:
I have a 2D vector of 0s and 1s (representing air and block, respectively), randomly generated with a uniform_real_distribution and a density (here 0.45, so roughly 45% of the array will be 1).
After this, we iterate x times over the array. An iteration looks as follows:
First, we copy the array into a new one.
Second, we iterate over the old array: for each tile, we count the blocks in its neighbourhood and then:
IF the current tile is air and has more than 4 blocks in its neighbourhood (-1,-1) to (1,1), excluding itself, change it to a block in the NEW ARRAY
IF the current tile is a block and has fewer than 3 blocks in its neighbourhood, change it to air in the NEW ARRAY
Finally, copy the new array back into the old array.
The problem is that even when I seed the random engine with a deterministic seed, sometimes (about 1 time in 3) the map becomes completely filled with blocks after two or three iterations. After looking at my code for many hours I have no idea why, which is why I am here. Here is the code:
cavefactory.h
#ifndef CAVEFACTORY_H_
#define CAVEFACTORY_H_
#include <vector>
namespace cavegenerator {
// define cave_t as a 2d vector of integers
using cave_t = std::vector<std::vector<int>>;
// constants
namespace DEFAULT {
constexpr unsigned short int WIDTH = 64;
constexpr unsigned short int HEIGHT = 64;
constexpr float DENSITY = 0.45;
constexpr unsigned short int BIRTH_LIMIT = 4;
constexpr unsigned short int DEATH_LIMIT = 3;
} // namespace DEFAULT
class CaveFactory {
public:
CaveFactory(unsigned short int width = DEFAULT::WIDTH,
unsigned short int height = DEFAULT::HEIGHT,
float density = DEFAULT::DENSITY);
// makes a cave with the desired number of iterations and parameters
static cave_t MakeCave(unsigned short int width = DEFAULT::WIDTH,
unsigned short int height = DEFAULT::HEIGHT,
float density = DEFAULT::DENSITY,
int iterations = 3,
unsigned short int bl = DEFAULT::BIRTH_LIMIT,
unsigned short int dl = DEFAULT::DEATH_LIMIT);
// implemented in case of generalization of cave(more than two blocks)
bool isSolid(int i, int j);
cave_t getCave();
void Print();
void Iterate( unsigned short int bl = DEFAULT::BIRTH_LIMIT,
unsigned short int dl = DEFAULT::DEATH_LIMIT );
private:
cave_t cave_;
int NumberOfNeighbours(int i, int j);
void Initialize(float density = DEFAULT::DENSITY);
};
} // namespace cavegenerator
#endif // CAVEFACTORY_H_
cavefactory.cc
#include "cavefactory.h"
#include <random>
#include <iostream>
#include <ctime>
#include <algorithm>
namespace cavegenerator {
CaveFactory::CaveFactory(unsigned short int width, unsigned short int height, float density) {
cave_.resize(width);
for (auto &i : cave_) {
i.resize(height);
}
Initialize(density);
}
bool CaveFactory::isSolid(int i, int j) {
return (cave_[i][j] == 1);
}
int CaveFactory::NumberOfNeighbours(int x, int y) {
int num = 0;
for (int i = -1; i < 2; i++) {
for (int j = -1; j < 2; j++) {
if ( i == 0 && j == 0 ) continue; // we don't want to count ourselves
// if out of bounds, add a solid neighbour
if ( x + i >= (int)cave_.size() || x + i < 0 || y + j >= (int)cave_[i].size() || y + j < 0) {
++num;
} else if (isSolid(x+i, y+j)) {
++num;
}
}
}
return num;
}
cave_t CaveFactory::getCave() {
return cave_;
}
void CaveFactory::Print() {
for (auto &i : cave_) {
for (auto &j : i) {
std::cout << ((j==1) ? "x" : " ");
}
std::cout << "\n";
}
return;
}
cave_t CaveFactory::MakeCave(unsigned short int width,
unsigned short int height,
float density,
int iterations,
unsigned short int bl,
unsigned short int dl)
{
CaveFactory cave(width, height, density);
for (int i = 0; i < iterations; i++) {
cave.Iterate(bl, dl);
}
return cave.getCave();
}
// Initialize the cave with the specified density
void CaveFactory::Initialize(float density) {
std::mt19937 rd(4);
std::uniform_real_distribution<float> roll(0, 1);
for (auto &i : cave_) {
for (auto &j : i) {
if (roll(rd) < density) {
j = 1;
} else {
j = 0;
}
}
}
}
// for each cell in the original cave, if the cell is solid:
// if the number of solid neighbours is under the death limit, we kill the block
// if the cell is air, if the number of solid blocks is above the birth limit we place a block
void CaveFactory::Iterate(unsigned short int bl, unsigned short int dl) {
cave_t new_cave = cave_;
for (int i = 0; i < (int)cave_.size(); i++) {
for (int j = 0; j < (int)cave_[0].size(); j++) {
int number_of_neighbours = NumberOfNeighbours(i, j);
if (isSolid(i, j) && number_of_neighbours < dl) {
new_cave[i][j] = 0;
} else if (!isSolid(i,j) && number_of_neighbours > bl) {
new_cave[i][j] = 1;
}
}
}
std::copy(new_cave.begin(), new_cave.end(), cave_.begin());
}
} // namespace cavegenerator
main.cc
#include <iostream>
#include <vector>
#include <random>
#include <ctime>
#include <windows.h>
#include "cavefactory.h"
int main() {
cavegenerator::CaveFactory caveEE;
caveEE.Print();
for(int i = 0; i < 15; i++) {
caveEE.Iterate();
Sleep(600);
system("cls");
caveEE.Print();
}
return 0;
}
I know windows.h is a bad habit; I only used it for debugging.
I hope someone can help me understand. Maybe this is normal behavior I'm not aware of?
Thank you very much.

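(int)cave_[i].size() in NumberOfNeighbours is incorrect; it should be (int)cave_[x+i].size() (or (int)cave_[0].size(), since all rows have the same size). When i equals -1, cave_[-1] is an out-of-bounds vector access and therefore undefined behaviour. Because that access reads whatever memory happens to lie before the vector's storage, the "size" it sees can differ from run to run, which is why a fixed seed does not make the result reproducible.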

Trying to convert C++ binary array to hexadecimal then print results

I want to print an 8-element int array and convert it to binary and hex, with output like this:
0 00000000 00
1 00000001 07
...
I've finished creating the binary conversion function. For hex, I want to use the same approach as the binary conversion (with an array), but handle the left half and the right half of the 8 bits separately and convert each side on its own; left most -3 and right most is -7.
What am I doing wrong? I cannot figure out how to implement it, and I know my hex function is all out of whack.
#include <iostream>
#include <string>
#include <math.h>
using namespace std;
const int num = 8; //may not be needed -added for hex
void Generatebinary(int arr[]);
void GeneratehexDec(int arr[]);
void print_binary(int arr[]); //haven't created yet
int main()
{
int arr[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
int i = 1;
while (i < 256)
{
Generatebinary(arr);
print_binary(arr); //not created yet
GeneratehexDec(arr);
i++;
}
}
void Generatebinary(int arr[])
{
for (int i = 7; i > 0; i--)
{
if (arr[i] == 1)
arr[i] = 0;
else if (arr[i] == 0)
{
arr[i] = 1;
break;
}
}
}
void GereatehexDec(int num)
{ //improper use
int a;
int i;
int answer[] = { };
a = num % 16;
i++;
answer[i] = num;
for (int i = num; i > 0; i--)
{
cout << answer[i];
}
cout << a;
}
First of all, you can't write int answer[] = { };. An array's size has to be known when it is allocated: either it is fixed at compile time, or the array is allocated dynamically at run time, in which case you have to manage the memory yourself and remember to free it. That is why Stroustrup advises against using raw arrays unless necessary. Use std::vector instead:
void GereatehexDec(int num)
{ //improper use
int a = 0; // always initialize your variables
int i = 0; // is this i supposed to be the same as the i in the for loop below?
std::vector<int> answer;
a = num % 16;
i++; // this doesn't make sense
answer.at(i) = num; // throws std::out_of_range: answer is still empty
for (int i = num; i > 0; i--) // what about the i variable you declared previously?
{
cout << answer.at(i);
}
cout << a;
}
Here's a template function that may help (it converts a number into a hex string):
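#include <cstddef>
#include <iostream>
#include <string>
// Converts an integer to a zero-padded, upper-case hex string with a "0x" prefix.
// hex_len defaults to two hex digits per byte of the type I.
template <typename I> std::string n2hexstr(I w, std::size_t hex_len = sizeof(I) << 1) {
static const char* digits = "0123456789ABCDEF";
std::string rc(hex_len, '0');
for (std::size_t i = 0, j = (hex_len - 1) * 4; i < hex_len; ++i, j -= 4)
rc[i] = digits[(w >> j) & 0x0f];
return "0x" + rc;
}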
int main() {
std::cout << n2hexstr(127);
}

Need to insert random numbers of 1-100 into a three dimensional array

I need to create a program in which random numbers between 1 and 100 are placed in each element of a 3D array. The arrays are of varying sizes, and so far I have only encountered crashes on execution. I tried it on a smaller scale with a 1D array and it worked fine, but I can't seem to translate it to a larger scale. My code so far:
int const STOCK_AMOUNT = 1000, DAY_AMOUNT = 366, TIME_AMOUNT = 480;
int randomGenerator();
void randomInsert(int array0[DAY_AMOUNT][TIME_AMOUNT][STOCK_AMOUNT]);
int main()
{
int cube[DAY_AMOUNT][TIME_AMOUNT][STOCK_AMOUNT];
srand((unsigned)time(0));
randomInsert(cube);
return 0;
}
void randomInsert(int array0[DAY_AMOUNT][TIME_AMOUNT][STOCK_AMOUNT])
{
for (int count1 = 0; count1 < DAY_AMOUNT; count1++)
{
for (int count2 = 0; count2 < TIME_AMOUNT; count2++)
{
for (int count3 = 0; count3 < STOCK_AMOUNT; count3++)
{
int randomGenerator();
array0[count1][count2][count3] = randomGenerator();
cout << endl;
}
}
}
}
int randomGenerator()
{
int randNum;
int lowerLimit = 1;
int upperLimit = 100;
randNum = (rand() % upperLimit) + lowerLimit;
return randNum;
}
You seem to be exceeding the stack size. Your array, created on the stack, holds about 175 million integers (366 * 480 * 1000), which is about 700 MB of memory with 4-byte ints. Default stack sizes are typically only a few megabytes, so you would need to set compilation (linker) options to increase the stack size.
EDIT: Moreover, be aware that putting such huge arrays on the stack is generally considered bad practice. Ideally, use STL vectors, which are the modern way to deal with arrays.
Here’s a more complex version using the STL:
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib> // EXIT_SUCCESS
#include <ctime>
#include <functional>
#include <iostream>
#include <random>
using std::cout;
using std::endl;
// The size of a type that can hold values from 1 to 100. Used for storage. (Copy to int for calculation.)
using elem_t = int_least8_t;
// Lower and upper bounds:
constexpr int lb = 1;
constexpr int ub = 100;
constexpr size_t stocks = 100, days = 300, times = 400;
using pricearray_t = elem_t[days][times][stocks];
pricearray_t& init_prices()
/* Returns a heap-allocated array that must be deleted with delete[]. */
{
// This ugly little cast is brought to us by the C++ rules for array types.
pricearray_t &to_return = *(pricearray_t*) new pricearray_t;
const std::default_random_engine::result_type seed = std::time(NULL) * CLOCKS_PER_SEC + std::clock();
std::default_random_engine generator(seed);
std::uniform_int_distribution<int> distribution( lb, ub ); // IntType must be short, int, long, long long or an unsigned variant; int_fast8_t is not allowed
auto x = std::bind( distribution, generator );
for ( size_t i = 0; i < days; ++i )
for ( size_t j = 0; j < times; ++j )
for ( size_t k = 0; k < stocks; ++k )
to_return[i][j][k] = static_cast<elem_t>(x());
return to_return;
}
int main(void)
{
const pricearray_t &prices = init_prices();
long long int sum = 0;
for ( size_t i = 0; i < days; ++i )
for ( size_t j = 0; j < times; ++j )
for ( size_t k = 0; k < stocks; ++k ) {
const int x = prices[i][j][k];
assert( x >= lb );
assert( x <= ub );
sum += x;
}
cout << "The mean is " << static_cast<double>(sum) / days / times / stocks << "." << endl;
delete[] &prices[0]; // delete[] must receive the pointer type that new[] returned (pointer to the first element)
return EXIT_SUCCESS;
}
Here’s a version that uses smart pointers to manage memory automatically:
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib> // EXIT_SUCCESS
#include <ctime>
#include <functional>
#include <iostream>
#include <memory>
#include <random>
using std::cout;
using std::endl;
// The size of a type that can hold values from 1 to 100. Used for storage. (Copy to int for calculation.)
using elem_t = int_least8_t;
// Lower and upper bounds:
constexpr int lb = 1;
constexpr int ub = 100;
constexpr size_t stocks = 100, days = 300, times = 400;
// The unique_ptr type doesn’t play nicely with arrays of known size.
using pricearray_t = elem_t[][times][stocks];
std::unique_ptr<pricearray_t> init_prices()
/* Returns a managed pointer to an array of uniformly-distributed values.
*/
{
// Returning the unique_ptr moves (or elides) only the pointer, so the array itself is never copied.
std::unique_ptr<pricearray_t> to_return = std::make_unique<pricearray_t>(days);
const std::default_random_engine::result_type seed = std::time(NULL) * CLOCKS_PER_SEC + std::clock();
std::default_random_engine generator(seed);
std::uniform_int_distribution<int> distribution( lb, ub ); // IntType must be short, int, long, long long or an unsigned variant; int_fast8_t is not allowed
auto x = std::bind( distribution, generator );
for ( size_t i = 0; i < days; ++i )
for ( size_t j = 0; j < times; ++j )
for ( size_t k = 0; k < stocks; ++k )
to_return[i][j][k] = static_cast<elem_t>(x());
return to_return;
}
int main(void)
{
/* The contents of the smart pointer will be deleted automatically when it goes out of scope.
*/
const std::unique_ptr<pricearray_t> prices = init_prices();
long long int sum = 0;
for ( size_t i = 0; i < days; ++i )
for ( size_t j = 0; j < times; ++j )
for ( size_t k = 0; k < stocks; ++k ) {
const int x = prices[i][j][k];
assert( x >= lb );
assert( x <= ub );
sum += x;
}
cout << "The mean is " << static_cast<double>(sum) / days / times / stocks << "." << endl;
return EXIT_SUCCESS;
}
It seems the specifications for the original cube provided by the professor were much too large. Odd that a professor wouldn't catch that! I scaled back the sizes and now have something that works. The code in main is supposed to find the average stock price for each day (50) of each stock (100).
#include<iostream>
#include<ctime>
#include <cstdlib>
#include <fstream>
using namespace std;
const int STOCK_AMOUNT = 100, DAY_AMOUNT = 50, TIME_AMOUNT = 8;
int randomGenerator();
void priceGenerator(int [STOCK_AMOUNT][DAY_AMOUNT][TIME_AMOUNT]);
int main()
{
ofstream outputFile;
int cube[STOCK_AMOUNT][DAY_AMOUNT][TIME_AMOUNT];
double total;
srand((unsigned)time(0));
priceGenerator(cube);
outputFile.open("Average_Day_Price");
for (int row = 0; row < STOCK_AMOUNT; row++)
{
for (int col = 0; col < DAY_AMOUNT; col++)
{
total = 0; // reset for each (stock, day) pair
for (int layer = 0; layer < TIME_AMOUNT; layer++)
{
total += cube[row][col][layer]; // accumulate the day's prices
}
double average = (total / TIME_AMOUNT);
outputFile << "STOCK Id:" << (row+1) << "--" << "Day:" << (col+1) << "--" << average << endl;
}
}
outputFile.close();
return 0;
}
void priceGenerator(int array0[STOCK_AMOUNT][DAY_AMOUNT][TIME_AMOUNT])
{
int i,y, z;
for ( i = 0; i < STOCK_AMOUNT; i++)
{
for ( y = 0; y < DAY_AMOUNT; y++)
{
for (z = 0; z < TIME_AMOUNT; z++)
{
int randNum;
int lowerLimit = 1;
int upperLimit = 100;
randNum = (rand() % upperLimit) + lowerLimit;
array0[i][y][z] = randNum;
}
}
}
}

Can't understand why my program throws error

My code:
#include <iostream>
#include <string>
#include <algorithm>
#include <climits>
#include <vector>
#include <cmath>
using namespace std;
struct State {
int v;
const State *rest;
void dump() const {
if(rest) {
cout << ' ' << v;
rest->dump();
} else {
cout << endl;
}
}
State() : v(0), rest(0) {}
State(int _v, const State &_rest) : v(_v), rest(&_rest) {}
};
void ss(int *ip, int *end, int target, const State &state) {
if(target < 0) return; // assuming we don't allow any negatives
if(ip==end && target==0) {
state.dump();
return;
}
if(ip==end)
return;
{ // without the first one
ss(ip+1, end, target, state);
}
{ // with the first one
int first = *ip;
ss(ip+1, end, target-first, State(first, state));
}
}
vector<int> get_primes(int N) {
int size = floor(0.5 * (N - 3)) + 1;
vector<int> primes;
primes.push_back(2);
vector<bool> is_prime(size, true);
for(long i = 0; i < size; ++i) {
if(is_prime[i]) {
int p = (i << 1) + 3;
primes.push_back(p);
// sieving from p^2, whose index is 2i^2 + 6i + 3
for (long j = ((i * i) << 1) + 6 * i + 3; j < size; j += p) {
is_prime[j] = false;
}
}
}
}
int main() {
int N;
cin >> N;
vector<int> primes = get_primes(N);
int a[primes.size()];
for (int i = 0; i < primes.size(); ++i) {
a[i] = primes[i];
}
int * start = &a[0];
int * end = start + sizeof(a) / sizeof(a[0]);
ss(start, end, N, State());
}
It takes one input N (int), and gets the vector of all prime numbers smaller than N.
Then it finds the number of unique sets from the vector that add up to N.
get_primes(N) works, but the other function doesn't.
I borrowed the other code from
How to find all matching numbers, that sums to 'N' in a given array
Please help me. I just want the number of unique sets.
You've forgotten the return primes; statement at the end of your get_primes() function.
I'm guessing the problem is:
vector<int> get_primes(int N) {
// ...
return primes; // missing this line
}
As it stands, you're just copying junk here:
vector<int> primes = get_primes(N);
Flowing off the end of a function that should return a value is undefined behavior, which in this case manifests itself as a crash.

Radix select using CUDA

I have been developing a radix select in CUDA that finds the k-th smallest element among a given number of elements. The main idea behind this radix select is that it scans through the 32-bit integers from the MSB to the LSB. It partitions all elements with a 0 bit to the left and all elements with a 1 bit to the right, and then solves the side that contains the k smallest elements recursively. My partition process works fine, but I am having problems with the recursive function calls: I am unable to stop the recursion. Please help me with that!
My kernel code looks like this (kernel.h):
#include "header.h"
#define WARP_SIZE 32
#define BLOCK_SIZE 32
__device__ int Partition(int *d_DataIn, int firstidx, int lastidx, int k, int N, int bit)
{
int threadID = threadIdx.x + BLOCK_SIZE * blockIdx.x;
int WarpID = threadID >> 5;
int LocWarpID = threadID - 32 * WarpID;
int NumWarps = N / WARP_SIZE;
int pivot;
__shared__ int DataPartition[BLOCK_SIZE];
__shared__ int DataBinary[WARP_SIZE];
for(int i = 0; i < NumWarps; i++)
{
if(LocWarpID >= firstidx && LocWarpID <=lastidx)
{
int r = d_DataIn[i * WARP_SIZE + LocWarpID];
int p = (r>>(31-bit))&1;
unsigned int B = __ballot(p);
unsigned int B_flip = ~B;
if(p==1)
{
int b = B << (32-LocWarpID);
int RightLoc = __popc(b);
DataPartition[lastidx - RightLoc] = r;
}
else
{
int b_flip = B_flip << (32 - LocWarpID);
int LeftLoc = __popc(b_flip);
DataPartition[LeftLoc] = r;
}
if(LocWarpID <= lastidx - __popc(B))
{
d_DataIn[LocWarpID] = DataPartition[LocWarpID];
}
else
{
d_DataIn[LocWarpID] = DataPartition[LocWarpID];
}
pivot = lastidx - __popc(B);
return pivot+1;
}
}
}
__device__ int RadixSelect(int *d_DataIn, int firstidx, int lastidx, int k, int N, int bit)
{
if(firstidx == lastidx)
return *d_DataIn;
int q = Partition(d_DataIn, firstidx, lastidx, k, N, bit);
int length = q - firstidx;
if(k == length)
return *d_DataIn;
else if(k < length)
return RadixSelect(d_DataIn, firstidx, q-1, k, N, bit+1);
else
return RadixSelect(d_DataIn, q, lastidx, k-length, N, bit+1);
}
__global__ void radix(int *d_DataIn, int firstidx, int lastidx, int k, int N, int bit)
{
RadixSelect(d_DataIn, firstidx, lastidx, k, N, bit);
}
The host code is in main.cu and looks like this:
#include "header.h"
#include <iostream>
#include <fstream>
#include "kernel.h"
#define BLOCK_SIZE 32
using namespace std;
int main()
{
int N = 32;
thrust::host_vector<float>h_HostFloat(N);
thrust::counting_iterator <unsigned int> Numbers(0);
thrust::transform(Numbers, Numbers + N, h_HostFloat.begin(), RandomFloatNumbers(1.f, 100.f));
thrust::host_vector<int>h_HostInt(N);
thrust::transform(h_HostFloat.begin(), h_HostFloat.end(), h_HostInt.begin(), FloatToInt());
thrust::device_vector<float>d_DeviceFloat = h_HostFloat;
thrust::device_vector<int>d_DeviceInt(N);
thrust::transform(d_DeviceFloat.begin(), d_DeviceFloat.end(), d_DeviceInt.begin(), FloatToInt());
int *d_DataIn = thrust::raw_pointer_cast(d_DeviceInt.data());
int *h_DataOut;
float *h_DataOut1;
int fsize = N * sizeof(float);
int size = N * sizeof(int);
h_DataOut = new int[size];
h_DataOut1 = new float[fsize];
int firstidx = 0;
int lastidx = BLOCK_SIZE-1;
int k = 20;
int bit = 1;
int NUM_BLOCKS = N / BLOCK_SIZE;
radix <<< NUM_BLOCKS, BLOCK_SIZE >>> (d_DataIn, firstidx, lastidx, k, N, bit);
cudaMemcpy(h_DataOut, d_DataIn, size, cudaMemcpyDeviceToHost);
WriteData(h_DataOut1, h_DataOut, 10, N);
return 0;
}
List of headers that I used:
#include "cuda.h"
#include "cuda_runtime_api.h"
#include "device_launch_parameters.h"
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/generate.h>
#include "functor.h"
#include <thrust/iterator/counting_iterator.h>
#include <thrust/copy.h>
#include <thrust/device_ptr.h>
Another header file, functor.h, converts floating-point numbers to int and generates random floating-point numbers:
#include <thrust/random.h>
#include <sstream>
#include <fstream>
#include <iomanip>
struct RandomFloatNumbers
{
float a, b;
__host__ __device__
RandomFloatNumbers(float _a, float _b) : a(_a), b(_b) {};
__host__ __device__
float operator() (const unsigned int n) const{
thrust::default_random_engine rng;
thrust::uniform_real_distribution<float> dist(a,b);
rng.discard(n);
return dist(rng);
}
};
struct FloatToInt
{
__host__ __device__
int operator() (const float &x)
const {
union {
float f_value;
int i_value;
} value;
value.f_value = x;
return value.i_value;
}
};
float IntToFloat(int &x)
{
union{
float f_value;
int i_value;
}value;
value.i_value = x;
return value.f_value;
}
bool WriteData(float *h_DataOut1, int *h_DataOut, int bit, int N)
{
std::ofstream data;
std::stringstream file;
file << "out\\Partition_";
file << std::setfill('0') <<std::setw(2) << bit;
file << ".txt";
data.open((file.str()).c_str());
if(data.is_open() == false)
{
std::cout << "File is not open" << std::endl;
return false;
}
for(int i = 0; i < N; i++)
{
h_DataOut1[i] = IntToFloat(h_DataOut[i]);
//cout << h_HostFloat[i] << " \t" << h_DataOut1[i] << endl;
//std::bitset<32>bitshift(h_DataOut[i]&1<<31-bit);
//data << bitshift[31-bit] << "\t" <<h_DataOut1[i] <<std::endl;
data << h_DataOut1[i] << std::endl;
}
data << std::endl;
data.close();
std::cout << "Partition=" <<bit <<"\n";
return true;
}
Per your request, I'm posting the code I used to investigate this and to help me study your code.
#include <stdio.h>
#include <stdlib.h>
__device__ int gpu_partition(unsigned int *data, unsigned int *partition, unsigned int *ones, unsigned int* zeroes, int bit, int idx, unsigned int* warp_ones){
int one = 0;
int valid = 0;
int my_one, my_zero;
if (partition[idx]){
valid = 1;
if(data[idx] & (1ULL<<(31-bit))) one=1;}
__syncthreads();
if (valid){
if (one){
my_one=1;
my_zero=0;}
else{
my_one=0;
my_zero=1;}
}
else{
my_one=0;
my_zero=0;}
ones[idx]=my_one;
zeroes[idx]=my_zero;
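unsigned int warp_one = __popc(__ballot_sync(0xffffffff, my_one)); // __ballot is deprecated since CUDA 9 and unavailable on compute capability 7.0+; every thread reaches this line, so a full mask is safe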
if (!(threadIdx.x & 31))
warp_ones[threadIdx.x>>5] = warp_one;
__syncthreads();
// reduce
for (int i = 16; i > 0; i>>=1){
if (threadIdx.x < i)
warp_ones[threadIdx.x] += warp_ones[threadIdx.x + i];
__syncthreads();}
return warp_ones[0];
}
__global__ void gpu_radixkernel(unsigned int *data, unsigned int m, unsigned int n, unsigned int *result){
__shared__ unsigned int loc_data[1024];
__shared__ unsigned int loc_ones[1024];
__shared__ unsigned int loc_zeroes[1024];
__shared__ unsigned int loc_warp_ones[32];
int l=0;
int bit = 0;
unsigned int u = n;
if (n<2){
if ((n == 1) && !(threadIdx.x)) *result = data[0];
return;}
loc_data[threadIdx.x] = data[threadIdx.x];
loc_ones[threadIdx.x] = (threadIdx.x<n)?1:0;
__syncthreads();
unsigned int *next = loc_ones;
do {
int s = gpu_partition(loc_data, next, loc_ones, loc_zeroes, bit++, threadIdx.x, loc_warp_ones);
if ((u-s) > m){
u = (u-s);
next = loc_zeroes;}
else{
l = (u-s);
next = loc_ones;}}
while ((u != l) && (bit<32));
if (next[threadIdx.x]) *result = loc_data[threadIdx.x];
}
int partition(unsigned int *data, int l, int u, int bit){
unsigned int *temp = (unsigned int *)malloc(((u-l)+1)*sizeof(unsigned int));
int pos = 0;
for (int i = l; i<=u; i++)
if(data[i] & (1ULL<<(31-bit))) temp[pos++] = data[i];
int result = u-pos;
for (int i = l; i<=u; i++)
if(!(data[i] & (1ULL<<(31-bit)))) temp[pos++] = data[i];
pos = 0;
for (int i = u; i>=l; i--)
data[i] = temp[pos++];
free(temp);
return result;
}
unsigned int radixselect(unsigned int *data, int l, int u, int m, int bit){
if (l == u) return(data[l]);
if (bit > 32) {printf("radixselect fail!\n"); return 0;}
int s = partition(data, l, u, bit);
if (s>=m) return radixselect(data, l, s, m, bit+1);
return radixselect(data, s+1, u, m, bit+1);
}
int main(){
unsigned int data[8] = {32767, 22, 88, 44, 99, 101, 0, 7};
unsigned int data1[8];
for (int i = 0; i<8; i++){
for (int j=0; j<8; j++) data1[j] = data[j];
printf("value[%d] = %d\n", i, radixselect(data1, 0, 7, i, 0));}
unsigned int *d_data;
cudaMalloc((void **)&d_data, 1024*sizeof(unsigned int));
unsigned int h_result, *d_result;
cudaMalloc((void **)&d_result, sizeof(unsigned int));
cudaMemcpy(d_data, data, 8*sizeof(unsigned int), cudaMemcpyHostToDevice);
for (int i = 0; i < 8; i++){
gpu_radixkernel<<<1,1024>>>(d_data, i, 8, d_result);
cudaMemcpy(&h_result, d_result, sizeof(unsigned int), cudaMemcpyDeviceToHost);
printf("gpu result index %d = %d\n", i, h_result);
}
unsigned int data2[1024];
unsigned int data3[1024];
for (int i = 0; i < 1024; i++) data2[i] = rand();
cudaMemcpy(d_data, data2, 1024*sizeof(unsigned int), cudaMemcpyHostToDevice);
for (int i = 0; i < 1024; i++){
for (int j = 0; j<1024; j++) data3[j] = data2[j];
unsigned int cpuresult = radixselect(data3, 0, 1023, i, 0);
gpu_radixkernel<<<1,1024>>>(d_data, i, 1024, d_result);
cudaMemcpy(&h_result, d_result, sizeof(unsigned int), cudaMemcpyDeviceToHost);
if (h_result != cpuresult) {printf("mismatch at index %d, cpu: %d, gpu: %d\n", i, cpuresult, h_result); return 1;}
}
printf("Finished\n");
return 0;
}
Here are some notes, in no particular order:
I got rid of all your thrust code; it's not doing anything useful as far as the radix select algorithm is concerned. I also find your reinterpretation of float bits as int curious. I haven't thought through the ramifications of doing a bitwise radix select, in order, on a sequence of exponent bits followed by a sequence of mantissa bits. It might work (although if you include the sign bit, I think it definitely won't), but again I don't think it's central to understanding the algorithm.
I included a host version that I wrote just to check my device results.
I'm pretty sure this algorithm will fail in some cases with duplicated elements. For example, if you hand it a vector of all zeroes, I think it will fail. I don't think that case would be difficult to handle, however.
My host version is recursive, but my device version is not. I don't think recursion is very useful here, since the non-recursive form of the algorithm is easy to write as well, especially as there are at most 32 bits to step through. Still, if you wanted a recursive device version, it should not be difficult to create one by incorporating the u, s, and l manipulation code into the partition function.
I have dispensed with the usual CUDA error checking, but I recommend it.
I don't consider this to be a paragon of CUDA programming. If you delve into, for example, a radix sort algorithm (such as here), you will see that it is pretty complex. A fast GPU radix select would look nothing like my code. I wrote my code to be analogous to the serial recursive partitioned radix sort, which is not the best approach on a massively parallel architecture.
Since radix select is not a sort, I attempted to write a device code that would do no data movement of the input data, since I considered this to be expensive and unnecessary. I do a single read from global memory for the data at the beginning of the kernel, and thereafter I do all work out of shared memory, and even in shared memory I am not re-arranging the data (as I do in my host version) so as to avoid the cost of data movement. Instead I keep flag arrays of ones and zeroes partitions, to feed to the next partitioning step. The data movement would involve a fair amount of uncoalesced and/or bank-conflicted traffic, whereas the flag arrays allow all accesses to be non-bank-conflicted.

Empty Destructor Crashing Program: C++

The following program calculates all primes up to really large numbers (e.g. 600,851,475,143). Everything works so far, except that when I put in large numbers, the destructor crashes the application. Can anyone see what is wrong with my application?
After rechecking my solution, I see that the answer is wrong, but the question is still valid.
#include <iostream>
#include <iterator>
#include <algorithm>
#include <vector>
#include <cmath>
#include <stdexcept>
#include <climits>
typedef std::vector<unsigned long long>::const_iterator prime_it;
#define MAX_COL 900000
struct large_vector
{
public:
large_vector(unsigned long long size, unsigned int row) :
m_Row(row)
{
m_RowVector.reserve(size);
}
std::vector<bool> m_RowVector;
unsigned int m_Row;
};
struct prime_factor
{
public:
prime_factor(unsigned long long N);
~prime_factor() {}
void print_primes();
private:
std::vector<bool> m_Primes;
std::vector<large_vector>m_Vect_Primes;
unsigned long long m_N;
};
prime_factor::prime_factor(unsigned long long N) :
m_N(N)
{
// If number is odd then we need the ceiling of N/2 / MAX_COL
int number_of_vectors = (m_N % MAX_COL == 0) ? (m_N / MAX_COL) : ((m_N / MAX_COL) + 1);
std::cout << "There will be " << number_of_vectors << " rows";
if (number_of_vectors != 0) {
for (int x = 0; x < number_of_vectors; ++x) {
m_Vect_Primes.push_back(large_vector(MAX_COL, x));
}
m_Vect_Primes[0].m_RowVector[0] = false;
m_Vect_Primes[0].m_RowVector[1] = false;
unsigned long long increment = 2;
unsigned long long index = 0;
while (index < m_N) {
for (index = 2*increment; index < m_N; index += increment) {
unsigned long long row = index/MAX_COL;
unsigned long long col = index%MAX_COL;
m_Vect_Primes[row].m_RowVector[col] = true;
}
while (m_Vect_Primes[increment/MAX_COL].m_RowVector[increment%MAX_COL]) {
increment++;
}
}
}
}
void prime_factor::print_primes()
{
for (int index = 0; index < m_N; ++index) {
if (m_Vect_Primes[index/MAX_COL].m_RowVector[index%MAX_COL] == false) {
std::cout << index << " ";
}
}
}
/*!
* Driver
*/
int main(int argc, char *argv[])
{
static const unsigned long long N = 600851475143;
prime_factor pf(N);
pf.print_primes();
}
Update
I am pretty sure this is a working version:
#include <iostream>
#include <iterator>
#include <algorithm>
#include <vector>
#include <cmath>
#include <stdexcept>
#include <climits>
typedef std::vector<unsigned long long>::const_iterator prime_it;
#define MAX_COL 900000
struct large_vector
{
public:
large_vector(unsigned long long size, unsigned int row) :
m_Row(row)
{
m_RowVector.resize(size);
}
std::vector<bool> m_RowVector;
unsigned int m_Row;
};
struct prime_factor
{
public:
prime_factor(unsigned long long N);
~prime_factor() {}
void print_primes();
private:
std::vector<bool> m_Primes;
std::vector<large_vector>m_Vect_Primes;
unsigned long long m_N;
};
prime_factor::prime_factor(unsigned long long N) :
m_N(N)
{
// If number is odd then we need the ceiling of N/2 / MAX_COL
int number_of_vectors = (m_N % MAX_COL == 0) ? ((m_N/2) / MAX_COL) : (((m_N/2) / MAX_COL) + 1);
std::cout << "There will be " << number_of_vectors << " rows";
if (number_of_vectors != 0) {
for (int x = 0; x < number_of_vectors; ++x) {
m_Vect_Primes.push_back(large_vector(MAX_COL, x));
}
m_Vect_Primes[0].m_RowVector[0] = false;
m_Vect_Primes[0].m_RowVector[1] = false;
unsigned long long increment = 2;
unsigned long long index = 0;
while (index < m_N) {
for (index = 2*increment; index < m_N/2; index += increment) {
unsigned long long row = index/MAX_COL;
unsigned long long col = index%MAX_COL;
m_Vect_Primes[row].m_RowVector[col] = true;
}
increment += 1;
while (m_Vect_Primes[increment/MAX_COL].m_RowVector[increment%MAX_COL]) {
increment++;
}
}
}
}
void prime_factor::print_primes()
{
for (unsigned long long index = 0; index < m_N/2; ++index) {
if (m_Vect_Primes[index/MAX_COL].m_RowVector[index%MAX_COL] == false) {
std::cout << index << " ";
}
}
}
/*!
* Driver
*/
int main(int argc, char *argv[])
{
static const unsigned long long N = 400;
prime_factor pf(N);
pf.print_primes();
}
Your usage of reserve is incorrect.
m_RowVector.reserve(size);
Here m_RowVector has space reserved so that the vector can grow without being reallocated, BUT the size of m_RowVector is still 0, so accessing any element is still undefined behavior. You must change the size of the vector with resize(), or add elements with push_back(), before you access them.
I can't see anything else wrong, but I suspect you have other out-of-range index problems. I would replace operator[] with the at() method: it throws an exception (std::out_of_range) when you access an element past the end of the vector, which gives you a clue to the actual location of the error.