Assigning and retrieving bit-wise memory value for Genetic Algo - c++

I came across this code for developing a class for GA/GP, but I failed to understand it and so am unable to debug the program.
typedef struct {
void *dataPointer;
int length;
} binary_data;
typedef struct {
organism *organisms; //This must be malloc'ed
int organismsCount;
int (*fitnessTest)(organism org);
int orgDnaLength;
unsigned int desiredFitness;
void (*progress)(unsigned int fitness);
} evolutionary_algorithm;
The above is straightforward. Then we try to initialize the organisms before testing their fitness, etc.
int main(int argc, char *argv[])
{
srand(time(NULL));
int i;
evolutionary_algorithm ea;
ea.progress = progressDisplayer;
ea.organismsCount = 50;
ea.orgDnaLength = sizeof(unsigned int);
organism *orgs =(organism *) malloc(sizeof(organism) * ea.organismsCount);
for (i = 0; i < 50; i++)
{
organism newOrg;
binary_data newOrgDna;
newOrgDna.dataPointer = malloc(sizeof(unsigned int));
memset(newOrgDna.dataPointer, i, 1);
newOrgDna.length = sizeof(unsigned int);
newOrg.dna = newOrgDna;
orgs[i] = newOrg;
}
As far as I understand, memset() tries to write a binary value into the memory location that the void pointer (newOrgDna.dataPointer) refers to, and so on. But I can't figure out how to reassemble those binary values to get the integer value assigned to the variable "dna" of newOrg, so that I can check the integer value assigned to an individual organism and eventually to the entire population residing in the memory assigned to "orgs".
As you can guess from the above, I am not very familiar with memory management at this level of detail, so your help is very much appreciated.
Thank you so much

This code looks a bit strange. This line:
newOrgDna.dataPointer = malloc(sizeof(unsigned int));
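will allocate sizeof(unsigned int) bytes, which is typically 4 on both 32-bit and 64-bit desktop platforms. The strange part is that the memset on the line just below sets only the first byte, so the remaining bytes stay uninitialized.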
To get the value of that one byte you might do:
unsigned char val = *((unsigned char*) newOrgDna.dataPointer);
But, as I said, this code looks a bit off. I would rewrite it as:
for (i = 0; i < 50; i++)
{
organism newOrg;
binary_data newOrgDna;
unsigned int * data = (unsigned int*) malloc(sizeof(unsigned int));
*data = i;
newOrgDna.length = sizeof(*data);
newOrgDna.dataPointer = data; // no cast needed: any object pointer converts implicitly to void*
newOrg.dna = newOrgDna;
orgs[i] = newOrg;
}
Then, wherever you want to get the data from an organism *, you can do:
void f( organism * o )
{
assert( sizeof(unsigned int) == o->dna.length );
unsigned int data = *((unsigned int*) o->dna.dataPointer);
}
Also, this is a C question rather than a C++ one.
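To make the difference between writing one byte and writing the whole integer concrete, here is a minimal sketch. The helper names make_dna and read_dna are hypothetical (they are not part of the original program), and binary_data is repeated from the question so the block stands on its own. It uses memcpy rather than pointer casts, which sidesteps alignment and aliasing concerns.

```cpp
#include <cstdlib>
#include <cstring>

// Repeated from the question so the sketch is self-contained.
typedef struct {
    void *dataPointer;
    int length;
} binary_data;

// Hypothetical helper: allocate DNA holding one unsigned int.
// On allocation failure, dataPointer is null and length is 0.
binary_data make_dna(unsigned int value) {
    binary_data dna;
    dna.dataPointer = std::malloc(sizeof value);
    dna.length = dna.dataPointer ? static_cast<int>(sizeof value) : 0;
    if (dna.dataPointer) {
        // Copies all sizeof(unsigned int) bytes, unlike memset(ptr, i, 1),
        // which writes a single byte and leaves the rest uninitialized.
        std::memcpy(dna.dataPointer, &value, sizeof value);
    }
    return dna;
}

// Hypothetical helper: reassemble the stored bytes into an unsigned int.
// Assumes dna came from make_dna and the allocation succeeded.
unsigned int read_dna(const binary_data &dna) {
    unsigned int value = 0;
    std::memcpy(&value, dna.dataPointer, sizeof value);
    return value;
}
```

In the loop from the question, make_dna(i) would replace the malloc/memset pair, and read_dna(orgs[i].dna) would give back i for each organism. The caller still has to free dataPointer.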

Related

Value at index in *unsigned char

I have a class that holds audio data bytes:
class clsAudioData
{
private:
unsigned char *m_content;
long m_size;
public:
clsAudioData();
~clsAudioData();
void Load(string file);
long Size();
unsigned char *Content();
void LoadContent(long size, FILE *f);
};
void clsAudioData::LoadContent(long size, FILE *f)
{
m_size =size;
m_content = new unsigned char[m_size];
fread(m_content, sizeof(unsigned char), m_size,f);
}
I'm trying to printf values at certain positions.
To do that, I tried:
for (int i = 0; i < 20; i++)
{
printf("audio data = %d\n", nAudioData.Content[i]);
}
The compiler tells me:
clsAudioData::Content Function doesn't accept 1 argument
How could I access an "element" at a certain index to printf it?
Thank you.
You have to call the Content function first and then index its result: nAudioData.Content()[i].
Aside: please make m_content a std::vector<unsigned char>. You'll have far less chance of leaking memory.
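Both points can be sketched together. The class below is a hypothetical, trimmed-down variant of clsAudioData (the name AudioData and the helper sum_first are made up for illustration); it keeps the bytes in a std::vector and shows the call-then-index pattern.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical variant of clsAudioData that stores its bytes in a
// std::vector, so there is no manual new[]/delete[] to get wrong.
class AudioData {
public:
    explicit AudioData(const std::vector<unsigned char> &bytes) : m_content(bytes) {}

    // The accessor is a function: call it first, then index the result.
    const unsigned char *Content() const { return m_content.data(); }
    std::size_t Size() const { return m_content.size(); }

private:
    std::vector<unsigned char> m_content;
};

// Sums the first n bytes (clamped to the buffer size) via Content()[i].
unsigned long sum_first(const AudioData &audio, std::size_t n) {
    unsigned long total = 0;
    for (std::size_t i = 0; i < n && i < audio.Size(); ++i) {
        total += audio.Content()[i];  // note the () before [i]
    }
    return total;
}
```

The same expression works inside printf: an unsigned char argument is promoted to int, so "%d" is a valid format for audio.Content()[i]. Clamping against Size() also avoids reading past the end when the file holds fewer than 20 bytes.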

Malloc outside of main() or any other function (i.e. at global scope)

I wanted a heap allocated buffer common to a class (to use as a scratchpad during computations). At some point I may free and then reallocate the buffer if it is not large enough. I wanted the buffer to exist without having to call a "myclass::initialize();" in main(); I came up with the following code that compiles and works well for my purpose.
My questions are: Why does this code compile correctly? Why is malloc() allowed to be outside of main() or any other function? Is the compiler interpreting this somehow and removing the malloc?
Code compiled on linux 64bit using "g++ example.cpp" and checked with valgrind
// example.cpp
#include <cstdio>
#include <cstdlib>
class myclass {
public:
static char* pbuf; // buffer
static unsigned int length; // buffer length
const static unsigned int chunk_size; // allocation chunk size
};
// set constants and allocate buffer
const unsigned int myclass::chunk_size = sizeof(long unsigned int) * 8;
unsigned int myclass::length = chunk_size; // start with smallest chunk
char* myclass::pbuf = (char*)malloc(sizeof(char)*myclass::length);
int main() {
// write to buffer (0 to 63 on 64bit machine)
for (int i = 0; i < myclass::length; i++) {
*(myclass::pbuf+i) = i;
}
// read from buffer (print the numbers 0 to 63)
for (int i = 0; i < myclass::length; i++) {
printf("%d\n", *(myclass::pbuf+i));
}
free(myclass::pbuf); // last line of program
}
Thanks for the answers. Sounds like this is more common than I thought. "Function calls are allowed in static initializers". This leads me to a slightly modified version that catches a possible malloc error:
#include <cstdio>
#include <cstdlib>
class myclass {
public:
static char* pbuf; // buffer
static unsigned int length; // buffer length
const static unsigned int chunk_size; // allocation chunk size
static void* malloc_buf(unsigned int);
};
// set constants and allocate buffer
const unsigned int myclass::chunk_size = sizeof(long unsigned int) * 8;
unsigned int myclass::length = chunk_size; // start with smallest chunk
//char* myclass::pbuf = (char*)malloc(sizeof(char)*myclass::length);
char* myclass::pbuf = (char*)myclass::malloc_buf(sizeof(char)*myclass::length);
void* myclass::malloc_buf(unsigned int N) {
void* buf = malloc(N);
if (!buf) exit(EXIT_FAILURE);
return buf;
}
int main() {
// write to buffer (0 to 63 on 64bit machine)
for (int i = 0; i < myclass::length; i++) {
*(myclass::pbuf+i) = i;
}
// read from buffer (print the numbers 0 to 63)
for (int i = 0; i < myclass::length; i++) {
printf("%d\n", *(myclass::pbuf+i));
}
free(myclass::pbuf); // last line of program
}
It's just doing static initialization (initialization before main is called). Static initializers are allowed to call functions.
main() is just another function - which is why it has such specific requirements placed on it to allow it to be called properly.
Other things can and do happen before it gets called. Static initialization among them.
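The same mechanism can be shown without a class. This is a minimal sketch, assuming a typical implementation where namespace-scope objects are initialized before main() is entered; checked_alloc, scratch and scratch_is_ready are hypothetical names.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: allocate or terminate, so the result is never null.
static char *checked_alloc(std::size_t n) {
    void *p = std::malloc(n);
    if (!p) std::exit(EXIT_FAILURE);
    return static_cast<char *>(p);
}

// Namespace-scope objects. scratch has a non-constant initializer, so the
// call to checked_alloc runs during dynamic initialization of this
// translation unit, not inside any function the programmer calls.
static const std::size_t scratch_size = 64;
static char *scratch = checked_alloc(scratch_size);

// Hypothetical check that main() or any later code could call.
bool scratch_is_ready() { return scratch != nullptr; }
```

One caveat with this pattern: the order of initialization across different translation units is not specified, so another global in a different .cpp file should not rely on scratch already being allocated.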

Why does VS claim this violates memory?

I needed to enforce some SSE memory boundaries for the code I'm writing, but I'm having some trouble with Visual Studio's memory checker. Why does VS believe that memory is getting corrupted?
#define sse_t float* __restrict
#include <iostream>
#include <assert.h>
#include <stdio.h>
using namespace std;
class AlignedContainer {
public:
AlignedContainer(int n = 0, int frames = 0, size_t align = 16) {
assert((align & (align - 1)) == 0);
int bufferSize = sizeof(float) * n;
for (int i = 0; i < frames; i++) {
int alignedSize = bufferSize + 15;
auto aqbuf = new unsigned char[alignedSize];
auto aligned = reinterpret_cast < unsigned char *>((reinterpret_cast < size_t > (aqbuf) + 15) & ~15); // 16-byte alignment in preparation for SSE
memset(aqbuf, 0, alignedSize); // for debugging, forces memory to instantly allocate
AcqBuffers.push_back(aqbuf);
displayFrames.push_back(aligned);
}
}
~AlignedContainer() {
for (int i = 0; i < AcqBuffers.size(); i++) {
delete[]AcqBuffers[i];
}
AcqBuffers.empty();
displayFrames.empty();
}
inline sse_t operator [] (int i) const {
return (sse_t) displayFrames[i];
}
private:
vector < void *>displayFrames;
vector < void *>AcqBuffers;
};
int main(int argc, char *argv[])
{
int h = 2160;
int w = 2544;
AlignedContainer ac;
ac = AlignedContainer(h * w, 4);
}
Error at the last line.
/***
*static int CheckBytes() - verify byte range set to proper value
*
*Purpose:
* verify byte range set to proper value
*
*Entry:
* unsigned char *pb - pointer to start of byte range
* unsigned char bCheck - value byte range should be set to
* size_t nSize - size of byte range to be checked
*
*Return:
* TRUE - if all bytes in range equal bcheck
* FALSE otherwise
*
*******************************************************************************/
extern "C" static int __cdecl CheckBytes(
unsigned char * pb,
unsigned char bCheck,
size_t nSize
)
{
while (nSize--)
{
if (*pb++ != bCheck)
{
return FALSE;
}
}
return TRUE;
}
The statement
ac = AlignedContainer(h * w, 4);
first creates a temporary object, which is then copied (with the copy-assignment operator) to ac. Because you don't provide a copy-assignment operator, the default one is invoked, and it only does a shallow copy. So when the temporary object is destroyed, the memory allocated by the temporary is deleted, and the ac object is left with pointers to freed memory, which it then tries to delete itself.
You need to read about the rule of three.
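As a rough illustration of the rule of three, here is a small owning buffer with a user-defined destructor, copy constructor and copy assignment. FrameBuffer is a hypothetical class, not the AlignedContainer from the question, and the copy-and-swap form of the assignment operator is just one common way to write it.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical owning buffer that follows the rule of three.
class FrameBuffer {
public:
    explicit FrameBuffer(std::size_t n = 0)
        : m_size(n), m_data(n ? new unsigned char[n]() : nullptr) {}

    // Deep copy: the new object gets its own allocation.
    FrameBuffer(const FrameBuffer &other)
        : m_size(other.m_size),
          m_data(other.m_size ? new unsigned char[other.m_size] : nullptr) {
        std::copy(other.m_data, other.m_data + m_size, m_data);
    }

    // Copy-and-swap: the parameter is already a copy, so swapping hands the
    // old memory to it; its destructor releases that memory on return.
    FrameBuffer &operator=(FrameBuffer other) {
        std::swap(m_size, other.m_size);
        std::swap(m_data, other.m_data);
        return *this;
    }

    ~FrameBuffer() { delete[] m_data; }

    std::size_t size() const { return m_size; }
    unsigned char *data() { return m_data; }

private:
    std::size_t m_size;
    unsigned char *m_data;
};
```

With this in place, a statement like a = FrameBuffer(8); leaves a owning its own allocation, and every block is deleted exactly once. Storing std::vector<unsigned char> members instead of raw pointers would make the compiler-generated copy operations correct without any of this code.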
When I tried to run your code I discovered the following:
Your code is missing the rvalue (move) assignment operator. Without it, the pointers stored in AcqBuffers are copied into ac when you call ac = AlignedContainer(h * w, 4);
The temporary still holds the same pointers and deletes them when it is destroyed. When the destructor of ac is called later, it deletes the contents of AcqBuffers again, causing the runtime error.
To fix this you need to add this:
AlignedContainer& operator = (AlignedContainer && rv)
{
displayFrames = std::move(rv.displayFrames);
AcqBuffers = std::move(rv.AcqBuffers);
return (*this);
}
Raxvan.

How to cast from char pointer to custom object pointer

I'm using leveldb to store key-value pairs of integers and MyClass objects. Actually, a key can contain more than one of these objects.
The problem I have appears when retrieving the data from the database. It compiles, however the values of the MyClass members are not the one I put into the database.
std::string value;
leveldb::Slice keySlice = ANYKEY;
levelDBObj->Get(leveldb::ReadOptions(), keySlice, &value);
The std::string value can now contain one MyClass object or more. So how do I get them out?
I already tried the following, which didn't work:
1.) directly typecasting and memcpy
std::vector<MyClass> vObjects;
MyClass* obj = (MyClass*)malloc( value.size());
memcpy((void*)obj, (void*) (value.c_str()), value.size());
MyClass dummyObj;
int numValues = value.size()/sizeof(MyClass);
for( int i=0; i<numValues; ++i) {
dummyObj = *(obj+i);
vObjects.push_back(dummyObj);
}
2.) reinterpret_cast to void pointer
MyClass* obj = (MyClass*)malloc( value.size());
const void* vobj = reinterpret_cast<const void*>( value.c_str() );
int numValues = value.size()/sizeof(MyClass);
for( int i=0; i<numValues; ++i) {
const MyClass dummyObj = *(reinterpret_cast<const MyClass*>(vobj)+i);
vObjects.push_back(dummyObj);
}
MyClass is a collection of several public members, e.g. unsigned int and unsigned char, and it has a stable size.
I know that there are similar problems with only one object. But in my case the vector can contain more than one, and it comes from the leveldb database.
EDIT: SOLUTION
I wrote (de)serialization method for MyClass which then made it working. Thanks for the hint!
void MyClass::serialize( char* outBuff ) {
memcpy(outBuff, (const void*) &aVar, sizeof(aVar));
unsigned int c = sizeof(aVar);
memcpy(outBuff+c, (const void*) &bVar, sizeof(bVar));
c += sizeof(bVar);
/* and so on */
}
void MyClass::deserialize( const char* inBuff ) {
memcpy((void*) &aVar, inBuff, sizeof(aVar));
unsigned int c = sizeof(aVar);
memcpy((void*) &bVar, inBuff+c, sizeof(bVar));
c += sizeof(bVar);
/* and so on */
}
The get method is as follows (put analogously):
int getValues(leveldb::Slice keySlice, std::vector<MyObj>& values) const {
std::string value;
leveldb::Status status = levelDBObj->Get(leveldb::ReadOptions(), keySlice, &value);
if (!status.ok()) {
values.clear();
return -1;
}
int nValues = value.size()/sizeof(MyObj);
MyObj dummyObj;
for( int i=0; i<nValues; ++i) {
dummyObj.deserialize(value.c_str()+i*sizeof(MyObj));
values.push_back(dummyObj);
}
return 0;
}
You have to serialize your class... otherwise, you're just taking some memory and writing it to leveldb. Whatever you get back is not only going to be different, but will probably be completely useless too. Check out this question for more info on serialization: How do you serialize an object in C++?
LevelDB does support multiple objects under one key; however, try to avoid doing that unless you have a really good reason. I would recommend that you hash each object with a unique hash (see Google's CityHash if you want a hashing function) and store the serialized objects with their corresponding hash. If your object is a collection in itself, then you have to serialize all of its elements to an array of bytes and have some method that allows you to determine where each element begins and ends.
Update
A serializable class would look something like this:
#include <cstring> // memcpy
#include <string>
using std::string;
class MyClass
{
private:
int _numeric;
string _text;
public:
// constructors
// mutators
void SetNumeric(int num);
void SetText(string text);
// not static: the size depends on this object's text
unsigned int SerializableSize() const
{
// returns the serializable size of the object with the schema:
// sizeof(int) bytes for the numeric (integer), typically 4
// sizeof(unsigned int) bytes for the size of the text, typically 4
// n bytes for the text (it has a variable size)
return sizeof(int) + sizeof(unsigned int) + (unsigned int)_text.size();
}
// serialization (the buffer is written to, so it must not be const)
int Serialize(char* buffer, const unsigned int bufferLen, const unsigned int position) const
{
// check if the object can be serialized in the available buffer space
if(position+SerializableSize()>bufferLen)
{
// don't write anything and return -1 signaling that there was an error
return -1;
}
unsigned int finalPosition = position;
unsigned int textSize = (unsigned int)_text.size();
// write the numeric value (memcpy avoids unaligned writes through a cast pointer)
memcpy(buffer + finalPosition, &_numeric, sizeof(int));
// move the final position past the numeric value
finalPosition += sizeof(int);
// write the size of the text
memcpy(buffer + finalPosition, &textSize, sizeof(unsigned int));
// move the final position past the size of the string
finalPosition += sizeof(unsigned int);
// write the string
memcpy(buffer + finalPosition, _text.c_str(), textSize);
// move the final position past the end of the string
finalPosition += textSize;
// return the number of bytes written to the buffer
return finalPosition-position;
}
// deserialization
static int Deserialize(MyClass& myObject,
const char* buffer,
const unsigned int buffSize,
const unsigned int position)
{
unsigned int currentPosition = position;
// make sure the two fixed-size fields fit before reading them
if(currentPosition + sizeof(int) + sizeof(unsigned int) > buffSize)
{
// you decide what to do here; returning -1 mirrors Serialize
return -1;
}
// copy the numeric value
int numeric;
memcpy(&numeric, buffer + currentPosition, sizeof(int));
// increment the current position past the numeric value
currentPosition += sizeof(int);
// copy the size of the text
unsigned int textSize;
memcpy(&textSize, buffer + currentPosition, sizeof(unsigned int));
// increment the current position past the size of the text
currentPosition += sizeof(unsigned int);
// make sure the text itself fits in the buffer
if(textSize > buffSize - currentPosition)
{
return -1;
}
// copy the text
string text((buffer+currentPosition), textSize);
// increment the current position past the text
currentPosition += textSize;
// Set your object's values
myObject.SetNumeric(numeric);
myObject.SetText(text);
// return the number of bytes deserialized
return currentPosition - position;
}
};
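For the case from the question, where several objects live under one key, the length-prefix idea can be sketched as follows. Record, append_record and decode_records are hypothetical names, and the blob is a plain std::string only because that is the type the question already reads its value into; no LevelDB call is used here.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical record with one fixed-size and one variable-size field.
struct Record {
    int numeric;
    std::string text;
};

// Appends: numeric, text length, text bytes. Uses the machine's native
// byte order and int size, so the blob is not portable across platforms.
void append_record(std::string &out, const Record &r) {
    unsigned int len = static_cast<unsigned int>(r.text.size());
    out.append(reinterpret_cast<const char *>(&r.numeric), sizeof r.numeric);
    out.append(reinterpret_cast<const char *>(&len), sizeof len);
    out.append(r.text);
}

// Decodes every record in `in`. Returns false if the data is truncated;
// records decoded before the truncation point remain in `out`.
bool decode_records(const std::string &in, std::vector<Record> &out) {
    std::size_t pos = 0;
    while (pos < in.size()) {
        Record r;
        unsigned int len = 0;
        if (in.size() - pos < sizeof r.numeric + sizeof len) return false;
        std::memcpy(&r.numeric, in.data() + pos, sizeof r.numeric);
        pos += sizeof r.numeric;
        std::memcpy(&len, in.data() + pos, sizeof len);
        pos += sizeof len;
        if (in.size() - pos < len) return false;
        r.text.assign(in.data() + pos, len);
        pos += len;
        out.push_back(r);
    }
    return true;
}
```

The stored length is what tells the decoder where each object ends, so the records do not need to have the same size and no division by sizeof is involved.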

Efficient way to save objects into binary files

I've a class that consists basically of a matrix of vectors: vector< MyFeatVector<T> > m_vCells, where the outer vector represents the matrix. Each element in this matrix is then a vector (I extended the stl vector class and named it MyFeatVector<T>).
I'm trying to code an efficient method to store objects of this class in binary files.
Up to now, I require three nested loops:
foutput.write( reinterpret_cast<char*>( &(this->at(dy,dx,dz)) ), sizeof(T) );
where this->at(dy,dx,dz) retrieves the dz element of the vector at position [dy,dx].
Is there any way to store the m_vCells private member without using loops? I tried something like: foutput.write(reinterpret_cast<char*>(&(this->m_vCells[0])), (this->m_vCells.size())*sizeof(CFeatureVector<T>)); which does not seem to work correctly. We can assume that all the vectors in this matrix have the same size, although a more general solution is also welcome :-)
Furthermore, with my nested-loop implementation, storing objects of this class in binary files seems to require more physical space than storing the same objects in plain-text files, which is a bit weird.
I was trying to follow the suggestion under http://forum.allaboutcircuits.com/showthread.php?t=16465 but couldn't arrive at a proper solution.
Thanks!
Below is a simplified example of my serialization and deserialization methods.
template < typename T >
bool MyFeatMatrix<T>::writeBinary( const string & ofile ){
ofstream foutput(ofile.c_str(), ios::out|ios::binary);
foutput.write(reinterpret_cast<char*>(&this->m_nHeight), sizeof(int));
foutput.write(reinterpret_cast<char*>(&this->m_nWidth), sizeof(int));
foutput.write(reinterpret_cast<char*>(&this->m_nDepth), sizeof(int));
//foutput.write(reinterpret_cast<char*>(&(this->m_vCells[0])), nSze*sizeof(CFeatureVector<T>));
for(register int dy=0; dy < this->m_nHeight; dy++){
for(register int dx=0; dx < this->m_nWidth; dx++){
for(register int dz=0; dz < this->m_nDepth; dz++){
foutput.write( reinterpret_cast<char*>( &(this->at(dy,dx,dz)) ), sizeof(T) );
}
}
}
foutput.close();
return true;
}
template < typename T >
bool MyFeatMatrix<T>::readBinary( const string & ifile ){
ifstream finput(ifile.c_str(), ios::in|ios::binary);
int nHeight, nWidth, nDepth;
finput.read(reinterpret_cast<char*>(&nHeight), sizeof(int));
finput.read(reinterpret_cast<char*>(&nWidth), sizeof(int));
finput.read(reinterpret_cast<char*>(&nDepth), sizeof(int));
this->resize(nHeight, nWidth, nDepth);
for(register int dy=0; dy < this->m_nHeight; dy++){
for(register int dx=0; dx < this->m_nWidth; dx++){
for(register int dz=0; dz < this->m_nDepth; dz++){
finput.read( reinterpret_cast<char*>( &(this->at(dy,dx,dz)) ), sizeof(T) );
}
}
}
finput.close();
return true;
}
The most efficient method is to store the objects in an array (or other contiguous space), then blast the buffer to the file. An advantage is that the disk platters don't have to waste time ramping up, and the writing can be performed contiguously instead of at random locations.
If this is your performance bottleneck, you may want to consider using multiple threads, with one extra thread to handle the output. Dump the objects into a buffer, set a flag, and the writing thread will handle the output, relieving your main task to perform more important work.
Edit 1: Serializing Example
The following code has not been compiled and is for illustrative purposes only.
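#include <algorithm>
#include <cstring>
#include <fstream>
#include <string>
using std::ofstream;
using std::fill;
class binary_stream_interface
{
public: // the interface must be public to be callable from outside
virtual ~binary_stream_interface() {}
virtual void load_from_buffer(const unsigned char *& buf_ptr) = 0;
virtual size_t size_on_stream(void) const = 0;
virtual void store_to_buffer(unsigned char *& buf_ptr) const = 0;
};
struct Pet
: public binary_stream_interface
{
std::string name;
unsigned int age;
const unsigned int max_name_length;
// a data member is initialized in the constructor, not in the base list
Pet() : age(0), max_name_length(32) {}
void load_from_buffer(const unsigned char *& buf_ptr)
{
// memcpy avoids unaligned access through a cast pointer
memcpy(&age, buf_ptr, sizeof(unsigned int));
buf_ptr += sizeof(unsigned int);
// the name is a fixed-width field that may have no terminating '\0'
const char * field = (const char *) buf_ptr;
name = std::string(field, std::find(field, field + max_name_length, '\0'));
buf_ptr += max_name_length;
return;
}
size_t size_on_stream(void) const
{
return sizeof(unsigned int) + max_name_length;
}
void store_to_buffer(unsigned char *& buf_ptr) const
{
memcpy(buf_ptr, &age, sizeof(unsigned int));
buf_ptr += sizeof(unsigned int);
// pad the whole field with '\0' first, then copy the name
fill(buf_ptr, buf_ptr + max_name_length, (unsigned char) 0);
strncpy((char *) buf_ptr, name.c_str(), max_name_length);
buf_ptr += max_name_length;
return;
}
};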
int main(void)
{
Pet dog;
dog.name = "Fido";
dog.age = 5;
ofstream data_file("pet_data.bin", std::ios::binary);
// Determine size of buffer
size_t buffer_size = dog.size_on_stream();
// Allocate the buffer
unsigned char * buffer = new unsigned char [buffer_size];
unsigned char * buf_ptr = buffer;
// Write / store the object into the buffer.
dog.store_to_buffer(buf_ptr);
// Write the buffer to the file / stream.
data_file.write((char *) buffer, buffer_size);
data_file.close();
delete [] buffer;
return 0;
}
Edit 2: A class with a vector of strings
class Many_Strings
: public binary_stream_interface
{
public:
enum {MAX_STRING_SIZE = 32};
size_t size_on_stream(void) const
{
return m_string_container.size() * MAX_STRING_SIZE // Total size of strings.
+ sizeof(size_t); // with room for the quantity variable.
}
void store_to_buffer(unsigned char *& buf_ptr) const
{
// Treat the vector<string> as a variable length field.
// Store the quantity of strings into the buffer,
// followed by the content.
size_t string_quantity = m_string_container.size();
memcpy(buf_ptr, &string_quantity, sizeof(size_t)); // needs <cstring>
buf_ptr += sizeof(size_t);
for (size_t i = 0; i < string_quantity; ++i)
{
// Each string is a fixed length field.
// Pad with '\0' first, then copy the data.
std::fill(buf_ptr, buf_ptr + MAX_STRING_SIZE, (unsigned char) 0);
strncpy((char *) buf_ptr, m_string_container[i].c_str(), MAX_STRING_SIZE);
buf_ptr += MAX_STRING_SIZE;
}
}
void load_from_buffer(const unsigned char *& buf_ptr)
{
// The actual coding is left as an exercise for the reader.
// Pseudo code:
// Clear / empty the string container.
// load the quantity variable.
// increment the buffer variable by the size of the quantity variable.
// for each new string (up to the quantity just read)
// load a temporary string from the buffer via buffer pointer.
// push the temporary string into the vector
// increment the buffer pointer by the MAX_STRING_SIZE.
// end-for
}
std::vector<std::string> m_string_container; // needs <vector> and <string>
};
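The loading pseudocode can be sketched as a free function. This is only one possible reading of it: load_strings is a hypothetical name, the field width is repeated as its own enum so the block compiles on its own, and the function trusts that the buffer really holds as many fields as the stored count claims.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

enum { MAX_STRING_SIZE = 32 };  // same fixed field width as in the class

// Hypothetical free function following the pseudocode: reads a count,
// then that many fixed-width, zero-padded string fields.
// Assumes the buffer contains count * MAX_STRING_SIZE more bytes.
void load_strings(const unsigned char *&buf_ptr, std::vector<std::string> &out) {
    out.clear();
    std::size_t quantity = 0;
    std::memcpy(&quantity, buf_ptr, sizeof quantity);
    buf_ptr += sizeof quantity;
    for (std::size_t i = 0; i < quantity; ++i) {
        const char *field = reinterpret_cast<const char *>(buf_ptr);
        // Stop at the first '\0', or at the field end if it is completely full.
        const char *end = std::find(field, field + MAX_STRING_SIZE, '\0');
        out.push_back(std::string(field, end));
        buf_ptr += MAX_STRING_SIZE;
    }
}
```

Searching for the terminator inside the field matters because strncpy does not write a '\0' when the source string is 32 characters or longer, so constructing a std::string from the raw pointer alone could read past the field.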
I'd suggest you read the C++ FAQ on Serialization and choose what best fits your needs.
When you're working with structures and classes, you have to take care of two things: pointers inside the class, and padding bytes.
Both of these can produce nasty surprises in your output. IMO, the object itself must implement its serialization and de-serialization. The object knows its own structure, pointer data, etc., so it can decide which format can be implemented efficiently.
You will have to iterate anyway, or wrap the iteration somewhere. Once you have implemented the serialization and de-serialization functions (either as operators or as plain functions), passing the object around is easy, especially with stream objects, where you can overload the << and >> operators.
Regarding your question about using the underlying pointer of the vector: it might work if it's a single vector, but it's not a good idea otherwise.
Update according to the question update.
There are a few things you should keep in mind before deriving from STL containers. They're not really good candidates for inheritance because they don't have virtual destructors. If you're using basic data types and POD-like structures, it won't cause many issues. But if you use them in a truly object-oriented way, you may face some unpleasant behavior.
Regarding your code
Why are you typecasting it to char*?
The way you serialize the object is your choice. IMO what you did is a basic file write operation in the name of serialization.
Serialization is down to the object, i.e. the parameter 'T' in your template class. If you're using POD or basic types, no special serialization is needed. Otherwise you have to choose carefully how to write the object.
Choosing between text format and binary format is your choice. Text format always has a cost, but at the same time it is easier to manipulate than binary format.
For example the following code is for simple read and write operation( in text format).
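// _countof is an MSVC-specific macro that yields the element count of an array
fstream fr("test.txt", ios_base::out | ios_base::binary );
for( int i =0;i <_countof(arr);i++)
fr << arr[i] << ' ';
fr.close();
fstream fw("test.txt", ios_base::in| ios_base::binary);
int j = 0;
// stop when the output array is full or when extraction fails (e.g. at end of file)
while( j < _countof(arrout) && fw >> arrout[j] )
{
j++;
}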
It seems to me that the most direct route to generating a binary file containing a vector is to memory map the file and place the vector in the mapped region. As pointed out by sarat, you need to worry about how pointers are used within the class. But the Boost.Interprocess library has a tutorial on how to do this using its shared memory regions, which include memory mapped files.
First off, have you looked at Boost.multi_array? Always good to take something ready-made rather than reinventing the wheel.
That said, I'm not sure if this is helpful, but here's how I would implement the basic data structure, and it'd be fairly easy to serialize:
#include <array>
template <typename T, size_t DIM1, size_t DIM2, size_t DIM3>
class ThreeDArray
{
typedef std::array<T, DIM1 * DIM2 * DIM3> array_t;
array_t m_data;
public:
inline size_t size() const { return m_data.size(); }
inline size_t byte_size() const { return sizeof(T) * m_data.size(); }
inline T & operator()(size_t i, size_t j, size_t k)
{
return m_data[i + j * DIM1 + k * DIM1 * DIM2];
}
inline const T & operator()(size_t i, size_t j, size_t k) const
{
return m_data[i + j * DIM1 + k * DIM1 * DIM2];
}
inline const T * data() const { return m_data.data(); }
};
You can serialize the data buffer directly:
ThreeDArray<int, 4, 6, 11> arr;
/* ... */
std::ofstream outfile("file.bin", std::ios::binary);
outfile.write(reinterpret_cast<const char*>(arr.data()), arr.byte_size());
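When the dimensions are only known at run time, the same single-write idea works with a flat std::vector<T> indexed by hand. The sketch below is an assumption-laden illustration: write_block and read_block are hypothetical helpers, T must be trivially copyable, and the file uses the writing machine's byte order and type sizes.

```cpp
#include <cstddef>
#include <fstream>
#include <vector>

// Writes the element count, then all elements in a single write call.
// Only valid for trivially copyable T (int, float, plain structs).
template <typename T>
bool write_block(const char *path, const std::vector<T> &v) {
    std::ofstream out(path, std::ios::binary);
    std::size_t n = v.size();
    out.write(reinterpret_cast<const char *>(&n), sizeof n);
    if (n) out.write(reinterpret_cast<const char *>(v.data()), n * sizeof(T));
    return out.good();
}

// Reads the file back. The count comes from the file itself, so it
// should only be used on files produced by write_block.
template <typename T>
bool read_block(const char *path, std::vector<T> &v) {
    std::ifstream in(path, std::ios::binary);
    std::size_t n = 0;
    if (!in.read(reinterpret_cast<char *>(&n), sizeof n)) return false;
    v.resize(n);
    if (n) in.read(reinterpret_cast<char *>(v.data()), n * sizeof(T));
    return in.good();
}
```

This also explains why writing a vector of vectors in one call cannot work: the outer vector's storage holds only the inner vectors' bookkeeping (pointers and sizes), while the actual elements live in separate heap blocks. Flattening the matrix into one vector of height * width * depth elements removes that indirection.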