Writing bytes into a .bin file - C++

I have a vector in C++ that I want to write to a .bin file.
The vector's element type is byte, and the number of bytes could be huge, maybe millions.
I am doing it like this:
if (depthQueue.empty())
return;
FILE* pFiledep;
pFiledep = fopen("depth.bin", "wb");
if (pFiledep == NULL)
return;
byte* depthbuff = (byte*) malloc(depthQueue.size() * 320 * 240 * sizeof(byte));
if(depthbuff)
{
for(int m = 0; m < depthQueue.size(); m++)
{
byte b = depthQueue[m];
depthbuff[m] = b;
}
fwrite(depthbuff, sizeof(byte),
depthQueue.size() * 320 * 240 * sizeof(byte), pFiledep);
fclose(pFiledep);
free(depthbuff);
}
depthQueue is my vector, which contains bytes; let's say its size is 100,000.
Sometimes I don't get an error, but the .bin file is empty.
Sometimes I get a heap error.
Sometimes when I debug this, it seems that malloc doesn't allocate the space.
Is the problem a lack of space?
Or is the chunk of sequential memory so long that it can't be written to the .bin file?

You need hardly any of that. The contents of a vector are guaranteed to be contiguous in memory, so you can just write from it directly:
fwrite(&depthQueue[0], sizeof (Byte), depthQueue.size(), pFiledep);
Note a possible bug in your code: if the vector is indeed vector<Byte>, then you should not be multiplying its size by 320*240.
EDIT: More fixes to the fwrite() call: The 2nd parameter already contains the sizeof (Byte) factor, so don't do that multiplication again in the 3rd parameter either (even though sizeof (Byte) is probably 1 so it doesn't matter).
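Putting the points above together, a minimal sketch might look like the following. It assumes the vector holds unsigned char (standing in for the question's byte type), and write_bytes is a hypothetical helper name, not something from the original code. It also checks the return value of fwrite, which reports how many elements were actually written.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical helper: writes the whole vector to `path` with one fwrite call.
// Returns true only if every byte was written and the file closed cleanly.
bool write_bytes(const char *path, const std::vector<unsigned char> &data)
{
    std::FILE *fp = std::fopen(path, "wb");
    if (fp == nullptr)
        return false;

    std::size_t written = 0;
    if (!data.empty())
        // element size 1, element count data.size(): no extra multiplications
        written = std::fwrite(data.data(), 1, data.size(), fp);

    bool closed = (std::fclose(fp) == 0);
    return closed && written == data.size();
}
```

No intermediate malloc'd buffer is needed, so the allocation failure and heap errors from the question cannot occur here.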

Related

Element-wise shifting from smaller array to a larger array

I am programming an ESP32 in the Arduino framework. For my application, I need to create a buffer which will store information from both the present and the last time it was accessed. Here is what I am attempting to do.
//first buffer
char buffer1[4];
//second buffer
char buffer2[8];
void setup {
//setup
}
//buffer1 values will change with each iteration of loop from external inputs
//buffer2 must store most recent values of buffer1 plus values of buffer1 from when loop last ran
for example:
**loop first iteration**
void loop {
buffer1[0] = {1};
buffer1[1] = {2};
buffer1[2] = {3};
buffer1[3] = {1};
saveold(); //this is the function I'm trying to implement to save values to buffer2 in an element-wise way
}
//value of buffer2 should now be: buffer2 = {1,2,3,1,0,0,0,0}
**loop second iteration**
void loop {
buffer1[0] = {2};
buffer1[1] = {3};
buffer1[2] = {4};
buffer1[3] = {2};
saveold();
}
//value of buffer2 should now be: buffer2 = {2,3,4,2,1,2,3,1}
From what I've been able to understand through searching online, the "saveold" function I'm trying to write should use some form of memmove for these array operations.
I've tried to piece it together, but I always overwrite the contents of buffer2 instead of shifting new values in while retaining the old ones.
This is all I've got:
void saveold() {
memmove(&buffer2[0], &buffer1[0], (sizeof(buffer1[0]) * 4));
}
From my understanding, this copies buffer1 starting from index position 0 to buffer2, starting at index position 0, for 4 bytes (where 1 char = 1 byte).
Computer science is not my background, so perhaps there is some fundamental solution or strategy that I am missing. Any pointers would be appreciated.
You have multiple options to implement saveold():
Solution 1
void saveold() {
// "shift" lower half into upper half, saving recent values (actually it's a copy)
buffer2[4] = buffer2[0];
buffer2[5] = buffer2[1];
buffer2[6] = buffer2[2];
buffer2[7] = buffer2[3];
// copy current values
buffer2[0] = buffer1[0];
buffer2[1] = buffer1[1];
buffer2[2] = buffer1[2];
buffer2[3] = buffer1[3];
}
Solution 2
void saveold() {
// "shift" lower half into upper half, saving recent values (actually it's a copy)
memcpy(buffer2 + 4, buffer2 + 0, 4 * sizeof buffer2[0]);
// copy current values
memcpy(buffer2 + 0, buffer1, 4 * sizeof buffer1[0]);
}
Some notes
There are even more ways to do it. Anyway, choose the one you understand best.
Be sure that buffer2 is exactly double size of buffer1.
memcpy() can be used safely if source and destination don't overlap. memmove() checks for overlaps and reacts accordingly.
&buffer1[0] is the same as buffer1 + 0. Feel free to use the expression you better understand.
sizeof is an operator, not a function. So sizeof buffer1[0] evaluates to the size of buffer1[0]. A common, widely accepted expression for the number of elements in an array is sizeof buffer1 / sizeof buffer1[0]. You only need parentheses if you evaluate the size of a data type, like sizeof (int).
Solution 3
The last note leads directly to this improvement of solution 1:
void saveold() {
// "shift" lower half into upper half, saving recent values
size_t size = sizeof buffer2 / sizeof buffer2[0];
for (size_t i = 0; i < size / 2; ++i) {
buffer2[size / 2 + i] = buffer2[i];
}
// copy current values
for (size_t i = 0; i < size / 2; ++i) {
buffer2[i] = buffer1[i];
}
}
To apply this knowledge to solution 2 is left as an exercise for you. ;-)
The correct way to do this is to use buffer pointers rather than hard-copy backups. Making hard copies with memcpy is particularly bad on slow legacy microcontrollers such as AVR. I'm not quite sure what core the ESP32 has; it seems to be some oddball one from Tensilica. Anyway, this answer applies universally to any processor where you have more data than the CPU's data word length.
perhaps there is some fundamental solution or strategy that I am missing.
Indeed - it really sounds like what you are looking for is a ring buffer. That is, an array of fixed size which has a pointer to the beginning of the valid data and another pointer to the end of the data. You move the pointers, not the data. This is much more efficient, both in execution speed and in RAM usage, than making naive hard copies with memcpy.
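As a rough sketch of that idea for the two-frame case in the question: instead of shifting the old values, keep two slots and flip an index to the newest one. FrameHistory, kFrameSize and kFrames are hypothetical names chosen for illustration, and the sketch uses an index rather than raw pointers. The new frame is still copied in once, but the previous frame is never moved.

```cpp
#include <cstddef>
#include <cstring>

constexpr std::size_t kFrameSize = 4;  // size of buffer1 in the question
constexpr std::size_t kFrames = 2;     // current frame + previous frame

// Hypothetical ring of frames: only the `newest` index moves, old data stays put.
struct FrameHistory {
    char slots[kFrames][kFrameSize] = {};
    std::size_t newest = 0;

    // Stores a new frame by overwriting the oldest slot.
    void push(const char *frame) {
        newest = (newest + 1) % kFrames;
        std::memcpy(slots[newest], frame, kFrameSize);
    }

    const char *current() const { return slots[newest]; }
    const char *previous() const { return slots[(newest + kFrames - 1) % kFrames]; }
};
```

Calling push() once per loop iteration replaces saveold(); current() and previous() then play the roles of the lower and upper halves of buffer2.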

file size and buffer overshoot

I have a function which opens a file from an SD card, uses the file size to set the size of a buffer, writes a block of information to that buffer, then does something with that information, as shown in this code:
const char *filename = "filename.txt";
uint16_t duration;
uint16_t pixel;
int q = 0;
int w = 0;
bool largefile;
File f;
int readuntil;
long large_buffer;
f = SD.open(filename);
if(f.size() > 3072) {
w = 3072;
} else {
w = f.size();
}
uint8_t buffer[w];
while(f.available()) {
f.read(buffer, sizeof(buffer));
while(q < sizeof(buffer)) {
doStuffWithInformation(buffer[q++]);
}
q=0;
}
f.close();
This works great with smaller files, but anything over the hard buffer limit of 3072 bytes (which I arrived at empirically; it's just the amount of memory that can safely be committed to this function) runs into a problem. Larger files read fine until they hit the last iteration of while(f.available()), where they read the end of the file but then continue through the rest of the buffer, the tail end of which still holds data from the previous iteration that wasn't overwritten by the latest f.read(). How can I make sure that the last iteration of the while(f.available()) loop only works with the data that was written to the buffer during the current iteration? My only idea right now is to factor the file size and set the buffer size to the largest factor below 3072, but this seems expensive to run every time this function is called. Is there an elegant solution staring me in the face?
Your program is not behaving correctly because f.read() is not guaranteed to fill the whole buffer. Moreover, a short read is bound to happen when you read the last chunk of the file, unless the file size is a multiple of the buffer size (3072 in your case).
While the Arduino reference (https://www.arduino.cc/en/Reference/FileRead) doesn't say so, the SD library's read function returns the number of bytes read. See the library code here: https://github.com/arduino-libraries/SD/blob/master/src/utility/SdFile.cpp, int16_t SdFile::read(void* buf, uint16_t nbyte)
Knowing that, you should change your loop as follows (also rewriting the inner loop as a for loop for better readability, and removing the definition of q above):
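while(f.available()) {
    int sz = f.read(buffer, sizeof(buffer)); // bytes actually read; the return type is signed
    if (sz <= 0) break;                      // nothing read, stop instead of looping forever
    for (int q = 0; q < sz; ++q) {
        doStuffWithInformation(buffer[q]);
    }
}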
On a side note, now that you have this logic in place, it would make sense to do away with the variable-length array and use a fixed buffer of 512 bytes, the standard sector size on an SD card. Most likely it will yield the same read performance, and slightly better performance for sizeof, which becomes a compile-time constant rather than a run-time calculation. It also makes your program simpler. This gives the following code:
f = SD.open(filename);
...
uint8_t buffer[512];

C++ Optimal Block Size For Reading From A File

I have a program that generates files containing random distributions of the characters A-Z. I have written a method that reads these files (and counts each character) using fread with different buffer sizes, in an attempt to determine the optimal block size for reads. Here is the method:
int get_histogram(FILE * fp, long *hist, int block_size, long *milliseconds, long *filelen)
{
char *buffer = new char[block_size];
bzero(buffer, block_size);
struct timeb t;
ftime(&t);
long start_in_ms = t.time * 1000 + t.millitm;
size_t bytes_read = 0;
while (!feof(fp))
{
bytes_read += fread(buffer, 1, block_size, fp);
if (ferror (fp))
{
return -1;
}
int i;
for (i = 0; i < block_size; i++)
{
int j;
for (j = 0; j < 26; j++)
{
if (buffer[i] == 'A' + j)
{
hist[j]++;
}
}
}
}
ftime(&t);
long end_in_ms = t.time * 1000 + t.millitm;
*milliseconds = end_in_ms - start_in_ms;
*filelen = bytes_read;
return 0;
}
However, when I plot bytes/second vs. block size (buffer size) using block sizes of 2 - 2^20, I get an optimal block size of 4 bytes -- which just can't be correct. Something must be wrong with my code but I can't find it.
Any advice is appreciated.
Regards.
EDIT:
The point of this exercise is to demonstrate the optimal buffer size by recording the read times (plus computation time) for different buffer sizes. The file pointer is opened and closed by the calling code.
There are many bugs in this code:
It uses new[], which is C++.
It doesn't free the allocated memory.
It always loops over block_size bytes of input, not bytes_read as returned by fread().
Also, the actual histogram code is rather inefficient, since it seems to loop over each character to determine which character it is.
UPDATE: Removed claim that using feof() before I/O is wrong, since that wasn't true. Thanks to Eric for pointing this out in a comment.
You're not stating what platform you're running this on, and what compile time parameters you use.
Of course, the fread() involves some overhead, leaving user mode and returning. On the other hand, instead of setting the hist[] information directly, you're looping through the alphabet. This is unnecessary and, without optimization, causes some overhead per byte.
I'd re-test this with hist[buffer[i] - 'A']++ (after checking that the character lies in the range 'A' to 'Z') or something similar.
Typically, the best timing would be achieved if your buffer size equals the system's buffer size for the given media.
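The fixes suggested in the answers above (loop over the bytes fread() actually returned, free the buffer, index the histogram directly) could be combined roughly as follows. This is only a sketch: count_letters is a hypothetical name, the timing code from the question is left out, and a std::vector is used so the buffer is released automatically.

```cpp
#include <cstdio>
#include <vector>

// Counts 'A'..'Z' in the stream, reading block_size bytes at a time.
// hist must point to 26 counters. Returns bytes read, or -1 on a read error.
long count_letters(std::FILE *fp, long *hist, std::size_t block_size)
{
    std::vector<unsigned char> buffer(block_size);  // freed automatically on return
    long total = 0;
    std::size_t n;
    while ((n = std::fread(buffer.data(), 1, buffer.size(), fp)) > 0) {
        total += static_cast<long>(n);
        for (std::size_t i = 0; i < n; ++i) {       // only the n bytes just read
            unsigned char c = buffer[i];
            if (c >= 'A' && c <= 'Z')
                ++hist[c - 'A'];                    // direct index, no inner loop
        }
    }
    return std::ferror(fp) ? -1 : total;
}
```

With the per-byte work reduced to one comparison and one increment, differences between block sizes should reflect read overhead rather than the histogram loop.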

Use of fread(), fwrite() and malloc in C++

I'm having amazing difficulties using the fread and fwrite functions in C++.
The project is writing a rudimentary FAT16 file system and we are restricted to using fread and fwrite.
When I am initially writing the file my code looks like this:
directoryTable = (directoryEntry *)malloc(clusterSize);
for (int i = 0; i < bootRecord[0] / 128 ; ++i){
directoryEntry * newEntry = (directoryEntry *)malloc(sizeof(directoryEntry));
newEntry->name = (char *)malloc(112);
newEntry->name[0] = 'a';
write(1, &(newEntry->name[1]), 1);
newEntry->size = 0;
newEntry->type = 0;
newEntry->creation = 0x0000;
newEntry->index = 0;
fwrite(newEntry->name, 112, 1, fp);
fwrite(&newEntry->size, sizeof(int), 1, fp);
fwrite(&newEntry->type, sizeof(int), 1, fp);
fwrite(&newEntry->creation, sizeof(int), 1, fp);
fwrite(&newEntry->index, sizeof(int), 1, fp);
directoryTable[i] = *newEntry;
}
When I assign 'a' to the first character of directoryEntry->name, my intent is actually to assign it the value 0x00 so I can check later whether it's null. I am simply using 'a' right now for debugging purposes. My problem is that when I read it, I get nothing back.
And when I'm reading my code looks like this:
fseek(fp, clusterSize * root, SEEK_SET);
for(int i = 0; i < clusterSize / 128; ++i){
directoryEntry * newEntry = (directoryEntry *) malloc(128);
newEntry->name = (char *) malloc(112);
fread(newEntry->name, 112, 1, fp);
write(1, newEntry->name[0],1);
fread(&newEntry->size, sizeof(int), 1, fp);
fread(&newEntry->type, sizeof(int), 1, fp);
fread(&newEntry->creation, sizeof(int), 1, fp);
fread(&newEntry->index, sizeof(int), 1, fp);
directoryTable[i] = *newEntry;
}
It should be noted that the values of clusterSize and root are also read in using similar methods. I've already checked and their values are accurate in both situations. Since I was able to read those in without a problem, I have no idea why I'm having such a big problem now. I feel my use of malloc is not quite right, I've never worked with it before.
Also, here is a definition of directoryTable if needed:
typedef struct{
char * name;
unsigned int index;
unsigned int size;
unsigned int type;
unsigned int creation;
} directoryEntry;
Thank you guys for your time, and if you need me to clarify anything I'd be happy to.
I see a lot of little problems:
You do not zero out the memory you malloc'd for name. Try memset(newEntry->name, 0, 112);
You are writing the second element of the name array when you write it (index 1), and the first element of the name array when you read it (index 0): write(1, &(newEntry->name[1]), 1); vs. write(1, newEntry->name[0],1);
Additionally, in the first write you are taking the address of the element and in the second write you are not, so it looks like the second one should read: write(1, &(newEntry->name[0]),1);
The call to fseek looks correct, but as a reader of the question I have no way of knowing if clusterSize * root is the correct offset into the file you are reading from. You also do not seek to that offset when you start writing the file (or at least don't show it), so it becomes murkier.
What exactly does the 128 you keep dividing by in your for loops represent? Also, in one spot you divide clusterSize by it and in another bootRecord[0]. I assume those are the same size, but...
In the read loop you malloc 128 bytes of memory, but in the write loop you malloc(sizeof(directoryEntry)). Those are two different sizes, and malloc'ing the 128 bytes is wrong, since the structure contains a pointer to the 112 bytes, not the actual 112 bytes for name.
I am not sure if any of these are causing the problem you are seeing, which you don't actually specify by the way, but it may at least get you pointed in the right direction.
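One way to sidestep several of these points at once is to store the name inside the entry rather than behind a pointer, so that one entry is a single 128-byte block that can be written and read in one call. The sketch below is an assumption about what the on-disk layout could be (fields in the struct's declared order, 4-byte unsigned int), not the assignment's required format; DirectoryEntry, empty_entry, write_entry and read_entry are illustrative names.

```cpp
#include <cstdio>
#include <cstring>

// Assumed layout: 112 name bytes + four 4-byte unsigned ints = 128 bytes.
struct DirectoryEntry {
    char name[112];
    unsigned int index;
    unsigned int size;
    unsigned int type;
    unsigned int creation;
};
static_assert(sizeof(DirectoryEntry) == 128, "entry must be exactly 128 bytes");

// Returns an entry with every byte zeroed, so name[0] == 0 marks it as unused.
DirectoryEntry empty_entry()
{
    DirectoryEntry e;
    std::memset(&e, 0, sizeof e);
    return e;
}

// Each helper transfers one whole entry and reports whether it succeeded.
bool write_entry(std::FILE *fp, const DirectoryEntry &e)
{
    return std::fwrite(&e, sizeof e, 1, fp) == 1;
}

bool read_entry(std::FILE *fp, DirectoryEntry &e)
{
    return std::fread(&e, sizeof e, 1, fp) == 1;
}
```

Because the entries hold no pointers, a table of them can be a plain array (or std::vector) with no per-entry malloc, and the return values make a failed read visible immediately.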

I need to create a very large array of bits/boolean values. How would I do this in C/C++?

Is it even possible to create an array of bits with more than 100000000 elements? If it is, how would I go about doing this? I know that for a char array I can do this:
char* array;
array = (char*)malloc(100000000 * sizeof(char));
If I was to declare the array by char array[100000000] then I would get a segmentation fault, since the maximum number of elements has been exceeded, which is why I use malloc.
Is there something similar I can do for an array of bits?
If you are using C++, std::vector<bool> is specialized to pack elements into a bit map. Of course, if you are using C++, you need to stop using malloc.
You could try looking at boost::dynamic_bitset. Then you could do something like the following (taken from Boost's example page):
boost::dynamic_bitset<> x(100000000); // all 0's by default
x[0] = 1;
x[1] = 1;
x[4] = 1;
The bitset will use a single bit for each element so you can store 32 items in the space of 4 bytes, decreasing the amount of memory required considerably.
In C and C++, char is the smallest addressable type, so you can't directly declare an array of bits. However, since an array of any basic type is fundamentally made of bits, you can emulate one, something like this (code untested):
#include <limits.h> /* CHAR_BIT */
#include <stdlib.h> /* calloc */
/* Number of bits in one unsigned; sizeof alone gives bytes, not bits */
#define BITS_PER_WORD (sizeof(unsigned) * CHAR_BIT)
unsigned *array;
/* calloc zero-fills, so every bit starts out false */
array = (unsigned *) calloc(100000000 / BITS_PER_WORD + 1, sizeof(unsigned));
/* Retrieves the value in bit i (nonzero if set) */
#define GET_BIT(array, i) ((array)[(i) / BITS_PER_WORD] & (1u << ((i) % BITS_PER_WORD)))
/* Sets bit i to true */
#define SET_BIT(array, i) ((array)[(i) / BITS_PER_WORD] |= (1u << ((i) % BITS_PER_WORD)))
/* Sets bit i to false */
#define CLEAR_BIT(array, i) ((array)[(i) / BITS_PER_WORD] &= ~(1u << ((i) % BITS_PER_WORD)))
The segmentation fault you noticed is due to running out of stack space. Of course you can't declare a local variable that is 12.5 MB in size (100 million bits), let alone 100MB in size (100 million bytes) in a thread with a stack of ~ 4 MB. Should work as a global variable, although then you may end up with a 12 or 100 MB executable file -- still not a good idea. Dynamic allocation is definitely the right thing to do for large buffers like that.
If you are allowed to use the standard library, then I would use std::bitset.
(For 100,000,000 bits, a typical implementation would use 100000000 / 32 unsigned ints underneath, each storing 32 bits.)
std::vector<bool>, already mentioned, is another good solution.
There are a few approaches to creating a bitmap in C++.
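If you already know the size of the bitmap at compile time, you can use the standard library's std::bitset template.
This is how you would do it with bitset (give it static or global storage, since 12.5 MB will not fit on a typical stack):
static std::bitset<100000000> array;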
Otherwise, if the size of the bitmap changes dynamically during runtime, you can use std::vector<bool> or boost::dynamic_bitset as recommended here http://en.cppreference.com/w/cpp/utility/bitset (See note at the bottom)
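For the run-time-sized case, a minimal sketch with std::vector<bool> might look like this. make_bitmap is a hypothetical helper introduced only for illustration; the vector allocates its storage on the heap, so the stack limit discussed earlier does not apply.

```cpp
#include <cstddef>
#include <vector>

// Builds a bitmap of n bits (all false) and sets the listed positions to true.
// Positions outside the bitmap are ignored.
std::vector<bool> make_bitmap(std::size_t n, const std::vector<std::size_t> &set_positions)
{
    std::vector<bool> bits(n, false);
    for (std::size_t pos : set_positions)
        if (pos < n)
            bits[pos] = true;
    return bits;
}
```

For example, `auto bits = make_bitmap(100000000, {0, 1, 4});` gives a bitmap of 100 million bits in which bits[0], bits[1] and bits[4] are true.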
Yes, but it's going to be a little more complicated.
The better way to store bits is to use the individual bits of a char.
That way you can store 8 bits in each char,
which will "only" require 12,500,000 bytes.
Here is some documentation about working with binary values: http://www.somacon.com/p125.php
You should also search on Google :)
Other solution:
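unsigned char * array;
// 8 bits per byte, so 100000000 bits need 100000000 / 8 bytes (+1 to round up); calloc zero-fills
array = (unsigned char *) calloc ( 100000000 / 8 + 1, sizeof(unsigned char) );
bool MapBit ( unsigned char arraybit[], DWORD position, bool set)
{
//works for bit positions 0 to 4294967295
//set == true: sets the bit; set == false: returns the bit's current value
//calc byte position
DWORD bytepos = ( position / 8 );
//calc bit position within the byte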
unsigned char bitpos = ( position % 8);
unsigned char bit = 0x01;
//get bit
if ( bitpos )
{
bit = bit << bitpos;
}
if ( set )
{
arraybit [ bytepos ] |= bit;
}
else
{
//get
if ( arraybit [ bytepos ] & bit )
return true;
}
return false;
}
I'm fond of the bitarray that's in the open source fxt library at http://www.jjj.de/fxt/. It's simple, efficient and contained in a few headers, so it's easy to add to your project. Plus there are many complementary functions to use with the bitarray (see http://www.jjj.de/bitwizardry/bitwizardrypage.html).