C++ unable to handle large boolean arrays [duplicate]

C++ unable to handle large boolean arrays [duplicate] - c++

This question already has answers here:
Segmentation fault when trying to create a buffer of 100MB
(5 answers)
Closed 1 year ago.
I am writing a function that requires me to turn a file into an array of 0s and 1s. I figured this was most easily accomplished using a bool array. However, the below code fails and crashes for files larger than ~1MB. My machine has 8GB RAM, so I see no reason for the crash.
string file_name;
cin >> file_name;
string text = read_file(file_name); //I have defined this function as returning a string containing the file's contents and it works fine when tested separately
int length = text.size();
bool bin_arr[8*length]; //to store 0s and 1s
The initialisation of bin_arr fails, and the program simply exits.
I will be handling files larger than 1GB or so. However, I have no idea why this is happening. I am fairly new to C++.
In case it is relevant, I am on Windows 10, using GCC version 6.3.0.

I am writing a function that requires me to turn a file into an array of 0s and 1s. I figured this was most easily accomplished using a bool array.
I don't know why you figured that, the easiest way is to store a file as bytes, exactly as the file contains. vector<bool> does this, for example, it doesn't store each bit in an individual byte like your code, and is thus 8 times more memory efficient.
To get individual bits, either use the aforementioned vector<bool> or use regular bit fiddling. Remember that b & (1 << bit_number) returns a non-zero value if that bit is set in your byte.
I will be handling files larger than 1GB or so.
Then don't store the entire file in memory, stream it in small chunks.
However, I have no idea why this is happening.
Oh, that's easy. You are trying to allocate a needlessly gigantic array needlessly on the stack, instead of on the heap. Your stack has very strict limits, by default 1MB I believe in Windows.

Variable-Length Arrays like bool bin_arr[8*length]; are not in the C++ standard and it has risk of stack overflow. You should use std::vector.
Note that std::vector<bool> has a special implementation. To avoid this, using std::vector<char> or std::deque<bool> should be better.

Related

C++ fill an empty buffer with a single value

I apologize in advance if I am using the incorrect terminology, I'm new to the C++ language. I have a class with a constructor that creates an empty buffer using malloc
LPD6803PWM::LPD6803PWM(uint16_t leds, uint8_t dout, uint8_t cout) {
numLEDs = leds;
pixels = (uint16_t *) malloc(numLEDs);
dataPin = dout;
clockPin = cout;
}
My understanding is that this creates an empty buffer with the length of whatever I pass to numLEDs this is essentially a dynamically created array correct? I'm using malloc because this code goes on an Arduino that has very limited memory and I want to avoid overflows and from what I have read, this is the best way to declare arrays is you don't know what size the array will be and you want to avoid overflow errors.
My question is, once this array has been created is there a faster way than a traditional for loop to fill the array with a single value. Very often I will want to do this and even microseconds make a difference in this application. I know that from the C++ standard library array classes have a fill method, but what about an array declared in this way?

My question is, once this array has been created is there a faster way than a traditional for loop to fill the array with a single value.
The C standard library provides memset() and related functions for filling a buffer. There's also calloc(), which allocates a buffer just like malloc(), but fills the buffer with 0 at the same time.
Very often I will want to do this and even microseconds make a difference in this application.
In that case you might consider ways to avoid repeatedly allocating the array, which could take more time than filling an existing array. As well, the easiest way to make your code go faster is to run it on faster hardware. Arduino is a great platform, but Raspberry Pi Zero costs less ($5, if you can find them), has a LOT more memory, and has a clock speed that's 64x faster than a typical Arduino (1Ghz vs. 16MHz). Computing is often a tradeoff between good, cheap, and fast, but in this case you get all three.

You can still use std::fill (or std::fill_n), most standard library implementations will delegate to memset for RandomAccessIterator (e.g. gcc and Clang). Trust in the standard library writers!

You can use memset. But you have to be careful about the value you want to set. And you won't be much faster than using a for loop. The computer needs to set all these values somehow! memset may set larger contiguous memory spans and therefore be faster, but a smart compiler may do the same for a for loop.
If you're really concerned about microseconds you need to do some profiling.

Well, you can use memset from stdlib.h:
memset(array, 0, size_of_array_in_bytes);
Note however that memset works byte for byte,e.g it sets the first byte to 0 or whatever value you set as the second parameter, then the second byte and so on, which means that you must be careful.
Just a note:
malloc gets its size as the size of arrays in bytes, so you might consider multiplying its parameter by sizeof(uint16_t)

Variable Length Array Performance Implications (C/C++)

I'm writing a fairly straightforward function that sends an array over to a file descriptor. However, in order to send the data, I need to append a one byte header.
Here is a simplified version of what I'm doing and it seems to work:
void SendData(uint8_t* buffer, size_t length) {
uint8_t buffer_to_send[length + 1];
buffer_to_send[0] = MY_SPECIAL_BYTE;
memcpy(buffer_to_send + 1, buffer, length);
// more code to send the buffer_to_send goes here...
}
Like I said, the code seems to work fine, however, I've recently gotten into the habit of using the Google C++ style guide since my current project has no set style guide for it (I'm actually the only software engineer on my project and I wanted to use something that's used in industry). I ran Google's cpplint.py and it caught the line where I am creating buffer_to_send and threw some comment about not using variable length arrays. Specifically, here's what Google's C++ style guide has to say about variable length arrays...
http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Variable-Length_Arrays_and_alloca__
Based on their comments, it appears I may have found the root cause of seemingly random crashes in my code (which occur very infrequently, but are nonetheless annoying). However, I'm a bit torn as to how to fix it.
Here are my proposed solutions:
Make buffer_to_send essentially a fixed length array of a constant length. The problem that I can think of here is that I have to make the buffer as big as the theoretically largest buffer I'd want to send. In the average case, the buffers are much smaller, and I'd be wasting about 0.5KB doing so each time the function is called. Note that the program must run on an embedded system, and while I'm not necessarily counting each byte, I'd like to use as little memory as possible.
Use new and delete or malloc/free to dynamically allocate the buffer. The issue here is that the function is called frequently and there would be some overhead in terms of constantly asking the OS for memory and then releasing it.
Use two successive calls to write() in order to pass the data to the file descriptor. That is, the first write would pass only the one byte, and the next would send the rest of the buffer. While seemingly straightforward, I would need to research the code a bit more (note that I got this code handed down from a previous engineer who has since left the company I work for) in order to guarantee that the two successive writes occur atomically. Also, if this requires locking, then it essentially becomes more complex and has more performance impact than case #2.
Note that I cannot make the buffer_to_send a member variable or scope it outside the function since there are (potentially) multiple calls to the function at any given time from various threads.
Please let me know your opinion and what my preferred approach should be. Thanks for your time.

You can fold the two successive calls to write() in your option 3 into a single call using writev().
http://pubs.opengroup.org/onlinepubs/009696799/functions/writev.html

I would choose option 1. If you know the maximum length of your data, then allocate that much space (plus one byte) on the stack using a fixed size array. This is no worse than the variable length array you have shown because you must always have enough space left on the stack otherwise you simply won't be able to handle your maximum length (at worst, your code would randomly crash on larger buffer sizes). At the time this function is called, nothing else will be using the further space on your stack so it will be safe to allocate a fixed size array.

How to read an array in a binary file in C++

Currently I read arrays in C++ with ifstream, read and reinterpret_cast by making a loop on values. Is it possible to load for example an unsigned int array from a binary file in one time without making a loop ?
Thank you very much

Yes, simply pass the address of the first element of the array, and the size of the array in bytes:
// Allocate, for example, 47 ints
std::vector<int> numbers(47);
// Read in as many ints as 'numbers' has room for.
inFile.read(&numbers[0], numbers.size()*sizeof(numbers[0]));
Note: I almost never use raw arrays. If I need a sequence that looks like an array, I use std::vector. If you must use an array, the syntax is very similar.
The ability to read and write binary images is non-portable. You may not be able to re-read the data on another machine, or even on the same machine with a different compiler. But, you have that problem already, with the solution that you are using now.

Declaring large character array in c++

I am trying right now to declare a large character array. I am using the character array as a bitmap (as in a map of booleans, not the image file type). The following code generates a compilation error.
//This is code before main. I want these as globals.
unsigned const long bitmap_size = (ULONG_MAX/(sizeof(char)));
char bitmap[bitmap_size];
The error is overflow in array dimension. I recognize that I'm trying to have my process consume a lot of data and that there might be some limit in place that prevents me from doing so. I am curious as to whether I am making a syntax error or if I need to request more resources from the kernel. Also, I have no interest in creating a bitmap with some class. Thank you for your time.
EDIT
ULONG_MAX is very much dependent upon the machine that you are using. On the particular machine I was compiling my code on it was well over 4.2 billion. All in all, I wouldn't to use that constant like a constant, at least for the purpose of memory allocation.

ULONG_MAX/sizeof(char) is the same as ULONG_MAX, which is a very large number. So large, in fact, that you don't have room for it even in virtual memory (because ULONG_MAX is probably the number of bytes in your entire virtual memory).
You definitely need to rethink what you are trying to do.

It's impossible to declare an array that large on most systems -- on a 32-bit system, that array is 4 GB, which doesn't fit into the available address space, and on most 64-bit systems, it's 16 exabytes (16 million terabytes), which doesn't fit into the available address space there either (and, incidentally, may be more memory than exists on the entire planet).
Use malloc() to allocate large amounts of memory. But be realistic. :)

As I understand it, the maximum size of an array in c++ is the largest integer the platform supports. It is likely that your long-type bitmap_size constant exceeds that limit.

Access Violation Using memcpy or Assignment to an Array in a Struct

Update 2:
Well I’ve refactored the work-around that I have into a separate function. This way, while it’s still not ideal (especially since I have to free outside the function the memory that is allocated inside the function), it does afford the ability to use it a little more generally. I’m still hoping for a more optimal and elegant solution…
Update:
Okay, so the reason for the problem has been established, but I’m still at a loss for a solution.
I am trying to figure out an (easy/effective) way to modify a few bytes of an array in a struct. My current work-around of dynamically allocating a buffer of equal size, copying the array, making the changes to the buffer, using the buffer in place of the array, then releasing the buffer seems excessive and less-than optimal. If I have to do it this way, I may as well just put two arrays in the struct and initialize them both to the same data, making the changes in the second. My goal is to reduce both the memory footprint (store just the differences between the original and modified arrays), and the amount of manual work (automatically patch the array).
Original post:
I wrote a program last night that worked just fine but when I refactored it today to make it more extensible, I ended up with a problem.
The original version had a hard-coded array of bytes. After some processing, some bytes were written into the array and then some more processing was done.
To avoid hard-coding the pattern, I put the array in a structure so that I could add some related data and create an array of them. However now, I cannot write to the array in the structure. Here’s a pseudo-code example:
main() {
char pattern[]="\x32\x33\x12\x13\xba\xbb";
PrintData(pattern);
pattern[2]='\x65';
PrintData(pattern);
}
That one works but this one does not:
struct ENTRY {
char* pattern;
int somenum;
};
main() {
ENTRY Entries[] = {
{"\x32\x33\x12\x13\xba\xbb\x9a\xbc", 44}
, {"\x12\x34\x56\x78", 555}
};
PrintData(Entries[0].pattern);
Entries[0].pattern[2]='\x65'; //0xC0000005 exception!!! :(
PrintData(Entries[0].pattern);
}
The second version causes an access violation exception on the assignment. I’m sure it’s because the second version allocates memory differently, but I’m starting to get a headache trying to figure out what’s what or how to get fix this. (I’m currently working around it by dynamically allocating a buffer of the same size as the pattern array, copying the pattern to the new buffer, making the changes to the buffer, using the buffer in the place of the pattern array, and then trying to remember to free the—temporary—buffer.)
(Specifically, the original version cast the pattern array—+offset—to a DWORD* and assigned a DWORD constant to it to overwrite the four target bytes. The new version cannot do that since the length of the source is unknown—may not be four bytes—so it uses memcpy instead. I’ve checked and re-checked and have made sure that the pointers to memcpy are correct, but I still get an access violation. I use memcpy instead of str(n)cpy because I am using plain chars (as an array of bytes), not Unicode chars and ignoring the null-terminator. Using an assignment as above causes the same problem.)
Any ideas?

It is illegal to attempt to modify string literals. Your
Entries[0].pattern[2]='\x65';
line attempts exactly that. In your second example you are not allocating any memory for the strings. Instead, you are making your pointers (in the struct objects) to point directly at string literals. And string literals are not modifiable.
This question gets asked several times every day. Read Why is this string reversal C code causing a segmentation fault? for more details.

The problem boils down to the fact that a char[] is not a char*, even if the char[] acts a lot like a char* in expressions.

Other answers have addressed the reason for the error: you're modifying a string literal which is not allowed.
This question is tagged C++ so the easy way to solve your problem is to use std::string.
struct ENTRY {
std::string pattern;
int somenum;
};

Based on your updates, your real problem is this: You want to know how to initialize the strings in your array of structs in such a way that they're editable. (The problem has nothing to do with what happens after the array of structs is created -- as you show with your example code, editing the strings is easy enough if they're initialized correctly.)
The following code sample shows how to do this:
// Allocate the memory for the strings, on the stack so they'll be editable, and
// initialize them:
char ptn1[] = "\x32\x33\x12\x13\xba\xbb\x9a\xbc";
char ptn2[] = "\x12\x34\x56\x78";
// Now, initialize the structs with their char* pointers pointing at the editable
// strings:
ENTRY Entries[] = {
{ptn1, 44}
, {ptn2, 555}
};
That should work fine. However, note that the memory for the strings is on the stack, and thus will go away if you leave the current scope. That's not a problem if Entries is on the stack too (as it is in this example), of course, since it will go away at the same time.
Some Q/A on this:
Q: Why can't we initialize the strings in the array-of-structs initialization? A: Because the strings themselves are not in the structs, and initializing the array only allocates the memory for the array itself, not for things it points to.
Q: Can we include the strings in the structs, then? A: No; the structs have to have a constant size, and the strings don't have constant size.
Q: This does save memory over having a string literal and then malloc'ing storage and copying the string literal into it, thus resulting in two copies of the string, right? A: Probably not. When you write
char pattern[] = "\x12\x34\x56\x78";
what happens is that that literal value gets embedded in your compiled code (just like a string literal, basically), and then when that line is executed, the memory is allocated on the stack and the value from the code is copied into that memory. So you end up with two copies regardless -- the non-editable version in the source code (which has to be there because it's where the initial value comes from), and the editable version elsewhere in memory. This is really mostly about what's simple in the source code, and a little bit about helping the compiler optimize the instructions it uses to do the copying.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js