is there a specific catch with storing Arduino String objects into a QueueArray?
when I try the following code, Arduino just stops executing at "enqueue" function.
QueueArray <String> q;
String s = "blah";
q.enqueue(s);
Serial.println("checkpoint"); delay(1000);
Serial.println(q.peek()); delay(1000);
Same code works for storing integers, and even (char *). what am I missing?
by inspecting the headerfile (only template functions, so also source):
http://playground.arduino.cc/uploads/Code/QueueArray.zip
I believe the constructor of the queue gets you into this trouble.
I have had troubles allocating objects in the queue.
I have decided to only put the references in there as a number, and then on the recieving end set that as the address value of a pointer of the object type.
This is btw. a risky strategy!
why not just use enqueue the c_str attribute https://www.arduino.cc/en/Reference/CStr ?
Edit: this answer to a comment would bee too long:
The Arduino is a µProcessor and have a very limited memory (~2K).
(32K for your program, and similar in flash).
Remember that it is also a machine with a limited memory management.
so stack and heaps are both very small, and in general when the heap is used, it can get really fragmented very quickly.
As a C++ programmer that looks at your code, one might expect that the string is in fact stack allocated. (hint: there is no new keyword in there) so even if it internally would hold characters in the heap, this should allow you to assume that the string object would be destroyed when you exit the scope.
(you should also expect that it would clean up after it self, and deallocate the dynamic memory used for characters. Depending on your version of the libraries, this does in fact work, or there is a problem with re-allocate / deallocate, making stuff worse than we need to discuss here..)
learn the difference between heap and stack:
C++ Object Instantiation
you can do something with arduino strings minimize the ehap fragmentation, like reserving memory before doing string operations.
But every time you decide to use a string, you will most likely fragment the heap, especially if you let it stay alive after scope is exited (for use on the other end of the queue).
To avoid the fragmentation issues, (remember it's a small system), you could use enums or similar for predefined messages. However if you insist that you actually need to enqueue a string I have a better suggestion.
You could create a small global array of strings, to hold the strings that will be enquede. (whis will ofcourse limit the queue size, since no more messages can be in the queue, than the array of string objects allow..
This array will be stack allocated, but the characters referenced by each string would be heap allocated.
This allows you to let the reader of the queue clear the string, and thus free memory on the heap.
However as the array never leaves scope, the strings are never automatically deleted by the sender.
You would have to actively clear the string after recieving the string.
For this solution, there will be some constant overhead on the stack.
Alternatively (and in my opinion much better, since fragmentation is non-existing), you could use something like a global character buffer, reserved to allocating messages that is being transferred.
Before enqueueing, append the new message (c-styled null terminated string) after the last one currently in the buffer. The queue should contain the pointer to the message in the char array.
You must always test if there is room in the buffer for the string, before allocating some new message.
The way I do this is by having a global "recieved" pointer, that the worker will update when it reads the message.
It simply moves the pointer to the strings ending null character, when it's done treating it.
The producer will then have another local pointer used for remembering where it wrote the last time it wrote something.
Since the recieved pointer is global, the producer can always calculate the "distance" available between the pointers, and know how many characters can be written to the message buffer..
This is a simple circular buffer, so you need to handle overflowing etc. by adding a read and a write method.
Related
I have a question about heap overflows.
I understand that if a stack variable overruns it's buffer, it could overwrite the EIP and ESP values and, for example, make the program jump to a place where the coder did not expect it to jump.
This seems, as I understand, to behave like this because of the backward little endian storing (where f.e. the characters in an array are stored "backwards", from last to first).
If you on the other hand put that array into the heap, which grows contra the stack, and you would overflow it, would it just write random garbage into empty memory space then? (unless you where on a solaris which as far as I know has a big endian system,side note)
Would this basicly be a danger since it would just write into "empty space"?
So no aimed jumping to adresses and areas the code was not designed for?
Am I getting this wrong?
To specify my question:
I am writing a program where the user is meant to pass a string argument and a flag when executing it via command line, and I want to know if the user could perform a hack with this string argument when it is put on the heap with the malloc function.
If you on the other hand put that array into the heap, which grows contra the stack, and you would overflow it, would it just write random garbage into empty memory space then?
You are making a couple of assumptions:
You are assuming that the heap is at the end of the main memory segment. That ain't necessarily so.
You are assuming that the object in the heap is at the end of the heap. That ain't necessarily so. (In fact, it typically isn't so ...)
Here's an example that is likely to cause problems no matter how the heap is implemented:
char *a = malloc(100);
char *b = malloc(100);
char *c = malloc(100);
for (int i = 0; i < 200; i++) {
b[i] = 'Z';
}
Writing beyond the end of b is likely to trample either a or c ... or some other object in the heap, or the free list.
Depending on what objects you trample, you may overwrite function pointers, or you may do other damage that results in segmentation faults, unpredictable behaviour and so on. These things could be used for code injection, to cause the code to malfunction in other ways that are harmful from a security standpoint ... or just to implement a denial of service attack by crashing the target application / service.
There are various ways heap overflow could lead to code execution:
Most obvious - you overflow into another object that contains function pointers and get to overwrite one of them.
Slightly less obvious - the object you overflow into doesn't itself contain function pointers, but it contains pointers that will be used for writing, and you get to overwrite one of them to point to a function pointer so that a subsequent write overwrites a function pointer.
Exploiting heap bookkeeping structures - by overwriting the data that the heap allocator itself uses to track size and status of allocated/free blocks, you trick it into overwriting something valuable elsewhere in memory.
Etc.
For some advanced techniques, see:
http://packetstormsecurity.com/files/view/40638/MallocMaleficarum.txt
Even if you can't overwrite a return address, how do you feel about an attacker modifying the rest of your data? This shouldn't thrill you.
To answer your question generally: it is a very bad idea to let the user copy data anywhere without checking its size. You should absolutely never do that, especially on purpose.
If the user means no harm, they may crash your program, either by overwriting useful data, or by causing a page fault. If your user is malicious, you're potentially letting them hijack your system. Both are highly undesirable.
Endianness does not matter to buffer overflows. Big endian machines are just as vulnerable as little-endian machines. The only difference will be the byte order of the malicious data.
You may be thinking instead of the direction the stack grows in, which is independent of endianness. In the case where it grows up, you won't be able to hijack the return address of the function that declares the buffer. However, if you pass that buffer address to any other function, and this function overflows instead, an attacker may change this function's return address. This would be the case, for instance, if you called memcpy of scanf or any other function to modify your buffer (assuming that the compiler didn't inline them).
The stack usually grows downwards. In this case, an attacker can use an overflow to hijack the return address of the function that declares it.
In other words, neither the stack configuration nor endianness offer meaningful protection against stack buffer overflows.
As for the heap:
If you on the other hand put that array into the heap, which grows contra the stack, and you would overflow it, would it just write random garbage into empty memory space then?
The answer, as almost always, is it depends, but probably not. The 32-bit implementation of malloc in glibc keeps bookkeeping structure at the end of the buffer (or at least, used to). By overflowing onto the bookkeeping structures with the correct incantations, when the allocation was freed, you could cause free to write four arbitrary bytes at an arbitrary location. This is a lot of power. This kind of exploit comes up regularly in capture-the-flag competitions and is very exploitable.
I'm writing a C++14 JSON library as an exercise and to use it in my personal projects.
By using callgrind I've discovered that the current bottleneck during a continuous value creation from string stress test is an std::string dynamic memory allocation. Precisely, the bottleneck is the call to malloc(...) made from std::string::reserve.
I've read that many existing JSON libraries such as rapidjson use custom allocators to avoid malloc(...) calls during string memory allocations.
I tried to analyze rapidjson's source code but the large amount of additional code and comments, plus the fact that I'm not really sure what I'm looking for, didn't help me much.
How do custom allocators help in this situation?
Is a memory buffer preallocated somewhere (where? statically?) and std::strings take available memory from it?
Are strings using custom allocators "compatible" with normal strings?
They have different types. Do they have to be "converted"? (And does that result in a performance hit?)
Code notes:
Str is an alias for std::string.
By default, std::string allocates memory as needed from the same heap as anything that you allocate with malloc or new. To get a performance gain from providing your own custom allocator, you will need to be managing your own "chunk" of memory in such a way that your allocator can deal out the amounts of memory that your strings ask for faster than malloc does. Your memory manager will make relatively few calls to malloc, (or new, depending on your approach) under the hood, requesting "large" amounts of memory at once, then deal out sections of this (these) memory block(s) through the custom allocator. To actually achieve better performance than malloc, your memory manager will usually have to be tuned based on known allocation patterns of your use cases.
This kind of thing often comes down to the age-old trade off of memory use versus execution speed. For example: if you have a known upper bound on your string sizes in practice, you can pull tricks with over-allocating to always accommodate the largest case. While this is wasteful of your memory resources, it can alleviate the performance overhead that more generalized allocation runs into with memory fragmentation. As well as making any calls to realloc essentially constant time for your purposes.
#sehe is exactly right. There are many ways.
EDIT:
To finally address your second question, strings using different allocators can play nicely together, and usage should be transparent.
For example:
class myalloc : public std::allocator<char>{};
myalloc customAllocator;
int main(void)
{
std::string mystring(customAllocator);
std::string regularString = "test string";
mystring = regularString;
std::cout << mystring;
return 0;
}
This is a fairly silly example and, of course, uses the same workhorse code under the hood. However, it shows assignment between strings using allocator classes of "different types". Implementing a useful allocator that supplies the full interface required by the STL without just disguising the default std::allocator is not as trivial. This seems to be a decent write up covering the concepts involved. The key to why this works, in the context of your question at least, is that using different allocators doesn't cause the strings to be of different type. Notice that the custom allocator is given as an argument to the constructor not a template parameter. The STL still does fun things with templates (such as rebind and Traits) to homogenize allocator interfaces and tracking.
What often helps is the creation of a GlobalStringTable.
See if you can find portions of the old NiMain library from the now defunct NetImmerse software stack. It contains an example implementation.
Lifetime
What is important to note is that this string table needs to be accessible between different DLL spaces, and that it is not a static object. R. Martinho Fernandes already warned that the object needs to be created when the application or DLL thread is created / attached, and disposed when the thread is destroyed or the dll is detached, and preferrably before any string object is actually used. This sounds easier than it actually is.
Memory allocation
Once you have a single point of access that exports correctly, you can have it allocate a memory buffer up-front. If the memory is not enough, you have to resize it and move the existing strings over. Strings essentially become handles to regions of memory in this buffer.
Placement new
Something that often works well is called the placement new() operator, where you can actually specify where in memory your new string object needs to be allocated. However, instead of allocating, the operator can simply grab the memory location that is passed in as an argument, zero the memory at that location, and return it. You can also keep track of the allocation, the actual size of the string etc.. in the Globalstringtable object.
SOA
Handling the actual memory scheduling is something that is up to you, but there are many possible ways to approach this. Often, the allocated space is partitioned in several regions so that you have several blocks per possible string size. A block for strings <= 4 bytes, one for <= 8 bytes, and so on. This is called a Small Object Allocator, and can be implemented for any type and buffer.
If you expect many string operations where small strings are incremented repeatedly, you may change your strategy and allocate larger buffers from the start, so that the number of memmove operations are reduced. Or you can opt for a different approach and use string streams for those.
String operations
It is not a bad idea to derive from std::basic_str, so that most of the operations still work but the internal storage is actually in the GlobalStringTable, so that you can keep using the same stl conventions. This way, you also make sure that all the allocations are within a single DLL, so that there can be no heap corruption by linking different kinds of strings between different libraries, since all the allocation operations are essentially in your DLL (and are rerouted to the GlobalStringTable object)
Custom allocators can help because most malloc()/new implementations are designed for maximum flexibility, thread-safety and bullet-proof workings. For instance, they must gracefully handle the case that one thread keeps allocating memory, sending the pointers to another thread that deallocates them. Things like these are difficult to handle in a performant way and drive the cost of malloc() calls.
However, if you know that some things cannot happen in your application (like one thread deallocating stuff another thread allocated, etc.), you can optimize your allocator further than the standard implementation. This can yield significant results, especially when you don't need thread safety.
Also, the standard implementation is not necessarily well optimized: Implementing void* operator new(size_t size) and void operator delete(void* pointer) by simply calling through to malloc() and free() gives an average performance gain of 100 CPU cycles on my machine, which proves that the default implementation is suboptimal.
I think you'd be best served by reading up on the EASTL
It has a section on allocators and you might find fixed_string useful.
The best way to avoid a memory allocation is don't do it!
BUT if I remember JSON correctly all the readStr values either gets used as keys or as identifiers so you will have to allocate them eventually, std::strings move semantics should insure that the allocated array are not copied around but reused until its final use. The default NRVO/RVO/Move should reduce any copying of the data if not of the string header itself.
Method 1:
Pass result as a ref from the caller which has reserved SomeResonableLargeValue chars, then clear it at the start of readStr. This is only usable if the caller actually can reuse the string.
Method 2:
Use the stack.
// Reserve memory for the string (BOTTLENECK)
if (end - idx < SomeReasonableValue) { // 32?
char result[SomeReasonableValue] = {0}; // feel free to use std::array if you want bounds checking, but the preceding "if" should insure its not a problem.
int ridx = 0;
for(; idx < end; ++idx) {
// Not an escape sequence
if(!isC('\\')) { result[ridx++] = getC(); continue; }
// Escape sequence: skip '\'
++idx;
// Convert escape sequence
result[ridx++] = getEscapeSequence(getC());
}
// Skip closing '"'
++idx;
result[ridx] = 0; // 0-terminated.
// optional assert here to insure nothing went wrong.
return result; // the bottleneck might now move here as the data is copied to the receiving string.
}
// fallback code only if the string is long.
// Your original code here
Method 3:
If your string by default can allocate some size to fill its 32/64 byte boundary, you might want to try to use that, construct result like this instead in case the constructor can optimize it.
Str result(end - idx, 0);
Method 4:
Most systems already has some optimized allocator that like specific block sizes, 16,32,64 etc.
siz = ((end - idx)&~0xf)+16; // if the allocator has chunks of 16 bytes already.
Str result(siz);
Method 5:
Use either the allocator made by google or facebooks as global new/delete replacement.
To understand how a custom allocator can help you, you need to understand what malloc and the heap does and why it is quite slow in comparison to the stack.
The Stack
The stack is a large block of memory allocated for your current scope. You can think of it as this
([] means a byte of memory)
[P][][][][][][][][][][][][][][][]
(P is a pointer that points to a specific byte of memory, in this case its pointing at the first byte)
So the stack is a block with only 1 pointer. When you allocate memory, what it does is it performs a pointer arithmetic on P, which takes constant time.
So declaring int i = 0; would mean this,
P + sizeof(int).
[i][i][i][i][P][][][][][][][][][][][],
(i in [] is a block of memory occupied by an integer)
This is blazing fast and as soon as you go out of scope, the entire chunk of memory is emptied simply by moving P back to the first position.
The Heap
The heap allocates memory from a reserved pool of bytes reserved by the c++ compiler at runtime, when you call malloc, the heap finds a length of contiguous memory that fits your malloc requirements, marks it as used so nothing else can use it, and returns that to you as a void*.
So, a theoretical heap with little optimization calling new(sizeof(int)), would do this.
Heap chunk
At first : [][][][][][][][][][][][][][][][][][][][][][][][][]
Allocate 4 bytes (sizeof(int)):
A pointer goes though every byte of memory, finds one that is of correct length, and returns to you a pointer.
After : [i][i][i][i][][][]][][][][][][][][][]][][][][][][][]
This is not an accurate representation of the heap, but from this you can already see numerous reasons for being slow relative to the stack.
The heap is required to keep track of all already allocated memory and their respective lengths. In our test case above, the heap was already empty and did not require much, but in worst case scenarios, the heap will be populated with multiple objects with gaps in between (heap fragmentation), and this will be much slower.
The heap is required to cycle though all the bytes to find one that fits your length.
The heap can suffer from fragmentation since it will never completely clean itself unless you specify it. So if you allocated an int, a char, and another int, your heap would look like this
[i][i][i][i][c][i2][i2][i2][i2]
(i stands for bytes occupied by int and c stands for bytes occupied by a char. When you de-allocate the char, it will look like this.
[i][i][i][i][empty][i2][i2][i2][i2]
So when you want to allocate another object into the heap,
[i][i][i][i][empty][i2][i2][i2][i2][i3][i3][i3][i3]
unless an object is the size of 1 char, the overall heap size for that allocation is reduced by 1 byte. In more complex programs with millions of allocations and deallocations, the fragmentation issue becomes severe and the program will become unstable.
Worry about cases like thread safety (Someone else said this already).
Custom Heap/Allocator
So, a custom allocator usually needs to address these problems while providing the benefits of the heap, such as personalized memory management and object permanence.
These are usually accomplished with specialized allocators. If you know you dont need to worry about thread safety or you know exactly how long your string will be or a predictable usage pattern you can make your allocator fast than malloc and new by quite a lot.
For example, if your program requires a lot of allocations as fast as possible without lots of deallocations, you could implement a stack allocator, in which you allocate a huge chunk of memory with malloc at startup,
e.g
typedef char* buffer;
//Super simple example that probably doesnt work.
struct StackAllocator:public Allocator{
buffer stack;
char* pointer;
StackAllocator(int expectedSize){ stack = new char[expectedSize];pointer = stack;}
allocate(int size){ char* returnedPointer = pointer; pointer += size; return returnedPointer}
empty() {pointer = stack;}
};
Get expected size, get a chunk of memory from the heap.
Assign a pointer to the beginning.
[P][][][][][][][][][] ..... [].
then have one pointer that moves for each allocation. When you no longer need the memory, you simply move the pointer to the beginning of your buffer. This gives your the advantage of O(1) speed allocations and deallocations as well as object permanence for the lack of flexible deallocation and large initial memory requirements.
For strings, you could try a chunk allocator. For every allocation, the allocator gives a set chunk of memory.
Compatibility
Compatibility with other strings is almost guaranteed. As long as you are allocating a contiguous chunk of memory and preventing anything else from using that block of memory, it will work.
I'm trying to optimize some code I've written to handle several layers of an application protocol. I made liberal use of the std::string class, and strove for simplicity rather than premature optimization. The application is too slow, and valgrind & gprof show I'm spending significant time copy-constructing strings as a buffer moves upward through my stack.
It seems to me that, after copying chars from the system buffer to my lowest application buffer, I should be able to avoid copying the data any more: after all, it is not mutated as it moves up the stack.
My protocol format is a "transmission", consisting of one or more newline-terminated records, each consisting of several tab-separated fields, and terminated with a special token. E.g.
RECORD 1\tHAS\tTHESE\tFIELDS\nRECORD 2\tLOOKS\tLIKE\tTHIS\nEND-OF-TRANSMISSION\n
This would be assembled in a single std::string called input_buffer.
The processing of a transmission involves extracting a record from the buffer and passing it to the next layer; extracting a vector of fields from the record and passing it to the next layer; storing the fields into a map. At each stage, data is being copied as new std::strings are allocated.
Is it possible to allocate a const string from an index into input_buffer, and a length ... without any copying being done? For example, RECORD 2 begins at offset 26 and is 24 chars long:
const std:string record (substr(input_buffer, 26), 24 );
I'm not familiar with the innards of a string object, but its performance guarantees seem to imply that somewhere there's a simple char sequence, and almost undoubtedly a pointer to those chars' memory. Could that pointer be initialized to memory belonging to another string?
(My compiler is g++ 4.7, but if this is something that requires 4.8, I'd appreciate knowing about that too.)
From what I understand, this sounds like a good candidate for boost::string_ref.
You would simply do boost::string_ref input(input_buffer); and then pass string_refs up the stack instead. The only thing you have to worry about is keeping the original buffer alive the whole time.
I know that dynamic memory has advantages over setting a fixed size array and and using a portion of it. But in dynamic memory you have to enter the amount data that you want to store in the array. When using strings you can type as many letters as you want(you can even use strings for numbers and then use a function to convert them). This fact makes me think that dynamic memory for character arrays is obsolete compared to strings.
So i wanna know what are the advantages and disadvantages when using strings? When is the space occupied by strings freed? Is maybe the option to free your dynamically allocated memory with delete an advantage over strings? Please explain.
The short answer is "no, there is no drawbacks, only advantages" with std::string over character arrays.
Of course, strings do USE dynamic memory, it just hides the fact behind the scenes so you don't have to worry about it.
In answer to you question: When is the space occupied by strings freed? this post may be helpful. Basically, std::strings are freed once they go out of scope. Often the compiler can decide when to allocate and release the memory.
std::string usually contains an internal dynamically allocated buffer. When you assign data, or if you push back new data, and the current buffer size is not sufficient, a new buffer is allocated with an increased size and the old data is copied or moved to the new buffer. The old buffer is then deallocated.
The main buffer is deallocated when the string goes out of scope. If the string object is a local variable in a function (on the stack), it will deallocate at the end of the current code block. If it's a function parameter, when the function exits. If it's a class member, whenever the class is destroyed.
The advantage of strings is flexibility (increases in size automatically) and safety (harder to go over the bounds of an array). A fixed-size char array on the stack is faster as no dynamic allocation is required. But you should worry about that if you have a performance problem, and not before.
well, your question got me thinking, and then i understood that you are talking about syntax differences, because both ways are dynamic allocating char arrays. the only difference is in the need:
if you need to create a string containing a sentence then you can, and
that's fine, not to use malloc
if you want an array and to "play" with it, meaning change or set the cells cording to some method, or changing it's size, then initiating it with malloc would be the appropriate way
the only reason i see to a static allocating char a[17] (for example) is for a single purpose string that you need, meaning only when you know the exact size you'll need and it won't change
and one important point the i found:
In dynamic memory allocation, if the memory is being continually allocated but the one allocated for objects that are not in use, is not released, then it can lead to stack overflow condition or memory leak which is a big disadvantage.
Update 2:
Well I’ve refactored the work-around that I have into a separate function. This way, while it’s still not ideal (especially since I have to free outside the function the memory that is allocated inside the function), it does afford the ability to use it a little more generally. I’m still hoping for a more optimal and elegant solution…
Update:
Okay, so the reason for the problem has been established, but I’m still at a loss for a solution.
I am trying to figure out an (easy/effective) way to modify a few bytes of an array in a struct. My current work-around of dynamically allocating a buffer of equal size, copying the array, making the changes to the buffer, using the buffer in place of the array, then releasing the buffer seems excessive and less-than optimal. If I have to do it this way, I may as well just put two arrays in the struct and initialize them both to the same data, making the changes in the second. My goal is to reduce both the memory footprint (store just the differences between the original and modified arrays), and the amount of manual work (automatically patch the array).
Original post:
I wrote a program last night that worked just fine but when I refactored it today to make it more extensible, I ended up with a problem.
The original version had a hard-coded array of bytes. After some processing, some bytes were written into the array and then some more processing was done.
To avoid hard-coding the pattern, I put the array in a structure so that I could add some related data and create an array of them. However now, I cannot write to the array in the structure. Here’s a pseudo-code example:
main() {
char pattern[]="\x32\x33\x12\x13\xba\xbb";
PrintData(pattern);
pattern[2]='\x65';
PrintData(pattern);
}
That one works but this one does not:
struct ENTRY {
char* pattern;
int somenum;
};
main() {
ENTRY Entries[] = {
{"\x32\x33\x12\x13\xba\xbb\x9a\xbc", 44}
, {"\x12\x34\x56\x78", 555}
};
PrintData(Entries[0].pattern);
Entries[0].pattern[2]='\x65'; //0xC0000005 exception!!! :(
PrintData(Entries[0].pattern);
}
The second version causes an access violation exception on the assignment. I’m sure it’s because the second version allocates memory differently, but I’m starting to get a headache trying to figure out what’s what or how to get fix this. (I’m currently working around it by dynamically allocating a buffer of the same size as the pattern array, copying the pattern to the new buffer, making the changes to the buffer, using the buffer in the place of the pattern array, and then trying to remember to free the—temporary—buffer.)
(Specifically, the original version cast the pattern array—+offset—to a DWORD* and assigned a DWORD constant to it to overwrite the four target bytes. The new version cannot do that since the length of the source is unknown—may not be four bytes—so it uses memcpy instead. I’ve checked and re-checked and have made sure that the pointers to memcpy are correct, but I still get an access violation. I use memcpy instead of str(n)cpy because I am using plain chars (as an array of bytes), not Unicode chars and ignoring the null-terminator. Using an assignment as above causes the same problem.)
Any ideas?
It is illegal to attempt to modify string literals. Your
Entries[0].pattern[2]='\x65';
line attempts exactly that. In your second example you are not allocating any memory for the strings. Instead, you are making your pointers (in the struct objects) to point directly at string literals. And string literals are not modifiable.
This question gets asked several times every day. Read Why is this string reversal C code causing a segmentation fault? for more details.
The problem boils down to the fact that a char[] is not a char*, even if the char[] acts a lot like a char* in expressions.
Other answers have addressed the reason for the error: you're modifying a string literal which is not allowed.
This question is tagged C++ so the easy way to solve your problem is to use std::string.
struct ENTRY {
std::string pattern;
int somenum;
};
Based on your updates, your real problem is this: You want to know how to initialize the strings in your array of structs in such a way that they're editable. (The problem has nothing to do with what happens after the array of structs is created -- as you show with your example code, editing the strings is easy enough if they're initialized correctly.)
The following code sample shows how to do this:
// Allocate the memory for the strings, on the stack so they'll be editable, and
// initialize them:
char ptn1[] = "\x32\x33\x12\x13\xba\xbb\x9a\xbc";
char ptn2[] = "\x12\x34\x56\x78";
// Now, initialize the structs with their char* pointers pointing at the editable
// strings:
ENTRY Entries[] = {
{ptn1, 44}
, {ptn2, 555}
};
That should work fine. However, note that the memory for the strings is on the stack, and thus will go away if you leave the current scope. That's not a problem if Entries is on the stack too (as it is in this example), of course, since it will go away at the same time.
Some Q/A on this:
Q: Why can't we initialize the strings in the array-of-structs initialization? A: Because the strings themselves are not in the structs, and initializing the array only allocates the memory for the array itself, not for things it points to.
Q: Can we include the strings in the structs, then? A: No; the structs have to have a constant size, and the strings don't have constant size.
Q: This does save memory over having a string literal and then malloc'ing storage and copying the string literal into it, thus resulting in two copies of the string, right? A: Probably not. When you write
char pattern[] = "\x12\x34\x56\x78";
what happens is that that literal value gets embedded in your compiled code (just like a string literal, basically), and then when that line is executed, the memory is allocated on the stack and the value from the code is copied into that memory. So you end up with two copies regardless -- the non-editable version in the source code (which has to be there because it's where the initial value comes from), and the editable version elsewhere in memory. This is really mostly about what's simple in the source code, and a little bit about helping the compiler optimize the instructions it uses to do the copying.