The meaning of 'store {} {}, {}* %p' in llvm-ir? - llvm

I know that the store instruction is used to store data into memory, but what does the following piece of llvm-ir code mean? Where you can see lots of empty "{}" structures here.
; CHECK: Function: foo:
; CHECK-NEXT: NoAlias: {}* %p, {}* %q
define void @foo({}* %p, {}* %q) {
store {} {}, {}* %p
store {} {}, {}* %q
ret void
}
FYI: https://github.com/llvm-mirror/llvm/blob/master/test/Analysis/CFLAliasAnalysis/Steensgaard/empty.ll

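Each of those two instructions stores a value of the empty struct type {} (the constant {}) to the address held in a pointer: %p for the first, %q for the second. The stored value occupies zero bytes, but that doesn't affect how the store instruction itself is handled.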
As to why, it's test code. Do you know the joke about the tester who walks into a bar and orders zero beers? Structs can contain zero fields (and sometimes do, e.g. when each field has been separately determined to be redundant and optimised away) so the compiler needs to handle empty structs, and therefore tests contain empty structs.

Related

Change endianness of entire struct in C++

I am writing a parser in C++ to parse a well defined binary file. I have declared all the required structs. And since only particular fields are of interest to me, so in my structs I have skipped non-required fields by creating char array of size equal to skipped bytes. So I am just reading the file in char array and casting the char pointer to my struct pointer. Now problem is that all data fields in that binary are in big endian order, so after typecasting I need to change the endianness of all the struct fields. One way is to do it manually for each and every field. But there are various structs with many fields, so it'll be very cumbersome to do it manually. So what's the best way to achieve this. And since I'll be parsing very huge such files (say in TB's), so I require a fast way to do this.
EDIT: I have used __attribute__((packed)), so there is no need to worry about padding.
If you can do misaligned accesses with no penalty, and you don't mind compiler- or platform-specific tricks to control padding, this can work. (I assume you are OK with this since you mention __attribute__((packed))).
In this case the nicest approach is to write value wrappers for your raw data types, and use those instead of the raw type when declaring your struct in the first place. Remember the value wrapper must be trivial/POD-like for this to work. If you have a POSIX platform you can use ntohs/ntohl for the endian conversion; they are likely to be better optimized than whatever you write yourself.
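A minimal sketch of such a value wrapper, assuming unsigned big-endian fields. The names BigEndian, RecordHeader and readHeader are made up for illustration and are not from the question. Because the wrapper stores plain bytes, it has alignment 1, so this particular layout needs neither packing attributes nor misaligned loads:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical wrapper: keeps the bytes exactly as they appear in the file
// (big-endian) and converts to the host representation only when read.
template <typename T>
struct BigEndian {
    static_assert(std::is_unsigned<T>::value, "sketch covers unsigned integers only");
    unsigned char bytes[sizeof(T)];

    operator T() const {
        T v = 0;
        for (std::size_t i = 0; i < sizeof(T); ++i)
            v = static_cast<T>((v << 8) | bytes[i]);  // most significant byte first
        return v;
    }
};

// Hypothetical record: two fields of interest and three skipped bytes.
struct RecordHeader {
    BigEndian<std::uint16_t> type;
    unsigned char skipped[3];
    BigEndian<std::uint32_t> length;
};

static_assert(sizeof(RecordHeader) == 9, "no padding expected between byte arrays");
static_assert(std::is_trivially_copyable<RecordHeader>::value, "must stay POD-like");

// Copies one record out of the raw file buffer.
inline RecordHeader readHeader(const unsigned char* raw) {
    RecordHeader h;
    std::memcpy(&h, raw, sizeof h);
    return h;
}
```

Reading a field is then just `std::uint32_t n = header.length;`, with the conversion happening in the wrapper. Whether this is fast enough for terabyte-sized inputs would need measuring.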
If misaligned accesses are illegal or slow on your platform, you need to deserialize instead. Since we don't have reflection yet, you can do this with the same value wrappers (plus an Ignore<N> placeholder that skips N bytes for fields you're not interested in), and declare them in a tuple instead of a struct - you can iterate over the members in a tuple and tell each to deserialize itself from the message.
One way to do that is combine C preprocessor with C++ operators. Write a couple of C++ classes like this one:
#include "immintrin.h"
class FlippedInt32
{
int value;
public:
inline operator int() const
{
return _bswap( value );
}
};
class FlippedInt64
{
__int64 value;
public:
inline operator __int64() const
{
return _bswap64( value );
}
};
Then,
#define int FlippedInt32
before including the header that defines these structures. #undef immediately after the #include.
This will replace all int fields in the structures with FlippedInt32, which has the same size but returns flipped bytes.
If it’s your own structures which you can modify you don’t need the preprocessor part. Just replace the integers with the byte-flipping classes.
If you can come up with a list of offsets (in-bytes, relative to the top of the file) of the fields that need endian-conversion, as well as the size of those fields, then you could do all of the endian-conversion with a single for-loop, directly on the char array. E.g. something like this (pseudocode):
struct EndianRecord {
    size_t offsetFromTop;
    size_t fieldSizeInBytes;
};
std::vector<EndianRecord> todoList;
// [populate the todo list here...]
char * rawData = [pointer to the raw data]
for (size_t i=0; i<todoList.size(); i++)
{
    const EndianRecord & er = todoList[i];
    // Reverse the bytes of this one field in place
    ByteSwap(&rawData[er.offsetFromTop], er.fieldSizeInBytes);
}
struct MyPackedStruct * data = (struct MyPackedStruct *) rawData;
// Now you can just read the member variables
// as usual because you know they are already
// in the correct endian-format.
... of course the difficult part is coming up with the correct todoList, but since the file format is well-defined, it should be possible to generate it algorithmically (or better yet, create it as a generator with e.g. a GetNextEndianRecord() method that you can call, so that you don't have to store a very large vector in memory)
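The pseudocode above leaves ByteSwap undefined. One possible sketch, assuming it simply reverses the bytes of a field in place (SwapAll is a hypothetical helper name added here to wrap the loop):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct EndianRecord {
    std::size_t offsetFromTop;
    std::size_t fieldSizeInBytes;
};

// Reverses the bytes of one field in place, which converts between
// big-endian and little-endian for any field width.
inline void ByteSwap(char* field, std::size_t size) {
    std::reverse(field, field + size);
}

// Hypothetical helper: applies ByteSwap to every field in the to-do list.
inline void SwapAll(char* rawData, const std::vector<EndianRecord>& todoList) {
    for (const EndianRecord& er : todoList) {
        ByteSwap(rawData + er.offsetFromTop, er.fieldSizeInBytes);
    }
}
```

This version swaps unconditionally, so it is only correct when the host is little-endian and the file is big-endian (or the reverse).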

C++ Memory Allocation Of A Variable That Is Not Initialized (32bit Machine)

I noticed that if you allocate a char array inside of a function like this
void example()
{
char dataBuffer[100] = { 0 };
}
and then examine the disassembly of the function with IDA, you can see that the compiler actually inserts a call to memset() in order to initialize the char array. It looks something like this after I reversed it
memset(stackPointer + offset + 1, 0, 100);
The raw assembly looks like
addic r3, r1, 0x220
addic r3, r3, 1
clrldi r3, r3, 32
li r4, 0
li r5, 0x64
bl memset
But if I were to change the example() function to
void example()
{
char dataBuffer[100];
}
then the call to memset() is not inserted, as I noticed when examining the disassembly in IDA.
So basically my question is, if the char array is not initialized to zero will it still be safe to work with? For example
void example()
{
char dataBuffer[100];
strcpy(dataBuffer, "Just some random text");
strcat(dataBuffer, "blah blah blah example text\0");//null terminator probably not required as strcpy() appends null terminator and strcat() moves the null terminator to end of string. but whatever
}
Should I expect any UB when writing/reading to the char array like this even when it is not initialized to zero with the inserted memset() that comes along with initializing the char array with = { 0 }?
It's perfectly safe to work with it as an array with garbage data. This means writing into it is safe, reading from it is not. You simply just don't know what is in it yet. The function strcpy doesn't read from the array it gets (or more specifically, from the pointer it gets) it just writes onto it. So it's safe.
Once you are done writing into your char buffer and come to use it, you read through it until you encounter a null (0) character. That null character was put there by your last write. After it comes garbage if you didn't initialize the buffer, and 0's if you did. In both cases it doesn't matter, since you are not going to read past the null character.
See: http://www.cplusplus.com/reference/cstring/strcpy/
it uses a very similar example to the code you provided.
The line
char dataBuffer[100];
calls the variable dataBuffer into existence, and thus also associates memory with it. However, as an optimization, this memory is not initialized. C is designed not to perform any unnecessary work, and you are working in the C subset of C++ here.
That said, if your compiler can prove that you don't actually use the memory, it does not need to allocate it. But such an optimization would not be detectable from within your running, standard compliant code by definition. Your code will run as if the memory had been allocated. (This as-if rule is the basis for pretty much all optimizations that your compiler is allowed to perform.)
Your strcpy() and strcat() calls are fine, as they do not overrun the allocated buffer. But better forget that strcpy() and strcat() exist, there are better, safer functions to use nowadays.
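The answer does not name the safer alternatives. Two common candidates, shown here as an assumption about what was meant, are std::string and snprintf. The function names below are made up for illustration:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// std::string grows as needed, so there is no fixed buffer to overrun.
inline std::string buildWithString() {
    std::string s = "Just some random text";
    s += "blah blah blah example text";
    return s;
}

// snprintf never writes more than outSize bytes and always terminates the
// output when outSize > 0. It returns the length the full text would need.
inline int buildWithSnprintf(char* out, std::size_t outSize) {
    return std::snprintf(out, outSize, "%s%s",
                         "Just some random text",
                         "blah blah blah example text");
}
```

A return value from buildWithSnprintf that is greater than or equal to outSize means the text was truncated.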

Storing dynamic length data 'inside' structure

Problem statement : User provides some data which I have to store inside a structure. This data which I receive come in a data structure which allows user to dynamically add data to it.
Requirement: I need a way to store this data 'inside' the structure, contiguously.
eg. Suppose user can pass me strings which I have to store. So I wrote something like this :
void pushData( string userData )
{
struct
{
string junk;
} data;
data.junk = userData;
}
Problem : When I do this kind of storage, actual data is not really stored 'inside' the structure because string is not POD. Similar problem comes when I receive vector or list.
Then I could do something like this :
void pushData( string userData )
{
struct
{
char junk[100];
} data;
// Copy userdata into array junk
}
This stores the data 'inside' the structure, but then I can't put an upper limit on the size of the string the user can provide.
Can someone suggest some approach ?
P.S. : I read something about serializability, but couldnt really make out clearly if it could be helpful in my case. If it is the way to go forward, can someone give idea how to proceed with it ?
Edit :
No this is not homework.
I have written an implementation which can pass this kind of structure over message queues. It works fine with PODs, but I need to extend it to pass on dynamic data as well.
This is how message queue takes data:
i. Give it a pointer and tell the size till which it should read and transfer data.
ii. For plain old data types, the data is stored inside the structure, so I can easily pass the pointer to this structure to the message queue and on to other processes.
iii. But in case of vector/string/list etc, actual data is not inside the structure and thus if I pass on the pointer of this structure, message queue will not really pass on the actual data, but rather the pointers which would be stored inside this structure.
You can see this and this. I am trying to achieve something similar.
void pushData( string userData )
{
struct Data
{
char junk[1];
};
struct Data* data = (struct Data*)malloc(sizeof(struct Data) + userData.size()); // junk[1] already covers the terminator
memcpy(data->junk, userData.data(), userData.size());
data->junk[userData.size()] = '\0'; // assuming you want null termination
}
Here we use an array of length 1, but we allocate the struct using malloc so it can actually have any size we want.
You ostensibly have some rather artificial constraints, but to answer the question: for a single struct to contain a variable amount of data is not possible... the closest you can come is to have the final member be say char [1], put such a struct at the start of a variably-sized heap region, and use the fact that array indexing is not checked to access memory beyond that character. To learn about this technique, see http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html (or the answer John Zwinck just posted)
Another approach is e.g. template <size_t N> struct X { char data_[N]; };, but each instantiation will be a separate struct type, and you can't pre-instantiate every size you might want at run-time (given you've said you don't want an upper bound). Even if you could, writing code that handles different instantiations as the data grows would be nightmarish, as would the code bloat caused.
Having a structure in one place with a string member with data in another place is almost always preferable to the hackery above.
Taking a hopefully-not-so-wild guess, I assume your interest is in serialising the object based on starting address and size, in some generic binary block read/write...? If so, that's still problematic even if your goal were satisfied, as you need to find out the current data size from somewhere. Writing struct-specific serialisation routines that incorporates the variable-length data on the heap is much more promising.
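A rough sketch of what a struct-specific serialisation routine could look like. It assumes a hypothetical Message type with one fixed field and one string, and a reader on the same machine (host byte order, same type sizes):

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical message: one fixed-size field plus variable-length text.
struct Message {
    std::int32_t id;
    std::string text;
};

// Layout of the contiguous block: [id][text length][text bytes].
inline std::vector<char> serialize(const Message& m) {
    const std::uint32_t len = static_cast<std::uint32_t>(m.text.size());
    std::vector<char> out(sizeof m.id + sizeof len + len);
    char* p = out.data();
    std::memcpy(p, &m.id, sizeof m.id);
    p += sizeof m.id;
    std::memcpy(p, &len, sizeof len);
    p += sizeof len;
    std::memcpy(p, m.text.data(), len);
    return out;
}

// Rebuilds a Message from a block produced by serialize().
inline Message deserialize(const char* p) {
    Message m;
    std::uint32_t len = 0;
    std::memcpy(&m.id, p, sizeof m.id);
    p += sizeof m.id;
    std::memcpy(&len, p, sizeof len);
    p += sizeof len;
    m.text.assign(p, len);
    return m;
}
```

The vector's data() pointer and size() are then the pointer and length that a byte-oriented transport needs, and the length prefix tells the receiver how much text follows.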
Simple solution: estimate a maximum size for the data (e.g. 1000) and use a fixed-size buffer. This avoids freeing and re-allocating memory of a new size (which fragments memory) when pushData is called multiple times.
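#define MAX_SIZE 1000
void pushData( string userData )
{
    struct Data
    {
        char junk[MAX_SIZE];
    } data;
    // Truncate so that the copy and the terminator always fit in junk
    size_t n = userData.size() < MAX_SIZE - 1 ? userData.size() : MAX_SIZE - 1;
    memcpy(data.junk, userData.data(), n);
    data.junk[n] = '\0'; // assuming you want null termination
}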
As mentioned by John Zwinck....you can use dynamic memory allocation to solve your problem.
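void pushData( string userData )
{
    struct Data
    {
        char *junk;
    };
    // One allocation for the struct, a second one for the string it points to
    struct Data *d = (struct Data *)calloc(1, sizeof(struct Data));
    d->junk = (char *)malloc(userData.size() + 1);
    strcpy(d->junk, userData.c_str());
}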

How do I fit a variable sized char array in a struct?

I don't understand how the reallocation of memory for a struct allows me to insert a larger char array into my struct.
Struct definition:
typedef struct props
{
char northTexture[1];
char southTexture[1];
char eastTexture[1];
char westTexture[1];
char floorTexture[1];
char ceilingTexture[1];
} PROPDATA;
example:
void function SetNorthTexture( PROPDATA* propData, char* northTexture )
{
if( strlen( northTexture ) != strlen( propData->northTexture ) )
{
PROPDATA* propPtr = (PROPDATA*)realloc( propData, sizeof( PROPDATA ) +
sizeof( northTexture ) );
if( propPtr != NULL )
{
strcpy( propData->northTexture, northTexture );
}
}
else
{
strcpy( propData->northTexture, northTexture );
}
}
I have tested something similar to this and it appears to work, I just don't understand how it does work. Now I expect some people are thinking "just use a char*" but I can't for whatever reason. The string has to be stored in the struct itself.
My confusion comes from the fact that I haven't resized my struct for any specific purpose. I haven't somehow indicated that I want the extra space to be allocated to the north texture char array in that example. I imagine the extra bit of memory I allocated is used for actually storing the string, and somehow when I call strcpy, it realises there is not enough space...
Any explanations on how this works (or how this is flawed even) would be great.
Is this C or C++? The code you've posted is C, but if it's actually C++ (as the tag implies) then use std::string. If it's C, then there are two options.
If (as you say) you must store the strings in the structure itself, then you can't resize them. C structures simply don't allow that. That "array of size 1" trick is sometimes used to bolt a single variable-length field onto the end of a structure, but can't be used anywhere else because each field has a fixed offset within the structure. The best you can do is decide on a maximum size, and make each an array of that size.
Otherwise, store each string as a char*, and resize with realloc.
This answer is not to promote the practice described below, but to explain things. There are good reasons not to use malloc, and the suggestions in other answers to use std::string are valid.
I think You have come across the trick used for example by Microsoft to avoid the cost of a pointer dereference. In the case of Unsized Arrays in Structures (please check the link) it relies on a non-standard extension to the language. You can use a trick like that, even without the extension, but only for the struct member that is positioned at its end in memory. Usually the last member in the structure declaration is also the last in memory, but check this question to know more about it. For the trick to work, You also have to make sure the compiler won't add padding bytes at the end of the structure.
The general idea is like this: Suppose You have a structure with an array at the end like
struct MyStruct
{
int someIntField;
char someStr[1];
};
When allocating on the heap, You would normally say something like this
MyStruct* msp = (MyStruct*)malloc(sizeof(MyStruct));
However, if You allocate more space than Your struct actually occupies, You can reference the bytes that are laid out in memory right behind the struct with "out of bounds" access to the array elements. Assuming some typical sizes for the int and the char, and lack of padding bytes at the end, if You write this:
MyStruct* msp = (MyStruct*)malloc(sizeof(MyStruct) + someMoreBytes);
The memory layout should look like:
| msp | msp+1 | msp+2 | msp+3 | msp+4 | msp+5 | msp+6 | ... |
| <- someIntField -> |someStr[0]| <- someMoreBytes -> |
In that case, You can reference the byte at the address msp+6 like this:
msp->someStr[2];
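For comparison, here is a sketch of a variant of the same single-allocation idea that does not index past the end of a declared array. The variable-length bytes are addressed as "whatever follows the header". Blob, payload and makeBlob are hypothetical names, and this is one possible arrangement rather than the technique the answer describes:

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical header; the characters live directly after it in the same block.
struct Blob {
    std::size_t length;
};

// Address of the first payload byte, immediately behind the header.
inline char* payload(Blob* b) {
    return reinterpret_cast<char*>(b + 1);
}

// Allocates header and text in a single malloc'd block; release with std::free.
inline Blob* makeBlob(const char* text) {
    const std::size_t n = std::strlen(text);
    void* mem = std::malloc(sizeof(Blob) + n + 1);
    if (mem == nullptr) return nullptr;
    Blob* b = new (mem) Blob{n};           // start the header's lifetime in the block
    std::memcpy(payload(b), text, n + 1);  // copies the terminator too
    return b;
}
```

Like the array-of-1 trick, this works for only one variable-length region per allocation.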
strcpy is not that intelligent, and it is not really working.
The call to realloc() allocates enough space for the string - so it doesn't actually crash but when you strcpy the string to propData->northTexture you may be overwriting anything following northTexture in propData - propData->southTexture, propData->westTexture etc.
For example if you called SetNorthTexture(prop, "texture");
and printed out the different textures then you would probably find that:
northTexture is "texture"
southTexture is "exture"
eastTexture is "xture" etc (assuming that the arrays are byte aligned).
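The overlap can be illustrated without actually overrunning the struct. The sketch below takes the field offsets from the question's PROPDATA and applies them to a separate buffer that holds the copied text. fieldView is a hypothetical helper:

```cpp
#include <cstddef>
#include <string>

// Same layout as the struct in the question.
struct PROPDATA {
    char northTexture[1];
    char southTexture[1];
    char eastTexture[1];
    char westTexture[1];
    char floorTexture[1];
    char ceilingTexture[1];
};

static_assert(sizeof(PROPDATA) == 6, "six one-byte fields, one byte apart");

// Reads the C string that starts at a given byte offset inside a raw block,
// which is what printing a field through a char* effectively does.
inline std::string fieldView(const char* block, std::size_t fieldOffset) {
    return std::string(block + fieldOffset);
}
```

With a block containing "texture", `fieldView(block, offsetof(PROPDATA, southTexture))` starts one byte in and `offsetof(PROPDATA, eastTexture)` two bytes in, which is why each field appears to hold a shorter tail of the same string.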
Assuming you don't want to statically allocate char arrays big enough to hold the largest strings, and if you absolutely must have the strings in the structure then you can store the strings one after the other at the end of the structure. Obviously you will need to dynamically malloc your structure to have enough space to hold all the strings + offsets to their locations.
This is very messy and inefficient as you need to shuffle things around if strings are added, deleted or changed.
"My confusion comes from the fact that I haven't resized my struct for any specific purpose."
In low level languages like C there is some kind of distinction between structs (or types in general) and actual memory. Allocation basically consists of two steps:
Allocation of raw memory buffer of right size
Telling the compiler that this piece of raw bytes should be treated as a structure
When you do realloc, you do not change the structure, but you change the buffer it is stored in, so you can use extra space beyond structure.
Note that, although your program will not crash, it's not correct. When you put text into northTexture, you will overwrite other structure fields.
NOTE: This has no char array example but it is the same principle. It is just a guess of mine of what are you trying to achieve.
My opinion is that you have seen somewhere something like this:
typedef struct tagBITMAPINFO {
BITMAPINFOHEADER bmiHeader;
RGBQUAD bmiColors[1];
} BITMAPINFO, *PBITMAPINFO;
What you are trying to obtain can happen only when the array is at the end of the struct (and only one array).
For example you allocate sizeof(BITMAPINFO)+15*sizeof(RGBQUAD) when you need to store 16 RGBQUAD structures (1 from the structure and 15 extra).
PBITMAPINFO info = (PBITMAPINFO)malloc(sizeof(BITMAPINFO)+15*sizeof(RGBQUAD));
You can access all the RGBQUAD structures like they are inside the BITMAPINFO structure:
info->bmiColors[0]
info->bmiColors[1]
...
info->bmiColors[15]
You can do something similar to an array declared as char bufStr[1] at the end of a struct.
Hope it helps.
One approach to keeping a struct and all its strings together in a single allocated memory block is something like this:
struct foo {
ptrdiff_t s1, s2, s3, s4;
size_t bufsize;
char buf[1];
} bar;
Allocate sizeof(struct foo)+total_string_size bytes and store the offsets to each string in the s1, s2, etc. members and bar.buf+bar.s1 is then a pointer to the first string, bar.buf+bar.s2 a pointer to the second string, etc.
You can use pointers rather than offsets if you know you won't need to realloc the struct.
Whether it makes sense to do something like this at all is debatable. One benefit is that it may help fight memory fragmentation or malloc/free overhead when you have a huge number of tiny data objects (especially in threaded environments). It also reduces error handling cleanup complexity if you have a single malloc failure to check for. There may be cache benefits to ensuring data locality. And it's possible (if you use offsets rather than pointers) to store the object on disk without any serialization (keeping in mind that your files are then machine/compiler-specific).
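A sketch of the offsets-into-one-block idea, using a std::vector<char> as the single allocation instead of a malloc'd struct. The layout and the names packStrings and stringAt are assumptions made for illustration:

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical layout of the block:
//   [count][offset 0]...[offset count-1][string 0 '\0'][string 1 '\0']...
// Offsets are measured from the start of the block, so the block can be
// moved or copied without fixing up any pointers.
inline std::vector<char> packStrings(const std::vector<std::string>& strings) {
    const std::size_t count = strings.size();
    const std::size_t tableSize = sizeof(std::size_t) * (1 + count);
    std::size_t total = tableSize;
    for (const std::string& s : strings) total += s.size() + 1;

    std::vector<char> block(total);
    std::memcpy(block.data(), &count, sizeof count);
    std::size_t cursor = tableSize;
    for (std::size_t i = 0; i < count; ++i) {
        std::memcpy(block.data() + sizeof(std::size_t) * (1 + i), &cursor, sizeof cursor);
        std::memcpy(block.data() + cursor, strings[i].c_str(), strings[i].size() + 1);
        cursor += strings[i].size() + 1;
    }
    return block;
}

// Returns a pointer to the i-th string inside a block built by packStrings().
inline const char* stringAt(const std::vector<char>& block, std::size_t i) {
    std::size_t offset = 0;
    std::memcpy(&offset, block.data() + sizeof(std::size_t) * (1 + i), sizeof offset);
    return block.data() + offset;
}
```

As the answer notes, changing the length of one string means rebuilding the block, so this suits data that is written once and read many times.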

Handling different datatypes in a single structure

I need to send some information on a VxWorks message queue. The information to be sent is decided at runtime and may be of different data types. I am using a structure for this -
struct structData
{
char m_chType; // variable to indicate the data type - long, float or string
long m_lData; // variable to hold long value
float m_fData; // variable to hold float value
string m_strData; // variable to hold string value
};
I am currently sending an array of structData over the message queue.
structData arrStruct[MAX_SIZE];
The problem here is that only one variable in the structure is useful at a time; the other two are useless. The message queue is therefore unnecessarily overloaded.
I can't use unions because the datatype and the value are required.
I tried using templates, but that doesn't solve the problem. I can only send an array of structures of one datatype at a time.
template <typename T>
struct structData
{
char m_chType;
T m_Data;
};
structData<int> arrStruct[MAX_SIZE];
Is there a standard way to hold such information?
I don't see why you cannot use a union. This is the standard way:
struct structData
{
char m_chType; // variable to indicate the data type - long, float or string
union
{
long m_lData; // variable to hold long value
float m_fData; // variable to hold float value
char *m_strData; // variable to hold string value
};
};
Normally then, you switch on the data type, and then access on the field which is valid for that type.
Note that you cannot put a string into a union, because the string type is a non-POD type. I have changed it to use a pointer, which could be a C zero-terminated string. You must then consider the possibility of allocating and deleting the string data as necessary.
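A minimal sketch of the "switch on the data type" step. The tag values 'l', 'f' and 's' and the describe function are assumptions, since the question does not say how m_chType is encoded:

```cpp
#include <string>

struct structData {
    char m_chType;  // assumed tags: 'l' = long, 'f' = float, 's' = string
    union {
        long m_lData;
        float m_fData;
        const char* m_strData;  // zero-terminated string owned elsewhere
    };
};

// Reads only the union member that the tag says is currently valid.
inline std::string describe(const structData& d) {
    switch (d.m_chType) {
        case 'l': return "long: " + std::to_string(d.m_lData);
        case 'f': return "float: " + std::to_string(d.m_fData);
        case 's': return std::string("string: ") + d.m_strData;
        default:  return "unknown";
    }
}
```

Keeping the tag and the union in step is entirely the caller's job. Note also that a pointer member still refers to memory outside the struct.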
You can use boost::variant for this.
There are many ways to handle different datatypes. Besides the union solution you can use a generic struct like :
typedef struct
{
char m_type;
void* m_data;
}
structData;
This way you know the type and you can cast the void* pointer into the right type.
This is like the union solution a more C than C++ way of doing things.
The C++ way would be something using inheritance. You define a base "Data" class and use inheritance to specialize the data. You can use RTTI to check for type if needed.
But as you stated, you need to send your data over a VxWorks queue. I'm no specialist, but if those queues are OS realtime queues, none of the previous solutions are good ones. Your problem is that your data has variable length (the string in particular) and you need to send it through a queue that probably asks for something like a fixed-length data struct and the actual length of that struct.
In my experience, the right way to handle this is to serialize the data into something like a buffer class/struct. This way you can optimize the size (you only serialize what you need) and you can send your buffer through your queue.
To serialize you can use something like 1 byte for type then data. To handle variable length data, you can use 1 to n bytes to encode data length, so you can deserialize the data.
For a string :
1 byte to code the type (0x01 = string, ...)
2 bytes to code the string length (if you need less than 65536 bytes)
n data bytes
So the string "Hello" will be serialized as (type 0x01, then length 5 with the high byte first, then the five characters):
0x01 0x00 0x05 0x48 0x65 0x6c 0x6c 0x6f
You need a buffer class and a serializer/deserializer class. Then you do something like :
serialize data
send serialized data into queue
and on the other side
receive data
deserialize data
I hope it helps and that I have not misunderstood your problem. The serialization part is overkill if the VxWorks queues are not what I think ...
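The type/length/data scheme described above could be sketched as follows for the string case. The function names are hypothetical, and writing the 2-byte length high byte first is an assumption:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Encodes a string as: 1 type byte (0x01), 2 length bytes (high byte first),
// then the characters. Assumes s.size() < 65536.
inline std::vector<std::uint8_t> serializeString(const std::string& s) {
    std::vector<std::uint8_t> out;
    out.push_back(0x01);
    out.push_back(static_cast<std::uint8_t>((s.size() >> 8) & 0xFF));
    out.push_back(static_cast<std::uint8_t>(s.size() & 0xFF));
    out.insert(out.end(), s.begin(), s.end());
    return out;
}

// Decodes a buffer produced by serializeString(); returns false if the
// type byte is wrong or the buffer is shorter than the length it claims.
inline bool deserializeString(const std::vector<std::uint8_t>& in, std::string& out) {
    if (in.size() < 3 || in[0] != 0x01) return false;
    const std::size_t len = (static_cast<std::size_t>(in[1]) << 8) | in[2];
    if (in.size() < 3 + len) return false;
    out.assign(in.begin() + 3, in.begin() + 3 + static_cast<std::ptrdiff_t>(len));
    return true;
}
```

Long and float values would get their own type bytes and fixed-size payloads in the same style.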
Be very careful with the "string" member in the message queue. Under the hood, it's a pointer to some malloc'd memory that contains the actual string characters, so you're only passing the 'pointer' in your queue, not the real string.
The receiving process may potentially not be able to access the string memory, or -worse - it may have already been destroyed by the time your message reader tries to get it.
+1 for 1800 and Ylisar.
Using a union for this kind of thing is probably the way to go. But, as others pointed out, it has several drawbacks:
inherently error prone.
not safely extensible.
can't handle members with constructors (although you can use pointers).
So unless you can build a nice wrapper, going the boost::variant way is probably safer.
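For illustration, here is the variant approach sketched with std::variant from C++17 rather than boost::variant. This is a substitution made here so that the example needs no external library, and the names Value, Describe and describeValue are made up:

```cpp
#include <string>
#include <variant>

// The variant records which alternative it currently holds, so no separate
// type tag is needed, and it can hold a std::string directly.
using Value = std::variant<long, float, std::string>;

// Visitor with one overload per alternative; std::visit fails to compile
// if an alternative is not handled.
struct Describe {
    std::string operator()(long v) const { return "long: " + std::to_string(v); }
    std::string operator()(float v) const { return "float: " + std::to_string(v); }
    std::string operator()(const std::string& v) const { return "string: " + v; }
};

inline std::string describeValue(const Value& v) {
    return std::visit(Describe{}, v);
}
```

A variant holding a std::string still keeps the characters on the heap, so it has the same problem as the original struct when its raw bytes are sent over a message queue.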
This is a bit offtopic, but this issue is one of the reasons why languages of the ML family have such a strong appeal (at least for me). For example, your issue is elegantly solved in OCaml with:
(*
* LData, FData and StrData are constructors for this sum type,
* they can have any number of arguments
*)
type structData = LData of int | FData of float | StrData of string
(*
* the compiler automatically infers the function signature
* and checks the match exhaustiveness.
*)
let print x =
match x with
| LData(i) -> Printf.printf "%d\n" i
| FData(f) -> Printf.printf "%f\n" f
| StrData(s) -> Printf.printf "%s\n" s
Try QVariant in Qt