Load globalvariable raw data value in llvm - llvm

When I iterate over the GlobalVariables of a module and use:
Constant *initializer = gv->getInitializer();
ConstantDataSequential *cdata = dyn_cast<ConstantDataSequential>(initializer);
const char *array = cdata->getRawDataValues().data();
to put the data into a char array, I can work with it as if I were operating on a char array object.
However, when I use LoadInst and GetElementPtrInst to get an element, I cannot treat every element as a char. How can I achieve the same thing using the LLVM API?

I can not treat every element as the char type
Sorry, I just want to know whether we can load the value of a GlobalVariable and convert it into a char* or string.
I suggest you take a look at LLVM Language Reference Manual as well as the DragonBook since judging by your post history you pretty much have absolutely no idea about compiler internals yet.
As for your question, there is no such thing as a char or string type at the LLVM IR level. A char* or string is usually of type i8* in the IR, which you could find out yourself by reading through the generated LLVM IR. Some combination of LoadInst, GetElementPtrInst and BitCastInst could achieve what you are trying to do.
On a side note, if you are trying to write string encryption passes, which I believe is the case judging by your post history, take a look at my open-source implementation; hopefully you can learn a bit from it.

Related

C++ passing variable number of locations from a given array to a function

The following piece of code works in C++ when running on Windows:
void* paramsList[MAX_PARAMS_NUM] = { 0 };
...some code to populate paramsList (p.s MAX_PARAMS_NUM is a constant)
vsnprintf((char*)pStr, MAXLEN, (char*)pTempFormat, (va_list)paramsList);
This code works fine on Windows, but I am trying to make it run on Linux, and the program crashes there because the conversion of paramsList to va_list doesn't work.
The setting is that I get a format string from a server that I don't control. The format string (pTempFormat) is like the one used by printf, with an unknown number of % conversions in it (at most MAX_PARAMS_NUM). I populate paramsList accordingly and then use vsnprintf to create a string from the format string and the values in paramsList. Those values can be anything from integers to hex values to char* strings, in any combination, according to the format string received from the server.
I don't know how many slots of paramsList to pass to vsnprintf until I finish populating it according to the format string received from the server. So I need to either pass a variable number of slots from paramsList to vsnprintf or convert those slots into a va_list (which I couldn't figure out how to do from what I read online).
I also considered combining variadic templates and va_list, to somehow pass a variable number of slots from paramsList to a variadic function and forward them to vsnprintf. But I couldn't figure out how to pass selected slots of a given array to a variadic function either.
Update:
I use Visual Studio 2015 to compile on Windows, and GCC 4.9 on Ubuntu.
The error I am getting when trying to compile this code on Linux is: error: ISO C++ forbids casting to an array type 'va_list {aka __va_list_tag [1]}'
va_list is an unspecified type. That means it may be a void* [] or something else entirely.
That it worked in some cases is only because va_list happens to be compatible with void* [] on one particular platform with one compiler; it is by no means an indication that this is legal.
The correct way to deal with this is, unfortunately, to stop handing the whole format string to the printf family and parse the format string yourself. There is no standard functionality to reach in and fetch the parsed format string for your own use.
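A minimal sketch of what "parse the format string yourself" could look like. Everything here is an assumption for illustration: the names format_from_slots and append_one are hypothetical, each conversion is assumed to consume exactly one void* slot, integers are assumed to have been stored in the slots via intptr_t casts, and only %d %i %u %x %X %o %c %s %p and %% are handled (no '*' width, no floating point). Note that it still leans on snprintf for each single conversion, which is a shortcut of this sketch rather than something the answer prescribes. It also cannot verify that a slot really holds what the server's format string claims, so a %s on a non-string slot remains dangerous.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Formats a single value with a single conversion spec (e.g. "%05d") and appends it.
template <typename T>
static void append_one(std::string& out, const std::string& spec, T value) {
    int n = std::snprintf(nullptr, 0, spec.c_str(), value);   // first pass: measure
    if (n <= 0) return;                                       // empty result or error
    std::string buf(static_cast<std::size_t>(n) + 1, '\0');
    std::snprintf(&buf[0], buf.size(), spec.c_str(), value);  // second pass: write
    out.append(buf, 0, static_cast<std::size_t>(n));
}

// Hypothetical helper: walks fmt and takes one slot per conversion.
std::string format_from_slots(const char* fmt, void* const* slots, std::size_t count) {
    std::string out;
    std::size_t next = 0;
    for (const char* p = fmt; *p != '\0'; ++p) {
        if (*p != '%') { out += *p; continue; }
        const char* start = p;
        ++p;
        if (*p == '%') { out += '%'; continue; }
        // Copy flags, width and precision into our own spec.
        std::string spec = "%";
        while (*p != '\0' && std::strchr("-+ #0123456789.", *p) != nullptr) spec += *p++;
        // Drop the length modifiers; remember whether a wide integer was requested.
        bool wide = false;
        while (*p != '\0' && std::strchr("hljzt", *p) != nullptr) {
            if (*p != 'h') wide = true;
            ++p;
        }
        const char conv = *p;
        if (conv == '\0') { out.append(start); break; }  // truncated spec: copy as-is
        if (std::strchr("dixXuocsp", conv) == nullptr || next >= count) {
            out.append(start, p + 1);  // unsupported conversion or no slot left
            continue;
        }
        void* slot = slots[next++];
        const std::intptr_t sval = reinterpret_cast<std::intptr_t>(slot);
        const std::uintptr_t uval = reinterpret_cast<std::uintptr_t>(slot);
        switch (conv) {
        case 'd': case 'i':
            if (wide) append_one(out, spec + "lld", static_cast<long long>(sval));
            else      append_one(out, spec + "d", static_cast<int>(sval));
            break;
        case 'u': case 'x': case 'X': case 'o':
            if (wide) append_one(out, spec + "ll" + conv, static_cast<unsigned long long>(uval));
            else      append_one(out, spec + conv, static_cast<unsigned int>(uval));
            break;
        case 'c': append_one(out, spec + "c", static_cast<int>(sval)); break;
        case 's': append_one(out, spec + "s", static_cast<const char*>(slot)); break;
        case 'p': append_one(out, spec + "p", slot); break;
        }
    }
    return out;
}
```

Under these assumptions, a call such as format_from_slots((const char*)pTempFormat, paramsList, MAX_PARAMS_NUM) would take the place of the vsnprintf call, and the same code compiles on both Windows and Linux because no va_list is involved.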

How to hard code binary data to string

I want to test serialized data conversion in my application. Currently the object is stored in a file; the application reads the binary file and reloads the object from it.
I want to test this operation in my unit test case. As file operations are costly, I want to hard-code the binary file content in the code itself.
How can I do this?
Currently I am trying this:
std::string FileContent = "\00\00\00\00\00.........";
and it is not working.
You're right that a string can contain '\0', but here you're still initializing it from const char*, which, by definition, stops at the first '\0'. I'd recommend you to use uint8_t[] or even uint32_t[] (that is, without passing to std::string), even if the second might have up to 3 bytes of overhead (but it's more compact when in source). That's e.g. how X bitmaps are usually stored.
Another possibility is base64 encoding, which is printable but needs (a relatively quick) decoding.
If you really want to put the const char[] to a std::string, first convert the pointer to const char*, then use the two-iterator constructor of std::string. While it's true that std::string can hold '\0', it's somewhat an antipattern to store binary in a string, thus I'm not giving the exact code, just the hint.
The following should do what you need, however probably not recommended as most people wouldn't expect an std::string to contain null bytes.
std::string FileContent { "\x00\x00\x00\x00\x00", 5 };
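To make the hints above concrete, here is a small sketch of the byte-array approach together with the two-iterator constructor. The array contents and the names kFixture, fixture_as_string and truncated_fixture are made up for illustration; a real test would paste the serialized object's bytes instead.

```cpp
#include <cstddef>
#include <string>

// Hypothetical fixture bytes standing in for the serialized object.
static const unsigned char kFixture[] = { 0x00, 0x00, 0x01, 0x00, 0xFF };

// Two-iterator constructor: copies all sizeof(kFixture) bytes, zeros included.
std::string fixture_as_string() {
    const char* first = reinterpret_cast<const char*>(kFixture);
    return std::string(first, first + sizeof kFixture);
}

// For contrast: building from a bare const char* stops at the first '\0',
// which here is the very first byte, so the result is empty.
std::string truncated_fixture() {
    return std::string(reinterpret_cast<const char*>(kFixture));
}
```

If the deserializer accepts a pointer and a length, passing kFixture and sizeof kFixture directly avoids the detour through std::string altogether.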

When to use std::string vs char*? [duplicate]

I'm new to C++ coming from C# but I really do like C++ much better.
I have an abstract class that defines two constant strings (not static). And I wondered if a const char* would be a better choice. I'm still getting the hang of the C++ standards, but I just figured that there really isn't any reason why I would need to use std::string in this particular case (no appending or editing the string, just writing to the console via printf).
Should I stick to std::string in every case?
Should I stick to std::string in every case?
Yes.
Except perhaps for a few edge cases where you are writing a high-performance multi-threaded logging library and you really need to know when a memory allocation is going to take place, or perhaps when fiddling with individual bits in a packet header in some low-level protocol or driver.
The problem is that you start with a simple char*, and then you need to print it, so you use printf(), then perhaps sprintf() to format it, because a stream would be a pain for just one int-to-string conversion. And you end up with an unsafe and unmaintainable mix of C and C++.
I would stick to using std::string instead of const char*, simply because most of the built-in C++ libraries work with strings and not character arrays. std::string has a lot of built-in methods and facilities that give the programmer a lot of power when manipulating strings.
Should I stick to std::string in every case?
There are cases where std::string isn't needed and just a plain char const* will do. However you do get other functionality besides manipulation, you also get to do comparison with other strings and char arrays, and all the standard algorithms to operate on them.
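A short sketch of the comparison point, with hypothetical function names. With raw pointers, == compares addresses, so comparing text needs strcmp; with std::string, == compares contents, also against a plain char array. The last function shows that a std::string can still be handed to printf through c_str().

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Raw pointers: the text has to be compared explicitly with strcmp.
bool same_text_cstr(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}

// std::string: operator== compares the characters, not the addresses.
bool same_text_string(const std::string& a, const char* b) {
    return a == b;
}

// Printing via printf still works; c_str() exposes a const char*.
void print_name(const std::string& name) {
    std::printf("%s\n", name.c_str());
}
```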
I would say go with std::string by default (for members and variables), and then only change if you happen to see that is the cause of a performance drop (which it won't).
Use std::string when you need to store a value.
Use const char * when you want maximum flexibility, as almost everything can be easily converted to or from one.
This is like comparing apples to oranges. std::string is a container class, while char* is just a pointer to a character sequence.
It really all depends on what you want to do with the string.
std::string, on the other hand, gives you quick access to simple string calculation and manipulation functions. Most of those are simple string manipulation functions, nothing fancy really.
So it basically depends on your needs and how your functions are declared. The main advantage of std::string over a char pointer is that it doesn't require a specific length declaration.

C++: How to add raw binary data into source with Visual Studio?

I have a binary file which I want to embed directly into my source code, so that it is compiled into the .exe file instead of being read from a file, and the data is already in memory when I launch the program.
How do I do this?
The only idea I had was to encode my binary data as base64, put it in a string variable and then decode it back to raw binary data, but this is a tricky method which causes pointless memory allocation. Also, I would like the data stored in the .exe to be as compact as the original data.
Edit: The reason I thought of using base64 was that I wanted to make the source code files as small as possible too.
The easiest and most portable way would be to write a small program which converts the data to a C++ source, then compile that and link it into your program. This generated file might look something like:
unsigned char rawData[] =
{
0x12, 0x34, // ...
};
There are tools for this, a typical name is "bin2c". The first search result is this page.
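A rough sketch of the core of such a converter, under the assumption that the input is non-empty and has already been read into memory. The function name to_cpp_array and the layout of twelve bytes per line are my own choices, not taken from any particular bin2c tool.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical generator: renders bytes as the source of an unsigned char array.
// Assumes bytes is non-empty, since an array with an empty initializer is not valid C++.
std::string to_cpp_array(const std::vector<unsigned char>& bytes, const std::string& name) {
    std::string out = "unsigned char " + name + "[] =\n{\n";
    char item[8];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        // Each byte becomes "0xNN," (five characters plus the terminator).
        std::snprintf(item, sizeof item, "0x%02x,", static_cast<unsigned>(bytes[i]));
        out += (i % 12 == 0) ? "    " : " ";  // indent at the start of each row
        out += item;
        if (i % 12 == 11 || i + 1 == bytes.size()) out += '\n';
    }
    out += "};\n";
    return out;
}
```

A small main() that reads the binary file into the vector and writes the returned text to a .cpp file would complete the tool; the generated file can then be added to the Visual Studio project like any other source.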
You need to make a char array, and preferably also make it static const.
In C:
Some care might be needed since you can't have a char-typed literal, and also because generally the signedness of C's char datatype is up to the implementation.
You might want to use a format such as
static const unsigned char my_data[] = { (unsigned char) 0xfeu, (unsigned char) 0xabu, /* ... */ };
Note that each unsigned int literal is cast to unsigned char, and also the 'u' suffix that makes them unsigned.
Since this question was for C++, where you can have a char-typed literal, you might consider using a format such as this, instead:
static const char my_data[] = { '\xfe', '\xab', /* ... */ };
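Since this is just an array of char, you could just as well use ordinary string literal syntax. Embedding zero bytes should be fine, as long as you don't try to treat it as a string. Note that the literal appends a terminating zero byte, so the array is one byte longer than the data, and that a hex escape consumes every hex digit that follows it:
static const char my_data[] = "\xfe\xab ...";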
This is the most compact solution. In fact, you could probably use that for C, too.
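Two details of the string literal form are easy to trip over, so here is a small sketch with made-up bytes and hypothetical names: the compiler appends a terminating zero byte, and a hex escape keeps consuming hex digits until it hits a character that is not one.

```cpp
#include <cstddef>

// Hypothetical four-byte payload as a plain byte array.
static const unsigned char kAsBytes[] = { 0xfe, 0xab, 0x00, 0x01 };

// The same bytes as a string literal. The embedded zero is kept, and the
// compiler appends one more '\0', so this array has 5 elements, not 4.
static const char kAsLiteral[] = "\xfe\xab\x00\x01";

// "\xabc" would be read as ONE out-of-range escape, not 0xab followed by 'c'.
// Splitting the literal ends the escape: this array is 0xab, 'c', '\0'.
static const char kSplit[] = "\xab" "c";

// Number of payload bytes in kAsLiteral, excluding the appended terminator.
constexpr std::size_t kLiteralPayloadSize = sizeof kAsLiteral - 1;
```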
You can use resource files (.rc). Sometimes they are bad, but for Windows based application that's the usual way.
Why base64? Just store the file as it is in one char*.

Initializing a char array in C. Which way is better?

The following are the two ways of initializing a char array:
char charArray1[] = "foo";
char charArray2[] = {'f','o','o','\0'};
If both are equivalent, one would expect everyone to use the first option above (since it requires fewer key strokes). But I've seen code where the author takes the pain to always use the second method.
My guess is that in the first case the string "foo" is stored in the data segment and copied into the array at runtime, whereas in the second case the characters are stored in the code segment and copied into the array at runtime. And for some reason, the author is allergic to having anything in the data segment.
Edit: Assume the arrays are declared local to a function.
Questions: Is my reasoning correct? Which is your preferred style and why?
What about another possibility:
char charArray3[] = {102, 111, 111, 0};
You shouldn't forget that the C char type is a numeric type; it just happens that the value is often used as a character code. If I use an array for something not related to text at all, I would definitely prefer to initialize it with the above syntax rather than encode it as letters and put them between quotes.
If you don't want the terminating 0, you also have to use the second form, or in C use:
char charArray3[3] = "foo";
It is a C feature that nearly nobody knows: if the array does not have enough room to hold the final 0 when initialized from a string literal, the compiler does not store it, and the code is still legal. However, this should be avoided because it is not allowed in C++, and a C++ compiler would report an error.
I checked the assembly code generated by gcc, and all the different forms are equivalent. The only difference is that it uses either the .string or the .byte pseudo-instruction to declare the data. But that's just a readability issue and does not make a bit of difference in the resulting program.
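The equivalence can also be checked from the language side. A small sketch, written in C++ (so the char charArray3[3] = "foo" form is left out, as it does not compile there) and assuming an ASCII execution character set for the numeric form; the function name is hypothetical.

```cpp
#include <cstring>

// Returns true when all three initializer styles produce identical arrays.
bool initializers_agree() {
    char a[] = "foo";                    // string literal, terminator added implicitly
    char b[] = {'f', 'o', 'o', '\0'};    // character list, terminator written out
    char c[] = {102, 111, 111, 0};       // numeric codes, assumes ASCII
    static_assert(sizeof a == 4 && sizeof b == 4 && sizeof c == 4,
                  "each form yields four bytes");
    return std::memcmp(a, b, sizeof a) == 0 && std::memcmp(a, c, sizeof a) == 0;
}
```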
I think the second method is used mostly in legacy code where compilers didn't support the first method. Both methods should store the data in the data segments. I prefer the first method due to readability. Also, I needed to patch a program once (can't remember which, it was a standard UNIX tool) to not use /etc (it was for an embedded system). I had a very hard time finding the correct place because they used the second method and my grep couldn't find "etc" anywhere :-)