LLVM: defining strings and arrays via the C++ API

I am developing a toy compiler and trying to implement strings and arrays.
I have noticed that Clang always creates a global variable for those types, even if they were defined within a function.
I guess that there is a good reason for that, so I try to do the same.
My problem is that I cannot figure out how to do it via c++ API.
The Kaleidoscope tutorial does not cover strings and arrays, so the only source that I have found is the documentation.
In the documentation for the Module class, there is the function getOrInsertGlobal, which looks relevant, but I cannot understand how I set the actual value of the global. The function arguments include only the name and the type of the variable. So where does the value go?
So the question is: how can I define a global string, such as "hello" or array, such as [i32 1, i32 2] in llvm c++ API? Any example would be really appreciated.

What you want is called a read-only GlobalVariable and you need that variable, an initializer, and probably a constant cast so that all of your strings can have the same type.
Suppose your strings are the C kind: null-terminated sequences of bytes. In that case you'll want every string to be exposed as a zero-length array of bytes, so that all strings have the same type. But the initialisers need to be arrays of the right number of bytes, so that each initialiser's type matches its value. So you create your array using something like this (cut and pasted together from bits of code I've written, won't even compile, is not the most efficient way, but does contain most of the building blocks you need):
std::vector<llvm::Constant *> chars(utf8string.size());
for(unsigned int i = 0; i < utf8string.size(); i++)
    chars[i] = ConstantInt::get(i8, utf8string[i]);
auto init = ConstantArray::get(ArrayType::get(i8, chars.size()),
                               chars);
GlobalVariable * v =
    new GlobalVariable(module, init->getType(), true,
                       GlobalVariable::ExternalLinkage, init);
return ConstantExpr::getBitCast(v, i8->getPointerTo());
Note that a GlobalVariable is a pointer to whatever it's been initialised as, so if you initialise it with the five-byte sequence "test\0", then it'll be a pointer to five bytes. Or, if you cast, it can be a pointer to 0 bytes (LLVM lets you index past the official end), or it can be an instance of an abstract type you define.

Using the code and the help of @arnt in the answer above, I ended up with the following code to implement string initialization. It now works, and it also avoids the call to new, so it does not require any cleanup later.
I post it, hoping that it may be useful for someone.
llvm::Value* EulStringToken::generateValue(llvm::Module* module, llvm::LLVMContext& context) {
    //0. Defs (the context is taken by reference: LLVMContext cannot be copied)
    auto str = this->value;
    auto charType = llvm::IntegerType::get(context, 8);
    //1. Initialize chars vector
    std::vector<llvm::Constant *> chars(str.length());
    for(unsigned int i = 0; i < str.size(); i++) {
        chars[i] = llvm::ConstantInt::get(charType, str[i]);
    }
    //1b. add a zero terminator too
    chars.push_back(llvm::ConstantInt::get(charType, 0));
    //2. Initialize the string from the characters
    auto stringType = llvm::ArrayType::get(charType, chars.size());
    //3. Create the declaration statement
    auto globalDeclaration = (llvm::GlobalVariable*) module->getOrInsertGlobal(".str", stringType);
    globalDeclaration->setInitializer(llvm::ConstantArray::get(stringType, chars));
    globalDeclaration->setUnnamedAddr(llvm::GlobalValue::UnnamedAddr::Global);
    //4. Return a cast to an i8*
    return llvm::ConstantExpr::getBitCast(globalDeclaration, charType->getPointerTo());
}


Why do I have to make a 2d array for this

I was solving a question online on strings where we had to perform run-length encoding on a given string, I wrote this function to achieve the answer
#include <string>
#include <vector>
using namespace std;
string runLengthEncoding(string str) {
    vector <char> encString;
    int runLength = 1;
    for(int i = 1; i < str.length(); i++)
    {
        if(str[i - 1] != str[i] || runLength == 9)
        {
            encString.push_back(to_string(runLength)[0]);
            encString.push_back(str[i - 1]);
            runLength = 0;
        }
        runLength++;
    }
    encString.push_back(to_string(runLength)[0]);
    encString.push_back(str[str.size() - 1]);
    string encodedString(encString.begin(), encString.end());
    return encodedString;
}
Here I was getting a very long error on this particular line, both in the for loop and outside it, when I wrote:
encString.push_back(to_string(runLength));
which I later found out should be:
encString.push_back(to_string(runLength)[0]);
I don't quite understand why I have to insert it as a 2D element (I don't know if that is the right way to say it; forgive me, I am a beginner in this) when I am just trying to insert the integer...
In stupid terms - why do I gotta add [0] in this?
std::to_string() returns a std::string. That's what it does, if you check your C++ textbook for a description of this C++ library function that's what you will read there.
encString.push_back( /* something */ )
Because encString is a std::vector<char>, it logically follows that the only thing that can be passed to its push_back() is a char. Just a single char. C++ does not allow you to pass an entire std::string to a function that takes a single char parameter. C++ does not work this way: it allows only certain, specific conversions between different types, and this isn't one of them.
And that's why encString.push_back(to_string(runLength)); does not work. The [0] operator returns the first char from the returned std::string. What a lucky coincidence! You get a char from that, the push_back() expects a single char value, and everyone lives happily ever after.
Also, it is important to note that you do not "gotta add [0]". You could use [1], if you have to add the 2nd character from the string, or any other character from the string, in the same manner. This explains the compilation error. Whether [0] is the right solution, or not, is something that you'll need to figure out separately. You wanted to know why this does not compile without the [0], and that's the answer: to_string() returns a std::string but you must push_back() a single char value, and using [0] makes it happen. Whether it's the right char, or not, is a completely different question.
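To make the type mismatch concrete, here is a minimal sketch of two ways a single-digit run length can become a char. The helper names digitViaToString, digitViaArithmetic and appendRun are hypothetical and not from the question; both conversions assume 0 <= runLength <= 9, which the `runLength == 9` cap in the question's loop is meant to guarantee.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: takes the first character of the decimal text.
// For values 0..9 that text has exactly one character.
char digitViaToString(int runLength) {
    assert(runLength >= 0 && runLength <= 9);
    return std::to_string(runLength)[0];
}

// Hypothetical helper: same result without building a std::string.
// The digits '0'..'9' are guaranteed to be contiguous in C++.
char digitViaArithmetic(int runLength) {
    assert(runLength >= 0 && runLength <= 9);
    return static_cast<char>('0' + runLength);
}

// Appends "<count><symbol>" to a vector<char>, one char per push_back.
void appendRun(std::vector<char>& out, int runLength, char symbol) {
    out.push_back(digitViaToString(runLength));
    out.push_back(symbol);
}
```

With a two-digit value such as 12, `std::to_string(12)[0]` keeps only the '1', which is why the answer stresses that [0] fixes the compile error but not necessarily the logic.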

LLVM Pass - Issues replacing a GlobalVariable

I am trying to write an LLVM pass which manipulates strings.
After iterating all the GlobalVariable objects and picking out the strings, I get the string data, perform the manipulation, create a new GlobalVariable and then use replaceAllUsesWith() to replace the old with the new. Sounds simple enough...
However, I am getting an assert error, telling me that the replacement should be the same type. I have not changed the length of the string, so I don't know why the type would be different. A cut down version of the code is below.
for (Module::global_iterator gi = M.global_begin(), ge = M.global_end(); gi != ge; ++gi) {
GlobalVariable *gv = &*gi;
if (!gv->hasInitializer()) continue;
ConstantDataSequential *cdata = dyn_cast<ConstantDataSequential>(gv->getInitializer());
if (!cdata) continue; // not a constant data array, so not a string
std::string orig = "";
if (cdata->isString()) {
orig = cdata->getAsString().str();
} else if (cdata->isCString()) {
orig = cdata->getAsCString().str();
} else {
continue;
}
// string returned has the same length, but different contents
std::string modified = manipulateString(orig);
std::ostringstream oss;
oss << gv->getName() << "Modified" ;
Constant *cMod = ConstantDataArray::getString(M.getContext(), modified, true);
GlobalVariable *newGv = new GlobalVariable(M,
Note: I've hand typed this code, so it may not compile, but it should serve as an illustration of what I'm trying to achieve and how I'm trying to achieve it.
For some reason, the new GlobalVariable has a different type. Printing the types at runtime yields:
gv->getType() = [36 x i8]*
newGv->getType() = [37 x i8]*
Both strings are 36 chars long. Why is the type of the new GlobalVariable different, even though the string length has not changed? Why has an extra element been added?
Also, replaceAllUsesWith() requires that the replacement be same type. If I wanted the replacement to be string of a different length, how would I achieve that?
You cannot replace with an object of a different type. You can, however, cast the GlobalVariable to have the right type. What you want is...
ConstantExpr::getPointerCast(newGv, gv->getType());
...except that that won't compile, because the second argument has to be a PointerType. You can always add another level of casting, making the code less clear but the compiler more happy:
ConstantExpr::getPointerCast(newGv, cast<PointerType>(gv->getType()));
I have found it helpful to use 0-length arrays for all variable-length arrays, and always cast constants to that.

How to put arguments in a function at run time?

So I am using execlp in my C++ program. execlp has the form "int execlp(const char *file, const char *arg0, ..., const char *argn)", meaning that it can take an arbitrary number of arguments. Is there a way to supply the arguments to this function at run time? Since the arguments are provided by the user, there is no way for me to know the exact number of arguments in advance. Of course I could pick a ridiculously large number from the start, but that won't be very efficient. I need a more efficient way that lets me supply the arguments at run time.
If you are not required to use execlp, execv or execvp are better functions for your requirement.
From http://linux.die.net/man/3/execlp
The execv(), execvp(), and execvpe() functions provide an array of pointers to null-terminated strings that represent the argument list available to the new program. The first argument, by convention, should point to the filename associated with the file being executed. The array of pointers must be terminated by a NULL pointer.
I guess that you are using Linux or some other POSIX system.
You obviously need, as R.Sahu answered, to use functions like execv(3), which takes an array of arguments to execve(2) syscall. You could allocate that array in C dynamic memory with malloc(3) or friends (calloc). If coding in C++, you would use new.
For a useless example, here is a chunk of code executing /bin/echo on an array of arguments 1, 2, ..., nargs where int nargs; is strictly positive.
Variant in C99
char** myargs = malloc((nargs+2)*sizeof(char*));
if (!myargs) { perror("malloc myargs"); exit(EXIT_FAILURE); }
myargs[0] = "echo";
for (int ix=0; ix<nargs; ix++) {
    char buf[32];
    snprintf(buf, sizeof(buf), "%d", ix+1);  /* arguments are 1, 2, ..., nargs */
    myargs[ix+1] = strdup(buf);
    if (!myargs[ix+1]) { perror("strdup"); exit(EXIT_FAILURE); }
}
myargs[nargs+1] = NULL;
execv("/bin/echo", myargs);
/* reached only if execv failed */
perror("exec echo failed");
exit(EXIT_FAILURE);
In C++ you would e.g. code char**myargs = new char*[nargs+2];
In general, you need to later free (in C++, delete) heap-allocated memory. Here it is not really needed, since execv does not return on success. However, on other occasions (e.g. if using fork before execv, so the parent process continues and would later waitpid), you need a loop to free each individual element (the result of strdup), then you need to free the entire myargs array.
Regarding the general question of calling an arbitrary (runtime-known) function of arbitrary signature, this is not possible in plain standard C99, but you could use some libraries (with some assembler or machine-specific code inside them) like libffi.
In genuine C++11 you still need the array argument to execv to be an array of char*. You might consider using (as an intermediate step) some std::vector<std::string> but you'll need at least to transform it into a std::vector<char*> then pass the data to execve. Read about std::string (and its c_str member function) and std::vector (and its data member function). You could try something like:
assert (nargs>0);
// one slot for "echo" plus nargs arguments
std::vector<std::string> vecstr(nargs+1);
vecstr[0] = "echo";
for (int ix=0; ix<nargs; ix++) vecstr[ix+1] = std::to_string(ix+1);
// execv wants char* const*, so collect non-const pointers, plus one slot for the terminator
std::vector<char*> vecargs(nargs+2);
std::transform(vecstr.begin(), vecstr.end(), vecargs.begin(),
               [](std::string& s) { return &s[0]; });
vecargs[nargs+1] = nullptr;
execv("/bin/echo", vecargs.data());
// reached only if execv failed
throw std::runtime_error(std::string{"exec failure:"}+strerror(errno));
Notice that execv can fail, in particular when the array of arguments is too big; usually the limit is a few hundred thousand elements, but it can be much smaller.
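When the arguments really come from the user, the conversion step can be isolated in a small function. The following is only a sketch under assumptions: makeArgv is a hypothetical helper name, and the caller is assumed to keep the std::vector<std::string> alive and unmodified while the pointers are in use.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds a NULL-terminated argv view over `args`.
// The returned pointers refer to the strings in `args`, so `args` must stay
// alive and unmodified for as long as the result is used.
std::vector<char*> makeArgv(std::vector<std::string>& args) {
    assert(!args.empty());  // argv[0] is the program name by convention
    std::vector<char*> argv;
    argv.reserve(args.size() + 1);
    for (std::string& s : args) {
        argv.push_back(&s[0]);  // contiguous and null-terminated since C++11
    }
    argv.push_back(nullptr);  // terminator required by execv/execvp
    return argv;
}
```

A caller would then pass the result's data() pointer to execv or execvp from <unistd.h>, and report errno if the call returns.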

C++ WriteProcessMemory Without Variables

I want to call WriteProcessMemory in C++ with a DWORD or int, without storing it in a variable. I found one way to do this, but I can only do it with bytes. Does anyone know how to do this?
This one works using bytes:
WriteProcessMemory(hProcess, (void*)(BasePointer + 0x728),"\x90\x90", 4, NULL);
Thanks for the help, everyone. I made a function and it's working really well:
void WriteMemory(DWORD Address, DWORD NewValue, int NewValueSize)
{
    WriteProcessMemory(hProcess, (void*)Address, (void*)&NewValue, NewValueSize, NULL);
}
int main()
{
    WriteMemory((BasePointer + 0x6F8), 2+rand()%65500, 2);
    return 0;
}
The reason your code "works" with bytes is that you're using a string literal. A string literal is an array of char, and an array of char automatically converts to a pointer to the first element if the context calls for it, as it does when you try to pass one as the third argument of WriteProcessMemory.
You can write any value you want as a string literal, including a four-byte DWord, as long as you're willing to express it one byte at a time. For example, "\x70\x71\x72\x73". On Windows, that's equivalent to a pointer to the DWord value 0x73727170. You probably won't want to do that, though; expressing numbers like that is tedious.
C++ doesn't offer any facility for having literal arrays of non-char type. There's just not much demand for it. Demand for literal char arrays is high because everyone deals with text, so we want easy ways of expressing it in our code. Although everyone also works with numbers, we rarely have need to express blobs of numerical data in our code, especially not mid-expression.
You haven't given a practical problem to be solved by your question. You're just asking whether something is possible to do. I'm sorry to be the bearer of bad news, but the answer is that what you're asking for cannot be done in C++. You'll just have to do like everyone else and declare a variable. Variables are cheap; feel free to use them whenever the need arises. Nonetheless, you've been shown ways to keep your code concise by using subroutines. Macros can also help shorten your code, if that's your goal.
Please also note that the string literal in your code is an array of three characters: the two between quotation marks, plus the nul character the compiler automatically includes at the end of all string literals. You're telling the function that you've provided a pointer to a block of four bytes, which is false. The fourth byte that the function writes into the other process will have an unspecified value.
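The size and byte-order claims above can be checked without touching another process. This is a small sketch, not part of the original answer; decodeLittleEndian32 is a hypothetical helper that reads four bytes the way a little-endian machine (such as Windows on x86/x64) stores a DWORD.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// "\x90\x90" is an array of three chars: 0x90, 0x90 and the implicit '\0'.
static_assert(sizeof("\x90\x90") == 3, "two bytes plus the terminating nul");

// Hypothetical helper: reads four bytes as a little-endian 32-bit value,
// i.e. the first byte in memory is the least significant one.
std::uint32_t decodeLittleEndian32(const char* bytes) {
    assert(bytes != nullptr);
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        value |= static_cast<std::uint32_t>(static_cast<unsigned char>(bytes[i])) << (8 * i);
    }
    return value;
}
```

Decoding "\x70\x71\x72\x73" this way gives 0x73727170, matching the DWORD value mentioned earlier in this answer.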
Put the data into an array, and have a small loop get each item from the array, write it to the target process, then move to the next:
struct data {
    DWORD offset;
    DWORD length;
    char data[256];
};
data items[] = {
    {0x728, 4, "\x90\x90"},
    // ...
};
// sizeof(items) / sizeof(items[0]) is the number of entries in the array
for (size_t i = 0; i < sizeof(items) / sizeof(items[0]); i++)
    WriteProcessMemory(hProcess, (void *)(BasePointer + items[i].offset), items[i].data, items[i].length, NULL);

Character Pointers (allotted by new)

I wrote the following code:
char *pch=new char[12];
char *f=new char[42];
char *lab=new char[20];
char *mne=new char[10];
char *add=new char[10];
If initially I want these arrays to be null, can't I do this:
and so on.....
And after that if I want to add some cstring to an empty array can't I check:
//then add cstring by *lab="cstring";
And if I can't do any of these things, please tell me the right way to do it...
In C++11, an easy way to initialize arrays is by using brace-initializers:
char * p = new char[100] { 0 };
The reasoning here is that all the missing array elements will be zero-initialized. You can also use explicit value-initialization (I think that's even allowed in C++98/03), which is zero-initialization for the primitive types:
char * q = new char[110]();
First of all, as DeadMG says, the correct way of doing this is using std::string:
std::string lab; // empty initially, no further initialization needed
if (lab.size() == 0) // string empty, note, very fast, no character comparison
lab += "cstring"; // or even lab = "cstring", as lab is empty
Also, in your code, if you insist on using C strings, after the initialization, the correct check for the empty string would be
if (*lab == '\0')
First of all, I agree with everybody else to use a std::string instead of character arrays the vast majority of the time. Link for help is here: C++ Strings Library
Now to directly answer your question as well:
and so on.....
This is wrong. Assuming your compiler doesn't give you an error, you're not assigning the "null terminator" to those arrays, you're trying to assign the pointer value of where the "\0" string is to the first few memory locations where the char* is pointing to! Remember, your variables are pointers, not strings. If you're trying to just put a null-character at the beginning, so that strlen or other C-string functions see an "empty" string, do this: *lab='\0'; The difference is that with single-ticks, it denotes the character \0 whereas with double, it's a string literal, which returns a pointer to the first element. I hope that made sense.
Now for your second, again, you can't just "assign" like that to C-style strings. You need to put each character into the array and terminate it correctly. Usually the easiest way is with sprintf:
sprintf(lab, "%s", "mystring");
This may not make much sense, especially as I'm not dereferencing the pointer, but I'll walk you through it. The first argument says to sprintf "output your characters to where this pointer is pointing." So it needs the raw pointer. The second is a format string, like printf uses. So I'm telling it to treat the next argument as a string. And the third is what I want in there, a pointer to another string. This example would also work with sprintf(lab, "mystring").
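Putting the pieces together, here is a hedged sketch of the "start empty, check, then fill" pattern for a buffer obtained from new char[N](). The helper names isEmptyCString and setCString are hypothetical, and snprintf is swapped in for sprintf so that the copy cannot run past the end of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: true when the C string has no characters before its terminator.
bool isEmptyCString(const char* s) {
    assert(s != nullptr);
    return *s == '\0';
}

// Hypothetical helper: copies src into dest, writing at most `capacity` bytes
// including the terminator. Returns false if src had to be truncated.
bool setCString(char* dest, std::size_t capacity, const char* src) {
    assert(dest != nullptr && src != nullptr && capacity > 0);
    int written = std::snprintf(dest, capacity, "%s", src);
    return written >= 0 && static_cast<std::size_t>(written) < capacity;
}
```

For example, after `char *lab = new char[20]();` the buffer is all zeros, so isEmptyCString(lab) holds, and setCString(lab, 20, "cstring") then fills it. The buffer still has to be released with delete[] once it is no longer needed.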
If you want to get into C-style string processing, you need to read some examples. I'm afraid I don't even know where to look on the 'net for good examples of that, but I wish you good luck. I'd highly recommend that you check out the C++ strings library though, and the basic_string<> type there. That's typedef'd to just std::string, which is what you should use.