LLVM Pass - Issues replacing a GlobalVariable - llvm

I am trying to write an LLVM pass which manipulates strings.
After iterating all the GlobalVariable objects and picking out the strings, I get the string data, perform the manipulation, create a new GlobalVariable and then use replaceAllUsesWith() to replace the old with the new. Sounds simple enough...
However, I am getting an assert error, telling me that the replacement should be the same type. I have not changed the length of the string, so I don't know why the type would be different. A cut down version of the code is below.
for (Module::global_iterator gi = M.global_begin(), ge = M.global_end(); gi != ge; ++gi) {
    GlobalVariable *gv = &*gi;
    // Skip declarations and globals whose initializer is not sequential data
    if (!gv->hasInitializer())
        continue;
    ConstantDataSequential *cdata = dyn_cast<ConstantDataSequential>(gv->getInitializer());
    if (!cdata)
        continue;
    std::string orig = "";
    if (cdata->isString()) {
        orig = cdata->getAsString().str();
    } else if (cdata->isCString()) {
        orig = cdata->getAsCString().str();
    } else {
        continue;
    }
    // string returned has the same length, but different contents
    std::string modified = manipulateString(orig);
    std::ostringstream oss;
    oss << gv->getName().str() << "Modified";
    Constant *cMod = ConstantDataArray::getString(M.getContext(), modified, true);
    GlobalVariable *newGv = new GlobalVariable(M,
                                               cMod->getType(),
                                               true,
                                               GlobalValue::ExternalLinkage,
                                               cMod,
                                               oss.str());
    gv->replaceAllUsesWith(newGv);
}
Note: I've hand typed this code, so it may not compile, but it should serve as an illustration of what I'm trying to achieve and how I'm trying to achieve it.
For some reason, the new GlobalVariable has a different type. Printing the types at runtime yields:
gv->getType() = [36 x i8]*
newGv->getType() = [37 x i8]*
Both strings are 36 chars long. Why is the type of the new GlobalVariable different, even though the string length has not changed? Why has an extra element been added?
Also, replaceAllUsesWith() requires that the replacement be same type. If I wanted the replacement to be string of a different length, how would I achieve that?

You cannot replace with an object of a different type. You can, however, cast the GlobalVariable to have the right type. What you want is...
ConstantExpr::getPointerCast(newGv, gv->getType());
...except that that won't compile, because the second argument has to be a PointerType. You can always add another level of casting, making the code less clear but the compiler more happy:
ConstantExpr::getPointerCast(newGv, cast<PointerType>(gv->getType()));
I have found it helpful to use 0-length arrays for all variable-length arrays, and always cast constants to that.

Related

Problem in comparing strings without assigning to variable

int i = ("aac" > "aab");
cout << i;
The above code does not give me the output as 1 (as it should be). But when I assign "aac" and "aab" to two separate string variables and use the variables instead of using strings directly (code attached below), I get the desired output.
Could anyone help me please?
string s1 = "aac";
string s2 = "aab";
int i = (s1 > s2);
cout << i;
Literal constants like "aac" aren't std::string objects; rather, they are just data in (read-only) memory that evaluate, in most 'access' cases, to the address of their first element (i.e. a char* pointer); so, a comparison between them will be a comparison between those addresses — something you are unlikely to be able to control or predict.
To get an inline comparison, in your case, you can use inline std::string constructors (sometimes known as "wrappers"), like this:
int i=(string("aac")>string("aab"));
Or, using the more 'modern' "curly-brace" initializer syntax:
int i = (string{ "aac" } > string{ "aab" });
For more brevity, you can make use of the fact that std::string has versions of the > (and similar) operators that take a string literal as one of the arguments; thus, you need only 'wrap' one of the literals, and could reduce the above code to something like:
int i = (string{ "aac" } > "aab");
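If you use C-style char * / char [] strings, you need to use strcmp. It returns a positive value (not necessarily 1) when the first string sorts after the second, so test the result against 0:
int i = (strcmp("aac", "aab") > 0);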
Otherwise, you are just comparing addresses of the first elements of both of strings.

llvm, defining strings and arrays via c++ API

I am developing a toy compiler and trying to implement strings and arrays.
I have noticed that clang always creates a global variable for those types, even if they were defined within a function.
I guess that there is a good reason for that, so I try to do the same.
My problem is that I cannot figure out how to do it via the C++ API.
The Kaleidoscope tutorial does not cover strings and arrays, so the only source that I have found is the documentation.
In the documentation for the Module class, there is the function getOrInsertGlobal, which looks relevant, but I cannot understand how I set the actual value of the global. The function arguments include only the name and the type of the variable. So where does the value go?
So the question is: how can I define a global string, such as "hello" or array, such as [i32 1, i32 2] in llvm c++ API? Any example would be really appreciated.
What you want is called a read-only GlobalVariable and you need that variable, an initializer, and probably a constant cast so that all of your strings can have the same type.
Suppose your strings are the C kind: null-terminated sequences of bytes. In that case you'll want every string to be typed as a zero-length array of bytes, so that all of them share one type. But the initialisers need to be arrays of the right number of bytes, so that each initialiser's type matches its value. So you create your array using something like this (cut and pasted together from bits of code I've written, won't even compile, is not the most efficient way, but does contain most of the building blocks you need):
std::vector<llvm::Constant *> chars(utf8string.size());
for(unsigned int i = 0; i < utf8string.size(); i++)
chars[i] = ConstantInt::get(i8, utf8string[i]);
auto init = ConstantArray::get(ArrayType::get(i8, chars.size()),
                               chars);
GlobalVariable * v =
new GlobalVariable(module, init->getType(), true,
GlobalVariable::ExternalLinkage, init,
utf8string);
return ConstantExpr::getBitCast(v, i8->getPointerTo());
Note that a GlobalVariable is a pointer to whatever it's been initialised as, so if you initialise it with the five-byte sequence "test\0", then it'll be a pointer to five bytes. Or, if you cast, it can be a pointer to 0 bytes (LLVM lets you index past the official end), or it can be an instance of an abstract type you define.
Using the code and the help of @arnt in the answer above, I ended up with the following code to implement string initialization. It now works, and it also avoids the call to new, so it does not require any cleanup later.
I post it, hoping that it may be useful for someone.
llvm::Value* EulStringToken::generateValue(llvm::Module* module, llvm::LLVMContext& context) { // LLVMContext cannot be copied, so take it by reference
//0. Defs
auto str = this->value;
auto charType = llvm::IntegerType::get(context, 8);
//1. Initialize chars vector
std::vector<llvm::Constant *> chars(str.length());
for(unsigned int i = 0; i < str.size(); i++) {
chars[i] = llvm::ConstantInt::get(charType, str[i]);
}
//1b. add a zero terminator too
chars.push_back(llvm::ConstantInt::get(charType, 0));
//2. Initialize the string from the characters
auto stringType = llvm::ArrayType::get(charType, chars.size());
//3. Create the declaration statement
auto globalDeclaration = (llvm::GlobalVariable*) module->getOrInsertGlobal(".str", stringType);
globalDeclaration->setInitializer(llvm::ConstantArray::get(stringType, chars));
globalDeclaration->setConstant(true);
globalDeclaration->setLinkage(llvm::GlobalValue::LinkageTypes::PrivateLinkage);
globalDeclaration->setUnnamedAddr (llvm::GlobalValue::UnnamedAddr::Global);
//4. Return a cast to an i8*
return llvm::ConstantExpr::getBitCast(globalDeclaration, charType->getPointerTo());
}

How do I declare a new string the same length of a known const string?

I've been using:
string letters = THESAMELENGTH; // Assign for allocation purposes.
Reason being, if I:
string letters[THESAMELENGTH.length()];
I get a non constant expression complaint.
But if I:
string letters[12];
I'm at risk of needing to change every instance if the guide const string changes size.
But it seems foolish to assign a string when I won't use those entries, I only want my newly assigned string to be the same length as the previously assigned const string, then fill with different values.
How do you recommend I do this gracefully and safely?
You can
string letters(THESAMELENGTH.length(), ' '); // constructs the string with THESAMELENGTH.length() copies of character ' '
BTW: string letters[12]; doesn't mean what you expected. It declares a raw array of 12 string objects, not one string of 12 characters.
I only want my newly assigned string to be the same length as the previously assigned const string, then fill with different values.
Part of the reason the string class/type exists is so you don't have to worry about trying to manage its length. (The problem with arrays of char.)
If you have a const std::string tmp then you can't just assign anything to it after it has already been initialized. E.g.:
const std::string tmp = "A value"; // initialization
tmp = "Another value"; // compile error
How do you recommend I do this gracefully and safely?
If you really want to keep strings to a specific size, regardless of their contents, you could always resize your string variables. For example:
// in some constants.h file
const int MAX_STRING_LENGTH = 16;
// in other files
#include "constants.h"
// ...
std::string word = ... // some unknown string
word.resize(MAX_STRING_LENGTH);
Now your word string will have a length/size of MAX_STRING_LENGTH and anything beyond the end gets truncated.
This example is from C++ Reference
// resizing string
#include <iostream>
#include <string>
int main ()
{
std::string str ("I like to code in C");
std::cout << str << '\n';
unsigned sz = str.size();
str.resize (sz+2,'+');
std::cout << str << '\n';
str.resize (14);
std::cout << str << '\n';
return 0;
}
// program output
I like to code in C
I like to code in C++
I like to code
You can't just ask a string variable for its length at compile-time. By definition, it's impossible to know the value of a variable, or the state of any given program for that matter, while it's not running. This question only makes sense at run-time.
Others have mentioned this, but there seems to be an issue with your understanding of string letters[12];. That gives you an array of string types, i.e. you get space for 12 full strings (e.g. words/sentences/etc), not just letters.
In other words, you could do:
for (size_t i = 0; i < 12; ++i) // 12 is the declared array size; a raw array has no size() member
    letters[i] = "Hello, world!";
So your letters variable should be renamed to something more accurate (e.g. words).
If you really want letters (e.g. the full alphabet on a single string), you could do something like this:
// constants.h
const std::string ALPHABET_LC = "abc...z";
const std::string ALPHABET_UC = "ABC...Z";
const int LETTER_A = 0;
const int LETTER_B = 1;
// ...
// main.cpp, etc.
char a = ALPHABET_LC[LETTER_A];
char B = ALPHABET_UC[LETTER_B];
// ...
It all depends on what you need to do, but this might be a good alternative.
Disclaimer: Note that it's not really my recommendation that you do this. You should let strings manage their own length. For example, if the string value is actually shorter than your limit, you're causing your variable to use more space/memory than needed, and if it's longer, you're still truncating it. Neither side-effect is good, IMHO.
The first thing you need to do is understand the difference between a string length and an array dimension.
std::string letters = "Hello";
creates a single string that contains the characters from "Hello", and has length 5.
In comparison
std::string letters[5];
creates an array of five distinct default-constructed objects of type std::string. It doesn't create a single string of 5 characters. The reason for the non-constant complaint when doing
std::string letters[THESAMELENGTH.length()];
is that construction of arrays in standard C++ is required to use a length known to the compiler, whereas the length of a std::string is determined at run time.
If you have a string, and want to create another string of the same length, you can do something like
std::string another_string(letters.length(), 'A');
which will create a single string containing the required number of letters 'A'.
It is largely pointless to do what you are seeking as a std::string can dynamically change its length anyway, as needed. There is also nothing stopping a std::string from allocating more than it needs (e.g. to make provision for multiple increases in its length).

C++ Multi dimensional array from external file question

I'm trying to get a contiguous line with values separated by "&" to load into a multi-dimensional array. Here's the way I'm trying to do it - Everything checks out in the code, except the string "str" which contains my separated values in the format "value1, value2, value3, etc..." just loads that whole string into array[0][0]. I know there are better ways of doing this, but what I would like to know is why C++ won't treat "str" as if I had typed out the individual values and hard coded "array".
Here is the code:
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <cassert>
using namespace std;
int main(int argc, char* argv[])
{
string str, strTotal;
ifstream in;
in.open("Desktop/01_001.PAC");
getline(in,str);
while ( in ) {
strTotal += str;
getline(in,str);
}
string searchString( "&" );
string replaceString( ", " );
assert( searchString != replaceString );
string::size_type pos = 0;
while ( (pos = str.find(searchString, pos)) != string::npos ) {
str.replace( pos, searchString.size(), replaceString );
pos++;
}
string array[4][5] = {str};
cout << array[0][0];
return(0);
}
And here is the external file ("Desktop/01_001.PAC"):
void&void&void&void&a&a1&a2&a3&b&b1&b2&b3&c&c1&c2&c3&d&d1&d2&d3
Thanks in advance!
Because code and data are different things. Your code is compiled before it runs.
It sounds as if this is what you expect:
The string contains the text "foo, bar, baz".
The statement string[] whatever = {str}; is run.
Since "str" contains "foo, bar, baz", you want it to have the same effect as if the line of code were actually string[] whatever = {"foo", "bar", "baz"}.
Asking something like this implies a complete misunderstanding of how programming works.
Nothing this magical will ever happen in C++. It cannot, because (a) what if you actually wanted to put 'str' into the array? (b) what if 'foo', 'bar' and 'baz' were also variables in your program - should they be interpreted the same way?
Variable names are not text. They no longer exist, for all practical purposes, at the time that your code runs. They are only there so that you, as the programmer, can say "the value that is used over here should be the same one that is used over there".
Further, an array initializer in C++ may supply fewer elements than the declared size of the array. The elements without an initializer are value-initialized (here, left as empty strings), which is why only array[0][0] receives your text.
A string cannot be treated like an array of strings, because it isn't one. If you want an array of strings, then build it, using the individual string elements as you determine them.
But since you don't know in advance how many elements there are, you should use std::vector instead of an array. And why are you trying to arrange the data into a 2-dimensional structure? How are you expecting to know how "wide" it should be?
If I'm reading your code correctly, you appear to be searching through the string (loaded from file), and only assigning the very last result to an array index (x=4, y=5). So your code is doing something like this:
while (have not found last variable)
search for next variable in string
assign variable to (4,5) in matrix
So that last assignment might even work, but since you only assign at the end, the array is not going to be filled the way I think you want it to be filled.
I'm going to assume the matrix you want is always the same size, otherwise things get more complicated. In this case, you could use something like this:
let xMax = 4
let yMax = 5
for (x from 0 to xMax)
for (y from 0 to yMax)
find the next variable in the string
assign it to the current (x,y) location in matrix
Debug statements are your friend here! Try the above solution without saving it to an array, and instead print out each term, to see if it is working correctly.
I would also point out that the string "void" is not the C++ keyword void, and so will not work if you want an array index to be void. Try getting your code to work without voids at first.

Casting string type with GetDlgItemText() for use as string buffer in C++

I am stumped by the behaviour of the following in my Win32 (ANSI) function:
(Multi-Byte Character Set NOT UNICODE)
void sOut( HWND hwnd, string sText ) // Add new text to EDIT Control
{
int len;
string sBuf, sDisplay;
len = GetWindowTextLength( GetDlgItem(hwnd, IDC_EDIT_RESULTS) );
if(len > 0)
{
// HERE:
sBuf.resize(len+1, 0); // Create a string big enough for the data
GetDlgItemText( hwnd, IDC_EDIT_RESULTS, (LPSTR)sBuf.data(), len+1 );
} // MessageBox(hwnd, (LPSTR)sBuf.c_str(), "Debug", MB_OK);
sDisplay = sBuf + sText;
sDisplay = sDisplay + "\n\0"; // terminate the string
SetDlgItemText( hwnd, IDC_EDIT_RESULTS, (LPSTR)sDisplay.c_str() );
}
This should append text to the control with each call.
Instead, all string concatenation fails after the call to GetDlgItemText(), I am assuming because of the typecast?
I have used three string variables to make it really obvious. If sBuf is affected then sDisplay should not be affected.
(Also, why is len 1 char less than the length in the buffer?)
GetDlgItemText() correctly returns the content of the EDIT control, and SetDlgItemText() will correctly set any text in sDisplay, but the concatenation in between is just not happening.
Is this a "hidden feature" of the string class?
Added:
Yes it looks like the problem is a terminating NUL in the middle. Now I understand why the len +1. The function ensures the last char is a NUL.
Using sBuf.resize(len); will chop it off and all is good.
Added:
Charles,
Leaving aside the quirky return length of this particular function, and talking about using a string as a buffer:
The standard describes the return value of basic_string::data() to be a pointer to an array whose members equal the elements of the string itself.
That's precisely what's needed isn't it?
Further, it requires that the program must not alter any of the values of that array.
As I understand it, that is going to change, along with a guarantee that all bytes are contiguous. I forget where I read a long article on this, but it asserted that MS already implements this.
What I don't like about using a vector is that the bytes are copied twice before I can return them: once into the vector and again into the string. I also need to instantiate a vector object and a string object. That is a lot of overhead. If there were some string-friendly way of working with vectors (or CStrings) without resorting to old C functions or copying characters one by one, I would use it. The string is very syntax friendly in that way.
The data() function on a std::string returns a const char*. You are not allowed to write into the buffer returned by it; it may be a duplicated buffer.
What you could do instead is to use a std::vector<char> as a temporary buffer.
E.g. (untested)
std::vector<char> sBuf( len + 1 );
GetDlgItemText( /* ... */, &sBuf[0], len + 1 );
std::string newText( &sBuf[0] );
newText += sText;
Also, the string you pass to SetDlgItemText should be \0 terminated, so you should use c_str(), not data(), for this.
SetDlgItemText( /* ... */, newText.c_str() );
Edit:
OK, I've just checked the contract for GetWindowTextLength and GetDlgItemText. Check my edits above. Both will include the space for a null terminator so you need to chop it off the end of your string otherwise concatenation of the two strings will include a null terminator in the middle of the string and the SetDlgItemText call will only use the first part of the string.
There is a further complication in that GetWindowTextLength isn't guaranteed to be accurate, it only guarantees to return a number big enough for a program to create a buffer for storing the result. It is extremely unlikely that this will actually affect a dialog box item owned by the calling code but in other situations the actual text may be shorter than the returned length. For this reason you should search for the first \0 in the returned text in any case.
I've opted to just use the std::string constructor that takes a const char* so that it finds the first \0 correctly.
The standard describes the return value of basic_string::data() to be a pointer to an array whose members equal the elements of the string itself. Further, it requires that the program must not alter any of the values of that array. This means that the return value of data() may or may not be a copy of the string's internal representation and even if it isn't a copy you still aren't allowed to write to it.
I am far away from the win32 api and their string nightmare, but there is something in the code that you can check. Standard C++ strings do not need to be null terminated and nulls can happen anywhere within the string. I won't comment on the fact that you are casting away constness with your C-style cast, which is a problem on its own, but rather on the strange effect you are seeing.
When you initially create the string you allocate extra space for the null (and initialize all elements to '\0') and then you copy the elements. At that point your string is len+1 in size and the last element is a null. After that you append some other string, and what you get is a string that will still have a null character at position len. When you retrieve the data with either data() (does not guarantee null termination!) or c_str() the returned buffer will still have the null character at len position. If that is passed to a function that stops on null (takes a C style string), then even if the string is complete, the function will just process the first len characters and forget about the rest.
#include <algorithm> // std::copy
#include <string>
#include <cstdio>
#include <iostream>
int main()
{
const char hi[] = "Hello, ";
const char all[] = "world!";
std::string result;
result.resize( sizeof(hi), 0 );
// simulate GetDlgItemText call
std::copy( hi, hi+sizeof(hi), const_cast<char*>(result.data()) ); // this is what your C-style cast is probably doing
// append
result.append( all );
std::cout << "size: " << result.size() // 14
<< ", contents" << result // "Hello, \0world!" - dump to a file and edit with a binary editor
<< std::endl;
std::printf( "%s\n", result.c_str() ); // "Hello, "
}
As you can see, printf expects a C-style string and will stop when the first null character is found, so that it can seem as if the append operation never took place. On the other hand, C++ streams do work properly with std::string and will dump the whole content, confirming that the strings were actually appended.
A patch for your disappearing append would be to remove the '\0' from the initial string (size the string to only len characters). But that is not really a good solution: you should never use const_cast (there are very few places where it is required, and this is not one of them). The fact that you don't see it is even worse: the C-style cast makes your code look nicer than it is.
You have commented on another answer that you do not want to add std::vector (which would provide a correct solution, as &v[0] is a proper mutable pointer into the buffer), of course, not adding the extra space for the '\0'. Consider that this is part of an implementation file, and whether or not you use std::vector will not extend beyond this single compilation unit. Since you are already using some STL features, you are not adding any extra requirement to your system. So to me that would be the way to go. The solution provided by Charles Bailey should work provided that you remove the extra null character.
This is NOT an answer. I have added it here as an answer only so that I can use formatting in a long going discussion about const_cast.
This is an example where using const_cast can break a running application:
#include <iostream>
#include <map>
typedef std::map<int,int> map_type;
void dump( map_type const & m ); // implemented somewhere else for concision
int main() {
map_type m;
m[1] = 10;
m[2] = 20;
m[3] = 30;
map_type::iterator it = m.find(2);
const_cast<int&>(it->first) = 10;
// At this point the order invariant of the container is broken:
dump(m); // (1,10),(10,20),(3,30) !!! unordered by key!!!!
// This happens with g++-4.0.1 in MacOSX 10.5
if ( m.find(3) == m.end() ) std::cout << "key 3 not found!!!" << std::endl;
}
That is the danger of using const_cast. You can get away in some situations, but in others it will bite back, and probably hard. Try to debug in thousands of lines where the element with key 3 was removed from the container. And good luck in your search, for it was never removed.