Changing source code at compilation time (using LLVM)

#include <stdio.h>
#include <string.h>
int foo(char* a)
{
    char str[10];
    if (strlen(a) < 10)
    {
        sprintf(str, "Yes");
        puts(str);
        return 0;
    }
    else
    {
        sprintf(str, "No");
        puts(str);
        return 1;
    }
}
Now, let's say that while writing an LLVM pass, I want to ensure that printf is called instead of sprintf (with the same arguments). How could I go about doing that?

In a nutshell,
Go over all instructions in the function.
If the instruction is a CallInst, check whether it is a call to sprintf (you can just check the name of the called function).
Create a new CallInst (via CallInst::Create) that calls printf instead of sprintf. I think the easiest way to get the Value of the printf declaration is Module::getOrInsertFunction. The type of printf should be the same as the one for sprintf minus the first parameter.
Set the operands of the new call instruction to be the same as for sprintf, minus the first parameter.
Replace the old call with the new call; ReplaceInstWithInst (in BasicBlockUtils.h) is probably the most convenient way.
Finally, you might want to track the uses of the old first argument (the buffer), remove them all and then remove the buffer itself, which would also get rid of that puts call.

#ifdef USE_BUFFER
#define my_printf(...) sprintf(buffer, __VA_ARGS__)
#else
#define my_printf(...) printf(__VA_ARGS__)
#endif
Now you can use for instance my_printf("My name is %s.", "Bozo"); and it will compile as if it was printf("My name is %s.", "Bozo") by default.
If you include #define USE_BUFFER before the headers, it will instead convert those lines to sprintf(buffer, "My name is %s.", "Bozo") at compile time. Of course the variable buffer must exist in the context.

Stringification of int in C/C++

As I understand stringification, the code below should output 100: vstr(a) should expand a to its value 100, str then receives 100, and the result should be the string "100". Instead, it outputs "a". Why? And if I call it with the macro-defined constant foo, it outputs "100". Why?
#include <stdio.h>
#define vstr(s) str(s)
#define str(s) #s
#define foo 100
int main()
{
    int a = 100;
    puts(vstr(a));
    puts(vstr(foo));
    return 0;
}
The reason is that preprocessors operate on tokens passed into them, not on values associated with those tokens.
#include <stdio.h>
#define vstr(s) str(s)
#define str(s) #s
int main()
{
puts(vstr(10+10));
return 0;
}
Outputs:
10+10
The # stringizing operator is part of the preprocessor. It's evaluated at compile time. It can't get the value of a variable at execution time, then somehow magically convert that to something it could have known at compile time.
If you want to convert an execution-time variable into a string at execution time, you need to use a function like std::to_string.
Since vstr is preprocessed, the line
puts(vstr(a));
is translated as:
puts("a");
The value of the variable a plays no role in that line. You can remove the line
int a = 100;
and the program will behave identically.
Stringification is the process of transforming something into a string. What does your macro stringify?
The name of the variable itself, and this is done at compilation time.
If you want to convert the value of the variable to text and print it at execution time, then you must use something like printf("%d\n", v); in C or cout << v << endl; in C++.
A preprocessor macro is not the same thing as a function. It does not evaluate its arguments at runtime and see their values; it processes them at the preprocessing stage, which happens before compilation, so it knows nothing about variables.
In this case, you passed a to the macro to stringify, and that is what it did. The preprocessor doesn't care that a is also the name of a variable.
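The difference between stringizing a token and converting a value can be put side by side. This is only a minimal sketch and not code from the answers above; the macro names STR, VSTR and FOO and the helper functions token_text, macro_text and runtime_text are made up for illustration.

```cpp
#include <string>

#define STR(s) #s       // stringizes the argument exactly as it is spelled
#define VSTR(s) STR(s)  // lets macro arguments expand first, then stringizes
#define FOO 100

// "a" is an ordinary variable, so the preprocessor only ever sees its name.
std::string token_text() {
    int a = 100;
    (void)a;            // the value is never consulted
    return VSTR(a);     // becomes the literal "a"
}

// FOO is a macro, so VSTR expands it to 100 before stringizing.
std::string macro_text() { return VSTR(FOO); }

// Converting a value to text happens at run time, e.g. with std::to_string.
std::string runtime_text(int value) { return std::to_string(value); }
```

Only runtime_text looks at a value; the other two are fixed when the file is preprocessed.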

Accessing a variable in C++ by "stitching" its name together

Let's say I have a variable:
int fish5 = 7;
Can I access fish5 by concatenation of the term "fish" and "5" somehow?
An ideal solution would look something like this:
printf("I am displaying the number seven: %i", fish + 5);
No, not exactly what you want. But in your example, you can use an array (this only works if you want to concatenate a variable name with a number):
int fish[6] = {0};
fish[5] = 7;
printf("I am displaying the number seven: %i", fish[5]);
See also a C++ reference on arrays.
Another solution would be to use a std::map instead, as pointed out by Thrustmaster in the comments.
Then you could write something like:
#include <cstdio>
#include <map>
#include <string>
int main(int argc, char* argv[]){
    std::map<std::string, int> map;
    map.insert(std::make_pair("fish5", 7));
    // Build the key "fish5" at run time and look it up.
    printf("I am displaying the number seven: %d", map[std::string("fish") + std::to_string(5)]);
    return 0;
}
For more information, see a reference on std::map.
It is impossible to carry this over from compile time to run time, because C++ is a compiled language, not an interpreted one. Variable names "lose their meaning" after compilation. They are just a set of symbols with addresses. This means that after compilation, asking for something like fish5 makes no sense.
To achieve what you want, you need to bind the name to the object programmatically, for example by using a map that stores names as keys and object references as values. This is how Python does it, and why in Python you can actually access an object via its name from the code.
In case anyone wonders why gdb or crash dumps are meaningful, it is for pretty much the same reason. The symbol names must be saved at compilation time (either embedded in the executable or in an external file); an external tool can then figure out the name of the variable at a certain address. But a compiled executable can work just fine without this information.
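A minimal sketch of such a binding, assuming the variables of interest are plain ints. The class name IntRegistry and its member functions are hypothetical and exist only for this illustration; the names are registered by hand, nothing is recovered from the compiler.

```cpp
#include <map>
#include <string>

// Hypothetical registry: maps a name chosen by the programmer to the
// address of an existing int.
class IntRegistry {
public:
    // Associates the name with the given object (no ownership is taken).
    void bind(const std::string& name, int& object) { table_[name] = &object; }

    // Returns a pointer to the bound object, or nullptr if the name is unknown.
    int* find(const std::string& name) const {
        auto it = table_.find(name);
        return it == table_.end() ? nullptr : it->second;
    }

private:
    std::map<std::string, int*> table_;
};
```

After reg.bind("fish5", fish5), a lookup such as reg.find(std::string("fish") + std::to_string(5)) yields a pointer to fish5, so the name can be computed at run time.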
Alternatively, you need to remember the reference itself in some more convenient way, that allows it to be computable. E.g. store it in an array and access as fish[5]; although in this example it can be evaluated at the compile time, you can use the same method at run-time using a variable in place of 5.
The distinction between compile-time and run-time is very important, because you can actually do what you want at the compile-time with preprocessor, but only because it is compile time.
You could use operating-system-specific facilities, like (on POSIX systems, e.g. Linux) dlsym(3), to access variables through the symbol table inside the (unstripped) ELF executable at runtime.
So, preferably declare the variables you want to access by name as extern "C", e.g.
extern "C" int fish17;
(otherwise, take into account compiler specific name mangling)
Declare also a program handle:
void *progdlh;
initialize it early in main
progdlh = dlopen(NULL, RTLD_NOW|RTLD_GLOBAL);
if (!progdlh) { fprintf(stderr, "dlopen failure %s\n", dlerror());
exit(EXIT_FAILURE); }
then, to retrieve your variable by a computed name you might try:
char nambuf[32];
snprintf (nambuf, sizeof(nambuf), "%s%d", "fish", 17);
int *pvar = (int *)dlsym(progdlh, nambuf); // explicit cast: C++ does not convert void* implicitly
if (!pvar) { fprintf(stderr, "dlsym %s failure %s\n", nambuf, dlerror());
exit(EXIT_FAILURE); }
printf ("variable %s is %d\n", nambuf, *pvar);
You should probably link your program with the -rdynamic flag (and with the -ldl library).
My answer should work on Linux.
You can only do this at compile time, using the preprocessor. Complete example:
#include <cstdio>
#define JOIN(a,b) a##b
int main(void) {
int fish5 = 5;
std::printf("I am displaying the number five: %i", JOIN(fish, 5));
return 0;
}
However, this is strictly compile time. If you try JOIN(fish, fish) you get error: ‘fishfish’ undeclared and if you try JOIN("fish", fish) you get error: pasting ""fish"" and "fish" does not give a valid preprocessing token.
In C++, variable names do not exist at runtime, so the operation can't be done at runtime, except through some deep debug-info hackery to find variables by their name string (like a debugger does). If using strings is a valid approach for you, then it's better to just have an explicit map from string to variable address. Other answers already show how to do this, using std::map.

How can I output #define value

I am new to this forum so please go easy on me :)
I have the following in my code
#define SYS_SBS 0x02
Whenever I use this and try to output it, I get 2 as the value; however, I want SYS_SBS as the output of my program. Is there a way I can do this?
I have no control over the source code. I just have to output SYS_SBS.
Additional details: I cannot change some of the header files. However, I can change the main function in the .cpp file. I want SYS_SBS as the output. I am working with satellites, and for all the satellites detected by my receiver, I have to output what type of satellite they are. In the code, all of them are defined with a hexadecimal number like this one. I just want to output SYS_SBS and not 2.
#include <stdio.h>
#define SYS_SBS 0x02
#define id(x) #x
int main(){
printf("%s %d\n", id(SYS_SBS), SYS_SBS);
return 0;
}
The C standard provides a stringification operator (put a # in front of a macro parameter) that allows you to output a specific token.
What's not possible is to convert backwards from a variable's value to this token name, as the name is lost during translation (as others have mentioned). If you need that kind of conversion, think about an explicit "value2str" function that returns a string representation of a given value:
const char *myType2str(int value)
{
    switch (value)
    {
    case SYS_SBS:
        return "SYS_SBS";
    default:
        return "UNKNOWN VALUE";
    }
}
EDIT: According to some comments, stringification is part of the standard. Changed that. Thanks for the hint. Wasn't aware of that.
0x02 is the hexadecimal representation in the source. Once you compiled it, it's just a number (2).
If you want to print it as hex, then, well... print it as hex (eg: use the formatting string "0x%.2x").
Well, you could simply:
printf("SYS_SBS");
But I assume you have a number as "input" (like 2), and want to output the string SYS_SBS, well, that's not directly possible. The best you can do is create a lookup table, eg:
const char* sys_strings[] = { "SYS_EX", "SYS_TEM", "SYS_SBS" };
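One way to build such a lookup table without retyping each name is to pair every value with its stringized spelling. The sketch below is an assumption-laden illustration: only SYS_SBS comes from the question, while SYS_GPS, NAME_ENTRY, NameEntry, kSystemNames and system_name are hypothetical names introduced here.

```cpp
#include <cstddef>
#include <string>

#define SYS_SBS 0x02   // from the question
#define SYS_GPS 0x01   // hypothetical second constant, for illustration only

// Pairs a constant's value with its spelling: x is macro-expanded,
// while #x keeps the name exactly as written.
#define NAME_ENTRY(x) { x, #x }

struct NameEntry {
    int value;
    const char* name;
};

static const NameEntry kSystemNames[] = {
    NAME_ENTRY(SYS_GPS),
    NAME_ENTRY(SYS_SBS),
};

// Linear search from a numeric value back to the constant's name.
std::string system_name(int value) {
    const std::size_t count = sizeof kSystemNames / sizeof kSystemNames[0];
    for (std::size_t i = 0; i < count; ++i) {
        if (kSystemNames[i].value == value) return kSystemNames[i].name;
    }
    return "UNKNOWN";
}
```

Unlike an array indexed directly by the value, this form does not require the constants to be small consecutive numbers.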

printf with std::string?

My understanding is that string is a member of the std namespace, so why does the following occur?
#include <iostream>
int main()
{
using namespace std;
string myString = "Press ENTER to quit program!";
cout << "Come up and C++ me some time." << endl;
printf("Follow this command: %s", myString);
cin.get();
return 0;
}
Each time the program runs, myString prints a seemingly random string of 3 characters.
C++23 Update
We now finally have std::print as a way to use std::format for output directly:
#include <print>
#include <string>
int main() {
// ...
std::print("Follow this command: {}", myString);
// ...
}
This combines the best of both approaches.
Original Answer
It's compiling because printf isn't type safe, since it uses variable arguments in the C sense[1]. printf has no option for std::string, only a C-style string. Using something else in place of what it expects definitely won't give you the results you want. It's actually undefined behaviour, so anything at all could happen.
The easiest way to fix this, since you're using C++, is printing it normally with std::cout, since std::string supports that through operator overloading:
std::cout << "Follow this command: " << myString;
If, for some reason, you need to extract the C-style string, you can use the c_str() method of std::string to get a const char * that is null-terminated. Using your example:
#include <iostream>
#include <string>
#include <stdio.h>
int main()
{
using namespace std;
string myString = "Press ENTER to quit program!";
cout << "Come up and C++ me some time." << endl;
printf("Follow this command: %s", myString.c_str()); //note the use of c_str
cin.get();
return 0;
}
If you want a function that is like printf, but type safe, look into variadic templates (C++11, supported on all major compilers as of MSVC12). You can find an example of one here. There's nothing I know of implemented like that in the standard library, but there might be in Boost, specifically boost::format.
[1]: This means that you can pass any number of arguments, but the function relies on you to tell it the number and types of those arguments. In the case of printf, that means a string with encoded type information like %d meaning int. If you lie about the type or number, the function has no standard way of knowing, although some compilers have the ability to check and give warnings when you lie.
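To give a feel for what a variadic-template approach looks like, here is a minimal sketch. It is not the example the answer links to and not an existing library API: safe_format and format_into are hypothetical names, and the "{}" placeholder syntax is simply an assumption chosen for this illustration.

```cpp
#include <sstream>
#include <string>

// Base case: no arguments left, so copy the rest of the format text.
inline void format_into(std::ostringstream& out, const char* fmt) {
    out << fmt;
}

// Recursive case: copy text up to the next "{}" and stream one argument there.
template <class First, class... Rest>
void format_into(std::ostringstream& out, const char* fmt,
                 const First& first, const Rest&... rest) {
    for (; *fmt != '\0'; ++fmt) {
        if (fmt[0] == '{' && fmt[1] == '}') {
            out << first;                        // overload chosen by the compiler
            format_into(out, fmt + 2, rest...);  // continue with remaining arguments
            return;
        }
        out << *fmt;
    }
    // More arguments than placeholders: the extras are ignored in this sketch.
}

// Formats the arguments into a std::string; each "{}" consumes one argument.
template <class... Args>
std::string safe_format(const char* fmt, const Args&... args) {
    std::ostringstream out;
    format_into(out, fmt, args...);
    return out.str();
}
```

Because each argument goes through operator<<, passing a std::string works directly, and a type without a stream operator fails to compile instead of misbehaving at run time.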
Please don't use printf("%s", your_string.c_str());
Use cout << your_string; instead. Short, simple and typesafe. In fact, when you're writing C++, you generally want to avoid printf entirely -- it's a leftover from C that's rarely needed or useful in C++.
As to why you should use cout instead of printf, the reasons are numerous. Here's a sampling of a few of the most obvious:
As the question shows, printf isn't type-safe. If the type you pass differs from that given in the conversion specifier, printf will try to use whatever it finds on the stack as if it were the specified type, giving undefined behavior. Some compilers can warn about this under some circumstances, but some compilers can't/won't at all, and none can under all circumstances.
printf isn't extensible. You can only pass primitive types to it. The set of conversion specifiers it understands is hard-coded in its implementation, and there's no way for you to add more/others. Most well-written C++ should use these types primarily to implement types oriented toward the problem being solved.
It makes decent formatting much more difficult. For an obvious example, when you're printing numbers for people to read, you typically want to insert thousands separators every few digits. The exact number of digits and the characters used as separators varies, but cout has that covered as well. For example:
std::locale loc("");
std::cout.imbue(loc);
std::cout << 123456.78;
The nameless locale (the "") picks a locale based on the user's configuration. Therefore, on my machine (configured for US English) this prints out as 123,456.78. For somebody who has their computer configured for (say) Germany, it would print out something like 123.456,78. For somebody with it configured for India, it would print out as 1,23,456.78 (and of course there are many others). With printf I get exactly one result: 123456.78. It is consistent, but it's consistently wrong for everybody everywhere. Essentially the only way to work around it is to do the formatting separately, then pass the result as a string to printf, because printf itself simply will not do the job correctly.
Although they're quite compact, printf format strings can be quite unreadable. Even among C programmers who use printf virtually every day, I'd guess at least 99% would need to look things up to be sure what the # in %#x means, and how that differs from what the # in %#f means (and yes, they mean entirely different things).
use myString.c_str() if you want a c-like string (const char*) to use with printf
thanks
Use std::printf and c_str()
example:
std::printf("Follow this command: %s", myString.c_str());
You can use snprintf to determine the number of characters needed and allocate a buffer of the right size.
int length = std::snprintf(nullptr, 0, "There can only be %i\n", 1 );
char* str = new char[length+1]; // one more character for null terminator
std::snprintf( str, length + 1, "There can only be %i\n", 1 );
std::string cppstr( str );
delete[] str;
This is a minor adaptation of an example on cppreference.com
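The same two-pass snprintf idea can be wrapped in a function so that the buffer is released automatically. This is only a sketch under the assumption that a std::vector is acceptable as scratch space; describe_count is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: formats one int into a std::string.
std::string describe_count(int n) {
    // First pass: a null buffer of size 0 only reports the required length.
    int length = std::snprintf(nullptr, 0, "There can only be %i\n", n);
    if (length < 0) return std::string();  // encoding error reported by snprintf

    // Second pass: write into a buffer with room for the null terminator.
    std::vector<char> buffer(static_cast<std::size_t>(length) + 1);
    std::snprintf(buffer.data(), buffer.size(), "There can only be %i\n", n);
    return std::string(buffer.data(), static_cast<std::size_t>(length));
}
```

Using a vector instead of new[] and delete[] means nothing leaks if an exception is thrown while the string is being constructed.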
printf accepts a variable number of arguments. Those can only have Plain Old Data (POD) types. Code that passes anything other than POD to printf only compiles because the compiler assumes you got your format right. %s means that the respective argument is supposed to be a pointer to a char. In your case it is a std::string, not a const char*. printf does not know this, because the argument type is lost and is supposed to be restored from the format parameter. When that std::string argument is interpreted as a const char*, the resulting pointer will point to some irrelevant region of memory instead of your desired C string. For that reason your code prints out gibberish.
While printf is an excellent choice for printing out formatted text, (especially if you intend to have padding), it can be dangerous if you haven't enabled compiler warnings. Always enable warnings because then mistakes like this are easily avoidable. There is no reason to use the clumsy std::cout mechanism if the printf family can do the same task in a much faster and prettier way. Just make sure you have enabled all warnings (-Wall -Wextra) and you will be good. In case you use your own custom printf implementation you should declare it with the __attribute__ mechanism that enables the compiler to check the format string against the parameters provided.
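As a sketch of the __attribute__ mechanism mentioned above: GCC and Clang accept a format attribute on printf-style functions. The function format_to_buffer below is hypothetical, and the attribute is a compiler extension, so this assumes one of those compilers.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>

// Hypothetical printf-style helper. The attribute (a GCC/Clang extension)
// says: parameter 3 is the format string, the variadic arguments start at 4.
int format_to_buffer(char* out, std::size_t size, const char* fmt, ...)
    __attribute__((format(printf, 3, 4)));

int format_to_buffer(char* out, std::size_t size, const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    // vsnprintf writes at most size bytes, including the terminator.
    int written = std::vsnprintf(out, size, fmt, args);
    va_end(args);
    return written;
}
```

With warnings enabled, a mismatched call such as format_to_buffer(buf, sizeof buf, "%s", 42) should then be diagnosed in the same way as a mismatched printf call.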
The main reason is probably that a C++ string is a struct that includes a current-length value, not just the address of a sequence of chars terminated by a 0 byte. Printf and its relatives expect to find such a sequence, not a struct, and therefore get confused by C++ strings.
Speaking for myself, I believe that printf has a place that can't easily be filled by C++ syntactic features, just as table structures in HTML have a place that can't easily be filled by divs. As Dijkstra wrote later about the goto, he didn't intend to start a religion and was really only arguing against using it as a kludge to make up for poorly-designed code.
It would be quite nice if the GNU project would add the printf family to their g++ extensions.
Printf is actually pretty good to use if size matters. That is, if you are running a program where memory is an issue, printf is a very good and underrated solution. Cout essentially shifts bits over to make room for the string, while printf just takes in some sort of parameters and prints it to the screen. If you were to compile a simple hello world program, printf would be able to compile it in less than 60,000 bits, whereas cout would take over 1 million bits to compile.
For your situation, I'd suggest using cout simply because it is much more convenient to use. Although, I would argue that printf is something good to know.
Here’s a generic way of doing it.
#include <string>
#include <stdio.h>
// Pass most arguments through unchanged (abbreviated function template, C++20).
auto print_helper(auto const & t){
    return t;
}
// A std::string is converted to the C string that %s expects.
auto print_helper(std::string const & s){
    return s.c_str();
}
std::string four(){
    return "four";
}
template<class ... Args>
void print(char const * fmt, Args&& ...args){
    printf(fmt, print_helper(args) ...);
}
int main(){
    std::string one {"one"};
    char const * three = "three";
    print("%c %d %s %s, %s five", 'c', 3+4, one + " two", three, four());
}

C/C++ line number

For debugging purposes, can I get the line number in C/C++ compilers?
(in a standard way, or in specific ways for certain compilers)
e.g.
if(!Logical)
printf("Not logical value at line number %d \n",LineNumber);
// How to get LineNumber without writing it by my hand?(dynamic compilation)
You should use the predefined macros __LINE__ and __FILE__, which are part of the C and C++ standards. During preprocessing, __LINE__ is replaced by an integer constant holding the current line number, and __FILE__ by a string literal holding the current file name.
Other predefined names:
__func__ : function name (a predefined identifier rather than a macro; part of C99 and C++11, so older C++ compilers may not support it)
__DATE__ : a string of form "Mmm dd yyyy"
__TIME__ : a string of form "hh:mm:ss"
Your code will be :
if(!Logical)
printf("Not logical value at line number %d in file %s\n", __LINE__, __FILE__);
As part of the C++ standard there exist some predefined macros that you can use. Section 16.8 of the C++ standard defines, amongst other things, the __LINE__ macro.
__LINE__: The line number of the current source line (a decimal constant).
__FILE__: The presumed name of the source file (a character string literal).
__DATE__: The date of translation of the source file (a character string literal...)
__TIME__: The time of translation of the source file (a character string literal...)
__STDC__: Whether __STDC__ is predefined is implementation-defined
__cplusplus: The name __cplusplus is defined to the value 199711L when compiling a C++ translation unit
So your code would be:
if(!Logical)
printf("Not logical value at line number %d \n",__LINE__);
You could use a macro with the same behavior as printf(), except that it also includes debug information such as the function name, file, and line number (the named variadic parameter args... and ##args used here are GNU extensions):
#include <cstdio> //needed for printf
#define print(a, args...) printf("%s(%s:%d) " a, __func__,__FILE__, __LINE__, ##args)
#define println(a, args...) print(a "\n", ##args)
These macros should behave identically to printf(), while including java stacktrace-like information. Here's an example main:
void exampleMethod() {
println("printf() syntax: string = %s, int = %d", "foobar", 42);
}
int main(int argc, char** argv) {
print("Before exampleMethod()...\n");
exampleMethod();
println("Success!");
}
Which results in the following output:
main(main.cpp:11) Before exampleMethod()...
exampleMethod(main.cpp:7) printf() syntax: string = foobar, int = 42
main(main.cpp:13) Success!
C++20 offers a new way to achieve this by using std::source_location. This is currently accessible in gcc and clang as std::experimental::source_location with #include <experimental/source_location>.
The problem with macros like __LINE__ is that if you want to create for example a logging function that outputs the current line number along with a message, you always have to pass __LINE__ as a function argument, because it is expanded at the call site.
Something like this:
void log(const std::string msg) {
std::cout << __LINE__ << " " << msg << std::endl;
}
This will always output the line inside log where __LINE__ is written, not the line where log was actually called from.
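Without std::source_location, the usual workaround is a macro that captures __LINE__ at the call site and forwards it as an ordinary argument. A minimal sketch, in which format_log and LOG_LINE are hypothetical names and the message is returned as a string rather than printed so that it is easy to check:

```cpp
#include <string>

// Hypothetical helper: the caller supplies the line number explicitly.
std::string format_log(int line, const std::string& msg) {
    return std::to_string(line) + " " + msg;
}

// The macro is expanded where it is written, so __LINE__ is the caller's line.
#define LOG_LINE(msg) format_log(__LINE__, msg)
```

The cost is that every call has to go through the macro; calling format_log directly requires passing the line by hand.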
On the other hand, with std::source_location you can write something like this:
#include <experimental/source_location>
using std::experimental::source_location;
void log(const std::string msg, const source_location loc = source_location::current())
{
std::cout << loc.line() << " " << msg << std::endl;
}
Here, loc is initialized with the line number pointing to the location where log was called.
Use __LINE__ (that's double-underscore LINE double-underscore), the preprocessor will replace it with the line number on which it is encountered.
Check out the __FILE__ and __LINE__ macros.
Try __FILE__ and __LINE__.
You might also find __DATE__ and __TIME__ useful.
However, unless you have to debug a program on the client side and therefore need to log this information, you should use normal debugging.
For those who might need it, a "FILE_LINE" macro to easily print file and line:
#define STRINGIZING(x) #x
#define STR(x) STRINGIZING(x)
#define FILE_LINE __FILE__ ":" STR(__LINE__)
Since I'm also facing this problem now and cannot add an answer to a different but also valid question asked here, I'll provide an example solution for the problem of getting only the line number of where the function has been called in C++, using templates.
Background: in C++ one can use non-type integer values as a template argument. This is different than the typical usage of data types as template arguments.
So the idea is to use such integer values for a function call.
#include <iostream>
class Test{
public:
    template<unsigned int L>
    int test(){
        std::cout << "the function has been called at line number: " << L << std::endl;
        return 0;
    }
    int test(){ return this->test<0>(); }
};
int main(int argc, char **argv){
    Test t;
    t.test();
    t.test<__LINE__>();
    return 0;
}
Output:
the function has been called at line number: 0
the function has been called at line number: 16
One thing to mention here is that the C++11 standard allows default template arguments for function templates. Before C++11, default values for non-type arguments seem to work only for class templates. Thus, in C++11, there is no need for duplicate function definitions as above. In C++11 it's also valid to have const char* template arguments, but it's not possible to use them with literals like __FILE__ or __func__, as mentioned here.
So in the end, if you're using C++ or C++11, this might be a very interesting alternative to using macros to get the calling line.
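A minimal sketch of the C++11 variant with a default template argument, so that one function template covers both calls. The name reported_line is hypothetical, and the value is returned rather than printed to keep the example short.

```cpp
// Default non-type template argument on a function template (C++11 and later).
// Called as reported_line() it reports 0; called as reported_line<__LINE__>()
// it reports the line on which the call is written.
template <unsigned int L = 0>
unsigned int reported_line() {
    return L;  // the original example printed this value instead
}
```

The caller still has to write __LINE__ explicitly; the default only removes the need for the second, non-template overload.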
Use __LINE__, but what is its type?
__LINE__ The presumed line number (within the current source file) of the current source line (an integer constant).
As an integer constant, its value can usually be assumed to satisfy __LINE__ <= INT_MAX, and so the type is int.
To print in C, printf() needs the matching specifier: "%d". This is a far lesser concern in C++ with cout.
Pedantic concern: If the line number exceeds INT_MAX[1] (somewhat conceivable with 16-bit int), hopefully the compiler will produce a warning. Example:
format '%d' expects argument of type 'int', but argument 2 has type 'long int' [-Wformat=]
Alternatively, code could force wider types to forestall such warnings.
printf("Not logical value at line number %ld\n", (long) __LINE__);
//or
#include <stdint.h>
printf("Not logical value at line number %jd\n", INTMAX_C(__LINE__));
Avoid printf()
To avoid all integer limitations: stringify. Code could directly print without a printf() call: a nice thing to avoid in error handling[2].
#define xstr(a) str(a)
#define str(a) #a
fprintf(stderr, "Not logical value at line number %s\n", xstr(__LINE__));
fputs("Not logical value at line number " xstr(__LINE__) "\n", stderr);
[1]: Certainly poor programming practice to have such a large file, yet perhaps machine-generated code may go high.
[2]: In debugging, sometimes code simply is not working as hoped. Calling complex functions like *printf() can itself incur issues vs. a simple fputs().