Is there a way to access environment variables in an LLVM pass - llvm

I was wondering whether there is a way to access environment variables while writing an LLVM pass. My source file reads a certain environment variable, and execution proceeds differently depending on its value. I need to recreate that in my pass and add a few checks there.
I would really appreciate a tutorial that lays out these details in the form of examples.

You should be able to just use cstdlib's getenv function in your pass, just like in any other C++ code.
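A minimal sketch of that suggestion, kept free of LLVM headers so that it compiles on its own. The variable name MY_PASS_CHECKS and both helper names are hypothetical, chosen only for illustration. Keep in mind that getenv sees the environment of the process that runs the pass (for example opt or clang), not the environment in which the compiled program will later run.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the value of an environment variable,
// or `fallback` when the variable is not set.
std::string envOrDefault(const char *name, const std::string &fallback) {
    const char *value = std::getenv(name);  // nullptr when the variable is unset
    return value ? std::string(value) : fallback;
}

// Hypothetical switch a pass could consult at the start of its run method.
// MY_PASS_CHECKS is an assumed variable name, not something LLVM defines.
bool extraChecksEnabled() {
    return envOrDefault("MY_PASS_CHECKS", "0") == "1";
}
```

Inside the pass, a guard such as `if (extraChecksEnabled()) { ... }` could then wrap the additional checks.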

Related

Is there an IDE for Fortran that can check values of variables?

I used to use Matlab, and one feature I relied on was that it let you check the value of every variable easily. For example, if I ran a sample program like "a+b=c, c+d=e", it automatically kept the value of c, which could then be inspected directly.
Now I am switching to Fortran. My current IDE just builds the code into an "exe", and I have to save the values I need explicitly. If I'm doing a calculation with hundreds of variables, that is hard to manage.
So my question is: is there an IDE that lets me check the values of variables directly, without saving each of them manually?
P.S. Is there a term for this feature? Is it called "memory introspection"?

About definition of llvm pass

I do not quite understand the definition of a pass in LLVM. Does it mean I can only use the opt command to run it?
My situation is this: I want to find loops in a CFG of basic blocks, and I want to use the LLVM API instead of writing the code myself. I found a file called LoopInfo.h (http://llvm.org/docs/doxygen/html/LoopInfo_8h_source.html), which includes Pass.h and a class passinfo inherited from FunctionPass. Does this mean I can only invoke it through the opt command, rather than writing a normal project that uses some of the class's functions and is built and executed on its own? I hope my question is clear.
You can analyze and manipulate LLVM IR just fine without knowing anything about passes. Just use the LLVM API and you'll be OK.
So what's the deal with passes? Well, if you do write your analysis or transformation in the form of a pass - by following this guide - you can still use it like any regular C++ class [1], but you get some advantages:
You can use the opt tool to run your pass. It takes care of everything else for you (e.g. loading the IR), makes it very easy to run other passes before or after your pass (including the useful verification pass), makes it easy to enable or disable debug mode, etc.
You can easily combine your pass with other passes using a pass manager, which is very convenient (it will take care of pass dependencies for you, for example).
So in general, writing things in the form of passes is recommended but not required.
[1] Well, if you define requirements on other passes, then you'll have to run those yourself if you're not using opt or a pass manager.
The easiest way is to add a pass that is executed via the opt command. However, you should also be able to create a dedicated executable that reads LLVM bitcode, runs your pass, and writes the bitcode back.
See here for an example:
Parsing and Modifying LLVM IR code
The source of the opt command might also be useful:
https://llvm.org/svn/llvm-project/llvm/trunk/tools/opt/opt.cpp

Is there a way to figure out what environment variables are needed/used by an executable?

I've got a C++ program that will run certain very specific commands as root. This is needed because another program running under Node.js has to do things like set the system time, set the time zone, etc., which require root privileges. I'm using the execve function in C++ to run the command with root privileges after calling setuid. I specifically chose execve because I want to wall off the environment so that I don't create an environment variable vulnerability.
setuid(0);
execve(acExeName, pArgsForExec2, pcEnv);
What I want to find out is exactly what pcEnv, the environment variable list passed to the executed program, needs to contain. For example, if I want to run the tool time-admin as if I were running it from the console, how can I figure out which environment variables it needs? I know I can print the environment variables with the printenv command, but that gives me all of them. I'm quite sure I don't need them all and want as small a subset as possible.
I know I can pass them all and then comment them out one by one to see whether it keeps working, but I'd really rather not go that far.
Does anyone have a clever way to figure out which environment variables a program uses? I should add that I'm doing this on an Ubuntu 12.04 LTS install.
Thanks for any help.
There is no general way of figuring out the environment variables used by some program. For example, one could imagine a program with configuration files that give the names of environment variables.
Actually, many shell-like programs (or script interpreters) do exactly that.
More generally, the argument to getenv(3) could be computed. So in theory you cannot guess its possible values. (I might be wrong, but some very old versions of libc and of bash used to play such tricks; unfortunately, I forgot the details, but sometimes an environment variable with some pid number in its name was used).
And, as others commented, you might want to use ltrace (or play LD_PRELOAD tricks), or use gdb, to find out how getenv is called ...
And the application might also use the environ variable (see environ(7) ...) or the third argument to main ....
In practice however, a reasonably written program should clearly document all the environment variables it is using....
If you have access to the source code of the program, you could, if it is compiled by GCC, use (the just released version 1.0 of) the MELT plugin. MELT is a domain specific language to extend GCC and can be used to explore the internal Gimple representations handled by GCC while compiling your program. In particular with its new findgimple mode you could find in one command all the calls to getenv with a constant string.
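Once a candidate list of variables has been found with one of the techniques above, the reduced environment for execve can be assembled from a whitelist. The following is only a sketch under that assumption; the helper names buildMinimalEnv and toEnvp are hypothetical, and which variables belong in the whitelist still has to be determined per program.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Builds "NAME=value" entries for the whitelisted variables that are set
// in the current environment; variables that are not set are skipped.
std::vector<std::string> buildMinimalEnv(const std::vector<std::string> &allowed) {
    std::vector<std::string> entries;
    for (const std::string &name : allowed) {
        if (const char *value = std::getenv(name.c_str())) {
            entries.push_back(name + "=" + value);
        }
    }
    return entries;
}

// Converts the entries into the null-terminated char* array that execve expects.
// The pointers refer into `entries`, so `entries` must outlive the returned vector.
std::vector<char *> toEnvp(std::vector<std::string> &entries) {
    std::vector<char *> envp;
    for (std::string &entry : entries) {
        envp.push_back(&entry[0]);
    }
    envp.push_back(nullptr);  // execve requires a terminating null pointer
    return envp;
}
```

With the names from the question, the call would then look like `execve(acExeName, pArgsForExec2, envp.data());`, where `envp` is the vector returned by `toEnvp`.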

llvm: strategies to build JIT content incrementally

I want my language backend to build functions and types incrementally without polluting the main module and context when functions and types fail to build successfully (due to problems with the user input).
I asked an earlier question regarding this.
One strategy I can see for this would be building everything in a temporary module and LLVMContext, and migrating to the main context only after success, but I am not sure whether that is possible with the current API. For instance, I wouldn't know how to migrate that content between different contexts, as they are supposed to represent isolated islands of LLVM functionality. But maybe there is always the alternative of saving everything to .bc and loading it somewhere else?
What other strategies would you suggest for achieving this?
Assuming you have two modules, a source and a destination, it's possible to copy a function from the source to the destination. The code in LLVM you can use as an example is the body of the LLVM linker, in lib/Linker/LinkModules.cpp.
In particular, look at the linkFunctionProto and linkFunctionBody methods in that file. linkFunctionBody copies the function definition, and uses the llvm::CloneFunctionInto utility for the heavy lifting.
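As for LLVMContext, unless you specifically need to run several LLVM instances simultaneously in different threads, don't worry about it too much and just use getGlobalContext() everywhere a context is required (this helper exists only in older LLVM releases; it was removed in LLVM 3.9, after which you create and pass around your own LLVMContext object). Read this doc page for more information.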

Calling an executable's function code

I have the location/offset of a particular function inside an executable. Would it be possible to call such a function (while, hopefully, suppressing the CRT's execution of the executable's entry point)?
In effect, you can simulate the Windows loader, assuming you run under Windows, but the basics should be the same on any platform. See e.g. http://msdn.microsoft.com/en-us/magazine/cc301805.aspx.
1) Load the file into memory.
2) Replace all relative addresses of functions that are called by the loaded executable with the actual function addresses.
3) Change the memory page to "executable" (this is the difficult and platform-dependent part).
4) Initialize the CRT in order to, e.g., initialize static variables.
5) Call.
However, as the commenters point out correctly, this might only be practical as an exercise using very simple functions. There are many, many things that can go wrong if you don't manage to emulate the complete OS loader.
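The final "Call" step boils down to turning "image base + offset" into a function pointer of the right type. The sketch below illustrates only that step, inside a single process and with a stand-in function; it does not load, relocate, or initialize anything. The names BinaryFn, addNumbers, and resolveFunction are hypothetical, and the signature is an assumption that must match the real target exactly.

```cpp
#include <cstdint>

// Assumed signature of the target function; in practice this must match the
// real function's signature and calling convention exactly.
using BinaryFn = int (*)(int, int);

// Stand-in for a function that would live inside the loaded image.
extern "C" int addNumbers(int a, int b) { return a + b; }

// Turns "image base + offset" into a callable function pointer.
// Converting an integer to a function pointer is only conditionally supported
// by the C++ standard, although mainstream platforms allow it.
BinaryFn resolveFunction(std::uintptr_t imageBase, std::uintptr_t offset) {
    return reinterpret_cast<BinaryFn>(imageBase + offset);
}
```

Given the address at which the image was mapped and the known offset, `resolveFunction(base, offset)(2, 3)` would then invoke the target, provided all the earlier loader steps have been carried out correctly.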
PS: You could also ask the Google: http://www.cultdeadcow.com/tools/pewrap.html
PPS: You may also find helpful advice in the "security" community: https://www.blackhat.com/presentations/bh-usa-07/Harbour/Whitepaper/bh-usa-07-harbour-WP.pdf
Yes, you can call it, provided you initialize all the global variables that this function uses, probably including the CRT's global variables. Alternatively, you can hook and replace all the CRT functions that the callee uses. Look at the disassembly of that function to find the right solution.
1) Take a look at the LoadLibraryEx() API. It has some flags that may be able to do all the dirty work described by Sebastian.
2) Edit the executable. Several modified bytes will do the job. Here is some documentation on the file format: http://docsrv.sco.com:507/en/topics/COFF.html