Function to call #include macro from a string variable argument? - c++

Is it possible to have a function like this:
const char* load(const char* filename_){
return
#include filename_
;
};
so you wouldn't have to hardcode the #include file?
Maybe with a some macro?
I'm drawing a blank, guys. I can't tell if it's flat out not possible or if it just has a weird solution.
EDIT:
Also, the ideal is to have this as a compile time operation, otherwise I know there's more standard ways to read a file. Hence thinking about #include in the first place.

This is absolutely impossible.
The reason is - as Justin already said in a comment - that #include is evaluated at compile time.
To include files during run time would require a complete compiler "on board" of the program. A lot of script languages support things like that, but C++ is a compiled language and works different: Compile and run time are strictly separated.

You cannot use #include to do what you want to do.
The C++ way of implementing such a function is:
Find out the size of the file.
Allocate memory for the contents of the file.
Read the contents of the file into the allocated memory.
Return the contents of the file to the calling function.
It will better to change the return type to std::string to ease the burden of dealing with dynamically allocated memory.
std::string load(const char* filename)
{
std::string contents;
// Open the file
std::ifstream in(filename);
// If there is a problem in opening the file, deal with it.
if ( !in )
{
// Problem. Figure out what to do with it.
}
// Move to the end of the file.
in.seekg(0, std::ifstream::end);
auto size = in.tellg();
// Allocate memory for the contents.
// Add an additional character for the terminating null character.
contents.resize(size+1);
// Rewind the file.
in.seekg(0);
// Read the contents
auto n = in.read(contents.data(), size);
if ( n != size )
{
// Problem. Figure out what to do with it.
}
contents[size] = '\0';
return contents;
};
PS
Using a terminating null character in the returned object is necessary only if you need to treat the contents of the returned object as a null terminated string for some reason. Otherwise, it maybe omitted.

I can't tell if it's flat out not possible
I can. It's flat out not possible.
Contents of the filename_ string are not determined until runtime - the content is unknown when the pre processor is run. Pre-processor macros are processed before compilation (or as first step of compilation depending on your perspective).
When the choice of the filename is determined at runtime, the file must also be read at runtime (for example using a fstream).
Also, the ideal is to have this as a compile time operation
The latest time you can affect the choice of included file is when the preprocessor runs. What you can use to affect the file is a pre-processor macro:
#define filename_ "path/to/file"
// ...
return
#include filename_
;

it is theoretically possible.
In practice, you're asking to write a PHP construct using C++. It can be done, as too many things can, but you need some awkward prerequisites.
a compiler has to be linked into your executable. Because the operation you call "hardcoding" is essential for the code to be executed.
a (probably very fussy) linker again into your executable, to merge the new code and resolve any function calls etc. in both directions.
Also, the newly imported code would not be reachable by the rest of the program which was not written (and certainly not compiled!) with that information in mind. So you would need an entry point and a means of exchanging information. Then in this block of information you could even put pointers to code to be called.
Not all architectures and OSes will support this, because "data" and "code" are two concerns best left separate. Code is potentially harmful; think of it as nitric acid. External data is fluid and slippery, like glycerine. And handling nitroglycerine is, as I said, possible. Practical and safe are something completely different.
Once the prerequisites were met, you would have two or three nice extra functions and could write:
void *load(const char* filename, void *data) {
// some "don't load twice" functionality is probably needed
void *code = compile_source(filename);
if (NULL == code) {
// a get_last_compiler_error() would be useful
return NULL;
}
if (EXIT_SUCCESS != invoke_code(code, data)) {
// a get_last_runtime_error() would also be useful
release_code(code);
return NULL;
}
// it is now the caller's responsibility to release the code.
return code;
}
And of course it would be a security nightmare, with source code left lying around and being imported into a running application.
Maintaining the code would be a different, but equally scary nightmare, because you'd be needing two toolchains - one to build the executable, one embedded inside said executable - and they wouldn't necessarily be automatically compatible. You'd be crying loud for all the bugs of the realm to come and rejoice.
What problem would be solved?
Implementing require_once in C++ might be fun, but you thought it could answer a problem you have. Which is it exactly? Maybe it can be solved in a more C++ish way.
A better alternative, considering also performances etc., to compile a loadable module beforehand, and load it at runtime.
If you need to perform small tunings to the executable, place parameters into an external configuration file and provide a mechanism to reload it. Once the modules conform to a fixed specification, you can even provide "plugins" that weren't available when the executable was first developed.

Related

Structure not in memory

I created a structure like that:
struct Options {
double bindableKeys = 567;
double graphicLocation = 150;
double textures = 300;
};
Options options;
Right after this declaration, in another process, I open the process which contains the structure and search for a byte array with the struct's doubles but nothing gets found.
To obtain a result, I need to add something like std::cout << options.bindableKeys;after the declaration. Then I get a result from my pattern search.
Why is this behaving like that? Is there any fix?
Minimal reproducible example:
struct Options {
double bindableKeys = 567;
double graphicLocation = 150;
double textures = 300;
};
Options options;
while(true) {
double val = options.bindableKeys;
if(val > 10)
std::cout << "test" << std::endl;
}
You can search the array with CheatEngine or another pattern finder
Contrary to popular belief, C++ source code is not a sequence of instructions provided to the executing computer. It is not a list of things that the executable will contain.
It is merely a description of a program.
Your compiler is responsible for creating an executable program, that follows the same semantics and logical narrative as you've described in your source code.
Creating an Options instance is all well and good, but if creating it does not do anything (has no side effects) and you never use any of its data, then it may as well not exist, and therefore is not a part of the logical narrative of your program.
Consequently, there is no reason for the compiler to put it into the executable program. So, it doesn't.
Some people call this "optimisation". That the instance is "optimised away". I prefer to call it common sense: the instance was never truly a part of your program.
And even if you do use the data in the instance, it may be possible for an executable program to be created that more directly uses that data. In your case, nothing changes the default values of Option's members, so there is no reason to include them into the program: the if statement can just have 567 baked into it. Then, since it's baked in, the whole condition becomes the constant expression 567 > 10 which must always be true; you'll likely find that the resulting executable program consequently contains no branching logic at all. It just starts up, then outputs "test" over and over again until you force-terminate it.
That all being said, because we live in a world governed by physical laws, and because compilers are imperfect, there is always going to be some slight leakage of this abstraction. For this reason, you can trick the compiler into thinking that the instance is "used" in a way that requires its presence to be represented more formally in the executable, even if this isn't necessary to implement the described program. This is common in benchmarking code.

What is the right way to parse large amount of input for main function in C++

Suppose there are 30 numbers I had to input into an executable, because of the large amount of input, it is not reasonable to input them via command line. One standard way is to save them into a single XML file and use XML parser like tinyxml2 to parse them. The problem is if I use tinyxml2 to parse the input directly I will have a very bloated main function, which seems to contradict the common good practice.
For example:
int main(int argc, char **argv){
int a[30];
tinyxml2::XMLDocument doc_xml;
if (doc_xml.LoadFile(argv[1])){
std::cerr << "failed to load input file";
}
else {
tinyxml2::XMLHandle xml(&doc_xml);
tinyxml2::XMLHandle a0_xml =
xml.FirstChildElement("INPUT").FirstChildElement("A0");
if (a0_xml.ToElement()) {
a0_xml.ToElement()->QueryIntText(&a[0]);
}
else {
std::cerr << "A0 missing";
}
tinyxml2::XMLHandle a1_xml =
xml.FirstChildElement("INPUT").FirstChildElement("A1");
if (a1_xml.ToElement()) {
a1_xml.ToElement()->QueryIntText(&a[1]);
}
else {
std::cerr << "A1 missing";
}
// parsing all the way to A29 ...
}
// do something with a
return 0;
}
But on the other hand, if I write an extra class just to parse these specific type of input in order to shorten the main function, it doesn't seem to be right either, because this extra class will be useless unless it's used in conjunction with this main function since it can't be reused elsewhere.
int main(int argc, char **argv){
int a[30];
ParseXMLJustForThisExeClass ParseXMLJustForThisExeClass_obj;
ParseXMLJustForThisExeClass_obj.Run(argv[1], a);
// do something with a
return 0;
}
What is the best way to deal with it?
Note, besides reading XML files you can also pass lots of data through stdin. It's pretty common practice to use e.g. mycomplexcmd | hexdump -C, where hexdump is reading from stdin through the pipe.
Now up to the rest of the question: there's a reason to go with the your multiple-functions example (here it's not very important whether they're constructors or usual functions). It's pretty much the same as why would you want any function to be smaller — readability. That said, I don't know about the "common good practice", and I've seen many terminal utilities with very big main().
Imagine someone new is reading 1-st variant of main(). They'd be going through the hoops of figuring out all these handles, queries, children, parents — when all they wanted is to just look at the part after // do something with a. It's because they don't know if it's relevant to their problem or not. But in the 2-nd variant they'll quickly figure it out "Aha, it's the parsing logic, it's not what I am looking for".
That said, of course you can break the logic with detailed comments. But now imagine something went wrong, someone is debugging the code, and they pinned down the problem to this function (alright, it's funny given the function is main(), maybe they just started debugging). The bug turned out to be very subtle, unclear, and one is checking everything in the function. Now, because you're dealing with mutable language, you'd often find yourself in situation where you think "oh, may be it's something with this variable, where it's being changed?"; and you first look up every use of the variable through this large function, then conditions that could lead to blocks where it's changed; then you figuring out what does this another big block, relevant to the condition, that could've been extracted to a separate function, what variables are used in there; and to the moment you figured out what it's doing you already forgot half of what you were looking before!
Of course sometimes big functions are unavoidable. But if you asked the question, it's probably not your case.
Rule of thumb: you see a function doing two different things having little in common, you want to break it to 2 separate functions. In your case it's parsing XML and "doing something with a". Though if that 2-nd part is a few lines, probably not worth extracting — speculate a bit. Don't worry about the overhead, compilers are good at optimizing. You can either use LTO, or you can declare a function in .cpp file only as static (non-class static), and depending on optimization options a compiler may inline the code.
P.S.: you seem to be in the state where it's very useful to learn'n'play with some Haskell. You don't have to use it for real serious projects, but insights you'd get can be applied anywhere. It forces you into better design, in particular you'd quickly start feeling when it's necessary to break a function (aside of many other things).

How to return the name of a variable stored at a particular memory address in C++

first time posting here after having so many of my Google results come up from this wonderful site.
Basically, I'd like to find the name of the variable stored at a particular memory address. I have a memory editing application I wrote that edits a single value, the problem being that every time the application holding this value is patched, I have to hardcode in the new memory address into my application, and recompile, which takes so much time to upkeep that its almost not worthwhile to do.
What I'd like to do is grab the name of the variable stored at a certain memory address, that way I can then find its address at runtime and use that as the memory address to edit.
This is all being written in C++.
Thanks in advance!
Edit:
Well I've decided I'd like to stream the data from a .txt file, but I'm not sure how to convert the string into an LPVOID for use as the memory address in WriteProcessMemory(). This is what I've tried:
string fileContents;
ifstream memFile("mem_address.txt");
getline(memFile, fileContents);
memFile.close();
LPVOID memAddress = (LPVOID)fileContents.c_str();
//Lots of code..
WriteProcessMemory(WindowsProcessHandle, memAddress, &BytesToBeWrote, sizeof(BytesToBeWrote), &NumBytesWrote);
The code is all correct in terms of syntax, it compiles and runs, but the WriteProcessMemory errors and I can only imagine it has to do with my faulty LPVOID variable. I apologize if extending the use of my question is against the rules, I'll remove my edit if it is.
Compile and generate a so called map file. This can be done easily with Visual-C++ (/MAP linker option). There you'll see the symbols (functions, ...) with their starting address. Using this map file (Caution: has to be updated each time you recompile) you can match the addresses to names.
This is actually not so easy because the addresses are relative to the preferred load address, and probably will (randomization) be different from the actual load address.
Some old hints on retrieving the right address can be found here: http://home.hiwaay.net/~georgech/WhitePapers/MapFiles/MapFiles.htm
In general, the names of variables are not kept around when the program is compiled. If you are in control of the compilation process, you can usually configure the linker and compiler to produce a map-file listing the locations in memory of all global variables. However, if this is the case, you can probably acheive your goals more easily by not using direct memory accesses, but rather creating a proper command protocol that your external program can call into.
If you do not have control of the compilation process of the other program, you're probably out of luck, unless the program shipped with a map file or debugging symbols, either of which can be used to derive the names of variables from their addresses.
Note that for stack variables, deriving their names will require full debugging symbols and is a very non-trivial process. Heap variables have no names, so you will have no luck there, naturally. Further, as mentioned in #jdehaan's answer, map files can be a bit tricky to work with in the best of times. All in all, it's best to have a proper control protocol you can use to avoid any dependence on the contents of the other program's memory at all.
Finally, if you have no control over the other program, then I would recommend putting the variable location into a separate datafile. This way you would no longer need to recompile each time, and could even support multiple versions of the program being poked at. You could also have some kind of auto-update service pulling new versions of this datafile from a server of yours if you like.
Unless you actually own the application in question, there is no standard way to do this. If you do own the application, you can follow #jdehaan answer.
In any case, instead of hardcoding the memory address into your application, why not host a simple feed somewhere that you can update at any time with the memory address you need to change for each version of the target application? This way, instead of recompiling your app every time, you can just update that feed when you need to be able to manipulate a new version.
You cannot directly do this; variable names do not actually exist in the compiled binary. You might be able to do that if the program was written, in say, Java or C#, which do store information about variables in the compiled binary.
Further, this wouldn't in general be possible, because it's always possible that the most up to date copy of a value inside the target program is located inside of a CPU register rather than in memory. This is more likely if the program in question is compiled in release mode, with optimizations turned on.
If you can ensure the target program is compiled in debug mode you should be able to use the debugging symbols emitted by the compiler (the .pdb file) in order to map addresses to variables, but in that case you would need to launch the target process as if it were being debugged -- the plain Read Process Memory and Write Process Memory methods would not work.
Finally, your question ignores a very important consideration -- there need not be a variable corresponding to a particular address even if such information is stored.
If you have the source to the app in question and optimal memory usage is not a concern, then you can declare the interesting variables inside a debugging-friendly structure similar to:
typedef struct {
const char head_tag[15] = "VARIABLE_START";
char var_name[32];
int value;
const char tail_tag[13] = "VARIABLE_END";
} debuggable_int;
Now, your app should be able to search through the memory space for the program and look for the head and tail tags. Once it locates one of your debuggable variables, it can use the var_name and value members to identify and modify it.
If you are going to go to this length, however, you'd probably be better off building with debugging symbols enabled and using a regular debugger.
Billy O'Neal started to head in the right direction, but didn't (IMO) quite get to the real target. Assuming your target is Windows, a much simpler way would be to use the Windows Symbol handler functions, particularly SymFromName, which will let you supply the symbol's name, and it will return (among other things) the address for that symbol.
Of course, to do any of this you will have to run under an account that's allowed to do debugging. At least for global variables, however, you don't necessarily have to stop the target process to find symbols, addresses, etc. In fact, it works just fine for a process to use these on itself, if it so chooses (quite a few of my early experiments getting to know these functions did exactly that). Here's a bit of demo code I wrote years ago that gives at least a general idea (though it's old enough that it uses SymGetSymbolFromName, which is a couple of generations behind SymFromName). Compile it with debugging information and stand back -- it produces quite a lot of output.
#define UNICODE
#define _UNICODE
#define DBGHELP_TRANSLATE_TCHAR
#include <windows.h>
#include <imagehlp.h>
#include <iostream>
#include <ctype.h>
#include <iomanip>
#pragma comment(lib, "dbghelp.lib")
int y;
int junk() {
return 0;
}
struct XXX {
int a;
int b;
} xxx;
BOOL CALLBACK
sym_handler(wchar_t const *name, ULONG_PTR address, ULONG size, void *) {
if (name[0] != L'_')
std::wcout << std::setw(40) << name
<< std::setw(15) << std::hex << address
<< std::setw(10) << std::dec << size << L"\n";
return TRUE;
}
int
main() {
char const *names[] = { "y", "xxx"};
IMAGEHLP_SYMBOL info;
SymInitializeW(GetCurrentProcess(), NULL, TRUE);
SymSetOptions(SYMOPT_UNDNAME);
SymEnumerateSymbolsW(GetCurrentProcess(),
(ULONG64)GetModuleHandle(NULL),
sym_handler,
NULL);
info.SizeOfStruct = sizeof(IMAGEHLP_SYMBOL);
for (int i=0; i<sizeof(names)/sizeof(names[0]); i++) {
if ( !SymGetSymFromName(GetCurrentProcess(), names[i], &info)) {
std::wcerr << L"Couldn't find symbol 'y'";
return 1;
}
std::wcout << names[i] << L" is at: " << std::hex << info.Address << L"\n";
}
SymCleanup(GetCurrentProcess());
return 0;
}
WinDBG has a particularly useful command
ln
here
Given a memory location, it will give the name of the symbol at that location. With right debug information, it is a debugger's (I mean person doing debugging :)) boon!.
Here is a sample output on my system (XP SP3)
0:000> ln 7c90e514 (7c90e514)
ntdll!KiFastSystemCallRet |
(7c90e520) ntdll!KiIntSystemCall
Exact matches:
ntdll!KiFastSystemCallRet ()

storage of user, error, exception messages (c++)

Rather simple question.
Where should I store error,exception, user messages?
By far, I always declared local strings inside the function where it is going to be invoked and did not bother.
e.g.
SomeClass::function1(...)
{
std::string str1("message1");
std::string str2("message2");
std::string str3("message3");
...
// some code
...
}
Suddenly I realized that since construction & initialization are called each time and it might be quite expensive. Would it be better to store them as static strings in class or even in a separate module?
Localization is not the case here.
Thanks in advance.
Why not just use a string constant when you need it?
SomeClass::function1(...)
{
/* ... */
throw std::runtime_error("The foo blortched the baz!");
/* ... */
}
Alternately, you can use static const std::strings. This is appropriate if you expect to copy them to a lot of other std::strings, and your C++ implementation does copy-on-write:
SomeClass::function1(...)
{
static const std::string str_quux("quux"); // initialized once, at program start
xyz.someMember = str_quux; // might not require an allocation+copy
}
If you expect to make lots of copies of these strings, and you don't have copy-on-write (or can't rely on it being present), you might want to look into using boost::flyweight.
TBH its probably best to ONLY construct error messages when they are needed (ie if something goes badly wrong who cares if you get a slowdown). If the messages are always going to appear then its probably best to define them statically to avoid the fact that they will be initialised each time. Generally, though, I only display user messages in debug mode so its quite easy to not show them if you are trying to do a performance build. I then only construct them when they are needed.

Does an arbitrary instruction pointer reside in a specific function?

I have a very difficult problem I'm trying to solve: Let's say I have an arbitrary instruction pointer. I need to find out if that instruction pointer resides in a specific function (let's call it "Foo").
One approach to this would be to try to find the start and ending bounds of the function and see if the IP resides in it. The starting bound is easy to find:
void *start = &Foo;
The problem is, I don't know how to get the ending address of the function (or how "long" the function is, in bytes of assembly).
Does anyone have any ideas how you would get the "length" of a function, or a completely different way of doing this?
Let's assume that there is no SEH or C++ exception handling in the function. Also note that I am on a win32 platform, and have full access to the win32 api.
This won't work. You're presuming functions are contigous in memory and that one address will map to one function. The optimizer has a lot of leeway here and can move code from functions around the image.
If you have PDB files, you can use something like the dbghelp or DIA API's to figure this out. For instance, SymFromAddr. There may be some ambiguity here as a single address can map to multiple functions.
I've seen code that tries to do this before with something like:
#pragma optimize("", off)
void Foo()
{
}
void FooEnd()
{
}
#pragma optimize("", on)
And then FooEnd-Foo was used to compute the length of function Foo. This approach is incredibly error prone and still makes a lot of assumptions about exactly how the code is generated.
Look at the *.map file which can optionally be generated by the linker when it links the program, or at the program's debug (*.pdb) file.
OK, I haven't done assembly in about 15 years. Back then, I didn't do very much. Also, it was 680x0 asm. BUT...
Don't you just need to put a label before and after the function, take their addresses, subtract them for the function length, and then just compare the IP? I've seen the former done. The latter seems obvious.
If you're doing this in C, look first for debugging support --- ChrisW is spot on with map files, but also see if your C compiler's standard library provides anything for this low-level stuff -- most compilers provide tools for analysing the stack etc., for instance, even though it's not standard. Otherwise, try just using inline assembly, or wrapping the C function with an assembly file and a empty wrapper function with those labels.
The most simple solution is maintaining a state variable:
volatile int FOO_is_running = 0;
int Foo( int par ){
FOO_is_running = 1;
/* do the work */
FOO_is_running = 0;
return 0;
}
Here's how I do it, but it's using gcc/gdb.
$ gdb ImageWithSymbols
gdb> info line * 0xYourEIPhere
Edit: Formatting is giving me fits. Time for another beer.