C++ Win32 Replace Strings in an Executable - c++

I'm looking for a good way to replace several strings inside a native win32 compiled exe. For example, I have the following in my code:
const char *updateSite = "http://www.place.com";
const char *updateURL = "/software/release/updater.php";
I need to modify these strings with other arbitrary length strings within the exe. I realize I could store this type of configuration elsewhere, but keeping it in the exe meets the portability requirements for my app. I would appreciate any help and/or advice on the best way to do this.
Thanks!
Update: I found some code in the Metasploit project that seems to do this:
MSF:Util:Exe

I would not mess around with the EXE itself. If you really need one file, do the old zip-append trick and put your configs in there.
Could look like this:
> BINARY DATA
> ZIP FILE DATA
> 32-bit unsigned int whose value is the size of the appended zip file
Pros:
easy to extend / maintain
you don't mess with the exe itself
you can put lots of stuff in there
Cons:
You need to link some compression lib
If you don't want to zip it, just write a simple uncompressed archive format of your own.
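The reading side of that layout can be sketched roughly as follows. This is only an illustration under some assumptions: the payload is treated as opaque bytes (uncompressed), the trailing size is assumed to be little-endian, and read_appended_payload is a made-up name rather than part of any library.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Reads a payload that was appended to a file in the layout
//   [original binary][payload bytes][32-bit little-endian payload size]
// Returns an empty string if the file cannot be read or the size is implausible.
// (Hypothetical helper; byte order and layout are assumptions of this sketch.)
std::string read_appended_payload(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return {};
    const std::streamoff file_size = in.tellg();
    if (file_size < 4) return {};

    // The last four bytes hold the payload size.
    unsigned char raw[4];
    in.seekg(file_size - 4, std::ios::beg);
    in.read(reinterpret_cast<char*>(raw), 4);
    if (!in) return {};
    const std::uint32_t payload_size =
        static_cast<std::uint32_t>(raw[0]) |
        (static_cast<std::uint32_t>(raw[1]) << 8) |
        (static_cast<std::uint32_t>(raw[2]) << 16) |
        (static_cast<std::uint32_t>(raw[3]) << 24);
    if (static_cast<std::streamoff>(payload_size) > file_size - 4) return {};

    // The payload sits immediately before the size field.
    std::string payload(payload_size, '\0');
    in.seekg(file_size - 4 - static_cast<std::streamoff>(payload_size), std::ios::beg);
    in.read(&payload[0], static_cast<std::streamsize>(payload_size));
    if (!in) return {};
    return payload;
}
```

At runtime the program would pass its own path (on Win32, the one returned by GetModuleFileName) and parse the returned bytes as its configuration.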

In a PE file there is a relocation table: a list of addresses (for example, references to global variables or constants that must be stored at runtime, such as strings) that the PE loader must adjust when it loads the image. If you knew which entry belonged to this particular variable, you could get its address and then alter it manually. However, this would be a total pain, and you'd need in-depth knowledge of your favourite compiler and the PE format. It's easier just to use XML or Lua or something else that's totally portable; they were invented for exactly this kind of purpose.
Edit:
Why not just use a const char**? Is there something wrong with this being a normal runtime variable?

IMO the best place to store those strings is a string table resource. It's incorporated into your .EXE file, so portability will not be compromised.
Use the Visual Studio resource editor to alter those values.
Use the LoadString WinAPI function or, better, the CString::LoadString method in your code to load the values.
There is also third-party software that lets you modify the strings in the compiled .EXE without recompiling.

Related

Reverse offsetof / Get name of element by offset

Given the offset of an element in a C++ struct, how can I find its name/type without manual counting? This would be especially useful when decoding ASM code where such offsets are regularly used. Ideally the tool would parse a C(++) header file and then give the answer from that. Thanks for any pointers :)
One such tool might be the compiler itself (using the same ABI-relevant flags as were used to generate the code). Create a small program which includes the header file, then prints the result of offsetof applied to each struct's members. You'll then have a suitable look-up table which you could refer to manually, or use as input to another tool you might write.
It may be possible (depending on the complexity of the headers) to auto-generate the program above (you'll probably want to run the header through the C preprocessor first, to expand macros and select the correct branch of conditionals).
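A minimal sketch of such a program might look like this. The Packet struct and the PRINT_OFFSET macro are placeholders invented for illustration; in practice you would include the real header and list its members instead.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical struct standing in for one declared in the header under study.
struct Packet {
    char  tag;
    int   length;
    short flags;
};

// Prints one "offset  Struct.member" line for a member.
#define PRINT_OFFSET(type, member) \
    std::printf("%4zu  %s.%s\n", offsetof(type, member), #type, #member)

// Emits the look-up table for Packet, one line per member.
void print_packet_offsets() {
    PRINT_OFFSET(Packet, tag);
    PRINT_OFFSET(Packet, length);
    PRINT_OFFSET(Packet, flags);
}
```

The printed offsets depend on the compiler, target and packing flags, which is why it has to be built with the same settings as the code being decoded.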

Safely embedding a string in C code (Secure string, Secure char*)

I have a DLL (ANSI C) that has some string literals defined.
__declspec(dllexport) char* GetSomeString()
{
return "This is a test string from TestLib.dll";
}
When compiled this string is still visible in "notepad" for example. I'm fairly new to C, so I was wondering, is there a way to safely store string literals?
Should I do it with a resx file (for example), that has some encrypted values, or what would be the best way?
Thanks
EDIT 1:
The scenario is basically the following in pseudo code:
if(hostname)
return hostname
else
return "Literal String";
It's this "literal string" that I would like to see "secured" in some way..
Don't put your secrets on anyone else's computer if you want them to stay secret.
See my related answer, The #1 Law of Software Licensing
And Eric Lippert's similar answer
First of all, since your executable [1] needs to decode that literal in memory, any sufficiently determined attacker will be able to do the same; often it's as easy as freezing the process after startup (or after it has used the string in question), creating a memory dump and running a utility like strings over it. There are ways to mitigate the issue (e.g. zeroing the memory used by a sensitive string immediately after using it), but since your code runs on a machine where the potential attacker has all the privileges, you can only put up roadblocks: in the end your executable is completely in the attacker's hands.
That being said, if your concern is just "not leaving important strings en plein air" you may just run an executable packer/encrypter over your whole dll. This is as easy as adding a post-build step in your solution, the packer will compress/encrypt the whole executable image and build an executable that when launched will decrypt and run it in memory.
This method has the great advantage of not requiring any change to your code: you just run upx over the compiled dll and you get your compressed dll, no XORs or weird literals spread across your code are needed.
Of course, this is quite weak security (basically it will just protect from snooping around in the executable with notepad or a hex editor), but again, storing critical "secrets" in an executable that is going to be distributed is a bad idea in first place.
[1] Throughout this answer, "executable" is meant in the broad sense, i.e. DLLs are included as well.
You probably want to store hardcoded passwords in the library, right? You can XOR the string with some value, and store it, then read it and XOR again. It's the simplest way, but it doesn't protect your string from any kind of disassembling/reverse engineering.
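For what it's worth, the XOR idea can be sketched like this. The function name and the repeating-key scheme are assumptions made for the example; in a real build only the already-XORed bytes would be stored in the library, produced by a separate build step. It only hides the text from a casual look in a text or hex editor, since the key ships in the same binary.

```cpp
#include <cstddef>
#include <string>

// XORs every byte with a repeating key. Applying it twice with the same key
// restores the original, so one function both obfuscates and recovers.
// (Hypothetical helper; this is obfuscation, not encryption.)
std::string xor_with_key(const std::string& data, const std::string& key) {
    if (key.empty()) return data;   // nothing to XOR with
    std::string out = data;
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<char>(out[i] ^ key[i % key.size()]);
    }
    return out;
}
```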

How do I associate changed lines with functions in a git repository of C code?

I'm attempting to construct a “heatmap” from a multi-year history stored in a git repository where the unit of granularity is individual functions. Functions should grow hotter as they change more times, more frequently, and with more non-blank lines changed.
As a start, I examined the output of
git log --patch -M --find-renames --find-copies-harder --function-context -- *.c
I looked at using Language.C from Hackage, but it seems to want a complete translation unit—expanded headers and all—rather than being able to cope with a source fragment.
The --function-context option is new since version 1.7.8. The foundation of the implementation in v1.7.9.4 is a regex:
PATTERNS("cpp",
/* Jump targets or access declarations */
"!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:.*$\n"
/* C/++ functions/methods at top level */
"^([A-Za-z_][A-Za-z_0-9]*([ \t*]+[A-Za-z_][A-Za-z_0-9]*([ \t]*::[ \t]*[^[:space:]]+)?){1,}[ \t]*\\([^;]*)$\n"
/* compound type at top level */
"^((struct|class|enum)[^;]*)$",
/* -- */
"[a-zA-Z_][a-zA-Z0-9_]*"
"|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
"|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"),
This seems to recognize boundaries reasonably well but doesn’t always leave the function as the first line of the diff hunk, e.g., with #include directives at the top or with a hunk that contains multiple function definitions. An option to tell diff to emit separate hunks for each function changed would be really useful.
This isn’t safety-critical, so I can tolerate some misses. Does that mean I likely have Zawinski’s “two problems”?
I realise this suggestion is a bit tangential, but it may help in order to clarify and rank requirements. This would work for C or C++ ...
Instead of trying to find text blocks which are functions and comparing them, use the compiler to make binary blocks. Specifically, for every C/C++ source file in a change set, compile it to an object. Then use the object code as a basis for comparisons.
This might not be feasible for you, but IIRC there is an option on gcc to compile so that each function is compiled to an 'independent chunk' within the generated object code file. The linker can pull each 'chunk' into a program. (It is getting pretty late here, so I will look this up in the morning, if you are interested in the idea. )
So, assuming we can do this, you'll have lots of functions defined by chunks of binary code, so a simple 'heat' comparison is 'how much longer or shorter is the code between versions for any function?'
I am also thinking it might be practical to use objdump to reconstitute the assembler for the functions. I might use some regular expressions at this stage to trim off the register names, so that changes to register allocation don't cause too many false positives (spurious changes).
I might even try to sort the assembler instructions in the function bodies, and diff them to get a pattern of "removed" vs "added" between two function implementations. This would give a measure of change which is pretty much independent of layout, and even somewhat independent of the order of some of the source.
So it might be interesting to see if two alternative implementations of the same function (i.e. from different change sets) are the same instructions :-)
This approach should also work for C++ because all names have been appropriately mangled, which should guarantee the same functions are being compared.
So, the regular expressions might be kept very simple :-)
Assuming all of this is straightforward, what might this approach fail to give you?
Side Note: This basic strategy could work for any language which targets machine code, as well as VM instruction sets like the Java VM Bytecode, .NET CLR code, etc too.
It might be worth considering building a simple parser, using one of the common tools, rather than just using regular expressions. Clearly it is better to choose something you are familiar with, or which your organisation already uses.
For this problem, a parser doesn't actually need to validate the code (I assume it is valid when it is checked in), and it doesn't need to understand the code, so it might be quite dumb.
It might throw away comments (retaining new lines), ignore the contents of text strings, and treat program text in a very simple way. It mainly needs to keep track of balanced '{' '}', balanced '(' ')' and all the other valid program text is just individual tokens which can be passed 'straight through'.
Its output might be a separate file per function, to make tracking easier.
If the language is C or C++, and the developers are reasonably disciplined, they might never use 'non-syntactic macros'. If that is the case, then the files don't need to be preprocessed.
Then a parser is mostly just looking for the function name (an identifier) at file scope followed by ( parameter-list ) { ... code ... }
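A very dumb scanner along those lines could be sketched as below. It is only an illustration under loose assumptions: it targets plain C, treats any file-scope `{ ... }` that directly follows a `)` as a function body, skips comments, string and character literals and preprocessor lines, and will miss anything fancier (K&R parameter declarations, code nested in extern "C" blocks, C++ trailing qualifiers). FunctionSpan and find_function_spans are made-up names.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Line range (1-based, inclusive) of one file-scope function definition.
struct FunctionSpan { int first_line; int last_line; };

std::vector<FunctionSpan> find_function_spans(const std::string& src) {
    enum State { Code, LineComment, BlockComment, String, Char };
    State state = Code;
    std::vector<FunctionSpan> spans;
    int line = 1;
    int depth = 0;            // current '{' nesting depth
    int decl_line = 0;        // line where the current file-scope declaration began (0 = none)
    char last_sig = '\0';     // last non-space code character seen at depth 0
    bool in_function = false;

    for (std::size_t i = 0; i < src.size(); ++i) {
        const char c = src[i];
        const char next = (i + 1 < src.size()) ? src[i + 1] : '\0';
        switch (state) {
        case LineComment:     // also used for preprocessor lines
            if (c == '\\' && next == '\n') { ++line; ++i; }   // line continuation
            else if (c == '\n') state = Code;
            break;
        case BlockComment:
            if (c == '*' && next == '/') { state = Code; ++i; }
            break;
        case String:
        case Char:
            if (c == '\\' && next != '\0') { if (next == '\n') ++line; ++i; }
            else if (c == (state == String ? '"' : '\'')) state = Code;
            break;
        case Code:
            if (c == '/' && next == '/') { state = LineComment; ++i; }
            else if (c == '/' && next == '*') { state = BlockComment; ++i; }
            else if (c == '#') state = LineComment;           // skip the directive
            else if (c == '"') state = String;
            else if (c == '\'') state = Char;
            else if (!std::isspace(static_cast<unsigned char>(c))) {
                if (depth == 0 && decl_line == 0) decl_line = line;
                if (c == '{') {
                    if (depth == 0) in_function = (last_sig == ')');
                    ++depth;
                } else if (c == '}') {
                    if (depth > 0) --depth;
                    if (depth == 0) {
                        if (in_function) spans.push_back({decl_line, line});
                        in_function = false;
                        decl_line = 0;
                    }
                } else if (c == ';' && depth == 0) {
                    decl_line = 0;                            // end of a non-function declaration
                }
                if (depth == 0) last_sig = c;
            }
            break;
        }
        if (c == '\n') ++line;
    }
    return spans;
}
```

The resulting line ranges could then be intersected with the changed line numbers from each diff hunk to decide which function a change belongs to.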
I'd SWAG it would be a few days' work using yacc & lex / flex & bison, and it might be so simple that there is no need for the parser generator.
If the code is Java, then ANTLR is a possible, and I think there was a simple Java parser example.
If Haskell is your focus, there may be student projects published which have made a reasonable stab at a parser.

C++ - Splitting Filename and File Extension

Ok, first of all I don't want to use Boost, or any external libraries. I just want to use the C++ Standard Library. I can easily split strings with a given delimiter with my split() function:
void split(std::string &string, std::vector<std::string> &tokens, const char &delim) {
std::string ea;
std::stringstream stream(string);
while(getline(stream, ea, delim))
tokens.push_back(ea);
}
I do this on filenames. But there's a problem. There are files that have extensions like tar.gz, tar.bz2, etc. There are also filenames that have extra dots, such as Some.file.name.tar.gz. I wish to separate Some.file.name and tar.gz. Note: the number of dots in a filename isn't constant.
I also tried PathFindExtension but no luck. Is this possible? If so, please enlighten me. Thank you.
Edit: I'm very sorry about not specifying the OS. It's Windows.
I think you could use std::string find_last_of to get the index of the last ., and substr to cut the string (although the "complex extensions" involving multiple dots will require additional work).
There is no way of doing what you want that does not involve a database of extensions for your purpose. There's nothing magical about extensions, they are just part of a filename (if you gunzip foo.tar.gz you'll likely get a foo.tar, so for this application .gz actually is "the extension"). So, in order to do what you want, build a database of extensions that you want to look for and fall back on "last dot" if you don't find one.
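A rough sketch of that approach, using only the standard library: check a list of known multi-part extensions first, then fall back on the last dot. The function name and the default list are illustrative assumptions (the list is nowhere near complete), and the comparison is case-sensitive, which may not be what you want on Windows.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Splits "name.ext" into {name, ext}. Multi-part extensions are recognised
// only if they appear in `known`; otherwise the split is at the last dot.
// (Hypothetical helper; the default list is just an example.)
std::pair<std::string, std::string> split_extension(
        const std::string& filename,
        const std::vector<std::string>& known = {"tar.gz", "tar.bz2", "tar.xz"}) {
    for (const std::string& ext : known) {
        const std::size_t need = ext.size() + 1;   // extension plus its leading dot
        if (filename.size() > need &&
            filename[filename.size() - need] == '.' &&
            filename.compare(filename.size() - ext.size(), ext.size(), ext) == 0) {
            return {filename.substr(0, filename.size() - need), ext};
        }
    }
    // Fallback: last dot. A leading dot (".bashrc") is not treated as an extension.
    const std::size_t dot = filename.rfind('.');
    if (dot == std::string::npos || dot == 0) return {filename, std::string()};
    return {filename.substr(0, dot), filename.substr(dot + 1)};
}
```

With the default list, split_extension("Some.file.name.tar.gz") yields "Some.file.name" and "tar.gz", while a name with an unknown extension is simply split at its last dot.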
There was nothing for this in the C++ standard library before C++17 added std::filesystem (whose path::stem() and path::extension() only split at the last dot), but every operating system I know of provides this functionality in a variety of ways.
In Windows you can use _splitpath(), and in Linux you can use dirname() & basename()
The problem is indeed filenames like *.tar.gz, which can not be split consistently, due to the fact that (at least in Windows) the .tar part isn't part of the extension. You'll either have to keep a list for these special cases and use a one-dot string::rfind for the rest or find some pre-implemented way. Note that the .tar.* extensions aren't infinite, and very much standardized (there's about ten of them I think).
You could create a look-up table of file extensions that you think you might encounter, and also add a command-line option to add a new one to the look-up table if you encounter anything new. Then parse the file name to see if any entry in the look-up table is a suffix of the file name.
EDIT: You can also refer to this question: C++/STL string: How to mimic regex like function with wildcards?

Including huge string in our c++ programs?

I am trying to include a huge string in my C++ program. Its size is 20598617 characters, and I am using #define to achieve it. I have a header file which contains this statement:
#define "<huge string containing 20598617 characters>"
When I try to compile the program I get error as fatal error C1060: compiler is out of heap space
I tried following command line options with no success
/Zm200
/Zm1000
/Zm2000
How can I make successful compilation of this program?
Platform: Windows 7
You can't, not reliably. Even if it will compile, it's liable to break the runtime library, or the OS assumptions, and so forth.
If you tell us why you're trying to do it, we can offer lots of alternatives. Deciding how to handle arbitrarily large data is a major part of programming.
Edited to add:
Rather than guess, I looked into MSDN:
Prior to adjacent strings being concatenated, a string cannot be longer than 16380 single-byte characters.
A Unicode string of about one half this length would also generate this error.
The page concludes:
You may want to store exceptionally large string literals (32K or more) in a custom resource or an external file.
What do other compilers say?
Further edited to add:
I created a file like this:
char s[] = {'x','x','x','x'};
I kept doubling the occurrences of 'x', testing each one as an #include file.
An 8388608 byte string succeeded; 16777216 bytes failed, with the "out of heap space" error.
I suspect you are running into a design limit on the size of a character string.
Most people really think that a million characters is long enough :-}
To avoid such design limits, I'd try not to put the whole thing into a single literal string. On the suspicion that #define macro bodies have similar limits, I'd try not to put the entire thing in a single #define, either.
Most C compilers will accept pretty big lists of individual characters as initializers. If you write
char c[]={ c1, c2, ... c20598617 };
with the c_i being your individual characters, you may succeed. I've seen GCC2 applications where there were 2 million elements like this (apparently they were loading some type of ROM image). You might even be able to group the c_i into blocks of K characters for K=100, 1000, 10000 as suits your tastes, and that might actually help the compiler.
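Writing such an initializer by hand is out of the question at this size, so it would normally be generated. A small sketch of a generator is below; to_c_array and the exact output format are assumptions made for this example, not the output of any particular tool, and whether a given compiler accepts 20 million initializers still has to be tested.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Renders `data` as C source: an unsigned char array with one numeric
// initializer per byte, wrapped at `per_line` values per line, plus a length
// constant. `name` must be a valid C identifier (not checked here), and an
// empty `data` produces an empty initializer, which callers should avoid.
// (Hypothetical helper written for illustration.)
std::string to_c_array(const std::string& name, const std::string& data,
                       std::size_t per_line = 12) {
    if (per_line == 0) per_line = 1;
    std::string out = "const unsigned char " + name + "[] = {\n";
    char buf[8];
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (i % per_line == 0) out += "    ";
        std::snprintf(buf, sizeof buf, "0x%02x,",
                      static_cast<unsigned>(static_cast<unsigned char>(data[i])));
        out += buf;
        const bool end_of_line = (i % per_line == per_line - 1) || (i + 1 == data.size());
        out += end_of_line ? "\n" : " ";
    }
    out += "};\n";
    out += "const unsigned long " + name + "_len = " + std::to_string(data.size()) + ";\n";
    return out;
}
```

The generated text would be written to a .c or .cpp file by a build step and compiled like any other source file; note that the array is not null-terminated, hence the separate length constant.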
You might also consider running your string through a compression algorithm,
putting the compressed result into your C++ file by any of the above methods,
and decompressing after the program was loaded.
I suspect you can get a decompression algorithm into a few thousand bytes.
Store the string to a file and just open and read it...
It's much cleaner and more organized that way [I'm assuming that right now you have a file named blargh.h which contains that one #define...]
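Reading the whole file into a std::string at startup takes only a few lines. This is a minimal sketch; read_file is a made-up name, and the path is whatever file you choose to ship next to the program.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Reads an entire file into a string, byte for byte.
// Throws std::runtime_error if the file cannot be opened.
// (Hypothetical helper written for illustration.)
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    std::ostringstream contents;
    contents << in.rdbuf();   // copy the whole stream buffer
    return contents.str();
}
```

Opening in binary mode avoids newline translation on Windows, so the string has exactly the bytes that are in the file.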
Um, store the string in a separate resource of some sort and load it in? Seriously, in embedded land, you would have this as a separate resource and not hold it in RAM. On Windows, I believe you can use DLLs or other external resources to handle this for you. Compilers aren't designed to hold resources of this size for you, and they will fail.
Increase the compiler heap space.
If your string comes from a large text or binary file, you may have luck with either the xxd -i command (to get everything in an array, per Ira Baxter's answer) or a variant of the bin2obj command (to get everything into a .o file you can link into the program).
Note that the string may not be null terminated in this case.
See answers to the earlier question, "How can I get the contents of a file at build time into my C++ string?"
(Also, as an aside: note the existence of the .xbm format.)
This is a very old question, but since there's no definitive answer yet: C++11's raw string literals seem to do the job.
This compiles nicely on GCC 4.8:
#include <string>
std::string data = R"(
... <1.4 MB of base85-encoded string> ...
)";
As said in other posts in this thread, this is definitely not the preferred way of handling large amounts of data.