Getting Rid of Unnecessary Input Files

Getting Rid of Unnecessary Input Files - c++

I have the following file. I wrote a C++ program that copies and pastes it on another one (along with additional stuff). However, I want to get rid of it. That is, I do not want to distribute both an executable and this file. I tried hard-coding its contents, but it is tedious since there are special characters (like ", \n, etc.), and a string variable may not always have the necessary memory to hold all of that data.
What else can I do?

The first thing that I was going to suggest is to encode the content of the file. Automatically, of course. I could imagine a nightmare of typing that in. That being said, there is a variety of tools out there. For example, bin2h. Or, for instance, there is a Qt Resources framework that is indeed nice, but is probably not worth pulling in a dependency unless you already use Qt.
However, if the content of whatever you are trying to cope with is large enough that it does not fit into memory, you have no other way but stick with having that file distributed externally along with a program. In fact, this is a pretty common way of doing things. For example, most of the heavy-weight (and not only) applications for OS X (and iOS) are distributed as "bundles", which is nothing but a zip compressed file with resource file(s) and executable(s). That as well might be a solution if you are targeting one of the platforms that promote such distribution practices.
Hope it helps :)

Related

c++ separate files?

My question is, when you compile your c++ program, why is it all put into one exe file? The file could become to large. Would you use dll libraries to shrink the size, or are there other files you can make? I just want to know how to make a program that uses separate files to run.
(EDIT) I just don't want it all in a single file. Files could become too large eventually for the computer to handle it, right? There must be a way to separate the files. Like in java, everything is in a class file, which just seems easier and more efficient. Some drives like FAT32 can't have a file bigger than 4 gigabytes, so they need a more broken down program. I looked at my game called portal, its exe is 100KB and it has about 100 dll files!

To answer your question, yes. You absolutely can split your program into separate DLL files if you'd like.
I've seen some developers compile utility functions into a separate common DLL files which can be included in other projects as references. This way its objects and methods can be be called from it.
In hindsight, compiled code is relatively small. Binary data is really what consumes the most space: videos, images, models, sounds, etc. Although it is possible and common for smaller programs to pack these resources directly into the executable, it generally isn't a good idea for many obvious reasons.
Finally, large executables aren't a huge problem with today's technology. For smaller programs, I wouldn't sweat it. It's more about the design development the larger the project gets.

Too large for what? If it was due to storage space restrictions, splitting it into multiple files wouldn't buy you anything. Unless you are somehow overflowing the maximum size for a file on a platform (like a 2GB limit on some 32-bit platforms), which seems very unlikely, you are probably worrying about a non-issue.
You can reduce the size of the generated executable by turning off debug options in the compiler, "stripping" it on various platforms, setting optimization settings to optimize for code size rather than execution speed, etc.

The way to split an exe into several files (on Windows) is, as the OP suggested, to use dlls.
The commenters are correct that this will not actually take any less space on disk, nor save any memory, but there are other reasons to split an application into multiple files. For example, to share code (a single dll can be used by several applications), or to help an application load more quickly (only load the dll when it is actually needed).

hibernate-like saving state of a program

Is there any way in C++ or Java or Python that would allow me to save the state of my program, no questions asked? For example, I've spent an hour learning how to save a tree-like structure into a file. Very educative but I feel I could just do:
saveState(file);
And the "file" would contain whole memory my program uses. Just like operating system's "hibernate" or "suspend-to-disk" feature. I know about boost serialization, this is probably not what I'm looking for.

What you most likely want is what we call serialization or object marshalling. There are a whole butt load of academic problems with data/object serialization that you can easily google.
That being said given the right library (probably very native) you could do a true snapshot of your running program similarly what "OS specific hibernate" does. Here is an SO answer for doing that on Linux: https://stackoverflow.com/a/12190830/318174
To do the above snapshot-ing though you will most likely need an external process from the process you want to save. I highly recommend you don't that. Instead read/lookup in your language of choice (btw welcome to SO, don't tag every language... that pisses people off) how to do serialization or object marshalling... hint... most people these days pick JSON.

I think that what you describe would be a feature that few people would actually want to use for a real system. Usually you want to save something so it can be transmitted, or so you can stop running the program, or guard against the possibility that the program quits (or power fails).
In most production systems one wants to make the writes to disk small and incremental so that the system can remain responsive, and writing inconsistent data can be avoided. Writing ALL memory to disk on a regular basis would probably result in lots of non-responsive time. You would need to lock the entire system to avoid inconsistent state.
Writing your own persistence is tedious and error prone however so you may find this SO question of interest: Persisting graph data (Java)

There are a couple of frameworks around this. Check out Google Protocol Buffers if you need support for Java, Python, and C++ https://developers.google.com/protocol-buffers/ I've used it in some projects and it works well.
There's also Thrift (orginally from Facebook) http://thrift.apache.org/ I don't have any experience with it though.
Another option is what #QuentinUK suggests. Use a class that inherits from something streamable and/or make streamable operators/functions.
I'd use a framework.

Here's your problem:
http://en.wikipedia.org/wiki/Address_space_layout_randomization
Back in ancient history (16-bit DOS programs with extenders), compilers used to support "based" pointers which stored relative addresses. These were safe to serialize en masse. And applications did so, saving both code and data, the serialized modules were called "overlays".
Today, you'd need based pointer support in your toolchain (resulting in every pointer access requiring an extra adjustment), or else to go through all the data, distinguishing the pointers from the other data (how?) and adjusting them to their new storage location, in case the OS already loaded some library at the same address your old program had used for its heap. In modern "managed" environments, where pointers already have to be identified for the garbage collector, this is feasible even if not commonly done. In native code, it's very difficult, although that metadata is created to enable relocation of shared libraries.
So instead people end up walking their entire data structures manually, and converting object links (pointers) into something that can be restored on the other end, even though the object has a new address (again, because the old address may have been used for a shared library).
Note that many processors have features to support based addressing... and that since based addressing is no longer common, compilers went ahead and used those pointer arithmetic features to speed up user code.

Yes, derive objects from a streamable class and add the streaming functions. Then you can stream everything to disk. You will need a library for this such as MFC.

how to encrypt data files in C++?

now I am writing a app in C++, and currently my app reads models or parameters from several data files. Those files, i.e. self-define dictionary, are currently stored in plain text and to be loaded dynamically by C++ while runtime.
Yet, I don't want those files to be easily seen by my client while they get the released application, so I need to encrypt the file first. What's the general practice for this situation?
And those file are huge in size, so compile to a resource file is not a good option.
Actually I just need a simple 'encryption', at least not plain text stored in released version. And I dont want the encryption libraries which will load the whole file into the memory first in order to perform decryption, since the files are huge and no need to load its whole body into memory at one time.
Thanks!

Usually when you want to deal with encryption in C++ people tend to go for Open SSL libraries which encompass all of the functionality in a pretty standard way.
You'd have to get yourself a copy of the library and some code samples, but it's a pretty common thing and there's lots of documentation around.

Edit strings vars in compiled exe? C++ win32

I want to have a few strings in my c++ app and I want to be able to edit them later in the deployed applications (the compiled exe), Is there a way to make the exe edit itself or it resources so I can update the strings value?
The app checks for updates on start, so I'm thinking about using that to algo send the command when I need to edit the strings (for example the string that contains the url used to check for updates).
I don't want to use anything external to the exe, I could simply use the registry but I prefer to keep everything inside the exe.
I am using visual studio 2010 c++ (or any other version of ms visual c++).

I know you said you don't want to use anything external to the program, but I think what you really want in this case is a resource-only DLL. The executable can load whichever DLL has the strings that you need in a given invocation.

Another idea is to move the strings into a "configuration" file, such as in XML or INI format.
Modifying the EXE without compilation is hacking and highly discouraged. You could use a hex editor, find the string and modify it. The new text must be have a length less than or equal to the original EXE.
Note, some virus checkers perform CRCs or checksums on the executables. Altering the executables is a red flag to these virus checkers.

It is impossible, unless your strings won't change in position & length.
So to make it possible: make your "size" of the, in your example, URL that is used to get updates pretty big (think of: 512 characters, null-filled at the end). This way, you have got some space to update the String.
Why is it impossible to use variable-sized strings? Well I can explain this with a small x86 Assembler snippet:
PUSH OFFSET test.004024F0
Let's say; at the offset of test.004024F0 is your variable-sized string. Now consider the change:
I want to insert a string, which is longer than the original string, which is stored before the string at 004024F0: This makes 004024F0 to a new value, let's say: 004024F5 (the new string, before this entry, is 5 characters longer than it's original).
You think it's simple: search for all 004024F0 and replace it with 004024F5? Wrong. 004024F0 can also be a regular "instruction" (to be precise: ADD BYTE PTR DS:[EAX+24],AL; LOCK ...). If this instruction happens to be in your code, it'll be replaced by something wrong.
Well, you might think, what about searching for that PUSH instruction? Wrong: there are virtually unlimited ways to "PUSH". For instance, MOV EAX, 004024F0; MOV ESP, EAX; ADD ESP, 4. There is also the possibility that the field is calculated: MOV EAX, 00402000; ADD EAX, 4F0; .... So this makes it "virtually unlimited".
However, if you use statically sized fields; you don't have to change the code refering to Strings. If you reserve enough space of a specific field, then you can easily write a "longer" string than original, because the size of a string is calculated by finding the first "null-byte"; pad the rest of the field with nulls.
If you use statically sized fields, it's, however, very hard to find the "position in the file", at compile-time. Considering a lot of time spending hacking your own app; you can write code that modifies the .exe, and stores a new String value at a specified offset. This file-offset isn't known at compile time, and you can patch this file-offset yourself later, using a tool like OllyDbg. This enables the executable to patch itsself :-)

Creating a self-editing exe is a very ill-advised approach to solving this problem. You are much better off storing and reading the strings from an external file. Maybe if you provide some background as to why you don't want to use anything but an exe, we can address those issues?

In theory, BeginUpdateResource, UpdateResource and EndUpdateResource are intended for this purpose. In reality, getting these to work at all is pretty tricky. I'm not at all sure they'll work for updating resources in a running executable.

Not wanting to chastise, but this doesn't sound like a great idea. Having the URL for checking for updates baked inside the program makes it inflexible.
You're trying to mitigate the inflexibility by rewriting the strings in your exe. This is really asking for trouble:
are you sure users that run your program have write permission to be able to update the exe? Default users have no write access to files installed in the program folder.
If the program is run by multiple users or simply multiple times by the same user, the exe will be locked and unmodifiable
Systems administrators will have a hard time tweaking the URL.
There is a real risk you will corrupt your exe. The rewrite process is likely to be complex, especially if you want to make the URL longer than is presently allocated.
By modifying your exe, you remove the possiblity of using code signing, which can be useful in a networked environment.
The registry (despite all it's weaknesses) is really where this kind of configuration data should go. You can put a default value in your EXE, but if you need to make changes, put them in the registry. This makes the changes transparent, saving you a lot of grief later.
Yyour algorithm that wants to write a new URL for updates should do this by writing it to the registry. Alternatively, have a config file that ships alongsite with your exe, and update that. (But bear in mind user permissions - you may not have write access to that file, but you can always write to the user hive of the registry.)

How to create a virtual file?

I'd like to simulate a file without writing it on disk. I have a file at the end of my executable and I would like to give its path to a dll. Of course since it doesn't have a real path, I have to fake it.
I first tried using named pipes under Windows to do it. That would allow for a path like \\.\pipe\mymemoryfile but I can't make it works, and I'm not sure the dll would support a path like this.
Second, I found CreateFileMapping and GetMappedFileName. Can they be used to simulate a file in a fragment of another ? I'm not sure this is what this API does.
What I'm trying to do seems similar to boxedapp. Any ideas about how they do it ? I suppose it's something like API interception (Like Detour ), but that would be a lot of work. Is there another way to do it ?
Why ? I'm interested in this specific solution because I'd like to hide the data and for the benefit of distributing only one file but also for geeky reasons of making it works that way ;)
I agree that copying data to a temporary file would work and be a much easier solution.

Use BoxedApp and do not worry.

You can store the data in an NTFS stream. That way you can get a real path pointing to your data that you can give to your dll in the form of
x:\myfile.exe:mystreamname
This works precisely like a normal file, however it only works if the file system used is NTFS. This is standard under Windows nowadays, but is of course not an option if you want to support older systems or would like to be able to run this from a usb-stick or similar. Note that any streams present in a file will be lost if the file is sent as an attachment in mail or simply copied from a NTFS partition to a FAT32 partition.
I'd say that the most compatible way would be to write your data to an actual file, but you can of course do it one way on NTFS systems and another on FAT systems. I do recommend against it because of the added complexity. The appropriate way would be to distribute your files separately of course, but since you've indicated that you don't want this, you should in that case write it to a temporary file and give the dll the path to that file. Make sure you write the temporary file to the users' temp directory (you can find the path using GetTempPath in C/C++).
Your other option would be to write a filesystem filter driver, but that is a road that I strongly advise against. That sort of defeats the purpose of using a single file as well...
Also, in case you want only a single file for distribution, how about using a zip file or an installer?

Pipes are for communication between processes running concurrently. They don't store data for later access, and they don't have the same semantics as files (you can't seek or rewind a pipe, for instance).
If you're after file-like behaviour, your best bet will always be to use a file. Under Windows, you can pass FILE_ATTRIBUTE_TEMPORARY to CreateFile as a hint to the system to avoid flushing data to disk if there's sufficient memory.
If you're worried about the performance hit of writing to disk, the above should be sufficient to avoid the performance impact in most cases. (If the system is low enough on memory to force the file data out to disk, it's probably also swapping heavily anyway -- you've already got a performance problem.)
If you're trying to avoid writing to disk for some other reason, can you explain why? In general, it's quite hard to stop data from ever hitting the disk -- the user can always hibernate the machine, for instance.

Since you don't have control over the DLL you have to assume that the DLL expects an actual file. It probably at some point makes that assumption which is why named pipes are failing on you.
The simplest solution is to create a temporary file in the temp directory, write the data from your EXE to the temp file and then delete the temporary file.
Is there a reason you are embedding this "pseudo-file" at the end of your EXE instead of just distributing it with our application? You are obviously already distributing this third party DLL with your application so one more file doesn't seem like it is going to hurt you?
Another question, will this data be changing? That is are you expecting to write back data this "pseudo-file" in your EXE? I don't think that will work well. Standard users may not have write access to the EXE and that would probably drive anti-virus nuts.
And no CreateFileMapping and GetMappedFileName definitely won't work since they don't give you a file name that can be passed to CreateFile. If you could somehow get this DLL to accept a HANDLE then that would work.
And I wouldn't even bother with API interception. Just hand the DLL a path to an acutal file.

Reading your question made me think: if you can pretend an area of memory is a file and have kind of "virtual path" to it, then this would allow loading a DLL directly from memory which is what LoadLibrary forbids by design by asking for a path name. And this is why people write their own PE loader when they want to achieve that.
I would say you can't achieve what you want with file mapping: the purpose of file mapping is to treat a portion of a file as if it was physical memory, and you're wanting the reciprocal.
Using Detours implies that you would have to replicate everything the intercepted DLL function does except from obtaining data from a real file; hence it's not generic. Or, even more intricate, let's pretend the DLL uses fopen; then you provide your own fopen that detects a special pattern in the path and you mimmic the C runtime internals... Hmm is it really worth all the pain? :D

Please explain why you can't extract the data from your EXE and write it to a temporary file. Many applications do this -- it's the classic solution to this problem.
If you really must provide a "virtual file", the cleanest solution is probably a filesystem filter driver. "clean" doesn't mean "good" -- a filter is a fully documented and supported solution, so it's cleaner than API hooking, injection, etc. However, filesystem filters are not easy.
OSR Online is the best place to find Windows filesystem information. The NTFSD mailing list is where filesystem developers hang out.

How about using a some sort of RamDisk and writing the file to this disk? I have tried some ramdisks myself, though never found a good one, tell me if you are successful.

Well, if you need to have the virtual file allocated in your exe, you will need to create a vector, stream or char array big enough to hold all of the virtual data you want to write.
that is the only solution I can think of without doing any I/O to disk (even if you don't write to file).
If you need to keep a file like path syntax, just write a class that mimics that behaviour and instead of writing to a file write to your memory buffer. It's as simple as it gets. Remember KISS.
Cheers

Open the file called "NUL:" for writing. It's writable, but the data are silently discarded. Kinda like /dev/null of *nix fame.
You cannot memory-map it though. Memory-mapping implies read/write access, and NUL is write-only.

I'm guessing that this dll cant take a stream? Its almost to simple to ask BUT if it can you could just use that.

Have you tried using the \?\ prefix when using named pipes? Many APIs support using \?\ to pass the remainder of the path directly through without any parsing/modification.
http://msdn.microsoft.com/en-us/library/aa365247(VS.85,lightweight).aspx

Why not just add it as a resource - http://msdn.microsoft.com/en-us/library/7k989cfy(VS.80).aspx - the same way you would add an icon.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js