Should I worry about the user messing up my program's files? - c++

I am writing a C++ program for Linux which creates some files on disk while it runs. These files record the state of the program's internal objects, so that the next time the program starts it can read them and resume the previous session. Some of these files are also read from and written to during execution to load and store variable values. The problem is that modifying, renaming, or deleting these files leads to undefined behavior, segmentation faults, and other surprises that crash the program. I am certainly not going to restrict the user from accessing files on his/her machine, but inside the program I can always check whether a file has been modified before accessing it, to at least prevent a crash. That would involve many extra checks and make the code larger, though.
The question is: what is good practice for dealing with such issues? Should I even be so paranoid, or should I just expect the user to be smart enough not to mess with the program's files?

First, consider your resources and whether it is worth the effort: will the user even be tempted to find and edit these files?
If so, my advice is this: Don't be concerned whether or not the file has been modified. Rather you should validate the input you get (from the file).
This may not be the most satisfying answer, but error handling is a big part of programming, especially when it comes to input validation.
Let's assume you are writing to ~/.config/yourApp/state.conf.
Prepare defaults.
Does the file exist? If not use defaults.
Use some well known structure like INI, JSON, YAML, TOML, you name it, that the user can understand and the application can check for integrity (libraries may help here). If the file is broken use defaults.
In case the user does delete a single entry, use the default value.
Some values deserve special checking where an out-of-bounds access would lead to undefined behavior. If a value is out of bounds, use a default.
Ignore unknown fields.
In case you can't recover from a malformed input, provide a meaningful error message, so the user may be able to restore a save state (restore from a backup or start over from the beginning).
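The defaults-and-validation approach above can be sketched in a few lines. This is a minimal illustration using a made-up key=value format (standing in for INI/JSON/etc.); the `Settings` fields and bounds are invented for the example, and a real program would use a proper parsing library:

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical settings with safe defaults; the names and ranges are
// illustrative only.
struct Settings {
    int window_width = 800;   // default used when key is missing or invalid
    int volume = 50;          // must stay within [0, 100]
};

// Parse a minimal key=value file. Anything we cannot understand is
// silently replaced by the default -- we never trust the file's contents.
Settings load_settings(std::istream& in) {
    Settings s;                                     // start from defaults
    std::map<std::string, std::string> kv;
    std::string line;
    while (std::getline(in, line)) {
        auto eq = line.find('=');
        if (eq == std::string::npos) continue;      // malformed line: ignore
        kv[line.substr(0, eq)] = line.substr(eq + 1);
    }
    // Unknown keys left in kv are simply ignored.
    if (auto it = kv.find("window_width"); it != kv.end()) {
        try {
            int v = std::stoi(it->second);
            if (v >= 100 && v <= 10000) s.window_width = v;  // bounds check
        } catch (...) { /* not a number: keep default */ }
    }
    if (auto it = kv.find("volume"); it != kv.end()) {
        try {
            int v = std::stoi(it->second);
            if (v >= 0 && v <= 100) s.volume = v;            // bounds check
        } catch (...) { /* not a number: keep default */ }
    }
    return s;
}
```

Note that a missing file, a deleted entry, a garbage value, and an out-of-range value all fall through to the same place: the default.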

If these files really matter to your program, you can simply hide them by adding a dot "." before their names. This way the user will not know the files are there unless he/she looks for them specifically.
If you want to access those files from anywhere in the system, you can also store them in a hidden temp folder which can be accessed easily.
eg: ~/.temp_proj/<file>
This way the folder will be hidden from the user and you can use it for your own program.
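A small sketch of building such a hidden per-user directory with C++17 std::filesystem. The directory name and the reliance on the HOME environment variable are Linux/macOS conventions, and the helper names are mine:

```cpp
#include <cstdlib>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Build a per-user hidden directory path such as "~/.temp_proj".
// The directory name is illustrative; pick one unique to your program.
fs::path hidden_data_dir(const std::string& app_dir_name) {
    const char* home = std::getenv("HOME");       // Linux/macOS convention
    fs::path base = home ? fs::path(home) : fs::temp_directory_path();
    return base / ("." + app_dir_name);           // leading dot hides it in ls
}

// Create the directory if it does not exist yet; returns the path.
fs::path ensure_hidden_data_dir(const std::string& app_dir_name) {
    fs::path dir = hidden_data_dir(app_dir_name);
    fs::create_directories(dir);                  // no-op if already present
    return dir;
}
```

Remember this only hides the files from a casual `ls`; it is obscurity, not protection.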

Related

How to throw "The file is in use"

I have some files (.xml extension) that my app requires to be present for as long as the application is open.
So, is there a cross-platform solution to mark these files as "File in use" so that the user cannot delete or modify them?
Since you specify you need it to work cross-platform, you might want to use Qt with QFile::setPermissions and set it to QFileDevice::ReadOwner. Do note the platform-specific notes the documentation makes. There is nothing similar in the C++ Standard Library as far as I am aware.
Edit: it turns out I was wrong! Since C++17 you can use std::filesystem::permissions and set the permissions to read-only.
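A minimal sketch of that C++17 approach (the helper names are mine, not from the standard). Setting a file read-only discourages accidental edits, but the owner can always flip the permissions back:

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Mark a file read-only for everyone using C++17 std::filesystem.
// This cannot stop a determined user: the owner can change it back.
void make_read_only(const fs::path& p) {
    fs::permissions(p,
                    fs::perms::owner_read | fs::perms::group_read |
                        fs::perms::others_read,
                    fs::perm_options::replace);
}

// Restore owner write access (e.g. before the program saves its state).
void make_writable(const fs::path& p) {
    fs::permissions(p, fs::perms::owner_write, fs::perm_options::add);
}
```

On POSIX systems this maps to chmod; see the platform notes on cppreference for the Windows behavior, where only the read-only attribute is modeled.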
These steps could work for you:
read and store the file(s) to memory (or perhaps a temporary storage if memory is a problem)
implement a file-change watcher (concrete solutions here How do I make my program watch for file modification in C++? )
if a change occurs, you could:
overwrite the changed file (from data you have in memory or in temporary storage); or create the file anew if it was deleted
notify the user that the files have been changed and the program might not work correctly
I'm not sure about this one, never tried it, but it could theoretically block access:
Open the file(stream) for reading and writing but don't close it (until the program finishes)
then try to read from or write to the file
File management is always tricky because it is operating system dependant, although most OS behave similarly. Here you have some ideas that could work:
If the .xml files are never modified: make them read-only. The C++17 standard introduced a way to configure file permissions (https://en.cppreference.com/w/cpp/filesystem/permissions), so you can always ensure from your application that they are read-only. However, this will not prevent a user from deleting them, though for example on Linux you will see a warning when trying to remove the files with "rm".
If the files are not that big, I would just parse the XML files at the beginning of the program and keep the data structures in RAM, so then you can just forget about the actual files in disk.
An alternative would be to copy the .xml files to a temporary location; rarely will someone delete temporary files. The "tmp" directories are platform-dependent, but the C standard library has a function to create temporary files, so you could create one for each xml and copy its contents: http://www.cplusplus.com/reference/cstdio/tmpfile/
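A short sketch of the tmpfile idea (the helper names are mine): std::tmpfile creates an anonymous temporary file with no visible name that is deleted automatically on close or program exit, so it is hard to tamper with from outside:

```cpp
#include <cstdio>
#include <string>

// Copy a buffer into an anonymous temporary file created by std::tmpfile().
// The caller owns the returned FILE* and should fclose() it when done.
std::FILE* stash_in_tmpfile(const std::string& contents) {
    std::FILE* tmp = std::tmpfile();
    if (!tmp) return nullptr;
    std::fwrite(contents.data(), 1, contents.size(), tmp);
    std::rewind(tmp);          // leave the file positioned for reading
    return tmp;
}

// Read everything back from the temporary file.
std::string read_all(std::FILE* f) {
    std::string out;
    char buf[4096];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0)
        out.append(buf, n);
    return out;
}
```

You could load each .xml at startup, stash a copy this way, and fall back to the stashed copy if the original disappears.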
Since this is platform-dependent and you want a cross-platform solution, you'll have to handle it with preprocessor flags. Consider all the platforms your application will support and write special code for each of them. As far as I know, on Windows and Linux you can use std::filesystem::permissions. Just set read-only, and the OS will automatically warn the user when he/she tries to remove any of the marked files. Also, tmpfile, mentioned in other answers, could be a good fit if you don't strictly need to set file permissions.

Are preprocessor directives safe for sensitive information?

I am creating an archive which contains HTML/CSS/JS files for my C++ application, and I don't want users to have access to these files. So, I decided to encrypt the archive with a password.
My first thought was to store a password inside a program via preprocessor macro (through CMake). But, is it safe?
Can you access the password back from compiled application? (exe, in my case)
And if you can, how do I protect against it? Is it technically possible, or should I give up and leave it as is?
If the macro is actually used in the application then yes, it's accessible in the executable file -- it has to be for the program to use it.
Any credential you embed in your program can be recovered by a sufficiently-motivated attacker. There is no encryption mechanism you can use to prevent this, as you would need to provide the decryption key for the program to function.
The proof is very simple: if the program itself can obtain the credential without any user input, then the executable file must contain the key, or all of the information needed to produce/derive the key. Therefore, it must be possible for anyone (with the requisite expertise) to produce the credential with only the information in the executable file.
This could be done by inspecting the executable. It could also be done by running the executable under the supervision of a debugger and watching what it is doing.
This is the same reason DRM schemes are pointless -- the consumer must be in possession of a key to use the material, and if they can get their hands on the key (they must be able to, in order to consume the content), then the scheme doesn't work. (Of course, in newer DRM schemes the key is buried in a chip designed to destroy the key if it is opened, but that just means getting the key is difficult, not impossible.)
tl;dr: It's never a question of whether it's possible to recover an embedded key. It's always possible. It's a question of how much effort it will take to recover that key.
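To make the "inspecting the executable" point concrete, here is a toy illustration (everything here is invented for the example, including the "PWD:" marker): given the raw bytes of a binary, any printable string baked in by a macro can be found with a trivial scan. This is essentially all the Unix `strings` utility does:

```cpp
#include <cctype>
#include <string>
#include <vector>

// Scan a byte image (e.g. the bytes of an executable file) for a marker
// and return the printable characters that follow it -- a crude model of
// how an embedded credential is recovered from a compiled program.
std::string find_after_marker(const std::vector<unsigned char>& image,
                              const std::string& marker) {
    std::string bytes(image.begin(), image.end());
    auto pos = bytes.find(marker);
    if (pos == std::string::npos) return "";
    pos += marker.size();
    std::string secret;
    while (pos < bytes.size() &&
           std::isprint(static_cast<unsigned char>(bytes[pos])))
        secret += bytes[pos++];
    return secret;
}
```

Real credentials rarely sit behind a convenient marker, but the conclusion stands: if the bytes are in the file, a motivated attacker will find them.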

Are FindFirstFile, FindNextFile APIs unreliable?

For my purpose I was looking to optimize ways of recursively enumerating subfolders from a given folder on the NTFS file system on Windows, and I came across this little "gem" from the Microsoft's page for the FindFirstFile API:
Note: In rare cases or on a heavily loaded system, file attribute information on NTFS file systems may not be current at the time this function is called. To be assured of getting the current NTFS file system file attributes, call the GetFileInformationByHandle function.
So, let me try to understand it.
I do rely on the dwFileAttributes parameter returned in the WIN32_FIND_DATA struct to tell a file from a folder. So what this note suggests is that in some cases I might get some bogus result, right? If so, why not fix it in one of their updates instead of posting it here?
And also their suggested workaround of using the GetFileInformationByHandle API: how exactly am I supposed to call it? It takes a file handle. So do they really want us to open each file that FindNextFile returns and call GetFileInformationByHandle on it? Can you imagine "how far" my optimization would go with such an approach?
Anyway, it'd be nice if someone could shed some light on this...
Distinguishing a file from a folder will be OK, because that information is likely to be constant. Files aren't being turned into folders or folders into files.
The documentation says "may not be current" because other processes may be changing attributes, and without a locking mechanism to synchronize, the attributes are written lazily. If your application requires absolutely current info, you retrieve it ...ByHandle, which ensures that the information is current.
This is the way every status-reporting function works. At best, it reports the status at some undefined point in-between when you called the function and when the function returned. But it doesn't "freeze the world" to ensure that the data is still valid later.
Rather than noting this on every single function, documentation typically only notes it on functions that tend to lead to severe problems, particularly security problems, when this is not taken into account.
If you open a file and get a handle to it, you are assured that all your operations using that handle will be to the same underlying file. But when you perform operations by name, there is no such assurance. Files can be created, deleted, and renamed. So the same name may not later refer to the same file.
dwFileAttributes is not something that is going to be unreliable when it comes to telling the difference between files and folders. I think that note is referring to information that might be cached for update by the file system (modified/accessed timestamps, etc) but whether an item is a file or a folder is not something that is going to change.

Can code formatting lead to change in object file content?

I have run a code-formatting tool over my C++ files. It is supposed to make only formatting changes. Now when I build my code, I see that the size of the object file for some source files has changed. Since my files are very big and the tool has changed almost every line, I don't know whether it has done something disastrous. Now I am worried about checking this code into the repo, as the formatting tool might have introduced runtime errors. My question is: will the size of the object file change if the code formatting is changed?
The brief answer is no. :)
I would not check your code into the repo without thoroughly checking it first (review, testing).
Pure formatting changes should not change the object file size, unless you've done a debug build (in which case all bets are off). A release build should be not just the same size, but barring your using __DATE__ and such to insert preprocessor content, it should be byte-for-byte the same as well.
If the "reformatting" tool has actually done some micro-optimizations for you (caching repeated access to invariants in local vars, or undoing your having done that unnecessarily), that might affect the optimization choices the compiler makes, which can have an effect on the object file. But I wouldn't assume that that was the case.
If the __LINE__ macro is used, reformatting might produce longer strings. How different are the sizes?
(This macro often hides in new and assert messages in debug builds.)
just formatting the code should not change the size of the object file.
It might if you compile with debugging symbols, as it might have added more line number information. Normally it wouldn't though, as has already been pointed out.
Try comparing object files built without debugging symbols.
Try to find a comparison tool that won't care about the formatting changes (like perhaps "diff --ignore-all-space") and check using that before checking in.
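For the object files themselves, a byte-for-byte comparison is the strictest check. A small sketch (function name is mine), applicable when the build is deterministic, i.e. a release build without debug info or __DATE__/__TIME__ macros, as the answers above note:

```cpp
#include <fstream>
#include <string>

// Compare two files byte for byte. Useful for verifying that a pure
// reformatting left a release-mode object file completely unchanged.
bool files_identical(const std::string& a, const std::string& b) {
    std::ifstream fa(a, std::ios::binary), fb(b, std::ios::binary);
    if (!fa || !fb) return false;          // treat unreadable as different
    char ca, cb;
    while (true) {
        bool got_a = static_cast<bool>(fa.get(ca));
        bool got_b = static_cast<bool>(fb.get(cb));
        if (got_a != got_b) return false;  // different lengths
        if (!got_a) return true;           // both ended together: identical
        if (ca != cb) return false;        // differing byte
    }
}
```

On the command line, `cmp old.o new.o` does the same job.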

detect modified .exe (build)

Is there any way for a program to know whether it has been modified since it was compiled and built?
I'd like to prevent the .exe from being modified after I build it.
You could use a private key to sign the EXE, and public key to check that signature. I haven't worked with the EXE file format in nearly 20 years, but as I recall there are spaces where you could store such a signature. Of course, the portion of the file that you're checking would have to exclude the signature itself.
However, if you're trying to do this to prevent cracking your EXE, you're out of luck: the cracker will simply patch out the code that validates the signature.
Even if you check, someone who knows you have such protection can simply set the computer's clock back to your build time and then modify the exe.
So it cannot serve as real prevention.
Is it possible for a program to know if it has been modified since it was built?
Yes. A checksum of the rest of the program can be stored in an isolated resource string.
Is it possible for a program to know if it was maliciously modified since it was built?
No. The checksum, or even the function that executes and compares it, could be modified as well.
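The stored-checksum scheme can be sketched with any well-known hash; FNV-1a is used here for brevity (function names are mine, and a real integrity check would use a cryptographic hash). The build process would hash every byte of the executable except the slot holding the stored value, write the result into that slot, and the program would recompute and compare at startup:

```cpp
#include <cstdint>
#include <string>

// FNV-1a, a simple well-known 32-bit checksum.
std::uint32_t fnv1a(const std::string& data) {
    std::uint32_t h = 2166136261u;         // FNV offset basis
    for (unsigned char c : data) {
        h ^= c;
        h *= 16777619u;                    // FNV prime
    }
    return h;
}

// Startup self-check: does the current image hash to the stored value?
// As the answer above says, this only detects accidental corruption or
// casual tampering -- the comparison itself can be patched out.
bool image_unmodified(const std::string& image_bytes,
                      std::uint32_t stored_checksum) {
    return fnv1a(image_bytes) == stored_checksum;
}
```

In practice `image_bytes` would be the file read back from disk (or the mapped .text section), with the checksum slot zeroed before hashing.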
Are you talking about Tamper Aware and Self Healing Code?
The article demonstrates detecting hardware faults or unauthorized patches; back-patching the executable to embed the expected hash value of the .text section; and the process of repairing the effects of hostile code (for example, an unauthorized binary patcher). The ideas presented in the article work equally well whether the executable was patched on disk or in memory. However, the self-repair occurs in memory.
Most popular compilers have a switch to fill in the "Checksum" field of the PE header, or you can leave it blank and supply your own custom value. At any rate, this is the 'standard' place to store such data.
Unfortunately there's no real way to stop someone tampering with a binary, because you'll have to put checks inside the exe itself to detect it, at which point they can be patched out.
One solution to this problem is to encrypt certain functions and use the checksum of some known data as the key (for example, the checksum of another function). Then, when you leave the function, you re-encrypt it. Obviously you'll need to come up with your own prologue/epilogue code to handle this. This is not really suitable if your program is heavily multi-threaded, but if you're single-threaded or only lightly threaded (and can serialize access to the functions and control all entry points) then this will 'raise the bar', if you will.
That is a step above most 'packers' which simply encrypt the .text/.data/.rdata/etc sections and decrypt it all at runtime. These are very easy to 'dump', as all you have to do is run the program, suspend all its threads, then dump the memory to a file. This attack works against Themida for example (one of the most aggressive packers). From there all you need to do is rebuild the IAT, fix up some relocs, etc.
Of course it's still possible for the attacker to use a debugger to dump out the unencrypted code and hence 'unpack' the exe, but obviously nothing is foolproof.
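The checksum-keyed encryption idea above can be illustrated in miniature (all names are mine, and XOR with a key stream stands in for a real cipher): the key for one region is derived from a checksum of another, so patching the keyed region changes the key and the protected code no longer decrypts correctly. A working implementation would also need custom prologue/epilogue code, page-permission changes, and serialized access, as the answer describes:

```cpp
#include <cstdint>
#include <vector>

// FNV-1a over a byte region -- e.g. the bytes of another function.
std::uint32_t checksum(const std::vector<std::uint8_t>& region) {
    std::uint32_t h = 2166136261u;
    for (std::uint8_t b : region) { h ^= b; h *= 16777619u; }
    return h;
}

// XOR is symmetric: calling this twice with the same key restores the data.
// Here it models encrypting a function body in place.
void xor_crypt(std::vector<std::uint8_t>& body, std::uint32_t key) {
    for (std::size_t i = 0; i < body.size(); ++i)
        body[i] ^= static_cast<std::uint8_t>(key >> ((i % 4) * 8));
}
```

If an attacker patches the region that `checksum` covers, the derived key changes and `xor_crypt` no longer reproduces the original body -- which is exactly the tamper coupling the answer is after.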