Writing raw data "signature" to disk without corrupting filesystem - c++

I am trying to create a program that will write a series of 10-30 letters/numbers to a disk in raw format (not to a file that the OS will read). Perhaps to make my attempt clearer, if you were to open the disk in a hex editor, you would see the 10-30 letters/numbers but a file manager such as Windows Explorer would not see it (because the data is not a file).
My goal is to be able to "sign" a disk with a series of characters and to be able to read and write that "signature" in my program. I understand NTFS signs its partitions with an NTFS flag, as do other file systems, and I have to be careful not to write my signature to any of those critical parts.
Are there any libraries in C++/C that could help me write at a low level to a disk and how will I know a safe sector to start writing my signature to? To narrow this down, it only needs to be able to write to NTFS, FAT, FAT32, FAT16 and exFAT file systems and run on Windows. Any links or references are greatly appreciated!
Edit: After some research, USB drives allow only 1 partition without applying hacking tricks that would unfold further problems for the user. This rules out the "partition idea" unfortunately.

First, as the commenters said, you should look at why you're trying to do this, and see if it's really a good idea. Most apps which try to circumvent the normal UI the user has for using his/her computer are "bad", in various ways.
That said, you could try finding a well-known file which will always be on the system and has some slack in the block size for the disk, and write to the slack. I don't think most filesystems would care about extra data in the slack, and it would probably even be copied if the file happens to be relocated (more efficient to copy the whole block at the disk level).
Just a thought; dunno how feasible it would be, but seems like it could work in theory.
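To put a number on how much slack a given file has, here is a minimal sketch of the arithmetic. It assumes you already know the volume's cluster size; the function name and the 4096-byte figure in the note below are purely illustrative.

```cpp
#include <cstdint>

// Bytes left unused between the end of a file and the end of its last cluster.
// cluster_size is assumed to be known already (4096 is a common value).
std::uint64_t slack_bytes(std::uint64_t file_size, std::uint64_t cluster_size) {
    if (cluster_size == 0) return 0;               // guard against division by zero
    std::uint64_t used_in_last = file_size % cluster_size;
    return used_in_last == 0 ? 0 : cluster_size - used_in_last;
}
```

For example, a 10,000-byte file on 4096-byte clusters leaves 2,288 bytes of slack. Keep in mind that NTFS can store very small files inside the MFT record itself, in which case there is no cluster slack to speak of.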

Though I think this is generally a pretty poor idea, the obvious way to do it would be to mark a cluster as "bad", then use it for your own purposes.
Problems with that:
Marking it as bad is non-trivial. On NTFS, bad clusters are stored in a file named something like $BadClus, but it's not accessible to user code (and I'm not even sure it's accessible to a device driver either).
There are various programs to scan for (and attempt to repair) bad clusters/sectors. Since we don't even believe this one is really bad, almost any of these that works at all will find that it's good and put it back into use.
Most of the reasons people think of doing things like this (like tying a particular software installation to a particular computer) are pretty silly anyway.
You'd have to scan through all the "bad" sectors to see if any of them contained your signature.

This is very dangerous. That said, zero-fill programs write to the disk at the same low level, so you can google how to wipe a hard drive with zeros in C++ to find the relevant techniques.
The hard part is finding a place you KNOW is unused and won't be used.

Related

boost::filesystem::path: detect that two paths share the same physical drive

Background
I have a complex application which can consume lots of disk space (~10TB). To prevent unhandled errors that come from disk-full scenarios, my application has some logic which manages stored data.
It currently runs on Windows, but it is being ported to Linux.
Problem
It is possible that two kinds of data are stored on different physical drives, and the business logic is a bit different depending on that. On Windows a physical drive can be identified by boost::filesystem::path::root_path() (it is not perfect, but good enough in my scenarios), but on other platforms this logic falls apart, since root_name() is always empty and root_path() is just "/" for every absolute path.
Question
I'm looking for a multi-platform solution (preferably Boost) to detect whether two paths share the same physical drive.
If there is no such thing, I will have to use a platform-specific API, and I would prefer to avoid that.
I think your best bet is taking a step back and rethinking your approach:
If your OS and file system support it, try creating a hard link. Now you know relatively reliably whether they are the same file system. (Unfortunately, using network file systems and the like one can still avoid the OS knowing the file systems are really the same.)
Knowing whether it is the same physical hard drive seems rather pointless for keeping a disk from filling up, even if it is interesting for throughput, and it probably needs OS-specific handling anyway.
And if you know the paths should be the same, creating a test file lets you avoid any kind of flawed simulation and just let the system figure it out for you.

How to know if files have been changed?

I'm writing a custom C++ program that copies files only if they were changed in the source since the last time they were copied. So I need to know if files in my specific folder were changed.
I was originally thinking about calculating a SHA-1 hash of those files, but then this probably means that I have to do this for the entire folder. Plus, what if those files total 100GB? That would mean calculating SHA-1 over 100GB of data, which would probably take some time.
So I'm curious if there's a better way to do this?
You have at least a couple of possibilities.
One would be to use NTFS change journals to track what files have been modified.
Each file also has an "archive" flag associated with it. This is typically used by backup programs. Any time you write to a file, the flag is set. When you copy/back it up, you clear the flag. When you want to see what files to copy/backup, you just check whether the flag is set or clear. Obvious problem: collisions with other backup programs.
There is also ReadDirectoryChangesW [1]. This, however, can only detect changes that happen while your code that uses it is running. So, to use it to track changes you need to do something like setting up a service that runs in the background all the time to keep track of changes. Depending on the file and how it gets modified, it's still possible for even this to miss changes that happen during boot (before your service starts executing).
I've listed these in roughly descending order of how well they seem to fit your needs--i.e., change journals are almost certainly the best fit, the archive flag second and ReadDirectoryChangesW (by quite a large margin) the worst fit for your apparent needs.
[1] There's also an older FindFirstChangeNotification/FindNextChangeNotification, but they're less versatile and have all the same shortcomings as ReadDirectoryChangesW. At one time they were useful for code that needed to be compatible with Windows 95/98/SE (since those didn't include ReadDirectoryChangesW) but it's been years since there was a good reason to use them.
In comments for other answers, you've stated that you can't use a file-monitoring API (such as FindFirstChangeNotification) since your code may not be running at the time the change occurs.
I would suggest a multi-pronged approach.
If your application is running, use the file monitoring APIs to detect new changes.
On startup or when a new disk appears, check to see if the file size is the same as before. If it isn't, then you know you have a change.
If the file size is the same, you could use the file's archive flags to determine if it has changed. However, the archive flag is easily altered by users and therefore you probably shouldn't rely on it.
Use the file's last-modified timestamp. This can be modified by users, but it's more difficult to do.
Use a hash to determine if the file has changed. The hash you pick depends on how important it is to detect changes. If it isn't critical something like CRC32 or MD5 would be sufficient. If it needs to be secure, consider SHA-256. Consider breaking large files into chunks. That way you don't have to hash the whole file before getting a "this changed" result.
This tiered approach lets you skip the expensive hashing whenever you can.
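The cheap-checks-first part of this can be sketched with C++17 std::filesystem. This is only an illustration under a few assumptions: the Snapshot record and function names are made up, the archive flag and monitoring APIs are left out because they are Windows-specific, and FNV-1a stands in for CRC32/MD5/SHA-256 purely to keep the example self-contained (it is not cryptographically secure).

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <vector>

namespace fs = std::filesystem;

// What was remembered about a file at the last copy (hypothetical record).
struct Snapshot {
    std::uintmax_t size;
    fs::file_time_type mtime;
    std::uint64_t hash;
};

// FNV-1a over the file contents, read in 64 KiB chunks.
// Stand-in for a real checksum; NOT cryptographically secure.
std::uint64_t hash_file(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    std::uint64_t h = 0xcbf29ce484222325ULL;        // FNV offset basis
    std::vector<char> buf(64 * 1024);
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        for (std::streamsize i = 0; i < in.gcount(); ++i) {
            h ^= static_cast<unsigned char>(buf[i]);
            h *= 0x100000001b3ULL;                  // FNV prime
        }
    }
    return h;
}

Snapshot take_snapshot(const fs::path& p) {
    return {fs::file_size(p), fs::last_write_time(p), hash_file(p)};
}

// Size and timestamp are checked first. The hash is only computed when both
// match AND the caller asks for it, since it means reading the whole file.
bool has_changed(const fs::path& p, const Snapshot& old,
                 bool verify_with_hash = false) {
    if (fs::file_size(p) != old.size) return true;
    if (fs::last_write_time(p) != old.mtime) return true;
    return verify_with_hash && hash_file(p) != old.hash;
}
```

Whether to pass verify_with_hash = true is exactly the trade-off described above: it catches a same-size edit with a restored timestamp, at the cost of reading every unchanged file.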
If you want to do it in "real-time", Windows has a native API for that: FindFirstChangeNotification().

Crossplatform library for uniquely identifying the machine my app is currently running on?

I have the following situation: a shared file system across N similar machines, with my app running on all of them. Each instance needs to know which machine it is running on, via some unique ID. Is there such a thing, or is it possible to emulate it? Is there any cross-platform library that would help with that?
There are two concerns here, security and stability of your matching.
Hardware characteristics are a good place to start. Things like MAC address, CPU, hdd identifiers.
These things theoretically can change. If a hdd failed you probably would lose whatever configuration you had on the system as well. I could see a system that sent a hash of each characteristic separately work alright. If 4 out of 5 matched, you could probably guess that their network card caught on fire and it was replaced.
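The "4 out of 5" idea can be sketched as follows. Everything here is an assumption for illustration: which five characteristics you collect, the type and function names, and the use of std::hash. In particular, std::hash is only guaranteed to be consistent within a single run of a program, so a fingerprint that gets stored or sent between machines would need a fixed, well-defined hash function instead.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <string>

// Five hardware characteristics, each hashed separately.
using Fingerprint = std::array<std::size_t, 5>;

// Placeholder hashing: std::hash is NOT guaranteed stable across runs or
// implementations; swap in a fixed hash before persisting anything.
Fingerprint make_fingerprint(const std::array<std::string, 5>& traits) {
    Fingerprint fp{};
    for (std::size_t i = 0; i < traits.size(); ++i)
        fp[i] = std::hash<std::string>{}(traits[i]);
    return fp;
}

// Number of characteristics whose hashes agree.
std::size_t matching_traits(const Fingerprint& a, const Fingerprint& b) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] == b[i]) ++n;
    return n;
}

// Treat two fingerprints as the same machine if at least `threshold` agree.
bool same_machine(const Fingerprint& a, const Fingerprint& b,
                  std::size_t threshold = 4) {
    return matching_traits(a, b) >= threshold;
}
```

With the default threshold, a machine whose network card was replaced still matches its old fingerprint, while one that differs in two or more characteristics is treated as a different system.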
If you just need a head count, you may not even be interested that this new system with a different signature used to be another one.
Usually, people aren't too concerned with security with these systems; they just want to track resources on a network. If someone wanted to spoof the hardware identifiers they could. For simple cases, I would look into an installer that registered a salted identifier. If you really need something terribly secure you might start looking at commercial products (or ask another question about the security aspects specifically).
Both of these are error prone obviously. I'm not sure you should even fully automate it in those cases. Think about a case where network cards were behaving weird and you swapped them with another machine.
Human eyes are pretty good, let an administrator use them. At worst, they can probably figure things out with a quick email. Just give them enough information to make an informed decision when something does go wrong. Really, if you just log everything a human should be able to recreate the scenario and make a decision. Most of these things won't change daily. There is more work when hardware fails, not every day.

Planning for a file indexing program

I'm somewhat new to C++, but not programming in general. I want to write my first practice program in C++ as a file indexing program.
It seems easy enough to scan directories for names, store that information, and filter it depending on what I want to view.
What I'm concerned about is at some point, I want to index a whole drive (I have an extra 1TB drive apart from my OS to store files on). I have about 400,000-500,000 files on there and I was wondering what would be the best way to store this information? I highly doubt keeping all those records in a text file is optimal and would like to think it's naive.
Is there anything else I should be concerned about?
Thanks.
Isn't some kind of database the obvious answer?
If you don't want to hook up to a server, you can try something like SQLite. Alternatively, if you only need to do basic lookups, you could also create your own proprietary file format. You can utilize any combination of binary and textual data in your file. It's hard to suggest possible layouts without knowing what data you need to store and how you'll be accessing it.
You can safely persist your data to a text file. However, you'd need to read the file into memory at startup, and do all the complex operations in memory. Even assuming a naive approach, where you store the full path with every file, you'd still be looking at ~100 bytes per file, or ~50 MB for 500,000 files. A smarter approach stores just the filename and a pointer to the directory name.
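Here is a small sketch of that "filename plus pointer to the directory" layout, using an index into a directory table in place of a raw pointer. The class and member names are made up for illustration, and the '/' separator is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// In-memory index: each directory path is stored once, and every file
// record refers to it by number instead of repeating the full path.
class FileIndex {
public:
    void add(const std::string& directory, const std::string& filename) {
        // Reuse the directory's id if it has been seen, otherwise assign the next one.
        auto [it, inserted] = dir_ids_.try_emplace(
            directory, static_cast<std::uint32_t>(dirs_.size()));
        if (inserted) dirs_.push_back(directory);
        files_.push_back({filename, it->second});
    }

    // Rebuilds the full path of the i-th file ('/' separator assumed).
    std::string full_path(std::size_t i) const {
        const Entry& e = files_.at(i);
        return dirs_[e.dir_id] + "/" + e.name;
    }

    std::size_t file_count() const { return files_.size(); }
    std::size_t directory_count() const { return dirs_.size(); }

private:
    struct Entry {
        std::string name;
        std::uint32_t dir_id;   // index into dirs_
    };
    std::vector<std::string> dirs_;
    std::unordered_map<std::string, std::uint32_t> dir_ids_;
    std::vector<Entry> files_;
};
```

Since a directory usually holds many files, each path prefix is stored once rather than once per file, which is where the memory saving comes from.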

Machine ID for Mac OS?

I need to calculate a machine id for computers running MacOS, but I don't know where to retrieve the information - stuff like HDD serial numbers etc. The main requirement for my particular application is that the user mustn't be able to spoof it. Before you start laughing, I know that's far-fetched, but at the very least, the spoofing method must require a reboot.
The best solution would be one in C/C++, but I'll take Objective-C if there's no other way. The über-best solution would not need root privileges.
Any ideas? Thanks.
Erik's suggestion of system_profiler (and its underlying, but undocumented SystemProfiler.framework) is your best hope. Your underlying requirement is not possible, and any solution without hardware support will be pretty quickly hackable. But you can build a reasonable level of obfuscation using system_profiler and/or SystemProfiler.framework.
I'm not sure what your actual requirements are here, but these posts may be useful:
Store an encryption key in Keychain while application installation process (this was related to network authentication, which sounds like your issue)
Obfuscating Cocoa (this was more around copy-protection, which may not be your issue)
I'll repeat here what I said in the first posting: It is not possible, period, not possible, to securely ensure that only your client can talk to your server. If that is your underlying requirement, it is not a solvable problem. I will expand that by saying it's not possible to construct your program such that people can't take out any check you put in, so if the goal is licensing, that also is not a completely solvable problem. The second post above discusses how to think about that problem, though, from a business rather than engineering point of view.
EDIT: Regarding your request to require a reboot, remember that Mac OS X has kernel extensions. By loading a kernel extension, it is always possible to modify how the system sees itself at runtime without a reboot. In principle, this would be a Mac rootkit, which is not fundamentally any more complex than a Linux rootkit. You need to carefully consider who your attacker is, but if your attackers include Mac kernel hackers (which is not an insignificant group), then even a reboot requirement is not plausible. This isn't to say that you can't make spoofing annoying for the majority of users. It's just always possible by a reasonably competent attacker. This is true on all modern OSes; there's nothing special here about Mac.
The tool /usr/sbin/system_profiler can provide you with a list of serial numbers for various hardware components. You might consider using those values as text to generate an md5 hash or something similar.
How about getting the MAC ID of a network card attached to a computer using ifconfig?