Can I create a Handle without a file? - c++

I want to create a dump on Windows with the function MiniDumpWriteDump. The problem is that this function takes a handle to a file to write the result to. I want the data in memory so that I can send it over the internet. So I was wondering: is there a way to create a handle without a file backing it, so that I can just get a pointer to the data?

You can use memory-mapped files. See here: http://msdn.microsoft.com/en-us/library/windows/desktop/aa366537(v=vs.85).aspx
You need to pass hFile = INVALID_HANDLE_VALUE and specify the maximum size of the mapping. Please check MSDN for the details.
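For what it's worth, a minimal sketch of creating such a pagefile-backed mapping (the 64 MB maximum size is an arbitrary assumption). Note that this gives you a section handle and a pointer into memory, not a regular file handle, so whether MiniDumpWriteDump will accept it is a separate question:
#include <windows.h>

int main()
{
    const DWORD maxSize = 64 * 1024 * 1024;   // assumed upper bound for the data

    // INVALID_HANDLE_VALUE means the mapping is backed by the system paging
    // file, so no file is created on disk.
    HANDLE hMapping = CreateFileMappingW(INVALID_HANDLE_VALUE, nullptr,
                                         PAGE_READWRITE, 0, maxSize, nullptr);
    if (!hMapping)
        return 1;

    void* view = MapViewOfFile(hMapping, FILE_MAP_ALL_ACCESS, 0, 0, 0);
    // 'view' is the in-memory buffer; anything written here never touches a
    // regular file (only, possibly, the paging file).

    UnmapViewOfFile(view);
    CloseHandle(hMapping);
    return 0;
}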

There are a couple of possibilities.
One would be to use CreateFile, but pass FILE_ATTRIBUTE_TEMPORARY. This will still create a file, but it tells Windows to try to keep as much of the file in the cache as possible. While it doesn't completely avoid creating a file, if you have enough memory it can often eliminate most (or at least much) of the I/O to and from the disk.
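A rough sketch of that option; the FILE_FLAG_DELETE_ON_CLOSE flag and the file name are my additions, not required for the caching behaviour:
#include <windows.h>

int main()
{
    // FILE_ATTRIBUTE_TEMPORARY hints that the contents should stay in the cache;
    // FILE_FLAG_DELETE_ON_CLOSE removes the file when the last handle is closed.
    HANDLE hFile = CreateFileW(L"dump.tmp", GENERIC_READ | GENERIC_WRITE,
                               0, nullptr, CREATE_ALWAYS,
                               FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE,
                               nullptr);
    if (hFile == INVALID_HANDLE_VALUE)
        return 1;

    // ... hand hFile to MiniDumpWriteDump, then read the contents back with
    //     SetFilePointer/ReadFile before closing the handle ...

    CloseHandle(hFile);   // the file is deleted here
    return 0;
}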
Another possibility (though one I've never tested) would be to pass a handle to a named (or maybe even an anonymous) pipe. You can generally write to a pipe as you would to a file, so as long as the crash dump writer just needs to be able to pass the handle to WriteFile, chances are pretty good this will work fine. From there, you could (for example) have another small program that reads the data from the pipe and writes it to a socket. Obviously it would be nice to avoid the extra processing to translate from pipe to socket, but such is life sometimes.
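An untested sketch of that pipe idea, using an anonymous pipe; whether MiniDumpWriteDump tolerates a non-seekable handle is exactly the open question here:
#include <windows.h>

int main()
{
    HANDLE readEnd = nullptr, writeEnd = nullptr;
    if (!CreatePipe(&readEnd, &writeEnd, nullptr, 0))
        return 1;

    // ... hand writeEnd to the code that expects a file handle (e.g. MiniDumpWriteDump),
    //     ideally on another thread, and close writeEnd when it is done so the
    //     reader loop below sees end-of-stream ...

    char buffer[64 * 1024];
    DWORD bytesRead = 0;
    while (ReadFile(readEnd, buffer, sizeof(buffer), &bytesRead, nullptr) && bytesRead > 0) {
        // forward 'buffer' to a socket, accumulate it in memory, etc.
    }

    CloseHandle(readEnd);
    return 0;
}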
If you haven't tried it, you might want to test with just passing a socket handle to the crash dump writer. Although it's somewhat limited, Windows does support treating a socket handle like a normal file (or whatever) handle. There's certainly nothing close to a guarantee that it'll work, but it may be worth a shot anyway.

The crash dump is essentially the process's memory, so keeping it all in memory doesn't make much sense. Why don't you simply send the file and delete it after a successful send?
By the way, you can compress the file before sending it, because crash dumps are usually big files.

The documentation says to pass a file handle, so if you do anything else you're breaking the contract and (if it works at all) the behaviour will not be reliable.

Pass a named pipe handle. Pipe the data back to yourself.

Related

Thread-safe file updates

I need to learn how to update a file concurrently without blocking other threads. Let me explain how it should work, what the requirements are, and how I think it should be implemented; then I'll ask my questions:
Here is how the worker works:
The worker is multithreaded.
There is one very large file (6 terabytes).
Each thread updates part of this file.
Each write is equal to one or more disk blocks (4096 bytes).
No two workers write to the same block (or the same group of blocks) at the same time.
Needs:
Threads should not block each other (no lock on the file, or the minimum possible number of locks should be used)
In case of (any kind of) failure, there is no problem if the block being updated gets corrupted.
In case of (any kind of) failure, blocks that are not being updated must not get corrupted.
If a file write was successful, we must be sure that the data is not merely buffered but has actually been written to disk (fsync)
I can convert this large file to as many smaller files as needed (down to 4 KB files), but I prefer not to do that. Handling that many files is difficult, and it requires a lot of file handle open/close operations, which has a negative impact on performance.
How I think it should be implemented:
I'm not very familiar with file manipulation and how it works at the operating-system level, but I think writing to a single block should not corrupt other blocks when errors happen. So I think this code should work perfectly as needed, without any change:
char write_value[] = "...4096 bytes of data...";
int write_block = 12345;
int block_size = 4096;
FILE *fp;
fp = fopen("file.txt","w+");
fseek(fp, write_block * block_size, SEEK_SET);
fputs(write_value, fp);
fsync(fp);
fclose(fp);
Questions:
Obviously, I'm trying to understand how it should be implemented. So any suggestions are welcome. Specially:
If writing to one block of a large file fails, what is the chance of corrupting other blocks of data?
In short, what should be considered to perfect the code above (with respect to the previous question)?
Is it possible to replace one block of data with another file/block atomically? (Like how the rename() system call replaces one file with another atomically, but at the block level. Something like replacing the next-block address of the previous block in the file system, or whatever else.)
Any device/file system/operating system specific notes? (This code will run on CentOS/FreeBSD (not decided yet), but I can change the OS if there is a better alternative for this problem. The file is on a single 8 TB SSD.)
Threads should not block each other (no lock on the file, or the minimum possible number of locks should be used)
Your code sample uses fseek followed by a write. Without locking between those two calls, you have a race condition, because another thread could jump in between them. There are three reasonable solutions:
Use flockfile, then regular fseek and fwrite (or the GNU fwrite_unlocked), then funlockfile; flockfile and funlockfile are POSIX.1-2001 standard
Use separate file handles per thread
Use pread and pwrite to do IO without having to worry about the seek position
Option 3 is the best for you.
You could also use the asynchronous IO from <aio.h> to handle the multithreading. It basically works with a thread-pool calling pwrite on most Unix implementations.
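A minimal sketch of option 3 (pread/pwrite), assuming a POSIX system; the helper name and the fdatasync call are my additions:
#include <fcntl.h>
#include <unistd.h>
#include <cstdint>

// Illustrative helper: write one block at its block-aligned offset. pwrite takes
// the offset explicitly, so there is no shared seek position to protect.
int write_block(int fd, const char* data, size_t block_size, uint64_t block_no)
{
    off_t offset = (off_t)((uint64_t)block_size * block_no);   // 64-bit math for a 6 TB file
    if (pwrite(fd, data, block_size, offset) != (ssize_t)block_size)
        return -1;
    return fdatasync(fd);   // ensure the data actually reaches the disk (see the fsync point below)
}

// usage (illustrative): int fd = open("file.bin", O_RDWR);
//                       write_block(fd, buffer, 4096, 12345);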
In case of (any kind of) failure, there is no problem if the block being updated gets corrupted
I understand this to mean that there should be no file corruption in any failure state. To the best of my knowledge, that is not possible when you overwrite data. When the system fails in the middle of a write command, there is no way to guarantee how many bytes were written, at least not in a file-system agnostic version.
What you can do instead is similar to a database transaction: You write the new content to a new location in the file. Then you do an fsync to ensure it is on disk. Then you overwrite a header to point to the new location. If you crash before the header is written, your crash recovery will see the old content. If the header gets written, you see the new content. However, I'm not an expert in this field. That final header update is a bit of a hand-wave.
In case of (any kind of) failure, blocks that are not being updated must not get corrupted.
Should be fine
If a file write was successful, we must be sure that the data is not merely buffered but has actually been written to disk (fsync)
Your sample code calls fsync, but forgets to call fflush before it; alternatively, you could set the stream to unbuffered using setvbuf. Note also that fsync takes a file descriptor, not a FILE*, so it would have to be fsync(fileno(fp)).
I can convert this large file to as many smaller files as needed (down to 4 KB files), but I prefer not to do that. Handling that many files is difficult, and it requires a lot of file handle open/close operations, which has a negative impact on performance.
Many calls to fsync will kill your performance anyway. Short of reimplementing database transactions, this seems to be your best bet to achieve maximum crash recovery. The pattern is well documented and understood:
Create a new temporary file on the same file system as the data you want to overwrite
Read-Copy-Update the old content to the new temporary file
Call fsync
Rename the new file to the old file
Renaming within a single file system is atomic. Therefore this procedure ensures that, after a crash, you see either the old data or the new data.
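A hedged sketch of that temp-file-and-rename pattern, assuming POSIX; names and error handling are illustrative only:
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

bool replace_atomically(const char* path, const char* tmp_path,
                        const void* data, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;

    if (write(fd, data, len) != (ssize_t)len) { close(fd); return false; }
    if (fsync(fd) != 0)                       { close(fd); return false; }   // data is on disk
    close(fd);

    // rename() within one file system is atomic: readers see the old file or the new one.
    // (For full durability the containing directory should also be fsync'ed; omitted here.)
    return rename(tmp_path, path) == 0;
}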
If writing to one block of a large file fails, what is the chance of corrupting other blocks of data?
None.
Is it possible to replace one block of data with another file/block atomically? (Like how the rename() system call replaces one file with another atomically, but at the block level. Something like replacing the next-block address of the previous block in the file system, or whatever else.)
No.

How safe are memory-mapped files for reading input files?

Mapping an input file into memory and then directly parsing data from the mapped memory pages can be a convenient and efficient way to read data from files.
However, this practice also seems fundamentally unsafe unless you can ensure that no other process writes to a mapped file, because even the data in private read-only mappings may change if the underlying file is written to by another process. (POSIX e.g. doesn't specify "whether modifications to the underlying object done after the MAP_PRIVATE mapping is established are visible through the MAP_PRIVATE mapping".)
If you wanted to make your code safe in the presence of external changes to the mapped file, you'd have to access the mapped memory only through volatile pointers and then be extremely careful about how you read and validate the input, which seems impractical for many use cases.
Is this analysis correct? The documentation for memory mapping APIs generally mentions this issue only in passing, if at all, so I wonder whether I'm missing something.
It is not really a problem.
Yes, another process may modify the file while you have it mapped, and yes, it is possible that you will see the modifications. It is even likely, since almost all operating systems have unified virtual memory systems: unless one requests unbuffered writes, there's no way of writing without going through the buffer cache, and no way of doing so without anyone who holds a mapping seeing the change.
That isn't even a bad thing. Actually, it would be more disturbing if you couldn't see the changes. Since the file effectively becomes part of your address space when you map it, it makes perfect sense that you see changes to the file.
If you use conventional I/O (such as read), someone can still modify the file while you are reading it. Worded differently, copying file content to a memory buffer is not always safe in presence of modifications. It is "safe" insofar as read will not crash, but it does not guarantee that your data is consistent.
Unless you use readv, you have no guarantees about atomicity whatsoever (and even with readv you have no guarantee that what you have in memory is consistent with what is on disk or that it doesn't change between two calls to readv). Someone might modify the file between two read operations, or even while you are in the middle of it.
This isn't just something that isn't formally guaranteed but will "probably still work" anyway; on the contrary, under Linux, for example, writes are demonstrably not atomic, not even by accident.
The good news:
Usually, processes don't just open an arbitrary random file and start writing to it. When such a thing happens, it is usually either a well-known file that belongs to the process (e.g. log file), or a file that you explicitly told the process to write to (e.g. saving in a text editor), or the process creates a new file (e.g. compiler creating an object file), or the process merely appends to an existing file (e.g. db journals, and of course, log files). Or, a process might atomically replace a file with another one (or unlink it).
In every case, the whole scary problem boils down to "no issue" because either you are well aware of what will happen (so it's your responsibility), or it works seamlessly without interfering.
If you really don't like the possibility that another process could write to your file while you have it mapped, you can simply omit FILE_SHARE_WRITE under Windows when you create the file handle. POSIX makes it somewhat more complicated, since you need to fcntl the descriptor for a mandatory lock, which isn't necessarily supported or 100% reliable on every system (for example, under Linux).
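For illustration, the exclusive open on Windows might look like this (the file name is just an example):
#include <windows.h>

int main()
{
    // FILE_SHARE_READ but deliberately no FILE_SHARE_WRITE: as long as this
    // handle stays open, other processes cannot open the file for writing.
    HANDLE h = CreateFileW(L"input.dat", GENERIC_READ, FILE_SHARE_READ,
                           nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    HANDLE hMap = CreateFileMappingW(h, nullptr, PAGE_READONLY, 0, 0, nullptr);
    const void* view = hMap ? MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0) : nullptr;

    // ... parse the read-only mapping at 'view' ...

    if (view) UnmapViewOfFile(view);
    if (hMap) CloseHandle(hMap);
    CloseHandle(h);
    return 0;
}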
In theory, you're probably in real trouble if someone does modify the file while you're reading it. In practice, you're reading characters and nothing else: no pointers, or anything else that could get you into trouble. Formally, I think it's still undefined behavior, but it's not one I think you have to worry about. Unless the modifications are very minor, you'll get a lot of compiler errors, but that's about the end of it.
The one case which might cause problems is if the file was shortened. I'm not sure what happens then, when you're reading beyond the end.
And finally: the system isn't arbitrarily going to open and modify the file. It's a source file; it will be some idiot programmer who does it, and he deserves what he gets. In no case will your undefined behavior corrupt the system or other people's files.
Note too that most editors work on a private copy; when they write back, they do so by renaming the original and creating a new file. Under Unix, once you've opened the file to mmap it, all that counts is the inode number. When the editor renames or deletes the file, you still keep your copy; the modified file will get a new inode. The only thing you have to worry about is if someone opens the file for update and then goes around modifying it. Not many programs do this on text files, except for appending additional data to the end.
So while formally there's some risk, I don't think you have to worry about it. (If you're really paranoid, you could turn off write authorisation while you have the file mmaped. And if there's really an enemy agent out to get you, he can turn it right back on.)

convert image buffer to filestream

Something similar to this may have been asked earlier; I could not find an exact answer to my problem, so I decided to ask here.
I am working with a 3rd-party framework that has its own classes for handling image files. It only accepts a file name, and the whole implementation revolves around being able to open these file streams and perform reads/writes.
I'd like to take an image buffer (that I obtain through some pre-processing on an image opened earlier) and feed it to this framework. The problem is that I cannot feed it a buffer, only a filename string.
I am looking at the best way to convert my buffer to a filestream so it can be seekable and be ingested by the framework. Please help me figure out what I should be looking at.
I tried reading about streambuf (filebuf and stringbuf) and tried assigning the buffer to these types, but no success so far.
If the framework only takes a file name, then you have to pass it a file name. Which means the data must reside in the file system.
The portable answer is "write your data to a temporary file and pass the name of that".
On Unix, you might be able to use a named pipe and fork another thread to feed the data through the pipe...
But honestly, you are probably better off just using a temporary file. If you manage to open, read, and delete the file quickly enough, it most likely will never make it out to disk anyway, since the kernel will cache the data.
And if you are able to use a ramdisk (tmpfs), you can guarantee that everything happens in memory.
[edit]
One more thought. If you can modify your code base to operate on std::iostream instead of std::fstream, you can pass it a std::stringstream. They support all of the usual iostream operations on a memory buffer, including things like seeking.
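A small sketch of that idea; process_image() is a stand-in for the framework call, not a real API:
#include <iostream>
#include <sstream>
#include <vector>

void process_image(std::iostream& stream)   // hypothetical consumer taking any iostream
{
    stream.seekg(0, std::ios::end);          // seeking works on stringstreams too
    std::streamoff size = static_cast<std::streamoff>(stream.tellg());
    stream.seekg(0, std::ios::beg);
    std::cout << "payload size: " << size << '\n';
}

int main()
{
    std::vector<char> buffer = {'I', 'M', 'G', '!'};   // pretend image bytes

    std::stringstream memory_stream;
    memory_stream.write(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    process_image(memory_stream);            // no file on disk involved

    // The same consumer would accept a real file as well:
    //   std::fstream file_stream("image.png", std::ios::in | std::ios::out | std::ios::binary);
    //   process_image(file_stream);
    return 0;
}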

How to guarantee files that are decrypted during run time are cleaned up?

After I decrypt a file to disk, how can I guarantee it is deleted if the application crashes or the system powers off and can't clean up properly? Using C or C++, on Windows and Linux.
Unfortunately, there's no 100% foolproof way to ensure that the file will be deleted in case of a full system crash. Think about what happens if the user just pulls the plug while the file is on disk. No amount of exception handling will protect you from that (the worst) case.
The best thing you can do is not write the decrypted file to disk in the first place. If the file exists in both its encrypted and decrypted forms, that's a point of weakness in your security.
The next best thing you can do is use Brian's suggestion of structured exception handling to make sure the temporary file gets cleaned up. This won't protect you from all possibilities, but it will go a long way.
Finally, I suggest that you check for temporary decrypted files on start-up of your application. This will allow you to clean up after your application in case of a complete system crash. It's not ideal to have those files around for any amount of time, but at least this will let you get rid of them as quickly as possible.
Don't write the decrypted file to disk at all.
If the system is powered off, the file is still on disk, and the disk (and therefore the file) can be accessed.
The exception would be the use of an encrypted file system, but that is out of your program's control.
I don't know if this works on Windows, but on Linux, assuming that you only need one process to access the decrypted file, you can open the file, and then call unlink() to delete the file. The file will continue to exist as long as the process keeps it open, but when it is closed, or the process dies, the file will no longer be accessible.
Of course the contents of the file are still on the disk, so really you need more than just to delete it; you need to zero out the contents. Is there any reason the decrypted file needs to be on disk (size?)? Better would be to just keep the decrypted version in memory, preferably marked as unswappable, so it never hits the disk.
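A minimal sketch of that open-then-unlink trick, assuming Linux/Unix; the helper name is illustrative:
#include <fcntl.h>
#include <unistd.h>

// The directory entry disappears immediately; the data stays reachable only
// through the returned descriptor and is released when it is closed or the
// process dies (though the bytes may still be recoverable from the disk).
int open_unlinked_scratch(const char* path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    if (unlink(path) != 0) {
        close(fd);
        return -1;
    }
    return fd;   // write/read the decrypted data via this descriptor
}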
Try to avoid it completely:
If the file is sensitive, the best bet is to not have it written to disk in a decrypted format in the first place.
Protecting against crashes: Structured exception handling:
However, you could add structured exception handling to catch any crashes.
__try and __except
What if they pull the plug?:
There is a way to protect against this...
If you are on Windows, you can use MoveFileEx with the MOVEFILE_DELAY_UNTIL_REBOOT option and a destination of NULL to delete the file on the next startup. This protects against an accidental computer shutdown leaving an undeleted file behind. You can also ensure that you have an exclusively opened handle to the file (specify no sharing rights such as FILE_SHARE_READ and use CreateFile to open it). That way no one else will be able to read from it.
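A hedged sketch combining the exclusive open and the reboot-delete registration (the path and attributes are illustrative; note that MOVEFILE_DELAY_UNTIL_REBOOT typically requires administrative rights):
#include <windows.h>

int main()
{
    const wchar_t* path = L"C:\\temp\\decrypted.tmp";   // illustrative path

    // Exclusive open: no sharing flags, so other processes cannot read the file,
    // plus the temporary and hidden attributes mentioned below.
    HANDLE h = CreateFileW(path, GENERIC_READ | GENERIC_WRITE,
                           0 /* no sharing */, nullptr, CREATE_ALWAYS,
                           FILE_ATTRIBUTE_TEMPORARY | FILE_ATTRIBUTE_HIDDEN, nullptr);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    // Backstop: if we crash or the machine is power-cycled before cleanup,
    // Windows deletes the file on the next boot.
    MoveFileExW(path, nullptr, MOVEFILE_DELAY_UNTIL_REBOOT);

    // ... decrypt into the file and use it ...

    CloseHandle(h);
    DeleteFileW(path);   // normal-path cleanup
    return 0;
}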
Other ways to avoid the problem:
All of these are not excuses for having a decrypted file on disk, but:
You could also consider writing to a file whose path is longer than MAX_PATH via the \\?\ path syntax. This will ensure that the file is not browsable by Windows Explorer.
You should set the file to have the temporary attribute
You should set the file to have the hidden attribute
In C (and so, I assume, in C++ too), as long as your program doesn't crash, you could register an atexit() handler to do the cleanup. Just avoid using _exit() or _Exit() since those bypass the atexit() handlers.
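A minimal sketch of the atexit() approach; the path is illustrative and, as noted, the handler only runs on a normal exit:
#include <cstdio>
#include <cstdlib>

static const char* g_decrypted_path = "decrypted.tmp";   // illustrative path

static void cleanup_decrypted_file()
{
    std::remove(g_decrypted_path);   // best effort; does nothing if the file is already gone
}

int main()
{
    std::atexit(cleanup_decrypted_file);   // runs on normal exit, not on crashes or _exit()
    // ... decrypt to g_decrypted_path and use it ...
    return 0;   // the handler runs here
}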
As others pointed out, though, it is better to avoid having the decrypted data written to disk. And simply using unlink() (or equivalent) is not sufficient; you need to rewrite some other data over the original data. And journalled file systems make that very difficult.
A process cannot protect or watch itself. Your only possibility is to start up a second process as a kind of watchdog, which regularly checks the health of the decrypting other process. If the other process crashes, the watchdog will notice and delete the file itself.
You can do that using heartbeats (regular polling of the other process to see whether it's still alive), or using interrupts sent from the other process itself, which will trigger a timeout if it has crashed.
You could use sockets to make the connection between the watchdog and your app work, for example.
It's becoming clear that you need some locking mechanism to prevent swapping to the page file / swap partition. On POSIX systems, this can be done with the mlock/munlock family of functions.
There's a problem with deleting the file. It's not really gone.
When you delete files off your hard drive (not counting the recycle bin) the file isn't really gone. Just the pointer to the file is removed.
Ever see those spy movies where they overwrite the hard drive 6, 8, or 24 times and that's how they know that it's clean? Well, they do that for a reason.
I'd make every effort to not store the file's decrypted data. Or if you must, make it small amounts of data. Even, disjointed data.
If you must, then a try/catch should protect you a bit. Nothing can protect from a power outage, though.
Best of luck.
Check out tmpfile().
It is actually part of standard C (declared in <stdio.h>), not just BSD UNIX.
But it creates a temporary file and automatically unlinks it so that it will be deleted on close.
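For illustration, a minimal tmpfile() sketch; the data written is obviously just a placeholder:
#include <cstdio>

int main()
{
    std::FILE* fp = std::tmpfile();   // nameless temporary file, removed automatically
    if (!fp)
        return 1;

    const char secret[] = "decrypted payload";   // placeholder data
    std::fwrite(secret, 1, sizeof secret - 1, fp);

    std::rewind(fp);
    char buf[64] = {};
    std::fread(buf, 1, sizeof buf - 1, fp);      // read it back

    std::fclose(fp);   // the file goes away here (or at program termination)
    return 0;
}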
Writing to the file system (even temporarily) is insecure.
Do that only if you really have to.
Optionally you could create an in-memory file system.
Never used one myself so no recommendations but a quick google found a few.
In C++ you should use an RAII tactic:
class Clean_Up_File {
    std::string filename_;
public:
    Clean_Up_File(std::string filename) { /* ... */ }  // open/create file
    ~Clean_Up_File() { /* ... */ }                      // delete file
};
int main()
{
    Clean_Up_File file_will_be_deleted_on_program_exit("my_file.txt");
}
RAII helps automate a lot of cleanup. You simply create an object on the stack, and have that object do clean up at the end of its lifetime (in the destructor which will be called when the object falls out of scope). ScopeGuard even makes it a little easier.
But, as others have mentioned, this only works in "normal" circumstances. If the user unplugs the computer you can't guarantee that the file will be deleted. And it may be possible to undelete the file (even on UNIX it's possible to "grep the harddrive").
Additionally, as pointed out in the comments, there are some cases where objects don't fall out of scope (for instance, the std::exit(int) function exits the program without leaving the current scope), so RAII doesn't work in those cases. Personally, I never call std::exit(int), and instead I either throw exceptions (which will unwind the stack and call destructors; which I consider an "abnormal exit") or return an error code from main() (which will call destructors and which I also consider an "abnormal exit"). IIRC, sending a SIGKILL also does not call destructors, and SIGKILL can't be caught, so there you're also out of luck.
This is a tricky topic. Generally, you don't want to write decrypted files to disk if you can avoid it. But keeping them in memory doesn't always guarantee that they won't be written to disk as part of a pagefile or otherwise.
I read articles about this a long time ago, and I remember there being some difference between Windows and Linux in that one could guarantee a memory page wouldn't be written to disk and one couldn't, but I don't remember clearly.
If you want to do your due diligence, you can look that topic up and read about it. It all depends on your threat model and what you're willing to protect against. After all, you can use compressed air to chill RAM and pull the encryption key out of that (which was actually on the new Christian Slater spy show, My Own Worst Enemy - which I thought was the best use of cutting edge, accurate, computer security techniques in media yet)
On Linux/Unix, use unlink as soon as you have created the file. The file will be removed as soon as your program closes the file descriptor or exits.
Better yet, the file will be removed even if the whole system crashes - because it is basically removed as soon as you unlink it.
The data will not be physically deleted from the disk, of course, so it still may be available for hacking.
Remember that the computer could be powered down at any time. Then, somebody you don't like could boot up with a Linux live CD, and examine your disk in any level of detail desired without changing a thing. No system that writes plaintext to the disk can be secure against such attacks, and they aren't hard to do.
You could set up a function that will overwrite the file with ones and zeros repeatedly, preferably injecting some randomness, and set it up to run at end of program, or at exit. This will work, provided there are no hardware or software glitches, power failures, or other interruptions, and provided the file system writes to only the sectors it claims to be using (journalling file systems, for example, may leave parts of the file elsewhere).
Therefore, if you want security, you need to make sure no plaintext is written out, and that also means it cannot be written to swap space or the equivalent. Find out how to mark memory as unswappable on all platforms you're writing for. Make sure decryption keys and the like are treated the same way as plaintext: never written to the disk under any circumstances, and kept in unswappable memory.
Then, your system should be secure against attacks short of hostiles breaking in, interrupting you, and freezing your RAM chips before powering down, so they don't lose their contents before being transferred for examination. Or authorities demanding your key, legally (check your local laws here) or illegally.
Moral of the story: real security is hard.
The method that I am going to implement will be to stream the decryption, so that the only part that is in memory is the part that is decrypted during the read as the data is being used.
This will be a streamed implementation, so the only data in memory is the data I am consuming in the application at any given point. This makes some things tricky, considering a lot of traditional file tricks are no longer available, but since the implementation will be stream-based I will still be able to seek to different points of the file, which will be translated to the crypt stream to decrypt the corresponding sections.
Basically, it will be encrypting blocks of the file at a time - so then if I try to seek to a certain point it will decrypt that block to read. When I read past a block it decrypts the next block and releases the previous (within the crypt stream).
This implementation does not require me to decrypt to a file or to memory and is compatible with other stream consumers and providers (fstream).
This is my 'plan'. I have not done this type of work with fstream before and I will likely be posting a question as soon as I am ready to work on this.
Thanks for all the other answers- it was very informative.

Determine the size of a pipe without calling read()

I need a function called SizeOfPipe() which should return the size of a pipe - I only want to know how much data is in the pipe and not actually read data off the pipe itself.
I thought the following code would work:
fseek (pPipe, 0 , SEEK_END);
*pBytes = ftell (pPipe);
rewind (pPipe);
but fseek() doesn't work on a pipe. Another option would be to read the pipe and then write the data back, but I would like to avoid this if possible. Any suggestions?
Depending on your Unix implementation, ioctl with FIONREAD might do the trick:
err = ioctl(pipedesc, FIONREAD, &bytesAvailable);
Unless this returns the error code for "invalid argument" (or any other error), bytesAvailable contains the amount of data that can be read without blocking at that time.
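A slightly fuller example of that call on a freshly created pipe, assuming a Unix-like system; FIONREAD is widespread but not required by POSIX:
#include <cstdio>
#include <sys/ioctl.h>
#include <unistd.h>

int main()
{
    int fds[2];
    if (pipe(fds) != 0)
        return 1;

    write(fds[1], "hello", 5);   // put 5 bytes into the pipe

    int bytesAvailable = 0;
    if (ioctl(fds[0], FIONREAD, &bytesAvailable) == 0)
        std::printf("%d bytes waiting in the pipe\n", bytesAvailable);   // prints 5

    close(fds[0]);
    close(fds[1]);
    return 0;
}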
Some UNIX implementations return the number of bytes that can be read in the st_size field after calling fstat(), but this is not portable.
Unfortunately the system cannot always know the size of a pipe - for example if you are piping a long-running process into another command, the source process may not have finished running yet. In this case there is no possible way (even in theory) to know how much more data is going to come out of it.
If you want to know the amount of data currently available to read out of the pipe that might be possible, but it will depend on OS buffering and other factors which are hard to control. The most common approach here is just to keep reading until there's nothing left to come (if you don't get an EOF then the source process hasn't finished yet). However I don't think this is what you are looking for.
So I'm afraid there is no general solution.
It's not in general possible to know the amount of data you can read from a pipe just from the pipe handle alone. The data may be coming in across a network, or being dynamically generated by another process. If you need to know up front, you should arrange for the information to be sent to you - through the pipe, or out of band - by whatever process is at the other end of the pipe.
There is no generic, portable way to tell how much data is available in a pipe without reading it. At least not under POSIX specifications.
Pipes are not seekable, and neither is it possible to put the data back into the reading end of a pipe.
Platform-specific tricks might be possible, though. If your question is platform-specific, editing your question to say so might improve your chances to get a working answer.
It's almost never necessary to know how many bytes are in the pipe: perhaps you just want to do a non-blocking read() on the pipe, i.e. check whether there are any bytes ready, and if so, read them, but never stop and wait for the pipe to be ready.
You can do that in two steps. First, use the select() system call to find out whether data is available or not. An example is here: http://www.developerweb.net/forum/showthread.php?t=2933
Second, if select tells you data is available, call read() once, and only once, with a large block size. It will read only as many bytes are available, or up to the size of your block, whichever is smaller. If select() returns true, read() will always return right away.
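A sketch of that two-step select()-then-read() pattern for a pipe descriptor (the helper name is mine):
#include <sys/select.h>
#include <unistd.h>

// Returns the number of bytes read (0 if nothing was ready), or -1 on error.
ssize_t read_if_ready(int pipe_fd, char* buf, size_t buf_size)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(pipe_fd, &readfds);

    timeval timeout = {0, 0};   // poll only, do not block
    int ready = select(pipe_fd + 1, &readfds, nullptr, nullptr, &timeout);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;   // no data available right now

    return read(pipe_fd, buf, buf_size);   // one read; returns whatever is there
}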
I don't think it is possible; isn't the point of a pipe to provide interprocess communication between the two ends (in one direction)? If I'm correct in that assertion, the sender may not yet have finished pushing data into the pipe, so it would be impossible to determine the length.
What platform are you using?
I do not think it's possible. Pipes present a stream-oriented protocol rather than a packet-oriented one. In other words, if you write to a pipe twice, once with, say, 250 bytes and once with, say, 520 bytes, there is no way to tell how many bytes you'll get from the other end in one read request. You could get 256, 256, and then the rest.
If you need to impose packets on a pipe, you need to do it yourself by writing a pre-determined (or delimited) number of bytes as the packet length, and then the rest of the packet. Use select() to find out if there is data to read, and use read() to get a reasonably-sized buffer. When you have your buffer, it's your responsibility to determine the packet boundaries.
If you want to know the amount of data that is expected to arrive, you could always write, at the beginning of every message sent through the pipe, the size of the message.
So write, for example, 4 bytes at the start of every message with the length of your data, and then on the other end first read only those 4 bytes to learn how much follows.
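A sketch of that length-prefix framing over a pipe descriptor; the helper names are illustrative and both ends are assumed to run on the same machine, so byte order is not an issue:
#include <cstdint>
#include <unistd.h>

// Write a 4-byte length header followed by the payload.
bool send_msg(int fd, const void* data, uint32_t len)
{
    if (write(fd, &len, sizeof len) != (ssize_t)sizeof len)
        return false;
    return write(fd, data, len) == (ssize_t)len;
}

// Read exactly 'len' bytes, looping because a pipe may return short reads.
static bool read_exact(int fd, void* out, size_t len)
{
    char* p = static_cast<char*>(out);
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0) return false;
        p += n;
        len -= (size_t)n;
    }
    return true;
}

// Read the 4-byte header first, then exactly that many payload bytes.
bool recv_msg(int fd, void* out, uint32_t max_len, uint32_t* out_len)
{
    uint32_t len = 0;
    if (!read_exact(fd, &len, sizeof len) || len > max_len)
        return false;
    *out_len = len;
    return read_exact(fd, out, len);
}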
There is no portable way to tell the amount of data coming from a pipe.
The only thing you could do is to read and process data as it comes.
For that you could use something like a circular buffer.
You can wrap the pipe in an object with buffering that can be rewound. This would be feasible only for small amounts of data.
One way to do this in C is to define a struct and wrap all functions operating on pipes for your struct.
As many have answered, you cannot portably tell how many bytes there are to read. OTOH, what you can do is poll the pipe for data to be read. First, be sure to open the pipe with O_RDWR|O_NONBLOCK - it's mandated by POSIX that a pipe be open for both read and write to be able to poll it.
Whenever you want to know if there is data available, just select/poll for data to read. You can also know if the pipe is full by checking for writability, but see the note below: depending on the type of write it may be inaccurate.
You won't know how much data there is but keep in mind writes up to PIPE_BUF bytes are guaranteed to be atomic, so if you're concerned about having a full message on the pipe, just make sure they fit within that or split them up.
Note: When you select for write, even if poll/select says you can write to the pipe, a write <= PIPE_BUF will return EAGAIN if there isn't enough room for the full write. I have no idea how to tell if there is enough room to write... that is what I was looking for (I may end up padding with \0's to PIPE_BUF size... in my case it's just for testing anyway).
I have an old example Perl app that can read one or more pipes in non-blocking mode, OCP_Daemon. The code is pretty close to what you would do in C using an event loop.
On Windows you can always use PeekNamedPipe, but I doubt that's what you want to do anyway.