Read from an open file - C++

I'm working on a project that encrypts data at rest, then decrypts the encrypted data into a temporary file in order to write the unencrypted data to an XML stream. The encryption and decryption work, but I am having trouble reading the data from the file. I believe this is because the file is still open, but I can't close it since it gets deleted on close. Is there a way to read from this file while it is still open?
As a last resort I could rewrite the code to just use a large encrypted buffer instead of a file, but I'd like to figure out how to read the data from the open file.
EDIT: I should've said earlier that I have the decryption logic in a C++ class and that I port the functions I need to C with extern "C". The function doing the decryption is in C++, which allowed me to get a HANDLE from the file descriptor and then use FlushFileBuffers(HANDLE) to flush the buffer.

Input and output can be buffered both at the C library level and at the kernel level; writes from one process are not necessarily immediately visible to another process until the buffer has been flushed. If you're using the C library's standard IO features, you can use fflush in the writer process to make sure its output is available to the reader process.
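For illustration, here is a minimal sketch of what the writer side might look like, assuming the temporary file is written through C stdio; the function name and parameters are purely illustrative:
#include <cstdio>

// Hypothetical writer: after writing a chunk of decrypted data, flush the
// stdio buffer so the data reaches the OS and becomes visible to a reader.
void write_chunk(std::FILE* tmp, const char* data, std::size_t len)
{
    std::fwrite(data, 1, len, tmp);
    std::fflush(tmp);   // push the C library's buffer down to the kernel
    // On Windows, FlushFileBuffers() on the underlying HANDLE additionally
    // flushes the kernel's buffers, as described in the question's edit.
}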

Related

How to get correct file size only on the completion of a detected file change, not at the beginning?

I'm using libuv's uv_fs_event_t to monitor file changes. And once a change is detected, I open the file in the callback uv_fs_event_cb.
However, my program also needs to get the full file size when opening the file, so it knows how much memory to allocate. I found that no matter whether I use libuv's uv_fs_fstat, POSIX's stat/stat64, or fseek+ftell, I never get the correct file size immediately. That's because when my program opens the file, the file is still being written.
My program runs in a tight single thread with callbacks, so delay/sleep isn't the best option here (and offers no guarantee of correctness either).
Is there any way to handle this, with or without leveraging libuv, so that I can hold off opening and reading the file until the write to the file has completed? In other words, instead of immediately detecting the start of a file change, can I somehow detect the completion of a file change?
One approach is to have the writer create an intermediate file, and finish its I/O by renaming it to the target file. E.g. this is what happens in most browsers: the file has a "downloading.tmp" name until the download is complete, to discourage you from opening it.
Another approach is to write/touch a "finished" marker file after writing the main target file, and have the reader wait to see that file before starting its job.
The last option I can see, if the file format can be altered slightly, is to have the writer put the file size in the first bytes of the file. Then the reader can preallocate correctly even if the file is not fully written, and it can keep reading until it has all the data.
Overall I'm suggesting that instead of a completion event, you make the writer produce some event that can be monitored after it has completed its task, and have the reader wait/synchronize on that event.
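As a rough illustration of the first (rename) approach; the temporary naming scheme and error handling here are my own assumptions, not tied to any particular framework:
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical writer: write to a temporary name, then rename it into place,
// so a watcher only ever sees a complete target file appear.
bool write_complete_file(const std::string& target, const std::string& data)
{
    const std::string tmp = target + ".tmp";   // assumed temporary name
    {
        std::ofstream out(tmp, std::ios::binary);
        if (!out.write(data.data(), static_cast<std::streamsize>(data.size())))
            return false;
    }   // stream is closed (and flushed) here
    // On POSIX, rename() within one filesystem replaces the target atomically.
    return std::rename(tmp.c_str(), target.c_str()) == 0;
}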

Reading file that changes over time C++

I am reading a file in C++. The reading itself happens in a while loop, always from the same file.
When the function reads information from the file, it pushes this information to some other place in the system. The problem is that this file may change while the loop is running.
How can I catch that new information in the file? I tried reading with std::ifstream and changing the file manually on my computer while the endless loop (with a sleep(2) between each iteration) was running, but as expected -- nothing happened.
EDIT: the file is overwritten each time new data is added to it.
Help?
Running on Ubuntu Linux 12.04 in VirtualBox, if that may be useful info. And this is not homework.
The usual solution is something along the lines of what MichaelH
proposes: the writing process opens the file in append mode, and
always writes to the end. The reading process does what
MichaelH suggests.
This works fine for small amounts of data in each run. If the
processes are supposed to run a long time, and generate a lot of
data, the file will eventually become too big, since it will
contain all of the processed data as well. In this case, the
solution is to use a directory, generating numbered files in it,
one file per data record. The writing process will write each
data set to a new file (incrementing the number), and the
reading process will try to open the new file, and delete it
when it has finished. This is considerably more complex than
the first suggestion, but will work even for processes
generating large amounts of data per second and running for
years.
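As a very rough sketch of the reader's side of such a numbered-file scheme (the naming pattern, polling, and error handling are my own assumptions, not part of the original suggestion):
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical reader: keep trying to open the next numbered record file,
// process it, then delete it. Stops when the next file does not exist yet.
void consume_available_records(const std::string& dir, unsigned long& next)
{
    for (;;)
    {
        char name[256];
        std::snprintf(name, sizeof(name), "%s/record_%06lu.dat",
                      dir.c_str(), next);
        std::ifstream in(name, std::ios::binary);
        if (!in)
            break;   // next record not written yet; try again later
        std::string record((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());
        in.close();
        // ... process `record` ...
        std::remove(name);   // delete the record once it has been handled
        ++next;
    }
}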
EDIT:
Later comments by the OP say that the device is actually a FIFO.
In that case:
you can't seek, so MichaelH's suggestion can't be used
literally, but
you don't need to seek, since data is automatically removed
from the FIFO whenever it has been read, and
depending on the size of the data, and how it is written, the
writes may be atomic, so you don't have to worry about partial
records, even if you happen to read exactly in the middle of
a write.
With regards to the latter: make sure that both the read and
write buffers are large enough to contain a complete record, and
that the writer flushes after each record. And make sure that
the records are smaller than the size needed to guarantee
atomicity. (Historically, on the early Unix I know, this was
4096, but I would be surprised if it hasn't increased since
then. Although... Under POSIX, this is defined by PIPE_BUF,
which is only guaranteed to be at least 512, and is only 4096
under modern Linux.)
Just rename the file, open the renamed file, and read it: do the processing of the data into your system, and at the end of the loop close the file. After a sleep, rename and re-open the file at the top of the while loop and repeat.
That's the simplest way to approach the problem and saves having to write code to process dynamic changes to the file during the processing stage.
To be absolutely sure you don't get any corruption, it's best to rename the file. This guarantees that any changes from another process do not affect the processing. It may not be necessary to do this - it depends on the processing and how the file is updated - but it's the safer approach. A move or rename operation within the same filesystem is atomic, so there should be no concurrency issues with this approach.
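In code, the claim-by-rename step might look roughly like this ("data.txt" and "data.processing" are assumed names):
#include <cstdio>

// Hypothetical reader side: grab the current file under a private name so
// that any further writes by the producer cannot interfere with processing.
bool claim_current_file()
{
    // std::rename returns 0 on success; if it fails, there is nothing to read.
    return std::rename("data.txt", "data.processing") == 0;
    // On success, open and process "data.processing", then delete it.
}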
You can use inotify to watch file changes.
If you need a simpler solution, read the file attributes with stat() and check the file's last modification time.
However, you may still miss some file modifications while you are opening and re-reading the file. So if you have control over the application that writes to the file, I'd recommend using something else to communicate between these processes, pipes for example.
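A minimal sketch of the stat() polling idea (the path handling is an assumption, and this only detects that something changed, not what):
#include <sys/stat.h>
#include <ctime>

// Remember the last modification time we saw and report whether it changed.
bool file_was_modified(const char* path, time_t& last_mtime)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return false;                 // file missing or inaccessible
    if (st.st_mtime != last_mtime)
    {
        last_mtime = st.st_mtime;     // record the new timestamp
        return true;
    }
    return false;
}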
To be more explicit, if you want tail-like behavior you'll want to:
Open the file, read in the data. Save the length. Close the file.
Wait for a bit.
Open the file again, attempt to seek to the last read position, read the remaining data, close.
Rinse and repeat.
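Put together, a rough sketch of that loop (the file name and poll interval are assumptions):
#include <chrono>
#include <fstream>
#include <string>
#include <thread>

// Hypothetical tail-like reader: remember how far we got last time,
// and on each pass read only the bytes added since then.
void tail_file(const std::string& path)
{
    std::streamoff last_pos = 0;
    for (;;)
    {
        std::ifstream in(path, std::ios::binary);
        if (in)
        {
            in.seekg(0, std::ios::end);
            std::streamoff size = in.tellg();
            if (size < last_pos)
                last_pos = 0;          // file was truncated or replaced
            if (size > last_pos)
            {
                std::string fresh(static_cast<std::size_t>(size - last_pos), '\0');
                in.seekg(last_pos);
                in.read(&fresh[0], size - last_pos);
                // ... push `fresh` to the rest of the system ...
                last_pos = size;
            }
        }   // file is closed when `in` goes out of scope
        std::this_thread::sleep_for(std::chrono::seconds(2));
    }
}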

Reading a Linux device with fstream

I'm attempting to get feedback from some hardware that is used over USBTMC and SCPI.
I can read and write to the device using /dev/usbtmc0 in a C++ [io]fstream, alternating by reading and writing to send and receive messages. Most commands are terminated by a single newline, so it's easy to tell when the end of a response is received. The simplified code I'm using for that is:
fstream usb;
usb.open("/dev/usbtmc0", fstream::in);
if (usb.good())
{
    string output;
    getline(usb, output);
    usb.close();
    // Do things with output
}
// Additional cleanup code...
There is, however, one thing that is escaping me, and it's defined in the SCPI/IEEE specification as "*LRN?". When sent, the connected device will send back arbitrary data (actual wording from the specification) that can be used to later reprogram the device should it get into a weird state.
The issue with the response message of this LRN command is that it contains one or more newlines. It does properly terminate the entire message with a newline, but the fact that there are newlines embedded makes it really tricky to work with. Some hardware will prefix the payload with a length, but some don't.
When reading data from the hardware, there is a 5 second timeout built into the Linux usbtmc kernel driver that will block any read calls if you try to read past what's available.
Using fstream::eof doesn't seem to return anything useful. It acts much like a socket. Is there a way that I can read all data on the device without knowing about its length, termination, and while avoiding a kernel timeout?
The problem with using fstream for this is that fstream has internal buffering, there's no 1:1 correlation between device fileOps->read calls and fstream operations.
For interacting with device drivers, you really need to use the low-level open, read, write functions from unistd.h and fcntl.h.
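For instance, a bare-bones query using those calls might look like this; the command string, buffer size, and lack of error handling are all simplifications of mine, and this by itself does not solve the multi-line *LRN? issue:
#include <cstring>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Hypothetical helper: one write() sends the command, one read() fetches the
// driver's next transfer. Each call maps directly onto the driver's file ops.
std::string query_usbtmc(const char* cmd, const char* dev = "/dev/usbtmc0")
{
    std::string response;
    int fd = open(dev, O_RDWR);
    if (fd < 0)
        return response;

    write(fd, cmd, std::strlen(cmd));         // e.g. "*IDN?\n"

    char buf[4096];
    ssize_t n = read(fd, buf, sizeof(buf));   // blocks until data or timeout
    if (n > 0)
        response.assign(buf, static_cast<std::size_t>(n));

    close(fd);
    return response;
}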

C/C++ Determine Whether Files have been completely written

I have a directory (DIR_A) to dump from Server A to Server B which is
expected to take a few weeks. DIR_A has the normal tree
structure, i.e. a directory could contain subfolders, files, etc.
Aim:
As DIR_A is being dumped to server B, I will have to
go through DIR_A and search for certain files within it (I do not know the
exact name of each file because server A changes the names of all the files
being sent). I cannot wait for weeks to process some files within DIR_A. So, I want to
start manipulating some of the files once I receive them at server B.
Brief:
Server A sends DIR_A to Server B. Expected to take weeks.
I have to start processing the files at B before the upload is
complete.
Attempt Idea:
I decided to write a program that will list the contents of DIR_A.
I went on to find out whether files exist within folders and subfolders of DIR_A.
I thought that I might look for the EOF of a file within DIR_A. If it is not present,
then the file has not yet been completely uploaded, and I should wait till the EOF
is found. So, I keep looping, calculating the size of the file and verifying whether the EOF is present. If it is, then I start processing that file.
To simulate the above, I wrote and ran a program that writes to a text file,
and stopped it in the middle without waiting for it to complete.
I then tried to use the program below to determine whether the EOF could be found. I assumed that since I abruptly ended the program writing to the text file, the EOF would not be present and hence the output "EOF FOUND" should not be reached. I was wrong, since it was reached. I also tried with feof() and fseek().
std::ifstream file(name_of_file.c_str(), std::ios::binary);
// go to the end of the file to determine eof
char character;
file.seekg(0, std::ios::end);
while (!file.eof())
{
    file.read(&character, sizeof(char));
}
file.close();
std::cout << "EOF FOUND" << std::endl;
Could anyone provide with an idea of determining whether a file has been completely written or not?
EOF is simply C++'s way of telling you there is no more data. There's no EOF "character" that you can use to check if the file is completely written.
The way this is typically accomplished is to transfer the file over with a different name, e.g. myfile.txt.transferring, and once the transfer is complete, rename the file on the target host (back to something like myfile.txt). You could do the same by using separate directories.
Neither C nor C++ have a standard way to determine if the file is still open for writing by another process. We have a similar situation: a server that sends us files and we have to pick them up and handle as soon as possible. For that we use Linux's inotify subsystem, with a watch configured for IN_CLOSE_WRITE events (file was closed after having been opened for writing), which is wrapped in boost::asio::posix::stream_descriptor for convenient asynchronicity.
Depending on the OS, you may have a similar facility. Or just lsof as already suggested.
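A stripped-down sketch of the inotify idea, watching a directory directly rather than through boost::asio; the directory name and the minimal error handling are assumptions of mine:
#include <cstdio>
#include <sys/inotify.h>
#include <unistd.h>

// Report files in `dir` that were closed after being opened for writing.
void watch_for_completed_files(const char* dir)
{
    int fd = inotify_init();
    if (fd < 0)
        return;
    inotify_add_watch(fd, dir, IN_CLOSE_WRITE);

    alignas(struct inotify_event) char buf[4096];
    for (;;)
    {
        ssize_t len = read(fd, buf, sizeof(buf));   // blocks until events arrive
        if (len <= 0)
            break;                                  // error; give up in this sketch
        for (ssize_t i = 0; i < len; )
        {
            const struct inotify_event* ev =
                reinterpret_cast<const struct inotify_event*>(buf + i);
            if ((ev->mask & IN_CLOSE_WRITE) && ev->len > 0)
                std::printf("finished writing: %s\n", ev->name);
            i += static_cast<ssize_t>(sizeof(*ev) + ev->len);
        }
    }
    close(fd);
}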
All finite files have an end. If a file is being written by one process, and (assuming the OS allows it) simultaneously read (faster than it is being written) by another process, then the reading process will see an EOF when it has read all the characters that have been written.
What would probably work better is this: if you can determine a length of time during which you are guaranteed to receive a significant number of bytes and have them written to the file (beware OS buffering), then you can walk the directory once per period, and any file whose size has changed can be considered unfinished.
Another approach would require OS support: check what files are open by the receiving process, with a tool like lsof. Any file open by the receiver is unfinished.
In C, and I think it's the same in C++, EOF is not a character; it is a condition a file is (or is not) in. Just like media removed or network down is not a character.

Sending the contents of a file to a client

I am writing a C++ server-side application called quote of the day. I am using the winsock2 library. I want to send the contents of a file back to the client, including newlines, by using the send function. The way I tried it doesn't work. How would I go about doing this?
Reading the file and writing to the socket are 2 distinct operations. Winsock does not have an API for sending a file directly.
As for reading the file, simply make sure you open it in read binary mode if using fopen, or simply use the CreateFile, and ReadFile Win32 API and it will be binary mode by default.
Usually you will read the file in chunks (for example 10KB at a time) and then send each of those chunks over the socket by using send or WSASend. Once you are done, you can close the socket.
On the receiving side, read whatever's available on the socket until the socket is closed. As you read data into a buffer, write the amount read to a file.
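A rough sketch of that chunked loop on the sending side; it assumes Winsock has already been initialised and `sock` is an already-connected SOCKET, and the 10 KB chunk size just follows the answer's example:
#include <winsock2.h>
#include <cstdio>

bool send_file(SOCKET sock, const char* path)
{
    std::FILE* f = std::fopen(path, "rb");   // binary mode, as recommended
    if (!f)
        return false;

    char chunk[10 * 1024];
    std::size_t n;
    bool ok = true;
    while (ok && (n = std::fread(chunk, 1, sizeof(chunk), f)) > 0)
    {
        // send() may accept fewer bytes than requested; loop until the
        // whole chunk has been handed to the socket.
        std::size_t sent = 0;
        while (ok && sent < n)
        {
            int r = send(sock, chunk + sent, static_cast<int>(n - sent), 0);
            if (r == SOCKET_ERROR)
                ok = false;
            else
                sent += static_cast<std::size_t>(r);
        }
    }
    std::fclose(f);
    return ok;
}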
Hmm... I think Win32 should have something similar to "sendfile" in Linux.
If it doesn't, you can still use memory mapping (but don't forget to handle files larger than the available virtual address space). You will probably need to use blocking sockets to avoid returning to the application until all data is consumed. And I think there was something with "overlapped" operations to implement async IO.
I recommend dropping winsock and instead using something more modern such as Boost.Asio:
http://www.boost.org/doc/libs/1_37_0/doc/html/boost_asio/tutorial.html
There is also an example on transmitting a file:
http://www.boost.org/doc/libs/1_37_0/doc/html/boost_asio/examples.html