Delay in ofstream::open, possibly due to mixing with _iobuf? - c++

I have a C++ program that creates an output file "A" with ofstream. This file is then read by some legacy C code that opens the file with _iobuf. The legacy code then creates its own output file "B" using _iobuf, and this file is then read by the C++ program using ifstream. This sequence is iterated many times, with the same file names for A and B in each iteration.
Occasionally, the C++ program cannot open the output file A for writing, and I must try several times before it succeeds. This occurs nondeterministically, and maybe once in a thousand iterations. Note that the C program never has to wait to open its input or output file, nor does the C++ program ever have to wait to open its input file. This informal observation is based on hundreds of thousands of iterations.
I'm wondering if this has something to do with mixing ofstream and _iobuf in the same program? Both the C++ code and the C code are linked into the same program. And the legacy C code is technically C++ code, but was written in a very C-like style. Is there anything I can do to eliminate this waiting to open the ofstream file? I do not want to change the legacy code if I can possibly avoid it.
Pseudo code (not compiled):
void someObject::someMethod()
{
for (int count = 0; count < someLimit; ++count)
{
newerObject::firstMethod();
olderObject::secondMethod();
newerObject::thirdMethod();
}
}
void newerObject::firstMethod()
{
// do some processing first
// then write the results of the processing to a file
ofstream A;
A.open("A", ofstream::out); // this sometimes must be tried multiple times
// write data to file A
A.close();
}
void olderObject::secondMethod()
{
FILE* f;
f = fopen("A", "rt"); // this always works the first time
// read data from file A
fclose(f);
// do some processing
f = fopen("B", "w");
// write data to file B
fclose(f);
}
void newerObject::thirdMethod()
{
ifstream B;
B.open("B"); // this always works the first time
// read data from file B
B.close();
// do some processing
}
Currently, as a work around, I put the ofstream::open in a do-while loop. I would love to get rid of this awkwardness. Thanks in advance for any advice you can give.

First off, the problem is almost certainly not the use of different methods to access the files: under the hood, the C and C++ I/O functions use the same system I/O facilities. You seem to be using Windows (on other systems files typically can be open multiple times simultaneously) and I don't know much about the system but I would suspect that the file system hasn't been updated to reflect that the file is closed when you try to open it. This may have to do with the "t" open flag: I don't know what this is about.
On UNIXes you can force the I/O operations to wait until the actual change on disk completed. Something like this could help avoiding the problem but has the significant cost that operations become hideously slow. On UNIXes one approach would be to blow away the file system entry the moment the file was opened successfully (after all, at this point its name isn't used anymore):
if (FILE* fp = fopen("file", "r")) {
remove("file");
// do processing
}
However, if I recall correctly on Windows you can neither remove the file nor rename it. Personally, in solving the problem I would proceed as follows:
Determine under which situations the file can't be opened, e.g. by keeping the file open and trying to open it. This is mainly intended to create a setup where the problem is reproducible so you can verify later that you indeed found a solution.
Once I found a way to reproduce the problem I would probably a better idea of the actual root cause and possibly googling would help. In any case this is the point where researching the root cause comes in.
Once the cause is understood it is hopefully easy to devise a solution. If not, opening the file multiple times under it is successful may very well be the right solution.

Related

How to reduce the size of a fstream file in C++

What is the best way to cut the end off of a fstream file in C++ 11
I am writing a data persistence class to store audio for my audio editor. I have chosen to use fstream (possibly a bad idea) to create a random access binary read write file.
Each time I record a little sound into my file I simply tack it onto the end of this file. Another internal data structure / file, contains pointers into the audio file and keeps track of edits.
When I undo a recording action and then do something else the last bit of the audio file becomes irrelevant. It is not referenced in the current state of the document and you cannot redo yourself back to a state where you can ever see it again. So I want to chop this part of the file off and start recording at the new end. I don’t need to cut out bitts in the middle, just off the end.
When the user quits this file will remain and be reloaded when they open the project up again.
In my application I expect the user to do this all the time and being able to do this might save me as much as 30% of the file size. This file will be long, potentially very, very long, so rewriting it to another file every time this happens is not a viable option.
Rewriting it when the user saves could be an option but it is still not that attractive.
I could stick a value at the start that says how long the file is supposed to be and then overwrite the end to recycle the space but in the mean time. If I wanted to continually update the data store file in case of crash this would mean I would be rewriting the start over and over again. I worry that this might be bad for flash drives. I could also recomputed the end of the useful part of the file on load, by analyzing the pointer file but in the mean time I would be wasting all that space potentially, and that is complicated.
Is there a simple call for this in the fstream API?
Am I using the wrong library? Note I want to stick to something generic STL I preferred, so I can keep the code as cross platform as possible.
I can’t seem to find it in the documentation and have looked for many hours. It is not the end of the earth but would make this a little simpler and potentially more efficient. Maybe I am just missing it somehow.
Thanks for your help
Andre’
Is there a simple call for this in the fstream API?
If you have C++17 compiler then use std::filesystem::resize_file. In previous standards there was no such thing in standard library.
With older compilers ... on Windows you can use SetFilePointer or SetFilePointerEx to set the current position to the size you want, then call SetEndOfFile. On Unixes you can use truncate or ftruncate. If you want portable code then you can use Boost.Filesystem. From it is simplest to migrate to std::filesystem in the future because the std::filesystem was mostly specified based on it.
If you have variable, that contains your current position in the file, you could seek back for the length of your "unnedeed chunk", and just continue to write from there.
// Somewhere in the begining of your code:
std::ofstream *file = new std::ofstream();
file->open("/home/user/my-audio/my-file.dat");
// ...... long story of writing data .......
// Lets say, we are on a one millin byte now (in the file)
int current_file_pos = 1000000;
// Your last chunk size:
int last_chunk_size = 12345;
// Your chunk, that you are saving
char *last_chunk = get_audio_chunk_to_save();
// Writing chunk
file->write(last_chunk, last_chunk_size);
// Moving pointer:
current_file_pos += last_chunk_size;
// Lets undo it now!
current_file_pos -= last_chunk_size;
file->seekp(current_file_pos);
// Now you can write new chunks from the place, where you were before writing and unding the last one!
// .....
// When you want to finally write file to disk, you just close it
file->close();
// And when, truncate it to the size of current_file_pos
truncate("/home/user/my-audio/my-file.dat", current_file_pos);
Unfortunatelly, you'll have to write a crossplatform function truncate, that would call SetEndOfFile in windows, and truncate in linux. It's easy enough with using preprocessor macros.

output redirection in the function system() seems not to work

I have been scratching my head around this problem for a while, and I couldn't find any answer through surfing the web either.
The problem is that I call system("csvtojson someFile.csv 1> someOtherFile.json") inside my program to produce a JSON file. After this line I want to open, read, and process the JSON file. Although, I can see that the file is created, but fopen() returns NULL.
I read that system() is synchronized so I think the rest of my program will not get executed until the system call is finished, and so the file will be created.
I suspect the problem is somehow related to redirecting the output stream using "1>"; not sure, though.
Any help or hint will be much appreciated.
Thanks! :)
P.S. I don't want to use a library to convert csv to JSON, and I can't perform the conversion outside the program because there are tons of very large csv files and the only way for me is to convert each to a JSON file inside the program, run my algorithm, and move to the next csv file ( converting it to JSON and saving it in the very same JSON file). So in total I have only one JSON file, being like a buffer for my csv files. Having said that, if anyone has a better design approach that can be implemented quickly, that would be also great.
UPDATE : Actual code that exhibits the problem, copied from the OP's answer:
int main(){
system("csvtojson Test_Trace.csv 1> ~/Traces/Test_Trace.json");
FILE* traceFile = fopen("~/Traces/Test_Trace.json", "r");
if(traceFile == NULL)
perror("Error in Openning the trace file");
else
cout << "Successfull openning of the trace file!" << endl;
return 0;
}
Thank you guys for your answers. I had to be more detailed in my question as the problem seemed to be somewhere that wasn't clear from my question.
I figured out what was the problem, and would like to share it here (not a super interesting finding, but worth mentioning).
I wrote a simple program to find the problem:
int main(){
system("csvtojson Test_Trace.csv 1> ~/Traces/Test_Trace.json");
FILE* traceFile = fopen("~/Traces/Test_Trace.json", "r");
if(traceFile == NULL)
perror("Error in Openning the trace file");
else
cout << "Successfull openning of the trace file!" << endl;
return 0;
}
If you run this program you will get the error message No such file or directory, but if you replace the address string with the absolute location, i.e., /home/USER_ID/Traces/Test_Trace.json, in both system(...) and fopen(...) calls, your code will work fine. Interestingly, myself suspected that this could be the problem and I changed just the one for system(...) but still it wasn't working (though the file was being created in the location that was passed to fopen(...)).
EDIT: Thanks to #Peter's comment, this problem was because system() call takes care of ~, but fopen() does not and need an absolute path. So there is really no need to have both functions been given the absolute path.
Anyhow,
Thanks Again. :)
Perhaps the reason for this is because the system command hasn't finished executing by the time your program continues to the next instructions where it tries to read from the file that hasn't been created yet.
Although, this isn't the best way, putting in a short pause might make the difference, or at least let you know if that is the issue.

Why doesn't pugixml write back to the currently opened file?

The following code is basically everything I'm doing - opening an XML file, processing it and (trying to) write it back. But writing back fails, every time. I tried to find a solution wrote code, Googled, but got no answer.
xml_parse_result result = doc.load_file("data.xml");
//I checked the value of result, it is equal to status_ok, so the file opened fine.
//...
//some XML processing
//...
bool b = doc.save_file("data.xml"); //b is always false
So, is it like pugi doesn't close the file after taking in the input or what? That doesn't seem to be the case as I can delete the file while the program is running. Does anyone know why my program reads the file but doesn't write the modifications back into it?
Try loading the file from an ifstream. This way you have control over the file, and can be sure when it is closed.
// Initialization code
{
std::ifstream stream("data.xml");
pugi::xml_parse_result result = doc.load(stream);
// Check validity
} // Input stream implicitly destructed and file closed.
// Processing
{
std::ofstream stream("data.xml");
doc.save(stream);
} // Output stream implicitly destructed and file closed.
As to why this happens... The Documentation isn't explicit about it, so it's hard to tell. It seems that it should close the file after loading, but the only way to be sure is by looking at the source code. BTW, if you're on a linux OS, you should be able to delete opened files.

Deleting Lines after reading them in C++ program using system()

I am trying to understand how basic I/O with files is handled in c++ or c. My aim is to read file line by line and send the lines across to a remote server. If the line is sent, I want to delete it from the file.
One way I tried was that I kept a count of the lines read and called an system() system call to delete the 'count' number of lines. I used the bash command: sed -i -e 1,'count'd filename.
After that I continued reading the file and surprisingly it worked as planned.
I have two questions:
Is this way reliable?
And why does this work at all, when while
reading the file I deleted a part of it and yet it works? What if I
did a seek to a previous position, what then?
Best,Digvijay
PS:
I would be glad if somebody could suggest a better way.
Also here is the code for the program I wrote:
#include<iostream>
#include<fstream>
#include<string>
#include<sstream>
#include<cstdlib>
int main(){
std::ifstream f;
std::string line;
std::stringstream ss;
int i=0;
f.open("in.txt");
if(f.is_open()){
while(getline(f,line)){
std::cout<<line<<std::endl;
i++;
if(i==2)break;
}
ss<<"sed -i -e 1,"<<i<<"d in.txt";
system(ss.str().c_str());
while(getline(f,line)){
std::cout<<line<<std::endl;
}
}
return 0;
}
Edit:
Firstly thanks for taking the time to write answers. But here is some extra information which I missed out on earlier. The files I am dealing with are log files. So they are constantly being appended with information from devices. The reason why I want to avoid creating a copy is, because the log file themselves are very big(at times) and plus this would help to keep the log file short. Since they would be divided into parts and archived on the server.
Solution
I have found the way to deal with the problem. Apparently Thomas is right, that sed does create a new file. So the old file remains as is. Using this, I can read n lines, call the system function, close the file pointer and open it again. I do this on small chunks of the log, repeatedly until it becomes small and hence efficient to deal with. The server while archives the logs in 1gb files.
However I have a new question, due to memory constraint, I need to know if it is possible to split a log file into two efficiently. (Which possibly would be another question on SO)
Most modern file systems don't support deleting lines at the beginning of the file, so doing so would be very inefficient.
The normal solution to your actual problem is to stop writing to your log file when it reaches some size, then start writing to a new file. The code that copies the files can delete a whole file once it has been written (this is an efficient operation).
sed writes a new version of the file, while the program keeps reading the same version that it opened. This is the usual behavior of Unix and Linux when a program writes a file that another program has open.
You can see this for yourself with this small C program:
#include <stdlib.h>
#include <stdio.h>
int main(void) {
FILE *f = fopen("in.txt", "r");
while (1) {
rewind(f);
int lines = 0;
int c;
while ((c = getc(f)) != EOF)
if (c == '\n')
++lines;
printf("Number of lines in file: %d\n", lines);
}
return 0;
}
Run that program in one window, and then use sed in another window to edit the file. The number of lines printed by the program will stay the same, even if the file on disk has been edited, and this is because Unix keeps the old, open version, even if it is no longer accessible to other programs.
As to your first question, how reliable your solution is, as far as I can see it should be reliable, except with the usual caveats about the system crashing or running out of memory in the middle of an update, someone else accessing the file, and of course all the problems with the system call. It is not very efficient, though, and for large data sets you might want to do it differently.
sujin's comment about using a temporary file for the lines you want to keep seems reasonable. It would be both faster and safer. Keep the original file, so if the system crashes you'll still have your data, and wait until you have finished to rename the old file to "in.txt.bak", and then rename your temporary file to "in.txt".
First off, avoid use of system calls as much as you can (if possible, don't use it at all) as they create race conditions and other problems which drastically (and often) detrimentally affect the outcome of your program. This especially true if access to files are involved.
Given your problem, there are a number of ways to do this, each with its own caveats.
I'll cover three possible solutions:
1) If the file is small enough:
you can read in the entire thing in a data structure (vector, list, deque, etc.)
delete the original file
determine how many lines to read (and send off via server protocol)
then write the remaining lines as the name of the original file.
If you intend to parallelize your program later on, this may be a better solution, provided that the file is small. Note: small is a relative term, but is generally limited by how much memory you have available.
2) If the file is quite large or you're limited by memory constraints, you will have to get creative by using buffers. Once you've read a line and successfully sent it off via your program, you determine where the file pointer is and copy the remaining information until the end of the current file as a new file. Once done, close and delete the old file, then close and rename the new file the same name as the old file.
3) If your solution doesn't have to be in C++, you can use shell-scripting or (controversially) another language to get the job done.
1) No, it's not reliable.
2) The C++ runtime library reads your file in blocks (internally) which are then parceled out to your (higher level) input requests until the block(s) is(are) exhausted, forcing it to (internally) read more blocks from disk. Since one or more physical blocks are read in before you make any call to sed, it/they cannot be altered if sed happens to change that first part of the file.
To see your code fail, you would need to make the input file big enough that there are remaining blocks of the file that have not been read in (internally by the runtime library) before you call sed. By "fail" I mean your program would not see all the characters that were originally in the file before sed clobbered some lines.
As the other guys said, you have to make another file with the records you need after read the original file and then delete it. But in this application perhaps you will see more useful a fifo than a file. If you are on a *NIX platform check up about the makefifo statement from the console.
It is like a file with the singularity that after read a line it gets deleted.

Does constructing an iostream (c++) read data from the hard drive into memory?

When I construct an iostream when say opening a file will this always read the entire file from the hard disk and then put it into memory, or is it streamed in and buffered by the OS on demand?
I ask because one way to check if a file exists is to see if opening it fails, but I fear if the files I am opening are very large then this take a long time if iostream must read the entire file in on open.
To check whether a file exists can be done like this if you want to use boost.
#include <boost/filesystem.hpp>
bool fileExists = boost::filesystem::exists("foo.txt");
No, it will not read the entire file into memory when you open it. It will read your file in chunks though, but I believe this process will not start until you read the first byte. Also these chunks are relatively small (on the order of 4-128 kibibytes in size), and the fact it does this will speed things up greatly if you are reading the file sequentially.
In a test on my Linux box (well, Linux VM) simply opening the file only results in the OS open system call, but no read system call. It doesn't start reading anything from the file until the first attempt to read from the stream. And then it reads 8191 (why 8191? that seems a very strange number) byte chunks as I read the file in.
Opening a file is a bad way of testing if the file exists - all it does is tell you if you can open it. Opening might fail for a number of reasons, typically because you don't have read permission, but the file will still exist. It is usually better to use an operating system specific function to test for existence. And no, opening an fstream will not cause the contents to be read.
What I think is, when you open a file, the corresponding data structures for the process opening the file are populated which include file pointer, file descriptor, v node etc.
Now one can read and write to a file using buffered streams (fwrite , fread) or using system calls (read and write).
When we use buffered streams, we buffer the data and then write or read it[This is done for efficiency puposes]. This statement itself means that the whole file is not read into memory but certain bytes are read into buffer and then made available.
In case of sys calls such as read and write , kernel level buffering is done (using fsync one can flush out kernel buffer too), but data is actually read and written to the device .file
checking existance of file
#include &lt sys/stat.h &gt
int main(){
struct stat file_i;
std::string f("myfile.txt");
if (stat(f.c_str(),&file_i) != 0){
cout &lt&lt "File not found" &lt&lt endl;
}
return 0;
}
Hope this clarifies a bit.