I would like to ask about inputting more than one file, or about the easiest way to put filenames into some queue.
These files are to be edited.
I have, let's say, 100 txt files, and for each one I want to open it, find something, and save it.
I have functions/methods for each operation.
But I ran into a problem loading the files into the program. I made it work for five or fewer files.
The process is: the program asks me for a file name in the program's root directory, or for a full path to the file,
for example C/myfile.txt
and after pressing Enter it executes.
Unfortunately, things changed and I now have to do 100 files per day. I know that in C# it is possible to make an open-file dialog with multiple file selection...
In this program, I was thinking about making a static array of strings and iterating over it with a for loop,
but I have no idea what the easiest way is to load these strings (filenames) into that array.
I read something on MSDN, but it looks too complicated for me.
Can someone help? The program is just for me and I don't want to set up too many things. Is it possible?
What is the smallest, most minimal piece of code I must add to my program?
An example of the result I want:
array[0] = C/text.txt
array[1] = C/texta.txt
...
array[50] = C/textdhsjfk.txt
Maybe it is not as easy as I think, maybe it is.
But I have no support in this field, so I have only tried to find something on the internet, and I am not sure about the results I have found.
Thank you for your time and willingness to help.
A loop and a container look like good things to use:
static const char *filenames[] =
{
    "C:\\Fred\\file1.txt", "c:\\Fred\\file2.txt", "c:\\Barney\\file1.txt",
};
static const unsigned int quantity_of_files =
    sizeof(filenames) / sizeof(filenames[0]);
//...
for (unsigned int i = 0u; i < quantity_of_files; ++i)
{
    std::ifstream input_stream(filenames[i]);
    Process_Input_Stream(input_stream);
    input_stream.close();
}
The task is to get the code working correctly and robustly first, then worry about performance.
Note: The program is set up so that file names can be added to the array without having to change or retest the code.
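If you would rather not list the names in the source at all, here is a minimal sketch of an alternative, assuming a C++17 compiler and using "C:\\Fred" purely as an example directory; it collects every .txt file at run time with std::filesystem:
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

int main()
{
    std::vector<std::string> filenames;
    // Gather every regular .txt file in the example directory.
    for (const auto &entry : fs::directory_iterator("C:\\Fred"))
    {
        if (entry.is_regular_file() && entry.path().extension() == ".txt")
            filenames.push_back(entry.path().string());
    }
    for (const auto &name : filenames)
    {
        std::ifstream input_stream(name);
        // Process_Input_Stream(input_stream); // your existing processing here
    }
    return 0;
}
This keeps the same loop-and-container shape as above; only the way the container is filled changes.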
What is the best way to cut the end off of an fstream file in C++11?
I am writing a data persistence class to store audio for my audio editor. I have chosen to use fstream (possibly a bad idea) to create a random-access binary read/write file.
Each time I record a little sound into my file, I simply tack it onto the end. Another internal data structure/file contains pointers into the audio file and keeps track of edits.
When I undo a recording action and then do something else, the last bit of the audio file becomes irrelevant. It is not referenced in the current state of the document, and you cannot redo yourself back to a state where you can ever see it again. So I want to chop this part of the file off and start recording at the new end. I don't need to cut out bits in the middle, just off the end.
When the user quits, this file will remain and be reloaded when they open the project up again.
In my application I expect the user to do this all the time, and being able to do it might save me as much as 30% of the file size. This file will be long, potentially very, very long, so rewriting it to another file every time this happens is not a viable option.
Rewriting it when the user saves could be an option, but it is still not that attractive.
I could stick a value at the start that says how long the file is supposed to be, and then overwrite the end to recycle the space. But if I wanted to continually update the data store file in case of a crash, this would mean rewriting the start over and over again, which I worry might be bad for flash drives. I could also recompute the end of the useful part of the file on load, by analyzing the pointer file, but in the meantime I would potentially be wasting all that space, and that is complicated.
Is there a simple call for this in the fstream API?
Am I using the wrong library? Note that I want to stick to something generic; STL is preferred, so I can keep the code as cross-platform as possible.
I can't seem to find it in the documentation, and I have looked for many hours. It is not the end of the earth, but it would make this a little simpler and potentially more efficient. Maybe I am just missing it somehow.
Thanks for your help
Andre’
Is there a simple call for this in the fstream API?
If you have a C++17 compiler, use std::filesystem::resize_file. In previous standards there was no such thing in the standard library.
With older compilers ... on Windows you can use SetFilePointer or SetFilePointerEx to set the current position to the size you want, then call SetEndOfFile. On Unixes you can use truncate or ftruncate. If you want portable code, you can use Boost.Filesystem; it is also the easiest to migrate from to std::filesystem later, because std::filesystem was mostly specified based on it.
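A minimal sketch of the C++17 route; the path and the new length here are placeholders, since in the question's setting they would come from the pointer file that tracks the still-referenced audio:
#include <filesystem>
#include <system_error>

int main()
{
    // Shrink the file to the end of the still-useful data.
    std::error_code ec;
    std::filesystem::resize_file("/home/user/my-audio/my-file.dat", 700000, ec);
    return ec ? 1 : 0;
}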
If you have a variable that contains your current position in the file, you can seek back past the length of your "unneeded chunk" and just continue to write from there.
// Somewhere at the beginning of your code:
std::ofstream file("/home/user/my-audio/my-file.dat", std::ios::binary);
// ...... long story of writing data .......
// Let's say we are at the one-millionth byte now (in the file):
int current_file_pos = 1000000;
// Your last chunk size:
int last_chunk_size = 12345;
// The chunk that you are saving:
char *last_chunk = get_audio_chunk_to_save();
// Writing the chunk:
file.write(last_chunk, last_chunk_size);
// Moving the position marker:
current_file_pos += last_chunk_size;
// Let's undo it now!
current_file_pos -= last_chunk_size;
file.seekp(current_file_pos);
// Now you can write new chunks from the place where you were before
// writing and undoing the last one!
// .....
// When you finally want to write the file to disk, you just close it:
file.close();
// And then truncate it to the size of current_file_pos
// (truncate comes from <unistd.h> on POSIX):
truncate("/home/user/my-audio/my-file.dat", current_file_pos);
Unfortunately, you'll have to write a cross-platform truncate function that calls SetEndOfFile on Windows and truncate on Linux. It's easy enough using preprocessor macros.
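For example, here is a minimal sketch of such a wrapper; the name truncate_file is my own, error handling is reduced to a bool, and the file is assumed to be closed before the call:
#include <string>

#ifdef _WIN32
#include <windows.h>

bool truncate_file(const std::string &path, long long new_size)
{
    // Open the file, move the file pointer to the desired size,
    // and declare that position to be the new end of file.
    HANDLE h = CreateFileA(path.c_str(), GENERIC_WRITE, 0, nullptr,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE)
        return false;
    LARGE_INTEGER pos;
    pos.QuadPart = new_size;
    bool ok = SetFilePointerEx(h, pos, nullptr, FILE_BEGIN) && SetEndOfFile(h);
    CloseHandle(h);
    return ok;
}
#else
#include <unistd.h>

bool truncate_file(const std::string &path, long long new_size)
{
    // POSIX truncate does the whole job in one call.
    return truncate(path.c_str(), new_size) == 0;
}
#endif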
I have code in this format:
srcSAXController control(input_filename.c_str());
std::string output_filename = input_filename;
output_filename = "c-" + output_filename.erase(input_filename.rfind(XML_STR));
std::ofstream myfile(output_filename.c_str());
coverage_handler handler(i == MAIN_POS ? true : false, output_filename);
control.parse(&handler);
myfile.write((char *)&control, sizeof(control));
myfile.close();
I want the content of the object 'control' to be written into my file. How can I fix the code above so that the content of the control object is written to the file?
In general you need much more than just writing the bytes of the object to be able to save and reload it.
The problem is named "serialization" and depending on a lot of factors there are several strategies.
For example, it's important to know whether you need to save and reload the object on the same system or whether you may need to reload it on a different system; it's also fundamental to know whether the object contains links to other objects, whether the link graph is a simple tree or possibly contains loops, whether you need to support versioning, etc.
Writing the raw bytes to disk, as the code is doing, is not going to work even for something as simple as an object containing a std::string.
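To illustrate the difference, here is a minimal sketch of explicit serialization for a hypothetical type with a std::string member; this is not the questioner's coverage_handler, just an illustration of why raw byte dumps fail for such members:
#include <cstdint>
#include <fstream>
#include <string>

struct Record {
    std::uint32_t id;
    std::string name;
};

void save(std::ofstream &out, const Record &r)
{
    // Write each field explicitly: fixed-size values as bytes,
    // the string as a length prefix followed by its characters.
    out.write(reinterpret_cast<const char *>(&r.id), sizeof r.id);
    std::uint32_t len = static_cast<std::uint32_t>(r.name.size());
    out.write(reinterpret_cast<const char *>(&len), sizeof len);
    out.write(r.name.data(), len);
}

Record load(std::ifstream &in)
{
    // Read the fields back in the same order they were written.
    Record r;
    in.read(reinterpret_cast<char *>(&r.id), sizeof r.id);
    std::uint32_t len = 0;
    in.read(reinterpret_cast<char *>(&len), sizeof len);
    r.name.resize(len);
    in.read(&r.name[0], len);
    return r;
}
Dumping the raw bytes of Record would instead store the string's internal pointers, which are meaningless on reload.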
I have a series of large text files (tens to hundreds of thousands of lines) that I want to parse line by line. The idea is to check whether a line contains a specific word/character/phrase and, for now, to record the line to a secondary file if it does.
The code I've used so far is:
std::string line;
std::ifstream infile1("c:/test/test.txt");
std::ofstream outfile1("c:/test/out.txt"); // output path assumed; not given in the original
while (getline(infile1, line)) {
    if (line.empty()) continue;
    if (line.find("mystring") != std::string::npos) {
        outfile1 << line << '\n';
    }
}
The end goal is to write those lines to a database. My thinking was to write them to a file first and then import that file.
The problem I'm facing is the time taken to complete the task. I'm looking to minimize that time as far as possible, so any suggestions for time savings on the read/write scenario above would be most welcome. Apologies if anything is obvious; I've only just started moving into C++.
Thanks
EDIT
I should say that I'm using VS2015
EDIT 2
So this was my own dumb fault: when I switched to Release and changed the architecture type, I saw noticeable speed increases. Thanks to everyone for pointing me in that direction. I'm also looking at the mmap stuff, and that's proving useful too. Thanks guys!
When you use ifstream to read and process really big files, you have to increase the default buffer size that is used (normally 512 bytes).
The best buffer size depends on your needs, but as a hint you can use the partition block size of the file(s) you're reading/writing. To find that information you can use a number of tools, or even code.
Example in Windows:
fsutil fsinfo ntfsinfo c:
Now, you have to give the ifstream a new buffer, like this:
size_t newBufferSize = 4 * 1024; // 4K
char *newBuffer = new char[newBufferSize];
ifstream infile1;
// The buffer must be installed before the file is opened.
infile1.rdbuf()->pubsetbuf(newBuffer, newBufferSize);
infile1.open("c:/test/test.txt");
string line;
while (getline(infile1, line)) {
    /* ... */
}
delete[] newBuffer; // delete[], not delete, since it was allocated with new[]
Do the same with the output stream, and don't forget to set the new buffer before opening the file, or it may not work.
You can play with the values to find the best size for you.
You'll notice the difference.
C-style I/O functions are much faster than fstream.
You may use fgets/fputs to read/write each text line.
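For instance, here is a minimal sketch of the same filtering loop written with C stdio; the input path and search string mirror the question, while the output path and buffer size are arbitrary choices:
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *in = fopen("c:/test/test.txt", "r");
    FILE *out = fopen("c:/test/out.txt", "w");
    char buf[4096];
    if (!in || !out)
        return 1;
    // Copy every line containing the search string to the output file.
    while (fgets(buf, sizeof buf, in)) {
        if (strstr(buf, "mystring"))
            fputs(buf, out);
    }
    fclose(in);
    fclose(out);
    return 0;
}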
I am trying to understand how basic file I/O is handled in C++ or C. My aim is to read a file line by line and send the lines across to a remote server. If a line is sent, I want to delete it from the file.
One way I tried was to keep a count of the lines read and to call system() to delete that many lines, using the bash command: sed -i -e 1,'count'd filename.
After that I continued reading the file, and surprisingly it worked as planned.
I have two questions:
Is this way reliable?
And why does this work at all? While reading the file I deleted a part of it, and yet reading continued as if nothing had happened. What if I did a seek to a previous position, what then?
Best, Digvijay
PS:
I would be glad if somebody could suggest a better way.
Also here is the code for the program I wrote:
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <cstdlib>

int main() {
    std::ifstream f;
    std::string line;
    std::stringstream ss;
    int i = 0;
    f.open("in.txt");
    if (f.is_open()) {
        while (getline(f, line)) {
            std::cout << line << std::endl;
            i++;
            if (i == 2) break;
        }
        ss << "sed -i -e 1," << i << "d in.txt";
        system(ss.str().c_str());
        while (getline(f, line)) {
            std::cout << line << std::endl;
        }
    }
    return 0;
}
Edit:
Firstly, thanks for taking the time to write answers. But here is some extra information I left out earlier. The files I am dealing with are log files, so they are constantly being appended with information from devices. The reason I want to avoid creating a copy is that the log files themselves are very big (at times), and this approach would help keep the log file short, since the logs are divided into parts and archived on the server.
Solution
I have found a way to deal with the problem. Apparently Thomas is right that sed creates a new file, so the old file remains as is. Using this, I can read n lines, call the system function, close the file pointer, and open the file again. I do this on small chunks of the log, repeatedly, until it becomes small and hence efficient to deal with. The server, meanwhile, archives the logs in 1 GB files.
However, I have a new question: due to memory constraints, I need to know if it is possible to split a log file in two efficiently. (Which would possibly be another question on SO.)
Most modern file systems don't support removing data from the beginning of a file, so deleting lines there requires rewriting everything after them, which is very inefficient.
The normal solution to your actual problem is to stop writing to your log file when it reaches some size, then start writing to a new file. The code that copies the files can delete a whole file once it has been written (this is an efficient operation).
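As an illustration, here is a minimal sketch of such size-based rotation; the class name, the 1 MB limit, and the numbering scheme are my own choices, not something from the question:
#include <cstddef>
#include <fstream>
#include <string>
#include <utility>

class RotatingLog {
public:
    explicit RotatingLog(std::string base) : base_(std::move(base)) { open_next(); }

    void write_line(const std::string &line)
    {
        out_ << line << '\n';
        bytes_ += line.size() + 1;
        if (bytes_ >= limit_) {
            out_.close();
            open_next(); // the uploader can archive and delete the closed file
        }
    }

private:
    void open_next()
    {
        out_.open(base_ + "." + std::to_string(index_++));
        bytes_ = 0;
    }

    std::string base_;
    std::ofstream out_;
    std::size_t bytes_ = 0;
    std::size_t limit_ = 1 << 20; // rotate at roughly 1 MB
    int index_ = 0;
};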
sed writes a new version of the file, while the program keeps reading the same version that it opened. This is the usual behavior of Unix and Linux when a program writes a file that another program has open.
You can see this for yourself with this small C program:
#include <stdlib.h>
#include <stdio.h>

int main(void) {
    FILE *f = fopen("in.txt", "r");
    while (1) {
        rewind(f);
        int lines = 0;
        int c;
        while ((c = getc(f)) != EOF)
            if (c == '\n')
                ++lines;
        printf("Number of lines in file: %d\n", lines);
    }
    return 0;
}
Run that program in one window, then use sed in another window to edit the file. The number of lines printed by the program will stay the same even after the file on disk has been edited, because Unix keeps the old, open version around even though it is no longer accessible to other programs.
As to your first question, how reliable your solution is: as far as I can see it should be reliable, except for the usual caveats about the system crashing or running out of memory in the middle of an update, someone else accessing the file, and of course all the problems with the system call. It is not very efficient, though, and for large data sets you might want to do it differently.
sujin's comment about using a temporary file for the lines you want to keep seems reasonable. It would be both faster and safer. Keep the original file, so if the system crashes you'll still have your data; wait until you have finished, then rename the old file to "in.txt.bak" and rename your temporary file to "in.txt".
First off, avoid system calls as much as you can (if possible, don't use them at all), as they create race conditions and other problems that drastically (and often detrimentally) affect the outcome of your program. This is especially true when access to files is involved.
Given your problem, there are a number of ways to do this, each with its own caveats.
I'll cover three possible solutions:
1) If the file is small enough:
you can read the entire thing into a data structure (vector, list, deque, etc.)
delete the original file
determine how many lines to read (and send off via your server protocol)
then write the remaining lines to a file with the original name.
If you intend to parallelize your program later on, this may be a better solution, provided that the file is small. Note: small is a relative term, but is generally limited by how much memory you have available.
2) If the file is quite large, or you're limited by memory constraints, you will have to get creative with buffers. Once you've read a line and successfully sent it off, determine where the file pointer is and copy the remaining information, up to the end of the current file, into a new file. Once done, close and delete the old file, then close the new file and rename it to the old file's name (see the sketch after this list).
3) If your solution doesn't have to be in C++, you can use shell scripting or (controversially) another language to get the job done.
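Here is a minimal sketch of option 2; send_line is a hypothetical function standing in for the server protocol, and error handling is kept brief:
#include <cstdio>
#include <fstream>
#include <string>

bool send_line(const std::string &line); // assumed to exist elsewhere

void send_and_trim(const std::string &path, int lines_to_send)
{
    std::ifstream in(path);
    std::ofstream out(path + ".tmp");
    std::string line;
    int sent = 0;
    bool sending = true;
    while (std::getline(in, line)) {
        if (sending && sent < lines_to_send && send_line(line)) {
            ++sent;          // line delivered, so it is dropped from the file
        } else {
            sending = false; // keep this line and every later one
            out << line << '\n';
        }
    }
    in.close();
    out.close();
    // The new file takes over the old file's name.
    std::remove(path.c_str());
    std::rename((path + ".tmp").c_str(), path.c_str());
}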
1) No, it's not reliable.
2) The C++ runtime library reads your file in blocks (internally), which are then parceled out to your (higher-level) input requests until the block(s) are exhausted, forcing it to (internally) read more blocks from disk. Since one or more physical blocks are read in before you make any call to sed, they cannot be altered if sed happens to change that first part of the file.
To see your code fail, you would need to make the input file big enough that some blocks have not yet been read in (internally, by the runtime library) before you call sed. By "fail" I mean your program would not see all the characters that were originally in the file before sed clobbered some lines.
As the other guys said, you have to write the records you need to another file after reading the original, and then delete the original. But for this application you may find a FIFO more useful than a file. If you are on a *NIX platform, look up the mkfifo command.
It is like a file, with the peculiarity that once a line has been read, it is gone.
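A minimal sketch of the idea on a POSIX system; "in.fifo" is a placeholder path, and a separate writer process is assumed to be appending lines:
#include <sys/stat.h>
#include <fstream>
#include <iostream>
#include <string>

int main()
{
    mkfifo("in.fifo", 0600);    // create the named pipe if it doesn't exist
    std::ifstream f("in.fifo"); // blocks until a writer opens the other end
    std::string line;
    while (std::getline(f, line))
        std::cout << line << '\n'; // each line is consumed; it cannot be re-read
    return 0;
}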
I have a C++ program that creates an output file "A" with ofstream. This file is then read by some legacy C code that opens the file with _iobuf. The legacy code then creates its own output file "B" using _iobuf, and this file is then read by the C++ program using ifstream. This sequence is iterated many times, with the same file names for A and B in each iteration.
Occasionally, the C++ program cannot open output file A for writing, and I must try several times before it succeeds. This occurs nondeterministically, maybe once in a thousand iterations. Note that the C program never has to wait to open its input or output file, nor does the C++ program ever have to wait to open its input file. This informal observation is based on hundreds of thousands of iterations.
I'm wondering whether this has something to do with mixing ofstream and _iobuf in the same program? Both the C++ code and the C code are linked into the same program, and the legacy C code is technically C++ code, but written in a very C-like style. Is there anything I can do to eliminate this waiting to open the ofstream file? I do not want to change the legacy code if I can possibly avoid it.
Pseudo code (not compiled):
void someObject::someMethod()
{
    for (int count = 0; count < someLimit; ++count)
    {
        newerObject::firstMethod();
        olderObject::secondMethod();
        newerObject::thirdMethod();
    }
}

void newerObject::firstMethod()
{
    // do some processing first
    // then write the results of the processing to a file
    ofstream A;
    A.open("A", ofstream::out); // this sometimes must be tried multiple times
    // write data to file A
    A.close();
}

void olderObject::secondMethod()
{
    FILE* f;
    f = fopen("A", "rt"); // this always works the first time
    // read data from file A
    fclose(f);
    // do some processing
    f = fopen("B", "w");
    // write data to file B
    fclose(f);
}

void newerObject::thirdMethod()
{
    ifstream B;
    B.open("B"); // this always works the first time
    // read data from file B
    B.close();
    // do some processing
}
Currently, as a workaround, I put the ofstream::open in a do-while loop. I would love to get rid of this awkwardness. Thanks in advance for any advice you can give.
First off, the problem is almost certainly not the use of different methods to access the files: under the hood, the C and C++ I/O functions use the same system I/O facilities. You seem to be using Windows (on other systems, files can typically be opened multiple times simultaneously), and I don't know much about that system, but I would suspect that the file system hasn't yet been updated to reflect that the file is closed when you try to open it. This may have to do with the "t" open flag: I don't know what it is about.
On UNIXes you can force I/O operations to wait until the actual change on disk has completed. Something like this could help avoid the problem, but it comes at the significant cost that operations become hideously slow. On UNIXes, one approach would be to blow away the file system entry the moment the file is opened successfully (after all, at this point its name isn't used anymore):
if (FILE* fp = fopen("file", "r")) {
    remove("file");
    // do processing
    fclose(fp); // the data disappears once the last open handle is closed
}
However, if I recall correctly, on Windows you can neither remove an open file nor rename it. Personally, in solving the problem I would proceed as follows:
Determine under which situations the file can't be opened, e.g. by keeping the file open and trying to open it again. This is mainly intended to create a setup where the problem is reproducible, so you can verify later that you have indeed found a solution.
Once I had found a way to reproduce the problem, I would probably have a better idea of the actual root cause, and possibly googling would help. In any case, this is the point where researching the root cause comes in.
Once the cause is understood, it is hopefully easy to devise a solution. If not, retrying the open until it succeeds may very well be the right approach.
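In that spirit, here is a minimal sketch of a bounded retry loop for the workaround the question already uses; the retry count and delay are arbitrary choices:
#include <chrono>
#include <fstream>
#include <thread>

std::ofstream open_with_retry(const char *path, int max_tries = 10)
{
    for (int attempt = 0; attempt < max_tries; ++attempt) {
        std::ofstream out(path);
        if (out.is_open())
            return out; // ofstream is movable in C++11 and later
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    return std::ofstream(); // caller must check is_open()
}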