Append data columnwise in C/C++

I want to add columns of data to a text file, one column in each iteration (with one space between columns). If I open the file for appending, the next column is added at the bottom of the first column. Is it possible to append sideways?
Not all of the data is available at the start: only one column becomes available in each iteration, and it is lost in the next iteration.

Consider the file to be one long stream of characters, some of which just happen to be line breaks. Append always starts at the end of the file. If I'm reading you right, you need to use seekp() (seek to the new position at which to put characters) on your fstream to get to the right position before writing.
You know the format of your file, so you can calculate how far to skip on each line.
Something like this might work:
read line
while line != "":
    seek forward past the existing characters on this line
    write the new column
    read the next line
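As a minimal sketch of that idea, assuming each line was written padded to a fixed width (so the new value overwrites reserved spaces rather than the next line), Unix-style '\n' line endings, and a trailing newline on the last line:

#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes one new column into "data.txt", overwriting
// the padding that starts writeOffset characters into each line.
void appendColumn(const std::vector<std::string>& column, std::size_t writeOffset)
{
    std::fstream file("data.txt", std::ios::in | std::ios::out);
    std::string line;
    std::size_t row = 0;

    std::streampos lineStart = file.tellg();          // start of the current line
    while (row < column.size() && std::getline(file, line)) {
        std::streampos nextLine = file.tellg();       // start of the next line
        file.seekp(lineStart + std::streamoff(writeOffset));
        file << ' ' << column[row++];                 // overwrite the reserved padding
        file.seekg(nextLine);                         // resume reading at the next line
        lineStart = nextLine;
    }
}

Each iteration of the surrounding program would then call this once with the column that just became available.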


Reading a line of a text file from a specific position in C++

I would like to read a text file in C++ in the following manner:
Ignore the entire first line as it is simply meant as an introduction.
Only read the following lines from a specific position.
The starting position for reading is fixed and is the same for every line; however, the numbers after it may be of variable length. I need to save all of these numbers, from line 2 to line n, into an array.
At the moment I can read a regular 2D array with getline.
How can I work around these things?
An example for a line I want to read could be:
Person1: 25 988.3 0.0023 7
To set the file to a position, use std::ifstream::seekg().
To set the file to the beginning of a line, you must read and count the line endings. Many text files have variable length text lines.
"How can I work around these things?"
You can't, unless you can ensure that all of the data lines after the first line are all the same length.
If you can't ensure that, then all you can do is read through all of the preceding lines.
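To illustrate that straightforward route, here is a minimal sketch; the file name, the fixed starting column, and the container are assumptions:

#include <fstream>
#include <sstream>
#include <string>
#include <vector>

int main()
{
    std::ifstream in("data.txt");                 // assumed file name
    const std::size_t startPos = 9;               // assumed column where the numbers begin

    std::string line;
    std::getline(in, line);                       // ignore the introductory first line

    std::vector<std::vector<double>> rows;
    while (std::getline(in, line)) {
        if (line.size() <= startPos)
            continue;                             // skip short or blank lines
        std::istringstream nums(line.substr(startPos));
        std::vector<double> values;
        double v;
        while (nums >> v)
            values.push_back(v);                  // variable number of values per line
        rows.push_back(values);
    }
}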
An alternative I have employed in the past is to generate an 'index' of line start positions in a secondary file in binary format (so that I can jump directly to the right place in that file), and use that to jump to the right place in the text file. Of course, that means you need to regenerate the index file every time you replace or amend the data file.
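A rough sketch of that index idea, with the file names and the 64-bit offset format assumed: the first function records where every line starts, the second uses the index to jump straight to line n.

#include <cstdint>
#include <fstream>
#include <string>

// Build "data.txt.idx": one binary 64-bit offset per line of "data.txt".
void buildIndex()
{
    std::ifstream in("data.txt");
    std::ofstream idx("data.txt.idx", std::ios::binary);
    std::string line;
    std::int64_t offset = in.tellg();
    while (std::getline(in, line)) {
        idx.write(reinterpret_cast<const char*>(&offset), sizeof offset);
        offset = in.tellg();                           // start of the next line
    }
}

// Jump directly to line n (0-based) of "data.txt" using the index.
std::string readLineAt(std::size_t n)
{
    std::ifstream idx("data.txt.idx", std::ios::binary);
    idx.seekg(static_cast<std::streamoff>(n * sizeof(std::int64_t)));
    std::int64_t offset = 0;
    idx.read(reinterpret_cast<char*>(&offset), sizeof offset);

    std::ifstream in("data.txt");
    in.seekg(offset);                                  // jump straight to line n
    std::string line;
    std::getline(in, line);
    return line;
}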

Go back one line on a text file C++

My program reads a line from a text file using:
std::ifstream myReadFile("route.txt");
getline(myReadFile, line);
If it finds something I'm looking for (a tag), it stores that line in a temporary string. I want to continue this until I find some other tag; when I find that other tag, I want to be able to return to the previous line so the program can read it again and handle it as that other tag, doing something else with it.
I have been looking at putback() and unget(), but I'm confused about how to use them and whether they are the right answer.
Best would be to use a one-pass algorithm that stores in memory whatever it might need at the first tag, without going back.
If this is not possible, you can "bookmark" the stream position with tellg() and retrieve it later with seekg():
streampos oldpos = myReadFile.tellg(); // stores the position
....
myReadFile.seekg (oldpos); // get back to the position
If you read recursively embedded tags (html for example), you could even use a stack<streampos> to push and pop the positions while reading. However, be aware that performance is slowed down by such forward/backward accesses.
You mention putback() and unget(), but these are limited to a single char and are not suited to your getline() approach.
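A minimal sketch of that bookmark idea in a getline() loop; the tag string and what happens after seeking back are assumptions:

#include <fstream>
#include <string>

int main()
{
    std::ifstream myReadFile("route.txt");
    std::string line;

    std::streampos beforeLine = myReadFile.tellg();         // bookmark the start of the line
    while (std::getline(myReadFile, line)) {
        if (line.find("<other_tag>") != std::string::npos) { // assumed second tag
            myReadFile.seekg(beforeLine);                    // go back to the start of this line
            // ...let the code that handles <other_tag> read it again...
            break;
        }
        beforeLine = myReadFile.tellg();                     // bookmark the next line
    }
}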
The easiest thing by far, if you only ever want to roll back by one line, is always to keep track of the line you're on and the line before.
Maintain a cur variable that stores the current line, and prev that stores the previous one. When you move to the next line, you copy cur into prev, and read the new line into cur.
That way, you always have the previous line available.
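For example (the tag test is an assumption):

#include <fstream>
#include <string>

int main()
{
    std::ifstream in("route.txt");
    std::string cur, prev;

    while (std::getline(in, cur)) {
        if (cur.find("<other_tag>") != std::string::npos) {  // assumed second tag
            // prev still holds the previous line; reprocess it here
        }
        prev = cur;                                          // remember this line before moving on
    }
}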

c++ overwriting file data?

I am trying to run a program to replace certain data within a file. The relevant parts of the file attempting to be replaced look like the following:
1 Information 15e+10
2 Information 2e+16
3 Information 6e+2
And so on.
The files in question can be very large, in the multiple-gigabyte range, so to my understanding buffering the whole file and rewriting it is impossible/unreasonable. That is fine; I just want to replace the values (e.g. the 15e+10).
This all works fine with a simple ios::in|ios::out stream and tellp() if I am replacing the value with one of similar size (15e+10 -> 12e+12), or even a smaller one, since I can simply add an extra space that can be ignored down the line (e.g. 15e+10 -> 4e+10). But I run into a problem if I need to replace the value with one that is longer than what is already in the file (e.g. 6e+2 -> 16e+10): it writes over the newline character or starts overwriting the information on the next line.
I have searched the forums, and everyone says you can either overwrite within the file, append to the end of the file, or buffer and recreate the whole file. Is there any way I can overwrite the value correctly without having to recreate the file?
If not, how can I have two files open (one input, one output) to do this when the files in question are too large for memory?
Note: I would also like to avoid boost::, as I need to be able to run this on a system without the Boost library.
Open a stream to read from the input (IN) file and a second stream (OUT) to write to a new output (tmp) file.
Read from IN and write to OUT. When you get a value from IN that you want to replace write the replacement to OUT instead of the value you got from IN.
When parsing is complete replace the first file with the second (tmp) file.
Would this work for you?
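A rough sketch of that approach, assuming the three-column format shown in the question; the file names and the replacement rule are placeholders:

#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>

int main()
{
    std::ifstream in("data.txt");                  // assumed input file
    std::ofstream out("data.txt.tmp");             // temporary output file

    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        long id;
        std::string info, value;
        fields >> id >> info >> value;

        if (id == 3)                               // assumed condition for replacement
            value = "16e+10";                      // assumed new value

        out << id << ' ' << info << ' ' << value << '\n';
    }

    in.close();
    out.close();
    std::remove("data.txt");                       // replace the original with the tmp file
    std::rename("data.txt.tmp", "data.txt");
}

Since only one line is held in memory at a time, this works for files of any size.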
Use lseek()/fseek() to jump to a given position in a file.
You can use seekp to go to the location and rewrite it with <<
Example:
example.txt ( |?| = 1 byte of data )
|A|B|C|\n|1|2|3|D|E|F|\n|4|5|6|
//Somewhere in the code
fstream file;
file.open("example.txt", ios::in | ios::out);
//Somehow find the character distance and store it into "distance"
file.seekp(distance); //If distance = 0, it will go to "A", like rewind() but easier for me
If the distance is 4, the next character to be overwritten is 1:
file << "987";
And the file will be
|A|B|C|\n|9|8|7|D|E|F|\n|4|5|6|
BUT the only problem here is when you need to increase/decrease the size:
Increase:
You will overwrite the other characters, so you need to read the rest of the data into a temporary string first (or process it in smaller chunks if the data is too large), like this:
|A|B|C|\n|9|8|7|D|E|F|\n|4|5|6|
string tempstring;
file.seekg(distance);
getline(file, tempstring, '\0'); //read everything from this point into tempstring
file.seekp(distance);
file << content << tempstring; //content is the new data
Decrease:
The easiest solution is to write NULL characters (\0) over the excess space, like:
|A|B|C|\n|1|\0|\0|D|E|F|\n|4|5|6|
The only side effect is that the file size stays the same as before.

Splitting an ifstream in C++

I'm new to C++ and probably have a silly question. I have an ifstream which I'd like to split approximately in half.
The file in question is a sorted csv and I wish to search on the first value of each line of the file.
Eventually the file will be very large so I am trying to avoid having to read every line of the file.
e.g.
If the file contains 7 lines I'd like to split the ifstream to give 1 stream containing the first 3 lines and 1 stream containing the last 4 lines.
First, use the answer to this question to determine the size of your file. Then divide that number by two. Read the input line by line, and write it to the first output stream; check file.tellg() after each call. Once you're past the half-way point, switch the output to the second file.
This wouldn't split the strings evenly between the files, but the total number of characters in these strings should be close enough, and it wouldn't split your file in the middle of a string.
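A quick sketch of that splitting approach; the file names are assumptions, and the size is taken with std::ios::ate and tellg() rather than the method referenced above:

#include <fstream>
#include <string>

int main()
{
    std::ifstream in("input.csv", std::ios::ate);          // open at the end to get the size
    std::streamoff half = static_cast<std::streamoff>(in.tellg()) / 2;
    in.seekg(0);                                           // back to the beginning

    std::ofstream first("first_half.csv"), second("second_half.csv");
    std::ofstream* out = &first;

    std::string line;
    while (std::getline(in, line)) {
        *out << line << '\n';
        if (static_cast<std::streamoff>(in.tellg()) >= half)
            out = &second;                                 // past the midpoint: switch files
    }
}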
Think of it as a relational database with one huge table. In order to find a certain piece of data, you can either do a sequential scan over the entire table, or use an index (which must be usable for the type of query you want to perform).
A typical index for a text file would be a list of offsets inside the file, sorted by the index expression. If the csv file is sorted by a specific column already, then the offsets in the index would be ascending, which is useful to know when building the index.
So basically you have to read the file once anyway, to find out where lines end; this is the index for the sort column. To find a particular element, use a binary search, using the index to find individual elements in the data set.
Depending on the data type, you can extend your index to allow for quick comparison without reading the actual data table. For example, in a word list you could keep the first four letters of the word next to the offset, which allows you to get into the right area quickly and only requires data reads for the last accesses (which you can then optimize to a sequential scan, as filesystems handle that a lot better).
The same technique can be applied to the other columns as well; the offsets stored in the index would no longer be ascending in file order, of course.
Since it is CSV data, a special case also applies: if the only index you need is in the same order as the file data itself, and the end of a record can be determined easily (that is, you either have a fixed record length or a clear record separator such as an EOL character), then building the actual index can be omitted and the positions computed instead. For fixed-length records, the offset is simply the record length times the record number; for separated records, you can jump into the middle of a record and seek to the next terminator (be aware that there are nasty corner cases when combining this with a binary search). This does, however, mean that you will always be reading data pages, which is less efficient than just reading the index.
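As a sketch of that "jump into the middle and seek to the next terminator" step (assuming '\n' record separators), the helper below returns the first complete record starting at or after an arbitrary byte offset. A binary search over the file would call it with narrowing offsets and compare the record's first field against the key, handling the corner cases mentioned above (for example, landing exactly on a record boundary, or duplicate keys near the split point):

#include <fstream>
#include <string>

// Given an arbitrary byte offset, return the first complete record
// starting at or after it.
std::string recordAtOrAfter(std::ifstream& file, std::streamoff offset)
{
    std::string line;
    file.clear();                     // clear eof/fail bits left by a previous probe
    file.seekg(offset);
    if (offset != 0)
        std::getline(file, line);     // discard the record we landed in the middle of
                                      // (if we landed exactly on a record start, this
                                      // skips a whole record: one of the corner cases)
    std::getline(file, line);         // the next complete record
    return line;
}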

QTextStream Manipulation

I'm opening a file and getting a QTextStream of it. I am then reading the stream line by line using readLine(). When the line matches a certain string, I need to replace it with another string. I need the behaviour to be that the line is completely replaced (ie, if the line was "longword" and I replace it with "word", the line should contain "word" and "word" only).
At the moment I am using seek() and then the << operator to put my string in at the given location, but the remnants of the last string remain, so I am left with something like "wordword". How can I prevent this from happening and ensure the entire previous line is fully replaced with my new one?
To my knowledge, you cannot simply remove a chunk of a text file in-place. If the replacement string was identical in size, you might be able to replace those exact bytes, and if it were shorter you might be able to hack around the problem by filling the empty space with nulls.
If you didn't want to do that, you would have to create a new file, read each line from the old file, make any required changes to that line in memory, then write that line out to the new file. Once this is complete, you could then replace the original file with the new file.
If it were possible to add/remove chunks to/from the file, you would most likely be left with a considerably fragmented file on the HDD. If you needed to insert more characters, extra fragments would have to be created as the new data simply couldn't fit in the amount of space occupied by the old data, and removing data would leave holes in the file.
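A sketch of that rewrite-to-a-new-file approach using Qt's own classes; the file name, the matched line, and the replacement are assumptions:

#include <QFile>
#include <QTextStream>

void replaceLine(const QString& path)
{
    QFile inFile(path);
    QFile outFile(path + ".tmp");
    if (!inFile.open(QIODevice::ReadOnly | QIODevice::Text) ||
        !outFile.open(QIODevice::WriteOnly | QIODevice::Text))
        return;

    QTextStream in(&inFile);
    QTextStream out(&outFile);
    while (!in.atEnd()) {
        QString line = in.readLine();
        if (line == "longword")              // line to be replaced (assumed)
            line = "word";                   // full replacement, no leftover characters
        out << line << '\n';
    }

    inFile.close();
    outFile.close();
    QFile::remove(path);                     // swap the new file in for the original
    QFile::rename(path + ".tmp", path);
}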