Is there a way to control Cursor Point in std::fstream? - c++

I'm reading large amounts of data from .txt file(s), for example the data of 1000-5000 persons. The data is saved with an algorithm that lets me easily read it back, but to make this easier I'm looking for a way to control the file cursor position.
For example, in a console app you can use VT100 escape sequences like \033[3A (3 lines up), \033[2D (2 characters left), \0337 (save position).
So is there a way to control the cursor position like that?

std::fstream can be seen as a linear stream of single bytes. Because of this, there is no way to do something like "cursor up", as std::fstream has no knowledge of lines in the file.
What you can do is ask for the current position, with tellg when reading or tellp when writing.
After saving such positions, you can go back to them with seekg or seekp.
If you like, you can store the positions of line starts while reading the file and navigate later with these stored positions. Alternatively, if you modify the file in a random fashion and the file is not too big, you can read it completely into a data structure of your choice, modify the data in memory and write it back later.
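As a minimal sketch of the save/restore idea (roughly what \0337 does in a terminal), the helper below remembers the read position with tellg, reads one line, and then returns with seekg. The name peek_line is made up for this illustration, and the stream is assumed to be opened in binary mode so positions map directly to bytes.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: returns the next line without consuming it.
// It saves the read position, reads the line, then seeks back.
std::string peek_line(std::ifstream& in) {
    const std::ifstream::pos_type saved = in.tellg();  // "save cursor"
    std::string line;
    std::getline(in, line);
    in.clear();       // drop eofbit/failbit if the read hit the end of the file
    in.seekg(saved);  // "restore cursor"
    return line;
}
```

Calling peek_line twice in a row returns the same line both times, because the read position is put back after each call.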

Related

Most efficient way of frequently (every 5 secs) updating/maintaining a file/vector mirror?

1) I have a file whose contents are mirrored via a vector type container.
2) The file contents are checked (for changes) every 5 secs or so.
3) Any changes made to the file causes the vector to be updated, thus the mirror is maintained.
4) The contents of the vector are displayed on a screen in real time.
This problem must come up a lot, but I didn't find a satisfactory answer. It could be that the answer just happens to be unsatisfactory (the two are not mutually exclusive), but let's see...
Possible Solutions:
Using basic C++ and the STL only.
1) File Data Length or Last Read Position.
After each read, store last read position.
Any new reads start from last read position.
Cons:
Any changes to existing file data will remain undetected.
2) Hash Check.
After each read, store the hash of each file line, which can be used later to check/read new file data.
Pro:
Any changes to file contents are reflected in the vector.
Con:
Every file line has to be read, hashed and stored... twice!
Overhead as file grows in size.
3) No Checks.
Don't check anything just read the entire file and overwrite the vector each time, regardless of changes to the file contents.
Pro:
Any changes to the file contents will be reflected in the vector.
Con:
?
If you are on Windows, just let the OS notify you of changes in the monitored folder.
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365261(v=vs.85).aspx
This way you do not need to poll, and you also get information about changes to file attributes (e.g. last write).
Once you know an actual change has happened, you can read/update.
I'm not sure how it's done on other operating systems.
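If polling is acceptable and only standard C++ is wanted, one portable fallback is to compare the file's last-write timestamp before re-reading. The sketch below assumes C++17 (std::filesystem); the FileMirror type and its member names are invented for this example. A change that happens within the timestamp resolution of the file system may go unnoticed until the next write.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical mirror: keeps the file's lines in a vector and reloads
// them only when the last-write timestamp differs from the stored one.
struct FileMirror {
    std::filesystem::path path;
    std::vector<std::string> lines;
    std::filesystem::file_time_type last_write{};
    bool loaded = false;

    // Returns true if the file was (re)loaded, false if nothing changed
    // or the file could not be read.
    bool refresh() {
        std::error_code ec;
        const auto stamp = std::filesystem::last_write_time(path, ec);
        if (ec) return false;
        if (loaded && stamp == last_write) return false;

        std::ifstream in(path);
        if (!in) return false;
        std::vector<std::string> fresh;
        for (std::string line; std::getline(in, line);) fresh.push_back(line);

        lines.swap(fresh);
        last_write = stamp;
        loaded = true;
        return true;
    }
};
```

Calling refresh() from the existing 5-second timer would then re-read the whole file (solution 3) only when the timestamp has moved.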

Can I perform gzseek to update a file compressed using gzwrite (CPP)?

I have a file written using gzwrite. Now I want to edit this file and insert some data in the middle by seeking. Is this possible with gzseek/gzwrite in C++?
No, it isn't possible. You have to create a new file by successively writing the pieces.
So it is not much different from inserting data in the middle of an uncompressed file, except for one thing: with the uncompressed file, you could leave a hole of the right size (a series of spaces, for example) and later on overwrite that with the data to be inserted, but of course that is not possible with the compressed file because you cannot predict its compressed length.

C++ Read only random lines in a file

I have a requirement to read a text file, but it is too large, so I decided to read only some lines of it. Can I use a seek method to jump to a given line? Then I could read only that line; because the text file is so large, reading the whole file wastes a lot of time. If that is not possible, can anyone suggest a better solution? (Seek to a given line and read it.) (I know binary/text files are read byte by byte.)
Example of my file:
event1 0
subevent 1
subevent 2
event2 3
(In my file, each event name is followed by a number of lines; I want to seek to the previous event.)
Yes, you can seek to a point in the file then read from there. One possible problem is that if the lines are all different lengths, a random location in the file will have a higher probability of being in a longer line: you're not getting evenly distributed probabilities of different lines. If you really really must have identical probabilities then you need to make at least one pass over the file to find the start of each line - then you can store those offsets in a vector and randomly select a vector element to guide seeking to the line data in the file. If you only care a little bit, then you can perhaps advance a small but random number of lines past the one you initially seek to... that will even the odds a bit, avoids the initial pass, but isn't perfect. hansmaad's comment adds a neat approach too - perfect results with pretty-good performance - but requires that you have all the lines numbered in the file itself.
Unless each line has exactly the same length, you're going to have to scan through it.
If you want to jump around in it, you can scan through it, saving the offset of each line in a container of your choice, and then use that to seek to a specific line.
Assuming that the lines are variable / random length, I don't believe there is any built-in way to jump directly to the start of a particular line. You can seek to an arbitrary byte position in the file. However, this might land anywhere in the beginning / middle / end of a line.
My best suggestion would be to attack the problem in two steps:
First, make a complete pass through the file, byte by byte, searching for the start of each line. Record the byte position of each line and store it into an array, vector, etc. (Basically, you are creating an index that maps from line number to starting position.) Then, when you have this index built up, you can easily jump to a particular line, by looking up the position in your index.
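The two steps above can be sketched as follows. The function names build_line_index and read_line are made up for this illustration, and the file is assumed to be opened in binary mode so that the recorded positions are plain byte offsets.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Step 1: one pass over the file, recording where each line starts.
std::vector<std::streampos> build_line_index(std::ifstream& in) {
    std::vector<std::streampos> index;
    in.clear();
    in.seekg(0, std::ios::beg);
    std::string line;
    for (;;) {
        const std::streampos start = in.tellg();  // position before the line
        if (!std::getline(in, line)) break;       // no more lines
        index.push_back(start);
    }
    in.clear();  // the loop ends with eofbit/failbit set
    return index;
}

// Step 2: jump straight to line `n` (0-based) using the index.
bool read_line(std::ifstream& in, const std::vector<std::streampos>& index,
               std::size_t n, std::string& out) {
    if (n >= index.size()) return false;
    in.clear();
    in.seekg(index[n]);
    return static_cast<bool>(std::getline(in, out));
}
```

The index costs one position per line, so for very large files it may be worth storing only every k-th line start and reading forward from there.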
As far as I know, there is no built-in way to seek to a new line without already knowing where the lines are. I can't tell you the best way to achieve your goal, because most of your question details how you're trying to accomplish it, not what it is you're actually trying to accomplish. Therefore, I might go one of two ways with this:
1) If you actually need every last bit of data from the file (there is no metadata or other information that can be discarded):
Someone mentioned scanning through the file, tracking the lines as you go and building an index with it so you can read in one line at a time. This might work, and it would be the way to go if you actually need each line in its entirety, or if you only need the line number and plan on reading in small pieces at a time from there. However, without knowing details about your constraints or requirements, I would not recommend reading in entire lines using this method for one main reason: I have no way of knowing that one line will not itself be too large to load (what if there is only one line in the file?).
Instead, I would simply allocate a buffer of a size that is an appropriate amount to process at a time, and process the file in chunks of that size until you reach the end. You can stream more data in as you go. Without additional details, I can't tell you what that magic number should be, but the size of the largest chunk of information you might need to process is a good starting point as a minimum.
2) If you don't need every last bit of data from the file (you can discard some of the information in it), then you only need some of it. If you only need select pieces of data, then they are easier to find if they are tagged (which is what XML is for). There are lots of free XML parsers, or you can write your own. Then you'd search for tags instead of arbitrary line numbers, and changes to the file that result in the data being in a different location won't affect your ability to find it if it's tagged, as it would if you're just going by line numbers.
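For the chunked approach in option 1, a minimal sketch might look like this. The "processing" here is just counting newline bytes so that the example stays self-contained; the function name and the choice of chunk size are assumptions, not something from the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Reads the file in fixed-size chunks so memory use stays bounded no
// matter how long an individual line is. Returns the number of '\n' bytes.
std::size_t count_newlines_chunked(const std::string& path,
                                   std::size_t chunk_size) {
    if (chunk_size == 0) return 0;
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buffer(chunk_size);
    std::size_t newlines = 0;
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::streamsize got = in.gcount();  // may be short on the last chunk
        newlines += static_cast<std::size_t>(
            std::count(buffer.begin(), buffer.begin() + got, '\n'));
    }
    return newlines;
}
```

Real processing that works on records spanning a chunk boundary would need to carry the unfinished remainder over to the next chunk.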

C++ write to a specific part of text file without overwriting

I'm writing a simple calendar application which saves the data in a text file. I use the iCalendar-format, so my text file ends "END:VCALENDAR".
When the user adds a new event, the application should write the associated data at the end of the text file without overwriting "END:VCALENDAR", how can I do this? What about deleting an event which is saved in the middle of the text file? Is there a need to write the whole file again using the updated data? Many thanks.
You can't dynamically "expand" the file by writing in the middle of it.
You'll need to, either:
Deserialize the whole calendar to memory, then write it back (best option)
Read into memory everything which lies past the point where you want to insert the data, write your data, then write the stored file "tail"
There isn't any way of inserting into the middle of a file; the underlying OS doesn't support it. The usual technique is to copy the file into a temporary file, making whatever modifications you need to along the line, then (and only if there are no errors on the output of the copy—do verify that the output stream has not failed after the close) delete the input file and rename/move the temporary file to the original name.
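A rough sketch of that copy-and-rename technique, using only the standard library. The function name, the ".tmp" suffix and the byte-offset parameter are assumptions made for this illustration; the byte-by-byte copy of the head is kept simple rather than fast.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Inserts `text` before byte `offset` of `path` by writing a modified copy
// to a temporary file and then replacing the original with it.
bool insert_via_temp_file(const std::string& path, std::streamoff offset,
                          const std::string& text) {
    const std::string tmp_path = path + ".tmp";  // hypothetical temp name
    {
        std::ifstream in(path, std::ios::binary);
        if (!in) return false;
        std::ofstream out(tmp_path, std::ios::binary | std::ios::trunc);
        if (!out) return false;

        // Copy the bytes that come before the insertion point.
        char ch;
        std::streamoff copied = 0;
        while (copied < offset && in.get(ch)) {
            out.put(ch);
            ++copied;
        }
        out << text;
        // Copy the tail, if any (streaming an empty buffer would set failbit).
        if (in && in.peek() != std::char_traits<char>::eof()) out << in.rdbuf();

        // Verify the output only after closing it, as the answer advises.
        out.close();
        if (copied != offset || out.fail()) {
            std::remove(tmp_path.c_str());
            return false;
        }
    }
    // Only now touch the original: delete it and move the copy into place.
    if (std::remove(path.c_str()) != 0) return false;
    return std::rename(tmp_path.c_str(), path.c_str()) == 0;
}
```

Deleting an event from the middle of the file follows the same pattern, except that the copy skips a range of bytes instead of adding some.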
There is no method supported by the C++ libraries that, unlike append, gives an option to insert at any specific position into a file; be it a text or a binary file.
There are two options for you then:
First is the one you are presuming, that is, read the whole file, update the data and write it back again.
Second is to seek in the file to the last line's first character E as in END:VCALENDAR, write your event and then append "END:VCALENDAR" to it.
And yes, you can find that first character of last line, E right after the last newline character, programmatically.
Sorry, there isn't really any other way around, as far as I know.
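The second option could be sketched like this: open the file for reading and writing, locate the final "END:VCALENDAR" in the last few bytes, and overwrite from there with the new event followed by the original ending. The function name and the 64-byte tail window are assumptions for this example; it presumes the marker really is on the last line.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>

// Writes `event_text` in front of the trailing END:VCALENDAR line.
// Only the tail of the file is read and rewritten.
bool append_event(const std::string& path, const std::string& event_text) {
    const std::string marker = "END:VCALENDAR";
    std::fstream file(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!file) return false;

    // Read the last few bytes of the file.
    file.seekg(0, std::ios::end);
    const std::streamoff size = file.tellg();
    const std::streamoff tail_len = std::min<std::streamoff>(size, 64);
    std::string tail(static_cast<std::size_t>(tail_len), '\0');
    file.seekg(size - tail_len);
    file.read(&tail[0], tail_len);
    if (!file) return false;

    const std::size_t found = tail.rfind(marker);
    if (found == std::string::npos) return false;

    // Seek to the 'E' of the marker, then write the event and the old ending.
    file.seekp(size - tail_len + static_cast<std::streamoff>(found));
    file << event_text << tail.substr(found);
    file.flush();
    return static_cast<bool>(file);
}
```

Because the new content is always at least as long as what it overwrites, the file never needs to shrink, which is what makes this in-place variant possible.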

skipping first half of a 59GB fastq file to process last half: read line-by-line, or fgetpos?

I have 2 ~59GB text files in ".fastq" format. fastq files are genomics read files from a sequencer. Every 4 lines is a new read, but the lines are of variable size.
The file size is roughly 59GB, and there are about 211M reads, which means, give or take, approximately 211M*4 = 844M lines. The program I'm using, Bowtie, currently supports the following options:
"--skip 105M --qupto 105M"
which essentially means "skip the first 105M reads and only process up to the next 105M reads." In this way you can break up processing of the file. The problem is, the way that it does the skipping is incredibly slow. It just reads the first 105M reads as it normally would, but doesn't process them. Then it starts comparisons once it gets to the read value it was given.
I am wondering if I can use something like C/C++'s fsetpos to set the position to the middle of the file [or wherever] which I realize will probably put me somewhere in the middle of a line, and then from there find the beginning of the first full read to start processing rather than waiting for it to read approximately 422M lines until it gets where it needs to go. Does anybody have experience doing fsetpos on such a large file, and know whether or not the performance is any better than it is how it's currently doing it?
Thanks--
Nick
Yes, you can position to the middle of a file using C++.
For huge files, the performance is usually better than reading the data.
In general, the process for positioning within a file:
A request is made to read the directory entry for the file.
The directory is searched to find the track and sector for the file position. Note: Some filesystems may have directory extensions for large files, thus more data will need to be read.
On the next read, the hard drive is told to go to the given track and sector, then read in data.
You save the time it would take for all the preceding data to pass through the communications port and into memory (only to be ignored).
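As a sketch of what the question describes (seek to an arbitrary offset, then find the next full read), the function below assumes the strict 4-line FASTQ layout mentioned in the question. It uses a heuristic that is not taken from Bowtie: a record start is a line beginning with '@' whose second-following line begins with '+'. The name find_record_start is made up for this example.

```cpp
#include <fstream>
#include <string>

// Seeks near `offset` and returns the position of the first complete
// 4-line record that starts at or after it, or -1 if none is found.
std::streampos find_record_start(std::ifstream& in, std::streamoff offset) {
    std::string l0, l1, l2;
    in.clear();
    if (offset > 0) {
        // Start one byte early and skip to the end of that line, so an
        // offset that already sits on a line start is not skipped.
        in.seekg(offset - 1);
        std::getline(in, l0);
    } else {
        in.seekg(0);
    }
    for (;;) {
        const std::streampos p0 = in.tellg();  // start of candidate line
        if (!std::getline(in, l0)) return std::streampos(-1);
        const std::streampos p1 = in.tellg();  // start of the following line
        if (!std::getline(in, l1) || !std::getline(in, l2)) {
            return std::streampos(-1);
        }
        // A quality line may also begin with '@', but then the line two
        // below it is a sequence line, which never begins with '+'.
        if (!l0.empty() && l0[0] == '@' && !l2.empty() && l2[0] == '+') {
            return p0;
        }
        in.clear();
        in.seekg(p1);  // slide the window forward by one line
    }
}
```

The seek itself does not read the skipped data, so only a handful of lines near the target offset are touched before processing can begin.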