When writing to a text file using std::ofstream::operator<<, I can write from the beginning or the end of the file, but I am unable to write from the middle.
I have this text file.
##scores##
player1: 50
player2: 10
player3: 80
Now, player2 gains 40 points, and I want to be able to write that in without touching the other players' scores. I could copy the whole text and then modify it inside my application, but that feels a bit hacky and slow. I wondered if there are any better ways? Something like:
std::ofstream::setpos(50);
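(For reference: the standard streams do offer this through seekp. A minimal sketch, assuming the file above with '\n' line endings; the offset is computed by hand, and this only works because the replacement is exactly as long as the old value:)

#include <fstream>

int main() {
    // std::ios::in prevents the open from truncating the file.
    std::fstream file("scores.txt", std::ios::in | std::ios::out);

    // Offset of player2's score, assuming '\n' line endings:
    // "##scores##\n" (11) + "player1: 50\n" (12) + "player2: " (9) = 32.
    file.seekp(32);

    // Overwrite in place. This works only because "50" has the same
    // length as the old "10"; seekp overwrites bytes, it cannot
    // insert or remove them.
    file << "50";
}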
I want to write to a text file with a limited size (1 KB, for example),
and when the file reaches the maximum size I want the first half to be removed and writing to continue (appending to what remained after removing the first half).
Is this possible in C++?
for example:
file that reached maximum size:
1 2 3 4 5 6
and I want to continue writing [7,9]
the new file will look like:
4 5 6 7 8 9
You could use a dedicated logging library, for example spdlog.
https://github.com/gabime/spdlog
It has a ton of features, including rotating logs.
You can define how big a log file should be and how many log files you want; it can automatically discard the older logs.
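For instance, a minimal sketch of a size-rotated logger, based on spdlog's documented rotating-file sink (the logger name, file name, and limits here are placeholders):

#include "spdlog/spdlog.h"
#include "spdlog/sinks/rotating_file_sink.h"

int main() {
    // Rotate when the file reaches ~1 KB, keeping at most 3 files
    // (mylog.txt, mylog.1.txt, mylog.2.txt); older data is discarded.
    auto logger = spdlog::rotating_logger_mt("mylogger", "mylog.txt", 1024, 3);
    for (int i = 1; i <= 9; ++i)
        logger->info("value {}", i);
}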
If you really want to write it yourself, you have to
keep the content you want to keep in memory, e.g. in a ring buffer.
Whenever something is added to that buffer, you have to rewrite the file and then flush it; otherwise the latest data is lost if the program crashes.
This can have a big performance impact, so handle with care.
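A minimal sketch of that idea, using std::deque to play the role of the ring buffer and dropping the oldest half when full (the file name and capacity are placeholders):

#include <deque>
#include <fstream>
#include <string>

void append_line(std::deque<std::string>& lines, const std::string& line,
                 std::size_t max_lines, const std::string& path) {
    // Drop the oldest half once the buffer is full, as the question asks.
    if (lines.size() >= max_lines)
        lines.erase(lines.begin(), lines.begin() + lines.size() / 2);
    lines.push_back(line);

    // Rewrite the whole file from the buffer, then flush so the data
    // survives a crash.
    std::ofstream out(path, std::ios::trunc);
    for (const auto& l : lines)
        out << l << '\n';
    out.flush();
}

int main() {
    std::deque<std::string> buffer;
    for (int i = 1; i <= 9; ++i)
        append_line(buffer, std::to_string(i), 6, "numbers.txt");
    // After writing 1..9 with capacity 6, the file holds 4 5 6 7 8 9.
}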
I am given a config file that looks like this, for example:
Start Simulator Configuration File
Version/Phase: 2.0
File Path: Test_2e.mdf
CPU Scheduling Code: SJF
Processor cycle time (msec): 10
Monitor display time (msec): 20
Hard drive cycle time (msec): 15
Printer cycle time (msec): 25
Keyboard cycle time (msec): 50
Mouse cycle time (msec): 10
Speaker cycle time (msec): 15
Log: Log to Both
Log File Path: logfile_1.lgf
End Simulator Configuration File
I am supposed to be able to take this file and output the cycles and cycle times to a log and/or the monitor. I am then supposed to pull data from a meta-data file that will tell me how many cycles each of these runs (among other things), and then I'm supposed to calculate and log the total time. For example, 5 hard drive cycles would be 75 msec. The config and meta-data files can come in any order.
I am thinking I will put each item in an array and then cycle through it, waiting for true when the strings match (this will also help detect file errors). The config file should always be the same size despite a different order. The metadata file can be any size, so I figured I would do a similar thing but with a vector.
Then I will multiply the cycle times from the config file by the number of cycles in the matching metadata file string. I think the best way to read the data from the vector is through a queue.
Does this sound like a good idea?
I understand most of the concepts, but my data structures knowledge is shaky when it comes to actually coding it. For example, when reading from the files, should I read line by line, or would it be best to separate the ints from the strings to calculate them later? I've never had to do this with a file whose contents can vary.
If I separate them, would I have to use separate arrays/vectors?
I'm using C++, by the way.
Your logic should be:
1. Create two std::map variables: one that maps a string to a string, and another that maps a string to a float.
2. Read each line of the file.
3. If the line contains a :, split the string into two parts:
3a. Part A is the substring from index zero up to (but not including) the index of the :.
3b. Part B is the substring starting one past the index of the :.
4. Use these two parts as key and value to store into your two std::map variables, based on the value's type.
Now you have read the file properly. When you read the meta-data file, you simply look up each key from the meta-data file in your configuration data (to get the corresponding value), then do whatever mathematical operation is required.
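A minimal sketch of that logic, assuming the config format shown above; since "based on the value type" is left open, this version simply tries std::stof and falls back to the string map (the file name is a placeholder):

#include <fstream>
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>

int main() {
    std::map<std::string, std::string> strings;
    std::map<std::string, float> numbers;

    std::ifstream in("simulator.cnf");  // hypothetical config file name
    std::string line;
    while (std::getline(in, line)) {
        auto colon = line.find(':');
        if (colon == std::string::npos)
            continue;  // skip the Start/End marker lines

        std::string key = line.substr(0, colon);     // part A
        std::string value = line.substr(colon + 1);  // part B
        if (!value.empty() && value.front() == ' ')
            value.erase(0, 1);                       // drop the space

        try {
            numbers[key] = std::stof(value);  // numeric values
        } catch (const std::invalid_argument&) {
            strings[key] = value;             // everything else
        }
    }

    // e.g. 5 hard drive cycles * 15 msec = 75 msec
    std::cout << 5 * numbers["Hard drive cycle time (msec)"] << " msec\n";
}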
I have a requirement to read a text file, but it's too large, so I decided to read only some lines from it. Can I use a seek method to jump to a given line? Then I could read just that line, since reading the whole file wastes a lot of time. If that's not possible, can anyone suggest a better solution? (Seek to a given line and read it. I know binary files are read byte by byte.)
ex of my file
event1 0
subevent 1
subevent 2
event2 3
(In my file, each event line displays a number of lines, which I want to use to seek back to the previous event.)
Yes, you can seek to a point in the file and then read from there. One possible problem: if the lines are all of different lengths, a random location in the file has a higher probability of falling in a longer line, so you are not getting evenly distributed probabilities across lines. If you really must have identical probabilities, then you need to make at least one pass over the file to find the start of each line; you can then store those offsets in a vector and randomly select a vector element to guide seeking to the line data in the file. If you only care a little bit, you can perhaps advance a small but random number of lines past the one you initially seek to. That evens the odds a bit and avoids the initial pass, but isn't perfect. hansmaad's comment adds a neat approach too, with perfect results and pretty good performance, but it requires that the lines be numbered in the file itself.
Unless each line has exactly the same length, you're going to have to scan through it.
If you want to jump around in it, you can scan through it, saving the offset of each line in a container of your choice, and then use that to seek to a specific line.
Assuming that the lines are variable / random length, I don't believe there is any built-in way to jump directly to the start of a particular line. You can seek to an arbitrary byte position in the file. However, this might land anywhere in the beginning / middle / end of a line.
My best suggestion would be to attack the problem in two steps:
First, make a complete pass through the file, byte by byte, searching for the start of each line. Record the byte position of each line and store it into an array, vector, etc. (Basically, you are creating an index that maps from line number to starting position.) Then, when you have this index built up, you can easily jump to a particular line, by looking up the position in your index.
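A minimal sketch of that two-step approach (the file name and the line picked are placeholders):

#include <fstream>
#include <iostream>
#include <string>
#include <vector>

int main() {
    std::ifstream in("events.txt");  // hypothetical file name

    // Pass 1: record the byte offset at which each line starts.
    std::vector<std::streampos> line_start;
    std::string line;
    do {
        line_start.push_back(in.tellg());
    } while (std::getline(in, line));
    line_start.pop_back();  // the last entry is past the final line

    // Pass 2: jump straight to, say, line 3 (0-based) and read it.
    in.clear();              // clear the EOF state left by the scan
    in.seekg(line_start[3]);
    std::getline(in, line);
    std::cout << line << '\n';
}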
As far as I know, there is no built-in way to seek to a new line without already knowing where the lines are. I can't tell you the best way to achieve your goal, because most of your question details how you're trying to accomplish it, not what it is you're actually trying to accomplish. Therefore, I might go one of two ways with this:
1) If you actually need every last bit of data from the file (there is no metadata or other information that can be discarded):
Someone mentioned scanning through the file, tracking the lines as you go and building an index with it so you can read in one line at a time. This might work, and it would be the way to go if you actually need each line in its entirety, or if you only need the line number and plan on reading in small pieces at a time from there. However, without knowing details about your constraints or requirements, I would not recommend reading in entire lines using this method for one main reason: I have no way of knowing that one line will not itself be too large to load (what if there is only one line in the file?).
Instead, I would simply allocate a buffer of a size that is an appropriate amount to process at a time, and process the file in chunks of that size until you reach the end. You can stream more data in as you go. Without additional details, I can't tell you what that magic number should be, but the size of the largest chunk of information you might need to process is a good starting point as a minimum. (There is a short sketch of this chunked style after option 2 below.)
2) If you don't need every last bit of data from the file (you can discard some of the information in it), then you only need some of it. If you only need select pieces of data, then they are easier to find if they are tagged (which is what XML is for). There are lots of free XML parsers, or you can write your own. Then you'd search for tags instead of arbitrary line numbers, and changes to the file that result in the data being in a different location won't affect your ability to find it if it's tagged, as it would if you're just going by line numbers.
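Sticking with option 1, a minimal sketch of that chunked style (the file name and buffer size are placeholders):

#include <fstream>
#include <iostream>
#include <vector>

int main() {
    std::ifstream in("big.txt", std::ios::binary);  // hypothetical file name

    std::vector<char> buffer(64 * 1024);  // 64 KB at a time; tune as needed
    std::streamsize total = 0;
    while (in) {
        in.read(buffer.data(), buffer.size());
        std::streamsize got = in.gcount();  // the last chunk is usually short
        // ... process buffer[0 .. got) here ...
        total += got;
    }
    std::cout << "processed " << total << " bytes\n";
}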
I've looked at ostream, istream, iostream, and fstream, and I still can't quite figure out how to simply delete lines from an input or output text file just as any clown can do with a text editor.
If I've got a file that reads, for example,
box 1\n
socks\n
box 2\n
pajamas\n
box 3\n
textbooks\n\eof
etc, and I want to totally delete the second box (the 3rd and 4th line) so that the text file reads
box 1\n
socks\n
box 3\n
textbooks\n\eof
I would evidently need to create a temporary file, and place only what I want to keep in that temp file, delete the original file, then rename the temp file with the name of the original file. This is fine and dandy and I understand that it's actually how all programs handle document editing. But it's a tedious pain in the ass to code.
What I'd REALLY love to be able to do is just manipulate what exists without jumping through all these hoops every time I manipulate my text, since I have a huge amount of stuff to sort through and edit.
So this is my hypothesis that I'd like some advice on. Would it be easier, upon opening the file, to store the entire contents of the file in a string, a vector, a dynamically allocated char array, or perhaps a stringstream, so that I can easily delete parts of it and rearrange it? I could then dump my edited text into a temp file, delete the original, and rename the temp file with the name of the original file.
Is there any validity to this, or is there a simpler way to do it? I'm tempted to use vectors as my first guess.
[EDIT] Keep in mind that the file I'm dealing with isn't quite so nicely organized to merit the use of structs for easy manipulation of chunks of data. It could be huge paragraphs of prose, or meaningless strings of digits.
If you have many lines and lots of changes, I'm tempted to use a std::list rather than a std::vector.
If you delete or insert often, then the lines must be rearranged. Doing that with a std::vector is more expensive than with a std::list.
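A minimal sketch of the whole read, edit, rewrite-to-temp cycle using std::list (the file names are placeholders):

#include <cstdio>
#include <fstream>
#include <iterator>
#include <list>
#include <string>

int main() {
    // Slurp the file into a list of lines.
    std::list<std::string> lines;
    {
        std::ifstream in("boxes.txt");
        std::string line;
        while (std::getline(in, line))
            lines.push_back(line);
    }

    // Delete the 3rd and 4th lines ("box 2" and "pajamas").
    auto it = std::next(lines.begin(), 2);
    it = lines.erase(it);  // erase returns the iterator after the erased node
    lines.erase(it);

    // Write a temp file, then swap it in for the original.
    {
        std::ofstream out("boxes.tmp");
        for (const auto& l : lines)
            out << l << '\n';
    }
    std::remove("boxes.txt");
    std::rename("boxes.tmp", "boxes.txt");
}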
I have 2 ~59 GB text files in ".fastq" format. FASTQ files are genomic read files from a sequencer; every 4 lines is a new read, but the lines are of variable size.
The file size is roughly 59 GB, and there are about 211M reads, which means, give or take, approximately 211M*4 = 844M lines. The program I'm using, Bowtie, currently has the ability to do the following options:
"--skip 105M --qupto 105M"
which essentially means "skip the first 105M reads and only process up to the next 105M reads." In this way you can break up processing of the file. The problem is, the way that it does the skipping is incredibly slow. It just reads the first 105M reads as it normally would, but doesn't process them. Then it starts comparisons once it gets to the read value it was given.
I am wondering if I can use something like C/C++'s fsetpos to set the position to the middle of the file [or wherever], which I realize will probably put me somewhere in the middle of a line; from there I can find the beginning of the first full read and start processing, rather than waiting for it to read approximately 422M lines until it gets where it needs to go. Does anybody have experience doing fsetpos on such a large file, and know whether the performance is any better than how it's currently done?
Thanks--
Nick
Yes, you can position to the middle of a file using C++.
For huge files, the performance is usually better than reading the data.
In general, the process for positioning within a file is:
1. A request is made to read the directory entry for the file.
2. The directory is searched to find the track and sector for the file position. (Note: some filesystems may have directory extensions for large files, so more data will need to be read.)
3. On the next read, the hard drive is told to go to the given track and sector, then reads in the data.
You save the time it would otherwise take for all the preceding data to pass through the communications port into memory (or be read and ignored).
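As a rough illustration (a sketch, not Bowtie's actual mechanism), here is how seeking to the middle and re-synchronizing to a record boundary might look with std::ifstream. One FASTQ caveat: a line starting with '@' can also be a quality line, so the sketch confirms a header by checking for the '+' separator two lines later:

#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::ifstream in("reads.fastq", std::ios::binary);  // hypothetical name

    // Jump to (roughly) the middle of the file.
    in.seekg(0, std::ios::end);
    std::streamoff size = in.tellg();
    in.seekg(size / 2);

    std::string line;
    std::getline(in, line);  // discard the (probably partial) landing line

    // Scan forward for a record start: an '@' line whose third line
    // is the '+' separator.
    std::streampos candidate;
    while (true) {
        candidate = in.tellg();
        if (!std::getline(in, line)) return 1;  // hit EOF without a match
        if (!line.empty() && line[0] == '@') {
            std::string seq, plus;
            if (std::getline(in, seq) && std::getline(in, plus) &&
                !plus.empty() && plus[0] == '+')
                break;               // candidate really is a header line
            in.clear();
            in.seekg(candidate);     // it was a quality line; step past it
            std::getline(in, line);
        }
    }

    // Rewind to the header and start processing full reads from here.
    in.clear();
    in.seekg(candidate);
    std::cout << "first full read starts at byte " << candidate << '\n';
}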