How to read data from a random place in a txt file - C++

I am coding a ranking system for a game written in C++ and want to read a word from a random place in a file.
For example, if the last two lines of a file are:
.
.
.
dani 1902
pat 1300
and I read 1300, how can I get back to read 1902?

When reading information from a file, opening and closing it a bunch of times in succession is not good code design. In terms of CPU cycles, cache, etc., accessing a file on a hard drive is slow compared to accessing data that is already in RAM or cache. So my advice to you would be the same as what David C. Rankin mentioned in the comments above...
Read all of the file's information into either a single large string buffer or a vector of strings, where each string in the vector is a single line of text from the file. Then close the file after all of the contents have been read and be done with it.
This makes it a one-time find-file, open-file, and read-file operation.
Afterwards it is a matter of parsing your string or vector of strings, which is much faster because those containers are already in RAM where the application is running, since they are part of the same process or running thread.
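For example, a minimal sketch of that one-pass read, assuming the ranking file is called "scores.txt" (a placeholder name):

    #include <fstream>
    #include <iostream>
    #include <string>
    #include <vector>

    int main() {
        std::ifstream in("scores.txt");
        if (!in) {
            std::cerr << "could not open scores.txt\n";
            return 1;
        }

        // One open/read pass: pull every line into memory, then close the file.
        std::vector<std::string> lines;
        for (std::string line; std::getline(in, line); )
            lines.push_back(line);
        in.close();

        // Any line can now be revisited in any order, e.g. the second-to-last entry.
        if (lines.size() >= 2)
            std::cout << "previous entry: " << lines[lines.size() - 2] << '\n';
    }

From there, pulling "dani 1902" apart into a name and a score is a simple string or stringstream operation on the line you already have in memory.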

Related

One large file, or several small files?

I'm writing 3D model data out to a file, which includes a lot of different types of information (meshes, textures, animation, etc.) and would be about 50 to 100 MB in size.
I want to put all this in a single file, but I'm afraid it will cost me if I need to read only a small portion of that file to get what I want.
Should I be using multiple smaller files for this, or is a single very large file okay? I don't know how the filesystem treats trying to jump around giant files, so for all I know iterating through a large file may either be costly, or no problem at all.
Also, is there anything special I must do if using a single large file?
There is no issue with accessing data in the middle of a file - the operating system won't need to read the entire file; it can skip to any point easily. Where the complexity comes in is that you'll need to provide an index that can be read to identify where the various pieces of data are.
For example, if you want to read a particular animation, you'll need a way to tell your program where this data is in the file. One way would be to store an index structure at the beginning of the file, which your program would read to find out where all of the pieces of data are. It could then look up the animation in this index, discover that it's at position 24680 and is 2048 bytes long, and it could then seek to this position to read the data.
You might want to look up the fseek call if you're not familiar with seeking within a file: http://www.cplusplus.com/reference/cstdio/fseek/
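A hedged sketch of that seek-by-index idea, assuming a hypothetical layout where each index entry simply records the offset and size of one blob (the struct and file name here are illustrative, not a real format):

    #include <cstdio>
    #include <vector>

    struct IndexEntry {   // assumed index record: where one piece of data lives
        long offset;      // byte position within the model file
        long size;        // length of the blob in bytes
    };

    std::vector<char> readBlob(const char* path, const IndexEntry& entry) {
        std::FILE* f = std::fopen(path, "rb");
        if (!f) return {};

        std::vector<char> data(entry.size);
        std::fseek(f, entry.offset, SEEK_SET);              // jump straight to the blob
        std::size_t got = std::fread(data.data(), 1, data.size(), f);
        data.resize(got);                                   // keep only what was actually read
        std::fclose(f);
        return data;
    }

So reading one animation out of a 100 MB file only ever touches the index plus that animation's bytes.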

C++ Write Data to Random HDD Sector

I need to write a program using C++ which is able to perform data write/read to/from both random and sequential hard disk sectors.
However, I am actually confused about the term sector and its relation to a file.
What I want to know is, if I simply:
1. Create a string contains word "Hello, world" and then;
2. Save the string into "myfile.txt",
is the data written to a sequential or a random sector? If it is sequential (I guess), then how can I write the string to a random hard disk sector and then read it again? And also vice versa.
What you are trying to do is pretty much impossible today because of file systems. If you want a file (which you seem to), you need a file system. A file system then places the data, in some format it chooses, on the sectors it thinks are best. Advanced filesystems such as btrfs and zfs also do compression, checksumming, and placement of data across multiple hard disks. So you can't just write to a sector, because you would likely destroy data, and you couldn't read it back anyway because your file system doesn't understand your data format. It also wouldn't even know that there is data there, because files must be registered in the MFT/btrfs metadata/... tables.
TL;DR Don't try to do it, it will mess up your system and it won't work.

Remove A Line Of Text With Filestreams (C++)

I have a large text file.
Each time my program runs, it needs to read in the first line, remove it, and put that data back into the bottom of the file.
Is there a way to accomplish this task without having to read in every part of the file?
It would be great to follow this example of pseudo code:
1. Open file stream for reading/writing
2. data = first line of file
3. remove first line from file <-- can I do this?
4. Close file stream
5. Open file stream for appending
6. write data to file
7. Close file stream
The reason I'm trying to avoid reading everything in is because the program runs at a specific time each day. I don't want the delay to be longer each time the file gets bigger.
All the solutions I've found require that the program process the whole file. If C++ filestreams are not able to accomplish this, I'm up for whatever alternative is quick and efficient for my C++ program to execute.
thanks.
The unfortunate truth is that no filesystem on a modern OS is designed to do this. The only way to remove something from the beginning of a file is to copy everything except that first part into a new file. There's simply no way to do precisely what you want to do.
But hopefully you can do a bit of redesign. Maybe each entry could be a record in a database -- then the reordering can be done very efficiently. Or perhaps the file could contain fixed-size records, and you could use a second file of indexes to specify record order, so that rearranging the file was just a matter of updating the indices.
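For reference, a sketch of that copy-based rotation, assuming the file is called "queue.txt" (a placeholder name); the whole remainder of the file does get streamed through, which is exactly the cost described above:

    #include <cstdio>
    #include <fstream>
    #include <string>

    int main() {
        std::ifstream in("queue.txt");
        std::ofstream out("queue.tmp");
        if (!in || !out) return 1;

        std::string first;
        std::getline(in, first);              // the line to move to the bottom

        // Stream the rest of the file across - this copy is unavoidable.
        if (in.peek() != EOF)
            out << in.rdbuf();
        out << first << '\n';

        in.close();
        out.close();
        std::remove("queue.txt");
        std::rename("queue.tmp", "queue.txt");
    }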

Truncating the file in c++

I am writing a program in C++ and wonder if anyone can help me with the situation explained here.
Suppose, I have a log file of about size 30MB, I have copied last 2MB of file to a buffer within the program.
I delete the file (or clear the contents) and then write back my 2MB to the file.
Everything works fine up to here. But the concern is that I read the last 2MB, clear the whole 30MB file, and then write the 2MB back.
Far too much time will be needed in a scenario where I am copying the last 300MB of a 1GB file.
Does anyone have an idea of making this process simpler?
When you have a large log file, the following points should be considered.
Disk space: Log files are uncompressed plain text and consume large amounts of space. Typical compression reduces the file size by 10:1. However, a file cannot be compressed while it is in use (locked), so a log file must be rotated out of use.
System resources: Opening and closing a file regularly consumes a lot of system resources and reduces the performance of the server.
File size: Small files are easier to back up and restore in case of a failure.
I just do not want to copy, clear, and re-write the last specific lines to a file - just a simpler process... :-)
EDIT: I am not building any in-house process to support log rotation.
logrotate is the tool.
I would suggest a slightly different approach:
Create a new temporary file
Copy the required data from the original file to the temporary file
Close both files
Delete the original file
Rename the temp file to the same name as the original file
To improve the performance of the copy, you can copy the data in chunks; play around with the chunk size to find the optimal value.
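A rough sketch of those steps, assuming we want to keep the last 2MB of a log called "app.log" (both the name and the sizes are placeholders):

    #include <cstdio>
    #include <fstream>
    #include <vector>

    int main() {
        const std::streamoff keep = 2 * 1024 * 1024;   // bytes to keep from the end

        std::ifstream in("app.log", std::ios::binary | std::ios::ate);
        if (!in) return 1;

        std::streamoff size = in.tellg();
        in.seekg(size > keep ? size - keep : 0);       // jump straight to the tail

        std::ofstream out("app.log.tmp", std::ios::binary);

        // Copy in fixed-size chunks; tune the chunk size for your disk.
        std::vector<char> buf(64 * 1024);
        while (in.read(buf.data(), buf.size()) || in.gcount() > 0)
            out.write(buf.data(), in.gcount());

        in.close();
        out.close();
        std::remove("app.log");
        std::rename("app.log.tmp", "app.log");
    }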
If this is your file before:
-----------------++++
Where - is what you don't want and + is what you do want, the most portable way of getting:
++++
...is just as you said: read in the section you want (+), delete/clear the file (as with fopen(..., "wb") or something similar), and write out the bit you want (+).
Anything more complicated requires OS-specific help and isn't portable. Unfortunately, I don't believe any major OS out there has support for what you want. There might be support for "truncate after position X" (a sort of head), but not the tail-like operation you're requesting.
Such an operation would be difficult to implement, as varying block sizes on filesystems (if the filesystem has a block size) would cause trouble. At best, you'd be limited to cutting on block-size boundaries, and even that would be hairy. This is such a rare case that it is probably why such an operation is not directly supported.
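For what it's worth, the "keep the head" direction has been portable since C++17 via std::filesystem::resize_file; it is only the "drop the head" direction that has no portable equivalent. A tiny sketch (file name and size are placeholders):

    #include <filesystem>

    int main() {
        // Keep only the first 1024 bytes of "data.bin"; everything after is discarded.
        std::filesystem::resize_file("data.bin", 1024);
    }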
A better approach might be not to let the file grow that big but rather use rotating log files with a set maximum size per log file and a maximum number of old files being kept.
If you can control the writing process, what you probably want to do here is to write to the file like a circular buffer. That way you can keep the last X bytes of data without having to do what you're suggesting at all.
Even if you can't control the writing process, if you can at least control what file it writes to, then maybe you could get it to write to a named pipe. You could attach your own program at the end of this named pipe that writes to a circular buffer as discussed.
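A hedged sketch of that circular-buffer idea, assuming you control the writer; the class name, file name, and capacity below are made up for illustration:

    #include <fstream>
    #include <string>

    class RingLog {
    public:
        RingLog(const std::string& path, std::streamoff capacity)
            : out_(path, std::ios::binary | std::ios::in | std::ios::out | std::ios::trunc),
              capacity_(capacity) {}

        void write(const std::string& msg) {
            // Wrap back to the start instead of letting the file grow past capacity.
            if (pos_ + static_cast<std::streamoff>(msg.size()) > capacity_)
                pos_ = 0;
            out_.seekp(pos_);
            out_.write(msg.data(), static_cast<std::streamsize>(msg.size()));
            pos_ = out_.tellp();
        }

    private:
        std::fstream out_;
        std::streamoff capacity_;
        std::streamoff pos_ = 0;
    };

A reader then needs to know where the newest data starts, so a real implementation would typically also store the current write position in a small header at the front of the file.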

How do you process a large data file with size such as 10G?

I found this open question online: how do you process a large data file with a size such as 10G?
It seems to be an interview question. Is there a systematic way to answer this type of question?
If you're interested you should check out Hadoop and MapReduce which are created with big (BIG) datasets in mind.
Otherwise chunking or streaming the data is a good way to reduce the size in memory.
I have used stream-based processing in such cases. One example was when I had to download a quite large (in my case ~600 MB) CSV file from an FTP server, extract the records found, and put them into a database. I combined three streams reading from each other:
A database inserter, which read a stream of records from
a record factory, which read a stream of text from
an FTP reader class, which downloaded the FTP stream from the server.
That way I never had to store the entire file locally, so it should work with arbitrarily large files.
It would depend on the file and how the data in the file may be related. If you're talking about something where you have a bunch of independent records that you need to process and output to a database or another file, it would be beneficial to multi-thread the process. Have a thread that reads in the record and then passes it off to one of many threads that will do the time-consuming work of processing the data and doing the appropriate output.
In addition to what Bill Carey said, not only does the type of file determine the "meaningful chunks", it also determines what "processing" means.
In other words, what you do to process the data, and how you determine what to process, will vary tremendously.
What separates a "large" data file from a small one is--broadly speaking--whether you can fit the whole file into memory or whether you have to load portions of the file from the disk one at a time.
If the file is so large that you can't load the whole thing into memory, you can process it by identifying meaningful chunks of the file, then reading and processing them serially. How you define "meaningful chunks" will depend very much on the type of file. (i.e. binary image files will require different processing from massive xml documents.)
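A minimal sketch of that serial, chunk-at-a-time processing, here assuming the "meaningful chunk" is a line of text and the file is called "big.dat" (a placeholder):

    #include <cstddef>
    #include <fstream>
    #include <iostream>
    #include <string>

    int main() {
        std::ifstream in("big.dat");
        if (!in) return 1;

        std::size_t records = 0;
        for (std::string line; std::getline(in, line); ) {
            // Only one chunk lives in memory at a time, so the file size
            // does not affect the program's memory footprint.
            ++records;   // replace with the real per-record processing
        }
        std::cout << records << " records processed\n";
    }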
Look for opportunities to split the file down so that it can be tackled by multiple processes. You don't say if records in the file are related, which makes the problem harder but the solution is in principle the same - identify mutually exclusive partitions of data that you can process in parallel.
A while back I needed to process 100s of millions of test data records for some performance testing I was doing on a massively parallel machine. I used some Perl to split the input file into 32 parts (to match the number of CPUs) and then spawned 32 processes, each transforming the records in one file.
Because this job ran over the 32 processors in parallel, it took minutes rather than the hours it would have taken serially. I was lucky though, having no dependencies between any of the records in the file.
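A rough sketch of that split-and-parallelise approach, assuming the input has already been split into part files named "part0.txt" ... and that the records are independent (all names are placeholders):

    #include <fstream>
    #include <string>
    #include <thread>
    #include <vector>

    void processPart(const std::string& path) {
        std::ifstream in(path);
        for (std::string line; std::getline(in, line); ) {
            // transform one independent record here
        }
    }

    int main() {
        unsigned parts = std::thread::hardware_concurrency();
        if (parts == 0) parts = 4;                    // fallback if the count is unknown

        std::vector<std::thread> workers;
        for (unsigned i = 0; i < parts; ++i)
            workers.emplace_back(processPart, "part" + std::to_string(i) + ".txt");
        for (auto& t : workers)
            t.join();                                 // wait for every partition to finish
    }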