Read in a directory from a given file point in C++ - c++

I have two programs that will be reading / writing files to the same directory at the same time (but not to the same exact files at the same time). I have the writing portion done, but I am struggling to get a half way decent and working implementation of the reading directory portion.
The files within the directory follow the following naming scheme:
Image-[INDEX]-[KEY/DEL]--[TIMESTAMP]
[INDEX] increments up from 000000, [KEY/DEL] alternates based on whether the image is a key or a delta frame and [TIMESTAMP] is the Unix / Linux epoch time at file creation.
Right now, the reading program reads in the directory (using the dirent.h library) one file at a time every time it needs to find an image within the directory. When the directory gets extremely large, I would imagine that this operation / method will quickly become extremely resource intensive, and eventually fail. So, I am trying to find an alternative method. I was thinking of reading in the entire directory at initialization, and saving the file information in an array to access / use later in the program. Then, when a file is requested that is not in the array, the program would go and update the array of files by reading in the directory, but this time starting from the point it left off at the end of the initialization.
Is this possible? To start reading in the file names within a directory at a known point (the last file "read in") in the directory? Or do I have to start all the way from the beginning each time?
Or is there a better way of doing this?
Thanks.

As Andrew said, I would confirm that this is actually a problem before trying to solve it.
If you can discount the possibility of files being created out of sequence, that is, no file
you wish to process before another file will ever be created after that file, then you can use this method.
First, read the entire directory listing into an array or vector. Then, when iterating files, just iterate the vector. Finally, if you get a file not found or reach the end of the vector, refresh it just in case more have been created.
You will no doubt want to encapsulate this logic into some sort of context object, which remembers the last file read. You could also optimise by sorting the vector.

Related

Get the oldest file in a directory

my problem is that I want to store the five oldest files from a directory in a list. Since the software should be safe against time changes done by the user I'm looking for a possibility to extract this information without using the file time. Is there any internal counter implemented in windows that can be extracted from the files meta-data? Or is it possible to set such a counter during the file creation (e.g. in a specific field of the meta-information)?
Best regards
NouGHt
Are you saying you don't want to use "the file time" in case users
have modified the files since they were created?
If that is the case, your problem may solved with the information that
Windows stores three distinct
FILETIMEs
for each file: 1) the file's creation time,
2) the file's last access time, 3) the file's last write time.
You would want the first of these. You can get all of them by calling
the win api
GetFileAttributesEx
function passing the file name. The
WIN32_FILE_ATTRIBUTE_DATA
structure that is returned to you contains all three times.

Remove A Line Of Text With Filestreams (C++)

I have a large text file.
Each time my program runs, it needs to read in the first line, remove it, and put that data back into the bottom of the file.
Is there a way to accomplish this task without having to read in every part of the file?
It would be great to follow this example of pseudo code:
1. Open file stream for reading/writing
2. data = first line of file
3. remove first line from file <-- can I do this?
4. Close file stream
5. Open file stream for appending
6. write data to file
7. Close file stream
The reason I'm trying to avoid reading everything in is because the program runs at a specific time each day. I don't want the delay to be longer each time the file gets bigger.
All the solutions I've found require that the program process the whole file. If C++ filestreams are not able to accomplish this, I'm up for whatever alternative is quick and efficient for my C++ program to execute.
thanks.
The unfortunate truth is that no filesystem on a modern OS is designed to do this. The only way to remove something from the beginning of a file is to copy the contents to a new file, except for the first bit. There's simply no way to do precisely what you want to do.
But hopefully you can do a bit of redesign. Maybe each entry could be a record in a database -- then the reordering can be done very efficiently. Or perhaps the file could contain fixed-size records, and you could use a second file of indexes to specify record order, so that rearranging the file was just a matter of updating the indices.

Fastest way to erase part of file in C++

I wonder which is the fastest way to erase part of a file in c++.
I know the way of write a second file and skip the part you want. But i think is slow when you work with big files.
What about database system, how they remove records so fast?
A database keeps an index, with metadata listing which parts of the file are valid and which aren't. To delete data, just the index is updated to mark that section invalid, and the main file content doesn't have to be changed at all.
Database systems typically just mark deleted records as deleted, without physically recovering the unused space. They may later reuse the space occupied by deleted records. That's why they can delete parts of a database quickly.
The ability to quickly delete a portion of a file depends on the portion of the file you wish to delete. If the portion of the file that you are deleting is at the end of the file, you can simply truncate the file, using OS calls.
Deleting a portion of a file from the middle is potentially time consuming. Your choice is to either move the remainder of the file forward, or to copy the entire file to a new location, skipping the deleted portion. Either way could be time consuming for a large file.
The fastest way I know is to open data file as a Persisted memory-mapped file and simple move over the part you don't need. Would be faster than moving to second file but still not too fast with big files.

How to get the latest file from a directory

This is specific to creating a logfiles. When I am connecting to a server using my application, it writes the details to a log file. When the log file reaches to specific size let's say 1MB then I create another file named LOG2.log.
Now While Writing back to log file , there are two or even more log files and I want to pick up the latest one. I don not want to traverse through all the files in that directory and the pick up the file, as this will take processing time, Is there any other way to get the last created file or log file in the directory.
Your best bet is to rotate log files, which is what gets done in Unix normally (generally via cron.)
One possible implementation is to keep 10 (or however many) old log files around, if your program detects that Log.log is over 1MB then move Log09.log to Log10.log, Log08.log to Log09.log, 7 to 8, 6 to 7, ... 2 to 3, and then Log.log to Log02.log. Finally, create a new Log.log file and continue recording.
This way you'll always write to Log.log and there's no filesystem mystery. In theory, this approach is scalable to ridiculous numbers of log files (more than you would ever reasonably need) and is more standard than writing to Log3023.log. Plus, one would always know where to find the current log.
I believe the answer is "stiff". You have to iterate and find the most recent one yourself, as the OS won't keep indices for each possible sort order around on the off chance someone may want them.
Are you able to modify the server? If so, perhaps introduce a LASTLOG.log file that either contains the name of the latest log file, or the actual contents of it.
Otherwise, Tony's right.. No real way to do it other than iterate through yourself.
How about the elegant :
ls -t | head -n 1
The most efficient way is to use a specialized function to go through all entries (as NTFS or FAT don't index by time), but ignore what you don't need. For that, call FindFirstFileEx with info level FindExInfoBasic. This skips 8.3 name resolution.

Writing to the middle of the file (without overwriting data)

In windows is it possible through an API to write to the middle of a file without overwriting any data and without having to rewrite everything after that?
If it's possible then I believe it will obviously fragment the file; how many times can I do it before it becomes a serious problem?
If it's not possible what approach/workaround is usually taken? Re-writing everything after the insertion point becomes prohibitive really quickly with big (ie, gigabytes) files.
Note: I can't avoid having to write to the middle. Think of the application as a text editor for huge files where the user types stuff and then saves. I also can't split the files in several smaller ones.
I'm unaware of any way to do this if the interim result you need is a flat file that can be used by other applications other than the editor. If you want a flat file to be produced, you will have to update it from the change point to the end of file, since it's really just a sequential file.
But the italics are there for good reason. If you can control the file format, you have some options. Some versions of MS Word had a quick-save feature where they didn't rewrite the entire document, rather they appended a delta record to the end of the file. Then, when re-reading the file, it applied all the deltas in order so that what you ended up with was the right file. This obviously won't work if the saved file has to be usable immediately to another application that doesn't understand the file format.
What I'm proposing there is to not store the file as text. Use an intermediate form that you can efficiently edit and save, then have a step which converts that to a usable text file infrequently (e.g., on editor exit). That way, the user can save as much as they want but the time-expensive operation won't have as much of an impact.
Beyond that, there are some other possibilities.
Memory-mapping (rather than loading) the file may provide efficiences which would speed things up. You'd probably still have to rewrite to the end of the file but it would be happening at a lower level in the OS.
If the primary reason you want fast save is to start letting the user keep working (rather than having the file available to another application), you could farm the save operation out to a separate thread and return control to the user immediately. Then you would need synchronisation between the two threads to prevent the user modifying data yet to be saved to disk.
The realistic answer is no. Your only real choices are to rewrite from the point of the modification, or build a more complex format that uses something like an index to tell how to arrange records into their intended order.
From a purely theoretical viewpoint, you could sort of do it under just the right circumstances. Using FAT (for example, but most other file systems have at least some degree of similarity) you could go in and directly manipulate the FAT. The FAT is basically a linked list of clusters that make up a file. You could modify that linked list to add a new cluster in the middle of a file, and then write your new data to that cluster you added.
Please note that I said purely theoretical. Doing this kind of manipulation under a complete unprotected system like MS-DOS would have been difficult but bordering on reasonable. With most newer systems, doing the modification at all would generally be pretty difficult. Most modern file systems are also (considerably) more complex than FAT, which would add further difficulty to the implementation. In theory it's still possible -- in fact, it's now thoroughly insane to even contemplate, where it was once almost reasonable.
I'm not sure about the format of your file but you could make it 'record' based.
Write your data in chunks and give each chunk an id.
Id could be data offset in file.
At the start of the file you could
have a header with a list of ids so
that you can read records in
order.
At the end of 'list of ids' you could point to another location in the file (and id/offset) that stores another list of ids
Something similar to filesystem.
To add new data you append them at the end and update index (add id to the list).
You have to figure out how to handle delete record and update.
If records are of the same size then to delete you can just mark it empty and next time reuse it with appropriate updates to index table.
Probably the most efficient way to do this (if you really want to do it) is to call ReadFileScatter() to read the chunks before and after the insertion point, insert the new data in the middle of the FILE_SEGMENT_ELEMENT[3] list, and call WriteFileGather(). Yes, this involves moving bytes on disk. But you leave the hard parts to the OS.
If using .NET 4 try a memory-mapped file if you have an editor-like application - might jsut be the ticket. Something like this (I didn't type it into VS so not sure if I got the syntax right):
MemoryMappedFile bigFile = MemoryMappedFile.CreateFromFile(
new FileStream(#"C:\bigfile.dat", FileMode.Create),
"BigFileMemMapped",
1024 * 1024,
MemoryMappedFileAccess.ReadWrite);
MemoryMappedViewAccessor view = MemoryMapped.CreateViewAccessor();
int offset = 1000000000;
view.Write<ObjectType>(offset, ref MyObject);
I noted both paxdiablo's answer on dealing with other applications, and Matteo Italia's comment on Installable File Systems. That made me realize there's another non-trivial solution.
Using reparse points, you can create a "virtual" file from a base file plus deltas. Any application unaware of this method will see a continuous range of bytes, as the deltas are applied on the fly by a file system filter. For small deltas (total <16 KB), the delta information can be stored in the reparse point itself; larger deltas can be placed in an alternative data stream. Non-trivial of course.
I know that this question is marked "Windows", but I'll still add my $0.05 and say that on Linux it is possible to both insert or remove a lump of data to/from the middle of a file without either leaving a hole or copying the second half forward/backward:
fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, offset, len)
fallocate(fd, FALLOC_FL_INSERT_RANGE, offset, len)
Again, I know that this probably won't help the OP but I personally landed here searching for a Linix-specific answer. (There is no "Windows" word in the question, so web search engine saw no problem with sending me here.