How to get the latest file from a directory - c++

This is specific to creating log files. When my application connects to a server, it writes the details to a log file. When the log file reaches a specific size, let's say 1 MB, I create another file named LOG2.log.
Now, while writing back to the log file, there are two or even more log files, and I want to pick up the latest one. I don't want to traverse all the files in that directory to pick the file, as this will take processing time. Is there any other way to get the last created log file in the directory?

Your best bet is to rotate log files, which is what is normally done on Unix (generally via cron).
One possible implementation is to keep 10 (or however many) old log files around: if your program detects that Log.log is over 1 MB, then move Log09.log to Log10.log, Log08.log to Log09.log, 7 to 8, 6 to 7, ..., 2 to 3, and then Log.log to Log02.log. Finally, create a new Log.log file and continue recording.
This way you'll always write to Log.log and there's no filesystem mystery. In theory, this approach is scalable to ridiculous numbers of log files (more than you would ever reasonably need) and is more standard than writing to Log3023.log. Plus, one would always know where to find the current log.
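A minimal sketch of that rotation scheme in C++ (the Log.log naming and the 1 MB threshold are just the values from this discussion, not a fixed convention):

#include <cstdio>
#include <string>

const int kMaxLogs = 10;

// Portable size check using only standard C I/O; -1 means "missing".
long FileSize(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return -1;
    std::fseek(f, 0, SEEK_END);
    long n = std::ftell(f);
    std::fclose(f);
    return n;
}

std::string LogName(int i) {
    char buf[16];
    std::snprintf(buf, sizeof(buf), "Log%02d.log", i);
    return buf;
}

void RotateIfNeeded() {
    if (FileSize("Log.log") < 1024 * 1024) return;
    std::remove(LogName(kMaxLogs).c_str());          // drop the oldest
    for (int i = kMaxLogs - 1; i >= 2; --i)          // Log09 -> Log10, ..., Log02 -> Log03
        std::rename(LogName(i).c_str(), LogName(i + 1).c_str());
    std::rename("Log.log", LogName(2).c_str());      // current log becomes Log02.log
    // the caller then reopens a fresh Log.log and continues writing
}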

I believe the answer is "stiff". You have to iterate and find the most recent one yourself, as the OS won't keep indices for each possible sort order around on the off chance someone may want them.

Are you able to modify the server? If so, perhaps introduce a LASTLOG.log file that either contains the name of the latest log file, or the actual contents of it.
Otherwise, Tony's right: no real way to do it other than iterating through yourself.
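A sketch of the pointer-file idea, assuming LASTLOG.log simply holds the current log's name (the function names are made up for illustration):

#include <fstream>
#include <string>

// Writer side: call whenever a new log file is started.
void RecordLatest(const std::string& logName) {
    std::ofstream out("LASTLOG.log", std::ios::trunc);
    out << logName;
}

// Reader side: returns the name of the current log, or "" if none recorded.
std::string LatestLog() {
    std::ifstream in("LASTLOG.log");
    std::string name;
    std::getline(in, name);
    return name;
}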

How about the elegant:
ls -t | head -n 1

The most efficient way is to use a specialized function that goes through all entries (neither NTFS nor FAT indexes by time) but ignores what you don't need. For that, call FindFirstFileEx with info level FindExInfoBasic, which skips short-name (8.3) resolution.
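A sketch of that approach (Windows-only; note that FindExInfoBasic requires Windows 7 or later): it scans *.log once and keeps the entry with the newest last-write time.

#include <windows.h>
#include <string>

// Returns the name of the most recently written *.log file in the
// current directory, or "" if none is found.
std::wstring NewestLog() {
    WIN32_FIND_DATAW fd;
    HANDLE h = FindFirstFileExW(L"*.log", FindExInfoBasic, &fd,
                                FindExSearchNameMatch, NULL, 0);
    if (h == INVALID_HANDLE_VALUE) return L"";
    FILETIME best = {0, 0};
    std::wstring name;
    do {
        if (!(fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) &&
            CompareFileTime(&fd.ftLastWriteTime, &best) > 0) {
            best = fd.ftLastWriteTime;
            name = fd.cFileName;
        }
    } while (FindNextFileW(h, &fd));
    FindClose(h);
    return name;
}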

Related

Get the oldest file in a directory

My problem is that I want to store the five oldest files from a directory in a list. Since the software should be robust against time changes made by the user, I'm looking for a way to extract this information without using the file time. Is there any internal counter implemented in Windows that can be extracted from a file's metadata? Or is it possible to set such a counter during file creation (e.g. in a specific field of the meta-information)?
Best regards
NouGHt
Are you saying you don't want to use "the file time" in case users have modified the files since they were created?
If that is the case, your problem may be solved by the information that Windows stores three distinct FILETIMEs for each file: 1) the file's creation time, 2) the file's last access time, and 3) the file's last write time. You would want the first of these. You can get all of them by calling the Win API GetFileAttributesEx function, passing the file name. The WIN32_FILE_ATTRIBUTE_DATA structure that is returned to you contains all three times.
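A sketch along those lines (Windows-only; the directory scan and the five-oldest selection are my additions to fit the question):

#include <windows.h>
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Collects creation times via GetFileAttributesEx and returns the five
// oldest files in the given directory (non-recursive).
std::vector<std::wstring> FiveOldest(const std::wstring& dir) {
    std::vector<std::pair<ULONGLONG, std::wstring>> files;
    WIN32_FIND_DATAW fd;
    HANDLE h = FindFirstFileW((dir + L"\\*").c_str(), &fd);
    if (h == INVALID_HANDLE_VALUE) return {};
    do {
        if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) continue;
        std::wstring path = dir + L"\\" + fd.cFileName;
        WIN32_FILE_ATTRIBUTE_DATA attr;
        if (GetFileAttributesExW(path.c_str(), GetFileExInfoStandard, &attr)) {
            ULARGE_INTEGER t;                    // FILETIME -> 64-bit for sorting
            t.LowPart  = attr.ftCreationTime.dwLowDateTime;
            t.HighPart = attr.ftCreationTime.dwHighDateTime;
            files.push_back({t.QuadPart, path});
        }
    } while (FindNextFileW(h, &fd));
    FindClose(h);
    std::sort(files.begin(), files.end());       // oldest creation time first
    std::vector<std::wstring> out;
    for (size_t i = 0; i < files.size() && i < 5; ++i)
        out.push_back(files[i].second);
    return out;
}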

Read in a directory from a given file point in C++

I have two programs that will be reading/writing files to the same directory at the same time (but not to the exact same files at the same time). I have the writing portion done, but I am struggling to get a halfway decent, working implementation of the directory-reading portion.
The files within the directory use the following naming scheme:
Image-[INDEX]-[KEY/DEL]--[TIMESTAMP]
[INDEX] increments up from 000000, [KEY/DEL] alternates based on whether the image is a key frame or a delta frame, and [TIMESTAMP] is the Unix/Linux epoch time at file creation.
Right now, the reading program reads the directory (using the dirent.h library) one file at a time, every time it needs to find an image within the directory. When the directory gets extremely large, I would imagine this method will quickly become extremely resource-intensive and eventually fail. So I am trying to find an alternative. I was thinking of reading in the entire directory at initialization and saving the file information in an array to access/use later in the program. Then, when a file is requested that is not in the array, the program would update the array by reading the directory again, but this time starting from the point it left off at the end of initialization.
Is this possible? Can I start reading the file names within a directory at a known point (the last file read in), or do I have to start from the beginning each time?
Or is there a better way of doing this?
Thanks.
As Andrew said, I would confirm that this is actually a problem before trying to solve it.
If you can discount the possibility of files being created out of sequence (that is, no file you wish to process before another file will ever be created after that file), then you can use this method.
First, read the entire directory listing into an array or vector. Then, when iterating files, just iterate the vector. Finally, if you get a file-not-found or reach the end of the vector, refresh it just in case more have been created.
You will no doubt want to encapsulate this logic into some sort of context object that remembers the last file read. You could also optimise by sorting the vector.
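A sketch of such a context object, using the same dirent.h API as the question (the class and member names are made up for illustration):

#include <dirent.h>
#include <algorithm>
#include <string>
#include <vector>

// Caches the directory listing; rescans only when a lookup misses.
class DirCache {
public:
    explicit DirCache(const std::string& dir) : dir_(dir) { Refresh(); }

    // True if the file is known, refreshing the cache once on a miss
    // (in case the writer created it after our last scan).
    bool Contains(const std::string& name) {
        if (std::binary_search(names_.begin(), names_.end(), name)) return true;
        Refresh();
        return std::binary_search(names_.begin(), names_.end(), name);
    }

private:
    void Refresh() {
        names_.clear();
        if (DIR* d = opendir(dir_.c_str())) {
            while (dirent* e = readdir(d))
                if (e->d_name[0] != '.')         // skip . and .. (and dotfiles)
                    names_.push_back(e->d_name);
            closedir(d);
        }
        std::sort(names_.begin(), names_.end()); // enables binary_search
    }

    std::string dir_;
    std::vector<std::string> names_;
};

Sorting the cached names both enables the binary search and gives you the in-sequence iteration order that the Image-[INDEX] naming scheme implies.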

How to get file path from NTFS index number?

I have dwVolumeSerialNumber, nFileIndexHigh, nFileIndexLow values obtained from a call to GetFileInformationByHandle. How can I get file path from these values?
Because of hard links, there may be multiple paths that map to the given VolumeSerialNumber and FileIndex. To find all such paths:
Iterate volumes to find one whose root directory matches dwVolumeSerialNumber
Recursively enumerate all directories on the volume, skipping symbolic links and reparse points, to find all files with matching nFileIndexHigh and nFileIndexLow.
This can be quite time-consuming. If you really need to do this as fast as possible and your filesystem is NTFS, you can raw read the entire MFT into a buffer and parse it yourself. This will get all directories that fit inside an MFT entry in one fell swoop. The rest of the directories can be read through the OS or also through raw reads, depending on the amount of work you want to do. But any way you look at it, this is a lot of work and doesn't even apply to FAT, FAT32 or any other filesystem.
A better solution is probably to hang onto the original path if at all possible.
This MSDN article shows how to get the path from a file handle.
You use OpenFileById to open a file given its file ID, but you also need an open handle to another file on the same volume, I assume to identify the volume.
This blog posting raises an interesting issue: you need to pass in 24 for the structure size (worked out by looking at the assembly code).
I leave it as an interesting exercise (I couldn't find an easy answer) how you go from a dwVolumeSerialNumber to having a valid handle open for that volume or a file on that volume, but maybe you already have enough information for your case. One possibility is to iterate all mounted volumes, calling GetVolumeInformation to find the one with the matching serial number.
Note: If you don't have the file open, then you may not be able to rely on the nFileIndexHigh/Low combo (aka file ID), as described in the BY_HANDLE_FILE_INFORMATION structure notes, which warn that it can change on FAT systems; on the NTFS file system, however, a file keeps the same file ID until it is deleted.
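Putting those pieces together, a sketch (Vista and later; hVolume stands for the already-open handle on the same volume discussed above):

#include <windows.h>
#include <string>

// Opens a file by its 64-bit file ID and asks the OS for one of its paths.
// hVolume: any open handle on the same volume (see discussion above).
std::wstring PathFromFileId(HANDLE hVolume, DWORD indexHigh, DWORD indexLow) {
    FILE_ID_DESCRIPTOR desc;
    desc.dwSize = sizeof(desc);              // 24 bytes, as the blog post notes
    desc.Type = FileIdType;
    desc.FileId.HighPart = (LONG)indexHigh;
    desc.FileId.LowPart = indexLow;

    // Zero desired access is enough to query metadata such as the path.
    HANDLE h = OpenFileById(hVolume, &desc, 0,
                            FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                            NULL, 0);
    if (h == INVALID_HANDLE_VALUE) return L"";

    wchar_t buf[MAX_PATH];
    DWORD len = GetFinalPathNameByHandleW(h, buf, MAX_PATH, FILE_NAME_NORMALIZED);
    CloseHandle(h);
    return (len > 0 && len < MAX_PATH) ? std::wstring(buf, len) : L"";
}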
Note: The original question had an error in it. Now that the question has been fixed this answer no longer applies.
In general you can't. The information you retrieved just tells you what disk the file is on and how big it is. It does not provide enough information to identify the actual file. Specifically:
dwVolumeSerialNumber identifies the volume, and
nFileSizeHigh and nFileSizeLow give you the size of the file
If the file happens to be the only file on that volume that is that exact size, you could search the volume for a file of that size. But in general this is both expensive and unreliable, so I don't recommend it.

Out of Core Implementation of a Quadtree

I am trying to build a quadtree data structure (or let's just say a tree) in secondary memory (hard disk).
I have a C++ program to do so, and I use fopen to create the files. Also, I am using tesseral coding to store each cell in a file named with its corresponding code, all in one directory on the disk.
The problem is that after creating about 1,100 files, fopen just returns NULL and stops creating new files. I can create further files manually in that directory, but my C++ program cannot create any more.
I know about the inode limit on the ext3 filesystem, which is (from Wikipedia) 32,000, but mine is way less than that; also note that I can create files manually on the disk, just not through fopen.
Also, I would really appreciate any ideas regarding the best way to store a very dynamic quadtree on disk (I need the nodes to be in separate files, and the quadtree might have a depth of 50).
Using nested directories is one idea, but I think it will slow down performance because of following the links on the filesystem to access each file.
Thanks,
Nima
What's the errno value of the failed fopen() call?
Do you keep the files you have created open? If so, you are most probably exceeding the maximum number of open files per process.
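For example, a minimal check (EMFILE is the "too many open files per process" case):

#include <cerrno>
#include <cstdio>
#include <cstring>

FILE* OpenCell(const char* path) {
    FILE* f = std::fopen(path, "wb");
    if (!f) {
        // EMFILE here means the per-process open-file limit was hit;
        // close some of the files you are holding open first.
        std::fprintf(stderr, "fopen(%s) failed: %s (errno=%d)\n",
                     path, std::strerror(errno), errno);
    }
    return f;
}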
When you use directories as data structures, you delegate the work of maintaining that structure to the file system, which is not necessarily designed to do that.
Edit: Frank is probably right that you've exceeded the number of available file descriptors. You can increase those, but that shows you're also using internals of your ABI as a data structure. Slow and (as resources are exhausted) unstable.
Either code for a very specific OS installation, or use a SQL database.
I have no idea why fopen wouldn't work. Look at errno.
However, storing everything in one directory is a bad idea. When you add a lot of files, it will get slow. Having a directory for every level of the tree would also be slow.
Instead, combine multiple levels into one directory. You could, for example, have one directory for every four levels of the tree. This would limit the number of directories, amount of nesting, and number of files per directory, giving very good performance.
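A sketch of that layout, assuming the tesseral code has one digit per tree level (grouping four levels per directory as suggested above; the .cell suffix is made up):

#include <cstddef>
#include <string>

// Maps a tesseral cell code (one digit per level, e.g. "031220130")
// to a path with one directory per four levels: "0312/2013/0.cell".
std::string CellPath(const std::string& code) {
    std::string path;
    for (std::size_t i = 0; i + 4 < code.size(); i += 4)
        path += code.substr(i, 4) + "/";
    // the remaining 1-4 digits become the file name
    std::size_t tail = (code.size() % 4) ? code.size() % 4 : 4;
    return path + code.substr(code.size() - tail) + ".cell";
}

With a depth of 50 this caps nesting at about 13 directory levels while keeping each directory's entry count bounded.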
The limitation could come from:
stdio (the C library): at most 256 handles by default; this can be increased to 1024 (in VC, call _setmaxstdio, as in the sketch below), and
the OS kernel's limit on file handles per process (usually 1024).
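For example (MSVC-specific; the exact default and maximum vary by CRT version):

#include <stdio.h>

int main(void) {
#ifdef _MSC_VER
    // Raise the C-runtime stdio handle limit toward the Win32 handle
    // limit; returns the new maximum, or -1 on failure.
    int newMax = _setmaxstdio(1024);
    printf("stdio limit is now %d\n", newMax);
#endif
    return 0;
}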

Is there an O(1) way in the Windows API to concatenate 2 files?

Is there an O(1) way in the Windows API to concatenate 2 files?
O(1) with respect to not having to read in the entire second file and write it out to the file you want to append to, so as opposed to O(n) bytes processed.
I think this should be possible at the file system driver level, and I don't think there is a user-mode API available for this, but I thought I'd ask.
If the "new file" is only going to be read by your application, then you can get away without actually concatenating them on disk.
You can just implement a stream interface that behaves as if the two files have been concatenated, and then use that stream as opposed to what ever the default filestream implementation used by your app framework is.
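A minimal sketch of such a wrapper over two std::ifstreams (the class and its Read interface are made up for illustration):

#include <cstddef>
#include <fstream>
#include <string>

// Presents two files as one sequential read stream.
class ConcatStream {
public:
    ConcatStream(const std::string& a, const std::string& b)
        : first_(a, std::ios::binary), second_(b, std::ios::binary) {}

    // Reads up to n bytes across the seam; returns bytes actually read.
    std::size_t Read(char* buf, std::size_t n) {
        std::size_t got = 0;
        if (first_) {
            first_.read(buf, n);
            got = static_cast<std::size_t>(first_.gcount());
        }
        if (got < n && second_) {                // first file exhausted
            second_.read(buf + got, n - got);
            got += static_cast<std::size_t>(second_.gcount());
        }
        return got;
    }

private:
    std::ifstream first_, second_;
};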
If that won't work for you, and you are using Windows, you could always create a reparse point and a file system filter. I believe that if you create a "minifilter" it will run in user mode, but I'm not sure.
You can probably find more information about it here:
http://www.microsoft.com/whdc/driver/filterdrv/default.mspx
No, there isn't.
The best you could hope for is O(n), where n is the length of the shorter of the two files.
From a theoretical perspective, this is possible (on-disk) provided that:
the second file is destroyed
the concatenation honours the filesystem's fragment alignment (e.g. occurs on a cluster boundary)