std::filesystem::recursive_directory_iterator with consistent path separation? - c++

I just noticed that std::filesystem::recursive_directory_iterator uses different path separators (i.e. / vs \) depending on whether it's on Windows or Linux. Is there a way to make it return paths with '/' so they are consistent across systems?
This is how I am getting the paths:
for(auto& path: fs::recursive_directory_iterator(dir_path))
{
// Skip directories in the enumeration.
if(fs::is_directory(path)) continue;
string path_str = path.path().string();
}
What I mean is, the contents of path_str will differ between the two OSs (because the separators differ), and I would like them to be the same. I could just replace the separators in the final string, but that uses more cycles than instructing the standard library to use '/' for everything instead.

So, your problem has nothing to do with recursive_directory_iterator, which iterates on directory_entry objects, not paths. Your confusion probably stems from the fact that directory entries are implicitly convertible to paths, so you can use them as such.
Your problem is really about path::string(), which, as the documentation states, uses the native format (i.e. with a platform dependent separator). You would get the same problem regardless of how you get your path.
If you want to get / as the directory separator, use path::generic_string() instead to get the path in generic format.
for(auto& dir_entry: fs::recursive_directory_iterator(dir_path))
{
if(dir_entry.is_directory()) continue;
string path_str = dir_entry.path().generic_string();
}
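To see the difference without walking a directory, here is a minimal sketch. The helper name portable_string and the sample path are made up for illustration; only path::generic_string() comes from the answer above.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: returns the path with '/' as the directory
// separator on every platform. path::string() would instead use the
// native separator (e.g. '\' on Windows).
std::string portable_string(const fs::path& p) {
    return p.generic_string();
}
```

Calling portable_string(fs::path("assets") / "textures" / "wood.png") should give "assets/textures/wood.png" on both Windows and Linux.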

Related

Non-empty relative current path, in standard C++?

With C++17 (or Boost::filesystem), we can get the current path / current working directory using filesystem::current_path(). However - that gives us an absolute path.
We could also use an empty path as the relative current path - sometimes.
But - is it possible to obtain, portably, the equivalent of "." or "./" ? i.e. a non-empty relative current path?
Use "." for the current directory.
std::filesystem will recognize "." as representing the current directory / path - regardless of the platform you're on. So, it will not just happen to work on Linux/Windows, it is guaranteed to work.
auto relative_current_path = std::filesystem::path{"."};
Relevant wording in the standard: fs.path.generic.3.
This answer is basically due to @NathanOliver...
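A quick way to convince yourself, as a sketch: the function name dot_is_current_directory is hypothetical, and the check assumes the process's working directory still exists on disk.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical check: "." is a non-empty relative path, and it resolves
// to the same directory that current_path() reports as an absolute path.
bool dot_is_current_directory() {
    const fs::path relative_current_path{"."};
    return !relative_current_path.empty()
        && relative_current_path.is_relative()
        && fs::equivalent(relative_current_path, fs::current_path());
}
```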

Difference between \\ and / when working with path directories

Whenever I do any sort of file read or write, I always use the '/'
but I've seen some examples where the value of the given filepath is '\\' instead.
So what's the difference?
Am I doing it wrong or introducing bugs if I use '/'?
There's nothing wrong with using / on systems that support it. In fact, on UNIX systems it's the only thing that works.
Windows supports both / and \ as path separators in most situations.
Note that a platform agnostic option is available in the form of std::filesystem::path.
The common convention for paths on Windows is the mirror image of Linux: it uses backslashes instead of forward slashes, as in C:\abc\abc.txt. Which form you use to access a file or folder is your own choice.
The \\ is an escape sequence that produces a single backslash in a string literal. Note that you can't use a single backslash inside a string literal, because the compiler reads it together with the next character as an escape sequence (e.g. \n, \b, etc.).
That's it.
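To make the escaping point concrete, here is a small sketch. The variable names are illustrative, and the path is the C:\abc\abc.txt example from above.

```cpp
#include <string>

// Each "\\" in the source code becomes one backslash in the string.
const std::string escaped_path = "C:\\abc\\abc.txt";

// A raw string literal (C++11) needs no escaping at all.
const std::string raw_path = R"(C:\abc\abc.txt)";

// The forward-slash spelling of the same location, which Windows
// accepts in most situations.
const std::string forward_path = "C:/abc/abc.txt";
```

The first two variables hold exactly the same 14 characters; only the spelling in the source file differs.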

POSIX or Linux API function to get file extension from path

I need a POSIX or Linux API function that takes a file path and returns the file's extension. Every platform should have one, but I can't find it for Linux. What's it called?
First use strrchr to find the last '.' in the pathname. If it doesn't exist, there's no "extension".
Next, use strchr to check whether there's any '/' after the last '.'. If so, the last '.' is in a directory component, not the filename, so there's no extension.
Otherwise, you found the extension. You can use the pointer to the position one past the '.' directly as a C string. No need to copy it to new storage unless the original string will be freed or clobbered before you use it.
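The three steps above can be sketched as follows. The function name file_extension is made up for illustration, and returning nullptr for "no extension" is an assumption about how you want to signal that case.

```cpp
#include <cstring>

// Returns a pointer to the extension (the text after the final '.'),
// or nullptr if the last path component has no '.'.
// The result points into `path`, so it is only valid while `path` is.
const char* file_extension(const char* path) {
    const char* dot = std::strrchr(path, '.');   // last '.' in the whole path
    if (dot == nullptr) return nullptr;          // no '.' at all
    if (std::strchr(dot, '/') != nullptr)        // a '/' follows the '.', so the
        return nullptr;                          // '.' is in a directory name
    return dot + 1;                              // one past the '.'
}
```

For example, file_extension("/home/test/file.cpp") points at "cpp", while file_extension("/home/te.st/file") is nullptr. Note that a dotfile such as ".bashrc" yields "bashrc" under this definition.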
Note: The above is assuming you define "extension" as only the final '.'-delimited component. If you want to consider things like .tar.gz and .cpp.bak as extensions, a slightly different approach works:
First, use strrchr to find the final '/'. If not found, treat the start of the string as your result.
Second, use strchr to find the first '.' starting from the position you just found. The result is your extension.
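A sketch of this second variant, under the same assumptions as before (the name compound_extension is hypothetical, and nullptr means "no extension"):

```cpp
#include <cstring>

// Returns a pointer to everything after the first '.' in the last path
// component (e.g. "tar.gz"), or nullptr if that component has no '.'.
const char* compound_extension(const char* path) {
    const char* slash = std::strrchr(path, '/');      // final '/'
    const char* name = slash ? slash + 1 : path;      // start of the file name
    const char* dot = std::strchr(name, '.');         // first '.' in the name
    return dot ? dot + 1 : nullptr;
}
```

Because the search starts after the final '/', dots in directory names are never considered.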
I don't think there's a default function for this.
In my filesystem library, I just apply string operations.
First, I get the filename with extension from the full path, looking for / separators and extracting everything after the last one. Then, I grab everything after the first . dot character, including the dot itself. It worked well so far.
Remember that some system files can start with a . dot character - so check if the filename begins with the dot character before extracting the extension.
Algorithm
Get file name from full path by removing folder names from the left:
/home/test/.myfile.cpp.bak ->
/test/.myfile.cpp.bak ->
/.myfile.cpp.bak ->
.myfile.cpp.bak
Check if the file name begins with .:
If it does, remove it from current file name .myfile.cpp.bak -> myfile.cpp.bak
Now, extract everything after the first . you encounter from the left (if you want multiple extensions) - otherwise, extract everything after the last . from the left
myfile.cpp.bak -> .cpp.bak (first case)
myfile.cpp.bak -> .bak (second case)
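The algorithm above might look like this with std::string. This is a sketch, not the code from my library: the function name extension_of and the all_extensions flag are made up for illustration.

```cpp
#include <string>

// Returns the extension including the leading dot, or an empty string.
// all_extensions == true  -> everything from the first '.' (".cpp.bak")
// all_extensions == false -> everything from the last '.'  (".bak")
std::string extension_of(const std::string& path, bool all_extensions) {
    // Step 1: keep only the file name (everything after the last '/').
    const std::size_t slash = path.find_last_of('/');
    std::string name = (slash == std::string::npos) ? path : path.substr(slash + 1);

    // Step 2: a leading '.' marks a hidden file, not an extension.
    if (!name.empty() && name[0] == '.') name.erase(0, 1);

    // Step 3: cut at the first or the last remaining '.'.
    const std::size_t dot = all_extensions ? name.find('.') : name.rfind('.');
    return (dot == std::string::npos) ? std::string() : name.substr(dot);
}
```

With the path from the walkthrough, extension_of("/home/test/.myfile.cpp.bak", true) gives ".cpp.bak" and passing false gives ".bak".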
Including Boost just for filesystem is a bit too much. But now that the Boost implementation has reached TR2 and is implemented in Visual Studio, it may be time to start looking at it.
http://cpprocks.com/introduction-to-tr2-filesystem-library-in-vs2012/
http://msdn.microsoft.com/en-us/library/hh874694.aspx
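That filesystem library was later standardized as std::filesystem in C++17, which has path::extension(). A minimal sketch, assuming a C++17 compiler; the wrapper name extension_via_filesystem is made up for illustration:

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Returns the last extension of the file name, including the dot,
// or an empty string if there is none.
std::string extension_via_filesystem(const std::string& path) {
    return fs::path(path).extension().string();
}
```

Note that path::extension() only returns the final component (".bak" for "myfile.cpp.bak") and treats a name like ".bashrc" as having no extension.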
What seems to me the best way to solve this problem (in absence of API function, which itself is weird) is to combine Vittorio's and R.'s answers with basename function that takes a path and returns the file name, if the path points to a file: http://linux.die.net/man/3/basename
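I also convert the resulting string to a wide string with mbstowcs and do all the searching with std::wstring:
#include <libgen.h> // basename
#include <limits.h> // PATH_MAX
#include <cstdlib> // mbstowcs
#include <string>
std::wstring fileExtFromPath (const char * path)
{
    std::string pathCopy (path); // POSIX basename may modify its argument
    const char * fileName = basename(&pathCopy[0]);
    wchar_t buffer [PATH_MAX] = {0}; // Use mblen if you don't like PATH_MAX
    if (mbstowcs(buffer, fileName, PATH_MAX - 1) == static_cast<size_t>(-1))
        return std::wstring(); // invalid multibyte sequence
    const std::wstring fileNameW (buffer);
    const size_t pointPosition = fileNameW.rfind(L".");
    // No dot, or only a leading dot (hidden file): no extension.
    const bool noExt = pointPosition == std::wstring::npos || pointPosition == 0;
    const std::wstring fileExtW = noExt ? std::wstring() : fileNameW.substr(pointPosition + 1);
    return fileExtW;
}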

Path and string chopping in Powershell

I'm doing some work involving some automated file moving, and these files contain relative paths that must be maintained. Unfortunately, I'm finding the facilities offered by System.IO.Path, System.String, and Powershell's operators to be a little ill-equipped to handle my work gracefully.
One function that would be very useful to me is the notion of a subtraction of paths, that would work in theory like subtracting vectors. Conceptually, A - B gets you a path from B to A. In the application to paths, D:\A\B\C\D - D:\A\B\ = \C\D. Likewise, D:\A\B\ - D:\A\B\C\D = \..\.. in this case. I can accept, for now, that this only makes sense when one path is wholly contained in the other.
This seems to consist of three steps: 1) determine containment of one path in the other, 2) remove the contained path from the containing path, and 3) optionally, replace folder names with the parent .. symbol based on the sidedness of the operation.
As I am concerned with NTFS, I need both containment and replacement operations to be case-insensitive. For containment, I can use select-string since it is case-insensitive, and allows the -simple switch which allows me to use a path without hacking it apart to escape them for regex.
Removing the string from the other is a little more annoying though. System.IO.Path has nothing for this, System.String's pertinent methods are all case-sensitive, and powershell's operators all require massaging so that the regex will match things.
All this seems like more work than it should be--are there any tools I'm missing that would better handle this?
Determine containment - convert your paths to absolute paths (if not already). You can use Resolve-Path for this. Then you can use $path1.StartsWith($path2, 'OrdinalIgnoreCase') to test for containment.
Remove contained path - $path1.Substring($path2.length)
Replace parent folder names with .. - although I don't have the regex off the top of my head, I'm pretty sure you could do this with a regular expression search/replace using PowerShell's -replace operator.
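The first two steps (case-insensitive StartsWith, then Substring) are not PowerShell-specific. Here they are sketched in C++ rather than PowerShell, purely to show the logic. The names starts_with_ignore_case and subtract_path are hypothetical, and the case folding assumes ASCII-only paths.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

// Case-insensitive prefix test (ASCII only), analogous to
// $path1.StartsWith($path2, 'OrdinalIgnoreCase').
bool starts_with_ignore_case(const std::string& text, const std::string& prefix) {
    if (prefix.size() > text.size()) return false;
    return std::equal(prefix.begin(), prefix.end(), text.begin(),
                      [](unsigned char a, unsigned char b) {
                          return std::tolower(a) == std::tolower(b);
                      });
}

// Returns what is left of `path` after removing `base`, analogous to
// $path1.Substring($path2.Length), or nullopt if `base` is not a prefix.
std::optional<std::string> subtract_path(const std::string& path, const std::string& base) {
    if (!starts_with_ignore_case(path, base)) return std::nullopt;
    return path.substr(base.size());
}
```

For example, subtracting d:\a\b from D:\A\B\C\D leaves \C\D. A plain prefix test also accepts D:\A\Bar as "contained" in D:\A\B, so you may want to check that the match ends at a separator.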
filedirectorypath, on CodePlex, may offer what you need
It's not a PowerShell specific API, but that's no reason not to use it from PowerShell.
Benefits of the NDepend.Helpers.FilePathDirectory over the .NET Framework class System.IO.Path include:
Strongly typed File/Directory path.
Relative / absolute path conversion.
Path normalization API
Path validity check API
Path comparison API
Path browsing API.
Path rebasing API
List of path operations (TryGetCommonRootDirectory, GetListOfUniqueDirsAndUniqueFileNames, list equality…)

In C++ how do I validate a file or folder path?

A user input string for a destination path can potentially contain spaces or other invalid characters.
Example: " C:\users\username\ \directoryname\ "
Note that this has whitespace on both sides of the path as well as an invalid folder name of just a space in the middle. Checking to see if it is an absolute path is insufficient because that only really handles the leading whitespace. Removing trailing whitespace is also insufficient because you're still left with the invalid space-for-folder-name in the middle.
How do I prove that the path is valid before I attempt to do anything with it?
The only way to "prove" the path is valid is to open it.
SHLWAPI provides a set of path functions which can be used to canonicalize the path or verify that a path seems to be valid. This can be useful to reject obviously bad paths but you still cannot trust that the path is valid without going through the file system.
With NTFS, I believe the path you give is actually valid (though Explorer may not allow you to create a directory with only a space.)
The Boost Filesystem library provides helpers to manipulate files, paths and so on. Take a look at the simple ls example and the exists function.
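A sketch of such an existence check, written against std::filesystem (the C++17 counterpart of Boost.Filesystem) rather than Boost itself. The PathKind enum and the classify_path name are made up for illustration.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

enum class PathKind { Missing, File, Directory, Other };

// Asks the file system what `p` refers to, without throwing.
// Any error (including "not found") is reported as Missing.
PathKind classify_path(const fs::path& p) {
    std::error_code ec;
    const fs::file_status status = fs::status(p, ec);
    if (ec || !fs::exists(status)) return PathKind::Missing;
    if (fs::is_directory(status)) return PathKind::Directory;
    if (fs::is_regular_file(status)) return PathKind::File;
    return PathKind::Other;
}
```

This only tells you what the file system reported at the time of the call, so it complements, rather than replaces, handling the error when you actually open the path.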
I use GetFileAttributes for checking for existence. Works for both folders (look for the FILE_ATTRIBUTE_DIRECTORY flag in the returned value) and for files. I've done this for years, never had a problem.
If you don't want to open the file you can also use something like the access() function on POSIX-like platforms or _access() and friends on Windows. However, I like the Boost.Filesystem method Ricardo pointed out.
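For the access() route, a minimal POSIX-only sketch. The wrapper name path_is_accessible is hypothetical; on Windows you would use _access() instead.

```cpp
#include <unistd.h>  // access, F_OK (POSIX only)

// Returns true if `path` exists according to access().
// The answer can be stale by the time you actually open the file.
bool path_is_accessible(const char* path) {
    return access(path, F_OK) == 0;
}
```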