I need a POSIX or Linux API function that takes a file path and returns the file's extension. Every platform should have one, but I can't find it for Linux. What's it called?
First use strrchr to find the last '.' in the pathname. If it doesn't exist, there's no "extension".
Next, use strchr to check whether there's any '/' after the last '.'. If so, the last '.' is in a directory component, not the filename, so there's no extension.
Otherwise, you found the extension. You can use the pointer to the position one past the '.' directly as a C string. No need to copy it to new storage unless the original string will be freed or clobbered before you use it.
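The steps above can be sketched roughly as follows. The function name lastExtension is made up for illustration, and the sketch assumes '/' is the only directory separator.

```cpp
#include <cstring>

// Hypothetical helper: returns a pointer to the text one past the last '.'
// of the final path component, or nullptr when there is no extension.
// The result points into `path`, so it is only valid as long as `path` is.
const char* lastExtension(const char* path) {
    const char* dot = std::strrchr(path, '.');
    if (dot == nullptr) {
        return nullptr;                  // no '.' anywhere in the path
    }
    if (std::strchr(dot, '/') != nullptr) {
        return nullptr;                  // the '.' belongs to a directory name
    }
    return dot + 1;                      // one past the '.'
}
```

Note that this treats a hidden file such as .bashrc as having the extension "bashrc"; whether that is what you want depends on your definition of "extension".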
Note: The above is assuming you define "extension" as only the final '.'-delimited component. If you want to consider things like .tar.gz and .cpp.bak as extensions, a slightly different approach works:
First, use strrchr to find the final '/'. If not found, treat the start of the string as your result.
Second, use strchr to find the first '.' starting from the position you just found. The result is your extension.
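A minimal sketch of this multi-extension variant, again assuming '/' is the only separator; fullExtension is an illustrative name. Here the returned pointer includes the leading '.', because strchr returns the position of the '.' itself.

```cpp
#include <cstring>

// Hypothetical helper: returns a pointer to the first '.' of the final path
// component (so the result includes the leading '.'), or nullptr if the
// component contains no '.'. The result points into `path`.
const char* fullExtension(const char* path) {
    const char* slash = std::strrchr(path, '/');
    const char* name = (slash != nullptr) ? slash + 1 : path; // start of the file name
    return std::strchr(name, '.');
}
```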
I don't think there's a default function for this.
In my filesystem library, I just apply string operations.
First, I get the filename with extension from the full path, looking for / separators and extracting everything after the last one. Then, I grab everything after the first . dot character, including the dot itself. It has worked well so far.
Remember that some system files can start with a . dot character - so check if the filename begins with the dot character before extracting the extension.
Algorithm
Get file name from full path by removing folder names from the left:
/home/test/.myfile.cpp.bak ->
/test/.myfile.cpp.bak ->
/.myfile.cpp.bak ->
.myfile.cpp.bak
Check if the file name begins with .:
If it does, remove it from current file name .myfile.cpp.bak -> myfile.cpp.bak
Now, extract everything after the first . you encounter from the left (if you want multiple extensions) - otherwise, extract everything after the last . from the left
myfile.cpp.bak -> .cpp.bak (first case)
myfile.cpp.bak -> .bak (second case)
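The algorithm above could look something like this with std::string. The name extensionOf and the multiple flag are made up for this sketch, and '/' is assumed to be the only separator.

```cpp
#include <string>

// Hypothetical helper following the steps above. With multiple == true it
// returns everything from the first '.', otherwise from the last '.'.
// Returns an empty string when there is no extension.
std::string extensionOf(const std::string& path, bool multiple) {
    // Step 1: keep only the file name (everything after the last '/').
    const std::size_t slash = path.rfind('/');
    std::string name = (slash == std::string::npos) ? path : path.substr(slash + 1);

    // Step 2: a leading '.' marks a hidden file, not an extension; drop it.
    if (!name.empty() && name[0] == '.') {
        name.erase(0, 1);
    }

    // Step 3: cut at the first or the last remaining '.'.
    const std::size_t dot = multiple ? name.find('.') : name.rfind('.');
    return (dot == std::string::npos) ? std::string() : name.substr(dot);
}
```

For example, extensionOf("/home/test/.myfile.cpp.bak", true) gives ".cpp.bak", and the same call with false gives ".bak".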
Including Boost just for its filesystem library is a bit much. But now that the Boost implementation has made it into TR2 and is implemented in Visual Studio, it may be time to start looking at it:
http://cpprocks.com/introduction-to-tr2-filesystem-library-in-vs2012/
http://msdn.microsoft.com/en-us/library/hh874694.aspx
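For what it's worth, the filesystem library later became part of the standard as std::filesystem in C++17, so with a C++17 compiler something like the following should work without Boost. The wrapper name stdExtension is made up for illustration.

```cpp
#include <filesystem>
#include <string>

// Hypothetical wrapper around std::filesystem::path::extension() (C++17).
// The result includes the leading '.', covers only the last extension, and
// is empty for names without one, including hidden files such as ".bashrc".
std::string stdExtension(const std::string& path) {
    return std::filesystem::path(path).extension().string();
}
```

Only the final extension is returned, so "file.tar.gz" yields ".gz"; multi-part extensions still need manual string handling.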
What seems to me the best way to solve this problem (in absence of API function, which itself is weird) is to combine Vittorio's and R.'s answers with basename function that takes a path and returns the file name, if the path points to a file: http://linux.die.net/man/3/basename
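I also convert the resulting string to a wide string with mbstowcs (on Linux with glibc, wchar_t is 32 bits wide, so this is UTF-32 rather than UTF-16) and do all the finding with std::wstring:
#include <libgen.h>  // basename (POSIX)
#include <climits>   // PATH_MAX
#include <cstdlib>   // mbstowcs (depends on the current locale, see setlocale)
#include <string>
std::wstring fileExtFromPath (const char * path)
{
std::string pathCopy (path); // POSIX basename may modify its argument, so work on a copy
const char * fileName = basename(&pathCopy[0]);
wchar_t buffer [PATH_MAX] = {0}; // Size the buffer dynamically if you don't like PATH_MAX
if (mbstowcs(buffer, fileName, PATH_MAX - 1) == (size_t)-1) return std::wstring(); // invalid multibyte input
const std::wstring fileNameW (buffer);
const size_t pointPosition = fileNameW.rfind(L".");
// No dot, or only a leading dot (hidden file): no extension
const std::wstring fileExtW = (pointPosition == std::wstring::npos || pointPosition == 0) ? std::wstring() : fileNameW.substr(pointPosition + 1);
return fileExtW;
}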
Related
I have a string that is going to be used as a filename, so I want to check whether it contains any special characters and replace them, so that they won't cause a problem when I create the file. Is it good practice to replace them with "_"?
I used this; is it correct? Are there other characters besides letters and digits that can be used in a file name? Which characters should I avoid in file names?
String filename = ch.replaceAll(RegExp('[^A-Za-z0-9]'), '_');
The list of allowed filename characters depends on the underlying filesystem. On (most) Unix, anything except / and \0 is allowed. On Windows, the rules get weird. For example, you (usually) can't end a filename with a period; you can't name a file NUL, etc.
Other considerations: It would be confusing to allow spaces at the beginning/end of a filename. Spaces within a filename break certain tools (looking at you, make). Is your filesystem case-sensitive or case-preserving? Does it have a maximum filename length?
Which characters should I avoid in file names?
Wrong question. Do you have a particular need to allow "unusual" characters in filenames?
If these are machine-generated names, just do what you're doing (I prefer hyphens, but that's a stylistic decision). If these are user-generated filenames, just try saving the file -- if it fails, get the user to choose another name.
tl;dr: use URL-safe characters: [A-Za-z0-9_-]+.
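The question uses Dart, but the same whitelist idea can be sketched in C++ as below. The function name sanitizeFilename is made up, and the sketch assumes plain ASCII input.

```cpp
#include <string>

// Hypothetical helper: replaces every character outside [A-Za-z0-9_-]
// with '_'. Assumes ASCII input; multi-byte characters are replaced
// byte by byte.
std::string sanitizeFilename(const std::string& name) {
    std::string out = name;
    for (char& c : out) {
        const bool safe = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') || c == '_' || c == '-';
        if (!safe) {
            c = '_';
        }
    }
    return out;
}
```

Note that '.' is replaced as well, so any extension should be split off first and appended again afterwards. Two different inputs can also map to the same output name.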
I just noticed that std::filesystem::recursive_directory_iterator uses different path separators (i.e. / vs \) depending on whether it's on Windows or Linux. Is there a way to make it return paths with '/' to make it consistent across systems?
This is how I am getting the paths:
for(auto& path: fs::recursive_directory_iterator(dir_path))
{
// Skip directories in the enumeration.
if(fs::is_directory(path)) continue;
string path_str = path.path().string();
}
What I mean is, the contents of path_str will be different between the two OSs (because the separators will be different), and I would like them to be the same. I could just replace them in the final string, but that uses more cycles than if I could instruct the STL to use '/' for everything instead.
So, your problem has nothing to do with recursive_directory_iterator, which iterates on directory_entry objects, not paths. Your confusion probably stems from the fact that directory entries are implicitly convertible to paths, so you can use them as such.
Your problem is really about path::string(), which, as the documentation states, uses the native format (i.e. with a platform dependent separator). You would get the same problem regardless of how you get your path.
If you want to get / as the directory separator, use path::generic_string() instead to get the path in generic format.
for(auto& dir_entry: fs::recursive_directory_iterator(dir_path))
{
if(dir_entry.is_directory()) continue;
string path_str = dir_entry.path().generic_string();
}
I have a char* which only contains ASCII characters (decimal: 32-126). I'm searching for a c++ function which escapes (add a backslash before the character) characters that have special meanings in the unix filesystem like '/' or '.'. I want to open the file with fopen later.
I'm not sure, if manually replacing would be a good option. I don't know all characters with special meanings. I also don't know if '?' or '*' would work with fopen.
Actually, Unix (or more specifically, the SUS) disallows only the byte values '/' and '\0' in file names. Everything else is fair game. The exact strings "." and ".." (exact in the sense that they're immediately preceded and followed by a '/') are reserved for relative path access, but they are perfectly valid in a Unix path.
And of course any number and sequence of '.' is allowed in a Unix filename, as long as the name is not exactly "." or "..". Yes, newlines and any other control characters are all perfectly valid in Unix filenames.
Of course the file system you're using may have a different idea about what's permissible, but you were just asking about Unix.
Update:
Oh, and it should be noted that Unix doesn't specify some "parse" method for filenames. This essentially means a filename is treated as a binary blob key into a key→value database. It also means that there's no such thing as "escaping" for Unix filenames.
POSIX filenames don't have a concept of escape characters. There is no way to have a slash as an element of a filename (when the system renders filenames using Unicode, you may be able to create a filename which looks as if it contains a slash, though). I think all other printable characters are just fine, although using special characters like * and ? in a filename will probably cause problems when people try to use them from a shell.
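Since nothing needs escaping, the only check that makes sense before calling fopen is a validity check on each name. A minimal sketch, assuming you want to reject "." and ".." as names for new files and ignoring filesystem-specific limits such as maximum name length; isValidUnixFilename is an illustrative name.

```cpp
#include <string>

// Hypothetical helper: checks a single path component (not a whole path).
// Rejects empty names, the reserved names "." and "..", and any name
// containing '/' or an embedded NUL byte. Everything else is accepted,
// including '*', '?', spaces, and control characters.
bool isValidUnixFilename(const std::string& name) {
    if (name.empty() || name == "." || name == "..") {
        return false;
    }
    return name.find('/') == std::string::npos &&
           name.find('\0') == std::string::npos;
}
```

A name that passes this check can be handed to fopen as-is; characters like '*' and '?' are only special to the shell, not to fopen.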
Okay, after two days of searching the web and MSDN, I haven't found any real solution to this problem, so I'm going to ask here in the hope that I've overlooked something.
I have an open dialog window, and after I get the location of the selected file, it gives me the string in the following form: C:\file.exe. For the next part of my program I need C:\\file.exe. Is there any Microsoft function that can solve this problem, or some workaround?
ofn.lpstrFile = fileName;
char fileNameStr[sizeof(fileName)+1] = "";
if (GetOpenFileName(&ofn))
strcpy(fileNameStr, fileName);
DeleteFile(fileName); // doesn't work, invalid path
I've posted only this part of the code, because everything else works fine and isn't relevant to this problem. Any assistance is greatly appreciated, as I've been going mad for the last two days.
You are confusing the requirement in C and C++ to escape backslash characters in string literals with what Windows requires.
Windows allows double backslashes in paths in only two circumstances:
Paths that begin with "\\?\"
Paths that refer to share names such as "\\myserver\foo"
Therefore, "C:\\file.exe" is never a valid path.
The problem here is that Microsoft made the (disastrous) decision decades ago to use backslashes as path separators rather than forward slashes like UNIX uses. That decision has been haunting Windows programmers since the early 1980s because C and C++ use the backslash as an escape character in string literals (and only in literals).
So in C or C++, if you type something like DeleteFile("c:\file.exe"), what DeleteFile will see is "c:ile.exe" with an unprintable form feed character (0x0C) inserted between the colon and "ile.exe". That's because the compiler sees the backslash and interprets it, together with the next character, as an escape sequence. In this case, the next character is an f, and "\f" is the escape sequence for form feed. The compiler therefore converts "\f" into the character 0x0C, which isn't valid in a file name.
So how do you create the path "c:\file.exe" in a C/C++ program? You have two choices:
"c:/file.exe"
"c:\\file.exe"
The first choice works because in the Win32 API (and only the API, not the command line), forward slashes in paths are accepted as path separators. The second choice works because the first backslash tells the compiler to treat the next character specially. If the next character completes a recognized escape sequence (such as \f or \n), you get the corresponding control character. If the next character is another backslash, the pair is interpreted as a single literal backslash and your string will be correct.
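To make the difference visible, here is a small sketch of what the compiler actually stores for each spelling. The variable names are made up; the raw string literal form requires C++11 and is an addition not mentioned above.

```cpp
#include <string>

// Two characters in the source ("\\"), one backslash in the stored string.
const std::string escaped = "c:\\file.exe";

// Forward slash: no escaping needed, accepted by the Win32 API.
const std::string forward = "c:/file.exe";

// Raw string literal (C++11): backslashes are taken literally.
const std::string raw = R"(c:\file.exe)";

// The mistake: "\f" is the form feed escape, so the 'f' disappears and a
// control character (0x0C) takes the place of the backslash.
const std::string broken = "c:\file.exe";
```

Here escaped and raw hold the same 11 characters, while broken is one character shorter and has a form feed at index 2.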
The library Boost.Filesystem "provides portable facilities to query and manipulate paths, files, and directories".
In short, you should not use strings as file or path names. Use boost::filesystem::path instead. You can still init it from a string or char* and you can convert it back to std::string, but all manipulations and decorations will be done correctly by the class.
I'm guessing you mean convert "C:\file.exe" to "C:\\file.exe"
std::string output_string;
for (auto character : input_string)
{
if (character == '\\')
{
output_string.push_back(character);
}
output_string.push_back(character);
}
Please note it is actually looking for a single backslash to replace, the double backslash used in the code is to escape the first one.
So I need to parse the input of the user in the following way:
If the user enters
C:\Program\Folder\NextFolder\File.txt
OR
C:\Program\Folder\NextFolder\File.txt\
Then I want to remove the file and just save
C:\Program\Folder\NextFolder\
I essentially want to find the first occurrence of \ starting at the end, and if they put a trailing slash then I can find the second occurrence. I can decipher whether I need the first or the second with this code:
input.substr(input.size()-1,1)!="/"
But I don't understand how to find the first occurrence starting from the end. Any ideas?
This
input.substr(input.size()-1,1)!="/"
is very inefficient*. Use:
if( ! input.empty() && input[ input.length() - 1 ] == '/' )
{
// something
}
Finding the first occurrence of something starting from the end is the same as finding the last occurrence starting from the beginning. You may use find_last_of or rfind. Or you may even use the standard find, combined with rbegin and rend.
*std::string::substr creates one substring, "/" probably creates another (depends on std::string::operator!=), compares the two strings and destroys the temp objects.
Note that
C:\Program\Folder\NextFolder\File.txt\
is not a path to a file, it's a directory.
If your input is a std::string (which I think it is), you can search it with string::find for a normal search and string::rfind for a reverse search (end to start). To check the last character, you don't need substr and shouldn't use it, since it creates a new string instance just to compare one character. You can simply write if( !input.empty() && input.back() == '/' ) (back() requires C++11).
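Putting rfind and back() together, stripping the last component could look roughly like this. The function name parentDirectory is made up; the sketch assumes backslash separators, as in the question's examples, and requires C++11.

```cpp
#include <string>

// Hypothetical helper: drops one trailing '\' if present, then cuts the
// string after the last remaining '\'. Returns an empty string when the
// input contains no separator at all.
std::string parentDirectory(std::string input) {
    if (!input.empty() && input.back() == '\\') {
        input.pop_back();                       // ignore a trailing separator
    }
    const std::size_t pos = input.rfind('\\');  // last separator from the end
    if (pos == std::string::npos) {
        return std::string();
    }
    return input.substr(0, pos + 1);            // keep the separator itself
}
```

Both example inputs from the question, with and without the trailing backslash, come out as C:\Program\Folder\NextFolder\.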
If you are using C++ strings, then try the reverse iterator on the strings, to write your own logic on what is acceptable and what is not. There is a clear example in the link I provided.
From what I guessed, you are trying to store the directory name given a path which could be end with a file or a directory.
If that is the case, you are better off removing the trailing '\' and checking whether it is a directory; stop if it is, or else proceed if it is not.
Alternately, you can try splitting the string on '\' into two parts. Some related notes here.
If those are actual file names, (looks like you are using windows), so try the _splitpath function as well.