I have a string that is going to be used as a filename, so I want to check whether it contains special characters and replace them, so that creating the file won't be a problem. Is it good practice to replace them with "_"?
I used the line below. Is it correct? Are there other characters besides letters and digits that can be used in a file name? Which characters should I avoid in file names?
String filename = ch.replaceAll(RegExp('[^A-Za-z0-9]'), '_');
The list of allowed filename characters depends on the underlying filesystem. On (most) Unix systems, anything except / and \0 is allowed. On Windows, the rules get weird. For example, you (usually) can't end a filename with a period; you can't name a file NUL, etc.
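To make the Windows rules a little more concrete, here is a rough C++ sketch that checks a single name component. It assumes the commonly documented rules (no control characters, none of <>:"/\|?*, no trailing period or space, no device names CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9, even with an extension). The function name is hypothetical, and it ignores length limits and long-path prefixes.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <string>
#include <string_view>

// Rough check of one file-name component against common Windows rules:
// no control characters, none of <>:"/\|?*, no trailing '.' or ' ',
// and no reserved device name (even with an extension, e.g. "nul.txt").
// A sketch only: length limits and long-path prefixes are ignored.
bool is_valid_windows_filename(std::string_view name) {
    if (name.empty()) return false;
    for (char c : name) {
        if (static_cast<unsigned char>(c) < 32) return false;  // control characters
        if (std::string_view("<>:\"/\\|?*").find(c) != std::string_view::npos) return false;
    }
    if (name.back() == '.' || name.back() == ' ') return false;

    // Upper-case the part before the first '.' and compare with device names.
    std::string stem(name.substr(0, name.find('.')));
    std::transform(stem.begin(), stem.end(), stem.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    static constexpr std::array<std::string_view, 4> kDevices = {"CON", "PRN", "AUX", "NUL"};
    if (std::find(kDevices.begin(), kDevices.end(), std::string_view(stem)) != kDevices.end())
        return false;
    const bool com_or_lpt = stem.size() == 4 &&
                            (stem.compare(0, 3, "COM") == 0 || stem.compare(0, 3, "LPT") == 0);
    if (com_or_lpt && stem[3] >= '1' && stem[3] <= '9') return false;
    return true;
}
```

For example, "report.txt" and "COM10" pass, while "nul.txt", "a?b" and "trailing." are rejected.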
Other considerations: It would be confusing to allow spaces at the beginning/end of a filename. Spaces within a filename break certain tools (looking at you, make). Is your filesystem case-sensitive or case-preserving? Does it have a maximum filename length?
Which characters should I avoid in file names?
Wrong question. Do you have a particular need to allow "unusual" characters in filenames?
If these are machine-generated names, just do what you're doing (I prefer hyphens, but that's a stylistic decision). If these are user-generated filenames, just try saving the file -- if it fails, get the user to choose another name.
tl;dr: use URL-safe characters: [A-Za-z0-9_-]+.
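As a sketch of the tl;dr, the question's Dart one-liner can be written in C++ with the URL-safe set [A-Za-z0-9_-] as the whitelist. The function name and the default replacement character are illustrative choices, not anything from the answer.

```cpp
#include <string>
#include <string_view>

// Replaces every character outside [A-Za-z0-9_-] with a replacement character.
// Note: '.' is replaced too, so "a.txt" becomes "a_txt"; sanitize the stem and
// append the extension yourself if you need it. Non-ASCII UTF-8 text turns into
// one replacement character per byte.
std::string to_url_safe_filename(std::string_view input, char replacement = '_') {
    std::string out;
    out.reserve(input.size());
    for (char c : input) {
        const bool safe = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') || c == '_' || c == '-';
        out.push_back(safe ? c : replacement);
    }
    return out;
}
```

Passing '-' as the replacement gives the hyphen style mentioned above. An empty input still yields an empty (unusable) name, so check for that separately.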
I need some guidance on how to solve this one. I have tens of thousands of files in multiple subfolders where the encoding got screwed up. With the ls command I see a filename like 'F'$'\366''ljesedel.pdf', including the ' at the beginning and end. That's just one example where the Swedish characters åäö went wrong; here it should have been 'Följesedel.pdf'. If I run
#>find .
Then I see a list of files like this:
./F?ljesedel.pdf
That's not the same representation that ls shows. How on earth do I solve this one? The most obvious approaches:
myvar='$'\366''
char="ö"
find . -name *$myvar* -exec rename 's/$myvar/ö' {} \;
and other attempts fail because
find . -name cannot match the file: it shows ? instead of the "real" characters '$'\366''.
Any suggestions or guidance would be very much appreciated.
The first question is what encoding your terminal expects. Make sure that is UTF-8.
Then you need to find out which bytes the actual filename contains, not just how something might display it. You can do this with a Perl one-liner like the following, run in the directory containing the file:
perl -E'opendir my $dh, "."; printf "%s: %vX\n", $_, $_ for grep { m/jesedel\.pdf/ } readdir $dh'
This will output the filename interpreted as UTF-8 bytes (if you've set your terminal to that) followed by the hex bytes it actually contains.
Using that you can determine what your search pattern should be. Your replacement must be the UTF-8 encoded representation of ö, which it will be by default as part of the command arguments if your terminal is set to that.
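If the hex dump shows a single byte F6 where UTF-8 would have C3 B6, the names were most likely written in ISO-8859-1 (Latin-1): ls's $'\366' is octal 366, i.e. 0xF6, which is ö in Latin-1. Under that assumption, a minimal C++ sketch of the re-encoding looks like this (the function name is hypothetical):

```cpp
#include <string>
#include <string_view>

// Re-encodes a byte string from ISO-8859-1 (Latin-1) to UTF-8.
// Assumption: the broken names were written in Latin-1, where 'ö' is the
// single byte 0xF6 (octal 366, shown by ls as $'\366').
std::string latin1_to_utf8(std::string_view latin1) {
    std::string utf8;
    utf8.reserve(latin1.size() * 2);
    for (char c : latin1) {
        const auto b = static_cast<unsigned char>(c);
        if (b < 0x80) {
            utf8.push_back(c);  // ASCII is identical in both encodings
        } else {
            // A Latin-1 byte equals code point U+0080..U+00FF: two UTF-8 bytes.
            utf8.push_back(static_cast<char>(0xC0 | (b >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return utf8;
}
```

To fix the files you could walk the tree with std::filesystem::recursive_directory_iterator and call std::filesystem::rename on each affected entry. Only convert names that are not already valid UTF-8 (that check is not shown here); otherwise correct names get double-encoded.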
I'm not an expert, but it might not be a problem with the file name itself (which seems to hold the correct Unicode file name); it may be the way ls (and many other utilities) display the name in the terminal.
I was able to show the correct name by setting the terminal character encoding to Unicode. I also noticed that GUI programs (file manager, etc.) were able to show the correct file name.
Gnome Terminal: Terminal > Set Character Encoding > Unicode (UTF-8)
It is still a challenge to 'select' those files with many utilities (e.g., regexps, wildcards). In a few cases you will have to match those characters with a '*' pattern. If this is a major issue, consider using ASCII only, e.g. 'o' instead of 'ö'. Not sure if that is acceptable.
I have a char* that contains only ASCII characters (decimal 32-126). I'm looking for a C++ function that escapes (adds a backslash before) characters with special meanings in the Unix filesystem, such as '/' or '.'. I want to open the file with fopen later.
I'm not sure if replacing them manually would be a good option, since I don't know all the characters with special meanings. I also don't know whether '?' or '*' would work with fopen.
Actually, Unix (or more specifically the SUS) disallows only the byte values '/' and '\0' in file names. Everything else is fair game. The exact strings "." and ".." (in the sense that they are immediately preceded and followed by a '/') are reserved for relative path access, but they are perfectly valid in a Unix path.
And of course any number and sequence of '.' characters is perfectly allowed in a Unix filename; only the exact names "." and ".." are taken. Yes, newlines and any other control characters are all perfectly valid in Unix filenames.
Of course the file system you're using may have a different idea about what's permissible, but you were just asking about Unix.
Update:
Oh, and it should be noted that Unix doesn't specify any "parse" method for filenames. This essentially means a filename is treated as a binary blob key into a key→value database. It also means there is no such thing as "escaping" for Unix filenames.
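Because there is no escaping, turning arbitrary text into a usable name component only means getting rid of '/' and avoiding the special names. A minimal sketch, assuming '_' as an arbitrary substitute (the function name is hypothetical):

```cpp
#include <string>
#include <string_view>

// Turns arbitrary text into a single Unix file-name component. Unix has no
// escaping, so the only bytes that must go are '/' and '\0'; the special
// names "." and ".." (and the empty string) get a '_' prefix.
std::string to_unix_name_component(std::string_view text) {
    std::string name(text);
    for (char& c : name) {
        if (c == '/' || c == '\0') c = '_';
    }
    if (name.empty() || name == "." || name == "..") name.insert(0, "_");
    return name;
}
```

Regarding the fopen part of the question: '?' and '*' are only special to shell globbing, so std::fopen(to_unix_name_component("what? *really*").c_str(), "w") would create a file literally named what? *really*, with no backslashes involved.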
POSIX filenames don't have a concept of escape characters. There is no way to have a slash as an element of a filename (when the system renders filenames using Unicode, you may be able to create a filename that looks as if it contains a slash, though). I think all other printable characters are just fine, although using special characters like * and ? in filenames will probably cause problems when people try to use them from a shell.
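Where such names do have to be typed or generated for a shell, the usual fix is shell quoting rather than changing the name. A sketch using POSIX single quotes, inside which nothing is special except the quote character itself (the function name is hypothetical):

```cpp
#include <string>
#include <string_view>

// Quotes a file name for a POSIX shell command line. Inside single quotes
// only the quote itself is special; it is written as '\'' (close the quote,
// add an escaped quote, reopen the quote).
std::string shell_quote(std::string_view name) {
    std::string out = "'";
    for (char c : name) {
        if (c == '\'') out += "'\\''";
        else out.push_back(c);
    }
    out.push_back('\'');
    return out;
}
```

This is for shell command lines only: fopen and other system calls need the raw, unquoted name.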
Okay, after two days of searching the web and MSDN I haven't found any real solution to this problem, so I'm asking here in the hope that I've overlooked something.
I have an open-file dialog, and after I get the location of the selected file it gives me the string as C:\file.exe. For the next part of my program I need C:\\file.exe. Is there a Microsoft function that can solve this problem, or some workaround?
ofn.lpstrFile = fileName;
char fileNameStr[sizeof(fileName)+1] = "";
if (GetOpenFileName(&ofn))
    strcpy(fileNameStr, fileName);
DeleteFile(fileName); // doesn't work: "invalid path"
I've posted only this part of the code because everything else works fine and isn't relevant to the problem. Any assistance is greatly appreciated; I've been going mad for the last two days.
You are confusing the requirement in C and C++ to escape backslash characters in string literals with what Windows requires.
Windows allows double backslashes in paths in only two circumstances:
Paths that begin with "\\?\"
Paths that refer to share names such as "\\myserver\foo"
Therefore, "C:\\file.exe" is never a valid path.
The problem here is that Microsoft made the (disastrous) decision decades ago to use backslashes as path separators rather than forward slashes like UNIX uses. That decision has been haunting Windows programmers since the early 1980s because C and C++ use the backslash as an escape character in string literals (and only in literals).
So in C or C++, if you type something like DeleteFile("c:\file.exe"), what DeleteFile will see is "c:" followed by an unprintable form-feed character (0x0C) and then "ile.exe". That's because the compiler sees the backslash and interprets it to mean the next character isn't what it appears to be. In this case, "\f" is the standard escape sequence for form feed, so the compiler converts it into the character 0x0C, which isn't valid in a file name.
So how do you create the path "c:\file.exe" in a C/C++ program? You have two choices:
"c:/file.exe"
"c:\\file.exe"
The first choice works because in the Win32 API (and only the API, not the command line), forward slashes in paths are accepted as path separators. The second choice works because a backslash tells the compiler to treat the next character specially: "\f" becomes a form feed, "\n" a newline, and so on. If the next character is another backslash, the pair is interpreted as a single literal backslash and your string will be correct.
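The difference is easy to see by comparing the literals directly. This sketch also shows C++11 raw string literals, R"(...)", which are an extra option not mentioned above: backslashes inside them are taken literally.

```cpp
#include <string>

// The same Windows path written several ways.
const std::string escaped_path = "c:\\file.exe";    // one backslash in memory
const std::string slash_path   = "c:/file.exe";     // accepted by the Win32 API
const std::string raw_path     = R"(c:\file.exe)";  // C++11 raw literal, same bytes as escaped_path

// The mistake: "\f" is the form-feed escape, so this string has no backslash.
const std::string broken_path  = "c:\file.exe";
```

escaped_path and raw_path are identical 11-character strings containing a single backslash, while broken_path is only 10 characters long and has a form feed at index 2.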
The library Boost.Filesystem "provides portable facilities to query and manipulate paths, files, and directories".
In short, you should not use plain strings as file or path names. Use boost::filesystem::path instead. You can still initialize it from a string or char* and convert it back to std::string, but all manipulations and decorations will be done correctly by the class.
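Since C++17 the standard library has std::filesystem, which was standardized from Boost.Filesystem. The sketch below uses std::filesystem::path in place of boost::filesystem::path so that it needs no extra library; the helper name is hypothetical.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Returns the file name without its extension, e.g. "dir/sub/file.exe" -> "file".
// The path class splits the components itself; no manual escaping is involved.
std::string stem_of(const std::string& path_text) {
    const fs::path p(path_text);
    return p.stem().string();
}
```

The same path object also offers filename(), extension() and parent_path(), and can be passed directly to functions such as fs::remove, so the string never needs to be rewritten by hand.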
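I'm guessing you mean converting "C:\file.exe" to "C:\\file.exe":
#include <string>

// Returns a copy of input_string with every backslash doubled,
// e.g. C:\file.exe becomes C:\\file.exe.
std::string double_backslashes(const std::string& input_string)
{
    std::string output_string;
    for (auto character : input_string)
    {
        if (character == '\\')
        {
            output_string.push_back(character); // emit the extra backslash
        }
        output_string.push_back(character);
    }
    return output_string;
}
Note that the code is actually looking for a single backslash; the '\\' in the code is just the escaped literal for one backslash character.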
I have a bunch of files, all named in one of these patterns:
english words number.extension
or
english words Characters.extension (Characters meaning Chinese, Japanese, Korean, etc.)
How can I write a regexp to filter them, removing the number and the non-English characters,
so that they become
english words.extension
Thanks.
To match anything other than the 26 English letters you could use /[^A-Za-z]/ or /[^a-z]/i. I don't know which programming language you're using, so I can't give a more specific example.
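Since no language was given, here is one possible C++ sketch using std::regex. Unlike the bare [^A-Za-z], it keeps spaces and leaves the extension alone; both choices, and the function name, are assumptions about what the asker wants.

```cpp
#include <regex>
#include <string>

// Removes everything except ASCII letters and spaces from the part of the name
// before the last '.', trims trailing spaces, and re-attaches the extension.
// std::regex works on bytes here, so every byte of a CJK character is removed.
std::string english_only_name(const std::string& filename) {
    const auto dot = filename.rfind('.');
    std::string stem = filename.substr(0, dot);  // whole name if there is no '.'
    const std::string ext = dot == std::string::npos ? "" : filename.substr(dot);
    static const std::regex not_letter_or_space("[^A-Za-z ]");
    stem = std::regex_replace(stem, not_letter_or_space, "");
    const auto last = stem.find_last_not_of(' ');
    stem.erase(last == std::string::npos ? 0 : last + 1);  // trim trailing spaces
    return stem + ext;
}
```

For example, "english words 42.mp3" becomes "english words.mp3".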
If you don't mind being a bit verbose, you can make an explicit list of 'acceptable' characters and reject anything not on the list. For example:
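for old_filename in *; do
    # Keep only ASCII letters, '.', '_', space and '-'; LC_ALL=C makes a-z/A-Z plain ASCII ranges
    new_filename=$(printf '%s\n' "$old_filename" | LC_ALL=C sed -e 's/[^a-zA-Z._ -]//g')
    if [ "$new_filename" != "$old_filename" ]; then
        mv -- "$old_filename" "$new_filename"
    fi
done
Looping over * rather than the output of ls keeps names containing spaces intact, and the quotes around the variables matter for the same reason. Note that inside a bracket expression a '-' between two characters forms a range, so it has to come last to be taken literally (a range such as '.-_' would also keep the digits).
If the 'A-Z' etc. character ranges pick up characters you don't want (this may or may not be an issue depending on your locale), you can always list every letter individually.
Adjust the * glob if you only want to pick up certain files in the directory (filter by extension, etc.). You will run into problems if more than one file transforms into the same 'English-only' name (mv would overwrite the earlier file), but you should be able to work around that by appending an extra character to the filename.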
I need to escape all special characters and replace national characters to get "plain text" for a table name.
string getTableName(string name)
My string could be "šárka65_%&." and I want a string I can use as a table name in my database.
Which DBMS?
In standard SQL, a name enclosed in double quotes is a delimited identifier and may contain any characters.
In MS SQL Server, a name enclosed in square brackets is a delimited identifier.
In MySQL, a name enclosed in back-ticks is a delimited identifier.
You could simply choose to enclose the name in the appropriate markers.
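For the standard-SQL form, a sketch of the quoting: wrap the name in double quotes and double any embedded double quote. The function name is hypothetical, and DBMS-specific limits (identifier length, disallowed characters) are not checked.

```cpp
#include <string>
#include <string_view>

// Builds a standard-SQL delimited identifier: wrap in double quotes and
// double every embedded double quote ("" inside the quotes means ").
std::string quote_sql_identifier(std::string_view name) {
    std::string out = "\"";
    for (char c : name) {
        if (c == '"') out.push_back('"');
        out.push_back(c);
    }
    out.push_back('"');
    return out;
}
```

SQL Server's square brackets and MySQL's back-ticks follow the same idea with their own delimiters.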
I had a feeling that wasn't what you wanted...
What codeset is your string in? It seems to be UTF-8 by the time it gets to my browser. Do you need to be able to invert the mapping unambiguously? That is harder.
You can use many schemes to map the information:
One simple-minded approach is to hex-encode everything, using a marker (X) to protect against leading digits:
XC5A1C3A1726B6136355F25262E
A slightly less simple-minded one is to hex-encode anything that is not already an ASCII alphanumeric or underscore:
XC5A1C3A1rka65_25262E
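Both hex schemes can be sketched in a few lines of C++, assuming the input string holds UTF-8 bytes (as in the examples above). The function name and the keep_safe flag are illustrative.

```cpp
#include <string>
#include <string_view>

// Every name gets a leading 'X' so it never starts with a digit.
// keep_safe == false: hex-encode every byte (first scheme).
// keep_safe == true: keep ASCII letters, digits and '_' as they are (second scheme).
std::string hex_table_name(std::string_view name, bool keep_safe) {
    static constexpr char hex[] = "0123456789ABCDEF";
    std::string out = "X";
    for (char c : name) {
        const auto b = static_cast<unsigned char>(c);
        const bool safe = (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') ||
                          (b >= '0' && b <= '9') || b == '_';
        if (keep_safe && safe) {
            out.push_back(c);
        } else {
            out.push_back(hex[b >> 4]);
            out.push_back(hex[b & 0x0F]);
        }
    }
    return out;
}
```

With the UTF-8 bytes of "šárka65_%&." this yields XC5A1C3A1726B6136355F25262E and XC5A1C3A1rka65_25262E respectively. Only the first form can be decoded unambiguously, since in the second a kept digit cannot be told apart from a hex digit.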
Or, as a comment suggests, you can devise a mapping table for accented Latin letters; indeed, an appropriately initialized mapping table will be the fastest approach. The input is the character in the source string; the output is the desired mapped character or characters. If you use an 8-bit character set, this is entirely manageable. If you use full Unicode, it is a lot less manageable (not least: how do you map all the Han characters to ASCII?).
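A sketch of the mapping-table idea, assuming UTF-8 input. The table is a deliberately partial, hypothetical sample to extend with the letters you actually expect; anything that is neither mapped nor an ASCII letter, digit or '_' becomes '_'.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical, deliberately partial mapping table: UTF-8 sequence -> ASCII.
const std::unordered_map<std::string, std::string> kFold = {
    {"\xC3\xA1", "a"}, {"\xC3\xA4", "a"}, {"\xC3\xA5", "a"},  // á ä å
    {"\xC3\xA9", "e"}, {"\xC3\xB6", "o"},                     // é ö
    {"\xC4\x8D", "c"}, {"\xC5\xA1", "s"}, {"\xC5\xBE", "z"},  // č š ž
};

// Maps accented letters through kFold, keeps ASCII letters, digits and '_',
// and replaces everything else (including unmapped characters) with '_'.
// Unlike the hex schemes, this mapping cannot be inverted.
std::string fold_to_ascii(std::string_view utf8) {
    std::string out;
    for (std::size_t i = 0; i < utf8.size();) {
        const auto b = static_cast<unsigned char>(utf8[i]);
        std::size_t len = 1;  // length of this UTF-8 sequence, from its lead byte
        if (b >= 0xF0) len = 4;
        else if (b >= 0xE0) len = 3;
        else if (b >= 0xC0) len = 2;
        if (len > utf8.size() - i) len = utf8.size() - i;  // truncated input
        const std::string seq(utf8.substr(i, len));
        if (len == 1 && ((b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') ||
                         (b >= '0' && b <= '9') || b == '_')) {
            out += seq;
        } else if (const auto it = kFold.find(seq); it != kFold.end()) {
            out += it->second;
        } else {
            out.push_back('_');
        }
        i += len;
    }
    return out;
}
```

With this table, "šárka65_%&." becomes sarka65____.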
Or ...