Filename with an extra ":" or a "-" in C++

I want to create a filename containing characters like ":" and "-".
I tried the following code to append the date and time to my filename:
Str.Format(_T("%d-%d-%d-%d:%d:%d.log"), systemTime.wDay, systemTime.wMonth, systemTime.wYear, systemTime.wHour, systemTime.wMinute, systemTime.wSecond);
std::wstring NewName = filename.c_str() + Str;
MoveFileEx(oldFilename.c_str(), NewName.c_str(), MOVEFILE_COPY_ALLOWED);
MoveFileEx fails with Windows error code 123 (ERROR_INVALID_NAME), so I think the issue is that my new filename contains ":" and "-".
Thanks,

Indeed, you cannot use the : character in Windows file names. Replace it with something else. If a program depends on the name, modify it to interpret the alternative delimiter.
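As a minimal sketch of that advice, the snippet below builds the question's timestamp with '-' in place of ':' and shows a matching parser for a program that reads the name back. makeLogName and parseLogStamp are hypothetical names, plain ints stand in for the SYSTEMTIME fields, and the zero-padding (%02d) is a choice made here, not something the original code did.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: builds "<prefix>DD-MM-YYYY-HH-MM-SS.log", using '-'
// where the original format string used ':' (which Windows rejects).
std::string makeLogName(const std::string& prefix, int day, int month, int year,
                        int hour, int minute, int second) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%02d-%02d-%04d-%02d-%02d-%02d.log",
                  day, month, year, hour, minute, second);
    return prefix + buf;
}

// Hypothetical counterpart for a program that consumes these names: it now
// has to expect '-' between the time fields instead of ':'.
bool parseLogStamp(const std::string& stamp, int& day, int& month, int& year,
                   int& hour, int& minute, int& second) {
    return std::sscanf(stamp.c_str(), "%d-%d-%d-%d-%d-%d.log",
                       &day, &month, &year, &hour, &minute, &second) == 6;
}
```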

"I want to create..." No, you don't. Different systems impose different constraints on what is legal in a filename. Most modern systems allow fairly long names (say, more than 100 characters) and don't impose a format on them (although Windows still treats anything after the last ., if there is one, specially, so you want to be careful there). If you're not concerned about portability, you can simply follow the rules of the system you're on. Under Unix, no '/' or '\0' (but I'd also avoid anything a Unix shell would consider a meta-character: anything in ()[]{}<>!$|?*" \ and the backtick, at least), and I'd avoid starting a filename with a '-'. Windows formally forbids anything in <>:"/\|?*; here too, I'd avoid anything other programs might consider special (including using two %, which could be interpreted as a shell variable), and I'd also be careful that, if there is a ., the final .something is meaningful to the system. (If the filename already ends with something like .log, there's no problem with additional dots before that.)
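A check for the Windows rule above might look like the following sketch. hasNoWindowsForbiddenChars is a hypothetical helper; besides the listed characters it also rejects ASCII control characters (codes below 32), which Windows disallows as well, but it does not attempt the other Windows restrictions such as reserved device names (CON, NUL, ...) or trailing dots and spaces.

```cpp
#include <string>

// Checks one file-name component against the characters Windows formally
// forbids (<>:"/\|?*) plus ASCII control characters (0x00-0x1F).
// Reserved device names and trailing dots/spaces are NOT checked here.
bool hasNoWindowsForbiddenChars(const std::string& name) {
    static const std::string forbidden = "<>:\"/\\|?*";
    if (name.empty()) return false;
    for (unsigned char c : name) {
        if (c < 0x20 || forbidden.find(static_cast<char>(c)) != std::string::npos)
            return false;
    }
    return true;
}
```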
In most cases, it's probably best to be conservative; you never know what system you'll be using in the future. In my own work (having been burned by creating a filename with a colon under Linux, and then not even being able to delete it later under Windows), I've pretty much adopted the rule of only allowing '-', '_' and the alphanumeric characters, and of forbidding filenames that differ only in case (more than a few people I know will only use lower case for letters). That's far more restrictive than just Unix and Windows, but who knows what the future holds. (It's also too liberal for some of the systems I've worked on in the past. Hopefully those are gone for good, however.)
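The "no names that differ only in case" rule can be enforced by comparing a candidate against the names already in use with case folded away. This is a sketch with hypothetical function names; the folding is ASCII-only, which is enough when names are restricted to '-', '_' and alphanumerics as described.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// ASCII-only lower-casing; adequate for names limited to [A-Za-z0-9_-].
std::string asciiLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// True if `candidate` differs from an existing name only in letter case,
// e.g. "Report" vs "report"; such a pair cannot coexist on a
// case-insensitive file system.
bool collidesIgnoringCase(const std::string& candidate,
                          const std::vector<std::string>& existing) {
    const std::string key = asciiLower(candidate);
    for (const std::string& name : existing)
        if (name != candidate && asciiLower(name) == key) return true;
    return false;
}
```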

Windows does not allow a few special characters in file names.
But to create a file name from the current date and time, you can use this formatting:
CTime CurrentTime( CTime::GetCurrentTime() );
SampleFileName = CurrentTime.Format( _T( "%m_%d_%y %H_%M_%S" ) ) + fileExtension;
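Outside MFC, a similar name can be produced with std::strftime, which uses the same style of % codes as CTime::Format. This sketch assumes the caller already has a std::tm (for example from std::localtime); timestampedName is a hypothetical name. It uses %H (24-hour clock), because %I without an AM/PM marker would give 01:00 and 13:00 the same file name.

```cpp
#include <ctime>
#include <string>

// Builds "MM_DD_YY HH_MM_SS<extension>" from a broken-down time.
std::string timestampedName(const std::tm& when, const std::string& extension) {
    char buf[32];
    std::size_t n = std::strftime(buf, sizeof buf, "%m_%d_%y %H_%M_%S", &when);
    return std::string(buf, n) + extension;
}
```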
For more time formatting options, please refer to this.

Related

C++: Which locale is considered by sprintf?

I am using two functions, sprintf and snprintf, to convert "double" values to strings.
In one case, the running application has a different locale than Windows. In that scenario, sprintf always uses the application's locale, whereas snprintf sometimes starts using the Windows locale. As a consequence, the decimal characters produced by the two functions differ, which causes a problem.
To provide further details:
I have a library in my project which builds a string from a "double"; this library uses snprintf to convert the double to a string. I then need to send this information to a server that only understands "." (dot) as the decimal symbol, so I need to replace the local decimal character with a "." (dot). To find out the local decimal character (in order to replace it), I use another library in my project, which uses sprintf. Then I replace this character with a dot to get the final output.
Also, please note that sprintf always uses the locale of the native application, while snprintf sometimes uses the locale of Windows.
Because the problem is intermittent, I'm sorry I can't provide a clear example.
So, what are the circumstances under which snprintf might behave differently?
Why am I getting such different behavior from these two methods?
How can I avoid it?
P.S. - I have to use these 2 methods, so please suggest a solution which would not require me to use any different methods.
Thanks.
The locale used by both sprintf and snprintf is not the Windows locale, but your application locale. As this locale is global to your application, any line of code in your program can change it.
In your case, the (not thread safe) solution may be to temporarily replace the locale for the snprintf call:
auto old = std::locale::global(std::locale::classic());
snprintf(...);
std::locale::global(old);
BTW, the "Windows locale" can be accessed via just std::locale(""); you don't need to know its exact name.
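Another way to get a similar effect, sketched below, switches only the C-level LC_NUMERIC category with std::setlocale and restores it automatically when the scope ends. CNumericLocaleGuard and formatForServer are hypothetical names, and, like the std::locale::global approach, this changes process-wide state, so it is not thread safe either.

```cpp
#include <clocale>
#include <cstdio>
#include <string>

// Hypothetical RAII helper: LC_NUMERIC is "C" for the guard's lifetime.
class CNumericLocaleGuard {
public:
    CNumericLocaleGuard() {
        const char* current = std::setlocale(LC_NUMERIC, nullptr);
        // setlocale's result may be overwritten by later calls, so copy it.
        saved_ = current ? current : "C";
        std::setlocale(LC_NUMERIC, "C");
    }
    ~CNumericLocaleGuard() { std::setlocale(LC_NUMERIC, saved_.c_str()); }
    CNumericLocaleGuard(const CNumericLocaleGuard&) = delete;
    CNumericLocaleGuard& operator=(const CNumericLocaleGuard&) = delete;

private:
    std::string saved_;
};

// Formats a double with '.' as the decimal point, whatever LC_NUMERIC was.
std::string formatForServer(double value) {
    CNumericLocaleGuard guard;
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.6g", value);
    return buf;
}
```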

How to parse numbers like "3.14" with scanf when locale expects "3,14"

Let's say I have to read a file, containing a bunch of floating-point numbers. The numbers can be like 1e+10, 5, -0.15 etc., i.e., any generic floating-point number, using decimal points (this is fixed!). However, my code is a plugin for another application, and I have no control over what's the current locale. It may be Russian, for example, and the LC_NUMERIC rules there call for a decimal comma to be used. Thus, Pi is expected to be spelled as "3,1415...", and
sscanf("3.14", "%f", &x);
returns "1", and x contains "3.0", since it refuses to parse past the '.' in the string.
I need to ignore the locale for such number-parsing tasks.
How does one do that?
I could write a parseFloat function, but this seems like a waste.
I could also save the current locale, reset it temporarily to "C", read the file, and restore the saved one. What are the performance implications of this? Could setlocale() be very slow on some OS/libc combination? What does it really do under the hood?
Yet another way would be to use iostreams, but again their performance isn't stellar.
My personal preference is to never use LC_NUMERIC, i.e. just call setlocale with other categories, or, after calling setlocale with LC_ALL, use setlocale(LC_NUMERIC, "C");. Otherwise, you're completely out of luck if you want to use the standard library for printing or parsing numbers in a standard form for interchange.
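For an application that controls its own start-up, that policy is two calls, as in this sketch (initLocaleKeepingCNumeric and readInterchangeFloat are hypothetical names). It does not help a plugin like the one in the question, which cannot decide what the host application does with the locale.

```cpp
#include <clocale>
#include <cstdio>

// Adopt the user's locale for everything (messages, character classes, ...),
// then put LC_NUMERIC back to "C" so printf/scanf use '.' for interchange.
void initLocaleKeepingCNumeric() {
    std::setlocale(LC_ALL, "");
    std::setlocale(LC_NUMERIC, "C");
}

// Reads a '.'-decimal number; correct once LC_NUMERIC is "C".
bool readInterchangeFloat(const char* text, float* out) {
    return std::sscanf(text, "%f", out) == 1;
}
```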
If you're lucky enough to be on a POSIX 2008-conforming system, you can use uselocale and the *_l family of functions to make the situation somewhat better. There are at least two basic approaches:
1. Leave the default locale unset (at least the troublesome parts like LC_NUMERIC; LC_CTYPE should probably always be set), and pass a locale_t object for the user's locale to the appropriate *_l functions only when you want to present things to the user in a way that meets their own cultural expectations; otherwise use the default C locale.
2. Have your code that needs to work with data for interchange keep a locale_t object for the C locale around, and either switch back and forth using uselocale when you need to work with data in a standard form for interchange, or use the appropriate *_l functions (but there is no scanf_l).
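The second approach might look roughly like this, assuming a POSIX.1-2008 system where newlocale, uselocale and freelocale are declared in <locale.h> (as with glibc). parseInterchangeDouble is a hypothetical name; a real program would probably create the C locale object once and keep it around, as the answer suggests, rather than per call.

```cpp
#include <locale.h>   // POSIX.1-2008: newlocale, uselocale, freelocale
#include <cstdio>

// Parses a double written with '.' as the decimal point, regardless of the
// calling thread's current LC_NUMERIC. Returns false if nothing matched.
bool parseInterchangeDouble(const char* text, double* out) {
    locale_t c_numeric = newlocale(LC_NUMERIC_MASK, "C", (locale_t)0);
    if (c_numeric == (locale_t)0) return false;
    locale_t previous = uselocale(c_numeric);   // affects this thread only
    int matched = std::sscanf(text, "%lf", out);
    uselocale(previous);                         // restore the thread's locale
    freelocale(c_numeric);
    return matched == 1;
}
```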
Note that implementing your own floating point parser is not easy and is probably not the right solution to the problem unless you're an expert in numerical computing. Getting it right is very hard.
POSIX.1-2008 specifies isalnum_l(), isalpha_l(), isblank_l(), iscntrl_l(), isdigit_l(), isgraph_l(), islower_l(), isprint_l(), ispunct_l(), isspace_l(), isupper_l(), and isxdigit_l().
Here's what I've done with this stuff in the past.
The goal is to use locale-dependent numeric converters with a C-locale numeric representation. The ideal, of course, would be to use non-locale-dependent converters, or not change the locale, etc., etc., but sometimes you just have to live with what you've got. Locale support is seriously broken in several ways and this is one of them.
First, extract the number as a string using something like the C grammar's simple pattern for numeric preprocessing tokens. For use with scanf, I do an even simpler one:
" %1[-+0-9.]%[-+0-9A-Za-z.]"
This could be simplified even more, depending on what else you might expect in the input stream. The only thing you need to do is to not read beyond the end of the number; as long as you don't allow numbers to be followed immediately by letters without intervening whitespace, the above will work fine.
Now, get the struct lconv (man 7 locale) representing the current locale using localeconv(3). The first entry in that struct is const char* decimal_point; replace all of the '.' characters in your string with that value. (You might also need to replace '+' and '-' characters, although most locales don't change them, and the sign fields in the lconv struct are documented as only applying to currency conversions.) Finally, feed the resulting string through strtod and see if it passes.
This is not a perfect algorithm, particularly since it's not always easy to know how locale-compliant a given library actually is, so you might want to do some autoconf stuff to configure it for the library you're actually compiling with.
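Putting the three steps together, a sketch could look like this. scanCFormDouble is a hypothetical name, and the field width of 63 on the second conversion is an addition (not in the original pattern) to keep sscanf from overflowing the buffer. In the "C" locale the replacement step is a no-op, since decimal_point is already ".".

```cpp
#include <clocale>
#include <cstdio>
#include <cstdlib>
#include <string>

bool scanCFormDouble(const char* input, double* out) {
    // Step 1: extract the token with the answer's scanf pattern.
    char first[2] = {0};
    char rest[64] = {0};
    int n = std::sscanf(input, " %1[-+0-9.]%63[-+0-9A-Za-z.]", first, rest);
    if (n < 1) return false;                  // no number at the start
    std::string token = first;
    if (n == 2) token += rest;

    // Step 2: rewrite every '.' as the current locale's decimal point.
    const char* dp = std::localeconv()->decimal_point;   // e.g. "," in ru_RU
    std::string localForm;
    for (char ch : token) {
        if (ch == '.') localForm += dp;
        else localForm += ch;
    }

    // Step 3: locale-dependent strtod; the whole token must be consumed.
    char* end = nullptr;
    *out = std::strtod(localForm.c_str(), &end);
    return end != localForm.c_str() && *end == '\0';
}
```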
I am not sure how to solve it in C.
But C++ streams (can) have a unique locale object.
#include <locale>
#include <sstream>

std::stringstream dataStream;
dataStream.imbue(std::locale("C"));
// Note: You must imbue the stream before you do anything with it.
// If any operations have been performed then an imbue() can
// be silently ignored by the stream (which is a pain to debug).
dataStream << "3.14";
float x;
dataStream >> x;
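The same idea extends to reading a whole list of numbers, as in the question's file. This sketch (parseNumbers is a hypothetical name) uses std::locale::classic(), which is the "C" locale, and reports whether the input was consumed completely or stopped at a token that is not a number.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Reads whitespace-separated numbers such as "1e+10 5 -0.15 3.14" with the
// classic "C" locale, independent of the global locale.
bool parseNumbers(const std::string& text, std::vector<double>& out) {
    std::istringstream in;
    in.imbue(std::locale::classic());   // imbue before any extraction
    in.str(text);
    double value;
    while (in >> value) out.push_back(value);
    return in.eof();   // true only if we stopped at the end of the input
}
```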

Create a safe, escaped path base/file name, check if safe

I wonder if there is a generic, portable way to produce filesystem-safe filenames. That is, I have a user-entered string and would like to produce a file whose name resembles the name they have chosen as closely as possible. The resulting name must not include any path reference or other file-system-specific special name or tag.
Currently I just replace a bunch of known bad characters with other characters, or empty strings. For example, given the name ABC / DEF* : A Company? I'd produce the string ABC - DEF - A Company. My choice for replacement characters is totally arbitrary as I don't know of a generic escape symbol.
So my related questions are:
1. Is there a method (perhaps in boost filesystem) that can tell me if the name refers strictly to a file without a path?
2. Is there a function that tells me if the name is "safe" to use as a file (this may be an additional check on top of #1 for some filesystems)?
3. Is there a function to convert a string into a reasonably safe name?
Additional Notes
For #1, I thought I could just compare boost's path::filename() with the original object: if they are the same, I have a file. However, this still allows things like '..' and '.', but that might be okay if there is a good solution for #2.
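The comparison from #1 can be sketched with C++17 std::filesystem, which follows boost::filesystem closely; isBareFileName is a hypothetical name. It adds the explicit checks for "", "." and ".." that the comparison alone misses. On POSIX systems '\' is not a path separator, so a name like a\b still passes here; that belongs to the "is it safe" check of #2.

```cpp
#include <filesystem>
#include <string>

// True if `name` is a single file-name component: not empty, not "." or "..",
// and without any directory or root part.
bool isBareFileName(const std::string& name) {
    namespace fs = std::filesystem;
    if (name.empty() || name == "." || name == "..") return false;
    const fs::path p(name);
    return p.filename() == p && !p.has_root_path();
}
```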
In theory I'd have to provide a directory in which the file would reside, since different file-systems may have different requirements. But a global solution for the OS would also be okay.
I already have a function that just replaces a bunch of commonly known unsafe characters.
Common file dialogs cannot be used to do the filtering since the interface may not always allow them and in some cases the user isn't directly aware of the relationship to the file (advanced users would however).
According to POSIX's definition of fully portable filenames, the only portable filenames are those that contain only the characters A-Za-z0-9._- and are at most 14 characters long.
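A strict check against that POSIX rule could be written like this (isPosixPortableName is a hypothetical name). Rejecting a leading '-' is an extra: POSIX advises against such names because they look like command-line options.

```cpp
#include <string>

// Only A-Z a-z 0-9 . _ - and at most 14 characters (_POSIX_NAME_MAX);
// a leading '-' is rejected as well.
bool isPosixPortableName(const std::string& name) {
    if (name.empty() || name.size() > 14 || name[0] == '-') return false;
    for (char c : name) {
        bool ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                  (c >= '0' && c <= '9') || c == '.' || c == '_' || c == '-';
        if (!ok) return false;
    }
    return true;
}
```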
That said, a more practical approach is to assume that modern filesystems can cope with longer filenames and to simply replace all characters which are not explicitly marked as "safe" with _. Sometimes, instead of replacing with _, those characters are hex-encoded, like in URLs: sample%20file.txt. KDE applications use this, for example.
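The hex-encoding variant could be sketched as below; percentEncodeFileName is a hypothetical name, not code taken from KDE. '%' itself is encoded too so the mapping stays reversible, and the input is treated as raw bytes, so a UTF-8 character becomes several %XX groups.

```cpp
#include <cstdio>
#include <string>

// Keeps [A-Za-z0-9._-]; every other byte is written as %XX (uppercase hex).
std::string percentEncodeFileName(const std::string& name) {
    std::string out;
    for (unsigned char c : name) {
        bool safe = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                    (c >= '0' && c <= '9') || c == '.' || c == '_' || c == '-';
        if (safe) {
            out += static_cast<char>(c);
        } else {
            char hex[4];
            std::snprintf(hex, sizeof hex, "%%%02X", static_cast<unsigned>(c));
            out += hex;
        }
    }
    return out;
}
```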
As for implementation, it's as simple as s/[^A-Za-z0-9.-]/_/g.
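In C++, std::regex_replace replaces every match by default, so the one-liner translates directly (replaceUnsafeChars is a hypothetical name). Like the sed form, it works on bytes, so a multi-byte UTF-8 character turns into several '_'.

```cpp
#include <regex>
#include <string>

// Equivalent of s/[^A-Za-z0-9.-]/_/g: every byte outside the set becomes '_'.
std::string replaceUnsafeChars(const std::string& name) {
    static const std::regex unsafe("[^A-Za-z0-9.-]");
    return std::regex_replace(name, unsafe, "_");
}
```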
How portable is portable? Many systems had limits on length, and some
probably still do. Is distinguishing between names an issue? Some
systems distinguish case, and others don't. What about a final .xxx?
For some systems, it is significant, for others, it's just text.
Neglecting length, the safest bet is to take the opposite approach:
create a set of known safe characters, and convert everything outside of
that to a specific character. ASCII alphanumerics, and '_' seem
pretty safe, and you're probably OK (today) with '-', but I doubt the
list goes much further. And depending on what you're doing with these
names, you might want to force them to a single case, either upper or
lower.
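The whitelist approach with a forced single case might be sketched as follows. toSafeLowerName is a hypothetical name, and keeping one final .xxx as an extension (sanitizing both sides of the last dot separately) is an assumption made here, since the answer leaves the treatment of extensions open.

```cpp
#include <string>

// Keeps ASCII letters (folded to lower case), digits, '_' and '-'; every
// other byte becomes `replacement`. The last '.' (if not the first char)
// is preserved as an extension separator, which is an assumption.
std::string toSafeLowerName(const std::string& name, char replacement = '_') {
    auto clean = [replacement](const std::string& part) {
        std::string out;
        for (unsigned char c : part) {
            if (c >= 'A' && c <= 'Z')
                out += static_cast<char>(c - 'A' + 'a');
            else if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') ||
                     c == '_' || c == '-')
                out += static_cast<char>(c);
            else
                out += replacement;
        }
        return out;
    };
    const std::string::size_type dot = name.find_last_of('.');
    if (dot == std::string::npos || dot == 0) return clean(name);
    return clean(name.substr(0, dot)) + "." + clean(name.substr(dot + 1));
}
```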

Multi-language input validation with UTF-8 encoding

To check that a user-entered English name is valid, I would usually match the input against a regular expression such as [A-Za-z]. But how can I do this if multi-language (e.g. Chinese, Japanese) support is required with UTF-8 encoding?
You can approximate the Unicode derived property \p{Alphabetic} pretty succinctly with [\pL\pM\p{Nl}] if your language doesn’t support a proper Alphabetic property directly.
Don’t use Java’s \p{Alpha}, because that’s ASCII-only.
But then you’ll notice that you’ve failed to account for dashes (\p{Pd} or DashPunctuation works, but that does not include most of the hyphens!), apostrophes (usually but not always one of U+27, U+2BC, U+2019, or U+FF07), comma, or full stop/period.
You probably had better include \p{Pc} ConnectorPunctuation, just in case.
If you have the Unicode derived property \p{Diacritic}, you should use that, too, because it includes things like the mid-dot needed for geminated L’s in Catalan and the non-combining forms of diacritic marks which people sometimes use.
But then you’ll find people who use ordinal numbers in their names in ways that \p{Nl} (LetterNumber) doesn’t accommodate, so you throw \p{Nd} (DecimalNumber) or even all of \pN (Number) into the mix.
Then you realize that Asian names often require the use of ZWJ or ZWNJ to be written correctly in their scripts, so then you have to add U+200D and U+200C to the mix, which are both \p{Cf} (Format) characters and indeed also JoinControl ones.
By the time you’re done looking up the various Unicode properties for the various and many exotic characters that keep cropping up — or when you think you’re done, rather — you’re almost certain to conclude that you would do a much better job at this if you simply allowed them to use whatever Unicode characters for their name that they wish, as the link Tim cites advises. Yes, you’ll get a few jokers putting in things like “əɯɐuʇƨɐ⅂ əɯɐuʇƨɹᴉℲ”, but that just goes with the territory, and you can’t preclude silly names in any reasonable way.
Think about whether you really need to validate the user's name. Maybe you should let users call themselves whatever they want.
You certainly should never use [A-Za-z], because some people have names with apostrophes or hyphens. It can be quite insulting to prevent someone from using their real name just because it doesn't follow your arbitrary rules for what a name should look like.
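If you do let users type any name they like, about the only byte-level check still worth making is that the input is well-formed UTF-8. The following is a sketch of such a validator (isWellFormedUtf8 is a hypothetical name); it rejects overlong encodings, surrogate code points and values above U+10FFFF, and deliberately says nothing about which characters a name may contain.

```cpp
#include <cstddef>
#include <string>

bool isWellFormedUtf8(const std::string& s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        if (lead < 0x80) { ++i; continue; }              // ASCII
        std::size_t len;
        unsigned long cp;
        if (lead >= 0xC2 && lead <= 0xDF)      { len = 2; cp = lead & 0x1F; }
        else if (lead >= 0xE0 && lead <= 0xEF) { len = 3; cp = lead & 0x0F; }
        else if (lead >= 0xF0 && lead <= 0xF4) { len = 4; cp = lead & 0x07; }
        else return false;  // stray continuation byte, C0/C1, or F5..FF
        if (n - i < len) return false;                   // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(s[i + k]);
            if ((cont & 0xC0) != 0x80) return false;     // not 10xxxxxx
            cp = (cp << 6) | (cont & 0x3F);
        }
        // Reject overlong 3-byte forms, UTF-16 surrogates, and
        // overlong or out-of-range 4-byte forms.
        if (len == 3 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))) return false;
        if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return false;
        i += len;
    }
    return true;
}
```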
In PHP I use this nasty hack:
setlocale(LC_ALL, 'de_DE');
preg_match('/^[[:alpha:]]+$/', $name);
That includes "Umlauts" (i.e. 'ä','ö' and the like) plus accented vowels (è,í,etc.).
But it falls short when validating Cyrillic (Russian, Bulgarian, ...) or Chinese characters...

C++ - Splitting Filename and File Extension

Ok, first of all I don't want to use Boost, or any external libraries. I just want to use the C++ Standard Library. I can easily split strings with a given delimiter with my split() function:
#include <sstream>
#include <string>
#include <vector>

void split(const std::string &str, std::vector<std::string> &tokens, char delim) {
    std::string ea;
    std::stringstream stream(str);
    while (std::getline(stream, ea, delim))
        tokens.push_back(ea);
}
I do this on filenames. But there's a problem: there are files that have extensions like tar.gz, tar.bz2, etc. Also, some filenames have extra dots, e.g. Some.file.name.tar.gz. I wish to separate Some.file.name and tar.gz. Note: the number of dots in a filename isn't constant.
I also tried PathFindExtension but no luck. Is this possible? If so, please enlighten me. Thank you.
Edit: I'm very sorry about not specifying the OS. It's Windows.
I think you could use std::string::find_last_of to get the index of the last . and substr to cut the string (although the "complex extensions" involving multiple dots will require additional work).
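A minimal sketch of that idea (splitAtLastDot is a hypothetical name) splits at the last dot only, so it does not yet handle the multi-dot extensions:

```cpp
#include <string>
#include <utility>

// Splits at the last '.', e.g. "Some.file.name.tar.gz" -> {"Some.file.name.tar", "gz"}.
// A name with no dot, or whose only dot is the first character (".profile"),
// is returned whole with an empty extension.
std::pair<std::string, std::string> splitAtLastDot(const std::string& filename) {
    const std::string::size_type dot = filename.find_last_of('.');
    if (dot == std::string::npos || dot == 0)
        return {filename, ""};
    return {filename.substr(0, dot), filename.substr(dot + 1)};
}
```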
There is no way of doing what you want that does not involve a database of extensions for your purpose. There's nothing magical about extensions, they are just part of a filename (if you gunzip foo.tar.gz you'll likely get a foo.tar, so for this application .gz actually is "the extension"). So, in order to do what you want, build a database of extensions that you want to look for and fall back on "last dot" if you don't find one.
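A sketch of the table-plus-fallback approach, with a hypothetical and deliberately short extension list, could look like this. Matching is case-sensitive, so .TAR.GZ would only be caught by the last-dot fallback.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical table of multi-part extensions to recognize; extend as needed.
const std::vector<std::string> kMultiPartExtensions = {
    "tar.gz", "tar.bz2", "tar.xz", "tar.Z"
};

static bool endsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Checks the table first (as a "." + ext suffix), then falls back on the last dot.
std::pair<std::string, std::string> splitExtension(const std::string& filename) {
    for (const std::string& ext : kMultiPartExtensions) {
        const std::string dotted = "." + ext;
        if (filename.size() > dotted.size() && endsWith(filename, dotted))
            return {filename.substr(0, filename.size() - dotted.size()), ext};
    }
    const std::string::size_type dot = filename.find_last_of('.');
    if (dot == std::string::npos || dot == 0) return {filename, ""};
    return {filename.substr(0, dot), filename.substr(dot + 1)};
}
```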
There's nothing in the C++ standard library -- that is, it's not in the Standard --, but every operating system I know of provides this functionality in a variety of ways.
In Windows you can use _splitpath(), and in Linux you can use dirname() & basename()
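Since C++17, the standard library does offer std::filesystem::path::stem() and extension(). They split at the last dot, so they share the limitation discussed here: foo.tar.gz yields the stem foo.tar and the extension .gz (with its leading dot). stemOf and extensionOf are hypothetical wrappers.

```cpp
#include <filesystem>
#include <string>

// Thin wrappers over std::filesystem::path (C++17).
std::string stemOf(const std::string& filename) {
    return std::filesystem::path(filename).stem().string();
}

std::string extensionOf(const std::string& filename) {
    return std::filesystem::path(filename).extension().string();
}
```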
The problem is indeed filenames like *.tar.gz, which cannot be split consistently, because (at least in Windows) the .tar part isn't part of the extension. You'll either have to keep a list for these special cases and use a one-dot string::rfind for the rest, or find some pre-implemented way. Note that the .tar.* extensions aren't infinite and are very much standardized (there are about ten of them, I think).
You could create a look-up table of file extensions that you think you might encounter, and also add a command-line option to add a new one to the look-up table if you encounter anything new. Then parse through the file name to see if any entry in the look-up table is a sub-string of the file name.
EDIT: You can also refer to this question: C++/STL string: How to mimic regex like function with wildcards?