Is it safe to use `basename` with __FILE__?

Is it safe to use `basename` with __FILE__? - c++

The title is pretty clear: Is it safe to use basename (man 3 basename) with __FILE__ ?.
It compiles and seems to work fine, but basename's argument is char* (not const char*) and the man-page says:
Both dirname() and basename() may modify the contents of path, so it may be desirable to pass a copy when calling one of these functions.
So, this makes me worry.
Maybe the question should be more like: what is the type of __FILE__? Isn't it a string literal / const char*? But if it is, why there's no a compile-time error (const char* to char*)?

Read carefully basename(3) and notice:
Warning: there are two different functions basename() - see below.
and take care of the NOTES saying
There are two different versions of basename() - the POSIX version
described above, and the GNU version, which one gets after
#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <string.h>
The GNU version never modifies its argument, and returns the empty
string when path has a trailing slash
(emphasis is mine)
Because it is said that the GNU version does not modify its argument, using it is safe with __FILE__
BTW, you could consider customizing your GCC (e.g. with MELT) to define some __builtin_basename which would compute the basename at compile time if given a string literal like __FILE__ or else invoke basename at runtime.
Notice that libgen.h has #define basename __xpg_basename

what is the type of __FILE__? Isn't it a string literal / const char*?
Yes. It's a string literal.
But if it is, why there's no a compile-time error (const char* to char*)?
Possibly because the implementation you use (glibc's) may be returning a pointer within the string literal you pass (i.e. it doesn't modify its input).
In any case, you can't rely on it for the above stated below.
C standard (c11, § 6.10.8.1) says the __FILE__ is a string literal:
__FILE__ The presumed name of the current source ﬁle (a character string siteral).
POSIX says:
The basename() function may modify the string pointed to by path,
and may return a pointer to internal storage. The returned pointer
might be invalidated or the storage might be overwritten by a
subsequent call to basename().
(emphasis mine).
So, no, it's not safe to call basename() with __FILE__. You can simply take a copy of __FILE__ and do basename() on it:
char *filename = strdup(__FILE__);
if (filename) {
/* error */
}
char *file_basename = basename(filename);
free(filename);
Since __FILE__ is a string literal, using an array is another option:
char filename[] = __FILE__;
char *file_basename = basename(filename);

Related

std::string constructor with lvalue throws with clang

I am making use of Matei David's handy C++ wrapper for zlib, but I get an error when compiling on macOs (clang-1100.0.33.
include/strict_fstream.hpp:39:37: error: cannot initialize a parameter of type 'const char *' with an lvalue of type 'int'
The problem is here:
/// Overload of error-reporting function, to enable use with VS.
/// Ref: http://stackoverflow.com/a/901316/717706
static std::string strerror()
{
std::string buff(80, '\0');
// options for other environments omitted, where error message is set
// if not Win32 or _POSIX_C_SOURCE >= 200112L, error message is left empty.
auto p = strerror_r(errno, &buff[0], buff.size());
// error next line
std::string tmp(p, std::strlen(p));
std::swap(buff, tmp);
buff.resize(buff.find('\0'));
return buff;
}
(Which IIUC has nothing to do with zlib, just trying to report errors in a thread safe manner).
If I change to this:
static std::string strerror()
{
std::string buff(80, '\0');
auto p = strerror_r(errno, &buff[0], buff.size());
// "fix" below
size_t length = buff.size();
std::string tmp(p, length);
std::swap(buff, tmp);
buff.resize(buff.find('\0'));
return buff;
}
My program compiles and runs fine.
I have two questions:
Why does clang not like the constructor std::string tmp(p, std::strlen(p));?
The buffer was declared at the beginning of the function as length 80. Why are we even bothering to look up the length?
The answer to 2 may answer this, but is there something wrong with my version?
Thanks.

If you use int strerror_r(int errnum, char *buf, size_t buflen);, then there is no appropriate string constructor and the program is ill formed.
If you use char *strerror_r(int errnum, char *buf, size_t buflen);, then the program is well-formed.
The standard C/POSIX library implementation influences which function you get. Compiler is only involved as much as influencing what system library may be used by default.
Former function is extension to POSIX specified in XSI (which is essentially an optional part of POSIX) and latter is a GNU extension.
In case you use glibc (I don't know if that is an option on MacOS), you can control which version you get with macros, although the XSI compliant version is not available in older versions. Its documentation says:
The XSI-compliant version of strerror_r() is provided if:
(_POSIX_C_SOURCE >= 200112L || _XOPEN_SOURCE >= 600) && ! _GNU_SOURCE
The buffer was declared at the beginning of the function as length 80. Why are we even bothering to look up the length?
In the construction std::string tmp(p, std::strlen(p));, strlen seems entirely unnecessary to me. std::string tmp(p); is equivalent.
If you don't need thread safety, then the most portable solution is to use std::strerror which is in standard C++:
return std::strerror(errno); // yes, it's this simple
If you do need thread safety, then you could wrap this in a critical section using a mutex.
Note that strerror, the name of your function, is reserved to the language implementation in the global namespace when the standard library is used. The function should be in a namespace, or be renamed.

There are two different versions of the strerror_r that you'll commonly see:
A POSIX-compliant version that always stores the error message in the provided buffer (if it succeeds) and returns an int (0 for success, non-zero for error)
A GNU version that may store the error message in the provided buffer, or maybe not. It returns a char* pointing to the error message, which may point to the user-supplied buffer or may point to some other global static storage.
That strerror function is clearly written to work with the GNU version of strerror_r.
As for your second question, you need the strlen. buff is 80 characters long, but the actual error message may be shorter and only partially fill the buffer. That strlen is being used to trim off any extra nul characters from the end.

what does function setlocale does?

I write a function to convert wstring to string.If I remove the code setlocale(LC_CTYPE, "") the program goes wrong.I refer to cplusplus read the doc.
C string containing the name of a C locale. These are system specific,
but at least the two following locales must exist:
"C" Minimal "C" locale
"" Environment's default locale
If the value of this parameter is NULL, the function does not make any
changes to the current locale, but the name of the current locale is
still returned by the function.
my code here,source code from cplusplus.com(I add some chinese character):
/* wcstombs example */
#include <stdio.h> /* printf */
#include <stdlib.h> /* wcstombs, wchar_t(C) */
#include <locale.h> /* setlocale */
int main()
{
setlocale(LC_CTYPE, "");
const wchar_t str[] = L"中国、wcstombs example";
char buffer[64];
int ret;
printf ("wchar_t string: %ls \n",str);
ret = wcstombs ( buffer, str, sizeof(buffer) );
if (ret==64)
buffer[63]='\0';
if (ret)
printf ("length:%d,multibyte string: %s \n",ret,buffer);
return 0;
}
If I remove the code setlocale(LC_CTYPE, ""),the program does not run as I expect.
My question is :"If I run in different machine,the program will differ? As the doc say,if the locale is "" ,function does not make any changes to the current locale,but the name of the current locale is still returned by the funciton."
Because the current locale in different machine may differ?
Here is a my c++ version of convert wstring with string,while string to wstring do not need function setlocale,and the program runs well:
/*
string converts to wstring
*/
std::wstring s2ws(const std::string& src)
{
std::wstring res = L"";
size_t const wcs_len = mbstowcs(NULL, src.c_str(), 0);
std::vector<wchar_t> buffer(wcs_len + 1);
mbstowcs(&buffer[0], src.c_str(), src.size());
res.assign(buffer.begin(), buffer.end() - 1);
return res;
}
/*
wstring converts to string
*/
std::string ws2s(const std::wstring & src)
{
setlocale(LC_CTYPE, "");
std::string res = "";
size_t const mbs_len = wcstombs(NULL, src.c_str(), 0);
std::vector<char> buffer(mbs_len + 1);
wcstombs(&buffer[0], src.c_str(), buffer.size());
res.assign(buffer.begin(), buffer.end() - 1);
return res;
}

If the second argument to setlocale is NULL, it does nothing apart from returning the current locale. But you're not doing that. You're sending it a string entirely consisting of a single nil byte, aka "". My setlocale man page says
If locale is an empty string, "", each part of the locale that should be modified is set according to the environment variables. The details are implementation-dependent.
So what this is doing for you is setting the locale to whatever the user has specified or to the system default.
Without running setlocale at all presumably leaves the current locale either uninitialized or NULL on your system, which is why your program fails without that setting.
Two other man pages for stuff you're using say
The behavior of mbstowcs() depends on the LC_CTYPE category of the current locale.
The behavior of wcstombs() depends on the LC_CTYPE category of the current locale.
Presumably these routines are what is failing if you haven't set the locale at all.
I would guess that you probably don't need to run the setlocale statement on every invocation of these routines, but you do need to make sure it's run at least once before running them.
As far as what happens differently depending on the current locale, I believe that would be how exactly the multibyte string is converted to wide characters and vis versa. I think that the man page for those routines leaves it vague because of that difference. Personally, I'd prefer if it set some examples, such as, "if the current locale is C, the multibyte string is ASCII characters." I would guess there's also at least one in which it is interpreted as UTF-8, but I don't know enough about the different locales to say exactly which one that is. There's probably also at least one locale where the multibyte string happened to be another two bytes per character encoding, but C and C++ would still treat it as bytes.
Edit: Thinking about this more, given the characters you added to the example code, it might make sense to explicitly state that using locales that do not support Chinese characters will cause the final printf to report that the length was -1, and this includes the default C locale. In this case, the contents of the buffer is not clearly specified by the standard - at least, my reading of it indicates that the buffer value will probably be all of the characters up to but not including the one that failed to convert. While neither the C++ documentation nor the C documentation state what happens regarding the character that could not be converted. I haven't paid for the official standards, but I do have copies of the last free releases. C++17 defers to C17. C17 also refrains from commenting on this aspect of this function. For wcsrtombs, it explicitly states that the conversion state is unspecified. However, on wcstombs_s, C17 states
If the conversion stops without converting a null wide character and dst is not a null pointer, then a null character is stored into the array pointed to by dst immediately following any multibyte characters already stored.
In my own experiments with the code provided by the OP above, it appears that the wcstombs implementation on Fedora 28 simply refrains from making any further changes to the buffer. That seems to indicate to me, if the exact behavior of the code matters for this situation, it may make sense to use wcstombs_s instead. But at a minimum, you just check to see if the length returned is -1, and if it is, report an error rather than assuming the conversion worked.

Can std::string::c_str() be used whenever a string literal is expected?

I would guess that the last two lines in this code should compile.
#include "rapidjson/document.h"
int main(){
using namespace rapidjson ;
using namespace std ;
Document doc ;
Value obj(kObjectType) ;
obj.AddMember("key", "value", doc.GetAllocator()) ; //this compiles fine
obj.AddMember("key", string("value").c_str(), doc.GetAllocator()) ; //this does not compile!
}
My guess would be wrong, though. One line compiles and the other does not.
The AddMember method has several variants as documented here, but beyond that... why is the return of .c_str() not equivalent to a string literal?
My understanding was that where ever a string literal was accepted, you could pass string::c_str() and it should work.
PS: I'm compiling with VC++ 2010.
EDIT:
The lack of #include <string> is not the problem. It's already included by document.h
This is the error:
error C2664: 'rapidjson::GenericValue<Encoding> &rapidjson::GenericValue<Encoding>::AddMember(rapidjson::GenericValue<Encoding> &,rapidjson::GenericValue<Encoding> &,Allocator &)'
: cannot convert parameter 1 from 'const char [4]' to 'rapidjson::GenericValue<Encoding> &'
with
[
Encoding=rapidjson::UTF8<>,
Allocator=rapidjson::MemoryPoolAllocator<>
]
and
[
Encoding=rapidjson::UTF8<>
]
EDIT2:
Please ignore the fact that .c_str() is called on a temporal value. This example is just meant to show the compile error. The actual code uses a string variable.
EDIT3:
Alternate version of the code:
string str("value") ;
obj.AddMember("key", "value", doc.GetAllocator()) ; //compiles
obj.AddMember("key", str, doc.GetAllocator()) ; // does not compile
obj.AddMember("key", str.c_str(), doc.GetAllocator()) ; // does not compile

The std::string::c_str() method returns a char const*. The type of a string literal is char const[N] where N is the number of characters in the string (including the null terminator). Correspondingly, the result of c_str() can not be used in all places where a string literal can be used!
I'd be surprised if the interface you are trying to call requires a char array, though. That is, in your use it should work. It is more likely that you need to include <string>.

even if this code compiled:
obj.AddMember("key2", string("value").c_str(), doc.GetAllocator());
You cannot guarantee that it is safe.
The const char* returned by std::string::c_str() will be valid until the end of this statement.
If the AddMember method stores a copy of the string itself, all well and good. If it stores a pointer then you're doomed. You need knowledge of the inner workings of AddMember before you can reason about the correctness of your code.
I suspect the authors have already thought of this and have constructed overloads that demand that you either send in a std::string object (or equivalent) or a string literal reference (template<std::size_t N> void AddMember(const char (&str)[N]))
Even if this is not what they had in mind, they might be looking to protect you from yourself, in case you inadvertently send in an invalid pointer.
While seemingly an inconvenience, this compile time error indicates a possibly-faulty program. It's a tribute to the library's authors. Because compile time errors are a gazillion times more useful than runtime errors.

Looking at the documentation you linked to, it seems like you are trying to call the overload of AddMember taking two StringRefTypes (and an Allocator). StringRefType is a typedef for GenericStringRef<Ch>, which has two overloaded constructors taking a single argument:
template<SizeType N>
GenericStringRef(const CharType(&str)[N]) RAPIDJSON_NOEXCEPT;
explicit GenericStringRef(const CharType *str);
When you pass a string literal, the type is const char[N], where N is the length of the string + 1 (for the null terminator). This can be implicitly converted to a GenericStringRef<Ch> using the first constructor overload. However, std::string::c_str() returns a const char*, which cannot be converted implicitly to a GenericStringRef<Ch>, because the second constructor overload is declared explicit.
The error message you get from the compiler is caused by it choosing another overload of AddMember which is a closer match.

Re
” why is the return of .c_str() not equivalent to a string literal
A string literal is a zero-terminated string in an array with size known at compile time.
c_str() produces a pointer to (the first item in) a zero-terminated string in an array with size known only at run-time.
Usually a string literal expression will be used in a context where the expression decays to pointer to first item, but in some special cases it does not decays. These cases include
binding to a reference to array,
using the sizeof operator, and
forming a larger literal by compile time concatenation of string literals (simply writing them in order).
I think that's an exhaustive list.
The error message you cite,
” cannot convert parameter 1 from 'const char [4]' to 'rapidjson::GenericValue &
… does not match your presented code
#include "rapidjson/document.h"
int main(){
using namespace rapidjson ;
using namespace std ;
Document doc ;
Value obj(kObjectType) ;
obj.AddMember("key1", "value", doc.GetAllocator()) ; //this compiles fine
obj.AddMember("key2", string("value").c_str(), doc.GetAllocator()) ; //this does not compile!
}
Nowhere in this code is there a three character long string literal.
Hence the claims that “this compiles” and “this does not compile”, are not very trustworthy.
You
should have quoted the actual error message and actual code (at least one of them is not what you had when you compiled), and
should have quoted the documentation of the function you're calling.
Also, note that the actual argument that compiler reacts to in the quoted diagnostic, is a literal or an array declared as such, not a c_str() call.

Why would FUNCTION be undefined?

I have a C++ library that uses the predefined macro __FUNCTION__, by way of crtdefs.h. The macro is documented here. Here is my usage:
my.cpp
#include <crtdefs.h>
...
void f()
{
L(__FUNCTIONW__ L" : A diagnostic message");
}
static void L(const wchar_t* format, ...)
{
const size_t BUFFERLENGTH = 1024;
wchar_t buf[BUFFERLENGTH] = { 0 };
va_list args;
va_start(args, format);
int count = _vsnwprintf_s(buf, BUFFERLENGTH, _TRUNCATE, format, args);
va_end(args);
if (count != 0)
{
OutputDebugString(buf);
}
}
crtdefs.h
#define __FUNCTIONW__ _STR2WSTR(__FUNCTION__)
The library (which is compiled as a static library, if that matters) is consumed by another project in the same solution, a WPF app written in C#.
When I compile the lib, I get this error:
identifier "L__FUNCTION__" is undefined.
According to the docs, the macro isn't expanded if /P or /EP are passed to the compiler. I have verified that they are not. Are there other conditions where this macro is unavailable?

You list the error as this:
identifier "L__FUNCTION__" is undefined.
Note it's saying "L__FUNCTION__" is not defined, not "__FUNCTION__".
Don't use __FUNCTIONW__ in your code. MS didn't document that in the page you linked, they documented __FUNCTION__. And you don't need to widen __FUNCTION__.
ETA: I also note that you're not assigning that string to anything or printing it in anyway in f().

Just use
L(__FUNCTION__ L" : A diagnostic message");
When adjacent string literals get combined, the result will be a wide string if any of the components were.
There's nothing immediately wrong with using L as the name of a function... it's rather meaningless however. Good variable and function identifiers should be descriptive in order to help the reader understand the code. But the compiler doesn't care.
Since your L function wraps vsprintf, you may also use:
L(L"%hs : A diagnostic message", __func__);
since __func__ is standardized as a narrow string, the %hs format specifier is appropriate.
The rule is found in 2.14.5p13:
In translation phase 6 (2.2), adjacent string literals are concatenated. If both string literals have the same encoding-prefix, the resulting concatenated string literal has that encoding-prefix. If one string literal has no encoding-prefix, it is treated as a string literal of the same encoding-prefix as the other operand. If a UTF-8 string literal token is adjacent to a wide string literal token, the program is ill-formed. Any other concatenations are conditionally-supported with implementation-deﬁned behavior.

I think the definition of __FUNCTIONW__ is incorrect. (I know you did not write it.)
From: http://gcc.gnu.org/onlinedocs/gcc/Function-Names.html
These identifiers are not preprocessor macros. In GCC 3.3 and earlier,
in C only, __FUNCTION__ and __PRETTY_FUNCTION__ were treated as string
literals; they could be used to initialize char arrays, and they could
be concatenated with other string literals. GCC 3.4 and later treat
them as variables, like __func__. In C++, __FUNCTION__ and
__PRETTY_FUNCTION__ have always been variables.
At least in current GCC then you cannot prepend L to __FUNCTION__, because it is like trying to prepend L to a variable. There probably was a version of VC++ (like there was of GCC) where this would have worked, but you are not using that version.

Pass contents of stringstream to function taking char* as argument

I have a function for writing ppm files (a picture format) to disk. It takes the filename as a char* array. In my main function, I put together a filename using a stringstream and the << operator. Then, I want to pass the results of this to my ppm function. I've seen this discussed elsewhere, often with very convoluted looking methods (many in-between conversion steps).
What I've done is shown in the code below, and the tricky part that others usually do in many steps with temp variables is (char*) (PPM_file_name.str().data()). What this accomplishes is to extract the string from stringstream PPM_file_name with .str(), then get the pointer to its actual contents with .data() (this is a const char*), then cast that to a regular (char*). More complete example below.
I've found the following to work fine so far, but it makes me uneasy because usually when other people have done something in a seemingly more convoluted way, it's because that's a safer way to do it. So, can anyone tell me if what I'm doing here is safe and also how portable is it?
Thanks.
#include <iostream>
#include <sstream>
#include <stdio.h>
#include <string>
using namespace std;
int main(int argc, char *argv[]){
// String stream to hold the file name so I can create it from a series of other variable
stringstream PPM_file_name;
// ... a bunch of other code where int ccd_num and string cur_id_str are created and initialized
// Assemble the file name
PPM_file_name << "ccd" << ccd_num << "_" << cur_id_str << ".ppm";
// From PPM_file_name, extract its string, then the const char* pointer to that string's data, then cast that to char*
write_ppm((char*)(PPM_file_name.str().data()),"ladybug_vidcapture.cpp",rgb_images[ccd_num],width,height);
return 0;
}

Thanks everyone. So, following a few peoples' suggestions here, I've done the following, since I do have control over write_ppm:
Modified write_ppm to take const char*:
void write_ppm(const char *file_name, char *comment, unsigned char *image,int width,int height)
And now I'm passing ppm_file_name as follows:
write_ppm((PPM_file_name.str().c_str()),"A comment",rgb_images[ccd_num],width,height);
Is there anything I should do here, or does that mostly clear up the issues with how this was being passed before? Should all the other char arguments to write_ppm be const as well? It's a very short function, and it doesn't appear to modify any of the arguments. Thanks.

This looks like a typical case of someone not writing const-correct code and it having the knock-on effect. You have several choices:
If write_ppm is under your control, or the control of anyone you know, get them to make it const corrct
If it is not, and you can guarantee it never changes the filename then const_cast
If you cannot guarantee that, copy your string into a std::vector plus the null terminator and pass &vec[0] (where vec represents the name of your vector variable)

You should use PPM_file_name.str().c_str(), since data() isn't guaranteed to return a null-terminated string.
Either write_ppm() should take its first argument by const char* (promising not to change the string's content) or you must not pass a string stream (because you must not change its content that way).
You shouldn't use C-style casts in C++, because they don't differentiate between different reasons to cast. Yours is casting away const, which, if at all, should be done using const_cast<>. But as a rule of thumb, const_cast<> is usually only required to make code compile that isn't const-correct, which I'd consider an error.

It's absolutely safe and portable as long as write_ppm doesn't actually change the argument, in which case it is undefined behavior. I would recommend using const_cast<char*> instead of C-style cast. Also consider using c_str() member instead of the data() member. The former guarantees to return a null-terminated string

Use c_str() instead of data() (c_str() return a NULL-terminated sequence of characters).

Why not simply use const_cast<char *>(PPM_file_name.str().c_str()) ?

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js