How can getenv() be thread-safe? - c++

I want to use the getenv() function.
Now I got a remark from somebody that if multiple threads are calling this function, this will not be thread-safe. However if I look at the information page for this function, it states that:
Concurrently calling this function is safe, provided that the environment remains unchanged.
I understand the concept of a static block of data, and the function returns a pointer to it. I understand that the contents of the block can change over time, by making multiple calls to the function, as the reference pages state.
If one thread is calling
getenv("myEnvVar1")
and another one is calling
getenv("myEnvVar2")
will both returned pointers point into the same memory block? How should I interpret the statement that "Concurrently calling this function is safe"?

getenv returns a pointer to the ACTUAL environment content - so the process has an array of strings with the environment variables in them, and you get back, not a copy, but the ACTUAL pointer to that.
Note that char *p = getenv("foo"); ... setenv("foo", "new value", 1); ... use p is also undefined, as the string p points at may well have changed by then [and not in a well-defined way].

It's not.
The function getenv is part of the C Standard Library, thus its behaviour is specified by the C standards.
POSIX.1-2017 (which defers to the ISO C standard) states:
The returned string pointer might be invalidated or the string content might be overwritten by a subsequent call to getenv(),
setenv(), unsetenv(), or (if supported) putenv() but they shall not be
affected by a call to any other function in this volume of
POSIX.1-2017.
The returned string pointer might also be invalidated if the calling thread is terminated.
The getenv() function need not be thread-safe.
ISO C11 (which POSIX defers to) says:
The set of environment names and the method for altering the
environment list are implementation-defined. The getenv function
need not avoid data races with other threads of execution that modify
the environment list.
You cannot be sure that getenv itself does not modify the environment when searching through it (I cannot think why it would, but doing so would not violate the standard). If you want to be sure that the version of getenv you're using is a thread-safe implementation, you must consult your implementation's documentation to confirm that it is.
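A minimal sketch of one defensive approach, assuming every environment access in your own code (including any setenv() or putenv() call) goes through the same lock. The names env_mutex and read_env_copy are hypothetical, not part of any library, and the lock cannot protect against third-party code that touches the environment on its own. The value is copied while the lock is held, so the caller never keeps the pointer returned by getenv:

```cpp
#include <cassert>
#include <cstdlib>
#include <mutex>
#include <string>

// Hypothetical helper: one process-wide mutex that all environment
// accesses in the program are assumed to take.
inline std::mutex& env_mutex() {
    static std::mutex m;
    return m;
}

// Returns a private copy of the variable's value, or `fallback` if the
// variable is not set. The pointer from getenv never leaves this function.
inline std::string read_env_copy(const char* name, const std::string& fallback) {
    assert(name != nullptr);
    std::lock_guard<std::mutex> lock(env_mutex());
    const char* value = std::getenv(name);
    if (value == nullptr) {
        return fallback;
    }
    return std::string(value);  // copy made while the lock is still held
}
```

Code that modifies the environment would take env_mutex() as well; reading everything you need once at startup, before any thread is created, avoids the problem entirely.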

Related

What are the function-pointer fields of an SCDynamicStoreContext for?

SCDynamicStoreContext is defined like this (version 0):
typedef struct {
CFIndex version;
void * info;
const void * (*retain)(const void *info);
void (*release)(const void *info);
CFStringRef (*copyDescription)(const void *info);
} SCDynamicStoreContext;
Various examples that I have seen of how to initialize an SCDynamicStoreContext (including one from Apple) all set the retain, release, and copyDescription fields to NULL, but I am wondering what these fields are for.
What are the implications of not passing a retain and release function when, for example, the info object is an NSObject?
What is the copyDescription function used for?
The lifetime of the SCDynamicStore object created by, for example, SCDynamicStoreCreate() is indeterminate. It will live until it is fully released. So long as it lives, it may call the supplied callback. When it does, it will pass the info pointer provided in the context. If you don't take steps to make sure that that pointer remains valid for as long as the dynamic store object lives, then the info pointer may become invalid. Your callback could cause a crash or misbehave if it accesses the info pointer after it has become invalid.
The retain and release function pointers of the context give the framework a way to tell you how long the info pointer must remain valid. It obviously must be valid (or NULL) at the time that the dynamic store is created. Furthermore, it must remain valid so long as any calls to the retain function have not been balanced by a corresponding call to the release function.
If you don't provide retain and release functions, then the info pointer must remain valid either in perpetuity or at least for as long as the dynamic store object lives, and you're responsible for ensuring that. That can be somewhat difficult because you don't always know what other APIs will retain the dynamic store object. It will definitely be kept alive so long as its run loop source is scheduled on a run loop, although removing the source from all run loops does not necessarily guarantee that it will be fully released at that moment.
The copyDescription function is a means to enhance debugging output. For example, under certain circumstances, the framework may write a log message. It will try to describe the dynamic store object that encountered the circumstances. To do that in a manner which makes most sense to you the client, it can include a description of the info from the context. If it doesn't have a copyDescription function, the best it can do is record the pointer value. If it does, then it can write whatever description is provided by that function.
Not coincidentally, the signatures of the three function pointers match those of CFRetain(), CFRelease(), and CFCopyDescription(). So, if info is a Core Foundation object or a Cocoa object (since NSObject is toll-free bridged to CFTypeRef), then you can supply those functions in the context and everything behaves as you'd expect.
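To illustrate the contract in a platform-neutral way, here is a small sketch. It does not use the SystemConfiguration API: CallbackContext, Payload and Owner are hypothetical stand-ins that only mirror the shape of the context fields, with Owner playing the role of the framework object. It shows the retain call on creation being balanced by a release call on destruction, and the description callback being used when one is supplied:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a context such as SCDynamicStoreContext.
struct CallbackContext {
    void* info;
    const void* (*retain)(const void* info);
    void (*release)(const void* info);
    std::string (*copyDescription)(const void* info);
};

// Hypothetical client object with a manually tracked retain count.
struct Payload {
    int refs;
    std::string name;
};

const void* payload_retain(const void* p) {
    ++static_cast<Payload*>(const_cast<void*>(p))->refs;
    return p;
}

void payload_release(const void* p) {
    Payload* payload = static_cast<Payload*>(const_cast<void*>(p));
    assert(payload->refs > 0);  // every release must balance a retain
    --payload->refs;
}

std::string payload_describe(const void* p) {
    return "Payload(" + static_cast<const Payload*>(p)->name + ")";
}

// Hypothetical owner standing in for the framework object: it retains
// info when created and releases it when destroyed.
class Owner {
public:
    explicit Owner(const CallbackContext& ctx) : ctx_(ctx) {
        if (ctx_.retain) ctx_.retain(ctx_.info);
    }
    ~Owner() {
        if (ctx_.release) ctx_.release(ctx_.info);
    }
    Owner(const Owner&) = delete;
    Owner& operator=(const Owner&) = delete;

    // Uses the description callback if present, else a generic text.
    std::string describe() const {
        if (ctx_.copyDescription) return ctx_.copyDescription(ctx_.info);
        return "info pointer only";
    }

private:
    CallbackContext ctx_;
};
```

While an Owner exists, the payload's count stays above zero, which is the client's signal that info must be kept alive. With NULL callbacks the Owner gives no such signal, and the client has to guarantee the lifetime by other means.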

is it possible to use function pointers this way?

This is something that recently crossed my mind, quoting from wikipedia: "To initialize a function pointer, you must give it the address of a function in your program."
So, I can't make it point to an arbitrary memory address, but what if I overwrite the memory at the address of the function with a piece of data of the same size as before and then invoke it via the pointer? If such data corresponds to an actual function and the two functions have matching signatures, the latter should be invoked instead of the first.
Is it theoretically possible?
I apologize if this is impossible due to some very obvious reason that I should be aware of.
If you're writing something like a JIT, which generates native code on the fly, then yes you could do all of those things.
However, in order to generate native code you obviously need to know some implementation details of the system you're on, including how its function pointers work and what special measures need to be taken for executable code. For one example, on some systems after modifying memory containing code you need to flush the instruction cache before you can safely execute the new code. You can't do any of this portably using standard C or C++.
You might find when you come to overwrite the function, that you can only do it for functions that your program generated at runtime. Functions that are part of the running executable are liable to be marked write-protected by the OS.
The issue you may run into is Data Execution Prevention. It tries to keep you from executing data as code or allowing code to be written to like data. You can turn it off on Windows. Some compilers and operating systems may also place code into const-like sections of memory that the OS or hardware protects. The standard says nothing about what should or should not work when you write an array of bytes to a memory location and then call a function that jumps to that location. It's all dependent on your hardware and your OS.
While the standard does not provide any guarantees as to what would happen if you make a function pointer that does not refer to a function, in real life, on your particular implementation and knowing the platform, you may be able to do that with raw data.
I have seen example programs that created a char array with the appropriate binary code and have it execute by doing careful casting of pointers. So in practice, and in a non-portable way you can achieve that behavior.
It is possible, with caveats given in other answers. You definitely do not want to overwrite memory at some existing function's address with custom code, though. Not only is executable memory typically not writable, but you have no guarantees as to how the compiler might have used that code. For all you know, the code may be shared by many functions that you think you're not modifying.
So, what you need to do is:
1. Allocate one or more memory pages from the system.
2. Write your custom machine code into them.
3. Mark the pages as non-writable and executable.
4. Run the code. There are two ways of doing it:
   a. Cast the address of the pages you got in #1 to a function pointer, and call the pointer.
   b. Execute the code in another thread. You're passing the pointer to code directly to a system API or framework function that starts the thread.
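The steps above can be sketched as follows. This is a non-portable illustration that assumes Linux on x86-64 with the POSIX mmap/mprotect calls; the function name run_generated_code is hypothetical, and on any other platform the sketch simply returns -1. The six bytes are the x86-64 encoding of "mov eax, 42; ret":

```cpp
#include <cassert>
#include <cstring>

#if defined(__linux__) && defined(__x86_64__)
#include <sys/mman.h>
#include <unistd.h>
#endif

// Returns 42 by running machine code written at run time, or -1 if the
// platform is not covered by this sketch or a system call fails.
int run_generated_code() {
#if defined(__linux__) && defined(__x86_64__)
    // x86-64 encoding of: mov eax, 42 ; ret
    const unsigned char code[] = {0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3};

    const long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0) return -1;
    const std::size_t length = static_cast<std::size_t>(page_size);
    assert(sizeof(code) <= length);

    // Step 1: allocate a fresh page, writable but not yet executable.
    void* page = mmap(nullptr, length, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) return -1;

    // Step 2: write the machine code into it.
    std::memcpy(page, code, sizeof(code));

    // Step 3: drop write permission and add execute permission.
    if (mprotect(page, length, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, length);
        return -1;
    }

    // Step 4: cast the page address to a function pointer and call it.
    typedef int (*generated_fn)();
    generated_fn fn = reinterpret_cast<generated_fn>(page);
    const int result = fn();

    munmap(page, length);
    return result;
#else
    return -1;  // platform not covered by this sketch
#endif
}
```

On architectures with separate instruction and data caches, an instruction-cache flush would also be needed between steps 2 and 4; x86-64 does not require one here.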
Your question is confusingly worded.
You can reassign function pointers and you can assign them to null. Same with member pointers. Unless you declare them const, you can reassign them and yes the new function will be called instead. You can also assign them to null. The signatures must match exactly. Use std::function instead.
You cannot "overwrite the memory at the address of a function". You probably can indeed do it some way, but just do not. You're writing into your program code and are likely to screw it up badly.
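For the reassignment case, a short sketch (all function names here are made up for illustration). It shows a plain function pointer being reassigned to another function with the same signature, a null check before a call, and the std::function alternative:

```cpp
#include <cassert>
#include <functional>

int add_one(int x) { return x + 1; }
int double_it(int x) { return x * 2; }

// Calls through a plain function pointer, then reassigns it.
int apply_with_swap(int x) {
    int (*op)(int) = add_one;
    const int first = op(x);   // calls add_one
    op = double_it;            // reassignment: the signature must match
    assert(op != nullptr);     // a null pointer must never be called
    return op(first);          // calls double_it
}

// std::function can hold functions, lambdas and other callables alike.
int apply_with_std_function(int x) {
    std::function<int(int)> op = add_one;
    op = [](int v) { return v - 3; };  // replaced by a lambda
    return op ? op(x) : x;             // empty std::function tests false
}
```

Nothing in the program's code is overwritten here; only the pointer's value changes, which is why this is well defined.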

strcat adds second parameter twice

class Vars{
public:
char *appData = getenv("AppData");
string datadir = strcat(appData, "\\Bob");
};
cout << v.datadir;
outputs "C:\Users\Adam\AppData\Roaming\Bob\Bob"
instead of "C:\Users\Adam\AppData\Roaming\Bob"
It always adds the second parameter twice. How come?
"The string pointed by the pointer returned by this function shall not be modified by the program." Changing the value as you did (with strcat) leads to unpredictable behavior. The solution is simply to copy the immutable string into a std::string and do the concatenation there.
What about making a new public function that does this:
string datadir(getenv("AppData"));
datadir += "\\Bob";
This is pre-C++11 code.
The issue is that you are modifying memory that you should not be. You get a pointer from getenv, but that is pointing to memory that you do not control (emphasis mine).
The pointer returned points to an internal memory block, whose content or validity may be altered by further calls to getenv (but not by other library functions).
The string pointed by the pointer returned by this function shall not be modified by the program. Some systems and library implementations may allow to change environmental variables with specific functions (putenv, setenv...), but such functionality is non-portable.
By calling strcat(appData, "\\Bob"); you are writing \Bob into a piece of memory you do not control. The operating system may decide to do any number of things with it. As has already been pointed out by @Liviu, it is much better to take a copy of the original value and append to that.
std::string appData( getenv("AppData") );
appData += "\\Bob";
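One detail the snippet above leaves out: getenv returns a null pointer when the variable is not set, and constructing a std::string from a null pointer is undefined. A slightly more defensive sketch, with a hypothetical helper name env_subdir, might look like this:

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns "<value of variable>\<leaf>", or an empty
// string if the variable is not set. It never writes through the pointer
// returned by getenv.
std::string env_subdir(const char* variable, const std::string& leaf) {
    assert(variable != nullptr);
    const char* base = std::getenv(variable);
    if (base == nullptr) {
        return std::string();  // variable not set
    }
    std::string path(base);    // private copy of the environment value
    path += '\\';
    path += leaf;
    return path;
}
```

Calling env_subdir("AppData", "Bob") then yields the intended directory on a system where AppData is set, and the environment block itself is left untouched.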

Thread safety of c_str() in C++

I have created a class, SettingsClass that contains static strings that hold db connection strings to be used by the MySQL C++ connector library (for e.g. hostname, dbname, username, password).
Whenever a function needs to connect to the database, it calls the .c_str() function on these static strings. For example:
class SettingsClass
{
public:
static string hostname;
...
}SettingsClass;
string SettingsClass::hostname;
//A function that needs to connect to the DB uses:
driver = get_driver_instance();
driver->connect(SettingsClass.hostname.c_str(),...);
The static strings are populated once in the process lifetime. Its value is read from a configuration file.
My application is multithreaded. Am I using c_str() in a safe way?
The c_str() in itself should be threadsafe. However, if you have another thread that accesses (writes to) the string that you are taking c_str() of, then you're playing with matches sitting in a pool of petrol.
Typically c_str() is implemented by adding a zero value (the null character 0x00, not the character for zero which is 0x30) on the end of the existing string (if there isn't one there already) and passes back the address of where the string is stored.
Anyone interested can read the libstdc++ code here:
libstdc++, basic_string.h
The interesting lines are 1802 and 294 - 1802 is the c_str() function, and 294 is the function that c_str() calls.
Theoretically at least, the standard allows implementations
which wouldn't be thread safe. In practice, I don't think
you'll run any risk (although formally, there could be undefined
behavior), but if you want to be sure, just call .c_str() once
on all of the strings during initialization (before threads have
been started). The standard guarantees the validity of the
returned pointer until the next non-const function is called
(even if you don't keep a copy of it), which means that the
implementation cannot mutate any data.
Since the strings are effectively constant (they are assumed not to change after initialization), they are never modified. Any function that only reads them should be thread-safe.
So, I think .c_str() is thread-safe.
As long as the std::string variables are initialized correctly before your threads are started and aren't changed afterwards, using the const char* representation via c_str() should be thread-safe.
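A minimal sketch of that pattern, assuming the settings are populated once and never written again. The Settings struct, the host name and the function names are hypothetical, and the value stands in for one read from a configuration file:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <thread>
#include <vector>

// Hypothetical settings holder.
struct Settings {
    std::string hostname;
};

// Initialised exactly once (function-local statics are initialised in a
// thread-safe way since C++11) and never modified afterwards.
const Settings& settings() {
    static const Settings instance{"db.example.invalid"};
    return instance;
}

// Starts n reader threads that only call c_str() and read the result.
// Returns how many of them saw the expected value.
int count_consistent_readers(int n) {
    assert(n >= 0);
    std::vector<int> ok(static_cast<std::size_t>(n), 0);
    std::vector<std::thread> readers;
    for (int i = 0; i < n; ++i) {
        readers.emplace_back([&ok, i] {
            const char* host = settings().hostname.c_str();
            // Each thread writes only its own slot, so there is no race.
            ok[i] = (std::strcmp(host, "db.example.invalid") == 0) ? 1 : 0;
        });
    }
    for (std::thread& t : readers) t.join();

    int total = 0;
    for (int v : ok) total += v;
    return total;
}
```

Because the Settings object is const after construction, every thread only performs reads, which is the case the answers above describe as safe. Any later write to hostname would need synchronisation with all readers.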

Threads and string literals

Is it valid (defined behavior) to access a string literal simultaneously with multiple threads? Given a function like this:
const char* give()
{
return "Hello, World!";
}
Would it be safe to call the function and dereference the pointer simultaneously?
Edit: Many answers. Will accept the first one who can show me the section out of the standard.
According to the standard:
C++11 1.10/3: The value of an object visible to a thread T at a particular point is the initial value of the object, a value assigned to the object by T, or a value assigned to the object by another thread, according to the rules below.
A string literal, like any other constant object, cannot legally be assigned to; it has static storage duration, and so is initialised before the program starts; therefore, all threads will see its initial value at all times.
Older standards had nothing to say about threads; so if your compiler doesn't support the C++11 threading model then you'll have to consult its documentation for any thread-safety guarantees. However, it's hard to imagine any implementation under which access to immutable objects were not thread-safe.
Yes, it's safe. Why wouldn't it be? It would be unsafe if you'd try to modify the string, but that's illegal anyway.
It is always safe to access immutable data from multiple threads. String literals are an example of immutable data (since it's illegal to modify them at run-time), so it is safe to access them from multiple threads.
As long as you only read data, you can access it from as many threads as you want. When data needs to be changed, that's when it gets complicated.
This depends on the compiler implementation. But I do not know of an implementation where concurrent read accesses might be unsafe, so in practice this is safe.
String literals are (conceptually) stored in read only memory and initialised on loading (rather than at runtime). It's therefore safe to access them from multiple threads at any time.
Note that more complex structures might not be initialised at load time, and so multiple thread access might have the possibility of issues immediately after the creation of the object.
But string literals are completely safe.