Is opening the SAME file in two different fstreams Undefined Behaviour? - c++

This recently asked question has raised another interesting issue, as discussed in the comments to one of its answers.
To summarize: the OP there was having issues with code like that below, when subsequently attempting to read and write data from/to the two streams 'concurrently':
ifstream infile;
infile.open("accounts.txt");
ofstream outfile;
outfile.open("accounts.txt");
Although the issue, in itself, was successfully resolved, it has raised a question to which I cannot find an authoritative answer (and I've made some quite extensive searches of Stack Overflow and the wider web).
It is very clearly stated what happens when calling the open() method of a stream that is already associated with a file (cppreference), but what I cannot find an answer to is what happens when (as in this case) the file is already associated with a (different) stream.
If the stream is already associated with a file (i.e., it is already
open), calling this function fails.
I can see several possible scenarios here:
The second open call will fail and any attempted writes to it will also fail (but that is not the case in the cited question).
The second open call will 'override' the first, effectively closing it (this could explain the issues encountered in said code).
Both streams remain open but enter into a 'mutual clobbering' match regarding their internal file pointers and buffers.
We enter the realm of undefined (or implementation-defined) behaviour.
Note that, as the first open() call is made by an input stream, the operating system will not necessarily 'lock' the file, as it probably would for an output stream.
So, does anyone have a definitive answer to this? Or a citation from the Standard (cppreference will be 'acceptable' if nothing more authoritative can be found)?

basic_filebuf::open (and all things that depend on it, like fstream::open) has no statement about what will happen in this case. A filesystem may allow it or it may not.
What the standard says is that, if the file successfully opens, then you can play with it in accord with the interface. And if it doesn't successfully open, then there will be an error. That is, the standard allows a filesystem to permit it or forbid it, but it doesn't say which must happen. The implementation can even randomly forbid it. Or forbid you from opening any files in any way. All are (theoretically) valid.

To me, this falls even out of the 'implementation defined' field. The very same code will have different behaviour depending of the underlying filesystem or OS (some OSes forbid to open a file twice).

No.
Such a scenario is not discussed by the standard.
It's not even managed by the implementation (your compiler, standard library implementation etc).
The stream ultimately asks the operating system for access to that file in the desired mode, and it's up to the operating system to decide whether that access shall be granted at that time.
A simple analogy would be your program making some API call to a web application over a network. Perhaps the web application does not permit more than ten calls per minute, and returns some error code if you attempt more than that. But that doesn't mean your program has undefined behaviour in such a case.

C implementations exist for many different platforms, whose underlying file systems may handle such corner cases differently. For the Standard to mandate any particular corner-case behavior would have made the language practical only on platforms whose file systems behave in such fashion. Instead, the Standard regards such issues as being outside its jurisdiction (i.e. to use its own terminology, "Undefined Behavior"). That doesn't mean that implementations whose target OS offers useful guarantees shouldn't make such guarantees to programs when practical, but implementation designers are presumed to know more than the Committee about how best to serve their customers.
On the other hand, it may sometime be helpful for an implementation not to expose the underlying OS behavior. On an OS that doesn't have a distinct "append" mode, for example, but code needing an "open for append" could do an "open existing file for write" followed by "seek to end of file", an attempt to open two streams for appending to the same file could result in data corruption when one stream writes part of a file, and the other stream then rewrites that same portion. It may be helpful for an implementation that detects that condition to either inject its own logic to either ensure smooth merging of the data or block the second open request. Either course of action might be better, depending upon an application's purpose, but--as noted above--the choice is outside the Standard's jurisdiction.

I open the zip file as stream twice.The zip file contains some XML files.
std::ifstream("filename") file;
zipstream *p1 = new zipstream(file);
zipstream *p2 = new zipstream(file);
p1->getNextEntry();
auto p3 = p1.rdbuf();
autp p4 = p2.rdbuf();
Then see p3 address = p4 address, but the member variables are different between them. Such as _IGfirst.
The contents of one of the XML files are as follows:
<test>
<one value="0.00001"/>
</test>
When the contents of file are read in two thread at the same time.error happend.
string One = p1.getPropertyValue("one");
// one = "0001two"

Related

Creating a new file avoiding race conditions

I need to develop a C++ routine performing this apparently trivial task: create a file only if it does not exist, else do nothing/raise error.
As I need to avoid race conditions, I want to use the "ask forgiveness not permission" principle (i.e. attempting the intended operation and checking if it succeeded, as opposed to checking preconditions in advance), which, to my knowledge, is the only robust and portable method for this purpose [Wikipedia article][an example with getline].
Still, I could not find a way to implement it in my case. The best I could come up with is opening a fstream in app mode (or fopening with "a"), checking the output position with tellp (C++) or ftell (C) and aborting if such position is not zero. This has however two disadvantages, namely that if the file exists it gets locked (although for a short time) and its modification date is altered.
I checked other possible combinations of ios_base::openmode for fstream, as well as the mode strings of fopen but found no option that suited my needs. Further search in the C and C++ standard libraries, as well as Boost Filesystem, proved unfruitful.
Can someone point out a method to perform my task in a robust way (no collateral effects, no race conditions) without relying on OS-specific functions?
My specific problem is in Windows, but portable solutions would be preferred.
EDIT: The answer by BitWhistler completely solves the problem for C programs. Still, I am amazed that no C++ idiomatic solution seems to exist. Either one uses open with the O_EXCL attribute as proposed by Andrew Henle, which is however OS-specific (in Windows the attribute seems to be called _O_EXCL with an additional underscore [MSDN]) or one separately compiles a C11 file and links it from the C++ code. Moreover, the file descriptor obtained cannot be converted to a stream except with nonstandard extensions (e.g. GCC's __gnu_cxx::stdio_filebuf). I hope a future version of C++ will implement the "x" subattribute and possibly also a corresponding ios:: modificator for file streams.
The new C standard (C2011, which is not part of C++) adds a new standard subspecifier ("x"), that can be appended to any "w" specifier (to form "wx", "wbx", "w+x" or "w+bx"/"wb+x"). This subspecifier forces the function to fail if the file exists, instead of overwriting it.
source: http://www.cplusplus.com/reference/cstdio/fopen/

File created when using ofstream, but not ifstream and fstream/cannot seem to access gcount()

all. A new project of mine involves reading names from a file, and I realized, especially for somebody who likes to (attempt to, more like) makes games, reading/writing to store information is very useful. I looked into it and discovered the std library pulls through again. I subsequently realized that, at least for me, the libraries from ios, ios_base, iostream, fstream, etc etc, seem to be kind of complex.
I looked around, but am not quite sure why this specific approach does not work. It puzzles me because examples found online that seem to follow the library as they should - I will refer to one on cplusplus.com. Of interesting note is that one worked as expected, while another didn't. Far as I know, that is.
My problem is, I can create a file using ofstream, but not ifstream nor fstream. The way I understand it, there constructors, which I was using (which I beleive is identical paramaters as you would pass to open - that is, the filename and flags) are identical except in that they have different default perameters - ostream has ios::out, istream has ifstream has ios::in, and fstream has ios::in | ios::out - both flags, which is logical since it is the combination of both classes. Note this is not a problem of WHERE the file is created, rather, but the fact of its creation. I know this by two reasons. First, when using ofstream the file appears in expected directory, but not at all with ifstream nor fstream. Second, using the is_open function, (specifically, testing the expressions "while (is_open)" , the console never closed (I test most new concepts in the console because, well, it is simple), but with the others it did. If it had, then it never would have ended, since it can never go out of scope and thus the destructor spells the end for the open file.
My second problem was with using gcount - mostly how to access it. No matter how I tried (std::gcount, file::gcount, fstream::gcount...) it never recognized what it was. I was a bit befuddled.
Now comes a bit off off-topic ness to the specific issue, but more into the general reason of why I encountered this problem to begin with, which is sort of a discussion but I think it could be beneficial thing to learn, unless somehow I am the only one with this problem.
Firstly, the tuturial I had read over (learncpp.com) goes over the seekp/seekg functions, and said that it moves a relative amount in BYTES. It occured to me this is generally the same as a char, but is it possible that this is different on a different system (i.e, if a char was 2 bytes, you could figure this out, and apply this to the file searching), or will it always go by character, essentially, anyway?
And my main wonder... how to really use these tools to do things. Let's say I want to put a name on each line. I would be a bit unsure how to do that, - newline character perhaps? Or the other end, reading a name per line - how do you find the starting position of the next line? (my issue with gcount arrises because I figured you could use getline(), then gcount(), and voila, move that many characters... not sure if that works though)
Thanks.
To answer just one of your questions: you can't create a file with ifstream because, well, it doesn't create files; it's simply not intended to. This purpose of this class is to read data from existing files, and to open a file and read from it, it must exist first. Simple as that.
As far as fstream goes, you could make a case that it would make sense for it to create a file if none exists, but in fact, that's not what it does; the argument to fstream's constructor must be an existing file. If the named file doesn't exist, a failure bit is set in the object, and no file is created.

What's a good, threadsafe, way to pass error strings back from a C shared library

I'm writing a C shared library for internal use (I'll be dlopen()'ing it to a c++ application, if that matters). The shared library loads (amongst other things) some java code through a JNI module, which means all manners of nightmare error modes can come out of the JVM that I need to handle intelligently in the application. Additionally, this library needs to be re-entrant. Is there in idiom for passing error strings back in this case, or am I stuck mapping errors to integers and using printfs to debug things?
Thanks!
My approach to the problem would be a little different from everyone else's. They're not wrong, it's just that I've had to wrestle with a different aspect of this problem.
A C API needs to provide numeric error codes, so that the code using the API can take sensible measures to recover from errors when appropriate, and pass them along when not. The errno.h codes demonstrate a good categorization of errors; in fact, if you can reuse those codes (or just pass them along, e.g. if all your errors come ultimately from system calls), do so.
Do not copy errno itself. If possible, return error codes directly from functions that can fail. If that is not possible, have a GetLastError() method on your state object. You have a state object, yes?
If you have to invent your own codes (the errno.h codes don't cut it), provide a function analogous to strerror, that converts these codes to human-readable strings.
It may or may not be appropriate to translate these strings. If they're meant to be read only by developers, don't bother. But if you need to show them to the end user, then yeah, you need to translate them.
The untranslated version of these strings should indeed be just string constants, so you have no allocation headaches. However, do not waste time and effort coding your own translation infrastructure. Use GNU gettext.
If your code is layered on top of another piece of code, it is vital that you provide direct access to all the error information and relevant context information that that code produces, and you make it easy for developers against your code to wrap up all that information in an error message for the end user.
For instance, if your library produces error codes of its own devising as a direct consequence of failing system calls, your state object needs methods that return the errno value observed immediately after the system call that failed, the name of the file involved (if any), and ideally also the name of the system call itself. People get this wrong waaay too often -- for instance, SQLite, otherwise a well designed API, does not expose the errno value or the name of the file, which makes it infuriatingly hard to distinguish "the file permissions on the database are wrong" from "you have a bug in your code".
EDIT: Addendum: common mistakes in this area include:
Contorting your API (e.g. with use of out-parameters) so that functions that would naturally return some other value can return an error code.
Not exposing enough detail for callers to be able to produce an error message that allows a knowledgeable human to fix the problem. (This knowledgeable human may not be the end user. It may be that your error messages wind up in server log files or crash reports for developers' eyes only.)
Exposing too many different fine distinctions among errors. If your callers will never plausibly do different things in response to two different error codes, they should be the same code.
Providing more than one success code. This is asking for subtle bugs.
Also, think very carefully about which APIs ought to be allowed to fail. Here are some things that should never fail:
Read-only data accessors, especially those that return scalar quantities, most especially those that return Booleans.
Destructors, in the most general sense. (This is a classic mistake in the UNIX kernel API: close and munmap should not be able to fail. Thankfully, at least _exit can't.)
There is a strong case that you should immediately call abort if malloc fails rather than trying to propagate it to your caller. (This is not true in C++ thanks to exceptions and RAII -- if you are so lucky as to be working on a C++ project that uses both of those properly.)
In closing: for an example of how to do just about everything wrong, look no further than XPCOM.
You return pointers to static const char [] objects. This is always the correct way to handle error strings. If you need them localized, you return pointers to read-only memory-mapped localization strings.
In C, if you don't have internationalization (I18N) or localization (L10N) to worry about, then pointers to constant data is a good way to supply error message strings. However, you often find that the error messages need some supporting information (such as the name of the file that could not be opened), which cannot really be handled by constant data.
With I18N/L10N to worry about, I'd recommend storing the fixed message strings for each language in an appropriately formatted file, and then using mmap() to 'read' the file into memory before you fork any threads. The area so mapped should then be treated as read-only (use PROT_READ in the call to mmap()).
This avoids complicated issues of memory management and avoids memory leaks.
Consider whether to provide a function that can be called to get the latest error. It can have a prototype such as:
int get_error(int errnum, char *buffer, size_t buflen);
I'm assuming that the error number is returned by some other function call; the library function then consults any threadsafe memory it has about the current thread and the last error condition returned to that thread, and formats an appropriate error message (possibly truncated) into the given buffer.
With C++, you can return (a reference to) a standard String from the error reporting mechanism; this means you can format the string to include the file name or other dynamic attributes. The code that collects the information will be responsible for releasing the string, which isn't (shouldn't be) a problem because of the destructors that C++ has. You might still want to use mmap() to load the format strings for the messags.
You do need to be careful about the files you load and, in particular, any strings used as format strings. (Also, if you are dealing with I18N/L10N, you need to worry about whether to use the 'n$ notation to allow for argument reordering; and you have to worry about different rules for different cultures/languages about the order in which the words of a sentence are presented.)
I guess you could use PWideChars, as Windows does. Its thread safe. What you need is that the calling app creates a PwideChar that the Dll will use to set an error. Then, the callling app needs to read that PWideChar and free its memory.
R. has a good answer (use static const char []), but if you are going to have various spoken languages, I like to use an Enum to define the error codes. That is better than some #define of a bunch of names to an int value.
return integers, don't set some global variable (like errno— even if it is potentially TLSed by an implementation); aking to Linux kernel's style of return -ENOENT;.
have a function similar to strerror that takes such an integer and returns a pointer to a const string. This function can transparently do I18N if needed, too, as gettext-returnable strings also remain constant over the lifetime of the translation database.
If you need to provide non-static error messages, then I recommend returning strings like this: error_code_t function(, char** err_msg). Then provide a function to free the error message: void free_error_message(char* err_msg). This way you hide how the error strings are allocated and freed. This is of course only worth implementing of your error strings are dynamic in nature, meaning that they convey more than just a translation of error codes.
Please havy oversight with mu formatting. I'm writing this on a cell phone...

tmpnam warning saying it is dangerous

I get this warning saying that tmpnam is dangerous, but I would prefer to use it, since it can be used as is in Windows as well as Linux. I was wondering why it would be considered dangerous (I'm guessing it's because of the potential for misuse rather than it actually not working properly).
From tmpnam manpage :
The tmpnam() function generates a different string each time it is called, up to TMP_MAX times. If it is called more than TMP_MAX times, the behavior is implementation defined.
Although tmpnam() generates names that are difficult to guess, it is nevertheless possible that between the time that tmpnam() returns a pathname, and the time that the program opens it, another program might create that pathname using open(2), or create it as a symbolic link. This can lead to security holes. To avoid such possibilities, use the open(2) O_EXCL flag to open the pathname. Or better yet, use mkstemp(3) or tmpfile(3).
Mktemp really create the file, so you are assured it works, whereas tmpnam returns a name, possibly already existing.
If you want to use the same symbol on multiple platforms, use a macro to define TMPNAM. As long as you pick more secure functions with the same interface, you'll be able to use it on both. You have conditional compilation somewhere in your code anyway, right?
if you speak about the compiler warning of MSVC:
These functions are deprecated because more secure versions are available;
see tmpnam_s, _wtmpnam_s.
(http://msdn.microsoft.com/de-de/library/hs3e7355(VS.80).aspx)
otherwise just read what the manpages say about the drawbacks of this function. it is mostly about a 2nd process creating exactly the same file name as your process just did.
From the tmpnam(3) manpage:
Although tmpnam() generates names that are difficult to guess, it is nevertheless possible that between the time
that tmpnam() returns a pathname, and the time that the program opens it, another program might create that path‐
name using open(2), or create it as a symbolic link. This can lead to security holes. To avoid such possibili‐
ties, use the open(2) O_EXCL flag to open the pathname. Or better yet, use mkstemp(3) or tmpfile(3).
The function is dangerous, because you are responsible for allocating a buffer that will be big enough to handle the string that tmpnam() is going to write into that buffer. If you allocate a buffer that is too small, tmpnam() has no way of knowing that, and will overrun the buffer (Causing havoc). tmpnam_s() (MS's secure version) requires you to pass the length of the buffer, so tmpnam_s know when to stop.

Why can't I use fopen?

In the mold of a previous question I asked about the so-called safe library deprecations, I find myself similarly bemused as to why fopen() should be deprecated.
The function takes two C strings, and returns a FILE* ptr, or NULL on failure. Where are the thread-safety problems / string overrun problems? Or is it something else?
Thanks in advance
You can use fopen(). Seriously, don't take any notice of Microsoft here, they're doing programmers a real disservice by deviating from the ISO standards . They seem to think that people writing code are somehow brain-dead and don't know how to check parameters before calling library functions.
If someone isn't willing to learn the intricacies of C programming, they really have no business doing it. They should move on to a safer language.
This appears to be just another attempt at vendor lock-in by Microsoft of developers (although they're not the only ones who try it, so I'm not specifically berating them). I usually add:
#define _CRT_SECURE_NO_WARNINGS
(or the "-D" variant on the command line) to most of my projects to ensure I'm not bothered by the compiler when writing perfectly valid, legal C code.
Microsoft has provided extra functionality in the fopen_s() function (file encodings, for one) as well as changing how things are returned. This may make it better for Windows programmers but makes the code inherently unportable.
If you're only ever going to code for Windows, by all means use it. I myself prefer the ability to compile and run my code anywhere (with as little change as possible).
As of C11, these safe functions are now a part of the standard, though optional. Look into Annex K for full details.
There is an official ISO/IEC JTC1/SC22/WG14 (C Language) technical report TR24731-1 (bounds checking interfaces) and its rationale available at:
http://www.open-std.org/jtc1/sc22/wg14
There is also work towards TR24731-2 (dynamic allocation functions).
The stated rationale for fopen_s() is:
6.5.2 File access functions
When creating a file, the fopen_s and freopen_s functions improve security by protecting the file from unauthorized access by setting its file protection and opening the file with exclusive access.
The specification says:
6.5.2.1 The fopen_s function
Synopsis
#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
errno_t fopen_s(FILE * restrict * restrict streamptr,
const char * restrict filename,
const char * restrict mode);
Runtime-constraints
None of streamptr, filename, or mode shall be a null pointer.
If there is a runtime-constraint violation, fopen_s does not attempt to open a file.
Furthermore, if streamptr is not a null pointer, fopen_s sets *streamptr to the
null pointer.
Description
The fopen_s function opens the file whose name is the string pointed to by
filename, and associates a stream with it.
The mode string shall be as described for fopen, with the addition that modes starting
with the character ’w’ or ’a’ may be preceded by the character ’u’, see below:
uw truncate to zero length or create text file for writing, default permissions
ua append; open or create text file for writing at end-of-file, default permissions
uwb truncate to zero length or create binary file for writing, default permissions
uab append; open or create binary file for writing at end-of-file, default
permissions
uw+ truncate to zero length or create text file for update, default permissions
ua+ append; open or create text file for update, writing at end-of-file, default
permissions
uw+b or uwb+ truncate to zero length or create binary file for update, default
permissions
ua+b or uab+ append; open or create binary file for update, writing at end-of-file,
default permissions
To the extent that the underlying system supports the concepts, files opened for writing
shall be opened with exclusive (also known as non-shared) access. If the file is being
created, and the first character of the mode string is not ’u’, to the extent that the
underlying system supports it, the file shall have a file permission that prevents other
users on the system from accessing the file. If the file is being created and first character
of the mode string is ’u’, then by the time the file has been closed, it shall have the
system default file access permissions10).
If the file was opened successfully, then the pointer to FILE pointed to by streamptr
will be set to the pointer to the object controlling the opened file. Otherwise, the pointer
to FILE pointed to by streamptr will be set to a null pointer.
Returns
The fopen_s function returns zero if it opened the file. If it did not open the file or if
there was a runtime-constraint violation, fopen_s returns a non-zero value.
10) These are the same permissions that the file would have been created with by fopen.
The fopen_s() function has been added by Microsoft to the C runtime with the following fundamental differences from fopen():
if the file is opened for writing ("w" or "a" specified in the mode) then the file is opened for exclusive (non-shared) access (if the platform supports it).
if the "u" specifier is used in the mode argument with the "w" or "a" specifiers, then by the time the file is closed, it will have system default permissions for others users to access the file (which may be no access if that's the system default).
if the "u" specified is not used in those cases, then when the file is closed (or before) the permissions for the file will be set such that other users will not have access to the file.
Essentially it means that files the application writes are protected from other users by default.
They did not do this to fopen() due to the likelyhood that existing code would break.
Microsoft has chosen to deprecate fopen() to encourage developers for Windows to make conscious decisions about whether the files their applications use will have loose permissions or not.
Jonathan Leffler's answer provides the proposed standardization language for fopen_s(). I added this answer hoping to make clear the rationale.
Or is it something else?
Some implementations of the FILE structure used by 'fopen' has the file descriptor defined as 'unsigned short'. This leaves you with a maximum of 255 simultaneously open files, minus stdin, stdout, and stderr.
While the value of being able to have 255 open files is debatable, of course, this implementation detail materializes on the Solaris 8 platform when you have more than 252 socket connections! What first appeared as a seemingly random failure to establish an SSL connection using libcurl in my application turned out to be caused by this, but it took deploying debug versions of libcurl and openssl and stepping the customer through debugger script to finally figure it out.
While it's not entirely the fault of 'fopen', one can see the virtues of throwing off the shackles of old interfaces; the choice to deprecate might be based on the pain of maintaining binary compatibility with an antiquated implementation.
The new versions do parameter validation whereas the old ones didn't.
See this SO thread for more information.
Thread safety. fopen() uses a global variable, errno, while the fopen_s() replacement returns an errno_t and takes a FILE** argument to store the file pointer to.