C++ how to check if a stream (iostream) is seekable

C++ how to check if a stream (iostream) is seekable - c++

Is there a way I can check if an istream of ostream is seekable?
I suspect doing a test seek and checking for failbit is not correct
since the seek can fail for unrelated reasons.
I need this to work on Linux and mac, if that makes a difference.

Iostreams doesn't give you much. Stream objects are just wrappers around a buffer object derived from class std::streambuf. (Assuming "narrow" characters.) The standard derived buffer classes are std::stringbuf for strings and std::filebuf for files. Supposing you're only interested in files, std::filebuf is just a simple wrapper around the C library functionality. The C library does not define a way to determine if a FILE object supports seeking or not, besides attempting to do so, so neither does C++.
For what it's worth, the semantics of seek vary a bit. Some platforms might allow you to "seek" a pipe but only to the current position, to determine how many characters have been read or written. Seeking past the end might resize the file, or might cause the next write operation to resize the file, or something in between.
You might also try checking errno if badbit is set (or, as I prefer, use exceptions instead of flags).

Related

How to write custom input function for Flex in C++ mode?

I have a game engine and a shader parser. The engine has an API for reading from a virtual file system. I would like to be able to load shaders through this API. I was thinking about implementing my own std::ifstream but I don't like it, my api is very simple and I don't want to do a lot of unnecessary work. I just need to be able to read N bytes from the VFS. I used a C++ mod for more convenience, but in the end I can not find a solution to this problem, since there is very little official information about this. Everything is there for the C API, at least I can call the scan_string function, I did not find such a function in the yyFlexParser interface.
To be honest, I wanted to abandon the std::ifstream in the parser, and return only the C api . The only thing I used the Flex C++ mode for is to interact with the Bison C++ API and so that the parser can be used in a multi-threaded environment, but this can also be achieved with the C API.
I just couldn't compile the C parser with the C++ compiler.
I would be happy if there is a way to add such functionality through some kind of macro.
I wouldn't mind if there was a way to return the yy_scan_string function, I could read the whole file myself and provide just a string.

The simple solution, if you just want to provide a string input, is to make the string into a std::istringstream, which is a valid std::istream. The simplicity of this solution reduces the need for an equivalent to yy_scan_string.
On the other hand, if you have a data source you want to read from which is not derived from std::istream, you can easily create a lexical scanner which does whatever is necessary. Just subclass yyFlexLexer, add whatever private data members you will need and a constructor which initialises them, and override int LexerInput(char* buffer, size_t maxsize); to read at least one and no more than maxsize bytes into buffer, returning the number of characters read. (YY_INPUT also works in the C++ interface, but subclassing is more convenient precisely because it lets you maintain your own reader state.)
Notes:
If you decide to subclass and override LexerInput, you need to be aware that "interactive" mode is actually implemented in LexerInput. So if you want your lexer to have an interactive mode, you'll have to implement it in your override, too. In interactive mode, LexerInput always reads exactly one character (unless, of course, it's at the end of the file).
As you can see in the Flex code repository, a future version of Flex will use refactored versions of these functions, so you might need to be prepared to modify your code in the future, although Flex generally maintains backwards compatibility for a long time.

How do I get a HANDLE to the containing directory from a file HANDLE?

Given a HANDLE to a file (e.g. C:\\FolderA\\file.txt), I want a function which will return a HANDLE to the containing directory (in the previous example, it would be a HANDLE to C:\\FolderA). For example:
HANDLE hFile = CreateFileA(
"C:\\FolderA\\file.txt",
GENERIC_READ,
FILE_SHARE_READ,
NULL,
OPEN_EXISTING,
FILE_ATTRIBUTE_NORMAL,
NULL);
HANDLE hDirectory = somefunc(hFile);
Possible implementation for someFunc:
HANDLE someFunc(HANDLE h)
{
char *path = getPath(h); // "C:\\FolderA\\file.txt"
char *parent = getParentPath(path); // "C:\\FolderA"
HANDLE hFile = CreateFileA(
parent,
GENERIC_READ,
FILE_SHARE_READ,
NULL,
OPEN_EXISTING,
FILE_ATTRIBUTE_NORMAL,
NULL);
free(parent);
free(path);
return hFile;
}
But is there a way to implement someFunc without getParentPath or without making it look at the string and removing everything after the last directory separator (because this is terrible from a performance point of view)?

I don't know what getParentPath is. I assume it's a function that searches for the trailing backslash in the string and uses that to strip off the file specification. You don't have to define such a function yourself; Windows already provides one for you—PathCchRemoveFileSpec. (Note that this assumes the specified path actually contains a file name to remove. If the path doesn't contain a file name, it will remove the trailing directory name. There are other functions you can use to verify whether a path contains a file specification.)
The older version of this function is PathRemoveFileSpec, which is what you would use on downlevel operating systems where the newer, safer function is not available.
Outside of the Windows API, there are other ways of doing the same thing. If you're targeting C++17, there is the filesystem::path class. Boost provides something similar. Or you could write it yourself with the find_last_of member function of the std::string class, if you absolutely have to. (But prefer not to re-invent the wheel. There are lots of edge cases when it comes to path manipulation that you probably won't think of, and that your testing probably won't reveal.)
You express concerns about the performance of this approach. This is nonsense. Stripping some characters from a string is not a slow operation. It wouldn't even be slow if you started searching from the beginning of the string and then, once you found the file specification, made a second copy of the string, again starting from the beginning of the string. It's a simple loop searching through the characters of a reasonable-length string, and then a simple memcpy. There is absolutely no way that this operation could be a performance bottleneck in code that does file I/O.
But, the implementation probably isn't even going to be that naïve. You can optimize it by starting the search from the end of the path string, reducing the number of characters that you have to iterate through, and you can avoid any type of memory copy altogether if you're allowed to manipulate the original string. With a C-style string, you just replace the trailing path separator (the one that demarcates the beginning of the path specification) with a NUL character (\0). With a C++-style string, you just call the erase member function.
In fact, if you really care about performance, this is virtually guaranteed to be faster than making a system call to retrieve the containing folder from a file object. System calls are a lot slower than some compiler-generated, inlinable code to iterate through a string and strip out a sub-string.
Once you have the path to the directory, you can obtain a HANDLE to it by calling the CreateFile function with the FILE_FLAG_BACKUP_SEMANTICS flag. (It is necessary to pass that flag if you want to retrieve a handle to a directory.
I have measured that this is slow and am looking for a faster way.
Your measurements are wrong. Either you've made the common mistake of benchmarking a debugging build, where the standard library functionality (e.g., std::string) is not optimized, and/or the real performance bottleneck is the file I/O. CreateFile is not a speedy function by any stretch of the imagination. I can almost guarantee that is going to be your hotspot.
Note that if you don't already have the path, it is straightforward to obtain the path from a HANDLE to a file. As was pointed out in the comments, on Windows Vista and later, you simply need to call the GetFinalPathNameByHandle function. More details are available in this article on MSDN, including sample code and an alternative for use on downlevel versions of Windows.
As was mentioned already in the comments to the question, you can optimize this further by allocating a buffer of length MAX_PATH (or perhaps even larger) on the stack. That compiles to a single instruction to adjust the stack pointer, so it won't be a performance bottleneck, either. (Okay, I lied: you actually will need two instructions—one to create space on the stack, and the other to free the allocated space on the stack. Still not a performance problem.) That way, you don't even have to do any dynamic memory allocation.
Note that for maximum robustness, especially on Windows 10, you want to handle the case that a path is longer than MAX_PATH. In such cases, your stack-allocated buffer will be too small, and the function you call to fill it will return an error. Handle that error, and allocate a larger buffer on the free store. That will be slower, but this is an edge case and probably not one that is worth optimizing. The 99% common case will use the stack-allocated buffer.
Furthermore, eryksun points out (in comments to this answer) that, although it is convenient, GetFinalPathNameByHandle requires multiple system calls to map the file object between the NT and DOS namespaces and to normalize the path. I haven't disassembled this function, so I can't confirm his claims, but I have no reason to doubt them. Under normal circumstances, you wouldn't worry about this sort of overhead or possible performance costs, but since this seems to be a big concern for your application, you can use eryksun's alternative suggestion of calling GetFileInformationByHandleEx and requesting the FileNameInfo class. GetFileInformationByHandleEx is a general, multi-purpose function that can retrieve all different sorts of information about a file, including the path. Its implementation is simpler, calling directly down to the native NtQueryInformationFile function. I would have thought GetFinalPathNameByHandle was just a user-mode wrapper providing exactly this service, but eryksun's research suggests it is doing extra work that you might want to avoid if this is truly a performance hot-spot. I have to qualify this slightly by noting that GetFileInformationByHandleEx, in order to retrieve the FileNameInfo, is going to have to create an I/O Request Packet (IRP) and call down to the underlying device driver. That's not a cheap operation, so I'm not sure that the additional overhead of normalizing the path is really going to matter. But in this case, there's no real harm in using the GetFileInformationByHandleEx approach, since it's a documented function.
If you've written the code as described but are still having measurable performance problems, then please post that code for someone to review and help you optimize. The Code Review Stack Exchange site is a great place to get help like that on working code. Feel free to leave me a link to such a question in a comment under this answer so that I don't miss it.
Whatever you do, please stop calling the ANSI versions of the Windows API functions (the ones that end with an A suffix). You want the wide-character (Unicode) versions. These end with a W suffix, and work with strings composed of WCHAR (== wchar_t) characters. Aside from the fact that the ANSI versions have been deprecated for decades now because they do not provide Unicode support (it is not optional for any application written after the year 2000 to support Unicode characters in paths), as much as you care about performance, you should be aware of the fact that all A-suffixed API functions are just stubs that convert the passed-in ANSI string to a Unicode string and then delegate to the W-suffixed version. If the function returns a string, a second conversion also must be done by the A-suffixed version, since all native APIs work with Unicode strings. Performance isn't the real reason why you should avoid calling ANSI functions, but perhaps it's one that you'll find more convincing.
There might be a way to do what you want (map a file object via a HANDLE to its containing directory), but it would require undocumented usage of the NT native API. I don't see anything at all in the documented functions that would allow you to obtain this information. It certainly isn't accessible via the GetFileInformationByHandleEx function. For better or worse, the user-mode file system API is almost entirely path-based. Presumably, it is tracked internally, but even the documented NT native API functions that take a root directory HANDLE (e.g., NtDeleteFile via the OBJECT_ATTRIBUTES structure) allow this field to be NULL, in which case the full path string is used.
As always, if you had provided more details on the bigger picture, we could probably provide a more appropriate solution. This is what the commenters were driving at when they mentioned an XY problem. Yes, people are questioning your motives because that's how we provide the most appropriate help.

Creating a new file avoiding race conditions

I need to develop a C++ routine performing this apparently trivial task: create a file only if it does not exist, else do nothing/raise error.
As I need to avoid race conditions, I want to use the "ask forgiveness not permission" principle (i.e. attempting the intended operation and checking if it succeeded, as opposed to checking preconditions in advance), which, to my knowledge, is the only robust and portable method for this purpose [Wikipedia article][an example with getline].
Still, I could not find a way to implement it in my case. The best I could come up with is opening a fstream in app mode (or fopening with "a"), checking the output position with tellp (C++) or ftell (C) and aborting if such position is not zero. This has however two disadvantages, namely that if the file exists it gets locked (although for a short time) and its modification date is altered.
I checked other possible combinations of ios_base::openmode for fstream, as well as the mode strings of fopen but found no option that suited my needs. Further search in the C and C++ standard libraries, as well as Boost Filesystem, proved unfruitful.
Can someone point out a method to perform my task in a robust way (no collateral effects, no race conditions) without relying on OS-specific functions?
My specific problem is in Windows, but portable solutions would be preferred.
EDIT: The answer by BitWhistler completely solves the problem for C programs. Still, I am amazed that no C++ idiomatic solution seems to exist. Either one uses open with the O_EXCL attribute as proposed by Andrew Henle, which is however OS-specific (in Windows the attribute seems to be called _O_EXCL with an additional underscore [MSDN]) or one separately compiles a C11 file and links it from the C++ code. Moreover, the file descriptor obtained cannot be converted to a stream except with nonstandard extensions (e.g. GCC's __gnu_cxx::stdio_filebuf). I hope a future version of C++ will implement the "x" subattribute and possibly also a corresponding ios:: modificator for file streams.

The new C standard (C2011, which is not part of C++) adds a new standard subspecifier ("x"), that can be appended to any "w" specifier (to form "wx", "wbx", "w+x" or "w+bx"/"wb+x"). This subspecifier forces the function to fail if the file exists, instead of overwriting it.
source: http://www.cplusplus.com/reference/cstdio/fopen/

C++ restore stream fmtflags

Is it a good practice to avoid modifying the fmtflags of a stream permanently?
For instance, the function I wrote does
std::ios_base::fmtflags flags = std::cout.setf(std::ios_base::boolalpha);
at the beginning and
std::cout.setf(flags);
right before the end.
Should I do this? Suppose multiple unrelated functions use the same stream and require different formatting options.

Yes, it seems self-evident that leaving the stream in the state you found it (as pertains to formatting flags) is a decent thing to do in the general case.
Whether you should actually do it in your specific case entirely depends on what other components use the same stream and what their expectations are of it.

What does ifstream::open() really do?

Consider this code:
ifstream filein;
filein.open("y.txt");
When I use the open() function, what happens?
Does the file stream itself get opened?
or does the object's state change to open?
or both?

It's not clear if you want to know implementation details or standard requirements - but as for implementation details - it will call the underlying open system call on the operating system. On Linux for example this is called open. On Windows it is called CreateFile.

The filestream being open or closed is represented by it's state. So if you change the state to open, the filestream is now open. Like a doorway. If you open it, you've changed it's state to the open position. Then you can later close it, which involves changing it's state to the closed position. Changing its state to open and opening the stream are the exact same thing.

The std::ifstream is set up to own a std::filebuf which is a class derived from std::streambuf. The stream buffer is managing buffering for streams in a generic way and abstracts the details of how a stream is accessed. For a std::filebuf the underlying stream is an operating system file accessed as needed. When std::ifstream::open() is called this call is mainly delegated to std::filebuf::open() which does the actual work. However, the std::ifstream will clear() its state bits if the call to std::filebuf::open() succeeds and set std::ios_base::failbit if the call fails. The file buffer will call the system's method to allocate a file handle and, if successful, arrange for this file handle to be released in its destructor or in the std::filebuf::close() function - whatever comes first. When calling std::ifstream::open() with the default arguments the system call will check that the file exists, is accessible, not too many file handles are open, etc. There is an std::ios_base::openmode parameter which can be used to modify the behavior in some ways and when different flags are used when calling std::ofstream::open().
Whether the call to std::filebuf::open() has any other effects is up to the implementation. For example, the implementation could choose to obtain a sequence of bytes and convert them into characters. Since the user can override certain setting, in particular the std::locale (see the std::streambuf::pubimbue() function), it is unlikely that much will happen prior to the first read, though. In any case, none of the state flags would be affected by the outcome of any operation after opening the file itself.
BTW, the mentioned classes are actually all templates (std::basic_ifstream, std::basic_filebuf, std::basic_streambuf, and std::basic_ofstream) which are typedef'ed to the names used above for the instantiation working on char as a character type. There are similar typedefs using a w prefix for instantiations working on wchar_t. Interestingly, there are no typedefs for the char16_t and char32_t versions and it seems it will be a bit of work to get them instantiated as well.

If you think logically, ifstream is just the stream in which we will get our file contents. The parameters, we provide to ifstream.open() will open the file and mark it as open. When the file is marked as open, it will not allow you to do some operations on file like renaming a file as it is opened by some program. It will allow you to do the same after you close the stream. ifstream - imo is only the helper class to access files.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js