What is the safe alternative to realpath()?

What is the safe alternative to realpath()? - c++

I am working on a program which uses realpath() to get the absolute path of a file. Unfortunately, this function takes a string buffer that is expected to be so large that it is big enough and that's not safe when this application has to run across multiple platforms. Is there a safe version of this function which avoids the buffer overflow issue, perhaps using dynamic memory allocation?

See here for information on safe and portable use of realpath:
http://www.opengroup.org/onlinepubs/9699919799/functions/realpath.html
Basically, modern standards allow you to pass a NULL pointer, and realpath will allocate a buffer of the appropriate length. If you want to be portable to legacy systems which do not support this standard, simply check #ifdef PATH_MAX and use a fixed-size buffer of length PATH_MAX. As far as I know, there are no legacy systems that lack a constant PATH_MAX but which do not support NULL arguments to realpath.

From the manpage:
If resolved_path is specified as
NULL, then realpath() uses malloc(3)
to allocate a buffer of up to PATH_MAX
bytes to hold the resolved pathname,
and returns a pointer to this buffer.
The caller should deallocate this
buffer using free(3).buffer using free(3).
So it seems like you can just do this:
char *real_path = realpath(path, NULL);
// use real_path
free(real_path);

There is a way to do the same in Boost using boost::filesystem:
#include <boost/filesystem.hpp>
using namespace boost::filesystem;
try{
path absolutePath = canonical("./../xxx"); //will throw exception if file not exists!
}
catch{...}{
cout << "File not exists";
}
cout << absolutePath.string();
There is QT way to do it (got from here):
QFileInfo target_file_name(argv[1]);
QString absolute_path = target_file_name.absolutePath()
There is a bit complicated implementation of realpath in C++. Cannot tell anything about it's safety, but it should allow path with length more then PATH_MAX. Going to test it soon.

Related

Function to call #include macro from a string variable argument?

Is it possible to have a function like this:
const char* load(const char* filename_){
return
#include filename_
;
};
so you wouldn't have to hardcode the #include file?
Maybe with a some macro?
I'm drawing a blank, guys. I can't tell if it's flat out not possible or if it just has a weird solution.
EDIT:
Also, the ideal is to have this as a compile time operation, otherwise I know there's more standard ways to read a file. Hence thinking about #include in the first place.

This is absolutely impossible.
The reason is - as Justin already said in a comment - that #include is evaluated at compile time.
To include files during run time would require a complete compiler "on board" of the program. A lot of script languages support things like that, but C++ is a compiled language and works different: Compile and run time are strictly separated.

You cannot use #include to do what you want to do.
The C++ way of implementing such a function is:
Find out the size of the file.
Allocate memory for the contents of the file.
Read the contents of the file into the allocated memory.
Return the contents of the file to the calling function.
It will better to change the return type to std::string to ease the burden of dealing with dynamically allocated memory.
std::string load(const char* filename)
{
std::string contents;
// Open the file
std::ifstream in(filename);
// If there is a problem in opening the file, deal with it.
if ( !in )
{
// Problem. Figure out what to do with it.
}
// Move to the end of the file.
in.seekg(0, std::ifstream::end);
auto size = in.tellg();
// Allocate memory for the contents.
// Add an additional character for the terminating null character.
contents.resize(size+1);
// Rewind the file.
in.seekg(0);
// Read the contents
auto n = in.read(contents.data(), size);
if ( n != size )
{
// Problem. Figure out what to do with it.
}
contents[size] = '\0';
return contents;
};
PS
Using a terminating null character in the returned object is necessary only if you need to treat the contents of the returned object as a null terminated string for some reason. Otherwise, it maybe omitted.

I can't tell if it's flat out not possible
I can. It's flat out not possible.
Contents of the filename_ string are not determined until runtime - the content is unknown when the pre processor is run. Pre-processor macros are processed before compilation (or as first step of compilation depending on your perspective).
When the choice of the filename is determined at runtime, the file must also be read at runtime (for example using a fstream).
Also, the ideal is to have this as a compile time operation
The latest time you can affect the choice of included file is when the preprocessor runs. What you can use to affect the file is a pre-processor macro:
#define filename_ "path/to/file"
// ...
return
#include filename_
;

it is theoretically possible.
In practice, you're asking to write a PHP construct using C++. It can be done, as too many things can, but you need some awkward prerequisites.
a compiler has to be linked into your executable. Because the operation you call "hardcoding" is essential for the code to be executed.
a (probably very fussy) linker again into your executable, to merge the new code and resolve any function calls etc. in both directions.
Also, the newly imported code would not be reachable by the rest of the program which was not written (and certainly not compiled!) with that information in mind. So you would need an entry point and a means of exchanging information. Then in this block of information you could even put pointers to code to be called.
Not all architectures and OSes will support this, because "data" and "code" are two concerns best left separate. Code is potentially harmful; think of it as nitric acid. External data is fluid and slippery, like glycerine. And handling nitroglycerine is, as I said, possible. Practical and safe are something completely different.
Once the prerequisites were met, you would have two or three nice extra functions and could write:
void *load(const char* filename, void *data) {
// some "don't load twice" functionality is probably needed
void *code = compile_source(filename);
if (NULL == code) {
// a get_last_compiler_error() would be useful
return NULL;
}
if (EXIT_SUCCESS != invoke_code(code, data)) {
// a get_last_runtime_error() would also be useful
release_code(code);
return NULL;
}
// it is now the caller's responsibility to release the code.
return code;
}
And of course it would be a security nightmare, with source code left lying around and being imported into a running application.
Maintaining the code would be a different, but equally scary nightmare, because you'd be needing two toolchains - one to build the executable, one embedded inside said executable - and they wouldn't necessarily be automatically compatible. You'd be crying loud for all the bugs of the realm to come and rejoice.
What problem would be solved?
Implementing require_once in C++ might be fun, but you thought it could answer a problem you have. Which is it exactly? Maybe it can be solved in a more C++ish way.
A better alternative, considering also performances etc., to compile a loadable module beforehand, and load it at runtime.
If you need to perform small tunings to the executable, place parameters into an external configuration file and provide a mechanism to reload it. Once the modules conform to a fixed specification, you can even provide "plugins" that weren't available when the executable was first developed.

Getting address of string in stringstream object

Is it possible to get the pointer to the string stored in string stream object . Basically i want to copy the string from that location into another buffer ..
I found that i can get the length from below code
myStringStreamObj.seekg(0, ios::end);
lengthForStringInmyStringStreamObj = myStringStreamObj.tellg();
I know i can always domyStringStreamObj.str().c_str(). However my profiler tells me this code is taking time and i want to avoid it . hence i need some good alternative to get pointer to that string.
My profiler also tells me that another part of code where is do myStringStreamObj.str(std::string()) is slow too . Can some one guide me on this too .
Please , I cant avoid stringstream as its part of a big code which i cant change / dont have permission to change .

The answer is "no". The only documented API to obtain the formatted string contents is the str() method.
Of course, there's always a small possibility that whatever the compiler or platform you're using might have its own specific non-standard and/or non-documented methods for accessing the internals of a stringstream object; which might be faster. Because your question did not specify any particular compiler or implementation, I must conclude that you are looking for a portable, standards-compliant answer; so the answer in that case is a pretty much a "no".
I would actually be surprised if any particular compiler or platform, even some who might have a "reputation" for poisonings language standards <cough>, would offer any alternatives. I would expect that all implementations would prefer to keep the internal stringstream moving gears private, so that they can be tweaked and fiddled with, in future releases, without breaking binary ABI compatibility.
The only other possibility you might want to investigate is to obtain the contents of the stringstream using its iterators. This is, of course, an entirely different mechanism for pulling out what's in a stringstream; and is not as straightforward as just calling one method that hands you the string, on a silver platter; so it's likely to involve a fairly significant rewrite of your code that works with the returned string. But it is possible that iterating over what's in the stringstream might turn out to be faster since, presumably, there will not be any need for the implementation to allocate a new std::string instance just for str()'s benefit, and jamming everything inside it.
Whether or not iterating will be faster in your case depends on how your implementation's stringstream works, and how efficient the compiler is. The only way to find out is to go ahead and do it, then profile the results.

I can not provide a portable, standards compliant method.
Although you can't get the internal buffer you can provide your own.
According to the standard setting the internal buffer of a std::stringbuf object has implementation defined behaviour.
cplusplus.com: std::stringbuf::setbuf()
As it happens in my implementation of GCC 4.8.2 the behaviour is to use the external buffer you provide instead if its internal std::string.
So you can do this:
int main()
{
std::ostringstream oss;
char buf[1024]; // this is where the data ends up when you write to oss
oss.rdbuf()->pubsetbuf(buf, sizeof(buf));
oss << "Testing" << 1 << 2 << 3.2 << '\0';
std::cout << buf; // see the data
}
But I strongly advise that you only do stuff like this as a (very) temporary measure while you sort out something more efficient that is portable according to the standard.
Having said all that I looked at how my implementation implements std::stringstream::str() and it basically returns its internal std::string so you get direct access and with optimization turned on the function calls should be completely optimized away. So this should be the preferred method.

Redirect FILE handle to char-buffer

I'm using a third-party library that allows conversion between two file formats A and B. I would like to use this library to load a file of format A and convert it to format B, but I only need the converted representation in memory. So I would like to do the conversion without actually saving a file of the target format to disk and rather obtain an unsigned char* buffer or something similar. Unfortunately the libraries only conversion function is of the form
void saveAsB(A& a, std::FILE *const file);
What can I do? Is there any way to redirect the write operations performed on the handle to some buffer?

If your platform supports it, use open_memstream(3). This will be available on Linux and BSD systems, and it's probably better than fmemopen() for your use case because open_memstream() allocates the output buffer dynamically rather than you having to know the maximum size in advance.
If your platform doesn't have those functions, you can always use a "RAM disk" approach, which again on Linux would be writing a "file" to /dev/shm/ which will never actually reach any disk, but rather be stored in memory.
Edit: OK, so you say you're using Windows. Here's an outline of what you can try:
Open a non-persisted memory-mapped files.
Use _open_osfhandle to convert the HANDLE to an int file descriptor.
Use _fdopen to convert the int file descriptor to FILE*.
Cross your fingers. I haven't tested any of this.
I found this reference useful in putting the pieces together: http://www.codeproject.com/Articles/1044/A-Handy-Guide-To-Handling-Handles
Edit 2: It looks like CreateFileMapping() and _open_osfhandle() may be incompatible with each other--you would be at least the third person to try it:
https://groups.google.com/forum/#!topic/comp.os.ms-windows.programmer.win32/NTGL3h7L1LY
http://www.progtown.com/topic178214-createfilemapping-and-file.html
So, you can try what the last link suggested, which is to use setvbuf() to "trick" the data into flowing to a buffer you control, but even that has potential problems, e.g. it won't work if the library seeks within the FILE*.
So, perhaps you can just write to a file on some temporary/scratch filesystem and be done with it? Or use a platform other than Windows? Or use some "RAM disk" software.

If you can rely on POSIX being available, then use fmemopen().

Is there a C++ equivalent to getcwd?

I see C's getcwd via:
man 3 cwd
I suspect C++ has a similar one, that could return me a std::string .
If so, what is it called, and where can I find it's documentation?
Thanks!

Ok, I'm answering even though you already have accepted an answer.
An even better way than to wrap the getcwd call would be to use boost::filesystem, where you get a path object from the current_path() function. The Boost filesystem library allows you to do lots of other useful stuff that you would otherwise need to do a lot of string parsing to do, like checking if files/directories exist, get parent path, make paths complete etcetera. Check it out, it is portable as well - which a lot of the string parsing code one would otherwise use likely won't be.
Update (2016): Filesystem has been published as a technical specification in 2015, based on Boost Filesystem v3. This means that it may be available with your compiler already (for instance Visual Studio 2015). To me it also seems likely that it will become part of a future C++ standard (I would assume C++17, but I am not aware of the current status).
Update (2017): The filesystem library has been merged with ISO C++ in C++17, for
std::filesystem::current_path();

std::string's constructor can safely take a char* as a parameter. Surprisingly there's a windows version too.
Edit: actually it's a little more complicated:
std::string get_working_path()
{
char temp[MAXPATHLEN];
return ( getcwd(temp, sizeof(temp)) ? std::string( temp ) : std::string("") );
}
Memory is no problem -- temp is a stack based buffer, and the std::string constructor does a copy. Probably you could do it in one go, but I don't think the standard would guarantee that.
About memory allocation, via POSIX:
The getcwd() function shall place an absolute pathname of the current working directory in the array pointed to by buf, and return buf. The pathname copied to the array shall contain no components that are symbolic links. The size argument is the size in bytes of the character array pointed to by the buf argument. If buf is a null pointer, the behavior of getcwd() is unspecified.

Let's try and rewrite this simple C call as C++:
std::string get_working_path()
{
char temp [ PATH_MAX ];
if ( getcwd(temp, PATH_MAX) != 0)
return std::string ( temp );
int error = errno;
switch ( error ) {
// EINVAL can't happen - size argument > 0
// PATH_MAX includes the terminating nul,
// so ERANGE should not be returned
case EACCES:
throw std::runtime_error("Access denied");
case ENOMEM:
// I'm not sure whether this can happen or not
throw std::runtime_error("Insufficient storage");
default: {
std::ostringstream str;
str << "Unrecognised error" << error;
throw std::runtime_error(str.str());
}
}
}
The thing is, when wrapping a library function in another function you have to assume that all the functionality should be exposed, because a library does not know what will be calling it. So you have to handle the error cases rather than just swallowing them or hoping they won't happen.
It's usually better to let the client code just call the library function, and deal with the error at that point - the client code probably doesn't care why the error occurred, and so only has to handle the pass/fail case, rather than all the error codes.

You'll need to just write a little wrapper.
std::string getcwd_string( void ) {
char buff[PATH_MAX];
getcwd( buff, PATH_MAX );
std::string cwd( buff );
return cwd;
}

I used getcwd() in C in the following way:
char * cwd;
cwd = (char*) malloc( FILENAME_MAX * sizeof(char) );
getcwd(cwd,FILENAME_MAX);
The header file needed is stdio.h.
When I use C compiler, it works perfect.
If I compile exactly the same code using C++ compiler, it reports the following error message:
identifier "getcwd" is undefined
Then I included unistd.h and compiled with C++ compiler.
This time, everything works.
When I switched back to the C compiler, it still works!
As long as you include both stdio.h and unistd.h, the above code works for C AND C++ compilers.

All C functions are also C++ functions. If you need a std::string, just create one from the char* that getcwd gets for you.

I also used boost::filesystem as stated in another answer above. I just wanted to add that since the current_path() function does not return a std::string, you need to convert it.
Here is what I did:
std::string cwd = boost::filesystem::current_path().generic_string();

You could create a new function, which I would prefer over linking to a library like boost(unless you already are).
std::string getcwd()
{
char* buff;//automatically cleaned when it exits scope
return std::string(getcwd(buff,255));
}

Non-threadsafe file I/O in C/C++

While troubleshooting some performance problems in our apps, I found out that C's stdio.h functions (and, at least for our vendor, C++'s fstream classes) are threadsafe. As a result, every time I do something as simple as fgetc, the RTL has to acquire a lock, read a byte, and release the lock.
This is not good for performance.
What's the best way to get non-threadsafe file I/O in C and C++, so that I can manage locking myself and get better performance?
MSVC provides _fputc_nolock, and GCC provides unlocked_stdio and flockfile, but I can't find any similar functions in my compiler (CodeGear C++Builder).
I could use the raw Windows API, but that's not portable and I assume would be slower than an unlocked fgetc for character-at-a-time I/O.
I could switch to something like the Apache Portable Runtime, but that could potentially be a lot of work.
How do others approach this?
Edit: Since a few people wondered, I had tested this before posting. fgetc doesn't do system calls if it can satisfy reads from its buffer, but it does still do locking, so locking ends up taking an enormous percentage of time (hundreds of locks to acquire and release for a single block of data read from disk). Not doing character-at-a-time I/O would be a solution, but C++Builder's fstream classes unfortunately use fgetc (so if I want to use iostream classes, I'm stuck with it), and I have a lot of legacy code that uses fgetc and friends to read fields out of record-style files (which would be reasonable if it weren't for locking issues).

I'd simply not do IO a char at a time if it is sensible performance wise.

fgetc is almost certainly not reading a byte each time you call it (where by 'reading' I mean invoking a system call to perform I/O). Look somewhere else for your performance bottleneck, as this is probably not the problem, and using unsafe functions is certainly not the solution. Any lock handling you do will probably be less efficient than the handling done by the standard routines.

The easiest way would be to read the entire file in memory, and then provide your own fgetc-like interface to that buffer.

Why not just memory map the file? Memory mapping is portable (except in Windows Vista which requires you to jump through hopes to use it now, the dumbasses). Anyhow, map your file into memory, and do you're own locking/not-locking on the resulting memory location.
The OS handles all the locking required to actually read from the disk - you'll NEVER be able to eliminate this overhead. But your processing overhead, on the otherhand, won't be affected by extraneous locking other than that which you do yourself.

the multi-platform approach is pretty simple. Avoid functions or operators where standard specifies that they should use sentry. sentry is an inner class in iostream classes which ensures stream consistency for every output character and in multi-threaded environment it locks the stream related mutex for each character being output. This avoids race conditions at low level but still makes the output unreadable, since strings from two threads might be output concurrently as the following example states:
thread 1 should write: abc
thread 2 should write: def
The output might look like: adebcf instead of abcdef or defabc. This is because sentry is implemented to lock and unlock per character.
The standard defines it for all functions and operators dealing with istream or ostream. The only way to avoid this is to use stream buffers and your own locking (per string for example).
I have written an app, which outputs some data to a file and mesures the speed. If you add here a function which ouptuts using the fstream directly without using the buffer and flush, you will see the speed difference. It uses boost, but I hope it is not a problem for you. Try to remove all the streambuffers and see the difference with and without them. I my case the performance drawback was factor 2-3 or so.
The following article by N. Myers will explain how locales and sentry in c++ IOStreams work. And for sure you should look up in ISO C++ Standard document, which functions use sentry.
Good Luck,
Ovanes
#include <vector>
#include <fstream>
#include <iterator>
#include <algorithm>
#include <iostream>
#include <cassert>
#include <cstdlib>
#include <boost/progress.hpp>
#include <boost/shared_ptr.hpp>
double do_copy_via_streambuf()
{
const size_t len = 1024*2048;
const size_t factor = 5;
::std::vector<char> data(len, 1);
std::vector<char> buffer(len*factor, 0);
::std::ofstream
ofs("test.dat", ::std::ios_base::binary|::std::ios_base::out);
noskipws(ofs);
std::streambuf* rdbuf = ofs.rdbuf()->pubsetbuf(&buffer[0], buffer.size());
::std::ostreambuf_iterator<char> oi(rdbuf);
boost::progress_timer pt;
for(size_t i=1; i<=250; ++i)
{
::std::copy(data.begin(), data.end(), oi);
if(0==i%factor)
rdbuf->pubsync();
}
ofs.flush();
double rate = 500 / pt.elapsed();
std::cout << rate << std::endl;
return rate;
}
void count_avarage(const char* op_name, double (*fct)())
{
double av_rate=0;
const size_t repeat = 1;
std::cout << "doing " << op_name << std::endl;
for(size_t i=0; i<repeat; ++i)
av_rate+=fct();
std::cout << "average rate for " << op_name << ": " << av_rate/repeat
<< "\n\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n"
<< std::endl;
}
int main()
{
count_avarage("copy via streambuf iterator", do_copy_via_streambuf);
return 0;
}

One thing to consider is to build a custom runtime. Most compilers provide the source to the runtime library (I'd be surprised if it weren't in the C++ Builder package).
This could end up being a lot of work, but maybe they've localized the thread support to make something like this easy. For example, with the embedded system compiler I'm using, it's designed for this - they have documented hooks to add the lock routines. However, it's possible that this could be a maintenance headache, even if it turns out to be relatively easy initially.
Another similar route would be to talk to someone like Dinkumware about using a 3rd party runtime that provides the capabilities you need.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js