How to query the length of a boost::filesystem::path?

I couldn't find a 'path length' method in the boost::filesystem::path, is there one?
If there is no such method (why?) - should I use .native().length() or .string().length() ?
I take it .string().length() should be faster, right?

.native() directly returns the internal representation of the path, while string() might perform some conversions. All in all, it won't make much difference though whether you use native().length() or string().length().

How about string() method? (returns std::string)
fs::path path;
...
path.string().size();

There is no length method on path, and it isn't obvious why you would want one.
.string() is the generally recommended thing to use for externally visible representations. Check out the path decomposition table in their docs to get that warm fuzzy reassurance on what to expect.
I have no reason to believe performance would differ with either. You probably shouldn't worry about it until your profiler tells you to. :)
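A minimal sketch of the two spellings discussed above. It uses C++17's std::filesystem::path as a stand-in for boost::filesystem::path, since both expose native() and string(); the assumption is that swapping the namespace alias is the only change needed for Boost. The helper names native_length and string_length are hypothetical.

```cpp
#include <cstddef>
#include <filesystem>

namespace fs = std::filesystem;  // stand-in; boost::filesystem offers the same two calls

// Length in native code units: char on POSIX, wchar_t on Windows.
std::size_t native_length(const fs::path& p) {
    return p.native().length();
}

// Length of the narrow-character representation.
// On Windows this may involve a conversion from the wide native form.
std::size_t string_length(const fs::path& p) {
    return p.string().length();
}
```

On POSIX systems the two results should agree; on Windows they count different character types, so they can differ for non-ASCII paths.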

Related

String.blit vs String.sub in OCaml

Is it better to use String.blit or String.sub in OCaml? By better I mean more time- or memory-efficient or even just more idiomatic.
I.e. is it "better" to do:
let new_string = String.sub old_string 0 4;;
Or
String.blit old_string 0 new_string 0 4;; (* I guess new_string is a byte seq here. *)
As those two functions have different semantics, it depends on what you want to do.
String.blit could indeed be used to copy part of a string into a new fresh string, but I think you should not use it instead of sub. First, that would be bad when you switch to OCaml 4.02 and try to use Bytes. Then, it makes your code way less clear (and you'd also have to add a string creation).
Also, note that blit is an imperative feature, whereas sub is itself functional. So it mainly depends on your personal programming style. In matters of performance though, they are quite comparable. That may change if the dev team decides to use a different representation for constant strings (no hurry though, it won't happen in the next few years).
Both of them call the same function unsafe_blit internally. I would use whichever makes your code the clearest.
String.blit allows reusing the same buffer or working with parts of buffers. If you don't need those things, just use sub.
In terms of complexity, both operations are linear in the size of the substring/interval.

Avoiding getting a substring?

I have a situation where I have a std::string, and I only need characters x to x + y, and I think it would speed it up quite a bit if I instead could somehow do (char*)&string[x], but the problem is all my functions expect a NULL terminated string.
What can I do?
Thanks
Nothing nice can be done. The only trick I can think of is temporarily setting s[x+y+1] to 0, passing &s[x], and then restoring the character. But you should resort to this ONLY if you are sure it will give a meaningful performance boost and that the boost is necessary.
Nothing (if the substring you need is in the middle). The speed difference will be utterly trivial unless it's being done A LOT (several million times).
Use:
string.c_str() + x;
This assumes your function takes a const char *
If you need actual 0-termination, you'll have to copy.
You have no choice here. You can't create a null-terminated substring without copying or modifying the original string.
You say you "think it would speed it up". Have you measured?
You could overwrite string[x+y+1] with a NUL and pass &string[x] to your functions. If you're going to need the whole string again afterward, save the original character first and restore it when you're done.
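The save-and-restore trick from the answers above can be sketched as follows. This is only an illustration under stated assumptions: TerminatorGuard and with_terminated_range are hypothetical names, and count is a number of characters, so the terminator lands at pos + count (with the inclusive "x to x + y" reading used in this thread, that is index x + y + 1). The string is modified while the callback runs, so this is not safe if another thread reads it at the same time.

```cpp
#include <cstddef>
#include <string>

// Overwrites one character with '\0' and restores it on scope exit,
// even if the callback throws.
class TerminatorGuard {
public:
    explicit TerminatorGuard(char& slot) : slot_(slot), saved_(slot) { slot_ = '\0'; }
    ~TerminatorGuard() { slot_ = saved_; }
    TerminatorGuard(const TerminatorGuard&) = delete;
    TerminatorGuard& operator=(const TerminatorGuard&) = delete;

private:
    char& slot_;
    char saved_;
};

// Calls f with a NUL-terminated pointer to s[pos, pos + count) without copying.
// Precondition: pos <= s.size().
template <typename F>
auto with_terminated_range(std::string& s, std::size_t pos, std::size_t count, F f) {
    if (count >= s.size() - pos) {
        // The range runs to the end of the string, which is already terminated.
        return f(s.c_str() + pos);
    }
    TerminatorGuard guard(s[pos + count]);
    return f(s.c_str() + pos);
}
```

If the callee may keep the pointer after returning, or the string is shared, copying with substr() remains the safer choice.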

C++ Dynamically convert string to any basic type

In C++ I need to convert a string to any type at runtime where I do not know what type I might be getting in the string. I have heard there is a lexical_cast in boost that I can use, but what would be the most effective way to implement it?
I might get a bunch of string like this from a client: Date="25/08/2010", Someval="2", Blah="25.5".
Now I want to be able to convert these strings to their type, e.g., the Someval is obviously an int, and the Date could be a boost::date or whatever. The point is, I don't know at this time in what order these would be given to me, so it's hard to write some code that will perform a bunch of casts.
I could use a bunch of if/else statements or a switch/case statements, however I'm thinking that there is possibly a better way to do this.
I'm not looking for something different to lexical_cast, I can totally use that, I am looking to see if someone knows a better way than doing this:
std::string str = "256";
int a = lexical_cast<int>(str);
//now check if the cast worked, if not, try another...
This is too much of a guessing game, and if I have 10 possible types, for any given string, it sounds a bit ineffective. Especially if it has to do 1000's of these at any given time.
Can anybody advise?
Alex Brown notes - the example string is a fragment of the XML data that comes from the client.
Use an XML parser to read XML data, it will do almost all of the legwork for you, and deal with the ordering issues. Then you simply need to ask the parser for the data you need for the calculation.
Details differ with different XML parsers - go find one, read the documentation. If you need more help, come back here with an XML parser question.
GMan is right, you cannot cast an arbitrary string to, for example, a Date type if the underlying data structure is different. You can, however, parse the content and instantiate a new object using the data in the string. std::atoi() parses a C string to an int, for example.
You need to parse the string, not cast it.
What you're describing is actually a parser. Even the trial-and-error approach using lexical_cast is really just a (crude) parser.
I suggest clarifying the format of the input string and then, if it's simple enough, writing a recursive descent parser by hand to parse the input string into whatever data structure is convenient for your needs.
You could use a VARIANT type of struct (i.e. one member for every possible result, plus a "type" field drawn from a big enum of types that says which one is set), and a ConvertStringToVariant() function.
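A rough sketch of the variant idea, assuming C++17's std::variant in place of a hand-rolled struct and only three candidate types (integer, floating point, plain string). The name convert_string_to_variant and the Value alias are made up for illustration; a date type would need its own parsing rule added in the same style.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>
#include <variant>

using Value = std::variant<long, double, std::string>;

// Tries the narrowest interpretation first and falls back to the raw string.
// A conversion only counts if it consumed the whole input.
Value convert_string_to_variant(const std::string& text) {
    if (text.empty()) return text;

    const char* const begin = text.c_str();
    const char* const expected_end = begin + text.size();
    char* end = nullptr;

    errno = 0;
    const long as_long = std::strtol(begin, &end, 10);
    if (errno == 0 && end == expected_end) return as_long;

    errno = 0;
    const double as_double = std::strtod(begin, &end);
    if (errno == 0 && end == expected_end) return as_double;

    return text;  // e.g. "25/08/2010" ends up here
}
```

This is still the "try one type, then the next" approach from the question, just with cheap checks and a single result type. Note that strtod also accepts spellings such as "inf" and hexadecimal floats, which real input validation may want to reject.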
This is too much of a guessing game,
and if I have 10 possible types, for
any given string
If you're concerned about this, you need a lexical analyzer or parser generator, such as flex or Boost.Spirit.
It will still be a guessing game, but a more "informed" guessing one.

What's the difference between istringstream, ostringstream and stringstream? / Why not use stringstream in every case?

When would I use std::istringstream, std::ostringstream and std::stringstream and why shouldn't I just use std::stringstream in every scenario (are there any runtime performance issues?).
Lastly, is there anything bad about this (instead of using a stream at all):
std::string stHehe("Hello ");
stHehe += "stackoverflow.com";
stHehe += "!";
Personally, I find it very rare that I want to perform streaming into and out of the same string stream.
Usually I want to either initialize a stream from a string and then parse it; or stream things to a string stream and then extract the result and store it.
If you're streaming to and from the same stream, you have to be very careful with the stream state and stream positions.
Using 'just' istringstream or ostringstream better expresses your intent and gives you some checking against silly mistakes such as accidental use of << vs >>.
There might be some performance improvement but I wouldn't be looking at that first.
There's nothing wrong with what you've written. If you find it doesn't perform well enough, then you could profile other approaches, otherwise stick with what's clearest. Personally, I'd just go for:
std::string stHehe( "Hello stackoverflow.com!" );
A stringstream is somewhat larger, and might have slightly lower performance -- multiple inheritance can require an adjustment to the vtable pointer. The main difference is (at least in theory) better expressing your intent, and preventing you from accidentally using >> where you intended << (or vice versa). OTOH, the difference is sufficiently small that especially for quick bits of demonstration code and such, I'm lazy and just use stringstream. I can't quite remember the last time I accidentally used << when I intended >>, so to me that bit of safety seems mostly theoretical (especially since if you do make such a mistake, it'll almost always be really obvious almost immediately).
Nothing at all wrong with just using a string, as long as it accomplishes what you want. If you're just putting strings together, it's easy and works fine. If you want to format other kinds of data though, a stringstream will support that, and a string mostly won't.
In most cases, you won't find yourself needing both input and output on the same stringstream, so using std::ostringstream and std::istringstream explicitly makes your intention clear. It also prevents you from accidentally typing the wrong operator (<< vs >>).
When you need to do both operations on the same stream you would obviously use the general purpose version.
Performance issues would be the least of your concerns here, clarity is the main advantage.
Finally, there's nothing wrong with using string append if all you need is to build up pure strings. You just can't use it to append numbers directly the way you can in languages such as Perl.
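To make the input/output split concrete, here is a small sketch that uses std::istringstream only for parsing and std::ostringstream only for formatting. The function names parse_ints and join_ints are illustrative, not from any library.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Input only: initialize the stream from a string, then extract with >>.
std::vector<int> parse_ints(const std::string& text) {
    std::istringstream in(text);
    std::vector<int> values;
    for (int v; in >> v;) {
        values.push_back(v);
    }
    return values;
}

// Output only: insert with <<, then take the result with str().
std::string join_ints(const std::vector<int>& values) {
    std::ostringstream out;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i != 0) out << ',';
        out << values[i];
    }
    return out.str();
}
```

Writing in >> v against an std::ostringstream, or out << x against an std::istringstream, fails to compile, which is the "checking against silly mistakes" the answers mention.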
istringstream is for input, ostringstream for output. stringstream is input and output.
You can use stringstream pretty much everywhere.
However, if you give your object to another user, and they use operator >> when you were expecting a write-only object, you will not be happy ;-)
PS:
There's nothing bad about it beyond possible performance issues.
std::ostringstream::str() creates a copy of the stream's content, which doubles memory usage in some situations. You can use std::stringstream and its rdbuf() function instead to avoid this.
More details here: how to write ostringstream directly to cout
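A minimal sketch of the rdbuf() approach, assuming the goal is to forward a std::stringstream's contents to another stream. forward_contents is a hypothetical helper name.

```cpp
#include <ostream>
#include <sstream>

// Streams everything still unread in source into dest without first
// building an intermediate std::string via str().
// Note: if source has nothing left to read, operator<< sets failbit on dest.
void forward_contents(std::stringstream& source, std::ostream& dest) {
    dest << source.rdbuf();
}
```

Passing std::cout as dest gives the "write directly to cout" case. Whether this actually saves memory in a given program is worth measuring rather than assuming.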
To answer your third question: No, that's perfectly reasonable. The advantage of using streams is that you can enter any sort of value that's got an operator<< defined, while you can only add strings (either C++ or C) to a std::string.
Presumably when only insertion or only extraction is appropriate for your operation you could use one of the 'i' or 'o' prefixed versions to exclude the unwanted operation.
If that is not important then you can use the i/o version.
The string concatenation you're showing is perfectly valid. Concatenation using a stringstream is possible, but it is not the most useful feature of stringstreams; that would be the ability to insert and extract POD and abstract data types.
Why open a file for read/write access if you only need to read from it, for example?
What if multiple processes needed to read from the same file?

Is there a 'catch' with FastFormat?

I just read about the FastFormat C++ i/o formatting library, and it seems too good to be true: Faster even than printf, typesafe, and with what I consider a pleasing interface:
// prints: "This formats the remaining arguments based on their order - in this case we put 1 before zero, followed by 1 again"
fastformat::fmt(std::cout, "This formats the remaining arguments based on their order - in this case we put {1} before {0}, followed by {1} again", "zero", 1);
// prints: "This writes each argument in the order, so first zero followed by 1"
fastformat::write(std::cout, "This writes each argument in the order, so first ", "zero", " followed by ", 1);
This looks almost too good to be true. Is there a catch? Have you had good, bad or indifferent experiences with it?
Last time I checked, there was one annoying catch:
You can only use either the narrow string version or the wide string version of this library. (The functions for wchar_t and char are the same -- which type is used is a compile time switch.)
With iostreams, stdio or Boost.Format you can use both.
Found one "catch", though for most people it will never manifest. From the project page:
Atomic operation. It doesn't write out statement elements one at a time, like the IOStreams, so has no atomicity issues
The only way I can see this happening is if it buffers the whole write() call's output itself, then writes it out to the ostream in one step. This means it needs to allocate memory, and if an object passed into the write() call produces a lot of output (several megabytes or more), it can consume up to twice that much memory in internal buffers (assuming it uses the grow-a-buffer-by-doubling-its-size-each-time trick).
If you're just using it for logging, and not, say, dumping huge amounts of XML, you'll never see this problem.
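The buffering behaviour guessed at above can be illustrated with plain iostreams. This is not FastFormat's implementation, only a sketch of the "format everything into a buffer, then emit it in one call" idea; write_buffered is a hypothetical name and the code assumes C++17 for the fold expression.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Formats all arguments into a private buffer, then hands the result to
// the destination stream in a single write() call.
template <typename... Args>
void write_buffered(std::ostream& out, const Args&... args) {
    std::ostringstream buffer;
    (buffer << ... << args);  // C++17 fold expression
    const std::string text = buffer.str();
    out.write(text.data(), static_cast<std::streamsize>(text.size()));
}
```

The memory cost described in the answer is visible here: the whole formatted message exists in the buffer (and again in the copy returned by str()) before anything reaches out.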
The only other "catch" I'm seeing is:
Highly portable. It will work with all good modern C++ compilers; it even works with Visual C++ 6!
So it won't work with an old C++ compiler, like cfront, whereas iostreams is backward compatible to the late 80's. Again, I'd be surprised if anyone ever had a problem with this.
Although FastFormat is a good library there are a number of issues with it:
Limited formatting support, in particular the following features are not supported:
Leading zeros (or any other non-space padding)
Octal/hexadecimal encoding
Runtime width/alignment specification
The library is quite big for a relatively small task of formatting and has even bigger dependency (STLSoft).
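For comparison, this is roughly what the listed features look like with standard iostream manipulators. The helper names zero_padded and as_hex are made up for this sketch.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Leading zeros with a width chosen at run time.
// (With the default right adjustment, a negative value is padded before
// the sign; use std::internal if that matters.)
std::string zero_padded(int value, int width) {
    std::ostringstream out;
    out << std::setfill('0') << std::setw(width) << value;
    return out.str();
}

// Hexadecimal output.
std::string as_hex(unsigned value) {
    std::ostringstream out;
    out << std::hex << value;
    return out.str();
}
```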
It looks pretty interesting indeed! Good tip regardless, and +1 for that!
I've been playing with it for a bit. The main drawback I see is that FastFormat supports fewer formatting options for the output. I think this is a direct consequence of the way the higher type safety is achieved, and it can be a good tradeoff depending on your circumstances.
If you look in detail at his performance benchmark page, you'll notice that good old C printf-family functions are still winning on Linux. In fact, the only test case where they perform poorly is the test case that should be static string concatenations, where I would expect printf to be wasteful. Moreover, GCC provides static type-checking on printf-style function calls, so the benefit of type-safety is reduced. So: if you are running on Linux and if you need the absolute best performance, FastFormat is probably not the optimal solution.
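The GCC checking mentioned above also extends to your own printf-style functions through the format attribute. A small sketch, assuming GCC or Clang; format_message and the PRINTF_LIKE macro are hypothetical names, and the fixed 256-byte buffer is an arbitrary choice for the example.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

#if defined(__GNUC__)
// Tells the compiler which parameter is the format string and where the
// variadic arguments start, so mismatches are diagnosed under -Wformat.
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

std::string format_message(const char* fmt, ...) PRINTF_LIKE(1, 2);

std::string format_message(const char* fmt, ...) {
    char buffer[256];
    va_list args;
    va_start(args, fmt);
    const int written = std::vsnprintf(buffer, sizeof buffer, fmt, args);
    va_end(args);
    if (written < 0) return std::string();

    // vsnprintf reports the untruncated length, so clamp to what fits.
    std::size_t length = static_cast<std::size_t>(written);
    if (length >= sizeof buffer) length = sizeof buffer - 1;
    return std::string(buffer, length);
}
```

With the attribute in place, a call such as format_message("%d", "text") should draw a compile-time warning under -Wformat (which -Wall enables), although it still compiles.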
The library depends on a couple of environment variables, as mentioned in the docs.
That might be no biggie to some people, but I'd prefer my code to be as self-contained as possible. If I check it out from source control, it should work and compile. It won't, if it requires you to set environment variables.