For example, if the stringstream contains "abc\n", I want to remove the last char '\n'.
I know it can be done by using 'str' first.
But could it be done without stringstream::str()?
No, there isn't, at least not in a guaranteed manner. Although it maintains a string buffer internally, you currently cannot access it without a copy being made. There is a proposal to change this:
Streams have been the oldest part of the C++ standard library and their specification doesn’t take into account many things introduced since C++11. One of the oversights is that there is no non-copying access to the internal buffer of a basic_stringbuf which makes at least the obtaining of the output results from an ostringstream inefficient, because a copy is always made. I personally speculate that this was also the reason why basic_strbuf took so long to get deprecated with its char * access.
With move semantics and basic_string_view there is no longer a reason to keep this pessimization alive on basic_stringbuf.
Internally, there is no reason for this limitation: I believe (I may be wrong) that basic_stringbuf requires a basic_string buffer, and Clang's standard library (libc++) certainly implements basic_stringbuf that way.
Right now, you can use a stringstream like any other stream or access a copy of its underlying buffer, but you cannot modify the buffer directly. This means that any attempt to modify the end of the stream requires copying the underlying buffer or reading bytes until the end.
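#include <sstream>
int main() {
    std::stringstream ss;
    ss << "abc\n";
    ss.seekp(-1, std::ios_base::end); // move the put position back onto the '\n'
    ss << '\0';                       // overwrite it: str() is now "abc\0", still 4 chars
}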
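To make the seekp trick concrete: it does not shorten the stream, but anything written after the seek overwrites the trailing character. So if more output is about to follow, the '\n' is effectively removed without calling str(). This is a minimal sketch assuming C++11. overwrite_last is a hypothetical helper name, not a standard function:

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: moves the put position back over the last character
// already written and writes `text` from there. Characters that `text` does
// not reach are left in place, so str() never gets shorter.
// Assumes at least one character has been written to `ss`.
inline void overwrite_last(std::stringstream& ss, const std::string& text) {
    assert(static_cast<std::streamoff>(ss.tellp()) > 0);  // something was written
    ss.seekp(-1, std::ios_base::end);  // put position = last character
    ss << text;                        // later writes overwrite from there
}
```

If nothing is written after the seek, str() still ends in '\n'. Since C++20, ss.view() returns a std::string_view of the contents without copying, and calling remove_suffix(1) on that view drops the '\n' from the view. The stream itself is unchanged.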
Profiling of my application reveals that it is spending nearly 5% of CPU time in string allocation. In many, many places I am making C++ std::string objects from a 64MB char buffer. The thing is, the buffer never changes during the running of the program. My analysis of std::string(const char *buf, size_t buflen) calls is that the string is being copied because the buffer might change after the string is made. That isn't the problem here. Is there a way around this problem?
EDIT: I am working with binary data, so I can't just pass around char *s. Besides, then I would have a substantial overhead from always scanning for the terminating NUL, which std::string avoids.
If the string isn't going to change and if its lifetime is guaranteed to be longer than you are going to use the string, then don't use std::string.
Instead, consider a simple C string wrapper, like the proposed string_ref<T> (standardized in C++17 as std::string_view).
Binary data? Stop using std::string and use std::vector<char>. But that won't fix your issue of it being copied. From your description, if this huge 64MB buffer will never change, you truly shouldn't be using std::string or std::vector<char>; neither one is a good idea. You really ought to be passing around a const char* pointer (const uint8_t* would be more descriptive of binary data, but under the covers it's the same thing, neglecting sign issues). Pass around both the pointer and a size_t length, or pass the pointer with another 'end' pointer. If you don't like passing around separate discrete variables (a pointer and the buffer’s length), make a struct to describe the buffer and have everyone use that instead:
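#include <cstddef>
#include <cstdint>
struct binbuf_desc {
    const std::uint8_t* addr;  // start of the buffer (not owned)
    std::size_t len;           // length of the buffer in bytes
    binbuf_desc(const std::uint8_t* addr, std::size_t len) : addr(addr), len(len) {}
};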
You can always refer to your 64MB buffer (or any other buffer of any size) by using binbuf_desc objects. Note that binbuf_desc objects don’t own the buffer (or a copy of it); they’re just a descriptor of it, so you can pass them around everywhere without having to worry about binbuf_desc objects making unnecessary copies of the buffer.
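For example, slicing a descriptor produces another small descriptor that points into the same buffer, and no bytes are copied. This sketch uses a plain aggregate version of binbuf_desc, redeclared so the snippet compiles on its own. sub_desc and count_zero_bytes are hypothetical helpers:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Plain aggregate form of the non-owning descriptor (redeclared for this sketch).
struct binbuf_desc {
    const std::uint8_t* addr;
    std::size_t len;
};

// Hypothetical helper: describes bytes [offset, offset + count) of `whole`
// without copying them. Requires offset + count <= whole.len.
inline binbuf_desc sub_desc(binbuf_desc whole, std::size_t offset, std::size_t count) {
    assert(offset <= whole.len && count <= whole.len - offset);
    return binbuf_desc{whole.addr + offset, count};
}

// Example consumer: takes the descriptor by value (pointer + length), never the bytes.
inline std::size_t count_zero_bytes(binbuf_desc d) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < d.len; ++i)
        if (d.addr[i] == 0) ++n;
    return n;
}
```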
There is no portable solution. If you tell us what toolchain you're using, someone might know a trick specific to your library implementation. But for the most part, the std::string destructor (and assignment operator) is going to free the string content, and you can't free a string literal. (It's not impossible to have exceptions to this, and in fact the small string optimization is a common case that skips deallocation, but these are implementation details.)
A better approach is to not use std::string when you don't need/want dynamic allocation. const char* still works just fine in modern C++.
Since C++17, std::string_view may be the way to go. It can be initialized from a bare C string (with or without a length) or from a std::string.
There is no constraint that the data() method returns a zero-terminated string though.
If you need this "zero-terminated on request" behaviour, there are alternatives such as str_view from Adam Sawicki that looks satisfying (https://github.com/sawickiap/str_view)
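Here is a short std::string_view sketch (C++17) for the binary-buffer case. Constructing the view with an explicit length keeps embedded '\0' bytes and copies nothing, whereas constructing from a bare pointer stops at the first '\0'. view_bytes is a hypothetical helper name. Because a view's data() is not zero-terminated, it should be passed on as data() plus size(), not as a C string.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: views `len` bytes of a long-lived buffer starting at
// `offset`. No copy is made, and embedded '\0' bytes are kept because the
// length is given explicitly. The buffer must outlive the returned view.
inline std::string_view view_bytes(const char* buf, std::size_t offset, std::size_t len) {
    return std::string_view(buf + offset, len);
}
```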
It seems that using const char* instead of std::string is the best way to go for you. But you should also consider how you are using strings: there may be implicit conversions from char pointers to std::string objects going on, for example during function calls.
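One way to see such a hidden conversion is to compare pointers. Binding a const char* to a const std::string& parameter builds a temporary std::string, so the callee sees a different pointer. A std::string_view parameter (C++17) sees the caller's pointer. The two functions below are hypothetical illustrations:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Passing a const char* here constructs a temporary std::string, i.e. a copy
// (and possibly a heap allocation), so s.data() differs from `original`.
inline bool string_param_aliases(const std::string& s, const char* original) {
    return s.data() == original;
}

// A std::string_view parameter only records the pointer and length, so
// s.data() is the caller's pointer and nothing is copied.
inline bool view_param_aliases(std::string_view s, const char* original) {
    return s.data() == original;
}
```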
When we have to work with string manipulation, is there any significant performance difference between std::string and std::stringbuf, and if so, why?
More generally, when is it good to use std::stringbuf over std::string?
A std::stringbuf uses a string internally to buffer data, so it is probably a bit slower. I don't think the difference would be significant though, because it's basically just delegation. To be sure, you'd have to run some performance tests.
std::stringbuf is useful when you want an IO-stream to use a string as buffer (like std::stringstream, which uses a std::stringbuf by default).
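As an illustration of that relationship, a std::stringbuf can be used directly as the storage behind an ordinary std::ostream. This is roughly what std::ostringstream packages up for you. format_pair is a hypothetical name:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Formats "key=value" through a plain std::ostream whose buffer is a std::stringbuf.
inline std::string format_pair(const std::string& key, int value) {
    std::stringbuf buf;      // owns the character sequence
    std::ostream out(&buf);  // formatting front end with no storage of its own
    out << key << '=' << value;
    return buf.str();        // returns a copy of the underlying sequence
}
```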
First of all, a std::stringbuf does not necessarily (or even ordinarily) use an std::string for its internal storage. For example, the standard describes initialization from an std::string as follows:
Constructs an object of class basic_stringbuf ... Then copies the content of
str into the basic_stringbuf underlying character sequence [...]
Note the wording: "character sequence" -- at least to me, this seems to be quite careful to avoid saying (or even implying) that the content should be stored in an actual string.
Past that, I think efficiency is probably a red herring. Both of them are fairly thin wrappers for managing dynamically allocated buffers of some sort of character-like sequence. There's a big difference in capabilities (e.g., string has lots of searching and insertion/deletion in the middle of a string that are entirely absent from stringbuf). Given its purpose, it might make sense to implement stringbuf on top of something like a std::deque, to optimize the (usual) path of insertion/deletion at the ends, but this is likely to be insubstantial for most uses.
If I were doing it, I'd probably be most worried by the fact that stringbuf is probably only tested along with stringstream, so if I used it differently than stringstream did, I might encounter problems, even if I'm following what the standard says it should support.
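std::stringbuf is not a std::string container. It is a stream buffer for reading and writing a character sequence, which can be initialized from, and copied out as, a std::string.
Generally, there is no significant performance difference between std::string and std::stringbuf.
The hierarchy is std::streambuf <- std::stringbuf, and both stringbuf and string are typedefs for char instantiations of class templates: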
typedef basic_stringbuf<char> stringbuf;
typedef basic_string<char> string;
I should preface this question by saying I think the answer is probably no, but I'd like to see what other people think about the issue.
I spend most of my time writing C++ that interacts with the Win32 API, which, like most C-style APIs, wants to either:
Take buffers which I've provided and operate on them.
Or return pointers to buffers which I need to later free.
Both of these scenarios essentially mean that if you want to use std::string in your code you've got to accept the fact that you're going to be doing a lot of string copying every time you construct a std::string from a temporary buffer.
What would be nice would be:
To be able to allow C style APIs to safely directly mutate a std::string and pre-reserve its allocation and set its size in advance (to mitigate scenario 1)
To be able to wrap a std::string around an existing char[] (to mitigate scenario 2)
Is there a nice way to do either of these, or should I just accept that there's an inherent cost in using std::string with old school APIs? It looks like scenario 1 would be particularly tricky because std::string has a short string optimisation whereby its buffer could either be on the stack or the heap depending on its size.
In C++11 you can simply pass a pointer to the first element of the string (&str[0]): its elements are guaranteed to be contiguous.
Before C++11, you could use .data() or .c_str(), but the string is not mutable through these.
Otherwise, yes, you must perform a copy. But I wouldn't worry about this too much until profiling indicates that it's really an issue for you.
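Here is a minimal C++11 sketch of the &str[0] approach. Pre-size the string, let the C-style function write into it, then shrink it to the length actually written. fake_c_api_get_name is a hypothetical stand-in for a Win32-style call, not a real API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a C-style API that fills a caller-provided buffer
// and returns the number of chars written (no terminator is written).
inline std::size_t fake_c_api_get_name(char* buf, std::size_t capacity) {
    const char name[] = "example";
    std::size_t n = std::strlen(name);
    if (n > capacity) n = capacity;
    std::memcpy(buf, name, n);
    return n;
}

// Since C++11, std::string storage is contiguous, so &s[0] is a writable char*.
inline std::string get_name() {
    std::string s;
    s.resize(256);                                         // reserve room up front
    std::size_t n = fake_c_api_get_name(&s[0], s.size());  // API writes in place
    s.resize(n);                                           // trim to what was written
    return s;
}
```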
Well you could probably just const_cast the .data() of a string to char* and it would most likely work. As with all optimisations, make sure that it is actually this bit of the code that is the bottleneck. If it is, wrap this up in an inline-able function, or a template class or something so that you can write some tests for it and change the behaviour if it doesn't work on some platform.
I think the only thing that you can do safely with std::(w)string here is pass it as an input that's not going to be modified by its user; use .c_str() to get a pointer to (W)CHAR.
You may be able to use a std::vector<char> instead. You can directly pass a pointer to the first character into C code and let the C code write to it, which you can't do with a string (before C++11). And many of the operations you'd want to perform on a string you can do on a std::vector<char> just as well.
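This sketch shows the std::vector<char> route. It uses std::snprintf as the C function that writes into the buffer and std::find as an example of a string-like operation on the result. value_after_equals is a hypothetical helper:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Lets a C function write straight into a vector<char>, then searches it with
// an ordinary algorithm. Returns the text after '=' (empty if none).
inline std::string value_after_equals(int id) {
    std::vector<char> buf(32);  // writable, contiguous storage
    int n = std::snprintf(buf.data(), buf.size(), "id=%d", id);
    if (n < 0) return {};
    // Keep only the characters actually written (drops the '\0').
    buf.resize(std::min<std::size_t>(static_cast<std::size_t>(n), buf.size() - 1));
    auto eq = std::find(buf.begin(), buf.end(), '=');
    if (eq == buf.end()) return {};
    return std::string(eq + 1, buf.end());
}
```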
Since C++11, you don't have to use temporary buffers. You can use strings and C strings interchangeably and even write directly into the buffer of a C++ string. However, you need to use &str.front() (or &str[0]) rather than string::data() or string::c_str(), because before C++17 those only return const char* (C++17 added a non-const data() overload). See Directly write into char* buffer of std::string.
While I was reading nVidia CUDA source code, I stumbled upon these two lines:
std::string stdDevString;
stdDevString = std::string(device_string);
Note that device_string is a char[1024]. The question is: Why construct an empty std::string, then construct it again with a C string as an argument? Why didn't they call std::string stdDevString = std::string(device_string); in just one line?
Is there a hidden string initialization behavior that this code tries to evade/use? Is it to ensure that the C string inside stdDevString remains null terminated no matter what? Because as far as I know, initializing an std::string to a C string that's not null terminated will still exhibit problems.
Why didn't they call std::string stdDevString = std::string(device_string); in just one line?
No good reason for what they did. Given the std::string::string(const char*) constructor, you can simply use any of:
std::string stdDevString = device_string;
std::string stdDevString(device_string);
std::string stdDevString{device_string}; // C++11 { } syntax
The two-step default construction then assignment is just (bad) programmer style or oversight. Sans optimisation, it does do a little unnecessary construction, but that's still pretty cheap. It's likely removed by optimisation. Not a biggie - I doubt if I'd bother to mention it in a code review unless it was in an extremely performance sensitive area, but it's definitely best to defer declaring variables until a useful initial value is available to construct them with, localising it all in one place: not only is it less error prone and cross-referenceable, but it minimises the scope of the variable simplifying the reasoning about its use.
Is it to ensure that the C string inside stdDevString remains null terminated no matter what?
No - it made no difference to that. Since C++11 the internal buffer in stdDevString is kept NUL terminated regardless of which constructor is used. In C++03 it's not necessarily terminated (see the dedicated heading for C++03 details below), and there are no guarantees there regardless of how construction / assignment is done.
Because as far as I know, initializing an std::string to a C string that's not null terminated will still exhibit problems.
You're right - any of the construction options you've listed will only copy ASCIIZ text into the std::string - considering the first NUL ('\0') the terminator. If the char array isn't NUL-terminated there will be problems.
(That's a separate issue to whether the buffer inside the std::string is kept NUL terminated - discussed above).
Note that there's a separate string(const char*, size_type) constructor that can create strings with embedded NULs and won't try to read further than told.
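A fixed-size array like device_string might not be NUL-terminated. In that case, the (pointer, length) constructor can be combined with a bounded search for the terminator so that nothing past the end of the array is read. from_fixed_buffer is a hypothetical helper, sketched under that assumption:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Builds a std::string from a fixed-size char array (e.g. char[1024]), stopping
// at the first '\0' but never reading past the end of the array, even if the
// array contains no '\0' at all.
template <std::size_t N>
std::string from_fixed_buffer(const char (&buf)[N]) {
    const char* end = std::find(buf, buf + N, '\0');
    return std::string(buf, static_cast<std::size_t>(end - buf));
}
```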
C++03 std::strings were not guaranteed NUL-terminated internally
Whichever way the std::string is constructed and initialised, before C++11 the Standard did not require it to be NUL-terminated within the string's buffer. std::string was best imagined as containing a bunch of potentially non-printable (loosely speaking, binary in the ftp/file I/O sense) characters starting at address data() and extending for size() characters. So, if you had:
std::string x("help");
x[4]; // undefined behaviour: only [0]..[3] are safe
x.at(4); // will throw rather than return '\0'
x.data()[4]; // undefined behaviour, equivalent to x[4] above
x.c_str()[4]; // safely returns '\0', (perhaps because a NUL was always
// at x[4], one was just added, or a new NUL-terminated
// buffer was just prepared - in which case data() may
// or may not start returning it too)
Note that the std::string API requires c_str() to return a pointer to a NUL-terminated value. To do so, it can either:
proactively keep an extra NUL on the end of the string buffer at all times (in which case x.data()[4] would happen to be safe on that implementation, but the code could break if the implementation changed or the code was ported to another Standard library implementation etc.)
reactively wait until c_str() is called, then:
if it has enough capacity at the current address (i.e. data()), append a NUL and return the same pointer value that data() would return
otherwise, allocate a new, larger buffer, copy the data over, NUL terminate it, and return a pointer to it (typically but optionally this buffer would replace the old buffer which would be deleted, such that calling data() immediately afterwards would return the same pointer returned by c_str())
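The reactive strategy can be modelled with a toy class. This is not how any particular Standard library implements std::string. The buffer holds only size() characters until c_str() is called, which then appends a '\0', reallocating if there is no spare capacity. LazyStr is hypothetical. A real c_str() is const, so a real implementation would need mutable members.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of "reactive" NUL termination; not a real std::string implementation.
class LazyStr {
public:
    explicit LazyStr(const char* s) {
        for (; *s != '\0'; ++s) buf_.push_back(*s);
        size_ = buf_.size();  // no terminator stored yet
    }
    std::size_t size() const { return size_; }
    bool terminated() const { return buf_.size() > size_; }  // '\0' stored yet?
    const char* data() const { return buf_.data(); }
    // Appends the terminator on first use; push_back writes in place if there
    // is spare capacity, otherwise it moves the data to a larger buffer.
    const char* c_str() {
        if (!terminated()) buf_.push_back('\0');
        return buf_.data();
    }
private:
    std::vector<char> buf_;
    std::size_t size_ = 0;
};
```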
I would say that it's equivalent to writing:
std::string stdDevString = std::string(device_string);
Or, even simpler:
std::string stdDevString = device_string;
Once the std::string has been created, it contains a private copy of the data in the C string.
I think it is ignorant to dismiss this as poor coding. If we assume that this string was allocated at file scope or as a static variable, it could be good coding.
When programming C++ for embedded systems with non-volatile memory present, there are many reasons why you would want to avoid static initialization. The main reason is that it adds lots of overhead code at the beginning of the program, where all such variables must be initialized. If they are instances of classes, constructors will be called.
This will lead to a delay peak at the beginning of the program execution. You don't want this workload peak there, because there are much more important tasks to do when starting up the program, like setting up various hardware.
To avoid this, you typically enable an option in the compiler which removes such static initialization, and then write your code in such a manner that no static/global variables are initialized, but instead set them in runtime.
On such a system, the code posted by the OP is the correct way to do it.
Looks like an artefact to me. Perhaps there was some other code in between, then it got removed, and someone was too lazy to join those two remaining lines into a single one.
Without writing a custom rdbuf is there any way to use a stringstream efficiently? That is, with these requirements:
the stream can be reset and writing start again without deallocating previous memory
get a const char* to the data written (along with the length) without creating a temporary
populate the stream without creating a temporary string
If somebody can give me a definitive "no" that would be great.
Now, I also use boost, so if somebody can provide a boost alternative which does this that would be great. It has to have both istream and ostream interfaces available.
Use boost::interprocess::vectorstream or boost::interprocess::bufferstream. These classes basically meet all of your requirements.
boost::interprocess::vectorstream won't return a const char*, but it will return a const reference to an internal container class, (like an internal vector), rather than returning a temporary string copy. On the other hand, boost::interprocess::bufferstream will basically allow you to use any arbitrary buffer as an I/O stream, giving you complete control over memory allocation, so you can easily use a char buffer if you want.
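For comparison, here is a hand-written sketch of the same idea as bufferstream, a stream over a fixed, caller-supplied buffer. It is exactly the kind of custom rdbuf the question hopes to avoid. The put area is a caller-owned char array, so the written bytes can be read through a const char* plus a length, and the area can be rewound without freeing anything. ArrayOutBuf is a hypothetical name. It is output-only, and writes past the end of the array fail rather than reallocate.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>

// Hypothetical output-only stream buffer over a caller-owned char array.
// Nothing is allocated; the array must outlive the buffer and any stream using it.
class ArrayOutBuf : public std::streambuf {
public:
    ArrayOutBuf(char* buf, std::size_t len) {
        assert(buf != nullptr || len == 0);
        setp(buf, buf + len);  // put area = the whole array
    }

    // Bytes written so far (not NUL-terminated; pair with size()).
    const char* data() const { return pbase(); }
    std::size_t size() const { return static_cast<std::size_t>(pptr() - pbase()); }

    // Start writing from the beginning again, reusing the same array.
    void reset() { setp(pbase(), epptr()); }

    // overflow() is not overridden: the default returns eof, so a write past
    // the end of the array sets badbit on the stream instead of growing it.
};
```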
These are both great classes, and wonderful replacements for std::stringstream, which, in my opinion, has always been hindered by the fact that it doesn't give you direct access to the internal buffer, resulting in the unnecessary creation of temporary string objects. It's a shame these classes are somewhat obscure, hidden away in the interprocess library.