There are a number of Win32 functions that take the address of a buffer, such as TCHAR[256], and write some data to that buffer. It may be less than the size of the buffer or it may be the entire buffer.
Often you'll call this in a loop, for example to read data off a stream or pipe. In the end I would like to efficiently return a string that has the complete data from all the iterated calls to retrieve this data. I had been thinking to use std::string since it's += is optimized in a similar way to Java or C#'s StringBuffer.append()/StringBuilder.Append() methods, favoring speed instead of memory.
But I'm not sure how best to co-mingle the std::string with Win32 functions, since these functions take the char[] to begin with. Any suggestions?
If the argument is input-only use std::string like this
std::string text("Hello");
w32function(text.c_str());
If the argument is input/output use std::vector<char> instead like this:
std::string input("input");
std::vector<char> input_vec(input.begin(), input.end());
input_vec.push_back('\0');
w32function(&input_vec[0], input_vec.size());
// Now, if you want std::string again, just make one from that vector:
std::string output(&input_vec[0]);
If the argument is output-only also use std::vector<Type> like this:
// allocates _at least_ 1k and sets those to 0
std::vector<unsigned char> buffer(1024, 0);
w32function(&buffer[0], buffer.size());
// use 'buffer' vector now as you see fit
You can also use std::basic_string<TCHAR> and std::vector<TCHAR> if needed.
You can read more on the subject in the book Effective STL by Scott Meyers.
std::string has a function c_str() that returns its equivalent C-style string. (const char *)
Further, std::string has overloaded assignment operator that takes a C-style string as input.
e.g. Let ss be std::string instance and sc be a C-style string then the interconversion can be performed as :
ss = sc; // from C-style string to std::string
sc = ss.c_str(); // from std::string to C-style string
UPDATE :
As Mike Weller pointed out, If UNICODE macro is defined, then the strings will be wchar_t* and hence you would have to use std::wstring instead.
Rather than std::string, I would suggest to use std::vector, and use &v.front() while using v.size(). Make sure to have space already allocated!
You have to be careful with std::string and binary data.
s += buf;//will treat buf as a null terminated string
s += std::string(buf, size);//would work
You need a compatible string type: typedef std::basic_string<TCHAR> tstring; is a good choice.
For input only arguments, you can use the .c_str() method.
For buffers, the choice is slightly less clear:
std::basic_string is not guaranteed to use contiguous storage like std::vector is. However, all std::basic_string implementations I've seen do use contiguous storage, and the C++ standards committee consider the missing guarantee to be a defect in the standard. The defect has been corrected in the C++0x draft.
If you're willing to bend the rules ever so slightly - with no negative consequences - you can use &(*aString.begin()) as a pointer to a TCHAR buffer of length aString.size(). Otherwise, you're stuck with std::vector for now.
Here's what the C++ standard committee have to say about contiguous string storage:
Not standardizing this existing
practice does not give implementors
more freedom. We thought it might a
decade ago. But the vendors have
spoken both with their
implementations, and with their voice
at the LWG meetings. The
implementations are going to be
contiguous no matter what the standard
says. So the standard might as well
give string clients more design
choices.
Related
Many topics have discussed the difference between string and char[]. However, they are not clear to me to understand why we need to bring string in c++? Any insight is welcome, thanks!
char[] is C style. It is not object oriented, it forces you as the programmer to deal with implementation details (such as '\0' terminator) and rewrite standard code for handling strings every time over and over.
char[] is just an array of bytes, which can be used to store a string, but it is not a string in any meaningful way.
std::string is a class that properly represents a string and handles all string operations.
It lets you create objects and keep your code fully OOP (if that is what you want).
More importantly, it takes care of memory management for you.
Consider this simple piece of code:
// extract to string
#include <iostream>
#include <string>
main ()
{
std::string name;
std::cout << "Please, enter your name: ";
std::cin >> name;
std::cout << "Hello, " << name << "!\n";
return 0;
}
How would you write the same thing using char[]?
Assume you can not know in advance how long the name would be!
Same goes for string concatenation and other operations.
With real string represented as std::string you combine two strings with a simple += operator. One line.
If you are using char[] however, you need to do the following:
Calculate the size of the combined string + terminator character.
Allocate memory for the new combined string.
Use strncpy to copy first string to new array.
Use strncat to append second string to first string in new array.
Plus, you need to remember not to use the unsafe strcpy and strcat and to free the memory once you are done with the new string.
std::string saves you all that hassle and the many bugs you can introduce while writing it.
As noted by MSalters in a comment, strings can grow. This is, in my opinion, the strongest reason to have them in C++.
For example, the following code has a bug which may cause it to crash, or worse, to appear to work correctly:
char message[] = "Hello";
strcat(message, "World");
The same idea with std::string behaves correctly:
std::string message{"Hello"};
message += "World";
Additional benefits of std::string:
You can send it to functions by value, while char[] can only be sent by reference; this point looks rather insignificant, but it enables powerful code like std::vector<std::string> (a list of strings which you can add to)
std::string stores its length, so any operation which needs the length is more efficient
std::string works similarly to all other C++ containers (vector, etc) so if you are already familiar with containers, std::string is easy to use
std::string has overloaded comparison operators, so it's easy to use with std::map, std::sort, etc.
String class is no more than an amelioration of the char[] variable.
With strings you can achieve the same goals than the use of a char[] variable, but you won't have to matter about little tricks of char[] like pointers, segmentation faults...
This is a more convenient way to build strings, but you don't really see the "undergrounds" of the language, like how to implement concatenation or length functions...
Here is the documentation of the std::string class in C++ : C++ string documentation
I hope this question is appropriate for stackoverflow... What is the difference between storing raw data bytes (8 bits) in a std::string rather than storing them in std::vector<char>. I'm reading binary data from a file and storing those raw bytes in a std::string. This works well, there are no problems or issues with doing this. My program works as expected. However, other programmers prefer the std::vector<char> approach and suggest I stop using std::string as it's unsafe for raw bytes. So I'm wondering why might it be unsafe to use std::string to hold raw data bytes? I know std::string is most often used to store ASCII text, but a byte is a byte, so I don't understand the preference of the std::vector<char>.
Thanks for any advice!
The problem is not really whether it works or it doesn't. The problem is that it is utterly confusing for the next guy reading your code. std::string is meant for displaying text. Anybody reading your code will expect that. You'll declare your intent much better with a std::vector<char>.
It increases your WTF/min in code reviews.
In C++03, using std::string to store an array of byte data was not a good idea. By the standard, std::string did not have to store data contiguously. C++11 fixed that so that it's data does have to be contiguous.
So it would not be functional to do this in C++03. Not unless you have personally vetted your C++ standard library implementation of std::string to ensure that it is contiguous.
Either way, I would suggest vector<char>. Generally, when you see string, you expect it to be a... string. You know, a sequence of characters in some form of encoding. A vector<char> makes it obvious that it isn't a string, but an array of bytes.
Besides contiguous storage and code-clarity issues, I ran into some fairly insidious errors trying to use std::string to hold raw bytes.
Most of them centered around trying to convert a char array of bytes to std::string when interfacing with C libraries. For example:
std::string password = "pass\0word";
std::cout << password.length() << std::endl; // prints 4, not 9
Maybe you can fix that by specifying the length:
std::string password("pass\0word", 0, 9);
std::cout << password.length() << std::endl; // nope! still 4!
This is probably because the constructor expects to receive a C-string, not a byte array. There might be a better way, but I ended up with this:
std::string password("pass0word", 0, 9);
password[4] = '\0';
std::cout << password.length() << std::endl; // hurray! 9!
A little clunky. Thankfully I found this in unit testing, but I would have missed it if my test vectors didn't have null bytes. What makes this insidious is that the second approach above will work fine until the array contains a null byte.
So far std::vector<uint8_t> looks like a good option (thanks J.N. and Hurkyl):
char p[] = "pass\0word";
std::vector<uint8_t> password(p, p, p+9); // :)
Note: I haven't tried the iterator constructor with std::string, but this error is easy enough to make that it might be worth avoiding even the possibility.
Lessons learned:
Test byte-handling methods witih null byte-containing test vectors.
Be careful when (and I would say avoid) using std::string to hold raw bytes.
Is it possible to somehow adapt a c-style string/buffer (char* or wchar_t*) to work with the Boost String Algorithms Library?
That is, for example, it's trimalgorithm has the following declaration:
template<typename SequenceT>
void trim(SequenceT &, const std::locale & = std::locale());
and the implementation (look for trim_left_if) requires that the sequence type has a member function erase.
How could I use that with a raw character pointer / c string buffer?
char* pStr = getSomeCString(); // example, could also be something like wchar_t buf[256];
...
boost::trim(pStr); // HOW?
Ideally, the algorithms would work directly on the supplied buffer. (As far as possible. it obviously can't work if an algorithm needs to allocate additional space in the "string".)
#Vitaly asks: why can't you create a std::string from char buffer and then use it in algorithms?
The reason I have char* at all is that I'd like to use a few algorthims on our existing codebase. Refactoring all the char buffers to string would be more work than it's worth, and when changing or adapting something it would be nice to just be able to apply a given algorithm to any c-style string that happens to live in the current code.
Using a string would mean to (a) copy char* to string, (b) apply algorithm to string and (c) copy string back into char buffer.
For the SequenceT-type operations, you probably have to use std::string. If you wanted to implement that by yourself, you'd have to fulfill many more requirements for creation, destruction, value semantics etc. You'd basically end up with your implementation of std::string.
The RangeT-type operations might be, however, usable on char*s using the iterator_range from Boost.Range library. I didn't try it, though.
There exist some code which implements a std::string like string with a fixed buffer. With some tinkering you can modify this code to create a string type which uses an external buffer:
char buffer[100];
strcpy(buffer, " HELLO ");
xstr::xstring<xstr::fixed_char_buf<char> >
str(buffer, strlen(buffer), sizeof(buffer));
boost::algorithm::trim(str);
buffer[str.size()] = 0;
std::cout << buffer << std::endl; // prints "HELLO"
For this I added an constructor to xstr::xstring and xstr::fixed_char_buf to take the buffer, the size of the buffer which is in use and the maximum size of the buffer. Further I replaced the SIZE template argument with a member variable and changed the internal char array into a char pointer.
The xstr code is a bit old and will not compile without trouble on newer compilers but it needs some minor changes. Further I only added the things needed in this case. If you want to use this for real, you need to make some more changes to make sure it can not use uninitialized memory.
Anyway, it might be a good start for writing you own string adapter.
I don't know what platform you're targeting, but on most modern computers (including mobile ones like ARM) memory copy is so fast you shouldn't even waste your time optimizing memory copies. I say - wrap char* in std::string and check whether the performance suits your needs. Don't waste time on premature optimization.
Is there a way to get the "raw" buffer o a std::string?
I'm thinking of something similar to CString::GetBuffer(). For example, with CString I would do:
CString myPath;
::GetCurrentDirectory(MAX_PATH+1, myPath.GetBuffer(MAX_PATH));
myPath.ReleaseBuffer();
So, does std::string have something similar?
While a bit unorthodox, it's perfectly valid to use std::string as a linear memory buffer, the only caveat is that it isn't supported by the standard until C++11 that is.
std::string s;
char* s_ptr = &s[0]; // get at the buffer
To quote Herb Sutter,
Every std::string implementation I know of is in fact contiguous and null-terminates its buffer. So, although it isn’t formally
guaranteed, in practice you can probably get away with calling &str[0]
to get a pointer to a contiguous and null-terminated string. (But to
be safe, you should still use str.c_str().)
"Probably" is key here. So, while it's not a guarantee, you should be able to rely on the principle that std::string is a linear memory buffer and you should assert facts about this in your test suite, just to be sure.
You can always build your own buffer class but when you're looking to buy, this is what the STL has to offer.
Use std::vector<char> if you want a real buffer.
#include <vector>
#include <string>
int main(){
std::vector<char> buff(MAX_PATH+1);
::GetCurrentDirectory(MAX_PATH+1, &buff[0]);
std::string path(buff.begin(), buff.end());
}
Example on Ideone.
Not portably, no. The standard does not guarantee that std::strings have an exclusive linear representation in memory (and with the old C++03 standard, even data-structures like ropes are permitted), so the API does not give you access to it. They must be able to change their internal representation to that (in C++03) or give access to their linear representation (if they have one, which is enforced in C++11), but only for reading. You can access this using data() and/or c_str(). Because of that, the interface still supports copy-on-write.
The usual recommendation for working with C-APIs that modify arrays by accessing through pointers is to use an std::vector, which is guaranteed to have a linear memory-representation exactly for this purpose.
To sum this up: if you want to do this portably and if you want your string to end up in an std::string, you have no choice but to copy the result into the string.
It has c_str, which on all C++ implementations that I know returns the underlying buffer (but as a const char *, so you can't modify it).
std::string str("Hello world");
LPCSTR sz = str.c_str();
Keep in mind that sz will be invalidated when str is reallocated or goes out of scope. You could do something like this to decouple from the string:
std::vector<char> buf(str.begin(), str.end()); // not null terminated
buf.push_back(0); // null terminated
Or, in oldfashioned C style (note that this will not allow strings with embedded null-characters):
#include <cstring>
char* sz = strdup(str.c_str());
// ... use sz
free(sz);
According to this MSDN article, I think this is the best approach for what you want to do using std::wstring directly. Second best is std::unique_ptr<wchar_t[]> and third best is using std::vector<wchar_t>. Feel free to read the article and draw you own conclusions.
// Get the length of the text string
// (Note: +1 to consider the terminating NUL)
const int bufferLength = ::GetWindowTextLength(hWnd) + 1;
// Allocate string of proper size
std::wstring text;
text.resize(bufferLength);
// Get the text of the specified control
// Note that the address of the internal string buffer
// can be obtained with the &text[0] syntax
::GetWindowText(hWnd, &text[0], bufferLength);
// Resize down the string to avoid bogus double-NUL-terminated strings
text.resize(bufferLength - 1);
I think you will be frowned upon by the purists of STD cult for doing this. In any case, its much better to not relay on bloated and generic standard library if you want dynamic string type that can be easily passed to low level API functions that will modify its buffer and size at the same time, without any conversions, than you will have to implement it! Its actually very challenging and interesting task to do. For example in my custom txt type I overload this operators:
ui64 operator~() const; // Size operator
uli32 * operator*(); // Size modification operator
ui64 operator!() const; // True Size Operator
txt& operator--(); // Trimm operator
And also this casts:
operator const char *() const;
operator char *();
And as such, i can pass txt type to low level API functions directly, without even calling any .c_str(). I can then also pass the API function it's true size (i.e. size of buffer) and also pointer to internal size variable (operator*()), so that API function can update amount of characters written, thus giving valid string without the need to call stringlength at all!
I tried to mimic basic types with this txt, so it has no public functions at all, all public interface is only via operators. This way my txt fits perfectly with ints and other fundamental types.
For a specific example, consider atoi(const std::string &). This is very frustrating, since we as programmers would need to use it so much.
More general question is why does not C++ standard library reimplement the standard C libraries with C++ string,C++ vector or other C++ standard element rather than to preserve the old C standard libraries and force us use the old char * interface?
Its time consuming and the code to translate data types between these two interfaces is not easy to be elegant.
Is it for compatible reason,considering there was much more legacy C code than these days and preserving these C standard interfaces would make translation from C code to C++ much easier?
In addition,I have heard many other libraries available for C++ make a lot of enhancement and extensions to STL.So does there libraries support these functions?
PS: Considering much more answers to the first specific question, I edit a lot to clarify the question to outline the questions that I am much more curious to ask.
Another more general question is why do not STL reimplementate all the standard C libraries
Because the old C libraries do the trick. The C++ standard library only re-implements existing functionality if they can do it significantly better than the old version. And for some parts of the C library, the benefit of writing new C++-implementations just isn't big enough to justify the extra standardization work.
As for atoi and the like, there are new versions of these in the C++ standard library, in the std::stringstream class.
To convert from a type T to a type U:
T in;
U out;
std::stringstream sstr(in);
sstr >> out;
As with the rest of the IOStream library, it's not perfect, it's pretty verbose, it's impressively slow and so on, but it works, and usually it is good enough. It can handle integers of all sizes, floating-point values, C and C++ strings and any other object which defines the operator <<.
EDIT:In addition,I have heard many other libraries avaliable for C++ make a lot of enhancement and extensions to STL.So does there libraries support these functions?
Boost has a boost::lexical_cast which wraps std::stringstream. With that function, you can write the above as:
U out = boost::lexical_cast<U>(in);
Even in C, using atoi isn't a good thing to do for converting user input. It doesn't provide error checking at all. Providing a C++ version of it wouldn't be all that useful - considering that it wouldn't throw and do anything, you can just pass .c_str() to it and use it.
Instead you should use strtol in C code, which does do error checking. In C++03, you can use stringstreams to do the same, but their use is error-prone: What exactly do you need to check for? .bad(), .fail(), or .eof()? How do you eat up remaining whitespace? What about formatting flags? Such questions shouldn't bother the average user, that just want to convert his string. boost::lexical_cast does do a good job, but incidentally, C++0x adds utility functions to facilitate fast and safe conversions, through C++ wrappers that can throw if conversion failed:
int stoi(const string& str, size_t *idx = 0, int base = 10);
long stol(const string& str, size_t *idx = 0, int base = 10);
unsigned long stoul(const string& str, size_t *idx = 0, int base = 10);
long long stoll(const string& str, size_t *idx = 0, int base = 10);
unsigned long long stoull(const string& str, size_t *idx = 0, int base = 10);
Effects: the first two functions call strtol(str.c_str(), ptr, base), and the last three functions
call strtoul(str.c_str(), ptr, base), strtoll(str.c_str(), ptr, base), and strtoull(str.c_str(), ptr, base), respectively. Each function returns the converted result, if any. The argument ptr designates a pointer to an object internal to the function that is used to determine what to store at *idx. If the function does not throw an exception and idx != 0, the function stores in *idx the index of the first unconverted element of str.
Returns: the converted result.
Throws: invalid_argument if strtol, strtoul, strtoll, or strtoull reports that no conversion could be performed. Throws out_of_range if the converted value is outside the range of representable values for the return type.
There's no good way to know if atoi fails. It always returns an integer. Is that integer a valid conversion? Or is the 0 or -1 or whatever indicating an error? Yes it could throw an exception, but that would change the original contract, and you'd have to update all your code to catch the exception (which is what the OP is complaining about).
If translation is too time consuming, write your own atoi:
int atoi(const std::string& str)
{
std::istringstream stream(str);
int ret = 0;
stream >> ret;
return ret;
}
I see that solutions are offered that use std::stringstream or std::istringstream.
This might be perfectly OK for single threaded applications but if an application has lots of threads and often calls atoi(const std::string& str) implemented in this way that will result in performance degradation.
Read this discussion for example: http://gcc.gnu.org/ml/gcc-bugs/2009-05/msg00798.html.
And see a backtrace of the constructor of std::istringstream:
#0 0x200000007eb77810:0 in pthread_mutex_unlock+0x10 ()
from /usr/lib/hpux32/libc.so.1
#1 0x200000007ef22590 in std::locale::locale (this=0x7fffeee8)
at gthr-default.h:704
#2 0x200000007ef29500 in std::ios_base::ios_base (this=<not available>)
at /tmp/gcc-4.3.1.tar.gz/gcc-4.3.1/libstdc++-v3/src/ios.cc:83
#3 0x200000007ee8cd70 in std::basic_istringstream<char,std::char_traits<char>,std::allocator<char> >::basic_istringstream (this=0x7fffee4c,
__str=#0x7fffee44, __mode=_S_in) at basic_ios.h:456
#4 0x4000f70:0 in main () at main.cpp:7
So every time you enter atoi() and create a local varibale of type std::stringstream you will lock a global mutex and in a multithreaded application it is likely to result in waiting on this mutex.
So, it's better in a multithreaded application not to use std::stringstream. For example simply call atoi(const char*):
inline int atoi(const std::string& str)
{
return atoi(str.c_str());
}
For your example, you've got two options:
std::string mystring("4");
int myint = atoi(mystring.c_str());
Or something like:
std::string mystring("4");
std::istringstream buffer(mystring);
int myint = 0;
buffer >> myint;
The second option gives you better error management than the first.
You can write a more generic string to number convert as such:
template <class T>
T strToNum(const std::string &inputString,
std::ios_base &(*f)(std::ios_base&) = std::dec)
{
T t;
std::istringstream stringStream(inputString);
if ((stringStream >> f >> t).fail())
{
throw runtime_error("Invalid conversion");
}
return t;
}
// Example usage
unsigned long ulongValue = strToNum<unsigned long>(strValue);
int intValue = strToNum<int>(strValue);
int intValueFromHex = strToNum<int>(strHexValue,std::hex);
unsigned long ulOctValue = strToNum<unsigned long>(strOctVal, std::oct);
For conversions I find simplest to use boost's lexical_cast (except it might be too rigorously checking the validity of the conversions of string to other types).
It surely isn't very fast (it just uses std::stringstream under the hood, but significantly more convenient), but performance is often not needed where you convert values (e.g to create error output messages and such). (If you do lots of these conversions and need extreme performance, chances are you are doing something wrong and shouldn't be performing any conversions at all.)
Because the old C libraries still work with standard C++ types, with a very little bit of adaptation. You can easily change a const char * to a std::string with a constructor, and change back with std::string::c_str(). In your example, with std::string s, just call atoi(s.c_str()) and you're fine. As long as you can switch back and forth easily there's no need to add new functionality.
I'm not coming up with C functions that work on arrays and not container classes, except for things like qsort() and bsearch(), and the STL has better ways to do such things. If you had specific examples, I could consider them.
C++ does need to support the old C libraries for compatibility purposes, but the tendency is to provide new techniques where warranted, and provide interfaces for the old functions when there isn't much of an improvement. For example, the Boost lexical_cast is an improvement over such functions as atoi() and strtol(), much as the standard C++ string is an improvement over the C way of doing things. (Sometimes this is subjective. While C++ streams have considerable advantages over the C I/O functions, there's times when I'd rather drop back to the C way of doing things. Some parts of the C++ standard library are excellent, and some parts, well, aren't.)
There are all sorts of ways to parse a number from a string, atoi can easily be used with a std::string via atoi(std.c_str()) if you really want, but atoi has a bad interface because there is no sure way to determine if an error occurred during parsing.
Here's one slightly more modern C++ way to get an int from a std::string:
std::istringstream tmpstream(str);
if (tmpstream >> intvar)
{
// ... success! ...
}
The tongue in cheek answer is: Because STL is a half-hearted attempt to show how powerful C++ templates could be. Before they got to each corner of the problem space, time was up.
There are two reasons: Creating an API takes time and effort. Create a good API takes a lot of time and a huge effort. Creating a great API takes an insane amount of time and an incredible effort. When the STL was created, OO was still pretty new to the C++ people. They didn't have the ideas how to make fluent and simple API. Today, we think iterators are so 1990 but at the time, people thought "Bloody hell, why would I need that? for (int i=0; i<...) has been good enough for three decades!"
So STL didn't became the great, fluent API. This isn't all C++ fault because you can make good APIs with C++. But it was the first attempt to do that and it shows. Before the API could mature, it was turned into a standard and all the ugly shortcomings were set into stone. And on top of this, there was all this legacy code and all the libraries which already could do everything, so the pressure wasn't really there.
To solve your misery, give up on STL and have a look at the successors: Try boost and maybe Qt. Yeah, Qt is a UI library but it also has a pretty good standard library.
Since C++11, you can use std::stoi. It is like atoi but for std::string.