Is there a way to get the "raw" buffer of a std::string?
I'm thinking of something similar to CString::GetBuffer(). For example, with CString I would do:
CString myPath;
::GetCurrentDirectory(MAX_PATH+1, myPath.GetBuffer(MAX_PATH));
myPath.ReleaseBuffer();
So, does std::string have something similar?
While a bit unorthodox, it's perfectly valid to use std::string as a linear memory buffer. The only caveat is that the standard does not guarantee contiguous storage until C++11.
std::string s;
char* s_ptr = &s[0]; // get at the buffer
To quote Herb Sutter,
Every std::string implementation I know of is in fact contiguous and null-terminates its buffer. So, although it isn’t formally guaranteed, in practice you can probably get away with calling &str[0] to get a pointer to a contiguous and null-terminated string. (But to be safe, you should still use str.c_str().)
"Probably" is key here. So, while it's not a guarantee, you should be able to rely on the principle that std::string is a linear memory buffer and you should assert facts about this in your test suite, just to be sure.
You can always build your own buffer class but when you're looking to buy, this is what the STL has to offer.
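As a minimal sketch of the idea above, assuming a C++11 or later compiler: the string is sized first, a C function writes through &s[0], and the string is then trimmed to the number of characters actually written. The names format_into_string and is_contiguous are hypothetical, and std::snprintf merely stands in for any C API that fills a caller-supplied buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats an integer into a std::string by letting a
// C function write directly into the string's own storage.
std::string format_into_string(int value) {
    std::string s(32, '\0');                      // size the buffer before writing
    // &s[0] points at contiguous, writable storage (guaranteed since C++11).
    int written = std::snprintf(&s[0], s.size(), "value=%d", value);
    assert(written >= 0 && static_cast<std::size_t>(written) < s.size());
    s.resize(static_cast<std::size_t>(written));  // drop the unused tail
    return s;
}

// The kind of fact a test suite could assert: elements are laid out linearly.
bool is_contiguous(const std::string& s) {
    for (std::size_t i = 0; i < s.size(); ++i)
        if (&s[i] != &s[0] + i) return false;
    return true;
}
```

Resizing before the call matters: writing past size() is undefined behaviour even if capacity() happens to be large enough.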
Use std::vector<char> if you want a real buffer.
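#include <windows.h>
#include <string>
#include <vector>
int main(){
std::vector<char> buff(MAX_PATH+1);
// The ANSI variant is named explicitly because the buffer holds char, not TCHAR
DWORD len = ::GetCurrentDirectoryA(static_cast<DWORD>(buff.size()), &buff[0]);
if (len == 0 || len >= buff.size()) return 1; // failure, or buffer too small
std::string path(buff.begin(), buff.begin() + len); // copy only the characters written
}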
Example on Ideone.
Not portably, no. The standard does not guarantee that a std::string has an exclusive linear representation in memory (the old C++03 standard even permits data structures like ropes), so the API does not give you write access to one. An implementation must either be able to convert its internal representation to a linear one (in C++03) or expose the linear representation it already has (which C++11 requires), but only for reading, through data() and/or c_str(). Because that access is read-only, the interface remains compatible with copy-on-write implementations, which C++03 permits.
The usual recommendation for working with C-APIs that modify arrays by accessing through pointers is to use an std::vector, which is guaranteed to have a linear memory-representation exactly for this purpose.
To sum this up: if you want to do this portably and if you want your string to end up in an std::string, you have no choice but to copy the result into the string.
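The portable route described above can be sketched as follows. The function c_api_get_name is a hypothetical stand-in for any C API that writes into a caller-supplied array, and get_name is likewise an illustrative name; only the pattern (vector as scratch buffer, then one copy into the string) comes from the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a C API that writes a NUL-terminated string into
// a caller-supplied array and returns the number of characters written.
std::size_t c_api_get_name(char* out, std::size_t capacity) {
    const char* name = "example";
    std::size_t len = std::strlen(name);
    if (capacity == 0) return 0;
    if (len >= capacity) len = capacity - 1;  // truncate to fit
    std::memcpy(out, name, len);
    out[len] = '\0';
    return len;
}

std::string get_name() {
    std::vector<char> buf(64);                // contiguous and writable
    std::size_t len = c_api_get_name(buf.data(), buf.size());
    assert(len < buf.size());
    return std::string(buf.data(), len);      // the one copy into the string
}
```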
It has c_str, which on all C++ implementations that I know returns the underlying buffer (but as a const char *, so you can't modify it).
std::string str("Hello world");
LPCSTR sz = str.c_str();
Keep in mind that sz will be invalidated when str is reallocated or goes out of scope. You could do something like this to decouple from the string:
std::vector<char> buf(str.begin(), str.end()); // not null terminated
buf.push_back(0); // null terminated
Or, in old-fashioned C style (note that this will not allow strings with embedded null characters):
#include <cstdlib> // free
#include <string.h> // strdup (POSIX, not standard C++)
char* sz = strdup(str.c_str());
// ... use sz
free(sz);
According to this MSDN article, I think the best approach for what you want to do is to use std::wstring directly. Second best is std::unique_ptr<wchar_t[]> and third best is std::vector<wchar_t>. Feel free to read the article and draw your own conclusions.
// Get the length of the text string
// (Note: +1 to consider the terminating NUL)
const int bufferLength = ::GetWindowTextLength(hWnd) + 1;
// Allocate string of proper size
std::wstring text;
text.resize(bufferLength);
// Get the text of the specified control
// Note that the address of the internal string buffer
// can be obtained with the &text[0] syntax
::GetWindowText(hWnd, &text[0], bufferLength);
// Resize down the string to avoid bogus double-NUL-terminated strings
text.resize(bufferLength - 1);
I think the purists of the STD cult will frown upon you for doing this. In any case, if you want a dynamic string type that can be passed easily to low-level API functions that modify its buffer and size at the same time, without any conversions, it is much better not to rely on the bloated and generic standard library: you will have to implement it yourself! It's actually a very challenging and interesting task. For example, in my custom txt type I overload these operators:
ui64 operator~() const; // Size operator
uli32 * operator*(); // Size modification operator
ui64 operator!() const; // True Size Operator
txt& operator--(); // Trim operator
And also these casts:
operator const char *() const;
operator char *();
And as such, I can pass the txt type to low-level API functions directly, without even calling .c_str(). I can also pass the API function its true size (i.e. the size of the buffer) and a pointer to the internal size variable (operator*()), so that the API function can update the number of characters written, giving a valid string without any need to call a string-length function at all!
I tried to mimic basic types with this txt, so it has no public functions at all, all public interface is only via operators. This way my txt fits perfectly with ints and other fundamental types.
Related
I have a class that wraps C functions for reading and writing data using file descriptors
I'm currently stuck at read method.
I want to create a read method that wraps the C function ssize_t read(int fd, void *buf, size_t count);
The function above uses void *buf as an output and returns the number of bytes written in the buffer.
I want to have a method read that would return a variable size object that would contain that data or nullptr if no data was read.
What is the best way to do that?
EDIT: I already have a char array[4096] that I use to read data. I just want to return them and also give the caller the ability to know the length of the data that I return.
The char array[4096] is a member of the class that wraps the C read. I use it to store the data temporarily before returning it to the caller. Every time I call the wrapper read, the char array will be overwritten, by design. An upper layer will be responsible for concatenating the data and constructing messages. This upper layer is the one that needs to know how much data has arrived.
The size of the char array[4096] is randomly chosen. It could be very small but more calls would be needed.
The object that contains the member char array will always be global.
I use C++17
Should I use std::vector or std::queue ?
The general answer here is: Don't use mutable global state. It breaks reentrancy and threading. And don't compound the issue by trying to return views of mutable global state, which makes even sequential calls a problem.
Just allocate a per-call buffer and use that; if you want to allow the caller to provide a buffer, that's also acceptable. Examples would look like:
#include <cerrno>
#include <string>
#include <string_view>
#include <system_error>
#include <unistd.h> // ::read, ssize_t
// Some class assumed to have an fd member for reading via the C API
class Reader
{
int fd; // Define member attributes, e.g. fd
public:
explicit Reader(int fd) : fd(fd) {}
std::string_view read(std::string& buf) {
// Qualify with :: so the call names the C function, not this member
ssize_t numread = ::read(fd, buf.data(), buf.size());
// Handle negative return values by raising an exception
if (numread < 0)
throw std::system_error(errno, std::generic_category(), "read");
return std::string_view(buf.data(), static_cast<size_t>(numread)); // Guaranteed copy-elision
}
std::string read(size_t max_read) {
std::string buf(max_read, '\0'); // Allocate appropriately sized buffer
auto view = read(buf); // Delegate to view-based API
buf.resize(view.size()); // Resize to match amount actually read
return buf; // Likely (but not guaranteed) NRVO based copy-elision
}
}; // A class definition must end with a semicolon
std::string and std::string_view could be replaced with std::vector and std::span of some type in C++20 if you preferred (std::span would allow receiving a std::span instead of std::string& in C++20, making the code more generic).
This provides the caller with multiple options:
1. Call read with an existing pre-sized std::string (maybe change to std::span for C++20) that the caller can reuse over and over
2. Call read with an explicit size and get a freshly allocated std::string with few, if any, copies involved (NRVO will avoid copying the std::string being returned in most cases, though if the underlying read reads very little, the resize call might reallocate the underlying storage and trigger a copy of whatever real data exists)
For maximum efficiency, many callers calling this repeatedly would choose #1 (they'd just create a local std::string of a given size, pass it in by reference, then use the returned std::string_view to limit how much of the buffer they actually work with), but for simple one-off uses, option #2 is convenient.
EDIT: I already have a char array[4096] that I use to read data. I just want to return them and also give the caller the ability to know the length of the data that I return.
Right, so the key information is that you don't want to copy that (or at least you don't want to force an additional copy).
Current preferred return type is std::span, but that's C++20 and you're still on 17.
Second preference is std::string_view. It'll work fine for binary data but may confuse people who expect it to be printable, not contain null terminators and so on.
Otherwise you can obviously return some struct or tuple with pointer & length (and possibly errno, which is otherwise discarded).
Returning something that might be nullptr is pretty much the least preferred option. Don't do it. It's actually harder to use correctly than the original C interface.
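The struct-based option could look roughly like this. ReadResult and read_some are hypothetical names, and the buffer is supplied by the caller here purely to keep the sketch self-contained; the same shape works for a buffer owned by the wrapper class.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>   // POSIX ::read, ssize_t

// Hypothetical result type: a non-owning view of the bytes plus the saved errno.
struct ReadResult {
    const char* data;   // points into the caller's buffer; never nullptr
    std::size_t size;   // number of bytes read; 0 means end of file or error
    int error;          // 0 on success, otherwise the errno set by ::read
};

// Performs one ::read of up to `capacity` bytes from `fd` into `buffer`.
ReadResult read_some(int fd, char* buffer, std::size_t capacity) {
    assert(buffer != nullptr);
    ssize_t n = ::read(fd, buffer, capacity);
    if (n < 0)
        return ReadResult{buffer, 0, errno};
    return ReadResult{buffer, static_cast<std::size_t>(n), 0};
}
```

The caller distinguishes "no data" from "failure" by checking error, so nothing ever needs to be nullptr.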
You could use function overloading:
#include <unistd.h> // ::read
void read(int fileDescriptor, short int & variable)
{
// Return value deliberately discarded; check it if short reads or errors matter
static_cast<void>(::read(fileDescriptor, &variable, sizeof(variable)));
}
void read(int fileDescriptor, int & variable)
{
static_cast<void>(::read(fileDescriptor, &variable, sizeof(variable)));
}
You may want to also look into using templates.
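A template version of the overloads might be sketched like this. The name read_value is hypothetical, and unlike the overloads it reports whether the whole object was filled, which is an addition for illustration rather than part of the original suggestion.

```cpp
#include <cassert>
#include <type_traits>
#include <unistd.h>   // POSIX ::read, ssize_t

// Attempts a single ::read of sizeof(T) bytes into `variable`.
// Returns true only if the whole object was filled by that one call.
template <typename T>
bool read_value(int fileDescriptor, T& variable) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte reads are only valid for trivially copyable types");
    assert(fileDescriptor >= 0 && "caller must pass an open descriptor");
    ssize_t n = ::read(fileDescriptor, &variable, sizeof(variable));
    return n == static_cast<ssize_t>(sizeof(variable));
}
```

The static_assert keeps the template from being instantiated for types such as std::string, where overwriting the object's bytes would be undefined behaviour.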
Consider a scenario, where std::string is used to store a secret. Once it is consumed and is no longer needed, it would be good to cleanse it, i.e overwrite the memory that contained it, thus hiding the secret.
std::string provides a function const char* data() returning a pointer to (since C++11) contiguous memory.
Now, since the memory is contiguous and the variable will be destroyed right after the cleanse due to scope end, would it be safe to:
char* modifiable = const_cast<char*>(secretString.data());
OPENSSL_cleanse(modifiable, secretString.size());
According to standard quoted here:
$5.2.11/7 - Note: Depending on the type of the object, a write operation through the pointer, lvalue or pointer to data member resulting from a const_cast that casts away a const-qualifier may produce undefined behavior (7.1.5.1).
That would advise otherwise, but do the conditions above (contiguous, about to be destroyed) make it safe?
The standard explicitly says you must not write to the const char* returned by data(), so don't do that.
There are perfectly safe ways to get a modifiable pointer instead:
if (secretString.size())
OPENSSL_cleanse(&secretString.front(), secretString.size());
Or if the string might have been shrunk already and you want to ensure its entire capacity is wiped:
if (secretString.capacity()) {
secretString.resize(secretString.capacity());
OPENSSL_cleanse(&secretString.front(), secretString.size());
}
It is probably safe. But not guaranteed.
However, since C++11, a std::string must be implemented as contiguous data so you can safely access its internal array using the address of its first element &secretString[0].
if(!secretString.empty()) // avoid UB
{
char* modifiable = &secretString[0];
OPENSSL_cleanse(modifiable, secretString.size());
}
std::string is a poor choice to store secrets. Since strings are copyable and sometimes copies go unnoticed, your secret may "get legs". Furthermore, string expansion techniques may cause multiple copies of fragments (or all of) your secrets.
Experience dictates a movable, non-copyable, wiped clean on destroy, unintelligent (no tricky copies under-the-hood) class.
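A rough sketch of such a class, under the assumption that a fixed-size heap allocation is acceptable. SecretBuffer is a hypothetical name, and the volatile loop is a portable fallback that compilers generally do not elide; a platform primitive such as OPENSSL_cleanse or explicit_bzero would be a stronger choice where available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical move-only holder for secret bytes. It never reallocates, so
// growth cannot leave stale copies of the data behind.
class SecretBuffer {
public:
    SecretBuffer(const char* src, std::size_t len)
        : data_(new char[len]), size_(len) {
        assert(src != nullptr || len == 0);
        if (len != 0) std::memcpy(data_.get(), src, len);
    }
    SecretBuffer(const SecretBuffer&) = delete;             // no silent copies
    SecretBuffer& operator=(const SecretBuffer&) = delete;
    SecretBuffer(SecretBuffer&& other) noexcept
        : data_(std::move(other.data_)), size_(other.size_) {
        other.size_ = 0;
    }
    SecretBuffer& operator=(SecretBuffer&& other) noexcept {
        if (this != &other) {
            wipe();                          // clear our bytes before releasing them
            data_ = std::move(other.data_);
            size_ = other.size_;
            other.size_ = 0;
        }
        return *this;
    }
    ~SecretBuffer() { wipe(); }

    const char* data() const { return data_.get(); }
    std::size_t size() const { return size_; }

    // Overwrites every byte through a volatile pointer.
    void wipe() {
        volatile char* p = data_.get();
        for (std::size_t i = 0; i < size_; ++i) p[i] = 0;
    }

private:
    std::unique_ptr<char[]> data_;
    std::size_t size_;
};

static_assert(!std::is_copy_constructible<SecretBuffer>::value, "must not be copyable");
static_assert(std::is_move_constructible<SecretBuffer>::value, "must be movable");
```

This only addresses copies made by the holder itself; whatever produced the secret in the first place (an input buffer, a std::string) still has to be wiped separately.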
You can use std::fill to fill the string with trash:
std::fill(str.begin(),str.end(), 0);
Do note that simply clearing or shrinking the string (with methods such as clear or shrink_to_fit) does not guarantee that the string data will be deleted from the process memory. Malicious processes may dump the process memory and can extract the secret if the string is not overwritten correctly.
Bonus: Interestingly, the ability to trash the string data for security reasons forces some programming languages like Java to return passwords as char[] and not String. In Java, String is immutable, so "trashing" it will make a new copy of the string. Hence, you need a modifiable object like char[] which does not use copy-on-write.
Edit: if your compiler does optimize this call out, you can use compiler-specific pragmas or attributes to make sure a trashing function will not be optimized out:
#include <algorithm>
#include <string>
#if defined(__clang__) // checked first: clang also defines __GNUC__
void __attribute__((optnone)) trashString(std::string& str) {
std::fill(str.begin(),str.end(),'\0');
}
#elif defined(__GNUC__) // GCC (note: __GCC__ is not a predefined macro)
void __attribute__((optimize("O0"))) trashString(std::string& str) {
std::fill(str.begin(),str.end(),'\0');
}
#elif defined(_MSC_VER) // MSVC
#pragma optimize("",off)
void trashString(std::string& str){
std::fill(str.begin(),str.end(),'\0');
}
#pragma optimize("",on)
#endif
There's a better answer: don't!
std::string is a class which is designed to be userfriendly and efficient. It was not designed with cryptography in mind, so there are few guarantees written into it to help you out. For example, there's no guarantees that your data hasn't been copied elsewhere. At best, you could hope that a particular compiler's implementation offers you the behavior you want.
If you actually want to treat a secret as a secret, you should handle it using tools which are designed for handling secrets. In fact, you should develop a threat model for what capabilities your attacker has, and choose your tools accordingly.
Tested solution on CentOS 6, Debian 8 and Ubuntu 16.04 (g++/clang++, O0, O1, O2, O3):
secretString.resize(secretString.capacity(), '\0');
OPENSSL_cleanse(&secretString[0], secretString.size());
secretString.clear();
If you were really paranoid you could randomise the data in the cleansed string, so as not to give away the length of the string or a location that contained sensitive data:
#include <string>
#include <stdlib.h>
#include <string.h>
typedef void* (*memset_t)(void*, int, size_t);
static volatile memset_t memset_func = memset;
void cleanse(std::string& to_cleanse) {
to_cleanse.resize(to_cleanse.capacity(), '\0');
for (size_t i = 0; i < to_cleanse.size(); ++i) { // size_t avoids a signed/unsigned comparison
memset_func(&to_cleanse[i], rand(), 1);
}
to_cleanse.clear();
}
You could seed the rand() if you wanted also.
You could also do similar string cleansing without openssl dependency, by using explicit_bzero to null the contents:
#include <string>
#include <string.h>
int main() {
std::string secretString = "ajaja";
secretString.resize(secretString.capacity(), '\0');
explicit_bzero(&secretString[0], secretString.size());
secretString.clear();
return 0;
}
Is it possible to somehow adapt a c-style string/buffer (char* or wchar_t*) to work with the Boost String Algorithms Library?
That is, for example, its trim algorithm has the following declaration:
template<typename SequenceT>
void trim(SequenceT &, const std::locale & = std::locale());
and the implementation (look for trim_left_if) requires that the sequence type has a member function erase.
How could I use that with a raw character pointer / c string buffer?
char* pStr = getSomeCString(); // example, could also be something like wchar_t buf[256];
...
boost::trim(pStr); // HOW?
Ideally, the algorithms would work directly on the supplied buffer. (As far as possible. it obviously can't work if an algorithm needs to allocate additional space in the "string".)
@Vitaly asks: why can't you create a std::string from char buffer and then use it in algorithms?
The reason I have char* at all is that I'd like to use a few algorithms on our existing codebase. Refactoring all the char buffers to string would be more work than it's worth, and when changing or adapting something it would be nice to just be able to apply a given algorithm to any c-style string that happens to live in the current code.
Using a string would mean to (a) copy char* to string, (b) apply algorithm to string and (c) copy string back into char buffer.
For the SequenceT-type operations, you probably have to use std::string. If you wanted to implement that by yourself, you'd have to fulfill many more requirements for creation, destruction, value semantics etc. You'd basically end up with your implementation of std::string.
The RangeT-type operations might be, however, usable on char*s using the iterator_range from Boost.Range library. I didn't try it, though.
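If only whitespace trimming is needed, one alternative that avoids both the copy and the SequenceT requirement is a small hand-written routine. To be clear, this is not Boost's trim and does not use Boost.Range; trim_in_place is a hypothetical helper that works directly on a NUL-terminated buffer and uses the C locale's notion of whitespace.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>

// Hypothetical helper: trims leading and trailing whitespace of a
// NUL-terminated buffer in place and returns the new length.
std::size_t trim_in_place(char* str) {
    assert(str != nullptr);
    char* first = str;
    char* last = str + std::strlen(str);
    // Cast to unsigned char: passing a negative char to isspace is undefined.
    while (first != last && std::isspace(static_cast<unsigned char>(*first))) ++first;
    while (last != first && std::isspace(static_cast<unsigned char>(last[-1]))) --last;
    std::size_t len = static_cast<std::size_t>(last - first);
    std::memmove(str, first, len);  // regions may overlap, so memmove rather than memcpy
    str[len] = '\0';
    return len;
}
```

Because trimming can only shrink the text, the routine never needs more space than the buffer already has.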
There exists some code which implements a std::string-like string with a fixed buffer. With some tinkering you can modify this code to create a string type which uses an external buffer:
char buffer[100];
strcpy(buffer, " HELLO ");
xstr::xstring<xstr::fixed_char_buf<char> >
str(buffer, strlen(buffer), sizeof(buffer));
boost::algorithm::trim(str);
buffer[str.size()] = 0;
std::cout << buffer << std::endl; // prints "HELLO"
For this I added a constructor to xstr::xstring and xstr::fixed_char_buf that takes the buffer, the size of the part of the buffer that is in use, and the maximum size of the buffer. Further, I replaced the SIZE template argument with a member variable and changed the internal char array into a char pointer.
The xstr code is a bit old and will not compile without trouble on newer compilers, but it needs only some minor changes. Further, I only added the things needed in this case. If you want to use this for real, you need to make some more changes to make sure it cannot use uninitialized memory.
Anyway, it might be a good start for writing your own string adapter.
I don't know what platform you're targeting, but on most modern computers (including mobile ones like ARM) memory copy is so fast you shouldn't even waste your time optimizing memory copies. I say - wrap char* in std::string and check whether the performance suits your needs. Don't waste time on premature optimization.
There are a number of Win32 functions that take the address of a buffer, such as TCHAR[256], and write some data to that buffer. It may be less than the size of the buffer or it may be the entire buffer.
Often you'll call this in a loop, for example to read data off a stream or pipe. In the end I would like to efficiently return a string that has the complete data from all the iterated calls to retrieve this data. I had been thinking of using std::string since its += is optimized in a similar way to Java or C#'s StringBuffer.append()/StringBuilder.Append() methods, favoring speed over memory.
But I'm not sure how best to co-mingle the std::string with Win32 functions, since these functions take the char[] to begin with. Any suggestions?
If the argument is input-only use std::string like this
std::string text("Hello");
w32function(text.c_str());
If the argument is input/output use std::vector<char> instead like this:
std::string input("input");
std::vector<char> input_vec(input.begin(), input.end());
input_vec.push_back('\0');
w32function(&input_vec[0], input_vec.size());
// Now, if you want std::string again, just make one from that vector:
std::string output(&input_vec[0]);
If the argument is output-only also use std::vector<Type> like this:
// allocates _at least_ 1k and sets those to 0
std::vector<unsigned char> buffer(1024, 0);
w32function(&buffer[0], buffer.size());
// use 'buffer' vector now as you see fit
You can also use std::basic_string<TCHAR> and std::vector<TCHAR> if needed.
You can read more on the subject in the book Effective STL by Scott Meyers.
std::string has a function c_str() that returns its equivalent C-style string. (const char *)
Further, std::string has overloaded assignment operator that takes a C-style string as input.
e.g. Let ss be std::string instance and sc be a C-style string then the interconversion can be performed as :
ss = sc; // from C-style string to std::string
sc = ss.c_str(); // from std::string to C-style string
UPDATE :
As Mike Weller pointed out, If UNICODE macro is defined, then the strings will be wchar_t* and hence you would have to use std::wstring instead.
Rather than std::string, I would suggest to use std::vector, and use &v.front() while using v.size(). Make sure to have space already allocated!
You have to be careful with std::string and binary data.
s += buf;//will treat buf as a null terminated string
s += std::string(buf, size);//would work
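Putting the length-based append into the loop from the question might look like the sketch below. ChunkSource is a hypothetical stand-in for whatever API fills the buffer and reports a byte count, and read_all is an illustrative name; append(buf, n) does the same job as += std::string(buf, size) without building a temporary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for an API that fills a buffer and reports how many
// bytes it wrote; it returns 0 once the source is exhausted.
struct ChunkSource {
    const char* data;
    std::size_t remaining;
    std::size_t read(char* out, std::size_t capacity) {
        std::size_t n = remaining < capacity ? remaining : capacity;
        std::memcpy(out, data, n);
        data += n;
        remaining -= n;
        return n;
    }
};

// Drains the source into one std::string, appending exactly the bytes
// reported for each call rather than relying on a terminating NUL.
std::string read_all(ChunkSource& source) {
    std::string result;
    char buf[4];                   // deliberately tiny to force several calls
    for (;;) {
        std::size_t n = source.read(buf, sizeof buf);
        if (n == 0) break;
        assert(n <= sizeof buf);
        result.append(buf, n);     // length-based, so an embedded '\0' survives
    }
    return result;
}
```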
You need a compatible string type: typedef std::basic_string<TCHAR> tstring; is a good choice.
For input only arguments, you can use the .c_str() method.
For buffers, the choice is slightly less clear:
std::basic_string is not guaranteed to use contiguous storage like std::vector is. However, all std::basic_string implementations I've seen do use contiguous storage, and the C++ standards committee consider the missing guarantee to be a defect in the standard. The defect has been corrected in the C++0x draft.
If you're willing to bend the rules ever so slightly - with no negative consequences - you can use &(*aString.begin()) as a pointer to a TCHAR buffer of length aString.size(). Otherwise, you're stuck with std::vector for now.
Here's what the C++ standard committee have to say about contiguous string storage:
Not standardizing this existing practice does not give implementors more freedom. We thought it might a decade ago. But the vendors have spoken both with their implementations, and with their voice at the LWG meetings. The implementations are going to be contiguous no matter what the standard says. So the standard might as well give string clients more design choices.
What is the correct C++ way of comparing a memory buffer with a constant string - strcmp(buf, "sometext") ? I want to avoid unnecessary memory copying as the result of creating temporary std::string objects.
Thanks.
If you're just checking for equality, you may be able to use std::equal
#include <algorithm>
const char* text = "sometext";
const int len = 8; // length of text
if (std::equal(text, text+len, buf)) ...
Of course, this will need additional logic if your buffer can be smaller than the text.
strcmp is good if you know the contents of your buffer. std::strncmp might give you a little more security against buffer overflows.
strcmp works fine, no copy is made. Alternatively, you could also use memcmp. However, when in C++, why not use std::strings?
I would use memcmp, and as the last parameter, use the minimum of the 2 sizes of data.
Also check to make sure those 2 sizes are the same, or else you are simply comparing the prefix of the shortest one.
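The size check plus memcmp can be folded into one small helper. The name buffer_equals is hypothetical, and the template parameter is just a way to let the compiler supply the literal's length, so no strlen call and no temporary std::string are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: true when the first `len` bytes of `buf` are exactly
// the characters of the string literal `text` (its terminating NUL excluded).
template <std::size_t N>
bool buffer_equals(const char* buf, std::size_t len, const char (&text)[N]) {
    static_assert(N > 0, "a string literal always contains at least its NUL");
    assert(buf != nullptr);
    // Compare sizes first, otherwise only a shared prefix would be checked.
    return len == N - 1 && std::memcmp(buf, text, len) == 0;
}
```

Unlike strcmp, this does not require the buffer to be NUL-terminated, since the caller states how many bytes are valid.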
You may do it like,
const char* const CONST_STRING = "sometext";
strcmp(buf,CONST_STRING);