How to cast a pointer to a memory block to a std stream - C++

I have written an application on Windows XP in Visual Studio, in C++.
In that app I use the LoadResource() API to load a resource that provides a file in resource memory.
It returns a pointer to a memory block, and I want to cast that pointer to a std stream for compatibility.
Could anyone help me?

You can't cast the resource to a stream type. Either you copy the bytes:
std::stringstream ss;
ss.rdbuf()->sputn(buf, len);
or you wrap your resource in your own streambuf:
class resourcebuf : public std::streambuf {
// Todo: implement members including at least xsgetn, uflow and underflow
};
and pass it to istream::istream
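For illustration, a minimal read-only sketch of such a streambuf (my sketch, not a full implementation) just points the get area at the existing block, so the default underflow/xsgetn implementations work unchanged:
#include <cstddef>
#include <istream>
#include <streambuf>
class resourcebuf : public std::streambuf {
public:
resourcebuf(const char* data, std::size_t len) {
// std::streambuf wants non-const pointers; we only ever read through them
char* p = const_cast<char*>(data);
setg(p, p, p + len); // beginning, current position and end of the get area
}
};
// Usage, assuming memory/memory_size come from LockResource/SizeofResource:
// resourcebuf rb(static_cast<const char*>(memory), memory_size);
// std::istream in(&rb);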

Why would you need this?
Converting raw data pointers to streams means byte-by-byte copying of your resource and therefore costs performance (and, I should also mention, I don't see any benefit in this approach).
If you want to work with raw memory, work with it. Casting here (compatibility?) seems to be a very strange approach.
Still, if you want to do it, you could create some stream from your memory block that treats it as a sequence of bytes. In this case that means using std::stringstream (istringstream).
After you lock your resource with LockResource, create a string from the received void* pointer and pass it to your stringstream instance.
void* memory = LockResource(...);
// You would probably want to use SizeofResource() here
size_t memory_size = ... ;
std::string casted_memory(static_cast<char*>(memory), memory_size);
std::istringstream stream(casted_memory);
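For completeness, the Win32 side might look roughly like this (a sketch only: the resource ID and type below are placeholders and error handling is omitted):
#include <windows.h>
#include <sstream>
#include <string>
HRSRC hRes = FindResource(NULL, MAKEINTRESOURCE(1), RT_RCDATA); // placeholder ID and type
HGLOBAL hData = LoadResource(NULL, hRes);
void* memory = LockResource(hData);
DWORD memory_size = SizeofResource(NULL, hRes);
std::istringstream stream(std::string(static_cast<char*>(memory), memory_size));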

The most straightforward way is probably to convert the buffer to a string and then to a stringstream:
std::stringstream ss(std::string(buf,len));
I think that will copy it twice, though, so if it turns out to be taking a lot of time you might need to look for alternatives. You could use strstream, but it might freak out the squares.
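If you do go the strstream route anyway, a sketch of the deprecated zero-copy variant, assuming buf and len as above, would be:
#include <strstream>
std::istrstream in(static_cast<const char*>(buf), static_cast<std::streamsize>(len)); // reads the buffer in place, no copy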

Related

Writing a map<> out to a file and reading it back in again [duplicate]


Can I do a zero-copy std::string allocation in C++ from a const char * array?

Profiling of my application reveals that it is spending nearly 5% of CPU time in string allocation. In many, many places I am making C++ std::string objects from a 64MB char buffer. The thing is, the buffer never changes during the running of the program. My analysis of the std::string(const char *buf, size_t buflen) calls is that the string is being copied because the buffer might change after the string is made. That isn't the problem here. Is there a way around this problem?
EDIT: I am working with binary data, so I can't just pass around char *s. Besides, then I would have a substantial overhead from always scanning for the NULL, which the std::string avoids.
If the string isn't going to change, and if its lifetime is guaranteed to outlast your use of the string, then don't use std::string.
Instead, consider a simple C string wrapper, like the proposed string_ref<T>.
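As a rough illustration only (a stand-in for the proposal, not its actual interface), such a non-owning wrapper can be as small as:
#include <cstddef>
class cstring_ref {
public:
cstring_ref(const char* data, std::size_t size) : data_(data), size_(size) {}
const char* data() const { return data_; }
std::size_t size() const { return size_; }
private:
const char* data_; // not owned; the caller guarantees the lifetime
std::size_t size_;
};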
Binary data? Stop using std::string and use std::vector<char>. But that won't fix your issue of it being copied. From your description, if this huge 64MB buffer will never change, you truly shouldn't be using std::string or std::vector<char>; neither one is a good idea. You really ought to be passing around a const char* pointer (const uint8_t* would be more descriptive of binary data, but under the covers it's the same thing, neglecting sign issues). Pass around both the pointer and a size_t length of it, or pass the pointer with another 'end' pointer. If you don't like passing around separate discrete variables (a pointer and the buffer's length), make a struct to describe the buffer and have everyone use those instead:
struct binbuf_desc {
uint8_t* addr;
size_t len;
binbuf_desc(uint8_t* addr, size_t len) : addr(addr), len(len) {}
};
You can always refer to your 64MB buffer (or any other buffer of any size) by using binbuf_desc objects. Note that binbuf_desc objects don’t own the buffer (or a copy of it), they’re just a descriptor of it, so you can just pass those around everywhere without having to worry about binbuf_desc’s making unnecessary copies of the buffer.
There is no portable solution. If you tell us what toolchain you're using, someone might know a trick specific to your library implementation. But for the most part, the std::string destructor (and assignment operator) is going to free the string content, and you can't free a string literal. (It's not impossible to have exceptions to this, and in fact the small string optimization is a common case that skips deallocation, but these are implementation details.)
A better approach is to not use std::string when you don't need/want dynamic allocation. const char* still works just fine in modern C++.
Since C++17, std::string_view may be your way. It can be initialized either from a bare C string (with or without a length) or from a std::string.
There is no guarantee that the data() method returns a zero-terminated string, though.
If you need this "zero-terminated on request" behaviour, there are alternatives such as str_view from Adam Sawicki, which looks satisfying (https://github.com/sawickiap/str_view).
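A minimal sketch, assuming buf and buflen are your existing 64MB buffer and its length:
#include <string_view>
std::string_view view(buf, buflen); // refers to the caller's memory: no copy, no NUL scan
auto pos = view.find("needle"); // find, substr etc. work without allocating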
It seems that using const char * instead of std::string is the best way to go for you. But you should also consider how you are using strings; implicit conversions from char pointers to std::string objects may be happening without you noticing, for example during function calls.

Can you avoid using temporary buffers when using std::string to interact with C style APIs?

I should preface this question by saying I think the answer is probably no, but I'd like to see what other people think about the issue.
I spend most of my time writing C++ that interacts with the Win32 API, which, like most C-style APIs, wants to either:
Take buffers which I've provided and operate on them.
Or return pointers to buffers which I need to later free.
Both of these scenarios essentially mean that if you want to use std::string in your code you've got to accept the fact that you're going to be doing a lot of string copying every time you construct a std::string from a temporary buffer.
What would be nice would be:
To be able to allow C-style APIs to safely mutate a std::string directly, pre-reserving its allocation and setting its size in advance (to mitigate scenario 1)
To be able to wrap a std::string around an existing char[] (to mitigate scenario 2)
Is there a nice way to do either of these, or should I just accept that there's an inherent cost in using std::string with old school APIs? It looks like scenario 1 would be particularly tricky because std::string has a short string optimisation whereby its buffer could either be on the stack or the heap depending on its size.
In C++11 you can simply pass a pointer to the first element of the string (&str[0]): its elements are guaranteed to be contiguous.
Previously, you could use .data() or .c_str(), but the string is not mutable through those.
Otherwise, yes, you must perform a copy. But I wouldn't worry about this too much until profiling indicates that it's really an issue for you.
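To illustrate scenario 1 under the C++11 rules (GetWindowTextA is just one example of an API that fills a caller-supplied buffer, and hwnd is assumed to exist):
#include <string>
#include <windows.h>
std::string text(GetWindowTextLengthA(hwnd) + 1, '\0'); // pre-size the buffer
int n = GetWindowTextA(hwnd, &text[0], static_cast<int>(text.size())); // the API writes in place
text.resize(n); // trim to the length actually written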
Well you could probably just const_cast the .data() of a string to char* and it would most likely work. As with all optimisations, make sure that it is actually this bit of the code that is the bottleneck. If it is, wrap this up in an inline-able function, or a template class or something so that you can write some tests for it and change the behaviour if it doesn't work on some platform.
I think the only thing that you can do safely with std::(w)string here is pass it as an input that's not going to be modified by its user; use .c_str() to get a pointer to (W)CHAR.
You may be able to use a std::vector<char> instead. You can directly pass a pointer to the first character into C code and let the C code write it which you can't do with a string. And many of the operations you'd want to perform on a string you can do on a std::vector<char> just as well.
Since C++11, you don't have to use temporary buffers. You can interchangeably use strings & c-strings and even write to the buffer of c++ strings, but you need to use string::front(), not string::data() or string::c_str() as those only return const char*. See Directly write into char* buffer of std::string.

Using reinterpret cast to save a struct or class to file

This is something the professor showed us in his scripts. I have not used this method in any code I have written.
Basically, we take a class, or struct, and reinterpret_cast it and save off the entire struct like so:
struct Account
{
Account()
{ }
Account(std::string one, std::string two)
: login_(one), pass_(two)
{ }
private:
std::string login_;
std::string pass_;
};
int main()
{
Account *acc = new Account("Christian", "abc123");
std::ofstream out("File.txt", std::ios::binary);
out.write(reinterpret_cast<char*>(acc), sizeof(Account));
out.close();
}
This produces the output (in the file)
ÍÍÍÍChristian ÍÍÍÍÍÍ ÍÍÍÍabc123 ÍÍÍÍÍÍÍÍÍ
I'm confused. Does this method actually work, or does it cause UB because magical things happen within classes and structs that are at the whims of individual compilers?
It doesn't actually work, but it also does not cause undefined behavior.
In C++ it is legal to reinterpret any object as an array of char, so there is no undefined behavior here.
The results, however, are usually only usable if the class is POD (effectively, if the class is a simple C-style struct) and self-contained (that is, the struct doesn't have pointer data members).
Here, Account is not POD because it has std::string members. The internals of std::string are implementation-defined, but it is not POD and it usually has pointers that refer to some heap-allocated block where the actual string is stored (in your specific example, the implementation is using a small-string optimization where the value of the string is stored in the std::string object itself).
There are a few issues:
You aren't always going to get the results you expect. If you had a longer string, the std::string would use a buffer allocated on the heap to store the string and so you will end up just serializing the pointer, not the pointed-to string.
You can't actually use the data you've serialized here. You can't just reinterpret the data as an Account and expect it to work, because the std::string constructors would not get called.
In short, you cannot use this approach for serializing complex data structures.
It's not undefined. Rather, it's platform-dependent or implementation-defined behavior. This is, in general, bad code, because differing versions of the same compiler, or even different switches on the same compiler, can break your save file format.
This could work depending on the contents of the struct, and the platform on which the data is read back. This is a risky, non-portable hack which your teacher should not be propagating.
Do you have pointers or ints in the struct? Pointers will be invalid in the new process when read back, and int format is not the same on all machines (to name but two show-stopping problems with this approach). Anything that's pointed to as part of an object graph will not be handled. Structure packing could be different on the target machine (32-bit vs 64-bit) or even due to compiler options changing on the same hardware, making sizeof(Account) unreliable as a read back data size.
For a better solution, look at a serialization library which handles those issues for you. Boost.Serialization is a good example.
Here, we use the term "serialization" to mean the reversible deconstruction of an arbitrary set of C++ data structures to a sequence of bytes. Such a system can be used to reconstitute an equivalent structure in another program context. Depending on the context, this might be used to implement object persistence, remote parameter passing or other facility.
Google Protocol Buffers also works well for simple object hierarchies.
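As a rough sketch of what the Boost.Serialization route could look like for the question's Account (this uses the library's usual intrusive serialize() hook; treat it as illustrative rather than a drop-in):
#include <fstream>
#include <string>
#include <boost/archive/text_oarchive.hpp>
#include <boost/serialization/access.hpp>
#include <boost/serialization/string.hpp>
struct Account
{
Account() { }
Account(std::string one, std::string two) : login_(one), pass_(two) { }
private:
friend class boost::serialization::access;
template <class Archive>
void serialize(Archive& ar, const unsigned int /*version*/)
{
ar & login_ & pass_; // the strings are written by value, not as raw pointers
}
std::string login_;
std::string pass_;
};
int main()
{
Account acc("Christian", "abc123");
std::ofstream out("File.txt");
boost::archive::text_oarchive oa(out);
oa << acc; // a portable text representation instead of raw object bytes
}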
It's no substitute for proper serialization. Consider the case of any complex type that contains pointers - if you save the pointers to a file, when you load them up later, they're not going to point to anything meaningful.
Additionally, it's likely to break if the code changes, or even if it's recompiled with different compiler options.
So it's really only useful for short-term storage of simple types - and in doing so, it takes up way more space than necessary for that task.
This method, if it works at all, is far from robust. It is much better to decide on some "serialized" form, whether it is binary, text, XML, etc., and write that out.
The key here: You need a function/code to reliably convert your class or struct to/from a series of bytes. reinterpret_cast does not do this, as the exact bytes in memory used to represent the class or struct can change for things like padding, order of members, etc.
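For example, a hand-rolled sketch of such a function for the question's two strings could use a length prefix (one possible format, not the professor's, and the endianness of the prefix still has to be agreed on):
#include <cstdint>
#include <ostream>
#include <string>
void write_string(std::ostream& out, const std::string& s)
{
std::uint32_t len = static_cast<std::uint32_t>(s.size());
out.write(reinterpret_cast<const char*>(&len), sizeof len); // length prefix
out.write(s.data(), len); // then the characters themselves
}
void save_account(std::ostream& out, const std::string& login, const std::string& pass)
{
write_string(out, login);
write_string(out, pass);
}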
No.
In order for it to work, the structure must be a POD (plain old data: only simple data members and POD data members, no virtual functions... probably some other restrictions which I can't remember).
So if you wanted to do that, you'd need a struct like this:
struct Account {
char login[20];
char password[20];
};
Note that std::string's not a POD, so you'd need plain arrays.
Still, not a good approach for you. Keyword: "serialization" :).
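With that POD layout, a raw write and read back at least round-trips on the same build (a hedged sketch; it is still tied to one platform and compiler):
#include <cstring>
#include <fstream>
struct Account {
char login[20];
char password[20];
};
int main()
{
Account a = {};
std::strncpy(a.login, "Christian", sizeof a.login - 1);
std::strncpy(a.password, "abc123", sizeof a.password - 1);
std::ofstream("File.bin", std::ios::binary).write(reinterpret_cast<const char*>(&a), sizeof a);
Account b = {};
std::ifstream("File.bin", std::ios::binary).read(reinterpret_cast<char*>(&b), sizeof b); // only valid for the same build on the same platform
}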
Some versions of std::string don't actually use dynamic memory when the string is small; instead they store the string internally in the string object itself.
Think of this:
struct SimpleString
{
char* begin; // beginning of string
char* end; // end of string
char* allocEnd; // end of allocated buffer end <= allocEnd
int* shareCount; // Strings are usually copy on write
// as a result you need to track the number of people
// using this buffer
};
Now, on a 64-bit system each pointer is 8 bytes, so a string of fewer than 32 bytes could fit into the same structure without allocating a buffer.
struct CompressedString
{
char buffer[sizeof(SimpleString)];
};
struct OptString
{
int type; // Normal /Compressed
union
{
SimpleString simple;
CompressedString compressed;
};
};
So this is what I believe is happening above.
A very efficient string implementation is being used, which allows you to dump the object to a file without worrying about pointers (as the std::strings here are not using pointers).
Obviously this is not portable, as it depends on the implementation details of std::string.
So it's an interesting trick, but not portable (and liable to break easily without some compile-time checks).

std::stringstream efficient way to get written data, copy to another stream

Without writing a custom rdbuf is there any way to use a stringstream efficiently? That is, with these requirements:
the stream can be reset and writing start again without deallocating previous memory
get a const char* to the data written (along with the length) without creating a temporary
populate the stream without creating a temporary string
If somebody can give me a definitive "no" that would be great.
Now, I also use boost, so if somebody can provide a boost alternative which does this that would be great. It has to have both istream and ostream interfaces available.
Use boost::interprocess::vectorstream or boost::interprocess::bufferstream. These classes basically meet all of your requirements.
boost::interprocess::vectorstream won't return a const char*, but it will return a const reference to an internal container class (like an internal vector), rather than returning a temporary string copy. On the other hand, boost::interprocess::bufferstream will basically allow you to use any arbitrary buffer as an I/O stream, giving you complete control over memory allocation, so you can easily use a char buffer if you want.
These are both great classes, and wonderful replacements for std::stringstream, which, in my opinion, has always been hindered by the fact that it doesn't give you direct access to the internal buffer, resulting in the unnecessary creation of temporary string objects. It's a shame these classes are somewhat obscure, hidden away in the interprocess library.
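A rough usage sketch (member names are as I recall them from the interprocess documentation, so treat the exact calls as an assumption):
#include <vector>
#include <boost/interprocess/streams/vectorstream.hpp>
int main()
{
typedef boost::interprocess::basic_vectorstream<std::vector<char> > vstream;
vstream vs;
vs << "some data " << 42;
const std::vector<char>& buf = vs.vector(); // const reference to the internal buffer, no temporary string
const char* data = buf.empty() ? 0 : &buf[0];
std::size_t len = buf.size();
vs.seekp(0); // start writing again without freeing the existing memory
(void)data; (void)len;
}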