Problem using reinterpret_cast<> in c++

I am trying to cast a datastream into a struct, since the datastream consists of fixed-width messages and each message has fully defined fixed-width fields as well. I was planning on creating a struct and then using reinterpret_cast to cast a pointer to the datastream to the struct to get the fields. I made some test code and get weird results. Could anyone explain why I am getting these, or how to correct the code? (The datastream will be mixed binary and alphanumeric, but I'm just testing with strings.)
#pragma pack(push,1)
struct Header
{
char msgType[1];
char filler[1];
char third[1];
char fourth[1];
};
#pragma pack(pop)
int main(void)
{
cout << sizeof(Header) << endl;
char* data = "four";
Header* header = reinterpret_cast<Header*>(data);
cout << header->msgType << endl;
cout << header ->filler << endl;
cout << header->third << endl;
cout << header->fourth << endl;
return 0;
}
The results that come up are:
4
four
our
ur
r
I think "four", "our" and "ur" are printed because operator<< can't find a null terminator at the end of each field. How do I get around the null terminator issue?

In order to be able to print an array of chars, and being able to distinguish it from a null-terminated string, you need other operator<< definitions:
template< size_t N >
std::ostream& operator<<( std::ostream& out, char (&array)[N] ) {
for( size_t i = 0; i != N; ++i ) out << array[i];
return out;
}

You're right about the lack of null terminator. The reason it's printing "ur" again is because you repeated the header->third instead of header->fourth. Instead of "char[1]", why not just declare those variables as "char"?
struct Header
{
char msgType;
char filler;
char third;
char fourth;
};

The issue is not reinterpret_cast (although using it is a very bad idea) but in the types of the things in the struct. They should be of type 'char', not of type 'char[1]'.
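A minimal sketch of that suggestion, assuming the message arrives as a raw byte buffer: the fields are plain char, and the bytes are copied into the struct with std::memcpy instead of being accessed through a reinterpret_cast pointer. The helper names read_header and print_header are hypothetical, not from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <iostream>

// Same four single-byte fields as the struct above.
struct Header
{
    char msgType;
    char filler;
    char third;
    char fourth;
};

// The overlay only makes sense if the struct matches the 4-byte wire format.
static_assert(sizeof(Header) == 4, "Header must be exactly 4 bytes");

// Hypothetical helper: copy the first sizeof(Header) bytes of a buffer into a Header.
inline Header read_header(const char* data, std::size_t len)
{
    assert(len >= sizeof(Header)); // the caller must supply a full header
    Header h;
    std::memcpy(&h, data, sizeof h);
    return h;
}

// A plain char is streamed as a single character, so no terminator is needed.
inline void print_header(std::ostream& out, const Header& h)
{
    out << h.msgType << '\n' << h.filler << '\n'
        << h.third << '\n' << h.fourth << '\n';
}
```

Calling print_header(std::cout, read_header("four", 4)) should print one character per line, since each field is now a char rather than a char array that decays to a pointer.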

#pragma pack(push,1)
template<int N>
struct THeader
{
char msgType[1+N];
char filler[1+N];
char third[1+N];
char fourth[1+N];
};
typedef THeader<0> Header0;
typedef THeader<1> Header1;
Header1 Convert(const Header0 & h0) {
Header1 h1 = {0};
std::copy(h0.msgType, h0.msgType + sizeof(h0.msgType)/sizeof(h0.msgType[0]), h1.msgType);
std::copy(h0.filler, h0.filler+ sizeof(h0.filler)/sizeof(h0.filler[0]), h1.filler);
std::copy(h0.third , h0.third + sizeof(h0.third) /sizeof(h0.third [0]), h1.third);
std::copy(h0.fourth, h0.fourth+ sizeof(h0.fourth)/sizeof(h0.fourth[0]), h1.fourth);
return h1;
}
#pragma pack(pop)
int main(void)
{
cout << sizeof(Header0) << endl;
const char* data = "four";
const Header0* header0 = reinterpret_cast<const Header0*>(data);
Header1 header = Convert(*header0);
cout << header.msgType << endl;
cout << header.filler << endl;
cout << header.third << endl;
cout << header.fourth << endl;
return 0;
}

In my experience, using #pragma pack has caused headaches -- partially due to a compiler that doesn't correctly pop, but also due to developers forgetting to pop in one header. One mistake like that and structs end up defined differently depending on which order headers get included in a compilation unit. It's a debugging nightmare.
I try not to do memory overlays for that reason -- you can't trust that your struct is properly aligned with the data you are expecting. Instead, I create structs (or classes) that contain the data from a message in a "native" C++ format. For example, you don't need a "filler" field defined if it's just there for alignment purposes. And perhaps it makes more sense for the type of a field to be int than for it to be char[4]. As soon as possible, translate the datastream into the "native" type.
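As a rough sketch of that approach: the wire layout below (one type byte, one filler byte, a four-digit ASCII quantity) is invented for illustration, as are the names Order, parse_digits and decode_order. The point is only that the native struct has no filler and holds an int rather than a char[4].

```cpp
#include <cassert>
#include <cstddef>

// Native representation of a message; the filler byte is simply not stored.
struct Order
{
    char msgType;
    int  quantity;
};

// Hypothetical wire layout, assumed for this example only:
//   byte 0      message type
//   byte 1      filler
//   bytes 2-5   quantity as ASCII digits, e.g. "0042"
const std::size_t kOrderWireSize = 6;

// Convert a fixed-width run of ASCII digits to an int (unsigned, no overflow check).
inline int parse_digits(const char* p, std::size_t n)
{
    int value = 0;
    for (std::size_t i = 0; i != n; ++i) {
        assert(p[i] >= '0' && p[i] <= '9'); // reject anything that is not a digit
        value = value * 10 + (p[i] - '0');
    }
    return value;
}

// Translate the raw bytes into the native type as early as possible.
inline Order decode_order(const char* data, std::size_t len)
{
    assert(len >= kOrderWireSize);
    Order o;
    o.msgType  = data[0];
    o.quantity = parse_digits(data + 2, 4);
    return o;
}
```

Because each field is read by offset, the result does not depend on #pragma pack or on how the compiler pads Order.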

Assuming you want to keep using an overlayable struct (which is sensible, since it avoids the copy in Alexey's code), you can replace your raw char arrays with a wrapper like the following:
template <int N> struct FixedStr {
char v[N];
};
template <int N>
std::ostream& operator<<( std::ostream& out, FixedStr<N> const &str) {
char const *nul = (char const *)memchr(str.v, 0, N);
int n = (nul == NULL) ? N : nul-str.v;
out.write(str.v, n);
return out;
}
Then your generated structures will look like:
struct Header
{
FixedStr<1> msgType;
FixedStr<1> filler;
FixedStr<1> third;
FixedStr<40> forty;
};
and your existing code should work fine.
NB. you can add methods to FixedStr if you want (eg, std::string FixedStr::toString()) just don't add virtual methods or inheritance, and it will overlay fine.
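A possible shape for such a toString() method is sketched below. It is an assumption about how the method could look, not code from the answer; the static_asserts check that the wrapper stays overlayable.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

template <int N>
struct FixedStr
{
    char v[N];

    // Copy the field into a std::string, stopping at the first NUL if there is one.
    std::string toString() const
    {
        const char* nul = static_cast<const char*>(std::memchr(v, 0, N));
        const std::size_t n = nul ? static_cast<std::size_t>(nul - v)
                                  : static_cast<std::size_t>(N);
        return std::string(v, n);
    }
};

// A non-virtual member function changes neither the size nor the layout.
static_assert(sizeof(FixedStr<4>) == 4, "FixedStr<N> must occupy exactly N bytes");
static_assert(std::is_standard_layout<FixedStr<4> >::value,
              "FixedStr must stay standard-layout to be overlaid on raw bytes");
```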

Related

iostream equivalent to snprintf(NULL, 0, format_string, args...)

I want to find the number of characters that a stream formatting operation would produce without allocating memory from the heap. In C, it can be done with
int nchars = snprintf(NULL, 0, format_string, args...);
How can it be done within the ostream framework in C++?
An implementation with std::ostringstream may allocate memory from the heap:
template <class T>
int find_nchar(const T& value) {
std::ostringstream os; // may allocate memory from the heap
os << value;
return os.str().size(); // may allocate memory from the heap
}
I think I need to make a custom ostream class to achieve this. The custom ostream should respect all the formatting flags one can set for the normal std::ostream.
I am searching for a solution that only uses the C++ standard library, not boost::iostreams, for example.

Rather than a custom std::ostream it might be easier -- and perhaps more flexible -- to implement a custom std::streambuf that can then be used with any std::ostream.
#include <streambuf>
template <class CharT, class Traits = std::char_traits<CharT>>
struct counting_streambuf: std::basic_streambuf<CharT, Traits> {
using base_t = std::basic_streambuf<CharT, Traits>;
using typename base_t::char_type;
using typename base_t::int_type;
std::streamsize count = 0;
std::streamsize xsputn(const char_type* /* unused */, std::streamsize n)
override
{
count += n;
return n;
}
int_type overflow(int_type ch)
override
{
++count;
return ch;
}
};
Then use as...
#include <iostream>
int
main (int argc, char **argv)
{
using char_type = decltype(std::cout)::char_type;
counting_streambuf<char_type> csb;
/*
* Associate the counting_streambuf with std::cout whilst
* retaining a pointer to the original std::streambuf.
*/
auto *oldbuf = std::cout.rdbuf(&csb);
std::cout << "Some text goes here...\n";
/*
* Restore the original std::streambuf.
*/
std::cout.rdbuf(oldbuf);
std::cout << "output length is " << csb.count << " characters\n";
}
Running the above results in...
output length is 23 characters
Edit: The original solution didn't override overflow. That works on Linux but not on Windows. Thanks go to Peter Dimov from Boost, who found the solution.

constexpr length of a string from template parameter

I am trying to obtain the length of a string passed as a template argument using C++11. Here is what I have found so far:
#include <iostream>
#include <cstring>
extern const char HELLO[] = "Hello World!!!";
template<const char _S[]>
constexpr size_t len1() { return sizeof(_S); }
template<const char _S[]>
constexpr size_t len2() { return std::strlen(_S); }
template<const char _S[], std::size_t _Sz=sizeof(_S)>
constexpr size_t len3() { return _Sz-1; }
template<unsigned int _N>
constexpr size_t len5(const char(&str)[_N])
{
return _N-1;
}
int main() {
enum {
l1 = len1<HELLO>(),
// l2 = len2<HELLO>() does not compile
l3 = len3<HELLO>(),
l4 = len3<HELLO, sizeof(HELLO)>(),
l5 = len5(HELLO),
};
std::cout << l1 << std::endl; // outputs 4
// std::cout << l2 << std::endl;
std::cout << l3 << std::endl; // outputs 3
std::cout << l4 << std::endl; // outputs 14
std::cout << l5 << std::endl; // outputs 14
return 0;
}
I am not very surprised with the results, I understand that the size of the array is lost in the case of len1() and len2(), although the information is present at compile time.
Is there a way to pass the information about the size of the string to the template as well? Something like:
template<const char _S[unsigned int _N]>
constexpr size_t len6() { return _N-1; }
[Edit with context and intent]
I gave up trying to concatenate a set of strings at compile time so I am trying to do it at initialization time. Writing something like a().b().c().str() would output "abc" while a().b().str() would output "ab"
Using templates, a().b() creates a type of B with a parent type A. a().b().c() creates a type C with a parent type B which has a parent type A, etc.
Given a type B with a parent A, this is a unique type and it can have its own static buffer to hold the concatenation (this is why l5 isn't good for me). I could then strcpy each one consecutively into a static buffer. I don't want to use a dynamically allocated buffer because my allocator is not necessarily configured at this point.
The size of that buffer, which should be big enough to hold the string associated with A and the string associated with B, is what I am trying to figure out. I can get it to work if I explicitly pass sizeof() as an extra template parameter (as done with l4 in the snippet above), but that makes the whole code heavy to read and cumbersome to use.
[Edit 2] I marked the answer that was most helpful. Yakk's answer also worked on gcc, but it did not compile with Visual Studio.
My understanding at this point is that we cannot rely on const char [] with external linkage to provide their size. It may work locally (if the template is compiled in the same unit as the symbol), but it won't work if the const char[] is in a header file to be used in multiple places.
So I gave up on trying to extract the length from the const char* template parameter and decided to live with l4, where the sizeof() is also provided to the template arguments.
For those who are curious how the whole thing turned out, I pasted a full working sample on ideone: http://ideone.com/A0JwO8
I can now write Path<A>::b::c::path() and get the corresponding "b.c" string in a static buffer at initialization.

constexpr std::size_t length( const char * str ) {
return (!str||!*str)?0:(1+length(str+1));
}
template<const char * String>
constexpr size_t len() { return length(String); }
extern constexpr const char HELLO[] = "Hello World!!!";
live example. Recursion wouldn't be needed in C++14.
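For reference, a sketch of the same length function written with a loop, which requires compiling as C++14 or later:

```cpp
#include <cstddef>

// C++14 relaxed constexpr: local variables and loops are allowed.
constexpr std::size_t length(const char* str)
{
    std::size_t n = 0;
    if (str) {
        while (str[n] != '\0')
            ++n;
    }
    return n;
}

// Evaluated at compile time.
static_assert(length("Hello World!!!") == 14, "length counts up to the terminator");
static_assert(length("") == 0, "empty string has length 0");
static_assert(length(nullptr) == 0, "null pointer is treated as empty");
```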

To concatenate strings at compile time with a GNU extension, you can do:
template<typename C, C...cs> struct Chars
{
using str_type = C[1 + sizeof...(cs)];
static constexpr C str[1 + sizeof...(cs)] = {cs..., 0};
constexpr operator const str_type&() const { return str; }
};
template<typename C, C...cs> constexpr C Chars<C, cs...>::str[1 + sizeof...(cs)];
// Requires GNU-extension
template <typename C, C...cs>
constexpr Chars<C, cs...> operator""_cs() { return {}; }
template <typename C, C...lhs, C...rhs>
constexpr Chars<C, lhs..., rhs...>
operator+(Chars<C, lhs...>, Chars<C, rhs...>) { return {}; }
With usage
constexpr auto hi = "Hello"_cs + " world\n"_cs;
std::cout << hi;
Demo
Without the GNU extension, you have to use a macro to transform the literal into a char sequence, as I do there.

Alignment and padding of data inside a blob

I'm using a large blob (allocated memory) to store data continuously in the memory.
I want data inside the blob to be organized like this:
| data1 type | data1 | data2 type | data2 | dataN type | dataN |
dataN type is an int that I use in a switch to convert the dataN to the appropriate type.
The problem is that I want to keep the data properly aligned. To do so, I want to force all data inside the blob to be 8-byte packed (I chose 8 bytes because it will probably keep data properly aligned?), so that the data is tightly packed (there won't be holes between data and data types because of alignment).
I tried this:
#pragma pack(8)
class A
{
public:
short b;
int x;
char v;
};
But it doesn't work because using sizeof(A) I get 12 bytes instead of the expected 16 bytes.
P.S: Is there any data type larger than 8 bytes in either x86 or x64 architectures?

This answer assumes two things:
You want the binary blob to be packed tightly (no holes).
You don't want the data members to be accessed in an unaligned fashion (which is slow compared to accessing data members that are aligned the way the compiler wants by default).
If this is the case, then you should consider a design where you treat the large "blob" as a byte-oriented stream. In this stream, you marshall/demarshall tag/value pairs that populate objects having natural alignment.
With this scheme, you get the best of both worlds. You get a tightly packed blob, but once you extract objects from the blob, accessing object members is fast because of the natural alignment. It is also portable (1) and does not rely on compiler extensions. The disadvantage is the boilerplate code that you need to write for every type that can be put in the blob.
Rudimentary example:
#include <cassert>
#include <iomanip>
#include <iostream>
#include <stdint.h>
#include <vector>
enum BlobKey
{
kBlobKey_Widget,
kBlobKey_Gadget
};
class Blob
{
public:
Blob() : cursor_(0) {}
// Extract a value from the blob. The key associated with this value should
// already have been extracted.
template <typename T>
Blob& operator>>(T& value)
{
assert(cursor_ < bytes_.size());
char* dest = reinterpret_cast<char*>(&value);
for (size_t i=0; i<sizeof(T); ++i)
dest[i] = bytes_[cursor_++];
return *this;
}
// Insert a value into the blob
template <typename T>
Blob& operator<<(const T& value)
{
const char* src = reinterpret_cast<const char*>(&value);
for (size_t i=0; i<sizeof(T); ++i)
bytes_.push_back(src[i]);
return *this;
}
// Overloads of << and >> for std::string might be useful
bool atEnd() const {return cursor_ >= bytes_.size();}
void rewind() {cursor_ = 0;}
void clear() {bytes_.clear(); rewind();}
void print() const
{
using namespace std;
for (size_t i=0; i<bytes_.size(); ++i)
cout << setfill('0') << setw(2) << hex << int(bytes_[i]) << " ";
std::cout << "\n" << dec << bytes_.size() << " bytes\n";
}
private:
std::vector<uint8_t> bytes_;
size_t cursor_;
};
class Widget
{
public:
explicit Widget(int a=0, short b=0, char c=0) : a_(a), b_(b), c_(c) {}
void print() const
{
std::cout << "Widget: a_=" << a_ << " b=" << b_
<< " c_=" << c_ << "\n";
}
private:
int a_;
short b_;
long c_;
friend Blob& operator>>(Blob& blob, Widget& widget)
{
// Demarshall members from blob
blob >> widget.a_;
blob >> widget.b_;
blob >> widget.c_;
return blob;
};
friend Blob& operator<<(Blob& blob, Widget& widget)
{
// Marshall members to blob
blob << kBlobKey_Widget;
blob << widget.a_;
blob << widget.b_;
blob << widget.c_;
return blob;
};
};
class Gadget
{
public:
explicit Gadget(long a=0, char b=0, short c=0) : a_(a), b_(b), c_(c) {}
void print() const
{
std::cout << "Gadget: a_=" << a_ << " b=" << b_
<< " c_=" << c_ << "\n";
}
private:
long a_;
int b_;
short c_;
friend Blob& operator>>(Blob& blob, Gadget& gadget)
{
// Demarshall members from blob
blob >> gadget.a_;
blob >> gadget.b_;
blob >> gadget.c_;
return blob;
};
friend Blob& operator<<(Blob& blob, Gadget& gadget)
{
// Marshall members to blob
blob << kBlobKey_Gadget;
blob << gadget.a_;
blob << gadget.b_;
blob << gadget.c_;
return blob;
};
};
int main()
{
Widget w1(1,2,3), w2(4,5,6);
Gadget g1(7,8,9), g2(10,11,12);
// Fill blob with widgets and gadgets
Blob blob;
blob << w1 << g1 << w2 << g2;
blob.print();
// Retrieve widgets and gadgets from blob
BlobKey key;
while (!blob.atEnd())
{
blob >> key;
switch (key)
{
case kBlobKey_Widget:
{
Widget w;
blob >> w;
w.print();
}
break;
case kBlobKey_Gadget:
{
Gadget g;
blob >> g;
g.print();
}
break;
default:
std::cout << "Unknown object type in blob\n";
assert(false);
}
}
}
If you can use Boost, you might want to use Boost.Serialization with a binary memory stream, as in this answer.
(1) Portable means that the source code should compile anywhere. The resulting binary blob will not be portable if transferred to other machines with different endianness and integer sizes.

It looks like in this case #pragma pack(8) has no effect.
In MS compiler documentation the parameter of pack is described in the following way:
Specifies the value, in bytes, to be used for packing. The default value for n is 8. Valid values are 1, 2, 4, 8, and 16. The alignment of a member will be on a boundary that is either a multiple of n or a multiple of the size of the member, whichever is smaller.
Thus, the #pragma pack directive cannot increase the alignment of a member; it can only decrease it (using #pragma pack(1), for example). In your case the alignment of the whole structure is chosen so that its biggest element is naturally aligned (int, which is usually 4 bytes on both 32-bit and 64-bit CPUs). As a result, the total size is 4 * 3 = 12 bytes.

@Negai explained why you get the observed size.
You should also reconsider your assumptions about "tightly packed" data. With the above structure there are holes in the structure. Assuming a 32-bit int and a 16-bit short, there is a two-byte hole after the short and a three-byte hole after the char. But it does not matter, as this space is inside the structure.
In other words, either you get a tightly packed data structure or you get an aligned data structure, but not both.
Typically, you don't have to do anything special to get the "aligned" behavior; that is what the compiler does by default. #pragma pack is useful if you want your data "packed" instead of aligned, that is, with the holes the compiler introduces to keep data aligned removed.
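The difference can be checked at compile time. The sketch below declares the question's members twice, once with default alignment and once under #pragma pack(push, 1); the struct names Natural and Packed are made up for the example. The exact sizes are platform-dependent, so the assertions only state relations that follow from the layout rules.

```cpp
#include <cstddef>

// Default (natural) alignment: the compiler may insert padding.
// Typically 12 bytes where short is 2 bytes and int is 4 bytes.
struct Natural
{
    short b;
    int   x;
    char  v;
};

// Packing of 1: members are laid out back to back (typically 7 bytes).
// #pragma pack is a compiler extension supported by MSVC, GCC and Clang.
#pragma pack(push, 1)
struct Packed
{
    short b;
    int   x;
    char  v;
};
#pragma pack(pop)

// Aligned layout: every member sits on a multiple of its own alignment.
static_assert(offsetof(Natural, x) % alignof(int) == 0, "x is naturally aligned");
static_assert(sizeof(Natural) % alignof(Natural) == 0, "size is a multiple of alignment");

// Packed layout: no holes at all, but x may now be misaligned.
static_assert(sizeof(Packed) == sizeof(short) + sizeof(int) + sizeof(char), "no holes");
static_assert(alignof(Packed) == 1, "packed struct has byte alignment");
```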

Did you try this?
class A {
public:
union {
uint64_t dummy;
int data;
};
};
Instances of A and its data member will now always be aligned to 8 bytes. Of course, this is pointless if you squeeze a 4-byte data type in front of it; that one has to be 8 bytes too.
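If C++11 is available, the alignas specifier is another way to request the same alignment without a dummy union member. This is offered as an alternative sketch, not as what the answer above uses; is_aligned_8 is a hypothetical helper for checking addresses.

```cpp
#include <cstdint>

// Request 8-byte alignment for the whole type.
struct alignas(8) A
{
    int data;
};

static_assert(alignof(A) == 8, "A is aligned to 8 bytes");
static_assert(sizeof(A) % 8 == 0, "size is padded up to a multiple of the alignment");

// Hypothetical helper: true if the address is a multiple of 8.
inline bool is_aligned_8(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 8 == 0;
}
```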

Printing chars as Integers

I want to control whether my ostream outputting of chars and unsigned char's via << writes them as characters or integers. I can't find such an option in the standard library. For now I have reverted to using multiple overloads on a set of alternative print functions
ostream& show(ostream& os, char s) { return os << static_cast<int>(s); }
ostream& show(ostream& os, unsigned char s) { return os << static_cast<int>(s); }
Is there a better way?

No, there isn't a better way. A better way would take the form of a custom stream manipulator, like std::hex. Then you could turn your integer printing off and on without having to specify it for each number. But custom manipulators operate on the stream itself, and there aren't any format flags to do what you want. I suppose you could write your own stream, but that's way more work than you're doing now.
Honestly, your best bet is to see if your text editor has functions for making static_cast<int> easier to type. I assume you'd otherwise type it a lot or you wouldn't be asking. That way someone who reads your code knows exactly what you mean (i.e., printing a char as an integer) without having to look up the definition of a custom function.

Just an update to an old post. The actual trick is using '+'. Eg:
template <typename T>
void my_super_function(T x)
{
// ...
std::cout << +x << '\n'; // promotes x to a type printable as a number, regardless of type
// ...
}
In C++11 you could do:
template <typename T>
auto promote_to_printable_integer_type(T i) -> decltype(+i)
{
return +i;
}
Credit: How can I print a char as a number? How can I print a char* so the output shows the pointer’s numeric value?

I have a suggestion based on the technique used in how do I print an unsigned char as hex in c++ using ostream?.
template <typename Char>
struct Formatter
{
Char c;
Formatter(Char _c) : c(_c) { }
bool PrintAsNumber() const
{
// implement your condition here; returning true is only a placeholder
return true;
}
};
template <typename Char>
std::ostream& operator<<(std::ostream& o, const Formatter<Char>& _fmt)
{
if (_fmt.PrintAsNumber())
return (o << static_cast<int>(_fmt.c));
else
return (o << _fmt.c);
}
template <typename Char>
Formatter<Char> fmt(Char _c)
{
return Formatter<Char>(_c);
}
void Test()
{
char a = 66;
std::cout << fmt(a) << std::endl;
}

In C++20 you'll be able to use std::format to do this:
unsigned char uc = 42;
std::cout << std::format("{:d}", uc); // format uc as integer 42 (the default)
std::cout << std::format("{:c}", uc); // format uc as char '*' (assuming ASCII)
In the meantime you can use the {fmt} library, which std::format is based on.
Disclaimer: I'm the author of {fmt} and C++20 std::format.

Can I write the contents of vector<bool> to a stream directly from the internal buffer?

I know vector< bool > is "evil", and dynamic_bitset is preferred (bitset is not suitable) but I am using C++ Builder 6 and I don't really want to pursue the Boost route for such an old version. I tried :
int RecordLen = 1;
int NoBits = 8;
std::ofstream Binary( FileNameBinary );
vector< bool > CaseBits( NoBits, 0 );
Binary.write( ( const char * ) & CaseBits[ 0 ], RecordLen);
but the results are incorrect. I suspect that the implementation may mean this is a stupid thing to try, but I don't know.

Operator[] for vector <bool> doesn't return a reference (because bits are not addressable), so taking the return value's address is going to be fraught with problems. Have you considered std::deque <bool>?

the bool vector specialization does not return a reference to bool.
see here, bottom of the page.
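Since the internal buffer cannot be reached portably, one workaround is to pack the bits into bytes yourself and write those. The sketch below sticks to C++98 constructs; the bit order (first bit in the least significant position) and the names pack_bits and write_bits are choices made for this example, and whatever reads the file back has to use the same order.

```cpp
#include <cstddef>
#include <ostream>
#include <vector>

// Pack bits into bytes, 8 per byte, bit i going to bit (i % 8) of byte (i / 8).
// Unused bits in the last byte are left as 0.
inline std::vector<unsigned char> pack_bits(const std::vector<bool>& bits)
{
    std::vector<unsigned char> bytes((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i != bits.size(); ++i) {
        if (bits[i])
            bytes[i / 8] |= static_cast<unsigned char>(1u << (i % 8));
    }
    return bytes;
}

// Write the packed bytes to a stream (open file streams with std::ios::binary).
inline void write_bits(std::ostream& out, const std::vector<bool>& bits)
{
    const std::vector<unsigned char> bytes = pack_bits(bits);
    if (!bytes.empty())
        out.write(reinterpret_cast<const char*>(&bytes[0]),
                  static_cast<std::streamsize>(bytes.size()));
}
```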

It's too late for me to decide how compliant this is, but it works for me: give the bitvector a custom allocator to alias the bits to your own buffer.
Can someone weigh in with whether the rebound allocator inside the vector is required to be copy-constructed from the one passed in? Works on GCC 4.2.1. I seem to recall that the functionality is required for C++0x, and since it's not incompatible with anything in C++03 and is generally useful, support may already be widespread.
Of course, it's implementation-defined whether bits are stored forwards or backwards or left- or right-justified inside whatever storage vector<bool> uses, so take great care.
#include <vector>
#include <iostream>
#include <iomanip>
using namespace std;
template< class T >
struct my_alloc : allocator<T> {
template< class U > struct rebind {
typedef my_alloc<U> other;
};
template< class U >
my_alloc( my_alloc<U> const &o ) {
buf = o.buf;
}
my_alloc( void *b ) { buf = b; }
// noncompliant with C++03: no default constructor
T *allocate( size_t, const void *hint=0 ) {
return static_cast< T* >( buf );
}
void deallocate( T*, size_t ) { }
void *buf;
};
int main() {
unsigned long buf[ 2 ];
vector<bool, my_alloc<bool> > blah( 128, false, my_alloc<bool>( buf ) );
blah[3] = true;
blah[100] = true;
cerr << hex << setw(16) << buf[0] << " " << setw(16) << buf[1] << endl;
}
