Casting buffers as structures - c++

I'm doing this as a learning exercise. The C++ book I'm studying from casts a buffer as a structure for easy manipulation and streaming. Everything seems fine until I try using an array (body) and look at the binary data in the buffer after assigning values. It doesn't match what I expect.
#include <iostream>
#include <bitset>
#include <netinet/in.h>
using namespace std;
struct dataStruct
{
uint32_t header;
uint32_t *body;
};
int main(int argc, char* argv[])
{
int size, streamSize;
// 4 bytes per size + 4 bytes for header
size = 1;
streamSize = (size * 4) + 4;
// Create a stream of bytes of appropriate size
uint8_t *buffer = new uint8_t[streamSize];
// Cast stream as structure
dataStruct *sStream = (dataStruct *)buffer;
// Populate structure with nice 101010... binary patterns
sStream->header = 2863311530;
sStream->body = new uint32_t[1];
sStream->body[0] = 2863311530;
cout << "Struct: " << sStream->header << ", " << sStream->body[0] << endl;
// Look at raw data in stream
for (int i=0; i<sizeof(buffer); i++)
{
std::bitset<8> x(buffer[i]);
cout << "[" << i << "]->" << x << endl;
}
return 0;
}
The output is:
Struct: 2863311530, 2863311530
[0]->10101010
[1]->10101010
[2]->10101010
[3]->10101010
[4]->00000000
[5]->00000000
[6]->00000000
[7]->00000000
Why is index 4-7 not the same as 0-3? Both sStream->header and sStream->body contain the same values. They are mapped to the buffer. Is this because body is an array? If so how would I manipulate the stream for this to work when using an array?
Thanks

You are using the uninitialized variable size in:
streamSize = (size * 4) + 4;
Everything after that which depends on streamSize is suspect and is a source of undefined behavior.
Update
Even after size is initialized to 1, there are problems. Let me walk through the code and how it affects the memory you have allocated.
After you execute the line:
uint8_t *buffer = new uint8_t[streamSize];
you have buffer pointing to memory like this:
buffer
|
v
+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   |
+---+---+---+---+---+---+---+---+
After you have executed the line:
dataStruct *sStream = (dataStruct *)buffer;
you have sStream pointing to the same memory like:
sStream
|
v
+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   |
+---+---+---+---+---+---+---+---+
If your compiler does not add any padding between the members of dataStruct (the best case scenario), you'll have:
sStream.header  sStream.body
|               |
v               v
+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   |
+---+---+---+---+---+---+---+---+
If your compiler adds padding after sStream.header, sStream.body will be located somewhere else. Worst case scenario: you have a 64-bit compiler that adds 32 bits of padding after sStream.header. In that case, you will have:
sStream.header                  sStream.body
|                               |
v                               v
+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   |
+---+---+---+---+---+---+---+---+
Then you will end up using unauthorized memory when you try to assign anything to sStream.body, like:
sStream->body = new uint32_t[1];
Best case scenario, you have a 32-bit compiler and no padding is added after sStream.header. It looks like you have a 64-bit compiler. Even if your compiler does not add any padding after sStream.header, you are still looking at a memory overrun if sizeof(void*) is 8 bytes (64 bits), which I think is the case for you.
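Whether padding is present can be checked rather than guessed. The following is a minimal sketch, assuming the dataStruct from the question; the helper names body_offset and struct_size are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>   // std::uint32_t

struct dataStruct
{
    std::uint32_t header;
    std::uint32_t *body;   // a pointer, not inline array storage
};

// Byte offset of `body` from the start of the struct.
// Typically 4 on a 32-bit target and 8 on a 64-bit target
// (4 bytes of padding after `header`), but this is up to the compiler.
std::size_t body_offset() { return offsetof(dataStruct, body); }

// Total size of the struct, including any padding.
std::size_t struct_size() { return sizeof(dataStruct); }
```

If struct_size() is larger than the 8 bytes allocated for buffer, writing sStream->body goes past the end of the buffer.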
Let's take the best case scenario of a 32-bit compiler that doesn't add any padding, so that the members of sStream map onto the allocated memory like:
sStream.header  sStream.body
|               |
v               v
+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   |
+---+---+---+---+---+---+---+---+
After you execute the line:
sStream->header = 2863311530;
the memory looks like:
sStream.header  sStream.body
|               |
v               v
+---+---+---+---+---+---+---+---+
|  2863311530   |               |
+---+---+---+---+---+---+---+---+
After you execute the line:
sStream->body = new uint32_t[1];
the memory looks like:
sStream.header  sStream.body
|               |
v               v
+---+---+---+---+---+---+---+---+
|  2863311530   |  SomeMemory   |
+---+---+---+---+---+---+---+---+
SomeMemory
|
v
+---+---+---+---+
|               |
+---+---+---+---+
After you execute the line:
sStream->body[0] = 2863311530;
SomeMemory gets populated and looks like:
SomeMemory
|
v
+---+---+---+---+
|  2863311530   |
+---+---+---+---+
I think you were surprised to see that the memory pointed to by buffer does not look like:
buffer
|
v
+---+---+---+---+---+---+---+---+
|  2863311530   |  2863311530   |
+---+---+---+---+---+---+---+---+
I hope it makes sense now why it does not.
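If the goal is a buffer that really contains the header followed by the body words, one common approach is to copy the values into the byte buffer instead of storing a pointer in it. This is only a sketch under that assumption: pack is a hypothetical helper, not something taken from the book or the question, and it writes the values in host byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>   // std::memcpy
#include <vector>

// Serialize a header followed by the body words into one contiguous byte buffer.
std::vector<std::uint8_t> pack(std::uint32_t header, const std::vector<std::uint32_t>& body)
{
    std::vector<std::uint8_t> buffer(sizeof(header) + body.size() * sizeof(std::uint32_t));
    // Copy the header into bytes 0..3.
    std::memcpy(buffer.data(), &header, sizeof(header));
    // Copy the body words directly after the header.
    if (!body.empty())
        std::memcpy(buffer.data() + sizeof(header), body.data(),
                    body.size() * sizeof(std::uint32_t));
    return buffer;
}
```

With a std::vector the byte count is available as buffer.size(). In the question's loop, sizeof(buffer) is the size of the pointer itself, not of the allocation it points to.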

Related

Concatenating string_view objects

I've been adding std::string_views to some old code for representing string like config params, as it provides a read only view, which is faster due to no need for copying.
However, one cannot concatenate two string_views, as operator+ isn't defined for them. I see this question has a couple of answers stating it's an oversight and that there is a proposal to add it. However, that is for adding a string and a string_view; presumably, if that gets implemented, the resulting concatenation would be a std::string.
Would adding two string_view also fall in the same category? And if not, why shouldn't adding two string_view be supported?
Sample
std::string_view s1{"concate"};
std::string_view s2{"nate"};
std::string_view s3{s1 + s2};
And here's the error
error: no match for 'operator+' (operand types are 'std::string_view' {aka 'std::basic_string_view<char>'} and 'std::string_view' {aka 'std::basic_string_view<char>'})
A view is similar to a span in that it does not own the data; as the name implies, it is just a view of the data. To concatenate the string views you first need to construct a std::string, and then you can concatenate.
std::string s3 = std::string(s1) + std::string(s2);
Note that s3 will be a std::string not a std::string_view since it would own this data.
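The same idea can be wrapped in a small helper that builds the owning string with a single allocation. This is a sketch only; concat is a hypothetical name, not a standard library function.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Concatenate two views into a new owning std::string.
std::string concat(std::string_view a, std::string_view b)
{
    std::string result;
    result.reserve(a.size() + b.size());   // one allocation for both parts
    result.append(a);                      // std::string::append accepts a string_view (C++17)
    result.append(b);
    return result;
}
```

The result owns its characters, so it stays valid even after the buffers behind the two views are gone.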
A std::string_view is an alias for std::basic_string_view<char>, which is a std::basic_string_view templated on a specific type of character, i.e. char.
But what does it look like?
Besides a fairly large number of useful member functions such as find, substr, and others (maybe it's an ordinary number compared to other container- or string-like things offered by the STL), std::basic_string_view<_CharT>, with _CharT being the generic char-like type, has just 2 data members,
// directly from my /usr/include/c++/12.2.0/string_view
size_t _M_len;
const _CharT* _M_str;
i.e. a pointer to const _CharT to indicate where the view starts, and a size_t (an appropriate type of number) to indicate how long the view is, starting from _M_str's pointee.
In other words, a string view just knows where it starts and how long it is, so it represents a sequence of char-like entities which are consecutive in memory. With just two such members, you can't represent a string which is made up of non-contiguous substrings.
Put yet another way, if you want to create a std::string_view, you need to be able to tell where it starts and how many chars long it is. Can you tell where s1 + s2 would have to start and how many characters long it should be? Think about it: you can't, because s1 and s2 are not adjacent.
Maybe a diagram can help.
Assume these lines of code
std::string s1{"hello"};
std::string s2{"world"};
s1 and s2 are totally unrelated objects, as far as their memory location is concerned; here is what they look like:
                                  &s2[0]
                                    |
                                    | &s2[1]
                                    |   |
    &s1[0]                          |   | &s2[2]
      |                             |   |   |
      | &s1[1]                      |   |   | &s2[3]
      |   |                         |   |   |   |
      |   | &s1[2]                  |   |   |   | &s2[4]
      |   |   |                     |   |   |   |   |
      |   |   | &s1[3]              v   v   v   v   v
      |   |   |   |               +---+---+---+---+---+
      |   |   |   | &s1[4]        | w | o | r | l | d |
      |   |   |   |   |           +---+---+---+---+---+
      v   v   v   v   v
    +---+---+---+---+---+
    | h | e | l | l | o |
    +---+---+---+---+---+
I've intentionally drawn them misaligned to mean that &s1[0], the memory location where s1 starts, and &s2[0], the memory location where s2 starts, have nothing to do with each other.
Now, imagine you create two string views like this:
std::string_view sv1{s1};
std::string_view sv2(s2.begin() + 1, s2.begin() + 4);
Here's what they will look like, in terms of the two implementation-defined members _M_str and _M_len:
                                  &s2[0]
                                    |
                                    | &s2[1]
                                    |   |
    &s1[0]                          |   | &s2[2]
      |                             |   |   |
      | &s1[1]                      |   |   | &s2[3]
      |   |                         |   |   |   |
      |   | &s1[2]                  |   |   |   | &s2[4]
      |   |   |                     |   |   |   |   |
      |   |   | &s1[3]              v   v   v   v   v
      |   |   |   |               +---+---+---+---+---+
      |   |   |   | &s1[4]        | w | o | r | l | d |
      |   |   |   |   |           +---+---+---+---+---+
      v   v   v   v   v               · ^         ·
    +---+---+---+---+---+             · |         ·
    | h | e | l | l | o |           +---+         ·
    +---+---+---+---+---+           | ·           ·
    · ^                 ·           | · s2._M_len ·
    · |                 ·           | <----------->
  +---+                 ·           |
  | ·                   ·           +-- s2._M_str
  | ·     s1._M_len     ·
  | <------------------->
  |
  +-------- s1._M_str
Given the above, can you see what's wrong with expecting that
std::string_view s3{s1 + s2};
works?
How can you possibly define s3._M_str and s3._M_len (based on s1._M_str, s1._M_len, s2._M_str, and s2._M_len), such that they represent a view on "helloworld"?
You can't because "hello" and "world" are located in two unrelated areas of memory.
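For contrast, a single view over two pieces is only possible when both pieces are known to sit next to each other in the same buffer. A minimal sketch of that special case follows; split_and_rejoin is a hypothetical helper, and it assumes split is not larger than whole.size() (otherwise substr throws std::out_of_range).

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Split `whole` at `split`, then rebuild one view from the two halves.
// This is valid only because both halves are adjacent parts of one buffer.
std::string_view split_and_rejoin(std::string_view whole, std::size_t split)
{
    std::string_view left  = whole.substr(0, split);
    std::string_view right = whole.substr(split);
    // Start where `left` starts and cover the combined length.
    return std::string_view(left.data(), left.size() + right.size());
}
```

For two views into unrelated strings, as in the diagram above, there is no start pointer and length that would cover both.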
std::string_view does not own any data; it is only a view. If you want to join two views to get a joined view, you can use boost::join() from the Boost library. But the result type will not be a std::string_view.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string_view>
#include <boost/range.hpp>
#include <boost/range/join.hpp>
void test()
{
    std::string_view s1{"hello, "}, s2{"world"};
    auto joined = boost::join(s1, s2);
    // print joined string (the element type of the iterator must be spelled out)
    std::copy(joined.begin(), joined.end(), std::ostream_iterator<char>(std::cout, ""));
    std::cout << std::endl;
    // other method to print
    for (auto c : joined) std::cout << c;
    std::cout << std::endl;
}
C++20 added std::views::join to the standard library, which flattens a range of ranges; C++23 adds std::views::join_with, which additionally inserts a delimiter between the parts.
#include <array>
#include <iostream>
#include <ranges>
#include <string_view>
void test()
{
    std::string_view s1{"hello, "}, s2{"world"};
    // join works on a range of ranges, so put the two views in an array first
    std::array parts{s1, s2};
    for (auto c : parts | std::views::join) std::cout << c;
    std::cout << std::endl;
}

How does std::vector support contiguous memory for custom objects of unknown size

I'm struggling with the correct mental model and understanding of std::vector.
What I thought I knew
When you create a vector of type T and then reserve N elements for the vector, the compiler basically finds and reserves a contiguous block of memory that is N * sizeof(T) bytes. For example,
// Initialize a vector of int
std::vector<int> intvec;
// Reserve contiguous block of 4 4-byte chunks of memory
intvec.reserve(4); // [ | | | ]
// Filling in the memory chunks has obvious behavior:
intvec.push_back(1); // [1| | | ]
intvec.push_back(2); // [1|2| | ]
Then we can access any element in random access time because, if we ask for the kth element of the vector, we simply start at the memory address of the start of the vector and then "jump" k * sizeof(T) bytes to get to the kth element.
Custom Objects
My mental model breaks down for custom objects of unknown/varying size. For example,
class Foo {
public:
Foo() = default;
Foo(std::vector<int> vec): _vec{vec} {}
private:
std::vector<int> _vec;
};
int main() {
// Initialize a vector Foo
std::vector<Foo> foovec;
// Reserve contiguous block of 4 ?-byte chunks of memory
foovec.reserve(4); // [ | | | ]
// How does memory allocation work since object sizes are unknown?
foovec.emplace_back(std::vector<int> {1,2}); // [{1,2}| | | ]
foovec.emplace_back(std::vector<int> {1,2,3,4,5}); // [{1,2}|{1,2,3,4,5}| | ]
return 0;
}
Since we don't know the size of each instance of Foo, how does foovec.reserve() allocate memory? Furthermore, how could you achieve random access time if we don't know how far to "jump" to get to the kth element?
Your concept of size is flawed. A std::vector<type> has a compile time known size of space it is going to take up. It also has a run time size that it may use (this is allocated at run time and the vector holds a pointer to it). You can picture it laid out like
+--------+
|        |
| Vector |
|        |
|        |
+--------+
    |
    |
    v
+-------------------------------------------------+
|         |         |         |         |         |
| Element | Element | Element | Element | Element |
|         |         |         |         |         |
+-------------------------------------------------+
So when you have a vector of things that have a vector in them, each Element becomes the vector, and each of those points off to its own storage somewhere else, like
+--------+
|        |
| Vector |
|        |
|        |
+----+---+
     |
     |
     v
+----+----+---------+---------+
| Object  | Object  | Object  |
|  with   |  with   |  with   |
| Vector  | Vector  | Vector  |
+----+----+----+----+----+----+
     |         |         |    +---------+---------+---------+---------+---------+
     |         |         |    |         |         |         |         |         |
     |         |         +--->+ Element | Element | Element | Element | Element |
     |         |              |         |         |         |         |         |
     |         |              +-------------------------------------------------+
     |         |    +-------------------------------------------------+
     |         |    |         |         |         |         |         |
     |         +--->+ Element | Element | Element | Element | Element |
     |              |         |         |         |         |         |
     |              +-------------------------------------------------+
     |    +-------------------------------------------------+
     |    |         |         |         |         |         |
     +--->+ Element | Element | Element | Element | Element |
          |         |         |         |         |         |
          +---------+---------+---------+---------+---------+
This way all of the vectors are next to each other, but the elements the vectors hold can be anywhere else in memory. It is for this reason you don't want to use a std::vector<std::vector<int>> for a matrix. Each sub-vector gets its memory wherever the allocator happens to put it, so there is no locality between the rows.
Do note that this applies to all of the allocator aware containers as they do not store the elements inside the container directly. This is not true for std::array as, like a raw array, the elements are part of the container. If you have an std::array<int, 20> then it is at least sizeof(int) * 20 bytes in size.
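For the matrix case mentioned above, a common alternative is to keep all cells in one std::vector and compute the index by hand. The following is only a sketch of that idea; the Matrix type and its member names are hypothetical and do no bounds checking.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A rows x cols matrix stored in one contiguous block, row after row.
struct Matrix
{
    std::size_t rows;
    std::size_t cols;
    std::vector<int> cells;   // rows * cols ints, zero-initialized

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), cells(r * c, 0) {}

    // Element (r, c) lives at flat index r * cols + c.
    int& at(std::size_t r, std::size_t c) { return cells[r * cols + c]; }
};
```

Because every row lives in the same allocation, walking the matrix row by row touches memory sequentially.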
The size of
class Foo {
public:
Foo() = default;
Foo(std::vector<int> vec): _vec{vec} {}
private:
std::vector<int> _vec;
};
is known and constant; the internal std::vector does its allocation on the heap, so there is no problem doing foovec.reserve(4).
Otherwise, how could a std::vector live on the stack? ;-)
The size of your class Foo is known at compile time; the std::vector class has a constant size, as the elements that it holds are allocated on the heap.
std::vector<int> empty{};
std::vector<int> full{};
full.resize(1000000);
assert(sizeof(empty) == sizeof(full));
Both instances of std::vector<int>, empty and full, will always have the same size despite holding a different number of elements.
If you want an array which you cannot resize, and whose size must be known at compile time, use std::array.
> When you create a vector of type T and then reserve N elements for the vector, the compiler basically finds and reserves a contiguous block of memory
The compiler does no such thing. It generates code to request storage from the vector's allocator at runtime. By default this is std::allocator, which delegates to operator new, which will fetch uninitialized storage from the runtime system.
> My mental model breaks down for custom objects of unknown/varying size
The only way a user-defined type can actually have unknown size is if it is incomplete, and you can't create or use the elements of a vector whose element type is still incomplete.
At any point in your code where the type is complete, its size is also fixed, and you can declare a vector storing that type as usual.
Your Foo is complete, and its size is fixed at compile time. You can check this with sizeof(Foo), and sizeof(foovec[0]) etc.
The vector owns a variable amount of storage, but doesn't contain it in the object. It just stores a pointer and the reserved & used sizes (or something equivalent). For example, an instance of:
class toyvec {
int *begin_;
int *end_;
size_t capacity_;
public:
// push_back, begin, end, and all other methods
};
always has the fixed size sizeof(toyvec) = 2 * sizeof(int*) + sizeof(size_t) + maybe_some_padding. Allocating a huge block of memory, and setting begin_ to the start of it, has no effect on the size of the pointer itself.
tl;dr C++ does not have dynamically-resizing objects. The size of an object is fixed permanently by the class definition. C++ does have objects which own - and may resize - dynamic storage, but that isn't part of the object itself.
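The fixed-size claim can be checked directly on the Foo from the question. A minimal sketch: the element_stride helper and the count() accessor are additions made for illustration, and the constructor moves its argument instead of copying it.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>   // std::move
#include <vector>

class Foo {
public:
    Foo() = default;
    Foo(std::vector<int> vec) : _vec{std::move(vec)} {}
    std::size_t count() const { return _vec.size(); }
private:
    std::vector<int> _vec;
};

// Distance in bytes between two neighbouring elements of a std::vector<Foo>,
// even though the two Foo objects own a different number of ints.
std::size_t element_stride()
{
    std::vector<Foo> foovec;
    foovec.reserve(4);
    foovec.emplace_back(std::vector<int>{1, 2});
    foovec.emplace_back(std::vector<int>{1, 2, 3, 4, 5});
    const char* first  = reinterpret_cast<const char*>(&foovec[0]);
    const char* second = reinterpret_cast<const char*>(&foovec[1]);
    return static_cast<std::size_t>(second - first);
}
```

The stride equals sizeof(Foo) regardless of how many ints each Foo holds, which is what makes the "jump k * sizeof(T)" arithmetic work.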

C++, why does an increase in one element of a multi-dimensional array appear to be increasing another?

This may not be elegant. Chiefly because I am relatively new to C++, but this little program I am putting together is stumbling here.
I don't get it. Have I misunderstood arrays? The edited code is:
int diceArray [6][3][1] = {};
...
}else if (y >= xSuccess || x >= xSuccess){
// from here...
diceArray[2][1][0] = diceArray[2][1][0] + 1;
diceArray[2][1][1] = diceArray[2][1][1] + 1;
// ...to here, diceArray[2][2][0] increases by 1. I am not referencing that part of the array at all. Or am I?
}
By using comments I tracked the culprit down to the second expression. If I comment out the first one diceArray[2][2][0] does not change.
Why is diceArray[2][1][1] = diceArray[2][1][1] + 1 causing diceArray[2][2][0] to increment?
I tried..
c = diceArray[2][1][1] + 1;
diceArray[2][1][1] = c;
..as a workaround but it was just the same. It increased diceArray[2][2][0] by one.
You are indexing out of bounds. If I declare such an array
int data [3];
Then the valid indices are
data[0]
data[1]
data[2]
By analogy, you declare
int diceArray [6][3][1]
                     ^
But then try to assign to
diceArray[2][1][0]
                ^
diceArray[2][1][1] // This is out of range
                ^
Because you are assigning out of range, the behavior is formally undefined. In practice the index is resolved by plain pointer arithmetic over the array's strides, so the write lands on the next element in memory, which here is diceArray[2][2][0].
The variable is declared as:
int diceArray [6][3][1] = {};
This is what it looks like in memory:
+---+                      -.
|   | <- diceArray[0][0]     \
+---+                         \
|   | <- diceArray[0][1]       > diceArray[0]
+---+                         /
|   | <- diceArray[0][2]     /
+---+                      -'
|   | <- diceArray[1][0]     \
+---+                         \
|   | <- diceArray[1][1]       > diceArray[1]
+---+                         /
|   | <- diceArray[1][2]     /
+---+                      -'
.   .                       .
.   .                       .
.   .                       .
+---+                      -.
|   | <- diceArray[5][0]     \
+---+                         \
|   | <- diceArray[5][1]       > diceArray[5]
+---+                         /
|   | <- diceArray[5][2]     /
+---+                      -'
The innermost component of diceArray is an array of size 1.
C/C++ arrays are always indexed starting from 0, which means the only valid index in an array of size 1 is 0.
During compilation, a reference to diceArray[x][y][z] is converted, using pointer arithmetic, to the offset x*3*1+y*1+z (counted in int values) from the memory address of diceArray.
The code:
diceArray[2][1][1] = diceArray[2][1][1] + 1;
operates on offset 8 (=2*3*1+1*1+1) inside diceArray. The same offset is computed using diceArray[2][2][0], which is a legal access inside the array.
Modern compilers are usually able to detect this kind of error and warn you at compile time.
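The offset arithmetic described above can be written out as a small function. This is a sketch for illustration only; flat_offset is a hypothetical helper that hard-codes the dimensions of int diceArray[6][3][1].

```cpp
#include <cassert>
#include <cstddef>

// Flat offset (counted in ints) of element [x][y][z] in `int diceArray[6][3][1]`.
constexpr std::size_t flat_offset(std::size_t x, std::size_t y, std::size_t z)
{
    return x * 3 * 1 + y * 1 + z;
}

// The out-of-range index [2][1][1] and the valid index [2][2][0]
// resolve to the same offset, which is why the wrong element changes.
static_assert(flat_offset(2, 1, 1) == flat_offset(2, 2, 0), "same offset");
```

If two values per slot are actually needed, the last dimension has to be declared as 2. A std::array accessed through its at() member would instead throw std::out_of_range on the bad index.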

how does the value of variable get changed through swap function? [duplicate]

Here I have two swap functions
void kswap(int* a, int* b)
{
int* temp = a;
a = b;
b = temp;
}
void kswap(int* a, int* b)
{
int temp = *a;
*a = *b;
*b = temp;
}
The values only changed inside the first function,
while the second function changed the values permanently.
Can anyone tell me the difference between the two functions?
I thought that, as both functions take pointers as parameters, the values would be changed by both functions.
In function swap, a and b are int *, aka integer pointers, that means
they contain address of an integer in memory. As seen in diagram below:
              Memory
         ==================
         +----------------+
         |                |
+------> | num1 = 5       |
|        |                |
| +----> | num2 = 6       |
| |      |                |
| |      |                |
| |      |================|
| |      | Function swap  |
| |      |                |
+-(------------ a         |
  |      |                |
  +------------ b         |
         |                |
         +----------------+
Here,
`*a` : should be read as : `value at address contained in a`
`*b` : should be read as : `value at address contained in b`
In first example
In first kswap, after executing below statements,
int* temp = a; /* A pointer which points to same place as 'a' */
a = b; /* 'a' will now point to where 'b' is pointing */
b = temp; /* 'b' will now point to where 'temp' is pointing
* that means where 'a' was previously pointing */
the result is:
              Memory
         ==================
         +----------------+
         |                |
+------> | num1 = 5       | <------+
|        |                |        |
| +----> | num2 = 6       |        |
| |      |                |        |
| |      |                |        |
| |      |================|        |
| |      | Function swap  |        |
| |      |                |        |
| +------------ a         |        |
|        |                |        |
+-------------- b         |        |
         |                |        |
         |   temp -----------------+
         +----------------+
Note that neither *a nor *b is assigned any value, so neither of:
`*a` : that is : `value at address contained in a`
`*b` : that is : `value at address contained in b`
is changed.
So, as seen in the picture above, num1 is still 5 and num2 is still 6.
The only thing that has happened is that a is now pointing to num2 and b is
pointing to num1.
In second example
In second kswap, after executing below statements,
int temp = *a;  /* An int variable which will contain the same value as the
                 * value at address contained in 'a' */
*a = *b;        /* value at address contained in 'a' will be equal to value
                 * at address contained in 'b' */
*b = temp;      /* value at address contained in 'b' will be equal to value
                 * contained in 'temp' */
the result is:
              Memory
         ==================
         +----------------+
         |                |
+------> | num1 = 6       |
|        |                |
| +----> | num2 = 5       |
| |      |                |
| |      |                |
| |      |================|
| |      | Function swap  |
| |      |                |
+-(------------ a         |
  |      |                |
  +------------ b         |
         |                |
         |   temp = 5     |
         +----------------+
Note that both *a and *b are assigned new values, so both:
`*a` : that is : `value at address contained in a`
`*b` : that is : `value at address contained in b`
are changed.
And as seen in above picture, num1 is now 6, and num2 is now 5. So in the second example, values of variables num1 and num2 are permanently changed.
Assume each function is called as:
void f()
{
int x = 101, y = 999;
kswap(&x, &y);
}
Remember that in C++ arguments are passed by value, so kswap receives the values of the addresses where x, y reside. The rest of the answer is inlined in the code comments below.
The kswap that works.
void kswap(int* a, int* b)
{
int temp = *a; // `a` is the address of `int x`
// `*a` is the integer value at address `a`
// i.e. the value of `x` so temp == 101 now
*a = *b; // same as above `*b` is the value of `y` i.e. 999
// now this integer value is copied to the address where `a` points
// effectively overwriting the old `x` value `101` with `999`
*b = temp; // finally, this copies the value in `temp` i.e. 101
// to the address where `b` points and overwrites
// the old `y` value `999`, which completes the swap
}
The kswap which does not work.
void kswap(int* a, int* b)
{
int* temp = a; // this copies `a` i.e. the address of `x`
// to local variable `temp`
a = b; // this copies `b` to `a`
// since arguments `a` and `b` are pointers and passed by value
// this only modifies the value of variable `a`
// it does **not** change `x` or its address in any way
b = temp; // this copies 'temp' to 'b', same comments as above
// now 'a' holds the address of `y` and `b` holds the address
// of `x` but **neither** 'x' nor 'y' values have been modified
// and pointer variables `a`, `b` go out of scope as soon as
// the function returns, so it's all a big no-op in the end
}
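The first function swaps the addresses held in its local pointer copies, so nothing changes outside the function's scope.
The second function swaps the values the pointers refer to, so the change is visible outside the function's scope.
Putting * in front of the pointer name means you want the value stored at the address, not the address itself.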
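The two behaviours can also be expressed with references, which makes it explicit what is being modified. This is a minimal sketch; kswap_ref and kswap_ptr are hypothetical names and not part of the question.

```cpp
#include <cassert>

// Swap the caller's int values; same effect as the working kswap.
void kswap_ref(int& a, int& b)
{
    int temp = a;
    a = b;
    b = temp;
}

// Swap the caller's pointers themselves (not the ints they point to).
// Taking the pointers by reference is what makes the change visible to the caller.
void kswap_ptr(int*& a, int*& b)
{
    int* temp = a;
    a = b;
    b = temp;
}
```

After kswap_ptr the two int objects are untouched; only the caller's pointer variables have traded targets.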

passing 2d array as pointer to pointers in c++ gives segmentation fault [duplicate]

i am trying to cast a void** pointer to an int** 2D array in C
here is the code that i am trying to work with (with all the extraneous bits removed):
/* assume that i have a data structure called graph with some
 * element "void** graph" in it and some element "int order" */
void initialise_graph_data(graph_t *graph)
{
void **graph_data = NULL;
int (*matrix)[graph->order];
size_t size = (graph->order * graph->order) * sizeof(int);
graph_data = safe_malloc(size); /*safe malloc works fine*/
matrix = (int(*)[graph->order])graph_data;
graph->graph = graph_data;
}
when i compile that, it works fine, but gives me a warning that variable 'matrix' is set but not used. i don't really want to have to use the interim matrix variable because the function is just supposed to initialise the array, not put anything in it; but if i try to cast graph_data directly to an int** when i am assigning it to graph->graph like so:
graph->graph = (int(*)[graph->order])graph_data;
it gives me an assignment from incompatible pointer type warning.
am i just not casting it properly? does anyone have any suggestions as to how i can make it work without the interim "matrix" variable? or if not, what i can do with that variable so that it doesnt give me the warning that it is set but not used?
thanks
The compiler is right, an array of arrays (or a pointer to an array) is not the same as a pointer to a pointer. Just think about how they would be laid out in memory:
A matrix of size MxN in the form of an array of arrays:
+--------------+--------------+-----+----------------+--------------+-----+------------------+
| matrix[0][0] | matrix[0][1] | ... | matrix[0][N-1] | matrix[1][0] | ... | matrix[M-1][N-1] |
+--------------+--------------+-----+----------------+--------------+-----+------------------+
And the same "matrix" in the form of pointer to pointer:
+-----------+-----------+-----------+-----+
| matrix[0] | matrix[1] | matrix[2] | ... |
+-----------+-----------+-----------+-----+
  |           |           |
  |           |           V
  |           |           +--------------+--------------+-----+
  |           |           | matrix[2][0] | matrix[2][1] | ... |
  |           |           +--------------+--------------+-----+
  |           |
  |           V
  |           +--------------+--------------+-----+
  |           | matrix[1][0] | matrix[1][1] | ... |
  |           +--------------+--------------+-----+
  |
  V
  +--------------+--------------+-----+
  | matrix[0][0] | matrix[0][1] | ... |
  +--------------+--------------+-----+
It doesn't matter whether you allocate the correct size; the two types are simply incompatible, which is what your compiler is telling you.
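The difference between the two layouts shows up in what a pointer-to-pointer interface needs: a separate table of row pointers. The sketch below is written in C++ rather than the question's C, and make_row_pointers is a hypothetical helper; it builds that table on top of one contiguous block of order * order ints.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build the table of row pointers that an int** needs,
// each entry pointing at the start of one row inside `cells`.
std::vector<int*> make_row_pointers(std::vector<int>& cells, std::size_t order)
{
    std::vector<int*> rows(order);
    for (std::size_t r = 0; r < order; ++r)
        rows[r] = cells.data() + r * order;   // row r starts r * order ints in
    return rows;
}
```

rows.data() can then be used as an int**, while cells keeps the values in a single block. The row pointers become invalid if cells is resized or destroyed.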