Consider this example:
class A
{
public:
int a;
public:
A():a(1){};
};
class B:public A
{
public:
int b;
public:
B():b(2){};
};
void print(A* a)
{
for(int i=0; i<10; ++i)
{
cout<<a[i].a<<" ";
}
}
int main()
{
B b[10];
print(b);
}
My program output is: 1 2 1 2 1 2 1 2 1 2
But how is it reaching member b through a[i].a? I never mention b in the output code.
The problem is that sizeof(A) != sizeof(B).
The print function thinks that it has an array (essentially) of A objects. It doesn't know anything about B or its members.
While B is an A, B[] is not an A[].
There are three natural ways to solve your problem:
Make print take a B* argument.
Make print a template function.
Create an overload of print that takes a B* argument.
Which one to pick is up to you, and depends on your use-case.
The reason you seem to print values from B is the memory layout.
An array of A looks something like this in memory:
+--------+--------+--------+--------+--------+--------+--------+-----+
| A[0].a | A[1].a | A[2].a | A[3].a | A[4].a | A[5].a | A[6].a | ... |
+--------+--------+--------+--------+--------+--------+--------+-----+
An array of B looks like this:
+--------+--------+--------+--------+--------+--------+--------+-----+
| B[0].a | B[0].b | B[1].a | B[1].b | B[2].a | B[2].b | B[3].a | ... |
+--------+--------+--------+--------+--------+--------+--------+-----+
Now if we put the two arrays next to each other:
+--------+--------+--------+--------+--------+--------+--------+-----+
| A[0].a | A[1].a | A[2].a | A[3].a | A[4].a | A[5].a | A[6].a | ... |
+--------+--------+--------+--------+--------+--------+--------+-----+
| B[0].a | B[0].b | B[1].a | B[1].b | B[2].a | B[2].b | B[3].a | ... |
+--------+--------+--------+--------+--------+--------+--------+-----+
With the above "overlay" it's easy to see why you seem to be printing the b member from the B class.
You cannot rely on this behavior (indexing an array of B through an A* is undefined behavior), but here is a probable explanation for the output.
Consider an array of objects of B:
a|b | a|b
object1 object2
When you interpret it as A*
a | b | a
object1 object2 object3
Hence you get 1 2 1 2...
I've been adding std::string_views to some old code to represent string-like config params, as they provide a read-only view, which is faster because nothing needs to be copied.
However, two string_views cannot be concatenated, as operator+ isn't defined for them. I see this question has a couple of answers stating it's an oversight and that there is a proposal to add it. However, that is for adding a string and a string_view; presumably, if that gets implemented, the resulting concatenation would be a std::string.
Would adding two string_view also fall in the same category? And if not, why shouldn't adding two string_view be supported?
Sample
std::string_view s1{"concate"};
std::string_view s2{"nate"};
std::string_view s3{s1 + s2};
And here's the error
error: no match for 'operator+' (operand types are 'std::string_view' {aka 'std::basic_string_view<char>'} and 'std::string_view' {aka 'std::basic_string_view<char>'})
A view is similar to a span in that it does not own the data; as the name implies, it is just a view of the data. To concatenate the string views, you'd first need to construct a std::string, and then you can concatenate.
std::string s3 = std::string(s1) + std::string(s2);
Note that s3 will be a std::string not a std::string_view since it would own this data.
A std::string_view is an alias for std::basic_string_view<char>, which is a std::basic_string_view templated on a specific type of character, i.e. char.
But what does it look like?
Beside the fairly large number of useful member functions such as find, substr, and others (maybe it's an ordinary number, if compared to other container/string-like things offered by the STL), std::basic_string_view<_CharT>, with _CharT being the generic char-like type, has just 2 data members,
// directly from my /usr/include/c++/12.2.0/string_view
size_t _M_len;
const _CharT* _M_str;
i.e. a constant pointer to _CharT to indicate where the view starts, and a size_t (an appropriate type of number) to indicate how long the view is starting from _M_str's pointee.
In other words, a string view just knows where it starts and how long it is, so it represents a sequence of char-like entities which are consecutive in memory. With just two such members, you can't represent a string which is made up of non-contiguous substrings.
Put yet another way, if you want to create a std::string_view, you need to be able to tell where it starts and how many chars long it is. Can you tell where s1 + s2 would have to start and how many characters long it should be? Think about it: you can't, because s1 and s2 are not adjacent.
Maybe a diagram can help.
Assume these lines of code
std::string s1{"hello"};
std::string s2{"world"};
s1 and s2 are totally unrelated objects, as far as their memory location is concerned; here is what they look like:
                               &s2[0]
                                 |
                                 | &s2[1]
                                 |   |
  &s1[0]                         |   | &s2[2]
    |                            |   |   |
    | &s1[1]                     |   |   | &s2[3]
    |   |                        |   |   |   |
    |   | &s1[2]                 |   |   |   | &s2[4]
    |   |   |                    |   |   |   |   |
    |   |   | &s1[3]             v   v   v   v   v
    |   |   |   |              +---+---+---+---+---+
    |   |   |   | &s1[4]       | w | o | r | l | d |
    |   |   |   |   |          +---+---+---+---+---+
    v   v   v   v   v
  +---+---+---+---+---+
  | h | e | l | l | o |
  +---+---+---+---+---+
I've intentionally drawn them misaligned to mean that &s1[0], the memory location where s1 starts, and &s2[0], the memory location where s2 starts, have nothing to do with each other.
Now, imagine you create two string views like this:
std::string_view sv1{s1};
std::string_view sv2(s2.begin() + 1, s2.begin() + 4);
Here's what they will look like, in terms of the two libstdc++ data members _M_str and _M_len (in the diagram, the labels s1 and s2 stand for the views sv1 and sv2):
                               &s2[0]
                                 |
                                 | &s2[1]
                                 |   |
  &s1[0]                         |   | &s2[2]
    |                            |   |   |
    | &s1[1]                     |   |   | &s2[3]
    |   |                        |   |   |   |
    |   | &s1[2]                 |   |   |   | &s2[4]
    |   |   |                    |   |   |   |   |
    |   |   | &s1[3]             v   v   v   v   v
    |   |   |   |              +---+---+---+---+---+
    |   |   |   | &s1[4]       | w | o | r | l | d |
    |   |   |   |   |          +---+---+---+---+---+
    v   v   v   v   v              · ^         ·
  +---+---+---+---+---+            · |         ·
  | h | e | l | l | o |          +---+         ·
  +---+---+---+---+---+          | ·           ·
  · ^                 ·          | · s2._M_len ·
  · |                 ·          | <----------->
+---+                 ·          |
| ·                   ·          +-- s2._M_str
| ·     s1._M_len     ·
| <------------------->
|
+-------- s1._M_str
Given the above, can you see what's wrong with expecting that
std::string_view s3{s1 + s2};
works?
How could you possibly define s3._M_str and s3._M_len (based on s1._M_str, s1._M_len, s2._M_str, and s2._M_len) such that they represent a view on "helloworld"?
You can't because "hello" and "world" are located in two unrelated areas of memory.
std::string_view does not own any data; it is only a view. If you want to join two views to get a joined view, you can use boost::join() from the Boost library. But the result type will not be a std::string_view.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string_view>
#include <boost/range.hpp>
#include <boost/range/join.hpp>
void test()
{
    std::string_view s1{"hello, "}, s2{"world"};
    auto joined = boost::join(s1, s2);
    // print joined string (std::ostream_iterator needs its element type spelled out)
    std::copy(joined.begin(), joined.end(), std::ostream_iterator<char>(std::cout, ""));
    std::cout << std::endl;
    // other method to print
    for (auto c : joined) std::cout << c;
    std::cout << std::endl;
}
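With the C++20 ranges library, the same can be done by putting the views into a small array and flattening it with std::views::join. (C++23's std::views::join_with inserts a delimiter between the elements of a range of ranges; it does not concatenate two separate ranges. C++26 adds std::views::concat for that.)
#include <array>
#include <iostream>
#include <ranges>
#include <string_view>
void test()
{
    std::string_view s1{"hello, "}, s2{"world"};
    // a range of ranges: two views, flattened lazily into one sequence of chars
    std::array<std::string_view, 2> parts{s1, s2};
    auto joined = parts | std::views::join;
    for (auto c : joined) std::cout << c;
    std::cout << std::endl;
}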
I'm struggling with the correct mental model and understanding of std::vector.
What I thought I knew
When you create a vector of type T and then reserve N elements for the vector, the compiler basically finds and reserves a contiguous block of memory that is N * sizeof(T) bytes. For example,
// Initialize a vector of int
std::vector<int> intvec;
// Reserve contiguous block of 4 4-byte chunks of memory
intvec.reserve(4); // [ | | | ]
// Filling in the memory chunks has obvious behavior:
intvec.push_back(1); // [1| | | ]
intvec.push_back(2); // [1|2| | ]
Then we can access any element in random access time because, if we ask for the kth element of the vector, we simply start at the memory address of the start of the vector and then "jump" k * sizeof(T) bytes to get to the kth element.
Custom Objects
My mental model breaks down for custom objects of unknown/varying size. For example,
class Foo {
public:
Foo() = default;
Foo(std::vector<int> vec): _vec{vec} {}
private:
std::vector<int> _vec;
};
int main() {
// Initialize a vector Foo
std::vector<Foo> foovec;
// Reserve contiguous block of 4 ?-byte chunks of memory
foovec.reserve(4); // [ | | | ]
// How does memory allocation work since object sizes are unknown?
foovec.emplace_back(std::vector<int> {1,2}); // [{1,2}| | | ]
foovec.emplace_back(std::vector<int> {1,2,3,4,5}); // [{1,2}|{1,2,3,4,5}| | ]
return 0;
}
Since we don't know the size of each instance of Foo, how does foovec.reserve() allocate memory? Furthermore, how could you achieve random access time when you don't know how far to "jump" to get to the kth element?
Your concept of size is flawed. A std::vector<type> has a compile time known size of space it is going to take up. It also has a run time size that it may use (this is allocated at run time and the vector holds a pointer to it). You can picture it laid out like
+--------+
|        |
| Vector |
|        |
|        |
+--------+
     |
     |
     v
+-------------------------------------------------+
|         |         |         |         |         |
| Element | Element | Element | Element | Element |
|         |         |         |         |         |
+-------------------------------------------------+
So when you have a vector of things that have a vector in them, each Element becomes the inner vector, and each of those points off to its own storage somewhere else, like
+--------+
|        |
| Vector |
|        |
|        |
+----+---+
     |
     |
     v
+----+----+---------+---------+
| Object  | Object  | Object  |
|  with   |  with   |  with   |
| Vector  | Vector  | Vector  |
+----+----+----+----+----+----+
     |         |         |    +---------+---------+---------+---------+---------+
     |         |         |    |         |         |         |         |         |
     |         |         +--->+ Element | Element | Element | Element | Element |
     |         |              |         |         |         |         |         |
     |         |              +-------------------------------------------------+
     |         |    +-------------------------------------------------+
     |         |    |         |         |         |         |         |
     |         +--->+ Element | Element | Element | Element | Element |
     |              |         |         |         |         |         |
     |              +-------------------------------------------------+
     |    +-------------------------------------------------+
     |    |         |         |         |         |         |
     +--->+ Element | Element | Element | Element | Element |
          |         |         |         |         |         |
          +---------+---------+---------+---------+---------+
This way all of the vectors are next to each other, but the elements the vectors hold can be anywhere else in memory. It is for this reason that you don't want to use a std::vector<std::vector<int>> for a matrix. Each sub-vector gets its memory wherever the allocator finds room, so there is no locality between the rows.
Do note that this applies to all of the allocator aware containers as they do not store the elements inside the container directly. This is not true for std::array as, like a raw array, the elements are part of the container. If you have an std::array<int, 20> then it is at least sizeof(int) * 20 bytes in size.
The size of
class Foo {
public:
Foo() = default;
Foo(std::vector<int> vec): _vec{vec} {}
private:
std::vector<int> _vec;
};
is known and constant: the internal std::vector does its allocation on the heap, so there is no problem calling foovec.reserve(4);
Otherwise, how could a std::vector live on the stack? ;-)
The size of your class Foo is known at compile time. The std::vector class has a constant size, as the elements that it holds are allocated on the heap.
std::vector<int> empty{};
std::vector<int> full{};
full.resize(1000000);
assert(sizeof(empty) == sizeof(full));
Both instances of std::vector<int>, empty and full will always have the same size despite holding a different number of elements.
If you want an array which you cannot resize, and whose size must be known at compile time, use std::array.
When you create a vector of type T and then reserve N elements for the vector, the compiler basically finds and reserves a contiguous block of memory
The compiler does no such thing. It generates code to request storage from the vector's allocator at runtime. By default this is std::allocator, which delegates to operator new, which will fetch uninitialized storage from the runtime system.
My mental model breaks down for custom objects of unknown/varying size
The only way a user-defined type can actually have unknown size is if it is incomplete - and you can't declare a vector to an incomplete type.
At any point in your code where the type is complete, its size is also fixed, and you can declare a vector storing that type as usual.
Your Foo is complete, and its size is fixed at compile time. You can check this with sizeof(Foo), and sizeof(foovec[0]) etc.
The vector owns a variable amount of storage, but doesn't contain it in the object. It just stores a pointer and the reserved & used sizes (or something equivalent). For example, an instance of:
class toyvec {
int *begin_;
int *end_;
size_t capacity_;
public:
// push_back, begin, end, and all other methods
};
always has the fixed size sizeof(toyvec) = 2 * sizeof(int*) + sizeof(size_t) + maybe_some_padding. Allocating a huge block of memory, and setting begin_ to the start of it, has no effect on the size of the pointer itself.
tl;dr C++ does not have dynamically-resizing objects. The size of an object is fixed permanently by the class definition. C++ does have objects which own - and may resize - dynamic storage, but that isn't part of the object itself.
Here I have two swap functions
void kswap(int* a, int* b)
{
int* temp = a;
a = b;
b = temp;
}
void kswap(int* a, int* b)
{
int temp = *a;
*a = *b;
*b = temp;
}
With the first function, the values only change inside the function,
while the second function changes the values permanently.
Can anyone tell me the difference between the two functions?
I thought that, as both functions take pointer parameters, the values would be changed by both functions.
In function kswap, a and b are int*, i.e. pointers to int, which means
they contain the address of an integer in memory, as seen in the diagram below:
               Memory
         ==================
         +----------------+
         |                |
+------> |   num1 = 5     |
|        |                |
| +----> |   num2 = 6     |
| |      |                |
| |      |                |
| |      |================|
| |      | Function kswap |
| |      |                |
+-(------------  a        |
  |      |                |
  +------------  b        |
         |                |
         +----------------+
Here,
`*a` : should be read as : `value at address contained in a`
`*b` : should be read as : `value at address contained in b`
In the first example
In the first kswap, after executing the statements below,
int* temp = a; /* A pointer which points to same place as 'a' */
a = b; /* 'a' will now point to where 'b' is pointing */
b = temp; /* 'b' will now point to where 'temp' is pointing
* that means where 'a' was previously pointing */
the result is:
               Memory
         ==================
         +----------------+
         |                |
+------> |   num1 = 5     | <------+
|        |                |        |
| +----> |   num2 = 6     |        |
| |      |                |        |
| |      |                |        |
| |      |================|        |
| |      | Function kswap |        |
| |      |                |        |
| +------------  a        |        |
|        |                |        |
+--------------  b        |        |
         |                |        |
         |   temp -----------------+
         +----------------+
Note that neither *a nor *b is assigned any value, so neither of:
`*a` : that is : `value at address contained in a`
`*b` : that is : `value at address contained in b`
is changed.
So, as seen in the picture above, num1 is still 5 and num2 is still 6.
The only thing that has happened is that a now points to num2 and b
points to num1.
In the second example
In the second kswap, after executing the statements below,
int temp = *a; /* An int variable which will contain the same value as the
* value at address contained in a */
*a = *b; /* value at address contained in 'a' will be equal to value
* at address contained in 'b' */
*b = temp; /* value at address contained in 'b' will be equal to value
* contained in 'temp' */
the result is:
               Memory
         ==================
         +----------------+
         |                |
+------> |   num1 = 6     |
|        |                |
| +----> |   num2 = 5     |
| |      |                |
| |      |                |
| |      |================|
| |      | Function kswap |
| |      |                |
+-(------------  a        |
  |      |                |
  +------------  b        |
         |                |
         |   temp = 5     |
         +----------------+
Note that both *a and *b are assigned a new value, so both:
`*a` : that is : `value at address contained in a`
`*b` : that is : `value at address contained in b`
are changed.
As seen in the picture above, num1 is now 6 and num2 is now 5. So in the second example, the values of variables num1 and num2 are permanently changed.
Assume each function is called as:
void f()
{
int x = 101, y = 999;
kswap(&x, &y);
}
Remember that in C++ arguments are passed by value, so kswap receives the values of the addresses where x, y reside. The rest of the answer is inlined in the code comments below.
The kswap that works.
void kswap(int* a, int* b)
{
int temp = *a; // `a` is the address of `int x`
// `*a` is the integer value at address `a`
// i.e. the value of `x` so temp == 101 now
*a = *b; // same as above `*b` is the value of `y` i.e. 999
// now this integer value is copied to the address where `a` points
// effectively overwriting the old `x` value `101` with `999`
*b = temp; // finally, this copies the value in `temp` i.e. 101
// to the address where `b` points and overwrites
// the old `y` value `999`, which completes the swap
}
The kswap which does not work.
void kswap(int* a, int* b)
{
int* temp = a; // this copies `a` i.e. the address of `x`
// to local variable `temp`
a = b; // this copies `b` to `a`
// since arguments `a` and `b` are pointers and passed by value
// this only modifies the value of variable `a`
// it does **not** change `x` or its address in any way
b = temp; // this copies 'temp' to 'b', same comments as above
// now 'a' holds the address of `y` and `b` holds the address
// of `x` but **neither** 'x' nor 'y' values have been modified
// and pointer variables `a`, `b` go out of scope as soon as
// the function returns, so it's all a big no-op in the end
}
The first function swaps the addresses, but only within the scope of the function.
The second function swaps the values, and that change is visible outside the function's scope.
Putting * in front of the name means you want the value, not where it's stored.
I'm doing this as a learning exercise. The C++ book I'm studying from casts a buffer as a structure for easy manipulation and streaming. Everything seems fine until I try using an array (body) and look at the binary data in the buffer after assigning values. It doesn't match what I expect.
#include <iostream>
#include <bitset>
#include <netinet/in.h>
using namespace std;
struct dataStruct
{
uint32_t header;
uint32_t *body;
};
int main(int argc, char* argv[])
{
int size, streamSize;
// 4 bytes per size + 4 bytes for header
size = 1;
streamSize = (size * 4) + 4;
// Create a stream of bytes of appropriate size
uint8_t *buffer = new uint8_t[streamSize];
// Cast stream as structure
dataStruct *sStream = (dataStruct *)buffer;
// Populate structure with nice 101010... binary patterns
sStream->header = 2863311530;
sStream->body = new uint32_t[1];
sStream->body[0] = 2863311530;
cout << "Struct: " << sStream->header << ", " << sStream->body[0] << endl;
// Look at raw data in stream
for (int i=0; i<sizeof(buffer); i++)
{
std::bitset<8> x(buffer[i]);
cout << "[" << i << "]->" << x << endl;
}
return 0;
}
The output is:
Struct: 2863311530, 2863311530
[0]->10101010
[1]->10101010
[2]->10101010
[3]->10101010
[4]->00000000
[5]->00000000
[6]->00000000
[7]->00000000
Why is index 4-7 not the same as 0-3? Both sStream->header and sStream->body contain the same values. They are mapped to the buffer. Is this because body is an array? If so how would I manipulate the stream for this to work when using an array?
Thanks
You are using the uninitialized variable size in:
streamSize = (size * 4) + 4;
Everything after that which depends on streamSize is suspect and can lead to undefined behavior.
Update
Even after size is initialized to 1, there are problems. Let me walk through the code and how it affects the memory you have allocated.
After you execute the line:
uint8_t *buffer = new uint8_t[streamSize];
you have buffer pointing to memory like this:
buffer
|
v
+---+---+---+---+---+---+---+---+
| | | | | | | | |
+---+---+---+---+---+---+---+---+
After you have executed the line:
dataStruct *sStream = (dataStruct *)buffer;
you have sStream pointing to the same memory like:
sStream
|
v
+---+---+---+---+---+---+---+---+
| | | | | | | | |
+---+---+---+---+---+---+---+---+
If your compiler does not add any padding to the members of dataStruct (the best case scenario), you'll have:
sStream.header  sStream.body
|               |
v               v
+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   |
+---+---+---+---+---+---+---+---+
If your compiler adds padding after dataStruct::header, sStream->body will sit at a different offset. Worst case scenario: you have a 64-bit compiler, and it adds 32 bits of padding after header so that the pointer is suitably aligned. In that case, you will have:
sStream.header                  sStream.body
|                               |
v                               v
+---+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |   |
+---+---+---+---+---+---+---+---+
Then, you will end up writing to memory you did not allocate when you try to assign anything to sStream->body, like:
sStream->body = new uint32_t[1];
Best case scenario: you have a 32-bit compiler and no padding is added after header. It looks like you have a 64-bit compiler, though. Even if your compiler does not add any padding, you are still looking at a memory overrun if pointers are 64 bits wide, which I think is the case for you: the struct then needs at least 12 bytes, but the buffer has only 8.
Let's take the best case scenario of a 32-bit compiler that doesn't add any padding, where the members of sStream map onto the allocated memory like:
sStream.header sStream.body
| |
v v
+---+---+---+---+---+---+---+---+
| | | | | | | | |
+---+---+---+---+---+---+---+---+
After you execute the line:
sStream->header = 2863311530;
the memory looks like:
sStream.header sStream.body
| |
v v
+---+---+---+---+---+---+---+---+
| 2863311530 | |
+---+---+---+---+---+---+---+---+
After you execute the line:
sStream->body = new uint32_t[1];
the memory looks like:
sStream.header sStream.body
| |
v v
+---+---+---+---+---+---+---+---+
| 2863311530 | SomeMemory |
+---+---+---+---+---+---+---+---+
SomeMemory
|
v
+---+---+---+---+
| |
+---+---+---+---+
After you execute the line:
sStream->body[0] = 2863311530;
SomeMemory gets populated and looks like:
SomeMemory
|
v
+---+---+---+---+
| 2863311530 |
+---+---+---+---+
I think you were surprised to see that the memory pointed to by buffer does not look like:
buffer
|
v
+---+---+---+---+---+---+---+---+
| 2863311530 | 2863311530 |
+---+---+---+---+---+---+---+---+
I hope it makes sense now why it does not.
I have a vector that I want to sort alphabetically. I have successfully been able to sort it by the values of one index alphabetically, but when I do, it only changes the order of that index and not the entire vector. How can I get it to apply the order change to the entire vector?
This is my current code I am running:
std::sort (myvector[2].begin(), myvector[2].end(), compare);
bool icompare_char(char c1, char c2)
{
return std::toupper(c1) < std::toupper(c2);
}
bool compare(std::string const& s1, std::string const& s2)
{
if (s1.length() > s2.length())
return true;
if (s1.length() < s2.length())
return false;
return std::lexicographical_compare(s1.begin(), s1.end(),
s2.begin(), s2.end(),
icompare_char);
}
My general structure for this vector is vector[row][column] where:
| One | Two | Three |
| 1 | 2 | 3 |
| b | a | c |
For example if I had a vector:
myvector[0][0] = 'One' AND myvector[2][0]='b'
myvector[0][1] = 'Two' AND myvector[2][1]='a'
myvector[0][2] = 'Three' AND myvector[2][2]='c'
| One | Two | Three |
| 1 | 2 | 3 |
| b | a | c |
And I sort it I get:
myvector[0][0] = 'One' AND myvector[2][0]='a'
myvector[0][1] = 'Two' AND myvector[2][1]='b'
myvector[0][2] = 'Three' AND myvector[2][2]='c'
| One | Two | Three |
| 1 | 2 | 3 |
| a | b | c |
and not what I want:
myvector[0][0] = 'Two' AND myvector[2][0]='a'
myvector[0][1] = 'One' AND myvector[2][1]='b'
myvector[0][2] = 'Three' AND myvector[2][2]='c'
| Two | One | Three |
| 2 | 1 | 3 |
| a | b | c |
I looked around for a good approach but could not find anything that worked... I was thinking something like:
std::sort (myvector.begin(), myvector.end(), compare);
Then handle the sorting of the third index within my compare function so the whole vector would get edited... but when I passed my data I either only changed the order in the function and still did not change the top layer or got errors. Any advice or help would be greatly appreciated. Thank you in advance.
Ideally, merge the 3 data fields into a struct so that you can have just 1 vector and so sort it simply.
struct DataElement{
std::string str;
char theChar;
int num;
bool operator<(const DataElement& other)const{return theChar<other.theChar;}
};
std::vector<DataElement> myvector;
std::sort (myvector.begin(), myvector.end());