We can construct a vector to store a bunch of strings by writing vector<string>, but a string can have a variable length. How can vector deal with that?
I also tested a demo: test[0] begins at 0x2508cb0 and test[1] begins at 0x2508cb8, but the difference between the two addresses does not seem to match the capacity of test[0].
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main()
{
vector<string> test;
test.push_back("tes3235235et");
test.push_back("135125151241241241");
cout << test[0].capacity() << endl;
cout << test[1].capacity() << endl;
cout << &(test[0]) << endl;
cout << &(test[1]) << endl;
return 0;
}
Output:
12
18
0x2508cb0
0x2508cb8
The vector doesn't need to deal with that, because the string deals with that. Just like std::vector, std::string stores its elements in dynamically allocated memory. The characters are not part of the string object itself (except in the case of small string optimization), but are instead just referred to via a pointer. The actual size of the string object is set at compile time and is the same for all strings (and can be obtained by sizeof(std::string)), regardless of the number of characters.
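A minimal sketch of this point, assuming nothing beyond the standard library; the helper names elementStride and strideMatchesObjectSize are made up for illustration. It checks that two neighbouring vector elements are exactly sizeof(std::string) bytes apart, whatever characters the strings hold.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: distance in bytes between the first two elements.
std::ptrdiff_t elementStride(const std::vector<std::string>& v)
{
    assert(v.size() >= 2);  // needs two elements to compare
    const char* first  = reinterpret_cast<const char*>(&v[0]);
    const char* second = reinterpret_cast<const char*>(&v[1]);
    return second - first;
}

// Hypothetical helper: true if that distance equals sizeof(std::string).
bool strideMatchesObjectSize(const std::vector<std::string>& v)
{
    return elementStride(v) == static_cast<std::ptrdiff_t>(sizeof(std::string));
}
```

In the question's output the two addresses differ by 8 bytes, which would be consistent with sizeof(std::string) being 8 on that implementation. The capacities (12 and 18) describe the separately allocated character buffers, not the string objects stored in the vector.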
I am reading a file header using ifstream.
Edit: I was asked to put the full minimal program, so here it is.
#include <iostream>
#include <fstream>
using namespace std;
#pragma pack(push,2)
struct Header
{
char label[20];
char st[11];
char co[7];
char plusXExtends[9];
char minusXExtends[9];
char plusYExtends[9];
};
#pragma pack(pop)
int main(int argc,char* argv[])
{
string fileName;
fileName = "test";
string fileInName = fileName + ".dst";
ifstream fileIn(fileInName.c_str(), ios_base::binary|ios_base::in);
if (!fileIn)
{
cout << "File Not Found" << endl;
return 0;
}
Header h={};
if (fileIn.is_open()) {
cout << "\n" << endl;
fileIn.read(reinterpret_cast<char *>(&h.label), sizeof(h.label));
cout << "Label: " << h.label << endl;
fileIn.read(reinterpret_cast<char *>(&h.st), sizeof(h.st));
cout << "Stitches: " << h.st << endl;
fileIn.read(reinterpret_cast<char *>(&h.co), sizeof(h.co));
cout << "Colour Count: " << h.co << endl;
fileIn.read(reinterpret_cast<char *>(&h.plusXExtends),sizeof(h.plusXExtends));
cout << "Extends: " << h.plusXExtends << endl;
fileIn.read(reinterpret_cast<char *>(&h.minusXExtends),sizeof(h.minusXExtends));
cout << "Extends: " << h.minusXExtends << endl;
fileIn.read(reinterpret_cast<char *>(&h.plusYExtends),sizeof(h.plusYExtends));
cout << "Extends: " << h.plusYExtends << endl;
// This will output corrupted
cout << endl << endl;
cout << "Label: " << h.label << endl;
cout << "Stitches: " << h.st << endl;
cout << "Colour Count: " << h.co << endl;
cout << "Extends: " << h.plusXExtends << endl;
cout << "Extends: " << h.minusXExtends << endl;
cout << "Extends: " << h.plusYExtends << endl;
}
fileIn.close();
cout << "\n";
//cin.get();
return 0;
}
ifstream fileIn(fileInName.c_str(), ios_base::binary|ios_base::in);
Then I use a struct to store the header items
The actual struct is longer than this. I shortened it because I didn't need the whole struct for the question.
Anyway as I read the struct I do a cout to see what I am getting. This part is fine.
As expected my cout shows the Label, Stitches, Colour Count no problem.
The problem is that if I do another cout after the header has been read, I get corruption in the output. For instance, if I put the second group of cout lines (the ones marked "This will output corrupted" in the full program) right after the above code:
Instead of seeing Label, Stitches and Colour Count I get strange symbols and corrupt output. Sometimes you can see the output of h.label, with some corruption, but the label and Stitches are written over, sometimes with strange symbols and sometimes with text from the previous cout. I think either the data in the struct is getting corrupted, or the cout output is getting corrupted, and I don't know why. The longer the header, the more apparent the problem becomes. I would really like to do all the couts at the end of the header, but if I do that I see a big mess instead of what should be output.
My question is why is my cout becoming corrupted?
Using arrays to store strings is dangerous because if you allocate 20 characters to store the label and the label happens to be 20 characters long, then there is no room to store a NUL (0) terminating character. Once the bytes are stored in the array there's nothing to tell functions that are expecting null-terminated strings (like cout) where the end of the string is.
Your label has 20 chars. That's enough to store the first 20 letters of the alphabet:
ABCDEFGHIJKLMNOPQRST
But this is not a null-terminated string. This is just an array of characters. In fact, in memory, the byte right after the T will be the first byte of the next field, which happens to be your 11-character st array. Let's say those 11 characters are: abcdefghijk.
Now the bytes in memory look like this:
ABCDEFGHIJKLMNOPQRSTabcdefghijk
There's no way to tell where label ends and st begins. When you pass a pointer to the first byte of an array that is, by convention, meant to be interpreted as a null-terminated string, the implementation will happily scan forward until it finds a null terminating character (0). On subsequent reuses of the structure, it may not find one inside the array! There's a serious risk of overrunning the buffer (reading past the end of the buffer), and potentially even the end of your virtual memory block, ultimately causing an access violation / segmentation fault.
When your program first ran, the memory of the header structure was all zeros (because you initialized with {}) and so after reading the label field from disk, the bytes after the T were already zero, so your first cout worked correctly. There happened to be a terminating null character at st[0]. You then overwrite this when you read the st field from disk. When you come back to output label again, the terminator is gone, and some characters of st will get interpreted as belonging to the string.
To fix the problem you probably want to use a different, more practical data structure to store your strings that allows for convenient string functions. And use your raw header structure just to represent the file format.
You can still read the data from disk into memory using fixed sized buffers, this is just for staging purposes (to get it into memory) but then store the data into a different structure that uses std::string variables for convenience and later use by your program.
For this you'll want these two structures:
#pragma pack(push,2)
struct RawHeader // only for file IO
{
char label[20];
char st[11];
char co[7];
char plusXExtends[9];
char minusXExtends[9];
char plusYExtends[9];
};
#pragma pack(pop)
struct Header // A much more practical Header struct than the raw one
{
std::string label;
std::string st;
std::string co;
std::string plusXExtends;
std::string minusXExtends;
std::string plusYExtends;
};
After you read the first structure, you'll transfer the fields by assigning the variables. Here's a helper function to do it.
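#include <algorithm>
#include <cstddef>
#include <string>
// Copies at most n characters, stopping early at the first '\0' if there is one.
template <std::size_t n> std::string arrayToString(const char (&raw)[n]) {
    return std::string(raw, std::find(raw, raw + n, '\0'));
}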
In your function:
Header h;
RawHeader raw;
fileIn.read(reinterpret_cast<char*>(&raw), sizeof(raw));
// Now marshal all the fields from the raw header over to the practical header.
h.label = arrayToString(raw.label);
h.st = arrayToString(raw.st);
h.co = arrayToString(raw.co);
h.plusXExtends = arrayToString(raw.plusXExtends);
h.minusXExtends = arrayToString(raw.minusXExtends);
h.plusYExtends = arrayToString(raw.plusYExtends);
It's worth mentioning that you also have the option of keeping the raw structure around and not copying your raw char arrays to std::strings when you read the file. But you must then be certain that, whenever you want to use the data, you always compute the lengths of the strings and pass them to the functions that will deal with those buffers as string data. (Similar to what my arrayToString helper does anyway.)
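If you do keep the raw arrays, the length computation could look something like the sketch below. The names fieldLength and printField are hypothetical and not part of the code above; the idea is only that every access is bounded by the array size.

```cpp
#include <cstddef>
#include <cstring>
#include <ostream>

// Length of a fixed-size field: stops at the first '\0',
// or at the end of the array if there is no terminator.
template <std::size_t N>
std::size_t fieldLength(const char (&field)[N])
{
    const void* nul = std::memchr(field, '\0', N);
    return nul ? static_cast<std::size_t>(static_cast<const char*>(nul) - field)
               : N;
}

// Writes at most N characters, so it never reads into the next field.
template <std::size_t N>
void printField(std::ostream& os, const char (&field)[N])
{
    os.write(field, static_cast<std::streamsize>(fieldLength(field)));
}
```

With <iostream> included, printField(std::cout, raw.label) would then print the label without depending on a terminator being present.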
I'm a student at university. I work mostly with Java, C++ is very new to me, so I probably make many silly mistakes and I have upcoming exams to cope with. Don't be too harsh with me.
Note: I can NOT use C++ std::string because I need to work with C-strings due to university tasks!
Referring to my studies and the question I asked about pointers and const arguments (which you find here) I tried messing around with memory management but it seems it has no effect, or I just misunderstood some aspects about sizeof or actual sizes of certain elements.
This is my class Person:
Person.cpp
using namespace std;
Person::Person()
{
Person::name = new (char[64]);
Person::adress = new (char[64]);
Person::phone = new (char[64]);
cout << "standard constructor called; object created, allocated " << sizeof(name) << "+" << sizeof(adress) << "+" << sizeof(phone) << "bytes" << endl;
}
Person::Person(const char *name, const char *adress , const char *phone)
{
Person::name = new (char[strlen(name)]);
Person::adress = new (char[strlen(adress)]);
Person::phone = new (char[strlen(phone)]);
setName(name);
setAdress(adress);
setPhone(phone);
cout << "general constructor called; object created, allocated " << sizeof(this->name) << "+" << sizeof(this->adress) << "+" << sizeof(this->phone) << "bytes" << endl;
};
Person::Person(Person const &other)
{
Person::name = new (char[strlen(other.getName())]);
Person::adress = new (char[strlen(other.getAdress())]);
Person::phone = new (char[strlen(other.getPhone())]);
setName(other.getName());
setAdress(other.getAdress());
setPhone(other.getPhone());
cout << "copy constructor called; object created, allocated " << sizeof(name) << "+" << sizeof(adress) << "+" << sizeof(phone) << "bytes" << endl;
};
Person::~Person()
{
delete [] name;
delete [] adress;
delete [] phone;
cout << "destructor called; object removed" << endl;
};
I tried to spare memory with creating a C-string with a string length of the given parameters.
Thinking that a C-string is a char array, sparing chars would result in sparing memory, e.g. a C-string of "John" takes up less memory than a C-string of "Jonathan".
So now I'm not sure if I just got the wrong concept of C-strings or char arrays, or my implementation is just faulty.
In my main I create the following objects:
int main()
{
Person t;
t.printPerson();
cout << "size of t: " << sizeof(t) << endl;
Person p("John", "some street", "0736182");
p.printPerson();
cout << "size of p: " << sizeof(p) << endl;
Person x(p);
x.printPerson();
cout << "size of x: " << sizeof(x) << endl;
Person y("Jonathan", "Lancaster Ave 53", "3584695364");
y.printPerson();
cout << "size of y: " << sizeof(y) << endl;
cin.get();
};
But I always get a size of 24 per object, so 8 for each member variable. Why is that?
Thanks in advance.
I think you are expecting the sizeof operator to behave differently than it actually does. Let's take this code, for example:
const char* str = new char[137];
Here, if you write sizeof(str) you'll probably either get 4 or 8, depending on your system, because sizeof(str) measures the number of bytes of the pointer str itself rather than the number of bytes in the array pointed at by str. So, on a 32-bit system, you'd probably get 4, and on a 64-bit system you'd probably get 8, independently of how many characters you allocated.
Unfortunately, C++ doesn't have a way for you to get the number of characters or the memory used up by a dynamically allocated array. You just have to track that yourself.
Similarly, in your main function, when you write sizeof(p), you're measuring the number of bytes used by the object p, not the total number of bytes used by p and the arrays it points at. You'll always get back the same value for sizeof(p) regardless of what strings it points at.
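As a rough illustration of tracking the size yourself, here is a sketch with a made-up TrackedBuffer struct and helper functions (none of these names come from the question). The allocation size is stored next to the pointer, because sizeof cannot report it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical wrapper: remembers how many bytes were allocated.
struct TrackedBuffer
{
    char*       data;
    std::size_t bytes;  // allocation size, including the terminating '\0'
};

TrackedBuffer makeBuffer(const char* text)
{
    assert(text != nullptr);
    TrackedBuffer b;
    b.bytes = std::strlen(text) + 1;
    b.data  = new char[b.bytes];
    std::memcpy(b.data, text, b.bytes);  // copies the '\0' as well
    return b;
}

void freeBuffer(TrackedBuffer& b)
{
    delete[] b.data;
    b.data  = nullptr;
    b.bytes = 0;
}
```

Here sizeof(b) and sizeof(b.data) stay the same for every buffer, while b.bytes differs between "John" and "Jonathan".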
If you're planning on working with strings in C++, I strongly recommend using std::string over raw C-style strings. They're much easier to use, they remember their length (so it's harder to mix up strlen and sizeof), and if you have a class holding a bunch of std::strings you don't need a copy constructor or assignment operator to handle the logic to shuffle them around. That would significantly clean up your code and eliminate most of the memory errors in it.
sizeof gives you the number of bytes that C/C++ needs to keep the object in memory. In your case (though you have not shown it), it looks like name, address, and phone are pointers to char:
struct Person {
    char *name, *address, *phone;
};
A pointer is a variable that holds the address of another object. Depending on the underlying system, it could occupy 32 bits (4 bytes) or 64 bits (8 bytes) (or some other number). In this case, sizeof(Person) on a 64-bit system is 24 (3 pointers of 8 bytes each). This corresponds to your results.
sizeof gives you a shallow size calculation. Your strings are pointed to by those pointers, and their lengths are not included. So you may need to create a member function that calculates those for you, e.g.:
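#include <cstring>
struct Person {
    char *name, *address, *phone;
    // Total number of characters in the three strings (terminating '\0's not counted).
    std::size_t getSize() const {
        return std::strlen(name) + std::strlen(address) + std::strlen(phone);
    }
};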
And as mentioned in the comments before, every char* string in C/C++ must end with a termination character ('\0'), which tells the program where the string ends. So, if you allocate space for a string, you should provide space for it as well (+ 1 to the length), and you have to make sure that this character is written as '\0'. If you use library functions to copy strings, they will take care of it; otherwise you need to do it manually.
void setName(const char *n) {
    name = new char[strlen(n) + 1]; // + 1 leaves room for the terminating '\0'
    strcpy(name, n); // copies the characters of n, including its terminating '\0'
}
If you use the loop to copy chars instead of strcpy you would need to add it manually:
name[strlen(n)] = 0;
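Put together, a loop-based copy might look like the following sketch. The free function duplicateCString is a hypothetical name used only for illustration; the caller is assumed to release the result with delete[].

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns a newly allocated copy of src.
// The caller owns the result and must release it with delete[].
char* duplicateCString(const char* src)
{
    assert(src != nullptr);
    const std::size_t len = std::strlen(src);
    char* copy = new char[len + 1];   // + 1 for the terminating '\0'
    for (std::size_t i = 0; i < len; ++i)
        copy[i] = src[i];
    copy[len] = '\0';                 // the loop did not copy the terminator
    return copy;
}
```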
I am a beginner in C++ and I am currently working with strings.
My question is why when compiling the code I'm providing below, I can get the string's characters when I use index notation, but cannot get the string itself using cout?
This is the code:
#include <iostream>
#include <string>
using namespace std;
int main()
{
string original; // original message
string altered; // message with letter-shift
original = "abc";
cout << "Original : " << original << endl; // display the original message
for(int i = 0; i<original.size(); i++)
altered[i] = original[i] + 5;
// display altered message
cout << altered[0] << " " << altered[1] << " " << altered[2] << endl;
cout << "altered : " << altered << endl;
return 0;
}
When I run this, the characters in the string altered are displayed correctly with this line:
cout << altered[0] << " " << altered[1] << " " << altered[2] << endl;
But the string itself is not displayed with this line:
cout << "altered : " << altered << endl;
I would like to know why this happens.
You have not resized your altered string to fit the length of the original string before the loop, thus your code exhibits undefined behavior:
altered[i] = original[i] + 5; // UB - altered is empty
To fix this, resize altered before the loop:
altered.resize(original.size());
Or use std::string::operator+= or similar to append to altered:
altered += original[i] + 5;
This way, it can be empty before the loop, it will automatically resize itself to contain appended characters.
Explanation
What seems to happen here is that your writes land in the internal array which std::string uses for the small string optimization (std::string::operator[] does not check whether you are accessing this array past std::string::size()), but std::string::size() remains 0, and std::string::begin() == std::string::end() still holds.
That's why you can access the data individually (again, with UB):
cout << altered[0] << " " << altered[1] << " " << altered[2] << endl;
but cout << altered does not print anything, considering that a simplified operator<< definition for std::string looks functionally like this:
std::ostream &operator<<(std::ostream &os, std::string const& str)
{
for(auto it = str.begin(); it != str.end(); ++it) // this loop does not run
os << *it;
return os;
}
In one sentence, std::string is not aware of what you did to its underlying array and that you meant the string to grow in length.
To conclude, the <algorithm> way of doing this transformation:
std::transform(original.begin(), original.end(),
    std::back_inserter(altered), // or altered.begin() if altered was resized to original's length
    [](char c)
    {
        return static_cast<char>(c + 5);
    });
(required headers: <algorithm>, <iterator>)
In your program string altered is empty. It has no elements.
Thus you may not use the subscript operator to access non-existent elements of the string, as you are doing in
altered[i] = original[i] + 5;
Instead, you can append new characters to the string. There are several ways to do this. For example
altered.push_back( original[i] + 5 );
or
altered.append( 1, original[i] + 5 );
or
altered += original[i] + 5;
Since you may not use the subscript operator to assign to an element of an empty string, it is better to use a range-based for loop, because the index itself is in fact not used. For example
for ( char c : original ) altered += c + 5;
The size of altered is always zero: by using indexes you are trying to copy values from original into positions that altered does not have. As LogicStuff has said, this is undefined behaviour. It doesn't generate an error because indexing a std::string calls its operator[], which accesses the string's data field directly and is specified in the C++ Standard without any range check; that's why no error was thrown. The safe way to access an index is the at(i) method: altered.at(i) throws std::out_of_range if altered.size() <= i.
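A small sketch of the difference, using a hypothetical helper named indexIsOutOfRange: it reports whether at(i) would throw for a given index, which is the check that operator[] skips.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: true if s.at(i) throws std::out_of_range.
bool indexIsOutOfRange(const std::string& s, std::size_t i)
{
    try {
        (void)s.at(i);   // range-checked access
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

For an empty altered, index 0 is already out of range; after altered.resize(3), indexes 0 to 2 are valid and 3 is not.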
However, I'm going to give this as my solution because it's a "Modern C++" approach (plus shorter and complete).
This is the alternative I would do to what has been given above:
string original = "abc";
string altered = original;
for (auto& c : altered) c += 5; // range-based for loop: increase each character of altered by 5
cout << altered << endl;
Note the significant reduction in code :-)
Even if I were doing it LogicStuff's way, I would still do it like this:
string original = "abc";
string altered = ""; // explicitly empty (a default-constructed string is empty too)
for (auto c : original) altered += (c+5);
However, I actually don't recommend this approach, because of the way push_back() and string appending / string concatenation work. It's fine in this small example, but what if original were a string holding the first 10 pages of a book to be parsed? Or a raw input of a million characters? Every time the data field of altered reaches its capacity, it has to be reallocated: a new buffer is obtained from the allocator, the contents of altered are copied over, and the prior allocation is freed. Typical implementations grow the capacity geometrically, so this happens only occasionally, but the extra allocations and copies are still avoidable work that grows with the size of original. When the final length is known in advance, it is more efficient to do a complete copy (or call reserve() first) and then iterate, making the necessary adjustments on the copied string. The same applies to std::vector.
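As a rough way to see how often reallocation happens in practice, the sketch below counts capacity changes while appending one character at a time. The function countReallocations is a made-up helper, and the exact counts depend on the standard library implementation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: appends n characters one at a time and counts how
// often capacity() changes, i.e. how often the string had to reallocate.
std::size_t countReallocations(std::size_t n, bool reserveFirst)
{
    std::string s;
    if (reserveFirst)
        s.reserve(n);                  // one allocation up front
    std::size_t reallocations = 0;
    std::size_t lastCapacity = s.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        s.push_back('x');
        if (s.capacity() != lastCapacity) {
            ++reallocations;
            lastCapacity = s.capacity();
        }
    }
    return reallocations;
}
```

With reserveFirst set to true the count is zero, because the capacity is already large enough. Without it, the count is greater than zero but, on implementations that grow geometrically, far smaller than n.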
I have this snippet from Thinking in C++.
#include <iostream>
#include <string>
using namespace std;
int main ()
{
string bigNews("I saw Elvis in a UFO. ");
cout << bigNews << endl;
bigNews.insert(0, " thought I ");
cout << bigNews << endl;
cout << "Size = " << bigNews.size() << endl;
cout << "Capacity = "
<< bigNews.capacity() << endl;
bigNews.append("I've been working too hard.");
cout << bigNews << endl;
cout << "Size = " << bigNews.size() << endl;
cout << "Capacity = "
<< bigNews.capacity() << endl;
return 0;
}
And I get output as shown below:
I saw Elvis in a UFO.
thought I I saw Elvis in a UFO.
Size = 33
Capacity = 44
thought I I saw Elvis in a UFO. I've been working too hard.
Size = 60
Capacity = 88
I can figure out why the size increases, but I can't work out how the capacity increases.
What I know is that capacity is the size of the string buffer that we can push_back into, but how is that space allocated?
capacity is the maximum number of characters that the string can currently hold without having to grow. size is how many characters actually exist in the string. The reason they're separate concepts is that allocating memory is generally inefficient, so you try to allocate as rarely as possible by grabbing more memory than you actually need at one time. (Many data structures use a "doubling" method where, if they hit their capacity of N and need more space, they will allocate 2*N space, to avoid having to reallocate again any time soon.)
capacity will increase automatically as you use the string and require more space. You can also manually increase it using the reserve function.
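A minimal sketch of reserve in use; the helper name makeRoomFor is hypothetical. It raises the capacity ahead of time without touching the contents or the size.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: makes sure s can take `extra` more characters
// without reallocating. size() and the contents are left unchanged.
void makeRoomFor(std::string& s, std::size_t extra)
{
    s.reserve(s.size() + extra);
}
```

After makeRoomFor(bigNews, 100), bigNews.size() is the same as before, bigNews.capacity() is at least size() + 100, and a following append of up to 100 characters does not need a new buffer.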
From the documentation:
capacity()
returns the number of characters that can be held in currently allocated storage
(public member function)
So, it is the allocation size of the internal buffer. What you see is its size doubling when it's exhausted -- this is a common technique for using dynamically-sized buffers efficiently, and it's called "exponential storage expansion". What it boils down to is basically this:
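#include <stdlib.h>
void resize_buffer(char **buf, size_t *cap, size_t newsize)
{
    if (*cap == 0)
        *cap = 1;                          /* doubling zero would never terminate */
    while (newsize > *cap)
        *cap *= 2;                         /* grow geometrically */
    *buf = (char *)realloc(*buf, *cap);    /* no error handling, for brevity */
}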
(Of course, this is greatly simplified; don't use this for actual reallocation code in production.) Your implementation of std::string is probably using this trick, which is why you see the buffer size going up by 100%.
Why does sizeof on the cast pointer not return the same size as when it is used on the struct itself?
I need to cast it because of a Winsock program that I'm working on.
Thanks for any help, true.
#include <iostream>
#include <string>
using namespace std;
struct stringstruct
{
string s1;
string s2;
};
int main()
{
stringstruct ss = {"123","abc"};
char *NX = (char*)&ss;
cout << sizeof(NX) << endl << sizeof(*NX) << endl;
cout << sizeof(&ss) << endl << sizeof(ss) << endl;
getchar();
return 0;
}
the example above outputs
4
1
4
64
sizeof will tell you the size of the given expression's type. In both the sizeof(NX) and sizeof(&ss), the result is 4 because pointers on your machine take up 4 bytes. For sizeof(*NX), you are dereferencing a char*, which gives you a char, and a char takes up 1 byte (and always does), so you get the output 1. When you do sizeof(ss), ss is a stringstruct, so you get the size of a stringstruct, which appears to be 64 bytes.
stringstruct ss = {"123","abc"};
char *NX = (char*)&ss;
cout << sizeof(NX) << endl << sizeof(*NX) << endl;
cout << sizeof(&ss) << endl << sizeof(ss) << endl;
I'm pretty sure that any of these casts are pretty meaningless. NX will point at the beginning of your struct. Inside the struct are two objects of type string, which in turn have pointers pointing to the data they were initialized with "123" and "abc" respectively. sizeof(*NX) is just that - size of a char, and sizeof(NX) is indeed the size of a pointer. sizeof(ss) is the size of your two string members (and any padding added by the compiler) - and sizeof(&ss) is the size of a pointer to a stringstruct.
Now, I expect what you REALLY want is a way to send your data, "123" and "abc", as two separate strings over a network. None of the above will help you do that, since even if sizeof(ss) gives you the size of the data structure you want to send, the string values are not within that structure [1]. What you really need is something called serialization: something that writes out your strings as separate elements as text/string.
Something like this would work:
struct stringstruct {
    string s1;
    string s2;
    string to_string();
};
string stringstruct::to_string()
{
string res = s1 + " " + s2;
return res;
}
Then use to_string like this:
string temp = ss.to_string();
const char *to_send = temp.c_str();
int send_len = temp.length();
// ... send the string `to_send` with number of bytes `send_len`.
[1] There is an optimization where std::string actually stores short strings within the class itself. But given a sufficiently long string, it won't do that.
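One limitation of joining with a space is that the receiver cannot split the fields again if a string itself contains a space. A common alternative, sketched here as an assumption rather than as part of the answer above, is to prefix each string with its length. The names appendField and readField and the 4-byte big-endian prefix are choices made for this example only.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Appends a 4-byte big-endian length, followed by the bytes of s.
void appendField(std::string& out, const std::string& s)
{
    const std::uint32_t len = static_cast<std::uint32_t>(s.size());
    out.push_back(static_cast<char>((len >> 24) & 0xFF));
    out.push_back(static_cast<char>((len >> 16) & 0xFF));
    out.push_back(static_cast<char>((len >> 8) & 0xFF));
    out.push_back(static_cast<char>(len & 0xFF));
    out += s;
}

// Reads one field starting at pos and advances pos past it.
// Returns false if the input is truncated.
bool readField(const std::string& in, std::size_t& pos, std::string& field)
{
    if (pos > in.size() || in.size() - pos < 4)
        return false;
    std::uint32_t len = 0;
    for (int i = 0; i < 4; ++i)
        len = (len << 8) | static_cast<unsigned char>(in[pos + i]);
    if (in.size() - pos - 4 < len)
        return false;
    field.assign(in, pos + 4, len);
    pos += 4 + len;
    return true;
}
```

The sender would call appendField once per member and transmit the resulting buffer (its data() and size()); the receiver calls readField repeatedly to get the strings back.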
A pointer has size 4 (in your case, which seems to be a 32-bit build) no matter what it points to. sizeof applied to the object itself, on the other hand, returns the actual number of bytes that an object of that structure takes.