C++ sizeof C-style string / char array - optimization - c++

I'm a student at university. I work mostly with Java, C++ is very new to me, so I probably make many silly mistakes and I have upcoming exams to cope with. Don't be too harsh with me.
Note: I can NOT use C++ std::string because I need to work with C-strings due to university tasks!
Referring to my studies and the question I asked about pointers and const arguments (which you find here) I tried messing around with memory management but it seems it has no effect, or I just misunderstood some aspects about sizeof or actual sizes of certain elements.
This is my class Person:
Person.cpp
using namespace std;
Person::Person()
{
Person::name = new (char[64]);
Person::adress = new (char[64]);
Person::phone = new (char[64]);
cout << "standard constructor called; object created, allocated " << sizeof(name) << "+" << sizeof(adress) << "+" << sizeof(phone) << "bytes" << endl;
}
Person::Person(const char *name, const char *adress , const char *phone)
{
Person::name = new (char[strlen(name)]);
Person::adress = new (char[strlen(adress)]);
Person::phone = new (char[strlen(phone)]);
setName(name);
setAdress(adress);
setPhone(phone);
cout << "general constructor called; object created, allocated " << sizeof(this->name) << "+" << sizeof(this->adress) << "+" << sizeof(this->phone) << "bytes" << endl;
};
Person::Person(Person const &other)
{
Person::name = new (char[strlen(other.getName())]);
Person::adress = new (char[strlen(other.getAdress())]);
Person::phone = new (char[strlen(other.getPhone())]);
setName(other.getName());
setAdress(other.getAdress());
setPhone(other.getPhone());
cout << "copy constructor called; object created, allocated " << sizeof(name) << "+" << sizeof(adress) << "+" << sizeof(phone) << "bytes" << endl;
};
Person::~Person()
{
delete [] name;
delete [] adress;
delete [] phone;
cout << "destructor called; object removed" << endl;
};
I tried to spare memory with creating a C-string with a string length of the given parameters.
Thinking that a C-string is a char array, sparing chars would result in sparing memory, e.g. a C-string of "John" takes up less memory than a C-string of "Jonathan".
So now I'm not sure if I just got the wrong concept of C-strings or char arrays, or my implementation is just faulty.
In my main I create the following objects:
int main()
{
Person t;
t.printPerson();
cout << "size of t: " << sizeof(t) << endl;
Person p("John", "some street", "0736182");
p.printPerson();
cout << "size of p: " << sizeof(p) << endl;
Person x(p);
x.printPerson();
cout << "size of x: " << sizeof(x) << endl;
Person y("Jonathan", "Lancaster Ave 53", "3584695364");
y.printPerson();
cout << "size of y: " << sizeof(y) << endl;
cin.get();
};
But I always get a size of 24 per object, so 8 for each member variable. Why is that?
Thanks in advance.

I think you are expecting the sizeof operator to behave differently than it actually does. Let's take this code, for example:
const char* str = new char[137];
Here, if you write sizeof(str) you'll probably either get 4 or 8, depending on your system, because sizeof(str) measures the number of bytes of the pointer str itself rather than the number of bytes in the array pointed at by str. So, on a 32-bit system, you'd probably get 4, and on a 64-bit system you'd probably get 8, independently of how many characters you allocated.
Unfortunately, C++ doesn't have a way for you to get the number of characters or the memory used up by a dynamically allocated array. You just have to track that yourself.
Similarly, in your main function, when you write sizeof(p), you're measuring the number of bytes used by the object p, not the total number of bytes used by p and the arrays it points at. You'll always get back the same value for sizeof(p) regardless of what strings it points at.
If you're planning on working with strings in C++, I strongly recommend using std::string over raw C-style strings. They're much easier to use, they remember their length (so it's harder to mix up strlen and sizeof), and if you have a class holding a bunch of std::strings you don't need a copy constructor or assignment operator to handle the logic to shuffle them around. That would significantly clean up your code and eliminate most of the memory errors in it.

sizeof gives you the number of bytes that C/C++ needs to keep the object in memory. In your case (though you have not shown it) it looks like name, address, and phone are pointers to char:
struct Person {
char *name, *address, *phone;
};
A pointer is a variable that holds the address of another object. Depending on the underlying system, it may occupy 32 bits (4 bytes), 64 bits (8 bytes), or some other size. On a 64-bit system, sizeof(Person) is therefore 24 (3 pointers of 8 bytes each), which matches your results.
sizeof gives you a shallow size. The strings are only pointed to by those pointers, so their lengths are not included. If you need the total, you have to write a member function that calculates it, e.g.
struct Person {
char *name, *address, *phone;
int getSize() {
return strlen(name) + strlen(address) + strlen(phone);
}
};
And as mentioned in the comments, every C-string (char *) in C/C++ must end with a terminating character ('\0'), which tells the program where the string ends. So when you allocate space for a string, you must also allocate space for the terminator (+ 1 to the length), and you have to make sure that it is actually written. If you use library functions to copy strings, they will take care of it; otherwise you need to do it manually.
void setName(const char *n) {
name = new char[strlen(n) + 1]; // +1 leaves room for the terminating '\0'
strcpy(name, n); // copies the string and adds `\0` to the end
}
If you use the loop to copy chars instead of strcpy you would need to add it manually:
name[strlen(n)] = 0;

Related

Why does printing the 'address of index n' of c style strings lead to output of substring

I'm rather new to C++ and while working with a pointer to a char array (C style string) I was confused by its behavior with the ostream object.
const char* items {"sox"};
cout << items << endl;
cout << items[0] << endl;
cout << *items << endl;
cout << &items << endl;
cout << &items[1] << endl;
Running this leads to:
sox
s
s
0x7fff2e832870
ox
Unlike pointers to other data types, printing the variable doesn't output the address, but the string as a whole. As I understand it, this is because the << operator is overloaded for char arrays to treat them as strings.
What I don't understand is, that cout << &items[1] prints the string from index 1 onward (ox), instead of the address of the char at index 1. Is this also due to << operator being overloaded or what is the reason for this behavior?
The type of &items[1] is const char *. Therefore the const char * overload of operator << is used, which prints the string from index 1 onwards.
OTOH, the type of &items is const char **, for which no specific overload exists, so the address of items is printed (via the const void * overload).
Back in the olden days, when C ran the world, there was no std::string, and programmers had to make do with arrays of char to manage text. When C++ brought enlightenment (and std::string), old habits persevered, and arrays of char are still used to manage text. Because of this heritage, you'll find many places where arrays of char act differently from arrays of any other type.
So,
const int integers[] = { 1, 2, 3, 4 };
std::cout << integers << '\n';
prints the address of the first element in the array.
But,
const char text[] = { 'a', 'b', 'c', '\0' };
std::cout << text << '\n';
prints the text in the array text, up to the final 0: abc
Similarly, if you try to print addresses inside the array, you get different behavior:
std::cout << &integers[1] << '\n';
prints the address of the second element in the array, but
std::cout << &text[1] << '\n';
prints the text starting at the second character of the array: bc
And, as you suspected, that's because operator<< has an overload that takes const char* and copies text beginning at the location pointed to by the pointer, and continuing up to the first 0 that it sees. That's how C strings work, and that behavior carries over into C++.
items[1] is the second character of the array and its address, i.e. &items[1], is a pointer to the second character (with index 1) as well. So, with the same rule that you have mentioned for operator <<, the second character of the string till the end is printed.

When creating a char array, its length is different from required

I need to create a newStr array with length of str array. But after its created the strlen(newStr) is totally different. For example if a strlen(str) is 5, then strlen(newStr) would be 22. What am I doing wrong?
#include <iostream>
using namespace std;
int main()
{
char *str = "Hello";
int strLength = strlen(str);
std::cout << "str = " << str << "\t" << "strLength = " << strLength << std::endl;
char *newStr = new char[strLength];
std::cout << "newStrLength = " << strlen(newStr) << std::endl;
system("pause");
return 0;
}
In the console will be
str = Hello strLength = 5
newStrLength = 22
You are mixing up two different concepts:
new[] allocates uninitialized memory block to your program,
strlen(...) counts characters in a C string before null terminator '\0' is reached.
The size of the allocated block cannot be measured with strlen. In fact, it cannot be measured at all - your program must know how much memory it has requested, and make sure that it does not go past the limit.
Once you allocated new char[n], you can safely copy a C string of length up to n-1 into that block. C++ guarantees that enough memory would be there for you to complete the operation successfully:
char *newStr = new char[strLength+1]; // Note +1 for null terminator
strcpy(newStr, str);
std::cout << "newStrLength = " << strlen(newStr) << std::endl;
delete[] newStr;
The way strlen works is that it examines the contents of the string passed to it, and counts how many characters there are until the first terminating character. The terminating character for a string is '\0' (or 0).
What you've done is asked for the length of a string that you've not assigned any value to; leading to strlen examining random memory; looking for the first 0. In this case, it found it 22 bytes further down; but it could be anything. It could even crash because you start looking into memory you don't have read access to.
The best way to resolve this is to use std::string and then you can call length and other helper functions without having to worry about the underlying pointers too much; which will also resolve your memory leak.

Conversion from string constant, pointers in c++

After reading several answers I have corrected my code to as follows;
int main()
{
// a pointer to char is initialized with a string literal
char Buffer[100];
cout << "Enter an initial string: " << endl;
cin >> Buffer;
cout << "Original content of Buffer:--> " << Buffer << endl;
cout << "Enter a sentence: " << endl;
cin >> Buffer;
gets(Buffer);
cout << "Now the Buffer contains:--> " << Buffer << endl;
return 0;
}
I no longer get the warning, but now the program doesn't behave as I would like: the last part does not output my new sentence.
I know people said not to use gets, but when I tried getline I found I can't use it as a direct replacement, so I was a bit lost.
Any suggestions?
You cannot read into memory that holds a string constant. String constants are often stored in read-only memory, and even when they are not, identical constants may be shared, so you would overwrite that string for every part of your code that uses it.
You need to copy the string into some buffer and then do whatever you want. For example:
const char *myOrigBuffer = "Dummy string";
char buffer[1024];
strcpy(buffer, myOrigBuffer); // copy the literal into writable memory
....
gets(buffer);
You cannot modify a string literal. Your way of coding is too much "C style".
If the original buffer content doesn't matter and you must use gets(), don't initialize your buffer :
char Buffer[100];
cout << "Enter a sentence: " << endl;
gets(Buffer);
cout << "Now the Buffer contains:--> " << endl;
cout << Buffer << endl;
Don't forget that if you enter 100 or more characters (the buffer holds at most 99 plus the terminating '\0'), gets writes past the end of the buffer, which is undefined behaviour and may crash.
Since gets is deprecated, prefer fgets: it takes the buffer size and so protects you from overflows. In C style, you would write:
char buffer[10];
printf("Your sentence : \n");
fgets(buffer, sizeof buffer, stdin);
printf("Your sentence : %s\n", buffer);
Outputs :
Your sentence :
1 34 6789012345
Your sentence : 1 34 6789
Nonetheless, you should consider using std::cin with std::string to make correct C++ inputs :
std::string sentence;
std::cout << "Enter a sentence (with spaces): ";
std::getline(std::cin, sentence);
std::cout << "Your sentence (with spaces): " << sentence << std::endl;
Outputs :
Enter a sentence (with spaces): My super sentence
Your sentence (with spaces): My super sentence
A string literal like "Dummy content." is logically const, since any operation that attempts to change its contents results in undefined behaviour.
The definition/initialisation
char *Buffer = "Dummy content.";
however, makes Buffer a non-const pointer to (the first character of) a string literal. That involves a conversion (from array of const char to a char *). That conversion exists in C for historical reasons so is still in C++. However, subsequently using Buffer to modify the string literal - which is what gets(Buffer) does unless the user enters no data - still gives undefined behaviour.
Your "stopped working" error is one manifestation of undefined behaviour.
Giving undefined behaviour is the reason the conversion is deprecated.
Note: gets() is more than deprecated. It has been removed from the C standard, from where it originated, completely because it is so dangerous (no way to prevent it overwriting arbitrary memory). In C++, use getline() instead. It is often not a good idea to mix C I/O function and C++ stream functions on the same input or output device (or file) anyway.
char *Buffer = "Dummy content.";
You should use a pointer to const char here, because "Dummy content." is not a buffer but a string literal, which has type "array of n const char" and static storage duration, so it must not be modified through the pointer. The correct variant is:
char const* Literal = "Dummy content.";
But you cannot pass it to gets:
gets(Buffer);
That is a bad idea and is likely to cause a write access exception or memory corruption. You should pass gets a pointer to a block of memory where the received string will be stored.
This block should have enough length to store whole string, so in general gets is unsafe, check https://stackoverflow.com/a/4309845/2139056 for more info.
But as temporary test solution you can use buffer on stack:
char Buffer[256];
gets(Buffer);
or dynamic allocated buffer:
char* Buffer= new char[256];
gets(Buffer);
//and do not forget to free memory after your output operations
delete [] Buffer;

The sizeof several casts

Why does sizeof not return the same size as when it's used on the struct itself?
I need to cast it because of a Winsock program that I'm working on.
Thanks for any help, true.
#include <iostream>
#include <string>
using namespace std;
struct stringstruct
{
string s1;
string s2;
};
int main()
{
stringstruct ss = {"123","abc"};
char *NX = (char*)&ss;
cout << sizeof(NX) << endl << sizeof(*NX) << endl;
cout << sizeof(&ss) << endl << sizeof(ss) << endl;
getchar();
return 0;
}
the example above outputs
4
1
4
64
sizeof will tell you the size of the given expression's type. In both the sizeof(NX) and sizeof(&ss), the result is 4 because pointers on your machine take up 4 bytes. For sizeof(*NX), you are dereferencing a char*, which gives you a char, and a char takes up 1 byte (and always does), so you get the output 1. When you do sizeof(ss), ss is a stringstruct, so you get the size of a stringstruct, which appears to be 64 bytes.
stringstruct ss = {"123","abc"};
char *NX = (char*)&ss;
cout << sizeof(NX) << endl << sizeof(*NX) << endl;
cout << sizeof(&ss) << endl << sizeof(ss) << endl;
I'm pretty sure that any of these casts are pretty meaningless. NX will point at the beginning of your struct. Inside the struct are two objects of type string, which in turn have pointers pointing to the data they were initialized with "123" and "abc" respectively. sizeof(*NX) is just that - size of a char, and sizeof(NX) is indeed the size of a pointer. sizeof(ss) is the size of your two string members (and any padding added by the compiler) - and sizeof(&ss) is the size of a pointer to a stringstruct.
Now, I expect what you REALLY want is a way to send your data, "123" and "abc", as two separate strings over a network. None of the above will help you do that, since even if sizeof(ss) gives you the size of the data structure you want to send, the string values are not within that structure [1]. What you really need is something called serialization: something that writes out your strings as separate elements as text/string.
Something like this would work:
struct stringstruct {
string s1;
string s2;
string to_string();
};
string stringstruct::to_string()
{
string res = s1 + " " + s2;
return res;
}
Then use to_string like this:
string temp = ss.to_string();
const char *to_send = temp.c_str();
int send_len = temp.length();
... send the string `to_send` with number of bytes `send_len`.
[1] There is an optimization where std::string stores short strings inside the object itself. But given a sufficiently long string, it won't do that.
A pointer has size 4 (your build appears to be 32-bit) no matter what it points to. The size of the object itself, on the other hand, is the actual number of bytes that an object of that struct type occupies.

weird output when printing data of custom string (c++ newbie)

my main concern is if i am doing this safely, efficiently, and for the most part doing it right.
i need a bit of help writing my implementation of a string class. perhaps someone could help me with what i would like to know?
i am attempting to write my own string class for extended functionality and for learning purposes. i will not use this as a substitute for std::string because that could be potentially dangerous. :-P
when i use std::cout to print out the contents of my string, i get some unexpected output, and i think i know why, but i am not really sure. i narrowed it down to my assign function because any other way i store characters in the string works quite fine. here is my assign function:
void String::assign(const String &s)
{
unsigned bytes = s.length() + 1;
// if there is enough unused space for this assignment
if (res_ >= bytes)
{
strncpy(data_, s.c_str(), s.length()); // use that space
res_ -= bytes;
}
else
{
// allocate enough space for this assignment
data_ = new char[bytes];
strcpy(data_, s.c_str()); // copy over
}
len_ = s.length(); // optimize the length
}
i have a constructor that reserves a fixed amount of bytes for the char ptr to allocate and hold. it is declared like so:
explicit String(unsigned /*rbytes*/);
the res_ variable simply records the passed in amount of bytes and stores it. this is the constructor's code within string.cpp:
String::String(unsigned rbytes)
{
data_ = new char[rbytes];
len_ = 0;
res_ = rbytes;
}
i thought using this method would be a bit more efficient rather than allocating new space for the string. so i can just use whatever space i reserved initially when i declared a new string. here is how i am testing to see if it works:
#include <iostream>
#include "./string.hpp"
int main(int argc, char **argv)
{
winks::String s2(winks::String::to_string("hello"));
winks::String s(10);
std::cout << s2.c_str() << "\n" << std::endl;
std::cout << s.unused() << std::endl;
std::cout << s.c_str() << std::endl;
std::cout << s.length() << std::endl;
s.assign(winks::String::to_string("hello")); // Assign s to "hello".
std::cout << s.unused() << std::endl;
std::cout << s.c_str() << std::endl;
std::cout << s.length() << std::endl;
std::cout.flush();
std::cin.ignore();
return 0;
}
if you are concerned about winks::String::to_string, i am simply converting a char ptr to my string object like so:
String String::to_string(const char *c_s)
{
String temp = c_s;
return temp;
}
however, the constructor i use in this method is private, so i am forcing to_string upon myself. i have had no problems with this so far. the reason why i made this is to avoid rewriting methods for different parameters ie: char * and String
the code for the private constructor:
String::String(const char *c_s)
{
unsigned t_len = strlen(c_s);
data_ = new char[t_len + 1];
len_ = t_len;
res_ = 0;
strcpy(data_, c_s);
}
all help is greatly appreciated. if i have not supplied a sufficient amount of information please notify me with what you want to know and i will gladly edit my post.
edit: the reason why i am not posting the full string.hpp and string.cpp is because it is rather large and i am not sure if you guys would like that.
You have to make a decision whether you will always store your strings internally terminated with a 0. If you don't store your strings with a terminating zero byte, your c_str function has to add one. Otherwise, it's not returning a C-string.
Your assign function doesn't 0 terminate. So either it's broken, or you didn't intend to 0 terminate. If the former, fix it. If the latter, check your c_str function to make sure it puts a 0 on the end.