Deep understanding of strcat and strlen functions - c++

We know that strcat() receives a pointer to a destination array and a pointer to a source string as parameters, and appends the source string to the destination. The destination array should be large enough to store the concatenated result. Recently I found out that, for small programs, strcat() can still appear to work as expected even when the destination array is not large enough to hold the second string. I searched Stack Overflow and found a couple of answers to this question. I want to go deeper and understand what exactly happens at the hardware level when I run the code below.
#include<iostream>
#include<iomanip>
#include<cmath>
#include<cstring>
using namespace std;
int main(){
char p[6] = "Hello";
cout << "Length of p before = " << strlen(p) << endl;
cout << "Size of p before = " << sizeof(p) << endl;
char as[8] = "_World!";
cout << "Length of as before = " << strlen(as) << endl;
cout << "Size of as before = " << sizeof(as) << endl;
cout << strcat(p,as) << endl;
cout << "After concatenation:" << endl;
cout << "Length of p after = " << strlen(p) << endl;
cout << "Size of p after = " << sizeof(p) << endl;
cout << "Length of as after = " << strlen(as) << endl;
cout << "Size of as after = " << sizeof(as) << endl;
return 0;
}
After running this code, the length of p is 12 and the size of p is 6. How can a string of that length physically be stored in an array of that size? The number of bytes in this array is limited, so does it mean that strlen(p) only looks for the null terminator and keeps counting until it finds it, ignoring the actual allocated size of the array? And does sizeof not care whether the last element of the array, reserved for the null character, actually holds a null character?

The array p is allocated in the function's stack frame, so strcat overflows the buffer p and continues writing to some other area of the stack. Typically it overwrites other local variables, the function's return address, etc. (keep in mind that on x86 the stack usually grows "downwards", i.e. towards lower addresses). This is the well-known "buffer overflow" vulnerability.
strlen cannot know the actual size of your buffer; it just looks for the 0 terminator. sizeof, on the other hand, is a compile-time operator that returns the array size in bytes.

You are writing outside the bounds of p and the behavior of your program is therefore undefined.
While the behavior is totally undefined, there are a couple of common behaviors that occur:
You overwrite some unrelated data. This could be other local variables, the function return address, etc. It's impossible to guess exactly what will get overwritten without examining the assembly generated by the compiler for that specific program. This can result in a severe security vulnerability since it can allow an attacker to inject their own code into your program's memory space and let them overwrite a function's return address to cause the program to execute their injected code.
The program crashes. This can happen if you write far enough past the end of the array to pass a memory page boundary. The program can try to write to a virtual memory address that the OS hasn't mapped to physical memory for your application. This results in the OS killing your application (with a SIGSEGV on Linux, for example). This will usually happen more often with dynamically allocated arrays than function-local arrays.

Related

C++ cout corruption

I am reading a file header using ifstream.
Edit: I was asked to put the full minimal program, so here it is.
#include <iostream>
#include <fstream>
using namespace std;
#pragma pack(push,2)
struct Header
{
char label[20];
char st[11];
char co[7];
char plusXExtends[9];
char minusXExtends[9];
char plusYExtends[9];
};
#pragma pack(pop)
int main(int argc,char* argv[])
{
string fileName;
fileName = "test";
string fileInName = fileName + ".dst";
ifstream fileIn(fileInName.c_str(), ios_base::binary|ios_base::in);
if (!fileIn)
{
cout << "File Not Found" << endl;
return 0;
}
Header h={};
if (fileIn.is_open()) {
cout << "\n" << endl;
fileIn.read(reinterpret_cast<char *>(&h.label), sizeof(h.label));
cout << "Label: " << h.label << endl;
fileIn.read(reinterpret_cast<char *>(&h.st), sizeof(h.st));
cout << "Stitches: " << h.st << endl;
fileIn.read(reinterpret_cast<char *>(&h.co), sizeof(h.co));
cout << "Colour Count: " << h.co << endl;
fileIn.read(reinterpret_cast<char *>(&h.plusXExtends),sizeof(h.plusXExtends));
cout << "Extends: " << h.plusXExtends << endl;
fileIn.read(reinterpret_cast<char *>(&h.minusXExtends),sizeof(h.minusXExtends));
cout << "Extends: " << h.minusXExtends << endl;
fileIn.read(reinterpret_cast<char *>(&h.plusYExtends),sizeof(h.plusYExtends));
cout << "Extends: " << h.plusYExtends << endl;
// This will output corrupted
cout << endl << endl;
cout << "Label: " << h.label << endl;
cout << "Stitches: " << h.st << endl;
cout << "Colour Count: " << h.co << endl;
cout << "Extends: " << h.plusXExtends << endl;
cout << "Extends: " << h.minusXExtends << endl;
cout << "Extends: " << h.plusYExtends << endl;
}
fileIn.close();
cout << "\n";
//cin.get();
return 0;
}
ifstream fileIn(fileInName.c_str(), ios_base::binary|ios_base::in);
Then I use a struct to store the header items
The actual struct is longer than this. I shortened it because I didn't need the whole struct for the question.
Anyway as I read the struct I do a cout to see what I am getting. This part is fine.
As expected my cout shows the Label, Stitches, Colour Count no problem.
The problem is that if I want to do another cout after it has read the header, I get corruption in the output, for instance if I repeat the same cout lines right after the above code (the block marked "This will output corrupted" in the full program).
Instead of seeing Label, Stitches and Colour Count, I get strange symbols and corrupt output. Sometimes you can see the output of h.label with some corruption, but the labels and Stitches are written over, sometimes with strange symbols and sometimes with text from the previous cout. I think either the data in the struct is getting corrupted or the cout output is getting corrupted, and I don't know why. The longer the header, the more apparent the problem becomes. I would really like to do all the couts at the end of the header, but if I do that I see a big mess instead of the expected output.
My question is why is my cout becoming corrupted?
Using fixed-size char arrays to store strings is dangerous because if you allocate 20 characters to store the label and the label happens to be 20 characters long, then there is no room to store a NUL (0) terminating character. Once the bytes are stored in the array, there's nothing to tell functions that expect null-terminated strings (like the operator<< used with cout) where the end of the string is.
Your label has 20 chars. That's enough to store the first 20 letters of the alphabet:
ABCDEFGHIJKLMNOPQRST
But this is not a null-terminated string. This is just an array of characters. In fact, in memory, the byte right after the T will be the first byte of the next field, which happens to be your 11-character st array. Let's say those 11 characters are: abcdefghijk.
Now the bytes in memory look like this:
ABCDEFGHIJKLMNOPQRSTabcdefghijk
There's no way to tell where label ends and st begins. When you pass a pointer to the first byte of an array that is, by convention, intended to be interpreted as a null-terminated string, the implementation will happily start scanning until it finds a null terminating character (0). On subsequent reuses of the structure, it may not find one inside the field! There's a serious risk of overrunning the buffer (reading past the end of the buffer), and potentially even the end of your virtual memory block, ultimately causing an access violation / segmentation fault.
When your program first ran, the memory of the header structure was all zeros (because you initialized with {}) and so after reading the label field from disk, the bytes after the T were already zero, so your first cout worked correctly. There happened to be a terminating null character at st[0]. You then overwrite this when you read the st field from disk. When you come back to output label again, the terminator is gone, and some characters of st will get interpreted as belonging to the string.
To fix the problem you probably want to use a different, more practical data structure to store your strings that allows for convenient string functions. And use your raw header structure just to represent the file format.
You can still read the data from disk into memory using fixed sized buffers, this is just for staging purposes (to get it into memory) but then store the data into a different structure that uses std::string variables for convenience and later use by your program.
For this you'll want these two structures:
#pragma pack(push,2)
struct RawHeader // only for file IO
{
char label[20];
char st[11];
char co[7];
char plusXExtends[9];
char minusXExtends[9];
char plusYExtends[9];
};
#pragma pack(pop)
struct Header // A much more practical Header struct than the raw one
{
std::string label;
std::string st;
std::string co;
std::string plusXExtends;
std::string minusXExtends;
std::string plusYExtends;
};
After you read the first structure, you'll transfer the fields by assigning the variables. Here's a helper function to do it.
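#include <algorithm>
#include <cstddef>
#include <string>
// Copies at most n chars, stopping early at the first NUL if there is one.
// std::find is used here because strnlen_s is an optional C11 Annex K
// function and is not available on every toolchain.
template <std::size_t n> std::string arrayToString(const char(&raw)[n]) {
return std::string(raw, std::find(raw, raw + n, '\0'));
}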
In your function:
Header h;
RawHeader raw;
fileIn.read((char*)&raw, sizeof(raw));
// Now marshal all the fields from the raw header over to the practical header.
h.label = arrayToString(raw.label);
h.st = arrayToString(raw.st);
h.co = arrayToString(raw.co);
h.plusXExtends = arrayToString(raw.plusXExtends);
h.minusXExtends = arrayToString(raw.minusXExtends);
h.plusYExtends = arrayToString(raw.plusYExtends);
It's worth mentioning that you also have the option of keeping the raw structure around and not copying your raw char arrays to std::strings when you read the file. But you must then be certain that, whenever you use the data, you always compute and pass the lengths of the strings to functions that will deal with those buffers as string data. (Similar to what my arrayToString helper does anyway.)

C++ sizeof C-style string / char array - optimization

I'm a student at university. I work mostly with Java, C++ is very new to me, so I probably make many silly mistakes and I have upcoming exams to cope with. Don't be too harsh with me.
Note: I can NOT use C++ std::string because I need to work with C-strings due to university tasks!
Referring to my studies and the question I asked about pointers and const arguments (which you find here) I tried messing around with memory management but it seems it has no effect, or I just misunderstood some aspects about sizeof or actual sizes of certain elements.
This is my class Person:
Person.cpp
using namespace std;
Person::Person()
{
Person::name = new (char[64]);
Person::adress = new (char[64]);
Person::phone = new (char[64]);
cout << "standard constructor called; object created, allocated " << sizeof(name) << "+" << sizeof(adress) << "+" << sizeof(phone) << "bytes" << endl;
}
Person::Person(const char *name, const char *adress , const char *phone)
{
Person::name = new (char[strlen(name)]);
Person::adress = new (char[strlen(adress)]);
Person::phone = new (char[strlen(phone)]);
setName(name);
setAdress(adress);
setPhone(phone);
cout << "general constructor called; object created, allocated " << sizeof(this->name) << "+" << sizeof(this->adress) << "+" << sizeof(this->phone) << "bytes" << endl;
};
Person::Person(Person const &other)
{
Person::name = new (char[strlen(other.getName())]);
Person::adress = new (char[strlen(other.getAdress())]);
Person::phone = new (char[strlen(other.getPhone())]);
setName(other.getName());
setAdress(other.getAdress());
setPhone(other.getPhone());
cout << "copy constructor called; object created, allocated " << sizeof(name) << "+" << sizeof(adress) << "+" << sizeof(phone) << "bytes" << endl;
};
Person::~Person()
{
delete [] name;
delete [] adress;
delete [] phone;
cout << "destructor called; object removed" << endl;
};
I tried to save memory by creating each C-string with the length of the given parameter.
My thinking was that, since a C-string is a char array, using fewer chars should use less memory, e.g. a C-string of "John" takes up less memory than a C-string of "Jonathan".
So now I'm not sure if I just got the wrong concept of C-strings or char arrays, or my implementation is just faulty.
In my main I create the following objects:
int main()
{
Person t;
t.printPerson();
cout << "size of t: " << sizeof(t) << endl;
Person p("John", "some street", "0736182");
p.printPerson();
cout << "size of p: " << sizeof(p) << endl;
Person x(p);
x.printPerson();
cout << "size of x: " << sizeof(x) << endl;
Person y("Jonathan", "Lancaster Ave 53", "3584695364");
y.printPerson();
cout << "size of y: " << sizeof(y) << endl;
cin.get();
};
But I always get a size of 24 per object, so 8 for each member variable. Why is that?
Thanks in advance.
I think you are expecting the sizeof operator to behave differently than it actually does. Let's take this code, for example:
const char* str = new char[137];
Here, if you write sizeof(str) you'll probably either get 4 or 8, depending on your system, because sizeof(str) measures the number of bytes of the pointer str itself rather than the number of bytes in the array pointed at by str. So, on a 32-bit system, you'd probably get 4, and on a 64-bit system you'd probably get 8, independently of how many characters you allocated.
Unfortunately, C++ doesn't have a way for you to get the number of characters or the memory used up by a dynamically allocated array. You just have to track that yourself.
Similarly, in your main function, when you write sizeof(p), you're measuring the number of bytes used by the object p, not the total number of bytes used by p and the arrays it points at. You'll always get back the same value for sizeof(p) regardless of what strings it points at.
If you're planning on working with strings in C++, I strongly recommend using std::string over raw C-style strings. They're much easier to use, they remember their length (so it's harder to mix up strlen and sizeof), and if you have a class holding a bunch of std::strings you don't need a copy constructor or assignment operator to handle the logic to shuffle them around. That would significantly clean up your code and eliminate most of the memory errors in it.
sizeof gives you the number of bytes which C/C++ needs to keep the object in memory. In your case (though you have not shown it) it looks like name, address, and phone are pointers to char:
struct Person {
char *name, *address, *phone;
};
A pointer is a variable which holds the address of another object. So, depending on the underlying system, it could occupy 32 bits (4 bytes) or 64 bits (8 bytes) (or some other number). In this case, on a 64-bit system, sizeof of struct Person will be 24 (3 pointers of 8 bytes each). This corresponds to your results.
sizeof provides you with a shallow size calculation. Your strings are pointed to by those pointers and their lengths are not included. So, potentially, you need to create a member function which will calculate those for you, i.e.
struct Person {
char *name, *address, *phone;
int getSize() {
return strlen(name) + strlen(address) + strlen(phone);
}
};
And as mentioned in the comments, every char* string in C/C++ must end with a terminating character ('\0'), which tells the program where the string ends. So when you allocate space for a string, you must leave room for it as well (+ 1 to the length), and you must make sure that this character is actually written. If you use library functions to copy strings, they will take care of it; otherwise you need to do it manually.
void setName(const char *n) {
name = new char[strlen(n) + 1]; // + 1 leaves room for the terminating '\0'
strcpy(name, n); // copies the characters and the terminating '\0'
}
If you use a loop to copy the chars instead of strcpy, you need to add it manually:
name[strlen(n)] = 0;

C++ (g++ compiler) Dynamically Initializing Statically Declared Array Leads to Memory Questions

We went over differing compiler behavior today in my data structures class. The example below (plus some of my cout'ing and other tinkering) was given as a program that could compile on g++ but not necessarily on all other compilers.
#include <iostream>
using namespace std;
int main(void)
{
int i;
int j = 5;
cout << "i address: " << &i << " -- j address: " << &j
<< endl << "enter size: ";
cin >> i;
cout << "value i: "<< i << "; size memory of i (ints): "<<(&i - &j)<< endl;
int a[i];
cout << "address of array a start: " << a << endl;
int b = 14;
cout << "value b: " << b << "; b address= " << &b << endl
<< "distance bt j and b(ints): "<< (&j - &b) << endl;
cout << "distance bt b and start of array(ints): " << (&b - a) << endl;
}
After playing around with inputs (and learning a little about how dynamic arrays are allocated memory in the process), I decided that entering 0 was the most interesting. The output:
i address: 0x7fff5b303764 -- j address: 0x7fff5b303760
enter size: 0
value i: 0; size memory of i (ints): 1
address of array a start: 0x7fff5b3036b0
value b: 14; b address= 0x7fff5b303754
distance bt j and b(ints): 3
distance bt b and start of array(ints): 41
My Questions:
How does g++ know to treat the array a as one which needs to be dynamically created vs immediately trying to statically create one with whatever the default value of i is or throwing some sort of compile time error?
The distance between j and b is 3 ints worth of memory instead of the expected 1. What is going on with that? I suspect, purely from empirical evidence gathered by playing around with the program, that it has something to do with the couts, but I'm unfamiliar with how/why they would be stored in memory in seemingly random amounts.
I entered 0 for size of array a and, based on playing around with different sizes, I think it is unlikely that a dynamic array of length 0 would be initialized to length 41. So, if it is not the array taking up all of those 41 ints worth of memory, what then is stored between b (the last data stored on the stack) and the array a (the first data purposefully stored on the heap) and why is whatever it is there?
a is a variable-length array, which is not part of standard C++ but is a g++ extension. g++ knows that it is not a regular array with a size determined at compile time because i is not a compile-time constant (as, for example, const int i = 3; would be). a is not stored on the heap; it is stored on the stack. I imagine the address distance between j and b depends on the size allocated to a, which can only be known at run time. I don't know how to account for the distance between b and a for size 0. If you built in debug mode, it's possible that some extra hidden buffer was added after the array in order to detect an accidental overwrite beyond the bounds of the array.

Difference between "size" and "capacity" in c++ string?

I have this snippet from Thinking in C++.
#include <iostream>
#include <string>
using namespace std;
int main ()
{
string bigNews("I saw Elvis in a UFO. ");
cout << bigNews << endl;
bigNews.insert(0, " thought I ");
cout << bigNews << endl;
cout << "Size = " << bigNews.size() << endl;
cout << "Capacity = "
<< bigNews.capacity() << endl;
bigNews.append("I've been working too hard.");
cout << bigNews << endl;
cout << "Size = " << bigNews.size() << endl;
cout << "Capacity = "
<< bigNews.capacity() << endl;
return 0;
}
And I get output as shown below:
I saw Elvis in a UFO.
thought I I saw Elvis in a UFO.
Size = 33
Capacity = 44
thought I I saw Elvis in a UFO. I've been working too hard.
Size = 60
Capacity = 88
I can figure out why the size increases, but I can't work out how the capacity increases.
What I know is that capacity is the string's buffer space that we can push_back into, but how is that space allocated?
capacity is the maximum number of characters that the string can currently hold without having to grow. size is how many characters actually exist in the string. The reason they're separate concepts is that allocating memory is generally inefficient, so you try to allocate as rarely as possible by grabbing more memory than you actually need at one time. (Many data structures use a "doubling" method where, if they hit their capacity of N and need more space, they will allocate 2*N space, to avoid having to reallocate again any time soon.)
capacity will increase automatically as you use the string and require more space. You can also manually increase it using the reserve function.
From the documentation:
capacity()
returns the number of characters that can be held in currently allocated storage
(public member function)
So, it is the allocation size of the internal buffer. What you see is its size doubling when it's exhausted -- this is a common technique for using dynamically-sized buffers efficiently, and it's called "exponential storage expansion". What it boils down to is basically this:
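#include <cstdlib>
// Doubles *cap until it can hold newsize, then reallocates the buffer.
// Assumes *cap > 0 and does not check realloc for failure.
void resize_buffer(char **buf, size_t *cap, size_t newsize)
{
while (newsize > *cap)
*cap *= 2;
*buf = static_cast<char *>(realloc(*buf, *cap));
}
(Of course, this is greatly simplified; don't use this as actual reallocation code in production.) Your implementation of std::string is probably using this trick, which is why you see the buffer size going up by 100%.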

C++ address handling (pointers)

I need to track my current location in a data buffer (which will be used as a packet), so I am using two variables, bufferLoc and dataBuffer.
char dataBuffer[8192];
char** bufferLoc;
I am pointing to the starting location of dataBuffer with bufferLoc. But incrementing bufferLoc does not affect its physical address in memory.
bufferLoc = (char**)&dataBuffer;
cout << &bufferLoc << endl;
bufferLoc++;
cout << &bufferLoc << endl;
These two prints will output the same location. Does my error have to do with type casting, with bufferLoc itself, or something completely different?
Thanks for your help.
If your intention is to scan through dataBuffer one byte at a time, then the second variable should be a pointer, not a pointer to a pointer.
char* bufferLoc;
then print it out without the ampersand:
cout << static_cast<void*>(bufferLoc) << endl;
Note that cout will try to print a char* as text unless you cast it to a non-character pointer type such as void*.
cout << &bufferLoc << endl;
prints the address of bufferLoc. This address is always the same. You can print the value stored in bufferLoc:
cout << bufferLoc << endl;
This value is initially the address of dataBuffer. After you increment it, the second print statement shows an address that is sizeof(char*) bytes greater (4 on a typical 32-bit system, 8 on a typical 64-bit one), because bufferLoc is declared as char**.
dataBuffer itself is an array of 8192 chars; in most expressions it converts to a pointer to its first element. What you want is to store that pointer:
char *bufferLoc = dataBuffer;
and increment it. Note that the type of bufferLoc is pointer to char, which is what dataBuffer decays to. After assigning dataBuffer to bufferLoc, you can print the first element like this: cout << bufferLoc[0] << endl.