Why strstr() - char pointer = number? - c++

I have this program:
#include <iostream>
#include <conio.h>
#include <string.h>
using namespace std;
int main()
{
char char1[30] = "ExtraCharacter", char2[30] = "Character", *p;
p = strstr(char1, char2);
cout << "p: " << p << endl;
cout << "char1: " << char1 << endl;
cout << "(p-char1): " << (p-char1) << endl;
return 0;
}
When I run it, I get:
p: Character
char1: ExtraCharacter
(p-char1): 5
as expected.
But this is not the problem, I'm not sure why "Character" - "ExtraCharacter" is an integer (5)? Perhaps not an integer, but a number/digit anyways.
Actually I don't understand why is "Character" stored in p, and not the memory address.
If I understood well from a book, strstr() returns a memory address, shouldn't it be more like a strange value, like a hex (0x0045fe00) or something like that? I mean, it's cout << p not cout << *p to display the actual value of that memory address.
Can someone explain me how it works?
P.S.: I apologize if the title is not that coherent.

But this is not the problem, I'm not sure why "Character" - "ExtraCharacter" is an integer (5)?
You subtract one pointer from another and result - number, distance from char char1 points to to char p points to. This is how pointer arithmetic works.
Note: this subtraction is only valid when both pointers point to the same array (or behind the last element), which is the case in your code, but you need to be careful. For example if strstr() does not find susbtring then it would return nullptr and your subtraction will have UB. So at least check p before subtracting (and passing nullptr to std::cout would have UB as well)
If I understood well from a book, strstr() returns a memory address, shouldn't it be more like a strange value, like a hex (0x0045fe00) or something like that? I mean, it's cout << p not cout << *p to display the actual value of that memory address.
Yes p is a pointer aka memory adress. std::ostream has special rule how to print pointers to char - as strings, because strings in C stored that way. If you want to see it as a pointer just cast it:
std::cout << static_cast<void *>( p );
then you will see it as an address.

To display address, you have to cast char* to void*:
std::cout << "p: " << static_cast<const void*>(p) << std::endl;
Demo

For std::basic_ostream (type of cout), character and character string arguments (e.g., of type char or const char*) are handled by the non-member overloads of operator<< which are being treated as strings. char[30] will be decayed to const char* argument and basic_ostream will output the null terminated string at the address of the pointer.
As for (p-char1), the result of subtracting two pointers is a std::ptrdiff_t. It is an implementation-defined signed integer. That's why the output is 5

Related

Why does printing the 'address of index n' of c style strings lead to output of substring

I'm rather new to C++ and while working with a pointer to a char array (C style string) I was confused by its behavior with the ostream object.
const char* items {"sox"};
cout << items << endl;
cout << items[0] << endl;
cout << *items << endl;
cout << &items << endl;
cout << &items[1] << endl;
Running this leads to:
sox
s
s
0x7fff2e832870
ox
In contrary to pointer of other data types, printing the variable doesn't output the address, but the string as a whole. By what I understand, this is due to the << operator being overloaded for char arrays to treat them as strings.
What I don't understand is, that cout << &items[1] prints the string from index 1 onward (ox), instead of the address of the char at index 1. Is this also due to << operator being overloaded or what is the reason for this behavior?
The type of &items[1] is const char *. Therefore the const char * overload of operator << is used, which prints the string from index 1 onwards.
OTOH, the type of &items is const char **, for which no specific overload exists, so the address of items is printed (via the const void * overload).
Back in the olden days, when C ran the world, there was no std::string, and programmers had to make do with arrays of char to manage text. When C++ brought enlightenment (and std::string), old habits persevered, and arrays of char are still used to manage text. Because of this heritage, you'll find many places where arrays of char act differently from arrays of any other type.
So,
const int integers[] = { 1, 2, 3, 4 };
std::cout << integers << '\n';
prints the address of the first element in the array.
But,
const char text[] = { 'a', 'b', 'c', '\0' };
std::cout << text << '\n';
prints the text in the array text, up to the final 0: abc
Similarly, if you try to print addresses inside the array, you get different behavior:
std::cout << &integers[1] << '\n';
prints the address of the second element in th array, but
std::cout << &text[1] << '\n';
prints the text starting at the second character of the array: bc
And, as you suspected, that's because operator<< has an overload that takes const char* and copies text beginning at the location pointed to by the pointer, and continuing up to the first 0 that it sees. That's how C strings work, and that behavior carries over into C++.
items[1] is the second character of the array and its address, i.e. &items[1], is a pointer to the second character (with index 1) as well. So, with the same rule that you have mentioned for operator <<, the second character of the string till the end is printed.

Printing out the value of pointer to the first index of an char array

I'm new to C++ and is trying to learn the concept of pointer. When I tried to print out the value of pStart, I was expecting its value to be the address of text[0] in hexdecimal (e.g. something like 0x7fff509c5a88). However, the actual value printed out is abcdef.
Could someone explain it to me why this is the case? What parts am I missing?
char text[] = "abcdef";
char *pStart = &text[0];
cout << "value of pStart: " << pStart << endl;
Iostreams provide an overload that assumes a pointer to char points to a NUL-terminated (C-style) string, and prints out the string it points to.
To get the address itself to print out, cast it to a pointer to void instead:
cout << "value of pStsart: " << (void *)pStart << "\n";
Note that you don't really need pStart here at all though. The name of an array (usually, including this case) evaluates to the address of the beginning of the array, so you can just print it directly:
cout << "address of text: " << (void *)text << "\n";
Get out of the habit of using endl as well. It does things you almost certainly don't realize and almost never want.

std::cout << cstring; prints value of cstring elements, not cstring hex address. Why?

I understand that an array of chars is different to a cstring, due to the inclusion of a suffixing \0 sentinel value in a cstring.
However, I also understand that, in the case of a cstring, an array of chars, or any other type of array, the array identifier in the program is a pointer to the array.
So, below is perfectly valid.
char some_c_string[] = "stringy";
char *stringptr;
stringptr = some_c_string; // assign pointer val to other pointer
What I don't understand is why std::cout automatically assumes I want to output the value of each element in either a cstring, or an array of chars, rather than the hex address. For example:
char some_c_string[] = "stringy"; // got a sentinel val
char charArray[5] = {'H','e','l','l','o'}; // no space for sentinel val \0
char *stringptr;
stringptr = some_c_string;
int intArray[3] = {1, 2, 4};
cout << some_c_string << endl << charArray << endl
<< stringptr << endl << intArray << endl;
Will result in the output:
stringy
Hello
stringy
0xsomehexadd
So for the cstring and the char array, std::cout has given me the value of each element, rather than the hex address like with the int array.
I guess this became a standard in C++ for convenience. But can someone please expand on 1) When this became standard. 2) How std::cout differentiates between char/cstrings and other arrays. I guess it uses sizeof() to see it's is an array of single bytes, and that value of each array element is an ASCII int value to identify an array of chars/cstring.
Thanks! :D
There is nothing fancy going on. The operator<< has a special overload for char*, so that you can do std::cout << "Hello World";. It's been like that since day 1 of c++.
For anything besides char*, the pointer address is displayed as hex.
If you want to display the address of a char*, simply cast it to void*, ie
std::cout << (void*)"Hello World";

Why is a char pointer dereferenced automatically in a dynamic array [duplicate]

This question already has answers here:
Why does cout print char arrays differently from other arrays?
(4 answers)
Closed 7 years ago.
Perhaps a stupid question. When I cout the pointer to the char array, I thought it would print an address; instead it dereferences the address and prints the actual values till null.
As opposed to an int array where it does what I expect it to. It prints the address of the first element.
Why does the char element gets dereferenced when you print the pointer.
char* as = new char[100];
as[0] = 'a';
as[1] = 'b';
as[2] = NULL;
cout << as << endl;
int* s = new int[100];
s[0] = 2;
cout << s << endl;
Asking this because when I try to get the address to the first char element a[0] = 'a';. I have to store it in a pointer to a pointer. Which seems weird to me but that's besides the point.
char ** d = &as;
cout << d << "this is d" << endl;
There is no overloaded output operator << that prints the address for any char pointer, it treats all char pointers as strings. If you want to print the address of a pointer, you need to cast it to void*
std::cout << "Address of string is " << static_cast<void*>(as) << '\n';
On a side-note, the code
char ** d = &as;
cout << d << "this is d" << endl;
will not print the address of the string, i.e. the pointer contained inside as, instead it will print where the variable as is stored in memory. Not quite the same thing.
char* character pointers are considered to be C-style null terminated strings by the std::ostream << operator. You are right that this is a different behavior from other pointer types.
&a is not a pointer to a[0] . It is a pointer to a which is itself a pointer. a is in fact the pointer to a[0] and is equivalent to &a[0].
IOStreams treat char* (and const char*) specially, so that you can print C-strings without further effort:
std::cout << "hello world\n";
(Bear in mind that the string literal expression decays immediately to a const char* when passed to operator<<.)
If you do not want this behaviour, you can cast to void*:
char* as = new char[100];
as[0] = 'a';
as[1] = 'b';
as[2] = NULL;
cout << (void*)as << endl;
Your "fix" is actually broken, because you are printing the address of the pointer as, not the address of the array elements that as points to. This is indicated by the char** type, which you already noticed.
It prints the string because that's what the definition of that particular operator<< overload does. Cast to void * if you want to print an address:
cout << static_cast<void *>(as) << endl;
The << operator is overloaded to take a char* and output its contents where as there is no such overload for a int*/int[].

why are the addresses of these variables printing as ab# and b#?

I run the follow code:
#include <iostream>
using namespace std;
typedef struct Test
{
char a;
char b;
int i;
double d;
}Test;
int main()
{
Test test;
test.a = 'a';
test.b = 'b';
test.i = 478;
test.d = 4.7;
cout << &test.a << '\n'
<< &test.b << '\n'
<< &test.i << '\n'
<< &test.d << '\n';
return 0;
}
The output is:
ab#
b#
0x28fe94
0x28fe98
At first, i thought it is a result of the precedence between & and ..
But the 0x28fe94 and 0x28fe94 indicate it's not the problem of precedence.
I can figure out what does the ab# and b# mean?
When you write
cout << &test.a
because test.a is a char, this will invoke the operator<< (ostream&, const char*) overload, thinking that your char* is a pointer to a C-style string rather than a pointer to just one character. Consequently, the operation will start reading bytes starting at the memory address of &test.a until it finds a null terminator (a zero byte). This happens to print out ab#, since a has value 'a', b has value 'b', and the number 478, on your system, happens to correspond to an # character followed eventually by a null byte.
If you want to see the numeric addresses of test.a and test.b, cast the pointers to void*s, which will select the operator<< (ostream&, const void*) overload. For example:
cout << static_cast<void*>(&test.a)
Hope this helps!
It means undefined behaviour. There's a special overload for const char * that prints it as a null-terminated string. Yours has no terminator, so it goes beyond and triggers the aforementioned UB. Fix it with a cast:
std::cout << static_cast<void *>(&test.a) << '\n'
<< static_cast<void *>(&test.b) << '\n'
<< &test.i << '\n'
<< &test.d << '\n';
Because you are passing the address of a character when you use &test.a and &test.b, the cout operator is treating the addresses as the start of a string. That's why you get first ab# and then b#. The # is less obvious; it falls in the padding after the two char values and before the int value in the structure, and is quasi random junk. You're actually invoking undefined behaviour because you're asking the output code to access the padding. If you want to print addresses, cast to void *. If you want to print characters, don't supply the address.