I was reading some code and I came across this example. What I don't understand is why the author uses an offset of 1 from both variables on the last line. At first glance I would assume this is illegal because it is referring to a possibly uninitialized memory area (and it could cause a segmentation fault). My head keeps telling me undefined behavior but is this really so?
static bool lt(wchar_t a, wchar_t b)
{
const std::collate<wchar_t>& coll =
std::use_facet< std::collate<wchar_t> >(std::locale());
return coll.compare(&a, &a+1, &b, &b+1) < 0;
}
The last line is the one in question. Why is the author doing this, is it legal, and when should it be done?
It appears that the author just wanted to compare two characters using the current global locale.
Since std::collate<T>::compare uses [low, high) for the two ranges, adding 1 to the address of the parameters will simply cause the comparison to stop after only a is compared to b. There should be no invalid memory accesses.
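To illustrate the half-open range idea, here is a minimal sketch. The helper names (range_length, single_object_length, lt_classic) are hypothetical and not part of the original code, and the classic "C" locale is used instead of the global one so that the result does not depend on the environment.

```cpp
#include <cstddef>
#include <locale>

// Hypothetical helper: number of elements in the half-open range [low, high).
std::ptrdiff_t range_length(const wchar_t* low, const wchar_t* high)
{
    return high - low;
}

// A single object behaves like an array of one element, so [&c, &c + 1)
// contains exactly one element. &c + 1 is computed but never dereferenced.
std::ptrdiff_t single_object_length(wchar_t c)
{
    return range_length(&c, &c + 1);
}

// Same idea as lt() in the question, but with the classic "C" locale
// (an assumption made here to keep the result environment-independent).
bool lt_classic(wchar_t a, wchar_t b)
{
    const std::collate<wchar_t>& coll =
        std::use_facet<std::collate<wchar_t> >(std::locale::classic());
    // compare() only reads the elements inside [low, high), i.e. *(&a) and *(&b).
    return coll.compare(&a, &a + 1, &b, &b + 1) < 0;
}
```

For example, single_object_length(L'x') yields 1, and lt_classic(L'a', L'b') is expected to be true.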
Which book are you reading?
It also depends on what you are comparing.
Sometimes you need to compare an ID that sits at the beginning of a buffer and has a certain size.
Testing your function
#include <locale>
static bool lt(wchar_t a, wchar_t b)
{
const std::collate<wchar_t>& coll =
std::use_facet< std::collate<wchar_t> >(std::locale());
return coll.compare(&a, &a+1, &b, &b+1) < 0;
}
int main () {
bool b = lt('a', 'b');
return 0;
}
Inside the debugger
Breakpoint 1, main () at test.cpp:13
13 bool b = lt('a', 'b');
(gdb) s
lt (a=97 L'a', b=98 L'b') at test.cpp:6
6 std::use_facet< std::collate<wchar_t> >(std::locale());
(gdb) p &a
$1 = 0x7fffffffdddc L"a\001翿\x400885"
(gdb) p &a+1
$2 = 0x7fffffffdde0 L"\001翿\x400885"
From this I believe
the code is legal
but &a + 1 is referring to possibly uninitialized memory
From what gdb returns I tend to think taking the address of a wchar_t returns a char* thus &a (a is a wchar_t) is a char* to the start of the multibyte variable that is a and &a+1 returns pointer to the second byte. Am I correct?
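One way to check this guess is to measure how far &a + 1 is from &a. In the gdb session above the two addresses differ by 4 bytes, which suggests a whole wchar_t rather than a single byte. The sketch below uses a hypothetical helper, byte_step, to show that the address of a wchar_t has type wchar_t* and that adding 1 advances by sizeof(wchar_t) bytes.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: distance in bytes between &a and &a + 1.
std::size_t byte_step(wchar_t a)
{
    // The address of a wchar_t is a wchar_t*, not a char*.
    static_assert(std::is_same<decltype(&a), wchar_t*>::value,
                  "&a must have type wchar_t*");

    wchar_t* first = &a;
    wchar_t* past  = first + 1; // one whole wchar_t further, not one byte

    // Viewing both addresses as char* lets us count the bytes between them.
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>(past) -
        reinterpret_cast<const char*>(first));
}
```

byte_step(L'a') should equal sizeof(wchar_t) on any platform, whatever that size happens to be.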
I'm trying to debug a small C++ program using gdb, but may be getting hung up on some pointer arithmetic:
A* get(int) returns a pointer to an instance of a class A I've defined. Internally, get(int) references an array of A, returning:
class A_list {
private:
A* A_array;
int count;
public:
A_list(int c): count(c) { A_array = new A[c]; }
void insertAt(A a, int idx) {
A_array[idx] = a;
}
A* get(int);
};
A* A_list::get(int idx) {
...
A* result = A_array + idx;
return result;
}
presumably, when dealing with an array of A, I can simply add the index (times the size of a single A) to get the address of the idx'th.
This seems to work as expected. However, when calling get(int) from within another member function of A_list, I watch the value assignment in gdb and see two different values:
void A_list::foo() {
A* a = nullptr; // I declare my pointer, and initialize to 0x0
...
a = get(0); // I store the address of `A_array[0]`
The gdb watchpoint outputs:
Old value = (Number *) 0x0
New value = (Number *) 0x55555556b2c0
However, when I print the address stored in a, I get a completely different value, with an unrecognized message attached.
(gdb) p a
(Number *) 0x7ffff7b4e5c0 <_IO_file_overflow+256>
attempting to dereference any of the member values gives unexpected results
I can't find <_IO_file_overflow+256> defined anywhere in the gdb sources. What does it mean?
Why might the value stored in a appear to be different from the value returned when get() is called from inside a member function of A_list? From outside (eg - in main()) I get the expected value.
Edit 9-08:
Changed assignment in get() based on feedback. Still getting the same arbitrary address when I return from the get() function.
When doing pointer arithmetic, it's done in elements and not in units of bytes.
Therefore the multiplication with sizeof(A) is invalid and wrong: The expression A_array + (idx * sizeof(A)) should be plain A_array + idx.
Or you could be explicit and return &A_array[idx].
All this means that for any pointer or array a and (valid) index i, the expression *(a + i) is exactly the same as a[i]. And from that follows that &a[i] will be exactly the same as a + i.
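The equivalence can be checked with a small sketch. The struct A below is only a hypothetical stand-in for the class in the question, and both helper functions are made up for illustration.

```cpp
#include <cstddef>

// Hypothetical stand-in for the class A from the question.
struct A {
    int    x;
    double y;
};

// True when items + i and &items[i] name the same element (i must be < 4).
bool index_matches_arithmetic(std::size_t i)
{
    A items[4] = {};
    return (items + i == &items[i]) && ((*(items + i)).x == items[i].x);
}

// Number of bytes the address moves when 1 is added to an A*.
std::size_t stride_in_bytes()
{
    A items[2] = {};
    return static_cast<std::size_t>(
        reinterpret_cast<char*>(items + 1) -
        reinterpret_cast<char*>(items));
}
```

stride_in_bytes() returns sizeof(A), which shows that the compiler already scales the index by the element size.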
To answer your last question -- <_IO_file_overflow+256> is telling you gdb's best guess as to what that address (0x7ffff7b4e5c0) refers to -- in this case, the address is pointing into some shared library, and the symbol _IO_file_overflow is the closest symbol defined in that library (and specifically, this address is 256 bytes past that symbol). This looks to be part of libc.
You can get more detail about what addresses correspond to what in your program by examining the file /proc/<pid>/maps -- you just need to know the pid of the process you are debugging and you can look at that file in another window.
As to why you're getting this odd value when it looks like you've just assigned a different value, it may be that gdb is getting confused and you have another a defined somewhere and gdb is printing that. Or you may have incomplete/incorrect debugging info in your program -- make sure that you compile with -O0 -g if you want accurate debug info.
I have a few questions. This isn't homework. I just want to understand better.
So if I have
int * b = &k;
Then k must be an integer, and b is a pointer to k's position in memory, correct?
What is the underlying "data type" of b? When I output it, it returns things like 0x22fe4c, which I assume is hexadecimal for memory position 2293324, correct?
Where exactly is memory position '2293324'? The "heap"? How can I output the values at, for example, memory positions 0, 1, 2, etc?
If I output *b, this is the same as outputting k directly, because * somehow means the value pointed to by b. But this seems different from the declaration of b, which was declared int * b = &k, so if * means "value of", then doesn't this mean "declare b to the value of k"? I know it doesn't, but I still want to understand exactly what this means language-wise.
If I output &b, this is actually returning the address of the pointer itself, and has nothing to do with k, correct?
I can also do int & a = k; which seems to be the same as doing int a = k;. Is it generally not necessary to use & in this way?
1- Yes.
2- There's no "underlying data type". It's a pointer to int. That's its nature. It's as much a data type as "int" or "char" in C/C++.
3- You shouldn't even try to output the contents of memory that you didn't allocate. Doing so is undefined behavior and may cause a segmentation fault. You can try it by doing b-- (which makes b point to the "int" before its actual position, or at least to what your program thinks is an int).
4- * with pointers is an operator. After a type name, it forms another data type (a pointer to that type). It's like the = symbol: it has one meaning in == and another in =. The symbol doesn't necessarily correlate with its meaning.
5- &b is the address of b. It is related to k as long as b points to k. For example, (**(&b)) is the value pointed to by the value pointed to by the address of b, which is k, provided you haven't changed b, of course.
6- int & a = k makes a a reference to k. a will be, for all purposes, k. If you do a=1, k will be 1. Both names refer to the same thing.
Open to corrections, of course. That's how I understand it.
In answer to your questions:
Yes, b is a pointer to k: It contains the address of k, but not the value of k itself.
The "data type" of b is int* (pointer to int): Essentially, this tells us that the address stored in b is the address of an int. b itself just holds an address.
Don't try to manually access memory at a specific address: Memory is allocated based on the size of each object, so addresses are laid out to leave room for objects next to each other in memory, and reading or changing arbitrary locations by hand is a bad idea.
* In this case is a de-reference to b. As I've said, b is a memory address, but *b is what's at b's address. In this case, it's k, so manipulating *b is the same as manipulating k.
Correct, &b is the address of the pointer, which is distinct from both k and b itself.
Using int & a = k is creating a reference to k, which may be used as if it were k itself. This case is trivial, however, references are ideal for functions which need to alter the value of a variable which lies outside the scope of the function itself.
For instance:
void addThree(int& a) {
a += 3;
}
int main() {
int a = 3; //'a' has a value of 3
addThree(a); //adds three to 'a'
a += 2; //'a' now has a value of 8
return 0;
}
In the above case, addThree takes a reference to a, meaning that the value of int a in main() is manipulated directly by the function.
This would also work with a pointer:
void addThree(int* a) { //Takes a pointer to an integer
*a += 3; //Adds 3 to the int found at the pointer's address
}
int main() {
int a = 3; //'a' has a value of 3
addThree(&a); //Passes the address of 'a' to the addThree function
a += 2; //'a' now has a value of 8
return 0;
}
But not with a copy-constructed argument:
void addThree(int a) {
a += 3; //The local copy 'a' now has a value of 6.
}
int main() {
int a = 3; //'a' has a value of 3
addThree(a); //'a' still has a value of 3: The function won't change it
a += 2; //a now has a value of 5
return 0;
}
They are complements of each other. * either declares a pointer or dereferences it. & either declares an (lvalue) reference or takes the address of an object or builtin type. So in many cases they work in tandem. To make a pointer to an object you need its address. To use a pointer as a value you dereference it.
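The two roles of each symbol can be put side by side in a short sketch. Both function names below are hypothetical and exist only for illustration.

```cpp
// Modifies k through a pointer and through a reference, then returns k.
int modify_through_pointer_and_reference()
{
    int k = 1;
    int* b = &k; // '*' in the declaration: b is a pointer; '&' here: address of k
    int& r = k;  // '&' in the declaration: r is another name for k
    *b += 10;    // '*' in an expression: dereference b, so k becomes 11
    r += 100;    // writes to k through the reference, so k becomes 111
    return k;
}

// '&' and '*' undo each other when applied back to back.
bool address_round_trip()
{
    int k = 0;
    int* b = &k;
    int& r = k;
    return (*&b == b)   // address of b, dereferenced, is b again
        && (&*b == &k)  // b dereferenced is k, whose address is &k
        && (&r == &k);  // a reference has the same address as its target
}
```

modify_through_pointer_and_reference() returns 111, since both the pointer and the reference act on the same int.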
3 - If k is a local variable, it's on the stack. If k is a static variable, it's in the data section of the program. The same applies to any variable, including b. A pointer would point to some location in the heap if new, malloc(), calloc(), ... , is used. A pointer would point to the stack if alloca() (or _alloca()) is used (alloca() is similar to using a local variable length array).
Example involving an array:
int array_of_5_integers[5];
int *ptr_to_int;
int (*ptr_to_array_of_5_integers)[5];
ptr_to_int = array_of_5_integers;
ptr_to_array_of_5_integers = &array_of_5_integers;
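The difference between the two pointer types shows up when 1 is added to each. The following sketch, with hypothetical helper names, measures the step in bytes for both.

```cpp
#include <cstddef>

// Bytes skipped when incrementing a pointer to int.
std::size_t step_of_ptr_to_int()
{
    int array_of_5_integers[5] = {};
    int* ptr_to_int = array_of_5_integers; // array decays to pointer to element 0
    return static_cast<std::size_t>(
        reinterpret_cast<char*>(ptr_to_int + 1) -
        reinterpret_cast<char*>(ptr_to_int));
}

// Bytes skipped when incrementing a pointer to the whole array.
std::size_t step_of_ptr_to_array()
{
    int array_of_5_integers[5] = {};
    int (*ptr_to_array_of_5_integers)[5] = &array_of_5_integers;
    // + 1 yields the one-past-the-end pointer for the whole array object.
    return static_cast<std::size_t>(
        reinterpret_cast<char*>(ptr_to_array_of_5_integers + 1) -
        reinterpret_cast<char*>(ptr_to_array_of_5_integers));
}
```

The first helper returns sizeof(int) and the second 5 * sizeof(int), although both pointers start at the same address.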
The following code comes from Arduino's "Print.cpp":
size_t Print::print(const __FlashStringHelper *ifsh)
{
PGM_P p = reinterpret_cast<PGM_P>(ifsh);
size_t n = 0;
while (1) {
unsigned char c = pgm_read_byte(p++);
if (c == 0) break;
n += write(c);
}
return n;
}
The __FlashStringHelper is basically a char array or string literal which is stored in PROGMEM (program memory) instead of RAM. It is declared as in WString.h:
class __FlashStringHelper;
#define F(string_literal) (reinterpret_cast<const __FlashStringHelper *>(PSTR(string_literal)))
I am interested in the unsigned char c = pgm_read_byte(p++); line and more specifically the (p++) part. I assume that the value of the pointer p is read here and is also incremented by one byte here so that all the bytes in *ifsh can be read one after the other. Am I correct with that?
So I have two questions concerning the above:
What is the sequence in normal C/C++ compilers and what is the sequence in Arduino? Is the value first read and then the pointer incremented? Or is it the other way around?
Is the sequence always the same for C/C++ compilers or will it differ between them?
The expression pgm_read_byte(p++) is equivalent to
pgm_read_byte(p);
p += 1;
And all C or C++ compilers that follow the standard will behave that way.
It's defined by C++ standard and:
It's the same (in this case it's post increment so it will happen after reading value from p)
Always the same.
Regarding the second question, this is from c89 standard:
The result of the postfix ++ operator is the value of the operand. After the result is obtained, the value of the operand is incremented.
I somehow believe that this is true for all newer c versions as well as cpp.
Arduino uses the avr-gcc compiler. So I guess you can safely assume:
A = p++ is equal to A = p; p++;
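The same read-then-advance pattern works on a string in ordinary memory. The sketch below is a hypothetical analogue of the loop in Print::print, with a plain dereference in place of pgm_read_byte and a counter in place of write().

```cpp
#include <cstddef>

// Hypothetical analogue of the Print::print loop for a string in RAM:
// counts the characters before the terminating '\0'.
std::size_t count_chars(const char* p)
{
    std::size_t n = 0;
    while (true) {
        // *p++ yields the character p currently points to;
        // p itself then points at the next character.
        unsigned char c = static_cast<unsigned char>(*p++);
        if (c == 0) break;
        ++n;
    }
    return n;
}
```

count_chars("abc") returns 3: each iteration sees the old value of p, so no character is skipped.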
The post-increment operator (i.e. variable++) will increment the variable but return the old value of the variable. It is equivalent to:
SomeType PostIncrement(SomeType& v)
{
SomeType oldValue = v;
v = v + 1;
return oldValue;
}
Example:
int x = 5;
int y = x++;
// Here x is 6 while y is 5 (i.e. the old value of x)
When you use post-increment on a variable you pass in a function call, the function will be called with the old value. Still the increment is done before the function call because the arguments will be fully evaluated before the call.
Example:
void f(int y)
{
// When execution arrives here, x is 6 while y is 5 (i.e. the old value of x)
}
int main()
{
int x = 5;
f(x++);
return 0;
}
I'm a programming and c++ novice. I'd appreciate some help with this.
the following program (in c++) doesn't encounter any problem either in compilation or run-time:
int main()
{
int b = 5;
int*a = &b;
*(a+5) = 6;
return 0;
}
But according to everything I learned it shouldn't work, because a is a pointer to a single variable. What am I missing here?
Your program should indeed not encounter any problem at compile time. It is all valid code with regards to compilation.
However it will encounter undefined behaviour at runtime as a+5 is not a valid address.
If you want to know why it should compile, you can write code like this:
#include <cstddef> // for size_t
void func( int * buf, size_t size )
{
for( size_t i = 0; i < size; ++i )
{
*(buf + i) = static_cast<int>(i); // or (int)i in C
}
}
int main()
{
int buf[ 6 ];
func( buf, 6 );
}
In your code a is a pointer to memory. a + 5 means an address 5 "ints" on from where a points. As a was pointed at a single integer b, there are no guarantees about such an address. Interestingly enough, it is well defined to refer to a+1 even though it points to a place in memory that you should not read from or write to. But the pointer itself has some guarantees, i.e. it will be greater than a and if you subtract 1 from it you will get back to a and if you do a ptrdiff between it and a you will get 1. But that is just a special property of "one past the end" which allows programmers to specify memory ranges.
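The guarantees listed above for the "one past the end" pointer can be written down as a sketch. The function names are hypothetical. Note that the end pointer is only compared and subtracted, never dereferenced.

```cpp
#include <cstddef>

// Checks the guarantees for the one-past-the-end pointer of a single int.
bool one_past_end_properties()
{
    int b = 5;
    int* a = &b;
    int* end = a + 1;              // valid to form, not valid to dereference
    std::ptrdiff_t diff = end - a; // distance in elements
    return (end > a) && (end - 1 == a) && (diff == 1);
}

// Sums the ints in the half-open range [first, last).
int sum_range(const int* first, const int* last)
{
    int total = 0;
    for (const int* p = first; p != last; ++p) {
        total += *p; // p never equals last here, so last is never read
    }
    return total;
}

// A single int used as a one-element range.
int sum_single(int value)
{
    return sum_range(&value, &value + 1);
}
```

sum_single(7) returns 7, because the loop stops as soon as p reaches the end pointer.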
The program does have undefined behaviour:
int main()
{
//Entering main() reserves storage for the variable b.
//It could be in a memory page that was already allocated to the program
//or in a newly allocated page.
int b = 5;
//Here a just gets the address of variable b.
int*a = &b;
//This is the undefined behavior, and it typically ends in one of two ways:
// 1. If (a+5) is in a memory region that is mapped into the application,
// no runtime error happens and the value is written there,
// probably clobbering some other value, which can cause misbehavior later
// in the application's execution.
// 2. If (a+5) is in a memory region that isn't mapped into the application,
// the application will crash.
*(a+5) = 6;
return 0;
}
Now, since a page size is probably 4096 bytes and b is somewhere within a page, a+5 is in most cases still in the same page. If you want to challenge it more, change it from 5 to 5000 or higher and the chance of a crash will increase.
Yes, it shouldn't work, because you are accessing memory that does not belong to your object. The address (a + 5) may happen to lie in a region that is mapped into your process, in which case no illegal memory access is reported at run time, or it may not, in which case the program crashes. Hence it's UB.
Just adding to the existing answers.
The access *(a+5) is the same as a[5].
So this is a location not allocated by you.
In the case of array say
int a[6];
You have a valid access from a[0] to a[5] where a[5] is the last element of the array and any further access like a[6] will lead to undefined behavior as that location is not allocated by you.
Similarly, you just have an integer allocated like
int b=5;
int *a = &b;
a is a pointer holding &b, i.e. the address of b.
So the valid access for this is just a[0] which is the only location allocated by you on the stack.
Any other access like a[1] a[2]... and so on will lead to undefined behavior.
The access turns out to be VALID if you have something like
int b[6];
int *a = b;
Now a[5] will give the value of the last element of the array b
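The contrast between the two cases can be sketched as follows. checked_store is a hypothetical helper that takes the element count explicitly, because a bare int* carries no size information. It refuses the write instead of invoking undefined behavior.

```cpp
#include <cstddef>

// Valid case: a points at the first of six ints, so a[5] is the last element.
int last_element_via_pointer()
{
    int b[6] = {10, 11, 12, 13, 14, 15};
    int* a = b;
    return a[5];
}

// Hypothetical helper: stores value at a[index] only if index < size.
bool checked_store(int* a, std::size_t size, std::size_t index, int value)
{
    if (index >= size) {
        return false; // would be out of bounds, so nothing is written
    }
    a[index] = value;
    return true;
}

// A pointer to a single int behaves like an array of size 1.
bool store_into_single_int(std::size_t index)
{
    int b = 5;
    int* a = &b;
    return checked_store(a, 1, index, 6);
}
```

store_into_single_int(0) succeeds, while store_into_single_int(5) is rejected, mirroring the *(a+5) = 6 case from the question.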
#include <stdio.h>
#include <string.h>
#include <conio.h>
#include <iostream>
using namespace std;
char a[21]; // If this is put inside the function unter -> junk output
char* b="";
void unter()
{
char *new_str = "";
strcpy(a, new_str);
char str_temp[10]="";
int chnum =0, neighbor1=3, neighbor2=5, mynode=4;
sprintf(str_temp,"%d",chnum);
b = strcat(a, str_temp);
b = strcat(b, "_from");
sprintf(str_temp,"%d",mynode);
b = strcat(b, str_temp);
b = strcat(b, "to");
sprintf(str_temp,"%d",neighbor1);
b = strcat(b, str_temp);
}
int main()
{
unter();
cout << a;
cout << b;
std::cin.get();
}
This is my code in C++. I'm not sure how the character array 'a' also has the same values as 'b'. And moreover, when I declare 'a'
char a[21];
inside function unter(), I'm getting some junk value for 'b'(as output). Care to explain how?
a is a char array and b is a pointer that points to a, so when printing them, they always print the same thing. When you move the declaration for a into unter, it is destroyed when unter returns, leaving b a dangling pointer, so you get garbage when you print it.
b = strcat(a, str_temp);
is probably what's causing your issue, since the return value from strcat() is the first parameter that was passed to it, hence why you're seeing a and b becoming equal, since b is getting set to a in that call.
strcat() returns a pointer to the destination string, i.e. its first argument, so
b = strcat(a, str_temp);
results in b pointing to the array a[]. The subsequent strcat() operations effectively do the same, so the end result is that b points to a[] as you observe. If you declare a[] inside unter() it will have local scope to that function, with the result that the global variable b will point to random/undefined memory contents after you exit the call to unter().
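That strcat() hands back its first argument can be verified directly. The function name below is hypothetical, and the buffer size of 21 is taken from the question.

```cpp
#include <cstring>

// Checks that the pointer returned by strcat is the destination buffer itself.
bool strcat_returns_destination()
{
    char a[21] = "";                 // same size as in the question
    char* b = std::strcat(a, "0");   // b now points at a[0]
    b = std::strcat(b, "_from");     // appends to a again, returns a again
    return (b == a) && (std::strcmp(a, "0_from") == 0);
}
```

The function returns true: after the first call, b and a refer to the same characters.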
It's mildly worth noting that you're doing a lot of work that could be accomplished more easily with
sprintf(a, "%d_from%dto%d", chnum, mynode, neighbor1);
You can do the whole concatenation and sprintf's in a single line.
char* b="";
void unter()
{
int chnum =0, neighbor1=3, neighbor2=5, mynode=4;
char str_temp[21];
sprintf(str_temp,"%d_from%dto%d", chnum, mynode, neighbor1);
b = new char[strlen(str_temp)+1];
b = strcpy(b, str_temp);
}
The only catch is that you must remember to delete[] b when you are done. The other option is to use the buffer a and sprintf directly into it:
char a[21];
void unter()
{
int chnum =0, neighbor1=3, neighbor2=5, mynode=4;
sprintf(a,"%d_from%dto%d", chnum, mynode, neighbor1);
}
When you define a inside the function, memory for the variable a is allocated on the stack. This memory is released when the function exits. Your pointer b is pointing to the starting address of a. Now, if you try to access b outside the function, it is pointing to a memory location which has already been destroyed and may contain garbage values. Basically, b becomes a dangling pointer.
If you declare a inside the unter() function, then it is only scoped inside that function. Attempting to print b from outside the function will print junk, since it is pointing to a, which has already been destroyed.
This is a classic example of why you should always make sure not to point to a local variable from a global one.
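One way to avoid the dangling pointer altogether, sketched here as an alternative rather than as what the original code does, is to return a std::string by value. The function name make_label is hypothetical.

```cpp
#include <string>

// Hypothetical replacement for unter(): builds the label and returns it by value,
// so no pointer to a local buffer ever escapes the function.
std::string make_label(int chnum, int mynode, int neighbor1)
{
    return std::to_string(chnum) + "_from" +
           std::to_string(mynode) + "to" +
           std::to_string(neighbor1);
}
```

With the values from the question, make_label(0, 4, 3) produces "0_from4to3", and the caller owns the result.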
In addition to the other hints provided, you should take notice of the line
b = strcat(b, str_temp);
which seems rather inappropriate, for b is merely defined as a char pointer to a single byte of storage ("" defines an empty string, i.e. an array of chars with a single element containing '\0').
So when strcat appends to b, it creates a buffer overrun.
Edit: Actually I just noticed that b was assigned to point to a (thanks to the line preceding the one mentioned), so that would then be ok, since a may have the room for this... It doesn't make sense, however, to need the two variables.
Maybe what you do not understand is that although strcat() returns a pointer, this return value doesn't need to be "consumed"; it is merely a convenience for when we chain commands and such. In other words you can simply write:
strcat(a, str_temp);
Not requiring any char * b.