So I am learning C++ at the moment and got some really weird results when I tried printing out elements of a list of integers that are actually out of range. (Don't ask why, I stumbled on that by accident.)
This is my code:
#include <iostream>
using namespace std;
int main() {
int nums[0] = {};
for (int i = 0; i <= 10; i++) {
cout << nums[i] << endl;
}
return 0;
}
Results from the first run:
0
17110752
0
4199367
0
0
0
65
0
4225392
0
Results from the second run:
0
1906400
0
4199367
0
0
0
65
0
4225392
0
Results from the third run:
0
15603424
0
4199367
0
0
0
65
0
4225392
0
Every time I run this program, the second number seems to be different.
I know why I get unexpected results, but I'm curious where these values come from.
Edit: "I know why I get unexpected results" is just poorly worded. What I meant was "I know that I get unexpected results because I am accessing an element that is not part of the list".
Are they values from the RAM?
Why are some values repeating and some not?
The result of compiling your code to (my invented) machine code is something like this.
1 main:
2 reserve basePtr, 2 ; reserve 2 slots on the stack, and store in basePtr
3 mov nums, basePtr ; nums points at the first local variable
4 mov [nums], 0 ; int nums[1] = {0} - set nums[0] to zero
5 mov i, basePtr + 1 ; i points at second slot for local variables
6 mov [i], 0 ; int i = 0;
7 for_loop:
8 cmp [i], 10 ;
9 jmp_above fini ; exit loop once i > 10 (the loop runs while i <= 10)
10 mov tmp, [nums+i] ;
11 call cout, tmp
12 call cout, std::endl ;
13 inc [i]
14 jmp for_loop
15 fini:
16 return 0
So we reserve 2 slots for variables, and then read beyond the reserved space.
Imagine the call works like this :-
Initial stack
+-----------+
| ret_main | // crt code which calls main
+-----------+
| nums[0] |
+-----------+
| i |
SP-> +-----------+
push param (e.g. tmp) creates a stack like so...
+-----------+
| ret_main | // crt code which calls main
+-----------+
| nums[0] |
+-----------+
| i |
+-----------+
| tmp |
SP-> +-----------+
With the call itself....
+-----------+
| ret_main | // crt code which calls main
+-----------+
| nums[0] |
+-----------+
| i |
+-----------+
| tmp |
+-----------+
| line_12 |
SP-> +-----------+
With this stack growing down through calls, your further requests are showing the memory values of the return address into your code, and the local variables of the previous call to cout.
If the stack grows the other way (which is more common), the stack may look like this...
SP-> +-----------+
| line_12 |
+-----------+
| tmp |
+-----------+
| i |
+-----------+
| nums[0] |
+-----------+
| ret_main | // crt code which calls main
+-----------+
This is the normal direction for stacks in modern processors. In this case when you look further into the nums array, you are reading values of the return address for the function, and then the local variables which are from the function calling main.
In practice i is probably optimized into a register, and entry into a function would save some register values based on the platform's calling conventions.
So the values you are reading are likely to be in this order...
Registers saved by main
Values used to detect buffer overruns (which is what is happening here)
Return address to main
arguments to main
local variables in the function which calls main.
Why are they repeating? If the function calling main keeps copies of argv, it is likely that the values pushed as arguments to main are matched by the caller's copies in its local variables.
Why do some change? With Address Space Layout Randomization (ASLR), processes normally launch with a random base address and a random stack location, which makes a buffer overflow harder to exploit.
I want to clobber all load instructions - essentially, I want to find all load instructions, and after the load is complete I want to modify the value in the register that stores the value that was read from memory.
To do so, I instrument all instructions and when I find a load I insert a call to some function that will clobber the write register after the load. I pass in the register that needs to be modified (i.e. the register containing the data loaded from memory) using PIN_REGISTER*.
Assuming I know the type of data that was loaded (i.e. int, float, etc.) I can access the PIN_REGISTER union according to the data type (See this). However, as you can see in the link, PIN_REGISTER stores an array of values - i.e. it doesn't store one signed int but rather MAX_DWORDS_PER_PIN_REG signed ints.
Will the value loaded from memory always be stored at index 0? If for instance, I load a 32 bit signed int from memory into a register, can I always assume that it would be stored at s_dword[0]? What if for instance I write to the 8 bit AH/BH/CH/DH registers? Since these correspond to "middle" bits of 32 bit registers, I assume the data would not be at index 0 in the array?
What's the easiest way for me to figure out which index in the array the loaded data is stored at?
If for instance, I load a 32 bit signed int from memory into a register, can I always assume that it would be stored at s_dword[0]?
Yes.
If you are in long mode and have, e.g., the RAX register, you have two DWORDs: the lower, least significant 32 bits (index 0 in s_dword) and the upper, most significant 32 bits (index 1 in s_dword).
What if for instance I write to the 8 bit AH/BH/CH/DH registers? Since these correspond to "middle" bits of 32 bit registers, I assume the data would not be at index 0 in the array?
Note: AH is bits 8 to 15 of rAX (rAX being RAX or EAX), so it is not really in the 'middle'.
It really depends on which member of the union you are accessing. If we stay with the s_dword member (or dword), then AH is still in the "lowest" DWORD (index 0) of the 32 or 64-bit register. It is at the same time in the high part (most significant 8 bits) of the lowest WORD (16-bit quantity).
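// DWORD (32-bit quantity): shift AH down, then mask off the upper bits
auto ah_from_dword = (pinreg->dword[0] >> 8) & 0xff;
auto al_from_dword = pinreg->dword[0] & 0xff;
// still the same for word (16-bit quantity); the shift alone leaves only AH
auto ah_from_word = pinreg->word[0] >> 8;
auto al_from_word = pinreg->word[0] & 0xff;
// not the same for byte (8-bit quantity): index the bytes directly
auto ah_from_byte = pinreg->byte[1];
auto al_from_byte = pinreg->byte[0];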
What's the easiest way for me to figure out which index in the array the loaded data is stored at?
Hard to say, it just seems natural to me to know at which index it is. As long as you know the size of the various denominations in the union, it's quite simple:
byte: 8 bits
word: 16 bits
dword: 32 bits
qword: 64 bits
Here's a crude drawing with different sizes:
+---+---+---+---+---+---+---+---+
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | byte
+---+---+---+---+---+---+---+---+
+-------+-------+-------+-------+
| 3 | 2 | 1 | 0 | word
+-------+-------+-------+-------+
+---------------+---------------+
| 1 | 0 | dword
+---------------+---------------+
+-------------------------------+
| 0 | qword
+-------------------------------+
^ ^
MSB LSB
The same with AL and AH (you can see that AH is byte[1] and AL is byte[0]; both are in word[0], dword[0] and qword[0]):
+---+---+---+---+---+---+---+---+
| 7 | 6 | 5 | 4 | 3 | 2 | AH| AL| byte
+---+---+---+---+---+---+---+---+
+-------+-------+-------+-------+
| 3 | 2 | 1 | 0 | word
+-------+-------+-------+-------+
+---------------+---------------+
| 1 | 0 | dword
+---------------+---------------+
+-------------------------------+
| 0 | qword
+-------------------------------+
^ ^
MSB LSB
I came across a program like the one below:
int firstMissingPositive(vector<int> &A) {
vector<bool> dict(A.size()+1,false);
for(int i=0;i<A.size();i++){
if(A[i]>0 && A[i]<dict.size()) dict[A[i]]=true;
}
if(A.size()==1 && A[0]!=1) return 1;
else if(A.size()==1 && A[0]==1) return 2;
int i=0;
for(i=1;i<dict.size();i++){
if(dict[i]==false) return i;
}
return i;
}
In this program, I could not understand what the following line means:
vector<bool> dict(A.size()+1,false);
What is dict, and what does this statement do?
It's simply a variable.
The definition of the variable calls a specific constructor of the vector to initialize it with a specific size, and initialize all elements to a specific value.
It's equivalent to
vector<bool> dict;
dict.resize(A.size()+1,false);
See e.g. this std::vector constructor reference for more information about available constructors.
It is a definition of a variable "dict" of type vector<bool>. And please Google it first.
You are declaring a container of bools (variables that store only true/false) with one more element than the int vector A, and all of its elements are set to false (0).
It calls this constructor
vector (size_type n, const value_type& val,
const allocator_type& alloc = allocator_type());
Example:
This is vector A:
0 1 2 3 4 <- Indexes
+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | (int)
+---+---+---+---+---+
Its size is 5, so the statement declares a container of size 6 (A.size()+1), with every element initialized to false (0).
  0   1   2   3   4   5   <- Indexes
+---+---+---+---+---+---+
| 0 | 0 | 0 | 0 | 0 | 0 | (bool)
+---+---+---+---+---+---+
In this case it is used to flag which values occur in the first vector.
The same kind of flag vector is often used for the Sieve of Eratosthenes, for example. You set the entries of primes to 1 as the iterations proceed. The result would be (for numbers 0-4)
0 1 2 3 4
+---+---+---+---+---+
| 0 | 0 | 1 | 1 | 0 |
+---+---+---+---+---+
Then you know at which indexes the primes are in vector A.
for (int i = 0; i < A.size(); i++)
{
if ( dict[i] == true )
{
std::cout << "Prime number: " << A[i] << std::endl;
}
}
If I have a large array where the data streams are interleaved in some complex fashion, can I define a pointer p such that p + 1 advances by some arbitrary offset of b bytes?
For example, let's say I have 1,000,000 ints, 4 bytes each.
int* p_int = my_int_array;
This gives me *(p_int+1) == my_int_array[1] (moves 4 bytes)
I am looking for something like
something_here(b)* big_span_p_int = my_int_array;
which would make *(big_span_p_int + 1) == my_int_array[b] (moves 4*b or b bytes, whichever is possible or easier)
Is this possible? easy?
Thanks for the help.
EDIT:
b is a compile-time constant.
Using some of your code: there is no need to declare an additional pointer or array. Applying pointer arithmetic to p_int is enough to traverse the array and reach the value you are looking for.
Let's look at this example:
#include <iostream>

int main() {
    int my_int_array[5] {1,2,3,4,5};
    int* p_int = my_int_array;
    int b = 2;
    // Output is 3: p_int points at my_int_array[0], so *(p_int + b) is
    // my_int_array[2], the third element of the array.
    std::cout << *(p_int + b) << std::endl;
}
Graphically represented:
Memory Address | Stored Value (values or memory addresses)
----------------------------------------------
0 | .....
1 | .....
2 | .....
3 | .....
4 | .....
5 | .....
6 | .....
7 | .....
8 | .....
. | .....
. | .....
. | .....
n-1 | .....
Imagine the memory as being a very big array in which you can access positions by their memory address (in this case we've simplified the addresses to natural numbers; in reality they are usually written as hexadecimal values). "n" is the total size of the memory. Since addresses start at 0, the last address is n-1.
Using the example above:
1. When you invoke:
int my_int_array[5] {1,2,3,4,5};
The compiler reserves memory for the integer array for you (for a local variable like this one, typically on the stack), and we can think of our memory as having changed. E.g. memory address 2 (decided by the compiler) now holds the first value of my_int_array.
Memory Address | Name - Stored Value (values or memory addresses)
-----------------------------------------------------
0 | .....
1 | .....
2 | my_int_array[0] = 1
3 | my_int_array[1] = 2
4 | my_int_array[2] = 3
5 | my_int_array[3] = 4
6 | my_int_array[4] = 5
7 | .....
8 | .....
. | .....
. | .....
. | .....
n-1 | .....
2. Now if we say:
int* p_int = my_int_array;
The memory changes again. E.g. memory address 8 (decided by the compiler) now holds an int pointer called p_int.
Memory Address | Name - Stored Value (values or memory addresses)
-----------------------------------------------------
0 | .....
1 | .....
2 | my_int_array[0] = 1
3 | my_int_array[1] = 2
4 | my_int_array[2] = 3
5 | my_int_array[3] = 4
6 | my_int_array[4] = 5
7 | .....
8 | p_int = 2 (which means it points to memory address 2, which has the value of my_int_array[0] = 1)
. | .....
. | .....
. | .....
n-1 | .....
3. If in your program you now say:
p_int += 2; // You increase the value by 2 (or 8 bytes), it now points elsewhere, 2 index values ahead in the array.
Memory Address | Name - Stored Value (values or memory addresses)
-----------------------------------------------------
0 | .....
1 | .....
2 | my_int_array[0] = 1
3 | my_int_array[1] = 2
4 | my_int_array[2] = 3
5 | my_int_array[3] = 4
6 | my_int_array[4] = 5
7 | .....
8 | p_int = 4 (which means it points to memory address 4, which has the value of my_int_array[2] = 3)
. | .....
. | .....
. | .....
n-1 | .....
When doing pointer arithmetic in a simple case like this, you don't have to worry about the size in bytes of an int (typically 4 bytes). The pointers here are already bound to a type (in this case int) when you declare them, so p_int + 1 points to the next int value, that is, sizeof(int) bytes further on. Just by adding integer values to the pointer you get the following integers.
If b is a constant expression (a compile-time constant), then a pointer declared as
int (*big_span_p_int)[b]
will move by b * sizeof(int) bytes every time you increment it.
In C you can use a run-time value in place of b, but since your question is tagged [C++], this is not applicable.
I have a 2D matrix
matrix[m][n];
I know that matrix is a double pointer with type int**. I would like to obtain a double pointer pointing to a submatrix of the original matrix. For example, I want the submatrix to start from cell (1,1). How do I get such a double pointer from the original matrix[m][n]?
I know that matrix is a double pointer with type int**.
No, you don't. Arrays are not pointers. If you declared it as int matrix[m][n];, then the type of the expression matrix is int [m][n]; unless matrix is the operand of the sizeof or unary & operators, it will have its type converted ("decay") to int (*)[n] (pointer to n-element array of int).
The problem is that you can't create arbitrary submatrices by just declaring a pointer of the right type; C and C++ don't provide an easy way to "slice" arrays this way. You can certainly create a pointer of type int (*)[n-1] and assign the value of &matrix[1][1] to it (with an appropriate cast), but it won't do what you want.
EDIT
Now that I have a real keyboard in front of me I can expand on this a little bit.
Let's imagine a 3x3 matrix declared as follows:
int m[3][3] = {{0,1,2},{3,4,5},{6,7,8}};
We normally visualize such a matrix as
+---+---+---+
| 0 | 1 | 2 |
+---+---+---+
| 3 | 4 | 5 |
+---+---+---+
| 6 | 7 | 8 |
+---+---+---+
In C and C++, 2-dimensional arrays are laid out in row-major order [1][2], so the above matrix would be represented in memory as
+---+
m: | 0 | m[0][0]
+---+
| 1 | m[0][1]
+---+
| 2 | m[0][2]
+---+
| 3 | m[1][0]
+---+
| 4 | m[1][1]
+---+
| 5 | m[1][2]
+---+
| 6 | m[2][0]
+---+
| 7 | m[2][1]
+---+
| 8 | m[2][2]
+---+
So suppose you want the 2x2 submatrix starting at m[1][1]:
+---+---+---+
| 0 | 1 | 2 |
+---+---+---+
| 3 | +---+---+
+---+ | 4 | 5 |
| 6 | +---+---+
+---+ | 7 | 8 |
+---+---+
That corresponds to the following array elements:
+---+
m: | 0 | m[0][0]
+---+
| 1 | m[0][1]
+---+
| 2 | m[0][2]
+---+
| 3 | m[1][0]
+---+
+---+
| 4 | m[1][1]
+---+
| 5 | m[1][2]
+---+
+---+
| 6 | m[2][0]
+---+
+---+
| 7 | m[2][1]
+---+
| 8 | m[2][2]
+---+
That's not a contiguous subarray within m, so just declaring a pointer and setting it to &m[1][1] won't do what you really want. You'll need to create a separate matrix object and copy the elements you want to it:
int subm[2][2] = {{m[1][1], m[1][2]}, {m[2][1], m[2][2]}};
You can write a function to grab a 2x2 "slice" of your matrix like so:
#include <stddef.h> /* size_t */
void slice2x2( int (*mat)[3], int (*submat)[2], size_t startx, size_t starty )
{
for ( size_t i = 0; i < 2; i++ )
for ( size_t j = 0; j < 2; j++ )
submat[i][j] = mat[startx + i][starty + j];
}
int main( void )
{
int matrix[3][3] = {{0,1,2},{3,4,5},{6,7,8}};
int submat[2][2];
slice2x2( matrix, submat, 1, 1 );
// do something with submat
}
[1] Pre-publication draft of the C 2011 standard, §6.5.2.1, ¶3.
[2] Pre-publication draft of the C++ 2014 standard, §8.3.4, ¶9.
A matrix defined as 2D array of constant size:
int matrix [m][n];
is stored as m contiguous blocks of n elements. You can therefore imagine this technically as a flat sequence of m*n elements in memory. You can use pointer arithmetic to find the start of a row, or to find a specific element. But you can't locate a submatrix that way.
The "double" pointer:
int **pmatrix;
obeys a different logic: it is a pointer to a pointer and works as an array of m pointers, each pointing at a row of n consecutive elements. So the rows are not necessarily contiguous with each other. You can use pointer arithmetic and indirection to locate the start of a row or a specific item. But again this can't address a submatrix.
Both matrix and pmatrix can be used with 1D or 2D indexing, but the compiler generates different code to address the elements.
To get a submatrix you have to iterate over the right elements using vertical and horizontal offsets; you cannot simply pass a pointer to the submatrix unless you copy those elements into a new matrix of the target size.
I have a list of weighted objects i.e.:
A->1 B->1 C->3 D->2 E->3
Is there an efficient algorithm in C++ to pick random elements according to their weight?
For example, the probability that element A or B, with a lower weight, is picked is higher (30%) than the probability that the algorithm selects element C or E (10%) or D (20%).
As @Dukeling said, we need more info, such as how you interpret and use the selection chance.
At least in the field of evolutionary algorithms, fitness scaling (or selection chance scaling) is a sizable topic.
Suppose you start with badness score
B[i] = how badly you don't want to select the i-th item
And the objective is to calculate a fitness/selection score S[i], which I assume you will use in roulette wheel fashion.
As you say, one obvious way is to use multiplicative inverse:
S[i] = 1 / B[i]
However, there might be a little problem with that.
The same amount of change in B[i] has much more impact when B[i] is low than when B[i] is already high.
Ask yourself this:
Say
B[1] = 1 -> S[1] = 1
B[2] = 2 -> S[2] = 0.5
So item 1 is twice as likely to be selected as item 2.
But with the same amount of change
B[3] = 1000 -> S[3] = 0.001
B[4] = 1001 -> S[4] = 0.000999001
Item 3 is only 1.001 times as likely to be selected compared to item 4
I'll just throw one possible alternative scheme here for now.
S[i] = max(B) - B[i] + 1
The + 1 part helps so no item has zero chance to be selected.
This ends the part of calculating selection score.
Next, let's clear up how to use the selection score in roulette wheel fashion.
Assume we decided to use the additive inverse scheme.
B[1] = 1 -> S[1] = 1001
B[2] = 2 -> S[2] = 1000
B[3] = 1000 -> S[3] = 2
B[4] = 1001 -> S[4] = 1
Then imagine that each point of score corresponds to a lottery ticket.
Let's assign the tickets running IDs.
| Item | Score = #ticket | ticket ID | win chance |
| 1 | 1001 | 0 to 1000 | 1001/2004 ~ 0.499500998 |
| 2 | 1000 | 1001 to 2000 | 1000/2004 ~ 0.499001996 |
| 3 | 2 | 2001 to 2002 | 2/2004 ~ 0.000998004 |
| 4 | 1 | 2003 to 2003 | 1/2004 ~ 0.000499002 |
There are 2004 tickets in total.
To do a selection, pick the winning ticket ID at random i.e. the random range is [0,2004).
Binary search can be used to quickly look up which item owns the winning ticket as you have already seen in this question. What needs to be looked up with binary search are the boundary values of the ticket IDs, which are 1001, 2001 and 2003, rather than the scores themselves.
For comparison, here is the selection chance in case the multiplicative inverse scheme is used.
| Item | win chance |
| 1 | 1/1.501999001 ~ 0.665779404 |
| 2 | 0.5/1.501999001 ~ 0.332889702 |
| 3 | 0.001/1.501999001 ~ 0.000665779 |
| 4 | 0.000999001/1.501999001 ~ 0.000665114 |
You can notice that in the additive inverse scheme, 1 unit of badness consistently corresponds to around a difference of 0.0005 in selection chance.
Whereas in multiplicative inverse scheme, 1 unit of badness results in varying difference of selection chance.