Comparing array subscripts in LLVM

I am analyzing getelementptr instructions for array accesses. How can I compare the subscripts for two gep instructions with array accesses?
For example, for the code
a[i]=b[i+1]+i;
How can I compare two array subscripts i and i+1 in IR?

You can iterate over the indices of each GEP using GetElementPtrInst::idx_begin() and idx_end(), and then compare the corresponding index operands of the two instructions.

Related

2D arrays: C++ vs Java (row major vs no major)

The argument for Java 2D arrays being neither row major nor column major is that "two-dimensional array in Java is actually an array of references to arrays". C++ also follows the "array of arrays" abstraction. Then why does C++ require row major order?
Quoting the link in the question
What we may sometimes think of as two-dimensional array in Java is actually an array of references to arrays. It's not stored linearly in memory.
The difference is in C++ arrays are stored linearly in memory.
If you represent an m by n 2D array in linear memory, it has to be laid out either as n groups of m items or as m groups of n items. In C and C++ an array like ary[M][N] will be stored as M groups of N items. Whether you want to call those groups rows or columns is arbitrary, but the point is that the language does specify an order, whereas Java does not, because a 2D array in Java is not stored "flat".
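To make the "M groups of N items" layout concrete, here is a minimal sketch. The dimensions M and N and the helper name flatIndex are made up for illustration. It measures the byte distance from ary[0][0] to ary[i][j] and converts it to a count of int elements, which for a built-in array should come out as i * N + j.

```cpp
#include <cstddef>

// Hypothetical dimensions, chosen only for this illustration.
constexpr std::size_t M = 3;
constexpr std::size_t N = 4;

// Returns how many int elements lie between ary[0][0] and ary[i][j].
// The distance is measured in bytes through char pointers and then divided
// by sizeof(int), so no int pointers from different rows are subtracted.
std::size_t flatIndex(const int (&ary)[M][N], std::size_t i, std::size_t j) {
    const char *base = reinterpret_cast<const char *>(&ary[0][0]);
    const char *elem = reinterpret_cast<const char *>(&ary[i][j]);
    return static_cast<std::size_t>(elem - base) / sizeof(int);
}
```

With these dimensions, flatIndex(ary, 1, 2) is 1 * 4 + 2 = 6: one full group of N items comes first, then two more items.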

What's the difference between row and column majors?

Multidimensional arrays can be stored in linear memory in two orders: row-major and column-major. What is the difference between these two orders?
Row Major will search through information as:
[0][0],[0][1],...,[0][n],[1][0],...,[1][n],..[m][n]
Column Major will search through information as:
[0][0],[1][0],...,[m][0],[0][1],...,[m][1],...,[m][n]
In memory it is always stored as:
[0][0],[0][1],...,[0][n],[1][0],...,[1][n],..[m][n]
From https://en.wikipedia.org/wiki/Row-major_order
The difference is simply that in row-major order, consecutive elements of the rows of the array are contiguous in memory; in column-major order, consecutive elements of the columns are contiguous.
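As a sketch of what the two orders mean for indexing, assuming a matrix with the given number of rows and columns stored in one flat buffer (the function names here are invented for illustration):

```cpp
#include <cstddef>

// Flat index of element (i, j) when the matrix is stored row by row:
// skip i complete rows of `cols` items, then move j items into the row.
std::size_t rowMajorIndex(std::size_t i, std::size_t j, std::size_t cols) {
    return i * cols + j;
}

// Flat index of the same element when the matrix is stored column by column:
// skip j complete columns of `rows` items, then move i items into the column.
std::size_t colMajorIndex(std::size_t i, std::size_t j, std::size_t rows) {
    return j * rows + i;
}
```

For a 2 by 3 matrix, element (1, 0) sits at flat index 3 in row-major order and at flat index 1 in column-major order.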
There are no multidimensional arrays in C++, so this question is moot.

Creating array of integers in LLVM

I have a vector of integer values, vector<Value*> myIntegers in my LLVM code (not necessarily constant). I want to create a Store instruction to store these integers. To create the store instruction using the format below, for the first argument I need to create a Value* pointing to these integers (create an array out of them).
new StoreInst(Value *Val, Value *Ptr, ...);
If my integers were constants I would have used:
Constant *IntArrayConstant = ConstantDataArray::get(getGlobalContext(), ArrayRef<Value*> myIntegers);
How can I create a generic array of i32 types, with a Value* pointing to it? The documentation says storing ArrayRef is not safe either.
You should probably use VectorType::get(), create an UndefValue of the type you just obtained, and then populate it with N InsertElementInsts, with N the number of elements. You will then create a StoreInst to store the Value* on the heap.
The result of the last InsertElementInst will thus be the Value* you are looking for (i.e. a vector containing the values). Please note that, depending on what you're trying to do, the StoreInst might actually be not needed at all.
Note that I'm assuming that all your Values have the same underlying type (i.e. getType() returns the same result for all of them).
Edit: also note that maybe, depending on what you're trying to do, it could be more appropriate to use ArrayType::get instead of VectorType::get.

Can you change the way 2d arrays are ordered in C++/CUDA

Suppose I have a two dimensional array in C++ under CUDA, stored in the shared memory,
like so:
__shared__ float arr[4][4]; // C++ has a default row-major ordering
By default C++ will order the elements in arr in a row-major format.
That is, it will allocate a contiguous block of memory and store the elements like this: (0,0), (0,1), (0,2), (0,3), (1,0), (1,1), and so on.
Is there a way to tell the C++/CUDA compiler to arrange this in a column-major order?
Why don't you just swap indexes you are using?
Instead of using arr[x][y] use arr[y][x].
It would be interesting to know why you want to do this. Maybe using cache memory could be helpful, but I can't tell for sure without details.
Hope it helps.
Transpose the matrix. arr[4][4] means that arr is an array of 4 arrays of size 4. The reason to store the values in "row-major" ordering is that arr[0], for example, must give us the pointer to the first of these four arrays, and elements of a single array should be placed in contiguous memory locations so that they can be individually referenced by adding an index to a unique identifier.
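The index swap suggested above can be hidden behind a small accessor. This is only a sketch in plain C++, not CUDA-specific code; the struct name ColMajor4x4 and its at() member are hypothetical. The storage is declared with the dimensions swapped, so that the elements of one logical column end up contiguous.

```cpp
#include <cstddef>

constexpr std::size_t ROWS = 4;
constexpr std::size_t COLS = 4;

// Hypothetical wrapper: callers write at(row, col), but the element lives at
// data[col][row], so consecutive rows of one column are adjacent in memory.
struct ColMajor4x4 {
    float data[COLS][ROWS];

    float &at(std::size_t row, std::size_t col) { return data[col][row]; }
};
```

With this layout, at(1, 0) is the element immediately after at(0, 0) in memory, which is what column-major order requires.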

C/C++: Pointer Arithmetic

I was reading a bit about pointer arithmetic, and I came upon two things that I couldn't understand and whose use I don't know:
address_expression - address_expression
and also
address_expression > address_expression
Can someone please explain them to me, how do they work and when they are used.
Edit:
What I meant to say is what do they produce if I just take two addresses and subtract them
And if I take two addresses and compare them, what is the result, and what is the comparison based upon?
Edit:
I now understand the result of subtracting addresses, but comparing addresses I still don't get it.
I understand that 1<2, but how is an address greater than another one and what are they compared upon
Several answers here have stated that pointers are numbers. This is not an accurate description of pointers as specified by the C standard.
In large part, you can think of pointers as numbers, and as addresses in memory, provided (a) you understand that pointer subtraction converts the difference from bytes to elements (of the type of the pointers being subtracted), and (b) you understand the limits where this model breaks.
The following uses the 1999 C standard (ISO/IEC 9899, Second edition, 1999-12-01). I expect the following is more detailed than the asker requested, but, given some of the misstatements here, I judge that precise and accurate information should be given.
Per 6.5.6 paragraph 9, you may subtract two pointers that point to elements of the same array or to one past the last element of the array. So, if you have int a[8], b[4];, you may subtract a pointer to a[5] from a pointer to a[2], because a[5] and a[2] are elements in the same array. You may also subtract a pointer to a[5] from a pointer to a[8], because a[8] is one past the last element of the array. (a[8] is not in the array; a[7] is the last element.) You may not subtract a pointer to a[5] from a pointer to b[2], because a[5] is not in the same array as b[2]. Or, more accurately, if you do such a subtraction, the behavior is undefined. Note that it is not merely the result that is unspecified; you cannot expect that you will get some possibly nonsensical number as a result: The behavior is undefined. According to the C standard, this means that the C standard does not say anything about what occurs as a consequence. Your program could give you a reasonable answer, or it could abort, or it could delete files, and all those consequences would be in conformance to the C standard.
If you do an allowed subtraction, then the result is the number of elements from the second pointed-to element to the first pointed-to element. Thus, &a[5] - &a[2] is 3, and &a[2] - &a[5] is −3. This is true regardless of what type a is. The C implementation is required to convert the distance from bytes (or whatever units it uses) into elements of the appropriate type. If a is an array of double of eight bytes each, then &a[5] - &a[2] is 3, for 3 elements. If a is an array of char of one byte each, then &a[5] - &a[2] is 3, for 3 elements.
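A short sketch of the point about element units, written in C++ rather than C (the helper name elementDistance is invented for illustration). The same subtraction gives the same element count whether the array holds double or char:

```cpp
#include <cstddef>

// Distance, in elements, from index `from` to index `to` of the same array.
// `to` and `from` may be anywhere from 0 up to N (one past the last element).
template <typename T, std::size_t N>
std::ptrdiff_t elementDistance(const T (&a)[N], std::size_t from, std::size_t to) {
    return (a + to) - (a + from);
}
```

For double d[8] and char c[8], both elementDistance(d, 2, 5) and elementDistance(c, 2, 5) are 3, and elementDistance(d, 5, 2) is -3. Using 8 as an index is allowed here because it denotes the one-past-the-end position, which is never dereferenced.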
Why would pointers ever not be just numbers? On some computers, especially older computers, addressing memory was more complicated. Early computers had small address spaces. When the manufacturers wanted to make bigger addresses spaces, they also wanted to maintain some compatibility with old software. They also had to implement various schemes for addressing memory, due to hardware limitations, and those schemes may have involved moving data between memory and disk or changing special registers in the processor that controlled how addresses were converted to physical memory locations. For pointers to work on machines like that, they have to contain more information than just a simple address. Because of this, the C standard does not just define pointers as addresses and let you do arithmetic on the addresses. Only a reasonable amount of pointer arithmetic is defined, and the C implementation is required to provide the necessary operations to make that arithmetic work, but no more.
Even on modern machines, there can be complications. On Digital’s Alpha processors, a pointer to a function does not contain the address of the function. It is the address of a descriptor of the function. That descriptor contains the address of the function, and it contains some additional information that is necessary to call the function correctly.
With regard to relational operators, such as >, the C standard says, in 6.5.8 paragraph 5, that you may compare the same pointers you may subtract, as described above, and you may also compare pointers to members of an aggregate object (a struct or union). Pointers to members of an array (or its end address) compare in the expected way: Pointers to higher-indexed elements are greater than pointers to lower-indexed elements. Pointers to two members of the same union compare equal. For pointers to two members of a struct, the pointer to the member declared later is greater than the pointer to the member declared earlier.
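The three cases in the paragraph above can be sketched as follows. This is C++ rather than C, and the type and function names are made up for the example. The union members have different types, so the equality check goes through const void *.

```cpp
struct Pair {
    int first;   // declared earlier
    int second;  // declared later
};

union IntOrFloat {
    int i;
    float f;
};

// A pointer to the later-declared member compares greater.
bool laterMemberIsGreater(const Pair &p) {
    return &p.second > &p.first;
}

// Higher-indexed elements compare greater; a + 8 is the one-past-the-end
// position of an 8-element array and is only compared, never dereferenced.
bool higherIndexIsGreater(const int (&a)[8]) {
    return (a + 5) > (a + 2) && (a + 8) > (a + 7);
}

// Pointers to two members of the same union compare equal.
bool unionMembersShareAddress(const IntOrFloat &u) {
    return static_cast<const void *>(&u.i) == static_cast<const void *>(&u.f);
}
```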
As long as you stay within the constraints above, then you can think of pointers as numbers which are memory addresses.
Usually, it is easy for a C implementation to provide the behavior required by the C standard. Even if a computer has a compound pointer scheme, such as a base address and offset, usually all elements of an array will use the same base address as each other, and all elements of a struct will use the same base address as each other. So the compiler can simply subtract or compare the offset parts of the pointer to get the desired difference or comparison.
However, if you subtract pointers to different arrays on such a computer, you can get strange results. It is possible for the bit pattern formed by a base address and offset to appear greater (when interpreted as a single integer) than another pointer even though it points to a lower address in memory. This is one reason you must stay within the rules set by the C standard.
Pointer subtraction yields the number of array elements between two pointers of the same type.
For example,
int buf[10] = /* initializer here */;
&buf[10] - &buf[0]; // yields 10, the difference is 10 elements
Pointer comparison. For example, for the > relational operator: the > operation yields 1 if the pointed array element or structure member on the left hand side is after the pointed array element or structure member on the right hand side and it yields 0 otherwise. Remember arrays and structures are ordered sequences.
&buf[10] > &buf[0]; // 1, &buf[10] element is after &buf[0] element
Subtracting two pointer addresses returns the number of elements of that type.
So if you have an array of integers and two pointers into it, subtracting those pointers will return the number of int values between, not the number of bytes. Same with char types. So you need to be careful with this, especially if you are working with a byte buffer or wide characters, that your expression is calculating the right value. If you need byte-based buffer offsets for something that does not use a single byte for storage (int, short, etc) you need to cast your pointers to char* first.
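Here is a small sketch of the difference between the two kinds of offset (the function names are hypothetical). Both pointers are assumed to point into the same array.

```cpp
#include <cstddef>

// Offset counted in int elements.
std::ptrdiff_t elementOffset(const int *first, const int *second) {
    return second - first;
}

// Offset counted in bytes: cast both pointers to char pointers first.
std::ptrdiff_t byteOffset(const int *first, const int *second) {
    return reinterpret_cast<const char *>(second) -
           reinterpret_cast<const char *>(first);
}
```

For int a[10], elementOffset(a, a + 3) is 3, while byteOffset(a, a + 3) is 3 * sizeof(int), which is 12 on a platform with 4-byte int.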
The first expression subtracts one pointer from another. As a simple example of why this might be useful, consider a C string. The string is in contiguous memory, so if you had the address of the first character of the string, and the address of the last character, you could find the length of the string by doing:
int strLength = (last_char_address - first_char_address) + 1;
Such pointer arithmetic is type aware, meaning that the result of the arithmetic represents the number of elements - of the specific type - between two pointers. In the above example using char, the difference is the number of characters. This works similarly for e.g. pointers to two structs.
Similarly, your second expression is simply comparing pointers and the result will be 1 or 0. As a very simple example, the address of element 5 of an array is always > the address of element 4: &string[5] > &string[4] is true.
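The string-length idea can be sketched like this, assuming a NUL-terminated string (the function name lengthByPointers is invented). Instead of taking the address of the last character and adding 1, it walks to the terminator, which gives the same count and also works for an empty string.

```cpp
#include <cstddef>

// Length of a NUL-terminated string, computed by pointer subtraction.
std::ptrdiff_t lengthByPointers(const char *first) {
    const char *end = first;
    while (*end != '\0') {
        ++end;  // advance until `end` points at the terminating NUL
    }
    // `end` is one past the last character, so no "+ 1" is needed here.
    return end - first;
}
```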
An analogy I like to use when explaining pointer arithmetic — both how it works, and its limitations — is to think about street addresses.
Suppose there are a bunch of houses on same-sized lots on Elm Street, with all the lots, say, 50 feet wide. Suppose I want to know how far it is from #12 Elm Street to #46 Elm Street, and suppose I want to know this distance as a number of houses, not a distance in feet. Well, obviously, I can just subtract 12 from 46, and get an answer of 34 houses. (Actually, of course, it's a little more complicated than that, because there are probably houses on both sides of the street, but let's ignore that issue for now.)
And suppose over on 10th Avenue there are a bunch of industrial buildings on bigger lots, all 100 feet wide. I can still subtract street numbers, and I'll get distances in number of buildings (not feet).
And this is analogous to pointer subtraction in C, where you get differences that are scaled by the size of the pointed-to objects. You do not get answers as raw bytes (analogous to feet in the street address analogy).
But the other thing the street address analogy helps us understand is why we can't use pointer arithmetic to work with pointers into different arrays. Suppose I want to know how far it is from #12 Elm Street to #30 10th Avenue. Subtracting the addresses doesn't work! It's meaningless. You can't meaningfully subtract or compare addresses on different streets, just as you can't meaningfully subtract or compare pointers into different arrays.
Pointers can often be thought of as just numbers that represent the memory address, like 0x0A31FCF20 (or 2736770848 in decimal), or 0xCAFEDEAD (sometimes systems use this to indicate an error, I don't remember the details.)
Pointer comparison is often used in sorting arrays of pointers. Sorted arrays of pointers are helpful when you need to check if a pointer is in a list of pointers; if the list is sorted, you don't have to look through every element of the list to figure out if the pointer is in that list. You need to use comparisons to sort a list.
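A minimal sketch of the sorted-list lookup, with a made-up function name. One caveat that matters here: in C++ the built-in < is only guaranteed to be meaningful for pointers into the same array or object, so the sketch uses std::less, which the standard library guarantees to give a consistent total order over any pointers.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Sorts a copy of the pointer list, then uses binary search to check
// whether `needle` is in it.
bool containsPointer(std::vector<const int *> ptrs, const int *needle) {
    std::less<const int *> before;  // total order, even for unrelated objects
    std::sort(ptrs.begin(), ptrs.end(), before);
    return std::binary_search(ptrs.begin(), ptrs.end(), needle, before);
}
```

In real code you would sort once and keep the sorted list around; sorting inside the function just keeps the example short.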
Pointer arithmetic is often used when you have a pointer to a chunk of data, and you need to access something that is not at the beginning of the chunk of data. For example:
const char *string = "hello world!";
const char *substring = string+6;
std::cout << string << "\n";
std::cout << substring << std::endl;
This would output:
hello world!
world!
Here we got the string after the first 6 characters of "hello world!", or "world!". Keep in mind that you should use std::string instead where it's available, if possible. A concept very similar to pointer arithmetic is random access iterators.
Subtracting pointers can help you find the distance between those two pointers. If you have a pointer to the first element of an array, and a pointer to one element past the last element of the array, subtracting these two pointers helps you find the size of the array.
Another case where you might treat pointers as integers is in an optimized version of a linked list, called an XOR linked list. You can find more details about it here. I can expand on this if you'd like; let me know in the comments.
You can treat an address like an int in many ways. The main difference is that arithmetic on it is scaled by the size of the pointed-to type. For example, if int * p happens to have the value of, say, 234 (from some safe instruction such as p = new int[12];), it represents the address 234. If we do p += 1;, the addition is in units of int size. Now p is (assuming 4-byte int for this example) 238, i.e. the address of what was p[1]. In fact p[x] is equivalent to *(p+x). You can compare pointers and so on much like ints. In some contexts this is useful: in the given example, p[0] now refers to what was p[1]. This is a shorter way of writing p = &p[1].
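A sketch of the equivalence between subscripting and pointer addition, and of what p += 1 does (the function names are invented for illustration):

```cpp
#include <cstddef>

// p[x] and *(p + x) name the same element, so both the addresses and the
// values agree. The caller must make sure index x is valid for p.
bool subscriptMatchesPointerSum(const int *p, std::size_t x) {
    return &p[x] == p + x && p[x] == *(p + x);
}

// Advances the pointer by one int element, not by one byte.
const int *advanceOne(const int *p) {
    p += 1;
    return p;
}
```

If q = advanceOne(a) for some int a[12], then q[0] is the same element as a[1], and q is sizeof(int) bytes past a.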