Transposing dimensions after a reshape: when is it required? - python-2.7

tf.reshape can change the dimensions of a tensor, but is there any guarantee about how the data will be ordered along each dimension?
Suppose I have a tensor A[objects, x, t] that describes some value across different objects, positions and times, and another tensor B[x, t] that varies only across position and time. I usually have to reshape in order to copy B along dimension 0, like this:
B_res = tf.tile( tf.reshape( B , [1, X_SIZE, T_SIZE]), tf.pack([OBJECT_SIZE,1,1]))
some_op = tf.mul( A, B_res )
The problem I see is that when X_SIZE == T_SIZE, reshape has no way to know how to arrange B's data. For all I know, it could be aligning data from dimension 0 of B along dimension 2 of B_res!
Are there any rules for how reshape orders data? I want to know whether a few tf.transpose operations are required after a given tf.reshape.

In memory, arrays and tensors are really just 1D objects. Say you need to store a 10x10 array. In memory this is just a 1D array of length 100, where the first 10 elements correspond to the first row. This is known as row-major ordering. Now say you reshape it to 20x5. The reshape does not change the underlying memory, so the first ten elements now make up rows 1 and 2. Transpose, on the other hand, physically reorders the memory so that the row-major ordering is maintained while the dimensions swap places.
Also, I think you are tiling unnecessarily. You should read up on broadcasting. I think you could do the following:
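To make the reshape/transpose difference concrete, here is a minimal pure-Python sketch. It assumes row-major storage and models an array as a flat list plus a shape; the helper names reshape and transpose_flat are made up for illustration and are not TensorFlow functions.

```python
# Row-major storage sketch: a 2-D array is a flat list plus a shape.

def reshape(flat, shape):
    """View the same flat buffer with a new shape; element order is untouched."""
    rows, cols = shape
    assert rows * cols == len(flat)
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

def transpose_flat(flat, shape):
    """Build the flat row-major buffer of the transposed array (data is reordered)."""
    rows, cols = shape
    return [flat[r * cols + c] for c in range(cols) for r in range(rows)]

flat = list(range(6))                    # stored as 0 1 2 3 4 5
a = reshape(flat, (2, 3))                # [[0, 1, 2], [3, 4, 5]]
b = reshape(flat, (3, 2))                # [[0, 1], [2, 3], [4, 5]]
t_flat = transpose_flat(flat, (2, 3))    # [0, 3, 1, 4, 2, 5]
t = reshape(t_flat, (3, 2))              # [[0, 3], [1, 4], [2, 5]]
```

Reshaping to (3, 2) and transposing both give a 3x2 result, but only the transpose moves elements in the flat buffer.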
some_op = A * tf.reshape(B, [1, X_SIZE, T_SIZE])
In this case B is broadcast automatically along the first dimension of A.
Note: I am not actually sure whether TensorFlow uses row-major or column-major ordering, but the same concepts apply either way.
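The following pure-Python sketch mimics what the reshape-then-broadcast version computes, under the assumption of row-major ordering. It deliberately uses X_SIZE == T_SIZE and a non-symmetric B, so any accidental swap of x and t would show up. The nested lists stand in for tensors; none of this is TensorFlow code.

```python
OBJECT_SIZE, X_SIZE, T_SIZE = 3, 2, 2

B = [[1, 2],
     [3, 4]]                      # B[x][t], deliberately not symmetric
# A[o][x][t] = 10 * (o + 1) for every x and t
A = [[[10 * (o + 1)] * T_SIZE for _ in range(X_SIZE)]
     for o in range(OBJECT_SIZE)]

# Reshape to [1, X_SIZE, T_SIZE]: the flat order is untouched,
# so B_res[0][x][t] is still B[x][t].
flat_B = [v for row in B for v in row]
B_res = [[flat_B[x * T_SIZE:(x + 1) * T_SIZE] for x in range(X_SIZE)]]

# Broadcasting: the size-1 leading axis of B_res is reused for every object.
some_op = [[[A[o][x][t] * B_res[0][x][t] for t in range(T_SIZE)]
            for x in range(X_SIZE)]
           for o in range(OBJECT_SIZE)]
```

Adding a leading axis of size 1 never permutes the trailing axes, which is why no transpose is needed in this case.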

Related

How to change the xyz order when using a flattened 3D array

I'm using a flattened 3d array for my program, and from everything I've read, the way to access it is: array[x + width * (y + depth * z)]
This is the equivalent of array[x][y][z]
Unfortunately, in my code I need to use array[x][z][y]. Does anyone know how to change the flattened formula from xyz to xzy?
As long as you stay consistent in accessing the array, you can access it however you like. A "flattened" 3D array is just a normal array now, and the indices have absolutely no meaning for the system. The only meaning is what you assign to them in your mind.
Therefore, it is perfectly OK to imagine the array to be - as you put it - an XZY array instead of an XYZ array, or any other permutation of the indices. The general formula is (when you are imagining the array to be a flattened 3D array):
FlatIndex(1st, 2nd, 3rd) = 1st * PlaneSize + 2nd * RowSize + 3rd;
where RowSize is the number of elements in a row (i.e. the number of elements in the 3rd dimension) and PlaneSize is the number of elements in a plane, i.e. the number of elements in a row times the number of rows.
Deriving the above formula is simple: if you think about how the elements are put beside each other, you will see that when you increment the 3rd index by 1, you increase its "address" by 1 as well (address means where it is in the 1D array), hence the 3rd index has a coefficient of 1. When you increase the 2nd index by 1, you are effectively going to the next row, and your address is increased by a row's worth of elements at once. In other words, an increment of 1 for the 2nd index is the same as an increment equal to the size of a row for the 3rd index, therefore the 2nd index has a coefficient of RowSize. The same reasoning gives the 1st index a coefficient of PlaneSize.
If we had any number of indices (i.e. a flattened N-dimensional array,) the reasoning would have been the same.
But this is all imaginary. You can devise any "mapping" function between the three indices used in a 3D array and the single index used in a normal 1D array (with some restrictions; e.g. it must cover all your domain, and it must be reversible.) The formula everybody uses though is a pretty good one: it's fast to compute, it's 1-to-1 which means it uses all the entries in the 1D array and leaves no holes (i.e. wastes no memory,) etc.
As long as you are consistent in use of your mapping function (in all its aspects, including the order of indices) everywhere you access the same array, you'll be fine.
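As a minimal sketch of the formula above, the helper below computes FlatIndex for any ordering of the three indices. The function names and the example sizes X, Y, Z are assumptions chosen for illustration.

```python
def flat_index(first, second, third, size_second, size_third):
    """FlatIndex(1st, 2nd, 3rd) = 1st * PlaneSize + 2nd * RowSize + 3rd."""
    row_size = size_third
    plane_size = size_second * size_third
    return first * plane_size + second * row_size + third

X, Y, Z = 2, 3, 4   # example extents of the x, y and z dimensions

def index_xyz(x, y, z):
    """Layout equivalent to array[x][y][z]."""
    return flat_index(x, y, z, Y, Z)

def index_xzy(x, y, z):
    """Layout equivalent to array[x][z][y]: y and z swap roles and sizes."""
    return flat_index(x, z, y, Z, Y)
```

Both mappings cover every slot of the 1D array exactly once; they simply place the same (x, y, z) triple at different addresses, so whichever one is chosen has to be used everywhere.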

Performing storage of banded matrix in Fortran

I wrote a derived data type to store banded matrices in Compressed Diagonal Storage format; in particular I store each diagonal of the banded matrix in a column of the 2D array cds(1:N,-L:U), where N is the number of rows of the full matrix and L and U are the number of lower and upper diagonals (this question includes the definition of the type).
I also wrote a function to perform the product between a matrix in this CDS format and a full vector. To obtain each element of the product vector, the elements of the corresponding row of cds are used, which are not contiguous in memory, since the language is Fortran. Because of this I was wondering whether a better solution would be to store the diagonals in the rows of a 2D array cds2(-L:U,1:N), which seems pretty reasonable to me.
On the contrary here I read
we can allocate for the matrix A an array val(1:n,-p:q). The declaration with reversed dimensions (-p:q,n) corresponds to the LINPACK band format [132], which, unlike compressed diagonal storage (CDS), does not allow for an efficiently vectorizable matrix-vector multiplication if p + q is small.
Which is just what seems appropriate to C in my opinion. What am I missing?
EDIT
The core of the routine performing matrix vector products is the following
DO i = A%lb(1), A%ub(1)
CDS_mat_x_full_vec(i) = DOT_PRODUCT(A%matrix(i,max(-lband,lv-i):min(uband,uv-i)), &
& v(max(i-lband,lv):min(i+uband,uv)))
END DO
(Where lv and uv are used to take into account the case of the vector indexed from an index other than 1.)
The matrix A is then accessed by rows.
I implemented the derived type that stores the diagonals in an array val(-p:q,1:n), and it is faster, as I expected. So I think that the link I referenced refers to a row-major language such as C and not a column-major one such as Fortran. (Or it implements the matrix product in a way I can't even imagine.)
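One possible reading of the quoted passage, which is an assumption here and not something the question confirms, is that the product loops over diagonals rather than over rows. With val(1:n,-p:q) each diagonal is then a contiguous column in Fortran and the inner loop runs along it. The Python sketch below only illustrates that loop order and indexing (band[k + lower][i] holds A[i][i + k], 0-based); it says nothing about performance, and the helper names are hypothetical.

```python
def to_band(dense, lower, upper):
    """Pack a dense n x n banded matrix into one list per diagonal."""
    n = len(dense)
    band = [[0.0] * n for _ in range(lower + upper + 1)]
    for i in range(n):
        for k in range(-lower, upper + 1):
            j = i + k
            if 0 <= j < n:
                band[k + lower][i] = dense[i][j]
    return band

def band_matvec(band, lower, upper, v):
    """y = A * v, accumulating one whole diagonal at a time."""
    n = len(v)
    y = [0.0] * n
    for k in range(-lower, upper + 1):
        diag = band[k + lower]
        # Valid rows for diagonal k satisfy 0 <= i + k < n.
        for i in range(max(0, -k), min(n, n - k)):
            y[i] += diag[i] * v[i + k]
    return y

dense = [[2, -1, 0, 0],
         [-1, 2, -1, 0],
         [0, -1, 2, -1],
         [0, 0, -1, 2]]
v = [1, 2, 3, 4]
y = band_matvec(to_band(dense, 1, 1), 1, 1, v)
```

The row-wise DOT_PRODUCT version in the question visits the same entries in a different order; which order is faster depends on which index is contiguous in the chosen storage.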

Perfect hashing function in a hash table implementation of a sparse matrix class

I'm currently implementing a sparse matrix for my matrix library - it will be a hash table. I already implemented a dense matrix as a nested vector, and since I'm doing it just to learn new stuff, I decided that my matrices will be multi-dimensional (not just a 2D table of numbers, but also cubes, tesseracts, etc).
I use an index type which holds n numbers of type size_t, n being a number of dimensions for this particular index. Index of dimension n may be used only as an address of an element in a matrix of appropriate dimension. It is simply a tuple with implicit constructor for easy indexing, like Matrix[{1,2,3}].
My question is centered around the hashing function I plan on using for my sparse matrix implementation. I think that the function is always minimal, but is perfect only up to a certain point - to the point of size_t overflow, or an overflow of intermediate operation of the hashing function (they are actually unsigned long long). Sparse matrices have huge boundaries, so it's practically guaranteed to overflow at some point (see below).
What the hashing function does is assign consecutive numbers to matrix elements as follows:
[1 2 3 4 5 6 7 8 ...]^T //transposed 1-dimensional matrix
1 4 7
2 5 8
3 6 9 //2-dimensional matrix
and so on. Unfortunately, I'm unable to show you the ordering for higher-order matrices, but I hope that you get the idea - the value increases top to bottom, left to right, back to front (for cube matrices), etc.
The hashing function is defined like this:
value = x1+d1*x2+d1*d2*x3+d1*d2*d3*x4+...+d1*d2*d3*...*d|n-1|*xn
where:
x1...xn are index members - row, column, height, etc. - {x1, x2, x3, ..., xn}
d1...d|n-1| are matrix boundary dimensions - one past the end of matrix in the appropriate direction
I'm actually using a recursive form of this function (simple factoring, but complexity becomes O(n) instead of O(n^2)):
value = x1+d1(x2+d2(x3+d3(...(x|n-1|+d|n-1|(xn))...)))
I'm assuming that elements will be distributed randomly (uniformly) across the matrix, but the bucket number is hash(key) mod numberOfBuckets, so collisions are practically guaranteed even though the function is perfect.
Is there any way to exploit the features of my hash function in order to minimize collisions?
How should I choose the load factor? Should I leave the choice to the user? Is there any good default value for this case?
Is the hash table actually a good solution for this problem? Are there any other data structures that guarantee average O(1) complexity given that I can roll the index into a number and a number into an index (mind the size_t overflow)? I am aware of different ways to store a sparse matrix, but I want to try the DOK way first.
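For reference, the factored (recursive) form of the hashing function and its inverse can be sketched in a few lines of Python. This assumes 0-based index members, and the names linear_index and unlinear_index are made up for illustration. Python integers do not overflow, so the size_t overflow concern from the question is not modelled here.

```python
def linear_index(index, dims):
    """value = x1 + d1*(x2 + d2*(x3 + ...)), evaluated innermost term first."""
    value = 0
    for x, d in zip(reversed(index), reversed(dims)):
        assert 0 <= x < d
        value = value * d + x
    return value

def unlinear_index(value, dims):
    """Inverse mapping: peel off x1, then x2, ... using divmod."""
    index = []
    for d in dims[:-1]:
        value, x = divmod(value, d)
        index.append(x)
    index.append(value)          # what remains is xn
    return tuple(index)
```

For a 3x3 matrix, linear_index((row, col), (3, 3)) walks down each column first, which is the ordering shown in the example above shifted to start at 0.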

Large Data Set for Processing, need to maintain original data set

Here's my problem definition: I have an array of seven million indices, each containing a label. So, for simplicity, here's an example array that I'm dealing with: [1 2 3 3 3 5 2 1 7].
I need to go through this array and every time I come across a label, input the location of the label into a "set" with all others of the same label. With the array being so large, I want to access only a specific label's location at any given point, so let's say, I want to access only the locations of 3 and process those locations and change them to 5's, but I want to do more than just one operation and not only that, I want to do it on all labels, separately. In a small array like in my example, it seems trivial just to stick with the array. However, with an array of seven million points, it is much more time expensive to complete the searching for said labels and then operate on them.
To clear up confusion, using my example, I want the example array to give me the following:
1 mapped to a set containing 0 and 7
2 mapped to a set containing 1 and 6
3 mapped to a set containing 2, 3, and 4
5 mapped to a set containing 5
Originally, I did my processing directly on the original array. This took roughly 30 seconds to determine the number of indices for each label (so I was able to determine that the size of 1 was two, the size of 2 was two, the size of 3 was three, etc.). However, this method did not produce the locations of the labels. Therefore, time was added throughout the rest of my processing to find the locations of each label, although this was sped up by stopping the search once all the indices of the referenced label had been found.
Next, I used a map<int,set<int>>. This increased the build time to ~100 seconds and decreased the processing time later down the road, but not by enough to justify the heavy increase.
I haven't implemented it yet, but as an additional step I am planning to initialize an array of sets, indexed by label, and try that approach.
I have also tried hash_maps as well to no avail. Unordered_sets and unordered_maps are not included in the STL in Visual Studio 2005 so I have not implemented the above with these two structures.
Key points:
I have pre-processed the array such that I know the maximum label, and that all labels are consecutive (there are no gaps between the minimum label and the maximum). However, they are randomly dispersed in the original array. This may prove useful in initialization of a set size data structure.
Order of the indices corresponding to the labels is not important. Order of the labels in their given data structure is also not important.
Edit:
For background, the array corresponds to a binary image, and I implemented binary sequential labeling to output an array of same size as the binary image of UINT16 with all binary blobs labeled. What I want to do now is to obtain a map of the points that make up each blob as efficiently as possible.
Why use such complicated data structures for this task? Just create a vector of vectors to store all the positions of each label, and that's it. You can also avoid repeated vector reallocation by first counting how much space each label needs. Something like this:
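#include <vector>
using std::vector;

// dataArray holds N labels, each in [0, maxLabel]; maxLabel is known from pre-processing.
const size_t numLabels = maxLabel + 1;
vector <int> count(numLabels);          // number of occurrences of each label
for(size_t i = 0; i < N; ++i)
    ++count[dataArray[i]];
vector < vector <int> > labels(numLabels);
vector <int> last(numLabels);           // next free slot for each label
for(size_t i = 0; i < numLabels; ++i)
    labels[i].resize(count[i]);         // allocate each list once, at its final size
for(size_t i = 0; i < N; ++i) {
    labels[dataArray[i]][last[dataArray[i]]] = i;
    ++last[dataArray[i]];
}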
It runs in O(N) time, which should be on the order of a second for your seven million integers.
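The same count-then-fill idea, sketched in Python on the example array from the question. The function name group_positions is an assumption for illustration, and labels are assumed to be non-negative integers no larger than max_label.

```python
def group_positions(data, max_label):
    """Return a list where entry L holds the positions at which label L occurs."""
    counts = [0] * (max_label + 1)
    for label in data:                       # pass 1: count each label
        counts[label] += 1
    positions = [[0] * c for c in counts]    # allocate each list once
    filled = [0] * (max_label + 1)           # next free slot per label
    for i, label in enumerate(data):         # pass 2: record positions
        positions[label][filled[label]] = i
        filled[label] += 1
    return positions

data = [1, 2, 3, 3, 3, 5, 2, 1, 7]
groups = group_positions(data, max(data))
```

Labels that never occur (0, 4 and 6 in this example) simply end up with empty lists.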
I wouldn't necessarily use general purpose maps (or hash tables) for this.
My initial gut feeling is that I'd create a second array "positions" of seven million (or whatever N) locations, and a third array "last_position_for_index" corresponding to the range [min-label, max-label]. Note that this will almost certainly take less storage than any kind of map.
Initialize all the entries of last_position_for_index to some reserved value, and then you can just loop through your array with something like (untested):
for (std::size_t k = 0; k<N; ++k) {
    IndexType index = indices[k];
    positions[k] = last_position_for_index[index-min_label];
    last_position_for_index[index-min_label] = k;
}
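The loop above threads a linked list through the positions array: each entry points to the previous position that carried the same label. A minimal Python sketch of building and then walking those chains follows. The reserved value NONE = -1 and the function names are assumptions, since the answer leaves the reserved value unspecified.

```python
NONE = -1   # reserved value meaning "no earlier position" (an assumption)

def build_chains(labels, min_label, max_label):
    """positions[k] is the previous position with the same label as labels[k]."""
    positions = [NONE] * len(labels)
    last_position = [NONE] * (max_label - min_label + 1)
    for k, label in enumerate(labels):
        positions[k] = last_position[label - min_label]
        last_position[label - min_label] = k
    return positions, last_position

def positions_of(label, positions, last_position, min_label):
    """Walk the chain for one label, from its last occurrence back to its first."""
    k = last_position[label - min_label]
    while k != NONE:
        yield k
        k = positions[k]

data = [1, 2, 3, 3, 3, 5, 2, 1, 7]
positions, last_position = build_chains(data, 1, 7)
threes = list(positions_of(3, positions, last_position, 1))
```

The positions come out in reverse order, which is acceptable here because the question states that the order of the indices does not matter.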

C++ How to generate the set of cartesian product of n-dimensional tuples

I wish to generate some data that represents the co-ordinates of a cloud of points representing an n-cube of n dimensions. These points should be evenly distributed throughout the n-space and should be able to be generated with a user-defined spacing between them. This data will be stored in an array.
I have found an implementation of a cartesian product using Boost.MPL.
There is an actual Cartesian product in Boost as well, but it is a preprocessor directive, so I assume it is of no use to you.
To keep things simple, here's an example for an ordinary cube, i.e. one with 3 dimensions. Let it have side length 1 and suppose you want points spaced at intervals of 1/n. (This leads to a uniform rectangular distribution of points; I'm not entirely sure that this is what you want.)
Now some pseudo-code:
for i=0;i<=n;i++ //NB i<=n because there will be n+1 points along each axis-parallel line
for j=0;j<=n;j++
for k=0;k<=n;k++
addPointAt(i/n,j/n,k/n) //float arithmetic required here
Note that this is not the Cartesian product of anything but seems to satisfy (a special case of) your criteria. If you want the points spaced differently, adjust the loop start and end indices or the interval size.
Generalising this to any specified higher dimension is easy: add more loops.
To generalise to any higher dimension which is not known until run time is only slightly more difficult. Instead of declaring an N-dimensional array, declare a 1-D array with the same number of elements. Then you have to write the index arithmetic explicitly instead of having the compiler write it for you.
I expect that you are now going to tell me that this is not what you want! If it isn't, could you clarify?
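As a sketch of the run-time-dimension case, the explicit index arithmetic can be done by decoding a single flat counter into one coordinate per dimension. The function name grid_points is hypothetical, and the unit side length and the 1/n spacing are taken from the example above.

```python
def grid_points(dimensions, n):
    """All points of the unit hypercube spaced 1/n apart, as coordinate tuples."""
    side = n + 1                      # n + 1 points along each axis
    points = []
    for flat in range(side ** dimensions):
        coords = []
        rest = flat
        for _ in range(dimensions):   # peel off one axis index at a time
            rest, i = divmod(rest, side)
            coords.append(i / float(n))
        points.append(tuple(coords))
    return points
```

For example, grid_points(3, 2) yields the 27 points of a 3x3x3 lattice, and the dimension is an ordinary argument rather than a fixed number of nested loops.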
You can do this recursively (pseudocode):
Function Hypercube(int dimensions, int current, string partialCoords)
{
    if (current == dimensions)
    {
        print "(" + partialCoords + ")\n";
        return;
    }
    for i=0, i<=steps, i++
    {
        if (current == 0)
            Hypercube(dimensions, current+1, "" + i);
        else
            Hypercube(dimensions, current+1, partialCoords + ", " + i);
    }
}
You call it: Hypercube(n,0,""); This prints the coordinates of all points, but you can also store them in a structure.
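A runnable Python take on the same recursion, written as a generator so the points can be stored as well as printed. The name hypercube and the tuple-based partial argument are assumptions for illustration; it yields integer step indices, which can be scaled by the desired spacing afterwards.

```python
def hypercube(dimensions, steps, partial=()):
    """Yield every index tuple with `dimensions` entries, each in 0..steps."""
    if len(partial) == dimensions:
        yield partial
        return
    for i in range(steps + 1):
        # Extend the partial coordinate by one axis and recurse.
        for point in hypercube(dimensions, steps, partial + (i,)):
            yield point

corners = list(hypercube(2, 1))
```

The standard library's itertools.product(range(steps + 1), repeat=dimensions) produces the same tuples without explicit recursion.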