How to transpose a 5x5 matrix using MapReduce

I tried having the mapper divide the matrix into 2x2 blocks and pass them to the reducer, which transposes each 2x2 block. But dividing an odd-order matrix into 2x2 blocks leaves a 1x1 block at the end. How should I write the map and reduce functions to overcome this problem? Please suggest a solution.

Assuming the matrix is stored as CSV, with commas separating columns and newlines separating rows, you can read it line by line in the map task.
For each line, split on the comma to get one token per column.
You will need a custom value object for the reducer (implementing Writable) that stores the line number and the value at a specific column.
The map task emits the column number as the key and the custom value object described above as the value.
You will also need a secondary sort (a custom comparator) that orders values by the custom value object's line_number field, so that the row order is preserved when the values are iterated in the reducer.
In the reducer, the keys arrive sorted by column number; for each key, iterate over its values (sorted by line number), build a CSV string, and write it to the output file.
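The steps above can be sketched as a plain Python simulation of the map, shuffle, and reduce phases. This is not Hadoop code: the function names `map_line`, `reduce_column`, and `transpose_csv` are hypothetical, and the in-memory dictionary only stands in for the framework's grouping of values by key.

```python
from collections import defaultdict

def map_line(line_number, line):
    """Map step: emit (column_index, (line_number, token)) for each CSV token."""
    for col, token in enumerate(line.split(",")):
        yield col, (line_number, token)

def reduce_column(values):
    """Reduce step: order one column's values by line number and join as CSV."""
    ordered = sorted(values, key=lambda v: v[0])  # stands in for the secondary sort
    return ",".join(token for _, token in ordered)

def transpose_csv(lines):
    """Simulate the shuffle: group mapper output by column index, then reduce."""
    groups = defaultdict(list)
    for line_number, line in enumerate(lines):
        for col, value in map_line(line_number, line):
            groups[col].append(value)
    # Keys (column indices) are processed in sorted order, one output row each.
    return [reduce_column(groups[col]) for col in sorted(groups)]
```

Because every cell is keyed by its own column index, no block splitting is involved, so odd orders such as 5x5 need no special handling.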

Related

In LibreOffice Calc, which formula will check if a keyword or part of it is contained in a cell in a row and copy the entire content of that cell?

I am learning how to use formulas in spreadsheets; I use LibreOffice.
I need to sort out the data in a huge, messy spreadsheet.
Each column contains mixed data, and the sheet has dozens of columns and thousands of rows. If the spreadsheet contains no errors, each cell in a row either contains a different keyword or is empty; no two cells in the same row should contain the same keyword.
The goal is to produce a new spreadsheet in which each cell marked with a given keyword stays in its row but is placed in the one column dedicated to that keyword.
[Image: the kind of spreadsheet with mixed-up cells to be sorted out]
[Image: how the data in the spreadsheet should appear once fixed]
A formula that can be used to extract the matching data from a cell is the following:
=IF(SEARCH("Text1";B2;1);B2;0)
The formula can be dragged down so that each copy checks the cell next to it.
The results are correct for the matching cells, but I do not know why the expected 0 is not printed; there is #VALUE! instead.
The logic is very simple: if the cell contains the keyword, or any other text that contains that keyword, the result is the full content of that cell; otherwise the result is 0.
Here comes the first question: why do I get #VALUE! for the cells that do not contain the keyword? I expected to get 0 instead, just as indicated in the formula.
I tried leaving this field empty and also putting the 0 in quotes; the result is always the same, #VALUE!.
However, this formula only extracts the information contained in one column, so the process must be repeated for every other column.
To avoid creating a formula column for each column in the spreadsheet, processing the columns one by one, and then having to merge all the results into one column containing only the cells with a given keyword, I thought of using the same formula and extending the check to each next cell in the row, as follows:
=IF(SEARCH("text";B2;1);B2;IF(SEARCH("text";C2;1);C2;IF(SEARCH("text";D2;1);D2;0)))
The logic is very simple and should produce, in one go, a column containing the cell from each row that contains the keyword. Check whether the first cell in the row contains the word using the SEARCH function; if it does, the result is the content of that cell; otherwise perform the next test. The next test is the same for the next cell, and so on until the last test. If no test gives a true result, print 0 (but we get #VALUE!; I could live with that).
In theory this should work for any number of cells, but in practice it does not: it works only for the first IF test and the first cell named in the formula.
Why?
The result of the extended formula, which should check N cells in sequence, is the same as the result of the simple formula that checks only one cell.
Finally, how do I resolve this problem using IF and SEARCH?
Is there a better approach to this kind of problem, for sorting out data in huge spreadsheets like this one?
Thank you for any hint and help.
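A minimal sketch of the row scan described in the question, written in Python rather than as a Calc formula, may make the intended logic concrete. It assumes a case-insensitive substring match, and the names `first_match` and `row` are hypothetical. One point worth checking against the LibreOffice documentation: SEARCH returns the #VALUE! error, not FALSE, when the text is not found, and IF passes an error in its condition straight through. That would explain both the missing 0 and why the nested tests never run; wrapping each test as ISNUMBER(SEARCH(...)) is the usual way to get a true/false condition.

```python
def first_match(row, keyword, default=0):
    """Return the first cell in `row` whose text contains `keyword`, else `default`."""
    needle = keyword.lower()
    for cell in row:
        # Empty cells (None or "") are skipped; matching ignores case.
        if cell and needle in str(cell).lower():
            return cell
    # No cell matched: fall through to the default, like the final 0 in the formula.
    return default
```

For example, `first_match(["", "abc Text1 xyz", "other"], "text1")` returns the second cell, and a row with no match returns 0.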

How to write columnwise in FORMAT in FORTRAN 77

I am using FORTRAN 77 as a third-party language in the ANSYS computation software. We can write entire rows and columns to files during I/O operations. However, I am not able to move the cursor back to the first row and write column-wise from there for every column of the 2D array. Unfortunately, it writes all the data in a single column. I need to know what I can use in the place marked XXX:
*CFOPEN, ACT_STR, CSV,,APPEND
*DO,INF,1,2*S,1
*VWRITE, S0(1,INF),
(XXX,F10.2,',')
*CFCLOS
You can try transposing the matrix and then printing it row-wise. You can write a small subroutine that computes the transpose of S0.
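As an illustration of the transpose-then-print idea only, here is a short Python sketch; it is not APDL or FORTRAN. The function name `format_columns_as_rows` is hypothetical, and the `10.2f` format is chosen to mirror the F10.2 descriptor in the question.

```python
def format_columns_as_rows(table):
    """Transpose a 2D list and format each original column as one CSV line."""
    lines = []
    for column in zip(*table):  # zip(*table) yields the columns of `table`
        lines.append(",".join(f"{value:10.2f}" for value in column))
    return lines
```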

Map-reduce : work on multiple lines

I have a requirement where I need to work on multiple rows of input data: first sort the data, then subtract a value in row one from the value in the next row, and so on. Can we do this operation in MapReduce somehow?
You can write a custom RecordReader that sends the desired number of records to each map task, and perform the calculations there.
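Once the required records reach a single task together, the computation itself is small. A minimal Python sketch follows, assuming one numeric value per row and that "subtract" means next value minus previous value after sorting; the question does not pin down either point, and `consecutive_differences` is a hypothetical name.

```python
def consecutive_differences(values):
    """Sort the values, then return each value minus the one before it."""
    ordered = sorted(values)
    # Pair every element with its successor and subtract.
    return [nxt - prev for prev, nxt in zip(ordered, ordered[1:])]
```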

Data structure for storing unique values

I need to keep data of the following form:
(a,b,1),
(c,d,2),
(e,f,3),
(g,h,4),
(i,j,5),
(k,l,6),
(m,a,7)
...
such that the integers in the third column are consecutive and unique. There are 2,954,208,208 such rows. I am looking for a data structure that returns the value of the third column given the values of the first two columns, e.g.
Given: (i,j) it returns 5
And given the value of the third column, the first two columns can be retrieved. For example,
Given: 5 it returns (i,j)
Is there a data structure that can help me achieve this?
My approach was to use hash maps, but they do not turn out to be efficient. Is there some other way?
The values in the first, second and third columns are all 64-bit.
I have 4GB of RAM.
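One possible layout, sketched in Python at toy scale: because the ids are consecutive, the reverse lookup can be a plain array index, and the forward lookup can be a binary search over the pairs in sorted order. The class name `PairIndex` and its methods are hypothetical, and the sketch assumes ids start at 1 and that each pair occurs once. At the stated size, two 64-bit values per row come to 2,954,208,208 × 16 bytes, roughly 47 GB, so with 4 GB of RAM the arrays would have to live on disk (for example memory-mapped) rather than in Python lists.

```python
from bisect import bisect_left

class PairIndex:
    """Two-way lookup between (first, second) pairs and consecutive ids from 1."""

    def __init__(self, rows):
        # rows: iterable of (first, second, id) where ids are 1..n, each used once.
        ordered = sorted(rows, key=lambda r: r[2])
        self.pairs = [(a, b) for a, b, _ in ordered]  # position i holds id i + 1
        # Positions of the pairs in sorted-pair order, for binary search.
        self.by_pair = sorted(range(len(self.pairs)), key=self.pairs.__getitem__)
        self.sorted_pairs = [self.pairs[i] for i in self.by_pair]

    def pair_of(self, ident):
        """Reverse lookup: id -> pair, by direct indexing."""
        return self.pairs[ident - 1]

    def id_of(self, pair):
        """Forward lookup: pair -> id, by binary search."""
        pos = bisect_left(self.sorted_pairs, pair)
        if pos < len(self.sorted_pairs) and self.sorted_pairs[pos] == pair:
            return self.by_pair[pos] + 1
        raise KeyError(pair)
```

With the sample rows from the question, `id_of(("i", "j"))` gives 5 and `pair_of(5)` gives `("i", "j")`.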

Splitting File Input Into Numerous Vectors

I have a CSV file with about 10 columns. I'm trying to figure out the best way to handle it.
Data looks like this:
"20070906 1 0 0 NO"
There are about 40,000 records like this to be analyzed. I'm not sure what is best here: split each column into its own vector, or put each whole row into a vector.
Thanks!
I think this is a somewhat subjective question, but IMHO a single vector that contains the split-up rows will likely be easier to manage than separate vectors for each column. You could even create a row object for the vector to store, to make accessing and processing the data in the rows and columns friendlier.
However, if you only process the data at the column level, and not at the row or entry level, individual column vectors would be easier.
Since the data set is fairly small (assuming you are using a PC and not some other device, like a smartphone), you can read the file line by line into a vector of strings, then parse the elements one by one and populate a vector of structures holding the record data.
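The row-object approach can be sketched as follows, in Python, with a list playing the role of the vector. The field names `date`, `counts`, and `flag` are guesses based on the single sample line, and the sketch assumes whitespace-separated fields wrapped in optional quotes, as in that sample; adjust both to the real file.

```python
from dataclasses import dataclass

@dataclass
class Record:
    date: str      # first field, e.g. "20070906" (meaning assumed)
    counts: list   # the integer fields in the middle (meaning assumed)
    flag: str      # last field, e.g. "NO" (meaning assumed)

def parse_line(line):
    """Split one quoted, whitespace-separated line into a Record."""
    fields = line.strip().strip('"').split()
    return Record(date=fields[0],
                  counts=[int(x) for x in fields[1:-1]],
                  flag=fields[-1])

def load_records(lines):
    """Parse every non-blank line into a list of Record objects."""
    return [parse_line(line) for line in lines if line.strip()]
```

A single column is still easy to pull out when needed, for example `[r.flag for r in records]`.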