Getting the first row of the element from an array - apache-spark-2.0

I want to get the first row from a Spark 2 dataset. The dataset is as follows:
|arrayValue |
+-------------------------------------------------------------+
|[1.47527718E12, 134535353E12] |
+-------------------------------------------------------------+
I used the code below to access the two values:
double training_point = (double) ratios.collectAsList().get(0).getDouble(0);
double validation_point = (double) ratios.collectAsList().get(0).getDouble(1);
but it throws the exception below:
java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to java.lang.Double
Does anyone know how to fix the error?

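I think you are trying to read two separate values, when the row only has one column and that column holds an array. getDouble(0) therefore tries to cast the whole array to a Double, which is what the ClassCastException reports. Read column 0 as a list first (for example with getList(0)) and then take elements 0 and 1 from that list.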

Related

Check if two arrays are exactly the same in BigQuery merge statement

I have two tables in BigQuery that I am trying to merge. For the purpose of explanation, let us call the two tables A and B, so we merge B into A. I have a primary key called id on which I perform the merge. Both tables have a column (let us call it X) of type ARRAY. My main intention is to replace the array data in A with that of B if the arrays in the two tables are not equal. How can I do that? I did find posts on SO and other sites, but none of them work for my use case.
A B
---------- ----------
id | x id | x
---------- ----------
1 | [1,2] 1 | [1,2]
---------- ----------
2 | [3] 2 | [4, 5]
The result of the merge should be
A
----------
id | x
----------
1 | [1,2]
----------
2 | [4,5]
How can I achieve the above result? Any leads will be very helpful. Also, if there are other posts that address this scenario directly, please point me to them.
Edits:
I tried the following:
merge A as main_table
using B as updated_table
on main_table.id = updated_table.id
when matched and main_table.x != updated_table.x then update set main_table.x = updated_table.x
when not matched then
insert(id, x) values (updated_table.id, updated_table.x)
;
Hope this helps.
I cannot directly use a comparison operator on arrays, right? My use case is to update values only when they are not equal, so I cannot use something like != directly. This is the main problem.
You can use the to_json_string function to compare two arrays "directly":
to_json_string(main_table.x) != to_json_string(updated_table.x)

Django Queryset GET (1 result) checking MAX value of one field

Maybe the solution is to do it with Filter and then loop. But let's see if you guys can tell me a way to do it with GET
I have this query with GET as I need to be sure I get only one result
result = OtherModel.objects.get(months_from_the_avail__lte=self.obj.months_from_avail)
Months_from_avail is an Integer value.
Example
months_from_the_avail = 22
In the other model there are 3 rows:
A) months_from_the_avail = 0
B) months_from_the_avail = 7
C) months_from_the_avail = 13
So, when I query, it returns all of them, as all are less than or equal to the value 22, but I need to get the 13 as it is the last range.
range 1 = 0-6
range 2 = 7-12
range 3 = 13 ++
Is there any way that I haven't thought to do it? Or should I change it to filter() and then loop on the results?
You can order the query by months_from_the_avail in descending order and take the first() result.
Remember that Django querysets are lazy: the query is not executed until it is evaluated, so you can still chain filter(), order_by() and first():
result = OtherModel.objects.filter(months_from_the_avail__lte=self.obj.months_from_avail).order_by('-months_from_the_avail').first()
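# Order descending and take the first object, which is the largest; returns None if the queryset is empty
Another suggestion, from Abdul, is to use latest(), which is shorter. Pass the field name without a leading minus so that the largest value is returned, and keep the filter. Note that latest() raises DoesNotExist instead of returning None when nothing matches:
OtherModel.objects.filter(months_from_the_avail__lte=self.obj.months_from_avail).latest('months_from_the_avail')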
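The selection logic behind these queries (keep the rows at or below the value, then take the largest) can be sketched in plain Python, independent of the ORM. This is only an illustration: the function name `pick_range_start` is hypothetical, and the thresholds are the sample values from the question.

```python
def pick_range_start(thresholds, value):
    """Return the largest threshold that is <= value, or None if there is none."""
    # Equivalent of filter(months_from_the_avail__lte=value)
    candidates = [t for t in thresholds if t <= value]
    # Equivalent of order_by('-months_from_the_avail').first()
    return max(candidates, default=None)


thresholds = [0, 7, 13]  # rows A, B and C from the question
print(pick_range_start(thresholds, 22))  # 13
print(pick_range_start(thresholds, 9))   # 7
print(pick_range_start(thresholds, -1))  # None
```

Returning None for the empty case mirrors first(); a version modelled on latest() would raise an exception there instead.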

Problem while implementing the correlation coefficient

I'm new to Power BI/Power Query and I was trying to write a function that calculates the correlation coefficient of 2 given lists.
I used the formula in the following image :
[![enter image description here][1]][1]
let
Function = (l1 as list , l2 as list) =>
let
CorCoefNumerator = List.Sum((l1 - List.Average(l1)) * (l2 -
List.Average(l2))),
Denominator1 = List.Sum(Number.Power(l1 - List.Average(l1), 2)),
Denominator2 = List.Sum(Number.Power(l2 - List.Average(l2), 2)),
CorCoefDenominator = Number.Sqrt(Denominator1 - Denominator2),
CorCoef = Value.Divide(CorCoefNumerator, CorCoefDenominator)
in
CorCoef
in
Function(Table.ToList([Sales]), Table.ToList([Profit]))
The Error message I'm getting is :
An error occurred in the ‘’ query. Expression.Error: There is an unknown identifier. Did you use the [field] shorthand for a _[field] outside of an 'each' expression?
One more question: is there a way to use DAX functions while writing Power Query queries? When I first tried to compute this correlation coefficient I needed it to work on columns, but since I couldn't use the DAX functions I had to use my columns as lists.
[1]: https://i.stack.imgur.com/5z3uJ.png
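For reference, here is a minimal Python sketch of the calculation, assuming the image shows the standard Pearson correlation formula (the image itself is not reproduced here). It makes the element-by-element steps explicit, and it multiplies the two sums of squares under the square root, which differs from the `Denominator1 - Denominator2` in the query above. The function name `pearson` and the sample numbers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations from the mean, computed element by element
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    # Product (not difference) of the two sums of squares
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    # Note: denominator is 0 if either list is constant; not handled here
    return numerator / denominator


sales = [1.0, 2.0, 3.0, 4.0]   # made-up sample data
profit = [2.0, 4.0, 6.0, 8.0]  # exactly proportional to sales
print(pearson(sales, profit))
```

Because the sample profit values are an exact multiple of the sales values, the result should be 1 up to floating-point rounding.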

Check if value of Cell is in another Sheet2.CellRange if yes, take the value of Sheet2.CellD position

Note: I'm using OpenOffice, but I can't use the "OpenOffice" tag. =|
I have this Sheet2:
and I'm planning to type values from B4:B12 into another sheet.
For example, if I type the value 4 in A1, column B should be filled with D4 and column C with E4 (from the matching row of the source sheet).
Sheet1 should get the value of D or E from the row where Sheet2.B equals Sheet1.A:
--A--B--C
1|4-D4--E4
2|
3|7-D7--E7
4|1-D1--E1
and I tried this:
LOOKUP(A1;Sheet2.B1:Sheet2.B12;Sheet2.D4:Sheet2.D12);
but it's not getting the value; sometimes it just returns #NAME
I believe your ranges are written incorrectly.
First, Sheet2.B1:Sheet2.B12 should be Sheet2.B1:B12
Second, for the LOOKUP function, the search range and the result range must be the same size (take a look at the online documentation for details).
Try this instead:
LOOKUP(A1;Sheet2.B1:B12;Sheet2.D1:D12);
Please try this in B1, copy it across to C1, then copy both down as far as needed:
=IF(ISERROR(LOOKUP($A1;Sheet2.$B$4:$B$12;Sheet2.D$4:D$12));"";LOOKUP($A1;Sheet2.$B$4:$B$12;Sheet2.D$4:D$12))

Python Pandas: How to get column values to be a list?

I have a dataframe with one column and 20 rows. I want to use
dataframe[column].apply(lambda x : some_func(x))
to get a second column. The function returns a list. Pandas is not giving me what I want: it fills the second column with NaN instead of the lists that some_func() returns.
Is there a clever or simple way to fix this?
It seems that the error was caused by my forgetting to include:
axis = 1
My full line of code should have been:
dataframe[column].apply(lambda x : some_func(x), axis = 1)
You can just assign the result to a new column, the same way you would set a key in a dictionary:
dataframe['column2'] = dataframe['column1'].apply(lambda x : some_func(x))
Simple as that.
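A small runnable sketch of that assignment, assuming pandas is installed. The `some_func` here is a hypothetical stand-in for the function in the question (it just returns a two-element list), and the column names and data are made up.

```python
import pandas as pd


def some_func(value):
    # Hypothetical stand-in for the question's some_func: returns a list
    return [value, value * 2]


dataframe = pd.DataFrame({"column1": [1, 2, 3]})

# Series.apply calls some_func once per value; each returned list
# is stored as a single object in the new column
dataframe["column2"] = dataframe["column1"].apply(some_func)

print(dataframe)
```

The new column has object dtype, with one Python list per row.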