I am developing an app in Flutter, for which I am using lists of maps, but there is something that I am unable to understand. Consider the following cases:
SCENARIO 1
void main() {
  List<Map<String, String>> _reminders = [];
  Map<String, String> _tempMap = {};
  for (int i = 0; i < 5; i++) {
    _tempMap.clear();
    _tempMap.putIfAbsent('M', () => 'm ' + i.toString());
    _tempMap.putIfAbsent('D', () => 'd : ' + i.toString());
    _reminders.add(_tempMap);
    // or _reminders.insert(i, _tempMap);
  }
  print(_reminders.toString());
  return;
}
to which the result is as follows
[{M: m 4, D: d : 4}, {M: m 4, D: d : 4}, {M: m 4, D: d : 4}, {M: m 4, D: d : 4}, {M: m 4, D: d : 4}]
SCENARIO 2
void main() {
  List<Map<String, String>> _reminders = [];
  for (int i = 0; i < 5; i++) {
    Map<String, String> _tempMap = {};
    _tempMap.putIfAbsent('M', () => 'm ' + i.toString());
    _tempMap.putIfAbsent('D', () => 'd : ' + i.toString());
    _reminders.add(_tempMap);
  }
  print(_reminders.toString());
  return;
}
to which the result is as follows
[{M: m 0, D: d : 0}, {M: m 1, D: d : 1}, {M: m 2, D: d : 2}, {M: m 3, D: d : 3}, {M: m 4, D: d : 4}]
As far as I understand, these scenarios should give similar results. In my use case, scenario 2 is the correct way, as it gives me the result that I want. Please note that the above examples have been changed to simplify the question. The usage in my original code is much more complex.
Dart, like many other programming languages including Java, stores references to objects in a list rather than copies of them. In the first case, every iteration of the loop adds the same Map using _reminders.add(_tempMap), so the list ends up holding five references to one map. Your intuition that "Every time I add the Map, a copy of its current state is created and that copy is appended to the list" is incorrect.
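The same behaviour can be reproduced in other languages that store references in collections. Below is a minimal sketch in Python, used here purely as an analogy and not as Dart code; the function and variable names are illustrative. It compares reusing one map, creating a fresh map per iteration, and appending a copy of the reused map.

```python
def build_shared():
    reminders = []
    temp = {}
    for i in range(3):
        temp.clear()
        temp["M"] = f"m {i}"
        reminders.append(temp)  # appends a reference to the same dict
    return reminders


def build_fresh():
    reminders = []
    for i in range(3):
        temp = {"M": f"m {i}"}  # a new dict on every iteration
        reminders.append(temp)
    return reminders


def build_copied():
    reminders = []
    temp = {}
    for i in range(3):
        temp.clear()
        temp["M"] = f"m {i}"
        reminders.append(dict(temp))  # appends a shallow copy
    return reminders


shared = build_shared()
fresh = build_fresh()
copied = build_copied()

print(shared)  # every entry shows the last value
print(fresh)   # one distinct entry per iteration
print(copied)  # same as fresh
```

In the first variant every list element is the same object, which is why all entries show the last value written.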
From my understanding, the two scenarios are different.
The problem is with _tempMap.clear(); in SCENARIO 1. The map is declared outside the loop, so every element of the list is that same map object. When you call clear() inside the for loop, all the previously added entries are removed and the map becomes empty.
when i = 0 => {} => clear() => all entries will be cleared => New item inserted.
when i = 1 => {"Item inserted in 0th iteration"} => clear() => all entries will be cleared => New item inserted.
So on every iteration the map is cleared and holds only the value from the latest iteration. After the for loop has completed, it contains only the last value (i = 4), because the shared map is cleared every time a new iteration starts.
EDIT :
You can print the map inside the for loop and check this yourself.
for (int i = 0; i < 5; i++) {
  print('\n $i => ${_tempMap} \n');
  // ... rest of the loop body as in SCENARIO 1
}
I have come across some code that confuses me. An unordered_map is initialised like this:
std::unordered_map<std::string, int> wordMap;
// Inserting elements through an initializer_list
wordMap.insert({ {"First", 1}, {"Second", 2}, {"Third", 3} } );
But what surprises me is the code below:
int arr[] = { 1, 5, 2, 1, 3, 2, 1 };
unordered_map<int, int> hash;
for (int i = 0; i < n; i++)
    hash[arr[i]]++;
Here I do not understand how the key and value are inserted into the map.
In an unordered_map, hash[arr[i]]++; works this way:
It searches for the key arr[i]. If the key is found, the corresponding value is incremented by 1.
If it is not found, a new element is created with the key arr[i], and because the value is of type int, it is initialised to 0. The ++ operator then increments it by one, so at the end of the operation the value is 1.
To be very explicit for your example, it works like this:
i = 0 => arr[i] = 1 => Not present in map => New pair added => hash: [{1, 1}]
i = 1 => arr[i] = 5 => Not present in map => New pair added => hash: [{1, 1}, {5, 1}]
i = 2 => arr[i] = 2 => Not present in map => New pair added => hash: [{1, 1}, {5, 1}, {2, 1}]
i = 3 => arr[i] = 1 => Present in map => Existing pair updated => hash: [{1, 2}, {5, 1}, {2, 1}]
i = 4 => arr[i] = 3 => Not present in map => New pair added => hash: [{1, 2}, {5, 1}, {2, 1}, {3, 1}]
i = 5 => arr[i] = 2 => Present in map => Existing pair updated => hash: [{1, 2}, {5, 1}, {2, 2}, {3, 1}]
i = 6 => arr[i] = 1 => Present in map => Existing pair updated => hash: [{1, 3}, {5, 1}, {2, 2}, {3, 1}]
The order shown here might differ from the actual one, because an unordered_map does not keep its elements in insertion order. The listing above is only meant to illustrate the steps.
The keys of an unordered map must be unique, so all the 1s in the array map to a single entry. Each time that key comes up again, the loop adds 1 to its value:
hash[arr[i]]++ is equivalent to, for example, hash[1] += 1;
Since there are three 1 values, hash[1] will end up with a value of 3. The value 2 occurs twice, so hash[2] will be 2.
#include <iostream>
#include <unordered_map>

int main()
{
    int arr[] = { 1, 5, 2, 1, 3, 2, 1 };
    std::unordered_map<int, int> hash;
    // Count how many times each value occurs in the array.
    for (int i = 0; i < 7; i++) {
        hash[arr[i]] += 1;
    }
    // Print each key with its count.
    for (const auto& kv : hash) {
        std::cout << kv.first << ":" << kv.second << "\n";
    }
}
# Output (the order may differ):
# 3:1
# 2:2
# 5:1
# 1:3
operator[] checks whether the element exists. If it doesn't, it creates one with a value-initialised value (0 for int) and returns a reference to it. For example:
hash[arr[0]]++
first creates hash[1],
so it becomes
hash[1]++ => hash[1] = hash[1] + 1, which is 0 + 1 (since hash[1] was 0 by default at the beginning).
When it gets to the second 1, it becomes hash[1] = hash[1] + 1 = 2,
and so on for the other values.
Basically, it is counting how many times each value occurs in the array.
At the end it gives you
hash[1]=3
hash[2]=2
hash[3]=1
hash[5]=1
I have a pretty complicated query I am trying to convert over to use with Hive.
Specifically, I am running it as a Hive "step" in an AWS EMR cluster.
I have tried to clean up the query a bit for the post and leave just the essence of it.
The full error message is:
FAILED: SemanticException [Error 10128]: Line XX:XX Not yet supported place for UDAF 'COUNT'
The line number is pointing to the COUNT at the bottom of the select statement:
INSERT INTO db.new_table (
new_column1,
new_column2,
new_column3,
... ,
new_column20
)
SELECT MD5(COALESCE(TBL1.col1," ")||"_"||COALESCE(new_column5," ")||"_"||...) AS
new_col1,
TBL1.col2,
TBL1.col3,
TBL1.col3 AS new_column3,
TBL1.col4,
CASE
WHEN TBL1.col5 = …
ELSE "some value"
END AS new_column5,
TBL1.col6,
TBL1.col7,
TBL1.col8,
CASE
WHEN TBL1.col9 = …
ELSE "some value"
END AS new_column9,
CASE
WHEN TBL1.col10 = …
ELSE "value"
END AS new_column10,
TBL1.col11,
"value" AS new_column12,
TBL2.col1,
TBL2.col2,
from_unixtime(…) AS new_column13,
CAST(…) AS new_column14,
CAST(…) AS new_column15,
CAST(…) AS new_column16,
COUNT(DISTINCT TBL1.col17) AS new_column17
FROM db.table1 TBL1
LEFT JOIN
db.table2 TBL2
ON TBL1.col311 = TBL2.col311
WHERE TBL1.col14 BETWEEN "low" AND "high"
AND TBL1.col44 = "Y"
AND TBL1.col55 = "N"
GROUP BY 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20;
If I have left out too much, please let me know.
Thanks for your help!
Updates
It turns out I did in fact leave out way too much info. Sorry to those who have already tried to help...
I made the updates above.
Removing the 20th group by column, e.g.:
GROUP BY 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19;
Produced: Expression not in GROUP BY key '' ''
LATEST
Removing the 20th group by column and adding the first one, e.g.:
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19;
Produced:
Line XX:XX Invalid table alias or column reference 'new_column5':(possible column
names are: TBL1.col1, TBL1.col2, (looks like all columns of TBL1),
TBL2.col1, TBL2.col2, TBL2.col311)
The line number refers to the line with the SELECT statement. Only those three columns from TBL2 are listed in the error output.
The error seems to be pointing to COALESCE(new_column5). Note that I have a CASE statement within the TBL1 select which I alias with AS new_column5.
You are referencing the calculated column name new_column5 at the same subquery level where it is being calculated. This is not possible in Hive. Replace it with the calculation itself, or reference it from an outer query that wraps this one as a subquery.
This:
MD5(COALESCE(TBL1.col1," ")||"_"||COALESCE(CASE WHEN TBL1.col5 = … ELSE "some value" END," ")||"_"||...) AS new_col1,
Instead of this:
MD5(COALESCE(TBL1.col1," ")||"_"||COALESCE(new_column5," ")||"_"||...) AS
new_col1,
I have a DataFrame with the following simple schema:
root
|-- amount: double (nullable = true)
|-- Date: timestamp (nullable = true)
I was trying to see the sum of amounts per day and per hour, something like:
+---+--------+--------+ ... +--------+
|day| 0| 1| | 23|
+---+--------+--------+ ... +--------+
|148| 306.0| 106.0| | 0.0|
|243| 1906.0| 50.0| | 1.0|
| 31| 866.0| 100.0| | 0.0|
+---+--------+--------+ ... +--------+
First I added an hour column, then I grouped by day and pivoted by hour. However, I got an exception, which is perhaps related to missing sales for some hours. This is what I'm trying to fix, but I haven't figured out how.
(df.withColumn("hour", hour("date"))
.groupBy(dayofyear("date").alias("day"))
.pivot("hour")
.sum("amount").show())
An excerpt of the exception.
AnalysisException: u'resolved attribute(s) date#3972 missing from
day#5367,hour#5354,sum(amount)#5437 in operator !Aggregate
[dayofyear(cast(date#3972 as date))], [dayofyear(cast(date#3972 as
date)) AS day#5367, pivotfirst(hour#5354, sum(amount)#5437, 0, 1, 2,
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
22, 23, 0, 0) AS __pivot_sum(amount) AS sum(amount)#5487];'
The problem is the unresolved day column. You can create it outside the groupBy clause to address that:
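from pyspark.sql.functions import col, dayofyear, hour

# sc is assumed to be an existing SparkContext.
df = (sc
    .parallelize([
        (1.0, "2016-03-30 01:00:00"), (30.2, "2015-01-02 03:00:02")])
    .toDF(["amount", "Date"])
    .withColumn("Date", col("Date").cast("timestamp"))
    .withColumn("hour", hour("Date")))

# Create the day column before grouping so that it can be resolved.
with_day = df.withColumn("day", dayofyear("Date"))
with_day.groupBy("day").pivot("hour", list(range(0, 24))).sum("amount")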
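The values argument for pivot is optional but advisable: when it is given, Spark does not have to compute the distinct values of the pivot column first, and hours with no data still get a column.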
Apologies if this is obvious, but I'm pretty new to Python and I cannot get my head around this problem. In the following code I have populated a tuple with a series of lists, and I am trying to create a new list with items from this tuple. I was hoping that the final result would be that test_raw remains unchanged and test_working looks like the following:
[['aa', 1, 2, 99.5, ['bb', 1, 2, 27.2]],
['aa', 5, 5, 74.2, ['bb', 5, 5, 37]]]
However, in the process, I seem to be appending the 'bb' lists to my tuple as well. I thought that once a tuple is constructed, it cannot be changed but obviously not. Any idea what is happening?
test_raw = (['aa',1,2,99.5],
            ['bb',1,2,27.2],
            ['aa',5,5,74.2],
            ['bb',5,5,37])
test_working = []
for i in range(len(test_raw)):
    if test_raw[i][0] == "aa":
        test_working.append(test_raw[i])
for i in range(len(test_raw)):
    if test_raw[i][0] == "bb":
        for j in range(len(test_working)):
            if test_working[j][1:3] == test_raw[i][1:3]:
                test_working[j].append(test_raw[i])
                break
print(test_raw)
(['aa', 1, 2, 99.5, ['bb', 1, 2, 27.2]], ['bb', 1, 2, 27.2], ['aa',.....)
You are not appending to the tuple itself, but to the lists inside the tuple. I won't debug your code for you, but when you run it, you'll notice that your first list (originally ['aa',1,2,99.5]) has a new element in it (['bb', 1, 2, 27.2]).
You aren't appending to the tuple; you are just changing the lists that are inside that tuple. The tuple still holds the same four list objects, and test_working holds references to those same lists, not copies.
Consider this simple example
my_tuple = (1,2,3, [4,5,6])
my_tuple[3].append(7)
This doesn't add an element to my_tuple; it only extends the list that is its last element.
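A minimal sketch of one way to keep the original tuple's lists untouched, assuming that is the goal: copy each inner list before appending to it. The names raw and working are illustrative, not taken from the question.

```python
raw = (["aa", 1, 2, 99.5], ["bb", 1, 2, 27.2])

# Appending the inner list itself stores a reference to the same object.
working = [raw[0]]
same_object = working[0] is raw[0]
print(same_object)

# Copy the inner list first, so later appends go to the copy only.
working = [list(raw[0])]
working[0].append(raw[1])

print(raw)      # the lists inside the tuple are unchanged
print(working)  # the copy carries the appended 'bb' list
```

Note that list(...) makes a shallow copy: the appended 'bb' list is still shared with raw, which is fine as long as it is not modified afterwards.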