enter image description here
Input-
Code value Min Max
A abc 10 null
A abc Null 20
Output-
Code value Min Max
A abc 10 20
You can use an aggregator transformation to remove nulls and get single row. I am providing solution based on your data only.
use an aggregator with below ports -
inout_Code (group by)
inout_value (group by)
in_Min
in_Max
out_Min= MAX(in_Min)
out_Max = MAX(in_Max)
And then attach out_Min, out_Max, code and value to target.
You will get 1 record for a combination of code and value and null values will be gone.
Now, if you have more than 4/5/6/more etc. code,value combinations and some of min, max columns are null and you want multiple records, you need more complex mapping logic. Let me know if this helps. :)
I have one sheet with data on my facebook ads. I have another sheet with data on the products in my store. I'm having trouble with some countifs where I'm counting how many times my product ID exists in a row where multiple numbers are. They are formatted like this: /2032/2034/2040/1/
It's easy on the rows where only one product ID exists but some rows have multiple ID's separated by a /. And I need to see if the ID exists as a exact match alone or somewhere between the /'s.
Rows with facebook ads data:
A1: /2032/2034/2040/1/
A2: /1548/84/2154/2001/
A3: /2032/1689/1840/2548/
Row with product data:
B1: 2034
C1: I need a countifs here that checks how many times B1 exists in column A. Lets say I have thousands of rows with different variations of A1 where B1 could standalone. How do I count this? I always need exact matches.
You can compare the number you want (56) with the REGEX #MonkeyZeus commented whith a little change -> "(?:^|/)"&B1&"(?:/|$)" so the end result is:
=IF(REGEXMATCH(A1, "(?:^|/)"&B1&"(?:/|$)"), true, false)
Example:
UPDATE
If you need to count the total of 56 in X rows you can change the "True / False" of the condition for "1 / 0" and then do a =SUM(C1:C5) on the last row:
=IF(REGEXMATCH(A1, "(?:^|/)"&B1&"(?:/|$)"), 1, 0)
UPDATE 2
Thanks for contributing. Unfortunately I'm not able to do it this way
since I have loads of data to do this on. Is there a way to do it with
a countif in a single cell without adding a extra step with "sum"?
In that case you can do:
=COUNTA(FILTER(A:A, REGEXMATCH(A:A, "(?:^|/)"&B2&"(?:/|$)")))
Example:
UPDATE 3
With the following condition you check every single possibility just by adding another COUNTIF:
=COUNTIF(A:A,B1) + COUNTIF(A:A, "*/"&B1) + COUNTIF(A:A, B1&"/*") + COUNTIF(A:A, "*/"&B1&"/*")
Hope this helps!
try:
=COUNTIF(SPLIT(A1, "/"), B1)
UPDATE:
=ARRAYFORMULA(IF(A2<>"", {
SUM(IF((REGEXMATCH(""&DATA!C:C, ""&A2))*(DATA!B:B="carousel"), 1, )),
SUM(IF((REGEXMATCH(""&DATA!C:C, ""&A2))*(DATA!B:B="imagepost"), 1, ))}, ))
I have UDF output as :-
Sample records:-
({(Todd,1),(Todd,1),(Todd,1),(Todd,1),(Todd,1),(Todd,5),(Todd,10),(Todd,20),(Todd,10),(Todd,10),(Todd,10),(Todd,10),(Todd,10),(Todd,10)})
({(Jon,1),(Jon,1),(Jon,1),(Jon,1),(Jon,1),(Jon,5),(Jon,10),(Jon,20),(Jon,10),(Jon,10),(Jon,10),(Jon,10),(Jon,5),(Jon,20),(Jon,1)})
Schema for UDF:- name:chararray(1 single column)
Now i want to read this bag of tuples and generate output as :-
Todd,240
Jon,422
The output of the UDF i stored in a temp file and read it back using different schema as:-
D = LOAD '/home/training/pig/pig/UDFdata.txt' AS (B: bag {T: tuple(name:chararray, denom:int)});
After that i am trying to use foreach loop and reference dot notation to find the sum.
X = foreach D generate B.T.name,SUM(B.T.denom);
2017-03-04 13:52:59,507 ERROR org.apache.pig.tools.grunt.Grunt: ERROR
1128: Cannot find field T in name:chararray,denom:int Details at
logfile: /home/training/pig_1488648405070.log
Can you please let me know how to find it? I am new to Apache Pig so not sure how it traverse in Bag of Tuples and find sum.
GROUP the dataset on name before performing SUM.
FLATTEN the bag to perform GROUP.
flattened = FOREACH D GENERATE FLATTEN(B);
dump flattened;
...
(Todd,10)
(Todd,10)
(Jon,1)
(Jon,1)
....
Then, GROUP them on name
grouped = GROUP flattened by name;
dump grouped;
(Jon,{(Jon,1),(Jon,20),(Jon,5),(Jon,10),(Jon,10),(Jon,10),(Jon,10),(Jon,20),(Jon,10),(Jon,5),(Jon,1),(Jon,1),(Jon,1),(Jon,1),(Jon,1)})
(Todd,{(Todd,10),(Todd,10),(Todd,10),(Todd,10),(Todd,10),(Todd,10),(Todd,20),(Todd,10),(Todd,5),(Todd,1),(Todd,1),(Todd,1),(Todd,1),(Todd,1)})
And apply SUM() over the result
final_sum = FOREACH grouped GENERATE group, SUM(flattened.denom);
dump final_sum;
(Jon,106)
(Todd,100)
I am using PLSQL to realize some of the function below.
I have the table which have piece level data with each piece weight. Basically I want to realize the following function:
if piece weight is over 1 LB. groupby ceil(weight) (next LB)
if piece weight is less 1 LB groupby cell(weight*16) ( Next OZ)
I am just curious how can I realize that in plsql. I feel I need to have the if statement. But I am not sure how to do that.
(Weight is already an variable in that table, do I need to declare here?)
begin
if weight <1 then
select ceil(weight*16),sum(weight)
from ops_owner.track_mail_item
where manifestdate = '24-aug-2016'
group by ceil(weight*16)
else select ceil(weight),sum(weight)
from ops_owner.track_mail_item
where manifestdate = '24-aug-2016'
end if,
end;
Thank you very much!
I would adjust the weight value in an inline view, I think.
select
ceil(adjusted_weight),
sum(adjusted_weight)
from
(
select
case
when weight < 1 then weight * 16
else weight
end adjusted_weight
from
ops_owner.track_mail_item
where
manifestdate = '24-aug-2016'
)
group by
ceil(adjusted_weight);