Counting unique logins using MapReduce

Let's say I have a very big log file in this format (based on where a user logs in):
UserId1 , New York
UserId1 , New Jersey
UserId2 , Oklahoma
UserId3 , Washington DC
....
userId999999999, London
Note that UserId1 logged in from New York first, then flew to New Jersey and logged in again from there.
If I need to count unique user logins (meaning 2 logins with the same user ID are considered 1 login), how should I map and reduce it?
My initial plan is to map it first to this format:
UserId1, 1
UserId1, 1
UserId2, 1
UserId3, 1
And then reduce it to
UserId1, 2
UserId2, 1
UserId3, 1
But would this cause the output to still be large (especially if users commonly log in 1 or 2 times a day)? Or is there a better way to implement this?

Do map-reduce in rounds.
For example, you have 10000 lines of data, but you can only process 1000 lines at a time.
Then process the data in 10 batches of 1000 lines each.
If the 10 results together contain more than 1000 lines:
do the above step again.
else:
use a set directly.

I recommend using a custom key in the map phase. You can refer to the tutorial here for writing and using custom keys. The custom key should have two parts: 1) userid, 2) placeid. So essentially, in the mapper phase you do this:
emit(<userid, place>, 1)
In the reduce phase, you just have to access the key and emit its two parts separately.
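For the count the question asks for, a minimal pure-Python simulation of the map, shuffle and reduce phases might look like the sketch below. It keys on the user ID alone, which is an assumption (the answer above uses a composite key), and the function names `map_phase`, `shuffle` and `reduce_phase` are hypothetical stand-ins, not a real Hadoop API.

```python
from collections import defaultdict

def map_phase(lines):
    """Emit (user_id, 1) for every well-formed 'user , place' log line."""
    for line in lines:
        user_id, sep, _place = line.partition(",")
        if sep and user_id.strip():
            yield user_id.strip(), 1

def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Collapse each user's logins into one record: (user_id, login_count)."""
    for user_id, ones in grouped.items():
        yield user_id, sum(ones)

log = [
    "UserId1 , New York",
    "UserId1 , New Jersey",
    "UserId2 , Oklahoma",
    "UserId3 , Washington DC",
]
per_user = dict(reduce_phase(shuffle(map_phase(log))))
unique_logins = len(per_user)  # one reducer output record per distinct user
```

The reducer output still has one record per distinct user, so it can be large; the unique-login figure is simply the number of those records. On a real cluster that number would more likely come from a second small counting job or from the framework's record counters than from `len()`.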

How to merge these two records into one row, removing null values, in Informatica using a transformation

Here is the scenario:
Input:
Code  value  Min   Max
A     abc    10    null
A     abc    null  20
Output:
Code  value  Min  Max
A     abc    10   20
You can use an Aggregator transformation to remove the nulls and get a single row. This solution is based on your sample data only.
Use an Aggregator with the ports below:
inout_Code (group by)
inout_value (group by)
in_Min
in_Max
out_Min= MAX(in_Min)
out_Max = MAX(in_Max)
Then connect out_Min, out_Max, code and value to the target.
You will get 1 record per combination of code and value, and the null values will be gone.
If you have many more code/value combinations, some of the Min and Max columns are null, and you want multiple records, you need more complex mapping logic. Let me know if this helps. :)
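To illustrate what the Aggregator does with these ports, here is a minimal Python sketch of the same group-by logic. It is not Informatica code; the helper names `max_ignoring_none` and `aggregate` are hypothetical, and it assumes that MAX skips null values, which is what the answer relies on.

```python
from collections import defaultdict

def max_ignoring_none(values):
    """Return the largest non-None value, or None if every value is None."""
    present = [v for v in values if v is not None]
    return max(present) if present else None

def aggregate(rows):
    """Group rows by (code, value) and keep MAX(Min) and MAX(Max) per group."""
    groups = defaultdict(lambda: ([], []))
    for code, value, min_v, max_v in rows:
        mins, maxs = groups[(code, value)]
        mins.append(min_v)
        maxs.append(max_v)
    return [
        (code, value, max_ignoring_none(mins), max_ignoring_none(maxs))
        for (code, value), (mins, maxs) in groups.items()
    ]

# The two input records from the question; None stands for null.
rows = [("A", "abc", 10, None), ("A", "abc", None, 20)]
merged = aggregate(rows)
```

With the sample input this yields a single row per code/value pair, with each null replaced by the non-null value from the other record.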

Implementing a calculated field within my Tableau Viz

I have data in Tableau for which I want to show a breakdown of USED and FREE storage. However, I first need to filter on a specific column to perform 2 different types of calculations. Here is the data:
Total  Free  SKU
10     5     A
20     1     A
5      4     B
2      0     B
10     5     C
10     6     D
I want to show a Tableau bar chart that displays the available, used and total storage. However, I first need to filter by SKU.
I created the two calculated fields below:
Used = Total - Free
IF CONTAINS(ATTR([SKU]),'A') or
CONTAINS(ATTR([SKU]),'D')
THEN SUM([Total])
ELSEIF CONTAINS(ATTR([SKU]),'B') or
CONTAINS(ATTR([SKU]),'C')
THEN AVG([Total])
END
This is what I have done so far, but I am not sure how to incorporate the calculated field into the viz.
Any suggestions are appreciated.
If I understand your problem correctly, proceed like this.
Situation-1: You want to work at SKU level.
Create one calculated field each for Total, Used and Free as:
SUM(ZN(IF CONTAINS([SKU], 'A') OR CONTAINS([SKU], 'D')
THEN [Total] END))
+
AVG(ZN(IF CONTAINS([SKU], 'B') OR CONTAINS([SKU], 'C')
THEN [Total] END))
Needless to say, replace [Total] with [Used] or [Free] as applicable.
Situation-2: You want to work at a higher level of detail instead. In this case you need to decide what to do with each SKU group. Let's assume you want to add them; then creating similar fields will do. Otherwise, create a separate field and replace the + with your desired operator.
Good luck!
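As a rough check of what the Situation-1 formula computes, here is a Python sketch that evaluates it over the question's six sample rows, assuming no dimension in the view (so the aggregates run over the whole table). The helpers `zn` and `combined` are hypothetical names. One thing the sketch makes visible: because ZN is applied per row, rows outside the B/C group enter the average as 0 rather than being skipped.

```python
from statistics import mean

# (Total, Free, SKU) rows copied from the question's sample data.
ROWS = [(10, 5, "A"), (20, 1, "A"), (5, 4, "B"), (2, 0, "B"), (10, 5, "C"), (10, 6, "D")]

def zn(value):
    """Mimic ZN(): turn a null (None) into 0, pass other values through."""
    return 0 if value is None else value

def pick(row, column, tags):
    """Mimic IF CONTAINS([SKU], ...) THEN [column] END: None when no tag matches."""
    return row[column] if any(tag in row[2] for tag in tags) else None

def combined(rows, column):
    """SUM(ZN(... A/D ...)) + AVG(ZN(... B/C ...)) over all rows."""
    sum_part = sum(zn(pick(r, column, ("A", "D"))) for r in rows)
    avg_part = mean(zn(pick(r, column, ("B", "C"))) for r in rows)
    return sum_part + avg_part

total = combined(ROWS, 0)
free = combined(ROWS, 1)
used = total - free  # Used = Total - Free, as defined in the question
```

If the intent is to average only the B and C rows, dropping ZN from inside AVG (and wrapping the aggregate instead) would be one way to get there; which behaviour is wanted depends on the viz.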

How to countif 56 exists in 156/56/2567 and only return true once? Google sheets

I have one sheet with data on my Facebook ads. I have another sheet with data on the products in my store. I'm having trouble with some COUNTIFS where I'm counting how many times my product ID exists in a row that contains multiple numbers. They are formatted like this: /2032/2034/2040/1/
It's easy on the rows where only one product ID exists, but some rows have multiple IDs separated by a /. I need to see whether the ID exists as an exact match, either alone or somewhere between the /'s.
Rows with facebook ads data:
A1: /2032/2034/2040/1/
A2: /1548/84/2154/2001/
A3: /2032/1689/1840/2548/
Row with product data:
B1: 2034
C1: I need a COUNTIFS here that checks how many times B1 exists in column A. Let's say I have thousands of rows with different variations of A1, where B1 could also stand alone. How do I count this? I always need exact matches.
You can test for the number you want (56) with the regex that @MonkeyZeus commented, with a small change: "(?:^|/)"&B1&"(?:/|$)". The end result is:
=IF(REGEXMATCH(A1, "(?:^|/)"&B1&"(?:/|$)"), true, false)
UPDATE
If you need to count the total occurrences of 56 across X rows, you can change the "True / False" of the condition to "1 / 0" and then do a =SUM(C1:C5) on the last row:
=IF(REGEXMATCH(A1, "(?:^|/)"&B1&"(?:/|$)"), 1, 0)
UPDATE 2
Thanks for contributing. Unfortunately I'm not able to do it this way
since I have loads of data to do this on. Is there a way to do it with
a countif in a single cell without adding an extra step with "sum"?
In that case you can do:
=COUNTA(FILTER(A:A, REGEXMATCH(A:A, "(?:^|/)"&B2&"(?:/|$)")))
UPDATE 3
With the following formula you check every possible position of the ID just by adding more COUNTIFs:
=COUNTIF(A:A,B1) + COUNTIF(A:A, "*/"&B1) + COUNTIF(A:A, B1&"/*") + COUNTIF(A:A, "*/"&B1&"/*")
Hope this helps!
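The same delimiter-anchored pattern can be tried outside the sheet. The Python sketch below counts the rows in which an ID appears as a whole "/"-delimited token; the function name `count_rows_with_id` is hypothetical, and the sample rows are the ones from the question.

```python
import re

def count_rows_with_id(rows, product_id):
    """Count rows where product_id appears as a whole '/'-delimited token."""
    # Same shape as the sheet regex: start-or-slash, the ID, slash-or-end.
    pattern = re.compile(r"(?:^|/)" + re.escape(str(product_id)) + r"(?:/|$)")
    return sum(1 for row in rows if pattern.search(row))

rows = [
    "/2032/2034/2040/1/",
    "/1548/84/2154/2001/",
    "/2032/1689/1840/2548/",
]
matches_2034 = count_rows_with_id(rows, 2034)
matches_2032 = count_rows_with_id(rows, 2032)
```

Each row counts at most once, and partial matches such as 56 inside 156 or 2567 are rejected because the ID must be bounded by a slash or by the start or end of the string.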
try:
=COUNTIF(SPLIT(A1, "/"), B1)
UPDATE:
=ARRAYFORMULA(IF(A2<>"", {
SUM(IF((REGEXMATCH(""&DATA!C:C, ""&A2))*(DATA!B:B="carousel"), 1, )),
SUM(IF((REGEXMATCH(""&DATA!C:C, ""&A2))*(DATA!B:B="imagepost"), 1, ))}, ))

Automation Array Formula

I need to make a formula that generates the incident number in this data.
Whenever someone takes an action from the consequences tab, I have to choose the incident and its number. So if someone had the same incident before and gets it again, this is the 2nd time, with a different action. Every action has a 180-day expiry, which is represented in the expiry column: 0 means expired, 1 means not expired.
What I need is to generate the incident number automatically: the array formula should look at the employee's name and count the incident. If the count equals 2, it is the 2nd time for the same incident, so it should generate "2nd Time"; if it equals 3, "3rd Time"; etc.
I have tried this array formula, but it counts all the errors without using the agent as a criterion:
=ARRAYFORMULA(IF(ROW(A:A)=1,"Number of incidents (Automation)",IF(LEN(A:A)=0,IFERROR(1/0),IF(COUNTIF(B:B,B:B)=0," ",IF(COUNTIF(C:C,C:C)=6,"6th Time",IF(COUNTIF(C:C,C:C)=5,"5th Time",IF(COUNTIF(C:C,C:C)=4,"4th Time",IF(COUNTIF(C:C,C:C)=3,"3rd Time",IF(COUNTIF(C:C,C:C)=2,"2nd Time",IF(COUNTIF(C:C,C:C)=1,"1st Time"," "))))))))))
Here is a sample of the data: https://docs.google.com/spreadsheets/d/1OqxTwyeZlbzYsUYIF6sNqkQS3uzEkpyC15RRVP2P4rA/edit?usp=sharing
try like this in D1:
={"Number of incidents (Automation)";
ARRAYFORMULA(IF(LEN(C2:C), COUNTIFS(C2:C, C2:C, ROW(C2:C), "<="&ROW(C2:C)), ))}
if that needs to be per employee then use:
={"Number of incidents (Automation)";
ARRAYFORMULA(IF(LEN(C2:C), COUNTIFS(B2:B&C2:C, B2:B&C2:C, ROW(C2:C), "<="&ROW(C2:C)), ))}
and to exclude expired you can do:
={"Number of incidents (Automation)";
ARRAYFORMULA(IF(F2:F=1, COUNTIFS(B2:B&C2:C&F2:F, B2:B&C2:C&F2:F, ROW(C2:C), "<="&ROW(C2:C)), ))}
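The running count in the last formula (same employee, same incident, not expired, rows up to and including the current one) can be sketched in Python as follows. The column meanings (B = employee, C = incident, F = expiry) are read off the formulas above; the function names and the sample rows in the usage lines are made up for illustration.

```python
from collections import defaultdict

def ordinal(n):
    """Format 1 -> '1st', 2 -> '2nd', 3 -> '3rd', 4 -> '4th', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

def running_incident_labels(rows):
    """rows: (employee, incident, expiry) tuples in sheet order; expiry 1 = active."""
    seen = defaultdict(int)
    labels = []
    for employee, incident, expiry in rows:
        if expiry != 1:
            labels.append("")  # expired actions get no label, like the IF(F2:F=1, ...)
            continue
        seen[(employee, incident)] += 1
        labels.append(f"{ordinal(seen[(employee, incident)])} Time")
    return labels

# Hypothetical sample rows, not taken from the linked spreadsheet.
sample = [
    ("Ann", "Late", 1),
    ("Bob", "Late", 1),
    ("Ann", "Late", 0),
    ("Ann", "Late", 1),
    ("Ann", "Absent", 1),
]
labels = running_incident_labels(sample)
```

Unlike the sheet formulas, which return a bare number, this sketch also adds the "1st/2nd/3rd Time" wording that the question asks for.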
I didn't fully understand what you want, but I hope this helps.
Instead of
IF(COUNTIF(C:C,C:C)=X,"Xth Time"
you can write
COUNTIF(C:C,C:C)&"th Time"
In the full formula:
=ARRAYFORMULA(IF(ROW(A:A)=1,"Number of incidents (Automation)",IF(LEN(A:A)=0,IFERROR(1/0),IF(COUNTIF(B:B,B:B)=0," ",COUNTIF(C:C,C:C)&"th Time"))))

Ruby on Rails Model Actions

I am working on a webserver project and have little experience with MVC architecture. What reading I have done points me to the idea of "Skinny Controllers, Fat Models." Keeping this in mind, I've tried tinkering around in Rails to get the effect I'm looking for:
I accept a string through a form and must sanitize it (assuming this is done through callbacks). After it has been sanitized, I convert it into an array and sort it.
Each step of the sort is recorded in a variable @states. I need to put the @states variable into the database when the process is done.
My question is: how is this setup best implemented? I think I'm correct that the sanitization should be performed through a callback... but the second part is what's giving me fits.
Here's something of an example:
receive input: "4 3 2 1\r\n5 2 2 1\r\n1 5 4 3"
sanitize input to: 4 3 2 1 5 2 2 1 1 5 4 3 # Presumably through a callback
until input.sorted? do:
input.bubblesort(@states) # each time bubblesort moves an entry, it sends a copy of the array to @states
# Do something with @states in the model.rb file
I would love some input on this because I'm at an impasse and nothing quite makes sense.
The value you get from the form submission will be stored in the params hash. I'm fairly certain Rails sanitizes everything automatically as of Rails 4 and that you only need to be wary of tainted data during ActiveRecord queries.
Assuming you have your input: "4 3 2 1\r\n5 2 2 1", I would replace every \r and \n with a space via the .gsub method. Once you have input such as
input = "4 3 2 1 5 2 2 1", you can do input.split(/\s/), which converts the string to an array with the elements split wherever there is whitespace. From there you can easily order the array, since strings are comparable in Ruby (convert the elements to integers first if you need numeric rather than lexicographic ordering).
EDIT:
@states = input.bubblesort(@states)
# then maybe something like this, depending on your table setup:
@new_state = YourModel.create(@states)
If you're using a database like PostgreSQL, you can specify that a certain attribute holds an array. This is done via a migration:
add_column :table_name, :column_name, :string, array: true, default: []
Normally, the only logic kept in the model is your validations and sort/scope methods.
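The parse-then-sort flow described in the question can be sketched as below. This is Python rather than Ruby, purely for illustration, and `bubble_sort_with_states` is a hypothetical helper, not part of Rails. It assumes the input holds only whitespace-separated integers and records a copy of the array after every swap, which is the role @states plays in the question.

```python
def bubble_sort_with_states(raw):
    """Parse whitespace-separated integers and bubble-sort them, recording each swap."""
    values = [int(token) for token in raw.split()]  # split() also handles \r\n
    states = [values.copy()]                        # initial state before sorting
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(values) - 1):
            if values[i] > values[i + 1]:
                values[i], values[i + 1] = values[i + 1], values[i]
                states.append(values.copy())        # snapshot after every swap
                swapped = True
    return values, states

sorted_values, states = bubble_sort_with_states("3 1\r\n2")
```

The returned list of snapshots is what would then be persisted, for example in an array column like the one in the migration above.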