How to distribute values into groups in Python - python-2.7

I have a dataset of actions recorded over time, with an attribute 'Hour' (containing values from 0 to 23). Now I want to create another attribute, say 'PartOfDay', which groups the 24 hours into 4 parts. For rows with an 'Hour' value of 0 to 5, the 'PartOfDay' value should be 1; if the 'Hour' value is in [6,11], the 'PartOfDay' value should be 2; and so on. How can I do this?
This code does it:
train['PartOfDay']=1
train.loc[(train.Hour>=6) & (train.Hour<=11),'PartOfDay']=2
train.loc[(train.Hour>=12) & (train.Hour<=17),'PartOfDay']=3
train.loc[(train.Hour>=18) & (train.Hour<=23),'PartOfDay']=4
but it does not look very elegant. I would like to know a cleaner way if possible.
Thank you for all your support!

While it is not clear what train.loc represents, a general approach to your problem is to use integer division (not modulus) to compute the RHS:
train['PartOfDay'] = 1 + train.Hour // 6
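The grouping arithmetic can be sketched in plain Python. This is only an illustration: the function name `part_of_day` is made up here, and with pandas the same expression should apply element-wise to the column, e.g. `train['Hour'] // 6 + 1`.

```python
def part_of_day(hour):
    """Map an hour in 0..23 to a part of day in 1..4 (six hours each)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    # Floor division groups 0-5 -> 0, 6-11 -> 1, 12-17 -> 2, 18-23 -> 3.
    return hour // 6 + 1


hours = [0, 5, 6, 11, 12, 17, 18, 23]
parts = [part_of_day(h) for h in hours]
print(parts)  # [1, 1, 2, 2, 3, 3, 4, 4]
```

This only works because the four groups have equal width; for uneven bins an explicit lookup would be needed.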

Related

Code to missing values if all Items of an Item battery have value 1

I have a large data set in Stata.
There are several item batteries in this data set.
One item battery consists of 8 items (v1 - v8), each scaled from 1 to 7.
I want to recode the items as missing wherever all of them take the value 1.
That is, if v1 to v8 all have the value 1 in a row, those values should be replaced with missings in that row.
I know how to code missing values with the if qualifier, but the selection with the complex condition causes me difficulties.
The code for R would probably solve this via rowSums, but I need the solution for Stata.
I assume in R it would work like this:
df[rowSums(df[,c("v1", ... "v8")]!=1)==0, c("v1", .... "v8")] <- NA
If I understood this correctly, you want
egen rowall = concat(v1-v8)
mvdecode v1-v8 if rowall == 8 * "1", mv(1)
That is, all instances in v1-v8 of 1 are recoded as missing if and only if the values of those variables are all 1 in any observation.
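For readers who want to check the rule outside Stata, here is a rough plain-Python sketch of the same row-wise logic. The names `ITEMS` and `blank_all_ones` are hypothetical, and `None` stands in for a missing value.

```python
ITEMS = ["v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8"]


def blank_all_ones(row):
    """Return a copy of row with every item set to None if all items equal 1."""
    if all(row[name] == 1 for name in ITEMS):
        return {**row, **{name: None for name in ITEMS}}
    return dict(row)


rows = [
    dict.fromkeys(ITEMS, 1),               # all ones: items become missing
    {**dict.fromkeys(ITEMS, 1), "v3": 4},  # one item differs: row is kept as-is
]
cleaned = [blank_all_ones(r) for r in rows]
```

Rows where only some items equal 1 keep those 1s, which matches the "if and only if all are 1" condition.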

How to merge these two records into one row, removing the null values, in Informatica using a transformation. Please see the snapshot for the scenario

Input-
Code  value  Min   Max
A     abc    10    null
A     abc    null  20
Output-
Code  value  Min  Max
A     abc    10   20
You can use an Aggregator transformation to remove the nulls and get a single row. This solution is based only on the data you showed.
Use an Aggregator with the ports below:
inout_Code (group by)
inout_value (group by)
in_Min
in_Max
out_Min= MAX(in_Min)
out_Max = MAX(in_Max)
Then connect out_Min, out_Max, code and value to the target.
You will get one record per combination of code and value, and the null values will be gone.
If you have many code/value combinations where some of the min and max columns are null, and you want multiple records per combination, you need more complex mapping logic. Let me know if this helps. :)
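The grouping idea behind the Aggregator can be sketched in plain Python. This is not Informatica code, just an illustration of "group by code and value, take the MAX of each column while ignoring nulls". The helper names `max_ignoring_none` and `collapse_nulls` are made up for this sketch.

```python
def max_ignoring_none(a, b):
    """Return the larger of a and b, treating None as 'no value'."""
    candidates = [x for x in (a, b) if x is not None]
    return max(candidates) if candidates else None


def collapse_nulls(records):
    """Group (code, value, min, max) rows by (code, value), keeping non-null maxima."""
    merged = {}
    for code, value, lo, hi in records:
        best_lo, best_hi = merged.get((code, value), (None, None))
        merged[(code, value)] = (
            max_ignoring_none(best_lo, lo),
            max_ignoring_none(best_hi, hi),
        )
    return [(code, value, lo, hi) for (code, value), (lo, hi) in merged.items()]


records = [("A", "abc", 10, None), ("A", "abc", None, 20)]
result = collapse_nulls(records)
print(result)  # [('A', 'abc', 10, 20)]
```

As with the Aggregator, this yields exactly one row per group, so it loses information if a group legitimately contains several non-null values in the same column.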

Increment 2 different Ids based on selected option

I'm trying to update a working solution of incrementing one ID based on several conditions, so I was using the ROW() function without any issue. But now I'm trying to increment 2 different IDs based on selected option as shown in the screenshot below, where I've started the following so far:
=ARRAYFORMULA(IF(LEN(A2:A),COUNTIFS(A2:A, A2:A, ROW(A2:A), "<="&ROW(A2:A),A2:A,"Option 2"),))
Can anyone shed some light on this scenario? Thanks.
Link of spreadsheet illustrating my situation: here
You have to first check if the value is Option 1/Option 2 or not. A way to do this without using OR (which can't be iterated over an array) is this:
IF(A2:A="Option 1",0,1)*IF(A2:A="Option 2",0,1)
Next, you can wrap this in another IF so that the returned value depends on whether the previous condition is true. If the option is neither 1 nor 2, the corresponding value should come from the count of all previous values that are neither 1 nor 2, so the COUNTIFS should exclude both options. Something like this:
29999 + COUNTIFS(A2:A,"<>Option 1",A2:A,"<>Option 2",ROW(A2:A), "<="&ROW(A2:A))
Finally, if the option is 1 or 2, the returned value should come from the count of all previous 1 and 2 values. Since that's an OR condition, you have to sum two different COUNTIFS, one for option 1 and one for option 2. It could look like this:
9999 + COUNTIFS(A2:A,"=Option 1",ROW(A2:A), "<="&ROW(A2:A)) + COUNTIFS(A2:A,"=Option 2",ROW(A2:A), "<="&ROW(A2:A))
Putting it all together, it could be like this:
=ARRAYFORMULA(IF(LEN(A2:A),IF(IF(A2:A="Option 1",0,1)*IF(A2:A="Option 2",0,1),
29999 + COUNTIFS(A2:A,"<>Option 1",A2:A,"<>Option 2",ROW(A2:A), "<="&ROW(A2:A)),
9999 + COUNTIFS(A2:A,"=Option 1",ROW(A2:A), "<="&ROW(A2:A)) + COUNTIFS(A2:A,"=Option 2",ROW(A2:A), "<="&ROW(A2:A))),""))
A slight alternative:
=ARRAYFORMULA(IF(A2:A="",,IF(REGEXMATCH(A2:A, H2&"$|"&H3&"$"),
9999+COUNTIFS(REGEXMATCH(A2:A, H2&"$|"&H3&"$"),
REGEXMATCH(A2:A, H2&"$|"&H3&"$"), ROW(A2:A), "<="&ROW(A2:A)),
29999+COUNTIFS(A2:A, "<>"&H2, A2:A, "<>"&H3, ROW(A2:A), "<="&ROW(A2:A)))))
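The logic of both formulas amounts to two independent running counters. A minimal plain-Python sketch follows; the function name `assign_ids` is hypothetical, and the choice to skip blank rows without counting them is an assumption about the intended behavior, not something taken from the spreadsheet.

```python
def assign_ids(options, special=("Option 1", "Option 2")):
    """Give special options IDs from 10000 up and all other options IDs from 30000 up."""
    special_count = 0
    other_count = 0
    ids = []
    for option in options:
        if option == "":
            ids.append("")  # blank row: no ID, and neither counter advances
        elif option in special:
            special_count += 1
            ids.append(9999 + special_count)
        else:
            other_count += 1
            ids.append(29999 + other_count)
    return ids


ids = assign_ids(["Option 1", "Option 3", "Option 2", "", "Option 4"])
print(ids)  # [10000, 30000, 10001, '', 30001]
```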

Make =IF Function Output Numbers For "Scoring": Google Sheets

I am exploring methods of giving scores to different data points within a dataset. These points come from a mix of number and text string attributes, checked for certain characteristics, e.g. if Col. A contains more than X "|" characters, then give it a 1. If not, it gets a 0 for that category. I also have some that give the point when the value is >X.
I have been trying to do this with =IF, for example, =IF([sheet] = [Text], "1","0").
I can get it to give me 1 or 0, but I am unable to get a point total with sum.
I have tried changing the formatting of the cells to "number" and to "plain text", and have also left it as automatic, but I can't get it to sum. Thoughts? Is there maybe a better way to do this?
FWIW - I'm trying to score based on about 12 factors.
Best,
Alex
The issue here might be that you're having the cell evaluate to either the string "0" or the string "1" rather than the number 0 or the number 1. That would explain why you're seeing the right things but the math isn't coming out right - the cell contents look like numbers, but they're really text, which the summation would then ignore.
One option would be to drop the quotation marks and write something like this:
=IF(condition, 1, 0)
This has the condition evaluate to 1 if it's true and 0 if it's false.
Alternatively, you could write something like this:
=(condition) * 1
This will take the boolean TRUE or FALSE returned by condition and convert it to either the numeric value 1 (true) or the numeric value 0 (false).
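The same boolean-to-number idea can be illustrated in Python, where each condition is converted to 1 or 0 before summing. This is only a sketch: the function name `score` and the thresholds `max_pipes` and `min_amount` are invented for the example and stand in for the X values in the question.

```python
def score(text, amount, max_pipes=2, min_amount=100):
    """Add one point for each condition that holds."""
    checks = [
        text.count("|") > max_pipes,  # more than max_pipes "|" characters
        amount > min_amount,          # numeric value above the threshold
    ]
    # int(True) == 1 and int(False) == 0, so the sum is the point total.
    return sum(int(passed) for passed in checks)


total = score("a|b|c|d", 250)
print(total)  # 2
```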

Filtering on the count with the Django ORM

I have a query that's basically "count all the items of type X, and return the items that exist more than once, along with their counts". Right now I have this:
Item.objects.annotate(type_count=models.Count("type")).filter(type_count__gt=1).order_by("-type_count")
but it returns nothing (the count is 1 for all items). What am I doing wrong?
Ideally, it should get the following:
Type
----
1
1
2
3
3
3
and return:
Type, Count
-----------
1 2
3 3
In order to count the number of occurrences of each type, you have to group by the type field. In Django this is done by using values to get just that field. So, this should work:
Item.objects.values('type').annotate(
type_count=models.Count("type")
).filter(type_count__gt=1).order_by("-type_count")
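To make the intended result concrete without a database, the group, count, filter and order steps can be sketched with the standard library. This is not Django code; `types` just reproduces the sample column from the question.

```python
from collections import Counter

types = [1, 1, 2, 3, 3, 3]

# Group by type and count the occurrences of each one.
counts = Counter(types)

# Keep only types seen more than once, largest count first.
repeated = sorted(
    ((t, n) for t, n in counts.items() if n > 1),
    key=lambda pair: pair[1],
    reverse=True,
)
print(repeated)  # [(3, 3), (1, 2)]
```

Counting per item instead of per type gives 1 everywhere, which is the behavior described in the question.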
This looks like a logical error ;)
type_count__gt=1 means type_count > 1, so items whose count is exactly 1 are not returned :)
Use type_count__gte=1 instead if those should be included; it means type_count >= 1 :)