I have a data.frame x with multiple columns such as Q1, Q2, ....
I have tried to get 1 and 0 for right and wrong answers, but my code is not working.
Any suggestion on how I can write this for all of the columns and get the results in a new table of 1s and 0s?
x<-read.csv("data.csv")
unlist(x)
summary(x)
x<-ifelse(x$Q1=="120",1,0)
I have a large data set in Stata.
There are several item batteries in this data set.
One item battery consists of 8 items (v1 - v8), each scaled from 1 to 7.
I want to recode the items as missing whenever they all take the value 1.
That is, in every row where v1 to v8 all have the value 1, these values are to be replaced with missings.
I know how to code missing values with the if qualifier, but the selection with the complex condition causes me difficulties.
In R this would probably be solved via rowSums, but I need the solution for Stata.
I assume in R it would work like this:
df[rowSums(df[,c("v1", ... "v8")]!=1)==0, c("v1", .... "v8")] <- NA
But I need a solution for Stata.
If I understood this correctly, you want
egen rowall = concat(v1-v8)
mvdecode v1-v8 if rowall == 8 * "1", mv(1)
That is, all instances in v1-v8 of 1 are recoded as missing if and only if the values of those variables are all 1 in any observation.
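The same rule can be sketched in Python for readers who want to check the logic outside Stata. This is only an illustration under assumptions: the rows are plain dictionaries, the helper name blank_all_ones is hypothetical, the example values are made up, and None stands in for a missing value.

```python
def blank_all_ones(rows, items):
    """Set the given items to None in every row where all of them equal 1."""
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        if all(row[item] == 1 for item in items):
            for item in items:
                new_row[item] = None
        result.append(new_row)
    return result


# Hypothetical item battery v1..v8 with made-up example values
items = [f"v{i}" for i in range(1, 9)]
rows = [
    dict(zip(items, [1, 1, 1, 1, 1, 1, 1, 1])),  # all 1 -> becomes missing
    dict(zip(items, [1, 1, 2, 1, 1, 1, 1, 1])),  # one 2 -> kept unchanged
]
recoded = blank_all_ones(rows, items)
```

A row that contains some 1s but not only 1s is left as it is, which matches the "if and only if" wording above.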
I have got a data frame like this:
ID A B
1 x5.11 2,34
2 x5.57 5,36
3 x6,13 0,45
I would like to remove the 'x' from all values in column A. How might I best accomplish this in R?
Thanks!
I have found a very easy way:
data.frame$A <- gsub("x", "", data.frame$A)
I am exploring methods of scoring different data points within a dataset. The points come from a mix of numeric and text attributes that are checked for certain characteristics, e.g. if Col. A contains more than X "|" characters, it gets a 1; if not, it gets a 0 for that category. Some categories also give a point when the value is >X.
I have been trying to do this with =IF, for example, =IF([sheet] = [Text], "1","0").
I can get it to give me 1 or 0, but I am unable to get a point total with sum.
I have tried setting the cell format to "number" and to "plain text", and have also left it as automatic, but I can't get it to sum. Thoughts? Is there maybe a better way to do this?
FWIW - I'm trying to score based on about 12 factors.
Best,
Alex
The issue here might be that you're having the cell evaluate to either the string "0" or the string "1" rather than the number 0 or the number 1. That would explain why you're seeing the right things but the math isn't coming out right - the cell contents look like numbers, but they're really text, which the summation would then ignore.
One option would be to drop the quotation marks and write something like this:
=IF(condition, 1, 0)
This makes the formula evaluate to the number 1 if the condition is true and to the number 0 if it is false.
Alternatively, you could write something like this:
=(condition) * 1
This will take the boolean TRUE or FALSE returned by condition and convert it to either the numeric value 1 (true) or the numeric value 0 (false).
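The same text-versus-number distinction can be illustrated outside the spreadsheet. The following Python sketch is not a spreadsheet formula and rests on assumptions: the function name score_row, the two thresholds, and the sample input are all hypothetical. It shows boolean conditions being converted to numeric 0 or 1 so that they add up to a total.

```python
def score_row(text, value, min_pipes=2, min_value=10):
    """Return a list of numeric 0/1 points for one data point."""
    points = [
        int(text.count("|") > min_pipes),  # 1 if more than min_pipes "|" characters
        int(value > min_value),            # 1 if the value exceeds min_value
    ]
    return points


points = score_row("a|b|c|d", 12)  # three "|" characters, value 12
total = sum(points)                # numeric points can be summed directly
```

With roughly 12 factors, each factor would be one more entry in the list, and the total stays a plain sum.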
I am working on scraping a table that has major and minor column names. When I do this, the table comes in having read both the column names and column groups, so the column names are misaligned in the dataframe like so (simplified):
unnamed1 unnamed2 unnamed3 Year Passing Rushing Receiving
2015 NA 200 60 NA NA NA
2014 NA 180 70 NA NA NA
My challenge is in shifting the column names so that 'Year' aligns over '2015' and so forth. The problem is then that the number of columns to shift does not remain constant from table to table (this is only one of many). My code at the moment looks like the following:
from pandas import read_html
table1 = read_html('http://www.pro-football-reference.com/players/T/TyexWi00.htm')
df = table1[0]
to_shift = len(df.dropna(how='all', axis=1).columns)  # Number of non-empty columns, i.e. how many trailing column names to keep
df2 = df.dropna(how='all', axis=1)  # Drop the empty columns
df2.columns = df.columns[-to_shift:]  # Shift the names left by reusing the last to_shift column names
The problem is that for a player that has none of one stat (passing in this simple example), there are completely blank columns in the middle of the dataframe as well as at the right end, so that the code shifts too far. Is there a clean way of counting the columns from right to left until one is not completely empty?
Much thanks, and I hope my question is clear!
Is there a clean way of counting the columns from right to left until one is not completely empty?
from itertools import takewhile
len(df.columns) - len(list(takewhile(lambda col: df[col].isnull().all(), reversed(df.columns)))) - 1
Explanation:
takewhile returns the elements of an iterable, starting at the front, for as long as the given condition is True, and stops at the first element for which it is False. When we call it on reversed(df.columns), we therefore walk the columns from the end. With df[col].isnull().all() we check whether all entries of a column are null (a.k.a. NaN). Consequently the takewhile expression above returns the suffix of columns that are completely 'empty'. By calculating total_length - bad_suffix_length - 1, we get the index of the first column, counting from the right, for which the condition is not satisfied.
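To make the counting concrete, here is a small sketch that runs without pandas. It assumes a plain dictionary as a stand-in for the dataframe (column name mapped to a list of values, with None marking an empty cell); the helper is_empty and the sample values are hypothetical.

```python
from itertools import takewhile

# Hypothetical stand-in for a dataframe: column name -> list of values,
# with None marking an empty cell
columns = {
    "unnamed1": [2015, 2014],
    "unnamed2": [None, None],   # empty column in the middle
    "unnamed3": [200, 180],
    "Year": [None, None],       # empty columns at the right end
    "Passing": [None, None],
}
names = list(columns)


def is_empty(name):
    """True if every cell in the column is None."""
    return all(value is None for value in columns[name])


# Names of the completely empty columns at the right end, right to left
empty_suffix = list(takewhile(is_empty, reversed(names)))

# Index of the last column that is not completely empty
last_filled = len(names) - len(empty_suffix) - 1
```

Because takewhile stops at the first non-empty column it meets from the right, the empty column in the middle ("unnamed2") is not counted.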
Adding to the correct response from Michael Hoff (Thank you very much!), the code has been edited to
empty_tail = list(takewhile(lambda col: df[col].isnull().all(), reversed(df.columns)))  # Completely empty columns at the right end
to_shift = len(df.columns) - len(empty_tail)  # Number of columns of the original dataframe to keep
df2 = df.drop(empty_tail, axis=1)  # Drop the empty right-side columns
colnames = df.columns[-to_shift:]
df2.columns = colnames
Hey guys, does anyone know how to create a number of rows based on a count value without using a Java transformation in Informatica 9.6 (for a flat file)? Please help me with that.
You can create an auxiliary table that contains n rows for each possible count value n between 1 and N:
1
2
2
3
3
3
...
...
N rows with the last value
...
N rows with the last value
Join this table to the source data using the n count value as the key and you will get n copies of each source row.
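The effect of that join can be sketched in Python. This only illustrates the idea and is not Informatica configuration: the function names build_aux_table and replicate_rows, the field names, and the sample rows are hypothetical.

```python
def build_aux_table(max_count):
    """Return the auxiliary table: the value n repeated n times, for n = 1..max_count."""
    return [n for n in range(1, max_count + 1) for _ in range(n)]


def replicate_rows(source_rows, count_key, aux_table):
    """Join source rows to the auxiliary table on the count value."""
    return [
        row
        for row in source_rows
        for n in aux_table
        if row[count_key] == n
    ]


aux = build_aux_table(3)  # the value 1 once, 2 twice, 3 three times
source = [{"name": "a", "cnt": 1}, {"name": "b", "cnt": 3}]
expanded = replicate_rows(source, "cnt", aux)
```

Each source row matches exactly as many auxiliary rows as its count value, so the equality join alone produces the copies.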