Here is my table:
Name| 1st | 2nd | 3rd | 4th | 5th
Ann | five | five | four | five | one
Tom | four | one | four | five | four
What I want to do is create columns that contain the number of occurrences of each value in a row. In this case, this is what I want to achieve:
Name| 1st | 2nd | 3rd | 4th | 5th | Five| Four | One
Ann | five | five | four | five | one | 3 | 1 | 1
Tom | four | one | four | five | four | 1 | 3 | 1
Ideally, you want to unpivot your data so that it looks like this:
Name | Number | Value
-----|--------|------
Ann | 1st | five
Ann | 2nd | five
Ann | 3rd | four
Ann | 4th | five
Ann | 5th | one
Tom | 1st | four
Tom | 2nd | one
Tom | 3rd | four
Tom | 4th | five
Tom | 5th | four
Then you could easily create a matrix visual like this by putting Name on the rows, Value on the columns, and the count of Number in the values field.
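Outside Power BI, the same unpivot-then-count idea can be sketched in plain Python. This is only an illustration of the reshaping logic, not how Power BI does it internally; the names `wide`, `long_rows` and `matrix` are made up for the example.

```python
from collections import Counter

# Wide rows as in the question: one column per position.
wide = [
    {"Name": "Ann", "1st": "five", "2nd": "five", "3rd": "four", "4th": "five", "5th": "one"},
    {"Name": "Tom", "1st": "four", "2nd": "one", "3rd": "four", "4th": "five", "5th": "four"},
]

# Unpivot: one (Name, Number, Value) record per original cell.
long_rows = [
    (row["Name"], number, value)
    for row in wide
    for number, value in row.items()
    if number != "Name"
]

# Matrix counts: how many records exist for each (Name, Value) pair.
matrix = Counter((name, value) for name, _number, value in long_rows)

print(matrix[("Ann", "five")])
```

Each key of `matrix` corresponds to one cell of the matrix visual (Name on rows, Value on columns).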
I don't recommend it, but if you need to keep it in your current layout, then your calculated columns could be written like:
Five = (TableName[1st] = "five") + (TableName[2nd] = "five") + (TableName[3rd] = "five") +
(TableName[4th] = "five") + (TableName[5th] = "five")
The Four and One column formulas would be analogous.
I have a similar issue, but the columns in my table are not fixed. On the next refresh of the database the number of columns may grow or shrink, and the column headers may change too. In that case I can't pass column names or indexes for the counter to look at; the formula needs to look at the entire row, no matter how many columns there are or what they are called.
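For what it's worth, if the counting can be done outside the report, a row-wise count that does not depend on column names is short to write. This is a plain-Python sketch, not a DAX or Power Query solution; `count_row_values` and the second sample row are hypothetical.

```python
from collections import Counter

def count_row_values(row, key="Name"):
    """Count each value in a row, skipping the key column; no column names are hard-coded."""
    return Counter(value for column, value in row.items() if column != key)

rows = [
    {"Name": "Ann", "1st": "five", "2nd": "five", "3rd": "four", "4th": "five", "5th": "one"},
    {"Name": "Tom", "a": "four", "b": "one"},  # hypothetical row with different columns
]
counts = [count_row_values(row) for row in rows]
```

A `Counter` returns 0 for values that never occur, so a missing value needs no special handling.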
I have a table like this in spreadsheet A:
| title| total |
|----- |--------|
| X1 | 2 |
| Y | 3 |
| Z | 4 |
| X2 | 5 |
Since spreadsheet A is constantly updated and uses other formulas, I need to export it to another sheet to work on.
I also need to sum the total column when the title column matches a condition, such as a regular expression.
The result should look like this:
| title| total |
|----- |--------|
| X | 7 |
| Y | 3 |
| Z | 4 |
Please advise. I've been studying QUERY with a SUMIF formula, but it does not support summing when the condition is not matched.
Thanks in advance.
You can try SUMIFS() with a wildcard. Use the formula below, where D2 holds the title prefix to match (for example X):
=SUMIFS($B$2:$B$5,$A$2:$A$5,D2 & "*")
After you allow access, try:
=INDEX(QUERY({REGEXREPLACE(
IMPORTRANGE("id", "sheetname!A2:A"), "\d+$", ),
IMPORTRANGE("id", "sheetname!B2:B")},
"select Col1,sum(Col2)
where Col1 is not null
group by Col1
label sum(Col2)''"))
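To see what the formula computes, here is the same grouping logic as a small Python sketch: strip trailing digits from each title, then sum the totals per stripped title. The sample rows are copied from the question; the variable names are only illustrative.

```python
import re
from collections import defaultdict

rows = [("X1", 2), ("Y", 3), ("Z", 4), ("X2", 5)]

totals = defaultdict(int)
for title, total in rows:
    group = re.sub(r"\d+$", "", title)  # same pattern as in the REGEXREPLACE call
    totals[group] += total

result = dict(totals)
print(result)
```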
I have to find a pattern that matches only the URLs in a list like this:
one | two | three | four | http://www.site/whatever-the-site-uses
one | two | three | four | http://www.site/whatever-the-site-uses
one | two | three | four | http://www.site/whatever-the-site-uses
I need to grab the whole http or https URL inside a SINGLE GROUP, but I can't get a good pattern for it.
Can someone help me? So far, I have \|(.*?)/$, but the result is something like
| two | three | four | http://www.site/whatever-the-site-uses
Did I understand you right? You want to capture, from the last "| https://", the group starting with http or https up to the end of the line.
Is this it?
/\|\ (https?\:\/\/.*)$/
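A quick way to check the pattern is to run it in Python's `re` module. The sketch below uses the same expression without the `/` delimiters and without the escapes Python does not need; `extract_url` is a hypothetical helper name.

```python
import re

# Same idea as the pattern above: a pipe, a space, then the URL up to end of line.
URL_AT_END = re.compile(r"\| (https?://.*)$")

def extract_url(line):
    """Return the URL captured in group 1, or None if the line has no URL column."""
    match = URL_AT_END.search(line)
    return match.group(1) if match else None

line = "one | two | three | four | http://www.site/whatever-the-site-uses"
print(extract_url(line))
```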
I have a dataframe in spark, something like this:
ID | Column
------ | ----
1 | STRINGOFLETTERS
2 | SOMEOTHERCHARACTERS
3 | ANOTHERSTRING
4 | EXAMPLEEXAMPLE
What I would like to do is extract the first 5 characters from the column plus the 8th character and create a new column, something like this:
ID | New Column
------ | ------
1 | STRIN_F
2 | SOMEO_E
3 | ANOTH_S
4 | EXAMP_E
I can't use the following code, because the values in the columns differ, and I don't want to split on a specific character, but on the 6th character:
import pyspark
split_col = pyspark.sql.functions.split(DF['column'], ' ')
newDF = DF.withColumn('new_column', split_col.getItem(0))
Thanks all!
Use something like this:
from pyspark.sql.functions import concat, lit

# substr is 1-based: (start position, length)
df.withColumn('new_column', concat(df.Column.substr(1, 5),
                                   lit('_'),
                                   df.Column.substr(8, 1)))
This uses the Column.substr method and the concat function.
Together they will solve your problem.
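If you want to check the expected values without a Spark session, the same slicing can be sketched in plain Python. Note the index shift: Spark's substr counts from 1, Python slices count from 0, so the 8th character is index 7. `short_code` is a made-up name for this example.

```python
def short_code(text):
    """First five characters, an underscore, then the 8th character (index 7)."""
    # text[7:8] yields "" instead of raising when the string is shorter than 8 characters.
    return text[:5] + "_" + text[7:8]

values = ["STRINGOFLETTERS", "SOMEOTHERCHARACTERS", "ANOTHERSTRING", "EXAMPLEEXAMPLE"]
codes = [short_code(value) for value in values]
print(codes)
```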
I have a data set with first name, middle name, and last name. I'm going to merge it with another data set matching on the same variables.
In one data set the variable mi looks like:
Lowell
Ann
Carl
A
Fran
Allen
And I want it to look like:
L
A
C
A
F
A
I tried this:
gen mi2 = substr(mi, 2, length(mi))
This does the opposite of what I want, but it's the closest I've been able to get. I know this is probably a really easy problem, but I'm stumped at the moment.
You are on the right track with substr. See the example below:
clear
input str10 mi
Lowell
Ann
Carl
A
Fran
Allen
end
gen mi2 = substr(mi,1,1)
list, sep(0)
+--------------+
| mi mi2 |
|--------------|
1. | Lowell L |
2. | Ann A |
3. | Carl C |
4. | A A |
5. | Fran F |
6. | Allen A |
+--------------+
The second and third arguments to substr are the starting position and number of characters respectively. In this case, you want to start at the first character, and take one character, so substr(mi, 1, 1) is what you need.
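The (start, length) semantics can be mimicked in Python to see why the original attempt returned everything except the initial. This is a rough emulation for positive start positions only, not a full reimplementation of Stata's substr; the function below is hypothetical.

```python
def substr(text, start, length):
    """Emulate substr(s, start, length) for 1-based start positions >= 1."""
    if start < 1:
        raise ValueError("this sketch only handles start >= 1")
    return text[start - 1 : start - 1 + length]

names = ["Lowell", "Ann", "Carl", "A", "Fran", "Allen"]
initials = [substr(name, 1, 1) for name in names]

# The attempt from the question starts at position 2, so it drops the initial.
attempt = substr("Lowell", 2, len("Lowell"))
```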
I have a command line which looks for certain IDs (2 IDs) in the 2nd column. But I want this command to search all the columns, not just the second column.
Can anyone help?
The command line for searching the 2nd column is:
findstr /rb /c:"[^|]*| *ID1 *|" /c:"[^|]*| *ID2 *|" "src.txt" >"dest.txt"
Can someone modify it so that it searches all the columns instead of just the second, and also give command lines which will:
(1) Search all the columns instead of just the 2nd.
(2) Search only for 1 ID.
(3) Search only for 3 IDs.
The text in src.txt looks like this:
Ja | 11 | xxx
Jn | 19 | yyy
Jx | 21 | yyyas | sas
Also, a few lines may have more columns, like the last one.
Thanks!
Suppose src.txt contains the lines
Ja | 11 | xxx
Jn | 19 | yyy
nJ | 19 | yyy
Ax | 21 | Jyyas | sas
Ax | 23 | yyJas | sas
To find only the 3 lines where a value within a column starts with J, and so write to file dest.txt the lines
Ja | 11 | xxx
Jn | 19 | yyy
Ax | 21 | Jyyas | sas
the following command can be used:
findstr /R /C:"^J" /C:"\| *J" "src.txt" >"dest.txt"
^J finds lines starting with J, and \| *J finds lines with a value starting with J, after 0 or more spaces, in any column other than the first.
Please note that parameter /B is removed, as otherwise this would not work: /B matches a pattern only at the beginning of a line.
/rb in your example is /R and /B combined in one parameter string.
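If you want to check the selection logic before running findstr, the two search strings can be combined into one Python regular expression. This is only a sketch for testing the idea on the sample lines; findstr's regex dialect is more limited than Python's, so don't assume other patterns carry over unchanged.

```python
import re

# "^J" : the line starts with J (first column).
# "\| *J" : a pipe, zero or more spaces, then J (any later column).
STARTS_WITH_J = re.compile(r"^J|\| *J")

lines = [
    "Ja | 11 | xxx",
    "Jn | 19 | yyy",
    "nJ | 19 | yyy",
    "Ax | 21 | Jyyas | sas",
    "Ax | 23 | yyJas | sas",
]
matches = [line for line in lines if STARTS_WITH_J.search(line)]
print("\n".join(matches))
```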