Count of occurrence of a string group by day in Google spreadsheet

Count of occurrence of a string group by day in Google spreadsheet - if-statement

I have table data in Google Spreadsheet something like this:
Date|Diet
4-Jan-2020|Coffee
4-Jan-2020|Snacks
4-Jan-2020|xyz
4-Jan-2020|Coffee
5-Jan-2020|Snacks
5-Jan-2020|abc
6-Jan-2020|Coffee
6-Jan-2020|Snacks
This table is a list of food items I had on a daily basis. I would like to get the number of times I had coffee on a daily basis. So I would like to get the output like this:
Date | No of times I had Coffee
4-Jan-2020| 2
5-Jan-2020| 0
6-Jan-2020| 1
I used this query to get the output.
=query(A1:B1425,"select A, COUNT(B) where B='Coffee' group by A")
With this query, I get the below output. Do note that I don't get those days when I didn't have coffee
4-Jan-2020| 2
6-Jan-2020| 1
So count for 5-Jan-2020 is missing because there is no string "Coffee" for that day.
How do I get the desired output including the count 0? Thank you.

try:
=ARRAYFORMULA({UNIQUE(FILTER(A1:A, A1:A<>"")),
IFNA(VLOOKUP(UNIQUE(FILTER(A1:A, A1:A<>"")),
QUERY(A1:B,
"select A,count(B)
where B='Coffee'
group by A
label count(B)''"), 2, 0))*1})
or try:
=ARRAYFORMULA(QUERY({A1:B, IF(B1:B="coffee", 1, 0)},
"select Col1,sum(Col3)
where Col1 is not null
group by Col1
label sum(Col3)''"))

you might want to change the counter into an If statement.
Something like "IF(COUNT(B) where B='Coffee' group by A">0,COUNT(B) where B='Coffee' group by A",0).
That will force the counter to have an actual value (0), even when nothing is found

Related

google sheets, splitting and stacking a paragraph

I have a 3 row by 2 column table
1Q18 hello. testing row one.
2Q18 There are about 7.5b people. That's alot.
3Q18 Last sentence. To be stacking.
I want to split each sentence then have a quarter label with it, out would be
1Q18 hello
1Q18 testing row one
2Q18 There are about 7.5b people
2Q18 That's alot
3Q18 Last sentence
3Q18 To be stacking
I can get one line to work with:
=TRANSPOSE({split(rept(A1&" ",counta(split(B1,".")))," ");split(B1,".")})
which would give me:
1Q18 hello
1Q18 testing row one
I need a formula that will let me go down 100 rows, so I can't manually repeat the formula 3 times and use {} with ;
I've also tried using the
=map(A1:A,B2:B,LAMBDA(x,y,TRANSPOSE({split(rept(x&" ",counta(split(y,".")))," ");split(y,".")})))
but get a
Error Result should be a single column.

try:
=INDEX(QUERY(SPLIT(FLATTEN(LAMBDA(x, IF(x="",,A1:A&""&x))
(SPLIT(B1:B&" ", ". ", ))), ""), "where Col2 is not null", ))

Try below formula-
=QUERY(REDUCE(,B1:B3,LAMBDA(a,x,{a;TRANSPOSE(INDEX(INDEX(A1:A,ROW(x)) & " " & SPLIT(SUBSTITUTE(x,". ",".|"),"|")))})),"offset 1",0)

Here's another formula you can try:
=ARRAYFORMULA(
QUERY(
REDUCE({0,0},
QUERY(A1:A&"❄️"&SPLIT(B1:B,". ",),
"where Col1 <> '#VALUE!'"),
LAMBDA(a,c,
{a;SPLIT(c,"❄️",,)})),
"where Col2 is not null offset 1",))

Convert a number column into a time format in Power BI

I'm looking for a way to convert a decimal number into a valid HH:mm:ss format.
I'm importing data from an SQL database.
One of the columns in my database is labelled Actual Start Time.
The values in my database are stored in the following decimal format:
73758 // which translates to 07:27:58
114436 // which translates to 11:44:36
I cannot simply convert this Actual Start Time column into a Time format in my Power BI import as it returns errors for some values, saying it doesn't recognise 73758 as a valid 'time'. It needs to have a leading zero for cases such as 73758.
To combat this, I created a new Text column with the following code to append a leading zero:
Column = FORMAT([Actual Start Time], "000000")
This returns the following results:
073758
114436
-- which is perfect. Exactly what I needed.
I now want to convert these values into a Time.
Simply changing the data type field to Time doesn't do anything, returning:
Cannot convert value '073758' of type Text to type Date.
So I created another column with the following code:
Column 2 = FORMAT(TIME(LEFT([Column], 2), MID([Column], 3, 2), RIGHT([Column], 2)), "HH:mm:ss")
To pass the values 07, 37 and 58 into a TIME format.
This returns the following:
_______________________________________
| Actual Start Date | Column | Column 2 |
|_______________________________________|
| 73758 | 073758 | 07:37:58 |
| 114436 | 114436 | 11:44:36 |
Which is what I wanted but is there any other way of doing this? I want to ideally do it in one step without creating additional columns.

You could use a variable as suggested by Aldert or you can replace Column by the format function:
Time Format = FORMAT(
TIME(
LEFT(FORMAT([Actual Start Time],"000000"),2),
MID(FORMAT([Actual Start Time],"000000"),3,2),
RIGHT([Actual Start Time],2)),
"hh:mm:ss")
Edit:
If you want to do this in Power query, you can create a customer column with the following calculation:
Time.FromText(
if Text.Length([Actual Start Time])=5 then Text.PadStart( [Actual Start Time],6,"0")
else [Actual Start Time])
Once this column is created you can drop the old column, so that you only have one time column in the data. Hope this helps.

I, on purpose show you the concept of variables so you can use this in future with more complex queries.
TimeC =
var timeStr = FORMAT([Actual Start Time], "000000")
return FORMAT(TIME(LEFT([timeStr], 2), MID([timeStr], 3, 2), RIGHT([timeStr], 2)), "HH:mm:ss")

How to add multiple Sentences (which are stored in a list) into a pandas dataframe

I would like to create an aspect analysis from user reviews. The reviews contain various aspects and therefore the reviews need to be separated into sentences. I save the data in a pandas dataframe and separate the sentences with the nltk library.
I put the separate sentences in a list that I want to format into a dataframe and connect to the original dataframe. However, I get an error. Instead of an extra column, I get 19 new columns. (the individual sentences are not stored in a cell, I think every single sentence gets their own column) I also tested itertools but I also get a wrong record.
Can someone help me to get the right format?
I would like to have a new dataframe which looks like that:
U_REVIEW | SENTENCES
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Im a Sentence. Iam another Sentence in a Row. |[u'Im a Sentence', u'Iam another Sentence in a Row.']
Here we go, next Sentence. Blub, more blubs. |[u"Here weg o, next Sentence.", u'Blub, more blubs.']
Once again, more Sentence. And some other information. The Restaurant was ok, but not awesome.|[u"Once again, more Sentence.", u'And some other information.',u’The Restaurant was ok, but not awesome.’]
That’s how my code looks like:
ta = ta[['U_REVIEW']]
Output:
U_REVIEW
Im a Sentence. Iam another Sentence in a Row.
Here we go, next Sentence. Blub, more blubs.
Once again, more Sentence. And some other information. The Restaurant was ok, but not awesome.
# the empty lists
sentences = []
ss = []
for sentence in ta['U_REVIEW']:
# seperates the review into sentence
sentence = sent_tokenize(sentence)
sentences.append(sentence)
test = itertools.chain(sentences)
#new dataframe to add the Sentences
df2 = pd.DataFrame(sentences)
#create Column
cols2 = ['REVIEW_SENTENCES']
# bring the two dataframes together
df2 = pd.DataFrame(sentences, columns=cols2)
Output of senteces:
[[u'Im a Sentence', u'Iam another Sentence in a Row.'],[u"Here weg o, next Sentence.", u'Blub, more blubs.'],[u"Once again, more Sentence.", u'And some other information.',u’The Restaurant was ok, but not awesome.’]]
Output of test:
<itertools.chain object at 0x000000001316DC18>
Output and Information of the new Dataframe df2:
AssertionError: 1 columns passed, passed data had 19 columns
U_REVIEW | 0 | 1 | 2 ...
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Im a Sentence. Iam another Sentence in a Row. |Im a Sentence |Iam another Sentence in a Row. |
Here we go, next Sentence. Blub, more blubs. |Here we go, next Sentence.|Blub, more blubs. |
Once again, more Sentence. And some other information. The Restaurant was ok, but not awesome.|Once again, more Sentence.|And some other information. |The Restaurant was ok, but not awesome.
Here is a Testset of a Dataframe:
import pandas as pd
ta = pd.DataFrame( ['Im a Sentence. Iam another Sentence in a Row','Here we go, next Sentence. Blub, more blubs.','Once again, more Sentence. And some other information. The Restaurant was ok, but not awsome.'])
ta.columns =['U_REVIEW']

try this I have done it in python 3.5 I think it should work for 2.5 also:
In [45]: df = pd.DataFrame(ta.U_REVIEW.str.split('.',expand=True).replace('',np.nan).fillna(np.nan).values.flatten()).dropna()
In [46]: df
Out[46]:
0
0 Im a Sentence
1 Iam another Sentence in a Row
4 Here we go, next Sentence
5 Blub, more blubs
8 Once again, more Sentence
9 And some other information
10 The Restaurant was ok, but not awsome
is this what you want:
ta.U_REVIEW.str.split('.',expand=True)
Out[50]:
0 1 \
0 Im a Sentence Iam another Sentence in a Row
1 Here we go, next Sentence Blub, more blubs
2 Once again, more Sentence And some other information
2 3
0 None None
1 None
2 The Restaurant was ok, but not awsome
or
In [52]: ta.U_REVIEW.str.split('.').apply(list)
Out[52]:
0 [Im a Sentence, Iam another Sentence in a Row]
1 [Here we go, next Sentence, Blub, more blubs, ]
2 [Once again, more Sentence, And some other in...
Name: U_REVIEW, dtype: object

Time Series manipulation

So I have a dataframe that I dump a time series into. The index is the date. I need to do calculations based on date.
For eg. I have {
XRT_Close
Date
2010-01-04 35.94
2010-01-05 36.17
2010-01-06 36.50
...
2015-02-07 36.60
2015-02-08 36.52 }
How would I go about doing say... Percentage change of beginning to end of the month? How would I construct a loop to cycle through the months?
Any help will be met with huge appreciation. Thank you.

First create year and month columns:
df['year'] = [x.year for x in df.index]
df['month'] = [x.month for x in df.index]
Group by them:
grouped = df.groupby(['year','month'])
Define the function you want to run on the groups:
def PChange(df):
begin = df['column_name'].iloc[0]
end = df['column_name'].iloc[-1]
return (end-begin)/(end+begin)*100
Apply the function to the groups:
grouped.apply(PChange)
Let me know if it works.

fetching multiple values from a string using regular expression

I have a table temp that have a column name "REMARKS"
Create script
Create table temp (id number,remarks varchar2(2000));
Insert script
Insert into temp values (1,'NAME =GAURAV Amount=981 Phone_number =98932324 Active Flag =Y');
Insert into temp values (2,'NAME =ROHAN Amount=984 Phone_number =98932333 Active Flag =N');
Now , i want to fetch the corresponding value of NAME ,Amount ,phone_number, active_flag from the remarks column of the table.
I thought of using regular expression ,but i am not comfortable in using it .
I tried with substr and instr to fetch the name from the remakrs column ,but if i want to fetch all four, i need to write a pl sql .Can we achieve this using Regular expression.
Can i get output(CURSOR) like
id Name Amount phone_number Active flag
------------------------------------------
1 Gaurav 981 98932324 Y
2 Rohan 984 98932333 N
-------------------------------------------
Thanks for your help

you can use something like :
SQL> select regexp_replace(remarks, '.*NAME *=([^ ]*).*', '\1') name,
2 regexp_replace(remarks, '.*Amount *=([^ ]*).*', '\1') amount,
3 regexp_replace(remarks, '.*Phone_number *=([^ ]*).*', '\1') ph_number,
4 regexp_replace(remarks, '.*Active Flag *=([^ ]*).*', '\1') flag
5 from temp;
NAME AMOUNT PH_NUMBER FLAG
-------------------- -------------------- -------------------- --------------------
GAURAV 981 98932324 Y
ROHAN 981 98932324 N

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Count of occurrence of a string group by day in Google spreadsheet - if-statement

you might want to change the counter into an If statement. Something like "IF(COUNT(B) where B='Coffee' group by A">0,COUNT(B) where B='Coffee' group by A",0). That will force the counter to have an actual value (0), even when nothing is found

Related

google sheets, splitting and stacking a paragraph

Convert a number column into a time format in Power BI

How to add multiple Sentences (which are stored in a list) into a pandas dataframe

Time Series manipulation

fetching multiple values from a string using regular expression

Categories

Resources