I'm trying to get the number between '-' and '-' in Google Sheets, but after trying many things I still haven't been able to find the solution.
Data record 1
England Premier League
West Ham vs Crystal Palace
2.090 - 3.47 - 3.770
Expected value = 3.47
Data record 2
England League Two
Carlisle vs Scunthorpe
2.830 - 3.15 - 2.820
Expected value = 3.15
Hopefully someone can help me out.
Try either of the following:
option 1.
=INDEX(IFERROR(REGEXEXTRACT(AE1:AE4," \d+\.\d+ ")*1))
option 2.
=INDEX(IFERROR(REGEXEXTRACT(AE1:AE4,".* - (\d+\.\d+) ")))
(Do adjust the formula according to your ranges and locale)
use:
=INDEX(IFNA(REGEXEXTRACT(A1:A, "- (\d+(?:\.\d+)?) -")*1))
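For anyone who wants to check the pattern outside Sheets, here is a minimal Python sketch of the same idea. The function name middle_odds is made up for illustration, and Python's re engine is not identical to the RE2 engine Sheets uses, although this particular pattern should behave the same in both.

```python
import re

# The middle value sits between two " - " separators; the decimal part is optional.
MIDDLE = re.compile(r"- (\d+(?:\.\d+)?) -")

def middle_odds(line):
    """Return the number between the two dashes as a float, or None if absent."""
    match = MIDDLE.search(line)
    return float(match.group(1)) if match else None

print(middle_odds("2.090 - 3.47 - 3.770"))        # 3.47
print(middle_odds("2.830 - 3.15 - 2.820"))        # 3.15
print(middle_odds("West Ham vs Crystal Palace"))  # None
```

Returning None for non-matching rows plays the same role as IFNA/IFERROR in the formulas: header and team-name rows are skipped instead of raising an error.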
I have two dates in cells
A1=05.11.2021 18:16
B1=05.11.2021 20:16
I need to find the difference in hours between the two dates. The result should be (B1-A1)=2. I can't find an answer on the Internet, so I am asking for help.
use:
=TEXT((DATE(
REGEXEXTRACT(B1, "\d{4}"),
REGEXEXTRACT(B1, "\.(\d+)\."),
REGEXEXTRACT(B1, "^\d+"))+INDEX(SPLIT(B1, " "),,2))-(DATE(
REGEXEXTRACT(A1, "\d{4}"),
REGEXEXTRACT(A1, "\.(\d+)\."),
REGEXEXTRACT(A1, "^\d+"))+INDEX(SPLIT(A1, " "),,2)), "[h]")
arrayformula:
=INDEX(IFNA(TEXT((DATE(
REGEXEXTRACT(B1:B, "\d{4}"),
REGEXEXTRACT(B1:B, "\.(\d+)\."),
REGEXEXTRACT(B1:B, "^\d+"))+INDEX(SPLIT(B1:B, " "),,2))-(DATE(
REGEXEXTRACT(A1:A, "\d{4}"),
REGEXEXTRACT(A1:A, "\.(\d+)\."),
REGEXEXTRACT(A1:A, "^\d+"))+INDEX(SPLIT(A1:A, " "),,2)), "[h]")))
shorter:
=INDEX(IFERROR(1/(1/(TEXT(
REGEXREPLACE(B1:B, "(\d+).(\d+).(\d{4})", "$2/$1/$3")-
REGEXREPLACE(A1:A, "(\d+).(\d+).(\d{4})", "$2/$1/$3"), "[h]")))))
EDIT:
As #basic mentioned in the comment above, you can format the cell where your output goes, or use text with h for the hour difference and [h] for the whole duration in hours (taken from Cooper's answer). See the usage and the difference below:
Text:
=text(B1-A1, "h")
or
=text(B1-A1, "[h]")
Update:
Make sure your date-times use proper delimiters. / and - are acceptable (e.g. 5/11/2021 18:16:00 or 5-11-2021 18:16:00). (This depends entirely on your locale.)
If you want to show it having . as delimiter, just use a custom Date Time format and use . as its delimiter.
Using custom format:
Actual value vs Display value:
If you don't want to make any changes to the date-time and want to keep it as text, then replace the delimiters using regexreplace before using the values in text.
RegexReplace:
=text(REGEXREPLACE(B1, "\.", "/") - REGEXREPLACE(A1, "\.", "/"), "h")
or
=text(REGEXREPLACE(B1, "\.", "/") - REGEXREPLACE(A1, "\.", "/"), "[h]")
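To make the difference between "h" and "[h]" concrete, here is a small Python sketch of the same calculation. It assumes the cells hold text in day.month.year hour:minute order, as in the question; the helper names are invented for this example.

```python
from datetime import datetime

FMT = "%d.%m.%Y %H:%M"  # day.month.year hour:minute, as in the question

def _delta(start, end):
    """Parse both strings and return the timedelta between them."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)

def total_hours(start, end):
    """Whole duration in hours, comparable to the "[h]" format."""
    return _delta(start, end).total_seconds() / 3600

def hour_component(start, end):
    """Hours left over after removing whole days, comparable to "h"."""
    return _delta(start, end).seconds // 3600

print(total_hours("05.11.2021 18:16", "05.11.2021 20:16"))     # 2.0
print(total_hours("05.11.2021 18:16", "06.11.2021 20:16"))     # 26.0
print(hour_component("05.11.2021 18:16", "06.11.2021 20:16"))  # 2
```

The two only differ once the gap exceeds 24 hours, which is why both formulas return 2 for the sample data.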
Does pandas have a built-in string matching function for exact matches and not regex? The code below for tropical_two has a slightly higher count. Documentation tells me it does a regex search.
tropical = reviews['description'].map(lambda x: "tropical" in x).sum()
print(tropical)
tropical_two = reviews['description'].str.count("tropical").sum()
print(tropical_two)
The first way is the answer key from Kaggle, but it seems less readable and intuitive to me than a .str function. When I run the function below, it returns True instead of 2, so I am a little confused about whether the answer key method actually counts all occurrences of "tropical" and not just the first.
def in_str(text):
    return "tropical" in text
in_str("tropical is tropical")
First 2 lines of dataframe:
0 Italy Aromas include tropical fruit, broom, brimston... Vulkà Bianco 87 NaN Sicily & Sardinia Etna NaN Kerin O’Keefe #kerinokeefe Nicosia 2013 Vulkà Bianco (Etna) White Blend Nicosia
1 Portugal This is ripe and fruity, a wine that is smooth... Avidagos 87 15.0 Douro NaN NaN Roger Voss #vossroger Quinta dos Avidagos 2011 Avidagos Red (Douro) Portuguese Red Quinta dos Avidagos
Notebook here, tropical code in cell #2
https://www.kaggle.com/mikexie0/exercise-summary-functions-and-maps
You may use str.count with word boundary markers to match the exact search term:
tropical_two = reviews['description'].str.count(r'\btropical\b').sum()
print(tropical_two)
There may not be the need for a separate exact API, as str.count can be used for exact matches as well.
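The two counts in the question differ because they measure different things: the "in" test counts descriptions that mention the word at least once, while str.count adds up every occurrence. Here is a plain-Python sketch of the three variants on made-up descriptions (not the Kaggle data). In pandas itself, a literal, non-regex, per-row match is available through str.contains("tropical", regex=False).

```python
import re

# Invented sample descriptions, chosen to make the three counts differ.
descriptions = [
    "tropical is tropical",
    "Aromas include tropical fruit",
    "subtropical notes",
    "ripe and fruity",
]

# Rows that contain the substring at least once (what the answer key counts).
rows_containing = sum("tropical" in d for d in descriptions)

# Every substring occurrence across all rows (what str.count(...).sum() counts).
total_occurrences = sum(d.count("tropical") for d in descriptions)

# Occurrences as a whole word only, using word boundaries.
whole_word = sum(len(re.findall(r"\btropical\b", d)) for d in descriptions)

print(rows_containing, total_occurrences, whole_word)  # 3 4 3
```

Note that the word-boundary version still counts repeated occurrences within one description, so it will not necessarily equal the row count either.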
I have a database with string column product_name which has data like:
Vans Classic Slip-On Black & White Checkerboard/ White - veľkosť (US) : 6 (EUR: 38)
Vans Old Skool - čierna - veľkosť (US) : 9.5 (EUR: 42.5)
I am trying to extract the US size...
SELECT REGEXP_SUBSTR("product_name", ...) AS "size"
...with desired output like this.
size
6
9.5
I have tried this, but to no avail
SELECT REGEXP_SUBSTR("product_name", '(US)(\d+)') AS "size"
I need to agree with B001, this might not be the best way of saving your information. However, if you are sure your strings are going to have this format, you could use this regex
\(US\) ?: ?(\d+\.?\d*) \(EUR: ?(\d+\.?\d*)\)
This will match the US shoe size first and then the EUR one.
Here is a visual explanation of the regex
Please note that this regex will match BOTH sizes; I'm not sure which one you prefer.
You can test more cases in this regex101
When working in the web UI I had to double every backslash. With that change, the following works as you want.
select REGEXP_SUBSTR(str, '\\(US\\)\\s\\:\\s(\\d+\\.?\\d*)',1,1,'i',1)
from values ('Vans Classic Slip-On Black & White Checkerboard/ White - veľkosť (US) : 6 (EUR: 38)'),
('Vans Old Skool - čierna - veľkosť (US) : 9.5 (EUR: 42.5)') v(str);
gives:
REGEXP_SUBSTR(STR, '\\(US\\)\\S\\:\\S(\\D+\\.?\\D*)',1,1,'I',1)
6
9.5
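The same capture-group idea can be tried outside the database. The Python sketch below uses a raw string, so the backslashes do not need doubling; the function name us_size is hypothetical, and the pattern assumes the "(US) : <number>" layout shown in the question.

```python
import re

# Literal "(US)", optional spaces around the colon, then an integer or decimal size.
US_SIZE = re.compile(r"\(US\)\s*:\s*(\d+(?:\.\d+)?)")

def us_size(product_name):
    """Return the US size as text, or None when the name has no '(US) : n' part."""
    match = US_SIZE.search(product_name)
    return match.group(1) if match else None

names = [
    "Vans Classic Slip-On Black & White Checkerboard/ White - veľkosť (US) : 6 (EUR: 38)",
    "Vans Old Skool - čierna - veľkosť (US) : 9.5 (EUR: 42.5)",
]
for name in names:
    print(us_size(name))  # 6, then 9.5
```

The original attempt '(US)(\d+)' fails for two reasons this makes visible: unescaped parentheses form a group instead of matching literal brackets, and nothing accounts for the " : " between "(US)" and the number.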
I am working with Google Charts. I need to add a '$' before the values on the y-axis as well as the value in the bubbles.
Is there a setting for this?
take care,
lee
Update:
Here is the data being used by the charts:
'Month','Semi-Detached in Toronto E04','Semi-Detached in Toronto E08','Condominium Townhouse in Toronto E04','Condominium Townhouse in Toronto E08', ],
['7/2011', 4354000,15305800,6776500,495000],['8/2011', 700000,10514418,7060786,0],['9/2011', 6854800,17805400,12087300,0],['10/2011', 7287400,14248900,16206500,0],['11/2011', 2696245,9733270,12698090,0],['12/2011', 1965800,6054500,8854390,0],['1/2012', 2450968,9012200,5500100,0] ]);
I've tried adding '$' before the values as well as '%24' as suggested before, but both throw syntax errors. And the values cannot be quoted without throwing the Google Charts error 'Data column(s) for axis #0 cannot be of type string'.
Thanks everyone for your input. I found a question that was 99.9% the same:
How to set tooltips to display percentages to match axis in Google Visualization Line Chart?
Try using %24 which is the urlencoded form for $.
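Use a NumberFormat formatter, then apply it to each numeric column of your DataTable with formatter.format(dataTable, columnIndex) so that the values shown in the bubbles get the prefix:
var formatter = new google.visualization.NumberFormat({prefix: '$'});
For the y-axis labels, the vAxis.format chart option accepts a number pattern such as '$#,###'.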
Check the example here:
https://developers.google.com/chart/interactive/docs/examples#interaction_example
More details here:
https://developers.google.com/chart/interactive/docs/reference#numberformatter
I am using Perl to scrape the following into a .txt file, which I'd ultimately bring into Stata. What format works? I have many such observations, so I would like an approach that generalizes.
The original data are of the form:
First Name: Allen
Last Name: Von Schmidt
Birth Year: 1965
Location: District 1, Ocean City, Cape May, New Jersey, USA
First Name: Lee Roy
Last Name: McBride
Birth Year: 1967
Location: Precinct 5, District 2, Chicago, Cook, Illinois, USA
The goal is to create the variables in Stata:
First Name: Allen
Last Name: Von Schmidt
Birth Year: 1965
County: Cape May
State: New Jersey
First Name: Lee Roy
Last Name: McBride
Birth Year: 1967
County: Cook
State: Illinois
What possible .txt might lead to such, and how would I load it into Stata?
Also, the number of terms in Location varies, as in these 2 examples, but I always want the 2 before USA.
At the moment, I am putting "", around each variable from the table for the .txt.
"Allen","Von Schmidt","1965","District 1, Ocean City, Cape May, New Jersey, USA"
"Lee Roy","McBride","1967","Precinct 5, District 2, Chicago, Cook, Illinois, USA"
Is there a better way to format the .txt? How would I create the corresponding variables in Stata?
Thank you for your help!
P.S. I know that Stata uses infile or insheet and can handle commas or tabs to separate variables. I did not know how to scrape a variable like Location in Perl with all of those commas in it, so I added the "".
There are two ways to do this. The first is to paste the data into your do-file and use input. Assuming the format is fairly regular, you can clean it up easily using commas to parse. Note that I removed the commas between the quoted fields:
#delimit;
input
str100(first_name last_name yob geo);
"Allen" "Von Schmidt" "1965" "District 1, Ocean City, Cape May, New Jersey, USA";
end;
compress;
destring, replace;
split geo, parse(,);
rename geo1 district;
rename geo2 city;
rename geo3 county;
rename geo4 state;
rename geo5 country;
drop geo;
The second way is to insheet the data from the txt file directly, which is probably easier. This assumes that the commas were not removed:
#delimit;
insheet first_name last_name yob geo using "raw_data.txt", clear comma nonames;
Then clean it up as in the first example.
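As a cross-check on the file format, here is a small Python sketch that reads the same quoted, comma-separated lines. It suggests that quoting each field is a workable format, since a CSV parser keeps the commas inside Location intact. The field names in the output are made up for illustration, and the sketch assumes every Location ends in county, state, country.

```python
import csv
import io

# The two sample lines from the question, as they would appear in the .txt file.
raw = '''"Allen","Von Schmidt","1965","District 1, Ocean City, Cape May, New Jersey, USA"
"Lee Roy","McBride","1967","Precinct 5, District 2, Chicago, Cook, Illinois, USA"
'''

def parse_people(text):
    """Parse quoted CSV rows and pull county and state from the end of Location."""
    people = []
    for first, last, year, location in csv.reader(io.StringIO(text)):
        parts = [p.strip() for p in location.split(",")]
        # Count from the end: ..., county, state, country
        people.append({
            "first_name": first,
            "last_name": last,
            "birth_year": int(year),
            "county": parts[-3],
            "state": parts[-2],
        })
    return people

for person in parse_people(raw):
    print(person["first_name"], person["county"], person["state"])
```

Indexing from the end sidesteps the problem that the number of leading elements (district, precinct, city) varies between records.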
This isn't a complete answer, but I need more space and flexibility than comments (easily) allow.
One trick is based on peeling off elements from the end. The easiest way to do that could be to start looking for the last comma, which is in turn the first comma in the reversed string. Use strpos(reverse(stringvar), ",").
For example, the first comma is found by strpos() like this
. di strpos("abcd,efg,h", ",")
5
and the last comma like this
. di strpos(reverse("abcd,efg,h"), ",")
2
Once you know where the last comma is you can peel off the last element. If the last comma is at position # in the reversed string, it is at position -# in the string.
. di substr("abcd,efg,h", -2, 2)
,h
These examples clearly are calculator-style examples for single strings. But the last element can be stripped off similarly for entire string variables.
. gen poslastcomma = strpos(reverse(var), ",")
. gen var_end = substr(var, -poslastcomma, poslastcomma)
. gen var_begin = substr(var, 1, length(var) - poslastcomma)
Once you get used to stuff like this you can write more complicated statements with fewer variables, but slowly, slowly step by step is better when you are learning.
By the way, a common Stata learner error (in my view) is to assume that a solution to a string problem must entail the use of regular expressions. If you are very fluent at regular expressions, you can naturally do wonderful things with them, but the other string functions in conjunction can be very powerful too.
In your specific example, it sounds as if you want to ignore a last element such as "USA" and then work in turn on the next elements working backwards.
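For readers who think more easily in a general-purpose language, the peeling trick can be sketched in Python as below. This is an illustration of the logic only, not Stata code; the function names are invented, and Python's find() is 0-based where Stata's strpos() is 1-based.

```python
def peel_last(text, sep=","):
    """Split off the last element by finding the first separator in the reversed string."""
    pos = text[::-1].find(sep)  # 0-based position in the reversed string, -1 if absent
    if pos == -1:
        return "", text  # no separator: the whole string is the last element
    return text[: len(text) - pos - 1], text[len(text) - pos:]

def county_and_state(location):
    """Drop the trailing country, then peel off state and county in turn."""
    rest, _country = peel_last(location)
    rest, state = peel_last(rest)
    _, county = peel_last(rest)
    return county.strip(), state.strip()

print(peel_last("abcd,efg,h"))  # ('abcd,efg', 'h')
print(county_and_state("District 1, Ocean City, Cape May, New Jersey, USA"))
print(county_and_state("Precinct 5, District 2, Chicago, Cook, Illinois, USA"))
```

Because each step works from the end, the result is the same whether Location has five elements or six.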
split in Stata is fine too (I am a fan and indeed am its putative author) but can be awkward if a split yields different numbers of elements, which is where I came in.