Regex to extract shoe size from string column - regex

I have a database with string column product_name which has data like:
Vans Classic Slip-On Black & White Checkerboard/ White - veľkosť (US) : 6 (EUR: 38)
Vans Old Skool - čierna - veľkosť (US) : 9.5 (EUR: 42.5)
I am trying to extract the US size...
SELECT REGEXP_SUBSTR("product_name", ...) AS "size"
...with desired output like this.
size
6
9.5
I have tried this, but to no avail
SELECT REGEXP_SUBSTR("product_name", '(US)(\d+)') AS "size"

I need to agree with B001, this might not be the best way of saving your information. However, if you are sure your strings are going to have this format, you could use this regex
\(US\) ?: ?(\d+\.?\d*) \(EUR: ?(\d+\.?\d*)\)
This will match the US shoe size first and then the EUR one.
Please note that this regex will match BOTH sizes, I'm not sure which one you prefer
You can test more cases in this regex101
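For anyone who wants to try the pattern outside the database, here is a minimal Python sketch using only the US part of the regex above. The helper name `extract_us_size` is made up for illustration, and the sample rows are the two strings from the question. Note that the attempt in the question fails because the parentheses around US are unescaped (so they form a group instead of matching literal brackets) and the " : " between "(US)" and the number is not accounted for.

```python
import re

# US part of the pattern from the answer: literal "(US)", optional spaces
# around the colon, then an integer or decimal size in capture group 1.
US_SIZE = re.compile(r"\(US\) ?: ?(\d+\.?\d*)")


def extract_us_size(product_name):
    """Return the US size as a string, or None if the pattern is absent."""
    match = US_SIZE.search(product_name)
    return match.group(1) if match else None


rows = [
    "Vans Classic Slip-On Black & White Checkerboard/ White - veľkosť (US) : 6 (EUR: 38)",
    "Vans Old Skool - čierna - veľkosť (US) : 9.5 (EUR: 42.5)",
]
sizes = [extract_us_size(row) for row in rows]
print(sizes)
```

Returning the capture group rather than the whole match is what keeps "(US) : " out of the result.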

When working in the web UI I had to double my backslashes. Thus the following worked as you want.
select REGEXP_SUBSTR(str, '\\(US\\)\\s\\:\\s(\\d+\\.?\\d*)',1,1,'i',1)
from values ('Vans Classic Slip-On Black & White Checkerboard/ White - veľkosť (US) : 6 (EUR: 38)'),
('Vans Old Skool - čierna - veľkosť (US) : 9.5 (EUR: 42.5)') v(str);
gives:
REGEXP_SUBSTR(STR, '\\(US\\)\\S\\:\\S(\\D+\\.?\\D*)',1,1,'I',1)
6
9.5

Related

Getting value between '-' in google sheets

I'm trying to get the number between '-' and '-' in Google Sheets, but after trying many things I still haven't been able to find the solution.
Data record 1
England Premier League
West Ham vs Crystal Palace
2.090 - 3.47 - 3.770
Expected value = 3.47
Data record 2
England League Two
Carlisle vs Scunthorpe
2.830 - 3.15 - 2.820
Expected value = 3.15
Hopefully someone can help me out
Try either of the following
option 1.
=INDEX(IFERROR(REGEXEXTRACT(AE1:AE4," \d+\.\d+ ")*1))
option 2.
=INDEX(IFERROR(REGEXEXTRACT(AE1:AE4,".* - (\d+\.\d+) ")))
(Do adjust the formula according to your ranges and locale)
use:
=INDEX(IFNA(REGEXEXTRACT(A1:A, "- (\d+(?:\.\d+)?) -")*1))
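To check the idea outside Sheets, here is a rough Python sketch of the same "number between two dashes" pattern, with the decimal point escaped. It assumes each record is a single multi-line string like the ones in the question; `middle_value` is a hypothetical helper name.

```python
import re

# A number (integer or decimal) sitting between "- " and " -".
MIDDLE = re.compile(r"- (\d+(?:\.\d+)?) -")


def middle_value(record):
    """Return the middle number as a float, or None if there is no match."""
    match = MIDDLE.search(record)
    return float(match.group(1)) if match else None


records = [
    "England Premier League\nWest Ham vs Crystal Palace\n2.090 - 3.47 - 3.770",
    "England League Two\nCarlisle vs Scunthorpe\n2.830 - 3.15 - 2.820",
]
values = [middle_value(record) for record in records]
print(values)
```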

How to find the difference in hours between two dates dd/mm/yyyy hh:mm

I have two dates in cells
A1=05.11.2021 18:16
B1=05.11.2021 20:16
I need to find the difference in hours between the two dates. The result should be (B1-A1)=2. I can't find an answer on the Internet, so I'm asking for help.
use:
=TEXT((DATE(
REGEXEXTRACT(B1, "\d{4}"),
REGEXEXTRACT(B1, "\.(\d+)\."),
REGEXEXTRACT(B1, "^\d+"))+INDEX(SPLIT(B1, " "),,2))-(DATE(
REGEXEXTRACT(A1, "\d{4}"),
REGEXEXTRACT(A1, "\.(\d+)\."),
REGEXEXTRACT(A1, "^\d+"))+INDEX(SPLIT(A1, " "),,2)), "[h]")
arrayformula:
=INDEX(IFNA(TEXT((DATE(
REGEXEXTRACT(B1:B, "\d{4}"),
REGEXEXTRACT(B1:B, "\.(\d+)\."),
REGEXEXTRACT(B1:B, "^\d+"))+INDEX(SPLIT(B1:B, " "),,2))-(DATE(
REGEXEXTRACT(A1:A, "\d{4}"),
REGEXEXTRACT(A1:A, "\.(\d+)\."),
REGEXEXTRACT(A1:A, "^\d+"))+INDEX(SPLIT(A1:A, " "),,2)), "[h]")))
shorter:
=INDEX(IFERROR(1/(1/(TEXT(
REGEXREPLACE(B1:B, "(\d+).(\d+).(\d{4})", "$2/$1/$3")-
REGEXREPLACE(A1:A, "(\d+).(\d+).(\d{4})", "$2/$1/$3"), "[h]")))))
EDIT:
As @basic mentioned in a comment above, you can format the cell where your output goes, or use TEXT with h for the hours component of the difference and [h] for the whole duration in hours (taken from Cooper's answer). See the usage and difference below:
Text:
=text(B1-A1, "h")
or
=text(B1-A1, "[h]")
Update:
Make sure your date-times use delimiters that Sheets recognizes. / and - are acceptable (e.g. 5/11/2021 18:16:00 or 5-11-2021 18:16:00). (This depends entirely on your locale.)
If you want to show it having . as delimiter, just use a custom Date Time format and use . as its delimiter.
If you don't want to do any changes to the date time and want to have it as text, then replace them using regexreplace before using them in text.
RegexReplace:
=text(REGEXREPLACE(B1, "\.", "/") - REGEXREPLACE(A1, "\.", "/"), "h")
or
=text(REGEXREPLACE(B1, "\.", "/") - REGEXREPLACE(A1, "\.", "/"), "[h]")
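As a cross-check of the arithmetic, here is a small Python sketch that parses the dotted dd.mm.yyyy hh:mm strings directly and returns the difference in hours. It is not a Sheets formula; the function name `hours_between` and the format constant are assumptions made for the example.

```python
from datetime import datetime

# Day.month.year hour:minute, matching values such as "05.11.2021 18:16".
FMT = "%d.%m.%Y %H:%M"


def hours_between(start, end):
    """Return the elapsed time from start to end in hours (may be fractional)."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 3600


print(hours_between("05.11.2021 18:16", "05.11.2021 20:16"))  # 2.0
```

Like "[h]" in the formulas above, this counts the whole duration, so a span that crosses midnight still yields the full number of hours.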

How to format first 7 rows in this txt file using Regex

I have a text file with data formatted as below. I figured out how to format the second part of the file for upload into a db table, but I'm hitting a wall trying to get just the first 7 lines formatted the same way.
If it wasn't obvious, I'm trying to get it pipe delimited with the exact same number of columns, so I can easily upload it to the db.
Year: 2019 Period: 03
Office: NY
Dept: Sales
Acct: 111222333
SubAcct: 11122234-8
blahblahblahblahblahblahblah
Status: Pending
1000
AAAAAAAAAA
100,000.00
2000
BBBBBBBBBB
200,000.00
3000
CCCCCCCCCC
300,000.00
4000
DDDDDDDDDD
400,000.00
Some kind folks answered my question about the bottom part; using the following regex I can format it to look like so:
(.*)\r?\n(.*)\r?\n(.*)(?:\r?\n|$)
substitute with |||||||$1|$2|$3\n
|||||||1000|AAAAAAAAAA|100,000.00
|||||||2000|BBBBBBBBBB|200,000.00
|||||||3000|CCCCCCCCCC|300,000.00
|||||||4000|DDDDDDDDDD|400,000.00
I just need help formatting the top part to look like this, so the entire file has the exact same number of columns.
Year: 2019|Period: 03|Office: NY|Dept: Sales|Acct: 111222333|SubAcct: 11122234-8|blahblahblahblahblahblahblah|Status: Pending|||
I'm ok with having multiple passes on the file to get the desired end result.
I've helped you on your previous question, so I will focus now on the first part of your file.
You can use this regex:
\n|\b(?=Period)
And use | as the replacement string
If you don't want the previous space before Period, then you can use:
\n|\s(?=Period)
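Here is a minimal Python sketch of the second pattern applied to the header block only. It assumes the seven header lines have already been separated from the data rows (otherwise every newline in the file would become a pipe) and that the lines end in \n rather than \r\n. Padding with trailing pipes to reach the final column count is left out.

```python
import re

# The seven header lines from the question, joined with "\n".
header = "\n".join([
    "Year: 2019 Period: 03",
    "Office: NY",
    "Dept: Sales",
    "Acct: 111222333",
    "SubAcct: 11122234-8",
    "blahblahblahblahblahblahblah",
    "Status: Pending",
])

# Replace every newline, and the single whitespace character that
# precedes "Period", with a pipe.
flattened = re.sub(r"\n|\s(?=Period)", "|", header)
print(flattened)
```

The lookahead `(?=Period)` does not consume any text, so only the whitespace in front of "Period" is replaced and the word itself stays intact.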

Get a string after a specific word, using a program that has limited regex features?

Looking for help on building a regex that captures a 1-line string after a specific word.
The challenge I'm running into is that the program where I need to build this regex uses a single line format, in other words dot matches new line. So the formula I created isn't working. See more details below. Any advice or tips?
More specific regex task:
I'm trying to grab the line that comes after the word Details in entries like the ones below. The goal is to pull out 100% Silk or 100% Velvet. This is the material of the product, which always comes after Details.
Raw data:
<p>Loose fitted blouse green/yellow lily print.
V-neck opening with a closure string.
Small tie string on left side of top.</p>
<h3>Details</h3> <p>100% Silk.</p>
<p>Made in Portugal.</p> <h3>Fit</h3>
<p>Model is 5’10,” size 2 wearing size 34.</p> <p>Size 34 measurements</p>
OR
<p>The velvet version of this dress. High waist fit with hook and zipper closure.
Seams run along edges of pants to create a box-like.</p>
<h3>Details</h3> <p>100% Velvet.</p>
<p>Made in the United States.</p>
<h3>Fit</h3> <p>Model is 5’10”, size 2 and wearing size M pants.</p> <p>Size M measurements Length: 37.5"</p>
<p>These pants run small. We recommend sizing up.</p>
Here is the current formula I created that's not working:
Replace (.)(\bDetails\s+(.)) with $3
The output gives the below:
<p>100% Silk.</p>
<p>Made in Portugal.</p>
<h3>Fit</h3>
<p>Model is 5’10,” size 2 wearing size 34.</p>
<p>Size 34 measurements</p>
OR
<p>100% Velvet.</p>
<p>Made in the United States.</p>
<h3>Fit</h3> <p>Model is 5’10”, size 2 and wearing size M pants.</p> <p>Size M measurements Length: 37.5"</p>
<p>These pants run small. We recommend sizing up.</p>
How do I capture just the desired string? Let me know if you have any tips! Thank you!
Difficult to provide a working solution in your situation as you mention your program has "limited regex features" but don't explain what limitations.
Here is a Regex you can try to work with to capture the target string
^(?:<h3>Details<\/h3>)(.*)$
I would personally use BeautifulSoup for something like this, but here are two solutions you could use:
Match the line after "Details", then pull out the data.
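import re
# text holds the raw HTML; re.MULTILINE lets $ match at the end of every line
matches = re.findall(r'(?<=Details<).*$', text, flags=re.MULTILINE)
matches = [i.strip('<>') for i in matches]
matches = [i.split('<')[0] for i in [j.split('>')[-1] for j in matches]]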
Replace "Details<...>data" with "Detailsdata", then find the data.
# collapse "Details</h3> <p>" down to "Details" (needs the re module imported)
text = re.sub(r'Details<.*?<.*?>', 'Details', text)
matches = re.findall(r'(?<=Details).*?(?=<)', text)
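Since the tool in question treats dot as matching newlines, a lazy capture bounded by the closing tag may be closer to what is needed. The sketch below reproduces that behaviour in Python with re.DOTALL; the pattern and the helper name `material` are assumptions based on the sample data, where the material is always the first <p> after the Details heading.

```python
import re

# re.DOTALL mimics a tool where "." also matches newlines. The lazy (.*?)
# stops at the first </p>, so later paragraphs are not swallowed.
DETAILS = re.compile(r"<h3>Details</h3>\s*<p>(.*?)</p>", re.DOTALL)


def material(html):
    """Return the text of the first paragraph after the Details heading."""
    match = DETAILS.search(html)
    return match.group(1).strip() if match else None


silk = (
    "<p>Small tie string on left side of top.</p>\n"
    "<h3>Details</h3> <p>100% Silk.</p>\n"
    "<p>Made in Portugal.</p> <h3>Fit</h3>"
)
velvet = (
    "<p>Seams run along edges of pants to create a box-like.</p>\n"
    "<h3>Details</h3> <p>100% Velvet.</p>\n"
    "<p>Made in the United States.</p>"
)
print(material(silk), material(velvet))
```

In a replace-only tool, the equivalent would be to match the whole input around that group and substitute the group reference, provided the tool supports lazy quantifiers.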

Data Preparation Identify String using Regex and move to new column

Hello I am using Talend to prepare product data for import into DB. I want to use the extract string parts function for Talend.
I have the following data in one cell. (The length of the data varies; it is not a fixed-width format.)
Measurement: Ring Head Width: 6.8 Ring Height: 5.5 Ring Shank Width: 1.1 Ladies Band Width: 2.5 Ladies band shank Width: 1.2
I need help creating a regex format to match each measurement value and extract it to a new column.
What would be Regex to match the following text ?
Ring Head Width: 6.8
and extract the numeric value following it, which is
6.8
Similarly I want to create regex for all the above measurements. I am assuming the format will be the same.
Thanks for your time and help.
If you don't mind using multiple actions to achieve this result, I suggest that you use:
the "Split text in parts" action on ":"
and then use "remove whitespaces" to have a clean value.
If you really need to keep one action, you have the "Remove part of the text" action on regex that is based on the java Pattern.
Using regex ".*:\s" works fine
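To see what these patterns do before configuring the action, here is a rough Python sketch; it is not Talend code. The first part applies the ".*:\s" removal to a single "label: value" string. The second part uses a label/value pattern that is my own assumption (not from the answer) to pull every measurement out of the full cell at once.

```python
import re

# Removing everything up to the last ": " leaves only the value.
single = re.sub(r".*:\s", "", "Ring Head Width: 6.8")

cell = (
    "Measurement: Ring Head Width: 6.8 Ring Height: 5.5 "
    "Ring Shank Width: 1.1 Ladies Band Width: 2.5 "
    "Ladies band shank Width: 1.2"
)

# Assumed pattern: a label made of letters and spaces, a colon, then a number.
# "Measurement:" is skipped because no number follows it directly.
PAIR = re.compile(r"([A-Za-z ]+?):\s*(\d+(?:\.\d+)?)")
measurements = {
    label.strip(): float(value) for label, value in PAIR.findall(cell)
}
print(single)
print(measurements)
```

Each key of `measurements` would map to one new column, assuming the labels stay consistent across rows.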