Google Sheets REGEXEXTRACT between two quotes - regex

I'm trying to extract data between two quotes using the Google Sheets REGEXEXTRACT function.
The regex works perfectly:
(?<=actor_email":")(.*?)(?=")
Data in the cell is:
{"account_name":"Test","actor_email":"test#test.com","user_email":"anyone#test.com"}
However, using it in Google Sheets gives an error.
I have tried a number of combinations with no luck.
I tried using: (?<=actor_email""":""")(.*?)(?=""")
The output should be: test#test.com

You may use
=REGEXEXTRACT(A1, "actor_email"":""([^""]+)""")
The pattern is actor_email":"([^"]+)":
actor_email":" - a literal substring
([^"]+) - Capturing group 1 (the value extracted): any 1+ chars other than "
" - a " char (may be removed if this " can be missing)

or eliminate quotes like:
=REGEXEXTRACT(SUBSTITUTE(A1, """", ), "actor_email:(.+),user_")
=REGEXEXTRACT(SUBSTITUTE(A1, """", " "), "actor_email : ([^ ]+)")
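The lookbehind version in the question fails because Google Sheets uses the RE2 engine, which does not support lookarounds; a capturing group does the same job. As a rough cross-check outside Sheets, the capturing-group pattern can be sketched in Python. Note that Python's re module is not RE2, so this only illustrates the logic, and the variable names are illustrative.

```python
import re

# Cell content from the question.
cell = '{"account_name":"Test","actor_email":"test#test.com","user_email":"anyone#test.com"}'

# Same pattern as in the formula, without the doubled quotes Sheets needs.
pattern = re.compile(r'actor_email":"([^"]+)"')

match = pattern.search(cell)
# Group 1 holds the value between the quotes; None if the key is absent.
actor_email = match.group(1) if match else None
print(actor_email)
```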

Related

Regex: match, but do not include in groups

I have some objects like this:
"unit":{
"id":"1",
"title":"I am a title",
"description":" description",
"category":{
"name":"Reading",
"type":"READING",
"icon":"fas fa-book"
}
I'd like to remove the double quotes from the keys. Is there a good way to do this with regex? I use ".*": to match the keys, but am unsure how to do the partial replace in VSCode. I tried this solution, but was unsuccessful.
You can use
"([^"]+)":
Replace with $1:. Details:
" - a double quote
([^"]+) - Group 1 ($1 refers to this group's value): any one or more chars other than "
": - a ": string.

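The same find-and-replace can be sketched outside VSCode, for instance in Python, where the $1 backreference is written \1. This is only an illustration on a shortened copy of the sample; the variable names are made up for the example.

```python
import re

# Shortened version of the object from the question.
text = '"unit":{\n"id":"1",\n"title":"I am a title"\n}'

# Match a quoted key followed by a colon and keep only the key (group 1).
unquoted = re.sub(r'"([^"]+)":', r'\1:', text)
print(unquoted)
```

Values keep their quotes because a quoted value is not followed by a colon in this sample.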
Extract from string in BigQuery using regexp_extract

I have a long string in BigQuery from which I need to extract some data.
Part of the string looks like this:
... source: "agent" resolved_query: "hi" score: 0.61254 parameters ...
I want to extract out data such as agent, hi, and 0.61254.
I'm trying to use regexp_extract but I can't get the regexp to work correctly:
select
regexp_extract([col],r'score: [0-9]*\.[0-9]+') as score,
regexp_extract([col],r'source: [^"]*') as source
from [table]
What should the regexp be to just get agent or 0.61254 without the field name and no quotation marks?
Thank you in advance.
I love non-trivial approaches; below is one of them:
select * except(col) from (
select col, split(kv, ': ')[offset(0)] key,
trim(split(kv, ': ')[offset(1)], '"') value,
from your_table,
unnest(regexp_extract_all(col, r'\w+: "?[\w.]+"?')) kv
)
pivot (min(value) for key in ('source', 'resolved_query', 'score'))
if applied to sample data as in your question
with your_table as (
select '... source: "agent" resolved_query: "hi" score: 0.61254 parameters ... ' col union all
select '... source: "agent2" resolved_query: "hello" score: 0.12345 parameters ... ' col
)
the output has one row per input string, with the columns source, resolved_query, and score (for example agent, hi, 0.61254 for the first string).
As you might have noticed, the benefit of this approach is that when you have more fields/attributes to extract, you do not need to clone a line of code for each attribute; you just add another value to the list in the last line, and the rest of the code stays the same.
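To make the extract-all-then-split idea easier to follow, here is a minimal Python sketch of the same steps on one sample string. It is not BigQuery code; the names pairs and fields are illustrative only.

```python
import re

col = '... source: "agent" resolved_query: "hi" score: 0.61254 parameters ... '

# Step 1: pull out every `key: value` pair, with or without quotes.
pairs = re.findall(r'\w+: "?[\w.]+"?', col)

# Step 2: split each pair on ': ' and trim the quotes from the value.
fields = {}
for kv in pairs:
    key, value = kv.split(': ', 1)
    fields[key] = value.strip('"')

print(fields)
```

Adding another attribute then means reading one more key from fields, which mirrors adding a value to the pivot list.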
You can use
select
regexp_extract([col],r'score:\s*(\d*\.?\d+)') as score,
regexp_extract([col],r'resolved_query:\s*"([^"]*)"') as resolved_query,
regexp_extract([col],r'source:\s*"([^"]*)"') as source
from [table]
Here,
score:\s*(\d*\.?\d+) matches score: string, then any zero or more whitespaces, and then there is a capturing group with ID=1 that captures zero or more digits, an optional . and then one or more digits
resolved_query:\s*"([^"]*)" matches a resolved_query: string, zero or more whitespaces, ", then captures into Group 1 any zero or more chars other than " and then matches a " char
source:\s*"([^"]*)" matches a source: string, zero or more whitespaces, ", then captures into Group 1 any zero or more chars other than " and then matches a " char.
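When the pattern contains a capturing group, regexp_extract returns the captured substring rather than the whole match. The three patterns can be tried locally with a small Python sketch; the helper name extract and the dictionary layout are assumptions made for this example.

```python
import re

col = '... source: "agent" resolved_query: "hi" score: 0.61254 parameters ... '

patterns = {
    "score": r'score:\s*(\d*\.?\d+)',
    "resolved_query": r'resolved_query:\s*"([^"]*)"',
    "source": r'source:\s*"([^"]*)"',
}

def extract(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

row = {name: extract(p, col) for name, p in patterns.items()}
print(row)
```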

Googlesheet Regex : only capture first URL from multiple image URL

Example input:
(a
data-zoom-image="https://www.example.com/media/catalog/product/cache/1/image/5f444a2891627135a18d90f22b51fc0/d/m/dm104-2.jpg"
,
data-image="https://www.example.com/media/catalog/product/cache/1/image/500x500/35f444a289162715a18d90f22b51fc0/d/m/dm104-2.jpg"
a data-zoom
image="https://www.example.com/media/catalog/product/cache/1/image/70x70/35f444a2891627135a18d90f2b51fc0/d/m/dm104-2.jpg"
a
data-zoom-image="https://www.example.com/media/catalog/product/cache/1/image/35f444a2891627135a18d9022b1fc0/m/i/mirror_disclaimer_web_16.jpg"
)
I want to capture only the first url after that part (< a data-zoom-image=")
(https://www.example.com/media/catalog/product/cache/1/image/5f444a2891627135a18d90f22b51fc0/d/m/dm104-2.jpg)
How can I do that in Google Sheets using a regular expression?
Thanks in advance
You can use the REGEXEXTRACT function like this:
=REGEXEXTRACT(A24, "data-zoom-image=""([^\""]*)")
Explanation:
data-zoom-image="" -- match for exact text data-zoom-image=" (using "" to represent ")
([^\""]*) -- first capturing group, this is the result that would be displayed
[^\""] -- any character that is not " (again using "")
* -- match zero or more times
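REGEXEXTRACT returns only the first match, which is why this formula yields the first URL. A rough Python equivalent uses re.search, which also stops at the first match. The shortened URLs below are hypothetical stand-ins for the ones in the question.

```python
import re

# Hypothetical, shortened input with two data-zoom-image attributes.
html = (
    'a data-zoom-image="https://www.example.com/a/first.jpg" , '
    'data-image="https://www.example.com/a/500x500/first.jpg" '
    'a data-zoom-image="https://www.example.com/a/second.jpg"'
)

# Capture everything after the opening quote up to the next quote.
m = re.search(r'data-zoom-image="([^"]*)', html)
first_url = m.group(1) if m else None
print(first_url)
```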

Regular Expression starting and ending with special characters

I need to extract all matches from a huge text that start with [" and end with "]. These special characters separate each record from database. I need to extract all records.
Inside this record there are letters, numbers and special characters like -, ., &, (), /, {space} or so.
I'm writing this in Office VBA.
The pattern I have come so far looks like this: .Pattern = "[[][""][a-z|A-Z|w|W]*".
With this pattern, I am able to extract the first word from each record, with the starting characters [". The count of found matches is correct.
Example of one record:
["blabla","blabla","blabla","\u00e1no","nie","\u00e1no","\u00e1no","\u00e1no","\u003Ca class=\u0022btn btn-default\u0022 href=\u0022\u0026#x2F;siea\u0026#x2F;suppliers\u0026#x2F;42\u0022\u003E\u003Ci class=\u0022fa fa-pencil\u0022\u003E\u003C\/i\u003E Upravi\u0165\u003C\/a\u003E \u003Ca class=\u0022btn btn-default\u0022 href=\u0022\u0026#x2F;siea\u0026#x2F;suppliers\u0026#x2F;form\u0026#x2F;42\u0022\u003E\u003Ci class=\u0022fa fa-file-pdf-o\u0022\u003E\u003C\/i\u003E Zmluva\u003C\/a\u003E \u003Ca class=\u0022btn btn-default\u0022 href=\u0022\u0026#x2F;siea\u0026#x2F;suppliers\u0026#x2F;crz-form\u0026#x2F;42\u0022\u003E\u003Ci class=\u0022fa fa-file-pdf-o\u0022\u003E\u003C\/i\u003E Zmluva CRZ\u003C\/a\u003E"]
The question is: how can I extract all the records starting with [" and ending with "]?
I don't necessarily need the starting and ending characters; I can clean those up later.
Thanks for help.
The easiest way is to get rid of the initial and trailing [" and "] with either Replace or Left/Right/Mid functions, and then Split with "," (in VBA, """,""").
E.g.
' "Input" is a reserved keyword in VBA, so a different variable name is used here
inputText = "YOUR_STRING"
inputText = Replace(Replace(inputText, """]", ""), "[""", "")
result = Split(inputText, """,""")
If you plan to use a regex, you can use the \["[\s\S]*?"] pattern, but it is not that efficient with long inputs and may even freeze the macro if a timeout issue occurs. You can unroll it as
\["[^"]*(?:"(?!])[^"]*)*"]
In VBA, Pattern = "\[""[^""]*(?:""(?!])[^""]*)*""]"
Note that with this unrolled pattern, you do not even need to use the workarounds for dot matching newline issue (negated character class [^"] matches any char but ", including a newline).
Pattern details:
\[" - [" literally
[^"]* - zero or more characters other than "
(?:"(?!])[^"]*)* - zero or more sequences of
"(?!]) - " not followed with ]
[^"]* - zero or more characters other than "
"] - literal character sequence "]

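The unrolled pattern can be checked on a small input before wiring it into the macro. The sketch below uses Python rather than VBA, with two short made-up records; record_re, records, and fields are names chosen for the example.

```python
import re

# Unrolled pattern: [" ... "] where an inner quote must not be followed by ].
record_re = re.compile(r'\["[^"]*(?:"(?!\])[^"]*)*"\]')

# Two made-up records separated by a newline.
text = '["a","b","x y"]\n["c-1","d & e","(f)/g"]'

records = record_re.findall(text)

# Strip the leading [" and trailing "] and split on "," to get the fields.
fields = [r[2:-2].split('","') for r in records]
print(fields)
```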
Regular expression for extracting excerpt from long String

I want to extract excerpt from a long string using Regular expression
Example string: "" Is it possible that Germany, which beat Argentina 1-0 today to win the World Cup, that will end up as a loser in terms of economic growth? ""
String to search: " that "
Expected result from regex
" possible that Germany "
" rd Cup, that will end "
I want to search the desired text from the string with -9 and +9 characters from the forward and the backward of the occurence of the searched string. Search string can occur multiple times within the given string.
I am working on an iOS app using iOS 7.
So far I have created this expression with my limited knowledge of regular expressions, but I am not able to get the desired result from it:
" (.){0,9} (that) {0,9} "
Remove the spaces in your regex. If you want to capture the matched text, enclose the pattern within a capturing group (i.e., ()).
.{9}that.{9}
OR
(?:.{9}|.{0,9})that(?:.{9}|.{0,9})
Making the preceding and following characters optional also matches a line that starts or ends near the search word, such as: that will change history
Well, in your expression you were just missing the second "." and maybe the "?" for spaces.
.{0,9} ?that ?.{0,9}
Try that.
You can add ( ) for making groups if you want. I added the "?" to make it comply with your other example:
" that will change history"