Extracting String Portions in SQL using Regular expressions - regex

Hi All,
I have a query related to Regular expressions in SQL.
I have a case where a portion of string has to be extracted from a column. The portion of that column will be prefixed with my column A. Please see the screenshot for the sample data. I have also added the output expected in a separate column (highlighted in green).
Scenarios:
Now if a column value has more than 1 unique number then that has to be shown up with Null
Eg: To verify CAN06010025, CAN06010026 & CAN06010030 after the approval.
In the above string I have more than 1 number(bold portion)
and this case should be ignored (meaning it has to give me Null Value).
If there is only one number and if it is repetitive then I have to consider that case and extract the portion of String..
Eg: Project USA12: Id USA12S001: Contact required -USA12S001- form to be updated
In this example, the portion I wanted to extract is repetitive and I am looking to extract the highlighted portion alone.
The same applies to the other cases as well.
I tried with the below sql. The challenge is my Col A can also be present in Col B (Line 2 in screenshot) and this code is considering my Col A portion when I count with REGEXP_COUNT function and is giving me the value as Null. My expectation is to extract that USA12S001 portion from the column.
Could you please help in achieving this where the above two conditions satisfies.
SQL:
SELECT
ColA,
ColB,
case when REGEXP_COUNT(ColB,ColA) >2 THEN NULL
ELSE REPLACE(REPLACE(concat(regexp_substr(ColB,ColA||'([[:alnum:]]+\.?)'),
nvl(regexp_substr(ColB,ColA||'(\-[[:digit:]]+)'),
regexp_substr(ColB,ColA||'([[:space:]]\-[[:space:]][[:digit:]]+)'))),
' ',''),'.','')
END AS Result
FROM
table
Test Data:
Col A
CAN06
USA12
USA27
HUN04
CAN05
USA24
CAN06
Col B
to verify CAN06010025, CAN06010026 & CAN06010030 after the approval
Project USA12: Id USA12S001: Contact required -USA12S001- form to be updated
Project USA27: Id: USA27S001: Prod
To review id HUN04S002-HUN04S004 after the due date.
ID: CAN05S005 with the details as CAN05S005 are completed.
Project USA24: Id: USA24S009: Data Issue
"Project: Subject CAN06S009: V2 & V3- Id CAN06S010: V1"

If the REGEXP_COUNT is the only issue, then the answer is simple: change
case when REGEXP_COUNT(ColB,ColA) >2
to:
case when REGEXP_COUNT(ColB,ColA || '[[:alnum:]]') >2

Related

Copy and Filter out selected fields from 2 tabs in the Same Gsheet to another Tab with common category Fraud = 'Yes'

Copy and Filter out selected fields from 2 tabs in the Same Gsheet to another Tab with Fraud = 'Yes'
I have 2 sets Tabs in the Same Gsheet with different information. I would like to copy them into another tab with Fraud ="Yes". I have an example with formulas in Combine Example from Stall A Example and Stall B Example with some help previously. When I tried to replicated into my actual data (Combine NSU and ACH ) set I can't do it.
Can someone please help and guide on this.
https://docs.google.com/spreadsheets/d/1N35wUB-a7hDHFTzdhajlaCTJf_Ce34Ql3Miwq51JCqY/edit?usp=sharing
Whenever it is Fraud = "Yes", it extracts the necessary information from NSU Tab and ACH Tab with has Fraud = Yes into Combine NSU and ACH Tab
#=SORT(
LAMBDA(DATA,
LAMBDA(DATE,TYPE,AMOUNT,FRAUD,ID,ERP
FILTER({TEXT(DATE,"dd/mm/yyyy"),TYPE,AMOUNT,FRAUD,ID,ERP},FRAUD="YES")
)(INDEX(DATA,,1),INDEX(DATA,,2),INDEX(DATA,,8),INDEX(DATA,,13),INDEX(DATA,,15),INDEX(DATA,,16))
)({'NSU'!A2:P27;{'ACH'!A2:A8, 'ACH'!B2:B8,'ACH'!N2:N8,'ACH'!L2:L8,'ACH'!F2:F8,'ACH'!H2:H8 }})
,1,TRUE)
Code:
Output:
Get all data of sheet NSU and ASH, and re-arrange their orders by QUERY.
Group them up with another QUERY to filter the data you want such as Col4 = 'Yes'.
All the formating and sort can also be done with query.
=ArrayFormula(
LAMBDA(NSU,ACH,
QUERY({NSU;ACH},
" WHERE Col1 IS NOT NULL "
&" AND Col4 = 'Yes' "
&" ORDER BY Col1 ASC"
&" LABEL Col1 'Date',Col2 'Type',Col3 'Amount',Col4 'Fraud',Col5 'ID',Col6 'ERP' "
&" FORMAT Col1 'yyyy-mm-dd' "
)
)(
LAMBDA(COLS,
QUERY({NSU!$A:$P},
" SELECT "&JOIN(",","Col"&COLS)
&" LABEL "&JOIN(",","Col"&COLS&" '"&REPT(" ",COLS)&"'"),1)
)({1,2,8,13,15,16}),
LAMBDA(COLS,
QUERY({ACH!$A:$N},
" SELECT "&JOIN(",","Col"&COLS)
&" LABEL "&JOIN(",","Col"&COLS&" '"&REPT(" ",COLS)&"'"),1)
)({1,2,14,12,6,8})
)
)
Well, in-case you really want to have the ability to add some input fields in-between the output QUERY.
The answer is you can't, but also you can.
Basically, you cannot insert anything in-between any kind of array outputs in google spreadsheets, the array-formula will return an error '#ref' mentioning that there are other values inside the output range which fail it from showing the output, BUT...
you can always get around this issue by simply seperate the output array by an other QUERY, such as:
This simple array in google sheet will reference range A1:C10 and place the data into where-ever you type this formula into, which output a 10 rows by 3 columns array.
={A1:C10}
We assumne that you put this in cell 'E1', that makes the output covers range 'E1:G10'.
If you what to have a column of fields which allow you to input new data in-between the output range, for example, you want to add a new column in F:F.
In that case, you can put 2 formula seperatly into cell 'E1' and cell 'G1', which contains the following formulas:
in cell 'E1':
=QUERY({A1:C10},"SELECT Col1")
in cell 'G1':
=QUERY({A1:C10},"SELECT Col2,Col3")
Since the output arrays are seperated, that makes the column(s) between the column E and column G a normal empty column, which allows you to input anything into the cells.
The beauty of QUERY function is that you can select any column(s) of a given reference array as an output.
In this case the output data will be seperated into 2 parts, one contains only the first column of the reference, the second one contains the rest of them.
It doesn't matter how many empty columns you insert between the 2 outputs since they are 2 outputs of one identical reference.
The draw back is, if your reference data are results of calculations, the sheet will have to do all those calculations two times, even if they gives the same results, which is why I say this can and will slow things down and are not very recommended.

Google Data Studio Calculated Field by Extracting String from Event Label Values

I'm trying to use the CASE statement to output string values for an Event Label field using RegEx to produce a table that shows the number of events for each field value. So, if I'm looking for foobar, and other string values separately, within values for Event Label; it may either stand alone or be part of a URL like so:
|[object HTMLLabelElement] | Foobar |
/images/foobar-26.svg
It seems REGEXP_EXTRACT might suit this the best:
CASE WHEN REGEXP_EXTRACT(Event Label, '.(?i)foobar.') THEN Foobar
However, the table produced using the calculated field as the dimension only contains a blank row that seems to be the sum of the number of events.
What am I missing?
I think you need to use REGEXP_MATCH not REGEXP_EXTRACT, given your existing syntax, or to change the syntax to a straight REGEXP_EXTRACT without the CASE element.

How do you escape a column name in Google visualisation API query language?

I have a Google sheet which generates an error in the following expression:
=query(Capacity!A5:FE135,"SELECT C,A WHERE "&SUBSTITUTE(ADDRESS(1,match(D2,Capacity!A1:FE1,0)+2,4),"1","")&" = '"&C2&"' AND "&SUBSTITUTE(ADDRESS(1,match(D2,Capacity!A1:FE1,0),4),"1","")&" = 1 ORDER BY C")
for a single, specific input value (a date) at D2.
Essentially, the purpose of the code is to find the column location of the date at D2 in a second sheet (Capacity) and put the values of that column in that sheet into column C in the current sheet, while also selecting only rows that match on a second column. When the date is set to a specific value, however, the expression will not evaluate.
On breaking this massive expression down into its component parts, it turns out the problem is caused by this expression:
=SUBSTITUTE(ADDRESS(1,match(D2,Capacity!A1:FE1,0)+2,4),"1","")
which, for the offending date, is returning column BY.
This means the expression being evaluated in Google Visualization API query language is:
SELECT C,A WHERE BY = '' AND BW = 1 ORDER BY C
but the query language sees BY as a reserved word, not a column, and barfs.
How can I escape the column name somehow to make it clear that it is to be considered a column name?
The way is to surround the offending portion with back-quotes (as I used to make text monospaced here):
=query(Capacity!A5:FE135,"SELECT C,A WHERE `"&SUBSTITUTE(ADDRESS(1,match(D2,Capacity!A1:FE1,0)+2,4),"1","")&"` = '"&C2&"' AND `"&SUBSTITUTE(ADDRESS(1,match(D2,Capacity!A1:FE1,0),4),"1","")&"` = 1 ORDER BY C")
so the query will look like
SELECT C,A WHERE `BY` = '' AND `BW` = 1 ORDER BY C
I assume this will help when the sheet grows so big that we're on column IF as well.

Fuzzy match on google sheets

I'm trying to fuzzy match two columns in google sheets, i've tried numerous formulas but I think it's going to come down to a script to help out.
I have a column with product ID's e.g.
E20067
and then I have another sheet with another column which has image url's relating to this product code such as
http://wholesale.test.com/product/E20067/web_images/E20067.jpg
http://wholesale.test.com/product/E20067/high_res/E20067.jpg
http://wholesale.test.com/product/E20067/high_res/E20067-2.jpg
What I'm wanting to do is "fuzzy" match both of these columns for their product ID, and then create a new column for each match. So it would have the product ID then on the same row in multiple columns each product image URL - like the image below:
Is there a way to do this in google sheets using a script or a formula?
In Google sheets there are a few powerful 'regex' formulas.
Suppose, you have ID list in column A, and URL list in column B
Then use formula:
=REGEXEXTRACT(B1,JOIN("|",$A$1:$A$3))
It will match one of ID's. Drag the formula down to see the result as in picture above.
See more info here
Old thread but, in case you find yourself here, search for my Google Sheets add-on called Flookup. It should do exactly what you want.
For this case, you can use this function:
Flookup (lookupValue, tableArray, lookupCol, indexNum, threshold, [rank], [range])
The parameter details are:
lookupValue: the value you're looking up
tableArray: the table you want to search
lookupCol: the column you want to search
indexNum: the column you want data to be returned from
threshold: the percentage similarity below which data shouldn't be returned
rank: the nth best match (i.e. if the first one isn't to your liking)
range: choose to return the percentage similarity or row number for each match
You can find out more at the official website (examples and such).
Please note that, whereas the OP appears to want the whole list of possible matches, Flookup will only return one result at a time.
Flookup can now return a list of all possible matches through its LRM mode.
Try the following. I am assuming the product codes are in Sheet1 and the URLs are in Sheet2. Both in column A:
=iferror(transpose(FILTER(Sheet2!$A$2:$A,Search("*"& A2 &"*",Sheet2!$A$2:$A))))
Copy down.
If you want to show the image instead of the url try:
=arrayformula(image(iferror(transpose(FILTER(Sheet2!$A$2:$A,Search("*"& A2 &"*",Sheet2!$A$2:$A))))))

Compare column value against list of regex values stored in another table and update accordingly

I am new to Oracle programming.
I want to check the "msg" value of "Table1" against the "regex" values from "Table2".
If the regular expression matches as such, I want to update the respective "regex_id" in "Table1".
Usual query: SELECT 'match found' FROM DUAL WHERE REGEXP_LIKE('s 27', '^(s27|s 27)')
Table1
MSG REG_EXID
Ss27 ?
s27 ?
s28 ?
s29 ?
Table2
REGEX REG_EXID RELEVANCE
^(s27|s 27) 1 10
^(s29|s 29) 2 2
^(m28|m 28) 3 2
^(s27|s 27) 4 100
Taking the newly added "relevance" into account, with Oracle 11g you could try along
UPDATE Table1 T1
SET T1.reg_exID =
(SELECT DISTINCT
MAX(reg_exID) KEEP (DENSE_RANK FIRST ORDER BY relevance DESC) OVER (PARTITION BY regex)
FROM Table2
WHERE REGEXP_LIKE(T1.msg, regex)
)
;
See SQL Fiddle.
You could work along
UPDATE Table1
SET reg_exID = (SELECT reg_exID FROM Table2 WHERE REGEXP_LIKE(Table1.msg, regex));
Please keep in mind:
None of your current sample records will be updated as REGEX are case sensitive.
The above UPDATE will fail, if more than a single REGEX does match.
You could rewrite the current REGEX expressions along "^m ?28".
See it in action: SQL Fiddle (With some data added to actually show the effect.)
Please comment if and as clarification/adjustment is required.