I have the following data. How do I find the 11th occurrence of ':'? I want to display everything after the 11th occurrence of ':'.
https://www.example.com/rest/1/07/myself/urn:ads:accod:org:pki:71E4/Riken/List:abc:bcbc:hfhhf:ncnnc:shiv:hgh:bvbv:hghg:
I have tried the [^] construct, but it's not working.
select regexp_substr(id,'[:]{5}?.*') from tempnew;
regexp_substr does not care about capture-groups, so counting characters not included in the match is not possible. Counting from the end would work though:
-- Returns the substring after the 6th ':' from the end.
select regexp_substr(id, '([^:]*:){5}[^:]*$') from tempnew
-- If the string does not contain 5 ':', an empty string is returned.
If you need to count from the start, you could use regexp_replace instead:
-- Returns the substring after the 11th ':'
select regexp_replace(id, '^([^:]*:){11}') from tempnew
-- If the string does not contain 11 ':', the whole string is returned.
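The two patterns above can be checked outside the database. The following is a minimal Python sketch, not the SQL itself: it assumes Python's re module behaves like the database regex engine for these simple patterns, and the function names are hypothetical.

```python
import re

# Sample value from the question.
url = ("https://www.example.com/rest/1/07/myself/urn:ads:accod:org:pki:71E4"
       "/Riken/List:abc:bcbc:hfhhf:ncnnc:shiv:hgh:bvbv:hghg:")

def after_nth_colon(text, n=11):
    """Counting from the start: strip n segments of 'non-colons then a colon'."""
    # If the text has fewer than n colons, nothing matches and the
    # whole string comes back unchanged.
    return re.sub(r'^(?:[^:]*:){%d}' % n, '', text)

def last_colon_segments(text, n=5):
    """Counting from the end: match the tail that contains exactly n colons."""
    m = re.search(r'(?:[^:]*:){%d}[^:]*$' % n, text)
    return m.group(0) if m else ''

print(after_nth_colon(url))      # shiv:hgh:bvbv:hghg:
print(last_colon_segments(url))  # ncnnc:shiv:hgh:bvbv:hghg:
```

The sample has 15 colons, so the tail holding the last 5 colons starts right after the 10th one.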
See this demo: https://regex101.com/r/wR9aU3/1
/^(?:[^:]*\:){11}(.*)$/
or, if the string contains exactly 11 ':' (the greedy .+ can otherwise run past later colons):
/^(?:.+\:){11}(.+)$/gm
https://regex101.com/r/oC5yQ6/1
I would split on ":" and use the 12th element (the one that follows the 11th ':').
But if you must use a regex:
^(?:[^:]*:){11}([^:]*)
And use group 1 of the match.
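You can use split_part for this purpose. Note that it returns only the field between the 11th and 12th ':', not everything after the 11th ':':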
select split_part(id, ':', 12) from tempnew
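The difference between "the 12th field" and "everything after the 11th ':'" is easy to miss, so here is a small Python sketch of both. It only illustrates the splitting logic; the function names are hypothetical and nothing here is the database's split_part.

```python
# Sample value from the question.
url = ("https://www.example.com/rest/1/07/myself/urn:ads:accod:org:pki:71E4"
       "/Riken/List:abc:bcbc:hfhhf:ncnnc:shiv:hgh:bvbv:hghg:")

def field_after(text, n=11, sep=':'):
    """Return only the field that follows the n-th separator."""
    parts = text.split(sep)
    return parts[n] if len(parts) > n else ''

def rest_after(text, n=11, sep=':'):
    """Return everything that follows the n-th separator."""
    # Limiting the split to n cuts leaves the remainder intact in parts[n].
    parts = text.split(sep, n)
    return parts[n] if len(parts) > n else ''

print(field_after(url))  # shiv
print(rest_after(url))   # shiv:hgh:bvbv:hghg:
```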
Related
Need to match everything after the first / and until the 2nd / or end of string. Given the following examples:
/US
/CA
/DE/Special1
/FR/Special 1/special2
Need the following returned:
US
CA
DE
FR
Was using this in DataStudio which worked:
^(.+?)/
However, the same pattern in BigQuery just returns null. After trying dozens of other examples here, I decided to ask myself. Thanks for your help.
For such a simple extraction, consider using cheaper string functions instead of the more expensive regexp functions. See the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT '/US' line UNION ALL
SELECT '/CA' UNION ALL
SELECT '/DE/Special1' UNION ALL
SELECT '/FR/Special 1/special2'
)
SELECT line, SPLIT(line, '/')[SAFE_OFFSET(1)] value
FROM `project.dataset.table`
with result:
Row  line                    value
1    /US                     US
2    /CA                     CA
3    /DE/Special1            DE
4    /FR/Special 1/special2  FR
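Your regex matches one or more characters, as few as possible, from the start of the string and puts them in Group 1, then consumes a / character. Because your strings start with a slash, .+? has to consume that leading slash, so the pattern needs a second slash to match at all and returns null for values like /US. It does not actually match what you need.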
You can use a regex in BigQuery that matches a string partially and capture the part you need to get as a result:
/([^/]+)
It matches the first occurrence of a slash followed by one or more characters other than a slash, and returns the captured substring as the result.
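To see both patterns side by side, here is a rough Python sketch. It assumes Python's re module treats these two simple patterns the same way as BigQuery's regex engine; the function names are hypothetical.

```python
import re

lines = ['/US', '/CA', '/DE/Special1', '/FR/Special 1/special2']

def first_segment(line):
    """Suggested pattern: a slash, then capture the run of non-slash characters."""
    m = re.search(r'/([^/]+)', line)
    return m.group(1) if m else None

def original_pattern(line):
    """Pattern from the question, kept only to show why it fails."""
    m = re.search(r'^(.+?)/', line)
    return m.group(1) if m else None

print([first_segment(l) for l in lines])     # ['US', 'CA', 'DE', 'FR']
print([original_pattern(l) for l in lines])  # [None, None, '/DE', '/FR']
```

The second list shows the original pattern finding nothing when there is no second slash, and keeping the leading slash when there is one.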
I have data in a column like XXX/XXXX/XXXX/XYYUX/YYY. I am trying to extract only the first two characters after the 3rd slash (/) in the column, which is 'XY' in this example. Can you please help?
Thanks!
Try this:
REGEXP_SUBSTR('XXX/XXXX/XXXX/XYYUX/YYY','^([^/]*/){3}\K..',1,1,'i')
'^' start of string
'([^/]*/){3}' looks for 0 or more non-slashes followed by a slash, 3 times
'\K' match reset operator drops the part of the string that has been matched up to this point
'..' grabs the next two characters in the string
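Not every regex engine supports \K. Python's standard re module, for one, does not, so the sketch below uses a capture group to get the same result. The function name is hypothetical, and this is an alternative technique, not the REGEXP_SUBSTR call above.

```python
import re

def two_chars_after_third_slash(value):
    """Skip three 'non-slashes then slash' segments, then capture two characters."""
    # re.match anchors at the start of the string, like '^' in the SQL pattern.
    m = re.match(r'(?:[^/]*/){3}(..)', value)
    return m.group(1) if m else None

print(two_chars_after_third_slash('XXX/XXXX/XXXX/XYYUX/YYY'))  # XY
```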
Try using STRTOK('/88/209/89/132]', ' /]', 3)
which returns the 3rd octet, '89'.
I would like to apply a regex operation to each string of an array.
For instance, take the characters of each string that come before a '-'. The results will be stored in another array.
('Hello-1','Hi-2','Hola-3')
will give
('Hello','Hi','Hola')
Is there a way to do it in R without a loop?
Thanks!
Based on the updated question, we can match the character '-' followed by one or more characters until the end of the string and replace with ''.
sub('-.*$', '', test)
I want to write a regex that will return characters in a string not equal to d, M or y.
For example:
in dd.MM.yyyy, I should get a ' . '
in dd/MM/yyyy, I should get a ' / '
Is this possible?
If you are parsing an input date, find the first non-numeric character:
[0-9]+([^0-9]).*
If you are looking for the separator in a mask/template such as dd.MM.yyyy, capture the first character that is not in the set d, M, y:
[dMy]+([^dMy]).*
Assuming you will always get a string in that format and casing, you could use dd(.)MM(.)yyyy. This matches both strings above and puts each separating character in a group, which you can access afterwards.
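As a quick illustration of the "not d, M or y" idea, here is a minimal Python sketch. The question does not say which language is in use, so Python and the function name are assumptions.

```python
import re

def separators(mask):
    """Return every character of the mask that is not d, M or y."""
    return re.findall(r'[^dMy]', mask)

print(separators('dd.MM.yyyy'))  # ['.', '.']
print(separators('dd/MM/yyyy'))  # ['/', '/']
```

Taking the first element of the returned list gives the single separator character asked for.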
In the context of a postgres query, this -
lower(regexp_replace('If...', '[^\w\s]', ''))
gives me this -
'if..' (quotes mine)
As you can see, only one of the three periods gets trimmed. Can someone tell me what I must add to my regexp to get rid of the other two or any other special characters that might be trailing in this way?
You are probably looking for the fourth, optional parameter of regexp_replace():
SELECT regexp_replace('If...', '[^\w\s]', '', 'g');
g .. for "globally", i.e. replace every match in the string, not just the first.
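The first-match versus every-match behaviour can be mimicked in Python, where the default is the other way round: re.sub replaces every match unless count is given. This is only a sketch of the behaviour, not Postgres itself, and the function names are hypothetical.

```python
import re

def strip_first(text):
    """Replace only the first match, like regexp_replace without the 'g' flag."""
    return re.sub(r'[^\w\s]', '', text, count=1).lower()

def strip_all(text):
    """Replace every match, like regexp_replace with the 'g' flag."""
    return re.sub(r'[^\w\s]', '', text).lower()

print(strip_first('If...'))  # if..
print(strip_all('If...'))    # if
```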
SELECT regexp_replace('If, stay real....', '[.]{2,}$', '.', 'g');
{m,} a sequence of m or more matches of the atom.
Two or more dots at the end of the string are replaced with a single dot.
Further reference: https://www.postgresql.org/docs/current/functions-matching.html