PostgreSQL regular expression with semicolon - regex

I want to use a regular expression to split string values from a field.
Here is an example to illustrate my question:
mydatabase=> SELECT regexp_replace('a1=1,2;B2b=2,3,4;C3c={3,4,5;4,5,6};D4d={4,5,6;7,8,9}',
'([^0-9]|^)([=.*])(?=;|$)', '\1 \2', 'g');
regexp_replace
------------------------------------------------------
a1=1,2;B2b=2,3,4;C3c={3,4,5;4,5,6};D4d={4,5,6;7,8,9}
(1 row)
But I want the result like below
mydatabase=>YOUR_ANSWER_QUERY
regexp_replace
------------------
a1=1
B2b=2,3,4
C3c={3,4,5;4,5,6}
D4d={4,5,6;7,8,9}
(4 rows)

You have semicolons inside your braces. To avoid splitting on those, I added a negative look-ahead, (?![0-9]), so that a semicolon followed by a digit is not treated as a delimiter. regexp_split_to_table then separates the pieces into rows.
This should do it:
SELECT regexp_split_to_table( 'a1=1,2;B2b=2,3,4;C3c={3,4,5;4,5,6};D4d={4,5,6;7,8,9}', ',?[1-2]?(;(?![0-9]))');
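The core idea, splitting on a semicolon that is not followed by a digit, can be tried outside the database. This is only a minimal sketch, assuming Python's re module as a stand-in for PostgreSQL's regex engine. It also drops the ,?[1-2]? prefix from the query above, so a1=1,2 is kept whole rather than trimmed to a1=1.

```python
import re

s = "a1=1,2;B2b=2,3,4;C3c={3,4,5;4,5,6};D4d={4,5,6;7,8,9}"

# Split on ';' only when the next character is not a digit,
# so the semicolons inside the braces are left alone.
parts = re.split(r";(?![0-9])", s)

for part in parts:
    print(part)
```

This relies on the data never having a digit directly after a top-level semicolon; a key such as 1a=... would break it.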

I used the online regex replace tester at http://regexr.com.
The regex to search for is ([^=]+)=(\{[^\}]+\}|[^;]+)(?:;|$)
and the replacement is $1=$2\r.
For your input string this gives the required result.
Note that this tester requires a $ sign followed by a number to refer to a capturing group.
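The same search pattern can be checked without the online tester. A short sketch, assuming Python's re module (which writes group references as \1 rather than $1) and using findall instead of a replace:

```python
import re

s = "a1=1,2;B2b=2,3,4;C3c={3,4,5;4,5,6};D4d={4,5,6;7,8,9}"

# Group 1: the key (everything up to '=').
# Group 2: either a whole {...} block or everything up to the next ';'.
pattern = re.compile(r"([^=]+)=(\{[^\}]+\}|[^;]+)(?:;|$)")

rows = [f"{key}={value}" for key, value in pattern.findall(s)]
for row in rows:
    print(row)
```

Because the {...} alternative is tried first, semicolons inside braces never end a value.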

Related

How to highlight SQL keywords using a regular expression?

I would like to highlight SQL keywords that occur within a string in a syntax highlighter. Here are the rules I would like to have:
Match the keywords SELECT and FROM (others will be added, but we'll start here). Must be all-caps
Must be contained in a string -- either starting with ' or "
The first word in that string (ignoring whitespace preceding it) should be one of the keywords.
This of course is not comprehensive (can ignore escapes within a string), but I'd like to start here.
Here are a few examples:
SELECT * FROM main -- will not match (not in a string)
"SELECT name FROM main" -- will match
"
SELECT name FROM main" -- will match
"""Here is a SQL statement:
SELECT * FROM main""" -- no, string does not start with a keyword (SELECT...).
The only way I thought to do it in a single regex would be with a lookbehind... but then it would not be fixed width, as we don't know how far back the string starts. Something like:
(?<=["']\s*(SELECT)\s*)(SELECT|FROM)
But this of course won't work.
Would something like this be possible to do in a single regex?
A suitable regular expression is likely to get pretty complex, especially as the rules evolve further. As others have noted, it may be worth considering using a parser instead. That said, here is one possible regex attempting to cover the rules mentioned so far:
(["'])\s*(SELECT)(?:\s+.*)?\s+(FROM)(?:\s+.*)?\1(?:[^\w]|$)
Explanation
The regex looks for either a double or single quote at the start (saved in capturing group #1) and then matches the same character at the end via the backreference \1. The SELECT and FROM keywords are captured in groups #2 and #3. (The (?:...) syntax marks a group as non-capturing, so the optional parts do not add further groups.) There are some further optional details, such as limiting what is allowed between SELECT and FROM, and not counting the final quotation mark if it is immediately followed by a word character.
Results
SELECT * FROM tbl -- no match - not in a string
"SELECT * FROM tbl" -- matches - in a double-quoted string
'SELECT * FROM tbl;' -- matches - in a single-quoted string
'SELECT * FROM it's -- no match - letter after end quote
"SELECT * FROM tbl' -- no match - quotation marks don't match
'SELECT * FROM tbl" -- no match - quotation marks don't match
"select * from tbl" -- no match - keywords not upper case
'Select * From tbl' -- no match - still not all upper case
"SELECT col1 FROM" -- matches - even though no table name
' SELECT col1 FROM ' -- matches - as above with more whitespace
'SELECT col1, col2 FROM' -- matches - with multiple columns
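The results above can be reproduced with a short script. This is a sketch assuming Python's re module; the cases list is simply the inputs from the table, without the trailing comments.

```python
import re

PATTERN = re.compile(
    r"""(["'])\s*(SELECT)(?:\s+.*)?\s+(FROM)(?:\s+.*)?\1(?:[^\w]|$)"""
)

# (input, expected to match?) taken from the results list above.
cases = [
    ("SELECT * FROM tbl", False),
    ('"SELECT * FROM tbl"', True),
    ("'SELECT * FROM tbl;'", True),
    ("'SELECT * FROM it's", False),
    ("\"SELECT * FROM tbl'", False),
    ("'SELECT * FROM tbl\"", False),
    ('"select * from tbl"', False),
    ("'Select * From tbl'", False),
    ('"SELECT col1 FROM"', True),
    ("' SELECT col1 FROM '", True),
    ("'SELECT col1, col2 FROM'", True),
]

results = [(text, PATTERN.search(text) is not None) for text, _ in cases]
for text, matched in results:
    print(f"{text!r:30} {'matches' if matched else 'no match'}")
```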
Possible Improvement?
It might also be necessary to exclude quotation marks from the "any character" parts. This can be done, at the expense of increased complexity, with a tempered greedy token: replace both instances of .* with (?:(?!\1).)*:
(["'])\s*(SELECT)(?:\s+(?:(?!\1).)*)?\s+(FROM)(?:\s+(?:(?!\1).)*)?\1(?:[^\w]|$)
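To see what the change buys, compare the two patterns on a line that contains two strings. This is a sketch assuming Python's re module; the sample line is made up for illustration.

```python
import re

greedy = re.compile(
    r"""(["'])\s*(SELECT)(?:\s+.*)?\s+(FROM)(?:\s+.*)?\1(?:[^\w]|$)"""
)
tempered = re.compile(
    r"""(["'])\s*(SELECT)(?:\s+(?:(?!\1).)*)?\s+(FROM)(?:\s+(?:(?!\1).)*)?\1(?:[^\w]|$)"""
)

# Hypothetical input: two double-quoted strings on one line.
line = '"SELECT a FROM b" + "x"'

greedy_match = greedy.search(line).group(0)
tempered_match = tempered.search(line).group(0)

print(repr(greedy_match))    # runs through to the last quote on the line
print(repr(tempered_match))  # stops at the quote that closes the first string
```

The tempered version cannot step over the opening quote character, so the match ends with the first string rather than the last quote on the line.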
You could use capturing groups:
(.*["']\s*\K)(?(1)(SELECT|FROM).*(SELECT|FROM)|)
In this case $2 would refer to the first keyword and $3 would refer to the second keyword. This also only works if there are only two keywords and only one string on a line, which seems to be true in all of your examples, but if those restrictions don't work for you, let me know.
Just tested the regexp below:
If you need to add other commands, things may get a little tricky, because some keywords don't apply. E.g.: ALTER TABLE mytable or UPDATE SET col = val;. For these scenarios you will need to create subgroups, and the regexp may become slow.
Best regards!
If I understand your requirements correctly, I suggest this:
/^'\s*(SELECT)[^']*(FROM)[^']*'|^"\s*(SELECT)[^"]*(FROM)[^"]*"/m
Explanation:
When you need to anchor at the start of a string, use ^.
When you need to accept 0-n whitespace characters, use \s*.
When you need to handle new-line or multi-line input, use the m flag on your regex.
When you need case-sensitive matching, don't use the i flag on your regex.
When you need to match a block delimited by a specific character like ", use [^"]* instead of .*, so the match stops at the first closing delimiter.
When a block can start and end with either ' or ", use the alternation '...'|"..." instead of ['"]...['"], so the closing quote has to be the same as the opening one.
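The rules above can be tried against the examples from the question. This is a sketch assuming Python's re module, where the m flag is spelled re.MULTILINE; the keywords helper is a hypothetical name introduced here.

```python
import re

# Same pattern as above, without the /.../m delimiters.
PATTERN = re.compile(
    r'''^'\s*(SELECT)[^']*(FROM)[^']*'|^"\s*(SELECT)[^"]*(FROM)[^"]*"''',
    re.MULTILINE,
)

def keywords(text):
    """Return the captured keywords, or an empty list if nothing matches."""
    m = PATTERN.search(text)
    # Only one side of the alternation participates, so drop the None groups.
    return [g for g in m.groups() if g] if m else []

samples = [
    'SELECT * FROM main',
    '"SELECT name FROM main"',
    '"\nSELECT name FROM main"',
    '"""Here is a SQL statement:\nSELECT * FROM main"""',
    "'SELECT * FROM tbl\"",
]
found = [keywords(s) for s in samples]
for s, k in zip(samples, found):
    print(repr(s), k)
```

Because the SELECT and FROM groups are numbered separately for each quote style, the caller has to pick whichever pair actually matched.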
Update:
If you need to capture further keywords after verifying that SELECT is the first word in your string, the solution can be updated to this:
/^'\s*(SELECT)([^']*(SELECT|FROM))+|^"\s*(SELECT)([^"]*(SELECT|FROM))+/m
Without parsing quoted strings, this could be done using the \G and \K constructs:
(?:"\s*(?=(?:SELECT|FROM))|(?<!^)\G)[^"]*?\K(SELECT|FROM)

Regular Expression to return number without comma

I have to extract a number formatted xx,xxx.xx in a different format - xxxxx.xx by applying a regular expression. In other words, I have to remove the comma from the number in the final capture group.
I am not quite sure if it's possible to achieve only with the regular expression and without writing specific code to split and join at these values.
This is the part of input string:
AMT : EGP 3,000.00
My current regex is AMT\s*:\s*EGP\s*(\d*,\d*.\d*), which basically retrieves 3,000.00.
I'm expecting to have 3000.00 in final capture group.
EDIT:
Since the OP doesn't want to capture and replace, the following can be done:
AMT\s*:\s*EGP\s*(\d*),(\d*\.\d*)
The expected data is now part of the two capturing groups, and can be accessed by concatenating them: \1\2.
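Concatenating the two groups might look like this in code. This is a sketch assuming Python's re module and the sample input from the question, with the decimal point escaped.

```python
import re

text = "AMT : EGP 3,000.00"

# Group 1: digits before the comma. Group 2: digits after it, with decimals.
m = re.search(r"AMT\s*:\s*EGP\s*(\d*),(\d*\.\d*)", text)

amount = m.group(1) + m.group(2) if m else None
print(amount)
```

This only handles a single comma; an amount like 1,234,567.89 would need a different pattern.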
You can capture everything other than the , in two groups, and then replace:
Capture with:
(AMT\s*:\s*EGP\s*\d*),(\d*\.\d*)
Replace with: \1\2
Try this:
AMT\s*:\s*EGP\s*\K\d+(,\d{3})*(\.\d+)?
After finding the match, remove the commas in code, with something like: Mystring.Replace(",", "")
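A sketch of the match-then-strip approach, assuming Python. Python's re module has no \K, so a capturing group stands in for it here. The extract_amount helper is a hypothetical name.

```python
import re

# \K is not supported by Python's re module, so the number is captured
# in group 1 instead of resetting the match start.
AMOUNT = re.compile(r"AMT\s*:\s*EGP\s*(\d+(?:,\d{3})*(?:\.\d+)?)")

def extract_amount(text):
    """Return the amount without thousands separators, or None if absent."""
    m = AMOUNT.search(text)
    return m.group(1).replace(",", "") if m else None

single = extract_amount("AMT : EGP 3,000.00")
multiple = extract_amount("AMT: EGP 1,234,567.89")
print(single, multiple)
```

Unlike the two-group approach, this handles any number of thousands separators.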

regex to select only first instance of string (no duplicates)

I am using this regex
(rs)\w+/
to select strings that begin with the string 'rs', i.e.
..the biomarker rs4343 but not rs4342. However rs4343 ..
this returns: rs4343, rs4342, rs4343
Is it possible to use regex to select only the first instance of a matched string to avoid duplication, i.e. to return: rs4343, rs4342
I can use JS or PHP regex.
Try this:
(rs\w+)(?!.*\1)
Details:
(rs\w+) - Group the required match
(?!.*\1) - Use a negative lookahead to assert that the same match does not occur again later in the text
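One thing to be aware of: the lookahead rejects every occurrence that has a later twin, so each identifier is reported at its last position, not its first. The sketch below, assuming Python's re module, shows that and compares it with a plain de-duplication in code that keeps first-seen order.

```python
import re

text = "..the biomarker rs4343 but not rs4342. However rs4343 .."

# Regex-only: a match survives only if the same token does not occur later.
last_only = re.findall(r"(rs\w+)(?!.*\1)", text)

# Alternative: match everything, then de-duplicate in first-seen order.
first_seen = list(dict.fromkeys(re.findall(r"rs\w+", text)))

print(last_only)
print(first_seen)
```

Both give a list without duplicates; only the order differs.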

REGEXP_LIKE in Oracle

I have a query which I was using in an Access database to match a field. The rows I wish to retrieve have a field which contains a sequence of characters in two possible forms (case-insensitive):
*PO12345, 5 digits preceded by *PO, or
PO12345, 5 digits preceded by PO.
In Access I achieved this with:
WHERE MyField LIKE '*PO#####*'
I have tried to replicate the query for use in an Oracle database:
WHERE REGEXP_LIKE(MyField, '/\*+PO[\d]{5}/i')
However, it doesn't return anything. I have tinkered with the Regex slightly, such as placing brackets around PO, but to no avail. To my knowledge what I have written is correct.
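Your regex \*+PO[\d]{5} is wrong. There shouldn't be a + after \*: the + requires at least one asterisk, but the asterisk is optional.
Using ? instead, as in \*?PO\d{5}, solves that problem.
Oracle also does not use the /.../i delimiter syntax, so the slashes are treated as literal characters. Pass i (case-insensitive) as the match parameter instead: REGEXP_LIKE (MyField, '^\*?PO\d{5}$', 'i');
Note that the ^ and $ anchors require the whole field to match; drop them if the code can appear anywhere in the field, as with the Access pattern.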
Read REGEXP_LIKE documentation.
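The pattern itself can be sanity-checked outside Oracle. This is only a sketch, assuming Python's re module as a stand-in. re.IGNORECASE plays the role of the 'i' match parameter, and search gives the unanchored "contains" behaviour. The sample values are made up.

```python
import re

# Optional literal '*', then PO, then five digits; case-insensitive.
pattern = re.compile(r"\*?PO\d{5}", re.IGNORECASE)

# Hypothetical field values and whether they should be retrieved.
samples = {
    "ref *PO12345 received": True,
    "po54321": True,
    "PO1234": False,   # only four digits
    "P012345": False,  # zero instead of the letter O
}

outcome = {value: pattern.search(value) is not None for value in samples}
for value, matched in outcome.items():
    print(f"{value!r:28} {matched}")
```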

Regular Expression to extract a string based on delimiter

I am trying to extract a substring from a string based on delimiter '.'(period). Can someone share your thoughts on how to do it using regexp_extract please. Thanks.
Input:
15.075
0.035
Output:
075
035
From this answer, it appears that you can use parentheses to capture part of the match, as you would in most regex systems. That is, match the whole ".[0-9]+", but only capture the numeric portion, by surrounding it with parentheses, like this:
select regexp_extract(input, r'\.([0-9]+)');
This says to match a period followed by one or more numbers, and to extract the numeric portion only. I think that the leading r marks that string as a regular expression, but I can't find documentation on it.
Reference: https://cloud.google.com/bigquery/query-reference?hl=en#regularexpressionfunctions
It seems that you will want to use REGEXP_EXTRACT
REGEXP_EXTRACT(number, r'\.(\d+)')
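The capture-group behaviour can be checked locally before running the query. This is a sketch assuming Python's re module as a stand-in for the REGEXP_EXTRACT function; the fraction_digits helper is a hypothetical name.

```python
import re

def fraction_digits(value):
    """Return the digits after the first period, or None if there are none."""
    m = re.search(r"\.(\d+)", value)
    return m.group(1) if m else None

inputs = ["15.075", "0.035"]
outputs = [fraction_digits(v) for v in inputs]
print(outputs)
```

The result is a string, so the leading zero in 075 is preserved.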