Postgres Parse String with Delimiter - regex

Trying to parse a string using SQL, and have not found any solutions online (apologies, maybe I'm looking for the wrong thing).
I have a string field with a series of numbers I need to pull out and sum. Delimiter is "\r\n".
Example: '\r\n - 1234 somenumbersandtext123 \r\n -5678 sometextmorenumbers123'
So in this example, I want to sum 1234 and 5678.
The strings are all different lengths, and I need to eventually sum the numbers within each string. The string lists the documents tied to a project, and the numbers represent the size of each file (I'm trying to determine the total file size per project).
Thanks in advance for any guidance.

You may use
regexp_matches(col,'(?:^|\n)\s*-\s*(\d*\.?\d+)','g')
The part captured with (...) will be the output of the regexp_matches function.
Details
(?:^|\n) - start of string or newline
\s*-\s* - a hyphen enclosed with 0+ whitespaces
(\d*\.?\d+) - Capturing group 1 (what will be returned):
\d* - 0+ digits
\.? - 1 or 0 dots
\d+ - 1+ digits.
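
To check the pattern outside the database, here is a minimal Python sketch. It assumes the delimiter is a real CR/LF pair rather than the literal characters backslash-r backslash-n, and it uses Python's re module only as a stand-in for regexp_matches. The name sum_sizes is hypothetical.

```python
import re

# Same pattern as in the answer above; Python's re accepts it unchanged.
PATTERN = re.compile(r'(?:^|\n)\s*-\s*(\d*\.?\d+)')

def sum_sizes(text: str) -> float:
    """Sum every number that directly follows a leading hyphen."""
    return sum(float(m) for m in PATTERN.findall(text))

sample = '\r\n - 1234 somenumbersandtext123 \r\n -5678 sometextmorenumbers123'
print(PATTERN.findall(sample))  # ['1234', '5678']
print(sum_sizes(sample))        # 6912.0
```

In Postgres itself, regexp_matches returns one text[] row per match, so the captured value would still need to be cast to a numeric type before summing.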

This seems to work:
SELECT REGEXP_MATCHES(string::text, '\Br\Bn ?- ?([0-9]+)', 'g')
FROM test_table

Related

Regex table of contents

I have table of contents items that I need to match with a regex. The data is not totally uniform and I can't get it to work in all cases.
Data is following:
1. Header 1
1.2. SubHeader2
1.2.1 Subheader
1.2.2. Another header
1.2.2.1 Test
1.2.2.2. Test2
So I need to get the number and the header in separate groups. The number should be without the trailing dot, if there is one. The issue I'm struggling with is that not all of the numbers have the trailing dot.
I have tried
^([0-9\.]+)[\.]\s+(.+)$ -- Doesn't work when there is no trailing dot
^([0-9\.]+)[\.]?\s+(.+)$ -- Group 1 contains the trailing dot if it is there
You can use
^(\d+(?:\.\d+)*)\.?\s+(.+)
See the regex demo. Details:
^ - start of string
(\d+(?:\.\d+)*) - Group 1: one or more digits and then zero or more repetitions of a . and one or more digits sequence
\.? - an optional .
\s+ - one or more whitespaces
(.+) - Group 2: any one or more chars other than line break chars, as many as possible.
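
A short Python sketch of the same pattern applied to the sample lines from the question. The helper name split_toc_line is hypothetical, and Python's re module is an assumption, since the question does not say which regex engine is in use.

```python
import re

# The pattern from the answer above, unchanged.
TOC_LINE = re.compile(r'^(\d+(?:\.\d+)*)\.?\s+(.+)')

def split_toc_line(line: str):
    """Return (number, title), or None if the line is not a TOC entry."""
    m = TOC_LINE.match(line)
    return m.groups() if m else None

lines = [
    "1. Header 1",
    "1.2. SubHeader2",
    "1.2.1 Subheader",
    "1.2.2. Another header",
    "1.2.2.1 Test",
    "1.2.2.2. Test2",
]
for line in lines:
    print(split_toc_line(line))
```

Because Group 1 can only end in a digit, the optional \.? that follows is what absorbs the trailing dot when it is present.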

Regex to identify for values other than alphanumeric values which can have hyphen or dot in between them but not at beginning or at end

I am new to regular expressions. I have seen other posts with quite similar questions, but since in regex even a dot matters a lot, I am posting this question to seek help with this particular scenario.
My SQL column value can have a-z, A-Z, and 0-9
It can have a dot(.) and hyphen(-) in between. These 2 things cannot be at the beginning or at the end.
It cannot have space or tabs or any blanks anywhere in the column value.
It cannot start or end with any special characters; not even dots or hyphens.
I wrote this query which covers the 1st, 2nd, and 3rd points but fails in the 4th case.
select * from test_db.xtmp_testtable_invalidchars042321_rg where (sl_id REGEXP '[^[:alnum:]].+$')
**Table column input values**
RaghavGupta
.RaghavGupta
#Raghav.Gupta
"Raghav Gupta"
Raghav Gupta
Raghav#Gupta
Raghav$Gupta
Raghav%Gupta
Raghav*Gupta
Raghav.Gupta
RaghavGupta
RaghavGupta$
RaghavGupta.
RaghavGupta[]
**Query Result**
RaghavGupta
.RaghavGupta
#Raghav.Gupta
"Raghav Gupta"
Raghav Gupta
Raghav#Gupta
Raghav$Gupta
Raghav%Gupta
Raghav*Gupta
Raghav.Gupta
"RaghavGupta "
RaghavGupta[]
You can use NOT with the matching regex:
select * from test_db.xtmp_testtable_invalidchars042321_rg where (sl_id NOT REGEXP '^[[:alnum:]]+([.-][[:alnum:]]+)*$')
The pattern matches
^ - start of string
[[:alnum:]]+ - one or more alphanumeric chars ([:alnum:] is a POSIX character class that matches letters and/or digits)
([.-][[:alnum:]]+)* - (a capturing group that matches) zero or more repetitions of
[.-] - a . or -
[[:alnum:]]+ - one or more alphanumeric chars
$ - end of string.
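
The same rule can be sketched in Python to try it on the sample values. Python's re module has no POSIX [[:alnum:]] class, so this sketch substitutes [A-Za-z0-9], which is an assumption that only ASCII letters and digits matter. The name is_valid is hypothetical.

```python
import re

# ASCII stand-in for ^[[:alnum:]]+([.-][[:alnum:]]+)*$
# fullmatch() supplies the ^ and $ anchoring.
VALID = re.compile(r'[A-Za-z0-9]+(?:[.-][A-Za-z0-9]+)*')

def is_valid(value: str) -> bool:
    """True when dots/hyphens appear only between alphanumeric runs."""
    return VALID.fullmatch(value) is not None

values = ["RaghavGupta", ".RaghavGupta", "#Raghav.Gupta", "Raghav Gupta",
          "Raghav#Gupta", "Raghav.Gupta", "RaghavGupta$", "RaghavGupta.",
          "RaghavGupta[]"]

# Mirrors the NOT REGEXP query: keep only the rows that break the rules.
invalid = [v for v in values if not is_valid(v)]
print(invalid)
```

Only RaghavGupta and Raghav.Gupta pass, so every other value in the list is reported as invalid.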

A RegEx that matches correct comma placement in a string

I want to check whether a string entered by the user has correct comma placement, and whether the number also looks valid to a human reader.
These are numbers that are allowed:
1,000
100
1
1,000,000,000,000,000
Here are numbers that are not allowed:
1e+5
1e5
1,00
-105
100.50
100,00,00,0,000000
I've tried to come up with my own RegEx but this is very complicated for even me to understand. This is my RegEx (^[0-9]{0,3}(,[0-9]*)?$) but it is very broken at the moment.
Is anyone able to help?
You may use
^\d{1,3}(?:,\d{3})*$
See the regex demo
Details
^ - start of string
\d{1,3} - 1, 2 or 3 digits
(?:,\d{3})* - zero or more consecutive occurrences of
, - a comma
\d{3} - 3 digits
$ - end of string.
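
A quick Python sketch that runs the pattern against the allowed and disallowed numbers from the question. The function name has_valid_commas is hypothetical, and fullmatch is used in place of the ^ and $ anchors.

```python
import re

# \d{1,3}(?:,\d{3})* from the answer above; fullmatch() anchors both ends.
GROUPED = re.compile(r'\d{1,3}(?:,\d{3})*')

def has_valid_commas(text: str) -> bool:
    """True for 1 to 3 leading digits followed by ',ddd' groups only."""
    return GROUPED.fullmatch(text) is not None

allowed = ["1,000", "100", "1", "1,000,000,000,000,000"]
rejected = ["1e+5", "1e5", "1,00", "-105", "100.50", "100,00,00,0,000000"]

print(all(has_valid_commas(s) for s in allowed))   # True
print(any(has_valid_commas(s) for s in rejected))  # False
```

Note that this pattern also rejects an ungrouped number such as 1000, because the first group is limited to three digits.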

Only allow 2 digits in a string using regex

I need regex that only allows a maximum of 2 digits (or whatever the desired limit is actually) to be entered into an input field.
The requirements for the field are as follows:
Allow a-z A-Z
Allow 0-9
Allow - and . characters
Allow spaces (\s)
Do not allow more than 2 digits
Do not allow any other special characters
I have managed to put together the following regex based on several answers on SO:
^(?:([a-zA-z\d\s\.\-])(?!([a-zA-Z]*\d.*){3}))*$
The above regex is really close. It works successfully for the following:
test 12 test
test12
test-test.12
But it allows an input of:
123 (but not 1234, so it's close).
It only needs to allow an input of 12 when only digits are entered into the field.
I would like some help in finding a more efficient and cleaner (if possible) solution than my current regex - but it must still be regex, no JS.
You could use a positive lookahead like
(?=^(?:\D*\d\D*){2}$) # only two digits
^[- .\w]+$ # allowed characters
See a demo on regex101.com.
You may use a negative lookahead anchored at the start that will make the match fail once there are 3 digits found anywhere in the string:
^(?!(?:[^0-9]*[0-9]){3})[a-zA-Z0-9\s.-]*$
^^^^^^^^^^^^^^^^^^^^^^^
See the regex demo
Details:
^ - start of string
(?!(?:[^0-9]*[0-9]){3}) - the negative lookahead that fails the match if three occurrences of the following sequence are found (that is, if the string contains at least 3 digits):
[^0-9]* - zero or more chars other than digits
[0-9] - a digit (thus, the digits do not have to be adjoining)
[a-zA-Z0-9\s.-]* - 0+ ASCII letters, digits, whitespace, . or - symbols
$ - end of string.
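
The negative-lookahead version can be checked with a small Python sketch. The function name within_digit_limit is hypothetical, and Python's re module is an assumption, since the question only says the solution must be a regex.

```python
import re

# The pattern from the answer above: fail as soon as a third digit exists,
# then allow only letters, digits, whitespace, dots and hyphens.
LIMITED = re.compile(r'^(?!(?:[^0-9]*[0-9]){3})[a-zA-Z0-9\s.-]*$')

def within_digit_limit(text: str) -> bool:
    """True when text has at most 2 digits and only allowed characters."""
    return LIMITED.match(text) is not None

for candidate in ["test 12 test", "test12", "test-test.12", "12", "123", "1a2b3", "test!"]:
    print(repr(candidate), within_digit_limit(candidate))
```

To change the limit, replace the {3} quantifier with the desired maximum plus one.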

Notepad++ split numeric values separated with semi-colons and commas keeping the lines having single pairs intact

I have thousands of record which look like this:
35;36,58
36;2
37;5,58,17
My goal output is this:
35;36
35;58
36;2
37;5
37;58
37;17
Is this even possible with some sort of regex?
You may use
^(\d+);(.+),(\d+)
And replace with $1;$2\n$1;$3. Click Replace All repeatedly until no replacements are made.
Details:
^ - start of a line
(\d+) - Group 1: 1+ digits
; - a literal ;
(.+) - Group 2: 1+ chars other than line break chars, as many as possible, up to the last comma that is followed by digits
,(\d+) - a comma and Group 3: one or more digits
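
The repeated Replace All can be simulated with a short Python sketch. It is an illustration rather than what Notepad++ runs internally. The function name split_pairs is hypothetical, and the input is assumed to use plain \n line endings.

```python
import re

# Same pattern as above; MULTILINE makes ^ match at the start of each line.
LAST_VALUE = re.compile(r'^(\d+);(.+),(\d+)', re.MULTILINE)

def split_pairs(text: str) -> str:
    """Peel the last comma-separated value off each line until none remain."""
    while True:
        # One pass is the equivalent of a single "Replace All" click.
        text, count = LAST_VALUE.subn(r'\1;\2\n\1;\3', text)
        if count == 0:
            return text

records = "35;36,58\n36;2\n37;5,58,17"
print(split_pairs(records))
```

Each pass splits off only the last value of a line, so a line with n values after the semicolon needs n - 1 passes. That is why Replace All has to be clicked repeatedly.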