PostgreSQL regular expression - regex

I have a string like 'ABC3245-bG1353BcvG34'. I need to remove the hyphen and all the letters after it,
so the above string should become ABC3245135334
I tried the below code:
select substring('ABC3245-bG1353BcvG34' from 1 for (position('-' in 'ABC3245-bG1353BcvG34')-1))||regexp_replace(substring('ABC3245-bG1353BcvG34' from (position('-' in 'ABC3245-bG1353BcvG34') +1) for length('ABC3245-bG1353BcvG34')),'[a-zA-Z]','')
but I am not able to remove the letters after the hyphen.

I need to remove the hyphen including all the letters after hyphen.
so the above string (ABC3245-bG1353BcvG34) should be ABC3245135334
This suggests that all numbers should remain after the hyphen (in their original order). If that's what you want, you cannot do this with a single regex. Assuming you can have only 1 hyphen in your input:
SELECT substring(input.value from '^[^-]*') ||
regexp_replace(substring(input.value from '[^-]*$'), '\D+', '', 'g')
FROM (SELECT 'ABC3245-bG1353BcvG34'::text AS value) AS input
If you can have multiple hyphens in your input, please define how to handle characters between hyphens.
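The same idea (keep everything before the hyphen, then strip the non-digits from the rest) can also be sketched outside the database. The Python below is only an illustration under the single-hyphen assumption; the function name strip_letters_after_hyphen is made up for this example.

```python
import re


def strip_letters_after_hyphen(value: str) -> str:
    # Split at the first hyphen; `tail` is empty when there is no hyphen.
    head, _sep, tail = value.partition("-")
    # Keep the head untouched and drop every non-digit from the tail.
    return head + re.sub(r"\D", "", tail)


print(strip_letters_after_hyphen("ABC3245-bG1353BcvG34"))  # ABC3245135334
```

A string without a hyphen comes back unchanged, because the tail is empty.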

Fixed version
SELECT a[1] || regexp_replace(a[2], '\D', '', 'g')
FROM string_to_array('ABC3245-bG1353BcvG34', '-') AS a
Or, more convenient to deal with a set (like a table):
SELECT split_part(str, '-', 1)
|| regexp_replace(split_part(str, '-', 2), '\D', '', 'g')
FROM (SELECT 'ABC3245-bG1353BcvG34'::text AS str) tbl
Removes all non-digits after the hyphen. (Assuming there is only one hyphen.) Result:
ABC3245135334
First version
Missed that OP wants to remove all letters after -.
SELECT regexp_replace('ABC3245-bG1353BcvG34', '-\D*', '')
Result:
ABC32451353BcvG34
Regex explained:
- .. literal hyphen -
\D .. class shorthand for "non-digits".
* .. 0 or more times
Removes the first hyphen and everything that follows until the first digit.
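To see why the first version falls short, here is a rough Python equivalent of that single replacement. It is a sketch, not the PostgreSQL call itself; count=1 is used to mimic regexp_replace without the 'g' flag, which replaces only the first match.

```python
import re

s = "ABC3245-bG1353BcvG34"

# '-\D*' matches the hyphen plus the run of non-digits right after it ("-bG").
# count=1 replaces only the first match, like regexp_replace without 'g'.
first_version = re.sub(r"-\D*", "", s, count=1)

print(first_version)  # ABC32451353BcvG34
```

The letters after the first digit group ("BcvG") survive, which is why the fixed version strips non-digits from the whole part after the hyphen.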

A RegEx that would work:
[a-zA-Z0-9]+(?=-)
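Do note that this requires the string to actually contain a hyphen. It uses a lookahead to grab the run of alphanumeric characters immediately followed by a hyphen, so it only returns the part before the hyphen; the digits after the hyphen still have to be extracted separately.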

Related

Find LETTERS-NUMBER pairs in postgres using regex

I need to replace TEXT1-NUMBER with TEXT2-NUMBER.
Example "These are TEXT1-123 and TEXT1-456 examples" should be replaced with "These are TEXT2-123 and TEXT2-456 examples".
I can replace most of the cases using
Regexp_Replace(column_name, '(\mTEXT1)(-[0-9]+\M)', 'TEXT2\2', 'g')
But it also replaces some cases that I want to exclude, such as
TEXT1-NUMBER-NUMBER
TEXT3-NUMBER-TEXT1-NUMBER
How can I make it match only exact pairs of TEXT-NUMBER?
Thanks.
You can use
SELECT REGEXP_REPLACE(column_name,
'(\s|^)TEXT1(-[0-9]+)(?!\S)',
'\1TEXT2\2', 'g') AS Result;
Beginning with PostgreSQL 10, lookbehinds are supported, and you can also use REGEXP_REPLACE(column_name, '(?<!\S)TEXT1(-[0-9]+)(?!\S)', 'TEXT2\1', 'g') then.
Regex details:
(\s|^) - Group 1 (\1 refers to this value): a whitespace or start of string
TEXT1 - a static string
(-[0-9]+) - Group 2 (\2 refers to this value): - and one or more digits
(?!\S) - a negative lookahead that fails the match if there is a non-whitespace char immediately to the right of the current location.
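For experimenting with the whitespace-boundary idea outside the database, here is a small Python sketch of the lookaround variant. The function name rename_pairs is hypothetical, and Python's re module stands in for the PostgreSQL regex engine, so treat it as an approximation.

```python
import re

# TEXT1-<digits> must not touch a non-whitespace character on either side.
PAIR = re.compile(r"(?<!\S)TEXT1(-[0-9]+)(?!\S)")


def rename_pairs(text: str) -> str:
    # \1 re-inserts the captured "-<digits>" part after the new prefix.
    return PAIR.sub(r"TEXT2\1", text)


print(rename_pairs("These are TEXT1-123 and TEXT1-456 examples"))
print(rename_pairs("TEXT1-123-456"))        # left unchanged
print(rename_pairs("TEXT3-1-TEXT1-2"))      # left unchanged
```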

How can I allow hyphens in this RegEx

I know a bit of RegEx but this one's a bit too complicated for me.
All I need to change is for it to allow for a single hyphen too.
replace(/[^\p{L}\s]+/gu, '')
You may use
.replace(/^([^-]*-)|-/g, '$1').replace(/[^\p{L}\s-]+/gu, '')
It keeps the first - in the input string, along with any Unicode letters (\p{L}) and whitespace (\s). The first call, .replace(/^([^-]*-)|-/g, '$1'), matches and captures everything from the start of the string up to and including the first - (with ^([^-]*-)), and also matches every other - in the string. Each match is replaced with the value of Group 1, which is empty when the matched - is not the first hyphen, so only the first hyphen survives. The second call, .replace(/[^\p{L}\s-]+/gu, ''), then removes any run of characters other than letters, whitespace and hyphens.
See the ECMAScript 2018+ JS demo below:
console.log( "12-3-**(Виктор Викторович)**...".replace(/^([^-]*-)|-/g, '$1').replace(/[^\p{L}\s-]+/gu, '') )
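For comparison, here is a rough Python port of the same two-step approach. Python's built-in re module has no \p{L}, so this sketch assumes str.isalpha() as a stand-in for "Unicode letter"; the function name is hypothetical.

```python
import re


def keep_letters_spaces_first_hyphen(text: str) -> str:
    # Step 1: keep the prefix up to and including the first hyphen (group 1)
    # and replace every later hyphen with an empty string.
    text = re.sub(r"^([^-]*-)|-", lambda m: m.group(1) or "", text)
    # Step 2: drop everything that is not a letter, whitespace or hyphen.
    return "".join(
        ch for ch in text if ch.isalpha() or ch.isspace() or ch == "-"
    )


print(keep_letters_spaces_first_hyphen("12-3-**(Виктор Викторович)**..."))
```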

regex that allows 5-10 characters but can have spaces in-between not counting

Problem
Build a regex statement that allows the following:
minimum 5 characters
maximum 10 characters
can contain whitespace but whitespace does not increment character count
any non-whitespace characters increment character count
Test Cases:
expected_to_pass = ['testa', ' test a', 12342, 1.234, 'test a']
expected_to_fail = [' test', 'test ', ' test ', ' ', 1234, 0.1, ' ','12345678901']
Example regex statements and their purpose
Allow 5-10 non-whitespace characters:
[\S]{5,10}$
Allow 5-10 characters regardless of whitespace:
[\s\S]{5,10}$
I've been farting around with this for a few hours and cannot think of the best way to handle this.
How's this?
\s*(?:[\w\.]\s*){5,10}+$
Or:
\s*(?:[\w\.]\s*){5,10}$
Also, if ANY non-whitespace character goes:
\s*(?:\S\s*){5,10}$
There is a wrong assumption in your question: \w doesn't match all non-space-characters, it matches word characters - this means letters, digits and the underscore. Depending on language and flags set, this might include or exclude unicode letters and digits. There are a lot more non-space-characters, e.g. . and |. To match space-characters one usually uses \s, thus \S matches non-space-characters.
You can use ^\s*(?:\S\s*){5,10}$ to check your requirements. You might be able to drop the anchors, if you use some kind of full match functionality (e.g. Java .matches() or Python re.fullmatch).
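As a quick way to try this against the test cases from the question, here is a minimal Python sketch using re.fullmatch, which makes the anchors unnecessary. The helper name is_valid is an assumption for this example, and numeric inputs are converted with str() first.

```python
import re

# Optional leading whitespace, then 5 to 10 non-whitespace characters,
# each optionally followed by whitespace.
PATTERN = re.compile(r"\s*(?:\S\s*){5,10}")


def is_valid(value) -> bool:
    # The test cases include numbers, so convert to text first.
    return PATTERN.fullmatch(str(value)) is not None


for case in ["testa", " test a", 12342, 1.234, " test", 1234, 0.1, "12345678901"]:
    print(repr(case), is_valid(case))
```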
Depending on the language you use, you might not want to use a regex, but iterate over the string and check character for character. This should usually be faster than regex.
Pseudocode:
number of chars = 0
for first character of string to last character of string
    if character is not space
        inc number of chars by 1
return true if number of chars between 5 and 10
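The regex-free approach, counting only the non-whitespace characters, could look like this in Python. This is a sketch; the function name is made up, and str.isspace() is assumed to be an acceptable definition of "whitespace".

```python
def has_5_to_10_visible_chars(value) -> bool:
    # Count every character that is not whitespace.
    count = sum(1 for ch in str(value) if not ch.isspace())
    return 5 <= count <= 10


print(has_5_to_10_visible_chars(" test a"))      # True
print(has_5_to_10_visible_chars("12345678901"))  # False
```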
Check this out:
(\s*?\w\s*?){5,10}$
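It won't match 1.234 because . is not included in the \w set.
If you need it to be included, then:
(\s*?[\w\.]\s*?){5,10}$
Note that writing the class as [\w|\.] would also accept a literal |, because | has no special meaning inside a character class.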
Cheers
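One caveat with patterns that end in $ but have no ^ at the start: a search-style match can succeed on just the tail of a longer string. The short Python check below illustrates this with the last pattern; it is a sketch, and the variable names are made up for the example.

```python
import re

unanchored = re.compile(r"(\s*?[\w.]\s*?){5,10}$")
too_long = "12345678901"  # 11 characters, should be rejected

# search() may start anywhere, so the last 10 characters satisfy the pattern.
print(bool(unanchored.search(too_long)))     # True

# fullmatch() forces the pattern to cover the whole string.
print(bool(unanchored.fullmatch(too_long)))  # False
```

Adding ^ at the front of the pattern, or using a full-match function, avoids the false positive.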

Regex for {!Customobject_relateobject.name}

I don't know regex. Can you please help me write a regex for
{!Customobject_relateobject.name}
The string "Customobject_relateobject.name" may contain "_" and "." only in the middle of a word, never as the first or last character.
"{!" and "}" are mandatory.
Thanks in Advance.
You can use the following regex:
\{![a-zA-Z0-9_.]*}
The regex means:
\{! - matches {! literally
[a-zA-Z0-9_.]* - 0 or more (due to *) characters that are lower- or uppercase Latin letters, digits from 0 to 9, underscore or dot
} - literal }.
^\{![a-zA-Z0-9](?:[a-zA-Z0-9._]*[a-zA-Z0-9])?\}$ if empty strings like {!} are not allowed and the text inside the braces must start and end with a Latin letter or digit
I guess the word can't end with '.' or '_' or have any digit in it. So this regex will give you what you want:
\{!(([a-zA-Z]+(_|\.)?)+[a-zA-Z]+)\}
If you want digits have this regex:
\{!(([a-zA-Z0-9]+(_|\.)?)+[a-zA-Z0-9]+)\}
Don't use \w, because it also matches '_', which would allow two underscores in a row or an underscore at the start or end.
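To check candidates against the stated rules, here is a small Python sketch. It uses a slightly restructured pattern (alphanumeric runs joined by single "_" or "." separators), which is an assumption on my part rather than any of the exact patterns above; the names MERGE_FIELD and is_merge_field are hypothetical.

```python
import re

# "{!", then alphanumeric runs joined by single "_" or "." separators, then "}".
MERGE_FIELD = re.compile(r"\{![A-Za-z0-9]+(?:[_.][A-Za-z0-9]+)*\}")


def is_merge_field(text: str) -> bool:
    return MERGE_FIELD.fullmatch(text) is not None


for candidate in ["{!Customobject_relateobject.name}", "{!_name}", "{!name.}", "{!}"]:
    print(candidate, is_merge_field(candidate))
```

Because each separator must be followed by at least one letter or digit, a leading, trailing or doubled "_" or "." is rejected.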

Can someone explain this regex

I know this is a pretty basic regex, could someone explain what it is doing please?
^[^#]+#[-a-z0-9.]+$
^ - match start of string
[^#]+ - match one or more characters that aren't an #
# - match an #
[-a-z0-9.]+ - match one or more characters from the set '-', lower case 'a'-'z', the digits '0'-'9', '.'
$ - match end of string
So, match any string that consists of some characters that aren't '#', followed by '#', followed by some number of lower case letters / digits / dashes / full stops.
I think it's trying to match an email address (not very well)
Example matches:
abc#example.com
podcast#nospam.com
hello(world)#9
a[]&^&£^$^&£#.
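The example matches can be verified with a few lines of Python. This is a sketch; the helper name matches and the non-matching inputs are illustrative additions, not from the original answers.

```python
import re

PATTERN = re.compile(r"^[^#]+#[-a-z0-9.]+$")


def matches(text: str) -> bool:
    return PATTERN.search(text) is not None


# Strings from the examples above.
for text in ["abc#example.com", "podcast#nospam.com", "hello(world)#9", "a[]&^&£^$^&£#."]:
    print(text, matches(text))

# Illustrative non-matches: nothing before '#', uppercase after '#', no '#'.
for text in ["#example.com", "abc#Example.com", "abc"]:
    print(text, matches(text))
```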
It says "match one or more non-# characters followed by a #, followed by one or more lowercase letters, digits, - or ." The ^ at the beginning and the $ at the end mean the pattern must span the entire string (^ means "beginning of string" and $ means "end of string").
Matches a string that starts with at least one character that is not #, followed by a #, then one or more of -, . or lowercase alphanumeric characters.
I'm guessing it's a very loose email validator.
To expand upon Rex's answer, it looks like a naive email validation regex.