Remove last 4 digits from string if pattern matches - regex

I'm trying to remove the last 4 digits from a string in Postgres if and only if they match a certain pattern: [0][1-9][0][1-9].
Example:
1031610101 -> 103161
1234 -> 1234
123456 -> 123456
123405 -> 123405
I've tried a few approaches using substring, but somehow can't get this to work.
The length of the string is variable.
So far I've tried:
substring(value from '([\d](3,6}[0][1-9][0][1-9])') as "Result"

Easier with regexp_replace():
SELECT regexp_replace(col, '0[1-9]0[1-9]$', '')
FROM tbl;
$ anchors the match to the end of the string, so only a trailing 0[1-9]0[1-9] is removed.
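To check the pattern outside the database, the same idea can be sketched in Python. This is only an illustration: it assumes Python's re module behaves like Postgres for these simple inputs, and the helper name strip_suffix is made up for the example.

```python
import re

# Same pattern as in the regexp_replace() call above.
PATTERN = re.compile(r"0[1-9]0[1-9]$")

def strip_suffix(value: str) -> str:
    """Remove the last 4 digits only if they look like 0X0Y (X, Y in 1-9)."""
    return PATTERN.sub("", value)

for sample in ["1031610101", "1234", "123456", "123405"]:
    print(sample, "->", strip_suffix(sample))
```

Strings that do not end in the pattern are returned unchanged, which is the "if and only if" behaviour asked for.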

SELECT SUBSTR('ABCDEFGHIJKLMNOP', 1, LENGTH('ABCDEFGHIJKLMNOP') - 4);
Syntax: SUBSTR('string', from_position, length)
Note that this drops the last 4 characters unconditionally, without checking the pattern.

Regex for valid SSN or other ID

I'm a regex newbie and I've got a valid regex for SSNs:
/^(\d{3}(\s|-)?\d{2}(\s|-)?\d{4})|[\d{9}]*$/
But I now need to expand it to accept either an SSN or another alphanumeric ID of 7 characters, like this:
/^[a-zA-Z0-9]{7}$/
I thought it'd be as simple as grouping the SSN and adding an OR | but my tests are still failing. This is what I've got now:
/^((\d{3}(\s|-)?\d{2}(\s|-)?\d{4})|[\d{9}])|[a-zA-Z0-9]{7}$/
What am I doing wrong? And is there a more elegant way to say either SSN or my other ID?
Thanks for any helpful tips.
Valid SSNs:
123-45-6789
123456789
123 45 6789
Valid ID: aCe8999
I have also modified your first regex a bit; below is a demo program based on my understanding of the problem. Let me know if any modification is needed.
use feature 'say';
my @ids = (
'123-45-6789',
'123456789',
'123 45 6789',
'1234567893434', # invalid
'123456789wwsd', # invalid
'aCe8999',
'aCe8999asa' # invalid
);
for (@ids) {
say "match = $&" if $_ =~ /^ (?:\d{3} ([ \-])? \d{2} \1? \d{4})$ | ^[a-zA-Z0-9]{7}$/x ;
}
Output:
match = 123-45-6789
match = 123456789
match = 123 45 6789
match = aCe8999
Your first regex has some problems. Most importantly, it accepts input such as {{{{}}}}}, which means you have built a wrong character class: [\d{9}] matches a single digit, { or }, not nine digits. It also matches 123-45 6789 (notice the mixture of space and dash).
To express OR in a regular expression, use the pipe |, and remember that each anchor belongs only to the alternative it appears in. For example, ^1|2$ matches strings beginning with 1 or ending with 2, not just the two strings 1 and 2.
To apply the exact match you need to do ^1$|^2$ or ^(1|2)$.
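The difference in anchor scope can be seen in a small Python sketch. The names loose, strict and matches are made up for this illustration.

```python
import re

loose = re.compile(r"^1|2$")     # "starts with 1" OR "ends with 2"
strict = re.compile(r"^(1|2)$")  # exactly "1" or exactly "2"

def matches(pattern, text):
    """Return True if the pattern is found anywhere in text."""
    return pattern.search(text) is not None

for text in ["1", "2", "15", "92"]:
    print(text, matches(loose, text), matches(strict, text))
```

Both patterns accept 1 and 2, but only the loose one also accepts 15 and 92.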
With the second regex ^[a-zA-Z0-9]{7}$ you are not saying alphanumeric ID of 7 characters but you are saying numeric, alphabetic or alphanumeric. So it matches 1234567 too. If this is not a problem, the following regex is the solution by eliminating the said issues:
^\d{3}([ -]?)\d\d\1\d{4}$|^[a-zA-Z0-9]{7}$
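As a quick check, here is a minimal Python sketch that runs this regex against the sample IDs from the question. The function name is_valid_id is an assumption for the example, and the sketch assumes the input has no trailing newline.

```python
import re

# SSN with a consistent optional separator, or a 7-character alphanumeric ID.
ID_PATTERN = re.compile(r"^\d{3}([ -]?)\d\d\1\d{4}$|^[a-zA-Z0-9]{7}$")

def is_valid_id(text: str) -> bool:
    """Return True if text is an SSN or a 7-character alphanumeric ID."""
    return ID_PATTERN.search(text) is not None

samples = ["123-45-6789", "123456789", "123 45 6789", "aCe8999",
           "123-45 6789", "1234567893434", "aCe8999asa"]
for sample in samples:
    print(repr(sample), is_valid_id(sample))
```

The backreference \1 forces the second separator to equal the first, so the mixed form 123-45 6789 is rejected.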

return nth match from string using regex

I am using Tableau to create a visualization and need to apply Regex to string values in my data set. I'm trying to use Regex to return the nth match of this string of data: b29f3b2f2b2f3b3f1r2f3+b3x#. The data will always be in one line and I need to break the data out into substrings each time the characters b,s,f, or d are encountered and I need to match the nth occurrence returned. For example, when identifying which number match to return the following will match:
n=1 matches b29
n=2 matches f3
n=3 matches b2
n=4 matches f2
n=5 matches b2
n=6 matches f3
n=7 matches b3
n=8 matches f1r2
n=9 matches f3+
n=10 matches b3x#
I can get the n=1 match to return the proper value using bfsd(?=[bfsd]) and have tried to get the subsequent values to return using lookahead, but can't find a regex which works. Any help is appreciated.
Your item pattern is [bfsd][^bfsd]*.
You may use ^(?:.*?([bfsd][^bfsd]*)){n} to get what you need, just update the n variable with the number you need to get.
This pattern will get you the second value:
^(?:.*?([bfsd][^bfsd]*)){2}
Details
^ - start of string
(?:.*?([bfsd][^bfsd]*)){2} - two occurrences of
.*? - any 0+ chars, as few as possible
([bfsd][^bfsd]*) - b, f, s or d followed by 0+ chars other than b, f, s and d.
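The pattern can be tried outside Tableau with a short Python sketch. The helper nth_item is hypothetical, and the sketch assumes Python's re engine keeps the last repetition in the capture group, as the explanation above describes.

```python
import re

def nth_item(text, n):
    """Return the n-th item (1-based), or None if there are fewer than n."""
    pattern = r"^(?:.*?([bfsd][^bfsd]*)){%d}" % n
    m = re.match(pattern, text)
    # After n repetitions, group 1 holds the last (n-th) item captured.
    return m.group(1) if m else None

data = "b29f3b2f2b2f3b3f1r2f3+b3x#"
for n in (1, 2, 8, 10):
    print(n, nth_item(data, n))
```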
You can use this regex:
[bsfd][^bsfd]*
Use the 'global' flag.
This will create matches that start with one of the four letters, followed by any number of other characters.
The result will be an array with all the matches. Note the Array will start with index 0 (not 1).
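In Python, for instance, re.findall plays the role of the global flag. A minimal sketch, with split_items as a made-up helper name:

```python
import re

def split_items(text):
    """Return every item that starts with b, s, f or d."""
    return re.findall(r"[bsfd][^bsfd]*", text)

items = split_items("b29f3b2f2b2f3b3f1r2f3+b3x#")
n = 8
# The list is 0-based, so the n-th match sits at index n - 1.
print(items[n - 1])
```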
If you have gawk, this will partition the input field as per your spec:
$ awk -v FPAT='[a-f][0-9rx#+]+' '{$1=$1}1'
$ echo "b29f3b2f2b2f3b3f1r2f3+b3x#" |
awk -v FPAT='[a-f][0-9rx#+]+' '{for(i=1;i<=NF;i++) print i " -> " $i}'
1 -> b29
2 -> f3
3 -> b2
4 -> f2
5 -> b2
6 -> f3
7 -> b3
8 -> f1r2
9 -> f3+
10 -> b3x#

Select only letters which are followed by a number

I am trying to select some codes from a PostgreSQL table.
I only want the codes that have numbers in them, e.g.
GD123
GD564
I don't want to pick any codes like GDTG or GDCNB.
Here's my query so far:
select regexp_matches(no_, '[a-zA-Z0-9]*$')
from myschema.mytable
which of course doesn't work.
Any help appreciated.
The pattern to match a string that has at least 1 letter followed by at least 1 number is '[A-Za-z]+[0-9]+'.
Now, if the valid patterns had to start with two letters, and then have 3 digits after as your examples show, then replace the + with {2} & {3} respectively, and enclose the pattern in ^$, like this: '^[A-Za-z]{2}[0-9]{3}$'
The regex match operator is ~ which you can use in the where clause:
SELECT no_
FROM myschema.mytable
WHERE no_ ~ '[A-Za-z]+[0-9]+'
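The ~ operator looks for the pattern anywhere in the value, which is roughly what re.search does in Python. A small sketch of the same filter, using the codes from the question as sample data (the variable names are made up):

```python
import re

# At least one letter immediately followed by at least one digit.
LETTERS_THEN_DIGITS = re.compile(r"[A-Za-z]+[0-9]+")

codes = ["GD123", "GD564", "GDTG", "GDCNB"]
selected = [code for code in codes if LETTERS_THEN_DIGITS.search(code)]
print(selected)
```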
You may use
CREATE TABLE tb1
(s character varying)
;
INSERT INTO tb1
(s)
VALUES
('GD123'),
('12345'),
('GDFGH')
;
SELECT * FROM tb1 WHERE s ~ '^(?![A-Za-z]+$)[a-zA-Z0-9]+$';
Result: GD123 and 12345 are returned; GDFGH is not.
Details
^ - start of string
(?![A-Za-z]+$) - a negative lookahead that fails the match if there are only letters to the end of the string
[a-zA-Z0-9]+ - 1 or more alphanumeric chars
$ - end of string.
If you want to avoid matching 12345, use
'^(?![A-Za-z]+$)(?![0-9]+$)[a-zA-Z0-9]+$'
Here, (?![0-9]+$) will similarly fail the match if, from the string start, all chars up to the end of the string are digits. Result: only GD123 is returned.
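Both lookahead patterns can be compared side by side in a Python sketch. It assumes Python's re module treats these lookaheads the same way as Postgres for the three sample rows; the constant names are made up.

```python
import re

# Rejects values made of letters only.
NOT_ONLY_LETTERS = re.compile(r"^(?![A-Za-z]+$)[a-zA-Z0-9]+$")
# Rejects values made of letters only or of digits only.
MIXED_ONLY = re.compile(r"^(?![A-Za-z]+$)(?![0-9]+$)[a-zA-Z0-9]+$")

rows = ["GD123", "12345", "GDFGH"]
first = [r for r in rows if NOT_ONLY_LETTERS.search(r)]
second = [r for r in rows if MIXED_ONLY.search(r)]
print(first)
print(second)
```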
Something like this:
so=# with c(v) as (values('GD123'),('12345'),('GD ERT'))
select v ~ '[A-Z]{1,}[0-9]+', v from c;
?column? | v
----------+--------
t | GD123
f | 12345
f | GD ERT
(3 rows)
If the format of the data you want to obtain is a set of characters followed by a set of digits (i.e., GD123) you can use the regex:
[a-zA-Z0-9]+[0-9]
This captures the letters together with the digits that follow them:
([A-Za-z]+\d+)

Trimming leading zeros with regex

I'm struggling to get a regex expression to work.
I need to allow the following transforms based on the existence of leading zeros...
001234 -> 1234
1234 -> 1234
00AbcD -> AbcD
001234.1234 -> 1234.1234
001234.000002 -> 1234.2
001234/000002 -> 1234.2
I've found that the expression below works well for transforms 1, 2 & 3, but I'm not sure how to match the (optional) second section demonstrated in 4, 5 & 6.
^0*([0-9A-Za-z]*$)
You can match the zeros with the following regex:
/(^|[./])0+/g
and replace each match with the first group (\1).
For example, in Python you can do the following:
>>> import re
>>> s="""001234
... 1234
... 00AbcD
... 001234.1234
... 001234.000002
... 001234/000002"""
>>> [re.sub(r'(^|[./])0+',r'\1',i) for i in s.split()]
['1234', '1234', 'AbcD', '1234.1234', '1234.2', '1234/2']
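The question also expects 001234/000002 to become 1234.2, with the slash turned into a dot. A hedged sketch of one way to get there, assuming every / should be written as . in the result; the function name normalize is made up for the example.

```python
import re

def normalize(value: str) -> str:
    """Strip zeros at the start and after a separator, then unify separators."""
    # Drop zeros at the start of the string or right after '.' or '/'.
    stripped = re.sub(r"(^|[./])0+", r"\1", value)
    # Assumption: '/' should always be written as '.' in the result.
    return stripped.replace("/", ".")

for sample in ["001234", "1234", "00AbcD", "001234.1234",
               "001234.000002", "001234/000002"]:
    print(sample, "->", normalize(sample))
```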
^(0*)(.+)
Group 2 holds the result.

Regex mask with varying ignored characters

I have a series of strings which look something like this:
foobar | ABC Some text 123
barfoo | DEF Some te 456
And I want to mask it such that I get the results
ABC123
DEF456
respectively. The text in between will always be a substring Some text which could potentially contain numbers (e.g. S0m3 t3xt or S0m3 t3). It will always be a substring starting from the left, so never me te.
So clearly I need to start the Regex with something like
(?<=\| )[A-Z]{3}
which gets me ABC and DEF but I am at a loss of how to effectively concatenate the numbers at the end of the string.
Is there any way to do this with a single expression?
See http://regexr.com?375u8
(?<=\| )([A-Z]{3}).*(\d{3})
This will give you three characters in the range of A-Z and three numbers in two capturing groups, allowing you to use these groups to concatenate both to your desired output: $1$2
This will even work if your Some text contains three numbers in between.
In case you want to replace the whole string with both of your capturing groups, add .* in front of the regex:
.*(?<=\| )([A-Z]{3}).*(\d{3})
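A minimal Python sketch of the replace-everything variant. It assumes the pipe inside the lookbehind is escaped as \| (Python rejects a lookbehind whose alternatives differ in length) and uses the greedy .* before the digits so that the last three digits win; the helper name mask is made up.

```python
import re

# Everything up to "| ", three capitals, anything, then the last three digits.
MASK = re.compile(r".*(?<=\| )([A-Z]{3}).*(\d{3})")

def mask(line: str) -> str:
    """Replace the whole line with the 3 capitals plus the final 3 digits."""
    return MASK.sub(r"\1\2", line)

for line in ["foobar | ABC Some text 123",
             "barfoo | DEF Some te 456",
             "foobar | ABC S0m3 t3xt 123"]:
    print(mask(line))
```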
Another JavaScript version:
[
'foobar | ABC Some text 123',
'barfoo | DEF Some te 456'
].map(function(v) {
return v.replace(/^.*\| ([A-Z]{3}) .* (\d{3})$/, '$1$2');
})
Gives
["ABC123", "DEF456"]