Help setting up a regular expression in Delphi XE - regex

I want to set up a regular expression to check string1.
string1 can take values such as:
string1:='D1413578;1038'
string1:='D2;11'
string1:='D16;01'
, ...
In string1, only the character 'D' and the semicolon are always present.
I set RegularExpressions1 := '\b(D#\;#)\b';
but RegularExpressions1 does not check string1 correctly.
In VB6 the pattern is RegularExpressions1="D#;#", but I don't know what the equivalent is in Delphi.

Try
\bD\d*;\d*
\d* means "zero or more digits".
By the way, I have omitted the second \b because otherwise the match would fail if there is no number after the semicolon (and you said the number was optional).
If by "check" you mean "validate" an entire string, then use
^D\d*;\d*$
All this assumes that only digits are allowed after D and ;. If that is not the case, please edit your question to clarify.

Assuming both numbers require at least one digit, use this regex:
\AD\d+;\d+\z
I prefer to use \A and \z instead of ^ and $ to match the start and end of the string because they always do only that.
In Delphi XE you can check whether this regex matches string1 in a single line of code:
if TRegEx.IsMatch(string1, '\AD\d+;\d+\z') then ...
If you want to check many strings, instantiate the TRegEx once and reuse it:
var
  RE: TRegEx; // TRegEx is declared in the RegularExpressions unit
begin
  RE := TRegEx.Create('\AD\d+;\d+\z');
  for string1 in ListOfStrings do
    if RE.IsMatch(string1) then ...
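The remark about the anchors can be illustrated with a short sketch. It is written in Python purely for illustration, not Delphi: Python spells the absolute end-of-string anchor \Z, and the helper name is_valid is hypothetical. It shows that $ also matches just before a trailing line break, while the absolute anchors do not.

```python
import re

# Pattern from the answer above: 'D', digits, ';', digits.
# Python writes the absolute end-of-string anchor as \Z.
STRICT = re.compile(r"\AD\d+;\d+\Z")
LOOSE = re.compile(r"^D\d+;\d+$")


def is_valid(value: str) -> bool:
    """Return True when the whole string has the form D<digits>;<digits>."""
    return STRICT.search(value) is not None


for sample in ["D1413578;1038", "D2;11", "D16;01", "D;11", "XD2;11"]:
    print(sample, is_valid(sample))

# '$' tolerates one trailing newline, the absolute anchor does not.
with_newline = "D2;11\n"
print(LOOSE.search(with_newline) is not None)
print(STRICT.search(with_newline) is not None)
```

If empty numbers should be accepted, as in the first answer, \d+ can be swapped for \d* without changing anything else.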

Related

Regex: Separate a string of characters with a non-consistent pattern (Oracle) (POSIX ERE)

EDIT: This question pertains to Oracle implementation of regex (POSIX ERE) which does not support 'lookaheads'
I need to separate a string of characters with a comma, however, the pattern is not consistent and I am not sure if this can be accomplished with Regex.
Corpus: 1710ABCD.131711ABCD.431711ABCD.41711ABCD.4041711ABCD.25
The pattern is basically 4 digits, followed by 4 characters, followed by a dot, followed by 1,2, or 3 digits! To make the string above clear, this is how it looks like separated by a space 1710ABCD.13 1711ABCD.43 1711ABCD.4 1711ABCD.404 1711ABCD.25
So the output of a replace operation should look like this:
1710ABCD.13,1711ABCD.43,1711ABCD.4,1711ABCD.404,1711ABCD.25
I was able to match the pattern using this regex:
(\d{4}\w{4}\.\d{1,3})
It does insert a comma, but after the third digit following the dot (wrong: it should have been after the second digit), and I cannot get it to insert at the right position or to do so globally.
Here is a link to a fiddle
https://regex101.com/r/qQ2dE4/329
All you need is a lookahead at the end of the regular expression, so that the greedy \d{1,3} backtracks until it's followed by 4 digits (indicating the start of the next substring):
(\d{4}\w{4}\.\d{1,3})(?=\d{4})
                     ^^^^^^^^^
https://regex101.com/r/qQ2dE4/330
To expand on @CertainPerformance's answer, if you want to be able to match the last token, you can use an alternative match of $:
(\d{4}\w{4}\.\d{1,3})(?=\d{4}|$)
Demo: https://regex101.com/r/qQ2dE4/331
EDIT: Since you now mentioned in the comment that you're using Oracle's implementation, you can simply do:
regexp_replace(corpus, '(\d{1,3})(\d{4})', '\1,\2')
to get your desired output:
1710ABCD.13,1711ABCD.43,1711ABCD.4,1711ABCD.404,1711ABCD.25
Demo: https://regex101.com/r/qQ2dE4/333
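As a cross-check, here is a minimal sketch that runs both ideas on the corpus from the question. It uses Python's re module only for illustration (not Oracle), so engine details may differ; the variable names are made up for this example.

```python
import re

corpus = "1710ABCD.131711ABCD.431711ABCD.41711ABCD.4041711ABCD.25"
expected = "1710ABCD.13,1711ABCD.43,1711ABCD.4,1711ABCD.404,1711ABCD.25"

# Approach 1: lookahead. Collect each token, then join the tokens with commas.
tokens = re.findall(r"(\d{4}\w{4}\.\d{1,3})(?=\d{4}|$)", corpus)
with_lookahead = ",".join(tokens)

# Approach 2: no lookahead. Split every run of 5 to 7 digits so that the
# last four digits start the next token, and insert a comma in between.
without_lookahead = re.sub(r"(\d{1,3})(\d{4})", r"\1,\2", corpus)

print(with_lookahead)
print(without_lookahead)
print(with_lookahead == expected and without_lookahead == expected)
```

The second form relies on backtracking alone: the greedy \d{1,3} gives digits back until exactly four remain for the next token, which is why it works without lookahead support.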
In order to continue finding matches after the first one you must use the global flag /g. The pattern is very tricky but it's feasible if you reverse the string.
Demo
var str = `1710ABCD.131711ABCD.431711ABCD.41711ABCD.4041711ABCD.25`;
// Reverse String
var rts = str.split("").reverse().join("");
// Do a reverse version of RegEx
/*In order to continue searching after the first match,
use the `g`lobal flag*/
var rgx = /(\d{1,3}\.\w{4}\d{4})/g;
// Replace on the reversed string with a reversed substitution (comma first)
var res = rts.replace(rgx, `,$1`);
// Revert the result back to normal direction and drop the trailing comma
var ser = res.split("").reverse().join("").replace(/,$/, "");
console.log(ser);

Regex words with letters, numbers, optional special characters in any order

I've been using some help on here for a while now but cannot find anything specific to my requirement. I need to pick out whole words which contain at least 6 letters and/or numbers (combined, not each), with optional 'special' characters. All in any order, so A12345, 12345A, 1-2-345-A, 12A45B and so-on.
I've done a fiddle here. I'm almost there (but it could be done better) - I can't work out why it needs to be at least 6 numbers to get a match. Is it because the letters are all optional with *?
This is VBA so no access to look behinds. The special characters will only ever be 'within' the match, not start or end (will never be -1234-A- for example).
I think this is what you are looking for:
[a-z0-9/-]{6,}
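That will match a run of at least 6 characters drawn, in any order, from a to z, 0 to 9, - and /. Because the class only lists lowercase letters, enable case-insensitive matching (or use A-Za-z) to match uppercase input such as A12345. Note the - is at the end of the character class. You can have it in the middle, but then you need to escape it. Also, / will need to be escaped if your delimiters are also /.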
update
As Wiktor noted this would also capture ------ which may not be what you want. I would suggest simply cleaning out all optional characters, and then running the above regex. I would delete my answer since I'm not providing exactly what was being asked, but it would be a workable solution so it may have value.
You could do a regex replacement to remove all non letters/numbers, and then check that the length of the resulting string is 6 or more:
' Requires: Imports System.Text.RegularExpressions
Dim input As String = "A-1234-B"
Dim pattern As String = "[^A-Za-z0-9]+"
Dim replacement As String = ""
Dim rgx As New Regex(pattern)
Dim result As String = rgx.Replace(input, replacement)
Console.WriteLine(result.Length) ' 6
Demo
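The two ideas in this thread can be sketched side by side. This is Python, used only for illustration rather than VBA. The single-pattern variant is not from the answers above: it assumes the special characters are - and /, that at most one of them sits between two alphanumerics, and that a word never starts or ends with one. The names WORD and strip_and_check are hypothetical.

```python
import re

# Assumption: specials are '-' and '/', at most one between two
# alphanumerics, never leading or trailing. Needs no lookaround.
WORD = re.compile(r"[A-Za-z0-9](?:[-/]?[A-Za-z0-9]){5,}")


def strip_and_check(word: str, minimum: int = 6) -> bool:
    """Remove everything that is not a letter or digit, then check the length."""
    cleaned = re.sub(r"[^A-Za-z0-9]+", "", word)
    return len(cleaned) >= minimum


text = "A12345 12345A 1-2-345-A 12A45B ------ A-12"
print(WORD.findall(text))
print([w for w in text.split() if strip_and_check(w)])
```

Both variants reject ------ because it contains no letters or digits at all, which addresses the objection raised in the update.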

Regular Expression for 8 or more A's followed by zero, one or two equal signs (=)

I need to filter out garbage string values, which come in the form of at least 8 A's sometimes followed by zero (fixed), one or two equal signs. The examples include the entire string value - if any other characters occur in the string then it's a keeper.
trash:
AAAAAAAA
AAAAAAAA=
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==
AAAAAAAAAAAAA=
keepers:
AAAAA
AAAAA=
AAAAA0AAAAAAAAAAAAAAAAAAAAAAAAAAA
==
I'm lame at regular expressions, so request some help.
What expression will permit me to take out the trash?
Thanks!
Try using: ^A{8,}={0,2}$
Demo (JavaScript):
var regex = /^A{8,}={0,2}$/
console.log([
// Trash (true)
'AAAAAAAA',
'AAAAAAAA=',
'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA',
'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==',
'AAAAAAAAAAAAA=',
// Keep (false)
'AAAAA',
'AAAAA=',
'AAAAA0AAAAAAAAAAAAAAAAAAAAAAAAAAA',
'=='
].map(regex.test, regex))
Assuming you need eight or more As, followed by zero or more equals signs, you can use:
[A]{8,}[=]{0,}
Note that this will also match the final set of A's in AAAAA0AAAAAAAAAAAAAAAAAAAAAAAAAAA. If you don't want that to match, you should start and end the regex with the anchors ^ and $:
^[A]{8,}[=]{0,}$
Hope this helps! :)
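This regexp may work for you: ^A{8}[A=]. If it matches, it's a trash value. Note that it requires a ninth character and is not anchored at the end, so it misses a value of exactly eight A's (AAAAAAAA) and also flags strings such as AAAAAAAAA0.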
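To actually take out the trash, the anchored pattern can drive a filter. The following is a minimal sketch in Python (chosen for illustration; the question does not name a language), using the sample values from the question. The function name keepers is hypothetical.

```python
import re

# Eight or more A's followed by zero, one or two '=' signs, whole string only.
TRASH = re.compile(r"A{8,}={0,2}")


def keepers(values):
    """Return only the values that are not garbage."""
    return [v for v in values if TRASH.fullmatch(v) is None]


samples = [
    "AAAAAAAA",
    "AAAAAAAA=",
    "AAAAAAAAAAAAA=",
    "AAAAA",
    "AAAAA=",
    "AAAAA0AAAAAAAAAAAAAAAAAAAAAAAAAAA",
    "==",
]
print(keepers(samples))
```

fullmatch plays the role of the ^ and $ anchors here, so a partial run of A's inside a longer value does not count as trash.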

Regex for comparing Strings with spaces

I'm trying to check whether a string is present in a list of strings using regex.
I tried using the following:
(?!MyDisk1$|MyDisk2$)
But this isn't working for scenarios like
(?!My disk1$|My Disk2$)
Can you suggest a better approach to deal with such situations?
I get the list of strings from an SQL query, so I am not sure where the spaces are. The strings vary, e.g. My Disk1, MyDisk2, My_Disk3, ABCD123, XYZ_123, MNP 123, or any other string made of [a-zA-Z0-9_ ].
You can make the spaces optional using a zero-or-one quantifier (?):
(?!My ?disk1$|My ?Disk2$)
This assertion will reject substrings like MyDisk2 or My Disk2. Or to handle potentially many spaces, use a zero-or-more quantifier (*):
(?!My *disk1$|My *Disk2$)
Note that if you're running this in an engine which ignores whitespace in the pattern you may need to use a character class, like this:
(?!My[ ]*disk1$|My[ ]*Disk2$)
Or to handle spaces or underscores:
(?!My[ _]*disk1$|My[ _]*Disk2$)
Unfortunately if the spaces can be anywhere in the string, (but you still care about matching the other letters in order), you'd have to do something like this:
(?! *M *y *d *i *s *k *1$| *M *y *D *i *s *k *2$)
Or to handle spaces or underscores:
(?![ _]*M[ _]*y[ _]*d[ _]*i[ _]*s[ _]*k[ _]*1$|[ _]*M[ _]*y[ _]*D[ _]*i[ _]*s[ _]*k[ _]*2$)
But to be honest, at that point, you may be better off preprocessing your data before you try to use your regex with it.
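A minimal sketch of that preprocessing idea, in Python for illustration: strip spaces and underscores from both the list entries and the candidate, then compare. The names normalize, known and is_known are hypothetical, the list contents are taken from the examples in the question, and lowercasing is an extra assumption (drop it if case matters).

```python
import re


def normalize(name: str) -> str:
    """Drop spaces and underscores; lowercase (assumption: case is irrelevant)."""
    return re.sub(r"[ _]+", "", name).lower()


# Hypothetical list, e.g. as returned by the SQL query.
known = {normalize(n) for n in ["My Disk1", "MyDisk2", "My_Disk3"]}


def is_known(candidate: str) -> bool:
    """True when the candidate equals a list entry up to spaces/underscores."""
    return normalize(candidate) in known


for candidate in ["MyDisk1", "My Disk 2", "M y_D i s k 3", "ABCD123"]:
    print(candidate, is_known(candidate))
```

After normalization the check is a plain set lookup, so no per-character optional separators are needed in any pattern.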
Use this regex, appending i at the end so that it is case-insensitive:
/my\s?disk[12]$/i
This matches MyDisk1, My Disk1, mydisk2 and similar variants that differ only in case or in a single optional whitespace character.
You can do this:
/(?[^\s_-]+(\s|_|-)?[^\s_-]*?$)/i
The ? quantifier means 0 or 1 of the preceding pattern.
/i makes the match case-insensitive. The separator can be a space, underscore or dash. I have replaced My and disk with a string of length 1 or more which does not contain a space, underscore or dash. Now it will match "Shikhar Subedi", "dprpradeep" or "MyDisk 54".
The + quantifier means 1 or more. A ^ at the start of a character class means not. * means 0 or more. So the string after the separator is optional.

Regex to remove characters up to a certain point in a string

How do I use regex to convert
11111aA$xx1111xxdj$%%
to
aA$xx1111xxdj$%%
So, in other words, I want to remove (or match) the FIRST grouping of 1's.
Depending on the language, you should have a way to replace a string by regex. In Java, you can do it like this:
String s = "11111aA$xx1111xxdj$%%";
String res = s.replaceAll("^1+", "");
The ^ "anchor" indicates that the beginning of the input must be matched. The 1+ means a sequence of one or more 1 characters.
Here is a link to ideone with this running program.
The same program in C#:
var rx = new Regex("^1+");
var s = "11111aA$xx1111xxdj$%%";
var res = rx.Replace(s, "");
Console.WriteLine(res);
(link to ideone)
In general, if you would like to make a match of anything only at the beginning of a string, add a ^ prefix to your expression; similarly, adding a $ at the end makes the match accept only strings at the end of your input.
If this is the beginning, you can use this:
^[1]*
As far as replacing, it depends on the language. In PowerShell, I would do this (single quotes keep $xx1111xxdj from being expanded as a variable):
[regex]::Replace('11111aA$xx1111xxdj$%%','^[1]*','')
This will return:
aA$xx1111xxdj$%%
If you only want to replace consecutive "1"s at the beginning of the string, replace the following with an empty string:
^1+
If the consecutive "1"s won't necessarily be the first characters in the string (but you still only want to replace one group), replace the following with the contents of the first capture group (usually \1 or $1):
1+(.*)
Note that this is only necessary if you only have a "replace all" capability available to you, but most regex implementations also provide a way to replace only one instance of a match, in which case you could just replace 1+ with an empty string.
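The three variants described above can be compared in a short sketch. It uses Python's re.sub for illustration, where the count argument plays the role of "replace only one instance"; the second input string s2 is made up to show a group of 1s that is not at the start.

```python
import re

s = "11111aA$xx1111xxdj$%%"

# Anchored: remove a run of 1s only at the very start of the string.
print(re.sub(r"^1+", "", s))

# Made-up input where the first group of 1s is not at the start.
s2 = "ab111cd111"

# Replace-all only: keep the rest of the string via the capture group.
print(re.sub(r"1+(.*)", r"\1", s2))

# Replace-once available: remove just the first run of 1s.
print(re.sub(r"1+", "", s2, count=1))
```

For an input like s, whose 1s are at the start, all three give the same result; they only differ when the first run of 1s appears later in the string or when the string does not begin with 1.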
I'm not sure, but you can try this:
[^1](\w*\d*\W)* - matches everything as a single match except the leading "1" characters
In Javascript
var str = '11111aA$xx1111xxdj$%%';
var patt = /^1+/g;
str = str.replace(patt,"");