How to check if a RegEx matches the whole target string?

I need to check whether a regex pattern matches the whole target string.
For example, if the pattern is '[0-9]+':
Target string '123' should return True
Target string '123' + sLineBreak should return False
The code should look like the following:
uses
System.RegularExpressions;
begin
if(TRegEx.IsFullMatch('123' + sLineBreak, '[0-9]+'))
then ShowMessage('Match all')
else ShowMessage('Not match all');
end;
I've tried TRegEx.Match(...).Success and TRegEx.IsMatch without success, and I'm wondering whether there is an easy way to check that a pattern matches the whole target string.
I've also tried using ^ (start of line) and $ (end of line), but without any success.
uses
System.RegularExpressions;
begin
if(TRegEx.IsMatch('123' + sLineBreak, '^[0-9]+$'))
then ShowMessage('Match all')
else ShowMessage('Not match all');
end;
Here you can find an online test demonstrating that if the target string ends with a new line, the regex still matches even using start/end of line.

Make sure the whole string matches:
\A[0-9]+\z
Explanation
\A - the beginning of the string
[0-9]+ - any character from '0' to '9', 1 or more times (matching as many as possible)
\z - the end of the string
Also, see Whats the difference between \z and \Z in a regular expression and when and how do I use it?
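The difference between `$` and an absolute end-of-string anchor can be checked in other engines as well. The following is a minimal sketch in Python rather than Delphi. Python's `re` module spells the absolute end anchor `\Z` (it plays the role that `\z` plays in PCRE), and `is_full_match` is a hypothetical helper name, not part of any library.

```python
import re

def is_full_match(pattern: str, text: str) -> bool:
    """Return True only if `pattern` matches the entire `text`."""
    # \A anchors at the very start, \Z at the very end (Python's spelling
    # of PCRE's \z); the non-capturing group keeps alternations intact.
    return re.search(r"\A(?:" + pattern + r")\Z", text) is not None

# `$` also matches just before a trailing newline, so this still matches:
dollar_matches = re.search(r"^[0-9]+$", "123\n") is not None

print(is_full_match("[0-9]+", "123"))    # True
print(is_full_match("[0-9]+", "123\n"))  # False
print(dollar_matches)                    # True
```

In Python specifically, `re.fullmatch(pattern, text)` gives the same result as the helper without writing the anchors by hand.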

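var str = '123';
var sLineBreak = '\n';
console.log(str.match(/^\d+$/)); // '123' matches
console.log((str + 'b').match(/^\d+$/)); // '123b' does not match: null
console.log((str + sLineBreak).match(/^\d+$/)); // '123\n' does not match: null
You can use: ^\d+$
^ start of string
\d+ one or more digits
$ end of string (in JavaScript without the m flag; in PCRE-based engines such as Delphi's TRegEx, $ also matches just before a trailing line break, so use \z there)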

Related

Use Regex to Split Numbered List array into Numbered List Multiline

I am trying to learn Regex to answer a question on SO portuguese.
Input (Array or String on a Cell, so .MultiLine = False)?
1 One without dot. 2. Some Random String. 3.1 With SubItens. 3.2 With number 0n mid. 4. Number 9 incorrect. 11.12 More than one digit. 12.7 Ending (no word).
Output
1 One without dot.
2. Some Random String.
3.1 With SubItens.
3.2 With number 0n mid.
4. Number 9 incorrect.
11.12 More than one digit.
12.7 Ending (no word).
What I thought was to use Regex with Split, but I wasn't able to implement the example in Excel.
Imports System.Text.RegularExpressions
Module Example
Public Sub Main()
Dim input As String = "plum-pear"
Dim pattern As String = "(-)"
Dim substrings() As String = Regex.Split(input, pattern) ' Split on hyphens.
For Each match As String In substrings
Console.WriteLine("'{0}'", match)
Next
End Sub
End Module
' The method writes the following to the console:
' 'plum'
' '-'
' 'pear'
So reading this and this. The RegExr Website was used with the expression /([0-9]{1,2})([.]{0,1})([0-9]{0,2})/igm on the Input.
And the following is obtained:
Is there a better way to do this? Is the regex correct, or is there a better way to build it? The examples that I found on Google didn't enlighten me on how to use RegEx with Split correctly.
Maybe I am confused about the logic of the Split function: I wanted to get the split index, with the regex as the separator string.
I can guarantee that each item ends with a word and a period.
Use
\d+(?:\.\d+)*[\s\S]*?\w+\.
See the regex demo.
Details
\d+ - 1 or more digits
(?:\.\d+)* - zero or more sequences of:
\. - dot
\d+ - 1 or more digits
[\s\S]*? - any 0+ chars, as few as possible, up to the first...
\w+\. - 1+ word chars followed by a dot (.).
Here is a sample VBA code:
Dim str As String
Dim objRegExp As Object
Dim objMatches As Object
Dim m As Object
str = " 1 One without dot. 2. Some Random String. 3.1 With SubItens. 3.2 With Another SubItem. 4. List item. 11.12 More than one digit."
' New RegExp needs a reference to "Microsoft VBScript Regular Expressions 5.5";
' without that reference, use CreateObject("VBScript.RegExp") instead.
Set objRegExp = New RegExp
objRegExp.Pattern = "\d+(?:\.\d+)*[\s\S]*?\w+\."
objRegExp.Global = True ' find every item, not just the first one
Set objMatches = objRegExp.Execute(str)
If objMatches.Count <> 0 Then
For Each m In objMatches
Debug.Print m.Value
Next
End If
NOTE
You may require the matches to stop only at a word + . that is followed by 0+ whitespaces and a number, using \d+(?:\.\d+)*[\s\S]*?[a-zA-Z]+\.(?=\s*(?:\d+|$)).
The (?=\s*(?:\d+|$)) positive lookahead requires the presence of 0+ whitespaces (\s*) followed by 1+ digits (\d+) or the end of string ($) immediately to the right of the current location.
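For readers who want to try the pattern outside of VBA, here is a minimal sketch in Python that collects the matches instead of splitting. The pattern and the sample string are taken from the answer; the function name `split_numbered_list` is a hypothetical helper introduced only for illustration.

```python
import re

# Same pattern as in the answer: a number (optionally dotted), then as few
# characters as possible up to the first word that is followed by a dot.
ITEM = re.compile(r"\d+(?:\.\d+)*[\s\S]*?\w+\.")

def split_numbered_list(text: str) -> list[str]:
    """Return each numbered item as a separate string."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return ITEM.findall(text)

sample = (" 1 One without dot. 2. Some Random String. 3.1 With SubItens. "
          "3.2 With Another SubItem. 4. List item. 11.12 More than one digit.")

for item in split_numbered_list(sample):
    print(item)
```

Collecting matches avoids having to describe the separator at all, which is why it tends to be easier here than a regex-based split.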
If VBA's split supports lookahead regex then this one may work, assuming there are no digits except in the indexes:
\s(?=\d)

Actionscript regular expression for repeating characters

I want to implement a regular expression check for a string with a character which repeats itself more than twice.
I am using ActionScript 3.
i.e.:
koby = true
kobyy = true
kobyyy = false
I tried using
/((\w)\2?(?!\2))+/
but it does not seem to work (using RegExp.test())
If you want to invalidate the complete string, when there is a character repeated 3 times, you can use a negative lookahead assertion:
^(?!.*(\w)\1{2}).*
See it here on Regexr.
The group starting with (?! is a negated lookahead assertion. That means the whole regex (.* to match the complete string) will fail, when there is a word character that is repeated 3 times in the string.
^ is an anchor for the start of the string.
^ # match the start of the string
(?!.* # fail when there is anywhere in the string
(\w) # a word character
\1{2} # that is repeated two times
)
.* # match the string
I also tried this one:
var regExp:RegExp = new RegExp('(\\w)\\1{2}');
trace(!regExp.test('koby'));
trace(!regExp.test('kobyy'));
trace(!regExp.test('kobyyy'));
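Both approaches above can be compared side by side. This is a minimal sketch in Python rather than ActionScript, assuming the same backreference syntax applies; `has_no_triple` and the two constant names are hypothetical and exist only for this example.

```python
import re

# A word character captured in group 1, followed by two more copies of it.
TRIPLE = re.compile(r"(\w)\1{2}")

# Single-pattern form: the negative lookahead rejects the whole string
# as soon as a triple exists anywhere in it.
NO_TRIPLE = re.compile(r"^(?!.*(\w)\1{2}).*")

def has_no_triple(text: str) -> bool:
    """True when no word character occurs three times in a row."""
    return TRIPLE.search(text) is None

for word in ("koby", "kobyy", "kobyyy"):
    print(word, has_no_triple(word), NO_TRIPLE.match(word) is not None)
```

Searching for the triple and negating the result is usually the easier form to read; the lookahead form is useful when the check has to live inside a single pattern.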

javascript replace last occurrence of string

I've read many Q&As in StackOverflow and I'm still having a hard time getting RegEX.
I have the string 12_13_12.
How can I replace the last occurrence of 12 with aa?
The final result should be 12_13_aa.
I would really like a good explanation of how you did it.
You can use this replace:
var str = '12-44-12-1564';
str = str.replace(/12(?![\s\S]*12)/, 'aa');
console.log(str);
explanations:
(?! # open a negative lookahead (means not followed by)
[\s\S]* # all characters including newlines (space+not space)
# zero or more times
12
) # close the lookahead
In other words the pattern means: 12 not followed by another 12 until the end of the string.
newString = oldString.substring(0, oldString.lastIndexOf("_") + 1) + 'aa'; // + 1 keeps the last underscore
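Use String.replace with a negative lookahead, so that only a 12 with no other 12 after it is replaced:
repl = "12_13_12".replace(/12(?!.*?12)/, 'aa');
EDIT: To use a variable in the regex:
var re = new RegExp(ToBeReplaced + '(?!.*?' + ToBeReplaced + ')'); // assumes ToBeReplaced contains no regex metacharacters
repl = str.replace(re, 'aa');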
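The lookahead idea carries over to other languages. Below is a minimal sketch in Python, not JavaScript; `replace_last` is a hypothetical helper name, and the search text is escaped so that it is always treated literally.

```python
import re

def replace_last(text: str, old: str, new: str) -> str:
    """Replace only the last occurrence of `old` in `text`."""
    # re.escape keeps characters such as '.' or '*' in `old` literal.
    needle = re.escape(old)
    # Match `old` only when no further `old` follows anywhere after it.
    pattern = needle + r"(?![\s\S]*" + needle + r")"
    # A callable replacement avoids backslash processing in `new`.
    return re.sub(pattern, lambda _match: new, text, count=1)

print(replace_last("12_13_12", "12", "aa"))       # 12_13_aa
print(replace_last("12-44-12-1564", "12", "aa"))  # 12-44-aa-1564
```

If no regex features are needed, finding the position with `str.rfind` and slicing around it is a simpler alternative.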

regular expressions: find every word that appears exactly one time in my document

Trying to learn regular expressions. As a practice, I'm trying to find every word that appears exactly one time in my document -- in linguistics this is a hapax legomenon (http://en.wikipedia.org/wiki/Hapax_legomenon)
So I thought the following expression would give me the desired result:
\w{1}
But this doesn't work. The \w returns a character, not a whole word. Also, it does not appear to be giving me characters that appear only once (it actually returns 25873 matches -- which I assume are all alphanumeric characters). Can someone give me an example of how to find a "hapax legomenon" with a regular expression?
If you're trying to do this as a learning exercise, you picked a very hard problem :)
First of all, here is the solution:
\b(\w+)\b(?<!\b\1\b.*\b\1\b)(?!.*\b\1\b)
Now, here is the explanation:
We want to match a word. This is \b\w+\b - a run of one or more (+) word characters (\w), with a 'word break' (\b) on either side. A word break happens between a word character and a non-word character, so this will match between (e.g.) a word character and a space, or at the beginning and the end of the string. We also capture the word into a backreference by using parentheses ((...)). This means we can refer to the match itself later on.
Next, we want to exclude the possibility that this word has already appeared in the string. This is done by using a negative lookbehind - (?<! ... ). A negative lookbehind doesn't match if its contents match the string up to this point. So we want to not match if the word we have matched has already appeared. We do this by using a backreference (\1) to the already captured word. The final match here is \b\1\b.*\b\1\b - two copies of the current match, separated by any amount of string (.*).
Finally, we don't want to match if there is another copy of this word anywhere in the rest of the string. We do this by using negative lookahead - (?! ... ). Negative lookaheads don't match if their contents match at this point in the string. We want to match the current word after any amount of string, so we use (.*\b\1\b).
Here is an example (using C#):
var s = "goat goat leopard bird leopard horse";
foreach (Match m in Regex.Matches(s, @"\b(\w+)\b(?<!\b\1\b.*\b\1\b)(?!.*\b\1\b)"))
Console.WriteLine(m.Value);
Output:
bird
horse
It can be done in a single regex if your regex engine supports infinite repetition inside lookbehind assertions (e.g. .NET):
Regex regexObj = new Regex(
@"( # Match and capture into backreference no. 1:
\b # (from the start of the word)
\p{L}+ # a succession of letters
\b # (to the end of a word).
) # End of capturing group.
(?<= # Now assert that the preceding text contains:
^ # (from the start of the string)
(?: # (Start of non-capturing group)
(?! # Assert that we can't match...
\b\1\b # the word we've just matched.
) # (End of lookahead assertion)
. # Then match any character.
)* # Repeat until...
\1 # we reach the word we've just matched.
) # End of lookbehind assertion.
# We now know that we have just matched the first instance of that word.
(?= # Now look ahead to assert that we can match the following:
(?: # (Start of non-capturing group)
(?! # Assert that we can't match again...
\b\1\b # the word we've just matched.
) # (End of lookahead assertion)
. # Then match any character.
)* # Repeat until...
$ # the end of the string.
) # End of lookahead assertion.",
RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace);
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success) {
// matched text: matchResults.Value
// match start: matchResults.Index
// match length: matchResults.Length
matchResults = matchResults.NextMatch();
}
If you are trying to match an English word, the best form is:
[a-zA-Z]+
The problem with \w is that it also includes _ and numeric digits 0-9.
If you need to include other characters, you can append them after the Z but before the ]. Or, you might need to normalize the input text first.
Now, if you want a count of all words, or just to see words that don't appear more than once, you can't do that with a single regex. You'll need to invest some time in programming more complex logic. It may very well need to be backed by a database or some sort of memory structure to keep track of the count. After you parse and count the whole text, you can search for words that have a count of 1.
(\w+){1} will match each word.
After that you could always perform the count on the matches.
Higher level solution:
Create an array of your matches:
preg_match_all("/([a-zA-Z]+)/", $text, $matches, PREG_PATTERN_ORDER);
Let PHP count your array elements:
$tmp_array = array_count_values($matches[1]);
Iterate over the tmp array and check the word count:
foreach ($tmp_array as $word => $count) {
echo $word . ' ' . $count;
}
Low level but does what you want:
Split your text into an array using preg_split:
$array = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
Iterate over that array:
foreach ($array as $word) { ... }
Check whether each item is a word, skipping anything that contains a non-letter:
if (preg_match('/[^a-zA-Z]/', $word)) continue;
Add the word to a temporary array as key:
if (!isset($tmp_array[$word])) $tmp_array[$word] = 0;
$tmp_array[$word]++;
After the loop, iterate over the tmp array and check the word count:
foreach ($tmp_array as $word => $count) {
echo $word . ' ' . $count;
}
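The count-then-filter approach is short in most languages. Here is a minimal sketch in Python rather than PHP, assuming that a word is a run of ASCII letters and that matching is case-sensitive; `hapax_legomena` is a hypothetical function name.

```python
import re
from collections import Counter

def hapax_legomena(text: str) -> list[str]:
    """Return the words that occur exactly once, in order of appearance."""
    # Assumption: a "word" is a run of ASCII letters, as in [a-zA-Z]+.
    words = re.findall(r"[a-zA-Z]+", text)
    counts = Counter(words)
    return [word for word in words if counts[word] == 1]

print(hapax_legomena("goat goat leopard bird leopard horse"))
# ['bird', 'horse']
```

Lower-casing the text before counting would treat "Goat" and "goat" as the same word, if that is the behavior you want.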

Matching characters reversely using regex

I need a regex that matches a string from a specified position back to the first character. The strings are file names.
I'm using Delphi 2010.
My example string is New Document.extension
If the specified position is 4, it should match:
New Docu
You can get from "New Document.extension" to "New Docu" by following these steps:
First strip the extension. You end up with "New Document"
Remove the last 4 characters. You get "New Docu".
For the "This Is My Longest Document.ext1.ext2" example:
Strip the extension, you end up with: "This Is My Longest Document.ext1"
Strip the last 4 characters. You get: "This Is My Longest Document."
So you want the entire string up to the fourth-to-last position before the final dot? No problem:
Delphi .NET:
ResultString := Regex.Match(SubjectString, '^.*(?=.{4}\.[^.]*$)').Value;
Explanation:
^ # Start of string
.* # Match any number of characters
(?= # Assert that it's possible to match, starting at the current position:
.{4} # four characters
\. # a dot (the last dot in the string!) because...
[^.]* # from here on only non-dots are allowed until...
$ # the end of the string.
) # End of lookahead.
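The same lookahead can be tried outside Delphi. This is a minimal sketch in Python; the function name `prefix_before_extension` and the parameter `n` are hypothetical, introduced so that the number of stripped characters is not hard-coded.

```python
import re

def prefix_before_extension(name: str, n: int = 4) -> str:
    """Return everything up to the n-th-to-last character before the final dot."""
    # Greedy .* backtracks until exactly n characters, the last dot and a
    # dot-free extension remain to the right of the current position.
    pattern = r"^.*(?=.{" + str(n) + r"}\.[^.]*$)"
    match = re.match(pattern, name)
    # Names without a dot, or with too few characters before it, do not match.
    return match.group(0) if match else ""

print(prefix_before_extension("New Document.extension"))
# New Docu
print(prefix_before_extension("This Is My Longest Document.ext1.ext2"))
# This Is My Longest Document.
```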
Since I can't post the regex because I came up with the exact same Regex as Tim, I'm going to post a piece of procedural code that does the exact same thing.
function FileNameWithoutExtension(const FileName:string; const StripExtraNumChars: Integer): string;
var i: Integer;
begin
i := LastDelimiter('.', FileName); // The extension starts at the last dot
if i = 0 then i := Length(FileName) + 1; // Make up the extension position if the file has no extension
Dec(i, StripExtraNumChars + 1); // Strip the requested number of chars; Plus one for the dot itself
Result := Copy(FileName, 1, i); // This is the result!
end;
You accepted the answer giving a regex for
The entire string up to the fourth-to-last position before the final dot.
If that's what you want then you do it best without a regex:
procedure RemoveExtensionAndFinalNcharacters(var s: string; N: Integer);
begin
s := ChangeFileExt(s, '');//remove extension
s := Copy(s, 1, Length(s)-N);//remove final N characters
end;
This is more efficient than a regex and, much more importantly, it is much clearer and more intelligible.
Regexes are not the only fruit.
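The regex-free version translates directly to other languages. Below is a minimal sketch in Python, using `os.path.splitext` in place of Delphi's ChangeFileExt; the two are assumed to agree for ordinary file names, although they may differ on edge cases such as names that start with a dot. The function name is hypothetical.

```python
import os.path

def remove_extension_and_final_chars(name: str, n: int) -> str:
    """Strip the extension, then drop the final n characters of what is left."""
    stem, _extension = os.path.splitext(name)
    # max(...) guards against n being larger than the remaining stem.
    return stem[:max(len(stem) - n, 0)]

print(remove_extension_and_final_chars("New Document.extension", 4))
# New Docu
```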
Edit based on comments
I'm not sure how Delphi does regex, but this works in most systems.
^.*(?=.{4}\.\w+$)
^ #the start of the string
.* #Any characters.
(?= #A lookahead meaning followed by...
.{4} #Any 4 chars.
\. #A literal .
\w+ #an actual extension.
$ #the end of the string
) #closing the lookahead
You could also use \w{3} instead of \w+ before the final $ if you wanted to make sure that the extension was three characters long.