Regex match multiple occurrences of same pattern

I'm trying to run a test to see if a long string with multiple lines has multiple occurrences of the same pattern, or, let's say, 5 or 10 occurrences.
So a string like:
$string = "this is a test pattern1 more of a test pattern1
and so on and so on pattern1";
So in this case I was trying in PHP:
if (preg_match('/(pattern1)\1{2,}/m',$string)) print "Found 2+ occurrences of pattern1\n";
Of course this does not work.
And I cannot use preg_match_all.
Can someone correct my regex please?

If I understand correctly, you were not far from the right pattern (this one requires three occurrences):
/(pattern1)(?:.*?\1){2,}/s
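where the s modifier allows the dot to match newlines. The first occurrence is captured, and {2,} then requires at least two more, so use {n-1,} for n occurrences.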
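To try the idea outside PHP, here is a minimal sketch in Python's standard re module, where re.S plays the role of the s modifier. The helper name has_at_least is hypothetical. It builds the same pattern with {n-1,} so the required count is a parameter.

```python
import re

def has_at_least(text, word, n):
    """Return True if `word` occurs at least n times (n >= 1) in `text`.

    Mirrors /(pattern1)(?:.*?\\1){2,}/s: capture the first occurrence,
    then require at least n-1 further occurrences, each reached by a lazy .*?
    """
    pattern = r"(" + re.escape(word) + r")(?:.*?\1){" + str(n - 1) + ",}"
    # re.S (DOTALL) lets . match newlines, like PHP's s modifier
    return re.search(pattern, text, re.S) is not None

text = "this is a test pattern1 more of a test pattern1\nand so on and so on pattern1"
print(has_at_least(text, "pattern1", 3))  # True
print(has_at_least(text, "pattern1", 4))  # False
```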

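You can use the following. Here ((?!\1).)* consumes a character only if another pattern1 does not start there, so each repetition runs up to the next occurrence:
if (preg_match('/(pattern1)(?:((?!\1).)*\1){2,}/s', $string)) {
    print "Found 3+ occurrences of pattern1\n";
}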
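The tempered-token pattern can be sketched in Python's re as well. This version assumes the inner group can be non-capturing (?:...), because nothing reads it back. The search succeeds only when there are three or more occurrences.

```python
import re

# (?:(?!\1).)* eats characters only while the captured word does not start at
# the current position, so each repetition stops right before the next occurrence.
PATTERN = re.compile(r"(pattern1)(?:(?:(?!\1).)*\1){2,}", re.S)

text = "this is a test pattern1 more of a test pattern1\nand so on and so on pattern1"
print(bool(PATTERN.search(text)))                     # True: three occurrences
print(bool(PATTERN.search("pattern1 and pattern1")))  # False: only two
```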

Check this pattern
/(pattern1)/g
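The g modifier finds all matches instead of returning only the first one. Note that it exists in Perl and JavaScript but not in PHP: preg_match rejects g as an unknown modifier, and preg_match_all is PHP's equivalent.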

Why don't you simply search for the word/pattern using preg_match_all and count the number of occurrences:
<?php
$message="this is a test pattern1 more of a test pattern1 and one more pattern1";
echo preg_match_all('/pattern1/i', $message, $matches);
This prints 3.
Or rather, in your case:
<?php
$message="this is a test pattern1 more of a test pattern1 and one more pattern1";
if (preg_match_all('/pattern1/i', $message, $matches) >= 2) print "Found 2+ occurrences of pattern1\n";
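For comparison, the same count-then-compare approach in Python is a short sketch. re.findall returns every non-overlapping match, much like preg_match_all, and re.I stands in for the i modifier.

```python
import re

message = "this is a test pattern1 more of a test pattern1 and one more pattern1"

# Count all non-overlapping, case-insensitive matches
count = len(re.findall(r"pattern1", message, re.I))
print(count)  # 3

if count >= 2:
    print("Found 2+ occurrences of pattern1")
```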

Related

Regex - finding last digits from a string

I have a string of the form:
<somedomain>/index.php?attachments/24322
I need to find the ending number, which can have any number of digits after the '/'; in this example, that is 24322. Also, the number will always have 'attachments/' before it. That is, the URL must have the format 'attachments/'
Can someone help me write the regex to achieve this?
I'm still a beginner with regex. I'll be using it with preg_match_all in my PHP code.
Thank you in advance for reading this question and your time.
https://regex101.com/r/iPy6av/1
$re = '/(\d+)$/';
$str = '<somedomain>/index.php?attachments/24322';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
Then loop through the $matches array to get the number.
If the string always ends with attachments/[digits] you could use:
attachments/\K\d+$
That would match
attachments # Match attachments
\K # Reset the starting point of the reported match
\d+ # Match one or more digits
$ # End of the string
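Python's built-in re module does not support \K. This sketch therefore swaps in a fixed-width lookbehind, which also keeps "attachments/" out of the reported match. A capturing-group variant is shown for comparison.

```python
import re

url = "<somedomain>/index.php?attachments/24322"

# Lookbehind: "attachments/" must precede the digits but is not part of match 0
m1 = re.search(r"(?<=attachments/)\d+$", url)
print(m1.group(0) if m1 else None)  # 24322

# Capturing group: the whole match includes the prefix, group 1 holds the number
m2 = re.search(r"attachments/(\d+)$", url)
print(m2.group(1) if m2 else None)  # 24322
```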

Regex - skip over expressions and parse the rest

I use regular expressions for sorting data into groups. The lines look somewhat like:
testword test
test testword
tes.w. tes.
tes tes.w.
tes.w othertexttobefound
sometexttobefound testword somemoretextwhichdoesnotmatter
The word test is to be found as well as othertexttobefound and sometexttobefound.
Now I am trying to tell my parser to simply ignore testword and its derivatives while searching, and to focus on the rest of my data entries. The "good words" and the "bad words" can be anywhere in each line.
I have tried [^w], which works at the beginning of strings but, in my attempts, not in the other cases. (?:w) didn't do the trick either. I cannot use lookarounds, as these would keep the whole line from being detected.
After long searches on the internet I am hoping for help here!
After much-appreciated help from Naxos84, I am adding some German real-life examples:
sozialabgabe sozialarbeiter
soz.abg. sozialarbeiter
sozarbeiter soz.abg.
sozialarbeiter otherirrelevantstuff
otherirrelevantstuff soz abg
otherirrelevantstuff sozabg
otherirrelevantstuff sozialabgabe
If I search with:
sozial["^\ab"]|soz["^\ab"]|sometexttobefound|othertexttobefound
Lines 6 and 7 get marked as well, but I don't want those.
What am I doing wrong?
To find all the matches you want (any occurrence of "test", "sometexttobefound" and "othertexttobefound"), you can try the following regex:
test[^\w]|sometexttobefound|othertexttobefound
This regex means:
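Find every "test" that is followed by a non-word character, OR sometexttobefound, OR othertexttobefound. Because [^\w] consumes one character, a "test" at the very end of the input is not matched.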
I tried this regex with the following text (I added a few 'test's):
testword test
test testword
tes.w. testtes.
tes tes.w. test
tes.w othertexttobefound
sometexttobefound testword somemoretextwhichdoesnotmatter
This was at regexr, with the global flag enabled.
If you also want to find things like "tes", I guess you should add it as well (I'm not a regex expert), like this:
test[^\w]|tes[^\w]|sometexttobefound|othertexttobefound
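To see what the first alternation actually returns, here is a small Python sketch. It joins the sample lines into one text the way regexr sees them, and uses re.findall in place of the global flag. Note that each "test" hit carries the space or newline that [^\w] consumed.

```python
import re

# Sample lines from the answer above, joined into one multi-line text
text = "\n".join([
    "testword test",
    "test testword",
    "tes.w. testtes.",
    "tes tes.w. test",
    "tes.w othertexttobefound",
    "sometexttobefound testword somemoretextwhichdoesnotmatter",
])

# findall returns every non-overlapping match; "testword" and "testtes." are
# skipped because a word character follows "test"
pattern = re.compile(r"test[^\w]|sometexttobefound|othertexttobefound")
matches = pattern.findall(text)
print(matches)
```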
If you want to get all words from the text except some special words, you could use:
my @words = grep { $_ ne 'testword' } split /\P{L}+/, $str;
(if $str is your complete string)
See the Perl docs for \P{...}. Instead of \P{L} you could also use \W, but that is locale-dependent.
But if you need to use regexps only, then you could use
my @words = $str =~ /\b(?!testword)\p{L}+\b/g;
But again, \b is locale-dependent, so you might want to use \b{...} or rebuild the word-boundary matches with \p{L}:
my @words = $str =~ /
(?:(?<=\p{L})(?!\p{L})|(?<!\p{L})(?=\p{L}))
(?!testword)\p{L}+
(?:(?<=\p{L})(?!\p{L})|(?<!\p{L})(?=\p{L}))
/gx;
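A rough Python equivalent of both Perl variants follows, under one stated assumption: Python's built-in re has no \p{L}, so the sketch uses the common stand-in [^\W\d_] (a word character that is not a digit or underscore) for a Unicode letter. The sample text is made up from the question's lines.

```python
import re

text = "sometexttobefound testword somemoretextwhichdoesnotmatter\ntes.w othertexttobefound"

LETTER = r"[^\W\d_]"      # stand-in for Perl's \p{L}
NON_LETTER = r"[\W\d_]"   # stand-in for Perl's \P{L}

# Split-and-filter variant, like the grep/split one-liner
words = [w for w in re.split(NON_LETTER + "+", text) if w and w != "testword"]

# Regex-only variant with hand-built letter boundaries: start where no letter
# precedes, reject runs beginning with "testword", end where no letter follows
words_re = re.findall(rf"(?<!{LETTER})(?!testword){LETTER}+(?!{LETTER})", text)

print(words)
print(words_re)
```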

String between two characters (dots)

Is it possible to use a regex to find the string between two dots?
I have strings with directories and I need to find the string between two dots, e.g.:
$string = '/Folder/file.co.txt';
and the regex should return ONLY co, without the dots.
I've tried the following pattern: '/..../', but it returned .co. including the dots.
Is it possible to do this with a regex, or is all I can do to slice the returned string?
You could use lookaround:
$string = '/Folder/file.co.txt';
preg_match('/(?<=\.)..(?=\.)/', $string, $matches);
echo $matches[0];
Output:
co
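If you use preg_match, you can use parentheses () to define a capturing group.
Your pattern could then look like this, with co ending up in $matches[1] (PHP's preg functions do not accept a g modifier, so it is dropped):
'~\.(..)\.~'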

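Both answers can be checked side by side in a short Python sketch using the standard re module. The lookaround version leaves the dots out of the match itself, while the group version matches them and keeps only the inner text in group 1.

```python
import re

path = "/Folder/file.co.txt"

# Lookaround: dots required on both sides, but not part of the match
between = re.search(r"(?<=\.)..(?=\.)", path).group(0)

# Capturing group: dots are matched, group 1 holds the text between them
grouped = re.search(r"\.(..)\.", path).group(1)

print(between, grouped)  # co co
```

Both patterns use .. and so match exactly two characters. For an extension of any length, [^.]+ in place of .. would be the natural generalization, though neither answer uses it.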
how to replace a string with a dynamic string

Case 1.
I have a string of letters like fthhdtrhththjgyhjdtygbh. Using a regex I want to change it to ftxxxxxxxxxxxxxxxxxxxxx, i.e., keep the first two letters and replace the rest with x.
After a lot of googling, I achieved this:
s/^(\w\w)(\w+)/$1 . "x" x length($2)/e;
Case 2.
I have a string of letters like sdsABCDEABCDEABCDEABCDEABCDEsdf. Using a regex I want to change it to sdsABCDExyxyxyABCDEsdf, i.e., keep the first and last ABCDE and replace each ABCDE in the middle with xy.
I achieved this:
s/ABCDE((ABCDE)+)ABCDE/$len = length($1)\/5; ABCDE."xy"x $len . ABCDE/e;
Problem: I am not happy with my solutions. Is there a better or neater solution?
Constraint: only one regex may be used.
Sorry for the poor English in the title and the body of the problem; English isn't my first language. Please ask in the comments if anything is not clear.
Task 1: Simplify the password hider regex
Use a Positive Lookbehind Assertion to replace all word characters preceded by two other word characters. This removes the need for the /e Modifier:
my $str = 'fthhdtrhththjgyhjdtygbh';
$str =~ s/(?<=\w{2})\w/x/g;
print $str;
Outputs:
ftxxxxxxxxxxxxxxxxxxxxx
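The lookbehind approach carries over directly to Python's re, which accepts (?<=\w{2}) because it is fixed-width. This sketch does not print an expected literal. It checks instead that only the first two letters survive.

```python
import re

s = "fthhdtrhththjgyhjdtygbh"

# Every word character with two word characters directly before it becomes "x";
# the first two letters have no such prefix and stay unchanged
masked = re.sub(r"(?<=\w{2})\w", "x", s)
print(masked)
```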
Task 2: Translate inner repeated pattern regex
Use both a Positive Lookbehind and Lookahead Assertion to replace all ABCDE that are bookended by the same string:
my $str = 'sdsABCDEABCDEABCDEABCDEABCDEsdf';
$str =~ s/(?<=(ABCDE))\1(?=\1)/xy/g;
print $str, "\n";
Output:
sdsABCDExyxyxyABCDEsdf
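The same lookbehind/lookahead substitution also runs in Python's re. As in Perl, re.sub scans the original string, so each lookbehind still sees the unreplaced ABCDE before it.

```python
import re

s = "sdsABCDEABCDEABCDEABCDEABCDEsdf"

# Replace an ABCDE only if another ABCDE sits directly before it (captured in
# the lookbehind as group 1) and directly after it (lookahead reusing \1)
result = re.sub(r"(?<=(ABCDE))\1(?=\1)", "xy", s)
print(result)  # sdsABCDExyxyxyABCDEsdf
```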
One regex, with less redundancy, using \1 to refer to the first captured group:
s|(ABCDE)\K (\1+) (?=\1)| "xy" x (length($2)/length($1)) |xe;
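Python's re has neither \K nor the /e modifier. However, re.sub accepts a function as the replacement, and that covers both roles in this sketch. The function writes group 1 back (what \K preserves in Perl) and then emits one "xy" per repeated copy, mirroring length($2)/length($1).

```python
import re

s = "sdsABCDEABCDEABCDEABCDEABCDEsdf"

def replace_inner(m):
    first, inner = m.group(1), m.group(2)
    # Keep the first copy, then one "xy" for each copy of it in the middle run
    return first + "xy" * (len(inner) // len(first))

# (\1+) backtracks until the lookahead (?=\1) leaves the last copy untouched
result = re.sub(r"(ABCDE)(\1+)(?=\1)", replace_inner, s)
print(result)  # sdsABCDExyxyxyABCDEsdf
```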

regular expression to match strings with decimals

I'm trying to create a regex which will do the following:
Name description: "QUARTERLY PATCH FOR XAQE (JUL 2013 - 11.2.0.3.20) : (125546467)"
Val version : 11.2.0.3.4
In order to output:
"Name, 11.2.0.3.20"
"Val, 11.2.0.3.4"
I have created the following regex: /^([\w]+).*([\d\.\d]+).*/, but it is only matching the last number in the 2nd group, i.e. in 11.2.0.3.4 it will only match 4. Could anyone help?
Also, there could be more than the two lines given above, so it needs to account for arbitrary lines where the version number could be anywhere in the line.
You can use a one-liner for this as well:
perl -lne '/(\w+).*?(\d+(\.\d+)+)/; print "$1, $2"' <filename>
__END__
Name, 11.2.0.3.20
Val, 11.2.0.3.4
If you are only planning for the output and not doing any processing over the captured groups, then this will do:
$str =~ s/([\n\r]|^)(Name|Val).*?(\d+(\.\d+)+).*/$1"$2, $3"/g;
Your problem is that .* is greedy and will consume as much as it can while the pattern still matches. One solution is to make it lazy: .*?
Also [\d\.\d]+ means match one of \d, \. and \d, so it's the same as [\d.]+ which isn't what you want since it would match "2013" in the first line. \d+(\.\d+)+ is more suitable.
After those 2 changes you have:
^([\w]+).*?(\d+(\.\d+)+).*
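Applied line by line in Python, the corrected pattern produces the requested output. This sketch swaps the inner (\.\d+) for a non-capturing (?:\.\d+) so only groups 1 and 2 exist. It also skips lines without a version number instead of printing stale captures.

```python
import re

lines = [
    'Name description: "QUARTERLY PATCH FOR XAQE (JUL 2013 - 11.2.0.3.20) : (125546467)"',
    "Val version : 11.2.0.3.4",
]

# Lazy .*? stops at the first dotted number; \d+(?:\.\d+)+ needs at least one
# ".digits" part, so a plain number such as 2013 is skipped
pattern = re.compile(r"^(\w+).*?(\d+(?:\.\d+)+)")

results = []
for line in lines:
    m = pattern.search(line)
    if m:
        results.append(f'"{m.group(1)}, {m.group(2)}"')

print("\n".join(results))
```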