Confusion in regex pattern for search - regex

Learning regex in bash, i am trying to fetch all lines which ends with .com
Initially i did :
cat patternNpara.txt | egrep "^[[:alnum:]]+(.com)$"
why : +matches one or more occurrences, so placing it after alnum should fetch the occurrence of any digit,word or signs but apparently, this logic is failing....
Then i did this : (purely hit-and-try, not applying any logic really...) and it worked
cat patternNpara.txt | egrep "^[[:alnum:]].+(.com)$"
whats confusing me : . matches only single occurrence, then, how am i getting the output...i mean how is it really matching the pattern???
Question : whats the difference between [[:alnum:]]+ and [[:alnum:]].+ (this one has . in it) in the above matching pattern and how its working???
PS : i am looking for a possible explanation...not, try it this way thing... :)
Some test lines for the file patternNpara.txt which are fetched as output!
valid email = abc#abc.com
invalid email = ab#abccom
another invalid = abc#.com
1 : abc,s,11#gmail.com
2: abc.s.11#gmail.com

Looking at your screenshot it seems you're trying to match email address that has # character also which is not included in your regex. You can use this regex:
egrep "[#[:alnum:]]+(\.com)" patternNpara.txt
DIfference between 2 regex:
[[:alnum:]] matches only [a-zA-Z0-9]. If you have # or , then you need to include them in character class as well.
Your 2nd case is including .+ pattern which means 1 or more matches of ANY CHARACTER

If you want to match any lines that end with '.com', you should use
egrep ".*\.com$" file.txt
To match all the following lines
valid email = abc#abc.com
invalid email = ab#abccom
another invalid = abc#.com
1 : abc,s,11#gmail.com
2: abc.s.11#gmail.com
^[[:alnum:]].+(.com)$ will work, but ^[[:alnum:]]+(.com)$ will not. Here is the reasons:
^[[:alnum:]].+(.com)$ means to match strings that start with a a-zA-Z or 0-9, flows two or more any characters, and end with a 'com' (not '.com').
^[[:alnum:]]+(.com)$ means to match strings that start with one or more a-zA-Z or 0-9, flows one character that could be anything, and end with a 'com' (not '.com').

Try this (with "positive-lookahead") :
.+(?=\.com)
Demo :
http://regexr.com?38bo0

Related

Looking for single occurrence between '{' and ':' in a large text

I'm new to the Regex world, so please be kind on the tantrums :-)
I would like to print only the first occurrence of a string between { and :.
Example in the following string:
({TRIGGER.VALUE}=0 and {Zabbix windows:zabbix[process,discoverer,avg,busy].avg(10m)}>75)
or
({TRIGGER.VALUE}=1 and {Zabbix windows:zabbix[process,discoverer,avg,busy].avg(10m)}>65)
I want it to output only Zabbix windows
how is that possible?
I tried {([a-zA-Z0-9 ]*): it is printing : and doing it twice.
Thanks for reading!
Srini
You may use a PCRE regex with -o option (extracting the matches rather than returning the whole lines) to grab the text you need and use head -1 to only have the first match:
s='({TRIGGER.VALUE}=0 and {Zabbix windows:zabbix[process,discoverer,avg,busy].avg(10m)}>75) or ({TRIGGER.VALUE}=1 and {Zabbix windows:zabbix[process,discoverer,avg,busy].avg(10m)}>65)'
echo $s | grep -oP '(?<={)[\w\s]+(?=:)' | head -1
See an online demo
Pattern details:
(?<={) - there must be a { immediately to the left of the current location
[\w\s]+ - 1+ word and/or whitespace chars
(?=:) - there must be a : immediately to the right of the current location.

Using preg_match in php to verify username ( or others )

I read through the similar questions that already been asked but I still couldnn't get it right .
http://regexr.com/39b64
\S should return everything on the keyboard except space , tab and enter .
^$ should be a whole match as it starts from ^ and ends with $ .
There was a link that also uses something similar like the above with addition of {0,} which should be infinite letters but it doesn't work on regexr.com when I tested .
Another link suggested to remove the $ and replace it with \z but it doesn't work on regexr.com as well .
I'm planning to user preg_match to see whether a not the username enter is with all characters on the keyboard except space , tab and enter .
Username = "abcCD0123_" valid
Username = "abcCD0123_!##$%^&)_[]-=\',;/`~) valid
Username = " abcd123~!#$##%[];,.;'" invalid
Username = "abcd123~!#$##%[];,.;' " invalid
Username = " abcd123~ !#$##%[];,.;' " invalid
Something like that cause' I read about a question where someone suggested to do the verification matching on the php side instead of html side for security reasons .
edit : I tried ...
/^[\S]+$/
/^[\S]*$/
/^[\S]{0,}$/
/^[^\s\S]+$/
/^[^\s\S]*$/
/^[^\s\S]{0,}$/
/^[A-Za-z0-9~!##$%^&*()_+{}|:"<>?`-=[]\;',./]+$/
/^[A-Za-z0-9~!##$%^&*()_+{}|:"<>?`-=[]\;',./]*$/
/^[A-Za-z0-9~!##$%^&*()_+{}|:"<>?`-=[]\;',./]{0,}$/
( something like this for this i can't remember cause' I modified a lot of times on this one )
You can just check that the line is composed by only non-space characters (demo):
/^\S+$/
Strings with multiple lines
The regex assumes that you are checking a single username at time (what you probably want to do in your code). But as shown in the demo and as described by user3218114 in his answer, if you have a multiple line string, you need to use the m flag to allow ^ and $ to match also for begin end of each line (otherwise it will just match begin/end of the string). This is probably why your tests weren't working.
/^\S+$/m
You need to use m (PCRE_MULTILINE) modifier if you want to use ^ and $
When this modifier is set, the "start of line" and "end of line" constructs match immediately following or immediately before any newline in the subject string, respectively, as well as at the very start and end.
Here is demo to check for any string/line that contains non white space and length is in between 8 to 50
^[^\s]{8,50}$
Online demo
OR
^\S{8,50}$
Online demo
Sample code: (Focus on m modifier)
$re = "/^[^\\s]{8,50}$/m";
$str = "...";
preg_match_all($re, $str, $matches);
Based on your examples I suppose you want to have something like that:
/^[a-z0-9_!##$%&^()_\[\]=\\',;\/~`-]+$/i
It's one character group [] which contains all allowed characters. Note, however, that one cannot just put all chars in there, some characters have special meanins in regexp and must be escaped by \ ([,],(),),/ and \ itself). You also have to be careful where to put -. In the case of a-z it means all characters between a and z (including a and z). That's why I put the - char itself at the end.
To match really everything except white-space use /^\S+$/

How to match the whole expression only, even when there are sub parts that match?

Just trying to write input validation pattern that would allow entry of wild characters. Input field is 9 char max and should follow these rules:
* + 1- 8 charcters
1- 8 chars + *
* + 1-7 chars + *
I've written this regex using the regex documentation and testing it on one of the regex testers.
\*{1}[0-9]{1,7}\*{1}|[0-9]{1,8}\*{1}|\*{1}[0-9]{1,8}|[0-9]{9}
It matches all these correctly
123456789
*1*
*12*
*123*
*1234*
*12345*
*123456*
*1234567*
1234567*
123456*
12345*
1234*
123*
12*
1*
*1
*12
*123
*1234
*12345
*123456
*1234567
*12345678
But it also matches when I don't want it. For example it finds 2 matches in this *123456789* First match is *12345678 and second one is 9*
I don't want in this case to find any matches. Either the whole string matches one of the patterns or not. How does one do that?
Use anchors that make sure the regex always matches the entire string:
^(\*[0-9]{1,7}\*|[0-9]{1,8}\*|\*[0-9]{1,8}|[0-9]{9})$
Note the parentheses to make sure that the alternation is contained within the group:
^
(
\*[0-9]{1,7}\*
|
[0-9]{1,8}\*
|
\*[0-9]{1,8}
|
[0-9]{9}
)
$
Also, {1} is always superfluous - one match per token is the default.
You could use start and end string anchors:
http://www.regular-expressions.info/anchors.html
So, your regex would be something like this (note first and last symbol):
^(\*{1}[0-9]{1,7}*{1}|[0-9]{1,8}*{1}|*{1}[0-9]{1,8}|[0-9]{9})$

Regular expression extract filename from line content

I'm very new to regular expression. I want to extract the following string
"109_Admin_RegistrationResponse_20130103.txt"
from this file content, the contents is selected per line:
01-10-13 10:44AM 47 107_Admin_RegistrationDetail_20130111.txt
01-10-13 10:40AM 11 107_Admin_RegistrationResponse_20130111.txt
The regular expression should not pick the second line, only the first line should return a true.
Your Regex has a lot of different mistakes...
Your line does not start with your required filename but you put an ^ there
missing + in your character group [a-zA-Z], hence only able to match a single character
does not include _ in your character group, hence it won't match Admin_RegistrationResponse
missing \ and d{2} would match dd only.
As per M42's answer (which I left out), you also need to escape your dot . too, or it would match 123_abc_12345678atxt too (notice the a before txt)
Your regex should be
\d+_[a-zA-Z_]+_\d{4}\d{2}\d{2}\.txt$
which can be simplified as
\d+_[a-zA-Z_]+_\d{8}\.txt$
as \d{2}\d{2} really look redundant -- unless you want to do with capturing groups, then you would do:
\d+_[a-zA-Z_]+_(\d{4})(\d{2})(\d{2})\.txt$
Remove the anchors and escape the dot:
\d+[a-zA-Z_]+\d{8}\.txt
I'm a newbie in php but i think you can use explode() function in php or any equivalent in your language.
$string = "01-09-13 10:17AM 11 109_Admin_RegistrationResponse_20130103.txt";
$pieces = explode("_", $string);
$stringout = "";
foreach($i = 0;$i<count($pieces);i++){
$stringout = $stringout.$pieces[$i];
}

Regular Expressions: querystring parameters matching

I'm trying to learn something about regular expressions.
Here is what I'm going to match:
/parent/child
/parent/child?
/parent/child?firstparam=abc123
/parent/child?secondparam=def456
/parent/child?firstparam=abc123&secondparam=def456
/parent/child?secondparam=def456&firstparam=abc123
/parent/child?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child?thirdparam=ghi789
/parent/child/
/parent/child/?
/parent/child/?firstparam=abc123
/parent/child/?secondparam=def456
/parent/child/?firstparam=abc123&secondparam=def456
/parent/child/?secondparam=def456&firstparam=abc123
/parent/child/?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child/?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child/?thirdparam=ghi789
My expression should "grabs" abc123 and def456.
And now just an example about what I'm not going to match ("question mark" is missing):
/parent/child/firstparam=abc123&secondparam=def456
Well, I built the following expression:
^(?:/parent/child){1}(?:^(?:/\?|\?)+(?:firstparam=([^&]*)|secondparam=([^&]*)|[^&]*)?)?
But that doesn't work.
Could you help me to understand what I'm doing wrong?
Thanks in advance.
UPDATE 1
Ok, I made other tests.
I'm trying to fix the previous version with something like this:
/parent/child(?:(?:\?|/\?)+(?:firstparam=([^&]*)|secondparam=([^&]*)|[^&]*)?)?$
Let me explain my idea:
Must start with /parent/child:
/parent/child
Following group is optional
(?: ... )?
The previous optional group must starts with ? or /?
(?:\?|/\?)+
Optional parameters (I grab values if specified parameters are part of querystring)
(?:firstparam=([^&]*)|secondparam=([^&]*)|[^&]*)?
End of line
$
Any advice?
UPDATE 2
My solution must be based just on regular expressions.
Just for example, I previously wrote the following one:
/parent/child(?:[?&/]*(?:firstparam=([^&]*)|secondparam=([^&]*)|[^&]*))*$
And that works pretty nice.
But it matches the following input too:
/parent/child/firstparam=abc123&secondparam=def456
How could I modify the expression in order to not match the previous string?
You didn't specify a language so I'll just usre Perl. So basically instead of matching everything, I just matched exactly what I thought you needed. Correct me if I am wrong please.
while ($subject =~ m/(?<==)\w+?(?=&|\W|$)/g) {
# matched text = $&
}
(?<= # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
= # Match the character “=” literally
)
\\w # Match a single character that is a “word character” (letters, digits, and underscores)
+? # Between one and unlimited times, as few times as possible, expanding as needed (lazy)
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
# Match either the regular expression below (attempting the next alternative only if this one fails)
& # Match the character “&” literally
| # Or match regular expression number 2 below (attempting the next alternative only if this one fails)
\\W # Match a single character that is a “non-word character”
| # Or match regular expression number 3 below (the entire group fails if this one fails to match)
\$ # Assert position at the end of the string (or before the line break at the end of the string, if any)
)
Output:
This regex will work as long as you know what your parameter names are going to be and you're sure that they won't change.
\/parent\/child\/?\?(?:(?:firstparam|secondparam|thirdparam)\=([\w]+)&?)(?:(?:firstparam|secondparam|thirdparam)\=([\w]+)&?)?(?:(?:firstparam|secondparam|thirdparam)\=([\w]+)&?)?
Whilst regex is not the best solution for this (the above code examples will be far more efficient, as string functions are way faster than regexes) this will work if you need a regex solution with up to 3 parameters. Out of interest, why must the solution use only regex?
In any case, this regex will match the following strings:
/parent/child?firstparam=abc123
/parent/child?secondparam=def456
/parent/child?firstparam=abc123&secondparam=def456
/parent/child?secondparam=def456&firstparam=abc123
/parent/child?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child?thirdparam=ghi789
/parent/child/?firstparam=abc123
/parent/child/?secondparam=def456
/parent/child/?firstparam=abc123&secondparam=def456
/parent/child/?secondparam=def456&firstparam=abc123
/parent/child/?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child/?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child/?thirdparam=ghi789
It will now only match those containing query string parameters, and put them into capture groups for you.
What language are you using to process your matches?
If you are using preg_match with PHP, you can get the whole match as well as capture groups in an array with
preg_match($regex, $string, $matches);
Then you can access the whole match with $matches[0] and the rest with $matches[1], $matches[2], etc.
If you want to add additional parameters you'll also need to add them in the regex too, and add additional parts to get your data. For example, if you had
/parent/child/?secondparam=def456&firstparam=abc123&fourthparam=jkl01112&thirdparam=ghi789
The regex will become
\/parent\/child\/?\?(?:(?:firstparam|secondparam|thirdparam|fourthparam)\=([\w]+)&?)(?:(?:firstparam|secondparam|thirdparam|fourthparam)\=([\w]+)&?)?(?:(?:firstparam|secondparam|thirdparam|fourthparam)\=([\w]+)&?)?(?:(?:firstparam|secondparam|thirdparam|fourthparam)\=([\w]+)&?)?
This will become a bit more tedious to maintain as you add more parameters, though.
You can optionally include ^ $ at the start and end if the multi-line flag is enabled. If you also need to match the whole lines without query strings, wrap this whole regex in a non-capture group (including ^ $) and add
|(?:^\/parent\/child\/?\??$)
to the end.
You're not escaping the /s in your regex for starters and using {1} for a single repetition of something is unnecessary; you only use those when you want more than one repetition or a range of repetitions.
And part of what you're trying to do is simply not a good use of a regex. I'll show you an easier way to deal with that: you want to use something like split and put the information into a hash that you can check the contents of later. Because you didn't specify a language, I'm just going to use Perl for my example, but every language I know with regexes also has easy access to hashes and something like split, so this should be easy enough to port:
# I picked an example to show how this works.
my $route = '/parent/child/?first=123&second=345&third=678';
my %params; # I'm going to put those URL parameters in this hash.
# Perl has a way to let me avoid escaping the /s, but I wanted an example that
# works in other languages too.
if ($route =~ m/\/parent\/child\/\?(.*)/) { # Use the regex for this part
print "Matched route.\n";
# But NOT for this part.
my $query = $1; # $1 is a Perl thing. It contains what (.*) matched above.
my #items = split '&', $query; # Each item is something like param=123
foreach my $item (#items) {
my ($param, $value) = split '=', $item;
$params{$param} = $value; # Put the parameters in a hash for easy access.
print "$param set to $value \n";
}
}
# Now you can check the parameter values and do whatever you need to with them.
# And you can add new parameters whenever you want, etc.
if ($params{'first'} eq '123') {
# Do whatever
}
My solution:
/(?:\w+/)*(?:(?:\w+)?\?(?:\w+=\w+(?:&\w+=\w+)*)?|\w+|)
Explain:
/(?:\w+/)* match /parent/child/ or /parent/
(?:\w+)?\?(?:\w+=\w+(?:&\w+=\w+)*)? match child?firstparam=abc123 or ?firstparam=abc123 or ?
\w+ match text like child
..|) match nothing(empty)
If you need only query string, pattern would reduce such as:
/(?:\w+/)*(?:\w+)?\?(\w+=\w+(?:&\w+=\w+)*)
If you want to get every parameter from query string, this is a Ruby sample:
re = /\/(?:\w+\/)*(?:\w+)?\?(\w+=\w+(?:&\w+=\w+)*)/
s = '/parent/child?secondparam=def456&firstparam=abc123&thirdparam=ghi789'
if m = s.match(re)
query_str = m[1] # now, you can 100% trust this string
query_str.scan(/(\w+)=(\w+)/) do |param,value| #grab parameter
printf("%s, %s\n", param, value)
end
end
output
secondparam, def456
firstparam, abc123
thirdparam, ghi789
This script will help you.
First, i check, is there any symbol like ?.
Then, i kill first part of line (left from ?).
Next, i split line by &, where each value splitted by =.
my $r = q"/parent/child
/parent/child?
/parent/child?firstparam=abc123
/parent/child?secondparam=def456
/parent/child?firstparam=abc123&secondparam=def456
/parent/child?secondparam=def456&firstparam=abc123
/parent/child?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child?thirdparam=ghi789
/parent/child/
/parent/child/?
/parent/child/?firstparam=abc123
/parent/child/?secondparam=def456
/parent/child/?firstparam=abc123&secondparam=def456
/parent/child/?secondparam=def456&firstparam=abc123
/parent/child/?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child/?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child/?thirdparam=ghi789";
for my $string(split /\n/, $r){
if (index($string,'?')!=-1){
substr($string, 0, index($string,'?')+1,"");
#say "string = ".$string;
if (index($string,'=')!=-1){
my #params = map{$_ = [split /=/, $_];}split/\&/, $string;
$"="\n";
say "$_->[0] === $_->[1]" for (#params);
say "######next########";
}
else{
#print "there is no params!"
}
}
else{
#say "there is no params!";
}
}