PowerGREP - regular expression - regex

I have an Apache log, and each line of the file looks like this:
script.php?variable1=value1&variable2=value2&variable3=value3&.........................
I need to extract this part of the string:
variable1=value1&variable2=value2
and ignore the rest of the line. How can I do this in PowerGREP?
I tried:
variable1=(.*)&variable2=(.*)&
But the match also includes the rest of the line after value2.
Please help me, and sorry for my English.

Contrary to what Ed Cottrell wrote about his second example, the first one works better (i.e. correctly): if the subexpression for value2 is made non-greedy and nothing follows it in the pattern, it matches as few characters as possible, i.e. none at all.
If you don't mind having the & after value2 included in the match, you could also refine your attempt by making the subexpression for value2 non-greedy, so that it only extends to the next &:
variable1=(.*)&variable2=(.*?)&

Replace . with [^&] and drop the final &, like this:
variable1=(.*)&variable2=([^&]*)
. will match anything it can (any character except for the newline character, basically). [^&], on the other hand, matches only characters that are not &.
For even better results and faster performance, you can also replace the first . in the same way and add ? (the non-greedy qualifier), like so:
variable1=([^&]*?)&variable2=([^&]*?)
Here's a working demo.
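To compare the patterns discussed in this thread side by side, here is a minimal sketch. It assumes Python's re module rather than PowerGREP, and the sample line (including variable4) is made up for illustration, so treat it as a rough check rather than a statement about PowerGREP's behaviour.

```python
import re

# Hypothetical log line modelled on the one in the question.
line = ("script.php?variable1=value1&variable2=value2"
        "&variable3=value3&variable4=value4")

# The four patterns that appear in the question and the answers.
patterns = {
    "greedy": r"variable1=(.*)&variable2=(.*)&",
    "lazy_with_amp": r"variable1=(.*)&variable2=(.*?)&",
    "negated_class": r"variable1=(.*)&variable2=([^&]*)",
    "all_lazy": r"variable1=([^&]*?)&variable2=([^&]*?)",
}

# Full text matched by each pattern on the same line.
results = {name: re.search(p, line).group(0) for name, p in patterns.items()}

for name, matched in results.items():
    print(f"{name:14} -> {matched}")
```

In this sketch only the negated-class pattern returns exactly variable1=value1&variable2=value2. The all-lazy pattern stops right after "variable2=" because a lazy quantifier with nothing after it matches as little as possible.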


RegEx that matches characters after semicolon in the same line

I need some help with regular expressions. I need a regex that matches characters if they come after a semicolon AND are on the same line as a previous word.
Let me explain that:
I need something like this. I have to write a function that does not allow characters to be entered after a semicolon on the same line, and I think I could do it with this sort of regex.
Thank you.
I am not sure I understood your question, but would something like this help?
This regular expression
Well, you've got two ways to do it:
A: Create a regular expression to validate correct input.
B: Create a regular expression to find incorrect input.
I would use option A, but it depends on what you need to do.
A: Regex to validate correct lines
In this case, we'll use the m modifier so that the regex engine works line by line (m = multiline). This means that ^ matches the beginning of a line and $ matches the end of a line.
Then we want to match some characters which are not the semicolon itself. To do this we use a negated character class, [^ ], meaning "any character which is not in the provided list". So to say any char except the semicolon we use [^;].
Now, this char will not be alone, as there will probably be many of them. To allow that we can use either the * or the + quantifier, which mean "0 or more times" and "1 or more times" respectively. If the data before the semicolon is mandatory, we use +. This leads to [^;]+, meaning any char which is not a semicolon, 1 or more times.
Then we capture this with the () operators. This gives us direct access to the value without having to take the line and strip the semicolon ourselves.
After this capture group we have the semicolon, then optionally some whitespace, and then the end of the line. Whether to allow the trailing whitespace is up to you. It would be \s* to say any kind of space, tab or blank char, 0 or more times.
At the end we get this regex: ^([^;]+);\s*$ with the m and g flags:
m for multiline and g for global, which means don't stop at the first match but look for all of them.
Test it here: https://regex101.com/r/sT59eu/1/
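As a rough illustration of option A, here is a small sketch in Python (an assumption, since the question does not name a language). The sample text is hypothetical; re.MULTILINE plays the role of the m flag and re.findall the role of g.

```python
import re

# Hypothetical input: the second line has text after the semicolon.
text = "a = 1;\nb = 2; extra\nc = 3;  "

# Same pattern as above; MULTILINE makes ^ and $ work per line.
valid_line = re.compile(r"^([^;]+);\s*$", re.MULTILINE)

# With a single capture group, findall returns the captured part
# (the text before the semicolon) of every valid line.
valid_parts = valid_line.findall(text)
print(valid_parts)
```

One caveat with this sketch: [^;] also matches a newline, so a line with no semicolon at all could be merged with the line that follows it. Checking each line separately avoids that.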
B: Regex to find invalid lines
Well, this could be rather easy too: ;.+$
. means any char (except a newline). So here we'll find the lines that have something after the semicolon.
Test it here: https://regex101.com/r/ocDofm/1/
But you will NOT find lines with missing semicolons!
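A minimal sketch of option B, again assuming Python and a made-up sample text. Each line is tested on its own here, so no multiline flag is needed.

```python
import re

# Hypothetical input: line 2 has text after the semicolon,
# line 4 has no semicolon at all.
text = "a = 1;\nb = 2; extra\nc = 3;\nd = 4"

# Same pattern as above: a semicolon followed by at least one character.
invalid_tail = re.compile(r";.+$")

# Collect (line number, line) for every line the pattern flags.
offending = [
    (number, line)
    for number, line in enumerate(text.splitlines(), start=1)
    if invalid_tail.search(line)
]
print(offending)
```

Line 4 is not reported even though its semicolon is missing, which is the limitation mentioned above. Note also that trailing spaces after the semicolon would be flagged, because . matches a space.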
If I understand it correctly,
(?<=;)[A-Za-z]+
might do the job.
The Python documentation is helpful: https://docs.python.org/3/library/re.html

Get all matches for a certain pattern using RegEx

I am not really a RegEx expert and hence asking a simple question.
I have a few parameters that I need to use which are in a particular pattern
For example
$$DATA_START_TIME
$$DATA_END_TIME
$$MIN_POID_ID_DLAY
$$MAX_POID_ID_DLAY
$$MIN_POID_ID_RELTM
$$MAX_POID_ID_RELTM
And these will be replaced at runtime in a string with their values (a SQL statement).
For example I have a simple query
select * from asdf where asdf.starttime = $$DATA_START_TIME and asdf.endtime = $$DATA_END_TIME
Now when I try to use the RegEx pattern
\$\$[^\W+]\w+$
I do not get all the matches (I only get the last match).
I am trying to test my usage here https://regex101.com/r/xR9dG0/2
If someone could correct my mistake, I would really appreciate it.
Thanks!
This will do the job:
\$\$\w+/g
See Demo
Just some clarifications on why your regex does what it does:
\$\$[^\W+]\w+$
An unescaped $ means end of string, so your pattern only matches something at the very end of the string. That's why it only gets the last match.
The class [^\W+] doesn't really make sense. A character class starting with [^ negates the characters inside it, \W means any non-word character, and + inside a class is a literal +. So you are saying "match anything that is not a non-word character and not a + sign", which I guess is not what you wanted.
To match the next word, \w+ alone will do. And the global modifier /g ensures that you do not stop at the first match.
This should work, based on what you said you wanted to match. Note that it won't match $$lower_case_strings; if you want those too, add the "i" flag.
\${2}[A-Z_]+/g
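To make the difference between the original and the corrected pattern concrete, here is a small sketch in Python (an assumption; the question only links to regex101). Python has no /g flag, so re.findall is used to collect every match, and the replacement values are hypothetical.

```python
import re

query = ("select * from asdf where asdf.starttime = $$DATA_START_TIME "
         "and asdf.endtime = $$DATA_END_TIME")

# Original pattern: the trailing $ anchors it to the end of the string.
original = re.findall(r"\$\$[^\W+]\w+$", query)

# Corrected pattern: two literal dollar signs followed by word characters.
fixed = re.findall(r"\$\$\w+", query)

# Hypothetical runtime values for the parameters.
values = {
    "$$DATA_START_TIME": "'2020-01-01'",
    "$$DATA_END_TIME": "'2020-01-31'",
}

# Replace each parameter with its value via a callback.
filled = re.sub(r"\$\$\w+", lambda m: values[m.group(0)], query)

print(original)
print(fixed)
print(filled)
```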

Regex to match "Warm Regards"-type email signatures

I am an absolute regex noob and have been banging my head against the wall trying to write a regex to remove email signatures from a string that look like this:
Hi There, this is an email.
Warm Regards,
Joe Bloggs
Thus far, I’ve tried variations on:
/^[\w |][R|r]egards,/
The regex should:
look at the beginning of the line (what I was aiming for with the ^),
cover variations like “Warm Regards”, “Kind Regards”, “Best Regards”, and plain old “Regards” (which I was hoping to accomplish with the [\w |] to match any word or blank and the [R|r] to cover Regards/regards),
be OK with mixed case like “warm regards” or “Warm Regards”, and
only pick up lines that are [word] Regards or just regards, so that we don’t grab email body that has the word “regards” somewhere in it.
This seems elementary, but I just can’t nail it, and I seem to err on broadening my regex too much such that any line that contains “regards” gets picked up. I’m doing this in Node.js combined with the string.search function if that matters.
This seems to fit all your requirements:
^(\w*\s)?[rR]egards,?
It has to start at the beginning of a line, then can have any word followed by a space and the word regards, or just the word regards, with the comma also being optional. (Inside a character class | is a literal character, so [rR] is enough to cover both cases.)
If you want to wipe out everything after the regards line as well you can add in \s*.*
^(\w*\s)?[rR]egards,?\s*.*
If you are trying to remove everything from the Warm Regards line on, this should do it
^[^<]*?(?=(.*)[R|r]egards)
Try the following regular expression
^\w* ?regards,?
with the case-insensitive and global flags specified.
You can see the regular expression explanation and what it matches here: http://regex101.com/r/vR3zG5
The regular expression that matches signatures as defined in #1-#4 is the following:
/^(\w+ +)?regards,? *$/im
How it works:
"^" at the beginning means start of line
"(\w+ +)?" means an optional segment that contains exactly one word followed by at least one space
"regards" is just a simple match
",?" optional comma at the end
" *" - the line may contain trailing spaces (it may be useful to put the same match after ^)
"$" - end of line
/.../i - means that the expression is case-insensitive
/.../m - means that ^ and $ match at line breaks
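The pattern explained above can be tried out with a short sketch. The question uses Node.js, so Python here is an assumption made only for illustration, and strip_signature is a hypothetical helper name.

```python
import re

# Same pattern as above; IGNORECASE and MULTILINE stand in for /i and /m.
SIGN_OFF = re.compile(r"^(\w+ +)?regards,? *$", re.IGNORECASE | re.MULTILINE)


def strip_signature(email: str) -> str:
    """Return the email with everything from the sign-off line onward removed."""
    match = SIGN_OFF.search(email)
    if match is None:
        return email
    return email[:match.start()].rstrip()


email = "Hi There, this is an email.\nWarm Regards,\nJoe Bloggs"
body_only = "I am writing with regards to your invoice.\nPlease reply soon."

print(strip_signature(email))
print(strip_signature(body_only))
```

The second message is returned unchanged, because "regards" in the middle of a sentence is not a whole line of the form [word] regards.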

The regex characters ?, $, |

I have the following data:
abc def; ghi.
This regex will match:
([a-z0-9A-ZÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðòóôõöùúûüýÿ ]*)\W (.*)( (\w\.))?
This regex will also match
([a-z0-9A-ZÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðòóôõöùúûüýÿ ]*)\W (.*)$
I'm still quite new to regexes, but I thought | stood for OR, () grouped and ? stood for 0 or 1 occurrences. So I thought that when combining the above patterns it would still match. However, the following will not match:
([a-z0-9A-ZÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðòóôõöùúûüýÿ ]*)\W (.*)( (\w\.))|$
What am I doing wrong?
ps.
I am using the following for testing my regex.
http://regexpal.com/
EDIT:
I didn't use the code tag, so a character disappeared
EDIT2:
What I am trying to match is the following; the data will be a name.
So "abc def" is the surname and ghi the salutation (English is not my native language; is that the correct term for words like sir.?). It is, however, possible that the first letter of the first name follows. That's why the pattern should end with either the end of the line or that letter.
The data when there is a first name involved would be:
abc; def. G.
Operator precedence for the | operator is a little tricky. It's usually a good idea to explicitly wrap its two operands in parentheses.
Also, be careful about inserting spaces into your regexes. It looks like you want to match a literal period in the \w. fragment, to match "G."
So I think what you want for the combined expressions is something like
((.*)( (\w\.))?)|(.*)$
But since ? means 0 or 1, as you have learned, this can be rewritten as
(.*)( (\w\.))?$
And, to add the rest of the expression back in, we have
^([a-z0-9A-ZÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðòóôõöùúûüýÿ ]*)\W (.*)( (\w\.))?$
And, yes, "salutation" or "greeting" is a good word for "Mr.", "Ms.", "Dr.", etc.
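The precedence point made in this answer can be seen in a small sketch. It assumes Python's re module and uses simplified, hypothetical patterns (just the "salutation plus optional initial" tail), not the full expression from the question.

```python
import re


def matched_text(pattern, text):
    """Return the text matched by the first match, or None if there is none."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


# Without parentheses, | splits the whole pattern in two:
# either "word. X." or just the end of the string.
unparenthesized = r"\w+\.( \w\.)|$"

# Grouping the alternatives limits | to the tail of the pattern.
grouped = r"\w+\.(( \w\.)|$)"

# The same idea written with an optional group and a final $.
optional = r"\w+\.( \w\.)?$"

for pattern in (unparenthesized, grouped, optional):
    print(repr(matched_text(pattern, "ghi.")),
          repr(matched_text(pattern, "def. G.")))
```

With the unparenthesized pattern, "ghi." only produces an empty match at the end of the string, because the second alternative is the bare $.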

Regex or on multiple/single characters

I'm dynamically making a regex.
I want it to match the following:
lem
le,,m
levm
lecm
Basically, "lem" but before the m it can have any number of , or any one of any character. Right now I have
le[\,]{0,}[.]?m
you can see it at
http://regexr.com?303ne
It should match every one but the third one.
Update: I figured it out:
le[\,]{0,}.?m
Whenever you think "or" in Regular Expressions, you should start with alternation:
a|b
matches either a or b. So
any number of a list of characters OR 1 of any character
can be translated quite literally to
[...]*|.
where ... would be the list of characters to match (a character class). If you use that as part of a longer expression, you need to use parentheses, because concatenation binds more tightly (has higher precedence) than alternation:
le([,]*|.)m
Because the character class has only one item, we can simplify this:
le(,*|.)m
Note that . by default means "any character but newline".
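A quick way to check the simplified pattern against the examples from the question is sketched below. Python is an assumption here, and the last two candidates are hypothetical negative cases added for contrast.

```python
import re

pattern = re.compile(r"le(,*|.)m")

# The first four strings come from the question; the last two are made up.
candidates = ["lem", "le,,m", "levm", "lecm", "le,xm", "lexxm"]

# fullmatch requires the whole string to match the pattern.
matches = {word: bool(pattern.fullmatch(word)) for word in candidates}

for word, ok in matches.items():
    print(f"{word:6} {ok}")
```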
What about this:
le(,*|.?)m
it should do what you want.
How about this one:
([^,])(?=\\1)
But this does the opposite :-) Not sure if it is ok for you
UPD:
this should work for you:
~^(?:,|([^,])(?!\\1))+$~
not sure what dialect you're looking for, but it works in PCRE: http://ideone.com/6Q3Wk
UPD2:
the same regex included into another
$r = '(?:,|([^,])(?!\\1))+';
var_dump(preg_match('~le' . $r . 'm~', 'leem'));
In this case the final expression becomes: le(?:,|([^,])(?!\\1))+m where le and m are added around mine without modifications