procmail find first 3 upper case chars in subject - procmail

I am new to procmail and struggling to understand the syntax.
What I want to do is check whether the subject line begins with 3 upper case chars followed by a colon and, if it does, remove the colon from the end and perform an action, e.g.:
Subject: ABC: Other parts of the subject
:0
* $ ^Subject:/^[A-Z]{3}:$/
| /usr/bin/zarafa-dagent -C -P 'Support\\$1' vmail
Firstly I'm not sure if my regex is correct, and secondly, despite a lot of googling I can't figure out how to save my search into a variable to use elsewhere, I tried $1 for the first returned variable but that does not appear to work.
Any help would be much appreciated.

You can post-process the value of $MATCH to trim the colon.
:0 D
* ^Subject:[ 	]*\/[A-Z][A-Z][A-Z]:
{
:0
* MATCH ?? ^^\/[A-Z][A-Z][A-Z]
| /usr/bin/zarafa-dagent -C -P "Support\\$MATCH" vmail
}
The first condition captures the three uppercase characters and the colon into MATCH. The second matches this value against three uppercase characters, and captures just that part into the new value for MATCH.
As usual, the whitespace inside the brackets after Subject: consists of a space and a tab.
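If it helps to see the two-step capture outside procmail, here is a rough Python analogue of the recipe above. It is only a sketch under some assumptions: `support_folder` is a made-up helper name, `[ \t]*` stands in for the space-and-tab bracket, and procmail's `\/` capture into MATCH is imitated with an ordinary group.

```python
import re

def support_folder(subject_header):
    """Imitate the two procmail conditions; return None when the subject does not qualify."""
    # First condition: skip spaces/tabs after "Subject:", then capture
    # three capitals plus the colon (what procmail would put in MATCH).
    first = re.match(r"Subject:[ \t]*([A-Z]{3}:)", subject_header)
    if first is None:
        return None
    match = first.group(1)  # e.g. "ABC:"

    # Second condition: re-match the captured value from its start and
    # keep only the three letters, which drops the trailing colon.
    second = re.match(r"[A-Z]{3}", match)
    return "Support\\" + second.group(0)

print(support_folder("Subject: ABC: Other parts of the subject"))  # prints Support\ABC
print(support_folder("Subject: abc: lower case is ignored"))       # prints None
```

Python's `re` is case-sensitive by default, which corresponds to what the D flag does in the recipe.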

OK, solved this. procmail has its own version of regex:
:0 D
* ^Subject:.*\/([A-Z]+[A-Z]+[A-Z]):
| /usr/bin/zarafa-dagent -C -P "Support\\$MATCH" vmail
EXITCODE=$?
It does not support interval quantifiers such as [A-Z]{3}, so you have to repeat the expression.
Also, matching is case-insensitive by default, so you need to add the "D" flag.
The problem is that I seem to be unable to remove the colon : from the end.

Related

how to match sub conditions occurring once using regex?

Hi everybody, I'm trying to match the following condition using a regex:
The string starts with P
followed by one of the following:
-------- and/or a number (digits 0-9) followed by D, occurring once
-------- and/or a number (digits 0-9) followed by M, occurring once
-------- and/or a number (digits 0-9) followed by Y, occurring once
-------- and/or T followed by one of the following:
------------------------------------------ and/or a number (digits 0-9) followed by H, occurring once
------------------------------------------ and/or a number (digits 0-9) followed by M, occurring once
------------------------------------------ and/or a number (digits 0-9) followed by S, occurring once
I tried the following, with partial success:
P?(([0-9]{1,}D)|([0-9]{1,}M)|([0-9]{1}Y)|(T?(([0-9]{1,}H)|([0-9]{1,}M)|([0-9]{1,}S))))
but it matches any number of occurrences of any of the conditions listed above.
Any idea how I can express this condition as a regex?
Edit
Eventually I found what I was looking for:
/^P(?=\w*\d)(?:\d+Y|Y)?(?:\d+M|M)?(?:\d+W|W)?(?:\d+D|D)?(?:T(?:\d+H|H)?(?:\d+M|M)?(?:\d+(?:\.\d{1,2})?S|S)?)?$/
I'm not sure I fully understand your spec, but is this getting close?
P([0-9]+D)?([0-9]+M)?([0-9]+Y)?(T([0-9]+H)?([0-9]+M)?([0-9]+S)?)?
Everything is optional except the leading P, order matters, each section can occur only
once, and the number of digits used in each case is one or more. T is required if anything
after it is included.
The RE above matches "P" and "PT", while the spec presumably requires at least one of the optional components to follow P and T. Using lookahead with grep -P (for Perl regular expressions), we can require P to be followed by a digit.
$ RE='P(?=[0-9])([0-9]+D)?([0-9]+M)?([0-9]+Y)?(T(?=[0-9])([0-9]+H)?([0-9]+M)?([0-9]+S)?)?'
$ for s in P1DT5S P1DT5 P1DT P1D P1 P
do
printf "%-10s %s\n" $s $(echo $s | grep -P -o $RE)
done
P1DT5S P1DT5S
P1DT5 P1DT
P1DT P1D
P1D P1D
P1 P
P
$
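The same experiment can be repeated in Python, as a sketch. The expression is copied from the grep -P example; `matched_part` and `is_whole_match` are helper names invented here. The second helper uses `re.fullmatch`, which is one way to reject inputs such as P1DT5 where only a prefix matches.

```python
import re

# Same expression as the grep -P example; the lookaheads require a digit
# immediately after P and immediately after T.
RE = re.compile(
    r"P(?=[0-9])([0-9]+D)?([0-9]+M)?([0-9]+Y)?"
    r"(T(?=[0-9])([0-9]+H)?([0-9]+M)?([0-9]+S)?)?"
)

def matched_part(s):
    """Return the text the expression matches inside s, like grep -o."""
    m = RE.search(s)
    return m.group(0) if m else ""

def is_whole_match(s):
    """True only when the entire string matches the expression."""
    return RE.fullmatch(s) is not None

for s in ["P1DT5S", "P1DT5", "P1DT", "P1D", "P1", "P"]:
    print(f"{s:<10} {matched_part(s):<10} {is_whole_match(s)}")
```

Of the six samples, only P1DT5S and P1D match in full; the others either match a shorter prefix or nothing at all.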

Regex with global modifier to capture words within lines

The Input:
Let's consider this string below
* key : foo bar *
* big key : bar*bar
* healthy : cereal bar *
sadly : without star *
The Output:
I would like to retrieve the key:value pairs for each match.
'key', 'foo bar'
'big key', 'bar*bar'
'healthy', 'cereal bar'
'sadly', 'without star'
The Regex:
My first success was achieved with this Regex (PCRE/Perl):
/(\n?)([^\* ].*[^ *])\s+:\s+([^\* ].*[^ *])[\s\*]+(?|\n)/g
Here the DEMO.
My question
I really find my regex pretty ugly. The main reason is because I can't use /^ and $/ in a global regex and I had to play with /(\n?)...(?|\n)/g.
Is there any possibility to shorten the above regex ?
The optional challenge
Actually this was the easy part. My string is supposed to be embedded in a C comment and I have to make sure I am not trying to match something outside a comment block.
(I don't really need an answer to this second, trickier question, because if I write a script I can first match all the comment blocks and then find all the key:value patterns.)
/********************************
* key : foo bar *
* big key : bar*bar
* healthy : /*cereal bar *
sadly : without star *
********************************/
not a key : this key
You can add the m flag to the regexp to make the anchors ^ and $ match at the beginning and end of each line within the string, i.e.:
/^\s*\*?\s*([^:]+?)\s*:\s*(.*?)\s*\*?\s*$/gm
Note the use of non-greedy quantifiers (+? and *?) to not eat up characters that can be matched after the quantifier, i.e. the first capture group will not include the optional trailing whitespace before the colon, and the second capture group will not include trailing whitespace and an optional asterisk at the end of a line.
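For what it's worth, here is how that expression behaves in Python, as a small sketch. `re.MULTILINE` plays the role of the m flag and `findall` plays the role of g; the input is the four-line sample from the question.

```python
import re

text = "\n".join([
    "* key : foo bar *",
    "* big key : bar*bar",
    "* healthy : cereal bar *",
    "sadly : without star *",
])

# re.MULTILINE makes ^ and $ match at every line boundary (the /m flag);
# findall returns every match in the string (the /g flag).
pattern = re.compile(r"^\s*\*?\s*([^:]+?)\s*:\s*(.*?)\s*\*?\s*$", re.MULTILINE)

pairs = pattern.findall(text)
for key, value in pairs:
    print(repr(key), repr(value))
```

This prints the four pairs asked for in the question: 'key' 'foo bar', 'big key' 'bar*bar', 'healthy' 'cereal bar' and 'sadly' 'without star'.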
http://regex101.com/r/oJ8uW4/1
the regex I used is: /^\s*[*]*\s+(.*)\s+:\s+(.*?)\s+[*]*\s*$/gm
It works for your example because the line "not a key : this key" has no space after it; by the same token it would also miss comment lines that do not close with *, and it picks up values with trailing spaces too.
The part you're looking for is the modifiers after the last /:
m says the match is multiline, so ^ and $ are usable on every line, and g rematches on each line.
The drawback is that you can't rely on having /* and */ on the surrounding lines when using ^ and $.
But Avinash will prove me wrong I bet :) (he's far better than me with regexes)

Confusion in regex pattern for search

I'm learning regex in bash and am trying to fetch all lines which end with .com
Initially I did:
cat patternNpara.txt | egrep "^[[:alnum:]]+(.com)$"
Why: + matches one or more occurrences, so placing it after alnum should fetch any run of digits, letters or signs, but apparently this logic is failing...
Then I did this (purely trial and error, not applying any logic really) and it worked:
cat patternNpara.txt | egrep "^[[:alnum:]].+(.com)$"
What's confusing me: . matches only a single occurrence, so how am I getting the output? I mean, how is it really matching the pattern?
Question: what's the difference between [[:alnum:]]+ and [[:alnum:]].+ (this one has . in it) in the above patterns, and how does it work?
PS: I am looking for an explanation, not a "try it this way" answer :)
Some test lines for the file patternNpara.txt which are fetched as output!
valid email = abc#abc.com
invalid email = ab#abccom
another invalid = abc#.com
1 : abc,s,11#gmail.com
2: abc.s.11#gmail.com
Looking at your screenshot it seems you're trying to match email address that has # character also which is not included in your regex. You can use this regex:
egrep "[#[:alnum:]]+(\.com)" patternNpara.txt
Difference between the 2 regexes:
[[:alnum:]] matches only [a-zA-Z0-9]. If you have # or , then you need to include them in character class as well.
Your 2nd case is including .+ pattern which means 1 or more matches of ANY CHARACTER
If you want to match any lines that end with '.com', you should use
egrep ".*\.com$" file.txt
To match all the following lines
valid email = abc#abc.com
invalid email = ab#abccom
another invalid = abc#.com
1 : abc,s,11#gmail.com
2: abc.s.11#gmail.com
^[[:alnum:]].+(.com)$ will work, but ^[[:alnum:]]+(.com)$ will not. Here are the reasons:
^[[:alnum:]].+(.com)$ matches strings that start with one character from a-zA-Z or 0-9, followed by one or more characters of any kind, then any single character, and end with 'com' (the unescaped . does not have to be a literal dot, so this is not the same as '.com').
^[[:alnum:]]+(.com)$ matches strings that start with one or more characters from a-zA-Z or 0-9, followed by exactly one character that could be anything, and end with 'com' (again, not necessarily '.com').
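The difference is easy to check with a short Python sketch. Two assumptions to flag: Python's `re` has no POSIX `[[:alnum:]]` class, so `[A-Za-z0-9]` is substituted here, and the last sample line ("gmail.com") is not from the question; it is added only to show an input the stricter pattern does accept.

```python
import re

lines = [
    "valid email = abc#abc.com",
    "invalid email = ab#abccom",
    "another invalid = abc#.com",
    "1 : abc,s,11#gmail.com",
    "2: abc.s.11#gmail.com",
    "gmail.com",  # extra line, not in the question's file
]

# [A-Za-z0-9] stands in for [[:alnum:]] (ASCII-only assumption).
plus_only = re.compile(r"^[A-Za-z0-9]+(.com)$")       # alnum run, one any-char, "com"
with_dot_plus = re.compile(r"^[A-Za-z0-9].+(.com)$")  # one alnum, then anything, "com"

for line in lines:
    print(f"{line!r:32} "
          f"plus_only={bool(plus_only.search(line))} "
          f"with_dot_plus={bool(with_dot_plus.search(line))}")
```

The first pattern fails on the five lines from the question because spaces, =, # and commas are not alphanumeric, so the leading run can never reach the final "com". The second pattern accepts all of them because .+ consumes any character.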
Try this (with "positive-lookahead") :
.+(?=\.com)
Demo :
http://regexr.com?38bo0

How can I use regex to ignore strings if they contain a certain string

I am trying to use regex to scan through some log files. In particular, I am looking to pick out lines that meet this format:
IP address or random number "banned.", so for example, "111.111.111.111 banned." or "0320932 banned.", etc.
There should only be 2 groups of characters (the number/IP address and "banned."). There may be more than one space between the words or before them. The string should also not contain "client", "[private]", or "request". For the most part I am just confused about how to go about detecting the groups of characters and avoiding strings that contain those words.
Thanks for any help that you may have to offer
egrep -v '^ *[0-9]+((\.[0-9]+){3})? +banned\.$'
Allows optional leading spaces at the beginning of the line.
Must be followed by an all-digit sequence OR an IP-like address.
Must be followed by at least one space.
Line must end in 'banned.'
Finally, the -v option ensures that only lines NOT matching the regex are returned.
With these constraints you needn't worry about ruling out additional words such as 'client'.
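To try the expression outside the shell, here is a minimal Python sketch. The pattern is the one from the egrep command above; the sample log lines are made up for illustration. It builds both the matching lines and the non-matching ones, the latter being what -v would print.

```python
import re

# Same expression as in the egrep example.
BANNED = re.compile(r"^ *[0-9]+((\.[0-9]+){3})? +banned\.$")

# Made-up sample log lines, for illustration only.
log_lines = [
    "111.111.111.111 banned.",
    "2.2.2.2 wibble",
    "  0320932   banned.",
    "client 111.111.111.111 banned.",
    "1434324 wobble",
]

matching = [line for line in log_lines if BANNED.search(line)]
not_matching = [line for line in log_lines if not BANNED.search(line)]  # what egrep -v prints

print(matching)
print(not_matching)
```

The line containing "client" is rejected without any explicit check for that word, because the anchored pattern leaves no room for extra text.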
I'm assuming in the following input data lines 1 and 3 should be dropped:
111.111.111.111 banned.
2.2.2.2 wibble
0320932 banned
1434324 wobble
You can drop them with this grep expression:
$ grep -E -v "[0-9.]+ +banned" logfile.log
2.2.2.2 wibble
1434324 wobble
$
This regular expression matches 1 or more numbers and periods followed by 1 or more spaces followed by the word "banned". Passing -v to grep will cause it to display all lines that do not match the regular expression. Add -i to the grep command to make it case-insensitive.
You want a negating match, which looks like:
/^((?!([\d.\s]+banned\.)).)*$/
See it in action: http://regex101.com/r/bY7pK4
Note your example shows a period after banned. If you don't want it, remove \. from the expression.
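Here is a quick Python sketch of that negating match, using the four sample lines quoted earlier in this thread. `NOT_BANNED` is just a name chosen here.

```python
import re

# The negative lookahead is tested at every position, so the whole line is
# accepted only if "<digits/dots/spaces>banned." never starts anywhere in it.
NOT_BANNED = re.compile(r"^((?!([\d.\s]+banned\.)).)*$")

lines = [
    "111.111.111.111 banned.",
    "2.2.2.2 wibble",
    "0320932 banned",
    "1434324 wobble",
]

kept = [line for line in lines if NOT_BANNED.match(line)]
print(kept)
```

With the \. left in place, "0320932 banned" (no trailing period) is kept, which illustrates the note about removing \. from the expression.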
Try this RegExp
String regex = "\\d+.\\d+.\\d+.\\d+ banned.";
This matches both kinds of string.
Example:
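import java.util.regex.Matcher;
import java.util.regex.Pattern;

// BannedFinder is an arbitrary class name, added so the snippet compiles.
public class BannedFinder {
    public static void main(String[] args) {
        System.out.println("start");
        String src = "657 hi tis is 111.111.111.111 banned. 57 happy i9";
        //String src = "87 working is 0320932 banned. Its ending str 08";
        // The unescaped '.' matches any character, so a plain number of
        // seven or more digits also matches, not only a dotted IP address.
        String regex = "\\d+.\\d+.\\d+.\\d+ banned.";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(src);
        while (matcher.find()) {
            // Print the start offset and the matched text.
            System.out.println(matcher.start() + " : " + matcher.group());
        }
    }
}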
Let me know if it is not working for you.
trying to match IP address or random number "banned."
This egrep should work for you:
egrep '(([0-9]{1,3}\.){3}[0-9]{1,3}|[0-9]+) +banned' logfile
The following will work:
\s*\d\d\d\.\d\d\d\.\d\d\d\.\d\d\d\s*banned\s*

Replace patterns that are inside delimiters using a regular expression call

I need to clip out all the occurrences of the pattern '--' that are inside single quotes in a long string (leaving intact the ones that are outside single quotes).
Is there a RegEx way of doing this?
(using it with an iterator from the language is OK).
For example, starting with
"xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
I should end up with:
"xxxx rt / $ 'dfdffggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g 'ggh' vcbcvb"
So I am looking for a regex that could be run from the following languages as shown:
+-------------+------------------------------------------+
| Language | RegEx |
+-------------+------------------------------------------+
| JavaScript | input.replace(/someregex/g, "") |
| PHP | preg_replace('/someregex/', "", input) |
| Python | re.sub(r'someregex', "", input) |
| Ruby | input.gsub(/someregex/, "") |
+-------------+------------------------------------------+
I found another way to do this from an answer by Greg Hewgill at Qn138522
It is based on using this regex (adapted to contain the pattern I was looking for):
--(?=[^\']*'([^']|'[^']*')*$)
Greg explains:
"What this does is use the non-capturing match (?=...) to check that the character x is within a quoted string. It looks for some nonquote characters up to the next quote, then looks for a sequence of either single characters or quoted groups of characters, until the end of the string. This relies on your assumption that the quotes are always balanced. This is also not very efficient."
The usage examples would be :
JavaScript: input.replace(/--(?=[^']*'([^']|'[^']*')*$)/g, "")
PHP: preg_replace("/--(?=[^']*'([^']|'[^']*')*\$)/", "", $input)
Python: re.sub(r"--(?=[^']*'([^']|'[^']*')*$)", "", input)
Ruby: input.gsub(/--(?=[^\']*'([^']|'[^']*')*$)/, "")
I have tested this for Ruby and it provides the desired result.
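A small Python check of the same lookahead, as a sketch. It assumes balanced, unescaped single quotes, as the quoted explanation does; `strip_quoted_dashes` is a name made up here.

```python
import re

# A "--" counts as inside quotes when an odd number of single quotes follows
# it: one closing quote, then only single characters or complete quoted
# groups up to the end of the string.
IN_QUOTES = re.compile(r"--(?=[^']*'([^']|'[^']*')*$)")

def strip_quoted_dashes(text):
    """Remove every '--' that sits inside a single-quoted region."""
    return IN_QUOTES.sub("", text)

txt = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
print(strip_quoted_dashes(txt))
```

On the sample string this leaves the unquoted -- alone and produces the output requested in the question.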
This cannot be done with regular expressions, because you need to maintain state on whether you're inside single quotes or outside, and regex is inherently stateless. (Also, as far as I understand, single quotes can be escaped without terminating the "inside" region).
Your best bet is to iterate through the string character by character, keeping a boolean flag on whether or not you're inside a quoted region - and remove the --'s that way.
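The character-by-character approach could look roughly like this in Python. It is a sketch that assumes quotes are never escaped; handling something like \' would need an extra branch. The function name is made up.

```python
def strip_dashes_in_quotes(text):
    """Drop every '--' found while the inside-quotes flag is set."""
    out = []
    inside = False  # True while we are between a pair of single quotes
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "'":
            inside = not inside       # every quote toggles the state
            out.append(ch)
            i += 1
        elif inside and text.startswith("--", i):
            i += 2                    # skip the pair of dashes
        else:
            out.append(ch)
            i += 1
    return "".join(out)

txt = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
print(strip_dashes_in_quotes(txt))
```

Because the scan is left to right, three dashes inside quotes collapse to one and four collapse to none.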
If bending the rules a little is allowed, this could work:
import re
p = re.compile(r"((?:^[^']*')?[^']*?(?:'[^']*'[^']*?)*?)(-{2,})")
txt = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
print(re.sub(p, r'\1-', txt))
Output:
xxxx rt / $ 'dfdf-fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '-ggh-' vcbcvb
The regex:
( # Group 1
(?:^[^']*')? # Start of string, up till the first single quote
[^']*? # Inside the single quotes, as few characters as possible
(?:
'[^']*' # No double dashes inside these single quotes, jump to the next.
[^']*?
)*? # as few as possible
)
(-{2,}) # The dashes themselves (Group 2)
If there were different delimiters for start and end, you could use something like this:
-{2,}(?=[^'`]*`)
Edit: I realized that if the string does not contain any quotes, it will match all double dashes in the string. One way of fixing it would be to change
(?:^[^']*')?
in the beginning to
(?:^[^']*'|(?!^))
Updated regex:
((?:^[^']*'|(?!^))[^']*?(?:'[^']*'[^']*?)*?)(-{2,})
Hm. There might be a way in Python if there are no quoted apostrophes, given that there is the (?(id/name)yes-pattern|no-pattern) construct in regular expressions, but it goes way over my head currently.
Does this help?
def remove_double_dashes_in_apostrophes(text):
    return "'".join(
        part.replace("--", "") if (ix&1) else part
        for ix, part in enumerate(text.split("'")))
Seems to work for me. What it does, is split the input text to parts on apostrophes, and replace the "--" only when the part is odd-numbered (i.e. there has been an odd number of apostrophes before the part). Note about "odd numbered": part numbering starts from zero!
You can use the following sed script, I believe:
:again
s/'\(.*\)--\(.*\)'/'\1\2'/g
t again
Store that in a file (rmdashdash.sed) and do whatever exec magic in your scripting language allows you to do the following shell equivalent:
sed -f rmdashdash.sed < file containing your input data
What the script does is:
:again <-- just a label
s/'\(.*\)--\(.*\)'/'\1\2'/g
substitute, for the pattern ' followed by anything followed by -- followed by anything followed by ', just the two anythings within quotes.
t again <-- feed the resulting string back into sed again.
Note that this script will convert '----' into '', since it is a sequence of two --'s within quotes. However, '---' will be converted into '-'.
Ain't no school like old school.