I have a recipe that cannot seem to match the address.
I tried these:
* ^From.*address9\#gmail\.com
* ^From.*address9#gmail.com
* ^From.*address\[0-9\]\#gmail\.com
* ^From.*address\d\#gmail\.com
* ^From.*address\d#gmail.com
but none of the above works... I am totally lost. It looks like regular expressions have their own logic and patterns in procmail.
The address to match has this 9 at the end and it's at gmail.com. When I put any other email address into this * ^From.* field it works OK, but this one doesn't... Here is my full recipe. The conditions to match are: add a tag [New Report] when the message comes from address9#gmail.com and the subject field is empty. I would really be thankful if somebody could advise, as I am about to go nuts trying to understand where the mistake is.
:0 fhw
* ^From.*address9\#gmail\.com
* ^Subject:\/.+
| /usr/bin/formail -i "Subject: [New Report]$MATCH"
I would also be grateful for any pointers on how to troubleshoot it. Many thanks in advance!
The header looks like this:
From: Name Lastname <address9#gmail.com>
Date: Wed, 12 Jun 2019 20:37:17 +1200
Message-ID<CADxD3vdy5cW55mogOK5+543ngU7iFKjJcpDV3Q4YL772F=LdQ#mail.gmail.com>
Subject:
Regular expressions are pretty simple actually. Nearly every character simply matches itself. So if you have (beginning of line) F r o m : followed by anything followed by a d d r e s s 9 ... it should match.
Procmail doesn't support various Perl extensions like \d or \t, and of course there is no need to backslash characters which don't have a special meaning in regex like #. If you want to match a single digit that's [0-9] without any backslashes (those would change [ and ] back into literal matches).
^Subject:\/.+ checks that there is at least one character in the Subject: header and collects it all into $MATCH. This does not check that the subject is empty. Perhaps you want something like
# \t is not supported, replace with a literal tab character
* ^Subject:[ \t]+$
where the \t should be replaced with a literal tab. Every message should contain a Subject: header so it will always be there, but if its value consists entirely of whitespace, that is considered an empty subject. And of course, if the header is empty, there is no need to capture its contents.
Reviewing your attempts, here are some comments on each.
^From.*address9\#gmail\.com
As suggested above, the backslash before # is superfluous, but harmless. This should match any header line which starts with From and which contains address9#gmail.com somewhere later on the same line.
^From.*address9#gmail.com
The unescaped dot matches any character, so this would also match address9#gmailscom.example.org for example.
^From.*address\[0-9\]\#gmail\.com
The square brackets are escaped, and thus matched literally. This will match any line which starts with From and which contains the literal text address[0-9]#gmail.com somewhere later on the line, which is unlikely in practice.
^From.*address\d\#gmail\.com
Procmail does not support the Perl escape \d for a digit, so this would match literally addressd#gmail.com on a header line starting with From.
^From.*address\d#gmail.com
As above, with the unescaped dot again matching any character in addition to a literal dot.
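The points about the dot and the square brackets hold in most regex dialects, so they can be checked outside Procmail. The following is only a sketch using Python's re module, which is not Procmail's dialect (Python does support \d, for instance); the lookalike address is a made-up test value.

```python
import re

header = "From: Name Lastname <address9#gmail.com>"
# Hypothetical lookalike used only to show what an unescaped dot lets through.
lookalike = "From: <address9#gmailscom.example.org>"

# An unescaped dot matches any character, so "gmailscom" is accepted too.
loose = re.compile(r"^From.*address9#gmail.com")
# Escaping the dot restricts the match to a literal ".".
strict = re.compile(r"^From.*address9#gmail\.com")
# Escaped brackets are literal characters, not a character class.
literal_brackets = re.compile(r"^From.*address\[0-9\]#gmail\.com")
# Unescaped brackets form a character class that matches one digit.
digit_class = re.compile(r"^From.*address[0-9]#gmail\.com")

for name, pattern in [("loose", loose), ("strict", strict),
                      ("literal_brackets", literal_brackets),
                      ("digit_class", digit_class)]:
    print(name, bool(pattern.search(header)), bool(pattern.search(lookalike)))
```

Only the loose pattern accepts the lookalike, and only the pattern with escaped brackets rejects the real header.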
Having said that, your second condition would match if there is at least a single space after the colon in Subject: so again, it really should match in your test case unless there really is absolutely nothing in the Subject: header.
So anyway, if that was your problem, the From: condition might actually have been matching, while the Subject: condition was not. With Procmail VERBOSE logging you would see each regex with "matched" or "no match" in the log file.
:0 fhw
* ^From.*address9#gmail\.com\>
* ^Subject:[ ]*$
| /usr/bin/formail -i "Subject: [New Report]"
(The added word boundary \> prevents com from matching if it is part of a longer word.)
Or perhaps you want to do this with any Subject: header? (This time I'm using a proper literal tab - check that you copy/paste this properly, though! There should be a space and a tab between [ and ].)
:0 fhw
* ^From.*address9#gmail\.com\>
* ^Subject:[ ]*\/.*$
| /usr/bin/formail -i "Subject: [New Report] $MATCH"
For troubleshooting, perhaps have a look at http://www.iki.fi/era/mail/procmail-debug.html
The Stack Overflow Minimal Reproducible Example guidance is also useful. Briefly, try to reduce the problem to the simplest input message and the simplest recipe which doesn't behave like you expect, then once you can't reduce it any further, take a good sharp look at what you are left with. Common causes of confusion include
malformed input messages
Empty lines in what's supposed to be the headers?
Headers wrapped over multiple physical lines?
Pesky control characters where there aren't supposed to be, either in the message or in your Procmail script (don't use Windows editors^W)
unwarranted assumptions
Regex doesn't work like you thought?
Procmail's regex dialect is different than e.g. Perl's? Online regex testers typically assume a Perl (or occasionally Javascript) regex feature set
Procmail matches against only the headers by default; some beginners miss this and are surprised when it doesn't find a string which is plainly there, only in the body
MIME content-transfer-encoding obscures the content you thought you knew was there?
Your regex really has to match a literal piece of text in the message; Procmail does no normalization e.g. to extract just the sender address in a convenient form for matching.
... Oh, and everyone's favorite: Make sure you set SHELL=/bin/sh right at the top of your recipe file. This has been the source of many otherwise completely mysterious failures over the years.
Here is a quick demo to demonstrate that your test case works for me:
tripleee#debian$ cat >test.msg
From: Name Lastname <address9#gmail.com>
Date: Wed, 12 Jun 2019 20:37:17 +1200
Message-ID<CADxD3vdy5cW55mogOK5+543ngU7iFKjJcpDV3Q4YL772F=LdQ#mail.gmail.com>
Subject:
No fooling
^D
tripleee#debian$ cat >test.rc
SHELL=/bin/sh
DEFAULT=/dev/null
VERBOSE=yes
:0 fhw
* ^From.*address9\#gmail\.com
* ^Subject:\/.+
| /usr/bin/formail -i "Subject: [New Report]$MATCH"
^D
tripleee#debian$ procmail -m test.rc <test.msg
procmail: [3717] Wed Jun 12 13:38:55 2019
procmail: Match on "^From.*address9\#gmail\.com"
procmail: Assigning "MATCH="
procmail: Matched " "
procmail: Match on "^Subject:\/.+"
procmail: Executing " /usr/bin/formail -i "Subject: [New Report]$MATCH""
procmail: Assigning "LASTFOLDER=/dev/null"
procmail: Opening "/dev/null"
From address9#gmail.com Wed Jun 12 13:38:55 2019
Subject: [New Report]
Folder: /dev/null 253
I'm going over some legacy code and found this code:
cat some_file | \
sed "/^\/${CATEGORY}\/latest\//s: /.*$: ${DATA_PATH}:"
The format of the original file looks like:
/car/latest/ /US/car/2017/04/02
/bike/latest/ /US/bike/2017/03/31
/boat/latest/ /US/boat/2017/04/03
Assume the CATEGORY above is bike, and the DATA_PATH is /US/bike/2017/04/02, I guess the output will be like this, otherwise it does not make any sense.
/car/latest/ /US/car/2017/04/02
/bike/latest/ /US/bike/2017/04/02
/boat/latest/ /US/boat/2017/04/03
If so, what does the "s: /.*$:" do here? Why doesn't "/boat/latest/ /US/boat/2017/04/03" get substituted since we are replacing to the end (using the dollar sign).
If not, then what will be the output?
Thanks!
As the sed part is the issue, let us break it down:
/^\/${CATEGORY}\/latest\// -- This first part is an address: it selects only the lines that match the pattern. Assuming CATEGORY = bike, the pattern is ^/bike/latest/ (the slashes are backslash-escaped because / is also the delimiter here). Note that ^ means the line must start with this.
s: /.*$: ${DATA_PATH}: -- Once a line matches the address above, this replacement is performed on it. First, note that the "normal" / delimiter has been replaced by :. If you look closely, the pattern reads: match a space followed by / and then all characters until the end of the line. The space is the key, as the only place on each line where a space is followed by / is at the start of the second column, namely /US/bike/2017/03/31 in our bike example. The replacement portion is likewise a space followed by DATA_PATH.
if we take a single line of our data (where we have bike), the matching portion is:
/bike/latest/ /US/bike/2017/03/31
             ^^^^^^^^^^^^^^^^^^^^
Note how the first ^ is prior to the / in front of US
The address expression will match the line starting with /bike/latest/ in your example. The substitution pattern " /.*$" matches a space followed by a slash followed by any characters up to the end of the line, and replaces all of that. If DATA_PATH is the same as what is being replaced then this has no visible effect. Try replacing DATA_PATH with something else and you can see the substitution.
Just to clarify, the substitution replaces everything after a slash that is preceded by a space. There are no spaces before any of the category paths, e.g. /bike/latest/
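To make the two parts of the sed command concrete, here is a rough Python equivalent. It is a sketch, not a drop-in replacement: the function name update_latest is made up, and the category is compared as literal text, whereas sed would interpret it as part of a regex.

```python
import re

def update_latest(lines, category, data_path):
    """Rewrite the second column of the line that starts with /<category>/latest/."""
    prefix = f"/{category}/latest/"
    updated = []
    for line in lines:
        # Address part: only lines beginning with the prefix are touched.
        if line.startswith(prefix):
            # Substitution part: from the first " /" to the end of the line.
            # A function is used as the replacement so that backslashes in
            # data_path are never treated as escapes.
            line = re.sub(r" /.*$", lambda m: " " + data_path, line, count=1)
        updated.append(line)
    return updated

original = [
    "/car/latest/ /US/car/2017/04/02",
    "/bike/latest/ /US/bike/2017/03/31",
    "/boat/latest/ /US/boat/2017/04/03",
]
for line in update_latest(original, "bike", "/US/bike/2017/04/02"):
    print(line)
```

The car and boat lines pass through unchanged because they never satisfy the address, which is why the trailing $ in the substitution does not affect them.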
I read through the similar questions that have already been asked but I still couldn't get it right.
http://regexr.com/39b64
\S should return everything on the keyboard except space , tab and enter .
^$ should be a whole match as it starts from ^ and ends with $ .
There was a link that also uses something similar like the above with addition of {0,} which should be infinite letters but it doesn't work on regexr.com when I tested .
Another link suggested to remove the $ and replace it with \z but it doesn't work on regexr.com as well .
I'm planning to use preg_match to see whether or not the username entered consists of any characters on the keyboard except space, tab and enter.
Username = "abcCD0123_" valid
Username = "abcCD0123_!##$%^&)_[]-=\',;/`~) valid
Username = " abcd123~!#$##%[];,.;'" invalid
Username = "abcd123~!#$##%[];,.;' " invalid
Username = " abcd123~ !#$##%[];,.;' " invalid
Something like that cause' I read about a question where someone suggested to do the verification matching on the php side instead of html side for security reasons .
edit : I tried ...
/^[\S]+$/
/^[\S]*$/
/^[\S]{0,}$/
/^[^\s\S]+$/
/^[^\s\S]*$/
/^[^\s\S]{0,}$/
/^[A-Za-z0-9~!##$%^&*()_+{}|:"<>?`-=[]\;',./]+$/
/^[A-Za-z0-9~!##$%^&*()_+{}|:"<>?`-=[]\;',./]*$/
/^[A-Za-z0-9~!##$%^&*()_+{}|:"<>?`-=[]\;',./]{0,}$/
( something like this for this i can't remember cause' I modified a lot of times on this one )
You can just check that the line is composed by only non-space characters (demo):
/^\S+$/
Strings with multiple lines
The regex assumes that you are checking a single username at a time (which is probably what you want to do in your code). But as shown in the demo and as described by user3218114 in his answer, if you have a multi-line string, you need the m flag so that ^ and $ also match at the beginning and end of each line (otherwise they only match at the beginning and end of the whole string). This is probably why your tests weren't working.
/^\S+$/m
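The effect of the m flag can be seen with a small test. This sketch uses Python's re module rather than PHP's preg_match, on the assumption that ^, $ and \S behave the same way for this purpose; the sample usernames are made up.

```python
import re

single = re.compile(r"^\S+$")               # ^ and $ anchor the whole string
multi = re.compile(r"^\S+$", re.MULTILINE)  # ^ and $ anchor every line

# Four candidate usernames, one per line; the 2nd and 3rd contain a space.
names = "abcCD0123_\n abcd123\nabcd123 \ngood~!$%"

print(single.search(names))   # no match: the string as a whole contains whitespace
print(multi.findall(names))   # only the lines made up entirely of non-space characters

def is_valid_username(name):
    # For a single value, fullmatch avoids relying on ^ and $ at all.
    return re.fullmatch(r"\S+", name) is not None
```

When validating one username at a time, the single-string form (or fullmatch) is the one that applies; the multiline form is mainly useful in testers where several samples are pasted into one text box.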
You need to use m (PCRE_MULTILINE) modifier if you want to use ^ and $
When this modifier is set, the "start of line" and "end of line" constructs match immediately following or immediately before any newline in the subject string, respectively, as well as at the very start and end.
Here is a demo that checks for any string/line that contains no whitespace and whose length is between 8 and 50:
^[^\s]{8,50}$
Online demo
OR
^\S{8,50}$
Online demo
Sample code: (Focus on m modifier)
$re = "/^[^\\s]{8,50}$/m";
$str = "...";
preg_match_all($re, $str, $matches);
Based on your examples I suppose you want to have something like that:
/^[a-z0-9_!##$%&^()_\[\]=\\',;\/~`-]+$/i
It's one character group [] which contains all allowed characters. Note, however, that one cannot just put all chars in there: some characters have special meanings in regexp and must be escaped with \ ([, ], (, ), / and \ itself). You also have to be careful where to put -. In the case of a-z it means all characters between a and z (including a and z). That's why I put the - char itself at the end.
To match really everything except white-space use /^\S+$/
I am new to procmail and struggling to understand the syntax.
What I want to do is to check the subject line to see if it begins with 3 upper case chars followed by a colon, and if it does, remove the colon from the end and perform an action, i.e.:
Subject: ABC: Other parts of the subject
:0
* $ ^Subject:/^[A-Z]{3}:$/
| /usr/bin/zarafa-dagent -C -P 'Support\\$1' vmail
Firstly I'm not sure if my regex is correct, and secondly, despite a lot of googling I can't figure out how to save my search into a variable to use elsewhere, I tried $1 for the first returned variable but that does not appear to work.
Any help would be much appreciated.
You can post-process the value of $MATCH to trim the colon.
:0 D
* ^Subject:[^ ]*\/[A-Z][A-Z][A-Z]:
{
:0
* MATCH ?? ^^\/[A-Z][A-Z][A-Z]
| /usr/bin/zarafa-dagent -C -P "Support\\$MATCH" vmail
}
The first condition captures the three uppercase characters and the colon into MATCH. The second matches this value against three uppercase characters, and captures just that part into the new value for MATCH.
As usual, the whitespace inside the brackets after Subject: consists of a space and a tab.
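The capture-then-trim idea can be illustrated outside Procmail. The sketch below is Python, not Procmail: it uses a capture group where the recipe uses \/ and MATCH, and the function name support_folder is made up for the example.

```python
import re

# Header name, optional spaces/tabs, then exactly three uppercase letters
# followed by a colon. Only the letters are captured, so the colon is
# already gone from the captured value.
PREFIX = re.compile(r"Subject:[ \t]*([A-Z][A-Z][A-Z]):")

def support_folder(subject_line):
    """Return 'Support\\XXX' for a matching Subject line, else None."""
    m = PREFIX.match(subject_line)
    if m is None:
        return None
    return "Support\\" + m.group(1)

print(support_folder("Subject: ABC: Other parts of the subject"))
print(support_folder("Subject: abc: lowercase prefix"))
```

Python's matching is case-sensitive by default, which corresponds to what the D flag asks of Procmail.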
OK, solved this, procmail has its own version of regex:
:0 D
* ^Subject:.*\/([A-Z]+[A-Z]+[A-Z]):
| /usr/bin/zarafa-dagent -C -P "Support\\$MATCH" vmail
EXITCODE=$?
It does not support the iterator brackets [A-Z]{3} and so you have to repeat the expression.
Also, it is case-insensitive, so you need to add the "D" flag.
Problem is I seem to be unable to remove the colon : from the end.
I've got a CSV file with some 600 records where I need to replace some [CRLF] with a [space] but only when the [CRLF] is positioned between two ["] (quotation marks). When the second ["] is encountered then it should skip the rest of the line and go to the next line in the text.
I don't really have a starting point. Hope someone comes up with a suggestion.
Example:
John und Carol,,Smith,,,J.S.,,,,,,,,,,,,,+11 22 333 4444,,,,,"streetx 21[CRLF]
New York City[CRLF]
USA",streetx 21,,,,New York City,,,USA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Normal,,My Contacts,[CRLF]
In this case the two [CRLF] after the first ["] need to be replaced with a space [ ]. When the second ["] is encountered, skip the end of the line and go to next line.
Then again, now on the next line, after the first ["] is encountered replace all [CRLF] until the second ["] is encountered. The [CRLF]s vary in numbers.
In the CSV-file the amount of commas [,] before (23) and after (65) the 2 quotation marks ["] is constant.
So maybe a comma counter could be used. I don't know.
Thanks for feedback.
This will work using one regex only (tested in Notepad++):
Enter this regex in the Find what field:
((?:^|\r\n)[^"]*+"[^\r\n"]*+)\r\n([^"]*+")
Enter this string in the Replace with field:
$1 $2
Make sure the Wrap around check box (and Regular expression radio button) are selected.
Do a Replace All as many times as required (until the "0 occurrences were replaced" dialog pops up).
Explanation:
(
(?:^|\r\n) Begin at start of file or before the CRLF before the start of a record
[^"]*+ Consume all chars up to the opening "
" Consume the opening "
[^\r\n"]*+ Consume all chars up to either the first CRLF or the closing "
) Save as capturing group 1 (= everything in record before the target CRLF)
\r\n Consume the target CRLF without capturing it
(
[^"]*+ Consume all chars up to the closing "
" Consume the closing "
) Save as capturing group 2 (= the rest of the string after the target CRLF)
Note: The *+ is a possessive quantifier. Use them appropriately to speed up execution.
Update:
This more general version of the regex will work with any line break sequence (\r\n, \r or \n):
((?:^|[\r\n]+)[^"]*+"[^\r\n"]*+)[\r\n]+([^"]*+")
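For anyone who would rather script this than click Replace All repeatedly, the same loop can be sketched in Python. Assumptions: the possessive + markers are dropped (Python's re only accepts them from version 3.11 onward), each record has a single quoted field as in the question, and the function name join_quoted_breaks is made up.

```python
import re

# Same structure as the CRLF-only pattern above, without possessive quantifiers.
BREAK_IN_QUOTES = re.compile(r'((?:^|\r\n)[^"]*"[^\r\n"]*)\r\n([^"]*")')

def join_quoted_breaks(text):
    """Replace CRLFs inside a record's quoted field with single spaces."""
    while True:
        # One pass removes at most one CRLF per record, like one "Replace All".
        text, count = BREAK_IN_QUOTES.subn(r"\1 \2", text)
        if count == 0:
            return text

sample = 'a,b,"street 21\r\nNew York City\r\nUSA",c\r\nd,e,"x\r\ny",f\r\n'
print(repr(join_quoted_breaks(sample)))
```

The loop ends when a pass makes no replacement, which mirrors the "0 occurrences were replaced" stopping condition.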
Maybe do it in three steps (assuming you have 88 fields in the CSV, because you said there are 23 commas before, and 65 after each second ")
Step 1: replace all CR/LF with some character not anywhere in the file, like ~
Search: \r\n Replace: ~
Step 2: replace all ~ after every 88th 'comma group' (or however many fields in CSV) with \r\n -- to reinsert the required CSV linebreaks:
Search: ((?:[^,]*?,){88})~ Replace: $1\r\n
Step 3: replace all remaining ~ with space
Search ~ Replace: <space>
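The three steps can also be sketched as a small Python function. This assumes that every record ends with a comma and has the same number of commas, that no field contains a comma of its own, and that the placeholder does not occur in the data; the function name rejoin_by_comma_count is made up.

```python
import re

def rejoin_by_comma_count(text, commas_per_record, placeholder="~"):
    if placeholder in text:
        raise ValueError("placeholder already occurs in the data")
    # Step 1: hide every CRLF behind the placeholder.
    flat = text.replace("\r\n", placeholder)
    # Step 2: a placeholder that follows a full record's worth of commas
    # was a real record separator, so turn it back into a CRLF.
    record = re.compile(
        r"((?:[^,]*?,){%d})%s" % (commas_per_record, re.escape(placeholder))
    )
    flat = record.sub(lambda m: m.group(1) + "\r\n", flat)
    # Step 3: any placeholder still left was inside a field.
    return flat.replace(placeholder, " ")

sample = 'a,"x\r\ny",c,\r\nd,"p\r\nq",f,\r\n'
print(repr(rejoin_by_comma_count(sample, 3)))
```

For the file in the question the count would be 88 rather than 3, and a header line that does not end in a comma would need separate handling.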
In this case the source data is generated by the export function in GMail for your contacts.
After the modification outlined below (without RegEx) the result can be used to tidy up your contacts database and re-import it to GMail or to MS Outlook.
Yes, I am standing on the shoulders of #alan and #robinCTS. Thank you both.
Instructions in 5 steps:
use Notepad++ / find replace / extended search mode / wrap around = on
-1- replace all [CRLF] with a unique set of characters or a string (I used [~~])
find: \r\n and replace with: ~~
The file contents are now on one line only.
-2- Now we need to separate the header line. Move to where the first record starts, exactly before the 88th comma (including the word after the 87th comma [,]), and enter the [CRLF] manually by hitting the return key. There are two lines now: header and records.
-3- now find all [,~~] and replace with [,\r\n] The result is one record per line.
-4- remove the remaining [~~] find: ~~ and replace with: [ ] a space.
The file is now clean of unwanted [CRLF]s.
-5- Save the file and use it as intended.
I need to clip out all the occurrences of the pattern '--' that are inside single quotes in a long string (leaving intact the ones that are outside single quotes).
Is there a RegEx way of doing this?
(using it with an iterator from the language is OK).
For example, starting with
"xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
I should end up with:
"xxxx rt / $ 'dfdffggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g 'ggh' vcbcvb"
So I am looking for a regex that could be run from the following languages as shown:
+-------------+------------------------------------------+
| Language | RegEx |
+-------------+------------------------------------------+
| JavaScript | input.replace(/someregex/g, "") |
| PHP | preg_replace('/someregex/', "", input) |
| Python | re.sub(r'someregex', "", input) |
| Ruby | input.gsub(/someregex/, "") |
+-------------+------------------------------------------+
I found another way to do this from an answer by Greg Hewgill at Qn138522
It is based on using this regex (adapted to contain the pattern I was looking for):
--(?=[^\']*'([^']|'[^']*')*$)
Greg explains:
"What this does is use the non-capturing match (?=...) to check that the character x is within a quoted string. It looks for some nonquote characters up to the next quote, then looks for a sequence of either single characters or quoted groups of characters, until the end of the string. This relies on your assumption that the quotes are always balanced. This is also not very efficient."
The usage examples would be :
JavaScript: input.replace(/--(?=[^']*'([^']|'[^']*')*$)/g, "")
PHP: preg_replace("/--(?=[^']*'([^']|'[^']*')*\$)/", "", $input)
Python: re.sub(r"--(?=[^']*'([^']|'[^']*')*$)", "", input)
Ruby: input.gsub(/--(?=[^\']*'([^']|'[^']*')*$)/, "")
I have tested this for Ruby and it provides the desired result.
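The Python form can be checked against the example from the question. This is just a quick sketch of such a check; it relies on the same assumption Greg states, namely that the quotes are balanced.

```python
import re

# "--" followed by: non-quote characters, one quote, then only single
# non-quote characters or complete quoted groups until the end of the string.
INSIDE_QUOTES = re.compile(r"--(?=[^']*'([^']|'[^']*')*$)")

txt = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
result = INSIDE_QUOTES.sub("", txt)
print(result)
```

The lookahead succeeds only when an odd number of quotes follows the dashes, which is what "inside a quoted region" amounts to when the quotes are balanced.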
This cannot be done with regular expressions, because you need to maintain state on whether you're inside single quotes or outside, and regex is inherently stateless. (Also, as far as I understand, single quotes can be escaped without terminating the "inside" region).
Your best bet is to iterate through the string character by character, keeping a boolean flag on whether or not you're inside a quoted region - and remove the --'s that way.
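A minimal sketch of that character-by-character approach in Python follows. It assumes quotes cannot be escaped inside a quoted region; the function name strip_dashes_in_quotes is made up.

```python
def strip_dashes_in_quotes(text):
    """Remove every '--' pair that lies between single quotes."""
    out = []
    inside = False  # True while we are between an opening and closing quote
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "'":
            inside = not inside
            out.append(ch)
            i += 1
        elif inside and text.startswith("--", i):
            i += 2  # skip the pair without copying it
        else:
            out.append(ch)
            i += 1
    return "".join(out)

txt = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
print(strip_dashes_in_quotes(txt))
```

Handling escaped quotes would take one more branch that copies the escape sequence without toggling the flag.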
If bending the rules a little is allowed, this could work:
import re
p = re.compile(r"((?:^[^']*')?[^']*?(?:'[^']*'[^']*?)*?)(-{2,})")
txt = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
print(re.sub(p, r'\1-', txt))
Output:
xxxx rt / $ 'dfdf-fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '-ggh-' vcbcvb
The regex:
( # Group 1
(?:^[^']*')? # Start of string, up till the first single quote
[^']*? # Inside the single quotes, as few characters as possible
(?:
'[^']*' # No double dashes inside theses single quotes, jump to the next.
[^']*?
)*? # as few as possible
)
(-{2,}) # The dashes themselves (Group 2)
If there were different delimiters for start and end, you could use something like this:
-{2,}(?=[^'`]*`)
Edit: I realized that if the string does not contain any quotes, it will match all double dashes in the string. One way of fixing it would be to change
(?:^[^']*')?
in the beginning to
(?:^[^']*'|(?!^))
Updated regex:
((?:^[^']*'|(?!^))[^']*?(?:'[^']*'[^']*?)*?)(-{2,})
Hm. There might be a way in Python if there are no quoted apostrophes, given that there is the (?(id/name)yes-pattern|no-pattern) construct in regular expressions, but it goes way over my head currently.
Does this help?
def remove_double_dashes_in_apostrophes(text):
    return "'".join(
        part.replace("--", "") if (ix&1) else part
        for ix, part in enumerate(text.split("'")))
Seems to work for me. What it does, is split the input text to parts on apostrophes, and replace the "--" only when the part is odd-numbered (i.e. there has been an odd number of apostrophes before the part). Note about "odd numbered": part numbering starts from zero!
You can use the following sed script, I believe:
:again
s/'\(.*\)--\(.*\)'/'\1\2'/g
t again
Store that in a file (rmdashdash.sed) and do whatever exec magic in your scripting language allows you to do the following shell equivalent:
sed -f rmdashdash.sed < file containing your input data
What the script does is:
:again <-- just a label
s/'\(.*\)--\(.*\)'/'\1\2'/g
substitute, for the pattern ' followed by anything followed by -- followed by anything followed by ', just the two anythings within quotes.
t again <-- feed the resulting string back into sed again.
Note that this script will convert '----' into '', since it is a sequence of two --'s within quotes. However, '---' will be converted into '-'.
Ain't no school like old school.