Does Procmail have a lowercase function, or a similar capability?

Does Procmail have a lowercase function, or a similar capability? - procmail

I'm using the following (classic) procmail recipe to catch mailing list e-mails and file them into a folder by list name:
:0
* ^((List-Id|X-(Mailing-)?List):(.*[<]\/[^>]*))
{
LISTID=$MATCH
:0
* LISTID ?? ^\/[^#\.]*
Lists/$MATCH/
}
The problem is: if a list name changes from all lowercase to Firstlettercap, I end up with two folders, one for 'listname' and another for 'Listname'.
I'd like to lowercase the $MATCH variable before using it in the final delivery rule, but I'm not able to find a reference to a lc() function, or a regex/replacement that can be used to do this.
One comment below suggested this:
:0
* ^((List-Id|X-(Mailing-)?List):(.*[<]\/[^>]*))
{
LISTID=`echo "$MATCH" | tr A-Z a-z`
:0
* LISTID ?? ^\/[^#\.]*
.Lists.$MATCH/
}
Which also doesn't appear to do what I'm after. Though, looking at it now, clearly the transliteration is only happening on the first occurrence of $MATCH and my guess is that it's not changing it at all for the use in the folder assignment line.
UPDATE #1: If I try to use LISTID in the folder assignment line, I get something like 'Bricolage.project.29601.lighthouseapp' instead of just 'Bricolage' or -- what I'm after -- just 'bricolage'.

Procmail itself has no functionality to replace text with other text. You can run the match through tr, or if avoiding external processes is really important, create a rule for each letter you need to map.
LISTID=`echo "$LISTID" | tr A-Z a-z`
# or alternatively
:0D
* LISTID ?? ^A\/.*
{ LISTID="a$MATCH" }
:0D
* LISTID ?? ^B\/.*
{ LISTID="b$MATCH" }
# ... etc
You could combine this with the final MATCH processing but I leave it at this for purposes of clarity.

AFAIK procmail regular expressions are always case INsensitive anyway, so you already get what you want without doing anything special. At least I always used it that way, and all the sites with procmail documentation I checked (3+) said so too.

Related

Error while compiling regex function, why am I getting this issue?

My RAKU Code:
sub comments {
if ($DEBUG) { say "<filtering comments>\n"; }
my #filteredtitles = ();
# This loops through each track
for #tracks -> $title {
##########################
# LAB 1 TASK 2 #
##########################
## Add regex substitutions to remove superflous comments and all that follows them
## Assign to $_ with smartmatcher (~~)
##########################
$_ = $title;
if ($_) ~~ s:g:mrx/ .*<?[\(^.*]> / {
# Repeat for the other symbols
########################## End Task 2
# Add the edited $title to the new array of titles
#filteredtitles.push: $_;
}
}
# Updates #tracks
return #filteredtitles;
}
Result when compiling:
Error Compiling! Placeholder variable '#_' may not be used here because the surrounding block doesn't take a signature.
Is there something obvious that I am missing? Any help is appreciated.

So, in contrast with #raiph's answer, here's what I have:
my #tracks = <Foo Ba(r B^az>.map: { S:g / <[\(^]> // };
Just that. Nothing else. Let's dissect it, from the inside out:
This part: / <[\(^]> / is a regular expression that will match one character, as long as it is an open parenthesis (represented by the \() or a caret (^). When they go inside the angle brackets/square brackets combo, it means that is an Enumerated character class.
Then, the: S introduces the non-destructive substitution, i.e., a quoting construct that will make regex-based substitutions over the topic variable $_ but will not modify it, just return its value with the modifications requested. In the code above, S:g brings the adverb :g or :global (see the global adverb in the adverbs section of the documentation) to play, meaning (in the case of the substitution) "please make as many as possible of this substitution" and the final / marks the end of the substitution text, and as it is adjacent to the second /, that means that
S:g / <[\(^]> //
means "please return the contents of $_, but modified in such a way that all its characters matching the regex <[\(^]> are deleted (substituted for the empty string)"
At this point, I should emphasize that regular expressions in Raku are really powerful, and that reading the entire page (and probably the best practices and gotchas page too) is a good idea.
Next, the: .map method, documented here, will be applied to any Iterable (List, Array and all their alikes) and will return a sequence based on each element of the Iterable, altered by a Code passed to it. So, something like:
#x.map({ S:g / foo /bar/ })
essencially means "please return a Sequence of every item on #x, modified by substituting any appearance of the substring foo for bar" (nothing will be altered on #x). A nice place to start to learn about sequences and iterables would be here.
Finally, my one-liner
my #tracks = <Foo Ba(r B^az>.map: { S:g / <[\(^]> // };
can be translated as:
I have a List with three string elements
Foo
Ba(r
B^az
(This would be a placeholder for your "list of titles"). Take that list and generate a second one, that contains every element on it, but with all instances of the chars "open parenthesis" and "caret" removed.
Ah, and store the result in the variable #tracks (that has my scope)

Here's what I ended up with:
my #tracks = <Foo Ba(r B^az>;
sub comments {
my #filteredtitles;
for #tracks -> $_ is copy {
s:g / <[\(^]> //;
#filteredtitles.push: $_;
}
return #filteredtitles;
}
The is copy ensures the variable set up by the for loop is mutable.
The s:g/...//; is all that's needed to strip the unwanted characters.
One thing no one can help you with is the error you reported. I currently think you just got confused.
Here's an example of code that generates that error:
do { #_ }
But there is no way the code you've shared could generate that error because it requires that there is an #_ variable in your code, and there isn't one.
One way I can help in relation to future problems you may report on StackOverflow is to encourage you to read and apply the guidance in Minimal Reproducible Example.
While your code did not generate the error you reported, it will perhaps help you if you know about some of the other compile time and run time errors there were in the code you shared.
Compile-time errors:
You wrote s:g:mrx. That's invalid: Adverb mrx not allowed on substitution.
You missed out the third slash of the s///. That causes mayhem (see below).
There were several run-time errors, once I got past the compile-time errors. I'll discuss just one, the regex:
.*<?[...]> will match any sub-string with a final character that's one of the ones listed in the [...], and will then capture that sub-string except without the final character. In the context of an s:g/...// substitution this will strip ordinary characters (captured by the .*) but leave the special characters.
This makes no sense.
So I dropped the .*, and also the ? from the special character pattern, changing it from <?[...]> (which just tries to match against the character, but does not capture it if it succeeds) to just <[...]> (which also tries to match against the character, but, if it succeeds, does capture it as well).
A final comment is about an error you made that may well have seriously confused you.
In a nutshell, the s/// construct must have three slashes.
In your question you had code of the form s/.../ (or s:g/.../ etc), without the final slash. If you try to compile such code the parser gets utterly confused because it will think you're just writing a long replacement string.
For example, if you wrote this code:
if s/foo/ { say 'foo' }
if m/bar/ { say 'bar' }
it'd be as if you'd written:
if s/foo/ { say 'foo' }\nif m/...
which in turn would mean you'd get the compile-time error:
Missing block
------> if m/⏏bar/ { ... }
expecting any of:
block or pointy block
...
because Raku(do) would have interpreted the part between the second and third /s as the replacement double quoted string of what it interpreted as an s/.../.../ construct, leading it to barf when it encountered bar.
So, to recap, the s/// construct requires three slashes, not two.
(I'm ignoring syntactic variants of the construct such as, say, s [...] = '...'.)

perl regex to handle and preserver single and multiple words into a variable

I am writing a perl script to read the full name of a member and save it to variables firstname and lastname like below:
my ($firstname, $lastname) = $member =~ m/^(\w+.*?) +(\w+)$/;
my $member_name = $firstname.' '.$lastname;
The value for $member comes from an upstream service which would be like for example "Jane Doe"
Now the code above cannot handle when the service sends $member value like "Jane". The regex fails to handle a single word in that code line. I need it to handle both multiple and single words. I cannot implement a new code functionality so I am looking to add to the existing regex so that there is minimal change and that it can handle both the conditions.
So far this is what I am testing with in the command line but so far no luck:
perl -e 'my ($firstname, $lastname) = "Jane Doe" =~ m/^(\w+.*?) +(\w+)$/|m/^(\w+)$/; print "$firstname\n$lastname";'
When I substitute "Jane Doe" with "Jane", nothing prints. I want the code to be in this format though. like if the value is multiple words it should print them both, otherwise just the single word.
Your help will be greatly appreciated.

There is a syntax error in your Perl code. You terminated the pattern too early.
# / / / /
# V
m/^(\w+.*?) +(\w+)$/|m/^(\w+)$/
This will lead to the | being interpreted as a bit-wise or. Since there's another m// behind it, the | will take the return values of both m// operations and do its magic. The second m// will just match against the topic $_.
What you actually want is to merge both patterns.
my ($firstname, $lastname) = "Jane Doe" =~ m/^(?:(\w+.*?) +)?(\w+)$/;
You need to make the first name optional with a non-capture group (?:), followed by a ? none-or-one quantifier.
You cannot have three capture groups, like you probably intended, because the third one would go to $3, and not $1.
However, the above solution uses the last name, which you then assign to the $firstname variable. Your full name pattern allows for names with any characters in them, like Jean-Luc Picard. But if you pass in just Jean-Luc, the match will fail. So if you want only the first name, you should use the correct pattern to make it consistent.
A simple way of doing that is to make the last name optional instead.
my ($firstname, $lastname) = "Jane" =~ m/^(\w+.*?)(?: +(\w+))?$/;
Remember that this will set $lastname to undef, which doesn't matter so much in your command line example, but in a proper program with strict and warnings (which you of course have turned on, right?) it will complain if $lastname is used as a string while it's undef.
I suggest you read this article about names.

Deleting comments in a large file

I am trying to delete a bunch of comments that are all in the following format:
/**
* #ngdoc
... comment body (delete me, too!)
*/
I have tried using this command: %s/\/**\n * #ngdoc.\{-}*\///g
Here is the regex without the patterns: %s/pattern1.\{-}pattern2//g
Here are the individual patterns: \/**\n * #ngdoc and *\/
When I try my pattern in vim I get the following error:
E871: (NFA regexp) Can't have a multi follow a multi !
E61: Nested *
E476: Invalid command
Thanks for any help with this regexp nightmare!

Instead of trying to cram this into one complex regex, it's much easier to search for the start of a comment and delete from there on to the end of a comment
:g/^\/\*\*$/,/\*\/$/d_
This breaks down into
:g start a global command
/^\/\*\*$/ search for start of a comment: <sol>/**<eol>
,/^\*\/$/ extend the range to the end of a comment: <sol>*/<eol>
d delete the range
_ use the black hole register (performance optimization)

Your problem is you have \{-} followed by * which are the multis referenced in the error message. Quote the *:
%s/\/\*\*\n \* #ngdoc\_.\{-}\*\/\n//g

Using embedded newlines in the pattern is the wrong approach. You should instead use an address range. Something like:
sed '\#^/\*\*$#,\#^\*/$#d' file
This will delete all lines starting from one that matches /** anchored at column 1 to the line matching */ anchored at column 1. If your comments are well behaved (eg, no trailing space after /**), this should do what you want.

Try this using gc to be careful when deleting
%s/\v\/\*\*\n\s\*\s\#ngdoc\n((\s*\n)?(\s\*.*\n)?){-}\s?\*\///gc
Match comments like
/**
* #ngdoc
* ... comment body (delete me, too!)
*
*/

My approached consists of using a macro:
qa/\/\*\*<enter><shift-v>/\*\/<enter>d
qa ........ starts recording macro "a"
/\/\*\* ... searches for the comment beginning
<Enter> ... use Ctrl-v Enter
V ......... starts visual block (until...)
/\*\/ ..... end of your comment
<Enter> ... Ctrl-v Enter agai
d ......... it will delete selected area
In order to isert etc presse followed by the keyword you want.

Parsing log files

I'm trying to write a script to simplify the process of searching through a particular applications log files for specific information. So I thought maybe there's a way to convert them into an XML tree, and I'm off to a decent start....but The problem is, the application log files are an absolute mess if you ask me
Some entries are simple
2014/04/09 11:27:03 INFO Some.code.function - Doing stuff
Ideally I'd like to turn the above into something like this
<Message>
<Date>2014/04/09</Date>
<Time>11:48:38</Time>
<Type>INFO</Type>
<Source>Some.code.function</Source>
<Sub>Doing stuff</Sub>
</Message>
Other entries are something like this where there's additional information and line breaks
2014/04/09 11:27:04 INFO Some.code.function - Something happens
changes:
this stuff happened
I'd like to turn this last chunk into something like the above, but add the additional info into a section
<Message>
<Date>2014/04/09</Date>
<Time>11:48:38</Time>
<Type>INFO</Type>
<Source>Some.code.function</Source>
<Sub>Doing stuff</Sub>
<details>changes:
this stuff happened</details>
</Message>
and then other messages, errors will be in the form of
2014/04/09 11:27:03 ERROR Some.code.function - Something didn't work right
Log Entry: LONGARSEDGUID
Error Code: E3145
Application: Name
Details:
message information etc etc and more line breaks, this part of the message may add up to an unknown number of lines before the next entry
This last chunk I'd like to convert as the last to above examples, but adding XML nodes for log entry, error code, application, and again, details like so
<Message>
<Date>2014/04/09</Date>
<Time>11:48:38</Time>
<Type>ERROR </Type>
<Source>Some.code.function</Source>
<Sub>Something didn't work right</Sub>
<Entry>LONGARSEDGUID</Entry>
<Code>E3145</Code>
<Application>Name</Application>
<details>message information etc etc and more line breaks, this part of the message may add up to an unknown number of lines before the next entry</details>
</Message>
Now I know that Select-String has a context option which would let me select a number of lines after the line I've filtered, the problem is, this isn't a constant number.
I'm thinking a regular expression would also me to select the paragraph chunk before the date string, but regular expressions are not a strong point of mine, and I thought there might be a better way because the one constant is that new entries start with a date string
the idea though is to either break these up into xml or tables of sorts and then from there I'm hoping it might take the last or filtering non relevant or recurring messages a little easier
I have a sample I just tossed on pastebin after removing/replacing a few bits of information for privacy reasons
http://pastebin.com/raw.php?i=M9iShyT2

Sorry this is kind of late, I got tied up with work for a bit there (darn work expecting me to be productive while on their dime). I ended up with something similar to Ansgar Wiechers solution, but formatted things into objects and collected those into an array. It doesn't manage your XML that you added later, but this gives you a nice array of objects to work with for the other records. I'll explain the main RegEx line here, I'll comment in-line where it's practical.
'(^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) [\d+?] (\w+?) {1,2}(.+?) - (.+)$' is the Regex that detects the start of a new record. I started to explain it, but there are probably better resources for you to learn RegEx than me explaining it to me. See this RegEx101.com link for a full breakdown and examples.
$Records=#() #Create empty array that we will populate with custom objects later
$Event = $Null #make sure nothing in $Event to give script a clean start
Get-Content 'C:\temp\test1.txt' | #Load file, and start looping through it line-by-line.
?{![string]::IsNullOrEmpty($_)}|% { #Filter out blank lines, and then perform the following on each line
if ($_ -match '(^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[\d+?] (\w+?) {1,2}(.+?) - (.+)$') { #New Record Detector line! If it finds this RegEx match, it means we're starting a new record.
if ($Event) { #If there's already a record in progress, add it to the Array
$Records+=$Event
}
$Event = New-Object PSObject -Property #{ #Create a custom PSObject object with these properties that we just got from that RegEx match
DateStamp = [datetime](get-date $Matches[1]) #We convert the date/time stamp into an actual DateTime object. That way sorting works better, and you can compare it to real dates if needed.
Type = $Matches[2]
Source = $Matches[3]
Message = $Matches[4]}
Ok, little pause for the cause here. $Matches isn't defined by me, why am I referencing it? . When PowerShell gets matches from a RegEx expression it automagically stores the resulting matches in $Matches. So all the groups that we just matched in parenthesis become $Matches[1], $Matches[2], and so on. Yes, it's an array, and there is a $Matches[0], but that is the entire string that was matched against, not just the groups that matched. We now return you to your regularly scheduled script...
} else { #End of the 'New Record' section. If it's not a new record if does the following
if($_ -match "^((?:[^ ^\[])(?:\w| |\.)+?):(.*)$"){
RegEx match again. It starts off by stating that this has to be the beginning of the string with the carat character (^). Then it says (in a non-capturing group noted by the (?:<stuff>) format, which really for my purposes just means it won't show up in $Matches) [^ \[]; that means that the next character can not be a space or opening bracket (escaped with a ), just to speed things up and skip those lines for this check. If you have things in brackets [] and the first character is a carat it means 'don't match anything in these brackets'.
I actually just changed this next part to include periods, and used \w instead of [a-zA-Z0-9] because it's essentially the same thing but shorter. \w is a "word character" in RegEx, and includes letters, numbers, and the underscore. I'm not sure why the underscore is considered part of a word, but I don't make the rules I just play the game. I was using [a-zA-Z0-9] which matches anything between 'a' and 'z' (lowercase), anything between 'A' and 'Z' (uppercase), and anything between '0' and '9'. At the risk of including the underscore character \w is a lot shorter and simpler.
Then the actual capturing part of this RegEx. This has 2 groups, the first is letters, numbers, underscores, spaces, and periods (escaped with a \ because '.' on it's own matches any character). Then a colon. Then a second group that is everything else until the end of the line.
$Field = $Matches[1] #Everything before the colon is the name of the field
$Value = $Matches[2].trim() #everything after the colon is the data in that field
$Event | Add-Member $Field $Value #Add the Field to $Event as a NoteProperty, with a value of $Value. Those two are actually positional parameters for Add-Member, so we don't have to go and specify what kind of member, specify what the name is, and what the value is. Just Add-Member <[string]name> <value can be a string, array, yeti, whatever... it's not picky>
} #End of New Field for current record
else{$Value = $_} #If it didn't find the regex to determine if it is a new field then this is just more data from the last field, so don't change the field, just set it all as data.
} else { #If it didn't find the regex then this is just more data from the last field, so don't change the field, just set it all as data.the field does not 'not exist') do this:
$Event.$Field += if(![string]::isNullOrEmpty($Event.$Field)){"`r`n$_"}else{$_}}
This is a long explanation for a fairly short bit of code. Really all it does is add data to the field! This has an inverted (prefixed with !) If check to see if the current field has any data, if it, or if it is currently Null or Empty. If it is empty it adds a new line, and then adds the $Value data. If it doesn't have any data it skips the new line bit, and just adds the data.
}
}
}
$Records+=$Event #Adds the last event to the array of records.
Sorry, I'm not very good with XML. But at least this gets you clean records.
Edit: Ok, code is notated now, hopefully everything is explained well enough. If something is still confusing perhaps I can refer you to a site that explains better than I can. I ran the above against your sample input in PasteBin.

One possible way to deal with such files is to process them line by line. Each log entry starts with a timestamp and ends when the next line starting with a timestamp appears, so you could do something like this:
Get-Content 'C:\path\to\your.log' | % {
if ($_ -match '^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}') {
if ($logRecord) {
# If a current log record exists, it is complete now, so it can be added
# to your XML or whatever, e.g.:
$logRecord -match '^(\d{4}/\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) (\S+) ...'
$message = $xml.CreateElement('Message')
$date = $xml.CreateElement('Date')
$date.InnerText = $matches[1]
$message.AppendChild($date)
$time = $xml.CreateElement('Time')
$time.InnerText = $matches[2]
$message.AppendChild($time)
$type = $xml.CreateElement('Type')
$type.InnerText = $matches[3]
$message.AppendChild($type)
...
$xml.SelectSingleNode('...').AppendChild($message)
}
$logRecord = $_ # start new record
} else {
$logRecord += "`r`n$_" # append to current record
}
}

Vim: Delete the text matching a pattern IF submatch(1) is empty

This command line parses a contact list document that may or may not have either a phone, email or web listed. If it has all three then everything works great - appending the return from the FormatContact() at the end of the line for data uploading:
silent!/^\d/+1|ki|/\n^\d\|\%$/-1|kj|'i,'jd|let #a = substitute(#",'\s*Phone: \([^,]*\)\_.*','\1',"")|let #b = substitute(#",'^\_.*E-mail:\s\[\d*\]\([-_#.0-9a-zA-Z]*\)\_.*','\1',"")|let #c = substitute(#",'^\_.*Web site:\s*\[\d*\]\([-_.:/0-9a-zA-Z]*\)\_.*','\1',"")|?^\d\+?s/$/\=','.FormatContact(#a,#b,#c)
or, broken down:
silent!/^\d/+1|ki|/\n^\d\|\%$/-1|kj|'i,'jd
let #a = substitute(#",'\s*Phone: \([^,]*\)\_.*','\1',"")
let #b = substitute(#",'^\_.*E-mail:\s\[\d*\]\([-_#.0-9a-zA-Z]*\)\_.*','\1',"")
let #c = substitute(#",'^\_.*Web site:\s*\[\d*\]\([-_.:/0-9a-zA-Z]*\)\_.*','\1',"")
?^\d\+?s/$/\=','.FormatContact(#a,#b,#c)
I created three separate searches so as not to make any ONE search fail if one atom failed to match because - again - the contact info may or may not exist per contact.
The Problem that solution created was that when the pattern does not match I get the whole #" into #a. Instead, I need it to be empty when the match does not occur. I need each variable represented (phone,email,web) whether it be empty or not.
I see no flags that can be set in the substitution function that
will do this.
Is there a way to return "" if \1 is empty?
Is there a way to create an optional atom so the search query(ies) could still account for an empty match so as to properly record it as empty?

Instead of using substitutions that replace the whole captured text
with its part of interest, one can match only that target part. Unlike
substitution routines, matching ones either locate the text conforming
to the given pattern, or report that there is no such text. Thus,
using the matchstr() function in preference to substitute(), the
parsing code listed in the question can be changed as follows:
let #a = matchstr(#", '\<Phone:\s*\zs[^,]*')
let #b = matchstr(#", '\<E-mail:\s*\[\d*\]\zs[-_#.0-9a-zA-Z]*')
let #c = matchstr(#", '\<Web site:\s*\[\d*\]\zs[-_.:/0-9a-zA-Z]*')

Just in case you want linewise processing, consider using in combination with :global, e.g.
let #a=""
g/text to match/let #A=substitute(getline("."), '^.*\(' . #/ . '\).*$', '\1\r', '')
This will print the matched text for any line that contained it, separated with newlines:
echo #a
The beautiful thing here, is that you can make it work with the last-used search-pattern very easily:
g//let #A=substitute(getline("."), '^.*\(' . #/ . '\).*$', '\1\r', '')

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Does Procmail have a lowercase function, or a similar capability? - procmail

AFAIK procmail regular expressions are always case INsensitive anyway, so you already get what you want without doing anything special. At least I always used it that way, and all the sites with procmail documentation I checked (3+) said so too.

Related

Error while compiling regex function, why am I getting this issue?

perl regex to handle and preserver single and multiple words into a variable

Deleting comments in a large file

Parsing log files

Vim: Delete the text matching a pattern IF submatch(1) is empty

Categories

Resources