Get procmail to reply to larger messages - procmail

I am trying to reply to messages larger than a certain size and then forward them to another user. I've got this, but nothing happens. It seems I am only able to add text to the end of the message.
:0
* > 1000
{
:0 fhw
| cat - ; echo "Insert this text at the top of the body"
:0
| formail -rk
| $SENDMAIL -t
}

Using sed helped a lot.
SEDSCRIPT='0,/^$/ s//\nLarge message rejected [Max=4MB]\n/'
MAILADDR=me@nowhere
:0
* > 4000000
* !^FROM_DAEMON
* $ !^X-Loop: $MAILADDR
| formail -rk -A "X-Loop: $MAILADDR" \
| sed "$SEDSCRIPT" \
| $SENDMAIL -t
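To see what that SEDSCRIPT does, here is a quick test outside Procmail (a sketch; the 0,/re/ address form and the \n in the replacement are GNU sed extensions):
$ printf 'From: x\nTo: y\n\noriginal body\n' | sed '0,/^$/ s//\nLarge message rejected [Max=4MB]\n/'
From: x
To: y

Large message rejected [Max=4MB]

original body
It replaces the first empty line (the header/body separator) with the notice wrapped in blank lines, so the notice lands at the top of the body.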

It's not clear what exactly is wrong, but if you want to add text at the beginning, you obviously need to echo before cat, and operate on the body (b), not the headers (h).
:0 fbw
| echo "Insert this"; cat -
I suppose you could technically break the headers by appending something at the end, but if you want it to appear in the body, it needs to have a neck (the empty line which separates the headers from the body) before it.
:0 fhw
| cat -; echo; echo "Insert this"
There is also a sed syntax which allows for somewhat more flexible manipulation (sed addressing lets you say things like "before the first line which starts with >", for example), but getting newlines into sed command lines inside Procmail is hairy. As a workaround, I often put the script in a variable, and then just interpolate that. (How hairy exactly depends on details of sed syntax which are not standard. Some implementations seem to require newlines in the a and i commands.)
sedscript='1i\
insert this\
'
:0 fbw
| sed "$sedscript"
(If you are lucky, your sed will accept something simpler like sed '1i insert this'. The variant above seems to be the only one I can get to work on macOS, and thus probably generally *BSD.)
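For the record, here is a sketch of that regex-address flavor with GNU sed (0,/re/ and \n in the replacement are GNU extensions, so this won't fly on macOS/*BSD); it inserts a line before the first line which starts with >:
sed '0,/^>/ s/^>/Insert this\n>/'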
As an aside, a message which is 1000 bytes long isn't by any means large. I recall calculating an average message length of about 4k in my own inbox, but this was before people started to use HTML email clients. Depending on your inbound topology, just the headers could easily be more than 1000 bytes.
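As for the forwarding half of the original goal, which none of the snippets above show: procmail's ! action forwards a message, and the c flag lets a copy continue down the rest of the rcfile. A sketch, with a placeholder address:
:0 c
* > 4000000
! other-user@example.com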

Related

jq remove spaces after first

It seemed simple, but no luck so far. I've tried lots of things. The best I've got:
echo "low quality not gonna apologize" | jq -r 'gsub("[\\s+]"; " "; "g")'
parse error: Invalid numeric literal at line 1, column 4
The goal is to have a single space replace any occurrence of multiple whitespace characters of any kind. Note that I already removed tabs and newlines from this stream. This is the bash shell. In the larger application I'm building I don't get this error either; there the code simply and quietly fails to turn the multiple spaces into a single space, and I don't know why.
The right way with jq:
echo "low quality not gonna apologize" | jq -Rr 'gsub("\\s+";" ";"g")'
-R - raw input; each line of text is passed to the filter as a string
The output:
low quality not gonna apologize
Two of many alternatives:
$ echo '"low quality not gonna apologize"' | jq -r 'gsub("\\s+"; " ")'
low quality not gonna apologize
$ jq -n --arg in "low quality not gonna apologize" '$in | gsub("\\s+"; " ")'
"low quality not gonna apologize"
Notice that:
Not every shell string is a JSON string.
The --arg command-line option has the effect of coercing the shell string to a JSON string.
If you use gsub, there is no need to specify "g" as well.
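A quick illustration of what -R buys you, on a hypothetical two-line input: without it, jq tries to parse the raw shell text as JSON and fails with the "Invalid numeric literal" error shown above; with it, each input line arrives at the filter as a JSON string:
$ printf 'low  quality\nnot   gonna    apologize\n' | jq -Rr 'gsub("\\s+"; " ")'
low quality
not gonna apologize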

Ignore long lines in silversearcher

Right now I am using:
ag sessions --color|cut -b1-130
But this causes color artifacts if the search match is cut in two by the cut command.
Silversearcher has this in the docs:
--print-long-lines
Print matches on very long lines (> 2k characters by default).
Can I change 2k to something else? (120 for me, because honestly none of the real code I work with has lines longer than that.)
Very strangely, the documented --print-long-lines actually does nothing at all, yet there is a working switch for this: -W NUM / --width NUM which is not documented at all. See https://github.com/ggreer/the_silver_searcher/pull/720
I can think of three options:
Just print the matching part of the line instead of the whole line, using the -o option: ag --color -o <pattern>
Use less instead of cut which nicely chops long lines at the screen size's width using the -S option (chop long lines) and the -R option (to deal with the color escape sequences): ag --color <pattern> | less -R -S
Use something like sed or awk instead of cut: ag --color <pattern> | sed -E "s/(.{$COLUMNS}).*$/\1/", which cuts the returned line at the width of your screen. Of course, if you're determined to chop at 120 columns, you can: ag --color <pattern> | sed -E "s/(.{120}).*$/\1/"
This last option doesn't prevent the possibility of chopping in the middle of a color escape sequence; if you're really hellbent, you can modify the sed search pattern to ignore color escape sequences -- already answered on SO. That said, I don't see the point of doing this given the simplicity and correctness of option 1 above.
ag --width 400 string dir/
# In .bash_aliases (s is for short)
alias ags='ag --width 400'
This truncates lines longer than 400 chars rather than printing them in full.
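If you would rather track the actual terminal width than hard-code 400, a small variation (a sketch; assumes tput is available):
# In .bash_aliases
alias agw='ag --width "$(tput cols)"'
Because the body is single-quoted, $(tput cols) is evaluated each time the alias runs, so it follows the current window size.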

How to use awk and grep on 300GB .txt file?

I have a huge .txt file, 300GB to be more precise, and I would like to put all the distinct strings from the first column, that match my pattern into a different .txt file.
awk '{print $1}' file_name | grep -o '/ns/.*' | awk '!seen[$0]++' > test1.txt
This is what I've tried, and as far as I can see it works fine but the problem is that after some time I get the following error:
awk: program limit exceeded: maximum number of fields size=32767
FILENAME="file_name" FNR=117897124 NR=117897124
Any suggestions?
The error message tells you:
line 117897124 has too many fields (more than the 32767 limit).
You'd better check it out (p;q prints that line and quits, so sed doesn't have to read the rest of the 300GB):
sed -n '117897124{p;q}' file_name
Use cut to extract 1st column:
cut -d ' ' -f 1 < file_name | ...
Note: You may change ' ' to whatever the field separator is. The default is $'\t'.
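Here is a sketch of the original pipeline with cut swapped in for the first awk (assuming space-separated columns, as the original command implies):
cut -d ' ' -f 1 < file_name | grep -o '/ns/.*' | awk '!seen[$0]++' > test1.txt
cut doesn't split the line into fields at all, so the 32767-field limit never comes into play, and the remaining awk only ever sees the short first column.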
The 'number of fields' is the number of 'columns' in the input file, so if one of the lines is really long, then that could potentially cause this error.
I suspect that the awk and grep steps could be combined into one:
sed -n 's/\(^pattern...\).*/\1/p' some_file | awk '!seen[$0]++' > test1.txt
That might evade the awk problem entirely (that sed command substitutes any leading text which matches the pattern, in place of the entire line, and if it matches, prints out the line).
It seems to me that your awk implementation has an upper limit on what it can read in one go, and it gave up at record 117,897,124. The limits can vary according to your implementation and your OS.
Maybe a sane way to approach this problem is to write a small script that uses split to break the large file into smaller ones, with no more than 100,000,000 records each.
Just in case you don't want to split the file, you could look for the limits file corresponding to your awk implementation. Maybe you can define unlimited as the Number of Records value, although I believe that is not a good idea, as you might end up using a lot of resources...
If you have enough free space on disk (Vim creates a temporary .swp file), I suggest using Vim. Vim's regex dialect differs slightly from the standard ones, but you can convert from standard regex to Vim regex with this tool: http://thewebminer.com/regex-to-vim
The error message says your input file contains too many fields for your awk implementation. Just change the field separator to be the same as the record separator and you'll only have 1 field per line and so avoid that problem, then merge the rest of the commands into one:
awk 'BEGIN{FS=RS} {sub(/[[:space:]].*/,"")} /\/ns\// && !seen[$0]++' file_name
If that's a problem (for a 300GB file the seen[] array may not fit in memory) then try:
awk 'BEGIN{FS=RS} {sub(/[[:space:]].*/,"")} /\/ns\//' file_name | sort -u
There may be an even simpler solution but since you haven't posted any sample input and expected output, we're just guessing.

sed: return last occurrence match until end of file

Using sed, how do I return the last occurrence of a match through to the end of the file?
(FYI this has been simplified)
So far I've tried:
sed -n '/ Statistics |/,$p' logfile.log
Which returns all lines from the first match onwards (almost the entire file)
I've also tried:
linenum=`tail -400 logfile.log | grep -n " Statistics |" | tail -1 | cut -d: -f1`
sed "$linenum,\$!d" logfile.log
This works, but it won't work over an ssh connection as a single command; I really need it all to be in one pipeline.
Format of the log file is as follows:
(There are statistics headers with sub data written to the log file every minute, the purpose of this command is to return the most recent Statistics header together with any associated errors that occur after the header)
Statistics |
Stuff
More Stuff
Even more Stuff
Statistics |
Stuff
More Stuff
Error: incorrect value
Statistics |
Stuff
More Stuff
Even more Stuff
Statistics |
Stuff
Error: error type one
Error: error type two
EOF
Return needs to be:
Statistics |
Stuff
Error: error type one
Error: error type two
Your example script has a space before Statistics but your sample data doesn't seem to have one. This uses a regex which assumes Statistics is at the beginning of the line; tweak if that's incorrect.
sed -n '/^Statistics |/h;/^Statistics |/!H;$!b;x;p'
When you see Statistics, replace the hold space with the current line (h). Otherwise, append to the hold space (H). If we are not at the end of file, stop here (b). At end of file, print out the hold space (x exchanges the pattern space with the hold space; p prints).
In a sed script, commands are optionally prefixed by an "address". Most commonly this is a regex, but it can also be a line number. The address /^Statistics |/ selects all lines matching the regular expression; /^Statistics |/! selects lines not matching the regular expression; and $! matches all lines except the last line in the file. Commands with no explicit address are executed for all input lines.
Edit: explained the script in some more detail, and added the following.
Note that if you need to pass this to a remote host using ssh, you will need additional levels of quoting. One possible workaround if it gets too complex is to store this script on the remote host, and just ssh remotehost path/to/script. Another possible workaround is to change the addressing expressions so that they don't contain any exclamation marks (these are problematic on the command line e.g. in Bash).
sed -n '/^Statistics |/{h;b};H;${x;p}'
This is somewhat simpler, too!
A third possible workaround, if your ssh pipeline's stdin is not tied up for other things, is to pipe in the script from your local host.
echo '/^Statistics |/h;/^Statistics |/!H;$!b;x;p' |
ssh remotehost sed -n -f - file
If you have tac available:
tac INPUTFILE | sed '/^Statistics |/q' | tac
The first tac prints the file last line first, sed passes everything through until it has printed the first (i.e. originally last) Statistics line and then quits, and the second tac restores the original order.
This might work for you:
sed '/Statistics/h;//!H;$!d;x' file
Statistics |
Stuff
Error: error type one
Error: error type two
If you're happy with an awk solution, this kinda works (apart from getting an extra blank line):
awk '/^Statistics/ { buf = "" } { buf = buf "\n" $0 } END { print buf }' input.txt
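The extra blank line comes from seeding buf with "" and then unconditionally prepending "\n". A small tweak avoids it (a sketch; it assumes the file starts with a Statistics line, as in the sample):
awk '/^Statistics/ { buf = $0; next } { buf = buf "\n" $0 } END { print buf }' input.txt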
sed ':a;N;$!ba;s/.*Statistics/Statistics/g' INPUTFILE
should work (GNU sed 4.2.1).
It reads the whole file to one string, then replaces everything from the start to the last Statistics (word included) with Statistics, and prints what's remaining.
HTH
This might also work, a slightly simpler version of the sed solutions given above:
sed -n 'H; /^Statistics |/h; ${g;p;}' logfile.log
Output:
Statistics |
Stuff
Error: error type one
Error: error type two

bash grep - negative match

I want to flag the places in my Python unittests where I have been lazy and de-activated tests.
But I also have conditional executions that are not laziness, they are motivated by performance or system conditions at time of testing. Those are the skipUnless ones and I want to ignore them entirely.
Let's take some inputs that I have put in a file, test_so_bashregex.txt, with some comments.
!ignore this, because skipUnless means I have an acceptable conditional flag
#unittest.skipUnless(do_test, do_test_msg)
def test_conditional_function():
xxx
!catch these 2, lazy test-passing
#unittest.skip("fb212.test_urls_security_usergroup Test_Detail.test_related fails with 302")
def sometest_function():
xxx
#unittest.expectedFailure
def test_another_function():
xxx
!bonus points... ignore things that are commented out
# #unittest.expectedFailure
Additionally, I can't use a grep -v skipUnless in a pipe because I really want to use egrep -A 3 xxx *.py to give some context, as in:
grep -A 3 "#unittest\." *.py
test_backend_security_meta.py: #unittest.skip("rewrite - data can be legitimately missing")
test_backend_security_meta.py- def test_storage(self):
test_backend_security_meta.py- with getMultiDb() as mdb:
test_backend_security_meta.py-
What I have tried:
Trying things out on https://www.debuggex.com/:
I tried #unittest\.(.+)(?!(Unless\()) and that didn't work, as it matches the first 3.
Ditto #unittest\.[a-zA-Z]+(?!(Unless\())
#unittest\.skip(?!(Unless\()) worked partially, on the 2 with skip.
All of those do partial matches despite the presence of Unless.
On bash egrep, which is where this is going to end up, things don't look much better:
jluc@explore$ egrep '#unittest\..*(?!(Unless))' test_so_bashregex.txt
egrep: repetition-operator operand invalid
You could try this regex (it uses lookbehind/lookahead, so it needs a PCRE-capable tool such as grep -P):
(?<!#\s)#unittest\.(?!skipUnless)(skip|expectedFailure).*
If you don't care whether 'skip' or 'expectedFailure' appears, you could simplify it:
(?<!#\s)#unittest\.(?!skipUnless).*
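A sketch of applying it with GNU grep's -P (PCRE) mode, which supports the lookarounds that plain egrep rejects:
grep -P -A 3 '(?<!#\s)#unittest\.(?!skipUnless)(skip|expectedFailure)' test_so_bashregex.txt
Note that -P is not available in every grep build (notably not in stock BSD/macOS grep).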
How about something like this - grep seems a bit restrictive
items=$(find . -name "*.py")
for item in $items; do
    awk '
        /^#unittest.*skipUnless/ { next }    # acceptable conditional, ignore it
        /^#unittest.*(skip|expectedFailure)/ { seen_skip = 1 }
        /^def/ {
            if (seen_skip == 1)
                print "Being lazy at " $2    # $2 is the function name
            seen_skip = 0
        }
    ' "$item"
done
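Run against the sample file above, this should flag only the two lazy tests; the skipUnless block and the commented-out decorator are ignored:
Being lazy at sometest_function():
Being lazy at test_another_function():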
OK, I'll put up what I found with sweaver2112's help, but if someone has a good single-stage grep-ready regex, I'll take it.
bash's egrep/grep doesn't like ?! (ref grep: repetition-operator operand invalid) -- POSIX BRE/ERE simply have no lookarounds. End of story there.
What I have done instead is to pipe it through some extra filters: a negative grep -v skipUnless and another one to strip leading comments. These two strip out the unwanted lines. Then I pipe their output back into another grep looking for #unittest, again with the -A 3 flag.
If the negative greps have cleared out a line, it won't show up in the last pipe stage, so it drops out of the input. If not, I get my context right back.
egrep -A 3 -n '#unittest\.' test_so_bashregex.txt | egrep -v "^\s*#" | egrep -v "skipUnless\(" | grep '#unittest' -A 3
output:
7:#unittest.skip("fb212.test_urls_security_usergroup Test_Detail.test_related fails with 302")
8-def sometest_function():
9- xxx
10:#unittest.expectedFailure
11-def test_another_function():
12- xxx
And my actual output from running it on *.py files, rather than my test.txt file:
egrep -A 3 -n '#unittest\.' *.py | egrep -v "\d:\s*#" | egrep -v "skipUnless\(" | grep '#unittest' -A 3
output:
test_backend_security_meta.py:77: #unittest.skip("rewrite - data can be legitimately missing")
test_backend_security_meta.py-78- def test_storage(self):
test_backend_security_meta.py-79- with getMultiDb() as mdb:
test_backend_security_meta.py-80-
--
test_backend_security_meta.py:98: #unittest.skip("rewrite - data can be legitimately missing")
test_backend_security_meta.py-99- def test_get_li_tag_for_object(self):
test_backend_security_meta.py-100- li = self.mgr.get_li_tag()
test_backend_security_meta.py-101-