Piping output of shell command to grep causes "grep: write error" - crystal-lang

I wrote a tool in Crystal that takes some command line parameters, and turns those into, basically, "find stuff | xargs grep" where xargs is instructed to use multiple processes. This is run via Process.run, and output and error are redirected into a custom IO object which filters what comes out of grep a bit, writing everything that isn't filtered into STDOUT.
When I run this normally, it mostly seems to run fine. There do seem to be some instances of output getting cut off before the search completes though, so I'm not sure I can fully trust the results. When I pipe the output from this command into grep, however, it always cuts off the search early and says "grep: write error". I have no idea why this is happening, and would love some help. Eventually I'll likely rewrite this to do everything in pure Crystal, but for now this is a quick solution to search the codebase I'm working on.
Here is the code that is getting run:
class FindFilterIO
  include IO
  @@generic_filter = [".diff", ".iml", "/target/"]
  @@web_filter = [".css", ".js", ".jsp", ".ftl"]
  def initialize(@web_search : Bool = false)
  end
  def read(slice : Bytes)
    raise "FindFilterIO does not support reading!"
  end
  def write(slice : Bytes)
    str = String.new slice
    if @@generic_filter.any? { |e| str.includes? e }
      return
    end
    if @web_search
      if !@@web_filter.any? { |e| str.includes? e }
        return
      end
    end
    STDOUT.write(slice)
  end
end
cmd = "find . -not \\( -path ./.svn -prune \\) " \
      "-not \\( -path ./.idea -prune \\) " \
      "-type f -print0 " \
      "| xargs -0 -P 1 -n 100 grep -E -n --color=always "
cmd += if @html_id
         "'id=['\"'\"'\"]#{@search_text}['\"'\"'\"]|\##{@search_text}'"
       elsif @html_class
         "'class=['\"'\"'\"]#{@search_text}['\"'\"'\"]|\\.#{@search_text}'"
       else
         "'#{@search_text}'"
       end
io = FindFilterIO.new web_search: (@html_id || @html_class)
Process.run(cmd, output: io, error: io, shell: true, chdir: File.join(@env.home_dir, @env.branch, "repodir"))

This seems to have been fixed now that the issue at https://github.com/crystal-lang/crystal/issues/2065 has been closed. Will need to do some more testing to make sure it's totally fixed, but using my older code seems to be working fine now.

Related

How to write unix regular expression to select for specific files in a cp for-loop

I've got a directory with a bunch of files. Instead of describing the filenames and extensions, I'll just show you what is in the directory:
P01_1.atag P03_3.tgt P05_6.src P08_3.atag P10_5.tgt
P01_1.src P03_4.atag P05_6.tgt P08_3.src P10_6.atag
P01_1.tgt P03_4.src P06_1.atag P08_3.tgt P10_6.src
P01_2.atag P03_4.tgt P06_1.src P08_4.atag P10_6.tgt
P01_2.src P03_5.atag P06_1.tgt P08_4.src P11_1.atag
P01_2.tgt P03_5.src P06_2.atag P08_4.tgt P11_1.src
P01_3.atag P03_5.tgt P06_2.src P08_5.atag P11_1.tgt
P01_3.src P03_6.atag P06_2.tgt P08_5.src P11_2.atag
P01_3.tgt P03_6.src P06_3.atag P08_5.tgt P11_2.src
P01_4.atag P03_6.tgt P06_3.src P08_6.atag P11_2.tgt
P01_4.src P04_1.atag P06_3.tgt P08_6.src P11_3.atag
P01_4.tgt P04_1.src P06_4.atag P08_6.tgt P11_3.src
P01_5.atag P04_1.tgt P06_4.src P09_1.atag P11_3.tgt
P01_5.src P04_2.atag P06_4.tgt P09_1.src P11_4.atag
P01_5.tgt P04_2.src P06_5.atag P09_1.tgt P11_4.src
P01_6.atag P04_2.tgt P06_5.src P09_2.atag P11_4.tgt
P01_6.src P04_3.atag P06_5.tgt P09_2.src P11_5.atag
P01_6.tgt P04_3.src P06_6.atag P09_2.tgt P11_5.src
P02_1.atag P04_3.tgt P06_6.src P09_3.atag P11_5.tgt
P02_1.src P04_4.atag P06_6.tgt P09_3.src P11_6.atag
P02_1.tgt P04_4.src P07_1.atag P09_3.tgt P11_6.src
P02_2.atag P04_4.tgt P07_1.src P09_4.atag P11_6.tgt
P02_2.src P04_5.atag P07_1.tgt P09_4.src P12_1.atag
P02_2.tgt P04_5.src P07_2.atag P09_4.tgt P12_1.src
P02_3.atag P04_5.tgt P07_2.src P09_5.atag P12_1.tgt
P02_3.src P04_6.atag P07_2.tgt P09_5.src P12_2.atag
P02_3.tgt P04_6.src P07_3.atag P09_5.tgt P12_2.src
P02_4.atag P04_6.tgt P07_3.src P09_6.atag P12_2.tgt
P02_4.src P05_1.atag P07_3.tgt P09_6.src P12_3.atag
P02_4.tgt P05_1.src P07_4.atag P09_6.tgt P12_3.src
P02_5.atag P05_1.tgt P07_4.src P10_1.atag P12_3.tgt
P02_5.src P05_2.atag P07_4.tgt P10_1.src P12_4.atag
P02_5.tgt P05_2.src P07_5.atag P10_1.tgt P12_4.src
P02_6.atag P05_2.tgt P07_5.src P10_2.atag P12_4.tgt
P02_6.src P05_3.atag P07_5.tgt P10_2.src P12_5.atag
P02_6.tgt P05_3.src P07_6.atag P10_2.tgt P12_5.src
P03_1.atag P05_3.tgt P07_6.src P10_3.atag P12_5.tgt
P03_1.src P05_4.atag P07_6.tgt P10_3.src P12_6.atag
P03_1.tgt P05_4.src P08_1.atag P10_3.tgt P12_6.src
P03_2.atag P05_4.tgt P08_1.src P10_4.atag P12_6.tgt
P03_2.src P05_5.atag P08_1.tgt P10_4.src
P03_2.tgt P05_5.src P08_2.atag P10_4.tgt
P03_3.atag P05_5.tgt P08_2.src P10_5.atag
P03_3.src P05_6.atag P08_2.tgt P10_5.src
I have a file that is just outside of this directory that I need to copy to all of the files that end with "_1.src" inside the directory.
I'm working with unix in the Terminal app, so I tried writing this for loop, but it rejected my regular expression:
for .*1.src in ./
> do
> cp ../1.src
> done
I've only written regular expressions in Python before and have minimal experience, but I was under the impression that .* would match any combination of characters. However, I got the following error message:
-bash: `.*1.src': not a valid identifier
I then tried the same for loop with the following regular expression:
^[a-zA-Z0-9_]*1.src$
But I got the same error message:
-bash: `^[a-zA-Z0-9_]*1.src$': not a valid identifier
I tried the same regular expression with and without quotation marks, but it always gives the same 'not a valid identifier' error message.
Tested on Bash 4.4.12, the following is possible:
$ for i in ./*_1.src; do echo "$i" ; done
This will echo every file ending with _1.src to the screen, thus moving it will be possible as well.
$ mkdir tmp
$ for i in ./*_1.src; do mv "$i" tmp/.; done
I've tested with the following data:
$ touch P{1,2}{0,1,2}_{0..6}.{src,tgt,atag}
$ ls
P10_0.atag P10_5.src P11_3.tgt P12_2.atag P20_0.src P20_5.tgt P21_4.atag P22_2.src
P10_0.src P10_5.tgt P11_4.atag P12_2.src P20_0.tgt P20_6.atag P21_4.src P22_2.tgt
P10_0.tgt P10_6.atag P11_4.src P12_2.tgt P20_1.atag P20_6.src P21_4.tgt P22_3.atag
P10_1.atag P10_6.src P11_4.tgt P12_3.atag P20_1.src P20_6.tgt P21_5.atag P22_3.src
P10_1.src P10_6.tgt P11_5.atag P12_3.src P20_1.tgt P21_0.atag P21_5.src P22_3.tgt
P10_1.tgt P11_0.atag P11_5.src P12_3.tgt P20_2.atag P21_0.src P21_5.tgt P22_4.atag
P10_2.atag P11_0.src P11_5.tgt P12_4.atag P20_2.src P21_0.tgt P21_6.atag P22_4.src
P10_2.src P11_0.tgt P11_6.atag P12_4.src P20_2.tgt P21_1.atag P21_6.src P22_4.tgt
P10_2.tgt P11_1.atag P11_6.src P12_4.tgt P20_3.atag P21_1.src P21_6.tgt P22_5.atag
P10_3.atag P11_1.src P11_6.tgt P12_5.atag P20_3.src P21_1.tgt P22_0.atag P22_5.src
P10_3.src P11_1.tgt P12_0.atag P12_5.src P20_3.tgt P21_2.atag P22_0.src P22_5.tgt
P10_3.tgt P11_2.atag P12_0.src P12_5.tgt P20_4.atag P21_2.src P22_0.tgt P22_6.atag
P10_4.atag P11_2.src P12_0.tgt P12_6.atag P20_4.src P21_2.tgt P22_1.atag P22_6.src
P10_4.src P11_2.tgt P12_1.atag P12_6.src P20_4.tgt P21_3.atag P22_1.src P22_6.tgt
P10_4.tgt P11_3.atag P12_1.src P12_6.tgt P20_5.atag P21_3.src P22_1.tgt P10_5.atag
P11_3.src P12_1.tgt P20_0.atag P20_5.src P21_3.tgt P22_2.atag
Apparently, my previous answer didn't work. But this seems to:
$ for x in `echo ./P[01][012]_1.src`; do echo "$x"; done
./P01_1.src
./P02_1.src
So, when you run this echo alone, this pattern gets expanded into many names:
$ echo ./P[01][012]_1.src # note that the 'regex' is not enclosed in quotes
./P01_1.src ./P02_1.src
And then you can iterate over these names in a loop.
BTW, as noted in the comments, you don't even need that echo, so you can plug the pattern right into the loop:
for x in ./P[01][012]_1.src; do echo "$x"; done
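The root of the confusion in the question is that the shell expands glob patterns, not regular expressions. A minimal Python sketch of the difference, using a few file names taken from the listing above (the `names` list and variable names are only illustrative):

```python
import fnmatch
import re

names = ["P01_1.src", "P01_1.tgt", "P01_2.src", "P11_1.src", "P12_1.atag"]

# Shell-style glob: * stands for any run of characters, . is a literal dot.
glob_hits = fnmatch.filter(names, "*_1.src")

# Roughly equivalent regular expression: .* for "anything", \. for a literal dot.
regex = re.compile(r".*_1\.src")
regex_hits = [n for n in names if regex.fullmatch(n)]

print(glob_hits)
print(regex_hits)
```

Both lists contain the same two names, which is why `./*_1.src` works in the loop while `.*1.src` does not.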
Please correct me if your goal is something other than
"overwrite many existing files sharing a common suffix with the contents of a single file"
find /path/to/dest_dir -type f -name "*_1.src" |xargs -n1 cp /path/to/source_file
Note that without the -maxdepth 1 option, find will recurse through your destination directory.
Thanks to everyone; this is what ended up working:
for x in `echo ./P[0-9]*_1.src`
> do
> cp ../1.src "$x"
> done
This loop allowed me to copy the contents of the one file to all of the files in the subdirectory that ended with "_1.src"
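For comparison, here is a rough Python sketch of the same "copy one file over every match" loop. The function name `overwrite_matching` and the demo file names are assumptions made for illustration, not part of the original thread:

```python
import shutil
import tempfile
from pathlib import Path

def overwrite_matching(source, dest_dir, pattern="*_1.src"):
    """Copy source over every regular file in dest_dir whose name matches the glob."""
    copied = []
    for target in sorted(Path(dest_dir).glob(pattern)):
        if target.is_file():
            shutil.copyfile(source, target)
            copied.append(target.name)
    return copied

if __name__ == "__main__":
    # Throwaway directory so the demo does not touch real files.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        sub = root / "files"
        sub.mkdir()
        (root / "1.src").write_text("new contents\n")
        for name in ("P01_1.src", "P01_1.tgt", "P02_1.src"):
            (sub / name).write_text("old\n")
        print(overwrite_matching(root / "1.src", sub))
```

Like the shell loop, this only touches files directly inside the directory and does not recurse.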

bash script - fetch only unique domains from email list to variable

I am new to bash and am having trouble working out how to do this.
I want to check the domains of all the email addresses in the "To:" field and collect the unique domains in a variable, so that I can compare them with the domain of the "From:" address.
I get the "From:" address domain by using
grep -m 1 "From: " filename | cut -f 2 -d '@' | cut -d ">" -f 1
when reading a mail stored in the file filename.
The "To:" field can contain multiple addresses with multiple domains, and I am not sure how to get the unique domains from it.
An example "To:" line looks like this:
To: user@domain.com, user2@domain.com,
 User Name <sample@domaintest.com>, test@domainname.com
I tried
grep -m 1 "^To: " filename | cut -f 2 -d '@' | cut -d ">" -f 1
but the addresses come in different formats, so I am not sure whether grep is the right tool or whether I should look at awk or something else.
I need to get the list of unique domains from the "To:" address or addresses into a variable in a bash script.
Desired output for above example:
domain.com,domaintest.com,domainname.com
If you are hellbent on doing this with line-oriented utilities, there is a utility formail in the Procmail distribution which can normalize things for you somewhat.
bash$ formail -czxTo: <<\==test==
> From: me <sender@example.com>
> To: you <first@example.org>,
>   them <other@example.net>
> Subject: quick demo
>
> Very quick, innit.
> ==test==
first@example.org, other@example.net
So with that you have input which you can actually pass to grep or Awk ... or sed.
todoms=$(formail -czxTo: <message | tr ',' '\n' | sed 's/.*@//')
The From: address will not be normalized by formail -czxFrom: but you can use a neat trick: make formail generate a reply back to the From: address, and then extract the To: header from that.
fromdom=$(formail -rtzcxTo: <message | sed 's/.*@//')
In some more detail, -r says to create a new reply to whoever sent you message, and then we do -zcxTo: on that.
(The -t option may or may not do what you want. In this case, I would perhaps omit it. http://www.iki.fi/era/procmail/formail.html has (vague) documentation for what it does; see also the section just before http://www.iki.fi/era/procmail/mini-faq.html#group-writable and sorry for the clumsy link -- there doesn't seem to be a good page-internal anchor to link to.)
Email address normalization is tricky because there are so many variants to choose from.
From: Elvis Parsley <king@graceland.example.com>
From: king@graceland.example.com
From: "Parsley, Elvis" <king@graceland.example.com> (kill me, I have to use Outlook)
From: "quoted@string" <king@graceland.example.com> (wait, he is already dead)
To: This could fold <recipient@example.net>,
  over multiple lines <another@example.org>
I would turn to a more capable language with proper support for parsing all of these formats. My choice would be Python, though you could probably also pull this off in a few lines of Ruby or Perl.
The email library was revamped in Python 3.6, so this assumes you have at least that version. The email.headerregistry module, which is new in 3.6, is particularly convenient here.
#!/usr/bin/env python3
from email.policy import default
from email import message_from_binary_file
import sys

if len(sys.argv) == 1:
    sys.argv.append('-')
for arg in sys.argv[1:]:
    if arg == '-':
        # message_from_binary_file needs bytes, so read the raw stdin buffer
        handle = sys.stdin.buffer
    else:
        handle = open(arg, 'rb')
    message = message_from_binary_file(handle, policy=default)
    # From: is parsed into an address header; take its first mailbox
    from_dom = message['From'].addresses[0].domain
    # Collect the To: domains that differ from the sender's domain
    to_doms = set()
    for addr in message['To'].addresses:
        dom = addr.domain
        if dom == from_dom:
            continue
        to_doms.add(dom)
    print(','.join([from_dom] + list(to_doms)))
    if arg != '-':
        handle.close()
This simply produces a comma-separated list of domain names; you might want to do the rest of the processing in Python too instead, or change this so that it prints something in a slightly different format.
You'd save this in a convenient place (say, /usr/local/bin/fromto) and mark it as executable (chmod 755 /usr/local/bin/fromto). Now you can call this from the shell like any other utility like grep.
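If all you need is the unique-domain list from a single header value, a smaller sketch along the same lines might look like this. It uses `email.utils.getaddresses` from the standard library; the helper name `unique_domains` and the sample `to_line` (the question's example, unfolded onto one line) are illustrative assumptions:

```python
from email.utils import getaddresses

def unique_domains(header_value):
    """Return the domains found in an address header, de-duplicated, in first-seen order."""
    seen = []
    for _name, address in getaddresses([header_value]):
        _local, sep, domain = address.rpartition("@")
        # Skip anything without an @ and anything already recorded.
        if sep and domain not in seen:
            seen.append(domain)
    return seen

to_line = ("user@domain.com, user2@domain.com, "
           "User Name <sample@domaintest.com>, test@domainname.com")
print(",".join(unique_domains(to_line)))
```

Unlike the set used in the script above, the list keeps the domains in the order they first appear.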

Git Bash regex to match latest tag

My VCS has these tags
0.0.3.156-alpha+2
0.0.3.154
0.0.3.153
build-.139
build-.140
build-.142
build-0.0.1.28
build-0.0.1.29
build-0.0.1.30
build-0.0.1.32
I want to git describe --match "<regex>" to get the latest tag of the form number.number.number.number (so it's 0.0.3.154 in this case)
I have tried git describe --match "[0-9]*.[0-9]*.[0-9]*.[0-9]*$" but it doesn't return anything, and neither do these patterns:
"[0-9]*.[0-9]*.[0-9]*.[0-9]+"
"[0-9]*.[0-9]*.[0-9]*.[0-9]{1,}"
I need to get the latest tag in order to bump the version for the next release, so I'm thinking of doing this automatically. Please let me know if I'm missing anything.
Thanks
UPDATE:
In my build.gradle file I have a function to get the tag like this (following @Marc's reply):
version getVersionFromTag()
def getVersionFromTag() {
    def stdout = new ByteArrayOutputStream()
    exec {
        commandLine 'git', 'tag', '|' , 'grep', '^\([0-9]\+\.\?\)\+$', '|', 'sort' , '-nr', '|', 'head', '-1'
        standardOutput = stdout
    }
    return stdout.toString().trim()
}
Here it gives the error Unexpected Char '\' for the regex above. So I removed the backslashes, giving '^([0-9]+.?)+$'. Then it runs fine, but my final artifact does not have the version appended to its name (i.e. helloword.jar instead of helloword-0.0.3.154.jar).
=> My question is: how should I put @Marc's suggested command into the Gradle function correctly?
For testing I've put the output of your git describe in a file. This will do:
cat file | grep '^\([0-9]\+\.\?\)\+$' | sort -nr | head -1
0.0.3.154
Suppose you've created some irregularly formatted tags and you want to use those as well (like your build- tags) for finding the highest tag:
sed -E 's/^[^0-9.]*//' file | grep '^\([0-9]\+\.\?\)\+$' | sort -nr | head -1
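As far as I know, git describe --match takes a glob pattern rather than a regular expression, which would explain why quantifiers such as + and {1,} never match. The filter-and-sort idea can also be sketched in Python, comparing each version component as an integer. The tag list is copied from the question; `latest_version_tag` is a hypothetical helper name:

```python
import re

TAGS = [
    "0.0.3.156-alpha+2", "0.0.3.154", "0.0.3.153",
    "build-.139", "build-.140", "build-.142",
    "build-0.0.1.28", "build-0.0.1.29", "build-0.0.1.30", "build-0.0.1.32",
]

# Exactly four dot-separated runs of digits, nothing before or after.
VERSION = re.compile(r"\d+(?:\.\d+){3}")

def latest_version_tag(tags):
    """Return the highest number.number.number.number tag, or None if there is none."""
    candidates = [t for t in tags if VERSION.fullmatch(t)]
    if not candidates:
        return None
    # Compare component by component as integers, so 10 sorts after 9.
    return max(candidates, key=lambda t: tuple(int(p) for p in t.split(".")))

print(latest_version_tag(TAGS))
```

In a real build the list would come from the output of git tag rather than a hard-coded constant.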

sed command issues

Background
Original json (test.json): {"rpc-password": "password"}
Expected changed json: {"rpc-password": "somepassword"}
replace_json_str is a function used to replace password with somepassword using sed.
replace_json_str() {
    x=$1
    sed -i -e 's/\({ "'"$2"'":\)"[^"]*" }/\1"'"$3"'" }/g' $x
}
Unit test: replace_json_str test.json rpc-password somepassword
Issue
After running the above test, I get a file named test.json-e, and the contents of test.json are the same as before the test was run. Why?
There is a handy command-line JSON tool called jq.
cat input.json
{"rpc-password": "password"}
cat update_json.sh
givenkey=$1
givenvalue=$2
inputfile=input.json
outfile=output.json
cat $inputfile | jq . # show input json
jq --arg key1 "$givenkey" --arg val1 "$givenvalue" '.[$key1] = $val1' "$inputfile" > "$outfile"
cat "$outfile" | jq . # render output json
Keep in mind that jq can handle multiple such key/value updates. Run the script like this:
update_json.sh rpc-password somepassword
{
  "rpc-password": "password"
}
{
  "rpc-password": "somepassword"
}
Depends on which sed you're using.
The command you ran will work as expected with GNU sed.
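But BSD sed requires an argument to -i (the backup suffix, which may be an empty string passed as a separate argument), so if you run the same command there, it takes the next argument, -e, as the backup suffix. That is where the file named test.json-e comes from.
Also, the positioning of the spaces in your pattern doesn't match your example JSON, so the substitution never applies.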
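If a scripting language is available, parsing the JSON avoids both the sed portability problem and the whitespace sensitivity of the pattern. A minimal sketch with Python's standard json module, where the function name `replace_json_value` is an assumption chosen to mirror the shell function:

```python
import json

def replace_json_value(text, key, value):
    """Parse a JSON object, set key to value, and return the re-serialized text."""
    data = json.loads(text)
    data[key] = value
    return json.dumps(data)

print(replace_json_value('{"rpc-password": "password"}', "rpc-password", "somepassword"))
```

Because the text is parsed rather than pattern-matched, it does not matter whether the original file has spaces inside the braces.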

how to detect when compiler emits an error

To compile a C++ project, I want to write a perl script to compile my program and see if the compilation went wrong or not. If the compiler gives any compilation error, I'll need to perform some other task.
The perl script will be something like this:
@l1 = `find . -name '*.c'`;
@l2 = `find . -name '*.cpp'`;
@l3 = `find . -name '*.cc'`;
my $err;
my $FLAGS = "-DNDEBUG";
push(@l, @l1, @l2, @l3);
chomp(@l);
foreach (@l) {
    print "processing file $_ ...";
    $err = `g++ $_ $FLAGS`;
    if ($err == something) {
        # do the needful
    }
}
so what should be something?
You should check $? instead, after running g++.
From perlvar:
$?
    The status returned by the last pipe close, backtick (``) command,
    successful call to wait() or waitpid(), or from the system() operator.
    The exit value of the subprocess is really ($? >> 8)
So you should check whether g++ returned 0 (success) or non-zero.
if ($? >> 8) {
    # Error?
}
The IPC::System::Simple and IPC::Run3 modules make this easier.
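The same check, deciding on the exit status rather than on the captured output, can be sketched in Python for comparison. The helper name `run_ok` is hypothetical, and the commands below are stand-ins so the example runs without a compiler installed:

```python
import subprocess
import sys

def run_ok(cmd):
    """Run cmd and return (succeeded, stderr_text) based on its exit status."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0, result.stderr

# Stand-in commands; in practice cmd would be something like
# ["g++", "-DNDEBUG", "-c", path] for each source file found.
first_ok, _ = run_ok([sys.executable, "-c", "raise SystemExit(0)"])
second_ok, _ = run_ok([sys.executable, "-c", "raise SystemExit(3)"])
print(first_ok, second_ok)
```

Any non-zero exit status counts as a failure here, which matches the `$? >> 8` test in the Perl answer.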