sed - Regex fails

Given the following files:
input_file:
if_line1
if_line2
template_file_1:
temp_file_line1
temp_file_line2
##regex_match## <= must be replaced by input_file
temp_file_line3
template_file_2:
temp_file_line1
temp_file_line2
{my_file.global} <= must be replaced by input_file
temp_file_line3
output_file:
temp_file_line1
temp_file_line2
if_line1
if_line2
temp_file_line3
For template_file_1 the following sed command works:
sed -n -e '/##regex_match##/{r input_file' -e 'b' -e '}; p' template_file_1 > output_file
However, for template_file_2 the analogous sed command fails:
sed -r -n -e '/(?<={).+\.global(?=})/{r input_file' -e 'b' -e '}; p' template_file_2 > output_file
sed complains that the regular expression is invalid.
The regex is valid PCRE at least; for example, grep -oP '(?<={).+\.global(?=})' template_file_2 works. Any idea how to deal with that?

perl one-liners:
perl -pe 'do {local $/; open $f, "<input_file"; $_ = <$f>; close $f} if /\{.+?\.global\}/' template_file_2
or perhaps this one, which is not "pure" perl because it calls cat:
perl -ne 'if (/\{.+?\.global\}/) {system("cat","input_file")} else {print}' template_file_2
Using CPAN modules can make this really tidy:
perl -MPath::Tiny -pe '$_ = path("input_file")->slurp if /\{.+?\.global\}/' template_file_2

I don't know exactly what that PCRE is intended to do, but taking a guess at it, this will work using any awk in any shell on every UNIX box:
$ awk 'NR==FNR{new=new s $0; s=ORS; next} /##regex_match##/{$0=new} 1' input_file template_file_1
temp_file_line1
temp_file_line2
if_line1
if_line2
temp_file_line3
$ awk 'NR==FNR{new=new s $0; s=ORS; next} /\{[^.{}]+\.global}/{$0=new} 1' input_file template_file_2
temp_file_line1
temp_file_line2
if_line1
if_line2
temp_file_line3
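The sed command itself can also be made to work. sed only understands POSIX basic and extended regular expressions, and neither has lookbehind or lookahead, so (?<={) is rejected with or without -r. The sketch below assumes GNU sed and the file layout from the question. fill_template is a hypothetical helper name. It keeps the structure of the working template_file_1 command and uses a pattern that matches the braces literally. Like the awk answer above, it assumes there are no braces inside {...global}.

```bash
#!/bin/bash
# POSIX sed regexes (basic or extended) have no lookbehind/lookahead, so
# "(?<=" is rejected. Without -r, { and } are ordinary characters in a basic
# regular expression, so the braces can simply be matched literally.
# fill_template is a hypothetical helper name for this sketch.
fill_template() {  # usage: fill_template INPUT_FILE TEMPLATE_FILE
  # Same structure as the working template_file_1 command:
  # on a matching line, queue INPUT_FILE with r and branch past the p.
  sed -n -e "/{[^{}]*\.global}/{r $1" -e 'b' -e '}; p' "$2"
}

# fill_template input_file template_file_2 > output_file
```

Because the r filename runs to the end of its -e fragment, the input file name is passed through unchanged, even if it contains spaces.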

Related

sed find with a regex and replace does not work [duplicate]

I'm trying to refine my code by getting rid of unnecessary whitespace and empty lines, and by putting a space between adjacent parentheses, so:
int a = 4;
if ((a==4) || (b==5))
a++ ;
should change to:
int a = 4;
if ( (a==4) || (b==5) )
a++ ;
It works for the parentheses and the empty lines. However, it does not reduce multiple spaces to one space:
int a = 4;
if ( (a==4) || (b==5) )
a++ ;
Here is my script:
#!/bin/bash
# Script to refine code
#
filename=read.txt
sed 's/((/( (/g' $filename > new.txt
mv new.txt $filename
sed 's/))/) )/g' $filename > new.txt
mv new.txt $filename
sed 's/ +/ /g' $filename > new.txt
mv new.txt $filename
sed '/^$/d' $filename > new.txt
mv new.txt $filename
Also, is there a way to make this script more concise, e.g. by using fewer commands?
If you are using GNU sed, you need to use sed -r, which makes sed use extended regular expressions, where + has the behavior you want. See man sed:
-r, --regexp-extended
use extended regular expressions in the script.
The same holds if you are using OS X sed, but then you need to use sed -E:
-E Interpret regular expressions as extended (modern) regular expressions
rather than basic regular expressions (BRE's).
Without -r/-E, you have to precede + with a \ (GNU sed accepts \+ in basic regular expressions); otherwise sed matches the character + itself.
To make the script more concise, you can combine all the expressions into a single sed call:
sed -e 's/((/( (/g' -e 's/))/) )/g' -e 's/ \+/ /g' -e '/^$/d' "$filename" > new.txt
Some implementations of sed (GNU and BSD, for example) also support the -i option, which edits the file in place, although its exact syntax differs between them.
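Here is a sketch of the same single-call idea that avoids GNU-only syntax. It assumes only a POSIX sed: no -r/-E and no \+. It writes through a temporary file instead of -i, because -i syntax differs between implementations. refine is a hypothetical function name.

```bash
#!/bin/bash
# All four edits in one sed call, written in POSIX basic regular expressions
# (no -r/-E and no \+) so it does not depend on GNU extensions. "  *" is a
# space followed by zero or more spaces, i.e. one or more spaces.
# Writing to a temporary file and then mv replaces sed -i, whose syntax
# differs between GNU and BSD sed. refine is a hypothetical function name.
refine() {
  sed -e 's/((/( (/g' \
      -e 's/))/) )/g' \
      -e 's/  */ /g' \
      -e '/^$/d' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# refine read.txt
```

The && means the original file is only replaced if sed succeeded.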
Sometimes, -r and -E won't work.
I'm using sed version 4.2.1 and they aren't working for me at all.
A quick workaround is to use the * operator instead.
Say we want to replace every run of space characters with a single space.
We'd like to do:
sed 's/ +/ /g'
But we can use this instead:
sed 's/  */ /g'
(note the two spaces before the *: one required space, then zero or more further spaces)
This may not be the cleanest solution, but if you want to avoid -E and -r to stay compatible with both versions of sed, you can repeat the character: cc* is one c followed by zero or more c's, which equals one or more c's.
Or just use the BRE interval syntax, as suggested by @cdarke, to match a specific number of repetitions: c\{1,\}. Leaving out the second number after the comma means "1 or more".
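Below is a small side-by-side check of the two portable spellings on the same input. It assumes only a POSIX sed. The function names are hypothetical.

```bash
#!/bin/bash
# Two POSIX BRE spellings of "one or more spaces", usable without -E or -r:
#   "  *"      one space, then zero or more spaces (the cc* trick)
#   " \{1,\}"  one space repeated at least once (interval, no upper bound)
collapse_cc_star()  { sed 's/  */ /g'; }
collapse_interval() { sed 's/ \{1,\}/ /g'; }

sample='if ( (a==4)   ||    (b==5) )'
printf '%s\n' "$sample" | collapse_cc_star    # if ( (a==4) || (b==5) )
printf '%s\n' "$sample" | collapse_interval   # if ( (a==4) || (b==5) )
```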
This might work for you:
sed -e '/^$/d' -e ':a' -e 's/\([()]\)\1/\1 \1/g' -e 'ta' -e 's/  */ /g' "$filename" > new.txt
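To see what the :a / ta loop in that command buys, here is the command wrapped as a filter. It reads stdin instead of $filename, and the last expression uses two spaces before the * so that it collapses runs of spaces. space_parens is a hypothetical function name. The interesting case is a run of three or more identical parentheses, which a single pass of the s command does not fully separate.

```bash
#!/bin/bash
# The answer's command as a filter (reading stdin instead of $filename).
# :a is a label; ta jumps back to it if a substitution has succeeded since
# the previous t, so the s command repeats until no "((" or "))" is left.
space_parens() {
  sed -e '/^$/d' -e ':a' -e 's/\([()]\)\1/\1 \1/g' -e 'ta' -e 's/  */ /g'
}

printf '%s\n' '(((x)))' | space_parens   # ( ( (x) ) )
```

In the first pass, '(((x)))' only becomes '( ((x) ))', because the /g matches do not overlap. The second pass, reached through ta, finishes the job.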
On the bash front:
First, I made a script, test.sh:
cat test.sh
#!/bin/bash
while IFS='' read -r line || [[ -n "$line" ]]; do
  echo "Text read from file: $line"
  SRC=$(echo "$line" | awk '{print $1}')
  DEST=$(echo "$line" | awk '{print $2}')
  echo "moving $SRC to $DEST"
  # Group the error branch: "a || b && c" would run exit 1 even after a successful mv
  mv "$SRC" "$DEST" || { echo "move $SRC to $DEST failed"; exit 1; }
done < "$1"
Then we make a data file, list.txt, containing the line aaa.txt bbb.txt, and a test file, aaa.txt:
cat aaa.txt
<tag1>19</tag1>
<tag2>2</tag2>
<tag3>-12</tag3>
<tag4>37</tag4>
<tag5>-41</tag5>
Then run the script and show the results:
bash test.sh list.txt
Text read from file: aaa.txt bbb.txt
moving aaa.txt to bbb.txt
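A variant of test.sh can let read split each line into its two fields instead of running echo | awk twice per line. This sketch assumes each line holds exactly two whitespace-free names, like the aaa.txt bbb.txt line above. move_from_list is a hypothetical function name.

```bash
#!/bin/bash
# Variant of test.sh: read splits each line into source and destination.
# Assumes two whitespace-free names per line (extra fields would end up
# in dest). move_from_list is a hypothetical function name for this sketch.
move_from_list() {
  local src dest
  while read -r src dest || [[ -n $src ]]; do
    [[ -z $src ]] && continue   # skip blank lines
    echo "moving $src to $dest"
    mv -- "$src" "$dest" || { echo "move $src to $dest failed" >&2; return 1; }
  done < "$1"
}

# move_from_list list.txt
```

The "|| [[ -n $src ]]" part keeps a final line that lacks a trailing newline, the same trick test.sh uses.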

bash regex multiple match in one line

I'm trying to process my text.
For example, I have:
asdf asdf get.this random random get.that
get.it this.no also.this.no
My desired output is:
get.this get.that
get.it
So the regex should catch only this pattern (get.\w), but it has to do so repeatedly because there are multiple occurrences in one line, so the easiest way with sed,
sed 's/.*(REGEX).*/\1/'
does not work (it shows only one occurrence).
Probably the best way is to use grep -o, but I have an old version of grep and the -o flag is not available.
This grep may give what you need:
grep -o "get[^ ]*" file
Try awk:
awk '{for(i=1;i<=NF;i++){if($i~/get\.\w+/){print $i}}}' file.txt
You might need to tweak the regex between the slashes for your specific data (note that \w in an awk regex is a GNU awk extension). Sample output:
$ awk '{for(i=1;i<=NF;i++){if($i~/get\.\w+/){print $i}}}' file.txt
get.this
get.that
get.it
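The following sketch avoids \w, which is GNU-awk-only, and prints the matches space-separated, one line per input line, which is the format the question asks for. It assumes any POSIX awk. get_fields is a hypothetical function name.

```bash
#!/bin/bash
# Portable awk: \w is a GNU awk extension, so the word character is spelled
# [A-Za-z0-9_]. Matching fields are joined with single spaces, giving one
# output line per input line that contains at least one match.
# get_fields is a hypothetical function name.
get_fields() {
  awk '{
    out = ""
    for (i = 1; i <= NF; i++)
      if ($i ~ /^get\.[A-Za-z0-9_]/)
        out = out (out == "" ? "" : " ") $i
    if (out != "") print out
  }' "$@"
}

printf '%s\n' 'asdf asdf get.this random random get.that' \
              'get.it this.no also.this.no' | get_fields
# get.this get.that
# get.it
```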
With awk:
awk -v patt="^get" '{
for (i=1; i<=NF; i++)
if ($i ~ patt)
printf "%s%s", $i, OFS;
print ""
}' <<< "$text"
bash
while read -a words; do
for word in "${words[@]}"; do
if [[ $word == get* ]]; then
echo -n "$word "
fi
done
echo
done <<< "$text"
perl
perl -lane 'print join " ", grep {$_ =~ /^get/} @F' <<< "$text"
This might work for you (GNU sed):
sed -r '/\bget\.\S+/{s//\n&\n/g;s/[^\n]*\n([^\n]*)\n[^\n]*/\1 /g;s/ $//}' file
or if you want one per line:
sed -r '/\n/!s/\bget\.\S+/\n&\n/g;/^get/P;D' file

Regex to get number after last underscore

I am having trouble coming up with the regex that will get me Y in the string X_X_X_Y. BTW: Y is an integer, but I can validate that afterwards.
You could use shell parameter expansion:
$ s="X_X_X_Y"
$ echo "${s##*_}"
Y
Using sed:
$ sed 's/.*_//' <<< "$s"
Y
Using grep:
$ grep -oP '.*_\K.*' <<< "$s"
Y
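Since the question says Y still has to be validated as an integer, the parameter expansion above can be combined with a case pattern, without any external command. This is a minimal sketch, and last_field_int is a hypothetical function name.

```bash
#!/bin/bash
# Print the text after the last underscore, but only if it is a non-empty run
# of digits; otherwise return 1. If there is no underscore, ${1##*_} leaves
# the string unchanged, so the whole string is checked.
# last_field_int is a hypothetical function name for this sketch.
last_field_int() {
  local last=${1##*_}          # strip everything through the last "_"
  case $last in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    *) printf '%s\n' "$last" ;;
  esac
}

last_field_int 'X_X_X_10'                       # prints 10
last_field_int 'X_X_X_Y' || echo 'not an integer'
```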
This regex will work as long as the part you're matching is an integer:
[^_]+_[^_]+_[^_]+_(\d+)
As an alternative, if you are always splitting on the _ character, you can skip the regex and use awk:
echo 'X_X_X_Y' | awk -F_ '{print $NF}'
Using BASH regex:
s="X_X_X_10"
[[ "$s" =~ [^_]+$ ]] && echo "${BASH_REMATCH[0]}"
10
This will print an integer at the end of the string after an underscore.
perl -e '"0_0_0_1" =~ /_([0-9]+)$/; print $1,"\n" if defined $1'
1
This might work for you:
sed 's/.*_\([0-9][0-9]*\)/\1/' file

How to change the contents between "{" and "}" using sed?

For example, we have a file like:
{abc}...{def}
How can I append 123 to the end of every string inside {} and, at the same time, remove the {}? The above would be changed to:
abc123...def123
Use this sed:
echo "$s"|sed 's/{\([^}]*\)}/\1123/g'
abc123...def123
Or using awk:
awk -v x=123 -F '[{}]' '{for(i=1; i<=NF; i++) if (i%2) printf "%s", $i; else printf "%s%s", $i, x; print ""}' <<< "$s"
abc123...def123
You could use a sed capturing group
echo '{abc}...{def}' | sed 's/{\([^}]*\)}/\1123/g'
abc123...def123
sed 's/{//g; s/}/123/g'
test:
kent$ echo "{abc}...{def}"|sed 's/{//g; s/}/123/g'
abc123...def123
Using gnu awk
echo '{abc}...{def}' | awk '{print gensub(/{([^}]*)}/,"\\1123","g")}'
abc123...def123
This might work for you (GNU sed):
sed ':a;s/{\([^{}]*\)}/\1123/g;ta' file
or:
sed -e ':a' -e 's/{\([^{}]*\)}/\1123/g' -e 'ta' file
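The loop matters for nested braces. [^{}] only matches an innermost pair, and ta jumps back to :a while substitutions keep succeeding, so braces are unwrapped from the inside out. The sketch below assumes a POSIX sed and uses the portable -e form; append_123 is a hypothetical function name.

```bash
#!/bin/bash
# The portable form of the loop answer. Each pass replaces innermost {...}
# pairs; ta repeats the pass while a substitution succeeded.
append_123() {
  sed -e ':a' -e 's/{\([^{}]*\)}/\1123/g' -e 'ta'
}

printf '%s\n' '{abc}...{def}' | append_123   # abc123...def123
printf '%s\n' '{a{b}c}' | append_123         # ab123c123
```

For comparison, the single-pass s/{\([^}]*\)}/\1123/g turns '{a{b}c}' into 'a{b123c}', because [^}]* happily swallows the inner {.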

perl regex to extract a specifc word

I have the following example of a text file:
AFUA_2G08360|pyrG
AFUA_2G12630
gel1|bgt2|AFUA_2G01170
and I want to use a regex to extract AFUA_2G08360, AFUA_2G12630 and AFUA_2G01170 using perl -l -ne on the Unix command line.
How would you suggest to do that?
Why not use sed with something like this (sed has no \d, and {5} is only an interval with -E):
sed -E 's/.*(AFUA_2G[0-9]{5}).*/\1/'
Try this expression:
/(AFUA_2G\d+)/g
Here is a doable one-liner for your example input.
perl -lne 's/.*(AFUA_[^|]*).*/$1/; print' data
AFUA_[0-9A-Za-z]{7}
See here: http://regexr.com?328gj
Command line:
user@mch:/tmp$ cat input.txt
AFUA_2G08360|pyrG
AFUA_2G12630
gel1|bgt2|AFUA_2G01170
user@mch:/tmp$ perl -lne '@matches = /AFUA_[0-9A-Za-z]{7}/g; print join("\n", @matches)' input.txt
AFUA_2G08360
AFUA_2G12630
AFUA_2G01170
Use:
perl -pe 's/.*(AFUA_[0-9a-zA-Z]*).*$/\1/' your_file
tested:
> cat temp
AFUA_2G08360|pyrG
AFUA_2G12630
gel1|bgt2|AFUA_2G01170
> perl -pe 's/.*(AFUA_[0-9a-zA-Z]*).*$/\1/' temp
AFUA_2G08360
AFUA_2G12630
AFUA_2G01170