Use sed to replace a curly brace and insert a character - regex

I have a ".csv" file that is generated by a BASH script. Within that script I have a sed statement to make some changes in the file the script output just 1 line earlier. I'm trying to sed the file and remove/replace a few encoding characters.
I'm trying to replace '{' in the file, wherever it occurs, with a zero '0'. Additionally, I need to prepend the match with a plus '+'.
Here is the most recent try (of hundreds of previous tries): sed -r 's/^(.*)([\{])(.*)$/\1\+0\3/g' -i "$FILENAME"
Here is a sample of my data:
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,F,BI,,,D,7391420002
Frustratingly, it only seems to match the first line and then quit, despite the global flag '/g' being on:
4240880002,9000413542,001,000000000000000+0,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,F,BI,,,D,7391420002
Here is how I am trying to format it: (I included my next character replacement P=7):
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,F,BI,,,D,7391420002
My brain has been rendered to hamburger meat over this! :(
I sincerely appreciate your help!
UPDATE
This is the conversion chart I'm working from:
Character Digit Sign
{ 0 +
A 1 +
B 2 +
C 3 +
D 4 +
E 5 +
F 6 +
G 7 +
H 8 +
I 9 +
} 0 -
J 1 -
K 2 -
L 3 -
M 4 -
N 5 -
O 6 -
P 7 -
Q 8 -
R 9 -

Your sed command was getting close. Two things to change:
Do not anchor the match to the beginning and end of the line.
Match runs of characters that are not a comma.
You will get
sed -r 's/,([^,]*)\{([^,]*)/,+\10\2/g; s/,([^,]*)P([^,]*)/,-\17\2/g' "$FILENAME"
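To illustrate why the anchored pattern replaces only one brace per line: ^(.*) ... (.*)$ consumes the entire line in a single match, so the g flag has nothing left to repeat on, and the greedy first group leaves only the last { for the brace itself. A minimal sketch with a shortened, made-up sample line (the function names are purely illustrative, and -E is used as the portable spelling of GNU sed's -r):

```shell
# Shortened, made-up sample line in the same layout as the question's data.
line='001,000{,011P,000{,A'

# Anchored pattern: one match covers the whole line, so only one brace changes.
anchored() {
    printf '%s\n' "$1" | sed -E 's/^(.*)\{(.*)$/\1+0\2/g'
}

# Field-scoped pattern: each match stays inside one comma-delimited field,
# so the g flag can move on to the next field.
per_field() {
    printf '%s\n' "$1" |
        sed -E 's/,([^,]*)\{([^,]*)/,+\10\2/g; s/,([^,]*)P([^,]*)/,-\17\2/g'
}

anchored "$line"     # 001,000{,011P,000+0,A
per_field "$line"    # 001,+0000,-0117,+0000,A
```

Because each pattern starts with a comma, a code character in the very first field of a line would not be converted by the field-scoped version.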

Good grief, just use awk:
$ cat tst.awk
BEGIN {
mkmap("{ A B C D E F G H I","+")
mkmap("} J K L M N O P Q R","-")
FS=OFS=","
}
{
for (i=1; i<=NF; i++) {
for (char in map) {
num = map[char]
if ( sub(char,num,$i) ) {
$i = pfx[char] $i
}
}
}
print
}
function mkmap(list,sign, char,tmp,num) {
split(list,tmp,/ /)
for (num in tmp) {
char = tmp[num]
map[char] = num-1
pfx[char] = sign
}
}
$ awk -f tst.awk file
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,+1,2006060000,,,+1,++29,,,+3,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,+1,2006060000,,,+1,++29,,,+3,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,+1,2006060000,,,+1,++29,,,+3,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,+1,2006060000,,,+1,++29,,,+3,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,+1,2006060000,,,+1,++29,,,+3,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,+1,2006060000,,,+1,++29,,,+3,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,+1,2006060000,,,+1,++29,,,+3,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,+1,2006060000,,,+6,++29,,,+4,7391420002
You don't say what to do if two encoded characters appear in a field, so I don't know whether the above is what you want. The code should be trivial to modify to handle that case and to add whatever other transformations you need.
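One possible modification, sketched under the assumption that only fields made of digits followed by a single trailing code character should be converted (so fields such as A or BI are left alone). The function name decode_signed_fields is made up for this example; the mapping follows the conversion chart in the question.

```shell
# Hypothetical helper: converts fields like 000{ or 011P, leaves everything else alone.
decode_signed_fields() {
    awk 'BEGIN {
             FS = OFS = ","
             pos = "{ABCDEFGHI"   # trailing codes for +0 .. +9
             neg = "}JKLMNOPQR"   # trailing codes for -0 .. -9
         }
         {
             for (i = 1; i <= NF; i++) {
                 n = length($i)
                 if (n < 2) continue                  # need digits plus one code character
                 head = substr($i, 1, n - 1)
                 last = substr($i, n, 1)
                 if (head !~ /^[0-9]+$/) continue     # assumption: only numeric fields qualify
                 if ((p = index(pos, last)) > 0)
                     $i = "+" head (p - 1)
                 else if ((p = index(neg, last)) > 0)
                     $i = "-" head (p - 1)
             }
             print
         }' "$@"
}
```

It reads the named files, or standard input if none are given, for example: decode_signed_fields file > file.new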

This might work for you (GNU sed):
sed 's/[^,]*[{A-I]\+/+&/g;s/[^,]*[}J-R]\+/-&/g;y/{ABCDEFGHI}JKLMNOPQR/01234567890123456789/' file
First insert either + or - in front of fields containing the translation encodings. Then translate the encodings.

After reading the question again, here is another solution with awk:
$ awk 'BEGIN {FS=OFS=","}
{for(i=1;i<=NF;i++)
{if(gsub(/{/,0,$i)) $i="+"$i;
if(gsub(/P/,7,$i)) $i="-"$i} }1' file
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,F,BI,,,D,7391420002

take +0 -0 +1 -1 out of your conversion chart, it should work.

I really appreciate all the help! You all helped me get very close. In the end, this is what ended up solving the issues. THANK YOU!
#!/bin/bash
function FixEncodedStrings(){
if [[ "$1" =~ ([0-9]{3,})\{([0-9]{1,})? ]]
then
echo "$1" | sed -r 's/([0-9]{3,})\{([0-9]{1,})?/+\10\2/g'
elif [[ "$1" =~ ([0-9]{3,})A([0-9]{1,})? ]]
then
echo "$1" | sed -r 's/([0-9]{3,})A([0-9]{1,})?/+\11\2/g'
elif [[ "$1" =~ ([0-9]{3,})B([0-9]{1,})? ]]
then
echo "$1" | sed -r 's/([0-9]{3,})B([0-9]{1,})?/+\12\2/g'
elif [[ "$1" =~ ([0-9]{3,})C([0-9]{1,})? ]]
then
echo "$1" | sed -r 's/([0-9]{3,})C([0-9]{1,})?/+\13\2/g'
elif [[ "$1" =~ ([0-9]{3,})D([0-9]{1,})? ]]
then
echo "$1" | sed -r 's/([0-9]{3,})D([0-9]{1,})?/+\14\2/g'
elif [[ "$1" =~ ([0-9]{3,})E([0-9]{1,})? ]]
then
echo "$1" | sed -r 's/([0-9]{3,})E([0-9]{1,})?/+\15\2/g'
elif [[ "$1" =~ ([0-9]{3,})F([0-9]{1,})? ]]
then
echo "$1" | sed -r 's/([0-9]{3,})F([0-9]{1,})?/+\16\2/g'
elif [[ "$1" =~ ([0-9]{3,})G([0-9]{1,})? ]]
then
echo "$1" | sed -r 's/([0-9]{3,})G([0-9]{1,})?/+\17\2/g'
elif [[ "$1" =~ ([0-9]{3,})H([0-9]{1,})? ]]
then
echo "$1" | sed -r 's/([0-9]{3,})H([0-9]{1,})?/+\18\2/g'
elif [[ "$1" =~ ([0-9]{3,})I([0-9]{1,})? ]]
then
echo "$1" | sed -r 's/([0-9]{3,})I([0-9]{1,})?/+\19\2/g'
elif [[ "$1" =~ ([0-9]{3,})\}([0-9]{1,})? ]]
then
echo "$1" | sed -r 's/([0-9]{3,})\}([0-9]{1,})?/-\10\2/g'
elif [[ "$1" =~ ([0-9]{3,})J([0-9]{1,})? ]]
then
echo "$1" | sed -r 's/([0-9]{3,})J([0-9]{1,})?/-\11\2/g'
elif [[ "$1" =~ ([0-9]{3,})K([0-9]{1,})? ]]
then
echo "$1" | sed -r 's/([0-9]{3,})K([0-9]{1,})?/-\12\2/g'
elif [[ "$1" =~ ([0-9]{3,})L([0-9]{1,})? ]]
then
echo "$1" | sed -r 's/([0-9]{3,})L([0-9]{1,})?/-\13\2/g'
elif [[ "$1" =~ ([0-9]{3,})M([0-9]{1,})? ]]
then
echo "$1" | sed -r 's/([0-9]{3,})M([0-9]{1,})?/-\14\2/g'
elif [[ "$1" =~ ([0-9]{3,})N([0-9]{1,})? ]]
then
echo "$1" | sed -r 's/([0-9]{3,})N([0-9]{1,})?/-\15\2/g'
elif [[ "$1" =~ ([0-9]{3,})O([0-9]{1,})? ]]
then
echo "$1" | sed -r 's/([0-9]{3,})O([0-9]{1,})?/-\16\2/g'
elif [[ "$1" =~ ([0-9]{3,})P([0-9]{1,})? ]]
then
echo "$1" | sed -r 's/([0-9]{3,})P([0-9]{1,})?/-\17\2/g'
elif [[ "$1" =~ ([0-9]{3,})Q([0-9]{1,})? ]]
then
echo "$1" | sed -r 's/([0-9]{3,})Q([0-9]{1,})?/-\18\2/g'
elif [[ "$1" =~ ([0-9]{3,})R([0-9]{1,})? ]]
then
echo "$1" | sed -r 's/([0-9]{3,})R([0-9]{1,})?/-\19\2/g'
fi
}

Related

multi-lines pattern matching

I have some files with content like this:
file1:
AAA
BBB
CCC
123
file2:
AAA
BBB
123
I want to echo the filename only if the first 3 lines are letters, or "file1" in the samples above.
I'm merging the 3 lines into one and comparing the result to my regex [A-Z], but I could not get it to match for some reason.
my script:
file=file1
if [[ $(head -3 $file|tr -d '\n'|sed 's/\r//g') == [A-Z] ]]; then
echo "$file"
fi
I ran it with bash -x, this is the output
+ file=file1
++ head -3 file1
++ tr -d '\n'
++ sed 's/\r//g'
+ [[ ASMUTCEDD == [A-Z] ]]
+exit
What you missed:
You can use grep to check that the input matches only [A-Z] characters (or indeed Bash's built-in regex matching, as @Barmar pointed out)
You can use the pipeline directly in the if statement, without [[ ... ]]
Like this:
file=file1
if head -n 3 "$file" | tr -d '\n\r' | grep -qE '^[A-Z]+$'; then
echo "$file"
fi
To do regular expression matching you have to use =~, not ==. And the regular expression should be ^[A-Z]*$. Your regular expression matches if there's a letter anywhere in the string, not just if the string is entirely letters.
if [[ $(head -n 3 "$file" | tr -d '\n\r') =~ ^[A-Z]*$ ]]; then
echo "$file"
fi
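The difference between the two operators can be seen in a small sketch (the function names are made up for illustration). Inside [[ ]], == compares against a glob pattern, in which [A-Z] stands for exactly one character, while =~ applies an extended regular expression. The sketch uses + instead of * so that an empty string is rejected as well.

```shell
#!/bin/bash
# Hypothetical helpers, named only for this illustration.
glob_match()  { [[ $1 == [A-Z] ]]; }      # glob: exactly one uppercase letter
regex_match() { [[ $1 =~ ^[A-Z]+$ ]]; }   # regex: one or more uppercase letters, nothing else

for s in A AAABBBCCC AAABBB123; do
    glob_match  "$s" && g=yes || g=no
    regex_match "$s" && r=yes || r=no
    printf '%-10s glob=%s regex=%s\n' "$s" "$g" "$r"
done
```

Only the single letter A passes the glob test, which is why the merged three-line string in the question never matched.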
You can use built-ins and character classes for this problem:-
#!/bin/bash
file="file1"
C=0
flag=0
while read line
do
(( ++C ))
[ $C -eq 4 ] && break;
[[ "$line" =~ [^[:alpha:]] ]] && flag=1
done < "$file"
[ $flag -eq 0 ] && echo "$file"

Shell stderr redirection to another file using a text file as a argument

I wrote a bash script that reads the text file given as its argument, processes each line, and redirects errors to an error file and all other output to list.txt.
#!/bin/bash
filename="$1"
while read line; do
a=$(echo $line | awk "{print NF}")
if [ "$a" = "3" ]
then
first=$(echo $line | awk -F' ' '{print $1}')
last=$(echo $line | awk -F' ' '{print $2}')
email=$(echo $line | awk -F' ' '{print $3}')
if [[ $first =~ ^[a-zA-Z]+$ && $last =~ ^[a-zA-Z]+$ ]]
then
if [[ $email =~ '<([\w\.\-_]+)?\w+@[\w-_]+(\.\w+){1,}>' ]]
then
echo "$first $last $email" | cat >>list.txt
elif [[ $email =~ '([\w\.\-_]+)?\w+@[\w-_]+(\.\w+){1,}' ]]
then
echo "$first $last <$email>" | cat >>list.txt
else
echo "$first $last $email" | cat >&2
fi
else
echo "$first $last $email" | cat >&2
fi
else
echo "$line" | cat >&2
fi
done < $filename
I run this code as: ./script.sh argumentfile.txt 2>error.txt
My argument file has following information
Joe cable cable@ablecorp.com
Bob Brown <bob_baker@bakerandsons.com>
Jim Hass hass@bigcorp.com
mike_lupo@mou.east.com
Edison jones jones@inl.net.gov
pirate.coe.su.com pirate people
Each line of the output file should ideally have the following form (the input is intentionally poorly formatted):
lastname firstname <email>
In the error file what I get is
Joe cable cable@ablecorp.com
Bob Brown <bob_baker@bakerandsons.com>
Jim Hass hass@bigcorp.com
mike_lupo@mou.east.com
Edison jones jones@inl.net.gov
pirate.coe.su.com pirate people
You could just do this entirely with awk:
#!/bin/bash
gawk '{
name_re = "^[[:alpha:]]+$"
mail_re = "<?[[:alnum:]_.%+-]+@[[:alnum:]_.-]+\\.[[:alpha:]]{2,6}>?"
# check for 3 fields with suitable regexp matching for all
if (NF == 3 && $1 ~ name_re && $2 ~ name_re && $3 ~ mail_re) {
# put brackets around the address if needed
email = $3 ~ /^<.*>$/ ? $3 : "<" $3 ">"
# output to the good list
print $1 " " $2 " " email > "list.txt"
# move to the next one
next
}
# output to the bad list
print > "error.txt"
}' "$1"
Tested with BSD and Gnu versions of awk.
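For reference, a likely reason every line landed in the error file in the question's script: since bash 3.2, quoting the right-hand side of =~ makes it a literal string rather than a regex, and \w is not part of POSIX extended regular expressions, so it is not portable. A minimal sketch of the same check with the pattern held in a variable and expanded unquoted; is_email and the pattern itself are illustrative, not a complete address validator.

```shell
#!/bin/bash
# Illustrative pattern: optional <, local part, @, domain with at least one dot, optional >.
email_re='^<?[[:alnum:]_.%+-]+@[[:alnum:]_.-]+\.[[:alpha:]]+>?$'

# Hypothetical helper: succeeds if $1 looks like an address, with or without brackets.
is_email() {
    # $email_re must stay unquoted here so bash treats it as a regex.
    [[ $1 =~ $email_re ]]
}

for candidate in 'cable@ablecorp.com' '<bob_baker@bakerandsons.com>' 'people'; do
    if is_email "$candidate"; then
        echo "ok:  $candidate"
    else
        echo "bad: $candidate"
    fi
done
```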

Regex - translate "a" to "zzzz" ranges to code. Only sed or grep

Using ONLY the commands: echo, grep, sed
Argument $1 of the script is a code that has to be translated according to the table below, with the translation sent to STDOUT. The code is one to four letters long; it starts at "A" and increments alphabetically up to "ZZZZ". For example: A, B, ..., Z, AA, AB, ...
Code - Translation
- A -> 1
- B to AA -> 2
- AB to AF -> 3
- AG to ZZZZ -> 4
For example if the script is called script.sh A the output would be 1. If the script is called script.sh ABC the output would be 4.
#!/bin/bash
echo $1 | grep -e '^A$'>/dev/null && echo 1 && exit
echo $1 | grep -e '^[^A]$' -e '^[A][A]$' >/dev/null && echo 2 && exit
echo $1 | grep -e '^[A][B-F]$' >/dev/null && echo 3 && exit
echo $1 | grep -e '^[A-Z]\{2,4\}$' >/dev/null && echo 4 && exit
echo 0
exit
#!/bin/bash
str=$1
if [[ -z "${str#A}" ]]
then
echo 1
elif [[ -z "${str#?}" || -z "${str#AA}" ]]
then
echo 2
elif [[ -z "${str/A[B-F]/}" ]]
then
echo 3
else
echo 4
fi
# just in case, so you can say you used them:
grep . < /dev/null | sed > /dev/null
EDIT: silly ., you should be ?.
EDIT2: Command [ is gone, non-command [[ to the rescue!

In bash, how can I check a string for partials in an array?

If I have a string:
s='path/to/my/foo.txt'
and an array
declare -a include_files=('foo.txt' 'bar.txt');
how can I check the string for matches in my array efficiently?
You could loop through the array and use a bash substring check
for file in "${include_files[@]}"
do
if [[ $s = *${file} ]]; then
printf "%s\n" "$file"
fi
done
Alternatively, if you want to avoid the loop and you only care whether a file name matches or not, you could use the @ form of bash extended globbing. The following example assumes that the file names in the array do not contain |.
shopt -s extglob
declare -a include_files=('foo.txt' 'bar.txt');
s='path/to/my/foo.txt'
printf -v pat "%s|" "${include_files[@]}"
pat="${pat%|}"
printf "%s\n" "${pat}"
#prints foo.txt|bar.txt
if [[ ${s##*/} = @(${pat}) ]]; then echo yes; fi
For an exact match to the file name:
#!/bin/bash
s="path/to/my/foo.txt";
ARR=('foo.txt' 'bar.txt');
for str in "${ARR[@]}";
do
# if [ $(echo "$s" | awk -F"/" '{print $NF}') == "$str" ]; then
if [ "$(basename "$s")" = "$str" ]; then # A better option than awk for sure...
echo "match";
else
echo "no match";
fi;
done
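If only a yes/no answer is needed, rather than a line of output per array element, the loop can be wrapped in a function that returns as soon as one name matches. A minimal sketch; the function name basename_in is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the basename of $1 equals any of the remaining arguments.
basename_in() {
    local base=${1##*/}   # strip everything up to and including the last slash
    shift
    local candidate
    for candidate in "$@"; do
        # Quoting the right-hand side forces a literal comparison, not a glob match.
        [[ $base == "$candidate" ]] && return 0
    done
    return 1
}

include_files=('foo.txt' 'bar.txt')
s='path/to/my/foo.txt'
if basename_in "$s" "${include_files[@]}"; then
    echo "match"
fi
```

Unlike the extglob variant, this makes no assumption about | appearing in the file names.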

Delete everything except all surrounded by ()

Let's say I have a file like this:
adsf(2)
af(3)
g5a(65)
aafg(1245)
a(3)df
How can I get only the numbers between ( and ) from this, using Bash?
A couple of solutions come to mind. Some of them handle empty lines correctly, others do not. Empty lines are trivial to remove, though, using either grep -v '^$' or sed '/^$/d'.
sed
sed 's|.*(\([0-9]\+\).*|\1|' input
awk
awk -F'[()]' '/./{print $2}' input
2
3
65
1245
3
pure bash
#!/bin/bash
# Split on ( and ); the third variable absorbs anything after the closing parenthesis
while IFS='()' read -r a b _; do
    if [ -z "$b" ]; then
        continue
    fi
    echo "$b"
done < input
and finally, using tr
cat input | tr -d '[a-z()]'
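Note that the tr variant also keeps digits that sit outside the parentheses, which matters for a line such as g5a(65). A small comparison, with made-up helper names; the sed variant here uses -E in place of the GNU-specific \+ and assumes each line has at most one parenthesized number:

```shell
# Hypothetical helpers for comparing the two approaches on standard input.
strip_with_tr() {
    tr -d 'a-z()'                         # deletes letters and parentheses, keeps every digit
}
extract_with_sed() {
    sed -E 's/.*\(([0-9]+)\).*/\1/'       # keeps only the digits between ( and )
}

printf '%s\n' 'g5a(65)' | strip_with_tr      # prints 565
printf '%s\n' 'g5a(65)' | extract_with_sed   # prints 65
```

Lines without a parenthesized number pass through the sed variant unchanged, so empty lines still need to be filtered separately.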
while read line; do
if [ -z "$line" ]; then
continue
fi
line=${line#*(}
line=${line%)*}
echo $line
done < file
Positive lookaround:
$ echo $'a1b(2)c\nd3e(456)fg7' | grep -Poe '(?<=\()[0-9]*(?=\))'
2
456
Another one:
while read line ; do
[[ $line =~ .*\(([[:digit:]]+)\).* ]] && echo "${BASH_REMATCH[1]}"
done < file