Use sed to replace a curly brace and insert a character - regex
I have a ".csv" file that is generated by a BASH script. Within that script I have a sed statement to make some changes in the file the script output just 1 line earlier. I'm trying to sed the file and remove/replace a few encoding characters.
I'm trying to replace '{' in the file, wherever it occurs, with a zero '0'. Additionally, I need to prepend the match with a plus '+'.
Here is the most recent try (of hundreds of previous tries): sed -r 's/^(.*)([\{])(.*)$/\1\+0\3/g' -i "$FILENAME"
Here is a sample of my data:
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,F,BI,,,D,7391420002
Frustratingly, it only seems to match the first line and then quit, despite the global flag '/g' being on:
4240880002,9000413542,001,000000000000000+0,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,000000000000000{,000000000000011P,000000000000000{,000000000000011P,A,2006060000,,,F,BI,,,D,7391420002
Here is how I am trying to format it: (I included my next character replacement P=7):
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,F,BI,,,D,7391420002
My brain has been rendered to hamburger meat over this! :(
I sincerely appreciate your help!
UPDATE
This is the conversion chart I'm working from:
Character Digit Sign
{ 0 +
A 1 +
B 2 +
C 3 +
D 4 +
E 5 +
F 6 +
G 7 +
H 8 +
I 9 +
} 0 -
J 1 -
K 2 -
L 3 -
M 4 -
N 5 -
O 6 -
P 7 -
Q 8 -
R 9 -
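Your sed command was getting close. Two things to change:
Do not match the beginning and end of the line. With ^(.*) and (.*)$ the pattern consumes the whole line, so /g has nothing left to match after the first replacement.
Match on characters that are not a comma, so each match stays inside one field.
You will get
sed -r 's/,([^,]*)\{([^,]*)/,+\10\2/g; s/,([^,]*)P([^,]*)/,-\17\2/g' "$FILENAME"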
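A quick way to try this without touching the real file is to pipe one short row through the same two substitutions. This is only a sketch: it assumes GNU sed (for -r), writes both capture groups as ([^,]*), and uses a made-up helper name (fix_fields) and a shortened, made-up sample row.

```shell
#!/bin/bash
# Hypothetical helper (name is illustrative): apply the two
# field-anchored substitutions to whatever arrives on stdin.
fix_fields() {
  sed -r 's/,([^,]*)\{([^,]*)/,+\10\2/g; s/,([^,]*)P([^,]*)/,-\17\2/g'
}

# Shortened sample row, not taken from the real file.
printf '%s\n' '001,00{,11P,A' | fix_fields
# prints: 001,+000,-117,A
```

Because each pattern starts with a comma, an encoded value in the very first field of a line has no leading comma and would be skipped.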
Good grief, just use awk:
$ cat tst.awk
BEGIN {
    mkmap("{ A B C D E F G H I","+")
    mkmap("} J K L M N O P Q R","-")
    FS=OFS=","
}
{
    for (i=1; i<=NF; i++) {
        for (char in map) {
            num = map[char]
            if ( sub(char,num,$i) ) {
                $i = pfx[char] $i
            }
        }
    }
    print
}
function mkmap(list,sign,    char,tmp,num) {
    split(list,tmp,/ /)
    for (num in tmp) {
        char = tmp[num]
        map[char] = num-1
        pfx[char] = sign
    }
}
.
$ awk -f tst.awk file
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,+1,2006060000,,,+1,++29,,,+3,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,+1,2006060000,,,+1,++29,,,+3,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,+1,2006060000,,,+1,++29,,,+3,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,+1,2006060000,,,+1,++29,,,+3,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,+1,2006060000,,,+1,++29,,,+3,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,+1,2006060000,,,+1,++29,,,+3,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,+1,2006060000,,,+1,++29,,,+3,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,+1,2006060000,,,+6,++29,,,+4,7391420002
You don't say what to do if two encoding characters appear in one field, so I don't know whether the output above is what you want. The code should be simple enough to modify for that case and for whatever other transformations you need.
This might work for you (GNU sed):
sed 's/[^,]*[{A-I]\+/+&/g;s/[^,]*[}J-R]\+/-&/g;y/{ABCDEFGHI}JKLMNOPQR/01234567890123456789/' file
First insert either + or - in front of fields containing the translation encodings. Then translate the encodings.
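To see the two steps separately, the sketch below runs them one at a time on a short made-up row. It assumes GNU sed (\+ in a basic regex is a GNU extension); the function names add_signs and translate are only for illustration.

```shell
#!/bin/bash
# Step 1 only: prefix the sign, leaving the encoding character in place.
add_signs() {
  sed 's/[^,]*[{A-I]\+/+&/g;s/[^,]*[}J-R]\+/-&/g'
}

# Step 2 only: translate each encoding character to its digit.
translate() {
  sed 'y/{ABCDEFGHI}JKLMNOPQR/01234567890123456789/'
}

row='001,00{,11P,A,BI'   # made-up sample row
printf '%s\n' "$row" | add_signs
# prints: 001,+00{,-11P,+A,+BI
printf '%s\n' "$row" | add_signs | translate
# prints: 001,+000,-117,+1,+29
```

Note that purely alphabetic fields such as A or BI are converted as well, because the patterns do not require any digits before the encoding character.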
After reading the question again, here is another solution with awk:
$ awk 'BEGIN {FS=OFS=","}
{for(i=1;i<=NF;i++)
{if(gsub(/{/,0,$i)) $i="+"$i;
if(gsub(/P/,7,$i)) $i="-"$i} }1' file
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,A,BI,,,C,7639840002
4240880002,9000413542,001,+0000000000000000,-0000000000000117,+0000000000000000,-0000000000000117,A,2006060000,,,F,BI,,,D,7391420002
take +0 -0 +1 -1 out of your conversion chart, it should work.
I really appreciate all the help! You all helped me get very close. In the end, this is what ended up solving the issues. THANK YOU!
#!/bin/bash
function FixEncodedStrings(){
    if [[ "$1" =~ ([0-9]{3,})\{([0-9]{1,})? ]]
    then
        echo "$1" | sed -r 's/([0-9]{3,})\{([0-9]{1,})?/+\10\2/g'
    elif [[ "$1" =~ ([0-9]{3,})A([0-9]{1,})? ]]
    then
        echo "$1" | sed -r 's/([0-9]{3,})A([0-9]{1,})?/+\11\2/g'
    elif [[ "$1" =~ ([0-9]{3,})B([0-9]{1,})? ]]
    then
        echo "$1" | sed -r 's/([0-9]{3,})B([0-9]{1,})?/+\12\2/g'
    elif [[ "$1" =~ ([0-9]{3,})C([0-9]{1,})? ]]
    then
        echo "$1" | sed -r 's/([0-9]{3,})C([0-9]{1,})?/+\13\2/g'
    elif [[ "$1" =~ ([0-9]{3,})D([0-9]{1,})? ]]
    then
        echo "$1" | sed -r 's/([0-9]{3,})D([0-9]{1,})?/+\14\2/g'
    elif [[ "$1" =~ ([0-9]{3,})E([0-9]{1,})? ]]
    then
        echo "$1" | sed -r 's/([0-9]{3,})E([0-9]{1,})?/+\15\2/g'
    elif [[ "$1" =~ ([0-9]{3,})F([0-9]{1,})? ]]
    then
        echo "$1" | sed -r 's/([0-9]{3,})F([0-9]{1,})?/+\16\2/g'
    elif [[ "$1" =~ ([0-9]{3,})G([0-9]{1,})? ]]
    then
        echo "$1" | sed -r 's/([0-9]{3,})G([0-9]{1,})?/+\17\2/g'
    elif [[ "$1" =~ ([0-9]{3,})H([0-9]{1,})? ]]
    then
        echo "$1" | sed -r 's/([0-9]{3,})H([0-9]{1,})?/+\18\2/g'
    elif [[ "$1" =~ ([0-9]{3,})I([0-9]{1,})? ]]
    then
        echo "$1" | sed -r 's/([0-9]{3,})I([0-9]{1,})?/+\19\2/g'
    elif [[ "$1" =~ ([0-9]{3,})\}([0-9]{1,})? ]]
    then
        echo "$1" | sed -r 's/([0-9]{3,})\}([0-9]{1,})?/-\10\2/g'
    elif [[ "$1" =~ ([0-9]{3,})J([0-9]{1,})? ]]
    then
        echo "$1" | sed -r 's/([0-9]{3,})J([0-9]{1,})?/-\11\2/g'
    elif [[ "$1" =~ ([0-9]{3,})K([0-9]{1,})? ]]
    then
        echo "$1" | sed -r 's/([0-9]{3,})K([0-9]{1,})?/-\12\2/g'
    elif [[ "$1" =~ ([0-9]{3,})L([0-9]{1,})? ]]
    then
        echo "$1" | sed -r 's/([0-9]{3,})L([0-9]{1,})?/-\13\2/g'
    elif [[ "$1" =~ ([0-9]{3,})M([0-9]{1,})? ]]
    then
        echo "$1" | sed -r 's/([0-9]{3,})M([0-9]{1,})?/-\14\2/g'
    elif [[ "$1" =~ ([0-9]{3,})N([0-9]{1,})? ]]
    then
        echo "$1" | sed -r 's/([0-9]{3,})N([0-9]{1,})?/-\15\2/g'
    elif [[ "$1" =~ ([0-9]{3,})O([0-9]{1,})? ]]
    then
        echo "$1" | sed -r 's/([0-9]{3,})O([0-9]{1,})?/-\16\2/g'
    elif [[ "$1" =~ ([0-9]{3,})P([0-9]{1,})? ]]
    then
        echo "$1" | sed -r 's/([0-9]{3,})P([0-9]{1,})?/-\17\2/g'
    elif [[ "$1" =~ ([0-9]{3,})Q([0-9]{1,})? ]]
    then
        echo "$1" | sed -r 's/([0-9]{3,})Q([0-9]{1,})?/-\18\2/g'
    elif [[ "$1" =~ ([0-9]{3,})R([0-9]{1,})? ]]
    then
        echo "$1" | sed -r 's/([0-9]{3,})R([0-9]{1,})?/-\19\2/g'
    fi
}
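The twenty branches differ only in the encoding character, the digit, and the sign, so the same substitutions can be generated from the conversion chart. The following is an illustrative sketch, not the code that was actually used: the name fix_encoded and the loop are assumptions, it keeps the same [0-9]{3,} guard, writes the optional trailing digits as ([0-9]*), and assumes GNU sed for -r.

```shell
#!/bin/bash
# Illustrative rewrite: build one sed program from the conversion chart
# instead of writing one branch per encoding character.
fix_encoded() {
  local pos='{ABCDEFGHI' neg='}JKLMNOPQR' prog='' i c
  for ((i = 0; i < 10; i++)); do
    c=${pos:i:1}                      # character that encodes +i
    [[ $c == '{' ]] && c='\{'         # escape the brace for the regex
    prog+="s/([0-9]{3,})${c}([0-9]*)/+\\1${i}\\2/g;"
    c=${neg:i:1}                      # character that encodes -i
    [[ $c == '}' ]] && c='\}'
    prog+="s/([0-9]{3,})${c}([0-9]*)/-\\1${i}\\2/g;"
  done
  printf '%s\n' "$1" | sed -r "$prog"
}

fix_encoded '001,000{,011P,A,BI'
# prints: 001,+0000,-0117,A,BI
```

Two differences from the function above: this version applies every substitution to the whole string rather than only the first branch that matches, and it echoes the input unchanged when nothing matches instead of printing nothing.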
Related
multi-lines pattern matching
I have some files with content like this:
file1:
AAA
BBB
CCC
123
file2:
AAA
BBB
123
I want to echo the filename only if the first 3 lines are letters, or "file1" in the samples above. I'm merging the 3 lines into one and comparing it to my regex [A-Z], but could not get it to match for some reason.
My script:
file=file1
if [[ $(head -3 $file|tr -d '\n'|sed 's/\r//g') == [A-Z] ]]; then
    echo "$file"
fi
I ran it with bash -x; this is the output:
+ file=file1
++ head -3 file1
++ tr -d '\n'
++ sed 's/\r//g'
+ [[ ASMUTCEDD == [A-Z] ]]
+ exit
What you missed:
You can use grep to check that the input matches only [A-Z] characters (or indeed Bash's built-in regex matching, as @Barmar pointed out).
You can use the pipeline directly in the if statement, without [[ ... ]].
Like this:
file=file1
if head -n 3 "$file" | tr -d '\n\r' | grep -qE '^[A-Z]+$'; then
    echo "$file"
fi
To do regular expression matching you have to use =~, not ==. And the regular expression should be ^[A-Z]*$. Your regular expression matches if there's a letter anywhere in the string, not just if the string is entirely letters.
if [[ $(head -3 $file|tr -d '\n\r') =~ ^[A-Z]*$ ]]; then
    echo "$file"
fi
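The difference between == (glob matching) and =~ (regex matching) inside [[ ]] can be checked directly on the string from the trace. This is a small sketch; the helper name is_upper is made up, and LC_ALL=C is set as an assumption so that [A-Z] means ASCII uppercase only.

```shell
#!/bin/bash
LC_ALL=C   # assumption: force ASCII ranges so [A-Z] is uppercase only

s='ASMUTCEDD'

# == does glob matching: [A-Z] matches exactly one character.
[[ $s == [A-Z] ]] && echo "glob: match" || echo "glob: no match"

# =~ does regex matching; the anchors make it cover the whole string.
[[ $s =~ ^[A-Z]+$ ]] && echo "regex: match" || echo "regex: no match"

# Without anchors, a single letter anywhere is enough.
[[ 'abcXdef' =~ [A-Z] ]] && echo "unanchored: match"

# Hypothetical helper wrapping the anchored test.
is_upper() { [[ $1 =~ ^[A-Z]+$ ]]; }
```

With the nine-letter string, the glob test fails and the anchored regex test succeeds, which is the behaviour seen in the bash -x output.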
You can use built-ins and character classes for this problem:
#!/bin/bash
file="file1"
C=0
flag=0
while read line
do
    (( ++C ))
    [ $C -eq 4 ] && break;
    # The regex must be unquoted; a quoted pattern is matched as a literal string.
    [[ "$line" =~ [^[:alpha:]] ]] && flag=1
done < "$file"
[ $flag -eq 0 ] && echo "$file"
Shell stderr redirection to another file using a text file as a argument
I wrote a bash script which would read the text file already provided in the argument and would process the text and redirect errors to the error file and other outputs to the list.txt file.
#!/bin/bash
filename="$1"
while read line; do
    a=$(echo $line | awk "{print NF}")
    if [ "$a" = "3" ]
    then
        first=$(echo $line | awk -F' ' '{print $1}')
        last=$(echo $line | awk -F' ' '{print $2}')
        email=$(echo $line | awk -F' ' '{print $3}')
        if [[ $first =~ ^[a-zA-Z]+$ && $last =~ ^[a-zA-Z]+$ ]]
        then
            if [[ $email =~ '<([\w\.\-_]+)?\w+@[\w-_]+(\.\w+){1,}>' ]]
            then
                echo "$first $last $email" | cat >>list.txt
            elif [[ $email =~ '([\w\.\-_]+)?\w+@[\w-_]+(\.\w+){1,}' ]]
            then
                echo "$first $last <$email>" | cat >>list.txt
            else
                echo "$first $last $email" | cat >&2
            fi
        else
            echo "$first $last $email" | cat >&2
        fi
    else
        echo "$line" | cat >&2
    fi
done < $filename
I run this code as
$./script.sh argumentfile.txt 2>error.txt
My argument file has the following information:
Joe cable cable@ablecorp.com Bob Brown <bob_baker@bakerandsons.com> Jim Hass hass@bigcorp.com mike_lupo@mou.east.com Edison jones jones@inl.net.gov pirate.coe.su.com pirate people
Ideal form of the file should be as (which is intentionally poorly formatted)
lastname firstname <email>
In the error file what I get is
Joe cable cable@ablecorp.com Bob Brown <bob_baker@bakerandsons.com> Jim Hass hass@bigcorp.com mike_lupo@mou.east.com Edison jones jones@inl.net.gov pirate.coe.su.com pirate people
You could just do this entirely with awk:
#!/bin/bash
gawk '{
    name_re = "^[[:alpha:]]+$"
    mail_re = "<?[[:alnum:]_.%+-]+@[[:alnum:]_.-]+\\.[[:alpha:]]{2,6}>?"
    # check for 3 fields with suitable regexp matching for all
    if (NF == 3 && $1 ~ name_re && $2 ~ name_re && $3 ~ mail_re) {
        # put brackets around the address if needed
        email = $3 ~ /^<.*?>$/ ? $3 : "<" $3 ">"
        # output to the good list
        print $1 " " $2 " " email > "list.txt"
        # move to the next one
        next
    }
    # output to the bad list
    print > "error.txt"
}' "$1"
Tested with BSD and Gnu versions of awk.
Regex - translate "a" to "zzzz" ranges to code. Only sed or grep
Using ONLY the commands: echo, grep, sed
Argument $1 of the script is a code that has to be translated according to the table below and the translation sent to STDOUT. The code is a one to four character code that starts at "A" and increments alphabetically to "ZZZZ". For example: A, B, .... Z, AA, AB,.....
Code -> Translation
A -> 1
B to AA -> 2
AB to AF -> 3
AG to ZZZZ -> 4
For example, if the script is called as script.sh A the output would be 1. If the script is called as script.sh ABC the output would be 4.
#!/bin/bash
echo $1 | grep -e '^A$'>/dev/null && echo 1 && exit
echo $1 | grep -e '^[^A]$' -e '^[A][A]$' >/dev/null && echo 2 && exit
echo $1 | grep -e '^[A][B-F]$' >/dev/null && echo 3 && exit
echo $1 | grep -e '^[A-Z]\{2,4\}$' >/dev/null && echo 4 && exit
echo 0
exit
#!/bin/bash
str=$1
if [[ -z "${str#A}" ]]
then
    echo 1
elif [[ -z "${str#?}" || -z "${str#AA}" ]]
then
    echo 2
elif [[ -z "${str/A[B-F]/}" ]]
then
    echo 3
else
    echo 4
fi
# just in case, so you can say you used them:
grep . < /dev/null | sed > /dev/null
EDIT: silly ., you should be ?.
EDIT2: Command [ is gone, non-command [[ to the rescue!
In bash, how can I check a string for partials in an array?
If I have a string: s='path/to/my/foo.txt' and an array declare -a include_files=('foo.txt' 'bar.txt'); how can I check the string for matches in my array efficiently?
You could loop through the array and use a bash substring check:
for file in "${include_files[@]}"
do
    if [[ $s = *${file} ]]; then
        printf "%s\n" "$file"
    fi
done
Alternately, if you want to avoid the loop and you only care whether a file name matches or not, you could use the @ form of bash extended globbing. The following example assumes that array file names do not contain |.
shopt -s extglob
declare -a include_files=('foo.txt' 'bar.txt');
s='path/to/my/foo.txt'
printf -v pat "%s|" "${include_files[@]}"
pat="${pat%|}"
printf "%s\n" "${pat}"
#prints foo.txt|bar.txt
if [[ ${s##*/} = @(${pat}) ]]; then echo yes; fi
For an exact match to the file name:
#!/bin/bash
s="path/to/my/foo.txt";
ARR=('foo.txt' 'bar.txt');
for str in "${ARR[@]}"; do
    # if [ $(echo "$s" | awk -F"/" '{print $NF}') == "$str" ]; then
    if [ $(basename "$s") == "$str" ]; then # A better option than awk for sure...
        echo "match";
    else
        echo "no match";
    fi;
done
Delete everything except all surrounded by ()
Let's say I have a file like this:
adsf(2)
af(3)
g5a(65)
aafg(1245)
a(3)df
How can I get only the numbers between ( and ) from this, using BASH?
A couple of solutions come to mind. Some of them handle empty lines correctly, others do not. Those are trivial to remove, though, using either grep -v '^$' or sed '/^$/d'.
sed
sed 's|.*(\([0-9]\+\).*|\1|' input
awk
awk -F'[()]' '/./{print $2}' input
2
3
65
1245
3
pure bash
#!/bin/bash
IFS="()"
while read a b; do
    if [ -z $b ]; then
        continue
    fi
    echo $b
done < input
and finally, using tr
cat input | tr -d '[a-z()]'
while read line; do
    if [ -z "$line" ]; then
        continue
    fi
    line=${line#*(}
    line=${line%)*}
    echo $line
done < file
Positive lookaround:
$ echo $'a1b(2)c\nd3e(456)fg7' | grep -Poe '(?<=\()[0-9]*(?=\))'
2
456
Another one:
while read line ; do
    [[ $line =~ .*\(([[:digit:]]+)\).* ]] && echo "${BASH_REMATCH[1]}"
done < file