Use sed to replace matched value from associative bash array - regex

I'm using sed to reformat an input string, a portion of which I want replaced with a different string.
The input string is a date in the format of:
%Y-%m-%dT%H:%M:%S.%N%:z
Example:
2016-01-20T08:15:32.398242-05:00
My goal is to replace the month, 01 in the example above, with a string representation such as Jan.
I've defined the following array to use:
declare -A MONTHS=([01]="Jan" [02]="Feb" [03]="Mar" [04]="Apr" [05]="May" [06]="Jun" [07]="Jul" [08]="Aug" [09]="Sep" [10]="Oct" [11]="Nov" [12]="Dec")
I can't seem to get sed to use a matched group's value as the index to the MONTHS array.
What I've tried:
# straightforward sed approach
sed 's/^[0-9]\{4\}-\([0-9]\{2\}\)-.*/${MONTHS[\1]}/g'
# result: ${MONTHS[01]}
# break out of the single quotes
sed 's/^[0-9]\{4\}-\([0-9]\{2\}\)-.*/'"${MONTHS[\1]}"'/g'
# result:
# use double quotes
sed "s/^[0-9]\{4\}-\([0-9]\{2\}\)-.*/${MONTHS[\1]}/g"
# result:
# use double quotes *and* a hardcoded example
sed "s/^[0-9]\{4\}-\([0-9]\{2\}\)-.*/${MONTHS[\1]}, ${MONTHS[01]}/g"
# result: , Jan
Is it possible to use a matched-group value from sed as an array index in the replacement?
Note: I'm purposefully avoiding the date function because the application of this can go beyond actual dates; but, I'm definitely open to alternative approaches such as awk.

I suggest this awk as an alternative:
s='2016-01-20T08:15:32.398242-05:00'
awk -v ms='Jan:Feb:Mar:Apr:May:Jun:Jul:Aug:Sep:Oct:Nov:Dec' 'BEGIN{
split(ms, mths, ":"); FS=OFS="-"} {$2=mths[$2+0]} 1' <<< "$s"
Output:
2016-Jan-20T08:15:32.398242-05:00

First, you can convert your associative array to a string containing the months names in order
monstr=$(for k in "${!MONTHS[@]}"; do echo "$k"; done | sort | while read -r mon; do echo "${MONTHS[$mon]}"; done)
then, use awk to do the heavy lifting
awk -F- -v monstr="$monstr" 'BEGIN { split(monstr, mon, " "); } { printf("%s-%s-", $1, mon[$2+0]); for (i=3; i < NF; i++) { printf("%s-", $i); } printf("%s\n", $NF);}'
That is, store the string containing the months in a variable that you split at the beginning, then replace the second field and print everything.

First generate sed script from your array, then execute it.
Disclaimer: not sure that I correctly used bash array in the following code. Also not sure about quotes and escaping.
for k in $(seq -w 1 12) ; do
echo 's/^[0-9]\{4\}-'"$k-.*/${MONTHS[$k]}/;"
done | sed -f - your_file
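As a hedged variant of the generated-script idea, the sketch below loops over the keys of the MONTHS array from the question and emits substitutions that keep the year and the rest of the line intact. The function names build_month_script and replace_month are made up for illustration, and the script is passed with -e rather than -f - so it does not depend on sed reading its script from standard input.

```bash
#!/usr/bin/env bash
# MONTHS is the array from the question; the function names are hypothetical.
declare -A MONTHS=([01]="Jan" [02]="Feb" [03]="Mar" [04]="Apr" [05]="May" [06]="Jun"
                   [07]="Jul" [08]="Aug" [09]="Sep" [10]="Oct" [11]="Nov" [12]="Dec")

# Emit one sed substitution per array entry, for example:
#   s/^\([0-9]\{4\}\)-01-/\1-Jan-/
build_month_script() {
  local k
  for k in "${!MONTHS[@]}"; do
    printf 's/^\\([0-9]\\{4\\}\\)-%s-/\\1-%s-/\n' "$k" "${MONTHS[$k]}"
  done
}

# Apply the generated script to standard input.
replace_month() {
  sed -e "$(build_month_script)"
}

replace_month <<< '2016-01-20T08:15:32.398242-05:00'
# prints: 2016-Jan-20T08:15:32.398242-05:00
```

Because every rule is anchored to a four-digit year followed by a two-digit month, at most one rule can fire per line, so the order in which bash returns the keys does not matter.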
Alternatively just use bash:
IFS=- read year mon rest <<<"$string"
string="$year ${MONTHS[$mon]} $rest"
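A possible variant of the pure-bash approach, assuming bash 4 or later for associative arrays: match the string with [[ =~ ]] and use the captured month as the array index, which is what the sed attempts in the question could not do. The helper name month_name_date is hypothetical, and unlike the snippet above it keeps the original hyphens.

```bash
#!/usr/bin/env bash
# MONTHS is the array from the question; month_name_date is a hypothetical name.
declare -A MONTHS=([01]="Jan" [02]="Feb" [03]="Mar" [04]="Apr" [05]="May" [06]="Jun"
                   [07]="Jul" [08]="Aug" [09]="Sep" [10]="Oct" [11]="Nov" [12]="Dec")

month_name_date() {
  local s=$1
  local re='^([0-9]{4})-([0-9]{2})-(.*)$'
  if [[ $s =~ $re ]]; then
    # BASH_REMATCH[2] holds the two-digit month, used directly as the array key.
    printf '%s-%s-%s\n' "${BASH_REMATCH[1]}" "${MONTHS[${BASH_REMATCH[2]}]}" "${BASH_REMATCH[3]}"
  else
    # Leave non-matching input unchanged.
    printf '%s\n' "$s"
  fi
}

month_name_date '2016-01-20T08:15:32.398242-05:00'
# prints: 2016-Jan-20T08:15:32.398242-05:00
```

This works where sed does not because bash performs the match first and only then looks up the array element; in the sed attempts the shell expands ${MONTHS[\1]} before sed ever sees the input.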

If it must be sed... Here is a "brute force" answer using the t command:
#! /bin/sed -f
s/-01-/-Jan-/; tx
s/-02-/-Feb-/; tx
s/-03-/-Mar-/; tx
s/-04-/-Apr-/; tx
s/-05-/-May-/; tx
s/-06-/-Jun-/; tx
s/-07-/-Jul-/; tx
s/-08-/-Aug-/; tx
s/-09-/-Sep-/; tx
s/-10-/-Oct-/; tx
s/-11-/-Nov-/; tx
s/-12-/-Dec-/; tx
:x

Related

CLI: Increment a number in a string while keeping padded zeroes

I currently use this perl command to increment the last number in a string:
perl -pe 's/(\d+)(?!.*\d+)/$1+1/e' <<< "abc123_00456.txt"
It outputs abc123_457.txt, while I want abc123_00457.txt.
I also want something like 99 to increment to 100, though if that's too hard, 00 is also acceptable.
Some more examples of what I want:
09 -> 10
004 -> 005
I also want to be able to increment by any number (not just 1), so no ++.
I do not want to use shell's builtins to accomplish this.
Try this:
perl -pe 's/(\d+)(?=\D*\z)/my $n = $1; ++$n/e' <<< "abc123_00456.txt"
The ++ operator preserves the number of digits when incrementing a string.
Alternatively:
perl -pe 's/(\d+)(?=\D*\z)/sprintf "%0*d", length($1), $1 + 1/e' <<< "abc123_00456.txt"
This lets you increment by more than just 1 (or perform other arithmetic operations).
sprintf %d formats an integer in decimal format. 0 means to pad the result with zeroes; * means the minimum field width is taken from the next argument instead of the format string itself. (E.g. %05d means "format a number by padding it with zeroes until it is at least 5 characters wide".)
Here we simply take the length of the original string of digits (length($1)) and use it as our field width. The number to format is $1 + 1. If it is shorter than the original string, sprintf automatically adds zeroes.
See also perldoc -f sprintf.
You can use a formatted string with sprintf:
perl -pe 's/(\d+)(?!.*\d)/sprintf("%05d",$1+1)/e' <<< "abc123_00456.txt"
The 5 gives the width of your number, the 0 is the character used to pad the number.
For an unknown width, you can build the format string dynamically:
perl -pe 's/(\d+)(?!.*\d)/sprintf("%0".length($1)."d",$1+1)/e' <<< "abc123_00456.txt"
With GNU awk for the 3rd arg to match():
$ awk -v n=17 'match($0,/(.*[^0-9])([0-9]+)(.*)/,a){$0=a[1] sprintf("%0*d",length(a[2]),a[2]+n) a[3]} 1' <<< "abc123_00456.txt"
abc123_00473.txt
With any awk in any shell on every UNIX box:
$ awk -v n=17 'match($0,/[0-9]+\./){lgth=RLENGTH-1; tgt=substr($0,RSTART,lgth); $0=substr($0,1,RSTART-1) sprintf("%0*d",lgth,tgt+n) substr($0,RSTART+lgth)} 1' <<< "abc123_00456.txt"
abc123_00473.txt
This might work for you (GNU sed & Bash):
sed -E 's/^([^0-9]*([0-9]+[^0-9]+)*0*)([0-9]+)(.*)/echo "\1$((\3+1))\4"/e' file

How to use 'sed' to add dynamic prefix to each number in integer list?

How can I use sed to add a dynamic prefix to each number in an integer list?
For example:
I have a string "A-1,2,3,4,5", I want to transform it to string "A-1,A-2,A-3,A-4,A-5" - which means I want to add prefix of first integer i.e. "A-" to each number of the list.
If I have string like "B-1,20,300" then I want to transform it to string "B-1,B-20,B-300".
I am not able to use RegEx Capturing Groups because for global match they do not retain their value in subsequent matches.
When it comes to looping constructs in sed, I like to use newlines as markers for the places I have yet to process. This makes matching much simpler, and I know they're not in the input because my input is a text line.
For example:
$ echo A-1,2,3,4,5 | sed 's/,/\n/g;:a s/^\([^0-9]*\)\([^\n]*\)\n/\1\2,\1/; ta'
A-1,A-2,A-3,A-4,A-5
This works as follows:
s/,/\n/g # replace all commas with newlines (insert markers)
:a # label for looping
s/^\([^0-9]*\)\([^\n]*\)\n/\1\2,\1/ # replace the next marker with a comma followed
# by the prefix
ta # loop unless there's nothing more to do.
The approach is similar to @potong's, but I find the regex much more readable -- \([^0-9]*\) captures the prefix, \([^\n]*\) captures everything up to the next marker (i.e. everything that's already been processed), and then it's just a matter of reassembling it in the substitution.
Don't use sed, just use the other standard UNIX text manipulation tool, awk:
$ echo 'A-1,2,3,4,5' | awk '{p=substr($0,1,2); gsub(/,/,"&"p)}1'
A-1,A-2,A-3,A-4,A-5
$ echo 'B-1,20,300' | awk '{p=substr($0,1,2); gsub(/,/,"&"p)}1'
B-1,B-20,B-300
This might work for you (GNU sed):
sed -E ':a;s/^((([^-]+-)[^,]+,)+)([0-9])/\1\3\4/;ta' file
Uses pattern matching and a loop to replace a number following a comma by the first column prefix and that number.
Assuming this is for shell scripting (the set syntax below is csh/tcsh), you can do so with two sed calls:
set string = "A1,2,3,4,5"
set prefix = `echo $string | sed 's/^\([A-Z]\).*/\1/'`
echo $string | sed 's/,\([0-9]\)/,'$prefix'-\1/g'
Output is
A1,A-2,A-3,A-4,A-5
With
set string = "B-1,20,300"
Output is
B-1,B-20,B-300
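In bash the same two-step idea (extract the prefix, then substitute after every comma) can be sketched with parameter expansion alone. The function name add_prefix is hypothetical, and the sketch assumes the prefix is everything before the first digit and contains no characters that are special in a replacement, such as &.

```bash
#!/usr/bin/env bash
# add_prefix is a hypothetical helper name.
add_prefix() {
  local s=$1
  # Remove the longest suffix that starts with a digit, leaving e.g. "A-".
  local prefix=${s%%[0-9]*}
  # Replace every comma with a comma followed by the prefix.
  printf '%s\n' "${s//,/,$prefix}"
}

add_prefix 'A-1,2,3,4,5'   # prints: A-1,A-2,A-3,A-4,A-5
add_prefix 'B-1,20,300'    # prints: B-1,B-20,B-300
```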
Could you please try the following (if you are OK with awk):
awk '
BEGIN{
FS=OFS=","
}
{
for(i=1;i<=NF;i++){
if($i !~ /^A/&&$i !~ /\"A/){
$i="A-"$i
}
}
}
1' Input_file
If your data is in a file named d, this was tested with GNU sed:
sed -E 'h;s/^(\w-).+/\1/;x;G;:s s/,([0-9]+)(.*\n(.+))/,\3\1\2/;ts; s/\n.+//' d

Extract all but last field from a variable in bash

I have a file with lines similar to this:
01/01 THIS IS A DESCRIPTION 123.45
12/23 SHORTER DESC 9.00
11/16 DESC 1,234.00
Three fields: date, desc, amount. The first field will always be followed by a space. The last field will always be preceded by a space. But the middle field will usually contain spaces.
I know bash/regex well enough to get the first and last fields (for example, echo ${LINE##* } or cut -f1 -d' '). But how do I get the middle field? Essentially everything except the first and last fields.
You can use sed for that:
$ sed -E 's/^[^[:space:]]*[[:space:]](.*)[[:space:]][^[:space:]]*$/\1/' file
THIS IS A DESCRIPTION
SHORTER DESC
DESC
Or with awk:
$ awk '{$1=$NF=""; sub(/^[ \t]*/,"")}1' file
# same output
You can also use cut and rev to delete the first and last fields:
$ cut -d ' ' -f2- file | rev | cut -d ' ' -f2- | rev
# same output
Or GNU grep:
$ grep -oP '^\H+\h\K(.*)(?=\h+\H+$)' file
# same output
Or, with a Bash loop and parameter expansion:
$ while read -r line; do line="${line#* }"; echo "${line% *}"; done <file
# same output
Or, if you want to capture the fields as variables in Bash:
while IFS= read -r line; do
date="${line%% *}"
amt="${line##* }"
line="${line#* }"
desc="${line% *}"
printf "%5s %10s \"%s\"\n" "$date" "$amt" "$desc"
done <file
Prints:
01/01 123.45 "THIS IS A DESCRIPTION"
12/23 9.00 "SHORTER DESC"
11/16 1,234.00 "DESC"
If you want to remove the first and last fields, you can just extend the parameter expansion technique you referenced:
var=${var#* } var=${var% *}
A single # or % removes the shortest substring that matches the glob.
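As a quick illustration of shortest versus longest matching, the sketch below applies all four operators to one of the sample lines. The function name strip_ends is made up for this example.

```bash
#!/usr/bin/env bash
line='01/01 THIS IS A DESCRIPTION 123.45'

printf '%s\n' "${line#* }"    # shortest prefix removed: THIS IS A DESCRIPTION 123.45
printf '%s\n' "${line##* }"   # longest prefix removed:  123.45
printf '%s\n' "${line% *}"    # shortest suffix removed: 01/01 THIS IS A DESCRIPTION
printf '%s\n' "${line%% *}"   # longest suffix removed:  01/01

# strip_ends (hypothetical name) drops the first and last space-separated fields.
strip_ends() {
  local v=$1
  v=${v#* }
  v=${v% *}
  printf '%s\n' "$v"
}

strip_ends "$line"            # prints: THIS IS A DESCRIPTION
```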
bash: read the line into an array of words, and pick out the wanted elements from the array
while read -ra words; do
date=${words[0]}
amount=${words[-1]}
description=${words[*]:1:${#words[@]}-2}
printf "%s=%s\n" date "$date" desc "$description" amt "$amount"
done < file
outputs
date=01/01
desc=THIS IS A DESCRIPTION
amt=123.45
date=12/23
desc=SHORTER DESC
amt=9.00
date=11/16
desc=DESC
amt=1,234.00
This is the fun bit: ${words[*]:1:${#words[@]}-2}
take a slice of the words array, from index 1 (the 2nd element) for a length of "number of elements minus 2"
the words will be joined into a single string with a space separator.
See Shell Parameter Expansion and scroll down a bit for the ${parameter:offset:length} discussion.
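A small self-contained sketch of the slice, with a guard for lines that have fewer than three words (a negative length is an error for array slices). The function name middle_words is hypothetical. Note that read -a splits on whitespace, so runs of spaces inside the description collapse to a single space.

```bash
#!/usr/bin/env bash
# middle_words (hypothetical name): print every word except the first and last.
middle_words() {
  local -a words
  read -ra words <<< "$1"
  local n=${#words[@]}
  # With fewer than three words there is no middle; print an empty line.
  (( n > 2 )) || { printf '\n'; return; }
  # Slice from index 1 for n-2 elements; [*] in quotes joins with a space.
  printf '%s\n' "${words[*]:1:n-2}"
}

middle_words '01/01 THIS IS A DESCRIPTION 123.45'   # prints: THIS IS A DESCRIPTION
middle_words '11/16 DESC 1,234.00'                  # prints: DESC
```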
If you want to use a regex in bash, then you can use capturing parentheses and the BASH_REMATCH array
while IFS= read -r line; do
if [[ $line =~ ([^[:blank:]]+)" "(.+)" "([^[:blank:]]+) ]]; then
echo "date=${BASH_REMATCH[1]}"
echo "desc=${BASH_REMATCH[2]}"
echo "amt=${BASH_REMATCH[3]}"
fi
done < file
Same output as above.
Notice in the pattern that the spaces need to be quoted (or backslash-escaped)
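One common way to sidestep the quoting is to keep the pattern in a variable and expand it unquoted on the right-hand side of =~. The sketch below also anchors the pattern with ^ and $; the function name parse_line is made up for illustration.

```bash
#!/usr/bin/env bash
# parse_line (hypothetical name): split "date description amount" with a regex.
parse_line() {
  # Inside single quotes the spaces need no extra escaping.
  local re='^([^[:blank:]]+) (.+) ([^[:blank:]]+)$'
  if [[ $1 =~ $re ]]; then
    printf 'date=%s\ndesc=%s\namt=%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
  else
    return 1
  fi
}

parse_line '12/23 SHORTER DESC 9.00'
# prints:
# date=12/23
# desc=SHORTER DESC
# amt=9.00
```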
You could try below one with awk:
awk '{$1="";$NF="";sub(/^[ \t]*/,"")}1' file_name

Extract substring from string with sed

I want to extract MIB-Objects from snmpwalk output. The output FILE looks like:
RFC1213-MIB::sysDescr.0.0.0.0.192.168.1.2 = STRING: "Linux debian 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u1 (2017-06-18) x86_64"
RFC1213-MIB::sysObjectID.0 = OID: RFC1155-SMI::enterprises.8072.3.2.10
..
First, I read the output file, split at character = and remove everything between RFC1213-MIB:: and .0 till the end of the string.
while read -r; do echo "${REPLY%%=*}" | sed -e 's/RFC1213-MIB::\(.*\)\.0/\1/'; done <$FILE
My current output:
sysDescr.0.0.0.192.168.1.2
sysObjectID
How can I remove the other values? Is there a better solution of extracting sysDescr, sysObjectID?
With awk:
awk -F'[:.]' '{print $3}'
(define : and . as field delimiters and display the 3rd field)
with sed (Gnu):
sed 's/^[^:]*::\|\.0.*//g'
(replace with the empty string all that isn't a : followed by :: at the start of the line or the first .0 and following characters until the end of the line)
Maybe you can try with:
sed 's/RFC1213-MIB::\([^.]*\).*/\1/' "$FILE"
This will get everything that is not a dot (.) following the RFC1213-MIB:: string.
If you don't want to use sed, you can just use parameter substitution. sed is an external process so it won't be as fast as parameter substitution since it's a bash built in.
while IFS= read -r line; do line=${line#*::}; line=${line%%.*}; echo $line; done < file
line=${line#*::} assumes RFC1213-MIB does not have two colons and will be split from sysDescr with two colons.
line=${line%%.*} assumes sysDescr will have a . after it.
If you have more examples, that you think won't work, I can update my answer.
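Applied to the two sample lines from the question, the two expansions can be sketched as follows. The function name mib_object is hypothetical.

```bash
#!/usr/bin/env bash
# mib_object (hypothetical name): extract the object name from one snmpwalk line.
mib_object() {
  local line=$1
  line=${line#*::}    # drop the shortest prefix ending in "::" (the MIB name)
  line=${line%%.*}    # drop the longest suffix starting at the first "."
  printf '%s\n' "$line"
}

mib_object 'RFC1213-MIB::sysDescr.0.0.0.0.192.168.1.2 = STRING: "Linux debian 3.16.0-4-amd64"'
# prints: sysDescr
mib_object 'RFC1213-MIB::sysObjectID.0 = OID: RFC1155-SMI::enterprises.8072.3.2.10'
# prints: sysObjectID
```

Using the single # matters for the second line: it stops at the first "::", so the later "RFC1155-SMI::" in the value is cut off by the %% step instead of confusing the prefix removal.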

Replace number of specified characters

I have something like this:
aaaaaaaaaaaaaaaaaaaaaaaaa
I need something that will allow me to replace a with another character like c from left to right according to the specified number.
For example:
some_command 3 should replace the first 3 a with c
cccaaaaaaaaaaaaaaaaaaaaaa
some_command 15
cccccccccccccccccaaaaaaaaaa
This can be done entirely in bash:
some_command() {
a="aaaaaaaaaaaaaaaaaaaaaaaaa"
c="ccccccccccccccccccccccccc"
echo "${c:0:$1}${a:$1}"
}
> some_command 3
cccaaaaaaaaaaaaaaaaaaaaaa
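The function above hardcodes both strings. A possible generalization, sketched under the assumption that the first n characters should be overwritten whatever they are, builds the run of replacement characters with printf. The name replace_first_n and its argument order are made up for illustration.

```bash
#!/usr/bin/env bash
# replace_first_n STRING N [CHAR]: overwrite the first N characters of STRING
# with CHAR (default "c"). Hypothetical helper; N is clamped to the string length.
replace_first_n() {
  local s=$1 n=$2 ch=${3:-c} fill
  (( n > ${#s} )) && n=${#s}
  # Build a string of N spaces, then turn each space into CHAR.
  printf -v fill '%*s' "$n" ''
  printf '%s%s\n' "${fill// /$ch}" "${s:n}"
}

replace_first_n 'aaaaa' 3      # prints: cccaa
replace_first_n 'aaaaa' 9 x    # prints: xxxxx
```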
Using awk:
s='aaaaaaaaaaaaaaaaaaaaaaaaa'
awk -F "\0" -v n=3 -v r='c' '{for (i=1; i<=n; i++) $i=r}1' OFS= <<< "$s"
cccaaaaaaaaaaaaaaaaaaaaaa
This might work for you (GNU sed):
sed -r ':a;/a/{x;/^X{5}$/{x;b};s/$/X/;x;s/a/c/;ba}' file
This will replace the first 5 a's with c throughout the file.
sed -r ':a;/a/{x;/^X{5}$/{z;x;b};s/$/X/;x;s/a/c/;ba}' file
This will replace the first 5 a's with c for each line throughout the file.
#!/bin/bash
char=c
word=aaaaaaaaaaaaaaaaaaaaaaaaa
# pass in the number of chars to replace
replaceChar () {
num=$1
newword=""
# this for loop to concatenate the chars could probably be optimized
for i in $(seq 1 $num); do newword="${newword}${char}"; done
word="${newword}${word:$num}"
echo $word
}
replaceChar 4
A more general solution than the OP asked for, building on @anubhava's excellent answer.
Parameterizes the replacement count as well as the "before and after" chars.
The "before" char is matched anywhere - not just at the beginning of the input string, and whether adjacent to other instances or not.
Input is taken from stdin, so multiple lines can be piped in.
# Usage:
# ... | some_command_x replaceCount beforeChar afterChar
some_command_x() {
awk -F '\0' -v n="$1" -v o="${2:0:1}" -v r="${3:0:1}" -v OFS='' \
'{
while(++i <= NF)
{ if ($i==o) { if (++n_matched > n) break; $i=r } }
{ i=n_matched=0; print }
}'
}
# Example:
some_command_x 2 a c <<<$'abc_abc_abc\naaa rating'
# Returns:
cbc_cbc_abc
cca rating
Perl has some interesting features that can be exploited. Define the following bash script some_command:
#! /bin/bash
str="aaaaaaaaaaaaaaaaaaaaaaaaa"
perl -s -nE'print s/(a{$x})/"c" x length $1/er' -- -x=$1 <<<"$str"
Testing:
$ some_command 5
cccccaaaaaaaaaaaaaaaaaaaa