Replace number of specified characters - regex

I have something like this:
aaaaaaaaaaaaaaaaaaaaaaaaa
I need something that will allow me to replace a with another character like c from left to right according to the specified number.
For example:
some_command 3 should replace the first 3 a with c
cccaaaaaaaaaaaaaaaaaaaaaa
some_command 15
cccccccccccccccaaaaaaaaaa

This can be done entirely in bash:
some_command() {
a="aaaaaaaaaaaaaaaaaaaaaaaaa"
c="ccccccccccccccccccccccccc"
echo "${c:0:$1}${a:$1}"
}
> some_command 3
cccaaaaaaaaaaaaaaaaaaaaaa
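The function above hardcodes both strings. A minimal sketch of a more general variant follows, assuming the input string and the replacement character are passed as arguments. The name replace_first_n and its argument order are hypothetical, not part of the original answer. Note that it replaces the first n characters whatever they are, which matches the question only because the input consists entirely of a.

```shell
# Hypothetical helper: replace the first N characters of STR with NEW.
# Usage: replace_first_n N STR NEW
replace_first_n() {
  local n=$1 str=$2 new=$3
  # Take the first n characters (or the whole string if it is shorter).
  local head=${str:0:n}
  # Replace every character of the head with the new character,
  # then append the untouched remainder.
  printf '%s%s\n' "${head//?/$new}" "${str:n}"
}

replace_first_n 3 aaaaaa c
```

The replacement character should be a plain character here; in recent bash versions an unquoted & in the replacement may be treated specially.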

Using awk:
s='aaaaaaaaaaaaaaaaaaaaaaaaa'
awk -F "\0" -v n=3 -v r='c' '{for (i=1; i<=n; i++) $i=r}1' OFS= <<< "$s"
cccaaaaaaaaaaaaaaaaaaaaaa

This might work for you (GNU sed):
sed -r ':a;/a/{x;/^X{5}$/{x;b};s/$/X/;x;s/a/c/;ba}' file
This replaces the first 5 a's with c across the whole file.
sed -r ':a;/a/{x;/^X{5}$/{z;x;b};s/$/X/;x;s/a/c/;ba}' file
This replaces the first 5 a's with c on each line of the file.

#!/bin/bash
char=c
word=aaaaaaaaaaaaaaaaaaaaaaaaa
# pass in the number of chars to replace
replaceChar () {
num=$1
newword=""
# this for loop to concatenate the chars could probably be optimized
for i in $(seq 1 $num); do newword="${newword}${char}"; done
word="${newword}${word:$num}"
echo "$word"
}
replaceChar 4
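The comment in the script notes that the concatenation loop could probably be optimized. One possible way, sketched here under the assumption that the replacement is a single ordinary character, is to let printf build a run of spaces and then substitute them. The helper name repeat_char is hypothetical.

```shell
# Hypothetical helper: print CHAR repeated COUNT times without a loop.
# Usage: repeat_char CHAR COUNT
repeat_char() {
  local pad
  # '%*s' pads an empty string to COUNT columns, giving COUNT spaces.
  printf -v pad '%*s' "$2" ''
  # Turn every space into the requested character.
  printf '%s\n' "${pad// /$1}"
}

char=c
word=aaaaaaaa
num=4
echo "$(repeat_char "$char" "$num")${word:$num}"
```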

A more general solution than the OP asked for, building on @anubhava's excellent answer.
Parameterizes the replacement count as well as the "before and after" chars.
The "before" char is matched anywhere - not just at the beginning of the input string, and whether adjacent to other instances or not.
Input is taken from stdin, so multiple lines can be piped in.
# Usage:
# ... | some_command_x replaceCount beforeChar afterChar
some_command_x() {
awk -F '\0' -v n="$1" -v o="${2:0:1}" -v r="${3:0:1}" -v OFS='' \
'{
while(++i <= NF)
{ if ($i==o) { if (++n_matched > n) break; $i=r } }
{ i=n_matched=0; print }
}'
}
# Example:
some_command_x 2 a c <<<$'abc_abc_abc\naaa rating'
# Returns:
cbc_cbc_abc
cca rating

Perl has some interesting features that can be exploited. Define the following bash script some_command:
#! /bin/bash
str="aaaaaaaaaaaaaaaaaaaaaaaaa"
perl -s -nE'print s/(a{$x})/"c" x length $1/er' -- -x=$1 <<<"$str"
Testing:
$ some_command 5
cccccaaaaaaaaaaaaaaaaaaaa

Related

Insert 1 after elements with no count

I have a file of a structure like this:
NH3O
CH4
CHN
C2NOPH3
What I was trying to do is to put 1 as a count between the two letters or at the end of the item. Thus, the desired output is:
N1H3O1
C1H4
C1H1N1
C2N1O1P1H3
So far, I was trying something like sed -e 's/\([A-Z]\)\([A-Z]\)/\11\2/g' -e 's/\([A-Z]\)[[:blank:]]/\11/g' but that does not work out.
Thanks for any tips
Could you please try the following, written and tested with GNU awk.
awk '{num=split($0,array,"");for(i=1;i<=num;i++){if(array[i]~/[a-zA-Z]/ && array[i+1]!~/[0-9a-z]/){array[i]=array[i]"1"};val=val array[i]};print val;val=""}' Input_file
Adding a non-one liner form of solution here.
awk '
{
num=split($0,array,"")
for(i=1;i<=num;i++){
# a letter not followed by a digit or a lowercase letter gets a count of 1
if(array[i]~/[a-zA-Z]/ && array[i+1]!~/[0-9a-z]/){
array[i]=array[i]"1"
}
val=val array[i]
}
print val
val=""
}
' Input_file
sed -e ':1' -e 's/\([[:upper:]][[:lower:]]*\)\([[:upper:]]\|$\)/\11\2/' -e 't1'
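For comparison, the same rule as the sed loop above (an uppercase letter, optional lowercase letters, then a count that defaults to 1) can be sketched in pure bash with a regex loop. The function name add_counts is hypothetical, and the sketch assumes each line is a plain formula with no brackets or charges.

```shell
# Hypothetical helper: append an explicit count of 1 to every element
# symbol that has no count. Usage: add_counts FORMULA
add_counts() {
  local s=$1 out=""
  # One element per iteration: symbol, optional digits, remainder.
  local re='^([[:upper:]][[:lower:]]*)([[:digit:]]*)(.*)$'
  while [[ $s =~ $re ]]; do
    # Use the existing count, or 1 when the digit group is empty.
    out+="${BASH_REMATCH[1]}${BASH_REMATCH[2]:-1}"
    s=${BASH_REMATCH[3]}
  done
  # Anything that did not parse as an element is passed through unchanged.
  printf '%s\n' "$out$s"
}

add_counts C2NOPH3
```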

Extract all but last field from a variable in bash

I have a file with lines similar to this:
01/01 THIS IS A DESCRIPTION 123.45
12/23 SHORTER DESC 9.00
11/16 DESC 1,234.00
Three fields: date, desc, amount. The first field will always be followed by a space. The last field will always be preceded by a space. But the middle field will usually contain spaces.
I know bash/regex well enough to get the first and last fields (for example, echo ${LINE##* } or cut -f1 -d' '). But how do I get the middle field? Essentially everything except the first and last fields.
You can use sed for that:
$ sed -E 's/^[^[:space:]]*[[:space:]](.*)[[:space:]][^[:space:]]*$/\1/' file
THIS IS A DESCRIPTION
SHORTER DESC
DESC
Or with awk:
$ awk '{$1=$NF=""; sub(/^[ \t]*/,"")}1' file
# same output
You can also use cut and rev to delete the first and last fields:
$ cut -d ' ' -f2- file | rev | cut -d ' ' -f2- | rev
# same output
Or GNU grep:
$ grep -oP '^\H+\h\K(.*)(?=\h+\H+$)' file
# same output
Or, with a Bash loop and parameter expansion:
$ while read -r line; do line="${line#* }"; echo "${line% *}"; done <file
# same output
Or, if you want to capture the fields as variables in Bash:
while IFS= read -r line; do
date="${line%% *}"
amt="${line##* }"
line="${line#* }"
desc="${line% *}"
printf "%5s %10s \"%s\"\n" "$date" "$amt" "$desc"
done <file
Prints:
01/01 123.45 "THIS IS A DESCRIPTION"
12/23 9.00 "SHORTER DESC"
11/16 1,234.00 "DESC"
If you want to remove the first and last fields, you can just extend the parameter expansion technique you referenced:
var=${var#* } var=${var% *}
A single # or % removes the shortest substring that matches the glob.
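To make the shortest versus longest distinction concrete, here is a small sketch using one of the sample lines from the question. The variable names are only illustrative.

```shell
var='01/01 THIS IS A DESCRIPTION 123.45'

# '#' strips the shortest prefix matching "* ", '##' the longest.
after_first_space=${var#* }
after_last_space=${var##* }

# '%' strips the shortest suffix matching " *", '%%' the longest.
before_last_space=${var% *}
before_first_space=${var%% *}

printf '%s\n' "$after_first_space" "$after_last_space" \
              "$before_last_space" "$before_first_space"
```

The first and third results each drop exactly one field, which is why chaining the single-character forms leaves only the middle field.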
bash: read the line into an array of words, and pick out the wanted elements from the array
while read -ra words; do
date=${words[0]}
amount=${words[-1]}
description=${words[*]:1:${#words[@]}-2}
printf "%s=%s\n" date "$date" desc "$description" amt "$amount"
done < file
outputs
date=01/01
desc=THIS IS A DESCRIPTION
amt=123.45
date=12/23
desc=SHORTER DESC
amt=9.00
date=11/16
desc=DESC
amt=1,234.00
This is the fun bit: ${words[*]:1:${#words[@]}-2}
It takes a slice of the words array, starting at index 1 (the 2nd element), for a length of "number of elements minus 2".
The words are joined into a single string with a space separator.
See Shell Parameter Expansion and scroll down a bit for the ${parameter:offset:length} discussion.
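A stand-alone sketch of that slice, with the array filled by hand instead of by read, may make the arithmetic easier to follow. The intermediate variable count is added only for readability.

```shell
words=(01/01 THIS IS A DESCRIPTION 123.45)

# Six elements in total, so the slice length is 6 - 2 = 4.
count=${#words[@]}

# Offset 1 skips the date; length count-2 stops before the amount.
# Quoting "${words[*]...}" joins the slice with the first character
# of IFS, which is a space by default.
description="${words[*]:1:count-2}"

printf '%s\n' "$description"
```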
If you want to use a regex in bash, then you can use capturing parentheses and the BASH_REMATCH array
while IFS= read -r line; do
if [[ $line =~ ([^[:blank:]]+)" "(.+)" "([^[:blank:]]+) ]]; then
echo "date=${BASH_REMATCH[1]}"
echo "desc=${BASH_REMATCH[2]}"
echo "amt=${BASH_REMATCH[3]}"
fi
done < file
Same output as above.
Notice in the pattern that the spaces need to be quoted (or backslash-escaped)
You could try below one with awk:
awk '{$1="";$NF="";sub(/^[ \t]*/,"")}1' file_name

Replace a block of text

I have a file in this pattern:
Some text
---
## [Unreleased]
More text here
I need to replace the text between '---' and '## [Unreleased]' with something else in a shell script.
How can it be achieved using sed or awk?
Perl to the rescue!
perl -lne 'my @replacement = ("First line", "Second line");
if ($p = (/^---$/ .. /^## \[Unreleased\]/)) {
print $replacement[$p-1];
} else { print }'
The flip-flop operator .. tells you whether you're between the two strings; it also returns the line number relative to the start of the range.
This might work for you (GNU sed):
sed '/^---/,/^## \[Unreleased\]/c\something else' file
Change the lines between two regexp to the required string.
This example may help you.
$ cat f
Some text
---
## [Unreleased]
More text here
$ seq 1 5 >mydata.txt
$ cat mydata.txt
1
2
3
4
5
$ awk '/^---/{f=1; while(getline < c)print;close(c);next}/^## \[Unreleased\]/{f=0;next}!f' c="mydata.txt" f
Some text
1
2
3
4
5
More text here
awk -v RS="\0" 'gsub(/---\n\n## \[Unreleased\]\n/,"something")+1' file
give this line a try.
An awk solution that:
is portable (POSIX-compliant).
can deal with any number of lines between the start line and the end line of the block, and potentially with multiple blocks (although they'd all be replaced with the same text).
reads the file line by line (as opposed to reading the entire file at once).
awk -v new='something else' '
/^---$/ { f=1; next } # Block start: set flag, skip line
f && /^## \[Unreleased\]$/ { f=0; print new; next } # Block end: unset flag, print new txt
! f # Print line, if before or after block
' file
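The same flag-based logic can be sketched in pure bash for cases where awk is not wanted. This is only an illustration that mirrors the awk program above: the function name replace_block is hypothetical, it reads from stdin, and like the awk version it drops the start and end marker lines and prints the new text in place of the block.

```shell
# Hypothetical helper: replace everything from a '---' line through the
# next '## [Unreleased]' line with NEW. Usage: replace_block NEW < file
replace_block() {
  local new=$1 in_block=0 line
  # '|| [[ -n $line ]]' also handles a last line without a newline.
  while IFS= read -r line || [[ -n $line ]]; do
    if (( in_block )); then
      # Inside the block: swallow lines until the end marker appears.
      if [[ $line == '## [Unreleased]' ]]; then
        in_block=0
        printf '%s\n' "$new"
      fi
    elif [[ $line == '---' ]]; then
      in_block=1            # block start: set flag, skip line
    else
      printf '%s\n' "$line" # outside the block: print unchanged
    fi
  done
}

printf 'Some text\n---\n\n## [Unreleased]\nMore text here\n' |
  replace_block 'something else'
```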

Use sed to replace matched value from associative bash array

I'm using sed to reformat an input string, a portion of which I want replaced with a different string.
The input string is a date in the format of:
%Y-%m-%dT%H:%M:%S.%N%:z
Example:
2016-01-20T08:15:32.398242-05:00
My goal is to replace the month, 01 in the example above, with a string representation such as Jan.
I've defined the following array to use:
declare -A MONTHS=([01]="Jan" [02]="Feb" [03]="Mar" [04]="Apr" [05]="May" [06]="Jun" [07]="Jul" [08]="Aug" [09]="Sep" [10]="Oct" [11]="Nov" [12]="Dec")
I can't seem to get sed to use a matched group's value as the index to the MONTHS array.
What I've tried:
# straightforward sed approach
sed 's/^[0-9]\{4\}-\([0-9]\{2\}\)-.*/${MONTHS[\1]}/g'
# result: ${MONTHS[01]}
# break out of the single quotes
sed 's/^[0-9]\{4\}-\([0-9]\{2\}\)-.*/'"${MONTHS[\1]}"'/g'
# result:
# use double quotes
sed "s/^[0-9]\{4\}-\([0-9]\{2\}\)-.*/${MONTHS[\1]}/g"
# result:
# use double quotes *and* a hardcoded example
sed "s/^[0-9]\{4\}-\([0-9]\{2\}\)-.*/${MONTHS[\1]}, ${MONTHS[01]}/g"
# result: , Jan
Is it possible to use a matched-group value from sed as an array index in the replacement?
Note: I'm purposefully avoiding the date function because the application of this can go beyond actual dates; but, I'm definitely open to alternative approaches such as awk.
I suggest this awk as an alternative:
s='2016-01-20T08:15:32.398242-05:00'
awk -v ms='Jan:Feb:Mar:Apr:May:Jun:Jul:Aug:Sep:Oct:Nov:Dec' 'BEGIN{
split(ms, mths, ":"); FS=OFS="-"} {$2=mths[$2+0]} 1' <<< "$s"
Output:
2016-Jan-20T08:15:32.398242-05:00
First, you can convert your associative array to a string containing the months names in order
monstr=$(for k in "${!MONTHS[@]}"; do echo "$k"; done | sort | while read -r mon; do echo "${MONTHS[$mon]}"; done)
then, use awk to do the heavy lifting
awk -F- -v monstr="$monstr" 'BEGIN { split(monstr, mon, " "); } { printf("%s-%s-", $1, mon[$2+0]); for (i=3; i < NF; i++) { printf("%s-", $i); } printf("%s\n", $NF);}'
That is, store the string containing the months in a variable that you split at the beginning, then replace the second field and print everything.
First generate sed script from your array, then execute it.
Disclaimer: not sure that I correctly used bash array in the following code. Also not sure about quotes and escaping.
for k in $(seq -w 1 12) ; do
echo 's/^[0-9]\{4\}-'"$k-.*/${MONTHS[$k]}/;"
done | sed -f - your_file
Alternatively just use bash:
IFS=- read year mon rest <<<"$string"
string="$year ${MONTHS[$mon]} $rest"
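The sed attempts in the question fail because the shell expands ${MONTHS[\1]} before sed ever runs, so sed never sees the array. Keeping both the match and the lookup inside bash avoids that. The following is a sketch of that idea using BASH_REMATCH; the function name month_name is hypothetical, and it keeps the dashes of the original format rather than replacing them with spaces.

```shell
declare -A MONTHS=([01]="Jan" [02]="Feb" [03]="Mar" [04]="Apr" [05]="May" [06]="Jun"
                   [07]="Jul" [08]="Aug" [09]="Sep" [10]="Oct" [11]="Nov" [12]="Dec")

# Hypothetical helper: swap the numeric month for its name.
# Usage: month_name STRING
month_name() {
  local s=$1
  local re='^([0-9]{4})-([0-9]{2})-(.*)$'
  if [[ $s =~ $re ]]; then
    # The captured month is used directly as the array key.
    printf '%s-%s-%s\n' "${BASH_REMATCH[1]}" \
                        "${MONTHS[${BASH_REMATCH[2]}]}" \
                        "${BASH_REMATCH[3]}"
  else
    # Input that does not look like a date is passed through unchanged.
    printf '%s\n' "$s"
  fi
}

month_name '2016-01-20T08:15:32.398242-05:00'
```

An out-of-range month such as 13 has no entry in the array, so the month part of the output would be empty; a real script would probably want to check for that.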
If it must be sed... Here is a "brute force" answer using the t command:
#! /bin/sed -f
s/-01-/-Jan-/; tx
s/-02-/-Feb-/; tx
s/-03-/-Mar-/; tx
s/-04-/-Apr-/; tx
s/-05-/-May-/; tx
s/-06-/-Jun-/; tx
s/-07-/-Jul-/; tx
s/-08-/-Aug-/; tx
s/-09-/-Sep-/; tx
s/-10-/-Oct-/; tx
s/-11-/-Nov-/; tx
s/-12-/-Dec-/; tx
:x

sed/awk replace in all matches

I want to invert all the color values in a bunch of files. The colors are all in the hex format #ff3300 so the inversion could be done characterwise with the sed command
y/0123456789abcdef/fedcba9876543210/
How can I loop through all the color matches and do the char translation in sed or awk?
EDIT:
sample input:
random text... #ffffff_random_text_#000000__
asdf#00ff00
asdfghj
desired output:
random text... #000000_random_text_#ffffff__
asdf#ff00ff
asdfghj
EDIT: I changed my response as per your edit.
OK, doing this in sed gets complicated. awk could do the trick more or less easily, but I find perl much easier for this task:
$ perl -pe 's/#[0-9a-f]+/$&=~tr%0123456789abcdef%fedcba9876543210%r/ge' <infile >outfile
Basically you find the pattern, then execute the right-hand side, which executes the tr on the match, and substitutes the value there.
The inversion is really a subtraction. To invert a hex, you just subtract it from ffffff.
With this in mind, you can build a simple script to process each line, extract hexes, invert them, and inject them back to the line.
This is using Bash (see arrays, printf -v, += etc) only (no external tools there):
#!/usr/bin/env bash
[[ -f $1 ]] || { printf "error: cannot find file: %s\n" "$1" >&2; exit 1; }
while read -r; do
# split line with '#' as separator
IFS='#' toks=( $REPLY )
for tok in "${toks[@]}"; do
# extract hex
read -n6 hex <<< "$tok"
# is it really a hex ?
if [[ $hex =~ [0-9a-fA-F]{6} ]]; then
# compute inversion
inv="$((16#ffffff - 16#$hex))"
# zero pad the result
printf -v inv "%06x" "$inv"
# replace hex with inv
tok="${tok/$hex/$inv}"
fi
# build the modified line
line+="#$tok"
done
# print the modified line and clean it for reuse
printf "%s\n" "${line#\#}"
unset line
done < "$1"
use it like:
$ ./invhex infile > outfile
test case input:
random text... #ffffff_random_text_#000000__
asdf#00ff00
bdf#cvb_foo
asdfghj
#bdfg
processed output:
random text... #000000_random_text_#ffffff__
asdf#ff00ff
bdf#cvb_foo
asdfghj
#bdfg
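A more compact sketch of the same subtraction idea, working on one line at a time with a regex loop instead of splitting on '#', could look like this. The function name invert_colors is hypothetical. Because the leading (.*) is greedy, each iteration handles the last remaining color on the line, so the result is assembled from right to left.

```shell
# Hypothetical helper: invert every #rrggbb color in a single line.
# Usage: invert_colors LINE
invert_colors() {
  local line=$1 tail="" inv
  local re='^(.*)#([0-9a-fA-F]{6})(.*)$'
  while [[ $line =~ $re ]]; do
    # Invert by subtracting from ffffff, then zero-pad to six digits.
    printf -v inv '%06x' "$(( 16#ffffff - 16#${BASH_REMATCH[2]} ))"
    # Prepend the inverted color and the text after it to the result.
    tail="#${inv}${BASH_REMATCH[3]}${tail}"
    # Continue with whatever preceded this color.
    line=${BASH_REMATCH[1]}
  done
  printf '%s\n' "${line}${tail}"
}

invert_colors 'random text... #ffffff_random_text_#000000__'
```

To process a file, the function can be called from a while IFS= read -r loop. The output is always lowercase hex, as with the script above.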
This might work for you (GNU sed):
sed '/#[a-f0-9]\{6\}\>/!b
s//\n&/g
h
s/[^\n]*\(\n.\{7\}\)[^\n]*/\1/g
y/0123456789abcdef/fedcba9876543210/
H
g
:a;s/\n.\{7\}\(.*\n\)\n\(.\{7\}\)/\2\1/;ta
s/\n//' file
Explanation:
/#[a-f0-9]\{6\}\>/!b bail out on lines not containing the required pattern
s//\n&/g prepend every pattern with a newline
h copy this to the hold space
s/[^\n]*\(\n.\{7\}\)[^\n]*/\1/g delete everything but the required pattern(s)
y/0123456789abcdef/fedcba9876543210/ transform the pattern(s)
H append the new pattern(s) to the hold space
g overwrite the pattern space with the contents of the hold space
:a;s/\n.\{7\}\(.*\n\)\n\(.\{7\}\)/\2\1/;ta replace the old pattern(s) with the new.
s/\n// remove the newline artifact from the H command.
This works...
cat test.txt |sed -e 's/\#\([0123456789abcdef]\{6\}\)/\n\#\1\n/g' |sed -e ' /^#.*/ y/0123456789abcdef/fedcba9876543210/' | awk '{lastType=type;type= substr($0,1,1)=="#";} type==lastType && length(line)>0 {print line;line=$0} type!=lastType {line=line$0} length(line)==0 {line=$0} END {print line}'
The first sed command inserts line breaks around the hex codes, which makes it possible to apply the substitution to all lines starting with a hash. There is probably a more elegant way to merge the lines back together, but the awk command does the job. The only assumption is that two hex codes never follow each other directly. If they do, this step has to be revised.