Shell scripting using grep to split a string - regex

I have a variable in my shell script of the form
myVAR="firstWord###secondWord"
I would like to use grep or some other tool to separate it into two variables such that the final result is:
myFIRST="firstWord"
mySECOND="secondWord"
How can I go about doing this? #{3} is what I want to split on.

Using substitution with sed:
echo "$myVAR" | sed -E 's/(.*)#{3}(.*)/\1/'
>>> firstWord
echo "$myVAR" | sed -E 's/(.*)#{3}(.*)/\2/'
>>> secondWord
# saving to variables
myFIRST=$(echo "$myVAR" | sed -E 's/(.*)#{3}(.*)/\1/')
mySECOND=$(echo "$myVAR" | sed -E 's/(.*)#{3}(.*)/\2/')
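The same split can also be sketched without an external tool, using shell parameter expansion. This is only an illustration: it splits at the first ### and reuses the variable names from the question.

```shell
#!/usr/bin/env bash
myVAR="firstWord###secondWord"

# %%###* strips the longest suffix that starts with "###".
myFIRST=${myVAR%%###*}
# #*### strips the shortest prefix that ends with "###".
mySECOND=${myVAR#*###}

printf '%s\n' "$myFIRST" "$mySECOND"
```

No subprocess is started, which can matter when the split runs inside a loop.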

The best tool for this is sed:
$ echo "firstWord###secondWord" | sed 's/###/\
/'
firstWord
secondWord
A complete example:
$ read myFIRST mySECOND < <(echo "$myVAR" | sed 's/###/ /')
$ echo $myFIRST
firstWord
$ echo $mySECOND
secondWord

$ STR='firstWord###secondWord'
$ eval $(echo $STR | sed 's:^:V1=":; /###/ s::";V2=": ;s:$:":')
$ echo $V1
firstWord
$ echo $V2
secondWord

This is how I would do it with zsh:
myVAR="firstWord###secondWord"
<<<$myVAR sed 's/###/ /' | read myFIRST mySECOND
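The pipe-into-read form relies on zsh running the last command of a pipeline in the current shell. In bash with default settings, that read runs in a subshell, so the variables do not survive. A minimal bash sketch of the difference follows; pipeFIRST and pipeSECOND are names made up for the demonstration.

```shell
#!/usr/bin/env bash
myVAR="firstWord###secondWord"

# In bash the read at the end of a pipeline runs in a subshell by default,
# so these two variables are unset again once the pipeline finishes.
printf '%s\n' "$myVAR" | sed 's/###/ /' | read -r pipeFIRST pipeSECOND

# Feeding read from a here-string keeps it in the current shell.
read -r myFIRST mySECOND <<<"$(printf '%s\n' "$myVAR" | sed 's/###/ /')"

printf 'pipe: [%s] [%s]\n' "${pipeFIRST-}" "${pipeSECOND-}"
printf 'here-string: [%s] [%s]\n' "$myFIRST" "$mySECOND"
```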

Related

How to use sed to replace every match according to each match?

$ echo 'a,b,c,d=1' | sed '__MAGIC_HERE__'
a=1,b=1,c=1,d=1
$ echo 'a,b,c,d=2' | sed '__MAGIC_HERE__'
a=2,b=2,c=2,d=2
Can sed cast this spell?
EDIT
I have to use sed twice to achieve this
s='a,b,c,d=2'
v=`echo $s | sed -rn 's/.*([0-9]+)/\1/p'`
echo $s | sed "s/=.*//" | sed -rn "s/([a-z])/\1=$v/gp"
OR
s='a,b,c,d=2'
echo $s | sed -rn 's/.*([0-9]+)/\1/p' | { read v;echo $s | sed "s/=.*//" | sed -rn "s/([a-z])/\1=$v/gp"; }
EDIT
The real use case is below, and it involves multiline content. Thanks to @hek2mgl, the awk version is much easier.
EDIT
My use case
export LS_COLORS='no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:ex=01;32'
exts="
tar|tgz|arj|taz|lzh|zip|z|Z|gz|bz2|deb|rpm|jar=01;31
jpg|jpeg|gif|bmp|pbm|pgm|ppm|tga|xbm|xpm|tif|tiff|png=01;34
mov|fli|gl|dl|xcf|xwd|ogg|mp3|wav=01;35
flv|mkv|mp4|mpg|mpeg|avi=01;36
"
# SED Version
read -rd '' exts < <(
for i in $(echo $exts)
do
echo $i | sed -rn 's/.*=(.*)/\1/p' | { read v; echo $i | sed "s/=.*//" | sed -rn "s/([^|]+)\|?/:\*.\1=$v/gp"; }
done | tr -d '\n'
)
export LS_COLORS="$LS_COLORS$exts"
# AWK Version
read -r -d '' exts < <( echo $exts | xargs -n1 | awk -F= '{gsub(/\|/,"="$2":*.")}$2' | tr "\n" ":" )
export LS_COLORS="$LS_COLORS:*.$exts"
unset exts
EDIT
Final sed version
read -r -d '' exts < <( echo $exts | xargs -n1 | sed -r 's/\|/\n/g;:a;s/\n(.*(=.*))/\2:*.\1/;ta' | sed "s/^/*./g" | tr "\n" ":" )
export LS_COLORS="$LS_COLORS:$exts"
This might work for you (GNU sed):
sed -r 's/,/\n/g;:a;s/\n(.*(=.*))/\2,\1/;ta' file
Convert the separators to newlines (a unique character not found in the file) and then replace each occurrence of the newline by the required string and the original separator.
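To see the two stages described above, the command can be run once with only the first substitution and once in full. This sketch assumes GNU sed (it relies on \n in both the pattern and the replacement) and uses the sample input from the question instead of a file.

```shell
#!/usr/bin/env bash
input='a,b,c,d=1'

# Step 1 only: commas become newlines, so each name sits on its own line.
printf '%s\n' "$input" | sed -E 's/,/\n/g'

# Full command: the loop between :a and ta keeps replacing the first
# remaining newline with "=1," until no newline is left.
result=$(printf '%s\n' "$input" | sed -E 's/,/\n/g;:a;s/\n(.*(=.*))/\2,\1/;ta')
printf '%s\n' "$result"
```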
I would use awk:
awk -F= '{gsub(/,/,"="$2",")}1'
-F= splits the input line on =, which lets us access the number in field two, $2. gsub() replaces all occurrences of , with =$2,. The 1 at the end is an awk idiom: it simply prints the modified line.
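As a quick check of that explanation, here is the same awk program wrapped in a small shell function and fed two inputs. The function name expand_pairs and the second input are made up for illustration, and an explicit print stands in for the trailing 1.

```shell
#!/usr/bin/env bash
# expand_pairs is a hypothetical helper name, not part of the answer above.
expand_pairs() {
    # $2 is the value after "="; every comma becomes "=<value>,".
    awk -F= '{ gsub(/,/, "=" $2 ","); print }'
}

first=$(printf '%s\n' 'a,b,c,d=1' | expand_pairs)
second=$(printf '%s\n' 'x,y=42' | expand_pairs)

printf '%s\n' "$first" "$second"
```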
Perl can...
echo 'a,b,c,d=1' | perl -ne 'chomp; my ($val) = m|=(\d+)|; s|\=.*||; print join(",", map {"$_=$val"} split/,/) . "\n";'
a=1,b=1,c=1,d=1
Explained
perl -ne # Loop over input and run command
chomp; # Remove trailing newline
my ($val) = m|=(\d+)|; # Find numeric value after '='
s|\=.*||; # Remove everything starting with '='
split /,/ # Split input on ',' => ( a, b, c, d )
map {"$_=$val" } # Create strings ( "a=1", "b=1", ... ) from results of split
join(",",...) # Join the results of previous map with ','
print .... "\n" # Print it all out with a newline at the end.
I hope you're not seriously going to use that mush of read/echo/xargs/sed/sed/tr in your code. Just use one small, simple awk script:
$ cat tst.sh
exts="
tar|tgz|arj|taz|lzh|zip|z|Z|gz|bz2|deb|rpm|jar=01;31
jpg|jpeg|gif|bmp|pbm|pgm|ppm|tga|xbm|xpm|tif|tiff|png=01;34
mov|fli|gl|dl|xcf|xwd|ogg|mp3|wav=01;35
flv|mkv|mp4|mpg|mpeg|avi=01;36
"
exts=$( awk -F'=' '
NF {
gsub(/\||$/, "="$2":", $1)
out = out $1
}
END {
sub(":$", "", out)
print out
}
' <<<"$exts" )
echo "$exts"
$ ./tst.sh
tar=01;31:tgz=01;31:arj=01;31:taz=01;31:lzh=01;31:zip=01;31:z=01;31:Z=01;31:gz=01;31:bz2=01;31:deb=01;31:rpm=01;31:jar=01;31:jpg=01;34:jpeg=01;34:gif=01;34:bmp=01;34:pbm=01;34:pgm=01;34:ppm=01;34:tga=01;34:xbm=01;34:xpm=01;34:tif=01;34:tiff=01;34:png=01;34:mov=01;35:fli=01;35:gl=01;35:dl=01;35:xcf=01;35:xwd=01;35:ogg=01;35:mp3=01;35:wav=01;35:flv=01;36:mkv=01;36:mp4=01;36:mpg=01;36:mpeg=01;36:avi=01;36
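Note that the output above lists bare extensions, while the entries built in the question carry a *. prefix. A rough bash-only sketch that emits the prefixed form could look like this; it uses a shortened extension list, and the variable names (names, color, parts, out) are chosen for illustration.

```shell
#!/usr/bin/env bash
exts="
tar|tgz=01;31
jpg|png=01;34
"

out=""
while IFS='=' read -r names color; do
    # Skip the blank lines at the start and end of the list.
    [ -n "$names" ] || continue
    # Split "tar|tgz" into individual extensions.
    IFS='|' read -r -a parts <<<"$names"
    for ext in "${parts[@]}"; do
        # Prepend ":" only when out already holds an entry.
        out+="${out:+:}*.$ext=$color"
    done
done <<<"$exts"

printf '%s\n' "$out"
```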
Perl, another Perl alternative...
d=1:
echo 'a,b,c,d=1' | perl -pe '($a)=/(\d+)$/; s/,/=$a,/g;'
a=1,b=1,c=1,d=1
d=2:
echo 'a,b,c,d=2' | perl -pe '($a)=/(\d+)$/; s/,/=$a,/g;'
a=2,b=2,c=2,d=2
Explanations:
perl -e # perl one-liner switch
perl -ne # puts an implicit loop for each line of input
perl -pe # as 'perl -ne', but adds an implicit print at the end of each iteration
($a)=/(\d+)$/; # catch the number in d=1 or d=2, assign variable $a
s/,/=$a,/g; # substitute each ',' with '=1,' if $a=1

How can I extract the timestamp from the end of a shell variable when the format isn't fixed?

I'm trying to extract the timestamp from the end of a shell variable like this:
Input=AEXP_CSTONE_EU_prpbdp_sourcefile_yyyymmddhhmmss.txt
TimeStamp=`echo $Input | awk -F"_" '{print $6}'`
This works for this particular case, but the format of the string can change. For example, it could also be:
Input=AEXP_CSTONE_EU_prpbdp_sourcefile_prospects_yyyymmddhhmmss.txt
The variable will always end with yyyymmddhhmmss.txt. How can I extract the timestamp consistently?
Given:
$ echo $Input
AEXP_CSTONE_EU_prpbdp_sourcefile_prospects_20151116141111.txt
You can use sed:
$ echo $Input | sed -n 's|.*_\([0-9]\{14\}\)\.txt|\1|p'
20151116141111
Or nested grep:
$ echo $Input | grep -Eo '_[0-9]{14}\.txt' | grep -Eo '[0-9]{14}'
20151116141111
awk:
$ echo $Input | awk -F_ '{split($NF, a, "."); print a[1]}'
20151116141111
Perl
$ echo $Input | perl -ne 'print $1 if /_(\d{14})\.txt/'
20151116141111
cut and rev:
$ echo $Input | rev | cut -d'_' -f 1 | rev | cut -d'.' -f 1
20151116141111
Bash:
$ last=${Input##*_}
$ echo $last
20151116141111.txt
$ ts=${last%.*}
$ echo $ts
20151116141111
In summary, lots of ways...
If you don't want to lose the .txt part, it is even easier:
$ echo $Input | sed -n 's|.*_\([0-9]\{14\}\.txt\)|\1|p'
20151116141111.txt
$ echo $Input | grep -Eo '[0-9]{14}\.txt$'
20151116141111.txt
$ echo $Input | awk -F_ '{print $NF}'
20151116141111.txt
$ echo $Input | perl -ne 'print $1 if /_(\d{14}\.txt)/'
20151116141111.txt
$ echo $Input | rev | cut -d'_' -f 1 | rev
20151116141111.txt
$ last=${Input##*_}
$ echo $last
20151116141111.txt
You need to match the part that will not change then:
TimeStamp=$(echo $Input | perl -pe 's/.*(\d{14})\.txt/$1/')
You are extracting the 6th field separated by _, yet it seems you really want to extract the last field. You can do that with parameter expansion:
timestamp=${Input##*_}
timestamp=${timestamp%.txt}
See BashFAQ 100 for more on string manipulation in bash.
In awk, you'd use $NF to reference the last field, though awk is overkill for this.
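The parameter-expansion approach can be wrapped in a function that also checks the result really is 14 digits. This is a sketch under assumptions: extract_timestamp is a hypothetical name, and the digit check is an addition, not something the question asked for.

```shell
#!/usr/bin/env bash
# extract_timestamp is a hypothetical helper name.
extract_timestamp() {
    local name=$1 ts
    ts=${name##*_}      # drop everything up to the last "_"
    ts=${ts%.txt}       # drop the ".txt" suffix
    # Only accept exactly 14 digits (yyyymmddhhmmss).
    if [[ $ts =~ ^[0-9]{14}$ ]]; then
        printf '%s\n' "$ts"
    else
        return 1
    fi
}

extract_timestamp "AEXP_CSTONE_EU_prpbdp_sourcefile_20151116141111.txt"
extract_timestamp "AEXP_CSTONE_EU_prpbdp_sourcefile_prospects_20151116141111.txt"
```

Both calls print the same timestamp, regardless of how many underscore-separated fields precede it.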

sed regular expression extraction

I have a range of strings which conform to one of the two following patterns:
("string with spaces",4)
or
(string_without_spaces,4)
I need to extract the "string" via a bash command, and so far have found a pattern that works for each, but not for both.
echo "(\"string with spaces\",4)" | sed -n 's/("\(.*\)",.*)/\1/ip'
output:string with spaces
echo "(string_without_spaces,4)" | sed -n 's/(\(.*\),.*)/\1/ip'
output:string_without_spaces
I have tried using "\?, but then the closing " is not consumed when it is present:
echo "(SIM,0)" | sed -n 's/("\?\(.*\)"\?,.*)/\1/ip'
output: SIM
echo "(\"SIM\",0)" | sed -n 's/("\?\(.*\)"\?,.*)/\1/ip'
output: SIM"
Can anyone suggest a pattern that would extract the string in both scenarios? I am not tied to sed but would prefer to not have to install perl in this environment.
How about using [^"] instead of . so that " is excluded from the match?
$ echo '("string with spaces",4)' | sed -n 's/("\?\([^"]*\)"\?,.*)/\1/p'
string with spaces
$ echo "(string_without_spaces,4)" | sed -n 's/("\?\([^"]*\)"\?,.*)/\1/p'
string_without_spaces
$ echo "(SIM,0)" | sed -n 's/("\?\([^"]*\)"\?,.*)/\1/p'
SIM
$ echo '("SIM",0)' | sed -n 's/("\?\([^"]*\)"\?,.*)/\1/p'
SIM
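One caveat: \? in a basic regular expression is a GNU extension. A variant using extended regular expressions (-E, accepted by both GNU and BSD sed) might look like the sketch below; the function name extract_name and the added ^ and $ anchors are illustrative choices, not part of the answer above.

```shell
#!/usr/bin/env bash
# extract_name is a hypothetical helper name.
extract_name() {
    # "? makes each quote optional; [^"]* keeps quotes out of the capture.
    sed -nE 's/^\("?([^"]*)"?,.*\)$/\1/p'
}

quoted=$(printf '%s\n' '("string with spaces",4)' | extract_name)
plain=$(printf '%s\n' '(string_without_spaces,4)' | extract_name)

printf '%s\n' "$quoted" "$plain"
```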

Replace string if first letter is uppercase using sed

I tried to write a sed answer to the question "Edit a file using sed/awk" using:
sed -e 's/^[A-Z]/$:$&/' file.txt
but the result is:
wednesday
$:$Weekday
$:$thursday
$:$Weekday
$:$friday
$:$Weekday
$:$saturday
$:$MaybeNot
$:$sunday
$:$MaybeNot
$:$monday
$:$Weekday
$:$tuesday
$:$Weekday
Why does it replace the line when the first character is lowercase?
This is a "feature" according to this bug report caused by unexpected character ordering in the locale, further explained here and here.
$ locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_ALL=
$ echo "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" | sed -e 's/[A-Z]/./g'
..........................a.........................
$ echo "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" | sed -e 's/[a-z]/./g'
.........................Z..........................
$ echo "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" | LC_ALL=C sed -e 's/[A-Z]/./g'
..........................abcdefghijklmnopqrstuvwxyz
$ echo "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" | LC_ALL=C sed -e 's/[a-z]/./g'
ABCDEFGHIJKLMNOPQRSTUVWXYZ..........................
$ echo "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" | sed -e 's/[[:upper:]]/./g'
..........................abcdefghijklmnopqrstuvwxyz
$ echo "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" | sed -e 's/[[:lower:]]/./g'
ABCDEFGHIJKLMNOPQRSTUVWXYZ..........................
$ sed --version
GNU sed version 4.2.1
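Applied to the command from the question, either fix shown above should leave lowercase lines alone. A small sketch with a few made-up sample lines standing in for file.txt:

```shell
#!/usr/bin/env bash
sample=$(printf '%s\n' wednesday Weekday thursday MaybeNot)

# [[:upper:]] matches uppercase letters whatever the locale's collation order.
marked=$(printf '%s\n' "$sample" | sed -e 's/^[[:upper:]]/$:$&/')

# Forcing the C locale makes [A-Z] a plain byte range again.
marked_c=$(printf '%s\n' "$sample" | LC_ALL=C sed -e 's/^[A-Z]/$:$&/')

printf '%s\n' "$marked"
```

Only the lines starting with an uppercase letter (Weekday, MaybeNot) get the $:$ prefix.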

How to format a list using regex

I cannot figure out how to make this work:
I have:
icon-braille icon-bookmark-empty icon-blogger icon-adult icon-address-book
And I want:
'icon-braille','icon-bookmark-empty','icon-blogger','icon-adult','icon-address-book'
This sed command is OS X friendly: "s/ /','/g;s/^/'/;s/$/'/"
$ echo "icon-braille icon-bookmark-empty" | sed "s/ /','/g;s/^/'/;s/$/'/"
'icon-braille','icon-bookmark-empty'
With shell and sed:
echo "'"$(sed "s/ /','/g")"'"
Example:
$ echo "'"$(sed "s/ /','/g")"'"
icon-braille icon-bookmark-empty icon-blogger icon-adult icon-address-book
'icon-braille','icon-bookmark-empty','icon-blogger','icon-adult','icon-address-book'
The first line was typed as input; the second is the output.
$ echo "icon-braille icon-bookmark-empty icon-blogger icon-adult icon-address-book" | sed -e "s/ \+/','/g" -e "s/^/'/" -e "s/$/'/"
'icon-braille','icon-bookmark-empty','icon-blogger','icon-adult','icon-address-book'
You can just use the shell (assuming bash)
$ list="icon-braille icon-bookmark-empty icon-blogger icon-adult icon-address-book"
$ result=""; sep=""
$ for word in $list; do result+=$sep\'$word\'; sep=,; done
$ echo "$result"
'icon-braille','icon-bookmark-empty','icon-blogger','icon-adult','icon-address-book'
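A loop-free variant of the shell approach is also possible, assuming bash (for arrays and printf -v). The array name words is chosen for illustration.

```shell
#!/usr/bin/env bash
list="icon-braille icon-bookmark-empty icon-blogger icon-adult icon-address-book"

# Split the space-separated list into an array without glob expansion.
read -r -a words <<<"$list"

# printf reuses the format for every argument, leaving one trailing comma.
printf -v result "'%s'," "${words[@]}"
result=${result%,}

printf '%s\n' "$result"
```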