Replace slash in Bash - regex

Let's suppose I have this variable:
DATE="04\Jun\2014:15:54:26"
Therein I need to replace \ with \/ in order to get the string:
"04\/Jun\/2014:15:54:26"
I tried tr as follows:
echo "04\Jun\2014:15:54:26" | tr '\' '\\/'
But this results in: "04\Jun\2014:15:54:26".
It does not satisfy me. Can anyone help?

No need for echo, a pipe and sed.
A simple parameter expansion is enough and faster:
echo "${DATE//\//\\/}"
#> 04\/Jun\/2014:15:54:26

Use sed for substitutions:
sed 's#/#\\/#g' < filename.txt > newfilename.txt
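The delimiter is usually "/", but any character works as long as the same one is used in all three positions; "#" avoids having to escape the slash.
I am writing this on a Windows PC, so I hope it is right; you may have to escape the slashes with another backslash.
sed explained: -e introduces the editing script. On its own, sed writes the result to standard output; use -i to edit the file in place, and -i with a suffix (for example -i.bak) to keep a backup of the original.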
sed -e s/STRING_TO_REPLACE/STRING_TO_REPLACE_IT/g index.html
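A minimal sketch of how -e and -i behave, in case it helps. The temporary file and the .bak suffix are assumptions chosen for illustration; attaching the suffix directly to -i is the form that, as far as I know, both GNU and BSD sed accept.

```shell
# Hypothetical scratch file, only for this demonstration.
tmp=$(mktemp)
printf '04/Jun/2014:15:54:26\n' > "$tmp"

# With -e alone the result goes to standard output; the file is not modified.
to_stdout=$(sed -e 's#/#\\/#g' "$tmp")

# With -i.bak the file is edited in place and the original is kept as "$tmp.bak".
sed -i.bak -e 's#/#\\/#g' "$tmp"
edited=$(cat "$tmp")
backup=$(cat "$tmp.bak")

rm -f "$tmp" "$tmp.bak"
printf '%s\n' "$to_stdout" "$edited" "$backup"
```

The first two lines printed contain the escaped slashes; the third is the untouched original kept in the backup.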

here you go:
kent$ echo "04/Jun/2014:15:54:26"|sed 's#/#\\/#g'
04\/Jun\/2014:15:54:26
Your tr line was not correct; you may have misunderstood what tr does. tr 'abc' 'xyz' maps a->x, b->y, c->z character by character; it does not replace the whole string abc with xyz.

You can also escape the slashes, with a slightly less readable solution than with hashes:
echo "04/Jun/2014:15:54:26" | sed 's/\//\\\//g'

This has not been said in other answers so I thought I'd add some clarifications:
tr uses two sets of characters for replacement, and the characters from the first set are replaced with those from the second set in a one-to-one correspondence. The manpage states that
SET2 is extended to length of SET1 by repeating its last character as necessary. Excess characters of SET2 are ignored.
Example:
echo abca | tr ab de # produces decd
echo abca | tr a de # produces dbcd, 'e' is ignored
echo abca | tr ab d # produces ddcd, 'd' is interpreted as a replacement for 'b' too
When using sed for substitutions, you can use a delimiter other than '/', which makes the expression clearer (I like ':'; @n34_panda proposed '#' in their answer). Don't forget the g flag to replace all occurrences: sed 's:/:\\/:g' with quotes, or sed s:/:\\\\/:g without (unquoted, the shell consumes one level of backslashes, so they have to be doubled).
Finally, your shortest solution will probably be @Luc-Olivier's answer, which uses parameter expansion in the following form (remember to escape forward slashes too when they are part of the pattern):
echo ${variable/expected/replacement} # will replace one occurrence
echo ${variable//expected/replacement} # will replace all occurrences
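Applied to the date string from the question, the two forms can be sketched as follows. The variable names first_only and all are made up for this example.

```shell
DATE="04/Jun/2014:15:54:26"

# ${var/pattern/replacement}: pattern is \/ (a literal slash),
# replacement is \\/ (a backslash followed by a slash).
first_only=${DATE/\//\\/}    # single slash after the name: first occurrence only
all=${DATE//\//\\/}          # double slash after the name: every occurrence

printf '%s\n' "$first_only"  # 04\/Jun/2014:15:54:26
printf '%s\n' "$all"         # 04\/Jun\/2014:15:54:26
```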

find recurring pattern with `sed`

I am using GNU bash 4.3.48
I expected that
echo "23S62M1I19M2D" | sed 's/.*\([0-9]*M\).*/\1/g'
would output 62M19M... But it doesn't.
sed 's/\([0-9]*M\)//g' deletes ALL [0-9]*M and retrieves 23S1I2D. but the group \1 is not working as I thought it would.
sed 's/.*\([0-9]*M\).*/ \1 /g', retrieves M...
What am I doing wrong?
Thank you!
With your shown samples, you could try the following awk program.
echo "23S62M1I19M2D" |
awk '
{
val=""
while(match($0,/[0-9]+M/)){
val=val substr($0,RSTART,RLENGTH)
$0=substr($0,RSTART+RLENGTH)
}
print val
}
'
Explanation: echo prints the value and sends it to awk on standard input. Inside awk, the match function looks for the regex /[0-9]+M/; the loop collects every match in the line (advancing past each one via RSTART and RLENGTH) and prints the collected matches once the line is exhausted.
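The same match-and-advance loop can also be sketched in bash alone, using [[ =~ ]] and BASH_REMATCH instead of awk's match, RSTART and RLENGTH. This is only an illustration of the idea; the variable names s and out are assumptions of this sketch.

```shell
s="23S62M1I19M2D"
out=""

# Each iteration finds the leftmost "digits followed by M",
# appends it to the result, and drops everything up to and including it.
while [[ $s =~ [0-9]+M ]]; do
    out+=${BASH_REMATCH[0]}
    s=${s#*"${BASH_REMATCH[0]}"}
done

printf '%s\n' "$out"   # 62M19M
```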
This might work for you (GNU sed):
sed -nE '/[0-9]*M/{s//\n&\n/g;s/(^|\n)[^\n]*\n?//gp}' file
Surround the match by newlines and then remove non-matching parts.
Alternative, using grep and tr:
grep -o '[0-9]*M' file | tr -d '\n'
N.B. tr removes all newlines, including the last one. To restore the final newline, use:
grep -o '[0-9]*M' file | tr -d '\n' | paste
The alternate solution will concatenate all results into a single line. To achieve the same result with the first solution use:
sed -nE '/[0-9]*M/{s//\n&\n/g;s/(^|\n)[^\n]*\n?//g;H};${x;s/\n//gp}' file
The problem is that .* is greedy. Since only M is obligatory ([0-9]* can match the empty string), the leading .* runs all the way to the last M; the whole string is matched, only M is captured, and so only M is kept after replacing with the \1 backreference.
That means you can't easily do this with sed. It is much easier with Perl, since it supports matching and skipping a pattern:
#!/bin/bash
perl -pe 's/\d+M(*SKIP)(*F)|.//g' <<< "23S62M1I19M2D"
The pattern matches:
\d+M(*SKIP)(*F) - one or more digits, M, and then the match is omitted and the next match is searched for from the failure position
|. - or matches any char other than a line break char.
Or simply match all occurrences and concatenate them:
perl -lane 'BEGIN{$a="";} while (/\d+M/g) {$a .= $&} END{print $a;}' <<< "23S62M1I19M2D"
All \d+M matches are appended to the $a variable which is printed at the end of processing the string.
Your substitution is probably working, but not substituting what you think it is.
In the substitution s/\(foo...\)/\1/, the \1 matches whatever \(...\) matches and captures, so your substitution is replacing foo... by foo...!
% echo "1234ABC" | sed 's/\([A-Z]\)/-\1-/'g
1234-A--B--C-
So you'll need to match more, but capture only a portion of the match. For example:
echo "23S62M1I19M2D" | sed 's/[0-9]*[A-LN-Z]*\([0-9]*M\)/\1/g'
62M19M2D
In the case of sed 's/.*\([0-9]*M\).*/\1/g' (did that appear in an edit to the question, or did I just miss it?), the .* matches ‘greedily’ – it matches as much as it possibly can, thus including the digits before the M. In the example above, the [A-LN-Z] is required to be at the end of the uncaptured part, so the digits are forced to be matched by the [0-9] inside the capture.
Getting a clear idea of what ‘greedy’ means is really important when writing or interpreting regexps.
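A small experiment makes the greedy behaviour visible. In the first command [0-9]* is allowed to match nothing, so .* swallows the digits; in the second, a non-digit is required right before the capture, which forces the digits into the group. The variable names are only for this sketch.

```shell
s="23S62M1I19M2D"

# .* takes as much as possible, leaving only "M" for the capture group.
greedy=$(printf '%s\n' "$s" | sed 's/.*\([0-9]*M\).*/\1/')

# Requiring a non-digit before the group and at least one digit inside it
# forces the digits into the capture. Still only the LAST match survives,
# because the pattern consumes the whole line in a single match.
forced=$(printf '%s\n' "$s" | sed 's/.*[^0-9]\([0-9][0-9]*M\).*/\1/')

printf '%s\n' "$greedy"   # M
printf '%s\n' "$forced"   # 19M
```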
If you know you will only encounter the suffixes S, M, I and D, an alternative approach would be explicitly deleting the combinations you don't want:
echo "23S62M1I19M2D" | sed 's/[0-9]\+[SID]//g'
This gives the expected:
62M19M
Update: This variant produces the same output, but rejects all non-numeric, non-M suffixes:
echo "23S62M1I19M2D" | sed 's/[0-9]\+[^0-9M]//g'

Linux shell extracting substring between matching patterns

Let's say I have a string poskek|gfgfd|XLSE|a1768|d234|uijjk and I want to extract just the LSE part.
I only know that |X comes directly before the part I am interested in (LSE), and | directly after it.
The other answer using sed should work, but I always find sed to be a bit awkward for regex selection, as it's really intended for replacement (hence why either side of the pattern needs to be flanked with .* and the part you actually want needs to be in parentheses). Here's a solution using grep:
grep -Po '\|X\K[^|]+'
-P signals grep to use Perl's regex engine which is more advanced
-o only prints the matching part of the line
\|X match a literal vertical bar and a capital X
\K forget what has currently been matched (do not include it in the final output)
[^|]+ one or more characters other than vertical bars
As a pure bash solution, please try:
str='poskek|gfgfd|XLSE|a1768|d234|uijjk'
ext=${str#*|X}
ext=${ext%%|*}
echo "$ext"
If regex matching is available, the following also works:
if [[ $str =~ .*\|X([^|]+) ]]; then
echo "${BASH_REMATCH[1]}"
fi
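The two parameter expansions can be wrapped in a small helper that also reports when the |X marker is missing. The function name extract_after_x is hypothetical, and the sketch assumes the wanted part always follows the first |X.

```shell
extract_after_x() {
    # Drop the shortest prefix ending in "|X".
    local rest=${1#*|X}
    # If nothing was removed, the marker was not present.
    [[ $rest == "$1" ]] && return 1
    # Drop the longest suffix starting at the next "|".
    printf '%s\n' "${rest%%|*}"
}

extract_after_x 'poskek|gfgfd|XLSE|a1768|d234|uijjk'   # LSE
```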
echo 'poskek|gfgfd|XLSE|a1768|d234|uijjk' | sed -n 's/.*|X\([^|]\+\).*/\1/p'
That ought to do the trick.
Explained:
sed -n will not print anything unless specified
s/ - search and replace
.*|X - match everything up to and including |X
\([^|]\+\) - capture multiple (at least one) character that isn't a |
.* - match the rest of the text (just to "eat it up")
/\1/p - Replace all matched text with the first capture, and print
For this particular case, you could do the rather unconventional:
awk '$1=="X"{$1="";print}' FS= OFS= RS=\|
try this
echo 'poskek|gfgfd|XLSE|a1768|d234|uijjk' |
awk -F "|" '{for(i=1;i<=NF;++i) printf "%s", (substr($i,1,1)=="X"?substr($i,2):"")}'
where
-F sets the field separator (here '|')
NF is the number of fields

sed substitution with user-specified replacement string

The general form of the substitution command in sed is:
s/regexp/replacement/flags
where the '/' characters may be uniformly replaced by any other single character. But how do you choose this separator character when the replacement string is being fed in by an environment variable and might contain any printable character? Is there a straightforward way to escape the separator character in the variable using bash?
The values are coming from trusted administrators so security is not my main concern. (In other words, please don't answer with: "Never do this!") Nevertheless, I can't predict what characters will need to appear in the replacement string.
You can also use a control character as the delimiter, like this:
s^Aregexp^Areplacement^Ag
where ^A is the Ctrl-A character, typed as Ctrl-V followed by Ctrl-A.
Or else use awk and don't worry about delimiters:
awk -v s="search" -v r="replacement" '{gsub(s, r)} 1' file
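In a bash script the control character does not have to be typed by hand; $'\001' produces it. A minimal sketch with made-up values follows. Note that this only solves the delimiter problem: $from is still interpreted as a regex, and & and backslashes in $to keep their special meaning.

```shell
from='abc'
to='/usr/local#bin'    # contains both "/" and "#"
delim=$'\001'          # Ctrl-A, unlikely to occur in printable input

result=$(printf '%s\n' '=abc=' | sed "s${delim}${from}${delim}${to}${delim}g")
printf '%s\n' "$result"   # =/usr/local#bin=
```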
There is no (easy) solution with sed for the following:
while read -r string from to wanted
do
echo "in [$string] want replace [$from] to [$to] wanted result: [$wanted]"
final=$(echo "$string" | sed "s/$from/$to/")
[[ "$final" == "$wanted" ]] && echo OK || echo WRONG
echo
done <<EOF
=xxx= xxx === =====
=abc= abc /// =///=
=///= /// abc =abc=
EOF
which prints
in [=xxx=] want replace [xxx] to [===] wanted result: [=====]
OK
in [=abc=] want replace [abc] to [///] wanted result: [=///=]
sed: 1: "s/abc/////": bad flag in substitute command: '/'
WRONG
in [=///=] want replace [///] to [abc] wanted result: [=abc=]
sed: 1: "s/////abc/": bad flag in substitute command: '/'
WRONG
Can't resist: never do this (with sed)! :)
Is there a straightforward way to escape the separator character in
the variable using bash?
No. Because you are passing the strings in from variables, you can't easily escape the separator character: in "s/$from/$to/" the separator can appear not only in the $to part but in the $from part too. E.g. once you escape the separator in the $from part, the replacement may not happen at all, because $from is no longer found.
Solution: use something other than sed.
1.) Use pure bash. In the script above, replace the sed call with:
final=${string//$from/$to}
2.) If bash's substitutions are not enough, use a tool to which you can pass $from and $to as variables.
As @anubhava already said, you can use: awk -v f="$from" -v t="$to" '{gsub(f, t)} 1' file
or you can use perl and pass the values as environment variables:
final=$(echo "$string" | perl_from="$from" perl_to="$to" perl -pe 's/$ENV{perl_from}/$ENV{perl_to}/')
or pass the variables to perl via command-line arguments:
final=$(echo "$string" | perl -spe 's/$f/$t/' -- -f="$from" -t="$to")
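For the pure bash variant, one detail is worth sketching: an unquoted $from is treated as a glob pattern, so characters like * or ? would match more than intended. Quoting the pattern makes it literal. The function name replace_literal is hypothetical; I also believe that recent bash versions (5.2, with the patsub_replacement option) give an unquoted & in the replacement a special meaning, so treat replacements containing & as untested here.

```shell
replace_literal() {
    local string=$1 from=$2 to=$3 result
    # "$from" is quoted, so glob characters in it are matched literally.
    result=${string//"$from"/$to}
    printf '%s\n' "$result"
}

replace_literal '=abc=' 'abc' '///'   # =///=
replace_literal '=///=' '///' 'abc'   # =abc=
replace_literal 'a*b'   '*'   'x'     # axb
```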
Two options:
1) Pick a character that does not occur in either string (this needs a preprocessing pass over the content, and there is no guarantee that a free character exists).
# Quick and dirty sample using `'/_##|!%=:;,-` arbitrary sequence
Separator="$( printf "%sa%s%s" '/_##|!%=:;,-' "${regexp}" "${replacement}" \
| sed -n ':cycle
s/\(.\)\(.*a.*\1.*\)\1/\1\2/g;t cycle
s/\(.\)\(.*a.*\)\1/\2/g;t cycle
s/^\(.\).*a.*/\1/p
' )"
echo "Separator: [ ${Separator} ]"
sed "s${Separator}${regexp}${Separator}${replacement}${Separator}flag" YourFile
2) Escape the chosen delimiter inside the pattern and replacement strings (this needs a preprocessing pass to do the escaping).
# Quick and dirty sample with # as an arbitrary delimiter; only the delimiter itself is escaped
regexpEsc="$( printf "%s" "${regexp}" | sed 's/#/\\#/g' )"
replacementEsc="$( printf "%s" "${replacement}" | sed 's/#/\\#/g' )"
# Double quotes are required here so that the shell expands the two variables
sed "s#${regexpEsc}#${replacementEsc}#flags" YourFile
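Escaping only the delimiter leaves the other special characters active (regex metacharacters in the pattern, & and backslash in the replacement). A somewhat fuller sketch, assuming single-line strings, basic regular expressions and / as the final delimiter, could look like this. The function name sed_literal_replace is made up for the example, and the escaping commands deliberately use | as their own delimiter so that the backslash inside the bracket expression is passed through to the regex unchanged.

```shell
sed_literal_replace() {
    # Reads text on stdin and replaces every literal $1 with literal $2.
    local from=$1 to=$2 esc_from esc_to
    # Pattern side: backslash-escape ] [ \ . * ^ $ and the delimiter /
    esc_from=$(printf '%s\n' "$from" | sed 's|[][\.*^$/]|\\&|g')
    # Replacement side: backslash-escape \ / and &
    esc_to=$(printf '%s\n' "$to" | sed 's|[\/&]|\\&|g')
    sed "s/${esc_from}/${esc_to}/g"
}

printf '%s\n' '=abc=' | sed_literal_replace 'abc' '///'   # =///=
printf '%s\n' '=///=' | sed_literal_replace '///' 'abc'   # =abc=
```

Strings containing newlines are not handled by this sketch.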
From man sed
\cregexpc
Match lines matching the regular expression regexp. The c may be any
character.
When working with paths I often use # as the separator:
sed s\#find/path#replace/path#
No need to escape / with ugly \/.

replace more than one special character with sed

I'm a newbie at regex, so sed gives me a headache.
I need help replacing all special characters in the given company names with "-".
So this is the given string:
FML Finanzierungs- und Mobilien Leasing GmbH & Co. KG
I want the result:
FML-Finanzierungs-und-Mobilien-Leasing-GmbH-&-Co-KG
I tried the following:
nr=$(echo "$name" | sed -e 's/ /-/g')
This replaces all spaces with -, but what is the right expression to replace the others? My own Google searches have not been very successful.
That depends on what you consider to be a special character -- I say this because you appear to consider & a regular character but not ., which seems a bit odd. Anyway, I imagine something of the form
nr=$(echo "$name" | sed 's/[^[:alnum:]&]\+/-/g')
would serve you best. Here [^[:alnum:]&] matches any character that is not alphanumeric or &, and [^[:alnum:]&]\+ matches a sequence of one or more such characters, so the sed call replaces all such sequences in $name with a hyphen. If there are other characters that you consider regular, add them to the set. Note that the handling of umlauts and suchlike depends on your locale.
Also note that echo may cause trouble if $name begins with a hyphen (it could be parsed as options for echo), so if you can tether yourself to bash,
nr=$(sed 's/[^[:alnum:]&]\+/-/g' <<< "$name")
might be more robust.
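The same "replace each run of unwanted characters with one hyphen" idea can also be sketched with tr, using -c to complement the set of characters to keep and -s to squeeze repeated hyphens. This is an alternative to the sed call, not part of the answer above; printf '%s' is used so that no trailing newline gets turned into a hyphen. As far as I know GNU tr works on bytes, so multi-byte characters such as umlauts would be replaced as well.

```shell
name='FML Finanzierungs- und Mobilien Leasing GmbH & Co. KG'

# -c: operate on everything NOT in the first set (alphanumerics and &)
# -s: squeeze runs of the replacement character into a single one
nr=$(printf '%s' "$name" | tr -cs '[:alnum:]&' '-')

printf '%s\n' "$nr"   # FML-Finanzierungs-und-Mobilien-Leasing-GmbH-&-Co-KG
```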
Apparently you want to remove - and . and then replace spaces with -.
This would do it, by saying sed -e 'one thing' -e 'another thing':
$ echo "$name" | sed -e 's/[-\.]//g' -e 's/ /-/g'
FML-Finanzierungs-und-Mobilien-Leasing-GmbH-&-Co-KG
Note that we enclose within square brackets all the characters that we want to treat equally: [-\.] matches either - or . (inside a bracket expression the dot is already literal, so the backslash is not needed; [-.] is enough).
Does this help?
awk -vOFS=- '{gsub(/[.-]/,"");$1=$1}1' <<< "$name"
FML-Finanzierungs-und-Mobilien-Leasing-GmbH-&-Co-KG
gsub(/[.-]/,"") removes . and -
-vOFS=- sets the output field separator to -
$1=$1 rebuilds the line so that it uses the new output field separator
1 prints the line.
To get it to a variable
nr=$(awk -vOFS=- '{gsub(/[.-]/,"");$1=$1}1' <<< "$name")
You can also try it this way:
echo "$name" | sed 's/ \|- \|\. /-/g'
Output:
FML-Finanzierungs-und-Mobilien-Leasing-GmbH-&-Co-KG

Bash- How to convert non-alphanumerical character to "_"

I am trying to store user input in a variable and clean that variable so that it keeps only alphanumeric characters plus a few others (I mean [a-zA-Z0-9-_]).
I tried using this but it isn't exhaustive :
SERVICE_NAME=$(echo $SERVICE_NAME | tr A-Z a-z | tr ' ' _ | tr \' _ | tr \" _)
Do you have some help for this?
Bash's string substitution is a fine thing: ${var//pat/rep}
val='Foo$%!*#BAR###baZ'
echo ${val//[^a-zA-Z_-]/_}
Foo_____BAR___baZ
A small explanation: The slash introduces a search/replace, a little like in sed (where it just delimits patterns). But you use a single slash for one replacement:
val='Foo$%!*#BAR###baZ'
echo ${val/[^a-zA-Z_-]/_}
Foo_%!*#BAR###baZ
Two slashes // mean replace all. Uncommon, but it has some logic, multiple slashes to mean multiple replace (please excuse my poor English).
And note how the $ is separated from the variable, but it is hard to modify a literal constant this way (which would be nice for testing). Modifying $1 isn't a no-brainer as well, afaik.
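Put together for the SERVICE_NAME case from the question, a minimal sketch could look like this. It assumes bash 4 or later for the ${var,,} lowercase expansion, and the function name sanitize is hypothetical. Unlike the examples above, the bracket expression also keeps digits, as the question asked.

```shell
sanitize() {
    local s=${1,,}                         # lowercase the input (bash 4+)
    # Replace every character that is not a-z, 0-9, _ or - with _
    printf '%s\n' "${s//[^a-z0-9_-]/_}"
}

SERVICE_NAME=$(sanitize "My Service's \"Name\" 01")
printf '%s\n' "$SERVICE_NAME"              # my_service_s__name__01
```

How [a-z] treats non-ASCII letters can depend on the locale, so setting LC_ALL=C may be worth considering if the input is not plain ASCII.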
$ echo 'asd!#QCW##D' | tr A-Z a-z | sed -e 's/[^a-zA-Z0-9\-]/_/g'
asd__qcw__d
I would use sed for this and use the ^ (not) operator in your set of valid characters and replace everything else with an underscore. The above shows the syntax with the output.
And, as a bonus, if you want to replace a run of invalid characters with one underscore, just add + to your regular expression (and use the -r switch to make sed use extended regular expressions):
$ echo 'asd!#QCW##D' | tr A-Z a-z | sed -r 's/[^a-zA-Z0-9\-]+/_/g'
asd_qcw_d
I believe it can all be done in 1 single sed command like this:
echo 'Foo$%!*#BAR###baZ' | sed -e 's/[A-Z]/\L&/g' -e 's/[^a-z0-9\-]/_/g'
OUTPUT
foo_____bar___baz
perl way:
perl -ple 's/[^\w\-]/_/g'
pure bash way
a='foo-BAR_123,.:goo'
echo ${a//[^[:alnum:]-]/_}
produces:
foo-BAR_123___goo