How to ignore word delimiters in sed

So I have a bash script which is working perfectly except for one issue with sed.
full=$(echo $full | sed -e 's/\b'$first'\b/ /' -e 's/  / /g')
This would work great, except there are instances where the variable $first is preceded immediately by a period rather than a blank space. In those instances, I do not want it removed.
Example:
full="apple.orange orange.banana apple.banana banana";first="banana"
full=$(echo $full | sed -e 's/\b'$first'\b/ /' -e 's/  / /g')
echo $first $full;
I want to only remove the whole word banana, and not make any change to orange.banana or apple.banana, so how can I get sed to ignore the dot as a delimiter?

You want "banana" that is preceded by beginning-of-string or a space, and followed by a space or end-of-string
$ sed -r 's/(^|[[:blank:]])'"$first"'([[:blank:]]|$)/ /g' <<< "$full"
apple.orange orange.banana apple.banana
Note the use of the -r option (for BSD sed, use -E), which enables extended regular expressions and lets us omit a lot of backslashes.
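As a rough sketch of the same idea wrapped in a reusable function: remove_word is a hypothetical helper name, and the sketch assumes the word contains no regex metacharacters. It adds two cleanup steps that are not part of the answer above (squeezing repeated spaces and trimming the ends).

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove_word WORD STRING
# Assumption: WORD contains no regex metacharacters (., *, [, / and so on).
remove_word() {
  local word=$1 text=$2
  # 1st command: drop WORD only when bounded by start-of-line/blank and blank/end-of-line.
  # 2nd command: squeeze runs of spaces into one.
  # 3rd command: trim a single leading or trailing space.
  sed -E -e "s/(^|[[:blank:]])${word}([[:blank:]]|\$)/ /g" \
         -e 's/ +/ /g' \
         -e 's/^ //; s/ $//' <<< "$text"
}

full="apple.orange orange.banana apple.banana banana"
first="banana"
result=$(remove_word "$first" "$full")
echo "$result"
```

Because a period is neither a blank nor the start of the line, orange.banana and apple.banana are left alone.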

Related

Get substring using either perl or sed

I can't seem to get a substring correctly.
declare BRANCH_NAME="bugfix/US3280841-something-duh";
# Trim it down to "US3280841"
TRIMMED=$(echo $BRANCH_NAME | sed -e 's/\(^.*\)\/[a-z0-9]\|[A-Z0-9]\+/\1/g')
That still returns bugfix/US3280841-something-duh.
If I try to use perl instead:
declare BRANCH_NAME="bugfix/US3280841-something-duh";
# Trim it down to "US3280841"
TRIMMED=$(echo $BRANCH_NAME | perl -nle 'm/^.*\/([a-z0-9]|[A-Z0-9])+/; print $1');
That outputs nothing.
What am I doing wrong?

Using bash parameter expansion only:
$: # don't use caps; see below.
$: declare branch="bugfix/US3280841-something-duh"
$: tmp="${branch##*/}"
$: echo "$tmp"
US3280841-something-duh
$: trimmed="${tmp%%-*}"
$: echo "$trimmed"
US3280841
Which means:
$: tmp="${branch##*/}"
$: trimmed="${tmp%%-*}"
does the job in two steps without spawning extra processes.
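To make the difference between the single and doubled operators visible, here is a small sketch using the same branch name; the variable names after_slash, short_cut and long_cut are only illustrative.

```shell
#!/usr/bin/env bash
branch="bugfix/US3280841-something-duh"

after_slash=${branch##*/}     # strip the longest leading match of "*/"
short_cut=${after_slash%-*}   # strip the shortest trailing match of "-*"
long_cut=${after_slash%%-*}   # strip the longest trailing match of "-*"

printf '%s\n' "$after_slash" "$short_cut" "$long_cut"
```

Only the %% form cuts at the first dash, which is why the answer uses it.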
In sed,
$: sed -E 's#^.*/([^/-]+)-.*$#\1#' <<< "$branch"
This says "after any or no characters followed by a slash, remember one or more that are not slashes or dashes, followed by a not-remembered dash and then any or no characters, then replace the whole input with the remembered part."
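The same capture-group idea can also be tried without sed, using bash's =~ operator, which takes an extended regular expression. This is a swapped-in technique, not part of the answer above; the variable names re and ticket are assumptions for illustration.

```shell
#!/usr/bin/env bash
branch="bugfix/US3280841-something-duh"

# A slash, then one or more characters that are neither slash nor dash
# (captured), then a dash. Keeping the regex in a variable avoids quoting surprises.
re='/([^/-]+)-'

ticket=""
if [[ $branch =~ $re ]]; then
  ticket=${BASH_REMATCH[1]}   # first parenthesised group
fi
echo "$ticket"
```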
Your original pattern was
's/\(^.*\)\/[a-z0-9]\|[A-Z0-9]\+/\1/g'
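This says "remember any number of anything followed by a slash, then a lowercase letter or a digit, then a literal pipe character (alternation with | is only guaranteed with -E), then a capital letter or digit, then a literal plus sign, and then replace it all with what you remembered." That is how a strict POSIX basic regular expression reads it. GNU sed accepts \| and \+ as extensions in basic mode, so there the pattern behaves differently, but it still does not isolate the ID.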
GNU's manual is your friend. I look stuff up all the time to make sure I'm doing it right. Sometimes it still takes me a few tries, lol.
An aside: try not to use all-capital variable names. By convention, those are reserved for variables that are special to the shell or the environment, like RANDOM or IFS.

You may use this sed:
sed -E 's~^.*/|-.*$~~g' <<< "$BRANCH_NAME"
US3280841
Or this awk:
awk -F '[/-]' '{print $2}' <<< "$BRANCH_NAME"
US3280841

sed 's:[^/]*/\([^-]*\)-.*:\1:'<<<"bugfix/US3280841-something-duh"

The Perl version just has the + in the wrong place. It should be inside the capture group:
TRIMMED=$(echo $BRANCH_NAME | perl -nle 'm/^.*\/([a-z0-9A-Z]+)/; print $1');

Just use a ^ before A-Z0-9
TRIMMED=$(echo $BRANCH_NAME | sed -e 's/\(^.*\)\/[a-z0-9]\|[^A-Z0-9]\+/\1/g')
in your sed case.
Alternatively and briefly, you can use
TRIMMED=$(echo $BRANCH_NAME | sed "s/[a-z\/\-]//g" )
too.

Type this in a shell terminal:
$ BRANCH_NAME="bugfix/US3280841-something-duh"
$ echo $BRANCH_NAME| perl -pe 's/.*\/(\w\w[0-9]+).+/\1/'
Use the s (substitute) command instead of m (match).
Perl's substitution syntax mirrors sed's, so the same expression also works with 'sed -E' in place of 'perl -pe', at least with GNU sed, which understands \w.

Another variant using Perl Regular Expression Character Classes (see perldoc perlrecharclass).
echo $BRANCH_NAME | perl -nE 'say m/^.*\/([[:alnum:]]+)/;'

How to remove special characters like a single quote from a string?

I tried using sed, but it did not work.
Basically, I have a string, say:
Input:
'http://www.google.com/photos'
Output required:
http://www.google.com
I tried using sed, but could not find a way to escape the '.
What I did was:
sed 's/\'//' | sed 's/photos//'
The sed for photos worked, but the one for ' didn't.
Please suggest a solution.

Escaping ' in sed is possible via a workaround:
sed 's/'"'"'//g'
# |^^^+--- bash string with the single quote inside
# | '--- return to sed string
# '------- leave sed string and go to bash
But for this job you should use tr:
tr -d "'"
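A quick sketch comparing the quoting workaround with the alternatives, using the string from the question; the variable names via_concat, via_dquote and via_tr are only illustrative.

```shell
#!/usr/bin/env bash
str="'http://www.google.com/photos'"

# 1. Close the sed string, add a double-quoted single quote, reopen the string.
via_concat=$(printf '%s\n' "$str" | sed 's/'"'"'//g')
# 2. Double-quote the whole sed script so the single quote needs no escaping.
via_dquote=$(printf '%s\n' "$str" | sed "s/'//g")
# 3. Delete the character with tr.
via_tr=$(printf '%s\n' "$str" | tr -d "'")

printf '%s\n' "$via_concat" "$via_dquote" "$via_tr"
```

All three should print the URL without the surrounding quotes; tr is the shortest when the only goal is deleting one character.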

Perl substitutions use a syntax nearly identical to sed's, and Perl is installed by default on almost every system and behaves the same way across machines (portability):
$ echo "'http://www.google.com/photos'" |perl -pe "s#\'##g;s#(.*//.*/)(.*$)#\1#g"
http://www.google.com/
Mind that this solution will keep only the domain name with http in front, discarding all words following http://www.google.com/
If you want to do it with sed , you can use sed "s/'//g" as advised by Wiktor Stribiżew in comments.
PS: I sometimes refer to special chars with their ascii hex code of the special char as advised by man ascii, which is \x27 for '
So for sed you can do it:
$ echo "'http://www.google.com/photos'" |sed -r "s#'##g; s#(.*//.*/)(.*$)#\1#g;"
http://www.google.com/
# sed "s#\x27##g" will also remove the single quote using its hex ASCII code (GNU sed).
$ echo "'http://www.google.com/photos'" |sed -r "s#'##g; s#(.*//.*)(/.*$)#\1#g;"
http://www.google.com #Without the last slash
If your string is stored in a variable, you can achieve above operations with pure bash, without the need of external tools like sed or perl like this:
$ a="'http://www.google.com/photos'" && a="${a:1:-1}" && echo "$a"
http://www.google.com/photos
# This removes 1st and last char of the variable , whatever this char is.
$ a="'http://www.google.com/photos'" && a="${a:1:-1}" && echo "${a%/*}"
http://www.google.com
# This deletes everything from the last slash / to the end of the string.
# If you need the last slash, you can add it to the echo manually, like echo "${a%/*}/" --> http://www.google.com/
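Since ${a:1:-1} drops the first and last character whatever they are (and negative lengths need a reasonably recent bash), a slightly more defensive sketch is to strip a quote only when one is actually there. The variable names url and base are assumptions for illustration.

```shell
#!/usr/bin/env bash
url="'http://www.google.com/photos'"

url=${url#\'}     # remove a leading single quote, if present
url=${url%\'}     # remove a trailing single quote, if present
base=${url%/*}    # remove the shortest trailing match of "/*" (the last path segment)

printf '%s\n' "$url" "$base"
```

If the input has no quotes, the first two expansions leave it unchanged.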

It's unclear whether the ' characters are actually around your string, but this should take care of it:
str="'http://www.google.com/photos'"
echo "$str" | sed s/\'//g | sed 's/\/photos//g'
Combined:
echo "$str" | sed -e "s/'//g" -e 's/\/photos//g'
Using tr:
echo "$str" | sed -e "s/\/photos//g" | tr -d \'
Result:
http://www.google.com
If the single quotes are not around your string it should work regardless.

sed - exchange words with delimiter

I'm trying to swap words around with sed, not replace them, which is all I keep finding on Google.
I don't know if it's the regex that I'm getting wrong. I did a search for everything before a char and everything after a char, so that's how I got the regex.
echo xxx,aaa | sed -r 's/[^,]*/[^,]*$/'
or
echo xxx/aaa | sed -r 's/[^\/]*/[^\/]*$/'
I am getting this output:
[^,]*$,aaa
or this:
[^/]*$/aaa
What am I doing wrong?

For the first sample, you should use:
echo xxx,aaa | sed 's/\([^,]*\),\([^,]*\)/\2,\1/'
For the second sample, simply use a character other than slash as the delimiter:
echo xxx/aaa | sed 's%\([^/]*\)/\([^/]*\)%\2/\1%'
You can also use \{1,\} to formally require one or more:
echo xxx,aaa | sed 's/\([^,]\{1,\}\),\([^,]\{1,\}\)/\2,\1/'
echo xxx/aaa | sed 's%\([^/]\{1,\}\)/\([^/]\{1,\}\)%\2/\1%'
This uses the most portable sed notation; it should work anywhere. With modern versions that support extended regular expressions (-r with GNU sed, -E with Mac OS X or BSD sed), you can lose some of the backslashes and use + in place of * which is more precisely what you're after (and parallels \{1,\} much more succinctly):
echo xxx,aaa | sed -E 's/([^,]+),([^,]+)/\2,\1/'
echo xxx/aaa | sed -E 's%([^/]+)/([^/]+)%\2/\1%'
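For comparison, a minimal pure-bash sketch that swaps the two fields around any single-character delimiter. swap_pair is a hypothetical helper name, not something from the answer above, and it assumes the input contains the delimiter exactly once.

```shell
#!/usr/bin/env bash
# Hypothetical helper: swap_pair DELIM "left<DELIM>right"
# Assumption: the delimiter occurs exactly once in the input.
swap_pair() {
  local delim=$1 pair=$2
  local left=${pair%%"$delim"*}   # everything before the first delimiter
  local right=${pair#*"$delim"}   # everything after the first delimiter
  printf '%s%s%s\n' "$right" "$delim" "$left"
}

swapped_comma=$(swap_pair , "xxx,aaa")
swapped_slash=$(swap_pair / "xxx/aaa")
printf '%s\n' "$swapped_comma" "$swapped_slash"
```

Quoting "$delim" inside the expansion keeps it literal, so a slash needs no escaping here.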

With sed it would be:
sed 's#\([[:alpha:]]\+\)/\([[:alpha:]]\+\)#\2/\1#' <<< 'xxx/aaa'
which is simpler to read if you use extended posix regexes with -r:
sed -r 's#([[:alpha:]]+)/([[:alpha:]]+)#\2/\1#' <<< 'xxx/aaa'
I'm using two sub-patterns ([[:alpha:]]+), each of which matches one or more letters, separated by a /. In the replacement part I reassemble them in reverse order: \2/\1. Please also note that I'm using # instead of / as the delimiter for the s command, since / is already the field delimiter in the input data. This saves us from having to escape the / in the regex.
Btw, you can also use awk for that, which is pretty easy to read:
awk -F'/' '{print $2,$1}' OFS='/' <<< 'xxx/aaa'

removing extra space

cat myfile.sql
DELIMITER $$
USE `AA4`$$
DROP TRIGGER /*!50032 IF EXISTS */ `AT_Card_INSERT_Trigger`$$
CREATE
/*!50017 DEFINER='root'@'%' */
TRIGGER `AT_Card_INSERT_Trigger` AFTER INSERT ON `card`
FOR EACH ROW BEGIN
The following works as expected and removes the definer clause.
sed -e 's/DEFINER=[^*]*\*/\*/' myfile.sql
But it does not work with spaces after or before equal to sign. For e.g. if I have a line like this...
/*!50017 DEFINER = 'root'@'%' */
Then I need a sed statement something like this...
sed -e 's/DEFINER\ =\ [^*]*\*/\*/' myfile.sql
But there are 2 more possibilities with spaces (no space before, no space after). It is also possible that there is more than 1 space before or after "=".
How do I handle it all?

You can just add " *" (a single space followed by *) to match zero or more spaces:
sed -e 's/DEFINER *= *[^*]*\*/\*/' myfile.sql
If you're on OSX, you could allow for more whitespace than just spaces using this:
# OSX
sed -E -e 's/DEFINER[[:space:]]*=[[:space:]]*[^*]*\*/\*/' myfile.sql
and the same will work with GNU sed(1) if you use -r in place of -E:
# GNU
sed -r -e 's/DEFINER[[:space:]]*=[[:space:]]*[^*]*\*/\*/' myfile.sql
GNU sed(1) also understands \s for whitespace:
# GNU
sed -r -e 's/DEFINER\s*=\s*[^*]*\*/\*/' myfile.sql
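A small sketch for checking the [[:space:]] variant against the spacing cases from the question. The four sample lines and the function name strip_definer are assumptions for illustration; -E is used here because both GNU and BSD sed accept it.

```shell
#!/usr/bin/env bash
strip_definer() {
  # [[:space:]]* tolerates zero or more spaces or tabs on either side of "=".
  sed -E 's/DEFINER[[:space:]]*=[[:space:]]*[^*]*\*/*/'
}

# Hypothetical sample lines covering: no spaces, both sides, after only, before only.
results=$(strip_definer <<'EOF'
/*!50017 DEFINER='root'@'%' */
/*!50017 DEFINER = 'root'@'%' */
/*!50017 DEFINER=   'root'@'%' */
/*!50017 DEFINER   ='root'@'%' */
EOF
)
printf '%s\n' "$results"
```

Each line should come out the same, with the DEFINER clause gone and the comment markers intact.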

You already know how to use the Kleene star; it works for spaces too:
sed -e 's/DEFINER[ ]*=[ ]*[^*]*\*/\*/' myfile.sql
That handles spaces only. If you need tabs as well, drop them into the [ ] sections too. If you have an extended sed that knows about \s you can get even fancier.

How can I insert a tab character with sed on OS X?

I have tried:
echo -e "egg\t \t\t salad" | sed -E 's/[[:blank:]]+/\t/g'
Which results in:
eggtsalad
And...
echo -e "egg\t \t\t salad" | sed -E 's/[[:blank:]]+/\\t/g'
Which results in:
egg\tsalad
What I would like:
egg<Tab>salad (the two words separated by a single tab)

Try: Ctrl+V and then press Tab.

Use ANSI-C style quoting: $'string'
sed $'s/foo/\t/'
So in your example, simply add a $:
echo -e "egg\t \t\t salad" | sed -E $'s/[[:blank:]]+/\t/g'

OSX's sed doesn't understand \t at all, since it's essentially the ancient 4.2BSD sed left over from 1982 or thereabouts. Use a literal tab (which in bash and vim is Ctrl+V, Tab), or install GNU sed to get a more reasonable sed.

Another option is to use $(printf '\t') to insert a tab, e.g.:
echo -e "egg\t \t\t salad" | sed -E "s/[[:blank:]]+/$(printf '\t')/g"
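A minimal sketch of the same approach with the tab kept in a variable, which makes the sed script easier to read and avoids relying on echo -e. The variable names tab and result are only illustrative.

```shell
#!/usr/bin/env bash
# Command substitution strips trailing newlines only, so the tab survives.
tab=$(printf '\t')

# printf builds the test input portably (tab, space, tab, tab, space between the words).
result=$(printf 'egg\t \t\t salad\n' | sed -E "s/[[:blank:]]+/${tab}/g")

# od -c shows the tab explicitly as \t.
printf '%s\n' "$result" | od -c
```

Because the shell expands ${tab} before sed runs, sed sees a literal tab character and never has to interpret \t itself.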

Try awk:
echo -e "egg\t \t\t salad" | awk '{gsub(/[[:blank:]]+/,"\t");print}'

A workaround for tab on OS X is to use "\    ", an escape char followed by four spaces.
If you are trying to find the last instance of a pattern, say a " })};", and insert a file on a new line after that pattern, your sed command on OS X would look like this:
sed -i '' -e $'/^\    \})};.*$/ r fileWithTextIWantToInsert' FileIWantToChange
The markup makes it unclear: the escape char must be followed by four spaces in order for sed to register a tab character on osx.
The same trick works if the pattern you want to find is preceded by two spaces, and I imagine it will work for finding a pattern preceded by any number of spaces as well.
