Negate Single Quote In Character Class - regex

I have the following code where I am trying to replace assign with always #(*) using SED.
Essentially I am trying to ignore the ' in a character class, but SED still seems to match it. I don't want sed to catch the line if the line contains a '. (My regex is much more complicated than this, but I want sed to catch lines which match my regex and ignore the lines which match my regex but contain a '.)
echo "assign sample_signal = '0;" | sed '/[^\x27]/ {s/assign/always #(*)/g}'
Result: always #(*) sample_signal = '0;

Try this please (GNU sed):
$ echo "assign sample_signal = '0;" | sed -n '/[\x27]/!{s/assign/always #(*)/g; p}'
$ echo "assign sample_signal = 0;" | sed -n '/[\x27]/!{s/assign/always #(*)/g; p}'
always #(*) sample_signal = 0;
You made two mistakes:
1. /[^\x27]/ matches any character that is not a ', and almost every line contains such a character, so the regex matches anyway.
2. You didn't use -n, which suppresses automatic printing, so the line is printed whether or not it matched or was substituted.
So I changed it to /[\x27]/!{...}, which means: when \x27 matches, do not execute the block {...}.
(In sed's terms, the block is executed only when the address does not match.)
With the -n switch and the p inside the block, lines containing ' are skipped entirely.
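The difference between negating inside the bracket expression and negating the address can be seen side by side. This is a minimal sketch; the variable names are only illustrative, and double quotes are used around the sed script so the single quote can be written directly.

```shell
line="assign sample_signal = '0;"

# /[^']/ asks: "is there at least one character that is not a quote?"
# The answer is yes, so the line is selected and printed.
selected=$(printf '%s\n' "$line" | sed -n "/[^']/p")

# /'/! asks: "does the line contain no quote at all?"
# The answer is no, so nothing is printed.
skipped=$(printf '%s\n' "$line" | sed -n "/'/! p")

printf 'bracket negation: [%s]\n' "$selected"
printf 'address negation: [%s]\n' "$skipped"
```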

Just use '\'' anywhere you need a ':
$ echo "f'oo" | sed 's/'\''o/X/'
fXo
$ echo "f'oo" | sed 's/[^'\'']o/X/'
f'X
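To see why '\'' works, it helps to print the string that sed actually receives: the first ' closes the quoted string, \' adds a literal quote, and the next ' reopens the quoted string. A small sketch, with a variable name chosen only for illustration:

```shell
# The shell joins the three pieces 's/[^' + ' + ']o/X/' into a single word.
script='s/[^'\'']o/X/'
printf 'sed receives: %s\n' "$script"

# Same substitution as in the answer above, driven by the variable.
result=$(printf '%s\n' "f'oo" | sed "$script")
printf '%s\n' "$result"
```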

You can enclose your sed command in double quotes and simply use /'/! to apply your command to lines not containing quotes:
echo "assign sample_signal = '0;" | sed "/'/! {s/assign/always #(*)/g;}"
If there is just one s command to apply, you can also omit the braces:
echo "assign sample_signal = '0;" | sed "/'/! s/assign/always #(*)/g"
As @EdMorton points out in the comments, enclosing the command in double quotes may have unwanted effects: you may need to escape dollar signs (\$) to avoid unwanted variable expansion in your pattern, and double-escape backslashes (\\\\ for one literal backslash).

Related

Get substring using either perl or sed

I can't seem to get a substring correctly.
declare BRANCH_NAME="bugfix/US3280841-something-duh";
# Trim it down to "US3280841"
TRIMMED=$(echo $BRANCH_NAME | sed -e 's/\(^.*\)\/[a-z0-9]\|[A-Z0-9]\+/\1/g')
That still returns bugfix/US3280841-something-duh.
If I try to use perl instead:
declare BRANCH_NAME="bugfix/US3280841-something-duh";
# Trim it down to "US3280841"
TRIMMED=$(echo $BRANCH_NAME | perl -nle 'm/^.*\/([a-z0-9]|[A-Z0-9])+/; print $1');
That outputs nothing.
What am I doing wrong?
Using bash parameter expansion only:
$: # don't use caps; see below.
$: declare branch="bugfix/US3280841-something-duh"
$: tmp="${branch##*/}"
$: echo "$tmp"
US3280841-something-duh
$: trimmed="${tmp%%-*}"
$: echo "$trimmed"
US3280841
Which means:
$: tmp="${branch##*/}"
$: trimmed="${tmp%%-*}"
does the job in two steps without spawning extra processes.
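If this is needed in more than one place, the two expansions can be wrapped in a small function. This is only a sketch: ticket_id is a hypothetical name, and it assumes the ticket is always the part between the last slash and the first dash after it.

```shell
# ticket_id is a hypothetical helper, not part of any tool.
ticket_id() {
  local tmp="${1##*/}"          # drop everything up to and including the last slash
  printf '%s\n' "${tmp%%-*}"    # drop everything from the first dash onward
}

ticket_id "bugfix/US3280841-something-duh"
ticket_id "feature/team/US42-x-y"
```

A name with no slash and no dash is returned unchanged, since neither pattern matches.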
In sed,
$: sed -E 's#^.*/([^/-]+)-.*$#\1#' <<< "$branch"
This says "after any or no characters followed by a slash, remember one or more that are not slashes or dashes, followed by a not-remembered dash and then any or no characters, then replace the whole input with the remembered part."
Your original pattern was
's/\(^.*\)\/[a-z0-9]\|[A-Z0-9]\+/\1/g'
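This says "remember any number of anything followed by a slash, then a lowercase letter or a digit, then a literal pipe character, then a capital letter or digit, then a literal plus sign, and then replace it all with what you remembered." Plain | and + only act as operators with -E. GNU sed does accept the backslashed \| and \+ as extensions in basic mode, but other seds may treat them literally; and even when they are operators, the alternation splits the whole pattern into two alternatives, \(^.*\)\/[a-z0-9] and [A-Z0-9]\+, which is not what you intended.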
GNU's manual is your friend. I look stuff up all the time to make sure I'm doing it right. Sometimes it still takes me a few tries, lol.
An aside - try not to use all-capital variable names. That is a convention that indicates it's special to the OS, like RANDOM or IFS.
You may use this sed:
sed -E 's~^.*/|-.*$~~g' <<< "$BRANCH_NAME"
US3280841
Or this awk:
awk -F '[/-]' '{print $2}' <<< "$BRANCH_NAME"
US3280841
sed 's:[^/]*/\([^-]*\)-.*:\1:'<<<"bugfix/US3280841-something-duh"
Perl version just has + in wrong place. It should be inside the capture brackets:
TRIMMED=$(echo $BRANCH_NAME | perl -nle 'm/^.*\/([a-z0-9A-Z]+)/; print $1');
Just use a ^ before A-Z0-9
TRIMMED=$(echo $BRANCH_NAME | sed -e 's/\(^.*\)\/[a-z0-9]\|[^A-Z0-9]\+/\1/g')
in your sed case.
Alternatively and briefly, you can use
TRIMMED=$(echo $BRANCH_NAME | sed "s/[a-z\/\-]//g" )
too.
Type this in a shell terminal:
$ BRANCH_NAME="bugfix/US3280841-something-duh"
$ echo $BRANCH_NAME| perl -pe 's/.*\/(\w\w[0-9]+).+/\1/'
This uses the s (substitute) command instead of m (match).
Perl's s/// syntax closely follows sed's, so with GNU sed the same expression should also work with sed -E in place of perl -pe.
Another variant using Perl Regular Expression Character Classes (see perldoc perlrecharclass).
echo $BRANCH_NAME | perl -nE 'say m/^.*\/([[:alnum:]]+)/;'

Regex EOL replace with perl is giving unexpected results

Why is there a dollar sign at the starting of line 2 and line 3?
➜ echo -e "hello\nworld" | perl -pe 's/$/\$/g'
hello$
$world$
$%
Above, I am trying to add a dollar sign at the end of each line, but somehow it's appending a dollar sign at the beginning too. It does that when global flag is enabled. But when I remove the global flag, it works fine:
➜ echo -e "hello\nworld" | perl -pe 's/$/\$/'
hello$
world$
Can anyone explain what's happening? Maybe it has something to do with '\r\n' characters?
EDIT : Adding the lookbehind case
It's not just breaking in this case, but in other cases as well. Consider the following:
➜ echo -e "A\nB\nC\nD" | perl -pe 's/(?<!A)$/\$/'
A
$B$
C$
D$
Above, I want to mark rows which don't end in "A" with $.
The extra dollar sign in line 2 shouldn't be there. I'm not even using global flag.
SOLUTION : Okay got it now. The solution for second one is like this (for explanation, refer to Wiktor Stribiżew's answer)
➜ echo -e "A\nB\nC\nD" | perl -pe 's/(?<!A|\n)$/\$/'
A
B$
C$
D$
But beware: if the alternatives inside the lookbehind have different lengths, Perl may throw "Variable length lookbehind not implemented in regex". For example:
➜ echo -e "AA\nBB\nCC\nDD" | perl -pe 's/(?<!AA|\n)$/\$/'
Variable length lookbehind not implemented in regex m/(?<!AA|\n)$/ at -e line 1.
To solve this, add the appropriate number of . before the newline so that both alternatives have the same length.
➜ echo -e "AA\nBB\nCC\nDD" | perl -pe 's/(?<!AA|.\n)$/\$/'
AA
BB$
CC$
DD$
The point is that $ is a zero-width assertion and it can match before a final newline. Perl reads a line with a trailing \n, so $ matches twice: before and after that.
Your string basically goes to Perl as two lines:
hello\n
world\n
And the $ can match both before a final newline and at the very end of the string. Thus, there are two matches in both lines ("strings" in this context).
If you want to match the very end of string, use \z:
perl -pe 's/\z/\$/g'
\z only matches at the very end of the string. It is unlikely to be what you want here, though: the $ is inserted after each line's trailing newline, so it shows up at the start of the second and subsequent lines and once more as a final line of its own.
To only insert $ before the last \n and stop, use your perl -pe 's/$/\$/', with no g modifier.
If you really want to use it with the global replace, you can use the following command:
echo -e "hello\nworld" | perl -pe 's/^(.*)$/\1\$/g'
hello$
world$
or without back-references you can use:
echo -e "hello\nworld" | perl -pe 's/\n$/\$\n/g'
hello$
world$
You might need to replace \n with \r\n if you are processing a file from Windows, or just use dos2unix to remove the Windows \r end-of-line characters.

Replace slash in Bash

Let's suppose I have this variable:
DATE="04/Jun/2014:15:54:26"
Therein I need to replace / with \/ in order to get the string:
"04\/Jun\/2014:15:54:26"
I tried tr as follows:
echo "04/Jun/2014:15:54:26" | tr '/' '\\/'
But this results in: "04\Jun\2014:15:54:26".
It does not satisfy me. Can anyone help?
There is no need for echo, a pipe, and sed.
A simple parameter expansion is enough and faster:
echo ${DATE//\//\\/}
#> 04\/Jun\/2014:15:54:26
Use sed for substitutions:
sed 's#/#\\/#g' < filename.txt > newfilename.txt
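The delimiter is usually "/", but here "#" is used instead; any character works as long as it is used consistently, and choosing one that does not appear in the pattern avoids escaping.
I am writing this on a Windows PC, so I hope it is right; you may have to escape the slashes with another slash.
sed explained: -e adds a script (expression) to run; it does not edit the file in place. To edit in place, use -i; with a suffix (for example -i.bak) a backup of the original is kept automatically.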
sed -e s/STRING_TO_REPLACE/STRING_TO_REPLACE_IT/g index.html
here you go:
kent$ echo "04/Jun/2014:15:54:26"|sed 's#/#\\/#g'
04\/Jun\/2014:15:54:26
Your tr line was not correct; you may have misunderstood what tr does. tr 'abc' 'xyz' changes a->x, b->y, c->z; it does not replace the whole string abc with xyz.
You can also escape the slashes, with a slightly less readable solution than with hashes:
echo "04/Jun/2014:15:54:26" | sed 's/\//\\\//g'
This has not been said in other answers so I thought I'd add some clarifications:
tr uses two sets of characters for replacement, and the characters from the first set are replaced with those from the second set in a one-to-one correspondence. The manpage states that
SET2 is extended to length of SET1 by repeating its last character as necessary. Excess characters of SET2 are ignored.
Example:
echo abca | tr ab de # produces decd
echo abca | tr a de # produces dbcd, 'e' is ignored
echo abca | tr ab d # produces ddcd, 'd' is interpreted as a replacement for 'b' too
When using sed for substitutions, you can use a character other than '/' as the delimiter, which will make your expression clearer (I like to use ':'; @n34_panda proposed '#' in their answer). Don't forget the g modifier to replace all occurrences: sed 's:/:\\/:g' with quotes, or sed s:/:\\\\/:g without (backslashes have to be escaped twice).
Finally, your shortest solution will probably be @Luc-Olivier's answer, involving substitution, in the following form (don't forget to escape forward slashes too when they are part of the expected pattern):
echo ${variable/expected/replacement} # will replace one occurrence
echo ${variable//expected/replacement} # will replace all occurrences
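Applied to the date string from the question, the two forms look like this. A short sketch, assuming bash (this substitution syntax is not POSIX sh); the variable names are illustrative.

```shell
#!/usr/bin/env bash
date_str="04/Jun/2014:15:54:26"

# Pattern \/ is an escaped slash; replacement \\/ is a backslash followed by a slash.
first_only=${date_str/\//\\/}    # single slash after the name: first occurrence only
all=${date_str//\//\\/}          # double slash after the name: every occurrence

printf '%s\n' "$first_only" "$all"
```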

Change CSV Delimiter with sed

I've got a CSV file that looks like:
1,3,"3,5",4,"5,5"
Now I want to change all the "," not within quotes to ";" with sed, so it looks like this:
1;3;"3,5";4;"5,5"
But I can't find a pattern that works.
If you are expecting only numbers then the following expression will work
sed -e 's/,/;/g' -e 's/\("[0-9][0-9]*\);\([0-9][0-9]*"\)/\1,\2/g'
e.g.
$ echo '1,3,"3,5",4,"5,5"' | sed -e 's/,/;/g' -e 's/\("[0-9][0-9]*\);\([0-9][0-9]*"\)/\1,\2/g'
1;3;"3,5";4;"5,5"
You can't simply replace [0-9][0-9]* with .* to handle any quoted field containing a comma: .* is too greedy and matches too much. So you have to use [a-z0-9]* instead:
$ echo '1,3,"3,5",4,"5,5",",6","4,",7,"a,b",c' | sed -e 's/,/;/g' -e 's/\("[a-z0-9]*\);\([a-z0-9]*"\)/\1,\2/g'
1;3;"3,5";4;"5,5";",6";"4,";7;"a,b";c
It also has the advantage over the first solution of being simple to understand. We just replace every , by ; and then correct every ; in quotes back to a ,
You could try something like this:
echo '1,3,"3,5",4,"5,5"' | sed -r 's|("[^"]*),([^"]*")|\1\x1\2|g;s|,|;|g;s|\x1|,|g'
which replaces all commas within quotes with \x1 char, then replaces all commas left with semicolons, and then replaces \x1 chars back to commas. This might work, given the file is correctly formed, there're initially no \x1 chars in it and there're no situations where there is a double quote inside double quotes, like "a\"b".
Using gawk
gawk '{$1=$1}1' FPAT="([^,]+)|(\"[^\"]+\")" OFS=';' filename
Test:
[jaypal:~/Temp] cat filename
1,3,"3,5",4,"5,5"
[jaypal:~/Temp] gawk '{$1=$1}1' FPAT='([^,]+)|(\"[^\"]+\")' OFS=';' filename
1;3;"3,5";4;"5,5"
This might work for you:
echo '1,3,"3,5",4,"5,5"' |
sed 's/\("[^",]*\),\([^"]*"\)/\1\n\2/g;y/,/;/;s/\n/,/g'
1;3;"3,5";4;"5,5"
Here's an alternative solution which is longer but more flexible:
echo '1,3,"3,5",4,"5,5"' |
sed 's/^/\n/;:a;s/\n\([^,"]\|"[^"]*"\)/\1\n/;ta;s/\n,/;\n/;ta;s/\n//'
1;3;"3,5";4;"5,5"
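The answers above all rely on regex passes. Another option, not taken from any of them, is to scan each line character by character and track whether the current position is inside quotes. The sketch below does this with plain awk called from the shell; it assumes quotes are balanced on every line and that no field spans several lines.

```shell
csv_line='1,3,"3,5",4,"a,b,c"'

converted=$(printf '%s\n' "$csv_line" | awk '{
  out = ""; in_quotes = 0
  for (i = 1; i <= length($0); i++) {
    c = substr($0, i, 1)
    if (c == "\"")                    # a quote toggles the inside/outside state
      in_quotes = !in_quotes
    else if (c == "," && !in_quotes)  # only commas outside quotes are delimiters
      c = ";"
    out = out c
  }
  print out
}')

printf '%s\n' "$converted"
```

Unlike the single-pass regex approaches, this also handles a quoted field that contains several commas.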

Bash- How to convert non-alphanumerical character to "_"

I am trying to store user input in a variable and clean that variable in order to keep only alphanumeric characters plus a few others (I mean [a-zA-Z0-9-_]).
I tried using this, but it isn't exhaustive:
SERVICE_NAME=$(echo $SERVICE_NAME | tr A-Z a-z | tr ' ' _ | tr \' _ | tr \" _)
Do you have some help for this?
Bash's string substitution is a fine thing: ${var//pat/rep}
val='Foo$%!*#BAR###baZ'
echo ${val//[^a-zA-Z_-]/_}
Foo_____BAR___baZ
A small explanation: The slash introduces a search/replace, a little like in sed (where it just delimits patterns). But you use a single slash for one replacement:
val='Foo$%!*#BAR###baZ'
echo ${val/[^a-zA-Z_-]/_}
Foo_%!*#BAR###baZ
Two slashes // mean replace all. It is an uncommon notation, but it has some logic: multiple slashes mean multiple replacements (please excuse my poor English).
Note that the $ is separated from the variable name by the brace. This only works on variables, so it is hard to modify a literal constant this way (which would be nice for testing). Modifying $1 isn't a no-brainer either, afaik.
$ echo 'asd!#QCW##D' | tr A-Z a-z | sed -e 's/[^a-zA-Z0-9\-]/_/g'
asd__qcw__d
I would use sed for this and use the ^ (not) operator in your set of valid characters and replace everything else with an underscore. The above shows the syntax with the output.
And, as a bonus, if you want to replace a run of invalid characters with one underscore, just add + to your regular expression (and use the -r switch to make sed use extended regular expressions):
$ echo 'asd!#QCW##D' | tr A-Z a-z | sed -r 's/[^a-zA-Z0-9\-]+/_/g'
asd_qcw_d
I believe it can all be done in 1 single sed command like this:
echo 'Foo$%!*#BAR###baZ' | sed -e 's/[A-Z]/\L&/g' -e 's/[^a-z0-9\-]/_/g'
OUTPUT
foo_____bar___baz
perl way:
perl -ple 's/[^\w\-]/_/g'
pure bash way
a='foo-BAR_123,.:goo'
echo ${a//[^[:alnum:]-]/_}
produces:
foo-BAR_123___goo
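One more variant, not mentioned in the answers above, is tr with the -c (complement) option, which replaces every character that is not in the given set. A sketch under the assumption that the input is plain ASCII; printf '%s' is used so that no trailing newline is fed to tr and turned into an underscore.

```shell
raw='Foo$%!*#BAR###baZ'

# Replace each character outside the allowed set with an underscore.
each=$(printf '%s' "$raw" | tr -c 'A-Za-z0-9_-' '_')

# -s additionally squeezes every run of underscores into a single one.
squeezed=$(printf '%s' "$raw" | tr -cs 'A-Za-z0-9_-' '_')

# Lowercase first, as in the question, then clean and squeeze.
clean=$(printf '%s' "$raw" | tr 'A-Z' 'a-z' | tr -cs 'a-z0-9_-' '_')

printf '%s\n' "$each" "$squeezed" "$clean"
```

Note that -s also collapses runs of underscores that were already present in the input.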