Remove special character at the beginning of words in Unix - regex

I need help removing special characters from the beginning of the word in a Unix shell.
For example, I have a list of words like this:
'aaa
'bbb
'ccc
'ddd
I want to remove the quotes and get output like this:
aaa
bbb
ccc
ddd
How can I remove only the quote at the beginning of each word?

You need to match at a word boundary, which GNU sed writes as \b.
So for example, if you were using sed and wanted to remove a single quote ' at the beginning of any word, you would use
sed "s/'\b//g"
Which means "replace any single quote immediately before a word boundary with an empty string".
If you only care about a quote at the very beginning of the line, you can use the anchor ^, which matches the start of a line (the g flag is not needed, since ^ can match only once per line):
sed "s/^'//"
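\b is a GNU sed extension. For a sed that lacks it, here is a minimal sketch under the assumption that a word starts either at the beginning of a line or right after whitespace; the function name strip_leading_quotes is illustrative only. Unlike s/'\b//g, it also keeps the apostrophe in a word such as don't, because that quote is neither at line start nor after whitespace.

```shell
# strip_leading_quotes (illustrative name): delete a single quote only
# where it begins a word, i.e. at line start or right after whitespace.
strip_leading_quotes() {
    sed -e "s/^'//" -e "s/\([[:space:]]\)'/\1/g"
}

printf '%s\n' "'aaa" "'bbb" "'ccc" "'ddd" | strip_leading_quotes
```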

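Try:
echo "'aaa 'bbb 'ccc 'ddd" | tr -d "'"
The words are wrapped in double quotes so that the shell passes the single quotes through to tr instead of interpreting them itself.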
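tr -d deletes every single quote on the line, including apostrophes inside words such as don't. If only a leading quote should go, a POSIX parameter expansion inside a read loop is a pure-shell alternative. This is a sketch assuming one word per line, as in the question's list; strip_first_quote is a hypothetical name.

```shell
# Remove one leading single quote from each input line; quotes elsewhere
# in the line (e.g. the apostrophe in don't) are kept.
strip_first_quote() {
    q="'"
    while IFS= read -r line; do
        # ${line#"$q"} removes the prefix "'" only if the line starts with it.
        printf '%s\n' "${line#"$q"}"
    done
}

printf '%s\n' "'aaa" "'bbb" "'ccc" "'ddd" | strip_first_quote
```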

Related

Regex to get match on entire string

How to match a word before a specific character using sed in bash?
In my scenario, I need to match the metric names in the entire string, which occur only before {.
Below is the string I am working on:
sum(rate(nginx_ingress_controller_request_duration_seconds_sum{namespace=\"$namespace\",ingress=~\"$ingress\"}[3m]))/sum(rate(nginx_ingress_controller_request_duration_seconds_count{namespace=\"$namespace\",ingress=~\"$ingress\"}[3m]))
The output I need is:
nginx_ingress_controller_request_duration_seconds_sum
nginx_ingress_controller_request_duration_seconds_count
I am not a regex expert and would be very thankful for any help.
With GNU grep:
grep -oP '\(\K[^({]+(?={)'
This prints each result on a separate line. \(\K checks for a ( character and then resets the start of the matched portion (since ( isn't needed in the output). [^({]+ matches one or more characters other than ( and {. (?={) makes sure that the matched portion is followed by a { character, without making the { part of the output.
If you know that the required portion can have only word characters, you can also use:
grep -oP '\w+(?={)'
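-P (PCRE) is a GNU grep option and is missing from some builds, such as BSD grep or BusyBox. As a hedged, portable alternative (a different technique, not a lookahead): match the name together with its { using an extended regex, then strip the brace with sed. -o is not POSIX either, but it is widely supported. The function name metric_names is hypothetical.

```shell
# metric_names (hypothetical): print each word that is directly followed by "{".
# ERE has no lookahead, so the "{" is matched and then removed by sed.
metric_names() {
    grep -oE '[[:alnum:]_]+[{]' | sed 's/[{]$//'
}

s='sum(rate(nginx_ingress_controller_request_duration_seconds_sum{namespace=\"$namespace\",ingress=~\"$ingress\"}[3m]))/sum(rate(nginx_ingress_controller_request_duration_seconds_count{namespace=\"$namespace\",ingress=~\"$ingress\"}[3m]))'
printf '%s\n' "$s" | metric_names
```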
This puts the two occurrences on the line onto separate lines in new_file
(with GNU sed):
sed 's/.*(\(.*\){.*(\(.*\){.*/\1\n\2/' your_file > new_file
Contents of new_file:
nginx_ingress_controller_request_duration_seconds_sum
nginx_ingress_controller_request_duration_seconds_count
It works as follows:
/.*(: Match everything from the start of the line up to a (
\(.*\): Remember the text between \( and \) (this is called a capture group)
{.*(: Match the { and everything after it up to a (
\(.*\): Remember a second piece of text using a second capture group
{.*: Match the rest of the line
/\1\n\2/: Output the two remembered groups with a newline (\n) between them
Edit
Another approach, which works for any number of occurrences, is to
insert newlines and a unique marker before and after the part of the string that
you're interested in, and then grep away the marker lines:
sed 's/(/BADLINES\n/g; s/{/\nBADLINES/g' your_file | grep -v BADLINES
The first part (sed 's/(/BADLINES\n/g; s/{/\nBADLINES/g' your_file) produces:
sumBADLINES
rateBADLINES
nginx_ingress_controller_request_duration_seconds_sum
BADLINESnamespace=\"$namespace\",ingress=~\"$ingress\"}[3m]))/sumBADLINES
rateBADLINES
nginx_ingress_controller_request_duration_seconds_count
BADLINESnamespace=\"$namespace\",ingress=~\"$ingress\"}[3m]))
and the | grep -v BADLINES produces:
nginx_ingress_controller_request_duration_seconds_sum
nginx_ingress_controller_request_duration_seconds_count
This might work for you (GNU sed):
sed -E '/^(\w+)\{/{s//\1\n/;P;D};s/^\w*\W/\n/;D' file
If the start of the line is a valid string followed by a {, replace the { with a newline, print and delete the first line of the pattern space, and repeat.
Otherwise, remove the leading word and the non-word character after it, and repeat until all strings are matched.
N.B. A valid string in this case is a word, i.e. a run of alphanumeric characters and underscores.
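The same consume-and-repeat idea can be sketched in awk with match(), which sets RSTART and RLENGTH; after each hit the scan resumes just past the {. This is an alternative to the sed loop above, not a translation of it, and it assumes names consist of ASCII letters, digits and underscores. words_before_brace is an illustrative name.

```shell
# words_before_brace (illustrative): print every [A-Za-z0-9_] word that is
# immediately followed by "{", scanning each line left to right.
words_before_brace() {
    awk '{
        rest = $0
        while (match(rest, /[A-Za-z0-9_]+[{]/)) {
            print substr(rest, RSTART, RLENGTH - 1)   # the word without "{"
            rest = substr(rest, RSTART + RLENGTH)     # continue after "{"
        }
    }'
}

printf '%s\n' 'sum(rate(a_sum{x="1"}[3m]))/sum(rate(a_count{x="1"}[3m]))' | words_before_brace
```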

How to match and replace string following the match via sed or awk

I have a file which I want to modify into a new file using cat.
So the file contains lines like:
name "myName"
place "xyz"
and so on.
I want these lines to be changed to
name "Jon"
place "paris"
I tried to do it like this, but it's not working:
cat originalFile | sed 's/^name\*/name "Jon"/' > tempFile
I tried using all sorts of special characters and it did not work. I am unable to recognize the space characters after name and then "myName".
You may match the rest of the line using .*, and you may match a space with a space, or [[:blank:]] or [[:space:]]:
sed 's/^\(name[[:space:]]\).*/\1"Jon"/;s/^\(place[[:space:]]\).*/\1"paris"/' originalFile > tempFile
Note that there are two substitute commands here, joined with a semicolon. The first part of each pattern is wrapped in a capturing group: [[:space:]] can match any whitespace character rather than one fixed literal, so the \1 backreference is used to put the matched keyword and whitespace back into the result.
Demo:
s='name "myName"
place "xyz"'
sed 's/^\(name[[:space:]]\).*/\1"Jon"/;s/^\(place[[:space:]]\).*/\1"paris"/' <<< "$s"
Output:
name "Jon"
place "paris"
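The capture-group idea above generalizes to any key. A minimal sketch, assuming a hypothetical helper set_value that takes the key and the new value as arguments; both are pasted into the sed script unescaped, so they must not contain sed metacharacters.

```shell
# set_value KEY VALUE (hypothetical helper): on lines that start with KEY and a
# whitespace character, replace the rest of the line with "VALUE".
# Assumes KEY and VALUE contain no sed metacharacters (/ \ & . * [ ^ $).
set_value() {
    sed "s/^\($1[[:space:]]\).*/\1\"$2\"/"
}

printf '%s\n' 'name "myName"' 'place "xyz"' | set_value name Jon | set_value place paris
```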
An awk alternative:
awk '$1=="name"{$0="name \"Jon\""} $1=="place"{$0="place \"paris\""} 1' originalFile
It also works when there are spaces before name or place.
This is not a regex match but a plain string comparison on the first field.
By default, awk splits fields on runs of whitespace (spaces, tabs and newlines), ignoring leading and trailing whitespace.
Append > tempFile to it once the results look correct.
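Because the awk version compares the first field as a plain string, it extends naturally to a lookup table. The sketch below is an illustration, not part of the original answer; the function name replace_values and the table contents (the question's example values) are assumptions. Like the one-liner above, it rebuilds matching lines, so any leading whitespace on them is dropped.

```shell
# replace_values (illustrative): if the first field is a key in the table,
# rebuild the line as: key "new value". Other lines pass through unchanged.
replace_values() {
    awk 'BEGIN { value["name"] = "Jon"; value["place"] = "paris" }
         ($1 in value) { $0 = $1 " \"" value[$1] "\"" }
         1'
}

printf '%s\n' 'name "myName"' 'place "xyz"' 'other "keep"' | replace_values
```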

Sed prepend when searching for colon

I need to search for each instance of a colon ":" and then prepend a string to the word before that colon.
Example:
some data here word:number
Desired outcome:
some data here prepend_word:number
I've tried:
sed "s/:/s/^/prepend_/g"
This adds prepend_ to the beginning of the line: prepend_some data here word:number
sed "s/:/prepend_&/g"
This adds prepend_ right before the colon: some data here wordprepend_:number
You need to use
sed 's/[^[:space:]]*:/prepend_&/g'
The [^[:space:]]*: pattern searches for 0 or more non-whitespace chars and a : after them, and the prepend_& replacement pattern will replace the match with itself (see &) and insert prepend_ before it.
See an online sed demo:
sed 's/[^[:space:]]*:/prepend_&/g' <<< "some data here word:number more:here"
Output: some data here prepend_word:number prepend_more:here.
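A field-based alternative in awk, sketched here under the assumption that words are whitespace-separated; the function name prepend_colon_fields and its prefix argument are hypothetical. One behavioral difference: assigning to a field makes awk rebuild the line with single spaces, so runs of whitespace are collapsed, which the sed version does not do.

```shell
# prepend_colon_fields PREFIX (hypothetical): prefix every whitespace-separated
# field that contains ":" with PREFIX, then print the (rebuilt) line.
prepend_colon_fields() {
    awk -v p="$1" '{ for (i = 1; i <= NF; i++) if ($i ~ /:/) $i = p $i; print }'
}

printf '%s\n' 'some data here word:number more:here' | prepend_colon_fields prepend_
```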

Regex for replacing space with comma-space, except at end of line

I am trying to convert input file content like this:
NP_418770.2: 257-296 344-415 503-543 556-592 642-707
YP_026226.4: 741-779 811-890 896-979 1043-1077
to this:
NP_418770.2: 257-296, 344-415, 503-543, 556-592, 642-707
YP_026226.4: 741-779, 811-890, 896-979, 1043-1077
i.e., replace a space with comma and space (excluding newline)
For that, I have tried:
perl -pi.bak -e "s/[^\S\n]+/, /g" input.txt
but it gives:
NP_418770.2:, 257-296, 344-415, 503-543, 556-592, 642-707
YP_026226.4:, 741-779, 811-890, 896-979, 1043-1077
How can I avoid the extra comma that appears after ":" (I want ":" followed by a single space) without writing another regex?
Thanks
Try a negative lookbehind: if the character before the whitespace is a colon (:), that whitespace is not matched.
s/(?<!:)[^\S\n]+/, /g
You can use a word boundary to skip the space that follows the colon, because there is no word boundary between : and a space: s/\b\h+/, /g
It can be done with perl:
perl -pe's/\b\h+/, /g' file
but also with sed:
sed -E 's/\b[ \t]+/, /g' file
Another approach uses the field separator:
perl -F'\b\h+' -lane 'BEGIN{$,=", "} print @F' file
or, with awk, append a comma to every field except the first and the last:
awk '{ for (i = 2; i < NF; i++) $i = $i ","; print }' file
You were close. That should do the trick:
s/(\d+-\d+)[^\S\n]+/$1, /g
The idea is to look at the parts that should get a comma after them, which follow the pattern "digits, a dash, more digits, then whitespace that is not a newline". The "whitespace that is not a newline" part is written as [^\S\n]+, which means "neither a non-whitespace character nor a newline" (\S is everything that \s is not, and the newline is excluded as well). If a line may have trailing whitespace, trim it first, otherwise the last range on the line also gets a comma. Note that s/\s+$// removes the newline too, so you have to add it back afterwards, whereas s/\h+$// leaves it in place.
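The same digits-dash-digits idea, combined with the trailing-whitespace trim, can be sketched with plain POSIX sed for systems without perl. Here [[:blank:]] (space or tab) stands in for [^\S\n], and add_commas is an illustrative name.

```shell
# add_commas (illustrative): trim trailing blanks, then put ", " after every
# digits-dash-digits range that is followed by blanks. POSIX BRE only.
add_commas() {
    sed -e 's/[[:blank:]]*$//' \
        -e 's/\([0-9][0-9]*-[0-9][0-9]*\)[[:blank:]][[:blank:]]*/\1, /g'
}

printf '%s\n' 'NP_418770.2: 257-296 344-415 503-543 556-592 642-707' \
              'YP_026226.4: 741-779 811-890 896-979 1043-1077   ' | add_commas
```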

REGEX - How to get rid of quotation marks at the start and end of a string

Have a bunch of strings
"pipe 1/4" square"
"3" bar"
"3/16" spanner
2" nozzle
spare tyre
I want to get rid of " marks from the start of the string and the end of the string with RegEx.
I've been trying on a simulator with the aid of some references but cannot seem to do it right.
Q: What is the RegEx that will do this with BASH?
Use the regex ^"|"$ to match a double quote at the start or the end of a line, and replace the match with an empty string.
Using sed.
sed 's/^"\|"$//g' <<< "$var"
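\| alternation in a basic regular expression is a GNU sed extension. A portable sketch splits the two anchors into two separate s commands; strip_outer_quotes is an illustrative name.

```shell
# strip_outer_quotes (illustrative): remove one double quote at the start
# and one at the end of each line; quotes in the middle are kept.
strip_outer_quotes() {
    sed -e 's/^"//' -e 's/"$//'
}

printf '%s\n' '"pipe 1/4" square"' '"3" bar"' '"3/16" spanner' '2" nozzle' 'spare tyre' |
    strip_outer_quotes
```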
Try the following command:
echo "$var" | sed -E 's/^(.*)"$/\1/'
This pipes the value of $var into sed. The -E option enables extended regular expressions, so (.*) is a capture group, and sed replaces the whole line with that group, which is available as \1. The output is therefore the input string minus its final quotation mark; a leading quote is left in place.
Using Bash parameter expansion:
a="\"pipe 1/4\" square\""
a="${a/#\"/}" && a="${a/%\"/}"
echo "$a"
Output:
pipe 1/4" square
Explanation:
${var/old/new} replaces old with new in $var.
A # before old anchors the match at the beginning of $var.
A % before old anchors the match at the end of $var.
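${var/#old/new} is a bash extension. In a plain POSIX sh, the prefix and suffix removal operators # and % give the same result; a minimal sketch using the question's first string:

```shell
# POSIX sh version: ${a#"$q"} strips one leading quote, ${a%"$q"} one trailing.
q='"'
a='"pipe 1/4" square"'
a=${a#"$q"}
a=${a%"$q"}
printf '%s\n' "$a"
```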