Extract strings that lie outside the brackets using sed or awk

I have a string of the format abc(something that should be removed) is bad(another thing to remove), that is, a string in which some words are inside parentheses and some are not. I want to extract the words that are not parenthesized. For the example above, the output should be abc is bad.

You could try the following sed command:
sed 's/([^()]*)//g' file
Example:
$ cat file
abc(something that should be removed) is bad(another thing to remove)
$ sed 's/([^()]*)//g' file
abc is bad
By default sed uses BRE (Basic Regular Expressions), in which an unescaped ( or ) matches a literal parenthesis, so no escaping is needed here.
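As a quick check of the BRE point above, here is a small sketch that runs the same substitution in BRE, in ERE (-E), and in awk, which the question also mentions. The awk variant is an addition, not part of the original answer; awk regexes are ERE, so the parentheses are escaped there. The variable names are illustrative.

```shell
line='abc(something that should be removed) is bad(another thing to remove)'

# BRE (default): unescaped ( and ) are literal characters.
bre=$(printf '%s\n' "$line" | sed 's/([^()]*)//g')

# ERE (-E): ( and ) are grouping operators, so literals need a backslash.
ere=$(printf '%s\n' "$line" | sed -E 's/\([^()]*\)//g')

# awk regexes are ERE as well; gsub edits $0 in place before it is printed.
awk_out=$(printf '%s\n' "$line" | awk '{ gsub(/\([^()]*\)/, ""); print }')

printf '%s\n' "$bre" "$ere" "$awk_out"
```

All three print abc is bad. Because [^()]* cannot cross another parenthesis, a single pass removes only the innermost group of nested parentheses.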

Related

Bash script to enclose words in single quotes

I'm trying to write a bash script to enclose words contained in a file with single quotes.
Word - Hello089
Result - 'Hello089',
I tried the following regex but it doesn't work. This works in Notepad++ with find and replace. I'm not sure how to tweak it to make it work in bash scripting.
sed "s/(.+)/'$1',/g" file.txt > result.txt

Replacement backreferences (also called placeholders) are written with \n syntax (\1, \2, and so on), not $n, which is the Perl-style syntax.
Note that you do not need groups here, though, since you want to wrap each whole line in single quotation marks. This is also why you do not need the g flag: it is only necessary when you expect multiple matches on the same line.
You can use the following POSIX BRE and ERE (the one with -E) solutions:
sed "s/..*/'&',/" file.txt > result.txt
sed -E "s/.+/'&',/" file.txt > result.txt
In the POSIX BRE (first) solution, ..* matches any char and then any 0 or more chars (thus emulating .+ common PCRE pattern). The POSIX ERE (second) solution uses .+ pattern to do the same. The & in the right-hand side is used to insert the whole match (aka \0). Certainly, you may enclose the whole match with capturing parentheses and then use \1, but that is redundant:
sed "s/\(..*\)/'\1',/" file.txt > result.txt
sed -E "s/(.+)/'\1',/" file.txt > result.txt
Note the escaping: in POSIX BRE, capturing parentheses must be written as \( and \).
Demo:
s="Hello089";
sed "s/..*/'&',/" <<< "$s"
# => 'Hello089',
sed -E "s/.+/'&',/" <<< "$s"
# => 'Hello089',

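$1 is expanded by the shell before sed sees it, but it's the wrong back reference anyway. You need \1. You also need to escape the parentheses that define the capture group. Because the sed script is enclosed in double quotes, the shell processes backslashes before sed sees them, so doubling them is the safe form. (The \1 below survives undoubled because the shell leaves a backslash alone when the next character is not special inside double quotes.)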
$ echo "Hello089" | sed "s/\\(.*\\)/'\1',/g"
'Hello089',
(I don't recall if there is a way to specify single quotes using an ASCII code instead of a literal ', which would allow you to use single quotes around the script.)
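On the question in the parenthetical above: one portable way to keep the sed script in single quotes is the shell idiom '\'', which ends the quoted string, adds an escaped single quote, and starts a new quoted string. A minimal sketch (GNU sed is also understood to accept \x27 for the same character, but that is an extension and is not used here):

```shell
# '\'' = end the single-quoted string, emit a literal ', start a new one.
# After the shell has joined the pieces, sed receives: s/..*/'&',/
quoted=$(printf '%s\n' 'Hello089' | sed 's/..*/'\''&'\'',/')
printf '%s\n' "$quoted"   # 'Hello089',
```

With the script in single quotes, the shell does no backslash or $ processing at all, so nothing needs doubling.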

grep regex with backtick matches all lines

$ cat file
anna
amma
kklks
ksklaii
$ grep '\`' file
anna
amma
kklks
ksklaii
Why? How does that match work?

This appears to be a GNU extension for regular expressions. The backtick ('\`') anchor matches the very start of a subject string, which explains why it is matching all lines. OS X apparently doesn't implement the GNU extensions, which would explain why your example doesn't match any lines there. See http://www.regular-expressions.info/gnu.html
If you want to match an actual backtick when the GNU extensions are in effect, this works for me:
grep '[`]' file

twm's answer provides the crucial pointer, but note that it is the sequence \`, not ` by itself that acts as the start-of-input anchor in GNU regexes.
Thus, to match a literal backtick in a regex specified as a single-quoted shell string, you don't need any escaping at all, neither with GNU grep nor with BSD/macOS grep:
$ { echo 'ab'; echo 'c`d'; } | grep '`'
c`d
When using double-quoted shell strings - which you should avoid for regexes, for reasons that will become obvious - things get more complicated, because you then must escape the ` for the shell's sake in order to pass it through as a literal to grep:
$ { echo 'ab'; echo 'c`d'; } | grep "\`"
c`d
Note that, after the shell has parsed the "..." string, grep still only sees `.
To recreate the original command with a double-quoted string with GNU grep:
$ { echo 'ab'; echo 'c`d'; } | grep "\\\`" # !! BOTH \ and ` need \-escaping
ab
c`d
Again, after the shell's string parsing, grep sees just \`, which to GNU grep is the start-of-the-input anchor, so all input lines match.
Also note that since grep processes input line by line, \` has the same effect as ^ the start-of-a-line anchor; with multi-line input, however - such as if you used grep -z to read all lines at once - \` only matches the very start of the whole string.
To BSD/macOS grep, \` simply escapes a literal `, so it only matches input lines that contain that character.
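To sidestep the anchor question entirely, the backtick can be given as a bare character, wrapped in a bracket expression, or passed as a fixed string with -F. A small sketch of all three, which should behave the same with GNU and BSD/macOS grep (the variable names are illustrative):

```shell
input=$(printf 'ab\nc`d\n')

# A bare backtick in a single-quoted pattern is an ordinary character.
plain=$(printf '%s\n' "$input" | grep '`')

# Inside a bracket expression the backtick is always a literal character.
bracket=$(printf '%s\n' "$input" | grep '[`]')

# -F treats the pattern as a fixed string, so no regex syntax applies at all.
fixed=$(printf '%s\n' "$input" | grep -F '`')

printf '%s\n' "$plain" "$bracket" "$fixed"
```

Each of the three commands prints only the line c`d.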

Matching strings with grep and \A regexp

Given the string in some file:
hel string1
hell string2
hello string3
I'd like to capture just hel using cat file | grep 'regexp here'
I tried doing a bunch of regexp but none seem to work. What makes the most sense is: grep -E '\Ahel' but that doesn't seem to work. It works on http://rubular.com/ however. Any ideas why that isn't working with grep?
Also, when pasting the above string with a tab space before each line, the \A does not seem to work on rubular. I thought \A matches beginning of string, and that doesn't matter whatever characters was before that. Why did \A stop matching when there was a space before the string?

ERE (-E) does not support \A for indicating start of match. Try ^ instead.
Use -m 1 to stop grepping after the first match in each file.
If you want grep to print only the matched string (not the entire line), use -o.
Use -h if you want to suppress the printing of filenames in the grep output.
Example:
grep -Eohm 1 "^hel" *.log
If you need to enforce only outputting if the search string is on the first line of the file, you could use head:
head -qn 1 *.log | grep -Eoh "^hel"

ERE doesn't support \A but PCRE does hence grep -P can be used with same regex (if available):
grep -P '\Ahel\b' file
hel string1
It is also important to add the word boundary \b so that hell and hello do not match.
Alternatively, with ERE (GNU grep accepts \b there as an extension) you can use:
grep -E '^hel\b' file
hel string1
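Since \b is not part of POSIX ERE, the word boundary can also be spelled out with a POSIX character class where \b is unavailable. A minimal sketch, assuming the three sample lines from the question:

```shell
sample=$(printf 'hel string1\nhell string2\nhello string3\n')

# ^hel must be followed by whitespace or by the end of the line,
# which rules out "hell" and "hello" without relying on \b.
match=$(printf '%s\n' "$sample" | grep -E '^hel([[:space:]]|$)')
printf '%s\n' "$match"   # hel string1
```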

I thought \A matches beginning of string, and that doesn't matter whatever characters was before that. Why did \A stop matching when there was a space before the string?
\A matches only the very beginning of the text; it does not match at the start of each line when the text contains several lines. With a tab before the string, the text no longer begins with hel, so \Ahel cannot match.
In any case, grep's BRE and ERE do not support \A, so use ^ instead; unlike \A, ^ matches at the start of each line in multi-line mode.

Using awk
awk '$1=="hel"' file
PS you do not need to cat file to grep, use grep 'regexp here' file
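If only the word itself is wanted rather than the whole line, the same awk comparison can print just the first field. A sketch using the sample lines from the question, with a leading tab added to the first line to mirror the tab case the question asks about:

```shell
sample=$(printf '\thel string1\nhell string2\nhello string3\n')

# Exact string comparison on field 1: no anchors or word boundaries needed,
# and awk's default field splitting ignores leading blanks and tabs.
word=$(printf '%s\n' "$sample" | awk '$1 == "hel" { print $1 }')
printf '%s\n' "$word"   # hel
```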

Find a pattern and replace the whole line & find a pattern and insert after

Question 1:
Pattern:
test_$(whoami)
Variable:
var1=$(pwd)
I want to find the pattern and replace the whole line with var1
sed -i "s/.*test_$(whoami).*/$var1/" test.txt
It gives me sed: -e expression #1, char 28: unknown option to `s'
Question 2.
Pattern:
#####Insert here#####
Content to be insert: include $var1/file_$(whoami).txt
I want to find the line with the pattern(Fully match), and insert the content one line after
sed -i "s/#####Insert here#####/include $var1/file_$(whoami).txt" test.txt
Doesn't work either
Can someone help?

Re Question 1. Use a different regex delimiter:
sed -i.bak "s~^.*test_$(whoami).*$~$var1~" test.txt
since $var1 can contain /
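A small sketch of the same idea on inline sample data, so it can be tried without touching a real file. The user name and path are hypothetical stand-ins for $(whoami) and $(pwd), and the input goes through a pipe instead of -i:

```shell
user="alice"                 # hypothetical stand-in for $(whoami)
var1="/home/alice/project"   # hypothetical stand-in for $(pwd)

# With ~ as the delimiter, the slashes in $var1 are ordinary characters.
replaced=$(printf 'first line\nprefix test_alice suffix\nlast line\n' |
  sed "s~.*test_${user}.*~${var1}~")
printf '%s\n' "$replaced"
```

The second line becomes /home/alice/project while the other two lines pass through unchanged. This still breaks if the value itself contains ~ or &.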

Question 1.
It seems $var1 contains a character that sed interprets as the delimiter, namely '/'.
In a substitute command, whatever follows the third delimiter is parsed as flags (such as g, p, or an occurrence number), so the leftover text triggers the "unknown option to `s'" error.
Example, if:
var1="~/myDirectory"
Then this produces a sed command with too many delimiters:
sed -i "s/.*test_$(whoami).*/~/myDirectory/" test.txt
You should use a different delimiter character such as ~, #, !, ?, | ... one which is present in neither your regexp nor your replacement text.
Sed will automatically recognize the delimiter character after the substitute command and enable you to use the '/' character in your regexp:
sed "s#~/toto#~/tata#"
If you have difficulties finding a character that is not present in your regexp, you can use a non-printable character which is unlikely to exist in your pattern. For example, if your shell is bash:
$ echo '/~#' | sed s$'\001''/~#'$'\001''!?\&'$'\001''g'
In this example, bash replaces $'\001' with the character that has the octal value 001 - in ASCII it's the SOH character (start of heading).
Since such characters are control/non-printable characters, it's doubtful that they will exist in the pattern. Unless, that is, you are doing something weird like modifying binary files - or Unicode files without the proper locale settings.
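A different technique from the ones above, shown only as a sketch: instead of hunting for an unused delimiter, escape the delimiter and & inside the replacement value before interpolating it. The sample values are hypothetical, and backslashes or newlines in the value are not handled here.

```shell
user="alice"             # hypothetical stand-in for $(whoami)
var1="/home/alice/a&b"   # hypothetical value containing both / and &

# Prefix every / and & in the value with a backslash.
escaped=$(printf '%s\n' "$var1" | sed 's|[/&]|\\&|g')

# The escaped value is now safe to use with the usual / delimiter.
result=$(printf 'keep\nxx test_alice yy\n' |
  sed "s/.*test_${user}.*/${escaped}/")
printf '%s\n' "$result"
```

The second output line is /home/alice/a&b; without the escaping step, the & would be replaced by the whole matched line and the slashes would end the command early.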
Question 2.
You may be looking for sed's append function ('a'):
sed -i "/#####Insert here#####/ a include $var1/file_$(whoami).txt" test.txt
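The one-line "a text" form above is a GNU sed convenience. Below is a sketch of the portable spelling (a\ followed by the text on its own line), with ^ and $ added so that only a line consisting of exactly the marker matches, as the question asks. The sample values are hypothetical and the input goes through a pipe instead of -i.

```shell
user="alice"                 # hypothetical stand-in for $(whoami)
var1="/home/alice/project"   # hypothetical stand-in for $(pwd)

# Inside double quotes, \$ yields a literal $ (the end-of-line anchor)
# and \\ yields the single backslash that the a command expects.
appended=$(printf 'before\n#####Insert here#####\nafter\n' |
  sed "/^#####Insert here#####\$/a\\
include ${var1}/file_${user}.txt")
printf '%s\n' "$appended"
```

The include line appears directly after the marker line; lines that merely contain the marker alongside other text are left alone.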

using sed to replace ^[(s3B with blank space

I'm trying to use sed with perl to replace ^[(s3B with an empty string in several files.
s/^[(s3B// isn't working though, so I'm wondering what else I could try.

You need to escape the special characters:
$ echo "^[(s3B AAA ^[(s3B"|sed 's/\^\[[(]s3B//g'
AAA
$ echo "^[(s3B AAA ^[(s3B" >file.txt
$ perl -p -i -e 's/\^\[[(]s3B//g' file.txt
$ cat file.txt
AAA

The problem is that there are several characters that have a special meaning in regular expressions. ^ is a start-of-line anchor, [ opens a character class, and ( opens a capture.
You can escape all non-alphanumerics in a Perl string by preceding it with \Q, so you can safely use
s/\Q^[(s3B//
which is equivalent to, and more readable than
s/\^\[\(s3B//

If you're dealing with ANSI sequences (xterm color sequences, escape sequences), then ^[ is not '^' followed by '[' but rather an unprintable character ESC, ASCII code 0x1B.
To put that character into a sed expression you need to use \x1B in GNU sed, or see http://www.cyberciti.biz/faq/unix-linux-sed-ascii-control-codes-nonprintable/ . You can also insert special characters directly into your command line using ctrl+v in Bash line editing.
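If the ^[ really is an ESC byte, one way to avoid depending on \x1B support is to let printf produce the byte and interpolate it into the sed script. A minimal sketch on made-up sample input:

```shell
# Build a real ESC byte (octal 033, hex 0x1B) with printf.
esc=$(printf '\033')

# In BRE the ( is already literal, so only the ESC byte needs special handling.
cleaned=$(printf '\033(s3BAAA\033(s3B\n' | sed "s/${esc}(s3B//g")
printf '%s\n' "$cleaned"   # AAA
```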

In regex, "^", "[" and "(" (and many others) are special characters used for regex features; to match the characters themselves, precede each with "\".
The correct substitution regex would be:
$string =~ s/\^\[\(s3B//g
if you want to replace all occurrences.
