Bash script to enclose words in single quotes - regex

I'm trying to write a bash script to enclose words contained in a file with single quotes.
Word - Hello089
Result - 'Hello089',
I tried the following regex but it doesn't work. This works in Notepad++ with find and replace. I'm not sure how to tweak it to make it work in bash scripting.
sed "s/(.+)/'$1',/g" file.txt > result.txt

In sed, replacement backreferences (also called placeholders) are written with \n syntax (\1, \2, ...), not $n, which is Perl-like backreference syntax.
Note that you do not need groups here, though, since you want to wrap each whole line in single quotation marks. This is also why you do not need the g flag: it is only necessary when you expect multiple matches on the same line.
You can use the following POSIX BRE and ERE (the one with -E) solutions:
sed "s/..*/'&',/" file.txt > result.txt
sed -E "s/.+/'&',/" file.txt > result.txt
In the POSIX BRE (first) solution, ..* matches any char and then any 0 or more chars (thus emulating the common PCRE pattern .+). The POSIX ERE (second) solution uses the .+ pattern to do the same. The & in the right-hand side inserts the whole match (GNU sed also accepts \0 for this). You may instead enclose the whole match in capturing parentheses and then use \1, but that is redundant:
sed "s/\(..*\)/'\1',/" file.txt > result.txt
sed -E "s/(.+)/'\1',/" file.txt > result.txt
Note the escaping: capturing parentheses in POSIX BRE must be escaped.
A quick demo:
s="Hello089";
sed "s/..*/'&',/" <<< "$s"
# => 'Hello089',
sed -E "s/.+/'&',/" <<< "$s"
# => 'Hello089',
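One practical difference between ..* (or .+) and a plain .* shows up on empty lines. The sketch below uses a hypothetical three-line input with an empty line in the middle; the variable names are illustrative only.

```shell
# Hypothetical input: two words separated by an empty line.
sample=$(printf 'Hello089\n\nWorld42\n')

# ..* needs at least one character, so the empty line is left alone.
with_plus=$(printf '%s\n' "$sample" | sed "s/..*/'&',/")

# .* also matches the empty string, so the empty line turns into '',
with_star=$(printf '%s\n' "$sample" | sed "s/.*/'&',/")

printf '%s\n---\n%s\n' "$with_plus" "$with_star"
```

If the file may contain blank lines that should stay blank, the ..* form is probably the safer choice.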

$1 is expanded by the shell before sed sees it, but it's the wrong backreference anyway: you need \1. You also need to escape the parentheses that define the capture group. Because the sed script is enclosed in double quotes, you can double the backslashes to be safe, although the shell only treats a backslash specially before $, `, ", \ or a newline, so \( and \1 would also reach sed unchanged.
$ echo "Hello089" | sed "s/\\(.*\\)/'\1',/g"
'Hello089',
(I don't recall if there is a way to specify single quotes using an ASCII code instead of a literal ', which would allow you to use single quotes around the script.)
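One way to keep single quotes around the script is the '\'' idiom. This is only a sketch, and the variable names are made up for the example.

```shell
word='Hello089'

# '\'' ends the single-quoted string, adds one escaped quote, and reopens it,
# so sed receives the script s/..*/'&',/ with no shell expansion involved.
quoted=$(printf '%s\n' "$word" | sed 's/..*/'\''&'\'',/')

printf '%s\n' "$quoted"
```

GNU sed additionally accepts \x27 for a single quote, but that is an extension rather than POSIX behaviour.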

Related

bash tool to search and replace text (while leaving text in the middle the same)

I have text files that look like this:
foo(bar(some_id))
I want to replace that with
bleh(some_id)
I can come up with the regex to find the instances, which is: foo\(bar\([a-zA-Z0-9_]+\)\). But I don't know how to express that I want to keep the text in the middle the same.
Any suggestion? (I'm thinking of using sed or awk or any standard bash tool, whichever is easier )

You can use
sed -E 's/foo\(bar\(([^()]*).*/bleh(\1)/'
sed 's/foo(bar(\([^()]*\).*/bleh(\1)/'
The first pattern is POSIX ERE compliant, hence the -E option.
The foo\(bar\(([^()]*).* POSIX ERE pattern matches foo(bar(, then captures any zero or more chars other than ( and ) into Group 1 (\1 refers to this group value from the replacement pattern), and then matches the rest of string. After the replacement, the Group 1 value remains. You may add .* at the start if there is text before foo(bar(.
The second sed command is the POSIX BRE equivalent of the above command.
See an online demo:
s='foo(bar(some_id))'
sed -E 's/foo\(bar\(([^()]*).*/bleh(\1)/' <<< "$s"
# => bleh(some_id)
sed 's/foo(bar(\([^()]*\).*/bleh(\1)/' <<< "$s"
# => bleh(some_id)
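The patterns above end in .*, so anything after the match on the same line is discarded. If the surrounding text must survive, or a line can hold several occurrences, a variant that matches the two closing parentheses explicitly may fit better. This is a sketch on a made-up input line, not part of the original answer.

```shell
# Hypothetical line with two occurrences and surrounding text.
line='x = foo(bar(id_1)) + foo(bar(id_2));'

# Match the two closing parentheses explicitly instead of using .*,
# so text after each occurrence survives and g can replace every occurrence.
result=$(printf '%s\n' "$line" | sed -E 's/foo\(bar\(([^()]*)\)\)/bleh(\1)/g')

printf '%s\n' "$result"
```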

Using sed
$ sed 's/.*\(([^)]*)\).*/bleh\1/' input_file
bleh(some_id)

How to escape regex in sed replace

I want to replace text in a file. My regex is [\s\S\n]*<h1 class='test'>.
I tried the following commands, but the text was not replaced.
sed -i.bak 's/[\s\S\n]*<h1 class='test'>//g' 36
sed -i.bak 's/\[\\s\\S\\n\]\*<h1 class=\x27test\x27>//g' 36
File name is 36
Output of grep "[\s\S\n]*<h1 class='test'>" -q 36 && echo "FOUND" || echo "NOTFOUND" is FOUND.

sed by default only operates on a line-by-line basis.
To match across lines - as it appears you are using GNU sed - you need to use -z option (it will slurp the file contents and sed will be able to "see" line breaks) and then use . to match any char (in POSIX regex, . matches even line breaks). Note [\s\S] is a "corrupt" POSIX pattern, as inside POSIX bracket expressions, PCRE-like shorthand character classes are parsed as combinations of a backslash and a char next to it (i.e. [\s] matches a \ or s).
Another issue is that you used single quotation marks inside single quoted string, which is wrong (they got stripped in the end and your pattern had no ' in it).
So, with GNU sed use
sed -i.bak -z "s/.*<h1 class='test'>//g" 36
With a non-GNU sed, you could use techniques described here.
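The difference between line-by-line processing and -z can be seen on a small made-up document. This sketch assumes GNU sed, since -z is a GNU option; the sample text and variable names are hypothetical.

```shell
# Hypothetical three-line document; the heading is on the last line.
doc=$(printf "junk line\nmore junk\n<h1 class='test'>Title</h1>\n")

# Default mode: each line is processed separately, so only the third line changes.
per_line=$(printf '%s\n' "$doc" | sed "s/.*<h1 class='test'>//")

# GNU sed -z: the whole input is a single record, so .* also spans the newlines.
whole=$(printf '%s\n' "$doc" | sed -z "s/.*<h1 class='test'>//")

printf '%s\n---\n%s\n' "$per_line" "$whole"
```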

How to use grep/sed/awk, to remove a pattern from beginning of a text file

I have a text file with the following pattern written to it:
TIME[32.468ms] -(3)-............."TEXT I WANT TO KEEP"
I would like to discard the first part of each line containing
TIME[32.468ms] -(3)-.............
To test the regular expression I've tried the following:
cat myfile.txt | egrep "^TIME\[.*\]\s\s\-\(3\)\-\.+"
This identifies correctly the lines I want. Now, to delete the pattern I've tried:
cat myfile.txt | sed s/"^TIME\[.*\]\s\s\-\(3\)\-\.+"//
but it just seems to be doing the cat, since it shows the content of the complete file and no substitution happens.
What am I doing wrong?
OS: CentOS 7

With your shown samples, please try the following grep command. Written and tested with GNU grep.
grep -oP '^TIME\[\d+\.\d+ms\]\s+-\(\d+\)-\.+\K.*' Input_file
Explanation of the regex:
^TIME\[ ##Matches the literal string TIME[ at the start of the line.
\d+\.\d+ms\] ##Matches digits (1 or more occurrences), a dot, digits (1 or more occurrences), then ms].
\s+-\(\d+\)-\.+ ##Matches whitespace (1 or more occurrences), then -, digits (1 or more occurrences) in parentheses, -, and 1 or more dots.
\K ##PCRE's \K (available through grep -P) discards everything matched so far, so only the part matched after it is printed.
.* ##Matches the rest of the line.
2nd solution: an awk program.
awk 'match($0,/^TIME\[[0-9]+\.[0-9]+ms\][[:space:]]+-\([0-9]+\)-\.+/){print substr($0,RSTART+RLENGTH)}' Input_file
Explanation: awk's match function matches the regex ^TIME\[[0-9]+\.[0-9]+ms\][[:space:]]+-\([0-9]+\)-\.+, which covers the text we want to remove from each line, and sets RSTART and RLENGTH. substr($0,RSTART+RLENGTH) then prints the rest of the line, which is the part the OP wants to keep.

This awk command uses the sub() function:
awk 'sub(/^TIME[[][^]]*].*\.+/,"")' file
"TEXT I WANT TO KEEP"
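sub() returns the number of substitutions made (0 or 1), so when used as a pattern it is true, and the line is printed, only if a replacement happened.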
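The return value of sub() can be made visible by printing it next to each line. This is a sketch with two hypothetical input lines and a deliberately simplified regex (everything from TIME up to the first double quote), not the regex from the answer.

```shell
# Two hypothetical lines: one with the TIME prefix, one without.
report=$(printf '%s\n' 'TIME[32.468ms] -(3)-....."keep me"' 'no prefix here' |
  awk '{ n = sub(/^TIME[^"]*/, ""); print n ": " $0 }')

printf '%s\n' "$report"
```

The first line reports 1 and loses its prefix; the second reports 0 and is unchanged, which is why the bare sub(...) pattern prints only the lines it modified.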

$ cut -d'"' -f2 file
TEXT I WANT TO KEEP

You may use:
s='TIME[32.468ms] -(3)-............."TEXT I WANT TO KEEP"'
sed -E 's/^TIME\[[^]]*].*\.+//' <<< "$s"
"TEXT I WANT TO KEEP"

The \s regex extension may not be supported by your sed.
In BRE syntax (which is what sed speaks out of the box) you do not backslash round parentheses - doing that turns them into regex metacharacters which do not match themselves, somewhat unintuitively. Also, + is just a regular character in BRE, not a repetition operator (though you can turn it into one by similarly backslashing it: \+).
You can try adding an -E option to switch from BRE syntax to the perhaps more familiar ERE syntax, but that still won't enable Perl regex extensions, which are not part of ERE syntax, either.
sed 's/^TIME\[[^][]*\][[:space:]][[:space:]]-(3)-\.*//' myfile.txt
should work on any reasonably POSIX sed. (Notice also how the minus character does not need to be backslash-escaped, though doing so is harmless per se. Furthermore, I tightened up the regex for the square brackets, to prevent the "match anything" regex you had .* from "escaping" past the closing square bracket. In some more detail, [^][] is a negated character class which matches any character which isn't (a newline or) ] or [; they have to be specified exactly in this order to avoid ambiguity in the character class definition. Finally, notice also how the entire sed script should normally be quoted in single quotes, unless you have specific reasons to use different quoting.)
If you have sed -E or sed -r you can use + instead of * but then this complicates the overall regex, so I won't suggest that here.
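The BRE and ERE differences described above can be checked on tiny inputs. A minimal sketch with made-up strings:

```shell
# In BRE a bare + is literal, so this replaces the two characters "a+".
bre_plus=$(printf 'a+b aab\n' | sed 's/a+/X/')

# With -E the + is a quantifier, so this replaces the first run of "a".
ere_plus=$(printf 'a+b aab\n' | sed -E 's/a+/X/')

# In BRE bare parentheses are literal, while \( \) form a group around "3".
bre_literal=$(printf '(3)\n' | sed 's/(3)/X/')
bre_group=$(printf '(3)\n' | sed 's/\(3\)/X/')

printf '%s\n' "$bre_plus" "$ere_plus" "$bre_literal" "$bre_group"
```

The last case is the one that bit the original command: \(3\) matches only the 3, so the literal parentheses in the input were never matched.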

A simpler one for sed:
sed 's/^[^"]*//' myfile.txt

If the text you want to keep is always surrounded by double quotes like this, and those are the only quotes on the lines starting with "TIME...", then:
sed -n '/^TIME/p' file | awk -F'"' '{print $2}'
should get the lines starting with "TIME..." and print the text within the quotes.

Thanks all, for your help.
By the end, I've found a way to make it work:
echo 'TIME[32.468ms] -(3)-.............TEXT I WANT TO KEEP' | grep TIME | sed -r 's/^TIME\[[0-9]+\.[0-9]+ms\]\s\s-\(3\)-\.+//'
More generally,
grep TIME myfile.txt | sed -r 's/^TIME\[[0-9]+\.[0-9]+ms\]\s\s-\(3\)-\.+//'
Cheers,
Pedro

Backreferences in sed returning wrong value

I am trying to replace an expression using sed. The regex works in vim but not in sed. I'm replacing the last dash before the number with a slash so
/www/file-name-1
should return
/www/file-name/1
I am using the following command but it keeps outputting /www/file-name/0 instead
sed 's/-[0-9]/\/\0/g' input.txt
What am I doing wrong?

You must enclose the data in parentheses to reference it later, and sed starts counting groups at 1. To insert the whole match without any parentheses, use the & symbol.
sed 's/-\([0-9]\)/\/\1/g' input.txt
That yields:
/www/file-name/1

You need to capture using parentheses before you can backreference (backreferences start at \1). Try sed -r 's|(.*)-|\1/|':
$ sed -r 's|(.*)-|\1/|' <<< "/www/file-name-1"
/www/file-name/1
You can use any delimiter with sed, so / isn't the best choice when the substitution contains /. The -r option is for extended regexp, so the parentheses don't need to be escaped.

It seems sed under OS X starts counting backreferences at 1. Try \1 (with a capture group) instead of \0.
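To compare the two replacement placeholders on the question's input, here is a small sketch; the variable names are illustrative, and | is used as the delimiter to avoid escaping the slash.

```shell
path='/www/file-name-1'

# \1 holds only the digit captured by \( \), so the dash is dropped.
with_group=$(printf '%s\n' "$path" | sed 's|-\([0-9]\)|/\1|')

# & holds the whole match, dash included, so it is the wrong tool here.
with_amp=$(printf '%s\n' "$path" | sed 's|-[0-9]|/&|')

printf '%s\n' "$with_group" "$with_amp"
```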

Replace all whitespace with a line break/paragraph mark to make a word list

I am trying to make a vocab list for a Greek text we are translating in class. I want to replace every space or tab character with a paragraph mark so that every word appears on its own line. Can anyone give me the sed command, and explain what it is that I'm doing? I’m still trying to figure sed out.

For reasonably modern versions of sed, edit the standard input to yield the standard output with
$ echo 'τέχνη βιβλίο γη κήπος' | sed -E -e 's/[[:blank:]]+/\n/g'
τέχνη
βιβλίο
γη
κήπος
If your vocabulary words are in files named lesson1 and lesson2, redirect sed’s standard output to the file all-vocab with
sed -E -e 's/[[:blank:]]+/\n/g' lesson1 lesson2 > all-vocab
What it means:
The character class [[:blank:]] matches either a single space character or a single tab character.
Use [[:space:]] instead to match any single whitespace character (commonly space, tab, newline, carriage return, form-feed, and vertical tab).
The + quantifier means match one or more of the previous pattern.
So [[:blank:]]+ is a sequence of one or more characters that are all space or tab.
The \n in the replacement is the newline that you want.
The /g modifier on the end means perform the substitution as many times as possible rather than just once.
The -E option tells sed to use POSIX extended regex syntax and in particular for this case the + quantifier. Without -E, your sed command becomes sed -e 's/[[:blank:]]\+/\n/g'. (Note the use of \+ rather than simple +.)
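The effect of the + quantifier and of the g modifier can be seen by leaving each one out in turn. In this sketch a visible | stands in for the newline so the results fit on one line; the input string is made up.

```shell
# Hypothetical input: "a", two spaces, "b", a tab, "c".
# With + and g: every run of blanks becomes a single separator.
all_runs=$(printf 'a  b\tc\n' | sed -E 's/[[:blank:]]+/|/g')

# Without g: only the first run on the line is replaced.
first_only=$(printf 'a  b\tc\n' | sed -E 's/[[:blank:]]+/|/')

# Without +: each blank is replaced on its own, so two spaces give two separators.
no_plus=$(printf 'a  b\tc\n' | sed -E 's/[[:blank:]]/|/g')

printf '%s\n' "$all_runs" "$first_only" "$no_plus"
```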
Perl Compatible Regexes
For those familiar with Perl-compatible regexes: a sed that supports the \s shorthand (GNU sed, for example) lets you use \s+ to match runs of at least one whitespace character, as in
sed -E -e 's/\s+/\n/g' old > new
or
sed -e 's/\s\+/\n/g' old > new
These commands read input from the file old and write the result to a file named new in the current directory.
Maximum portability, maximum cruftiness
Going back to almost any version of sed since Version 7 Unix, the command invocation is a bit more baroque.
$ echo 'τέχνη βιβλίο γη κήπος' | sed -e 's/[ \t][ \t]*/\
/g'
τέχνη
βιβλίο
γη
κήπος
Notes:
Here we do not even assume the existence of the humble + quantifier and simulate it with a single space-or-tab ([ \t]) followed by zero or more of them ([ \t]*). Be aware that only some seds (GNU sed, for example) read \t inside a bracket expression as a tab; POSIX treats it as a literal backslash and a literal t, so for strict portability type an actual tab character between the brackets.
Similarly, assuming sed does not understand \n for newline, we have to include it on the command line verbatim.
The \ at the end of the first line of the command is a continuation marker that escapes the immediately following newline, and the remainder of the command is on the next line.
Note: There must be no whitespace between the backslash and the newline. That is, the end of the first line must be exactly backslash followed by end-of-line.
This error-prone process helps one appreciate why the world moved to visible characters, and you will want to exercise some care in trying out the command with copy-and-paste.
Note on backslashes and quoting
The commands above all used single quotes ('') rather than double quotes (""). Consider:
$ echo '\\\\' "\\\\"
\\\\ \\
That is, the shell applies different escaping rules to single-quoted strings as compared with double-quoted strings. You typically want to protect all the backslashes common in regexes with single quotes.
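The same comparison can be made by counting characters. This sketch uses printf '%s' rather than echo, because echo's own backslash handling differs between shells; the variable names are illustrative.

```shell
single='\\\\'   # single quotes: all four backslashes are kept
double="\\\\"   # double quotes: each \\ pair collapses to one backslash

printf '%s has %d characters\n' "$single" "${#single}"
printf '%s has %d characters\n' "$double" "${#double}"
```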

The portable way to do this is:
sed -e 's/[ \t][ \t]*/\
/g'
That's an actual newline between the backslash and the slash-g. Many sed implementations don't know about \n, so you need a literal newline. The backslash before the newline prevents sed from getting upset about the newline. (in sed scripts the commands are normally terminated by newlines)
With GNU sed you can use \n in the substitution, and \s in the regex:
sed -e 's/\s\s*/\n/g'
GNU sed also supports "extended" regular expressions (that's egrep style, not perl-style) if you give it the -r flag, so then you can use +:
sed -r -e 's/\s+/\n/g'
If this is for Linux only, you can probably go with the GNU command, but if you want this to work on systems with a non-GNU sed (eg: BSD, Mac OS-X), you might want to go with the more portable option.
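If typing a literal tab and a backslash-newline by hand feels fragile, both characters can be placed in shell variables first. This is a sketch, not part of the original answer: the variable names tab and nl are made up, and the script is double-quoted so that they expand.

```shell
# tab comes from printf; nl is a single quoted line break.
tab=$(printf '\t')
nl='
'

# "\\$nl" reaches sed as a backslash followed by a real newline;
# [ $tab] is a bracket expression holding a space and a real tab.
words=$(printf 'one two\tthree\n' | sed "s/[ $tab][ $tab]*/\\$nl/g")

printf '%s\n' "$words"
```

Because the bracket expression contains a real tab rather than \t, this form does not depend on how a given sed treats backslashes inside brackets.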

All of the examples listed above for sed break on one platform or another. None of them work with the version of sed shipped on Macs.
However, Perl's regex works the same on any machine with Perl installed:
perl -pe 's/\s+/\n/g' file.txt
If you want to save the output:
perl -pe 's/\s+/\n/g' file.txt > newfile.txt
If you want only unique occurrences of words:
perl -pe 's/\s+/\n/g' file.txt | sort -u > newfile.txt

Option 1
printf '%s\n' $(cat testfile)
Option 2
tr -s ' \t' '\n' < testfile

This should do the work (with GNU sed):
sed -E -e 's/[ \t]+/\n/g'
[ \t] means a space OR a tab. If you want any kind of whitespace, you could also use \s.
[ \t]+ means one or more spaces OR tabs; the -E option is needed so that + acts as a quantifier.
s/x/y/ means replace the pattern x with y (here \n is a newline)
The g at the end means the replacement is repeated for every match in each line.

You could use POSIX [[:blank:]] to match a horizontal white-space character.
sed 's/[[:blank:]]\+/\n/g' file
You may also use [[:space:]] instead of [[:blank:]].
Example:
$ echo 'this is a sentence' | sed 's/[[:blank:]]\+/\n/g'
this
is
a
sentence

You can also do it with xargs:
cat old | xargs -n1 > new
or
xargs -n1 < old > new

Using gawk:
gawk '{$1=$1}1' OFS="\n" file
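A short sketch of why this works, using plain awk and -v instead of the trailing OFS= assignment (both choices are assumptions made for the example, as is the sample input):

```shell
# Assigning to a field ($1 = $1) makes awk rebuild the record with OFS between
# fields; the trailing 1 is an always-true pattern that prints the result.
listed=$(printf 'alpha  beta\tgamma\n' | awk -v OFS='\n' '{ $1 = $1 } 1')

printf '%s\n' "$listed"
```

Since awk's default field splitting treats any run of spaces and tabs as one separator, repeated blanks do not produce empty lines.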
