insert string between each pair of doublequotes - regex

I am stuck on the following problem. I have the string shown below:
-name "B_12*" -o -name "B_21*" -o -name "B_31" -o -name "B_41"
I want to convert the string above into the following:
-name "B_12*.tar" -o -name "B_21*.tar" -o -name "B_31.tar" -o -name "B_41.tar"
I am not an expert with bash commands, but I have some idea that the problem could be solved with sed.

The only tricky part here is that you need to match both quotes of each pair, so that a closing quote is not matched again as an opening one. With a sed implementation that supports ERE through the -E option, the following command suffices.
sed -E 's/("[^"]*)"/\1.tar"/g' file
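As a quick check, the command above can be run against the string from the question. This is only a sketch: it feeds the string through a pipe instead of reading from file, and the variable names are made up for illustration.

```shell
# Sample input taken from the question.
input='-name "B_12*" -o -name "B_21*" -o -name "B_31" -o -name "B_41"'

# Capture the opening quote plus everything up to the closing quote,
# then emit the capture followed by .tar and the closing quote.
result=$(printf '%s\n' "$input" | sed -E 's/("[^"]*)"/\1.tar"/g')
printf '%s\n' "$result"
```

Because the closing quote is consumed as part of each match, the next search starts after it, which keeps the quote pairs aligned.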

This pattern matches the text between the double quotes, without the quotes themselves. Collect all the matches, then run a replacement that appends .tar:
\b[A-Z][^"]+
\b[A-Z] matches a word boundary followed by one uppercase letter (A-Z)
[^"]+ matches one or more characters up to the next "
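One possible way to turn this pattern into a replacement is sed's & (the whole match). The sketch below assumes GNU sed, since \b is a GNU extension, and it assumes every quoted name starts with an uppercase letter, as in the question. The variable names are illustrative.

```shell
# Sample input taken from the question.
input='-name "B_12*" -o -name "B_21*" -o -name "B_31" -o -name "B_41"'

# & stands for the whole match, so each matched name gets .tar appended.
# LC_ALL=C keeps [A-Z] limited to ASCII uppercase letters.
appended=$(printf '%s\n' "$input" | LC_ALL=C sed -E 's/\b[A-Z][^"]+/&.tar/g')
printf '%s\n' "$appended"
```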

Related

Using find/sed to replace strings in text files- works only on some of the matches

I want to replace
{not STRING }
with
(not STRING )
I ran
find . -maxdepth 1 -type f -exec sed -i -E 's/{not\s([^\s}]+)\s}/(not \1 )/g' {} ;
It worked on some of the matches. When I run grep with the same pattern it shows more files that still have STRING. Ran find/sed again, same result.
You need to escape curly braces ({}), as they are regex meta-characters. Also \s is not POSIX sed, I would use the more portable [[:space:]].
Your code did not work on the example text for me (GNU/Linux). This does:
sed -E 's/\{not[[:space:]]+([^[:space:]}]+)[[:space:]]+\}/(not \1 )/g'
I also allowed for variable length whitespace directly after not and directly before } (using [[:space:]]+). You may or may not want that.
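To see the effect of the variable-length whitespace, the expression can be tried on a made-up line. The sample text and variable names below are assumptions for illustration only.

```shell
# Hypothetical sample line with uneven whitespace inside the braces.
sample='a {not FOO } b {not  BAR  } c'

# Same expression as in the answer: braces escaped, POSIX space class.
converted=$(printf '%s\n' "$sample" |
  sed -E 's/\{not[[:space:]]+([^[:space:]}]+)[[:space:]]+\}/(not \1 )/g')
printf '%s\n' "$converted"
```

Both occurrences end up in the normalized form with a single space on each side of the captured word.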
Also:
On MacOS sed I believe you need to supply a suffix argument to -i.
The trailing ; for find -exec must be quoted (\;) to avoid interpretation by the shell.
So the command would be:
find . -maxdepth 1 -type f -exec \
sed -E -i.TMP 's/\{not[[:space:]]+([^[:space:]}]+)[[:space:]]+\}/(not \1 )/g' {} \;
If .TMP conflicts with an existing file, choose a different suffix. Attaching the suffix directly to -i (no space) keeps the command valid for both GNU and BSD sed.
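A self-contained trial run in a scratch directory might look like the sketch below. The file names, the .bak suffix, and the extra -name '*.txt' filter are assumptions added for the demonstration, not part of the original command.

```shell
# Scratch directory with two hypothetical input files.
workdir=$(mktemp -d)
printf '%s\n' 'x {not FOO } y' > "$workdir/a.txt"
printf '%s\n' '{not BAR } and {not BAZ }' > "$workdir/b.txt"

# -name '*.txt' (an addition here) stops find from also visiting the
# .bak backups that sed creates while the directory is being read.
find "$workdir" -maxdepth 1 -type f -name '*.txt' -exec \
  sed -E -i.bak 's/\{not[[:space:]]+([^[:space:]}]+)[[:space:]]+\}/(not \1 )/g' {} \;

edited_a=$(cat "$workdir/a.txt")
edited_b=$(cat "$workdir/b.txt")
backup_a=$(cat "$workdir/a.txt.bak")
printf '%s\n' "$edited_a" "$edited_b"
rm -rf "$workdir"
```

The backup file keeps the original content, so a bad expression can be undone by moving the backups back into place.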

How to grep for a pattern in every file involving brackets ${<any-word>}

Environment: bash (cygwin)
I need to grep every file with a specific extension in a directory and print to the screen only the pattern I am looking for.
It must support multiple patterns per line for the file.
The pattern is: dollar sign, left curly, then any word or no word, then right curly bracket, like so:
$P{<anyword>}
Preferably a single grep command, or find:
find . -type f -name '*.txt' -exec grep <something> {} \;
The issue is that I have a statement to do this, but it returns the whole line where the expression is found, and I only want the pattern found to be displayed.
I need help finding the regex that matches the pattern:
$P{any-series-of-characters-or-numbers-or-dashes-or-underlines-anything-at-all-up-until-the-next-closing-curly-bracket}
I have tried several things that did not work. I want to print just what is found, but not the name of the file it is found in.
given myFile.txt:
asd
asd
asdf
fdg dsfg dsf g
askldf ${foo}
${bar} dfsdfg ${}
asdf asdf
asdfl asdf ${zzzzz
AKSDHA ASDF {aaaa}
grep -o -E '[$]{[^}]*}' myFile.txt results in:
${foo}
${bar}
${}
The regex can definitely be tightened up to cover more use cases.
You can try to use grep flag -o (--only-matching): "print only the matched (non-empty) parts of a matching line".
UPD:
grep --only-matching --no-filename -P '(?<=\$\{)[^}]*(?=\})'
And if you need to include ${ and } in the result, please use
grep --only-matching --no-filename -E '\$\{[^}]*\}'
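The difference between the two variants can be seen side by side on a few lines from the question's sample file. This sketch assumes a grep built with -P (PCRE) support, such as a typical GNU grep. The temporary file and variable names are illustrative.

```shell
# Recreate part of the sample file from the question in a scratch location.
sample_file=$(mktemp)
printf '%s\n' 'askldf ${foo}' '${bar} dfsdfg ${}' 'asdfl asdf ${zzzzz' \
  'AKSDHA ASDF {aaaa}' > "$sample_file"

# ERE variant: prints each match including the ${ and }.
with_braces=$(grep --only-matching --no-filename -E '\$\{[^}]*\}' "$sample_file")

# PCRE variant: the lookarounds keep the delimiters out of the match,
# and -o skips the empty match produced by ${}.
names_only=$(grep --only-matching --no-filename -P '(?<=\$\{)[^}]*(?=\})' "$sample_file")

printf '%s\n' "$with_braces" "$names_only"
rm -f "$sample_file"
```

Note that the lookaround variant drops ${} entirely, because -o only prints non-empty matches.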
Using a Perl regular expression:
find . -type f -name '*.txt' -exec grep -Po '\$\{[\w-]*\}' {} \;

Regex matching any of the strings coupled with a certain string but NOT containing a third string

I want to match all directories that contain any word from the list below AND the word test, but never the word DAT.
EB80
TF90
UI11
POSPO02
Therefore, the string is a match if any of the above patterns are in it and the word test is also in the string. But the string DAT should NEVER be anywhere in the match.
I have this regex but it does not seem to be working correctly:
EB80 | TF90 | UI11 | POSPO02 [^DAT]test$
find . -regextype sed -regex "EB80 | TF90 | UI11 | POSPO02 [^DAT]test$"
Not particularly elegant but with basic find:
$ ls
DATtestTF90 EB80test POSPO02test UI11
$ find . -name "*DAT*" -prune -o -name "*test*" \( -name "*EB80*" -o -name "*TF90*" -o -name "*UI11*" -o -name "*POSPO02*" \) -print
./POSPO02test
./EB80test
The arguments to find can be understood as:
-- If the name matches "*DAT*" stop! (-prune) and proceed no further (see also: What does -prune option in find do?)
-- Otherwise, (-o), if the name matches "*test*" AND the name contains any one of the given patterns, output the name (-print)
The parentheses work like you'd expect in a typical programming language. By default any two predicates have an AND relation, but this can be overridden with -o to give an OR relationship. The parens, in the words of the man page, are used to "Force precedence", again as I'm sure you're used to in other languages. Hence you can read the second part of the find as
name == "*test*" AND (name=="*EB80*" OR name=="*TF90*" OR name=="*UI11*" OR name=="*POSPO02*")
Note that because the parentheses have meaning for the shell, they need to be escaped so that find receives them intact.
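The listing above can be reproduced in a throwaway directory. The sketch below recreates the four directory names from the ls output and sorts the result, since find does not guarantee an output order. The variable names are made up.

```shell
# Scratch directory holding the four names from the ls output above.
workdir=$(mktemp -d)
mkdir "$workdir/DATtestTF90" "$workdir/EB80test" "$workdir/POSPO02test" "$workdir/UI11"

# Anything named *DAT* is pruned; the rest must match *test* AND one
# of the four patterns inside the escaped parentheses.
pruned_matches=$(cd "$workdir" && find . -name "*DAT*" -prune -o -name "*test*" \
  \( -name "*EB80*" -o -name "*TF90*" -o -name "*UI11*" -o -name "*POSPO02*" \) -print | sort)
printf '%s\n' "$pruned_matches"
rm -rf "$workdir"
```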
You can't express (a or b) and c and !d in a single regexp when a, b, c, and d are actually strings rather than single characters. Even if they were just characters, expressing it in a single regexp would be a convoluted mess, if it were possible at all. Note that [^DAT] means not (D or A or T): [] is a bracket expression and as such contains a set of characters, not strings.
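The point about bracket expressions can be checked with a few made-up names. In this sketch, DATxtest contains DAT yet still matches, because [^DAT] only inspects the single character right before test.

```shell
# Hypothetical names. Ttest is rejected only because the one character
# before "test" is T; DATxtest passes because that character is x.
bracket_matches=$(printf '%s\n' 'Xtest' 'Ttest' 'DATxtest' | grep -E '[^DAT]test$')
printf '%s\n' "$bracket_matches"
```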
You should consider using awk to match the condition you care about for post-processing the find output. It'd simply be:
find . -type d -print |
awk '/EB80|TF90|UI11|POSPO02/ && /test/ && !/DAT/'
because it's trivial to write what you need as a condition, but not as a single regexp. If your file names can contain newlines then with GNU find and GNU awk just use NUL as the file name terminator instead of newline:
find . -type d -print0 |
awk -v RS='\0' '/EB80|TF90|UI11|POSPO02/ && /test/ && !/DAT/'
Obviously you can add some of the condition to the find and take it out of the awk if you care for efficiency but you might find it easier to maintain if you have your whole condition in one place like above.
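A small trial of the newline-separated version, using invented directory names in a scratch directory, could look like this. The output is sorted only to make it predictable.

```shell
# Hypothetical directories: two should match, two should not.
workdir=$(mktemp -d)
mkdir "$workdir/EB80test" "$workdir/testTF90" "$workdir/DATtestUI11" "$workdir/POSPO02"

# Three independent conditions joined with && and !, one per requirement.
awk_matches=$(cd "$workdir" && find . -type d -print |
  awk '/EB80|TF90|UI11|POSPO02/ && /test/ && !/DAT/' | sort)
printf '%s\n' "$awk_matches"
rm -rf "$workdir"
```

Keep in mind that awk tests the whole path that find prints, so a parent directory containing DAT or test also affects the result.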
Some people will argue that I'm spawning too many procs, but sometimes readability matters, too, and since you didn't explicitly say one way or another I'm going to assume that order of these strings isn't relevant. How about -
find . -type d -name \*test\* |
grep -v DAT | egrep "EB80|TF90|UI11|POSPO02"
A quick test -
$: mkdir footestbar
$: mkdir footestbarDAT
$: mkdir footestbarDATEB80
$: mkdir footestbarEB80
$: find . -type d -name \*test\* |
> grep -v DAT | egrep "EB80|TF90|UI11|POSPO02"
./footestbarEB80

Using SED to replace a domain name in a large number of HTML files

Ok, I give up. I've been trying for a couple of hours to get sed to replace an incorrectly formatted domain name in several thousand html files but I cannot seem to get the escaping of the slashes (and possibly dot/colon) correct.
Text to find:
http://www.domain.com/http
Replace with:
http
What I have tried:
sed -i 's/http:\/\/www.domain.com\/http/http/'
sed -i 's/http\\:\\/\\/www\\.domain\\.com\\/http/http/'
sed -i 's/http\:\/\/www\.domain\.com\/http/http/'
sed -i 's=http://www.domain.com/http=http='
UPDATE:
As it transpires, I was chasing ghosts. A piece of javascript was adding the http://www.domain.com/ to the beginning of all my img tags! Unfortunately, now I need to try and remove this from all pages. So instead of the above, I am now looking to:
Replace this:
http://www.domain.com/'+img[0]
with this:
'+img[0]
I have tried the following to no avail:
find . -name "*.html" -type f -exec sed -i 's|http://www\.domain\.com/\'+img\[0\]|\'+img\[0\]|g' {} \;
find . -name "*.html" -type f -exec sed -i 's|http://www\.domain\.com/\'+img[0]|\'+img[0]|g' {} \;
I appear to be stuck on the escaping of certain chars again. Only this time, when I try to run one of the above commands, it just takes me to a > prompt.
You can avoid a lot of the escaping by using a different delimiter. The dot . is the only character of special meaning that needs to be escaped, everything else you can match literally. Also use the global modifier with your pattern.
sed -i 's|http://www\.domain\.com/http|http|g'
Edit: You can use the following to replace the other part.
sed -i "s|http://www\.domain\.com/\('[+]img\[0\]\)|\1|g"
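Both substitutions can be tried on single lines before touching any files. The two sample lines below are invented stand-ins for the real HTML and JavaScript. The second command also shows why the earlier attempts stalled at a > prompt: inside single quotes a backslash does not escape a quote, so \' simply ended the string. Wrapping the script in double quotes avoids that.

```shell
# Hypothetical lines standing in for the content described in the question.
html_line='<a href="http://www.domain.com/http://example.org/">link</a>'
js_line="src = 'http://www.domain.com/'+img[0];"

# | as delimiter means the slashes in the URL need no escaping.
fixed_html=$(printf '%s\n' "$html_line" | sed 's|http://www\.domain\.com/http|http|g')

# Double quotes around the script let it contain a literal single quote.
fixed_js=$(printf '%s\n' "$js_line" | sed "s|http://www\.domain\.com/\('[+]img\[0\]\)|\1|g")

printf '%s\n' "$fixed_html" "$fixed_js"
```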

Using sed between specific lines only

I have this sed command for removing the spaces after commas.
sed -e 's/,\s\+/,/g' example.txt
How can I change it so that it makes the modification only between specific line numbers (e.g. between the second and third lines)?
Use:
sed '2,3s/,\s\+/,/g' example.txt
This will apply the regex /,\s\+/ only in the lines numbered 2 to 3 (inclusive) and substitute the match with ,.
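A short demonstration on four invented lines shows that only lines 2 and 3 change. The sketch spells the whitespace match as [[:space:]][[:space:]]*, which is assumed here as a portable equivalent of \s\+ so that it does not depend on GNU extensions.

```shell
# Four hypothetical lines; only lines 2 and 3 should lose the spaces.
ranged=$(printf '%s\n' 'a,  b' 'c,  d' 'e,  f' 'g,  h' |
  sed '2,3s/,[[:space:]][[:space:]]*/,/g')
printf '%s\n' "$ranged"
```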
Since OSX (BSD) sed has some syntax differences from Linux (GNU) sed, I thought I'd add the following from some hard-won notes of mine:
OSX (BSD) sed find/replace within an address block (start and end patterns (/../) or line numbers) in the same file:
Syntax:
$ sed '/start_pattern/,/end_pattern/ [operations]' [target filename]
Standard find/replace examples:
$ sed -i '' '2,3 s/,[[:space:]][[:space:]]*/,/g' example.txt
$ sed -i '' '/DOCTYPE/,/body/ s/,[[:space:]][[:space:]]*/,/g' example.txt
Find/replace example with complex operator and grouping (cannot operate without grouping syntax due to stream use of standard input). All statements in grouping must be on separate lines, or separated w/ semi-colons:
Complex Operator Example (will delete entire line containing a match):
$ sed -i '' '2,3 {/pattern/d;}' example.txt
Multi-file find + sed:
$ find ./ -type f -name '*.html' | xargs sed -i '' '/<head>/,/<\/head>/ {/pattern/d; /pattern2/d;}'
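The pattern-range form with a grouped command can be checked on a tiny invented document without editing any file in place. In this sketch the /<meta/ pattern and the sample HTML are assumptions for illustration.

```shell
# Hypothetical document with one <meta> inside <head> and one outside.
html=$(printf '%s\n' '<html>' '<head>' '<meta name="x">' '<title>t</title>' \
  '</head>' '<body>' '<meta name="y">' '</body>' '</html>')

# Only lines between <head> and </head> are tested against /<meta/.
head_cleaned=$(printf '%s\n' "$html" | sed '/<head>/,/<\/head>/ {/<meta/d;}')
printf '%s\n' "$head_cleaned"
```

The <meta> line in the body survives because it lies outside the address range.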
Hope this helps someone!
sed -e '2,3!b;s/,\s\+/,/g' example.txt
This version can be useful if you later want to add more commands to process the desired lines.
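As an illustration of adding more commands, the sketch below appends a second substitution that marks the processed lines. It splits the script into separate -e expressions and uses [[:space:]][[:space:]]* in place of \s\+, both assumed here for portability. The marker and sample lines are invented.

```shell
# Lines outside 2-3 hit "b" (branch to end of script) and print unchanged;
# lines 2-3 fall through to every command that follows.
branched=$(printf '%s\n' 'a,  b' 'c,  d' 'e,  f' 'g,  h' |
  sed -e '2,3!b' -e 's/,[[:space:]][[:space:]]*/,/g' -e 's/^/> /')
printf '%s\n' "$branched"
```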