How to grep for "string\tstring"?

I need to search a large group of data files. I want to find files that contain the string "foo\tbar\tboo". I have tried this ...
$ find . -name "foo*dat" -exec grep foo {} \; | less
"miscinfo_foo" => [
"foo\tbar\tnot_foo"
"miscinfo_foo",
"miscinfo_foo" => [
"foo\tbar\tyes_foo"
"miscinfo_foo",
But if I do ...
$ find . -name "foo*dat" -exec grep -E "foo\tbar" {} \;
... I get no output. I have tried egrep too. I have tried escaping the \t with \\t but still get no output.
What am I doing wrong?
Thanks

Try
find . -name "foo*dat" -exec grep -E 'foo\\tbar' {} \;
                                     ^   ^     ^
in single quotes rather than double, and with an extra backslash. The '' prevent bash from processing backslashes, so that grep will actually see foo\\tbar. Based on your output, I think you are looking for the literal text backslash-tee, not an ASCII character 9, so double the backslash to have grep match it as literal text.

There are two effects at play here:
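grep does not read \t as a literal backslash followed by t: the backslash is an escape character in the regex (with -P, \t means a tab).
The shell collapses \\ to \ within a double-quoted string, so a doubled backslash never reaches grep.
You want grep to match a literal backslash, so you need to pass \\t to grep within single quotes: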
grep 'foo\\tbar'
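To see the difference between the two cases, here is a minimal sketch. The scratch directory and the file names literal.dat and tab.dat are made up for illustration; one file holds the text backslash-t, the other a real tab character.

```shell
# Hypothetical scratch directory and file names, for illustration only.
tmp=$(mktemp -d)
printf '%s\n' '"foo\tbar\tyes_foo"' > "$tmp/literal.dat"   # backslash + t, as text
printf 'foo\tbar\n' > "$tmp/tab.dat"                        # a real tab character

# Doubled backslash in single quotes: matches the two-character sequence \t.
literal_hits=$(grep -l 'foo\\tbar' "$tmp"/*.dat)

# A bracket expression holding a real tab matches the tab character instead.
tab_hits=$(grep -l "foo[$(printf '\t')]bar" "$tmp"/*.dat)

echo "literal: ${literal_hits##*/}"
echo "tab:     ${tab_hits##*/}"
rm -rf "$tmp"
```

Each pattern should list only the file that contains the form it was written for.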

Using find/sed to replace strings in text files: works only on some of the matches

I want to replace
{not STRING }
with
(not STRING )
I ran
find . -maxdepth 1 -type f -exec sed -i -E 's/{not\s([^\s}]+)\s}/(not \1 )/g' {} ;
It worked on some of the matches. When I run grep with the same pattern it shows more files that still have STRING. Ran find/sed again, same result.
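You need to escape the curly braces ({}), as they are regex metacharacters under -E. Also, \s is not POSIX; I would use the more portable [[:space:]]. Inside a bracket expression \s is not a whitespace class at all: [^\s}] excludes a backslash, the letter s, and }, which would explain matches being skipped whenever STRING contains an s.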
Your code did not work on the example text for me (GNU/Linux). This does:
sed -E 's/\{not[[:space:]]+([^[:space:]}]+)[[:space:]]+\}/(not \1 )/g'
I also allowed for variable-length whitespace directly after not and directly before } (using [[:space:]]+). You may or may not want that.
Also:
On macOS sed, I believe you need to supply a suffix argument to -i.
The trailing ; for find -exec must be quoted (\;) to avoid interpretation by the shell.
So the command would be:
find . -maxdepth 1 -type f -exec \
sed -E -i .TMP 's/\{not[[:space:]]+([^[:space:]}]+)[[:space:]]+\}/(not \1 )/g' {} \;
If .TMP conflicts with an existing file, choose a different suffix.
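A quick way to check the expression before editing files in place is to run it on a few sample lines through a pipe. This is only a sketch; the input lines are invented for illustration, and the second one contains the letter s and extra whitespace on purpose.

```shell
# Made-up sample lines; "sass" contains the letter s.
input='{not STRING }
{not  sass   }
{keep this }'

# Same substitution as above, applied to stdin instead of files.
result=$(printf '%s\n' "$input" |
  sed -E 's/\{not[[:space:]]+([^[:space:]}]+)[[:space:]]+\}/(not \1 )/g')

printf '%s\n' "$result"
```

The first two lines should come out as (not STRING ) and (not sass ), and the third should pass through unchanged.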

Regex to return last 3 characters of matching pattern

I am using grep to search through text files containing 88-character-long MRZs (machine readable zones). Within the text file they are preceded by a semicolon.
I only want to get the substring of characters 3-5 from the string.
This is my pattern:
egrep --include *.txt -or . -e ";[A-Z][A-Z0-9<][A-Z<]{3}"
This is a textfile:
text is here;P<RUSIVAN<<DEL<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<F64D123456RUS7404124F131009734P41234<<<<<<<8 ;2019-02-08
This is my output:
;P<RUS
This is my desired output:
RUS
The semicolon introduces the MRZ. It starts with an uppercase letter, followed by either an uppercase letter, a digit, or the filler character <. Then follows the 3-character country code, which can contain uppercase letters or filler characters <.
This pattern works fine, but I only want the last 3 characters that I am quantifying to be returned. Is there a way to get only the last 3 characters of a matching pattern?
In the sample text file the desired output would be RUS.
Thank you!
If you can use GNU grep, the -P (PCRE) option gives you \K, which discards everything matched so far from the reported match; then match your character class 3 times:
grep -roP --include='*.txt' ";[A-Z][A-Z0-9<]\K[A-Z<]{3}"
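As a rough check of the \K approach, the sketch below runs the pattern on a single line. The line is a shortened, hypothetical version of the sample from the question, and the sed variant is an assumed fallback for systems whose grep lacks -P, not part of the answer above.

```shell
# Shortened, hypothetical version of the sample line from the question.
line='text is here;P<RUSIVAN<<DEL<<<< ;2019-02-08'

# GNU grep with PCRE support: \K resets the start of the reported match.
code_grep=$(printf '%s\n' "$line" | grep -oP ';[A-Z][A-Z0-9<]\K[A-Z<]{3}')

# Fallback without -P: capture the three characters and print only the group.
code_sed=$(printf '%s\n' "$line" | sed -nE 's/.*;[A-Z][A-Z0-9<]([A-Z<]{3}).*/\1/p')

echo "grep: $code_grep"
echo "sed:  $code_sed"
```

Both commands should print RUS for this line.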
Is this all you're trying to do?
$ awk -F';' '{print substr($2,3,3)}' file
RUS
$ sed -E 's/[^;]*;..(.{3}).*/\1/' file
RUS
If not then edit your question to provide more truly representative sample input/output.
The UNIX command to find files is named find, btw, not grep. I know the GNU guys added a bunch of options for finding files to grep but just don't use them as they make your grep command unnecessarily complicated (and inconsistent with the other UNIX text processing tools) as it then needs arguments to find files as well as to g/re/p within the files. So your command line if you're using grep should be:
find . -name '*.txt' -exec grep 'stuff' {} +
not:
egrep --include *.txt -or . -e 'stuff'
and do the same for any other tool:
find . -name '*.txt' -exec grep 'stuff' {} +
find . -name '*.txt' -exec sed 'stuff' {} +
find . -name '*.txt' -exec awk 'stuff' {} +
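Here is a small sketch of the find ... -exec ... {} + pattern, combined with the awk one-liner from above. The directory layout, file names, and MRZ fragments are invented for illustration.

```shell
# Hypothetical files: two .txt files at different depths and one file to skip.
tmp=$(mktemp -d)
mkdir "$tmp/a"
printf '%s\n' 'x;P<RUSIVAN<<'  > "$tmp/a/one.txt"
printf '%s\n' 'y;P<DEUERIKA<<' > "$tmp/two.txt"
printf '%s\n' 'z;P<FRAIGNORED' > "$tmp/skip.log"

# find selects the files; awk only has to deal with their contents.
codes=$(find "$tmp" -name '*.txt' -exec awk -F';' '{print substr($2,3,3)}' {} + | sort)

printf '%s\n' "$codes"
rm -rf "$tmp"
```

Only the two .txt files are passed to awk, so the sorted output should be DEU followed by RUS.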

Search and Replace String from text file Ubuntu

I have to replace the following string
//#Config(manifest
with the string below:
#Config(manifest
So I created the following regex
\/\/#Config\(manifest
And tried
grep -rl \/\/#Config\(manifest . | xargs sed -i "\/\/#Config\(manifest#Config\(manifest/g"
But I am getting the following error:
sed: -e expression #1, char 38: Unmatched ( or \(
I have to search recursively and do this operation, but I am stuck with the above error.
grep -rl '//#Config(manifest' | xargs sed -i 's|//#Config(manifest|#Config(manifest|g'
Specifying . for the current directory is optional for grep -r (GNU grep).
sed allows any character other than backslash or newline to be used as the delimiter.
Edit
If file name contains spaces, use
grep -rlZ '//#Config(manifest' | xargs -0 sed -i 's|//#Config(manifest|#Config(manifest|g'
Explanation (assumes GNU version of commands)
grep
    -r performs a recursive search
    -l outputs only file names instead of matched lines
    -Z outputs a zero byte (ASCII NUL character) after each file name instead of the usual newline
    'pattern': by default, grep uses BRE (basic regular expressions), where characters like ( have no special meaning and hence need not be escaped
xargs -0 tells xargs to separate arguments by the ASCII NUL character
sed
    -i edits in place; use -i.bkp if you want to create a backup of the original files
    s|pattern|replace|g: the g flag tells sed to replace all occurrences. sed also defaults to BRE, so there is no need to escape (. Using \( would mean the start of a capture group, hence the error when sed doesn't find the closing \)
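The NUL-separated pipeline can be tried safely in a scratch directory first. This is a sketch assuming GNU grep, xargs, and sed; the directory and file names (including the ones with spaces) are made up.

```shell
# Hypothetical scratch tree with a space in both a directory and a file name.
tmp=$(mktemp -d)
mkdir "$tmp/sub dir"
printf '%s\n' '//#Config(manifest = "a")' > "$tmp/sub dir/My Test.java"
printf '%s\n' 'nothing to change'         > "$tmp/Other.java"

# -Z and -0 pass NUL-separated names, so the spaces in the path survive.
grep -rlZ '//#Config(manifest' "$tmp" |
  xargs -0 sed -i 's|//#Config(manifest|#Config(manifest|g'

changed=$(cat "$tmp/sub dir/My Test.java")
untouched=$(cat "$tmp/Other.java")
printf '%s\n' "$changed" "$untouched"
rm -rf "$tmp"
```

The file containing the pattern should lose its leading //, while the other file is never passed to sed.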

Escaping Regex in SED

I need to use sed to do a search and replace. I'm replacing /**# with define('WP_POST_REVISIONS', 3);\n\n/**#.
But I can't figure out the proper escaping. Even after escaping the (obviously needed) single quotes, I still get a bash: syntax error near unexpected token ')'
What is the proper escaping in this case?
Try replacing your:
find /start/path -name *.html -exec sed -ie 's|/**#|define(\'WP_POST_REVISIONS\', 3);|g' '{}' \;
with:
find /start/path -name '*.html' -print0 \
| xargs -0 -n 1 sed -i -e 's|/\*\*#|define('\''WP_POST_REVISIONS'\'', 3);\n/\*\*#|g'
and tell us what it gives you
(I tried to guess you were looking for the actual string "/**#" in your file(s) ... please give us examples of what you are really looking for, if it isn't that actual string)
It is not sed escaping, but bash escaping.
Escaping does not work within single-quotes (')
You can use double-quotes ("), if you have no special characters like "$\ in the parameter (or escape them there if necessary):
find /start/path -name '*.html' -exec sed -i -e "s/abc/define('WP_POST_REVISIONS', 3);/g" '{}' \;
Or quote using $', which supports escaping:
find /start/path -name '*.html' -exec sed -i -e $'s/abc/define(\'WP_POST_REVISIONS\', 3);/g' '{}' \;
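To check what sed actually receives, it can help to print the quoted script before running it. The sketch below compares two of the quoting styles discussed in this thread (closing and reopening single quotes, and double quotes); the variable names are only for illustration.

```shell
# Close the single-quoted string, add an escaped quote, then reopen it.
a='s/abc/define('\''WP_POST_REVISIONS'\'', 3);/g'

# Double quotes work here because the script contains no dollar sign,
# backtick, backslash, or double quote.
b="s/abc/define('WP_POST_REVISIONS', 3);/g"

# Both variables should hold the same sed script.
printf '%s\n' "$a" "$b"

# Run the script on a one-line sample instead of real files.
replaced=$(printf 'abc\n' | sed "$b")
printf '%s\n' "$replaced"
```

Both printf lines should show the same script, and the sample line abc should become define('WP_POST_REVISIONS', 3);.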

Recursively replace django template tag with sed

I renamed something in my django application, and I want to recursively search and replace the tag in all of the templates. I tried to do this using find and sed like so.
find . -name *.html -exec sed -i 's/\{\{\s*oldtag\s*\}\}/{{ newtag }}/g' {} \;
I get this error.
sed: -e expression #1, char 44: Invalid preceding regular expression
Ok, so I tried a whole bunch of different things to try to make it work. I tried unescaping and double-escaping the curly braces. I tried using [ \t] instead of \s. Nothing seems to work. Some of the combinations don't give an error, but they also don't find or replace anything. What's even worse is sometimes I get this other error.
find: paths must precede expression: index.html
How can the path precede the expression? . is the path, and it immediately follows the find command. It precedes all the expressions.
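Try the following. The glob is quoted so that the shell does not expand *.html before find sees it (that expansion is what produces "paths must precede expression"), and the braces are left unescaped because in sed's default basic regex syntax \{ starts an interval expression: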
find . -name '*.html' -exec sed -i 's|{{\s*oldtag\s*}}|{{ newtag }}|g' {} +
With some assumptions:
your sed implementation recognizes the \s escape sequence and the -i option
your find implementation supports the {} + syntax
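You should not be escaping the quotes or the braces: a backslash before ' outside quotes makes the shell pass a literal quote to sed and split the script at its spaces, and in sed's default (basic) regex syntax \{ starts an interval expression. Quote the glob too. This should work with GNU sed:
find . -name '*.html' -exec sed -i 's/{{\s*oldtag\s*}}/{{ newtag }}/g' {} \;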
Tip: You can always insert echo just before the word sed to print the command as find would run it and see what survives the shell's quoting.
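The effect of quoting the glob can be seen by counting the words the shell produces. This is a sketch in a scratch directory with two made-up template files; it assumes GNU sed for \s and -i.

```shell
# Hypothetical templates in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '<p>{{oldtag}} and {{  oldtag }}</p>' > "$tmp/index.html"
printf '%s\n' '<p>{{ oldtag }}</p>'                 > "$tmp/base.html"

# Unquoted, the shell expands the glob into one word per matching file,
# which is how find ends up with stray path arguments after -name.
set -- "$tmp"/*.html
unquoted_words=$#
set -- '*.html'
quoted_words=$#
echo "unquoted: $unquoted_words word(s), quoted: $quoted_words word(s)"

# With the glob quoted and the braces unescaped, the replacement goes through.
find "$tmp" -name '*.html' -exec sed -i 's|{{\s*oldtag\s*}}|{{ newtag }}|g' {} +
index_after=$(cat "$tmp/index.html")
printf '%s\n' "$index_after"
rm -rf "$tmp"
```

With two .html files present, the unquoted glob becomes two words while the quoted one stays a single pattern for find to interpret.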