How can I find more than one newline before a } in code with a regular expression, and add a warning for it? This is the bash script I tried adding in Xcode:
TAGS2="\}/\n"
echo "searching ${SRCROOT} for ${TAGS2}"
find "${SRCROOT}" \( -name "*.h" -or -name "*.m" \) -print0 | xargs -0 \
egrep --with-filename --line-number --only-matching "($TAGS2).*\$" |
perl -p -e "s/($TAGS2)/ warning\$1/"
If you mean that you would like to find places in your file where there is a blank line before a line whose first non-blank character is a }, the following awk program might help.
awk '/\S/ { blanks = NR - prev - 1; prev = NR }
blanks && /^\s*}/ { printf "%d: WARNING: Useless blank line\n", NR - 1 }'
That will work with GNU awk; with more POSIX-compatible awks you'll need to use /[^ \t]/ and /^[ \t]*}/ (or something similar) instead of the given patterns.
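To see the POSIX-style patterns in action, here is a minimal sketch that runs the same awk program on a small, made-up source file (the file contents are an assumption for the demonstration only):

```shell
# Hypothetical sample file, created only for this demonstration.
tmp=$(mktemp)
printf 'void f() {\n    g();\n\n}\n' > "$tmp"

# Same logic as above, with [ \t] classes in place of the GNU-only \S and \s.
result=$(awk '/[^ \t]/ { blanks = NR - prev - 1; prev = NR }
blanks && /^[ \t]*}/ { printf "%d: WARNING: Useless blank line\n", NR - 1 }' "$tmp")

echo "$result"
rm -f "$tmp"
```

The blank line is line 3 of the sample, so that is the line number the warning should carry.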
egrep is a wonderful tool but it does line-by-line matching. It cannot recognize multiline patterns, nor can you change the line delimiter.
egrep does not work with multiple lines.
You can use sed (or awk as mentioned in previous answer) instead of "egrep". No need to use perl. See the script below.
TAGS2="}"
echo "searching ${SRCROOT} for ${TAGS2}"
find "${SRCROOT}" \( -name "*.h" -or -name "*.m" \) \
-print -exec sed "/$TAGS2/{ # search for tag
=; # print line number
N; # join with next line
s/$TAGS2/warning/; # substitute tags found with warning
}" {} \;
I want to replace
{not STRING }
with
(not STRING )
I ran
find . -maxdepth 1 -type f -exec sed -i -E 's/{not\s([^\s}]+)\s}/(not \1 )/g' {} ;
It worked on some of the matches. When I run grep with the same pattern it shows more files that still have STRING. Ran find/sed again, same result.
You need to escape curly braces ({}), as they are regex meta-characters. Also \s is not POSIX sed, I would use the more portable [[:space:]].
Your code did not work on the example text for me (GNU/Linux). This does:
sed -E 's/\{not[[:space:]]+([^[:space:]}]+)[[:space:]]+\}/(not \1 )/g'
I also allowed for variable length whitespace directly after not and directly before } (using [[:space:]]+). You may or may not want that.
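A quick way to check the expression before touching any files is to feed it a sample line on standard input. This is only a sketch; the sample text is made up:

```shell
# Made-up sample text with one and with several spaces around the word.
sample='a {not FOO } b {not   BAR  } c'

result=$(printf '%s\n' "$sample" |
  sed -E 's/\{not[[:space:]]+([^[:space:]}]+)[[:space:]]+\}/(not \1 )/g')

echo "$result"
```

Both occurrences should come out in the (not STRING ) form, with the extra whitespace collapsed by the replacement text.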
Also:
On macOS sed, you need to supply a suffix argument to -i.
The trailing ; for find -exec must be quoted (\;) to avoid interpretation by the shell.
So the command would be:
find . -maxdepth 1 -type f -exec \
sed -E -i .TMP 's/\{not[[:space:]]+([^[:space:]}]+)[[:space:]]+\}/(not \1 )/g' {} \;
If .TMP conflicts with an existing file, choose a different suffix.
I'm trying to pipe the output of a find command to a perl one-liner to replace a line that ends with ?> with RedefineForDocker::standardizeXmlmc() but for some reason the value isn't being replaced. I've checked the output of the find command and it is performing as expected, and I've double checked my regex and it should match.
find . -name *.php -exec ggrep -Ezl 'class XmlMethodCall.*([?]>)$' {} \; \
| xargs perl -ewpn -i.bak2 \
"s/[?]>\s*?$/RedefineForDocker::standardizeXmlmc()\n/gm"
I get no warnings and no indication that it isn't working, the backups are created, but the file remains unchanged. The list of matched files run from the find command is below.
./swsupport/clisupp/trending/services/data.helpers.php
./swsupport/clisupp/_bpmui/arch/service/data.helpers.php
./swsupport/clisupp/_bpmui/itsm/service/data.helpers.php
./swsupport/clisupp/_bpmui/itsm_default/service/data.helpers.php
./webclient_code/php/session.php
./webclient_code/service/storedquery/helpers.php
./php/_phpinclude/itsm/xmlmc/xmlmc.php
./php/_phpinclude/itsmf/xmlmc/xmlmc.php
./php/_phpinclude/itsm_default/xmlmc/xmlmc.php
Here is an example of one of the files it should match
https://regex101.com/r/BUoCif/1
Run your perl command as this:
perl -i.bak2 -wpe 's/\?>\h*$/RedefineForDocker::standardizeXmlmc()\n/gm'
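The order of the command-line options matters here: -e must come last in the bundle, because it takes whatever follows it as the code to run. With -ewpn, perl sees wpn as the program and never enables -p or -n.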
Full pipeline should be like this:
find . -name '*.php' -exec ggrep -PZzl '(?ms)class XmlMethodCall.*\?>\h*$' {} + |
xargs -0 perl -i.bak2 -wpe 's/\?>\h*$/RedefineForDocker::standardizeXmlmc()\n/gm'
Note the use of the -Z option in grep and the -0 option in xargs to handle filenames containing whitespace and similar characters.
I am using grep to search through text files containing 88-character-long MRZs (machine readable zones). Within the text file they are preceded by a semicolon.
I only want to get the substring of characters 3-5 from the string.
This is my pattern:
egrep --include *.txt -or . -e ";[A-Z][A-Z0-9<][A-Z<]{3}"
This is a textfile:
text is here;P<RUSIVAN<<DEL<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<F64D123456RUS7404124F131009734P41234<<<<<<<8 ;2019-02-08
This is my output:
;P<RUS
This is my desired output:
RUS
The semicolon introduces the MRZ. It starts with an uppercase letter, followed by either an uppercase letter, a digit or a filler character <. Then follows the 3-character country code, which can contain uppercase letters or filler characters <.
This pattern works fine, but I only want the last 3 characters I am quantifying to be returned. Is there a way to get only the last 3 characters of a matching pattern?
In the sample text file the desired output would be RUS.
Thank you!
If you can use GNU grep, you can make use of \K, which drops everything matched so far from the reported match, and then match your character class 3 times:
grep -roP --include=*.txt ";[A-Z][A-Z0-9<]\K[A-Z<]{3}"
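If your grep has no -P, one possible workaround (a sketch, not part of the answer above) is to keep the original -oE pattern and trim the match afterwards with cut. The sample line below is a shortened, made-up version of the one in the question:

```shell
# Shortened sample line, for demonstration only.
line='text is here;P<RUSIVAN<<DEL<<<<F64D123456RUS7404124F131009734P41234<<<<<<<8 ;2019-02-08'

# grep -o prints ";P<RUS"; cut keeps characters 4 to 6 of that match.
code=$(printf '%s\n' "$line" |
  LC_ALL=C grep -oE ';[A-Z][A-Z0-9<][A-Z<]{3}' |
  cut -c4-6)

echo "$code"
```

This relies on the match always being six characters long, which holds for this pattern because every element has a fixed width.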
Is this all you're trying to do?
$ awk -F';' '{print substr($2,3,3)}' file
RUS
$ sed -E 's/[^;]*;..(.{3}).*/\1/' file
RUS
If not then edit your question to provide more truly representative sample input/output.
The UNIX command to find files is named find, btw, not grep. I know the GNU guys added a bunch of options for finding files to grep but just don't use them as they make your grep command unnecessarily complicated (and inconsistent with the other UNIX text processing tools) as it then needs arguments to find files as well as to g/re/p within the files. So your command line if you're using grep should be:
find . -name '*.txt' -exec grep 'stuff' {} +
not:
egrep --include *.txt -or . -e 'stuff'
and do the same for any other tool:
find . -name '*.txt' -exec grep 'stuff' {} +
find . -name '*.txt' -exec sed 'stuff' {} +
find . -name '*.txt' -exec awk 'stuff' {} +
I need to search a large group of data files. I want to find files that contain the string "foo\tbar\tboo". I have tried this ...
$ find . -name "foo*dat" -exec grep foo {} \; | less
"miscinfo_foo" => [
"foo\tbar\tnot_foo"
"miscinfo_foo",
"miscinfo_foo" => [
"foo\tbar\tyes_foo"
"miscinfo_foo",
But if I do ...
$ find . -name "foo*dat" -exec grep -E "foo\tbar" {} \;
... I get no output. I have tried egrep too. I have tried escaping the \t with \\t but still get no output.
What am I doing wrong?
Thanks
Try
find . -name "foo*dat" -exec grep -E 'foo\\tbar' {} \;
                                     ^   ^     ^
in single quotes rather than double, and with an extra backslash. The '' prevent bash from processing backslashes, so that grep will actually see foo\\tbar. Based on your output, I think you are looking for the literal text backslash-tee, not an ASCII character 9, so double the backslash to have grep match it as literal text.
There are two effects at play here:
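grep does not treat \t in the pattern as a literal backslash followed by t: it is an escape sequence (a tab under GNU grep -P, and undefined by POSIX in -E mode).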
The shell will expand \\ to \ within a double-quoted string.
You want the backslash itself to be escaped, so you need to pass \\t to grep within single quotes:
grep 'foo\\tbar'
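To make the difference between the literal text \t and a real tab concrete, here is a small sketch on a made-up file. The -F variant and the printf trick for a real tab are additions for comparison, not part of the answers above:

```shell
tmp=$(mktemp)
# Line 1 holds the two characters backslash and t; lines 2 and 3 hold real tabs.
printf '%s\n' '"foo\tbar\tyes_foo"' > "$tmp"
printf 'foo\tbar\n' >> "$tmp"
printf 'foo\tbar\tbaz\n' >> "$tmp"

# Escaped backslash: matches only the literal backslash-t line.
literal=$(grep -c 'foo\\tbar' "$tmp")
# Fixed-string mode: no regex escapes at all, so no doubling is needed.
fixed=$(grep -cF 'foo\tbar' "$tmp")
# A real tab character placed in the pattern matches the other two lines.
tab=$(grep -c "foo$(printf '\t')bar" "$tmp")

echo "literal=$literal fixed=$fixed tab=$tab"
rm -f "$tmp"
```

If you never need regex features for this search, grep -F is probably the simplest way to avoid the quoting question entirely.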
I'm trying to batch rename text files according to a string they contain.
I used sed to isolate the pattern with \( and \) as I couldn't get this to work in grep.
sed -i '' 's/<title>\(.*\)<\/title>/&/g' *.txt | mv *.txt $sed.txt
(the text I want to use as filename is between html title tags)
Where I wrote $sed would be the output of sed.
hope that's clear!
A simple loop in bash can accomplish this. If each file is valid HTML, meaning you have only one <title> tag in the file, you can rename them all this way:
for file in *.txt; do
    mv "$file" "$(sed -n 's/<title>\([^<]*\)<\/title>/\1/p' "$file" | sed -e 's/[[:blank:]][[:blank:]]*/_/g').txt"
done
So, if you have files 1.txt, 2.txt and 3.txt, with cat, dog and my hippo respectively in their <title> tags, you'll end up with cat.txt, dog.txt and my_hippo.txt after the above loop.
EDIT: quoted $file in case there are spaces in filenames, and added a second sed to convert any whitespace in the <title> text to underscores in the resulting filenames. NOTE: [[:blank:]] in the second sed command matches a space or a tab character.
You can enclose an expression in grave accent characters (`) to insert its output at the place you want. Try:
mv *.txt `sed -i '' 's/<title>\(.*\)<\/title>/&/g' *.txt`.txt
It is not very flexible, but should work.
(I haven't used it in a while and cannot test it now, so I might be wrong).
Here is the command I would use:
for i in *.txt ; do
sed "s=<title>\(.*\)</title>=mv '$i' '\1'=e" $i
done
The sed substitution searches for the pattern in each of your .txt files. For each file it builds the string mv 'file_name' 'found_pattern'.
With the e flag at the end of the sed s command (a GNU sed extension), the resulting string is executed directly as a shell command, which renames your files.
Some hints:
Note the use of = instead of / as the delimiter for the sed substitution: it's more readable since you already have a / in your pattern (you could use many other symbols if you don't like =), and this way you don't have to escape the / in your pattern.
The e command for sed executes the created string.
(I'm speaking of this one below:
sed "s=<title>\(.*\)</title>=mv '$i' '\1'=e" $i
                                          ^
)
So use it with caution! I would recommend first running the line without the final e: it won't execute any mv command, but will just print what would be executed if you were to add the e.
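One way to do such a dry run, sketched here with a made-up file, is to swap the e flag for p and add -n, so that sed prints only the generated mv commands and leaves everything else out:

```shell
# Throwaway directory with one invented file.
tmpdir=$(mktemp -d)
printf '<title>cat</title>\n' > "$tmpdir/1.txt"

preview=$(
  cd "$tmpdir" || exit 1
  for i in *.txt; do
    # p instead of e: print the generated command rather than run it.
    sed -n "s=<title>\(.*\)</title>=mv '$i' '\1'=p" "$i"
  done
)

echo "$preview"
rm -rf "$tmpdir"
```

Once the printed commands look right, switching p back to e (GNU sed only) would actually run them.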
What I read from your question is:
you have a number of text (html) files in a directory
each file contains at least the tag <title> ... </title>
you want to extract the content (elements.text) and use it as filename
last you want to rename that file to the extracted filename
Is this correct?
So, then you need to loop through the files, e.g. with xargs or find
ls *.txt | xargs -i\{\} command "{}" ...
find -maxdepth 1 -type f -name '*.txt' -exec command "{}" ... \;
I always set the xargs placeholder with -i\{\}, because the resulting command then also works unchanged with find and its {} placeholder.
The -maxdepth option keeps find from descending into subdirectories; if there are none, you can leave it out.
command could be something very simple like echo "Testing File: {}" or a really small script if you use it with bash:
find . -name '*.txt' -exec bash -c 'CUR_FILE="{}"; echo "Working on: $CUR_FILE"; ls -l "$CUR_FILE";' \;
The big decision for your question is: how to get the text from title element.
A simple solution (suitable if opening and closing tag is on same textline) would be by grep
A more solid solution is to use an HTML parser and navigate the DOM
The simple solution is based on:
get the title line
remove everything before and after the title content
So do it together:
ls *.txt | xargs -i\{\} bash -c 'TITLE=$(egrep "<title>[^<]*</title>" "{}"); NEW_FNAME=$(echo "$TITLE" | sed -e "s#.*<title>\([^<]*\)</title>.*#\1#"); mv -v "{}" "$NEW_FNAME.txt"'
Same with usage of find:
find . -maxdepth 1 -type f -name '*.txt' -exec bash -c 'TITLE=$(egrep "<title>[^<]*</title>" "{}"); NEW_FNAME=$(echo "$TITLE" | sed -e "s#.*<title>\([^<]*\)</title>.*#\1#"); mv -v "{}" "$NEW_FNAME.txt"' \;
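A variant worth considering, offered here only as a sketch: pass the file name to the inline script as a positional argument instead of pasting {} into the script text, so that file names containing quotes or $ are not re-interpreted by the shell. The sample file, the head -n 1 and the empty-title guard are assumptions added for the demonstration:

```shell
# Throwaway directory with one invented file.
tmpdir=$(mktemp -d)
printf '<html>\n<title>dog</title>\n</html>\n' > "$tmpdir/2.txt"

find "$tmpdir" -maxdepth 1 -type f -name '*.txt' -exec sh -c '
  for f do
    # Take the first title line and keep only the text between the tags.
    title=$(grep "<title>[^<]*</title>" "$f" | head -n 1 |
            sed -e "s#.*<title>\([^<]*\)</title>.*#\1#")
    # Skip files without a usable title instead of renaming them to ".txt".
    if [ -n "$title" ]; then
      mv "$f" "$(dirname "$f")/$title.txt"
    fi
  done' sh {} +

renamed=$(ls "$tmpdir")
echo "$renamed"
rm -rf "$tmpdir"
```

Using {} + also starts one shell for many files rather than one per file.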
Hopefully it is what you expected.