How to grep some particular word out of multiple text files

I have multiple text files (say, a.txt) whose contents are as shown below. I need to grep the ICV tool version (in this case ICV:2018.12-1) out of these files. Can anybody help me grep the tool version after ICV? The version (2018.12-1) changes from one text file to another, but the format remains the same for all files: ;tool name:tool version;tool name:tool version;
1) a.txt
setenv VDK_TOOL_VERS 'CDESIGNER:2014.12-SP2-2;CUSTOMCOMPILER:2018.09-SP1-1;HSPICE:2018.09-SP1-1;XA:2018.09-SP2;STARRCXT:2018.06-SP5;ICV:2018.12-1;PYLCC:2008.09-SP4-11;CALIBRE:2018.2-15.10;CIRANOVA:2012.12-1-gcc44x;HERCULES:2008.09-SP5-2;WAVEVIEW:2019.06';

Search for what is not a semicolon:
$ grep -o 'ICV:[^;]*' a.txt
ICV:2018.12-1
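Only the version, using a lookbehind (-P enables Perl-compatible regular expressions; it is a GNU grep option and is not available in every grep):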
$ grep -Po '(?<=ICV:)[^;]*' a.txt
2018.12-1
Lookahead & Lookbehind
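The question mentions several files, so here is a minimal sketch, assuming a POSIX shell, of what the first command prints when given more than one file, plus a way to get just the version without -P. The file b.txt and its ICV:2019.06-SP1 value are hypothetical samples in the a.txt format.

```shell
#!/bin/sh
# Sketch: extract the ICV version from several files in one pass.
# b.txt and both file contents are hypothetical samples in the a.txt format.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' "setenv VDK_TOOL_VERS 'HSPICE:2018.09-SP1-1;ICV:2018.12-1;PYLCC:2008.09-SP4-11';" > a.txt
printf '%s\n' "setenv VDK_TOOL_VERS 'XA:2018.09-SP2;ICV:2019.06-SP1;WAVEVIEW:2019.06';" > b.txt

# With more than one file, grep prefixes every match with "file name:".
with_names=$(grep -o 'ICV:[^;]*' a.txt b.txt)
echo "$with_names"    # a.txt:ICV:2018.12-1 and b.txt:ICV:2019.06-SP1

# -h drops the file-name prefix; sed removes the "ICV:" label, so -P is not needed.
versions=$(grep -ho 'ICV:[^;]*' a.txt b.txt | sed 's/^ICV://')
echo "$versions"      # 2018.12-1 and 2019.06-SP1

cd / && rm -r "$dir"
```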

Related

Finding the number of xml files having a particular word in a directory

I read Chris Maes's answer on how to grep for a particular word in any file or files contained in a directory, and it worked.
But how do I find the names of the files, and the total number of files, containing that word?
Please correct me if I am wrong. Thanks in advance!

You just need to add --files-with-matches to the grep command:
grep -R --files-with-matches --include="*.xml" "pattern" /path/to/dir
If you want the number of matched files, pipe the output of the above grep to wc -l (or pipe it to nl, which numbers each line).
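A minimal sketch of this answer on a throwaway directory. The tree and file names are hypothetical, and it assumes a grep that supports --include, as GNU grep does.

```shell
#!/bin/sh
# Sketch: list and count the .xml files that contain "pattern".
# The directory tree below is a hypothetical example.
dir=$(mktemp -d)
mkdir "$dir/sub"
printf '<a>pattern</a>\n' > "$dir/one.xml"
printf '<b>other</b>\n' > "$dir/two.xml"
printf '<c>pattern</c>\n<d>pattern</d>\n' > "$dir/sub/three.xml"
printf 'pattern\n' > "$dir/notes.txt"   # skipped: not *.xml

# --files-with-matches prints each matching file once, however many hits it has.
files=$(grep -R --files-with-matches --include='*.xml' 'pattern' "$dir" | sort)
echo "$files"

# wc -l counts the listed names; tr strips the padding some wc versions add.
count=$(grep -R --files-with-matches --include='*.xml' 'pattern' "$dir" | wc -l | tr -d ' ')
echo "$count"    # 2

rm -r "$dir"
```

Because wc -l counts lines, a file name that itself contains a newline would be counted twice.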

grep and replace multiple strings between multiple files

I have multiple XML files with strings that are all set to
<msg type="Status"
Some of these strings should be type "Warning" and I have a separate text file with the warning strings.
I can do a
grep -f strings.txt *.xml
This shows me which warning strings are incorrectly listed as status strings. What I would like to do is
grep -f strings.txt *.xml | sed 's/status/warning/'
This gives me my desired output, but it is only displayed, not saved. There are multiple XML files, so I can't just save the output to one file. I need sed to replace the string in the original .xml file it came from. Thanks for your help.

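You were close. Instead of grep -f strings.txt *.xml | sed 's/status/warning/'
do
grep -l -f strings.txt *.xml | xargs sed -i 's/type="Status"/type="Warning"/g'
You forgot to use xargs; see the xargs man page for more info. Here grep -f reads the patterns from strings.txt and -l prints only the names of the matching files. Note that sed -i then edits every type="Status" line in those files, not only the lines that contain a warning string. Also, -i without a suffix argument is GNU sed; BSD/macOS sed needs sed -i ''.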
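If only the lines containing one of the warning strings should change (not every Status line in a matched file), one possible approach is to do the per-line check in awk. This is a sketch, not the answerer's method: it assumes the strings in strings.txt should be matched literally, and all file names and contents are hypothetical.

```shell
#!/bin/sh
# Sketch: set type="Warning" only on lines that contain a string from strings.txt.
# Strings are matched literally (like grep -F); file names and contents are hypothetical.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' 'Disk almost full' 'Fan failure' > strings.txt
printf '%s\n' '<msg type="Status">Boot complete</msg>' \
              '<msg type="Status">Disk almost full</msg>' > a.xml
printf '%s\n' '<msg type="Status">Fan failure</msg>' > b.xml
printf '%s\n' '<msg type="Status">All good</msg>' > c.xml

# grep -l picks the files to touch; awk rewrites each one through a temp file.
# (The unquoted $(...) assumes file names without spaces.)
for f in $(grep -l -F -f strings.txt *.xml); do
  awk 'NR == FNR { if ($0 != "") warn[$0] = 1; next }   # first file: load strings
       { for (s in warn) if (index($0, s)) { sub(/type="Status"/, "type=\"Warning\""); break } }
       { print }' strings.txt "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

result=$(cat a.xml b.xml c.xml)
echo "$result"
cd / && rm -r "$dir"
```

The temp-file-and-mv step avoids relying on sed -i or gawk's in-place extension, whose spelling differs between systems.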

I recommend trying xmllint. With its --xpath option you can run XPath queries that find exactly what you need, without complicated regular expressions (its --pattern option only takes a subset of XPath and is meant for testing the streaming reader).

There is a Perl script, xml_grep (part of XML::Twig), that uses XPath as its query language:
http://search.cpan.org/dist/XML-Twig/tools/xml_grep/xml_grep
xml_grep '//msg[@type="Warning"]' *.xml

How to grep for a file extension

I am currently trying to make a script that greps its input to see if something is of a certain file type (zip, for instance). The text before the extension could be anything, so for instance
something.zip
this.zip
that.zip
would all fall under that category. I am trying to grep for these using a wildcard, and so far I have tried this:
grep ".*.zip"
That finds the .zip files just fine, but it also matches names with extra characters after .zip, so .zippppppp or .zipdsjdskjc would still be picked up by grep. How can I prevent grep from matching names that have additional characters after .zip?

Test for the end of the line with $ and escape the second . with a backslash so it only matches a period and not any character.
grep ".*\.zip$"
However, ls *.zip is a more natural way to do this if you want to list all the .zip files in the current directory, and find . -name "*.zip" lists all .zip files in the subdirectories, starting from (and including) the current directory.
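To see the difference the escape and the anchor make, here is a small sketch fed with made-up names, including the question's .zippppppp and .zipdsjdskjc cases.

```shell
#!/bin/sh
# Sketch: compare the unescaped pattern with the escaped, end-anchored one.
# The file names are made-up test input.
names='something.zip
this.zip
that.zip
archive.zippppppp
notes.zipdsjdskjc'

# ".*.zip": the second dot matches any character and nothing pins the end.
loose=$(printf '%s\n' "$names" | grep ".*.zip")
# ".*\.zip$": literal dot, and "zip" must end the line.
strict=$(printf '%s\n' "$names" | grep ".*\.zip$")

echo "$loose"     # all five names
echo "$strict"    # only the three real .zip names
```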

On UNIX, try:
find . -type f -name \*.zip

You can also use grep to find all files with a specific extension:
find .|grep -e "\.gz$"
The . means the current folder.
If you want to specify a folder other than the current folder, just replace the . with the path of the folder.
For example, let's find all files that end with .gz in the folder /var/log:
find /var/log/ |grep -e "\.gz$"
The output is something similar to the following:
$ find /var/log/ |grep -e "\.gz$"
/var/log//mail.log.1.gz
/var/log//mail.log.0.gz
/var/log//system.log.3.gz
/var/log//system.log.7.gz
/var/log//system.log.6.gz
/var/log//system.log.2.gz
/var/log//system.log.5.gz
/var/log//system.log.1.gz
/var/log//system.log.0.gz
/var/log//system.log.4.gz
The $ anchors the match to the end of the line, so only names that end in .gz are listed.

I use this to get a listing of the file types inside a folder.
find . -type f | grep -E -o '\.\w*$' | sort -u
Example output:
.DS_Store
.MP3
.aif
.aiff
.asd
.doc
.flac
.jpg
.m4a
.m4p
.m4r
.mp3
.pdf
.png
.txt
.wav
.wma
.zip
Bonus: with
find . -type f | grep -E -o '\.\w*$' | sort | uniq -c
you'll get a count per extension:
106 .DS_Store
35 .MP3
89 .aif
5 .aiff
525 .asd
1 .doc
60 .flac
48 .jpg
149 .m4a
11 .m4p
1 .m4r
12844 .mp3
1 .pdf
5 .png
9 .txt
108 .wav
44 .wma
2 .zip
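The listing above reports .MP3 and .mp3 as separate entries. If you would rather count them together, one option is to lowercase the extensions before counting. This sketch assumes GNU grep (for \w) and uses a made-up directory tree.

```shell
#!/bin/sh
# Sketch: count files per extension, folding case so .MP3 and .mp3 are merged.
# The directory layout below is a made-up example.
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir music docs
touch music/a.mp3 music/b.mp3 music/c.MP3 docs/report.pdf docs/notes.txt docs/README

# README has no extension, so grep -o prints nothing for it.
# LC_ALL=C makes the sort order byte-wise and reproducible across locales.
counts=$(find . -type f | grep -E -o '\.\w*$' | tr '[:upper:]' '[:lower:]' |
  LC_ALL=C sort | uniq -c | awk '{ print $2, $1 }')
echo "$counts"

cd / && rm -r "$dir"
```

The final awk step only swaps the columns to "extension count" and drops the padding that uniq -c adds.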

You need to do a couple of things. It should look like this:
grep '.*\.zip$'
You need to escape the second dot, so it will just match a dot, and not any character. Using single quotes makes the escaping a bit easier.
You need the dollar sign at the end of the line to indicate that you want the "zip" to occur at the end of the line.

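To search the contents of only the files with a given extension, restrict a recursive grep with --include:
grep -r pattern --include="*.txt" /path/to/dir/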

Try: grep -o -E '(\.[A-Za-z]+)+'
I used this to get multi-dotted/multiple extensions. So if the input was hello.tar.gz, it would output .tar.gz.
For single dotted, use grep -o -E '\.[A-Za-z]+$'.
Tested on Cygwin/MingW+MSYS.

One more variation of the above example, which also allows digits:
# multi-dotted/multiple extensions
grep -oEi '(\.[a-z0-9]+)+' file.txt
# single dotted
grep -oEi '\.[a-z0-9]+$' file.txt
This also gets extensions that contain digits, such as '.mp3'.
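A range like [A-z] is worth avoiding: in the C locale it spans every ASCII character from A to z, which also includes [ \ ] ^ _ and the backquote. This sketch uses hypothetical file names and adds a $ anchor so only the trailing extensions are reported.

```shell
#!/bin/sh
# Sketch: [A-z] versus [A-Za-z0-9] when extracting extensions.
# The sample names are hypothetical; LC_ALL=C makes ranges byte-based.
names='hello.tar.gz
song.mp3
weird.a_b'

buggy=$(printf '%s\n' "$names" | LC_ALL=C grep -o -E '(\.[A-z0-9]+)+$')
fixed=$(printf '%s\n' "$names" | LC_ALL=C grep -o -E '(\.[A-Za-z0-9]+)+$')

echo "$buggy"    # .tar.gz, .mp3 and the bogus .a_b
echo "$fixed"    # .tar.gz and .mp3 only
```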

Just reviewing some of the other answers: the .* isn't necessary, and if you're looking for a certain file extension, it's best to include -i so the match is case-insensitive, in case the file is HELLO.ZIP, for example. The quotes are necessary, though: unquoted, the shell strips the backslash and grep receives .zip$, where the dot matches any character.
grep -i '\.zip$'
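Whether the quotes matter is easy to check by printing the argument the shell actually hands to grep. A small sketch with made-up names:

```shell
#!/bin/sh
# Sketch: what grep receives with and without quotes. Sample names are hypothetical.
names='HELLO.ZIP
backupzip
notes.zipx'

# Unquoted, the shell removes the backslash, so grep gets ".zip$" (any character + zip).
unquoted_arg=$(printf '%s\n' \.zip$ )
loose=$(printf '%s\n' "$names" | grep -i \.zip$ )

# Single-quoted, grep gets "\.zip$" and the dot is literal.
quoted_arg=$(printf '%s\n' '\.zip$')
strict=$(printf '%s\n' "$names" | grep -i '\.zip$')

printf 'unquoted: grep got %s, matched:\n%s\n' "$unquoted_arg" "$loose"
printf 'quoted: grep got %s, matched:\n%s\n' "$quoted_arg" "$strict"
```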

If you only want to search the current folder, why not use this simple command without grep?
ls *.zip

Simply do:
grep '\.zip$'
The "$" anchors the match to the end of the line, and the backslash makes the dot match only a literal period.

How do I use grep in the terminal to print a list of files matching a specific grep pattern?

For a school project, I have to SSH into the school server, go to the /usr/bin folder, which has a list of files, and print a list of the files whose names start with "file". I know regex half-decently, at least conceptually, but I'm not sure of the UNIX command to do this.
I tried grep '^[file][a-zA-Z0-9]*' (start of a line, the letters f-i-l-e, then 0 or more letters or digits), but that doesn't seem to work.
Help?

You can use the find command for this once you are connected to your school server.
find /usr/bin -type f -name "file*"
How would I do it if I wanted all files that start with a OR b, and end with a OR b?
Using find (-regex is matched against the whole path, so the pattern has to allow for the leading directories):
find /usr/bin -type f -regex '.*/[ab][^/]*[ab]'
Using ls and grep:
ls -1 /usr/bin | grep "^[ab].*[ab]$"
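GNU and BSD find match -regex against the entire path (for example /usr/bin/abba), not just the file name, so a pattern starting with ^[ab] cannot match anything under /usr/bin. This sketch uses a temporary directory with hypothetical file names as a stand-in for /usr/bin.

```shell
#!/bin/sh
# Sketch: find -regex sees the whole path. File names are hypothetical.
dir=$(mktemp -d)
touch "$dir/abba" "$dir/bob" "$dir/apple" "$dir/cab"

# Never matches: every path starts with "/", so ^[ab] fails.
broken=$(find "$dir" -type f -regex '^[ab].*[ab]$')

# Match the last path component instead: ".*/" consumes the directories.
by_regex=$(find "$dir" -type f -regex '.*/[ab][^/]*[ab]' | sed 's|.*/||' | sort)

# The same selection with a shell-style -name pattern.
by_name=$(find "$dir" -type f -name '[ab]*[ab]' | sed 's|.*/||' | sort)

echo "broken: [$broken]"
echo "$by_regex"
echo "$by_name"
rm -r "$dir"
```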

You should be able to use a simple ls command to get this information.
cd /usr/bin
ls -1 file*

For more complex matches, you could pipe the output of ls to grep, but wwomack's solution is simplest for your scenario.
# for file names starting with "file"
ls /usr/bin | grep ^file
# more complex file names
ls /usr/bin | grep "^[ab].*[ab]$"
# files that do not start with alphabetic characters
ls -a | grep '^[^a-zA-Z]'
grep works on the contents of files, not on file names. But with pipes (|), you can treat the output of one command (its stdout) as the input (stdin) of another command.
You'll want to study regular expressions (and grep) more on your own, but here are some basics. First, grep operates line by line, comparing each line to the regex and printing it if it matches. At the beginning of the regex, ^ anchors the match to the beginning of the line; at the end, $ anchors it to the end. If the pattern does not begin or end with these symbols, then any substring of the line that matches the pattern causes the line to match.
For example, grep ^file$ only matches a line that contains exactly the word file, while grep file matches any line that contains file anywhere. grep file$ matches lines that end with file, with 0 or more characters before it.
Regarding the part of your question about files "whose names do not start with either a lowercase or an uppercase English letter", your command could be much simpler (see the third example), but also notice that you began the pattern with $: since $ matches the end of the line, your regex can never match. One final note: in my example, I used ls -a to list all files, including hidden dot files. On Unix and Linux systems, a file whose name starts with a dot does not normally show up when you list a directory.
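The anchor rules described above can be tried directly on a few made-up input lines; this sketch prints each result set followed by a --- separator.

```shell
#!/bin/sh
# Sketch: how ^ and $ change what grep matches. The input lines are made up.
lines='file
file1
myfile
profile.d'

exact=$(printf '%s\n' "$lines" | grep '^file$')    # the whole line is "file"
starts=$(printf '%s\n' "$lines" | grep '^file')    # line begins with "file"
ends=$(printf '%s\n' "$lines" | grep 'file$')      # line ends with "file"
anywhere=$(printf '%s\n' "$lines" | grep 'file')   # "file" appears anywhere

printf '%s\n---\n' "$exact" "$starts" "$ends" "$anywhere"
```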

Regex to find external links from the html file using grep

For the past few days I have been trying to develop a regex that fetches all the external links from the web pages given to it, using grep.
Here is my grep command:
grep -h -o -e "\(\(mailto:\|\(\(ht\|f\)tp\(s\?\)\)\)\://\)\{1\}\(.*\?\)" "/mnt/websites_folder/folder_to_search" -r
Now grep seems to return everything after the external link on that line.
Example
If an HTML file contains something like this on the same line:
<a href="http://www.google.com">Google</a><p><a href='https://yahoo.com'>Yahoo</a></p>
then the grep command returns the following result:
http://www.google.com">Google</a><p><a href='https://yahoo.com'>Yahoo</a></p>
The idea is that if an HTML file contains more than one link on the same line (whether in a, img, etc.), the regex should fetch only the links and not the whole content of that line.
I managed to develop this on rubular.com; the regex is as follows:
("|')(\b((ht|f)tps?:\/\/)(.*?)\b)("|')
It works with the above input, but I am not able to replicate it in grep.
Can anyone help?
I can't modify the HTML files, so please don't suggest that. Nor can I look at each specific tag and check its attributes to get the external links, as that adds processing time and my application doesn't need it.
Thank You

Try this:
grep -E -o "(mailto:|(ftp|https?)://)[^'\"]+" /path/to/file
It outputs one link per line and assumes every link is inside single or double quotes. To exclude links to a certain domain, filter the result with grep -v:
grep -E -o "(mailto:|(ftp|https?)://)[^'\"]+" /path/to/file | grep -v "yahoo\.com"
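Here is a sketch that runs a quote-terminated pattern over the example line from the question, plus a hypothetical mailto link. A grouping like (mailto:|(ftp|https?)://) keeps :// attached to both ftp and http(s), and, as in the answer, it assumes every link sits inside single or double quotes.

```shell
#!/bin/sh
# Sketch: extract quoted links from one line of HTML. The mailto link is hypothetical.
html="<a href=\"http://www.google.com\">Google</a><p><a href='https://yahoo.com'>Yahoo</a></p><a href=\"mailto:someone@example.com\">Mail</a>"

# Each match starts at a scheme and stops before the next quote character.
links=$(printf '%s\n' "$html" | grep -E -o "(mailto:|(ftp|https?)://)[^'\"]+")
echo "$links"

# Drop links to one domain (the dot is escaped so it matches literally).
filtered=$(printf '%s\n' "$links" | grep -v 'yahoo\.com')
echo "$filtered"
```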

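By default grep prints the entire line a match was found on. The -o switch prints only the matched parts of a line, each on its own line. See the man page. Note also that grep's basic and extended regexes have no lazy quantifiers: \(.*\?\) in the question means "optionally .*", which is still greedy and runs to the end of the line. Stopping the match at a quote with a negated class such as [^'"]+ avoids the problem; with GNU grep, -P (PCRE) also makes .*? lazy.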

We Keep Coding
