Using sed to find lines with specific keywords - regex

This is in bash using CentOS
I am attempting to use sed to scan a text file to find lines that contain both the phrases "define" and "REV_NUMBER" (what lies before, in between, and after doesn't matter). However, I also want to ignore lines that have "//" in them because these indicate comments (the source file is a Verilog file).
My code is as follows:
REV=$(sed -n '/define REV_NUMBER/p' text.vh)
RESULT=$(echo "$REV")
This covers all lines that include:
define REV_NUMBER
But I want it to include lines that have, say:
define      REV_NUMBER
Or any amount of whitespace between the words.
But ignore lines that have
//define REV_NUMBER
//define REV_NUMBER
// define REV_NUMBER
I am stumped on how to achieve this. I am new to bash/shell scripting and sed; normally I'm a C++ guy using strings.
Thanks

You can use this sed command:
sed -n '/define *REV_NUMBER/{\~^ *//~!p;}' file
define REV_NUMBER

Wouldn't three greps connected by pipes be more readable?
grep -v "//" "$FILE" | grep "define" | grep "REV_NUMBER"
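For what it's worth, here is a minimal sketch that covers both requirements in one sed call. It assumes, as the question states, that any line containing "//" can be dropped outright (so a real definition followed by a trailing comment would be skipped too). The sample file contents and the variable name sample are made up for illustration.

```shell
# Hypothetical sample input standing in for text.vh.
sample=$(mktemp)
printf '%s\n' \
  'define REV_NUMBER 4' \
  'define     REV_NUMBER 5' \
  '//define REV_NUMBER 6' \
  '  // define REV_NUMBER 7' \
  'wire unrelated;' > "$sample"

# 1) \~//~d drops every line containing "//" (the custom delimiter ~
#    avoids having to escape the slashes).
# 2) Remaining lines are printed only if "define" and "REV_NUMBER" are
#    separated by one or more whitespace characters.
REV=$(sed -n -e '\~//~d' -e '/define[[:space:]]\{1,\}REV_NUMBER/p' "$sample")

printf '%s\n' "$REV"
rm -f "$sample"
```

If comments should only count when "//" appears before the keyword, the first expression would need to be narrowed accordingly.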


Use of grep + sed based on a pattern file?

Here's the problem: i have ~35k files that might or might not contain one or more of the strings in a list of 300 lines containing a regex each
if I grep -rnwl 'C:\out\' --include=*.txt -E --file='comp.log' i see there are a few thousands of files that contain a match.
now how do i get sed to delete each line in these files containing the strings in comp.log used before?
edit: comp.log contains a simple regex in each line, but for the most part each string to be matched is unique
this is an example of how it is structured:
server[0-9]\/files\/bobba fett.stw
[a-z]+ mochaccino
[2-9] CheeseCakes
...
etc. silly examples aside, it goes to show each line is unique save for a few variations so it shouldn't affect what i really want: see if any of these lines match the lines in the file being worked on. it's no different than 's/pattern/replacement/' except that i want to use the patterns in the file instead of inline.
Ok here's an update (S.O. gets impatient if i don't declare the question answered after a few days)
after MUCH fiddling with the #Kenavoz/#Fischer approach, i found a totally different solution, but first things first.
creating a modified pattern list for sed to work with does work.
as well as #werkritter's approach of dropping sed altogether. (this one i find the most... err... "least convoluted" way around the problem).
I couldn't make #Mklement's answer work under windows/cygwin (it did work under ubuntu, so...not sure what that means. figures.)
What ended up solving the problem in a more... long term, reusable form was a wonderful program pointed out by a colleague called PowerGrep. it really blows every other option out of the water. unfortunately it's windows only AND it's not free. (not even advertising here, the thing is not cheap, but it does solve the problem).
so considering #werkiter's reply was not a "proper" answer and i can't just choose both #Lars Fischer and #Kenavoz's answer as a solution (they complement each other), i am awarding #Kenavoz the tickmark for being first.
final thoughts: i was hoping for a simpler, universal and free solution but apparently there is not.
You can try this :
sed -r -f <(sed 's/^/\//g;s/$/\/d/g' comp.log) file > outputfile
All regex in comp.log are formatted to a sed address with a d command : /regex/d. This command deletes lines matching the patterns.
The output of this inner sed is passed as a file (with process substitution) to the -f option of the outer sed, which is applied to file.
To delete just the strings matching the patterns (not the whole line):
sed -r -f <(sed 's/^/s\//g;s/$/\/\/g/g' comp.log) file > outputfile
Update :
The command output is redirected to outputfile.
Some ideas but not a complete solution, as it requires some adapting to your script (not shown in the question).
I would convert comp.log into a sed script containing the necessary deletes:
cat comp.log | sed -r "s+(.*)+/\1/ d;+" > comp.sed
That would make your example comp.sed look like:
/server[0-9]\/files\/bobba fett.stw/ d;
/[a-z]+ mochaccino/ d;
/[2-9] CheeseCakes/ d;
then I would apply the comp.sed script to each file reported by grep (With your -rnwl that would require some filtering to get the filename.):
sed -r -i.bak -f comp.sed "$AFileReportedByGrep"
If you have gnu sed, you can use -i inplace replacement creating a .bak backup, otherwise use piping to a temporary file
Both Kenavoz's answer and Lars Fischer's answer use the same ingenious approach:
transform the list of input regexes into a list of sed match-and-delete commands, passed as a file acting as the script to sed via -f.
To complement these answers with a single command that puts it all together, assuming you have GNU sed and your shell is bash, ksh, or zsh (to support <(...)):
find 'c:/out' -name '*.txt' -exec sed -i -r -f <(sed 's#.*#/\\<&\\>/d#' comp.log) {} +
find 'c:/out' -name '*.txt' matches all *.txt files in the subtree of dir. c:/out
-exec ... + passes as many matching files as will fit on a single command line to the specified command, typically resulting only in a single invocation.
sed -i updates the input files in-place (conceptually speaking - there are caveats); append a suffix (e.g., -i.bak) to save backups of the original files with that suffix.
sed -r activates support for extended regular expressions, which is what the input regexes are.
sed -f reads the script to execute from the specified filename, which in this case, as explained in Kenavoz's answer, uses a process substitution (<(...)) to make the enclosed sed command's output act like a [transient] file.
The s/// sed command - which uses alternative delimiter # to facilitate use of literal / - encloses each line from comp.log in /\<...\>/d to yield the desired deletion command; the enclosing of the input regex in \<...\> ensures matching as a word, as grep -w does.
This is the primary reason why GNU sed is required, because neither POSIX EREs (extended regular expressions) nor BSD/OSX sed support \< and \>.
However, you could make it work with BSD/OSX sed by replacing -r with -E, and \< / \> with [[:<:]] / [[:>:]]
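The question's update mentions dropping sed altogether. One way that could look (a sketch only, not necessarily what that answer proposed) is to let grep read the pattern file itself and invert the match. The sample file names and contents below are invented; -w mirrors the word matching used in the original grep call.

```shell
# Hypothetical working directory with a tiny pattern list and one data file.
workdir=$(mktemp -d)
printf '%s\n' '[a-z]+ mochaccino' '[2-9] CheeseCakes' > "$workdir/comp.log"
printf '%s\n' \
  'keep this line' \
  'one large mochaccino please' \
  'we ate 3 CheeseCakes' \
  'keep this too' > "$workdir/sample.txt"

# -f reads one regex per line, -E treats them as extended regexes,
# -w requires whole-word matches, -v keeps only the lines that match none.
grep -v -w -E -f "$workdir/comp.log" "$workdir/sample.txt" > "$workdir/sample.out"

filtered=$(cat "$workdir/sample.out")
printf '%s\n' "$filtered"
rm -rf "$workdir"
```

Unlike sed -i this writes to a separate file, so replacing the original (for example with mv) is a deliberate extra step. Note also that grep -v exits with status 1 when every line was filtered out.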

Find and replace part of a string using bash in an XML file

I am new to bash scripting and was looking into what kind of command will help me replace a specific string in an xml file.
The string looks like
uri='file:/var/lib/abc/cde.repo/r/c/e/v/1.1/abc-1.1.jar'
Should be replaced with
uri='file:/lib/abc-1.1.jar'
The strings vary as the jars vary too. First part of the string "file:/var/lib/abc/cde.repo/r/" is constant and is across all strings. The 2nd half is varying
This needs to be done across the entire file. Please note that replacing one is easier than doing it for each and every string that varies. I am trying to look for a solution to do it in one single command.
I know we can use sed but how? Any pointers are highly appreciated
Thanks
With sed:
sed "s~uri='file:/var/lib/abc/cde\.repo/r/c/e/v/1\.1/abc-1\.1\.jar'~uri='file:/lib/abc-1.1.jar'~g"
Basically it is:
sed "s~pattern~replacement~g"
where s is the command substitute and the trailing g means globally. I'm using ~ as the delimiter char as this helps to avoid escaping all that / in the paths. (thanks #Jotne)
Update: In order to make the regex more flexible, you may try this:
sed 's~file.*/\(.*\.jar\)\(.*\)~file:/lib/\1\2~' a.txt
It searches for file: ... .jar links, grabs the name of the jar file and builds the new links.
Using awk you can do:
awk -F/ '/file:\/var\/lib\/abc\/cde.repo\/r/ {print $1,$3,$NF}' OFS=/ file
uri='file:/lib/abc-1.1.jar'
Static URL, but changing file name.
You do not need to use sed or even awk. You could simply use basename:
prefix='file:/lib/'
uri='file:/var/lib/abc/cde.repo/r/c/e/v/1.1/abc-1.1.jar'
result="${prefix}$(basename "${uri}")"
echo "${result}"
This worked for me :
sudo sed -i -e "s/stringToChange/TheNewString/g" test.xml
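Since the question asks for a single command that handles every varying jar path, here is a hedged sketch of a more targeted sed expression. It assumes each uri value is single-quoted and always starts with the constant prefix from the question. The surrounding element name jar and the second sample path are invented for the demonstration.

```shell
# Hypothetical XML fragment; only the first uri comes from the question.
sample=$(mktemp)
printf '%s\n' \
  "<jar uri='file:/var/lib/abc/cde.repo/r/c/e/v/1.1/abc-1.1.jar'/>" \
  "<jar uri='file:/var/lib/abc/cde.repo/r/x/y/2.0/foo-2.0.jar'/>" > "$sample"

# Group 1: the uri='file: part that is kept.
# Group 2: any number of intermediate directories (discarded).
# Group 3: the jar file name including the closing quote (kept).
converted=$(sed -E "s~(uri='file:)/var/lib/abc/cde\.repo/r/([^'/]*/)*([^'/]+\.jar')~\1/lib/\3~g" "$sample")

printf '%s\n' "$converted"
rm -f "$sample"
```

Anchoring on the constant prefix and stopping at the closing quote keeps the match from running into other attributes on the same line.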

How to use command grep with several lines?

With a shell script I'm looking for a way to make the grep command do one of the following two options:
a) Use the grep command to display the 10 lines following each match in a file; i.e., the command grep "pattern" file.txt will result in all lines of the file that have that pattern:
patternrestoftheline
patternrestofanotherline
patternrestofanotherline
...
So I'm looking for this:
patternrestoftheline
following line
following line
...
until the tenth
patternrestofanotherline
following line
following line
...
until the tenth
b) Use the grep command to display all lines within two patterns as if they were limits
patternA restoftheline
anotherline
anotherline
...
patternB restoftheline
I do not know if another command instead of grep is a better option.
I'm currently using a loop that solves my problem but is line by line, so with extremely large files takes too much time.
I'm looking for the solution working on Solaris.
Any suggestions?
In case (a), What do you expect to happen if the pattern occurs within the 10 lines?
Anyway, here are some awk scripts which should work (untested, though; I don't have Solaris):
# pattern plus 10 lines
awk '/PATTERN/{n=11} n>0{print;n--}'
# between two patterns
awk '/PATTERN1/,/PATTERN2/{print}'
The second one can also be done similarly with sed
For your first task, use the -A ("after") option of grep (this is a GNU grep option, so the stock Solaris grep may not support it):
grep -A 10 'pattern' file.txt
The second task is a typical sed problem:
sed -ne '/patternA/,/patternB/p' file.txt
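To make the two approaches concrete, here is a small sketch run against invented sample data, with START standing in for the real pattern and 2 following lines instead of 10. The -v option requires a POSIX awk; on Solaris that is presumably nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk, which is an assumption I cannot test.

```shell
# Hypothetical sample input.
sample=$(mktemp)
printf '%s\n' 'intro' 'START x' 'b' 'c' 'd' 'START y' 'e' > "$sample"

# (a) Print each matching line plus the next n_after lines. A match inside
#     the window restarts the count and is still printed only once.
with_context=$(awk -v n_after=2 \
  '/START/ { left = n_after + 1 } left > 0 { print; left-- }' "$sample")

# (b) Print everything from the first pattern through the second one.
between=$(sed -n '/START x/,/START y/p' "$sample")

printf '%s\n' "$with_context"
printf '%s\n' "$between"
rm -f "$sample"
```

Both commands read the file once, which should avoid the slowdown of a line-by-line shell loop on large files.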

Why does sed /^$/d delete only blank lines but /^$/p print all lines?

I'm able to use sed /^$/d <file> to delete all the blank lines in the file, but what if I want to print all the blank lines only? The command sed /^$/p <file> prints all the lines in file.
The reason I want to do this is that we use an EDA program (Expedition) that uses regex to run rules on the names of nets. I'm trying to find a way to search for all nets that don't have names assigned. I thought using ^$ would work, but it just ends up finding all nets, which is what /^$/p is doing too. So is there a different way to do this?
Unless otherwise specified sed will print the pattern space when it has finished processing it. If you look carefully at your output you'll notice that you get 2 blank lines for every one in the file. You'll have to use the -n command line switch to stop sed from printing.
sed -n '/^$/p' infile
Should work as you want.
You can also use grep as:
grep '^$' infile
Sed prints every line by default, so the p command only prints the matching lines a second time. To make it useful, you need to give sed the -n switch. Indeed, the following appears to do what you want:
sed -n '/^$/p'
Think of it the other way around: instead of printing (p) the blank lines, delete (!d) every line that is not blank.
You may try:
sed '/^$/!d' yourFile
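The difference between the variants above is easy to see by counting output lines. A minimal sketch, using an invented four-line sample file with two blank lines:

```shell
# Hypothetical sample: two named lines and two blank lines.
sample=$(mktemp)
printf '%s\n' 'net_a' '' 'net_b' '' > "$sample"

# Without -n: autoprint plus p, so each blank line is emitted twice.
doubled=$(sed '/^$/p' "$sample" | grep -c '^')
# With -n: only the explicit p prints, i.e. just the blank lines.
only_blank=$(sed -n '/^$/p' "$sample" | grep -c '^')
# Alternative: delete every non-blank line and rely on autoprint.
not_deleted=$(sed '/^$/!d' "$sample" | grep -c '^')

echo "without -n: $doubled lines, with -n: $only_blank, with !d: $not_deleted"
rm -f "$sample"
```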

How do I find broken NMEA log sentences with grep?

My GPS logger occasionally leaves "unfinished" lines at the end of the log files. I think they're only at the end, but I want to check all lines just in case.
A sample complete sentence looks like:
$GPRMC,005727.000,A,3751.9418,S,14502.2569,E,0.00,339.17,210808,,,A*76
The line should start with a $ sign, and end with an * and a two character hex checksum. I don't care if the checksum is correct, just that it's present. It also needs to ignore "ADVER" sentences which don't have the checksum and are at the start of every file.
The following Python code might work:
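import re
from path import path
# A complete sentence starts with "$" and ends with "*" plus a two-character hex checksum.
nmea = re.compile(r"^\$.+\*[0-9A-F]{2}$")
for log in path("gpslogs").files("*.log"):
    for line in log.lines():
        # Report lines that are incomplete and are not ADVER sentences.
        if not nmea.match(line) and "ADVER" not in line:
            print("%s\n\t%s\n" % (log, line))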
Is there a way to do that with grep or awk or something simple? I haven't really figured out how to get grep to do what I want.
Update: Thanks #Motti and #Paul, I was able to get the following to do almost what I wanted, but had to use single quotes and remove the trailing $ before it would work:
grep -nvE '^\$.*\*[0-9A-F]{2}' *.log | grep -v ADVER | grep -v ADPMB
Two further questions arise, how can I make it ignore blank lines? And can I combine the last two greps?
The minimum of testing shows that this should do it:
grep -Ev "^\$.*\*[0-9A-Fa-f]{2}$" a.txt | grep -v ADVER
-E use extended regexp
-v Show lines that do not match
^ starts with
.* anything
\* an asterisk
[0-9A-Fa-f] hexadecimal digit
{2} exactly two of the previous
$ end of line
| grep -v ADVER weed out the ADVER lines
HTH, Motti.
#Motti's answer doesn't ignore ADVER lines, but you can easily pipe the results of that grep to another:
grep -Ev "^\$.*\*[0-9A-Fa-f]{2}$" a.txt |grep -v ADVER
#Tom (rephrased) I had to remove the trailing $ for it to work
Removing the $ means that the line may end with something else (e.g. the following will be accepted)
$GPRMC,005727.000,A,3751.9418,S,14502.2569,E,0.00,339.17,210808,,,A*76xxx
#Tom And can I combine the last two greps?
grep -Ev "ADVER|ADPMB"
#Motti: Combining the greps isn't working, it's having no effect.
I understand that without the trailing $ something else may follow the checksum & still match, but it didn't work at all with it so I had no choice...
GNU grep 2.5.3 and GNU bash 3.2.39(1) if that makes any difference.
And it looks like the log files are using DOS line-breaks (CR+LF). Does grep need a switch to handle that properly?
#Tom
GNU grep 2.5.3 and GNU bash 3.2.39(1) if that makes any difference.
And it looks like the log files are using DOS line-breaks (CR+LF). Does grep need a switch to handle that properly?
I'm using grep (GNU grep) 2.4.2 on Windows (for shame!) and it works for me (and DOS line-breaks are naturally accepted) , I don't really have access to other OSs at the moment so I'm sorry but I won't be able to help you any further :o(
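On the CRLF question: where grep does not treat the carriage return specially, the CR sits between the checksum and the end of the line, which would explain why the pattern with the trailing $ never matched. A minimal sketch, assuming it is acceptable to strip the CRs with tr before grepping. Apart from the GPRMC sentence quoted in the question, the sample log lines are invented.

```shell
# Hypothetical log with DOS line endings: an ADVER header, one complete
# sentence, one blank line and one truncated sentence.
log=$(mktemp)
printf '%s\r\n' \
  '$ADVER,3080,2.2' \
  '$GPRMC,005727.000,A,3751.9418,S,14502.2569,E,0.00,339.17,210808,,,A*76' \
  '' \
  '$GPRMC,005728.000,A,3751' > "$log"

# Single quotes keep the shell from touching \$ and the trailing $.
# One -v with alternation drops complete sentences, ADVER/ADPMB lines and
# blank lines in a single pass; -n prefixes each hit with its line number.
broken=$(tr -d '\r' < "$log" |
  grep -nvE '^\$.*\*[0-9A-Fa-f]{2}$|ADVER|ADPMB|^$')

printf '%s\n' "$broken"
rm -f "$log"
```

Because the input arrives on stdin, grep cannot report file names here; checking many logs this way would need a loop over the files.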