Getting list of commands using regex

I have a list of commands, some of which carry parameters that I need to skip before executing them.
show abc(h2) xyz
show abc(h2) xyz opq(h1)
show abc(h2) xyz <32>
show abc(a,l) xyz [<32>] opq
show abc
Ultimately, the list mixes different combinations of ( ), <> and [] with plain-text commands.
I want to separate all the other commands from plain commands like "show abc".
Processing needed on the commands:
(h1), (h2), (a,l) are to be discarded
<32> is to be replaced with any IP address
[<32>] is to be replaced with any integer
I tried the following, but the resulting file was empty:
cat show-cmd.txt | grep "<|(|[" > hard-cmd.txt
How can I use a regex to get a result file that contains no plain commands?
Desired output file:
show abc xyz
show abc xyz opq
show abc xyz 1.1.1.1
show abc xyz 2 opq

Try using grep followed by sed
grep '[(<\[]' file | sed -e 's/\[<32>\]/2/g' -e 's/<32>/1.1.1.1/g' -e 's/([^)]*)//g'
Output:
show abc xyz
show abc xyz opq
show abc xyz 1.1.1.1
show abc xyz 2 opq
Please note that the order of the s///g commands matters in your case: [<32>] has to be replaced before the shorter <32>.
Also, try to avoid the redundant use of cat; grep can read the file directly.
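To see why the order matters, here is a minimal sketch that runs the same three substitutions on one sample line from the question, once in the order given above and once with the first two swapped. The variable names (line, right, wrong) are only for illustration.

```shell
#!/bin/sh
# One sample line from the question that contains both (..) and [<32>].
line='show abc(a,l) xyz [<32>] opq'

# Order from the answer: [<32>] is handled before the shorter <32>.
right=$(printf '%s\n' "$line" |
  sed -e 's/\[<32>\]/2/g' -e 's/<32>/1.1.1.1/g' -e 's/([^)]*)//g')

# Swapped order: <32> is rewritten first, so \[<32>\] no longer matches
# and the square brackets are left behind.
wrong=$(printf '%s\n' "$line" |
  sed -e 's/<32>/1.1.1.1/g' -e 's/\[<32>\]/2/g' -e 's/([^)]*)//g')

printf 'right: %s\nwrong: %s\n' "$right" "$wrong"
```

With the swapped order the line comes out as "show abc xyz [1.1.1.1] opq" instead of "show abc xyz 2 opq".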

cat show-cmd.txt | grep "[\[\(\<]" > hard-cmd.txt
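This should work. The opening and closing square brackets [] form a bracket expression, meaning that only one of the listed characters needs to be present. The brackets that you want to search for are then listed inside it, each escaped by a backslash (\). Note that inside a bracket expression grep treats the backslash itself as a literal character, so this pattern also selects lines that contain a backslash.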
Hope this helps.
Pulkit
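As for why the attempt in the question produced nothing: without -E, grep uses basic regular expressions, where | is an ordinary character, and the unescaped [ at the end of "<|(|[" opens a bracket expression that is never closed, so grep most likely rejected the pattern. A minimal sketch of two forms that should select the non-plain commands, assuming a grep that supports -E; the temporary file only stands in for show-cmd.txt:

```shell
#!/bin/sh
# Sample input copied from the question, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
show abc(h2) xyz
show abc(h2) xyz opq(h1)
show abc(h2) xyz <32>
show abc(a,l) xyz [<32>] opq
show abc
EOF

# With -E, | means "or"; ( and [ are escaped so they are taken literally.
hard=$(grep -E '<|\(|\[' "$tmp")

# The bracket-expression form selects the same lines without any escaping.
hard2=$(grep '[<([]' "$tmp")

printf '%s\n' "$hard"
rm -f "$tmp"
```

Both commands keep the four lines that contain a bracket of some kind and drop the plain "show abc".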

Related

Regular Expression to match against first character and file extension

I'm using Bash to try to write a command that gets every file where the first character is not 'a' and the file does not end with '.html' but cannot seem to get both to work properly.
So far I can get my regex to match all the files that start with 'a' and end with '.html' and remove them but my issue that I cannot seem to solve is when the file starts with 'a' and ends with a different file extension. My regex seems to ignore that second requirement and just hides it regardless.
cat inputfile.txt | sed -n '/^[^a].*[^html$]/p'
Input File Contents:
123
anapple.html
456
theapple.html
789
nottrue.html
apple.csv
12
Output:
123
456
theapple.html
789
nottrue.html
12
Instead of trying to write a pattern that matches the rows to keep, write a pattern that matches the rows to remove, and use grep -v to print all the lines that don't match it.
grep -v '^a.*\.html$' inputfile.txt
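For reference, [^html$] in the original attempt is a bracket expression: it matches one single character that is not h, t, m, l or $, rather than "does not end with html". A minimal sketch of the grep -v approach on the sample input from the question (the file is recreated in a temporary location just for illustration):

```shell
#!/bin/sh
# Recreate the input file from the question.
tmp=$(mktemp)
printf '%s\n' 123 anapple.html 456 theapple.html 789 nottrue.html apple.csv 12 > "$tmp"

# Drop only the lines that both start with 'a' and end with '.html'.
kept=$(grep -v '^a.*\.html$' "$tmp")

printf '%s\n' "$kept"
rm -f "$tmp"
```

Only anapple.html is removed; apple.csv starts with 'a' but has a different extension, so it is kept.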

Sed parsing >50 MB file with over 50k lines failing with no error

I'm parsing a description file (given in question) with this magic sed
sudo sed -nE 's/(^[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}) .*
http://(.?)\s+.$/!\1 \2/p'
When I copy sample lines from the original file into a temp file, it works, but on the complete file it fails. I tried debugging with the ! switch, but that doesn't work either: it returns no unsuccessful matches. Where do I go from here?
Background
I have a file which I need to post-process. The sample format is given below:
bigspeedpro.com Intel::DOMAIN from http://malc0de.com/bl/BOOT via intel.criticalstack.com F
1.1.1.1 Intel::DOMAIN from http://abcd.com/bl/BOOT via intel.criticalstack.com F
Expected output is:
1.1.1.1 abcd
The parsing rules are:
Delete any line that doesn't start with an IP address
If the line starts with an IP address:
delete Intel::DOMAIN
replace the text between "from" and "F" based on which of the following strings occurs in it
e.g. malc0de or abcd
PART 2
I want only
12.2.2.2 Intel::DOMAIN from http://hosts-file.net/fsa.txt via intel.criticalstack.com F
http://hosts-file.net/fsa.txt
I use
sudo sed -nE 's/(^[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}) .*
http://(.*)\s+?$/ \1 \2/p'
It gives me
12.2.2.2 hosts-file.net/fsa.txt via intel.criticalstack.com
which is not what I want.
Solved part 2
silkman#Silky-flows:~/tmp$ sudo sed -nE
's/(^[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}) .*
http://(.?)\s+[v][i][a].$/ \1 \2/p'
update
1.1.1.1 www.abc.com
1.1.2.2 def.com
2.2.2.2 mnx.dbc.net
However, I want the second column, after the IP address, to be shortened to a string of my own choice. For example, in the second column I only accept
abc
def
mnx
Once one of them is found, just replace the entire string, giving
1.1.1.1 abc
1.1.2.2 def
2.2.2.2 mnx
Thanks.
How about some Perl code:
perl -ne 's/Intel::DOMAIN.*from http:\/\/(.+?)\..*/$1/; m/^(\d{1,3}\.){3}\d{1,3}/ and print'
It first removes the unwanted text and then prints the line only if it begins with something that looks like an IP address. If you cannot use Perl, it might be possible to "port" this to sed.
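One possible sed port, as a sketch rather than a tested drop-in for the full file: it assumes a sed that supports -E (GNU and BSD sed do) and that every wanted line has exactly the "IP Intel::DOMAIN from http://host..." shape shown in the question. The temporary file and the variable name result are only for illustration.

```shell
#!/bin/sh
# The two sample lines from the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
bigspeedpro.com Intel::DOMAIN from http://malc0de.com/bl/BOOT via intel.criticalstack.com F
1.1.1.1 Intel::DOMAIN from http://abcd.com/bl/BOOT via intel.criticalstack.com F
EOF

# \1 = the leading IP address, \3 = the first label of the host name.
# -n plus the p flag prints only the lines where the substitution matched,
# so lines that do not start with an IP address are dropped.
# Using # as the delimiter avoids escaping the slashes in http://.
result=$(sed -nE 's#^(([0-9]{1,3}\.){3}[0-9]{1,3}) Intel::DOMAIN from http://([^./]+)\..*#\1 \3#p' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Like the Perl version, this keeps the first label of the host name, so a host such as www.abc.com would come out as www rather than abc.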

Regular expression to find the text lines that do not contain a specific string using grep(linux terminal-opensuse)

For example, if I have 2 lines like
1.George Pappas george2 12136 Peristeri –-----
2. Nick Pappas nick4 11223 Aigaleo 5324123
I want to find the lines containing Pap but not Aig.
The result must be line 1, since the 2nd one contains Aig.
I'm completely new to terminal commands, so let me know if something I said wasn't clear and needs more info.
You can do this:
grep Pap input.txt | grep -v Aig
-v means invert-match
You can do it like this:
awk '/Pap/ && !/Aig/' file
1.George Pappas george2 12136 Peristeri .-----

How to remove matching pattern?

How do I remove my matching pattern from the file?
Every time the pattern [my_id= occurs, it should be removed without replacement.
For example, the field [my_id=AB_123456789.1] should become AB_123456789.1.
I already tried the following, with no result:
sed '/\[my\_id\=/d'
awk '$(NF-1) /^[protein\_id\=/d'
Also, is it possible to remove the first n characters from the last-but-one field ($(NF-1)) as an alternative?
Thanks for any help
You can use:
sed 's/\[my_id=\([^]]*\)\]/\1/g' file
\[my_id=\([^]]*\)\] looks for [my_id=, followed by a string not containing ], followed by the closing ]. The inner string is captured with the \(...\) syntax so that it can be printed back with \1.
Test
$ cat a
hello [my_id=AB_123456789.1] bye
adf aa [my_id=AB_123456789.1] bbb
$ sed 's/\[my_id=\([^]]*\)\]/\1/g' a
hello AB_123456789.1 bye
adf aa AB_123456789.1 bbb
You can try something like this in awk
$ cat <<test | awk 'gsub(/\[my_id=|\]/,"")'
hello [my_id=AB_123456789.1] bye
adf aa [my_id=AB_123456789.1] bbb
test
hello AB_123456789.1 bye
adf aa AB_123456789.1 bbb
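On the alternative asked about in the question, removing the first n characters from the last-but-one field: a minimal awk sketch, assuming the prefix "[my_id=" always has the same length (n=7), that every line has at least two fields, and that the trailing ] should go as well. The variable names n, f and out are only for illustration.

```shell
#!/bin/sh
n=7   # length of the prefix "[my_id=" (assumed fixed)

out=$(printf '%s\n' \
  'hello [my_id=AB_123456789.1] bye' \
  'adf aa [my_id=AB_123456789.1] bbb' |
  awk -v n="$n" '{
    f = NF - 1                     # index of the last-but-one field
    $f = substr($f, n + 1)         # drop the first n characters
    sub(/\]$/, "", $f)             # drop the trailing ] as well
    print
  }')

printf '%s\n' "$out"
```

Assigning to a field makes awk rebuild the line with single spaces between fields, so any other spacing in the input is not preserved.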

Copy Text that matches a regular expression

I have a regular expression that has several matches in a textfile.
I want to copy only the matches to a second file. I don't want to copy the lines that contain the matches; I only want the matched text.
I can't find a way to do this in Notepad++ (it only copies complete lines, not just the matches), nor in the Visual Studio search.
Is there a way to copy only the matches? Maybe with grep or sed?
You can do it with both. Let's say I have the following file:
Sample file:
[jaypal:~/Temp] cat myfile
this is some random number 424-555
and my cell is 111-222-3333
and 42555 is my zip code
And I want to capture only numbers from myfile
Using sed:
With sed you can use the combination of -n and p option along with grouped pattern.
sed -n 's/.[^0-9]*\([0-9-]\+\).*/\1/p'
Here is what each part does:
-n suppresses the default output.
\( ... \) are escaped parentheses that start and end the group.
\1 prints the first matched group. If you have more grouped patterns, they can be called with \2 and so on.
p prints only the matched output. Since we are suppressing everything with -n, we use p to invoke printing.
Test:
[jaypal:~/Temp] sed -n 's/.[^0-9]*\([0-9-]\+\).*/\1/p' myfile
424-555
111-222-3333
42555
You can simply re-direct this to another file.
Using grep:
You can use either -
egrep -o "regex" filename
or
grep -E -o "regex" filename
From the man page:
-E, --extended-regexp
Interpret PATTERN as an extended regular expression (see below).
-o, --only-matching
Show only the part of a matching line that matches PATTERN.
Test:
[jaypal:~/Temp] egrep -o "[0-9-]+" myfile
424-555
111-222-3333
42555
You can simply re-direct this to another file.
Note: Obviously these are simple examples, but they convey the point.
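Putting the grep variant together with the redirect, a minimal sketch: the temporary directory and the output name matches.txt are placeholders, and -o is assumed to be available (it is in GNU and BSD grep).

```shell
#!/bin/sh
# Recreate the sample file from above in a scratch directory.
dir=$(mktemp -d)
cat > "$dir/myfile" <<'EOF'
this is some random number 424-555
and my cell is 111-222-3333
and 42555 is my zip code
EOF

# -o prints each match on its own line; > sends them to the second file.
grep -oE '[0-9-]+' "$dir/myfile" > "$dir/matches.txt"

matches=$(cat "$dir/matches.txt")
printf '%s\n' "$matches"
rm -rf "$dir"
```

The second file then holds only the matched text, one match per line, not the full lines.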
This might work for you:
sed -n 's/^.*\(matched text regexp\).*/\1/w matched_text_file' source_file
You can do it with Notepad++ by bookmarking the wanted lines then using menu => Search => Copy styled text => Find style (marked). The following lines show the method in more detail:
Example text:
aposidfupwoebfadbsf-mytext1-ausdfioabq
qoejbgaoudfb -mytext2-asdoufbnqub
foqiuebgf-mytext3- ñqloienbq
alkbnepaofub -mytext4- jafpoebqaf
Want to extract all -mytext?- words
Create a Regex to find text you want to copy: -mytext\d+-
Use the find function (Ctrl + F) of Notepad++ and open the "Mark" tab.
Enter the regex, activate the option Bookmark line and then click on Mark All button on the Mark Tab.
Finally, open the Search menu, select the option Copy Styled Text, and then choose the correct color.
Then paste it where you need it.