Removing lines from txt file, where line doesn't match string - regex

I have a .txt file, with lots of lines in it. I have a procedure to fill up a database, using this textfile. But I only want to insert the lines where the string from position 67 to 70 matches 772. I cannot change the procedure to read the file, I have to change the file itself.
So in fact, I want to remove all lines from the txt-file where the string on position 67 to 70, doesn't match 772.
How can I get this done?

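The following regex matches lines that have 772 starting at position 67, i.e. after any 66 characters:
^.{66}772.*$
There are various ways to filter a file with this regex, depending on the tool you're using. With grep, for example, grep -E '^.{66}772' in.txt > out.txt (the file names are placeholders) keeps only the matching lines, which removes every line that doesn't match. Don't add the -v flag here: it inverts the match and would keep exactly the lines you want to delete.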
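If no grep is at hand, the same filter could be sketched in Python. This is only an illustration: the function names and the file paths passed to filter_file are hypothetical, and UTF-8 input is assumed.

```python
import re

# Any 66 characters, then "772" in columns 67-69 (1-based).
PATTERN = re.compile(r"^.{66}772")


def keep_matching_lines(lines):
    """Return only the lines that have 772 starting at column 67."""
    return [line for line in lines if PATTERN.match(line)]


def filter_file(src_path, dst_path):
    """Copy the matching lines of src_path to dst_path (hypothetical paths)."""
    with open(src_path, encoding="utf-8") as src:
        kept = keep_matching_lines(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(kept)
```

Writing to a second file and swapping it in afterwards avoids truncating the input while it is still being read.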

Related

How to remove unsorted duplicate lines in Notepad++ using Regex

I have my file (link is in comment)
A Sample of Data
Yn2STc5A
MBI1irwA
Yn2STc5A
agCGRvWu
KZIcwFII
414PGEBK
MBI1irwA
KZIcwFII
lln5OKRi
Yn2STc5A
6gCsLHJA
Yn2STc5A
MBI1irwA
KZIcwFII
MBI1irwA
22LYWQsX
22LYWQsX
Yn2STc5A
KZIcwFII
agCGRvWu
lln5OKRi
This file has 528 lines. Every line is a repetition of one of 13 lines, and each of the 13 lines is a code for a team link.
I have searched for and tried many regexes, but only these two came close to what I need:
Find: ^(.{8}\n)([\S\s]+?\1) and also ^(.*)([\S\s]+?\1)
Replace All: $2
But I have to press Replace All repeatedly (at least 47 times) to reach my goal.
My desired output from the complete file is:
1:22LYWQsX
2:414PGEBK
3:6gCsLHJA
4:C6C8JOnf
5:KZIcwFII
6:MBI1irwA
7:NQid5EnY
8:P68A94uk
9:Yn2STc5A
10:agCGRvWu
11:jbsO5Pzk
12:lln5OKRi
13:vWSvMjaa
Thanks in advance
I recommend using the standard functions of Notepad++ (my version is 8.1.9, 64-bit) if they cover your needs.
First open the sample data file (*.txt) in Notepad++.
From the main menu go to Edit > Line Operations > Remove Duplicate Lines.
Then go to Edit > Line Operations > Sort Lines Lexicographically Ascending.
Format the result as desired.
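Outside Notepad++, the two menu steps (remove duplicates, then sort) could be approximated in a few lines of Python. This is a sketch that assumes plain code-point ordering is acceptable; unique_sorted is a name made up for the illustration, and sample is just the first few lines of the data from the question.

```python
def unique_sorted(lines):
    """Drop duplicate lines, then sort what is left in code-point order."""
    return sorted(set(lines))


sample = ["Yn2STc5A", "MBI1irwA", "Yn2STc5A", "agCGRvWu",
          "KZIcwFII", "414PGEBK", "MBI1irwA"]

for code in unique_sorted(sample):
    print(code)
```

Code-point order puts digits before uppercase letters and uppercase before lowercase, which is the order shown in the desired output of the question.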

Grouping lines with a header using regex

I'm trying to write a regex query that groups lines, where each group starts with a header line containing a certain kind of key.
For example, the key is a line containing an 'A' followed by a number. In the sample below, the first 4 lines are one group, the next 2 are another group, and so on:
dd A3
This line is arbitrary
This line is also arbitrary
1234 Arbitrary
A9
This line is arbitrary
ff A3 d
A5ff
Hi there
Hello
This is what I ended up with that worked: .A[0-9].*\n((?!A[0-9]).|\n)
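For comparison, the grouping can also be done line by line instead of with one multi-line regex. The Python sketch below assumes the header rule is "the line contains an A followed by a digit anywhere", as described in the question; group_by_header is a hypothetical helper name.

```python
import re

# Assumed header rule: an 'A' followed by a digit anywhere in the line.
HEADER = re.compile(r"A[0-9]")


def group_by_header(lines):
    """Split lines into groups, starting a new group at every header line."""
    groups = []
    for line in lines:
        # Lines before the first header also get a group of their own.
        if HEADER.search(line) or not groups:
            groups.append([line])
        else:
            groups[-1].append(line)
    return groups
```

Applied to the ten sample lines, this gives four groups of 4, 2, 1 and 3 lines. "1234 Arbitrary" is not treated as a header because its A is followed by a letter.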

Extract Text From CSV

I want to grab the regular expressions out of the snort rules.
Here's an example of the text that I've saved as a csv - https://rules.emergingthreats.net/open/snort-2.9.0/rules/emerging-exploit.rules
So there are multiple rules,
#by Akash Mahajan
#
alert udp $EXTERNAL_NET any -> $HOME_NET 14000 (msg:"ET EXPLOIT Borland VisiBroker Smart Agent Heap Overflow"; content:"|44 53 52 65 71 75 65 73 74|"; pcre:"/[0-9a-zA-Z]{50}/R"; reference:bugtraq,28084; reference:url,aluigi.altervista.org/adv/visibroken-adv.txt; reference:url,doc.emergingthreats.net/bin/view/Main/2007937; classtype:successful-dos; sid:2007937; rev:4;)
and I want only the text that appears after "pcre" in all of them, extracted and printed to a new file, without the quotes
pcre:"/[0-9a-zA-Z]{50}/R";
So, from this line above, I want to end up with the below text;
/[0-9a-zA-Z]{50}/R
From every place "pcre" appears in the whole file.
I've been messing around with grep, awk, and sed. I just can't figure it out. I'm fairly new to this.
Could anyone give me some tips?
Thanks
With GNU sed:
$ sed -n -r 's/.*\<pcre:"([^"]+).*/\1/p' file
/[0-9a-zA-Z]{50}/R
You can do this with grep as well. The catch is that grep can't print just a capture group; with -o it prints the whole matched text.
To get around this, use look-behind and look-ahead, so that the surrounding text is required but is not part of the match.
Lookahead (?=foo)
Asserts that what immediately follows the current position in the string is foo
Lookbehind (?<=foo)
Asserts that what immediately precedes the current position in the string is foo
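┌─ print file to standard output
│                    ┌─ has pcre:" before the match (look-behind)
│                    │                ┌─ has "; after the match (look-ahead)
cat file | grep -Po '(?<=pcre:\")(.*?)(?=\";)'
                 ││              └─ what we want (matching group)
                 │└─ print only the matched part (-o)
                 └─ use Perl-compatible regular expressions (-P)
The group is lazy (.*?) so that the match stops at the first "; after pcre:" instead of running on to the last one on the line.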
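Both ideas from the answers above (a capture group, as in the sed command, and look-behind plus look-ahead, as in the grep command) can be tried out in Python. This is a sketch, not a full Snort rule parser: it assumes a pcre option never contains a double quote, and the function names are made up for the example.

```python
import re

# Capture-group variant: the part in parentheses is what gets returned.
PCRE_CAPTURE = re.compile(r'\bpcre:"([^"]+)"')

# Look-around variant: pcre:" and "; must be present but are not part of the match.
PCRE_LOOKAROUND = re.compile(r'(?<=pcre:")[^"]+(?=";)')


def extract_pcre(text):
    """Return every pcre option in text, using a capture group."""
    return PCRE_CAPTURE.findall(text)


def extract_pcre_lookaround(text):
    """Return every pcre option in text, using look-behind and look-ahead."""
    return PCRE_LOOKAROUND.findall(text)


# Shortened version of the rule quoted in the question.
rule = ('alert udp $EXTERNAL_NET any -> $HOME_NET 14000 '
        '(msg:"ET EXPLOIT Borland VisiBroker Smart Agent Heap Overflow"; '
        'content:"|44 53 52 65 71 75 65 73 74|"; '
        'pcre:"/[0-9a-zA-Z]{50}/R"; sid:2007937; rev:4;)')

for pattern in extract_pcre(rule):
    print(pattern)
```

Using [^"]+ instead of .* keeps the match from running past the closing quote when more quoted options follow on the same line.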

Notepad++ clone each line

I have a file that includes such lines
111
112
113
I want to clone the lines and add a separator between the numbers. The output should be as follows:
111#111
112#112
113#113
How can I do this in Notepad++ using regex replace?
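Find what: (.+)
Replace with: \1#\1
This works with Search Mode set to "Regular expression" and ". matches newline" unchecked, so that (.+) captures one line at a time.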
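The same find/replace pair can be checked outside Notepad++. Below is a small Python sketch; clone_lines and its separator parameter are hypothetical names, and the text is assumed to use \n line endings.

```python
import re


def clone_lines(text, separator="#"):
    """Duplicate the content of every non-empty line, joined by separator."""
    # (.+) matches one whole line because '.' does not match a newline.
    return re.sub(r"(.+)",
                  lambda m: m.group(1) + separator + m.group(1),
                  text)


print(clone_lines("111\n112\n113"))
```

Empty lines are left untouched, because (.+) needs at least one character to match.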

regex tutorial, How can I improve this

I needed a utility function earlier today to strip some data out of a file and wrote an appalling regular expression to do it. The input was a file with lots of lines in the format:
<address> <11 * ascii character value> <11 characters>
00C4F244 75 6C 74 73 3E 3C 43 75 72 72 65 ults><Curre
I wanted to strip out everything bar the 11 characters at the end and used the following expression:
"^[0-9A-F+]{8}[\\s]{2}[0-9A-F\\s]{34}"
This matched to the bits I didn't want which I then removed from the original string. I'd like to see how you'd do this but the particular areas I couldn't get working were:
1: having the regex engine return the characters I wanted rather than the characters I didn't and
2: finding a way of repeating the match on a single ascii value followed by the space (eg "75 " = [0-9A-F]{2}[\s]{1}?) and repeating that 11 times rather than grabbing 34 characters.
Looking at it again, the easiest thing to do would be to match the last 11 characters of each input line. That isn't very flexible, though, and in the interests of learning regex I would like to see how to match through from the start of the sequence.
Edit: Thanks guys, this is what I wanted:
"(?:^[0-9A-F]{8} )(?:[0-9A-F]{2} ){11} (.*)"
Wish I could turn more than one of you green.
As the file has a fixed format, you could use this regular expression to just match the last 11 characters.
^.{44}(.{11})
Last eleven is:
...........$
or:
.{11}$
Matching a hex byte + space and repeat eleven times:
([0-9A-Fa-f]{2} ){11}
1) ^[0-9A-F+]{8}[\s]{2}[0-9A-F\s]{34}(.*)
Parens are used for grouping with extraction. How you retrieve it depends on your language context, but now some sort of $1 is set to everything after the initial pattern.
2) ^[0-9A-F+]{8}[\s]{2}(?:[0-9A-F]{2}\s){11}\s(.*)
(?:) is grouping without extraction. So (?:[0-9A-F]{2}\s){11} treats the subpattern (two hex digits followed by one whitespace character) as a unit and looks for it repeated 11 times.
I'm assuming PCRE here, by the way.
The address and ascii char value are all hex so:
^[0-9A-F\s]{42}
Matching the end of the line would be
.{11}$
To match only the end, you can use a positive look behind.
"(?<=(^[0-9A-F+]{8}[\\s]{2}[0-9A-F\\s]{34}))(.*?)$"
This would match any character until the end of the line, providing that it is preceded by the "look behind" expression.
(?<=....) defines a condition that must be met before matching is possible.
I am a bit short of time, but if you look on the net for any tutorial that contains the words "regex" and "lookbehind", you will find good stuff (if a regex tutorial covers look-ahead and look-behind, it will usually be pretty complete and advanced).
Another piece of advice is to get a regex training tool and play with it. Have a look at this excellent Regex designer.
If you're using Perl, you could also use unpack(), to get each element.
my @data;
open my $fh, '<', $filename or die;
for my $line (<$fh>) {
    my ($address, @list) = unpack 'a8xx(a2x)11xa11', $line;
    my $str = pop @list;
    # unpack the hexadecimal bytes
    my $data = join '', map { pack 'H2', $_ } @list;
    die unless $data eq $str;
    push @data, [$address, $data, $str];
}
close $fh;
I also went ahead and converted the 11 hexadecimal codes back into a string, using pack().
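To tie the answers together, here is a Python sketch that matches a line from the start, repeats the "hex byte plus space" group 11 times, and returns the wanted characters through a capture group. It also decodes the hex bytes as a cross-check, similar to the Perl version. The name parse_dump_line is hypothetical, and \s+ and \s* are used for the gaps because the exact number of separating spaces in the real file is an assumption.

```python
import re

LINE = re.compile(
    r"^([0-9A-F]{8})\s+"            # 8-digit hex address
    r"((?:[0-9A-F]{2}\s){11})\s*"   # 11 hex bytes, each followed by one space
    r"(.{11})$"                     # the 11 characters we actually want
)


def parse_dump_line(line):
    """Return (address, decoded_bytes, text), or None if the line doesn't fit."""
    m = LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    address, hex_bytes, text = m.groups()
    # Turn "75 6C ..." into the characters those byte values stand for.
    decoded = bytes.fromhex("".join(hex_bytes.split())).decode("latin-1")
    return address, decoded, text


sample = "00C4F244  75 6C 74 73 3E 3C 43 75 72 72 65  ults><Curre"
print(parse_dump_line(sample))
```

The non-capturing group (?:...) keeps the repeated byte pattern out of the numbered groups, so the address, the byte run and the text come back as groups 1, 2 and 3.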