Notepad++ `Find in Files` displays all results except a certain keyword - regex

I am using Notepad++ to search for certain keywords with a regular expression, something like (word1|word2|this statement|another statement). It works, but can I search and show all results except lines containing certain keywords, something like excluding (exclude this|excluding this)? For example, given the two files below:
samedir\File1.log
This is line 1
This is line 2
This is line 3
exclude this
This is line 4
This is line 5
This is line 6
not excluded
excluding this
samedir\File2.log
This is line 1 1
This is line 2 1
This is line 3 1
exclude this
This is line 4 1
This is line 5 1 1
This is line 6 1
not excluded
excluding this
I want to run a find across both files (in the same directory) but exclude the lines containing excluding this and exclude this.
The results should look something like this:
File1.log
This is line 1
This is line 2
This is line 3
This is line 4
This is line 5
This is line 6
not excluded
File2.log
This is line 1 1
This is line 2 1
This is line 3 1
This is line 4 1
This is line 5 1 1
This is line 6 1
not excluded

You can do this with a lookahead assertion:
^(?!excluding this|exclude this)[^\r\n]*$
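This will match entire lines as long as they don't start with excluding this or exclude this. The lookahead is checked only at the start of the line, so to reject lines that contain either phrase anywhere, begin the lookahead with .* instead:
^(?!.*(?:excluding this|exclude this))[^\r\n]*$
The key is the (?!...) part. See http://www.regular-expressions.info/lookaround.html for more info.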

You could try the regex like below to match all the lines which don't have exclude or excluding this strings.
^(?!.*\bexclud(?:ing|e) this\b).+$
The negative lookahead (?!.*\bexclud(?:ing|e) this\b) at the start asserts that neither exclude this nor excluding this appears anywhere on the line we are about to match. That is, the regex above matches every line except those containing exclude this or excluding this.
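As a quick check outside Notepad++, the pattern can be tried with Python's re module. This is only a sketch: Python's regex engine is not the one Notepad++ uses, although both support this lookahead syntax, and the text is simply the content of File1.log from the question.

```python
import re

# Pattern from the answer above. MULTILINE makes ^ and $ work per line.
PATTERN = re.compile(r"^(?!.*\bexclud(?:ing|e) this\b).+$", re.MULTILINE)

# Content of File1.log as shown in the question.
text = """This is line 1
This is line 2
This is line 3
exclude this
This is line 4
This is line 5
This is line 6
not excluded
excluding this
"""

# The pattern has no capturing groups, so findall returns whole lines.
kept = PATTERN.findall(text)
for line in kept:
    print(line)
```

The two excluded lines are dropped, while "not excluded" is kept because it contains neither phrase.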

I wanted to exclude one string and search for two strings in one line, so I used this regex:
^(?!.*excludethisstring).*includethisstring1.*includethisstring2.*$
This requires the line to contain both strings (in that order). If you want to match lines that contain either one of the strings instead:
^(?!.*excludethisstring).*(includethisstring1|includethisstring2).*$
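The behaviour of the two patterns can be compared with a small Python sketch. The test lines are made up for illustration, and the third pattern (any_order) is not from the answer: it is an assumed variant that uses two positive lookaheads so the order of the two included strings does not matter.

```python
import re

# The two patterns from the answer above.
both = re.compile(r"^(?!.*excludethisstring).*includethisstring1.*includethisstring2.*$")
either = re.compile(r"^(?!.*excludethisstring).*(includethisstring1|includethisstring2).*$")

# Assumed variant (not in the answer): both strings required, in any order.
any_order = re.compile(
    r"^(?!.*excludethisstring)(?=.*includethisstring1)(?=.*includethisstring2).*$"
)

# Hypothetical test lines.
lines = [
    "includethisstring1 then includethisstring2",
    "only includethisstring1",
    "includethisstring1 includethisstring2 excludethisstring",
    "includethisstring2 before includethisstring1",
]

results = [
    (bool(both.match(s)), bool(either.match(s)), bool(any_order.match(s)))
    for s in lines
]
for flags, s in zip(results, lines):
    print(flags, s)
```

The last line shows the difference: the first pattern rejects it because includethisstring2 comes before includethisstring1, while the lookahead variant accepts it.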

Related

RegEx matching odd number of characters at the beginning and the end of string

I have a grammar in Lezer where I need to match a "custom string" which can start with any odd number of " and end with the same number. It can span multiple lines as well, and anything inside needs to be skipped as far as the parser is concerned. I am struggling a little with the regex part of the matching.
str"test" // valid
str"""test""" // valid
str""test"" // not valid
str""""test"""" // not valid
I am trying to match the beginning and end of that string.
I tried, among other things, "("")*[^"] but it also matches the first character after the odd number of double quotes (due to the [^"]), which is something I would like to avoid.
For matching the end I have a similar issue.
So with the given input of:
1 str"test"
2 str"""
3 a
4 b
5 c
6 """
7 str""nope""
I am trying to match only str" for line 1 and str""" for line 2 and not match on line 7.
Need to match the ends as well (not in the same regex). So the match should be on " for line 1 and """ for line 6.
I have this so far: start: ^str"("")*[^"] end: [^"]"("")*$ but it is not optimal.
FYI I need start and end since the expectation is when you start writing and we hit a match on the beginning the highlighting in the editor should highlight all remaining text as a string until you have a matching odd number of ".
Any advice is appreciated.

Back-referencing in vim search

In my file I have text of type:
1 module ABC(
2 …
3 …
any number of lines
n-1 …
n ABC …
I am trying to highlight both ABC in line 1 and ABC in line n.
I am using the following regex:
/module \(\<.\{-}\>\)\|^\s*\1
However, this regex is not working.
It highlights:
ABC line 1 correctly and
Space between line beginning and first character on all lines which is not expected.
My end goal is to change the ABC in both places to have the same prefix and suffix.
In this case, let's say both ABCs get changed to prefix_ABC_suffix.
Please help me fix the search regex.

Find repeating gps using regular expression

I work with text files, and I need to be able to see when the gps (last 3 columns of csv) "hangs up" for more than a few lines.
So for example, usually, part of a text file looks like this:
5451,1667,180007,35.7397387,97.8161897,375.8
5448,1053z,180006,35.7397407,97.8161814,375.7
5444,1667,180005,35.7397445,97.8161674,375.6
5439,1668,180004,35.7397483,97.8161526,375.5
5435,1669,180003,35.7397518,97.8161379,375.5
5431,1669,180002,35.7397554,97.8161269,375.6
5426,1054z,180001,35.7397584,97.8161115,375.6
5420,1670,175959,35.7397649,97.8160931,375.9
But sometimes there is an error with the gps and it looks like this:
36859,1598,202603.00,35.8867316,99.2515545,555.700
36859,1598,202608.00,35.8867316,99.2515545,555.700
36859,1142z,202610.00,35.8867316,99.2515545,555.700
36859,1597,202612.00,35.8867316,99.2515545,555.700
36859,1597,202614.00,35.8867316,99.2515545,555.700
36859,1596,202616.00,35.8867316,99.2515545,555.700
36859,1595,202618.00,35.8867316,99.2515545,555.700
I need to be able to figure out a way to search for matching strings of 7 different numbers, (the decimal portion of the gps) but so far I've only been able to figure out how to search for repeating #s or consecutive numbers.
Any ideas?
If you were to find such repetitions in an editor (such as Notepad++), you could use the following regex to find 4 or more repeating lines:
([^,]+(?:,[^,]+){2})\v+(?:(?:[^,]+,){3}\1(?:\v+|$)){3,}
To go a bit into detail
([^,]+(?:,[^,]+){2})\v+ is a group of three comma-separated fields (one or more non-commas, then twice a comma followed by one or more non-commas), followed by vertical whitespace (a line break) that is not part of the group (e.g. 1,1,1\n)
(?:[^,]+,){3} matches one or more non-commas followed by comma, three times (your columns that don't have to be considered)
\1 is a backreference to group 1, matching if it contains exactly the same as group 1
(?:\v+|$) matches either more vertical whitespace or the end of the text
{3,} for 3 or more repetitions - increase it if you want more
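The breakdown above can be tried in Python as a rough sketch. Two substitutions are assumptions made for Python's re module, not part of the original pattern: \v is replaced by \n, because Python does not treat \v as a vertical-whitespace class, and [^,] becomes [^,\n] so that a field can never run across a line break. The sample text reuses rows from the question.

```python
import re

STUCK = re.compile(
    r"([^,\n]+(?:,[^,\n]+){2})\n+"          # group 1: last three fields of a line
    r"(?:(?:[^,\n]+,){3}\1(?:\n+|$)){3,}"    # 3+ following lines with the same fields
)

# Two normal rows followed by the seven "stuck" rows from the question.
sample = """5451,1667,180007,35.7397387,97.8161897,375.8
5448,1053z,180006,35.7397407,97.8161814,375.7
36859,1598,202603.00,35.8867316,99.2515545,555.700
36859,1598,202608.00,35.8867316,99.2515545,555.700
36859,1142z,202610.00,35.8867316,99.2515545,555.700
36859,1597,202612.00,35.8867316,99.2515545,555.700
36859,1597,202614.00,35.8867316,99.2515545,555.700
36859,1596,202616.00,35.8867316,99.2515545,555.700
36859,1595,202618.00,35.8867316,99.2515545,555.700
"""

m = STUCK.search(sample)
if m:
    # The match starts at the GPS fields of the first stuck line and
    # runs to the end of the last one, so it spans one piece per line.
    run_length = len(m.group(0).splitlines())
    print(m.group(1), "repeats on", run_length, "consecutive lines")
```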
However, if you are using a programming language to check this, I wouldn't go down the regex path, as checking for those repetitions can be done a lot more easily. Here is one example in Python; I hope you can adapt it to your needs:
oldcoords = [0,0,0]
repetitions = 0  # must be initialised before the first comparison
with open(r'C:\temp\gps.csv') as f:
    lines = [line.rstrip('\n') for line in f]
for line in lines:
    gpscoords = line.split(',')[3:6]  # the three GPS columns
    if gpscoords == oldcoords:
        repetitions += 1
    else:
        oldcoords = gpscoords
        repetitions = 0
    if repetitions == 4:  # 4 repeats = 5 identical lines; or however you define more than a few
        print(', '.join(gpscoords) + ' is repeated')
If you can use perl, and if I understood you correctly, this should do the trick:
perl -ne 'm/^[^,]*,[^,]*,[^,]*,([^,]*,[^,]*,[^,]*$)/g; $current_line=$1; ++$line_number; if ($prev_line eq $current_line){$equals++} else {if ($equals>=6){ print "Last three fields in lines ".($line_number-$equals-1)." to ".($line_number-1)." are equals to:\n$prev_line" } ; $equals=0}; $prev_line=$current_line' < onlyreplacethiswithyourfilepath
Sample output:
Last three fields in lines 1 to 7 are equals to:
35.8867316,99.2515545,555.700
Last three fields in lines 16 to 22 are equals to:
37.8782116,99.7825545,572.810
Last three fields in lines 31 to 44 are equals to:
36.6868916,77.2594245,581.358
Last three fields in lines 57 to 63 are equals to:
35.5128764,71.2874545,575.631

conditionally remove portion of a line in delimited file

I have a ~ delimited text file with about 20 nullable columns.
I am trying to use SED (from cygwin) to "blank out" the value in column 11 if the following conditions are met...
Column 3 is a zero (0)
Column 11 is in date format mm/dd/yy (I'm not really concerned if it's a valid date)
Here's what I'm trying...
s/\([^~]*~[^~]*~0~[^~]*~[^~]*~[^~]*~[^~]*~[^~]*~[^~]*~[^~]*~[^~]*~\)\(\d{2}\/\d{2}\/\d{2}~\)\(.*$\)/\1~\3/
Here's a sample from the file:
Test A~7~1~~~~72742050~~~Z370~10/25/11~~~0~8.58563698~6.40910452~4.59198764~3.18239469~1.72955975~.23345372~-1.30891113~-2.89971394~1~0
Test B~7~0~~~~72742060~~~Z351~05/15/12~05/14/12~~0~18.88910518~12.69425528~9.96182381~6.76077612~6.76077612~3.86279298~.22449489~-.91021010~0~0
Test C~7~0~~~~72742060~~~Z352~06/12/12~ABC~~0~20.60845679~17.54889351~15.52912556~12.43279217~12.43279217~10.32033576~9.35296144~8.09245899~0~0
...and here's what I expect to get back
Test A~7~1~~~~72742050~~~Z370~10/25/11~~~0~8.58563698~6.40910452~4.59198764~3.18239469~1.72955975~.23345372~-1.30891113~-2.89971394~1~0
Test B~7~0~~~~72742060~~~Z351~05/15/12~~~0~18.88910518~12.69425528~9.96182381~6.76077612~6.76077612~3.86279298~.22449489~-.91021010~0~0
Test C~7~0~~~~72742060~~~Z352~06/12/12~ABC~~0~20.60845679~17.54889351~15.52912556~12.43279217~12.43279217~10.32033576~9.35296144~8.09245899~0~0
but the file comes through with line 2 completely unchanged.
You are trying to replace column 12 instead of 11:
\([^~]*~[^~]*~0~[^~]*~[^~]*~[^~]*~[^~]*~[^~]*~[^~]*~[^~]*~[^~]*~\)\(\d{2}\/\d{2}\/\d{2}~\)\(.*$\)
1 2 3 4 5 6 7 8 9 10 11 12
If just removing one of the [^~]*~ from the end of the first group doesn't fix it, it could be because your version of sed doesn't support either \d or repetition with {2} (although escaping the curly brackets would probably fix that).
Here is a version that should work everywhere which replaces each \d{2} with [0-9][0-9] (and fixes the incorrect column issue mentioned above):
s/\([^~]*~[^~]*~0~[^~]*~[^~]*~[^~]*~[^~]*~[^~]*~[^~]*~[^~]*~\)\([0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]~\)\(.*$\)/\1~\3/
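For comparison, the same rule can be sketched in Python by splitting on the delimiter instead of counting fields inside a regex. The function name blank_date and its parameters are hypothetical. The column numbers are 1-based and left as parameters on purpose: the question's text says column 11, while its expected output blanks the field one position later, so the right value depends on which of the two is intended. The rows are the first 14 fields of Test A and Test B from the sample.

```python
import re

# Two digits, slash, two digits, slash, two digits (validity is not checked).
DATE = re.compile(r"\d{2}/\d{2}/\d{2}")

def blank_date(line, flag_col=3, date_col=11, sep="~"):
    """Blank field date_col if field flag_col is "0" and date_col looks like mm/dd/yy.

    Column numbers are 1-based. Lines with too few fields are returned unchanged.
    """
    fields = line.split(sep)
    if (
        len(fields) >= max(flag_col, date_col)
        and fields[flag_col - 1] == "0"
        and DATE.fullmatch(fields[date_col - 1])
    ):
        fields[date_col - 1] = ""
    return sep.join(fields)

# First 14 fields of two sample rows from the question.
row_a = "Test A~7~1~~~~72742050~~~Z370~10/25/11~~~0"
row_b = "Test B~7~0~~~~72742060~~~Z351~05/15/12~05/14/12~~0"

print(blank_date(row_a))               # column 3 is 1, so nothing changes
print(blank_date(row_b, date_col=11))  # blanks 05/15/12
print(blank_date(row_b, date_col=12))  # blanks 05/14/12
```

Splitting first keeps the column positions explicit, which makes an off-by-one in the field count easier to spot than in a long chain of [^~]*~ groups.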

c# split text file by changing the line number

I'm trying to split text file by line numbers,
for example, if I have text file like:
1 ljhgk uygk uygghl \r\n
1 ljhg kjhg kjhg kjh gkj \r\n
1 kjhl kjhl kjhlkjhkjhlkjhlkjhl \r\n
2 ljkih lkjhl kjhlkjhlkjhlkjhl \r\n
2 lkjh lkjh lkjhljkhl \r\n
3 asdfghjkl \r\n
3 qweryuiop \r\n
I want to split it to 3 parts (1,2,3),
How can I do this? The text is very large (~20,000,000 characters), so I need an efficient way (like regex).
Another idea: you can use LINQ to get the groups you're after by grouping the lines on their first word. Note that this takes the first word of every line, so make sure you only have numbers there. This is using the split/join antipattern, but it seems to work nicely here.
var lines = from line in s.Split("\r\n".ToCharArray(),
                                 StringSplitOptions.RemoveEmptyEntries)
            let lineNumber = line.Split(" ".ToCharArray(), 2).FirstOrDefault()
            group line by lineNumber
            into g
            select String.Join("\n", g);
Notes:
GroupBy is guaranteed to return lines in the order they appeared.
If a block appears more than once (e.g. "1 1 2 2 3 3 1"), all blocks with the same number will be merged.
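The grouping idea and the two notes above can be illustrated with a short Python sketch. It is not a translation of the LINQ query, only the same idea under the assumption that a plain dict (which preserves insertion order) stands in for GroupBy. The function name and the sample text are hypothetical.

```python
def group_by_first_word(text):
    """Group lines by their first space-separated word, keeping first-seen order."""
    groups = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines, like RemoveEmptyEntries
        key = line.split(" ", 1)[0]
        groups.setdefault(key, []).append(line)
    return ["\n".join(lines) for lines in groups.values()]

# Hypothetical input in which block 1 appears twice.
text = "1 a\r\n1 b\r\n2 c\r\n3 d\r\n1 e\r\n"
parts = group_by_first_word(text)
print(parts)
```

The trailing "1 e" line is merged into the first group, which is the behaviour described in the second note.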
You can use a regex, but Split will not work too well. You can Match for the following pattern:
^(\d).*$ # Match first line, capture number
([\r\n]+^\1.*$)* # Match additional lines that begin with the same number
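The same pattern can be tried in Python as a sketch. Python's re module stands in for the .NET engine here, so this is not the original C# code, and the sample text is hypothetical. The only change to the pattern is that the repeated group is made non-capturing.

```python
import re

BLOCK = re.compile(
    r"""
    ^(\d).*$             # first line of a block, capture its leading digit
    (?:[\r\n]+^\1.*$)*   # following lines that begin with the same digit
    """,
    re.MULTILINE | re.VERBOSE,
)

# Hypothetical input with three consecutive blocks.
text = "1 a\n1 b\n2 c\n2 d\n3 e\n"
blocks = [m.group(0) for m in BLOCK.finditer(text)]
print(blocks)
```

Unlike grouping by key, this only joins consecutive lines, so a number that reappears later in the file starts a new block. Because a single \d is captured, a line starting with 10 would also be taken as a continuation of block 1.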
I did try to split by $(?<=^(\d+).*)[\r\n]+^(?!\1), but it adds the line numbers as additional elements in the array.