Back-referencing in vim search - regex

In my file I have text of type:
1 module ABC(
2 …
3 …
any number of lines
n-1 …
n ABC …
I am trying to highlight both ABC in line 1 and ABC in line n.
I am using the following regex:
/module \(\<.\{-}\>\)\|^\s*\1
However, this regex is not working.
It highlights:
ABC on line 1, correctly, and
the whitespace between the beginning of the line and the first character on every line, which is not expected.
My end goal is to change the ABC at both places to have the same prefix and suffix.
In this case, let's say both ABCs get changed to prefix_ABC_suffix.
Please help me fix the search regex.
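A likely reason the pattern misbehaves is that \1 sits in a different alternation branch from the group it refers to, so in the second branch the group has captured nothing and the branch reduces to ^\s*. The end goal can be sketched as two steps in Python rather than Vim: capture the module name first, then rewrite it wherever it follows "module" or starts a line. The function name, the sample text and the assumption that the name is a plain \w+ identifier are all hypothetical.

```python
import re

def wrap_module_name(text, prefix="prefix_", suffix="_suffix"):
    # Step 1: capture the identifier that follows the keyword "module".
    found = re.search(r"\bmodule\s+(\w+)", text)
    if found is None:
        return text  # no module declaration, nothing to rename
    name = found.group(1)
    # Step 2: match the name right after "module", or as the first word on a line.
    pattern = re.compile(
        r"(\bmodule\s+|^\s*)" + re.escape(name) + r"\b", re.MULTILINE
    )
    # Keep whatever preceded the name (group 1) and wrap only the name itself.
    return pattern.sub(lambda m: m.group(1) + prefix + name + suffix, text)

# Hypothetical sample shaped like the text in the question.
sample = "module ABC(\n  input a,\n  output b\n);\nABC u0 (a, b);\n"
print(wrap_module_name(sample))
```

Doing the capture and the substitution separately avoids relying on a back-reference across alternation branches.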

Related

RegEx matching odd number of characters at the beginning and the end of string

I have a grammar in Lezer where I need to match a "custom string" which can start with any odd number of " and end with the same number of ". It can span multiple lines as well, and anything inside needs to be skipped as far as the parser is concerned. I am struggling a little with the regex part of the matching.
str"test" // valid
str"""test""" // valid
str""test"" // not valid
str""""test"""" // not valid
I am trying to match the beginning and end of that string.
I tried among other things "("")*[^"] but it also matches the first letter after the odd number of double quotes (due to the [^"]), which is something I would like to avoid.
For matching the end I have a similar issue.
So with the given input of:
1 str"test"
2 str"""
3 a
4 b
5 c
6 """
7 str""nope""
I am trying to match only str" for line 1 and str""" for line 2 and not match on line 7.
Need to match the ends as well (not in the same regex). So the match should be on " for line 1 and """ for line 6.
I have this so far: start: ^str"("")*[^"] end: [^"]"("")*$ but it is not optimal.
FYI I need start and end since the expectation is when you start writing and we hit a match on the beginning the highlighting in the editor should highlight all remaining text as a string until you have a matching odd number of ".
Any advice is appreciated.
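One way to avoid consuming the neighbouring character is to replace [^"] with a zero-width lookaround. The sketch below only demonstrates the idea with Python's re module; whether the Lezer tokenizer accepts lookarounds is not assumed here, and the names START, END, opening and closing are hypothetical.

```python
import re

# An odd run of quotes after "str": one quote, then any number of pairs,
# and the next character must not be another quote (nothing is consumed).
START = re.compile(r'^str"(?:"")*(?!")')
# An odd run of quotes at the end of the line, not preceded by a quote.
END = re.compile(r'(?<!")"(?:"")*$')

def opening(line):
    """Return the opening delimiter of the line, or None."""
    m = START.search(line)
    return m.group(0) if m else None

def closing(line):
    """Return the closing delimiter of the line, or None."""
    m = END.search(line)
    return m.group(0) if m else None

for line in ['str"test"', 'str"""', '"""', 'str""nope""']:
    print(repr(line), opening(line), closing(line))
```

These patterns only check that each run is odd; making the closing run the same length as the opening one would still have to be tracked by the surrounding tokenizer state.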

Using grep reverse to get rid of a line and a few before

I'd like to get rid of a line with a pattern containing:
CE1(2or8 # CE1(number 2 or 8
CE2(-1-17-2or8 # CE2(any number from -1 to 17, a dash, number 2 or 8
and 6 lines before that and 1 line after that.
grep -B6 -A1 'CE1([28]\|CE2([-1-17]-[28]' file
This attempt seems to match my pattern (does it do what I explicitly described?) but I was thinking of using reverse option to get rid of that pattern search from my file. Is it possible? It does not seem to work.
Not a complete answer, but some explanations:
A character class matches only one character. A hyphen in a character class is a literal hyphen when it is in the first position, at the end, escaped, or immediately after ^; anywhere else it defines a range of characters, not a range of numbers. (Experiment with an ASCII table at hand to get a feel for this.)
[-1-17] matches one of these characters that can be:
a literal hyphen (because at the beginning)
a character in the range 1-1 (so 1)
the character 7
To match an integer between -1 and 17, you need:
\(-1\|1[0-7]\|[0-9]\)
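The difference between the character class and the alternation can be checked with a short Python sketch. Python's re syntax writes the alternation without the backslashes that grep's basic syntax needs; the names CHAR_CLASS, INT_RANGE and in_range are hypothetical.

```python
import re

# The class from the question: a literal hyphen, the range 1-1, and the character 7.
CHAR_CLASS = re.compile(r"[-1-17]")
# The alternation from the answer, in Python syntax.
INT_RANGE = re.compile(r"-1|1[0-7]|[0-9]")

def in_range(text):
    """True if the whole text is an integer between -1 and 17."""
    return INT_RANGE.fullmatch(text) is not None

# Which single characters does the class accept?
single_chars = [c for c in "-0123456789" if CHAR_CLASS.fullmatch(c)]
# Which integers from -5 to 25 does the alternation accept?
accepted = [n for n in range(-5, 26) if in_range(str(n))]

print(single_chars)
print(accepted)
```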
The simplest and most robust (since it works even when the skipped range includes lines that match the regexp or when the range runs off the start/end of the input file) approach, IMHO, is 2 passes - the first to identify the lines to be skipped and the second to skip those lines:
$ cat file
a 1
b 2
c 3
d 4
e 5
f 6
g 7
h 8
i 9
$ awk -v b=3 -v a=1 'NR==FNR{if (/f/) for (i=NR-b;i<=NR+a;i++) skip[i]; next} !(FNR in skip)' file file
a 1
b 2
h 8
i 9
Just change /f/ to /<your regexp of choice>/ and set the b(efore) and a(fter) values as you like.
As for your particular regexp, you didn't provide any sample input and expected output for us to test against but I THINK what you want might be:
awk -v b=6 -v a=1 'NR==FNR{if (/CE(1\(|2\((-1|[0-9]|1[0-7])-)[28]/) for (i=NR-b;i<=NR+a;i++) skip[i]; next} !(FNR in skip)' file file
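The same two-pass idea can be sketched in Python for readers who prefer it to awk: the first pass collects the indices to skip, the second pass keeps everything else. This is a minimal sketch, not a replacement for the awk command; drop_with_context and its parameters are hypothetical names.

```python
import re

def drop_with_context(lines, pattern, before, after):
    """Drop every matching line plus `before` lines above and `after` lines below it."""
    regex = re.compile(pattern)
    skip = set()
    # Pass 1: record the index of every line that must be skipped.
    for i, line in enumerate(lines):
        if regex.search(line):
            # Indices outside the file are harmless; they are never looked up.
            skip.update(range(i - before, i + after + 1))
    # Pass 2: keep only the lines whose index was not recorded.
    return [line for i, line in enumerate(lines) if i not in skip]

lines = ["a 1", "b 2", "c 3", "d 4", "e 5", "f 6", "g 7", "h 8", "i 9"]
print(drop_with_context(lines, r"f", before=3, after=1))
```

With the sample file from the answer this keeps the same four lines as the awk command.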

Notepad++ `Find in Files` displays all results except a certain keyword

I am using Notepad++ to search for certain keywords (using regular expressions), something like (word1|word2|this statement|another statement). It works, but can I search and show all results except a certain keyword? Something like excluding the following words: (exclude this|exclude this)? For example, see below.
samedir\File1.log
This is line 1
This is line 2
This is line 3
exclude this
This is line 4
This is line 5
This is line 6
not excluded
excluding this
samedir\File2.log
This is line 1 1
This is line 2 1
This is line 3 1
exclude this
This is line 4 1
This is line 5 1 1
This is line 6 1
not excluded
excluding this
For example: I want to run a find in both files (in the same directory) but exclude the lines with excluding this and exclude this.
The results should show something like below:
File1.log
This is line 1
This is line 2
This is line 3
This is line 4
This is line 5
This is line 6
not excluded
File2.log
This is line 1 1
This is line 2 1
This is line 3 1
This is line 4 1
This is line 5 1 1
This is line 6 1
not excluded
You can do this with a lookahead assertion:
^(?!excluding this|exclude this)[^\r\n]*$
This will match entire lines as long as they don't start with excluding this or exclude this.
The key is the (?!) part. See http://www.regular-expressions.info/lookaround.html for more info.
You could try the regex like below to match all the lines which don't have exclude or excluding this strings.
^(?!.*\bexclud(?:ing|e) this\b).+$
The negative lookahead (?!.*\bexclud(?:ing|e) this\b) at the start asserts that the string exclude this or excluding this does not appear anywhere on the line being matched. That is, the above regex matches all lines except the ones which contain exclude this or excluding this.
I wanted to exclude one string and search for two strings in one line, so I used this regex:
^(?!.*excludethisstring).*includethisstring1.*includethisstring2.*$
With this pattern, a matching line MUST contain both strings. If you want to search for either one of the strings:
^(?!.*excludethisstring).*(includethisstring1|includethisstring2).*$
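The two lookahead styles above behave differently when the unwanted phrase is not at the start of the line. A small Python sketch can show this (Python's re handles these particular patterns the same way); the line "please exclude this" is a hypothetical extra test line, and the names CONTAINS, STARTS_WITH and keep are assumptions.

```python
import re

# Rejects a line if the phrase appears anywhere on it (the .* inside the lookahead).
CONTAINS = re.compile(r"^(?!.*\bexclud(?:ing|e) this\b).+$")
# Rejects a line only if it begins with the phrase (no .* inside the lookahead).
STARTS_WITH = re.compile(r"^(?!excluding this|exclude this)[^\r\n]*$")

def keep(regex, lines):
    """Return the lines that the given pattern matches."""
    return [line for line in lines if regex.match(line)]

lines = [
    "This is line 1",
    "exclude this",
    "not excluded",
    "excluding this",
    "please exclude this",  # hypothetical line: phrase in the middle
]
print(keep(CONTAINS, lines))
print(keep(STARTS_WITH, lines))
```

For the sample files in the question both patterns give the same result, because the unwanted phrases always start their lines.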

Java regular expression to get a value in matlab

Hi, I was wondering how I can do this in MATLAB: I have a file, and somewhere in the file I have this string = "1 to 10 of 434M". I would like to get the "434M". Keep in mind that the M can also be another letter (K or B), but it is always a capital letter. The number before the letter can have up to 3 digits, but can also be shorter.
How would I get this out of a text in matlab?
Assume that you read your file line by line. Then for each line execute the following commands:
% line is the current line of the input file
token = regexp(line, '1 to 10 of (\d+[MKB])', 'tokens', 'once');
if ~isempty(token)
    desired_string = token{1};  % the captured text, e.g. '434M'
end
This regular expression matches one or more digits before the letter (so it also matches e.g. 451274M). If it should only match numbers with 1 to 3 digits, use:
'1 to 10 of (\d{1,3}[MKB])'
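For comparison, the same pattern can be sketched in Python; this is only an illustration of the regex, not MATLAB code, and extract_total is a hypothetical name.

```python
import re

# Fixed prefix, then 1 to 3 digits followed by one of the capital letters M, K or B.
PATTERN = re.compile(r"1 to 10 of (\d{1,3}[MKB])")

def extract_total(text):
    """Return the captured value such as '434M', or None if the text has no match."""
    m = PATTERN.search(text)
    return m.group(1) if m else None

print(extract_total('string = "1 to 10 of 434M"'))
```

Because the prefix is part of the pattern, a longer number such as 451274M is rejected rather than truncated.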

c# split text file by changing the line number

I'm trying to split text file by line numbers,
for example, if I have text file like:
1 ljhgk uygk uygghl \r\n
1 ljhg kjhg kjhg kjh gkj \r\n
1 kjhl kjhl kjhlkjhkjhlkjhlkjhl \r\n
2 ljkih lkjhl kjhlkjhlkjhlkjhl \r\n
2 lkjh lkjh lkjhljkhl \r\n
3 asdfghjkl \r\n
3 qweryuiop \r\n
I want to split it to 3 parts (1,2,3),
How can I do this? the size of the text is very large (~20,000,000 characters) and I need an efficient way (like regex).
Another idea: you can use LINQ to get the groups you're after by grouping on the first word of each line. Note that this takes the first word of every line, so make sure you only have numbers there. This uses the split/join antipattern, but it seems to work nicely here.
var lines = from line in s.Split("\r\n".ToCharArray(),
                                 StringSplitOptions.RemoveEmptyEntries)
            let lineNumber = line.Split(" ".ToCharArray(), 2).FirstOrDefault()
            group line by lineNumber
            into g
            select String.Join("\n", g);
Notes:
GroupBy is guaranteed to return lines in the order they appeared.
If a block appears more than once (e.g. "1 1 2 2 3 3 1"), all blocks with the same number will be merged.
You can use a regex, but Split will not work too well. You can Match for the following pattern:
^(\d).*$ # Match first line, capture number
([\r\n]+^\1.*$)* # Match additional lines that begin with the same number
I did try to split by $(?<=^(\d+).*)[\r\n]+^(?!\1), but it adds the line numbers as additional elements in the array.
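The grouping idea can also be sketched without a regex. The Python sketch below groups consecutive lines that share the same first word; unlike the LINQ GroupBy above, it keeps repeated blocks (e.g. "1 1 2 2 1") as separate runs rather than merging them. The function name and the sample text are hypothetical.

```python
from itertools import groupby

def split_by_leading_number(text):
    """Return (number, block) pairs for runs of consecutive lines sharing a first word."""
    # splitlines() handles \r\n as well as \n; blank lines are dropped.
    lines = [line for line in text.splitlines() if line.strip()]
    return [
        (number, "\n".join(group))
        for number, group in groupby(lines, key=lambda line: line.split(None, 1)[0])
    ]

# Hypothetical sample shaped like the file in the question.
sample = "1 aa\r\n1 bb\r\n2 cc\r\n3 dd\r\n3 ee\r\n"
for number, block in split_by_leading_number(sample):
    print(number, repr(block))
```

This makes a single pass over the lines, which should be adequate for a text of the size mentioned in the question, though that has not been measured here.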