Regular expressions negative lookahead - regex

I'm doing some regular expression gymnastics. I set myself the task of trying to search for C# code where there is a usage of the as-operator not followed by a null-check within a reasonable amount of space. Now I don't want to parse the C# code. E.g. I want to capture code snippets such as
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
however, not capture
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1 == null)
nor for that matter
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(somethingunrelated == null) {...}
if(x1.a == y1.a)
Thus any null-check at all counts as a "good check", and the snippet should not be found.
The question is: how do I match something while ensuring that something else is not found in its surroundings?
I've tried the naive approach: looking for 'as', then doing a negative lookahead within 150 characters.
\bas\b.{1,150}(?!\b==\s*null\b)
Unfortunately, the above regular expression matches all of the above examples. My gut tells me the problem is that .{1,150} can stop at many positions, and at nearly all of them the negative lookahead does not find '== null'.
If I try negating the whole expression, that doesn't help either, as that would match most C# code around.

I love regex gymnastics! Here is a commented PHP regex:
$re = '/# Find all AS, (but not preceding a XX == null).
\bas\b # Match "as"
(?= # But only if...
(?: # there exist from 1-150
[\S\s] # chars, each of which
(?!==\s*null) # are NOT preceding "=NULL"
){1,150}? # (and do this lazily)
(?: # We are done when either
(?= # we have reached
==\s*(?!null) # a non NULL conditional
) #
| $ # or the end of string.
)
)/ix';
And here it is in Javascript style:
re = /\bas\b(?=(?:[\S\s](?!==\s*null)){1,150}?(?:(?===\s*(?!null))|$))/ig;
This one did make my head hurt a little...
Here is the test data I am using:
text = r""" var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
however, not capture
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1 == null)
nor for that matter
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(somethingunrelated == null) {...}
if(x1.a == y1.a)"""

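Put the .{1,150} inside the lookahead, and replace . with [\s\S] (in general, . doesn't match newlines). Also, the \b before == gets in the way: in x1 == null a space precedes ==, and there is no word boundary between a space and =, so \b==\s*null can never match there.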
\bas\b(?![\s\S]{1,150}==\s*null\b)
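As a quick check, here is a minimal Python sketch of the pattern above run against the three snippets from the question. The names (PATTERN, unchecked_as, the snippet_* strings) are made up for this illustration, and it assumes Python's re module treats this particular pattern the same way the .NET engine does.

```python
import re

# The pattern from the answer: "as" not followed, within 1-150 characters
# (newlines included), by "== null".
PATTERN = re.compile(r"\bas\b(?![\s\S]{1,150}==\s*null\b)")

snippet_plain = """var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)"""

snippet_checked = """var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1 == null)"""

snippet_unrelated = """var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(somethingunrelated == null) {...}
if(x1.a == y1.a)"""


def unchecked_as(text):
    """Return the start offsets of every "as" with no null-check nearby."""
    return [m.start() for m in PATTERN.finditer(text)]


for name, text in [("plain", snippet_plain),
                   ("checked", snippet_checked),
                   ("unrelated", snippet_unrelated)]:
    print(name, unchecked_as(text))
```

With the snippets from the question, this flags both "as" casts in the first snippet and none in the other two.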

I think it would help to put the variable name into () so you can use it as a back reference. Something like the following,
\b(\w+)\b\W*=\W*\w*\W*\bas\b[\s\S]{1,150}(?!\b\1\b\W*==\W*\bnull\b)
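Note that in the pattern above the [\s\S]{1,150} still sits outside the lookahead, so it can stop wherever the lookahead happens to succeed, just like the original attempt. A minimal Python sketch of the same back-reference idea with the span moved inside the lookahead might look like this. The pattern is an adaptation rather than the answer's exact regex, the names are hypothetical, and unlike the question's requirement it only accepts a null-check on the same variable.

```python
import re

# Group 1 captures the variable assigned from an "as" cast; the lookahead
# then rejects the match if that same variable is compared with null
# within the next 150 characters.
UNCHECKED = re.compile(
    r"\b(\w+)\s*=\s*\w+\s+as\b"
    r"(?![\s\S]{1,150}?\b\1\s*==\s*null\b)"
)


def unchecked_variables(text):
    """Names of variables assigned via "as" with no null-check of their own nearby."""
    return UNCHECKED.findall(text)


code = """var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1 == null)"""

print(unchecked_variables(code))  # ['y1']
```

Here x1 is considered checked, while y1 is reported because nothing compares it with null.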

The question isn't clear. What EXACTLY do you want? I'm sorry, but I still don't understand it after reading the question and the comments numerous times.
Must the code be in C#? In Python? Something else? Nothing in the question says.
Do you want a match only if an if(... == ...) line directly follows a block of var ... = ... lines?
Or may an unrelated line sit BETWEEN the block and the if(... == ...) line without stopping the match?
My code assumes the second option.
Does an if(... == null) line AFTER an if(... == ...) line stop the match or not?
Since I couldn't tell whether the answer is yes or no, I defined two regexes to cover the two options.
I hope my code is clear enough and addresses your concern.
It is in Python:
import re
ch1 ='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
1618987987849891
'''
ch2 ='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
uydtdrdutdutrr
if(x1.a == y1.a)
3213546878'''
ch3='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1 == null)
165478964654456454'''
ch4='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
hgyrtdduihudgug
if(x1 == null)
165489746+54646544'''
ch5='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(somethingunrelated == null ) {...}
if(x1.a == y1.a)
1354687897'''
ch6='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
ifughobviudyhogiuvyhoiuhoiv
if(somethingunrelated == null ) {...}
if(x1.a == y1.a)
2468748874897498749874897'''
ch7 = '''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
iufxresguygo
liygcygfuihoiuguyg
if(somethingunrelated == null ) {...}
oufxsyrtuy
'''
ch8 = '''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
tfsezfuytfyfy
if(x1.a == y1.a)
iufxresguygo
liygcygfuihoiuguyg
if(somethingunrelated == null ) {...}
oufxsyrtuy
'''
ch9 = '''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
tfsezfuytfyfy
if(x1.a == y1.a)
if(somethingunrelated == null ) {...}
oufxsyrtuy
'''
# pat1: a block of "var ... as ..." lines, then (lazily) any characters none of
# which is followed by "== null", then an if(... == ...) line that is not a null-check.
pat1 = re.compile(
    r'('
    r'(^var +\S+ *= *\S+ +as .+[\r\n]+)+?'
    r'([\s\S](?!==\s*null\b))*?'
    r'^if *\( *[^\s=]+ *==(?!\s*null).+$'
    r')',
    re.MULTILINE)
# pat2: same as pat1, but also rejects the match if any "==" follows
# within the next 150 characters.
pat2 = re.compile(
    r'('
    r'(^var +\S+ *= *\S+ +as .+[\r\n]+)+?'
    r'([\s\S](?!==\s*null\b))*?'
    r'^if *\( *[^\s=]+ *==(?!\s*null).+$'
    r')'
    r'(?![\s\S]{0,150}==)',
    re.MULTILINE)
for ch in (ch1, ch2, ch3, ch4, ch5, ch6, ch7, ch8, ch9):
    m1 = pat1.search(ch)
    print(m1.group() if m1 else None)
    print()
    m2 = pat2.search(ch)
    print(m2.group() if m2 else None)
    print('-----------------------------------------')
Result
>>>
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
-----------------------------------------
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
uydtdrdutdutrr
if(x1.a == y1.a)
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
uydtdrdutdutrr
if(x1.a == y1.a)
-----------------------------------------
None
None
-----------------------------------------
None
None
-----------------------------------------
None
None
-----------------------------------------
None
None
-----------------------------------------
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
None
-----------------------------------------
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
tfsezfuytfyfy
if(x1.a == y1.a)
None
-----------------------------------------
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
tfsezfuytfyfy
if(x1.a == y1.a)
None
-----------------------------------------
>>>

Let me try to redefine your problem:
Look for an "as" assignment. You probably need a better regex to find actual assignments and may want to capture the expression assigned, but let's use "\bas\b" for now.
If you see an if (... == null) within 150 characters, don't match.
If you don't see an if (... == null) within 150 characters, match.
Your expression \bas\b.{1,150}(?!\b==\s*null\b) won't work because of where the negative look-ahead sits. The .{1,150} can always consume one character more or fewer so that the look-ahead lands on a position that is not followed by == null, and you end up matching even when there is an if (... == null) there.
Regexes are really not good at not matching something. In this case, you're better off trying to match an "as" assignment with an "== null" check within 150 characters (using [\s\S] so the span can cross line breaks):
\bas\b[\s\S]{1,150}?==\s*null\b
and then negating the check in code: if (!regex.match(text)) ...
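The "match the null-check, then negate in code" idea can be sketched in Python as below. This is only an illustration under some assumptions: the helper name has_unchecked_as and the slicing-based 150-character window are made up here, and the check is applied per "as" rather than once to the whole text.

```python
import re

AS_OPERATOR = re.compile(r"\bas\b")
NULL_CHECK = re.compile(r"==\s*null\b")


def has_unchecked_as(text, window=150):
    """True if some "as" has no "== null" in the following `window` chars."""
    for m in AS_OPERATOR.finditer(text):
        following = text[m.end():m.end() + window]
        # Positive search for the null-check, negated in ordinary code.
        if not NULL_CHECK.search(following):
            return True
    return False


print(has_unchecked_as("var x1 = x as Foo;\nif(x1.a == 1)"))
print(has_unchecked_as("var x1 = x as Foo;\nif(x1 == null)"))
```

Slicing the window is a simplification: a null-check that straddles the end of the window is not counted.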

(?s:\s+as\s+(?!.{0,150}==\s*null\b))
I'm activating the SingleLine option with ?s:. You can set it in the options of your Regex instead if you want. I'm putting \s around as because I think that only whitespace is "legal" around the as. You could also use \b instead, like
(?s:\bas\b(?!.{0,150}==\s*null\b))
Be aware that \s will probably catch spaces that aren't "valid spaces". It's defined as [\f\n\r\t\v\x85\p{Z}] where \p{Z} is Unicode Characters in the 'Separator, Space' Category plus Unicode Characters in the 'Separator, Line' Category plus Unicode Characters in the 'Separator, Paragraph' Category.

Rust regexes live long enough for match but not find

I'm trying to understand why is_match behaves differently from find in the regex crate, going by the documentation.
I have the following for match:
use regex::Regex;
{
let meow = String::from("This is a long string that I am testing regexes on in rust.");
let re = Regex::new("I").unwrap();
let x = re.is_match(&meow);
dbg!(x)
}
And get:
[src/lib.rs:142] x = true
Great, now let's identify the location of the match:
{
let meow = String::from("This is a long string that I am testing regexes on in rust.");
let re = Regex::new("I").unwrap();
let x = re.find(&meow).unwrap();
dbg!(x)
}
And I get:
let x = re.find(&meow).unwrap();
^^^^^ borrowed value does not live long enough
}
^ `meow` dropped here while still borrowed
`meow` does not live long enough
I think I'm following the documentation. Why does the string meow live long enough for a match but not long enough for find?
Writing a value without ; at the end of a { } scope effectively returns that value out of the scope. For example:
fn main() {
let x = {
let y = 10;
y + 1
};
dbg!(x);
}
[src/main.rs:7] x = 11
Here, because we don't write a ; after the y + 1, it gets returned from the inner scope and written to x.
If you write a ; after it, you will get something different:
fn main() {
let x = {
let y = 10;
y + 1;
};
dbg!(x);
}
[src/main.rs:7] x = ()
Here you can see that the ; now prevents the value from being returned. Because no value gets returned from the inner scope, it implicitly gets the empty return type (), which gets stored in x.
The same happens in your code:
use regex::Regex;
fn main() {
let z = {
let meow = String::from("This is a long string that I am testing regexes on in rust.");
let re = Regex::new("I").unwrap();
let x = re.is_match(&meow);
dbg!(x)
};
dbg!(z);
}
[src/main.rs:9] x = true
[src/main.rs:12] z = true
Because you don't write a ; after the dbg!() macro call, its return value gets returned from the inner scope. The dbg!() macro simply returns the value that gets passed to it, so the return value of the inner scope is x. And because x is just a bool, it gets returned without a problem.
Now let's look at your second example:
use regex::Regex;
fn main() {
let z = {
let meow = String::from("This is a long string that I am testing regexes on in rust.");
let re = Regex::new("I").unwrap();
let x = re.find(&meow).unwrap();
dbg!(x)
};
dbg!(z);
}
error[E0597]: `meow` does not live long enough
--> src/main.rs:8:25
|
4 | let z = {
| - borrow later stored here
...
8 | let x = re.find(&meow).unwrap();
| ^^^^^ borrowed value does not live long enough
9 | dbg!(x)
10 | };
| - `meow` dropped here while still borrowed
And now it should be more obvious what's happening: It's basically the same as the previous example, just that the returned x is now a type that internally borrows meow. And because meow gets destroyed at the end of the scope, x cannot be returned, as it would outlive meow.
The reason why x borrows from meow is because regular expression Matches don't actually copy the data they matched, they just store a reference to it.
So if you add a ;, you prevent the value from being returned from the scope, changing the scope return value to ():
use regex::Regex;
fn main() {
let z = {
let meow = String::from("This is a long string that I am testing regexes on in rust.");
let re = Regex::new("I").unwrap();
let x = re.find(&meow).unwrap();
dbg!(x);
};
dbg!(z);
}
[src/main.rs:9] x = Match {
text: "This is a long string that I am testing regexes on in rust.",
start: 27,
end: 28,
}
[src/main.rs:12] z = ()

invalid syntax and program error

forvalue n=1/18 {
if f_3_`n'_==1 {
local i= `0'
local y=`i'+1
gen ownagri_`y' = f123a_`y'_
replace ownagri_`y' = . if f_2_sel_`n' ==1
local i = `i'+1
}
else if f_3_`n'_==2 {
local i= `0'
local y=`i'+1
gen agri_`y' = f126_a1_`n'_
replace agri_`y' = .if f_2_sel_`n' ==1
local i = `i'+1
}
else if f_3_`n'_==3 {
local i= `0'
local y=`i'+1
gen nonagri_`y' = f126_a1_`n'_
replace nonagri_`y' = . if f_2_sel_`n' ==1
local i = `i'+1
}
else if f_3_`n'_==4 {
local i=`0'
local y=`i'+1 {
gen nonagriself_`y' = f128_`n'_
replace nonagriself_`y' = . if f_2_sel_`n' ==1
local i = `i'+1
}
else if f_3_`n'_==5 {
local i=`0'
local y=`i'+1
gen military_`y' = . if f_2_sel_`n' ==1
local i = `i'+1
}}
}
Stata says my command contains invalid syntax and there is program error: code follows on the same line as close brace.
EDIT:
forvalue n = 1/18 {
if f_3_`n'_==1 {
local y1 = 1
gen ownagri_`y1' = f123a_`y1'_
replace ownagri_`y' = . if f_2_sel_`n'_ ==1
local y1 = `y1'+1
}
else if f_3_`n'_==2 {
local y2 = 1
gen agri_`y2' = f126_a1_`y2'_
replace agri_`y2' = . if f_2_sel_`n'_ ==1
local y2 = `y2'+1
}
else if f_3_`n'_==3 {
local y3 = 1
gen nonagri_`y3' = f126_a1_`y3'_
replace nonagri_`y3' = . if f_2_sel_`n'_ ==1
local y3 = `y3'+1
}
else if f_3_`n'_==4 {
local y4 = 1
gen nonagriself_`y4' = f128_`y4'_
replace nonagriself_`y4' = . if f_2_sel_`n'_ ==1
local y4 = `y4'+1
}
else if f_3_`n'_==6 {
local y5 = 1
gen military_`y5' = f129a_`y5'_
replace military_`y5' = . if f_2_sel_`n'_ ==1 ,modify
local y5 = `y5'+1
}
}
I modified the code and the program seems to work, but the results generated seem to be incomplete. The output is as follows:
(20,070 missing values generated)
(2,194 real changes made, 2,194 to missing)
(19,229 missing values generated)
(1,129 real changes made, 1,129 to missing)
Why?
Specific comments already made:
a. The }} on the next to last line should be } (William Lisowski)
b. Lines like
if f_3_`n'_==1
are evaluated as if
if f_3_`n'_[1] ==1
which is usually not what is wanted. See this FAQ for much more. But this is not a syntax error.
New specific comment:
c. The line
local y=`i'+1 {
has a spurious { that should be removed.
General comments:
A. Throwing a large chunk of code at us without context is poor question style. You're new to this, which is fine, but equally there is accessible advice for you to follow e.g. on good examples.
B. There is no context here on what you are trying to do and no data example. The Statalist advice on how to present data examples in its FAQ carries over to other sites with easy small modifications (e.g. the advice there on delimiters [CODE] and [/CODE] is irrelevant here).
C. There is repeated code in every branch which can be moved, producing this:
local i = `0'
local y = `i'+1
forvalue n = 1/18 {
if f_3_`n'_==1 {
gen ownagri_`y' = f123a_`y'_
replace ownagri_`y' = . if f_2_sel_`n' ==1
}
else if f_3_`n'_==2 {
gen agri_`y' = f126_a1_`n'_
replace agri_`y' = . if f_2_sel_`n' ==1
}
else if f_3_`n'_==3 {
gen nonagri_`y' = f126_a1_`n'_
replace nonagri_`y' = . if f_2_sel_`n' ==1
}
else if f_3_`n'_==4 {
gen nonagriself_`y' = f128_`n'_
replace nonagriself_`y' = . if f_2_sel_`n' ==1
}
else if f_3_`n'_==5 {
gen military_`y' = . if f_2_sel_`n' ==1
}
}
local i = `i'+1
Whether this code is what you want we cannot say, but it looks legal, and it's shorter than your original.

How do I include a regular expression in a function in R?

Below I wrote a function which searches for specific regular expressions within a vector. The function always searches for "Beer" or "Wine". Now I would like to pass the regular expressions I am searching for (in my case "Beer" and "Wine") to the function as additional arguments. How can I do this?
x <- c("Beer","Wine","wine","Beer","Beef","Potato","Vacation")
Thirsty <- function(x) {
Beer <- grepl("Beer",x, ignore.case = TRUE)
Beer <- as.numeric(Beer == "TRUE")
Wine <- grepl("Wine",x, ignore.case = TRUE)
Wine <- as.numeric(Wine == "TRUE")
Drink <- Beer + Wine
Drink <- as.numeric(Drink == "0")
Drink <- abs(Drink -1)
}
y <- Thirsty(x)
y
This can be done with the following code:
x <- c("Beer","Wine","wine","Beer","Beef","Potato","Vacation")
drinks <- c("Beer","Wine")
Thirsty <- function(x, drinks) {
Reduce("|",lapply(drinks, function(p)grepl(p,x, ignore.case = TRUE)))
}
y <- Thirsty(x,drinks)
y
lapply loops over the possibilities in drinks and produces a list of logical vectors, one for each drink. These are combined into a single vector by Reduce.
I would simply try to concatenate the match patterns with |
strings = c("Beer","Wine","wine","Beer","Beef","Potato","Vacation")
thirstStrings = c("beer", "wine")
matchPattern = paste0(thirstStrings, collapse = "|") #"beer|wine"
grep(pattern = matchPattern, x = strings, ignore.case = T)
# [1] 1 2 3 4
You can easily wrap that in a function
Thirsty = function(x, matchStrings){
matchPattern = paste0(matchStrings, collapse = "|") #"beer|wine"
grep(pattern = matchPattern, x = x, ignore.case = T)
}
Thirsty(strings, thirstStrings) # [1] 1 2 3 4
This should also work
Thirsty = function(vec, ...) {
pattern = paste0(unlist(list(...)), collapse = "|")
stringr::str_detect(tolower(vec), pattern)
}
> Thirsty (x, "beer", "wine")
[1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE

can I replace multiline like this in vim?

I want to keep my JS in this style, and I want to write a map in Vim to do it faster.
from:
var a = x;
var b = y;
var c = z;
to:
var a = x
, b = y
, c = z
;
Use the following command.
%s/;\nvar /\r , /gc
My solution in a case like this would be
position cursor on first var (e.g. {)
1f;C-vjr,wC-vjexkJJ
For info, the Align script has the inverse operation:
var a = x, b = y, c = z;
V<Leader>a, gives this result:
var a = x;
var b = y;
var c = z;

How do I add values of selected rows from a list in a functional style?

I solved my problem in an imperative style, but it looks very ugly. How can I make it better (more elegant, more concise, more functional; after all, it's Scala)? Rows with the same value as the previous row but a different letter should be skipped; the values of all other rows should be added.
val row1 = new Row(20, "A", true) // add value
val row2 = new Row(30, "A", true) // add value
val row3 = new Row(40, "A", true) // add value
val row4 = new Row(40, "B", true) // same value as the previous element & different letter -> skip row
val row5 = new Row(60, "B", true) // add value
val row6 = new Row(70, "B", true) // add value
val row7 = new Row(70, "B", true) // same value as the previous element, but the same letter -> add value
val rows = List(row1, row2, row3, row4, row5, row6, row7)
var previousLetter = " "
var previousValue = 0.00
var countSkip = 0
for (row <- rows) {
if (row.value == previousValue && row.letter != previousLetter) {
row.relevant = false
countSkip += 1
}
previousLetter = row.letter
previousValue = row.value
}
// get sum
val sumValue = rows.filter(_.relevant == true).map(_.value) reduceLeftOption(_ + _)
val sum = sumValue match {
case Some(d) => d
case None => 0.00
}
assert(sum == 290)
assert(countSkip == 1)
Thanks in advance
Twistleton
(rows.head :: rows).sliding(2).collect{
case List(Row(v1,c1), Row(v2,c2)) if ! (v1 == v2 && c1 != c2) => v2 }.sum
I think the shortest (bulletproof) solution when Row is a case class (dropping the boolean) is
(for ((Row(v1,c1), Row(v2,c2)) <- (rows zip rows.take(1) ::: rows) if (v1 != v2 || c1 == c2)) yield v1).sum
Some of the other solutions don't handle the list-is-empty case, but this is largely because sliding has a bug where it will return a partial list if the list is too short. Clearer to me (and also bulletproof) is:
(rows zip rows.take(1) ::: rows).collect{
case (Row(v1,c1), Row(v2,c2)) if (v1 != v2 || c1 == c2) => v1
}.sum
(which is only two characters longer if you keep it on one line). If you need the number skipped also,
val indicated = (rows zip rows.take(1) ::: rows).collect {
case (Row(v1,c1), Row(v2,c2)) => (v1, v1 != v2 || c1 == c2)
}
val countSkip = indicated.filterNot(_._2).length
val sum = indicated.filter(_._2).map(_._1).sum
Fold it:
scala> rows.foldLeft((row1, 0))((p:(Row,Int), r:Row) => (r, p._2 + (if (p._1.value == r.value && p._1.letter != r.letter) 0 else r.value)))._2
res2: Int = 290
(new Row(0, " ", true) +: rows).sliding(2).map { case List(r1, r2) =>
if (r1.value != r2.value || r1.letter == r2.letter) { r2.value }
else { 0 }
}.sum
Of course you can drop the boolean member of Row if you do not need it for something else
Reduce it:
rows.reduceLeft { (prev, curr) =>
if (prev.value == curr.value && prev.letter != curr.letter) {
curr.relevant = false
countSkip += 1
}
curr
}