I have the following regular expression for the range 288-303, but it is not working in GVim.
The regexp is :/28[89]|29[0-9]|30[0-3]/.
Could anyone please point out the reason? I searched Stack Overflow and got the regexp from http://utilitymill.com/utility/Regex_For_Range/42.
You have to escape the pipe in Vim:
:/28[89]\|29[0-9]\|30[0-3]/
Edit:
Per @Tim's comment, you can optionally prefix the pattern with \v instead of escaping the individual pipe characters:
:/\v28[89]|29[0-9]|30[0-3]/
Thanks @Tim.
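As a quick sanity check outside Vim, the same alternation can be tried in Python, where | needs no escaping (much like Vim's \v mode). This is only a sketch: the \b word boundaries and the name RANGE_288_303 are my additions and are not part of the pattern above.

```python
import re

# Same alternation as in the answer. The \b anchors are an added assumption
# so that longer digit runs such as 1288 or 2881 are not matched.
RANGE_288_303 = re.compile(r"\b(?:28[89]|29[0-9]|30[0-3])\b")

# Collect every number below 1000 whose decimal form the pattern accepts.
matched = [n for n in range(0, 1000) if RANGE_288_303.fullmatch(str(n))]
print(matched[0], matched[-1], len(matched))
```

The list should contain exactly the integers from 288 to 303.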
Based on Jim's answer, I wrote a small script to search for integers in a given range. You use the command like this:
:Range 341 752
This matches each sequence of digits between the two numbers 341 and 752, using a search like
/\%(3\%(\%(4\%([1-9]\)\)\|\%([5-8]\d\{1}\)\|\%(9\%([0-9]\)\)\)\)\|\%([4-6]\d\{2}\)\|\%(7\%(\%(0\%([0-9]\)\)\|\%([1-4]\d\{1}\)\|\%(5\%([0-2]\)\)\)\)
Just add this to your vimrc:
function! RangeMatch(min,max)
    let l:res = RangeSearchRec(a:min,a:max)
    execute "/" . l:res
    let @/=l:res
endfunction
" TODO: handle the case where the two numbers don't have the same number of digits
function! RangeSearchRec(min,max) " assumes both numbers have the same number of digits
    if len(a:max) == 1
        return '[' . a:min . '-' . a:max . ']'
    endif
    if a:min[0] < a:max[0]
        " match from a:min up to 99...9, then the full blocks between the two leading digits, then from a:max[0]*10^x up to a:max
        let l:zeros=repeat('0',len(a:max)-1) " string of zeros (currently unused)
        let l:res = '\%(' . a:min[0] . '\%(' . RangeSearchRec( a:min[1:], repeat('9',len(a:max)-1) ) . '\)\)' " e.g. 657 to 699
        if a:min[0] +1 < a:max[0]
            let l:res.= '\|' . '\%('
            let l:res.= '[' . (a:min[0]+1) . '-' . (a:max[0]-1) . ']'
            let l:res.= '\d\{' . (len(a:max)-1) .'}' . '\)' " e.g. 700 to 899
        endif
        let l:res.= '\|' . '\%(' . a:max[0] . '\%(' . RangeSearchRec( repeat('0',len(a:max)-1) , a:max[1:] ) . '\)\)' " e.g. 900 to 957
        return l:res
    else
        return '\%(' . a:min[0] . RangeSearchRec(a:min[1:],a:max[1:]) . '\)'
    endif
endfunction
command! -nargs=* Range call RangeMatch(<f-args>)
Note that grouping with \%(\) instead of \(\) avoids the error E872: (NFA regexp) Too many '('.
For the example above, the script looks for 341-399, 400-699 or 700-752.
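The same recursion can be sketched in Python to check the idea by brute force. This is not the Vim script itself: range_pattern is a hypothetical name, the output uses Python regex syntax rather than Vim's, and both bounds are assumed to have the same number of digits with the lower bound not exceeding the upper one.

```python
import re

def range_pattern(lo: str, hi: str) -> str:
    """Regex source matching every integer from lo to hi (same digit count)."""
    if len(lo) != len(hi):
        raise ValueError("both bounds must have the same number of digits")
    if len(hi) == 1:
        return f"[{lo}-{hi}]"
    if lo[0] == hi[0]:
        # Same leading digit: keep it and recurse on the remaining digits.
        return lo[0] + "(?:" + range_pattern(lo[1:], hi[1:]) + ")"
    rest = len(hi) - 1
    # Lower block: lo up to the last number with the same leading digit.
    branches = [lo[0] + "(?:" + range_pattern(lo[1:], "9" * rest) + ")"]
    # Middle block: leading digits strictly between the two bounds.
    if int(lo[0]) + 1 < int(hi[0]):
        middle = f"[{int(lo[0]) + 1}-{int(hi[0]) - 1}]"
        branches.append(middle + r"\d{%d}" % rest)
    # Upper block: first number with hi's leading digit up to hi.
    branches.append(hi[0] + "(?:" + range_pattern("0" * rest, hi[1:]) + ")")
    return "|".join(branches)

pattern = re.compile("(?:" + range_pattern("341", "752") + ")")
hits = [n for n in range(100, 1000) if pattern.fullmatch(str(n))]
print(hits[0], hits[-1], len(hits))
```

Checking every three-digit number this way is a cheap guard against off-by-one mistakes in the digit classes.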
Context: I am dealing with a mix of boolean and arithmetic expressions that may look like in the following example:
b_1 /\ (0 <= x_1) /\ (x_2 <= 2 \/ (b_3 /\ ((/ 1 3) <= x_4))))
I want to match and extract any constraint of the shape A <= B contained in the formula which must be always true. In the above example, only 0 <= x_1 would satisfy such criterion.
Current Goal:
My idea is to build a simple parse tree of the input formula focusing only on the following tokens: and (/\), or (\/), left bracket (() and right bracket ()). Given the above formula, I would like to generate the following AST:
/\
|_ "b_1"
|_ /\
|_ "0 <= x_1"
|_ \/
|_ "x_2 <= 2"
|_ /\
|_ "b_3"
|_ "(/ 1 3) <= x_4"
Then, I can simply walk through the AST and discard any sub-tree rooted at \/.
My Attempt:
Looking at this documentation, I am defining the grammar for the lexer as follows:
import ply.lex as lex

tokens = (
    "LPAREN",
    "RPAREN",
    "AND",
    "OR",
    "STRING",
)

t_AND = r'\/\\'
t_OR = r'\\\/'
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_ignore = ' \t\n'

def t_error(t):
    print(t)
    print("Illegal character '{}'".format(t.value[0]))
    t.lexer.skip(1)

def t_STRING(t):
    r'^(?!\)|\(| |\t|\n|\\\/|\/\\)'
    t.value = t
    return t

data = "b_1 /\ (x_2 <= 2 \/ (b_3 /\ ((/ 1 3) <= x_4))"
lexer = lex.lex()
lexer.input(data)
while True:
    tok = lexer.token()
    if not tok:
        break
    print(tok.type, tok.value, tok.lineno, tok.lexpos)
However, I get the following output:
~$ python3 lex.py
LexToken(error,'b_1 /\\ (x_2 <= 2 \\/ (b_3 /\\ ((/ 1 3) <= x_4))',1,0)
Illegal character 'b'
LexToken(error,'_1 /\\ (x_2 <= 2 \\/ (b_3 /\\ ((/ 1 3) <= x_4))',1,1)
Illegal character '_'
LexToken(error,'1 /\\ (x_2 <= 2 \\/ (b_3 /\\ ((/ 1 3) <= x_4))',1,2)
Illegal character '1'
AND /\ 1 4
LPAREN ( 1 7
LexToken(error,'x_2 <= 2 \\/ (b_3 /\\ ((/ 1 3) <= x_4))',1,8)
Illegal character 'x'
LexToken(error,'_2 <= 2 \\/ (b_3 /\\ ((/ 1 3) <= x_4))',1,9)
Illegal character '_'
LexToken(error,'2 <= 2 \\/ (b_3 /\\ ((/ 1 3) <= x_4))',1,10)
Illegal character '2'
LexToken(error,'<= 2 \\/ (b_3 /\\ ((/ 1 3) <= x_4))',1,12)
Illegal character '<'
LexToken(error,'= 2 \\/ (b_3 /\\ ((/ 1 3) <= x_4))',1,13)
Illegal character '='
LexToken(error,'2 \\/ (b_3 /\\ ((/ 1 3) <= x_4))',1,15)
Illegal character '2'
OR \/ 1 17
LPAREN ( 1 20
LexToken(error,'b_3 /\\ ((/ 1 3) <= x_4))',1,21)
Illegal character 'b'
LexToken(error,'_3 /\\ ((/ 1 3) <= x_4))',1,22)
Illegal character '_'
LexToken(error,'3 /\\ ((/ 1 3) <= x_4))',1,23)
Illegal character '3'
AND /\ 1 25
LPAREN ( 1 28
LPAREN ( 1 29
LexToken(error,'/ 1 3) <= x_4))',1,30)
Illegal character '/'
LexToken(error,'1 3) <= x_4))',1,32)
Illegal character '1'
LexToken(error,'3) <= x_4))',1,34)
Illegal character '3'
RPAREN ) 1 35
LexToken(error,'<= x_4))',1,37)
Illegal character '<'
LexToken(error,'= x_4))',1,38)
Illegal character '='
LexToken(error,'x_4))',1,40)
Illegal character 'x'
LexToken(error,'_4))',1,41)
Illegal character '_'
LexToken(error,'4))',1,42)
Illegal character '4'
RPAREN ) 1 43
RPAREN ) 1 44
The t_STRING token is not recognized as it should be.
Question: how do I set the catch-all regular expression for t_STRING so as to get a working tokenizer?
Your regular expression for t_STRING most certainly doesn't do what you want. What it does do is a little more difficult to answer.
In principle, it consists only of two zero-length assertions: ^, which is only true at the beginning of the string (unless you provide the re.MULTILINE flag, which you don't), and a long negative lookahead assertion.
A pattern which consists only of zero-length assertions can only match the empty string, if it matches anything at all. But lexer patterns cannot be allowed to match the empty string. Lexers divide the input into a series of tokens, so that every character in the input belongs to some token. Each match -- and they are all matches, not searches -- starts precisely at the end of the previous match. So if a pattern could match the empty string, the lexer would try the next match at the same place, with the same result, which would be an endless loop.
Some lexer generators solve this problem by forcing a minimum one-character match using a built-in catch-all error pattern, but Ply simply refuses to generate a lexer if a pattern matches the empty string. Yet Ply does not complain about this lexer specification. The only possible explanation is that the pattern cannot match anything.
The key is that Ply compiles all patterns using the re.VERBOSE flag, which allows you to separate items in regular expressions with whitespace, making the regexes slightly less unreadable. As the Python documentation indicates:
Whitespace within the pattern is ignored, except when in a character class, or when preceded by an unescaped backslash, or within tokens like *?, (?: or (?P<...>.
Whitespace includes newlines and even comments (starting with a # character), so you can split patterns over several lines and insert comments about each piece.
We could do that, in fact, with your pattern:
def t_STRING(t):
    r'''^ # Anchor this match at the beginning of the input
        (?! # Don't match if the next characters match:
            \) | # Close parenthesis
            \( | # Open parenthesis
            \ | # !!! HERE IS THE PROBLEM
            \t | # Tab character
            \n | # Newline character
            \\\/ | # \/ token
            \/\\ # /\ token
        )
    '''
    t.value = t
    return t
As I added whitespace and comments to your pattern, I noticed that the original pattern attempted to match a space character as an alternative with | |. But since the pattern is compiled as re.VERBOSE, that space character is ignored, leaving an empty alternative, which matches the empty string. That alternative is part of a negative lookahead assertion, which means that the assertion will fail if the string to match at that point starts with the empty string. Of course, every string starts with the empty string, so the negative lookahead assertion always fails, explaining why Ply didn't complain (and why the pattern never matches anything).
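The effect of the flag can be reproduced with the standard re module alone, without Ply. This is a small sketch using the pattern from the question unchanged; the variable names are only for illustration.

```python
import re

# The question's pattern, unchanged.
PATTERN = r'^(?!\)|\(| |\t|\n|\\\/|\/\\)'

plain = re.compile(PATTERN)
verbose = re.compile(PATTERN, re.VERBOSE)

# Without VERBOSE the lookahead succeeds on "b_1" and the match is empty.
plain_match = plain.match("b_1")
# With VERBOSE the space vanishes, leaving an empty alternative, so the
# negative lookahead fails at every position.
verbose_match = verbose.match("b_1")

print(plain_match.span(), verbose_match)
```

The first result is a zero-length match at position 0; the second is no match at all, which is what the lexer sees.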
Regardless of that particular glitch, the pattern cannot be useful because, as mentioned already, a lexer pattern must match some characters, and so a pattern which only matches the empty string cannot be useful. What we want to do is match any character, provided that the negative lookahead (corrected, as below) allows it. So that means that the negative lookahead assertion should be followed by ., which will match the next character.
But you almost certainly don't want to match just one character. Presumably you wanted to match a string of characters which don't match any other token. So that means putting the negative lookahead assertion and the following . into a repetition. And remember that it needs to be a non-empty repetition (+, not *), because patterns must not have empty matches.
Finally, there is absolutely no point using an anchor assertion, because that would limit the pattern to matching only at the beginning of the input, and that is certainly not what you want. It's not at all clear what it is doing there. (I've seen recommendations which suggest using an anchor with a negative lookahead search, which I think are generally misguided, but that discussion is out of scope for this question.)
And before we write the pattern, let's make one more adjustment: in a Python regular expression, if you can replace a set of alternatives with a character class, you should do so because it is a lot more efficient. That's true even if only some of the alternatives can be replaced.
So that produces the following:
def t_STRING(t):
    r'''(
        (?! # Don't match if the next characters match:
            [() \t\n] | # Parentheses or whitespace
            \\\/ | # \/ token
            \/\\ # /\ token
        ) . # If none of the above match, accept a character
    )+ # and repeat as many times as possible (at least once)
    '''
    return t
I removed t.value = t. t is a token object, not a string, and the value should be the string it matched. If you overwrite the value with a circular reference, you won't be able to figure out which string was matched.
This works, but not quite in the way you intended. Since whitespace characters are excluded from t_STRING, you don't get a single token representing (/ 1 3) <= x_4. Instead, you get a series of tokens:
STRING b_1 1 0
AND /\ 1 4
LPAREN ( 1 7
STRING x_2 1 8
STRING <= 1 12
STRING 2 1 15
OR \/ 1 17
LPAREN ( 1 20
STRING b_3 1 21
AND /\ 1 25
LPAREN ( 1 28
LPAREN ( 1 29
STRING / 1 30
STRING 1 1 32
STRING 3 1 34
RPAREN ) 1 35
STRING <= 1 37
STRING x_4 1 40
RPAREN ) 1 43
RPAREN ) 1 44
But I think that's reasonable. How could the lexer tell that the parentheses in (x_2 <= 2 and (b_3 are parenthesis tokens, while the parentheses in (/ 1 3) <= x_4 are part of t_STRING? That determination will need to be made in your parser.
In fact, my inclination would be to fully tokenise the input, even if you don't (yet) require a complete tokenisation. As this entire question and answer shows, attempting to recognise "everything but..." can actually be a lot more complicated than just recognising all tokens. Trying to get the tokeniser to figure out which tokens are useful and which ones aren't is often more difficult than tokenising everything and passing it through a parser.
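For experimenting with the corrected pattern without installing Ply, the same token rules can be approximated with a single alternation in the standard re module. This is a sketch and not Ply's mechanism: MASTER, tokenize and the WS group are names I made up, and whitespace is skipped through that extra group instead of t_ignore.

```python
import re

# One alternative per token kind; STRING is the "anything else" pattern
# built from a negative lookahead followed by a single character.
MASTER = re.compile(r"""
    (?P<AND>    /\\ )
  | (?P<OR>     \\/ )
  | (?P<LPAREN> \( )
  | (?P<RPAREN> \) )
  | (?P<WS>     [ \t\n]+ )
  | (?P<STRING> (?: (?! [() \t\n] | \\/ | /\\ ) . )+ )
""", re.VERBOSE)

def tokenize(text):
    """Return (kind, text) pairs, dropping whitespace."""
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(text)
            if m.lastgroup != "WS"]

data = r"b_1 /\ (x_2 <= 2 \/ (b_3 /\ ((/ 1 3) <= x_4))"
for kind, value in tokenize(data):
    print(kind, value)
```

The token sequence should agree with the Ply output listed above, minus the line and position columns.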
Based on the excellent answer from @rici, which pointed out the problem with t_STRING, this is my final version of the example. It makes a few small changes to the one proposed by @rici.
Code
##############
# TOKENIZING #
##############

tokens = (
    "LPAREN",
    "RPAREN",
    "AND",
    "OR",
    "STRING",
)

def t_AND(t):
    r'[ ]*\/\\[ ]*'
    t.value = "/\\"
    return t

def t_OR(t):
    r'[ ]*\\\/[ ]*'
    t.value = "\\/"
    return t

def t_LPAREN(t):
    r'[ ]*\([ ]*'
    t.value = "("
    return t

def t_RPAREN(t):
    r'[ ]*\)[ ]*'
    t.value = ")"
    return t

def t_STRING(t):
    r'''(
        (?! # Don't match if the next characters match:
            [()\t\n] | # Parentheses, tab or newline (spaces are allowed)
            \\\/ | # \/ token
            \/\\ # /\ token
        ) . # If none of the above match, accept a character
    )+ # and repeat as many times as possible (at least once)
    '''
    return t

def t_error(t):
    print("error: " + str(t.value[0]))
    t.lexer.skip(1)

import ply.lex as lex
lexer = lex.lex()
data = "b_b /\\ (ccc <= 2 \\/ (b_3 /\\ ((/ 1 3) <= x_4))"
lexer.input(data)
while True:
    tok = lexer.token()
    if not tok:
        break
    print("{0}: `{1}`".format(tok.type, tok.value))
Output
STRING: `b_b `
AND: `/\`
LPAREN: `(`
STRING: `ccc <= 2 `
OR: `\/`
LPAREN: `(`
STRING: `b_3 `
AND: `/\`
LPAREN: `(`
LPAREN: `(`
STRING: `/ 1 3`
RPAREN: `)`
STRING: `<= x_4`
RPAREN: `)`
RPAREN: `)`
I have a string:
s='articles[zone.id=1].comments[user.status=active].user'
Looking to split (via split(some_regex_here)). The split needs to occur on every period other than those inside the bracketed substring.
Expected output:
["articles[zone.id=1]", "comments[user.status=active]", "user"]
How would I go about this? Or is there something else besides split(), I should be looking at?
Try this,
s.split(/\.(?![^\[]*\])/)
I got this result,
2.3.2 :061 > s.split(/\.(?![^\[]*\])/)
=> ["articles[zone.id=1]", "comments[user.status=active]", "user"]
You can also test it here:
https://rubular.com/r/LaxEFQZJ0ygA3j
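The same lookahead carries over unchanged to Python's re.split, in case you want to try it outside Ruby. A minimal sketch with the string from the question:

```python
import re

s = "articles[zone.id=1].comments[user.status=active].user"

# Split on a period unless a "]" follows before any "[" does, which is the
# case only for periods inside a bracketed part.
parts = re.split(r"\.(?![^\[]*\])", s)
print(parts)
```

Note that this relies on brackets not being nested: the lookahead only checks the next bracket character after the period.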
I assume the problem is to split on periods that are not within matching brackets.
Here is a non-regex solution that works with any number of nested brackets. I've assumed the brackets are all matched, but it would not be difficult to check that.
def split_it(s)
  left_brackets = 0
  s.each_char.with_object(['']) do |c,a|
    if c == '.' && left_brackets.zero?
      a << '' unless a.last.empty?
    else
      case c
      when ']' then left_brackets -= 1
      when '[' then left_brackets += 1
      end
      a.last << c
    end
  end.tap { |a| a.pop if a.last.empty? }
end
split_it '.articles[zone.id=[user.loc=1]].comments[user.status=active].user'
#=> ["articles[zone.id=[user.loc=1]]", "comments[user.status=active]", "user"]
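The bracket-counting idea is not tied to Ruby. Here is a rough Python equivalent, assuming (as above) that all brackets are matched; split_outside_brackets is a hypothetical name.

```python
def split_outside_brackets(s):
    """Split s on periods that are not inside (possibly nested) brackets."""
    parts, current, depth = [], [], 0
    for ch in s:
        if ch == "." and depth == 0:
            # A top-level period ends the current piece; empty pieces are dropped.
            if current:
                parts.append("".join(current))
                current = []
        else:
            if ch == "[":
                depth += 1
            elif ch == "]":
                depth -= 1
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts

print(split_outside_brackets(
    ".articles[zone.id=[user.loc=1]].comments[user.status=active].user"))
```

Like the Ruby version, it skips empty pieces caused by a leading or trailing period.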
How can I skip an unmatched line in the input when replacing by regex?
For example, below are the contents of my test.txt:
elkay_iyer#yahoo.com
elkay_qwer#yahoo.com
elke engineering ltd.,#yahoo.com
elke0265#yahoo.com
elke#yahoo.com
Below is my AutoHotkey script with the regex code:
ReplaceEmailsRegEx := "i)([a-z0-9]+(\.*|\_*|\-*))+#([a-z][a-z0-9\-]+(\.|\-*\.))+[a-z]{2,6}"
RemoveDuplicateCharactersRegEx := "s)(.)(?=.*\1)"
Try{
FileRead, EmailFromTxtFile, test.txt
OtherThanEmails :=RegExReplace(EmailFromTxtFile,ReplaceEmailsRegEx)
Chars :=RegExReplace(OtherThanEmails,RemoveDuplicateCharactersRegEx)
Loop{
StringReplace, OtherThanEmails, OtherThanEmails, `r`n`r`n,`r`n, UseErrorLevel
If ErrorLevel = 0
Break
}
If (StrLen(OtherThanEmails)){
Msgbox The Characters found other than email:`n%OtherThanEmails%
}
}
catch e {
ErrorString:="what: " . e.what . "file: " . e.file . " line: " . e.line . " msg: " . e.message . " extra: " . e.extra
Msgbox An Exception was thrown`n%ErrorString%
}
Return
When it runs the replacement on test.txt, it throws an error:
e.what contains 'RegExReplace', e.line is 10
It executes without error when I remove the 3rd email from test.txt. So how can I change my regex to skip the problematic string?
The problem you have is catastrophic backtracking due to the nested quantifier in the beginning: ([a-z0-9]+(\.*|\_*|\-*))+. Here, the ., _ and - are all optional due to the * quantifier and thus your pattern gets reduced to ([a-z0-9]+)+.
I suggest "unrolling" the first subpattern to make it linear:
i)[a-z0-9]+(?:(?:\.+|_+|-+)[a-z0-9]+)*#([a-z][-a-z0-9]+\.)+[a-z]{2,6}
Or
i)[a-z0-9]+(?:([._-])\1*[a-z0-9]+)*#(?:[a-z][-a-z0-9]+\.)+[a-z]{2,6}
You may even remove \1* if you do not allow more than one ., _ or - in between "words".
Also, there is no need to use \-* with alternation in (\.|\-*\.): the hyphen is already matched by the preceding character class, so this subpattern can be reduced to \..
See the regex demo
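To see how the unrolled pattern treats the sample file, here is a small Python sketch. It uses Python's re module rather than AutoHotkey's RegExReplace, keeps the separator exactly as written in the patterns above, and makes the last group non-capturing; EMAIL, lines and leftover are illustrative names.

```python
import re

# First suggested pattern; the i) option becomes re.IGNORECASE in Python.
EMAIL = re.compile(
    r"[a-z0-9]+(?:(?:\.+|_+|-+)[a-z0-9]+)*#(?:[a-z][-a-z0-9]+\.)+[a-z]{2,6}",
    re.IGNORECASE,
)

lines = [
    "elkay_iyer#yahoo.com",
    "elkay_qwer#yahoo.com",
    "elke engineering ltd.,#yahoo.com",
    "elke0265#yahoo.com",
    "elke#yahoo.com",
]

# Lines that are not entirely a well-formed address are left over.
leftover = [line for line in lines if not EMAIL.fullmatch(line)]
print(leftover)
```

Only the third line is expected to be left over, and the pattern rejects it without the nested quantifier that caused the excessive backtracking.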
I want to write a simple regex, in vim, that will find all strings lexicographically smaller than another string.
Specifically, I want to use this to compare dates formatted as 2014-02-17. These dates are lexicographically sortable, which is why I use them.
My specific use case: I'm trying to run through a script and find all the dates that are earlier than today's date.
I'm also OK with comparing these as numbers, or any other solution.
I don't think there is any way to do this easily in a regex. To match any date earlier than the current date, you can use the functions below (some of this was borrowed from benjifisher).
function! Convert_to_char_class(cur)
if a:cur =~ '[2-9]'
return '[0-' . (a:cur-1) . ']'
endif
return '0'
endfunction
function! Match_number_before(num)
let branches = []
let init = ''
for i in range(len(a:num))
if a:num[i] =~ '[1-9]'
call add(branches, init . Convert_to_char_class(a:num[i]) . repeat('\d', len(a:num) - i - 1))
endif
let init .= a:num[i]
endfor
return '\%(' . join(branches, '\|') .'\)'
endfunction
function! Match_date_before(date)
if a:date !~ '\v\d{4}-\d{2}-\d{2}'
echo "invalid date"
return
endif
let branches =[]
let parts = split(a:date, '-')
call add(branches, Match_number_before(parts[0]) . '-\d\{2}-\d\{2}')
call add(branches, parts[0] . '-' . Match_number_before(parts[1]) . '-\d\{2}')
call add(branches, parts[0] . '-' . parts[1] . '-' .Match_number_before(parts[2]))
return '\%(' . join(branches, '\|') .'\)'
endfunction
To use it, type the following to search for all dates before 2014-02-24.
/<C-r>=Match_date_before('2014-02-24')
You might be able to wrap it in a function to set the search register if you wanted to.
The generated regex for dates before 2014-02-24 is the following.
\%(\%([0-1]\d\d\d\|200\d\|201[0-3]\)-\d\{2}-\d\{2}\|2014-\%(0[0-1]\)-\d\{2}\|2014-02-\%([0-1]\d\|2[0-3]\)\)
It does not validate dates; it assumes that anything in that format is a date.
Here is an equivalent set of functions for matching dates after the passed-in date.
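If you want to convince yourself that the generated branches cover exactly the earlier dates, the construction can be mirrored in Python and compared against plain string comparison. This is a sketch in Python regex syntax, not the Vim functions themselves; number_before and date_before are hypothetical names, and the input is assumed to be a well-formed YYYY-MM-DD string.

```python
import re

def number_before(num: str) -> str:
    """Regex source for same-width digit strings that sort before num."""
    branches = []
    for i, ch in enumerate(num):
        if ch != "0":
            # Keep the prefix, lower this digit, and free the digits after it.
            smaller = "0" if ch == "1" else f"[0-{int(ch) - 1}]"
            branches.append(num[:i] + smaller + r"\d" * (len(num) - i - 1))
    return "(?:" + "|".join(branches) + ")"

def date_before(date: str) -> str:
    """Regex source for YYYY-MM-DD strings that sort before date."""
    year, month, day = date.split("-")
    branches = [
        number_before(year) + r"-\d{2}-\d{2}",           # earlier year
        year + "-" + number_before(month) + r"-\d{2}",   # same year, earlier month
        year + "-" + month + "-" + number_before(day),   # same month, earlier day
    ]
    return "(?:" + "|".join(branches) + ")"

pattern = re.compile(date_before("2014-02-24"))
print(bool(pattern.fullmatch("2014-02-23")), bool(pattern.fullmatch("2014-02-24")))
```

Like the Vim version, it does no calendar validation.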
function! Convert_to_char_class_after(cur)
if a:cur =~ '[0-7]'
return '[' . (a:cur+1) . '-9]'
endif
return '9'
endfunction
function! Match_number_after(num)
let branches = []
let init = ''
for i in range(len(a:num))
if a:num[i] =~ '[0-8]'
call add(branches, init . Convert_to_char_class_after(a:num[i]) . repeat('\d', len(a:num) - i - 1))
endif
let init .= a:num[i]
endfor
return '\%(' . join(branches, '\|') .'\)'
endfunction
function! Match_date_after(date)
if a:date !~ '\v\d{4}-\d{2}-\d{2}'
echo "invalid date"
return
endif
let branches =[]
let parts = split(a:date, '-')
call add(branches, Match_number_after(parts[0]) . '-\d\{2}-\d\{2}')
call add(branches, parts[0] . '-' . Match_number_after(parts[1]) . '-\d\{2}')
call add(branches, parts[0] . '-' . parts[1] . '-' .Match_number_after(parts[2]))
return '\%(' . join(branches, '\|') .'\)'
endfunction
The regex generated was
\%(\%([3-9]\d\d\d\|2[1-9]\d\d\|20[2-9]\d\|201[5-9]\)-\d\{2}-\d\{2}\|2014-\%([1-9]\d\|0[3-9]\)-\d\{2}\|2014-02-\%([3-9]\d\|2[5-9]\)\)
You do not say how you want to use this; are you sure that you really want a regular expression? Perhaps you could get away with
if DateCmp(date, '2014-02-24') < 0
" ...
endif
In that case, try this function.
" Compare formatted date strings:
" @param String date1, date2
" dates in YYYY-MM-DD format, e.g. '2014-02-24'
" @return Integer
" negative, zero, or positive according to date1 < date2, date1 == date2, or
" date1 > date2
function! DateCmp(date1, date2)
let [year1, month1, day1] = split(a:date1, '-')
let [year2, month2, day2] = split(a:date2, '-')
if year1 != year2
return year1 - year2
elseif month1 != month2
return month1 - month2
else
return day1 - day2
endif
endfun
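Because the format is fixed-width and zero-padded, the same ordering also falls out of plain string comparison, as the question notes. A tiny Python sketch (date_cmp is a hypothetical name, and it returns only the sign rather than a difference like DateCmp does):

```python
def date_cmp(date1: str, date2: str) -> int:
    """Return -1, 0 or 1 for YYYY-MM-DD strings, comparing them as text."""
    return (date1 > date2) - (date1 < date2)

print(date_cmp("2014-02-17", "2014-02-24"))
print(date_cmp("2014-02-24", "2014-02-24"))
print(date_cmp("2014-03-12", "2014-02-24"))
```

This only holds while every date uses four-digit years and two-digit months and days.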
If you really want a regular expression, then try this:
" Construct a pattern that matches a formatted date string if and only if the
" date is less than the input date. Usage:
" :echo '2014-02-24' =~ DateLessRE('2014-03-12')
function! DateLessRE(date)
let init = ''
let branches = []
for c in split(a:date, '\zs')
if c =~ '[1-9]'
call add(branches, init . '[0-' . (c-1) . ']')
endif
let init .= c
endfor
return '\d\d\d\d-\d\d-\d\d\&\%(' . join(branches, '\|') . '\)'
endfun
Does that count as a "simple" regex? One way to use it would be to type :g/ and then CTRL-R and = and then DateLessRE('2014-02-24') and Enter, followed by the rest of your command. In other words,
:g/<C-R>=DateLessRE('2014-02-24')<CR>/s/foo/bar
EDIT: I added a concat (:help /\&) that matches a complete "formatted date string". Now, there is no need to anchor the pattern.
Use nested subpatterns. It starts simple, with the century:
[01]\d\d\d-\d\d-\d\d|20
As for each digit to follow, use one of the following patterns; you may want to replace .* by an appropriate sequence of \d and -.
for 0: (0
for 1: (0.*|1
for 2: ([01].*|2
for 3: ([0-2].*|3
for 4: ([0-3].*|4
for 5: ([0-4].*|5
for 6: ([0-5].*|6
for 7: ([0-6].*|7
for 8: ([0-7].*|8
for 9: ([0-8].*|9
For the last digit, you only need the digit range, e.g.:
[0-6]
Finally, all parentheses should be closed:
)))))
In the example of 2014-02-17, this becomes:
[01]\d\d\d-\d\d-\d\d|20
(0\d-\d\d-\d\d|1
([0-3]-\d\d-\d\d|4
-
(0
([01]-\d\d|2
-
(0\d|1
[0-6]
)))))
Now in one line:
[01]\d\d\d-\d\d-\d\d|20(0\d-\d\d-\d\d|1([0-3]-\d\d-\d\d|4-(0([01]-\d\d|2-(0\d|1[0-6])))))
For VIM, let's not forget to escape (, ) and |:
[01]\d\d\d-\d\d-\d\d\|20\(0\d-\d\d-\d\d\|1\([0-3]-\d\d-\d\d\|4-\(0\([01]-\d\d\|2-\(0\d\|1[0-6]\)\)\)\)\)
It would be best to generate this (much like in FDinoff's answer) rather than write it by hand...
Update:
Here is a sample AWK script to generate the correct regex for any date yyyy-mm-dd.
#!/usr/bin/awk -f
BEGIN { # possible overrides for non-VIM users
switch (digit) {
case "ascii" : digit = "[0-9]"; break;
case "posix" : digit = "[[:digit:]]"; break;
default : digit = "\\d";
}
switch (metachar) {
case "unescaped" : escape = ""; break;
default : escape = "\\";
}
}
/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ {
print BuildRegex($0);
}
function BuildRegex(s) {
if (s ~ /^[1-9][^1-9]*$/) {
regex = LessThanOnFirstDigit(s);
}
else {
regex = substr(s, 1, 1) BuildRegex(substr(s, 2)); # recursive call
if (s ~ /^[1-9]/) {
regex = escape "(" LessThanOnFirstDigit(s) escape "|" regex escape ")";
}
}
return regex;
}
function LessThanOnFirstDigit(s) {
first = substr(s, 1, 1) - 1;
rest = substr(s, 2);
gsub(/[0-9]/, digit, rest);
return (first ? "[0-" first "]" : "0") rest;
}
Call it like this:
echo 2014-02-17 | awk -f genregex.awk
Of course, you can write such a simple generator in any language you like.
Would be nice to do it in Vimscript, but I have no experience with that, so I will leave that as a home assignment.
If you wanted to search for all dates that were less than 2014-11-23, inclusive, you would use the following regex.
2014-(?:[1-9]|1[0-1])-(?:[1-9]|1[0-9]|2[0-3])
For a better explanation of the regex, visit regex101.com and paste the regex in. You can also test it on that site.
The basics of the regex are to search all dates that:
start with 2014-
either contain a single character from 1 - 9
or a 1 and a single character from 0 - 1, i.e. numbers from 1 - 11
finished by - and numbers from 1 - 23 done in the same style as the second term
I would like to visually select a calculation backwards, e.g.
200 + 3 This is my text -300 +2 + (9*3)
|-------------|*
This is text 0,25 + 2.000 + sqrt(15/1.5)
|-------------------------|*
The reason is that I will use it in insert mode.
After writing a calculation I want to select the calculation (using a map) and put the results of the calculation in the text.
What the regex must do is:
- select from the cursor (see * in above example) backwards to the start of the calculation
(including \/-+*:.,^).
- the calculation can start only with log/sqrt/abs/round/ceil/floor/sin/cos/tan or with a positive or negative number
- the calculation can also start at the beginning of the line but it never goes back to
a previous line
I tried every way I could think of but could not find the correct regex.
I noticed that backward searching is different from forward searching.
Can someone help me?
Edit
Forgot to mention that it must include also the '=' if there is one and if the '=' is before the cursor or if there is only space between the cursor and '='.
It must not include other '=' signs.
200 + 3 = 203 -300 +2 + (9*3) =
|-------------------|<SPACES>*
200 + 3 = 203 -300 +2 + (9*3)
|-----------------|<SPACES>*
* = where the cursor is
A regex that comes close in pure vim is
\v\c\s*\zs(\s{-}(((sqrt|log|sin|cos|tan|exp)?\(.{-}\))|(-?[0-9,.]+(e-?[0-9]+)?)|([-+*/%^]+)))+(\s*\=?)?\s*
There are limitations: subexpressions (including function arguments) aren't parsed. You'd need to use a proper grammar parser to do that, and I don't recommend doing that in pure Vim [1].
Operator Mapping
To enable using this a bit like text-objects, use something like this in your $MYVIMRC:
func! DetectExpr(flag)
let regex = '\v\c\s*\zs(\s{-}(((sqrt|log|sin|cos|tan|exp)?\(.{-}\))|(-?[0-9,.]+(e-?[0-9]+)?)|([-+*/%^]+)))+(\s*\=?)?\s*'
return searchpos(regex, a:flag . 'ncW', line('.'))
endf
func! PositionLessThanEqual(a, b)
"echo 'a: ' . string(a:a)
"echo 'b: ' . string(a:b)
if (a:a[0] == a:b[0])
return (a:a[1] <= a:b[1]) ? 1 : 0
else
return (a:a[0] <= a:b[0]) ? 1 : 0
endif
endf
func! SelectExpr(mustthrow)
let cpos = getpos(".")
let cpos = [cpos[1], cpos[2]] " use only [lnum,col] elements
let begin = DetectExpr('b')
if ( ((begin[0] == 0) && (begin[1] == 0))
\ || !PositionLessThanEqual(begin, cpos) )
if (a:mustthrow)
throw "Cursor not inside a valid expression"
else
"echoerr "not satisfied: " . string(begin) . " < " . string(cpos)
endif
return 0
endif
"echo "satisfied: " . string(begin) . " < " . string(cpos)
call setpos('.', [0, begin[0], begin[1], 0])
let end = DetectExpr('e')
if ( ((end[0] == 0) || (end[1] == 0))
\ || !PositionLessThanEqual(cpos, end) )
call setpos('.', [0, cpos[0], cpos[1], 0])
if (a:mustthrow)
throw "Cursor not inside a valid expression"
else
"echoerr "not satisfied: " . string(begin) . " < " . string(cpos) . " < " . string(end)
endif
return 0
endif
"echo "satisfied: " . string(begin) . " < " . string(cpos) . " < " . string(end)
norm! v
call setpos('.', [0, end[0], end[1], 0])
return 1
endf
silent! unmap X
silent! unmap <M-.>
xnoremap <silent>X :<C-u>call SelectExpr(0)<CR>
onoremap <silent>X :<C-u>call SelectExpr(0)<CR>
Now you can operate on the nearest expression around (or after) the cursor position:
vX - [v]isually select e[X]pression
dX - [d]elete current e[X]pression
yX - [y]ank current e[X]pression
"ayX - id. to register a
As a trick, use the following to arrive at the exact ascii art from the OP (using virtualedit for the purpose of the demo):
Insert mode mapping
In response to the chat:
" if you want trailing spaces/equal sign to be eaten:
imap <M-.> <C-o>:let @e=""<CR><C-o>"edX<C-r>=substitute(@e, '^\v(.{-})(\s*\=?)?\s*$', '\=string(eval(submatch(1)))', '')<CR>
" but I'm assuming you wanted them preserved:
imap <M-.> <C-o>:let @e=""<CR><C-o>"edX<C-r>=substitute(@e, '^\v(.{-})(\s*\=?\s*)?$', '\=string(eval(submatch(1))) . submatch(2)', '')<CR>
This allows you to hit Alt-. during insert mode, and the current expression gets replaced with its evaluation. The cursor ends up at the end of the result in insert mode.
200 + 3 This is my text -300 +2 + (9*3)
This is text 0.25 + 2.000 + sqrt(15/1.5)
Tested by pressing Alt-. in insert 3 times:
203 This is my text -271
This is text 5.412278
For Fun: ascii art
vXoyoEsc`<jPvXr-r|e.
To easily test it yourself:
:let @q="vXoyo\x1b`<jPvXr-r|e.a*\x1b"
:set virtualedit=all
Now you can @q anywhere and it will ascii-decorate the nearest expression :)
200 + 3 = 203 -300 +2 + (9*3) =
|-------|*
|-------------------|*
200 + 3 = 203 -300 +2 + (9*3)
|-----------------|*
|-------|*
This is text 0,25 + 2.000 + sqrt(15/1.5)
|-------------------------|*
[1] Consider using Vim's Python integration to do such parsing.
This seems quite a complicated task after all to achieve with regex, so if you can avoid it in any way, try to do so.
I've created a regex that works for a few examples - give it a try and see if it does the trick:
^(?:[A-Za-z]|\s)+((?:[^A-Za-z]+)?(?:log|sqrt|abs|round|ceil|floor|sin|cos|tan)[^A-Za-z]+)(?:[A-Za-z]|\s)*$
The part that you are interested in should be in the first matching group.
Let me know if you need an explanation.
EDIT:
^ - match the beginning of a line
(?:[A-Za-z]|\s)+ - match everything that's a letter or a space once or more
match and capture the following 3:
((?:[^A-Za-z]+)? - match everything that's NOT a letter (i.e. in your case numbers or operators)
(?:log|sqrt|abs|round|ceil|floor|sin|cos|tan) - match one of your keywords
[^A-Za-z]+) - match everything that's NOT a letter (i.e. in your case numbers or operators)
(?:[A-Za-z]|\s)* - match everything that's a letter or a space zero or more times
$ - match the end of the line
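To try the pattern quickly, here is a small Python sketch run against the two sample lines from the question. CALC and the two result names are only illustrative; the pattern itself is copied from above.

```python
import re

CALC = re.compile(
    r"^(?:[A-Za-z]|\s)+"
    r"((?:[^A-Za-z]+)?(?:log|sqrt|abs|round|ceil|floor|sin|cos|tan)[^A-Za-z]+)"
    r"(?:[A-Za-z]|\s)*$"
)

with_function = CALC.match("This is text 0,25 + 2.000 + sqrt(15/1.5)")
without_function = CALC.match("200 + 3 This is my text -300 +2 + (9*3)")

# Group 1 holds the calculation when a keyword such as sqrt is present.
print(with_function.group(1))
# The pattern requires one of the keywords, so this line yields no match.
print(without_function)
```

So the pattern as written only covers calculations that contain one of the listed function names; purely numeric calculations would need a separate alternative.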