How do I capture all occurrences in a string in Vim? - regex

I want to capture all occurrences of a certain pattern in a string in Vimscript.
Example:
let my_calculation = '200/3 + 23 + 100.5/3 -2 + 4*(200/2)'
How can I capture all numbers (including decimal points, if any) before and after each '/' into two different variables?
- output before_slash: 200100.5200
- output after_slash: 332
How can I replace them if a condition holds?
E.g. if the number after a single '/' contains no '.', append '.0' to it.
I tried matchstr() and regexes, but despite many attempts I couldn't solve it.

A useful feature that can be taken advantage of in this case is substitution
with an expression (see :help sub-replace-\=).
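let [a; b] = [[]]
" a collects the numbers before each '/', b the numbers after it;
" the replacement is an empty List, so the string itself is not changed.
call substitute(my_calculation, '\(\d*\.\?\d\+\)/\(\d*\.\?\d\+\)\zs',
\ '\=add(a,submatch(1))[1:0]+add(b,submatch(2))[1:0]', 'g')
echo join(a, '') join(b, '')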
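For comparison, here is a minimal Python sketch of the same capture. Python is not part of the original answer, and the regex is simply a direct translation of the Vim pattern above. Because re.findall returns one tuple per match when the pattern has two groups, no substitution side-effect trick is needed:

```python
import re

my_calculation = '200/3 + 23 + 100.5/3 -2 + 4*(200/2)'

# Same shape as the Vim pattern: optional digits, optional dot, digits.
PAIR = re.compile(r'(\d*\.?\d+)/(\d*\.?\d+)')

# findall returns one (numerator, denominator) tuple per match.
pairs = PAIR.findall(my_calculation)
before_slash = ''.join(num for num, _ in pairs)
after_slash = ''.join(den for _, den in pairs)

print(before_slash)  # 200100.5200
print(after_slash)   # 332
```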

To answer the second part of the question:
let my_calculation = '200/3 + 23 + 100.5/3 -2 + 4*(200/2)'
echo substitute(my_calculation, '\(\/[0-9]\+\)\([^0-9.]\|$\)', '\1.0\2', 'g')
The above outputs:
200/3.0 + 23 + 100.5/3.0 -2 + 4*(200/2.0)
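A rough Python equivalent of that substitution, assuming the same rule (append .0 to an integer that directly follows a '/'), can use a negative lookahead instead of capturing the following character. The function name is only illustrative:

```python
import re

def add_decimal(expr):
    # "/digits" not followed by another digit or a dot gets ".0" appended.
    return re.sub(r'(/\d+)(?![\d.])', r'\1.0', expr)

print(add_decimal('200/3 + 23 + 100.5/3 -2 + 4*(200/2)'))
# 200/3.0 + 23 + 100.5/3.0 -2 + 4*(200/2.0)
```

The lookahead also covers a number at the very end of the string, which the Vim version handles with \|$.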

Give this a try:
function! GetNumbers(string)
  let pairs = filter(split(a:string, '[^0-9/.]\+'), 'v:val =~ "/"')
  let den = join(map(copy(pairs), 'matchstr(v:val, ''/\zs\d\+\(\.\d\+\)\?'')'), '')
  let num = join(map(pairs, 'matchstr(v:val, ''\d\+\(\.\d\+\)\?\ze/'')'), '')
  return [num, den]
endfunction
let my_calculation = '200/3 + 23 + 100.5/3 -2 + 4*(200/2)'
let [a,b] = GetNumbers(my_calculation)
echo a
echo b


Regex to match despite some of the characters not matching pattern?

I'm working with some bioinformatics data, and I've got this sed expression:
sed -n 'N;/.*:\(.*\)\n.*\1/{p;n;p;n;p};D' file.txt
It currently takes a file structured like this:
#E00378:1485 1:N:0:ABC
ABCDEF ##should match, all characters present
+
#
#E00378:1485 1:N:1:ABC
XYZABX ##should match, with permutation
+
#
#E00378:1485 1:N:1:ABCDE
ZABCDXFGH ##should match, with permutation
+
#
#E00378:1485 1:N:1:CBA
ABC ##should not match, order not preserved
+
#
It then prints the 4-line record if the sequence after the last ':' is found in the second line, so in this case I get:
#E00378:1485 1:N:0:ABC
ABCDEF
+
#
However, I would like to broaden the search to allow any single-letter substitution (what I'm calling a permutation) while preserving the order, such that ABX, ZBC, AHC and ABO would all match the search criterion ABC.
Is a search like this possible to construct as a one-liner? Or should I write a script?
I was thinking it should be possible to programmatically change one of the letters to a * in the pattern space.
I am trying to make something along the lines of an AWK pattern that has a match defined as:
p = "";
p = p "."a[2]a[3]a[4]a[5]a[6]a[7]a[8]"|";
p = p a[1]"."a[3]a[4]a[5]a[6]a[7]a[8]"|";
p = p a[1]a[2]"."a[4]a[5]a[6]a[7]a[8]"|";
p = p a[1]a[2]a[3]"."a[5]a[6]a[7]a[8]"|";
p = p a[1]a[2]a[3]a[4]"."a[6]a[7]a[8]"|";
p = p a[1]a[2]a[3]a[4]a[5]"."a[7]a[8]"|";
p = p a[1]a[2]a[3]a[4]a[5]a[6]"."a[8]"|";
p = p a[1]a[2]a[3]a[4]a[5]a[6]a[7]".";
m = p;
But I can't figure out how to build it programmatically for a sequence of n letters.
Okay, check this out, where fuzzy is a file containing your input above:
£ perl -0043 -MText::Fuzzy -ne 'if (/.*:(.*?)\n(.*?)\n/) {my ($offset, $edits, $distance) = Text::Fuzzy::fuzzy_index ($1, $2); print "$offset $edits $distance\n";}' fuzzy
3 kkk 0
5 kkd 1
5 kkkkd 1
Since you haven't been 100% clear on your "fuzziness" criteria (and can't be until you have a measurement tool), I'll explain this first. Reference here:
http://search.cpan.org/~bkb/Text-Fuzzy-0.27/lib/Text/Fuzzy.pod
Basically, for each record (which I've assumed are separated by #, hence the -0043 switch: octal 043 is the # character), the output is an offset, the edits that turn the first string into the second, and lastly the "distance" (Levenshtein, I would assume) between the two strings.
So:
£ perl -0043 -MText::Fuzzy -ne 'if (/.*:(.*?)\n(.*?)\n/) {my ($offset, $edits, $distance) = Text::Fuzzy::fuzzy_index ($1, $2); print "$_\n" if $distance < 2;}' fuzzy
#E00378:1485 1:N:0:ABC
ABCDEF
+
#
#E00378:1485 1:N:1:ABC
XYZABX
+
#
#E00378:1485 1:N:1:ABCDE
ZABCDXFGH
+
#
See here for how to install Perl modules such as Text::Fuzzy:
https://www.thegeekstuff.com/2008/09/how-to-install-perl-modules-manually-and-using-cpan-command/
Example input/output for a record that wouldn't be printed (distance is 3):
#E00378:1485 1:N:1:ABCDE
ZDEFDXFGH
+
#
gives this output (and is simply not printed by the second perl command):
3 dddkk 3
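If installing Text::Fuzzy isn't an option, a comparable distance can be computed with the textbook dynamic program for approximate substring matching (Sellers' algorithm). The following is a Python sketch of that technique, not Text::Fuzzy's own implementation. It counts only insertions, deletions and substitutions, and its results for the sample records agree with the distances above:

```python
def fuzzy_distance(needle, haystack):
    """Smallest edit distance between needle and any substring of haystack."""
    # prev[j]: best cost of turning the needle prefix processed so far into
    # a substring of haystack that ends just before position j.
    # Row 0 is all zeros: the match may start anywhere for free.
    prev = [0] * (len(haystack) + 1)
    for i, nc in enumerate(needle, start=1):
        cur = [i] + [0] * len(haystack)  # column 0: delete i needle chars
        for j, hc in enumerate(haystack, start=1):
            cur[j] = min(
                prev[j - 1] + (nc != hc),  # keep or substitute
                prev[j] + 1,               # delete nc from the needle
                cur[j - 1] + 1,            # insert hc into the needle
            )
        prev = cur
    return min(prev)  # the match may end anywhere

print(fuzzy_distance("ABC", "XYZABX"))        # 1
print(fuzzy_distance("ABCDE", "ZDEFDXFGH"))   # 3
```

Keeping only records with a distance below 2 mirrors the second perl command.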
Awk doesn't have sed's back-references, but it has more expressive power to make up the difference. The following script composes the matching pattern from the final field of the lead line, then applies it to the next line.
#! /usr/bin/awk -f
BEGIN {
    FS = ":"
}
# Lead Line has 5 fields
NF == 5 {
    line0 = $0
    seq = $NF
    getline
    if (seq != "") {
        n = length(seq)
        if (n == 1) {
            pat = seq
        } else {
            # ABC -> /.BC|A.C|AB./
            pat = "." substr(seq, 2, n - 1)
            for (i = 2; i < n; ++i)
                pat = pat "|" substr(seq, 1, i - 1) "." substr(seq, i + 1, n - i)
            pat = pat "|" substr(seq, 1, n - 1) "."
        }
        if ($0 ~ pat) {
            print line0
            print
            getline; print
            getline; print
            next
        }
    }
    getline
    getline
}
If you need a different matching pattern, the changes are mostly limited to the pattern-composition lines. By the way, I noticed that sequences repeat, so to make this faster we can cache the composed patterns:
#! /usr/bin/awk -f
BEGIN {
    FS = ":"
    # Noticed that sequences repeat
    # -- implement caching of patterns
    split("", cache)
}
# Lead Line has 5 fields
NF == 5 {
    line0 = $0
    seq = $NF
    getline
    if (seq != "") {
        if (seq in cache) {
            pat = cache[seq]
        } else {
            n = length(seq)
            if (n == 1) {
                pat = seq
            } else {
                # ABC -> /.BC|A.C|AB./
                pat = "." substr(seq, 2, n - 1)
                for (i = 2; i < n; ++i)
                    pat = pat "|" substr(seq, 1, i - 1) "." substr(seq, i + 1, n - i)
                pat = pat "|" substr(seq, 1, n - 1) "."
            }
            cache[seq] = pat
        }
        if ($0 ~ pat) {
            print line0
            print
            getline; print
            getline; print
            next
        }
    }
    getline
    getline
}
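The same idea can be sketched in Python. This is an assumption on my part, not part of the awk answer: it expects the input to come strictly in 4-line records, builds the one-letter-wildcard alternation from the header's last ':' field, and uses functools.lru_cache in place of the awk cache array. The function names are illustrative:

```python
import re
from functools import lru_cache

@lru_cache(maxsize=None)  # plays the role of the awk `cache` array
def one_wildcard_pattern(seq):
    """'ABC' -> regex '.BC|A.C|AB.' (any single letter may differ)."""
    if len(seq) == 1:
        return re.compile(re.escape(seq))
    alternatives = [re.escape(seq[:i]) + "." + re.escape(seq[i + 1:])
                    for i in range(len(seq))]
    return re.compile("|".join(alternatives))

def matching_records(lines):
    """Yield 4-line records whose 2nd line matches the header's last ':' field."""
    for i in range(0, len(lines) - 3, 4):
        header, sequence, plus, quality = lines[i:i + 4]
        seq = header.rsplit(":", 1)[-1]
        if seq and one_wildcard_pattern(seq).search(sequence):
            yield header, sequence, plus, quality

records = """#E00378:1485 1:N:0:ABC
ABCDEF
+
#
#E00378:1485 1:N:1:ABC
XYZABX
+
#
#E00378:1485 1:N:1:ABCDE
ZABCDXFGH
+
#
#E00378:1485 1:N:1:CBA
ABC
+
#""".splitlines()

for record in matching_records(records):
    print("\n".join(record))
```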

RegEx for computer name validation (cannot be more than 15 characters long, be entirely numeric, or contain the following characters...)

I have these requirements to follow:
Windows computer name cannot be more than 15 characters long, be entirely numeric, or contain the following characters: ` ~ ! @ # $ % ^ & * ( ) = + _ [ ] { } \ | ; : . ' " , < > / ?.
I want to create a RegEx to validate a given computer name.
I can see that the only permitted special character is -, and so far I have this:
/^[a-zA-Z0-9-]{1,15}$/
which matches almost all constraints except the "not entirely numeric" part.
How can I add the last constraint to my RegEx?
You could use a negative lookahead:
^(?![0-9]{1,15}$)[a-zA-Z0-9-]{1,15}$
Or simply use two regular expressions:
^[a-zA-Z0-9-]{1,15}$
AND NOT
^[0-9]{1,15}$
Here is an example in JavaScript:
var regex1 = /^(?![0-9]{1,15}$)[a-zA-Z0-9-]{1,15}$/;
var regex2 = /^[a-zA-Z0-9-]{1,15}$/;
var regex3 = /^[0-9]{1,15}$/;
var text1 = "lklndlsdsvlk323";
var text2 = "4214124";
console.log(text1 + ":", !!text1.match(regex1));
console.log(text1 + ":", text1.match(regex2) && !text1.match(regex3));
console.log(text2 + ":", !!text2.match(regex1));
console.log(text2 + ":", text2.match(regex2) && !text2.match(regex3));
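For comparison, here is a Python sketch of the negative-lookahead approach. The names are illustrative. It uses \Z instead of $, because in Python $ also matches just before a trailing newline:

```python
import re

# The lookahead rejects all-digit names; the character class and {1,15}
# enforce the allowed characters and the length limit.
COMPUTER_NAME = re.compile(r'(?![0-9]{1,15}\Z)[A-Za-z0-9-]{1,15}\Z')

def is_valid_computer_name(name):
    # match() anchors at the start; \Z anchors at the very end.
    return COMPUTER_NAME.match(name) is not None

print(is_valid_computer_name("lklndlsdsvlk323"))  # True
print(is_valid_computer_name("4214124"))          # False
```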

How to ignore case in regex

I want to check if a string contains the two words "hello" and "world". I am using something like this:
String str = " aa bbb hEllo accc woRld";
str.matches( "(.*)" + "hello" + "(.*)" + "world" + "(.*)" );
How do I execute this regular expression as case-insensitive?
Try putting the case-insensitive modifier (?i) at the start of the regex:
str.matches( "(?i)(.*)" + "hello" + "(.*)" + "world" + "(.*)" );
Most regex engines also provide a flag for this. In languages such as PHP and JavaScript, you write the regex as /REGEX/i, with the i after the closing delimiter.
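In Python the flag is passed as re.IGNORECASE, or written inline as (?i). Here is a minimal sketch. The helper name is illustrative:

```python
import re

HELLO_WORLD = re.compile(r'hello.*world', re.IGNORECASE)

def contains_hello_world(text):
    # search() looks anywhere in the string, so the (.*) wrappers that
    # Java's String.matches() needs are unnecessary here.
    return HELLO_WORLD.search(text) is not None

print(contains_hello_world(" aa bbb hEllo accc woRld"))  # True
print(contains_hello_world("world hello"))               # False
# The inline modifier works too, like (?i) in the Java answer:
print(re.search(r'(?i)hello.*world', "HELLO WORLD") is not None)  # True
```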
Perhaps this:
str.matches( "(.*)" + "([hH])" + "([eE])" + "([lL])" + "([lL])" + "([oO])" + "(.*)" + "([wW])" + "([oO])" + "([rR])" + "([lL])" + "([dD])" + "(.*)" );
I think there is a more efficient way than this answer, but anyway...
You can use the | (or) operator for each letter of both "hello" and "world".
For instance, with hello:
(H|h)(E|e)(L|l){2}(O|o)
This means H or h, then E or e, then L or l (twice), then O or o.
I did not test this, but I hope it helps.
You can just lowercase the string and compare it:
import re
s = "dffdfHellodasfWorld"
re.findall("(.*)" + "hello" + "(.*)" + "world" + "(.*)", s.lower())
This is in Python, by the way.
Note the call to s.lower().

Using regex in Scala to group and pattern match

I need to process phone numbers using regex and group them by (country code) (area code) (number). The input format:
- country code: 1-3 digits
- area code: 1-3 digits
- number: 4-10 digits
Examples:
1 877 2638277
91-011-23413627
And then I need to print out the groups like this:
CC=91,AC=011,Number=23413627
This is what I have so far:
val s = scala.io.StdIn.readLine()
val pattern = """([0-9]{1,3})[ -]([0-9]{1,3})[ -]([0-9]{4,10})""".r
val ret = pattern.findAllIn(s)
println("CC=" + ret.group(1) + "AC=" + ret.group(2) + "Number=" + ret.group(3));
The compiler said "empty iterator." I also tried:
val (cc,ac,n) = s
and that didn't work either. How do I fix this?
The problem is with your pattern. I would recommend using a tool like RegexPal to test your patterns: put the pattern in the first text box and your examples in the second one, and it will highlight the matched parts.
You added spaces between your groups and the [ -] separators, so the regex expected literal spaces there. The correct pattern is:
val pattern = """([0-9]{1,3})[ -]([0-9]{1,3})[ -]([0-9]{4,10})""".r
Also, if you want to access the groups explicitly, you need a Match. For example, findFirstMatchIn returns an Option[Match] for the first match, and findAllMatchIn returns an iterator over all matches:
val allMatches = pattern.findAllMatchIn(s)
allMatches.foreach { m =>
  println("CC=" + m.group(1) + ",AC=" + m.group(2) + ",Number=" + m.group(3))
}
val matched = pattern.findFirstMatchIn(s)
matched match {
  case Some(m) =>
    println("CC=" + m.group(1) + ",AC=" + m.group(2) + ",Number=" + m.group(3))
  case None =>
    println("There wasn't a match!")
}
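The closest Python counterparts are re.finditer (an iterator of Match objects, like findAllMatchIn) and re.search (a Match or None, comparable to findFirstMatchIn's Option). Here is a sketch using the same pattern. The helper names are illustrative:

```python
import re

PHONE = re.compile(r'([0-9]{1,3})[ -]([0-9]{1,3})[ -]([0-9]{4,10})')

def format_all(text):
    # One formatted string per match, like findAllMatchIn + foreach.
    return [f"CC={m.group(1)},AC={m.group(2)},Number={m.group(3)}"
            for m in PHONE.finditer(text)]

def format_first(text):
    m = PHONE.search(text)  # None when nothing matches
    if m is None:
        return "There wasn't a match!"
    return f"CC={m.group(1)},AC={m.group(2)},Number={m.group(3)}"

print(format_all("1 877 2638277"))      # ['CC=1,AC=877,Number=2638277']
print(format_first("91-011-23413627"))  # CC=91,AC=011,Number=23413627
```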
I see you also tried extracting the string into variables. You have to use the Regex extractor in the following way:
val Pattern = """([0-9]{1,3})[ -]([0-9]{1,3})[ -]([0-9]{4,10})""".r
val Pattern(cc, ac, n) = s
println(s"CC=${cc},AC=${ac},Number=$n")
And if you want to handle errors:
s match {
  case Pattern(cc, ac, n) =>
    println(s"CC=${cc},AC=${ac},Number=$n")
  case _ =>
    println("No match!")
}
You may also want to look at string interpolation (s"...") to make your strings easier to read.
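Scala's Regex extractor only succeeds when the pattern matches the whole string. A comparable Python sketch uses re.fullmatch and tuple unpacking of groups(). The function name is just for illustration:

```python
import re

PHONE = re.compile(r'([0-9]{1,3})[ -]([0-9]{1,3})[ -]([0-9]{4,10})')

def describe(s):
    m = PHONE.fullmatch(s)   # whole string must match, like Pattern(cc, ac, n)
    if m is None:
        return "No match!"   # corresponds to the `case _ =>` branch
    cc, ac, n = m.groups()   # unpack the three capture groups
    return f"CC={cc},AC={ac},Number={n}"

print(describe("91-011-23413627"))  # CC=91,AC=011,Number=23413627
```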

How do I visually select a calculation backwards?

I would like to visually select a calculation backwards, e.g.:
200 + 3 This is my text -300 +2 + (9*3)
|-------------|*
This is text 0,25 + 2.000 + sqrt(15/1.5)
|-------------------------|*
The reason is that I will use it in insert mode.
After writing a calculation, I want to select it (using a mapping) and put the result of the calculation into the text.
What the regex must do is:
- select from the cursor (see * in the example above) backwards to the start of the calculation (including \/-+*:.,^).
- the calculation can start only with log/sqrt/abs/round/ceil/floor/sin/cos/tan or with a positive or negative number
- the calculation can also start at the beginning of the line, but it never goes back to a previous line
I have tried everything but could not find the correct regex.
I noticed that backward searching is different from forward searching.
Can someone help me?
Edit
I forgot to mention that it must also include the '=' if there is one immediately before the cursor, or separated from the cursor only by spaces.
It must not include other '=' signs.
200 + 3 = 203 -300 +2 + (9*3) =
|-------------------|<SPACES>*
200 + 3 = 203 -300 +2 + (9*3)
|-----------------|<SPACES>*
* = where the cursor is
A regex that comes close in pure Vim is:
\v\c\s*\zs(\s{-}(((sqrt|log|sin|cos|tan|exp)?\(.{-}\))|(-?[0-9,.]+(e-?[0-9]+)?)|([-+*/%^]+)))+(\s*\=?)?\s*
There are limitations: subexpressions (including function arguments) aren't parsed. You'd need a proper grammar parser for that, and I don't recommend doing it in pure Vim [1].
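To experiment with the regex outside Vim, here is a rough Python translation. It rests on some assumptions: Python's re has no \zs, so the pattern simply starts at the first atom, and Vim's lazy \s{-} becomes \s* because an atom never starts with a blank. The helper expr_around is hypothetical. It returns the match whose span contains a given 0-based cursor column:

```python
import re

# One "atom": an optional function name plus a parenthesised group,
# a number (commas/dots allowed, optional exponent), or a run of operators.
ATOM = r'(?:(?:sqrt|log|sin|cos|tan|exp)?\(.*?\)|-?[0-9,.]+(?:e-?[0-9]+)?|[-+*/%^]+)'
# Atoms separated by optional blanks, then an optional "=" and trailing blanks.
EXPR = re.compile(ATOM + r'(?:\s*' + ATOM + r')*\s*=?\s*', re.IGNORECASE)

def expr_around(line, col):
    """Return (start, end) of the expression containing column col, or None."""
    for m in EXPR.finditer(line):
        if m.start() <= col < m.end():
            return m.start(), m.end()
    return None

line = "200 + 3 This is my text -300 +2 + (9*3)"
span = expr_around(line, 38)
print(line[span[0]:span[1]])  # -300 +2 + (9*3)
```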
Operator Mapping
To use this somewhat like a text object, put something like this in your $MYVIMRC:
func! DetectExpr(flag)
  let regex = '\v\c\s*\zs(\s{-}(((sqrt|log|sin|cos|tan|exp)?\(.{-}\))|(-?[0-9,.]+(e-?[0-9]+)?)|([-+*/%^]+)))+(\s*\=?)?\s*'
  return searchpos(regex, a:flag . 'ncW', line('.'))
endf
func! PositionLessThanEqual(a, b)
  "echo 'a: ' . string(a:a)
  "echo 'b: ' . string(a:b)
  if (a:a[0] == a:b[0])
    return (a:a[1] <= a:b[1]) ? 1 : 0
  else
    return (a:a[0] <= a:b[0]) ? 1 : 0
  endif
endf
func! SelectExpr(mustthrow)
  let cpos = getpos(".")
  let cpos = [cpos[1], cpos[2]] " use only [lnum,col] elements
  let begin = DetectExpr('b')
  if ( ((begin[0] == 0) && (begin[1] == 0))
        \ || !PositionLessThanEqual(begin, cpos) )
    if (a:mustthrow)
      throw "Cursor not inside a valid expression"
    else
      "echoerr "not satisfied: " . string(begin) . " < " . string(cpos)
    endif
    return 0
  endif
  "echo "satisfied: " . string(begin) . " < " . string(cpos)
  call setpos('.', [0, begin[0], begin[1], 0])
  let end = DetectExpr('e')
  if ( ((end[0] == 0) || (end[1] == 0))
        \ || !PositionLessThanEqual(cpos, end) )
    call setpos('.', [0, cpos[0], cpos[1], 0])
    if (a:mustthrow)
      throw "Cursor not inside a valid expression"
    else
      "echoerr "not satisfied: " . string(begin) . " < " . string(cpos) . " < " . string(end)
    endif
    return 0
  endif
  "echo "satisfied: " . string(begin) . " < " . string(cpos) . " < " . string(end)
  norm! v
  call setpos('.', [0, end[0], end[1], 0])
  return 1
endf
silent! unmap X
silent! unmap <M-.>
xnoremap <silent>X :<C-u>call SelectExpr(0)<CR>
onoremap <silent>X :<C-u>call SelectExpr(0)<CR>
Now you can operate on the nearest expression around (or after) the cursor position:
vX - [v]isually select e[X]pression
dX - [d]elete current e[X]pression
yX - [y]ank current e[X]pression
"ayX - same, but yank into register a
As a trick, you can reproduce the exact ASCII art from the question (using virtualedit for the purpose of the demo); see "For Fun: ascii art" below.
Insert mode mapping
In response to the chat:
" if you want trailing spaces/equal sign to be eaten:
imap <M-.> <C-o>:let @e=""<CR><C-o>"edX<C-r>=substitute(@e, '^\v(.{-})(\s*\=?)?\s*$', '\=string(eval(submatch(1)))', '')<CR>
" but I'm assuming you wanted them preserved:
imap <M-.> <C-o>:let @e=""<CR><C-o>"edX<C-r>=substitute(@e, '^\v(.{-})(\s*\=?\s*)?$', '\=string(eval(submatch(1))) . submatch(2)', '')<CR>
Either mapping lets you press Alt-. in insert mode to replace the current expression with its evaluation. The cursor ends up at the end of the result, still in insert mode.
200 + 3 This is my text -300 +2 + (9*3)
This is text 0.25 + 2.000 + sqrt(15/1.5)
Tested by pressing Alt-. three times in insert mode:
203 This is my text -271
This is text 5.412278
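Outside Vim, the same replace-with-result step (in the variant that preserves trailing blanks and '=') could be sketched in Python as below. This is only an illustration with several assumptions: it uses Python's eval with builtins removed and a hand-picked table of math functions (the ALLOWED table and the function name are hypothetical), Python's ^ is XOR rather than power, and comma decimals such as 0,25 are not supported (in Python they would build a tuple):

```python
import math
import re

# Names the expression may use; builtins are removed in eval below.
ALLOWED = {name: getattr(math, name)
           for name in ('sqrt', 'log', 'sin', 'cos', 'tan', 'exp')}
ALLOWED['abs'] = abs

def replace_with_result(expr):
    """Evaluate expr, keeping trailing blanks and an optional '='."""
    # Lazy core first, so trailing blanks/'=' go to the second group.
    m = re.fullmatch(r'(.*?)(\s*=?\s*)', expr, re.DOTALL)
    core, trailing = m.group(1), m.group(2)
    # eval without builtins: acceptable for your own notes, NOT for untrusted input.
    value = eval(core.strip(), {'__builtins__': {}}, ALLOWED)
    return str(value) + trailing

print(replace_with_result("200 + 3 "))          # "203 "
print(replace_with_result("-300 +2 + (9*3)"))   # -271
```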
For Fun: ascii art
vXoyo<Esc>`<jPvXr-r|e.
To easily test it yourself:
:let @q="vXoyo\x1b`<jPvXr-r|e.a*\x1b"
:set virtualedit=all
Now you can run @q anywhere and it will ASCII-decorate the nearest expression :)
200 + 3 = 203 -300 +2 + (9*3) =
|-------|*
|-------------------|*
200 + 3 = 203 -300 +2 + (9*3)
|-----------------|*
|-------|*
This is text 0,25 + 2.000 + sqrt(15/1.5)
|-------------------------|*
[1] Consider using Vim's Python integration for that kind of parsing.
This is quite a complicated task to achieve with a regex, so if you can avoid it in any way, try to do so.
I've created a regex that works for a few examples - give it a try and see if it does the trick:
^(?:[A-Za-z]|\s)+((?:[^A-Za-z]+)?(?:log|sqrt|abs|round|ceil|floor|sin|cos|tan)[^A-Za-z]+)(?:[A-Za-z]|\s)*$
The part that you are interested in should be in the first matching group.
Let me know if you need an explanation.
EDIT:
^ - match the beginning of a line
(?:[A-Za-z]|\s)+ - match everything that's a letter or a space, one or more times
match and capture the following 3:
((?:[^A-Za-z]+)? - optionally match everything that's NOT a letter (i.e. in your case numbers or operators)
(?:log|sqrt|abs|round|ceil|floor|sin|cos|tan) - match one of your keywords
[^A-Za-z]+) - match everything that's NOT a letter (i.e. in your case numbers or operators)
(?:[A-Za-z]|\s)* - match everything that's a letter or a space zero or more times
$ - match the end of the line
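A quick way to try this regex against the question's two sample lines is Python's re module, which supports every construct the pattern uses. This is a sketch, and the helper name is illustrative:

```python
import re

# The answer's pattern, unchanged (split over lines only for readability).
CALC = re.compile(r'^(?:[A-Za-z]|\s)+((?:[^A-Za-z]+)?'
                  r'(?:log|sqrt|abs|round|ceil|floor|sin|cos|tan)'
                  r'[^A-Za-z]+)(?:[A-Za-z]|\s)*$')

def extract_calculation(line):
    """Return the first capture group (the calculation), or None."""
    m = CALC.match(line)
    return m.group(1) if m else None

print(extract_calculation("This is text 0,25 + 2.000 + sqrt(15/1.5)"))
print(extract_calculation("200 + 3 This is my text -300 +2 + (9*3)"))
```

The second call returns None: the pattern requires leading text and one of the listed function names, so it does not cover a line like the first example in the question.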