Further define a GAWK match and divide operation - regex

I have some TXT files with numbers in them that I need to divide by 4.
Text-line I'm matching and changing is:-
scale = 23 23
My little GAWK file looks like this:-
/scale [\=] [0-9]+ [0-9]+/ {
$3 = int($3/4)
$4 = int($4/4) }
{print}
So I successfully get "scale = 5 5"
But I have 3 more requirements and would love some help...
1) the "scale" parameter should only be that following another match called "detail" on some lines above it.
(so instead of simply matching every "scale = " it would be "detail(.....)scale = ") (any number/letter/+newline between them)
2) these values of "scale" should never be lower than 1.
(dividing anything lower than 6 should always give a result of 1 (just changing "scale = 0" to "scale = 1" after will do))
3) values should preferably round up instead of down.
(so instead of 5 here from 23, it is actually 5.75 and should round up to 6 (this isn't SO important, but would be nice))

Something like this perhaps?
awk '/detail/ { d=1 }
d && /scale = [0-9]+ [0-9]+/ && $3>1 && $4>1 {
$3 = $3<6 ? 1 : sprintf("%1.0f", $3/4)
$4 = $4<6 ? 1 : sprintf("%1.0f", $4/4)
d = 0 }
1'
sprintf with a suitable format specifier rounds to the nearest integer, which is close to (but not the same as) always rounding up (see e.g. https://www.gnu.org/software/gawk/manual/html_node/Round-Function.html)
The ternary operator x ? y : z produces y if x is true, otherwise z.
Notice also the minor simplifications (= doesn't need a backslash or a character class, and {print} can be shortened to just 1).
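For comparison, here is a minimal Python sketch of the same "arm on detail, rewrite the next scale line" idea, this time with a true round-up and a floor of 1. The names rescale, quarter and SCALE_RE are made up for this illustration and are not part of the awk answer. Note that a strict round-up turns 5 into 2, so adjust quarter if you really want everything below 6 to become 1.

```python
import math
import re

# Hypothetical pattern: "scale = A B" with two non-negative integers.
SCALE_RE = re.compile(r"^(\s*scale\s*=\s*)(\d+)\s+(\d+)\s*$")


def quarter(value: int) -> int:
    """Divide by 4, round up, and never return less than 1."""
    return max(1, math.ceil(value / 4))


def rescale(lines):
    """Rewrite only the first 'scale = A B' line after each 'detail' line."""
    armed = False  # True once 'detail' was seen and no 'scale' has followed yet
    out = []
    for line in lines:
        if "detail" in line:
            armed = True
        m = SCALE_RE.match(line)
        if armed and m:
            a, b = int(m.group(2)), int(m.group(3))
            line = f"{m.group(1)}{quarter(a)} {quarter(b)}"
            armed = False
        out.append(line)
    return out
```

Calling rescale on the lines of a file leaves any scale line that is not preceded by a detail line untouched.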

Related

RegEx vscode - replace decimal places and round correctly

Is it possible to use regex to round decimal places?
I have lines that look like this but without any spaces (space added for readability).
0, 162.3707542, -162.3707542
128.2, 151.8299471, -23.62994709 // this 151.829 should lead to 151.83
I want to remove all numbers after the second decimal position and if possible round the second decimal position based on the third position.
0, 162.37, -162.37
128.2, 151.82, -23.62 // already working .82
..., 151.83, ... // intended .83 <- this is my question
What is working
The following regex (see this sample on regex101.com) almost does what I want
([0-9]+\.)([0-9]{2})(\d{0,}) // search
$1$2 // replace
My understanding
The search works like this
group: ([0-9]+\.) find 1 up to n numbers and a point
group: ([0-9]{2}) followed by 2 numbers
group: (\d{0,}) followed by 0 or more numbers / digits
In visual-studio-code in the replacement field only group 1 and 2 are referenced $1$2.
This results in this substitution (regex101.com)
Question
Is it possible to change the last digit of $2 (group two) based on the first digit in $3 (group three) ?
My intention is to round correctly. In the sample above this would mean
151.8299471 // source
151.82 // current result
151.83 // desired result 2 was changed to 3 because of third digit 9
It is not only the last digit of $2 that you need to update. If the number is 199.995 you have to modify all digits of your result.
You can use the extension Regex Text Generator.
You can use a predefined set of regexes.
"regexTextGen.predefined": {
"round numbers": {
"originalTextRegex": "(-?\\d+\\.\\d+)",
"generatorRegex": "{{=N[1]:fixed(2):simplify}}"
}
}
With the same regex (-?\d+\.\d+) in the VS Code Find dialog, select all the numbers you want; you can use Find in Selection and Alt+Enter.
Then execute the command: Generate text based on Regular Expression.
Select the predefined option and press Enter a few times. You get a preview of the result, you can escape the UI and get back the original text.
In the process you can edit generatorRegex to change the number of decimals or to remove the simplify.
It was easier than I thought, once I found the Number.toFixed(2) method.
Using this extension I wrote, Find and Transform, make this keybinding in your keybindings.json:
{
"key": "alt+r", // whatever keybinding you want
"command": "findInCurrentFile",
"args": {
"find": "(-?[0-9]+\\.\\d{3,})", // only need the whole number as one capture group
"replace": [
"$${", // starting wrapper to indicate a js operation begins
"return $1.toFixed(2);", // $1 from the find regex
"}$$" // ending wrapper to indicate a js operation ends
],
// or simply in one line
// "replace": "$${ return $1.toFixed(2); }$$",
"isRegex": true
},
}
[The empty lines above are there just for readability.]
This could also be put into a setting, see the README, so that a command appears in the Command Palette with the title of your choice.
Also note that JavaScript rounds -23.62994709 to -23.63. You had -23.62 in your question; I assume -23.63 is correct.
If you do want to truncate things like 4.00 to 4 or 4.20 to 4.2 use this replace instead.
"replace": [
"$${",
"let result = $1.toFixed(2);",
"result = String(result).replace(/0+$/m, '').replace(/\\.$/m, '');",
"return result;",
"}$$"
],
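Outside the editor, the same "match the whole number, then round it in code" approach can be sketched in Python. This is only an illustration, not part of either extension: NUMBER_RE, round_match and round_decimals are names invented here, and the sketch uses decimal arithmetic with round-half-up, so ties can come out differently from JavaScript's toFixed, which works on binary floats.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# Only numbers with three or more decimals need rounding.
NUMBER_RE = re.compile(r"-?\d+\.\d{3,}")


def round_match(m):
    """Round one matched number to two decimals (ties away from zero)."""
    value = Decimal(m.group(0))
    return str(value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


def round_decimals(text):
    """Replace every long decimal in text by its two-decimal rounding."""
    return NUMBER_RE.sub(round_match, text)
```

Because the whole number is re-rounded, a case like 199.995 correctly changes every digit instead of only the last one.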
We are able to round-off decimal numbers correctly using regular expressions.
We basically need these regexes:
secondDD_regx = /(?<=[\d]*\.[\d]{1})[\d]/g; // round-off digit
thirdDD_regx = /(?<=[\d]*\.[\d]{2})[\d]/g; // first discard digit
isNonZeroAfterThirdDD_regx = /(?<=[\d]*\.[\d]{3,})[1-9]/g;
isOddSecondDD_regx = /[13579]/g;
Full code (round-off digit up to two decimal places):
const uptoOneDecimalPlaces_regx = /[\+\-\d]*\.[\d]{1}/g;
const secondDD_regx = /(?<=[\d]*\.[\d]{1})[\d]/g;
const thirdDD_regx = /(?<=[\d]*\.[\d]{2})[\d]/g;
const isNonZeroAfterThirdDD_regx = /(?<=[\d]*\.[\d]{3,})[1-9]/g;
const num = '5.285';
const uptoOneDecimalPlaces = num.match(uptoOneDecimalPlaces_regx)?.[0];
const secondDD = num.match(secondDD_regx)?.[0];
const thirdDD = num.match(thirdDD_regx)?.[0];
const isNonZeroAfterThirdDD = num.match(isNonZeroAfterThirdDD_regx)?.[0];
const isOddSecondDD = /[13579]/g.test(secondDD);
// check carry
const carry = !thirdDD ? 0 : thirdDD > 5 ? 1 : thirdDD < 5 ? 0 : isNonZeroAfterThirdDD ? 1 : isOddSecondDD ? 1 : 0;
let roundOffValue;
if(/9/g.test(secondDD) && carry) {
roundOffValue = (Number(`${uptoOneDecimalPlaces}` + `${secondDD ? Number(secondDD) : 0}`) + Number(`0.0${carry}`)).toString();
} else {
roundOffValue = (uptoOneDecimalPlaces + ((secondDD ? Number(secondDD) : 0) + carry)).toString();
}
// Beautify output: show exactly 2 decimal places if output is x.y or x
const dd = roundOffValue.match(/(?<=[\d]*[\.])[\d]*/g)?.toString().length;
roundOffValue = roundOffValue + (dd=== undefined ? '.00' : dd === 1 ? '0' : '');
console.log(roundOffValue);
For more details check: Round-Off Decimal Number properly using Regular Expression🤔
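The carry rule in the code above (round up on more than half, down on less than half, and on an exact tie only when the kept digit is odd) is what is usually called round-half-to-even. As a cross-check, here is a small Python sketch that gets the same behaviour from the standard decimal module; round_half_even is a name made up for this example.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_half_even(num: str) -> str:
    """Round a decimal string to two places; exact ties go to the even digit."""
    return str(Decimal(num).quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN))
```

Passing the number as a string matters here: it keeps the value exact, so a tie like 5.285 is really a tie and not a binary float slightly above or below it.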

vim - code folding by expression

I have some sourcecode with curly brackets code blocks
I want to be able to fold the blocks having some if condition in front, and leave the other code blocks unfolded.
example input:
print "this is a test"
if a == b {
{ x = 1
y = 2
z = 3
}
k = [1, 2, 3]
}
{ l = 5 }
return "foo"
expected output:
print "this is a test"
if a == b {
+-- 6 lines:
}
{ l = 5 }
return "foo"
I've read this and this, but still no idea how to face the problem.
Any suggestions ?
Assuming that the if closing '}' brace is at the beginning of a line, you can use:
:g/if.*{/+,/^}/-fold
This folds the statements within the {} braces of the if, excluding the braces themselves.
This is achieved through the + and - offsets put after the patterns that define the g range (there's a comma between the patterns): + moves the start of the range one line down from the first matched pattern (/if.*{/) and - moves the end of the range one line up from the second matched pattern (/^}/). Note that :fold, like zf, only works when 'foldmethod' is manual or marker.
If you have indented closing '}' braces or for any circumstance where the above command does not apply, you can try to look for other patterns that you can exploit and change the ex command above as needed.
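To make the range arithmetic concrete, here is a rough Python sketch that computes which lines such a command would fold. It is only a model of the idea, not of Vim's exact search semantics, and fold_ranges is a name invented for this illustration. Like the answer, it assumes the closing brace of the if sits in column 0 while inner braces are indented.

```python
import re


def fold_ranges(lines):
    """Return 1-based (first, last) line ranges between 'if ... {' and '}'."""
    ranges = []
    for i, line in enumerate(lines):
        if not re.search(r"if.*\{", line):
            continue
        # Look for the next line that starts with '}' after the if line.
        for j in range(i + 1, len(lines)):
            if lines[j].startswith("}"):
                first, last = i + 2, j  # line after the if .. line before '}'
                if first <= last:
                    ranges.append((first, last))
                break
    return ranges
```

For a ten-line buffer where the if is on line 2 and its closing brace on line 8, this yields the single range (3, 7).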

Add two different numbers in a single text field space separated in Access VBA

I am using Access and VBA to tidy up a database before a migration. One field is going from text to an INT. So I need to convert and possibly add some numbers which exist in a singular field.
Examples:
F/C 3 other 8 should become 11
Calender-7 should become 7
21 F/C and 1 other should become 22
29 (natural ways) should become 29
The second and fourth line are simple enough, just use the following regex in VBA
Dim rgx As New RegExp
Dim inputText As String
Dim outputText As String
rgx.Pattern = "[^0-9]*"
rgx.Global = True
inputText = "29 (natural ways)"
outputText = rgx.Replace(inputText, "")
The downside is if I use it on option 1 or 3:
F/C 3 other 8 will become 38
Calender-7 will become 7
21 F/C and 1 other will become 211
29 (natural ways) will become 29
This is simple enough in bash, I can just keep the spaces by adding one to [^0-9 ]* and then piping it into awk which will add every field using a space as a delimiter like so:
sed 's/[^0-9 ]*//g' | awk -F' ' 's=0; {for (i=1; i<=NF; i++) s=s+$i; print s}'
F/C 3 other 8 will become 11
21 F/C and 1 other will become 22
The problem is I cannot use bash, and there are far too many values to do it by hand. Is there any way to use VBA to accomplish this?
Instead of using the replace method, just capture and then add up all the numbers. For example:
Option Explicit
' Early binding needs a reference to "Microsoft VBScript Regular Expressions 5.5"
Function outputText(inputText) As Long
Dim rgx As RegExp
Dim mc As MatchCollection, m As Match
Dim total As Long
Set rgx = New RegExp
rgx.Pattern = "[0-9]+" ' one match per run of consecutive digits
rgx.Global = True
Set mc = rgx.Execute(inputText)
For Each m In mc
total = total + CLng(m.Value) ' explicit cast for Access VBA; Excel VBA coerces implicitly
Next m
outputText = total
End Function
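If it helps to sanity-check the expected totals before running this against the real table, the same "capture every run of digits and add them up" logic is a one-liner in Python. This is just a sketch for checking values; sum_numbers is a made-up name and has nothing to do with Access itself.

```python
import re


def sum_numbers(text: str) -> int:
    """Add up every run of consecutive digits found in text."""
    return sum(int(n) for n in re.findall(r"[0-9]+", text))
```

With the four sample values from the question this gives 11, 7, 22 and 29.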
I'm not sure if there is an easier way to solve this. Here I've written a small function for you.
Requirement: add all numbers in a string, identify "consecutive" digits as one number.
pseudo:
Loop through given text
find the first number and check/loop if following chars are numbers
if following chars are numbers treat as one number else pass the
result
continue searching from last point and add the result to the total
in code:
Public Function ADD_NUMB(iText As String) As Long
Dim I As Long, J As Long
Dim T As Long
Dim TM As String
For I = 1 To Len(iText)
If (InStr(1, "1234567890", Mid$(iText, I, 1)) >= 1) Then
TM = Mid(iText, I, 1)
For J = I + 1 To Len(iText)
If (InStr(1, "1234567890", Mid$(iText, J, 1)) >= 1) Then
TM = TM & Mid$(iText, J, 1)
Else
Exit For
End If
Next J
T = T + Val(Nz(TM, 0))
I = J
End If
Next I
ADD_NUMB = T
End Function
usage:
Dim total As Long
total = ADD_NUMB("21 F/C and 1 other")
not sure about performance but it will get you what you need :)

R code to check if word matches pattern

I need to validate a string against a character vector pattern. My current code is:
trim <- function (x) gsub("^\\s+|\\s+$", "", x)
# valid pattern is lowercase alphabet, '.', '!', and '?' AND
# the string length should be >= 2
my.pattern = c(letters, '!', '.', '?')
check.pattern = function(word, min.size = 2)
{
word = trim(word)
chars = strsplit(word, NULL)[[1]]
all(chars %in% my.pattern) && (length(chars) >= min.size)
}
Example:
w.valid = 'special!'
w.invalid = 'test-me'
check.pattern(w.valid) #TRUE
check.pattern(w.invalid) #FALSE
This is VERY SLOW I guess... is there a faster way to do this? Regex maybe?
Thanks!
PS: Thanks everyone for the great answers. My objective was to build a 29 x 29 matrix,
where the row names and column names are the allowed characters. Then i iterate over each word of a huge text file and build a 'letter precedence' matrix. For example, consider the word 'special', starting from the first char:
row s, col p -> increment 1
row p, col e -> increment 1
row e, col c -> increment 1
... and so on.
The bottleneck of my code was the vector allocation: I was 'appending' instead of pre-allocating the final vector, so the code was taking 30 minutes to execute instead of 20 seconds!
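For readers who want to see the 'letter precedence' idea spelled out, here is a small Python sketch of it with the matrix allocated once up front. ALPHABET, INDEX and precedence_matrix are names invented for this illustration, and the sketch assumes every word has already been validated against the allowed characters.

```python
import string

# 26 lowercase letters plus '!', '.', '?' gives the 29 allowed characters.
ALPHABET = list(string.ascii_lowercase) + ["!", ".", "?"]
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def precedence_matrix(words):
    """Count, for each pair (a, b), how often b directly follows a."""
    size = len(ALPHABET)
    matrix = [[0] * size for _ in range(size)]  # allocated once, never grown
    for word in words:
        for a, b in zip(word, word[1:]):
            matrix[INDEX[a]][INDEX[b]] += 1
    return matrix
```

Rows and columns follow the order of ALPHABET, so the count for 's' followed by 'p' lives at matrix[INDEX['s']][INDEX['p']].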
There are some built-in functions that can clean up your code. And I think you're not leveraging the full power of regular expressions.
The glaring issue here is strsplit. Comparing things character by character is inefficient when you have regular expressions. The pattern here uses the square bracket notation to filter for the characters you want. * is for any number of repeats (including zero), while the ^ and $ symbols represent the beginning and end of the line so that there is nothing else there. nchar(word) is the same as length(chars). Changing && to & makes the function vectorized so you can input a vector of strings and get a logical vector as output.
check.pattern.2 = function(word, min.size = 2)
{
word = trim(word)
grepl(paste0("^[a-z!.?]*$"),word) & nchar(word) >= min.size
}
check.pattern.2(c(" d ","!hello ","nA!"," asdf.!"," d d "))
#[1] FALSE TRUE FALSE TRUE FALSE
Next, using curly braces for number of repetitions and some paste0, the pattern can use your min.size:
check.pattern.3 = function(word, min.size = 2)
{
word = trim(word)
grepl(paste0("^[a-z!.?]{",min.size,",}$"),word)
}
check.pattern.3(c(" d ","!hello ","nA!"," asdf.!"," d d "))
#[1] FALSE TRUE FALSE TRUE FALSE
Finally, you can internalize the regex from trim:
check.pattern.4 = function(word, min.size = 2)
{
grepl(paste0("^\\s*[a-z!.?]{",min.size,",}\\s*$"),word)
}
check.pattern.4(c(" d ","!hello ","nA!"," asdf.!"," d d "))
#[1] FALSE TRUE FALSE TRUE FALSE
If I understand the pattern you are desiring correctly, you would want a regex of a similar format to:
^\\s*[a-z!\\.\\?]{MIN,MAX}\\s*$
Where MIN is replaced with the minimum length of the string, and MAX is replaced with the maximum length of the string. If there is no maximum length, omit MAX but keep the comma ({MIN,}); dropping the comma as well would require exactly MIN characters. Likewise, if there is neither maximum nor minimum, everything within the {} including the braces themselves can be replaced with a *, which signifies the preceding item will be matched zero or more times; this is equivalent to {0,}.
This ensures that the regex only matches strings where every character after any leading and trailing whitespace is from the set of
* a lower case letter
* a bang (exclamation point)
* a period (full stop)
* a question mark
Note that this has been written in Perl style regex as it is what I am more familiar with; most of my research was at this wiki for R text processing.
The reason for the slowness of your function is the extra overhead of splitting the string into a number of smaller strings. This is a lot of overhead in comparison to a regex (or even a manual iteration over the string, comparing each character until the end is reached or an invalid character is found). Also remember that this approach always costs at least O(n), as the split causes n strings to be generated. This means that even FAILING strings must do at least n actions before the string is rejected.
Hopefully this clarifies why you were having performance issues.
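As a quick cross-check of the pattern in another language, here is a minimal Python sketch of the same validation. check_pattern is a name made up for this example, and it uses the open-ended {min,} form because the question only asks for a minimum length.

```python
import re


def check_pattern(word: str, min_size: int = 2) -> bool:
    """True if word, ignoring surrounding whitespace, is min_size or more
    characters drawn only from a-z, '!', '.' and '?'."""
    pattern = r"\s*[a-z!.?]{%d,}\s*" % min_size
    return re.fullmatch(pattern, word) is not None
```

Because the regex stops at the first character outside the set, a failing string is rejected without splitting it into pieces first.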

How do I visual select a calculation backwards?

I would like to visually select a calculation backwards, e.g.
200 + 3 This is my text -300 +2 + (9*3)
|-------------|*
This is text 0,25 + 2.000 + sqrt(15/1.5)
|-------------------------|*
The reason is that I will use it in insert mode.
After writing a calculation I want to select the calculation (using a map) and put the results of the calculation in the text.
What the regex must do is:
- select from the cursor (see * in above example) backwards to the start of the calculation
(including \/-+*:.,^).
- the calculation can start only with log/sqrt/abs/round/ceil/floor/sin/cos/tan or with a positive or negative number
- the calculation can also start at the beginning of the line but it never goes back to
a previous line
I tried in all ways but could not find the correct regex.
I noticed that backward searching is different than forward searching.
Can someone help me?
Edit
Forgot to mention that it must include also the '=' if there is one and if the '=' is before the cursor or if there is only space between the cursor and '='.
It must not include other '=' signs.
200 + 3 = 203 -300 +2 + (9*3) =
|-------------------|<SPACES>*
200 + 3 = 203 -300 +2 + (9*3)
|-----------------|<SPACES>*
* = where the cursor is
A regex that comes close in pure vim is
\v\c\s*\zs(\s{-}(((sqrt|log|sin|cos|tan|exp)?\(.{-}\))|(-?[0-9,.]+(e-?[0-9]+)?)|([-+*/%^]+)))+(\s*\=?)?\s*
There are limitations: subexpressions (including function arguments) aren't parsed. You'd need to use a proper grammar parser to do that, and I don't recommend doing that in pure vim (see footnote 1).
Operator Mapping
To enable using this a bit like text-objects, use something like this in your $MYVIMRC:
func! DetectExpr(flag)
let regex = '\v\c\s*\zs(\s{-}(((sqrt|log|sin|cos|tan|exp)?\(.{-}\))|(-?[0-9,.]+(e-?[0-9]+)?)|([-+*/%^]+)))+(\s*\=?)?\s*'
return searchpos(regex, a:flag . 'ncW', line('.'))
endf
func! PositionLessThanEqual(a, b)
"echo 'a: ' . string(a:a)
"echo 'b: ' . string(a:b)
if (a:a[0] == a:b[0])
return (a:a[1] <= a:b[1]) ? 1 : 0
else
return (a:a[0] <= a:b[0]) ? 1 : 0
endif
endf
func! SelectExpr(mustthrow)
let cpos = getpos(".")
let cpos = [cpos[1], cpos[2]] " use only [lnum,col] elements
let begin = DetectExpr('b')
if ( ((begin[0] == 0) && (begin[1] == 0))
\ || !PositionLessThanEqual(begin, cpos) )
if (a:mustthrow)
throw "Cursor not inside a valid expression"
else
"echoerr "not satisfied: " . string(begin) . " < " . string(cpos)
endif
return 0
endif
"echo "satisfied: " . string(begin) . " < " . string(cpos)
call setpos('.', [0, begin[0], begin[1], 0])
let end = DetectExpr('e')
if ( ((end[0] == 0) || (end[1] == 0))
\ || !PositionLessThanEqual(cpos, end) )
call setpos('.', [0, cpos[0], cpos[1], 0])
if (a:mustthrow)
throw "Cursor not inside a valid expression"
else
"echoerr "not satisfied: " . string(begin) . " < " . string(cpos) . " < " . string(end)
endif
return 0
endif
"echo "satisfied: " . string(begin) . " < " . string(cpos) . " < " . string(end)
norm! v
call setpos('.', [0, end[0], end[1], 0])
return 1
endf
silent! unmap X
silent! unmap <M-.>
xnoremap <silent>X :<C-u>call SelectExpr(0)<CR>
onoremap <silent>X :<C-u>call SelectExpr(0)<CR>
Now you can operate on the nearest expression around (or after) the cursor position:
vX - [v]isually select e[X]pression
dX - [d]elete current e[X]pression
yX - [y]ank current e[X]pression
"ayX - id. to register a
As a trick, the "For Fun: ascii art" part further down shows how to arrive at the exact ascii art from the OP (using virtualedit for the purpose of the demo).
Insert mode mapping
In response to the chat:
" if you want trailing spaces/equal sign to be eaten:
imap <M-.> <C-o>:let @e=""<CR><C-o>"edX<C-r>=substitute(@e, '^\v(.{-})(\s*\=?)?\s*$', '\=string(eval(submatch(1)))', '')<CR>
" but I'm assuming you wanted them preserved:
imap <M-.> <C-o>:let @e=""<CR><C-o>"edX<C-r>=substitute(@e, '^\v(.{-})(\s*\=?\s*)?$', '\=string(eval(submatch(1))) . submatch(2)', '')<CR>
This allows you to hit Alt-. during insert mode, and the current expression gets replaced with its evaluation. The cursor ends up at the end of the result in insert mode.
200 + 3 This is my text -300 +2 + (9*3)
This is text 0.25 + 2.000 + sqrt(15/1.5)
Tested by pressing Alt-. in insert 3 times:
203 This is my text -271
This is text 5.412278
For Fun: ascii art
vXoyoEsc`<jPvXr-r|e.
To easily test it yourself:
:let @q="vXoyo\x1b`<jPvXr-r|e.a*\x1b"
:set virtualedit=all
Now you can @q anywhere and it will ascii-decorate the nearest expression :)
200 + 3 = 203 -300 +2 + (9*3) =
|-------|*
|-------------------|*
200 + 3 = 203 -300 +2 + (9*3)
|-----------------|*
|-------|*
This is text 0,25 + 2.000 + sqrt(15/1.5)
|-------------------------|*
Footnote 1: consider using Vim's Python integration to do such parsing.
This seems quite a complicated task after all to achieve with regex, so if you can avoid it in any way, try to do so.
I've created a regex that works for a few examples - give it a try and see if it does the trick:
^(?:[A-Za-z]|\s)+((?:[^A-Za-z]+)?(?:log|sqrt|abs|round|ceil|floor|sin|cos|tan)[^A-Za-z]+)(?:[A-Za-z]|\s)*$
The part that you are interested in should be in the first matching group.
Let me know if you need an explanation.
EDIT:
^ - match the beginning of a line
(?:[A-Za-z]|\s)+ - match everything that's a letter or a space once or more
match and capture the following 3:
((?:[^A-Za-z]+)? - match everything that's NOT a letter (i.e. in your case numbers or operators)
(?:log|sqrt|abs|round|ceil|floor|sin|cos|tan) - match one of your keywords
[^A-Za-z]+) - match everything that's NOT a letter (i.e. in your case numbers or operators)
(?:[A-Za-z]|\s)* - match everything that's a letter or a space zero or more times
$ - match the end of the line
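To try the pattern outside the editor, here is a small Python sketch that applies it unchanged and returns the first capture group. CALC_RE and extract_calculation are names invented for this illustration. Note that the pattern requires leading text and one of the function keywords, so a line with plain arithmetic only, or one that starts with a number, does not match.

```python
import re

# The regex from the answer, split over three lines for readability.
CALC_RE = re.compile(
    r"^(?:[A-Za-z]|\s)+"
    r"((?:[^A-Za-z]+)?(?:log|sqrt|abs|round|ceil|floor|sin|cos|tan)[^A-Za-z]+)"
    r"(?:[A-Za-z]|\s)*$"
)


def extract_calculation(line):
    """Return the captured calculation, or None if the line does not match."""
    m = CALC_RE.match(line)
    return m.group(1) if m else None
```

On the second example line from the question this returns everything from 0,25 up to the closing parenthesis of sqrt.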