Spot vars and functions with regex in mathematical formulas

Spot vars and functions with regex in mathematical formulas - regex

I am trying to get an array of functions, and an array of variables in any mathematical formula in vanilla JS :
A sample :
1+3*9/cos(4+2*x/6+pol)+
BDLIRE(longueur2)+2*8+sin(2)
+g()
For getting functions I use : /([a-zA-Z]+)(?=[(])/gm
https://regex101.com/r/fT3iM7/1
For getting vars I tried : /([a-zA-Z]+[a-zA-Z0-9]?)(?!\()/gm
https://regex101.com/r/mG2fQ2/1
But as you can see it ignores last char of a match followed by a ( char
So i'm a bit stuck with my regex to match variables.
Thanks for any help :)

You cannot use RegEx to parse arithmetic expressions, maybe you can, but you will have a flaky implementation and writing it WILL make you go bonkers.
You need to look into parsers. Check out PEG.js, it is an easy to use framework for writing parsers that compiles into JS, so it IS "vanilla" (Whatever you mean by that...):
http://pegjs.org/

Related

Regex expression to recognize XdY+Z OR XdY

I've been trying to develop a program that will be used for DMing in an MMORPG but I'm having trouble parsing for the actual regex expression I need.
To quote myself from another thread on a less active forum:
I've officially taken over the DiceRoller addon from years and years ago and I've reworked it a lot since I've taken it over and done a lot of testing in game. While I haven't uploaded anything yet, I've been struggling on a piece of regex expression that is currently crucial to the design of the addon.
Some background: the newest iteration of the DiceRoller addon makes it so you can type "!XdY" (where X is the number of dice, Y is the dice value) into raid chat and the DM who has the addon will go through some logic in the addon (random number lua protocol) and then spit out an input after adding up the dice.
It is as follows:
local count, size = string.match(message, "^!(%d+)[dD](%d+)$")
Now the functionality I need it to do is parse for both "!XdY" OR "XdY+Z", but it seems as if I can't get close to "XdY+Z" no matter which regex expression I use since I need it to do both expressions. I can provide more source code context if necessary.
This is the closest I've ever gotten:
http://i.imgur.com/eMhPHQB.png
and this is with the regex expression:
local count, size, modifier = string.match(message, "^!(%d+)[dD](%d+)+?(%d+)$")
As you can see, with the modifier it will work just fine. However, remove the modifier the regex expression still thinks that it is "XdY+Z" and so with "1d20" it think it is "1d2+0". It will think 1d200 is "1d20+0", etc. I've tried moving around the optional character "?" but it just causes the expression to not work at all. If I do !1d2 it doesn't work. It's almost as if the optional character NEEDS to be there?
Thanks for the help ahead of time, I've always struggled with regex.

local function dice(input)
local count, size, modifier = input:match"^!(%d+)[dD](%d+)%+?(%d*)$"
if count then
return tonumber(count), tonumber(size), tonumber("0"..modifier)
end
end
for _, input in ipairs{"!1d6", "!1d24", "!1d200", "!1d2+4", "!1d20+24"} do
print(input, dice(input))
end
Output:
!1d6 1 6 0
!1d24 1 24 0
!1d200 1 200 0
!1d2+4 1 2 4
!1d20+24 1 20 24

Lua regular expressions are very limited. You would need to use ^!(%d+)[dD](%d+)(?:+(%d+))?$ but this wouldn't be supported because of (?:+(%d+))? that uses a non-capturing group and a modifier on a group, both are not supported by Lua Patterns.
Consider using a regex library like this one that allows you to use PCRE, PHP regex engine, one of the most complete engine. But that would be overkill if you only want to use it for this regex. You can do it by code then, wouldn't be so hard for a simple task like this.

While Lua patterns are not powerful enough to parse this with one expression (as they don't support optional groups), there is an easy option to handle it with two expressions:
-- check the longer expression first
local count, size, modifier = string.match(message, "^!(%d+)[dD](%d+)+(%d+)$")
if not count then
count, size = string.match(message, "^!(%d+)[dD](%d+)$")
end

Creating a simple parser in (V)C++ (2010) similar to PEG

For an school project, I need to parse a text/source file containing a simplified "fake" programming language to build an AST. I've looked at boost::spirit, however since this is a group project and most seems reluctant to learn extra libraries, plus the lecturer/TA recommended leaning to create a simple one on C++. I thought of going that route. Is there some examples out there or ideas on how to start? I have a few attempts but not really successful yet ...
parsing line by line
Test each line with a bunch of regex (1 for procedure/function declaration), one for assignment, one for while etc...
But I will need to assume there are no multiple statements in one line: eg. a=b;x=1;
When I reach a container statement, procedures, whiles etc, I will increase the indent. So all nested statements will go under this
When I reach a } I will decrement indent
Any better ideas or suggestions? Example code I need to parse (very simplified here ...)
procedure Hello {
a = 1;
while a {
b = a + 1 + z;
}
}
Another idea was to read whole file into a string, and go top down. Match all procedures, then capture everything in { ... } then start matching statements (end with ;) or containers while { ... }. This is similar to how PEG does things? But I will need to read entire file

Multipass makes things easier. On a first pass, split things into tokens, like "=", or "abababa", or a quote-delimited string, or a block of whitespace. Don't be destructive (keep the original data), but break things down to simple chunks, and maybe have a little struct or enum that describes what the token is (ie, whitespace, a string literal, an identifier type thing, etc).
So your sample code gets turned into:
identifier(procedure) whitespace( ) identifier(Hello) whitespace( ) operation({) whitespace(\n\t) identifier(a) whitespace( ) operation(=) whitespace( ) number(1) operation(;) whitespace(\n\t) etc.
In those tokens, you might also want to store line number and offset on the line (this will help with error message generation later).
A quick test would be to turn this back into the original text. Another quick test might be to dump out pretty-printed version in html or something (where you color whitespace to have a pink background, identifiers as light blue, operations as light green, numbers as light orange), and see if your tokenizer is making sense.
Now, your language may be whitespace insensitive. So discard the whitespace if that is the case! (C++ isn't, because you need newlines to learn when // comments end)
(Note: a professional language parser will be as close to one-pass as possible, because it is faster. But you are a student, and your goal should be to get it to work.)
So now you have a stream of such tokens. There are a bunch of approaches at this point. You could pull out some serious parsing chops and build a CFG to parse them. (Do you know what a CFG is? LR(1)? LL(1)?)
An easier method might be to do it a bit more ad-hoc. Look for operator({) and find the matching operator(}) by counting up and down. Look for language keywords (like procedure), which then expects a name (the next token), then a block (a {). An ad-hoc parser for a really simple language may work fine.
I've done exactly this for a ridiculously simple language, where the parser consisted of a really simple PDA. It might work for you guys. Or it might not.

Since you mentioned PEG i'll like to throw in my open source project : https://github.com/leblancmeneses/NPEG/tree/master/Languages/npeg_c++
Here is a visual tool that can export C++ version: http://www.robusthaven.com/blog/parsing-expression-grammar/npeg-language-workbench
Documentation for rule grammar: http://www.robusthaven.com/blog/parsing-expression-grammar/npeg-dsl-documentation
If i was writing my own language I would probably look at the terminals/non-terminals found in System.Linq.Expressions as these would be a great start for your grammar rules.
http://msdn.microsoft.com/en-us/library/system.linq.expressions.aspx
System.Linq.Expressions.Expression
System.Linq.Expressions.BinaryExpression
System.Linq.Expressions.BlockExpression
System.Linq.Expressions.ConditionalExpression
System.Linq.Expressions.ConstantExpression
System.Linq.Expressions.DebugInfoExpression
System.Linq.Expressions.DefaultExpression
System.Linq.Expressions.DynamicExpression
System.Linq.Expressions.GotoExpression
System.Linq.Expressions.IndexExpression
System.Linq.Expressions.InvocationExpression
System.Linq.Expressions.LabelExpression
System.Linq.Expressions.LambdaExpression
System.Linq.Expressions.ListInitExpression
System.Linq.Expressions.LoopExpression
System.Linq.Expressions.MemberExpression
System.Linq.Expressions.MemberInitExpression
System.Linq.Expressions.MethodCallExpression
System.Linq.Expressions.NewArrayExpression
System.Linq.Expressions.NewExpression
System.Linq.Expressions.ParameterExpression
System.Linq.Expressions.RuntimeVariablesExpression
System.Linq.Expressions.SwitchExpression
System.Linq.Expressions.TryExpression
System.Linq.Expressions.TypeBinaryExpression
System.Linq.Expressions.UnaryExpression

camelCase to underscore in vi(m)

If for some reason I want to selectively convert camelCase named things to being underscore separated in vim, how could I go about doing so?
Currently I've found that I can do a search /s[a-z][A-Z] and record a macro to add an underscore and convert to lower case, but I'm curious as to if I can do it with something like :
%s/([a-z])([A-Z])/\1\u\2/gc
Thanks in advance!
EDIT: I figured out the answer for camelCase (which is what I really needed), but can someone else answer how to change CamelCase to camel_case?

You might want to try out the Abolish plugin by Tim Pope. It provides a few shortcuts to coerce from one style to another. For example, starting with:
MixedCase
Typing crc [mnemonic: CoeRce to Camelcase] would give you:
mixedCase
Typing crs [mnemonic: CoeRce to Snake_case] would give you:
mixed_case
And typing crm [mnemonic: CoeRce to MixedCase] would take you back to:
MixedCase
If you also install repeat.vim, then you can repeat the coercion commands by pressing the dot key.

This is a bit long, but seems to do the job:
:%s/\<\u\|\l\u/\= join(split(tolower(submatch(0)), '\zs'), '_')/gc

I suppose I should have just kept trying for about 5 more minutes. Well... if anyone is curious:
%s/\(\l\)\(\u\)/\1\_\l\2/gc does the trick.
Actually, I realized this works for camelCase, but not CamelCase, which could also be useful for someone.

I whipped up a plugin that does this.
https://github.com/chiedojohn/vim-case-convert
To convert the case, select a block of text in visual mode and the enter one of the following (Self explanatory) :
:CamelToHyphen
:CamelToSnake
:HyphenToCamel
:HyphenToSnake
:SnakeToCamel
:SnakeToHyphen
To convert all occerences in your document then run one of the following commands:
:CamelToHyphenAll
:CamelToSnakeAll
:HyphenToCamelAll
:HyphenToSnakeAll
:SnakeToCamelAll
:SnakeToHyphen
Add a bang (eg. :CamelToHyphen!) to any of the above command to bypass the prompts before each conversion.
You may not want to do that though as the plugin wouldn't know the different between variables or other text in your file.

For the CamelCase case:%s#(\<\u\|\l)(\l+)(\u)#\l\1\2_\l\3#gc
Tip: the regex delimiters can be altered as in my example to make it (somewhat) more legible.

I have an API for various development oriented processing. Among other things, it provides a few functions for transforming names between (configurable) conventions (variable <-> attribute <-> getter <-> setter <-> constant <-> parameter <-> ...) and styles (camelcase (low/high) <-> underscores). These conversion functions have been wrapped into a plugin.
The plugin + API can be fetch from here: https://github.com/LucHermitte/lh-dev, for this names conversion task, it requires lh-vim-lib
It can be used the following way:
put the cursor on the symbol you want to rename
type :NameConvert + the type of conversion you wish (here : underscore). NB: this command supports auto-completion.
et voilà!

Calling fnparse to parse a string?

I've been trying out fnparse library written by Joshua Choi in Clojure and I'm having difficulties trying to work out how to call the rules on the text that I want to parse. I've been experimenting with cat which is part of the new release. Lets take the example code listed. Could anyone give me some ideas how I could call the rule on an expression?
Thank you!

thanks for trying out FnParse 3.
In general, you use the edu.arizona.fnparse/match form (as well as the complementary find, substitute, and substitute-1 forms) to use the rules that you create. Check their documentation strings.
Sorry about the confusion—I should have added an example of match in math.clj—but take a look at the bottom of the sample Clojure parser. Even though the Clojure parser uses FnParse Hound, match works the same way in both Cat and Hound.

Why is this regular expression faster?

I'm writing a Telnet client of sorts in C# and part of what I have to parse are ANSI/VT100 escape sequences, specifically, just those used for colour and formatting (detailed here).
One method I have is one to find all the codes and remove them, so I can render the text without any formatting if needed:
public static string StripStringFormating(string formattedString)
{
if (rTest.IsMatch(formattedString))
return rTest.Replace(formattedString, string.Empty);
else
return formattedString;
}
I'm new to regular expressions and I was suggested to use this:
static Regex rText = new Regex(#"\e\[[\d;]+m", RegexOptions.Compiled);
However, this failed if the escape code was incomplete due to an error on the server. So then this was suggested, but my friend warned it might be slower (this one also matches another condition (z) that I might come across later):
static Regex rTest =
new Regex(#"(\e(\[([\d;]*[mz]?))?)?", RegexOptions.Compiled);
This not only worked, but was in fact faster to and reduced the impact on my text rendering. Can someone explain to a regexp newbie, why? :)

Do you really want to do run the regexp twice? Without having checked (bad me) I would have thought that this would work well:
public static string StripStringFormating(string formattedString)
{
return rTest.Replace(formattedString, string.Empty);
}
If it does, you should see it run ~twice as fast...

The reason why #1 is slower is that [\d;]+ is a greedy quantifier. Using +? or *? is going to do lazy quantifing. See MSDN - Quantifiers for more info.
You may want to try:
"(\e\[(\d{1,2};)*?[mz]?)?"
That may be faster for you.

I'm not sure if this will help with what you are working on, but long ago I wrote a regular expression to parse ANSI graphic files.
(?s)(?:\e\[(?:(\d+);?)*([A-Za-z])(.*?))(?=\e\[|\z)
It will return each code and the text associated with it.
Input string:
<ESC>[1;32mThis is bright green.<ESC>[0m This is the default color.
Results:
[ [1, 32], m, This is bright green.]
[0, m, This is the default color.]

Without doing detailed analysis, I'd guess that it's faster because of the question marks. These allow the regular expression to be "lazy," and stop as soon as they have enough to match, rather than checking if the rest of the input matches.
I'm not entirely happy with this answer though, because this mostly applies to question marks after * or +. If I were more familiar with the input, it might make more sense to me.
(Also, for the code formatting, you can select all of your code and press Ctrl+K to have it add the four spaces required.)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js