Simplified regular expression matching in Scala - regex

I am trying to write a simple regular expression matcher in Scala as an exercise. For simplicity, I assume the strings to match are ASCII and the regexps consist of ASCII characters and only two metacharacters: . and *. (Obviously, I don't use any regexp library.)
This is my simple and slow (exponential) solution.
def doMatch(r: String, s: String): Boolean = {
  if (r.isEmpty) s.isEmpty
  else if (r.length > 1 && r(1) == '*') star(r(0), r.tail.tail, s)
  else if (!s.isEmpty && (r(0) == '.' || r(0) == s(0))) doMatch(r.tail, s.tail)
  else false
}
def star(c: Char, r: String, s: String): Boolean = {
  if (doMatch(r, s)) true
  else if (!s.isEmpty && (c == '.' || c == s(0))) star(c, r, s.tail)
  else false
}
Now I would like to improve it. Could you suggest a simple polynomial solution in ~10-15 lines of "pure" Scala code?
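One common way to get a polynomial bound is to keep the same recursion but memoize it on the pair (position in pattern, position in string), so each of the |r|·|s| subproblems is solved once. Below is a minimal sketch of that idea in Python rather than Scala; the names do_match and go are illustrative, and it assumes the same semantics as the code above (a '*' that does not follow a character is treated as a literal).

```python
from functools import lru_cache


def do_match(r: str, s: str) -> bool:
    """Match pattern r (literals, '.', and 'c*') against the whole of s."""

    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> bool:
        # True if r[i:] matches s[j:]
        if i == len(r):
            return j == len(s)
        # Does the current pattern character match the current string character?
        first = j < len(s) and r[i] in (".", s[j])
        if i + 1 < len(r) and r[i + 1] == "*":
            # Either skip "c*" entirely, or consume one character and stay on "c*"
            return go(i + 2, j) or (first and go(i, j + 1))
        return first and go(i + 1, j + 1)

    return go(0, 0)


print(do_match("a*b", "aaab"))
print(do_match("a*a*a*a*a*a*a*a*a*a*b", "a" * 30))
```

The second call is the kind of input that makes the unmemoized version blow up; with the cache there are at most (len(r) + 1) * (len(s) + 1) distinct calls.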

How to check if alternate letters are similar in a string using RegEx Python

I want to solve a problem using regex in Python 3.
Given a string, I want to check each character and return True if any character has the same character on its left and on its right (e.g. "454", "aba").
Input: a = "4346789"
Output: "True" (because the characters to the left and right of '3' are the same, i.e. '4')
Input 2: a = "4335667"
Output: "False" (because no character has the same character on its left and right)
How can I write a regex to determine whether the characters to the left and right of any character are the same?
Check this link:
https://regex101.com/r/W7SAZU/1
The regex is :
'(.).\1'
This solution worked fine for me, hope it finds its way to other souls in need
def alternate(s: str):
    for i, v in enumerate(s):
        if i > 0:
            if i > 1 and v == s[i-1]:
                return False
    return True
The second solution doesn't work for me, but this code works:
def twoalter(s):
    for i in range(len(s) - 2):
        if s[i] != s[i + 2]:
            return False
    if s[0] == s[1]:
        return False
    return True
It worked on Python 3.7.3:
import re
def hasAlternate(s):
    # (.) captures one character, . skips one, \1 requires the captured character again
    p = re.compile(r'(.).\1')
    return p.search(s) is not None
print(hasAlternate('4346789'))
print(hasAlternate('4335667'))
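For comparison, here is a regex-free sketch of the same check as the '(.).\1' pattern: it looks at every interior position and compares its two neighbours. The name has_alternate is illustrative, and the sketch assumes the question's reading ("any character has the same left and right characters").

```python
def has_alternate(s: str) -> bool:
    # True if some character has identical left and right neighbours
    return any(s[i - 1] == s[i + 1] for i in range(1, len(s) - 1))


print(has_alternate("4346789"))  # True: '3' sits between two '4's
print(has_alternate("4335667"))  # False: no such position
```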

Regex should allow German Umlauts in C#

I am using the following regular expression:
[RegularExpression(@"^[A-Za-z0-9äöüÄÖÜß]+(?:[\._-äöüÄÖÜß][A-Za-z0-9]+)*$", ErrorMessageResourceName = "Error_User_UsernameFormat", ErrorMessageResourceType = typeof(Properties.Resources))]
Now I want to improve it so that it allows German umlauts (äöüÄÖÜß).
The way you added German letters to your regex, it will only be possible to use German letters in the first word.
You need to put the letters into the last character class:
@"^[A-Za-z0-9äöüÄÖÜß]+(?:[._-][A-Za-z0-9äöüÄÖÜß]+)*$"
                                        ^^^^^^^
See the regex demo
Also, note that _-ä creates a range inside a character class that matches a lot more than just a _, - and ä (and does not even match - as it is not present in the range).
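The range effect is easy to check. The sketch below uses Python's re module instead of .NET, on the assumption that both engines read _-ä inside a character class as a code point range (U+005F to U+00E4); the names broken and fixed are illustrative.

```python
import re

# Separator class from the question: "_-ä" is read as a range from "_" to "ä"
broken = re.compile(r"[\._-äöüÄÖÜß]")
# Separator class from the answer: "-" placed last is a literal hyphen
fixed = re.compile(r"[._-]")

print(bool(broken.fullmatch("-")))  # False: "-" (U+002D) is outside the range
print(bool(broken.fullmatch("x")))  # True: "x" (U+0078) falls inside the range
print(bool(fixed.fullmatch("-")))   # True
print(bool(fixed.fullmatch("x")))   # False
```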
Note that if you validate on the server side only, and want to match any Unicode letters, you may also consider using
@"^[\p{L}0-9]+(?:[._-][\p{L}0-9]+)*$"
Where \p{L} matches any Unicode letter. Another way to write [\p{L}0-9] would be [^\W_], but in .NET, it would also match all Unicode digits while 0-9 will only match ASCII digits.
Replace [A-Za-z0-9äöüÄÖÜß] with [\w]. \w already matches umlauts. Note that it also matches the underscore and, in .NET, all Unicode letters and digits.
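To see what switching to \w changes, here is a small sketch in Python, whose re module also treats \w as Unicode-aware for str patterns. It is not the .NET engine, so take it as an illustration only; the names loose and strict are made up for the comparison. Because \w includes the underscore, the \w version accepts leading and doubled underscores that the original pattern rejected, while [^\W_] keeps underscores out of the word parts.

```python
import re

# \w version suggested above: umlauts pass, but so do underscores anywhere
loose = re.compile(r"^\w+(?:[._-]\w+)*$")
# [^\W_] is "\w without the underscore", closer to the original intent
strict = re.compile(r"^[^\W_]+(?:[._-][^\W_]+)*$")

for name in ["Müller", "jörg.weiß", "_admin", "max__m"]:
    print(name, bool(loose.match(name)), bool(strict.match(name)))
```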
This works better for me. I modified code that somebody else posted on Stack Overflow, and it works well for German text.
I added the check (c >= 'Ä' && c <= 'ä') and now it is closer to what I need. Not all German letters are covered; to add a letter you are having an issue with, add your own check of the form (c >= 'Ö' && c <= 'ö').
// Requires "using System.Text;" and must be declared inside a static class.
public static string RemoveSpecialCharacters(this string str)
{
    StringBuilder sb = new StringBuilder();
    foreach (char c in str)
    {
        // Keep digits, ASCII letters, '.', ' ' and the ranges that cover the umlauts
        if ((c >= '0' && c <= '9') || (c >= 'Ö' && c <= 'ö') || (c >= 'Ü' && c <= 'ü') || (c >= 'Ä' && c <= 'ä') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '.' || c == ' ')
        {
            sb.Append(c);
        }
    }
    return sb.ToString();
}

Regular expression to extract words from string

I have a string such as
str= "value 1 then perform certain action"
I need a regular expression that makes sure value and perform are both present in the string, without either being repeated.
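You don't need a complicated regular expression for this simple task. Count the occurrences instead:
var str = "value 1 then perform certain action"
// With the g flag, match() returns every occurrence, so length is the number of matches
var a = str.match(/value/g) || []
var b = str.match(/perform/g) || []
if (a.length == 1 && b.length == 1) {
    console.log("true")
} else {
    console.log("false")
}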
You can set the pattern as follows:
pattern: "value (.*) perform (.*)"
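If a single regular expression is required, the "present but not repeated" condition can be written with lookaheads. The following is a sketch in Python under the assumption that value and perform should be matched as whole words; the names EXACTLY_ONCE and is_valid are illustrative.

```python
import re

EXACTLY_ONCE = re.compile(
    r"^(?!.*\bvalue\b.*\bvalue\b)"       # "value" must not occur twice
    r"(?!.*\bperform\b.*\bperform\b)"    # "perform" must not occur twice
    r"(?=.*\bvalue\b)"                   # "value" must occur
    r"(?=.*\bperform\b)"                 # "perform" must occur
)


def is_valid(text: str) -> bool:
    return EXACTLY_ONCE.match(text) is not None


print(is_valid("value 1 then perform certain action"))
print(is_valid("value 1 then value perform"))
```

The pattern consumes no characters; it only asserts the four conditions at the start of the string.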

Scala regex find and replace

I'm having problems finding and replacing portions of a string using regex in scala.
Given the following string: q[k6.q3]>=0 and q[dist.report][0] or q[dist.report][1] and q[10]>20
I want to replace all the occurrences of "and" and "or" with "&&" and "||".
The regex I have come up with is: .+\s((and|or)+)\s.+. However, this seems to only find the last "and".
When using https://regex101.com/#pcre I tried to solve this by adding the modifiers gU, which seems to work. But I'm not sure how to use those modifiers in Scala code.
Any help is much appreciated
Why not use a solution like this:
str.replaceAll("\\sand\\s", " && ").replaceAll("\\sor\\s", " || ")
You can check the captured/matched substrings with a lambda and use an if/else syntax to replace with the appropriate replacement:
val str = "q[k6.q3]>=0 and q[dist.report][0] or q[dist.report][1] and q[10]>20"
val pattern = """\b(and|or)\b""".r
val replacedStr = pattern replaceAllIn (str, m => if (m.group(1) == "or") "||" else "&&")
println(replacedStr)
Result of the code demo: q[k6.q3]>=0 && q[dist.report][0] || q[dist.report][1] && q[10]>20
Regex breakdown:
\b - word boundary
(and|or) - either and or or letter sequences
\b - the closing word boundary.
If you require whitespaces on both ends, use
val pattern = """ (and|or) """.r
val replacedStr = pattern replaceAllIn (str, m => if (m.group(1) == "or") " || " else " && ")
See another Scala demo
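The same callback-based replacement can be sketched in Python with re.sub, which also accepts a function that receives each match. This is only an illustration of the technique in another language; the dictionary ops is a made-up helper that replaces the if/else.

```python
import re

s = "q[k6.q3]>=0 and q[dist.report][0] or q[dist.report][1] and q[10]>20"
ops = {"and": "&&", "or": "||"}

# \b keeps the "or" inside "report" from being replaced
result = re.sub(r"\b(and|or)\b", lambda m: ops[m.group(1)], s)
print(result)
# q[k6.q3]>=0 && q[dist.report][0] || q[dist.report][1] && q[10]>20
```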
You need to add "?" in the right places to make your patterns reluctant:
val line = "q[k6.q3]>=0 and q[dist.report][0] or q[dist.report][1] and q[10]>20"
val regex = ".+\\s((and|or)+)\\s.+".r
regex.findAllIn(line).toList
//Produces a list with one item, the whole line, because both .+ are greedy:
//res0: List[String] = List(q[k6.q3]>=0 and q[dist.report][0] or q[dist.report][1] and q[10]>20)
Compared with:
val line = "q[k6.q3]>=0 and q[dist.report][0] or q[dist.report][1] and q[10]>20"
val regex = ".+?\\s((and|or)+)\\s.+?".r
regex.findAllIn(line).toList
//List with 3 items:
//res0: List[String] = List(q[k6.q3]>=0 and q, [dist.report][0] or q, [dist.report][1] and q)
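The difference between greedy and reluctant quantifiers can be reproduced in Python, whose re module uses the same .+ and .+? semantics for this pattern. A short sketch follows; the variable names are illustrative.

```python
import re

line = "q[k6.q3]>=0 and q[dist.report][0] or q[dist.report][1] and q[10]>20"

# Greedy: the first .+ runs to the last operator, the final .+ runs to the end
greedy = [m.group(0) for m in re.finditer(r".+\s((and|or)+)\s.+", line)]
# Reluctant: each match stops one character after the operator
lazy = [m.group(0) for m in re.finditer(r".+?\s((and|or)+)\s.+?", line)]

print(len(greedy))  # 1
print(lazy)
# ['q[k6.q3]>=0 and q', '[dist.report][0] or q', '[dist.report][1] and q']
```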

Validate mathematical expressions using regular expression?

I want to validate mathematical expressions using a regular expression. A mathematical expression can be as follows:
1. It can be blank, meaning nothing is entered.
2. If specified, it always starts with an operator (+, -, * or /), which is always followed by a number. The number can have any number of digits and can be a decimal (with a . between the digits) or an integer (no . in the number).
examples : *0.9 , +22.36 , - 90 , / 0.36365
3. It can then be followed by more of what is described in point 2.
examples : *0.9+5 , +22.36*4/56.33 , -90+87.25/22 , /0.36365/4+2.33
Please help me out.
Something like this should work:
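^([-+/*]\d+(\.\d+)?)*$
Regexr Demo
^ - beginning of the string
[-+/*] - one of these operators
\d+ - one or more digits
(\.\d+)? - an optional dot followed by one or more digits
()* - the whole expression repeated zero or more times
$ - end of the string (without it, the pattern also accepts invalid input, because zero repetitions match at the start of any string)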
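A quick way to try the operator-then-number pattern against the examples from the question is sketched below in Python. It uses fullmatch, which requires the whole string to match, so no explicit anchors are needed; the names EXPR and is_valid are illustrative. The pattern as written does not allow spaces, so an input such as "- 90" is rejected unless optional whitespace (\s*) is added.

```python
import re

# One operator followed by an integer or decimal, repeated zero or more times
EXPR = re.compile(r"(?:[-+/*]\d+(?:\.\d+)?)*")


def is_valid(text: str) -> bool:
    return EXPR.fullmatch(text) is not None


for sample in ["", "*0.9", "+22.36*4/56.33", "-90+87.25/22", "*0.9+", "5+3", "- 90"]:
    print(repr(sample), is_valid(sample))
```

The first four samples are accepted; "*0.9+" (trailing operator), "5+3" (no leading operator) and "- 90" (contains a space) are rejected.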
You could try generating such a regex using moo and such:
(?:(?:((?:(?:[ \t]+))))|(?:((?:(?:\/\/.*?$))))|(?:((?:(?:(?<![\d.])[0-9]+(?![\d.])))))|(?:((?:(?:[0-9]+\.(?:[0-9]+\b)?|\.[0-9]+))))|(?:((?:(?:(?:\+)))))|(?:((?:(?:(?:\-)))))|(?:((?:(?:(?:\*)))))|(?:((?:(?:(?:\/)))))|(?:((?:(?:(?:%)))))|(?:((?:(?:(?:\()))))|(?:((?:(?:(?:\)))))))
This regex matches any number of integers, floats, parentheses, whitespace characters, and the operators +-*/%.
However, expressions such as 2+ would still be validated by the regex, so you might want to use a parser instead.
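As an illustration of the parser approach, here is a minimal recursive-descent validator sketched in Python. It assumes ordinary infix expressions (numbers, + - * / %, unary sign, parentheses) rather than the operator-first format from the question, and it only checks structure without computing a value. All names (TOKEN, tokenize, is_valid_expression) are illustrative.

```python
import re

# A token is a number or a single operator/parenthesis, with optional leading spaces
TOKEN = re.compile(r"\s*(\d+\.\d*|\.\d+|\d+|[-+*/%()])")
SIGNS = {"+", "-"}
OPERATORS = {"+", "-", "*", "/", "%"}


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            return None  # character that is not part of any token
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def is_valid_expression(text):
    tokens = tokenize(text)
    if not tokens:
        return False  # empty input or unknown character
    pos = 0

    def operand():
        # operand := [sign] (number | "(" expression ")")
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in SIGNS:
            pos += 1
        if pos >= len(tokens):
            return False
        tok = tokens[pos]
        if tok == "(":
            pos += 1
            if not expression():
                return False
            if pos < len(tokens) and tokens[pos] == ")":
                pos += 1
                return True
            return False
        if tok[0].isdigit() or tok[0] == ".":
            pos += 1
            return True
        return False

    def expression():
        # expression := operand (operator operand)*
        nonlocal pos
        if not operand():
            return False
        while pos < len(tokens) and tokens[pos] in OPERATORS:
            pos += 1
            if not operand():
                return False
        return True

    return expression() and pos == len(tokens)


print(is_valid_expression("2+"))          # False
print(is_valid_expression("2+3*(4-1)"))   # True
```

Unlike a token-level regex, the validator rejects a trailing operator and unbalanced parentheses, because every operator must be followed by an operand and every "(" must be closed.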
If you want to allow a negative or positive expression, you can write it like this:
^\-?[0-9](([-+/*][0-9]+)?([.,][0-9]+)?)*?$
And a second one
^[(]?[-]?([0-9]+)[)]??([(]?([-+/*]([0-9]))?([.,][0-9]+)?[)]?)*$
The second one allows parentheses in the expression but does not check that they are balanced; for that you need a separate method (or another regex).
// the method
public static bool IsPairParenthesis(string matrixExpression)
{
    int numberOfParenthesis = 0;
    foreach (char character in matrixExpression)
    {
        if (character == '(')
        {
            numberOfParenthesis++;
        }
        if (character == ')')
        {
            numberOfParenthesis--;
            // A ')' with no matching '(' before it can never be balanced
            if (numberOfParenthesis < 0)
            {
                return false;
            }
        }
    }
    // Balanced only if every '(' was closed
    return numberOfParenthesis == 0;
}
This is a Java regex, but it only works if the expression has no parentheses:
[+\-]?(([0-9]+\.[0-9]+)|([0-9]+\.?)|(\.?[0-9]+))([+\-/*](([0-9]+\.[0-9]+)|([0-9]+\.?)|(\.?[0-9]+)))*
This Java code also handles parentheses.
In this case I replace each innermost (..) group with a number; the content of the group and the remaining expression should both match the pattern without parentheses.
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// without brace pattern
static Pattern numberPattern = Pattern.compile("[+\\-]?(([0-9]+\\.[0-9]+)|([0-9]+\\.?)|(\\.?[0-9]+))([+\\-/*](([0-9]+\\.[0-9]+)|([0-9]+\\.?)|(\\.?[0-9]+)))*");
// innermost parenthesized group: no nested parentheses inside
static Pattern bracePattern = Pattern.compile("\\([^()]+\\)");
public static boolean matchesForMath(String txt) {
    if (txt == null || txt.isEmpty()) return false;
    txt = txt.replaceAll("\\s+", "");
    if (!txt.contains("(") && !txt.contains(")")) return numberPattern.matcher(txt).matches();
    if (txt.contains("(") ^ txt.contains(")")) return false;
    if (txt.contains("()")) return false;
    Queue<String> toBeRematch = new ArrayDeque<>();
    toBeRematch.add(txt);
    while (toBeRematch.size() > 0) {
        String line = toBeRematch.poll();
        Matcher m = bracePattern.matcher(line);
        if (m.find()) {
            // Replace the innermost group with "1" and validate its content on its own
            String newline = line.substring(0, m.start()) + "1" + line.substring(m.end());
            String withoutBraces = line.substring(m.start() + 1, m.end() - 1);
            if (!numberPattern.matcher(withoutBraces).matches()) return false;
            toBeRematch.add(newline);
        } else if (!numberPattern.matcher(line).matches()) {
            // No group left: the fully reduced expression must itself be valid
            return false;
        }
    }
    return true;
}