How to select sub brackets from code using regex and substitute values - regex

Just a basic question about Python regex.
My code is this:
void main void()
{
abc
{
tune
}
{
wao
}
}
I want to replace abc, tune and wao with different values: abc=u, tune=p and wao=m.
I am unable to select the two sub-brackets of the code with my regex.
The regex I am using is \{([^}]+)\}.
I want to write code that substitutes these values so that all sub-brackets are substituted first (tune=p and wao=m) and the main bracket is substituted afterwards (abc=u).

If you do want to make use of regex, the following will work for the given example:
import re
code = """void main void()
{
abc
{
tune
}
{
wao
}
}
"""
code = re.sub(r"({\s*)[^{\s]+(\s*{)", r"\1u\2", code)
code = re.sub(r"({\s*)[^{\s]+(\s*})(?=\s*{)", r"\1p\2", code)
code = re.sub(r"({\s*)[^{\s]+(\s*})(?=\s*})", r"\1m\2", code)
print(code)
Output:
void main void()
{
u
{
p
}
{
m
}
}
In the general case, though, stateful parsing is recommended, as @wim suggests.
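A minimal sketch of what such stateful parsing could look like, assuming the values to substitute are plain words and the braces are balanced. The function name substitute_by_depth and the mapping argument are made up for this illustration and are not from the answer above:

```python
import re

def substitute_by_depth(code, mapping):
    """Replace words that sit inside at least one pair of braces.

    A depth counter tracks how many braces are currently open, so no
    regex has to match nested braces. (Hypothetical helper.)
    """
    out = []
    depth = 0
    # The capturing group keeps the braces as separate items in the split result
    for token in re.split(r"([{}])", code):
        if token == "{":
            depth += 1
        elif token == "}":
            depth -= 1
        elif depth > 0:
            # Inside a block: swap every word that has an entry in mapping
            token = re.sub(r"\w+", lambda m: mapping.get(m.group(0), m.group(0)), token)
        out.append(token)
    return "".join(out)

code = "void main void()\n{\nabc\n{\ntune\n}\n{\nwao\n}\n}\n"
print(substitute_by_depth(code, {"abc": "u", "tune": "p", "wao": "m"}))
```

Text outside any brace (the "void main void()" line) is left alone because the depth is still 0 there.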

Related

Why is my regex C++ expression not working?

I have the following regex expression: \\(([^)]+)\\) (ignore the doubled backslashes, they are only there because of C++ string escaping) and the following code:
if (in_str.find("(") != string::npos) {
print(to_string(countMatchInRegex(in_str, "\\([^ ]*\\.[^ ]*\\)")));
for (int i = 0; i < countMatchInRegex(in_str, "\\(([^)]+)\\)"); ++i) {
regex r("\\(([^)]+)\\)");
smatch m;
regex_search(in_str, m, r);
string obj = m.str();
obj = obj.substr(1, obj.length() - 2);
string property = obj.substr(obj.find(".") + 1);
obj = obj.substr(0, obj.find("."));
in_str = replace(in_str, m.str(), process_property(obj, property));
}
}
This code is supposed to find, in a string, substrings like the following: (something.somethingelse). It works fine, except that it only works two times, and I don't know why. I do know the problem is not the countMatchInRegex function itself, because I printed its result and it returns the correct number of substrings that match the regex in the string.
If anyone has any idea, please share it. I've been stuck on this for weeks.
I actually found the error: countMatchInRegex was returning different values across calls, presumably because the loop condition is re-evaluated on every iteration while in_str is being modified inside the loop. All I did was assign the count to a variable once and use that variable every time the function was called in the code.
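For comparison, here is a sketch of the same replace-every-match task in Python, where a single substitution pass with a callback avoids counting matches at all. The process_property below is only a placeholder for the function in the question, and the pattern is slightly stricter than the original (it requires a dot inside the parentheses):

```python
import re

def process_property(obj, prop):
    # Placeholder for the question's process_property (assumed behaviour)
    return f"<{obj}:{prop}>"

# "(obj.prop)": obj is everything before the first dot, prop everything after it
PROPERTY = re.compile(r"\(([^().]+)\.([^()]+)\)")

def expand_properties(text):
    """Replace every (obj.prop) substring in one pass over the input."""
    return PROPERTY.sub(lambda m: process_property(m.group(1), m.group(2)), text)

print(expand_properties("x (a.b) y (c.d.e) z"))
```

Because the substitution works on the original string in one pass, the number of matches never has to be recomputed while the string changes.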

Regex.IsMatch for only letters and numbers [duplicate]

How can I validate a string using Regular Expressions to only allow alphanumeric characters in it?
(I don't want to allow for any spaces either).
In .NET 4.0 you can use LINQ:
if (yourText.All(char.IsLetterOrDigit))
{
//just letters and digits.
}
yourText.All will stop executing and return false the first time char.IsLetterOrDigit reports false, since the contract of All cannot be fulfilled at that point.
Note! This answer does not strictly check for alphanumerics (which typically means A-Z, a-z and 0-9). It also allows local characters like åäö.
Update 2018-01-29
The syntax above only works when you use a single method that has a single argument of the correct type (in this case char).
To use multiple conditions, you need to write like this:
if (yourText.All(x => char.IsLetterOrDigit(x) || char.IsWhiteSpace(x)))
{
}
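The caveat about local characters is not specific to .NET. As a rough Python analogue (not the .NET API), str.isalnum also accepts letters such as åäö, and restricting to ASCII first narrows it down to A-Z, a-z and 0-9. The two function names are made up for this sketch:

```python
def is_alnum_unicode(text):
    # str.isalnum accepts any Unicode letter or digit, e.g. "åäö"
    return text.isalnum()

def is_alnum_ascii(text):
    # Requiring ASCII first leaves only A-Z, a-z and 0-9 as valid characters
    return text.isascii() and text.isalnum()

for sample in ["abc123", "åäö", "abc 123"]:
    print(sample, is_alnum_unicode(sample), is_alnum_ascii(sample))
```

Both functions reject the empty string and anything containing a space.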
Use the following expression:
^[a-zA-Z0-9]*$
ie:
using System.Text.RegularExpressions;
Regex r = new Regex("^[a-zA-Z0-9]*$");
if (r.IsMatch(SomeString)) {
...
}
You could do it easily with an extension function rather than a regex ...
public static bool IsAlphaNum(this string str)
{
if (string.IsNullOrEmpty(str))
return false;
for (int i = 0; i < str.Length; i++)
{
if (!(char.IsLetter(str[i])) && (!(char.IsNumber(str[i]))))
return false;
}
return true;
}
Per comment :) ...
public static bool IsAlphaNum(this string str)
{
if (string.IsNullOrEmpty(str))
return false;
return (str.ToCharArray().All(c => Char.IsLetter(c) || Char.IsNumber(c)));
}
While I think the regex-based solution is probably the way I'd go, I'd be tempted to encapsulate this in a type.
public class AlphaNumericString
{
public AlphaNumericString(string s)
{
Regex r = new Regex("^[a-zA-Z0-9]*$");
if (r.IsMatch(s))
{
value = s;
}
else
{
throw new ArgumentException("Only alphanumeric characters may be used");
}
}
private string value;
static public implicit operator string(AlphaNumericString s)
{
return s.value;
}
}
Now, when you need a validated string, you can have the method signature require an AlphaNumericString, and know that if you get one, it is valid (apart from nulls). If someone attempts to pass in a non-validated string, it will generate a compiler error.
You can get fancier and implement all of the equality operators, or an explicit cast to AlphaNumericString from plain ol' string, if you care.
I needed to check for A-Z, a-z, 0-9; without a regex (even though the OP asks for regex).
Blending various answers and comments here, and discussion from https://stackoverflow.com/a/9975693/292060, this tests for letter or digit, avoiding other language letters, and avoiding other numbers such as fraction characters.
if (!String.IsNullOrEmpty(testString)
&& testString.All(c => Char.IsLetterOrDigit(c) && (c < 128)))
{
// Alphanumeric.
}
^\w+$ will allow a-zA-Z0-9_
Use ^[a-zA-Z0-9]+$ to disallow underscore.
Note that both of these require the string not to be empty. Using * instead of + allows empty strings.
Same answer as here.
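The difference between these patterns can be checked quickly. This is a Python sketch rather than .NET code; re.ASCII is passed so that \w is limited to a-zA-Z0-9_ as described above, and re.fullmatch stands in for the ^...$ anchors:

```python
import re

def matches(pattern, text):
    """True if the whole of text matches pattern (ASCII-only \\w)."""
    return re.fullmatch(pattern, text, re.ASCII) is not None

print(matches(r"\w+", "user_1"))           # underscore allowed by \w
print(matches(r"[a-zA-Z0-9]+", "user_1"))  # underscore rejected
print(matches(r"[a-zA-Z0-9]+", ""))        # + requires at least one character
print(matches(r"[a-zA-Z0-9]*", ""))        # * accepts the empty string
```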
If you want a non-regex check for ASCII A-Z, a-z and 0-9, you cannot use char.IsLetterOrDigit() as that includes other Unicode characters.
What you can do is check the character code ranges.
48 -> 57 are numerics
65 -> 90 are capital letters
97 -> 122 are lower case letters
The following is a bit more verbose, but it's for ease of understanding rather than for code golf.
public static bool IsAsciiAlphaNumeric(this string str)
{
if (string.IsNullOrEmpty(str))
{
return false;
}
for (int i = 0; i < str.Length; i++)
{
if (str[i] < 48) // Numeric are 48 -> 57
{
return false;
}
if (str[i] > 57 && str[i] < 65) // Capitals are 65 -> 90
{
return false;
}
if (str[i] > 90 && str[i] < 97) // Lowers are 97 -> 122
{
return false;
}
if (str[i] > 122)
{
return false;
}
}
return true;
}
To check that the string contains both letters and digits, you can rewrite @jgauffin's answer as follows using .NET 4.0 and LINQ:
if(!string.IsNullOrWhiteSpace(yourText) &&
yourText.Any(char.IsLetter) && yourText.Any(char.IsDigit))
{
// do something here
}
Based on cletus's answer, you can create a new extension method.
public static class StringExtensions
{
public static bool IsAlphaNumeric(this string str)
{
if (string.IsNullOrEmpty(str))
return false;
Regex r = new Regex("^[a-zA-Z0-9]*$");
return r.IsMatch(str);
}
}
While there are many ways to skin this cat, I prefer to wrap such code into reusable extension methods that make it trivial to do going forward. When using extension methods, you can also avoid RegEx as it is slower than a direct character check. I like using the extensions in the Extensions.cs NuGet package. It makes this check as simple as:
Add the https://www.nuget.org/packages/Extensions.cs package to your project.
Add "using Extensions;" to the top of your code.
"smith23".IsAlphaNumeric() will return True whereas "smith 23".IsAlphaNumeric(false) will return False. By default the .IsAlphaNumeric() method ignores spaces, but this can be overridden as shown above. If you want to allow spaces so that "smith 23".IsAlphaNumeric() returns True, simply leave the argument at its default.
Every other check in the rest of the code is simply MyString.IsAlphaNumeric().
12 years and 7 months later, if anyone comes across this article nowadays.
Compiled RegEx actually has the best performance in .NET 5 and .NET 6
Please look at the following link where I compare several different answers given on this question. Mainly comparing Compiled RegEx, For-Loops, and Linq Predicates: https://dotnetfiddle.net/WOPQRT
Notes:
As stated, this method is only faster in .NET 5 and .NET 6.
.NET Core 3.1 and below show RegEx being the slowest.
Regardless of the version of .NET, the For-Loop method is consistently faster than the Linq Predicate.
I advise against depending on ready-made, built-in code in the .NET Framework; try to come up with your own solution. This is what I do:
public bool isAlphaNumeric(string N)
{
bool YesNumeric = false;
bool YesAlpha = false;
bool BothStatus = false;
for (int i = 0; i < N.Length; i++)
{
if (char.IsLetter(N[i]) )
YesAlpha=true;
if (char.IsNumber(N[i]))
YesNumeric = true;
}
if (YesAlpha==true && YesNumeric==true)
{
BothStatus = true;
}
else
{
BothStatus = false;
}
return BothStatus;
}

How can I detect various function blocks with braces "{}" using regex?

I need a regular expression to extract the following text block functions only.
Example:
// Comment 1. function example1() { return 1; } // Comment 2 function example2() { if (a < b) { a++ } } // Comment 3 function example3() { while (1) { i++; } } /* Comment 4 */ function example4() { i = 4; for (i = 1; i < 10; i++) { i++; } return i; }
Note that there are no line breaks. It is a single block of code.
I have tried using the following regular expression:
Expression:
function\s[a-z|A-Z|0-9_]+()\s?{(?:.+)\s}
But there is a problem: the .+ consumes all characters up to the end of the text block.
Thanks in advance guys for the help you can give me.
In PCRE (PHP, R, Delphi), you can achieve this with recursion:
function\s[a-zA-Z0-9_]+\(\)(\s?{(?>[^{}]|(?1))*})
See demo.
In Ruby, just use \g<1> instead of (?1):
function\s[a-zA-Z0-9_]+\(\)(\s?{(?>[^{}]|(\g<1>))*})
In .NET, you can match them using balanced groups:
function\s[a-zA-Z0-9_]+\(\)\s*{((?<sq>{)|(?<-sq>})|[^{}]*)+}
See demo 2
In other languages, there is no recursion, and you need to use a workaround by adding nested levels "manually". In your examples, you have 2 levels.
Thus, in Python, it will look like:
function\s[a-zA-Z0-9_]+\(\)(?:\s?{(?:[^{}]*(?:\s*{[^{}]*}[^{}]*)*})*)
See Python demo (also works in JavaScript).
In Java, you will need to escape {:
function\s[a-zA-Z0-9_]+\(\)(?:\s?\{(?:[^\{}]*(?:\s*\{[^{}]*}[^{}]*)*})*)
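To see the Python variant in action, here is a short sketch that runs the pattern from this answer over part of the sample text from the question. The names FUNCTION_BLOCK and extract_functions are made up for the illustration:

```python
import re

# Pattern copied from the Python variant above; it handles at most two levels of braces
FUNCTION_BLOCK = re.compile(
    r"function\s[a-zA-Z0-9_]+\(\)(?:\s?{(?:[^{}]*(?:\s*{[^{}]*}[^{}]*)*})*)"
)

def extract_functions(source):
    """Return every function block found in source as a list of strings."""
    # The pattern has no capturing groups, so findall yields the full matches
    return FUNCTION_BLOCK.findall(source)

sample = (
    "// Comment 1. function example1() { return 1; } "
    "// Comment 2 function example2() { if (a < b) { a++ } }"
)
for block in extract_functions(sample):
    print(block)
```

A function body with three or more levels of nesting would not be matched completely; the pattern would need one more manually added level for that.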

How to Simplify C++ Boolean Comparisons

I'm trying to find a way to simplify the comparison cases of booleans. Currently, there are only three (as shown below), but I'm about to add a 4th option and this is getting very tedious.
bracketFirstIndex = message.indexOf('[');
mentionFirstIndex = message.indexOf('@');
urlFirstIndex = message.indexOf(urlStarter);
bool startsWithBracket = (bracketFirstIndex != -1);
bool startsWithAtSymbol = (mentionFirstIndex != -1);
bool startsWithUrl = (urlFirstIndex != -1);
if (!startsWithBracket)
{
if (!startsWithAtSymbol)
{
if (!startsWithUrl)
{
// No brackets, mentions, or urls. Send message as normal
cursor.insertText(message);
break;
}
else
{
// There's a URL, lets begin!
index = urlFirstIndex;
}
}
else
{
if (!startsWithUrl)
{
// There's an @ symbol, lets begin!
index = mentionFirstIndex;
}
else
{
// There's both an @ symbol and URL, pick the first one... lets begin!
index = std::min(urlFirstIndex, mentionFirstIndex);
}
}
}
else
{
if (!startsWithAtSymbol)
{
// There's a [, look down!
index = bracketFirstIndex;
}
else
{
// There's both a [ and @, pick the first one... look down!
index = std::min(bracketFirstIndex, mentionFirstIndex);
}
if (startsWithUrl)
{
// If there's a URL, pick the first one... then lets begin!
// Otherwise, just "lets begin!"
index = std::min(index, urlFirstIndex);
}
}
Is there a better or simpler way to compare several boolean values, or am I stuck with this format, so that I should just try to squeeze the 4th option into the appropriate locations?
Some types of text processing are fairly common, and for those, you should strongly consider using an existing library. For example, if the text you are processing uses the markdown syntax, consider using an existing library to parse the markdown into a structured format for you to interpret.
If this is completely custom parsing, then there are a few options:
For very simple text processing (like a single string expected to be in one of a few formats or containing a piece of subtext in an expected format), use regular expressions. In C++, the RE2 library provides very powerful support for matching and extracting using regexes.
For more complicated text processing, such as data spanning many lines or having a wide variety of content / syntax, consider using an existing lexer and parser generator. Flex and Bison are common tools (used together) to auto-generate logic for parsing text according to a grammar.
You can, by hand, as you are doing now, write your own parsing logic.
If you go with the latter approach, there are a few ways to simplify things:
Separate the "lexing" (breaking up the input into tokens) and "parsing" (interpreting the series of tokens) into separate phases.
Define a "Token" class and a corresponding hierarchy representing the types of symbols that can appear within your grammar (like RawText, Keyword, AtMention, etc.)
Create one or more enums representing the states that your parsing logic can be in.
Implement your lexing and parsing logic as a state machine that transforms the state given the current state and the next token or letter. Building up a map from (state, token type) to next_state or from (state, token type) to handler_function can help you to simplify the structure.
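A rough sketch of the lexing/handling split with a token-type to handler map, written in Python for brevity. It uses a single implicit state, so it is simpler than a full state machine. The token patterns, handler outputs and names (TokenType, lex, render) are assumptions for the illustration, not part of the question's code:

```python
import re
from enum import Enum, auto

class TokenType(Enum):
    RAW_TEXT = auto()
    AT_MENTION = auto()
    BRACKET = auto()
    URL = auto()

# Hypothetical token patterns, tried in this order at each position
TOKEN_PATTERNS = [
    (TokenType.URL, re.compile(r"https?://\S+")),
    (TokenType.AT_MENTION, re.compile(r"@\w+")),
    (TokenType.BRACKET, re.compile(r"\[[^\]]*\]")),
]

def lex(message):
    """Lexing phase: split the message into (TokenType, text) pairs."""
    tokens, pos, raw_start = [], 0, 0
    while pos < len(message):
        for token_type, pattern in TOKEN_PATTERNS:
            match = pattern.match(message, pos)
            if match:
                if raw_start < pos:
                    # Flush the plain text collected before this token
                    tokens.append((TokenType.RAW_TEXT, message[raw_start:pos]))
                tokens.append((token_type, match.group(0)))
                pos = raw_start = match.end()
                break
        else:
            pos += 1  # no special token starts here; keep scanning
    if raw_start < len(message):
        tokens.append((TokenType.RAW_TEXT, message[raw_start:]))
    return tokens

# Handling phase: one handler per token type replaces the nested if/else chain
HANDLERS = {
    TokenType.RAW_TEXT: lambda text: text,
    TokenType.AT_MENTION: lambda text: f"<mention {text[1:]}>",
    TokenType.BRACKET: lambda text: f"<tag {text[1:-1]}>",
    TokenType.URL: lambda text: f"<link {text}>",
}

def render(message):
    return "".join(HANDLERS[kind](text) for kind, text in lex(message))

print(render("hi @bob see [x] http://a.b now"))
```

Adding a 4th kind of marker then means adding one pattern and one handler rather than another level of nesting.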
Since you are switching only on the starting letter, use cases:
enum State { Start1, Start2, Start3, Start4};
State state;
if (startswithbracket) {
state = Start1;
} else {
.
.
.
}
switch (state) {
case Start1:
dosomething;
break;
case Start2:
.
.
.
}
More information about switch syntax and use cases can be found here.

Ignore ending character if a other character was found (find the scope in code)

I'm currently learning how to use regex in C++ for more advanced stuff like creating my own coding language (just for the fun of it), and I'm having a bit of a problem with reading out the scope of the code.
I'm currently using this text to test my regex against:
private char test;
public static int foo = 0;
private int var = 0;
private void run(char data, int add)
{
var += 50 + add;
print (var + test, asdm, asf.getString());
if (var == 70) {
print("yes");
}
}
And the regex I'm using to read out the scope of run and the if-statement within it is,
\{([\S\s]*?)\}
This is what it matches at the moment:
{
var += 50 + add;
print (var + test, asdm, asf.getString());
if (var == 70) {
print("yes");
}
The problem is that it finds a } before the actual } it needs. I'd like to know if there is a way around this so that each individual scope is captured as its own part.
I've tried messing around with lookbehinds, but I keep getting the error 'The expression contained mismatched ( and )' even though it should be valid regex. Edit: I found out that the default C++ regex implementation (std::regex) does not support lookbehind, which is why it didn't work.
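One way to get each scope as its own part without relying on regex features is to pair the braces with a stack. The following is only a sketch in Python (the question is about C++), it assumes braces never appear inside string literals or comments, and find_scopes is a made-up name:

```python
def find_scopes(source):
    """Return (start, end, depth) for every {...} pair, innermost scopes first.

    start/end are slice indices into source; depth 0 is an outermost scope.
    """
    stack, scopes = [], []
    for index, char in enumerate(source):
        if char == "{":
            stack.append(index)  # remember where this scope opened
        elif char == "}" and stack:
            start = stack.pop()  # the most recent unmatched "{"
            scopes.append((start, index + 1, len(stack)))
    return scopes

source = 'run() { var += 50; if (var == 70) { print("yes"); } }'
for start, end, depth in find_scopes(source):
    print(depth, source[start:end])
```

Each closing brace is matched with the most recently opened one, so the inner if-block and the outer function body come out as separate spans.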