I need a regular expression to extract the following text block functions only.
Example:
// Comment 1. function example1() { return 1; } // Comment 2 function example2() { if (a < b) { a++ } } // Comment 3 function example3() { while (1) { i++; } } /* Comment 4 */ function example4() { i = 4; for (i = 1; i < 10; i++) { i++; } return i; }
Take into account that no line breaks. It is a single block of code.
I have tried using the following regular expression:
Expression:
function\s[a-z|A-Z|0-9_]+()\s?{(?:.+)\s}
But there is a problem, place the .+ , take me all characters to the end of the text block.
Thanks in advance guys for the help you can give me.
In PCRE (PHP, R, Delphi), you can achieve this with recursion:
function\s[a-zA-Z0-9_]+\(\)(\s?{(?>[^{}]|(?1))*})
See demo.
In Ryby, just use \g<1> instead of (?1):
function\s[a-zA-Z0-9_]+\(\)(\s?{(?>[^{}]|(\g<1>))*})
In .NET, you can match them using balanced groups:
function\s[a-zA-Z0-9_]+\(\)\s*{((?<sq>{)|(?<-sq>})|[^{}]*)+}
See demo 2
In other languages, there is no recursion, and you need to use a workaround by adding nested levels "manually". In your examples, you have 2 levels.
Thus, in Python, it will look like:
function\s[a-zA-Z0-9_]+\(\)(?:\s?{(?:[^{}]*(?:\s*{[^{}]*}[^{}]*)*})*)
See Python demo (also works in JavaScript
In Java, you will need to escape {:
function\s[a-zA-Z0-9_]+\(\)(?:\s?\{(?:[^\{}]*(?:\s*\{[^{}]*}[^{}]*)*})*)
Related
I have the following regex expression: \\(([^)]+)\\) (don't take into account the double brackets it's because of C++) and the following code:
if (in_str.find("(") != string::npos) {
print(to_string(countMatchInRegex(in_str, "\\([^ ]*\\.[^ ]*\\)")));
for (int i = 0; i < countMatchInRegex(in_str, "\\(([^)]+)\\)"); ++i) {
regex r("\\(([^)]+)\\)");
smatch m;
regex_search(in_str, m, r);
string obj = m.str();
obj = obj.substr(1, obj.length() - 2);
string property = obj.substr(obj.find(".") + 1);
obj = obj.substr(0, obj.find("."));
in_str = replace(in_str, m.str(), process_property(obj, property));
}
}
This code is supposed to find, in a string, substrings like the following: (something.somethingelse). It works fine, except it only works two times...but i don't know how. I do know the problem is not the countMatchInRegex function, because I've printed what it results the correct number of substrings that match the regex expression in the string.
If anyone has any idea, please share them, I've been stuck on this for weeks..
I actually found the error, I don’t know why but C++ was returning different values for the two countMatchInRegex so all I did was assign it to a variable and use the variable instead every thing the function was called in the code.
How can I validate a string using Regular Expressions to only allow alphanumeric characters in it?
(I don't want to allow for any spaces either).
In .NET 4.0 you can use LINQ:
if (yourText.All(char.IsLetterOrDigit))
{
//just letters and digits.
}
yourText.All will stop execute and return false the first time char.IsLetterOrDigit reports false since the contract of All cannot be fulfilled then.
Note! this answer do not strictly check alphanumerics (which typically is A-Z, a-z and 0-9). This answer allows local characters like åäö.
Update 2018-01-29
The syntax above only works when you use a single method that has a single argument of the correct type (in this case char).
To use multiple conditions, you need to write like this:
if (yourText.All(x => char.IsLetterOrDigit(x) || char.IsWhiteSpace(x)))
{
}
Use the following expression:
^[a-zA-Z0-9]*$
ie:
using System.Text.RegularExpressions;
Regex r = new Regex("^[a-zA-Z0-9]*$");
if (r.IsMatch(SomeString)) {
...
}
You could do it easily with an extension function rather than a regex ...
public static bool IsAlphaNum(this string str)
{
if (string.IsNullOrEmpty(str))
return false;
for (int i = 0; i < str.Length; i++)
{
if (!(char.IsLetter(str[i])) && (!(char.IsNumber(str[i]))))
return false;
}
return true;
}
Per comment :) ...
public static bool IsAlphaNum(this string str)
{
if (string.IsNullOrEmpty(str))
return false;
return (str.ToCharArray().All(c => Char.IsLetter(c) || Char.IsNumber(c)));
}
While I think the regex-based solution is probably the way I'd go, I'd be tempted to encapsulate this in a type.
public class AlphaNumericString
{
public AlphaNumericString(string s)
{
Regex r = new Regex("^[a-zA-Z0-9]*$");
if (r.IsMatch(s))
{
value = s;
}
else
{
throw new ArgumentException("Only alphanumeric characters may be used");
}
}
private string value;
static public implicit operator string(AlphaNumericString s)
{
return s.value;
}
}
Now, when you need a validated string, you can have the method signature require an AlphaNumericString, and know that if you get one, it is valid (apart from nulls). If someone attempts to pass in a non-validated string, it will generate a compiler error.
You can get fancier and implement all of the equality operators, or an explicit cast to AlphaNumericString from plain ol' string, if you care.
I needed to check for A-Z, a-z, 0-9; without a regex (even though the OP asks for regex).
Blending various answers and comments here, and discussion from https://stackoverflow.com/a/9975693/292060, this tests for letter or digit, avoiding other language letters, and avoiding other numbers such as fraction characters.
if (!String.IsNullOrEmpty(testString)
&& testString.All(c => Char.IsLetterOrDigit(c) && (c < 128)))
{
// Alphanumeric.
}
^\w+$ will allow a-zA-Z0-9_
Use ^[a-zA-Z0-9]+$ to disallow underscore.
Note that both of these require the string not to be empty. Using * instead of + allows empty strings.
Same answer as here.
If you want a non-regex ASCII A-z 0-9 check, you cannot use char.IsLetterOrDigit() as that includes other Unicode characters.
What you can do is check the character code ranges.
48 -> 57 are numerics
65 -> 90 are capital letters
97 -> 122 are lower case letters
The following is a bit more verbose, but it's for ease of understanding rather than for code golf.
public static bool IsAsciiAlphaNumeric(this string str)
{
if (string.IsNullOrEmpty(str))
{
return false;
}
for (int i = 0; i < str.Length; i++)
{
if (str[i] < 48) // Numeric are 48 -> 57
{
return false;
}
if (str[i] > 57 && str[i] < 65) // Capitals are 65 -> 90
{
return false;
}
if (str[i] > 90 && str[i] < 97) // Lowers are 97 -> 122
{
return false;
}
if (str[i] > 122)
{
return false;
}
}
return true;
}
In order to check if the string is both a combination of letters and digits, you can re-write #jgauffin answer as follows using .NET 4.0 and LINQ:
if(!string.IsNullOrWhiteSpace(yourText) &&
yourText.Any(char.IsLetter) && yourText.Any(char.IsDigit))
{
// do something here
}
Based on cletus's answer you may create new extension.
public static class StringExtensions
{
public static bool IsAlphaNumeric(this string str)
{
if (string.IsNullOrEmpty(str))
return false;
Regex r = new Regex("^[a-zA-Z0-9]*$");
return r.IsMatch(str);
}
}
While there are many ways to skin this cat, I prefer to wrap such code into reusable extension methods that make it trivial to do going forward. When using extension methods, you can also avoid RegEx as it is slower than a direct character check. I like using the extensions in the Extensions.cs NuGet package. It makes this check as simple as:
Add the https://www.nuget.org/packages/Extensions.cs package to your project.
Add "using Extensions;" to the top of your code.
"smith23".IsAlphaNumeric() will return True whereas "smith 23".IsAlphaNumeric(false) will return False. By default the .IsAlphaNumeric() method ignores spaces, but it can also be overridden as shown above. If you want to allow spaces such that "smith 23".IsAlphaNumeric() will return True, simple default the arg.
Every other check in the rest of the code is simply MyString.IsAlphaNumeric().
12 years and 7 months later, if anyone comes across this article nowadays.
Compiled RegEx actually has the best performance in .NET 5 and .NET 6
Please look at the following link where I compare several different answers given on this question. Mainly comparing Compiled RegEx, For-Loops, and Linq Predicates: https://dotnetfiddle.net/WOPQRT
Notes:
As stated, this method is only faster in .NET 5 and .NET 6.
.NET Core 3.1 and below show RegEx being the slowest.
Regardless of the version of .NET, the For-Loop method is consistently faster than the Linq Predicate.
I advise to not depend on ready made and built in code in .NET framework , try to bring up new solution ..this is what i do..
public bool isAlphaNumeric(string N)
{
bool YesNumeric = false;
bool YesAlpha = false;
bool BothStatus = false;
for (int i = 0; i < N.Length; i++)
{
if (char.IsLetter(N[i]) )
YesAlpha=true;
if (char.IsNumber(N[i]))
YesNumeric = true;
}
if (YesAlpha==true && YesNumeric==true)
{
BothStatus = true;
}
else
{
BothStatus = false;
}
return BothStatus;
}
Just a basic question related to python regex.
My code is this
void main void()
{
abc
{
tune
}
{
wao
}
}
i want to replace the abc ,tune and wao with different values like abc=u, tune=p and wao=m.
I am unable to select these two sub brackets of code in my regex.
The Regex which i am using is \{([^}]+)\}.
I want to write a code which can substitute these values in such a way that first all sub brackets were substituted i.e (tune=p and wao=m) then main brackets are substituted (abc=u).
If you do want to make use of regex, the following will work for the given example:
import re
code = """void main void()
{
abc
{
tune
}
{
wao
}
}
"""
code = re.sub(r"({\s*)[^{\s]+(\s*{)", r"\1u\2", code)
code = re.sub(r"({\s*)[^{\s]+(\s*})(?=\s*{)", r"\1p\2", code)
code = re.sub(r"({\s*)[^{\s]+(\s*})(?=\s*})", r"\1m\2", code)
print(code)
Output:
void main void()
{
u
{
p
}
{
m
}
}
But in general cases it is recommended to do a stateful parsing as #wim suggests.
I am copying code of file 1 to file 2 , but i want the code in file 2 to look adjusted with indentation like this: at the beginning indentation=0, every curly bracket opened increases the depth of indentation, every curly bracket closed reduces the indentation 4 spaces for example. I need help in fixing this to work
char preCh;
int depth=0;
int tab = 3;
int d = 0;
int pos = 0;
file1.get(ch);
while(!file1.eof())
{
if(ch=='{')
{
d++;
}
if(ch=='}'){
d--;
}
depth = tab * d;
if(preCh == '{' && ch=='\n'){
file2.put(ch);
for (int i = 0; i <= depth; i++)
{
file2.put(' ');
}
}
else
file2.put(ch);
preCh = ch;
ch = file1.get();
}
}
result must be indented like in code editors:
int main(){
if(a>0)
{
something();
}
}
Maybe, unexpectedly for you, there is no easy answer to your question.
And because of that, your code will never work.
First and most important, you need to understand and define indentation styles. Please see here in Wikipedia. Even in your given mini example, you are mixing Allman and K&R. So, first you must be clear, what to use.
Then, you must be aware that brackets may appear in quotes, double quotes, C-Comments, C++ comments and even worse, multi line comments (and #if or #idefs). This will make life really hard.
And, for the closing brackets, and for example Allman style, you will know the needed indentation only after you printed already the "indentation spaces". So you need to work line oriented or use a line buffers, before you print a complete line.
Example:
}
In this one simple line, you will read the '}' character, after you have already printed the spaces. This will always lead to wrong (too far right) indentation.
The logic for only this case would be complicated. Ant then assume statements like
if (x ==5) { y = 3; } } } }
So, unfortunately I cannot give you an easy solution.
A parser would be needed, or I simply recommend any kinde of beautifier or pretty printer
I need to match a structure against set of patterns and take some action for each match.
Patterns should support wildcards and i need to determine which patterns is matching incoming structure, example set:
action=new_user email=*
action=del_user email=*
action=* email=*#gmail.com
action=new_user email=*#hotmail.com
Those patterns can be added/removed at realtime. There can be thousands connections, each have its own pattern and i need to notify each connection about I have received A structure which is matching. Patterns are not fully regex, i just need to match a string with wildcards * (which simple match any number of characters).
When server receives message (lets call it message A) with structure action=new_user email=testuser#gmail.com and i need to find out that patterns 1 and 3 are matching this message, then i should perform action for each pattern that match (send this structure A to corresponding connection).
How this can be done with most effecient way? I can iterate this patterns and check one-by-one, but im looking for more effecient and thread-safe way to do this. Probably its possible to group those patterns to reduce checking.. Any suggestions how this can be done?
UPD: Please note i want match multiplie patterns(thousands) aganst fixed "string"(actually a struct), not vice versa. In other words, i want to find which patterns are fitting into given structure A.
Convert the patterns to regular expressions, and match them using RE2, which is written in C++ and is one of the fastest.
Actually, if I understood correctly, the fourth pattern is redundant, since the first pattern is more general, and includes every string that is matched by the fourth. That leaves only 3 patterns, which can be easly checked by this function:
bool matches(const char* name, const char* email)
{
return strstr(name, "new_user") || strstr(name, "del_user") || strstr(email, "#gmail.com");
}
And if you prefer to parse whole string, not just match the values of action and email, then the following function should do the trick:
bool matches2(const char* str)
{
bool match = strstr(str, "action=new_user ") || strstr(str, "action=del_user ");
if (!match)
{
const char* emailPtr = strstr(str, "email=");
if (emailPtr)
{
match = strstr(emailPtr, "#gmail.com");
}
}
return match;
}
Note that the strings you put as arguments must be escaped with \0. You can read about strstr function here.
This strglobmatch supports * and ? only.
#include <string.h> /* memcmp, index */
char* strfixstr(char *s1, char *needle, int needle_len) {
int l1;
if (!needle_len) return (char *) s1;
if (needle_len==1) return index(s1, needle[0]);
l1 = strlen(s1);
while (l1 >= needle_len) {
l1--;
if (0==memcmp(s1,needle,needle_len)) return (char *) s1;
s1++;
}
return 0;
}
int strglobmatch(char *str, char *glob) {
/* Test: strglobmatch("almamxyz","?lmam*??") */
int min;
while (glob[0]!='\0') {
if (glob[0]!='*') {
if ((glob[0]=='?') ? (str[0]=='\0') : (str[0]!=glob[0])) return 0;
glob++; str++;
} else { /* a greedy search is adequate here */
min=0;
while (glob[0]=='*' || glob[0]=='?') min+= *glob++=='?';
while (min--!=0) if (*str++=='\0') return 0;
min=0; while (glob[0]!='*' && glob[0]!='?' && glob[0]!='\0') { glob++; min++; }
if (min==0) return 1; /* glob ends with star */
if (!(str=strfixstr(str, glob-min, min))) return 0;
str+=min;
}
}
return str[0]=='\0';
}
If all you want is wildcart matching, then you might try this algorithm. The point is to check all substrings that is not a wildcart to be subsequent in a string.
patterns = ["*#gmail.com", "akalenuk#*", "a*a#*", "ak*#gmail.*", "ak*#hotmail.*", "*#*.ua"]
string = "akalenuk#gmail.com"
preprocessed_patterns = [p.split('*') for p in patterns]
def match(s, pp):
i = 0
for w in pp:
wi = s.find(w, i)
if wi == -1:
return False
i = wi+len(w)
return i == len(s) or pp[-1] == ''
print [match(string, pp) for pp in preprocessed_patterns]
But it might be best to still use regexp in case you would need something more than a wildcart in a future.