Using a var in regex [duplicate] - regex

Is there a way to escape the special characters in regex, such as []()* and others, from a string?
Basically, I'm asking the user to input a string, and I want to be able to search the database using regex. Some of the issues I ran into are errors such as too many )'s or an [x-y] range in reverse order.
So what I want to do is write a function that does a replace on the user input, for example replacing ( with \( and [ with \[.
Is there a built-in function for regex to do so? And if I have to write a function from scratch, is there a way to account for all special characters easily instead of writing the replace statements one by one?
I'm writing my program in C# using Visual Studio 2010.

You can use .NET's built-in Regex.Escape for this. Copied from Microsoft's example:
string pattern = Regex.Escape("[") + "(.*?)]";
string input = "The animal [what kind?] was visible [by whom?] from the window.";
MatchCollection matches = Regex.Matches(input, pattern);
int commentNumber = 0;
Console.WriteLine("{0} produces the following matches:", pattern);
foreach (Match match in matches)
    Console.WriteLine(" {0}: {1}", ++commentNumber, match.Value);
// This example displays the following output:
// \[(.*?)] produces the following matches:
// 1: [what kind?]
// 2: [by whom?]

You can use Regex.Escape on the user's input:

string matches = "[]()*";
StringBuilder sMatches = new StringBuilder();
StringBuilder regexPattern = new StringBuilder();
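for (int i = 0; i < matches.Length; i++)
    sMatches.Append(Regex.Escape(matches[i].ToString()));
// Regex.Escape does not escape ']', so escape it by hand before placing it inside a character class.
regexPattern.AppendFormat("[{0}]+", sMatches.ToString().Replace("]", @"\]"));
Regex regex = new Regex(regexPattern.ToString());
// Declare the loop variable as Match: on older .NET versions MatchCollection enumerates as object.
foreach (Match m in regex.Matches("ADBSDFS[]()*asdfad"))
    Console.WriteLine("Found: " + m.Value);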
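For comparison, here is a minimal sketch of the same idea in Java, the other language used on this page. It assumes Pattern.quote as the counterpart of Regex.Escape; the class and method names are made up for illustration and do not come from the question.

```java
import java.util.regex.Pattern;

class EscapeUserInput {
    // Returns true if `text` contains `userInput` as literal text, ignoring case.
    static boolean containsLiteral(String text, String userInput) {
        // Pattern.quote makes characters such as ( [ * lose their special meaning.
        Pattern p = Pattern.compile(Pattern.quote(userInput), Pattern.CASE_INSENSITIVE);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        // Unbalanced parentheses or reversed ranges no longer cause a syntax error.
        System.out.println(containsLiteral("price (USD) [z-a]*", "(usd) [z-a]*"));
        System.out.println(containsLiteral("price USD", "(USD"));
    }
}
```

Quoting the whole input treats it as one literal string. If the user is meant to type some regex syntax on purpose, this approach is too strict.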

Related

Match longest substring with regex [duplicate]

I tried looking for an answer to this question but couldn't find anything, and I hope there's an easy solution. I am using the following code in C#:
String pattern = ("(hello|hello world)");
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
var matches = regex.Matches("hello world");
The question is: is there a way for the Matches method to return the longest pattern first? In this case, I want to get "hello world" as my match as opposed to just "hello". This is just an example; my pattern list consists of a decent number of words.
If you already know the lengths of the words beforehand, then put the longest first. For example:
String pattern = ("(hello world|hello)");
The engine tries the alternatives in the order written, so the longest will be matched first. If you don't know the lengths when you write the pattern, you cannot order it by hand.
An alternative approach would be to store all the matches in an array/hash/list and pick the longest one manually, using the language's built-in functions.
A regular expression engine tries alternatives from left to right. If you want to make sure you get the longest possible match first, you'll need to change the order of your alternatives. The leftmost alternative is tried first. If it matches, the engine attempts to match the rest of the pattern against the rest of the string; the next alternative is tried only if no match can be found.
String pattern = ("(hello world|hello wor|hello)");
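When the word list is only known at run time, the ordering can still be done in code before the pattern is built. The following is a minimal Java sketch of that idea, not something taken from the answers above; the class name LongestFirst and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class LongestFirst {
    // Builds an alternation with the longest words first, so the engine tries them first.
    static Pattern build(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));
        String alternation = sorted.stream()
                .map(Pattern::quote) // treat each word as literal text
                .collect(Collectors.joining("|"));
        return Pattern.compile(alternation, Pattern.CASE_INSENSITIVE);
    }

    // Returns the first match in `input`, or null if none of the words occurs.
    static String firstMatch(List<String> words, String input) {
        Matcher m = build(words).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        List<String> words = List.of("hello", "hello world");
        System.out.println(firstMatch(words, "hello world"));
    }
}
```

Sorting by length only guarantees the longest alternative wins at a given starting position; the engine still reports the leftmost match in the input.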
Make two different regex matches. The first will match your longer option, and if that does not work, the second will match your shorter option.
string input = "hello world";
string patternFull = "hello world";
Regex regexFull = new Regex(patternFull, RegexOptions.IgnoreCase);
var matches = regexFull.Matches(input);
if (matches.Count == 0)
{
    string patternShort = "hello";
    Regex regexShort = new Regex(patternShort, RegexOptions.IgnoreCase);
    matches = regexShort.Matches(input);
}
At the end, matches will be the output of "full" or "short", but "full" is checked first and short-circuits the second check if it matches.
You can wrap the logic in a function if you plan on calling it many times. This is something I came up with (but there are plenty of other ways you can do this).
public bool HasRegexMatchInOrder(string input, params string[] patterns)
{
    foreach (var pattern in patterns)
    {
        Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
        if (regex.IsMatch(input))
        {
            return true;
        }
    }
    return false;
}
string input = "hello world";
bool hasAMatch = HasRegexMatchInOrder(input, "hello world", "hello", ...);

Google sheet : REGEXREPLACE match everything except a particular pattern

I would like to replace everything inside this string:
[JGMORGAN - BANK2] n° 10 NEWYORK, n° 222 CAEN, MONTELLIER, VANNES / TARARTA TIs
1303222074, 1403281851 & 1307239335 et Cloture TIs 1403277567,
1410315029
Except the following numbers :
1303222074
1403281851
1307239335
1403277567
1410315029
I have built a REGEX to match them :
1[0-9]{9}
But I have not figured out how to do the opposite, that is, match everything except those numbers.
Google Sheets uses the RE2 regex engine, which doesn't support many useful features (lookarounds, for example) that would help here. A basic workaround is to
match what you want to preserve first and capture it:
pattern: [0-9]*(?:[0-9]{0,9}[^0-9]+)*(?:([0-9]{9,})|[0-9]*\z)
replacement: $1 (with a space after)
So probably something like this:
=TRIM(REGEXREPLACE("[JGMORGAN - BANK2] n° 10 NEWYORK, n° 222 CAEN, MONTELLIER, VANNES / TARARTA TIs 1303222074, 1403281851 & 1307239335 et Cloture TIs 1403277567, 1410315029"; "[0-9]*(?:[0-9]{0,9}[^0-9]+)*(?:([0-9]{9,})|[0-9]*\z)"; "$1 "))
You can also do this with dynamic native functions:
=REGEXEXTRACT(A1,rept("(\d{10}).*",counta(split(regexreplace(A1,"\d{10}","#"),"#"))-1))
Basically, the string is first split on the desired pattern to count how many occurrences there are, and then the regex is repeated that many times to create one capture group per occurrence, which leaves you with only those values.
First of all, thank you Casimir for your help. It gave me an idea that would not be possible with only built-in functions and a strong regex.
I found out that I can write a custom function for my own purposes (yes, I'm not very "up to date").
It's not very well coded and it returns duplicates. Rather than fixing it properly, I use the built-in UNIQUE() function on top of it to get rid of them; it's ugly and I'm lazy, but it does the job, that is, it returns a list of all matches of one specific regex (which is: 1[0-9]{9}). Here it is:
function ti_extract(input) {
  var tab_tis = new Array();
  var tab_strings = new Array();
  tab_tis.push(input.match(/1[0-9]{9}/)); // get the TI and insert in tab_tis
  var string_modif = input.replace(tab_tis[0], " "); // modify source string (remove everything except the TI)
  tab_strings.push(string_modif); // insert this new string in the table
  var v = 0;
  var patt = new RegExp(/1[0-9]{9}/);
  var fin = patt.test(tab_strings[v]);
  var first_string = tab_strings[v];
  do {
    first_string = tab_strings[v]; // string 0, or the string with the first removed TI
    tab_tis.push(first_string.match(/1[0-9]{9}/)); // analyze the string and get the new TI to put it in the table
    var string_modif2 = first_string.replace(tab_tis[v], " "); // modify the string again to remove the new TI from the old string
    tab_strings.push(string_modif2);
    v += 1;
  } while (v < 15);
  return tab_tis;
}
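The extraction step on its own (collect every match of 1[0-9]{9}, dropping duplicates) can be sketched more directly with a find loop. This is written in Java rather than Apps Script and is only an illustration of the technique; the class name TiExtract is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TiExtract {
    // Same pattern as in the question: a 1 followed by nine more digits.
    private static final Pattern TI = Pattern.compile("1[0-9]{9}");

    // Returns every distinct match in order of first appearance.
    static List<String> extract(String input) {
        Set<String> seen = new LinkedHashSet<>(); // keeps insertion order, drops repeats
        Matcher m = TI.matcher(input);
        while (m.find()) {
            seen.add(m.group());
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        String s = "TIs 1303222074, 1403281851 & 1307239335 et Cloture TIs 1403277567, 1410315029";
        for (String ti : extract(s)) {
            System.out.println(ti);
        }
    }
}
```

Because the matcher advances past each match, no fixed iteration count is needed and nothing has to be removed from the source string.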

Using RegEx split the string

I have a string like '[1]-[2]-[3],[4]-[5],[6,7,8],[9]' or '[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]'. I'd like a pattern that produces the list shown below, but I can't figure it out. Basically the comma is the separator, but [6,7,8] itself contains commas as well.
the string: [1]-[2]-[3],[4]-[5],[6,7,8],[9]
the result:
[1]-[2]-[3]
[4]-[5]
[6,7,8]
[9]
or
the string: [Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]
the result:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
,(?=\[)
This pattern splits on any comma that is followed by a bracket, but keeps the bracket within the result text.
The (?=stuff) construct is known as a "lookahead assertion". It acts as a condition for the match but is not itself part of the match.
In C# code:
String inputstring = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
foreach (String s in Regex.Split(inputstring, @",(?=\[)"))
    System.Console.Out.WriteLine(s);
In Java code:
String inputstring = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
Pattern p = Pattern.compile(",(?=\\[)");
for (String s : p.split(inputstring))
    System.out.println(s);
Either produces:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
Although I believe the best approach here is to use split (as presented in @j__m's answer), here's an approach that uses matching rather than splitting.
Regex:
(\[.*?\](?!-))
Example usage:
String input = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
Pattern p = Pattern.compile("(\\[.*?\\](?!-))");
Matcher m = p.matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
Resulting output:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
An answer that doesn't use regular expressions (if that's worth something in ease of understanding what's going on) is:
substitute "]#[" for "],["
split on "#"
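The two steps above can be sketched in Java as follows. This is only an illustration: the class name is hypothetical, and it assumes the marker character # never occurs in the data.

```java
import java.util.Arrays;
import java.util.List;

class SplitWithoutLookahead {
    static List<String> split(String input) {
        // Step 1: mark the separating commas. Commas inside brackets are not
        // surrounded by "]" and "[", so they are left alone.
        String marked = input.replace("],[", "]#[");
        // Step 2: split on the marker. '#' has no special meaning in a Java regex
        // unless the COMMENTS flag is set, so it can be passed to split as-is.
        return Arrays.asList(marked.split("#"));
    }

    public static void main(String[] args) {
        for (String part : split("[1]-[2]-[3],[4]-[5],[6,7,8],[9]")) {
            System.out.println(part);
        }
    }
}
```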

Regex To Match Order Of String

I want to detect when the words in a string match an existing value in reverse order.
We want to add validation that warns the user if the name already exists in reverse order.
For example:
If name column has the value, 'Viral,Tennis'
Now if user enters a new name with the value, 'Tennis,Viral'
Then how can we match reverse order of word using regex or some other way?
I am using C#.net for development.
You could take a look at the Regex.Split(String input, String regex) and do something like so:
String[] userEntry = Regex.Split(userString, "\\s+");
StringBuilder sb = new StringBuilder();
for (int i = userEntry.Length - 1; i >= 0; i--)
{
    sb.Append(userEntry[i]).Append(" ");
}
String result = sb.ToString();
//Do Validation
That would do the trick, however, you need to keep in mind that things will get a little bit messy if you do not want to change the order of special symbols such as the comma. You could easily remove those and do any validation without special symbols.
EDIT: It depends on what you mean by special symbols. The regex [^a-zA-Z0-9]+ will match any run of characters that are neither ASCII letters (upper or lower case) nor digits. So you could easily do something like so:
string input = ...
string pattern = "[^a-zA-Z0-9]+";
string replacement = "";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
The above should yield a string which is only made from letters and digits. White spaces will also be removed.
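Putting the pieces together, the validation itself could look like the following Java sketch. It is an illustration under stated assumptions rather than the answer's own code: names are split on commas and whitespace, the comparison ignores case, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class ReversedName {
    // True if `candidate` consists of the same words as `existing`, in reverse order.
    static boolean isReverseOf(String existing, String candidate) {
        List<String> a = tokens(existing);
        List<String> b = tokens(candidate);
        Collections.reverse(b);
        return a.equals(b);
    }

    // Splits on commas and whitespace and lower-cases each word.
    private static List<String> tokens(String name) {
        List<String> out = new ArrayList<>();
        for (String t : name.trim().split("[\\s,]+")) {
            if (!t.isEmpty()) {
                out.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(isReverseOf("Viral,Tennis", "Tennis,Viral"));
    }
}
```

Comparing token lists avoids having to rebuild a string with the separators in the right places.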

Get/split text inside brackets/parentheses

Just have a list of words, such as:
gram (g)
kilogram (kg)
pound (lb)
just wondering how I would get the words within the brackets for example get the "g" in "gram (g)" and dim it as a new string.
Possibly using regex?
Thanks.
Use the Split function:
strArr = str.Split("(") ' splitting "gram (g)" returns the array {"gram ", "g)"} at indexes 0 and 1
strArr2 = strArr(1).Split(")") ' splitting "g)" returns the array {"g", ""}
The string you want is in
strArr2(0)
Edit
You want getAbbrev and getAbbrev2 to be arrays.
Try:
Dim getAbbrev As String() = Str.Split("(")
Dim getAbbrev2 As String() = getAbbrev(1).Split(")")
To do it without declaring arrays you can do
"gram (g)".Split("(")(1).Split(")")(0)
but that's unreadable.
Edit
You have some very trivial errors. I would suggest you strengthen your understanding of objects and declarations first. Then you can look into invoking methods. I would rather have you understand it than give it to you. Re-read the book you have or look for a basic tutorial.
Dim unit As String = 'make sure this is the actual string you are getting, not sure where you are supposed to get the string value from => ie grams (g)
Dim getAbbrev As String() = unit.Split("(") 'use unit, not Str - Str does not exist
Dim getAbbrev2 As String() = getAbbrev(1).Split(")") 'VB indexes arrays with (1), not [1]
For the last line, reference getAbbrev2 instead of the unknown abbrev2.
Fun with Regular Expressions (I'm really not an expert here, but tested and works)
Imports System.Text.RegularExpressions
.....
Dim charsToTrim() As Char = { "("c, ")"c }
Dim test as String = "gram (g)" + Environment.NewLine +
"kilogram (kg)" + Environment.NewLine +
"pound (lb)"
Dim pattern as String = "\([a-zA-Z0-9]*\)"
Dim r As Regex = new Regex(pattern, RegexOptions.IgnoreCase)
Dim m As Match = r.Match(test)
While(m.Success)
System.Diagnostics.Debug.WriteLine("Match" + "=" + m.Value.ToString())
Dim tempText as String = m.Value.ToString().Trim(charsToTrim)
System.Diagnostics.Debug.WriteLine("String Trimmed" + "=" + tempText)
m = m.NextMatch()
End While
You can split at the space and remove the parens from the second token (by replacing them with an empty string).
A regex is also an option, and a very simple one. Its pattern is:
\w+\s+\((\w+)\)
This means: a word, then at least one space, then a literal opening parenthesis, then a word inside capturing parentheses, and finally a literal closing parenthesis. The inner parentheses are capturing parentheses, which make it possible to refer to the unit: g, kg, lb.
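As a minimal sketch of how the capture group is read back, here is the same pattern used from Java (the question is about VB, so this is only an illustration; the class name UnitAbbrev is hypothetical).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnitAbbrev {
    // \w+ word, \s+ spaces, \( literal paren, (\w+) captured abbreviation, \) literal paren
    private static final Pattern UNIT = Pattern.compile("\\w+\\s+\\((\\w+)\\)");

    // Returns the text inside the parentheses, or null if the line has no such part.
    static String abbrev(String line) {
        Matcher m = UNIT.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        for (String line : new String[] {"gram (g)", "kilogram (kg)", "pound (lb)"}) {
            System.out.println(abbrev(line));
        }
    }
}
```

Group 1 holds only the abbreviation, so no trimming of parentheses is needed afterwards.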