How to process this string via regular expression - regex

My string looks like this:
expression1/field1+expression2*expression3+expression4/field2*expression5*expression6/field3
A real string might look like this:
computer/(100)+web*mail+explorer/(200)*bbs*solution/(300)
"+" and "*" represent operators.
"computer", "web", ... represent expressions.
(100), (200) represent field numbers. A field number may not exist.
I want to process the string into this:
<computer>/(100)+web*<mail>+explorer/(200)*bbs*<solution>/(300)
The rule is this:
If an expression's length is more than 3 and its field is not (200), then add brackets to it.

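My recommendation is to mix regex with other language features. The complication is that whether an expression gets bracketed depends on the field next to it, and lookaround support (lookbehind in particular, which is usually more limited than lookahead) varies between engines.
In pseudo-Java code, I recommend doing something like this:
String[] parts = input.split("/");
for (int i = 0; i < parts.length; i++) {
    // The field of the last expression in parts[i] is the start of parts[i + 1].
    boolean skipLast = i + 1 < parts.length && parts[i + 1].startsWith("(200)");
    // Bracket expressions of 4+ letters; leave the last one alone if its field is (200).
    String pattern = "\\b([a-z]{4,})\\b" + (skipLast ? "(?!$)" : "");
    parts[i] = parts[i].replaceAll(pattern, "<$1>");
}
String output = String.join("/", parts);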

I would not use just a regular expression.
You say "if expression length is more than 3 and its field is not (200), then add brackets to it"
I think a normal conditional statement is the best and clearest solution for this.
I think regular expressions are sometimes overused. Regexes are hard to read, and when a couple of conditional statements can do the same but more clearly, then I'd say the code quality is higher.
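As a rough illustration of the "plain conditional" idea, here is a minimal Python sketch. It assumes that expressions are runs of lowercase letters and that a field, when present, is written as /(digits) right after its expression, as in the sample string from the question. The names TOKEN and add_brackets are made up for this example. The regex only finds each expression and its optional field; the rule itself is an ordinary if statement.

```python
import re

# An expression (lowercase letters) optionally followed by its field, e.g. "/(200)".
# This token shape is an assumption based on the sample string in the question.
TOKEN = re.compile(r"([a-z]+)(/\(\d+\))?")

def add_brackets(text: str) -> str:
    def repl(match: re.Match) -> str:
        expr, field = match.group(1), match.group(2)
        # The rule from the question, written as a plain conditional.
        if len(expr) > 3 and field != "/(200)":
            return f"<{expr}>{field or ''}"
        return match.group(0)
    return TOKEN.sub(repl, text)

print(add_brackets("computer/(100)+web*mail+explorer/(200)*bbs*solution/(300)"))
```

For the sample input this prints the string the question asks for.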

Related

How to use Regex in DataSet.Tables.Select() in VB.net

I have a dataset that contains multiple values. I want to take those rows from that dataset from the datatable BLABLA that contains an "S" with the numbers from zero to six. Then I want to display those in a MessageBox.
My Regex is S[0-6].
Dim answer As String = ""
Dim myregex As Regex = New Regex("S[0-6]")
Dim SearchRows() As DataRow = datasetB.Tables("BLABLA").Select("Data LIKE '%myregex%'")
For k As Integer = 0 To SearchRows.Length - 1
    If answer = "" Then
        answer = SearchRows(k).Item("Data")
    Else
        answer = answer & vbNewLine & SearchRows(k).Item("Data")
    End If
Next
MsgBox(answer)
Unfortunately SearchRows is empty. I couldn't find the reason by debugging.
What am I doing wrong?
The DataTable.Select method does not support regex. As the documentation states, it does allow you to pass it a filterExpression string as an argument, but just because it takes a filter expression doesn't mean that it supports regular expressions. On the contrary, it's designed to mostly support the same kinds of expressions as the WHERE clause in T-SQL. T-SQL's LIKE operator does not support regex patterns, and neither does DataTable.Select. See this documentation to learn the rules for the pattern expressions that are supported by the DataTable.Select method's LIKE operator.
The filter expressions supported by the LIKE operator are not as advanced as regex, so it's almost certainly impossible to construct a filter expression which is that specific. If there is a way to filter to digits between 0 and 6, I am unaware of it and the documentation doesn't mention it. So, if you really need to filter rows by regex, you can still do it, but you need to select all the rows and then filter them yourself:
Dim SearchRows() As DataRow = datasetB.Tables("BLABLA").Select().
    Where(Function(r) myregex.IsMatch(r.Item("Data").ToString())).
    ToArray()

Part of a string from a string using regular expressions

I have a string of 5 characters out of which the first two characters should be in some list and next three should be in some other list.
How could I validate them with regular expressions?
Example:
List for first two characters {VBNET, CSNET, HTML}
List for next three characters {BEGINNER, EXPERT, MEDIUM}
My Strings are going to be: VBBEG, CSBEG, etc.
My regular expression should find that the input string first two characters could be either VB, CS, HT and the rest should also be like that.
Would the following expression work for you in a more general case (so that you don't have hardcoded values): (^..)(.*$)
- returns the first two letters in the first group, and the remaining letters in the second group.
something like this:
^(VB|CS|HT)(BEG|EXP|MED)$
This recipe works for me:
^(VB|CS|HT)(BEG|EXP|MED)$
I guess (VB|CS|HT)(BEG|EXP|MED) should do it.
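One detail worth checking with the unanchored version: without ^ and $, a search-style match will also accept the codes inside a longer string. A minimal Python sketch, assuming the two-letter and three-letter codes from the question (the names CODE and is_valid are made up here), uses re.fullmatch so the whole string has to match:

```python
import re

# Prefix and suffix codes taken from the question's example lists.
CODE = re.compile(r"(VB|CS|HT)(BEG|EXP|MED)")

def is_valid(code: str) -> bool:
    # fullmatch requires the pattern to cover the entire string,
    # so no explicit ^ and $ anchors are needed.
    return CODE.fullmatch(code) is not None

print(is_valid("VBBEG"))                          # whole string is a valid code
print(is_valid("VBBEGX"))                         # trailing character, rejected
print(CODE.search("xxVBBEGxx") is not None)       # unanchored search still finds it
```

The last line shows why the anchors matter if you use a search call instead.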
If your strings are as well-defined as this, you don't even need regex - simple string slicing would work.
For example, in Python we might say:
mystring = "HTEXP"
prefix = mystring[0:2]
suffix = mystring[2:5]
if (prefix in ['HT','CS','VB']) and (suffix in ['BEG','MED','EXP']):
    pass # valid!
else:
    pass # not valid. :(
Don't use regex where elementary string operations will do.

Regular expression to find two sets of 11 only

Hello guys I need to find a regular expression that takes strings with two sets of 11 only
from a set {0,1,2}
0011110000 match it only has two sets
0010001001100 does not match (only has one set)
0000011000110011 does not match (it has three sets)
00 does not match (it has no set)
0001100000110001 match it only has two sets
This is what I've done so far
([^1]|1(0|2|3)(0|2|3)*)*11([^1]|1(0|2|3)(0|2|3)*)*11([^1]|1(0|2|3)(0|2|3)*|1$)*
                                                    --------------------------
I think what I'm missing is that I need to make sure the underlined section of the above regular expression has to make sure there is no more "11" left in the string, and I don't think that section is working correctly.
You could use a regular expression, but you've got much simpler options available to you. Here's an example in C#:
public bool IsValidString(string input)
{
    return input.Split(new string[] { "11" }, StringSplitOptions.None).Length == 3;
}
Although regular expressions can be a very useful tool, their usage is not always warranted. As jwz put it:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
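For comparison, the same split-based check can be sketched in Python and run against the five sample strings from the question. The function name is made up for this example. Note the assumption baked into splitting: occurrences of "11" are counted left to right without overlap, so a run such as "1111" counts as two sets, which is what the first sample needs.

```python
def has_exactly_two_elevens(s: str) -> bool:
    # Splitting on "11" yields one more piece than the number of
    # non-overlapping "11" occurrences, so three pieces means two sets.
    return len(s.split("11")) == 3

samples = [
    "0011110000",
    "0010001001100",
    "0000011000110011",
    "00",
    "0001100000110001",
]
for sample in samples:
    print(sample, has_exactly_two_elevens(sample))
```

How a run of three 1s should be treated is not stated in the question, so that case is left as it falls out of the split.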
If this is not homework, then I would suggest avoiding a regex and going with a regular function (shown here is JavaScript):
function hasTwoElevensOnly(s) {
    var first = s.indexOf("11");
    if (first < 0) return false;
    var second = s.indexOf("11", first + 2);
    if (second < 0) return false;
    return s.indexOf("11", second + 2) < 0;
}
Code here: http://jsfiddle.net/8FMRH/
If a regex is required:
function hasTwoElevensOnly(s) {
    return /^((0|1(?!1)|2)*?11(0|1(?!1)|2)*?){2}$/.test(s);
}
Code here: http://jsfiddle.net/PAARn/1/
Most regex flavors let you restrict the number of repetitions, usually with {}. For example, in JavaScript, you could do something like:
/^((10|0)*11(01|0)*){2}$/
This matches two sets of 11, each preceded and followed by zero or more 0s in the string.
There may be a simpler way, but starting with your approach, this seems to work on the sample data provided:
/^([^1]|1[023])*11([^1]|1[023])*11((?<!11)|1[023]|[023]|(?<=[023])1)*$/
Using lookbehind.

Is there a RegEx that can parse out the longest list of digits from a string?

I have to parse various strings and determine a prefix, number, and suffix. The problem is the strings can come in a wide variety of formats. The best way for me to think about how to parse it is to find the longest number in the string, then take everything before that as a prefix and everything after that as a suffix.
Some examples:
0001 - No prefix, Number = 0001, No suffix
1-0001 - Prefix = 1-, Number = 0001, No suffix
AAA001 - Prefix = AAA, Number = 001, No suffix
AAA 001.01 - Prefix = AAA , Number = 001, Suffix = .01
1_00001-01 - Prefix = 1_, Number = 00001, Suffix = -01
123AAA 001_01 - Prefix = 123AAA , Number = 001, Suffix = _01
The strings can come with any mixture of prefixes and suffixes, but the key point is the Number portion is always the longest sequential list of digits.
I've tried a variety of RegEx's that work with most but not all of these examples. I might be missing something, or perhaps a RegEx isn't the right way to go in this case?
(The RegEx should be .NET compatible)
UPDATE: For those that are interested, here's the C# code I came up with:
var regex = new System.Text.RegularExpressions.Regex(@"(\d+)");
if (regex.IsMatch(m_Key)) {
    string value = "";
    int length = 0;
    var matches = regex.Matches(m_Key);
    foreach (System.Text.RegularExpressions.Match match in matches) {
        // >= keeps the last run when two runs have the same length
        if (match.Length >= length) {
            value = match.Value;
            length = match.Length;
        }
    }
    var split = m_Key.Split(new String[] {value}, System.StringSplitOptions.RemoveEmptyEntries);
    m_KeyCounter = value;
    if (split.Length >= 1) m_KeyPrefix = split[0];
    if (split.Length >= 2) m_KeySuffix = split[1];
}
You're right, this problem can't be solved purely by regular expressions. You can use regexes to "tokenize" (lexically analyze) the input but after that you'll need further processing (parsing).
So in this case I would tokenize the input with (for example) a simple regular expression search (\d+) and then process the tokens (parse). That would involve seeing if the current token is longer than the tokens seen before it.
To gain more understanding of the class of problems regular expressions "solve" and when parsing is needed, you might want to check out general compiler theory, specifically when regexes are used in the construction of a compiler (e.g. http://en.wikipedia.org/wiki/Book:Compiler_construction).
Your input isn't regular, so a regex alone won't do. I would iterate over all the groups of digits via (\d+), find the longest, and then build a new regex of the form (.*)<number>(.*) to find your prefix and suffix.
Or, if you're comfortable with string operations, you can probably just find the start and end of the target group and use substr to get the prefix and suffix.
I don't think you can do this with one regex. I would find all digit sequences within the string (probably with a regex) and then I would select the longest with .NET code, and call Split().
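The "find every digit run, keep the longest, cut around it" approach that these answers describe can be sketched in Python as follows. The function name split_key is made up for this example. One assumption is made explicit: when two runs have the same length, the later one wins, which is what the last example in the question (123AAA 001_01) requires. The sketch cuts by match position instead of calling a split function, so a missing prefix cannot be mistaken for the suffix.

```python
import re

def split_key(key: str) -> tuple[str, str, str]:
    """Return (prefix, number, suffix) around the longest run of digits."""
    matches = list(re.finditer(r"\d+", key))
    if not matches:
        # No digits at all: treat the whole key as prefix (an assumption).
        return key, "", ""
    longest = matches[0]
    for match in matches[1:]:
        # >= means the later run wins a tie.
        if len(match.group()) >= len(longest.group()):
            longest = match
    return key[:longest.start()], longest.group(), key[longest.end():]

for key in ["0001", "1-0001", "AAA001", "AAA 001.01", "1_00001-01", "123AAA 001_01"]:
    print(repr(key), split_key(key))
```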
This depends entirely on your Regexp engine. Check your Regexp environment for capturing, there might be something in it like the automatic variables in Perl.
OK, let's talk about your question:
Keep in mind that the NFA or DFA matcher of almost every regexp engine is greedy, which means that (\d+) will always take the longest run of digits when it "stumbles" over one.
Now, from what I can gather from your examples, you always need the middle number portion, so try this:
/^(.*\D)?(\d+)(\D.*)?$/ig
Then look at the variables $1, $2 and $3. Not all of them will be set: $2 always holds the number in question, $1 holds the prefix and $3 the suffix, and $1 or $3 is left unset when that part is missing.
The idea is to make the engine "stumble" over the first few characters and start matching a long number in the middle.
Since the /g modifier is present, you can loop through all the combinations that the engine finds and then simply take the one you like most.
This example is in PCRE, but I'm sure .NET has a compatible mode.

c++ search text in boolean mode

I basically have two questions.
1. Is there a c++ library that would do full text boolean search just like in mysql. E.g.,
Let's say I have:
string text = "this is my phrase keywords test with boolean query.";
string booleanQuery = "\"my phrase\" boolean -test -\"keywords test\" OR ";
booleanQuery += "\"boolean search\" -mysql -sql -java -php";
//where quotes ("") contain phrases, (-) is NOT keyword and OR is logical OR.
If answer to first is no, then;
2. Is it possible to search a phrase in text. e.g.,
string text =//same as previous
string keyword = "\"my phrase\"";
//here what's the best way to search for my phrase in the text?
TR1 has a regex class (derived from Boost::regex). It's not quite like you've used above, but reasonably close. Boost::phoenix and Boost::Spirit also provide similar capabilities, but for a first attempt the Boost/TR1 regex class is probably a better choice.
As to the 2nd point: string class does have a method find, see http://www.cppreference.com/wiki/string/find
Sure there is, try Spirit:
http://boost-spirit.com/home/