Regex Split String excluding hyphen - regex

I have a full name as input and want to split it word by word, subject to two rules:
Do not split a word that contains a hyphen, e.g. REES-MOGG
Split a word that contains an underscore, e.g. REES_MOGG
HYPHEN
Example:
MRS C REES-MOGG
Result:
MRS
C
REES-MOGG
UNDERSCORE
Example:
MRS C REES_MOGG
Result:
MRS
C
REES
MOGG
I am currently using the code below but in vain:
Dim str As String() = Regex.Split(names, "\s+")

Just split on "\s+|_": that splits on whitespace and also on underscores. Your code would be:
Dim str As String() = Regex.Split(names, "\s+|_")
To split on ampersands too, just add |& to the pattern (an ampersand needs no escaping in a regex):
Dim str As String() = Regex.Split(names, "\s+|_|&")

Use this:
Dim str As String() = Regex.Split(names, "[\s_]+")

Dim str As String() = names.Split({" ", "_", "&", vbTab}, StringSplitOptions.RemoveEmptyEntries)

To make your code split on whitespace and underscores, put the whitespace token \s inside a character class [ ] in your regex, then add any other symbols you want to split on to that class. The trailing + treats a run of consecutive separators as a single split point.
Dim str As String() = Regex.Split(names, "[\s_]+")
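As a quick check of the character-class approach, here is a minimal sketch run against the two names from the question. The module name SplitNameDemo and the helper SplitName are made up for illustration, and the Trim() call is an added assumption so that leading or trailing spaces do not produce empty entries.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SplitNameDemo
    ' Hypothetical helper (not from the original posts): splits on runs of
    ' whitespace and/or underscores and leaves hyphenated words intact.
    Function SplitName(ByVal fullName As String) As String()
        ' Trim() is an assumption: it avoids empty entries caused by
        ' leading or trailing whitespace.
        Return Regex.Split(fullName.Trim(), "[\s_]+")
    End Function

    Sub Main()
        Console.WriteLine(String.Join("|", SplitName("MRS C REES-MOGG"))) ' MRS|C|REES-MOGG
        Console.WriteLine(String.Join("|", SplitName("MRS C REES_MOGG"))) ' MRS|C|REES|MOGG
    End Sub
End Module
```

The hyphen is simply absent from the class, so it is never treated as a separator.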

I don't know much about VB.NET, but you should definitely change your regex.
Here is an example that matches the words instead of splitting on the separators, though I only tested the idea in JavaScript.
Dim matchForHyphen As MatchCollection = Regex.Matches("MRS C REES-MOGG", "[^\s_]+")
Dim matchForUnderscore As MatchCollection = Regex.Matches("MRS C REES_MOGG", "[^\s_]+")
Then loop over the Match objects to get the results,
e.g. matchForHyphen(i).Value in a For loop, or use a For Each statement.
Hope it helps.

Related

VB.Net regex random string

I have regex code that gets string between 2 strings from TextBox1.
TextBox1 looks something like this:
href="www.example.com/account/05798/john123">
href="www.example.com/account/4970/max16">
href="www.example.com/account/96577/killer007">
href="www.example.com/account/3077/hackerboy1337">
href="www.example.com/account/43210/king42">
So, it will get value from href="www.example.com/account/4321/ to "> (usernames)
The problem is, how to do it? My regex code:
(?<="href=""www.example.com/account/RANDOM_STRING/")(.*?)(?="">)
I know I could replace RANDOM_STRING with \w{4}, but some IDs have 5 digits.
You need a negated character class [^/] that matches any char but a /. So, replace RANDOM_STRING with [^/]*.
Also, in a regex pattern, to match ., you need to escape the dot - \..
Thus, your regex pattern can be fixed as
(?<="href=""www\.example\.com/account/[^/]*/").*?(?="">)
However, you may use a simpler regex with a capturing group:
"href=""www\.example\.com/account/[^/]*/"(.*?)"">
The value you need is in Match.Groups(1).Value.
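The capturing-group approach might look like this in a complete program. This is only a sketch: the pattern is written out as a VB string literal (each doubled "" is one literal quote), the sample text is two lines from the question, and the names ExtractUserDemo and ExtractUserNames are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module ExtractUserDemo
    ' Hypothetical helper: returns the text between the ID segment and the closing ">.
    Function ExtractUserNames(ByVal html As String) As List(Of String)
        ' [^/]* matches an ID of any length; (.*?) lazily captures the user name.
        Dim pattern As String = "href=""www\.example\.com/account/[^/]*/(.*?)"">"
        Dim names As New List(Of String)
        For Each m As Match In Regex.Matches(html, pattern)
            names.Add(m.Groups(1).Value)
        Next
        Return names
    End Function

    Sub Main()
        Dim html As String = "href=""www.example.com/account/05798/john123"">" &
            Environment.NewLine & "href=""www.example.com/account/4970/max16"">"
        For Each name As String In ExtractUserNames(html)
            Console.WriteLine(name) ' john123, then max16
        Next
    End Sub
End Module
```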
Or another option would be to do this
Dim strOne As String = "www.example.com/account/43210/king42"
Dim strMain As String = Split(strOne, "/account/")(1)
Dim strSubOne As String = Split(strMain, "/")(0)
Dim strSubTwo As String = Split(strMain, "/")(1)

Regex to find by ignoring certain words

I am very new with regex. I want to search for multiple words in a string by ignoring common words like "in", "of", "the" and special characters like comma, backslash etc.
My Code
Dim StringToSearchFrom As String = "Thus, one shifts one's focus in a variety of directions all at the same time"
Dim PhraseToSearch As String = "focus variety directions"
Dim found1 As Match = Regex.Match(StringToSearchFrom, Regex needed)
If found1.Success Then
MsgBox(found1.Index)
Else
First regex should ignore the complete words "in", "a" and "of" while trying to find and then return the index of the first word (focus ) of PhraseToSearch. Thanks
You can use the following regular expression, that you will have to build dynamically. Here is a proof-of-concept example that will capture "focus variety" in your string ignoring "a" and "in":
Public Dim MyRegex As Regex = New Regex( _
"focus(?:(?:\b(?:in|of|a|the)\b\s*|[\p{P}\p{S}\p{Z}]*)*)variety", _
RegexOptions.IgnoreCase _
Or RegexOptions.CultureInvariant _
Or RegexOptions.Compiled _
)
Explanation:
To make a part of string optional, we should still be able to capture it in the pattern. If you replace all optional substrings in your query string with (?:(?:\b(?:in|of|a|the)\b\s*|[\p{P}\p{S}\p{Z}]*)*), you will be able to match any words in the word list (?:in|of|a|the) (update with your word list), punctuation \p{P}, symbols \p{S}, whitespace \p{Z}.
Dim StringToSearchFrom As String = "Thus, one shifts one's focus in a variety of directions all at the same time"
Dim PhraseToSearch As String = "focus variety directions"
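Dim optional_pattern As String = "(?:(?:\b(?:in|of|a|the)\b\s*|[\p{P}\p{S}\p{Z}]*)*)"
' Escape each search word, then join the words with the optional pattern (needs Imports System.Linq).
' Running Regex.Replace with optional_pattern itself would also match the empty string between every character.
Dim searchWords As String() = PhraseToSearch.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
Dim rgx_Search As New Regex(String.Join(optional_pattern, searchWords.Select(Function(w) Regex.Escape(w)).ToArray()))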
' And then apply our regex
Dim found1 As Match = rgx_Search.Match(StringToSearchFrom)
If found1.Success Then
MsgBox(found1.Index)
Else

Remove all comments from a PHP source file

I would like to remove all comments from a PHP source file from within a VB.NET application. Another stackoverflow question showed how to do this in C# code
I came up with this conversion, but it does not work unfortunately:
Dim blockComments As String = "/\*(.*?)\*/"
Dim lineComments As String = "//(.*?)\r?\n"
Dim strings As String = """((\\[^\n]|[^""\n])*)"""
Dim verbatimStrings As String = "@(""[^""]*"")+"
regex = New Regex(blockComments & "|" & lineComments)
srcT = regex.Replace(srcT, "")
You need to pass the flag RegexOptions.Singleline when constructing the Regex object. Otherwise, the block-comments can't span multiple lines.
regex = New Regex(blockComments & "|" & lineComments, RegexOptions.Singleline)
The . normally matches any character except newline (\n). The RegexOptions.Singleline flag makes it match any character, including newline.
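To see the effect of the flag in isolation, here is a small sketch that runs the question's two patterns with and without RegexOptions.Singleline. The module name StripCommentsDemo, the helper StripComments and the PHP fragment are invented for the demonstration.

```vbnet
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module StripCommentsDemo
    ' Hypothetical helper: same two patterns as in the question;
    ' "options" is passed straight to the Regex constructor.
    Function StripComments(ByVal source As String, ByVal options As RegexOptions) As String
        Dim blockComments As String = "/\*(.*?)\*/"
        Dim lineComments As String = "//(.*?)\r?\n"
        Dim rx As New Regex(blockComments & "|" & lineComments, options)
        Return rx.Replace(source, "")
    End Function

    Sub Main()
        ' Made-up PHP fragment with a block comment that spans two lines
        Dim src As String = "$a = 1; /* first" & vbLf & "second */ $b = 2;"
        ' Without Singleline, "." stops at the line break, so nothing is removed
        Console.WriteLine(StripComments(src, RegexOptions.None) = src) ' True
        ' With Singleline, the whole block comment is matched and removed
        Console.WriteLine(StripComments(src, RegexOptions.Singleline).Contains("second")) ' False
    End Sub
End Module
```

Note that this regex pair knows nothing about PHP string literals, so a // inside a quoted URL would be stripped as well; the strings pattern from the question is not part of the combined regex.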

Whole word replacements using Regular Expression

I have a list of original words and their replacements, and I want to replace every occurrence of an original word in some sentences with its replacement.
For example my list:
theabove the above
myaddress my address
So the sentence "This is theabove." will become "This is the above."
I am using Regular Expression in VB like this:
Dim strPattern As String
Dim regex As New RegExp
regex.Global = True
If Not IsEmpty(myReplacementList) Then
For intRow = 0 To UBound(myReplacementList, 2)
strReplaceWith = IIf(IsNull(myReplacementList(COL_REPLACEMENTWORD, intRow)), " ", myReplacementList(COL_REPLACEMENTWORD, intRow))
strPattern = "\b" & myReplacementList(COL_ORIGINALWORD, intRow) & "\b"
regex.Pattern = strPattern
TextToCleanUp = regex.Replace(TextToReplace, strReplaceWith)
Next
End If
I loop over all entries in my list myReplacementList and apply each one to the text TextToReplace. The replacement has to match whole words only, so I put the "\b" token on both sides of the original word.
It works well, but I have a problem when the original word contains special characters, for example
overla) overlay
I tried to escape the ) in the pattern, but it does not work:
\boverla\)\\b
I can't turn the sentence "This word is overla) with that word." into "This word is overlay with that word."
What am I missing? Are regular expressions the right tool for this scenario?
I'd use String.Replace().
That way you don't have to escape special characters, only double quotes ("").
See here for examples: http://www.dotnetperls.com/replace-vbnet
Regex is good if you're looking for patterns, or renaming your mp3 collection ;-) and much, much more. But in your case, I'd use String.Replace().
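If whole-word matching still matters, one possible explanation for the failure is that \b only matches between a word character and a non-word character. After the ) in "overla) with" comes a space, so both neighbours are non-word characters and the trailing \b can never match. The sketch below swaps \b for the lookarounds (?<!\w) and (?!\w). It assumes the .NET Regex class rather than the RegExp object used in the question (which may not support lookbehind), and ReplaceWholeWord is a made-up helper name.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module WholeWordReplaceDemo
    ' Hypothetical helper: replaces "original" only where it is not
    ' directly preceded or followed by a word character.
    Function ReplaceWholeWord(ByVal input As String, ByVal original As String, ByVal replacement As String) As String
        ' Regex.Escape takes care of metacharacters such as ")".
        Dim pattern As String = "(?<!\w)" & Regex.Escape(original) & "(?!\w)"
        ' Double any "$" so it is inserted literally, not read as a group reference.
        Return Regex.Replace(input, pattern, replacement.Replace("$", "$$"))
    End Function

    Sub Main()
        Console.WriteLine(ReplaceWholeWord("This word is overla) with that word.", "overla)", "overlay"))
        ' This word is overlay with that word.
        Console.WriteLine(ReplaceWholeWord("This is theabove.", "theabove", "the above"))
        ' This is the above.
    End Sub
End Module
```

Unlike a plain String.Replace, this leaves "theabove" untouched when it appears inside a longer word.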

Split string on several words, and track which word split it where

I am trying to split a long string based on an array of words. For Example:
Words: trying, long, array
Sentence: "I am trying to split a long string based on an array of words."
Resulting string array:
I am
trying
to split a
long
string based on an
array
of words
Multiple instances of the same word are likely, so the same word (trying, say, or array) will probably cause a split more than once.
Is there an easy way to do this in .NET?
The easiest way to keep the delimiters in the result is to use the Regex.Split method and construct a pattern using alternation in a group. The group is key to including the delimiters as part of the result, otherwise it will drop them. The pattern would look like (word1|word2|wordN) and the parentheses are for grouping. Also, you should always escape each word, using the Regex.Escape method, to avoid having them incorrectly interpreted as regex metacharacters.
I also recommend reading my answer (and answers of others) to a similar question for further details: How do I split a string by strings and include the delimiters using .NET?
Since I answered that question in C#, here's a VB.NET version:
Dim input As String = "I am trying to split a long string based on an array of words."
Dim words As String() = { "trying", "long", "array" }
If (words.Length > 0)
Dim pattern As String = "(" + String.Join("|", words.Select(Function(s) Regex.Escape(s)).ToArray()) + ")"
Dim result As String() = Regex.Split(input, pattern)
For Each s As String in result
Console.WriteLine(s)
Next
Else
' nothing to split '
Console.WriteLine(input)
End If
If you need to trim the spaces around each word being split you can prefix and suffix \s* to the pattern to match surrounding whitespace:
Dim pattern As String = "\s*(" + String.Join("|", words.Select(Function(s) Regex.Escape(s)).ToArray()) + ")\s*"
If you're using .NET 4.0 you can drop the ToArray() call inside the String.Join method.
EDIT: BTW, you need to decide up front how you want the split to work. Should it match individual words or words that are a substring of other words? For example, if your input had the word "belong" in it, the above solution would split on "long", resulting in {"be", "long"}. Is that desired? If not, then a minor change to the pattern will ensure the split matches complete words. This is accomplished by surrounding the pattern with a word-boundary \b metacharacter:
Dim pattern As String = "\s*\b(" + String.Join("|", words.Select(Function(s) Regex.Escape(s)).ToArray()) + ")\b\s*"
The \s* is optional per my earlier mention about trimming.
You could use a regular expression.
(.*?)((?:trying)|(?:long)|(?:array))(.*)
will give you three groups if it matches:
1) The bit before the first instance of any of the split words.
2) The split word itself.
3) The rest of the string.
You can keep matching on (3) until you run out of matches.
I've played around with this but I can't get a single regex that will split on all instances of the target words. Maybe someone with more regex-fu can explain how.
I've assumed that VB has regex support. If not, I'd recommend using a different language. Certainly C# has regexes.
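VB.NET does have regex support through System.Text.RegularExpressions, and Regex.Matches returns every occurrence in one pass, so there is no need to re-match on the third group. The sketch below records which word matched and at what index. FindSplitWords is a hypothetical helper, the word list is assumed to be non-empty, and the \b anchors are an assumption that only whole words should count.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module TrackSplitsDemo
    ' Hypothetical helper: returns "word@index" for every delimiter word found.
    ' Assumes "words" contains at least one entry.
    Function FindSplitWords(ByVal input As String, ByVal words As String()) As List(Of String)
        Dim escaped(words.Length - 1) As String
        For i As Integer = 0 To words.Length - 1
            escaped(i) = Regex.Escape(words(i))
        Next
        Dim pattern As String = "\b(?:" & String.Join("|", escaped) & ")\b"
        Dim hits As New List(Of String)
        For Each m As Match In Regex.Matches(input, pattern)
            hits.Add(m.Value & "@" & m.Index.ToString())
        Next
        Return hits
    End Function

    Sub Main()
        Dim input As String = "I am trying to split a long string based on an array of words."
        Dim hits As List(Of String) = FindSplitWords(input, New String() {"trying", "long", "array"})
        Console.WriteLine(String.Join(", ", hits.ToArray())) ' trying@5, long@23, array@47
    End Sub
End Module
```

The text between two delimiters can then be taken with Substring, using each match's Index and Length.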
You can split on " ",
and then go through the words and see which ones are contained in the "splitting words" array
Dim testS As String = "I am trying to split a long string based on an array of words."
Dim splitON() As String = New String() {"trying", "long", "array"}
Dim newA() As String = testS.Split(splitON, StringSplitOptions.RemoveEmptyEntries)
Something like this
Dim testS As String = "I am trying to split a long string based on a long array of words."
Dim splitON() As String = New String() {"long", "trying", "array"}
Dim result As New List(Of String)
result.Add(testS)
For Each spltr As String In splitON
Dim NewResult As New List(Of String)
For Each s As String In result
Dim a() As String = Strings.Split(s, spltr)
If a.Length <> 0 Then
For z As Integer = 0 To a.Length - 1
If a(z).Trim <> "" Then NewResult.Add(a(z).Trim)
NewResult.Add(spltr)
Next
NewResult.RemoveAt(NewResult.Count - 1)
End If
Next
result = New List(Of String)
result.AddRange(NewResult)
Next
Peter, I hope the below would be suitable for Split string by array of words using Regex
// Input
String input = "insert into tbl1 inserttbl2 insert into tbl2 update into tbl3 updatededle into tbl4 update into tbl5";
//Regex Exp
String[] arrResult = Regex.Split(input, @"\s+(?=(?:insert|update|delete)\s+)",
RegexOptions.IgnoreCase);
//Output
[0]: "insert into tbl1 inserttbl2"
[1]: "insert into tbl2"
[2]: "update into tbl3 updatededle into tbl4"
[3]: "update into tbl5"