Remove sub string vb.net - regex

How can I remove "ing" from a word only when it marks the continuous form?
For example, remove "ing" from words like:
playing
dancing
crying
eating
imitating
but do not remove "ing" from words like:
sing
wing
swing
I know that if I simply want to remove "ing" from a word, I can use either of the following methods:
Method 1:
Dim op As String
Dim input As String = "playing"
Dim pattern = "ing\.?|ING\.?"
op = Regex.Replace(input, pattern, "") 'play
Method 2:
Dim op As String
Dim input As String = "playing"
op = input.Replace("ing", "")
My question: is it possible to check whether the word is in the continuous form and, if so, remove "ing" from it?

Regex is not enough to do this kind of analysis. I would suggest querying an online service (such as WordNet) to know whether the word is a continuous form or not.

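Put a "$" at the end of your pattern so that "ing" is matched only at the end of the word. A lookbehind for a preceding word character keeps the rest of the word out of the match, so replacing with "" removes only the suffix:
Dim pattern = "(?<=\w)ing\.?$|(?<=\w)ING\.?$"
Note that this only anchors the match. It cannot tell a continuous form from a word such as sing or wing, which also ends in "ing".
Hope this helps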
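A regex cannot know grammar, but a rough heuristic gets the examples in the question right: strip a trailing "ing" only when the remaining stem still contains a vowel (counting y), similar in spirit to the "stem contains a vowel" condition in Porter's stemming algorithm. The sketch below assumes that heuristic; the module and function names are made up for illustration, and words such as "evening" or "ceiling" would still be stripped incorrectly.

```vbnet
Imports System.Text.RegularExpressions

Module IngStripper
    ' Heuristic only (an assumption, not a grammatical test):
    ' remove a trailing "ing" when the stem before it contains a vowel or y.
    ' "sing", "wing" and "swing" have vowel-less stems, so they are left alone.
    Function StripIng(word As String) As String
        Return Regex.Replace(word, "^(\w*[aeiouy]\w*)ing$", "$1", RegexOptions.IgnoreCase)
    End Function
End Module
```

For example, StripIng("playing") gives "play" and StripIng("crying") gives "cry", while StripIng("swing") returns "swing" unchanged.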

Related

VBA Word Wildcards - finding shortest possible set of characters

I have been trying to find a working solution for a couple of hours now. I hope you can help me.
My problem:
I need to find and select in Word a whole sentence after providing the starting and ending strings of particular sentence.
For example, when my starting string is "People" and ending string is "apples." I expect Word to select the whole "People like red apples." sentence in my document. (If such a sentence exists)
For this purpose I prepared a macro which works almost like I want. The only problem is that it doesn't select the smallest possible set of characters (which I want it to do). To make it clear let's assume I have this text in my document: People like smoking. People like red apples.
Now, when I provide the starting and ending strings to the macro respectively as "People" and "apples.", it selects all the text, which contains 2 sentences mentioned above. That is my problem: I wanted it to select only the second sentence (People like red apples.), not both of them, even though they start with the same word. So, basically, I always want to select the shortest possible set of characters (which in this case is only the last sentence).
Here is a part of my macro in VBA:
text_str = startStr & "*" & endStr
With Application.Selection.Find
    .ClearFormatting
    .Forward = True
    .Wrap = wdFindContinue
    .Text = text_str
    .MatchWildcards = True
    .MatchCase = True
    .Execute
End With
I know the problem is with the Wildcards (or very limited set of regular expressions), so I also tried something like this as the search string:
text_str = "(" & startStr & "*){1}" & endStr
It also didn't help. I'm stuck here. :/
Thanks for any suggestions!
Selection.Find has something similar to regular expressions,
but in this case you must use real regular expressions.
The pattern (in this particular case) should be:
People[^.]+apples\.
I wrote an example macro, which:
1) Selects the whole text in the document and assigns it to the src variable (the text searched by the regex).
2) Sets the cursor at the beginning of the document.
3) Checks whether the pattern can be matched (regEx.Test).
4) Executes the regex.
5) Assigns the matched string to the ret variable.
6) Displays it in a message box.
Below you have a complete macro. Probably you should change it to
select (find) the text matched (instead of the message box).
Sub Re()
    ' Early binding: needs a reference to
    ' "Microsoft VBScript Regular Expressions 5.5" (Tools > References).
    Dim startStr As String: startStr = "People"
    Dim endStr As String: endStr = "apples"
    ' [^.]+ cannot cross a period, so the match stays inside one sentence.
    Dim pattern As String: pattern = startStr & "[^.]+" & endStr & "\."
    Dim regEx As New RegExp
    Dim src As String
    Dim ret As String
    Dim colMatches As MatchCollection
    ActiveDocument.Range.Select
    src = ActiveDocument.Range.Text
    Selection.StartOf
    regEx.pattern = pattern
    If (regEx.Test(src)) Then
        Set colMatches = regEx.Execute(src)
        ret = "Match: " & colMatches(0).Value
    Else
        ret = "Matching Failed"
    End If
    MsgBox ret, vbOKOnly, "Result"
End Sub
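The same pattern can be tried outside Word. The sketch below is VB.NET rather than VBA, and the module and function names are hypothetical. It escapes the start and end strings with Regex.Escape so that any metacharacters in them are matched literally, and it relies on [^.]+ refusing to cross a period. It shares the macro's limitation: a sentence with a period in the middle (an abbreviation, for instance) would not be matched.

```vbnet
Imports System.Text.RegularExpressions

Module SentenceFinder
    ' Returns the first sentence that starts with startStr and ends with
    ' endStr followed by a period, or an empty string when there is none.
    Function FindSentence(source As String, startStr As String, endStr As String) As String
        Dim pattern As String = Regex.Escape(startStr) & "[^.]+" & Regex.Escape(endStr) & "\."
        Dim m As Match = Regex.Match(source, pattern)
        Return If(m.Success, m.Value, "")
    End Function
End Module
```

With the text from the question, FindSentence("People like smoking. People like red apples.", "People", "apples") returns only "People like red apples.", because the first "People" cannot reach "apples" without crossing a period.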

regex .NET to find and replace underscores only if found between > and <

I have a list of strings looking like this:
Title_in_Title_by_-_Mr._John_Doe
and I need to replace each _ with a space, but ONLY in the text between html"> and </a>,
so that the result looks like this:
Title in Title by - Mr. John Doe
I've tried to do it in 2 steps:
first isolate that part only with .*html">(.*)<\/a.* & ^.*>(.*)<.* & .*>.*<.* or ^.*>.*<.*
and then do the replacement, but the string always comes back unchanged and now I'm stuck.
Any help with this is much appreciated.
I would .Split it and then .Replace it; no need for regex.
Dim line as string = "Title_in_Title_by_-_Mr._John_Doe"
Dim split as string() = line.split(">"c)
Dim correctString as String = split(1).replace("_"c," "c)
Boom done
here is the string.replace article
Though if you had to use regex, this would probably be a better way of doing it
Dim inputString = "Title_in_Title_by_-_Mr._John_Doe"
Dim reg As New Regex("(?<=\>).*?(?=\<)")
Dim correctString = reg.match(inputString).value.replace("_"c, " "c)
Dim line as string = "Title_and_Title_by_-_Mr._John_Doe"
line = Regex.Replace(line, "(?<=\.html"">)[^<>]+(?=</a>)", _
Function (m) m.Value.Replace("_", " "))
This uses a regex with lookarounds to isolate the title, and a MatchEvaluator delegate in the form of a lambda expression to replace the underscores in the title, then it plugs the result back into the string.
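To see that only the link text changes while the href keeps its underscores, here is a self-contained sketch built on the same lookaround pattern. The sample anchor tag used in the note below is invented for illustration (the original strings are not fully shown above), and the module and function names are hypothetical.

```vbnet
Imports System.Text.RegularExpressions

Module LinkTextCleaner
    ' Replaces underscores with spaces only in the text between .html"> and </a>.
    ' The lookbehind and lookahead are not part of the match, so the
    ' surrounding markup is left as it was.
    Function CleanLinkText(html As String) As String
        Return Regex.Replace(html, "(?<=\.html"">)[^<>]+(?=</a>)",
                             Function(m) m.Value.Replace("_", " "))
    End Function
End Module
```

Given the made-up input <a href="Some_Page.html">Some_Page_-_Title</a>, the function returns <a href="Some_Page.html">Some Page - Title</a>.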

How to get Opposite result of Regex.Split VB.NET?

I have a string like this one: [H]GOODYEAR[/H] [H]TIRE[/H] & RUBBER COMPANY
I need to get the words inside the [H]...[/H] nodes in this string.
I created this Regex Pattern: \[H](.*?)\[\/H]
I tried to use the Regex.Split method to get these words. Here's my code:
Dim pattern As String = "\[H](.*?)\[\/H]"
Dim input As String = "[H]GOODYEAR[/H] [H]TIRE[/H] & RUBBER COMPANY"
Dim SearchedResult() As String = Regex.Split(input, pattern, RegexOptions.IgnoreCase)
But then I realized that Split gives me everything, not just the words I need.
My question: how do I get just those words? Is there any way to REVERSE a regex pattern, or a better way to get my result?
Instead of splitting the string, you should use Regex.Matches method.
Note: I used inline modifiers (?si), the s (dotAll) modifier which forces the dot . to match newline characters in case the nodes span across multiple lines, and the i modifier for case-insensitive matching.
Dim input As String = "[H]GOODYEAR[/H] [H]TIRE[/H] & RUBBER COMPANY"
For Each m As Match In Regex.Matches(input, "(?si)\[H](.*?)\[/H]")
    Console.WriteLine(m.Groups(1).Value)
Next
Output
GOODYEAR
TIRE
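Regex.Split returned "everything" because it keeps the text between matches and, when the pattern has a capturing group, adds the captured text as well. If the captured words are needed as a list rather than printed, the loop can be folded into a small helper. This is a sketch with hypothetical module and function names, using LINQ over the match collection.

```vbnet
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module TagExtractor
    ' Collects the contents of every [H]...[/H] node, in document order.
    Function ExtractTagged(input As String) As List(Of String)
        Dim matches As MatchCollection = Regex.Matches(input, "(?si)\[H](.*?)\[/H]")
        ' Cast is needed because MatchCollection is a non-generic collection.
        Return matches.Cast(Of Match)().Select(Function(m) m.Groups(1).Value).ToList()
    End Function
End Module
```

For the input in the question this returns a list containing GOODYEAR and TIRE.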

Whole word replacements using Regular Expression

I have a list of original words, each paired with a replacement word, and I want to replace every occurrence of the original words in some sentences with their replacements.
For example my list:
theabove the above
myaddress my address
So the sentence "This is theabove." will become "This is the above."
I am using Regular Expression in VB like this:
Dim strPattern As String
Dim regex As New RegExp
regex.Global = True
If Not IsEmpty(myReplacementList) Then
    For intRow = 0 To UBound(myReplacementList, 2)
        strReplaceWith = IIf(IsNull(myReplacementList(COL_REPLACEMENTWORD, intRow)), " ", myReplacementList(COL_REPLACEMENTWORD, intRow))
        strPattern = "\b" & myReplacementList(COL_ORIGINALWORD, intRow) & "\b"
        regex.Pattern = strPattern
        TextToCleanUp = regex.Replace(TextToReplace, strReplaceWith)
    Next
End If
I loop over all entries in my list myReplacementList against the text TextToReplace that I want to process, and each replacement has to be a whole word, so I put the "\b" token around the original word.
It works well, but I have a problem when the original word contains special characters, for example
overla) overlay
I tried to escape the ) in the pattern, but it does not work:
\boverla\)\\b
I can't turn the sentence "This word is overla) with that word." into "This word is overlay with that word."
I'm not sure what is missing. Are regular expressions the right tool for this scenario?
I'd use String.Replace().
That way you don't have to escape special characters, only double quotes (written as "" inside a string literal).
See here for examples: http://www.dotnetperls.com/replace-vbnet
Regex is good if you're looking for patterns. Or renaming your mp3 collection ;-) and much, much more. But in your case, I'd use String.Replace().
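If whole-word matching is still needed, the likely reason the escaped pattern fails is the trailing \b: a word boundary only exists between a word character and a non-word character, and in "overla) with" both the ) and the following space are non-word characters. One way around this, sketched here in VB.NET under the assumption that .NET's Regex class is available (the VBScript RegExp object used in the question has no lookbehind), is to replace \b with lookarounds that only require "no word character next to the match". The module and function names are hypothetical.

```vbnet
Imports System.Text.RegularExpressions

Module WholeWordReplacer
    ' Replaces "original" only where it is not glued to a word character
    ' on either side. Regex.Escape makes characters such as ")" literal.
    Function ReplaceWholeWord(source As String, original As String, replacement As String) As String
        Dim pattern As String = "(?<!\w)" & Regex.Escape(original) & "(?!\w)"
        ' Doubling "$" keeps the replacement text from being read as a group reference.
        Return Regex.Replace(source, pattern, replacement.Replace("$", "$$"))
    End Function
End Module
```

ReplaceWholeWord("This word is overla) with that word.", "overla)", "overlay") returns "This word is overlay with that word.", and ordinary entries such as theabove still match only as whole words.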

Split string on several words, and track which word split it where

I am trying to split a long string based on an array of words. For Example:
Words: trying, long, array
Sentence: "I am trying to split a long string based on an array of words."
Resulting string array:
I am
trying
to split a
long
string based on an
array
of words
Multiple instances of the same word are likely, so two instances of trying, or of array, should each cause a split.
Is there an easy way to do this in .NET?
The easiest way to keep the delimiters in the result is to use the Regex.Split method and construct a pattern using alternation in a group. The group is key to including the delimiters as part of the result, otherwise it will drop them. The pattern would look like (word1|word2|wordN) and the parentheses are for grouping. Also, you should always escape each word, using the Regex.Escape method, to avoid having them incorrectly interpreted as regex metacharacters.
I also recommend reading my answer (and answers of others) to a similar question for further details: How do I split a string by strings and include the delimiters using .NET?
Since I answered that question in C#, here's a VB.NET version:
Dim input As String = "I am trying to split a long string based on an array of words."
Dim words As String() = { "trying", "long", "array" }
If (words.Length > 0) Then
    Dim pattern As String = "(" + String.Join("|", words.Select(Function(s) Regex.Escape(s)).ToArray()) + ")"
    Dim result As String() = Regex.Split(input, pattern)
    For Each s As String in result
        Console.WriteLine(s)
    Next
Else
    ' nothing to split '
    Console.WriteLine(input)
End If
If you need to trim the spaces around each word being split you can prefix and suffix \s* to the pattern to match surrounding whitespace:
Dim pattern As String = "\s*(" + String.Join("|", words.Select(Function(s) Regex.Escape(s)).ToArray()) + ")\s*"
If you're using .NET 4.0 you can drop the ToArray() call inside the String.Join method.
EDIT: BTW, you need to decide up front how you want the split to work. Should it match individual words or words that are a substring of other words? For example, if your input had the word "belong" in it, the above solution would split on "long", resulting in {"be", "long"}. Is that desired? If not, then a minor change to the pattern will ensure the split matches complete words. This is accomplished by surrounding the pattern with a word-boundary \b metacharacter:
Dim pattern As String = "\s*\b(" + String.Join("|", words.Select(Function(s) Regex.Escape(s)).ToArray()) + ")\b\s*"
The \s* is optional per my earlier mention about trimming.
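To also track which pieces are the split words, one option is to rely on the order of Regex.Split's output: with a single capturing group, the array alternates between ordinary text (even indexes) and the captured delimiter (odd indexes). The sketch below assumes that approach; the module and function names are hypothetical, and it returns each non-empty piece together with a flag saying whether it was one of the split words.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module WordSplitter
    ' Item1 is the piece of text, Item2 is True when the piece is a split word.
    Function SplitAndTag(input As String, words As IEnumerable(Of String)) As List(Of Tuple(Of String, Boolean))
        Dim result As New List(Of Tuple(Of String, Boolean))
        Dim escaped As String() = words.Select(Function(w) Regex.Escape(w)).ToArray()
        If escaped.Length = 0 Then
            ' Nothing to split on: return the input as a single ordinary piece.
            result.Add(Tuple.Create(input, False))
            Return result
        End If
        Dim pattern As String = "\s*\b(" & String.Join("|", escaped) & ")\b\s*"
        Dim pieces As String() = Regex.Split(input, pattern)
        For i As Integer = 0 To pieces.Length - 1
            ' Odd indexes hold the captured group, i.e. the delimiter itself.
            If pieces(i).Length > 0 Then
                result.Add(Tuple.Create(pieces(i), i Mod 2 = 1))
            End If
        Next
        Return result
    End Function
End Module
```

For the sentence in the question and the words trying, long and array, this yields seven pieces, with "trying", "long" and "array" flagged True and the text around them flagged False.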
You could use a regular expression.
(.*?)((?:trying)|(?:long)|(?:array))(.*)
will give you three groups if it matches:
1) The bit before the first instance of any of the split words.
2) The split word itself.
3) The rest of the string.
You can keep matching on (3) until you run out of matches.
I've played around with this but I can't get a single regex that will split on all instances of the target words. Maybe someone with more regex-fu can explain how.
I've assumed that VB has regex support. If not, I'd recommend using a different language. Certainly C# has regexes.
You can split on " ",
and then go through the words and see which ones are contained in the "splitting words" array
Dim testS As String = "I am trying to split a long string based on an array of words."
Dim splitON() As String = New String() {"trying", "long", "array"}
Dim newA() As String = testS.Split(splitON, StringSplitOptions.RemoveEmptyEntries)
Something like this
Dim testS As String = "I am trying to split a long string based on a long array of words."
Dim splitON() As String = New String() {"long", "trying", "array"}
Dim result As New List(Of String)
result.Add(testS)
For Each spltr As String In splitON
    Dim NewResult As New List(Of String)
    For Each s As String In result
        Dim a() As String = Strings.Split(s, spltr)
        If a.Length <> 0 Then
            For z As Integer = 0 To a.Length - 1
                If a(z).Trim <> "" Then NewResult.Add(a(z).Trim)
                NewResult.Add(spltr)
            Next
            NewResult.RemoveAt(NewResult.Count - 1)
        End If
    Next
    result = New List(Of String)
    result.AddRange(NewResult)
Next
Peter, I hope the below is suitable for splitting a string by an array of words using Regex (C#):
// Input
String input = "insert into tbl1 inserttbl2 insert into tbl2 update into tbl3 " +
    "updatededle into tbl4 update into tbl5";
// Regex Exp
String[] arrResult = Regex.Split(input, @"\s+(?=(?:insert|update|delete)\s+)",
    RegexOptions.IgnoreCase);
//Output
[0]: "insert into tbl1 inserttbl2"
[1]: "insert into tbl2"
[2]: "update into tbl3 updatededle into tbl4"
[3]: "update into tbl5"