How to use regex to change occurrences of complex strings? - regex

For example, say I want to change all occurrences of
<img src="https://www.blahblah.com/i" title="Bob" />
to simply
Bob
This is for VB.NET.
Basically, there are plenty of such patterns in a big string, and I want to change every one of them.
This is what I tried:
Dim tdparking = New System.Text.RegularExpressions.Regex("\w* (<img.*title="")(.*)"" />")
After that I suppose I would need to do some substitution, but how would I do that?

var str = '<img src="https://www.blahblah.com/i" title="Bob" />';
// The g flag replaces every matching tag, not just the first one
str = str.replace(/<img[^>]*title="(\w+)"[^>]*>/g, "$1");
document.write(str);

You can use this regex and take capture group 1 ($1) of each match: \<img[^\<\>]+title=\"([a-zA-Z]+)\"[^\<\>]+\/\>
In PHP it should be:
preg_match_all('#\<img[^\<\>]+title=\"([a-zA-Z]+)\"[^\<\>]+\/\>#', $html, $matches);
var_dump($matches[1]);
Regards,

Everyone is making this way too hard. Here it is in VB.NET:
Dim reg As Regex = New Regex("(?<=title="")[^""]+(?="")")
Dim str As String = "<img src=""https://www.blahblah.com/i"" title=""Bob"" />"
Dim match As String = reg.Match(str).Value
Depending on how the string is input, you will need either
"(?<=title="""").+?(?="""")" 'if the text itself contains doubled quotes ("")
or
"(?<=title="")[^""]+(?="")" 'if the text contains ordinary double quotes (")
Also, there is no need for you to get downvoted; here is a point back.
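For comparison, here is a minimal Python sketch of the "replace every tag with its title" idea. The sample text (including the second tag with the title "Alice") is made up for illustration, and it assumes the title attribute is always wrapped in plain double quotes.

```python
import re

# Hypothetical sample text with two image tags; only the title values should remain.
html = (
    'Hello <img src="https://www.blahblah.com/i" title="Bob" /> and '
    '<img src="https://www.blahblah.com/i" title="Alice" />!'
)

# [^>]* keeps the match inside a single tag; ([^"]*) captures the title value.
IMG_WITH_TITLE = re.compile(r'<img\b[^>]*\btitle="([^"]*)"[^>]*>')

# sub() replaces every occurrence; \1 is the captured title.
result = IMG_WITH_TITLE.sub(r"\1", html)
print(result)
```

In .NET the same shape should carry over to Regex.Replace with "$1" as the replacement string.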

Related

Match shortest option

I'm trying to use Outlook 2013 VBA to modify an email body by pulling out and replacing a <span> section. However, with multiple spans, I'm having trouble forcing the regex to pick up only one span.
Based on some other searches, I'm trying to use a negative lookahead, but failing at it.
Result from below is: <span><span style=blah blah>Tags: test, test2</span>
Desired result is: <span style=blah blah>Tags: test, test2</span>
Code for test module:
Sub regextest()
    Dim regex As New RegExp
    Dim testStr As String
    testStr = "a<span><span style=blah blah>Tags: test, test2</span></span>"
    regex.pattern = "<span.*?(?:(span)).*?Tags:.*?</span>"
    Set matches = regex.Execute(testStr)
    For Each x In matches
        Debug.Print x 'Result: <span><span style=blah blah>Tags: test, test2</span>
    Next
End Sub
Thank you!
Wiktor's answer in comments above works for my purposes:
<span\b[^<]*>[^<]*Tags:[^<]*</span>
This works as long as there are no '<' between the two span ends. Not really a lookahead, but it's good enough for what I'm doing and very simple.
Thanks Wiktor!
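To see why the negated character class picks the inner span, here is a small Python sketch run against the same test string. Python's re module is used only for illustration; the pattern itself is the one from the answer.

```python
import re

test_str = "a<span><span style=blah blah>Tags: test, test2</span></span>"

# [^<]* can never cross a "<", so the match cannot start at the outer <span>
# and run into the inner one.
INNER_SPAN = re.compile(r"<span\b[^<]*>[^<]*Tags:[^<]*</span>")

found = INNER_SPAN.search(test_str)
inner = found.group(0) if found else None
print(inner)
```

The outer `<span>` fails because the next character after it is `<`, which `[^<]*` refuses to consume before reaching `Tags:`.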

regex .NET to find and replace underscores only if found between > and <

I have a list of strings looking like this:
Title_in_Title_by_-_Mr._John_Doe
and I need to replace each _ with a space, but ONLY in the text between the html"> and the </a>,
so that the result looks like this:
Title in Title by - Mr. John Doe
I've tried to do it in two steps:
first isolate that part only with .*html">(.*)<\/a.* & ^.*>(.*)<.* & .*>.*<.* or ^.*>.*<.*
and then do the replace, but the result always comes back unchanged and now I'm stuck.
Any help with this is much appreciated.
I would .Split it and then .Replace it; no need for regex.
Dim line as string = "Title_in_Title_by_-_Mr._John_Doe"
Dim split as string() = line.split(">"c)
Dim correctString as String = split(1).replace("_"c," "c)
Boom, done.
Here is the String.Replace article.
Though if you had to use regex, this would probably be a better way of doing it:
Dim inputString = "Title_in_Title_by_-_Mr._John_Doe"
Dim reg As New Regex("(?<=\>).*?(?=\<)")
Dim correctString = reg.match(inputString).value.replace("_"c, " "c)
Dim line as string = "Title_and_Title_by_-_Mr._John_Doe"
line = Regex.Replace(line, "(?<=\.html"">)[^<>]+(?=</a>)", _
Function (m) m.Value.Replace("_", " "))
This uses a regex with lookarounds to isolate the title, and a MatchEvaluator delegate in the form of a lambda expression to replace the underscores in the title, then it plugs the result back into the string.
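The same lookaround-plus-callback idea can be sketched in Python, where re.sub accepts a function in place of the MatchEvaluator. The anchor tag and its href below are hypothetical, since the original markup is not shown in the question.

```python
import re

# Hypothetical input: the href is made up for illustration only.
line = '<a href="some_page.html">Title_in_Title_by_-_Mr._John_Doe</a>'

# The lookbehind and lookahead isolate the link text without consuming the markup.
TITLE = re.compile(r'(?<=\.html">)[^<>]+(?=</a>)')

def underscores_to_spaces(match):
    # Only the matched title is rewritten; the rest of the line is untouched.
    return match.group(0).replace("_", " ")

fixed = TITLE.sub(underscores_to_spaces, line)
print(fixed)
```

Note that the underscore inside the href stays as it is, because only the text between `.html">` and `</a>` is ever passed to the callback.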

Split string on single forward slashes with RegExp

edit: wow, thanks for so many suggestions, but I wanted to have a regexp solution specifically for future, more complex use.
I need help splitting a text string in Excel VBA. I looked around, but the solutions are either for other languages or I can't make them work in VBA.
I want to split words on single slashes only:
text1/text2 - split
text1//text2 - no split
text1/text2//text3 - split after text1
I tried using a regexp split function, but I don't think it works in VBA. For the pattern I was thinking of something like this:
(?i)(?:(?<!\/)\/(?!\/))
but I get an error when executing the search in my macro, even though it works on sites like: https://www.myregextester.com/index.php#sourcetab
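You can use a RegExp match approach rather than a split one. The VBScript RegExp engine that VBA uses does not support lookbehind or inline modifiers such as (?i), which is most likely why your pattern raises an error. Instead, match any run made up of characters other than / or of double //, and those matches are the values you need.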
Here is a "wrapped" (i.e. with alternation) version of the regex:
(?:[^/]|//)+
Here is a demo
And here is a more efficient but less readable version:
[^/]+(?://[^/]*)*
See another demo
Here is working VBA code:
Sub GetMatches(ByRef str As String, ByRef coll As Collection)
    Dim rExp As Object, rMatch As Object, r_item As Object
    Set rExp = CreateObject("vbscript.regexp")
    With rExp
        .Global = True 'find every match, not just the first
        .pattern = "(?:[^/]|//)+"
    End With
    Set rMatch = rExp.Execute(str)
    If rMatch.Count > 0 Then
        For Each r_item In rMatch
            coll.Add r_item.Value
            Debug.Print r_item.Value
        Next r_item
    End If
    Debug.Print ""
End Sub
Call the sub as follows:
Dim matches As Collection
Set matches = New Collection
GetMatches str:="text1/text2", coll:=matches
Here are the results for the 3 strings above:
1. text1/text2
text1
text2
2. text1/text2//text3
text1
text2//text3
3. text1//text2
text1//text2
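As a cross-check outside VBA, here is a short Python sketch that runs the same match pattern on the three strings, next to a lookaround-based split. The lookaround version is shown only because Python's engine supports lookbehind; it is not an option in the VBScript engine.

```python
import re

samples = ["text1/text2", "text1/text2//text3", "text1//text2"]

# Match approach from the answer: runs of non-slash characters or double slashes.
MATCH_PARTS = re.compile(r"(?:[^/]|//)+")

# Split approach: a slash that is neither preceded nor followed by another slash.
SINGLE_SLASH = re.compile(r"(?<!/)/(?!/)")

matched = {s: MATCH_PARTS.findall(s) for s in samples}
split = {s: SINGLE_SLASH.split(s) for s in samples}

for s in samples:
    print(s, "->", matched[s])
```

Both approaches agree on these three inputs.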
Public Sub customSplit()
Dim v As Variant
v = Split("text1/text2//text3", "/")
v = Replace(Join(v, ","), ",,", "//")
Debug.Print v '-> "text1,text2//text3"
End Sub
or
Replace(Replace("text1/text2//text3", "/", ","), ",,", "//") '-> "text1,text2//text3"
Go to the Data tab and pick the Text to Columns option. Then choose "Delimited", select "Other", and enter whatever delimiter you want.
Text to columns will work. Another option, if you want to keep the original value, is to use formulas:
in B1
=left(a1,find(":",a1)-1)
in C1
=mid(a1,find(":",a1)+1,len(a1))

How to create regex pattern - value between "$" character

How to create regex pattern - value between "$" character.
e.g.
String:
" <tag key = $value$ /> "
I want to get the "value" string from this...
If I understand correctly, you're trying to get a string that is located between two dollar signs. The code (Perl) should look similar to:
if ($str =~ /\$(\w+)\$/) {
    $substr = $1;
}
Of course, you can replace the \w+ with a pattern of your choosing...
EDIT:
if ($str =~ /\<tag key \= \$(\w+)\$ \/\>/) {
    $substr = $1;
}
From my understanding, you are looking to return an exact word from a string,
i.e. return "is" from "this island is beautiful" without also matching inside "this" and "island".
If I have understood your problem correctly, the regex you are looking for is \bis\b
For more information visit this link: http://www.regular-expressions.info/wordboundaries.html
In JavaScript, you could use the following:
var theString = '<tag key = $value$ />';
var newString = theString.replace(/^.*\$(.*?)\$.*$/, '$1');
Or you could use the RegExp object to do something like:
var pattern=new RegExp('\\$(.*?)\\$');
var newString = pattern.exec(theString)[1];
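The same capture can be sketched in Python. The lazy `.*?` stops at the first closing `$`, and the None check covers input with no dollar signs at all.

```python
import re

the_string = '<tag key = $value$ />'

# \$ matches a literal dollar sign; (.*?) lazily captures what sits between the two.
found = re.search(r"\$(.*?)\$", the_string)
value = found.group(1) if found else None
print(value)
```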

Split string on several words, and track which word split it where

I am trying to split a long string based on an array of words. For Example:
Words: trying, long, array
Sentence: "I am trying to split a long string based on an array of words."
Resulting string array:
I am
trying
to split a
long
string based on an
array
of words
Multiple instances of the same word are likely, so two occurrences of "trying" (or of "array") each causing a split will probably happen.
Is there an easy way to do this in .NET?
The easiest way to keep the delimiters in the result is to use the Regex.Split method and construct a pattern using alternation in a group. The group is key to including the delimiters as part of the result, otherwise it will drop them. The pattern would look like (word1|word2|wordN) and the parentheses are for grouping. Also, you should always escape each word, using the Regex.Escape method, to avoid having them incorrectly interpreted as regex metacharacters.
I also recommend reading my answer (and answers of others) to a similar question for further details: How do I split a string by strings and include the delimiters using .NET?
Since I answered that question in C#, here's a VB.NET version:
' Needed at the top of the file:
Imports System.Linq
Imports System.Text.RegularExpressions

Dim input As String = "I am trying to split a long string based on an array of words."
Dim words As String() = { "trying", "long", "array" }
If (words.Length > 0) Then
    ' Escape each word, then wrap the alternation in a group so Split keeps the delimiters
    Dim pattern As String = "(" + String.Join("|", words.Select(Function(s) Regex.Escape(s)).ToArray()) + ")"
    Dim result As String() = Regex.Split(input, pattern)
    For Each s As String In result
        Console.WriteLine(s)
    Next
Else
    ' nothing to split
    Console.WriteLine(input)
End If
If you need to trim the spaces around each word being split you can prefix and suffix \s* to the pattern to match surrounding whitespace:
Dim pattern As String = "\s*(" + String.Join("|", words.Select(Function(s) Regex.Escape(s)).ToArray()) + ")\s*"
If you're using .NET 4.0 you can drop the ToArray() call inside the String.Join method.
EDIT: BTW, you need to decide up front how you want the split to work. Should it match individual words or words that are a substring of other words? For example, if your input had the word "belong" in it, the above solution would split on "long", resulting in {"be", "long"}. Is that desired? If not, then a minor change to the pattern will ensure the split matches complete words. This is accomplished by surrounding the pattern with a word-boundary \b metacharacter:
Dim pattern As String = "\s*\b(" + String.Join("|", words.Select(Function(s) Regex.Escape(s)).ToArray()) + ")\b\s*"
The \s* is optional per my earlier mention about trimming.
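Python's re.split behaves the same way with respect to capture groups, so the idea can be sketched there as well. The helper name split_keep_words is made up for this example.

```python
import re

def split_keep_words(text, words):
    """Split text on whole-word occurrences of words, keeping the words in the result."""
    if not words:
        return [text]
    # Escape each word; the capture group makes re.split keep the delimiters.
    alternation = "|".join(re.escape(w) for w in words)
    pattern = r"\s*\b(" + alternation + r")\b\s*"
    return re.split(pattern, text)

sentence = "I am trying to split a long string based on an array of words."
parts = split_keep_words(sentence, ["trying", "long", "array"])
print(parts)
```

With the \b anchors in place, a word such as "belong" is left alone instead of being split on "long".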
You could use a regular expression.
(.*?)((?:trying)|(?:long)|(?:array))(.*)
will give you three groups if it matches:
1) The bit before the first instance of any of the split words.
2) The split word itself.
3) The rest of the string.
You can keep matching on (3) until you run out of matches.
I've played around with this but I can't get a single regex that will split on all instances of the target words. Maybe someone with more regex-fu can explain how.
I've assumed that VB has regex support. If not, I'd recommend using a different language. Certainly C# has regexes.
You can split on " ",
and then go through the words and see which ones are contained in the "splitting words" array.
Dim testS As String = "I am trying to split a long string based on an array of words."
Dim splitON() As String = New String() {"trying", "long", "array"}
Dim newA() As String = testS.Split(splitON, StringSplitOptions.RemoveEmptyEntries)
Something like this
Dim testS As String = "I am trying to split a long string based on a long array of words."
Dim splitON() As String = New String() {"long", "trying", "array"}
Dim result As New List(Of String)
result.Add(testS)
For Each spltr As String In splitON
    Dim NewResult As New List(Of String)
    For Each s As String In result
        Dim a() As String = Strings.Split(s, spltr)
        If a.Length <> 0 Then
            For z As Integer = 0 To a.Length - 1
                If a(z).Trim <> "" Then NewResult.Add(a(z).Trim)
                NewResult.Add(spltr)
            Next
            NewResult.RemoveAt(NewResult.Count - 1)
        End If
    Next
    result = New List(Of String)
    result.AddRange(NewResult)
Next
Peter, I hope the code below is suitable for splitting a string by an array of words using Regex:
// Input
String input = "insert into tbl1 inserttbl2 insert into tbl2 update into tbl3 " +
    "updatededle into tbl4 update into tbl5";
// Regex: split on whitespace that is followed by a whole keyword and more whitespace
String[] arrResult = Regex.Split(input, @"\s+(?=(?:insert|update|delete)\s+)",
    RegexOptions.IgnoreCase);
//Output
[0]: "insert into tbl1 inserttbl2"
[1]: "insert into tbl2"
[2]: "update into tbl3 updatededle into tbl4"
[3]: "update into tbl5"
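For anyone who wants to try the lookahead split outside .NET, here is a rough Python equivalent on the same input. Because the keyword sits inside a lookahead, it is not consumed and stays at the start of the next piece.

```python
import re

statement = (
    "insert into tbl1 inserttbl2 insert into tbl2 update into tbl3 "
    "updatededle into tbl4 update into tbl5"
)

# Split on whitespace only when a whole keyword plus whitespace follows it.
KEYWORD_BOUNDARY = re.compile(r"\s+(?=(?:insert|update|delete)\s+)", re.IGNORECASE)

pieces = KEYWORD_BOUNDARY.split(statement)
for piece in pieces:
    print(piece)
```

"inserttbl2" and "updatededle" do not trigger a split, because the keyword there is not followed by whitespace.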