VB.Net Regex.Replace

I'm trying to perform a simple regex find and replace, adding a tab into the string after some digits as outlined below.
From
a/users/12345/badges
To (the gap shown below is a tab character)
a/users/12345 /badges
I'm using the following:
s = regex.replace(s, "(a\/users\/\d*)("a\/users\/\d*\t)", $1 $2")
But I'm clearly doing something wrong.
Where am I going wrong? I know it's a stupid mistake, but help would be gratefully received.
VBVirg

You can achieve that with a mere look-ahead that will find the position right before the last /:
Dim s As String = Regex.Replace("a/users/12345/badges", "(?=/[^/]*$)", vbTab)
Output:
a/users/12345 /badges
Or you can just use LastIndexOf with Insert:
Dim str As String = "a/users/12345/badges"
Dim str2 As String = str ' stays unchanged if there is no "/"
Dim idx As Integer = str.LastIndexOf("/"c)
If idx >= 0 Then
    str2 = str.Insert(idx, vbTab)
End If
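For comparison across regex flavors, here is a minimal Java sketch of both approaches above: the lookahead replace and the last-index insert. It assumes java.util.regex treats (?=/[^/]*$) the same way .NET does, which holds for this simple pattern. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical names; a Java translation of the two VB.NET approaches above.
class TabBeforeLastSlash {
    // The lookahead matches the empty position right before the last "/"
    // (a "/" followed only by non-slash characters up to the end).
    static final Pattern BEFORE_LAST_SLASH = Pattern.compile("(?=/[^/]*$)");

    static String withLookahead(String s) {
        // Replacing an empty match inserts the tab without consuming any text.
        return BEFORE_LAST_SLASH.matcher(s).replaceAll("\t");
    }

    // Non-regex version: find the last "/" and insert a tab in front of it.
    static String withLastIndexOf(String s) {
        int idx = s.lastIndexOf('/');
        if (idx < 0) {
            return s; // no slash: leave the string unchanged
        }
        return s.substring(0, idx) + "\t" + s.substring(idx);
    }

    public static void main(String[] args) {
        String input = "a/users/12345/badges";
        System.out.println(withLookahead(input));
        System.out.println(withLastIndexOf(input));
    }
}
```

Both methods leave the original text intact and only add the tab. The regex version does this because a lookahead matches a position rather than characters.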

When I read "adding a tab into the string after some digits", I think there could be more than one set of digits appearing between forward slashes. This pattern:
"/(\d+)/"
will capture only digits that are between forward slashes and allows you to insert a tab like so:
Imports System.Text.RegularExpressions
Module Module1
    Sub Main()
        Dim str As String = "a/54321/us123ers/12345/badges"
        str = Regex.Replace(str, "/(\d+)/", String.Format("/$1{0}/", vbTab))
        Console.WriteLine(str)
        Console.ReadLine()
    End Sub
End Module
Results (NOTE: The tab spaces can vary in length):
a/54321 /us123ers/12345 /badges
When the string is "a/54321/users/12345/badges", the results are:
a/54321 /users/12345 /badges
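One caveat is worth sketching. Because "/(\d+)/" consumes the closing slash, two numeric segments in a row (such as the hypothetical input a/1/2/x) only get the first tab. The Java sketch below is a translation, not the answer's own code. It compares that pattern with a variant that checks for the closing slash with a lookahead (?=/) instead of consuming it.

```java
import java.util.regex.Pattern;

// Hypothetical names; Java translation for comparing the two patterns.
class TabAfterDigitSegments {
    // Pattern from the answer: each match consumes the trailing "/".
    static final Pattern CONSUMING = Pattern.compile("/(\\d+)/");
    // Variant: the lookahead only checks for the trailing "/", so it can still
    // act as the leading "/" of the next numeric segment.
    static final Pattern LOOKAHEAD = Pattern.compile("/(\\d+)(?=/)");

    static String consuming(String s) {
        return CONSUMING.matcher(s).replaceAll("/$1\t/");
    }

    static String lookahead(String s) {
        return LOOKAHEAD.matcher(s).replaceAll("/$1\t");
    }

    public static void main(String[] args) {
        System.out.println(consuming("a/54321/us123ers/12345/badges"));
        System.out.println(consuming("a/1/2/x")); // the "2" segment is skipped
        System.out.println(lookahead("a/1/2/x")); // both segments get a tab
    }
}
```

On the question's own input both versions give the same result. They only differ when numeric segments are adjacent.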


Replace 2 step Regex with 1 step Regex to get one upper case letter between underscores

I have a string, myFile, that looks like Name_2019-11-29_D_HPSeries.txt. I need to extract the letter D between the underscores; the letter could be any uppercase letter. Right now I am using a two-step regex:
Dim bC As String = Regex.Match(myFile, "_[A-Z]+_").ToString
boatClass = Regex.Match(bC, "[A-Z]+").ToString
This works but I believe it could be done with one line. I tried the code below but it doesn't work.
boatClass = Regex.Replace(myFile, "_[A-Z]_", "[A-Z]").ToString
You can use positive lookarounds to avoid a 2-step process, checking that the characters before and after the letter are underscores without capturing them:
Dim myFile As String = "Name_2019-11-29_D_HPSeries.txt"
Dim bC As String = Regex.Match(myFile, "(?<=_)[A-Z](?=_)").Value
Console.WriteLine(bC)
Output:
D
You were almost there with a single char A-Z, but you could wrap it in a capturing group and then use the Match.Groups property.
_([A-Z])_
For example
Dim myFile As String = "Name_2019-11-29_D_HPSeries.txt"
Dim bC As String = Regex.Match(myFile, "_([A-Z])_").Groups(1).Value
Console.WriteLine(bC)
Result
D
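Here is a small Java sketch of the same two single-step options, for readers outside .NET. The lookaround match returns just the letter. The capturing-group match includes the underscores, and the letter is read from group 1. Names are hypothetical. Returning "" on failure is an assumption chosen to mirror the empty Value of an unsuccessful .NET Match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical names; Java translation of the two answers above.
class LetterBetweenUnderscores {
    // Lookaround version: the underscores are checked but not part of the match.
    static final Pattern LOOKAROUND = Pattern.compile("(?<=_)[A-Z](?=_)");
    // Capturing-group version: the underscores are matched; the letter is group 1.
    static final Pattern GROUP = Pattern.compile("_([A-Z])_");

    static String viaLookaround(String s) {
        Matcher m = LOOKAROUND.matcher(s);
        return m.find() ? m.group() : "";
    }

    static String viaGroup(String s) {
        Matcher m = GROUP.matcher(s);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        String myFile = "Name_2019-11-29_D_HPSeries.txt";
        System.out.println(viaLookaround(myFile));
        System.out.println(viaGroup(myFile));
    }
}
```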

VBA RegEx getting String with only Number and Hyphen

I have a string with something like
Bl. 01 - 03
I want this to be reduced to only
01-03
Everything other than digits and hyphens should be removed. Any ideas how to do it using regex or any other method?
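You can use this pattern in a replace expression (set Global to True; otherwise the VBScript RegExp object replaces only the first match):
reg.Global = True
reg.Pattern = "[^\d-]+"
Debug.Print reg.Replace(yourstring, "")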
Barring a more complete description of exactly what you mean by something like "Bl. 01 - 03", this:
^.*(\d{2}\s?-\s?\d{2}).*$
Will capture the portion you seem to be interested in as group 1. If you want to get rid of the spaces as well, then something like:
^.*(\d{2})\s?-\s?(\d{2}).*$
might be more suited: the two numbers end up in groups 1 and 2, and you can put the hyphen back in the output (for example by replacing with "$1-$2").
Here's a function with a non-RegEx approach to remove anything but digits and the hyphen from a given input string:
Function removeBadChars(sInput As String) As String
    Dim i As Integer
    Dim sResult As String
    Dim sChr As String
    For i = 1 To Len(sInput)
        sChr = Mid(sInput, i, 1)
        If IsNumeric(sChr) Or sChr = "-" Then
            sResult = sResult & sChr
        End If
    Next
    removeBadChars = sResult
End Function
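Here is a compact Java sketch of the three ideas above: the negated-class replace, the two-group rebuild, and a character loop. It runs on the question's sample "Bl. 01 - 03". The names are hypothetical. The loop deliberately tests for '0' to '9' explicitly rather than porting IsNumeric.

```java
import java.util.regex.Pattern;

// Hypothetical names; Java translation of the answers above.
class DigitsAndHyphens {
    // Remove every run of characters that is neither a digit nor a hyphen.
    static final Pattern NOT_DIGIT_OR_HYPHEN = Pattern.compile("[^\\d-]+");
    // Capture two 2-digit numbers around a hyphen with optional spaces.
    static final Pattern TWO_NUMBERS = Pattern.compile("^.*(\\d{2})\\s?-\\s?(\\d{2}).*$");

    static String stripOthers(String s) {
        return NOT_DIGIT_OR_HYPHEN.matcher(s).replaceAll("");
    }

    static String rebuildFromGroups(String s) {
        // The whole string matches, so it is replaced by "NN-NN".
        return TWO_NUMBERS.matcher(s).replaceAll("$1-$2");
    }

    // Loop version: keep only the characters '0'..'9' and '-'.
    static String keepDigitsAndHyphens(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if ((c >= '0' && c <= '9') || c == '-') {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "Bl. 01 - 03";
        System.out.println(stripOthers(input));
        System.out.println(rebuildFromGroups(input));
        System.out.println(keepDigitsAndHyphens(input));
    }
}
```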

how to remove double characters and spaces from string

Please let me know how to remove double spaces and repeated characters from the string below.
String = Test----$$$$19****45#### Nothing
Clean String = Test-$19*45# Nothing
I have used the regex "\s+", but it only removes the double spaces, and the other regex patterns I tried got too complex... please help me.
I am using VB.NET.
What you'll want to do is create a backreference to any character, and then remove the following characters that match that backreference. This is usually possible with the pattern (.)\1+, which should be replaced with just that backreference (once). Exactly how it's done depends on the programming language.
Dim text As String = "Test###_&aa&&&"
Dim result As String = New Regex("(.)\1+").Replace(text, "$1")
result will now contain Test#_&a&. Alternatively, you can use a lookaround to not remove that backreference in the first place:
Dim text As String = "Test###_&aa&&&"
Dim result As String = New Regex("(?<=(.))\1+").Replace(text, "")
For a faster alternative try:
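' Requires Imports System.Text (for StringBuilder)
Dim text As String = "Test###_&aa&&&"
Dim sb As New StringBuilder(text.Length)
Dim lastChar As Char
For Each c As Char In text
    ' Append a character only when it differs from the previous one
    If c <> lastChar Then
        sb.Append(c)
        lastChar = c
    End If
Next
Console.WriteLine(sb.ToString())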
Here is a Perl way to replace every run of a repeated non-word character with a single one:
my $String = 'Test----$$$$19****45#### Nothing';
$String =~ s/(\W)\1+/$1/g;
print $String;
output:
Test-$19*45# Nothing
Here's how it would look in Java...
String raw = "Test----$$$$19****45#### Nothing";
String cleaned = raw.replaceAll("(.)\\1+", "$1");
System.out.println(raw);
System.out.println(cleaned);
prints
Test----$$$$19****45#### Nothing
Test-$19*45# Nothing
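The answers above use two different patterns, and the difference matters for real text. (.)\1+ collapses any repeated character, including letters. (\W)\1+ only touches non-word characters such as spaces and punctuation. Here is a small Java sketch contrasting them. The class name and the "Hello  world!!" sample are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical names; compares the two patterns used in the answers above.
class CollapseRepeats {
    // Any repeated character is collapsed, letters included ("ll" -> "l").
    static final Pattern ANY_REPEAT = Pattern.compile("(.)\\1+");
    // Only repeated non-word characters are collapsed (the Perl answer's pattern);
    // letters, digits and "_" are left alone.
    static final Pattern NON_WORD_REPEAT = Pattern.compile("(\\W)\\1+");

    static String collapseAny(String s) {
        return ANY_REPEAT.matcher(s).replaceAll("$1");
    }

    static String collapseNonWord(String s) {
        return NON_WORD_REPEAT.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        String sample = "Hello  world!!";
        System.out.println(collapseAny(sample));     // "ll" is collapsed as well
        System.out.println(collapseNonWord(sample)); // only the spaces and "!!"
    }
}
```

For the question's own string both patterns give the same result, because no letters repeat there. For text with double letters, the \W version is likely the safer choice.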

regex not matching correctly

First of all, I would like an opinion on whether regex is even the best solution here. I'm fairly new to this area, and regex is the first thing I found; it seemed fairly easy to use until I needed to grab a long section of text out of a line, lol. I'm using regex from VB.NET.
Basically, I'm taking this line here:
21:24:55 "READ/WRITE: ['PASS',false,'27880739',[40,[459.313,2434.11,0.00221252]],[["ItemFlashlight","ItemWatch","ItemMap","ItemKnife","ItemEtool","ItemGPS","ItemHatchet","ItemCompass","ItemMatchbox","M9SD","ItemFlashlightRed","NVGoggles","Binocular_Vector","ItemToolbox","M4A1_AIM_SD_camo"],["ItemPainkiller","ItemMorphine","ItemSodaPepsi","FoodSteakCooked",["30Rnd_556x45_StanagSD",29],"30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD",["15Rnd_9x19_M9SD",12],["15Rnd_9x19_M9SD",10],"15Rnd_9x19_M9SD","15Rnd_9x19_M9SD","ItemBandage"]],["DZ_Backpack_EP1",[["BAF_AS50_TWS"],[1]],[["FoodSteakCooked","ItemPainkiller","ItemMorphine","ItemSodaCoke","5Rnd_127x99_as50","ItemBloodbag"],[2,1,1,2,4,1]]],[316,517,517],Sniper1_DZ,0.94]"
Using the following regex:
\[\[([\w|_|\""|,]*)\],\[([\w|_|\""|,|\[|\]]*)\]\],
To try and get the following:
[["ItemFlashlight","ItemWatch","ItemMap","ItemKnife","ItemEtool","ItemGPS","ItemHatchet","ItemCompass","ItemMatchbox","M9SD","ItemFlashlightRed","NVGoggles","Binocular_Vector","ItemToolbox","M4A1_AIM_SD_camo"],["ItemPainkiller","ItemMorphine","ItemSodaPepsi","FoodSteakCooked",["30Rnd_556x45_StanagSD",29],"30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD","30Rnd_556x45_StanagSD",["15Rnd_9x19_M9SD",12],["15Rnd_9x19_M9SD",10],"15Rnd_9x19_M9SD","15Rnd_9x19_M9SD","ItemBandage"]]
However, either my regex is flawed or my VB.NET code is. It only displays the following data:
[["ItemFlashlight","ItemWatch","ItemMap","ItemKnife","ItemEtool","ItemGPS","ItemHatchet","ItemCompass","ItemMatchbox","M9SD","ItemFlashlightRed","NVGoggles","Binocular_Vector","ItemToolbox","M4A1_AIM_SD_camo"],["ItemPainkiller","ItemMorphine","ItemSodaPepsi",
Here is my VB.NET code, in case you need to look at it:
ListView1.Clear()
Call initList(Me.ListView1)
My.Computer.FileSystem.CurrentDirectory = My.Settings.cfgPath
My.Computer.FileSystem.CopyFile("arma2oaserver.RPT", "tempRPT.txt")
Dim ScriptLine As String = ""
Dim path As String = My.Computer.FileSystem.CurrentDirectory & "\tempRPT.txt"
Dim lines As String() = IO.File.ReadAllLines(path, System.Text.Encoding.Default)
Dim que = New Queue(Of String)(lines)
ProgressBar1.Maximum = lines.Count + 1
ProgressBar1.Value = 0
Do While que.Count > 0
    ScriptLine = que.Dequeue()
    ScriptLine = LCase(ScriptLine)
    If InStr(ScriptLine, "login attempt:") Then
        Dim rtime As Match = Regex.Match(ScriptLine, ("(\d{1,2}:\d{2}:\d{2})"))
        Dim nam As Match = Regex.Match(ScriptLine, "\""([^)]*)\""")
        Dim name As String = nam.ToString.Replace("""", "")
        Dim next_line As String = que.Peek 'Read next line temporarily 'This is where it would move to next line temporarily to read from it
        next_line = LCase(next_line)
        If InStr(next_line, "read/write:") > 0 Then 'Or InStr(next_line, "update: [b") > 0 Then 'And InStr(next_line, "setmarkerposlocal.sqf") < 1 Then
            Dim coords As Match = Regex.Match(next_line, "\[(\d+)\,\[(-?\d+)\.\d+\,(-?\d+)\.\d+,([\d|.|-]+)\]\]")
            Dim inv As Match = Regex.Match(next_line, "\[\[([\w|_|\""|,]*)\],\[([\w|_|\""|,|\[|\]]*)\]\],") '\[\[([\w|_|\""|,]*)\],\[([\w|_|\""|,|\[|\]]*)\]\],
            '\[\[([\w|_|\""|,]*)\],\[([\w|_|\""|,|\[|\]]*)\]\]:\[([\w|_|\""|,|\[|\]]*)\]\:
            Dim back As Match = Regex.Match(next_line, "\""([\w|_]+)\"",\[\[([\w|_|\""|,]*)\],\[([\d|,]*)\]\],\[\[([\w|_|\""|,]*)\],\[([\d|,]*)\]\]")
            Dim held As Match = Regex.Match(next_line, "\[\""([\w|_|\""|,]+)\""\,\d+\]")
            With Me.ListView1
                .Items.Add(name.ToString)
                With .Items(.Items.Count - 1).SubItems
                    .Add(rtime.ToString)
                    .Add(coords.ToString)
                    .Add(inv.ToString)
                    .Add(back.ToString)
                    .Add(held.ToString)
                End With
            End With
        End If
    End If
    ProgressBar1.Value += 1
Loop
My.Computer.FileSystem.DeleteFile("tempRPT.txt")
ProgressBar1.Value = 0
The odd thing is, when I test my regex in Expresso it gets the full, correct match. So I don't know what I'm doing wrong.
I'm not sure what's wrong with the regex you have, but the first match of this one seems to work fine:
\[\[.*?\]\]
Hope this helps.
-EDIT-
The problem isn't the regex; it's that the ListView is truncating the display of the string.
Try this regular expression instead: \[\[(?:(?!\[\[).)+\]\] (.NET regex does not support \Q...\E quoting, so each bracket is escaped individually)
http://regex101.com/r/zP1aC5
If you need a capture group, use \[\[((?:(?!\[\[).)+)\]\]
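As a sanity check on the approach, here is a Java sketch of both suggested patterns, the lazy \[\[.*?\]\] and the "[[ ... ]]" that never crosses another "[[". It runs on a hypothetical, shortened line with the same shape as the RPT log line. Printing the match length rather than relying on what a UI control shows helps separate a regex problem from a display problem. Class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical names and sample data; Java translation of the patterns above.
class InventoryBlock {
    // Lazy version: stop at the first "]]" after the opening "[[".
    static final Pattern LAZY = Pattern.compile("\\[\\[.*?\\]\\]");
    // Tempered version: never run past another "[[" before the closing "]]".
    static final Pattern TEMPERED = Pattern.compile("\\[\\[(?:(?!\\[\\[).)+\\]\\]");

    static String firstMatch(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        // Made-up, shortened line in the same shape as the log line in the question.
        String line = "[40,[1.5,2.5,0.1]],[[\"A\",\"B\"],[\"C\",[\"D\",29],\"E\"]],"
                + "[\"BP\",[[\"F\"],[1]],[[\"G\"],[2]]],[316],X,0.94]";
        String block = firstMatch(TEMPERED, line);
        System.out.println(block);
        System.out.println(block.length()); // full length, independent of any UI display
        System.out.println(firstMatch(LAZY, line));
    }
}
```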
Perhaps you should specify whether you are working with single-line or multi-line input text. RegexOptions.Singleline makes . match newline characters too, while RegexOptions.Multiline makes ^ and $ match at the start and end of every line. Depending on your input text format, try:
Dim variableName As Match = Regex.Match("input", "pattern", RegexOptions.Singleline)
or
Dim variableName As Match = Regex.Match("input", "pattern", RegexOptions.Multiline)
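Java's Pattern.DOTALL and Pattern.MULTILINE flags play the same roles as .NET's RegexOptions.Singleline and RegexOptions.Multiline. Here is a minimal sketch of the difference on a made-up two-line string. The helper name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical names; shows what the single-line / multi-line flags change.
class LineModes {
    static boolean finds(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line";
        // "." normally stops at "\n"; DOTALL (like Singleline) lets it cross.
        System.out.println(finds("first.*second", 0, text));
        System.out.println(finds("first.*second", Pattern.DOTALL, text));
        // "^" normally means start of input; MULTILINE makes it match after each "\n".
        System.out.println(finds("^second", 0, text));
        System.out.println(finds("^second", Pattern.MULTILINE, text));
    }
}
```

Neither flag changes anything for a pattern that uses no ".", "^" or "$". In that case the input format is not the cause of a short match.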

How to extract substring in parentheses using Regex pattern

This is probably a simple problem, but unfortunately I wasn't able to get the results I wanted...
Say, I have the following line:
"Wouldn't It Be Nice" (B. Wilson/Asher/Love)
I would have to look for this pattern:
" (<any string>)
In order to retrieve:
B. Wilson/Asher/Love
I tried something like "" (([^))]*)) but it doesn't seem to work. Also, I'd like to use Match.Submatches(0) so that might complicate things a bit because it relies on brackets...
Edit: After examining your document, the problem is that there are non-breaking spaces before the parentheses, not regular spaces. So this regex should work: ""[ \xA0]*\(([^)]+)\)
"" 'quote (twice to escape)
[ \xA0]* 'zero or more non-breaking (\xA0) or regular spaces
\( 'left parenthesis
( 'open capturing group
[^)]+ 'anything not a right parenthesis
) 'close capturing group
\) 'right parenthesis
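For a quick test outside VBA, here is the same pattern in Java, where \xA0 also denotes the non-breaking space. The sample strings are the question's example with either a regular or a non-breaking space (\u00A0) before the parenthesis. Names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical names; Java translation of the pattern broken down above.
class ParensAfterQuote {
    // A quote, then any mix of regular and non-breaking spaces (U+00A0),
    // then "(" ... ")" with the contents captured in group 1.
    static final Pattern AFTER_QUOTE = Pattern.compile("\"[ \\xA0]*\\(([^)]+)\\)");

    static String extract(String s) {
        Matcher m = AFTER_QUOTE.matcher(s);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        String withNbsp = "\"Wouldn't It Be Nice\"\u00A0(B. Wilson/Asher/Love)";
        String withSpace = "\"Wouldn't It Be Nice\" (B. Wilson/Asher/Love)";
        System.out.println(extract(withNbsp));
        System.out.println(extract(withSpace));
    }
}
```

Because the "(" must directly follow a quote plus optional spaces, parentheses inside the quoted title are not picked up.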
In a function:
' Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
Public Function GetStringInParens(search_str As String) As String
    Dim regEx As New VBScript_RegExp_55.RegExp
    Dim matches
    GetStringInParens = ""
    regEx.Pattern = """[ \xA0]*\(([^)]+)\)"
    regEx.Global = True
    If regEx.Test(search_str) Then
        Set matches = regEx.Execute(search_str)
        GetStringInParens = matches(0).SubMatches(0)
    End If
End Function
Not strictly an answer to your question, but sometimes, for things this simple, good ol' string functions are less confusing and more concise than Regex.
Function BetweenParentheses(s As String) As String
    BetweenParentheses = Mid(s, InStr(s, "(") + 1, _
        InStr(s, ")") - InStr(s, "(") - 1)
End Function
Usage:
Debug.Print BetweenParentheses("""Wouldn't It Be Nice"" (B. Wilson/Asher/Love)")
'B. Wilson/Asher/Love
Edit: @alan points out that this will falsely match the contents of parentheses in the song title. This is easily circumvented with a small modification that only searches after the last quote:
Function BetweenParentheses(s As String) As String
    Dim iEndQuote As Long
    Dim iLeftParenthesis As Long
    Dim iRightParenthesis As Long
    iEndQuote = InStrRev(s, """")
    ' No quote found: search from the start (InStr needs a start position >= 1)
    If iEndQuote = 0 Then iEndQuote = 1
    iLeftParenthesis = InStr(iEndQuote, s, "(")
    iRightParenthesis = InStr(iEndQuote, s, ")")
    If iLeftParenthesis <> 0 And iRightParenthesis > iLeftParenthesis Then
        BetweenParentheses = Mid(s, iLeftParenthesis + 1, _
            iRightParenthesis - iLeftParenthesis - 1)
    End If
End Function
Usage:
Debug.Print BetweenParentheses("""Wouldn't It Be Nice"" (B. Wilson/Asher/Love)")
'B. Wilson/Asher/Love
Debug.Print BetweenParentheses("""Don't talk (yell)""")
' returns empty string
Of course this is less concise than before!
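The same string-function idea carries over directly to other languages. Here is a minimal Java sketch, assuming the artist part always comes after the last double quote. It falls back to searching from the start when there is no quote. The class and method names are hypothetical.

```java
// Hypothetical names; Java translation of the quote-aware string-function approach.
class BetweenParens {
    // Look for "(" and ")" only after the last double quote, so parentheses
    // inside the quoted song title are skipped.
    static String betweenParentheses(String s) {
        int start = Math.max(s.lastIndexOf('"'), 0); // no quote: search from the start
        int left = s.indexOf('(', start);
        int right = s.indexOf(')', start);
        if (left < 0 || right < left) {
            return ""; // no "(...)" after the title
        }
        return s.substring(left + 1, right);
    }

    public static void main(String[] args) {
        System.out.println(betweenParentheses("\"Wouldn't It Be Nice\" (B. Wilson/Asher/Love)"));
        System.out.println(betweenParentheses("\"Don't talk (yell)\"").isEmpty());
    }
}
```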
This is a nice regex:
".*\(([^)]*)
In VBA/VBScript:
' Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
Dim myRegExp As RegExp
Dim myMatches As MatchCollection
Dim myMatch As Match
Dim ResultString As String
Set myRegExp = New RegExp
myRegExp.Pattern = """.*\(([^)]*)"
Set myMatches = myRegExp.Execute(SubjectString)
If myMatches.Count >= 1 Then
    Set myMatch = myMatches(0)
    ' The pattern has one capturing group, so its text is SubMatches(0)
    If myMatch.SubMatches.Count >= 1 Then
        ResultString = myMatch.SubMatches(0)
    Else
        ResultString = ""
    End If
Else
    ResultString = ""
End If
This matches
Put Your Head on My Shoulder
in
"Don't Talk (Put Your Head on My Shoulder)"
Update 1
I let the regex loose on your doc file and it matches as requested, so I'm quite sure the regex is fine. I'm not fluent in VBA/VBScript, but my guess is that's where it goes wrong.
If you want to discuss the regex further, that's fine with me. I'm not eager to start digging into this VBScript API, which looks arcane.
Given the new input the regex is tweaked to
".*".*\(([^)]*)
So that it doesn't falsely match (Put Your Head on My Shoulder) which appears inside the quotes.
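Here is a short Java sketch that checks the tweak. The original pattern happily captures the parenthesised part of a quoted title. The tweaked pattern requires a closing quote before the "(", so it finds nothing there and still extracts the artists when they follow the title. Inputs are the example strings from this thread. Names are hypothetical, and null is used here to signal "no match".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical names; compares the original and tweaked patterns above.
class QuoteAwareRegex {
    static final Pattern ORIGINAL = Pattern.compile("\".*\\(([^)]*)");
    // A second quote must appear before the "(", i.e. the "(" follows the title.
    static final Pattern TWEAKED = Pattern.compile("\".*\".*\\(([^)]*)");

    static String group1(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group(1) : null; // null means no match
    }

    public static void main(String[] args) {
        String titleOnly = "\"Don't Talk (Put Your Head on My Shoulder)\"";
        String withArtists = "\"Wouldn't It Be Nice\" (B. Wilson/Asher/Love)";
        System.out.println(group1(ORIGINAL, titleOnly));
        System.out.println(group1(TWEAKED, titleOnly));
        System.out.println(group1(TWEAKED, withArtists));
    }
}
```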
This function worked on your example string:
Function GetArtist(songMeta As String) As String
    Dim artist As String
    ' split string by "(" and take the last portion
    artist = Split(songMeta, "(")(UBound(Split(songMeta, "(")))
    ' remove closing parenthesis
    artist = Replace(artist, ")", "")
    ' return the result
    GetArtist = artist
End Function
Ex:
Sub Test()
    Dim songMeta As String
    songMeta = """Wouldn't It Be Nice"" (B. Wilson/Asher/Love)"
    Debug.Print GetArtist(songMeta)
End Sub
prints "B. Wilson/Asher/Love" to the Immediate Window.
It also solves the problem alan mentioned. Ex:
Sub Test()
    Dim songMeta As String
    songMeta = """Wouldn't (It Be) Nice"" (B. Wilson/Asher/Love)"
    Debug.Print GetArtist(songMeta)
End Sub
also prints "B. Wilson/Asher/Love" to the Immediate Window, unless of course the artist names also include parentheses.
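Here is the split-and-take-last idea as a minimal Java sketch, using the two example strings from this answer. Note that String.split takes a regex, so "(" has to be escaped. The names are hypothetical.

```java
// Hypothetical names; Java translation of the Split-based GetArtist approach.
class ArtistBySplit {
    static String getArtist(String songMeta) {
        // split takes a regex, so "(" must be escaped; keep the last piece
        String[] parts = songMeta.split("\\(");
        String last = parts[parts.length - 1];
        // remove the closing parenthesis
        return last.replace(")", "");
    }

    public static void main(String[] args) {
        System.out.println(getArtist("\"Wouldn't It Be Nice\" (B. Wilson/Asher/Love)"));
        System.out.println(getArtist("\"Wouldn't (It Be) Nice\" (B. Wilson/Asher/Love)"));
    }
}
```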
Here is another regex, tested with VBScript: (?:\()(.*)(?:\))
Data = """Wouldn't It Be Nice"" (B. Wilson/Asher/Love)"
wscript.echo Extract(Data)
'---------------------------------------------------------------
Function Extract(Data)
    Dim strPattern, oRegExp, Matches
    strPattern = "(?:\()(.*)(?:\))"
    Set oRegExp = New RegExp
    oRegExp.IgnoreCase = True
    oRegExp.Pattern = strPattern
    Set Matches = oRegExp.Execute(Data)
    If Matches.Count > 0 Then Extract = Matches(0).SubMatches(0)
End Function
'---------------------------------------------------------------
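One property of this pattern worth seeing: (.*) is greedy, so the capture runs from the first "(" to the last ")" in the string. Here is a Java sketch of that behavior on the two example strings from this thread. The second result shows what happens when the title itself contains parentheses. Names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical names; illustrates how the greedy pattern above behaves.
class GreedyParens {
    // Same pattern as the VBScript answer: (.*) spans first "(" to last ")".
    static final Pattern GREEDY = Pattern.compile("(?:\\()(.*)(?:\\))");

    static String extract(String data) {
        Matcher m = GREEDY.matcher(data);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        System.out.println(extract("\"Wouldn't It Be Nice\" (B. Wilson/Asher/Love)"));
        // Parentheses in the title: the capture starts at the title's "(".
        System.out.println(extract("\"Wouldn't (It Be) Nice\" (B. Wilson/Asher/Love)"));
    }
}
```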
I think you need a better data file ;) You might want to consider pre-processing the file to a temp file for modification, so that outliers that don't fit your pattern are modified to where they'll meet your pattern. It's a bit time consuming to do, but it is always difficult when a data file lacks consistency.