Replacing Characters in String with a Double Quote - regex

This is my string:
<HOLDERS><ACCOUNTHOLDER Title="" Initials="" FirstName="" Surname="" Name="AN'A"N&D & TEST'S"I&X" CifKey="ANA"D.TSX000" CustomerType="2" PrimaryPan="00027272898"/></HOLDERS>
How do I replace the double quotes " inside the Name and CifKey values with
&quot;
while keeping the double quotes everywhere else in the string?
The output should be:
<HOLDERS><ACCOUNTHOLDER Title="" Initials="" FirstName="" Surname="" Name="AN'A&quot;N&D & TEST'S&quot;I&X" CifKey="ANA&quot;D.TSX000" CustomerType="2" PrimaryPan="00027272898"/></HOLDERS>

I have written out every step for clarity, but you can wrap this in a function and reduce it to a few lines.
It probably isn't the cleanest way, but get it working first and improve it afterwards. Assuming txt is your string:
index1 = InStr(txt, " Name=") 'the space before Name is important so it isn't confused with FirstName, Surname and so on
index2 = InStr(txt, " CifKey=") 'the space before CifKey is important too
index3 = InStr(txt, " CustomerType=") 'and so is the space before CustomerType
'Extract both values before changing txt, so the indexes stay valid.
'Each value starts after the opening quote and stops before the closing quote.
SubStrName = Mid(txt, index1 + 7, index2 - index1 - 8)
SubStrCifKey = Mid(txt, index2 + 9, index3 - index2 - 10)
txt = Replace(txt, SubStrName, "##AAA##") 'swap the value for a placeholder that you are sure never occurs in the text
NewChunk1 = Replace(SubStrName, """", "&quot;")
txt = Replace(txt, "##AAA##", NewChunk1)
txt = Replace(txt, SubStrCifKey, "##BBB##")
NewChunk2 = Replace(SubStrCifKey, """", "&quot;")
txt = Replace(txt, "##BBB##", NewChunk2)
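Since the question is tagged regex, here is a minimal sketch of the same idea as a regular expression, written in Python purely for illustration. The function name escape_inner_quotes and its attrs parameter are made up for this example. It assumes that the closing quote of a value is always followed by the next attribute (a space, a name and =") or by the end of the tag, so a value that itself contains text like " Foo=" would defeat it.

```python
import re

def escape_inner_quotes(xml_text, attrs=("Name", "CifKey")):
    """Replace stray double quotes inside the given attribute values with &quot;."""
    for attr in attrs:
        # Group 1: ' Name="' (the leading space keeps FirstName/Surname out).
        # Group 2: the value, matched lazily.
        # Group 3: the closing quote, recognised by what follows it:
        #          the next attribute ( Word=") or the end of the tag.
        pattern = re.compile(
            r'( ' + re.escape(attr) + r'=")(.*?)("(?= \w+="|\s*/?>))'
        )
        xml_text = pattern.sub(
            lambda m: m.group(1) + m.group(2).replace('"', '&quot;') + m.group(3),
            xml_text,
        )
    return xml_text
```

Calling escape_inner_quotes(txt) on the string from the question leaves Title, Initials and the other attributes untouched and only rewrites the quotes inside Name and CifKey.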

Related

Select text strings with multiple formatting tags within

Context:
VB.NET application using htmlagility pack to handle html document.
Issue:
In an HTML document, I'd like to prefix every string that starts with # and ends with a space with a URL, whatever formatting tags are used within it.
So #sth would become http://www.anything.tld/sth
For instance:
Before:
<p>#string1</p> blablabla
<p><strong>#stri</strong>ng2</p> bliblibli
After:
<p>#string1 blablabla</p>
<p><strong>#stri</strong>ng2 bliblibli</p>
I guess I can achieve this with HTML Agility Pack, but how do I select the entire text string without its formatting?
Or should I use a simple regex replace routine?
Here's my solution. I'm sure it would make some experienced developers bleed from every hole, but it actually works.
The HTML code is in strCorpusHtmlContent.
Dim matchsHashtag As MatchCollection
Dim matchHashtag As Match
Dim captureHashtag As Capture
Dim strHashtagFormatted As String
Dim strRegexPatternHashtag As String = "#([\s]*)(\w*)"
matchsHashtag = Regex.Matches(strCorpusHtmlContent, strRegexPatternHashtag)
For Each matchHashtag In matchsHashtag
    For Each captureHashtag In matchHashtag.Captures
        Dim strHashtagToFormat As String
        Dim strHashtagValueToFormat As String
        ' Test if the hashtag is followed by a tag
        If Mid(strCorpusHtmlContent, captureHashtag.Index + captureHashtag.Length + 1, 1) = "<" Then
            strHashtagValueToFormat = captureHashtag.Value
            Dim intStartPosition As Integer = captureHashtag.Index + captureHashtag.Length + 1
            Dim intSpaceCharPostion As Integer = intStartPosition
            Dim nextChar As Char
            Dim blnInATag As Boolean = True
            Do Until (nextChar = " " Or nextChar = vbCr Or nextChar = vbLf Or nextChar = vbCrLf) And blnInATag = False
                nextChar = CChar(Mid(strCorpusHtmlContent, intSpaceCharPostion + 1, 1))
                If nextChar = "<" Then
                    blnInATag = True
                ElseIf nextChar = ">" Then
                    blnInATag = False
                End If
                If blnInATag = False And nextChar <> ">" And nextChar <> " " Then
                    strHashtagValueToFormat &= nextChar
                End If
                intSpaceCharPostion += 1
            Loop
            strHashtagToFormat = Mid(strCorpusHtmlContent, captureHashtag.Index + 1, intSpaceCharPostion - captureHashtag.Length)
        Else
            strHashtagToFormat = captureHashtag.Value
        End If
        strHashtagFormatted = "" & strHashtagToFormat & ""
        strCorpusHtmlContent = Regex.Replace(strCorpusHtmlContent, strHashtagToFormat, strHashtagFormatted)
    Next
Next
Before:
<p>#has<strong>hta</strong><em>g_m</em>u<span style="text-decoration: underline;">ltifortmat</span> to convert</p>
After:
<p>#has<strong>hta</strong><em>g_m</em>u<span style="text-decoration: underline;">ltiformat</span> to convert</p>
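The scanning idea used above (walk the characters, skip anything between < and >, stop at the first non-word character outside a tag) can be sketched more compactly. The following is only an illustration in Python, not a port of the VB.NET code; find_hashtags is a hypothetical helper. It assumes reasonably well-formed markup, treats a hashtag as letters, digits and underscores, and does not special-case numeric entities such as &#39;.

```python
def find_hashtags(html):
    """Return (start, end, text) for each hashtag, ignoring tags inside it.

    start/end delimit the hashtag in the original markup (from '#' to the
    last hashtag character); text is the hashtag with the markup removed.
    """
    found = []
    i, n = 0, len(html)
    while i < n:
        ch = html[i]
        if ch == "<":
            # Skip a whole tag, so '#' inside attributes is never picked up.
            close = html.find(">", i)
            i = n if close == -1 else close + 1
        elif ch == "#":
            start, end, chars = i, i + 1, []
            i += 1
            while i < n:
                ch = html[i]
                if ch == "<":
                    # A formatting tag inside the hashtag: jump over it.
                    close = html.find(">", i)
                    i = n if close == -1 else close + 1
                elif ch.isalnum() or ch == "_":
                    chars.append(ch)
                    i += 1
                    end = i
                else:
                    break  # whitespace or punctuation ends the hashtag
            if chars:
                found.append((start, end, "".join(chars)))
        else:
            i += 1
    return found
```

The text component is what would be appended to the base address from the question, for example "http://www.anything.tld/" + text. Wrapping the start/end span in a link still needs care, because the span can contain unbalanced opening or closing tags.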

Excel - Extract all occurrences of a String Pattern + the subsequent 4 characters after the pattern match from a cell

I am struggling with a huge Excel sheet where I need to extract from a certain cell (A1),
all occurrences of a string pattern e.g. "TCS" + the following 4 characters after the pattern match e.g. TCS1234 comma-separated into another cell (B1).
Example:
Cell A1 contains the following string:
HRS164, SRS3439(s), SRS3440(s), SRS3441(s), SRS3442(s), SRS3443(s), SRS3444(s), SRS3445(s), SRS3449(s), SRS3450(s), SRS3451(s), SRS3452(s), SYSBASE.SSS300(s), TCS3715(s), TCS3716(s), TCS3717(s), TCS4037(s), TCS1234
All TCS-Numbers shall be comma-separated in B1:
TCS3715, TCS3716, TCS3717, TCS4037, TCS1234
It is not necessary to also extract the "(s)" that follows.
Could someone please help me (excel rookie) with this challenge?
TIA Erika
Here is what I would use for something like that, also a user-defined function:
Function GetTCS(TheString)
    For Each TItem In Split(TheString, ", ")
        If Left(TItem, 3) = "TCS" Then GetTCS = GetTCS & TItem & " "
    Next
    GetTCS = Replace(Trim(GetTCS), " ", ", ")
End Function
This returns "TCS3715(s), TCS3716(s), TCS3717(s), TCS4037(s), TCS1234" out of your string. If you don't know how to create a user-defined function, just ask; it's pretty straightforward and I'd be happy to show you. Hope this helps.
Try the following user-defined function:
Public Function Xtract(r As Range) As String
    Dim s As String, L As Long, U As Long
    Dim msg As String, i As Long
    Dim ary As Variant
    ' Strip spaces so that every item starts right after a comma
    s = Replace(r(1).Text, " ", "")
    ary = Split(s, ",")
    L = LBound(ary)
    U = UBound(ary)
    Xtract = ""
    msg = ""
    For i = L To U
        If Left(ary(i), 3) = "TCS" Then
            ' Keep "TCS" plus the 4 characters that follow it
            If msg = "" Then
                msg = Left(ary(i), 7)
            Else
                msg = msg & ", " & Left(ary(i), 7)
            End If
        End If
    Next i
    Xtract = msg
End Function
If the TCS-parts are always at the end of the string as in your example, I would use (in B1):
=REPLACE(A1,1,FIND("TCS",A1)-1,"")
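For comparison, the "pattern plus the next four characters" requirement maps directly onto a regular expression. This is a small Python sketch for illustration only, not something that runs inside Excel; extract_codes and its parameters are names invented for the example.

```python
import re

def extract_codes(cell_text, prefix="TCS", extra_chars=4):
    """Return every prefix + N following characters, joined with ', '."""
    # re.escape keeps the prefix literal; .{4} takes the next four characters.
    pattern = re.escape(prefix) + ".{" + str(extra_chars) + "}"
    return ", ".join(re.findall(pattern, cell_text))
```

Because the pattern stops after four characters, a trailing "(s)" is dropped automatically, and the TCS entries do not have to be at the end of the string.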

VB.Net help selecting first index of string with regex

I was wondering if there is a way to start a selection from the match of the regex I have in the example below.
The example below works exactly how I want it to; however, if there is matching text earlier, on another line, it selects and highlights the wrong text.
What I'm wondering is whether there is a way to get the start index of the regex match?
If Regex.IsMatch(Me.TextBox1.Text, "\b" + Regex.Escape("is") + "\b") Then
    Me.TextBox1.SelectionStart = Me.TextBox1.Text.IndexOf("is")
    Dim linenumber As Integer = Me.TextBox1.GetLineFromCharIndex(Me.TextBox1.Text.IndexOf("is"))
    Me.TextBox1.SelectionLength = Me.TextBox1.Lines(linenumber).Length
    Me.TextBox1.Focus()
    Me.TextBox1.SelectedText = "is " & Me.TextBox2.Text
End If
The System.Text.RegularExpressions.Match object has a property that should help you here: Match.Index. Match.Index tells you where the match starts, and Match.Length tells you how long it is. Using those, you could change your code to look like this:
If Regex.IsMatch(Me.TextBox1.Text, "\b" + Regex.Escape("is") + "\b") Then
    Dim m As Match
    m = Regex.Match(Me.TextBox1.Text, "\b" + Regex.Escape("is") + "\b")
    ' m.Index is the position of the whole-word match, not of the first "is" substring
    Me.TextBox1.SelectionStart = m.Index
    Dim linenumber As Integer = Me.TextBox1.GetLineFromCharIndex(m.Index)
    Me.TextBox1.SelectionLength = Me.TextBox1.Lines(linenumber).Length
    Me.TextBox1.Focus()
    Me.TextBox1.SelectedText = "is " & Me.TextBox2.Text
End If
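The difference between a plain substring search and the position reported by the regex match is easy to see in isolation. Here is a minimal Python sketch, for illustration only; first_word_match is a made-up helper, and the match object's start() plays the role of Match.Index.

```python
import re

def first_word_match(text, word):
    """Return (index, length) of the first whole-word match, or None."""
    m = re.search(r"\b" + re.escape(word) + r"\b", text)
    if m is None:
        return None
    return m.start(), m.end() - m.start()

text = "this line\nthat is here"
substring_index = text.find("is")           # finds the "is" inside "this"
word_index = first_word_match(text, "is")   # finds the standalone word
```

Here substring_index points into "this" on the first line, while word_index points at the standalone "is" on the second line, which is the mismatch described in the question.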

find multiple regex patterns using vbscript

Sorry, but I am a bit new to RegEx and hope someone is able to help.
Files in question:
Apples.A.Tasty.Treat.Author-JoeDirt.doc
Cooking with Apples Publisher-Oscar Publishing.txt
Candied.Treats.Author-JenBloc.Publisher-Event.docx
I currently use this piece of vbscript code to replace spaces or dashes in the filename with a period but I am wondering if there is a more efficient way to accomplish this?
Set colRegExMatches = strRegEx.Execute(objSourceFile.Name)
For Each objRegExMatch in colRegExMatches
    strResult = InStr(objSourceFile.Name, objRegExMatch)
    objTargetFile = Left(objSourceFile.Name, (strResult -1)) & objRegExMatch.Value
    objTargetFile = Replace(objSourceFile.Name, " ", ".", 1, -1, 1)
    objTargetFile = Replace(objSourceFile.Name, "-", ".", 1, -1, 1)
    objSourceFile.Name = objTargetFile
Next
Once the script above is complete, I have the following list of files:
Apples.A.Tasty.Treat.Author-JoeDirt.doc
Cooking.with.Apples.Publisher-Oscar.Publishing.txt
Candied.Treats.Author-JenBloc.Publisher-Event.docx
Now, I want to find anything beginning with Author or Publisher and simply delete the text until the extension.
myRegEx.Pattern = (?:Author|Publisher)+[\w-]+\.
This mostly works, except when there is an additional period that adds a second part of the publisher name, a year of publication or a book number.
Apples.A.Tasty.Treat.doc
Cooking.with.Apples.Publishing.txt
Candied.Treats.docx
I tried this code and it seems to work but I have to specify the file extensions.
myRegEx.Pattern = (?:Author|Publisher)[\w-](\S*\B[^txt|docx|doc][\w-].)
If I try the following, it strips the extension for the Candied.Treats file
myRegEx.Pattern = (?:Author|Publisher)[\w-](\S*\B[^][\w-].)
Apples.A.Tasty.Treat.doc
Cooking.with.Apples.txt
Candied.Treats.
I have been using the RegExr Builder at http://gskinner.com/RegExr to test my patterns but am at a loss right now. Finally once my pattern is working as expected how do I use that in my vbscript? Do I simply add a new line as per below?
objTargetFile = Replace(objSourceFile.Name, "(?:Author|Publisher)[\w-](\S*\B[^txt|docx|pdf|doc][\w-].)", "", 1, -1, 1)
Thanks.
This is the new vbscript code which seems to do nothing.
strFixChars = InputBox("Do you want to replace spaces, dashes and strip tags? (Y/N)", "Confirmation")
Set strRegEx = new RegExp
For Each objSourceFile in colSourceFiles
    strFileExt = objFSO.GetExtensionName(objSourceFile)
    objLogFile.WriteLine "Input File: " & objSourceFile.Name
    strCount = Len(objSourceFile.Name)
    strRegEx.Pattern = "(?:Author|Publisher)(.+)\."
    strRegEx.IgnoreCase = True
    strRegEx.Global = True
    Set colRegExMatches = strRegEx.Execute(objSourceFile.Name)
    For Each objRegExMatch in colRegExMatches
        strResult = InStr(objSourceFile.Name, objRegExMatch)
        objTargetFile = Left(objSourceFile.Name, (strResult -1)) & objRegExMatch.Value
        If strFixChars = "Y" Then
            objTargetFile = Replace(objSourceFile.Name, " ", ".")
            objTargetFile = Replace(objSourceFile.Name, "-", ".")
            objTargetFile = Replace(objSourceFile.Name, "(?:Author|Publisher)(.+)\.", "")
        End If
        objLogFile.WriteLine "Output File: " & objTargetFile
        strFileList = strFileList & vbCrlf & objTargetFile
    Next
Next
A quick fix for your regex would be to use (?:Author|Publisher)(.+)\. You will have to replace the first matching group with an empty string in vbscript.
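To see what that pattern does to the three file names, here is a small sketch in Python, for illustration only; strip_tags is a hypothetical helper. It relies on .+ being greedy, so the match runs from the first Author or Publisher up to the last period, and it removes the whole match. The dot that preceded the tag stays behind and ends up in front of the extension. In VBScript the counterpart would be the RegExp object's own Replace method, since the built-in Replace function treats its arguments as plain text.

```python
import re

# Greedy .+ extends to the LAST period, i.e. the one before the extension.
TAG_PATTERN = re.compile(r"(?:Author|Publisher).+\.", re.IGNORECASE)

def strip_tags(filename):
    """Remove everything from the first Author/Publisher tag up to the extension."""
    return TAG_PATTERN.sub("", filename)

names = [
    "Apples.A.Tasty.Treat.Author-JoeDirt.doc",
    "Cooking.with.Apples.Publisher-Oscar.Publishing.txt",
    "Candied.Treats.Author-JenBloc.Publisher-Event.docx",
]
cleaned = [strip_tags(name) for name in names]
```

No list of extensions is needed, because the pattern never looks at what comes after the final period.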

Find and Replace with ASP Classic

I have a function in ASP VB, and I need to replace an exact word in it. For example, I have a string like "wool|silk/wool|silk". I want to replace just silk and not silk/wool.
' "|" is a divider
cur_val = "wool|silk/wool|silk"
cur_val_spl = Split("wool|silk/wool|silk", "|")
key_val = "silk"
For Each i In cur_val_spl
    If i = key_val Then
        cur_val = Replace(cur_val, ("|" & i), "")
        cur_val = Replace(cur_val, i, "")
    End If
Next
Response.Write(cur_val)
In this case my result is "wool/wool", but what I really want is "wool|silk/wool".
I would really appreciate any help.
You should build a new string as you go:
' "|" is a divider
cur_val = "wool|silk/wool|silk"
cur_val_spl = Split("wool|silk/wool|silk", "|")
result = ""
key_val = "silk"
addPipe = false
For Each i In cur_val_spl
    ' Keep every item except an exact match of key_val
    If i <> key_val Then
        ' Put a divider before every kept item except the first
        If addPipe Then
            result = result & "|"
        Else
            addPipe = true
        End If
        result = result & i
    End If
Next
Response.Write(result)
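The loop above is a split, filter, join pattern: split on the divider, drop the items that equal the key exactly, and join what is left. As a compact illustration in Python (not ASP; remove_item is a name invented for this sketch):

```python
def remove_item(value, key, sep="|"):
    """Drop the items that are exactly equal to key; keep partial matches."""
    return sep.join(part for part in value.split(sep) if part != key)

result = remove_item("wool|silk/wool|silk", "silk")
```

Because each item is compared as a whole, "silk/wool" survives even though it contains "silk".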
You could do it with a regular expression, but this is shorter:
cur_val = "wool|silk/wool|silk"
Response.Write left(mid(replace("|"&cur_val&"|","|wool|","|silk|"),2),len(cur_val))
'=>silk|silk/wool|silk
Too bad you already accepted the other answer 8>)