I can't figure out what is wrong with this regular expression

I wrote the following VBScript to read a file produced by a command-line tool. The file contains only (COMx), where x is the port number of the device in question. The script is supposed to read that file, pull out x, and save it to a new text file. I wrote this about two weeks ago and tested it, and it worked. Now, no matter what I do, I can't get it to work at all: it just creates an output file with nothing in it. This is baffling to me, as it worked two weeks ago. I don't know if I accidentally changed something, but any help would be appreciated.
Const ForReading = 1
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\rtlstuff\COM.txt", ForReading)
strContents = objFile.ReadAll
objFile.Close
Set regex = New RegExp
With regex
.Pattern = ".*\(COM(.+)?\).*"
End With
Dim ComPort
If regex.Test(strContents) Then
ComPort = regex.Replace(strContents,"$1")
End If
Set objFSO=CreateObject("Scripting.FileSystemObject")
outFile="c:\rtlstuff\ComPort.txt"
Set objFile = objFSO.CreateTextFile(outFile,True)
objFile.Write ComPort
objFile.Close

A regex seems like overkill. If you know the line just contains "COMx", why not just use one of these methods?
' Option 1: Start at the 4th char...
strContents = Mid(objFile.ReadLine, 4)
' Option 2: Remove "COM" from the line...
strContents = Replace(objFile.ReadLine, "COM", "")
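If the regex route is kept instead, a narrower pattern that captures only the digits after "COM" does not depend on the parentheses or on a trailing line break. A minimal sketch of that idea, written in Python rather than VBScript purely for illustration; the function name is hypothetical, and it assumes the port number is always made of digits:

```python
import re

def extract_com_port(text):
    """Return the digits that follow 'COM' in text, or None if there are none."""
    match = re.search(r"COM(\d+)", text)
    return match.group(1) if match else None

# Works whether or not the parentheses or a trailing newline are present.
print(extract_com_port("(COM3)\r\n"))
print(extract_com_port("COM12"))
```

Returning None when nothing matches makes it possible to detect an unexpected input file instead of silently writing an empty result.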

Related

VB.Net Search for files matching REGEX

Hi, I have a really basic question whose answer completely escapes me. I want to search a given directory for files whose names match a regex. I've tried all kinds of variations, but nothing is working for me. My regex is "*_Ch[0-9]+.sgm" and it should work. My files are named like "Bld1_Ch1.sgm", with the numbers incrementing.
The error I get is "System.IO.DirectoryNotFoundException: 'Could not find a part of the path 'C:\Test\06-GCS Bursting Script\TO 33D1-8-2-2-2 RAMTS FI\Bld1'.'"
Thank you for your patience and help.
Maxine
Private Sub btnImport_Click(sender As Object, e As EventArgs) Handles btnImport.Click
Dim searchDir As String = txtSGMFile.Text & "\" & txtUnique.Text
Dim searchFolder As String = "\" & txtUnique.Text
Dim searchPattern = "*_Ch[0-9]+.sgm"
Dim files = Directory.GetFiles(searchDir, searchPattern)
For Each file In files
MsgBox(file)
Next
End Sub
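I was able to get it working using this code! Thank you everyone for your help.
' Requires Imports System.IO and Imports System.Text.RegularExpressions.
' GetFiles only understands the * and ? wildcards, so filter with a Regex afterwards.
' searchDir is the folder built in the code above.
Dim files = Directory.GetFiles(searchDir, "*.sgm")
' \d+ allows chapter numbers with more than one digit
Dim rx = New Regex("_Ch\d+\.sgm$") ' or Dim rx = New Regex("_v[0-9]+\.pdf$")
For Each file In files
    If rx.IsMatch(file) Then
        ' do something with the file
        MsgBox(file)
    End If
Next file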
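The same "list first, then filter with a regex" idea can be sketched in a few lines. This is an illustration in Python rather than VB.NET; the function names are hypothetical, and the pattern assumes file names of the form shown in the question (for example "Bld1_Ch1.sgm"):

```python
import os
import re

CHAPTER_FILE = re.compile(r".*_Ch[0-9]+\.sgm")

def matching_names(names):
    """Keep only the names that match the whole chapter-file pattern."""
    return [name for name in names if CHAPTER_FILE.fullmatch(name)]

def find_chapter_files(directory):
    """List a directory and return full paths of the matching files."""
    return [os.path.join(directory, name)
            for name in matching_names(sorted(os.listdir(directory)))]

print(matching_names(["Bld1_Ch1.sgm", "Bld1_Ch12.sgm", "Bld1_ChX.sgm", "notes.txt"]))
```

Using `fullmatch` anchors the pattern at both ends, so a name such as "Bld1_Ch1.sgm.bak" is rejected.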

VBA regex word can't detect text on new line

I am trying to create a macro that will change the font style of the sentences spoken by one of the speakers in my transcripts. The speaker names are in style "Heading 2", and I want to change the interviewer's lines to style "Interviewer", as you can probably see from my code snippet.
I've never done VBA or macros before, so this is just what I've pulled together over the past 4-5 hours. I really need this to work, as I have 20 transcripts that are really long, and doing it manually would take too long.
Any help that you can give me would be greatly appreciated.
I have got the macro to recognise the name of one of the speakers, but I can't get it to skip the name and restyle the text beneath it. I have posted my code and a screenshot of the document below.
Set regExp = New regExp
Dim objMatch As Match
Dim colMatches As MatchCollection
Dim offsetStart As Long
offsetStart = Selection.Start
regExp.Pattern = "(Interviewer)([\r\n]+)"
regExp.Global = True
regExp.MultiLine = True
Set colMatches = regExp.Execute(Selection.Text) 'Execute search.
For Each objMatch In colMatches
Debug.Print objMatch
Set myRange = ActiveDocument.Range(objMatch.FirstIndex + offsetStart, End:=offsetStart + objMatch.FirstIndex + objMatch.Length)
myRange.Style = ActiveDocument.Styles("Interviewer")
Next
A copy of the file was requested, so I've posted it online:
http://www.filedropper.com/stackoverflowexamplefile
If it needs to be uploaded to Google Drive or something, I can probably do that.
I solved this by creating a simple Python script using python-docx. It loops through the document checking for the interviewer's name and then just changes the font of the next paragraph. It works perfectly, doing exactly what I needed.
I would like to thank everybody who helped me; it was very much appreciated. I'm going to consider this question answered now. If anyone would like the Python script, leave a comment and I'll post it.
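The script itself was not posted, so the following is only a guess at the loop described above, not the author's code. It works on any objects that have `text` and `style` attributes, uses stand-in paragraphs so that it runs without python-docx installed, and sets a style name rather than a font; the function name and the sample text are hypothetical.

```python
from types import SimpleNamespace

def restyle_after_speaker(paragraphs, speaker="Interviewer", style="Interviewer"):
    """Set `style` on each paragraph that directly follows a `speaker` heading."""
    changed = 0
    for heading, following in zip(paragraphs, paragraphs[1:]):
        if heading.text.strip() == speaker:
            following.style = style
            changed += 1
    return changed

# Stand-in paragraphs (assumption: real ones would come from the document).
demo = [
    SimpleNamespace(text="Interviewer", style="Heading 2"),
    SimpleNamespace(text="How did you get started?", style="Normal"),
    SimpleNamespace(text="Respondent", style="Heading 2"),
    SimpleNamespace(text="I began in 2010.", style="Normal"),
]
print(restyle_after_speaker(demo), [p.style for p in demo])
```

With python-docx, the list passed in would presumably be the `paragraphs` of an opened `Document`, followed by a call to its `save` method, and the "Interviewer" style would need to exist in the document already.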

Cleaning bad data in excel, splitting words by capital letters

I'm using excel 2011 on Mac OSX. I have a data set with about 3000 entries. In the fields that contain names, many of the names are not separated. First and last names are separated by a space, but separate names are bunched together.
Here's what I have, (one cell):
Grant MorrisonSholly FischBen OliverCarlos Alberto Fernandez UrbanoBen OliverCarlos Alberto Fernandez UrbanoBen OliverBen Oliver
Here's what I want to accomplish, (one cell, comma separated with one space after comma):
Grant Morrison, Sholly Fisch, Ben Oliver, Carlos Alberto, Fernandez Urbano, Ben Oliver, Carlos Alberto, Fernandez Urbano, Ben Oliver, Ben Oliver
I have found a few VBA scripts that will split words by capital letters, but the ones I've tried will add spaces where I don't need them like this one...
Function splitbycaps(inputstr As String) As String
Dim i As Long
Dim temp As String
If inputstr = vbNullString Then
splitbycaps = temp
Exit Function
Else
temp = inputstr
For i = 1 To Len(temp)
If Mid(temp, i, 1) = UCase(Mid(temp, i, 1)) Then
If i <> 1 Then
temp = Left(temp, i - 1) + " " + Right(temp, Len(temp) - i + 1)
i = i + 1
End If
End If
Next i
splitbycaps = temp
End If
End Function
There was another one that I found here that used RegEx, (forgive me, I'm just learning all of this so I may sound a little dumb) but when I tried that one, it wouldn't work at all, and my research pointed me to a way to add references to the library that would add the necessary tools so I could use it. Unfortunately, I cannot, for the life of me, find how to add a reference to the library on my mac version of excel... I may be doing something wrong, but this is the answer that I could not get to work...
Function SplitCaps(strIn As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.Pattern = "([a-z])([A-Z])"
SplitCaps = .Replace(strIn, "$1 $2")
End With
End Function
I am basically brand new at adding custom functions via VBA through excel, and there may even be a better way to do this, but it seems like every answer that I come to just doesn't quite get the data right. Thanks for any answers!
My function from "Split Uppercase words in Excel" needs updating for your additional string matching.
You would use this function in cell B1 for text in A1 as follows:
=SplitCaps(A1)
One assumption your cleansing does make is that people have only two names, so
Ben OliverCarlos Alberto
is broken to
Ben Oliver
Carlos Alberto
Is that actually what should happen? (The function needs a minor tweak if so.)
Code:
Function SplitCaps(strIn As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.Pattern = "([a-z])([A-Z])"
SplitCaps = Replace(.Replace(strIn, "$1, $2"), "<br>", ", ")
End With
End Function
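The "minor tweak" for two-word names is not shown above. One possible reading of it, sketched in Python for illustration rather than in VBA, is to apply the same lowercase-then-uppercase split and then regroup each run of words into pairs. Both function names are hypothetical, and the pairing step assumes every person has exactly two names.

```python
import re

def split_caps(text):
    """Insert ', ' wherever a lowercase letter is directly followed by an uppercase one."""
    return re.sub(r"([a-z])([A-Z])", r"\1, \2", text)

def split_caps_pairs(text):
    """Like split_caps, then break each run of words into two-word names."""
    names = []
    for chunk in split_caps(text).split(", "):
        words = chunk.split()
        names.extend(" ".join(words[i:i + 2]) for i in range(0, len(words), 2))
    return ", ".join(names)

sample = "Grant MorrisonSholly FischBen OliverCarlos Alberto Fernandez UrbanoBen Oliver"
print(split_caps(sample))
print(split_caps_pairs(sample))
```

Note that a surname with an internal capital, such as "McDonald", would also be split by this pattern, so the result still needs a manual check.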

Check if line matches regex

I have a file that has been generated by a server - I have no control over how this file is generated or formatted. I need to check each line begins with a string of set length (in this case 21 numerical chars). If a line doesn't match that condition, I need to join it to the previous line and, after reading and correcting the whole file, save it. I am doing this for a lot of files in a directory.
So far I have:
Dim rgx As New Regex("^[0-9]{21}$")
Dim linesList As New List(Of String)(File.ReadAllLines(finfo.FullName))
If linesList(0).Contains("BlackBerry Messenger") Then
linesList.RemoveAt(0)
For i As Integer = 0 To linesList.Count
If Not rgx.IsMatch(i.ToString) Then
linesList.Concat(linesList(i-1))
End If
Next
End If
File.WriteAllLines(finfo.FullName, linesList.ToArray())
There's a for statement before and after that code block to loop over all files in the source directory, which works fine.
Hope this isn't too bad to read :/
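Your solution doesn't work as written: rgx.IsMatch(i.ToString) tests the loop index rather than the line, and linesList.Concat(...) returns a new sequence without changing the list, so no lines are ever joined. Here's a different approach: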
Dim rgx As New Regex("^[0-9]{21}")
Dim linesList As New List(Of String)(File.ReadAllLines(finfo.FullName))
' Build the corrected lines in a new list
Dim newLinesList As New List(Of String)()
If linesList.Count > 0 AndAlso linesList(0).Contains("BlackBerry Messenger") Then
    Dim i As Integer = 1 ' Skip the header line
    Dim newLine As String
    While i < linesList.Count
        newLine = linesList(i)
        i += 1
        ' Keep appending until the next "real" line starts
        While i < linesList.Count AndAlso Not rgx.IsMatch(linesList(i))
            newLine &= linesList(i)
            i += 1
        End While
        newLinesList.Add(newLine)
    End While
    ' Write inside the If so that files without the header are not emptied
    File.WriteAllLines(finfo.FullName, newLinesList.ToArray())
End If
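The joining logic can also be written as a single pass that appends each continuation line to the last record seen. A minimal sketch in Python, for illustration only; the function name and the sample rows are made up, and the header handling and file I/O are left out:

```python
import re

RECORD_START = re.compile(r"^[0-9]{21}")

def join_continuations(lines):
    """Merge each line that lacks the 21-digit prefix into the line before it."""
    merged = []
    for line in lines:
        if RECORD_START.match(line) or not merged:
            merged.append(line)
        else:
            merged[-1] += line
    return merged

rows = [
    "123456789012345678901,first message",
    " continued",
    "987654321098765432109,second message",
]
print(join_continuations(rows))
```

The `or not merged` check keeps a leading line that has no prefix instead of failing on an empty list.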

RegEx in VBA: Removing line returns from data field in XML file

For someone who knows Regular Expressions, this is probably pretty trivial.
I have several XML files that I need to import, but because some of the fields were originally user-entered, they can have line returns within the data. I am using VBA because the database I am importing into is Access, so it's what is immediately handy.
Sample line in the XML:
<dbXML comments="This is line 1.
This is line 2." />
I tried a regular brute-force import, replace and export to replace line returns with ;, but that also caught all the line returns at the end of each XML line. Which was bad.
I have looked at a couple of similar questions on this site, with a reference document for RegEx open to figure out what they say, but can't quite make the jump from reading RegEx to writing it.
I can give the following definitions for the problem:
The only line returns I want to replace with a ; are between both <> and ""
The "" is always within the <>
so it looks something like <......".....\n...."....>
For the example I gave at the top, I want it to end up like:
<dbXML comments="This is line 1.;This is line 2." />
Thanks in advance for any assistance.
I'm a bit old fashioned, so I'll do this without RegEx.
Dim xmlString As String
Dim thisChar As String
Dim i As Long
Dim insideTag As Boolean
' Make an example xml string
xmlString = "<dbXML comments=""This is line 1." _
& vbCrLf & "This is line 2."" />"
xmlString = xmlString & vbCrLf & xmlString
Debug.Print "Before:"
Debug.Print xmlString
insideTag = False 'Assume we are NOT inside a tag to start with.
For i = 1 To Len(xmlString) 'March through the string char by char
thisChar = Mid(xmlString, i, 1)
If Not insideTag Then
If thisChar = "<" Then
'We're entering a tag.
insideTag = True
Else
'Do nothing. Leave line breaks alone.
End If
Else
'We're inside a tag.
If thisChar = ">" Then
'We've reached the end of this tag
insideTag = False
ElseIf thisChar = vbCr Or thisChar = vbLf Then
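'Replace the line break character with a semicolon.
'Note: a vbCrLf pair is two characters, so it becomes two semicolons.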
Mid(xmlString, i, 1) = ";"
End If
End If
Next i
Debug.Print "After:"
Debug.Print xmlString
I'm sure that there exists a RegEx that will do the same thing more succinctly.
However it would take me ages to figure it out, compared to the 2 minutes it took me to write out the above code, which works.
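For comparison, a regex version could look something like the sketch below. It is written in Python for illustration, not VBA; the function name is hypothetical. It assumes, like the loop above, that no ">" character appears inside an attribute value, and it treats a CR+LF pair as one line break so that the result has a single ";".

```python
import re

TAG = re.compile(r"<[^>]*>")
LINE_BREAK = re.compile(r"\r\n|\r|\n")

def flatten_tag_breaks(xml_text, replacement=";"):
    """Replace line breaks that fall between '<' and '>' with `replacement`."""
    return TAG.sub(lambda m: LINE_BREAK.sub(replacement, m.group(0)), xml_text)

sample = '<dbXML comments="This is line 1.\r\nThis is line 2." />\r\n<dbXML comments="ok" />'
print(flatten_tag_breaks(sample))
```

The outer pattern finds each tag, and the inner substitution only ever sees the text of that tag, so the line breaks between tags are left alone.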