Regex - Trying to adjust several lines of coding - regex

I'm working on developing a program and realized I have made a mistake on a few lines: This line here:
CInt(DBRead(userfile, "Stats", "MaxHP", String.Empty))
For example, should become
CInt(DBRead(userfile, "Stats", "MaxHP", "0"))
The problem is that String.Empty can't be converted by CInt. The last argument is meant to be the 'default if not found' value:
CInt(DBRead(filename, section, name, default))
What I want to do now is find any lines that contain CInt(DBRead( and end the call with String.Empty)), and then change the String.Empty to "0". Is this possible, especially since these calls typically aren't at the start or the end of the lines they are on?
Thanks!

This should match your String.Empty occurrences (it relies on .NET's support for variable-length lookbehind):
(?<=CInt.*?,.*?,\s?)String\.Empty(?=\))
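For trying the idea outside .NET, here is a rough Python sketch. Python's re module rejects variable-length lookbehind, so this swaps the lookbehind for a capture group that is written back unchanged. The name fix_defaults is made up for illustration, and the sketch assumes the first three arguments contain no nested parentheses.

```python
import re

# Group 1 captures everything up to the fourth argument; String.Empty is
# matched only when the two closing parentheses follow immediately.
PATTERN = re.compile(
    r'(CInt\(DBRead\([^()]*?,[^()]*?,[^()]*?,\s*)String\.Empty(?=\)\))'
)

def fix_defaults(line: str) -> str:
    """Replace a String.Empty default inside CInt(DBRead(...)) with "0"."""
    # A function replacement avoids any escaping issues in the replacement text.
    return PATTERN.sub(lambda m: m.group(1) + '"0"', line)

sample = 'hp = CInt(DBRead(userfile, "Stats", "MaxHP", String.Empty)) + 1'
print(fix_defaults(sample))
```

Calls to DBRead that are not wrapped in CInt are left alone, so string fields keep their String.Empty default.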

Related

VBA regex word can't detect text on new line

I am trying to create a macro that will change the font style of the sentences spoken by one of the speakers in my transcripts. The speaker names are style "Heading 2" and I want to change the interviewer's lines to style "Interviewer", as you can probably see from my code snippet.
I've never done VBA or macros before, so this is just what I've pulled together over the past 4-5 hours. I really need this to work, as I have 20 transcripts that are really long and doing it manually would take too long.
Any help that you can give me would be greatly appreciated.
I have got the macro to recognise the name of one of the speakers, but can't get it to ignore that and apply the style to the text beneath it. I have posted my code and a screenshot of the document below.
Set regExp = New regExp
Dim objMatch As Match
Dim colMatches As MatchCollection
Dim offsetStart As Long
offsetStart = Selection.Start
regExp.Pattern = "(Interviewer)([\r\n]+)"
regExp.Global = True
regExp.MultiLine = True
Set colMatches = regExp.Execute(Selection.Text) 'Execute search.
For Each objMatch In colMatches
Debug.Print objMatch
Set myRange = ActiveDocument.Range(objMatch.FirstIndex + offsetStart, End:=offsetStart + objMatch.FirstIndex + objMatch.Length)
myRange.Style = ActiveDocument.Styles("Interviewer")
Next
A copy of the file was requested, so I've posted it online:
http://www.filedropper.com/stackoverflowexamplefile
If it needs to be uploaded to Google Drive or something, I can probably do that.
I solved this by creating a simple Python script using python-docx. It loops through the document checking for the interviewer's name and then just changes the style of the next paragraph. It works perfectly, doing exactly what I needed.
I would like to thank everybody who helped me; it was very much appreciated. I'm going to consider this question answered now. If anyone would like the Python script, leave a comment and I'll post it.

VB.Net Beginner: Replace with Wildcards, Possibly RegEx?

I'm converting a text file to a Tab-Delimited text file, and ran into a bit of a snag. I can get everything I need to work the way I want except for one small part.
One field I'm working with has the home addresses of the subjects as a single entry ("1234 Happy Lane Somewhere, St 12345") and I need each broken down by Street(Tab)City(Tab)State(Tab)Zip. The one part I'm hung up on is the Tab between the State and the Zip.
I've been using input=input.Replace throughout, and it's worked well so far, but I can't think of how to untangle this one. The wildcards I'm used to don't seem to be working, I can't replace ("?? #####") with ("??" + ControlChars.Tab + "#####")...which I honestly didn't expect to work, but it's the only idea on the matter I had.
I've read a bit about using Regex, but have no experience with it, and it seems a bit...overwhelming.
Is Regex my best option for this? If not, are there any other suggestions on solutions I may have missed?
Thanks for your time. :)
EDIT: Here's what I'm using so far. It makes some edits to the line in question, taking care of spaces, commas, and other text I don't need, but I've got nothing for the State/Zip situation; I've a bad habit of wiping something if it doesn't work, but I'll append the last thing I used to the very end, if that'll help.
If input Like "Guar*###/###-####" Then
input = input.Replace("Guar:", "")
input = input.Replace(" ", ControlChars.Tab)
input = input.Replace(",", ControlChars.Tab)
input = "C" + ControlChars.Tab + strAccount + ControlChars.Tab + input
End If
input = System.Text.RegularExpressions.Regex.Replace(" #####", ControlChars.Tab + "#####") <-- Just one example of something that doesn't work.
This is what's written to input in this example
" Guar: LASTNAME,FIRSTNAME 999 E 99TH ST CITY,ST 99999 Tel: 999/999-9999"
And this is what I can get as a result so far
C 99999/9 LASTNAME FIRSTNAME 999 E 99TH ST CITY ST 99999 999/999-9999
With everything being exactly what I need besides the "ST 99999" bit (with actual data obviously omitted for privacy and professional whatnots).
UPDATE: Just when I thought it was all squared away, I've got another snag. The raw data gives me this.
# TERMINOLOGY ######### ##/##/#### # ###.##
And the end result is giving me this, because this is a chunk of data that was just fine as-is...before I removed the Tabs. Now I need a way to replace them after they've been removed, or to omit this small group of code from a document-wide Tab genocide I initiate the code with.
#TERMINOLOGY###########/##/########.##
Would a variant on rgx.Replace work best here? Or can I copy the code to a variable, remove Tabs from the document, then insert the variable without losing the tabs?
I think what you're looking for is
Dim rgx As New System.Text.RegularExpressions.Regex(" (\d{5})(?!\d)")
input = rgx.Replace(input, ControlChars.Tab + "$1")
The first line compiles the regular expression. The \d matches a digit, and the {5}, as you can guess, matches 5 repetitions of the previous atom. The parentheses surrounding the \d{5} form a capture group, which makes the captured text available in the replacement string as $1. The (?!\d) is a more advanced concept known as a negative lookahead assertion: it peeks at the next character to check that it's not a digit (otherwise the match could be the first 5 digits of a number with 6 or more digits). Another version is
" (\d{5})\b"
where the \b is a word boundary, disallowing alphanumeric characters following the digits.
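As a quick way to experiment with the pattern outside VB.NET, the same replacement can be sketched in Python. The function name tab_before_zip is made up for illustration; the pattern itself is the one from the answer.

```python
import re

# A space, then exactly five digits that are not followed by a sixth digit.
ZIP_PATTERN = re.compile(r" (\d{5})(?!\d)")

def tab_before_zip(text: str) -> str:
    """Swap the space in front of a 5-digit ZIP code for a tab."""
    return ZIP_PATTERN.sub(lambda m: "\t" + m.group(1), text)

# repr() makes the inserted tab visible as \t.
print(repr(tab_before_zip("CITY,ST 99999 Tel: 999/999-9999")))
```

The phone number is untouched because none of its digit runs is five long, and a six-digit number is skipped thanks to the lookahead.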

Regex Matching and Deleting/Replacing a string

So I am trying to parse through a file which has multiple "footers" (the file is an output that was designed for printing which my company wants to keep electronically stored...each footer is a new page and the new page is no longer needed as).
I am trying to look for and remove lines that look like:
1 of 2122 PRINTED 07/01/2013 04:46 Page : 1 of 11
2 of 2122 PRINTED 07/01/2013 04:46 Page: 2 of 11
3 of 2122 PRINTED 07/01/2013 04:46 Page: 3 of 11
and so on
I then want to replace the final line (which would read something like "2122 of 2122") with a "custom" footer.
I am using RegEx, but am very new to using it so how should my RegEx look in order to accomplish this? I plan on using the RegEx "count" function to find out when I've found the last line and then do a .replace on it.
I am using VB .NET, but can translate C# if required. How can I accomplish what I'm looking to do? Specifically I only care about matching/removing of a match so long as the # of matches > 1.
Here's one I created with RegExr:
/^(\d+\s+of\s+\d+)(?=\s+printed)/gim
It matches (number)(space)('of')(space)(number) at the beginning of a line, and only if it is followed by (space)('printed'), case insensitive. The /m flag turns ^ and $ into line-aware boundaries.
This is how I ended up doing it...
Private Function FixFooters(ByVal fileInput As String, Optional ByVal numberToLeaveAlone As Integer = 1) As String
Dim matchpattern As String = "^\d+\W+of\W+\d+\W+PRINTED.*$"
Dim myRegEx As New Regex(matchpattern, RegexOptions.IgnoreCase Or RegexOptions.Multiline)
Dim replacementstring As String = String.Empty
Dim matchCounter As Integer = myRegEx.Matches(fileInput).Count
If numberToLeaveAlone > matchCounter Then numberToLeaveAlone = matchCounter
Return myRegEx.Replace(fileInput, replacementstring, matchCounter - numberToLeaveAlone, 0)
End Function
I used myregextester.com to get the initial matchpattern. Since I wanted to leave the last footer alone (to manipulate it further later on), I created the numberToLeaveAlone parameter to ensure we don't remove ALL of the footers. For the purposes of this program I made the default value 1, but that could be changed to zero (I only did it for readability in the calling code, as I know I will ALWAYS want to leave one...but I do like to reuse code). It's fairly fast. I'm sure there are better ways out there, but this one made the most sense to me.
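For comparison, here is a rough Python sketch of the same approach: count the footers, then blank out all but the last few. The names fix_footers and keep_last are made up for illustration. One difference worth knowing: Python's re.sub treats count=0 as "replace everything", so the sketch returns early when there is nothing to remove.

```python
import re

# Same pattern as the VB.NET version: a "N of M PRINTED ..." line.
FOOTER = re.compile(r"^\d+\W+of\W+\d+\W+PRINTED.*$", re.IGNORECASE | re.MULTILINE)

def fix_footers(text: str, keep_last: int = 1) -> str:
    """Blank out every footer line except the last keep_last of them."""
    total = len(FOOTER.findall(text))
    to_remove = max(total - keep_last, 0)
    if to_remove == 0:
        return text  # count=0 would mean "replace all" in re.sub
    # Matches are replaced left to right, so the trailing footers survive.
    return FOOTER.sub("", text, count=to_remove)

doc = (
    "body A\n1 of 3 PRINTED 07/01/2013 04:46 Page : 1 of 11\n"
    "body B\n2 of 3 PRINTED 07/01/2013 04:46 Page: 2 of 11\n"
    "body C\n3 of 3 PRINTED 07/01/2013 04:46 Page: 3 of 11\n"
)
print(fix_footers(doc))
```

As in the original, each removed footer leaves an empty line behind, because only the line's content is replaced and not its line break.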

Regexp: Keyword followed by value to extract

I had this question a couple of times before, and I still couldn't find a good answer..
In my current problem, I have a console program output (string) that looks like this:
Number of assemblies processed = 1200
Number of assemblies uninstalled = 1197
Number of failures = 3
Now I want to extract those numbers and to check if there were failures. (That's a gacutil.exe output, btw.) In other words, I want to match any number [0-9]+ in the string that is preceded by 'failures = '.
How would I do that? I want to get the number only. Of course I can match the whole thing like /failures = [0-9]+/ .. and then trim the first characters with length("failures = ") or something like that. The point is, I don't want to do that, it's a lame workaround.
Because it's odd; if my pattern-to-match-but-not-into-output ("failures = ") comes after the thing i want to extract ([0-9]+), there is a way to do it:
pattern(?=expression)
To show the absurdity of this, if the whole file was processed backwards, I could use:
[0-9]+(?= = seruliaf)
... so, is there no forward-way? :T
pattern(?=expression) is a positive lookahead. What you are looking for is a positive lookbehind, which is written (?<=expression)pattern. This feature is not supported by all regex flavors, so it depends on which language you are using.
There is more information at regular-expressions.info; for a comparison of lookaround support across flavors, scroll about two thirds of the way down the page.
If your console output does actually look like that throughout, try splitting the string on "=" when the word "failure" is found, then get the last element (or the 2nd element). You did not say what your language is, but any decent language with string splitting capability would do the job. For example
gacutil.exe.... | ruby -F"=" -ane "print $F[-1] if /failure/"
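To make the lookbehind concrete, here is a small Python sketch run against the sample output from the question. It also shows the capture-group form, which works in flavors that lack lookbehind. The variable names are illustrative only.

```python
import re

output = """Number of assemblies processed = 1200
Number of assemblies uninstalled = 1197
Number of failures = 3
"""

# Positive lookbehind: "failures = " must precede the digits,
# but it is not included in the matched text.
match = re.search(r"(?<=failures = )[0-9]+", output)
failures = int(match.group()) if match else None

# Capture-group alternative: match the whole phrase, read only group 1.
match2 = re.search(r"failures = ([0-9]+)", output)
failures_via_group = int(match2.group(1)) if match2 else None

print(failures, failures_via_group)
```

Python's lookbehind must be fixed-width, which is fine here because "failures = " has a constant length.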

Regex - If contains '%', can only contain '%20'

I am wanting to create a regular expression for the following scenario:
If a string contains the percentage character (%) then it can only contain the following: %20, and cannot be preceded by another '%'.
So if there was for instance, %25 it would be rejected. For instance, the following string would be valid:
http://www.test.com/?&Name=My%20Name%20Is%20Vader
But these would fail:
http://www.test.com/?&Name=My%20Name%20Is%20VadersAccountant%25
%%%25
Any help would be greatly appreciated,
Kyle
EDIT:
The scenario in a nutshell is that a link is written to an encoded state and then launched via JavaScript. No decoding works. I tried .net decoding and JS decoding, each having the same result - The results stay encoded when executed.
Doesn't require a %:
/^[^%]*(%20[^%]*)*$/
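A short Python sketch of this pattern, checked against the strings from the question. The function name percent_ok is made up for illustration; fullmatch stands in for the ^ and $ anchors.

```python
import re

# Any run of non-% characters, optionally interleaved with %20 sequences.
ONLY_PERCENT_20 = re.compile(r"[^%]*(?:%20[^%]*)*")

def percent_ok(s: str) -> bool:
    """True when every % in s is the start of a %20 sequence."""
    return ONLY_PERCENT_20.fullmatch(s) is not None

for url in (
    "http://www.test.com/?&Name=My%20Name%20Is%20Vader",
    "http://www.test.com/?&Name=My%20Name%20Is%20VadersAccountant%25",
    "%%%25",
):
    print(percent_ok(url), url)
```

Strings without any % pass as well, which matches the "doesn't require a %" note above.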
Which language are you using?
Most languages have a Uri Encoder / Decoder function or class.
I would suggest you decode the string first and then check for valid (or invalid) characters.
i.e. something like /[\w ]/ (empty is a space)
With a regex in the first place you need to respect that www.example.com/index.html?user=admin&pass=%%250 means that the pass really is "%250".
Another solution if look-arounds are not available:
^([^%]|%([013-9a-fA-F][0-9a-fA-F]|2[1-9a-fA-F]))*$
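Reject the string if it matches %([^2]|2[^0]). (The simpler %[^2][^0] would miss %25, because its first character after the % is a 2.) A % or %2 at the very end of the string needs a separate check.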
I think that would find what you need
/^([^%]|%%|%20)+$/
Edit: Added case where %% is valid string inside URI
Edit2: And fixed it for case where it should fail :-)
Edit3:
In case you need to use it in an editor (which would explain why you can't use a more programmatic way), you have to correctly escape all special characters. For example, in Vim that regex should look like:
/^\([^%]\|%%\|%20\)\+$/
Maybe a better approach is to deal with that validation after you decode that string:
string name = HttpUtility.UrlDecode(Request.QueryString["Name"]);
/^([^%]|%20)*$/
This requires a test against the "bad" patterns. If we're allowing %20 - we don't need to make sure it exists.
As others have said before, %% is valid too... and %%25 would be %25
The below regex matches anything that doesn't fit into the above rules
/(?<![^%]%)%(?!(20|%))/
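The lookbehind checks whether the % is the second half of a %% pair (that is, preceded by a non-% character and then a %), in which case it is skipped; written this way it also copes with %%%. The pattern then matches a % and uses the lookahead to check that what follows is neither 20 nor another %.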
This means that if anything is identified by the regex, then you should probably reject it.
I agree with dominic's comment on the question. Don't use Regex.
If you want to avoid scanning the string twice, you can just iteratively search for % and then check that it is being followed by 20 and nothing else. (Update: allow a % after to be interpreted as a literal %nnn sequence)
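// pseudo code
pos = mystring.find('%', 0) // returns -1 when no '%' is left
while (pos != -1)
{
    if mystring.substring(pos + 1, 1) == "%" then
        pos = pos + 2 // ok, this is a literal %, skip past both characters
    else if mystring.substring(pos + 1, 2) == "20" then
        pos = pos + 3 // ok, this is %20, skip past it
    else
        return false; // string is invalid
    end if
    pos = mystring.find('%', pos) // continue from the new position
}
return true;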
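The loop above can be written as runnable Python roughly like this. The function name only_percent_20 is made up for illustration, and the sketch follows this answer's rule that %% counts as a literal percent sign.

```python
def only_percent_20(s: str) -> bool:
    """Scan s once; every % must begin either %% (literal) or %20."""
    pos = s.find("%")
    while pos != -1:
        if s.startswith("%%", pos):
            pos += 2  # literal percent sign, skip both characters
        elif s.startswith("%20", pos):
            pos += 3  # encoded space, skip the whole sequence
        else:
            return False  # any other use of % is invalid
        pos = s.find("%", pos)  # look for the next % from here
    return True

for candidate in ("My%20Name", "Accountant%25", "100%%", "%%%25"):
    print(candidate, only_percent_20(candidate))
```

To reject %% as well, as the original question asks, drop the first branch.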