Search string and grab all contents between 2 key words VB .NET - regex

I've seen a bunch of examples of this online, but I can't find one that uses Regex. Most of them use a loop and a lot of lines of code; I'd like to see a Regex example.
What I'm trying to create is an app that connects to a webpage, takes the source, and searches it for a keyword. Once the keyword is found, it should copy the text from that keyword up to another keyword and save it into a string, a textbox, or whatever.
I'm already using a web request to get the page and put it into a string; I just need to search that string for what I'm looking for.
The purpose of this app is to check a webpage for an updated version of some software I use. I want to monitor for updates and have the app notify me when an update is available. It's a simple app, but I'm having trouble searching for what I need.
For example:
First words to search for: Server 64-bit
Second words/characters to search for: </div>
Grab the first words, everything in between, and the last word, and save it all into a string.
EDIT: This is the information I am trying to grab:
Server 64-bit
<span class="version">
3.0.13.6
</span>
</h3>
<div class="checksum">SHA256: c7eeb1937b0bce0b99e7c7e20de030a4b71adcaf09750481801cfa361433522f</div>

You can use the following code with Regex to return the whole match, including the two keywords you provided:
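' Requires Imports System.Text.RegularExpressions at the top of the file.
Dim str As String = "first words to search for: Server 64-bit second words/characters to search for: </div>"
' RegexOptions.Singleline lets "." match line breaks, so the page source
' can be searched as-is without replacing its newlines first.
Dim strA As String = Regex.Match(str, "Server 64-bit(.*?)</div>", RegexOptions.Singleline).Value
MsgBox(strA)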
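Or you can use the following expression to get only the value between these two keywords (.*? is lazy, so it stops at the first </div> instead of running to the last one on the page):
Dim strB As String = Regex.Match(str, "(?<=Server 64-bit)(.*?)(?=</div>)", RegexOptions.Singleline).Value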
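A minimal sketch that wraps the same lazy-match idea in reusable functions and applies it to the markup from the question's EDIT. TextBetween and ServerVersion are hypothetical names, and the ServerVersion pattern assumes the page keeps exactly the `Server 64-bit` / `<span class="version">` layout shown above; if the site changes its HTML, the pattern will need adjusting. Reading Groups(1) returns only the captured text, so no lookbehind or lookahead is needed.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Public Module VersionScraper
    ' Returns the text between startKey and endKey, without the keywords.
    ' Regex.Escape treats the keywords as literal text, (.*?) is lazy so the
    ' capture stops at the FIRST endKey, and Singleline lets "." cross line breaks.
    Public Function TextBetween(source As String, startKey As String, endKey As String) As String
        Dim pattern As String = Regex.Escape(startKey) & "(.*?)" & Regex.Escape(endKey)
        Dim m As Match = Regex.Match(source, pattern, RegexOptions.Singleline)
        Return If(m.Success, m.Groups(1).Value, String.Empty)
    End Function

    ' Returns only the version number from the "Server 64-bit" block, assuming
    ' the markup shown in the question. Returns "" if the layout is not found.
    Public Function ServerVersion(source As String) As String
        Dim m As Match = Regex.Match(source,
            "Server 64-bit\s*<span class=""version"">\s*([^<]+?)\s*</span>")
        Return If(m.Success, m.Groups(1).Value, String.Empty)
    End Function
End Module
```

Comparing the string ServerVersion returns with the last version the app saw is then enough to decide whether to show an update notification.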

Maybe not the prettiest solution, but I would save the page into a string. Then check it with String.Contains("Server 64-bit"), split the string at that keyword, then split the remaining part at the next "</div>" and keep only the first part.
' pageSource is a placeholder for the string your web request returned.
Dim Information As String = pageSource
Const StartKey As String = "Server 64-bit"
Const EndKey As String = "</div>"
If Information.Contains(StartKey) Then
    ' Split at most once, keeping everything after the first StartKey.
    Dim afterStart As String = Information.Split({StartKey}, 2, StringSplitOptions.None)(1)
    If afterStart.Contains(EndKey) Then
        ' Keep only the text before the first EndKey.
        Dim ResultString As String = afterStart.Split({EndKey}, 2, StringSplitOptions.None)(0)
        'Displaying Result in a MsgBox
        MsgBox(ResultString)
    End If
End If
I'm currently only on my phone, so I can't actually test this, but it should work.

Related

Vb net Regex retrieve data up to specific keyword or end of string

I have several strings like:
kw_CS_TABLE__FC29-001::details=MIN_CAT::title=xxxx
kw_CS_TABLE__FC29-002::details=CAT to NSE
kw_CS_TABLE__FC29-003::details=HAZMIN::
I want to retrieve only the details string (MIN_CAT, CAT to NSE, HAZMIN).
I use the regex (?<=::details=)(.*)(?=::), which works for the first and third cases but fails for the second.
I am struggling with recognizing the end of the string. I added |$ to the lookahead, but then I retrieve everything up to the end of the file.
(?<=::details=)(.*)(?=::|$)
kw_CS_TABLE__FC29-001::details=MIN_CAT::title=xxxx
returns > MIN_CAT::title=xxxx
I have a lot of difficulty understanding regex concepts, especially because I only use them for a few specific cases. I read several tutorials and posts, but nothing solved my problem.
Thanks
Without regex
Private Function GetDetailsFrom(line As String) As String
Return line.Split({"::"}, StringSplitOptions.None).
Where(Function(item) item.StartsWith("details")).
Select(Function(detail) detail.Split({"="c}).LastOrDefault()).
FirstOrDefault()
End Function
Usage
Dim lines As String() =
{
"kw_CS_TABLE__FC29-001::details=MIN_CAT::title=xxxx",
"kw_CS_TABLE__FC29-002::details=CAT to NSE",
"kw_CS_TABLE__FC29-003::details=HAZMIN::"
}
Dim details = lines.Select(AddressOf GetDetailsFrom)
Console.WriteLine(string.Join(Environment.NewLine, details))
' MIN_CAT
' CAT to NSE
' HAZMIN
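For comparison, a sketch of the regex route from the question. The only change is making the capture lazy: with the greedy (.*), the engine first runs to the end of the line, where $ already satisfies the lookahead, so it never stops at the earlier ::. DetailsParser and GetDetails are hypothetical names, and the sketch assumes each line is matched separately (for example, lines read with File.ReadAllLines). When the whole file is one string, $ only matches at the end of the input unless RegexOptions.Multiline is added, and with Windows line endings a trailing \r would then end up in the capture.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Public Module DetailsParser
    ' The pattern from the question with a lazy (.*?): the capture grows one
    ' character at a time and stops at the first "::" or at the end of the input.
    Private ReadOnly DetailsRegex As New Regex("(?<=::details=)(.*?)(?=::|$)")

    ' Intended to be called once per line, so "$" is the end of that line.
    ' Returns "" when the line has no "::details=" part.
    Public Function GetDetails(line As String) As String
        Return DetailsRegex.Match(line).Value
    End Function
End Module
```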

How to include 2 words within Regex and result must be based on only those 2 words VB.NET

I would like to know how to require 2 or more keywords in a Regex, so that results only show up when all of the defined words are present, not just one of them.
What I currently have works with multiple keywords, but I want it to require BOTH words, not either one or the other.
For example:
Dim pattern As String = "(?i)[\t ](?<w>((arma)|(crapo))[a-z0-9]*)[\t ]"
Now the code works fine by including 'arma' or 'crapo'. I only want it to include BOTH 'arma' AND 'crapo' otherwise do not show any results.
I'm searching for certain keywords in PDF documents, and I only want to see results if a document includes BOTH 'arma' and 'crapo'. It currently works for 'arma' OR 'crapo'; I want results based on 'arma' AND 'crapo'.
Sorry for sounding so repetitive.
Edit: Here is my code. Please read comment.
Dim filesz() As String = GetPatternedFiles("c:\temp\", New String() {"tes*.pdf", "fes*.pdf", "Bas*.pdf"})
'GetPatternedFiles and GetTextFromPDF are helper functions (not shown).
For Each s As String In filesz
Dim thetext As String = Nothing
Dim pattern As String = "(?i)[\t ](?<w>((crapo)|(arma))[a-z0-9]*)[\t ]"
thetext = GetTextFromPDF(s)
For Each m As Match In Regex.Matches(thetext, pattern)
ListBox1.Items.Add(s)
Next
Next
You can use this regex:
\barma\b.*?\bcrapo\b|\bcrapo\b.*?\barma\b
The idea is to match arma followed by anything and then crapo, or crapo followed by anything and then arma, using word boundaries (\b) to avoid matching words like karma.
However, if you also want to match words like karma or crapotos, as you asked in your comment, you can use:
arma.*?crapo|crapo.*?arma
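A sketch of how the word-boundary pattern could replace the OR pattern in the question's loop. IgnoreCase mirrors the question's (?i); RegexOptions.Singleline is an added assumption not in the answer, since extracted PDF text usually spans many lines and . does not cross line breaks by default. KeywordFilter and ContainsBoth are hypothetical names.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Public Module KeywordFilter
    ' The answer's pattern: "arma ... crapo" or "crapo ... arma".
    ' \b word boundaries stop "karma" from counting as "arma".
    ' IgnoreCase mirrors the question's (?i); Singleline (an added assumption)
    ' lets ".*?" cross the line breaks that extracted PDF text usually contains.
    Private ReadOnly BothWords As New Regex(
        "\barma\b.*?\bcrapo\b|\bcrapo\b.*?\barma\b",
        RegexOptions.IgnoreCase Or RegexOptions.Singleline)

    ' True only when the text contains BOTH words, in either order.
    Public Function ContainsBoth(text As String) As Boolean
        Return BothWords.IsMatch(text)
    End Function
End Module
```

In the question's loop, `If ContainsBoth(thetext) Then ListBox1.Items.Add(s)` in place of the inner For Each would also add each file once, rather than once per match.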

Regex Matching and Deleting/Replacing a string

I am trying to parse a file that has multiple "footers". (The file is output that was designed for printing, which my company wants to keep stored electronically; each footer starts a new page, and those page breaks are no longer needed.)
I am trying to look for and remove lines that look like:
1 of 2122 PRINTED 07/01/2013 04:46 Page : 1 of 11
2 of 2122 PRINTED 07/01/2013 04:46 Page: 2 of 11
3 of 2122 PRINTED 07/01/2013 04:46 Page: 3 of 11
and so on
I then want to replace the final line (which would read something like "2122 of 2122") with a "custom" footer.
I am using Regex, but I am very new to it, so what should my pattern look like to accomplish this? I plan to use the Count of the matches to find out when I've reached the last line and then do a .Replace on it.
I am using VB.NET, but I can translate C# if required. How can I accomplish this? Specifically, I only want to match and remove footers as long as there is more than one match.
Here's one I created with RegExr:
/^(\d+\s+of\s+\d+)(?=\s+printed)/gim
It matches (number)(space)('of')(space)(number) at the beginning of a line, and only if it is followed by (space)('printed'), case insensitive. The /m flag turns ^ and $ into line-aware boundaries.
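The same pattern in .NET, as a sketch: the /i and /m flags become RegexOptions.IgnoreCase and RegexOptions.Multiline, and there is no /g flag because Matches already returns every match. FooterFinder and FooterPrefixes are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module FooterFinder
    ' The /gim pattern above: /i -> IgnoreCase, /m -> Multiline (^ per line).
    Private ReadOnly FooterPrefix As New Regex("^(\d+\s+of\s+\d+)(?=\s+printed)",
        RegexOptions.IgnoreCase Or RegexOptions.Multiline)

    ' Returns the "N of M" part of each footer line, in file order.
    ' The Count of the returned list tells you how many footers there are.
    Public Function FooterPrefixes(fileText As String) As List(Of String)
        Dim result As New List(Of String)
        For Each m As Match In FooterPrefix.Matches(fileText)
            result.Add(m.Groups(1).Value)
        Next
        Return result
    End Function
End Module
```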
This is how I ended up doing it...
Private Function FixFooters(ByVal fileInput As String, Optional ByVal numberToLeaveAlone As Integer = 1) As String
Dim matchpattern As String = "^\d+\W+of\W+\d+\W+PRINTED.*$"
Dim myRegEx As New Regex(matchpattern, RegexOptions.IgnoreCase Or RegexOptions.Multiline)
Dim replacementstring As String = String.Empty
Dim matchCounter As Integer = myRegEx.Matches(fileInput).Count
If numberToLeaveAlone > matchCounter Then numberToLeaveAlone = matchCounter
Return myRegEx.Replace(fileInput, replacementstring, matchCounter - numberToLeaveAlone, 0)
End Function
I used myregextester.com to get the initial match pattern. Since I wanted to leave the last footer alone (to manipulate it further later on), I created the numberToLeaveAlone parameter to ensure we don't remove ALL of the footers. For the purposes of this program I made the default value 1, but that could be changed to zero (I only did it for readability in the calling code, since I know I will ALWAYS want to leave one, but I do like to reuse code). It's fairly fast; I'm sure there are better ways out there, but this one made the most sense to me.
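A variation on FixFooters that also covers the last step from the question, replacing the final footer with a custom one, in a single Replace call with a MatchEvaluator. It is a sketch under assumptions: the custom footer text is passed in by the caller (the question does not say what it contains), footers are blanked rather than deleted together with their line break, as in FixFooters, and [^\r\n]* replaces .* so that on files with Windows line endings the \r is not swallowed into the match. FooterRewriter and ReplaceFooters are hypothetical names.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Public Module FooterRewriter
    ' Same idea as FixFooters: a whole line starting "N of M ... PRINTED".
    ' [^\r\n]* stops before a trailing \r, unlike .* which would include it.
    Private ReadOnly FooterLine As New Regex("^\d+\W+of\W+\d+\W+PRINTED[^\r\n]*",
        RegexOptions.IgnoreCase Or RegexOptions.Multiline)

    ' Blanks every footer line except the last, which becomes customFooter.
    ' customFooter is a hypothetical parameter supplied by the caller.
    Public Function ReplaceFooters(fileInput As String, customFooter As String) As String
        Dim total As Integer = FooterLine.Matches(fileInput).Count
        Dim seen As Integer = 0
        Return FooterLine.Replace(fileInput,
            Function(m As Match) As String
                seen += 1
                ' Matches are visited left to right, so seen = total is the last footer.
                Return If(seen = total, customFooter, String.Empty)
            End Function)
    End Function
End Module
```

As with FixFooters, the removed footers leave empty lines behind.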

regular expressions and vba

Does anyone know how to extract matches as strings from a RegExp.Execute() function?
Let me show you what I've gotten to so far:
Regex.Pattern = "^[^*]*[*]+"
Set myMatches = Regex.Execute(temp)
I want the object "myMatches", which holds the matches, to be converted to a string. I know that there will only be one match per execution.
Does anyone know how to extract the matches from the object as strings so they can be displayed, let's say, via a MsgBox?
Try this:
Dim sResult As String
'// Your expression code here...
sResult = myMatches.Item(0)
'// or
sResult = myMatches(0)
Msgbox("The matching text was: " & sResult)
The Execute method returns a match collection, and you can use its Item property with an index to retrieve the text.
Since you said you only ever have one match, the index is zero. If you have more than one match, you can use the index of the match you need or loop over the entire collection.
This page has a lot of information on regex and seems to have what you want.
http://www.regular-expressions.info/vbscript.html

Regex For Finding Ctypes with Int32

Hey all,
I am looking for a little regex help...
I am trying to find all CType(expression, Int32) calls and replace them with CInt(expression).
This, however, is proving quite difficult, considering there could be a nested CType(expression, Int32) within the regex match. Does anyone have any ideas for how best to go about this?
Here is what I have now:
Dim str As String = "CType((original.Width * CType((targetSize / CType(original.Height, Single)), Single)), Int32)"
Dim exp As New Regex("CType\((.+), Int32\)")
str = exp.Replace(str, "CInt($1)")
But this will match the entire string and replace it.
I was thinking of writing a recursive function to find the outermost match and then work inward, but that still presents a problem with things like
CType(replaceChars(I), Int32)), Chr(CType(replacementChars(I), Int32)
Any tips would be appreciated.
Input:
returnString.Replace(Chr(CType(replaceChars(I), Int32)), Chr(CType(replacementChars(I), Int32)))
Output:
returnString.Replace(Chr(CInt(replaceChars(I))),Chr(CInt(replacementChars(I))))
Edit:
I've been working on it a little more and have a recursive function that I'm still working the kinks out of. Recursion + regex: it kinda hurts.
Private Function FindReplaceCInts(ByVal strAs As String) As String
System.Console.WriteLine(String.Format("Testing : {0}", strAs))
Dim exp As New Regex("CType\((.+), Int32\)")
If exp.Match(strAs).Success Then
For Each match As Match In exp.Matches(strAs)
If exp.Match(match.Value.Substring(2)).Success Then
Dim replaceT As String = match.Value.Substring(2)
Dim Witht As String = FindReplaceCInts(match.Value.Substring(2))
System.Console.WriteLine(strAs.IndexOf(replaceT))
strAs.Replace(replaceT, Witht)
End If
Next
strAs = exp.Replace(strAs, "CInt($1)")
End If
Return strAs
End Function
Cheers,
What do you guys think of this?
I think it does it quite nicely for a variety of cases that I have tested so far...
Private Function FindReplaceCInts(ByVal strAs As String) As String
Dim exp As New Regex("CType\((.+), Int32\)")
If exp.Match(strAs).Success Then
For Each match As Match In exp.Matches(strAs)
If exp.Match(match.Value.Substring(2)).Success Then
Dim replaceT As String = match.Value.Substring(2)
Dim Witht As String = FindReplaceCInts(match.Value.Substring(2))
strAs = strAs.Replace(replaceT, Witht)
End If
Next
strAs = exp.Replace(strAs, "CInt($1)")
End If
Return strAs
End Function
Try using this regex, (?!CType\(.+, )Int32, instead of yours.
You need to use a negative lookahead to accomplish your task.
I've tried this in VS 2008 (no copy of VS 2010 to try it out), using the Find & Replace dialog:
Regular Expression: CType\({.+}, Int32\)
Replace With: CInt(\1)
It won't fix the nested situations in one pass, but you should be able to continue searching with that pattern and replacing until no other matches are found.
BTW: That dialog also provides a link to this help page explaining the characters used in the VS flavor of regex: http://msdn.microsoft.com/en-us/library/aa293063(VS.71).aspx
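Another option, specific to the .NET regex engine, is a balancing group, which lets one pattern match parentheses to any depth, combined with a MatchEvaluator that recurses into the matched expression so nested CType(..., Int32) calls are converted in a single call. This is a sketch, not something from the answers above; CTypeRewriter and ReplaceCInts are hypothetical names. It works on plain text, so parentheses inside string literals or comments in the code being rewritten would confuse it.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Public Module CTypeRewriter
    ' Matches CType(<expr>, Int32) where <expr> has balanced parentheses.
    ' Balancing groups: "(" pushes a capture onto "open", ")" pops one, and
    ' (?(open)(?!)) fails the match if any "(" is left unclosed. Because ")"
    ' cannot pop at depth 0, <expr> never runs past CType's own closing ")".
    Private ReadOnly CTypeInt32 As New Regex(
        "CType\((?<expr>(?:[^()]|\((?<open>)|\)(?<-open>))*)(?(open)(?!)),\s*Int32\)",
        RegexOptions.IgnoreCase)

    ' Rewrites CType(x, Int32) as CInt(x), recursing into x so that nested
    ' CType(..., Int32) calls inside an outer one are converted as well.
    Public Function ReplaceCInts(source As String) As String
        Return CTypeInt32.Replace(source,
            Function(m As Match) "CInt(" & ReplaceCInts(m.Groups("expr").Value) & ")")
    End Function
End Module
```

On the question's Input line this keeps the space after the comma, giving returnString.Replace(Chr(CInt(replaceChars(I))), Chr(CInt(replacementChars(I)))).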