VB.NET regex: retrieve data up to a specific keyword or end of string

I have several strings like:
kw_CS_TABLE__FC29-001::details=MIN_CAT::title=xxxx
kw_CS_TABLE__FC29-002::details=CAT to NSE
kw_CS_TABLE__FC29-003::details=HAZMIN::
I want to retrieve only the details string (MIN_CAT, CAT to NSE, HAZMIN).
I use the regex (?<=::details=)(.*)(?=::), which works for the first and third cases but fails for the second.
I am struggling to recognize the end of the string. I tried adding |$ to the lookahead, but then the match runs all the way to the end of the string.
(?<=::details=)(.*)(?=::|$)
kw_CS_TABLE__FC29-001::details=MIN_CAT::title=xxxx
returns > MIN_CAT::title=xxxx
I have a lot of difficulty understanding regex concepts, especially because I only use them for a few specific cases. I have read several tutorials and posts, but nothing solved my problem.
Thanks

Without regex
Private Function GetDetailsFrom(line As String) As String
Return line.Split({"::"}, StringSplitOptions.None).
Where(Function(item) item.StartsWith("details")).
Select(Function(detail) detail.Split({"="c}).LastOrDefault()).
FirstOrDefault()
End Function
Usage
Dim lines As String() =
{
"kw_CS_TABLE__FC29-001::details=MIN_CAT::title=xxxx",
"kw_CS_TABLE__FC29-002::details=CAT to NSE",
"kw_CS_TABLE__FC29-003::details=HAZMIN::"
}
Dim details = lines.Select(AddressOf GetDetailsFrom)
Console.WriteLine(string.Join(Environment.NewLine, details))
' MIN_CAT
' CAT to NSE
' HAZMIN
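If a regex is still preferred, the original attempt most likely fails because (.*) is greedy: once $ is allowed in the lookahead, it runs to the end of the string. A minimal sketch of the lazy variant follows, written in Python for brevity; the pattern uses only lookarounds and a lazy quantifier, which .NET's Regex class also supports. The helper name get_details is made up for illustration.

```python
import re

# Lazy (.*?) stops at the first "::" after the value, or at the end of the string.
DETAILS = re.compile(r"(?<=::details=)(.*?)(?=::|$)")

def get_details(line):
    """Return the details value, or None if the line has no details field."""
    match = DETAILS.search(line)
    return match.group(1) if match else None

lines = [
    "kw_CS_TABLE__FC29-001::details=MIN_CAT::title=xxxx",
    "kw_CS_TABLE__FC29-002::details=CAT to NSE",
    "kw_CS_TABLE__FC29-003::details=HAZMIN::",
]
for line in lines:
    print(get_details(line))
```

This should print MIN_CAT, CAT to NSE and HAZMIN, one per line.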

Related

Use IndexOf and Substring to catch a string AFTER IndexOf

I just came across something and would like to ask you for other possible solutions.
I have a string:
string text = "This is a very serious sample text, not a joke!"
Now I would like to find the position of the word "serious" and get the rest of the string AFTER "serious".
One way I would solve this is:
$text="This is a very serious sample text, not a joke!"
$start=($text).IndexOf("serious")
(($text).Substring($start+"serious".Length)).TrimStart()
I am sure there is a regex solution for this as well, but I was wondering if I can use IndexOf() and then Substring to get the rest of the string AFTER "serious".
I was also looking into this post: "Annoying String Substring & IndexOf", but either it is not the solution/question I am looking for or I didn't understand it...
Thanks for your help in advance, Adis
Since one of Substring's overloads takes only the starting index, first find where "serious " (note the trailing space) begins, then take the substring that starts at that index plus the length of the search term.
By putting the search term into a variable, one can access its length as a property. Changing the search term would be easy too, as it requires just updating the variable value instead of doing search and replace for string values.
Like so,
$searchTerm = "serious "
$start = $text.IndexOf($searchTerm)
$text.Substring($start + $searchTerm.length)
sample text, not a joke!
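One caveat: IndexOf returns -1 when the search term is not found, and the arithmetic above would then start the substring at the wrong position. Below is a small sketch of the same idea with that case handled, written in Python, where str.find behaves like IndexOf; the function name text_after is hypothetical.

```python
def text_after(text, search_term):
    """Return the part of text after the first occurrence of search_term, or None."""
    start = text.find(search_term)  # -1 when the term is absent, like IndexOf
    if start == -1:
        return None
    return text[start + len(search_term):]

text = "This is a very serious sample text, not a joke!"
print(text_after(text, "serious "))  # sample text, not a joke!
print(text_after(text, "missing"))   # None
```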
As for a simple regex, use -replace with the pattern ^.*serious (again with a trailing space). That matches the beginning of the string ^, then anything .*, followed by serious . Replacing the match with an empty string removes everything up to and including the search term. Like so,
"This is a very serious sample text, not a joke!" -replace '^.*serious ', ''
sample text, not a joke!
There might be cases in which extension methods would be a straightforward solution. Those allow adding new methods to existing .NET classes. The usual solution would be inheriting, but since string is sealed, that's not allowed. So, extension methods are the way to go. One case could be creating a method, say, IndexEndOf, that returns where the search term ends.
Adding .NET code (C# in this case) is easy enough. The sample code is adapted from another answer. The IndexEndOf method does the arithmetic and returns the index where the pattern ends. Like so,
$code=@'
public class ExtendedString {
public string s_ {get; set;}
public ExtendedString(string theString){
s_ = theString;
}
public int IndexEndOf(string pattern)
{
return s_.IndexOf(pattern) + pattern.Length;
}
public static implicit operator ExtendedString(string value){
return new ExtendedString(value);
}
}
'@
Add-Type -TypeDefinition $code
$text = "This is a very serious sample text, not a joke!"
$searchTerm = "serious "
$text.Substring(([ExtendedString]$text).IndexEndOf($searchTerm))
sample text, not a joke!

Search string and grab all contents between 2 key words VB .NET

I've seen a bunch of examples of this online, but I can't seem to find one that uses Regex. Many of them use a loop and a lot of lines of code, but I'd like to see a Regex example.
What I'm trying to create is an app that connects to a webpage, takes the source, and searches for a keyword. Once the keyword is found, it copies the text from that keyword up to another keyword and saves it into a string, a textbox, or whatever.
I'm already using a web request to get the information and put it into a string; I just need to search the string for what I'm looking for.
The reason for this app is to check a webpage for an updated version of some software I'm using. I want to monitor for updates and have the app notify me when one is available. It's just a simple app, but I'm having issues searching for what I need.
For Example:
first words to search for: Server 64-bit
second words/characters to search for: </div>
Grab the first words, everything in between, and the last words, and save it all into a string.
EDIT: The information I am trying to grab is this....
Server 64-bit
<span class="version">
3.0.13.6
</span>
</h3>
<div class="checksum">SHA256: c7eeb1937b0bce0b99e7c7e20de030a4b71adcaf09750481801cfa361433522f</div>
You can use the following code with Regex to return the whole text, including the two keywords you provide (it needs Imports System.Text.RegularExpressions at the top of the file):
Dim str As String = "first words to search for: Server 64-bit second words/characters to search for: </div>"
str = str.Replace(vbNewLine,"|")
Dim strA As String = Regex.Match(str, "Server 64-bit(.*?)</div>", RegexOptions.Singleline).Value
MsgBox(strA)
Or you can use the following expression to get only the value between these two keywords (the lazy .*? stops at the first </div>):
Dim strA As String = Regex.Match(str, "(?<=Server 64-bit)(.*?)(?=</div>)", RegexOptions.Singleline).Value
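To illustrate why the lazy quantifier matters when a page contains more than one </div>, here is a small sketch in Python (re.DOTALL plays the role of RegexOptions.Singleline). The page text is a shortened, hypothetical stand-in for the real source, and the version pattern is an assumption about what the version number looks like.

```python
import re

# Hypothetical, shortened stand-in for the downloaded page source.
page = """<h3>
Server 64-bit
<span class="version">
3.0.13.6
</span>
</h3>
<div class="checksum">SHA256: abc123</div>
<div class="other">later block</div>"""

# Lazy match: everything between the first keyword and the FIRST </div>.
between = re.search(r"(?<=Server 64-bit)(.*?)(?=</div>)", page, re.DOTALL)

# Assumed shape of a version number: digits separated by dots.
version = re.search(r"Server 64-bit.*?(\d+(?:\.\d+)+)", page, re.DOTALL)

print(between.group(1))
print(version.group(1))
```

With a greedy (.*) the first match would run up to the last </div> on the page instead.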
Maybe not the prettiest solution, but I would save the page into a string, check it with String.Contains("Server 64-bit"), split it at that keyword, then split the remaining part at the next </div> and keep only the first piece.
' Assign the downloaded page source to Information first
Dim Information As String = ""
If Information.Contains("Server 64-bit") Then
' Everything after the first keyword
Dim SplitString As String = Information.Split({"Server 64-bit"}, 2, StringSplitOptions.None)(1)
If SplitString.Contains("</div>") Then
' Everything before the next closing tag
Dim ResultString As String = SplitString.Split({"</div>"}, 2, StringSplitOptions.None)(0)
'Displaying Result in a MsgBox
MsgBox(ResultString)
End If
End If
I'm currently only on my phone, so I can't actually test this, but it should work.

How to include 2 words within Regex and result must be based on only those 2 words VB.NET

I would like to know how to require two or more keywords in a regex, so that results are shown only when all of the defined words are present, not just one of them.
What I currently have works with multiple keywords, but I want it to require BOTH words, not just one or the other.
For example:
Dim pattern As String = "(?i)[\t ](?<w>((arma)|(crapo))[a-z0-9]*)[\t ]"
Now the code works fine when the text includes 'arma' or 'crapo'. I only want results when it includes BOTH 'arma' AND 'crapo'; otherwise it should not show any results.
I am searching for certain keywords within a PDF document, and I only want to be shown results if the PDF document includes BOTH 'arma' and 'crapo'. (It works fine showing results for 'arma' OR 'crapo'; I want to see results based on 'arma' AND 'crapo'.)
Sorry for sounding so repetitive.
Edit: Here is my code. Please read comment.
Dim filesz() As String = GetPatternedFiles("c:\temp\", New String() {"tes*.pdf", "fes*.pdf", "Bas*.pdf"})
' GetPatternedFiles is a function; GetTextFromPDF is another function.
For Each s As String In filesz
Dim thetext As String = Nothing
Dim pattern As String = "(?i)[\t ](?<w>(crapo)|(arma)[a-z0-9]*)[\t ]"
thetext = GetTextFromPDF(s)
For Each m As Match In Regex.Matches(thetext, pattern)
ListBox1.Items.Add(s)
Next
Next
You can use this regex:
\barma\b.*?\bcrapo\b|\bcrapo\b.*?\barma\b
The idea is to match arma whatever crapo or crapo whatever arma and use word boundaries to avoid words like karma.
However, if you want to match karma or crapotos as you asked in your comment you can use:
arma.*?crapo|crapo.*?arma
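One thing to watch: by default . does not match a newline, so in multi-line PDF text the two words would have to be on the same line. In .NET that means adding RegexOptions.Singleline (and RegexOptions.IgnoreCase to keep the (?i) behaviour). A rough sketch of both ideas in Python, where re.DOTALL corresponds to Singleline; contains_all is a hypothetical helper showing the simpler alternative of testing each word separately.

```python
import re

# Whole-word version of the pattern above, spanning line breaks.
BOTH = re.compile(
    r"\barma\b.*?\bcrapo\b|\bcrapo\b.*?\barma\b",
    re.IGNORECASE | re.DOTALL,
)

def contains_all(text, words):
    """True only if every word occurs somewhere in text (case-insensitive substring)."""
    return all(re.search(re.escape(w), text, re.IGNORECASE) for w in words)

text = "Arma appears on page one\nand crapo only on page two"
print(bool(BOTH.search(text)))                              # True
print(bool(BOTH.search("karma and crapo")))                 # False: no whole word arma
print(contains_all("karma and crapotos", ["arma", "crapo"]))  # True
print(contains_all("only arma here", ["arma", "crapo"]))      # False
```

Checking each word on its own scales to more than two keywords without listing every ordering in the pattern.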

Regex For Finding Ctypes with Int32

Hey all,
I am looking for a little regex help...
I am trying to find all CType(expression, Int32) calls and replace them with CInt(expression).
This, however, is proving quite difficult, considering there could be a nested CType(expression, Int32) within the regex match. Does anyone have any ideas for how best to go about doing this?
Here is what I have now:
Dim str As String = "CType((original.Width * CType((targetSize / CType(original.Height, Single)), Single)), Int32)"
Dim exp As New Regex("CType\((.+), Int32\)")
str = exp.Replace(str, "CInt($1)")
But this will match the entire string and replace it.
I was thinking of doing a recursive function to find the outer most match, and then work inwards, but that still presents a problem with things like
CType(replaceChars(I), Int32)), Chr(CType(replacementChars(I), Int32)
Any tips would be appreciated.
Input
returnString.Replace(Chr(CType(replaceChars(I), Int32)), Chr(CType(replacementChars(I), Int32)))
Output:
returnString.Replace(Chr(CInt(replaceChars(I))),Chr(CInt(replacementChars(I))))
Edit:
Been working on it a little more and have a recursive function that I'm still working out the kinks in. Recursion + regex. it kinda hurts.
Private Function FindReplaceCInts(ByVal strAs As String) As String
System.Console.WriteLine(String.Format("Testing : {0}", strAs))
Dim exp As New Regex("CType\((.+), Int32\)")
If exp.Match(strAs).Success Then
For Each match As Match In exp.Matches(strAs)
If exp.Match(match.Value.Substring(2)).Success Then
Dim replaceT As String = match.Value.Substring(2)
Dim Witht As String = FindReplaceCInts(match.Value.Substring(2))
System.Console.WriteLine(strAs.IndexOf(replaceT))
strAs.Replace(replaceT, Witht)
End If
Next
strAs = exp.Replace(strAs, "CInt($1)")
End If
Return strAs
End Function
Cheers,
What do you guys think of this?
I think it does it quite nicely for a variety of cases that I have tested so far...
Private Function FindReplaceCInts(ByVal strAs As String) As String
Dim exp As New Regex("CType\((.+), Int32\)")
If exp.Match(strAs).Success Then
For Each match As Match In exp.Matches(strAs)
If exp.Match(match.Value.Substring(2)).Success Then
Dim replaceT As String = match.Value.Substring(2)
Dim Witht As String = FindReplaceCInts(match.Value.Substring(2))
strAs = strAs.Replace(replaceT, Witht)
End If
Next
strAs = exp.Replace(strAs, "CInt($1)")
End If
Return strAs
End Function
Try this regex instead of yours: (?!CType\(.+, )Int32
You need to use a negative lookahead to accomplish your task.
I've tried this in VS 2008 (no copy of VS 2010 to try it out), using the Find & Replace dialog:
Regular Expression: CType\({.+}, Int32\)
Replace With: CInt(\1)
It won't fix the nested situations in one pass, but you should be able to continue searching with that pattern and replacing until no other matches are found.
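Because the calls nest, an alternative to repeated regex passes is to scan for the parenthesis that closes each CType( and split at the last top-level comma. The following is only a sketch of that idea in Python, not a full VB parser: it assumes parentheses are balanced and that no string literal contains parentheses or commas, and the function name replace_ctype_int32 is made up for illustration.

```python
def replace_ctype_int32(src):
    """Rewrite CType(expr, Int32) as CInt(expr), including nested calls."""
    out = []
    i = 0
    while i < len(src):
        if src.startswith("CType(", i):
            # Scan to the parenthesis that closes this call, remembering
            # the last comma seen directly inside it (nesting depth 1).
            depth, j, last_comma = 0, i + len("CType"), -1
            while j < len(src):
                ch = src[j]
                if ch == "(":
                    depth += 1
                elif ch == ")":
                    depth -= 1
                    if depth == 0:
                        break
                elif ch == "," and depth == 1:
                    last_comma = j
                j += 1
            if j < len(src) and last_comma != -1:
                # Convert the inner expression first, then decide on this call.
                expr = replace_ctype_int32(src[i + len("CType("):last_comma])
                type_name = src[last_comma + 1:j].strip()
                if type_name == "Int32":
                    out.append("CInt(" + expr + ")")
                else:
                    out.append("CType(" + expr + ", " + type_name + ")")
                i = j + 1
                continue
        out.append(src[i])
        i += 1
    return "".join(out)

line = "returnString.Replace(Chr(CType(replaceChars(I), Int32)), Chr(CType(replacementChars(I), Int32)))"
print(replace_ctype_int32(line))
```

Calls that convert to other types, such as CType(original.Height, Single), are left as they are.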
BTW: That dialog also provides a link to this help page explaining the characters used in the VS flavor of regex: http://msdn.microsoft.com/en-us/library/aa293063(VS.71).aspx

Regex to replace string with another string in MS Word?

Can anyone help me with a regex to turn:
filename_author
to
author_filename
I am using MS Word 2003 and am trying to do this with Word's Find-and-Replace. I've tried the use wildcards feature but haven't had any luck.
Am I only going to be able to do it programmatically?
Here is the regex:
([^_]*)_(.*)
And here is a C# example:
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
String test = "filename_author";
String result = Regex.Replace(test, @"([^_]*)_(.*)", "$2_$1");
}
}
Here is a Python example:
from re import sub
test = "filename_author"
result = sub('([^_]*)_(.*)', r'\2_\1', test)
Edit: In order to do this in Microsoft Word using wildcards use this as a search string:
(<*>)_(<*>)
and replace with this:
\2_\1
Also, please see Add power to Word searches with regular expressions for an explanation of the syntax I have used above:
The asterisk (*) returns all the text in the word.
The less than and greater than symbols (< >) mark the start and end of each word, respectively. They ensure that the search returns a single word.
The parentheses and the space between them divide the words into distinct groups: (first word) (second word). The parentheses also indicate the order in which you want search to evaluate each expression.
Here you go:
s/^([a-zA-Z]+)_([a-zA-Z]+)$/\2_\1/
Depending on the context, that might be a little greedy.
Search pattern:
([^_]+)_(.+)
Replacement pattern:
$2_$1
In .NET you could use ([^_]+)_([^_]+) as the regex and then $2_$1 as the substitution pattern, for this very specific type of case. If you need more than 2 parts it gets a lot more complicated.
Since you're in MS Word, you might try a non-programming approach. Highlight all of the text, select Table -> Convert -> Text to Table. Set the number of columns at 2. Choose Separate Text At, select the Other radio, and enter an _. That will give you a table. Switch the two columns. Then convert the table back to text using the _ again.
Or you could copy the whole thing to Excel, construct a formula to split and rejoin the text and then copy and paste that back to Word. Either would work.
In C# you could also do something like this.
string[] parts = "filename_author".Split('_');
return parts[1] + "_" + parts[0];
You asked about regex of course, but this might be a good alternative.