Need to extract text from within first curly brackets - regex

I have strings that look like this:
{/CSDC} CHOC SHELL DIP COLOR {17}
I need to extract the value in the first pair of curly brackets. In the above example it would be:
/CSDC
So far I have this code, which is not working:
Dim matchCode = Regex.Matches(txtItems.Text, "/\{(.+?)\}/")
Dim itemCode As String
If matchCode.Count > 0 Then
itemCode = matchCode(0).Value
End If

I think the main issue here is that you are confusing your regular expression syntax between different languages.
In languages like JavaScript, Perl, Ruby and others, you create a regular expression object by using the /regex/ notation.
In .NET, when you instantiate a Regex object, you pass it a string of the regular expression, which is delimited by quotes, not slashes. So it is of the form "regex".
So try removing the leading and trailing / from your string and see how you go.
This may not be the whole problem, but it is at least part of it.
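To see what the stray slashes do, here is a minimal sketch in Python (used only for illustration; the question is about VB.NET, but both engines treat / inside a pattern string as an ordinary character). The variable names are made up for this example.

```python
import re

text = "{/CSDC} CHOC SHELL DIP COLOR {17}"

# With the JavaScript-style delimiters left in, the engine looks for a
# literal "/" right before "{" and right after "}", which never occurs here.
with_slashes = re.findall(r"/\{(.+?)\}/", text)

# Without the delimiters, the capture group of the first match holds the code.
first = re.search(r"\{(.+?)\}", text)
item_code = first.group(1) if first else None

print(with_slashes)  # []
print(item_code)     # /CSDC
```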

Are you getting the whole string instead of just the first value? Quantifiers are greedy by default, so .NET tries to grab the longest matching string.
Try this:
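' The parentheses capture the text between the braces, without the braces.
Dim matchCode = Regex.Matches(txtItems.Text, "\{([^}]*)\}")
Dim itemCode As String
If matchCode.Count > 0 Then
' Groups(0) is the whole match ("{/CSDC}"); Groups(1) is the captured text.
itemCode = matchCode(0).Groups(1).Value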
End If
Edited: I've tried this in Linqpad and it worked.
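As a rough illustration of greedy matching versus the alternatives, the following Python sketch runs three patterns against the sample string from the question. It is only a sketch; the variable names are assumptions for this example, and the same three patterns behave the same way in .NET.

```python
import re

text = "{/CSDC} CHOC SHELL DIP COLOR {17}"

# Greedy: ".+" runs to the LAST closing brace in the string.
greedy = re.search(r"\{(.+)\}", text).group(1)

# Lazy: ".+?" stops at the first closing brace it can.
lazy = re.search(r"\{(.+?)\}", text).group(1)

# Negated class: "[^}]*" can never cross a closing brace at all.
negated = re.search(r"\{([^}]*)\}", text).group(1)

print(greedy)   # /CSDC} CHOC SHELL DIP COLOR {17
print(lazy)     # /CSDC
print(negated)  # /CSDC
```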

It appears you are using a capture group, so try matchCode(0).Groups(1).Value (Groups(0) is the whole match, braces included).
Also, remove the leading / from the beginning of the pattern and the trailing / from the end.

Related

Regular expression formatting issue

I'm VERY new to using regular expressions, and I'm trying to figure something simple out.
I have a simple string, and I'm trying to pull out the 590111 and place it into another string.
HMax_590111-1_v8980.bin
So the new string would simply be...
590111
The part number will ALWAYS have 6 digits, and ALWAYS have a version and such. The part number might change location inside the string, so it needs to work if it looks like this:
590111-1_v8980_HMXAX.bin
What regular expression will do this? Currently, I'm using ^[0-9]* to find it if it's at the front of the file name.
Try the following Regex:
Dim text As String = "590111-1_v8980_HMXAX.bin"
Dim pattern As String = "\d{6}"
'Instantiate the regular expression object.
Dim r As Regex = new Regex(pattern, RegexOptions.IgnoreCase)
'Match the regular expression pattern against a text string.
Dim m As Match = r.Match(text)
In a regex, \d matches a single digit, so first you write \d.
Then, since you know the number has a fixed length, you can specify that length with "{}". \d{6} means the pattern expects 6 consecutive digits.
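A small Python sketch of the same \d{6} idea, applied to both file names from the question. The third input and the stricter pattern with lookarounds are assumptions added for illustration: they show how to avoid picking six digits out of a longer run of digits, which the question does not say can happen.

```python
import re

names = ["HMax_590111-1_v8980.bin", "590111-1_v8980_HMXAX.bin"]

# \d{6} finds the first run of six consecutive digits, wherever it sits.
part_numbers = []
for name in names:
    m = re.search(r"\d{6}", name)
    part_numbers.append(m.group() if m else None)

# Hypothetical input with a longer digit run before the part number.
tricky = "v12345678_590111"
loose = re.search(r"\d{6}", tricky).group()
# The lookarounds require that no digit touches the six-digit run.
strict = re.search(r"(?<!\d)\d{6}(?!\d)", tricky).group()

print(part_numbers)  # ['590111', '590111']
print(loose)         # 123456
print(strict)        # 590111
```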
I would recommend using this site to try your own expressions. If you hover over an expression there, it also shows a little information about what you are building.
Regex Tester

How to match a specific character only if it is followed by string containing specific characters

I'm trying to replace slashes in a string, but not all of them - only the ones before first comma. To do that, I probably have to find a way to match only slashes being followed by string containing a comma.
Is it possible to do this using one regexp, i.e. without first splitting the string by commas?
Example input string:
Abc1/Def2/Ghi3,/Dore1/Mifa2/Solla3,Sido4
What I want to get:
Abc1.Def2.Ghi3,/Dore1/Mifa2/Solla3,Sido4
I've tried some lookahead and lookbehind techniques with no effect, so currently to do this in e.g. Python I first split the data:
test = 'Abc1/Def2/Ghi3,/Dore1/Mifa2/Solla3,Sido4'
strlist = re.split(r',', test)
result = ','.join([re.sub(r'\/', r'.', strlist[0])] + strlist[1:])
What I would prefer, though, is a specific regexp pattern instead of a Python-oriented solution, so that essentially I could have a pattern and replacement such that the following code would give me the same result:
result = re.sub(pattern, replacement, test)
Thanks for all regex-avoiding answers - I was wondering if I could do this using only regex (so e.g. I could use sed instead of Python).
item = 'Abc1/Def2/Ghi3,/Dore1/Mifa2/Solla3,Sido4'
print(item.replace("/", ".", item.count("/", 0, item.index(","))))
This will print what you need. For a task like this, plain string methods are usually simpler and faster than a regex.
You could do this with a lookbehind that checks for the beginning of the string with no comma in between, but that requires variable-length lookbehind, which Python's re module does not support. Or don't use re at all:
s = 'Abc1/Def2/Ghi3,/Dore1/Mifa2/Solla3,Sido4'
left,sep,right = s.partition(',')
sep.join((left.replace('/','.'),right))
Out[24]: 'Abc1.Def2.Ghi3,/Dore1/Mifa2/Solla3,Sido4'
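For readers who still want a regex-driven version, here are two hedged sketches in Python. Neither comes from the answers above. The first applies an anchored single-slash substitution repeatedly until nothing changes, which is the kind of loop that also ports to tools with only a pattern and a replacement. The second uses a single re.sub call with a function as the replacement. Variable names are assumptions for this example.

```python
import re

test = "Abc1/Def2/Ghi3,/Dore1/Mifa2/Solla3,Sido4"

# Variant 1: replace the first slash that has no comma (or slash) before it,
# and repeat until the pattern no longer matches.
first_slash = re.compile(r"^([^,/]*)/")
result = test
while True:
    result, count = first_slash.subn(r"\1.", result)
    if count == 0:
        break

# Variant 2: match everything before the first comma once, and fix the
# slashes inside that match with a replacement function.
result_callable = re.sub(
    r"^[^,]*",
    lambda m: m.group(0).replace("/", "."),
    test,
)

print(result)
print(result_callable)
```

Note that if the input contains no comma at all, both variants replace every slash; whether that is the desired behaviour is not stated in the question.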

Regular expression to find specific text within a string enclosed in two strings, but not the entire string

I have this type of text:
string1_dog_bit_johny_bit_string2
string1_cat_bit_johny_bit_string2
string1_crocodile_bit_johny_bit_string2
string3_crocodile_bit_johny_bit_string4
string4_crocodile_bit_johny_bit_string5
I want to find all occurrences of “bit” that occur only between string1 and string2. How do I do this with regex?
I found the question Regex Match all characters between two strings, but the regex there matches the entire string between string1 and string2, whereas I want to match just parts of that string.
I am doing a global replacement in Notepad++. I just need regex, code will not work.
Thank you in advance.
Roman
If I understand correctly, here is some code that does what you want:
var input = new List<string>
{
"string1_dog_bit_johny_bit_string2",
"string1_cat_bit_johny_bit_string2",
"string1_crocodile_bit_johny_bit_string2",
"string3_crocodile_bit_johny_bit_string4",
"string4_crocodile_bit_johny_bit_string5"
};
Regex regex = new Regex(@"(?<bitGroup>bit)");
var allMatches = new List<string>();
foreach (var str in input)
{
if (str.StartsWith("string1") && str.EndsWith("string2"))
{
var matchCollection = regex.Matches(str);
allMatches.AddRange(matchCollection.Cast<Match>().Select(match => match.Groups["bitGroup"].Value));
}
}
Console.WriteLine("All matches {0}", allMatches.Count);
This regex will do the job:
^string1_(?:.*(bit))+.*_string2$
^ means the start of the text (or line if you use the m option like so: /<regex>/m )
$ means the end of the text
. means any character
* means the previous character/expression is repeated 0 or more times
(?:<stuff>) means a non-capturing group (<stuff> won't be captured as a result of the matching)
You could use ^string1_(.*(bit).*)*_string2$ if you don't care about performance or don't have large/many strings to check. The outer parentheses allow multiple occurrences of "bit".
If you provide us with the language you want to use, we could give more specific solutions.
edit: As you added that you're trying a replacement in Notepad++ I propose the following:
Use (?<=string1_)(.*)bit(.*)(?=_string2) as the regex and $1xyz$2 as the replacement pattern (replace xyz with your string). Then perform a "Replace All" operation repeatedly until Notepad++ finds no more matches. The drawback is that this regex matches only one bit per line per pass, so it has to be applied several times.
Btw. even if a regexp matches the whole line, you can still only replace parts of it using capturing groups.
You can use the regex:
(?:string1|\G)(?:(?!string2).)*?\Kbit
regex101 demo. Tried it on notepad++ as well and it's working.
There is a description on the demo site, but if you want more explanation, let me know and I'll elaborate!
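The \G and \K tokens used above are not available in every engine; Python's re module, for one, does not support them. As a rough sketch of the same goal outside Notepad++, the Python code below replaces bit only between string1 and string2 by matching the enclosed text once and rewriting it in a replacement function. The function name replace_bits and the replacement text xyz are hypothetical.

```python
import re

lines = [
    "string1_dog_bit_johny_bit_string2",
    "string1_cat_bit_johny_bit_string2",
    "string3_crocodile_bit_johny_bit_string4",
    "string4_crocodile_bit_johny_bit_string5",
]

# Capture whatever sits between "string1" and the next "string2".
between = re.compile(r"(?<=string1)(.*?)(?=string2)")

def replace_bits(line, new="xyz"):
    """Replace every 'bit' that lies between string1 and string2."""
    return between.sub(lambda m: m.group(1).replace("bit", new), line)

replaced = [replace_bits(line) for line in lines]
for line in replaced:
    print(line)
```

Lines that do not contain string1 followed by string2 are returned unchanged.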

Extract pattern from string, with special characters, using Regular Expressions

I am trying to use a regex in VB.NET - the language probably shouldn't matter though - I am trying to extract something reasonable out of a very large file name, "\\path\path\path.path.path\path\some_more_stuff_from a name.item_123_456.html"
I would like to extract, from that whole mess, the "item_123_456"
It seems to make sense that I can get everything before a pattern like ".html" , and from it, everything after the last dot ?
I have tried to get at least the last part (the entire string before .html) and I still get no matches:
Dim matches As MatchCollection
Dim regexStuff As New Regex(".*\\.html")
matches = regexStuff.Matches(strINeed)
Dim successfulMatch As Match
For Each successfulMatch In matches
strFound = successfulMatch.Value
Next
The match I experimented with, hoping I might even get everything between a dot and an .html: Regex("\\..*\\.html") returned Nothing as well.
I just can't get regular expressions to work...
.*\.(.*?)\.html
This matches as many characters as possible (.*) until it reaches a dot followed by as few characters as possible, followed by .html (\.(.*?)\.html).
It places the text between that dot and .html into a capturing group, which is available as group 1 ($1). Your code otherwise looked okay. Note that VB.NET string literals do not treat the backslash as an escape character, so a literal dot is written \. in the pattern; \\. means a literal backslash followed by any character, which is why your attempts found nothing.
Your vb code should look something like this:
Dim matches As MatchCollection
Dim regexStuff As New Regex(".*\.(.*?)\.html")
matches = regexStuff.Matches(strINeed)
strFound = matches.Item(0).Groups(1).Value.ToString
It could probably be generalized into this
[^.\\]+\.html
Edit: or, initial dot required
\.[^.\\]+\.html
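The patterns above can be checked quickly with a short Python sketch (Python is used only for illustration; raw strings keep the backslashes exactly as they would appear in a VB.NET string literal). The variable names are assumptions for this example.

```python
import re

path = r"\\path\path\path.path.path\path\some_more_stuff_from a name.item_123_456.html"

# The pattern from the question: "\\." means a literal backslash followed by
# any character, so it never lines up with ".html" in this path.
original_attempt = re.search(r".*\\.html", path)

# Capture the text between the last dot before ".html" and ".html" itself.
captured = re.search(r".*\.(.*?)\.html", path).group(1)

# The generalized form: a run without dots or backslashes, then ".html".
generalized = re.search(r"[^.\\]+\.html", path).group()

print(original_attempt)  # None
print(captured)          # item_123_456
print(generalized)       # item_123_456.html
```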

RegEx pattern to extract URLs

I have to extract everything between these characters:
<a href="/url?q=(text to extract whatever it is)&amp
I tried this pattern, but it's not working for me:
/(?<=url\?q=).*?(?=&amp)/
I'm programming in VB.NET. This is the code, but I think the problem is that the pattern is wrong:
Dim matches As MatchCollection
matches = regex.Matches(TextBox1.Text)
For Each Match As Match In matches
listbox1.items.add(Match.Value)
Next
Could you help me please?
Your regex seems to be correct except for the slashes (/) at the beginning and end of the expression. Remove them:
Dim regex = New Regex("(?<=url\?q=).*?(?=&amp)")
and it should work.
Some utilities and languages use / (forward slash) to delimit the search expression; others may use single quotes. With System.Text.RegularExpressions.Regex you don't need delimiters.
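As a quick check of the lookbehind/lookahead pattern without the slashes, here is a minimal Python sketch. The HTML snippet and its URLs are made up for this example; only the pattern comes from the question.

```python
import re

# Hypothetical sample input in the shape described in the question.
html = (
    '<a href="/url?q=http://example.com/page&amp;sa=U">first</a>'
    '<a href="/url?q=http://example.org/other&amp;sa=U">second</a>'
)

# (?<=url\?q=) requires "url?q=" right before the match;
# (?=&amp) requires "&amp" right after it; ".*?" keeps each match short.
urls = re.findall(r"(?<=url\?q=).*?(?=&amp)", html)

print(urls)  # ['http://example.com/page', 'http://example.org/other']
```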
This regex code below will extract all urls from your text (or any other):
(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?