Regex catch space in the link - regex

I'm grab value from web page and I use this:
.Squadra = Regex.Match(Content, "<td class=""team large-link""\s*>(.+?)</td>").Groups(1).ToString
.Squadra = Remove_Tags(Regex.Match(.Squadra, ">(\w+)<\/a>").Groups(1).ToString)
you can see that .Squadra contains the grabbed link value, so in the next step i replace al link format and grab the only value inserted in the link.
All working good, but if I have value like this: Real Vicenza, the variable return me blank value, probably for the space between Real and Vicenza?
How to correct this? I'm trying to insert s* before (+w) but seems doesn't work.

Instead of using (\w+) use ([^<]+) - that matches all non-< characters, which is far less likely to fail.

Related

Rainmeter Remove last space

From the example, the name of the processor is returned but with a space at the end, how can I make it return a value without a space in the end?
[MeasureRun]
Measure=Plugin
Plugin=RunCommand
Parameter=wmic cpu get Name
OutputType=ANSI
RegExpSubstitute=1
Substitute="Name.*#CRLF#":"","#CRLF#":""
ClipString=1
IfCondition=1
IfTrueAction=[!CommandMeasure MeasureRun "Run"]
[MeterResult]
Meter=String
MeasureName=MeasureRun
FontSize=14
FontColor=255,255,255,255
AntiAlias=1
Text=%1!
I think this can be done through regular expression, but I'm not strong at it.
Just add another Substitute in the substitution list to replace the occurence of one or more whitespaces from the end of the string (the pattern "\s+$") with an empty string like below:
Substitute="Name.*#CRLF#":"","#CRLF#":"","\s+$":""

VB.Net Beginner: Replace with Wildcards, Possibly RegEx?

I'm converting a text file to a Tab-Delimited text file, and ran into a bit of a snag. I can get everything I need to work the way I want except for one small part.
One field I'm working with has the home addresses of the subjects as a single entry ("1234 Happy Lane Somewhere, St 12345") and I need each broken down by Street(Tab)City(Tab)State(Tab)Zip. The one part I'm hung up on is the Tab between the State and the Zip.
I've been using input=input.Replace throughout, and it's worked well so far, but I can't think of how to untangle this one. The wildcards I'm used to don't seem to be working, I can't replace ("?? #####") with ("??" + ControlChars.Tab + "#####")...which I honestly didn't expect to work, but it's the only idea on the matter I had.
I've read a bit about using Regex, but have no experience with it, and it seems a bit...overwhelming.
Is Regex my best option for this? If not, are there any other suggestions on solutions I may have missed?
Thanks for your time. :)
EDIT: Here's what I'm using so far. It makes some edits to the line in question, taking care of spaces, commas, and other text I don't need, but I've got nothing for the State/Zip situation; I've a bad habit of wiping something if it doesn't work, but I'll append the last thing I used to the very end, if that'll help.
If input Like "Guar*###/###-####" Then
input = input.Replace("Guar:", "")
input = input.Replace(" ", ControlChars.Tab)
input = input.Replace(",", ControlChars.Tab)
input = "C" + ControlChars.Tab + strAccount + ControlChars.Tab + input
End If
input = System.Text.RegularExpressions.Regex.Replace(" #####", ControlChars.Tab + "#####") <-- Just one example of something that doesn't work.
This is what's written to input in this example
" Guar: LASTNAME,FIRSTNAME 999 E 99TH ST CITY,ST 99999 Tel: 999/999-9999"
And this is what I can get as a result so far
C 99999/9 LASTNAME FIRSTNAME 999 E 99TH ST CITY ST 99999 999/999-9999
With everything being exactly what I need besides the "ST 99999" bit (with actual data obviously omitted for privacy and professional whatnots).
UPDATE: Just when I thought it was all squared away, I've got another snag. The raw data gives me this.
# TERMINOLOGY ######### ##/##/#### # ###.##
And the end result is giving me this, because this is a chunk of data that was just fine as-is...before I removed the Tabs. Now I need a way to replace them after they've been removed, or to omit this small group of code from a document-wide Tab genocide I initiate the code with.
#TERMINOLOGY###########/##/########.##
Would a variant on rgx.Replace work best here? Or can I copy the code to a variable, remove Tabs from the document, then insert the variable without losing the tabs?
I think what you're looking for is
Dim r As New System.Text.RegularExpressions.Regex(" (\d{5})(?!\d)")
Dim input As String = rgx.Replace(input, ControlChars.Tab + "$1")
The first line compiles the regular expression. The \d matches a digit, and the {5}, as you can guess, matches 5 repetitions of the previous atom. The parentheses surrounding the \d{5} is known as a capture group, and is responsible for putting what's captured in a pseudovariable named $1. The (?!\d) is a more advanced concept known as a negative lookahead assertion, and it basically peeks at the next character to check that it's not a digit (because then it could be a 6-or-more digit number, where the first 5 happened to get matched). Another version is
" (\d{5})\b"
where the \b is a word boundary, disallowing alphanumeric characters following the digits.

Regex to extract long hexadecimal string

I want to extract with a regex the value after ajaxBrowserNavigationCheck('&x and before the = from the following javascript code:
if (ajaxBrowserNavigationCheck('&x909ef93d-61ac-4311-ac56-20c2ae9770f5=7ebdc2a4-df58-4c1c-9b50-96964c93e927', '', 'servletcontroller', '')){
processBrowserNavigationButton();
Basicly teh value I want to extra are &x909ef93d-61ac-4311-ac56-20c2ae9770f5 (the value before the = and we need the &x)
and 7ebdc2a4-df58-4c1c-9b50-96964c93e927 (the value after the =)
Note that the value is there twice (its after MODE=BROWSER_NAV)
Note that both value have 36 char without the &x
the &x is always there for the first string
My reg ex is a bit rusty here what I got so far:
(&x([0-9a-fA-F]|-)+) get me the first part
(&x([0-9a-fA-F]|-)+)|(=([0-9a-fA-F]|-)+) get me both but with the = we don't want it...
Edit: Sorry that I forgot the language, it's for a jmeter script which use jakarta ORO.
Edit2: I realize I can split those in two variable or even in three in jmeter that make it a bit easier.
Edit3: I removed the window location part because it was misleading since it was the same in the ajax part.
in ajaxBrowserNavigationCheck('&x909ef93d-61ac-4311-ac56-20c2ae9770f5=7ebdc2a4-df58-4c1c-9b50-96964c93e927', '', 'servletcontroller', ''))
we want &x909ef93d-61ac-4311-ac56-20c2ae9770f5 and 7ebdc2a4-df58-4c1c-9b50-96964c93e927
You haven't said what language you are using, so it's hard to give a solid answer.
This matches just your targets:
&x[a-fA-F0-9-]*(?==)
The last term is a look ahead, which asserts, but does not capture, an equals sign.
This regex matches all the input and captures each target twice as groups 1 and 2:
(?m).*?(&x[a-fA-F0-9-]*)=.*(&x[a-fA-F0-9-]*)=.*
See a live demo on rubular

Notepad++ RegeEx group capture syntax

I have a list of label names in a text file I'd like to manipulate using Find and Replace in Notepad++, they are listed as follows:
MyLabel_01
MyLabel_02
MyLabel_03
MyLabel_04
MyLabel_05
MyLabel_06
I want to rename them in Notepad++ to the following:
Label_A_One
Label_A_Two
Label_A_Three
Label_B_One
Label_B_Two
Label_B_Three
The Regex I'm using in the Notepad++'s replace dialog to capture the label name is the following:
((MyLabel_0)((1)|(2)|(3)|(4)|(5)|(6)))
I want to replace each capture group as follows:
\1 = Label_
\2 = A_One
\3 = A_Two
\4 = A_Three
\5 = B_One
\6 = B_Two
\7 = B_Three
My problem is that Notepad++ doesn't register the syntax of the regex above. When I hit Count in the Replace Dialog, it returns with 0 occurrences. Not sure what's misesing in the syntax. And yes I made sure the Regular Expression radio button is selected. Help is appreciated.
UPDATE:
Tried escaping the parenthesis, still didn't work:
\(\(MyLabel_0\)\((1\)|\(2\)|\(3\)|\(4\)|\(5\)|\(6\)\)\)
Ed's response has shown a working pattern since alternation isn't supported in Notepad++, however the rest of your problem can't be handled by regex alone. What you're trying to do isn't possible with a regex find/replace approach. Your desired result involves logical conditions which can't be expressed in regex. All you can do with the replace method is re-arrange items and refer to the captured items, but you can't tell it to use "A" for values 1-3, and "B" for 4-6. Furthermore, you can't assign placeholders like that. They are really capture groups that you are backreferencing.
To reach the results you've shown you would need to write a small program that would allow you to check the captured values and perform the appropriate replacements.
EDIT: here's an example of how to achieve this in C#
var numToWordMap = new Dictionary<int, string>();
numToWordMap[1] = "A_One";
numToWordMap[2] = "A_Two";
numToWordMap[3] = "A_Three";
numToWordMap[4] = "B_One";
numToWordMap[5] = "B_Two";
numToWordMap[6] = "B_Three";
string pattern = #"\bMyLabel_(\d+)\b";
string filePath = #"C:\temp.txt";
string[] contents = File.ReadAllLines(filePath);
for (int i = 0; i < contents.Length; i++)
{
contents[i] = Regex.Replace(contents[i], pattern,
m =>
{
int num = int.Parse(m.Groups[1].Value);
if (numToWordMap.ContainsKey(num))
{
return "Label_" + numToWordMap[num];
}
// key not found, use original value
return m.Value;
});
}
File.WriteAllLines(filePath, contents);
You should be able to use this easily. Perhaps you can download LINQPad or Visual C# Express to do so.
If your files are too large this might be an inefficient approach, in which case you could use a StreamReader and StreamWriter to read from the original file and write it to another, respectively.
Also be aware that my sample code writes back to the original file. For testing purposes you can change that path to another file so it isn't overwritten.
Bar bar bar - Notepad++ thinks you're a barbarian.
(obsolete - see update below.) No vertical bars in Notepad++ regex - sorry. I forget every few months, too!
Use [123456] instead.
Update: Sorry, I didn't read carefully enough; on top of the barhopping problem, #Ahmad's spot-on - you can't do a mapping replacement like that.
Update: Version 6 of Notepad++ changed the regular expression engine to a Perl-compatible one, which supports "|". AFAICT, if you have a version 5., auto-update won't update to 6. - you have to explicitly download it.
A regular expression search and replace for
MyLabel_((01)|(02)|(03)|(04)|(05)|(06))
with
Label_(?2A_One)(?3A_Two)(?4A_Three)(?5B_One)(?6B_Two)(?7B_Three)
works on Notepad 6.3.2
The outermost pair of brackets is for grouping, they limit the scope of the first alternation; not sure whether they could be omitted but including them makes the scope clear. The pattern searches for a fixed string followed by one of the two-digit pairs. (The leading zero could be factored out and placed in the fixed string.) Each digit pair is wrapped in round brackets so it is captured.
In the replacement expression, the clause (?4A_Three) says that if capture group 4 matched something then insert the text A_Three, otherwise insert nothing. Similarly for the other clauses. As the 6 alternatives are mutually exclusive only one will match. Thus only one of the (?...) clauses will have matched and so only one will insert text.
The easiest way to do this that I would recommend is to use AWK. If you're on Windows, look for the mingw32 precompiled binaries out there for free download (it'll be called gawk).
BEGIN {
FS = "_0";
a[1]="A_One";
a[2]="A_Two";
a[3]="A_Three";
a[4]="B_One";
a[5]="B_Two";
a[6]="B_Three";
}
{
printf("Label_%s\n", a[$2]);
}
Execute on Windows as follows:
C:\Users\Mydir>gawk -f test.awk awk.in
Label_A_One
Label_A_Two
Label_A_Three
Label_B_One
Label_B_Two
Label_B_Three

RegEx : Replace parts of dynamic strings

I have a string
IsNull(VSK1_DVal.RuntimeSUM,0),
I need to remove IsNull part, so the result would be
VSK1_DVal.RuntimeSUM,
I'm absolute new to RegEx, but it wouldn't be a problem, if not one thing :
VSK1 is dynamic part, can be any combination of A-Z,0-9 and any length. How to replace strings with RegEx? I use MSSQL 2k5, i think it uses general set of RegEx rules.
EDIT : I forgot to say, that I'm doing replacement in SSMS Query window's Replace Box (^H) - not building RegEx query
br
marius
here's a regex that should work:
[^(]+\(([^,]+),[^)]\)
Then use $1 capture group to extract the part that you need.
I did a sanity check in ruby:
orig = "IsNull(VSK1_DVal.RuntimeSUM,0),"
regex = /[^(]*\(([^,]+),[^)]\)/
result = orig.sub(regex){$1} # result => VSK1_DVal.RuntimeSUM,
It gets trickier if you have a prefix that you want to retain. Like if you have this:
"somestuff = IsNull(VSK1_DVal.RuntimeSUM,0),"
In this case, you need someway to identify the start of the pattern. Maybe you can use '=' to identify the start of the pattern? If so, this should work:
orig = "somestuff = IsNull(VSK1_DVal.RuntimeSUM,0),"
regex = /=\s*\w+\(([^,]+),[^)]\)/
result = orig.sub(regex){$1} # result => somestuff = VSK1_DVal.RuntimeSUM,
But then the case where you don't have an equals sign will fail. Maybe you can use 'IsNull' to identify the start of the pattern? If so, try this (note the '/i' representing case insensitive matching):
orig = "somestuff = isnull(VSK1_DVal.RuntimeSUM,0),"
regex = /IsNull\(([^,]+),[^)]\)/i
result = orig.sub(regex){$1} # result => somestuff = VSK1_DVal.RuntimeSUM,
/IsNULL\((A-Z0-9+),0\)/
Then pick group match number 1.
Here's a very useful site: http://www.regexlib.com/RETester.aspx
They have a tester and a cheatsheet that are very useful for quick testing of this sort.
I tested the solution by Dave and it works fine except it also removes the trailing comma you wanted retained. Minor thing to fix.
Try this:
IsNULL\((.*,)0\)
You say in your question
I use MSSQL 2k5, i think it uses
general set of RegEx rules.
This is not true unless you enable CLR and compile and install an assembly. You can use its native pattern matching syntax and LIKE for this as below.
WITH T(C) AS
(
SELECT 'IsNull(VSK1_DVal.RuntimeSUM,0),' UNION ALL
SELECT 'IsNull(VSK1_DVal.RuntimeSUM,123465),' UNION ALL
SELECT 'No Match'
)
SELECT SUBSTRING(C,8,1+LEN(C)-8-CHARINDEX(',',REVERSE(C),2))
FROM T
WHERE C LIKE 'IsNull(%,_%),'