RegEx to normalize XML syntax - regex

I have some XML in which certain tags cause parse errors (Error #1090). The problem is attribute values that need to be quoted:
<div class=treeview>
Please help me write a regular expression that turns them into the following:
<div class="treeview">

This one will be correct:
var pattern:RegExp = /(\w+)(=)(\w+)/g;
trace('regexTest:', pString.replace(pattern, '$1$2"$3"'));
because there must be three groups: attribute_name, = (equals), attribute_value

Could you try the following code:
var regExp:RegExp = /(class\=)(\w+)/g;
var sourceText:String = "<div class=treeview>";
var replacedText:String = sourceText.replace(regExp, '$1"$2"');
trace(replacedText);
In a nutshell, this RegExp means:
Find two groups: (class=) and (the word that follows it).
Wrap group 2 in double quotes.

You should try the following regex:
regex = /(<div[^>]*class=)(\S+)([^>]*>)/g;
sourceString.replace(regex, '$1"$2"$3');

Try using a general purpose markup repair tool such as John Cowan's TagSoup. This is likely to be much more robust than anything you attempt yourself (for example, most of the suggested regular expressions don't even check that the keyword=value construct is within a start tag).
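To illustrate the point about checking that keyword=value sits inside a start tag, here is a minimal Python sketch (not ActionScript, and not a replacement for a real repair tool such as TagSoup). The names START_TAG, ATTRIBUTE and quote_attributes are made up for this example, and it assumes attribute values never contain '<' or '>'.

```python
import re

# Matches a start tag such as <div class=treeview id="x">.
# Assumption: no '<' or '>' appears inside attribute values.
START_TAG = re.compile(r"<[A-Za-z][^<>]*>")

# Matches name=value where the value is double-quoted, single-quoted, or bare.
ATTRIBUTE = re.compile(r"""([\w:.-]+)\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+)""")


def _quote_attribute(match):
    name, value = match.group(1), match.group(2)
    if value[0] in "\"'":
        return match.group(0)  # already quoted, leave untouched
    return f'{name}="{value}"'


def quote_attributes(markup):
    """Quote bare attribute values, but only inside start tags."""
    return START_TAG.sub(
        lambda tag: ATTRIBUTE.sub(_quote_attribute, tag.group(0)), markup
    )


print(quote_attributes("<div class=treeview>a=b</div>"))
```

Because quoted values are consumed as a whole, text such as a=b inside an already quoted value or in the element content is left alone.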

Related

remove "?show=false" using regex [duplicate]

I'm looking for a regular expression to use in my JavaScript code that gives me the last part of a URL without the parameters, if there are any. Here is an example, with and without parameters:
https://scontent-fra3-1.xx.fbcdn.net/v/t1.0-9/14238253_132683573850463_7287992614234853254_n.jpg?oh=fdbf6800f33876a86ed17835cfce8e3b&oe=599548AC
https://scontent-fra3-1.xx.fbcdn.net/v/t1.0-9/14238253_132683573850463_7287992614234853254_n.jpg
In both cases as result I want to get:
14238253_132683573850463_7287992614234853254_n.jpg
Here is this regexp
.*\/([^?]+)
and JS code:
let lastUrlPart = /.*\/([^?]+)/.exec(url)[1];
let lastUrlPart = url => /.*\/([^?]+)/.exec(url)[1];
// TEST
let t1 = "https://scontent-fra3-1.xx.fbcdn.net/v/t1.0-9/14238253_132683573850463_7287992614234853254_n.jpg?oh=fdbf6800f33876a86ed17835cfce8e3b&oe=599548AC"
let t2 = "https://scontent-fra3-1.xx.fbcdn.net/v/t1.0-9/14238253_132683573850463_7287992614234853254_n.jpg"
console.log(lastUrlPart(t1));
console.log(lastUrlPart(t2));
Maybe there are better alternatives?
You could always try doing it without regex. Split the URL by "/" and then parse out the last part of the URL.
var urlPart = url.split("/");
var img = urlPart[urlPart.length-1].split("?")[0];
That should get everything after the last "/" and before the first "?".
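For comparison, the same "no regex" idea can be sketched in Python using the standard library's URL parser instead of manual splitting. This is only an illustration; last_url_part is a name chosen for this example.

```python
from posixpath import basename
from urllib.parse import urlsplit


def last_url_part(url):
    """Return the final path segment of a URL, ignoring any query string."""
    # urlsplit separates the path from the query, so '?...' never reaches basename.
    return basename(urlsplit(url).path)


t1 = ("https://scontent-fra3-1.xx.fbcdn.net/v/t1.0-9/"
      "14238253_132683573850463_7287992614234853254_n.jpg"
      "?oh=fdbf6800f33876a86ed17835cfce8e3b&oe=599548AC")
t2 = ("https://scontent-fra3-1.xx.fbcdn.net/v/t1.0-9/"
      "14238253_132683573850463_7287992614234853254_n.jpg")

print(last_url_part(t1))
print(last_url_part(t2))
```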

Regex JSON response Gatling stress tool

I want to capture a variable called scanNumber from an HTTP response that looks like this:
{"resultCode":"SUCCESS","errorCode":null,"errorMessage":null,"profile":{"fullName":"TestFirstName TestMiddleName TestLastName","memberships":[{"name":"UA Gold Partner","number":"123-456-123-123","scanNumber":"123-456-123-123"}]}}
How can I do this with a regular expression?
The tool I am using is Gatling stress tool (with the Scala DSL)
I have tried to do it like this:
.check(jsonPath("""${scanNumber}""").saveAs("scanNr")))
But I get the error:
---- Errors --------------------------------------------------------------------
> Check extractor resolution crashed: No attribute named 'scanNumber' is defined      5 (100,0%)
You were close the first time.
What you actually want is:
.check(jsonPath("""$..scanNumber""").saveAs("scanNr")))
or possibly:
.check(jsonPath("""$.profile.memberships[0].scanNumber""").saveAs("scanNr")))
Note that this uses jsonPath, not regular expressions. JsonPath should be more reliable than regex for this.
Check out the JsonPath spec for more advanced usage.
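To show why navigating the parsed structure tends to be more robust than a regex, here is a small Python sketch outside of Gatling. It walks the same path as $.profile.memberships[*].scanNumber using the standard json module; scan_numbers is a name invented for this example.

```python
import json

body = (
    '{"resultCode":"SUCCESS","errorCode":null,"errorMessage":null,'
    '"profile":{"fullName":"TestFirstName TestMiddleName TestLastName",'
    '"memberships":[{"name":"UA Gold Partner","number":"123-456-123-123",'
    '"scanNumber":"123-456-123-123"}]}}'
)


def scan_numbers(response_body):
    """Collect scanNumber from every membership in the response."""
    data = json.loads(response_body)
    # Field order and whitespace in the JSON text do not matter once it is parsed.
    return [m["scanNumber"] for m in data["profile"]["memberships"]]


print(scan_numbers(body))
```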
Use this regex to match it anywhere in the JSON:
/"scanNumber":"[^"]+"/
and if you want to match it only within the structure you showed, use:
/\{[^{[]+\{[^{[]+\[\{[^{[]*("scanNumber":"[^"]+")/
Since JSON fields may change their order, you should make your regex more tolerant of such changes:
val j = """{"resultCode":"SUCCESS","errorCode":null,"errorMessage":null,"profile":{"fullName":"TestFirstName TestMiddleName TestLastName","memberships":[{"name":"UA Gold Partner","number":"123-456-123-123","scanNumber":"123-456-123-123"}]}}"""
val scanNumberRegx = """\{.*"memberships":\[\{.*"scanNumber":"([^"]*)".*""".r
val scanNumberRegx(scanNumber) = j
scanNumber //String = 123-456-123-123
This will work even if the JSON fields appear in a different order (as long as the structure stays the same).

Validate URLs via Regex?

I have the following Regex
"^http\\\\://[a-zA-Z0-9\\\\-\\\\.]+\\\\.[a-zA-Z]{2,3}(/\\\\S*)?$";
But I'm not sure that it's validating URLs correctly. Is anyone able to assist me or see what's wrong with this?
Thanks
If you want a solid pattern read here.
Looks like Rakesh made some good mods to your existing pattern; however, if I were you I would consider the aforementioned patterns because they are a bit more robust, depending on your scenario.
Try this; there are quite a lot of escape characters in your version:
var subUrlSTR = "http://subdomain.stackoverflow.com";
var urlSTR = "http://stackoverflow.com";
var result = /http:\/\/[A-Za-z0-9\.-]{3,}\.[A-Za-z]{3}/;
console.log(subUrlSTR.match(result));
console.log(urlSTR.match(result));
See it working here
if (Uri.TryCreate(stringUrl, UriKind.Absolute, out uri))
{
...
}
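The snippet above validates by parsing rather than by pattern matching. A rough Python equivalent of that idea, using only the standard library, might look like the sketch below. The function name is_http_url is invented here, and restricting the scheme to http and https is an assumption of this example.

```python
from urllib.parse import urlsplit


def is_http_url(candidate):
    """Return True if candidate parses as an absolute http(s) URL with a host."""
    try:
        parts = urlsplit(candidate)
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs (e.g. bad IPv6 hosts).
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)


print(is_http_url("http://subdomain.stackoverflow.com"))
print(is_http_url("not a url"))
```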

How to use regular expression in WatiN

I'm working with the WatiN automation tool and I'm having a problem with a regular expression. I have a situation where I have to enter some text and click a button in a popup window. I'm using the AttachToIE method and the URL attribute ("http://192.168.25.10:215/admin/SelectUsers.aspx?Type=FeedbackID=ef5ad7ef5490-4656-9669-32464aeba7cd") of the popup to attach to it.
The problem is that each time the popup appears, the ID value in the URL changes, so I'm not able to access the popup. Can anyone help by giving me a regular expression for the changing ID value in the URL below?
("http://192.168.25.10:215/admin/SelectUsers.aspx?Type=FeedbackID=ef5ad7ef5490-4656-9669-32464aeba7cd")
Thank you.
It appears that you have a URL with 2 query string parameters Type and ID and your pattern is:
"http://192.168.25.10:215/admin/SelectUsers.aspx?Type=Feedback&ID={some id}"
You can use the Find.ByUrl() attribute constraint method and pass it to AttachToIE() as shown below with the regex for matching that pattern.
string url = "http://192.168.25.10:215/admin/SelectUsers.aspx?Type=Feedback&ID=";
// Regex.Escape keeps '?' and '.' in the URL literal; the ID may contain hyphens.
Regex regex = new Regex(Regex.Escape(url) + "[-a-z0-9]+", RegexOptions.IgnoreCase);
IE ie = IE.AttachToIE(Find.ByUrl(regex));
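string baseUrl = "http://192.168.25.10:215/admin/SelectUsers.aspx?Type=FeedbackID=";
// Escape the literal URL so '?' and '.' are not treated as regex metacharacters.
Regex urlIE = new Regex(Regex.Escape(baseUrl) + "[\\w-]+", RegexOptions.IgnoreCase);
IE ie = IE.AttachToIE(Find.ByUrl(urlIE));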
I'm not familiar with WatiN, but it looks like it runs on .NET, so perhaps this might help?
var desiredId = "000000000000-0000-0000-000000000000";
var url = "http://192.168.25.10:215/admin/SelectUsers.aspx?Type=FeedbackID=ef5ad7ef5490-4656-9669-32464aeba7cd&someMoreStuff";
var pattern = @"(?i)(?<=FeedBackId=)[-a-z0-9]+";
var result = Regex.Replace(url, pattern, desiredId);
Console.WriteLine(result);
//Output: http://192.168.25.10:215/admin/SelectUsers.aspx?Type=FeedbackID=000000000000-0000-0000-000000000000&someMoreStuff
The following pattern should have the same effect but is more defensive: it only matches inside the query string, requires the ID to be exactly 35 characters, and won't match similarly named parameters such as "PreviousFeedBackId".
var pattern = @"(?i)(?<=\?.*\bFeedBackId=)[-a-z0-9]{35}\b";
If you just want to extract the id:
var id = Regex.Match(url, pattern).Value;
Console.WriteLine(id);
//output: ef5ad7ef5490-4656-9669-32464aeba7cd
WatiN has a feature that lets us match on the URL while ignoring the query string. The code below works fine for me.
string baseUrl = "http://192.168.25.10:215/admin/SelectUsers.aspx";
IE ie = IE.AttachToIE(Find.ByUrl(baseUrl,true));
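One thing worth checking when a regex is built by appending a pattern to a literal URL: characters such as '?' and '.' are regex metacharacters, so the literal part generally has to be escaped. The Python sketch below (not WatiN code) shows the difference on the URL from the question; the variable names are chosen for this example.

```python
import re

base_url = "http://192.168.25.10:215/admin/SelectUsers.aspx?Type=FeedbackID="
popup_url = base_url + "ef5ad7ef5490-4656-9669-32464aeba7cd"

# Unescaped: 'aspx?' means "asp, then an optional x", so the literal '?' never matches.
unescaped = re.compile(base_url + r"[-a-z0-9]+", re.IGNORECASE)

# Escaped: the URL is matched literally, followed by the changing ID.
escaped = re.compile(re.escape(base_url) + r"[-a-z0-9]+", re.IGNORECASE)

print(unescaped.search(popup_url))              # None
print(escaped.fullmatch(popup_url) is not None)  # True
```

In .NET the corresponding helper is Regex.Escape.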

Regex to parse querystring values to named groups

I have a HTML with the following content:
... some text ...
link ... some text ...
... some text ...
link ... some text ...
... some text ...
I would like to parse that and get a match with named groups:
match 1
group["user"]=123
group["section"]=2
match 2
group["user"]=678
group["section"]=5
I can do it if parameters always go in order, first User and then Section, but I don't know how to do it if the order is different.
Thank you!
In my case I had to parse a URL because the utility HttpUtility.ParseQueryString is not available in WP7. So I created an extension method like this:
public static class UriExtensions
{
private static readonly Regex queryStringRegex;
static UriExtensions()
{
queryStringRegex = new Regex(@"[\?&](?<name>[^&=]+)=(?<value>[^&=]+)");
}
public static IEnumerable<KeyValuePair<string, string>> ParseQueryString(this Uri uri)
{
if (uri == null)
throw new ArgumentNullException("uri");
var matches = queryStringRegex.Matches(uri.OriginalString);
for (int i = 0; i < matches.Count; i++)
{
var match = matches[i];
yield return new KeyValuePair<string, string>(match.Groups["name"].Value, match.Groups["value"].Value);
}
}
}
Then it's just a matter of using it, for example:
var uri = new Uri(HttpUtility.UrlDecode(@"file.aspx?userId=123&section=2"),UriKind.RelativeOrAbsolute);
var parameters = uri.ParseQueryString().ToDictionary( kvp => kvp.Key, kvp => kvp.Value);
var userId = parameters["userId"];
var section = parameters["section"];
NOTE: I'm returning an IEnumerable instead of a dictionary directly because there might be duplicated parameter names. If there are duplicates, ToDictionary will throw an exception.
Why use regex to split it out?
You could first extract the query string, split it on &, and then build a map by splitting each of those pieces on =.
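As a rough illustration of that split-based approach in Python, the standard library already does the "split on & then on =" work. The helper name query_params is made up for this sketch, and the userId and section parameter names are taken from the other answers in this thread.

```python
from urllib.parse import parse_qs, urlsplit


def query_params(href):
    """Map each query-string parameter name to the list of its values."""
    # parse_qs keeps every value, so duplicated parameter names do not collide.
    return parse_qs(urlsplit(href).query)


print(query_params("file.aspx?userId=123&section=2"))
print(query_params("file.aspx?section=5&userId=678"))
```

The parameter order in the URL no longer matters, because the result is looked up by name.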
You didn't specify what language you are working in, but this should do the trick in C#:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace RegexTest
{
class Program
{
static void Main(string[] args)
{
string subjectString = @"... some text ...
link ... some text ...
... some text ...
link ... some text ...
... some text ...";
Regex regexObj =
new Regex(@"<a href=""file\.aspx\?(?:(?:userId=(?<user>.+?)&section=(?<section>.+?)"")|(?:section=(?<section>.+?)&userId=(?<user>.+?)""))");
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success)
{
string user = matchResults.Groups["user"].Value;
string section = matchResults.Groups["section"].Value;
Console.WriteLine(string.Format("User = {0}, Section = {1}", user, section));
matchResults = matchResults.NextMatch();
}
Console.ReadKey();
}
}
}
Using regex to first find the key value pairs and then doing splits... doesn't seem right.
I'm interested in a complete regex solution.
Anyone?
Check this out
\<a\s+href\s*=\s*["'](?<baseUri>.+?)\?(?:(?<key>.+?)=(?<value>.+?)[&"'])*\s*\>
You can get pairs with something like Groups["key"].Captures[i] & Groups["value"].Captures[i]
Perhaps something like this (I am rusty on regex, and wasn't good at them in the first place anyway. Untested):
/href="[^?]*([?&](userId=(?<user>\d+)|section=(?<section>\d+)))*"/
(By the way, the XHTML is malformed; & should be &amp; in the attributes.)
Another approach is to put the capturing groups inside lookaheads:
Regex r = new Regex(@"<a href=""file\.aspx\?" +
@"(?=[^""<>]*?user=(?<user>\w+))" +
@"(?=[^""<>]*?section=(?<section>\w+))");
If there are only two parameters, there's no reason to prefer this way over the alternation-based approaches suggested by Mike and strager. But if you needed to match three parameters, the other regexes would grow to several times their current length, while this one would only need another lookahead just like the two existing ones.
By the way, contrary to your response to Claus, it matters quite a bit which language you're working in. There's a huge variation in capabilities, syntax, and API from one language to the next.
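The lookahead idea carries over to other flavors too. Here is a minimal Python sketch of it; the parameter names userId and section and the helper link_params are assumptions for this example, since the original HTML is not shown.

```python
import re

# Each lookahead scans forward inside the href value for one parameter,
# so the order of the parameters in the query string does not matter.
LINK = re.compile(
    r'<a href="file\.aspx\?'
    r'(?=[^"<>]*?\buserId=(?P<user>\w+))'
    r'(?=[^"<>]*?\bsection=(?P<section>\w+))'
)


def link_params(html):
    """Return one {'user': ..., 'section': ...} dict per matching link."""
    return [m.groupdict() for m in LINK.finditer(html)]


html = (
    '... <a href="file.aspx?userId=123&section=2">link</a> ... '
    '<a href="file.aspx?section=5&userId=678">link</a> ...'
)
print(link_params(html))
```

Adding a third parameter would only take one more lookahead of the same shape.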
You did not say which regex flavor you are using. Since your sample URL links to an .aspx file, I'll assume .NET. In .NET, a single regex can have multiple named capturing groups with the same name, and .NET will treat them as if they were one group. Thus you can use the regex
userID=(?<user>\d+)&section=(?<section>\d+)|section=(?<section>\d+)&userID=(?<user>\d+)
This simple regex with alternation will be far more efficient than any tricks with lookaround. You can easily expand it if your requirements include matching the parameters only if they're in a link.
A simple Python implementation that overcomes the ordering problem:
In [2]: x = re.compile(r'(?:(userId|section)=(\d+))+')
In [3]: t = 'href="file.aspx?section=2&userId=123"'
In [4]: x.findall(t)
Out[4]: [('section', '2'), ('userId', '123')]
In [5]: t = 'href="file.aspx?userId=123&section=2"'
In [6]: x.findall(t)
Out[6]: [('userId', '123'), ('section', '2')]