Regex to fix GetSafeHtmlFragment x_ prefix

When using Sanitizer.GetSafeHtmlFragment from Microsoft's AntiXSSLibrary 4.0, I noticed it changes my HTML fragment from:
<pre class="brush: csharp">
</pre>
to:
<pre class="x_brush: x_csharp">
</pre>
Sadly their API doesn't allow us to disable this behavior. Therefore I'd like to use a regular expression (C#) to fix and replace strings like "x_anything" to "anything", that occur inside a class="" attribute.
Can anyone help me with the RegEx to do this?
Thanks
UPDATE - this worked for me:
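private string FixGetSafeHtmlFragment(string html)
{
    // Look for a class attribute whose value starts with the "x_" prefix.
    Match match = Regex.Match(html, "class=\"(x_).+\"", RegexOptions.IgnoreCase);
    if (match.Success)
    {
        string key = match.Groups[1].Value;
        // Caveat: String.Replace removes every "x_" in the whole input,
        // not only the ones inside class attributes.
        return html.Replace(key, "");
    }
    return html;
}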

I'm not 100% sure about the C# @ (verbatim string) symbol, but I think this should match x_ inside of any class="" and replace it with an empty string:
string input = "class=\"x_something\"";
Match match = Regex.Match(input, @"class=""(x_).+""",
    RegexOptions.IgnoreCase);
if (match.Success)
{
    string key = match.Groups[1].Value;
    string v = input.Replace(key, "");
}

It's been over a year since this has been posted but here's some regex you can use that will remove up to three class instances. I'm sure there's a cleaner way but it gets the job done.
VB.Net Code:
Regex.Replace(myHtml, "(<\w+\b[^>]*?\b)(class="")x[_]([a-zA-Z]*)( )?(?:x[_])?([a-zA-Z]*)?( )?(?:x[_])?([^""]*"")", "$1$2$3$4$5$6$7")
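The snippets above either strip every "x_" in the input or handle a fixed number of class tokens. A minimal sketch of a scoped alternative, written in JavaScript for illustration: match each class attribute, then strip the prefix only from tokens inside its value. The helper name stripClassPrefix is hypothetical, and the sketch assumes double-quoted attribute values; in C# the same idea maps to Regex.Replace with a MatchEvaluator callback.

```javascript
// Hypothetical helper: remove the "x_" prefix from every token of a class attribute.
function stripClassPrefix(html) {
  // Capture the opening class=" part, the attribute value, and the closing quote.
  return html.replace(/(\bclass=")([^"]*)(")/gi, (whole, open, value, close) => {
    // Strip "x_" only where it starts a token (start of value or after whitespace).
    const cleaned = value.replace(/(^|\s)x_/g, "$1");
    return open + cleaned + close;
  });
}

const sample = '<pre class="x_brush: x_csharp">var max_x_pos = 1;</pre>';
console.log(stripClassPrefix(sample));
// The class value becomes "brush: csharp"; "max_x_pos" in the element body is left alone.
```

Because the replacement runs only on the captured attribute value, text outside class attributes that happens to contain "x_" is not modified.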

Related

regex help. modify regex to exclude content within curly braces

In javascript, my regex is:
let regEx = new RegExp("([A-Za-z])(?!([^<]+)?>)", "gi");
My string is:
<span class="customer-key">{aaa}</span>bbb {ccc}
In the above example, the regex matches "aaa", "bbb" and "ccc".
I would like to update my regex to EXCLUDE anything WITHIN curly braces, so that it ONLY matches "bbb".
How can I update the regex to do so? thanks!
Try this (your regex matches letters one at a time, so mine does too):
let regEx = new RegExp("([A-Za-z])(?!([^<]+)?>)(?!([^{]+)?})", "gi");
let str= '<span class="customer-key">{aaa}</span>bbb {ccc}'
let s =str.match(regEx);
console.log(s)
If you want to get bbb, one option is to use the DOM: find the text node(s) and remove the content between curly braces:
const htmlString = `<span class="customer-key">{aaa}</span>bbb {ccc}`;
let div = document.createElement('div');
div.innerHTML = htmlString;
div.childNodes.forEach(x => {
  if (x.nodeType === Node.TEXT_NODE) {
    console.log(x.textContent.replace(/{[^}]+}/g, ''));
  }
});
Note that parsing HTML with a regex is not advisable.
If you want to get bbb from your example string, another option is to match what you don't want to keep and replace it with an empty string.
const regex = /\s*<[^>]+>\s*|\s*{[^}]+}\s*/gm;
const str = `<span class="customer-key">{aaa}</span>bbb {ccc}`;
const result = str.replace(regex, '');
console.log(result);

Simple Regex text replace then add suffix

I have a program that can only handle basic regex; no C#, VB.NET, etc.
This is my situation.
I have a set of start URLs.
http://www.foo.com?code=234654
I need to remove the ?code= and replace with a / then add the letter t at the end.
Like this:
http://www.foo.com/234654t
I would appreciate any help with this.
Thanks
Sean
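For the dialect used by java.util.regex, you can use this regular expression, for example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// "?", the parameter name and "=", then the digits up to the end of the string
String regex = "\\?+[A-Za-z=]+([0-9]+)$";
// a slash, the captured digits, then a literal "t"
String replacement = "/$1t";
Pattern pattern = Pattern.compile(regex);
Matcher m = pattern.matcher(line);
if (m.find()) {
    System.out.println(m.replaceAll(replacement));
}
Another example, chaining two replaceAll calls (the lookbehind is kept to a single character so that it has a fixed length):
line.replaceAll("\\?+[A-Za-z=]+", "/").replaceAll("(?<=[0-9])$", "t");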
For the string:
String line = "http://www.foo.com?code=234654";
You'll get:
http://www.foo.com/234654t
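Since the program in the question only accepts a plain pattern and replacement, the whole change can also be expressed as one pattern/replacement pair: pattern \?code=(\d+)$ and replacement /$1t. The following is a minimal JavaScript sketch to check that pair, assuming the parameter is always named code and its digits run to the end of the URL; rewriteUrl and the second sample URL are hypothetical.

```javascript
// Hypothetical helper that applies the pattern/replacement pair to one URL.
function rewriteUrl(url) {
  // \?code=  matches the literal "?code="
  // (\d+)$   captures the digits that run to the end of the string as group 1
  // "/$1t"   writes a slash, the captured digits, then a literal "t"
  return url.replace(/\?code=(\d+)$/, "/$1t");
}

const startUrls = [
  "http://www.foo.com?code=234654",
  "http://www.foo.com/about", // no code parameter: left unchanged
];
for (const url of startUrls) {
  console.log(rewriteUrl(url));
}
```

URLs that do not end in ?code= followed by digits pass through unchanged, which may or may not be what the target program should do with them.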

How to use regex to change occurrences of complex strings?

For example,
say I want to change
all occurrences of <img src="https://www.blahblah.com/i" title="Bob" />
to simply
Bob
This is for VB.NET.
Basically, there are plenty of such patterns in a big string. I want to change every one of them.
This is what I tried:
Dim tdparking = New System.Text.RegularExpressions.Regex("\w* (<img.*title="")(.*)"" />")
After that I suppose I would need to do some substitution, but how would I do that?
var str = '<img src="https://www.blahblah.com/i" title="Bob" />';
str = str.replace(/<img[^>]*title="(\w+)"[^>]*>/g,"$1"); // g flag: replace every occurrence, not just the first
document.write(str);
You can use this regex: \<img[^\<\>]+title=\"([a-zA-Z]+)\"[^\<\>]+\/\> and return $1 of the matching pattern:
In PHP it should be:
preg_match_all('#\<img[^\<\>]+title=\"([a-zA-Z]+)\"[^\<\>]+\/\>#', $html, $matches);
var_dump($matches[1]);
Regards,
Everyone is making this way too hard. Here it is in VB.NET:
Dim reg As Regex = New Regex("(?<=title="")[^""]+(?="")")
Dim str As String = "<img src=""https://www.blahblah.com/i"" title=""Bob"" />"
Dim match As String = reg.Match(str).Value
Depending on how the string is entered, you will need either
"(?<=title="""")[^""]+(?="""")" 'if the text itself contains doubled quotes ("")
or
"(?<=title="")[^""]+(?="")" 'if it contains plain double quotes (")
Also there is no need for you to get downvoted - here is a point back

Dart: RegExp by example

I'm trying to get my Dart web app to: (1) determine if a particular string matches a given regex, and (2) if it does, extract a group/segment out of the string.
Specifically, I want to make sure that a given string is of the following form:
http://myapp.example.com/#<string-of-1-or-more-chars>[?param1=1&param2=2]
Where <string-of-1-or-more-chars> is just that: any string of 1+ chars, and where the query string ([?param1=1&param2=2]) is optional.
So:
(1) Decide if the string matches the regex; and if so
(2) Extract the <string-of-1-or-more-chars> group/segment out of the string
Here's my best attempt:
String testURL = "http://myapp.example.com/#fizz?a=1";
String regex = "^http://myapp.example.com/#.+(\?)+\$";
RegExp regexp= new RegExp(regex);
Iterable<Match> matches = regexp.allMatches(regex);
String viewName = null;
if(matches.length == 0) {
    // testURL didn't match regex; throw error.
} else {
    // It matched, now extract "fizz" from testURL...
    viewName = ??? // (ex: matches.group(2)), etc.
}
In the above code, I know I'm using the RegExp API incorrectly (I'm not even using testURL anywhere), and on top of that, I have no clue how to use the RegExp API to extract (in this case) the "fizz" segment/group out of the URL.
The RegExp class comes with a convenience method for a single match:
RegExp regExp = new RegExp(r"^http://myapp.example.com/#([^?]+)");
var match = regExp.firstMatch("http://myapp.example.com/#fizz?a=1");
print(match[1]);
Note: I used anubhava's regular expression (yours was not escaping the ? correctly).
Note 2: even though it's not necessary here, it is usually a good idea to use raw strings for regular expressions, since you don't need to escape $ and \ in them. Sometimes triple-quoted raw strings are convenient too: new RegExp(r"""some'weird"regexp\$""").
Try this regex:
String regex = "^http://myapp.example.com/#([^?]+)";
And then grab: matches.group(1)
String regex = "^http://myapp.example.com/#([^?]+)";
Then:
var match = matches.elementAt(0);
print("${match.group(1)}"); // output : fizz
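The answers above cover the extraction step; the two-step flow from the question (check the match, then pull out the group) can be sketched as follows. This is a JavaScript illustration rather than Dart, and it makes assumptions beyond the answers: the dots in the host name are escaped, the optional query string is matched explicitly, and the pattern is anchored at both ends. The names VIEW_URL and extractViewName are hypothetical.

```javascript
// Assumed pattern: fixed prefix, then one or more non-"?" characters (group 1),
// then an optional query string, anchored at both ends.
const VIEW_URL = /^http:\/\/myapp\.example\.com\/#([^?]+)(?:\?.*)?$/;

// Hypothetical helper: returns the view name, or null when the URL does not match.
function extractViewName(url) {
  const match = VIEW_URL.exec(url);
  return match === null ? null : match[1];
}

console.log(extractViewName("http://myapp.example.com/#fizz?a=1")); // fizz
console.log(extractViewName("http://myapp.example.com/#fizz"));     // fizz
console.log(extractViewName("http://myapp.example.com/#"));         // null
```

Returning null for a non-match keeps the "did it match" decision and the extraction in a single call; the caller can throw an error on null if that fits the app better.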

Regex to find substring between two strings

I'd like to capture the value of the Initial Catalog in this string:
"blah blah Initial Catalog = MyCat'"
I'd like the result to be: MyCat
There may or may not be spaces before and after the equal sign, and there may or may not be spaces before the single quote.
Tried this and various others but no go:
/Initial Catalog\s?=\s?.*\s?\'/
Using .Net.
You need to put parentheses around the part of the string that you would like to match:
/Initial Catalog\s*=\s*(.*?)\s*'/
You also want to exclude as many spaces as possible before the ', so you need \s* rather than \s?. Because .*? is lazy, the extracted part of the string does not include those spaces.
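To see the difference the lazy quantifier makes, here is a small JavaScript sketch that runs the lazy pattern from this answer next to a greedy variant. The sample strings and the helper name capture are made up for illustration; the same pattern syntax works in .NET.

```javascript
// Lazy group: stops as early as possible, so trailing spaces go to the \s* that follows.
const lazy = /Initial Catalog\s*=\s*(.*?)\s*'/;
// Greedy variant, for comparison only: the group swallows the trailing spaces.
const greedy = /Initial Catalog\s*=\s*(.*)\s*'/;

// Hypothetical helper: returns group 1, or null when the pattern does not match.
function capture(pattern, text) {
  const match = pattern.exec(text);
  return match === null ? null : match[1];
}

const padded = "blah blah Initial Catalog=MyCat   '";
console.log(JSON.stringify(capture(lazy, padded)));   // "MyCat"
console.log(JSON.stringify(capture(greedy, padded))); // "MyCat   "
console.log(capture(lazy, "blah blah Initial Catalog = MyCat'")); // MyCat
```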
This is a nice regex
= *(.*?) *'
Use the idea and add \s and more literal text as needed.
In C# group 1 will contain the match
string resultString = null;
try {
    Regex regexObj = new Regex("= *(.*?) *'");
    resultString = regexObj.Match(subjectString).Groups[1].Value;
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
Regex rgx = new Regex(@"=\s*([A-Za-z]+)\s*'");
String result = rgx.Match(text).Groups[1].Value;