Modify a regex to exclude content within curly braces

In JavaScript, my regex is:
let regEx = new RegExp("([A-Za-z])(?!([^<]+)?>)", "gi");
My string is:
<span class="customer-key">{aaa}</span>bbb {ccc}
In the above example, the regex matches "aaa", "bbb" and "ccc".
I would like to update my regex to EXCLUDE anything WITHIN curly braces, so that it ONLY matches "bbb".
How can I update the regex to do so? thanks!

Try this (your regex matches letters one at a time, so mine does too):
let regEx = new RegExp("([A-Za-z])(?!([^<]+)?>)(?!([^{]+)?})", "gi");
let str= '<span class="customer-key">{aaa}</span>bbb {ccc}'
let s =str.match(regEx);
console.log(s)
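If whole words are wanted rather than single letters, the same two negative lookaheads can follow a `+` quantifier. This is only a sketch: the names `wordOutsideTagsAndBraces`, `sample` and `words` are illustrative, and it assumes tags and braces are balanced and not nested.

```javascript
// Sketch: match runs of letters that are neither inside <...> nor inside {...}.
// Assumption: every "<" has a matching ">" and every "{" has a matching "}".
const wordOutsideTagsAndBraces = /[A-Za-z]+(?![^<]*>)(?![^{]*\})/g;

const sample = '<span class="customer-key">{aaa}</span>bbb {ccc}';

// match() with the g flag returns an array of all full matches (or null).
const words = sample.match(wordOutsideTagsAndBraces);
console.log(words);
```

A stray `}` or `>` later in the text would make the lookaheads reject earlier words, so this is best treated as a quick filter rather than a parser.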

If you want to get bbb, one option is to use the DOM: find the text node(s) and remove the content between curly braces:
const htmlString = `<span class="customer-key">{aaa}</span>bbb {ccc}`;
let div = document.createElement('div');
div.innerHTML = htmlString;
div.childNodes.forEach(x => {
  if (x.nodeType === Node.TEXT_NODE) {
    console.log(x.textContent.replace(/{[^}]+}/g, ''));
  }
});
Note that parsing html with a regex is not advisable.
If you want to get bbb from your example string, another option could be to match what you don't want to keep and to replace that with an empty string.
const regex = /\s*<[^>]+>\s*|\s*{[^}]+}\s*/gm;
const str = `<span class="customer-key">{aaa}</span>bbb {ccc}`;
const result = str.replace(regex, '');
console.log(result);

Related

Find all commas between two separate characters in string

I have a substring that contains commas. This substring lives inside of another string that is a semi colon delimited list. I need to match the commas in that substring. The substring has a key field "u3=" in front of it.
Example:
u1=something;u2=somethingelse;u3=cat,matt,bat,hat;u4=anotherthing;u5=yetanotherthing
Regex so far:
(?<=u3)(.*)(?=;)
The regex I've been working on above matches everything between "u3" and the last ";" in the outer string. I need to match only the commas in the substring.
Any guidance would be greatly appreciated.
You didn't specify language!
C#, VB (.NET):
Using an infinite positive lookbehind,
(?<=u3=[^;]*),
Java:
Using a variable-length positive lookbehind:
(?<=u3=[^;]{0,9999}),
PHP (PCRE), Perl, Ruby:
Using \G along with \K token:
(?>u3=|\G(?!^))[^,;]+\K,
JavaScript:
Using two replace() methods (if you are going to substitute),
var s = 'u1=something;u2=somethingelse;u3=cat,matt,bat,hat;u4=anotherthing;u5=yetanotherthing';
console.log(
  s.replace(/u3=[^;]+/, function (match) {
    return match.replace(/,/g, '*');
  })
);
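In JavaScript engines that support ES2018 lookbehind (an assumption about the runtime, e.g. a current Node.js), the .NET-style pattern above can be used directly. A minimal sketch, with `commaInU3`, `record`, `commaPositions` and `starred` as illustrative names:

```javascript
// Sketch: match only the commas that belong to the "u3=" field.
// Assumption: the engine supports variable-length lookbehind (ES2018).
const commaInU3 = /(?<=u3=[^;]*),/g;

const record = 'u1=something;u2=somethingelse;u3=cat,matt,bat,hat;u4=anotherthing;u5=yetanotherthing';

// Index of every matched comma in the original string.
const commaPositions = [...record.matchAll(commaInU3)].map((m) => m.index);
console.log(commaPositions);

// The same pattern also works for a single-pass substitution.
const starred = record.replace(commaInU3, '*');
console.log(starred);
```

Note that the lookbehind would also accept a key that merely ends in `u3=` (for example `xu3=`), so a stricter anchor may be needed for real data.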
Try to use this regex:
(?<=u3)[^;]+
The result is:
=cat,matt,bat,hat
If this was PHP I would do this:
<?php
$str = 'u1=something;u2=somethingelse;u3=cat,matt,bat,hat;u4=anotherthing;u5=yetanotherthing;';
$split = explode(';', $str);
foreach ($split as $key => $value) {
    $subsplit = explode('=', $value);
    if ($subsplit[0] == 'u3') {
        echo $subsplit[1];
        preg_match_all('/,/', $subsplit[1], $matches, PREG_OFFSET_CAPTURE);
    }
}
var_dump($matches);

Regular expression to extract href url

I want to extract the links from a String with regular expressions. I found a similar post here and I tried this code
let regex = try! NSRegularExpression(pattern: "<a[^>]+href=\"(.*?)\"[^>]*>.*?</a>")
let range = NSMakeRange(0, text.characters.count)
let htmlLessString :String = regex.stringByReplacingMatches(in: text,
options: [],
range:range ,
withTemplate: "")
but the proposed regular expression deleted the whole anchor tag, including the href content. My string looks like
SOME stirng some text I need to keep and other text
and the expected result is
SOME stirng https://com.mywebsite.com/yfgvh/f23/fsd some text I need to keep and other text
the perfect result is
SOME stirng some text I need to keep (https://com.mywebsite.com/yfgvh/f23/fsd) and other text
Do you have an idea if it's possible to achieve this?
Of course it deletes the href content because you are ...ReplacingMatches...with empty string.
Your sample string does not match the pattern because the closing tag </a> is missing.
The pattern "<a[^>]+href=\"(.*?)\"[^>]*>" checks until a closing angle bracket after the link.
The captured group is located at index 1 of the match. This code prints all extracted links:
import Foundation

let text = "<a href=\"https://com.mywebsite.com/yfgvh/f23/fsd\" rel=\"DFGHJ\">"
let regex = try! NSRegularExpression(pattern: "<a[^>]+href=\"(.*?)\"[^>]*>")
// NSRange counts UTF-16 code units, so measure the string the same way.
let range = NSRange(location: 0, length: text.utf16.count)
let matches = regex.matches(in: text, range: range)
for match in matches {
    // Group 1 holds the captured URL.
    let link = (text as NSString).substring(with: match.range(at: 1))
    print(link)
}
I'm not a regular Swift developer, but did you try using the withTemplate option of stringByReplacingMatches, like this?
let regex = try! NSRegularExpression(pattern: "<a[^>]+href=\"(.*?)\"[^>]*>(.*?)</a>")
let range = NSRange(location: 0, length: text.utf16.count)
// $1 is the URL, $2 is the text between <a> and </a>.
let htmlLessString: String = regex.stringByReplacingMatches(in: text,
                                                            options: [],
                                                            range: range,
                                                            withTemplate: "$2 ($1)")
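The template idea is not specific to NSRegularExpression. The following JavaScript sketch shows the same two-group replacement; the input string is an assumed reconstruction of the question's text (with a closing `</a>` added), and `anchorPattern`, `html` and `flattened` are illustrative names.

```javascript
// Sketch: replace each anchor with "link text (url)".
// Assumption: the input is the question's string with the anchor tag intact and closed.
const anchorPattern = /<a[^>]+href="(.*?)"[^>]*>(.*?)<\/a>/g;

const html = 'SOME stirng <a href="https://com.mywebsite.com/yfgvh/f23/fsd" rel="DFGHJ">some text I need to keep</a> and other text';

// $1 is the captured URL, $2 is the captured link text.
const flattened = html.replace(anchorPattern, '$2 ($1)');
console.log(flattened);
```

Both groups are lazy (`.*?`) so that several links on one line are handled one at a time instead of being merged into a single match.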
This regex seems to work in this case: href="(.*)" .*">(.*)<\/a>(.*). Group 1 holds your URL, group 2 the text between <a> and </a>, and group 3 the text after </a>. NSRegularExpression only exposes groups as ranges, so a helper such as this extension makes it easier to get the group contents out: http://samwize.com/2016/07/21/how-to-capture-multiple-groups-in-a-regex-with-swift/

Matching but not capture a string in Swift Regex

I'm trying to search for a single plain quote mark (') in a given String and replace it with a single curved quote mark (’). I have tested several patterns, but every time the match also includes the adjacent text. For example, in the string "I'm", it matches the "I" and the "m" along with the ' mark.
(?:\\S)'(?:\\S)
Is there a way to achieve this, or does the Swift implementation of regex not support non-capturing groups?
EDIT:
Example
let startingString = "I'm"
let myPattern = "(?:\\S)(')(?:\\S)"
let mySubstitutionText = "’"
let result = (applyReg(startingString, pattern: myPattern, substitutionText: mySubstitutionText))
func applyReg(startingString: String, pattern: String, substitutionText: String) -> String {
    var newStr = startingString
    if let regex = try? NSRegularExpression(pattern: pattern, options: .CaseInsensitive) {
        let regStr = regex.stringByReplacingMatchesInString(startingString, options: .WithoutAnchoringBounds, range: NSMakeRange(0, startingString.characters.count), withTemplate: startingString)
        newStr = regStr
    }
    return newStr
}
In regex, you can use lookarounds to achieve this behavior:
let myPattern = "(?<=\\S)'(?=\\S)"
Lookarounds do not consume the text they match, they just return true or false so that the regex engine could decide what to do with the currently matched text. If the condition is met, the regex pattern is evaluated further, and if not, the match is failed.
However, using capturing seems quite valid here, do not discard that approach.
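The effect of the lookarounds can be checked quickly in JavaScript, which uses the same syntax (lookbehind needs an ES2018-capable engine, which is an assumption here). The name `apostrophe` and the `a'b'c` input are illustrative.

```javascript
// Sketch: only the quote itself is part of the match; the neighbours are just checked.
// Assumption: the engine supports lookbehind (ES2018).
const apostrophe = /(?<=\S)'(?=\S)/g;

const curled = "I'm".replace(apostrophe, '’');
console.log(curled);

// Because the neighbours are not consumed, adjacent quotes can share a neighbour.
const curledTwice = "a'b'c".replace(apostrophe, '’');
console.log(curledTwice);

// A quote at the very start or end has no non-space neighbour on one side and is left alone.
const untouched = "'quoted'".replace(apostrophe, '’');
console.log(untouched);
```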
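Put your quote in a capture group by itself:
(?:\\S)(')(?:\\S)
For example, when matching against "I'm", the full match is still "I'm", but capture group 1 contains only "'". The surrounding characters are matched without being captured.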
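When the goal is a replacement, the neighbours have to be put back, so in practice they are captured too and referenced in the template. A JavaScript sketch of that variant, with `consuming` as an illustrative name:

```javascript
// Sketch: capture both neighbours and re-insert them around the curved quote.
const consuming = /(\S)'(\S)/g;

const viaGroups = "I'm".replace(consuming, '$1’$2');
console.log(viaGroups);

// Limitation: each match consumes its neighbours, so in "a'b'c" the "b" is
// used up by the first match and the second quote is not replaced.
const partly = "a'b'c".replace(consuming, '$1’$2');
console.log(partly);
```

This is the main practical difference from the lookaround pattern, which leaves the neighbouring characters available for the next match.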

extjs 4.2 - replace substring, but avoid changing tags

In a given string, e.g. "This <TERM>is</TERM> my question <TERM>for</TERM> you", I need to replace a substring.
Goal: surround substring with <MARK>-Tags for highlighting.
Example:
substring: 'e'
desired result: This <TERM>is</TERM> my qu<mark>e</mark>stion <TERM>for</TERM> you
actual result: This <T<mark>E</mark>RM>is</T<mark>E</mark>RM> my qu<mark>e</mark>stion <T<mark>E</mark>RM>for</T<mark>E</mark>RM> you
How can I prevent the pseudo-tags <TERM> and </TERM> from being changed?
Until now, I have used this:
// In my example, filterValue would be the substring 'e'.
var filterStartTag = '<mark>';
var filterEndTag = '</mark>';
var replaceFilterValue = filterStartTag + filterValue + filterEndTag;
value = value.replace(new RegExp(filterValue, 'gi'), replaceFilterValue);
I need to find a solution where also substrings like "t", "te", "erm" (and so on) can properly be replaced.
Any help would be highly appreciated.
Try something like this:
value = value.replace(/(e)(?![^<]*>)/g, "<mark>$1</mark>");
The (e) is what $1 refers to in the replacement, and it can be swapped for any regex that matches what you need (keep it in parentheses so that $1 still works). To make it dynamic:
var r = new RegExp("(" + filterValue + ")(?![^<]*>)", "gi");
value = value.replace(r, "<mark>$1</mark>");
Essentially, this matches the text in the first parentheses unless it sits inside a tag: the negative lookahead (?![^<]*>) rejects any match that is followed by a > with no < in between. If filterValue can contain regex metacharacters, escape them before building the pattern.
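A minimal, self-contained sketch of the tag-skipping replacement, using the question's example string. The helper names `escapeRegExp` and `highlight` are hypothetical, and the approach assumes every `<` in the text opens a tag that is closed by `>`.

```javascript
// Sketch: wrap occurrences of filterValue in <mark> tags, skipping text inside <...>.

// Escape regex metacharacters so the filter is treated as literal text.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function highlight(value, filterValue) {
  // (?![^<]*>) rejects a match that is followed by ">" before any "<",
  // which means the match sits inside a tag.
  const pattern = new RegExp('(' + escapeRegExp(filterValue) + ')(?![^<]*>)', 'gi');
  return value.replace(pattern, '<mark>$1</mark>');
}

const sentence = 'This <TERM>is</TERM> my question <TERM>for</TERM> you';
console.log(highlight(sentence, 'e'));
console.log(highlight(sentence, 'erm'));
```

With `'erm'` as the filter, the only candidates are inside the pseudo-tags, so the string comes back unchanged.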

Regex to fix GetSafeHtmlFragment x_ prefix

When using Sanitizer.GetSafeHtmlFragment from Microsoft's AntiXSSLibrary 4.0, I noticed it changes my HTML fragment from:
<pre class="brush: csharp">
</pre>
to:
<pre class="x_brush: x_csharp">
</pre>
Sadly their API doesn't allow us to disable this behavior. Therefore I'd like to use a regular expression (C#) to fix and replace strings like "x_anything" to "anything", that occur inside a class="" attribute.
Can anyone help me with the RegEx to do this?
Thanks
UPDATE - this worked for me:
private string FixGetSafeHtmlFragment(string html)
{
    string input = html;
    Match match = Regex.Match(input, "class=\"(x_).+\"", RegexOptions.IgnoreCase);
    if (match.Success)
    {
        string key = match.Groups[1].Value;
        return input.Replace(key, "");
    }
    return html;
}
I'm not 100% sure about the C# @ (verbatim string) symbol, but I think this should match x_ inside any class="" and replace it with an empty string:
string input = "class=\"x_something\"";
Match match = Regex.Match(input, @"class=""(x_).+""",
    RegexOptions.IgnoreCase);
if (match.Success)
{
    string key = match.Groups[1].Value;
    string v = input.Replace(key, "");
}
It's been over a year since this was posted, but here's a regex that removes the x_ prefix from up to three class names in a class attribute. I'm sure there's a cleaner way, but it gets the job done.
VB.Net Code:
Regex.Replace(myHtml, "(<\w+\b[^>]*?\b)(class="")x[_]([a-zA-Z]*)( )?(?:x[_])?([a-zA-Z]*)?( )?(?:x[_])?([^""]*"")", "$1$2$3$4$5$6$7")
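An alternative that does not limit the number of class names is to match the whole class attribute first and strip the prefixes inside a replacement callback. The sketch below is in JavaScript for brevity (C# offers the same pattern through a MatchEvaluator); `stripClassPrefix` is a hypothetical helper, and it assumes double-quoted attribute values.

```javascript
// Sketch: remove the "x_" prefix from class names, but only inside class="..." attributes.
// Assumption: attribute values are double-quoted.
function stripClassPrefix(html) {
  return html.replace(/(class=")([^"]*)(")/gi, (whole, open, classNames, close) =>
    // \b keeps names such as "max_width" intact: only a leading "x_" is removed.
    open + classNames.replace(/\bx_/g, '') + close
  );
}

console.log(stripClassPrefix('<pre class="x_brush: x_csharp">\n</pre>'));
```

Text outside class attributes is never touched, which avoids the side effect of a plain string replace on `x_` across the whole fragment.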