Refactoring starting place for regex

I have a function that strips HTML markup so the text can be displayed inside a text element.
stripChar: function stripChar(string) {
string = string.replace(/<\/?[^>]+(>|$)/g, "")
string = string.trim()
string = string.replace(/(\n{2,})/gm,"\n\n");
string = string.replace(/&hellip;/g,"...")
string = string.replace(/&nbsp;/g,"")
let changeencode = entities.decode(string);
return changeencode;
}
This has worked great for me, but I have a new requirement and I'm struggling to work out where I should start refactoring the code above. I still need to strip out the above, but I have 2 exceptions:
List items, <ul><li>: I need to handle these so that they still appear as bullet points
Hyperlinks: I want to use react-native-hyperlink, so I need to leave the <a> intact for me to handle separately
Whilst the function is great for general tag replacement, it's less flexible for my needs above.

You may use
stripChar: function stripChar(string) {
string = string.replace(/&nbsp;|<(?!\/?(?:li|ul|a)\b)\/?[^>]+(?:>|$)/g, "");
string = string.trim();
string = string.replace(/\n{2,}/g,"\n\n");
string = string.replace(/&hellip;/g,"...");
let changeencode = entities.decode(string);
return changeencode;
}
The main changes:
.replace(/&nbsp;/g,"") is merged into the first replace as an alternative.
The first replace is now used with a new regex pattern where the li, ul and a tags are excluded from the matches using a negative lookahead (?!\/?(?:li|ul|a)\b).
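To see what the negative lookahead keeps and drops, here is a minimal sketch of just the tag-matching half of the pattern. The sample markup, the function names, and the bullet conversion step are assumptions for illustration and not part of the original function; how list items should finally be rendered depends on the app.

```javascript
// Strip every tag except <ul>, <li> and <a> (same lookahead as in the answer).
function keepListsAndLinks(html) {
  return html.replace(/<(?!\/?(?:li|ul|a)\b)\/?[^>]+(?:>|$)/g, "").trim();
}

// Hypothetical follow-up step: turn the surviving list tags into plain-text bullets.
function listItemsToBullets(text) {
  return text
    .replace(/<li\b[^>]*>/g, "\u2022 ") // opening <li> becomes a bullet
    .replace(/<\/li>/g, "\n")           // closing </li> ends the line
    .replace(/<\/?ul\b[^>]*>/g, "\n");  // <ul> and </ul> become line breaks
}

const sample =
  '<div><p>Hello <b>world</b></p><ul><li>One</li><li>Two</li></ul>' +
  '<a href="https://example.com">link</a></div>';

const kept = keepListsAndLinks(sample);
const bulleted = listItemsToBullets(kept);

console.log(kept);
console.log(bulleted);
```

The \b after the tag names matters: without it, tags such as <abbr> or <link> would also be treated as exceptions because they start with "a" or "li".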

Related

Removing a String from a String REGEX

I have an automatically generated string which looks as follows:
["January","February",null,"April"]
I need to remove any match of ",null" from the string, ie:
["January","February",null,"April"] --> ["January","February","April"]
How can I find everything except for ",null"?
I have tried variations of "^(?!,null).*" without success.
To answer your question as stated, you don't need regex:
str = str.replace(",null", "");
However, to handle the edge cases too:
["January","February",null,"April"] --> ["January","February","April"]
["January",null,null,"April"] --> ["January","April"]
[null,"January","February","April"] --> ["January","February","April"]
["January","February","April",null] --> ["January","February","April"]
[null] --> []
you would be better served with regex:
str = str.replaceAll("(?<=\\[)null,?|,null", "");
The replacement regex caters for null in the first (and potentially only) position, and for any other case.
Alternatively, we can use the filter function:
var arr = [null, "January","February",null,"April", null];
var arr2 = arr.filter(function(x){return x !== null})
console.log(arr,arr2);
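If the generated string is valid JSON (an assumption; the question only shows that it looks like a JSON array), the filter approach can be applied to the string itself by parsing it first. A minimal sketch, with removeNulls being a hypothetical helper name:

```javascript
// Hypothetical helper: assumes the input text is a valid JSON array.
function removeNulls(jsonArrayText) {
  const items = JSON.parse(jsonArrayText); // throws if the text is not valid JSON
  return JSON.stringify(items.filter((x) => x !== null));
}

// The edge cases listed in the answer above.
const inputs = [
  '["January","February",null,"April"]',
  '["January",null,null,"April"]',
  '[null,"January","February","April"]',
  '["January","February","April",null]',
  '[null]',
];

for (const text of inputs) {
  console.log(text, "-->", removeNulls(text));
}
```

This avoids reasoning about comma placement at all, at the cost of requiring the input to parse.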

Search for an item in a text file using UIMA Ruta

I have been trying to search for an item which is there in a text file.
The text file looks like this, e.g.:
>HEADING
00345
XYZ
MethodName : fdsafk
Date: 23-4-2012
More text and some part containing instances of XYZ
So I did a dictionary search for XYZ initially and found the positions, but I want only the first XYZ and not the rest. XYZ has one distinguishing property: it will always be between the 5-digit code and the text MethodName.
I am unable to do that.
WORDLIST ZipList = 'Zipcode.txt';
DECLARE Zip;
Document{-> MARKFAST(Zip, ZipList)};
DECLARE Method;
"MethodName" -> Method;
WORDLIST typelist = 'typelist.txt';
DECLARE type;
Document{-> MARKFAST(type, typelist)};
Also, how do we use regex in UIMA Ruta?
There are many ways to specify this. Here are some examples (not tested):
// just remove the other annotations (assuming type is the one you want)
type{-> UNMARK(type)} ANY{-STARTSWITH(Method)};
// only keep the first one: remove any annotation if there is one somewhere in front of it
// you can also specify this with POSITION or CURRENTCOUNT, but both are slow
type # #type{-> UNMARK(type)};
// just create a new annotation in between
NUM{REGEXP(".....")} #{-> type} #Method;
There are two options to use regex in UIMA Ruta:
(find) simple regex rules like "[A-Za-z]+" -> Type;
(matches) REGEXP conditions for validating the match of a rule element like
ANY{REGEXP("[A-Za-z]+")-> Type};
Let me know if something is not clear. I will extend the description then.
DISCLAIMER: I am a developer of UIMA Ruta

How to remove text inside of parentheses with VB script RegExp

I am using a labeling software and I don't want any text inside of parentheses to display on the labels. Here is what I have so far
Function RemovePara(TextToBeEdited)
Set myRegEx = New RegExp
myRegEx.IgnoreCase = True
myRegEx.Global = True
myRegEx.Pattern = "\(([a-z]+?)\)(.+)"
Set RemovePara = myRegEx.Replace(txt, "")
End Function
Now I'm pretty new to this, and when I try to save this code in the labeling software it says "The script did not read the "Value" property, which means the current specified data source was ignored. This may not be what you intended". I put the name of the text field I want edited where "TextToBeEdited" is. What am I missing here?
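VBScript's RegExp object supports lookahead but not lookbehind, so a pattern such as "(?<=\()[^()]*(?=\))" will not work there. Match the parentheses explicitly and put them back in the replacement instead. Also, Replace returns a string, so assign it without Set, and pass the function's parameter rather than the undefined txt:
myRegEx.Pattern = "\([^()]*\)"
RemovePara = myRegEx.Replace(TextToBeEdited, "()")
Use "" as the replacement to drop the parentheses as well.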
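The matching behaviour can be checked outside the labeling software. The sketch below uses JavaScript, whose regex syntax is close to VBScript's for this kind of pattern; the sample label text and the patterns are assumptions for illustration, showing both ways of handling the parentheses themselves.

```javascript
// Hypothetical label text containing two parenthesised groups.
const label = "Widget (internal note) 12oz (discontinued)";

// Remove each group together with its parentheses and any whitespace before it.
const withoutGroups = label.replace(/\s*\([^()]*\)/g, "");

// Remove only the text inside, leaving empty parentheses behind.
const emptyParens = label.replace(/\([^()]*\)/g, "()");

console.log(withoutGroups);
console.log(emptyParens);
```

The character class [^()] stops each match at the nearest closing parenthesis, so two separate groups on one label are not merged into a single match.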

As3 Regex or alternative to split strings

I have an HTML page. I use a regex to remove all HTML tags from the page and extract the text using the code below.
var foo = loader.data.replace(/<.*?>/g, "");
var bar:Array = foo.split("Total");
foo = foo.split(bar[0]);
trace(foo);
In the lines below the replace call, I remove everything before the word "TOTAL". It does the job perfectly, but now I want to apply another split to get the content after "TOTAL" and remove the content after "BYTES".
So when I try to split it up again with
var bar2:Array = foo.split("BYTES");
foo = foo.split(bar2[0]);
Flash returns an error saying split is not a valid method.
I tried several other ways (replace), but Flash still produces errors.
Can anyone help me get through this?
Thank you
".split()" is a method of String. When you did the assignment below:
foo = foo.split(bar[0]);
foo became an array, and thus the call
var bar2:Array = foo.split("BYTES");
was made on an array, which is invalid (Array has no such method).
What you want instead is this:
var foo = loader.data.replace(/<.*?>/g, "");
trace(foo);
var result = foo.split("Total")[1].split("BYTES")[0];
trace(result);
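The same chained split can be wrapped in a small helper that also handles a missing start marker. This is a sketch in JavaScript, where String.split behaves the same way for this case; the helper name between and the sample page content are assumptions.

```javascript
// Hypothetical helper: returns the text between two markers, or null if the
// start marker is absent. If the end marker is absent, everything after the
// start marker is returned.
function between(text, startMarker, endMarker) {
  const afterStart = text.split(startMarker)[1]; // split returns an array
  if (afterStart === undefined) return null;
  return afterStart.split(endMarker)[0];
}

// Made-up page content standing in for loader.data.
const page = "<p>Header</p><b>Total</b> 1,024 BYTES<i>footer</i>";
const pageText = page.replace(/<.*?>/g, ""); // strip the tags first
const extracted = between(pageText, "Total", "BYTES");

console.log(Array.isArray(pageText.split("Total"))); // split yields an array, not a string
console.log(extracted === null ? "marker not found" : extracted.trim());
```

Indexing the array immediately, rather than assigning it back to foo, keeps foo a string throughout.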

How to highlight searched for words using a regular expression

Hi
I am working on a Groovy application that requires me to highlight (add spans to) the word that is searched for. For instance, given the text below:
youtube
[href="youtube.com] i am here , in Youtube[/a]
I want to search for the word "youtube", and when it is returned the above text should look like this:
[span]youtube[/span]
[href="youtube.com] i am here , in [span]Youtube[/span] [/a]
Any occurrence of "youtube" inside the href or in an iframe must be ignored.
At the moment I have the following code:
def m = test =~ /([^<]*)?(youtube)/
println m[0]
def highLightText = { attrs, body ->
def postBody = attrs.text
def m = postBody =~ /(?i:${attrs.searchTerm})/
def array = []
m.each{
array << it as String
}
array.unique()
String result = postBody
array.each{
result = result.replaceAll("${it}", "<span class='highlight'>${it}</span>")
}
out << result
}
And it returns:
[span]youtube[/span]
[href="[span]youtube[/span].com] i am here , in [span]Youtube[/span] [/a]
Can anyone help me with a regular expression that selects only words that are not contained in links or other tags?
Thanks
A maintainable solution is unlikely to be achievable using regular expressions - the problem is too complex.
Parse your HTML into a DOM and consider only text nodes as being suitable for potential highlighting. Text nodes will, by definition, be only those pieces of content that are rendered and will not be element names, attributes/attribute values and so on.
The complexity of your problem is then reduced to: how do I find and highlight a string within another string?
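A minimal sketch of the text-node idea, written in JavaScript rather than Groovy. It assumes the HTML has already been parsed by some parser into a simple tree; the node shapes, function names, and sample tree below are hypothetical and not any particular library's API.

```javascript
// Hypothetical node shapes (assumptions for this sketch):
//   { type: "text", value: "..." }
//   { type: "element", tag: "a", attrs: { href: "..." }, children: [...] }

function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns a list of nodes replacing `node`; only text nodes are ever split.
function highlight(node, term) {
  if (!term) return [node]; // nothing to search for
  if (node.type === "text") {
    const re = new RegExp(escapeRegExp(term), "gi");
    const out = [];
    let last = 0;
    for (const m of node.value.matchAll(re)) {
      if (m.index > last) out.push({ type: "text", value: node.value.slice(last, m.index) });
      out.push({
        type: "element", tag: "span", attrs: { class: "highlight" },
        children: [{ type: "text", value: m[0] }],
      });
      last = m.index + m[0].length;
    }
    if (last < node.value.length) out.push({ type: "text", value: node.value.slice(last) });
    return out;
  }
  // Elements: tag name and attributes are left untouched, only children are visited.
  return [{ ...node, children: node.children.flatMap((c) => highlight(c, term)) }];
}

// Naive serializer for the demo; it does not escape text or attribute values.
function serialize(node) {
  if (node.type === "text") return node.value;
  const attrs = Object.entries(node.attrs).map(([k, v]) => ` ${k}="${v}"`).join("");
  return `<${node.tag}${attrs}>${node.children.map(serialize).join("")}</${node.tag}>`;
}

const tree = {
  type: "element", tag: "div", attrs: {},
  children: [
    { type: "text", value: "youtube " },
    { type: "element", tag: "a", attrs: { href: "youtube.com" },
      children: [{ type: "text", value: "i am here, in Youtube" }] },
  ],
};

const highlighted = highlight(tree, "youtube").map(serialize).join("");
console.log(highlighted);
```

Because the href value lives in attrs and never passes through the text-node branch, it cannot be wrapped in a span, which is exactly the failure shown in the question.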