RNA secondary structure regexp pattern for HTML5 - regex

I need a RegExp to identify an RNA secondary structure in an HTML5 web page.
An RNA secondary structure is simply a string that contains ONLY dots and balanced parentheses. It is used to describe the shape of an RNA; if we know the target shape, we can guess a sequence of bases that would fold into it.
Please note that it should contain at least one dot (.).
For example
.....((((...).))..)....
(((....)))
....(((..)))...()...(....((..)))
are true RNA secondary structures, but
.....((((....)))...
....a.((((......))))......
((((()))))
are not true structures.
These are all my failed attempts at identifying structures:
<input type="text" pattern="/[.()]/g" />
<input type="text" pattern="/[.()]/g" />
<input type="text" pattern="/[\.\(\)]/g" />
<input type="text" pattern="/[().]/g" />
<input type="text" pattern="/[()\.]/g" />
<input type="text" pattern="/[\.()]/g" />
I'm new to RegExp and I have to publish my program on the web because my teacher told me to!
And PLEASE just tell me the RegExp I should use! My program ( libRNA ) itself checks the balancing of parentheses!
libRNA

It is impossible to do generalized bracket balancing (arbitrarily many nesting levels of brackets) with the level of support offered by JavaScript RegExp. (In Perl, PCRE, and .NET regular expressions, generalized bracket balancing is possible.)
You can write a simple JavaScript function to check, though:
function isValidSequence(str) {
    if (!/\./.test(str)) {
        // Dot . not found
        return false;
    }
    var openBrackets = 0;
    for (var i = 0; i < str.length; i++) {
        if (str[i] === "(") {
            openBrackets++;
        } else if (str[i] === ")") {
            if (openBrackets > 0) {
                openBrackets--;
            } else {
                // Reject the case ..)(..
                return false;
            }
        } else if (str[i] !== ".") {
            // Garbage character, since it is not . or ( or )
            return false;
        }
    }
    // Check all brackets are properly closed
    return openBrackets === 0;
}

/[().]+/g
would match everything that looks like an RNA secondary structure (i.e. a continuous sequence of dots and parentheses). You should first use this regex to find possible matches.
Then, you can check whether at least one dot is contained within each of those matches using
if (submatch.indexOf(".") !== -1)
But you can't check whether the parentheses are correctly balanced - for that you need a parser like nhahtdh suggested.
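Since the question says libRNA checks the balancing itself, the character check and the at-least-one-dot rule can be folded into one anchored pattern. A minimal sketch, assuming that is all the form field has to enforce (the names STRUCTURE_CHARS and looksLikeStructure are made up for illustration):

```javascript
// Only dots and parentheses, with at least one dot somewhere in the string.
// Balancing is deliberately NOT checked here.
const STRUCTURE_CHARS = /^[().]*\.[().]*$/;

function looksLikeStructure(text) {
  return STRUCTURE_CHARS.test(text);
}

console.log(looksLikeStructure("(((....)))"));                 // true
console.log(looksLikeStructure("((((()))))"));                 // false, no dot
console.log(looksLikeStructure("....a.((((......))))......")); // false, stray "a"
```

In an HTML5 pattern attribute the same expression is written without the slashes, flags and anchors, because the attribute value has to match the whole field anyway: pattern="[().]*\.[().]*"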

The problem here is that what you actually need to match is:
a = . | .(a) | (a). | .a | a.
The main reason this is hard, if not impossible, to solve with regular expressions is that every opening parenthesis needs a matching closing one.
It should be possible to do this with JavaScript. You need to do something like this:
Set a parenthesis counter to 0. Iterate over the entire structure. When an opening parenthesis is found, increase the counter. If you find a closing parenthesis, decrease the counter. If the counter ever drops below zero, a closing parenthesis came before its opening one and you can abort.
If at the end of the parsing the counter is back to zero, the structure is OK. The only thing missing now is the required dot. For that I would introduce another variable, justOpened or something similar. When you find an opening parenthesis, you set it to true. When you find a dot, you set it to false. If you find a closing parenthesis and your variable is true, you can abort, because your structure is broken.
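The counter plus justOpened idea could be sketched roughly as follows. This is only an illustration of the steps described above; the function name checkStructure is hypothetical, and the sketch also rejects a closing parenthesis that has no opener and any character other than dots and parentheses.

```javascript
function checkStructure(structure) {
  let depth = 0;          // number of currently open parentheses
  let justOpened = false; // true while no dot has followed the last "("

  for (const ch of structure) {
    if (ch === "(") {
      depth++;
      justOpened = true;
    } else if (ch === ".") {
      justOpened = false;
    } else if (ch === ")") {
      // Abort on "()" with nothing inside, or on ")" without an opener.
      if (justOpened || depth === 0) return false;
      depth--;
    } else {
      return false; // not a dot or a parenthesis
    }
  }
  return depth === 0; // every "(" must have been closed
}

console.log(checkStructure("(((....)))"));          // true
console.log(checkStructure("((((()))))"));          // false
console.log(checkStructure(".....((((....)))...")); // false
```

Two caveats with this rule as described: it rejects an empty pair (), which the third valid example in the question contains, and it does not by itself reject an empty string, so a separate at-least-one-dot check may still be needed.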

Related

Regex trim all <br>'s on a string while ignoring line breaks and spaces

var str = `
<br><br/>
<Br>
foobar
<span>yay</span>
<br><br>
catmouse
<br>
`;
//this doesn't work but what I have so far
str.replace(/^(<br\s*\/?>)*|(<br\s*\/?>)*$/ig, '');
var desiredOutput = `
foobar
<span>yay</span>
<br><br>
catmouse
`;
I want to ensure that I remove all <br>'s regardless of case or ending slash being present. And I want to keep any <br>'s that reside in the middle of the text. There may be other html tags present.
Edit: I want to note that this will be happening server-side so DOMParser won't be available to me.
We may try using the following pattern:
^\s*(<br\/?>\s*)*|(<br\/?>\s*)*\s*$
This pattern targets <br> tags (and their variants) only if they occur at the start or end of the string, possibly preceded or followed by some whitespace.
var str = '<br><br/>\n<Br>\nfoobar\n<span>yay</span>\n<br><br>\ncatmouse\n<br>';
console.log(str + '\n');
str = str.replace(/^\s*(<br\/?>\s*)*|(<br\/?>\s*)*\s*$/ig, '');
console.log(str);
Note that in general parsing HTML with regex is not advisable. But in this case, since you just want to remove flat non-nested break tags from the start and end, regex might be viable.
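For the server-side case, the two anchored alternatives can also be split into two separate replacements, which makes each half easier to read. A sketch under the assumption that only plain <br> variants (any case, optional whitespace and slash) need to go; the function name trimBreaks is hypothetical:

```javascript
function trimBreaks(html) {
  // One or more <br> tags at the very start, with any surrounding whitespace.
  const leading = /^(?:\s*<br\s*\/?>)+\s*/i;
  // One or more <br> tags at the very end, with any surrounding whitespace.
  const trailing = /(?:\s*<br\s*\/?>)+\s*$/i;
  return html.replace(leading, "").replace(trailing, "");
}

const input = "<br><br/>\n<Br>\nfoobar\n<span>yay</span>\n<br><br>\ncatmouse\n<br>";
console.log(trimBreaks(input));
// foobar
// <span>yay</span>
// <br><br>
// catmouse
```

Because each pattern requires at least one <br>, a string that only has ordinary leading or trailing whitespace is left untouched.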
Don't use a regular expression for this - regular expressions and HTML parsing don't work that well together. Even if it's possible with a regex, I'd recommend using DOMParser instead; transform the text into a document, and iterate through the first and last nodes, removing them while their tagName is BR (and removing empty text nodes too, if they exist):
var str = `
<br><br/>
<Br>
foobar
<span>yay</span>
<br><br>
catmouse
<br>
`;
const body = new DOMParser().parseFromString(str.trim(), 'text/html').body;
const nodes = [...body.childNodes];
// True for a whitespace-only text node; false for undefined or anything else.
const isBlankText = (n) => Boolean(n) && n.nodeType === 3 && n.textContent.trim() === '';
let node;
// Strip leading <br> elements, plus a whitespace-only text node after each one.
while ((node = nodes.shift()) && node.tagName === 'BR') {
    node.remove();
    if (isBlankText(nodes[0])) nodes.shift().remove();
}
// Strip trailing <br> elements the same way, working backwards from the end.
while ((node = nodes.pop()) && node.tagName === 'BR') {
    node.remove();
    if (isBlankText(nodes[nodes.length - 1])) nodes.pop().remove();
}
console.log(body.innerHTML);
Note that it gets a lot easier if you don't have to worry about empty text nodes, or if you don't care about whether there are empty text nodes or not in HTML output.
Try
/^(\s*<br\s*\/?>)*|(<br\s*\/?>\s*)*$/ig

Knockout Js Regex no negative numbers

I'm trying to find a regex code to disable negative numbers for the user input.
I've been playing around with the code a bit trying to find the right one, but haven't had much success.
My current code is:
Price: ko.observable().extend({
required: true,
pattern: '^[0-9].$'
})
In that case, why let the user enter negative numbers in the input field at all and then validate against them?
Instead, you can prevent the user from entering negative numbers or strings in the first place.
This uses JavaScript, but you don't have to write your own validation routine. Instead just check the validity.valid property. This will be true if and only if the input falls within the range.
Solution 1:
<html>
<body>
<form action="#">
<input type="number" name="test" min=0 oninput="validity.valid||(value='');">
</form>
</body>
</html>
Solution 2:
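The solution below does the check in a script instead. Note that document.querySelector returns only the first matching input, so to validate multiple inputs the same listener has to be attached to each of them.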
// Select your input element.
var numInput = document.querySelector('input');
// Listen for input event on numInput.
numInput.addEventListener('input', function(){
// Let's match only digits.
var num = this.value.match(/^\d+$/);
if (num === null) {
// If we have no match, value will be empty.
this.value = "";
}
}, false)
<input type="number" min="0" />
Solution 3:
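I haven't tested the solution below, but it might help as well.
A '^\\d+$' pattern may work along with your current approach. Note that in a JavaScript string literal the backslash has to be doubled, otherwise '\d' collapses to a plain 'd', and the slashes of a regex literal do not belong inside a string.
Price: ko.observable().extend({
    required: true,
    pattern: '^\\d+$'
})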
Original Solution and Reference here..
Hope this helps...
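One pitfall with writing the pattern as a string is JavaScript's own escaping. How Knockout Validation turns the string into a regular expression is not shown here, so the sketch below only illustrates plain JavaScript behaviour; the function name isNonNegativeInteger is hypothetical.

```javascript
// An unknown escape such as \d loses its backslash inside an ordinary string.
const fromPlainString = new RegExp('^\d+$');    // source becomes ^d+$
const fromEscapedString = new RegExp('^\\d+$'); // source stays ^\d+$

function isNonNegativeInteger(input) {
  return fromEscapedString.test(String(input));
}

console.log(fromPlainString.source);      // ^d+$
console.log(fromPlainString.test("123")); // false
console.log(isNonNegativeInteger("123")); // true
console.log(isNonNegativeInteger("-5"));  // false
```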
You could use the digit class \d (with the backslashes doubled, because the pattern is written as a JavaScript string):
pattern: '^\\d+\\.?$'
This matches the following:
The number must start at the beginning of the line
Must consist of 1 or more digits
Can have the character "." 0 or 1 times
The number must end at the end of the line
Here are some examples of matches: "34", "5", "45687654", "1.", "198289."
I notice that you said you wanted to avoid negative numbers; your approach was to anchor the number to the beginning and end of the line. You can also use a negative lookbehind to check that the number is not preceded by a minus sign, such as with
pattern: '(?<!-)\\b\\d+\\.?'
I also added a word boundary check (\b) so that it would not try to match the 23 in -123. The backslashes are doubled because inside a JavaScript string a single '\b' would be read as a backspace character.
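To see what the lookbehind variant does on its own, here is a small sketch using a regex literal, which needs no doubled backslashes. It assumes a JavaScript engine with lookbehind support (ES2018 or later); the function name findUnsigned is hypothetical.

```javascript
// A run of digits that starts at a word boundary and is not preceded by "-".
const unsignedNumber = /(?<!-)\b\d+\.?/;

function findUnsigned(text) {
  const match = text.match(unsignedNumber);
  return match ? match[0] : null;
}

console.log(findUnsigned("34"));        // "34"
console.log(findUnsigned("price 45.")); // "45."
console.log(findUnsigned("-123"));      // null
```

Since this pattern is not anchored, it finds a number anywhere in the text; for validating a whole input value, the anchored ^\d+\.?$ form is the stricter choice.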

regex find content question

I'm trying to use ColdFusion's refind function to find the content within the angle brackets in this example:
joe smith <joesmith#domain.com>
The resulting text should be
joesmith#domain.com
Using this
<cfset reg = refind(
"/(?<=\<).*?(?=\>)/s","Joe <joe#domain.com>") />
Not having any luck. Any suggestions?
Maybe a syntax issue, it works in an online regex tester I use.
You can't use lookbehind with CF's regex engine (uses Apache Jakarta ORO).
However, you can use Java's regex engine, which does support them, and I've created a wrapper CFC that makes this even easier. Available from:
http://www.hybridchill.com/projects/jre-utils.html
(Update: The wrapper CFC mentioned above has evolved into a full project. See cfregex.net for details.)
Also, the /.../s stuff isn't required/relevant here.
So, from your example, but with improved regex:
<cfset jrex = createObject('component','jre-utils').init()/>
<cfset reg = jrex.match( "(?<=<)[^<>]+(?=>)" , "Joe <joe#domain.com>" ) />
A quick note, since I've updated that regex a few times; hopefully it's at its best now...
(?<=<) # positive lookbehind - start matching at `<` but don't capture it.
[^<>]+ # any char except `<` or `>`, the `+` meaning one-or-more greedy.
(?=>) # positive lookahead - only succeed if there's a `>` but don't capture it.
I've never been happy with the regular expression matching functions in CF. Hence, I wrote my own:
<cfscript>
function reFindNoSuck(string pattern, string data, numeric startPos = 1){
var sucky = refindNoCase(pattern, data, startPos, true);
var i = 0;
var awesome = [];
if (not isArray(sucky.len) or arrayLen(sucky.len) eq 0){return [];} //handle no match at all
for(i=1; i<= arrayLen(sucky.len); i++){
//if there's a match with pos 0 & length 0, that means the mime type was not specified
if (sucky.len[i] gt 0 && sucky.pos[i] gt 0){
//don't include the group that matches the entire pattern
var matchBody = mid( data, sucky.pos[i], sucky.len[i]);
if (matchBody neq arguments.data){
arrayAppend( awesome, matchBody );
}
}
}
return awesome;
}
</cfscript>
Applied to your problem, here is my example:
<cfset origString = "joe smith <joesmith#domain.com>" />
<cfset regex = "<([^>]+)>" />
<cfset matches = reFindNoSuck(regex, origString) />
Dumping the "matches" variable shows that it is an array with 2 items. The first will be <joesmith#domain.com> (because it matches the entire regex) and the second will be joesmith#domain.com (because it matches the 1st group defined in the regular expression -- all subsequent groups would also be captured and included in the array).
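The same layout of results (whole match first, then each group) shows up in other regex engines too. A small JavaScript sketch for comparison, not ColdFusion; the function name extractBracketed is hypothetical:

```javascript
function extractBracketed(text) {
  // match[0] is the whole match including the angle brackets,
  // match[1] is the first capturing group, i.e. the text between them.
  const match = text.match(/<([^>]+)>/);
  return match ? match[1] : null;
}

const result = "joe smith <joesmith#domain.com>".match(/<([^>]+)>/);
console.log(result[0]); // <joesmith#domain.com>
console.log(result[1]); // joesmith#domain.com
console.log(extractBracketed("joe smith <joesmith#domain.com>")); // joesmith#domain.com
```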
/\<([^>]+)\>$/
something like that, didn't test it though, that one's yours ;)

Flex 3 Regular Expression Problem

I've written a url validator for a project I am working on. For my requirements it works great, except when the last part for the url goes longer than 22 characters it breaks. My expression:
/((https?):\/\/)([^\s.]+.)+([^\s.]+)(:\d+\/\S+)/i
It expects input that looks like "http(s)://hostname:port/location".
When I give it the input:
https://demo10:443/111112222233333444445
it works, but if I pass the input
https://demo10:443/1111122222333334444455
it breaks. You can test it out easily at http://ryanswanson.com/regexp/#start. Oddly, I can't reproduce the problem with just the relevant (I would think) part /(:\d+\/\S+)/i. I can have as many characters after the required / and it works great. Any ideas or known bugs?
Edit:
Here is some code for a sample application that demonstrates the problem:
<mx:Application xmlns:mx="http://www.adobe.com/2006/mxml" layout="absolute">
<mx:Script>
<![CDATA[
private function click():void {
var value:String = input.text;
var matches:Array = value.match(/((https?):\/\/)([^\s.]+.)+([^\s.]+)(:\d+\/\S+)/i);
if(matches == null || matches.length < 1 || matches[0] != value) {
area.text = "No Match";
}
else {
area.text = "Match!!!";
}
}
]]>
</mx:Script>
<mx:TextInput x="10" y="10" id="input"/>
<mx:Button x="178" y="10" label="Button" click="click()"/>
<mx:TextArea x="10" y="40" width="233" height="101" id="area"/>
</mx:Application>
I debugged your regular expression on RegexBuddy and apparently it takes millions of steps to find a match. This usually means that something is terribly wrong with the regular expression.
Look at ([^\s.]+.)+([^\s.]+)(:\d+\/\S+).
1- It seems like you're trying to match subdomains too, but it doesn't work as intended since you didn't escape the dot. If you escape it, demo10:443/123 won't match because it'll need at least one dot. Change ([^\s.]+\.)+ to ([^\s.]+\.)* and it'll work.
2- [^\s.]+ is too permissive a character class: it matches all the way to the end of the string and has to backtrack from there. You can avoid this by using [^\s:.], which stops at the colon.
This one should work as you want:
https?:\/\/([^\s:.]+\.)*([^\s:.]+):\d+\/\S+
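The rewritten expression can be tried outside Flex as well. The sketch below is JavaScript rather than ActionScript, and it adds ^ and $ anchors (my addition, standing in for the matches[0] != value comparison in the question); the names HOST_PORT_URL and isHostPortUrl are hypothetical.

```javascript
// scheme://host[.host...]:port/location, with the dot escaped and the
// host character class stopping at ":" so it cannot swallow the port.
const HOST_PORT_URL = /^https?:\/\/([^\s:.]+\.)*([^\s:.]+):\d+\/\S+$/i;

function isHostPortUrl(value) {
  return HOST_PORT_URL.test(value);
}

console.log(isHostPortUrl("https://demo10:443/111112222233333444445"));  // true
console.log(isHostPortUrl("https://demo10:443/1111122222333334444455")); // true
console.log(isHostPortUrl("https://demo10/location"));                   // false, no port
```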
This is a bug, either in Ryan's implementation or within Flex/Flash.
The regular expression syntax used above (less the surrounding slashes and flags) is also valid in Python, which produces the following output:
# ignore case insensitive flag as it doesn't matter in this case
>>> import re
>>> rx = re.compile('((https?):\/\/)([^\s.]+.)+([^\s.]+)(:\d+\/\S+)')
>>> print rx.match('https://demo10:443/1111122222333334444455').groups()
('https://', 'https', 'demo1', '0', ':443/1111122222333334444455')

Regular expression to verify that string contains { }

I need to write a regular expression to verify that a string contains { } but not a lone { or }. Can someone shed some light on this, please?
Thanks for all the help , here are some examples.
e.g.
valid : {abc}, as09{02}dd, {sdjafkl}sdjk, sfdsjakl,00{00}00, aaaaa{d}
invalid: {sdsf , sdfadf},sdf{Sdfs ,333}333
*********Update*******************
^[a-zA-Z0-9_\-. ]*(?:{[a-zA-Z0-9_\-.]+})?[a-zA-Z0-9_\-. ]*$ is what I need, thanks for all your help :)
/.*\{.*\}.*/
This would ensure that the string contains an opening curly bracket somewhere before a closing curly bracket, occurring anywhere in the string. However, it wouldn't be able to ensure that there's only one opening and closing curly bracket -- to do that, the .* patterns would have to be changed to something more restrictive.
If you want to experiment and test these regexes out, here's a good site.
What flavor of regex? In JavaScript, for instance, this'll do it:
var re = /\{.*\}/;
alert("A: " + re.test("This {is} a match"));
alert("B: " + re.test("This {is not a match"));
alert("C: " + re.test("This } is not a match"));
Alerts A: true, B: false, and C: false.
Most other flavors will be similar.
For this problem a regex-based solution is way too heavy.
If you have the option of NOT using regexes, don't; simpler statements can handle it just fine.
Even the much more general problem of checking whether (potentially nested) parentheses are used correctly is solvable with a simple one-pass loop.
I.e. this is correct
{}{{{}{}}}
while this isn't
{{}
Solution in python (easy to translate to other language):
def check(s):
    counter = 0
    for character in s:
        if character == "{":
            counter += 1
        elif character == "}":
            counter -= 1
        if counter < 0:
            # not valid
            return False
    if counter == 0:
        # valid
        return True
    else:
        return False
There is exactly one opening brace and exactly one closing brace in the string, and the closing brace follows the opening brace:
^[^\{\}]*\{[^\{\}]*\}[^\{\}]*$
There are any number of braces in the string, but they are not nested (there is never another opening brace before the previous one has been closed), and they are always balanced:
^[^\{\}]*(\{[^\{\}]*\}[^\{\}]*)*$
Nesting cannot be generally solved by regular expressions.
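As a quick check of the non-nested case against the examples from the question, here is a JavaScript sketch. The pattern is written with quantifiers so that arbitrary text may surround each brace pair; the names NON_NESTED_BRACES and hasWellFormedBraces are hypothetical.

```javascript
// Any text without braces, then zero or more "{...}" pairs,
// each optionally followed by more brace-free text. No nesting allowed.
const NON_NESTED_BRACES = /^[^{}]*(?:\{[^{}]*\}[^{}]*)*$/;

function hasWellFormedBraces(text) {
  return NON_NESTED_BRACES.test(text);
}

const valid = ["{abc}", "as09{02}dd", "{sdjafkl}sdjk", "sfdsjakl", "00{00}00", "aaaaa{d}"];
const invalid = ["{sdsf", "sdfadf}", "sdf{Sdfs", "333}333"];

console.log(valid.every(hasWellFormedBraces)); // true
console.log(invalid.some(hasWellFormedBraces)); // false
```

Note that this pattern also accepts strings with no braces at all and strings with several pairs; requiring exactly one pair needs the stricter single-pair expression.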