Python to Ruby regex search conversion

I have a simple Python regex: re.search('<span[\w\W]*?>Members[\w\W]*?([\w\W]*?)</span>', str). I'd like to do the same in Ruby.
From the docs, it appears that match should work. However, when I try
/<span[\w\W]*?>Members[\w\W]*?([\w\W]*?)</span>/.match(str) I get a syntax error.
I know this is something obvious, but I would love some help. Thank you.

You need to escape the / inside the closing </span> tag:
/<span[\w\W]*?>Members[\w\W]*?([\w\W]*?)<\/span>/.match(str)
                                         ^^
Otherwise the / is treated as the end of the regex literal.
You can also use .*? in place of [\w\W]*?, where . matches any character except a line break (add the m flag if the text spans several lines):
/<span.*?>Members.*?(.*?)<\/span>/.match(str)

If you have a lot of slashes to match, try %r notation:
%r{<span[\w\W]*?>Members[\w\W]*?([\w\W]*?)</span>}.match(str)
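
For comparison, here is a minimal Python sketch of the same search. In Python the pattern is an ordinary string, so the / in </span> needs no escaping. The sample HTML and the variable names are made up for illustration, since the question does not show its input.

```python
import re

# Hypothetical sample input; the question does not include its data.
html = '<span class="stat">Members\n 1,234</span>'

# [\w\W] matches any character, including line breaks.
pattern_any = re.compile(r'<span[\w\W]*?>Members[\w\W]*?([\w\W]*?)</span>')

# "." does not match a line break unless re.DOTALL is set.
pattern_dot = re.compile(r'<span.*?>Members.*?(.*?)</span>')
pattern_dotall = re.compile(r'<span.*?>Members.*?(.*?)</span>', re.DOTALL)

match_any = pattern_any.search(html)
match_dot = pattern_dot.search(html)
match_dotall = pattern_dotall.search(html)

# Text captured between "Members" and "</span>".
captured = match_any.group(1) if match_any else None
print(repr(captured))
print(match_dot)  # no match, because the sample text contains a line break
```

The two spellings only differ when the text between the tags spans more than one line, which is worth checking before swapping [\w\W] for a plain dot.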

Related

RegEx extract text from inside

I have an example string:
*DataFromAdHoc(cbgv)
I would like to extract by RegEx:
DataFromAdHoc
So far I have come up with something like this:
^[^#][^\(]+
Unfortunately, it does not give the expected result. Do you have any idea why it's not working?
The regex you tried, ^[^#][^\(]+, works as follows:
^[^#] matches one character at the beginning of the string that is not a #.
[^\(]+ then matches everything up to the first opening parenthesis (the parenthesis does not need to be escaped inside a character class).
So it matches *DataFromAdHoc, including the *, because * is not a #.
What you could do is capture the [^\(]+ part in a group, like ([^(]+).
Then your regex would look like:
^[^#]([^(]+)
and DataFromAdHoc would be in group 1.
Use ^\*(\w+)\(\w+\)$
It captures everything between the * and the part in parentheses.
Your answer may depend on which language you're running your regex in; please include that in your question.
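
Since the question does not name a language, here is a small sketch in Python (an assumption, chosen only for illustration) that runs the original attempt and both suggested patterns against the example string:

```python
import re

text = '*DataFromAdHoc(cbgv)'

# Original attempt: the leading [^#] consumes the "*", so it ends up in the match.
original = re.match(r'^[^#][^\(]+', text)

# Same idea, but only the part after the first character is captured.
grouped = re.match(r'^[^#]([^(]+)', text)

# Stricter variant: a literal "*", a word, then a parenthesised word.
strict = re.match(r'^\*(\w+)\(\w+\)$', text)

print(original.group(0))  # *DataFromAdHoc
print(grouped.group(1))   # DataFromAdHoc
print(strict.group(1))    # DataFromAdHoc
```

The stricter pattern rejects lines that do not start with * or that have anything other than word characters inside the parentheses, which may or may not be what you want.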

Regular Expression does not remove html comment?

I have the following string:
<TD><!-- 1.91 -->6949<!-- 9.11 --></TD>
I want to end up with:
<TD>6949</TD>
but instead I end up with just the tags and no information:
<TD></TD>
This is the regular expression I am using:
RegEx.Replace("<TD><!-- 1.91 -->6949<!-- 9.11 --></TD>","<!--.*-->","")
Can someone explain how to keep the numbers and remove just the comments? Also, if possible, can someone explain why this is happening?
.* is a greedy quantifier, which matches as much as possible.
It matches everything up to the last -->.
Change it to .*?, which is a lazy quantifier.
.* is greedy, so it matches as many characters as possible: in this case, from the opening of the first comment to the end of the second. Changing it to .*? fixes this, because the ? makes the match lazy, meaning it matches as few characters as possible. [^>]* also works here, because it cannot run past the first >.
Parsing HTML with Regex is always going to be tricky. Instead, use something like HTML Agility Pack which will allow you to query and parse html in a structured manner.
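
The greedy versus lazy difference is easy to see in a few lines. This sketch uses Python's re module rather than the .NET Regex class from the question, purely as an illustration; the quantifier behaviour shown is the same idea.

```python
import re

html = '<TD><!-- 1.91 -->6949<!-- 9.11 --></TD>'

# Greedy: .* runs to the LAST "-->", swallowing the number in between.
greedy = re.sub(r'<!--.*-->', '', html)

# Lazy: .*? stops at the FIRST "-->", so each comment is removed separately.
lazy = re.sub(r'<!--.*?-->', '', html)

print(greedy)  # <TD></TD>
print(lazy)    # <TD>6949</TD>
```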

Vim/Perl Regex Tag Match Problem

I have data that looks like this:
[Shift]);[Ctrl][Ctrl+S][Left mouse-click][Backspace][Ctrl]
I want to find all [.*] tags that have the word mouse in them. Keeping in mind non-greedy specifiers, I tried this in Vim: \[.\{-}mouse.\{-}\], but this yielded this result,
[Shift]);[Ctrl][Ctrl+S][Left mouse-click]
Rather than just the desired,
[Left mouse-click]
Any ideas? Ultimately I need this pattern in Perl syntax as well, so if anyone has a solution in Perl that would also be appreciated.
\[[^]]*mouse[^[]*\]
That is, match a literal opening bracket, then any number of characters that aren't closing brackets, then "mouse," then any number of non-opening-brackets, and finally a literal closing bracket. Should be the same in Perl.
You can use the following regex:
\[[^\]]*mouse.*?\]
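
To see why the non-greedy version overshoots, here is a short sketch in Python (used only for illustration; the question asks about Vim and Perl). The lazy pattern still starts at the first [ in the line, whereas the negated character classes cannot step over a bracket. The brackets inside the classes are escaped here, which should not change the meaning.

```python
import re

data = '[Shift]);[Ctrl][Ctrl+S][Left mouse-click][Backspace][Ctrl]'

# Lazy dots: the match begins at the first "[" and grows until "mouse" appears.
lazy = re.findall(r'\[.*?mouse.*?\]', data)

# Negated classes: the text before "mouse" may not contain "]",
# so the match has to start at the bracket that opens the tag.
bounded = re.findall(r'\[[^\]]*mouse[^\[]*\]', data)

print(lazy)     # ['[Shift]);[Ctrl][Ctrl+S][Left mouse-click]']
print(bounded)  # ['[Left mouse-click]']
```

Laziness only controls how far a match extends to the right; it never moves the starting point, which is why the negated class is needed.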

What is wrong with this really really simple RegEx expression?

This one is really easy.
I'm trying to create a regular expression that will successfully match the following text:
/default.aspx?
So I tried the following...
^/default.aspx$
and it's failing to match it.
Can someone help, please?
(I'm guessing I'm screwing up because of the / and the ? in the input expression.)
The . (dot) is a wildcard that matches any character, so to match a literal dot
you must escape it as \..
That is not why the match fails, though: the URL ends with a ?, and the $ (end of input) directly after aspx leaves no room for it.
The correct regexp should be ^/default\.aspx(\?.*)?$
The $ at the end of ^/default.aspx$ means 'match the end of the string', but the string you're searching ends with '?'.
Maybe something like this is more appropriate:
^/default\.aspx(\?.*)?$
This will match /default.aspx, with an optional ?whatever-else-that-comes-after.
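
A quick way to check both points is to run the two patterns against a few inputs. The sketch below uses Python as an assumed test harness (the question does not say which engine is involved), and the extra sample URLs are made up.

```python
import re

original = re.compile(r'^/default.aspx$')
fixed = re.compile(r'^/default\.aspx(\?.*)?$')

# The trailing "?" sits between "aspx" and the end of the string, so "$" fails.
original_hit = original.search('/default.aspx?')

# The optional group absorbs the "?" and anything after it.
fixed_hits = [
    bool(fixed.search(url))
    for url in ('/default.aspx?', '/default.aspx', '/default.aspx?id=1')
]

# The unescaped dot is a wildcard: it also accepts "X" in place of ".".
wildcard_hit = original.search('/defaultXaspx')
escaped_hit = fixed.search('/defaultXaspx')

print(original_hit)  # None
print(fixed_hits)    # [True, True, True]
```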

Regex matching in ColdFusion OR condition

I am attempting to write a CF component that will parse wikiCreole text. I am having trouble getting the correct matches with some of my regular expressions, though. I feel like if I can just get my head around the first one, the rest will just click. Here is an example:
The following is sample input:
You can make things **bold** or //italic// or **//both//** or //**both**//.
Character formatting extends across line breaks: **bold,
this is still bold. This line deliberately does not end in star-star.
Not bold. Character formatting does not cross paragraph boundaries.
My first attempt was:
<cfset out = REreplace(out, "\*\*(.*?)\*\*", "<strong>\1</strong>", "all") />
Then I realized that it would not match where the ** is not given, and it should end where there are two carriage returns.
So I tried this:
<cfset out = REreplace(out, "\*\*(.*?)[(\*\*)|(\r\n\r\n)]", "<strong>\1</strong>", "all") />
and it is close but for some reason it gives you this:
You can make things <strong>bold</strong>* or //italic// or <strong>//both//</strong>* or //<strong>both</strong>*//.
Character formatting extends across line breaks: <strong>bold,</strong>
this is still bold. This line deliberately does not end in star-star.
Not bold. Character formatting does not cross paragraph boundaries.
Any ideas?
PS: If anyone has any suggestions for better tags, or a better title for this post I am all ears.
The [...] represents a character class, so this:
[(\*\*)|(\r\n\r\n)]
is effectively the same as this:
[()*|\r\n]
i.e. it matches a single character, such as one "*", and the "|" is not an alternation.
Another problem is that you replace the double linefeed. Even if your match succeeded you would end up merging paragraphs. You need to either restore it or not consume it in the first place. I'd use a positive lookahead to do the latter.
In Perl I'd write it this way:
$string =~ s/\*\*(.*?)(?:\*\*|(?=\n\n))/<strong>$1<\/strong>/sg;
Taking a wild guess, the ColdFusion probably looks like this:
REreplace(out, "\*\*(.*?)(?:\*\*|(?=\r\n\r\n))", "<strong>\1</strong>", "all")
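
The same idea can be tried out in Python, used here only as a stand-in because the ColdFusion line above is untested. The sample text is a shortened, made-up version of the input from the question, and re.DOTALL plays the role of Perl's /s flag.

```python
import re

# Shortened sample based on the question's input, with \r\n line endings.
text = (
    "You can make things **bold** or //italic//.\r\n"
    "Formatting extends across line breaks: **bold,\r\n"
    "this is still bold.\r\n"
    "\r\n"
    "Not bold."
)

# Inside [...] the parentheses and "|" are ordinary characters,
# so this class matches exactly ONE of: ( ) * | \r \n
char_class = re.compile(r'[(\*\*)|(\r\n\r\n)]')

# Real alternation in a non-capturing group. The lookahead checks for the
# blank line without consuming it, so the paragraph break is left in place.
bold = re.compile(r'\*\*(.*?)(?:\*\*|(?=\r\n\r\n))', re.DOTALL)

result = bold.sub(r'<strong>\1</strong>', text)
print(result)
```

Because the lookahead consumes nothing, the blank line after "this is still bold." is still there in the output and the two paragraphs stay separate.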
You really should change your
(.*?)
to something like
([^*]*?)
to match any character except the *. I don't know if that is the problem, but it could be that the any-character . is eating one of your stars. It is also a generally accepted "best practice", when trying to balance matching delimiters like the double star or HTML start/end tags, to explicitly exclude them from your match set for the inner text.
*Disclaimer: I didn't test this in ColdFusion for the nuances of the regex engine, but the idea should hold true.
I know this is an older question, but in response to where Ryan Guill said "I tried the $1 but it put a literal $1 in there instead of the match": for ColdFusion you should use \1 instead of $1.
I always use a regex web page. It seems like I start from scratch every time I use regex.
Try using '$1' instead of \1 for this one - the replace is slightly different... but I think the pattern is what you need to get working.
Getting closer with this:
\*\*(.*?)\*\*|//(.*?)//
The tricky part is the //** or **//
Ok, first checking for **//bold//**,
then //**bold**//, then **bold**, then
//bold//
\*\*//(.*?)//\*\*|//\*\*(.*?)\*\*//|\*\*(.*?)\*\*|//(.*?)//
I find this app immensely helpful when I'm doing anything with regex:
http://www.gskinner.com/RegExr/desktop/
Still doesn't help with your actual issue, but could be useful going forward.