I need to write a regular expression to verify that a string contains a matching { } pair but not an unmatched { or }. Can someone shed some light on this, please?
Thanks for all the help; here are some examples.
e.g.
valid: {abc}, as09{02}dd, {sdjafkl}sdjk, sfdsjakl, 00{00}00, aaaaa{d}
invalid: {sdsf, sdfadf}, sdf{Sdfs, 333}333
Update:
^[a-zA-Z0-9_. -]*(?:\{[a-zA-Z0-9_.-]+\})?[a-zA-Z0-9_. -]*$ is what I need. Thanks for all your help :)
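A quick way to check the updated pattern against the examples is a small Python script. This is a sketch that assumes the * quantifiers were lost in formatting and that the hyphen is meant as a literal character (placed last in each class so it is not read as a range); Python's re module stands in for whatever engine is actually used.

```python
import re

# Optional single {...} group surrounded by allowed characters.
# Assumption: '*' quantifiers restored; '-' placed last in each class so it is literal.
PATTERN = re.compile(r"^[a-zA-Z0-9_. -]*(?:\{[a-zA-Z0-9_.-]+\})?[a-zA-Z0-9_. -]*$")

valid = ["{abc}", "as09{02}dd", "{sdjafkl}sdjk", "sfdsjakl", "00{00}00", "aaaaa{d}"]
invalid = ["{sdsf", "sdfadf}", "sdf{Sdfs", "333}333"]

for s in valid + invalid:
    print(f"{s!r:16} {bool(PATTERN.match(s))}")
```

Because the brace group is optional, strings with no braces at all (such as sfdsjakl) are accepted, which matches the valid list.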
/.*\{.*\}.*/
This would ensure that the string contains an opening curly bracket somewhere before a closing curly bracket, anywhere in the string. However, it can't ensure that there's only one opening and one closing curly bracket; to do that, the .* patterns would have to be changed to something more restrictive, such as [^{}]*.
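To see the difference, here is a Python sketch comparing the pattern with a version in which each .* is narrowed to [^{}]* (anything except a brace). fullmatch anchors both patterns to the whole string; the test strings are made up for illustration.

```python
import re

LOOSE = re.compile(r".*\{.*\}.*")               # the pattern from the answer
STRICT = re.compile(r"[^{}]*\{[^{}]*\}[^{}]*")  # each .* narrowed to "no braces"

def full_match(pattern, s):
    # fullmatch anchors the pattern to the whole string, like ^...$
    return pattern.fullmatch(s) is not None

for s in ["as09{02}dd", "{a}}", "{{a}", "{sdsf"]:
    print(f"{s!r:12} loose={full_match(LOOSE, s)} strict={full_match(STRICT, s)}")
```

Here {a}} and {{a} pass the loose pattern but fail the strict one.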
If you want to experiment and test these regexes out, here's a good site.
What flavor of regex? In JavaScript, for instance, this'll do it:
var re = /\{.*\}/;
alert("A: " + re.test("This {is} a match"));
alert("B: " + re.test("This {is not a match"));
alert("C: " + re.test("This } is not a match"));
Alerts A: true, B: false, and C: false.
Most other flavors will be similar.
For this problem, a regex-based solution is way too heavy.
If you have the option of NOT using regexes, don't use them; simpler statements can handle it just fine.
Even the much more general problem, checking whether (potentially nested) braces are used correctly, can be solved with a simple one-pass loop.
For example, this is correct:
{}{{{}{}}}
while this isn't:
{{}
Solution in Python (easy to translate to other languages):
def check(s):
    counter = 0
    for character in s:
        if character == "{":
            counter += 1
        elif character == "}":
            counter -= 1
        if counter < 0:
            # not valid: a closing brace appeared before its opening brace
            return False
    if counter == 0:
        # valid: every opening brace was closed
        return True
    else:
        return False
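The same loop can be run against the examples from the question. This Python sketch repeats the counter logic under a hypothetical name, braces_balanced, and returns depth == 0 directly. Note that, unlike the asker's updated regex, it also accepts nested braces such as {{a}}.

```python
def braces_balanced(s):
    depth = 0  # number of currently open '{'
    for ch in s:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # a '}' with no matching '{' before it
                return False
    return depth == 0  # valid only if every '{' was closed

valid = ["{abc}", "as09{02}dd", "{sdjafkl}sdjk", "sfdsjakl", "00{00}00", "aaaaa{d}"]
invalid = ["{sdsf", "sdfadf}", "sdf{Sdfs", "333}333"]

for s in valid + invalid:
    print(f"{s!r:16} {braces_balanced(s)}")
```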
There is exactly one opening brace and exactly one closing brace in the string, and the closing brace follows the opening brace:
^[^\{\}]*\{[^\{\}]*\}[^\{\}]*$
There can be any number of braces in the string, but they are not nested (there is never another opening brace before the previous one has been closed), and they are always balanced:
^[^\{\}]*(\{[^\{\}]*\}[^\{\}]*)*$
Nesting, in general, cannot be handled by regular expressions.
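Both patterns can be checked in Python. This sketch assumes the * quantifiers are intended (they appear to have been lost in formatting), and its second pattern allows brace-free text between pairs; the test strings are illustrative.

```python
import re

# Exactly one {...} pair, no other braces anywhere.
EXACTLY_ONE = re.compile(r"^[^{}]*\{[^{}]*\}[^{}]*$")
# Any number of non-nested, balanced {...} pairs, with brace-free text between them.
BALANCED_FLAT = re.compile(r"^[^{}]*(\{[^{}]*\}[^{}]*)*$")

for s in ["abc", "a{b}c", "a{b}c{d}e", "{{a}}", "{a"]:
    print(f"{s!r:12} one={bool(EXACTLY_ONE.match(s))} flat={bool(BALANCED_FLAT.match(s))}")
```

Neither pattern accepts {{a}}, which is the nesting limitation mentioned above.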
Related
I am downloading a webpage and converting it into a string using LWP::Simple. When I copy the results into an editor, I find multiple instances of the pattern I'm looking for, "data-src-hq".
I'm ultimately trying to do something more complex with regexes, but I'm starting with baby steps so I can learn to use them properly. I started off by just matching "data-src-hq" with the following code:
if($html =~ /data-src-hq/ism)
{
print "match\n";
}
else
{
print "nope\n";
}
My code prints "nope". However, if I change the pattern to just "data" or "data-src", I do get a match. The same happens no matter how I use or combine the /s and /m modifiers.
My understanding is that a hyphen is not a special character unless it's within brackets. Am I missing something simple?
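The hyphen assumption is correct. A quick illustration in Python (its re module follows the same rules as Perl here): outside a character class a hyphen is literal; inside one it forms a range unless it comes first or last. The HTML snippet is made-up sample input.

```python
import re

html = '<img data-src-hq="photo.jpg" data-src="thumb.jpg">'  # made-up sample input

# Outside a character class, '-' is just a literal character.
found = re.search(r"data-src-hq", html) is not None

# Inside a class, '-' between two characters defines a range ...
as_range = re.findall(r"[a-c]", "a-b-c-d")    # ['a', 'b', 'c']
# ... but as the last (or first) character of the class it is literal again.
as_literal = re.findall(r"[ac-]", "a-b-c-d")  # ['a', '-', '-', 'c', '-']

print(found, as_range, as_literal)
```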
How to fix this?
You are likely getting two lines of output, "match" and then "nope". Your code is missing the else keyword:
if($html =~ /data-src-hq/ism)
{
print "match\n";
}
{
print "nope\n";
}
Should be:
if($html =~ /data-src-hq/ism)
{
print "match\n";
}
else {
print "nope\n";
}
Otherwise, your code is fine and works to identify whether data-src-hq exists in $html.
So why does your existing code output nope?
That's because a {...} block on its own (with no if, else, or loop keyword in front of it) is a bare block; see "Basic BLOCKs" in Perl's perlsyn documentation. An excerpt from the documentation:
A BLOCK by itself (labeled or not) is semantically equivalent to a
loop that executes once. Thus you can use any of the loop control
statements in it to leave or restart the block. (Note that this is NOT
true in eval{}, sub{}, or contrary to popular belief do{} blocks,
which do NOT count as loops.) The continue block is optional.
This is about regular expression replacement in the Epsilon editor. I have a CSV file in which I want to replace text matching a certain pattern.
The replacement works perfectly when I use #1, #2, etc. in the replacement text.
But when I enter #10, it's the first group that gets inserted there. How do I use a matching group greater than 9?
(Well, a very late answer, I realize now; anyway...)
I can't find it in the documentation, but I think only groups 1 to 9 (plus 0 for the whole pattern) are supported, at least in the interactive command.
I found this code, in src/searc.e:
...
char *with;
...
if (*with != '#')
    insert(*with);
else if (isdigit(*++with)) {
    bufnum = orig;
    group = *with - '0';
    buf_xfer(tmp, find_group(group, 1),
             find_group(group, 0));
    bufnum = tmp;
} else {
...
It seems to me that only the first character after # is considered, so #10 is presumably read as group 1 followed by a literal 0.
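To make the consequence concrete, here is a Python emulation of the replacement logic as the fragment above appears to work. It is hypothetical: expand_single_digit is not Epsilon code, and the handling of # followed by a non-digit (elided in the fragment) is an assumption; here the # is simply copied.

```python
import re

def expand_single_digit(template, m):
    """Hypothetical emulation: expand '#N' using only ONE digit after '#'.

    So '#10' becomes group 1 followed by a literal '0'. A '#' not followed
    by a digit is copied unchanged (an assumption; that branch is elided
    in the quoted fragment).
    """
    out = []
    i = 0
    while i < len(template):
        if template[i] == "#" and i + 1 < len(template) and template[i + 1] in "0123456789":
            out.append(m.group(int(template[i + 1])) or "")  # unmatched group -> ""
            i += 2  # skip '#' and the single digit
        else:
            out.append(template[i])
            i += 1
    return "".join(out)

m = re.match(r"(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)", "abcdefghij")
print(expand_single_digit("#10", m))  # a0 (group 1 + "0"), not group 10 ("j")
```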
You could try mailing support#lugaru.com for further clarification; I have always found Steven Doerfler very helpful. (Epsilon 14 is now in beta, so it could be an opportunity to improve the documentation.)
I am pretty new to Perl. I have the following code fragment that works just fine, but I don't fully understand it:
for ($i = 1; $i <= $pop->Count(); $i++) {
    foreach ( $pop->Head( $i ) ) {
        /^(From|Subject):\s+/i and print $_, "\n";
    }
}
$pop->Head returns a string or an array of strings (Head is a method of Mail::POP3Client) containing the headers of a bunch of emails. Line 3 is some kind of regular expression that extracts the From and Subject lines from the header.
My question is: how does the print function print only the From and Subject lines without all the other stuff in the header? What does "and" mean? Surely it can't be a boolean and, can it? Most importantly, I want to put the From string into its own variable (my $fromline). How do I do this?
I am hoping this will be easy for a Perl professional; it has me baffled!
Thanks in advance.
ARGHHH... The question was edited while I was typing the answer. OK, throwing out the part of my answer that's no longer relevant, and focusing on the specific questions:
The outer loop iterates over all the messages in the mailbox.
The inner loop doesn't specify a loop variable, so the special variable $_ is used.
In each iteration through the inner loop, $_ is one header line from message number $i.
/^(From|Subject):\s+/i and print $_, "\n";
The first part of this line, up to the and, is a pattern. We didn't specify what to match the pattern against, so it's implicitly matched against $_. (That's one of the things that makes $_ special.) This gives us a yes/no test: does the pattern match the header line or not?
The pattern tests whether the line begins with (^) either of the words "From" or "Subject", followed immediately by a colon and one or more whitespace characters. (This is not the correct pattern to match an RFC 822 header: whitespace is optional on both sides of the colon, so the pattern should more properly be /^(From|Subject)\s*:\s*/i. But that's a separate issue.) The i at the end of the pattern says to ignore case, so from or SUBJECT would be OK.
The and says to continue evaluating (i.e., executing) the expression if there is a match. If there's no match, whatever follows and is ignored.
The rest of the expression prints the header line ($_) and a newline ("\n").
In Perl, and and or are boolean operators. They're synonyms for && and ||, except that they have much lower precedence, making it easier to write short-circuit expressions without the clutter of lots of parentheses.
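For comparison, here is the same filtering done in Python with the more lenient pattern suggested above. The header lines are made-up sample data and matching_headers is a hypothetical helper; its if clause plays the role that and plays in the Perl one-liner, since a line is only kept when the match succeeds.

```python
import re

# Made-up header lines for one message.
headers = [
    "Received: from mail.example.com",
    "From: Alice <alice@example.com>",
    "To: Bob <bob@example.com>",
    "subject : Lunch?",
    "Date: Mon, 1 Jan 2024 10:00:00 +0000",
]

# Lenient pattern: optional whitespace around the colon, case-insensitive.
HEADER_RE = re.compile(r"^(From|Subject)\s*:\s*", re.IGNORECASE)

def matching_headers(lines):
    # The `if` filter does the job of Perl's `and`: a line is kept only
    # when the match succeeds (re.match returns None otherwise).
    return [line for line in lines if HEADER_RE.match(line)]

for line in matching_headers(headers):
    print(line)
```

With the stricter ^(From|Subject):\s+ pattern, the "subject : Lunch?" line would be skipped, because its colon does not immediately follow the word.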
The smallest change that captures the From line into a separate variable would be to add the following line to the inner loop:
/^From\s*:\s*(.*)$/i and $fromline = $1;
You should probably also put
$fromline = undef;
before the loop so you can test, after the loop, whether there was a From: line.
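The capture-into-a-variable idea looks like this as a small Python sketch (find_from_line is a hypothetical name). As in the Perl version, the variable starts out undefined (None here), group 1 holds the text after "From:", and a later From: line would overwrite an earlier one.

```python
import re

FROM_RE = re.compile(r"^From\s*:\s*(.*)$", re.IGNORECASE)

def find_from_line(lines):
    from_line = None  # like `$fromline = undef`: stays None if there is no From: line
    for line in lines:
        m = FROM_RE.match(line)
        if m:
            from_line = m.group(1)  # the text after "From:", like Perl's $1
    return from_line

print(find_from_line(["To: Bob <bob@example.com>", "From: Alice <alice@example.com>"]))
```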
There are other ways to do it. In fact, that's one of the mantras of Perl: "There's more than one way to do it." I've stripped the "From: " from the beginning of the line before storing the rest in $fromline, but I don't know your needs.
It's a logical and with short-circuiting. If the left side evaluates to true (say, if that regular expression matches), it'll evaluate the right side, the print.
If the expression on the left is false, it doesn't need to evaluate the right hand side, because the net result would still be false, so it skips it.
See also: perldoc perlop
Hello guys, I need to find a regular expression that accepts strings over the alphabet {0,1,2} containing exactly two sets of 11:
0011110000: match (it has exactly two sets)
0010001001100: does not match (it has only one set)
0000011000110011: does not match (it has three sets)
00: does not match (it has no set)
0001100000110001: match (it has exactly two sets)
This is what I've done so far:
([^1]|1(0|2|3)(0|2|3)*)*11([^1]|1(0|2|3)(0|2|3)*)*11([^1]|1(0|2|3)(0|2|3)*|1$)*
I think what I'm missing is that the final group, ([^1]|1(0|2|3)(0|2|3)*|1$)*, has to make sure there is no more "11" left in the string, and I don't think that section is working correctly.
You could use a regular expression, but you've got much simpler options available to you. Here's an example in C#:
public bool IsValidString(string input)
{
return input.Split(new string[] { "11" }, StringSplitOptions.None).Length == 3;
}
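The split idea carries over directly to Python. In this sketch, is_valid_string mirrors the C# method name and is tested against the question's examples; str.split scans left to right without overlaps, so "1111" counts as two sets, as the first example requires.

```python
# Question's examples: string -> should it match?
EXAMPLES = {
    "0011110000": True,
    "0010001001100": False,
    "0000011000110011": False,
    "00": False,
    "0001100000110001": True,
}

def is_valid_string(s):
    # Exactly two non-overlapping "11"s split the string into exactly three pieces.
    return len(s.split("11")) == 3

for s, expected in EXAMPLES.items():
    print(f"{s:18} expected={expected} got={is_valid_string(s)}")
```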
Although regular expressions can be a very useful tool, their usage is not always warranted. As jwz put it:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
If this is not homework, then I would suggest avoiding a regex and going with a regular function (shown here is JavaScript):
function hasTwoElevensOnly(s) {
var first = s.indexOf("11");
if (first < 0) return false;
var second = s.indexOf("11", first + 2);
if (second < 0) return false;
return s.indexOf("11", second + 2) < 0;
}
Code here: http://jsfiddle.net/8FMRH/
If a regex is required:
function hasTwoElevensOnly(s) {
return /^((0|1(?!1)|2)*?11(0|1(?!1)|2)*?){2}$/.test(s);
}
Code here: http://jsfiddle.net/PAARn/1/
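Most regex flavors support a repetition count, written in {}. For example, in JavaScript, you could do something like:
/^((10|0)*11(01|0)*){2}$/
This matches two sets of 11, each preceded by (10|0)* and followed by (01|0)*. Note that it ignores the digit 2, and it also accepts 0000011000110011 (three sets), because (01|0)* and (10|0)* can each absorb one 1 of the middle 11.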
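A small Python harness can check regex answers against the question's five examples. This is a sketch that assumes Python's re treats these patterns the same way JavaScript does (both support the lookahead used here). The lookahead pattern agrees with all five examples, while the {2} pattern also accepts the three-set string.

```python
import re

EXAMPLES = {
    "0011110000": True,
    "0010001001100": False,
    "0000011000110011": False,
    "00": False,
    "0001100000110001": True,
}

LOOKAHEAD = re.compile(r"^((0|1(?!1)|2)*?11(0|1(?!1)|2)*?){2}$")  # "if a regex is required" answer
REPEAT = re.compile(r"^((10|0)*11(01|0)*){2}$")                   # the {2} answer

for s, expected in EXAMPLES.items():
    print(f"{s:18} expected={expected} "
          f"lookahead={bool(LOOKAHEAD.match(s))} repeat={bool(REPEAT.match(s))}")
```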
There may be a simpler way, but starting with your approach, this seems to work on the sample data provided:
/^([^1]|1[023])*11([^1]|1[023])*11((?<!11)|1[023]|[023]|(?<=[023])1)*$/
Using lookbehind.
I've written a URL validator for a project I am working on. It works great for my requirements, except that it breaks when the last part of the URL is longer than 22 characters. My expression:
/((https?):\/\/)([^\s.]+.)+([^\s.]+)(:\d+\/\S+)/i
It expects input that looks like "http(s)://hostname:port/location".
When I give it the input:
https://demo10:443/111112222233333444445
it works, but if I pass the input
https://demo10:443/1111122222333334444455
it breaks. You can easily test it at http://ryanswanson.com/regexp/#start. Oddly, I can't reproduce the problem with just the (I would think) relevant part, /(:\d+\/\S+)/i: I can put as many characters as I like after the required / and it works great. Any ideas or known bugs?
Edit:
Here is some code for a sample application that demonstrates the problem:
<mx:Application xmlns:mx="http://www.adobe.com/2006/mxml" layout="absolute">
<mx:Script>
<![CDATA[
private function click():void {
var value:String = input.text;
var matches:Array = value.match(/((https?):\/\/)([^\s.]+.)+([^\s.]+)(:\d+\/\S+)/i);
if(matches == null || matches.length < 1 || matches[0] != value) {
area.text = "No Match";
}
else {
area.text = "Match!!!";
}
}
]]>
</mx:Script>
<mx:TextInput x="10" y="10" id="input"/>
<mx:Button x="178" y="10" label="Button" click="click()"/>
<mx:TextArea x="10" y="40" width="233" height="101" id="area"/>
</mx:Application>
I debugged your regular expression in RegexBuddy, and apparently it takes millions of steps to find a match. This usually means that something is terribly wrong with the regular expression (typically catastrophic backtracking).
Look at ([^\s.]+.)+([^\s.]+)(:\d+\/\S+).
1- It seems like you're trying to match subdomains too, but it doesn't work as intended since you didn't escape the dot. If you escape it, demo10:443/123 won't match because it'll need at least one dot. Change ([^\s.]+\.)+ to ([^\s.]+\.)* and it'll work.
2- [^\s.]+ is a bad character class here: it will match the whole rest of the string and then start backtracking from there. You can avoid this by using [^\s:.], which stops at the colon.
This one should work as you want:
https?:\/\/([^\s:.]+\.)*([^\s:.]+):\d+\/\S+
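A quick Python check of the suggested pattern. This is a sketch: the extra test URLs are made up, and the \/ escapes are dropped because / needs no escaping in Python.

```python
import re

# Suggested pattern; Python needs no "\/" escapes, so plain "/" is used.
URL_RE = re.compile(r"https?://([^\s:.]+\.)*([^\s:.]+):\d+/\S+", re.IGNORECASE)

tests = [
    "https://demo10:443/111112222233333444445",
    "https://demo10:443/1111122222333334444455",
    "http://www.example.com:8080/some/path",
    "https://demo10/no-port",
]
for url in tests:
    print(url, URL_RE.fullmatch(url) is not None)
```

Because [^\s:.] cannot cross the colon, the host part stops at the port and there is little to backtrack over.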
This is a bug, either in Ryan's implementation or in Flex/Flash.
The same regular expression (minus the surrounding slashes and flags) matches in Python, which gives the following output:
# the case-insensitive flag is omitted because it doesn't matter here
>>> import re
>>> rx = re.compile(r'((https?):\/\/)([^\s.]+.)+([^\s.]+)(:\d+\/\S+)')
>>> print(rx.match('https://demo10:443/1111122222333334444455').groups())
('https://', 'https', 'demo1', '0', ':443/1111122222333334444455')