Why is this seemingly correct regex not working in Rascal?

I have the following code:
set[str] noNnoE = { v | str v <- eu, (/\b[^eEnN]*\b/ := v) };
The goal is to select, from a set of strings called 'eu', those strings that contain no 'e' or 'n' (in either upper- or lowercase). The regular expression I've provided:
/\b[^eEnN]?\b/
seems to work as it should when I try it in an online regex tester.
When I try it in the Rascal terminal, it doesn't seem to work:
rascal>/\b[^eEnN]*\b/ := "Slander";
bool: true
I expected no match. What am I missing here? I'm using the latest (stable) Rascal release in Eclipse Oxygen1a.

Actually, the online regex tester gives the same match that we do. You can inspect the match as follows:
if (/<w1:\b[^eEnN]?\b>/ := "Slander")
println("The match is: |<w1>|");
This assigns the matched string to w1 and prints it between vertical bars, assuming the match succeeds (if it doesn't, the match returns false, so the body of the if is not executed). If you do this, you will see that the regex matches the empty string:
The match is: ||
The online regex tester says the same thing:
Match 1
Full match 0-0 ''
If you want to prevent this, you can force at least one occurrence of the characters you are looking for by using + instead of ? (both ? and * also allow zero occurrences):
rascal>/\b[^eEnN]+\b/ := "Slander";
bool: false
Note that you can also make the regex case-insensitive by appending an i, like so:
/\b[^en]+\b/i
This may make it easier to write if you need to add more characters into the character class.

This solution (/\b[^en]+\b/i) doesn't work for strings consisting of two words, such as "Czech Republic".
Try /\b[^en]+\b$/i. That seems to work for me.
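To see why these variants behave differently, here is a small sketch in Python rather than Rascal. It assumes that Rascal's := with a regex looks for a match anywhere in the subject (the empty match reported above suggests this), which is what Python's re.search does; re.IGNORECASE stands in for the /i modifier. The country names are illustrative inputs, not the actual contents of eu, and the fully anchored ^[^en]+$ variant is an added comparison, not one of the suggestions above.

```python
import re

# Python stand-ins for the patterns discussed above.
# re.IGNORECASE plays the role of Rascal's trailing /i modifier.
QUESTION = r"\b[^eEnN]*\b"   # can match the empty string
ANSWER = r"\b[^en]+\b"       # needs at least one character other than e/n
COMMENT = r"\b[^en]+\b$"     # additionally anchored to the end of the string
ANCHORED = r"^[^en]+$"       # added comparison: every character must avoid e/n


def found(pattern: str, subject: str) -> bool:
    """True if `pattern` matches anywhere in `subject`, ignoring case."""
    return re.search(pattern, subject, re.IGNORECASE) is not None


for subject in ["Slander", "Malta", "Czech Republic", "Vatican City"]:
    results = [found(p, subject) for p in (QUESTION, ANSWER, COMMENT, ANCHORED)]
    print(f"{subject:15} {results}")
```

With these inputs, the \b...\b$ version still keeps "Vatican City", because it only needs the tail after the last word boundary (" City") to be free of e and n. Anchoring the whole string with ^...$ makes every character count.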

Related

Parse comma-separated values in an argument list that is itself separated by commas

So I have this regex:
=([0-9A-Za-z_-]+),?
and I need to handle strings like:
foo=bar,pine=apple,tree,bar=bie
or
foo=bar,pine=apple,tree
or
pine=apple,tree
The regex works for cases where there is only one value.
But since the list of values for a key can itself contain commas, the regex breaks: my code does half of what I want it to do but doesn't get the second value.
How do I fix my regex to take all the values regardless of where the key is in the string: alone, between two others, or at the end?
I tried some things but couldn't figure it out.
Attempt 1:
=([0-9A-Za-z,_-]+),=?
In this case, it matches the value in the middle, but it fails on the others because there is no = after them.
Attempt 2:
=[0-9A-Za-z_-]+([,]+[0-9A-Za-z_-]*),?
This matches too much, for example bar,pine and tree,bar.
EDIT:
This seems to work, maybe, if I use quotes around multiple values:
=('[0-9A-Za-z,_-]+'),*|=([0-9A-Za-z_-]+),*
You can split on variable names - that will leave only the values:
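s := regexp.MustCompile("[^,\\s]+=").Split("foo=bar,pine=apple,tree,bar=bie", -1)
fmt.Println(s)
// prints: [ bar, apple,tree, bie]
// i.e. []string{"", "bar,", "apple,tree,", "bie"}: drop the leading "" and trim the trailing commas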
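For comparison, here is a Python sketch of the same split-on-names idea, plus an alternative that uses re.findall with a lookahead to keep each key together with its values. The pair pattern is an added illustration, not part of the answer, and it assumes keys never contain commas, whitespace or =.

```python
import re

ARGS = "foo=bar,pine=apple,tree,bar=bie"

# Same idea as the Go answer: split on "name=" so only the values remain.
# re.split, like Go's Regexp.Split, yields an empty first piece (the text
# before "foo=") and leaves the separating commas attached to each value.
parts = re.split(r"[^,\s]+=", ARGS)
values = [p.rstrip(",") for p in parts if p]

# Alternative: capture key/value pairs directly. A value runs lazily until
# the next ",name=" or the end of the string, so embedded commas are kept.
PAIR = re.compile(r"([^,\s=]+)=(.*?)(?=,[^,\s=]+=|$)")
pairs = PAIR.findall(ARGS)

print(parts)   # ['', 'bar,', 'apple,tree,', 'bie']
print(values)  # ['bar', 'apple,tree', 'bie']
print(pairs)   # [('foo', 'bar'), ('pine', 'apple,tree'), ('bar', 'bie')]
```

The lookahead is what lets a value contain commas: the lazy .*? stops only where a comma is followed by another name and =, not at every comma.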

Regular expression: match anything except specific strings

I am handling user input in my program using a regular expression.
The string must contain /_MyWord/, and only a-z is accepted before /_MyWord/.
The string must not contain /s/123, /s/32A or atr/will at the beginning.
My try:
^(?!.*/s/123)(?!.*/s/32A )(?!.*atr/will)([/a-z]+)/_MyWord/(.*)$
Example:
/s/123/QWERERTYU/_MyWord/45454545 -> fail
/DFGH/FGHJK/GHJK/_MyWord/DFGHJ452 -> OK
HiCanYouHelpMe/_MyWord/fgh -> OK
/_MyWord/HiCanYouHelpMefgh -> OK
Can anyone help me finish the regular expression?
If I got your question correctly, try this regex:
^(?!.*\/s\/123)(?!.*\/s\/32A)(?!.*atr\/will)([\/a-zA-Z]*)\/_MyWord\/(.*)$
Unescaped: ^(?!.*/s/123)(?!.*/s/32A)(?!.*atr/will)([/a-zA-Z]*)/_MyWord/(.*)$
I changed ([\/a-z]+) to ([\/a-zA-Z]*) to accept both lower- and uppercase letters and to allow nothing at all before /_MyWord/ (e.g. /_MyWord/Test).
Works for
/DFGH/FGHJK/GHJK/_MyWord/DFGHJ452
HiCanYouHelpMe/_MyWord/fgh
/_MyWord/HiCanYouHelpMefgh
Doesn't match:
/s/123/QWERERTYU/_MyWord/45454545
atr/will/DFGH/FGHJK/GHJK/_MyWord/DFGHJ452
Also, you don't really need the lookaheads for /s/123 and /s/32A: they contain digits, so they are rejected anyway, because the part before /_MyWord/ may only contain [\/a-zA-Z]. So you might want to remove (?!.*\/s\/123)(?!.*\/s\/32A) from the beginning.
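A quick Python check of the answer's pattern against the examples above. Python's re handles these lookaheads the same way, and / needs no escaping there. SIMPLER is a hypothetical name for the reduced pattern suggested in the last paragraph.

```python
import re

# The answer's pattern, unescaped for Python.
ANSWER = re.compile(r"^(?!.*/s/123)(?!.*/s/32A)(?!.*atr/will)([/a-zA-Z]*)/_MyWord/(.*)$")
# Reduced variant without the two digit lookaheads, as suggested above.
SIMPLER = re.compile(r"^(?!.*atr/will)([/a-zA-Z]*)/_MyWord/(.*)$")

# Example inputs from the question and answer, with the expected verdict.
CASES = {
    "/s/123/QWERERTYU/_MyWord/45454545": False,
    "/DFGH/FGHJK/GHJK/_MyWord/DFGHJ452": True,
    "HiCanYouHelpMe/_MyWord/fgh": True,
    "/_MyWord/HiCanYouHelpMefgh": True,
    "atr/will/DFGH/FGHJK/GHJK/_MyWord/DFGHJ452": False,
}

for text, expected in CASES.items():
    for pattern in (ANSWER, SIMPLER):
        matched = pattern.match(text) is not None
        assert matched == expected, (pattern.pattern, text)
    if expected:
        # Group 1 is the prefix before /_MyWord/, group 2 the rest.
        print(text, "->", ANSWER.match(text).groups())
```

Note that the .* inside each lookahead rejects these strings anywhere in the input, not only at the beginning. If only a leading occurrence should be rejected, (?!atr/will) without the .* checks just the start.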

Lua pattern to validate a DNS address

I am currently using this regular expression to loosely validate a DNS address:
^[A-Za-z0-9_]+(\.[A-Za-z0-9_]+)*$
It matches things like hello.com, hello, and hello.com.com.com. I tried to replicate it exactly as a Lua pattern and came up with the following:
^([%d%a_]+(%.[%d%a_]+)*)$
So that I can use the following code to validate the DNS address:
local s = "hello.com"
print(s:match("^([%d%a_]+(%.[%d%a_]+)*)$"))
For some reason this always fails, although it looks like a 1:1 copy of the regular expression.
Any ideas why?
Lua patterns are not regular expressions. You cannot translate the ^[A-Za-z0-9_]+(\.[A-Za-z0-9_]+)*$ to ^([%d%a_]+(%.[%d%a_]+)*)$, because you cannot apply quantifiers to groups in Lua (see Limitations of Lua patterns).
Judging by the ^[A-Za-z0-9_]+(\.[A-Za-z0-9_]+)*$ regex, the rules are:
String can consist of one or more alphanumeric or underscore or dot characters
String cannot start with a dot
String cannot end with a dot
String cannot contain 2 consecutive dots
You can use the following work-around:
function pattern_checker(v)
  return string.match(v, '^[%d%a_.]+$') ~= nil and -- check that the string only contains digits/letters/_/., one or more
    string.sub(v, 1, 1) ~= '.' and -- check that the first char is not '.'
    string.sub(v, -1) ~= '.' and -- check that the last char is not '.'
    string.find(v, '%.%.') == nil -- check that there are no 2 consecutive dots in the string
end
Test code:
-- good examples
print(pattern_checker("hello.com")) -- true
print(pattern_checker("hello")) -- true
print(pattern_checker("hello.com.com.com")) -- true
-- bad examples
print(pattern_checker("hello.com.")) -- false
print(pattern_checker(".hello")) -- false
print(pattern_checker("hello..com.com.com")) -- false
print(pattern_checker("%hello.com.com.com")) -- false
You can translate the pattern to ^[%w_][%w_%.]+[%w_]$, although that still allows double dots (and, unlike the original regex, it requires at least three characters). When you use that pattern together with a check for double dots, you end up with this:
function pattern_checker(v)
-- Using double "not" because we like booleans, no?
return not v:find("..",1,true) and not not v:match("^[%w_][%w_%.]+[%w_]$")
end
I used the same test code as Wiktor Stribiżew (since it's good test code), and it produces the same results. Mine is also 2 to 3 times faster, if that matters. (That doesn't mean I don't like Wiktor's code; his code also works. He also links to the limitations page, a nice touch in his answer.)
(I like playing with string patterns in Lua)
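To double-check that the four rules really describe the original regex, here is a Python sketch (Python's re, unlike Lua patterns, lets a group take a quantifier, so the regex can be used as-is). It assumes ASCII letters and digits, whereas Lua's %a and %w depend on the current locale; the sample "ab" is an added test string.

```python
import re
import string

# The original regex; in Python the quantified group is allowed.
DNS_REGEX = re.compile(r"[A-Za-z0-9_]+(\.[A-Za-z0-9_]+)*")

# ASCII letters/digits assumed; Lua's %a and %w depend on the locale.
ALLOWED = set(string.ascii_letters + string.digits + "_.")


def rule_checker(v: str) -> bool:
    """The four rules from the first answer, written as plain string checks."""
    return (
        len(v) > 0
        and set(v) <= ALLOWED      # only letters, digits, '_' and '.'
        and not v.startswith(".")  # first char is not '.'
        and not v.endswith(".")    # last char is not '.'
        and ".." not in v          # no two consecutive dots
    )


# Transliteration of the second answer's Lua pattern ^[%w_][%w_%.]+[%w_]$.
THREE_PART = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.]+[A-Za-z0-9_]")

SAMPLES = ["hello.com", "hello", "hello.com.com.com", "hello.com.", ".hello",
           "hello..com.com.com", "%hello.com.com.com", "ab"]

for s in SAMPLES:
    regex_ok = DNS_REGEX.fullmatch(s) is not None
    three_ok = THREE_PART.fullmatch(s) is not None
    print(f"{s:20} regex={regex_ok} rules={rule_checker(s)} three_part={three_ok}")
```

Every sample gets the same verdict from the regex and the rule checker. The transliterated three-part pattern accepts hello..com.com.com (which is why the second answer adds a separate double-dot check) and rejects the two-character ab, which the original regex allows.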

Regex: How to match a string that is not only numbers

Is it possible to write a regular expression that matches all strings that do not contain only numbers? Given these strings:
abc
a4c
4bc
ab4
123
It should match the four first, but not the last one. I have tried fiddling around in RegexBuddy with lookaheads and stuff, but I can't seem to figure it out.
(?!^\d+$)^.+$
This uses a negative lookahead to exclude lines that consist only of digits, and then matches the entire line.
Unless I am missing something, I think the most concise regex is...
/\D/
...or in other words, is there a not-digit in the string?
jjnguy had it correct (if slightly redundant) in an earlier revision.
.*?[^0-9].*
@Chad, your regex,
\b.*[a-zA-Z]+.*\b
should probably allow for non-letters (e.g. punctuation), even though Svish's examples didn't include any. Svish's primary requirement was: not all digits.
\b.*[^0-9]+.*\b
Then, you don't need the + in there since all you need is to guarantee 1 non-digit is in there (more might be in there as covered by the .* on the ends).
\b.*[^0-9].*\b
Next, you can drop the \b on either end, since these are unnecessary constraints (word boundaries are defined in terms of alphanumerics and _).
.*[^0-9].*
Finally, note that this last regex shows that the problem can be solved with just the basics, those basics which have existed for decades (eg, no need for the look-ahead feature). In English, the question was logically equivalent to simply asking that 1 counter-example character be found within a string.
We can test this regex in a browser by copying the following into the location bar, replacing the string "6576576i7567" with whatever you want to test.
javascript:alert(new String("6576576i7567").match(".*[^0-9].*"));
/^\d*[a-z][a-z\d]*$/
Or, case insensitive version:
/^\d*[a-z][a-z\d]*$/i
Zero or more digits at the beginning, then at least one letter, then any mix of letters and digits.
Try this:
/^.*\D+.*$/
It returns true if there is any symbol that is not a digit, so it works with text in any language.
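A short Python sketch that runs a few of the patterns suggested so far over the question's strings, plus a hypothetical a-b to show the punctuation case raised above. re.search is used throughout, so each pattern is tested for a match anywhere in the string; IGNORECASE mirrors the /i version of the letter-based pattern.

```python
import re

# A few of the patterns suggested above, in Python syntax.
PATTERNS = {
    "lookahead": r"(?!^\d+$)^.+$",
    "non-digit": r"\D",
    "dot-star": r".*[^0-9].*",
    "letter-based": r"^\d*[a-z][a-z\d]*$",
}


def matches(pattern: str, s: str) -> bool:
    """True if `pattern` matches anywhere in `s`, ignoring case."""
    return re.search(pattern, s, re.IGNORECASE) is not None


for s in ["abc", "a4c", "4bc", "ab4", "123", "a-b"]:
    print(s, {name: matches(p, s) for name, p in PATTERNS.items()})
```

All four accept the first four strings and reject 123. Only the letter-based pattern also rejects a-b, because it requires every non-digit character to be a letter.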
Since you said "match", not just validate, the following regex will match correctly
\b.*[a-zA-Z]+.*\b
Passing Tests:
abc
a4c
4bc
ab4
1b1
11b
b11
Failing Tests:
123
If you are trying to match words that contain at least one letter and consist of digits and letters (or just letters), this is what I have used:
(\d*[a-zA-Z]+\d*)+
If you want to restrict the valid characters so that the string can only be made from a limited set of characters, try this:
(?!^\d+$)^[a-zA-Z0-9_-]{3,}$
or
(?!^\d+$)^[\w-]{3,}$
/\w+/
Matches one or more word characters: any letter, digit or underscore. (Note that this also matches digit-only strings such as 123.)
.*[^0-9]{1,}.*
Works fine for us.
We wanted to use the top answer, but it doesn't work within a YANG model.
The one I provided here is easy to understand and clear:
the start and end can be any characters, but there must be at least one non-numeric character.
I am using /^[0-9]*$/gm in my JavaScript code to check whether a string contains only digits. If it does, the function returns "fail"; otherwise it returns the string.
Below is working code snippet with test cases:
function isValidURL(string) {
var res = string.match(/^[0-9]*$/gm);
if (res == null)
return string;
else
return "fail";
};
var testCase1 = "abc";
console.log(isValidURL(testCase1)); // abc
var testCase2 = "a4c";
console.log(isValidURL(testCase2)); // a4c
var testCase3 = "4bc";
console.log(isValidURL(testCase3)); // 4bc
var testCase4 = "ab4";
console.log(isValidURL(testCase4)); // ab4
var testCase5 = "123"; // fail here
console.log(isValidURL(testCase5));
I had to do something similar in MySQL, and the following, while oversimplified, seems to have worked for me:
where fieldname regexp '^[a-zA-Z0-9]+$'
and fieldname NOT REGEXP '^[0-9]+$'
This shows all values that are alphabetic or alphanumeric, while values that are purely numeric are hidden. It seems to work.
example:
name1 - Displayed
name - Displayed
name2 - Displayed
name3 - Displayed
name4 - Displayed
n4ame - Displayed
324234234 - Not Displayed
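The same two-condition filter, sketched in Python over the example values from this answer; re.fullmatch stands in for the ^...$ anchors. COMBINED is an added single-pattern alternative (an assumption, not part of the answer) that requires at least one letter in an otherwise alphanumeric value.

```python
import re

# The two MySQL conditions; re.fullmatch plays the role of ^...$.
ALNUM = re.compile(r"[a-zA-Z0-9]+")   # fieldname regexp '^[a-zA-Z0-9]+$'
DIGITS = re.compile(r"[0-9]+")        # fieldname NOT REGEXP '^[0-9]+$'

# Assumed single-pattern alternative: alphanumeric with at least one letter.
COMBINED = re.compile(r"[a-zA-Z0-9]*[a-zA-Z][a-zA-Z0-9]*")


def displayed(value: str) -> bool:
    """True if the value passes both conditions of the WHERE clause."""
    return ALNUM.fullmatch(value) is not None and DIGITS.fullmatch(value) is None


names = ["name1", "name", "name2", "name3", "name4", "n4ame", "324234234"]
shown = [n for n in names if displayed(n)]
print(shown)  # ['name1', 'name', 'name2', 'name3', 'name4', 'n4ame']
assert shown == [n for n in names if COMBINED.fullmatch(n)]
```

"Alphanumeric and not all digits" is the same as "alphanumeric with at least one letter", which is why the two forms agree.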

Flex 3 Regular Expression Problem

I've written a URL validator for a project I am working on. For my requirements it works great, except that it breaks when the last part of the URL is longer than 22 characters. My expression:
/((https?):\/\/)([^\s.]+.)+([^\s.]+)(:\d+\/\S+)/i
It expects input that looks like "http(s)://hostname:port/location".
When I give it the input:
https://demo10:443/111112222233333444445
it works, but if I pass the input
https://demo10:443/1111122222333334444455
it breaks. You can easily test it at http://ryanswanson.com/regexp/#start. Oddly, I can't reproduce the problem with just the (I would think) relevant part, /(:\d+\/\S+)/i: I can put as many characters as I like after the required / and it works. Any ideas or known bugs?
Edit:
Here is some code for a sample application that demonstrates the problem:
<mx:Application xmlns:mx="http://www.adobe.com/2006/mxml" layout="absolute">
<mx:Script>
<![CDATA[
private function click():void {
var value:String = input.text;
var matches:Array = value.match(/((https?):\/\/)([^\s.]+.)+([^\s.]+)(:\d+\/\S+)/i);
if(matches == null || matches.length < 1 || matches[0] != value) {
area.text = "No Match";
}
else {
area.text = "Match!!!";
}
}
]]>
</mx:Script>
<mx:TextInput x="10" y="10" id="input"/>
<mx:Button x="178" y="10" label="Button" click="click()"/>
<mx:TextArea x="10" y="40" width="233" height="101" id="area"/>
</mx:Application>
I debugged your regular expression on RegexBuddy and apparently it takes millions of steps to find a match. This usually means that something is terribly wrong with the regular expression.
Look at ([^\s.]+.)+([^\s.]+)(:\d+\/\S+).
1- It seems like you're trying to match subdomains too, but it doesn't work as intended because you didn't escape the dot. If you escape it, demo10:443/123 won't match, because it will need at least one dot. Change ([^\s.]+\.)+ to ([^\s.]+\.)* and it will work.
2- [^\s.]+ is a bad character class here: it will match the whole rest of the string and then start backtracking from there. You can avoid this by using [^\s:.], which stops at the colon.
This one should work as you want:
https?:\/\/([^\s:.]+\.)*([^\s:.]+):\d+\/\S+
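As a quick check, here is the suggested pattern in Python (used only as a convenient test harness; it is not the Flex/Flash engine), applied with re.fullmatch to the two inputs from the question and two hypothetical ones. The original pattern is deliberately not run here, given the heavy backtracking described above.

```python
import re

# The pattern suggested in the answer, without the ActionScript /.../ delimiters.
URL = re.compile(r"https?://([^\s:.]+\.)*([^\s:.]+):\d+/\S+", re.IGNORECASE)


def is_url(value: str) -> bool:
    # fullmatch mirrors the Flex code's check that matches[0] equals the whole input.
    return URL.fullmatch(value) is not None


tests = [
    "https://demo10:443/111112222233333444445",
    "https://demo10:443/1111122222333334444455",
    "http://sub.example.com:8080/some/path",  # hypothetical host with subdomains
    "https://demo10/no-port",                 # no port, so expected to fail
]
for t in tests:
    print(t, is_url(t))
```

Because [^\s:.] cannot run past a colon or a dot, each part of the host has only one sensible place to end, so the engine has very little to backtrack over.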
This is a bug, either in Ryan's implementation or within Flex/Flash.
The regular expression syntax used above (minus the surrounding slashes and flags) also works in Python, which gives the following output:
# the case-insensitive flag is omitted since it doesn't matter here
>>> import re
>>> rx = re.compile(r'((https?):\/\/)([^\s.]+.)+([^\s.]+)(:\d+\/\S+)')
>>> print(rx.match('https://demo10:443/1111122222333334444455').groups())
('https://', 'https', 'demo1', '0', ':443/1111122222333334444455')