Delphi TRegEx bug? - regex

I am trying to validate the input '3a' against the regex '[_a-zA-Z][_a-zA-Z0-9]*' with this code:
len := TRegEx.Create([_a-zA-Z][_a-zA-Z0-9]*).Match('3a').Length;
I expected 0 for len variable, but it was 2. Is that correct?

This is not your real code. For a start it does not compile. You have omitted the quote marks. If we fix that then we have:
len := TRegEx.Create('[_a-zA-Z][_a-zA-Z0-9]*').Match('3a').Length;
But that returns a value of 1 and not 2 as you stated. This return value is correct because the a matches [_a-zA-Z] and then the input string ends.
I expect that you have the wrong regex. Perhaps you should be using
^[_a-zA-Z][_a-zA-Z0-9]*$
The ^ matches the beginning of the input string, the $ matches the end. Presumably the input is taken from a source code tokenizer.
So the conclusion is that there is no bug evident in the Delphi regex code from this pattern and input.
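The difference between the unanchored and the anchored pattern can be reproduced outside Delphi. The following is a minimal sketch in Go rather than Delphi, assuming Go's standard regexp package; the helper names matchLen and isIdentifier are made up for illustration and are not part of TRegEx.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Unanchored: may match anywhere inside the input.
	unanchored = regexp.MustCompile(`[_a-zA-Z][_a-zA-Z0-9]*`)
	// Anchored: the match must cover the whole input.
	anchored = regexp.MustCompile(`^[_a-zA-Z][_a-zA-Z0-9]*$`)
)

// matchLen returns the length of the first unanchored match (0 if none).
func matchLen(s string) int {
	return len(unanchored.FindString(s))
}

// isIdentifier reports whether the whole input is an identifier.
func isIdentifier(s string) bool {
	return anchored.MatchString(s)
}

func main() {
	fmt.Println(matchLen("3a"))      // 1: only the "a" is matched
	fmt.Println(isIdentifier("3a"))  // false
	fmt.Println(isIdentifier("_a3")) // true
}
```

For validation, the anchored form is usually the one that is wanted, since the unanchored form only answers whether some substring looks like an identifier.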

Related

Parse comma-separated values in an argument list that is itself separated by commas

I have this regex:
=([0-9A-Za-z_-]+),?
and I have strings like:
foo=bar,pine=apple,tree,bar=bie
or
foo=bar,pine=apple,tree
or
pine=apple,tree
The regex works when a key has only one value.
But the list of values for a key can itself contain commas. In that case the regex stops at the first comma, so my code never gets the second value.
How do I fix my regex so that it captures all the values of a key, wherever the key is in the string: alone, between two others, or at the end?
I tried a few things but couldn't figure it out.
Attempt 1:
=([0-9A-Za-z,_-]+),=?
This matches the case where the key is in the middle, but it fails on the others because no = follows.
Attempt 2:
=[0-9A-Za-z_-]+([,]+[0-9A-Za-z_-]*),?
This also matches bar,pine and tree,bar, for example.
EDIT:
This seems to work, maybe, if I put quotes around multi-value lists:
=('[0-9A-Za-z,_-]+'),*|=([0-9A-Za-z_-]+),*
You can split on variable names - that will leave only the values:
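// The leading ,? also consumes the comma in front of each variable name.
s := regexp.MustCompile(",?[^,\\s]+=").Split("foo=bar,pine=apple,tree,bar=bie", -1)
fmt.Println(s)
// => [ bar apple,tree bie] (the first element is an empty string)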
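If the keys are needed along with the values, one option is to skip the regex and walk the comma-separated tokens directly. The following is a minimal sketch that swaps in plain string splitting; the function name parseArgs is hypothetical, and it assumes that a token without = belongs to the most recent key.

```go
package main

import (
	"fmt"
	"strings"
)

// parseArgs groups comma-separated tokens by key. A token containing "="
// starts a new key; a token without "=" is treated as a further value of
// the previous key (an assumption about the input format).
func parseArgs(s string) map[string][]string {
	out := make(map[string][]string)
	key := ""
	for _, tok := range strings.Split(s, ",") {
		if i := strings.Index(tok, "="); i >= 0 {
			key = tok[:i]
			out[key] = append(out[key], tok[i+1:])
		} else if key != "" {
			out[key] = append(out[key], tok)
		}
	}
	return out
}

func main() {
	fmt.Println(parseArgs("foo=bar,pine=apple,tree,bar=bie"))
	// map[bar:[bie] foo:[bar] pine:[apple tree]]
}
```

This handles a multi-value key whether it appears alone, in the middle, or at the end of the string.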

Why is this seemingly correct Regex not working correctly in Rascal?

I have the following code:
set[str] noNnoE = { v | str v <- eu, (/\b[^eEnN]*\b/ := v) };
The goal is to filter out of a set of strings (called 'eu'), those strings that have no 'e' or 'n' in them (both upper- and lowercase). The regular expression I've provided:
/\b[^eEnN]?\b/
seems to work like it should, when I try it out in an online regex-tester.
When I try it in the Rascal terminal, it doesn't seem to work:
rascal>/\b[^eEnN]*\b/ := "Slander";
bool: true
I expected no match. What am I missing here? I'm using the latest (stable) Rascal release in Eclipse Oxygen1a.
Actually, the online regex-tester is giving the same match that we are giving. You can look at the match as follows:
if (/<w1:\b[^eEnN]?\b>/ := "Slander")
  println("The match is: |<w1>|");
This is assigning the matched string to w1 and then printing it between the vertical bars, assuming the match succeeds (if it doesn't, it returns false, so the body of the if will not execute). If you do this, you will get back a match to the empty string:
The match is: ||
The online regex tester says the same thing:
Match 1
Full match 0-0 ''
If you want to prevent this, you can force at least one occurrence of the characters you are looking for by using a + instead of a ? (or a *):
rascal>/\b[^eEnN]+\b/ := "Slander";
bool: false
Note that you can also make the regex match case insensitive by following it with an i, like so:
/\b[^en]+\b/i
This may make it easier to write if you need to add more characters into the character class.
This solution (/\b[^en]+\b/i) doesn't work for strings consisting of two words, such as the Czech Republic.
Try /\b[^en]+\b$/i. That seems to work for me.
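The empty-match behaviour is not specific to Rascal. The following is a minimal sketch in Go rather than Rascal, assuming Go's standard regexp package. The third pattern, which anchors both ends of the string, is an alternative introduced here for illustration under the assumption that the goal is to test whole strings, including multi-word ones.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	star  = regexp.MustCompile(`\b[^eEnN]*\b`) // can match the empty string
	plus  = regexp.MustCompile(`\b[^eEnN]+\b`) // needs at least one character
	whole = regexp.MustCompile(`^[^eEnN]+$`)   // every character must qualify
)

func main() {
	fmt.Printf("%q\n", star.FindString("Slander"))   // ""
	fmt.Println(star.MatchString("Slander"))         // true
	fmt.Println(plus.MatchString("Slander"))         // false
	fmt.Println(whole.MatchString("Czech Republic")) // false
	fmt.Println(whole.MatchString("Malta"))          // true
}
```

With both anchors in place, a string passes only if none of its characters is an e or an n, so spaces between words no longer create partial matches.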

Find numbers in string using Golang regexp

I want to find all numbers in a string with the following code:
re:=regexp.MustCompile("[0-9]+")
fmt.Println(re.FindAllString("abc123def", 0))
I also tried adding delimiters to the regex, using a positive number as second parameter for FindAllString, using a numbers only string like "123" as first parameter...
But the output is always []
I seem to miss something about how regular expressions work in Go, but cannot wrap my head around it. Is [0-9]+ not a valid expression?
The problem is with your second integer argument. Quoting from the package doc of regexp:
These routines take an extra integer argument, n; if n >= 0, the function returns at most n matches/submatches.
You pass 0 so at most 0 matches will be returned; that is: none (not really useful).
Try passing -1 to indicate you want all.
Example:
re := regexp.MustCompile("[0-9]+")
fmt.Println(re.FindAllString("abc123def987asdf", -1))
Output:
[123 987]
Try it on the Go Playground.
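The effect of the n argument can be seen side by side in a small sketch. The wrapper findNumbers is a hypothetical helper, not part of the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

var digits = regexp.MustCompile("[0-9]+")

// findNumbers forwards n unchanged to FindAllString.
func findNumbers(s string, n int) []string {
	return digits.FindAllString(s, n)
}

func main() {
	s := "abc123def987asdf"
	fmt.Println(findNumbers(s, 0))  // []
	fmt.Println(findNumbers(s, 1))  // [123]
	fmt.Println(findNumbers(s, -1)) // [123 987]
}
```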
@icza's answer is perfect for fetching positive numbers, but if your string also contains negative numbers, like this:
"abc-123def987asdf"
and you expect output like this:
[-123 987]
replace the regex with the following:
re := regexp.MustCompile(`[-]?\d[\d,]*[\.]?[\d{2}]*`)
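If only signed integers are needed, a shorter pattern may be enough. The following is a sketch under that assumption; it deliberately ignores thousands separators and decimals, which the pattern above tries to cover. It also shows that [\d{2}] is a character class, so the braces are matched literally rather than acting as a repetition count.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// An optional minus sign followed by one or more digits.
	signedInt = regexp.MustCompile(`-?\d+`)
	// Inside brackets, "{" and "}" are ordinary characters.
	braceClass = regexp.MustCompile(`^[\d{2}]*$`)
)

func main() {
	fmt.Println(signedInt.FindAllString("abc-123def987asdf", -1)) // [-123 987]
	fmt.Println(braceClass.MatchString("{2}"))                    // true
}
```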

Regex: How to match a string that is not only numbers

Is it possible to write a regular expression that matches all strings that do not contain only digits? If we have these strings:
abc
a4c
4bc
ab4
123
It should match the first four, but not the last one. I have tried fiddling around in RegexBuddy with lookaheads and stuff, but I can't seem to figure it out.
(?!^\d+$)^.+$
The negative lookahead rejects lines that consist entirely of digits; the rest of the pattern then matches the entire line.
Unless I am missing something, I think the most concise regex is...
/\D/
...or in other words, is there a not-digit in the string?
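The "is there a non-digit" test is easy to try against the strings from the question. The following is a minimal sketch in Go, assuming the standard regexp package; hasNonDigit is a made-up helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

var nonDigit = regexp.MustCompile(`\D`)

// hasNonDigit reports whether s contains at least one non-digit character.
func hasNonDigit(s string) bool {
	return nonDigit.MatchString(s)
}

func main() {
	for _, s := range []string{"abc", "a4c", "4bc", "ab4", "123"} {
		fmt.Println(s, hasNonDigit(s))
	}
}
```

Note that an empty string contains no non-digit, so it is reported as false along with the all-digit strings.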
jjnguy had it correct (if slightly redundant) in an earlier revision.
.*?[^0-9].*
@Chad, your regex,
\b.*[a-zA-Z]+.*\b
should probably allow for non letters (eg, punctuation) even though Svish's examples didn't include one. Svish's primary requirement was: not all be digits.
\b.*[^0-9]+.*\b
Then, you don't need the + in there since all you need is to guarantee 1 non-digit is in there (more might be in there as covered by the .* on the ends).
\b.*[^0-9].*\b
Next, you can do away with the \b on either end since these are unnecessary constraints (invoking reference to alphanum and _).
.*[^0-9].*
Finally, note that this last regex shows that the problem can be solved with just the basics, those basics which have existed for decades (eg, no need for the look-ahead feature). In English, the question was logically equivalent to simply asking that 1 counter-example character be found within a string.
We can test this regex in a browser by copying the following into the location bar, replacing the string "6576576i7567" with whatever you want to test.
javascript:alert(new String("6576576i7567").match(".*[^0-9].*"));
/^\d*[a-z][a-z\d]*$/
Or, case insensitive version:
/^\d*[a-z][a-z\d]*$/i
There may be digits at the beginning, then at least one letter, then letters or digits.
Try this:
/^.*\D+.*$/
It returns true if there is any symbol that is not a number. Works fine with all languages.
Since you said "match", not just validate, the following regex will match correctly
\b.*[a-zA-Z]+.*\b
Passing Tests:
abc
a4c
4bc
ab4
1b1
11b
b11
Failing Tests:
123
If you are trying to match words that have at least one letter but are formed by numbers and letters (or just letters), this is what I have used:
(\d*[a-zA-Z]+\d*)+
If we want to restrict valid characters so that string can be made from a limited set of characters, try this:
(?!^\d+$)^[a-zA-Z0-9_-]{3,}$
or
(?!^\d+$)^[\w-]{3,}$
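Engines without lookahead support, such as Go's regexp package, can express the same rule as two separate checks. The following is a minimal sketch under that assumption; isValidName is a hypothetical helper, and the character set and minimum length are taken from the first pattern above.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	allowed    = regexp.MustCompile(`^[a-zA-Z0-9_-]{3,}$`)
	digitsOnly = regexp.MustCompile(`^\d+$`)
)

// isValidName accepts strings built from the allowed characters,
// at least three long, that are not made up of digits alone.
func isValidName(s string) bool {
	return allowed.MatchString(s) && !digitsOnly.MatchString(s)
}

func main() {
	fmt.Println(isValidName("ab4"))   // true
	fmt.Println(isValidName("123"))   // false: digits only
	fmt.Println(isValidName("ab"))    // false: too short
	fmt.Println(isValidName("a b c")) // false: space is not allowed
}
```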
/\w+/:
Matches any letter, digit, or underscore, i.e. any word character.
.*[^0-9]{1,}.*
Works fine for us.
We wanted to use the accepted answer, but it does not work within a YANG model.
The one I provided here is clear and easy to understand:
the start and end can be any characters, but there must be at least one non-numeric character, which is the key requirement.
I am using /^[0-9]*$/gm in my JavaScript code to see if a string is only numbers. If it is, the check should fail; otherwise it returns the string.
Below is working code snippet with test cases:
function isValidURL(string) {
  var res = string.match(/^[0-9]*$/gm);
  if (res == null)
    return string;
  else
    return "fail";
}
var testCase1 = "abc";
console.log(isValidURL(testCase1)); // abc
var testCase2 = "a4c";
console.log(isValidURL(testCase2)); // a4c
var testCase3 = "4bc";
console.log(isValidURL(testCase3)); // 4bc
var testCase4 = "ab4";
console.log(isValidURL(testCase4)); // ab4
var testCase5 = "123";
console.log(isValidURL(testCase5)); // fail
I had to do something similar in MySQL and the following whilst over simplified seems to have worked for me:
where fieldname REGEXP '^[a-zA-Z0-9]+$'
and fieldname NOT REGEXP '^[0-9]+$'
This shows all fields that are alphabetical and alphanumeric but any fields that are just numeric are hidden. This seems to work.
example:
name1 - Displayed
name - Displayed
name2 - Displayed
name3 - Displayed
name4 - Displayed
n4ame - Displayed
324234234 - Not Displayed

Regex - If contains '%', can only contain '%20'

I am wanting to create a regular expression for the following scenario:
If a string contains the percentage character (%) then it can only contain the following: %20, and cannot be preceded by another '%'.
So if there was for instance, %25 it would be rejected. For instance, the following string would be valid:
http://www.test.com/?&Name=My%20Name%20Is%20Vader
But these would fail:
http://www.test.com/?&Name=My%20Name%20Is%20VadersAccountant%25
%%%25
Any help would be greatly appreciated,
Kyle
EDIT:
The scenario in a nutshell is that a link is written to an encoded state and then launched via JavaScript. No decoding works. I tried .net decoding and JS decoding, each having the same result - The results stay encoded when executed.
Doesn't require a %:
/^[^%]*(%20[^%]*)*$/
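This pattern uses no look-arounds, so it also runs on engines that lack them. The following is a minimal sketch in Go, assuming the standard regexp package, checked against the strings from the question; onlyPercent20 is a made-up name.

```go
package main

import (
	"fmt"
	"regexp"
)

// Any run of non-% characters, then zero or more "%20" groups,
// each followed by another run of non-% characters.
var onlyPercent20 = regexp.MustCompile(`^[^%]*(%20[^%]*)*$`)

func main() {
	fmt.Println(onlyPercent20.MatchString("http://www.test.com/?&Name=My%20Name%20Is%20Vader"))               // true
	fmt.Println(onlyPercent20.MatchString("http://www.test.com/?&Name=My%20Name%20Is%20VadersAccountant%25")) // false
	fmt.Println(onlyPercent20.MatchString("%%%25"))                                                           // false
}
```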
Which language are you using?
Most languages have a Uri Encoder / Decoder function or class.
I would suggest you decode the string first and then check for valid (or invalid) characters.
i.e. something like /[\w ]/ (the blank inside the brackets is a space)
If you apply a regex to the raw string first, keep in mind that www.example.com/index.html?user=admin&pass=%%250 means that the pass really is "%250".
Another solution if look-arounds are not available:
^([^%]|%([013-9a-fA-F][0-9a-fA-F]|2[1-9a-fA-F]))*$
Reject the string if it matches %([^2]|2[^0]|2?$). The shorter %[^2][^0] would miss escapes such as %25, whose first digit is 2.
I think that would find what you need
/^([^%]|%%|%20)+$/
Edit: Added case where %% is valid string inside URI
Edit2: And fixed it for case where it should fail :-)
Edit3:
In case you need to use it in an editor (which would explain why you can't use a more programmatic way), you have to correctly escape all special characters. For example, in Vim that regex should look like:
/^\([^%]\|%%\|%20\)\+$/
Maybe a better approach is to deal with that validation after you decode that string:
string name = HttpUtility.UrlDecode(Request.QueryString["Name"]);
/^([^%]|%20)*$/
This requires a test against the "bad" patterns. If we're allowing %20 - we don't need to make sure it exists.
As others have said before, %% is valid too... and %%25 would be %25
The below regex matches anything that doesn't fit into the above rules
/(?<![^%]%)%(?!(20|%))/
The first brackets check whether there is a % before the character (meaning that it's %%) and also check that it's not %%%. The regex then matches a % and checks that what follows is not 20.
This means that if anything is identified by the regex, then you should probably reject it.
I agree with dominic's comment on the question. Don't use Regex.
If you want to avoid scanning the string twice, you can just iteratively search for % and then check that it is being followed by 20 and nothing else. (Update: allow a % after to be interpreted as a literal %nnn sequence)
// pseudo code
pos = 0
while ((pos = mystring.find('%', pos)) != NOT_FOUND)
{
    if mystring[pos+1] == "%" then
        pos = pos + 2 // ok, this is a literal, skip ahead
    else if mystring.substring(pos+1, 2) != "20" then
        return false; // string is invalid
    else
        pos = pos + 3 // valid %20, continue after it
    end if
}
return true;
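The same single-pass scan can be written as runnable code. The following is a minimal sketch in Go; the function name validPercents is hypothetical, and it follows the rule above that %% counts as a literal percent sign.

```go
package main

import (
	"fmt"
	"strings"
)

// validPercents reports whether every '%' in s is either part of "%%"
// (treated as a literal percent sign) or starts "%20".
func validPercents(s string) bool {
	pos := 0
	for {
		i := strings.IndexByte(s[pos:], '%')
		if i < 0 {
			return true // no further '%' in the string
		}
		pos += i
		rest := s[pos+1:]
		switch {
		case strings.HasPrefix(rest, "%"):
			pos += 2 // "%%" is a literal, skip both characters
		case strings.HasPrefix(rest, "20"):
			pos += 3 // "%20" is allowed, continue after it
		default:
			return false // any other escape, or a trailing '%'
		}
	}
}

func main() {
	fmt.Println(validPercents("My%20Name%20Is%20Vader")) // true
	fmt.Println(validPercents("VadersAccountant%25"))    // false
	fmt.Println(validPercents("%%%25"))                  // false
}
```

Each position is visited at most once, so the string is scanned a single time without any regex engine.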