Regex matching extra characters

I'm using an online tool to evaluate my expression.
My test string: "Little" Timmy (tim) McGraw
My regex:
^[()"]|.["()]
It looks like I'm catching the characters I want, but each match also includes whatever character comes just before it. I'm not sure what, if anything, I'm doing wrong to catch the preceding characters like that. The goal is to capture characters we don't want in the name field of one of our systems.

Brief
Your current regex ^[()"]|.["()] says the following:
^[()"]|.["()] Match either of the following
  ^[()"] Match the following
    ^ Assert position at the start of the line
    [()"] Match any character present in the list ()"
  .["()] Match the following
    . Match any character (this is the issue you were having)
    ["()] Match any character present in the list "()
Code
You can actually shorten your regex to just [()"].
Ultimately, however, it would be much easier to create a negated set that lists which characters are valid rather than which are invalid. This approach would give you something like [^\w ], which matches any character not present in the set, i.e. any character that is neither a word character nor a space (in your sample string this matches the symbols ()" since they are not in the set).
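
To see the difference concretely, the three patterns from this answer can be run against the sample string. This is a minimal sketch in Python (an assumption, since the question does not name a language), and the variable names are illustrative only.

```python
import re

name = '"Little" Timmy (tim) McGraw'

# Original pattern: the second alternative starts with ".", so every
# match after the first drags along the character before the symbol.
original = re.findall(r'^[()"]|.["()]', name)

# Shortened pattern: only the unwanted symbols themselves.
simplified = re.findall(r'[()"]', name)

# Negated set: anything that is neither a word character nor a space.
negated = re.findall(r'[^\w ]', name)

print(original)    # ['"', 'e"', ' (', 'm)']
print(simplified)  # ['"', '"', '(', ')']
print(negated)     # ['"', '"', '(', ')']
```

On this input the shortened pattern and the negated set agree; they would differ on a name containing a symbol outside ()", such as a comma, which only the negated set would flag.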

Related

non-greedy search for redundant values in string

Basically I have this string and I want to get only a distinct image filename.
/mPastedImg_Time1469244713469.png&gtxResourceFileName=mPastedImg_Time1469244713469.png&amp
I have this regex code but it does not seem to work.
[^\/]*?_Time[0-9]{13}\.\w{3,4}\&
My expected output is:
mPastedImg_Time1469244713469.png
But the actual output is:
mPastedImg_Time1469244713469.png&gtxResourceFileName=mPastedImg_Time1469244713469.png&

To find the unique filename in a string, you can use this regex:
([^\/&= ]+_Time[0-9]{13}\.\w{3,4})(?!.*\1)
Here, ([^\/&= ]+_Time[0-9]{13}\.\w{3,4}) captures the filename you need, and the negative lookahead (?!.*\1) rejects any occurrence that appears again later in the string, so only the last occurrence matches and duplicates are dropped. Also, because the negated character set excludes only /, &, = and space, it still matches Chinese characters in the filename, which you also wanted to capture.
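
The sketch below compares the pattern from the question with the backreference version, using the string from the question. Python's re module is an assumption here, since the question does not say which engine is in use.

```python
import re

url = ("/mPastedImg_Time1469244713469.png"
       "&gtxResourceFileName=mPastedImg_Time1469244713469.png&amp")

# Pattern from the question: [^\/] also accepts "&" and "=", so the
# second match swallows the whole "gtxResourceFileName=" prefix.
from_question = re.findall(r'[^\/]*?_Time[0-9]{13}\.\w{3,4}\&', url)

# Backreference version: (?!.*\1) rejects a filename that occurs again
# later in the string, so only the last occurrence survives.
distinct = re.findall(r'([^\/&= ]+_Time[0-9]{13}\.\w{3,4})(?!.*\1)', url)

print(from_question)
# ['mPastedImg_Time1469244713469.png&',
#  'gtxResourceFileName=mPastedImg_Time1469244713469.png&']
print(distinct)
# ['mPastedImg_Time1469244713469.png']
```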

Your pattern produces 2 matches, and the second one is longer than expected because the negated character class [^\/] matches any character except a forward slash, including & and =.
What you might do is make the first character class more restrictive so that it specifies what you allow (for example [a-zA-Z]), and make sure you don't use a global match that returns every match, but take just the first one:
[a-zA-Z]*_Time[0-9]{13}\.\w{3,4}
Note that you don't have to match the ampersand at the end of the pattern.
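
As a rough illustration of the "one match instead of a global match" advice, here is a Python sketch (Python is an assumption; in that language re.search plays the role of a non-global match and re.findall of a global one).

```python
import re

url = ("/mPastedImg_Time1469244713469.png"
       "&gtxResourceFileName=mPastedImg_Time1469244713469.png&amp")

pattern = re.compile(r'[a-zA-Z]*_Time[0-9]{13}\.\w{3,4}')

# Non-global: stop at the first match.
first = pattern.search(url).group()

# Global: every occurrence, duplicates included.
every = pattern.findall(url)

print(first)  # mPastedImg_Time1469244713469.png
print(every)  # the same filename twice
```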

I think you were quite close to matching it, but you were making it too complex.
If you know that the name will always start with mPastedImg_Time, then use that to the fullest.
What about simply doing it like this:
mPastedImg_Time[0-9]{13}\.\w{3,4}

Why is the character ^ required in the regex ^(?!.*?spam) to filter strings?

I am trying to filter for strings that don't contain the word "spam".
I use the regex above, which I found elsewhere.
But I can't understand why I need the symbol ^ at the start of the expression. I know that it marks the start of the string, but I do not understand why the regex doesn't work without ^ in my case.
UPD. All the answers below are very useful.
It's completely clear now. Thank you!

The regex (?!.*?spam) matches a position in a string that is not followed by something matching .*?spam.
Every single string has such a position, because if nothing else, the very end of the string is certainly not followed by anything matching .*?spam.
So every single string contains a match for the regex (?!.*?spam).
The anchor ^ in ^(?!.*?spam) restricts the regex, so that it only matches strings where the very beginning of the string isn't followed by anything matching .*?spam — i.e., strings that don't contain spam at all (or anywhere in the first line, at least, depending on whether . matches newlines).
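
The effect of the anchor can be checked with a small experiment. This is a sketch in Python (an assumption, as the question names no language), and the sample strings are made up for illustration.

```python
import re

samples = ["buy spam now", "hello world", "spam", ""]

# Without the anchor the engine may try every position, and some
# position (at the latest, the end of the string) is never followed
# by "spam", so every string matches.
unanchored = [s for s in samples if re.search(r'(?!.*?spam)', s)]

# With the anchor only position 0 is tried, so the lookahead has to
# hold for the whole string.
anchored = [s for s in samples if re.search(r'^(?!.*?spam)', s)]

# Where the unanchored pattern finds its match in "buy spam now":
# index 5, the first position after which no "spam" starts.
where = re.search(r'(?!.*?spam)', "buy spam now").start()

print(unanchored)  # all four strings
print(anchored)    # ['hello world', '']
print(where)       # 5
```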

The lookahead is a zero-width assertion (that is, it checks a position in your string without consuming characters). In your case it is a negative lookahead, making sure that what follows is not "zero or more characters, followed by the word spam". This is true for several positions in your string; see a demo on regex101.com without the anchor.
With the anchor, the check is made only at the very beginning, so the whole string is analyzed; see the altered demo on regex101.com as well.

Exclude strings of pattern "abba"

For example, I want to exclude 'fitting', 'hollow', 'trillion'
but not 'hello' or 'pattern'
I already got the following to work
(.)(.)\2\1
which matches 'hollow' or 'fitting', but I have trouble negating this.
the closest thing I get is
^.(?!(.)(.)\2\1)
which excludes 'fitting' and 'hollow' but not 'trillion'

It's a little different from what you have. Your current regex checks for the palindromic pattern only at the second character. Since you want to check the whole string, you need to change it a little to:
^(?!.*(.)(.)\2\1)
The first anchor will ensure that the check is made only at the beginning (otherwise, the regex can claim a match at the end of the string).
Then the .* within the negative lookahead will enable the check to be done anywhere within the string. If there's any match, fail the entire match.
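
The difference between the two patterns can be checked against the words from the question. A minimal sketch in Python (an assumption; the backreference syntax is the same in most engines):

```python
import re

words = ["fitting", "hollow", "trillion", "hello", "pattern"]

# The attempt from the question: the lookahead is only evaluated
# at index 1, right after the single character consumed by "^.".
attempt = re.compile(r'^.(?!(.)(.)\2\1)')

# The anchored version: .* lets the lookahead scan the whole string.
anchored = re.compile(r'^(?!.*(.)(.)\2\1)')

kept_by_attempt = [w for w in words if attempt.search(w)]
kept_by_anchored = [w for w in words if anchored.search(w)]

print(kept_by_attempt)   # ['trillion', 'hello', 'pattern']
print(kept_by_anchored)  # ['hello', 'pattern']
```

"trillion" slips through the first pattern because its "illi" starts at index 2, not index 1.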

Your regex doesn't exclude trillion because ^. means exactly one character must come before the position being checked, counting from the beginning. In your first two cases that character is h and f. If you change it to ^..(?!(.)(.)\2\1), it will work for trillion.
So in general the regex will be:
^(?!.*(.)(.)\2\1)
    ^^ any number of characters (other than \n)

How to detect something does not exist before 'end of string' in regex

I have a simple regular expression looking for twitter style tags in a string like so:
$reg_exUrl = "/#([A-Za-z0-9_]{1,15})/";
This works great for matching words after an # sign.
One thing it does though which I don't want it to do is match full stops.
So it should match
"#foo"
but should not match
"#foo."
I tried amending the expression to disallow full stops like so:
$reg_exUrl = "/#([A-Za-z0-9_]{1,15})[^\.]/";
This almost works, except it will not match if the tag is at the end of the string. E.g.:
It WILL match this "#foo more text here"
but won't match this "#foo"
Could anyone tell me where I'm going wrong?
Thanks

First of all your original expression can be written like the following:
/#\w{1,15}/
because \w is equivalent to [A-Za-z0-9_].
Secondly, your expression already doesn't match names containing a dot, so you probably meant that you don't want to match names that end with a dot. This can be done with the following:
/#\w{1,15}(?![^\.]*\.)/
Or, if you want to match a name of any length, as long as it doesn't end with a dot:
/#\w+(?![^\.]*\.)/
One more thing: your problem was caused by the absence of any anchors, such as the start of line ^ and end of line $, so you should use them if you want to validate a string that contains only a twitter name.
Summary: if you want to match names anywhere in the document, don't use anchors; if you want to know whether a given string is a valid name, use the anchors.

It's not working if it's at the end of the string because it's expecting [^\.] after it.
What you want can be done with a negative lookahead that makes sure there is no dot afterwards, like this:
/#([A-Za-z0-9_]{1,15})(?![^\.]*\.)/
You could also do it this way:
/#([A-Za-z0-9_]{1,15})([^\.]*)$/
This one allows for optional characters other than a dot, and then it has to be the end of the string.
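
Both patterns from this answer can be tried on the strings from the question. The sketch below uses Python rather than the PHP of the question (an assumption made for illustration, so the / delimiters are dropped), and the helper tag() is hypothetical.

```python
import re

# Negative lookahead: no dot may appear anywhere after the tag.
lookahead = re.compile(r'#([A-Za-z0-9_]{1,15})(?![^.]*\.)')

# Alternative: only non-dot characters may follow, up to the end.
to_end = re.compile(r'#([A-Za-z0-9_]{1,15})([^.]*)$')

def tag(pattern, text):
    """Return the captured tag name, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(1) if m else None

for p in (lookahead, to_end):
    print(tag(p, "#foo"))                 # foo
    print(tag(p, "#foo more text here"))  # foo
    print(tag(p, "#foo."))                # None
```

Note that both patterns reject the tag whenever a dot appears anywhere later in the string, so "#foo and more." yields no match either.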

A $ matches the end of the string, and for future reference, a ^ matches the beginning:
$reg_exUrl = "/#([A-Za-z0-9_]{1,15})$/";

Regex to match anything

I know it seems a bit redundant but I'd like a regex to match anything.
At the moment we are using ^*$ but it doesn't seem to match no matter what the text.
I do a manual check for no text but the test view we use is always validated with a regex. However, sometimes we need it to validate anything using a regex. i.e. it doesn't matter what is in the text field, it can be anything.
I don't actually produce the regex and I'm a complete beginner with them.

The regex .* will match anything (including the empty string, as Junuxx points out).

The chosen answer is slightly incorrect, as it won't match line breaks or returns. This regex to match anything is useful if your desired selection includes any line breaks:
[\s\S]+
[\s\S] matches a character that is either a whitespace character (including line break characters) or a character that is not a whitespace character. Since all characters are either whitespace or non-whitespace, this character class matches any character. The + matches one or more of the preceding expression.
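
A short Python sketch (Python is an assumption, since the question does not name an engine) shows where the two patterns differ on multi-line text:

```python
import re

text = "line one\nline two"

# "." stops at the newline, so .* only covers the first line.
dot_star = re.match(r'.*', text).group()

# [\s\S] accepts the newline too, so the whole text is matched.
any_char = re.match(r'[\s\S]+', text).group()

print(repr(dot_star))  # 'line one'
print(repr(any_char))  # 'line one\nline two'
```

Because of the +, [\s\S]+ needs at least one character and will not match an empty string; [\s\S]* would.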

^ is the beginning-of-line anchor, so it will be a "zero-width match," meaning it won't match any actual characters (and the first character matched after the ^ will be the first character of the string). Similarly, $ is the end-of-line anchor.
* is a quantifier. It will not by itself match anything; it only indicates how many times a portion of the pattern can be matched. Specifically, it indicates that the previous "atom" (that is, the previous character or the previous parenthesized sub-pattern) can match any number of times.
To actually match some set of characters, you need to use a character class. As RichieHindle pointed out, the character class you need here is ., which represents any character except newlines (and it can be made to match newlines as well using the appropriate flag). So .* represents * (any number) matches on . (any character). Similarly, .+ represents + (at least one) matches on . (any character).
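
The points about quantifiers and the newline flag can be checked as follows. This sketch assumes Python, where the flag in question is re.DOTALL; other engines name it differently (often "s" or "single line").

```python
import re

# .* accepts the empty string, .+ requires at least one character.
empty_star = re.fullmatch(r'.*', '') is not None
empty_plus = re.fullmatch(r'.+', '') is not None

# By default "." does not match "\n"; re.DOTALL changes that.
multiline_default = re.fullmatch(r'.*', 'a\nb') is not None
multiline_dotall = re.fullmatch(r'.*', 'a\nb', re.DOTALL) is not None

print(empty_star, empty_plus)               # True False
print(multiline_default, multiline_dotall)  # False True
```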

I know this is a fairly old post, but there are other ways to write it, such as:
.*
(.*?)
