While searching for something, I found an answered question on this site. Two of the answers contain
/([^.]*)\.(.*)/
in them.
The question is "Find & replace jquery". I'm new to JavaScript, so I wonder: what does it mean? Thanks.
/([^.]*)\.(.*)/
Let us deconstruct it. The leading and trailing slashes are delimiters; they mark the start and end of the regular expression.
Then there is a parenthesized group: ([^.]*). The parentheses group the enclosed pattern together (and also capture the text it matches). The square brackets denote a "character group" (more commonly called a character class), meaning that any one character listed inside it is accepted in its place. However, this group is negated by its first character being ^, which reverses its meaning. Since the only character besides the negation is a period, this matches a single character that is not a period. After the square brackets comes a * (asterisk), which means that the bracketed class can be matched zero or more times.
Then we get to \., which is an escaped period. Periods in regular expressions have a special meaning (except when escaped or inside a character group), so escaping it makes this match a literal period in the text.
(.*) is a new parenthesized subgroup. This time, the period matches any character (except line breaks), and the asterisk says it can be repeated as many times as needed.
In summary, the expression finds any sequence of characters that are not periods (possibly empty), followed by a single period, followed by any characters at all.
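To see the three parts in action, here is a small JavaScript sketch that runs the pattern with RegExp.prototype.exec on an assumed sample string, "readme.txt", and prints each captured group:

```javascript
const pattern = /([^.]*)\.(.*)/;

// exec returns [whole match, group 1, group 2], or null if nothing matches.
const match = pattern.exec("readme.txt");

console.log(match[0]); // readme.txt  (the whole match)
console.log(match[1]); // readme      ([^.]*: the non-period characters before the dot)
console.log(match[2]); // txt         ((.*): everything after the dot)
```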
It's a regular expression: it matches non-periods, followed by a period, followed by anything (think "file.ext"). And you should run, not walk, to learn about them. Explaining how this particular regular expression works isn't going to help you, as you need to start simpler. So start with a regex tutorial and pick up Mastering Regular Expressions.
Original: /([^.]*)\.(.*)/
Split this as:
[1] ([^.]*) : matches any number of characters except . [ period ]
[2] \. : matches a period
[3] (.*) : matches any number of characters
so it becomes
[1] match all characters that are not . [ period ] [2] until you find a . [ period ], then [3] match all remaining characters.
Anything except a dot, followed by a dot, followed by anything.
You can test regexes on regexpal.
It's a regular expression that roughly searches for a string that doesn't contain a period, followed by a period, and then a string containing any characters.
That is a regular expression. Regular expressions are powerful tools if you use them right.
That particular regex extracts the filename and extension from a string that looks like "file.ext".
It's a regular expression that splits a string into two parts: everything before the first period, and everything after it. Most regex engines (including the JavaScript one) then let you access those parts separately (using $1 to refer to the first part and $2 for the second part).
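In JavaScript, those two parts can be used in a replacement string or read from the match array; a small sketch with the made-up input "config.json":

```javascript
const pattern = /([^.]*)\.(.*)/;

// In a replacement string, $1 and $2 stand for the first and second captured groups.
const swapped = "config.json".replace(pattern, "$2.$1");
console.log(swapped); // json.config

// Without the g flag, String.prototype.match returns the same array as exec: [whole, $1 part, $2 part].
const [, before, after] = "config.json".match(pattern);
console.log(before, after); // config json
```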
This is a regular expression with some advanced use.
Consider a simpler version: /[^.]*\..*/, which is the same as above without parentheses. This will match any string containing at least one dot. When the parentheses are added and a match happens, the variables \1 and \2 will contain the parts matched inside the parentheses. The first will hold everything before the first dot; the second will hold everything after the first dot.
Examples:
input: foo...bar
\1: foo
\2: ..bar
input: .foobar
\1:
\2: foobar
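These results can be reproduced in JavaScript, where \1 and \2 correspond to match[1] and match[2] (a quick sketch; JSON.stringify is used only to make the empty group visible):

```javascript
const pattern = /([^.]*)\.(.*)/;

for (const input of ["foo...bar", ".foobar"]) {
  const [, first, second] = pattern.exec(input);
  // JSON.stringify wraps each group in quotes so an empty group shows up as "".
  console.log(input, JSON.stringify(first), JSON.stringify(second));
}
// foo...bar "foo" "..bar"
// .foobar "" "foobar"
```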
This regular expression produces two captured groups that can be retrieved.
The two parts are the string before the first dot (which may be empty), and the string after the first dot (which may contain other dots).
The only restriction on the input is that it contain at least one dot. It will match ".", contrary to some of the other answers, but the retrieved groups will be empty.
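Both edge cases are easy to confirm in JavaScript (a quick sketch; "nodot" is just a made-up input without a dot):

```javascript
const pattern = /([^.]*)\.(.*)/;

// A lone "." still matches: both groups are allowed to be empty.
const dotOnly = pattern.exec(".");
console.log(JSON.stringify(dotOnly)); // [".","",""]

// With no dot at all, \. has nothing to match, so exec returns null.
const noDot = pattern.exec("nodot");
console.log(noDot); // null
```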
IMO /.*\..*/g would do the same thing.
const senExample = 'I am test. Food is good.';
const result1 = senExample.match(/([^.]*)\.(.*)/g);
console.log(result1); // ["I am test. Food is good."]
const result2 = senExample.match(/^.*\..*/g);
console.log(result2); // ["I am test. Food is good."]
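On that sample sentence the two patterns do return the same whole match, but they are not interchangeable once you care about the captured groups. A minimal sketch, assuming parentheses are added to the dotted version (the answer's version has none), shows where group 1 ends on the made-up input "a.b.c":

```javascript
const input = "a.b.c";

// [^.]* cannot cross a dot, so group 1 stops at the FIRST dot.
console.log(/([^.]*)\.(.*)/.exec(input).slice(1)); // [ 'a', 'b.c' ]

// A greedy .* gives back characters only as far as it must, so group 1 stops at the LAST dot.
console.log(/(.*)\.(.*)/.exec(input).slice(1)); // [ 'a.b', 'c' ]
```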
the . character matches any character except line break characters such as \r or \n.
the ^ at the start of a character class negates it, so [^.] matches any character except a literal dot (inside square brackets the dot loses its special meaning)
the * means "zero or more times"
the parentheses group and capture,
the \ lets you match a special character (like the dot or the star) literally
so this ([^.]*) means any character other than a dot, repeated zero or more times (line breaks included).
this (.*) part means any character except a line break, zero or more times
and the \. means a real dot
so the whole thing would match zero or more non-dot characters, followed by a dot, followed by any number of characters (other than line breaks).
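A quick JavaScript check of what [^.] actually matches (the two-line sample string is made up): it excludes only a literal dot, so it consumes line breaks, whereas . does not:

```javascript
const input = "first line\nsecond.line";

// [^.] excludes only the literal dot, so group 1 runs straight across the line break.
console.log(JSON.stringify(/([^.]*)\./.exec(input)[1])); // "first line\nsecond"

// . does not match a line break, so the first successful match starts after the \n.
console.log(JSON.stringify(/(.*)\./.exec(input)[1])); // "second"
```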
For more information and a really great reference on Regular Expressions check out: http://www.regular-expressions.info/reference.html
It's a regular expression: a pattern of characters used to describe the text you want to find. I once used regexps to find an email address inside a text file; they can be used to find pretty much any pattern of text within a larger body of text, provided you write the regexp properly.
Related
I have created a regular expression that accepts email addresses in the following format:
abc@xyz.com.in
Regular Expression
/^(?!-)[\w-\.]+@([\w-]+\.)+[\w-]{2,4}/
I am trying to match email addresses that do not start or end with a hyphen.
Invalid format:
-abc@xyz.com
abc@xyz.com-
Valid format:
abc@xyz.com
abc@xyz.com.in
Your regex can be edited in a simple way (see a demo at Regex101):
/^[\w\.]+[\w\.\-]*@[\w\.]+\.[\w\.]{2,4}$/
^: This is the beginning of the line
[\w\.]+: The first part of the address, before the @, starts with at least one word character (\w) or dot (\.).
[\w\.\-]*: After that, the same characters can occur, plus the dash (\-), as many times as you want. Remember, the dash has to be escaped inside [ and ] unless it is the first or last character there; otherwise it can represent a range instead of the dash itself.
@: This matches itself.
[\w\.]+: After the @ character, there must be at least one character from the list.
\.: Then a literal dot.
[\w\.]{2,4}: Finally, the last 2-4 characters.
$: And the end of the line.
The difference between this regex and yours is small:
/^[\w\.]+[\w\.\-]*@[\w\.]+\.[\w\.]{2,4}$/
/^(?!-)[\w-\.]+@([\w-]+\.)+[\w-]{2,4}/
I avoided the negative lookahead and instead specified (whitelisted) the characters that can occur at each position, rather than blacklisting them, which I generally try to avoid unless it is really needed. The rest of the regex is quite similar, except that you should escape the dash - character between the brackets [ and ].
Finally, I omitted the capturing groups ( and ) and leave it up to you to place them wherever you need.
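A quick JavaScript sketch that runs the suggested pattern against the question's valid and invalid examples (the address separator is written as @ here):

```javascript
const email = /^[\w\.]+[\w\.\-]*@[\w\.]+\.[\w\.]{2,4}$/;

const samples = ["abc@xyz.com", "abc@xyz.com.in", "-abc@xyz.com", "abc@xyz.com-"];
for (const s of samples) {
  // No g flag, so test() keeps no state between calls.
  console.log(s, email.test(s));
}
// abc@xyz.com true
// abc@xyz.com.in true
// -abc@xyz.com false
// abc@xyz.com- false
```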
Add \w to each end of your regex, and include the end anchor $:
^\w[\w.-]+@([\w-]+\.)+[\w-]{2,4}\w$
Note also that the dot doesn't need escaping within a character class.
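Checking this pattern against the question's examples in JavaScript (a quick sketch, with the separator written as @) confirms the start and end rules, and also shows a side effect worth knowing: because the address must end in [\w-]{2,4}\w, the final label needs three to five characters, so abc@xyz.com.in is rejected.

```javascript
const email = /^\w[\w.-]+@([\w-]+\.)+[\w-]{2,4}\w$/;

console.log(email.test("abc@xyz.com"));    // true
console.log(email.test("-abc@xyz.com"));   // false: must start with \w
console.log(email.test("abc@xyz.com-"));   // false: must end with \w
console.log(email.test("abc@xyz.com.in")); // false: "in" is shorter than [\w-]{2,4}\w allows
```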
A more complete email regex:
/^(([^<>()[\]\\.,;:\s@"]+(\.[^<>()[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/
I am new to Perl and have been trying to understand the code below:
if ( $nextvalue !~ /^.+"[^ ]+ \/cs\/.+\sHTTP\/[1-9]\.[0-9]"|\/\/|\/Images\/fold\/1.jpg|\/busines|\/Type= OPTIONS|\/203.176.111.126/)
Can you please help me understand what the above is meant for?
The condition will be true when $nextvalue does NOT match the following regular expression.
The regular expression will match if the string
either
starts with at least one character,
followed by a double quote sign ("),
followed by at least one character that is not a space,
followed by a space,
followed by the string "/cs/",
followed by at least one character,
followed by a whitespace character and the string HTTP/,
followed by one of the digits from 1 to 9 inclusive,
followed by a dot,
followed by one of the digits from 0 to 9,
followed by a double quote mark (")
or contains two forward slashes (//)
or contains the substring "/Images/fold/1.jpg"
or contains the substring "/busines"
or contains the substring "/Type= OPTIONS"
or contains the substring "/203.176.111.126"
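To try the condition outside Perl, here is a rough JavaScript transcription. It assumes the constructs used here (anchors, character classes, \s and |) behave the same for plain ASCII input like these made-up log lines, and conditionHolds is a hypothetical name. It also shows a detail the breakdown glosses over: the unescaped dots in 1.jpg and 203.176.111.126 match any character, not just a period.

```javascript
// The Perl pattern from the question, written as a JavaScript regex literal.
const pattern = /^.+"[^ ]+ \/cs\/.+\sHTTP\/[1-9]\.[0-9]"|\/\/|\/Images\/fold\/1.jpg|\/busines|\/Type= OPTIONS|\/203.176.111.126/;

// Perl's `$nextvalue !~ /.../` is true when the pattern does NOT match anywhere in the string.
function conditionHolds(nextvalue) {
  return !pattern.test(nextvalue);
}

console.log(conditionHolds('1.2.3.4 - - "GET /cs/index.html HTTP/1.1" 200')); // false: first alternative matches
console.log(conditionHolds("GET /business/plan"));     // false: contains /busines
console.log(conditionHolds("GET /Images/fold/1xjpg")); // false: the unescaped . matches the x
console.log(conditionHolds("GET /home/index.html"));   // true: no alternative matches
```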
Whenever I am unsure what some cryptic regular expression does, I turn to Debuggex:
^.+"[^ ]+ \/cs\/.+\sHTTP\/[1-9]\.[0-9]"|\/\/|\/Images\/fold\/1.jpg|\/busines|\/Type= OPTIONS|\/203.176.111.126
Debuggex Demo
This is a railroad diagram: every string that contains a substring fitting the description along any of the grey tracks will match your regex. As your condition uses !~, meaning "does not match", those strings will then fail the check.
Debuggex certainly has issues (for example, it displays ^ as is, so you have to know that it means the beginning of the string; the same goes for dots and other symbols, and whitespace shows up as underscores), but it certainly helps in understanding the structure of the expression and possibly gives you an idea of what the author had in mind.
I came across the concept of a single capture group in a shell script.
cat employee.txt
101,John Doe,CEO
I was practising the sed substitute command and came across the example below.
sed 's/\([^,]*\).*/\1/g' employee.txt
It was stated that the above expression matches the string up to the first comma.
I am unable to understand how this matches up to the first comma.
Below is my understanding:
s - substitute command
/ delimiter
\ escape character for (
( opening parenthesis for grouping
^ beginning of the line - anchor
[^,] - I am confused by this: is it the negation of a comma, or does it mean something else?
Why are * and then .* used to match the string up to the first comma?
^ matches beginning of line outside of a character class []. At the beginning of a character class, it means negation.
So, it says: non-comma ([^,]) repeated zero or more times (*) followed by anything (.*). The matching part of the string is replaced by the part before the comma, so it removes everything from the first comma onward.
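For comparison, here is the same substitution as a JavaScript sketch; the function name firstField is hypothetical. JavaScript writes the group as ( ) instead of sed's \( \), and the replacement refers to it as $1 instead of \1:

```javascript
// JavaScript transcription of: sed 's/\([^,]*\).*/\1/g'
function firstField(line) {
  return line.replace(/([^,]*).*/g, "$1");
}

console.log(firstField("101,John Doe,CEO")); // 101
console.log(firstField("foo, bar"));         // foo
// A line that starts with a comma leaves group 1 empty, so nothing remains.
console.log(JSON.stringify(firstField(",starts with comma"))); // ""
```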
I know 'link only' answers are to be avoided - Choroba has correctly pointed out that this is:
non-comma ([^,]) repeated zero or more times (*) followed by anything (.*). The matching part of the string is replaced by the part before the comma, so it removes everything from the first comma onward.
However I'd like to add that for this sort of thing, I find regulex quite a useful tool for visualising what's going on with a regular expression.
The image representation of your regular expression is:
Given the string "foo, bar" and s/\([^,]*\).*/\1/g, the part \([^,]*\) more specifically means "match any character that is not a comma" (zero or more times). Since "f" is not a comma, it matches "f" and "remembers" it. Because it is "zero or more times", it tries again. The next character is not a comma either (it is o), so the regex engine adds that o to the group as well. The same thing happens for the second o.
The next character is indeed a comma, but [^,] forbids it, as @choroba affirmed. What is in the group now is "foo". Then the regex uses .* outside the group, which causes zero or more characters to be matched but not remembered.
In the replacement part of the substitution, \1 is used to insert the remembered text ("foo"). The rest of the matched text is lost, and that is how you end up with only the text up to the first comma.
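The "matched but not remembered" distinction can be seen directly in a JavaScript sketch, where exec reports the whole match and the group separately (again with ( ) in place of sed's \( \)):

```javascript
// Index 0 holds everything the pattern consumed; index 1 holds only the group's contents.
const match = /([^,]*).*/.exec("foo, bar");
console.log(match[0]); // foo, bar
console.log(match[1]); // foo
// The part consumed by .* but not remembered by the group:
console.log(match[0].slice(match[1].length)); // , bar
```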
I am having problems creating a regex validator that checks that the input contains only uppercase or lowercase letters, spaces, periods, underscores, and dashes. I couldn't find an example of this online. For example:
These are ok:
Dr. Marshall
sam smith
.george con-stanza .great
peter.
josh_stinson
smith _.gorne
Anything containing other characters, such as numbers or any other symbols, is not okay.
The regex you're looking for is ^[A-Za-z.\s_-]+$
^ asserts that the regular expression must match at the beginning of the subject
[] is a character class - any character that matches inside this expression is allowed
A-Z allows a range of uppercase characters
a-z allows a range of lowercase characters
. matches a literal period (inside a character class it does not mean "any character")
\s matches whitespace (spaces and tabs, and also line breaks)
_ matches an underscore
- matches a dash (hyphen); we have it as the last character in the character class so it doesn't get interpreted as being part of a character range. We could also escape it (\-) instead and put it anywhere in the character class, but that's less clear
+ asserts that the preceding expression (in our case, the character class) must match one or more times
$ Finally, this asserts that we're now at the end of the subject
When you're testing regular expressions, you'll likely find a tool like regexpal helpful. This allows you to see your regular expression match (or fail to match) your sample data in real time as you write it.
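A quick JavaScript check of this pattern against the question's examples; the failing inputs are made up for illustration:

```javascript
const allowed = /^[A-Za-z.\s_-]+$/;

// The sample inputs from the question, all of which should pass.
const ok = ["Dr. Marshall", "sam smith", ".george con-stanza .great", "peter.", "josh_stinson", "smith _.gorne"];
// Made-up inputs that should fail: a digit, another symbol, and the empty string (+ needs one character).
const bad = ["R2D2", "sam@smith", ""];

console.log(ok.every((s) => allowed.test(s))); // true
console.log(bad.some((s) => allowed.test(s))); // false

// \s is broader than a space: tabs and line breaks are accepted too.
console.log(allowed.test("two\nlines")); // true
```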
Check out the basics of regular expressions in a tutorial. All it requires is two anchors and a repeated character class:
^[a-zA-Z ._-]*$
If you use the case-insensitive modifier, you can shorten this to
^[a-z ._-]*$
Note that the space is significant (it is just a character like any other).
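A short JavaScript sketch of the shortened, case-insensitive form, which also highlights two differences from the \s/+ version in the other answer: * accepts an empty string, and only a literal space (not tabs or line breaks) is allowed:

```javascript
// i flag: case-insensitive, so [a-z] also accepts uppercase letters.
const shortForm = /^[a-z ._-]*$/i;

console.log(shortForm.test("Dr. Marshall")); // true
console.log(shortForm.test(""));             // true: * also allows zero characters
console.log(shortForm.test("tab\there"));    // false: only a literal space is listed, not \s
```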
I'm new to regular expressions and I'm having trouble finding out what "\'.-" means.
'/^[A-Z \'.-]{2,20}$/i'
So far, from my research, I have found that the regular expression is anchored at the start (^) and requires two to twenty ({2,20}) alphabetical (A-Z) characters. The expression is also case-insensitive (/i).
Any hints about what "\'.-" means?
The character class is the entire expression [A-Z \'.-], meaning any of A-Z, space, single quote, period, or hyphen. The \ is needed to protect the single quote, since it's also being used as the string quote. This charclass must be repeated 2 to 20 times, and because of the leading ^ and trailing $ anchors that must be the entire content of the matching string.
It means to escape the single quote (') that delimits the regex string (so as not to end the string prematurely), then a . which means a literal . and a - which means a literal -.
Inside a character class, the . is treated literally, and if the - isn't part of a valid range, e.g. a-z, it is treated literally as well.
Your regex says: match the characters a-zA-Z '.- between 2 and 20 times as the entire string, with an optional trailing \n.
This regex is in a string. The backslash is there to escape the single quote so the string doesn't end early, in the middle of the regex. The dot and dash are just what they are, a period and a dash.
So, you were nearly right, except it's 2-20 characters that are letters, space, single quote, period, or dash.
It's quoting the quote.
The regular expression is ^[A-Z '.-]{2,20}$.
In the programming language you are using, you write it as a quoted string:
'SOMETHING'
To get a single quote in there, it's been backslashed.
Everything inside the square brackets is part of the character class, and will match a single character listed. In your example, the characters listed are the letters A through Z, a space, a single quote, a period, or a hyphen. (Note that the hyphen has to be listed last, or first, so that it isn't read as a range like A-Z.) Your full regular expression will match between 2 and 20 of the listed characters. The backslash before the single quote is needed so the compiler knows you are not ending the string that defines the regular expression.
Some examples of things this will match:
....................
abaca af - .
AAfa- - ..
.z
And so on.
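A sketch of the same check in JavaScript, using the example strings above plus a few made-up failures. Written as a regex literal, the single quote needs no escaping. Also note that JavaScript's $ (without the m flag) only matches at the very end of the input, so the optional trailing \n mentioned in one answer is not accepted here:

```javascript
const pattern = /^[A-Z '.-]{2,20}$/i;

// Strings from the list above (twenty dots written with repeat), all of which should match.
const shouldMatch = [".".repeat(20), "abaca af - .", "AAfa- - ..", ".z"];
// Made-up strings that should not: too short, too long, and containing a digit.
const shouldFail = ["a", "x".repeat(21), "R2"];

console.log(shouldMatch.every((s) => pattern.test(s))); // true
console.log(shouldFail.some((s) => pattern.test(s)));   // false
console.log(pattern.test("ab\n"));                      // false in JavaScript
```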