Regular expression: page path starts with "/posts/" and ends with ".html" - regex

I'm stuck here:
=~^/posts/(*).html
but it doesn't work
I need something that can recognise something like this:
/posts/testing.html
/posts/another-testing-issue.html
And I'm not very good at using RegEx
Can anyone help me please?
EDIT:
Floris had the right answer:
^/posts/.*html$
thank you!

Briefly, the expression you need is
^\/posts\/.*\.html$
Explanation:
^ start of string
\/posts\/ literal string '/posts/'
the backslash "protects" the forward slash -
it is called "escaping", and removes any special meaning it might have
(in some applications the / would be a delimiter)
.* any number of characters
\. literal '.'
html literal 'html'
$ end of string
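A quick way to check this expression against the sample paths from the question is sketched below. The question does not say which engine is in use, so Python's re module and the helper name is_post_page are assumptions made purely for illustration.

```python
import re

# Pattern from the answer above. Escaping "/" is harmless in Python,
# where "/" is not a delimiter.
POST_PAGE = re.compile(r"^\/posts\/.*\.html$")


def is_post_page(path: str) -> bool:
    """Return True if path starts with /posts/ and ends with .html."""
    return POST_PAGE.search(path) is not None


samples = [
    "/posts/testing.html",
    "/posts/another-testing-issue.html",
    "/about/testing.html",   # wrong prefix
    "/posts/testing.htmlx",  # wrong suffix
]
for path in samples:
    print(path, is_post_page(path))
```

The first two paths should be accepted and the last two rejected.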
Now for a bit more background on regex syntax…
As @Peter points out in the comment, a quantifier follows "the thing to quantify". In most (all?) regex syntaxes, writing (*) will generate the error "preceding token is not quantifiable". You need something in front of the *, and a ( doesn't count (unless it was escaped).
This is where the dot comes in. The dot . means "any character at all". That is its usual meaning, which is why .* is just about the most common thing in regular expressions, meaning "I don't care about the next bit…" (usually up to an "until" - whatever follows).
Because the dot has a special meaning, when you want the exact string .html, you need to write it as \.html (there's that escape backslash again to remove the special meaning from the dot).
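Both points can be tried out in a few lines. This is only a sketch using Python's re module (an assumption, since the question's engine is unknown). The error text differs between engines, and the path /posts/testingXhtml is a made-up test value.

```python
import re

# A quantifier with nothing in front of it is rejected at compile time.
try:
    re.compile(r"^/posts/(*).html")
    compiled = True
except re.error as exc:
    compiled = False
    print("invalid pattern:", exc)

# An unescaped dot matches any character, so "Xhtml" is accepted here...
loose = re.compile(r"^/posts/.*.html$")
# ...while an escaped dot insists on a literal ".".
strict = re.compile(r"^/posts/.*\.html$")

print(bool(loose.search("/posts/testingXhtml")))
print(bool(strict.search("/posts/testingXhtml")))
```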
As a final tweak, it is not uncommon to have an extension like .htm - so you could write your expression as
^\/posts\/.*\.html?$
This would make the last character, the l, optional (the ? means "zero or one times the preceding expression", which in this case is the single character immediately before it).
You can see this at work at http://regex101.com/r/bK5yC7 - it is a wonderful tool for exploring regular expressions, and also gives a nice explanation (breakdown) of every expression you type (with highlighting of any errors)
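If you would rather check the optional l locally, here is a small sketch in Python (again an assumed engine), with the pattern anchored at the start like the first expression. The test paths are invented.

```python
import re

# "l?" makes the final letter optional, so .htm and .html both pass.
pattern = re.compile(r"^/posts/.*\.html?$")

paths = ["/posts/a.html", "/posts/a.htm", "/posts/a.ht", "/posts/a.htmll"]
results = {p: bool(pattern.search(p)) for p in paths}
for path, ok in results.items():
    print(path, ok)
```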

You missed the dot that matches any single character, and you didn't escape the second dot, which is meant to be literal:
^/posts/(.*)\.html

In most regular expression flavors, . means any character and * means repetition (zero or more times), so try changing it to
^/posts/(.*)\.html
\ is the escape character

Related

Check array syntax with Regex

I'm trying to create a regex that checks if a string is a valid path for Firestore document.
I want a regex that tests whether a string:
starts with a letter ^([a-z]{1})
after the first character, contains only letters/digits and/or dots \w*(.?\w+){0,}
may end with an array index (\[{1}\d+\]{1})?$
The first and second points work well, but the last group doesn't. I tested a string like data.images[11 and the regex returned true.
First of all, you can shorten some quantifiers in your regex:
{1} -> can be ignored completely
{0,} -> *
Your second part could be expressed like this, which also improves readability:
[\w.]* meaning: take any character inside the brackets 0 to n-times. The bracket expression also supports predefined classes, so we are using \w here. The dot INSIDE the brackets doesn't need to be escaped, it simply means the one character dot.
So your parts would be:
^([a-z])
[\w.]*
(\[\d+\])?$
I hope this helps. According to regexpal it matches data.images[11], but not data.images[11. Also it seems to support all your demands.
EDIT:
Your second part doesn't work because (like Asocia stated in the answer) you would need to escape the dot. The dot itself is a class meaning "any character" (depending on regex engine and settings sometimes even line breaks). As you mean the dot as a character you need to escape it.
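To see the difference between the two expressions, here is a minimal sketch that joins the parts and runs both against the strings from the question. Python's re module is an assumption (the question does not name an engine), and the variable names are only illustrative.

```python
import re

# Expression from the question: the unescaped "." can match "[",
# so "[11" is swallowed by (.?\w+) and the index group is skipped.
original = re.compile(r"^([a-z]{1})\w*(.?\w+){0,}(\[{1}\d+\]{1})?$")

# The three shortened parts from the answer, joined together.
revised = re.compile(r"^([a-z])[\w.]*(\[\d+\])?$")

for candidate in ("data.images[11]", "data.images[11"):
    print(candidate,
          bool(original.search(candidate)),
          bool(revised.search(candidate)))
```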

Regular expression to match single quotes, double quotes and/or space

I have a regular expression looking for width=["|\']([^"]*)["|\']
works great when looking for width="750" and width='750' however it does not match width=750
so I got it as far as width=["|\']?([^"]*)["|\'] for optional first quote but the match just continues on and does not return just 750
If you are using a tool or language that supports backreferences, you should be able to use the following:
width=("|'|)(\S*)\1
This will try to match a single quote, double quote, or empty string with the first capture group, and then the \1 at the end will be whatever the first group captured. The value will always be the contents from the second capture group.
I also changed the [^"]* to \S* so this will match any number of non-whitespace characters. This is necessary to make sure that your match doesn't just go to the end of the string when there is no quotes around the value.
Example: http://rubular.com/r/Xg8ageZmgy
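The same backreference expression can be tried in Python, which supports \1. This is just a sketch: the helper name width_value and the sample strings are made up for the demonstration.

```python
import re

# Group 1 is the opening quote (or nothing); \1 demands the same closer.
WIDTH = re.compile(r"""width=("|'|)(\S*)\1""")


def width_value(text):
    """Return the width value from text, or None if there is no match."""
    m = WIDTH.search(text)
    return m.group(2) if m else None


for sample in ('width="750"', "width='750'", "width=750 height=10"):
    print(sample, "->", width_value(sample))
```

In all three cases the value comes back from the second group without the quotes.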
Character classes ([]) do not make use of | to mean or; they automatically or everything. You also don't have to escape the single quote (unless of course you're enclosing this whole expression in single quotes). You want:
["' ]?([^"' ]*)["' ]
Try this one:
width\s*=\s*(?:["\']([^"\']*)["\']|\S+)
I just added \S+ as an alternative, to handle an unquoted value such as 750 after the equals sign. Also, you do not need to place | inside the character class [].
\s* means optional white spaces(zero or more times).
Which regular expression language are you using? Different languages have different details of syntax, so someone might give you an answer that works in their environment but not in yours.
For example, I copied your expression and tried it on some text in Emacs. It found a match in this text:
width=|750|
That's because Emacs regex doesn't use the '|' character to signify "either or" within the '[' and ']' brackets; it interprets it as just one more example of a character that the expression might match.
Also, it looks like your expression doesn't always stop after the 750 in this example:
width='750'
Instead, if there is a '"' character later in the input, it matches everything from the 750 up to that character. (It did the same thing with my earlier example in Emacs if there was a '"' later in the input.)
You will also match the 750 in this (note the mismatched quotation marks):
width='750"
Is that a problem, or is that an acceptable outcome?
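The three observations above are not specific to Emacs. A rough check in Python (chosen here only for illustration; the sample strings are invented) shows the same behaviour for the original expression:

```python
import re

# Original expression from the question: "|" inside [...] is a literal bar.
ORIGINAL = re.compile(r"""width=["|']([^"]*)["|']""")

samples = [
    "width=|750|",                 # bars accepted as "quotes"
    "width='750' height=\"100\"",  # runs on to the next double quote
    "width='750\"",                # mismatched quotes accepted
]
captured = [ORIGINAL.search(s).group(1) for s in samples]
for sample, value in zip(samples, captured):
    print(repr(sample), "->", repr(value))
```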

What's the meaning of this perl regex expression?

the regex expression is as below:
if ($ftxt =~ m|/([^=]+)="(.+)"|o)
{
.....
}
This regex seems different from many other regexes. What confuses me is the "|": most regexes use "/" instead of "|". The group ([^=]+) also confuses me. I know [^=] means "the start of the string" or "=", but what does it mean to repeat '^' one or more times? How can this be explained?
You can use different delimiters instead of /. For instance you could use:
m#/([^=]+)="(.+)"#o
Or
m~/([^=]+)="(.+)"~o
The advantage here of using something different than / is that you don't have to escape slashes, because otherwise, you'd have to use:
m/\/([^=]+)="(.+)"/o
(The escaped slash \/ could also be written as the character class [/].)
([^=]+) is a capture group, and inside, you have [^=]+. [^=] is a negated class and will match any character which is not a =.
^ behaves differently at the beginning of a character class and is not the same as ^ outside a character class which means 'beginning of line'.
As for the last part, o, this is a flag which I hadn't met before, so a little searching brought me to this post, which I quote:
The /o modifier is in the perlop documentation instead of the perlre documentation since it is a quote-like modifier rather than a regex modifier. That has always seemed odd to me, but that's how it is.
Before Perl 5.6, Perl would recompile the regex even if the variable had not changed. You don't need to do that anymore. You could use /o to compile the regex once despite further changes to the variable, but as the other answers noted, qr// is better for that.
Some regexp implementations allow you to use other special characters besides / as the delimiter. This is useful if you need to use that special character inside the regular expression itself, since you don't have to escape it. (In and of itself / is not a special character in regexp syntax, but it needs escaping if it's used in the regexp literal syntax of the host language.) The docs on Perl's quote operators mention this.
This is tutorial-level stuff: square brackets ([abc]) denote a character class - it means "any of the characters inside the brackets". (In my example, it means "either a or b or c".) Inside them, the ^ special character has a different meaning: it inverts the character class. So, [^=] means "any character except =", and [^=]+ means "one or more characters that aren't =".
Quoting the docs on Perl's RE syntax:
You can specify a character class, by enclosing a list of characters in [] , which will match any character from the list. If the first character after the "[" is "^", the class matches any character not in the list.
It is meant to match equation like expressions, to capture the key and values separately. Imagine you have a statement like height="30px", and you want to capture the height attribute name, as well as its value 30px.
So you have m|/([^=]+)="(.+)"|.
The key is supposed to be everything before the = is encountered, so ([^=]+) captures it. The ^ is a negation metacharacter when used as the first character inside [] brackets. It means that [^=] will match any character except =, which is what you want. The / in front of the group is a literal slash that has to appear before the key; because the delimiter here is |, it does not need to be escaped. It sits outside the parentheses, so it is not part of the captured text.
Next comes the = sign, which you don't care about. Then the quotes which contain the value. So you capture it like "(.+)". The .+ will go on greedily matching every character, including the final ". But then the regex finds that it can't match its own final ", so it backtracks and gives up the last " that (.+) had taken, which leaves the string within the quotes captured in the group. Now you are ready to access the key and value through $1 and $2. Cool, isn't it?
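The capture and backtracking behaviour can be tried outside Perl too. Below is a sketch in Python, used only as a stand-in: re.search plays the role of =~, and m.group(1) / m.group(2) correspond to $1 / $2. The input strings are invented, and they include the leading slash that the pattern requires.

```python
import re

# Same pattern as in the question, minus Perl's m|...|o wrapper.
PAIR = re.compile(r'/([^=]+)="(.+)"')

m = PAIR.search('/height="30px"')
key, value = m.group(1), m.group(2)
print(key, value)

# With two quoted values on one line, the greedy .+ runs to the LAST quote.
greedy = PAIR.search('/a="1" b="2"')
print(greedy.group(1), greedy.group(2))
```

The second call shows why greedy matching matters: the value group ends at the last double quote on the line, not the first.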

regular expressions boost c++

I am trying to catch certain characters at the start of the string and after newlines. The string is
.V/1LBOG\n.F/AV0094/08NOV/SAL/Y\n.E/0134249356001"
From the string above I need to catch .V/ and .E/. The regular expression I am using is
^.[VE]/*
But it only seems to catch .V/. Can anyone see why? I thought ^ means newlines as well as the start of the string. Any help would be greatly appreciated, as I've had this problem for a while now. If this is not the correct way of doing this, could you propose a different way?
Regex 101:
^ means start of string. And you guessed it right. There can only be one start of string.
^.[VE]/*
means :
Match start of string, followed by any character (other than newline), followed by either a V or a E, followed by 0 to n / (greedy).
Probably you want something like this :
\.[VE].*?(?:\\n|$)
Which means match a dot, followed by V or E and match everything until \n or end of string.
Comment if I am wrong.
So .V/1LBOG\n.F/AV0094/08NOV/SAL/Y\n.E/0134249356001"
Looks like this ?
.V/1LBOG
.F/AV0094/08NOV/SAL/Y
.E/0134249356001"
If yes, then you need to change your regex a little bit:
\.[VE].*
Abusing the fact that . does not match newlines by default.
. in regular expressions matches any single character, not a literal .. If you want to match a literal period, you need to escape it (\.). * doesn't match any number of any characters (as it would in most shells), but instead matches zero or more instances of whatever you put before it. For example, A* will match the empty string, A, AAAA, etc., and .* will match any string.
^ means the beginning of a line. ^\.[VE]/ will match .V/ and .E/ (but only at the start of the line).
If you need .V or .E, try ^.(V|E)/*; the or operator | lets a single expression check for both ^.V/* and ^.E/*.
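Whether ^ matches only at the start of the string or at the start of every line depends on the engine and its flags. The sketch below uses Python's re module, not Boost.Regex, so the flag name and default are Python's. It also assumes the \n sequences in the question are real newline characters.

```python
import re

text = ".V/1LBOG\n.F/AV0094/08NOV/SAL/Y\n.E/0134249356001"

# Default in Python: ^ matches only at the very start of the string.
start_only = re.findall(r"^\.[VE]/", text)

# With MULTILINE, ^ also matches right after each newline.
every_line = re.findall(r"^\.[VE]/", text, flags=re.MULTILINE)

print(start_only)
print(every_line)
```

If the input actually holds a backslash followed by the letter n rather than a newline, no line-anchoring flag will help, and a pattern such as the \\n alternative suggested above would be the one to try.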

What does /([^.]*)\.(.*)/ mean?

When I searched for something, I found an answered question on this site. Two of the answers contain
/([^.]*)\.(.*)/
in their text.
The question is located at Find & replace jquery. I'm a newbie in JavaScript, so I wonder, what does it mean? Thanks.
/([^.]*)\.(.*)/
Let us deconstruct it. The beginning and trailing slash are delimiters, and mark the start and end of the regular expression.
Then there is a parenthesized group: ([^.]*) The parentheses group that part of the pattern together (and capture whatever it matches). The square brackets denote a "character group" (also called a character class), meaning that any character inside this group is accepted in its place. However, this group is negated by the first character being ^, which reverses its meaning. Since the only character besides the negation is a period, this matches a single character that is not a period. After the square brackets is a * (asterisk), which means that the square brackets can be matched zero or more times.
Then we get to the \.. This is an escaped period. Periods in regular expressions have special meaning (except when escaped or in a character group). This matches a literal period in the text.
(.*) is a new parenthesized sub-group. This time, the period matches any character, and the asterisk says it can be repeated as many times as needed.
In summary, the expression finds any sequence of characters that aren't periods, followed by a single period, followed by any sequence of characters.
Edit: Removed part about shortening, as it defeats the assumed purpose of the regular expression.
It's a regular expression (it matches non-periods, followed by a period followed by anything (think "file.ext")). And you should run, not walk, to learn about them. Explaining how this particular regular expression works isn't going to help you as you need to start simpler. So start with a regex tutorial and pick up Mastering Regular Expressions.
Original: /([^.]*)\.(.*)/
Split this as:
[1] ([^.]*) : It says match all characters except . [ period ]
[2] \. : match a period
[3] (.*) : matches any character
so it becomes
[1]Match all characters which are not . [ period ] [2] till you find a .[ period ] then [3] match all characters.
Anything except a dot, followed by a dot, followed by anything.
You can test regex'es on regexpal
It's a regular expression that roughly searches for a string that doesn't contain a period, followed by a period, and then a string containing any characters.
That is a regular expression. Regular expressions are powerful tools if you use them right.
That particular regex extracts filename and extension from a string that looks like "file.ext".
It's a regular expression that splits a string into two parts: everything before the first period, and then the remainder. Most regex engines (including the Javascript one) allow you to then access those parts of the string separately (using $1 to refer to the first part, and $2 for the second part).
This is a regular expression with some advanced use.
Consider a simpler version: /[^.]*\..*/ which is the same as above without parentheses. This will match just any string with at least one dot. When the parentheses are added, and a match happens, the variables \1 and \2 will contain the matched parts from the parentheses. The first one will have anything before the first dot. The second part will have everything after the first dot.
Examples:
input: foo...bar
\1: foo
\2: ..bar
input: .foobar
\1:
\2: foobar
This regular expression generates two matching expressions that can be retrieved.
The two parts are the string before the first dot (which may be empty), and the string after the first dot (which may contain other dots).
The only restriction on the input is that it contain at least one dot. It will match "." contrary to some of the other answers, but the retrieved groups will be empty.
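The examples above can be reproduced with a few lines of code. This sketch uses Python rather than JavaScript, and the helper name split_at_first_dot is made up. The pattern itself is unchanged apart from dropping the / delimiters, which Python does not use.

```python
import re

SPLIT = re.compile(r"([^.]*)\.(.*)")


def split_at_first_dot(s):
    """Return (before, after) around the first dot, or None if no dot."""
    m = SPLIT.search(s)
    return m.groups() if m else None


for sample in ("file.ext", "foo...bar", ".foobar", ".", "nodot"):
    print(repr(sample), "->", split_at_first_dot(sample))
```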
IMO /^.*\..*/g would match the same text (though without the capture groups):
const senExample = 'I am test. Food is good.';
const result1 = senExample.match(/([^.]*)\.(.*)/g);
console.log(result1); // ["I am test. Food is good."]
const result2 = senExample.match(/^.*\..*/g);
console.log(result2); // ["I am test. Food is good."]
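The overall match is the same, but once capture groups are added the two forms split the string in different places. A small sketch of this, written in Python only for consistency with the other examples here, with a made-up input string:

```python
import re

text = "a.b.c"

# [^.]* cannot cross a dot, so the split happens at the FIRST dot.
first_dot = re.search(r"([^.]*)\.(.*)", text).groups()

# A greedy .* grabs as much as it can, so the split is at the LAST dot.
last_dot = re.search(r"(.*)\.(.*)", text).groups()

print(first_dot)
print(last_dot)
```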
the . character matches any character except line break characters (\r or \n).
the ^ at the start of a character class negates the class (in this case, a class containing only the dot)
the * means "zero or more times"
the parentheses group and capture,
the \ allows you to match a special character (like the dot or the star)
so this ([^.]*) means any character other than a literal dot, repeated zero or more times (inside square brackets the dot is literal, not "any character").
this (.*) part means any string of characters zero or more times (except the line breaks)
and the \. means a real dot
so the whole thing would match zero or more non-dot characters followed by a dot followed by any number of characters.
For more information and a really great reference on Regular Expressions check out: http://www.regular-expressions.info/reference.html
It's a regular expression, which basically is a pattern of characters that is used to describe another pattern of characters. I once used regexps to find an email address inside a text file, and they can be used to find pretty much any pattern of text within a larger body of text provided you write the regexp properly.