How to match up till '.' in string using perl [closed] - regex

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 5 years ago.
I am trying to write a match regex to extract text from a string in perl.
The string can be something like:
I am a man. I am not a woman.
I want to match up to the first occurrence of '.', but if I use [^.], it returns only the first character i.e. 'I' instead of 'I am a man'.
How do I write my [] expression?

You can use
^[^.]*
which means: any number of characters that are not a dot, anchored at the beginning of the string.
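As a rough illustration of the difference, here is a minimal sketch in Python rather than Perl (the character-class syntax is the same in both). The variable names are made up for the example.

```python
import re

text = "I am a man. I am not a woman."

# A bare character class consumes exactly one character.
one_char = re.match(r"[^.]", text).group()

# Anchored at the start and followed by *, it runs up to
# (but not including) the first dot.
up_to_dot = re.match(r"^[^.]*", text).group()

print(repr(one_char))   # 'I'
print(repr(up_to_dot))  # 'I am a man'
```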

You have to add a quantifier (how many times you want your pattern to be matched). By default, it's one.
[^.]{5}
This matches anything that is not a . 5 times.
[^.]{2,}
This matches anything that is not a . 2 times or more.
[^.]{2,5}
This matches anything that is not a . 2, 3, 4 or 5 times.
[^.]*
The quantifier * means 0 times or more (it's a shortcut for {0,}). So this matches anything that is not . 0 times or more.
[^.]+
The quantifier + means once or more (it's a shortcut for {1,}). So this matches anything that is not . once or more. This is what you want.
[^.]?
The quantifier ? means 0 times or once (it's a shortcut for {0,1}). So this matches anything that is not . 0 times or once.
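The quantifiers listed above can be compared side by side. This is only a sketch in Python (not Perl), and first_match is a hypothetical helper written for the example; the quantifier syntax itself is shared by both languages.

```python
import re

text = "I am a man. I am not a woman."

def first_match(pattern, s=text):
    """Return the first match of pattern in s, or None if there is none."""
    m = re.search(pattern, s)
    return m.group() if m else None

for pattern in (r"[^.]{5}", r"[^.]{2,}", r"[^.]{2,5}", r"[^.]*", r"[^.]+", r"[^.]?"):
    print(pattern, "->", repr(first_match(pattern)))

# * and + differ when the string starts with a dot:
# * is satisfied by an empty match at offset 0, + has to find a real character.
print(repr(first_match(r"[^.]*", ".abc")))  # ''
print(repr(first_match(r"[^.]+", ".abc")))  # 'abc'
```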

Related

Block Youtube ads [closed]

Closed 11 months ago.
I'm trying to block the annoying youtube ads with pihole but unfortunately it doesn't work for me. The following is not viewed at all:
(rr[\d]{1}---[\s]{2}-[\s]{8}-[\w]{4})\.googlevideo\.com
Has anyone had similar experiences?
Examples look like this
rr1---sn-8xgn5uxa-quhl.googlevideo.com
rr1---sn-8xgn5uxa-quhl.googlevideo.com
rr3---sn-8xgn5uxa-quhz.googlevideo.com
rr6---sn-8xgn5uxa-quhl.googlevideo.com
Using [\s]{2} in the pattern (which can be written as \s{2}) matches 2 whitespace characters, but the example data has sn at that position.
The single meta characters in this case do not have to be placed between square brackets.
Judging by the Pi-hole documentation, \w, \s and \d are not supported.
You might use
rr[[:digit:]]---sn-[[:alnum:]]{8}-[[:alnum:]]{4}\.googlevideo\.com
The pattern matches:
rr[[:digit:]] Match rr and a single digit
---sn- Match literally
[[:alnum:]]{8} Match 8 alphanumerics
-[[:alnum:]]{4} Match - and 4 alphanumerics
\.googlevideo\.com Match .googlevideo.com
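The breakdown above can be checked against the sample hostnames. Python's re module does not support POSIX classes such as [[:digit:]], so this sketch substitutes the ASCII ranges [0-9] and [A-Za-z0-9]; that substitution is an assumption of the example, not Pi-hole syntax.

```python
import re

hosts = [
    "rr1---sn-8xgn5uxa-quhl.googlevideo.com",
    "rr3---sn-8xgn5uxa-quhz.googlevideo.com",
    "rr6---sn-8xgn5uxa-quhl.googlevideo.com",
]

# Pattern from the question: \s only matches whitespace, so it can never
# match "sn" or "8xgn5uxa".
original = re.compile(r"(rr[\d]{1}---[\s]{2}-[\s]{8}-[\w]{4})\.googlevideo\.com")

# ASCII stand-ins for [[:digit:]] and [[:alnum:]].
suggested = re.compile(r"rr[0-9]---sn-[A-Za-z0-9]{8}-[A-Za-z0-9]{4}\.googlevideo\.com")

original_hits = [h for h in hosts if original.search(h)]
suggested_hits = [h for h in hosts if suggested.search(h)]

print(len(original_hits), len(suggested_hits))  # 0 3
```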
The Pi-Hole documentation does not mention the usual abbreviations for character classes (like the \d, \s or \w you used).
If you replace your character classes with the ones from the Pi-Hole documentation, you end up with
(rr[[:digit:]]{1}---[[:space:]]{2}-[[:space:]]{8}-[A-Za-z0-9_]{4})\.googlevideo\.com
Note that a POSIX class such as [:digit:] only works inside a bracket expression, hence the double brackets. The \s is probably not what you originally wanted, as your examples contain letters there instead of spaces. Also, \w includes an underscore, which does not appear in your examples, and the {1} can be dropped.
So I would suggest this expression:
(rr[[:digit:]]---[[:alnum:]]{2}-[[:alnum:]]{8}-[[:alnum:]]{4})\.googlevideo\.com
If you don't need the hostname-part for further processing, you could remove the group marker () around it.
This pattern matches all your samples, but it may be too tight:
rr\d---sn-8xgn5uxa-quh\w\.googlevideo\.com

Are regular expressions (0*1*)* and (0 + 1)* same? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 3 years ago.
I was solving exercise problems in my textbook, and I wondered whether (0*1*)* and (0 + 1)* are the same. I think they are, but I have no idea how to prove it. Are they the same regular expression?
No, they are not the same:
0*1* matches any number of zeroes (including none) followed by any number of ones (including none). This is then repeated any number of times (including none) via ()*.
0 + 1, read as a programming-language regex, matches a mandatory single zero, then one or more spaces, then one more space, then a mandatory single one. Again, this is repeated any number of times via ()*.
So the two match very different things. The first one will match input like 000000 or 1111, because both the zeroes and the ones are optional, so input consisting of only one of the two is accepted; the second regex rejects it. Conversely, the second one will match input that has two or more spaces between the mandatory 0 and 1 characters, which the first regex rejects because it does not allow spaces.
No, they aren't. The first one will, for example, match 0011, but the second one won't.
You can check how they behave in: https://regex101.com/
I think you meant (0+1)* instead of (0 + 1)* (mind the spaces, they are meaningful when dealing with regex).
To answer your question, no. They are not the same.
The + quantifier means "one or more" (it does not mean "or" like you might think), so the (0+1)* regex will match 01 but not 10, whereas the (0*1*)* regex will match both.
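The answers above read + as the "one or more" quantifier. If the textbook instead uses + for union, as formal-language notation commonly does, then (0 + 1)* corresponds to (0|1)* in programming syntax. The following Python sketch compares the readings by brute force over all binary strings up to length 6. It is a finite check under that assumption, not a proof, and the helper names are made up for the example.

```python
import re
from itertools import product

def binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def language(pattern, max_len=6):
    """Set of binary strings up to max_len that the pattern matches in full."""
    rx = re.compile(pattern)
    return {s for s in binary_strings(max_len) if rx.fullmatch(s)}

all_strings = set(binary_strings(6))

star_of_stars = language(r"(0*1*)*")
union_star = language(r"(0|1)*")       # + read as union (assumed textbook meaning)
plus_quantifier = language(r"(0+1)*")  # + read as "one or more"

print(star_of_stars == all_strings)  # True
print(union_star == all_strings)     # True
print("01" in plus_quantifier, "10" in plus_quantifier)  # True False
```

Under the union reading both expressions accept every binary string in the tested range; under the quantifier reading they differ, as the answers describe.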

What's going on in this substitution? [closed]

Closed 3 years ago.
Could I get some help figuring out what's going on?
How does the .xnext end up at the beginning of the string?
[237] > cat /tmp/text.txt
xtop.xnext|sig 12345
[238] > perl -p -e 's/\.xnext\\|/.xnext./;' /tmp/text.txt
.xnext.xtop.xnext|sig 12345
In your regex, | is the "alternation" metacharacter and not a literal pipe character. So your pattern
\.xnext\\|
can either match the literal string .xnext\, which is what is specified on the left side of the alternation character, or nothing, which is what is specified on the right side of the alternation.
So the beginning of your input string is a match for your regular expression pattern, and your substitution pattern .xnext. is prepended to your string.
The pattern you wanted to use was
\.xnext\|
which is how to specify the literal string .xnext|.
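The same behaviour can be reproduced outside Perl. Below is a small Python sketch; count=1 is used to mirror a Perl s/// without the /g flag, and the variable names are illustrative only.

```python
import re

line = "xtop.xnext|sig 12345"

# Pattern from the question: "\\" is a literal backslash and the bare "|"
# is alternation with an empty right-hand side, which matches at offset 0.
buggy = re.sub(r"\.xnext\\|", ".xnext.", line, count=1)

# Escaping the pipe makes it a literal character.
fixed = re.sub(r"\.xnext\|", ".xnext.", line, count=1)

print(buggy)  # .xnext.xtop.xnext|sig 12345
print(fixed)  # xtop.xnext.sig 12345
```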

Regex for two digit number followed by . for find and replace in vim [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I want to find the following string patterns in vim and replace them with something else. Can you please tell me the regex for this?
1.
11.
20.
21.
99.
Basically, one or two digits followed by a dot.
I think you could do something like the following (I'm not very experienced with VI so there might be a better way)
:%s/\d\+\./MyString/gc
So that's essentially using \d\+\. to search for one or more digits followed by a literal dot.
MyString is your replacement string.
:%s runs the substitute command on every line of the file; :s alone would only operate on the current line.
/gc looks for the match as many times as it appears on the line (g), and asks for confirmation before each replacement (c).
Tried this?
[^0-9][0-9][0-9]\.
Or have you tried it and it didn't work?
It has an issue though of three digits and a dot, i.e. "123." will also be captured
The regex for 1 or 2 digits followed by a dot is:
\<\d\d\?\.
The "word boundary" \< precludes 3 digits (and a dot), which would be allowed without it (the last two digits of a 3-digit number would match).
To replace using this regex in vi:
s/\<\d\d\?\./foo/g
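To see why the boundary matters, here is a rough Python analogue of the vim patterns. Python has no \< assertion, so \b (a word boundary on either side) stands in for it; that substitution is an assumption of this sketch and is not vim syntax.

```python
import re

sample = "1. 11. 20. 99. 123."

# Counterpart of vim's \d\+\. : any run of digits followed by a dot.
any_digits = re.sub(r"\d+\.", "foo", sample)

# Counterpart of vim's \<\d\d\?\. : the match must begin at a word
# boundary and contain at most two digits.
one_or_two = re.sub(r"\b\d\d?\.", "foo", sample)

# Without the boundary, the last two digits of "123." still match.
no_boundary = re.sub(r"\d\d?\.", "foo", "123.")

print(any_digits)   # foo foo foo foo foo
print(one_or_two)   # foo foo foo foo 123.
print(no_boundary)  # 1foo
```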

Regex remove spaces after 2 letters [closed]

Closed 9 years ago.
What is the regex for no whitespaces after the first 2 letters? For example, the string starts with AB then is followed by an undetermined amount of numbers and/or letters but it cannot have any spaces.
(^[A][B][\S])
^[a-zA-Z]{2}\S+$
^ denotes the start of the line
[a-zA-Z]{2} denotes exactly 2 letters
\S any character besides white spaces
+ one or more.
$ denotes end of string.
Please read more about regex and use debuggex.com to experiment.
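A quick way to try the pattern is a short Python sketch like the one below. The candidate strings are invented for illustration.

```python
import re

# Two letters at the start, then one or more non-whitespace characters
# up to the end of the string.
pattern = re.compile(r"^[a-zA-Z]{2}\S+$")

candidates = ["AB123", "ABc9x", "AB 123", "AB", "A1234"]
accepted = [s for s in candidates if pattern.match(s)]

print(accepted)  # ['AB123', 'ABc9x']
```

Note that "AB" on its own is rejected because \S+ requires at least one character after the two letters; \S* would accept it.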
It looks like you're using Python, so I'll just assume that.
^\w\w\S*
That's 2 word characters at the start of the line, followed by 0 or more non-whitespace characters.
Not sure what platform you're using, but you can try the following:
^[\d\w]{2}[^\s]*$
Should do the trick.
The [\d\w]{2} represents 2 word characters (\w already covers letters, digits and the underscore, so the \d is redundant), and [^\s]* then allows zero or more non-whitespace characters afterwards. The caret and dollar sign anchor the match to the beginning and the end of the string.
EDIT: Although the title of this question is about removing white space, your question seems to just ask for the regex for 2 letter/digit characters with no white space afterwards, which is what I answered. If you are looking for how to write a lookahead, ignore my answer.