Perl: Regular expressions Pattern matching [closed] - regex

Is [A] a regular expression that will match a string of characters which contains any number of occurrences of the letter A (and only the letter A, with no other characters or spaces) such as AAAA?

Anything in square brackets is a character class. This is complicated enough that it has its own Perl documentation page (in the link), so it's not a surprise it wasn't evident how it works.
A character class defines a set of possible characters; when pattern matching, a character class by itself matches one character from the input, no matter how many characters there are inside the square brackets.
/[A]/ # find one copy of 'A' anywhere in the string
/[abcd]/ # find one copy of any of 'a', 'b', 'c', or 'd' anywhere in the string
/[A-Z]/ # find any one uppercase ASCII letter somewhere in the string
If you want your class to match differently, you can add quantifiers:
/[A-Z]+/ # find one or more uppercase ASCII letters in a row
/[A]*/ # find zero or more 'A's in a row
The linked page will show you a lot of other options to specify sets of characters inside the square brackets. But the key is that one set of square brackets matches one character unless you add a quantifier such as + (one or more of these) or * (zero or more of these).
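The same behaviour can be checked quickly outside Perl. The following is a minimal sketch in Python, whose re module treats these simple character classes the same way; the sample strings are made up for illustration.

```python
import re

# A character class on its own consumes exactly one character:
# the first character of the input that belongs to the set.
one = re.search(r"[abcd]", "xyzcab")
print(one.group())

# With +, the same kind of class matches a run of one or more characters.
run = re.search(r"[A-Z]+", "abcDEFgh")
print(run.group())

# With *, zero characters is also acceptable, so the search succeeds
# right at the start of the string with an empty match.
empty = re.search(r"[A]*", "ZAA")
print(repr(empty.group()))
```

The last case is worth noticing: because * accepts zero occurrences, a pattern like [A]* matches every string, including one that contains no A at all.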

No.
The regular expression pattern [A] can be simplified to just A. It will match any string that contains A. While that includes AAAA, it also includes ZAZ.
For starters, you will need to anchor the match.
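As a rough illustration of what anchoring changes, here is a small sketch in Python rather than Perl; the helper name only_as is made up for this example, and + is used on the assumption that at least one A is required (use * instead if the empty string should also count).

```python
import re

def only_as(text):
    # ^ and $ tie the pattern to the start and end of the string,
    # so nothing other than a run of A characters may appear.
    return re.search(r"^A+$", text) is not None

print(only_as("AAAA"))
print(only_as("ZAZ"))
```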


Regex expression doesn't recognize dot at end of word - Regex (C++) [closed]

I'm trying to read a line out of a file using the following regex expression:
^([A-z.]+?\\s?[A-z]+)\\s([A-z]+)\\s(\\d{7})\\s(\\d?\\d.\\d)$
on the line:
W.W. Sneijder 0000574 10.0
(To be clear: the intent is to make any word with chars [a-z], [A-Z], or dots, match with the [A-z.]+ part.)
However, the regular expression doesn't recognize the second dot in W.W., which seems strange to me. Don't the square brackets combined with the + mean that any character from inside them is accepted, until (here) whitespace is encountered? I found a regex that does work but isn't that elegant:
^([A-z.]+[.\\s?[A-z]+)\\s([A-z]+)\\s(\\d{7})\\s(\\d?\\d.\\d)$
I'm hoping to find an elegant solution. It'd be great to hear your input.
Links such as RegEx - Not parsing dot(.) at the end of a sentence didn't seem to answer my question unfortunately.
Space separated data is just a different variant of the common CSV (Comma Separated Values) format. There are many ways to separate a string on arbitrary separators, but in C++ using space is actually very easy:
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> separate_on_space(std::string const& input)
{
    std::vector<std::string> output;
    std::istringstream iss(input);
    // Copy all space-separated "words" from the input to the vector
    std::copy(std::istream_iterator<std::string>(iss),  // Begin iterator
              std::istream_iterator<std::string>(),     // End iterator
              std::back_inserter(output));              // Destination iterator
    return output;
}
Once you have separated the values into a vector of strings, you can then convert the numeric values to their actual type (for example using std::stod) and store into suitable objects.
Of course this doesn't handle names with spaces in them gracefully, but that can be handled at a higher level (by checking the size of the resulting vector, and by knowing that the last two elements should always be the numeric fields and the rest are the name).
On the other hand the regular expression in the question doesn't handle it at all. :)
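The higher-level handling described above (the last two fields are the numbers, everything before them is the name) can be sketched as follows. This is an illustration in Python rather than C++, and the function name parse_line and the choice to keep the 7-digit field as a string are assumptions, not part of the question.

```python
def parse_line(line):
    # Split on runs of whitespace, like the istream_iterator approach.
    parts = line.split()
    if len(parts) < 3:
        raise ValueError("expected a name, a number and a score")
    # The last two fields are the numeric ones; the rest form the name.
    *name_parts, number, score = parts
    # Keep the 7-digit field as text so leading zeros survive.
    return " ".join(name_parts), number, float(score)

print(parse_line("W.W. Sneijder 0000574 10.0"))
```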
In your regex, the entire W.W. Sneijder is captured in the first group. Looking at your regex, I doubt you intended it that way.
I think the regex you wanted is ^([A-z.]+?\s?[A-z]+)\s(\d{7})\s(\d?\d.\d)$. Or if you wanted Sneijder to be in the second capture: ^([A-z.]+?)\s([A-z]+)\s(\d{7})\s(\d?\d.\d)$.
... or maybe you wanted ^([A-z.]+?\s?[A-z]*)\s([A-z]+)\s(\d{7})\s(\d?\d.\d)$ (* instead of + in the first capture group).
or ^([A-z.]+?(?:\s[A-z]+)?)\s([A-z]+)\s(\d{7})\s(\d?\d.\d)$ (optional space + text, again in the first capture groups).
All 4 expressions should match your test string, but behave differently on other test strings.
There are certainly improvements that could be made to the regex, such as ensuring the string does not start with a dot (.).
As long as you touch the inside of each capture group but not the logic across capture groups, you can let the regex manage any level of control you desire and this will have no impact on the code that follows the text parsing.
There will always be 4 capture groups (except for the first regex I posted above, which has only 3), with some guarantees on the text if you need to convert it to another type.
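To see how the four expressions split the test line, here is a quick check written in Python instead of C++ (the syntax used in these patterns is the same in both engines as far as I know). The names LINE, TAIL and PATTERNS are just for this sketch.

```python
import re

LINE = "W.W. Sneijder 0000574 10.0"

# The numeric part is identical in all four expressions.
TAIL = r"\s(\d{7})\s(\d?\d.\d)$"

PATTERNS = [
    r"^([A-z.]+?\s?[A-z]+)" + TAIL,                 # 3 groups
    r"^([A-z.]+?)\s([A-z]+)" + TAIL,                # 4 groups
    r"^([A-z.]+?\s?[A-z]*)\s([A-z]+)" + TAIL,       # 4 groups
    r"^([A-z.]+?(?:\s[A-z]+)?)\s([A-z]+)" + TAIL,   # 4 groups
]

results = [re.match(p, LINE).groups() for p in PATTERNS]
for groups in results:
    print(groups)

# Side note: the range A-z also covers the punctuation that sits between
# Z and a in ASCII, so it accepts more than letters.
underscore_ok = re.fullmatch(r"[A-z]", "_") is not None
```

If only letters should be accepted, [A-Za-z] is the stricter spelling of that class.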

RegEx to find count of special characters in String [duplicate]

I need a regex that matches only if two or more special characters occur in the given string.
1) abcd##qwer - Match
2) abcd#dsfsdg#fffj - Match
3) abcd#qwetg - No Match
4) acwexyz - No Match
5) abcd#ds#$%fsdg#fffj - Match
Can anyone help me with this?
Note: I need to use this regular expression in an existing tool, not in a programming language.
UPDATE after OP edit
The edited OP introduces a small amount of additional complexity that necessitates a different pattern entirely. The keys here are that (a) there is now a significantly limited set of "special characters" and (b) that these characters must appear at least twice (c) in any position in the string.
To implement this, you would use something like:
(?:.*?[##$%].*?){2,}
Opens a non-capturing group,
Which contains any number of characters (as few as possible), followed by
Any character in the set ##$%
Followed by any number of characters (again as few as possible)
Requires the group to repeat at least twice in the given string.
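The pattern can be checked against the five sample strings from the question. The asker needs it inside an existing tool, so the Python below is only a test harness, with the character class copied verbatim from the answer.

```python
import re

# Pattern copied as-is from the answer above.
PATTERN = re.compile(r"(?:.*?[##$%].*?){2,}")

samples = [
    "abcd##qwer",
    "abcd#dsfsdg#fffj",
    "abcd#qwetg",
    "acwexyz",
    "abcd#ds#$%fsdg#fffj",
]

# True where the string contains at least two of the listed characters.
results = {s: PATTERN.search(s) is not None for s in samples}
for sample, matched in results.items():
    print(sample, "Match" if matched else "No Match")
```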
Original answer
By "special characters", I assume you mean anything outside standard alphanumeric characters. You can use the pattern below in most flavors of Regex:
([^A-Za-z0-9])\1
This (a) creates a set of all characters not including alphanumeric characters and matches a character against it, then (b) checks to see if the same character appears adjacent.
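A short Python check makes the limits of this original pattern visible: it only fires when the same special character appears twice in a row, so it does not cover special characters that are spread across the string.

```python
import re

# Backreference \1 requires the very same character to repeat immediately.
adjacent = re.compile(r"([^A-Za-z0-9])\1")

doubled = adjacent.search("abcd##qwer") is not None
spread = adjacent.search("abcd#dsfsdg#fffj") is not None
mixed = adjacent.search("abcd#$qwer") is not None
print(doubled, spread, mixed)
```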

Regex match a string within 2 different strings containing other characters

Given bar(alvin the chipmunk dude) and chipmunk(alvin the chipmunk dude), how would you match the word "chipmunk" only on the "bar" function?
Another question I just asked, but without the complexity I actually need, is answered here. I do not believe this is a duplicate, given the answer to that question from @revo. That answer does answer the other question; however, I see no way to adapt it to ensure the match is contained between two different strings ("bar(" and ")").
chipmunk(?=[^\)\(\\]*(?:\\.[^\)\(\\]*)*\)) (courtesy of @revo) matches "chipmunk" inside of the parentheses, but I want to constrain it to only being within "bar(" and ")".
I'm using a JetBrains IDE, which uses Java's regex engine.
Since you are using a Java regex library, you may leverage the constrained-width lookbehind feature:
Java accepts quantifiers within lookbehind, as long as the length of the matching strings falls within a pre-determined range. For instance, (?<=cats?) is valid because it can only match strings of three or four characters. Likewise, (?<=A{1,10}) is valid.
You may use
(?<=bar\([^()]{0,1000})chipmunk
It matches any chipmunk string that is immediately preceded with bar( followed with 0 to 1000 chars other than ( and ).
You may test it at RegexPlanet.com.
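For engines without this feature, the same goal can be reached in two steps. Python's re module, for instance, only accepts fixed-width lookbehind, so the sketch below swaps in a different technique: first locate each bar(...) call, then search inside its argument. The function name find_in_bar is made up for this example, and unlike the lookbehind pattern it requires the closing parenthesis to be present.

```python
import re

text = "bar(alvin the chipmunk dude) and chipmunk(alvin the chipmunk dude)"

def find_in_bar(text, word="chipmunk"):
    positions = []
    # Step 1: find each bar( ... ) whose argument contains no parentheses.
    for call in re.finditer(r"bar\(([^()]*)\)", text):
        # Step 2: look for the word inside the captured argument only.
        for hit in re.finditer(re.escape(word), call.group(1)):
            # Convert the offset inside the argument to an offset in text.
            positions.append(call.start(1) + hit.start())
    return positions

positions = find_in_bar(text)
print(positions)
```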

Ruby regex for extracting email addresses not detecting hyphens [duplicate]

Tried looking at the regex that some others are using, but for some reason it's not working for me.
I basically just have a string such as "testing-user@example.com", but it only extracts user@example.com and not the whole thing.
Here's what I have:
regex = Regexp.new(/\b[a-zA-Z0-9._%+-,]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b/)
email = line.scan(regex)
Any help would be greatly appreciated.
The hyphen needs to be escaped at the position it occupies inside the character class.
[a-zA-Z0-9._%+\-,]+
              ^
Unescaped, +-, is a range from + to , and so matches only those two characters, not the hyphen itself.
Inside a character class the hyphen has a special meaning. You can place the hyphen as the first or last character of the class. In some regex implementations, you can also place it directly after a range. If you place the hyphen anywhere else, you need to precede it with a backslash in order to add it to your class.
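The effect of the escape is easy to reproduce. The sketch below uses Python instead of Ruby (its re module reads the hyphen inside a class the same way in this case) and an ordinary address with an @ sign; the variable names are made up for the example.

```python
import re

line = "testing-user@example.com"

# Here +-, is a range, so the class does not contain the hyphen.
unescaped = r"\b[a-zA-Z0-9._%+-,]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b"
# With the backslash, the hyphen is a literal member of the class.
escaped = r"\b[a-zA-Z0-9._%+\-,]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b"

before = re.findall(unescaped, line)
after = re.findall(escaped, line)
print(before)
print(after)
```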

RegEx Remove "-" but not " - " from a string [closed]

I want to remove "-" but not " - " from a string.
For example: "01-Frozen - Madonna.mp3" becomes "01Frozen - Madonna.mp3"
I will then remove all digits using \d; I have seen some patterns for that.
Can anybody help?
Let's take the example you already specified. 01-Frozen - Madonna.mp3.
The pattern is this: <non space character><hyphen><non space character>
If you needed to match a space, the regex would be \s, which matches a single whitespace character. A handy aspect of regular expressions is that most of these shorthand classes have an opposite, denoted by the capital letter of the same identifier. Since, in this case, we don't want whitespace, we can use \S, which matches any character that is not whitespace.
So the pattern now looks like: \S-\S.
If you try this, it won't work as expected: we want to match only the hyphens that have non-space characters on both sides, but the match must not include those neighbouring characters themselves, or they would be removed along with the hyphen.
Cases like these call for lookaheads and lookbehinds. These are groups that open with a question mark followed by = or ! for a lookahead, and <= or <! for a lookbehind. They check what surrounds the current position without making it part of the match. You can read more about them here.
For this case, we use =, which ensures that the token that follows it (\S in our case) won't be part of the result. This is called a positive lookahead. So the regex looks like this:
/(?=\S)-(?=\S)/
[Edited]
Paraphrasing @jerry's comments:
Well, if you want it to work properly, you'll need a lookbehind: /(?<=\S)-(?=\S)/. Though I would prefer negative ones in this case, as it would be more natural to say 'not preceded by' and 'not followed by': /(?<!\s)-(?!\s)/
Option 1:
/(?<=\S)-(?=\S)/
Option 2:
/(?<!\s)-(?!\s)/
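Both options can be tried on the example file name. The question does not say which language is in use, so the following is a sketch in Python, whose re module supports these fixed-width lookbehinds; the extra test strings are made up to show where the patterns differ.

```python
import re

title = "01-Frozen - Madonna.mp3"

# Option 1: the hyphen must have a non-whitespace character on both sides.
option1 = re.sub(r"(?<=\S)-(?=\S)", "", title)
# Option 2: the hyphen must not have whitespace on either side.
option2 = re.sub(r"(?<!\s)-(?!\s)", "", title)
print(option1)
print(option2)

# The options differ at the edges of a string: a leading hyphen has no
# character before it, so only the negative lookbehind accepts it.
edge1 = re.sub(r"(?<=\S)-(?=\S)", "", "-intro")
edge2 = re.sub(r"(?<!\s)-(?!\s)", "", "-intro")

# The lookahead-only pattern never inspects the character before the
# hyphen, so it also removes a hyphen that follows a space.
loose = re.sub(r"(?=\S)-(?=\S)", "", "Frozen -Madonna")
strict = re.sub(r"(?<=\S)-(?=\S)", "", "Frozen -Madonna")
```

Which option fits better depends on whether a hyphen at the very start or end of the string should be removed.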