Regular Expression Troubles - regex

Given the following type of string:
"#First Thing# #Another One##No Space# Main String #After Main# #EndString#"
I would like to come up with a regular expression that can return all the text surrounded by the # symbols as matches. One of the things giving me grief is the fact that the # symbol is both the opening and closing delimiter. All of my attempts at a regex have just returned the entire string. The other issue is that it is possible for part of the string to not be surrounded by # symbols, as shown by the substring "Main String" above. Does anyone have any ideas? I have toyed around with Negative Look-behind assertion a bit, but haven't been able to get it to work. There may or may not be a space in between the groups of #'s but I want to ignore them (not match against them) if there are. The other option would be to just write a string parser routine, which would be fairly easy, but I would prefer to use a regex if possible.

/((#[^#]+#)|([^#]+))/
Perhaps something like the above will match what you want.
This will match the space in between two hashes. Hmm.
/((#[^#]+#)|([^#]*[^#\s]+[^#]*))/
That will get rid of the nasty space, I think.

[Edit]
I think that this is what you need:
(?<=#)[^#]+?(?=#)
With input #First Thing# #Another One##No Space# Main String #After Main# matches:
First Thing
Another One
No Space
Main String
After Main
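Note that the single space between Thing# and #Another is also returned as a match (it is the second match), and Main String is returned together with its surrounding spaces.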
[EDIT] To ignore space:
(?<=)(?!\s+)[^#]+?(?=#)
If you want to ignore trailing spaces:
(?<=)(?!\s+)[^#]+?(?=\s*#)

Try this. The first and last groups should not be captured and the .*? should be lazy
(?:#)(.*?)(?:#)
I think this is what you really need:
((#[^#]+#)|([^#]*[^#\s]+[^#]*))
but it will not capture the #'s around Main String
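For comparison, here is a minimal Python sketch of two of the ideas above. Python's `re` module is an assumption, since the question does not name a language, and the variable names are only illustrative. The first pattern consumes both # delimiters and captures what lies between them; the second is the lookaround pattern, which leaves the # characters unconsumed.

```python
import re

text = "#First Thing# #Another One##No Space# Main String #After Main# #EndString#"

# Consume both delimiters and capture only what lies between them.
delimited = re.findall(r"#([^#]+)#", text)
print(delimited)

# Lookaround version: the # characters are not consumed, so every run of
# non-# text that sits between two # characters matches, including the gaps.
lookaround = re.findall(r"(?<=#)[^#]+?(?=#)", text)
print(lookaround)
```

The first list holds only the five delimited groups. The second also contains the whitespace-only gaps and " Main String ", which is why the lookaround answers need the extra handling for spaces.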

Related

RegEx substract text from inside

I have an example string:
*DataFromAdHoc(cbgv)
I would like to extract by RegEx:
DataFromAdHoc
So far I have figured something like that:
^[^#][^\(]+
But unfortunately it does not give the result I want. Do you have any idea why it's not working?
The regex you tried ^[^#][^\(]+ would match:
From the beginning of the string, it should not be a # ^[^#]
Then match until you encounter a parenthesis (I think you don't have to escape the parenthesis in a character class) [^\(]+
So this would match *DataFromAdHoc, including the *, because it is not a #.
What you could do is capture this part [^\(]+ in a group like ([^(]+)
Then your regex would look like:
^[^#]([^(]+)
And the DataFromAdHoc would be in group 1.
Use ^\*(\w+)\(\w+\)$
It just gets everything between the * and the stuff in brackets.
Your answer may depend on which language you're running your regex in, please include that in your question.
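Since the question does not say which language is used, the following is only a sketch in Python (an assumption) of the difference between the original attempt and the capturing-group answer. The variable names are illustrative.

```python
import re

s = "*DataFromAdHoc(cbgv)"

# The original attempt: the leading * is "not a #", so it becomes part of the match.
attempt = re.match(r"^[^#][^\(]+", s).group(0)
print(attempt)

# A literal * followed by a capturing group: group 1 holds only the name.
m = re.match(r"^\*(\w+)\(\w+\)$", s)
name = m.group(1)
print(name)
```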

Regex — only zero or one 's'

I have a name, "foo bar", and in any string, foo, foos, bar and bars should be matched.
I thought this should work like this: (foo|bar)s?. I tried some other regexes as well, but they all were like this. How can I do this?
(foo|bar)s? is correct...
You should use a boundary like \b(foo|bar)s?\b. Else it would also match hihellofoos.
Your question seems to reflect perplexity over why you found a match in foosss. Note the difference between finding a match in a string, and matching the whole string.
You have several ways of dealing with this, and the right choice depends on your application.
Anchor the regex to the whole input line or input: ^(foo|bar)s?$
Anchor the regex to one word: \b(foo|bar)s?\b
Some APIs (but not preg_match) have a separate function to match the whole string.
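As an illustration of the three options, here is a short Python sketch. Python is an assumption (the thread mentions preg_match, which is PHP); `re.fullmatch` is one example of a separate whole-string function.

```python
import re

pattern = r"(foo|bar)s?"

# search() finds a match anywhere in the string, so "foosss" contains one.
found = re.search(pattern, "foosss").group(0)

# fullmatch() succeeds only if the whole string matches the pattern.
whole = re.fullmatch(pattern, "foosss")

# Word boundaries restrict matches to whole words inside longer text.
words = re.findall(r"\b(?:foo|bar)s?\b", "foo foos bars hihellofoos foosss")

print(found, whole, words)
```

Here `found` is the prefix "foos", `whole` is None, and `words` keeps only the three standalone words.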

How to ignore whitespace in a regular expression subject string?

Is there a simple way to ignore the white space in a target string when searching for matches using a regular expression pattern? For example, if my search is for "cats", I would want "c ats" or "ca ts" to match. I can't strip out the whitespace beforehand because I need to find the begin and end index of the match (including any whitespace) in order to highlight that match and any whitespace needs to be there for formatting purposes.
You can stick optional whitespace characters \s* in between every other character in your regex. Although granted, it will get a bit lengthy.
/cats/ -> /c\s*a\s*t\s*s/
While the accepted answer is technically correct, a more practical approach, if possible, is to just strip whitespace out of both the regular expression and the search string.
If you want to search for "my cats", instead of:
myString.match(/m\s*y\s*c\s*a\s*t\s*s\s*/g)
Just do:
myString.replace(/\s*/g,"").match(/mycats/g)
Warning: You can't automate this on the regular expression by just replacing all spaces with empty strings because they may occur in a negation or otherwise make your regular expression invalid.
Addressing Steven's comment to Sam Dufel's answer
Thanks, sounds like that's the way to go. But I just realized that I only want the optional whitespace characters if they follow a newline. So for example, "c\n ats" or "ca\n ts" should match. But wouldn't want "c ats" to match if there is no newline. Any ideas on how that might be done?
This should do the trick:
/c(?:\n\s*)?a(?:\n\s*)?t(?:\n\s*)?s/
See this page for all the different variations of 'cats' that this matches.
You can also solve this using conditionals, but they are not supported in the javascript flavor of regex.
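The newline-only variant can be checked quickly. This is a sketch in Python (an assumption; the thread itself uses JavaScript syntax), with a hypothetical list of sample strings.

```python
import re

# Whitespace between letters is allowed only when it follows a newline.
pattern = re.compile(r"c(?:\n\s*)?a(?:\n\s*)?t(?:\n\s*)?s")

samples = ["cats", "c\n ats", "ca\n ts", "c ats"]
results = {s: bool(pattern.search(s)) for s in samples}
print(results)
```

Only "c ats", which has a space but no newline, is rejected.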
You could put \s* in between every character in your search string, so if you were looking for cats you would use c\s*a\s*t\s*s
It's long but you could build the string dynamically of course.
You can see it working here: http://www.rubular.com/r/zzWwvppSpE
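Building the pattern dynamically could look like the following Python sketch. The helper name `spaced_pattern` is hypothetical. Each character is escaped first so that regex metacharacters in the search word stay literal.

```python
import re

def spaced_pattern(word):
    # Escape each character, then allow optional whitespace between them.
    return r"\s*".join(re.escape(ch) for ch in word)

pattern = spaced_pattern("cats")
m = re.search(pattern, "my c ats are asleep")

print(pattern)
print(m.start(), m.end(), repr(m.group(0)))
```

Because the search runs on the original string, `m.start()` and `m.end()` are already the indices needed for highlighting, whitespace included.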
If you only want to allow spaces, then
\bc *a *t *s\b
should do it. To also allow tabs, use
\bc[ \t]*a[ \t]*t[ \t]*s\b
Remove the \b anchors if you also want to find cats within words like bobcats or catsup.
This approach can be used to automate this
(the following exemplary solution is in python, although obviously it can be ported to any language):
you can strip the whitespace beforehand AND save the positions of non-whitespace characters so you can use them later to find out the matched string boundary positions in the original string like the following:
import re

def regex_search_ignore_space(regex, string):
    no_spaces = ''
    char_positions = []
    for pos, char in enumerate(string):
        if re.match(r'\S', char):  # uppercase \S matches non-whitespace chars
            no_spaces += char
            char_positions.append(pos)
    match = re.search(regex, no_spaces)
    if not match:
        return match
    if match.start() == match.end():
        return ''  # empty match: nothing to map back
    # match.start() and match.end() are indices into the spaceless string
    # (as we have searched in it); map them back to the original string.
    start = char_positions[match.start()]
    # end() is exclusive, so take the last matched character and add 1
    end = char_positions[match.end() - 1] + 1
    # the match WITH spaces is returned.
    return string[start:end]

with_spaces = 'a li on and a cat'
print(regex_search_ignore_space('lion', with_spaces))
# prints 'li on'
If you want to go further you can construct the match object and return it instead, so the use of this helper will be more handy.
And the performance of this function can of course also be optimized, this example is just to show the path to a solution.
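Since the question needs the begin and end index for highlighting, one way to go further is to return the span in the original string rather than the matched text. This is only a sketch; the function name `search_span_ignoring_space` is hypothetical, and it uses `str.isspace` instead of a per-character regex test.

```python
import re

def search_span_ignoring_space(regex, string):
    # Indices of the non-whitespace characters in the original string.
    positions = [i for i, ch in enumerate(string) if not ch.isspace()]
    squeezed = "".join(string[i] for i in positions)
    match = re.search(regex, squeezed)
    if match is None or match.start() == match.end():
        return None
    start = positions[match.start()]
    # match.end() is exclusive: map the last matched character, then add 1.
    end = positions[match.end() - 1] + 1
    return start, end

text = "a li on and a cat"
span = search_span_ignoring_space("lion", text)
print(span, repr(text[span[0]:span[1]]))
```

Slicing the original string with the returned span gives the match with its inner whitespace intact.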
The accepted answer will not work if and when you are passing a dynamic value (such as "current value" in an array loop) as the regex test value. You would not be able to input the optional white spaces without getting some really ugly regex.
Konrad Hoffner's solution is therefore better in such cases, as it strips both the regex and the test string of whitespace. The test is then conducted as though neither has any whitespace.

How to match a string that does not end in a certain substring?

How can I write a regular expression that matches strings that do not end with a certain substring?
In my project, all classes whose names don't end with some string such as "controller" or "map" should inherit from a base class. How can I do this using a regular expression?
but using both
public*.class[a-zA-Z]*(?<!controller|map)$
public*.class*.(?<!controller)$
there isnt any match case!!!
Do a search for all filenames matching this:
(?<!controller|map|anythingelse)$
(Remove the |anythingelse if no other keywords, or append other keywords similarly.)
If you can't use negative lookbehinds (the (?<!..) bit), do a search for filenames that do not match this:
(?:controller|map)$
And if that still doesn't work (might not in some IDEs), remove the ?: part and it probably will - that just makes it a non-capturing group, but the difference here is fairly insignificant.
If you're using something where the full string must match, then you can just prefix either of the above with ^.* to do that.
Update:
In response to this:
but using both
public*.class[a-zA-Z]*(?<!controller|map)$
public*.class*.(?<!controller)$
there isnt any match case!!!
Not quite sure what you're attempting with the public/class stuff there, so try this:
public.*class.*(?<!controller|map)$
The . is a regex char that means "anything except newline", and the * means zero or more times.
If this isn't what you're after, edit the question with more details.
Depending on your regex implementation, you might be able to use a lookbehind for this task. This would look like
(?<!SomeText)$
This matches any lines NOT having "SomeText" at their end. If you cannot use that, the expression
^(?!.*SomeText$).*$
matches any lines (including empty ones) not ending with "SomeText" as well.
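The lookahead form is the more portable of the two. In Python's `re` module, for instance, a lookbehind such as (?<!controller|map) is rejected because its alternatives have different widths, so the sketch below uses the lookahead form. The class names are made up, and the case-insensitive flag is an assumption about how the names are written.

```python
import re

names = ["UserController", "OrderMap", "Invoice", "MapHelper"]

# Lookahead form: reject the name if it ends in controller or map.
not_suffixed = re.compile(r"^(?!.*(?:controller|map)$).*$", re.IGNORECASE)
by_regex = [n for n in names if not_suffixed.match(n)]

# Plain string alternative that needs no regex at all.
by_endswith = [n for n in names if not n.lower().endswith(("controller", "map"))]

print(by_regex, by_endswith)
```

Both lists contain the same names, so when the check runs in code rather than in an IDE search box, the string method may be the simpler choice.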
You could write a regex that contains two groups, one consists of one or more characters before controller or map, the other contains controller or map and is optional.
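^(.+?)(controller|map)?$
With that you can match your string and, if the regex API you use has a group() method, check group(2): if it is empty, the string does not end with controller or map. (The first group must be lazy; a greedy .+ would consume the suffix itself and leave group 2 always empty.)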
Check if the name does not match [a-zA-Z]*controller or [a-zA-Z]*map.
finally I did it in this way
public.*class.*[^(controller|map|spec)]$
it worked

Regex for all strings not containing a string? [duplicate]

Ok, so this is something completely stupid but this is something I simply never learned to do and it's a hassle.
How do I specify a string that does not contain a sequence of other characters? For example, I want to match all lines that do NOT end in '.config'
I would think that I could just do
.*[^(\.config)]$
but this doesn't work (why not?)
I know I can do
.*[^\.][^c][^o][^n][^f][^i][^g]$
but please please please tell me that there is a better way
You can use negative lookbehind, e.g.:
.*(?<!\.config)$
This matches all strings except those that end with ".config"
Your question contains two questions, so here are a few answers.
Match lines that don't contain a certain string (say .config) at all:
^(?:(?!\.config).)*$\r?\n?
Match lines that don't end in a certain string:
^.*(?<!\.config)$\r?\n?
and, as a bonus: Match lines that don't start with a certain string:
^(?!\.config).*$\r?\n?
(each time including newline characters, if present.)
Oh, and to answer why your version doesn't work: [^abc] means "any one (1) character except a, b, or c". Your other solution would also fail on test.hg (because it also ends in the letter g): your regex looks at each character individually instead of the entire .config string. That's why you need lookaround to handle this.
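The three patterns above can be tried side by side. This is a sketch in Python (an assumption about the regex flavor) using `re.MULTILINE` and a made-up block of lines. The trailing \r?\n? is dropped so that the results contain only the line text.

```python
import re

text = "a.config\nb.txt\nx.config.bak\n.configrc\nplain"

# Lines that do not contain ".config" anywhere.
no_contain = re.findall(r"^(?:(?!\.config).)*$", text, re.MULTILINE)

# Lines that do not end in ".config".
no_end = re.findall(r"^.*(?<!\.config)$", text, re.MULTILINE)

# Lines that do not start with ".config".
no_start = re.findall(r"^(?!\.config).*$", text, re.MULTILINE)

print(no_contain)
print(no_end)
print(no_start)
```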
(?<!\.config)$
:)
By using the [^] construct, you have created a negated character class, which matches any single character except those you have named. The order of the characters inside the class does not matter, so [^(\.config)] behaves the same as [^)gi.\onc(], and the pattern fails on any string whose last character is one of ( ) . c o n f i g.
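A small Python sketch (an assumption about the flavor, with made-up file names) makes the difference visible. The negated class only inspects the final character, while the lookbehind inspects the whole suffix.

```python
import re

char_class = re.compile(r".*[^(\.config)]$")   # negated character class
lookbehind = re.compile(r".*(?<!\.config)$")   # negative lookbehind

names = ["app.config", "test.hg", "main.c", "readme.txt"]

# For each name: (matched by the character class, matched by the lookbehind)
results = {n: (bool(char_class.match(n)), bool(lookbehind.match(n))) for n in names}
for name, outcome in results.items():
    print(name, outcome)
```

The character-class pattern wrongly rejects test.hg and main.c, because their last letters happen to appear in the class. The lookbehind rejects only app.config.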
Use negative lookahead, (with perl regexs) like so: (?!\.config$). This will match all strings that do not match the literal ".config"
Unless you are "grepping" ... since you are not using the result of a match, why not search for the strings that do end in .config and skip them? In Python:
import re
isConfig = re.compile(r'\.config$')
# List lst is given
# search() is needed here: match() only tests at the start of the string
filteredList = [f.strip() for f in lst if not isConfig.search(f.strip())]
I suspect that this will run faster than a more complex re.
As you have asked for a "better way": I would try a "filtering" approach. I think it is quite easy to read and to understand:
#!/usr/bin/perl
while (<>) {
    next if /\.config$/; # ignore the line if it ends with ".config"
    print;
}
As you can see I have used perl code as an example. But I think you get the idea?
added:
This approach can also be used to chain more filter patterns, and it still remains readable and easy to understand:
next if /\.config$/; # ignore the line if it ends with ".config"
next if /\.ini$/; # ignore the line if it ends with ".ini"
next if /\.reg$/; # ignore the line if it ends with ".reg"
# now we have filtered out all the lines we want to skip
... process only the lines we want to use ...
I used Regexpal before finding this page and came up with the following solution when I wanted to check that a string doesn't contain a file extension:
^(.(?!\.[a-zA-Z0-9]{3,}))*$
I used the m checkbox option so that I could present many lines and see which of them did or did not match.
so to find a string that doesn't contain another "^(.(?!" + expression you don't want + "))*$"
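The string concatenation above can be wrapped in a helper. The Python sketch below uses hypothetical function names and shows two variants: the form as posted, and a minor variation that tests the lookahead before each character is consumed. The variation is not the poster's pattern; it is included because the posted form never tests the very first position, so a string that begins with the unwanted expression still matches.

```python
import re

def not_containing(expr):
    # Variation: the lookahead is tested before each character is consumed.
    return re.compile("^(?:(?!" + expr + ").)*$")

def not_containing_as_posted(expr):
    # Form from the post: the lookahead is tested after each character.
    return re.compile("^(.(?!" + expr + "))*$")

ext = r"\.[a-zA-Z0-9]{3,}"
for candidate in ["report", "report.txt", ".txt"]:
    print(candidate,
          bool(not_containing(ext).match(candidate)),
          bool(not_containing_as_posted(ext).match(candidate)))
```

Both variants accept "report" and reject "report.txt". Only the variation also rejects ".txt".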