I am using Regex to search a file and find strings that are "sandwiched" between two other strings. This is my current code:
openingstring.*?closingstring
The issue I am having is that it is searching across multiple lines in the file. Let's say I want to find anything between "foo" and "bar" and my file looks like so:
foo this is NOT the string I want
foo this is the string I want bar
My regular expression is returning both lines, when what I would like is for it to only return line #2.
How can I go about only getting strings where foo and bar are on the same line?
I should also note that this is not being done in a text editor, or in a programming language necessarily, but in a user interface for automation software.
"." is supposed to match any characters except new line, which language are you using?
Anyway, you can try something like this:
foo[^\r\n]*bar
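Note that the "?" after "*" is not redundant: "*" alone means 0 or more and is greedy, so the pattern above runs to the last "bar" on the line. Write [^\r\n]*? if the match should stop at the first "bar".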
Why not use the inline modifier (?m)?
(?m)foo.*bar
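Or, to switch off Singleline mode explicitly, use (?m-s). In most flavors the -s is what stops the dot from matching newlines; m only changes how ^ and $ behave: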
(?m-s)foo.*bar
This is a case where .*? can look greedy: once the engine has found the first foo, it simply keeps going until it reaches the next bar. Here that can only cross a line break if the dot . is in Dot-All mode, so try to turn that off. If you have no choice, use [^\r\n]*? instead of .*?
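To make the Dot-All point concrete, here is a minimal sketch in Python (chosen only for illustration, since the question's automation tool is unknown). It assumes the two-line sample from the question and uses re.DOTALL to stand in for the tool's Dot-All/Singleline mode.

```python
import re

# The two-line sample from the question.
text = "foo this is NOT the string I want\nfoo this is the string I want bar"

# With DOTALL the dot also matches "\n", so the lazy match starts at the
# first "foo" and runs across the line break to the first "bar".
dotall_matches = re.findall(r"foo.*?bar", text, flags=re.DOTALL)

# Without DOTALL (Python's default) the dot stops at "\n".
default_matches = re.findall(r"foo.*?bar", text)

# The negated class never crosses a line break, whatever the dot mode is.
class_matches = re.findall(r"foo[^\r\n]*?bar", text, flags=re.DOTALL)

print(dotall_matches)
print(default_matches)
print(class_matches)
```

Only the DOTALL run returns both lines as one match; the other two return the second line alone.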
The Regex Engine will process Strings from "left-to-right".
Since your input string starts with foo, the engine starts matching there on its very first attempt. Nothing tells the engine that .*? must not run over the second foo, so it proceeds until it finds bar:
foo .*? bar
foo this is NOT the string I want foo this is the string I want bar
perfect match.
It is always a good idea to exclude the opening and closing strings from being matched inside the pattern, so that you get the shortest possible match:
The pattern foo((?!foo|bar).)*bar matches the text between foo and bar only if that text contains neither foo nor bar:
foo((?!foo|bar).)*bar
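A small sketch of the difference, in Python for illustration. It assumes the single-line input used in the walkthrough above (both sample lines joined by a space).

```python
import re

# Both sample lines joined into one string, as in the walkthrough above.
text = "foo this is NOT the string I want foo this is the string I want bar"

# Lazy dot: starts at the first "foo" and runs over the second one.
plain = re.search(r"foo.*?bar", text).group()

# Each character is consumed only if "foo" or "bar" does not start there,
# so the match cannot contain a second "foo".
tempered = re.search(r"foo((?!foo|bar).)*bar", text).group()

print(plain)
print(tempered)
```

The first pattern returns the whole string; the second returns only the part that starts at the second foo.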
Summary:
Use regex with placeholders to find where the key and value are the same, and replace with just the key (in my case leveraging ES6 object-property shorthand syntax to clean up thousands of lines of broken ES5 code, where I can't find an auto helper in eslint rules for use with --fix).
Example:
module.exports = {
foo: foo,
bar: bar,
baz: someFunctionNotCalledBaz,
someOther: () => console.log('Defined directly. Not a reference to same name function.')
};
What I want (cleaning up old, broken code and ES6'ing a NodeJS project):
module.exports = {
foo,
bar,
baz: someFunctionNotCalledBaz,
someOther: () => console.log('Defined directly. Not a reference to same name function.')
};
I'm pretty familiar with regex, and I'm not sure this is even possible. Using Vim, or an IDE's Replace with regex, I'd like to find a way to say:
Find all "word: word", regardless of spaces, and then the matching key on the value side:
(\w+)(:{1}\s{0,})(*SOMEHOW_REFERENCE_FIRST_MATCHING_GROUP_WITHIN_FIND*)
Replace with reference (using placeholder that would already work with matching group):
$1
Is this "lookback" even possible within the same regex? I did look at a bunch of other posts that matched my query, but to no avail.
This should do it:
sed -E 's/(.+): \1/\1/g' file
If you're unfamiliar with sed, the first part will look for strings that match the pattern (.+): \1, and the second part will replace it with \1
The \1 you see are backreferences; they refer to a capturing group. A capturing group is the text inside parentheses (here, (.+)).
(.+): \1 will locate any string of 1 or more characters followed by a colon and a space, and then the same string again.
And finally, sed will replace any matching string with \1, which is the part before the colon.
Hope this makes sense!
EDIT: Although I've marked this question with the java tag, I don't want a solution that requires java code. I just would like the pattern to be compatible with Java's regex implementation if possible (which unfortunately is not quite PCRE compatible). What I would like is just a single regex that produces the matches I want.
Suppose I have this string:
foo bar foo bar # foo bar foo bar
I'd like to match instances of "foo", but only if they are not after any "#" symbol (if one is present). In other words, I want this result:
foo bar foo bar # foo bar foo bar
^^^     ^^^
I tried using a negative look-behind like this:
(?<!#.*)\bfoo\b
...but this doesn't work because a look-behind cannot be of variable length. Any suggestions?
This one should do the work
(?=.*#) is a lookahead that only lets foo match while a "#" still appears somewhere after it
the global flag "g" repeats the pattern
/(?=.*#)(\bfoo\b)/g
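You can use the replaceFirst method to remove the text after # and then do a simple word match (input is the string being searched):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final Pattern pattern = Pattern.compile("\\bfoo\\b");
// Strip the first "#" and everything after it, then search what is left.
final Matcher matcher = pattern.matcher(input.replaceFirst("#.*$", ""));
while (matcher.find()) {
    System.err.printf("Found Match: %s%n", matcher.group());
}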
Java regex is not powerful enough for doing it with a single regex.
Lookbehind is fixed width, so that's not a solution.
Lookahead is only applicable when you can be sure that there is a # in the string.
Java does not allow failing a match and then continuing searching at the end (like with SKIP/FAIL in PCRE). It always continues at the character after the last matching start.
#.*|(\bfoo\b) and then checking if the first matching group is defined would be a workaround here, but there's no pure way to just match \bfoo\b sequences.
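A minimal sketch of that workaround, written in Python for illustration; the same pattern and the same "is group 1 defined" check carry over to Java's Matcher. The helper name foo_positions is made up for this example.

```python
import re

# "#.*" swallows the comment part without capturing it; only a "foo" that
# is reached before any "#" ends up in group 1.
PATTERN = re.compile(r"#.*|(\bfoo\b)")

def foo_positions(line):
    """Return the start offsets of every "foo" that precedes the first "#"."""
    return [m.start(1) for m in PATTERN.finditer(line) if m.group(1) is not None]

print(foo_positions("foo bar foo bar # foo bar foo bar"))
print(foo_positions("no comment here: foo"))
```

Unlike the lookahead version, this also works on lines that contain no # at all.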
There is no way to do it with a single regex as others said already. But there is a workaround for this.
Select # and every thing after:
#.*
Copy the highlighted part and paste it inside the parentheses in place of HERE:
foo(?=.*\QHERE\E)
I apologize for the horrendous topic name, but I couldn't think of a way to further abstract this question. I have been racking my brain trying to figure out the RegEx syntax for this problem and poring over questions about lookarounds, but to no avail.
I want to return results from start to the first instance of foo (unless it is immediately followed by bar) OR the end of the file. Additionally, if foo bar appears before foo !bar or end of file, I do not want anything returned.
Below is what I have been working with so far. I may be completely off track; however, I am definitely looking to stay within RegEx unless it's completely impossible to do. I've already solved this problem without RegEx, but I'm trying to expand my understanding of RegEx, as it bothers me that I couldn't work out how to do this search. The RegEx implementation I am using is PCRE.
Currently this RegEx matches regardless of whether the first foo is foo bar. I feel as though I am missing some simple solution, but with negative lookbehind and other methods I have not been able to make the search return nothing when foo bar is the first foo, while still returning the cases where foo !bar appears on its own, appears before foo bar, or where no foo appears at all.
Current Search:
start(?:\n|\r|.)*?(?:\Z|foo(?! +bar))
Here are three example files, with what I want the search to return delineated by single quotes.
Example 1: Should not return anything.
Start
Text
Text
Foo Bar
Foo Doo
Example 2: Should return text between quotes.
'Start
Text
Text
Foo Doo
Foo' Bar
Example 3: Should return text between quotes.
'Start
Text
Text'
Thanks!
First you need to prevent "foo" from appearing in the content after "start". There are several ways to do that. A well-known one is (?:(?!foo).)* (you ensure that each character you match is not the beginning of the word you don't want). However, this is not very efficient in general, since a lookahead is tested at each position.
Another way is to take the first character of the word you want to avoid and build a negated character class from it. You can then describe the content like this:
(?>[^f]+|f(?!oo))*
The advantage of this approach is that it limits the number of lookahead tests: they are only performed when the first letter "f" is encountered. The drawback is that you need to hardcode the letter and the rest of the word in the pattern, or build the pattern dynamically from substrings of the word (sprintf can be handy in this case).
Then the whole pattern becomes:
start(?>[^f]+|f(?!oo))*(?:foo(?! bar)|\z)
pattern description:
start
(?>                 # open an atomic group
    [^f]+           # all characters except f (one or more times)
  |                 # OR
    f(?!oo)         # f not followed by oo
)*                  # repeat the group zero or more times
(?:
    foo(?! bar)     # "foo" not followed by a space and "bar"
  |                 # OR
    \z              # end of the string
)
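To try the idea outside PCRE, here is a rough Python adaptation, with three changes that are assumptions of this sketch rather than part of the pattern above: Python spells end-of-string as \Z, atomic groups only exist from Python 3.11, so the group is replaced by a one-character-per-step alternation (which avoids nested quantifiers), and re.IGNORECASE is added because the sample files are capitalised. The helper name head is made up.

```python
import re

# Content after "start" is built one character at a time: either a non-"f"
# character, or an "f" that does not begin "foo".
PATTERN = re.compile(r"start(?:[^f]|f(?!oo))*(?:foo(?! bar)|\Z)", re.IGNORECASE)

def head(text):
    """Return the match from "start", or None if "foo bar" comes first."""
    m = PATTERN.search(text)
    return m.group() if m else None

example1 = "Start\nText\nText\nFoo Bar\nFoo Doo"
example2 = "Start\nText\nText\nFoo Doo\nFoo Bar"
example3 = "Start\nText\nText"

print(repr(head(example1)))
print(repr(head(example2)))
print(repr(head(example3)))
```

With this pattern the match for the second file stops at the first Foo (the one followed by Doo), because the content part is never allowed to run past a foo.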
It's a little messy but here we go:
((?(?=.*Foo Bar)Start.*?Foo(?= Bar(?![\s]*$)(?!.*?foo (?!bar)))|.*))
NOTE: You would need to enable the 's' modifier to enable dot to match newline.
The output is in the first capturing group (\1). The detailed explanation is at the bottom.
As a general comment, it will probably be easier to do the conditional (if/else) logic in code than inside the regex. It will also be more readable and easier to maintain.
Btw, you can try this regex here.
Hope it helps! :D
(                              # first capturing group
  (?                           # if conditional
    (?=.*Foo Bar)              # if (Foo Bar exists in this file), using lookahead
    Start.*?Foo                # match Start to the first instance of Foo
    (?= Bar                    # lookahead: match space and Bar
      (?![\s]*$)               # match !(white space and end of line)
      (?!.*?foo (?!bar)))      # match !(foo !bar)
    |                          # else
    .*                         # match everything
  )
)
I have a name, "foo bar", and in any string, foo, foos, bar and bars should be matched.
I thought this should work like this: (foo|bar)s?. I tried some other regexes as well, but they all were like this. How can I do this?
(foo|bar)s? is correct...
You should use a word boundary, like \b(foo|bar)s?\b. Otherwise it would also match inside hihellofoos.
Your question seems to reflect perplexity over why you found a match in foosss. Note the difference between finding a match in a string, and matching the whole string.
You have several ways of dealing with this, and the right choice depends on your application.
Anchor the regex to the whole input line or input: ^(foo|bar)s?$
Anchor the regex to one word: \b(foo|bar)s?\b
Some APIs (but not preg_match) have a separate function to match the whole string.
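The three options can be compared side by side. This is a small Python sketch (Python's re.fullmatch is one example of a separate whole-string function); the sample strings are made up for illustration.

```python
import re

pattern = r"(foo|bar)s?"

# search() finds a match anywhere inside the string: here the "foos" prefix.
found_inside = re.search(pattern, "foosss").group()

# fullmatch() succeeds only if the whole string fits the pattern.
whole_string = re.fullmatch(pattern, "foosss")

# Word boundaries pick out whole words from a longer text.
words = re.findall(r"\b(?:foo|bar)s?\b", "foo foos bar bars foosss hihellofoos")

print(found_inside)
print(whole_string)
print(words)
```

The non-capturing group (?:...) is used in the last call only so that findall returns the whole words instead of the group contents.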
I am trying to write a regex to match a string received from an IRC channel.
The message will be in the format "!COMMAND parameters"; the only command that is built by the system so far is repeat.
The regex I am using looks like this:
/![repeat] (.*?)/
When other commands are added it will look like:
/![cmd1|cmd2|cmd3] (.*?)/
It does not seem to be matching the right things in the string. Can anyone offer any input on this?
It appears that I need to add some basic regex stuff.
() brackets return data, [] matches but does not return.
Swapping to () does not work either.
The IRC program I am writing has a dynamic number of commands, so far I have only added "repeat" so the command pattern is "[repeat]". If I added "say", it would be "[repeat|say]".
Use the parentheses for grouping:
/!(cmd1|cmd2|cmd3) (.*)/
The brackets […] denote a character class describing just one character out of a set of characters.
You should also not use a non-greedy .* as the minimal match of .*? is an empty string.
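Both points can be checked quickly. The sketch below uses Python for illustration and a made-up message; the question does not say which language the bot is written in.

```python
import re

message = "!repeat hello world"

# [repeat] is a character class: it matches ONE of r, e, p, a, t. After "!r"
# the pattern wants a space but sees "e", so nothing matches.
char_class = re.search(r"![repeat] (.*?)", message)

# A lazy (.*?) at the very end is satisfied by the empty string.
lazy = re.search(r"!(repeat) (.*?)", message)

# A greedy (.*) captures the rest of the line.
greedy = re.search(r"!(repeat) (.*)", message)

print(char_class)
print(repr(lazy.group(2)))
print(repr(greedy.group(2)))
```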
You used the wrong brackets:
/!(cmd1|cmd2|cmd3) (.*)/
I don't understand what you meant by the ? in your regex.
[repeat] is a character class and will match a single r, e, p, a or t. You should just use
/!repeat (.*?)/
and
/!(cmd1|cmd2|cmd3) (.*?)/
I don't understand exactly what you are hoping to match, but the lazy operator seems wrong. For example,
/!COMMAND (.*?)/ applied to !COMMAND parameter will match !COMMAND only; a (.*?) at the end of a regex is guaranteed to match nothing.
You're doing one thing wrong.
If you replace your [] brackets with () everything should work. Between [] you put some letters to match. [abc] would match a, b, or c, not "abc", while (abc) would match "abc" and (abc|bca) would match "abc" or "bca".
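Since the question mentions a dynamic number of commands, one way to build the group from a list is sketched below, in Python for illustration. The function name build_command_pattern and the command list are assumptions made up for this example.

```python
import re

def build_command_pattern(commands):
    """Build !(cmd1|cmd2|...) (.*) from a list of command names."""
    # re.escape keeps any regex metacharacters in a command name literal.
    alternatives = "|".join(re.escape(command) for command in commands)
    return re.compile(rf"!({alternatives}) (.*)")

pattern = build_command_pattern(["repeat", "say"])

known = pattern.match("!say hi there")
unknown = pattern.match("!dance now")

print(known.groups())
print(unknown)
```

Group 1 holds the command name and group 2 holds the parameters, so adding a command only means adding its name to the list.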
Check out the Perl regular expressions tutorial and reference for more information.