I'd like to use a regex to match the contents of some brackets and the text immediately after them up to the next whitespace, except when another opening bracket appears before that whitespace.
For example in the following:
- (NSArray *)componentsForRegularExpression:(NSString *)regex
(NSArray *) and (NSString *)regex would be matched.
The regex I already have matches (NSString *)regex correctly, but instead of matching just (NSArray *) it matches the whole of (NSArray *)componentsForRegularExpression:, which I do not want.
The regex I've used is as follows:
\(.*?\)[^\s|(]*
So how would I use a regex to always match the contents of the brackets, but only also match what comes after them (up to whitespace) as long as there is not another opening bracket in that stretch?
How about this:
\(.*?\)([^\s(]*(?=\s|$))?
It matches something in brackets, then optionally matches any number of characters that are neither whitespace nor (, followed by a look-ahead for whitespace (or end-of-string, in case the match appears at the end of the string).
Note that there shouldn't be a | in [] (unless you want to match the | character).
Live demo (surrounded by brackets and added non-capturing group (?:)).
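To check this against the example from the question, here is a minimal sketch using Python's re module. Python is my assumption (the question does not name a regex engine) and the variable names are only illustrative. It uses the non-capturing form of the pattern so that findall returns whole matches.

```python
import re

text = "- (NSArray *)componentsForRegularExpression:(NSString *)regex"

# Pattern from the question: the trailing class simply runs up to the next "(".
original = re.compile(r"\(.*?\)[^\s|(]*")

# Suggested pattern (non-capturing form): the trailing text is kept only
# when it is followed by whitespace or the end of the string.
fixed = re.compile(r"\(.*?\)(?:[^\s(]*(?=\s|$))?")

original_matches = original.findall(text)
fixed_matches = fixed.findall(text)

print(original_matches)  # ['(NSArray *)componentsForRegularExpression:', '(NSString *)regex']
print(fixed_matches)     # ['(NSArray *)', '(NSString *)regex']
```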
This regex should work for you:
(\([^)]*\)(?:(?![^(]*\()[^\s]*|))
Live Demo: http://www.rubular.com/r/opurflXx2E
I see I ended up with much the same answer as @Dukeling. I did however manage to avoid lazy matching.
\([^)]+\)(?:[^\s(]*(?=$|\s))?
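The two alternative patterns above agree on the example from the question, but they can differ on other input. The sketch below (Python is my assumption, and the helper name all_matches and the second sample string are made up for illustration) shows one case: the negative look-ahead (?![^(]*\() drops the trailing text whenever any ( appears later in the input, not only before the next whitespace.

```python
import re

lookahead_version = re.compile(r"(\([^)]*\)(?:(?![^(]*\()[^\s]*|))")
no_lazy_version = re.compile(r"\([^)]+\)(?:[^\s(]*(?=$|\s))?")

def all_matches(pattern, text):
    """Return the full text of every match (group 0), ignoring capture groups."""
    return [m.group(0) for m in pattern.finditer(text)]

signature = "- (NSArray *)componentsForRegularExpression:(NSString *)regex"
spaced = "(a)b (c)"  # hypothetical input: whitespace comes before the next "("

signature_results = (all_matches(lookahead_version, signature),
                     all_matches(no_lazy_version, signature))
spaced_results = (all_matches(lookahead_version, spaced),
                  all_matches(no_lazy_version, spaced))

print(signature_results)  # both: ['(NSArray *)', '(NSString *)regex']
print(spaced_results)     # (['(a)', '(c)'], ['(a)b', '(c)'])
```

If the text after a bracket should be kept whenever whitespace comes first, the second pattern looks like the safer fit.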
Related
I have a string and would like to match a part of it.
The string is Accept: multipart/mixedPrivacy: nonePAI: <sip:4168755400#1.1.1.238>From: <sip:4168755400#1.1.1.238>;tag=5430960946837208_c1b08.2.3.1602135087396.0_1237422_3895152To: <sip:4168755400#1.1.1.238>
I want to match PAI: <sip:4168755400#
The whitespace can be a word, so I would like to use .* but if I use that it matches most of the string.
The example on that link is showing what i'm matching if i use the whitespace instead of .*
(PAI: <sip:)((?:\([2-9]\d{2}\)\ ?|[2-9]\d{2}(?:\-?|\ ?))[2-9]\d{2}[- ]?\d{4})#
The example on that link is showing what i'm trying to achieve with .* but it should only match PAI: <sip:4168755400#
(PAI:.*<sip:)((?:\([2-9]\d{2}\)\ ?|[2-9]\d{2}(?:\-?|\ ?))[2-9]\d{2}[- ]?\d{4})#
I tried lookarounds but failed.
Any ideas?
Thanks
To allow a word where the single space was, use a character class that matches either a space or a word character, and repeat it 1 or more times so that at least one occurrence is required.
Note that you don't have to escape the spaces, and in both places you can use an optional character class matching either a space or a hyphen: [ -]?
If you only need the overall match, you can omit the 2 capturing groups.
(PAI:[ \w]+<sip:)((?:\([2-9]\d{2}\) ?|[2-9]\d{2}[ -]?)[2-9]\d{2}[- ]?\d{4})#
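A quick way to compare the greedy .* attempt with the character-class version is to run both against the string from the question. This is a sketch in Python (my assumption; the thread does not say which engine is used), with the shared phone-number part factored into a variable named number purely for readability.

```python
import re

header = (
    "Accept: multipart/mixedPrivacy: nonePAI: <sip:4168755400#1.1.1.238>"
    "From: <sip:4168755400#1.1.1.238>;tag=5430960946837208_c1b08.2.3.1602135087396.0_1237422_3895152"
    "To: <sip:4168755400#1.1.1.238>"
)

# Shared tail of both patterns: the phone number followed by "#".
number = r"((?:\([2-9]\d{2}\) ?|[2-9]\d{2}[ -]?)[2-9]\d{2}[- ]?\d{4})#"

greedy = re.search(r"(PAI:.*<sip:)" + number, header)
bounded = re.search(r"(PAI:[ \w]+<sip:)" + number, header)

# The greedy version runs on to the last "<sip:" in the string.
print(greedy.group(0).count("<sip:"))  # 3
# The character class cannot cross "<", so the match stops at the first one.
print(bounded.group(0))                # PAI: <sip:4168755400#
print(bounded.group(2))                # 4168755400
```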
Regex demo
The regex should be like
PAI:.*?(<sip:.*?#)
Explanation:
PAI:.*? finds the literal text PAI: followed by anything (.*); the ? makes the quantifier lazy, so it matches as few characters as possible before the next part of the expression can match.
(<sip:.*?#) is the capturing group that holds the result we want.
<sip:.*?# finds <sip: followed by as few characters as possible up to the first #.
Example
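As a small sketch of the lazy version (in Python, which is my assumption; the header here is a shortened copy of the string from the question), group 1 holds the part from <sip: up to the first #. Note that, unlike the longer pattern, this one does not check that the captured text looks like a phone number.

```python
import re

# Shortened copy of the header from the question.
header = ("Accept: multipart/mixedPrivacy: nonePAI: <sip:4168755400#1.1.1.238>"
          "From: <sip:4168755400#1.1.1.238>")

match = re.search(r"PAI:.*?(<sip:.*?#)", header)

print(match.group(0))  # PAI: <sip:4168755400#
print(match.group(1))  # <sip:4168755400#
```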
I am new to regex and playing around with writing regex to match markdown syntaxes, particularly italic text like:
this is markdown with some *italic text*
After writing some naive implementations I found this regex which seems to do the job quite nicely (dealing with edge-cases) and matches the entire string:
(?<!\*)\*([^ ][^*\n]*?)\*(?!\*)
However, I don't want to match the entire string - I only want to match the opening and closing * characters (so that I can apply some special formatting to those characters). How might I go about doing that?
The tricky thing is that I only want to match the * characters when the rest of the string has the correct format of a string in italics (i.e. meets the requirements of the regex above). So a simple regex like (\*|\*) isn't going to cut it.
Apart from using a capturing group for the asterisk at the start and at the end, you can add an asterisk to the first negated character class to prevent matching a double **.
Note that, as pointed out by @toto, you don't really need the capturing groups around the asterisks (\*). You can also just match them and add the replacement characters before and after the single capturing group for the content in the middle.
Adding the asterisk to the first negated character class also means that it must match at least a single character other than an asterisk.
You don't have to make the quantifier on the second character class non-greedy (*?), as it cannot cross the * boundary that follows.
(?<!\*)(\*)([^*\s][^*\r\n]*)(\*)(?!\*)
Regex demo
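To illustrate how the three groups can be used, here is a minimal sketch in Python (my assumption; the question does not name a language). The square brackets in the replacement are just a stand-in for whatever special formatting you want around the asterisks, and the sample sentence with a bold part is made up.

```python
import re

italic = re.compile(r"(?<!\*)(\*)([^*\s][^*\r\n]*)(\*)(?!\*)")

text = "this is markdown with some *italic text* and **bold text**"

# Groups 1 and 3 are the asterisks, group 2 is the content between them.
marked = italic.sub(r"[\1]\2[\3]", text)

print(marked)  # this is markdown with some [*]italic text[*] and **bold text**
```

The double asterisks are left alone because of the lookbehind, the lookahead, and the * in the first negated character class.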
If there also cannot be a space before the closing asterisk, you can repeatedly match a space followed by one or more non-whitespace characters other than an asterisk: (?: [^*\s]+)*
The \r\n in the negated character class prevents the match from crossing line boundaries; newlines are also matched by \s. If you do not want \s to cover newlines, you can replace it with a space, or with a tab and a space.
(?<!\*)(\*)([^*\s]+(?: [^*\s]+)*)(\*)(?!\*)
Regex demo
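The difference between the two patterns only shows up when there is a space right before the closing asterisk. A short sketch in Python (my assumption), with the names loose and strict chosen only for this example:

```python
import re

loose = re.compile(r"(?<!\*)(\*)([^*\s][^*\r\n]*)(\*)(?!\*)")
strict = re.compile(r"(?<!\*)(\*)([^*\s]+(?: [^*\s]+)*)(\*)(?!\*)")

samples = ["*italic text*", "*italic *"]

loose_hits = [bool(loose.search(s)) for s in samples]
strict_hits = [bool(strict.search(s)) for s in samples]

print(loose_hits)   # [True, True]
print(strict_hits)  # [True, False]
```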
Just turn the first and the last \* into capturing groups and you can replace them at will:
(?<!\*)(\*)([^ ][^*\n]*?)(\*)(?!\*)
Demo
EDIT: I've been experimenting, and it seems like putting this:
\(\w{1,12}\s*\)$
works, however, it only allows space at the end of the word.
example,
Matches
(stuff )
(stuff )
Does not
(st uff)
Regexp:
\(\w{1,12}\)
This matches the following:
(stuff)
But not:
(stu ff)
I want to be able to match spaces too.
I've tried putting \s but it just broke the whole thing, nothing would match. I saw one post on here that said to enclose the whole thing in a ^[]*$ with space in there. That only made the regex match everything.
This is for Google Forms validation if that helps. I'm completely new to regex, so go easy on me. I looked up my problem but could not find anything that worked with my regex. (Is it because of the parentheses?)
To match text like (st uff) or (st uff some more), you will need to write your regex like this:
\(\w{1,12}(?:\s+\w{1,12})*\)
Regex explanation:
\( - Literal start parenthesis
\w{1,12} - Match a word of length 1 to 12 like you wanted
(?:\s+\w{1,12})* - A non-capturing group that matches one or more whitespace characters followed by a word of length 1 to 12; the whole group can repeat zero or more times
\) - Literal closing parenthesis
Demo
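A quick sanity check of this pattern, sketched in Python (my assumption; how Google Forms anchors the pattern is not something I have verified, so fullmatch is used here to test each whole string). The sample strings beyond those in the question are made up.

```python
import re

words = re.compile(r"\(\w{1,12}(?:\s+\w{1,12})*\)")

candidates = ["(stuff)", "(st uff)", "(st uff some more)", "(stuff )", "(abcdefghijklm)"]

# fullmatch requires the whole string to fit the pattern.
results = {c: bool(words.fullmatch(c)) for c in candidates}

for candidate, ok in results.items():
    print(candidate, ok)
# (stuff) True
# (st uff) True
# (st uff some more) True
# (stuff ) False        <- space directly before ")" is not allowed yet
# (abcdefghijklm) False <- a single word of 13 characters is too long
```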
Now if you also want to optionally allow spaces just after the opening parenthesis and just before the closing parenthesis, you can place \s* in the regex like this:
\(\s*\w{1,12}(?:\s+\w{1,12})*\s*\)
  ^^^                        ^^^
Demo with optional spaces
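The same kind of check for the variant with optional spaces, again as a Python sketch under the same assumptions (fullmatch to test whole strings, sample inputs made up for illustration):

```python
import re

padded = re.compile(r"\(\s*\w{1,12}(?:\s+\w{1,12})*\s*\)")

padded_results = {
    s: bool(padded.fullmatch(s))
    for s in ["(stuff )", "( st uff )", "( )"]
}

print(padded_results)  # {'(stuff )': True, '( st uff )': True, '( )': False}
```

At least one word is still required, which is why ( ) is rejected.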
If you are trying to match 1 to 12 characters between parentheses:
\([^\)]{1,12}\)
The [^\)] segment is a character class that represents all characters that aren't closing parentheses (^ inverts the class).
If you want some specific characters, like alphanumeric and spaces, group that into the character class instead:
\([\w ]{1,12}\)
Or
\([\w\s]{1,12}\)
If you want 1 to 12 word characters with an arbitrary number of spaces anywhere in between:
\(\s*(?:\w\s*){1,12}\)
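To make the differences between these variants concrete, here is a small Python sketch (my assumption; the variable names and the sample strings other than (st uff) are made up). It tests each whole string with fullmatch.

```python
import re

any_chars = re.compile(r"\([^\)]{1,12}\)")          # anything except ")"
word_or_space = re.compile(r"\([\w ]{1,12}\)")      # word characters and spaces
spaced_words = re.compile(r"\(\s*(?:\w\s*){1,12}\)")  # only word characters count

samples = ["(st uff)", "(st-uff)", "(a b c d e f g h)"]

table = {
    s: (bool(any_chars.fullmatch(s)),
        bool(word_or_space.fullmatch(s)),
        bool(spaced_words.fullmatch(s)))
    for s in samples
}

for sample, row in table.items():
    print(sample, row)
# (st uff) (True, True, True)
# (st-uff) (True, False, False)
# (a b c d e f g h) (False, False, True)   <- 15 characters, but only 8 word characters
```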
I've been trying hard to get this Regex to work, but am simply not good enough at this stuff apparently :(
Regex - Trying to extract sources
I thought this would work... I'm trying to get all of the content where:
It starts with ds://
Ends with either carriage return or line feed
That's it! Essentially I'm going to then do a negative lookahead such that I can remove all content that is NOT conforming to above (in Notepad++) which allows for Regex search/replace.
Search for lines that contain the pattern, and mark them
Search menu > Mark
Find what: ds://.*\R
check Regular expression
Check Mark the lines
Find all
Remove the non marked lines
Search menu > Bookmark
Remove unmarked lines
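If you ever need the same clean-up outside Notepad++, the steps above amount to keeping only the lines that contain ds://. A rough Python equivalent, offered as a sketch only (the function name keep_ds_lines and the sample text are made up; unlike ds://.*\R, it also keeps a final line that has no trailing newline):

```python
import re

def keep_ds_lines(text):
    """Keep only the lines that contain ds://, preserving their line endings."""
    kept = [line for line in text.splitlines(keepends=True)
            if re.search(r"ds://", line)]
    return "".join(kept)

sample = "title\nds://one/path\njunk line\nprefix ds://two\n"
kept_text = keep_ds_lines(sample)

print(repr(kept_text))  # 'ds://one/path\nprefix ds://two\n'
```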
You don't need to add the \w specifier to look for a word after the ds:// in the look ahead. Removing that and altering the final specification from "zero or one carriage return, then zero or one newline" to "either a carriage return or a newline" in a non-capturing group should do it for you:
(?=ds:\/\/).*(?:\r|\n)
Update: Carriage return or Line feed group does not need to be captured.
Update 2: The following regex will actually work for your proposed use case in the comments, matching everything but the pattern you described in the question.
^(?:(?!ds:\/\/.*(?:\r|\n)).)*$
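Here is how the "everything but" pattern behaves when tried in Python with the MULTILINE flag (both are my assumptions; the question is about Notepad++, whose engine may treat line ends differently, and the sample text is made up). Each match is a whole line that does not contain ds:// followed by a line break.

```python
import re

non_matching_line = re.compile(r"^(?:(?!ds:\/\/.*(?:\r|\n)).)*$", re.MULTILINE)

text = "title\nds://one/path\njunk line\nds://two\nfooter"

found = [m.group(0) for m in non_matching_line.finditer(text)]

print(found)  # ['title', 'junk line', 'footer']
```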
Your regex (?=ds:\w+).*\r?\n? does not match because the content contains ds:// and \w does not match a forward slash. To make your regex work you could change it to:
(?=ds://\w+).*\r?\n? which can be shortened to ds://.*\R?
Note that you don't have to escape the forward slash.
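The \w point is easy to confirm with a small test. This sketch uses Python (my assumption) and a made-up sample line; the optional line-break part is left out because Python's re module does not support \R.

```python
import re

line = "ds://server/share/file.txt"  # hypothetical sample line

# \w+ must start right after "ds:", but the next character is "/".
without_slashes = re.search(r"(?=ds:\w+).*", line)
# Including the slashes lets the look-ahead succeed.
with_slashes = re.search(r"(?=ds://\w+).*", line)
# The look-ahead is not needed at all for a plain match.
plain = re.search(r"ds://.*", line)

print(without_slashes)        # None
print(with_slashes.group(0))  # ds://server/share/file.txt
print(plain.group(0))         # ds://server/share/file.txt
```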
If you want to do a find and replace to keep only the lines that contain ds://, you could use a negative lookahead to match (and remove) every other line:
Find what
^(?!.*ds://).*\R?
Replace with
Leave empty
Explanation
^ Start of the string
(?!.*ds://) Negative lookahead to assert the string does not contain ds://
.* Match any character except a newline 0+ times
\R? An optional Unicode newline sequence to also match the last line if it is not followed by a newline
See the Regex demo
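The same find-and-replace can be tried in Python as a sketch. Two assumptions to flag: Python's re module has no \R, so a plain alternation of common line endings stands in for it here, and the MULTILINE flag is needed so that ^ matches at every line start. The sample text and names are made up.

```python
import re

# Stand-in for \R, which Python's re module does not support.
NEWLINE = r"(?:\r\n|\n|\r)"

drop_other_lines = re.compile(r"^(?!.*ds://).*" + NEWLINE + r"?", re.MULTILINE)

text = "title\nds://one/path\njunk line\nds://two\n"

# Replace every line that does not contain ds:// with nothing.
cleaned = drop_other_lines.sub("", text)

print(repr(cleaned))  # 'ds://one/path\nds://two\n'
```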
Here you go, Andrew:
Regex: ds:\/\/.*
Link: https://regex101.com/r/ulO9GO/2
Let me know if you have any questions.
I'm just having trouble figuring out how to regex properly. What I need is to match an asterisk followed by a space followed by any amount of characters that aren't \n. (Similar to reddit list formatting)
Example:
* Test
* Test2
* Test3
The closest I got was this, but it wasn't working.
/^[*][ ](.*?)/s
Can anyone familiar with PCRE help me?
You should not use a lazy dot pattern at the end of the regex, because it will never match a single character: the regex engine skips a lazy pattern at first, and since there is nothing after it that still has to match, .*? ends up matching the empty string.
Use the greedy dot pattern:
^\* (.*)
See the regex demo
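The lazy-versus-greedy behaviour is the same in other backtracking engines, so it can be checked quickly in Python (my assumption; the question is about PCRE). The MULTILINE flag plays the role of the m modifier.

```python
import re

text = "* Test\n* Test2\n* Test3"

# Lazy: nothing follows the group, so it settles for the empty string.
lazy_items = re.findall(r"^\* (.*?)", text, flags=re.MULTILINE)
# Greedy: the group takes the rest of each line.
greedy_items = re.findall(r"^\* (.*)", text, flags=re.MULTILINE)

print(lazy_items)    # ['', '', '']
print(greedy_items)  # ['Test', 'Test2', 'Test3']
```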
Other notes: you may use \h to match any horizontal whitespace instead of the regular space in the pattern. To match the start of each line with ^, use the m modifier. Only use the s modifier if you need . to match any character including a newline (and a carriage return, depending on which PCRE verbs are active).
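The effect of the two modifiers can be seen side by side in a short Python sketch (my assumption; re.MULTILINE and re.DOTALL correspond to m and s). Python's re module has no \h, so [ \t] is used as a rough stand-in that covers only a space and a tab.

```python
import re

text = "* Test\n* Test2\n* Test3"

no_flags = re.findall(r"^\* (.*)", text)
line_mode = re.findall(r"^\* (.*)", text, flags=re.MULTILINE)
dotall_mode = re.findall(r"^\* (.*)", text, flags=re.MULTILINE | re.DOTALL)

# [ \t] as an approximation of \h for a tab-separated item.
tabbed = re.findall(r"^\*[ \t](.*)", "*\tTabbed", flags=re.MULTILINE)

print(no_flags)     # ['Test']
print(line_mode)    # ['Test', 'Test2', 'Test3']
print(dotall_mode)  # ['Test\n* Test2\n* Test3']
print(tabbed)       # ['Tabbed']
```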