Regex get text inside one pair of square brackets - regex

In the text below, how do I get the text inside the first pair of square brackets?
xxxx [I can be any text and even have digits like 0 25 ] [sdfsfsf] [ssf sf565wf]
This is what I tried. But it goes till the last square bracket.
.*\[.*]
What I want selected is
[I can be any text and even have digits like 0 25 ]

If you don't want to go past the closing square bracket, use [^\]]* in place of .*:
^[^\[]*(\[[^\]]*])
The ^ anchor at the beginning ties the match to the start of a line; enable multiline mode if you would like to search multiple lines.
Add a capturing group around the square brackets, and get the content of that group to obtain the text that you need.
Demo.
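As a rough illustration, here is how that pattern behaves in Python's re module (the choice of Python and the variable names are assumptions for this sketch, not part of the original answer):

```python
import re

text = "xxxx [I can be any text and even have digits like 0 25 ] [sdfsfsf] [ssf sf565wf]"

# Skip everything that is not "[", then capture the first "[...]" pair.
pattern = re.compile(r"^[^\[]*(\[[^\]]*])")

match = pattern.search(text)
# Group 1 holds the bracket pair itself; group 0 would also include the "xxxx " prefix.
first_pair = match.group(1) if match else None
print(first_pair)
```

Reading group 1 rather than the whole match is what drops the leading text before the first bracket.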

Another one with DEMO. A bit complicated though:
(\[[^\]]+\])[^\[\]]*(?:\[[^\]].*\])
EXPLANATION
(\[[^\]]+\]) #capturing group
#match first [] pair
[^\[\]]* #match characters except ] and [
(?:\[[^\]].*\]) #non-capturing group
#match all the rest [] pairs
#this is a greedy match

* and + are 'greedy' by default, so they try to match as much text as possible. If you want them to match as little as possible, you can make them non-greedy with ?, e.g. .*\[.*?\]. Also, the .* at the beginning matches any number of any characters before the opening square bracket, and it is greedy too, so this regex still matches everything from the start of the line up to a ']' (in practice the one that closes the last pair). If you only want to match the brackets and their contents, you simply want \[.*?\].
Non-greedy modifiers with ? are not supported in all regex engines; if it's available to you you should write it with ? because it makes your intent clearer, but if you are using a simpler regex engine you can achieve the same effect by using \[[^\]]*\] instead. This is a negated character class, which matches as many as possible of any character except ']'.
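A small sketch comparing the three variants discussed above, using Python's re module (the sample text and variable names are made up for illustration):

```python
import re

text = "xxxx [first] [second] [third]"

# Greedy: .* runs to the last "]" on the line.
greedy = re.search(r"\[.*\]", text).group(0)

# Lazy: .*? stops at the first "]" it can.
lazy = re.search(r"\[.*?\]", text).group(0)

# Negated class: can never step over a "]", so no lazy quantifier is needed.
negated = re.search(r"\[[^\]]*\]", text).group(0)

print(greedy)
print(lazy)
print(negated)
```

On this input the lazy and negated-class forms agree, which is why the negated class works as a fallback in engines without lazy quantifiers.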

Related

Deleted everything before the dot

How can I use regex in notepad++ to make a query like this:
I have a list with subdomains containing three words such as
web1.com
test.web2.com
www.test.web3.com
I want to filter so that only three words remain and something like this comes out:
web1.com
test.web2.com
test.web3.com
I was able to delete so that only the domain remains, but this is not what I want
^(?:.+\.)?([^.\r\n]+\.[^.\r\n]+)$
One idea is to match lazily up to the point where the final three-part name starts, and capture that part.
^.*?\.([\w-]+\.[\w-]+\.[\w-]+)$
Replace with $1 (what was captured by the first group)
.*? matches lazily any amount of any characters (besides newline)
[\w-]+ char-class matches one or more word characters and hyphen
See this demo at regex101 (more explanation on the right side)
In Notepad++ be sure to have unchecked: [ ] dot matches newline
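For anyone scripting this outside Notepad++, a minimal sketch of the same capture-and-replace idea in Python (an assumption of this example; note that Python writes the backreference as \1 rather than $1, and re.MULTILINE is needed so ^ and $ work per line):

```python
import re

hosts = "web1.com\ntest.web2.com\nwww.test.web3.com"

# Lazily skip a prefix ending in a dot, then capture the last three labels.
pattern = re.compile(r"^.*?\.([\w-]+\.[\w-]+\.[\w-]+)$", re.MULTILINE)

# Lines with three labels or fewer do not match and are left untouched.
trimmed = pattern.sub(r"\1", hosts)
print(trimmed)
```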
Another take at it using a positive lookahead to assert the 3 "words" to the right, allowing for non whitespace chars excluding a dot using [^\s.]
In the replacement use an empty string.
^\S+?\.(?=[^\s.]+\.[^\s.]+\.[^\s.]+$)
See a regex demo.
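The lookahead variant can be tried the same way. This is a sketch in Python (not Notepad++), with made-up sample hosts; the prefix is matched and replaced with an empty string, while the lookahead only checks that three labels follow:

```python
import re

hosts = "web1.com\ntest.web2.com\nwww.test.web3.com\na.b.test.web4.com"

# Match the shortest prefix ending in a dot that leaves exactly three labels.
pattern = re.compile(r"^\S+?\.(?=[^\s.]+\.[^\s.]+\.[^\s.]+$)", re.MULTILINE)

# The lookahead consumes nothing, so only the prefix is removed.
trimmed = pattern.sub("", hosts)
print(trimmed)
```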

Notepad++ regex extract two options

I've a list below:
7080508136242611718:7080508978035787525:7549dda86ba9af19:31050:install_id=7080508978035787525; store-country-code=us; store-idc=useast5; ttreq=1$fd2f36282a10633c5638a02cc54c19ff13f60755; passport_csrf_token=13bf74c4e5fe04307f0a99de9aed53f9; passport_csrf_token_default=13bf74c4e5fe04307f0a99de9aed53f9; odin_tt=11ed1b48fba2d7a9fe3d86929b3d52cebbad0ca7f7dbd127e220cfb3be279621ba04487517b536050a6ded9fbe50e300cd11615e2e9551523478e5484896a9dda800e55e428842872fcf862e8c57d439:1648559503:351451268482810:3f:49:8c:b7:8c:cb:c5379d41-6cf3-4152-9d48-7aa45f7f611c:79375640-197c-4aaa-86cf-4ef8e7238be2:1:AgICAw0AFockF-RPsNA-7qeIMtk5-CKdkW2eP4TZYMDY7A
7080507996291827206:7080508977079666438:6742591cc0d20580:31050:install_id=7080508977079666438; store-country-code=us; store-idc=useast5; ttreq=1$a119611bfe79541b0b4c029fe910b6507123eec2; passport_csrf_token=fb42bbd472462c17f45acb531deb057a; passport_csrf_token_default=fb42bbd472462c17f45acb531deb057a; odin_tt=6c3b06ff01fd67f42e3dccb60a1e69ca67cb8654f49662017acc209f7176517bcd13a374311f7a1b3538e6407fb237267abf43578d3180d8c834e7df886fa4377a9b950dbb6ff146e3fabf37158dcfa8:1648559508:351451233766930:dd:9e:82:59:5f:7f:596da881-89e8-4f60-b644-5fef23f0a422:f04adc87-56de-4191-a25f-843bec1d5818:1:AgICAw0AFockF-RPsNA-7qeIMtk5-CKdsYPWv4TZYMDY7A
7080509102451394054:7080509820378072837:e36dc9aceecfc1cc:31050:install_id=7080509820378072837; store-country-code=us; store-idc=useast5; ttreq=1$d94700921d5ee2b21992910a2a4e84dd0ade1ec8; passport_csrf_token=2d4f4eca772dbfcbb37548ff02da3166; passport_csrf_token_default=2d4f4eca772dbfcbb37548ff02da3166; odin_tt=53d6999ebe29c0d5144a9669331ce3307a290891370914dabadbfa0520114e6e76b9103c9a6db5476e139251ee478f3a305577a89e3fa07288b7aca00774d3fccbd03566687dbcfdce31700065295939:1648559700:351451299637010:71:de:41:2b:ad:b4:1eba1ae9-3216-40e1-be7f-00303e524c27:2713cbd3-7a4f-493e-b76f-ac6d56ab8045:5:AgMNAgIAhyQWF-RPsNA-7qeIMtk5-CKcsBcWP4TZYMDY7w
7080509086894851590:7080509909225604870:98be64e38551984d:31050:install_id=7080509909225604870; store-country-code=us; store-idc=useast5; ttreq=1$05929375d8605739d8ebdbb5ce15eb406da5c467; passport_csrf_token=c95c71ad206a1d371e5b67505ae25be8; passport_csrf_token_default=c95c71ad206a1d371e5b67505ae25be8; odin_tt=6ddaa02f6133e61a4c591ef2a872f0ec2339d8b6a3fc480575fe279b13ded615e1fa7de979e18565f3ac8b8229a19a98bdf79aa1804071dcc025e1a4cd5314522cf40a62ca961770baea1d5d653d6d64:1648559720:351451292934660:9d:cf:c3:92:f6:f5:787dfb42-f4bf-43fa-9c64-ded19a1b1660:366c3024-217d-4f85-90dd-d95a0fd3e296:4:AgICAw0AFockF-RPsNA-7qeIMtk5-CKcs7bUP4TZYMDY7w
7080509183397299718:7080509974838085382:f39db5d314071713:31050:install_id=7080509974838085382; store-country-code=us; store-idc=useast5; ttreq=1$561ee2083cb13f0849a9f09e7f89edfe08c7ce6c; passport_csrf_token=721a8fee6f4f97c16ed1923ad3bbc72d; passport_csrf_token_default=721a8fee6f4f97c16ed1923ad3bbc72d;
I'd like to extract the first two options, i.e. the following:
7080508136242611718:7080508978035787525
7080507996291827206:7080508977079666438
7080509102451394054:7080509820378072837
7080509086894851590:7080509909225604870
7080509183397299718:7080509974838085382
I've tried *.: but it removes the rest of the text and keeps only the first field.
I've tried ^.*[0-9]+.*$ to get the second one, but with no success.
Hopefully somebody can help me with accurate regex.
Thank you in advance.
This pattern *.: by itself is not a valid regex, and this pattern ^.*[0-9]+.*$ matches the whole string with at least a single digit.
If you want to match the digits and : you could make use of \K to forget what is matched so far and then match the rest of the line.
In the replacement use an empty string.
^\d+:\d+\K.*
^ Start of string
\d+:\d+ Match 1+ digits with : in between
\K.* Clear the current match, and match the rest of the line
Regex demo
^[^:]*:[^:]*\K.*
When matching things with delimiters I will use a negated character set to match the contents. In this case, the delimiter is a colon, so I want to match everything that isn't a colon until there's a colon. Then I want to match everything that isn't a colon. This will match everything up until the second colon. Because I want to keep what I just matched, I put \K at that point, which resets the start of the match, and then use .* to match everything else.
That pattern can be replaced with nothing, and the result is the first two columns of each line left.
You can use
Find: ^(\d+:\d+).*
Replace: $1
See this regex demo online.
The ^(\d+:\d+).* regex matches and captures into Group 1 one or more digits + : + one or more digits (with (\d+:\d+)) at the beginning of a line (^) and then matches the rest of the line (with .*).
The $1 replacement replaces the match with the Group 1 value.
As an alternative, if there are chars other than digits you can also use
^([^:\v]+:[^:\v]+).*
where [^:\v]+ matches one or more chars other than a colon and any vertical whitespace.
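If the same clean-up has to be done in a script, here is a hedged sketch in Python. Python's built-in re module does not support \K, so the capture-group form is used; [^:\r\n] stands in for [^:\v], since \v in Python's re means only the vertical tab character. The two input lines are shortened stand-ins for the real data, not actual records:

```python
import re

# Shortened, hypothetical stand-ins for the long colon-separated lines.
lines = (
    "111:222:abc:31050:install_id=222; store-country-code=us\n"
    "333:444:def:31050:install_id=444; store-country-code=us"
)

# Digits-only version: keep "digits:digits", drop the rest of the line.
digits_only = re.compile(r"^(\d+:\d+).*", re.MULTILINE)

# More permissive version: any characters except a colon or a line break.
any_field = re.compile(r"^([^:\r\n]+:[^:\r\n]+).*", re.MULTILINE)

first_two = digits_only.sub(r"\1", lines)
first_two_any = any_field.sub(r"\1", lines)
print(first_two)
```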

How to allow spaces in between words?

EDIT: I've been experimenting, and it seems like putting this:
\(\w{1,12}\s*\)$
works, however, it only allows space at the end of the word.
example,
Matches
(stuff )
(stuff )
Does not
(st uff)
Regexp:
\(\w{1,12}\)
This matches the following:
(stuff)
But not:
(stu ff)
I want to be able to match spaces too.
I've tried putting \s but it just broke the whole thing, nothing would match. I saw one post on here that said to enclose the whole thing in a ^[]*$ with space in there. That only made the regex match everything.
This is for Google Forms validation, if that helps. I'm completely new to regex, so go easy on me. I looked up my problem but could not find anything that worked with my regex. (Is it because of the parentheses?)
For matching text like (st uff) or (st uff some more) you will need to write your regex like this,
\(\w{1,12}(?:\s+\w{1,12})*\)
Regex explanation:
\( - Literal start parenthesis
\w{1,12} - Match a word of length 1 to 12 like you wanted
(?:\s+\w{1,12})* - Matches one or more spaces followed by a word of length 1 to 12, with the whole group repeating zero or more times
\) - Literal closing parenthesis
Demo
Now if you want to optionally also allow spaces just after starting parenthesis and ending parenthesis, you can just place \s* in the regex like this,
\(\s*\w{1,12}(?:\s+\w{1,12})*\s*\)
  ^^^                        ^^^
Demo with optional spaces
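A quick way to check both patterns against the examples from the question is a small Python script. This is only a sketch: Google Forms uses its own regex engine, and using fullmatch here (so the whole input must match) is an assumption about how the validation is meant to behave:

```python
import re

strict = re.compile(r"\(\w{1,12}(?:\s+\w{1,12})*\)")
padded = re.compile(r"\(\s*\w{1,12}(?:\s+\w{1,12})*\s*\)")

samples = ["(stuff)", "(st uff)", "(st uff some more)", "(stuff )", "( stuff )", "()"]

# Map each sample to whether the whole string matches the pattern.
strict_ok = {s: bool(strict.fullmatch(s)) for s in samples}
padded_ok = {s: bool(padded.fullmatch(s)) for s in samples}

for s in samples:
    print(s, strict_ok[s], padded_ok[s])
```

The strict pattern rejects "(stuff )" because a space must be followed by another word, while the padded pattern accepts it.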
If you are trying to get up to 12 characters between parentheses:
\([^\)]{1,12}\)
The [^\)] segment is a character class that represents all characters that aren't closing parentheses (^ inverts the class).
If you want some specific characters, like alphanumeric and spaces, group that into the character class instead:
\([\w ]{1,12}\)
Or
\([\w\s]{1,12}\)
If you want up to 12 word characters with an arbitrary number of spaces anywhere in between:
\(\s*(?:\w\s*){1,12}\)
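To see how these variants differ, here is a small Python sketch (the sample strings are invented for illustration, and fullmatch is used so the whole input has to match):

```python
import re

any_chars = re.compile(r"\([^\)]{1,12}\)")             # 1-12 of anything but ")"
word_or_space = re.compile(r"\([\w ]{1,12}\)")         # 1-12 word chars or spaces
words_with_gaps = re.compile(r"\(\s*(?:\w\s*){1,12}\)")  # 1-12 word chars, spaces not counted

def check(sample: str) -> tuple[bool, bool, bool]:
    """Return whether each of the three patterns matches the whole sample."""
    return (
        bool(any_chars.fullmatch(sample)),
        bool(word_or_space.fullmatch(sample)),
        bool(words_with_gaps.fullmatch(sample)),
    )

short = check("(stu ff)")            # 6 characters inside
hyphen = check("(st-uff)")           # contains a non-word character
spread = check("(a b c d e f g h)")  # 15 characters inside, only 8 word chars

print(short, hyphen, spread)
```

The last sample shows the difference in what is being counted: only the third pattern ignores the spaces when applying the limit of 12.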

Match same string twice within certain characters

I need to write a regex that matches patterns like this:
[[string|string]]
It's the same string twice within that specific syntax (I don't want to match the brackets themselves). I managed to come up with this:
(?<=\[\[)(.*)(?=\|)\|\1\]\]
However, it's not matching for some reason and I don't understand where's my mistake.
UPDATE: Turns out it wasn't working because my code was dirty and there were some ● characters in the first string, so both strings weren't equal: https://regexr.com/3n7ni
Removing those extraneous characters made the regex match, although it still needed tweaks (like not matching the closure brackets): https://regexr.com/3n7o7
See regex in use here
\[{2}([^|\]]+)\|\1]{2}
\[{2} Matches [ literally, twice
([^|\]]+) Captures one or more of any character except | or ] into capture group 1
\| Matches | literally
\1 Matches the text most recently captured into capture group 1
]{2} Matches ] literally, twice
To match the full pattern you can update your regex to include the first 2 brackets:
\[\[(.*)\|\1\]\]
I think you could also do without this positive lookahead (?=\|).
Your problem is the use of a greedy match (.*) (consume as much as possible). You should be using a reluctant match (.*?) (consume as little as possible):
\[\[(.*?)\|\1\]\]
See live demo.
Note that your look ahead (?=\|) is useless.
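A short Python sketch of the backreference approach, trying both the lazy pattern and the negated-class pattern from the answers above (the sample text is made up; on this input the two behave the same):

```python
import re

text = "[[alpha|alpha]] [[beta|gamma]] [[delta|delta]]"

lazy = re.compile(r"\[\[(.*?)\|\1\]\]")
negated = re.compile(r"\[{2}([^|\]]+)\|\1]{2}")

# findall returns group 1, i.e. the repeated string without the brackets.
lazy_hits = lazy.findall(text)
negated_hits = negated.findall(text)

print(lazy_hits)
print(negated_hits)
```

The pair [[beta|gamma]] is skipped because \1 must repeat exactly what group 1 captured.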

Regex match for text

I am trying to create a regex to match the content between numbered lists, e.g. with the following content:
1) Text for part 1
2) Text for part 2
3) Text for part 3
The following PCRE should work, assuming you haven't got anything formatted like "1)" inside the sections:
\d+\)\s*(.*?)\s*(?=\d+\)|$)
Explanation:
\d+\) gives a number followed by a ).
\s* matches the leading whitespace.
(.*?) captures the contents non-greedily.
\s* matches the trailing whitespace.
(?=\d+\)|$) ensures that the match is followed by either the start of a new section or the end of the text.
Note, it doesn't enforce that they must be ascending or anything like that, so it'd match the following text as well:
4) Hello there 1) How are you? 5) Good.
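As a rough check, the pattern can be run in Python's re module, which accepts this syntax as well (using Python rather than PCRE is an assumption of this sketch):

```python
import re

section = re.compile(r"\d+\)\s*(.*?)\s*(?=\d+\)|$)")

listing = "1) Text for part 1\n2) Text for part 2\n3) Text for part 3"
unordered = "4) Hello there 1) How are you? 5) Good."

# findall returns group 1: the item text without number or surrounding whitespace.
parts = section.findall(listing)
unordered_parts = section.findall(unordered)

print(parts)
print(unordered_parts)
```

Without a flag such as re.DOTALL, the dot does not cross line breaks, so an item whose text itself wraps over several lines would likely need that flag.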
I'd suggest the following (PCRE):
(?:\d+\)\s*(.*?))*$
The inner part \d+\)\s* matches the list number and the closing parenthesis, followed by optional whitespace.
(.*?) matches the list text, but in a non-greedy manner (otherwise, it would also match the next list item).
The enclosing (?: )*$ then matches the above zero or more times, until the end of the input.
Keep in mind that the text after the number and parenthesis might be anything. This would find your substrings:
\d\).+?(?=\d\)|$)
EDIT:
To get rid of whitespace and return only text without a number, get group 1 from the following match:
\d\)\s*(.+?)(?=\d\)|$)
To get number in group(1) and text in group(2) use this:
(\d)\)\s*(.+?)(?=\d\)|$)
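A hedged Python sketch of the two-group idea. The pattern below is a variant written for this example, not the exact one above: it uses \d+ so multi-digit numbers work, and \s* on both sides of the text group so that surrounding whitespace is not captured:

```python
import re

# Variant pattern (an assumption of this sketch): number in group 1, text in group 2.
item = re.compile(r"(\d+)\)\s*(.+?)\s*(?=\d+\)|$)")

listing = "1) Text for part 1 2) Text for part 2"

# With two groups, findall returns (number, text) tuples.
pairs = item.findall(listing)
print(pairs)
```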