How to match only one group

I am using regex in the program Octoparse, and I need to match only #67 in text like "#67 in MIDI Controller", or #30 in "#30 in DJ Mixer". I don't need the #, but I don't mind if it is included. Since the text won't always end with Controller or Mixer, I can't use those words as an end marker.
Can I somehow group them and then choose which group to match? I only know basic regex, so this is a bit hard for me. I saw that I can use <\1>, but it doesn't work.
In the program, I can't remove the global flag.

As far as I understand your task, it doesn't matter to you whether you get the first occurrence or the last, so you can do this:
(?!(.|\s)*?#\d+(.|\s)*?)(?<=#)(.+?)[\w\s].+
https://regex101.com/r/7OSJ7p/1
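Octoparse's own regex flavour isn't something I can check here, so the following is only a sketch in Python's re module, assuming a text that contains several #number rankings (the sample string TEXT is hypothetical). It shows two simpler options: taking the digits after the first # from a capturing group, or matching only the digits after the last # with a lookbehind plus a negative lookahead, which yields a single match even when the search is global.

```python
import re

# Hypothetical sample: two rankings, one per line.
TEXT = "#12 in Musical Instruments\n#67 in MIDI Controllers"

def first_rank(text):
    """Digits after the first '#', read from capturing group 1."""
    m = re.search(r"#(\d+)", text)
    return m.group(1) if m else None

def last_rank(text):
    """Digits after the last '#'.

    The lookbehind (?<=#) keeps the '#' out of the match, and the negative
    lookahead rejects any number that is followed (anywhere, thanks to
    re.DOTALL) by another '#<digit>'. findall() is global, yet only one
    number can satisfy both assertions.
    """
    return re.findall(r"(?<=#)\d+(?!.*#\d)", text, flags=re.DOTALL)

print(first_rank(TEXT))  # 12
print(last_rank(TEXT))   # ['67']
```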

Related

What is the correct regex pattern to use to clean up Google links in Vim?

As you know, Google links can be pretty unwieldy:
https://www.google.com/search?q=some+search+here&source=hp&newwindow=1&ei=A_23ssOllsUx&oq=some+se....
I have MANY Google links saved that I would like to clean up to make them look like so:
https://www.google.com/search?q=some+search+here
The only issue is that I cannot figure out the correct regex pattern for Vim to do this.
I figure it must be something like this:
:%s/&source=[^&].*//
:%s/&source=[^&].*[^&]//
:%s/&source=.*[^&]//
But none of these are working; they start at &source, and replace until the end of the line.
Also, the search?q=some+search+here can appear anywhere after the .com/, so I cannot rely on it being in the same place every time.
So, what is the correct Vim regex pattern to use in order to clean up these links?
Your example can easily be dealt with by using a very simple pattern:
:%s/&.*
because you want to keep everything that comes before the second parameter, which is marked by the first & in the string.
But, if the q parameter can be anywhere in the query string, as in:
https://www.google.com/search?source=hp&newwindow=1&q=some+search+here&ei=A_23ssOllsUx&oq=some+se....
then no amount of capturing or whatnot will be enough to cover every possible case with a single pattern, let alone a readable one. At this point, scripting is really the only reasonable approach, preferably with a language that understands URLs.
--- EDIT ---
Hmm, scratch that. The following seems to work across the board:
:%s#^\(https://www.google.com/search?\)\(.*\)\(q=.\{-}\)&.*#\1\3
We use # as separator because of the many / in a typical URL.
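We capture a first group, up to and including the ? that marks the beginning of the query string (in Vim's default magic mode, a bare ? is a literal character).
We capture whatever comes between the ? and q= in a second group, which is simply left out of the replacement. Because .* is greedy, this reaches the last q= that is followed by an &, so if an oq= parameter is followed by another parameter, its value is kept instead.
We capture a third group, the q parameter, up to and excluding the next & (.\{-} is Vim's non-greedy counterpart of .*).
We replace the whole thing with the first capture group followed by the third capture group.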
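For the scripting route mentioned in the first part of this answer (a language that understands URLs), here is a minimal Python sketch using the standard urllib.parse module. The function name clean_google_link is hypothetical, and the sample link is a shortened version of the one in the question. It keeps only q, wherever it sits in the query string, and drops every other parameter along with any fragment.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

def clean_google_link(url):
    """Keep only the q parameter of a search URL, wherever it appears."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)   # decodes '+' back to spaces
    if "q" not in params:
        return url                   # nothing to keep: leave the link as-is
    query = urlencode({"q": params["q"][0]})  # re-encodes spaces as '+'
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))

link = ("https://www.google.com/search?source=hp&newwindow=1"
        "&q=some+search+here&ei=A_23ssOllsUx&oq=some+se")
print(clean_google_link(link))  # https://www.google.com/search?q=some+search+here
```

Unlike the single Vim pattern, this does not care whether q comes first, last or in between, and an oq= parameter can never be confused with q=.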

Extract only the text field needed

I'm just starting to learn regex, and I use every opportunity to understand how it works. Currently I am trying to extract dates from a text file (actually a .vnt file from my mobile phone). It looks like this:
BEGIN:VNOTE
VERSION:1.1
BODY;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:18.07.=0A14.08.=0A15.09.=0A15.10.=
=0A13.11.=0A13.12.=0A12.01.=0A03.02. Grippe=0A06.03.=0A04.04.2015=0A0=
5.05.2015=0A03.06.2015=0A03.07.2015=0A02.08.2015=0A30.08.2015=0A28.09=
17.11.2017=0A
DCREATED:20171118T095601
X-IRMC-LUID:150
END:VNOTE
I want to extract all dates, so that the final list looks like this:
18.07.
14.08.
15.09.
15.10.
and so on. If a date also has a year, the year should be included as well.
I have almost figured out how to detect the dates with the following regex:
.+(\d\d\.\d\d\.(2015|2016|2017)?).+
But it detects only a few of the dates. The result is this:
BEGIN:VNOTE
VERSION:1.1
15.10.
04.04.2015
30.08.2015
24.01.2016
DCREATED:20171118T075601
X-IRMC-LUID:150
END:VNOTE
Then I added a question mark, which, as far as I read in tutorials, makes the .+ non-greedy. The regex then looks like this:
.+?(\d\d\.\d\d\.(2015|2016|2017)?).+?
But the result is still not what I am looking for:
BEGIN:VNOTE
VERSION:1.1
21.03.20.04.18.05.18.06.18.07.14.08.15.09.15.10.
13.11.13.12.12.01.03.02.06.03.04.04.20150A0=
03.06.201503.07.201502.08.201530.08.20150A28.09=
28.10.201525.11.201528.12.201524.01.20160A
DCREATED:20171118T075601
X-IRMC-LUID:150
END:VNOTE
I'm pretty sure this is very easy to solve for someone who is familiar with regex, but I don't get it; it's very confusing when you are new to regex. I tried to find a hint in tutorials and Stack Overflow posts, but all I found was this: Notepad++ how to extract only the text field which is needed?
But it doesn't work for me. I assume it might have something to do with the fact that my text file is not one single line.
I have my example on regex101 too.
I would be very thankful if someone could give me a hint about what else I can try.
Edit: I would like to detect the dates with the regex and, as a result, have a list with only the dates (maybe this is called substitution?).
Edit 2: Sorry for not mentioning it earlier: I just want to use the regex in, e.g., Notepad++ or an online regex test website, simply to get the dates and save the result in a new .txt file. I don't want to use the regex in a programming language. My apologies for not being precise before.
Edit 3: The result should be a list of the dates, with each date on its own line:
18.07.
14.08.
15.09.
15.10.
I suggest this pattern:
(?:.*?|\G)(\d\d\.\d\d\.(?:\d{4})?)
This uses the \G anchor, which matches at the position where the previous match ended, together with the lazy .*?, which consumes the text in front of each date. Replacing every match with the captured date therefore removes everything except what's wanted.
If you also want to remove the leftover text that contains no date, add |.* at the end:
(?:.*?|\G)(\d\d\.\d\d\.(?:\d{4})?)|.*
In Notepad++, make sure the Regular expression search mode is selected and that the cursor is at the beginning of the file. On the sample, I ran the replacement and then undid it, only to confirm that matches were found: Notepad++ reported 16 replacements.
You can try using the following pattern:
\d{2}\.\d{2}\.(?:\d{4})?
This will match day.month dates of the form 18.07., and it also allows such a date to be followed by a four-digit year, e.g. 18.07.2017. While it would be nice to make the pattern more restrictive to avoid false positives, I do not see anything obvious that could be added to it.
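To check the \d{2}\.\d{2}\.(?:\d{4})? pattern outside Notepad++, here is a small Python sketch (the names BODY and extract_dates are hypothetical). It assumes the = at the end of each BODY line is a quoted-printable soft line break, as ENCODING=QUOTED-PRINTABLE indicates, and removes those breaks first; otherwise a date split across two lines, such as 05.05.2015 (0= on one line, 5.05.2015 on the next), is missed. BODY is only an excerpt of the question's sample, so the cut-off 28.09 at its end is not found.

```python
import re

# Excerpt of the BODY lines from the question. The "=" at the end of each
# line is assumed to be a quoted-printable soft line break.
BODY = (
    "BODY;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:18.07.=0A14.08.=0A15.09.=0A15.10.=\n"
    "=0A13.11.=0A13.12.=0A12.01.=0A03.02. Grippe=0A06.03.=0A04.04.2015=0A0=\n"
    "5.05.2015=0A03.06.2015=0A03.07.2015=0A02.08.2015=0A30.08.2015=0A28.09="
)

# day.month. with an optional four-digit year; the group is non-capturing,
# so findall() returns the whole match.
DATE = re.compile(r"\d{2}\.\d{2}\.(?:\d{4})?")

def extract_dates(text):
    # Join soft line breaks first so dates split across lines are whole again.
    joined = text.replace("=\n", "")
    return DATE.findall(joined)

# One date per line, as requested in Edit 3.
print("\n".join(extract_dates(BODY)))
```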

Get only first match in Regex

Given this string: hello"C07","73" (quotes included), I want to get "C07". I'm using (?:hello)|(?<=")(?<screen>[a-zA-Z0-9]+)?(?=") to try to do this. However, it consistently matches "73" as well. I've tried ...0-9]+){1}..., but that doesn't work either. I must be misunderstanding how this is supposed to work, but I can't figure out any other way.
How can I get just the first set of characters between quotes?
EDIT: Here's a link to show my problem.
EDIT: Ok, here's exactly what I'm trying to do:
Basically, what I'm trying to get is this: 1) a positive match on "hello", 2) a group named "screen" with, in this case, "C07" in it and 3) a group named "format" with, in this case, "73" in it.
Both the "C07" and "73" will vary. "hello" will always be the same. There may or may not be an extra comma between "hello" and the first double-quote.
For your initial question of how to stop after the first match: either removing the global flag or anchoring the search to the start of the string would accomplish that.
For the second question, you can name your groups and just keep extending the pattern along the line(s). The ,? allows for the optional comma between hello and the first double quote:
hello,?"(?<screen>[^"]+)","(?<format>[^"]+)"
Demo: http://regex101.com/r/PBXe8l/1
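Below is a Python sketch of both ideas, using Python's re module as a stand-in for whatever engine the question targets. Python writes named groups as (?P<name>...), and search() returns only the first match, which is the non-global behaviour described above. The helper names parse and first_quoted are hypothetical.

```python
import re

# Named groups in Python use (?P<name>...); ",?" allows the optional comma.
PATTERN = re.compile(r'hello,?"(?P<screen>[^"]+)","(?P<format>[^"]+)"')

def parse(line):
    """Return (screen, format) from the first match, or None."""
    m = PATTERN.search(line)  # search() stops at the first match
    if m is None:
        return None
    return m.group("screen"), m.group("format")

def first_quoted(line):
    """Return only the first "..." value in the line."""
    m = re.search(r'"([^"]+)"', line)
    return m.group(1) if m else None

print(parse('hello"C07","73"'))         # ('C07', '73')
print(parse('hello,"C07","73"'))        # ('C07', '73')
print(first_quoted('hello"C07","73"'))  # C07
```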
Based on your regex example, why not:
^(?:hello)"([a-zA-Z\d]+)"

Retrieve characters after nth occurrence of an another with Regex

I'm writing a simple bot that broadcasts messages to clients based on messages from a server. This will be done in JavaScript but I am trying to understand Regex. I've been Googling for the past hour and I've come so close but I am simply unable to solve this one.
Basically I need to retrieve everything between the second / and the first [. It sounds really simple but I cannot figure out how to do this.
Here's a sample line:
192.168.1.1:33291/76561198014386231/testName joined [linux/76561198014386231]
Here's the Regex I've come up with:
\/(.*?)\[
I've found lots of similar questions on Stack Overflow, but most of them seem specific to a particular language or end up being too complex for me to whittle down.
I know this is a simple one, but I am totally stumped.
Instead of .*?, you could match everything except a forward slash with [^\/]*. Making it lazy, [^\/]*?, leaves the whitespace before the [ to \s* instead of capturing it:
([^\/]*?)\s*\[
If it needs to be specifically after the second slash, i.e. the content between the second slash and the square bracket may itself contain slashes, then you could do:
(?:.*?\/){2}(.*?)\s*\[
The lazy (.*?) stops at the first [, and \s* keeps the whitespace before it out of the captured group. Remove the \s* if you want that whitespace included.
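Since the bot itself will be JavaScript, the patterns above escape / as \/, as they would appear inside a /.../ literal. The Python sketch below is just a stand-in for checking them, and there the slash needs no escaping. The second sample line, with a slash inside the name, is made up to show where the two patterns differ.

```python
import re

LINE = "192.168.1.1:33291/76561198014386231/testName joined [linux/76561198014386231]"
# Hypothetical line whose name contains a slash.
SLASHED = "1.2.3.4:1/765/a/b joined [linux/7]"

# No slash allowed in the capture; the lazy *? leaves the space to \s*.
NO_SLASH = re.compile(r"([^/]*?)\s*\[")
# Skip two slash-terminated chunks, then capture lazily up to the first "[".
AFTER_SECOND_SLASH = re.compile(r"(?:.*?/){2}(.*?)\s*\[")

for line in (LINE, SLASHED):
    print(NO_SLASH.search(line).group(1), "|", AFTER_SECOND_SLASH.search(line).group(1))
```

On LINE both patterns yield testName joined; on SLASHED the first yields only b joined, while the second keeps a/b joined.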

Extract text between two given strings

Hopefully someone can help me out; I've been all over Google.
I'm doing zone OCR on some documents and want to extract some text with regex. It always looks like this:
"Til: Name Name Name org.nr 12323123".
I want to extract the name part, which can be 1 to 4 names; "Til:" always comes before it and "org.nr" always comes after it.
Anyone?
If you can't use capturing groups (check your documentation), you can try this:
(?<=Til:).*?(?=org\.nr)
This solution uses lookbehind and lookahead assertions, which are not supported by every regex flavour. Where they are supported, this regex returns only the part you want: the text inside the assertions is not part of the match, because the engine only checks that those patterns are present.
Use the pattern:
Til:(.*)org\.nr
Then take the first capturing group (group 1; group 0 is the whole match) to get the content between the parentheses.
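Here is a quick Python check of both answers on the sample line (the variable names are hypothetical). Both versions keep the spaces around the names, so a .strip() is usually wanted afterwards.

```python
import re

TEXT = "Til: Name Name Name org.nr 12323123"

# Lookaround answer: the assertions are not part of the match, so group(0)
# is just the text between "Til:" and "org.nr".
look = re.search(r"(?<=Til:).*?(?=org\.nr)", TEXT).group(0)

# Capturing-group answer: group(0) is the whole match, group(1) the names.
grp = re.search(r"Til:(.*)org\.nr", TEXT).group(1)

print(repr(look))   # ' Name Name Name '
print(grp.strip())  # Name Name Name
```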