Regex: cut a string after the fourth comma

I have to cut a string after the fourth comma and I cannot seem to find a good way.
Sample String:
"String1","String2","String3","String4","String5"
I can use a regex pattern like this:
".?",".?",".?",".?,".?(.?)"
and select capture group 1.
But the commas can appear in several different positions.
Does anyone know a better way to do this without scripts?
Thanks!

How about something like this: (.*?,){4}(.*) The second capture group will have string 5 and beyond.
Do you test your Regex patterns with regex101.com? I find this helps a lot to test patterns.
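For anyone who can use a script after all, here is a minimal Python sketch of the pattern suggested above, run against the sample string from the question. The variable names are only illustrative.

```python
import re

sample = '"String1","String2","String3","String4","String5"'

# (.*?,){4} lazily consumes four comma-terminated fields;
# (.*) then captures everything that is left.
pattern = re.compile(r'(.*?,){4}(.*)')

match = pattern.match(sample)
rest = match.group(2) if match else None
print(rest)  # "String5" (quotes included)
```

Group 1 only holds the last of the four repeated fields, so group 2 is the one to read.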

Regex - Find the Shortest Match Possible

The Problem
Given the following:
\plain\f2 This is the first part of the note. This is the second part of the note. This is the \plain\f2\fs24\cf6{\txfielddef{\*\txfieldstart\txfieldtype1\txfieldflags144\txfielddataval44334\txfielddata 35003800380039000000}{\*\txfielddatadef\txfielddatatype1\txfielddata 340034003300330034000000}{\*\txfieldtext 20{\*\txfieldend}}{\field{\*\fldinst{ HYPERLINK "44334" }}{\fldrslt{20}}}}\plain\f2\fs24 part of the note.
I'd like to produce this:
\plain\f2 This is the first part of the note. This is the second part of the note. This is the third part of the note.
What I've Tried
The example input/output is a very simplified version of the data I need to parse and it would be nice to have a way to parse the data programmatically. I have a PHP application and I've been trying to use regex to match the segments that are important and then filter out the parts of the string that aren't required. Here's what I've come up with so far:
/\\plain.*?\\field{\\\*\\fldinst{ HYPERLINK "(.*?)" }}{\\fldrslt{(.*?)}}}}\\plain.*? /gm
regex101: https://regex101.com/r/ILLZU6/2
It almost matches what I want, but it grabs the longest possible match instead of the shortest. I want it to match only one \\plain before the \\field{.... Maybe after the \\plain, I could match anything except for a space? How would I go about doing that?
I'm no regex expert, but my use-case really calls for it. (Otherwise, I'd just write code to handle everything.) Any help would be much appreciated!
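(?:(?!\\plain).)* matches any run of characters that does not contain \\plain: before consuming each character, the negative lookahead checks that \\plain does not start at that position. Here's the regex implementing this: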
/\\plain(?:(?!\\plain).)*\\field{\\\*\\fldinst{ HYPERLINK "(.*?)" }}{\\fldrslt{(.*?)}}}}\\plain.*? /gm
regex101: https://regex101.com/r/ILLZU6/5
Also, you can replace the space at the end with (?: |$) if you want to allow the end of the text to trigger it as well as a space:
/\\plain(?:(?!\\plain).)*\\field{\\\*\\fldinst{ HYPERLINK "(.*?)" }}{\\fldrslt{(.*?)}}}}\\plain.*?(?: |$)/gm
regex101: https://regex101.com/r/ILLZU6/4
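To see the difference outside regex101, here is a rough Python sketch comparing the original lazy pattern with the lookahead version. The input is a shortened, hypothetical stand-in for the RTF in the question, and the braces are escaped explicitly, which Python's re module accepts.

```python
import re

# Shortened, hypothetical stand-in for the RTF input in the question.
text = (
    r'\plain\f2 First part. This is the '
    r'\plain\f2\fs24{\field{\*\fldinst{ HYPERLINK "44334" }}{\fldrslt{20}}}}'
    r'\plain\f2\fs24 part of the note.'
)

# Shared middle part: the HYPERLINK field with two capture groups.
FIELD = r'\\field\{\\\*\\fldinst\{ HYPERLINK "(.*?)" \}\}\{\\fldrslt\{(.*?)\}\}\}\}'

# Original attempt: lazy .*? still starts at the FIRST \plain.
lazy = re.compile(r'\\plain.*?' + FIELD + r'\\plain.*? ')
# Lookahead version: no second \plain is allowed before \field.
tempered = re.compile(r'\\plain(?:(?!\\plain).)*' + FIELD + r'\\plain.*? ')

lazy_match = lazy.search(text)
tempered_match = tempered.search(text)

print(lazy_match.start())                                   # 0
print(tempered_match.start() == text.index(r'\plain', 1))   # True
print(tempered_match.groups())                              # ('44334', '20')
```

The lazy quantifier only shortens the end of a match; the engine still takes the leftmost starting point, which is why the lookahead is needed.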

Regex for extracting each word between hyphens

I am learning regex and trying to write a pattern that exactly matches each of the strings without the '-', so that I can iterate over the groups and print the respective strings.
I have a string that looks like "Abcd001-wd2s-vwe1-20180e3103.txt"
I was able to write a regex for extracting Abcd001, wd2s and .txt from above text as shown below
(\A[^-]+) => Abcd001
(-[^-]+-) => wd2s
(\..*) => .txt
However, I was unable to come up with the correct pattern for extracting the exact strings vwe1 and 20180e3103.
It would be really helpful if you could guide me on this, or tell me if there is a better approach.
Please note: [^-.]+ may give me all the words separately, but I am looking for an option where I have a group defined for each of these strings so that there is a one-to-one mapping.
Thanks!
To get vwe1 or 20180e3103 from the example data, you might use a quantifier {2} or {3} to repeat matching one or more word characters followed by a hyphen: (?:\w+-){2}.
Then you could capture in a group ([^-.]+), which matches one or more characters that are neither a hyphen nor a dot.
(?:\w+-){2}([^-.]+)
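A quick Python sketch of the quantifier approach on the sample file name. The last pattern, with one group per segment, is not from the answer; it is a hypothetical variant aimed at the one-to-one mapping the question asks for, and it assumes exactly four hyphen-separated parts.

```python
import re

name = "Abcd001-wd2s-vwe1-20180e3103.txt"

# Skip two "word-" prefixes, then capture the third segment.
third = re.search(r'(?:\w+-){2}([^-.]+)', name).group(1)
# Skip three "word-" prefixes, then capture the fourth segment.
fourth = re.search(r'(?:\w+-){3}([^-.]+)', name).group(1)
print(third, fourth)  # vwe1 20180e3103

# Hypothetical alternative: one capture group per segment.
all_parts = re.match(r'([^-]+)-([^-]+)-([^-]+)-([^-.]+)(\..*)', name).groups()
print(all_parts)  # ('Abcd001', 'wd2s', 'vwe1', '20180e3103', '.txt')
```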
Try the regex below:
/\-([^\)]+)\-/gmi;
Also check the similar implementation:
https://stackoverflow.com/a/50336050/8179245

RegEx pickup words after specific word

Having a bit of a regex brain fart, and could really do with some kind assistance if anyone has time, please.
I would like to pick up all the words in a URL after the domain name.
For example:
http://www.bbc.co.uk/programmes/b08y26qp
Should return: programmes,b08y26qp
I have got this far:
[a-z][a-z0-9]*
But how do I qualify to begin returning words after http://www.bbc.co.uk/?
Very many thanks!
Using $ you anchor the regex to the end of the line. In this case it does not matter what is at the beginning.
Using () you can specify groups. This lets you retrieve the results easily.
This regex applied to http://www.bbc.co.uk/programmes/b08y26qp
([A-Za-z0-9]+)\/([A-Za-z0-9]+)$
results in:
Group 1: programmes
Group 2: b08y26qp
See this example also in regex 101: https://regex101.com/r/YkUHk5/1/
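The same end-anchored pattern can be tried in Python as a small sketch. It assumes, as the answer does, that the URL has exactly two path segments of interest at the end.

```python
import re

url = "http://www.bbc.co.uk/programmes/b08y26qp"

# $ pins the match to the end, so only the last two segments are captured.
m = re.search(r'([A-Za-z0-9]+)/([A-Za-z0-9]+)$', url)
words = list(m.groups()) if m else []
print(words)  # ['programmes', 'b08y26qp']
```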
You just need to prepend http://www.bbc.co.uk/ as a string literal to your regex. You should also use the string start anchor (^) to reduce work on a failed match: (^http:\/\/www\.bbc\.co\.uk\/)
Online
You can go to https://regex101.com/, and just add a \ before each (non grey) highlighted character until the whole regex only has grey highlights.
Java
In Java, just let Pattern.quote(string) and Matcher.quoteReplacement(string) do the escaping for you.
Of course, if you have a programming language available, something like this would be better: urlString.substring("http://www.bbc.co.uk/".length()).split("/")
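For comparison, a Python sketch of both ideas: escaping the literal prefix automatically (re.escape plays the role that Pattern.quote plays in Java) and the plain string approach. The names PREFIX and path_words are made up for this example.

```python
import re

PREFIX = "http://www.bbc.co.uk/"

# re.escape adds the backslashes for us; ^ anchors the prefix at the start.
prefix_pattern = re.compile("^" + re.escape(PREFIX) + r"(.*)$")


def path_words(url):
    """Return the path segments after PREFIX, or [] if the URL does not start with it."""
    m = prefix_pattern.match(url)
    if m is None:
        return []
    return [part for part in m.group(1).split("/") if part]


url = "http://www.bbc.co.uk/programmes/b08y26qp"
print(path_words(url))              # ['programmes', 'b08y26qp']

# Plain string version: slice off the prefix, then split.
print(url[len(PREFIX):].split("/"))  # ['programmes', 'b08y26qp']
```

The slice starts at exactly len(PREFIX) because the prefix already includes the trailing slash.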

Negate the regex in this string

I have been scratching my head over a problem that looks simple but is complicated for me. I have been searching for a solution and trying things by trial and error for five hours, unfortunately without success.
There is a string like "Dept #809 something something something and so on". I need to exclude "Dept #809", and in place of 809 there could be any number 1 to 3 digits long. I am able to match this part using the regex /^(Dept #(\d{1,3}))/, but I simply want to exclude it. I have tried most things but cannot get it to work :(.
Please help me out!
If you cannot use plain JavaScript code and depend on a single regex pattern to extract the part of string you need, use a regex to match the beginning of your input, and match and capture the rest of the string.
You may do it with
/^Dept #\d+(.*)/
Or - if there may be line breaks:
/^Dept #\d+([\s\S]*)/
See the regex demo.
In both cases, Dept #<DIGITS> is matched at the string start, and the rest is captured into Group 1, which you should be able to access after the match is found.
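The question is about JavaScript, but the idea carries over to other engines. A minimal Python sketch of the second pattern, using the sample string from the question:

```python
import re

text = "Dept #809 something something something and so on"

# Match the "Dept #<digits>" prefix, capture everything after it.
m = re.match(r'^Dept #\d+([\s\S]*)', text)
remainder = m.group(1) if m else text
print(repr(remainder))  # ' something something something and so on'
```

Note that the captured text keeps its leading space; strip it afterwards if it is not wanted.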

How to capture the word that comes after certain text and ends before some other text in regex?

I would like to find something like this:
-(IBOutlet)UIView *aView;
I would like to find aView. What I can confirm is that -(IBOutlet) must be a prefix, though it may or may not be followed by a space or another string. After that, I need the string that begins with '*' and runs until the ;.
So my regex looks like this:
(IBOutlet)*\*?;
Of course, it can't capture what I want. Any advice?
You just have to build it up incrementally. The best reference that I have found (by far) is http://www.regular-expressions.info. After learning the basics, you can then use one of many online pattern matching tools, here is one:
https://regex101.com
With that, a pattern for your goal is easy to work out (with some allowance for optional whitespace):
^\s*-\s*\(IBOutlet\)(\w*)\s*(\*\w*)
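As a rough check of that pattern, here is a Python sketch run against the line from the question. The variable names are illustrative only.

```python
import re

line = "-(IBOutlet)UIView *aView;"

outlet_pattern = re.compile(r'^\s*-\s*\(IBOutlet\)(\w*)\s*(\*\w*)')

m = outlet_pattern.match(line)
type_name = m.group(1)             # the type right after the prefix
starred = m.group(2)               # the name, still carrying its '*'
var_name = starred.lstrip('*')     # drop the '*' to get the bare name
print(type_name, starred, var_name)  # UIView *aView aView
```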
First problem: you don't have a capturing group around the variable name, so how do you get aView back after the match?
Second, the \*? means "match the * character literally, 0 or 1 times", which I guess isn't what you want either.
Try this pattern:
(IBOutlet)*\*(.+);
RegEx 101 can explain what each component means.
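A short Python sketch of how this pattern behaves. One thing it shows is that (IBOutlet)* may match zero times, so the prefix is not actually required; the stricter pattern below is a hypothetical variant, not part of the answer.

```python
import re

# Pattern from the answer: the name ends up in group 2.
loose = re.compile(r'(IBOutlet)*\*(.+);')
# Hypothetical stricter variant that insists on the literal "(IBOutlet)" prefix.
strict = re.compile(r'\(IBOutlet\).*?\*\s*(\w+)\s*;')

outlet = "-(IBOutlet)UIView *aView;"
plain = "UIView *other;"  # made-up line without the prefix

loose_outlet = loose.search(outlet).group(2)
loose_plain = loose.search(plain).group(2)
strict_outlet = strict.search(outlet).group(1)
strict_plain = strict.search(plain)

print(loose_outlet, loose_plain)    # aView other
print(strict_outlet, strict_plain)  # aView None
```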