Regex Match terms in between delimiters - regex

I'd say I'm getting the hang of regex, but when it comes to extracting data, I'm lost. Here are the inputs I have to parse:
Format:
String(String,...String,Integer)
Ex.
Jeff(White,Male,24)
Mark Zuckerberg(Facebook,9)
Grocery(Eggs,Cheese,Pancake,Bread,Milk,Strawberry,0)
I want to match the strings and the integer, but not the commas or parentheses.
This one is fairly easy because the strings don't contain symbols, but the other day I needed to extract the word cake out of something like this:
<Header><Body><font=Tahoma,15pt><b>cake <\b><\font> and whenever I tried, I matched the entire statement, not just the word cake, because I used something like:
.*<b>[a-zA-Z]+<\b>.*. So the whole concept of using regex to extract part of a string is foreign to me. How is it usually done in these two examples?

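Try the following. The lookbehind and lookahead assert that the tags are present without including them in the match:
(?<=<b>)\s*cake\s*(?=<\\b>)
If you want to match a word other than cake, try the following. Note that both patterns include any whitespace around the word in the match, so trim the result if you need the bare word.
(?<=<b>)\s*\w+\s*(?=<\\b>)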
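A rough Python sketch of the same extraction follows. Instead of lookarounds it uses a capture group, so the tags and the surrounding whitespace are matched but only the word is returned. The function name extract_bold_word is made up for illustration, and the sketch assumes the closing tag is written <\b>, as in the question.

```python
import re

# Opening tag, optional whitespace, the word (captured), optional
# whitespace, then the literal closing tag <\b> used in the question.
BOLD_WORD = re.compile(r"<b>\s*(\w+)\s*<\\b>")


def extract_bold_word(markup):
    """Return the first word wrapped in the bold tags, or None if absent."""
    match = BOLD_WORD.search(markup)
    return match.group(1) if match else None


sample = r"<Header><Body><font=Tahoma,15pt><b>cake <\b><\font>"
print(extract_bold_word(sample))
```

The key difference from .*<b>[a-zA-Z]+<\b>.* is that the result is read from group 1 rather than from the whole match.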
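A regex to validate the format in the first part of your question (String(String,...String,Integer)). Note \d+ rather than \d, so that multi-digit integers such as 24 match, and [\w ]+ for the name, so that "Mark Zuckerberg" matches:
^[\w ]+\((\w+,)+\d+\)$
If you want to match only the words and the number (Grocery, Eggs, ..., 0) in your string, try the following. It needs an engine that accepts alternatives of different lengths inside a lookbehind (Java and .NET do; Python's re module does not).
(?<=^|\(|,)[\w ]+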
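For comparison, here is a minimal Python sketch of the two steps (validate, then extract). Python's re module rejects the lookbehind above because its alternatives have different widths, so this version swaps in a plain character class that collects every run of characters that is not a comma or a parenthesis. The names RECORD, TERM, and extract_terms are hypothetical.

```python
import re

# A name (word characters and spaces), then a parenthesised list of
# comma-separated words that ends in an integer.
RECORD = re.compile(r"[\w ]+\((\w+,)+\d+\)")
# A term is any run of characters that is not a parenthesis or a comma.
TERM = re.compile(r"[^(),]+")


def extract_terms(record):
    """Return the terms of 'Name(a,b,...,n)', or None if the format is wrong."""
    if RECORD.fullmatch(record) is None:
        return None
    return TERM.findall(record)


for line in ("Jeff(White,Male,24)", "Mark Zuckerberg(Facebook,9)"):
    print(extract_terms(line))
```

The integer comes back as a string; convert it with int() if you need a number.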

Related

Split complex string into multiple parts using regex

I've tried a lot to split this string into something I can work with, but my experience isn't enough to reach the goal. I tried the first 3 pages of Google results, which helped but still didn't give me an idea of how to do this properly:
I have a string which looks like this:
My Dogs,213,220#Gallery,635,210#Screenshot,219,530#Good Morning,412,408#
The result should be:
My Dogs
213,220
Gallery
635,210
Screenshot
219,530
Good Morning
412,408
Anyone have an idea how to use regex to split the string like shown above?
Given the shared patterns, it seems you're looking for a regex like the following:
[A-Za-z ]+|\d+,\d+
It matches two patterns:
[A-Za-z ]+: any combination of letters and spaces
\d+,\d+: any combination of digits + a comma + any combination of digits
If you want a stricter regex, you can wrap the previous pattern in a lookbehind and a lookahead, so that every match is preceded by a comma, a #, or the start of the string, and followed by a comma, a #, or the end of the string. This needs an engine that accepts alternatives of different lengths inside a lookbehind.
(?<=^|,|#)([A-Za-z ]+|\d+,\d+)(?=,|#|$)
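As a quick check in Python, the simpler of the two patterns is enough for the sample string. This sketch deliberately uses [A-Za-z ]+|\d+,\d+ rather than the stricter variant, because Python's re module does not accept a lookbehind whose alternatives have different widths. The name PART is only illustrative.

```python
import re

# Either a run of letters and spaces, or two integers joined by a comma.
PART = re.compile(r"[A-Za-z ]+|\d+,\d+")

text = "My Dogs,213,220#Gallery,635,210#Screenshot,219,530#Good Morning,412,408#"
parts = PART.findall(text)
print(parts)
```

findall returns the labels and the number pairs in order, skipping the separating commas and # characters.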

Regex for any number of alphanumeric phrases between two '.'

I'm having a hard time phrasing this question when researching solutions, so I thought I would ask here. I'm trying to validate a UI field in which the user enters a string in "Java-package" format, so a correct example would be "com.my.app.class1". It needs to be the full package path, so I don't want to accept '*' in the string. My first thought is to split the string using . as the delimiter (var splitArray : any[] = packageInput.split('.')), then iterate over the array and check each piece against a regex. However, I wanted to know if I could do it all in one regex.
Something as simple as ^\w+(\.\w+)*$ will validate strings of the type you've described, as long as each segment contains only letters, digits, or underscores.
It matches all of:
class1
com.my.class1
com.my.app.class1
com.my.app.sub.class1
and doesn't match:
com.my.app.*
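A small Python sketch of that check, under the assumption that every segment is made of word characters only. It uses fullmatch in place of the ^ and $ anchors, and the function name is_valid_package is invented for the example. It is not a full Java identifier check: a segment that starts with a digit would still be accepted.

```python
import re

# One run of word characters, then any number of ".segment" groups.
PACKAGE = re.compile(r"\w+(\.\w+)*")


def is_valid_package(name):
    """Return True if name is dot-separated runs of letters, digits, or _."""
    return PACKAGE.fullmatch(name) is not None


for candidate in ("com.my.app.class1", "com.my.app.*", "com..my"):
    print(candidate, is_valid_package(candidate))
```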

Perl regex to match only if not followed by both patterns

I am trying to write a pattern match to only match when a string is not followed by both following patterns. Right now I have a pattern that I've tried to manipulate but I can't seem to get it to match correctly.
Current pattern:
/(address|alias|parents|members|notes|host|name)(?!(\t{5}|\S+))/
I am trying to match when a string is not spaced correctly but not if it is part of a larger word.
For example I want it to match,
host \t{4} something
but not,
hostgroup \t{5} something
In the above example it will match hostgroup and end up separating it into two words, "host" and "group".
Match:
notes \t{4} something
but not,
notes_url \t{5} something
Using my pattern it ends up turning into:
notes \t{5} _url
Hopefully that makes a bit more sense.
I'm not at all clear what you want, but word boundaries will probably do what you ask.
Does this work for you?
/\b(address|alias|parents|members|notes|host|name)\b(?!\t{5})/
Update
Having understood your problem better, does this do what you want?
/\b(address|alias|parents|members|notes|host|name)\b(?!\t{5}(?!\t))/
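The updated pattern also works unchanged in Python, which makes it easy to try against the cases from the question. This is only a sketch: the helper name badly_spaced_keyword is hypothetical, and the test strings assume that "\t{4}" and "\t{5}" in the question stand for four and five literal tab characters with no spaces around them.

```python
import re

# A whole keyword that is NOT followed by exactly five tabs.
# Five tabs followed by a sixth tab still count as badly spaced.
BAD_SPACING = re.compile(
    r"\b(address|alias|parents|members|notes|host|name)\b(?!\t{5}(?!\t))"
)


def badly_spaced_keyword(line):
    """Return the keyword whose spacing is wrong, or None if the line is fine."""
    match = BAD_SPACING.search(line)
    return match.group(1) if match else None


print(badly_spaced_keyword("host" + "\t" * 4 + "something"))
print(badly_spaced_keyword("hostgroup" + "\t" * 5 + "something"))
```

The \b after the keyword is what keeps hostgroup and notes_url from matching, since g and _ are both word characters.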

How to extract big mgrs using regex

I have an input json:
{"id":12345,"mgrs":"04QFJ1234567890","code":"12345","user":"db3e1a-3c88-4141-bed3-206a"}
I would like to use a regular expression to extract the MGRS at 1000-meter (1 km) precision; in my example the result should be: 04QFJ1267
The first 2 symbols are always digits, the next 3 are always letters, and the rest are always digits. The MGRS string always has a fixed length of 15 characters.
Is it possible?
Thanks.
All you really need to do is remove characters 8-10 and 13-15. If you want or need to do that with a regex, you could use a regex replace (edited to also remove the rest of the string):
.*?(\w{7})\d{3}(\d{2})\d+.*
and replacement string:
$1$2
I see now you are using Java. So the relevant code line might look like:
resultString = subjectString.replaceAll(".*?(\\w{7})\\d{3}(\\d{2})\\d+.*", "$1$2");
The above assumes all your strings look like what you showed, and there is no need to test to be sure that "mgrs" is in the string.
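If you would rather not rely on the position of the MGRS value inside the raw JSON text, one alternative is to parse the JSON first and apply a stricter pattern to the "mgrs" field only. The following Python sketch assumes the 15-character layout described in the question (2 digits, 3 letters, 10 digits); the names MGRS_FULL and shorten_mgrs are made up for illustration.

```python
import json
import re

# 2 digits + 3 letters, then two 5-digit halves; keep the first
# 2 digits of each half and drop the remaining 3.
MGRS_FULL = re.compile(r"(\d{2}[A-Za-z]{3})(\d{2})\d{3}(\d{2})\d{3}")


def shorten_mgrs(record_json):
    """Return the shortened MGRS string from a JSON record with an 'mgrs' key."""
    mgrs = json.loads(record_json)["mgrs"]
    match = MGRS_FULL.fullmatch(mgrs)
    if match is None:
        raise ValueError(f"unexpected MGRS format: {mgrs!r}")
    return "".join(match.groups())


record = '{"id":12345,"mgrs":"04QFJ1234567890","code":"12345","user":"db3e1a-3c88-4141-bed3-206a"}'
print(shorten_mgrs(record))
```

Because the pattern is applied to the field value alone, a long numeric id or code elsewhere in the record cannot be mistaken for the MGRS string.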

Regex: match everything before FIRST underscore and everything in between AFTER

I have an expression like
test_abc_HelloWorld_there could be more here.
I'd like a regex that takes the first word before the first underscore. So get "test"
I tried [A-Za-z]{1,}_ but that didn't work.
Then I'd like to get "abc" or anything in between the first 2 underscores.
2 Separate Regular expressions, not combined
Any help is very appreciated!
Example:
for 1) the regex would match the word test
for 2) the regex would match the word abc
so any other match for either case would be wrong. As in, if I were to replace what I matched on then I would get something like this:
for case 1) match "test" and replace "test" with "Goat".
'Goat_abc_HelloWorld_there could be more here'
I don't want a replace, I just want a match on a word.
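In both cases you can use assertions.
^[^_]+(?=_)
will get you everything up to the first underscore of the line, and
(?<=_)[^_]+(?=_)
will match whatever string is located between two underscores. Its first match is the text between the first two underscores (abc in your example); later matches are the segments that follow.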
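Both patterns can be tried as-is in Python, since the lookbehind here has a fixed width. The sketch below runs them against the string from the question; the variable names are only illustrative.

```python
import re

text = "test_abc_HelloWorld_there could be more here."

# Everything before the first underscore.
first = re.search(r"^[^_]+(?=_)", text)
# The first run of non-underscore characters enclosed by underscores.
between = re.search(r"(?<=_)[^_]+(?=_)", text)
# Every such run, to show what a global match would return.
all_between = re.findall(r"(?<=_)[^_]+(?=_)", text)

print(first.group())
print(between.group())
print(all_between)
```

Using search rather than findall is what limits the second pattern to the text between the first two underscores.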
Step back and consider that maybe you're overengineering the solution here. Ruby has a split method for this, and other languages probably have their own equivalents.
Given something like "AAPL_annual_i.xls", you could just do this and take advantage of the fact that your data is already structured:
string_object = "AAPL_annual_i.xls"
ary = string_object.split("_")
#=> ["AAPL", "annual", "i.xls"]
# split is a String method, so call it on an element, not on the array
extension = ary[2].split(".")[1]
#=> "xls"
filetype = ary[2].split(".")[0] #etc
#=> "i"
'doh!
But seriously, I've found that leaning on the split method is not only easier on me, it's easier on my associates who have to read my code and understand what it does.
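The same idea in Python, as a minimal sketch applied to the string from the question. The helper name first_two_fields is hypothetical; it passes a maxsplit of 2 so that any later underscores stay in the remainder untouched.

```python
def first_two_fields(text, sep="_"):
    """Return the text before the first separator and between the first two."""
    parts = text.split(sep, 2)
    if len(parts) < 3:
        raise ValueError("expected at least two separators")
    return parts[0], parts[1]


first, second = first_two_fields("test_abc_HelloWorld_there could be more here.")
print(first, second)
```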