Regex: match only the line where numbers are located

Really tired of this regex. So many combinations... I believe I need another brain :-)
Here is my problem; if someone can help, I'd highly appreciate it.
I have these 6 lines in a JSON response:
...some JSON code here
"note" : "",
"note" : "Here is my note",
"note" : "My note $%$",
"note" : "Created bug 14569 in the system",
"note" : "Another beautiful note",
"note" : "##$%##%dgdeg"
...continuation of the JSON code
With the help of a regex, how do I match only the number 14569?
I have tried the following regexes (the first one matches all 6 lines):
"note"([\s\:\"a-zA-Z])*([0-9]*) - 6 matches (I only need one)
"note"([\s\:\"a-zA-Z])*(^[0-9]*) - no matches
"note"([\s\:\"a-zA-Z])*([0-9]*+?) - pattern error
"note"([\s\:\"a-zA-Z])*(^[0-9]*+#?) - no match
Thanks for your help!
Update for Matt: below is my full JSON object.
"response": {
"notes": [{
"note" : "",
"note" : "Here is my note",
"note" : "My note $%$",
"note" : "Created bug 14569 in the system",
"note" : "Another beautiful note",
"note" : "##$%##%dgdeg"
}]
}

You could try this regex:
"note"\s*:\s*".*?([0-9]++).*"
It will give you the number in group 1 of the match.
If you don't want to match numbers that are part of a word (e.g. "bug11") then surround the capture group with word boundary assertions (\b):
"note"\s*:\s*".*?\b([0-9]++)\b.*"
Regex101 demo
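A minimal sketch of the word-boundary pattern in action, assuming Python's re module (the answer itself does not name a language or engine). Python only understands the possessive ++ from version 3.11 on, so this version uses a plain +. On these inputs that makes no difference, because the \b after the digits never lets a shorter digit run succeed.

```python
import re

# Word-boundary version of the answer's pattern, with "+" in place of the
# possessive "++" (Python's re accepts "++" only from 3.11 on).
NOTE_NUMBER = re.compile(r'"note"\s*:\s*".*?\b([0-9]+)\b.*"')

lines = [
    '"note" : "",',
    '"note" : "Here is my note",',
    '"note" : "My note $%$",',
    '"note" : "Created bug 14569 in the system",',
    '"note" : "Another beautiful note",',
    '"note" : "##$%##%dgdeg"',
]

numbers = []
for line in lines:
    match = NOTE_NUMBER.search(line)
    if match:                            # only the line with a number matches
        numbers.append(match.group(1))   # group 1 holds the number itself

print(numbers)  # ['14569']
```

Because of the \b assertions, a number glued to a word, as in "bug11", is not picked up.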

If all that you care about is that the line includes a number, then that is all you need to look for.
/[0-9]/ # matches if the string includes a digit
Or, as you want to capture the number:
/([0-9]+)/ # matches (and captures) one or more digits
This is a common error that I see when beginners build regular expressions. They want to build a regex that matches the whole string when, actually, they only need to match the part of the string they are interested in.
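To illustrate the point, here is a small Python sketch (the /.../ notation above suggests Perl, so the language is an assumption here): re.search() succeeds as soon as the digits appear anywhere in a note, so the pattern never has to describe the rest of the line.

```python
import re

notes = [
    "",
    "Here is my note",
    "My note $%$",
    "Created bug 14569 in the system",
    "Another beautiful note",
    "##$%##%dgdeg",
]

# search() scans the whole string, so the regex only describes the digits.
found = []
for note in notes:
    match = re.search(r"([0-9]+)", note)
    if match:
        found.append(match.group(1))

print(found)  # ['14569']
```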
Update:
It might help to explain why some of your other attempts failed.
"note"([\s\:\"a-zA-Z])*([0-9]*) - 6 matches (I only need one)
The * means "match zero or more of the previous item", effectively making the item optional. This matches all lines as they all contain zero or more digits.
"note"([\s\:\"a-zA-Z])*(^[0-9]*) - no matches
The ^ means "the next item needs to be at the start of the string". You don't have digits at the start of your string.
"note"([\s\:\"a-zA-Z])*([0-9]*+?) - pattern error
Yeah. You're just adding random punctuation here, aren't you? In PCRE, *+ is already a complete (possessive) quantifier, so the extra ? has nothing to repeat, and the parser reports an error.
"note"([\s\:\"a-zA-Z])*(^[0-9]*+#?) - no match
This fails for the same reason as the previous attempt where you use ^ - the digits aren't at the start of the string. Also, the # has no special meaning in a regex, so #? means "zero or one # characters".
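The failures described above can be reproduced with Python's re module, used here only as a stand-in for whatever engine the question used: the first attempt matches every line (with an empty digit group on five of them), the ^ attempt never matches, and the *+? attempt does not even compile.

```python
import re

lines = [
    '"note" : "",',
    '"note" : "Here is my note",',
    '"note" : "My note $%$",',
    '"note" : "Created bug 14569 in the system",',
    '"note" : "Another beautiful note",',
    '"note" : "##$%##%dgdeg"',
]

attempt1 = re.compile(r'"note"([\s\:\"a-zA-Z])*([0-9]*)')
attempt2 = re.compile(r'"note"([\s\:\"a-zA-Z])*(^[0-9]*)')

# Attempt 1: [0-9]* may match zero digits, so every line matches and
# group 2 is empty everywhere except on the "bug 14569" line.
groups1 = [attempt1.search(line).group(2) for line in lines]

# Attempt 2: ^ only matches at the very start of the string, which lies
# behind "note" by then, so no line can match.
hits2 = [attempt2.search(line) for line in lines]

# Attempt 3: a quantifier stacked on "*+" is rejected at compile time.
try:
    re.compile(r'"note"([\s\:\"a-zA-Z])*([0-9]*+?)')
    attempt3_compiles = True
except re.error:
    attempt3_compiles = False

print(groups1)            # ['', '', '', '14569', '', '']
print(hits2)              # [None, None, None, None, None, None]
print(attempt3_compiles)  # False
```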

If you have JSON, why don't you parse the JSON and then grep through the result?
use JSON 'decode_json';
my $data = decode_json( $json_text );   # assumes a top-level array of { note => ... } objects
my @rows = map { /\b(\d+)\b/ ? $1 : () }   # keep only the number
           map { $_->{note} } @$data;
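The same parse-then-search idea, sketched in Python with the standard json and re modules instead of Perl. It assumes the notes are separate objects inside the "notes" array. The JSON in the question repeats the "note" key within one object, and json.loads() would then keep only the last of those values.

```python
import json
import re

# Assumed structure: one object per note (see the lead-in above).
json_text = """
{"response": {"notes": [
    {"note": ""},
    {"note": "Here is my note"},
    {"note": "My note $%$"},
    {"note": "Created bug 14569 in the system"},
    {"note": "Another beautiful note"},
    {"note": "##$%##%dgdeg"}
]}}
"""

data = json.loads(json_text)

# Pull out each note, then keep only the whole-word numbers.
numbers = []
for entry in data["response"]["notes"]:
    match = re.search(r"\b(\d+)\b", entry["note"])
    if match:
        numbers.append(match.group(1))

print(numbers)  # ['14569']
```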

This might work:
(?m-s)^[^"\r\n]*?"note"\h*:\h*"[^"\r\n]*?\d+[^"\r\n]*".*
https://regex101.com/r/ujDBa9/1
Explained
(?m-s) # Multi-line, no dot-all
^ # BOL
[^"\r\n]*? # Not a double quote yet
"note" \h* : \h* # Finally, a note
" [^"\r\n]*? \d+ [^"\r\n]* " # Is a number embedded within the quotes ?
.* # Ok, get the rest of the line
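A rough Python port of that pattern, for trying it outside regex101. Python's re has no \h, so [ \t] stands in for it (an approximation, since \h also covers other horizontal whitespace), and the inline (?m-s) becomes the re.MULTILINE flag with DOTALL left off.

```python
import re

# [ \t] approximates \h; re.MULTILINE makes ^ match at every line start.
LINE_WITH_NUMBER = re.compile(
    r'^[^"\r\n]*?"note"[ \t]*:[ \t]*"[^"\r\n]*?\d+[^"\r\n]*".*',
    re.MULTILINE,
)

text = '''"response": {
"notes": [{
"note" : "",
"note" : "Here is my note",
"note" : "My note $%$",
"note" : "Created bug 14569 in the system",
"note" : "Another beautiful note",
"note" : "##$%##%dgdeg"
}]
}'''

# With no capture groups, findall() returns the whole matched lines.
matched_lines = LINE_WITH_NUMBER.findall(text)
print(matched_lines)  # ['"note" : "Created bug 14569 in the system",']
```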

Related

Scala regex: Find matching URL regex within a sentence

import java.util.regex._
object RegMatcher extends App {
  val str = "facebook.com"
  val urlpattern = "(http://|https://|file://|ftp://)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?"
  var regex_list: Set[(String, String)] = Set()
  val url = Pattern.compile(urlpattern)
  var m = url.matcher(str)
  if (m.find()) {
    regex_list += (("date", m.group(0)))
    println("match: " + m.group(0))
  }
  val str2 = "url is ftp://filezilla.com"
  m = url.matcher(str2)
  if (m.find()) {
    regex_list += (("date", m.group(0)))
    println("str 2 match: " + m.group(0))
  }
}
This returns
match: facebook.com
str 2 match: url is ftp:
How do I adjust the regex pattern so that both strings are matched correctly?
What do the symbols in the regex actually mean? I am very new to regex. Please help.
I read your regex as:
0 or 1 (? modifier) of the schemes (http://, https://, etc.),
followed by 0 or 1 instance of www plus any character ((www.)?, where the dot is unescaped too),
followed by 1 or more (+ modifier) alphanumeric characters,
followed by any character (. is a regex special character, remember, standing for any one character),
followed by 0 or more (* modifier) alphanumerics,
followed by any character (. again),
followed by 3 lowercase letters ({3} being an exact count modifier),
followed by 0 or 1 of any character (.?),
followed by an optional group of one or more lowercase letters (([a-z]+)?).
If you plug your regex into regex101.com, you'll not only see a similar breakdown (without any errors I might have made, though I think I nailed it), but you'll also have a chance to test various strings against it. Then, once your regexes work the way you want, you can bring them back to your script. It's a solid workflow both for learning regexes and for developing an expression for a particular purpose.
If you drop your regex and your inputs into regex 101, you'll see why you're getting the output you see. But here's a hint: when you ask your regular expression to match "url is ftp://filezilla.com", nothing excludes "url is" from being part of the match. That's why you're not matching the scheme you want. Regex101 really is a great way to investigate this further.
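The hint can be checked in Python, whose re module treats these particular constructs (groups, alternation, ?, *, +, {3}, .) the same way Java's Pattern does: the original pattern reproduces the "url is ftp:" result. The second pattern is only a hypothetical variant for illustration, not taken from any answer. It escapes the dots and lets the host be dot-separated labels, which is enough for these two strings but is not a complete URL grammar.

```python
import re

# The question's pattern, unchanged: every unescaped "." matches ANY
# character, including the spaces in "url is ".
original = re.compile(
    r"(http://|https://|file://|ftp://)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?"
)

# Hypothetical variant: literal dots (\.) and dot-separated host labels
# ending in a suffix of 2+ lowercase letters.
escaped = re.compile(
    r"(?:(?:https?|ftp|file)://)?(?:www\.)?[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*\.[a-z]{2,}"
)

results = []
for text in ["facebook.com", "url is ftp://filezilla.com"]:
    results.append((original.search(text).group(0), escaped.search(text).group(0)))

print(results)
# [('facebook.com', 'facebook.com'), ('url is ftp:', 'ftp://filezilla.com')]
```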
The regex can be updated to
((ftp|https|http?):\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,})
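A quick Python check of what the updated pattern accepts (Python is used here only as a test bed, and "see www.facebook.com" is a made-up input). It finds the ftp URL inside the sentence and a www. address, but a bare host like facebook.com gives no match, because the pattern requires either a scheme or a leading www.

```python
import re

# The updated pattern from the answer above, split over two raw strings only
# for width ("\/" is simply an escaped "/").
updated = re.compile(
    r"((ftp|https|http?):\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}"
    r"|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,})"
)

found = {}
for text in ["url is ftp://filezilla.com", "see www.facebook.com", "facebook.com"]:
    match = updated.search(text)
    found[text] = match.group(0) if match else None

print(found)
```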
This is all I needed.

Regex Lookahead/Lookbehind if more than one occurrence

I have string formulas like this:
?{a,b,c,d}
It can be embedded like this:
?{a,b,c,?{x,y,z}}
or this is the same:
?{a,b,c,
?{x,y,z}
}
So I have to find the commas that are inside the second and deeper "level" brackets.
In the example below I marked the "levels" where I have to find all commas:
?{a,b,c,
?{x,y, <--Those
?{1,2,3} <--Those
}
}
I've tried with lookahead and lookbehind, but I'm totally confused now :/
Here is my latest working try, but it is not good at all:
OnlineRegex
Update:
To avoid misunderstanding: I don't want to count the commas.
I'd like to get groups of commas so I can replace them.
The condition is: find the commas that have more than one opening tag like this: ?{
before them, without a closing tag like this: }
Example:
In this case I must not replace any commas:
?{1,2,3} ?{a,b,c}
But in this case I have to replace the commas between a, b and c:
?{1,2,3,?{a,b,c}}
For the examples you have provided, the following regex works (it gives the output you described):
(?<!^\?{[^{}]*),(?=[\s\S]*(?:\s*}){2,})
For the string ?{a,b,c,d}, see Demo1: no match.
For the string ?{a,b,c,?{x,y,z}}, see Demo2: match successful.
For the string
?{a,b,c,
?{x,y,z}
}
see Demo3: match successful.
For the string
?{a,b,c,
?{x,y,
?{1,2,3}
}
}
see Demo4: match successful.
For the string ?{1,2,3} ?{a,b,c} ?{1,2,3} ?{a,b,c}, see Demo5: no match.
Explanation:
(?<!^\?{[^{}]*), - negative lookbehind that discards the first-level commas. It refuses a comma that is preceded by the start of the string, then ?{, then zero or more characters other than { or }. Because this lookbehind has variable length, it needs an engine that supports that, such as .NET or JavaScript (ES2018+).
(?=[\s\S]*(?:\s*}){2,}) - the comma matched above must be followed, somewhere later, by at least 2 occurrences of } (consecutive or separated only by whitespace).
Your question is rather unclear @norbre, but I presume you'd like to extract (i.e. "count") the number of commas.
You can't do this with a regex. Regexes can't count the number of occurrences. However, you can use this to extract the "internal part" and then use a spreadsheet formula to count the number of commas:
^(?:\?{[a-zA-Z0-9,]+?,\n??\s*?\?{)([a-zA-Z0-9,?{}\n\s]+?(?:\n*?\s*?|})+)(?:[a-zA-Z0-9,\n\s]*})$
Try: https://regex101.com/r/Rr0eFo/5
Examples
1.
Input:
?{a,b,c,?{e,f},1,2,3}
Output:
e,f}
2.
Input:
?{a,b,c,
?{x,y,z,e,
?{1,2,3,?{f,g,3},4,5,6}
}
,d,e,f}
Output:
x,y,z,e,
?{1,2,3,?{f,g,3},4,5,6}
}
3.
Input:
?{a,b,c,?{e},1,2,3}
Output:
e}
(note that there are no commas here!)
One caveat, however. As I have said, regexes can't count the number of occurrences.
Hence, the following sample (I don't know whether it's valid for your case) would return a wrong match:
?{a,b,c,?{e,f}
,1,2,3,?{a,b}
}
Output:
e,f}
,1,2,3,?{a,b}
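Since the nesting depth is exactly what a regex cannot count, a plain loop that tracks it can do the replacement directly. This is a swapped-in, non-regex technique, not something proposed in the answers; the helper name replace_nested_commas and the ";" replacement are hypothetical choices for illustration.

```python
def replace_nested_commas(formula: str, replacement: str = ";") -> str:
    """Replace commas that sit two or more "?{" levels deep.

    Hypothetical helper: tracks the nesting depth in a loop instead of
    asking a regex to count braces.
    """
    out = []
    depth = 0
    i = 0
    while i < len(formula):
        if formula.startswith("?{", i):   # opening tag: one level deeper
            depth += 1
            out.append("?{")
            i += 2
            continue
        ch = formula[i]
        if ch == "}":                     # closing tag: one level up
            depth -= 1
        elif ch == "," and depth >= 2:    # comma inside a nested formula
            ch = replacement
        out.append(ch)
        i += 1
    return "".join(out)


print(replace_nested_commas("?{1,2,3} ?{a,b,c}"))   # ?{1,2,3} ?{a,b,c}
print(replace_nested_commas("?{1,2,3,?{a,b,c}}"))   # ?{1,2,3,?{a;b;c}}
print(replace_nested_commas("?{a,b,c,?{e,f}\n,1,2,3,?{a,b}\n}"))
```

The last call is the sample that trips up the regex above; counting depth handles it because the comma after ?{e,f} is back at level one.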
OK, replacing commas is another story, so I'll add another answer.
Your regexp engine would need to support recursion.
Still, I don't see a way to do it with one regex: one match would either contain the first comma or contain everything between the braces!
What I suggest is to use one regexp to get "what is inside the inner braces", run a replace (, => "") and assemble the whole line again using submatches from the regexp.
Here it is: (\?{[^?{}]*)((?>[^?{}]|(?R))+?)([^?{}]*?\})
Try: https://regex101.com/r/IzTeY0/3
Example 1:
Input:
?{a,b,c,
?{x,y,z,e,
?{1,2,3,?{f,g,3},4,5,6}
}
,d,e,f}
Submatches:
1. ?{a,b,c,
2. ?{x,y,z,e,
?{1,2,3,?{f,g,3},4,5,6}
}
3.
,d,e,f}
Replace all commas in submatch 2 with anything you want, then reassemble the whole string using submatches 1 and 3.
Again, this would break the regexp:
?{a,b,c,?{e,f}
,1,2,3,?{a,b}
}
Submatch 2 would look like this:
?{e,f}
,1,2,3,?{a,b}

Regex to match a 3-digit number followed by signs, and nothing else

I've been playing with a regex definition (language "Basic") but cannot get it to work.
I will delete my previous post on the same matter when I get a solution.
The regex shall:
MATCH:
"400:-"
"200:-"
"588:-"
"999:-"
BUT NO MATCH:
"1 200:-"
"o 100:-"
"1400:-"
"y 800:-"
"400"
"i 588:-"
Why does this regex not work?
(^[0-9]?[0-9]?[0-9]:-$)
Just try the following regex:
^\d{3}:-$
Your regular expression does work, just remove the optional quantifier ? and place your beginning/ending line anchors outside of your capturing group. It could be simplified to the following.
^([0-9]{3}:-)$
Try
"^[0-9]{3}:-"
That tells it to find a digit from 0 to 9 exactly three times, at the beginning of the string, followed immediately by ":-".
If you don't want it to check only the beginning, then:
using System.Text.RegularExpressions;

bool check;
Regex reg = new Regex("[0-9]{3}:-");
check = reg.IsMatch("400:-"); // true
check = reg.IsMatch("40:-"); // false
check = reg.IsMatch("asdf400:-"); // true
But this will make it match the ones you don't want matched.
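The three patterns above can be compared on the question's lists with Python's re module (used here as a test bed; the question targets a "Basic" dialect). The comparison shows that the question's regex already rejects every listed non-match; its flaw is that the optional ? also lets it accept one- or two-digit values such as "40:-", while the unanchored pattern accepts most of the strings that should fail.

```python
import re

question = re.compile(r"(^[0-9]?[0-9]?[0-9]:-$)")  # the question's regex
anchored = re.compile(r"^[0-9]{3}:-$")            # exactly three digits, then ":-"
unanchored = re.compile(r"[0-9]{3}:-")            # the C# answer's pattern, no anchors

should_match = ["400:-", "200:-", "588:-", "999:-"]
should_not = ["1 200:-", "o 100:-", "1400:-", "y 800:-", "400", "i 588:-"]

for name, pattern in [("question", question), ("anchored", anchored), ("unanchored", unanchored)]:
    missed = [s for s in should_match if not pattern.search(s)]
    wrongly_matched = [s for s in should_not if pattern.search(s)]
    accepts_two_digits = bool(pattern.search("40:-"))
    print(f"{name}: missed={missed} wrongly_matched={wrongly_matched} 40:-={accepts_two_digits}")
```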

Confusion in regex pattern for search

Learning regex in bash, I am trying to fetch all lines that end with .com.
Initially I did:
cat patternNpara.txt | egrep "^[[:alnum:]]+(.com)$"
Why: + matches one or more occurrences, so placing it after [[:alnum:]] should fetch any digit, word or sign, but apparently this logic is failing...
Then I did this (purely trial and error, not really applying any logic...) and it worked:
cat patternNpara.txt | egrep "^[[:alnum:]].+(.com)$"
What's confusing me: . matches only a single character, so how am I getting the output? I mean, how is it really matching the pattern?
Question: what's the difference between [[:alnum:]]+ and [[:alnum:]].+ (the one with . in it) in the above matching pattern, and how does it work?
PS: I am looking for an explanation, not a "try it this way" answer :)
Some test lines from the file patternNpara.txt, which are fetched as output:
valid email = abc@abc.com
invalid email = ab@abccom
another invalid = abc@.com
1 : abc,s,11@gmail.com
2: abc.s.11@gmail.com
Looking at your screenshot, it seems you're trying to match email addresses, which also contain the @ character, and that character is not included in your regex. You can use this regex:
egrep "[@[:alnum:]]+(\.com)" patternNpara.txt
Difference between the 2 regexes:
[[:alnum:]] matches only [a-zA-Z0-9]. If you have @ or , then you need to include them in the character class as well.
Your 2nd case includes the .+ pattern, which means 1 or more matches of ANY CHARACTER.
If you want to match any lines that end with '.com', you should use:
egrep ".*\.com$" file.txt
To match all the following lines
valid email = abc@abc.com
invalid email = ab@abccom
another invalid = abc@.com
1 : abc,s,11@gmail.com
2: abc.s.11@gmail.com
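^[[:alnum:]].+(.com)$ will work, but ^[[:alnum:]]+(.com)$ will not. Here are the reasons:
^[[:alnum:]].+(.com)$ matches strings that start with a letter or digit (a-zA-Z0-9), followed by two or more characters of any kind (the .+ plus the unescaped . in (.com)), and end with 'com' (not necessarily '.com').
^[[:alnum:]]+(.com)$ matches strings that consist of one or more letters or digits, followed by one character that could be anything, and end with 'com' (not necessarily '.com'). Every test line contains spaces (and @, = or , characters) before that point, which [[:alnum:]] cannot match, so nothing matches.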
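The explanation can be checked in Python. Python's re has no POSIX [[:alnum:]] class, so [a-zA-Z0-9] stands in for it (an approximation that ignores locale). The escaped pattern from the other answer is included to show what the unescaped . in (.com) lets through: "ab@abccom" only matches when the dot is unescaped.

```python
import re

lines = [
    "valid email = abc@abc.com",
    "invalid email = ab@abccom",
    "another invalid = abc@.com",
    "1 : abc,s,11@gmail.com",
    "2: abc.s.11@gmail.com",
]

# [a-zA-Z0-9] approximates [[:alnum:]] (Python has no POSIX classes).
patterns = {
    "alnum+": r"^[a-zA-Z0-9]+(.com)$",    # only letters/digits, then any char + "com"
    "alnum.+": r"^[a-zA-Z0-9].+(.com)$",  # one letter/digit, then anything
    "escaped": r".*\.com$",                # a literal ".com" at the end
}

results = {}
for name, pattern in patterns.items():
    results[name] = [line for line in lines if re.search(pattern, line)]
    print(name, len(results[name]))
# alnum+ 0
# alnum.+ 5
# escaped 4
```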
Try this (with a "positive lookahead"). Note that egrep (POSIX ERE) has no lookahead, so this needs a PCRE-style engine such as grep -P:
.+(?=\.com)
Demo:
http://regexr.com?38bo0
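A short Python sketch of what the lookahead does: (?=\.com) only checks that ".com" follows, so it is left out of the matched text, and a line without a literal ".com" gives no match at all.

```python
import re

lines = ["valid email = abc@abc.com", "2: abc.s.11@gmail.com", "invalid email = ab@abccom"]

# .+ grabs as much as it can; (?=\.com) then requires ".com" to follow
# without making it part of the match.
before_com = []
for line in lines:
    match = re.search(r".+(?=\.com)", line)
    before_com.append(match.group(0) if match else None)

print(before_com)  # ['valid email = abc@abc', '2: abc.s.11@gmail', None]
```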

Regex to find the innermost occurrence of strings between two delimiters

I am using a TextCrawler regexp to align an existing plain text file.
The text inside the file is continuous, without line breaks.
....moredata....
,actor's list:
Amy Brenneman, Aaron Eckhart, Catherine Keener, Natassja Kinski
, Jason Patric, Ben Stiller,
movies released:
Gladiator,Matrix Reloaded,The Shawshank Redemption,Pirates of the Caribbean
- Curse of the Black Pearl,Monsters Inc,
genre:
SciFi,Romance,Drama,Action,Comedy,Advenure,Animated,Western,Horror
....moredata....
I am trying to find the string(s) between a comma and a colon and replace them with the same text, but with a new line added before the found pattern.
I tried the following, but it matches the string from the outermost comma to the colon:
[,]{1}.[A-Z].*[:]
Any ideas? Where did I go wrong?
Why not use this pattern:
search: (?<=,)[^,:]+(?=:)
replace: \n$0
pattern details:
(?<=,) # lookbehind assertion: only a check that means "preceded by ,"
[^,:]+ # negated char class: all characters except , and :
(?=:) # lookahead assertion: only a check that means "followed by :"
Lookarounds are only tests that can make the pattern fail or succeed, they are not part of the match result.
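The same search and replacement, sketched with Python's re.sub on a shortened version of the question's text (the shortening is ours). Python writes the whole match as \g<0> in the replacement, where the answer writes $0.

```python
import re

text = ("....moredata....,actor's list:Amy Brenneman, Ben Stiller,"
        "movies released:Gladiator,Monsters Inc,"
        "genre:SciFi,Drama")

# Only "actor's list", "movies released" and "genre" sit between a comma
# and a colon; the lookarounds keep the comma and colon out of the match.
result = re.sub(r"(?<=,)[^,:]+(?=:)", r"\n\g<0>", text)
print(result)
```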
The pattern below works:
Search Pattern : (,?[^:,]+:)
Replacement String : \n\1\n
For example:
Given a file a.txt with the contents:
actor's list:A,B,C,movies released:D,E,F,genre:G,H,I
perl -pe "s#(,?[^:,]+:)#\n\1\n#g" a.txt
The above command produces output in the following format:
actor's list:
A,B,C
,movies released:
D,E,F
,genre:
G,H,I
I hope the above output is what you are expecting.
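The Perl one-liner translates directly to Python's re.sub, which also accepts \1 in the replacement. The result starts with a newline, because the first label gets one in front of it as well.

```python
import re

text = "actor's list:A,B,C,movies released:D,E,F,genre:G,H,I"

# Each label (with its optional leading comma) is wrapped in newlines,
# like s#(,?[^:,]+:)#\n\1\n#g in the Perl version.
result = re.sub(r"(,?[^:,]+:)", r"\n\1\n", text)
print(result)
```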