Regex number between slash [duplicate] - regex

This question already has answers here:
Regex to match a C-style multiline comment
(8 answers)
Closed 3 years ago.
I have a lot of lines with markers like
/* 1 */
/* 2 */
....
/* 1000 */
I want to replace them with a comma. I came up with a simple regex to use in Notepad++:
\/(.*?)\/
It works fine, but some lines contain text like this, which matches the regex when it should not:
de produtos / trazendo inputs qualitativos / estratégicos para a marca
------------^-------------------------------^----------------------
I tried using /* instead of just / but with no success.
Any suggestions?

To be able to match /* ... */ blocks, you may use this regex:
\/\*.*?\*\/
Since * is a metacharacter in regex, it needs to be escaped as well.
A lazy quantifier .*? is also required to avoid matching across blocks.
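As a rough illustration of the lazy-quantifier point, here is a minimal Python sketch. The sample string is made up for the demonstration, and Python's re module stands in for the Notepad++ engine, which may differ in details. The forward slash needs no escaping in Python, so it is written plainly.

```python
import re

# Hypothetical sample line containing two comment blocks.
text = "/* 1 */ first /* 2 */ second"

# Greedy: .* runs to the last */ on the line, swallowing the text in between.
greedy = re.findall(r"/\*.*\*/", text)

# Lazy: .*? stops at the first */ it can reach, so each block matches on its own.
lazy = re.findall(r"/\*.*?\*/", text)

print(greedy)  # ['/* 1 */ first /* 2 */']
print(lazy)    # ['/* 1 */', '/* 2 */']
```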

The following should do the job:
\/\*[\d\s]+\*\/
It matches the opening comment marker, then one or more digits or whitespace characters, and then the closing comment marker.
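A small Python sketch of this stricter pattern, using lines from the question as sample data. Python is used here only as a convenient test bed; the question itself targets Notepad++, where the same pattern would go in the Find field with a comma as the replacement.

```python
import re

# Only digits and whitespace are allowed between /* and */.
MARKER = re.compile(r"/\*[\d\s]+\*/")

lines = [
    "/* 1 */",
    "/* 1000 */",
    "de produtos / trazendo inputs qualitativos / estratégicos para a marca",
]

# Numbered markers become a comma; the prose line has no /* ... */ block
# and is left untouched.
replaced = [MARKER.sub(",", line) for line in lines]
print(replaced)
```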

Related

Python regex to parse '#####' text in description field [duplicate]

This question already has answers here:
regex to extract mentions in Twitter
(2 answers)
Extracting #mentions from tweets using findall python (Giving incorrect results)
(3 answers)
Closed 3 years ago.
Here's the line I'm trying to parse:
#abc def#gmail.com #ghi j#klm #nop.qrs #tuv
And here's the regex I've gotten so far:
#[A-Za-z]+[^0-9. ]+\b | #[A-Za-z]+[^0-9. ]
My goal is to get ['#abc', '#ghi', '#tuv'], but no matter what I do, I can't get 'j#klm' to not match. Any help is much appreciated.
Try using re.findall with the following regex pattern:
(?:(?<=^)|(?<=\s))#[A-Za-z]+(?=\s|$)
import re

inp = "#abc def#gmail.com #ghi j#klm #nop.qrs #tuv"
matches = re.findall(r'(?:(?<=^)|(?<=\s))#[A-Za-z]+(?=\s|$)', inp)
print(matches)
This prints:
['#abc', '#ghi', '#tuv']
The regex calls for an explanation. The leading lookbehind (?:(?<=^)|(?<=\s)) asserts that what precedes the # symbol is either a space or the start of the string. We can't use a word boundary here because # is not a word character. We use a similar lookahead (?=\s|$) at the end of the pattern to rule out matching things like #nop.qrs. Again, a word boundary alone would not be sufficient.
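To see why a word boundary falls short, the sketch below compares a \b-based pattern with a whitespace-lookaround variant on the same input. The (?<!\S) and (?!\S) form is an assumed shorthand for "preceded and followed by whitespace or a string edge", not the pattern given in the answer.

```python
import re

inp = "#abc def#gmail.com #ghi j#klm #nop.qrs #tuv"

# A trailing \b accepts any tag followed by a non-word character, and nothing
# constrains what comes before the #, so unwanted tags slip through.
with_boundary = re.findall(r"#[A-Za-z]+\b", inp)

# Requiring "no non-whitespace character" on both sides keeps only the
# standalone tags.
with_lookarounds = re.findall(r"(?<!\S)#[A-Za-z]+(?!\S)", inp)

print(with_boundary)     # ['#abc', '#gmail', '#ghi', '#klm', '#nop', '#tuv']
print(with_lookarounds)  # ['#abc', '#ghi', '#tuv']
```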
Just add a start-of-line anchor at the beginning:
^#[A-Za-z]+[^0-9. ]+\b | #[A-Za-z]+[^0-9. ]
It should work!

Regex function to find all and only 6 digit numeric strings, ignoring spaces if any in between [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
I have an HTML source page as a text file.
I need to read the file and find only those numeric strings which have 6 continuous digits, possibly with a space in between those 6 digits.
E.g.
209 016 - should come up in the search result, as 209016 (space removed)
209016 - should also come up in the search, unaltered, as 209016
Any numeric string longer or shorter than 6 digits should not come up in the search, e.g. 20901677, 209016#223, 29016.
I think this can be achieved with a regex, but I was not able to do it.
A solution in regex is preferable, but anything else is also welcome.
To match 6 digits with any number of spaces in between, you may use the following pattern:
\b(?:\d[ ]*?){6}\b
Or if you want to reject it when it's followed by an #, you may use:
\b(?:\d[ ]*?){6}\b(?!#)
Then, you can use the replace method to remove the space characters.
Python example:
import re

regex = r"\b(?:\d[ ]*?){6}\b(?!#)"
test_str = ("209016 \n"
            "209 016\n"
            "20901677','209016#223', '29016")

matches = re.finditer(regex, test_str, re.MULTILINE)
for match in matches:
    # Strip the inner spaces so "209 016" is reported as "209016".
    print(match.group().replace(" ", ""))
Output:
209016
209016
You can try the following regex:
\b(?<!#)\d(?:\s*\d){5}\b(?!#)
demo: https://regex101.com/r/ZCcDmF/2/
But note that you might have to modify your boundaries if you need to exclude more than the #. It will become something like:
\b(?<!#|other char I need to exclude|another one|...)\d(?:\s*\d){5}\b(?!#|other char I need to exclude|another one|...)
where you replace other char I need to exclude, another one, ... with the actual characters.
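One possible way to generalise the excluded neighbours in Python is to build a character class instead of an alternation. This is only a sketch: the helper name find_six_digit_codes and its excluded parameter are hypothetical, excluded is assumed to be a non-empty string of single characters, and [ ]* is used instead of \s* so a match cannot span a line break.

```python
import re

def find_six_digit_codes(text, excluded="#"):
    """Return 6-digit strings (inner spaces removed) that are not directly
    preceded or followed by any character in `excluded` (assumed non-empty)."""
    # re.escape keeps characters such as ] or ^ literal inside the class.
    char_class = "[" + re.escape(excluded) + "]"
    pattern = re.compile(
        r"\b(?<!" + char_class + r")\d(?:[ ]*\d){5}\b(?!" + char_class + r")"
    )
    return [m.group().replace(" ", "") for m in pattern.finditer(text)]

sample = "209 016, 209016#223, @123456, 20901677, 654321"
print(find_six_digit_codes(sample, excluded="#@"))  # ['209016', '654321']
```

A character class keeps every alternative exactly one character wide, which matters because Python's re module only accepts fixed-width lookbehinds.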

How to filter out c-type comments with regex? [duplicate]

This question already has answers here:
Regex to match a C-style multiline comment
(8 answers)
Improving/Fixing a Regex for C style block comments
(5 answers)
Strip out C Style Multi-line Comments
(4 answers)
Closed 3 years ago.
I'm trying to filter out "c-style" comments in a line so I'm only left with the words (or actual code).
This is what I have so far:
regex:
\/\*[^\/]*[^\*]*\*\/
text:
/* 1111 */ one /*2222*/two /*3333 */ three/* 4444*/ four /*/**/ five /**/
This expression might work:
\/\*(\/\*\*\/)?\s*([^\/*]+?)\s*(?:\/?\*?\*?\/|\*)
Alternatively, the left and right boundaries could be modified for different inputs.
We can try doing a regex replacement on the following pattern:
/\*.*?\*/
This matches any old-school C style comment. It works by using a lazy dot .*? to match only content within a single comment, before the end of that comment. We can then replace with empty string, to effectively remove these comments from the input.
Code:
Dim input As String = "/* 1111 */ one /*2222*/two /*3333 */ three/* 4444*/ four /*/**/ five /**/"
Dim output As String = Regex.Replace(input, "/\*.*?\*/", "")
Console.WriteLine(input)
Console.WriteLine(output)
This prints the input line, followed by the stripped output (shown here with whitespace collapsed):
one two three four five
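The same replacement can be sketched in Python. Two things here are assumptions beyond the answer: the re.DOTALL flag, added so a comment spanning several lines is also removed, and the whitespace collapsing at the end, added only to make the result easy to read. The helper name strip_c_comments is hypothetical.

```python
import re

# DOTALL lets . match newlines, so multi-line comments are covered too.
C_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

def strip_c_comments(source):
    """Replace each /* ... */ block with a space, then collapse whitespace."""
    without_comments = C_COMMENT.sub(" ", source)
    return " ".join(without_comments.split())

text = "/* 1111 */ one /*2222*/two /*3333 */ three/* 4444*/ four /*/**/ five /**/"
print(strip_c_comments(text))  # one two three four five
```

A plain regex like this does not know about string literals, so a /* inside a quoted string would be treated as a comment start.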

Regexp for string starting with a + and having numbers only [duplicate]

This question already has answers here:
Match exact string
(3 answers)
Closed 4 years ago.
I have the following regex for a string which starts with a + and has numbers only:
PatternArticleNumber = $"^(\\+)[0-9]*";
However this allows strings like :
+454545454+4545454
This should not be allowed. Only the 1st character should be a +, others numbers only.
Any idea what may be wrong with my regex?
You can probably work around this problem by just adding an ending anchor to your regex, i.e. use this:
PatternArticleNumber = $"^(\\+)[0-9]*$";
The problem with your current pattern is that the ending is open. So, the string +454545454+4545454 might appear to be a match. In fact, that entire string is not a match, but the engine might match the first portion, before the second +, and report a match.
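The effect of the open ending can be reproduced with a short Python sketch. The question uses C#, so this is only an analogy, and the helper name is_article_number is hypothetical. re.fullmatch plays the role of the ^...$ anchors by requiring the whole string to match.

```python
import re

candidate = "+454545454+4545454"

# Without an end anchor, the engine happily matches just the leading portion.
prefix = re.match(r"^\+[0-9]*", candidate)
print(prefix.group())  # +454545454

def is_article_number(value):
    """True only if the entire string is a + followed by digits."""
    return re.fullmatch(r"\+[0-9]*", value) is not None

print(is_article_number(candidate))     # False
print(is_article_number("+454545454"))  # True
```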

Regexp - Match any character except "Something.AnyChar" [duplicate]

This question already has answers here:
Need a regex to exclude certain strings
(6 answers)
Closed 9 years ago.
I have a string:
Input:
"Feature.. sklsd " AND klsdjkls 9290 "Feass . lskdk SDFSD __ ksdljsklfsd" NOT "Feuas" "Feature.lskd" OR PUT klasdkljf al9- .s.a, 9a0sd90209 .a,sdklf jalkdfj al;akd
I need to match any character except OR, NOT, AND, "Feature.any_count_of_characters"
The last one is important: it starts with "Feature.
This is followed by any number of characters and then ends with a " character.
I'm trying to solve this using lookahead or lookbehind, but I can't get the expected output, only a portion of the characters that I don't want.
My expected output is
"Feature.. sklsd " AND klsdjkls 9290 "Feass . lskdk SDFSD __ ksdljsklfsd" NOT "Feuas" "Feature.lskd" OR PUT klasdkljf al9- .s.a, 9a0sd90209 .a,sdklf jalkdfj al;akd
All that is in black.
To test it I'm using these links:
http://gskinner.com/RegExr/
http://regexpal.com/
Thanks.
EDIT
Check this link http://regexr.com?37v36
Inside the link I get some expressions matched. But I don't need the expressions that matched. I need the inverse; how can I get it?
Thanks.
Just use
\s*(?:AND|OR|NOT|"[^"]+")\s*
but do a replace operation. That will leave what you want.
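A minimal Python sketch of the replace approach follows. It deliberately narrows the pattern, which is an assumption on my part rather than the answer's regex: only quoted strings beginning with Feature. are removed, and the keywords are wrapped in \b so they do not match inside longer words. The sample string and the helper name remove_excluded are made up for illustration.

```python
import re

# Assumed variant: quoted "Feature...." strings, or the standalone keywords.
EXCLUDED = re.compile(r'"Feature\.[^"]*"|\b(?:AND|OR|NOT)\b')

def remove_excluded(text):
    """Delete the excluded tokens and collapse the leftover whitespace."""
    return " ".join(EXCLUDED.sub("", text).split())

sample = '"Feature.abc" AND foo NOT "Feuas" OR "Feature.x y" bar'
print(remove_excluded(sample))  # foo "Feuas" bar
```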
Your basic problem is that lookbehinds cannot have arbitrary lengths, but you need that. There are workarounds, but a simpler approach is to use a capturing group:
"Feature\.[^"]*" (?:OR|NOT|AND) ([^"])
And your target will be in group 1 of the match.