Removing text from around numbers in Notepad++

I have a large subset of data that looks like this:
MyApp.Whatever\app.config(115): More stuff here, but possibly with numbers or parenthesis...
I'd like to create a replace filter in Notepad++ that finds the line number "(115):" and replaces it with a tab character followed by the same number.
I've been trying filters such as (\(\d+\):) and (\(\[0-9]+\):), but they keep returning the entire value in the \1 output.
How would I create a filter using Notepad++ that would successfully replace (115): with tab character + 115?

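Use a lazy quantifier: (\(\d+?\):), where the ? stops + from being greedy (although \d cannot run past the closing parenthesis anyway, so greediness is not the real problem here). Also, since everything is inside one pair of unescaped parentheses, the whole match is grouped and treated as \1.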
If this were Perl, I'd write \((\d+?)\):, which captures only the digits inside the parentheses.
Edit:
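I just talked with my colleague; he suggested s/\((\d+)\)/\t\1/, and if you need the app.config part in front, you can simply add it to the front of the pattern. Note that this pattern leaves the colon behind; add : after \) to remove it as well.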

This should work for your needs.
Find what:
\((\d+)\):
Replace with:
\t$1

Replacing (\(\d+\):) with \t\1 keeps the parentheses and the colon, because you've included them in the group (the outer parentheses); I think that's what you mean by "they keep returning the entire value."
Instead of escaping the inner parentheses, escape the outer ones (and leave the inner ones unescaped), as the other answers have suggested: \((\d+)\): matches a left parenthesis, then matches and captures a group of digits, then matches a right parenthesis and a colon. Replacing that with \t\1 removes the parentheses and the colon, since they are not part of the captured group.
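A quick way to check the answers above outside Notepad++ is to run the same pattern through JavaScript's String.prototype.replace. This is a minimal sketch, assuming JavaScript's regex flavor treats this simple pattern the same way Notepad++ does; the second sample line is made up from the question's format to show that a parenthesized number without a colon is left alone.

```javascript
// Sample lines in the question's format (the second one is a made-up example).
const lines = [
  "MyApp.Whatever\\app.config(115): More stuff here, but possibly with numbers or parenthesis...",
  "MyApp.Whatever\\app.config(7): Something else (42) here",
];

// \( and \) match literal parentheses; (\d+) captures only the digits.
const lineNumber = /\((\d+)\):/;

// The JS string "\t$1" holds a real tab followed by $1 (the first capture group).
// Without the g flag, only the first "(digits):" on each line is replaced.
const fixed = lines.map((line) => line.replace(lineNumber, "\t$1"));

for (const line of fixed) console.log(JSON.stringify(line));
```

In Notepad++ itself, the same thing is done with Find what \((\d+)\): and Replace with \t$1 (or \t\1) in Regular expression search mode.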

Related

Regex to find if all the characters in a word are the same specific character

I have a set of words coming in one by one, like aa, ##, ???, ~~~, ?~, etc.
I need a regex to find whether any of these words consists only of ? or only of ~.
Of the above input examples, ??? and ~~~ should match but not the others.
I tried ^[\s?]*$ and ^[\s~]*$ separately and they work, but I am trying to combine them.
^[\s?||~]*$ doesn't work because it also accepts ?~ as valid.
Any help?
You can use this regex, which looks for a string starting with a ~ or a ?, and then asserts that every other character in the string is the same as the first one using a backreference (\1):
^([~?])\1+$
You need to use a backreference to achieve the desired result.
If you want only ~ or ?, use:
^([~?])\1+$
If you want any single character repeated, use:
^(.)\1+$
Explanation: (.) or ([~?]) captures the first character.
Then \1+ matches that same character one or more times (a backreference).
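To see the two backreference patterns side by side, here is a small JavaScript sketch that tests the words from the question against both. It assumes each word arrives as its own string, so ^ and $ anchor to the whole word.

```javascript
// First char must be ~ or ?, and every following char must repeat it.
const onlyTildeOrQuestion = /^([~?])\1+$/;
// First char can be anything, and every following char must repeat it.
const anyRepeatedChar = /^(.)\1+$/;

const words = ["aa", "##", "???", "~~~", "?~"];

for (const w of words) {
  // Neither regex has the g flag, so .test() keeps no state between calls.
  console.log(w, onlyTildeOrQuestion.test(w), anyRepeatedChar.test(w));
}
```

Because \1+ requires at least one repetition, a lone ? or ~ does not match; \1* would accept single characters as well.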
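You want to match lines that consist entirely of tildes or question marks. In basic regex syntax (POSIX BRE, as used by GNU grep or vim), that would be ^\(~\|?\)*$; there, the parentheses that make a group and the vertical bar that does the 'or' need to be backslash-escaped. Note, however, that because the whole group is repeated, this also accepts mixed strings such as ?~ (and the empty string).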

RegEx help for NotePad++

I need help with a regex; I just can't figure it out. I need to search for broken hashtags that contain a space.
For example, the strings look like this:
#ThisIsaHashtagWith Space
But the words "With Space" can also appear on their own, and I don't want to replace those.
What matters is that the string starts with "#", followed by any characters, and then the words "With Space", which I want to replace with "WithSpace" to repair the hashtags.
I have a document with 10k of these broken hashtags, and I've been trying all day without success.
I have tried the following regex on regex101.com:
^#+(?:.*?)+(With Space)
Even though I think it works on regex101.com, it doesn't work in Notepad++.
Any help is appreciated.
Thanks a lot.
BR
In your current regex you match one or more # characters, then any characters, and then capture (With Space) in a group.
You could change the capturing group to capture the first part of the match.
(#+.*?)With Space
Then you could use that group in the replacement:
$1WithSpace
As an alternative, you could first match # followed by zero or more characters, matched lazily with .*?, and then use \K to reset the starting point of the reported match.
Then match With Space.
#+(?:.*?)\KWith Space
In the replacement use WithSpace
The + after # allows one or more # characters; drop it if you only want to match a single #. If the match should start at the beginning of the line, you could add the anchor ^ at the start of the regex.
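The capture-group version can be tried in JavaScript as a sketch. JavaScript has no \K, so only the $1WithSpace variant is shown, and the second sample line is an invented example of "With Space" appearing without a #.

```javascript
// (#+.*?) captures the # and the shortest run of characters before "With Space".
// In JavaScript, . does not match a newline, so a match cannot span lines.
const brokenHashtag = /(#+.*?)With Space/g;

const text = [
  "#ThisIsaHashtagWith Space",
  "Plain sentence With Space stays as it is", // no #, so it is not touched
].join("\n");

// $1 re-inserts the captured "#..." prefix; the space is dropped.
const repaired = text.replace(brokenHashtag, "$1WithSpace");
console.log(repaired);
```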
Try using ^(#.+?)(With\s+Space) as your regex; it also matches multiple spaces and tab characters. If you want it to affect multiple lines (for example when testing on regex101), use the gmi flags. I just tried it with the following two strings, each on a separate line in Notepad++:
#blablaWith Space
#hello###$aWith Space
With the Replace with value set to $1WithSpace, both Replace All and replacing one at a time give the following result:
#blablaWithSpace
#hello###$aWithSpace
Feel free to comment with other strings you want replaced. Also, make sure you have selected the Regular expression search mode in Notepad++.
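A JavaScript sketch of this answer, using the g and m flags so ^ matches at the start of each line. The third input line, with tabs and spaces between the words, is a hypothetical case added to show what \s+ does; the i flag is left out here so a lowercase "with space" would not be touched.

```javascript
// ^ with the m flag anchors at the start of every line;
// \s+ accepts any run of spaces and/or tabs between "With" and "Space".
const pattern = /^(#.+?)(With\s+Space)/gm;

const input = "#blablaWith Space\n#hello###$aWith Space\n#tabbedWith\t \tSpace";

// "$1WithSpace" keeps group 1 and replaces group 2 with the repaired word.
const output = input.replace(pattern, "$1WithSpace");
console.log(output);
```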
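Try this: (#.*)( )
Note that .* is greedy, so on each line this removes only the last space after the #, whether or not it is part of "With Space".
I tried this in Notepad++; you should be able to simply use Replace All with $1. Make sure you set the search mode to Regular expression first.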
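const str = "#ThisIsAHashtagWith Space";
// (#.*) greedily captures everything up to the last space; ( ) matches that space
console.log(str.replace(/(#.*)( )/g, "$1")); // #ThisIsAHashtagWithSpace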

Regex - Replace everything attached to an expression, but not the expression itself

If it matters, I'm working with Python/R for this particular script, but I think this should be a general regex question.
I have lines in the following format:
"_id" : ObjectID("34z83b3853e820x583203"),
This happens millions of times in a particular file. I want to convert all of these to:
"_id" : "34z83b3853e820x583203",
The catch is, I can't just replace any "), with ", as there may be other instances in the file.
Replacing ObjectID(" with " should be trivial.
So essentially, I have to find a run of 15+ characters mixing letters AND numbers, immediately followed by "),
Once found, I need to preserve that string, and just delete the ).
Is there a good way to go about this that I'm missing? Finding an expression and preserving pieces of it?
My initial idea was to use a lookbehind:
(?<=[a-zA-Z0-9]{15,}")\)
in the hope that this would find a ) that is preceded by a string of 15+ alphanumeric characters. However:
1) I do not believe this means it has to be alpha AND numeric, just alpha or numeric or both.
2) It's not catching the desired parenthesis regardless.
You can do both steps together (removing the opening ObjectID( and the closing parenthesis):
Regex: ObjectID\((\"[a-zA-Z0-9]{15,}\")\)
(\"[a-zA-Z0-9]{15,}\") is the first capturing group; it includes the quotes and the 15 or more alphanumeric characters between them, as you mentioned. Since it is the first capturing group, it is referenced as $1 (in Python's re.sub, write \1 instead).
ObjectID\( is the literal text ObjectID followed by the opening parenthesis \(
\) is the closing parenthesis at the end
Replace with: $1
Hope this helps!
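Although the question mentions Python/R, the same pattern can be sketched with JavaScript's replace, where $1 refers to the first capturing group. The \" escapes from the answer are unnecessary in JavaScript, so plain quotes are used; the second sample line is made up to show that other parentheses are left alone.

```javascript
// Group 1 keeps the quotes plus the 15+ alphanumeric id; ObjectID( and ) are outside it.
const objectId = /ObjectID\(("[a-zA-Z0-9]{15,}")\)/g;

const doc = [
  '"_id" : ObjectID("34z83b3853e820x583203"),',
  '"note" : "keep this (text)",', // hypothetical line that must not change
].join("\n");

// Replacing the whole match with $1 drops the wrapper and keeps the quoted id.
const converted = doc.replace(objectId, "$1");
console.log(converted);
```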

Regex Expression to allow comma only inside a string (within quotes) and not outside it

I am kind of new to regex. I am looking for a regex to use as a constraint that does not allow commas outside a string.
My input is like
"1,212121,121212","Extra_data"
Here the regex should not check for commas in the first value inside the quotes, "1,212121,121212", but should check what comes after the quotes, including ,"Extra_data". In short, the expression should allow commas only inside quotes, not outside.
Kindly help me with the expression.
I think this is what you're looking for: essentially a group of numbers or commas surrounded by quotes, followed by a comma and another phrase (not necessarily numbers) in quotes. Capturing group #1 gives you "1,212121,121212" and capturing group #2 gives you ,"Extra_data"
("[\d,]+")(,"[^"]+")
It would be helpful to see more examples of your input. The biggest remaining question is whether the first group always contains only numbers and commas, or whether it sometimes contains other characters such as letters or underscores. If it contains only numbers and commas, as I've assumed, this should work; otherwise it will not.
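A short JavaScript sketch of what the two capturing groups contain for the sample input from the question; like the answer, it assumes the first value contains only digits and commas.

```javascript
// Group 1: a quoted run of digits/commas. Group 2: the separator comma plus the next quoted value.
const twoFields = /("[\d,]+")(,"[^"]+")/;

const line = '"1,212121,121212","Extra_data"';
const m = line.match(twoFields);

console.log(m[1]); // "1,212121,121212"  (the quotes are part of the group)
console.log(m[2]); // ,"Extra_data"
```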
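Edit: a variant that tolerates whitespace around the separating comma and captures only that comma and the following quoted value:
"\s*(,\s*"[^"]+")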
Try this:
".*?(?=,).*?"
It is intended to match only quoted strings that contain commas; with your example input, it matches "1,212121,121212".
Try the following regex:
"[^"]*"(,)[^"]*"[^"]*"
It will capture the commas you need. But note that PHP has no support for multiple captures of the same group, i.e. in your case:
If the input is: "1,212121,121212","Extra_data","hel,lo","a,bc"
It will capture commas before "Extra_data" and "a,bc" but will exclude the comma before "hel,lo". For that you'll have to use recursion.
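To see which commas that regex actually captures on the longer input, here is a JavaScript sketch using matchAll (which requires the g flag). The slicing is only for display and relies on the first quoted value containing no quote characters, which [^"]* guarantees.

```javascript
// Each match is: a quoted value, the captured comma, then the next quoted value.
const separatorComma = /"[^"]*"(,)[^"]*"[^"]*"/g;
const input = '"1,212121,121212","Extra_data","hel,lo","a,bc"';

const matches = [...input.matchAll(separatorComma)];

// indexOf('"', 1) finds the closing quote of the first quoted value,
// so +1 is the captured comma and +2 is where the following value starts.
const followedBy = matches.map((m) => m[0].slice(m[0].indexOf('"', 1) + 2));

console.log(matches.length);
console.log(followedBy);
```

The comma before "hel,lo" lies between the two matches, so it is never captured.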
You can try using this regex.
(^,)|("\s*,\s*")|(,$)
If you find any match for this regex, then the string will be invalid.

Regex match the space between matches as well?

I'm not new to regex (or SO), but I can't seem to find a solid solution for matching the leftover spaces between matches.
For instance, I want to know what is inside quotes, and what is not, and do things to both.
Getting quotes is easy: (\".+?\"|'.+?') = quoteMatch
but making another match group to select everything else is not.
The closest I've gotten is quoteMatch+'|(.)'. This separates my quote groups from my everything-else groups, but it doesn't group the 'else' characters together.
Trying quoteMatch+'|(.+)' selects everything together and quoteMatch+'|(.+?)' puts me back a step.
I imagine I need to find a way to make the first match more greedy than the second, but anything I do to make it greedy makes it start taking over multiple quotes and the things in between (i.e. match = "quote1" things in between "quote2").
I've also looked into using the split function, but it doesn't return what the split was on, and it is not quite as elegant a solution as I imagine must exist.
Thank you for any help.
Move the match for selecting the other character to the inside of the capturing group as an alternation:
(\".+?\"|'.+?'|.+?(?=["']|$))
Then you can use a positive lookahead such as (?=["']|$) in order to match until a quote or the end of the line.
In doing so, an input of:
before quotes "quote1" in between quotes "quote2" after quotes
Would return:
(before quotes ), ("quote1"), ( in between quotes ), ("quote2"), ( after quotes)
As a side note, you can also combine the first two alternations by using a backreference to close the quote:
((['"]).+?\2|.+?(?=["']|$))
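Both tokenizing patterns can be compared with a JavaScript sketch; with the g flag, String.prototype.match returns every full match, giving the quoted and unquoted runs in order. The single-quoted sample is an invented extra input.

```javascript
// Quoted runs, or any other run up to the next quote / end of input.
// (\" is just an escaped ", which JavaScript accepts.)
const alternation = /(\".+?\"|'.+?'|.+?(?=["']|$))/g;

// Backreference variant: \2 closes the quote with the same character that opened it.
const backref = /((['"]).+?\2|.+?(?=["']|$))/g;

const sample = 'before quotes "quote1" in between quotes "quote2" after quotes';
const mixed = "x 'single' y \"double\" z";

// match() with the g flag resets lastIndex and returns an array of full matches.
console.log(sample.match(alternation));
console.log(sample.match(backref));
console.log(mixed.match(backref));
```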