Match all spaces after a particular string - regex

I'm doing a find and replace in VS Code across a large number of files. I'm looking to replace all spaces inside a set of quotes, but only on one specific line.
I can very easily find all spaces using \s+, but I don't understand how to capture only the spaces after a specific string (one specific line). I've tried positive lookbehinds, but I can only get them to match the first space, and I need to match all spaces on that line.
Example code:
variable = "01 - Testing this thing"
I need to find and replace all the spaces between the quotation marks with underscores, but I can't get any regex to match all the spaces between the quotes. I might want to replace the dash(-) as well, but the spaces are more important and I'm struggling to figure it out.

Here is a pretty good workflow.
Open a Search Editor (from the Command Palette or set a keybinding to it).
Use this regex (?<=variable = ")[^"]*.
That will find all matches in all files in your workspace or whatever folders you designate in the file to include filter. I suggest setting the context lines option to 0.
Ctrl+Shift+L to select all your matches. The matches are the 01 - Testing this thing part.
Now do a regular find in that search editor tab - with the Find in Selection option enabled.
Simply doing a find of a single space character and Replace All with _ will make all those changes (in the Search Editor only).
To apply those changes to all the files with your initial search results, use the extension search-editor-apply-changes Apply Search Editor Changes... command.
Then you can check to see if the changes were as you expected and save all. It will open all affected files so you can inspect them.
It seems like a few steps, but notice that the first regex can be very simple, and then you are doing a simple find/replace within just those selections.
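If the editor workflow is not an option, the same two-stage idea (first select the quoted value, then replace the spaces inside that selection) can be sketched in Python. This is only an illustration under assumptions: the function name underscore_quoted is made up here, and the line is assumed to contain the literal prefix variable = " exactly as in the question.

```python
import re

# Stage 1: the same simple regex as in the Search Editor step.
# It selects the text between the quotes that follow `variable = "`.
QUOTED_VALUE = re.compile(r'(?<=variable = ")[^"]*')


def underscore_quoted(line: str) -> str:
    """Replace every space inside the selected value with an underscore."""
    # Stage 2: a plain string replace, applied only inside each match.
    return QUOTED_VALUE.sub(lambda m: m.group(0).replace(" ", "_"), line)


if __name__ == "__main__":
    print(underscore_quoted('variable = "01 - Testing this thing"'))
```

Lines that do not contain the prefix are returned unchanged, which mirrors restricting the replace to the selected matches.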

You search for a string that has a space between the quotes, and replace it with the text before and after that space, joined by an underscore. You have to apply this as many times as the maximum number of spaces in any one string. It can't be done in a single regex search-and-replace.
In the Search Bar
Find Regex:
(variable = "[^" ]*) ([^"]*")
Replace:
$1_$2
Then apply Replace All (button) and Refresh (button) until no more matches are found.
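The "repeat until nothing matches" loop can be sketched in Python as follows. It uses the same find and replace pair as above; the function name replace_until_stable is an assumption made for this example.

```python
import re

# Same pattern as the Search Bar regex: one space per quoted string per pass.
ONE_SPACE = re.compile(r'(variable = "[^" ]*) ([^"]*")')


def replace_until_stable(text: str) -> str:
    """Apply the single-space replacement repeatedly until no match is left."""
    while True:
        text, count = ONE_SPACE.subn(r"\1_\2", text)
        if count == 0:
            return text


if __name__ == "__main__":
    print(replace_until_stable('variable = "01 - Testing this thing"'))
```

Each pass replaces the first remaining space in every matching string, so the number of passes equals the largest number of spaces in any one string, plus a final pass that finds nothing.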

Related

Regex NotePad++ or batch script to find and replace double bracketed text with CR LF -- would prefer NP++

I managed to do most of my conversion in VBA Macro (Word > txt) but some changes were made also that I could not forego or get around. Unfortunately, I had not been in the habit of using styles and precise formatting in my docs... (Which is why a PanDoc conversion did not "pan" out well, if you'll excuse the pun.)
In my docs, I was using bold text/lines for in-text titles (not Heading 2 alas) but as I was converting mid-sentence one or two-word bold phrases into phrases to go between double square brackets, the makeshift titles/headings were also changed to [[some title]] format in the process.
With Find and Replace (a batch script that goes through all files in a folder would also do), I would like to search for each and any number of instances of CRLF [[some title CRLF]]CRLF and replace the brackets with ** (to make the title bold), or perhaps ## to make the headings I was missing back in MS Word (I would of course need the line breaks as well).
For better understanding, please see attached picture here:
I am fairly sure that all instances are similarly syntaxed. If not, I may be able to tailor your regex code to differing instances later on.
As you can see, I was trying to do it in two steps, but that's not good, because the second step (which I couldn't even get right) would probably have altered other text I need intact (there must be sentences that start with double brackets after CRLF).
I would need the two steps in one so that only the targeted double bracketed text would be changed to bold or Heading 2.
Basically, what I could not do is find the proper regex for matching double CRLF-ed, square-bracketed text of any number of words that may occupy more than one line and starts with a capital letter. I would need an empty line above and below the title as indicated in the image (the VBA macro somehow made two instances of CRLF and carried the brackets to a new line, which I do not like, either).
EDIT.
In the meantime I managed to cook something up but now I couldn't insert the CRLF in front of the match string. At this point this is not enough as other instances are also changed, even lowercase in-line items, for some reason...
Regex:
\[\[([A-Z][\S\s]+?)\]\]
Substitution:
## $1\r\n
https://regex101.com/r/mH6B9N/1
Since then, I made improvements towards what I wanted (I had to test in NotePad++ and not Regex101, for different results), but now in multiple documents I have found match across spill-over lines, as described in here:
Single line regex search in Notepad++
Is it possible that I cannot do what I want? The problem is having non-title text strings having line-break, double brackets and capitalized letters.
What it looks like in other documents:
See here.
I circled around with red in image for clarification. See also:
https://regex101.com/r/8XsIGx/1
Is it possible to match a certain word like "címnél" and not execute on that match if that word is present in a line?
Thanks very much in advance,
F.
You can use
(?s)\R\K\[\[((?:(?!\[\[|]]).)*)\R*]](?=\R)
Replace with ## $1. See the regex demo.
Details:
(?s) - equivalent of the . matches newline option
\R - a line break sequence
\K - omit the text matched so far (the newlines)
\[\[ - a [[ text
((?:(?!\[\[|]]).)*) - Group 1: any char, as many as possible occurrences, that does not start a [[ or ]] char sequence
\R* - zero or more line breaks
]] - a ]] text
(?=\R) - immediately to the right, there must be a line break.
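For testing the idea outside Notepad++, here is a rough Python adaptation. It is not the same regex: Python's re module has neither \R nor \K, so this sketch assumes the text has already been normalized to \n line endings and uses a lookbehind for the preceding line break. It also makes the group lazy so that the line breaks before ]] stay out of the captured title. The function name brackets_to_headings is hypothetical.

```python
import re

# Adapted pattern: (?<=\n) stands in for \R\K, and \n for \R.
TITLE = re.compile(
    r"(?<=\n)\[\[((?:(?!\[\[|\]\]).)*?)\n*\]\](?=\n)",
    re.DOTALL,
)


def brackets_to_headings(text: str) -> str:
    """Turn [[Title]] blocks that start a line and end before a line break into '## Title'."""
    return TITLE.sub(r"## \1", text)


if __name__ == "__main__":
    sample = "intro\n\n[[Some Title\n]]\n\nBody with [[inline link]] here.\n"
    print(brackets_to_headings(sample))
```

Bracketed text in the middle of a line is left alone, because it is not preceded by a line break.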

Regex in search & replace: avoid fixed length of lookaround

In a long corpus of text, I want to make some corrections in certain
environments. However, I am encountering problems when using regex with text
editors. I switched to gedit to have an editor which supports regex in
search & replace.
Crucially, I only want to make changes if the line starts with a certain
pattern (\nm or \mb). The problem is that the element that I want to
replace (o' -> o'o) is not at a fixed length from the beginning of the line
and I can't include the regex in the lookbehind (the lookbehind fails).
Is there any way to include what I am looking for in a simple text editor
regex? Or is this already a step where I have to learn how to script in, for
example, Python?
This is what the regex looks like so far.
(?<=\\(nm|mb)).*o'(?=(q|w|r|t|z|p|s|d|f|g|h|j|k|l|x|c|v|b|n|m|a|i|u|e))
Of course, I can't apply .* in the replace without losing its content.
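Put a capture group around .* and a back-reference in the replacement. Note that (nm|mb) inside the lookbehind would itself be capture group 1, so make it non-capturing to keep \1 pointing at the .* group:
Find: (?<=\\(?:nm|mb))(.*)o'(?=[qwrtzpsdfghjklxcvbnmaiue])
Replace: \1o'o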
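If scripting turns out to be easier, the task could be sketched in Python along these lines. It avoids the lookbehind entirely by first selecting whole lines that start with \nm or \mb and then replacing inside each selected line, so every occurrence on the line is handled in one go. The function name fix_corpus is made up, and the sketch assumes the markers are a literal backslash followed by nm or mb at the very start of a line.

```python
import re

# Lines that begin with the literal markers \nm or \mb.
MARKED_LINE = re.compile(r"^\\(?:nm|mb).*$", re.MULTILINE)
# o' followed by one of the letters listed in the question's lookahead.
TARGET = re.compile(r"o'(?=[qwrtzpsdfghjklxcvbnmaiue])")


def fix_corpus(text: str) -> str:
    """Replace o' with o'o, but only on lines starting with \\nm or \\mb."""
    return MARKED_LINE.sub(lambda m: TARGET.sub("o'o", m.group(0)), text)


if __name__ == "__main__":
    print(fix_corpus("\\nm ko'ta mo'ku\n\\tx ko'ta\n"))
```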

Using regex to store text before and after two different characters

I have a series of files that have this format:
01x05e - Some text (Some more text)
01x05f - Some text (Some more text)
01x05g - Some text (Some more text)
What I'd like to do is strip them to produce this:
01x05e - Some more text
01x05f - Some more text
01x05g - Some more text
To do this, I am using the Bulk Rename Utility. I thought I had some regex I could use to do this:
-.*?\(
This successfully matches everything between the "-" and the "(" above. I was hoping I could use it to remove all of that text, before asking the Bulk Rename Utility to do a very trivial removal of the final character in every filename (i.e. the ")").
However, in Bulk Rename Utility I can't just match the text and replace it with no content to remove it. Instead, I have to specify a replace criteria and nothing I'm entering seems to be working. Any time I enter any data in the "replace" section it simply erases all of my filename.
This leads me to think I need to approach the regex differently and find some way of grabbing the data before the "-" and grabbing the data between both "(" and ")" and placing them together.
However, I've no idea how to even start doing this. Is this the correct approach, or am I missing something obvious with regards to the match and replace of the regex I currently have?
The Bulk Rename Utility has a support website, including a forum that features a section providing regex rename support, here.
It seems like you want to keep everything up to the first hyphen plus the text inside the parentheses, and delete what lies between them. Capturing groups are your friend here:
^([^-]* -) - This will match from start to first hyphen after space.
[^(]* - This will match everything except open paren.
\((.*)\)$ - This will match things inside the parens.
You can put it all together:
^([^-]* -)[^(]*\((.*)\)$
And replace it with just the first group and the last group:
\1 \2
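To check the combined pattern before running it on real files, a small Python sketch can apply it to the sample names from the question. The function name strip_middle is an assumption; nothing here is specific to Bulk Rename Utility.

```python
import re

# Group 1: start of the name up to and including " -".
# Group 2: the text inside the parentheses at the end of the name.
NAME = re.compile(r"^([^-]* -)[^(]*\((.*)\)$")


def strip_middle(name: str) -> str:
    """Keep the prefix and the parenthesized text, dropping what lies between."""
    return NAME.sub(r"\1 \2", name)


if __name__ == "__main__":
    for name in ("01x05e - Some text (Some more text)",
                 "01x05f - Some text (Some more text)"):
        print(strip_middle(name))
```

Names that do not fit the pattern (for example, ones without parentheses) are returned unchanged.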

Remove everything before and after variable=int

I'm terrible at regex and need to remove everything from a large portion of text except for a certain variable declaration that occurs numerous times. I'd like to remove everything except for instances of mc_gross=anyint.
Generally we'd need to use "negative lookarounds" to find everything but a specified string. But these are fairly inefficient (although that's probably of little concern to you in this instance), and lookaround is not supported by all regex engines (not sure about notepad++, and even then probably depends on the version you're using).
If you're interested in learning about that approach, refer to How to negate specific word in regex?
But regardless, since you are using notepad++, I'd recommend selecting your target, then inverting the selection.
This will select each instance, allowing for optional white space either side of the '=' sign.
mc_gross\s*=\s*\d+
The following answer over on Super User explains how to use bookmarks in Notepad++ to achieve the "inverse selection":
https://superuser.com/questions/290247/how-to-delete-all-line-except-lines-containing-a-word-i-need
Substitute the regex they're using over there, with the one above.
You could do a regular expression replace of ^.*\b(mc_gross\s*=\s*\d+)\b.*$ with \1. That will remove everything other than the wanted text on each line. Note that on lines where the wanted text occurs two or more times, only one occurrence will be retained. In the search the ^.*\b matches from start-of-line to a word boundary before the wanted text; the \b.*$ matches everything from a word boundary after the wanted text until end of line; the round brackets capture the wanted text for the replacement text. If text such as abcmc_gross=13def should be matched and retained as mc_gross=13 then delete the \bs from the search.
To remove unwanted lines do a regular expression search for ^mc_gross\s*=\s*\d+$ from the Mark tab, tick Bookmark line and click Mark all. Then use Menu => Search => Bookmark => Remove unmarked lines.
Find what: [\s\S]*?(mc_gross=\d+|\Z)
Replace with: \1
Position the cursor at the start of the text then Replace All.
Add word boundaries \b around mc_gross=\d+ if you think it's necessary.
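Outside Notepad++, the "keep only the matches" idea can be sketched in Python by collecting the matches directly rather than deleting everything else. The function name keep_mc_gross is hypothetical, and the pattern is the one from the first answer (optional whitespace around the = sign) with word boundaries added.

```python
import re

# mc_gross, optional whitespace around "=", then an integer.
MC_GROSS = re.compile(r"\bmc_gross\s*=\s*\d+\b")


def keep_mc_gross(text: str) -> list[str]:
    """Return every mc_gross=<int> occurrence, in order of appearance."""
    return MC_GROSS.findall(text)


if __name__ == "__main__":
    sample = "a=1&mc_gross=12&x\nfoo mc_gross = 7 bar"
    print("\n".join(keep_mc_gross(sample)))
```

Unlike the per-line replace, this keeps every occurrence even when a line contains two or more of them.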

How to replace . in patterned strings with / in Visual Studio

I have lot of code in our solution like this:
Localization.Current.GetString("abc.def.gih.klm");
I want to replace it with:
Localization.Current.GetString("/abc/def/gih/klm");
the number of dots (.) is variable.
How can I do this in Visual Studio (2010)?
Edit: I want to replace strings in code (in VS 2010 editor), not when I run my application
Thank you very much
Misread your request.
If you press Ctrl+Shift+H and put this as your find string
{Localization\.Current\.GetString\("[A-Za-z\/]+}(\.)
Then put this as your replace with:
\1/
And then in find options tick use regular expressions.
This will find the first dot and replace it. Clicking find next will get the second one etc. You will have to keep doing a replace all until they are all done. Someone can probably improve that!
Try this in the "Replace in Files" Dialogue with "Use Regular expressions"
Find what:
{[^"]*"[^"]*}\.
If you want to be a bit more strict on the allowed characters between the quotes then try this
{[^"]*"[A-Za-z.]*}\.
This would allow only ASCII letters and dots between the quotes.
Replace with
\1/
It will find the first " in a row and replace the last dot before the next " with /.
The problem is that it replaces only the last occurrence of a dot within the first set of "" in each row, so you would have to run this a few times until you get the message "The text was not found".
And be careful if there is a wanted dot between the "": it will be replaced as well.
EDIT
You can't use this in Visual Studio, as its Find and Replace has its own flavour of regex, not the one used in the .NET regex classes, and I don't think you can do lookbehind with it.
you can use this regex:
(?<=\("[\w.]+)\.
in the find and replace, replacing by /
Breaking it down:
Match a dot (the . at the end)
Which is preceded by (positive lookbehind) a bracket ( followed by a " and then any number of characters which are word characters or a dot (dots don't need to be escaped in a character class)
if you are sure that the text that you want to replace only ever has the Localization.Current.GetString bit then you could include that in the lookbehind of the regex:
(?<=Localization\.Current\.GetString\("[\w.]+)\.
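For running the replacement as a script over source files rather than in the editor, a Python sketch might look like this. Python's re module only accepts fixed-width lookbehinds, so the variable-length (?<=...[\w.]+) form above cannot be used there; instead this sketch captures the whole quoted key and rewrites it in a callback. It also adds the leading slash shown in the question's desired output. The function name dots_to_slashes is an assumption.

```python
import re

# Group 1: the call up to the opening quote, group 2: the key, group 3: the closing quote.
CALL = re.compile(r'(Localization\.Current\.GetString\(")([\w.]+)(")')


def dots_to_slashes(code: str) -> str:
    """Rewrite "abc.def" keys inside GetString calls as "/abc/def"."""
    def rewrite(m: re.Match) -> str:
        key = "/" + m.group(2).replace(".", "/")
        return m.group(1) + key + m.group(3)

    return CALL.sub(rewrite, code)


if __name__ == "__main__":
    print(dots_to_slashes('Localization.Current.GetString("abc.def.gih.klm");'))
```

Because / is not part of [\w.], keys that have already been converted no longer match, so running the script a second time leaves them untouched.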