Select specific text in specific column of CSV with variable text length - regex

I'm working on a CSV file in Notepad++, and I need to match a specific occurrence of a set number of characters in the text. Example data:
19256506_1.MSG,19256506,1.MSG,RE: New Consent Language,
19256505_1.MSG,19256505,1.DOCX,RE: New Consent Language,
19256433_1.MSG,19256433,1.MSG,RE: New Consent Language,
What I need to select is the file extension in the 3rd column, leaving only the number. The problem is that it could be .MSG, .DOCX, .PDF, etc. Basically, I need to select anything in the 3rd column after and including the ., up to and excluding the next ,.
How can I match this using regex?

You could use
^[^,]*,[^,]*,[^.,]*(\.[^,]+)
Use the first group, see a demo on regex101.com.
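As a quick check outside Notepad++, the pattern can be tried with Python's re module. This is only a sketch: it assumes Python's regex flavor treats this pattern the same way Notepad++ does, and the variable names (rows, extensions, stripped) are illustrative. The second pattern is a hypothetical variation that moves the capture group so the extension can be removed instead of extracted.

```python
import re

# Sample rows from the question.
rows = "\n".join([
    "19256506_1.MSG,19256506,1.MSG,RE: New Consent Language,",
    "19256505_1.MSG,19256505,1.DOCX,RE: New Consent Language,",
    "19256433_1.MSG,19256433,1.MSG,RE: New Consent Language,",
])

# Skip two comma-terminated columns, then the part of column 3 before the dot.
# Group 1 is the dot plus everything up to (not including) the next comma.
pattern = re.compile(r"^[^,]*,[^,]*,[^.,]*(\.[^,]+)", re.MULTILINE)
extensions = pattern.findall(rows)
print(extensions)

# Variation (an assumption, not part of the answer): capture what comes
# before the extension and keep only that, so column 3 holds just the number.
strip_ext = re.compile(r"^([^,]*,[^,]*,[^.,]*)\.[^,]+", re.MULTILINE)
stripped = strip_ext.sub(r"\1", rows)
print(stripped)
```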

Press ctrl + h (or Search > Replace) and try replacing:
(?s)^(?:[^,]*,){2}[^,]*(\.[^,]+).*?$
With
\1
Make sure Search Mode is set to Regular Expression
Hit Replace All
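For reference, the same replacement can be sketched in Python, assuming its re module handles this pattern like Notepad++ does. Note that replacing the whole match with group 1 leaves only the extension on each line; rows and result are illustrative names.

```python
import re

rows = "\n".join([
    "19256506_1.MSG,19256506,1.MSG,RE: New Consent Language,",
    "19256505_1.MSG,19256505,1.DOCX,RE: New Consent Language,",
    "19256433_1.MSG,19256433,1.MSG,RE: New Consent Language,",
])

# (?s) lets . match newlines; the lazy .*? followed by $ (in multiline mode)
# still stops at the end of the current line.
pattern = re.compile(r"(?s)^(?:[^,]*,){2}[^,]*(\.[^,]+).*?$", re.MULTILINE)

# Each line is replaced by its captured extension.
result = pattern.sub(r"\1", rows)
print(result)
```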


Match all spaces after a particular string

I'm doing a find and replace in VS Code across a large number of files. I'm looking to replace all spaces after a set of quotes, but only on a specific line.
I can very easily find all spaces using \s+, but I don't understand how to capture only the spaces after a specific string (on one specific line). I've tried positive lookbehinds, but they only match the first space, and I need to match all spaces on that line.
Example code:
variable = "01 - Testing this thing"
I need to find and replace all the spaces between the quotation marks with underscores, but I can't get any regex to match all the spaces between the quotes. I might want to replace the dash(-) as well, but the spaces are more important and I'm struggling to figure it out.
Here is a pretty good workflow.
Open a Search Editor (from the Command Palette or set a keybinding to it).
Use this regex (?<=variable = ")[^"]*.
That will find all matches in all files in your workspace or whatever folders you designate in the file to include filter. I suggest setting the context lines option to 0.
Ctrl+Shift+L to select all your matches. The matches are the 01 - Testing this thing part.
Now do a regular find in that search editor tab - with the Find in Selection option enabled.
Simply doing a find of a single space and a Replace All with _ will make all those changes (in the Search Editor only).
To apply those changes to all the files with your initial search results, use the Apply Search Editor Changes... command from the search-editor-apply-changes extension.
Then you can check to see if the changes were as you expected and save all. It will open all affected files so you can inspect them.
It seems like a lot of steps, but notice that the first regex can be very simple, and then you are doing a simple find/replace in just those selections.
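Outside VS Code, the same two-stage idea (first select the quoted value, then replace spaces only inside that selection) can be sketched in Python. This is an illustration under the assumption that a script is acceptable for the job; line and result are hypothetical names.

```python
import re

line = 'variable = "01 - Testing this thing"'

# Stage 1: the lookbehind selects only the text between the quotes.
value = re.compile(r'(?<=variable = ")[^"]*')

# Stage 2: replace spaces inside each selected value, leaving the rest alone.
result = value.sub(lambda m: m.group(0).replace(" ", "_"), line)
print(result)
```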
Search for a string that has a space between the quotes, and replace it with what comes before and after the space, with the space itself turned into an underscore. You have to apply this as many times as the maximum number of spaces in a string. It can't be done in one regex search-and-replace.
In the Search Bar
Find Regex:
(variable = "[^" ]*) ([^"]*")
Replace:
$1_$2
Then apply Replace All (button) and Refresh (button) until no more searches found.
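The repeat-until-done loop can be sketched in Python to show why several passes are needed: each pass replaces only the first remaining space inside the quotes. The function name underscore_spaces is hypothetical.

```python
import re

# Same pattern as above: group 1 ends just before the first space inside the
# quotes, group 2 is the rest of the quoted value.
pattern = re.compile(r'(variable = "[^" ]*) ([^"]*")')

def underscore_spaces(text):
    """Repeat the single-space replacement until no match is left."""
    passes = 0
    while True:
        text, count = pattern.subn(r"\1_\2", text)
        if count == 0:
            return text, passes
        passes += 1

result, passes = underscore_spaces('variable = "01 - Testing this thing"')
print(result, passes)
```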

Find and replace in Google Spreadsheet using Regular Expressions

I would like to use the Google Spreadsheet Find and Replace function with the "Search using regular expressions" function activated to do a search and replace in my document.
Use case: Some of my cells contain erroneous linefeed characters at the end (leftovers from paste operation by some of the editors).
I'm using the following pattern to successfully find the cells
.*\012$
Is there some syntax for the "Replace with" field that lets me replace the cell's content by the string I found minus the \012 character at the end?
The Google Spreadsheet documentation does not contain any relevant information. https://support.google.com/docs/answer/62754?hl=en
You may use a capturing group (...) in the pattern around the part you want to keep, and use a $1 backreference in the replacement:
(.*)\012$
to replace with $1.
See the regex demo.
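The capture-and-backreference idea can be illustrated in Python. This is only an analogy: Google Sheets uses its own regex engine and writes the backreference as $1, while Python writes it as \1. The helper name strip_trailing_linefeed is hypothetical.

```python
import re

# \012 is the octal escape for a linefeed; group 1 is everything before it.
trailing_lf = re.compile(r"(.*)\012$")

def strip_trailing_linefeed(cell):
    """Replace the match with group 1, dropping the trailing linefeed."""
    return trailing_lf.sub(r"\1", cell)

print(repr(strip_trailing_linefeed("hello\n")))
```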

Notepad++ Regex inverse match

I'm new to regex and trying to figure out how to remove all text from a file open in Notepad++ that does not match #LCxxxx or #LAxxxx. Example below (text I want to keep in bold):
1.In rare cases, reinstalling this MSP file can cause the Citrix Display Driver.....
[From ICAWS760WX86][#0528688]
30.This release includes an enhancement...
[From ICAWS760WX86022][#LA3014]
New Fixes in This Release
1.Windows Server 2008 R2 and Windows Server 2012 R2,...
[From ICAWS760WX86026][#LC2179]
Fixes from Replaced Hotfixes
1.If the Windows Remote Desktop Session Host....
[From ICAWS760WX86004][#LC1180]
I think this is what you're looking for:
(?:[\S\s]*?)(\#L[AC]\d{4})(?:.*)
Replace with:
$1\n
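A minimal Python sketch of this replacement on the sample text, assuming Python's re behaves like Notepad++ here (Python writes the backreference as \1 rather than $1):

```python
import re

text = "\n".join([
    "1.In rare cases, reinstalling this MSP file can cause the Citrix Display Driver.....",
    "[From ICAWS760WX86][#0528688]",
    "30.This release includes an enhancement...",
    "[From ICAWS760WX86022][#LA3014]",
    "New Fixes in This Release",
    "1.Windows Server 2008 R2 and Windows Server 2012 R2,...",
    "[From ICAWS760WX86026][#LC2179]",
    "Fixes from Replaced Hotfixes",
    "1.If the Windows Remote Desktop Session Host....",
    "[From ICAWS760WX86004][#LC1180]",
])

# Lazily consume everything up to the next tag, capture the tag, then eat the
# rest of that line; each match collapses to the tag plus a newline.
result = re.sub(r"(?:[\S\s]*?)(\#L[AC]\d{4})(?:.*)", r"\1\n", text)
print(result)
```

Text after the last tag is not matched, so anything trailing the final #LAxxxx or #LCxxxx line would be left in place.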
You could do a regular expression search and replace, searching for
(#L[AC]....)
where "dot matches newline" is NOT selected. Replace with
\r\n\1\r\n
That will put all the wanted pieces of text on a line on their own.
Next use the "Mark" tab in the find window. Select "Bookmark line", use the same search string as above (the capture brackets are not needed this time, but they are harmless and so can be left), and then click "Mark all". Now all the wanted lines are bookmarked. Use menu => Search => Bookmark => Remove unmarked lines.
There may be a way of doing it all in one go, but that would be a complex regular expression. The method above uses two simple steps.
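The two steps can be mimicked in Python as a rough sketch: first put each tag on its own line, then keep only the lines that contain a tag (the stand-in for bookmarking and removing unmarked lines). The sample text is shortened and the names are illustrative.

```python
import re

text = "\n".join([
    "[From ICAWS760WX86][#0528688]",
    "30.This release includes an enhancement...",
    "[From ICAWS760WX86022][#LA3014]",
    "New Fixes in This Release",
    "[From ICAWS760WX86026][#LC2179]",
    "[From ICAWS760WX86004][#LC1180]",
])

tag = re.compile(r"(#L[AC]....)")  # . does not match newlines by default

# Step 1: surround every tag with line breaks so it sits on a line by itself.
split_text = tag.sub(r"\n\1\n", text)

# Step 2: keep only the lines that contain a tag.
kept = [line for line in split_text.splitlines() if tag.search(line)]
print(kept)
```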
remove all text from file open in Notepad++ that does not match #LCxxxx or #LAxxxx
^.*(\[#L[CA]\d+\])$|^.*$
DEMO
https://regex101.com/r/hO1aL8/2
Notepad++
Do a regular-expression search and replace with the pattern above.
Alternatively, if you want to get rid of the empty lines during the replace operation, use the regular expression below:
^[\S\s]+?(\[#L[CA]\d+\])$
\s : Whitespaces (\t,\r,\n ...)
\S : Any character except whitespaces.
Tested on Notepad++ 6.6.9

Using regex to extract string after certain number of characters

How do I extract the string after a certain number of characters?
Eg. 0 HOPOPT IPv6 Hop-by-Hop Option Y [RFC2460]
I want to select the highlighted part, which is every character after the 57th character. I need to do a replace in the Sublime editor and have to specifically select the highlighted portion. How can I do that?
This should work:
.{57}(.*)
Everything after the 57th character will be captured in group 1.
Alternatively, if your platform supports it, you can use a lookbehind:
(?<=.{57}).*
In Sublime, simply press Ctrl+F to bring up the find options, select the regex option (.*) and enter one of the above regular expressions. Note that what you say you want to highlight actually appears to begin at the 32nd character, so the count in the expression may need adjusting.
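Both variants can be checked quickly in Python, which supports fixed-width lookbehinds such as .{57}. The input line below is synthetic (57 filler characters plus a short tail), since the original column spacing of the example is not preserved here.

```python
import re

# Hypothetical input: 57 filler characters followed by the part we want.
line = "x" * 57 + "tail"

# Variant 1: consume 57 characters, capture the rest in group 1.
captured = re.match(r".{57}(.*)", line).group(1)

# Variant 2: the lookbehind asserts 57 preceding characters, so the match
# itself is only the remainder.
looked_behind = re.search(r"(?<=.{57}).*", line).group(0)

print(captured, looked_behind)
```

The difference matters for replacing: with the lookbehind the whole match is the part to change, while with the capture group the first 57 characters are part of the match too.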

Remove everything before and after variable=int

I'm terrible at regex and need to remove everything from a large portion of text except for a certain variable declaration that occurs numerous times. I'd like to remove everything except for instances of mc_gross=anyint.
Generally we'd need to use "negative lookarounds" to find everything but a specified string. But these are fairly inefficient (although that's probably of little concern to you in this instance), and lookaround is not supported by all regex engines (I'm not sure about Notepad++, and even then it probably depends on the version you're using).
If you're interested in learning about that approach, refer to How to negate specific word in regex?
But regardless, since you are using Notepad++, I'd recommend selecting your target, then inverting the selection.
This will select each instance, allowing for optional white space either side of the '=' sign.
mc_gross\s*=\s*\d+
The following answer over on Super User explains how to use bookmarks in Notepad++ to achieve the "inverse selection":
https://superuser.com/questions/290247/how-to-delete-all-line-except-lines-containing-a-word-i-need
Substitute the regex they're using over there, with the one above.
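As a rough Python analogue of bookmarking matching lines and removing the rest, the sketch below keeps every line that contains the target and drops the others. The sample text is made up for illustration.

```python
import re

# Hypothetical sample text.
text = "\n".join([
    "txn_id=1&mc_gross=19&tax=0",
    "no amount on this line",
    "abc mc_gross = 25 def",
])

target = re.compile(r"mc_gross\s*=\s*\d+")

# Keep whole lines that contain the target (the "bookmarked" lines).
kept_lines = [line for line in text.splitlines() if target.search(line)]
print(kept_lines)
```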
You could do a regular expression replace of ^.*\b(mc_gross\s*=\s*\d+)\b.*$ with \1. That will remove everything other than the wanted text on each line. Note that on lines where the wanted text occurs two or more times, only one occurrence will be retained. In the search, ^.*\b matches from the start of the line to a word boundary before the wanted text; \b.*$ matches everything from a word boundary after the wanted text to the end of the line; the round brackets capture the wanted text for the replacement. If text such as abcmc_gross=13def should be matched and retained as mc_gross=13, then delete both \b tokens from the search.
To remove unwanted lines do a regular expression search for ^mc_gross\s*=\s*\d+$ from the Mark tab, tick Bookmark line and click Mark all. Then use Menu => Search => Bookmark => Remove unmarked lines.
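The replace-then-filter sequence can be sketched in Python, assuming its re module agrees with Notepad++ on these patterns. The sample text and variable names are hypothetical.

```python
import re

# Hypothetical sample text.
text = "\n".join([
    "txn_id=1&mc_gross=19&tax=0",
    "no amount on this line",
    "abc mc_gross = 25 def",
])

# Step 1: on each line that contains the target, keep only the target.
trimmed = re.sub(r"^.*\b(mc_gross\s*=\s*\d+)\b.*$", r"\1", text, flags=re.MULTILINE)

# Step 2: drop the lines that are not exactly the target (the equivalent of
# bookmarking the matches and removing unmarked lines).
kept = [line for line in trimmed.splitlines()
        if re.search(r"^mc_gross\s*=\s*\d+$", line)]
print(kept)
```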
Find what: [\s\S]*?(mc_gross=\d+|\Z)
Replace with: \1
Position the cursor at the start of the text then Replace All.
Add word boundaries \b around mc_gross=\d+ if you think it's necessary.
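A small Python sketch of this single-pass replacement, again assuming equivalent regex behaviour. The \Z alternative lets the final match swallow whatever follows the last occurrence, with an empty group 1. The sample text is made up.

```python
import re

# Hypothetical sample text with two occurrences and trailing junk.
text = "foo mc_gross=12 bar\nmc_gross=7 baz"

# Lazily consume text up to the next occurrence (or the end of the input) and
# replace the whole match with just the captured occurrence.
result = re.sub(r"[\s\S]*?(mc_gross=\d+|\Z)", r"\1", text)
print(result)
```

The retained occurrences end up directly next to each other; using a replacement such as \1 followed by a line break would be one way to separate them.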