Regex/Notepad++ to extract text from file


I have multiple files with text in parentheses that I need to extract from the file (or delete everything else in that file). I have a method that works, but it only works for one file at a time. Here is an example of the kind of files I'm dealing with.
(is it on?)
[3.87595 3.87595 0 ]xsh
grestore
NDTMRY+Helvetica[8.5 0 0 -8.5 0 0 ]msf
mo
(NO)
The method I have used is as follows:
In Notepad++, under the Mark tab of the Find/Replace dialog, Find: ^\(.*?$ (with Bookmark line checked)
Search > Bookmarks > Remove Unmarked Lines
Is there a way, or a better way, to do this for multiple files at a time, either in Notepad++ or in another language such as Python?
Thanks!

Yes, you can remove every line that does not start with ( from multiple files at once.
Here are the instructions:
Press Ctrl+H and click the Find in Files tab
In Find what, type ^(?!\().*\R* and keep Replace with empty
Add file masks in Filters
Select the initial directory in Directory
Make sure the Regular expression radio button is selected
Adjust the other options and hit the Replace in Files button
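Since the question also asks about Python, here is a minimal sketch of the same filter as a script. It is not part of the Notepad++ answer above; the function names and the default "*.txt" mask are assumptions chosen for illustration, and process_folder overwrites files in place, so try it on a copy first.

```python
import re
from pathlib import Path

# Same idea as the Notepad++ regex: keep only lines whose first character is "(".
PAREN_LINE = re.compile(r"^\(.*$", re.MULTILINE)


def keep_paren_lines(text):
    """Return only the lines of `text` that start with "(", joined by newlines."""
    return "\n".join(PAREN_LINE.findall(text))


def process_folder(folder, pattern="*.txt"):
    """Rewrite every file in `folder` matching `pattern` (a hypothetical mask) in place."""
    for path in Path(folder).glob(pattern):
        path.write_text(keep_paren_lines(path.read_text()))
```

Calling process_folder on the directory that holds the files would apply the filter to each matching file; keep_paren_lines can be tested on a string first.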

Related

Search for multiple strings in several files with Sublime 3 using AND

My previous (similar) question, "Search for multiple strings in several files with Sublime 3", was answered with a way to search for multiple strings in multiple files in SublimeText, using the regex OR operator:
Find: (string1|string2)
Where: <open folders>
This works perfectly for finding files where either string1 OR string2 is present. What I need now is to find, among lots of files, those where both strings are present. I.e., I need to use the AND operator.
I looked at the questions "Regular Expressions: Is there an AND operator?" and "Regex AND operator" and came up with the following recipes:
(?=string1)(?=string2)
(?=.*string1)(?=.*string2)
(string1 string2)
(string1\&string2)
but none of them work.
So the question is: how can I search multiple strings in several files at once with SublimeText?
(I'm using SublimeText 3103)
Add: the strings are not necessarily in the same line. They can be located anywhere within each file. For example, this file:
string1 dfgdfg d dfgdf
sadasd
asdasd
dfgdfg string2 dfgdfg
should trigger a match.

Open Sublime Text and press
Shift+Ctrl+F
or choose Find in Files from the Find menu; the shortcut above opens the same panel.
The ... button next to the Where field offers 6 options, including Add Folder, Add Open Files, and Add Open Folders.
To search strings that occur in the same line
Use the following regex for your and operation
(?=.*string1)(?=.*string2)
I am using the following regex
(?=.*def)(?=.*s)\w+
(the trailing \w+ makes it easier to see which line matched, as described below)
and I am searching within the currently open files.
Make sure the Use Buffer option is enabled (the one just before Find). It displays the matches in a new file. Also make sure the Show Context option (the one just before Use Buffer) is enabled. This displays the exact line that matches. Now click Find on the right side.
In the output I got, line 1316 (numbered on the left) has a different background color from line 1315: 1316 is the matched line in the designation file.
Six files were open in total when I ran this regex.
For finding strings anywhere in file
Use
(?=[\s\S]*string1)(?=[\s\S]*string2)[\s\S]+
but it will bring Sublime to a halt as the number of lines grows.
If there are only two words that you need to find, the following is much faster than the regex above:
(\bstring1\b[\S\s]*\bstring2\b)|(\bstring2\b[\S\s]*\bstring1\b)
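If the regex becomes too slow on large files, the same AND search can be sketched outside the editor. This is an illustration rather than a Sublime Text feature: it uses plain substring tests instead of a regex, and the function names and the default "*" mask are assumptions.

```python
from pathlib import Path


def contains_all(text, needles):
    """True if every string in `needles` occurs somewhere in `text`."""
    return all(needle in text for needle in needles)


def files_with_all(folder, needles, pattern="*"):
    """Return files under `folder` (searched recursively) that contain all needles."""
    matches = []
    for path in sorted(Path(folder).rglob(pattern)):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file: skip it
        if contains_all(text, needles):
            matches.append(path)
    return matches
```

Because each file is read once and checked with substring tests, the strings may sit on different lines, as in the example file from the question.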

Compare files and return only the differences using Notepad++

Notepad++ has a Compare Plugin tool for comparing text files, which operates like this:
Launch Notepad++ and open the two files you wish to run a comparison check on.
Click the “Plugins” menu,
Select “Compare” and click “Compare.”
The plugin will run a comparison check and display the two files side by side, with any differences in the text highlighted.
This is a nice feature, which I have used happily for some time. Now I am looking for a way to go further and keep only the highlighted differing lines (e.g. by deleting the non-highlighted ones), or vice versa, i.e. expunge the highlighted lines.
Is there a straightforward way to achieve this?

To subtract two files in Notepad++ (file1 - file2) you may follow this procedure:
Recommended: If possible, remove duplicates on both files, especially if the files are big. To do this: Edit => Line operations => Sort Lines Lexicographically Ascending (do it on both files)
Add ---------------------------- as a footer on file1 (add at least 10 dashes). This is the marker line that separates file1 content from file2.
Then copy the contents of file2 to the end of file1 (after the marker)
Control + H
Search: (?m-s)^(?:-{10,}+\R[\s\S]*+|(.*+)\R(?=(?:(?!^-{10,}$)-++|[^-]*+)*+^-{10,}+\R(?:^.*+\R)*?\1(?:\R|\z)))
(Note: set case sensitivity according to your needs.)
Replace by: (leave empty)
Select Regular expression radio button
Replace All
You can modify the marker if file1 or file2 may contain lines equal to the marker. In that case you will have to adapt the regular expression.
By the way, you could even record a macro to do all the steps (add the marker, switch to file2, copy its content to file1, apply the regex) with a single button press.
Edited:
Changed the regex to add some improvements:
Speed related:
Avoid as much backtracking as possible
Avoid searching after the mark
Usability:
Dashes are allowed for the lines. But the separator is still ^-{10,}$
Works with other characters besides words
Speed comparison (new method vs. old method):
78 ms vs. 1.6 seconds, so a nice improvement. That makes comparing kilobyte-sized files practical.
Still, you may want to use a dedicated program for comparing or subtracting bigger files.
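For bigger files, the subtraction (file1 - file2) is easy to sketch in a script. The following is an illustration of the same idea, not the regex method itself; the function names are assumptions, and the comparison is case sensitive and matches whole lines.

```python
from pathlib import Path


def subtract_lines(lines1, lines2):
    """Return the lines of lines1 that do not occur in lines2, keeping their order."""
    exclude = set(lines2)
    return [line for line in lines1 if line not in exclude]


def subtract_files(path1, path2):
    """Read two text files and return the lines of the first that the second lacks."""
    first = Path(path1).read_text().splitlines()
    second = Path(path2).read_text().splitlines()
    return subtract_lines(first, second)
```

Using a set for the second file keeps the lookup fast, so no sorting or marker line is needed.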

If the number of differences is not large, a quicker method might be just bookmarking each differing line using keyboard shortcuts. Starting from the beginning of the file, press Alt+Page Down to jump to the first difference, and then press Ctrl+F2 to bookmark it. Keep alternating between Alt+Page Down and Ctrl+F2 until you reach the last difference.
With all the differing lines bookmarked, you can use any of the operations under "Search -> Bookmarks" menu:
Cut Bookmarked Lines
Copy Bookmarked Lines
Paste to (Replace) Bookmarked Lines
Remove Bookmarked Lines
Remove Unmarked Lines

I have a dirty workaround for this. It saves some time compared to Control+C, Alt+Tab, Control+V; Control+C, Alt+Tab, Control+V; ... but it may not be worth it on big files or when the two files differ a lot. For bigger files you may prefer using some other tool.
Typically this works best when comparing groups of 'words' and does not work with content that is tabulated (like source code).
So the workaround is:
Optional (depends on the content that's being compared): Sort both files (it will make the later comparison easier). To do this: Edit => Line operations => Sort Lines Lexicographically Ascending (do it on both files)
Compare the files with the plugin
Choose one file and inspect the lines you want to keep. Add one tab before each of those lines. Remember you can select several lines and press Tab to indent them all. Alternatively, you may add tabs to the lines you want to remove
Sort the file. The tabbed lines will come up first, so now you can copy-paste them (or copy-paste the untabbed ones)

Move the files to a Linux box and then execute the diff command:
$ diff file1.txt file2.txt > file_diff.txt
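If no Linux box is at hand, a similar report can be sketched with Python's standard difflib module. Note the assumptions: this produces unified-diff output, which is not the same format as the default output of diff, and the function name and default file names are only illustrative.

```python
import difflib


def diff_text(a_text, b_text, a_name="file1.txt", b_name="file2.txt"):
    """Return a unified diff of two strings as a single string (empty if equal)."""
    a_lines = a_text.splitlines(keepends=True)
    b_lines = b_text.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(a_lines, b_lines, fromfile=a_name, tofile=b_name)
    )
```

Lines that start with "-" exist only in the first text, and lines that start with "+" exist only in the second.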

Transpose function in Notepad++

I have a text file as:
0xC1,0x80,
0x63,0x00,
0x3F,0x80,
0x01,0xA0,
I want output as:
Line1: 0xC1,0x63,0x3F,0x01,
Line2: 0x80,0x00,0x80,0xA0,
How to do this using replace function in Notepad++?

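You can use the following shortcuts in Notepad++:
Step 1: Ctrl + A selects all.
Step 2: Ctrl + J joins the selected rows into a single line (Join Lines). Note that this merges the rows rather than splitting them into the two output lines asked for.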

Use the box select feature to select the second column text.
Use Alt+Shift+Arrow keys to select the second column.
Copy the selected text to a new file.
Use Find/Replace to remove all the newline characters.
Ctrl+F to open the find/replace dialog box.
Select either the Extended or the Regular Expression search mode.
Type \r\n in the Find what box.
Keep the Replace with box blank.
Click on Replace All in All Opened Documents.
Now the text is on a single line.
Copy the text from second file and paste it to second line of first file.
Cheers...

There is no built-in function in Notepad++ for transposing a matrix and you can't do it using Replace (as M42 pointed out). Also, I'm not aware of any related plugin. So you will either need a different editor or have to do it with a script. The simplest solution, I guess, is to use a spreadsheet, e.g. Excel or OpenOffice; both of them allow you to easily transpose a table.
But there's still a good alternative that doesn't require leaving Notepad++: the Python Script plugin.
Setup Python Script plugin
Install Python Script plugin, from Plugin Manager or from the official website.
When installed, go to Plugins > Python Script > New Script. Choose a filename for your new script (e.g. transpose.py) and paste in the first code block that follows. Paste the second one into another script, called for example transpose_uneven.py.
Open your data file and then run Plugins > Python Script > Scripts > transpose.py. This will open a new tab with your data transposed.
transpose.py
delimiter=","
newline="\n"
content=editor.getText()
# Split the buffer into rows, then each row into cells.
matrix=[line.split(delimiter) for line in content.rstrip(newline).split(newline)]
# zip(*matrix) groups the i-th cell of every row, which transposes the table.
transposed=list(map(list, zip(*matrix)))
notepad.new()
for line in transposed:
    editor.addText(delimiter.join(line) + newline)
# zip stops at the shortest row, so a mismatch here means cells were dropped.
if len(transposed)!=len(matrix[0]):
    console.clear()
    console.show()
    console.write("Warning: some rows are of uneven length. You might consider using the transpose_uneven script instead.")
transpose_uneven.py
# izip_longest is the Python 2 name (it is zip_longest in Python 3).
import itertools
delimiter=","
newline="\n"
content=editor.getText()
matrix=[line.split(delimiter) for line in content.rstrip(newline).split(newline)]
# Pad short rows with empty cells instead of dropping the extra columns.
transposed=list(map(list, itertools.izip_longest(*matrix, fillvalue="")))
notepad.new()
for line in transposed:
    editor.addText(delimiter.join(line) + newline)
Examples
The transpose.py script will transpose the following example:
0xC1,0x80,
0x63,0x00,
0x3F,0x80,
0x01,0xA0,
To:
0xC1,0x63,0x3F,0x01
0x80,0x00,0x80,0xA0
,,,
If some of your rows are uneven:
0xC1,0x80,
0x63,0x00,
0x3F,0x80,
0x01,0xA0,
0x02
The uneven columns will be discarded accordingly:
0xC1,0x63,0x3F,0x01,0x02
If this is not desired, use transpose_uneven.py and it will return:
0xC1,0x63,0x3F,0x01,0x02
0x80,0x00,0x80,0xA0,
,,,,
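The same logic can also be sketched as a standalone Python 3 function, run outside Notepad++. This is an illustration only: it replaces the plugin's editor and notepad objects with a plain string argument, and the function name and the pad parameter are assumptions.

```python
from itertools import zip_longest


def transpose_csv(text, delimiter=",", pad=True):
    """Transpose delimited text; pad short rows with "" or drop extra columns."""
    rows = [line.split(delimiter) for line in text.rstrip("\n").split("\n")]
    # pad=True mirrors transpose_uneven.py, pad=False mirrors transpose.py.
    columns = zip_longest(*rows, fillvalue="") if pad else zip(*rows)
    return "\n".join(delimiter.join(column) for column in columns)
```

With pad=False the result is cut at the shortest row, and with pad=True shorter rows are filled with empty cells.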

If you really have such a fixed format and need such a fixed output, I normally try it with an instant macro.
With the cursor in the top left corner of the file, ready to manipulate, I press the record button (or Macro - Start recording in the menu bar).
In your specific case, now press:
End
Del
Pos1 (the Home key)
↓
End hit the stop button (or within the menu bar Macro - Stop recording).
Now for a first test hit the playback button (or within the menu bar Macro - Playback) and test if it works. If yes click on Macro - Run a macro multiple times and select Run until the end of file.

Add a line after another line in multiple files

I have opened 100 files like this:
[database]
server=SQL01
db=milli
authentication=auServer
[Misc]
Now I need to add the same line like this
[database]
server=SQL01
db=milli
authentication=auServer
username=user1
[Misc]
How can I do this, probably some sort of regex?

You could simply do this with Notepad++ using the Find in Files feature.
Put authentication=auServer in the Find what text box, authentication=auServer\r\nusername=user1 in the Replace with text box, *.* in the Filters drop down, C:\some directory in the Directory drop down, tick the In all sub-folders check box, and switch the Search Mode to Extended.
Then just click Replace in Files and you're done.

You could try using the Find in Files tab of the search dialog. Make sure Regular expression is selected. Set the search string to (db=milli\r\nauthentication=auServer\r\n)(\r\n\[Misc\]) and the replacement to \1username=user1\r\n\2. Then click Replace in Files.
Note that the above will add the line in ALL matching places in the files. To specify the files, use the Filters and Directory fields, and also make sure that the three tick boxes beneath the Close button are set correctly.
Replace in Files should be avoided unless you are confident that it will not destroy your files.
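As an alternative to Replace in Files, the insertion can be sketched in a script, which makes it easy to test on a copy of the files first. The function names and the "*.ini" mask below are assumptions for illustration; the anchor and new line are the ones from the question.

```python
from pathlib import Path


def insert_after(text, anchor, new_line):
    """Insert `new_line` after each line equal to `anchor`, unless already there."""
    lines = text.splitlines()
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        already = i + 1 < len(lines) and lines[i + 1].strip() == new_line
        if line.strip() == anchor and not already:
            out.append(new_line)
    # Preserve the trailing newline if the input had one.
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")


def patch_folder(folder, anchor, new_line, pattern="*.ini"):
    """Apply insert_after to every file matching `pattern` (a hypothetical mask)."""
    for path in Path(folder).rglob(pattern):
        path.write_text(insert_after(path.read_text(), anchor, new_line))
```

The check for an existing line means running the script twice does not add username=user1 a second time.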

Can Notepad++ save out search results to a text file?

I need to do quite a few regular expression search/replaces throughout hundreds and hundreds of static files. I'm looking to build an audit trail so I at least know what files were touched by what searches/replaces.
I can do my regular expression searches in Notepad++ and it gives me file names/paths and number of hits in each file. It also gives me the line #s which I don't really care that much about.
What I really want is a separate text file of the file names/paths. The # of hits in each file would be a nice addition, but really it's just a list of file names/paths that I'm after.
In Notepad++'s search results pane, I can do a right click and copy, but that includes all the line #s and code which is just too much noise, especially when you're getting hundreds of matches.
Anyone know how I can get these results to just the file name/paths? I'm after something like:
/about/foo.html
/about/bar.html
/faq/2012/awesome.html
/faq/2013/awesomer.html
/foo/bar/baz/wee.html
etc.
Then I can name that file regex_whatever_search.txt and at the top of it include the regex used for the search and replace. Below that, I've got my list of files it touched.
UPDATE What looks like the easiest thing to do (at least that I've found) is to just copy all the search results into a new text file and run the following regex:
^\tLine.+$
And replace that with an empty string. That'll give you just the file path and hit counts with a lot of empty space between each entry. Then run the following regex:
\s+\n
And replace with:
\n
That'll strip out all the unwanted empty space and you'll be left with a nice list.
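The two regex passes above can also be sketched as one small script that reads the copied search results and keeps only the file paths and hit counts. It assumes the results layout implied in this thread (file lines indented by two spaces and ending in a hit count, match lines starting with a tab and "Line"); that layout may differ between Notepad++ versions, and the function name is hypothetical.

```python
import re

# Assumed file-line layout: two spaces, the path, then "(N hits)" or "(N hit)".
FILE_LINE = re.compile(r"^ {2}(\S.*?) \((\d+) hits?\)\s*$")


def files_from_results(results_text):
    """Return (path, hit_count) pairs found in copied search results text."""
    found = []
    for line in results_text.splitlines():
        match = FILE_LINE.match(line)
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found
```

Writing one path per line to a text file, with the regex used for the search at the top, then gives the audit file described in the question.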

Maybe you need the power of Unix tools.
Assume you have GnuWin32 installed in c:\tools\gnuwin32.
Then, if you have a replace.bat file with this content:
@echo off
set BIN=c:\tools\gnuwin32\bin
set WHAT=%1
set TOWHAT=%2
set MASK=%3
rem Removing quotes
SET WHAT=###%WHAT%###
SET WHAT=%WHAT:"###=%
SET WHAT=%WHAT:###"=%
SET WHAT=%WHAT:###=%
SET TOWHAT=###%TOWHAT%###
SET TOWHAT=%TOWHAT:"###=%
SET TOWHAT=%TOWHAT:###"=%
SET TOWHAT=%TOWHAT:###=%
SET MASK=###%MASK%###
SET MASK=%MASK:"###=%
SET MASK=%MASK:###"=%
SET MASK=%MASK:###=%
echo %WHAT% replaces to %TOWHAT%
rem printing matching files
%BIN%\grep -r -c "%WHAT%" %MASK%
rem actual replace
%BIN%\find %MASK% -type f -exec %BIN%\sed -i "s/%WHAT%/%TOWHAT%/g" {} +
you can do a regex replace in the masked files recursively, with the output you require:
replace "using System.Windows" "using Nothing" *.cs

The regular expression I use for this kind of problem is
^\tLine.[0-9]*:.
And it works for me

This works well if you have Excel available and want to avoid using regular expressions:
Ctrl+A to select all the results
drag & drop the selected results to Excel
Create a Filter on the 1st row
Filter out the lines that have "(Blank)" on the 1st column
Select the remaining lines (i.e. the lines with the filenames) and copy/paste them to another sheet or any wanted destination
You could also Ctrl+A, Ctrl+C the search results, then use the Paste Option "Use Text Import Wizard" in Excel, say that the data is "Fixed width" and put one single break line after the 2nd character (to remove the two leading spaces in the filename during import), and use a filter to filter out the unwanted rows.
