My text file has more than ten thousand lines. Each line starts with a word or a phrase followed by a tab and the content, such as:
[line 1] This is the first line. [tab] Here is the content.[end of line]
I want to find the character s in every word between the beginning of each line and the tab (\t), and replace it with a pipe (|), so that the text looks like:
[line 1] Thi| i| the fir|t line. [tab] Here is the content.[end of line]
Here is what I have done:
Search: ^(.*)s+(.*)?\t
Replace: \1|\2\t
It works, but it does not replace every s in one pass: I have to click Replace All several times before the s in every word is replaced.
So my question is: how can I replace all occurrences of the character s in a single search and replace?
Note that I'm working in TextWrangler, but I'm OK with other editors.
Thanks a lot.
You are matching whole lines that contain an s, so each Replace All pass changes only one s per line (the last s before the tab, because (.*) is greedy). Instead, search for the s itself and use a lookahead to ensure that it is followed by a tab.
Search: s(?=.*\t)
Replace: |
Note that this catches every s up to the last tab, which is a problem if your main content can contain tabs.
To stop at the first tab you have to cheat, because AFAIK most regex dialects do not support variable-length negative lookbehind.
However, if the match for the last s before the tab also consumes the rest of the line, no s after the tab can be matched:
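A quick way to check the lookahead outside the editor is a short Python sketch (Python's re module is an assumption here; TextWrangler's engine may differ in details). It applies s(?=.*\t) to a sample line and also shows the caveat: an s in the content is replaced too when another tab follows it.

```python
import re

# "s" only matches when a tab appears somewhere later on the same line;
# "." does not match a newline, so the lookahead never crosses lines.
S_BEFORE_TAB = re.compile(r"s(?=.*\t)")


def pipe_before_tab(text: str) -> str:
    """Replace every s that has a tab somewhere after it on its line."""
    return S_BEFORE_TAB.sub("|", text)


line = "This is the first line.\tHere is the content."
print(repr(pipe_before_tab(line)))
# 'Thi| i| the fir|t line.\tHere is the content.'

# Caveat: with a second tab, the s in the content is replaced as well.
print(repr(pipe_before_tab("ab s\tcd s\tef")))
# 'ab |\tcd |\tef'
```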
Search: (?:(^[^s\t]*\t.*$)|s([^s\t]*(?:(?=s.*\t)|\t.*$)))
Replace: |\1\2
This also matches the whole line when no s occurs before the first tab, and puts a | in front of that line. I see no way around this.
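The one-pass pattern can be tried the same way in Python (an assumption; Python writes group references in the replacement as \1 and, since 3.5, substitutes an empty string for a group that did not participate). The sketch also includes a different technique, not from the answer, that avoids the stray |: a hypothetical helper pipe_head_only that matches only the text before the first tab and rewrites it with a callback function, which a script can do even where an editor's Replace field cannot.

```python
import re

# The answer's pattern: either a line with no s before its first tab
# (group 1), or an s followed by the text up to the next s before the tab,
# or by the rest of the line (group 2).
ONE_PASS = re.compile(r"(?:(^[^s\t]*\t.*$)|s([^s\t]*(?:(?=s.*\t)|\t.*$)))", re.M)


def one_pass(text: str) -> str:
    # Unmatched groups become "" in Python 3.5+, so this is |\1 or |\2.
    return ONE_PASS.sub(r"|\1\2", text)


# Alternative technique: match everything before the first tab on each
# line and replace s only inside that slice.
HEAD = re.compile(r"^[^\t\n]*(?=\t)", re.M)


def pipe_head_only(text: str) -> str:
    # Lines without a tab do not match, so they are left unchanged.
    return HEAD.sub(lambda m: m.group(0).replace("s", "|"), text)


sample = "This is the first line.\tHere is the content.\nHello world\tsome text"
print(one_pass(sample))        # second line gets a leading |
print(pipe_head_only(sample))  # second line is left alone
```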
I am trying to check a CSV file in which we had problems with line breaks.
I want to find all lines not starting with a ".
I tried /!^"/gim, but the ! negation does not work.
How can I negate /^"/gim properly?
In regex, the ! does not mean negation; instead, you want to negate a character set with [^"]. The brackets, [], denote a character set and if it starts with a ^ that means "not this character set".
So, if you wanted to match things that are not double-quotes, you would use [^"]; if you don't want to match any quotes, you could use [^"'], etc.
With Notepad++, you should be able to search with the following to find lines that don't start with the " character:
^[^"]
If you want to highlight the full line, use:
^[^"].*
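To see what these patterns match line by line, here is a small Python sketch (Python's re module is assumed; Notepad++ uses its own engine, so details may differ). It shows one pitfall: a negated class like [^"] also matches a newline, so on an empty line ^[^"].* can swallow the line break and the following line. Adding \n to the class keeps each match on one line.

```python
import re

csv_text = '"a","b"\nbroken line\n"c","d"\n\nanother break'

# re.MULTILINE makes ^ match at the start of every line.
naive = re.findall(r'^[^"].*', csv_text, re.MULTILINE)
print(naive)  # ['broken line', '\nanother break']: the empty line's \n was consumed

# Excluding \n from the class keeps each match on a single line.
NOT_QUOTED = re.compile(r'^[^"\n].*', re.MULTILINE)
print(NOT_QUOTED.findall(csv_text))  # ['broken line', 'another break']
```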
In Notepad++ you can use the very useful negative lookahead.
In your case you can try the following:
^(?!")
If you want to match whole lines, append .* or .+ (or a bounded quantifier such as .{1,7}), e.g.:
^(?!").*
will also match empty lines.
Explanation:
^ line start
(?!regexp) negative lookahead: the match fails at this position if regexp matches here; the lookahead itself consumes no characters
Step 1 - Mark the lines: in the Find dialog, on the Mark tab, you can bookmark the lines that match.
Step 2 - Remove either the bookmarked lines or the lines that are not bookmarked: Search > Bookmark > Remove Bookmarked Lines or Remove Unmarked Lines.
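The lookahead pattern ^(?!").* and the mark-then-act workflow can be sketched in Python as well (assumed engine, invented sample data). Because (?!") consumes nothing, the pattern also matches the empty line as an empty string. The hypothetical helper unquoted_line_numbers lists the lines that do not start with a quote, which is roughly what marking them in Notepad++ gives you.

```python
import re
from typing import List

csv_text = '"a","b"\nbroken line\n"c","d"\n\nanother break'

# (?!") only checks that the line does not start with a quote; it consumes
# nothing, so .* can match an empty line as an empty string.
print(re.findall(r'^(?!").*', csv_text, re.MULTILINE))
# ['broken line', '', 'another break']


def unquoted_line_numbers(text: str) -> List[int]:
    """Hypothetical helper: 1-based numbers of lines not starting with a quote."""
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if re.match(r'(?!")', line)]


print(unquoted_line_numbers(csv_text))  # [2, 4, 5]
```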
Happy New Year!
I have a problem: I don't know how to mark/select some words delimited by tabs on consecutive lines: Recent, Coments and Tags.
Please see this screenshot:
I can easily use the | sign, as in Recent|Comments|Tags, but that selects every occurrence of these words in the file, and I want only those three on those lines.
What I want is one regex that removes all text before those three words, and another regex that removes everything after them.
I tried something like ((?s)((^.*)^.*Recente.*$|^.*Coments.*$|^.*Tags.*^))(.*$), but it does not work well. I also have to be careful, because those words can be repeated in the text files, so I have to select/mark exactly those three, on those three consecutive lines (which contain no other words).
Since you mentioned in a comment that you want to do this in Notepad++ (a fact that should have been mentioned in the question text), and since the screenshot shows a single space after the first two words, you might try this regular expression:
.*\n([ \t]+Recente\s+Coments\s+Tags).*
With ". matches newline" checked, it will select everything, but capture the three words, including the whitespace between them and the whitespace preceding the first word on its line.
If you then replace with $1, everything not in the capture group will be removed.
Actually, the spaces after the first two words don't matter to this regex.
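A rough Python equivalent, assuming the words are spelled exactly as in the regex above (Recente, Coments, Tags), which may not match your file. Here re.S plays the role of Notepad++'s ". matches newline" option, and Python writes the backreference in the replacement as \1 rather than $1. The sample text is made up.

```python
import re

# .* before and after can span lines because of re.S (DOTALL);
# group 1 keeps the indented block of three words.
BLOCK = re.compile(r".*\n([ \t]+Recente\s+Coments\s+Tags).*", re.S)

sample = "intro line\nmore text\n\tRecente \n\tComents \n\tTags\nfooter\n"
print(repr(BLOCK.sub(r"\1", sample)))
# '\tRecente \n\tComents \n\tTags'
```

Because the leading .* is greedy, if the indented block occurs more than once, the last occurrence in the file is the one kept.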
Could you try this in Perl:
perl -0777 -ne 'while(m/([ \t]+)Recent\n\1Comments\n\1Tags/g){print "$&\n";}' /path/to/file
Breakdown:
-0777 reads the whole file at once, so the pattern can span lines
Start with one or more spaces or tabs (first capture group)
Then "Recent" followed by a newline
The same indentation (\1), then "Comments" and a newline
The same indentation (\1), then "Tags"
By the way, is the "tab" really a tab, or several consecutive spaces? (The pattern above accepts either.)
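For comparison, the same search can be sketched in Python (the sample text is invented). The indentation is matched with ([ \t]+), so spaces or tabs both work, and \1 forces all three lines to use the same indentation, so stray occurrences of the words elsewhere in the file are not matched.

```python
import re

# The same indentation (group 1) must precede each of the three words.
TRIPLE = re.compile(r"([ \t]+)Recent\n\1Comments\n\1Tags")

text = "Recent posts\nComments here\n\tRecent\n\tComments\n\tTags\nTags again\n"

for m in TRIPLE.finditer(text):
    print(repr(m.group(0)))  # '\tRecent\n\tComments\n\tTags'
```

Mixed indentation (for example a tab on one line and two spaces on the next) is rejected, because \1 must repeat exactly.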
I use Notepad++ and I need to delete all lines starting with, say "abc".
Note that I don't want to replace each line starting with "abc" with an empty line; I need to delete these lines completely.
How do I proceed (using regex, I suppose)?
Try replacing
^abc.*(\r?\n)?
with
nothing
The ^ indicates the start of a line.
The . matches any single character except a line break.
The .* means zero or more such characters.
x? means x is optional.
The \r?\n covers both \r\n (generally Windows) and \n (generally Unix), but must be optional to cover the last line.
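The same replacement can be checked in Python (assumed here; in Python . also matches \r, so on a Windows line ending .* takes the \r and (\r?\n)? then takes the \n). The made-up sample includes a matching first line, a Windows line ending, a line with abc in the middle, and a matching last line without a trailing newline.

```python
import re

# ^ anchors at each line start (re.M); the optional line break lets the
# pattern also remove a last line that has no newline after it.
ABC_LINE = re.compile(r"^abc.*(\r?\n)?", re.M)

text = "abc1\nkeep\nabc2\r\nkeep2\nxabc\nabc3"
print(repr(ABC_LINE.sub("", text)))
# 'keep\nkeep2\nxabc\n'
```

When the last line is removed, the line break of the line before it stays, so the file simply ends with a newline.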
Search for this regular expression
^abc.*\r\n
Replace with nothing.
Searching a little more on regex in Notepad++, I discovered that on Windows the line break is not \n, as I expected, but \r\n.
So my regex replace expression should be:
Find: ^abc.*\r\n
Replace with: (nothing, empty field)
Try the regex \nabc.* in "Find and Replace" --> "Replace"
Leave "Replace With" field empty.
EDIT: This won't work for the first line, because \n matches a line break and there is none before the first line.
Press Ctrl+H to bring up the Replace window. Put
^abc.*(\r?\n)?
in the Find what box and leave Replace with empty. Select Regular expression and click Replace All.
This regular expression handles both edge cases:
When the first line of the file starts with abc
When the last line of the file starts with abc and there is no new line at the end of the file.
I have multiple HTML files, and some of them contain runs of blank lines. I need a regex that collapses each run into a single blank line: it should remove any blank lines beyond the first and leave alone single blank lines and lines with text.
It also needs to treat lines that are not totally empty, because they contain spaces or tabs (invisible characters), as blank, so that such lines are removed too whenever they are part of a run of more than one blank line.
Search for
^([ \t]*)\r?\n\s+$
and replace with
\1
Explanation:
^ # Start of line
([ \t]*) # Match any number of spaces or tabs, capture them in group 1
\r?\n # Match one linebreak
\s+ # Match any following whitespace
$ # until the last possible end of line.
\1 will then contain the first line of whitespace characters, so when you use that as the replacement string, only the first line of whitespace will be preserved (excluding the linebreak at the end).
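A Python sketch of the same search and replace (Python's re is assumed; re.M gives ^ and $ their per-line meaning). The hypothetical collapse_blank_runs helper shows that a run of empty or whitespace-only lines shrinks to one blank line while a single blank line is kept.

```python
import re

# ^([ \t]*) : whitespace on the first blank line (kept via \1)
# \r?\n     : its line break
# \s+$      : the following blank or whitespace-only lines, up to the
#             last line end in the run
BLANK_RUN = re.compile(r"^([ \t]*)\r?\n\s+$", re.M)


def collapse_blank_runs(text: str) -> str:
    """Hypothetical helper: shrink each run of blank lines to one."""
    return BLANK_RUN.sub(r"\1", text)


print(repr(collapse_blank_runs("a\n\n\n\nb")))      # 'a\n\nb'
print(repr(collapse_blank_runs("a\n\n  \n\t\nb")))  # 'a\n\nb'
print(repr(collapse_blank_runs("a\n\nb")))          # 'a\n\nb' (unchanged)
```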
This worked for me in Notepad++ v6.5.1 (Unicode) on Windows 7:
Search for: ^[ \t]*\r\n
Replace with: nothing, leave blank
Search mode: Regular expression.
Note that this removes every blank line, including single ones, rather than leaving one.
Search for (\r?\n(\t| )*){3,}, replace with \r\n\r\n, and check "Regular expression" and ". matches newline".
Tested with Notepad++ 6.2
This replaces each run of successive blank lines, whether or not they contain whitespace, with a single blank line.
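The quantifier approach translated to Python (assumed; the pattern uses no dot, so the ". matches newline" option has no counterpart here). {3,} requires at least three consecutive line breaks, i.e. two or more blank lines, before anything is replaced, so a single blank line is untouched.

```python
import re

# Three or more line breaks, each optionally followed by spaces or tabs.
SURPLUS = re.compile(r"(\r?\n(\t| )*){3,}")

print(repr(SURPLUS.sub("\r\n\r\n", "a\n\n \n\nb")))  # 'a\r\n\r\nb'
print(repr(SURPLUS.sub("\r\n\r\n", "a\n\nb")))        # 'a\n\nb'
```

One side effect worth knowing: the trailing (\t| )* also consumes the indentation of the next non-blank line, so "a\n\n\n    b" becomes 'a\r\n\r\nb'.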
Search for
(\s*\r?\n){3,}
replace with
\r\n
Work out for yourself what you need to replace it with: \n\n, \r\n\r\n, etc. (replacing with \r\n alone leaves no blank line at all). You can likewise adapt the regular expression ^([ \t]*)\r?\n\s+$ to your needs.
I tested each of the suggestions above, and each deleted either too little or too much: either no blank line was left where there had been at least one, or not enough was deleted (whitespace was left, etc.). Unfortunately, I cannot write comments yet. I tested with 6.1.5, then updated to 6.2 and tested again. Depending on how many files there are, I would suggest using
Edit->Blank Operations->Trim trailing whitespace
Followed by Ctrl+A and
TextFX -> TextFX Edit -> Delete surplus blank lines
A macro I tried to record didn't work. There is even a macro just for removing trailing whitespace (Alt+Shift+S; see Settings | Shortcut Mapper... | Macros). There is also
Edit->Blank Operations->Remove unnecessary EOL and whitespace
but that deletes every EOL and puts everything on a single line.
In Notepad++ v8.4.7 there are the options:
Edit > Line Operations > Remove Empty Lines (Containing Blank characters)
or
Edit > Line Operations > Remove Empty Lines
So there is no need to use a regular expression for this, but these commands only work on one file at a time.
I searched for ^\r\n and clicked "Replace All" with nothing (empty) in the "Replace with" box.