My raw text file looks like this:
>
item1
>
item{2}
>
item3
>
item4}
I would like to remove/match all items containing { or }. In the example above, it would be removing item 2 and 4. The result would be:
>
item1
>
item3
I want a non-greedy (lazy) match so that it matches the smallest possible block. It also has to match across multiple lines, like:
(?s)>(.+?)[\{|\}](.+?)>
But it's not working properly for me.
This regex does exactly what you asked for. It assumes that exact type of input, with nothing else on the line above the one which contains { or }.
>\n.*?[{}]+.*?$
If that line could have text on it, the following one works.
>.*\n.*?[{}]+.*?$
Both of these replacements leave a blank line behind. To avoid this, add \n at either the front or the back of the regex, depending on what fits your document.
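To see the idea outside the editor, here is a minimal Python sketch of the first pattern with the trailing \n folded in. The variable names are only illustrative, [^\n]* stands in for the lazy .*? of the original, and it assumes the same input shape as the question (a lone > line directly above each item).

```python
import re

# Sample input in the same shape as the question.
text = ">\nitem1\n>\nitem{2}\n>\nitem3\n>\nitem4}\n"

# A lone '>' line, then a line containing '{' or '}', plus the trailing
# newline so that no blank line is left behind after the removal.
pattern = re.compile(r"^>\n[^\n]*[{}][^\n]*\n?", re.MULTILINE)

cleaned = pattern.sub("", text)
print(cleaned)
```

The re.MULTILINE flag makes ^ match at the start of every line, which is what keeps the match from starting in the middle of an item.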
Try this one:
\n>\s(.*)(\{|\})
Note: I want to do this in VS Code.
If a line starts with >, remove the > and merge the line with the previous one. In other words, merge everything and remove the leading > until the next line does not start with >.
Example of the multiline citation that I have:
Some text
> Some citation
> and this is the continuation of that citation
> that should become in one line
Some other text
Would become:
Some text
Some citation and this is the continuation of that citation that should become in one line
Some other text
select a >
select all: ctrl-shift-L
delete '>': Delete
join lines: Backspace
exit multi cursor: Esc
See https://regex101.com/r/NTZ6gd/3
>\s+([^\n]*)((\n)(?!^$)|(\n?))
and replace with
$1 $4
There is some added complexity depending on whether the text ends with a citation or not. >'s are allowed within the body of the citation. I don't know if your citations have spaces at the end of each line, so I added a space within the replacement. You can remove that replacement space if there is already one space at the end of each citation line.
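If the file can also be processed outside VS Code, the same merge can be sketched line by line in Python. This is a minimal sketch, not the regex above: merge_citations is a hypothetical helper name, and it assumes that every line starting with > belongs to a citation.

```python
def merge_citations(text: str) -> str:
    """Join consecutive lines starting with '>' into one line, dropping the '>'."""
    out = []
    in_citation = False
    for line in text.split("\n"):
        if line.startswith(">"):
            body = line[1:].strip()  # drop the marker and surrounding spaces
            if in_citation:
                out[-1] = out[-1] + " " + body  # continue the current citation
            else:
                out.append(body)  # first line of a new citation
            in_citation = True
        else:
            out.append(line)
            in_citation = False
    return "\n".join(out)


sample = (
    "Some text\n"
    "> Some citation\n"
    "> and this is the continuation of that citation\n"
    "> that should become in one line\n"
    "Some other text"
)
merged = merge_citations(sample)
print(merged)
```

Only a leading > is treated as a marker, so a > inside the body of a citation is left alone.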
I have a directory with a bunch of text files, all of which follow this structure:
...
- Some random number of list items of random text
- And even more of it
PATTERN_A (surrounded by empty lines)
- Again, some list items of random text
- Which does look similar as the first batch
PATTERN_B (surrounded by empty lines)
- And even more some random text
....
And I need to run a replace operation (let's say, I need to prepend CCC at the beginning of the line, just after the dash) on only those "list items", which are between PATTERN_A and PATTERN_B. The problem is they aren't really much different from the text above PATTERN_A, or below PATTERN_B, so an ordinary regex can't really catch them without also affecting the remaining text.
So, my question would be, what tool and what regex should I use to perform that replacement?
(Just in case, I'm fine with Vim, and I can collect those files in a QuickFix for a further :cdo, for example. I'm not that good with awk, unfortunately, and absolutely bad with Perl :))
Thanks!
If I have understood your question, you can do so quite easily with a pattern-range selection and the general substitution form with sed (stream editor). For example, in your case:
$ sed '/PATTERN_A/,/PATTERN_B/s/^\([ ]*-\)/\1CCC/' file
- Some random number of list items of random text
- And even more of it
PATTERN_A (surrounded by empty lines)
-CCC Again, some list items of random text
-CCC Which does look similar as the first batch
PATTERN_B (surrounded by empty lines)
- And even more some random text
(note: to substitute in place within the file add the -i option, and to create a backup of the original add -i.bak which will save the original file as file.bak)
Explanation
/PATTERN_A/,/PATTERN_B/ - select lines between PATTERN_A and PATTERN_B
s/^\([ ]*-\)/\1CCC/ - substitute (general form 's/find/replace/') where find is from beginning of line ^ capturing text between \(...\) that contains [ ]*- (any number of spaces and a hyphen) and then replace with \1 (called a backreference that contains all characters you captured with the capture group \(...\)) and appending CCC to its end.
Look things over and let me know if you have questions or if I misinterpreted your question.
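For anyone who would rather script this than use sed, the range idea can be sketched in Python. This is only an illustration under assumptions: prefix_in_range and its parameters are hypothetical names, the markers are matched as plain substrings, and unlike sed the marker line that opens the range is not itself substituted (it does not start with a dash in the sample anyway).

```python
import re

# Same find pattern as the sed command: optional spaces, then a dash.
ITEM = re.compile(r"^( *-)")


def prefix_in_range(lines, start="PATTERN_A", end="PATTERN_B", tag="CCC"):
    """Append `tag` right after the leading dash of lines between the markers."""
    result = []
    in_range = False
    for line in lines:
        if in_range:
            result.append(ITEM.sub(lambda m: m.group(1) + tag, line))
            if end in line:
                in_range = False  # the end marker closes the range
        else:
            result.append(line)
            if start in line:
                in_range = True  # the start marker opens the range
    return result


sample = [
    "- Some random number of list items of random text",
    "PATTERN_A",
    "- Again, some list items of random text",
    "PATTERN_B",
    "- And even more some random text",
]
updated = prefix_in_range(sample)
print("\n".join(updated))
```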
With Perl also, you can get the results
> perl -pe ' { s/^(\s*-)/\1CCC/g if /PATTERN_A/../PATTERN_B/ } ' mass_replace.txt
...
- Some random number of list items of random text
- And even more of it
PATTERN_A (surrounded by empty lines)
-CCC Again, some list items of random text
-CCC Which does look similar as the first batch
PATTERN_B (surrounded by empty lines)
- And even more some random text
....
>
My question is similar to "Remove lines that is shorter than 5 characters before the # using Notepad++"
But it differs a bit...
I have lines like this:
abc:123
abc:1234
abc:12345
PLEASE NOTE: abc is not on all the lines, it is just an example.
I want to remove the first line in the example above because 123, which comes after the :, is shorter than 5 characters.
Any help would be appreciated.
Thanks!
Open Find and Replace in Notepad++, choose the Regular expression search mode, put ^((?!.+:\d{5,}).)*$ in the search field, leave the replace field blank, and press Replace All.
^((?!.+:\d{5,}).)*$
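The same filter can be sketched in Python to check what the regex keeps. This is a minimal sketch with a hypothetical helper name, keep_long_values. It keeps a line only when the inner pattern .+:\d{5,} is found in it, which is the opposite of what the negative lookahead selects for removal. Note that, like the regex above, it also drops abc:1234, since 1234 has only four digits, and it drops the line entirely instead of leaving it empty.

```python
import re

# Inner pattern of the Notepad++ regex: some text, a colon, then 5+ digits.
HAS_LONG_VALUE = re.compile(r".+:\d{5,}")


def keep_long_values(lines):
    """Keep only the lines whose value after ':' has at least five digits."""
    return [line for line in lines if HAS_LONG_VALUE.search(line)]


sample = ["abc:123", "abc:1234", "abc:12345"]
kept = keep_long_values(sample)
print(kept)
```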
Without knowing the language there is only so much help I can offer. I'll give you an example of how I would solve this problem in C#.
Start by creating a string for your updated file (without the short lines)
string content = "";
Read a line in from your file.
Then take the part of the line that comes after the abc: prefix and check its length.
string value = line.Substring(line.IndexOf(':') + 1); // text after the colon
if (value.Length >= 5)
{
    content += line + Environment.NewLine; // keep the whole line, prefix included
}
At the end, truncate your file and write content to it.
Relatively new Linux/Vim/regex user here. I want to use a regex to search for a numerical pattern, capture it, and then use the captured value to prepend a string to the previous line. In other words, I have a file of this format:
title: description_id
text: {en: '2. text description'}
I want to capture the values from the text field and append them to the beginning of the title field...to yield something like this:
title: q2_description_id
text: {en: '2. text description'}
I feel like I've come across a way to reference other lines in a search & replace but am having trouble finding that now. Or maybe a macro would be suitable. Any help would be appreciated...thanks!
Perhaps something like:
:%s/\(title: \)\(.*\n\)\(text: \D*\)\(\d*\)/\1q\4_\2\3\4/
Where we are searching for 4 parts:
"title: "
rest of line and \n
"text: " and everything until next digit in line
first string of consecutive digits in line
and spitting them back out, with 4) inserted between 1) and 2).
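The four-group substitution can also be tried in Python, which helps when checking what each group captures. This is a sketch of the same idea with Python's re syntax (unescaped parentheses, \g<n> in the replacement), not a Vim command, and the sample text is taken from the question.

```python
import re

text = "title: description_id\ntext: {en: '2. text description'}\n"

# Group 1: "title: "           Group 2: rest of the line and the newline
# Group 3: "text: " up to the  Group 4: first run of consecutive digits
#          first digit
pattern = re.compile(r"(title: )(.*\n)(text: \D*)(\d*)")

# Put group 4 between groups 1 and 2, then emit everything else unchanged.
result = pattern.sub(r"\g<1>q\g<4>_\g<2>\g<3>\g<4>", text)
print(result)
```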
EDIT: Shorter solution by Peter in the comments:
:%s/title: \zs\ze\_.\{-}text: \D*\(\d*\)/q\1_/
Use \n for the new lines (and ^v+enter for new lines on the substitute line): A quick and not very elegant example:
:%s/title: description_id\n\ntext: {en: '\(\i*\)\(.*\)/title: q\1_description_id^Mtext: {en: '\1\2/
In Google Sheets, I have this in one cell:
Random stuff blah blah 123456789
<Surname, Name><123456><A><100><B><200>
<Surname2, Name2><456789><A><300><B><400>
Some more random stuff
I would like to match the strings within the <> brackets. With =REGEXEXTRACT(A4, "<(.*)>") I have got this far:
Surname, Name><123456><A><100><B><200
which is nice, but it is only the first line. The desired output would be this (maybe including the <> at the beginning/end, it doesn't really matter):
Surname, Name><123456><A><100><B><200>
<Surname2, Name2><456789><A><300><B><400
or simply:
Surname, Name><123456><A><100><B><200><Surname2, Name2><456789><A><300><B><400
How to get there?
Please try:
=SUBSTITUTE(regexextract(substitute(A4,char(10)," "),"<(.*)>"),"> <",">"&char(10)&"<")
Starting in the middle, the inner substitute replaces line breaks (char(10)) with spaces. This gives regexextract the complete (i.e. multi-line) string to work on, with the same pattern the OP is already familiar with. The outer SUBSTITUTE then replaces the relevant space (identified as being immediately surrounded by > and <) with a line break again.
Google Sheets uses RE2 syntax. You can set the multi-line (m) and dot-all (s) flags in order to match across multiple lines. The following will match all characters over multiple lines in cell A2.
=REGEXEXTRACT(A2, "(?ms)^(.*)$")
REGEXEXTRACT(A1,"text1(?ms)(.*)text2")
So, in this case:
REGEXEXTRACT(A1,"<(?ms)(.*)>")
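The effect of the s flag can be illustrated with a short Python sketch. Python's re module is not RE2, so this only shows the flag's behavior and is not the Sheets formula itself. One difference worth noting: recent Python versions require an inline flag such as (?s) to sit at the very start of the pattern, so it is moved to the front here. The cell variable is a stand-in for the cell contents from the question.

```python
import re

# Stand-in for the contents of the spreadsheet cell.
cell = (
    "Random stuff blah blah 123456789\n"
    "<Surname, Name><123456><A><100><B><200>\n"
    "<Surname2, Name2><456789><A><300><B><400>\n"
    "Some more random stuff"
)

# Without the s flag, '.' stops at the line break: only the first line matches.
single_line = re.search(r"<(.*)>", cell).group(1)

# With the s flag, '.' also matches the line break, so the greedy match runs
# from the first '<' to the last '>' in the whole cell.
multi_line = re.search(r"(?s)<(.*)>", cell).group(1)

print(single_line)
print(multi_line)
```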