How to join all lines till next condition?

I can't figure out how to join all lines until the next condition occurs (a line containing only one or more digits), e.g.:
Input:
1
text text text text (with numbers)
text text text text (with numbers)
2
this text
text text text text (with numbers)
text text text
3
text text text text (with numbers)
4
etc
Desired output:
1 text text text text (with numbers) text text text text (with numbers)
2 this text text text text text (with numbers) text text text
3 text text text text (with numbers)
4
etc
I normally use :global/^/,+2 join, but in my example above the number of lines to join is not always 3.

Instead of the static +2 end of the range for the :join command, just specify a search range for the next line that only contains a number (/^\d\+$/), and then join until the line before (-1):
:global/^/,/^\d\+$/-1 join
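For comparison outside Vim, here is a minimal Python sketch of the same grouping idea, assuming the lines are already split into a list: every digits-only line (the equivalent of /^\d\+$/) starts a new record, and the lines after it are appended to that record. The function name join_records and the sample lines are made up for illustration, and joining with a single space only approximates what :join does with whitespace.

```python
import re

# A "stop line" consists only of digits, like /^\d\+$/ in Vim.
STOP_LINE = re.compile(r"\d+")


def join_records(lines):
    """Join every line onto the most recent digits-only line above it."""
    records = []
    for line in lines:
        if STOP_LINE.fullmatch(line) or not records:
            # A digits-only line (or the very first line) starts a new record.
            records.append(line)
        else:
            # Any other line is appended to the current record with one space.
            records[-1] += " " + line.strip()
    return records


sample = [
    "1",
    "text one",
    "text two",
    "2",
    "this text",
    "text three",
    "3",
    "text four",
    "4",
]

for record in join_records(sample):
    print(record)
```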

:v/^\d\+/-j will do the trick.
:v executes the command for each line not matching the pattern.
^\d\+ is your condition: a line starting with a number.
-j goes one line back and joins; in other words, it joins the current line with the previous line.
So basically we join every line not matching your condition with the previous line.
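One difference between the two Vim answers: ^\d\+ only checks how a line starts, while /^\d\+$/ requires the whole line to be digits. A short Python illustration with made-up sample lines (re.match anchors at the start like ^, and re.fullmatch behaves like ^...$):

```python
import re

lines = ["1", "text text (with 2 numbers)", "2", "12 apples", "3"]

# Like :v/^\d\+/ : any line that merely STARTS with digits counts as a stop line.
starts_with_digits = [line for line in lines if re.match(r"\d+", line)]

# Like /^\d\+$/ : only lines made up entirely of digits count.
only_digits = [line for line in lines if re.fullmatch(r"\d+", line)]

print(starts_with_digits)  # ['1', '2', '12 apples', '3']
print(only_digits)         # ['1', '2', '3']
```

So with the :v version, a text line such as "12 apples" would stay on its own line instead of being joined to the line above.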

Just because Tim commented that it couldn't be done in Vim with only a regular-expression search and replace, here is how to do it with only a regular-expression search and replace in Vim:
:%s#\s*\n\(\d\+\s*\n\)\@!# #
If you're not fond of backslashes, it can be simplified using "very magic" \v:
:%s#\v\s*\n(\d+\s*\n)@!# #
This is adapted from Tim's Perl-style regular expression given in the same comment, improved to make sure the "stop line" only has numbers (and maybe trailing whitespace).
See :help perl-patterns if you're comfortable with Perl and find yourself having trouble with the Vim regular expression dialect.
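The same substitution can be tried in a Perl-style engine. Below is a Python sketch (the function name is made up) with two deliberate adaptations: [ \t] stands in for Vim's \s, because Python's \s also matches newlines, and the lookahead ends in $ under re.MULTILINE instead of \n, so that a digits-only line at the very end of the text (with no final newline) still counts as a stop line.

```python
import re

# Python version of :%s#\s*\n\(\d\+\s*\n\)\@!# #
#   [ \t]*\n   : trailing blanks plus the line break to replace
#   (?!...)    : negative lookahead, Vim's \(...\)\@!
#   \d+[ \t]*$ : a following line made of digits (and optional trailing blanks)
JOIN = re.compile(r"[ \t]*\n(?!\d+[ \t]*$)", re.MULTILINE)


def join_until_number_line(text):
    """Replace each line break with a space unless the next line is only digits."""
    return JOIN.sub(" ", text)


text = "1\ntext a\ntext b\n2\nthis text\ntext c\n3\ntext d\n4"
print(join_until_number_line(text))
```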

Related

Using a regex to extract a set of numbers and/or blank lines

I'm constructing a regex using PCRE to process text to extract a set of numbers from a set of text lines (the lines are produced by parsing HTML with XPATH but the question doesn't depend on that). If the number required isn't present, I need to return a blank line.
I'm using a module in Drupal called Feeds Tamper that provides a limited set of options to modify the content -- including a Regex find and replace based on PCRE (not PCRE2). I have options to do a sequence of Regex Find and Replace and/or simple Find and Replace.
The input takes the format:
Text A Location1 More text q=1,2)" Even more text
Text B
Text C Location1 More text q=3,4)" Even more text
Text D
There can be any number of lines including and not including the digits I want to extract; the last line may or may not have a digit in it; I need to process all the lines and end up with one result per line and no extras. The results are then replaced with a capturing group.
My search Regex currently looks like
.*?Location1.*?q=(.*?),(.*?)".*?(\r|$)|.*?(\r|$)
and my replacement like
\1|
but (see regex101.com) this gives results such as
1||
||
3||
||
||
where the expected output is:
1|
|
3|
|
i.e. there is an extra line at the end that doesn't correspond to an input line, and an extra pipe character at the end of each line.
If I use
.*?Location1.*?q=(.*?),(.*?)".*?\r|.*?\r
the last line is omitted so I get:
1|
|
3|
If I don't add a pipe | to the end of the substitution, I get the right number of lines with the expected content (digit or blank), but as soon as I add something at the end of the substitution, I get an extra line and the substituted character is doubled.
What do I need to change in my Regex and why?

Something like this:
^(?:.*Location1.*?q=(\d+),(\d+))?.*$
First it matches the start of the line, optionally followed by the "required" Location1 and q= parts, capturing the numbers. Finally it matches anything up to the end of the line.
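To check the pattern outside regex101, here is a Python sketch (Python's re is close enough to PCRE for this pattern). Two assumptions: re.MULTILINE is needed so that ^ and $ match at every line (PCRE's m modifier or an inline (?m) does the same), and unmatched groups are replaced with an empty string, which Python does since version 3.5. The input deliberately has no trailing newline: if the text ends with one, ^.*$ can also match the empty "line" after it, which is one way to end up with an extra | line like the one in the question.

```python
import re

# The answer's pattern. re.MULTILINE lets ^ and $ match at every line,
# like PCRE's m modifier or an inline (?m).
PATTERN = re.compile(r'^(?:.*Location1.*?q=(\d+),(\d+))?.*$', re.MULTILINE)

text = (
    'Text A Location1 More text q=1,2)" Even more text\n'
    'Text B\n'
    'Text C Location1 More text q=3,4)" Even more text\n'
    'Text D'  # no trailing newline on purpose
)

# \1| = first captured number (empty when the optional group did not match) + "|".
result = PATTERN.sub(r"\1|", text)
print(result)
```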

Regular Expression - joining two lines, but first number of joined 2nd line is deleted

I have some sample data (simplified extract below - the real file contains 52,000 lines, with pairs of lines, the 2nd line of each pair is always a date field, and there are always 2 blank lines between each data pair):
The colour of money 20170233434
10-DEC-2015
SOME TEST DATA 32423412123
19-OCT-2015
I want to join each line up, using a Regular Expression (I am using TextPad, but I think the RegEx syntax is generic).
I am doing a replace search, and want to end up with this:
The colour of money 20170233434 10-DEC-2015
SOME TEST DATA 32423412123 19-OCT-2015
I am using this in the "Find what" field:
\n^[0|1|2|3|4|5|6|7|8|9]
And replacing with NULL.
The end result I am getting is almost there:
The colour of money 20170233434 0-DEC-2015
SOME TEST DATA 32423412123 9-OCT-2015
But not quite, because the first digit of the date values are being stripped out.
How would I modify the RegEx to not delete the first number of the 2nd line? I tried to replace with [0|1|2|3|4|5|6|7|8|9] but that just put that entire string in front of each date field, and still stripped out the first number of the date.

Just search for this:
\r?\n(\d{1,2}\-)
And replace it with $1.
If you want to replace it with null, you can also use a lookahead:
\r?\n(?=\d{1,2}\-)
And replace it with null.
These regular expressions only match a newline (\n on UNIX, \r\n on Windows) followed by one or two digits and then a dash. If you want to be more specific, you could also use this regular expression:
\r?\n(\d{1,2}\-[A-Z]{3}\-\d{4})
Or with a lookahead respectively:
\r?\n(?=\d{1,2}\-[A-Z]{3}\-\d{4})
You could even check for the double line break after the date:
\r?\n(\d{1,2}\-[A-Z]{3}\-\d{4}(?:\r?\n){2})
Or with a lookahead respectively:
\r?\n(?=\d{1,2}\-[A-Z]{3}\-\d{4}(?:\r?\n){2})
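A Python sketch of the lookahead variant on data shaped like the question's (pairs separated by two blank lines). One change from the answer, flagged as an assumption: the line break is replaced with a single space rather than with nothing, so the fields stay separated even if the first line of a pair has no trailing space.

```python
import re

# Lookahead variant: match only the line break directly before a DD-MON-YYYY date,
# so the date's first digit is left in place.
DATE_BREAK = re.compile(r"\r?\n(?=\d{1,2}-[A-Z]{3}-\d{4})")

text = (
    "The colour of money 20170233434\n"
    "10-DEC-2015\n"
    "\n"
    "\n"
    "SOME TEST DATA 32423412123\n"
    "19-OCT-2015\n"
)

# Assumption: replace with one space (the answer replaces with nothing).
joined = DATE_BREAK.sub(" ", text)
print(joined)
```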

Add text at the end of specific lines

I know how to add something to the end of every line, but how do I add text at the end of only the lines containing specific words?
Some line of text here
Tomatoes Oranges
Mili Deci Centi
Some line of text there
Fire Flame
Dog Cat
Tall Small
Some line of text with more text
Mother farher
-------
I want to add characters at the end of the lines containing "Some line", something like this:
Some line of text here EXTRATEXT
Tomatoes Oranges
Mili Deci Centi
Some line of text there EXTRATEXT
Fire Flame
Dog Cat
Tall Small
Some line of text with more text EXTRATEXT
Mother farher
-------
The lines end in different characters, so I need to search for a pattern inside the line and add text at the end of those lines.

Replace the following pattern:
Some line.*
With:
$0 EXTRATEXT
This matches from Some line up to the end of the line (.*, as . matches any character but a newline).
You can then replace the whole match ($0) with itself followed by the extra text you want.
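The same replacement in Python, as a minimal sketch on a shortened version of the sample. Python's replacement syntax refers to the whole match as \g<0>, where the answer's editor uses $0.

```python
import re

text = "\n".join([
    "Some line of text here",
    "Tomatoes Oranges",
    "Some line of text there",
    "Fire Flame",
])

# "." does not match a newline, so "Some line.*" stops at the end of its own line;
# \g<0> inserts the whole match, followed by the extra text.
result = re.sub(r"Some line.*", r"\g<0> EXTRATEXT", text)
print(result)
```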

[a-zA-Z]+\n or \w+\n, or multiple \n+ at the end if you want to clean up empty lines too. Finally, if it's important that the word starts with a capital letter: [A-Z][a-zA-Z]+\n

Why don't you try ending the regex pattern with a line break or a carriage return?
I think this can be done in Notepad++ with \r\n at the end of the regex.

Remove duplicate lines based on a search in notepad++

I have a text file that contains thousands of lines of text as below.
aaaa "test "
aa "test "(version 2)
bbbb "test "(version 4)
bbbbb "test1 "(with heads)
abs "test1 "
absc "test3"
I would like to be able to remove all the duplicates based on a search and keep only the first line (in my case all lines with the same value between the quotation marks)
EDIT: More details about how I detect that a line is a duplicate of another:
I check the value between the quotation marks. The first 3 lines all have the value "test " between quotation marks, so I want to keep the first line with this value and remove the others. For lines 4 and 5 the value is "test1 ", so I keep only line 4 and remove the other.
So after cleaning my text file would have this form
aaaa "test "
bbbbb "test1 "(with heads)
absc "test3"
I tried to use this regular search in notepad++
(.\".*?")
But I don't know how to use it to find duplicates and remove the other lines with the same value. I already checked other users' cases, but I can't find a solution.

I would solve it in several steps.
1. Add line numbers.
2. Put the quoted text in front.
3. Sort: lines with the same quoted text now sit next to each other and, within each group, stay in their original order thanks to the line numbers from step 1.
4. Remove the "duplicates".
5. Remove the quoted text inserted in step 2.
6. Sort by the line numbers from step 1.
7. Remove the line numbers from step 1.
Now the detailed explanation:
Add line numbers: use Edit -> Column Editor at the first column twice:
insert text (some delimiter that does not occur in the file, e.g. | or :)
insert numbers starting with 1, incrementing by 1, with leading zeros
Now each line should start with a line number and a delimiter
Prepend the quoted text: use a regexp replace
Find what: ^([^"]*)("[^"]+")(.*)$
Replace: \2\1\2\3
Now your lines should start with the text.
Sort: by using Edit -> Line Operations -> Sort ...
Remove Duplicates: with a regexp replace:
Find What: ("[^"]+")(.*)\n\1.*
Replace: \1\2
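Use Replace All, and repeat it until nothing more is replaced: matching resumes after each replaced pair, so a single pass can leave duplicates behind when three or more lines share the same value (such as the three "test " lines).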
Remove the texts from step 2: using regex replace
Find What: ^"[^"]+"
Replace with: Nothing i.e. leave empty
Sort by the original line numbers: by using Edit -> Line Operations -> Sort ...
Remove the line numbers from step 1: using a regexp replace:
Find What: ^(.*\|) (use \| or whatever you used in step 1 as delimiter)
Replace with: Nothing i.e. leave empty
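The seven steps can be mirrored in memory with a short Python sketch (the function name is hypothetical, and it assumes every line contains at least one quoted value). It follows the same plan of tagging, sorting, deduplicating and restoring the original order; in a program, a plain set of already-seen values would do the same job more simply.

```python
import re

# First quoted value on a line, e.g. "test " (assumes every line has one).
QUOTED = re.compile(r'"[^"]+"')


def remove_duplicates_by_quoted_value(lines):
    # Steps 1-2: tag each line with its quoted value and its original position.
    tagged = [(QUOTED.search(line).group(0), number, line)
              for number, line in enumerate(lines)]
    # Step 3: sort by quoted value; equal values keep their original order
    # because the position is the second sort key.
    tagged.sort()
    # Step 4: keep only the first line of each run of equal quoted values.
    kept = []
    previous = None
    for value, number, line in tagged:
        if value != previous:
            kept.append((number, line))
            previous = value
    # Steps 5-7: drop the tags and restore the original line order.
    kept.sort()
    return [line for _, line in kept]


lines = [
    'aaaa "test "',
    'aa "test "(version 2)',
    'bbbb "test "(version 4)',
    'bbbbb "test1 "(with heads)',
    'abs "test1 "',
    'absc "test3"',
]
print("\n".join(remove_duplicates_by_quoted_value(lines)))
```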

Regex Find & Replace - Find string of any character and specific length then replace 1 character

I have a document that has a range of numbers like this:
0300010000000394001001,27
0300010000000394001002,0
0300010000000394002001,182
0300010000000394002002,51
0300010000000394003001,156
0300010000000394003002,40
I need to find the new line character and replace with a number of spaces depending on the string length.
If it has 24 characters like this - 0300010000000394001002,0 then I need to replace the new line character at the end with 5 blank spaces.
If it has 25 characters like this - 0300010000000394002002,51 then I need to replace the new line character at the end with 4 blank spaces and so on.
In my text editor I can use find and replace. I search for the line length by ^(.|\s){24}$ for 24 characters - but this will obviously replace the whole line and I only need to replace the new line character at the end.
I want to specify a new line character AFTER ^(.|\s){24}$. Is this possible?

It sounds like you need two things.
Multi-line Mode (See "Using ^ and $ as Start of Line and...")
Backreferencing
Most editors that support regex support these naturally, but you'll have to let us know what editor you're using for us to be specific. Without knowing what editor you're using, all I can say is that you want to do some combination of the following:
regex          subst
-----          -----
^(.{24})\n     $1 followed by five spaces
^(.{24})^M     \1 followed by five spaces
^(.{24})\s
(^M is a literal carriage return character.)
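A Python sketch of both ideas, assuming a target width of 29 characters (24 characters plus 5 spaces, 25 plus 4, and so on, as in the question). The first function is the answer's first row with a fixed length of 24; the second is a generalization not taken from the answer: a replacement function computes the padding from the length of the captured line, so one pass handles every line length.

```python
import re

WIDTH = 29  # assumed: 24 chars + 5 spaces, 25 chars + 4 spaces, ...


def pad_24(text):
    """Answer's first row: a 24-character line's newline becomes 5 spaces."""
    return re.sub(r"^(.{24})\n", r"\1" + " " * 5, text, flags=re.MULTILINE)


def pad_any(text, width=WIDTH):
    """Generalization: pad every line's newline up to `width` characters."""
    return re.sub(
        r"^(.*)\n",
        lambda m: m.group(1) + " " * max(0, width - len(m.group(1))),
        text,
        flags=re.MULTILINE,
    )


data = (
    "0300010000000394001001,27\n"
    "0300010000000394001002,0\n"
    "0300010000000394002001,182\n"
)
print(repr(pad_24(data)))
print(repr(pad_any(data)))
```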
