Vim - how to join lines using matching pattern - regex

I have a txt file that contains contact info for businesses. Currently, each line contains a different piece of data for the business. I'm attempting to construct a pipe-delimited file with all the info for each business on a single line. The catch is that there are a different number of lines for each business. So the file looks like this:
Awesome Company Inc|
Joe Smith, Owner|
Jack Smith, Manager|
Phone: (555)456-2349|
Fax: (555)456-9304|
Website: www.awesomecompanyinc.com [HYPERLINK: http://www.awesomecompanyinc.com]|
* Really Cool Company|
* Line of business: Awesomesauce|
Killer Products LLC|
Jack Black, Prop|
Phone: (555)234-4321|
Fax: (555)912-1234|
1234 Killer Street, 1st Floor|
Houston, TX 77081|
* Apparel for the classy assassin|
* Fearful Sunglasses|
* Member of the National Guild of Killers since 2001|
* Line of business: Fuhgettaboutit|
etc.
So I can use :g/<pattern>/j to join lines within a pattern, but I'm having trouble working out what the pattern should be. In the example above, lines 1-9 need to be joined, and then lines 10-19.
The key is the lines that begin with 2 spaces and an asterisk:
* Line of business
The pattern should basically say: "Starting with the first line beginning with a letter, join all lines until the first line after the last line beginning with two spaces and an asterisk, then repeat until the end of the file."
How would I write this? Should I maybe do it in two steps - i.e., is there a way to first join all the lines starting with letters, then all the lines starting with two spaces and an asterisk, then join each resulting pair?

Starting with the first line beginning with a letter, join all lines until the first line after the last line beginning with two spaces and an asterisk, then repeat until the end of the file.
You can actually translate that almost literally to Vimscript:
Starting with the first line beginning with a letter is /^\a/
until the first line after the last line beginning with two spaces and an asterisk is /^  \* .*\n\a/: find a line starting with the bullet (^  \*), match the rest of the line (.*), and assert that the next line starts with a letter, i.e. is not another bulleted line (\n\a)
then repeat until the end of the file. is done via :global
Taken together:
:global/^\a/,/^  \* .*\n\a/join
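To sanity-check the range logic outside of Vim, here is a minimal Python sketch of the same idea. The function name join_blocks and the sample data are made up for illustration, and it only roughly mimics :join (leading whitespace is stripped and a single space is inserted). Unlike the Vim range, it also closes the final block at end of file, which is an assumption on my part.

```python
import re

def join_blocks(lines):
    """Join each business block into a single line.

    A block starts at a line beginning with a letter and ends at a
    bulleted line ("  * ...") whose next line starts with a letter
    (or at end of input, which is an assumption of this sketch).
    """
    out, block = [], []
    for i, line in enumerate(lines):
        block.append(line)
        nxt = lines[i + 1] if i + 1 < len(lines) else None
        is_bullet = re.match(r"^  \* ", line) is not None
        next_starts_block = nxt is None or re.match(r"^[A-Za-z]", nxt) is not None
        if is_bullet and next_starts_block:
            # Roughly what :join does: strip indentation, separate with one space.
            out.append(" ".join(part.strip() for part in block))
            block = []
    if block:
        # Trailing lines that never reached a closing bullet line.
        out.append(" ".join(part.strip() for part in block))
    return out

# Hypothetical sample, shortened from the question's data.
sample = [
    "Awesome Company Inc|",
    "Joe Smith, Owner|",
    "  * Really Cool Company|",
    "  * Line of business: Awesomesauce|",
    "Killer Products LLC|",
    "Houston, TX 77081|",
    "  * Line of business: Fuhgettaboutit|",
]
joined = join_blocks(sample)
for row in joined:
    print(row)
```

Each printed row should correspond to one business, which is what the :global range is meant to produce.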

Edit: nevermind, just realized that there are a bunch of settings that would have to be set for my solution to work. To make it work generally you would need
for i in range(10)
  try
    v/business/join
  endtry
endfor
And even that assumes that there isn't a business block that has more than 1024 lines. At that point you might as well use ranges

Related

Select n-th line with first line condition

I have a subtitles file which was auto-generated for one of the YouTube videos.
Here, there are 4 speeches. Every speech has a number, a time, a first text line and a second text line.
I would like to delete the first text line in every time span. I need this because currently, when new text arrives, I see both the old line and the new one. In other words, the old text moves up and the new text comes in from the bottom. I would like to see only the new one.
1
00:00:02,880 --> 00:00:06,550
[empty]<--to be removed
[Music]
2
00:00:06,550 --> 00:00:06,560
[Music]<--to be removed
[empty]
3
00:00:06,560 --> 00:00:09,290
[Music]<--to be removed
my name is Maria and I'm a technical
4
00:00:09,290 --> 00:00:09,300
my name is Maria and I'm a technical<--to be removed
[empty]
What have I tried? I am only able to select: number line, time line and first text line. Somehow (?=regexp) doesn't work with my query. Here is my query:
(^\d+$\n.+$\n)
^\d+$ - starts and ends with digit elements
\n.+$ - select new line, select all elements till the end of the line
\n - select one more line but don't select elements
You may use the following regex:
^(\d+\r?\n.*?-->.*)\r?\n.+
Replace with $1.
Details
^ - start of a line
(\d+\r?\n.*?-->.*) - Capturing group 1:
\d+ - 1+ digits
\r?\n - a CRLF or LF line break
.*?-->.* - a line that has --> (this is to make matching safer, your .+ can do, too, if you are sure there are no subtitle text lines that are only made up of digits)
\r?\n - CRLF or LF
.+ - 1 or more chars other than line break chars.
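If you want to try the pattern outside of an editor, here is a small Python sketch. The function name drop_first_text_line and the sample text are made up for illustration; Python writes the replacement as \1 instead of $1, and the MULTILINE flag is needed so that ^ matches at every line start.

```python
import re

# Same pattern as above; MULTILINE lets ^ match at the start of each line.
PATTERN = re.compile(r"^(\d+\r?\n.*?-->.*)\r?\n.+", re.MULTILINE)

def drop_first_text_line(srt: str) -> str:
    """Keep the number and time lines (group 1) and drop the first text line."""
    return PATTERN.sub(r"\1", srt)

# Hypothetical two-entry sample with LF line endings.
sample = (
    "1\n"
    "00:00:02,880 --> 00:00:06,550\n"
    "old line\n"
    "[Music]\n"
    "\n"
    "2\n"
    "00:00:06,550 --> 00:00:06,560\n"
    "[Music]\n"
    "new line\n"
)
cleaned = drop_first_text_line(sample)
print(cleaned)
```

Note that .+ needs at least one character, so an entry whose first text line is completely empty is left untouched by this pattern.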

Regex - Adding a new line and word after every line beginning with certain phrase

I'm currently writing a program in java, and I have 100s of different lines of code beginning with
Bw.write
How can I make it so after every line beginning with Bw.write, I can insert a new line under it with BufferedWriter.newLine();
Thanks in advance
EDIT: Sorry for the confusion
For example I need
Bw.write 12345
Bw.write 98765
to become
Bw.write 12345
BufferedWriter.newLine();
Bw.write 98765
BufferedWriter.newLine();
Replace Bw\.write.* with $0\r\nBufferedWriter\.newLine();
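For anyone who wants to check the replacement programmatically, here is a minimal Python sketch of the same idea. The function name add_newline_calls is hypothetical, \g<0> is Python's spelling of $0, and I am assuming LF line endings in the input.

```python
import re

def add_newline_calls(source: str) -> str:
    """Append a BufferedWriter.newLine(); line after every Bw.write line."""
    # .* stops at the end of the line, so \g<0> is the whole Bw.write statement.
    return re.sub(r"Bw\.write.*", r"\g<0>\nBufferedWriter.newLine();", source)

# The example from the question.
sample = "Bw.write 12345\nBw.write 98765\n"
result = add_newline_calls(sample)
print(result)
```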

Use recognised data for replacing

I have column of dates in my Notepad++:
2017-06-12
2017-06-13
2017-06-14
2017-06-15
2017-06-16
2017-06-17
2017-06-18
2017-06-19
2017-06-20
2017-06-20
2017-06-21
2017-06-22
2017-06-23
2017-06-24
2017-06-25
2017-06-26
2017-06-27
2017-06-28
2017-06-29
2017-06-30
2017-07-01
2017-07-02
2017-07-03
2017-07-04
2017-07-05
2017-07-06
2017-07-07
2017-07-08
2017-07-09
2017-07-10
I need to cut it into weeks by placing \r\n after each week, like this:
2017-06-12
2017-06-13
2017-06-14
2017-06-15
2017-06-16
2017-06-17
2017-06-18

2017-06-19
2017-06-20
2017-06-20
2017-06-21
2017-06-22
2017-06-23
2017-06-24
2017-06-25

2017-06-26
2017-06-27
2017-06-28
2017-06-29
2017-06-30
2017-07-01
2017-07-02

2017-07-03
2017-07-04
2017-07-05
2017-07-06
2017-07-07
2017-07-08
2017-07-09

2017-07-10
I do replace by using RegEx. I find 7 days:
\d\d\d\d-\d\d-\d\d\r\n\d\d\d\d-\d\d-\d\d\r\n\d\d\d\d-\d\d-\d\d\r\n\d\d\d\d-\d\d-\d\d\r\n\d\d\d\d-\d\d-\d\d\r\n\d\d\d\d-\d\d-\d\d\r\n\d\d\d\d-\d\d-\d\d\r\n
And now I would like to add \r\n
But how do I replace the matched text with itself plus \r\n?
If you are sure that the first date is a Monday, you could do this:
Ctrl+H
Find what: (?:\d{4}-\d\d-\d\d\R){7}
Replace with: $0\r\n
Replace all
In your example input some lines are doubled, e.g. 2017-06-20. In your example output this line is also doubled, and the week block consists of eight lines: seven unique lines plus the duplicate of 2017-06-20. I assume that all lines in the input are sorted, so duplicate lines are adjacent. Additionally, I assume that the first line marks the first day of a week.
Do a regular expression find/replace like this:
Open Replace Dialog
Find What: (((.*\R)\3*){7})
Replace With: \1\r\n
Check regular expression, do not check . matches newline
Click Replace or Replace All
Explanation
Let's explain (((.*\R)\3*){7}) from the inside out, starting at the third (innermost) group. In the following, x and y stand for regex parts, not literal characters.
(.*\R) the third group is just one line from start to end
(y\3*) we look for y followed by zero or more repetitions of whatever the third group captured. Since y is the third group itself, this matches a line followed by any number of identical copies of it; this deals with the 2017-06-20 case
(x{7}) we match seven repetitions of x, which here means seven unique rows, each of which may be repeated within the block, so 8 lines with one line doubled is OK
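The duplicate-tolerant pattern can be checked with a short Python sketch. This is an adaptation, not the Notepad++ expression itself: Python's re module has no \R, so \r?\n stands in for it, the outer groups are dropped so the line group becomes \1, and \g<0> plays the role of the whole match. The name split_weeks and the sample dates are only for illustration.

```python
import re

# One line, optionally followed by identical copies of itself, seven times over.
WEEK = re.compile(r"(?:(.*\r?\n)\1*){7}")

def split_weeks(text: str) -> str:
    """Insert an empty line after every block of seven unique consecutive lines."""
    return WEEK.sub(r"\g<0>\n", text)

# Hypothetical sample: one week with 2017-06-20 doubled, plus one extra day.
dates = [
    "2017-06-19", "2017-06-20", "2017-06-20", "2017-06-21", "2017-06-22",
    "2017-06-23", "2017-06-24", "2017-06-25", "2017-06-26",
]
text = "\n".join(dates) + "\n"
result = split_weeks(text)
print(result)
```

The eight-line block (seven unique dates plus the duplicate) should be followed by an empty line, while the single leftover date is left alone because it does not form a full week.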

vim: substitute specific character, but only after nth occurrence

I need to make this exercise about regexes and text manipulation in vim.
So I have this file about the most scoring soccer players in history, with 50 entries looking like this:
1 Cristiano Ronaldo Portugal 88 121 0.73 03 Manchester United Real Madrid
The whitespaces between the fields are tabs (\t)
The fields each correspond to a different category: etc...
The last field contains one or more clubs the player has played for (so not a fixed number of clubs).
The question: replace all tabs with a ';', except in the last field, where the clubs need to be separated by a ','.
So I thought: I just replace all of them with a comma, and then I replace the first 7 commas with a semicolon. But how do you do that? Everything - from regex to vim commands - is allowed.
The first part is easy: :2,$s/\t/,/g
But the second part, I can't seem to figure out.
Any help would be greatly appreciated.
Thanks, Zeno
This answer is similar to @Amadan's, but it makes use of the ability to provide an expression as the replace string to actually do the difficult bit of changing the first set of tabs to semicolons:
%s/\v(.{-}\t){7}/\=substitute(submatch('0'), '\t', ';', 'g')/|%s/\t/,/g
Broken down this is a set of three substitute commands. The first two are cobbled together with a sub-replace-expression:
%s/\v(.{-}\t){7}/\=substitute(submatch('0'), '\t', ';', 'g')/
What this does is find exactly seven occurrences ({7}) of any characters followed by a tab, matched in a non-greedy way ((.{-}\t)). Then we replace this entire match (submatch(0)) with the result of the substitute expression (\=substitute(...)). The substitute expression is simple by comparison, as it just converts all tabs to semicolons.
The last substitute just changes any other tabs on the line to commas.
See :help sub-replace-expression
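The same two-step idea can be sketched in Python, with a replacement function standing in for Vim's \= expression. The function name convert_line is hypothetical, and the tab positions in the sample line are my assumption, since the question's example does not show where the tabs are.

```python
import re

def convert_line(line: str) -> str:
    """Turn the first seven tabs into ';' and any remaining tabs into ','."""
    # Step 1: match up to and including the 7th tab (non-greedy), and rewrite
    # the tabs inside that match. count=1 limits this to the start of the line.
    line = re.sub(
        r"(?:.*?\t){7}",
        lambda m: m.group(0).replace("\t", ";"),
        line,
        count=1,
    )
    # Step 2: whatever tabs are left separate the clubs.
    return line.replace("\t", ",")

# Assumed field layout: seven leading fields, then the clubs.
sample = "1\tCristiano Ronaldo\tPortugal\t88\t121\t0.73\t03\tManchester United\tReal Madrid"
converted = convert_line(sample)
print(converted)
```

If a line has fewer than seven tabs, step 1 does not match and every tab becomes a comma, which mirrors what the Vim one-liner would do with such a line.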
Here's one way you could do it:
:let @q=":s/\t/;\<cr>"
:2,$norm 7@q
:2,$s/\t/,/g
Explanation:
First, we define a macro 'q' that will replace one tab with a semicolon. Now, on any line we can simply run this macro n times to replace the first n tabs. To automatically do this to every line, we use the norm command:
:2,$norm 7@q
This is essentially the same thing as literally typing 7@q (i.e. "run macro 'q' seven times") on every line in the specified range. From there, we can simply replace every remaining tab with a comma.
:2,$s/\t/,/g
:2,$s/\t\(.*\t\)\@=/;/g
:2,$s/\t/,
Change any tabs where there is a tab later to ;
Change any remaining tabs to ,
EDIT: Misunderstood. Here is a fixed version:
:2,$s/\(\(\t.*\)\{7}\)\@<=\t/,/g
:2,$s/\t/;/g
Change any tabs where there's seven tabs before it to ,
Change any remaining tabs to ;
My PatternsOnText plugin has (among others) a :SubstituteSelected command that allows to specify the match positions. With this, you can easily replace the first 8 tabs with semicolons, and then use a regular substitute to change the remaining tabs into commas:
:2,$SubstituteSelected/\t/;/g 1-8
:2,$s/\t/,/g
We solved the issue by just capturing the first 8 groups manually, ([^\t]*\t)(...)(...), then separating them with semicolons (\1;\2;...;), and then replacing the remaining tabs with commas: 2,$s/\t/,/g
Thanks to everyone trying to help!

Adding characters in front and end of specific subtitle lines in Notepad++?

I want to add a dash in front of a continuing subtitle line. Like this:
Example sub (.srt):
1
00:00:48,966 --> 00:00:53,720
Today he was so angry and happy
at the same time,
2
00:00:53,929 --> 00:00:57,683
he went to the store and bought a
couple of books. Then the walked home
3
00:00:57,849 --> 00:01:01,102
with joy and jumped in the pool.
4
00:00:57,849 --> 00:01:01,102
One day he was in a bad mood and he
didn't get happier when he read.
TO THIS:
1
00:00:48,966 --> 00:00:53,720
Today he was so angry and happy
at the same time-
2
00:00:53,929 --> 00:00:57,683
-he went to the store and bought a
couple of books. Then the walked home-
3
00:00:57,849 --> 00:01:01,102
-with joy and jumped in the pool.
4
00:00:57,849 --> 00:01:01,102
One day he was in a bad mood and he
didn't get happier when he read.
The original subtitle is in Swedish. This is the standard for Scandinavian subtitles.
How do I format it with regex in Notepad++? How should I write the tags and what if the subtitle contains italic tags in front and end?
You can use this regex with the g and m modifiers:
(?:,|([^.?!]<[^>]+>|[^>.?!]))$(\n\n.*\n.*\n)
Use $1-$2- as the substitution.
I'm using a simple definition of sentence. If there is one of .?!, that's counted as the end of a sentence. While this may not be a perfect definition, you're only looking at the ends of sentences.
Depending on several factors (for example, a line ending in ), you may need to tweak it a little.
Essentially, the regex is two parts.
The first part matches one of three things at the end of a line. If it matches a comma, that comma is removed. Otherwise, it looks to see if the last letter (if there is a tag, the letter before that) is NOT any of .?!.
The second part matches all the lines before the one that needs the dash. This also helps ensure that the end of the line you just matched is followed by a new line (and not more text).
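Here is a minimal Python sketch for trying the regex on a small sample. The function name add_continuation_dashes is made up, the sample assumes LF line endings and an empty line between entries (as in a normal .srt file), and a replacement function is used instead of $1-$2- so that an unmatched group 1 (the comma case) simply becomes an empty string.

```python
import re

# Same pattern as above; MULTILINE makes $ match at the end of every line.
PATTERN = re.compile(r"(?:,|([^.?!]<[^>]+>|[^>.?!]))$(\n\n.*\n.*\n)", re.MULTILINE)

def add_continuation_dashes(srt: str) -> str:
    """Add '-' at the end of an unfinished sentence and at the start of the next entry."""
    def repl(m):
        # Group 1 is None when a trailing comma was matched; the comma is dropped.
        return (m.group(1) or "") + "-" + m.group(2) + "-"
    return PATTERN.sub(repl, srt)

# Hypothetical sample based on the question's subtitles.
sample = (
    "1\n"
    "00:00:48,966 --> 00:00:53,720\n"
    "Today he was so angry and happy\n"
    "at the same time,\n"
    "\n"
    "2\n"
    "00:00:53,929 --> 00:00:57,683\n"
    "he went to the store and bought a\n"
    "couple of books. Then he walked home\n"
    "\n"
    "3\n"
    "00:00:57,849 --> 00:01:01,102\n"
    "with joy and jumped in the pool.\n"
)
dashed = add_continuation_dashes(sample)
print(dashed)
```

Entries whose last line ends in one of .?! are left alone, which is why the third entry gets a leading dash but no trailing one.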