I have a directory with a bunch of text files, all of which follow this structure:
...
- Some random number of list items of random text
- And even more of it
PATTERN_A (surrounded by empty lines)
- Again, some list items of random text
- Which does look similar as the first batch
PATTERN_B (surrounded by empty lines)
- And even more some random text
....
And I need to run a replace operation (let's say, I need to prepend CCC at the beginning of the line, just after the dash) on only those "list items", which are between PATTERN_A and PATTERN_B. The problem is they aren't really much different from the text above PATTERN_A, or below PATTERN_B, so an ordinary regex can't really catch them without also affecting the remaining text.
So, my question would be, what tool and what regex should I use to perform that replacement?
(Just in case, I'm fine with Vim, and I can collect those files in a QuickFix for a further :cdo, for example. I'm not that good with awk, unfortunately, and absolutely bad with Perl :))
Thanks!
If I have understood your question, you can do this quite easily with a pattern-range address and the general substitution form in sed (the stream editor). For example, in your case:
$ sed '/PATTERN_A/,/PATTERN_B/s/^\([ ]*-\)/\1CCC/' file
- Some random number of list items of random text
- And even more of it
PATTERN_A (surrounded by empty lines)
-CCC Again, some list items of random text
-CCC Which does look similar as the first batch
PATTERN_B (surrounded by empty lines)
- And even more some random text
(Note: to substitute in place within the file, add the -i option; to also keep a backup, use -i.bak, which saves the original file as file.bak.)
Explanation
/PATTERN_A/,/PATTERN_B/ - select the lines from PATTERN_A through PATTERN_B, inclusive (the pattern lines themselves have no leading dash, so the substitution leaves them unchanged)
s/^\([ ]*-\)/\1CCC/ - substitute (general form 's/find/replace/'), where find is anchored at the beginning of the line (^) and captures, inside the group \(...\), any number of spaces followed by a hyphen ([ ]*-); replace writes the captured text back with the backreference \1 and appends CCC to it.
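To make the range logic explicit, here is a minimal Python sketch of what the sed command does. The function name prefix_in_range and its default arguments are illustrative, not part of sed. Like sed with a regex end address, it treats the start line as part of the range and only begins testing the end pattern on the following line.

```python
import re

ITEM = re.compile(r"^([ ]*-)")  # optional leading spaces, then the dash


def prefix_in_range(lines, start="PATTERN_A", end="PATTERN_B", prefix="CCC"):
    """Prefix list items between the start and end patterns, like sed '/A/,/B/s/.../'."""
    out = []
    in_range = False
    for line in lines:
        if not in_range and re.search(start, line):
            in_range = True
            closes = False  # as in sed, the end pattern is not tested on the opening line
        elif in_range:
            closes = re.search(end, line) is not None
        else:
            out.append(line)  # outside the range: copy unchanged
            continue
        # Inside the range: put the prefix right after the captured dash.
        out.append(ITEM.sub(lambda m: m.group(1) + prefix, line, count=1))
        if closes:
            in_range = False  # the end line itself still belongs to the range
    return out
```

As with sed, a later PATTERN_A line would open a new range, so several A/B sections in one file are all handled.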
Look things over and let me know if you have questions or if I misinterpreted your question.
You can get the same result with Perl:
> perl -pe 's/^(\s*-)/$1CCC/ if /PATTERN_A/../PATTERN_B/' mass_replace.txt
...
- Some random number of list items of random text
- And even more of it
PATTERN_A (surrounded by empty lines)
-CCC Again, some list items of random text
-CCC Which does look similar as the first batch
PATTERN_B (surrounded by empty lines)
- And even more some random text
....
I have a text file that takes the form of:
first thing: content 1
second thing: content 2
third thing: content 3
fourth thing: content 4
This pattern repeats throughout the entire text file. However, sometimes one of the rows is completely gone like so:
first thing: content 1
second thing: content 2
fourth thing: content 4
How could I search the document for these missing rows and add them back with a value of "NA" or some other filler, to produce a new text file like this:
# 'third thing' was not there, so re-adding it with NA as content
first thing: content 1
second thing: content 2
third thing: NA
fourth thing: content 4
Current code boilerplate:
with open('original.txt', 'r') as infile:
    with open('output.txt', 'w') as out:
        # Search file for pattern (maybe regex?)
        # If pattern does not exist, add the line
        pass
Thanks for any help you all can offer!
You need to look for records of 1-3 lines (fewer than 4) between blank lines:
^\n([^\n]*\n){1,3}\n
Demo: https://regex101.com/r/rL3eA5/2
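Applying this idea from Python takes a small tweak, sketched below under the assumption that records are separated by blank lines. The pattern is a slightly tightened variant of the one above, not the answer's exact regex: each line must be non-empty, and the closing blank line is a lookahead so that two short records in a row are both found. The helper name short_blocks is hypothetical.

```python
import re

# Variant of the pattern above: a blank line, one to three non-empty lines,
# then a blank line that is only looked at (not consumed), so it can also
# open the next match. re.MULTILINE makes ^ match at every line start.
SHORT_BLOCK = re.compile(r"^\n((?:[^\n]+\n){1,3})(?=\n)", re.MULTILINE)


def short_blocks(text):
    """Return the lines of every blank-line-separated record with fewer than 4 lines."""
    # Pad with blank lines so the first and last records are delimited too.
    padded = "\n" + text.strip("\n") + "\n\n"
    return [m.group(1).splitlines() for m in SHORT_BLOCK.finditer(padded)]
```

This only reports which records are incomplete; deciding which row is missing still needs a check of the keys in each returned record.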
This isn't pretty, but it works. Here's a regex to detect where lines are missing:
(?:^|\n)(second thing:\s*[^\n]+\n)|(first thing:\s*[^\n]+\n(?!second thing:))|(second thing:\s*[^\n]+\n(?!third thing:))|(third thing:\s*[^\n]+\n(?!fourth thing:))|(third thing:\s*[^\n]+\n\n)
regex101 demo here
Notice the Single Line flag.
When you've got a match, check which capture group matched. If it's the first one, the first line is missing. If it's the second one, the second line is missing, and so on for the third and fourth.
Here's an example of how to replace if the 1st group got a match.
Here's an example of how to replace if the 3rd group got a match.
Here's an example of how to replace if the 4th group got a match.
You'll probably have to do some tweaking, but it should get you on your way ;)
Regards.
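If the goal is to fill the gaps rather than only detect them, a line-by-line pass may be easier to maintain than one regex per missing row. This is a different technique from the regex answers above, sketched in Python under two assumptions: the keys always appear in the fixed order listed in KEYS, and every non-blank line has the form "key: value". A record boundary is taken to be either a blank line or a key that appears out of order; if one record loses its tail and the next loses its head with no blank line between them, the two cannot be told apart. The names fill_missing and fill_file are hypothetical; fill_file reuses the file names from the question's boilerplate.

```python
# Assumed: the four keys always appear in this order within a record.
KEYS = ["first thing", "second thing", "third thing", "fourth thing"]


def fill_missing(lines, filler="NA"):
    """Return lines with any absent 'key: value' rows re-inserted as 'key: filler'."""
    out = []
    pos = 0  # index in KEYS of the next key expected in the current record

    def pad(until):
        # Emit placeholder rows for every expected key before index `until`.
        nonlocal pos
        while pos < until:
            out.append(f"{KEYS[pos]}: {filler}")
            pos += 1

    for line in lines:
        if not line.strip():  # a blank line closes the current record
            if pos:
                pad(len(KEYS))
                pos = 0
            out.append(line)
            continue
        idx = KEYS.index(line.split(":", 1)[0].strip())  # ValueError on unknown keys
        if idx < pos:  # key order went backwards: a new record began
            pad(len(KEYS))
            pos = 0
        pad(idx)
        out.append(line)
        pos = idx + 1
    if pos:
        pad(len(KEYS))
    return out


def fill_file(src="original.txt", dst="output.txt"):
    # Text mode ('w'), not 'wb': the lines written are str, not bytes.
    with open(src) as infile, open(dst, "w") as outfile:
        for line in fill_missing(infile.read().splitlines()):
            outfile.write(line + "\n")
```

For the example in the question, fill_missing turns the three-line record into the four-line one with "third thing: NA" in place.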
I have a source document with the following text
Here is a bunch of text
...
Collect underpants
???
Profit!
...
More text
I would like to visually select the middle three lines and insert numbers in front of them:
Here is a bunch of text
...
1. Collect underpants
2. ???
3. Profit!
...
More text
All the solutions I found either put the numbers on their own new lines or prepended the file's actual line numbers.
How can I prepend a range of numbers to existing lines, starting with 1?
It makes for a good macro.
Add the first number to your line, and put your cursor back at the beginning.
Start a macro with qq (or q<any letter>)
Copy the number with yf<space> (yank, find space)
Move down a line with j
Paste your yank with P
Move back to the beginning of the line with 0
Increment the number with Ctrl-a
Back to the beginning again with 0 (incrementing positions you at the end of the number)
End the macro by typing q again
Play the macro with @q (or @<the letter you picked>)
Replay the macro as many times as you want with <number>@@ (@@ replays the last macro)
Profit!
To summarize, the whole key sequence is i1. <Esc>0qqyf jP0<C-a>0q10@q.
To apply enumeration for all lines:
:let i=1 | g/^/s//\=i.'. '/ | let i=i+1
To enumerate only selected lines:
:let i=1 | '<,'>g/^/s//\=i.'. '/ | let i=i+1
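The idea behind these one-liners is a counter that lives outside the per-line command: :g visits each line of the range in order, the substitution inserts the current value of i at the start of the line (s// reuses the last search pattern, here the ^ from :g), and let i=i+1 then bumps the counter for the next line. A rough Python equivalent follows, with 1-based inclusive line numbers standing in for the '<,'> range; the function name and signature are hypothetical.

```python
def enumerate_range(lines, first=1, last=None, start=1):
    """Prefix lines first..last (1-based, inclusive) with 'n. ', counting from start."""
    if last is None:
        last = len(lines)  # like :g with no range: the whole buffer
    i = start  # plays the role of :let i=1
    out = list(lines)
    for n in range(first - 1, last):
        out[n] = f"{i}. {out[n]}"  # s//\=i.'. '/ on the visited line
        i += 1  # let i=i+1
    return out
```

For the example in the question, enumerate_range(lines, 3, 5) numbers only "Collect underpants", "???" and "Profit!".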
Define a non-recursive mapping with the following command, then type ,enum in Normal mode while the cursor is inside the lines you want to enumerate.
:nn ,enum {j<C-v>}kI0. <Esc>vipg<C-a>
TL;DR
You can type :help v_g_CTRL-A to see the answer to your question:
{Visual}g CTRL-A Add [count] to the number or alphabetic character in
the highlighted text. If several lines are
highlighted, each one will be incremented by an
additional [count] (so effectively creating a
[count] incrementing sequence).
For Example, if you have this list of numbers:
1.
1.
1.
1.
Move to the second "1." and Visually select three
lines, pressing g CTRL-A results in:
1.
2.
3.
4.
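Read literally, the help text says that with a count c, the k-th highlighted line (k = 1, 2, ...) has its number increased by k times c. A minimal Python model of that rule follows; it assumes whole lines are selected and that the first run of decimal digits on each line is the number to change, and it ignores negative numbers and Vim's other 'nrformats' (hex, octal, alpha). visual_g_ctrl_a is a made-up name.

```python
import re

NUMBER = re.compile(r"\d+")  # assumption: plain decimal digits, no sign


def visual_g_ctrl_a(lines, step=1):
    """Model {Visual}g CTRL-A: the k-th selected line (1-based) gets +k*step."""
    out = []
    for k, line in enumerate(lines, start=1):
        # Only the first number on each highlighted line is changed.
        bumped = NUMBER.sub(lambda m: str(int(m.group()) + k * step), line, count=1)
        out.append(bumped)
    return out
```

Feeding it the three selected "1." lines from the help example gives 2., 3. and 4.; starting from "0." lines gives 1., 2., 3., which is why the answer below prepends zeros.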
If you have a paragraph (:help paragraph) you can select it (look at :help object-select). Suppose each new line in the paragraph needs to be enumerated.
{ jump to the beginning of current paragraph
j skip blank line, move one line down
<C-v> stands for Ctrl-v, which turns on Visual block mode
} jump to the end of current paragraph
k skip blank line, move one line up
With the required region selected, we can make a multi-line edit:
I enters Insert mode with the cursor at the beginning of the block
0. is typed, to be added at the beginning of each line
<Esc> returns to Normal mode, and the typed text appears on every selected line
You should get a list prepended with zeros. If your list already has them, you can omit this part.
vip select inner paragraph (list prepended with "0. ")
g<C-a> does the magic
I find it easier to number from zeros than to leave the first line of the list out of the selection, as the documentation's example does.
Note: personally I use no mappings; it is easier to remember what g <C-a> does and use it directly. The macro answer above uses plain <C-a>, which requires you to count by hand, whereas g <C-a> can increment numbers by a given value (a step) and keeps its own "internal counter".
Here is a mapping for @DmitrySandalov's solution:
vnoremap <silent> <Leader>n :<C-U>let i=1 \| '<,'>g/^/s//\=i.'. '/ \| let i=i+1 \| nohl<CR>
I have a file with long lines and need to see/copy the values at specific location(s) throughout the whole file, without copying the rest of the line.
If the text width is small enough (~184 columns), I can use :set colorcolumn=num to highlight the value. However, beyond 184 characters, scrolling gets unwieldy.
I tried :g/\%1237c/y Z, for one of the positions I needed, but that yanked the entire line.
For example, on the smaller sample below, :g/\%49c/y Z yanks all of lines 1 and 2, but I want to yank only the character at that column, i.e. = on line 1 and x on line 2.
vim: filetype=help foldmethod=indent foldclose=all modifiable noreadonly
Table of Contents *sfcontents* *vim* *regex* *sfregex*
*sfsearch* - Search specific commands
|Ampersand-replaces-previous-pattern|
|append-a-global-search-to-a-register|
*sfHelp* Various Help related commands
There are two problems with your :g command:
For each matching line, the cursor is positioned on the first column. So even though you've matched at a particular column, that position is lost.
The \%c atom actually matches byte indices (what Vim somewhat confusingly names "columns"), so your measurement will be off for Tab and non-ASCII characters. Use the virtual column atom \%v instead.
Instead of :global, I would use :substitute with a replace-expression, following the idiom described in "how to extract regex matches using vim":
:let t=[] | %s/\%49v./\=add(t, submatch(0))[-1]/g | let @@ = join(t, "\n")
Alternatively, if you install my ExtractMatches plugin, it's this short command invocation:
:YankMatchesToReg /\%49v./
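Both commands collect, for every line, the single character at virtual (display) column 49. The difference from \%c is easiest to see in a rough Python model of "the character at a display column", sketched under stated assumptions: tabs expand to a 'tabstop' of 8 (Vim's default), every other character occupies one cell (double-width characters are not modeled), and if the column falls inside a tab the sketch returns the tab itself, which may not match Vim in that edge case. A plain index such as line[48] would, like \%c, ignore how wide a tab is displayed. The function names are hypothetical.

```python
def char_at_vcol(line, vcol, tabstop=8):
    """Return the character covering display column vcol (1-based), or None."""
    col = 1  # display column where the current character starts
    for ch in line:
        # A tab advances to the next tab stop; everything else is one cell.
        width = tabstop - (col - 1) % tabstop if ch == "\t" else 1
        if col <= vcol < col + width:
            return ch
        col += width
    return None  # the line ends before reaching vcol


def column_values(lines, vcol, tabstop=8):
    """Collect the character at vcol from every line that is long enough."""
    found = (char_at_vcol(line, vcol, tabstop) for line in lines)
    return [ch for ch in found if ch is not None]
```

"\n".join(column_values(lines, 49)) then roughly corresponds to what the :substitute idiom leaves in the register.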
A multi-line document has a series of header/title lines, each followed by about 10 listings. I need to put the header/title info on each of the listings so that they can be properly uploaded into a website (using comma and pipe delimiters). It looks like this:
SectionName1 and TitleName1
1111 - The SubSectionName A
222 - The SubSectionName B
3333 - The SubSectionName C
SectionName2 and TitleName2
444 - The SubSectionName D
55555 - The SubSectionName E
66 - The SubSectionName F
Repeating several hundred times. What I need is to produce something like:
SectionName1,TitleName1,1111,SubSectionNameA
SectionName1,TitleName1,222,SubSectionNameB
SectionName1,TitleName1,3333,SubSectionNameC
SectionName2,TitleName2,444,SubSectionNameD
SectionName2,TitleName2,55555,SubSectionNameE
SectionName2,TitleName2,66,SubSectionNameF
I realize there can be multiple approaches to this, but I'm having a difficult time pulling the trigger on any one method. I understand submatches, joins and getline, but I'm not good at applying them in practice in this scenario.
Any help to get me mentally started would be greatly appreciated.
Let me propose the following quite general Ex command that solves the
issue.[1]
:g/^\s*\h/d|let@"=substitute(@"[:-2],'\s\+and\s\+',',','')|ki|/\n\s*\h\|\%$/kj|
\ 'i,'js/^\s*\(\d\+\)\s\+-\s\+The/\=@".','.submatch(1).','/|'i,'js/\s\+//g
At the top level, this is the :global command that enumerates the lines
starting with zero or more whitespace characters followed by a Latin letter or
an underscore (see :help /\h). The lines matching this pattern are supposed
to be the header lines containing section and title names. The rest of the
command, after the pattern describing the header lines, is a list of instructions to be
executed for each of those lines.
The actions to be performed on the headers can be divided into three steps.
Delete the current header line, at the same time extracting section
and title names from it.
:d|let@"=substitute(@"[:-2],'\s\+and\s\+',',','')
First, remove the current line, saving it into the unnamed register,
using the :delete command. Then, update the contents of that
register (referred to as @"; see :help @r and :help "") to be the
result of a substitution that changes the word and, surrounded by
whitespace characters, into a single comma. The actual replacement is
carried out by the substitute() function.
However, the input is not the exact string containing the whole header
line, but its prefix leaving out the last character, which is
a newline symbol. The [:-2] notation is a short form of the
[0:-2] subscript expression that designates the substring from the
very first byte to the second one counting from the end (see :help
expr-[:]). This way, the unnamed register holds the section and the
title names separated by comma.
Determine the range of dependent subsection lines.
:ki|/\n\s*\h\|\%$/kj
After the first step, the subsection records belonging to the just
parsed header line are located starting from the current line (the one
that followed the header) until the next header line or, if there is no
such line below, the end of the buffer. The numbers of the first and last of
these lines are stored in the marks i and j, respectively. (See :helpg ^A mark
is for a description of marks.)
The marks are placed using the :k command that sets a specified mark
at the last line of a given range which is the current line, by
default. So, unlike the first line of the considered block, the last
one requires a specific line range to point out its location.
A particular form of range, denoting the next line where a given
pattern matches, is used in this case (see :help :range). The
pattern defining the location of the line to be found is composed in
such a way that it matches a line immediately preceding a header (a
line starting with possible whitespace followed by an alphabetical
character), or the very last line. (See :help pattern for details
about syntax of Vim regular expressions.)
Transform the delineated subsection lines according to desired format,
prepending section and title names found in the corresponding header
line.
:'i,'js/^\s*\(\d\+\)\s\+-\s\+The/\=@".','.submatch(1).','/|'i,'js/\s\+//g
This step consists of two :substitute commands that are run
over the range of lines delimited by the locations labelled by the
marks i and j (see :help [range]).
The first substitution command matches the beginning of a subsection
line—an identifier followed by a hyphen and the word The, all
floating in a whitespace—and replaces it with the contents of the
unnamed register, holding the section and title names concatenated
with a comma, the matched identifier, and another comma. The second
substitution finalizes the transformation by squeezing all whitespace
characters out of the line, which joins the subsection name and the following
letter together.
To construct the replacement string in the first :substitute
command, the substitute-with-an-expression feature is used (see :help
sub-replace-\=). The substitution part of the command must start
with \= for Vim to interpret the remaining text not as a literal string
but as an expression (see :help expression). The result of
that expression's evaluation becomes the substitution string. Note
the use of the submatch() function in the substitute expression to
retrieve the text of a submatch by its number.
[1] The command is wrapped for better readability; its one-line
version is listed below for ease of copy-pasting into the Vim command line. Note
that the wrapped command can be used in a Vim script without any change.
:g/^\s*\h/d|let@"=substitute(@"[:-2],'\s\+and\s\+',',','')|ki|/\n\s*\h\|\%$/kj|'i,'js/^\s*\(\d\+\)\s\+-\s\+The/\=@".','.submatch(1).','/|'i,'js/\s\+//g
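For comparison, the three steps explained above can be sketched in Python. This is a port of the same idea, not a replacement for the Ex command, and it assumes the input looks like the sample: header lines start with a letter or underscore and contain " and ", and listing lines start with a number followed by " - The". Lines matching neither (such as blank lines) are simply dropped, and unlike the Vim pattern, The must be a whole word here. The function name to_rows is hypothetical.

```python
import re

HEADER = re.compile(r"\s*[A-Za-z_]")           # like Vim's ^\s*\h
LISTING = re.compile(r"\s*(\d+)\s+-\s+The\b")  # "1111 - The ..."


def to_rows(lines):
    """Rewrite header/listing blocks as 'Section,Title,id,Name' rows."""
    rows = []
    prefix = None  # "Section,Title" taken from the most recent header line
    for line in lines:
        if HEADER.match(line):
            # Step 1: the first " and " in the header becomes a comma.
            prefix = re.sub(r"\s+and\s+", ",", line.strip(), count=1)
            continue
        # Step 2 is implicit: every listing belongs to the latest header.
        m = LISTING.match(line)
        if m and prefix is not None:
            # Step 3: "id - The" becomes "prefix,id," and all whitespace is dropped.
            row = prefix + "," + m.group(1) + "," + line[m.end():]
            rows.append(re.sub(r"\s+", "", row))
    return rows
```

On the sample input from the question, this produces exactly the six rows of the desired output.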
Simplest/fastest way I can think of is a simple macro. Do once, rinse, repeat.
Assuming your cursor is initially on the first character of the first line (S of SectionName), this macro should work as long as the document is exactly in the same format as posted above.
f ctT,<Esc>yyjpjjpjddkkkddkkkJr,f ctS,<Esc>f xjJr,f ctS,f xjJr,f ctS,<Esc>f xjdd
Well, I think the question is not that clear. Why, in your demo input, is the text after "-" like this:
55555 - The SubSectionName E
but in your expected output, it turned into:
55555,SubSectionNameE
All spaces were removed, which is fine, but why was "The" removed as well? Is there a rule for "The"?
I wrote an awk one-liner that removes all spaces in the output but leaves the "The" in place; you can change it to get the exact output you need. (The \s in the regex is a GNU awk extension; other awks may not support it, and [[:space:]] is the portable form.)
awk -F' and ' -vOFS="," 'NF>1{s=$1;t=$2;next;}$1{gsub(/\s+/,"");gsub(/-/,",");print s,t,$0} ' input
test on your example input:
kent$ cat v
SectionName1 and TitleName1
1111 - The SubSectionName A
222 - The SubSectionName B
3333 - The SubSectionName C
SectionName2 and TitleName2
444 - The SubSectionName D
55555 - The SubSectionName E
66 - The SubSectionName F
kent$ awk -F' and ' -vOFS="," 'NF>1{s=$1;t=$2;next;}$1{gsub(/\s+/,"");gsub(/-/,",");print s,t,$0} ' v
SectionName1,TitleName1,1111,TheSubSectionNameA
SectionName1,TitleName1,222,TheSubSectionNameB
SectionName1,TitleName1,3333,TheSubSectionNameC
SectionName2,TitleName2,444,TheSubSectionNameD
SectionName2,TitleName2,55555,TheSubSectionNameE
SectionName2,TitleName2,66,TheSubSectionNameF