Vim Markdown highlighting (list items and code block conflicts) - regex

I decided to learn more about Vim and its syntax highlighting.
Using examples from others, I am creating my own syntax file for Markdown. I have seen mkd.vim and it has this problem too.
My issue is between list items and code block highlighting.
Code Block definition:
first line is blank
second line begins with at least 4 spaces or 1 tab
block is finished with a blank line
Example:
Regular text
this is code, monospaced and left untouched by markdown
another line of code
Regular Text
My Vim syntax for code block:
syn match mkdCodeBlock /\(\s\{4,}\|\t\{1,}\).*\n/ contained nextgroup=mkdCodeBlock
hi link mkdCodeBlock comment
Unordered list item definition:
first line is blank
second line begins with a [-+*] followed by a space
the list is finished with a blank line then a normal (non-list) line
in between line items any number of blank lines can be added
a sub list is specified by indenting (4 spaces or 1 tab)
a line of normal text after a list item is included as a continuation of that list item
Example:
Regular text
- item 1
- sub item 1
- sub item 2
- item 2
this is part of item 2
so is this
- item 3, still in the same list
- sub item 1
- sub item 2
Regular text, list ends above
My Vim syntax for the unordered list item definition (I only highlight [-+*]):
syn region mkdListItem start=/\s*[-*+]\s\+/ matchgroup=pdcListText end=".*" contained nextgroup=mkdListItem,mkdListSkipNL contains=@Spell skipnl
syn match mkdListSkipNL /\s*\n/ contained nextgroup=mkdListItem,mkdListSkipNL skipnl
hi link mkdListItem operator
I cannot get the highlighting to work with the last two rules for lists together with a code block.
This is an example that breaks my syntax highlighting:
Regular text
- Item 1
- Item 2
part of item 2
- these 2 lines should be highlighted as a list item
- but they are highlighted as a code block
I currently cannot figure out how to get the highlighting to work the way I want it to.
I forgot to add a "global" syntax rule used by both rules listed above. It ensures that they start with a blank line.
syn match mkdBlankLine /^\s*\n/ nextgroup=mkdCodeBlock,mkdListItem transparent
Another Note: I should have been more clear. In my syntax file, the List rules appear before the Blockquote Rules

Just make sure that the definition of mkdListItem is after the definition of mkdCodeBlock, like this:
syn match mkdCodeBlock /\(\s\{4,}\|\t\{1,}\).*\n/ contained nextgroup=mkdCodeBlock
hi link mkdCodeBlock comment
syn region mkdListItem start=/\s*[-*+]\s\+/ matchgroup=pdcListText end=".*" contained nextgroup=mkdListItem,mkdListSkipNL contains=@Spell skipnl
syn match mkdListSkipNL /\s*\n/ contained nextgroup=mkdListItem,mkdListSkipNL skipnl
hi link mkdListItem operator
syn match mkdBlankLine /^\s*\n/ nextgroup=mkdCodeBlock,mkdListItem transparent
Vim documentation says in :help :syn-define:
"In case more than one item matches at the same position, the one that was
defined LAST wins. Thus you can override previously defined syntax items by
using an item that matches the same text. But a keyword always goes before a
match or region. And a keyword with matching case always goes before a
keyword with ignoring case."

hcs42 was correct. I do remember reading that section now, but I had forgotten about it until hcs42 reminded me.
Here is my updated syntax (with a few other tweaks) that works:
"""""""""""""""""""""""""""""""""""""""
" Code Blocks:
" Indent with at least 4 spaces or 1 tab
" This rule must appear before mkdListItem, or highlighting gets messed up
syn match mkdCodeBlock /\(\s\{4,}\|\t\{1,}\).*\n/ contained nextgroup=mkdCodeBlock
"""""""""""""""""""""""""""""""""""""""
" Lists:
" These first two rules need to be first or the highlighting will be
" incorrect
" Continue a list on the current line or next line
syn match mkdListCont /\s*[^-+*].*/ contained nextgroup=mkdListCont,mkdListItem,mkdListSkipNL contains=@Spell skipnl transparent
" Skip empty lines
syn match mkdListSkipNL /\s*\n/ contained nextgroup=mkdListItem,mkdListSkipNL
" Unordered list
syn match mkdListItem /\s*[-*+]\s\+/ contained nextgroup=mkdListSkipNL,mkdListCont skipnl

Tao Zhyn, that may cover your use cases, but it doesn't cover the Markdown syntax. In Markdown a list item can contain a code block. You could take a look at my solution here
TL;DR: the problem is that Vim doesn't let you say something like "a block that has the same indentation as its container + 4 spaces". The only solution I found is to generate rules for each kind of block that could be contained in a list item, for each level of indentation (I actually support 42 levels of indentation, but that's an arbitrary number).
So I have markdownCodeBlockInListItemAtLevel1, which must be contained in a markdownListItemAtLevel1 and needs at least 8 leading spaces, and then markdownCodeBlockInListItemAtLevel2, which must be contained in a markdownListItemAtLevel2 (itself contained in a markdownListItemAtLevel1) and needs at least 10 leading spaces, etc.
I know that a few years have passed, but maybe someone will find this answer helpful, since every indentation-based syntax suffers from the same problem.

Related

Mass regex search-and-replace BETWEEN patterns

I have a directory with a bunch of text files, all of which follow this structure:
...
- Some random number of list items of random text
- And even more of it
PATTERN_A (surrounded by empty lines)
- Again, some list items of random text
- Which does look similar as the first batch
PATTERN_B (surrounded by empty lines)
- And even more some random text
....
And I need to run a replace operation (let's say, I need to prepend CCC at the beginning of the line, just after the dash) on only those "list items", which are between PATTERN_A and PATTERN_B. The problem is they aren't really much different from the text above PATTERN_A, or below PATTERN_B, so an ordinary regex can't really catch them without also affecting the remaining text.
So, my question would be, what tool and what regex should I use to perform that replacement?
(Just in case, I'm fine with Vim, and I can collect those files in a QuickFix for a further :cdo, for example. I'm not that good with awk, unfortunately, and absolutely bad with Perl :))
Thanks!
If I have understood your question, you can do this quite easily with a pattern-range selection and the general substitution form in sed (stream editor). For example, in your case:
$ sed '/PATTERN_A/,/PATTERN_B/s/^\([ ]*-\)/\1CCC/' file
- Some random number of list items of random text
- And even more of it
PATTERN_A (surrounded by empty lines)
-CCC Again, some list items of random text
-CCC Which does look similar as the first batch
PATTERN_B (surrounded by empty lines)
- And even more some random text
(note: to substitute in place within the file add the -i option, and to create a backup of the original add -i.bak which will save the original file as file.bak)
Explanation
/PATTERN_A/,/PATTERN_B/ - select lines between PATTERN_A and PATTERN_B
s/^\([ ]*-\)/\1CCC/ - substitute (general form 's/find/replace/') where find is from beginning of line ^ capturing text between \(...\) that contains [ ]*- (any number of spaces and a hyphen) and then replace with \1 (called a backreference that contains all characters you captured with the capture group \(...\)) and appending CCC to its end.
Look things over and let me know if you have questions or if I misinterpreted your question.
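For anyone who would rather script this than use sed, the same range logic can be sketched in Python. This is only a rough equivalent, not what sed does internally: the function name prefix_in_range and its parameters are made up for illustration, and it assumes the file has already been read into a list of lines.

```python
import re

def prefix_in_range(lines, start, end, prefix):
    """Insert `prefix` after the leading dash of every line that lies
    between a line matching `start` and the next line matching `end`
    (both boundary lines included, like a sed /start/,/end/ range)."""
    dash = re.compile(r"^( *-)")
    out = []
    inside = False
    for line in lines:
        if inside:
            out.append(dash.sub(lambda m: m.group(1) + prefix, line))
            if re.search(end, line):
                inside = False  # the end line closes the range
        elif re.search(start, line):
            inside = True       # the start line opens the range
            out.append(dash.sub(lambda m: m.group(1) + prefix, line))
        else:
            out.append(line)
    return out

sample = ["- before", "", "PATTERN_A", "", "- one", "- two", "", "PATTERN_B", "", "- after"]
result = prefix_in_range(sample, "PATTERN_A", "PATTERN_B", "CCC")
print("\n".join(result))
```

Only the two items between the markers pick up the CCC prefix; the lines before PATTERN_A and after PATTERN_B pass through unchanged.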
You can also get the same result with Perl:
> perl -pe ' { s/^(\s*-)/${1}CCC/g if /PATTERN_A/../PATTERN_B/ } ' mass_replace.txt
...
- Some random number of list items of random text
- And even more of it
PATTERN_A (surrounded by empty lines)
-CCC Again, some list items of random text
-CCC Which does look similar as the first batch
PATTERN_B (surrounded by empty lines)
- And even more some random text
....
>

Insert line in to pattern on text file python

I have a text file that takes the form of:
first thing: content 1
second thing: content 2
third thing: content 3
fourth thing: content 4
This pattern repeats throughout the entire text file. However, sometimes one of the rows is completely gone like so:
first thing: content 1
second thing: content 2
fourth thing: content 4
How could I search the document for these missing rows and just add it back with a value of "NA" or some filler to produce a new text file like this:
# 'third thing' was not there, so re-adding it with NA as content
first thing: content 1
second thing: content 2
third thing: NA
fourth thing: content 4
Current code boilerplate:
with open('original.txt', 'r') as infile:
    with open('output.txt', 'w') as out:
        # Search file for pattern (maybe regex?)
        # If pattern does not exist, add the line
        pass
Thanks for any help you all can offer!
You must look for 1-3 lines (fewer than 4) followed by a blank line:
^\n([^\n]*\n){1,3}\n
Demo: https://regex101.com/r/rL3eA5/2
This isn't pretty, but it works. Here's a regex to detect where lines are missing:
(?:^|\n)(second thing:\s*[^\n]+\n)|(first thing:\s*[^\n]+\n(?!second thing:))|(second thing:\s*[^\n]+\n(?!third thing:))|(third thing:\s*[^\n]+\n(?!fourth thing:))|(third thing:\s*[^\n]+\n\n)
regex101 demo here
Notice the Single Line flag.
When you've got a match, check which group matched. If it's the first one, the first line is missing. If it's the second one, the second line is missing, and so on for the third and fourth.
Here's an example of how to replace if the 1st group got a match.
Here's an example of how to replace if the 3rd group got a match.
Here's an example of how to replace if the 4th group got a match.
You'll probably have to do some tweaking, but it should get you on your way ;)
Regards.
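If a regex feels too fragile, a plain loop over the lines may be easier to follow. Here is a minimal sketch, assuming the four keys from the question always appear in the same order and that the key is whatever comes before the first colon. The names EXPECTED and fill_missing are made up for illustration. Note that it cannot detect a record that is missing entirely.

```python
EXPECTED = ["first thing", "second thing", "third thing", "fourth thing"]

def fill_missing(lines, expected=EXPECTED, filler="NA"):
    """Return the lines with a '<key>: NA' row added for every skipped key."""
    out = []
    pos = 0  # index of the key we expect to see next
    for line in lines:
        key = line.split(":", 1)[0].strip()
        if key not in expected:
            out.append(line)  # pass unknown lines (e.g. blank ones) through
            continue
        idx = expected.index(key)
        while pos != idx:  # every key between pos and idx was skipped
            out.append(f"{expected[pos]}: {filler}")
            pos = (pos + 1) % len(expected)
        out.append(line)
        pos = (idx + 1) % len(expected)
    while pos != 0:  # complete the final record if it was cut short
        out.append(f"{expected[pos]}: {filler}")
        pos = (pos + 1) % len(expected)
    return out

record = ["first thing: content 1", "second thing: content 2", "fourth thing: content 4"]
print("\n".join(fill_missing(record)))
```

To use it on files, read original.txt with read().splitlines(), pass the list through fill_missing, and write the result joined with newlines to output.txt.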

Enumerate existing text in Vim (make numbered list out of existing text)

I have a source document with the following text
Here is a bunch of text
...
Collect underpants
???
Profit!
...
More text
I would like to visually select the middle three lines and insert numbers in front of them:
Here is a bunch of text
...
1. Collect underpants
2. ???
3. Profit!
...
More text
All the solutions I found either put the numbers on their own new lines or prepended the actual line of the file.
How can I prepend a range of numbers to existing lines, starting with 1?
It makes for a good macro.
Add the first number to your line, and put your cursor back at the beginning.
Start a macro with qq (or q<any letter>)
Copy the number with yf<space> (yank find )
Move down a line with j
Paste your yank with P
Move back to the beginning of the line with 0
Increment the number with Ctrl-a
Back to the beginning again with 0 (incrementing positions you at the end of the number)
End the macro by typing q again
Play the macro with @q (or @<the letter you picked>)
Replay the macro as many times as you want with <number>@@ (@@ replays the last macro)
Profit!
To summarize the fun way, this GIF image is i1. <Esc>0qqyf jP0^a0q10@q.
To apply enumeration for all lines:
:let i=1 | g/^/s//\=i.'. '/ | let i=i+1
To enumerate only selected lines:
:let i=1 | '<,'>g/^/s//\=i.'. '/ | let i=i+1
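In case the :g one-liner is hard to read: it keeps a counter, prefixes each line in the range with the counter followed by ". ", and then bumps the counter. Outside Vim, the same idea can be sketched in Python. The function name enumerate_lines and the 1-based first/last arguments are made up for illustration.

```python
def enumerate_lines(lines, first, last):
    """Prefix lines first..last (1-based, inclusive) with '1. ', '2. ', ..."""
    out = list(lines)
    for number, index in enumerate(range(first - 1, last), start=1):
        out[index] = f"{number}. {out[index]}"
    return out

text = ["Here is a bunch of text", "...", "Collect underpants", "???", "Profit!", "...", "More text"]
print("\n".join(enumerate_lines(text, 3, 5)))
```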
Set a non-recursive mapping with the following command, then type ,enum in normal mode while the cursor is inside the lines you want to enumerate.
:nn ,enum {j<C-v>}kI0. <Esc>vipg<C-a>
TL;DR
You can type :help CTRL-A to see an answer to your question.
{Visual}g CTRL-A Add [count] to the number or alphabetic character in
the highlighted text. If several lines are
highlighted, each one will be incremented by an
additional [count] (so effectively creating a
[count] incrementing sequence).
For Example, if you have this list of numbers:
1.
1.
1.
1.
Move to the second "1." and Visually select three
lines, pressing g CTRL-A results in:
1.
2.
3.
4.
If you have a paragraph (:help paragraph) you can select it (look at :help object-select). Suppose each new line in the paragraph needs to be enumerated.
{ jump to the beginning of current paragraph
j skip blank line, move one line down
<C-v> emulates Ctrl-v, turns on Visual mode
} jump to the end of current paragraph
k skip blank line, move one line up
required region selected, we can make multi row edit:
I go into Insert mode and place cursor in the beginning of each line
0. is added in the beginning of each line
<Esc> to change mode back to Normal
You should get a list prefixed with zeros. If you already have one, you can omit this part.
vip select inner paragraph (list prepended with "0. ")
g<C-a> does the magic
I have found it easier to enumerate with zeroes than to leave out the first line of the list, as the documentation suggests.
Note: personally I have no mappings. It is easier to remember what g <C-a> does and use it directly. The answer above describes plain <C-a>, which requires you to count manually; g <C-a>, on the other hand, can increment numbers by a given value (the step) and keeps its own "internal counter".
Create a map for @DmitrySandalov's solution:
vnoremap <silent> <Leader>n :<C-U>let i=1 \| '<,'>g/^/s//\=i.'. '/ \| let i=i+1 \| nohl<CR>

Copying only the value at column n Vim

I have a file with long lines and need to see/copy the values at specific location(s) for the whole file, but not copy the rest of the line.
If the text width is small enough, ~184 columns, I can use :set colorcolumn=num to highlight the value. However, over 184 characters it gets a bit unwieldy scrolling.
I tried :g/\%1237c/y Z, for one of the positions I needed, but that yanked the entire line.
E.g., for the smaller sample below, :g/\%49c/y Z will yank all of lines 1 and 2, but I want to yank, or copy, only the character at that column, i.e. = on line 1 and x on line 2.
vim: filetype=help foldmethod=indent foldclose=all modifiable noreadonly
Table of Contents *sfcontents* *vim* *regex* *sfregex*
*sfsearch* - Search specific commands
|Ampersand-replaces-previous-pattern|
|append-a-global-search-to-a-register|
*sfHelp* Various Help related commands
There are two problems with your :g command:
For each matching line, the cursor is positioned on the first column. So even though you've matched at a particular column, that position is lost.
The \%c atom actually matches byte indices (what Vim somewhat confusingly names "columns"), so your measurement will be off for Tab and non-ASCII characters. Use the virtual column atom \%v instead.
Instead of :global, I would use :substitute with a replace-expression, in the idiom described at how to extract regex matches using vim:
:let t=[] | %s/\%49v./\=add(t, submatch(0))[-1]/g | let @@ = join(t, "\n")
Alternatively, if you install my ExtractMatches plugin, it'd be this short command invocation:
:YankMatchesToReg /\%50v./
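If the file can be processed outside Vim, the "character at virtual column N" idea can be approximated in Python. This is a rough sketch, not an exact model of \%v: it assumes a tabstop of 8, treats every character as one cell wide (so double-width characters will be off), and a tab sitting on the requested column comes back as a space. The function name chars_at_column is made up for illustration.

```python
def chars_at_column(lines, column, tabstop=8):
    """Collect the character at the given 1-based screen column of each
    line, skipping lines that are too short to reach that column."""
    found = []
    for line in lines:
        expanded = line.expandtabs(tabstop)  # turn tabs into screen columns
        if len(expanded) >= column:
            found.append(expanded[column - 1])
    return found

print(chars_at_column(["abc=def", "abx", "a\tbcd"], 4))
```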

What would be the best approach to this substitution in Vim?

A multi-line document has a header/title section followed by about 10 listings under each. I need to put the header/title info in with each of the listings so that they can be properly uploaded into a website (using comma and pipe delimiters). It looks like this:
SectionName1 and TitleName1
1111 - The SubSectionName A
222 - The SubSectionName B
3333 - The SubSectionName C
SectionName2 and TitleName2
444 - The SubSectionName D
55555 - The SubSectionName E
66 - The SubSectionName F
Repeating several hundred times. What I need is to produce something like:
SectionName1,TitleName1,1111,SubSectionNameA
SectionName1,TitleName1,222,SubSectionNameB
SectionName1,TitleName1,3333,SubSectionNameC
SectionName2,TitleName2,444,SubSectionNameD
SectionName2,TitleName2,55555,SubSectionNameE
SectionName2,TitleName2,66,SubSectionNameF
I realize there can be multiple approaches to this problem, but I'm having a difficult time pulling the trigger on any one method. I understand submatches, joins and getline, but I am not good at putting them to practical use in this scenario.
Any help to get me mentally started would be greatly appreciated.
Let me propose the following quite general Ex command solving the
issue.1
:g/^\s*\h/d|let@"=substitute(@"[:-2],'\s\+and\s\+',',','')|ki|/\n\s*\h\|\%$/kj|
\ 'i,'js/^\s*\(\d\+\)\s\+-\s\+The/\=@".','.submatch(1).','/|'i,'js/\s\+//g
At the top level, this is the :global command that enumerates the lines
starting with zero or more whitespace characters followed by a Latin letter or
an underscore (see :help /\h). The lines matching this pattern are supposed
to be the header lines containing section and title names. The rest of the
command, after the pattern describing the header lines, are instructions to be
executed for each of those lines.
The actions to be performed on the headers can be divided into three steps.
Delete the current header line, at the same time extracting section
and title names from it.
:d|let@"=substitute(@"[:-2],'\s\+and\s\+',',','')
First, remove the current line, saving it into the unnamed register,
using the :delete command. Then, update the contents of that
register (referred to as @"; see :help @r and :help "") to be the
result of the substitution changing the word and surrounded by
whitespace characters, to a single comma. The actual replacement is
carried out by the substitute() function.
However, the input is not the exact string containing the whole header
line, but its prefix leaving out the last character, which is
a newline symbol. The [:-2] notation is a short form of the
[0:-2] subscript expression that designates the substring from the
very first byte to the second one counting from the end (see :help
expr-[:]). This way, the unnamed register holds the section and the
title names separated by comma.
Determine the range of dependent subsection lines.
:ki|/\n\s*\h\|\%$/kj
After the first step, the subsection records belonging to the just
parsed header line are located starting from the current line (the one
that followed the header) until the next header line or, if there is no
such line below, the end of buffer. The numbers of these lines are
stored in the marks i and j, respectively. (See :helpg ^A mark
is for description of marks.)
The marks are placed using the :k command that sets a specified mark
at the last line of a given range which is the current line, by
default. So, unlike the first line of the considered block, the last
one requires a specific line range to point out its location.
A particular form of range, denoting the next line where a given
pattern matches, is used in this case (see :help :range). The
pattern defining the location of the line to be found, is composed in
such a way that it matches a line immediately preceding a header (a
line starting with possible whitespace followed by an alphabetical
character), or the very last line. (See :help pattern for details
about syntax of Vim regular expressions.)
Transform the delineated subsection lines according to desired format,
prepending section and title names found in the corresponding header
line.
:'i,'js/^\s*\(\d\+\)\s\+-\s\+The/\=@".','.submatch(1).','/|'i,'js/\s\+//g
This step consists of two :substitute commands that are run
over the range of lines delimited by the locations labelled by the
marks i and j (see :help [range]).
The first substitution command matches the beginning of a subsection
line—an identifier followed by a hyphen and the word The, all
floating in a whitespace—and replaces it with the contents of the
unnamed register, holding the section and title names concatenated
with a comma, the matched identifier, and another comma. The second
substitution finalizes the transformation by squeezing all whitespace
characters on the line to gum the subsection name and the following
letter together.
To construct the replacement string in the first :substitute
command, the substitute-with-an-expression feature is used (see :help
sub-replace-\=). The substitution part of the command should start
with \= for Vim to interpret the remaining text not in a regular
way, but as an expression (see :help expression). The result of
that expression's evaluation becomes the substitution string. Note
the use of the submatch() function in the substitute expression to
retrieve the text of a submatch by its number.
1 The command is wrapped for better readability, its one-line
version is listed below for ease of copy-pasting into Vim command line. Note
that the wrapped command can be used in a Vim script without any change.
:g/^\s*\h/d|let@"=substitute(@"[:-2],'\s\+and\s\+',',','')|ki|/\n\s*\h\|\%$/kj|'i,'js/^\s*\(\d\+\)\s\+-\s\+The/\=@".','.submatch(1).','/|'i,'js/\s\+//g
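For comparison, the three steps above (remember the header, find its listings, rewrite each listing) can be sketched outside Vim as well. The following Python is only an illustration under a few assumptions: every header line has the form "Section and Title", every listing has the form "digits - The Name", and a header always comes before its listings. The function name flatten is made up.

```python
import re

LISTING = re.compile(r"(\d+)\s+-\s+The\s+(.*)")

def flatten(lines):
    """Turn header lines plus their listings into comma-separated rows."""
    rows = []
    section = title = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = LISTING.match(line)
        if match:
            # Listing: drop "The", squeeze out all whitespace in the name.
            name = re.sub(r"\s+", "", match.group(2))
            rows.append(",".join([section, title, match.group(1), name]))
        else:
            # Header: split "Section and Title" on the word "and".
            section, title = re.split(r"\s+and\s+", line, maxsplit=1)
    return rows

sample = [
    "SectionName1 and TitleName1",
    "1111 - The SubSectionName A",
    "222 - The SubSectionName B",
    "SectionName2 and TitleName2",
    "66 - The SubSectionName F",
]
print("\n".join(flatten(sample)))
```

A header line without the word "and" makes the unpacking fail with a ValueError, which is probably what you want for malformed input.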
Simplest/fastest way I can think of is a simple macro. Do once, rinse, repeat.
Assuming your cursor is initially on the first character of the first line (S of SectionName), this macro should work as long as the document is exactly in the same format as posted above.
f ctT,<Esc>yyjpjjpjddkkkddkkkJr,f ctS,<Esc>f xjJr,f ctS,f xjJr,f ctS,<Esc>f xjdd
Well, I think the question is not that clear. Why, in your demo input, did the text after the "-" look like this:
55555 - The SubSectionName E
but in your expected output, it turned into:
55555,SubSectionNameE
All spaces were removed, which is OK, but why was "The" removed as well? Is there any pattern for "The"?
I wrote an awk one-liner; it removes all spaces in the output but leaves "The" in place. You can change it to get the exact output you need.
awk -F' and ' -vOFS="," 'NF>1{s=$1;t=$2;next;}$1{gsub(/\s+/,"");gsub(/-/,",");print s,t,$0} ' input
test on your example input:
kent$ cat v
SectionName1 and TitleName1
1111 - The SubSectionName A
222 - The SubSectionName B
3333 - The SubSectionName C
SectionName2 and TitleName2
444 - The SubSectionName D
55555 - The SubSectionName E
66 - The SubSectionName F
kent$ awk -F' and ' -vOFS="," 'NF>1{s=$1;t=$2;next;}$1{gsub(/\s+/,"");gsub(/-/,",");print s,t,$0} ' v
SectionName1,TitleName1,1111,TheSubSectionNameA
SectionName1,TitleName1,222,TheSubSectionNameB
SectionName1,TitleName1,3333,TheSubSectionNameC
SectionName2,TitleName2,444,TheSubSectionNameD
SectionName2,TitleName2,55555,TheSubSectionNameE
SectionName2,TitleName2,66,TheSubSectionNameF