Markdown: lists are not well converted

I'm trying to create a list in Markdown. As I've read in some documentation, if I write this Markdown code:
My list
* first item
* second item
* third item

Not in the list
I would expect the same result as if I had written this HTML:
<p>My list</p>
<ul>
<li>first item</li>
<li>second item</li>
<li>third item</li>
</ul>
<p>Not in the list</p>
I use Atom as my editor, and its Markdown preview shows everything correctly, but when I convert my Markdown file with Pandoc as follows:
pandoc test.md -o test.odt
what I get is this:
My list * first item * second item * third item
Not in the list
What am I doing wrong?

There are two possible solutions to your problem:
1. Add a blank line between the paragraph and the list (as @melpomene mentioned in a comment):
My list

* first item
* second item
* third item

Not in the list
2. Leave out the blank line and tell Pandoc to use commonmark as the input format rather than the default, markdown:
pandoc -f commonmark -o test.odt test.md
The "problem" is that the Atom editor uses a CommonMark parser, whereas Pandoc by default uses an old-school Markdown parser that mostly follows the original Markdown syntax rules and the reference implementation (Markdown.pl). In fact, the CommonMark spec explicitly acknowledges this difference:
In CommonMark, a list can interrupt a paragraph. That is, no blank
line is needed to separate a paragraph from a following list:
Foo
- bar
- baz
<p>Foo</p>
<ul>
<li>bar</li>
<li>baz</li>
</ul>
Markdown.pl does not allow this, through fear of triggering a list
via a numeral in a hard-wrapped line:
The number of windows in my house is
14. The number of doors is 6.
If you want consistent behavior across your tools, you need to use only tools that follow the same rules.
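If you have to stay on Pandoc's default markdown reader, the blank lines from the first solution can also be added automatically before conversion. The sketch below is a hypothetical Python helper (`separate_lists` is not part of Pandoc or any library) that inserts a blank line wherever a list starts directly after paragraph text. Following the CommonMark rule that only a bullet item or an ordered item numbered 1 may interrupt a paragraph, it leaves a hard-wrapped line such as the "14." example quoted above alone. It works line by line and knows nothing about code blocks or other Markdown constructs, so treat it as a starting point rather than a converter.

```python
import re

# A line that may start a list right after paragraph text under CommonMark:
# a bullet marker (*, +, -) or an ordered marker numbered 1 ("1." or "1)"),
# indented at most three spaces and followed by whitespace and content.
LIST_START = re.compile(r"^ {0,3}(?:[*+-]|1[.)])[ \t]+\S")


def separate_lists(text: str) -> str:
    """Insert a blank line wherever a list directly follows paragraph text.

    Hypothetical helper: line-based only, it ignores code blocks,
    block quotes and other Markdown constructs.
    """
    out = []
    in_paragraph = False  # previous line was paragraph text
    in_list = False       # inside a run of list lines since the last blank line
    for line in text.splitlines():
        if not line.strip():
            # Reset: a list item after a blank line needs no extra separator.
            in_paragraph = in_list = False
        elif LIST_START.match(line):
            if in_paragraph:
                out.append("")  # the blank line Pandoc's markdown reader expects
            in_paragraph = False
            in_list = True
        elif not in_list:
            in_paragraph = True
        out.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")


print(separate_lists("My list\n* first item\n* second item\n"))
```

The result can be written back to a file and passed to Pandoc as usual; text that already has the blank line is returned unchanged.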

Related

Using regex to extract particular text from a paragraph

I have used the code below to extract a string from a paragraph.
data = '''actions/steps to (re-) produce the problem:
1) Media--> Music collectio--> on right side--> click on Add Favourite icon--> on clicking Add from Favourite icon--> (Delete from favourite ) will display--> again click on Delete the favourite
expected result/behaviour:
it should display the track as well
observed result/behavior:
1st track list will display then
2nd list of songs will display
3rd no records will display
this behaviour will appear again and again
possible impact:
this can be an issue while driving
actions/steps to recover from error:
software version tested (including supplied software or CAF version if relevant):
MGU :- 17w.25.4-2'''
import re

observed = []
for i in data["Error Description"]:
    if len(re.findall(r'(Observed result\/behavior:|observed result\/behavior:)([^(]*)Possible impact:', i)) == 1:
        observed.append((re.findall(r'(Observed result\/behavior:|observed result\/behavior:)([^(]*)Possible impact:', i))[0][1])
    else:
        observed.append(" ".join((re.findall(r'(Observed result\/behavior:|observed result\/behavior:)([^(]*)Possible impact:', i))))
Output:
It shows nothing, as the "observed" section has 4 lines. If that section has only one line, immediately followed by "possible impact:", then the output is displayed.
I need the output even when the observed section has n lines.
Please help.
This should work, assuming the "observed result/behavior:" section is followed by a blank line before the next paragraph:
begin = data.index('observed result/behavior:')
end = data[begin:].index('\n\n')
output = data[begin:(begin+end)]
print(output)
observed result/behavior:
1st track list will display then
2nd list of songs will display
3rd no records will display
this behaviour will appear again and again
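If you can't rely on a blank line after the section, a regular expression can capture everything between the two headings, however many lines that is. This is a minimal sketch assuming, as in the sample data, that "observed result/behavior:" is always followed later by "possible impact:"; `extract_observed` is a hypothetical name. `re.DOTALL` lets `.` match newlines so the lazy `(.*?)` can span several lines, and `re.IGNORECASE` accepts both capitalizations (note that the question's pattern spells "Possible impact:" with a capital P, which would not match the lowercase heading in the sample text).

```python
import re

# Capture everything between the "observed" heading and the "possible impact"
# heading. DOTALL lets "." cross line breaks; IGNORECASE accepts either
# capitalization; "behaviou?r" accepts both spellings used in the data.
OBSERVED = re.compile(
    r"observed result/behaviou?r:\s*(.*?)\s*possible impact:",
    re.IGNORECASE | re.DOTALL,
)


def extract_observed(text):
    """Return the observed-behavior section of one description, or "" if absent."""
    match = OBSERVED.search(text)
    return match.group(1) if match else ""


sample = """observed result/behavior:
1st track list will display then
2nd list of songs will display
3rd no records will display
this behaviour will appear again and again
possible impact:
this can be an issue while driving"""

print(extract_observed(sample))
```

To fill the question's list, something like `observed = [extract_observed(i) for i in data["Error Description"]]` should work, assuming that column holds one description string per row.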

Preserve indentation in C++ comments in vim

Is it possible to configure Vim and 'cindent' so that they don't alter indentation inside C++ comments when reindenting the file (gg=G)?
I have some formatted lists in comments, indented with 4 spaces, but Vim treats this as bad indentation and realigns everything.
For example:
/**
my list:
    * item 1
    * item 2
*/
becomes:
/**
my list:
* item 1
* item 2
*/
I want a way to tell Vim: "Don't touch the contents of comments, but indent everything else."
This matters because our project uses Doxygen with a Markdown-like parser to generate documentation, and indentation determines list levels.
How about writing it like this, so that the indentation inside the comment is independent of the comment's own indentation:
/**
 * my list:
 *     * item 1
 *     * item 2
 */
As suggested in review, I'm reposting here the answer I got from the Vi Stack Exchange community:
I don't believe it's possible to achieve this with 'cinoptions'.
The correct solution is probably to write a new indentexpr that applies C-indenting (accessible via the cindent() function) only to lines that aren't within comments.
However, here are a couple of quick-and-dirty solutions:
I've skipped the first solution, which I don't use and which is therefore not the answer; you can still see it in the original post.
Using a Function
function! IndentIgnoringComments()
  let in_comment = 0
  for i in range(1, line('$'))
    if !in_comment
      " Check if this line starts a multi-line doc comment
      " (a one-line /** ... */ comment is indented like normal code)
      if getline(i) =~# '^\s*/\*\*' && getline(i) !~# '\*\/\s*$'
        let in_comment = 1
      else
        " Indent line 'i'
        execute i . "normal =="
      endif
    else
      " Check if this line ends the comment
      if getline(i) =~# '\*\/\s*$'
        let in_comment = 0
      endif
    endif
  endfor
endfunction
You can run this with :call IndentIgnoringComments(), or you could set up a command or a mapping, e.g.:
nnoremap <leader>= :call IndentIgnoringComments()<CR>
Personally, I defined a command that calls this function and combines it with another reformatting I often apply to files in this project (:%s/\s*$//g, which strips trailing whitespace).
Thanks to Rich on https://vi.stackexchange.com.
Original post: https://vi.stackexchange.com/a/13962/13084
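To make the state machine in IndentIgnoringComments() easier to check, here is the same loop sketched in Python. It is only an illustration (Vim runs the Vimscript, not this), and `lines_to_reindent` is a hypothetical name. It uses the same two patterns and returns the 1-based line numbers that would be reindented with ==, skipping every line of a multi-line /** ... */ comment, including its opening and closing lines. This sketch treats a comment that opens and closes on the same line as ordinary code, so that a one-line Doxygen comment does not hide the rest of the file.

```python
import re

# The same patterns the Vim function uses: a doc comment opens with "/**" at
# the start of a line and closes with "*/" at the end of a line.
OPENS = re.compile(r"^\s*/\*\*")
CLOSES = re.compile(r"\*/\s*$")


def lines_to_reindent(lines):
    """Return the 1-based numbers of the lines that would get "==".

    Lines of a multi-line /** ... */ comment, including its opening and
    closing lines, are skipped. A comment that opens and closes on the same
    line is treated as ordinary code.
    """
    result = []
    in_comment = False
    for number, line in enumerate(lines, start=1):
        if in_comment:
            # Inside a comment: only watch for its end.
            if CLOSES.search(line):
                in_comment = False
        elif OPENS.match(line) and not CLOSES.search(line):
            in_comment = True
        else:
            result.append(number)
    return result


source = [
    "int f() {",
    "/**",
    "my list:",
    "    * item 1",
    "    * item 2",
    "*/",
    "return 0;",
    "}",
]
print(lines_to_reindent(source))  # [1, 7, 8]
```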

How to nest text and element in ember's emblem?

In Haml, the following would produce correctly nested HTML:
%p Hi There I'm inside this paragraph
  %button I'm also inside this paragraph
Produces:
<p> Hi There I'm inside this paragraph <button>I'm also inside this paragraph</button></p>
In Emblem.js, if I do:
p Hi There I'm inside this paragraph
  %button I'm also trying to be in the paragraph
It produces this:
<p> Hi There I'm inside this paragraph %button I'm also trying to be in the paragraph</p>
Does anyone know how to nest content and elements in Emblem.js?
p Hi There I'm inside this paragraph
  button I'm also trying to be in the paragraph
Dropping the % does the trick.
I recommend checking out http://emblemjs.com/syntax/; it's a great resource that briefly explains each use case.

Construct XPath

I have the following piece, repeated on the web page:
<div class="txt ext">
<strong class="param">param_value1</strong>
<strong class="param">param_value2</strong>
</div>
I would like to extract the values param_value1 and param_value2 separately using XPath. How can I do it?
I have tried the following constructions:
'//strong[@class="param"]/text()[0]'
'//strong[@class="txt ext"]/strong[@class="param"][0]/text()'
'//strong[@class="param"]'
None of them returned param_value1 and param_value2 separately.
P.S. I am using Python 2.7 and the latest version of Scrapy.
Here is my testing code:
from scrapy.selector import HtmlXPathSelector

test_content = '<div class="txt ext"><strong class="param">param_value1</strong><strong class="param">param_value2</strong></div>'
sel = HtmlXPathSelector(text=test_content)
sel.select('//div/strong[@class="param"]/text()').extract()[0]
sel.select('//div/strong[@class="param"]/text()').extract()[1]
// means descendant-or-self, so you are selecting any strong element in any context. [...] is a predicate, which restricts your selection according to a boolean test. There is no strong element whose class attribute equals txt ext, so you can rule out your second expression.
Your last expression will actually return a node-set of all the strong elements whose class attribute is param. You can then extract individual nodes from the node-set (use [1], [2]) and get their text contents (use text()).
Your first expression, without the [0], would select the text contents of both nodes, but as written it's wrong: the predicate is in the wrong place, and you can't select node zero (XPath positions start at 1, so node zero doesn't exist). If you want the text contents of the first node, you should use:
//strong[@class="param"][1]/text()
and you can use
//strong[@class="param"][2]/text()
for the second text.
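The "node-set first, then pick individual nodes" approach can be tried without Scrapy. This sketch swaps in Python's standard xml.etree.ElementTree, which supports only a subset of XPath: attribute predicates such as [@class='param'] work, but text() does not, so the text is read from each element's .text attribute instead. It parses the question's test_content, which happens to be well-formed XML; real web pages usually need an HTML-aware parser such as Scrapy's own selectors.

```python
import xml.etree.ElementTree as ET

# The question's test_content; it is well-formed XML, so ElementTree can parse it.
test_content = ('<div class="txt ext">'
                '<strong class="param">param_value1</strong>'
                '<strong class="param">param_value2</strong>'
                '</div>')

root = ET.fromstring(test_content)  # root is the <div> element

# Node-set of all <strong class="param"> descendants, in document order:
# the ElementTree counterpart of //strong[@class="param"].
params = root.findall(".//strong[@class='param']")

# Pick individual nodes by index (Python lists start at 0, XPath positions at 1)
# and read their text content.
first_value = params[0].text
second_value = params[1].text
print(first_value, second_value)
```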

Qt: QXmlQuery and XPath

I need some help with QXmlQuery and XPath.
I'm trying to use this combination to extract some data from several HTML documents.
These documents are downloaded and then cleaned with the HTML Tidy Library.
The problem arises when I run my XPath. Here is an example:
[...]
<ul class="bullet" id="idTab2">
<li><span>Hauteur :</span> 1127 mm</li>
<li><span>Largeur :</span> 640 mm</li>
<li><span>Profondeur :</span> 685 mm</li>
<li><span>Poids :</span> 159.6 kg</li>
[...]
The cleaned code is stored in a QString named code:
QStringList fields, values;
QXmlQuery query;
query.setFocus(code);
query.setQuery("//*[@id=\"idTab2\"]/*/*/string()");
query.evaluateTo(&fields);
My goal is to get all the fields (Hauteur, Largeur, Profondeur, Poids, etc.) and their values (1127 mm, 640 mm, 685 mm, 159.6 kg, etc.).
Question 1
As you can see, I use the XPath //*[@id="idTab2"]/*/*/string() to retrieve the fields, because //ul[@id="idTab2"]/li/span/string() doesn't work: whenever I specify a tag name, I get nothing. It only works with *. Why? I've checked the code returned by the Tidy function, and the markup the XPath targets is not altered, so I don't see any problem. Is this normal, or is there something I'm missing?
Question 2
In the XHTML above, each li tag wraps a span tag and some text. I don't know how to get only the text, without the content of the span tag. I tried:
//*[@id="idTab2"]/*/string() gives: Hauteur : 1127 mm Largeur : 640 mm Profondeur : 685 mm
//*[@id="idTab2"]/*[2]/string() gives: nothing
So, unless I'm mistaken, the text in the li tag is not treated as a child node, although it should be; see the accepted answer to "Select just text directly in node, not in child nodes".
Thanks for reading, I hope someone can help me.
To get the <li> elements themselves (not their text representation), you can test the text content of their <span>:
//*[@id="idTab2"]/li[starts-with(span, "Hauteur")]
The same goes for the other items:
//*[@id="idTab2"]/li[starts-with(span, "Largeur")]
//*[@id="idTab2"]/li[starts-with(span, "Profondeur")]
//*[@id="idTab2"]/li[starts-with(span, "Poids")]
To get the string representation of one of these <li> elements, wrap the whole expression in string():
string(//*[@id="idTab2"]/li[starts-with(span, "Poids")])
which gives "Poids : 159.6 kg".
To extract only the text node of the <li>, without the <span>, you can use the expressions below. They select the text nodes that are direct children of <li> (the <span> is an element, not a text node) and strip the leading and trailing whitespace with normalize-space(), which also collapses internal runs of whitespace:
normalize-space(//*[@id="idTab2"]/li[starts-with(span, "Hauteur")]/text())
normalize-space(//*[@id="idTab2"]/li[starts-with(span, "Largeur")]/text())
normalize-space(//*[@id="idTab2"]/li[starts-with(span, "Profondeur")]/text())
normalize-space(//*[@id="idTab2"]/li[starts-with(span, "Poids")]/text())
The last one gives "159.6 kg".
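The same split between a <span> label and the text sitting directly inside its <li> can be sketched outside Qt. This is a swap-in, not QXmlQuery: it uses Python's standard xml.etree.ElementTree, where text that follows a child element is stored in that child's .tail, so the text nodes directly inside an <li> are li.text plus each child's .tail (the counterpart of li/text()). It assumes the cleaned markup carries no XHTML namespace (with one, tag names would need to be namespace-qualified) and uses the question's snippet with the [...] elisions dropped and the <ul> closed; `fields_and_values` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# The question's markup, with the [...] elisions dropped and the <ul> closed.
snippet = """<ul class="bullet" id="idTab2">
<li><span>Hauteur :</span> 1127 mm</li>
<li><span>Largeur :</span> 640 mm</li>
<li><span>Profondeur :</span> 685 mm</li>
<li><span>Poids :</span> 159.6 kg</li>
</ul>"""


def fields_and_values(xml_text):
    """Map each <span> label to the text that sits directly inside its <li>."""
    root = ET.fromstring(xml_text)
    result = {}
    for li in root.iter("li"):
        span = li.find("span")
        if span is None:
            continue
        # Field: the <span>'s text with whitespace collapsed and " :" removed.
        field = " ".join((span.text or "").split()).rstrip(" :")
        # Value: the <li>'s own text nodes, i.e. li.text plus each child's
        # .tail, with whitespace collapsed as normalize-space() would do.
        own_text = (li.text or "") + "".join(child.tail or "" for child in li)
        result[field] = " ".join(own_text.split())
    return result


print(fields_and_values(snippet))
```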