RStudio RMarkdown changes paragraphs starting with `(J\)` when switching to visual mode - r-markdown

Consider the following R Markdown document:
---
title: "Test"
author: "Joe Cool"
date: '2023-02-03'
output: html_document
---
(M\) This is one paragraph. Behold the glorious text it contains.
(M\) This is another paragraph. It too has marvelous content. Behold its wonders.
(J\) This paragraph is not as good as the other two. But it tried hard. We should give it credit for that.
(M\) And then we resume the good stuff.
The reason for writing (M\) and (J\) instead of (M) and (J) is that R Markdown would otherwise interpret these as list items, when they are not a list but portion-marked paragraphs. Escaping the closing parenthesis was the way I found to keep these paragraphs from being converted inappropriately.
If you switch to Visual mode in RStudio with a document like this, then switch back to Source mode, the document gets changed to this:
---
title: "Test"
author: "Joe Cool"
date: '2023-02-03'
output: html_document
---
(M) This is one paragraph. Behold the glorious text it contains.
(M) This is another paragraph. It too has marvelous content. Behold its wonders.
(J) This paragraph is not as good as the other two. But it tried hard. We should
give it credit for that.
(M) And then we resume the good stuff.
This is bad because Pandoc interprets this as a list and renders it as such, when it is not a list.
So how do I protect the parentheses and their contents to prevent this conversion? Or is there a better way to protect them so they are not converted to a list? No, I'm not changing the parentheses' contents; there's a reason why they are the way they are.

Related

Markdown List Parsing

I am writing a Markdown document with a lot of short lists with headings.
This is a sentence describing a list:

- Hello
- World
This renders fine, but the space between the heading sentence and list makes the markdown disorganized, especially since my document has so many short lists. I'd like to do something like this:
This is a sentence describing a list:
- Hello
- World
so there is no space between the heading and the list in the Markdown file. Unfortunately, Markdown renders that as one big sentence ("This is a sentence describing a list: -hello -world") and not as a list. Is there a way to force a break at the end of the line so that Markdown recognizes the dashes as a list? A solution of this form
This is a sentence describing a list:[something like \newline]
- Hello
- World
would be perfect. I would like to do this in plain Markdown, but for now I have the additional flexibility of LaTeX commands, since I am embedding Markdown within a LaTeX document (this package: https://ctan.math.washington.edu/tex-archive/macros/generic/markdown/markdown.pdf)
You can use fake bullet lists to achieve this. You can build them up using non-breaking spaces and a ⦁ (Z NOTATION SPOT) character.
This is a sentence describing a list:[space][space]
⦁ Hello[space][space]
⦁ World
Result:
This is a sentence describing a list:
   ⦁  Hello
   ⦁  World
For comparison, here's a real bullet list:
This is a sentence describing a list:

- Hello
- World
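A different approach from the fake bullets above is to keep the compact source and preprocess it before rendering. The following is only a minimal sketch in Python; the function name `add_blank_before_lists` and the list-marker pattern are assumptions made for illustration, and it does not account for fenced code blocks or wrapped paragraph lines that happen to begin with a dash.

```python
import re

# Lines that look like list items: "-", "*", "+" or "1." / "1)" followed by a space.
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")

def add_blank_before_lists(markdown_text):
    """Insert a blank line between a paragraph line and the list that follows it."""
    out = []
    prev = ""
    for line in markdown_text.splitlines():
        starts_list = (
            LIST_ITEM.match(line)           # this line is a list item
            and prev.strip()                # previous line is not blank
            and not LIST_ITEM.match(prev)   # previous line is not a list item
        )
        if starts_list:
            out.append("")
        out.append(line)
        prev = line
    return "\n".join(out)

compact = "This is a sentence describing a list:\n- Hello\n- World"
print(add_blank_before_lists(compact))
```

The source file stays compact, and only the text handed to the Markdown renderer gets the extra blank lines.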

how to match a sentence having particular word in different patterns

We have a problem here...
We have a text having different patterns of sentences.
We want to get the sentence having a particular word.
Eg:
One further point, by way of providing another model. The analysis in
the second paragraph could lead in the following direction. 'The
Destructors' deals with, obviously, destruction, whilst the book of
Genesis deals with creation. The vocabulary is similar: Blackie
notices that 'chaos had advanced', an ironic reversal of God's
imposing of form on a void. Furthermore, the phrase 'streaks of light
came in through the closed shutters where they worked with the
seriousness of creators', used in the context of destruction, also
parodies the creation of light and darkness in the early passages of
the Biblical book. Greene's ironic use of the vocabulary of the Bible
might be making the point that, for him, the Second World War
signalled the end of a particular Christian era. Now, it is perfectly
arguable that the rise of fascism is linked to this, or that it is the
cause. The cult of personality and secular leadership has, for Greene,
taken over from the key role of the church in Western societies. In
this way the two main themes identified above - the tension between
individual and community, and religion - are linked. In terms of essay
writing this link could well be made after the discussion of the theme
of the individual and the community, and its links with the theme of
leadership. This might be the general conclusion to the essay. After
thoughtful consideration and interpretation a student may well decide
that this is what the (destructors.)' boils down to: Greene is making a
clear link between the rise of fascism and the decline of the Church's
influence. Despite the fact that fascism has been recently defeated,
Greene sees the lack of any contemporary values which could provide
social cohesion as providing the potential for its reappearance.
In the above text, we have bold words (destructors). We want to get the sentences that contain the word "destructors".
The word "destructors" can be present in different formats. Eg: (destructors), (DesTrucTors), (Des.tructors), DESTRUCTORS, destructors, des-tructors.
When we tried writing a regex to match the sentences, it failed to return whole sentences in some cases (for example, we got half sentences).
Could you please help us with this?
If this information isn't enough to solve the problem, please let us know and we will update it.
Thank you...
I'm not too sure about Python, but I believe this might work:
import re

# `subject` is the text to search
for match in re.finditer(r"[^.]*destructors[^.]*\.[^\w\s]*", subject, re.IGNORECASE):
    start = match.start()  # match start
    end = match.end()      # match end (exclusive)
    text = match.group()   # matched text
In any case, I think the regex you want is:
[^.]*destructors[^.]*\.[^\w\s]*
with the case insensitive and global flags set.
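Note that the pattern above only matches the contiguous spelling, so variants such as (Des.tructors) or des-tructors would be missed, and a period inside the word would cut the sentence short. A minimal Python sketch of one way around this is to split into sentences first and then test each one against a tolerant pattern. The helper names `variant_pattern` and `sentences_containing` are made up for this example, and the sentence splitter is deliberately naive (it breaks only after `.`, `!` or `?` followed by whitespace).

```python
import re

def variant_pattern(word):
    # Allow an optional '.' or '-' between consecutive letters
    # (an assumption about which variants occur in the text).
    return r"[.\-]?".join(re.escape(ch) for ch in word)

def sentences_containing(text, word):
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace,
    # so the '.' inside "Des.tructors" does not end a sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(variant_pattern(word), re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]

sample = (
    "Greene wrote it. The (Des.tructors) is a\n"
    "short story. Nothing here. I like des-tructors a lot."
)
for sentence in sentences_containing(sample, "destructors"):
    print(repr(sentence))
```

Abbreviations such as "e.g. " would still confuse the splitter, so treat this as a starting point rather than a complete sentence tokenizer.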
It would be helpful if you could provide the regex pattern you have tried so far. The best I can come up with is:
import re
str_text = 'your text here containing DESTRUCTORS'
match = re.search('pass all the destructors combination here', str_text, flags=re.IGNORECASE)
For more of the pattern syntax available with regex, see https://docs.python.org/3/library/re.html

Doxygen parsing ampersands for ascii chars

I've been using Doxygen to document my project but I've run into some problems.
My documentation is written in a language in which apostrophes are often used. Although my language config parameter is properly set, when Doxygen generates the HTML output, it can't parse apostrophes, so the character code is shown instead of the correct character.
So, in the HTML documentation:
This should be the text: Vector d'Individus
But instead, it shows this: Vector d&#39;Individus
That's strange, but searching the code in the HTML file, I found that instead of using an ampersand to write the &#39; code, it uses the ampersand code. It's easier to see in the code itself:
<div class="ttdoc">Vector d&amp;#39;Individus ... </div>
One other thing is to note that this only happens with the text inside tooltips...
But not on other places (same code, same class)...
What can I do to solve this?
Thanks!
Apostrophes in code comments must be encoded with the correct glyph for doxygen to parse them correctly. This seems particularly true for the SOURCE_TOOLTIPS popups. The correct glyph is \u2019, standing for RIGHT SINGLE QUOTATION MARK. If the keyboard you are using does not provide this glyph, you may write a temporary symbol (e.g. ') and batch replace it afterwards with a Unicode-capable auxiliary tool, for example: perl -pC -e "s/'/\x{2019}/g" < infile > outfile. Hope it helps.
Regarding the answer from ramkinobit, this is not necessary; doxygen can use, e.g., &rsquo; for the right single quote (see the doxygen documentation chapter "HTML commands").
Regarding the apostrophe the OP asks about, one can use (the doxygen extension) &apos; (see also the doxygen documentation chapter "HTML commands").
There was a double 'HTML escape' in doxygen resulting in the behavior observed for the single quote, i.e. displaying &#39;.
I've just pushed a proposed patch to github (pull request 784, https://github.com/doxygen/doxygen/pull/784).
EDIT 07/07/2018 (alternative) patch has been integrated in main branch on github.
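The double escape described above can be reproduced outside doxygen. This is only an illustrative sketch using Python's standard `html` module, not doxygen's own code; note that Python writes the apostrophe as `&#x27;` where doxygen writes `&#39;`, but the mechanism is the same.

```python
import html

text = "Vector d'Individus"

once = html.escape(text)    # apostrophe becomes a character reference
twice = html.escape(once)   # the '&' of that reference is escaped again

print(once)
print(twice)

# A browser decodes exactly one level of escaping when displaying text.
shown_once = html.unescape(once)    # the original text
shown_twice = html.unescape(twice)  # the raw character reference, as in the tooltip
print(shown_once)
print(shown_twice)
```

After a single escape the browser shows the apostrophe; after a double escape it shows the character reference itself, which matches the tooltip symptom.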

Regex for remove citation

I searched for a regex to remove citations from text (they sound strange in voice-reading software).
I want to remove from the text all citations of the form
(Author, 2000), (Author, in press)
and
(Author something, something 2004, Author2 2005)
But at the same time it should not remove normal text in parentheses, e.g. (Figure 3), which might be helpful for the reader.
Example of text with citations: http://journal.frontiersin.org/article/10.3389/fnhum.2014.00114/full
I have a better one:
\([^\)]*,[^\)]*\)
See LiveDemo
I use the one I found here:
r"([A-Z][\w\-]+ )?\((\D*\d{4}(: ?[\d\-]*)*(, \d{4}(: ?[\d\-]*)*)*;?)*\)"
The best I found is
[\(].?[^\)]*?[\d\d\d\d]{1}.*?[\)]{1}
It might not be optimal, because it also selects (1), which in some cases might not be what the reader wants; still, it is close to optimal.
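The last pattern selects (1) because `[\d\d\d\d]{1}` is a character class and matches a single digit, not four. A minimal Python sketch of a stricter variant is shown below: it removes a parenthesized group only if it contains a four-digit year or the words "in press". The name `strip_citations` and the exact rule are assumptions for illustration; real citation styles vary more than this.

```python
import re

CITATION = re.compile(
    r"\s*\("                                   # optional leading space, then "("
    r"(?=[^()]*(?:\b\d{4}[a-z]?\b|in press))"  # must contain a year or "in press"
    r"[^()]*\)"                                # everything up to the closing ")"
)

def strip_citations(text):
    """Remove parenthesized citations, leaving other parentheses alone."""
    return CITATION.sub("", text)

sample = ("Memory decays (Smith, 2000) over time (Figure 3) "
          "and (Jones, in press) agrees (1).")
print(strip_citations(sample))
```

Groups such as (Figure 3) and (1) are kept because they contain neither a four-digit number nor "in press". Parenthesized text that merely mentions a year would be removed as well.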

Stripping superscript from plaintext

I often grab quotes from articles that include citations that include superscripted footnotes, which when copied are a pain in the ass. They show up as actual letters in the text as they are pasted in plaintext and not in html.
Is there a way I could run this through a regex to take out these superscripts?
For example
In the abeginning bGod ccreated the dheaven and the eearth.
Should become
In the beginning God created the heaven and the earth.
I can't think of a way to have regex search for misspellings and a corresponding sequential set of numbers and letters.
Any thoughts? I'm also using Sublime Text 3 for the majority of my writing, but I wouldn't mind outsourcing this to an AppleScript, or text replacement app (aText, textExpander, etc.).
Matching Code vs. Matching a Screen
It's hard to tell without seeing an example, but this should be doable if you copy the text from code view, as opposed to the regular browser view. (Ctrl or Cmd-J is your friend). Since writing the rules will take time, this will only be worthwhile for large chunks of text.
In code view, your superscript will be marked up in a way that can be targeted by regex. For instance:
and therefore bananas make you smartera
in the browser view (where the a at the end is a citation note) may look like this in code view:
and therefore bananas make you smarter<span class="mycitations">a</span>
In your editor, using regex, you can process the text to remove all tags, or just certain tags. The rules may not always be easy to write, and of course there are many disclaimers about using regex to parse html.
However, if your source is always the same (Wikipedia for instance), then you can create and save rules that should work across many pages.
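As a rough sketch of such a rule in Python, the following removes the citation spans from the example above. The class name `mycitations` is just the illustrative markup used in this answer; a real site will use its own tags and classes, and the usual caveats about regex on HTML apply.

```python
import re

# Matches the example markup: <span class="mycitations">...</span>
CITATION_SPAN = re.compile(r'<span class="mycitations">.*?</span>', re.DOTALL)

def strip_citation_spans(html_text):
    """Drop citation spans, keeping the surrounding text."""
    return CITATION_SPAN.sub("", html_text)

source = 'and therefore bananas make you smarter<span class="mycitations">a</span>'
print(strip_citation_spans(source))
```

The non-greedy `.*?` keeps each match from running past the first closing `</span>`, so several citations on one line are removed individually.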