R Markdown rendering error in math with underscores

I am trying to display a formula in RMarkdown with two underscores (one inside a {}-bracket).
The preview in RStudio works as expected and the formula is rendered correctly.
In the rendered document (HTML), the code is, however, not rendered correctly but instead the parts of the equation between the underscores are put into italics as if they were normal text and not within a formula environment.
An MWE is given by this
$$
w = \frac{1}{x_1}\sum y_{1}
$$
which gets rendered to this
We see that the parts between the underscores are displayed in italic and the formula is not rendered but its source code is displayed.
A solution is to escape the underscores (breaks the preview, but gets rendered correctly)
$$
w = \frac{1}{x\_1}\sum y\_{1}
$$
Note that only escaping one underscore works as well!
Is this expected behavior or a bug in the knitr engine?
Is there a solution that solves this in both the preview as well as the final document?
Edit
I use xaringan and xaringanthemer, not sure if this causes the error. Nonetheless, here is my header
title: "MWE"
output:
  xaringan::moon_reader:
    lib_dir: libs
    css: xaringan-themer.css
    nature:
      highlightStyle: github
      highlightLines: true
      countIncrementalSlides: false

The solution can be found in the official documentation:
Limitations:
1) The source code of a LaTeX math expression must be in one line, unless it is inside a pair of double dollar signs, in which case the starting $$ must appear in the very beginning of a line, followed immediately by a non-space character, and the ending $$ must be at the end of a line, led by a non-space character;
2) There should not be spaces after the opening $ or before the closing $.
3) Math does not work on the title slide (see #61 for a workaround).
So, just write:
$$w = \frac{1}{x_1}\sum y_{1}$$


Regex is not removing websites from text data in preprocessing

I am doing text preprocessing and in my text there are websites. I want to remove these but I couldn't do it.
Below is the sample text:
\n\nWorldwide web (www)\n\nName for the entirety of documents linked
through hyperlinks on the Internet; often used as a synonym for the
latter26.\n\n\n\n\n\n\n\n24\xe2\x80\x83\twww.sicherheitskultur.at,
Information Security Glossary\n\n25\xe2\x80\x83\tSource of text
(partly): KS\xc3\x96: Cyber Risk Matrix -
Glossary\n\n26\xe2\x80\x83\twww.sicherheitskultur.at, Information
Security Glossary\n\n\n\n\n\n23\n'
Websites are visible (in bold) and I want to remove these.
I have tried some code (from the StackOverflow answer "Python code to remove HTML tags from a string"), but it does not remove these websites.
Below is the code:
import re

def remove_web(text):
    cleanr = re.compile('<.*?.*#>')
    text = re.sub(cleanr, '', text)
    return text
Thanks in advance!
If you only want to remove this particular URL, you could use this regex:
www\.[a-z]+\.at
(Go with David Amar's solution.)
www(\.\w+)+
Explanation:
- first it matches www
- then at least one block like this: a dot followed by some text (letters, digits, underscores)
To match more characters in the URL (hyphens, for example), replace \w with a character set such as [a-zA-Z0-9_-].
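As a minimal sketch of how the second pattern could replace the asker's remove_web function: the sample string is a shortened piece of the text from the question, and the widened character set [\w-] is the hyphen-friendly variant suggested above. It only targets host names that start with www, which is an assumption about the data.

```python
import re

# "www", then one or more ".label" blocks; [\w-] also admits hyphens.
WWW_PATTERN = re.compile(r"www(?:\.[\w-]+)+")

def remove_web(text):
    """Return text with every www.* host name removed."""
    return WWW_PATTERN.sub("", text)

# Shortened sample taken from the question's text.
sample = "24\twww.sicherheitskultur.at, Information Security Glossary"
cleaned = remove_web(sample)
print(repr(cleaned))
```

The match stops at the comma because a comma is neither a word character nor a hyphen, so the surrounding punctuation is left in place.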

Removing parentheses and everything in them with Regex

Having a bit of trouble with some code I'm working through. Basically, I have transcripts (txt files) for a few Japanese anime, of which I want to remove everything but the spoken lines (Japanese sentences) in order to do some NLP experiments.
I've managed to accomplish a good bit of cleaning, but where I'm stuck is with parentheses. A majority of the elements in my list start with a character's name inside parentheses (i.e. (Armin)). I want to remove these, but all the regex code I've found online doesn't seem to work.
Here's a snippet of the list I'm working with:
['（アルミン）その日', '人類は思い出した', '（アルミン）奴らに', '支配されていた恐怖を', '（アルミン）鳥籠の中に', 'とらわれていた―', '屈辱を', '（キース）総員', '戦闘用意!', '目標は1体だ', '必ず仕留め―', 'ここを', '我々', '人類', '最初の壁外拠点とする!', '（エルヴィン）あっ…', '目標接近!', '（キース）訓練どおり5つに分かれろ!', '囮は我々が引き受ける!', '全攻撃班', '立体機動に移れ!', '（エルヴィン）全方向から', '同時に叩くぞ!', '（モーゼス）やあーっ!']
I've tried the following code (it's as close as I could get):
no_parentheses = []
for line in mylist:
    if '(' in line:
        line = re.sub('\(.*\)','', line)
        no_parentheses.append(line)
    else:
        no_parentheses.append(line)
But when I view the results, those pesky parentheses remain in my list mockingly.
Could anyone offer suggestions to resolve this issue?
Thanks again!
The brackets used in the text are full-width brackets. Specifically, U+FF08 FULLWIDTH LEFT PARENTHESIS, and U+FF09 FULLWIDTH RIGHT PARENTHESIS.
Your regex should use full-width brackets as well.
line = re.sub('（.*）','', line)
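A short sketch of the full-width fix, assuming the list items look like the ones in the question. The brackets are written as \uff08 and \uff09 escapes so the pattern cannot be silently converted to ASCII parentheses when copied; the non-greedy .*? is an addition here so that only the leading name is removed if a line ever contains more than one bracket pair.

```python
import re

# U+FF08 FULLWIDTH LEFT PARENTHESIS ... U+FF09 FULLWIDTH RIGHT PARENTHESIS
FULLWIDTH_PARENS = re.compile("\uff08.*?\uff09")

# Two items shaped like the question's list (speaker name in full-width brackets).
mylist = ["\uff08アルミン\uff09その日", "人類は思い出した"]

# Lines without brackets pass through unchanged, so no if/else is needed.
no_parentheses = [FULLWIDTH_PARENS.sub("", line) for line in mylist]
print(no_parentheses)
```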

How can I find a string in a cell containing a link?

I have some cells in openoffice calc which contain links/URLs. They display, of course, in calc as text, and hovering the mouse shows the URL. Clicking on those cells brings up the URL referenced.
I want to match a string in the displayed text. The below shows the spreadsheet:
spreadsheet
Cell A1 contains the string searched for.
Cells A4:A7 contain the links/URLs.
Cells B4:B7 are copies of A4:A7 but with Default format to remove the link/URLs. Cell B3 contains my match formula, which successfully finds the string in B4:B7.
I've tried the following in cell A3 to find the string in A4:A7
`=MATCH("^"&A1&".*";B4:B7;0)` #only works on the default formatted cells.
`=MATCH(".*"&A1&".*";A4:A7;0)` #
`=MATCH(A1&".*";A4:A7;0)` #
`=MATCH(A1;A4:A7;0)` #
I also tried several other regular expressions, none of which work. Yes, I'm rusty on regexes, but what am I doing wrong? Or is the literal string actually not present in the searched cells unless I change the format?
All the problems with the searches were caused by the fact that
'Search criteria = and <> must apply to whole cells'
was enabled in Tools->Options->Openoffice Calc->Calculate.
Turning this setting off makes everything work as advertised. The clue was that the regex ".*"&A1&".*", which of course matches a full line of plain text, worked with the range B4:B7.
The simplest solution is the expression:
=MATCH(""&A1;A4:A7;0) # "" invoked to trigger regex
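The difference between whole-cell and partial matching can be illustrated outside Calc. The following is only a Python analogy, not how Calc implements MATCH: re.fullmatch stands in for the "must apply to whole cells" setting and re.search for the setting turned off. The cell contents and helper names are made up for the illustration.

```python
import re

# Hypothetical displayed text of cells A4:A6 and the search string from A1.
cells = ["Example Site - home", "Another link", "Docs for Example"]
needle = "Example"

def match_whole_cell(pattern, values):
    """1-based position of the first value the pattern matches completely."""
    for position, value in enumerate(values, start=1):
        if re.fullmatch(pattern, value):
            return position
    return None

def match_anywhere(pattern, values):
    """1-based position of the first value containing a match."""
    for position, value in enumerate(values, start=1):
        if re.search(pattern, value):
            return position
    return None

whole = match_whole_cell(re.escape(needle), cells)
partial = match_anywhere(re.escape(needle), cells)
wrapped = match_whole_cell(".*" + re.escape(needle) + ".*", cells)
print(whole, partial, wrapped)
```

With whole-cell matching the bare search string finds nothing, while wrapping it in .* on both sides makes the pattern cover the full cell text.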

Selecting URLs using RegExp but ignoring them when surrounded by double quotes

I've searched around quite a bit now, but I can't get any suggestions to work in my situation. I've seen success with negative lookahead or lookaround, but I really don't understand it.
I wish to use RegExp to find URLs in blocks of text but ignore them when quoted. While not perfect yet I have the following to find URLs:
(https?\://)?(\w+\.)+\w{2,}(:[0-9])?\/?((/?\w+)+)?(\.\w+)?
I want it to match the following:
www.test.com:50/stuff
http://player.vimeo.com/video/63317960
odd.name.amazone.com/pizza
But not match:
"www.test.com:50/stuff
http://plAyerz.vimeo.com/video/63317960"
"odd.name.amazone.com/pizza"
Edit:
To clarify, I could be passing a full paragraph of text through the expression. Sample paragraph of what I'd like below:
I would like the following link to be found www.example.com. However this link should be ignored "www.example.com". It would be nice, but not required, to have "www.example.com and www.example.com" ignored as well.
Below is a sample of a different one that I have working. The language is PHP:
$articleEntry = "Hey guys! Check out this cool video on Vimeo: player.vimeo.com/video/63317960";
$pattern = array('/\n+/', '/(https?\:\/\/)?(player\.vimeo\.com\/video\/[0-9]+)/');
$replace = array('<br/><br/>',
'<iframe src="http://$2?color=40cc20" width="500" height="281" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen></iframe>');
$articleEntry = preg_replace($pattern,$replace,$articleEntry);
The result of the above will replace any newlines "\n" with a double break "<br/><br/>" and will embed the Vimeo video by replacing the Vimeo address with an iframe.
I've found a solution!
(?=(([^"]+"){2})*[^"]*$)((https?:\/\/)?(\w+\.)+\w{2,}(:[0-9]+)?((\/\w+)+(\.\w+)?)?\/?)
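The first part, from (?= to *$), is what makes it work for me: it is a lookahead that only succeeds when the text after the current position contains an even number of double quotes, i.e. when the position is outside a quoted span. I found this as an answer to "Java Regex - split but ignore text inside quotes?" by https://stackoverflow.com/users/548225/anubhava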
While I had read that question before, I had overlooked his answer because it wasn't the one that "solved" the question. I just changed the single quote to double quote and it works out for me.
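To see the lookahead at work, here is a sketch in Python rather than PHP. It is a lightly adapted version of the pattern above, not the exact one: the groups are made non-capturing and the quote-counting part uses [^"]* instead of [^"]+ so that adjacent quotes are tolerated. The sample sentences come from the question.

```python
import re

# Succeeds only if an even number of double quotes follows the current position.
OUTSIDE_QUOTES = r'(?=(?:[^"]*"[^"]*")*[^"]*$)'
# URL part adapted from the pattern above, with non-capturing groups.
URL = r'(?:https?://)?(?:\w+\.)+\w{2,}(?::[0-9]+)?(?:(?:/\w+)+(?:\.\w+)?)?/?'

pattern = re.compile(OUTSIDE_QUOTES + URL)

def find_unquoted_urls(text):
    """Return the URLs in text that are not inside double quotes."""
    return [m.group(0) for m in pattern.finditer(text)]

text = ('I would like the following link to be found www.example.com. '
        'However this link should be ignored "www.example.com".')
found = find_unquoted_urls(text)
print(found)
```

The approach assumes quotes are balanced; a stray unmatched double quote flips which spans count as quoted.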
Add ^ and $ to your regex:
^(https?\://)?(\w+\.)+\w{2,}(:[0-9])?\/?((/?\w+)+)?(\.\w+)?$
Please note that you might need to escape the slashes after http (meaning https?\:\/\/).
Update:
If you want it to be case sensitive (lowercase only), you shouldn't use \w but [a-z]. \w matches all letters in both cases, digits, and the underscore, so be careful when using it.

Vim: Editing the python.vim syntax file to highlight like Textmate

I'm trying to edit the python.vim syntax file to duplicate the syntax highlighting for Python in Textmate. The attached image illustrates the highlighting of function parameters, which I'm struggling to achieve.
The self, a, b is highlighted in Textmate but not in Vim. I figured that I have to do the following.
1. Match a new region
syn region pythonFunction start="(" end=")" contains=pythonParameters skipwhite transparent
2. Try to match a string followed by a comma
syn match pythonParameters ".*" contained
So in point 2 the ".*" will match any string at the moment and must be refined to be correct. However, I'm not sure if I'm on the right path, since the match in 2 is not constrained to the region between the brackets (). Any tips or input will be appreciated.
EDIT 1: If anyone wondered how it turned out eventually.
Here is my vim syntax highlighting for python.
EDIT 2: So just for ultimate thoroughness I created a github page for it.
http://pfdevilliers.github.com/Pretty-Vim-Python/
Ok, you've got a couple problems.
There is already a region called pythonFunction, for highlighting def and function names.
This region will match any parenthesis, anywhere
So, find the pythonFunction match, and change it to this:
syn match pythonFunction
\ "\%(\%(def\s\|class\s\|#\)\s*\)\#<=\h\%(\w\|\.\)*" contained nextgroup=pythonVars
Adding nextgroup tells vim to match pythonVars after a function definition.
Then add:
syn region pythonVars start="(" end=")" contained contains=pythonParameters transparent keepend
syn match pythonParameters "[^,]*" contained skipwhite
Finally, to actually highlight it, find the HiLink section, and add:
HiLink pythonParameters Comment
Change Comment to the grouping you want, or add your own. I'm using Statement myself.
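The two-stage idea in this answer (first a region limited to the parentheses after a def name, then a contained [^,]* match for each parameter) can be mimicked with ordinary regular expressions. The following Python sketch only illustrates the matching logic and is not Vim syntax code; the pattern and function names are made up for the example.

```python
import re

# Stage 1: like the pythonVars region, take only what sits between the
# parentheses that directly follow a def name.
DEF_PARAMS = re.compile(r"def\s+\w+\s*\(([^)]*)\)")
# Stage 2: like the pythonParameters match, take runs of non-comma characters
# (starting at a non-space character, similar in spirit to skipwhite).
PARAM = re.compile(r"[^,\s][^,]*")

def parameters(line):
    """Return the parameter names of a def line, or [] if there is no def."""
    m = DEF_PARAMS.search(line)
    if m is None:
        return []
    return [p.strip() for p in PARAM.findall(m.group(1))]

params = parameters("def add(self, a, b):")
print(params)
```

Because stage 2 only ever sees the text captured in stage 1, parentheses elsewhere in the file, such as a tuple literal, are never treated as parameter lists.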
Vim, Highlight matching Parenthesis ( ), square brackets [ ], and curly braces: { }
The config option for setting the foreground and background colors under the cursor when it is over a parenthesis, square bracket, or curly brace is this one:
hi MatchParen ctermfg=16 ctermbg=208 cterm=bold
To enable/disable the background color of the line under the cursor:
:set cursorline
:set nocursorline
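To set the background color of the line under the cursor, use the CursorLine group (ctermbg takes a terminal color number, 0-255 on a 256-color terminal; 236 is just an example value):
hi CursorLine ctermbg=236
The Visual and VisualNOS groups set the background of the Visual-mode selection in the same way:
hi VisualNOS ctermbg=236
hi Visual ctermbg=236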
Here is my adaptation:
https://github.com/sentientmachine/erics_vim_syntax_and_color_highlighting