Escaping backslash using sub [duplicate] - regex

I have a string template, read from a file, with the content:
template <- "\\begin{tabular}\n[results]\n\\end{tabular}"
I want to replace [results] with some text generated in my code, to compose a LaTeX table, but when I run:
sub("\\[results\\]","text... \\hline text...",template)
I have the following output:
"\\begin{tabular}\ntext... hline text...\n\\end{tabular}"
The backslash is not escaped and I don't understand why, because I'm using \\ for this purpose.
I am using R-3.0.2.

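In the replacement string, sub treats a backslash as an escape character (it introduces backreferences such as \\1), so the regex engine consumes the \\ you wrote. To get a literal backslash in the output, you need to add two more backslashes: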
sub("\\[results\\]","text... \\\\hline text...",template)
[1] "\\begin{tabular}\ntext... \\hline text...\n\\end{tabular}"
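For comparison, here is a minimal sketch of the same pitfall in Python rather than R. Python's re.sub also interprets backslashes in the replacement string, so a literal backslash has to be doubled, or the replacement can be supplied through a function, which is inserted verbatim. The variable names are illustrative only.

```python
import re

template = "\\begin{tabular}\n[results]\n\\end{tabular}"
replacement = "text... \\hline text..."  # contains ONE literal backslash

# Option 1: double every backslash so the replacement template
# turns each pair back into a single literal backslash.
out_escaped = re.sub(r"\[results\]", replacement.replace("\\", "\\\\"), template)

# Option 2: pass a function; its return value is not processed for escapes.
out_callable = re.sub(r"\[results\]", lambda m: replacement, template)

print(out_escaped)
```

Both options yield the same string; the callable form avoids having to count backslashes.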


Python regex to parse '#####' text in description field [duplicate]

Here's the line I'm trying to parse:
#abc #ghi j#klm #nop.qrs #tuv
And here's the regex I've gotten so far:
#[A-Za-z]+[^0-9. ]+\b | #[A-Za-z]+[^0-9. ]
My goal is to get ['#abc', '#ghi', '#tuv'], but no matter what I do, I can't get 'j#klm' to not match. Any help is much appreciated.
Try using re.findall with the following regex pattern:
import re
inp = "#abc #ghi j#klm #nop.qrs #tuv"
matches = re.findall(r'(?:(?<=^)|(?<=\s))#[A-Za-z]+(?=\s|$)', inp)
print(matches)
This prints:
['#abc', '#ghi', '#tuv']
The regex calls for an explanation. The leading lookbehind (?:(?<=^)|(?<=\s)) asserts that what precedes the # symbol is either whitespace or the start of the string. We can't use a word boundary here because # is not a word character, so \b does not match between a space and #. We use a similar lookahead (?=\s|$) at the end of the pattern to rule out matching things like #nop.qrs. Again, a word boundary alone would not be sufficient, because there is a word boundary between the p and the dot.
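The two lookarounds can also be written more compactly as "not preceded by a non-whitespace character" and "not followed by a non-whitespace character". This is an equivalent sketch of the same idea, not a different technique:

```python
import re

inp = "#abc #ghi j#klm #nop.qrs #tuv"

# (?<!\S) succeeds at the start of the string or after whitespace;
# (?!\S) succeeds at the end of the string or before whitespace.
tags = re.findall(r"(?<!\S)#[A-Za-z]+(?!\S)", inp)
print(tags)  # ['#abc', '#ghi', '#tuv']
```

The negative forms handle the start and end of the string without a separate ^ or $ alternative.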
Just add a start-of-line anchor at the beginning:
^#[A-Za-z]+[^0-9. ]+\b | #[A-Za-z]+[^0-9. ]
It should work!

regex to extract before colon and between quotes [duplicate]

What regexp to use to match before colon and between quotes?
"This text only":"bla bla bla"
This text only
I need this to extract only the key fields in YAML.
"(.*)"\:"(.*)" captures both the key and the value.
If only the key is needed, then:
"(.*?)".* captures only the key. The quantifier must be lazy here; a greedy "(.*)" would run to the last quote on the line.

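As a rough illustration in Python, a negated character class [^"]* does the same job as the lazy quantifier and cannot run past a closing quote. This is only a sketch for the single-line sample from the question, not a YAML parser.

```python
import re

line = '"This text only":"bla bla bla"'

# [^"]* matches up to, but never across, the next double quote.
m = re.match(r'"([^"]*)":"([^"]*)"', line)
key, value = m.group(1), m.group(2)
print(key)    # This text only
print(value)  # bla bla bla
```
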
Groovy - Extract a string between two different strings [duplicate]

I have file names in the format below:
I want to extract only the environment part from the file name, i.e. Dev1, QA2, AWSDev1, QA4, etc. How can I go about this with this type of file name? I thought about substring, but the environment length is not constant. Is it possible to do it with a regex?
Appreciate your help. TIA
It is definitely possible using lookarounds:
(?<=_) match is preceded by _
[^._]+ take all characters except . and _
(?=\.) match is followed by .
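The three pieces above combine into a single pattern. The sketch below uses Python for illustration; the file names are hypothetical, since the question's examples are not shown, and assume the environment is the last underscore-separated part before the extension. Java-style regex engines, which Groovy uses, support the same lookaround syntax.

```python
import re

# Hypothetical file names; the real format is not given in the question.
names = ["report_2020_Dev1.csv", "report_2020_QA2.csv", "report_2020_AWSDev1.csv"]

# Preceded by "_", made of characters other than "." and "_", followed by ".".
pattern = re.compile(r"(?<=_)[^._]+(?=\.)")

envs = [pattern.search(name).group(0) for name in names]
print(envs)  # ['Dev1', 'QA2', 'AWSDev1']
```

Because the character class excludes both _ and ., earlier segments such as 2020 are rejected: they are followed by _ rather than by a dot.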

Replacing surroundings of text in brackets when it occurs multiple times in a string [duplicate]

I have a string containing LaTeX code, for example \emph{some words here}, and I want to get Markdown syntax, for example *some words here*. I tried:
s <- "some text in \\emph{italics} and some more ..."
pattern <- "\\\\emph\\{(.*)\\}"
gsub(pattern,"*\\1*", s)
> "some text in *italics* and some more ..."
However, I do not succeed at handling multiple occurrences in one string.
s <- "some text in \\emph{italics} and some \\emph{more italics} and ..."
gsub(pattern,"*\\1*", s)
> "some text in *italics} and some \\emph{more italics* and ..."
I guess I need a non-greedy version which handles multiple occurrences, but I am not sure how to do it. Any ideas?
Use the lazy quantifier ? so that .* stops at the first closing brace instead of the last one:
pattern <- "\\\\emph\\{(.*?)\\}"
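The difference between the greedy and lazy forms can be seen side by side. This is a small Python sketch of the same substitution, using the string from the question; the variable names are illustrative.

```python
import re

s = r"some text in \emph{italics} and some \emph{more italics} and ..."

# Greedy: (.*) extends to the LAST closing brace in the string.
greedy = re.sub(r"\\emph\{(.*)\}", r"*\1*", s)

# Lazy: (.*?) stops at the FIRST closing brace after each \emph{.
lazy = re.sub(r"\\emph\{(.*?)\}", r"*\1*", s)

print(greedy)
print(lazy)
```

An alternative with the same effect is the negated class [^}]* in place of .*?, which cannot cross a closing brace.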

gsub("BLAH", "", "BLAH\WHAT") won't let x have a backslash? [duplicate]

I'm doing some batch string clean up and a lot of the entries look like this:
"ABC\Company Co."
Which causes weird errors, and I can't seem to remove the backslash.
For example, try entering this into your console:
gsub("BLAH", "", "BLAH\WHAT")
and you get:
Error: '\W' is an unrecognized escape in character string starting ""BLAH\W"
I know that it thinks \W is a command. I'm actually surprised that gsub is 'interpreting' x, since x is just the string I want to substitute in. I don't get why gsub cares what's actually in x, only that it should replace "BLAH" with "" within "BLAH\WHAT".
The obvious solution would be to remove the \ from the string ahead of time.
gsub("\\", "", "BLAH\WHAT")
But then you get the exact same error message!
Thoughts? Thanks!
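gsub("\\\\", "", "BLAH\\WHAT")
which gives
[1] "BLAHWHAT"
The original error comes from R's parser, not from gsub: inside a string literal a backslash starts an escape sequence, and \W is not a valid one. To write one literal backslash in a string you must type \\, so "BLAH\\WHAT" contains a single backslash. The pattern needs a second level of escaping because the regex engine also treats the backslash as special: the literal "\\\\" is a string of two backslashes, which the regex engine reads as one literal backslash.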
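The two levels of escaping are not specific to R. As a hedged illustration in Python, the sketch below shows that a doubled backslash in a string literal is one character, and that a regex needs it escaped once more (a raw string keeps that readable). The sample value comes from the question.

```python
import re

s = "ABC\\Company Co."        # the literal holds ONE backslash
backslash_count = s.count("\\")

# Regex route: r"\\" is two characters, which the regex engine
# reads as a single literal backslash.
cleaned_regex = re.sub(r"\\", "", s)

# Plain-string route: no regex layer, so one level of escaping is enough.
cleaned_plain = s.replace("\\", "")

print(cleaned_regex)  # ABCCompany Co.
```

When no pattern matching is needed, the plain-string route avoids the second level of escaping. In R, gsub with fixed = TRUE plays a similar role.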
See these related questions:
How to escape a backslash in R?
How to escape backslashes in R string