Escaping backslash using sub [duplicate] - regex

I have a string template, read from a file, with the content:
template <- "\\begin{tabular}\n[results]\n\\end{tabular}"
I want to replace [results] with some text generated in my code, to compose a LaTeX table, but when I run:
sub("\\[results\\]","text... \\hline text...",template)
I have the following output:
"\\begin{tabular}\ntext... hline text...\n\\end{tabular}"
The backslash is not escaped and I don't understand why, because I'm using \\ for this purpose.
I am using R-3.0.2.

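In the replacement string, the regex engine treats a backslash as an escape character (it introduces backreferences such as "\\1"), so the single backslash produced by "\\" is consumed and only hline is left. To get a literal backslash in the output, add two more backslashes: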
sub("\\[results\\]","text... \\\\hline text...",template)
[1] "\\begin{tabular}\ntext... \\hline text...\n\\end{tabular}"
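The same pitfall exists outside R. As a rough illustration in Python (not the asker's language; re.sub stands in for sub here), the replacement string is also scanned for backslash escapes, so a literal backslash has to be doubled, or the replacement has to be supplied as a function. The variable names are illustrative only.

```python
import re

# Same template as in the question; "\\" in source is one backslash.
template = "\\begin{tabular}\n[results]\n\\end{tabular}"
row = "text... \\hline text..."  # contains a single literal backslash

# Option 1: double every backslash so re.sub emits it literally.
escaped = re.sub(r"\[results\]", row.replace("\\", "\\\\"), template)

# Option 2: pass a function; its return value is inserted as-is,
# with no escape processing.
literal = re.sub(r"\[results\]", lambda m: row, template)

print(escaped)
print(literal)
```

Both calls keep the backslash in front of hline. The function form is usually the safer choice when the replacement text is generated elsewhere and may contain arbitrary backslashes.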

Related

Python regex to parse '#####' text in description field [duplicate]

Here's the line I'm trying to parse:
#abc def#gmail.com #ghi j#klm #nop.qrs #tuv
And here's the regex I've gotten so far:
#[A-Za-z]+[^0-9. ]+\b | #[A-Za-z]+[^0-9. ]
My goal is to get ['#abc', '#ghi', '#tuv'], but no matter what I do, I can't get 'j#klm' to not match. Any help is much appreciated.
Try using re.findall with the following regex pattern:
(?:(?<=^)|(?<=\s))#[A-Za-z]+(?=\s|$)
import re
inp = "#abc def#gmail.com #ghi j#klm #nop.qrs #tuv"
matches = re.findall(r'(?:(?<=^)|(?<=\s))#[A-Za-z]+(?=\s|$)', inp)
print(matches)
This prints:
['#abc', '#ghi', '#tuv']
The regex calls for an explanation. The leading lookbehind (?:(?<=^)|(?<=\s)) asserts that what precedes the # symbol is either whitespace or the start of the string. We can't use a word boundary here because # is not a word character: \b before # only holds when a word character precedes it, which is exactly the j#klm case we want to exclude. We use a similar lookahead (?=\s|$) at the end of the pattern to rule out matching things like #nop.qrs. Again, a word boundary alone would not be sufficient, since there is a boundary between p and the dot.
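The same two conditions can be stated more compactly with negative lookarounds on \S. This is a sketch of an alternative spelling, not part of the original answer, and it assumes that "whitespace or string edge on both sides" is the intended rule.

```python
import re

inp = "#abc def#gmail.com #ghi j#klm #nop.qrs #tuv"

# (?<!\S): not preceded by a non-whitespace character
#          (so: start of string or whitespace before the #)
# (?!\S):  not followed by a non-whitespace character
#          (so: end of string or whitespace after the letters)
tags = re.findall(r"(?<!\S)#[A-Za-z]+(?!\S)", inp)
print(tags)
```

On the sample line this yields the three standalone tags and skips def#gmail.com, j#klm and #nop.qrs.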
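Alternatively, just add the start-of-line anchor to the first alternative:
^#[A-Za-z]+[^0-9. ]+\b | #[A-Za-z]+[^0-9. ]
This stops j#klm from matching. Note that it still matches #nop in #nop.qrs, and the matches include the surrounding spaces, so the results need trimming and filtering.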

regex to extract before colon and between quotes [duplicate]

What regexp should I use to match the text before the colon and between the quotes?
e.g.
"This text only":"bla bla bla"
↓
This text only
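I need this to extract only the key fields in YAML.
"(.*)"\:"(.*)" captures both the key (group 1) and the value (group 2).
If only the key is needed, then:
"([^"]*)": captures just the key. A greedy "(.*)".* does not work here, because .* runs on to the last quote in the line and the group ends up containing the value as well.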

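A minimal Python sketch of the key-only extraction, assuming each line has the quoted "key":"value" shape shown in the question (this is not a YAML parser, and the variable names are illustrative):

```python
import re

line = '"This text only":"bla bla bla"'

# [^"]* cannot cross a quote, so group 1 stops at the key's closing quote.
# \s* tolerates optional spaces before the colon (an assumption).
m = re.match(r'"([^"]*)"\s*:', line)
key = m.group(1) if m else None
print(key)
```

Using a negated character class instead of .* avoids depending on backtracking when the value itself contains quotes.
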
Groovy - Extract a string between two different strings [duplicate]

I have file names in the format below:
India_AP_Dev1.txt
USA_GA_QA2.txt
USA_NY_AWSDev1.txt
AUS_AA_BB_QA4.txt
I want to extract only the environment part from the file name, i.e. Dev1, QA2, AWSDev1, QA4, etc. How can I go about this with these file names? I thought about substring, but the environment length is not constant. Is it possible to do it with regex?
Appreciate your help. TIA
It is definitely possible using lookarounds:
(?<=_)[^._]*(?=\.)
(?<=_) match is preceded by _
[^._]* take any run of characters other than . and _
(?=\.) match is followed by .
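To check the pattern against the sample names, here is a small Python sketch (the question is about Groovy, whose Java-based regex engine accepts the same lookaround syntax; the helper name environment is hypothetical):

```python
import re

names = [
    "India_AP_Dev1.txt",
    "USA_GA_QA2.txt",
    "USA_NY_AWSDev1.txt",
    "AUS_AA_BB_QA4.txt",
]

# Preceded by "_", followed by ".", with no "." or "_" in between,
# so only the last underscore-separated segment can match.
ENV = re.compile(r"(?<=_)[^._]*(?=\.)")

def environment(name):
    """Return the environment segment of a file name, or None."""
    m = ENV.search(name)
    return m.group(0) if m else None

envs = [environment(n) for n in names]
print(envs)
```

Earlier segments such as AP or GA are skipped because they are followed by an underscore rather than a dot.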

Replacing surroundings of text in brackets when it occurs multiple times in a string [duplicate]

I have a string containing LaTeX code, for example \emph{some words here}, and I want to get Markdown syntax, for example *some words here*. I tried:
s <- "some text in \\emph{italics} and some more ..."
pattern <- "\\\\emph\\{(.*)\\}"
gsub(pattern,"*\\1*", s)
> "some text in *italics* and some more ..."
However, this fails when there are multiple occurrences in one string:
s <- "some text in \\emph{italics} and some \\emph{more italics} and ..."
gsub(pattern,"*\\1*", s)
> "some text in *italics} and some \\emph{more italics* and ..."
I guess I need a non-greedy version which handles multiple occurrences, but I am not sure how to do it. Any ideas?
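Make the quantifier lazy by adding ? after the *, so that each match stops at the first closing brace instead of the last one. Keeping the escaped braces from the question:
pattern <- "\\\\emph\\{(.*?)\\}"
gsub(pattern,"*\\1*", s)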
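The difference between the greedy and the lazy quantifier can be seen side by side in this Python sketch (Python's re.sub is used here as a stand-in for R's gsub; the variable names are illustrative):

```python
import re

s = "some text in \\emph{italics} and some \\emph{more italics} and ..."

# Greedy: .* runs to the LAST closing brace in the string.
greedy = re.sub(r"\\emph\{(.*)\}", r"*\1*", s)

# Lazy: .*? stops at the FIRST closing brace after each \emph{.
lazy = re.sub(r"\\emph\{(.*?)\}", r"*\1*", s)

print(greedy)
print(lazy)
```

A negated class such as [^}]* in place of .*? has the same effect for this input and does not rely on lazy matching being supported.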

gsub("BLAH", "", "BLAH\WHAT") won't let x have a backslash? [duplicate]

I'm doing some batch string clean up and a lot of the entries look like this:
"ABC\Company Co."
Which causes weird errors, and I can't seem to remove the backslash.
For example, try entering this into your console:
gsub("BLAH", "", "BLAH\WHAT")
and you get:
Error: '\W' is an unrecognized escape in character string starting ""BLAH\W"
I know that it's thinking \W is a command. I'm actually surprised that gsub is 'interpreting' x, since x is just the string I want to sub out. I don't get why gsub cares what's actually in x; it should just replace "BLAH" with "" within "BLAH\WHAT".
The obvious solution would be to remove the \ from the string ahead of time.
gsub("\\", "", "BLAH\WHAT")
But then you get the exact same error message!
Thoughts? Thanks!
Use
gsub("\\\\", "", "BLAH\\WHAT")
which gives
[1] "BLAHWHAT"
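The error in the question comes from R's parser, not from gsub: "BLAH\WHAT" is not a valid string literal. In a string literal, a backslash must itself be escaped with a backslash, so "BLAH\\WHAT" contains a single backslash. The regex engine also treats the backslash as an escape character, so the pattern has to reach it as two backslashes, which is written "\\\\" in R source. That pattern matches the one literal backslash inside "BLAH\\WHAT".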
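The two layers of escaping (string literal first, regex second) work the same way in Python, which makes them easy to inspect. This is an illustrative sketch rather than R code:

```python
import re

s = "BLAH\\WHAT"   # the source has two backslashes, the string has one
print(len(s))      # 9 characters: B L A H \ W H A T

# The raw string r"\\" is two characters, which the regex engine
# reads as one escaped, literal backslash.
cleaned = re.sub(r"\\", "", s)

# Without a regex there is only one layer of escaping to deal with.
cleaned_plain = s.replace("\\", "")

print(cleaned, cleaned_plain)
```

When the search text is a fixed string rather than a pattern, the plain replacement avoids the second layer entirely.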
See these related questions:
How to escape a backslash in R?
How to escape backslashes in R string