Replacing surroundings of text in brackets when it occurs multiple times in a string [duplicate] - regex

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 6 years ago.
I have a string containing LaTeX code, for example \emph{some words here} and I want to get Markdown syntax, for example,*some words here*. I tried:
s <- "some text in \\emph{italics} and some more ..."
pattern <- "\\\\emph\\{(.*)\\}"
gsub(pattern,"*\\1*", s)
> "some text in *italics* and some more ..."
However, I do not succeed at handling multiple occurences in one string.
s <- "some text in \\emph{italics} and some \\emph{more italics} and ..."
gsub(pattern,"*\\1*", s)
> "some text in *italics} and some \\emph{more italics* and ..."
I guess I need a non-greedy version which handles multiple occurrences, but I am not sure how to do it. Any ideas?

Use lazy ? quantifier like this.
Regex: \\\\emph{(.*?)}
Regex101 Demo

Related

How can I remove a certain pattern from a string? [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I have this string like "682_2, 682_3, 682_4". (682 is a random number)
How can i get this string "2, 3, 4" using regex and ruby?
You can do this in ruby
input="682_2, 682_3, 682_4"
output = input.gsub(/\d+_/,"")
puts output
A simple regex could be
/_([0-9]+)$/ and in the match group of the result you will have 2 for 682_2 and 3 for 682_3
Ruby code snippet would be "64532_2".match(/_([0-9]+)/).captures[0]
you can use scan which returns an array containing the matches:
string_code.scan(/(?<=_)\d/)
(?<=_) tells to find a pattern that has a given pattern (_ in this case) before itself but wont capture that, it captures only \d. if it can have more than 1 digit like 682_13,682_33 then \d+ is necessary.

Python regex to parse '#####' text in description field [duplicate]

This question already has answers here:
regex to extract mentions in Twitter
(2 answers)
Extracting #mentions from tweets using findall python (Giving incorrect results)
(3 answers)
Closed 3 years ago.
Here's the line I'm trying to parse:
#abc def#gmail.com #ghi j#klm #nop.qrs #tuv
And here's the regex I've gotten so far:
#[A-Za-z]+[^0-9. ]+\b | #[A-Za-z]+[^0-9. ]
My goal is to get ['#abc', '#ghi', '#tuv'], but no matter what I do, I can't get 'j#klm' to not match. Any help is much appreciated.
Try using re.findall with the following regex pattern:
(?:(?<=^)|(?<=\s))#[A-Za-z]+(?=\s|$)
inp = "#abc def#gmail.com #ghi j#klm #nop.qrs #tuv"
matches = re.findall(r'(?:(?<=^)|(?<=\s))#[A-Za-z]+(?=\s|$)', inp)
print(matches)
This prints:
['#abc', '#ghi', '#tuv']
The regex calls for an explanation. The leading lookbehind (?:(?<=^)|(?<=\s)) asserts that what precedes the # symbol is either a space or the start of the string. We can't use a word boundary here because # is not a word character. We use a similar lookahead (?=\s|$) at the end of the pattern to rule out matching things like #nop.qrs. Again, a word boundary alone would not be sufficient.
just add the line initiation match at the beginning:
^#[A-Za-z]+[^0-9. ]+\b | #[A-Za-z]+[^0-9. ]
it shoud work!

How to filter out c-type comments with regex? [duplicate]

This question already has answers here:
Regex to match a C-style multiline comment
(8 answers)
Improving/Fixing a Regex for C style block comments
(5 answers)
Strip out C Style Multi-line Comments
(4 answers)
Closed 3 years ago.
I'm trying to filter out "c-style" comments in a line so i'm only left with the words (or actual code).
This is what i have so far: demo
regex:
\/\*[^\/]*[^\*]*\*\/
text:
/* 1111 */ one /*2222*/two /*3333 */ three/* 4444*/ four /*/**/ five /**/
My guess is that this expression might likely work,
\/\*(\/\*\*\/)?\s*([^\/*]+?)\s*(?:\/?\*?\*?\/|\*)
or we would modify our left and right boundaries, if we would have had different inputs.
In this demo, the expression is explained, if you might be interested.
We can try doing a regex replacement on the following pattern:
/\*.*?\*/
This matches any old-school C style comment. It works by using a lazy dot .*? to match only content within a single comment, before the end of that comment. We can then replace with empty string, to effectively remove these comments from the input.
Code:
Dim input As String = "/* 1111 */ one /*2222*/two /*3333 */ three/* 4444*/ four /*/**/ five /**/"
Dim output As String = Regex.Replace(input, "/\*.*?\*/", "")
Console.WriteLine(input)
Console.WriteLine(output)
This prints:
one two three four five

Add space between two letters in a string in R [duplicate]

This question already has answers here:
Use regex to insert space between collapsed words
(2 answers)
Closed 5 years ago.
Suppose I have a string like
s = "PleaseAddSpacesBetweenTheseWords"
How do I use gsub in R add a space between the words so that I get
"Please Add Spaces Between These Words"
I should do something like
gsub("[a-z][A-Z]", ???, s)
What do I put for ???. Also, I find the regular expression documentation for R confusing so a reference or writeup on regular expressions in R would be much appreciated.
You just need to capture the matches then use the \1 syntax to refer to the captured matches. For example
s = "PleaseAddSpacesBetweenTheseWords"
gsub("([a-z])([A-Z])", "\\1 \\2", s)
# [1] "Please Add Spaces Between These Words"
Of course, this just puts a space between each lower-case/upper-case letter pairings. It doesn't know what a real "word" is.

Escaping backslash using sub [duplicate]

This question already has answers here:
How do I deal with special characters like \^$.?*|+()[{ in my regex?
(2 answers)
Closed 8 years ago.
I have a string template, read from a file, with the content:
template <- "\\begin{tabular}\n[results]\n\\end{tabular}"
I want to replace results with some text I have generated in my code, to compose a laTeX table, but when I run:
sub("\\[results\\]","text... \\hline text...",template)
I have the following output:
"\\begin{tabular}\ntext... hline text...\n\\end{tabular}"
The backslash is not escaped and I don't understand why, because I'm using \\ for this purpose.
I am using R-3.0.2.
The regex engine is consuming the \\ for a potential capture group, you need to add two more backslashes:
sub("\\[results\\]","text... \\\\hline text...",template)
[1] "\\begin{tabular}\ntext... \\hline text...\n\\end{tabular}"