re.sub() ellipsis in Python 3 - regex

I need a simple solution, but it's evading me. I am passing a list of strings to a for loop for some cleaning up, and need to remove any instance of an ellipsis. Here's an example of what I've tried:
import re

text_list = ["string1", "string2", "string3...", "string.4"]
for i in range(len(text_list)):
    text_list[i] = re.sub(r"\.", "", text_list[i])
    text_list[i] = re.sub(r"\.{3}", "", text_list[i])
    text_list[i] = re.sub(r"\.\.\.", "", text_list[i])
Naturally, none of these removes an ellipsis. The period is removed, though. So my output would be:
for text in text_list:
    print(text)
>>>string1
string2
string3... <- THIS ONE DIDN'T CHANGE
string4 <- BUT THIS ONE DID
I've exhausted my regex documentation and Google searches. How do I match an ellipsis with a regex?

@swalladge had the right notion here: use Unicode. Here is his answer.
"If you want to remove an actual ellipsis, as in the unicode HORIZONTAL ELLIPSIS character (…), then you need to use that in the code, since 3 periods won't match it." –@swalladge
@rickdenhaan also had an easier way to accomplish the task. Thanks!
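As a minimal sketch of the point quoted above, the pattern below matches either the single Unicode character U+2026 or three literal periods, so both spellings of an ellipsis are removed while a lone period is left alone. The names ELLIPSIS and strip_ellipsis are made up for this illustration, and the sample list adds one entry with a real "…" character, which is an assumption about what the original data contained.

```python
import re

# Matches either the single character U+2026 (HORIZONTAL ELLIPSIS)
# or exactly three literal periods in a row.
ELLIPSIS = re.compile(r"\u2026|\.{3}")


def strip_ellipsis(text):
    """Return text with both kinds of ellipsis removed."""
    return ELLIPSIS.sub("", text)


# "string3\u2026" ends with the one-character ellipsis; "string3..." ends
# with three separate periods. They look alike but are different strings.
text_list = ["string1", "string2", "string3\u2026", "string3...", "string.4"]
cleaned = [strip_ellipsis(t) for t in text_list]
print(cleaned)
```

If single periods should go as well, as in the loop from the question, a character class such as r"[.\u2026]" would cover both cases in one pass.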


How to split a comma-separated string into an array in TypeScript when items containing a comma are wrapped in double quotes

I've been struggling to get all the items of the string below into an array in TypeScript:
abc,"de,f",hi,"hello","te,st&"
If any string has a comma (,) or ampersand (&) in it, it will be placed in double quotes.
I tried the split function, but it fails because my strings can contain commas as well.
Any help in this regard is highly appreciated.
Thank you.
If you want to use regular expression matching, try a regex that matches strings inside quotes first and then strings outside quotes, something like (\".+?\")|(^[^\"]+,)|(,[^\"]+,)
I don't know how well this carries over to TypeScript, but I am guessing you can work something out that takes this pattern and gives you the matches one by one.
First of all, I think you are making things more complicated than they need to be by implementing the following logic:
If any string has a comma (,) or ampersand (&) in it, it will be placed in double quotes.
Instead, you should systematically put every element inside double quotes:
abc,"de,f",hi,"hello","te,st&"
→
"abc","de,f","hi","hello","te,st&"
You then have a uniformly quoted string to parse.
A regex like this one will do the job:
(?<=,")([^"]*)(?=",)|(?<=")([^"]*)(?=",)|(?<=")([^"]*)(?="$)
Using the capture groups $1, $2, and $3, you can extract your elements.
The regex /(?:^|,)(\"(?:[^\"]+)\"|[^,]+)/ helped me get the required values.
var test = '"abc,123",test,123,456,"def:get"';
test.split(/(\"(?:[^\"]+)\"|[^,]+)/);
It returns the array below.
["", "\"abc,123\"", ",", "test", ",", "123", ",", "456", ",", "\"def:get\"", ""]
For the values inside double quotes, I just trimmed the quotes to get the actual values, and I ignored the empty items of the array.
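The tokenizing idea from these answers can be sketched in Python (rather than TypeScript, purely for illustration): match a quoted field first, and fall back to a run of non-comma characters otherwise. The names FIELD and split_quoted are hypothetical, and the sketch assumes fields never contain an escaped double quote and that empty fields can be skipped.

```python
import re

# First alternative: a double-quoted field (quotes excluded from group 1).
# Second alternative: a run of characters that contains no comma (group 2).
FIELD = re.compile(r'"([^"]*)"|([^,]+)')


def split_quoted(line):
    """Split a comma-separated line, keeping commas that sit inside quotes."""
    items = []
    for match in FIELD.finditer(line):
        quoted, bare = match.group(1), match.group(2)
        # Exactly one of the two groups takes part in each match.
        items.append(quoted if quoted is not None else bare)
    return items


print(split_quoted('abc,"de,f",hi,"hello","te,st&"'))
```

Because the quotes sit outside the capture group, no separate trimming step is needed. For real CSV input with escaped quotes or empty fields, a dedicated CSV parser is usually a safer choice than a hand-written pattern.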
Use split on the string (note that this example is Swift, not TypeScript):
let fullName = "First,Last"
let fullNameArr = fullName.characters.split{$0 == ","}.map(String.init)
fullNameArr[0] // First
fullNameArr[1] // Last

Swift 3: iosMath label removing all spaces

I'm trying to display text which may at times contain a math expression so I am using MTMathUILabel from iosMath. I generate the labels dynamically and add them to a stack as I pull the strings from the db. The problem is that all text which is not math appears with no spaces. i.e:
In db: Solve the following equation: (math here)
In label: Solvethefollowingequation: (math here)
Here is what I have tried so far:
for question in all_questions {
    let finalString = question.question?.replacingOccurrences(of: " ", with: "\\space", options: .literal, range: nil)
    let label = MTMathUILabel()
    label.textColor = UIColor.black
    label.latex = finalString
    stack.addArrangedSubview(label)
}
But the problem is that it literally places two backslashes (\\). And Xcode doesn't let me write just one \ because it is not escaped. However, if I just write
print("\\space")
Then it will print just one.
How can I fix this so I add only one \? If this cannot be done, how can I achieve what I want? Is there a better library out there?
After a quick look at MTMathUILabel's documentation and LaTeX conventions, I believe you should replace your spaces with a tilde character "~". This will make them non-breaking spaces and avoid the backslash issue (which is probably due to \space not being understood by MTMathUILabel).
Systematic replacement of all spaces may yield undesirable results if the formula itself has legitimate spaces in it.
For example, a quadratic equation would be expressed as:
x = \frac{-b \pm \sqrt{b^2-4ac}}{2a}
You will end up replacing spaces inside curly braces, and that may or may not be what you want:
x~=~\frac{-b~\pm~\sqrt{b^2-4ac}}{2a}
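Both points in this thread can be checked with a small sketch, written here in Python rather than Swift purely for illustration. It shows that the escaped literal "\\space" holds a single backslash, and that a blanket space-to-tilde replacement also touches the spaces inside the formula. The helper name spaces_to_tildes is made up for this example.

```python
# In source code "\\space" is an escape sequence: the string itself holds
# one backslash followed by the five letters of "space".
escaped = "\\space"
print(len(escaped))  # 6 characters in total


def spaces_to_tildes(text):
    """Replace every space with '~', including spaces inside the math."""
    return text.replace(" ", "~")


# Raw string so that the LaTeX backslashes are kept as written.
formula = r"x = \frac{-b \pm \sqrt{b^2-4ac}}{2a}"
print(spaces_to_tildes(formula))
```

If only the prose should be affected, the replacement would have to be limited to the non-math part of each string, which depends on how the text and the math are delimited in the database.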

Replace dots using `gsub`

I am trying to replace all the "." in a specific column of my data frame with "/". There are other characters in each cell and I want to make sure I only change the "."'s.
When I use gsub, I get an output that appears to make the changes, but then when I go to View(), the changes are not actually made...I thought gsub was supposed to actually change the value in the data frame. Am I using it incorrectly? I have my code below.
gsub(".", "/", spy$Identifier, ignore.case = FALSE, perl = FALSE,
     fixed = TRUE, useBytes = FALSE)
I also tried sub, but the code I have below changed every entry itself to "/" and I am not sure how to change it.
spy$Identifier <- sub("^(.).*", "/", spy$Identifier)
Thanks!
My recommendation would be to escape the "." character:
spy$Identifier <- gsub("\\.", "/", spy$Identifier)
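In a regular expression, a period is a special character that matches any character. "Escaping" it tells the search to look for an actual period. In R's gsub this is accomplished with two backslashes (i.e., "\\"). In other languages, it's often just one backslash. Note also that gsub returns a new vector and does not modify the data frame in place, so the result has to be assigned back to spy$Identifier as shown above; your original call with fixed = TRUE did treat "." literally, but its result was never stored.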
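The difference between an unescaped and an escaped period can be seen in a short sketch, shown here with Python's re module rather than R as an analogy (Python needs only one backslash inside a raw string). The sample value "BRK.B" is an invented identifier, not taken from the original data.

```python
import re

identifier = "BRK.B"  # invented sample value

# Unescaped: "." matches any character, so every character is replaced.
unescaped = re.sub(".", "/", identifier)

# Escaped: "\." matches only a literal period.
escaped = re.sub(r"\.", "/", identifier)

# A plain, non-regex replacement behaves like fixed = TRUE in R.
literal = identifier.replace(".", "/")

print(unescaped, escaped, literal)

# None of the calls changed the original; each returned a new string.
print(identifier)
```

As with gsub, the result only takes effect once it is assigned to a variable.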

Need help removing HTML tags, certain punctuation, and ending periods

Suppose I have this test string:
test.string <- c("This is just a <test> string. I'm trying to see, if a FN will remove certain things like </HTML tags>, periods; but not the one in ASP.net, for example.")
I want to:
1. Remove anything contained within an HTML tag
2. Remove certain punctuation (,:;)
3. Remove periods that end a sentence
So the above should be:
c("This is just a string I'm trying to see if a FN will remove certain things like periods but not the one in ASP.net for example")
For #1, I've tried the following:
gsub("<.*?>", "", x, perl = FALSE)
And that seems to work OK.
For #2, I think it's simply:
gsub("[:#$%&*:,;^():]", "", x, perl = FALSE)
Which works.
For #3, I tried:
gsub("+[:alpha:]?[.]+[:space:]", "", test.string, perl = FALSE)
But that didn't work...
Any ideas on where I went wrong? I totally suck at RegExp, so any help would be much appreciated!!
Based on your provided input and rules for what you want removed, the following should work.
gsub('\\s*<.*?>|[:;,]|(?<=[a-zA-Z])\\.(?=\\s|$)', '', test.string, perl = TRUE)
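The same three-part pattern can be tried out in Python as a sketch (one backslash instead of R's two). It removes tags together with any whitespace before them, the listed punctuation, and a period that follows a letter and is followed by whitespace or the end of the string. The names CLEANUP and clean are made up for this illustration.

```python
import re

# 1) optional whitespace plus an HTML-like tag, matched non-greedily
# 2) one of the punctuation characters : ; ,
# 3) a period preceded by a letter and followed by whitespace or end of text
CLEANUP = re.compile(r"\s*<.*?>|[:;,]|(?<=[a-zA-Z])\.(?=\s|$)")


def clean(text):
    """Apply all three removals in a single pass."""
    return CLEANUP.sub("", text)


test_string = (
    "This is just a <test> string. I'm trying to see, if a FN will remove "
    "certain things like </HTML tags>, periods; but not the one in ASP.net, "
    "for example."
)
print(clean(test_string))
```

The period in "ASP.net" survives because it is followed by a letter rather than whitespace, which is what the lookahead checks.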
Try this:
test.string <- "There is a natural aristocracy among men. The grounds of this are virtue and talents. "
gsub("\\.\\s*", "", gsub("([a-zA-Z0-9]). ([A-Z])", "\\1 \\2", test.string))
# "There is a natural aristocracy among men The grounds of this are virtue and talents"

Remove square brackets from a string vector

I have a character vector in which each element is enclosed in brackets. I want to remove the brackets and just have the string.
So I tried:
n = c("[Dave]", "[Tony]", "[Sara]")
paste("", n, "", sep="")
Unfortunately, this doesn't work for some reason.
I've performed the same task before using this same code, and am not sure why it's not working this time.
I want to go from '[Dave]' to 'Dave'.
What am I doing wrong?
You could gsub out the brackets like so:
n = c("[Dave]", "[Tony]", "[Sara]")
gsub("\\[|\\]", "", n)
[1] "Dave" "Tony" "Sara"
A regular expression substitution will do it. Look at the gsub() function.
This gives you what you want (it removes any instance of '[' or ']'):
gsub("\\[|\\]", "", n)
The other answers should be enough to get your desired output. I just wanted to provide a brief explanation of why what you tried didn't work.
paste concatenates character strings. If you paste an empty character string, "", to something with a separator that is also an empty character string, you really haven't altered anything. So paste can't make a character string shorter; the result will either be the same (as in your example) or longer.
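The explanation above can be illustrated with a sketch in Python (used here instead of R purely as an analogy): concatenating empty strings leaves each element unchanged, while a regex substitution actually removes the brackets. The variable names are invented for this example.

```python
import re

names = ["[Dave]", "[Tony]", "[Sara]"]

# Concatenation with empty strings, the analogue of paste("", n, "", sep = ""),
# can only keep a string the same length or make it longer.
padded = ["" + n + "" for n in names]
print(padded == names)  # the elements are unchanged

# A character class holding both brackets removes every "[" and "]".
stripped = [re.sub(r"[\[\]]", "", n) for n in names]
print(stripped)
```

The character class [\[\]] is equivalent to the alternation \[|\] used in the gsub answers.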
If working within tidyverse:
library(tidyverse); library(stringr)
n = c("[Dave]", "[Tony]", "[Sara]")
n %>% str_replace_all("\\[|\\]", "")
[1] "Dave" "Tony" "Sara"