I have a string that contains the word "amogus". I use gsub to replace amogus with
"<font color="..red..">".."amogus".."</font>"
But every time I refresh to check for anything else I need to gsub, it treats the already-replaced text as needing replacement, since it also contains amogus. How can I fix this?
Thanks
-creepersaur
You can solve it with the %f (frontier) pattern:
local s = " amo <>amo<> "
s = s:gsub("%f[%w>]amo", "<>amo<>")
print(s) --> <>amo<> <>amo<>
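For comparison, the same idea can be sketched in Python. Python's re module has no %f frontier pattern, so this sketch assumes a negative lookbehind is an acceptable stand-in; the helper name highlight is made up for the example.

```python
import re

def highlight(s: str) -> str:
    # Hypothetical helper: wrap "amo" only when it is NOT preceded by a
    # word character or ">", i.e. when it has not been wrapped already.
    return re.sub(r"(?<![\w>])amo", "<>amo<>", s)

once = highlight(" amo <>amo<> ")
twice = highlight(once)

# Running the replacement a second time should change nothing.
print(once == twice)
```

Because the already-wrapped occurrences are preceded by ">", a second pass leaves them alone, which is the property the question asks for.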
I need a tip or suggestion, followed by an example, on how to append a .txt extension after the last character of a variable's value.
For example:
set txt " ONLINE ENGLISH COURSE - LESSON 5 "
set result [concat "$txt" .txt]
Print:
Note that there are spaces at the start, in the middle and at the end of the phrase in the variable (txt). The spaces at the start and in the middle must be kept, but the last space after the end of the sentence should be replaced with the extension [.txt].
Tcl's built-in concat command does not achieve the desired effect.
The expected result was something like this:
ONLINE ENGLISH COURSE - LESSON 5.txt
I know I could remove spaces with string map but I don't know how to remove just the last occurrence on the line.
And otherwise I don’t know how to remove the last space to add the text [.txt]
If anyone can point me to one or more solutions, thank you in advance.
set result "[string trimright $txt].txt"
or
set result [regsub {\s*$} $txt ".txt"]
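As a cross-check in another language, here is a rough Python counterpart of the two Tcl one-liners above. It is only a sketch; the variable names result_trim and result_regex are chosen for the example.

```python
import re

txt = " ONLINE ENGLISH COURSE - LESSON 5 "

# Counterpart of [string trimright $txt]: strip trailing whitespace only,
# leaving the leading and inner spaces untouched.
result_trim = txt.rstrip() + ".txt"

# Counterpart of regsub {\s*$}: count=1 limits it to a single replacement,
# like regsub without -all.
result_regex = re.sub(r"\s*$", ".txt", txt, count=1)

print(repr(result_trim))
print(result_trim == result_regex)
```

Both forms only touch the trailing whitespace, so the leading space and the spaces between words survive.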
I have searched, but only found Python and related solutions.
I have a string like
"Hello 'how' are % you?"
which I want to convert to the text below by removing everything except numbers and letters:
Hello how are you
I am using REGEXREPLACE as follows, but I am not sure what the replacement should be or whether this is the right approach:
=REGEXREPLACE(B2 , "([^A-Za-z0-9]+)")
The main things I want to remove from the string are characters like " or strange symbols.
Can anyone help?
You can use:
=TRIM(REGEXREPLACE(B2,"[\W_]+"," "))
Or, include the space in your character class:
=TRIM(REGEXREPLACE(B2,"[\W_ ]+"," "))
Where: \W is short for [^A-Za-z0-9_], so to also match the underscore we added it to the character class.
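The same character class can be tried outside Sheets. Below is a minimal Python sketch of the first formula; the function name keep_alnum is hypothetical, and re.ASCII is passed on the assumption that ASCII-only \W is the behaviour wanted here.

```python
import re

def keep_alnum(text: str) -> str:
    # Collapse every run of non-word characters or underscores into one
    # space, then trim the ends, like TRIM(REGEXREPLACE(...)) above.
    return re.sub(r"[\W_]+", " ", text, flags=re.ASCII).strip()

print(keep_alnum("Hello 'how' are % you?"))
```

Replacing each run with a single space (rather than nothing) keeps the words separated; the final strip removes the space left where the trailing "?" was.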
you can use:
=TRIM(REGEXREPLACE(A1, "'|%|""", ))
I'm trying to clean a CSV file which has a column with contents like this:
Sometexthere1", "code"=>"47.51-2-01"}]
And I would like to remove everything before the first quote (") in order to keep just this:
Sometexthere1
I know that I can use $` to get everything before some match in regex, but I am not understanding how to keep just the string before the first double quote.
Parameter expansion does this well enough:
# Define a variable
s='Sometexthere1", "code"=>"47.51-2-01"}]'
# expand it, removing the longest possible match (from the end) for '"'*
result=${s%%'"'*}
# demonstrate that result by printing it
printf '%s\n' "$result"
...properly returns Sometexthere1.
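For readers not working in a shell, the same "keep everything before the first double quote" step can be sketched in Python without any regex. This assumes the cell content is available as a plain string.

```python
s = 'Sometexthere1", "code"=>"47.51-2-01"}]'

# partition splits at the first '"' and returns (head, separator, tail);
# if the string contains no quote at all, head is the whole string.
head, _, _ = s.partition('"')

print(head)
```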
You probably mean "delete everything after a double quote"? In OpenRefine, you can use this GREL formula:
value.replace(/".+/, "")
> Result : Sometexthere1
I am trying to assemble a UDF in Scala that takes a column from a data frame and manipulates it to remove HTML and other useless pieces of text.
The column I need to modify is very messy: sometimes there is HTML, sometimes there is not. Searching SO, I found a regex solution to remove HTML.
What I'd like to accomplish now is to find a regex that can find a specific word in the text and delete all the text after that word.
I think I understand from this SO answer that the regex should be something like \).* if you want to remove all after ), so I am trying to adapt this to my case, unsuccessfully due to my lack of knowledge about regex.
I have strings like:
I am interested to hear from you, thanks Sent from iPhone other stuff I want to delete....
I'd like to retain the first part of the string up to "Sent from" excluded, so a perfect output would be:
I am interested to hear from you, thanks
What I have so far is something like:
val toStringNoHTML = udf[String, String](_.toString
// code from SO as linked above
.replaceAll("""<(?!\/?a(?=>|\s.*>))\/?.*?>""", " ")
// delete all text after key word
.replaceAll("""'Sent from'.*""", "")
// remove all punctuation
.replaceAll("""[\p{Punct}\n]""", " ")
)
While the HTML gets removed, the "Sent from" and all the text after it do not. Any hint on how to adjust the regex to make it work?
EDIT
As pointed out in the comments, a small typo prevented my code from working. Thanks for the help:
.replaceAll("""'Sent from'.*""", "")
should be
.replaceAll("""Sent from.*""", "")
Instead of doing multiple replaceAll(pattern, blank) I'd be tempted to start with an extraction.
val msgRE = "(.*>)?(.*)Sent from.*".r
val result = udfStr match {
case msgRE(_, msg) => Some(msg.trim) // .replaceAll() can be added here
case _ => None
}
Here the result is an Option[String] but that really depends on how you want to handle the non-matching input.
If more cleaning is needed after the extraction then replaceAll() can be added where indicated (or the extraction pattern can be better refined).
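The extract-first approach is not specific to Scala. Here is a rough Python sketch of the same idea, with None standing in for the Option's None; the names MSG_RE and extract_message are invented for the example, and the optional prefix group is made non-capturing for brevity.

```python
import re
from typing import Optional

# Same shape as the pattern above: optional prefix ending in ">",
# then the message, then "Sent from" and whatever follows it.
MSG_RE = re.compile(r"(?:.*>)?(.*)Sent from.*")

def extract_message(text: str) -> Optional[str]:
    # fullmatch mirrors Scala's rule that the whole string must match.
    m = MSG_RE.fullmatch(text)
    return m.group(1).strip() if m else None

print(extract_message(
    "I am interested to hear from you, thanks Sent from iPhone other stuff"
))
```

Returning None for input without a "Sent from" marker leaves the caller free to decide whether to keep the original text or drop the row.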
I am trying to replace all the "." in a specific column of my data frame with "/". There are other characters in each cell and I want to make sure I only change the "."'s.
When I use gsub, I get an output that appears to make the changes, but then when I go to View(), the changes are not actually made...I thought gsub was supposed to actually change the value in the data frame. Am I using it incorrectly? I have my code below.
gsub(".", "/", spy$Identifier, ignore.case = FALSE, perl = FALSE,
fixed = TRUE, useBytes = FALSE)
I also tried sub, but the code I have below changed every entry itself to "/" and I am not sure how to change it.
spy$Identifier <- sub("^(.).*", "/", spy$Identifier)
Thanks!
My recommendation would be to escape the "." character:
spy$Identifier <- gsub("\\.", "/", spy$Identifier)
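In regular expressions, a period is a special character that matches any character. "Escaping" it tells the search to look for an actual period. In R's gsub this is accomplished with two backslashes (i.e.: "\\"). In other languages, it's often just one backslash. Note also that gsub returns a modified copy and does not change the data frame in place, which is why the result is assigned back to spy$Identifier above.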
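The difference between an escaped and an unescaped period is easy to see in a small Python sketch (Python being one of the languages where a single backslash is enough). The sample value "AB.12.C" is made up for the illustration.

```python
import re

identifier = "AB.12.C"

# Plain string replacement: no regex involved, "." is just a character.
literal = identifier.replace(".", "/")

# Escaped dot: the regex matches only actual periods.
escaped = re.sub(r"\.", "/", identifier)

# Bare dot: the regex matches every character, so everything becomes "/".
unescaped = re.sub(".", "/", identifier)

print(literal, escaped, unescaped)
```

As with gsub in R, none of these calls modify identifier itself; each returns a new string that has to be assigned somewhere.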