Remove square brackets from a string vector - regex

I have a character vector in which each element is enclosed in brackets. I want
to remove the brackets and just have the string.
So I tried:
n = c("[Dave]", "[Tony]", "[Sara]")
paste("", n, "", sep="")
Unfortunately, this doesn't work for some reason.
I've performed the same task before using this same code, and am not sure why it's not working this time.
I want to go from '[Dave]' to 'Dave'.
What am I doing wrong?

You could gsub out the brackets like so:
n = c("[Dave]", "[Tony]", "[Sara]")
gsub("\\[|\\]", "", n)
[1] "Dave" "Tony" "Sara"

A regular expression substitution will do it. Look at the gsub() function.
This gives you what you want (it removes any instance of '[' or ']'):
gsub("\\[|\\]", "", n)

The other answers should be enough to get your desired output. I just wanted to provide a brief explanation of why what you tried didn't work.
paste concatenates character strings. If you paste an empty character string, "", to something with a separator that is also an empty character string, you really haven't altered anything. So paste can't make a character string shorter; the result will either be the same (as in your example) or longer.
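For comparison, here is a minimal Python sketch of the same point (Python is not the language of the question, and the helper name `strip_brackets` is made up for illustration). Concatenating empty strings leaves each element unchanged, while a regex substitution actually removes characters.

```python
import re

names = ["[Dave]", "[Tony]", "[Sara]"]

# Gluing empty strings onto each element changes nothing,
# which mirrors paste("", n, "", sep="") in R.
unchanged = ["" + s + "" for s in names]

def strip_brackets(items):
    """Remove every '[' and ']' from each string (hypothetical helper)."""
    return [re.sub(r"[\[\]]", "", s) for s in items]

print(unchanged)              # ['[Dave]', '[Tony]', '[Sara]']
print(strip_brackets(names))  # ['Dave', 'Tony', 'Sara']
```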

If working within tidyverse:
library(tidyverse); library(stringr)
n = c("[Dave]", "[Tony]", "[Sara]")
n %>% str_replace_all("\\[|\\]", "")
[1] "Dave" "Tony" "Sara"

Related

How to split a comma-separated string into an array in TypeScript when items containing a comma are wrapped in double quotes

I've been struggling to get all the items of the string below into an array in TypeScript:
abc,"de,f",hi,"hello","te,st&"
If an item contains a comma (,) or an ampersand (&), it is placed in double quotes.
I tried the split function, but it fails because my strings can contain commas as well.
Any help in this regard is highly appreciated.
Thank you.
If you want to use regular expression matching, try a regex that matches the strings inside quotes first, then the strings outside quotes, something like (\".+?\")|(^[^\"]+,)|(,[^\"]+,)
I don't know how well this carries over to TypeScript, but I'd guess you can work out something that takes this pattern and gives you the matches one by one.
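The "quoted strings first, then unquoted strings" idea can be sketched as follows. This is a Python illustration rather than TypeScript, and both the pattern and the name `split_fields` are assumptions of this sketch, not the pattern suggested above. It skips empty unquoted fields and does not handle escaped quotes inside a quoted field.

```python
import re

# Either a double-quoted field (captured without its quotes)
# or a run of characters that contains no comma.
FIELD = re.compile(r'"([^"]*)"|([^,]+)')

def split_fields(line):
    """Split a comma-separated line, keeping commas that sit inside quotes."""
    fields = []
    for m in FIELD.finditer(line):
        quoted, bare = m.group(1), m.group(2)
        fields.append(quoted if quoted is not None else bare)
    return fields

print(split_fields('abc,"de,f",hi,"hello","te,st&"'))
# ['abc', 'de,f', 'hi', 'hello', 'te,st&']
```

For general CSV input, a dedicated parser (such as Python's built-in csv module) is usually a safer choice than a hand-written regex.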
First of all, I think you are making things more complicated than they need to be with the following rule:
has comma (,) or ampersand (&) in it,It will be placed in double quotes.
Instead, you should systematically put every element inside double quotes:
abc,"de,f",hi,"hello","te,st&"
→
"abc","de,f","hi","hello","te,st&"
You will then have the second string to parse.
A regex like this one will do the job:
(?<=,")([^"]*)(?=",)|(?<=")([^"]*)(?=",)|(?<=")([^"]*)(?="$)
Using the back references $1, $2 and $3, you can extract your elements.
The regex /(?:^|,)(\"(?:[^\"]+)\"|[^,]+)/ helped me get the required values.
var test = '"abc,123",test,123,456,"def:get"';
test.split(/(\"(?:[^\"]+)\"|[^,]+)/);
It returns the array below:
["", "\"abc,123\"", ",", "test", ",", "123", ",", "456", ",", "\"def:get\"", ""]
For the values inside double quotes, I just trimmed the quotes to get the actual values, and I ignored the empty items and the comma separators in the array.
Use split on the string (this example is Swift, and it does not handle commas inside quotes):
let fullName = "First,Last"
let fullNameArr = fullName.split(separator: ",").map(String.init)
fullNameArr[0] // First
fullNameArr[1] // Last

re.sub() ellipsis in Python 3

I need a simple solution, but it's evading me. I am passing a list of strings to a for loop for some cleaning up, and need to remove any instance of an ellipsis. Here's an example of what I've tried:
import re
text_list = ["string1", "string2", "string3...", "string.4"]
for i in range(len(text_list)):
    text_list[i] = re.sub(r"\.", "", text_list[i])
    text_list[i] = re.sub(r"\.{3}", "", text_list[i])
    text_list[i] = re.sub(r"\.\.\.", "", text_list[i])
Naturally, none of these removes an ellipsis. The period is removed, though. So my output would be:
for text in text_list:
    print(text)
>>>string1
string2
string3... <- THIS ONE DIDN'T CHANGE
string4 <- BUT THIS ONE DID
I've exhausted my regex documentation and Google searches. How do I match an ellipsis with a regex?
@swalladge had the right idea here: use Unicode. Here is his answer.
"If you want to remove an actual ellipsis, as in the unicode HORIZONTAL ELLIPSIS character (…), then you need to use that in the code, since 3 periods won't match it." –@swalladge
@rickdenhaan also had an easier way to accomplish the task. Thanks!
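A minimal sketch of the Unicode point, assuming the goal is to drop both the single-character ellipsis (U+2026) and a run of three literal periods. The function name `remove_ellipses` and the sample strings are illustrative and not taken from either commenter.

```python
import re

# U+2026 HORIZONTAL ELLIPSIS, or three literal periods in a row.
ELLIPSIS = re.compile("\u2026|\\.{3}")

def remove_ellipses(strings):
    """Return copies of the strings with every ellipsis removed."""
    return [ELLIPSIS.sub("", s) for s in strings]

text_list = ["string1", "string2", "string3\u2026", "string.4", "wait..."]
print(remove_ellipses(text_list))
# ['string1', 'string2', 'string3', 'string.4', 'wait']
```

A single period, as in "string.4", is left alone because the pattern only matches the ellipsis character or exactly three consecutive periods.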

Regex works, but not on strings in my vector

So I am attempting to use grep to find a pattern and replace values within my single-column data frame. I basically want a grep expression that says "delete everything after the comma until the end of the string".
I wrote the expression, and it works on my dummy vector:
> library(stringr)
> pretendvector <- c("Hi","Hi,there","Hi there, how are you")
> str_replace(pretendvector, regex(',.*$'),'')
[1] "Hi" "Hi" "Hi there"
However, when I apply the same expression to my vector (since it's for stringr, I converted the data frame column to a vector), it returns every value in the column unchanged and does not apply the expression. Does anyone have any idea why this might be?
I guess the OP didn't assign the output from str_replace to a new object or update the original vector. In that case,
newvector <- str_replace(pretendvector, regex(',.*$'),'')
We can also do this using sub from base R
newvector <- sub(",.*", "", pretendvector)
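The same pitfall exists in other languages. The Python sketch below (hypothetical names throughout) shows that a substitution returns new values and leaves the input untouched, so the result has to be assigned to something.

```python
import re

def drop_after_comma(values):
    """Return new strings with the first comma and everything after it removed."""
    return [re.sub(r",.*$", "", v) for v in values]

pretend = ["Hi", "Hi,there", "Hi there, how are you"]

drop_after_comma(pretend)            # result is discarded; `pretend` is unchanged
cleaned = drop_after_comma(pretend)  # keep the result by assigning it

print(pretend)  # ['Hi', 'Hi,there', 'Hi there, how are you']
print(cleaned)  # ['Hi', 'Hi', 'Hi there']
```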

Removing parentheses as unwanted text in R using gsub

I'm trying to clean up a column in my data frame where the rows look like this:
1234, text ()
and I need to keep just the number in all the rows. I used:
df$column = gsub(", text ()", "", df$column)
and got this:
1234()
I repeated the operation with only the parentheses, but they won't go away. I wasn't able to find an example that deals specifically with parentheses being eliminated as unwanted text. sub doesn't work either.
Does anyone know why this isn't working?
Parentheses are metacharacters in regex. You should escape them using \\ or [], or add fixed = TRUE. But in your case you just want to keep the number, so just remove everything else using \\D:
gsub("\\D", "", "1234, text ()")
## [1] "1234"
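The metacharacter behaviour is not specific to R. As a rough Python illustration (the variable names are made up for this sketch), an unescaped "()" is an empty group, so the literal parentheses survive. Escaping the pattern, using a plain string replacement, or removing all non-digits each give the number.

```python
import re

text = "1234, text ()"

# Unescaped, "()" is an empty capture group, so only ", text " is removed.
unescaped = re.sub(", text ()", "", text)

# re.escape() turns every metacharacter in the pattern into a literal.
escaped = re.sub(re.escape(", text ()"), "", text)

# A plain (non-regex) replacement, similar in spirit to fixed = TRUE in R.
fixed = text.replace(", text ()", "")

# Keep only the digits by deleting every non-digit character.
digits_only = re.sub(r"\D", "", text)

print(unescaped)    # 1234()
print(escaped)      # 1234
print(fixed)        # 1234
print(digits_only)  # 1234
```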
If your column always follows the format described above:
1234, text ()
then something like the following (C#) should work:
string extractedNumber = Regex.Match(INPUT_COLUMN, @"^\d{4,}").Value;
This reads as: from the start of the string, find four or more digits.

Remove from a string all except selected characters

I want to remove from a string all characters that are not digits, minus signs, or decimal points.
I imported data from Excel using read.xls, which include some strange characters. I need to convert these to numeric. I am not too familiar with regular expressions, so need a simpler way to do the following:
excel_coords <- c(" 19.53380Ý°", " 20.02591°", "-155.91059°", "-155.8154°")
unwanted <- unique(unlist(strsplit(gsub("[0-9]|\\.|-", "", excel_coords), "")))
clean_coords <- gsub(do.call("paste", args = c(as.list(unwanted), sep="|")),
                     replacement = "", x = excel_coords)
> clean_coords
[1] "19.53380" "20.02591" "-155.91059" "-155.8154"
Bonus if somebody can tell me why these characters have appeared in some of my data (the degree signs are part of the original Excel worksheet, but the others are not).
Short and sweet. Thanks to comment by G. Grothendieck.
gsub("[^-.0-9]", "", excel_coords)
From http://stat.ethz.ch/R-manual/R-patched/library/base/html/regex.html: "A character class is a list of characters enclosed between [ and ] which matches any single character in that list; unless the first character of the list is the caret ^, when it matches any character not in the list."
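The negated character class works the same way in most regex dialects. Here is a small Python sketch, with `clean_coord` as a hypothetical helper name, that applies the same class and then converts the result to numbers.

```python
import re

excel_coords = [" 19.53380Ý°", " 20.02591°", "-155.91059°", "-155.8154°"]

def clean_coord(s):
    """Delete every character that is not a digit, a minus sign, or a period."""
    return re.sub(r"[^-.0-9]", "", s)

clean = [clean_coord(s) for s in excel_coords]
values = [float(c) for c in clean]

print(clean)  # ['19.53380', '20.02591', '-155.91059', '-155.8154']
```

Inside the brackets, the leading ^ negates the class, and a minus sign placed right after it is treated as a literal character rather than as a range operator.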
Can also be done by using strsplit, sapply and paste and by indexing the correct characters rather than the wrong ones:
excel_coords <- c(" 19.53380Ý°", " 20.02591°", "-155.91059°", "-155.8154°")
correct_chars <- c(0:9,"-",".")
sapply(strsplit(excel_coords,""),
       function(x)paste(x[x%in%correct_chars],collapse=""))
[1] "19.53380" "20.02591" "-155.91059" "-155.8154"
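A capture-group version needs care: with a greedy leading (.+), the sign and all but one digit before the decimal point are lost:
gsub("(.+)([[:digit:]]+\\.[[:digit:]]+)(.+)", "\\2", excel_coords)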
[1] "9.53380" "0.02591" "5.91059" "5.8154"