I have a string with multiple commas and spaces as delimiters between words. Here are some examples:
ex #1: string = 'word1,,,,,,, word2,,,,,, word3,,,,,,'
ex #2: string = 'word1 word2 word3'
ex #3: string = 'word1,word2,word3,'
I want to use a regex to convert any of the three examples above to "word1, word2, word3" (note: no comma after the last word in the result).
I used the following code:
import re
input_col = 'word1 , word2 , word3, '
test_string = ''.join(input_col)
test_string = re.sub(r'[,\s]+', ' ', test_string)
test_string = re.sub(' +', ',', test_string)
print(test_string)
I get "word1,word2,word3," as the output, whereas I actually want "word1, word2, word3", with no comma after word3.
What kind of regex and re methods should I use to achieve this?
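You can use re.split to create a list, filter out the empty strings, and join what is left:
import re
s = 'word1 , word2 , word3, '
r = re.split(r"[^a-zA-Z\d]+", s)  # split on runs of non-alphanumeric characters
ans = ', '.join([i for i in r if len(i) > 0])  # drop empty strings, then join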
How about adding the following line to the end of your program:
test_string = re.sub(',+$', '', test_string)
This removes any commas at the end of the string. Note that re.sub returns a new string, so the result has to be assigned back.
One approach is to first split on an appropriate pattern, then join the resulting list with commas:
import re
string = 'word1,,,,,,, word2,,,,,, word3,,,,,,'
parts = re.split(r"[,\s]+", string)  # split on one or more commas or whitespace characters
sep = ','
output = re.sub(',$', '', sep.join(parts))
print(output)
word1,word2,word3
Note that I make a final call to re.sub to remove a possible trailing comma.
You can use [ ]+,[ ]+ to match a comma surrounded by extra spaces and ,\s*$ to match the last comma. Then substitute the first with a comma followed by a single space, and the second with an empty string:
import re
input_col = 'word1 , word2 , word3, '
test_string = re.sub(r'[ ]+,[ ]+', ', ', input_col)  # remove extra spaces around commas
test_string = re.sub(r',\s*$', '', test_string)  # remove last comma
print(test_string)
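The split-and-join idea from the answers above can be folded into one helper that handles all three example strings from the question. This is only a sketch: the name normalize_words is made up for illustration, and it assumes the words themselves never contain commas or whitespace.

```python
import re

def normalize_words(text):
    """Collapse any run of commas/whitespace between words into ', '."""
    # Split on one or more commas or whitespace characters.
    parts = re.split(r"[,\s]+", text)
    # Leading or trailing delimiters leave empty strings behind; drop them.
    return ", ".join(part for part in parts if part)

examples = [
    "word1,,,,,,, word2,,,,,, word3,,,,,,",
    "word1 word2 word3",
    "word1,word2,word3,",
]
for example in examples:
    print(normalize_words(example))
```

Because the empty strings are filtered out before joining, no separate step is needed to strip a trailing comma.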
I have some fields that could have the following values:
"
\
\n <-- or any possible carriage return
I want to replace them with the following:
\"
\\
<-- this represents a space
Ideally, I would like to do it in a single pass using REGEX_REPLACE or some other method.
I'm currently doing it like the following. It is inefficient because it has to make three passes.
SELECT replace(
         replace(
           replace(
             'He\llo " I am \na \string\n pl"eas\ne fi"x me\n',
             '\',
             '\\'
           ),
           '\n',
           ' '
         ),
         '"',
         '\"'
       )
FROM DUAL;
The output should be
He\\llo \" I am \ a \\string\ pl\"eas\ e fi\"x me\
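As an illustration of the single-pass idea only, here is a sketch in Python rather than Oracle SQL: one regex matches any of the characters of interest, and a lookup decides the replacement for each match. It assumes that \n in the question means a real line-break character; escape_field and REPLACEMENTS are names invented for this example.

```python
import re

# Replacement for each special character (hypothetical names).
REPLACEMENTS = {
    '"': '\\"',    # double quote -> backslash + double quote
    '\\': '\\\\',  # backslash    -> two backslashes
}

def escape_field(value):
    """Escape quotes and backslashes and blank out line breaks in one pass."""
    # Matches a double quote, a backslash, a carriage return, or a line feed.
    # Anything not found in REPLACEMENTS (\r or \n) becomes a space.
    return re.sub(r'["\\\r\n]', lambda m: REPLACEMENTS.get(m.group(), ' '), value)

print(escape_field('He\\llo " I am \na \\string'))
```

Since the input is scanned only once, a backslash produced by one replacement is never picked up again by another, which is the ordering problem that nested replace calls have to work around.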
Hi, I am new to regular expressions.
I have a string like this:
"( NATIVE_WHERE_CLAUSE = 'UnitOfMeasure.MeasurementType=[Weight]' ) AND ( NATIVE_RELATION_WHERE_CLAUSE = 'Reference_Name=[Nut to coolent oil]' )"
I tried to replace the square brackets [] with a single quote ' using the replaceAll() method, but it did not work.
Can anyone tell me what the regular expression would be for replacing the square brackets [] with single quotes in the string above?
The pattern \\[\\] matches the literal sequence [] in a string. To replace each [ or ] with ', use the following:
"( NATIVE_WHERE_CLAUSE = 'UnitOfMeasure.MeasurementType=[Weight]' ) AND ( NATIVE_RELATION_WHERE_CLAUSE = 'Reference_Name=[Nut to coolent oil]' )".replace(new RegExp('\\[|\\]', 'g'), "'");
But that puts single quotes inside a value that is already wrapped in single quotes, which will not work. So you must replace with \' instead, so that the inner quotes are escaped, as shown below.
"( NATIVE_WHERE_CLAUSE = 'UnitOfMeasure.MeasurementType=[Weight]' ) AND ( NATIVE_RELATION_WHERE_CLAUSE = 'Reference_Name=[Nut to coolent oil]' )".replace(new RegExp('\\[|\\]', 'g'), "\\'");
I'm looking for a regex to escape all backslash characters into double backslash characters, except for those that belong to newlines, carriage returns, etc.
This function:
Public Function EscapeInvalidCharacters(ByVal str As String) As String
    str = str.Replace("\", "\\") ' Backslash
    str = str.Replace("'", "\'") ' Single quote
    str = str.Replace("""", "\""") ' Double quote
    str = str.Replace(vbNewLine, "\n") ' New line
    str = str.Replace(vbCr, "\r") ' Carriage return
    str = str.Replace(vbTab, "\t") ' Horizontal tab
    str = str.Replace(vbBack, "\b") ' Backspace
    str = str.Replace(vbFormFeed, "\f") ' Form feed
    Return str
End Function
with this input:
"This is a test.\nThis is another test.\n"
wrongly generates the following output:
"This is a test.\\nThis is another test.\\n"
Is it possible to alter the line str = str.Replace("\", "\\") so that it doesn't alter the newlines?
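\\(?!r|n|t|b|f)
Try this; it should do it. The negative lookahead (?!...) makes the pattern match a backslash only when it is not followed by r, n, t, b, or f.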
Thanks to vks, I came up with this solution:
Imports System.Text.RegularExpressions

Public Function EscapeInvalidCharacters(ByVal str As String) As String
    'str = str.Replace("\", "\\") ' Backslash
    str = Regex.Replace(str, "\\(?!n|r|t|b|f)", "\\") ' Backslash not followed by n, r, t, b or f
    str = str.Replace("'", "\'") ' Single quote
    str = str.Replace("""", "\""") ' Double quote
    str = str.Replace(vbNewLine, "\n") ' New line
    str = str.Replace(vbCr, "\r") ' Carriage return
    str = str.Replace(vbTab, "\t") ' Horizontal tab
    str = str.Replace(vbBack, "\b") ' Backspace
    str = str.Replace(vbFormFeed, "\f") ' Form feed
    Return str
End Function
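For readers who want to try the negative-lookahead idea outside VB.NET, here is a minimal Python sketch of just the backslash step. The function name escape_lone_backslashes is hypothetical, and the sketch shares the limitation of the pattern above: a backslash that happens to be followed by n, r, t, b or f (as in a path like C:\temp) is left untouched.

```python
import re

def escape_lone_backslashes(text):
    # Match a backslash only when the next character is not n, r, t, b or f.
    # The lookahead (?![nrtbf]) checks the next character without consuming it.
    # A function is used as the replacement so that the two backslashes
    # are inserted literally, with no escape processing by re.sub.
    return re.sub(r"\\(?![nrtbf])", lambda m: "\\\\", text)

print(escape_lone_backslashes(r"This is a test.\nSee C:\data"))
```

Here the backslash before n is kept as it is, while the one before d is doubled.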
I have a bunch of strings with punctuation in them that I'd like to convert to spaces:
"This is a string. In addition, this is a string (with one more)."
would become:
"This is a string In addition this is a string with one more "
I can go through and do this manually with the stringr package (str_replace_all()) one punctuation symbol at a time (, / . / ! / ( / ) / etc.), but I'm curious whether there's a faster way, presumably using a regex.
Any suggestions?
x <- "This is a string. In addition, this is a string (with one more)."
gsub("[[:punct:]]", " ", x)
[1] "This is a string  In addition  this is a string  with one more  "
See ?gsub for doing quick substitutions like this, and ?regex for details on the [[:punct:]] class, i.e.
‘[:punct:]’ Punctuation characters:
‘! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~’.
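For comparison, the same character-class approach can be sketched in Python, which has no [[:punct:]] class in its re module. The sketch builds an equivalent class from string.punctuation (ASCII punctuation only); the name punct_to_spaces is made up for this example.

```python
import re
import string

# string.punctuation holds the ASCII punctuation characters;
# re.escape makes them safe to place inside a character class.
PUNCT_CLASS = "[" + re.escape(string.punctuation) + "]"

def punct_to_spaces(text):
    # Replace every single punctuation character with one space.
    return re.sub(PUNCT_CLASS, " ", text)

x = "This is a string. In addition, this is a string (with one more)."
print(punct_to_spaces(x))
```

Each punctuation character becomes exactly one space, so a punctuation mark that was already followed by a space leaves two spaces in the result.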
Have a look at ?regex:
library(stringr)
str_replace_all(x, '[[:punct:]]', ' ')
"This is a string  In addition  this is a string  with one more  "