Matching any combination of space AND newline - regex

I'm trying to find a regexp that matches every run containing at least one \n and any number of spaces, in any order. For instance (with spaces denoted by _), all of these should be matched by the regexp:
\n
\n\n\n\n
\n\n\n_\n\n
_\n
\n_
_\n_
_\n\n
\n\n_
_\n\n_
_\n\n_\n
\n_\n_
_\n\n_\n_
___\n__\n and so on...
However, it must not catch spaces that do not border a \n.
In other words, I'd like to reduce all of this (if I'm not making any mistake) to one line:
import re
mystring = re.sub(r'(\n)+' , '\n' , mystring)
mystring = re.sub(r'( )+' , ' ' , mystring)
mystring = re.sub(r'\n ' , '\n' , mystring)
mystring = re.sub(r' \n' , '\n' , mystring)
mystring = re.sub(r'(\n)+' , '\n' , mystring)
mystring = re.sub(r'(\n)+' , ' | ' , mystring)

[ ]*(?:\n[ ]*)+
or, if you want to match tabulations:
[ \t]*(?:\n[ \t]*)+

You can use the following regular expression:
(( )*\n+( )*)+
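As a rough illustration of the first pattern above, here is a minimal Python sketch that does the newline-related steps from the question in one substitution. The helper name collapse_newline_runs and the sample string are made up for this example. Unlike the six-step version, it leaves runs of spaces that do not touch a \n alone, which is what the question asks for.

```python
import re

# Optional spaces, then one or more newlines, each optionally
# followed by more spaces (pattern taken from the answer above).
NEWLINE_RUN = re.compile(r"[ ]*(?:\n[ ]*)+")


def collapse_newline_runs(text, replacement=" | "):
    """Replace every space/newline run that contains at least one newline."""
    return NEWLINE_RUN.sub(replacement, text)


sample = "a  b \n\n  \n c   d"
# Spaces inside "a  b" and "c   d" do not border a newline, so they stay.
print(collapse_newline_runs(sample))  # a  b | c   d
```

Swapping [ ] for [ \t] in the pattern would treat tabs the same way as spaces.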

Related

regex in Python to remove commas and spaces

I have a string with multiple commas and spaces as delimiters between words. Here are some examples:
ex #1: string = 'word1,,,,,,, word2,,,,,, word3,,,,,,'
ex #2: string = 'word1 word2 word3'
ex #3: string = 'word1,word2,word3,'
I want to use a regex to convert any of the 3 examples above to "word1, word2, word3" (note: no comma after the last word in the result).
I used the following code:
import re
input_col = 'word1 , word2 , word3, '
test_string = ''.join(input_col)
test_string = re.sub(r'[,\s]+', ' ', test_string)
test_string = re.sub(' +', ',', test_string)
print(test_string)
I get the output as "word1,word2,word3,". Whereas I actually want "word1, word2, word3". No comma after word3.
What kind of regex and re methods should I use to achieve this?
You can use split to create a list and then filter out the empty strings:
import re
s = 'word1 , word2 , word3, '
r = re.split(r"[^a-zA-Z\d]+", s)
ans = ', '.join([i for i in r if len(i) > 0])
How about adding the following line to the end of your program:
test_string = re.sub(r',+$', '', test_string)
This removes any commas at the end of the string.
One approach is to first split on an appropriate pattern, then join the resulting list with commas:
import re
string = 'word1,,,,,,, word2,,,,,, word3,,,,,,'
parts = re.split(r"[,\s]+", string)  # split on runs of commas and whitespace
sep = ','
output = re.sub(',$', '', sep.join(parts))
print(output)
word1,word2,word3
Note that I make a final call to re.sub to remove a possible trailing comma.
You can use [ ]+,[ ]+ to detect the extra spaces around a comma and ,\s*$ to detect the last comma. Then substitute each [ ]+,[ ]+ match with ", " and the last comma with an empty string:
import re
input_col = 'word1 , word2 , word3, '
test_string = re.sub(r'[ ]+,[ ]+', ', ', input_col)  # remove extra spaces
test_string = re.sub(r',\s*$', '', test_string)  # remove last comma
print(test_string)
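Another way to look at the problem is to extract the words rather than clean up the delimiters. The sketch below is one possible take, assuming a "word" is any run of characters that is neither a comma nor whitespace; the function name normalize_delimiters is only illustrative.

```python
import re


def normalize_delimiters(text):
    """Return the words in text joined by ', ' with no trailing comma."""
    # A word is any maximal run of characters other than commas/whitespace.
    words = re.findall(r"[^,\s]+", text)
    return ", ".join(words)


for example in (
    "word1,,,,,,, word2,,,,,, word3,,,,,,",
    "word1 word2 word3",
    "word1,word2,word3,",
):
    print(normalize_delimiters(example))  # word1, word2, word3
```

Because the words are collected first, no separate step is needed to strip a trailing comma.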

Oracle REGEX REPLACE - How to replace quote with backslash quote, newline with space, and backslash with double backslash

I have some fields that could have the following values:
"
\
\n <-- or any possible carriage return
I want to replace them with the following:
\"
\\
<-- this represents a space
Ideally, I would like to do it in a single pass using REGEXP_REPLACE or some other method.
I'm currently doing it like the following. It is inefficient because it has to make three passes.
SELECT replace(
         replace(
           replace(
             'He\llo " I am \na \string\n pl"eas\ne fi"x me\n',
             '\',
             '\\'
           ),
           '\n',
           ' '
         ),
         '"',
         '\"'
       )
FROM DUAL;
The output should be
He\\llo \" I am \ a \\string\ pl\"eas\ e fi\"x me\
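To illustrate what a single pass could look like, here is a small sketch in Python rather than Oracle SQL, so it shows the idea only and is not a drop-in answer. It assumes the newlines are real line-break characters (\n, \r or \r\n); the names ESCAPES, SPECIAL and escape_single_pass are invented for the example.

```python
import re

# Characters that are escaped rather than replaced by a space.
ESCAPES = {'"': '\\"', '\\': '\\\\'}

# One alternation covering every character of interest; \r\n is listed
# first so a Windows line ending becomes a single space.
SPECIAL = re.compile(r'\r\n|[\r\n"\\]')


def escape_single_pass(text):
    """Escape quotes and backslashes and turn line breaks into spaces."""
    # Anything matched that is not in ESCAPES is a line break.
    return SPECIAL.sub(lambda m: ESCAPES.get(m.group(0), " "), text)


print(escape_single_pass('He\\llo " I\na'))  # He\\llo \" I a
```

Since each character is examined once, the order-of-replacement problem of chained replace calls does not arise.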

Regular expression for replacing "[" "]" with single quote " ' " character in string

Hi, I am new to regular expressions.
I have a string like this:
"( NATIVE_WHERE_CLAUSE = 'UnitOfMeasure.MeasurementType=[Weight]' ) AND ( NATIVE_RELATION_WHERE_CLAUSE = 'Reference_Name=[Nut to coolent oil]' )"
I tried to replace the square brackets [] with a single quote ' using the replaceAll() method, but it did not work.
Can anyone tell me what the regular expression should be for replacing the square brackets [] with a single quote in the string above?
The pattern \\[|\\] matches either bracket in a string. Use the following to replace [ and ] with ':
"( NATIVE_WHERE_CLAUSE = 'UnitOfMeasure.MeasurementType=[Weight]' ) AND ( NATIVE_RELATION_WHERE_CLAUSE = 'Reference_Name=[Nut to coolent oil]' )".replace(new RegExp('\\[|\\]', 'g'), "'");
However, that leaves the new single quotes nested inside the existing single-quoted values, which will not work. To avoid the error, replace with \' instead, as shown below.
"( NATIVE_WHERE_CLAUSE = 'UnitOfMeasure.MeasurementType=[Weight]' ) AND ( NATIVE_RELATION_WHERE_CLAUSE = 'Reference_Name=[Nut to coolent oil]' )".replace(new RegExp('\\[|\\]', 'g'), "\\'");
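For comparison, the same substitution can be sketched in Python with a character class instead of an alternation. The function name brackets_to_quotes is hypothetical, and the shortened clause is just a fragment of the string from the question. Whether the resulting nested quotes are acceptable depends on whatever consumes the string.

```python
import re


def brackets_to_quotes(text):
    """Replace every '[' or ']' with a single quote."""
    # Inside a character class, both brackets are escaped to be literal.
    return re.sub(r"[\[\]]", "'", text)


clause = "( NATIVE_WHERE_CLAUSE = 'UnitOfMeasure.MeasurementType=[Weight]' )"
print(brackets_to_quotes(clause))
# ( NATIVE_WHERE_CLAUSE = 'UnitOfMeasure.MeasurementType='Weight'' )
```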

Regex to match all backslash characters, except new line, carriage return, etc

I'm looking for a regex to escape all backslash characters into double backslash characters, except for those in newlines, carriage returns, etc.
This function:
Public Function EscapeInvalidCharacters(ByVal str As String) As String
str = str.Replace("\", "\\") ' Backslash
str = str.Replace("'", "\'") ' Single quote
str = str.Replace("""", "\""") ' Double quote
str = str.Replace(vbNewLine, "\n") ' New line
str = str.Replace(vbCr, "\r") ' Carriage return
str = str.Replace(vbTab, "\t") ' Horizontal tab
str = str.Replace(vbBack, "\b") ' Backspace
str = str.Replace(vbFormFeed, "\f") ' Form feed
Return str
End Function
with this input:
"This is a test.\nThis is another test.\n"
wrongly generates the following output:
"This is a test.\\nThis is another test.\\n"
Is it possible to alter the line str = str.Replace("\", "\\") so that it doesn't alter the newlines?
\\(?!r|n|t|b|f)
Try this. This should do it.
Thanks to vks, I came up with this solution:
Public Function EscapeInvalidCharacters(ByVal str As String) As String
'str = str.Replace("\", "\\") ' Backslash
str = Regex.Replace(str, "\\(?!n|r|t|b|f)", "\\") ' Backslash
str = str.Replace("'", "\'") ' Single quote
str = str.Replace("""", "\""") ' Double quote
str = str.Replace(vbNewLine, "\n") ' New line
str = str.Replace(vbCr, "\r") ' Carriage return
str = str.Replace(vbTab, "\t") ' Horizontal tab
str = str.Replace(vbBack, "\b") ' Backspace
str = str.Replace(vbFormFeed, "\f") ' Form feed
Return str
End Function
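The negative-lookahead idea can also be sketched in Python, shown here only as an illustration of the regex and not as a translation of the whole VB.NET function. The name double_stray_backslashes is made up. One caveat, assuming the input may contain arbitrary text: a backslash that merely happens to precede n, r, t, b or f (for example in a path such as C:\new) is also left untouched.

```python
import re


def double_stray_backslashes(text):
    """Double each backslash that does not start \\n, \\r, \\t, \\b or \\f."""
    # (?![nrtbf]) is a negative lookahead: match the backslash only when
    # the next character is not one of the listed escape letters.
    # The replacement r"\\\\" is interpreted by re.sub as two backslashes.
    return re.sub(r"\\(?![nrtbf])", r"\\\\", text)


print(double_stray_backslashes(r"dir\x and line\n"))  # dir\\x and line\n
```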

Convert punctuation to space

I have a bunch of strings with punctuation in them that I'd like to convert to spaces:
"This is a string. In addition, this is a string (with one more)."
would become:
"This is a string In addition this is a string with one more "
I can go through and do this manually with the stringr package (str_replace_all()), one punctuation symbol at a time (, / . / ! / ( / ) / etc.), but I'm curious whether there's a faster way, presumably using regexes.
Any suggestions?
x <- "This is a string. In addition, this is a string (with one more)."
gsub("[[:punct:]]", " ", x)
[1] "This is a string  In addition  this is a string  with one more  "
See ?gsub for doing quick substitutions like this, and ?regex for details on the [[:punct:]] class, i.e.
‘[:punct:]’ Punctuation characters:
‘! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~’.
Have a look at ?regex:
library(stringr)
str_replace_all(x, '[[:punct:]]', ' ')
"This is a string  In addition  this is a string  with one more  "
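For readers working outside R, a roughly equivalent sketch in Python is shown below. It assumes that ASCII punctuation (Python's string.punctuation) is what is meant, which covers the same characters as the [:punct:] list quoted above; the function name punctuation_to_spaces is invented for the example.

```python
import re
import string

# Character class built from ASCII punctuation; re.escape makes
# characters such as ']' , '\' and '^' safe inside the brackets.
PUNCT = re.compile("[" + re.escape(string.punctuation) + "]")


def punctuation_to_spaces(text):
    """Replace each punctuation character with a single space."""
    return PUNCT.sub(" ", text)


x = "This is a string. In addition, this is a string (with one more)."
print(repr(punctuation_to_spaces(x)))
# 'This is a string  In addition  this is a string  with one more  '
```

Each punctuation mark becomes exactly one space, so a mark that was already next to a space leaves two spaces behind; a follow-up re.sub(r" +", " ", ...) would squeeze those if needed.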