I am looking for a regular expression to remove all special characters from a string, except whitespace, and ideally replace runs of multiple whitespace characters with a single space.
For example "[one# !two three-four]" should become "one two three-four"
I tried using str = Regex.Replace(strTemp, "^[-_,A-Za-z0-9]$", "").Trim() but it does not work. I also tried a few more, but they either get rid of the whitespace or do not replace all the special characters.
[ ](?=[ ])|[^-_,A-Za-z0-9 ]+
Try this. Replace with an empty string. See demo.
http://regex101.com/r/lZ5mN8/69
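As a rough illustration, the single-pass pattern above can be tried in Python. The helper name strip_specials is made up for this sketch, and the sample input with doubled spaces is an assumption; the set of kept characters (letters, digits, -, _, comma, space) is taken from the pattern as posted.

```python
import re

# Pattern from the answer above: match a space that is followed by another
# space, or any run of characters outside the allowed set.
PATTERN = re.compile(r"[ ](?=[ ])|[^-_,A-Za-z0-9 ]+")

def strip_specials(text: str) -> str:
    """Hypothetical helper: delete every match, then trim the ends."""
    return PATTERN.sub("", text).strip()

print(strip_specials("[one#  !two    three-four]"))
```

One caveat with a single pass: spaces that only become adjacent after a special character is removed (as in "a # b") are not merged, so a second whitespace-collapsing step may still be needed.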
Use the regex [^\w\s] to remove all characters other than word characters and whitespace (add - to the class, as in [^\w\s-], to keep hyphens), then replace double spaces with a single space:
Regex.Replace("[one# !two three-four]", "[^\w\s]", "").Replace("  ", " ").Trim
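A minimal Python sketch of the same two-step idea, for comparison. The function name clean is hypothetical, and keeping the hyphen is an assumption based on the expected output in the question.

```python
import re

def clean(text: str) -> str:
    """Hypothetical helper: strip special characters, then normalise spaces."""
    # Step 1: drop everything that is not a word character, whitespace, or hyphen.
    no_specials = re.sub(r"[^\w\s-]", "", text)
    # Step 2: collapse each run of whitespace into one space and trim the ends.
    return re.sub(r"\s+", " ", no_specials).strip()

print(clean("[one#  !two    three-four]"))
```

Doing the whitespace collapse as a separate second step also merges spaces that were separated only by a removed character.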
METHOD:
Instead of replace, use replaceAll, e.g.:
String inputString = "[one# !two three-four]";
String testOutput = inputString.replaceAll("[\\[!,*)@#%(&$_?.^\\]]", "").replaceAll("( )+", " ");
Log.d("THE OUTPUT", testOutput);
This will give an output of one two three-four.
EXPLANATION:
.replaceAll("[\\[!,*)@#%(&$_?.^\\]]", "") removes every special character listed between the outer brackets [].
.replaceAll("( )+", " ") replaces each run of one or more spaces with a single space.
REPLACING THE - symbol:
To remove hyphens as well, add the escaped symbol to the character class, like this: .replaceAll("[\\[\\-!,*)@#%(&$_?.^\\]]", "")
Hope this helps :)
Related
I'm trying to remove every white space and line break before the first character.
For example,
let str = " \n \n hello, my name is jay"
// do something
str = "hello, my name is jay"
No matter how many line breaks and white spaces are in front of the first character, I want to remove all of them.
Thanks guys!
You could just try this:
str = str.replace("\n", "").trim()
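That removes \n and trims whitespace from both ends of the string. Note that it also affects line breaks inside the text (and in JavaScript a string pattern replaces only the first occurrence); trim() on its own already strips leading and trailing spaces and line breaks.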
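For comparison, here is a small Python sketch that strips only the leading whitespace and line breaks and leaves the rest of the string untouched. The variable names are illustrative.

```python
import re

text = " \n \n hello, my name is jay"

# str.lstrip() with no arguments removes leading spaces, tabs and line breaks.
via_lstrip = text.lstrip()

# Regex equivalent: \A anchors at the very start of the string.
via_regex = re.sub(r"\A\s+", "", text)

print(via_lstrip)
print(via_regex)
```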
The following find and replace seems to be working:
Find: let\s+(\S+)\s*=\s*"(?:\s|\\[nrt])+(.*?)"
Replace: let $1 = "$2"
The strategy is to match and capture the string literal definition, while removing the leading whitespace, should it be present. This answer assumes that you want to apply a regex to actual source code script to remove leading whitespace from string literal definitions.
String s="Swamy Application";
s=s.replaceAll("\\S"," ");
System.out.println(s);
This should return the string, but the result looks empty.
I need an explanation of what is happening with \\S.
String.replaceAll() takes a regex as its first parameter and the replacement text for all matches of that regex as its second parameter. Here you have given \\S as the first parameter, which matches every non-whitespace character, and the replacement string is a single space. So the returned String contains only whitespace.
\S matches any non-whitespace character, so every letter in the string is replaced with a space.
"Swamy Application" -> "                 " (17 spaces)
More about this at source
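To see the difference between \S, \s and a literal S side by side, here is a short Python sketch. It is only an illustration in another language; the variable names are made up.

```python
import re

s = "Swamy Application"

only_spaces = re.sub(r"\S", " ", s)   # every non-whitespace character becomes a space
no_whitespace = re.sub(r"\s", "", s)  # whitespace characters are removed
no_capital_s = s.replace("S", "")     # only the literal letter S is removed

print(repr(only_spaces))
print(no_whitespace)
print(no_capital_s)
```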
If you are trying to replace the whitespace characters in the string, then use:
"\\s"
If you are trying to replace only the letter S in the string, then use:
"S"
c = re.split(r'\w+', message)
print(c)
message contains '!nano speak', but the regex is giving me this in return:
>>> ['!', ' ', '\r\n']
I'm very new to regex, but this seems like something I should get, and I can't seem to find this problem in search. It seems like it's doing exactly the opposite, and I'm sure it's a lower-case w.
re.split is using the regex as a delimiter to split the string. You set the delimiter to be any number of alphanumeric characters. This means that it will return everything between words.
In order to get the tokens defined by the regex you can use re.findall:
>>> re.findall(r'\w+', '!nano speak')
['nano', 'speak']
\w matches a word character (alphanumeric or underscore), so in the string "!nano speak" it matches everything except "!" and the space. The string is then split on "nano" and "speak", which leaves "!", " " and "\r\n".
To remove all non-letter characters, use:
re.sub("[^a-zA-Z]+", "", "!nano speak")
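The three behaviours discussed in this thread can be compared in one short sketch. The trailing "\r\n" on the message is an assumption inferred from the output shown in the question.

```python
import re

message = "!nano speak\r\n"  # assumed input, including the trailing line break

between_words = re.split(r"\w+", message)        # words act as delimiters
words = re.findall(r"\w+", message)              # words are the result
split_on_non_words = re.split(r"\W+", message.strip())

print(between_words)
print(words)
print(split_on_non_words)
```

Splitting on \W+ also returns the words, but it yields an empty first element when the string starts with a non-word character, which is why findall is usually the simpler choice here.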
I currently need to figure out how to use regex and have come to a point that I can't figure out:
the test strings that are the sources (They actually come from OCR'd PDFs):
string1 = 'Beleg-Nr.:12123-23131'; // no spaces after the colon
string2 = 'Beleg-Nr.: 12121-214331'; // a tab after the colon
string3 = 'Beleg-Nr.: 12-982831'; // a tab and spaces after the colon
I want to get the numbers explicitly. For that I use this pattern:
pattern = '/(?<=Beleg-Nr\.:[ \t]*)(.*)
This will get me the pure numbers for string1 and string2 but isn't working on string3 (it gives me additional whitespace before the number).
What am I missing here?
Edit: Thanks for all the helpful advice. The software that OCRs on the fly is able to suppress whitespace on its own in regexes. This did the trick. The resulting pattern is:
(?<=Beleg-Nr\.:[\s]*)(.*)
You can use the \s shorthand to match both spaces and tabs, so you do not need to combine them in a character class with [].
This works for me:
/(Beleg-Nr.:\s*)(.*)/
http://regexr.com?35rj6
The problem is that [ ]* will match only spaces. You need to use \s, which matches any whitespace character (in JavaScript, for example, \s covers [ \f\n\r\t\v\u00A0\u2028\u2029] and further Unicode space characters):
/(?<=Beleg-Nr.:\s*)(.*)/
Side note:
* is greedy by default, so \s* matches as many whitespace characters as possible; you do not need a negated [^\s] in your last () group.
Just replace the (.*) with a more restrictive pattern ([^ ]+$ for example). Also note that an unescaped . after Beleg-Nr matches any character, not just a literal dot.
The $ in my example matches the end of the line and thus ensures that all remaining characters are matched.
I'd suggest excluding tabs as well:
pattern = '/(?<=Beleg-Nr\.:[ \t]*)([^ \t]+)$
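For readers trying this in Python: the re module only accepts fixed-width look-behind, so a pattern like (?<=Beleg-Nr\.:\s*) is rejected there. A capturing group is a common substitute. The sketch below assumes the three sample strings from the question, with the tab and spaces written out explicitly.

```python
import re

# Hypothetical samples modelled on the question's three strings.
samples = [
    "Beleg-Nr.:12123-23131",      # nothing after the colon
    "Beleg-Nr.:\t12121-214331",   # a tab after the colon
    "Beleg-Nr.:\t   12-982831",   # a tab and spaces after the colon
]

# \s* consumes any spaces or tabs after the colon; group 1 holds the number.
pattern = re.compile(r"Beleg-Nr\.:\s*(\S+)")

numbers = [pattern.search(s).group(1) for s in samples]
print(numbers)
```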
I want to remove trailing white spaces and tabs from my code without removing empty lines.
I tried:
\s+$
and:
([^\n]*)\s+\r\n
But they all removed empty lines too. I guess \s matches end-of-line characters too.
UPDATE (2016):
Nowadays I automate such code cleaning by using Sublime's TrailingSpaces package, with custom/user setting:
"trailing_spaces_trim_on_save": true
It highlights trailing white spaces and automatically trims them on save.
Try just removing trailing spaces and tabs:
[ \t]+$
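A quick way to check the difference between the two patterns, sketched in Python with multiline mode. The sample text is made up for the illustration.

```python
import re

text = "a  \n\nb\t\n"  # line with trailing spaces, empty line, line with trailing tab

# [ \t]+$ only touches spaces and tabs, so the empty line survives.
trimmed = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)

# \s+$ can also swallow line breaks, so the empty line disappears.
collapsed = re.sub(r"\s+$", "", text, flags=re.MULTILINE)

print(repr(trimmed))
print(repr(collapsed))
```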
To remove trailing whitespace while also preserving whitespace-only lines, you want the regex to only remove trailing whitespace after non-whitespace characters. So you need to first check for a non-whitespace character. This means that the non-whitespace character will be included in the match, so you need to include it in the replacement.
Regex: ([^ \t\r\n])[ \t]+$
Replacement: \1 or $1, depending on the IDE
The platform is not specified, but in C# (.NET) it would be:
Regular expression (presumes the multiline option - the example below uses it):
[ \t]+(\r?$)
Replacement:
$1
For an explanation of "\r?$", see Regular Expression Options, Multiline Mode (MSDN).
Code example
This will remove all trailing spaces and all trailing TABs in all lines:
string inputText = " Hello, World! \r\n" +
" Some other line\r\n" +
" The last line ";
string cleanedUpText = Regex.Replace(inputText,
@"[ \t]+(\r?$)", @"$1",
RegexOptions.Multiline);
Regex to find trailing and leading whitespaces:
^[ \t]+|[ \t]+$
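A minimal Python sketch of this pattern applied per line. It assumes multiline mode so that ^ and $ anchor at every line; the sample text is invented.

```python
import re

text = "  a b  \n\tc\t"

# Remove leading and trailing spaces/tabs on every line; inner spaces stay.
stripped = re.sub(r"^[ \t]+|[ \t]+$", "", text, flags=re.MULTILINE)

print(repr(stripped))
```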
If using Visual Studio 2012 and later (which uses .NET regular expressions), you can remove trailing whitespace without removing blank lines by using the following regex:
Replace (?([^\r\n])\s)+(\r?\n)
With $1
Some explanation
The reason you need the rather complicated expression is that the character class \s matches spaces, tabs and newline characters, so \s+ will match a group of lines containing only whitespace. It doesn't help adding a $ termination to this regex, because this will still match a group of lines containing only whitespace and newline characters.
You may also want to know (as I did) exactly what the (?([^\r\n])\s) expression means. This is an Alternation Construct, which effectively means match to the whitespace character class if it is not a carriage return or linefeed.
Alternation constructs normally have a true and false part,
(?( expression ) yes | no )
but in this case the false part is not specified.
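Python's re module does not offer this expression-based alternation construct (its conditional only tests whether a group matched), so the sketch below swaps in a different technique with the same intent: the negated class [^\S\r\n], which reads as "whitespace, but not a carriage return or linefeed". The sample text is invented.

```python
import re

text = "a \t\r\n\r\nb  \n"

# [^\S\r\n]+ matches a run of whitespace other than CR/LF; group 1 keeps
# the line ending so that it can be put back unchanged.
cleaned = re.sub(r"[^\S\r\n]+(\r?\n)", r"\1", text)

print(repr(cleaned))
```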
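[ |\t]+$ with an empty replacement works, although inside a character class | is a literal pipe, so trailing pipes are removed too; [ \t]+$ avoids that.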
\s+($) with a $1 replace also works, at least in Visual Studio Code...
To remove trailing white space while ignoring empty lines I use positive look-behind:
(?<=\S)\s+$
The look-behind is the way to go to exclude the non-whitespace (\S) from the match.
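A hedged Python variant of the look-behind approach. Note one deliberate change: the sketch uses [ \t]+ instead of \s+, because in Python's multiline mode \s+ can run across line breaks and swallow a whitespace-only line. The sample text is invented.

```python
import re

text = "foo  \n   \nbar\t"  # second line contains only spaces

# Only strip spaces/tabs that directly follow a non-whitespace character,
# so the whitespace-only line is left exactly as it was.
cleaned = re.sub(r"(?<=\S)[ \t]+$", "", text, flags=re.MULTILINE)

print(repr(cleaned))
```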
To remove any blank trailing spaces use this:
\n|^\s+\n
I tested in the Atom and Xcode editors.
In Java:
String str = " hello world ";
// prints "hello world"
System.out.println(str.replaceAll("^(\\s+)|(\\s+)$", ""));
You can simply use it like this:
var regex = /( )/g;