How to get Vim to highlight non-ascii characters?

I'm trying to get Vim to highlight non-ASCII characters. Is there an available setting, regex search pattern, or plugin to do so?

Using a range in a [] character class in your search, you can exclude the ASCII hexadecimal character range and so highlight (assuming you have hlsearch enabled) every character that lies outside the ASCII range:
/[^\x00-\x7F]
This will do a negative match (via [^]) for characters between ASCII 0x00 and ASCII 0x7F (0-127), and appears to work in my simple test. For extended ASCII, of course, extend the range up to \xFF instead of \x7F using /[^\x00-\xFF].
You may also express it in decimal via \d:
/[^\d0-\d127]
If you need something more specific, like exclusion of non-printable characters, you will need to add those ranges into the character class [].
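For checking the same pattern outside Vim, here is a minimal Python sketch. It assumes Python's standard re module; the function name find_non_ascii and the sample text are made up for illustration. It uses the same character class as the search above and reports where each non-ASCII character sits.

```python
import re

# Same character class as the Vim search /[^\x00-\x7F]
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def find_non_ascii(text):
    """Return (line, column, character, code point) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for m in NON_ASCII.finditer(line):
            ch = m.group()
            # Columns are 1-based, like Vim's column numbers for single-width text
            hits.append((line_no, m.start() + 1, ch, f"U+{ord(ch):04X}"))
    return hits


# Hypothetical sample; \u00e9 is e-acute, \u00ef is i-diaeresis
sample = "plain line\ncaf\u00e9 au lait\nna\u00efve test"
for hit in find_non_ascii(sample):
    print(hit)
```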

Yes, there is a native feature to do highlighting for any matched strings.
Inside Vim, do:
:help highlight
:help syn-match
syn-match defines a pattern; any text that matches it falls into the named syntax group.
highlight defines the color used by the group.
Just think about syntax highlighting for your vimrc files.
So you can use below commands in your .vimrc file:
syntax match nonascii "[^\x00-\x7F]"
highlight nonascii guibg=Red ctermbg=2

For other (from now on less unlucky) folks who end up here via a search engine and can't get the highlighting of non-ASCII characters to work, try this (put it into your .vimrc):
highlight nonascii guibg=Red ctermbg=1 term=standout
au BufReadPost * syntax match nonascii "[^\u0000-\u007F]"
This has the added benefit of not colliding with regular (filetype [file extension] based) syntax definitions.

This regex works to highlight as well. It was the first google hit for "vim remove non-ascii characters" from briceolion.com and with :set hlsearch will highlight:
/[^[:alnum:][:punct:][:space:]]/

If you are also interested in the characters that are not printable in Latin-1 (code points above \xff), use this one: /[^\x00-\xff]/
I use it in a function:
function! NonPrintable()
  " Switch to UTF-8 so that characters above \xff can be matched
  setlocal enc=utf8
  if search('[^\x00-\xff]') != 0
    " Highlight every match with the Error group
    call matchadd('Error', '[^\x00-\xff]')
    echo 'Non printable characters in text'
  else
    setlocal enc=latin1
    echo 'All characters are printable'
  endif
endfunction

Based on the other answers on this topic and the answer I got here, I've added this to my .vimrc so that I can toggle the non-ASCII highlighting by typing <C-w>1. It also highlights inside comments, although you need to add the comment group for each file syntax you use. For example, to edit a zsh file, you need to add zshComment to the line
au BufReadPost * syntax match nonascii "[^\x00-\x7F]" containedin=cComment,vimLineComment,pythonComment
otherwise the non-ASCII characters won't show up there (you can also set containedin=ALL if you want to be sure to show non-ASCII characters in all groups). To find the name of the comment group for a different file type, open a file of that type, enter :sy in Vim, and look for the comment in the list of syntax items.
function HighlightNonAsciiOff()
  echom "Setting non-ascii highlight off"
  syn clear nonascii
  let g:is_non_ascii_on=0
  augroup HighlightUnicode
    autocmd!
  augroup end
endfunction
function HighlightNonAsciiOn()
  echom "Setting non-ascii highlight on"
  augroup HighlightUnicode
    autocmd!
    autocmd ColorScheme *
          \ syntax match nonascii "[^\x00-\x7F]" containedin=cComment,vimLineComment,pythonComment |
          \ highlight nonascii cterm=underline ctermfg=red ctermbg=none term=underline
  augroup end
  silent doautocmd HighlightUnicode ColorScheme
  let g:is_non_ascii_on=1
endfunction
function ToggleHighlightNonascii()
  if g:is_non_ascii_on == 1
    call HighlightNonAsciiOff()
  else
    call HighlightNonAsciiOn()
  endif
endfunction
silent! call HighlightNonAsciiOn()
nnoremap <C-w>1 :call ToggleHighlightNonascii()<CR>

Somehow none of the above answers worked for me, so I deleted the unwanted characters instead of highlighting them:
:1,$ s/[^0-9a-zA-Z,-_\.]//g
It keeps most of the characters I am interested in.
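One thing to watch in that character class: ,-_ is read as a range from the comma (0x2C) to the underscore (0x5F), so it keeps more than the three characters it appears to list. A rough Python sketch of the same class shows the effect (Python's re treats the range the same way; the function name strip_unlisted is only for illustration):

```python
import re

# Same class as in the substitution above; ",-_" is a range 0x2C..0x5F
UNLISTED = re.compile(r"[^0-9a-zA-Z,-_\.]")


def strip_unlisted(text):
    """Delete every character that the class does not list."""
    return UNLISTED.sub("", text)


# '<', '>' and '@' fall inside the ",-_" range, so they survive;
# the e-acute (\u00e9), the '!' and the space are removed.
print(strip_unlisted("a<b>@\u00e9! c"))
```

If only the comma, hyphen, and underscore are meant, putting the hyphen last in the class (for example [^0-9a-zA-Z,_.-]) avoids the accidental range.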

Someone has already answered the question. However, for others who are still having problems, here is another solution to highlight non-ASCII characters in comments (or in any syntax group, for that matter). It's not the best, but it's a temporary fix.
One may try:
:syntax match nonascii "[^\u0000-\u007F]" containedin=ALL contained |
\ highlight nonascii ctermfg=yellow guifg=yellow
This mixes parts of the other solutions. You may remove contained, but according to the documentation the item could then end up containing itself recursively (as I understand it). For how contained patterns are specified, see the syn-contains section of the help:
:help syn-containedin
:help syn-contains
Replicated issue from: Set item to higher highlight priority on vim

Related

Notepad++ Regex Replace Makeshift Footnotes format With Proper Markdown format

In Word, I had to convert my footnotes to lines appearing at the end of each file to be able to make changes in formatting. A macro I found online used braces, and I ended up also using highlighting so I can easily see where my footnotes used to be. As a result, the following strings appear twice in my documents, once in the main text and once at the end of each document, sort of like makeshift endnotes.
=={1}==
.
.
.
=={99}==
I want to be able to match those instances in the text and convert them to proper markdown now. The problem is that the in-text format
[^1], [^2], etc.
will be different from what needs to come at the bottom with a colon added:
[^1]:
etc.
So I'm guessing I'll have to live with replacing my old formatting with the new one including the colons, and deleting the colons individually while I edit/clean up my text in the future. Without adding the colon, it won't work.
My question is how to use a regex to match the one- or two-digit strings with braces and equals signs.
This
==(\{d{1,2}\})==
did not work.
Also, as I am no pro, I would need the replacement as well. It probably will be
[^($1)]:
I reckon. Apparently, the equals sign doesn't have to be escaped.
Current format:
...some text...makeshift footnote in the format of
=={one- or two-digit number with no spaces in between}==
For example,
=={1}==
=={23}==
etc.
Desired result for all occurrences recursively:
[^1]:
.
.
.
[^99]:
The markdown format is single square brackets with a caret and a number, plus a colon for the actual footnotes. Usually the number goes up to 42-45 maximum, but that doesn't matter; the two-digit regex is needed. As I said, the colon will be needed in all instances.
Cheers

You have just a few errors in your regex: you forgot to escape the d for digit (it should be \d), and the capture group must not include the curly braces.
Use:
Ctrl+H
Find what: =={(\d{1,2})}==
Replace with: [^$1]:
TICK Wrap around
SELECT Regular expression
Replace all
Explanation:
=={ # literally
(\d{1,2}) # group 1, 1 or 2 digits
}== # literally
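The same find/replace can be tried outside Notepad++. This is a minimal Python sketch, assuming the standard re module; the function name convert_footnotes is invented here. Python writes the backreference as \1 where Notepad++ uses $1, and the braces are escaped to keep them literal.

```python
import re

# =={ and }== literally, with one or two digits captured as group 1
FOOTNOTE = re.compile(r"==\{(\d{1,2})\}==")


def convert_footnotes(text):
    """Turn =={N}== markers into markdown footnote labels [^N]:"""
    return FOOTNOTE.sub(r"[^\1]:", text)


print(convert_footnotes("=={1}== First note\n=={23}== Another note"))
```

Markers with three or more digits are left untouched, because \d{1,2} must be followed directly by the closing brace.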

How do I remove all non-ASCII characters with regex and Notepad++?

I searched a lot, but nowhere is it written how to remove non-ASCII characters from Notepad++.
I need to know what command to write in find and replace (with picture it would be great).
I would like to make a whitelist and bookmark all the ASCII words/lines, so that the non-ASCII lines stay unmarked.
The file is quite large, so I can't select all the ASCII lines by hand; I just want to select the lines containing non-ASCII characters.

This expression will search for non-ASCII values:
[^\x00-\x7F]+
Select 'Search Mode = Regular expression', and click Find Next.
Source: Regex any ASCII character

In Notepad++, if you go to menu Search → Find characters in range → Non-ASCII Characters (128-255) you can then step through the document to each non-ASCII character.
Be sure to tick "Wrap around" if you want to loop through the whole document for all non-ASCII characters.

In addition to the answer by ProGM, in case you see characters in boxes like NUL or ACK and want to get rid of them, those are ASCII control characters (0 to 31), you can find them with the following expression and remove them:
[\x00-\x1F]+
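To remove all non-ASCII characters AND all ASCII control characters in one pass, remove everything outside the printable ASCII range 0x20 to 0x7E (0x7F, DEL, is a control character too). Note that this also removes tabs and line breaks:
[^\x20-\x7E]+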

To remove all non-ASCII characters, search for [^\x00-\x7F]+ in Regular expression mode and replace it with nothing.
To highlight characters, I recommend using the Mark function in the search window: it highlights the non-ASCII characters and puts a bookmark on each line containing one of them.
If you want to highlight and bookmark the ASCII characters instead, you can use the regex [\x00-\x7F] to do so.
Cheers
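For a large file, the "bookmark the lines" idea can also be approximated with a short script. This is only a sketch in Python, not a Notepad++ feature; the function name lines_with_non_ascii is an assumption. It returns the 1-based numbers of the lines that contain at least one character outside \x00-\x7F.

```python
import re

NON_ASCII = re.compile(r"[^\x00-\x7F]")


def lines_with_non_ascii(text):
    """Return the 1-based numbers of lines containing a non-ASCII character."""
    return [
        line_no
        for line_no, line in enumerate(text.splitlines(), start=1)
        if NON_ASCII.search(line)
    ]


# Hypothetical input: lines 2 and 4 contain accented letters
print(lines_with_non_ascii("abc\ncaf\u00e9\nxyz\nna\u00efve"))
```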

To keep new lines:
First select a character for new line... I used #.
Select replace option, extended.
input \n replace with #
Hit Replace All
Next:
Select Replace option Regular Expression.
Input this : [^\x20-\x7E]+
Keep Replace With Empty
Hit Replace All
Now, Select Replace option Extended and Replace # with \n
:) now, you have a clean ASCII file ;)
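If a script is an option, the placeholder round trip is not needed, because the line feed can simply be added to the set of characters to keep. A minimal Python sketch under that assumption (the function name to_printable_ascii is hypothetical):

```python
import re


def to_printable_ascii(text):
    """Keep printable ASCII (0x20-0x7E) and line feeds; drop everything else."""
    return re.sub(r"[^\x20-\x7E\n]+", "", text)


# The e-acute (\u00e9), the NUL byte and the carriage return are removed;
# the line feeds stay in place.
print(repr(to_printable_ascii("caf\u00e9\nna\x00me\r\n")))
```

Carriage returns are dropped here, so a file with Windows line endings comes out with plain \n endings; add \r to the class if they should be kept.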

Another good trick is to go into UTF8 mode in your editor so that you can actually see these funny characters and delete them yourself.

Another way...
Install the Text FX plugin if you don't have it already
Go to the TextFX menu option -> zap all non printable characters to #. It will replace all invalid chars with 3 # symbols
Go to Find/Replace and look for ###. Replace it with a space.
This is nice if you can't remember the regex or don't care to look it up. But the regex mentioned by others is a nice solution as well.

Click on View/Show Symbol/Show All Character - to show the [SOH] characters in the file
Click on the [SOH] symbol in the file
Ctrl+H to bring up the replace dialog
Leave the 'Find What:' as is
Change the 'Replace with:' to the character of your choosing (comma,semicolon, other...)
Click 'Replace All'
Done and done!

In addition to Steffen Winkler:
[\x00-\x08\x0B-\x0C\x0E-\x1F]+
Ignores \r \n AND \t (carriage return, linefeed, tab)
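The class above works unchanged in Python's re module, which makes it easy to check that tabs and line breaks really survive. A small sketch (the function name strip_control is invented for this example):

```python
import re

# Control characters 0x00-0x1F, skipping tab (0x09), LF (0x0A) and CR (0x0D)
CONTROL = re.compile(r"[\x00-\x08\x0B-\x0C\x0E-\x1F]+")


def strip_control(text):
    """Remove ASCII control characters but keep tab, line feed, carriage return."""
    return CONTROL.sub("", text)


print(repr(strip_control("a\x00b\tc\r\nd\x1f")))
```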

Regex -- replace all spaces before a particular character

The goal is to make something like
This is some text=This is some text
become:
This\ is\ some\ text=This is some text
I've been playing with variations of things I know will grab spaces/whitespace (like "\ " or \s) in front of (?==), which seems to select up to the = character, but nothing seems to be working in IntelliJ IDEA's search and replace.
Any suggestions?

Copying the answer from the comments in order to remove this question from the "Unanswered" filter:
This - (\s)(?=.*=) should work. Replace it with \$1
~ answer per Rohit Jain
This was additionally confirmed by the OP:
That worked, though I used a literal space instead of the \s because it was picking up some additional white space I didn't want replaced. Also had to do some silly escaping for the replace (\\$1 )
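The lookahead can be tried in isolation. Here is a minimal Python sketch using a literal space, as the OP ended up doing; the function name escape_key_spaces is made up, and Python needs the backslash doubled in the replacement string, much like the extra escaping mentioned above.

```python
import re


def escape_key_spaces(line):
    """Prefix a backslash to every space that still has an '=' after it."""
    # (?=.*=) succeeds only if an '=' appears later on the same line
    return re.sub(r" (?=.*=)", r"\\ ", line)


print(escape_key_spaces("This is some text=This is some text"))
```

The lookahead looks for any later =, so if the value part also contains an = sign, the spaces before that last = get escaped as well.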

Replace all instances of character between tags with vim

I need to replace all instances of the / character with \ between <filename>...</filename> tags.
The file has about 2,000 of those tags and I only need to replace the / character inside those tags.
How can I do this?

Edit: Given the new information, the below substitution would probably work:
:%s/<filename>\zs.\{-}\ze<\/filename>/\=substitute(submatch(0), '\/', '\', 'g')/
Explanation:
%s: substitute across the entire file
/<filename>: start of pattern and static text to match against
\zs: start of the matched text
.\{-}: any character, non greedy
\ze: end of matched text
<\/filename>/: end of targeted tag and pattern
\=: evaluate the replacement as a vim expression
substitute(submatch(0), '\/', '\', 'g')/: replace all /'s with \ in the matched text.
Original answer:
I'm going to assume you mean XML-style tags here. What I would do is visually select the area you'd like to operate on, then use the \%V regex atom to only operate on that selection.
vit:s!\%V/!\\!g
Should do the trick. Note that when pressing :, vim will automatically add a range for the visual selection, the actual substitution command will look like:
:'<,'>s!\%V/!\\!g
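The idea behind the \= substitution (match the tag contents, then run a second replacement only on that match) can be sketched in Python as well. This is an illustration outside Vim; the function name fix_slashes is hypothetical, and the tag name is taken from the question.

```python
import re

# Group 1: opening tag, group 2: contents (non-greedy), group 3: closing tag
TAG = re.compile(r"(<filename>)(.*?)(</filename>)", re.DOTALL)


def fix_slashes(text):
    """Replace / with \\ only inside <filename>...</filename>."""
    return TAG.sub(
        lambda m: m.group(1) + m.group(2).replace("/", "\\") + m.group(3),
        text,
    )


print(fix_slashes("<filename>a/b/c.txt</filename> see http://x/y"))
```

Slashes outside the tags, such as the ones in the URL and in the closing tag itself, are left alone.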

Iff we can assume that the tags are on single lines, it is simply:
Note Enter ^M as C-vC-m (C-qC-m on windows)
:g/<filename>/norm! /filename>/e^Mvity:let @"=substitute(@", '/', '\\', "g")^Mgvp
Hmmm integrating the hint by Randy on using \%V in a pattern makes it simpler:
:g/<filename>/norm! /filename>/e^Mvit:s#\%V/#\\#g^M
I tested both. Whoo. I'll explain now. Hold on.
:g/<filename>/ - for each line containing <filename>
norm! - execute normal-mode commands (ignoring mappings)
/filename>/e Enter - jump to the end of the opening tag
vit - select the inner text of that tag in visual mode
:s#\%V/#\\#g Enter - on that visual selection, perform the substitution (replace / by \)

Vim has a steep learning curve, as do regexes. I believe this command will do it; you have to escape the / and the \ with a backslash. Note that it replaces every / in the file, not only those between the tags.
:%s/\//\\/g

Regex - Multiline Problem

I think I'm burnt out, and that's why I can't see an obvious mistake. Anyway, I want the following regex:
#BIZ[.\s]*#ENDBIZ
to grab me the #BIZ tag, #ENDBIZ tag and all the text in between the tags. For example, if given some text, I want the expression to match:
#BIZ
some text some test
more text
maybe some code
#ENDBIZ
At the moment, the regex matches nothing. What did I do wrong?
ADDITIONAL DETAILS
I'm doing the following in PHP
preg_replace('/#BIZ[.\s]*#ENDBIZ/', 'my new text', $strMultiplelines);

The dot loses its special meaning inside a character class — in other words, [.\s] means "match period or whitespace". I believe what you want is [\s\S], "match whitespace or non-whitespace".
preg_replace('/#BIZ[\s\S]*#ENDBIZ/', 'my new text', $strMultiplelines);
Edit: A bit about the dot and character classes:
By default, the dot does not match newlines. Most (all?) regex implementations have a way to specify that it match newlines as well, but it differs by implementation. The only way to match (really) any character in a compatible way is to pair a shorthand class with its negation — [\s\S], [\w\W], or [\d\D]. In my personal experience, the first seems to be most common, probably because this is used when you need to match newlines, and including \s makes it clear that you're doing so.
Also, the dot isn't the only special character which loses its meaning in character classes. In fact, the only characters which are special in character classes are ^, -, \, and ]. Check out the "Metacharacters Inside Character Classes" section of the character classes page on Regular-Expressions.info.
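The difference is easy to reproduce. The following sketch uses Python's re module rather than PHP's preg functions, so the flag is called re.DOTALL here; the sample text is invented.

```python
import re

text = "#BIZ\nsome text. more text\n#ENDBIZ"

# [.\s] only matches a literal period or whitespace, so the letters break the match
broken = re.search(r"#BIZ[.\s]*#ENDBIZ", text)

# [\s\S] matches any character, newlines included
fixed = re.search(r"#BIZ[\s\S]*#ENDBIZ", text)

# A plain dot stops at newlines unless DOTALL is set
plain_dot = re.search(r"#BIZ.*#ENDBIZ", text)
dotall = re.search(r"#BIZ.*#ENDBIZ", text, re.DOTALL)

print(broken, plain_dot)                  # both are None
print(fixed.group() == dotall.group())    # True
```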

// Replaces all of your code with "my new text", but I do not think
// this is actually what you want based on your description.
preg_replace('/#BIZ(.+?)#ENDBIZ/s', 'my new text', $contents);
// Actually "gets" the text, which is what I think you might be looking for.
preg_match('/(#BIZ)(.+?)(#ENDBIZ)/s', $contents, $matches);
list($dummy, $startTag, $data, $endTag) = $matches;

This should work
#BIZ[\s\S]*#ENDBIZ
You can try this online Regular Expression Testing Tool

The mistake is the character group [.\s] that will match a dot (not any character) or white space. You probably tried to get .* with . matching newline characters, too. You achieve this by enabling the single line option ((?s:) does this in .NET regex).
(?s:#BIZ.*?#ENDBIZ)

Depending on the environment you're using your regex in, it may need special care to properly parse multiline text, eg re.DOTALL in Python. So what environment is that?

you can use
preg_replace('/#BIZ.*?#ENDBIZ/s', 'my new text', $strMultiplelines);
the 's' modifier says "match the dot with anything, even the newline character". the '?' says don't be greedy, such as for the case of:
foo
#BIZ
some text some test
more text
maybe some code
#ENDBIZ
bar
#BIZ
some text some test
more text
maybe some code
#ENDBIZ
hello world
the non-greediness won't get rid of the "bar" in the middle.
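To see the greedy and non-greedy behaviour side by side, here is a short sketch in Python (re.DOTALL plays the role of the s modifier; the shortened sample text is an assumption):

```python
import re

doc = "foo\n#BIZ\nfirst\n#ENDBIZ\nbar\n#BIZ\nsecond\n#ENDBIZ\nhello world"

# Greedy: .* runs to the LAST #ENDBIZ, swallowing "bar" along the way
greedy = re.sub(r"#BIZ.*#ENDBIZ", "my new text", doc, flags=re.DOTALL)

# Non-greedy: .*? stops at the FIRST #ENDBIZ, so each block is replaced separately
lazy = re.sub(r"#BIZ.*?#ENDBIZ", "my new text", doc, flags=re.DOTALL)

print(greedy)
print(lazy)
```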

Unless I am missing something, you handle this the same way that you would in Perl, with either the /m or /s modifier at the end? Oddly enough the other answers that rather correctly pointed this out got down voted?!

If you are working with a JavaScript regex, note that the m flag enables multiline mode, which only makes ^ and $ match at line boundaries; it does not make the dot match newlines:
var re = /^deal$/mg
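The two flags are easy to confuse, so here is a small sketch of what each one changes. It uses Python's re.MULTILINE and re.DOTALL as stand-ins for the m and s modifiers; the sample strings are made up.

```python
import re

lines = "no deal\ndeal\ndealer"

# Without MULTILINE, ^ and $ anchor only at the start and end of the whole string
whole = re.findall(r"^deal$", lines)
# With MULTILINE, they also anchor at every line break
per_line = re.findall(r"^deal$", lines, re.MULTILINE)

block = "#BIZ\nsome text\n#ENDBIZ"
# MULTILINE does not let the dot cross newlines; DOTALL does
with_m = re.search(r"#BIZ.*#ENDBIZ", block, re.MULTILINE)
with_s = re.search(r"#BIZ.*#ENDBIZ", block, re.DOTALL)

print(whole, per_line)
print(with_m, with_s is not None)
```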
