I need a simple solution. I have a text that is improperly punctuated, and in many places a comma is followed by a capital letter. Example: Here you are, You sicko. A comma followed by a cap. Is there a regex to find these? ,\w doesn't work because it matches any word character; I only want caps.
I only know basic regex. I'll use it to search in Notepad++.
Thank you.
Try this one (tick "Match case" in the Notepad++ Find dialog; otherwise [A-Z] also matches lowercase letters):
, [A-Z]
In the general case, for any punctuation:
[.,!?\-]+ [A-Z]+
Link: https://regex101.com/r/BrGZmF/1
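To try the two patterns outside Notepad++, here is a minimal Python sketch on a made-up sample sentence. Python's re module is case-sensitive by default, so [A-Z] only matches capitals there.

```python
import re

text = "Here you are, You sicko. This one is fine, it stays."

# ", [A-Z]": a comma, one space, then one uppercase ASCII letter.
matches = re.findall(r", [A-Z]", text)
print(matches)  # [', Y']

# General case: one or more of . , ! ? - (hyphen escaped inside the class),
# then a space, then uppercase letters.
general = re.findall(r"[.,!?\-]+ [A-Z]+", text)
print(general)  # [', Y', '. T']
```

Note that ", it" is not reported, because the letter after the comma is lowercase.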
I have a text which contains some diacritic characters. I want to find and remove only those characters (plus some others such as %^*$#) using a regex, but I haven't managed to.
The closest I have gotten was using Unicode, but it matched far more than just the diacritic characters. Can someone help with this? It seems simple, but I've already wasted a few hours on it...
Thanks!
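No answer to this question survives in the thread. One possible approach, offered only as a sketch and not as a pure regex solution: decompose each character with Unicode NFD and treat it as a "diacritic character" if the decomposition contains a combining mark. The helper names are hypothetical, and letters such as ø or ł, which have no canonical decomposition, are not caught.

```python
import unicodedata

# Extra symbols the question also wants removed.
EXTRA_SYMBOLS = set("%^*$#")


def has_diacritic(ch: str) -> bool:
    """True if ch's canonical decomposition (NFD) contains a combining mark,
    e.g. 'é' -> 'e' + U+0301. A bare combining mark also counts."""
    return any(unicodedata.combining(c) for c in unicodedata.normalize("NFD", ch))


def remove_diacritic_chars(text: str) -> str:
    # Drop every character that carries a diacritic, plus the extra symbols.
    return "".join(
        ch for ch in text if not has_diacritic(ch) and ch not in EXTRA_SYMBOLS
    )


print(remove_diacritic_chars("café 50% naïve$"))  # caf 50 nave
```

If the goal is instead to keep the base letter (é becoming e), the usual variant is to NFD-normalize the whole string and drop only the combining marks.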
I have a large text file, originally generated in Microsoft Word, that contains these four character sequences, alongside regular text:
?~#~\
?~#~]
?~#~X
?~#~Y
From the content of what is written in the file, it appears that the sequences respectively correspond to open double quotes, close double quotes, open single quote, and close single quote. When displayed in Vim, everything in the sequences other than the question mark appears in blue.
I cannot remove them with a command such as
:.,$s/?~#~Y//
This command results in the following error from Vim:
E33: No previous substitute regular expression
E476: Invalid command
Press ENTER or type command to continue
These commands also produce errors:
:.,$s/\?~#~Y//
:.,$s/\?\~\#\~Y//
Specifically,
E866: (NFA regexp) Misplaced ?
E476: Invalid command
Press ENTER or type command to continue
What would be the correct way to automatically remove or replace the sequences? Ideally, I'd like to remove the double quotes, and replace the open/close single quotes with a traditional single quote or apostrophe.
Since "everything in the sequences other than the question mark appears in blue", all characters except the question mark are probably binary characters. I'd suggest this approach:
go to the first sequence and yank it: press v to start marking, extend the mark to the end of the sequence, then press y
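paste the sequence as the search pattern from the unnamed register: :%s/Ctrl-r"//g, then press Enter
repeat for the remaining sequences (for the single-quote sequences, put ' between the last two slashes instead of leaving the replacement empty).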
If you’re using a Unicode-compatible encoding (such as UTF-8) and your font supports it, the smart quotes will show properly.
Additionally, the digraphs for them are '6, "6, '9, and "9. This makes it pretty easy to chain a couple of substitutes to swap them for straight variants (type <C-k> by pressing Ctrl-K on the command line):
:%s/<C-k>'6\|<C-k>'9/'/g
Do the same with "6 and "9 for the double quotes (leave the replacement empty to remove them). Wrap it in a function or command to make it easier to reuse later.
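Outside Vim, the same clean-up the question asks for (drop the curly double quotes, straighten the curly single ones) can be sketched in Python, assuming the file has already been decoded as UTF-8 text. The characters involved are U+2018/U+2019 (single) and U+201C/U+201D (double); fix_quotes is a hypothetical helper name.

```python
# Curly quotation marks by code point.
LEFT_SINGLE, RIGHT_SINGLE = "\u2018", "\u2019"
LEFT_DOUBLE, RIGHT_DOUBLE = "\u201c", "\u201d"

# str.maketrans: map curly single quotes to "'" and delete (None) curly double quotes.
QUOTE_TABLE = str.maketrans(
    {LEFT_SINGLE: "'", RIGHT_SINGLE: "'", LEFT_DOUBLE: None, RIGHT_DOUBLE: None}
)


def fix_quotes(text: str) -> str:
    return text.translate(QUOTE_TABLE)


print(fix_quotes("\u201cDon\u2019t,\u201d she said."))  # Don't, she said.
```

A single translate() pass handles all four characters at once, which mirrors chaining the substitutes in Vim.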
Sorry to bump an old thread, but I stumbled upon this late at night while trying to figure out how to remove the exact same characters from a bind9 configuration file that I had pasted in from a website. The aberrant characters were "~#~X", "~#~Y", " | ", and I believe another, but I can't remember it at the moment. Anyway, I couldn't find and replace them with the regex methods above, but I was able to find a solution.
If you can set VIM to show the special characters in their binary representation, then you can use a regex to find them. Here's how I did it:
Steps to fix
Open the file with the problem characters in VIM
(a) original method - :set encoding=latin1|set isprint=|set display+=uhex
(b) easier method - :set encoding=utf-8
NOTE: either of these should display the digraph characters in their binary form (e.g. <80>, <99>, ...)
Then search and replace with VIM regex like so:
:%s:\%xNN:':g
(replace NN with the byte code, e.g. 80, 99, etc.)
Let's break that command down, shall we:
%s: - the substitute command: % is the range (every line in the file) and 's' stands for substitute. The ':' (colon) has been used as the delimiter in this case, but you can use other symbols (such as /) to delimit the command.
\%x - the VIM regex atom that matches a character by its hexadecimal code (the two hex digits shown between the brackets, e.g. <80>)
NN - replace with the two hex digits inside the <> that you're looking to replace in your file. In my case, the byte codes were <e2>, <80>, <99>, which I had to search for separately.
:' - the colon delimiting the replacement part, where I specify a single quote to replace the byte code; you could put whatever text you want here.
:g - finally, the last colon delimiter and the 'g' flag, which replaces every occurrence on each line rather than only the first (the % range is what makes the command cover the whole file).
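The same byte-level idea can be sketched in Python on raw bytes. The assumption here is that the file holds UTF-8, where each curly quote is a three-byte sequence such as <e2><80><99>. Replacing the whole sequence at once turns one curly quote into exactly one straight quote, whereas replacing each byte separately with ' would leave three. The mapping, function name, and sample line are illustrative, not taken from the answer.

```python
# UTF-8 byte sequences of the curly quotes, as VIM shows them in binary
# form (e.g. <e2><80><99> for U+2019).
REPLACEMENTS = {
    b"\xe2\x80\x98": b"'",  # U+2018 left single quotation mark
    b"\xe2\x80\x99": b"'",  # U+2019 right single quotation mark
    b"\xe2\x80\x9c": b'"',  # U+201C left double quotation mark
    b"\xe2\x80\x9d": b'"',  # U+201D right double quotation mark
}


def replace_quote_bytes(data: bytes) -> bytes:
    # Replace each whole three-byte sequence, so one curly quote
    # becomes exactly one straight quote.
    for seq, repl in REPLACEMENTS.items():
        data = data.replace(seq, repl)
    return data


# Made-up bind9-style line containing curly quotes.
sample = "allow-query { \u201cany\u201d; }; // it\u2019s".encode("utf-8")
print(replace_quote_bytes(sample))
```

To apply it to a file, read it in binary mode (open(path, "rb")), pass the bytes through replace_quote_bytes, and write the result back in "wb" mode; path is a placeholder.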
You can do more research in VIM's help with:
:help isprint
Anyway, I hope this helps someone else in the future.
References:
https://blog-en.openalfa.com/how-to-edit-non-printing-and-unicode-characters-in-vim-editor
https://unix.stackexchange.com/questions/108020/can-vim-display-ascii-characters-only-and-treat-other-bytes-as-binary-data
VIM How do I search for a <XX> single byte representation
I am trying to get Beyond Compare to ignore the difference between a single "-" and a double "--" hyphen. I am comparing two PDF documents where the original has a single hyphen and the edited copy replaced all the single hyphens with doubles. I want Beyond Compare to ignore that difference.
I have searched and found examples where people ignore runs of characters or digits (below), but what if the hyphens are preceded by a space?
[0-9,-]+
You could use the replacement setting (Session > Session Settings).
You don't need a regex for this; the replacement is done on only one side.
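Conceptually, a one-sided replacement normalizes one file's text before the comparison. The Python sketch below mimics that idea on made-up lines by turning "--" into "-" on the edited side only; it is an illustration of the concept, not how Beyond Compare implements the setting, and normalize is a hypothetical name.

```python
def normalize(line: str) -> str:
    # Hypothetical one-sided replacement: a double hyphen counts as a single one.
    return line.replace("--", "-")


original = ["a well-known fact", "see page 3"]
edited = ["a well--known fact", "see page 4"]

# Apply the replacement to the edited side only, then keep the real differences.
differences = [
    (left, right)
    for left, right in zip(original, edited)
    if left != normalize(right)
]
print(differences)  # [('see page 3', 'see page 4')]
```

Because only the edited side is normalized, the hyphen change disappears while the genuine edit (page 3 versus page 4) is still reported.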
I am trying to match all Latin characters in UTF-16 encoded text. I have been using [A-Za-z], which has been working great. As I've been parsing Chinese and Japanese text, I've been coming across bizarre versions of A-Z that the regex isn't picking up.
https://gist.github.com/kyleect/1c66fd388d362653969d
On the left are the characters I can't identify; on the right are the ones typed on my keyboard. I copied and pasted them into Chrome's find-in-page input, Google Search, and the find input in my text editor. All agree: Left == Right but Right != Left.
What are these characters, and how do I target them in a regex?
You can take a look at their character codes in your browser’s console:
> 'Ｂ'.charCodeAt(0).toString(16)
ff22
It’s a fullwidth letter! You can probably match the whole set with [\uff21-\uff3a] in a decent regex engine (add \uff41-\uff5a for the lowercase letters). Or [Ａ-Ｚ] in an even more decent one.
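A Python sketch of two options, assuming Python 3's re module (which understands \uXXXX escapes in patterns): extend the character class with the fullwidth ranges, or fold fullwidth letters to ASCII with NFKC normalization first. The sample string is made up.

```python
import re
import unicodedata

# ASCII letters plus fullwidth A-Z (U+FF21-U+FF3A) and a-z (U+FF41-U+FF5A).
LATIN = re.compile(r"[A-Za-z\uff21-\uff3a\uff41-\uff5a]+")

text = "\uff21\uff22\uff23 abc \u6f22\u5b57"  # fullwidth ABC, ASCII abc, two CJK characters

print(LATIN.findall(text))  # ['ＡＢＣ', 'abc']

# Python counterpart of the console check: hex code point of fullwidth B.
print(format(ord("\uff22"), "x"))  # ff22

# Alternative: NFKC folds fullwidth letters to ASCII, so [A-Za-z] is enough afterwards.
folded = unicodedata.normalize("NFKC", text)
print(re.findall(r"[A-Za-z]+", folded))  # ['ABC', 'abc']
```

The first option keeps the original characters in the matches; the second is simpler if you are free to normalize the text before searching it.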