vim how to remove encoding special signs - regex

I have a document in vim which contains encoding-related characters I want to get rid of (i.e. replace with "").
I have trouble describing their origin, so here are examples of how they are displayed in different editors (my desired tool for removing them is vim).
in vim:
Oś<9c>więcim (<9c> is a part I would like to get rid of)
in Geany:
(but copy-paste copies without this 'square' sign)
in LibreOffice Calc:
Please note there are other Polish-language-specific characters in my text which are displayed correctly.
Q: how to regex it out in vim?

You can enter the <9c> via :help i_CTRL-V_digit by pressing Ctrl + V (on Windows, often Ctrl + Q instead), followed by X and the hexadecimal number:
:%s/<C-V>x9c//g
Alternatively, the special \%x9c regular expression atom matches that value:
:%s/\%x9c//g
Alternatively, you could also just yank the character when the cursor is on it via yl, and then paste in the :s command-line via <C-R>".
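For comparison outside Vim, the same cleanup can be sketched in Python. This is only an illustration: it assumes the text has already been decoded so that the stray byte shows up as the C1 control character U+009C (which Vim typically displays as <9c>), and the helper name strip_c1_controls is made up for this example.

```python
import re

# C1 control characters occupy U+0080..U+009F.
C1_CONTROLS = re.compile(r"[\x80-\x9f]")


def strip_c1_controls(text: str) -> str:
    """Remove C1 control characters; ordinary letters such as ś and ę are untouched."""
    return C1_CONTROLS.sub("", text)


print(strip_c1_controls("O\u015b\x9cwi\u0119cim"))
```

Restricting the character class to the C1 range is what keeps the legitimate Polish letters intact, since they live well above U+009F.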

Related

How would I copy and paste selected text using Regular Expressions and the Replace dialog in Notepad ++?

Delving straight into the problem: all I'm trying to do here is duplicate a line and add a bracket at the end using Regular Expressions, automating the process through the Replace dialog in Notepad ++.
My issue visualized:
In the representation underneath, I have a bunch of instances of "["Mesh"]" that all have different path values assigned to them. All I want to do is duplicate the path entry and add a bracket at the end, before the comma, in the duplicated one.
What I have right now:
...
["Mesh"] = Platform(
"models/ships/japan/Zuikaku.mmod",
...
What I'm trying to achieve:
...
["Mesh"] = Platform(
"models/ships/japan/Zuikaku.mmod",
"models/ships/japan/Zuikaku.mmod"),
...
Without getting too specific, since there are ~500 of these instances across the file I'm modifying, I do not want to go through each one, pressing CTRL + D to duplicate the line and then adding the bracket, as that would take ages.
I have some limited experience with Regular Expressions from previous uses, but very limited. I know I can select the entire line in the Search dialog using ".*" but that's as far as I've gotten.
Thank you in advance for your time!
You should be able to use this regex (disable . matches newline). I am using (\R+) to capture end-of-line characters (and reproduce them in the output) so that it will work on systems that use other than just newline to end lines.
(\["Mesh"\]\s*=\s*.*(\R+))(.*),$
Replace with
$1$3,$2$3\),
For the input of
...
["Mesh"] = Platform(
"models/ships/japan/Zuikaku.mmod",
...
This gives
...
["Mesh"] = Platform(
"models/ships/japan/Zuikaku.mmod",
"models/ships/japan/Zuikaku.mmod"),
...
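The same capture-and-duplicate idea can be tried outside Notepad++. What follows is a rough Python sketch, not the Notepad++ behaviour itself: Python's re module has no \R, so this version assumes plain \n line endings, and duplicate_mesh_path is a hypothetical helper name.

```python
import re

# Group 1: the ["Mesh"] = ... line including its newline (group 2).
# Group 3: the path line without its trailing comma.
MESH = re.compile(r'(\["Mesh"\]\s*=\s*.*(\n))(.*),$', re.MULTILINE)


def duplicate_mesh_path(text: str) -> str:
    # Re-emit the original path line, then a copy that ends in "),".
    return MESH.sub(r"\1\3,\2\3),", text)


sample = '["Mesh"] = Platform(\n"models/ships/japan/Zuikaku.mmod",\n'
print(duplicate_mesh_path(sample))
```

Capturing the newline as its own group means the replacement reuses whatever separated the two lines instead of hard-coding it.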

Removed hyphen from word_separators, ctrl+d no longer makes sense

I removed the - from the word_separators setting, and that works fine.
But ctrl + d on the word a still matches the "a" in a-b, I don't want it to do that anymore.
It's because ctrl + d wraps your search with regex boundaries \b, and - is still considered a boundary.
Is there anything I can do to now make ctrl + d not consider - a boundary anymore.
The "a" in a-b should not be highlighted, as a-b is a single variable name in this language, which is why I removed the - from word_separators
More clarification: If I'm trying to replace all instances of the variable a, I don't want it matching against parts of other variables, like the "a" in a-b.
From what I can tell from some informal experimenting while answering your other question, the "word_separators" setting seems to be primarily relevant when double-clicking to select words. For example, I have the following words in a file:
and my word_separators list is ./\\()\"'-:,;<>~!@#%^&*|+=[]{}`~?$, so it includes - and / but not _. If I put my cursor in the first foo (without selecting the whole word first) and hit CtrlD, I get
and if I continue hitting CtrlD for several more times, I get
so only the "individual words" are selected - foo_bar is not, nor is foobar. However, if I set word_separators to .\\()\"':,;<>~!@#%^&*|+=[]{}`~?$ (removing - and /) I get the same results when hitting CtrlD repeatedly:
- and / are still treated as word separators, even though I removed them from the list. If I add _ to the word_separators list, the results are the same, and only one obvious conclusion can be drawn: word_separators is ignored by CtrlD (find_under_expand).
However, the word_separators list IS used when double-clicking to select a word. With the list like this: .\\()\"'_:,;<>~!@#%^&*|+=[]{}`~?$ (missing - and /, but with _), double-clicking on foo in each word in turn gives the following:
Interestingly, double-clicking on the very first foo gives
indicating that the "box" highlighting of similar selections is not paying attention to word_separators.
When using Find -> Find... to search, word_separators is ignored. When nothing is selected and foo is entered into the search box (non-regex search), the following matches are highlighted:
This is the same regardless of whether -, /, and/or _ are in word_separators or not.
If "Whole Word" is set in the options, the results are a bit different, but again they don't change regardless of whether -, /, and/or _ are in word_separators:
TL;DR
So, the conclusion is that word_separators is only in effect when double-clicking to select a word. Using a Find dialog or CtrlD (find_under_expand command) relies on some internal separator list, which apparently can't be altered (see my answer here).
A little bit more
Some info I forgot to add earlier: word_separators is also used by some plugins for various sorts of things, such as creating/modifying/otherwise working with selections, doing programmable completions, find and replace, and other sorts of stuff.
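The boundary behaviour described in the question can be reproduced outside Sublime Text. The sketch below uses Python's re module and takes the question's statement at face value, namely that the selection is wrapped in \b. It does not change what CtrlD does; it only shows why - still counts as a boundary and what a hand-written pattern that treats - as a word character could look like.

```python
import re

text = "a = a-b + a"

# \b sees "-" as a non-word character, so the "a" inside "a-b" matches too.
plain = [m.start() for m in re.finditer(r"\ba\b", text)]

# Treat "-" as part of a word: reject a word character or "-" on either side.
custom = [m.start() for m in re.finditer(r"(?<![\w-])a(?![\w-])", text)]

print(plain)   # [0, 4, 10]
print(custom)  # [0, 10]
```

A pattern like the second one could be typed into a regex-enabled Find panel by hand, which sidesteps the fixed separator list at the cost of not using CtrlD.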

EditPad: Need a regex that handles multiple possible data formats

First, I'm using EditPadPro for my regex cleaning, so any answers given should work within that environment.
I get a large spreadsheet full of data that I have to clean every day. I've managed to get it down to a couple of different regexes that I run, and this works... but I'm curious to see if it's possible to reduce down to a single regex.
Here is some sample data:
3-CPC_114851_70095_70095_CAN-bre
3-CPC_114851_70095_70095_CAN
b11-ao1-113775-bre
b7-ao-114441
b7-ao-114441-bre
b7-ao1-114441
b7-ao1-114441-bre
http://go.nlvid.com/results1/?http://bo
go.nlv/results1/?click
b4-sm-1359
b6-sm-1356-bre
1359_195_1453814569-bre
1356_104_1456856729
b15-rad-8905
b15-rad-8905-bre
Here is how the above data needs to end up:
114851-bre
114851
113775-bre
114441
114441-bre
114441
114441-bre
http://go.nlvid.com/results1/
go.nlv/results1/
sm-1359
sm-1356-bre
sm-1359-bre
sm-1356
rad-8905
rad-8905-bre
So, there are numerous rules, such as:
In cases of more than 2 underscores, the result needs to contain only the value immediately after the first underscore, and everything from the dash onwards.
In cases where the string contains "-ao-", "-ao1-", everything prior to the final numeric string should be removed.
If a question mark is present, everything from the mark onwards should be removed.
If the string contains "-sm-" or "-rad-", everything prior to those alpha strings should be removed.
If the string contains 2 underscores, everything after the first numeric string up to a dash (if present) should be removed, and the string "sm-" should be prepended.
Additionally there is other data that must be left untouched, including but not limited to:
113535|24905|24905
as well as many variations on this pattern of xxxxxx|yyyyy|zzzzz (and not always those string lengths)
This may be asking way too much of regex, I'm not sure as I'm not great with it. But I've seen some pretty impressive things done with it, so I thought I'd put this out to the community and see what you come back with.
Jonathan, I can wrap all of those into one regex, except the last one (where you prepend sm- to a string that does not contain sm). It is not possible in this context, because we cannot capture "sm" to reuse in the replacement, and because there is no "conditional replacement" syntax in EPP.
That being said, you can achieve what you want in EPP with two regexes and one macro to chain the two.
Here is how.
The solution below is tested in EPP.
Regex 1
Press Ctrl + Shift + F to enter Search / Replace mode
Enter the following Search and Replace in the appropriate boxes
At the top right of the Search bar, click the Favorite Searches pull-down, select "Add", give it a name, e.g. Regex 1
Search:
(?mx)^
(?=(?:[^_\r\n]*?_){3})[^_\r\n]+?_([^_\r\n]+)[^-\r\n]+(-[^\r\n]+)?
|
[^\r\n]*?-ao1?-\D*([^\r\n]+)
|
([^\r\n?]*)(?=\?)[^\r\n]+
|
[^\r\n]*?-((?:sm|rad)-[^\r\n]+)
Replace:
\1\2\3\4\5
Regex 2
Same 1-2-3 steps as above.
Search
^(?!(?:[^_\r\n]*?_){3})(?=(?:[^_\r\n]*?_){2})(\d+)(?:[^-\r\n]+(-[^\r\n]+)?)
Replace
sm-\1\2
Chaining Regex 1 and Regex 2
Top menu: Macros, Record Macro, give it a name.
Click the Favorite searches pulldown, select Regex 1
Hit Replace All.
Click the Favorite searches pulldown, select Regex 2
Hit Replace All.
Macros, Stop recording.
Whenever you want to do your sequence of replacements, pull it by name under the Macros menu.
Testing This
I have tested my "Jonathan macro" on your input. Here is the result:
114851-bre
114851
113775-bre
114441
114441-bre
114441
114441-bre
http://go.nlvid.com/results1/
go.nlv/results1/
sm-1359
sm-1356-bre
sm-1359-bre
sm-1356
rad-8905
rad-8905-bre
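For anyone who wants to try the two-pass idea outside EditPad Pro, here is a sketch in Python that chains the same two patterns. It is an approximation under a few assumptions: each value is processed as a separate string, the inline (?mx) is replaced by explicit flags, and clean is a hypothetical helper name. Python's regex flavour is not identical to EditPad's, so treat it as an illustration rather than a drop-in equivalent.

```python
import re

# Regex 1 from above, compiled with MULTILINE and VERBOSE instead of inline (?mx).
REGEX_1 = re.compile(r"""
    ^
    (?=(?:[^_\r\n]*?_){3})[^_\r\n]+?_([^_\r\n]+)[^-\r\n]+(-[^\r\n]+)?
    |
    [^\r\n]*?-ao1?-\D*([^\r\n]+)
    |
    ([^\r\n?]*)(?=\?)[^\r\n]+
    |
    [^\r\n]*?-((?:sm|rad)-[^\r\n]+)
""", re.MULTILINE | re.VERBOSE)

# Regex 2 from above: values with two (but not three) underscores.
REGEX_2 = re.compile(
    r"^(?!(?:[^_\r\n]*?_){3})(?=(?:[^_\r\n]*?_){2})(\d+)(?:[^-\r\n]+(-[^\r\n]+)?)"
)


def clean(line: str) -> str:
    # Pass 1: keep whichever capture groups took part in the match.
    line = REGEX_1.sub(lambda m: "".join(g or "" for g in m.groups()), line)
    # Pass 2: prepend "sm-" to the two-underscore form.
    return REGEX_2.sub(lambda m: "sm-" + m.group(1) + (m.group(2) or ""), line)


for value in ["b11-ao1-113775-bre", "1359_195_1453814569-bre", "113535|24905|24905"]:
    print(clean(value))
```

Running the passes in this order matters: the second pattern only fires on values the first one left alone, which mirrors the macro chaining Regex 1 and then Regex 2.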
Try this:
Toggle the Search Panel : SHIFT+CTRL+F
SEARCH: .*?((?:sm-|rad-)?(?:(?:\d+|[\w\.]+\/.*?))(?:-\w+)?$)
REPLACE: $1
Check REGEX and WORDS
Click Replace All or Hit CTRL+ALT+F3

How do I substitute selected contents despite any regex characters in vim?

In following code:
int return_int_func() { return 0; }
float fv = return_int_func();
Obviously, the compiler will warn me that fv may lose precision because of the implicit conversion. Facing lots of these, I want to fix them all with the substitute command. In short, I want this:
float fv = static_cast<float>(return_int_func());
But the real code has lots of forms like these:
float fv = obj.int_field;
float fv = obj->load_int_field("name");
float fv = xx.yy->zz;
I want to select my target (obj.int_field, obj->load_int_field("name") or xx.yy->zz) and replace it with static_cast<float>(\1). I tried this:
:'<,'>s/\%V/static_cast<float>(&)/g
But Vim replaces every character in the selected word with static_cast..., and that isn't what I want at all. Then I tried this:
:'<,'>s/\(\%V\)/static_cast<float>(\1)/g
Vim does the same thing. I have no idea how to replace the whole selection (regardless of any regex characters in it) with my pattern. Any suggestions?
The solution is almost too easy! Here it is.
:s/\%V.*\%V./static_cast<float>(&)/
This is actually almost the same as the example from the :help. We can take away from this that we should all just have looked up :h /\%V first thing in the morning ...
\%V is a zero-width atom that matches stuff that is selected in Visual mode. Here it can match at the start of the Visual area. .* then matches (greedily) as much as it can; its greediness is reined in by the final \%V., which requires the last character of the match also to lie within the Visual area.
Tip: If you need to make this change many times over many lines, define the following mappings (even better: put them in your vimrc permanently).
nnoremap & :&&<CR>
xnoremap & :&&<CR>
Then you can repeat the substitution shown above by simply selecting something, and then pressing & to perform the substitution.
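To make the effect of that substitution concrete, here is a small Python sketch of what it does to a single line. The column indices stand in for the start and end of the Visual selection, and wrap_selection is a hypothetical helper for illustration, not anything Vim provides.

```python
def wrap_selection(line: str, start: int, end: int,
                   prefix: str = "static_cast<float>(", suffix: str = ")") -> str:
    """Wrap line[start..end] (end inclusive, like a char-wise selection)."""
    selected = line[start:end + 1]
    return line[:start] + prefix + selected + suffix + line[end + 1:]


line = 'float fv = obj->load_int_field("name");'
start = line.index("obj")      # first selected column
end = line.index(";") - 1      # last selected column
print(wrap_selection(line, start, end))
```

Because the selection is taken as a literal slice, characters such as ., -> or ( in the selected text need no escaping, which is the same benefit \%V gives in the :s command.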
Let me try to paraphrase your question: You would like to Visual select some text, and then perform a substitution, where the selected text is also part of the replacement text.
I think in this case a macro is a much better tool.
To create the macro, first select the first piece of text that you want to wrap in the static cast. For example, select return_int_func(). (For each step, I'll show what the buffer looks like.)
When you're ready, press qq to start recording into register q, then press c.
float fv = |;
Type the left part of your wrapper text, static_cast<float>(.
float fv = static_cast<float>(|;
Press CTRL-R " (Control-R followed by "): this will reinsert the original text.
float fv = static_cast<float>(return_int_func()|;
Type ) to complete the change, and then Escape to leave insert mode.
float fv = static_cast<float>(return_int_func()▉;
Finally, press q to stop recording.
At this point you have made the first change and also recorded it as a macro in register q.
For all remaining changes, simply select a target such as obj->load_int_field("name") and press @q to repeat the change.
Look up :help 10.1 for more information about macros.
The \%V facility is really not for acting on the selected text as a whole; more for searching inside of that text.
Assuming that you are going to be putting this into a function or maybe mapping this to a key combination, here is an alternative approach that does what you are looking for:
:exec 'normal! gv"adistatic_cast<float>('|exec 'normal! "apa)'
Note that this will use register a, so if you want to use another register you can change the two instances of "a to "x, where x is the register you wish to use.
Basically this is going to programmatically yank the selected text, insert static_cast<float>(, paste the text that was selected, and then insert ).
You can try
:%s/float fv = \(.*\);$/float fv = static_cast<float>(\1);/
If I understood you right, you want to do text substitution only on the selected text. This is not so easy to do, at least not with a plain :s command, because your visual selection can be within a single line or span multiple lines, and it can be char-wise, line-wise, or block-wise.
but it can be done with this function:
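function! SubVisualText(pat, repl, flag)
  try
    " Save register v so it can be restored afterwards
    let v_save = @v
    " Yank the last Visual selection into register v
    normal! gv"vy
    let s = @v
    let s = substitute(s, a:pat, a:repl, a:flag)
    " Store the result with the same selection type (char-, line-, or block-wise)
    call setreg('v', s, visualmode())
    " Reselect the area and replace it with the substituted text
    normal! gv"vp
  finally
    let @v = v_save
  endtry
endfunction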
you use it by:
source the function
visual select the area (could be done by v, V or Ctrl-V)
:<ctrl-u>call SubVisualText(pattern, replacement, flag)<Enter>
the <ctrl-u> is for removing the leading range, since the function doesn't need the range.
when you run it, it looks like:(I just tested with Ctrl-V selection)

Sublime Text macro to find and replace file path characters on current line

I use Sublime 2 for developing R and PHP code, although I imagine this shortcut would be useful for other languages.
If I copy the path of a file from Windows Explorer / XYPlorer (or other source) it has backslashes for directories. When entering a path into the source code, it needs forward slashes.
Sublime has some reasonably powerful macro commands, but I cannot think of a combination that would be able to:
take the string of text on the current line
replace all instances of '\' with '/'
Here is the workflow that I envisage:
Locate my filename in Explorer and copy its path
In Sublime, write a line of code and paste in the path
Hit a keyboard shortcut, say Ctrl+Shift+\, and all back slashes are converted to forward slashes
The result:
myPath = "E:\WORK\Code\myFile.csv";
Becomes:
myPath = "E:/WORK/Code/myFile.csv";
Without running the risk of backslashes elsewhere in the file being changed (e.g. \n characters), and without having to use multiple key presses or mouse clicks.
I imagine this would be possible with Regex. Two things I am no expert in are Sublime macros or regex, so I wonder if anyone else knows the magical commands that would achieve this?
I tried this for about 15 minutes. A few things:
Sublime Text 2 doesn't allow for find/replace with macros
Sublime Text 3 doesn't allow for 'find in selection'
So, I think you are kind of beat right now other than writing a plugin, which would be fairly straightforward.
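The text transformation such a plugin would apply is tiny. Below is a sketch of just that part in Python, the language Sublime Text plugins are written in. The function name is invented for illustration, and wiring it to the current line and to a key binding inside Sublime is deliberately left out.

```python
def to_forward_slashes(line: str) -> str:
    """Replace every backslash on one line with a forward slash."""
    return line.replace("\\", "/")


# A raw string keeps the backslashes exactly as pasted from Explorer.
pasted = r'myPath = "E:\WORK\Code\myFile.csv";'
print(to_forward_slashes(pasted))
```

Working on a single line at a time is what avoids touching backslashes elsewhere in the file, such as \n escapes in other strings.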
This works for Sublime Text 3:
Type r before the string to tell Python to read the path as a raw string.
This way the backslashes are kept as literal characters instead of meaning 'escape the next character' (the default meaning of \ in Python).
Example
myPath = r"E:\WORK\Code\myFile.csv"
Python now keeps each \ as a literal backslash, so the Windows path can be used as pasted without converting it to forward slashes.