Copying only the value at column n Vim - regex

I have a file with long lines and need to see/copy the values at specific column position(s) for the whole file, without copying the rest of each line.
If the text width is small enough, ~184 columns, I can use :set colorcolumn=num to highlight the value. Beyond 184 characters, however, scrolling gets unwieldy.
I tried :g/\%1237c/y Z, for one of the positions I needed, but that yanked the entire line.
E.g., for a smaller sample, :g/\%49c/y Z will yank all of lines 1 and 2, but I want to yank (or copy) only the character at that column, i.e. = on line 1 and x on line 2.
vim: filetype=help foldmethod=indent foldclose=all modifiable noreadonly
Table of Contents *sfcontents* *vim* *regex* *sfregex*
*sfsearch* - Search specific commands
|Ampersand-replaces-previous-pattern|
|append-a-global-search-to-a-register|
*sfHelp* Various Help related commands

There are two problems with your :g command:
For each matching line, the cursor is positioned on the first column. So even though you've matched at a particular column, that position is lost.
The \%c atom actually matches byte indices (what Vim somewhat confusingly names "columns"), so your measurement will be off for Tab and non-ASCII characters. Use the virtual column atom \%v instead.
Instead of :global, I would use :substitute with a replace-expression, in the idiom described at how to extract regex matches using vim:
:let t=[] | %s/\%49v./\=add(t, submatch(0))[-1]/g | let @@ = join(t, "\n")
Alternatively, if you install my ExtractMatches plugin, it'd be this short command invocation:
:YankMatchesToReg /\%50v./
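To double-check the extracted values outside Vim, the same idea can be sketched in Python. This is only an illustrative sketch, not part of the answer above: the function name chars_at_column and the tabstop default of 8 are assumptions, and it approximates Vim's virtual columns by expanding tabs only (double-width characters are not handled).

```python
def chars_at_column(lines, col, tabstop=8):
    """Return the character at 1-based screen column `col` of each line.

    Tabs are expanded first, which roughly mimics a virtual-column match.
    Lines shorter than `col` are skipped, like lines that would not match.
    """
    picked = []
    for line in lines:
        expanded = line.rstrip("\n").expandtabs(tabstop)
        if len(expanded) >= col:
            picked.append(expanded[col - 1])
    return picked


# Hypothetical sample data: column 4 holds "=" on line 1 and "x" on line 2.
sample = ["key=value", "keyxvalue", "ab"]
print("\n".join(chars_at_column(sample, 4)))
```

The third sample line is shorter than the requested column, so it contributes nothing to the result.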

Related

Edit CSV rows in two different ways

I have a bash script that outputs two CSV columns. In the second column, I need to prepend "f. " to the three-digit number in those rows that contain one, and keep the rest of the rows intact. I have tried different ways so far but each has failed in one way or another.
What I've tried mainly has been to use regular expressions with either the first or second column to separate the desired rows from the rest, but I can't separate and prepend at the same time without cancelling out or messing up the process somehow. The tools I've used so far include sed and cut, as well as (nested) for loops, read-while loops, if/else and if/elif/else statements, etc. What follows is one such (failed) solution:
for var1 in "^.*_[^f]_.*"
do
sed -i "" "s:$MSname::" $pathToCSV"_final.csv"
for var2 in "^.*_f_.*"
do
sed -i "" "s:$MSname:f.:" $pathToCSV"_final.csv"
done
done
And these are some sample rows:
abc_deg0014_0001_a_1.tif,British Library 1 Front Board Outside
abc_deg0014_0002_b_000.tif,British Library 1 Front Board Inside
abc_deg0014_0003_f_001r.tif,British Library 1 001r
abc_deg0014_0004_f_001v.tif,British Library 1 001v
…
abc_deg0014_0267_f_132r.tif,British Library 1 132r
abc_deg0014_0268_f_132v.tif,British Library 1 132v
abc_deg0014_0269_y_999.tif,British Library 1 Back Board Inside
abc_deg0014_0270_z_1.tif,British Library 1 Back Board Outside
Here $MSname = British Library 1 (since with different CSVs the "British Library 1" part can change to other words that I need to remove/replace and that's why I use parameter expansion).
The desired result:
abc_deg0014_0002_b_000.tif,Front Board Inside
abc_deg0014_0003_f_001r.tif,f. 001r
…
abc_deg0014_0268_f_132v.tif,f. 132v
abc_deg0014_0269_y_999.tif,Back Board Inside
If you look closely, you'll notice these rows are also differentiated from the rest by "f" in their first column (the rows that shouldn't get the "f. " in front of their second column are differentiated by "a", "b", "y", and "z", respectively, in the first column).
You are not using var1 or var2 for anything, and even if you did, looping over variables and repeatedly running sed -i on the same output file is extremely wasteful. Ideally, you would like to write all the modifications into a single sed script, and process the file only once.
Without being able to guess what other strings than "British Library 1" you have and whether those require different kinds of actions, I would suggest something along the lines of
sed -i '/^[^,]*_f_[^,_]*,/s/,British Library 1 /,f. /
s/,British Library 1 /,/' "${pathToCSV}_final.csv"
Notice how the sed script in single quotes can be wrapped over multiple physical lines. The first line finds any lines where the text between the last pair of underscores in the first comma-separated column is f, and replaces ",British Library 1 " with ",f. ". (I made some adjustments to the spacing here -- I hope they make sense for you.) On the following line, we simply replace any (remaining) occurrences of ",British Library 1 " with just a comma; the idea is that only the lines which didn't match the regex on the previous line will still contain this string, and so we don't have to do another regex match.
This can easily be extended to cover more patterns in the same sed script, rather than repeatedly looping over the file and rewriting one pattern at a time. For example, if your next task is to replace Windsor Palace A with either a. or nothing depending on whether the penultimate underscore-separated subfield in the first field contains a, that should be obvious enough:
sed -i '/^[^,]*_f_[^,_]*,/s/,British Library 1 /,f. /
s/,British Library 1 /,/
/^[^,]*_a_[^,_]*,/s/,Windsor Palace A /,a. /
s/,Windsor Palace A /,/' "${pathToCSV}_final.csv"
In some more detail, the regex says
^ beginning of line
[^,]* any sequence of characters which are not a comma
_f_ literal characters underscore, f, underscore
[^,_]* any sequence of characters which are not a comma or an underscore
, literal comma
You should be able to see that this will target the last pair of underscores in the first column. It's important to never skip across the first comma, and near the end, not allow any underscores after the ones we specifically target before we finally allow the comma column delimiter.
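To see the regex at work on the sample rows, here is a small Python sketch that applies the same pattern and the same two-step replacement. It is only an illustration of the logic, not a substitute for the sed script; the function name relabel and its ms_name parameter are made up for this example.

```python
import re

# Same pattern as in the sed script: "_f_" must sit between the last
# pair of underscores of the first comma-separated field.
F_ROW = re.compile(r"^[^,]*_f_[^,_]*,")


def relabel(row, ms_name="British Library 1"):
    """Replace ',<ms_name> ' with ',f. ' on f-rows and with ',' elsewhere."""
    needle = "," + ms_name + " "
    if F_ROW.match(row):
        return row.replace(needle, ",f. ", 1)
    return row.replace(needle, ",", 1)


rows = [
    "abc_deg0014_0002_b_000.tif,British Library 1 Front Board Inside",
    "abc_deg0014_0003_f_001r.tif,British Library 1 001r",
]
for row in rows:
    print(relabel(row))
```

The character class [^,_]* is what keeps the match anchored to the last underscore pair: a row such as x_f_y_z.tif has further underscores after _f_, so it is treated as a non-f row.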
Finally, also notice how we always use double quotes around variables which contain file names. There are scenarios where you can avoid this but you have to know what you are doing; the easy and straightforward rule of thumb is to always put double quotes around variables. For the full scoop, see When to wrap quotes around a shell variable?
With awk, you can look at the fifth field to see whether it matches "3 digits + 1 letter", then print it with f. in that case and just drop fields 2, 3 and 4 otherwise. For example:
awk -F'[, ]' '{
if($5 ~ /.?[[:digit:]]{3}[a-z]$/) {
printf("%s,f. %s\n",$1,$5)}
else {
printf("%s,%s %s %s\n",$1,$5,$6,$7)
}
}' test.txt
On the example you provide, it gives:
abc_deg0014_0001_a_1.tif,Front Board Outside
abc_deg0014_0002_b_000.tif,Front Board Inside
abc_deg0014_0003_f_001r.tif,f. 001r
abc_deg0014_0004_f_001v.tif,f. 001v
abc_deg0014_0267_f_132r.tif,f. 132r
abc_deg0014_0268_f_132v.tif,f. 132v
abc_deg0014_0269_y_999.tif,Back Board Inside
abc_deg0014_0270_z_1.tif,Back Board Outside

Mass regex search-and-replace BETWEEN patterns

I have a directory with a bunch of text files, all of which follow this structure:
...
- Some random number of list items of random text
- And even more of it
PATTERN_A (surrounded by empty lines)
- Again, some list items of random text
- Which does look similar as the first batch
PATTERN_B (surrounded by empty lines)
- And even more some random text
....
And I need to run a replace operation (let's say, I need to prepend CCC at the beginning of the line, just after the dash) on only those "list items", which are between PATTERN_A and PATTERN_B. The problem is they aren't really much different from the text above PATTERN_A, or below PATTERN_B, so an ordinary regex can't really catch them without also affecting the remaining text.
So, my question would be, what tool and what regex should I use to perform that replacement?
(Just in case, I'm fine with Vim, and I can collect those files in a QuickFix for a further :cdo, for example. I'm not that good with awk, unfortunately, and absolutely bad with Perl :))
Thanks!
If I have understood your questions, you can do so quite easily with a pattern-range selection and the general substitution form with sed (stream editor). For example, in your case:
$ sed '/PATTERN_A/,/PATTERN_B/s/^\([ ]*-\)/\1CCC/' file
- Some random number of list items of random text
- And even more of it
PATTERN_A (surrounded by empty lines)
-CCC Again, some list items of random text
-CCC Which does look similar as the first batch
PATTERN_B (surrounded by empty lines)
- And even more some random text
(note: to substitute in place within the file add the -i option, and to create a backup of the original add -i.bak which will save the original file as file.bak)
Explanation
/PATTERN_A/,/PATTERN_B/ - select lines between PATTERN_A and PATTERN_B
s/^\([ ]*-\)/\1CCC/ - substitute (general form 's/find/replace/') where find is from beginning of line ^ capturing text between \(...\) that contains [ ]*- (any number of spaces and a hyphen) and then replace with \1 (called a backreference that contains all characters you captured with the capture group \(...\)) and appending CCC to its end.
Look things over and let me know if you have questions or if I misinterpreted your question.
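If it helps to see the range logic spelled out, here is a rough Python sketch of what the sed address range does: switch on at the line matching the first pattern, apply the substitution while switched on, and switch off after the line matching the second pattern. The function name prefix_between and its parameters are assumptions made for this illustration.

```python
import re

ITEM = re.compile(r"^( *-)")  # leading spaces plus the dash of a list item


def prefix_between(lines, start="PATTERN_A", end="PATTERN_B", tag="CCC"):
    """Insert `tag` right after the dash, only between start and end lines."""
    out = []
    inside = False
    for line in lines:
        if not inside and re.search(start, line):
            inside = True  # the range opens on this line (inclusive)
            out.append(ITEM.sub(lambda m: m.group(1) + tag, line))
            continue
        if inside:
            out.append(ITEM.sub(lambda m: m.group(1) + tag, line))
            if re.search(end, line):
                inside = False  # the range closes after this line (inclusive)
        else:
            out.append(line)
    return out


text = ["- above", "", "PATTERN_A", "", "- one", "- two", "", "PATTERN_B", "", "- below"]
print("\n".join(prefix_between(text)))
```

Only "- one" and "- two" are changed; the items above PATTERN_A and below PATTERN_B pass through untouched.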
With Perl also, you can get the results
> perl -pe ' { s/^(\s*-)/${1}CCC/g if /PATTERN_A/../PATTERN_B/ } ' mass_replace.txt
...
- Some random number of list items of random text
- And even more of it
PATTERN_A (surrounded by empty lines)
-CCC Again, some list items of random text
-CCC Which does look similar as the first batch
PATTERN_B (surrounded by empty lines)
- And even more some random text
....
>

Format a text file by regex match and replace

I have a text file that looks like the following:
Chanelle
Jettie
Winnie
Jen
Shella
Krysta
Tish
Monika
Lynwood
Danae
2649
2466
2890
2224
2829
2427
2816
2648
2833
2453
I need to make it look like this
Chanelle 2649
Jettie 2466
... ...
I tried a lot on sublime editor but couldn't figure out the regex to do that. Can somebody demonstrate if it can be done.
I tested the following in Notepad++ but it should work universally.
Use this as the search string:
(?:(\s+[A-Za-z]+)(\r?\n))((?:\s*[A-Za-z]*\r?\n)+)\s+(\d+)
and this as the replacement:
$1 $4$2$3
Running a replace with it once will do one line at a time; if you run it multiple times, it will keep replacing lines until there are no matching lines left.
Alternatively, you can use this as the replacement if you want to have the values aligned by tabs, but it's not going to match in all cases:
$1\t\t$4$2$3
While the regex answer by SeinopSys will work, you don't need a regex to do this - instead, you can take advantage of Sublime's multiple cursors.
1. Place your cursor at the beginning of line 1, then hold down Shift+↓ to select all the names.
2. Hit Ctrl+Shift+L (Selection -> Split into Lines) to split the selection into lines.
3. Ctrl+C to copy.
4. Place your cursor on line 11 (the first number line) and press Ctrl+Shift+↓ (Windows/OS X) or Alt+Shift+↓ (Linux) to place a cursor at the beginning of each number line.
5. Hit Ctrl+V to paste the names before the numbers.
You can now delete the names at the top and you're all set. Alternatively, you could use Ctrl+X to cut the names in step 3.
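Outside an editor, the same pairing can be sketched in a few lines of Python. This assumes the file has the shape shown in the question (all names first, then the same number of values) and is not taken from either answer; the function name pair_halves is made up for the example.

```python
def pair_halves(lines):
    """Join entry i of the first half with entry i of the second half."""
    # Ignore blank lines and surrounding whitespace.
    items = [line.strip() for line in lines if line.strip()]
    half = len(items) // 2
    return [f"{name} {number}" for name, number in zip(items[:half], items[half:])]


sample = ["Chanelle", "Jettie", "Winnie", "2649", "2466", "2890"]
print("\n".join(pair_halves(sample)))
```

Because the split point is simply the midpoint, this only gives sensible output when the two halves really are the same length.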

Enumerate existing text in Vim (make numbered list out of existing text)

I have a source document with the following text
Here is a bunch of text
...
Collect underpants
???
Profit!
...
More text
I would like to visually select the middle three lines and insert numbers in front of them:
Here is a bunch of text
...
1. Collect underpants
2. ???
3. Profit!
...
More text
All the solutions I found either put the numbers on their own new lines or prepended the actual line of the file.
How can I prepend a range of numbers to existing lines, starting with 1?
It makes for a good macro.
Add the first number to your line, and put your cursor back at the beginning.
Start a macro with qq (or q<any letter>)
Copy the number with yf<space> (yank find )
Move down a line with j
Paste your yank with P
Move back to the beginning of the line with 0
Increment the number with Ctrl-a
Back to the beginning again with 0 (incrementing positions you at the end of the number)
End the macro by typing q again
Play the macro with @q (or @<the letter you picked>)
Replay the macro as many times as you want with <number>@@ (@@ replays the last macro)
Profit!
To summarize the fun way, the full key sequence is i1. <Esc>0qqyf jP0^a0q10@q (where ^a stands for Ctrl-a).
To apply enumeration for all lines:
:let i=1 | g/^/s//\=i.'. '/ | let i=i+1
To enumerate only selected lines:
:let i=1 | '<,'>g/^/s//\=i.'. '/ | let i=i+1
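The counter logic behind these commands (start at 1, prefix the line, increment) can be written out as a short Python sketch, which may help when checking what the :g command does to each line. The function name enumerate_range and its 1-based, inclusive first/last parameters are assumptions chosen to mirror a Vim line range.

```python
def enumerate_range(lines, first, last, start=1):
    """Prefix lines first..last (1-based, inclusive) with '1. ', '2. ', ..."""
    numbered = list(lines)  # leave the caller's list untouched
    for offset, index in enumerate(range(first - 1, last)):
        numbered[index] = f"{start + offset}. {numbered[index]}"
    return numbered


text = ["Here is a bunch of text", "...", "Collect underpants", "???", "Profit!", "...", "More text"]
print("\n".join(enumerate_range(text, 3, 5)))
```

Lines outside the range are returned unchanged, which corresponds to restricting the command to the visual selection.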
Set a non-recursive mapping with the following command, then type ,enum in Normal mode when the cursor is inside the lines you want to enumerate.
:nn ,enum {j<C-v>}kI0. <Esc>vipg<C-a>
TL;DR
You can type :help CTRL-A to see the answer to your question.
{Visual}g CTRL-A Add [count] to the number or alphabetic character in
the highlighted text. If several lines are
highlighted, each one will be incremented by an
additional [count] (so effectively creating a
[count] incrementing sequence).
For Example, if you have this list of numbers:
1.
1.
1.
1.
Move to the second "1." and Visually select three
lines, pressing g CTRL-A results in:
1.
2.
3.
4.
If you have a paragraph (:help paragraph) you can select it (look at :help object-select). Suppose each new line in the paragraph needs to be enumerated.
{ jump to the beginning of current paragraph
j skip blank line, move one line down
<C-v> emulates Ctrl-v, turns on Visual mode
} jump to the end of current paragraph
k skip blank line, move one line up
required region selected, we can make multi row edit:
I go into Insert mode and place cursor in the beginning of each line
0. is added in the beginning of each line
<Esc> to change mode back to Normal
You should get a list with each line prefixed by "0. ". If you already have such a list, you can skip this part.
vip select inner paragraph (list prepended with "0. ")
g<C-a> does the magic
I have found it easier to enumerate starting from zeros than to leave the first line of the list out of the selection, as the documentation describes.
Note: personally I have no mappings. It is easier to remember what g <C-a> does and use it directly. The answer above describes plain <C-a>, which requires you to count manually; g <C-a>, on the other hand, can increment numbers by a given value (a step) and keeps its own "internal counter".
Create a map for @DmitrySandalov's solution:
vnoremap <silent> <Leader>n :<C-U>let i=1 \| '<,'>g/^/s//\=i.'. '/ \| let i=i+1 \| nohl<CR>

Stata: removing line feed control characters

I have a dataset which I export to a CSV file with the outsheet command. Some rows break onto a new line at a certain place. Using a hexadecimal editor, I could identify the line feed control character "0a" in the record. The value of the variable producing the line break shows only 5 characters in Stata. But if I count the number of characters:
gen xlen = length(x)
I get 6. I could write a Perl program to get rid of this problem, but I would prefer to remove the control characters in Stata before exporting (for example using regexr()). Does anyone have an idea how to remove the control characters?
The char() function calls up particular ASCII characters. So, you can delete such characters by replacing them with empty strings.
replace x = subinstr(x, char(10), "", .)
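For comparison, the same clean-up can be sketched in Python, for instance to verify an exported value afterwards. This is an illustration only and goes slightly further than the Stata line above: it drops every ASCII control character, not just the line feed. The function name strip_control_chars is made up for the example.

```python
def strip_control_chars(value):
    """Drop ASCII control characters (code points 0-31 and 127)."""
    return "".join(ch for ch in value if ord(ch) >= 32 and ord(ch) != 127)


raw = "abcde\n"  # hypothetical value: 5 visible characters plus a line feed
cleaned = strip_control_chars(raw)
print(len(raw), len(cleaned))
```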