Replace a substring in the first column using vi - list

I have a huge file that has multiple columns as shown below:
J02-31 23.2 ...
J30-09 -45.4 ...
J05+30 56.1 ...
J00-20 -78.2 ...
J11-54 232.0 ...
... ... ...
I would like to replace - with $-$ only in the first column, i.e., my output should be like this:
J02$-$31 23.2 ...
J30$-$09 -45.4 ...
J05+30 56.1 ...
J00$-$20 -78.2 ...
J11$-$54 232.0 ...
... ... ...
Is there a way to do this using vi? I know that Python/pandas can do it, but I am interested in how to do it in vi.

I'd go with
:%s/^\S*\zs-/$-$/
which means:
%s/: apply this substitution for every line
^\S*: read as many non-whitespace characters from the start of the line as possible
\zs: actual match start (you could also capture the \S* above instead and insert it back too)
-: match the - (note: this will only match the last - in the first column, your question isn't really clear if there can be multiple there)
/$-$/: replace the matching part (which is only - thanks to the \zs) with $-$
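For anyone who wants to check the logic outside vi, here is a minimal Python sketch of the same substitution. Python's re module has no \zs, so the sketch uses the capture-and-reinsert variant mentioned above. The function name and the sample lines are only illustrative.

```python
import re

def mark_first_column(line: str) -> str:
    # ^(\S*) greedily captures the first column up to its last "-";
    # the captured prefix is put back, followed by "$-$".
    return re.sub(r'^(\S*)-', r'\1$-$', line, count=1)

for line in ["J02-31 23.2", "J30-09 -45.4", "J05+30 56.1"]:
    print(mark_first_column(line))
```

Because \S* cannot cross the first space, a minus sign in a later column (such as -45.4) is never touched.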

You could do:
:g/^\S*-/s/-/$-$/
This performs the replacement s/-/$-$/ only on lines that match the pattern /^\S*-/ (i.e., lines that have a - in the first column). Without the g flag, only the first - on each such line is replaced.
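The filter-then-substitute idea can be sketched in Python as follows. This is only an approximation of the :g command for checking the logic, and the function name is made up for the example.

```python
import re

def replace_first_hyphen(line: str) -> str:
    # Like :g/^\S*-/ : only act on lines whose first column contains a "-".
    if re.match(r'\S*-', line):
        # Like s/-/$-$/ without the g flag: replace the first "-" only.
        return line.replace('-', '$-$', 1)
    return line

print(replace_first_hyphen("J30-09 -45.4"))
print(replace_first_hyphen("J05+30 -56.1"))
```

Note the difference from the \zs approach: with several hyphens in the first column, this variant changes the first one rather than the last.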

Related

Modify position in a line if Regular Expression found

I need to modify position 10 of every line that contains the word 'Example' (I can't use the actual data here) and insert the string '(ID) ' there. The line doesn't necessarily have to begin with 9 digits; the string just needs to be inserted at position 10.
For example, this line should be modified like this:
ORIGINAL: 123456789This line is being used as an Example
SOLUTION: 123456789(ID) This line is being used as an Example
So far I have this, to find the Example and copy the rest of the line as to not lose the text:
Find: (.*)Example
Bonus points if it works for two different words 'Example1' and 'Example2' in different sentences, the 'and also' part of this example would change in every line.
ORIGINAL: 123456789This line is being used as an Example1 and also Example2
SOLUTION: 123456789(ID) This line is being used as an Example1 and also Example2
This would have this search:
Find: (.*)Example1(.*)Example2
Thank you
You could try:
Find: (\d{9})(?=.*\bExample1\b.*\bExample2\b)
Replace: $1(ID) 
(note the single space after (ID); $1 puts the captured 9 digits back)
The regex pattern used matches and captures a 9 digit number (you may adjust to any width, or range of widths, which you want). It also uses a positive lookahead to assert that Example1 and Example2 in fact occur later in the same line:
(?=.*\bExample1\b.*\bExample2\b)
This is how you add characters at a certain position. I accepted Tim's answer because it's very similar and helped me figure it out:
^(\S{9})(?=.*\bExample1\b.*\bExample2\b)
As you can see, I only added '^' so that the position is counted from the start of the line, and used 'S' instead of 'd' so that it counts any non-whitespace characters instead of only digits. This should work as long as the first nine characters of the line contain no whitespace.
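As a quick way to try the pattern, here is a small Python sketch. It assumes Python's re syntax, where the replacement refers to the group as \1 rather than $1; the function name is hypothetical.

```python
import re

PATTERN = re.compile(r'^(\S{9})(?=.*\bExample1\b.*\bExample2\b)')

def add_id(line: str) -> str:
    # Group 1 holds the first nine non-whitespace characters; the lookahead
    # only checks that both words occur later and consumes nothing.
    return PATTERN.sub(r'\1(ID) ', line)

print(add_id("123456789This line is being used as an Example1 and also Example2"))
```

Lines that lack either word are returned unchanged, because the lookahead fails and nothing is substituted.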

Deleting comments in a large file

I am trying to delete a bunch of comments that are all in the following format:
/**
* #ngdoc
... comment body (delete me, too!)
*/
I have tried using this command: %s/\/**\n * #ngdoc.\{-}*\///g
Here is the regex without the patterns: %s/pattern1.\{-}pattern2//g
Here are the individual patterns: \/**\n * #ngdoc and *\/
When I try my pattern in vim I get the following error:
E871: (NFA regexp) Can't have a multi follow a multi !
E61: Nested *
E476: Invalid command
Thanks for any help with this regexp nightmare!
Instead of trying to cram this into one complex regex, it's much easier to search for the start of a comment and delete from there to the end of that comment:
:g/^\/\*\*$/,/\*\/$/d_
This breaks down into
:g start a global command
/^\/\*\*$/ search for start of a comment: <sol>/**<eol>
,/\*\/$/ extend the range to the end of a comment: */<eol>
d delete the range
_ use the black hole register (performance optimization)
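The range-based idea (find a start line, then drop everything up to and including the next end line) can be sketched in Python like this. It is an illustration of the logic only, not a replacement for the Vim command; the function name is invented, and an unterminated comment is simply dropped to the end of the input, which is a simplification.

```python
import re

START = re.compile(r'^/\*\*$')   # a line that is exactly "/**"
END = re.compile(r'\*/$')        # a line that ends with "*/"

def strip_comments(lines):
    kept = []
    inside = False
    for line in lines:
        if inside:
            # Drop lines until the closing line, which is dropped as well.
            if END.search(line):
                inside = False
        elif START.match(line):
            inside = True  # drop the opening line and start skipping
        else:
            kept.append(line)
    return kept

print(strip_comments(["a();", "/**", " * #ngdoc", " * body", " */", "b();"]))
```

Like the :g command above, this removes every block that starts with /**, not only those containing #ngdoc.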
Your problem is that you have \{-} followed by *, which are the multis referenced in the error message. Escape each literal *:
%s/\/\*\*\n \* #ngdoc\_.\{-}\*\/\n//g
Using embedded newlines in the pattern is the wrong approach. You should instead use an address range. Something like:
sed '\#^/\*\*$#,\#^\*/$#d' file
This will delete all lines from one that matches /** anchored at column 1 through the next line that matches */ anchored at column 1. If your comments are well behaved (e.g., no trailing space after /** and no indentation before */), this should do what you want.
Try this, using the gc flags so that each deletion has to be confirmed:
%s/\v\/\*\*\n\s\*\s\#ngdoc\n((\s*\n)?(\s\*.*\n)?){-}\s?\*\///gc
Match comments like
/**
* #ngdoc
* ... comment body (delete me, too!)
*
*/
My approach uses a macro:
qa/\/\*\*<enter><shift-v>/\*\/<enter>d
qa ........ starts recording macro "a"
/\/\*\* ... searches for the comment beginning
<Enter> ... use Ctrl-v Enter
V ......... starts linewise visual mode (until...)
/\*\/ ..... end of your comment
<Enter> ... Ctrl-v Enter again
d ......... deletes the selected area
To insert <Enter> etc., press Ctrl-v followed by the key you want.

add character before first word in line

I want to add a minus sign "-" in front of the first word of each line in Vim. The lines contain spaces for indentation, and the indentation must not be touched. E.g.
As Is
list point 1
sub list point 2
and so on...
I want
- list point 1
- sub list point 2
- and so on...
I can find the first word, but I struggle with replacing it in the correct way.
^\s*\w
in Vim
/^\s*\w
But in the replacement I always remove the complete found part....
:s/^\s*\w/- \w/
Which leads to
- ist point 1
- ub list point 2
- nd so on...
Use & which is replaced with the matched string:
:%s/\w/- &
I'm late to the party but:
:%norm! I- <CR>
And another one with :s:
:%s/^\s*/&- /
An alternative to falsetrue's answer: You can capture the first word character and print it out along with the leading -:
%s/\(\w\)/- \1/
:normal cmd may help too:
:%norm! wi-
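Note that there is a space after the -. This relies on every line starting with indentation: :normal puts the cursor in the first column, so on an unindented line w jumps to the second word.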
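The role of & (the whole match) is easy to try outside Vim. Below is a minimal Python sketch of the :%s/\w/- & idea, assuming Python's re module, where \g<0> plays the part of &. The function name is made up for illustration.

```python
import re

def add_bullet(line: str) -> str:
    # \g<0> is Python's counterpart of Vim's & (the whole match).
    # count=1 mirrors :s without the g flag: only the first \w is changed.
    return re.sub(r'\w', r'- \g<0>', line, count=1)

for line in ["  list point 1", "    sub list point 2", "and so on..."]:
    print(add_bullet(line))
```

The indentation is untouched because the match starts at the first word character, not at the start of the line.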

Remove the first character of each line and append using Vim

I have a data file as follows.
1,14.23,1.71,2.43,15.6,127,2.8,3.06,.28,2.29,5.64,1.04,3.92,1065
1,13.2,1.78,2.14,11.2,100,2.65,2.76,.26,1.28,4.38,1.05,3.4,1050
1,13.16,2.36,2.67,18.6,101,2.8,3.24,.3,2.81,5.68,1.03,3.17,1185
1,14.37,1.95,2.5,16.8,113,3.85,3.49,.24,2.18,7.8,.86,3.45,1480
1,13.24,2.59,2.87,21,118,2.8,2.69,.39,1.82,4.32,1.04,2.93,735
Using Vim, I want to remove the 1's from the start of each line and append them to the end. The resulting file would look like this:
14.23,1.71,2.43,15.6,127,2.8,3.06,.28,2.29,5.64,1.04,3.92,1065,1
13.2,1.78,2.14,11.2,100,2.65,2.76,.26,1.28,4.38,1.05,3.4,1050,1
13.16,2.36,2.67,18.6,101,2.8,3.24,.3,2.81,5.68,1.03,3.17,1185,1
14.37,1.95,2.5,16.8,113,3.85,3.49,.24,2.18,7.8,.86,3.45,1480,1
13.24,2.59,2.87,21,118,2.8,2.69,.39,1.82,4.32,1.04,2.93,735,1
I was looking for an elegant way to do this.
Actually I tried it like
:%s/$/,/g
And then
:%s/$/^./g
But I could not make it to work.
EDIT: Actually, I made a mistake in my question. In the data file, the first character is not always 1; it is a mixture of 1, 2 and 3. So, from all the answers to this question, I came up with this solution:
:%s/^\([1-3]\),\(.*\)/\2,\1/g
and it is working now.
Here is a regular expression that doesn't care which number comes first, how many digits it has, or which separator you've used. That is, it works whether a line starts with 1 or with 114:
:%s/\([0-9]*\)\(.\)\(.*\)/\3\2\1/
Explanation:
:%s// - Substitute on every line (%)
\(<something>\) - Capture and store as \n
[0-9]* - A digit, 0 or more times
. - Any single char; in this case, the ,
.* - Any char, 0 or more times
\3\2\1 - Replace with what was captured by the \(\) groups, in reverse order
So: cut 1 , <the rest> into \1, \2 and \3 respectively, and reorder them.
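The three-group reordering can be checked with a short Python sketch. It assumes Python's re syntax (unescaped parentheses for groups) and a hypothetical function name; the sample lines are shortened versions of the data.

```python
import re

def rotate_first_field(line: str) -> str:
    # \1 = leading digits, \2 = the separator, \3 = the rest of the line.
    return re.sub(r'^([0-9]*)(.)(.*)$', r'\3\2\1', line, count=1)

print(rotate_first_field("1,14.23,1.71"))
print(rotate_first_field("114,13.2,1.78"))
```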
This
:%s/^1,//
:%s/$/,1/
could be somewhat simpler to understand.
:%s/^1,\(.*\)/\1,1/
This will do the replacement on each line in the file. The \1 in the replacement stands for everything captured by \(.*\).
:%s/1,\(.*$\)/\1,1/gc
You could also solve this one using a macro. First, think about how to delete the 1, from the start of a line and append it to the end:
0 go to the start of the line
df, delete everything to and including the first ,
A,<ESC> append a comma to the end of the line
p paste the thing you deleted with df,
x delete the trailing comma
So, to sum it up, the following will convert a single line:
0df,A,<ESC>px
Now if you'd like to apply this set of modifications to all the lines, you will first need to record them:
qj start recording into the 'j' register
0df,A,<ESC>px convert a single line
j go to the next line
q stop recording
Finally, you can execute the macro anytime you want using @j, or convert your entire file with 99@j (using a higher number than 99 if you have more than 99 lines).
Here's the complete version:
qj0df,A,<ESC>pxjq99@j
This one might be easier to understand than the other solutions if you're not used to regular expressions!
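For comparison, the same keystroke-by-keystroke reasoning can be written as a small Python sketch, with one statement per macro step. The function name is invented for the example, and the sketch assumes every line contains at least one comma (otherwise index() raises ValueError).

```python
def move_first_field(line: str) -> str:
    cut = line.index(',') + 1
    deleted, line = line[:cut], line[cut:]   # df,      cuts "1," from the start
    line += ','                              # A,<ESC>  appends a comma
    line += deleted                          # p        pastes "1," after it
    return line[:-1]                         # x        removes the trailing comma

print(move_first_field("1,13.2,1.78,2.14"))
```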

hive regexp_extract weirdness

I am having some problems with regexp_extract:
I am querying on a tab-delimited file, the column I'm checking has strings that look like this:
abc.def.ghi
Now, if I do:
select distinct regexp_extract(name, '[^.]+', 0) from dummy;
MR job runs, it works, and I get "abc" from index 0.
But now, if I want to get "def" from index 1:
select distinct regexp_extract(name, '[^.]+', 1) from dummy;
Hive fails with:
2011-12-13 23:17:08,132 Stage-1 map = 0%, reduce = 0%
2011-12-13 23:17:28,265 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201112071152_0071 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
Log file says:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
Am I doing something fundamentally wrong here?
Thanks,
Mario
From the docs https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF it appears that regexp_extract() works per record/line and returns the part of the value that the pattern matches.
It seems to stop at the first match rather than matching globally, so the index refers to a capture group, not to the n-th match:
0 = the entire match
1 = capture group 1
2 = capture group 2, etc ...
Paraphrased from the manual:
regexp_extract('foothebar', 'foo(.*?)(bar)', 2)
                                ^    ^
                         groups 1    2
This returns 'bar'.
So, in your case, to get the text after the dot, something like this might work:
regexp_extract(name, '\.([^.]+)', 1)
or this
regexp_extract(name, '[.]([^.]+)', 1)
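To see the group-index semantics in isolation, here is a minimal Python sketch. The regexp_extract function below is a hypothetical stand-in built on Python's re module, not Hive's implementation, and returning None when nothing matches is an assumption of the sketch.

```python
import re

def regexp_extract(subject: str, pattern: str, index: int):
    # Hypothetical stand-in for the Hive UDF, for illustration only.
    match = re.search(pattern, subject)   # first match only, not global
    if match is None:
        return None
    return match.group(index)   # 0 = whole match, 1.. = capture groups

print(regexp_extract('foothebar', r'foo(.*?)(bar)', 2))
print(regexp_extract('abc.def.ghi', r'[^.]+', 0))
print(regexp_extract('abc.def.ghi', r'\.([^.]+)', 1))
```

In this sketch, asking for index 1 with the pattern '[^.]+' raises an IndexError because the pattern has no capture group, which is the same kind of mistake as in the failing query.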
edit
I got re-interested in this, just a fyi, there could be a shortcut/workaround for you.
It looks like you want a particular segment separated with a dot . character, which is almost like split.
It's more than likely that the regex engine used overwrites a group if it is quantified more than once.
You can take advantage of that with something like this:
Returns the first segment (abc) of abc.def.ghi:
regexp_extract(name, '^(?:([^.]+)\.?){1}', 1)
Returns the second segment (def) of abc.def.ghi:
regexp_extract(name, '^(?:([^.]+)\.?){2}', 1)
Returns the third segment (ghi) of abc.def.ghi:
regexp_extract(name, '^(?:([^.]+)\.?){3}', 1)
The index doesn't change (because the index still refers to capture group 1), only the regex repetition changes.
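The overwriting behaviour can be tried in Python, whose re module also keeps only the last iteration of a repeated group. This is a sketch under that assumption, with a made-up function name; it says nothing about how Hive itself escapes backslashes in string literals.

```python
import re

def segment(name: str, n: int):
    # The inner group is overwritten on every repetition, so after n
    # repetitions group 1 holds the n-th dot-separated segment.
    match = re.match(r'^(?:([^.]+)\.?){%d}' % n, name)
    return match.group(1) if match else None

for n in (1, 2, 3):
    print(n, segment('abc.def.ghi', n))
```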
Some notes:
This regex ^(?:([^.]+)\.?){n} has problems though.
It requires there be something between dots in the segment or the regex won't match ....
It could be this ^(?:([^.]*)\.?){n} but this will match even if there are fewer than n-1 dots,
including the empty string. This is probably not desirable.
There is a way to do it where it doesn't require text between the dots, but still requires at least n-1 dots.
This uses a lookahead assertion and capture buffer 2 as a flag.
^(?:(?!\2)([^.]*)(?:\.|$())){2} , everything else is the same.
So, if it uses java style regex, then this should work.
regexp_extract(name, '^(?:(?!\2)([^.]*)(?:\.|$())){2}', 1) change {2} to whatever 'segment' is needed (this does segment 2).
and it still returns capture buffer 1 after the {N}'th iteration.
Here it is broken down
^ # Beginning of string
(?: # Grouping
(?!\2) # Assertion: Capture buffer 2 is UNDEFINED
( [^.]*) # Capture buffer 1, optional non-dot chars, many times
(?: # Grouping
\. # Dot character
| # or,
$ () # End of string, set capture buffer 2 DEFINED (prevents recursion when end of string)
) # End grouping
){3} # End grouping, repeat group exactly 3 (or N) times (overwrites capture buffer 1 each time)
If it doesn't do assertions, then this won't work!
I think you have to make 'groups' no?
select distinct regexp_extract(name, '([^.]+)', 1) from dummy;
(untested)
I think it behaves like the java library and this should work, let me know though.