Vim search and replace, adding a constant - regex

I know this is a long shot, but I have a huge text file and I need to add a given number to other numbers matching some criteria.
Eg.
identifying text 1.1200
identifying text 1.1400
and I'd like to transform this (by adding say 1.15) to
identifying text 2.2700
identifying text 2.2900
Normally I'd do this in Python, but it's on a Windows machine where I can't install too many things. I've got Vim though :)

Here is a simplification and a fix on hobbs' solution:
:%s/identifying text \zs\d\+\(\.\d\+\)\=/\=(1.15+str2float(submatch(0)))/
Thanks to \zs, there is no need to recall the leading text. Thanks to str2float() a single addition is done on the whole number (in other words, 1.15 + 2.87 will give the expected result, 4.02, and not 3.102).
Of course this solution requires a recent version of Vim (7.3?)
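For comparison, the same idea (match the number that follows the identifying text, then do a single floating-point addition on it) can be sketched outside Vim. This is only an illustration in Python, which the asker mentions but cannot install; the helper name add_constant and the four-decimal output format are assumptions taken from the sample lines in the question.

```python
import re

# Hypothetical helper: add `delta` to the decimal number that follows `prefix`.
def add_constant(line, delta=1.15, prefix="identifying text "):
    pattern = re.compile("(" + re.escape(prefix) + r")(\d+(?:\.\d+)?)")

    def bump(match):
        # One addition on the whole number, then back to four decimals
        # (the width used in the question's sample data).
        return match.group(1) + "%.4f" % (float(match.group(2)) + delta)

    return pattern.sub(bump, line)

print(add_constant("identifying text 1.1200"))  # identifying text 2.2700
```

Lines that do not contain the prefix are returned unchanged, which mirrors how the :s command skips non-matching lines.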

You can do a capturing regex and then use a vimscript expression as a replacement, something like
:%s/\(identifying text \)\(\d\+\)\.\(\d\+\)/\=submatch(1) . (submatch(2) + 1) . "." . (submatch(3) + 1500)

Your number format seems to be fixed, so it's easy to convert to an integer and back: remove the dot, add 11500, and put the dot back.
:%s/\.//
:%normal 11500^A
(enter the ^A by typing C-V then C-a)
:%s/....$/.&/
If you don't want to do that on every line but only on the ones which match 'identifying text', replace each % with 'g/identifying text/'.
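The fixed-point trick above (drop the dot, add an integer, put the dot back) can be sketched in Python to make the arithmetic explicit. The function name and the assumption that every number ends the line with exactly four decimals are illustrative, not part of the original answer.

```python
import re

SCALE_DIGITS = 4  # assumed: the number always has exactly four decimals


def add_fixed_point(line, increment=11500):
    def bump(match):
        # "1" + "1200" -> 11200, i.e. the value scaled by 10**4
        scaled = int(match.group(1) + match.group(2)) + increment
        # Pad so values below 1.0 still keep a leading zero before the dot
        digits = str(scaled).zfill(SCALE_DIGITS + 1)
        return digits[:-SCALE_DIGITS] + "." + digits[-SCALE_DIGITS:]

    return re.sub(r"(\d+)\.(\d{4})$", bump, line)


print(add_fixed_point("identifying text 1.1200"))  # identifying text 2.2700
```

Working on integers avoids any floating-point rounding, at the cost of relying on the fixed number of decimals.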

For integers you can just use n^A to add n to a number (and n^X to subtract it). I doubt whether that works for fractional numbers though.

Well this might not be a solution for vim but I think awk can help:
cat testscript | LC_ALL=C awk '{printf "%s %s %s %s %.3f\n", $1,$2,$3,$4,$5+1.567 }'
and the test
this is a number 1.56
this is a number 2.56
this is a number 3.56
I needed the LC_ALL=C for the correct conversion of the floating point separator, and maybe there is a more elegant solution for printing the beginning/ rest of the string. And the result looks like:
this is a number 3.127
this is a number 4.127
this is a number 5.127

Using macro
qa .......................... start record macro 'a'
/iden<Enter> ................ search 'ident*' press Enter
2w .......................... jump 2 words until number one (before dot)
Ctrl-a ...................... increases the number
2w .......................... jump to number after dot
1500 Ctrl-a ................. perform increases 1500 times
q ........................... stop record to macro 'a'
If you have 300 lines with this pattern, replay the macro with
300@a

Related

Issues while processing zeroes found in CSV input file with Perl

Friends:
I have to process a CSV file using Perl and produce an Excel file as output, using the Excel::Writer::XLSX module. This is not homework but a real-life problem, where I cannot download an arbitrary Perl version (I actually need to use Perl 5.6) or arbitrary Perl modules (I have a limited set of them). My OS is UNIX. I can also use ksh and csh embedded in Perl (with some limitations, as I have found so far). Please limit your answers to the tools I have available. Thanks in advance!
Even though I am not a Perl developer and come from other languages, I have already done my work. However, the customer is asking for extra processing, and that is where I am stuck.
1) The obstacles I found come from two sides: from Perl and from Excel's particular way of processing data. I already found a workaround for Excel, but, as mentioned in the subject, I have difficulties processing zeroes found in the CSV input file. To handle Excel, I am using the '0 way, which is the final data representation that Excel seems to use with the # formatting style.
2) Scenario:
I need to catch standalone zeroes which might be present in whichever line / column / cell of the CSV input file and put them as such (as zeroes) in the Excel output file.
I will go directly to the point of my question to avoid losing your valuable time. I am providing more details after my question:
Research and question:
I tried to use Perl regex to find standalone "0" and replace them by whichever string, planning to replace them back to "0" at the end of processing.
perl -p -i -e 's/\b0\b/string/g' myfile.csv
and
perl -i -ple 's/\b0\b/string/g' myfile.csv
Both work, but only from the command line. They do not work when I call them from the Perl script as follows:
system("perl -i -ple 's/\b0\b/string/g' myfile.csv")
Do not know why... I have already tried using exec and eval, instead of system, with the same results.
Note that I have a ton of regex that work perfectly with the same structure, such as the following:
system("perl -i -ple 's/input/output/g' myfile.csv")
I have also tried using backticks and qx//, without success. Note that qx// and backticks have not the same behavior, since qx// is complaining about the boundaries \b because of the forward slash.
I have tried using sed -i, but my System is rejecting -i as invalid flag (do not know if this happens in all UNIX, but at least happens in the one at work. However is accepting perl -i).
I have tried embedding awk (which is working from command line), in this way:
system `awk -F ',' -v OFS=',' '$1 == \"0\" { $1 = "string" }1' myfile.csv > myfile_copy.csv
But this works only for the first column (in command line) and, other than having the disadvantage of having extra copy file, Perl is complaining for > redirection, assuming it as "greater than"...
system(q#awk 'BEGIN{FS=OFS=",";split("1 2 3 4 5",A," ") } { for(i in A)sub(0,"string",$A[i] ) }1' myfile.csv#);
This awk is working from command line, but only 5 columns. But not in Perl using #.
All the combinations of exec and eval have also been tested without success.
I have also tried passing to system each one of the awk components, as arguments, separated by commas, but did not find any valid way to pass the redirector (>), since Perl is rejecting it because of the mentioned reason.
Using another approach, I noticed that the "standalone zeroes" seem to be "swallowed" by the Text::CSV module, so I got rid of it and went back to traditional line-by-line looping over the CSV with a splitter on commas, preserving the zeroes that way. However, I ran into the "mystery" of isdual in Perl, and because of my limited set of modules, I cannot use the Dumper. Then I also explored the guts of binaries in Perl and tried $x ^ $x, which was deprecated since version 5.22 but valid until that version (as I said, mine is 5.6). This is useful to tell numbers from strings. However, while if( $x ^ $x ) returns TRUE for strings, if( !( $x ^ $x ) ) does not return TRUE when $x = 0. [UPDATE: I tried this in a dedicated Perl script, just for this purpose, and it works. I believe that my probably wrong conclusion ("not returning TRUE") dates from when I had not yet realized that Text::CSV was swallowing my zeroes. Doing new tests...].
I will appreciate very much your help!
MORE DETAILS ON MY REQUIREMENTS:
1) This is a dynamic report coming from a database which is handover to me and I pickup programmatically from a folder. Dynamic means that it might have whichever amount of tables, whichever amount of columns in each table, whichever names as column headers, whichever amount of rows in each table.
2) I do not know, and cannot know, the column names, because they vary from report to report. So, I cannot be guided by column names.
A sample input:
Alfa,Alfa1,Beta,Gamma,Delta,Delta1,Epsilon,Dseta,Heta,Zeta,Iota,Kappa
0,J5,alfa,0,111.33,124.45,0,0,456.85,234.56,798.43,330000.00
M1,0,X888,ZZ,222.44,111.33,12.24,45.67,0,234.56,0,975.33
3) Input Explanation
a) This is an example of a random report with 12 columns and 3 rows. The first row is the header.
b) I call "standalone zeroes" those "clean" zeroes which are coming in the CSV file, from second row onwards, between commas, like 0, (if the case is the first position in the row) or like ,0, in subsequent positions.
c) In the second row of the example you can read, from the beginning of the row: 0,J5,alfa,0, which in this particular case are "words" or "strings". In this case, 4 names (note that two of them are zeroes, which need to be treated as strings). Thus, we have an example with 4 name columns (Alfa,Alfa1,Beta,Gamma are the headers for those columns, but only in this scenario). From that point onwards, in the second row, you can see floating point (*.00) numbers and, among them, 2 zeroes, which are numbers. Finally, in the third line, you can read M1,0,X888,ZZ, which are the names for the first 4 columns. Note, please, that the 4th column in the second row has 0 as its name, while the 4th column in the third row has ZZ as its name.
Summary: as a general picture, I have a table-report divided in 2 parts, from left to right: 4 columns for names, and 8 columns for numbers.
Always the first M columns are names and the last N columns are numbers.
- It is unknown which number is M: which amount of columns devoted for words / strings I will receive.
- It is unknown which number is N: which amount of columns devoted for numbers I will receive.
- It is KNOWN that, after the M amount of columns ends, always starts N, and this is constant for all the rows.
I have done some quick research on Perl regex boundaries ( \b ), and I have not found any relevant information on whether they apply in Perl 5.6.
However, since you are using an old Perl version, try the traditional UNIX / Linux style (I mean, what Perl inherits from the shell), like this:
system("perl -i -ple 's/^0/string/g' myfile.csv");
The previous regex should do the job, making the change at the start of each line in your CSV file, if it matches.
Or, maybe better (if you have those "standalone" zeroes, and want avoid any unwanted change in some "leading zeroes" string):
system("perl -i -ple 's/^0,/string,/g' myfile.csv");
[Note that I have added the comma, after the zero; and, of course, after the string].
Note that the first regex should work; the second one is just a "caveat", to be cautious.
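A possible explanation that the thread itself does not confirm: in a double-quoted Perl string, \b is the backspace character, so system("... s/\b0\b/string/g ...") may hand the child perl a pattern containing backspaces instead of word boundaries, and doubling the backslashes (\\b) would be the usual way around that. Python's ordinary string literals treat \b the same way, so the effect can be shown in a small sketch; the helper name is made up for illustration.

```python
import re


def replace_standalone_zeroes(line, pattern):
    # `pattern` is a parameter so the two spellings can be compared.
    return re.sub(pattern, "string", line)


row = "0,J5,alfa,0,111.33"

# Raw string: the regex engine receives backslash + b, a word boundary.
print(replace_standalone_zeroes(row, r"\b0\b"))  # string,J5,alfa,string,111.33

# Ordinary string: "\b" is already one backspace character (0x08), so the
# pattern looks for backspace-0-backspace and nothing in the row matches.
print(replace_standalone_zeroes(row, "\b0\b"))  # 0,J5,alfa,0,111.33
```

This would also be consistent with the question's observation that s/input/output/g works from system(): that pattern contains no backslash escapes.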

Multiply a decimal number in vim by a fixed amount

I have a file with the following contents:
set x 0.00456 y 0.05896.
I want to multiply the digits by a fixed amount (let's say 1000). The numbers do not always exist in the same column so anything with awk is out of the picture. I have been trying this but I am not sure if the way I am using submatch is correct.
%s/ \d*\.\d*/\=submatch(2)*100
vim
Your submatch(2) usage is not correct. You don't have any matching groups, so you should use submatch(0).
Another problem in your code: you should first convert the string to a float, then do the calculation:
%s/\v\d+[.]\d+/\=str2float(submatch(0))*1000/g
awk
The numbers do not always exist in the same column so anything with
awk is out of the picture.
This is not true. You can check each column, if it matches the number format, you do the math calculation:
awk '{for(i=1;i<=NF;i++)if($i~/[0-9]+[.][0-9]+/)$i*=1000}7' file
You can also call the awk within your vim:
%!awk '{for(i=1;i<=NF;i++)if($i~/[0-9]+[.][0-9]+/)$i*=1000}7'
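The logic shared by the Vim and awk variants (find every decimal number wherever it sits, convert it to a float, multiply) can be sketched in Python as well. The function name and the choice of "%g" for output formatting are assumptions for illustration, not something the answer prescribes.

```python
import re


def scale_decimals(line, factor=1000):
    def scale(match):
        # Convert the matched text to a float before multiplying.
        # "%g" keeps at most six significant digits and drops trailing zeros.
        return "%g" % (float(match.group(0)) * factor)

    # Same shape as the Vim pattern: digits, a literal dot, digits.
    return re.sub(r"\d+\.\d+", scale, line)


print(scale_decimals("set x 0.00456 y 0.05896."))  # set x 4.56 y 58.96.
```

Because the substitution is position-independent, the column the number appears in does not matter. Note that "%g" switches to exponent notation for large results, so a different format would be needed for big values.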

Vim: Placing (,) in between CERTAIN high numbers Issue

source txt file:
34|Gurla Mandhata|7694|25243|2788|Nalakankar Himalaya|30°26'19"N
81°17'48"E|Dhaulagiri|1985|6 (4)|China
command input:
:%s/\(\d\+\)\(\d\d\d\)/\1,\2/g
command output:
34|Gurla Mandhata|7,694|25,243|2,788|Nalakankar Himalaya|30°26'19"N
81°17'48"E|Dhaulagiri|1,985|6 (4)|China
Desired output:
34|Gurla Mandhata|7,694|25,243|2,788|Nalakankar Himalaya|30°26'19"N
81°17'48"E|Dhaulagiri|1985|6 (4)|China
Basically 1985 is supposed to be 1985 and not 1,985. I tried to put a \? so every time the pattern matches it stops and a °+ after so it has to detect a ° to match the pattern, but no success. It just replaces the ° and everything before that, complete mess.
My knowledge of regular expressions however combined with the substitute is weak and I'm stuck here.
EDIT
the first 3 numbers represent heights of mountains, those 3 need to change with a (,) and the last number ( 1985 ) represents a year, which must not be changed.
Mathematical solutions are not going to work as a loophole, since there are mountains with a height of less than 1900
You haven't told us what is the difference between 1985 and other numbers, so I assumed that your "small" numbers are less than 2000.
You almost got it:
:%s/\v(\d*[2-90])(\d\d\d)/\1,\2/g
Alternatively if that isn't what you want, you can use c flag (:h s_flags):
:%s/\(\d\+\)\(\d\d\d\)/\1,\2/gc
this line will leave the last 3 columns untouched, just do substitution on the content before it:
%s/\v(.*)((\|[^|]*){3}$)/\=substitute(submatch(1),'\v(\d+)(\d{3})','\1,\2','g').submatch(2)/g
Note that the above line will change 1000000 into 1000,000 instead of 1,000,000. Vim's printf() doesn't support %'d, which is a pity. If you do have numbers > 1m, we can find other solutions.
update
I solved it myself, by using 3 separate commands; one for every number string in the file:
:%s/^\(\d*|[^|]*|\)\(\d\+\)\(\d\d\d\)|/\1\2,\3|/g
:%s/^\(\d*|[^|]*|\d\+,*\d*|\)\(\d\+\)\(\d\d\d\)|/\1\2,\3|/g
:%s/^\(\d*|[^|]*|\d\+,*\d*|\d\+,*\d*|\)\(\d\+\)\(\d\d\d\)|/\1\2,\3|/g
In case you want to use perl:
:%!perl -F'\|' -lane 'for(@F[2..4]) { s/(\d+)(\d{3})/$1,$2/;} print join "|", @F'
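The field-based approach (split on |, touch only the three height columns, join again) can be sketched in Python too. The column indexes 2 to 4 follow the Perl one-liner; the function name is hypothetical. Unlike a single regex pass, the "{:,}" format also groups numbers above one million correctly.

```python
import re


def group_heights(line, columns=(2, 3, 4), sep="|"):
    fields = line.split(sep)
    for i in columns:
        # Only rewrite fields that are plain integers; leave anything else alone.
        if i < len(fields) and re.fullmatch(r"[0-9]+", fields[i]):
            fields[i] = "{:,}".format(int(fields[i]))
    return sep.join(fields)


row = "34|Gurla Mandhata|7694|25243|2788|Nalakankar Himalaya|Dhaulagiri|1985|6 (4)|China"
print(group_heights(row))
# 34|Gurla Mandhata|7,694|25,243|2,788|Nalakankar Himalaya|Dhaulagiri|1985|6 (4)|China
```

The year column is never inspected, so 1985 stays as it is regardless of how small the heights are.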

Eclipse, regex search and replace

I have a file which contains text like:
1x
2x
5x
10x
50x
100x
.....
Using Eclipse search and replace with regex, how do I script to have an output like
x1
x2
x5
x10
x50
x100
...
That is to say, search for a regex, break it into fields (\d+x, thus \d+ and 'x' in my case), and reuse the field elements later to resubstitute as 'x'+'\d+'.
I looked at a previous question on same lines, but I want to take that a step further.
Thank you.
Search for
(\d+)x
and replace with
x\1
And enable "Regular expressions". It will put the 'x' in front of the number.
search: (\d+)(.+)
replace with : \2\1
or $2$1 whichever eclipse supports
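To see the capture-and-reorder idea outside Eclipse, here is a minimal Python sketch. Python's re module uses \1 for backreferences in the replacement, which may differ from the syntax Eclipse expects; the function name is made up for illustration.

```python
import re


def swap_number_and_x(text):
    # Group 1 captures the digits; the replacement emits "x" first, then the digits.
    return re.sub(r"(\d+)x", r"x\1", text)


print(swap_number_and_x("1x\n10x\n100x"))
# x1
# x10
# x100
```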

Regex for dollar amount between 3 and 50

I am close, but I need some help to complete a regex. Here is the goal:
Should succeed:
10.05
3.00
50
Should fail:
55.99 (>50)
3.001 (can't have the "1" at the end)
0.50 (< 3)
.99 (< 3)
$50 (can't have "$")
5.2 (if decimal, must have 2 digits after)
Here's the regex I have so far, but it doesn't quite do all the above correctly:
^([1-4][0-9]|50|[3-9])+(\.[0-9][0-9])?$
Can anyone share the answer? Thanks!
^(50(\.00)?|([1-4][0-9]|[3-9])(\.[0-9][0-9])?)$
There were two issues. Firstly, you had allowed non-zero values after the decimal point, even if the value before it was 50. So I separated that out on the top level. Secondly, just remove the +. Because due to it, you can have much larger numbers (by chaining 50 and 43 together, for instance).
However, as Bergi mentioned in a comment, it would be better to just check the format, and do the range check separately (without regex). This would be the format check:
^\d+(\.\d\d)?$
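The two-step approach (regex for the format, ordinary comparison for the range) could look like the following Python sketch. The function name, the use of Decimal, and the default bounds of 3 and 50 are assumptions based on the examples in the question.

```python
import re
from decimal import Decimal

# Format only: digits, optionally followed by a dot and exactly two decimals.
AMOUNT_FORMAT = re.compile(r"[0-9]+(\.[0-9][0-9])?")


def is_valid_amount(text, low=Decimal("3"), high=Decimal("50")):
    # Step 1: reject anything that is not shaped like an amount.
    if AMOUNT_FORMAT.fullmatch(text) is None:
        return False
    # Step 2: plain numeric range check, no regex involved.
    return low <= Decimal(text) <= high


for sample in ["10.05", "3.00", "50", "55.99", "3.001", "0.50", ".99", "$50", "5.2"]:
    print(sample, is_valid_amount(sample))
```

Keeping the bounds as parameters means a change of range does not require rewriting the pattern.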
I found an online utility that returns a regex for integers when you input the lower and upper limits of the range you want. I used it for the part before the . with limits 3-50 and for the part after the . with limits 0-99. Here is the result:
^0*([3-9]|[1-4][0-9]|50)(\.[0-9]{2})?$
A quick glance... just remove the +
^([1-4][0-9]|50|[3-9])(\.[0-9][0-9])?$
You should remove the + before the potential cents. Also, you will need to handle 50 as a special case, because it can only have .00 after it and not any cent amount.
Also, I changed the [0-9] to the shortcut for digits: \d
/^((0?[3-9]|[1-4]\d)(\.\d\d)?|50(\.00)?)$/