Vim Regex gives incorrect output - regex

I have a list of dates (YYYY-M or YYYY-MM) and want to prefix a 0 to the first 9 months for consistency. Data format: a date in YYYY-M or YYYY-MM, followed by a comma and a number.
E.g.:
2012-1,789
2012-11,563
2012-1,789 should be changed to 2012-01,789. The entry 2012-11,563 should remain unchanged.
Correct output should be:
2012-01,789
2012-11,563
I tried the following regular expression in Vim.
:%s/-\(\d\),/-0\0,/g
However, I get the following output:
2012-0-1,789
2012-11,563
Why am I getting an additional dash (-) between the two digits?

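Capturing groups are numbered from 1, not from 0. In Vim, \0 refers to the whole matched text (here -1,), which is why the dash shows up a second time.
So the command should be: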
:%s/-\(\d\),/-0\1,/g
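For comparison, here is a minimal Python sketch of the same substitution with the standard re module, using the sample lines from the question. Python's replacement syntax differs slightly from Vim's: groups are still \1, \2, ..., but the whole match is written \g<0> rather than \0.

```python
import re

lines = ["2012-1,789", "2012-11,563"]

# Capture the single month digit between the dash and the comma,
# then put a 0 in front of it; two-digit months never match.
padded = [re.sub(r"-(\d),", r"-0\1,", line) for line in lines]
print(padded)  # ['2012-01,789', '2012-11,563']

# \g<0> is the whole match ("-1,"), the Python counterpart of Vim's \0.
# Inserting it after "-0" copies the dash in a second time.
doubled = re.sub(r"-(\d),", r"-0\g<0>", "2012-1,789")
print(doubled)  # 2012-0-1,789
```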

Related

Replace or remove period/dot from expression

I have this expression in PCRE and I want to remove the . (period) from klantnummer.
Expression:
^h:\/Klant\/(?<klantnummer>[^\/]+)\/(?<folder1>[^\/]+)\/(?<folder2>[^\/]+)
Input:
h:/Klant/12345678.9/map 1/map 2
Outcome: 12345678.9
Desired result: 123456789
https://regex101.com/r/EVv47V/1
So klantnummer should yield 123456789 as the result.
You can't do that in one step, because a capture group always holds one contiguous part of the input. You could capture it in two capture groups:
^h:\/Klant\/(?<klantnummer1>[^\.\/]+)\.(?<klantnummer2>[^\/]+)\/(?<folder1>[^\/]+)\/(?<folder2>[^\/]+)
and then join them with string concatenation, or use two regex steps and filter out the period in the second, as stated in the comments.
The regex above assumes there is always a period. This version makes the period part optional, so it works for 0 or 1 period in the number:
^h:\/Klant\/(?<klantnummer1>[^\.\/]+)(?:\.(?<klantnummer2>[^\.\/]+))?\/(?<folder1>[^\/]+)\/(?<folder2>[^\/]+)
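A minimal Python sketch of the two-group approach, assuming Python's re module, which spells named groups (?P<name>...) instead of PCRE's (?<name>...). It uses a variant of the pattern in which the whole period part is optional, so klantnummer2 is simply unset when there is no period; the helper name klantnummer is hypothetical.

```python
import re

# Python's re writes named groups as (?P<name>...).
PATTERN = re.compile(
    r"^h:/Klant/(?P<klantnummer1>[^./]+)(?:\.(?P<klantnummer2>[^./]+))?"
    r"/(?P<folder1>[^/]+)/(?P<folder2>[^/]+)"
)

def klantnummer(path):
    """Join the parts before and after the optional period (hypothetical helper)."""
    m = PATTERN.match(path)
    if m is None:
        return None
    # klantnummer2 is None when the number contains no period.
    return m.group("klantnummer1") + (m.group("klantnummer2") or "")

print(klantnummer("h:/Klant/12345678.9/map 1/map 2"))  # 123456789
print(klantnummer("h:/Klant/12345678/map 1/map 2"))    # 12345678
```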
As already discussed, you can't do this in one step.
Either solution, using 2 regex stages or splitting klantnummer into 2 groups before and after the period, will work.
However, I believe the simplest and most efficient option, both in computing power and in code to write, is to replace . with an empty string '' after the regex match and before using the value.
You haven't said which programming language you are using, so I can't give you the exact syntax or an example.
If all you are doing is splitting the string on the slashes, you will probably find it easier to split it into an array.
For example, in Python:
s = "h:/Klant/12345678.9/map 1/map 2"
array = s.split('/')
Klantnummer=array[2].replace('.','')
folder1=array[3]
folder2=array[4]
print(Klantnummer)
print(folder1)
print(folder2)
output
123456789
map 1
map 2
Tested on https://www.online-python.com/
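The "match first, then remove the period" approach can be sketched the same way in Python. This assumes the original expression from the question, with the named groups rewritten as (?P<name>...) for Python's re module.

```python
import re

# The question's expression, with Python-style named groups.
PATTERN = re.compile(r"^h:/Klant/(?P<klantnummer>[^/]+)/(?P<folder1>[^/]+)/(?P<folder2>[^/]+)")

m = PATTERN.match("h:/Klant/12345678.9/map 1/map 2")
# Remove the period only after the regex has done its work.
klantnummer = m.group("klantnummer").replace(".", "")
print(klantnummer)         # 123456789
print(m.group("folder1"))  # map 1
print(m.group("folder2"))  # map 2
```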

Regex expression for date within dates range

I need to use a regex to validate a date in the format yyyy-mm-dd (e.g. 2019-12-31) that must fall within the range 2019-12-20 to 2020-01-10.
What would be the regex for this?
Thanks
A regex only deals with characters, so we have to work out which characters are valid at each position in the date.
The first part is easy: the first two characters have to be 20.
Now it gets complicated. The next character can be a 1 or a 2, but what follows depends on the value of that character, so we split the rest of the regex into two alternatives: the first for when the third character is 1 and the second for when it is 2.
If the third character is a 1, then what must follow is the characters 9-12-, because the range starts at 2019-12-20. Now for the day part: the 9th character is the tens digit of the day, and it can only be 2 or 3, because we are already in the last month and the minimum day is 20. The last character can be any digit 0-9. This gives us a day match of [23][0-9]. Putting this together, the pattern for years starting with 2019 is 19-12-[23][0-9].
If the third character is a 2, then we can again match everything up to the day part, as the range ends in January. This gives us a partial match of 20-01-, leaving us to work on the day part. Here the first character of the day can be either 1 or 0; however, if it's a 1, then the last character must be a 0, and if it's a 0, then the last character can only be in the range 1 to 9. This gives us another alternation, (?:0[1-9]|10). Putting the second part together, we get 20-01-(?:0[1-9]|10).
Combining these together gives the final regex 20(?:19-12-[23][0-9]|20-01-(?:0[1-9]|10))
Note that I'm assuming the date you are testing is a validly formatted, real date: [23][0-9] would also accept 2019-12-32 to 2019-12-39.
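One way to gain confidence in the final regex is to compare it against real calendar dates. This Python sketch (the helper regex_agrees is hypothetical) runs every day of 2019 and 2020 through re.fullmatch, which anchors the pattern at both ends, and checks that it matches exactly the days from 2019-12-20 to 2020-01-10. The last line illustrates the caveat about input that is not a real date.

```python
import re
from datetime import date, timedelta

# Final regex from the answer; fullmatch() anchors it at both ends.
RANGE_RE = re.compile(r"20(?:19-12-[23][0-9]|20-01-(?:0[1-9]|10))")
START, END = date(2019, 12, 20), date(2020, 1, 10)

def regex_agrees(first, last):
    """Return True if the regex matches exactly the dates in START..END."""
    day = first
    while day <= last:
        expected = START <= day <= END
        if bool(RANGE_RE.fullmatch(day.isoformat())) != expected:
            return False
        day += timedelta(days=1)
    return True

print(regex_agrees(date(2019, 1, 1), date(2020, 12, 31)))  # True
# Not a real date, but the pattern still accepts it.
print(bool(RANGE_RE.fullmatch("2019-12-35")))  # True
```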
Try this:
(2019|2020)\-(12|01)\-([0-3][0-9]|[0-9])
But be aware that, for the dd value, this allows any number whose first digit is 0 to 3 and whose second digit is 0 to 9 (00 to 39). You could instead list every day you want to allow (20 through 31, then 01 through 10, with or without the leading zero) like this: (20|21|22|23|24|25|26|27|28|29|30|31|01|1|02|2|03|3|04|4|05|5|06|6|07|7|08|8|09|9|10).
(2019|2020)\-(12|01)\-(20|21|22|23|24|25|26|27|28|29|30|31|01|1|02|2|03|3|04|4|05|5|06|6|07|7|08|8|09|9|10)
But honestly... regular expressions are not the right tool for this. A regex describes a pattern of characters, not a logical relationship between values. Use a regex to extract the values from a string, and validate those values in your programming language.
The second regex above will, for example, match your dates, but also values outside this range, because there is no link between the year group 2019|2020 and the month group 12|01. It matches values like 2019-01-05 and also 2020-12-25.
To match only the values you want, you would need a really large regex like this (inner brackets only if you need them): ((2019)-(12)-(20)|(2019)-(12)-(21)|(2019)-(12)-(22)|...), continuing with every possible date. Then ask yourself: what would you do if you found such a regex in a project you had to work on? ;)
A better solution (quick and dirty; there may be better ones):
(?<yyyy>20[0-9]{2})\-(?<mm>[01][0-9]|[0-9])\-(?<dd>[0-3][0-9]|[0-9])
This gives you three named groups (yyyy, mm, dd) whose matched values you can access and validate in code. The regex is smaller, the link between code and regex is clearer, and both are easier to maintain.
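A minimal Python sketch of that extract-then-validate idea, assuming Python's re and datetime modules. The pattern is the one above with the named groups written (?P<name>...), and the function name in_range is hypothetical. Building a datetime.date rejects impossible values such as day 35, and the range check is then a plain comparison.

```python
import re
from datetime import date

# The pattern above, with Python-style named groups.
DATE_RE = re.compile(r"(?P<yyyy>20[0-9]{2})-(?P<mm>[01][0-9]|[0-9])-(?P<dd>[0-3][0-9]|[0-9])")
START, END = date(2019, 12, 20), date(2020, 1, 10)

def in_range(text):
    """Extract with the regex, then validate the values in code (hypothetical helper)."""
    m = DATE_RE.fullmatch(text)
    if m is None:
        return False
    try:
        # date() raises ValueError for impossible values such as month 13 or day 35.
        d = date(int(m["yyyy"]), int(m["mm"]), int(m["dd"]))
    except ValueError:
        return False
    return START <= d <= END

results = {s: in_range(s) for s in ["2019-12-31", "2020-01-10", "2020-01-11", "2019-12-35", "2020-12-25"]}
print(results)
```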

Obtaining geographic decimal coordinates from proprietary text format using regex

Using only Notepad++ with its regex support, I would like to extract some data from a txt file representing geographic coordinates and format the output like this:
-123456789 becomes -123.456789
123456789 becomes 123.456789
-23456789 becomes -23.456789
56789 becomes 0.056789
-89 becomes -0.000089
I tried (-?)([0-9]*)([0-9]{6}), but it fails when the input is shorter than 6 digits.
You will need 2 steps in Notepad++ to do this. First, let's take a look at the regex:
(?<sign>-?)(?<first>\d+(?=\d{6}))?(?<last>\d+)
This captures the necessary parts in groups.
Explanation: (you can lose the named grouping if you want)
(?<sign>-?) # read the '-' sign
(?<first>\d+(?=\d{6}))? # read as many digits as possible,
# leaving 6 digits at the end.
(?<last>\d+) # read the remaining digits.
See regex101.com.
How do you use this in Notepad++? With a two-step search and replace. First, search for:
(-?)(\d+(?=\d{6}))?(\d+)
replace with:
\1(?2\2.:0.)000000\3 # copy sign, if group 2 contains any
# values, copy them, followed by '.'.
# If not show a '0.'
# Print 6 zeros, followed by group 3.
Next, replace the superfluous zeros.
\.(0+(?=\d{6}\b))(\d{6}) # Replace the maximum number of zeros
# leaving 6 digits at the end.
replace with:
.\2
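Python's re module has no equivalent of the Boost-style conditional (?2\2.:0.) used in the replacement above, but a replacement function can play the same role. This is a sketch of the same two-step idea, assuming Python 3.6+ (for m["name"] group access); the names step1 and convert are hypothetical.

```python
import re

# Step 1 pattern from the answer, with Python-style named groups.
NUM_RE = re.compile(r"(?P<sign>-?)(?P<first>\d+(?=\d{6}))?(?P<last>\d+)")

def step1(m):
    # Stand-in for the conditional (?2\2.:0.): use group "first" plus a dot
    # if it matched, otherwise "0.". Then add 6 zeros and group "last".
    head = m["first"] + "." if m["first"] else "0."
    return m["sign"] + head + "000000" + m["last"]

def convert(text):
    text = NUM_RE.sub(step1, text)
    # Step 2: drop the superfluous zeros, keeping 6 digits after the dot.
    return re.sub(r"\.(0+(?=\d{6}\b))(\d{6})", r".\2", text)

print(convert("-123456789\n123456789\n-23456789\n56789\n-89"))
```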
You can do it in three steps:
Step 1: replace (-?)\b(\d{1,6})\b with \10000000\2
Step 2: replace (-?)(\d{0,})(\d{6}) with \1\2.\3
Step 3: replace \b0{2,}\. with 0. (the \b keeps a value such as 100.000000 from losing a zero)
The idea is simple:
In step one, prefix every number of 1 to 6 digits with zeros, so that every number is longer than 6 digits.
In step two, put the dot before the last 6 digits.
In step three, replace the run of zeros before the dot with a single zero.
The final output is:
-123.456789
123.456789
-23.456789
0.056789
-0.000089
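The three steps translate directly into three re.sub calls in Python. This sketch makes two assumptions beyond the answer: step 1 is written \g<1>000000\2, because Python would read \10 as a reference to group 10, and step 3 is anchored with \b so that a value such as 100000000 keeps its zeros (an unanchored 0{2,}\. would turn 100.000000 into 10.000000). The function name to_decimal is hypothetical.

```python
import re

def to_decimal(text):
    # Step 1: prefix numbers of 1 to 6 digits with six zeros.
    text = re.sub(r"(-?)\b(\d{1,6})\b", r"\g<1>000000\2", text)
    # Step 2: put the dot before the last 6 digits of every number.
    text = re.sub(r"(-?)(\d*)(\d{6})", r"\1\2.\3", text)
    # Step 3: collapse a run of leading zeros before the dot to a single 0.
    return re.sub(r"\b0{2,}\.", "0.", text)

print(to_decimal("-123456789\n123456789\n-23456789\n56789\n-89"))
print(to_decimal("100000000"))  # 100.000000
```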
You could use a Python Script plugin available for notepad++:
editor.rereplace(r'(\d+)', lambda m: ('%f' % (float(m.group(1))/1000000)))
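Outside Notepad++, where the plugin's editor object is not available, the same idea can be sketched with the standard re module. This sketch swaps float for decimal.Decimal, a choice not in the original, so every value is scaled exactly rather than going through binary floating point; the function name scale is hypothetical.

```python
import re
from decimal import Decimal

def scale(text):
    """Divide every run of digits by 10**6, keeping six decimals (hypothetical helper)."""
    # scaleb(-6) only shifts the decimal exponent, so 56789 becomes 0.056789 exactly.
    # The minus sign is not part of \d+, so it stays in front of the result.
    return re.sub(r"\d+", lambda m: format(Decimal(m.group(0)).scaleb(-6), "f"), text)

print(scale("-123456789\n123456789\n-23456789\n56789\n-89"))
```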

Regex selecting the last 6 numbers of

I am a noob at regex, and I've been trying to select 6 digits from within a file and then replace those 6 digits with the same digits plus a comma and a new line (to make a CSV, obviously).
Anyway, the sample data is simply nonsense like this:
fafksadjlkgtjafglkj210000adsfaklgjadklgjag3600001skfjaklaj093i393593390000002sadfljafkjgakjgasafksadjlkgtjafglkj£94.00 489438adsfaklgjadklgjag7700001skfjaklaj093i393593390000002ssafksa djlkgtjafglkj000000adsfaklgjadklgjag0000001skfj aklaj093i393593£39.00900002ssafksadjlk gtjafglkj000000adsfaklgjadklgjag0000001skfjaklaj093i3935£933.90000002s
Note that some of the numbers are attached to currency values as well (and some are next to one but have a space before them), but the end will always be 6 digits (consider them random, as I can't see a pattern).
So I basically need to select runs of digits that are six or more digits long; if a run is longer, only its last 6 digits should be used.
Then I will replace it with itself and a comma and new line.
I hope that makes sense; I've tried a few things without success.
Thanks. Edit: the closest I have is:
(\d)\d{6}(?!\d)
In the Find what: text field, type in (\d{6})(\D). In the Replace with: text field, type in $1\r\n$2. Make sure that the regular expression radio button is selected. For your input, that should yield this:
fafksadjlkgtjafglkj210000
adsfaklgjadklgjag3600001
skfjaklaj093i393593390000002
sadfljafkjgakjgasafksadjlkgtjafglkj£94.00 489438
adsfaklgjadklgjag7700001
skfjaklaj093i393593390000002
ssafksa djlkgtjafglkj000000
adsfaklgjadklgjag0000001
skfj aklaj093i393593
£39.00900002
ssafksadjlk gtjafglkj000000
adsfaklgjadklgjag0000001
skfjaklaj093i3935£933.90000002
s
You want
\d{6}(?=\D*$)
i've been trying to select 6 numbers from within a file and then replace those 6 numbers with the same numbers plus , new line
So you're basically trying to do this, right?
Find:
(\d{6})(\D)
Replace:
\1\n\2
How about:
Find what: (\d{6,})(?:\D*)$
Replace with: $1,\n
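For what the question literally asks (the last 6 digits of every run of 6 or more digits, followed by a comma and a new line), a minimal Python sketch can use a negative lookahead. The pattern \d{6}(?!\d) is the asker's attempt without the leading (\d), so it needs only 6 digits; it is a variant, not one of the answers above, and the function name to_csv_lines is hypothetical.

```python
import re

def to_csv_lines(text):
    # \d{6}(?!\d): six digits not followed by another digit,
    # i.e. the last 6 digits of any run of 6 or more digits.
    return re.sub(r"\d{6}(?!\d)", lambda m: m.group(0) + ",\n", text)

sample = "fafksadjlkgtjafglkj210000adsfaklgjadklgjag3600001skfj"
print(to_csv_lines(sample))
```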

Why my regex is failing for single digits but working for double digits?

I need to validate a string containing two numbers separated by a dash (-) or a comma (,). Valid values are:
23.98-34.76 or 23.98,34.76
23-34 or 23,34
5-6 or 5,6
I have the following regex, which is a slight modification of an answer I received here on SO. It covers the 1st and 2nd cases above but not the third case, which involves single digits only.
The modified regex string that I am using is:
(\d+\.?\d+?)([-,])(\d+\.?\d+?)
Where did my regex go wrong?
The correct regex should be:
(\d+(\.\d+)?)[-,](\d+(\.\d+)?)
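Your regex fails on 5-6 because \d+? is only lazy: it still needs at least one digit, so together with the leading \d+ every number must have at least two digits.
In the corrected regex, a period, if present, is always followed by 1 or more digits. A quicker fix such as replacing \d+? with \d* would accept single digits, but it would also match strings like 123.,789.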
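A quick way to see the difference is to run both patterns over the sample values with Python's re.fullmatch, which requires the whole string to match (like wrapping the pattern in ^...$). The extra value 1.5-2 is an added example, not from the question.

```python
import re

ORIGINAL = re.compile(r"(\d+\.?\d+?)([-,])(\d+\.?\d+?)")
CORRECTED = re.compile(r"(\d+(\.\d+)?)[-,](\d+(\.\d+)?)")

samples = ["23.98-34.76", "23.98,34.76", "23-34", "23,34", "5-6", "5,6", "1.5-2"]

# For each sample: (matches original?, matches corrected?)
results = {s: (bool(ORIGINAL.fullmatch(s)), bool(CORRECTED.fullmatch(s))) for s in samples}
for s, (old, new) in results.items():
    print(f"{s:12} original={old} corrected={new}")
```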