sed making digit optional - regex

I am attempting to replace a date in the format 08/09/2014 but at the same time also the format 8/9/14 using sed. I know the + sign is supposed to match one or more occurrences, and ? 0 or more. I've tried both but none of the dates are being replaced with "testing". I was expecting this would find 1 or more digits followed by a slash, 1 or more digits followed by a slash, 4 digits.
Do I need to escape the special character, or what is wrong here?
sed -f mySed.sed dates.csv
# mySed.sed file
s#[0-9]+/[0-9]+/[0-9][0-9][0-9][0-9]#testing#g
# sample line in dates.csv
...,20/01/2001,2/1/2009,...

You have made several mistakes. Here is a working example:
echo '20/01/2001,2/1/2009' | sed 's~[0-9]\{1,2\}/[0-9]\{1,2\}/\([0-9]\{2\}\)\{1,2\}~toto~g'
Note that ? means "optional" (in other words, 0 or 1 times). In sed's default basic regular expressions it must be escaped as \? (a GNU extension).
To be more precise, I have chosen to use the quantifier \{m,n\} instead of +. But if you use +, don't forget to escape it as \+; otherwise it will be treated as a literal character.

You need to escape the + quantifiers in your regular expression, and you can use a range for the last set.
s#[0-9]\+/[0-9]\+/[0-9]\{2,4\}#testing#g
Or you can use the range quantifier throughout your pattern.
s#[0-9]\{1,2\}/[0-9]\{1,2\}/[0-9]\{2,4\}#testing#g
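For comparison, here is a minimal sketch of the same substitution in Python rather than sed. Python's re module treats + and {m,n} as quantifiers without backslashes (much like sed's extended mode), so this only illustrates the pattern logic, not sed's escaping rules. The names DATE and mask_dates are made up for the example.

```python
import re

# One or two digits, slash, one or two digits, slash, then a 2- or 4-digit year.
# In sed's basic syntax the same quantifiers are written \{1,2\} and \(...\).
DATE = re.compile(r"[0-9]{1,2}/[0-9]{1,2}/(?:[0-9]{2}){1,2}")

def mask_dates(line: str) -> str:
    """Replace every date-like field with the word 'testing'."""
    return DATE.sub("testing", line)

print(mask_dates("a,20/01/2001,2/1/2009,8/9/14,b"))
# prints: a,testing,testing,testing,b
```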

Swap minus sign from after the number to in front of the number using SED (and Regex)

I've got a text-file with the following line:
201174480 11-01-1911 J Student 25-07 11585 2 0 SPOED BEZORGEN 1ST 25,00
320819019 11-01-1911 T. Student 28-07 13561 1 15786986 DESLORATADINE TABL OMH 5MG 60ST 3,60
706059901 11-01-1911 ST Student-Student 30-06 14956 1 15356221 METOPROLOLSUCC RET T 100MG 180ST 12,90-
I want to change this line with SED into:
201174480 11-01-1911 J Student 25-07 11585 2 0 SPOED BEZORGEN 1ST 25,00
320819019 11-01-1911 T. Student 28-07 13561 1 15786986 DESLORATADINE TABL OMH 5MG 60ST 3,60
706059901 11-01-1911 ST Student-Student 30-06 14956 1 15356221 METOPROLOLSUCC RET T 100MG 180ST -12,90
So I want to move the minus sign so that I get -12,90 instead of 12,90- with SED. I tried:
try 1:
sed 's/\([0-9.]\+\)-/-\1/g' file.txt > file1.txt
try 2:
sed 's/\([0-9].\+\)-$/-\1/g' file.txt > file1.txt
So there must be something wrong with the regex, but I do not really understand it. Please help.
You may use
sed 's/\([0-9][0-9,.]\+\)-\($\|[^0-9]\)/-\1\2/g'
The point is that after matching a number and a - (see \([0-9][0-9,.]\+\)-), there should come either end of string or non-digit (\($\|[^0-9]\)). Thus, we have 2 capturing groups now, and that is why we need a second backreference in the replacement pattern (\2).
I added a dot . to the bracket expression just in case you have mixed number formats, you may remove it if you always have a comma as the decimal separator.
Pattern details:
\([0-9][0-9,.]\+\) - Group 1 capturing
[0-9] - a digit
[0-9,.]\+ - one or more digits, commas or dots
- - a literal hyphen
\($\|[^0-9]\) - Group 2 capturing the end of string $ or a non-digit ([^0-9])
In your example, both files are identical, but I think I know what you mean.
For this particular file, you want to match a space, followed by zero or more digits, followed by a comma, followed by at least one digit, followed by a dash, followed by zero or more spaces to the end of the line.
Then you want to replace the space in front of the matched digits and the comma with a dash. This will do the trick:
sed -e 's/ \([0-9]*,[0-9][0-9]*\)- *$/-\1/' <file.txt >file1.txt
Your first regular expression attempts to match against a string of numbers and .s, but the text contains a comma, not a .. It does the substitution you want if you replace [0-9.] with [0-9,], giving:
sed 's/\([0-9,]\+\)-/-\1/g' file.txt > file1.txt
However, it also replaces 25-07 in that case with -2507. I suggest you explicitly match against the end of the line:
sed 's/\([0-9,]\+\)-$/-\1/g'
or alternatively, you can demand that the match contains exactly one comma:
sed 's/\([0-9]\+,[0-9]\+\)-$/-\1/g'
I also find these things easier to read if you use the -r option to sed (GNU sed; -E is the more portable spelling), which enables "extended regular expressions":
sed -r 's/([0-9]+,[0-9]+)-$/-\1/g'
Fewer special characters need to be escaped (on the other hand, more literal characters need to be escaped, but I find that tends to be a rarer occurrence).
(Aside: note that . usually means "any character", but inside a character class [.] it means "literally a .", since after all having it mean "any character" in there would be pretty useless.)
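As a cross-check of the end-of-line approach, here is a small sketch in Python instead of sed. It uses the same pattern as the -r version above. The names TRAILING_MINUS and move_minus are illustrative only.

```python
import re

# Digits, exactly one comma, more digits, then a minus at the end of the line.
TRAILING_MINUS = re.compile(r"([0-9]+,[0-9]+)-$")

def move_minus(line: str) -> str:
    """Move a trailing minus sign in front of the amount it follows."""
    return TRAILING_MINUS.sub(r"-\1", line)

sample = "30-06 14956 1 15356221 METOPROLOLSUCC RET T 100MG 180ST 12,90-"
print(move_minus(sample))
# prints: 30-06 14956 1 15356221 METOPROLOLSUCC RET T 100MG 180ST -12,90
```

Because the pattern is anchored with $, fields such as 30-06 or 11-01-1911 earlier in the line are left alone.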

Exclude the last character of a regex match

I have the following regex:
%(?:\\.|[^%\\ ])*%([,;\\\s])
That works great but obviously it also highlights the next character to the last %.
I was wondering how could I exclude it from the regex?
For instance, if I have:
The files under users\%username%\desktop\ are:
It will highlight %username%\ but I just want %username%. On the other hand, if I leave the regex like this:
%(?:\\.|[^%\\ ])*%
...then it will match this pattern that I don't want to:
%example1%example2%example3
Any idea how to exclude the last character in the match through a regex?
%(?:\\.|[^%\\ ])*%(?=[,;\\\s])
Use a lookahead: the (?=...) near the end of the pattern. What you need here is a zero-width assertion, which checks the following character without including it in the match.
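A quick way to see the effect, sketched in Python (whose re module supports the same lookahead syntax). The variable names are made up for the example.

```python
import re

# Same pattern as above; the lookahead requires a following , ; \ or
# whitespace character but does not consume it.
VAR = re.compile(r"%(?:\\.|[^%\\ ])*%(?=[,;\\\s])")

text = r"The files under users\%username%\desktop\ are:"
print(VAR.findall(text))                           # ['%username%']
print(VAR.findall("%example1%example2%example3"))  # []
```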
You can use a more efficient regex than the one you are currently using. When alternation is combined with a quantifier, unnecessary backtracking is involved.
If the strings you have are short, it is OK to use. However, if they can be a bit longer, you may need to "unroll" the expression.
Here is how it is done:
%[^"\\%]*(?:\\.[^"\\%]*)*%
Regex breakdown:
% - initial percentage sign
[^"\\%]* - start of the unrolled pattern: 0 or more characters other than a double quote, backslash and percentage sign
(?:\\.[^"\\%]*)* - 0 or more sequences of...
\\. - a literal backslash followed by any character other than a newline
[^"\\%]* - 0 or more characters other than a double quote, backslash and percentage sign
% - trailing percentage sign
See this demo - 6 steps vs. 30 steps with your %(?:\\.|[^" %\d\\])*%.
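To check that unrolling changes only the shape of the pattern and not what it matches, here is a small Python sketch. It compares the unrolled pattern with an alternation-based pattern using the same character class. The sample string is invented, and no step counts are measured here.

```python
import re

# Alternation inside a quantified group (the slower shape).
ALTERNATION = re.compile(r'%(?:\\.|[^"\\%])*%')
# Unrolled form: normal characters, then (escaped char + normal characters)*.
UNROLLED = re.compile(r'%[^"\\%]*(?:\\.[^"\\%]*)*%')

sample = r"path %user\%name% and %x% end"
print(ALTERNATION.findall(sample))
print(UNROLLED.findall(sample))
# both print: ['%user\\%name%', '%x%']
```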

Matching a string with a known prefix and suffix with regex using Grep

I'm trying to match all strings with a known prefix and a mostly known suffix.
The prefix will be any 3 uppercase characters.
The suffix will be one uppercase C and zero or one numbers afterward.
ex. C or Cx where x is any number
The middle substring is of unknown length and is uppercase letters only.
Examples:
GORABJKAC3 [match]
GORCCCCC [match]
GORBBBBCCC [match]
GORBBBBCA [no match]
BORBBBBCA2 [no match]
I tried something like grep ^GOR[:upper:]*C[:digit:]* but that doesn't work.
I think [:upper:] may just consume all uppercase letters, including the suffix C I want to match.
How can I match my desired string with regex using grep?
You can use this regex:
\b[A-Z]{3}.*?C[0-9]?\b
Or using anchor (if these strings are on separate lines):
^[A-Z]{3}.*?C[0-9]?$
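Use [A-Z] instead of [:upper:] and [0-9] instead of [:digit:] (POSIX classes only work inside a bracket expression, as in [[:upper:]] and [[:digit:]]).
Also, * means 0 or more, + means 1 or more, and ? means 0 or 1. I think you want to be using + and ?.
+ and ? are not special in grep's default basic regular expressions, so add the -P flag (Perl-compatible) or the -E flag (extended) to your grep command.
So the final command, quoted and anchored at the end so that GORBBBBCA is rejected: grep -P '^GOR[A-Z]+C[0-9]?$'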
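The anchored pattern from the first answer can be checked against the question's examples with a short Python sketch (Python is used here only as a convenient test harness; the names PATTERN and is_match are illustrative).

```python
import re

# Three uppercase letters, anything in between, then C and an optional digit
# at the very end of the string.
PATTERN = re.compile(r"^[A-Z]{3}.*?C[0-9]?$")

def is_match(s: str) -> bool:
    return PATTERN.search(s) is not None

for s in ["GORABJKAC3", "GORCCCCC", "GORBBBBCCC", "GORBBBBCA", "BORBBBBCA2"]:
    print(s, is_match(s))
# The first three print True, the last two print False.
```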

GPA regex in Perl

I'm attempting to make a GPA validation regex in Perl, and I seem to have something wrong with my logic. You should be able to enter a digit 0-3 followed by a . and one more digit in the range 0-9, or, if the first digit is a 4, it must be followed by .0. Here's my code:
$get_gpa_input =~ m/[0-3]\.\d[0-9]|[4].[0]/
m/(?: [0-3] [.] [0-9] ) | 4[.]0 /x
Remove the [0-9]. You've also got some extra brackets and you should escape the decimal in '4.0'.
$get_gpa_input =~ m/[0-3]\.\d|4\.0/
If you are doing validation, you don't want to search within a string but rather to force the entire string to match your regex; you do this by adding anchors to the beginning and ending:
/\A (?: [0-3]\.[0-9] | 4\.0 ) \z/x
\A matches only before the first character of the string, \z matches only after the last character of the string.
Avoid using \d in most code since it can match any number of Unicode "digits" that aren't 0 through 9 (though in newer perls, the /a flag reverts it to its old ASCII meaning).
You have \d[0-9] which would require two digits following 0-3. You also don't escape the decimal in the 4 alternate, which may make a difference.
[0-3]\.\d|4\.0
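The difference that anchoring makes can be sketched in Python, where re.fullmatch plays the role of wrapping the pattern in \A ... \z. This is an illustration under that assumption, not a translation of the Perl code. The name is_valid_gpa is made up.

```python
import re

# 0.0 through 3.9, or exactly 4.0; [0-9] is used instead of \d on purpose.
GPA = re.compile(r"[0-3]\.[0-9]|4\.0")

def is_valid_gpa(text: str) -> bool:
    # fullmatch requires the whole string to match, like \A ... \z in Perl.
    return GPA.fullmatch(text) is not None

for s in ["3.5", "4.0", "4.1", "3.55", "13.5", "4x0"]:
    print(s, is_valid_gpa(s))
# Only 3.5 and 4.0 print True.
```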

Regex for password that requires one numeric or one non-alphanumeric character

I'm looking for a rather specific regex and I almost have it but not quite.
I want a regex that will require at least 5 characters, where at least one of those characters is either a numeric value or a non-alphanumeric character.
This is what I have so far:
^(?=.*[\d]|[!##$%\^*()_\-+=\[{\]};:|\./])(?=.*[a-z]).{5,20}$
So the problem is the "or" part. It will allow non-alphanumeric values, but still requires at least one numeric value. You can see that I have the or operator "|" between my require numerics and the non-alphanumeric, but that doesn't seem to work.
Any suggestions would be great.
Try:
^(?=.*(\d|\W)).{5,20}$
A short explanation:
^ # match the beginning of the input
(?= # start positive look ahead
.* # match any character except line breaks and repeat it zero or more times
( # start capture group 1
\d # match a digit: [0-9]
| # OR
\W # match a non-word character: [^\w]
) # end capture group 1
) # end positive look ahead
.{5,20} # match any character except line breaks and repeat it between 5 and 20 times
$ # match the end of the input
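Here is a short Python sketch that exercises this lookahead pattern. One caveat it makes visible: \W does not include the underscore, so a password whose only non-letter is _ is rejected. The names PASSWORD and accepts are illustrative.

```python
import re

# At least one digit or non-word character somewhere, 5 to 20 characters total.
PASSWORD = re.compile(r"^(?=.*(\d|\W)).{5,20}$")

def accepts(candidate: str) -> bool:
    return PASSWORD.search(candidate) is not None

for s in ["abcde", "abcd1", "abc!e", "ab_cd", "a1"]:
    print(s, accepts(s))
# abcd1 and abc!e print True; the others print False.
```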
Perhaps this may work for you:
^.*[\d\W]+.*$
And use some code like this to check string size:
if (str.len >= 5 && str.len <= 20 && regex.ismatch(str, "^.*[\d\W]+.*$")) { ... }
Is it really necessary to stuff everything in a giant regex? Just use program logic (5 ≤ length(s) ≤ 20) ∧ (/[[:digit:]]/ ∨ /[^[:alpha:]]/). Far more readable syntactically and semantically, I think.
Pretty simple solution, once S.Mark got me on the right track, just needed to merge my numeric and non-alphanumeric pieces as one.
Here's the final regex for anyone that's interested:
^(?=.*[\d!##$%\^*()_\-+=\[{\]};:|\./])(?=.*[a-z]).{5,20}$
This will allow any password between 5 and 20 characters and requires at least one letter and one numeric and/or one non-alphanumeric character.
How about like this?
^.*?[\d!##$%\^*()_\-+=\[{\]};:|\./].*$
For the 5 to 20 length check, use a normal strlen function.
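The "plain program logic" suggestion made in several answers could look like the following Python sketch. It assumes the requirement is 5 to 20 characters with at least one character that is not an ASCII letter (a digit already counts as a non-letter, so one character class is enough). The function name and default limits are hypothetical.

```python
import re

def is_acceptable(password: str, min_len: int = 5, max_len: int = 20) -> bool:
    """Length check in code, character-class check in a tiny regex."""
    if not (min_len <= len(password) <= max_len):
        return False
    # Any character outside A-Z and a-z: digits, punctuation, underscore, ...
    return re.search(r"[^A-Za-z]", password) is not None

for s in ["abcde", "abcd1", "ab_cd", "a1"]:
    print(s, is_acceptable(s))
# abcd1 and ab_cd print True; abcde and a1 print False.
```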