I need to write a regular expression that, once it has seen "aaa" in a line, returns only the 6-digit code rather than the entire line. There is only one 6-digit code per line, and it always comes after "aaa".
I can't use sed, awk, grep, etc. My application only accepts a regex.
Examples:
x aaa y z 123456 returns 123456
aaa x 654321 y z returns 654321
I tried this regex with a backreference, though I'm not sure how to avoid repeating [\d]{6}:
(.*)(aaa)(.*)[\d]{6}((?(2)[\d]{6}|.+)
but it prints the entire line.
Any suggestions?
You could do something like
aaa.+?(\d{6})
and then return only the first group (with \1).
You could also use a lookbehind with a different regex:
(?<=aaa.+?)\d{6}
This matches the first 6 digits that come after aaa and at least one other character. Unfortunately, many languages don't support variable-length lookbehinds, so I'd go with the first one.
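As a rough illustration of the capture-group approach, here is a minimal Python sketch. Python is only an assumption here (the question does not name an engine), and `extract_code` is a hypothetical helper name. It also shows that Python's `re` module is one of the engines that refuses the `(?<=aaa.+?)` form.

```python
import re

# Pattern from the answer above: "aaa", a lazy gap, then a captured 6-digit code.
CODE_AFTER_AAA = re.compile(r"aaa.+?(\d{6})")


def extract_code(line):
    """Return the 6-digit code that follows "aaa", or None if there is none."""
    match = CODE_AFTER_AAA.search(line)
    return match.group(1) if match else None


print(extract_code("x aaa y z 123456"))  # 123456
print(extract_code("aaa x 654321 y z"))  # 654321

# Python's re module rejects the variable-length form, so compiling it
# raises re.error instead of returning a pattern object.
try:
    re.compile(r"(?<=aaa.+?)\d{6}")
except re.error as exc:
    print("variable-length form rejected:", exc)
```

If your application can only return the whole match and not a group, the capture-group version will not be enough on its own.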
I have a file that has lines that contain text like this
something,12:3456789,somethingelse
foobar,12:345678,somethingdifferent
For lines where the second item has 6 digits after the :, I would like to alter its format by adding a 0 at the front and shifting the :. For example, the above would change to:
something,12:3456789,somethingelse
foobar,01:2345678,somethingdifferent
I can't figure out how to do this using sed or any other Unix command-line tool.
You just need to match the middle section where you have 2 digits followed by : followed by exactly 6 digits. If you capture the text in individual groups appropriately you can move them around in your result. Note the \b word boundary at the end of the pattern is to ensure that we match on exactly 6 digits and don't match on lines which have the full 7 digits:
/\b(\d)(\d):(\d{6})\b/0\1:\2\3/
 |__________________| |______|
       pattern       replacement
This gives the expected output.
sed doesn't have Perl-style specifiers such as \d. Instead, you will need to use [[:digit:]]. Here is the updated regex that works with sed:
sed -E 's/\b([[:digit:]])([[:digit:]]):([[:digit:]]{6})\b/0\1:\2\3/g' myfile.txt
As @Jonathan Leffler pointed out, \b doesn't work in Mac's sed, so you will instead need to add commas at the front and back of your regex pattern and then put them back in the replacement pattern.
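For readers who want to try the substitution outside sed, here is a small Python sketch of both variants: the word-boundary pattern and the comma-anchored one. The function names are made up for illustration, and the comma variant assumes the field always sits between two commas, as in the sample lines.

```python
import re


def shift_colon(line):
    """Rewrite NN:DDDDDD as 0N:NDDDDDD; the \\b anchors skip 7-digit fields."""
    return re.sub(r"\b(\d)(\d):(\d{6})\b", r"0\1:\2\3", line)


def shift_colon_commas(line):
    """Same idea without \\b: match the surrounding commas and put them back."""
    return re.sub(r",(\d)(\d):(\d{6}),", r",0\1:\2\3,", line)


for sample in ("something,12:3456789,somethingelse",
               "foobar,12:345678,somethingdifferent"):
    print(shift_colon(sample))
```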
I want a pattern that matches on
ab
a-b
a b
a b
a-b
where a and b can be any pattern, but are reduced to a and b for simplicity.
I want to return "ab" in all these cases. Can I do it all by regex or do I have to receive the matched expressions along with the separator characters and process them in code, by replacing the said characters and the like?
I might have misunderstood your meaning; if so, I'm sorry about it.
You can group things in a regexp with parentheses ().
For example, in your case:
(a)(-|\s+)?(b)
You can later use \1 and \3 to refer to a and b, so \1\3 would mean ab.
Note that some tools may need \\1\\3 instead.
Check the docs of your language to find out the exact regexp rules.
I'm not sure where you will use this, so here I use sed as an example:
$ echo -e "ab\na-b\na b\na b\n"|sed -E 's/^(a)(-| +)?(b)$/\1\3/'
ab
ab
ab
ab
Note the regex used here is ^(a)(-| +)?(b)$; the ^ and $ match the beginning and end of a string/line.
In other words, those lines are accepted by that regexp, and in some cases that validation is all you need.
But if you want to return ab, simple matching isn't enough: an additional replace/reorganize step is needed.
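The match-then-rebuild step can be sketched in Python as follows. This is only an illustration: `join_parts` is a hypothetical name, and the literal a and b stand in for whatever real sub-patterns you have.

```python
import re

# Group 1 and group 3 are the parts to keep; group 2 is the optional separator.
JOIN_AB = re.compile(r"^(a)(-|\s+)?(b)$")


def join_parts(text):
    """Drop the separator between the two captured parts, if the line matches."""
    return JOIN_AB.sub(r"\1\3", text)


for sample in ("ab", "a-b", "a b", "a   b"):
    print(join_parts(sample))  # ab each time
```

Lines that do not match the pattern are returned unchanged, so the same call doubles as a crude validity check when you compare input and output.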
I am not a programmer but I am having to use RegEx for a particular purpose. How do I add specific characters to what is being returned from RegEx?
For example, if I have a list as follows:
XYZ ABC 123
How do I use RegEx to add something specific to the end of each? For example, what if I want all three to end with .com?
You can try this script, replacing XYZ abc 123 with your full list:
echo "XYZ abc 123" | sed -E 's/([a-zA-Z0-9]+)/\1.com/g'
Explanation:
s/ Starts a substitution regex
([a-zA-Z0-9]+) Capture at least one alphanumeric
/ End regex
\1.com replaces the capture with itself plus adds .com
/g Global modifier (for all matches)
Without knowing which regex engine you want to use, there are many other ways to do this. In the future, please give more information.
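Since the engine is unknown, here is the same idea as a short Python sketch, purely as one possible rendering. `add_suffix` is a hypothetical helper; it uses a function as the replacement so that the suffix is inserted literally even if it contains backslashes.

```python
import re


def add_suffix(text, suffix=".com"):
    """Append suffix to every run of ASCII letters and digits in text."""
    return re.sub(r"([a-zA-Z0-9]+)", lambda m: m.group(1) + suffix, text)


print(add_suffix("XYZ abc 123"))  # XYZ.com abc.com 123.com
```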
I need a regular expression that will find numbers that are not inside parentheses.
Example: abcd 1 (35) (df)
It would only see the 1.
Is this very complex? I've tried and had no luck.
Thanks for any help
An easy solution is to first remove the unwanted values:
my $string = "abcd 12 (35) (df) 2311,22";
$string =~ s/\(\d+\)//g; # remove numbers within parens
my @numbers = $string =~ /\d+/g; # extract the numbers
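The same two-step idea (strip the parenthesized numbers first, then collect what is left) looks like this in Python. This is a sketch under the same assumption as the Perl snippet: a parenthesized number is exactly digits between ( and ), with no spaces or other text inside. The function name is made up.

```python
import re


def numbers_outside_parens(text):
    """Remove "(digits)" groups, then return the remaining digit runs."""
    stripped = re.sub(r"\(\d+\)", "", text)
    return re.findall(r"\d+", stripped)


print(numbers_outside_parens("abcd 12 (35) (df) 2311,22"))  # ['12', '2311', '22']
```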
This is quite hard but something like this will probably do:
^(?:\()(\d+)(?:[^)])|(?:[^(0-9]|^)(\d+)(?:[^)0-9]|^)|(?:[^(])(\d+)(?:\))$
The problem is to match (123, 123) and also to not match the string 123 as the number 2 between the non-parentheses characters 1 and 3. Also there are probably some edge cases for start of and end of string.
My suggestion is to not use a regex for this. Maybe a regex that matches numbers and then use the capture info to check if the surrounding characters are not parentheses.
The regular expression would be:
^[a-z]+ ([0-9]+) \([0-9]+\) \([a-z]+\)$
The result is the first (and only) matching group of the regex.
You may want to remove the ^ and $ if the regex should also match when the text is not the entire content of a single line. You can also use [a-zA-Z] or [[:alpha:]]. This depends on the regular expression engine you use and, of course, the content you want to match.
Example perl code:
if (m/^[a-z]+ ([0-9]+) \([0-9]+\) \([a-z]+\)$/) {
print("$1\n");
}
Please note that your question does not contain enough information to make a good answer possible (you did not say anything about the general format of your expression, for example whether you want to match integers or floating-point numbers).
How about
/(?:^|[^\d(])(\d+)(?:[^\d)]|$)/
? This matches a string of digits (\d+) that are
preceded by the beginning of the string, or a character that is not a digit or an open parenthesis ((?:^|[^\d(]))
followed by the end of the string, or by a character that is not a digit or a close parenthesis ((?:[^\d)]|$))
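A quick Python check of this pattern is sketched below. One thing to be aware of: because the pattern consumes the character after each number, two numbers separated by a single character can cause the second one to be skipped. The lookaround variant shown next to it is my own adjustment, not part of the answer, and it assumes an engine with fixed-width lookbehind support.

```python
import re

# Pattern from the answer: the neighbouring characters are consumed by the match.
CONSUMING = re.compile(r"(?:^|[^\d(])(\d+)(?:[^\d)]|$)")

# Hypothetical variant: the neighbours are only inspected, never consumed.
LOOKAROUND = re.compile(r"(?<![\d(])\d+(?![\d)])")

print(CONSUMING.findall("abcd 1 (35) (df)"))   # ['1']
print(LOOKAROUND.findall("abcd 1 (35) (df)"))  # ['1']

# Adjacent numbers: the consuming pattern uses up the space after "1",
# so "2" no longer has a usable character in front of it.
print(CONSUMING.findall("1 2"))   # ['1']
print(LOOKAROUND.findall("1 2"))  # ['1', '2']
```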
I need a regex script to remove repetitions of these particular characters. If one of these characters is repeated, replace it with a single one.
/[\s.'-,{2,0}]
These are the characters that, if they are repeated, I need to replace with a single instance of the same character.
Is this the regex you're looking for?
/([\s.',-])\1+/
Okay, now that will match it. If you're using Perl, you can replace it using the following expression:
s/([\s.',-])\1+/$1/g
Edit: If you're using :ahem: PHP, then you would use this syntax:
$out = preg_replace('/([\s.\',-])\1+/', '$1', $in);
The () group matches the character and the \1 means that the same thing it just matched in the parentheses occurs at least once more. In the replacement, the $1 refers to the match in the first set of parentheses. The hyphen is placed last in the character class so that it is taken literally; written as '-, it would instead define a range from ' to , (which includes ( ) * + but not the hyphen itself).
Note: this is Perl-Compatible Regular Expression (PCRE) syntax.
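Here is a minimal Python sketch of the same backreference technique, offered as an illustration rather than a drop-in answer. It writes the class with the hyphen last so that the hyphen is a literal character, and the last two lines show what a `'-,` range matches instead. `collapse_repeats` is a hypothetical name.

```python
import re

# One character from the set, followed by one or more copies of itself.
REPEATS = re.compile(r"([\s.',-])\1+")


def collapse_repeats(text):
    """Replace each run of a repeated character from the set with one copy."""
    return REPEATS.sub(r"\1", text)


print(collapse_repeats("a  b....,,--''"))  # a b.,-'

# Inside a class, '-, is a range from ' to , so it matches ( and ) but not "-".
print(re.findall(r"['-,]", "(-)"))  # ['(', ')']
```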
From the perlretut man page:
Matching repetitions
The examples in the previous section display an annoying weakness. We were only matching 3-letter words, or chunks of words of 4 letters or less. We'd like to be able to match words or, more generally, strings of any length, without writing out tedious alternatives like \w\w\w\w|\w\w\w|\w\w|\w.
This is exactly the problem the quantifier metacharacters ?, *, +, and {} were created for. They allow us to delimit the number of repeats for a portion of a regexp we consider to be a match. Quantifiers are put immediately after the character, character class, or grouping that we want to specify. They have the following meanings:
a? means: match 'a' 1 or 0 times
a* means: match 'a' 0 or more times, i.e., any number of times
a+ means: match 'a' 1 or more times, i.e., at least once
a{n,m} means: match at least "n" times, but not more than "m" times.
a{n,} means: match at least "n" or more times
a{n} means: match exactly "n" times
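The quantifier table above can be checked with a few lines of Python, whose `re` module uses the same quantifier syntax for these cases. This is just a sketch; `matches` is a hypothetical helper that tests whether the whole string matches.

```python
import re


def matches(pattern, text):
    """True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


print(matches(r"a?", ""), matches(r"a?", "aa"))            # True False
print(matches(r"a*", ""), matches(r"a*", "aaaa"))          # True True
print(matches(r"a+", ""), matches(r"a+", "a"))             # False True
print(matches(r"a{2,3}", "aa"), matches(r"a{2,3}", "aaaa"))  # True False
print(matches(r"a{2,}", "aaaaa"))                          # True
print(matches(r"a{2}", "aaa"))                             # False
```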
As others said, it depends on your regex engine, but here is a small example of how you could do this:
/([ _,.-])\1*/\1/g
With sed:
$ echo "foo  ,  bar" | sed 's/\([ _,.-]\)\1*/\1/g'
foo , bar
$ echo "foo,.  bar" | sed 's/\([ _,.-]\)\1*/\1/g'
foo,. bar
Using JavaScript as mentioned in a comment, and assuming (it's not too clear from your question) the characters you want to replace are space characters, ., ', -, and ,:
var str = 'a b....,,';
str = str.replace(/(\s){2}|(\.){2}|('){2}|(-){2}|(,){2}/g, '$1$2$3$4$5');
// Now str === 'a b..,'
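Note that a {2} quantifier collapses pairs, so four dots become two rather than one. The Python sketch below contrasts pair-collapsing with run-collapsing. It is a rough rendering, not a literal translation: it uses a backreference instead of five alternatives, so both characters of a pair must be identical, and the function names are hypothetical.

```python
import re

CHAR_SET = r"[\s.',-]"  # hyphen last so it is literal


def collapse_pairs(text):
    """Replace each doubled character with one copy (pairs only)."""
    return re.sub(rf"({CHAR_SET})\1", r"\1", text)


def collapse_runs(text):
    """Replace each run of two or more identical characters with one copy."""
    return re.sub(rf"({CHAR_SET})\1+", r"\1", text)


print(collapse_pairs("a  b....,,"))  # a b..,
print(collapse_runs("a  b....,,"))   # a b.,
```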
If I understand correctly, you want to do the following: given a set of characters, replace any multiple occurrence of each of them with a single character. Here's how I would do it in perl:
perl -pi.bak -e "s/\.{2,}/\./g; s/\-{2,}/\-/g; s/'{2,}/'/g" text.txt
If, for example, text.txt originally contains:
Here is . and here are 2 .. that should become a single one. Here's
also a double -- that should become a single one. Finally here we have
three ''' which should be substituted with one '.
it is modified as follows:
Here is . and here are 2 . that should become a single one. Here's
also a double - that should become a single one. Finally here we have
three ' which should be substituted with one '.
I simply use the same replacement regex for each character in the set: for example
s/\.{2,}/\./g;
replaces 2 or more occurrences of a dot character with a single dot. I concatenate several of these expressions, one for each character of your original set.
There may be more compact ways of doing this, but, I think this is simple and it works :)
I hope it helps.
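The one-substitution-per-character approach can also be written as a loop. The Python sketch below is an illustration only: unlike the perl -pi command it works on a string rather than editing a file in place, and `collapse_each` and its default character set are assumptions.

```python
import re


def collapse_each(text, chars=".-'"):
    """Run one "2 or more -> 1" substitution per character in chars."""
    for ch in chars:
        # re.escape makes characters such as "." literal in the pattern.
        pattern = re.escape(ch) + r"{2,}"
        # A function replacement inserts ch literally, with no escape handling.
        text = re.sub(pattern, lambda m, ch=ch: ch, text)
    return text


print(collapse_each("2 .. and a double -- and three '''"))
# 2 . and a double - and three '
```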