How do I add characters to RegEx results?

I am not a programmer, but I have to use RegEx for a particular purpose. How do I add specific characters to what RegEx returns?
For example, if I have a list as follows:
XYZ ABC 123
How do I use RegEx to add something specific to the end of each one? For example, what if I want all three to end with .com?

You can try this script, replacing XYZ abc 123 with your full list:
echo "XYZ abc 123" | sed -E 's/([a-zA-Z0-9]+)/\1.com/g'
Explanation:
-E Use extended regex syntax, so the parentheses form a group without backslashes
s/ Start a substitution
([a-zA-Z0-9]+) Capture one or more ASCII letters or digits
/ End the pattern and start the replacement
\1.com Replace the match with the captured text followed by .com
/g Global modifier (replace every match, not just the first)
There are many other ways to do this, depending on which regex engine you use. In the future, please say which engine or tool you are using.
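For comparison, here is the same substitution as a minimal sketch in Python's re module, assuming you can run Python at all; the helper name add_suffix is hypothetical. re.sub replaces every match by default, so nothing corresponds to the /g modifier.

```python
import re

# Same pattern as the sed command: one or more ASCII letters or digits.
WORD = re.compile(r"([a-zA-Z0-9]+)")


def add_suffix(text: str, suffix: str = ".com") -> str:
    """Append `suffix` to every alphanumeric run (hypothetical helper)."""
    # A function replacement inserts `suffix` literally, so any backslashes
    # in it are not interpreted as group references.
    return WORD.sub(lambda m: m.group(1) + suffix, text)


print(add_suffix("XYZ abc 123"))  # XYZ.com abc.com 123.com
```

Non-alphanumeric characters such as spaces or hyphens are left untouched, exactly as in the sed version.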

Related

Regex AND search inside a block enclosed by a delimiter

I want to do an AND search with lookaheads, (?=)(?=), inside a block enclosed by a delimiter such as #.
With the sample regex below, I expected the pattern to match cat and ugly inside the block that runs from # cat B up to (but not including) # cat C.
But the regex matches nothing.
regex
^#(?=[\s\S]*(cat))(?=[\s\S]*(ugly))^#
text
# cat A
the cat is
very cute.
# cat B
the cat is
very ugly.
# cat C
the cat is
very good.
#
You can test the regex on https://regexr.com/
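In your pattern ^#(?=[\s\S]*(cat))(?=[\s\S]*(ugly))^# you match a # at the start of the string (^#), followed by two positive lookaheads, and then try to match ^# again. Lookaheads do not consume any characters, so that second ^# would have to match directly after the first #, which is not the start of a line. That is why you don't get a match.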
To get a more exact match, you could start the pattern with ^# cat B
If you want to use lookaheads, you could put two capturing groups inside a single positive lookahead. If you want to search for cat and ugly as whole words, you could add word boundaries, as in \bcat\b.
(?s) is an inline modifier that lets the dot match newlines (you can also use the /s flag instead), and (?m) makes ^ and $ match at the start and end of every line, which the ^# cat C part needs.
(?sm)(?=^# cat B.*?(cat).*?(ugly).*?^# cat C)
Regex demo
But it might be easier to skip the lookahead and match the block directly:
(?sm)^# cat B.*?(cat).*?(ugly).*?^# cat C$
Php demo
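To check this explanation, here is a minimal sketch in Python's re module, which, like PCRE, accepts the inline (?s) and (?m) modifiers. It shows that the question's pattern finds nothing, and that the lookahead and plain-match approaches both capture cat and ugly from the # cat B block. It uses (?sm) rather than (?s) alone, because ^ has to match at the start of the # cat C line. The constant names are hypothetical.

```python
import re

TEXT = """# cat A
the cat is
very cute.
# cat B
the cat is
very ugly.
# cat C
the cat is
very good.
#"""

# The question's pattern: the second ^# can never match right after the first #.
ORIGINAL = re.compile(r"^#(?=[\s\S]*(cat))(?=[\s\S]*(ugly))^#", re.M)

# (?s): dot also matches newlines; (?m): ^ and $ match at every line boundary.
LOOKAHEAD = re.compile(r"(?sm)(?=^# cat B.*?(cat).*?(ugly).*?^# cat C)")
PLAIN = re.compile(r"(?sm)^# cat B.*?(cat).*?(ugly).*?^# cat C$")

print(ORIGINAL.search(TEXT))  # None

la = LOOKAHEAD.search(TEXT)
print(repr(la.group(0)), la.groups())  # '' ('cat', 'ugly')  (zero-width match)

pm = PLAIN.search(TEXT)
print(pm.groups())  # ('cat', 'ugly')
```

The lookahead version consumes no text, so only its groups are useful; the plain version also returns the whole block as the match.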
This RegEx might help you match your target words by bounding each line with \n:
((.+)(cat)(.+))\n((.+)(ugly)(.+))
To keep it simple, it creates four groups per line, one set for each target keyword (cat and ugly), and the keywords themselves can be referenced as $3 and $7.
You could additionally anchor it with ^ at the start and $ at the end, if you wish.
This expression only works when the line containing cat is immediately followed by the line containing ugly, and each keyword has at least one character before and after it on its line.
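A minimal Python sketch of this two-line pattern, assuming a line with cat is immediately followed by a line with ugly. Python exposes the groups as m.group(3) and m.group(7) rather than $3 and $7; TEXT is a hypothetical excerpt of the sample.

```python
import re

# Groups: 1 = first line, 3 = "cat", 5 = second line, 7 = "ugly".
PATTERN = re.compile(r"((.+)(cat)(.+))\n((.+)(ugly)(.+))")

TEXT = "# cat B\nthe cat is\nvery ugly.\n# cat C"

m = PATTERN.search(TEXT)
print(m.group(3), m.group(7))  # cat ugly
print(repr(m.group(0)))        # 'the cat is\nvery ugly.'

# Each keyword needs at least one character before and after it on its line.
print(PATTERN.search("cat\nugly"))  # None
```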

regex: accepting but not capturing a pattern

I want a pattern that matches all of these:
ab
a-b
a b
a b
a-b
where a and b can be any pattern, but are reduced to a and b for simplicity.
I want to return "ab" in all these cases. Can I do this entirely in the regex, or do I have to get back the matched text together with the separator characters and remove the separators in code?
I might have misunderstood your meaning; if so, I'm sorry.
You can group things in a regexp with parentheses ().
For example, in your case:
(a)(-|\s+)?(b)
You can then use \1 and \3 to refer to a and b, so \1\3 gives ab.
Note that some tools need \\1\\3 instead, for example when the backslash itself has to be escaped inside a string literal.
Check your language's documentation for the exact regexp rules.
I'm not sure where you will use this, so here I use sed as an example:
$ echo -e "ab\na-b\na b\na b\n"|sed -E 's/^(a)(-| +)?(b)$/\1\3/'
ab
ab
ab
ab
Note that the regex used here is ^(a)(-| +)?(b)$; the ^ and $ match the beginning and end of the string/line.
In other words, that regexp accepts all of those lines, which in some cases is already enough validation.
But if you want to return ab, matching alone is not enough; you need an additional replace step to reassemble the string.
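As a minimal Python sketch of that replace step, here is a variant that uses a non-capturing group (?:...) for the separator, so the separator is accepted but not captured. That swap is not from the answer above; with it, the groups to keep become \1 and \2 instead of \1 and \3. The helper name strip_separator is hypothetical.

```python
import re

# (a) and (b) are captured; (?:...) accepts an optional "-" or a run of
# whitespace without creating a group for it.
PATTERN = re.compile(r"^(a)(?:-|\s+)?(b)$")


def strip_separator(line: str) -> str:
    """Return "ab" for any accepted form; other lines come back unchanged."""
    return PATTERN.sub(r"\1\2", line)


for line in ["ab", "a-b", "a b", "a   b"]:
    print(strip_separator(line))  # prints ab each time
```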

capturing each word containing pattern regex

I'm trying to write a sed script that finds every word containing a certain pattern and then prepends a string to each of those words. For example:
foobarbaz barfoobaz barbazfoo barbaz
might turn into:
quxfoobarbaz quxbarfoobaz quxbarbazfoo barbaz
I understand the basics of capture groups and backreferences, but I'm still having trouble. Specifically, I can't get it to capture each whole word separately.
s/\(.*\)men\(.*\)/ not just the \1men\2, but the \1women\2 and \1children\2 too /
I tried using \s for whitespace, as many sites recommend, but my sed treats \s as the two separate characters \ and s.
You could use the non-space character \S as follows:
sed 's/\S*foo\S*/qux&/g' <<< "foobarbaz barfoobaz barbazfoo barbaz"
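This matches every word containing foo. In the replacement string qux&, the & stands for the whole match, so each matched word gets qux prepended. Note that \S, like \s, is a GNU sed extension, so it may not work if your sed does not understand \s. Output: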
quxfoobarbaz quxbarfoobaz quxbarbazfoo barbaz
If your sed lacks \S, this version uses [^ ] (any character except a space) and works as long as the words are separated by spaces:
echo "foobarbaz barfoobaz barbazfoo barbaz" | sed 's/\([^ ]*foo[^ ]*\)/qux\1/g'
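The same idea in Python, as a minimal sketch (the helper name prefix_words_containing is hypothetical): \S matches any non-whitespace character in Python's re module, and the whole match, which sed writes as &, is available as m.group(0).

```python
import re


def prefix_words_containing(text: str, needle: str, prefix: str) -> str:
    """Prefix every whitespace-delimited word that contains `needle`."""
    # \S* on both sides stretches the match to the whole word; re.escape
    # keeps characters like "." in `needle` literal.
    pattern = r"\S*" + re.escape(needle) + r"\S*"
    return re.sub(pattern, lambda m: prefix + m.group(0), text)


print(prefix_words_containing("foobarbaz barfoobaz barbazfoo barbaz", "foo", "qux"))
# quxfoobarbaz quxbarfoobaz quxbarbazfoo barbaz
```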

How to extract only the matched pattern using regex, without other tools

I need a regular expression that, after seeing "aaa", returns only the 6-digit code, not the entire line. There is only one 6-digit code per line, and it always comes after "aaa".
I can't use sed, awk, grep, etc. My application only accepts a regex.
Examples:
x aaa y z 123456 returns 123456
aaa x 654321 y z returns 654321
I tried this regex with a backreference, but I'm not sure how to avoid repeating [\d]{6}:
(.*)(aaa)(.*)[\d]{6}((?(2)[\d]{6}|.+)
but it prints the entire line.
Any suggestions?
You could do something like
aaa.+?(\d{6})
and then return only the first group (\1).
You could also use a lookbehind with a different regex:
(?<=aaa.+?)\d{6}
This matches the first six digits that are preceded by aaa and at least one other character. Unfortunately, many regex engines don't support variable-length lookbehinds, so I'd go with the first one.
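A minimal Python sketch of both suggestions, assuming the code is the first run of six digits after aaa; the helper name extract_code is hypothetical. Python's built-in re module is one of the engines that rejects the variable-length lookbehind: it raises re.error when the pattern is compiled.

```python
import re
from typing import Optional

# aaa, then as few characters as possible, then six digits in group 1.
CODE_AFTER_AAA = re.compile(r"aaa.+?(\d{6})")


def extract_code(line: str) -> Optional[str]:
    """Return the six-digit code after "aaa", or None if there is none."""
    m = CODE_AFTER_AAA.search(line)
    return m.group(1) if m else None


print(extract_code("x aaa y z 123456"))  # 123456
print(extract_code("aaa x 654321 y z"))  # 654321

# The lookbehind version needs variable-length lookbehind support.
try:
    re.compile(r"(?<=aaa.+?)\d{6}")
except re.error as exc:
    print("rejected:", exc)
```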

Can regex match be based on two lines of text?

Let's say I have
def
abc
xyz
abc
And I want to match
xyz
abc
as a whole
Is this possible using the most generic RegEx possible?
That is, not Perl RegEx or .NET Regex, which have multiline flags.
I guess it would take BNF to match this.
Many regex implementations allow explicit line terminators. If \n is the line separator, then just search for xyz\nabc.
Regexes work on whatever text you give them, multiline or otherwise. If it happens to contain linefeeds, then it's nominally "multiline" text, but you don't have to do anything special to match it with regexes. Linefeed is just another character.
The name "multiline flag" (or "multiline mode") confuses many people. All that flag does is change the meaning of the ^ and $ anchors, allowing them to match at the beginning and end of logical lines as well as the beginning and end of the whole text.
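A minimal Python sketch of both points, using the question's sample text: xyz\nabc matches with no flags at all, and re.MULTILINE only changes where ^ and $ are allowed to match.

```python
import re

TEXT = "def\nabc\nxyz\nabc"

# No special flag needed: "\n" is just another character in the pattern.
print(re.search(r"xyz\nabc", TEXT).span())  # (8, 15)

# Without MULTILINE, ^ and $ only match at the start/end of the whole text.
print(re.search(r"^xyz$", TEXT))                       # None
print(re.search(r"^xyz$", TEXT, re.MULTILINE).span())  # (8, 11)
```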
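grep -A2 "xyz" <file_name>
This prints each line containing xyz together with the two lines that follow it; it does not check that the next line is abc.
from https://stackoverflow.com/a/34808071/5556553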