Select everything before & or everything if there is no & - regex

I want to use regex to split some text.
My text:
Hello&World
Hello
0011&World
0011
Using (.*)(\&.*) only matches 'Hello&World' and '0011&World', and (.*)(\&.*)? ignores the last part.
For the first two lines I want to get 'Hello', and for the last two I want to get '0011'.
Thank you

It seems you need to fetch 0+ chars other than & at the beginning of a string.
Use the following regex:
^[^&]*
See the regex demo.
Details:
^ - start of string
[^&]* - a negated character class matching zero or more (*) chars other than & (to match 1 or more replace * with +).
See the Python demo:
import re
ss = ['Hello&World','Hello','0011&World','0011']
for s in ss:
    print(re.match('[^&]*', s).group())
    # print(re.search('^[^&]*', s).group())
Note that re.match looks for a match only at the start of the string, thus making ^ redundant in the pattern.
Else, if you use re.search, the ^ anchor is necessary to anchor the search at the start of the string.
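To illustrate the difference between re.match and re.search described above, here is a minimal sketch. It uses + instead of * and a made-up input that starts with & (both are assumptions for illustration, not part of the question), because with * every variant would just return an empty match at position 0.

```python
import re

s = "&World"  # hypothetical input that starts with the delimiter

# re.match only tries position 0, so no anchor is needed.
m_match = re.match(r"[^&]+", s)
# Unanchored re.search scans forward and picks up the text after "&".
m_search = re.search(r"[^&]+", s)
# With ^, re.search is pinned to the start again and finds nothing.
m_anchored = re.search(r"^[^&]+", s)

print(m_match)            # None
print(m_search.group())   # World
print(m_anchored)         # None
```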

Related

Regex match the unknown characters with dash between

I'm struggling with the following combination of characters that I'm trying to parse:
I have two types of text:
1. AF-B-W23F4-USLAMC-X99-JLK
2. LS-V-A23DF-SDLL--X22-LSM
I want to get the last two groups of characters, separated by a dash (-).
From the 1. X99-JLK and from the 2. X22-LSM
I accomplished the 2. with the following regex '--(.*-.*)'
How can I parse the 1. sample and is there any option to parse it at one time with something like OR operator?
Thanks for any help!
The pattern --(.*-.*) that you tried matches the second example because it contains -- and it matches the first occurrence.
Then it matches until the end of the string and backtracks to find another hyphen.
As .* can match any character (also -) and there are no anchors or boundaries set, this is a very broad match.
If there have to be 2 dashes, you can match the first one, and use a capture group for the part with the second one using a negated character class [^-]
The negated character class [^-] can also match a newline. If you don't want to match a newline, you can use [^-\r\n], or [^-\s] to exclude all whitespace (there is none in the example data).
-([^-]+-[^-]+)$
Explanation
- Match -
( Capture group 1
[^-]+-[^-]+ Match the second dash between chars other than -
) Close group 1
$ End of string
See a regex demo
For example using Javascript:
const regex = /-([^-]+-[^-]+)$/;
[
  "AF-B-W23F4-USLAMC-X99-JLK",
  "LS-V-A23DF-SDLL--X22-LSM"
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    console.log(m[1]);
  }
});
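The same pattern can be tried in Python. This is only a sketch; the helper name last_pair is made up for illustration.

```python
import re

# A dash, then two dash-free chunks joined by a single dash, at the end.
PATTERN = re.compile(r"-([^-]+-[^-]+)$")

def last_pair(s):
    """Return the last two dash-separated chunks, or None if there is no match."""
    m = PATTERN.search(s)
    return m.group(1) if m else None

for s in ["AF-B-W23F4-USLAMC-X99-JLK", "LS-V-A23DF-SDLL--X22-LSM"]:
    print(last_pair(s))
# X99-JLK
# X22-LSM
```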
You can try a lookahead to match the last pair before each newline. JavaScript example:
const str = `
AF-B-W23F4-USLAMC-X99-JLK
LS-V-A23DF-SDLL--X22-LSM
`;
const re = /[^-]*-[^-]*(?=\n)/g;
console.log(str.match(re));
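A rough Python equivalent of the lookahead approach is sketched below. The second function is a variation that is not in the answer above: it swaps the lookahead for $ with re.MULTILINE and excludes newlines from the character classes. Both function names are hypothetical.

```python
import re

text = """
AF-B-W23F4-USLAMC-X99-JLK
LS-V-A23DF-SDLL--X22-LSM
"""

def last_pairs_lookahead(s):
    # Same pattern as the JavaScript version: the pair must be followed by \n.
    return re.findall(r"[^-]*-[^-]*(?=\n)", s)

def last_pairs_multiline(s):
    # Variation (assumption): with re.MULTILINE, $ matches at the end of each line.
    return re.findall(r"[^-\n]+-[^-\n]+$", s, flags=re.MULTILINE)

print(last_pairs_lookahead(text))   # ['X99-JLK', 'X22-LSM']
print(last_pairs_multiline(text))   # ['X99-JLK', 'X22-LSM']
```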

Kotlin / Regex - Replace a group of pattern with a repeating character

I would like to mask the email passed to the maskEmail function. The problem is that the asterisk * is not repeated when I replace groups 2 and 4 of my pattern.
Here is my code:
fun maskEmail(email: String): String {
    return email.replace(Regex("(\\w)(\\w*)\\.(\\w)(\\w*)(#.*\\..*)$"), "$1*.$3*$5")
}
Here is the input:
tom.cat#email.com
cutie.pie#email.com
captain.america#email.com
Here is the current output of that code:
t*.c*#email.com
c*.p*#email.com
c*.a*#email.com
Expected output:
t**.c**#email.com
c****.p**#email.com
c******.a******#email.com
Edit:
I know this could be done easily with for loop but I would need this to be done in regex. Thank you in advance.
For your problem, you need to match each character in the email address that is not the first character of a word and occurs before the #. You can do that with a negative lookbehind for a word boundary and a positive lookahead for the # symbol:
(?<!\b)\w(?=.*?#)
The matched characters can then be replaced with *.
Note we use a lazy quantifier (?) on the .* to improve efficiency.
Demo on regex101
Note also as pointed out by #CarySwoveland, you can replace (?<!\b) with \B i.e.
\B\w(?=.*?#)
Demo on regex101
As pointed out by #Thefourthbird, this can be made more efficient still by replacing the .*? with [^\r\n#]*, i.e.
\B\w(?=[^\r\n#]*#)
Demo on regex101
Or, if you're only matching single strings, just [^#]*:
\B\w(?=[^#]*#)
Demo on regex101
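As a quick check of the last pattern, here is a minimal Python sketch (the question is about Kotlin, so treat this as an illustration only; the function name mask_email is made up). It keeps the # separator used in the sample data.

```python
import re

def mask_email(email):
    # \B\w   : a word char that is not at the start of a word
    # (?=...) : only if a "#" still follows somewhere later
    return re.sub(r"\B\w(?=[^#]*#)", "*", email)

for e in ["tom.cat#email.com", "cutie.pie#email.com", "captain.america#email.com"]:
    print(mask_email(e))
# t**.c**#email.com
# c****.p**#email.com
# c******.a******#email.com
```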
I suggest keeping any char at the start of string and a combination of a dot + any char, and replace any other chars with * that are followed with any amount of characters other than # before a #:
((?:\.|^).)?.(?=.*#)
Replace with $1*. See the regex demo. This will handle emails that happen to contain chars other than just word (letter/digit/underscore) and . chars.
Details
((?:\.|^).)? - an optional capturing group matching a dot or start of string position and then any char other than a line break char
. - any char other than a line break char...
(?=.*#) - if followed with any 0 or more chars other than line break chars as many as possible and then #.
Kotlin code (with a raw string literal used to define the regex pattern so as not to have to double escape the backslash):
fun maskEmail(email: String): String {
    return email.replace(Regex("""((?:\.|^).)?.(?=.*#)"""), "$1*")
}
See a Kotlin test online:
val emails = arrayOf<String>("captain.am-e-r-ica#email.com","my-cutie.pie+here#email.com","tom.cat#email.com","cutie.pie#email.com","captain.america#email.com")
for (email in emails) {
    val masked = maskEmail(email)
    println("${email}: ${masked}")
}
Output:
captain.am-e-r-ica#email.com: c******.a*********#email.com
my-cutie.pie+here#email.com: m*******.p*******#email.com
tom.cat#email.com: t**.c**#email.com
cutie.pie#email.com: c****.p**#email.com
captain.america#email.com: c******.a******#email.com
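For comparison, the same pattern can be sketched in Python. This is an illustration, not part of the Kotlin answer; the name mask_email_any is hypothetical. It relies on Python 3.5+ substituting an empty string for a group that did not participate in the match.

```python
import re

def mask_email_any(email):
    # Group 1 optionally keeps the first char of the string or a ".x" pair;
    # the following char is replaced with "*" while a "#" is still ahead.
    return re.sub(r"((?:\.|^).)?.(?=.*#)", r"\1*", email)

print(mask_email_any("tom.cat#email.com"))            # t**.c**#email.com
print(mask_email_any("my-cutie.pie+here#email.com"))  # m*******.p*******#email.com
```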

How to use Ruby gsub with regex to do partial string substitution

I have a pipe delimited file which has a line
H||CUSTCHQH2H||PHPCCIPHP|1010032000|28092017|25001853||||
I want to substitute the date (28092017), matched with the regex "[0-9]{8}", if the first character is "H".
I tried the following example to test my understanding, where I'm trying to substitute "a" with "i".
str = "|123||a|"
str.gsub /\|(.*?)\|(.*?)\|(.*?)\|/, "\|\\1\|\|\\1\|i\|"
But this gives the following output:
"|123||123|i|"
Any clue how this can be achieved?
You may replace the first occurrence of 8 digits inside pipes if a string starts with H using
s = "H||CUSTCHQH2H||PHPCCIPHP|1010032000|28092017|25001853||||"
p s.gsub(/\A(H.*?\|)[0-9]{8}(?=\|)/, '\100000000')
# or
p s.gsub(/\AH.*?\|\K[0-9]{8}(?=\|)/, '00000000')
See the Ruby demo. Here, the value is replaced with 8 zeros.
Pattern details
\A - start of string (^ is the start of a line in Ruby)
(H.*?\|) - Capturing group 1 (you do not need it when using the variation with \K): H, then any 0+ chars as few as possible, and then a literal |
\K - match reset operator that discards the text matched so far
[0-9]{8} - eight digits
(?=\|) - the next char must be |, but it is not added to the match value since it is a positive lookahead that does not consume text.
The \1 in the first gsub is a replacement backreference to the value in Group 1.
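For readers who want to try the capturing-group variant outside Ruby, here is a hedged Python sketch. Python's standard re module has no \K, so only the first form carries over, and \g<1> is used so the group number does not run into the zeros that follow. The function name replace_date and the default replacement value are made up for illustration.

```python
import re

def replace_date(line, new_value="00000000"):
    # \A anchors at the start of the string; group 1 keeps everything up to
    # and including the "|" that precedes the first pipe-delimited 8-digit run.
    return re.sub(r"\A(H.*?\|)[0-9]{8}(?=\|)", r"\g<1>" + new_value, line)

s = "H||CUSTCHQH2H||PHPCCIPHP|1010032000|28092017|25001853||||"
print(replace_date(s))
# H||CUSTCHQH2H||PHPCCIPHP|1010032000|00000000|25001853||||
```

Lines that do not start with H are returned unchanged, because the pattern cannot match at the start of the string.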

Regular expression for substitute a string with another

I have these two lines of text that I want to manipulate using a regular expression and substitution:
Obj.FieldNameA = Reader.GetEnumFromInt32<ClassName>(QueryGenerator,nameof(Obj.));
Obj.FieldNameB=Reader.GetTrimmedStringOrNull(QueryGenerator,nameof(Obj.));
Attached on the first Obj. there is a Field name, so in this case they are FieldNameA,FieldNameB
I want to attach these values to the second Obj. found on the same line, so the text should become:
Obj.FieldNameA = Reader.GetEnumFromInt32<ClassName>(QueryGenerator,nameof(Obj.FieldNameA));
Obj.FieldNameB=Reader.GetTrimmedStringOrNull(QueryGenerator,nameof(Obj.FieldNameB));
I have tested this very simple (and wrong) regex:
Obj\.(\w*).*\n
With $1 as the substitution
But I don't know how to use substitution...
Sample code here
Some Notes:
After FieldNameA there is always an equal sign that could be preceded or followed by a space.
Before the second Obj. there could be any character, including < ( etc...
Could this be achieved?
You may use
Find: (Obj\.(\w+).*\(Obj\.)\)
Replace: $1$2)
See the regex demo.
You may also add ^ to the start of the regex to match only at the start of a line/string.
Details
^ - start of string
(Obj\.(\w+).*\(Obj\.) - Group 1 ($1 in the replacement):
Obj\. - Obj. text
(\w+) - Group 2 ($2): 1 or more word chars
.* - any 0+ chars other than line break chars as many as possible (you may use .*? to only match the second Obj. on a line, your current input only has two with the second one closer to the end of a line, so .* will work better)
\(Obj\. - (Obj. text
\) - a ) char.
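The find/replace pair above could be applied in Python roughly as follows. This is a sketch under the assumption that each line is processed on its own; the function name fill_field_name is hypothetical, and Python writes the backreferences as \1 and \2 instead of $1 and $2.

```python
import re

def fill_field_name(line):
    # Group 1: everything from the first "Obj." up to the last "(Obj."
    # Group 2: the field name right after the first "Obj."
    return re.sub(r"(Obj\.(\w+).*\(Obj\.)\)", r"\1\2)", line)

lines = [
    "Obj.FieldNameA = Reader.GetEnumFromInt32<ClassName>(QueryGenerator,nameof(Obj.));",
    "Obj.FieldNameB=Reader.GetTrimmedStringOrNull(QueryGenerator,nameof(Obj.));",
]
for line in lines:
    print(fill_field_name(line))
```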

Get the first occurrence of a string in a variable REGEX

I have the following variable in a database: PSC-CAMPO-GRANDE-I08-V00-C09-H09-IPRMKT and I want to split it into two variables, the first will be PSC-CAMPO-GRANDE-I08 and the second V00-C09-H09-IPRMKT.
I'm trying the regex .*(\-I).*(\-V), this doesn't work. Then I tried .*(\-I), but it gets the last -IPRMKT string.
So my question is: is there a way to split the string PSC-CAMPO-GRANDE-I08-V00-C09-H09-IPRMKT at the first occurrence of -I?
This should do the trick:
regex = r"(.*?-I[\d]{2})-(.*)"
Here is test script in Python
import re

# Raw string so that \d reaches the regex engine unchanged
regex = r"(.*?-I[\d]{2})-(.*)"
match = re.search(regex, "PSC-CAMPO-GRANDE-I08-V00-C09-H09-IPRMKT")
if match:
    print("yep")
    print(match.group(1))
    print(match.group(2))
else:
    print("nope")
In the regex, I'm capturing everything up to the first -I followed by two digits. Then I match, but don't capture, a -. Then I capture the rest. I can help tweak it if you have more logic that you are trying to implement.
You may use
^(.*?-I[^-]*)-(.*)
See the regex demo
Details:
^ - start of a string
(.*?-I[^-]*) - Group 1:
.*? - any 0+ chars other than line break chars, as few as possible (*? is a lazy quantifier, so the match stops at the first occurrence of what follows)
-I - a literal substring -I
[^-]* - any 0+ chars other than a hyphen (your pattern was missing it)
- - a hyphen
(.*) - Group 2: any 0+ chars other than line break chars up to the end of a line.
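A minimal Python sketch of this second pattern is shown below, assuming the value is available as a plain string. The helper name split_at_first_i is made up for illustration.

```python
import re

def split_at_first_i(value):
    # Group 1: shortest prefix ending in "-I" plus the rest of that chunk
    # Group 2: everything after the hyphen that follows it
    m = re.match(r"^(.*?-I[^-]*)-(.*)", value)
    return m.groups() if m else None

print(split_at_first_i("PSC-CAMPO-GRANDE-I08-V00-C09-H09-IPRMKT"))
# ('PSC-CAMPO-GRANDE-I08', 'V00-C09-H09-IPRMKT')
```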