RegEx to replace entire string with first two values - regex

I'm trying to come up with a regex expression to replace an entire string with just the first two values. Examples:
Entire String: AO SMITH 100108283 4500W/240V SCREW-IN ELEMENT, 11"
First Two Values: AO SMITH
Entire String: BRA14X18HEBU / P11-042 / 310-470NL BRASS 1/4 x 1/8 HEX
BUSHING
First Two Values: BRA14X18HEBU / P11-042
Entire String: TWO-HOLE PIPE STRAP 4" 008004EG 72E 4
First Two Values: TWO-HOLE PIPE
The caveat is I'm wanting to preserve any kind of special characters and not count them, like "/"'s and "-"'s. The current code I've written does not, instead leaves the new values entirely blank. Only the first example above works.
Here's what I've got so far:
Matching Value:
^(\w+) +(\w+).+$
New Value:
$1 $2

One option could be using a single capture group and use that in the replacement.
^(\w+(?:-\w+)?(?: +\/)? +\w+(?:-\w+)?).+
The pattern matches:
^ Start of string
( Capture group 1
\w+(?:-\w+)?Match 1+ word charss with an optional part to match a - and 1+ word chars
(?: +\/)? Optionally match /
+\w+(?:-\w+)? Match 1+ word charss with an optional part to match a - and 1+ word chars
) Close group 1
.+ Match 1+ times any char (the rest of the line)
If there can be more than 1 hyphen, you can use * instead of ?
Regex demo
Output
AO SMITH
BRA14X18HEBU / P11-042
TWO-HOLE PIPE
A broader match could be matching non word chars in between the words
^(\w+(?:-\w+)*[\W\r\n]+\w+(?:-\w+)*).+
Regex demo

Related

Regex: match patterns starting from the end of string

I wish to match a filename with column and line info, eg.
\path1\path2\a_file.ts:17:9
//what i want to achieve:
match[1]: a_file.ts
match[2]: 17
match[3]: 9
This string can have garbage before and after the pattern, like
(at somewhere: \path1\path2\a_file.ts:17:9 something)
What I have now is this regex, which manages to match column and line, but I got stuck on filename capturing part.. I guess negative lookahead is the way to go, but it seems to match all previous groups and garbage text in the end of string.
(?!.*[\/\\]):(\d+):(\d+)\D*$
Here's a link to current implementation regex101
You can replace the lookahead with a negated character class:
([^\/\\]+):(\d+):(\d+)\D*$
See the regex demo. Details:
([^\/\\]+) - Group 1: one or more chars other than / and \
: - a colon
(\d+) - Group 2: one or more digits
: - a colon
(\d+) - Group 3: one or more digits
\D*$ - zero or more non-digit chars till end of string.

Using regex replacement in Sublime 3

I am trying to use replace in Sublime using regular expressions but I'm stuck. I tried various combinations but don't seem to be getting there.
This is the input and my desired output:
Input: N_BBP_c_46137_n
Output : BBP
I tried combinations of:
[^BBP]+\b
\*BBP*+\g
But none of the above (and many others) don't seem to work.
To turn N_BBP_c_46137_n into BBP and according to the comment just want that entire long name such as N_BBP_ to be replaced by only BBP* you might also use a capture group to keep BBP.
\bN_(BBP)_\S*
\bN_ Match N preceded by a word boundary
(BBP) Capture group 1, match BBP (or use [A-Z]+ to match 1+ uppercase chars)
_\S* Match _ followed by 0+ times a non whitespace char
In the replacement use the first capturing group $1
Regex demo
You may use
(N_)[^_]*(_c_\d+_n)
Replace with ${1}some new value$2.
Details
(N_) - Group 1 ($1 or ${1} if the next char is a digit): N_
[^_]* - any 0 or more chars other than _
-(_c_\d+_n) - Group 2 ($2): _c_, 1 or more digits and then _n.
See the regex demo.

How to use Ruby gsub with regex to do partial string substitution

I have a pipe delimited file which has a line
H||CUSTCHQH2H||PHPCCIPHP|1010032000|28092017|25001853||||
I want to substitute the date (28092017) with a regex "[0-9]{8}" if the first character is "H"
I tried the following example to test my understanding where Im trying to subtitute "a" with "i".
str = "|123||a|"
str.gsub /\|(.*?)\|(.*?)\|(.*?)\|/, "\|\\1\|\|\\1\|i\|"
But this is giving o/p as
"|123||123|i|"
Any clue how this can be achieved?
You may replace the first occurrence of 8 digits inside pipes if a string starts with H using
s = "H||CUSTCHQH2H||PHPCCIPHP|1010032000|28092017|25001853||||"
p s.gsub(/\A(H.*?\|)[0-9]{8}(?=\|)/, '\100000000')
# or
p s.gsub(/\AH.*?\|\K[0-9]{8}(?=\|)/, '00000000')
See the Ruby demo. Here, the value is replaced with 8 zeros.
Pattern details
\A - start of string (^ is the start of a line in Ruby)
(H.*?\|) - Capturing group 1 (you do not need it when using the variation with \K): H and then any 0+ chars as few as possible
\K - match reset operator that discards the text matched so far
[0-9]{8} - eight digits
(?=\|) - the next char must be |, but it is not added to the match value since it is a positive lookahead that does not consume text.
The \1 in the first gsub is a replacement backreference to the value in Group 1.

regex to match pattern followed some string

I have following text. I want to capture the pattern ddd-dd-ddd followed by all text until I again hit a ddd-dd-ddd.
I am trying to use this regex
\b[0-9]{3}-[0-9]{2}-[0-9]{3}\b.*
it matches 982-99-122 followed by the sentence until it hits a line feed. then again the second number 586-33-453 is matched followed by the text on the same line. but it fails to capture the text that continues on the next line.
OR if I remove the line feed from this string, it will only capture the first number 982-99-122 and captures the whole string i.e. does not match the second number 586-33-453
How should I fix both these issues, 1. when line feeds are part of the string and 2. when the string does not have line feeds.
982-99-122 (FCC 333/22) lube oil service pump 1b discharge lube oil service pump
aaa bb dsdsd
586-33-453 Matches exactly 3 times 0-e single character in the range between
dfldfldflkdf 545-66-666 sdkjsl () jdfkjd-kfdkf sdfl
848-99-040 sdsd"" df
dfdf
It seems you want
\b([0-9]{3}-[0-9]{2}-[0-9]{3})\b([\s\S]*?)(?=\b[0-9]{3}-[0-9]{2}-[0-9]{3}\b|$)?
See the regex demo
Details
\b - word boundary
([0-9]{3}-[0-9]{2}-[0-9]{3}) - 3 digits, -, 2 digits, - and 3 digits
\b - word boundary
([\s\S]*?) - Group 2: any 0+ chars, as few as possible
(?=\b[0-9]{3}-[0-9]{2}-[0-9]{3}\b|$)? - a positive lookahead that requires 3 diigts, -, 2 digits, - and 3 digits as a whole word or end of string immediately to the right of the current location.

Regex to match a unlimited repeating pattern between two strings

I have a dataset with repeating pattern in the middle:
YM10a15b5c27
and
YM1b5c17
How can I get what is between "YM" and the last two numbers?
I'm using this but is getting one number in the end and should not.
/([A-Z]+)([0-9a-z]+)([0-9]+)/
Capture exactly two characters in the last group:
/([A-Z]+)([0-9a-z]+)([0-9]{2})/
You should use:
/^(?:([a-z]+))([0-9a-z]+)(?=\1)/
^ matches the start of the sentence. This is really important, because if your code is aaaa1234aaaa, then without the ^, it would also match the aaaa of the end.
(?:([a-z]+)) is a non-capturing group which takes any letter from 'a' to 'z' as group 1
(?=\1) tells the regex to match the text as long as it is followed by the same code at the starting.
All you have to do is extract the code by group(2)
An example is shown here.
Solution
If you want to match these strings as whole words, use \b(([a-z])\2)([0-9a-z]+)(\1)\b. If you need to match them as separate strings, use ^(([a-z])\2)([0-9a-z]+)(\1)$.
Explanation
\b - a word boundary (or if ^ is used, start of string)
(([a-z])\2) - Group 1: any lowercase ASCII letter, exactly two occurrences (aa, bb, etc.)
([0-9a-z]+) - Group 3: 1 or more digits or lowercase ASCII letters
(\1) - Group 4: the same text as stored in Group 1
\b - a word boundary (or if $ is used, end of string).