Extract specific text surrounded by white space - regex

I have written some basic regex:
/[0123BCDER]/g
I would like to extract the bold numbers from the lines below. The regex I have written extracts the character, but I would only like to extract it when it is surrounded by white space. Any help would be much appreciated.
721658A220421EE5867 AMBER YUR DE STE 30367887462580 **1** 00355132
172638A220421ER3028 NIKITA YUAN 318058763400580 **1** 00355133
982658A230421MC1234 SEAN D W MC100050420965155230421 **3** 14032887609303 00355134
Please note the character or digit will always be by itself.

You are looking for something like this: /\s\d\s/g.
\s - match whitespace,
\d - match any digit,
/g - match all occurrences.
You can also replace \d with e.g. [0123BCDER] (your example) or [0-9A-Za-z] (all alphanumeric).
const input = `721658A220421EE5867 AMBER YUR DE STE 30367887462580 1 00355132
172638A220421ER3028 NIKITA YUAN 318058763400580 1 00355133 _
982658A230421MC1234 SEAN D W MC100050420965155230421 3 14032887609303 00355134
`
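// digits only; each match includes the surrounding whitespace
const res = input.match(/\s\d\s/g)
console.log(res)
// any single alphanumeric character, again with the surrounding whitespace
const res2 = input.match(/\s[A-Za-z0-9]\s/g)
console.log(res2)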
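If the surrounding whitespace should not be part of the result, one alternative (not taken from the answer above) is to assert the whitespace with lookarounds instead of matching it. The sketch below uses Python's re module rather than the JavaScript of the answer; standalone_chars and SAMPLE are hypothetical names, and the sample text is the question's data with the bold markers removed.

```python
import re

# The question's three lines, without the ** bold markers.
SAMPLE = """721658A220421EE5867 AMBER YUR DE STE 30367887462580 1 00355132
172638A220421ER3028 NIKITA YUAN 318058763400580 1 00355133
982658A230421MC1234 SEAN D W MC100050420965155230421 3 14032887609303 00355134
"""


def standalone_chars(text: str, char_class: str = r"\d") -> list:
    """Return single characters from char_class that stand alone between whitespace."""
    # (?<!\S): no non-whitespace character immediately before the match
    # (?!\S):  no non-whitespace character immediately after the match
    pattern = re.compile(rf"(?<!\S){char_class}(?!\S)")
    return pattern.findall(text)


digits = standalone_chars(SAMPLE)                  # digits only
alnum = standalone_chars(SAMPLE, r"[A-Za-z0-9]")   # any single alphanumeric
print(digits)
print(alnum)
```

Because the lookarounds do not consume the whitespace, neighbouring single characters such as D and W in the third line are both reported; with \s on both sides they would compete for the space they share.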

Notepad++ add new line above changing syntax with replace

I have a constant string "Se " that is preceded by a number that changes. I want to add a newline \n before the number. I've tried using \C to stand for any character (the changing number) in the replacement, but I don't know how to carry the number over into the result.
This is what it currently looks like:
1 hinge 2pk
1 Se wall cabinet
4 door 15x40"
I want the new line to be above any item that includes "Se", so that it looks like this:
1 hinge 2pk

1 Se wall cabinet
4 door 15x40"
This is what I've tried so far (not including the brackets):
REPLACE TOOL
Find what: [\C Se ]
Replace with: [\n\C Se ]
✓ = Regular expression
but this is what I get
1 hinge 2pk
C Se wall cabinet
4 door 15x40
How do I get the number to the left of "Se" to carry over (this number is always changing)?
You can use:
^\d+\h+Se\b
^ Start of line
\d+ Match 1+ digits
\h+ Match 1+ horizontal whitespace characters
Se\b Match Se followed by a word boundary
In the replacement, use a newline followed by the full match: \n$0
Find what:
^\d+\h+Se\b
Replace with
\n$0
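To check the idea outside the editor, here is a minimal Python sketch of the same find-and-replace. It is an approximation, not Notepad++ itself: Python's re has no \h, so [ \t] stands in for horizontal whitespace, and \g<0> plays the role of $0. The function name add_blank_line_before_se is hypothetical.

```python
import re

# ^ needs re.MULTILINE in Python to match at the start of every line.
SE_LINE = re.compile(r"^\d+[ \t]+Se\b", re.MULTILINE)


def add_blank_line_before_se(text: str) -> str:
    """Insert a newline before every line that starts with '<number> Se'."""
    # \g<0> is the full match, like $0 in the Notepad++ replacement.
    return SE_LINE.sub(r"\n\g<0>", text)


before = '1 hinge 2pk\n1 Se wall cabinet\n4 door 15x40"'
after = add_blank_line_before_se(before)
print(after)
```

The word boundary after Se keeps a line such as "2 Sea shells" from being treated as an "Se" item.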
Well, try this simple code, hope it will help...
Find: ^(\d.*? Se .*\n)
Replace with: \n$1 or \n\1

Can I extract the values using regex in a single regex string

Any help will be appreciated. I have written a regex that fails in some edge cases, and I am not sure whether there is a way to handle them.
I am trying to extract the values that have a prefix such as 1.1, 1.2, and so on.
The regex I am using is
"[1-9]\.[1-9]([^\s]+)"
It extracts the first three values, but for 4.1, whose value contains a space, only part is extracted. If I use "[1-9]\.1.*[(XDX)]$" instead, it captures the whole line.
Currently I have logic that checks for MR, splits on it, and puts the pieces in an array, which is a very inefficient way to do it.
Let me know if you can think of a better solution than this one.
GIBBERISH
1.1CDDAX/SXEVEN MR*XDX 2.1CDDAX/JEROME MR*XDX
3.1CDDAX/SIXM MR*XDX 4.1CDDAX AMX/SIXM MR*XDX
1 OXP EY 31SED W PK3 MEL/REDOOK DEOPRE 31SED21 XO XRXVEL DEF
EXPRESSA VERO IN IIS AETATIBUS, QUAE IAM CONFIRMATAE SUNT. ATQUI
PERSPICUUM EST HOMINEM E CORPORE ANIMOQUE CONSTARE,
CUM PRIMAE SINT ANIMI PARTES, SECUNDAE CORPORIS. TUM QUINTUS:
EST PLANE, PISO, UT DICIS, INQUIT. BONA AUTEM CORPORIS HUIC SUNT,
QUOD POSTERIUS POSUI, SIMILIORA. ILLA TAMEN SIMPLICIA
You may use
(?<!\S)[1-9]\.[1-9](.*?)(?=\s+MR\*XDX|$)
Or,
(?<!\S)[1-9]\.[1-9]((?:(?!\s+MR\*XDX).)+)
Details
(?<!\S) - the current location must be preceded by whitespace or be at the start of the string
[1-9]\.[1-9] - a digit from 1 to 9, then a ., and then again a digit from 1 to 9
(.*?) - Capturing group 1: any 0+ chars other than line break chars, as few as possible
(?=\s+MR\*XDX|$) - a positive lookahead that makes .*? stop before the first occurrence of either:
  \s+MR\*XDX - 1+ whitespace characters and then the MR*XDX substring, or
  $ - end of string.
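As a quick check, regex #1 can be run against the sample lines in Python, whose re module supports the same lookbehind and lookahead syntax. This is only a sketch: the variable names are hypothetical, and re.MULTILINE is an added assumption so that $ can also match at the end of each line.

```python
import re

# Regex #1 from the answer; group 1 holds the value after the "n.n" prefix.
VALUE = re.compile(r"(?<!\S)[1-9]\.[1-9](.*?)(?=\s+MR\*XDX|$)", re.MULTILINE)

text = (
    "1.1CDDAX/SXEVEN MR*XDX 2.1CDDAX/JEROME MR*XDX\n"
    "3.1CDDAX/SIXM MR*XDX 4.1CDDAX AMX/SIXM MR*XDX\n"
    "1 OXP EY 31SED W PK3 MEL/REDOOK DEOPRE 31SED21 XO XRXVEL DEF\n"
)

# findall returns the contents of group 1 for every match.
values = VALUE.findall(text)
print(values)
```

The value after 4.1 keeps its internal space because the lazy group only stops at whitespace that is followed by MR*XDX.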

Identify an address number in a string

I have a list of addresses, currently quite unclean. They take the format:
955 - 959 Fake Street
95-99 Fake Street
4-9 M4 Ln
95 - 99 Fake Street
99 Fake Street
What I would like to do is split up the street name and street number. I need a regular expression that matches
955 - 959
95-99
4-9
95 - 99
99
I currently have this:
^[0-9][0-9]\s*+(\s*-\s*[0-9][0-9]+)
which works for the two-digit addresses but does not work for the three-digit or one-digit addresses.
Thanks
I'm not sure what you're trying to do with \s*+ here, but you basically had the answer in the last part: [0-9][0-9]+ finds 2+ digits at the end.
Maybe try this (it's more concise). It searches for 1+ digits instead of 2+:
\d+(\s*-\s*\d+)?
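A small Python sketch can confirm that this pattern accepts every number format listed in the question. The names NUMBER, samples and results are hypothetical; fullmatch is used so that the whole string has to be a street number.

```python
import re

# The concise pattern from the answer above.
NUMBER = re.compile(r"\d+(\s*-\s*\d+)?")

# The five number formats the question wants to accept.
samples = ["955 - 959", "95-99", "4-9", "95 - 99", "99"]

# fullmatch succeeds only if the entire string is a number or a number range.
results = {s: bool(NUMBER.fullmatch(s)) for s in samples}
print(results)

# On a full address, match() returns just the leading number part.
print(NUMBER.match("4-9 M4 Ln").group())
```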
You can use braces, e.g. {2,3} for 2-3 digits, and note that *+ isn't right here.
/^(([0-9]{1,3}\s*-\s*)?[0-9]{1,3})\s/
I nested the parentheses, so you only want the first capture group from the regex.
It breaks down like this:
([0-9]{1,3}\s*-\s*)?
First, is there a 1-3 digit number followed by a dash, with optional whitespace around the dash? This part is OPTIONAL.
Then, does it end in a 1-3 digit number followed by a whitespace character?
Starting from your regex:
^[0-9][0-9]\s*+(\s*-\s*[0-9][0-9]+)
You have an extra whitespace matcher in the second block:
^[0-9][0-9]\s*+(-\s*[0-9][0-9]+)
I would suggest you replace [0-9] with \d:
^[\d][\d]\s*+(-\s*[\d][\d]+)
Use a + instead of two copies of \d, meaning at least one digit:
^[\d]+\s*+(-\s*[\d]+)
Make the last block optional, so it matches 99 Fake Street:
^[\d]+\s*+(-\s*[\d]+)?
If you know there's only going to be at most one whitespace character, you could replace \s* with \s?:
^[\d]+\s?(-\s?[\d]+)?
That should match all of them :D
For your example, you can do:
/^(\d+[-\s\d]*)\s/gm
Explanation:
/^(\d+[-\s\d]*)\s/gm
^ - start of line
\d+ - at least 1 digit, and as many digits as possible
[-\s\d] - any character from the set: -, whitespace, digit
* - zero or more characters from that set
\s - trailing whitespace
m - multiline flag, so that ^ asserts the start of each line
Another way, in Python (after import re), could be:
In [83]: s = '955 - 959 Fake Street'
In [84]: s1 = '95-99 Fake Street'
In [85]: s2 = '95 - 99 Fake Street'
In [86]: s3 = '99 Fake Street'
In [87]: d = re.search(r'^[0-9]+[ ]*(-[ ]*[0-9]+){0,1}', s3)
In [88]: d.group()
Out[88]: '99 '
In [89]: d = re.search(r'^[0-9]+[ ]*(-[ ]*[0-9]+){0,1}', s2)
In [90]: d.group()
Out[90]: '95 - 99'
In [91]: d = re.search(r'^[0-9]+[ ]*(-[ ]*[0-9]+){0,1}', s1)
In [92]: d.group()
Out[92]: '95-99'
In [93]: d = re.search(r'^[0-9]+[ ]*(-[ ]*[0-9]+){0,1}', s)
In [94]: d.group()
Out[94]: '955 - 959'
The character set 0-9 can be represented by \d, like this:
d = re.search(r'^[\d]+[ ]*(-[ ]*[\d]+){0,1}', s)
In all of these examples, we search at the beginning of the string for a sequence of at least one digit, followed by zero or more spaces, optionally followed by a single - symbol, zero or more spaces, and one or more digits.
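Since the original goal was to split the street number from the street name, the same idea can be extended with a second capture group. The following is a minimal sketch under that assumption; ADDRESS and split_address are hypothetical names, and the pattern combines the pieces discussed above without being any one answer's exact regex.

```python
import re

# Group 1: a number or a number range; group 2: the rest of the line.
ADDRESS = re.compile(r"^(\d+(?:\s*-\s*\d+)?)\s+(.*)$")


def split_address(address: str):
    """Return (street_number, street_name), or None if there is no leading number."""
    match = ADDRESS.match(address)
    if match is None:
        return None
    return match.group(1), match.group(2)


for line in ["955 - 959 Fake Street", "95-99 Fake Street", "4-9 M4 Ln",
             "95 - 99 Fake Street", "99 Fake Street"]:
    print(split_address(line))
```

Requiring \s+ between the two groups means the number is returned without the trailing space that the search above leaves on '99 '.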

Sublime Text regex with optional groups

Example Data (including newlines)
Alpha1
100
Bravo2
something else
Charlie3
200
Delta4
A==1
Echo5
300
Foxtrot6
I would like to get the output of:
Alpha1 100 Bravo2 something else
Charlie3 200 Delta4 A==1
Echo5 300 Foxtrot6
The pattern is:
AlphaNumeric
Numeric
AlphaNumeric
value that is not a single alphanumeric "word"
The first three parts are easy -- (\w+)\s+(\d+)\s+(\w+)\s+ -- but I don't know how to have the conditional fourth group. Is this possible? If so, how?
This pattern worked for me:
(\w+)\s+(\d+)\s+(\w+)(\s+([\w\s]+ \w+$))?
Of course, \w includes underscores, so replace \w with [a-z0-9] if necessary.
Update
This pattern is more specific and should be more reliable:
(\w+)\n(\d+)\n(\w+)(\n([^\n]*[^\w\n][^\n]*))?
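The more specific pattern can be tried outside Sublime Text with a short Python sketch. The answer does not state the replacement string, so joining the captured groups with single spaces is an assumption here, and RECORD and join_record are hypothetical names.

```python
import re

# The updated pattern from the answer: three single-word lines, then an
# optional fourth line that must contain at least one non-word character.
RECORD = re.compile(r"(\w+)\n(\d+)\n(\w+)(\n([^\n]*[^\w\n][^\n]*))?")


def join_record(match):
    """Join the captured lines with spaces, skipping the optional fourth one if absent."""
    parts = [match.group(1), match.group(2), match.group(3), match.group(5)]
    return " ".join(p for p in parts if p)


data = (
    "Alpha1\n100\nBravo2\nsomething else\n"
    "Charlie3\n200\nDelta4\nA==1\n"
    "Echo5\n300\nFoxtrot6"
)

result = RECORD.sub(join_record, data)
print(result)
```

The newline between records is never part of a match, so each record ends up on its own line.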