I have written some basic regex:
/[0123BCDER]/g
I would like to extract the bold numbers from the lines below. The regex I have written matches the character wherever it occurs, but I only want to match it when it is surrounded by whitespace. Any help would be much appreciated.
721658A220421EE5867 AMBER YUR DE STE 30367887462580 **1** 00355132
172638A220421ER3028 NIKITA YUAN 318058763400580 **1** 00355133
982658A230421MC1234 SEAN D W MC100050420965155230421 **3** 14032887609303 00355134
Please note the character or digit will always be by itself.
You are looking for something like this: /\s\d\s/g.
\s - match whitespace,
\d - match any digit,
/g - match all occurrences.
You can also replace \d with e.g. [0123BCDER] (your example) or [0-9A-Za-z] (all alphanumeric). Note that each match includes the surrounding whitespace characters.
const input = `721658A220421EE5867 AMBER YUR DE STE 30367887462580 1 00355132
172638A220421ER3028 NIKITA YUAN 318058763400580 1 00355133 _
982658A230421MC1234 SEAN D W MC100050420965155230421 3 14032887609303 00355134
`
// with whitespaces
const res = input.match(/\s\d\s/g)
console.log(res)
// alphanumeric
const res2 = input.match(/\s[A-Za-z0-9]\s/g)
console.log(res2)
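If you want only the character itself, without the surrounding whitespace in each match, lookarounds are one option. Below is a minimal Python sketch, assuming Python's re module rather than the JavaScript above. The variable names are made up for illustration, and the sample text is the question's lines with the bold markers removed.

```python
import re

text = """721658A220421EE5867 AMBER YUR DE STE 30367887462580 1 00355132
172638A220421ER3028 NIKITA YUAN 318058763400580 1 00355133
982658A230421MC1234 SEAN D W MC100050420965155230421 3 14032887609303 00355134
"""

# (?<!\S): no non-whitespace character right before (whitespace or start of text)
# (?!\S):  no non-whitespace character right after (whitespace or end of text)
lone_digits = re.findall(r"(?<!\S)\d(?!\S)", text)
lone_alnum = re.findall(r"(?<!\S)[0-9A-Za-z](?!\S)", text)

print(lone_digits)
print(lone_alnum)
```

Because lookarounds do not consume the whitespace, adjacent single characters such as D and W in the third line are both found, whereas /\s[A-Za-z0-9]\s/g consumes the space after D and therefore skips W.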
I have a constant string "Se ", but the number in front of it changes. I want to add a newline (\n) before the number. I tried using \c to stand for any character (the changing number) in the replacement, but I don't know how to get the number to carry over.
This is what it currently looks like:
1 hinge 2pk
1 Se wall cabinet
4 door 15x40"
I want a new line above any item that includes "Se", so that it looks like this:
1 hinge 2pk

1 Se wall cabinet
4 door 15x40"
This is what I've tried so far (not including the brackets):
REPLACE TOOL
Find what: [\C Se ]
Replace with: [\n\C Se ]
✓ = Regular expression
but this is what I get
1 hinge 2pk
C Se wall cabinet
4 door 15x40
How do I get the number to the left of "Se" to carry over (this number is always changing)?
You can use:
^\d+\h+Se\b
^ Start of the line
\d+ Match 1+ digits
\h+ Match 1+ horizontal whitespace characters (spaces or tabs)
Se\b Match Se followed by a word boundary
In the replacement, use a newline followed by the full match: \n$0
Find what:
^\d+\h+Se\b
Replace with
\n$0
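The same replacement can be sketched outside the editor. This is a Python approximation, not the Find/Replace dialog itself: Python's re module has no \h, so [ \t]+ stands in for it, re.MULTILINE makes ^ match at the start of each line, and \g<0> plays the role of $0. The sample string is taken from the question, and the variable names are illustrative.

```python
import re

items = '1 hinge 2pk\n1 Se wall cabinet\n4 door 15x40"\n'

# ^ with re.MULTILINE anchors at each line start; [ \t]+ approximates \h+.
se_line = re.compile(r"^\d+[ \t]+Se\b", re.MULTILINE)

# \g<0> re-inserts the whole match, so the changing number is preserved.
result = se_line.sub(r"\n\g<0>", items)
print(result)
```

Only the line that starts with a number followed by Se gets a blank line above it; the other lines are left as they were.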
Well, try this simple pattern, hope it will help...
Find: ^(\d.*? Se .*\n)
Replace with: \n$1 or \n\1
Any help will be appreciated. I have written a regex that fails in some edge cases, and I am not sure whether there is a way to handle them.
I am trying to extract the values that have a 1.1, 1.2, etc. in front of them.
The regex I am using is
"[1-9]\.[1-9]([^\s]+)". It extracts the first three values, but for 4.1, which contains a space, only part of the value is extracted. If I use "[1-9]\.1.*[(XDX)]$", it captures the whole line.
Currently I have written logic that checks for MR, splits on it, and puts the pieces in an array, which is very inefficient.
Let me know if you can think of a better solution than this one.
GIBBERISH
1.1CDDAX/SXEVEN MR*XDX 2.1CDDAX/JEROME MR*XDX
3.1CDDAX/SIXM MR*XDX 4.1CDDAX AMX/SIXM MR*XDX
1 OXP EY 31SED W PK3 MEL/REDOOK DEOPRE 31SED21 XO XRXVEL DEF
EXPRESSA VERO IN IIS AETATIBUS, QUAE IAM CONFIRMATAE SUNT. ATQUI
PERSPICUUM EST HOMINEM E CORPORE ANIMOQUE CONSTARE,
CUM PRIMAE SINT ANIMI PARTES, SECUNDAE CORPORIS. TUM QUINTUS:
EST PLANE, PISO, UT DICIS, INQUIT. BONA AUTEM CORPORIS HUIC SUNT,
QUOD POSTERIUS POSUI, SIMILIORA. ILLA TAMEN SIMPLICIA
You may use
(?<!\S)[1-9]\.[1-9](.*?)(?=\s+MR\*XDX|$)
Or,
(?<!\S)[1-9]\.[1-9]((?:(?!\s+MR\*XDX).)+)
Details
(?<!\S) - the current location must be preceded by whitespace or be at the start of the string
[1-9]\.[1-9] - a digit from 1 to 9, then a ., and then again a digit from 1 to 9
(.*?) - Capturing group 1: any 0+ chars other than line break chars, as few as possible
(?=\s+MR\*XDX|$) - a positive lookahead that makes .*? stop right before the first occurrence of either \s+MR\*XDX (1+ whitespace chars and then the MR*XDX substring) or $ (end of string).
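To try the first pattern, here is a minimal Python sketch, assuming Python's re module (the answer does not name a regex flavor). The sample string is the first two data lines from the question, and the variable names are illustrative.

```python
import re

data = (
    "1.1CDDAX/SXEVEN MR*XDX 2.1CDDAX/JEROME MR*XDX\n"
    "3.1CDDAX/SIXM MR*XDX 4.1CDDAX AMX/SIXM MR*XDX\n"
)

# Group 1 holds the text between the N.N prefix and the " MR*XDX" suffix.
value_re = re.compile(r"(?<!\S)[1-9]\.[1-9](.*?)(?=\s+MR\*XDX|$)")

# With a single capturing group, findall returns the group 1 strings.
values = value_re.findall(data)
print(values)
```

The fourth value keeps its internal space because the lazy .*? only stops where whitespace is followed by MR*XDX, not at the first whitespace.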
I have a list of addresses, currently quite unclean. They take the format:
955 - 959 Fake Street
95-99 Fake Street
4-9 M4 Ln
95 - 99 Fake Street
99 Fake Street
What I would like to do is split up the street name and street number. I need a regular expression that matches
955 - 959
95-99
4-9
95 - 99
99
I currently have this:
^[0-9][0-9]\s*+(\s*-\s*[0-9][0-9]+)
which works for the two digit addresses but does not work for the three or one digit addresses.
Thanks
I'm not sure what you're trying to do with \s*+, but you basically had the answer in the last part: [0-9][0-9]+ finds 2+ digits at the end.
Maybe try this (it's more concise). This searches for 1+ digits instead of 2+
\d+(\s*-\s*\d+)?
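As a rough check, the pattern above can be extended to split each address into number and street name. This is a minimal Python sketch under a few assumptions: the ^ anchor, the trailing \s+(.*)$ part, the non-capturing (?:...) group, and all variable names are additions for illustration, not part of the answer.

```python
import re

addresses = [
    "955 - 959 Fake Street",
    "95-99 Fake Street",
    "4-9 M4 Ln",
    "95 - 99 Fake Street",
    "99 Fake Street",
]

# Group 1: street number or number range; group 2: the rest (street name).
# The (?:...) range part is optional so that "99 Fake Street" also matches.
address_re = re.compile(r"^(\d+(?:\s*-\s*\d+)?)\s+(.*)$")

split_addresses = []
for address in addresses:
    match = address_re.match(address)
    if match:  # lines that do not start with a number are skipped
        split_addresses.append((match.group(1), match.group(2)))

for number, street in split_addresses:
    print(f"{number!r} -> {street!r}")
```

Capturing the number and the name in one pass avoids a second step to strip the number off the front of each line.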
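You can use braces, e.g. {2,3} for 2-3 digits. The *+ is also not needed: in flavors that support it, it is a possessive quantifier, and elsewhere it is a syntax error.
/^(([0-9]{1,3}\s-\s)?[0-9]{1,3})\s/
I nested the parentheses, so you only want the first capture group from the match.
It breaks down like this:
([0-9]{1,3}\s-\s)?
First: is there a 1-3 digit number followed by space-dash-space? This part is OPTIONAL.
Then: is there a 1-3 digit number followed by a space?
Note that \s-\s requires exactly one whitespace character on each side of the dash, so 95-99 is not matched; use \s*-\s* to allow for that.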
Starting from your regex:
^[0-9][0-9]\s*+(\s*-\s*[0-9][0-9]+)
You have an extra whitespace matcher in the second group:
^[0-9][0-9]\s*+(-\s*[0-9][0-9]+)
I would suggest you replace [0-9] with \d
^[\d][\d]\s*+(-\s*[\d][\d]+)
Use a + instead of two copies of \d, meaning at least one digit:
^[\d]+\s*+(-\s*[\d]+)
Make the last block optional, so it matches 99 Fake Address:
^[\d]+\s*+(-\s*[\d]+)?
If you know there's only going to be at most one whitespace character, you could replace \s* with \s?:
^[\d]+\s?(-\s?[\d]+)?
That should match all of them :D
For your example, you can do:
/^(\d+[-\s\d]*)\s/gm
Explanation:
/^(\d+[-\s\d]*)\s/gm
^ - start of line
\d+ - at least 1 digit and as many digits as possible
[-\s\d] - any character of the set -, whitespace, digit
* - zero or more (of that set)
\s - trailing space
m - multiline, for the ^ start of line assertion
Another way could be:
In [82]: import re
In [83]: s = '955 - 959 Fake Street'
In [84]: s1 = '95-99 Fake Street'
In [85]: s2 = '95 - 99 Fake Street'
In [86]: s3 = '99 Fake Street'
In [87]: d = re.search(r'^[0-9]+[ ]*(-[ ]*[0-9]+){0,1}', s3)
In [88]: d.group()
Out[88]: '99 '
In [89]: d = re.search(r'^[0-9]+[ ]*(-[ ]*[0-9]+){0,1}', s2)
In [90]: d.group()
Out[90]: '95 - 99'
In [91]: d = re.search(r'^[0-9]+[ ]*(-[ ]*[0-9]+){0,1}', s1)
In [92]: d.group()
Out[92]: '95-99'
In [93]: d = re.search(r'^[0-9]+[ ]*(-[ ]*[0-9]+){0,1}', s)
In [94]: d.group()
Out[94]: '955 - 959'
The character set 0-9 can be represented by \d, like this:
d = re.search(r'^[\d]+[ ]*(-[ ]*[\d]+){0,1}', s)
In all the examples, we search at the beginning of the string for a sequence of one or more digits, followed by zero or more spaces, optionally followed by a single - symbol, zero or more spaces, and one or more digits.
Example Data (including newlines)
Alpha1
100
Bravo2
something else
Charlie3
200
Delta4
A==1
Echo5
300
Foxtrot6
I would like to get the output of:
Alpha1 100 Bravo2 something else
Charlie3 200 Delta4 A==1
Echo5 300 Foxtrot6
The pattern is:
AlphaNumeric
Numeric
AlphaNumeric
value that is not a single alphanumeric "word"
The first three parts are easy -- (\w+)\s+(\d+)\s+(\w+)\s+ -- but I don't know how to have the conditional fourth group. Is this possible? If so, how?
This pattern worked for me:
(\w+)\s+(\d+)\s+(\w+)(\s+([\w\s]+ \w+$))?
Of course, \w includes underscores, so replace \w with [a-z0-9] if necessary.
Update
This pattern is more specific and should be more reliable:
(\w+)\n(\d+)\n(\w+)(\n([^\n]*[^\w\n][^\n]*))?
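To see the more specific pattern in action, here is a minimal Python sketch, assuming Python's re module and the example data from the question. One small change is made for convenience and is not part of the answer: the outer optional group is non-capturing, so each match has exactly four groups. The variable names are illustrative.

```python
import re

data = (
    "Alpha1\n100\nBravo2\nsomething else\n"
    "Charlie3\n200\nDelta4\nA==1\n"
    "Echo5\n300\nFoxtrot6\n"
)

# The optional fourth line must contain at least one character that is
# neither a word character nor a newline, so a lone word is never consumed.
record_re = re.compile(r"(\w+)\n(\d+)\n(\w+)(?:\n([^\n]*[^\w\n][^\n]*))?")

rows = []
for match in record_re.finditer(data):
    # groups() yields None for the fourth part when it is absent
    rows.append(" ".join(part for part in match.groups() if part))

print("\n".join(rows))
```

Requiring a non-word character in the fourth line is what keeps the start of the next record (a single alphanumeric word) from being swallowed as the optional value.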