Can I extract the values using regex in a single regex string - regex

Any help would be appreciated. I have written a regex that fails in some edge cases, and I am not sure whether there is a way to handle them.
I am trying to extract the values that are prefixed with 1.1, 1.2, etc.
The regex I am using is
"[1-9]\.[1-9]([^\s]+)". It extracts the first three values, but for 4.1, whose value contains a space, only part is extracted. If I use "[1-9]\.1.*[(XDX)]$" instead, it captures the whole line.
Currently I have logic that checks for MR, splits on it, and puts the pieces in an array, which is a very inefficient way to do it.
Let me know if you can think of a better solution than this one.
GIBBERISH
1.1CDDAX/SXEVEN MR*XDX 2.1CDDAX/JEROME MR*XDX
3.1CDDAX/SIXM MR*XDX 4.1CDDAX AMX/SIXM MR*XDX
1 OXP EY 31SED W PK3 MEL/REDOOK DEOPRE 31SED21 XO XRXVEL DEF
EXPRESSA VERO IN IIS AETATIBUS, QUAE IAM CONFIRMATAE SUNT. ATQUI
PERSPICUUM EST HOMINEM E CORPORE ANIMOQUE CONSTARE,
CUM PRIMAE SINT ANIMI PARTES, SECUNDAE CORPORIS. TUM QUINTUS:
EST PLANE, PISO, UT DICIS, INQUIT. BONA AUTEM CORPORIS HUIC SUNT,
QUOD POSTERIUS POSUI, SIMILIORA. ILLA TAMEN SIMPLICIA

You may use
(?<!\S)[1-9]\.[1-9](.*?)(?=\s+MR\*XDX|$)
Or,
(?<!\S)[1-9]\.[1-9]((?:(?!\s+MR\*XDX).)+)
See this regex #1 demo or regex #2 demo
Details
(?<!\S) - a whitespace should come right before the current location or start of string
[1-9]\.[1-9] - a digit from 1 to 9, then a ., and then again a digit from 1 to 9
(.*?) - Capturing group 1: any 0+ chars other than line break chars, as few as possible
(?=\s+MR\*XDX|$) - .*? will stop matching before the first occurrence of
\s+MR\*XDX - 1+ whitespace and then MR*XDX substring
| - or
$ - end of string.
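A minimal sketch of both patterns applied to the two sample lines from the question. Python's re module is used only for illustration, since the question does not name a regex flavour, and the variable names are made up for the example.

```python
import re

# The two sample lines from the question that carry the numbered values.
text = (
    "1.1CDDAX/SXEVEN MR*XDX 2.1CDDAX/JEROME MR*XDX\n"
    "3.1CDDAX/SIXM MR*XDX 4.1CDDAX AMX/SIXM MR*XDX"
)

# Regex #1: lazy match that stops before " MR*XDX" or at the end of the input.
lazy = re.compile(r"(?<!\S)[1-9]\.[1-9](.*?)(?=\s+MR\*XDX|$)")

# Regex #2: consume one char at a time as long as " MR*XDX" does not start there.
tempered = re.compile(r"(?<!\S)[1-9]\.[1-9]((?:(?!\s+MR\*XDX).)+)")

# findall returns the contents of capturing group 1 for every match.
values_lazy = lazy.findall(text)
values_tempered = tempered.findall(text)

print(values_lazy)
print(values_tempered)
```

Both patterns keep the space inside the 4.1 value, because the match only stops at whitespace that is followed by MR*XDX.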

Related

Notepad++ add new line above changing syntax with replace

I have a constant string "Se " but the number in front of it changes. I want to add a newline \n before the number. I've tried using \c to stand for any character (the changing number) in the replacement, but I don't know how to carry the number over.
This is what it currently looks like:
1 hinge 2pk
1 Se wall cabinet
4 door 15x40"
I want the new line to be above any item that includes "Se", so that it looks like this:
1 hinge 2pk

1 Se wall cabinet
4 door 15x40"
This is what I've tried so far (not including the brackets):
REPLACE TOOL
Find what: [\C Se ]
Replace with: [\n\C Se ]
✓ = Regular expression
but this is what I get
1 hinge 2pk
C Se wall cabinet
4 door 15x40
How do I get the number to the left of "Se" to carry over into the replacement (this number is always changing)?
You can use:
^\d+\h+Se\b
^ Start of a line
\d+ Match 1+ digits
\h+ Match 1+ horizontal whitespace characters (spaces or tabs)
Se\b Match Se followed by a word boundary
Regex demo
In the replacement, use a newline followed by the full match: \n$0
Find what:
^\d+\h+Se\b
Replace with
\n$0
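The same find-and-replace can be sketched outside Notepad++. This is an illustrative Python port, not Notepad++ syntax: Python's re module has no \h, so [ \t]+ is substituted for it, and \g<0> plays the role of $0.

```python
import re

# Sample list from the question.
text = '1 hinge 2pk\n1 Se wall cabinet\n4 door 15x40"'

# [ \t]+ stands in for \h+, which Python's re module does not support.
# re.MULTILINE makes ^ match at the start of every line.
pattern = re.compile(r"^\d+[ \t]+Se\b", re.MULTILINE)

# \g<0> is the full match, so the number and "Se" are kept after the newline.
result = pattern.sub(r"\n\g<0>", text)

print(result)
```

Only the line whose second word is Se gets a blank line in front of it; the other lines are left untouched.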
Alternatively, try this simple approach with a capturing group:
Find: ^(\d.*? Se .*\n)
Replace with: \n$1 or \n\1

Regex to capture alpha numeric before pipe separated

I've been trying to create a regex that matches spaces and alphanumeric values.
Below is the sample input.
Manchester United 8547|12345678910
|12345678910
Manchester |12345678910
124587933 |12345678910
8457 Manchester United|12345678910
Manchester United|12345678910
I want to capture everything before the pipe (|) separator. Sometimes there is only whitespace and no alphanumeric values before the pipe, as shown in the 2nd example. The regex should not capture the pipe or the numeric value after it (12345678910).
I've tried the regexes below, but none of them work for me.
^.*$
^[\s\w\d]+$
[a-zA-Z0-9\s]+
[a-zA-Z0-9\s\W]+
^[\sa-z|A-Z|0-9]+$
^[\sa-z|A-Z|0-9]+$
[^\s]*$
([^\"]*)
^[a-zA-Z0-9]$
^([^?]*)$
.+?(?=\w)
\s[a-zA-Z0-9]+
^[\sa-zA-Z0-9]+
I need a full match, not a group match.
For example, to match
Manchester 8457 the regex would be Manchester \d+, which gives me a full match rather than a group match.
You can try this.
input.substring(0,input.indexOf("|"))
If you want to match alphanumeric before the pipe and not get a group match, but a match only, you can use a character class with a positive lookahead (?=\|) (if that is supported) to assert the pipe at the right.
^[A-Za-z0-9 ]+(?=\|)
Regex demo
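A short sketch of the lookahead approach, written in Python purely for illustration. The three sample lines are adapted from the question (the whitespace-only line is assumed to contain three spaces), and the full match is read with group(0), so no capturing group is involved.

```python
import re

# Sample lines adapted from the question; the second has only spaces before the pipe.
lines = [
    "Manchester United 8547|12345678910",
    "   |12345678910",
    "8457 Manchester United|12345678910",
]

# The lookahead (?=\|) asserts the pipe without making it part of the match.
pattern = re.compile(r"^[A-Za-z0-9 ]+(?=\|)")

# group(0) is the full match: everything before the pipe, spaces included.
matches = [pattern.search(line).group(0) for line in lines]

print(matches)
```

Because the character class uses +, a line with nothing at all before the pipe would not match; changing + to * would allow an empty match in that case.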
Assuming that every line has a pipe, you could split the input string on line breaks, and then extract the portion to the left of the pipe:
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

String input = "Manchester United 8547|12345678910\n |12345678910\nManchester |12345678910\n124587933 |12345678910\n8457 Manchester United|12345678910\n Manchester United|12345678910\n";
String[] parts = input.split("\r?\n");    // split into lines (LF or CRLF)
List<String> contents = Arrays.stream(parts)
    .map(x -> x.split("\\|")[0].trim())   // keep the text left of the pipe, trimmed
    .collect(Collectors.toList());
System.out.println(contents);
This prints:
[Manchester United 8547, , Manchester, 124587933, 8457 Manchester United, Manchester United]
To get the alphanumeric part, use the following:
^\s*\w(.+?)\|
This should answer your question, I guess.
^(.+?)\|
This pattern is anchored only at the beginning of the string and captures up to the first pipe.
The \| matches the literal pipe.
Try it here

regex return everything up to the first space after nth character

I have a list of product names and I want to shorten them (Short Name). I need a regex that will return the first word if it is more than 5 characters and the first two words if it is 5 characters or less.
Product Name              Short Name
BABY WIPES MIS /ALOE      BABY WIPES
PKU GEL PAK               PKU GEL
CA ASCORBATE TAB 500MG    CA ASCORBATE
SOD SUL/SULF CRE 10-2%    SOD SUL/SULF
ASPIRIN TAB 81MG EC       ASPIRIN
IRON TAB 325MG            IRON TAB
PEDA                      PEDA
I initially used:
^([^ \t]+).*
but it only returns the first word so BABY WIPES MIS /ALOE would be BABY. I then tried:
.....([^ \t]+)
But this appears to not work for names less than 5 characters. Any help would be greatly appreciated.
Brief
Your attempt is close; however, because you negated spaces and tabs, the pattern could not move past the first word.
Code
See code in use here
^(\S{1,5}[ \t]*?\S+).*$
Note: The link uses the following shortened regex. \h may not work in your flavour of regex, which is why the code above is posted as well.
^(\S{1,5}\h*?\S+).*$
Super-simplified it becomes ^\S{1,5}\h*?\S+ (without capture groups and .*$ as the OP initially used.)
Results
Input
BABY WIPES MIS /ALOE
PKU GEL PAK
CA ASCORBATE TAB 500MG
SOD SUL/SULF CRE 10-2%
ASPIRIN TAB 81MG EC
IRON TAB
PEDA
Output
BABY WIPES
PKU GEL
CA ASCORBATE
SOD SUL/SULF
ASPIRIN
IRON TAB
PEDA
Explanation
^ Assert position at the start of a line
(\S{1,5}[ \t]*?\S+) Capture group doing the following
\S{1,5} Match any non-whitespace character between 1 and 5 times
[ \t]*? Match space or tab characters any number of times, but as few as possible (note in PCRE regex, this can be replaced with \h*? to make it shorter)
\S+ Match any non-whitespace character between one and unlimited times
.* Match any character any number of times (except newline characters, assuming the s modifier is off, which it should be for this problem)
$ Assert position at the end of a line
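The explanation above can be checked with a small script. This is a sketch in Python (chosen only for illustration; the list name and loop are not from the answer) that uses the [ \t] version of the pattern, since Python's re module does not support \h.

```python
import re

# Input lines from the Results section above.
names = [
    "BABY WIPES MIS /ALOE",
    "PKU GEL PAK",
    "CA ASCORBATE TAB 500MG",
    "SOD SUL/SULF CRE 10-2%",
    "ASPIRIN TAB 81MG EC",
    "IRON TAB",
    "PEDA",
]

# Group 1 holds the short name; .*$ consumes the rest so sub() replaces the whole line.
pattern = re.compile(r"^(\S{1,5}[ \t]*?\S+).*$")

short_names = [pattern.sub(r"\1", name) for name in names]

for name, short in zip(names, short_names):
    print(f"{name!r} -> {short!r}")
```

For a first word longer than 5 characters, \S{1,5} takes its first five characters and \S+ takes the remainder of the same word, which is why ASPIRIN stays a single word.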
You can use a regex like this:
^\S{1,5} \S+|^\S+
or
^\S{1,5} ?\S*
Working demo
By the way, if you want to replace a full line with the shortened version, then you can use this regex instead:
(^\S{1,5} \S+|^\S+).*
or
(^\S{1,5} ?\S*).*
With the replacement string $1 or \1 depending on your regex engine.
Working demo
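As a quick sanity check, the two match-only patterns from this answer can be compared on the sample names. This is a Python sketch for illustration; both patterns assume the words are separated by a single literal space, so tab-separated names would behave differently.

```python
import re

names = [
    "BABY WIPES MIS /ALOE",
    "PKU GEL PAK",
    "CA ASCORBATE TAB 500MG",
    "SOD SUL/SULF CRE 10-2%",
    "ASPIRIN TAB 81MG EC",
    "IRON TAB",
    "PEDA",
]

# Variant 1: two words if the first has at most 5 characters, else the first word only.
alternation = re.compile(r"^\S{1,5} \S+|^\S+")
# Variant 2: the optional space and \S* cover both cases without alternation.
optional = re.compile(r"^\S{1,5} ?\S*")

short_a = [alternation.match(name).group(0) for name in names]
short_b = [optional.match(name).group(0) for name in names]

print(short_a)
print(short_b)
```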

regex for excluding text at end of string

I have a regular expression (built in Adobe JavaScript) that finds a string which can be of varying length.
The part I need help with: once the string is found, I need to exclude the extra characters at the end, which always start at 1 1.
This is the expression:
var re = new RegExp(/WASH\sHANDLING\sPLANT\s[-A-z0-9 ]{2,90}/);
This is the result:
WASH HANDLING PLANT SIZING STATION SERVICES SHEET 1 1 75 MOR03 MUP POS SU W ST1205 DWG 0001
I need to modify the regex to exclude the part of the string beginning with the 1 1.
Keep in mind that the string searched for can be of varying length, hence the {2,90}.
Can anyone advise how to modify the regex to exclude everything from 1 1 onwards?
Thank you
You may use a positive lookahead and keep the same functionality:
/WASH\sHANDLING\sPLANT\s[-A-Za-z0-9 ]{2,90}(?=\b1 1\b)/
                                           ^^^^^^^^^^^
The (?=\b1 1\b) lookahead requires 1 1 as whole "word" after your match.
See the regex demo
Also, note that [A-z] matches more than just letters.
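Both points can be illustrated with a short sketch. It is written in Python rather than Adobe JavaScript purely for demonstration; the pattern itself is the one from the answer, and the variable names are made up.

```python
import re

# Result string from the question.
text = ("WASH HANDLING PLANT SIZING STATION SERVICES SHEET 1 1 75 MOR03 "
        "MUP POS SU W ST1205 DWG 0001")

# The lookahead requires "1 1" right after the match without consuming it.
pattern = re.compile(r"WASH\sHANDLING\sPLANT\s[-A-Za-z0-9 ]{2,90}(?=\b1 1\b)")
match = pattern.search(text)

# The character class allows spaces, so the match ends with the space before "1 1".
title = match.group(0).rstrip()
print(title)

# [A-z] covers ASCII 65 to 122, which includes the six characters between Z and a.
extras = [c for c in "[\\]^_`" if re.fullmatch(r"[A-z]", c)]
print(extras)
```

The match keeps a trailing space, so the sketch strips it; whether that matters depends on how the result is used.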

Regular Expressions in R

I found somewhat similar questions
(R - Select string text between two values; regex for n characters or at least m characters),
but I'm still having trouble.
Say I have a string in R:
testing_String <- "AK ADAK NAS PADK ADK 70454 51 53N 176 39W 4 X T 7"
I need to pull everything between the first element in the string, which has 2 characters (AK), and PADK ADK. The values of PADK and ADK will change, but they will always be 4 and 3 characters long, respectively.
So I would need to pull
ADAK NAS
I came up with this, but it's picking up everything from AK to ADK:
^[A-Za-z0_9_]{2}(.*?) +[A-Za-z0_9_]{4}|[A-Za-z0_9_]{3,}
If I understood your question correctly, this should do the trick:
\b[A-Z]{2}\s+(.+?)\s+[A-Z]{4}\s+[A-Z]{3}\b
Demo
You'll have to switch on the perl = TRUE option (to use a decent regex engine).
\b means word boundary. So this pattern looks for a match starting with a 2-letter word and ending with a 4 letter word followed by a 3 letter word. Your value will be in the first group.
Alternatively, you can write the following to avoid using the capturing group:
\b[A-Z]{2}\s+\K.+?(?=\s+[A-Z]{4}\s+[A-Z]{3}\b)
But I'd prefer the first method because it's easier to read.
Lookbehind is supported for perl=TRUE, so this regex will do what you want:
(?<=\w{2}\s).*?(?=\s+[^\s]{4}\s[^\s]{2})
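The capturing-group pattern and the lookaround pattern can be compared on the sample string. This sketch is in Python rather than R, purely for illustration; the \K variant is left out because Python's re module does not support \K.

```python
import re

# Sample string from the question.
testing_string = "AK ADAK NAS PADK ADK 70454 51 53N 176 39W 4 X T 7"

# Capturing-group version: the wanted text is in group 1.
grouped = re.search(r"\b[A-Z]{2}\s+(.+?)\s+[A-Z]{4}\s+[A-Z]{3}\b", testing_string)
between_group = grouped.group(1)

# Lookaround version: the wanted text is the full match.
looked = re.search(r"(?<=\w{2}\s).*?(?=\s+[^\s]{4}\s[^\s]{2})", testing_string)
between_look = looked.group(0)

print(between_group)
print(between_look)
```

In R, the same patterns would be passed with perl = TRUE, with each backslash doubled inside the string literal.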