Regex to remove multiple whitespace characters stops at the first space

I am trying to find the correct regex to remove all whitespace from strings in different formats, like:
A 41 FR 38 (should become A41FR38)
DGT 4687 P (should become DGT4687P)
POL 789 EU (should become POL789EU)
I have tried:
[^\s]+
[^\d]+
and many others, but none seem to work; they only ever match up to the first space. For example, POL 789 EU would become POL, and W 85 EU would become W.
https://regex101.com/r/kA1sW4/1
Is this possible?
- EDIT -
I have just discovered that the strings are actually HTML output, such as:
.html">W 45 B 1 A 401 L</a>
so I tried html">([^<]*), and it outputs:
W 45 B 1 A 401 L
(still with spaces). What should I add to remove the spaces?
Demo (still with spaces): https://regex101.com/r/kA1sW4/2

Even simpler: just use str_replace:
echo str_replace(' ','','A 41 FR 38');
Results in:
A41FR38
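For comparison, here is a rough Python equivalent (a sketch, not part of the original answer): str.replace does what str_replace does, while re.sub with \s+ also removes tabs and newlines. For the HTML variant from the question's edit, the sketch first captures the link text with the asker's html">([^<]*) pattern and then strips the whitespace. The helper names are made up for illustration.

```python
import re

def remove_spaces(text: str) -> str:
    # Plain-space version, equivalent to PHP's str_replace(' ', '', ...)
    return text.replace(" ", "")

def remove_whitespace(text: str) -> str:
    # Regex version: also strips tabs, newlines and other whitespace
    return re.sub(r"\s+", "", text)

# HTML case from the edit: capture the link text first, then strip whitespace.
html = '.html">W 45 B 1 A 401 L</a>'
match = re.search(r'html">([^<]*)', html)
link_text = remove_whitespace(match.group(1)) if match else ""

print(remove_spaces("A 41 FR 38"))      # A41FR38
print(remove_whitespace("POL 789 EU"))  # POL789EU
print(link_text)                        # W45B1A401L
```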

/([^\s]+)/g
The g flag makes the regular expression match every occurrence in the string instead of stopping after the first one; joining the matches then gives the string without spaces.
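To illustrate why the earlier attempts seemed to stop at the first space, here is a small Python sketch, assuming the regex was being applied without a global flag: re.search returns only the first run of non-whitespace characters, whereas re.findall returns every run (roughly what the g flag does in JavaScript), and joining those runs yields the string without spaces.

```python
import re

text = "POL 789 EU"

# Without a global flag, only the first run of non-whitespace is found.
first = re.search(r"[^\s]+", text).group(0)

# findall returns every non-overlapping run, similar to /[^\s]+/g in JavaScript.
runs = re.findall(r"[^\s]+", text)

# Joining the runs removes the spaces between them.
joined = "".join(runs)

print(first)   # POL
print(runs)    # ['POL', '789', 'EU']
print(joined)  # POL789EU
```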

Regex matching issue with a test string

I have a problem and don't understand it.
I have two issues and one general question :)
As you can see in my test string, the very last (German) phone number (the big yellow one in the test-string screenshot) does not match my regex pattern correctly. I don't get it: what is the problem here? The "0049" ends up in group 5 but should be in group 2. Why is that?
My second problem: how can I get rid of the spaces before and after every match? (These are the seven small yellow circles in the test-string screenshot.)
For copy/paste purposes, here are the regex and test string again:
Regex:
((\+\d{2}|00\d{2})?([ ])?(\()?(\d{2,4})(\))?([-| |/])?(\d{3,})([ ])?(\d+)?([ ])?(\d+)?)
Test-String:
Vorwahl 089, die E.123 ebenfalls , also (089) 1234567. Die DIN 5008, also +49 89 1234567 respectivly 0049 89 1234567. Die E.123 empfiehlt, also +49 89 123456 0 respectivly 0049 89 123456 0 oder +49 89 123456 789. Also +49 89 123 456 789. Klammern 089/1234567 und 0151 19406041. Test +49 151 123 456 789 respectivly 0049 151 123 456 789
Last but not least, my general question:
Is it a good approach to group each logical part as I did in my example?
One last piece of information: I validate my regex with https://regex101.com/ and use it in Python with the re module.
What makes it unpredictable are the numerous optional groups (..)?.
As a first step, I recommend replacing ([ ])?(\d+)? with the coupled expression ([ ]?\d+)?, which avoids spaces at the end of the match (your point #2).
As a second step, I recommend coupling the first optional space with the expression for the international prefix and country code: ((\+|00)\d{2}([ ])?)?. Conveniently, this fixes both the space at the beginning and the recognition of the whole number, because there are fewer possible ways to match: a match can no longer start with a bare space, so "0049" is read as the prefix rather than as the area code.
The new expression now looks like this:
(((\+|00)\d{2}([ ])?)?(\()?(\d{2,4})(\))?([-| |/])?(\d{3,})([ ]?\d+)?([ ]?\d+)?)
If you don't need the individual group values, I recommend simplifying the last part:
(((\+|00)\d{2}([ ])?)?(\()?(\d{2,4})(\))?([-| |/])?(\d{3,})([ ]?\d+){0,2})
For better performance, I suggest removing the parentheses/groups where possible, or marking them as non-capturing with (?:...), if you don't need the specific group values.
In some languages (Python's re module, for example) you do not need the outermost parentheses, because the whole match is always available as group 0.
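To check both steps against the asker's data, here is a Python sketch (the asker uses Python's re module) that runs the original pattern and the simplified final pattern over the test string with re.finditer. With the original pattern, the last match starts with a space and "0049" lands in group 5; with the final pattern, no match carries surrounding spaces and the country code lands in the prefix group. The variable names are illustrative.

```python
import re

ORIGINAL = re.compile(
    r"((\+\d{2}|00\d{2})?([ ])?(\()?(\d{2,4})(\))?([-| |/])?(\d{3,})([ ])?(\d+)?([ ])?(\d+)?)"
)
FINAL = re.compile(
    r"(((\+|00)\d{2}([ ])?)?(\()?(\d{2,4})(\))?([-| |/])?(\d{3,})([ ]?\d+){0,2})"
)

# The asker's test string, split over several lines for readability.
TEST = (
    "Vorwahl 089, die E.123 ebenfalls , also (089) 1234567. "
    "Die DIN 5008, also +49 89 1234567 respectivly 0049 89 1234567. "
    "Die E.123 empfiehlt, also +49 89 123456 0 respectivly 0049 89 123456 0 "
    "oder +49 89 123456 789. Also +49 89 123 456 789. "
    "Klammern 089/1234567 und 0151 19406041. "
    "Test +49 151 123 456 789 respectivly 0049 151 123 456 789"
)

old_matches = list(ORIGINAL.finditer(TEST))
new_matches = list(FINAL.finditer(TEST))

# Original: the standalone optional space lets the match start one character
# early, so "0049" is captured by the area-code group (group 5).
print(repr(old_matches[-1].group(0)), old_matches[-1].group(5))

# Final: no leading/trailing spaces; the prefix "0049 " is in group 2.
for m in new_matches:
    print(repr(m.group(0)))
print(repr(new_matches[-1].group(2)))
```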

REGEX: Put a space every 3 digits without using " "

Hello!
I've been searching for more than a day but can't find an answer, so I'm asking here.
Explanation:
I created a game using a Discord bot (Atlas) that provides many functions; the one I'm asking about is replace. Using a regex, I want to insert a space every three digits to format numbers like this:
Base number:
25
321
54500
78545515201
After formatting:
25
321
54 500
78 545 515 201
But in the replacement field, leading and trailing spaces are trimmed, so I can't use "$1 " (with a trailing space). However, if I use $1 $2, the space between the two references is kept.
So I want to format my numbers using $1 $2 as the replacement, so that the space is preserved.
If anyone has a solution, I would really appreciate it!
EDIT: here is the link to the documentation for the replace function: https://atlas.bot/documentation/tags/replace
You can use an empty capture group, which matches a position without capturing any characters, so that your replacement can be $1 $2:
(\d)()(?=(\d{3})+(?!\d))
Here it is in JS:
https://regex101.com/r/virtsL/1/
It is also compatible with PHP (PCRE), Python, and Java.
Attribution: regex originally from https://coderwall.com/p/uccfpq/formatting-currency-via-regular-expression and I just added the empty capture group.
Per your comments, here is a working version of your attempt, slightly modified:
(\d)()(?=(\d\d\d)+(\D|$))
https://regex101.com/r/McrHgj/1/
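A quick way to check both empty-capture patterns outside Atlas is Python's re.sub (a sketch; Python writes backreferences as \1 and \2 where Atlas uses $1 and $2, and the helper name group_thousands is hypothetical). Because group 2 is always empty, the replacement "\1 \2" writes the captured digit back followed by a single space, with no leading or trailing space in the template.

```python
import re

# Empty-capture-group pattern from the answer.
EMPTY_GROUP = re.compile(r"(\d)()(?=(\d{3})+(?!\d))")
# Modified version of the asker's attempt.
VARIANT = re.compile(r"(\d)()(?=(\d\d\d)+(\D|$))")

def group_thousands(number, pattern=EMPTY_GROUP):
    # Hypothetical helper: after each digit that is followed by a multiple of
    # three digits, re-emit the digit (\1), a space, and the empty group (\2).
    return pattern.sub(r"\1 \2", number)

for n in ["25", "321", "54500", "78545515201"]:
    print(group_thousands(n), "|", group_thousands(n, VARIANT))
```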
const inputStr = `
25
321
54500
78545515201
`
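// Insert a space at each position that follows a digit and is followed by a multiple of three digits
const res = inputStr.replace(/(?<=[0-9])(?=(?:[0-9]{3})+(?![0-9]))/g, " ")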
console.log(res)

How to regex German street addresses with numbers (infix)

I've got these two addresses:
Straße des 17 Juni 122a
Str. 545 3
See https://regex101.com/r/2WT48R/5
I need to extract the street and the house number.
My current output is:
streets = [Straße des, Str. ]
numbers = [17 Juni 122a, 545 3]
This is my regex:
(?<street>[\S ]+?)\s*(?<number>\d+[\w\s\/-]*)$
Output should look like:
streets = [Straße des 17 Juni, Str. 545]
numbers = [122a, 3]
It looks like the house numbers you want (122a, 3) never contain spaces; you can use that to cut away the extra characters that end up in your second capture group.
(?<street>[\S ]+)\s(?<number>\d+\S*$)
By allowing no whitespace in the second capture group, it won't match the numbers 17 or 545 too early.
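A Python sketch of the same split (assumption: Python's re module, which spells named groups (?P<name>...) rather than (?<name>...), so the groups are written that way; split_address is a hypothetical helper). The greedy street group backtracks to the last whitespace that is followed by a digit and then only non-space characters up to the end of the line.

```python
import re
from typing import Optional, Tuple

# The answer's pattern, with named groups in Python's (?P<name>...) syntax.
ADDRESS = re.compile(r"(?P<street>[\S ]+)\s(?P<number>\d+\S*$)")

def split_address(address: str) -> Optional[Tuple[str, str]]:
    # Hypothetical helper: returns (street, number), or None if nothing matches.
    m = ADDRESS.match(address)
    if m is None:
        return None
    return m.group("street"), m.group("number")

for addr in ["Straße des 17 Juni 122a", "Str. 545 3"]:
    print(split_address(addr))
```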
EDIT: after seeing your more detailed list of examples on your own demo, the following regex will match the complete set of your test cases:
(?<street>[\S \t]+?) ?(?<number>[\d\s]+[\w-\/]*?$)
I found an answer myself:
(?<street>[\S ]+?)\s*(?<number>\d+\s*[a-zA-Z]*\s*([-\/]\s*\d*\s*\w?\s*)*)$
The demo includes several additional test cases.

Match regular expression patterns if they exist, else leave empty

Here is what I am trying to achieve: given a certain set of data, I am trying to get every row that matches my regular expression, along with its captured groups.
For example, given a data set such as this:
AFAM 002A AFAM & DEV AM HIS/GV 03 46493 3 LEC D2 70 P 20/15 W 1800-2045 08/24/16-12/12/16 WSQ 207 K WHITE
AFAM 102 AFRO-AMER MUSIC 01 47200 3 LEC P 5/30 W 1800-2045 08/24/16-12/12/16 MUS 250 V GROCE-ROBERTS
AFAM 125 THE BLACK FAMILY 01 47198 3 LEC P 16/40 M 1800-2045 08/24/16-12/12/16 CCB 101 S MILLNER
AFAM 152 THE BLACK WOMAN 01 47199 3 LEC P 8/40 T 1800-2045 08/24/16-12/12/16 CL 111 R WILSON
AFAM 159 ECON ISSUES BLKCM 01 47197 3 LEC P 11/40 MW 1330-1445 08/24/16-12/12/16 CL 234 R WILSON
AFAM 180 INDIVIDUAL STUDIES 01 46982 3 SUP P 0/10 TBA TBA 08/24/16-12/12/16
The regex that I have created captures the following groups:
Course ID, e.g. AFAM 002A
Course name, e.g. AFRO-AMER MUSIC
Start date
End date
Professor name (this is the value that I want to be optional)
The problem I am having is with the optional value: I want the regex to capture it if it exists and leave the group empty if it does not, but that is not what happens. If someone could show me the correct way to do this, I would greatly appreciate it.
Essentially, this part of my regular expression, ([A-Z][\s][A-Z]+[-]*[A-Z]+)?, needs to be included if it exists. I understand that this is how the ? operator is supposed to work, but I can't seem to find the right keywords to search for, so here I am.
([A-Z]+[\s][0-9]+[A-Z]*)(.+)[\s][0-9]+[\s][0-9]+.+(\d\d\/\d\d\/\d\d)-(\d\d\/\d\d\/\d\d)[\s]([A-Z][\s][A-Z]+[-]*[A-Z]+)?
The expected results for the last two rows of this data set are:
{ [ (AFAM 159), (ECON ISSUES BLKCM), (08/24/16), (12/12/16), (R WILSON)],
[(AFAM 180), (INDIVIDUAL STUDIES), (08/24/16), (12/12/16), ()]
}
Your regex does not match CL 234 in the second-to-last line; you need to consume it. However, just adding .*? won't work: you need to make your optional pattern obligatory (remove the ?) and wrap .*?([A-Z]\s[A-Z]+-*[A-Z]+) in an optional non-capturing group (?:...).
([A-Z]+\s\d+[A-Z]*)(.+?)\s\d+\s\d+.+?(\d\d\/\d\d\/\d\d)-(\d\d\/\d\d\/\d\d)\s(?:.*?([A-Z]\s[A-Z]+-*[A-Z]+))?
See the regex demo.
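Here is a Python sketch that applies the answer's pattern to the asker's rows (assumptions: Python's re module, and each row keeps its trailing newline, because the pattern requires a whitespace character after the end date and a row without a professor would otherwise not match at all). Group 2 still includes the leading space, and an absent professor comes back as None, so the sketch strips the name and maps None to an empty string.

```python
import re

# Pattern from the answer above (split over two raw strings for readability).
COURSE = re.compile(
    r"([A-Z]+\s\d+[A-Z]*)(.+?)\s\d+\s\d+.+?(\d\d\/\d\d\/\d\d)-(\d\d\/\d\d\/\d\d)\s"
    r"(?:.*?([A-Z]\s[A-Z]+-*[A-Z]+))?"
)

# Rows from the question; every row, including the last, ends with "\n".
DATA = """\
AFAM 002A AFAM & DEV AM HIS/GV 03 46493 3 LEC D2 70 P 20/15 W 1800-2045 08/24/16-12/12/16 WSQ 207 K WHITE
AFAM 102 AFRO-AMER MUSIC 01 47200 3 LEC P 5/30 W 1800-2045 08/24/16-12/12/16 MUS 250 V GROCE-ROBERTS
AFAM 125 THE BLACK FAMILY 01 47198 3 LEC P 16/40 M 1800-2045 08/24/16-12/12/16 CCB 101 S MILLNER
AFAM 152 THE BLACK WOMAN 01 47199 3 LEC P 8/40 T 1800-2045 08/24/16-12/12/16 CL 111 R WILSON
AFAM 159 ECON ISSUES BLKCM 01 47197 3 LEC P 11/40 MW 1330-1445 08/24/16-12/12/16 CL 234 R WILSON
AFAM 180 INDIVIDUAL STUDIES 01 46982 3 SUP P 0/10 TBA TBA 08/24/16-12/12/16
"""

rows = []
for line in DATA.splitlines(keepends=True):
    m = COURSE.search(line)
    if m is None:
        continue  # row does not fit the pattern at all
    course_id, name, start, end, professor = m.groups()
    # Group 2 keeps its leading space; an unmatched optional group is None.
    rows.append((course_id, name.strip(), start, end, professor or ""))

for row in rows:
    print(row)
```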

Match phone numbers with lengths between 8-16 digits, ignoring ()+-

Consider the following:
+12 34 456 432
(12) 34 567 124
1234 56 78 90
(1234) 567 890
1234-567-890
1234 - 567 - 890
12 34 56 78
12-34-56-78
Assume these are all valid phone number structures.
Can a regex express the following: find at least 8 digits but no more than 16, ignoring spaces, round brackets, the plus sign (once), and the minus sign?
My current working sample is a mess:
^([\+|\(]{1,2})?+(\d{2,4})+([ |-|\)]{1,2})?+(\d{2,3})+([ |-]{1})?+(\d{2,3})+([ |-]{1})?+(\d{2,3})?$
Even though phone number validation is generally advised against, is there not a simpler regex syntax for these things?
To account only for the number of digits and ignore -, ), ( and spaces (allowing a + at the beginning), you can use the following regex:
^\+?(?:[ ()-]*\d){8,16}$
It matches:
^ - start of string
\+? - one or zero +
(?:[ ()-]*\d){8,16} - 8 to 16 sequences of...
[ ()-]* - 0 or more -, ), ( or space characters
\d - a digit
$ - end of string
See the regex demo
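A quick Python check of this pattern against the sample numbers from the question, plus a few strings that should fail (the failing inputs are made up for illustration).

```python
import re

# The answer's pattern: optional leading "+", then 8 to 16 digits, each of which
# may be preceded by any run of spaces, parentheses, or hyphens.
PHONE = re.compile(r"^\+?(?:[ ()-]*\d){8,16}$")

valid = [
    "+12 34 456 432",
    "(12) 34 567 124",
    "1234 56 78 90",
    "(1234) 567 890",
    "1234-567-890",
    "1234 - 567 - 890",
    "12 34 56 78",
    "12-34-56-78",
]
# Illustrative rejects: too few digits, too many digits, two plus signs.
invalid = ["1234567", "12345678901234567", "++12345678"]

for s in valid + invalid:
    print(f"{s!r:22} {PHONE.match(s) is not None}")
```

In Python, $ also matches just before a trailing newline, so re.fullmatch (without ^ and $) is the stricter option if the input may end with \n.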
This may ease your task.
First, remove everything that is not a digit:
myString = myString.replace(/\D/g,'');
You'll get this:
1234456432
1234567124
1234567890
1234567890
1234567890
1234567890
12345678
12345678
Then just check for length:
if (myString.length >= 8 && myString.length <= 16) {
  // Do stuff
}
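The same strip-then-count approach in Python (a sketch; digit_count_ok is a hypothetical helper). Note that, unlike the regex in the previous answer, it silently discards every non-digit, including letters and extra plus signs, so it is more permissive.

```python
import re

def digit_count_ok(phone: str, lo: int = 8, hi: int = 16) -> bool:
    # Remove every non-digit character (same idea as replace(/\D/g, '') in JS),
    # then check how many digits are left.
    digits = re.sub(r"\D", "", phone)
    return lo <= len(digits) <= hi

for s in ["+12 34 456 432", "12-34-56-78", "1234567", "12ab34cd5678"]:
    print(s, re.sub(r"\D", "", s), digit_count_ok(s))
```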
Using preg_replace, keep only the digits, then check for a valid length:
<?php
$ph = "(12) 34 567 124";
// Strip every non-digit, then count the digits that remain
$len = strlen(preg_replace('/[^0-9]+/', '', $ph));
if($len >=8 && $len <=16)
echo "Valid";
else
echo "Invalid";
Don't even think about it. Phone numbers are complicated. They are hugely complicated. Google has a decent library for handling phone numbers called libphonenumber.
And excuse me, but ignoring the "+" makes whatever you are doing totally, absolutely wrong. A plus is followed by the country code of some country, followed by a local phone number within that country (which needs to be parsed according to the rules of that country, and there are about 200). Without the "+", you have a phone number according to the local rules, and you need to find out which local rules apply. This means your number can start with the local code for dialing abroad instead of the "+"; otherwise it is formatted according to local rules.
As a result, a number may be valid with the "+" and invalid without it or vice versa, and most likely refers to a different actual phone in totally different countries with or without the "+".