REGEX: Put a space every 3 digits without using " "

Hello!
I've been looking for more than a day now and can't find an answer, so I'm asking here.
Explanation:
I created a game with a Discord bot (Atlas) that provides many functions, one of which is the one I'm asking about: replace. What I'm trying to do is use a regex to insert a space every three digits, so that numbers are formatted like this:
Base number:
25
321
54500
78545515201
After formatting:
25
321
54 500
78 545 515 201
But in the replacement section, spaces " " are trimmed from the front and back, so I can't use $1 followed by a space. However, if I use $1 $2, the space between the two arguments is kept.
So what I'm looking for is a pattern that lets me format my numbers with the replacement $1 $2, so that the space is kept.
If anyone has a solution, I'd be very grateful!
EDIT: here is the link about the replace function: https://atlas.bot/documentation/tags/replace

You can use an empty capture group, which marks a position without consuming any characters, so that your replacement can be $1 $2:
(\d)()(?=(\d{3})+(?!\d))
Here it is in JS:
https://regex101.com/r/virtsL/1/
It is also compatible with PHP (PCRE), Python, and Java.
Attribution: regex originally from https://coderwall.com/p/uccfpq/formatting-currency-via-regular-expression and I just added the empty capture group.
Per your comments, here is a working, slightly modified version of your attempt:
(\d)()(?=(\d\d\d)+(\D|$))
https://regex101.com/r/McrHgj/1/
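As a quick check outside the bot, the first pattern above can be tried in Python, which the answer lists as compatible. This is only a sketch: the helper name group_digits is made up for illustration, and Python spells the replacement \1 \2 rather than $1 $2.

```python
import re

# Pattern from the answer: one digit, an empty group, then a lookahead
# requiring that the remaining digits form whole groups of three.
PATTERN = re.compile(r"(\d)()(?=(\d{3})+(?!\d))")


def group_digits(text: str) -> str:
    # Hypothetical helper; r"\1 \2" is Python's spelling of "$1 $2".
    return PATTERN.sub(r"\1 \2", text)


for number in ["25", "321", "54500", "78545515201"]:
    print(group_digits(number))
```

Because the second group is empty, the only visible change is the space that sits between the two references, which is exactly the part the bot does not trim.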

const inputStr = `
25
321
54500
78545515201
`
const res = inputStr.replace(/(?<=[0-9])(?=(?:[0-9]{3})+(?![0-9]))/g, " ")
console.log(res)

Related

Regex for valid SSN or other ID

I'm a regex newbie and I've got a valid regex for SSNs:
/^(\d{3}(\s|-)?\d{2}(\s|-)?\d{4})|[\d{9}]*$/
But I now need to expand it to accept either an SSN or another alphanumeric ID of 7 characters, like this:
/^[a-zA-Z0-9]{7}$/
I thought it'd be as simple as grouping the SSN and adding an OR | but my tests are still failing. This is what I've got now:
/^((\d{3}(\s|-)?\d{2}(\s|-)?\d{4})|[\d{9}])|[a-zA-Z0-9]{7}$/
What am I doing wrong? And is there a more elegant way to say either SSN or my other ID?
Thanks for any helpful tips.
Valid SSNs:
123-45-6789
123456789
123 45 6789
Valid ID: aCe8999
I have also modified your first regex a bit; below is a demo program. This reflects my understanding of the problem, so let me know if any modification is needed.
use feature 'say';
my @ids = (
'123-45-6789',
'123456789',
'123 45 6789',
'1234567893434', # invalid
'123456789wwsd', # invalid
'aCe8999',
'aCe8999asa' # invalid
);
for (@ids) {
# /x allows spaces in the pattern for readability; \1? repeats the separator captured by ([ \-])?
say "match = $&" if $_ =~ /^ (?:\d{3} ([ \-])? \d{2} \1? \d{4})$ | ^[a-zA-Z0-9]{7}$/x ;
}
Output:
match = 123-45-6789
match = 123456789
match = 123 45 6789
match = aCe8999
Your first regex has some problems. Most importantly, it accepts {{{{}}}}}, which means you have built a wrong character class: [\d{9}] matches a single character that is a digit, {, or }. It also matches 123-45 6789 (notice the mixture of space and dash).
To express OR in a regular expression you use the pipe |, but remember that each anchor belongs only to the alternative it sits in. For example, ^1|2$ matches strings beginning with 1 or strings ending with 2, not just the two exact input strings 1 and 2.
To get an exact match you need ^1$|^2$ or ^(1|2)$.
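The anchoring rule above is easy to see with a few inputs. Here is a small Python sketch; the sample strings and variable names are made up for illustration.

```python
import re

loose = re.compile(r"^1|2$")     # "starts with 1" OR "ends with 2"
strict = re.compile(r"^(1|2)$")  # exactly "1" or exactly "2"

samples = ["1", "2", "1abc", "abc2", "3"]

loose_hits = [s for s in samples if loose.search(s)]
strict_hits = [s for s in samples if strict.search(s)]

print(loose_hits)
print(strict_hits)
```

Only the grouped form rejects 1abc and abc2, because the group makes both anchors apply to both alternatives.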
With the second regex, ^[a-zA-Z0-9]{7}$, you are not requiring a mixed alphanumeric ID of 7 characters; you are allowing any 7 characters that are all digits, all letters, or a mix of both. So it matches 1234567 too. If this is not a problem, the following regex is a solution that eliminates the issues described above:
^\d{3}([ -]?)\d\d\1\d{4}$|^[a-zA-Z0-9]{7}$
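To sanity-check that pattern against the inputs mentioned in this thread, a short Python sketch might look like this. The names ID_PATTERN, candidates and valid are only illustrative.

```python
import re

# Pattern from the answer: the separator is captured once and reused via \1,
# so a dash cannot be mixed with a space.
ID_PATTERN = re.compile(r"^\d{3}([ -]?)\d\d\1\d{4}$|^[a-zA-Z0-9]{7}$")

candidates = [
    "123-45-6789",
    "123456789",
    "123 45 6789",
    "aCe8999",
    "123-45 6789",    # mixed separators
    "1234567893434",  # too long
    "aCe8999asa",     # too long
    "{{{{}}}}}",      # accepted by the original regex
]

valid = [c for c in candidates if ID_PATTERN.search(c)]
print(valid)
```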

Suite 400 - 100 ABCDEF (Capture values from 100)

I need a regular expression that finds 100 ABCDEF in the input string Suite 400 - 100 ABCDEF. I wrote the regex below, but it picks up the number that follows Suite (400) instead.
[^-\s]\d.+
Just put $ at the end of your regex. $ means "end of line".
Also, replace the dot with [^-], so it will match only non-hyphens:
[^-\s]?\d[^-]+$
Fiddle: http://refiddle.com/refiddles/5b9a88ef75622d4ca9590000
Since you're trying to match a US street address, you should try matching a number followed by one or more words instead:
\d+(?:\s+[A-Za-z.]+)+
Demo: https://regex101.com/r/y6n5jD/1
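Both suggestions can be checked against the input from the question. This is a minimal Python sketch, and the variable names are made up for illustration.

```python
import re

text = "Suite 400 - 100 ABCDEF"

# Anchored variant: everything after the last hyphen, starting at a digit.
anchored = re.search(r"[^-\s]?\d[^-]+$", text)

# Address-style variant: a number followed by one or more words.
address = re.search(r"\d+(?:\s+[A-Za-z.]+)+", text)

print(anchored.group())
print(address.group())
```

The second pattern skips 400 because the token after it is a hyphen rather than a word.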

regex working with long lines

I got a lot of these strings in one txt-file:
X00NAP-0111-OG02Flur-A 2 AIR-CAP2702I-E-K9 00:b8:b8:b8:7d:b8 0111-HGS DE 10.100.100.100 8
X006NAP-0500-EG00Grossrau-A 2 AIR-CAP2702I-E-K9 50:0f:80:94:82:c0 HGS 0500 DE 10.100.100.100 1
Y008NAP-8399-OG04OE3020-A 2 AIR-CAP2702I-E-K9 00:b8:b8:b8:7d:b8 HGS Erfurter Hof DE 10.100.100.100 1
A1234NAP-4101-OG02Raum237-A 2 AIR-CAP2602I-E-K9 00:b8:b8:b8:7d:b8 AP 2 Anmeldung V DE 10.100.100.100 0
I am only interested in the first string and the number on the end of the lines. The number can be max. 99
So in the end I would like to have a output like this:
X00NAP-0111-OG02Flur-A 8
X006NAP-0500-EG00Grossrau-A 1
Y008NAP-8399-OG04OE3020-A 1
A1234NAP-4101-OG02Raum237-A 0
I tried a lot of things with regex, but nothing worked really.
Here is a general regex solution:
Find:
^([^\s]*).*\s(\d+)$
Replace:
$1 $2
The idea here is to match the first string and final number as capture groups, which are indicated by the two terms in the pattern surrounded by parentheses. These capture groups are made available in the replacement as $1 and $2 (sometimes \1 and \2, depending on the regex tool/engine). We can replace each line with these capture groups to leave you with the output you expect.
Note that this may "trash" the original file, but if you are using a tool like Notepad++, you can simply copy this result out, then undo the replacement, or just close the original file without saving.
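If you would rather not edit the file in place, the same idea can be sketched in Python. Note the assumptions: the pattern here requires whitespace before the final number, so that a two-digit value is captured whole rather than only its last digit; the helper name first_and_last and the third sample line (ending in 42) are made up for illustration.

```python
import re

# First non-space run, anything in between, then the trailing number.
LINE = re.compile(r"^(\S+).*\s(\d+)$")


def first_and_last(line: str) -> str:
    # Hypothetical helper: keep only the first field and the final number.
    return LINE.sub(r"\1 \2", line)


lines = [
    "X00NAP-0111-OG02Flur-A 2 AIR-CAP2702I-E-K9 00:b8:b8:b8:7d:b8 0111-HGS DE 10.100.100.100 8",
    "A1234NAP-4101-OG02Raum237-A 2 AIR-CAP2602I-E-K9 00:b8:b8:b8:7d:b8 AP 2 Anmeldung V DE 10.100.100.100 0",
    # Made-up line to exercise a two-digit count.
    "Z1NAP-0001-EG00Test-A 2 AIR-CAP2702I-E-K9 00:b8:b8:b8:7d:b8 HGS DE 10.100.100.100 42",
]

results = [first_and_last(line) for line in lines]
for result in results:
    print(result)
```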
The simplest way I can think of is:
Find: " .* "
Replace: " "
This replaces everything from the first space to the last space with a single space, achieving your goal.
Note: Quotes are only there to help show where spaces are in the regex.
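The space-to-space replacement can be tried the same way. This is a rough Python sketch, with collapse_middle as a made-up helper name.

```python
import re


def collapse_middle(line: str) -> str:
    # Greedy " .* " runs from the first space to the last space on the line.
    return re.sub(r" .* ", " ", line)


sample = "Y008NAP-8399-OG04OE3020-A 2 AIR-CAP2702I-E-K9 00:b8:b8:b8:7d:b8 HGS Erfurter Hof DE 10.100.100.100 1"
collapsed = collapse_middle(sample)
print(collapsed)
```

This relies on the first field and the final number containing no spaces themselves, which holds for the sample lines in the question.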

Regex - get string after full date and before standard text

I'm stuck on another regex. I'm extracting email data. In the below example, only the time, date and message in quotes changes.
Message Received 6:06pm 21st February "Hello. My name is John Smith" Some standard text.
Message Received 8:08pm 22nd February "Hello. My name is "John Smith"" Some standard text.
How can I get the message only if I need to start with the positive lookbehind, (?<=Message Received ) to begin searching at this particular point of the data? The message will always start and end with quotes but the user is able to insert their own quotes as in the second example.
You can just use a negated character class in a capturing group:
/Message Received.*?"([^\n]+)"/
Snippet:
$input = 'Message Received 6:06pm 21st February "Hello. My name is John Smith" Some standard text.
Message Received 8:08pm 22nd February "Hello. My name is "John Smith"" Some standard text.';
preg_match_all('/Message Received.*?"([^\n]+)"/', $input, $matches);
foreach ($matches[1] as $match) {
echo $match . "\r\n";
}
Output:
> Hello. My name is John Smith
> Hello. My name is "John Smith"
To extract the message between the double quotes:
(?=Message Received)[^\"]+\K\"[\w\s\"\.]+\"
You can capture the message in a group:
(?<=Message Received)[^"]*(.*)(?=\s+Some standard text)
Two out of the other three posted answers on this page provide an incorrect result. None of the other posted answers are as efficient as they could be:
To correctly extract the substring between the outer double quotes, use one of the following patterns:
/Message Received[^"]+"\K[^\n]+(?=")/ (No capture group, takes 132 steps)
/Message Received[^"]+"([^\n]+)"/ (Capture group, takes 130 steps)
Both patterns provide maximum accuracy and efficiency using negated character classes leading up to and including the targeted substring. The first pattern reduces preg_match_all()'s output array bloat by 50% by using \K instead of a capture group. For these reasons, one of these patterns should be used in your project. As your input string increases in size, my patterns provide increasingly better performance versus the other posted patterns.
PHP Implementation:
$in represents your input string.
Pattern #1 Method:
var_export(preg_match_all('/Message Received[^"]+"\K[^\n]+(?=")/',$in,$out)?$out[0]:[]);
// notice the output array only has elements in the fullstring subarray [0]
Output:
array (
0 => 'Hello. My name is John Smith',
1 => 'Hello. My name is "John Smith"',
)
Pattern #2 Method:
var_export(preg_match_all('/Message Received[^"]+"([^\n]+)"/',$in,$out)?$out[1]:[]);
// notice because a capture group is used, [0] subarray is ignored, [1] is used
Output:
array (
0 => 'Hello. My name is John Smith',
1 => 'Hello. My name is "John Smith"',
)
Both methods provide the desired output.
Anirudha's incorrect pattern: /(?<=Message Received)[^"]*(.*)(?=\s+Some standard text)/ (345 steps + a capture group + includes the unwanted outer double quotes)
Josh Crozier's pattern: /Message Received.*?"([^\n]+)"/ (174 steps + a capture group)
Sahil Gulati's incorrect pattern: /(?=Message Received)[^\"]+\K\"[\w\s\"\.]+\"/ (109 steps + includes the unwanted outer double quotes + unnecessarily escapes characters in the pattern)
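For readers outside PHP, the capture-group pattern carries over to other engines. Below is a small Python sketch of it; Python's built-in re module does not support \K, so only the capture-group form is shown. The variable names are made up for illustration.

```python
import re

# Negated class up to the first quote, then a greedy capture that
# backtracks to the last quote on the same line.
PATTERN = re.compile(r'Message Received[^"]+"([^\n]+)"')

emails = (
    'Message Received 6:06pm 21st February "Hello. My name is John Smith" Some standard text.\n'
    'Message Received 8:08pm 22nd February "Hello. My name is "John Smith"" Some standard text.'
)

# With a single group, findall returns just the captured text.
messages = PATTERN.findall(emails)
for message in messages:
    print(message)
```

The inner quotes in the second message survive because the capture only gives back characters until it reaches the last quote on the line.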

regex tutorial, How can I improve this

I needed a utility function earlier today to strip some data out of a file and wrote an appalling regular expression to do it. The input was a file with lots of lines in the format:
<address> <11 * ascii character value> <11 characters>
00C4F244 75 6C 74 73 3E 3C 43 75 72 72 65 ults><Curre
I wanted to strip out everything bar the 11 characters at the end and used the following expression:
"^[0-9A-F+]{8}[\\s]{2}[0-9A-F\\s]{34}"
This matched to the bits I didn't want which I then removed from the original string. I'd like to see how you'd do this but the particular areas I couldn't get working were:
1: having the regex engine return the characters I wanted rather than the characters I didn't and
2: finding a way of repeating the match on a single ascii value followed by the space (eg "75 " = [0-9A-F]{2}[\s]{1}?) and repeating that 11 times rather than grabbing 34 characters.
Looking at it again the easiest thing to do would be to match to the last 11 characters of each input line but this isn't very flexible and in the interests of learning regex I would like to see how you can match through from the start of the sequence.
Edit: Thanks guys, this is what I wanted:
"(?:^[0-9A-F]{8} )(?:[0-9A-F]{2} ){11} (.*)"
Wish I could turn more than one of you green.
As the file has a fixed format, you could use this regular expression to just match the last 11 characters.
^.{44}(.{11})
Last eleven is:
...........$
or:
.{11}$
Matching a hex byte + space and repeat eleven times:
([0-9A-Fa-f]{2} ){11}
1) ^[0-9A-F+]{8}[\s]{2}[0-9A-F\s]{34}(.*)
Parens are used for grouping with extraction. How you retrieve it depends on your language context, but now some sort of $1 is set to everything after the initial pattern.
2) ^[0-9A-F+]{8}[\s]{2}(?:[0-9A-F]{2}\s){11}\s(.*)
(?:) is grouping without extraction. So (?:[0-9A-F]{2}\s){11} treats the subpattern there (one hex byte followed by a space) as a unit and looks for it repeated 11 times.
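To see grouping with and without extraction side by side, here is a Python sketch run against the sample line as it appears in the question. It is written under an assumption: the exact number of spaces between fields in the real file is unknown, so the pattern uses \s+ and \s* instead of fixed counts. The name DUMP_LINE is made up.

```python
import re

# (...) captures; (?:...) only groups, so it can carry the {11} repeat.
# Assumption: field spacing may vary, hence \s+ and \s* rather than \s{2}.
DUMP_LINE = re.compile(r"^([0-9A-F]{8})\s+((?:[0-9A-F]{2}\s){11})\s*(.*)$")

sample = "00C4F244 75 6C 74 73 3E 3C 43 75 72 72 65 ults><Curre"

match = DUMP_LINE.match(sample)
address, hex_bytes, text = match.groups()

print(address)
print(hex_bytes.split())
print(text)
```

One caveat of the tolerant \s*: if the 11-character text itself began with a space, that space would be swallowed, so a fixed count is safer when the layout is known.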
I'm assuming PCRE here, by the way.
The address and ascii char value are all hex so:
^[0-9A-F\s]{42}
Matching the end of the line would be
.{11}$
To match only the end, you can use a positive look behind.
"(?<=(^[0-9A-F+]{8}[\\s]{2}[0-9A-F\\s]{34}))(.*?)$"
This would match any character until the end of the line, providing that it is preceded by the "look behind" expression.
(?<=....) defines a condition that must be met before matching is possible.
I am a bit short of time, but if you look on the net for any tutorial that contains the words "regex" and "lookbehind", you will find good stuff (if a regex tutorial covers lookahead/lookbehind, it will usually be pretty complete and advanced).
Another piece of advice is to get a regex training tool and play with it. Have a look at this excellent Regex designer.
If you're using Perl, you could also use unpack(), to get each element.
my @data;
open my $fh, '<', $filename or die;
for my $line(<$fh>){
# a8 = address, xx = skip 2 chars, (a2x)11 = 11 hex bytes each followed by a space, x = skip 1, a11 = trailing text
my($address,@list) = unpack 'a8xx(a2x)11xa11', $line;
my $str = pop @list;
# unpack the hexadecimal bytes
my $data = join '', map { pack 'H2',$_ } @list;
die unless $data eq $str;
push @data, [$address,$data,$str];
}
close $fh;
I also went ahead and converted the 11 hexadecimal codes back into a string, using pack().
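The same cross-check (decode the hex codes and compare them with the trailing text) can be sketched without a regex in Python. This is only an illustration: parse_dump_line is a made-up name, and it splits on whitespace instead of using fixed offsets as unpack does, so it assumes the address and hex codes contain no spaces.

```python
def parse_dump_line(line: str):
    # Hypothetical helper: address, 11 hex codes, then the remaining text.
    fields = line.split(maxsplit=12)
    address, hex_codes, text = fields[0], fields[1:12], fields[12]
    # Convert the hex codes back into characters, like pack 'H2' above.
    decoded = bytes.fromhex("".join(hex_codes)).decode("ascii")
    return address, decoded, text


sample = "00C4F244 75 6C 74 73 3E 3C 43 75 72 72 65 ults><Curre"
address, decoded, text = parse_dump_line(sample)
print(address, decoded, decoded == text)
```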