C# regex get just number in specific condition

I want to extract the number at a specific position in each line of a string, and I can't get it to work.
Example input:
180 MATTHEW SANDLER DON 30.00 1.361,67 00
181 JOHN 30.00 5.987,00 99
182 LUCY P. 30.00 3.888,98 71
From each line I want to return just this number:
1.361,67
5.987,00
3.888,98
Unfortunately the name contains a variable number of spaces; otherwise it would be a simple string.Split(' ') problem.
Does anyone know how to do this, please?

The following pattern should match the values in your example:
\b\S*,\d+\b
Example:
http://rextester.com/LZVQN62207
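For readers not working in C#, here is a minimal Python sketch of the same pattern applied to the three sample lines from the question. The names PATTERN, SAMPLE and amounts are made up for this illustration.

```python
import re

# Pattern from the answer above: a run of non-space characters,
# a comma, then digits.
PATTERN = re.compile(r"\b\S*,\d+\b")

SAMPLE = """180 MATTHEW SANDLER DON 30.00 1.361,67 00
181 JOHN 30.00 5.987,00 99
182 LUCY P. 30.00 3.888,98 71"""

# findall returns every non-overlapping match, in order of appearance.
amounts = PATTERN.findall(SAMPLE)
print(amounts)
```

This relies on the amount being the only token on a line that has a comma directly followed by a digit; any other such token (for example A,1) would match too.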

If we conceptually define the term you want to match as being the last term before the final two (or more?) numbers at the end of each line, then we can use the following regex pattern:
(\d+\.\d+,\d+) \d+$
The quantity in parentheses will be captured and made available after the regex has run in C#.
using System;
using System.Text.RegularExpressions;
string input = "180 MATTHEW SANDLER DON 30.00 1.361,67 00";
// Group 1 holds the amount captured by the parentheses.
var groups = Regex.Match(input, @"(\d+\.\d+,\d+) \d+$").Groups;
var x1 = groups[1].Value;
Console.WriteLine(x1);
Demo here:
Rextester
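A rough Python port of the same idea, in case the input arrives as one multi-line string rather than line by line. The helper name extract_amounts is hypothetical, and using re.MULTILINE is my assumption about how the lines are delivered.

```python
import re

# Same pattern as the C# snippet: capture the amount that sits just
# before the final number. re.MULTILINE lets $ match at every line end.
LAST_AMOUNT = re.compile(r"(\d+\.\d+,\d+) \d+$", re.MULTILINE)

def extract_amounts(text):
    """Return the captured amount from every line that matches."""
    return [m.group(1) for m in LAST_AMOUNT.finditer(text)]

lines = "181 JOHN 30.00 5.987,00 99\n182 LUCY P. 30.00 3.888,98 71"
print(extract_amounts(lines))
```

Note that the pattern requires a thousands separator, so an amount such as 987,00 would not be captured; making the leading \d+\. part optional would be one way to relax that.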

Related

Regex matching issue to Test-String

I have a problem with a regex that I can't figure out.
My regex and my test string were attached as images; both are repeated as text further down.
I have two issues and one general question :)
As you can see in my test string, the very last (German) phone number (the big yellow one in the test string attachment) is not matched correctly by my regex pattern. I don't understand what the problem is: the "0049" ends up in group 5 but should be in group 2. Why is that?
My second problem: how can I get rid of the spaces before and after every match? (The 7 small yellow circles in the test string attachment.)
For copy/paste purposes, here is the Regex and Test-String again:
Regex:
((\+\d{2}|00\d{2})?([ ])?(\()?(\d{2,4})(\))?([-| |/])?(\d{3,})([ ])?(\d+)?([ ])?(\d+)?)
Test-String:
Vorwahl 089, die E.123 ebenfalls , also (089) 1234567. Die DIN 5008, also +49 89 1234567 respectivly 0049 89 1234567. Die E.123 empfiehlt, also +49 89 123456 0 respectivly 0049 89 123456 0 oder +49 89 123456 789. Also +49 89 123 456 789. Klammern 089/1234567 und 0151 19406041. Test +49 151 123 456 789 respectivly 0049 151 123 456 789
Last but not least, my general question:
Is it a good approach to group each logical part as I did in my example?
One last piece of information: I validate my regex with https://regex101.com/ and use it in Python with the re module.
What makes the pattern unpredictable are the numerous optional groups (..)?.
As a first step I recommend replacing ([ ])?(\d+)? with the coupled expression ([ ]?\d+)?, which avoids spaces at the end of the match (your point #2).
As a second step I recommend coupling the first optional space with the expression for the international dialling prefix: ((\+|00)\d{2}([ ])?)?. This solves both the space at the beginning and the recognition of the whole number, because there are fewer possible ways to match.
The new expression now looks like this:
(((\+|00)\d{2}([ ])?)?(\()?(\d{2,4})(\))?([-| |/])?(\d{3,})([ ]?\d+)?([ ]?\d+)?)
I would now simplify the last part, if you don't need the individual group values:
(((\+|00)\d{2}([ ])?)?(\()?(\d{2,4})(\))?([-| |/])?(\d{3,})([ ]?\d+){0,2})
For better performance I suggest removing the parentheses/groups where possible, or marking them as non-capturing, if you don't need the specific group values.
In some programming languages you will not need the outermost parentheses, as the whole match is always group 0.
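Since the question mentions Python's re module, here is a small sketch comparing the original pattern with a non-capturing rewrite of the revised one on the tail of the test string. The rewrite, the constant names and the helper full_matches are my own; in particular I replaced [-| |/] with [- /], on the grounds that | is a literal character inside a character class.

```python
import re

# Original pattern from the question (optional groups everywhere).
ORIGINAL = re.compile(
    r"((\+\d{2}|00\d{2})?([ ])?(\()?(\d{2,4})(\))?([-| |/])?(\d{3,})([ ])?(\d+)?([ ])?(\d+)?)"
)

# Revised pattern from this answer, rewritten with non-capturing groups.
REVISED = re.compile(
    r"(?:(?:\+|00)\d{2} ?)?\(?\d{2,4}\)?[- /]?\d{3,}(?: ?\d+){0,2}"
)

def full_matches(pattern, text):
    """Return the complete text of every match (group 0)."""
    return [m.group(0) for m in pattern.finditer(text)]

text = "Test +49 151 123 456 789 respectivly 0049 151 123 456 789"
print(full_matches(ORIGINAL, text))
print(full_matches(REVISED, text))
```

With the original pattern the second match starts at the space in front of "0049", which is why "0049" lands in the area-code group and the trailing "789" is lost; the revised pattern cannot start on a space, so it picks up the whole number.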

REGEX: Put a space every 3 digits without using " "

Hello !
I've been looking for more than a day now but I can't find an answer, so I'm coming here to ask my problem!
Explanation:
I created a game with a Discord bot (Atlas) that provides many functions, one of which is the one I want to talk about: replace. What I'm trying to do is use a regex to put a space every three digits, formatting numbers like this:
Base number:
25
321
54500
78545515201
After formatting:
25
321
54 500
78 545 515 201
But in the replacement section, spaces " " are trimmed from the front and back, so I can't do $1 . However, if I do $1 $2, the space between the two arguments is counted.
So what I'm looking to do is format my numbers using the replacement as $1 $2 so that the space is counted.
If anyone has the solution, I will really thank you!
EDIT: here is the link about the replace function: https://atlas.bot/documentation/tags/replace
You can make use of an empty capture group to assert a position without a char capture so that your replacement can be $1 $2:
(\d)()(?=(\d{3})+(?!\d))
Here it is in JS:
https://regex101.com/r/virtsL/1/
It's also compatible with PHP (PCRE), Python, and Java.
Attribution: regex originally from https://coderwall.com/p/uccfpq/formatting-currency-via-regular-expression and I just added the empty capture group.
Per your comments, here is a working version of your attempt; slightly modified:
(\d)()(?=(\d\d\d)+(\D|$))
https://regex101.com/r/McrHgj/1/
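To check the empty-group trick outside the bot, here is a minimal Python sketch. Python writes the replacement as \1 \2 instead of $1 $2; the names GROUPING and space_thousands are made up for this example.

```python
import re

# Empty capture group trick: group 2 matches nothing, so the replacement
# "\1 \2" keeps the space *between* two references instead of at an edge.
GROUPING = re.compile(r"(\d)()(?=(\d{3})+(?!\d))")

def space_thousands(number_text):
    """Insert a space before every trailing group of three digits."""
    return GROUPING.sub(r"\1 \2", number_text)

for sample in ["25", "321", "54500", "78545515201"]:
    print(sample, "->", space_thousands(sample))
```

The lookahead only succeeds where the number of remaining digits is a multiple of three, so short numbers such as 25 and 321 are left untouched.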
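A lookbehind/lookahead variant in JavaScript (note that the replacement here is a plain " ", so it only helps where the replacement string is not trimmed):
const inputStr = `
25
321
54500
78545515201
`;
const res = inputStr.replace(/(?<=[0-9])(?=(?:[0-9]{3})+(?![0-9]))/g, " ");
console.log(res);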

How to regex German street addresses with numbers (infix)

I've got these two addresses:
Straße des 17 Juni 122a
Str. 545 3
See https://regex101.com/r/2WT48R/5
I need to filter for the street and number.
My current output is:
streets = [Straße des, Str. ]
numbers = [17 Juni 122a, 545 3]
This is my regex:
(?<street>[\S ]+?)\s*(?<number>\d+[\w\s\/-]*)$
Output should look like:
streets = [Straße des 17 Juni, Str. 545]
numbers = [122a, 3]
It looks like there are no spaces in the "number" part you want to capture; you can use that to keep those extra characters out of your second capture group.
(?<street>[\S ]+)\s(?<number>\d+\S*$)
By allowing no whitespace in the second capture group, it won't match the numbers 17 or 545 too early.
Demo
EDIT: after seeing your more detailed list of examples on your own demo, the following regex will match the complete set of your test cases:
(?<street>[\S \t]+?) ?(?<number>[\d\s]+[\w-\/]*?$)
Demo
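As a quick check of the first pattern in this answer (the one shown before the EDIT), here is a Python sketch. Python's re module spells named groups (?P<name>...), so the pattern is adjusted accordingly; the helper name split_address is hypothetical.

```python
import re

# First pattern from the answer above, with Python's named-group syntax.
ADDRESS = re.compile(r"(?P<street>[\S ]+)\s(?P<number>\d+\S*$)")

def split_address(address):
    """Return (street, number), or None when the pattern does not match."""
    m = ADDRESS.match(address)
    return (m.group("street"), m.group("number")) if m else None

for address in ["Straße des 17 Juni 122a", "Str. 545 3"]:
    print(split_address(address))
```

Because the street group is greedy and the number group allows no whitespace, the match backtracks only as far as the last space-separated token that starts with a digit.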
I found an answer myself:
(?<street>[\S ]+?)\s*(?<number>\d+\s*[a-zA-Z]*\s*([-\/]\s*\d*\s*\w?\s*)*)$
The demo includes several additional test cases.

Find numbers in a sentence by regex

I need a regular expression that will find all the numbers in a sentence.
For example:
"I have 3 bananas and 37 balloons"
I will get:
3
37
"The time is 20:00 and I have 7 tanks"
I will get:
20
00
7
Split your string by [^0-9]+.
Java: String[] numbers = "yourString".split("[^0-9]+");
JavaScript: var numbers = "yourString".split(/[^0-9]+/);
PHP: $numbers = preg_split("/[^0-9]+/", "yourString");
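One caveat with the split approach: when the text starts or ends with non-digits, the result can contain empty strings. A minimal Python sketch of the effect and one way to filter it (the variable names are only for illustration):

```python
import re

sentence = "I have 3 bananas and 37 balloons"

# Splitting on runs of non-digits leaves an empty string wherever the
# text starts or ends with a non-digit.
raw_parts = re.split(r"[^0-9]+", sentence)

# Dropping the empty strings leaves only the numbers.
numbers = [part for part in raw_parts if part]

print(raw_parts)
print(numbers)
```

Other languages differ in which empty strings they keep, so it is worth checking the split result in whichever one you use.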
The regex itself is as simple as \d+, but you will also need to set a flag to match it globally, the syntax of which depends on the programming language or software you are using.
EDIT: Some examples:
Python:
import re
re.findall(r"\d+", my_string)
JavaScript:
myString.match(/\d+/g)
The regex you are looking for is [0-9]+ or \d+. You should then get multiple matches for the sentence.

Extract a portion of text using RegEx

I would like to extract a portion of a text using a regular expression. For example, I have an address and want to return just the number and streets and exclude the rest:
2222 Main at King Edward Vancouver BC CA
But the addresses vary in format most of the time. I tried using a lookahead regex and came up with this expression:
.*?(?=\w* \w* \w{2}$)
The above expression handles the example nicely, but it gets way too messy as soon as commas come into the text, or postal codes, which can be a 6-character string or two 3-character strings with a space in the middle, etc.
Is there a more elegant way of extracting a portion of text than a lookahead regex?
Any suggestion or a point in another direction is greatly appreciated.
Thanks!
Regular expressions are for data that is REGULAR, that follows a pattern. So if your data is completely random, no, there's no elegant way to do this with regex.
On the other hand, if you know what values you want, you can probably write a few simple regexes, and then just test them all on each string.
Ex.
regex1= address # grabber, regex2 = street type grabber, regex3 = name grabber.
Attempt a match on string1 with regex1, regex2, and finally regex3. Move on to the next string.
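A minimal Python sketch of that "several small regexes" idea. The field names and patterns below are hypothetical, chosen only to illustrate the approach on the example address; real data would need its own set.

```python
import re

# Hypothetical field patterns, one small regex per piece of information.
FIELD_PATTERNS = {
    "house_number": re.compile(r"^\d+"),
    "province": re.compile(r"\b(?:BC|AB|ON)\b"),
    "country": re.compile(r"\b(?:CA|US)$"),
}

def grab_fields(address):
    """Try every pattern on the address; unmatched fields map to None."""
    found = {}
    for name, pattern in FIELD_PATTERNS.items():
        m = pattern.search(address)
        found[name] = m.group(0) if m else None
    return found

print(grab_fields("2222 Main at King Edward Vancouver BC CA"))
```

Each pattern stays simple and can fail independently, which is easier to maintain than one expression that tries to cover every address format.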
Well, I thought I'd throw my hat into the ring:
.*(?=,? ([a-zA-Z]+,?\s){3}([\d-]*\s)?)
You might want ^ or \d+ at the front for good measure.
I didn't bother specifying lengths for the postal codes; this one accepts any number of digits and hyphens.
It works for these inputs so far, and for variations on commas within the city/state/country area:
2222 Main at King Edward Vancouver, BC, CA, 333-333
555 road and street place CA US 95000
2222 Main at King Edward Vancouver BC CA 333
555 road and street place CA US
It counts on there being three words at the end for the city, state and country. Other than that, it's like ryansstack said: if it's random, it won't work. If the city is two words, like New York, it won't work. Yeah... regex isn't the tool for this one.
BTW: tested on regexhero.net
I can think of 2 ways you can do this:
1) If you know that "the rest" of your data after the address is exactly 2 fields, i.e. BC and CA, you can split your string using space as the delimiter and remove the last 2 items.
2) Split on the delimiter /[A-Z][A-Z]/ and store the result in an array, then print out the array (provided that the address itself doesn't contain 2 or more consecutive capital letters).
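Both options can be sketched in a few lines of Python, assuming the example address from the question; the variable names are made up for the illustration.

```python
import re

address = "2222 Main at King Edward Vancouver BC CA"

# Option 1: split on whitespace and drop the last two fields.
without_last_two = " ".join(address.split()[:-2])

# Option 2: split on any pair of capital letters and keep the first piece.
before_capitals = re.split(r"[A-Z][A-Z]", address)[0].strip()

print(without_last_two)
print(before_capitals)
```

Note that both variants still keep the city name, so they only get part of the way towards "number and streets" unless the city can be removed by some other rule.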