Splitting a string with a particular regular expression - regex

I'm trying to split a string on two separators using a regex. My string is, for example,
"test 10 20 middle 30 - 40 mm",
and I would like to split it into ["test 10", "20 middle 30", "40 mm"]. That is, split on (and drop) ' - ' and the space between two numbers.
I tried
result = re.split(r'[\d+] [\d+]', s)
> ['test 1', '0 middle 30 - 40 mm']
result2 = re.split(r' - |{\d+} {\d+}', s)
> ['test 10 20 middle 30', '40 mm']
Is there a regular expression that splits it into ['test 10', '20 middle 30', '40 mm']?

You may use
(?<=\d)\s+(?:-\s+)?(?=\d)
See the regex demo.
Details
(?<=\d) - a digit must appear immediately on the left
\s+ - 1+ whitespaces
(?:-\s+)? - an optional sequence of a - followed with 1+ whitespaces
(?=\d) - a digit must appear immediately on the right.
See the Python demo:
import re
text = "test 10 20 middle 30 - 40 mm"
print( re.split(r'(?<=\d)\s+(?:-\s+)?(?=\d)', text) )
# => ['test 10', '20 middle 30', '40 mm']

Data
k="test 10 20 middle 30 - 40 mm"
Please Try
result2 = re.split(r"(^[a-z]+\s\d+|^\d+\s[a-z]+|\d+)$", k)
result2
`^[a-z]+` - one or more lowercase letters at the start of the string (greedy), followed by:
`\s` - a whitespace character
`\d+` - one or more digits (greedy)
`|` - or `^\d+`, one or more digits at the start of the string, followed by:
`\s` - a whitespace character
`[a-z]+` - one or more lowercase letters (greedy)
`|` - or `\d+`, one or more digits
`$` - the end of the string, which must directly follow whichever alternative matched
Output

Related

Remove only non-leading and non-trailing spaces from a string in Ruby?

I'm trying to write a Ruby method that will return true only if the input is a valid phone number, which means, among other rules, it can have spaces and/or dashes between the digits, but not before or after the digits.
In a sense, I need a method that does the opposite of String#strip! (remove all spaces except leading and trailing spaces), plus the same for dashes.
I've tried using String#gsub!, but when I try to match a space or a dash between digits, then it replaces the digits as well as the space/dash.
Here's an example of the code I'm using to remove spaces. I figure once I know how to do that, it will be the same story with the dashes.
def valid_phone_number?(number)
  phone_number_pattern = /^0[^0]\d{8}$/
  # remove spaces
  number.gsub!(/\d\s+\d/, "")
  return number.match?(phone_number_pattern)
end
What happens is if I call the method with the following input:
valid_phone_number?(" 09 777 55 888 ")
I get false because the gsub! call transforms the number into " 0788 ", i.e. it gets rid of the digits around the spaces as well as the spaces. What I want it to do is just to get rid of the inner spaces, so as to produce " 0977755888 ".
I've tried
number.gsub!(/\d(\s+)\d/, "") and number.gsub!(/\d(\s+)\d/) { |match| "" } to no avail.
Thank you!!
If you want to return a boolean, you might for example use a pattern that accepts leading and trailing spaces, and matches 10 digits (as in your example data) where there can be optional spaces or hyphens in between.
^ *\d(?:[ -]*\d){9} *$
For example
def valid_phone_number?(number)
  phone_number_pattern = /^ *\d(?:[ -]*\d){9} *$/
  return number.match?(phone_number_pattern)
end
See a Ruby demo and a regex demo.
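For comparison, the same validation idea can be sketched in Python. This is only an illustration and not part of the Ruby answer: the function name is made up here, and `re.fullmatch` is used in place of the `^` and `$` anchors of the pattern in the method above.

```python
import re

# Optional leading/trailing spaces and exactly 10 digits, with any run of
# spaces or hyphens allowed between consecutive digits.
PHONE_RE = re.compile(r" *\d(?:[ -]*\d){9} *")


def is_valid_phone_number(number: str) -> bool:
    """Hypothetical helper: True if `number` holds exactly 10 digits."""
    # fullmatch requires the whole string to match the pattern.
    return PHONE_RE.fullmatch(number) is not None


print(is_valid_phone_number(" 09 777 55 888 "))  # True
print(is_valid_phone_number("09-777-55-888"))    # True
print(is_valid_phone_number("09 777 55 88"))     # False (only 9 digits)
```

Because the input string is never modified, the check has no side effects on the caller's data, unlike the `gsub!` approach in the question.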
To remove spaces and hyphens in between digits, try:
(?:\d+|\G(?!^)\d+)\K[- ]+(?=\d)
See an online regex demo
(?: - Open non-capture group;
\d+ - Match 1+ digits;
| - Or;
\G(?!^)\d+ - Assert position at the end of the previous match (but not at the start of a line), followed by 1+ digits;
)\K - Close non-capture group and reset the starting point of the reported match;
[- ]+ - Match 1+ spaces/hyphens;
(?=\d) - Assert position is followed by a digit.
p " 09 777 55 888 ".gsub(/(?:\d+|\G(?!^)\d+)\K[- ]+(?=\d)/, '')
Prints: " 0977755888 "
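Python's built-in `re` module supports neither `\K` nor `\G`, so this pattern cannot be ported there directly. A rough equivalent, swapping in a lookbehind for `\K`, might look like the sketch below; the function name is an assumption made for illustration.

```python
import re


def strip_inner_separators(number: str) -> str:
    """Hypothetical helper: drop spaces/hyphens that sit between two digits."""
    # (?<=\d) and (?=\d) only assert the neighbouring digits, so the digits
    # themselves are never part of the match and are left intact.
    return re.sub(r"(?<=\d)[- ]+(?=\d)", "", number)


print(repr(strip_inner_separators(" 09 777 55 888 ")))  # ' 0977755888 '
```

The leading and trailing spaces survive because they do not have a digit on both sides.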
Using a very simple regex (/\d/ tests for a digit):
str = " 09 777 55 888 "
r = str.index(/\d/)..str.rindex(/\d/)
str[r] = str[r].delete(" -")
p str # => " 0977755888 "
Passing a block to gsub is another option; the capture groups are available as globals:
>> str = " 09 777 55 888 "
# simple, easy to understand
>> str.gsub(/(^\s+)([\d\s-]+?)(\s+$)/){ "#$1#{$2.delete('- ')}#$3" }
=> " 0977755888 "
# a different take on #steenslag's answer, to avoid using range.
>> s = str.dup; s[/^\s+([\d\s-]+?)\s+$/, 1] = s.delete("- "); s
=> " 0977755888 "
Benchmark, not that it matters that much:
require 'benchmark'
n = 1_000_000
puts(Benchmark.bmbm do |x|
  # just a match
  x.report("match") { n.times {str.match(/^ *\d(?:[ -]*\d){9} *$/) } }
  # use regex in []=
  x.report("[//]=") { n.times {s = str.dup; s[/^\s+([\d\s-]+?)\s+$/, 1] = s.delete("- "); s } }
  # use range in []=
  x.report("[..]=") { n.times {s = str.dup; r = s.index(/\d/)..s.rindex(/\d/); s[r] = s[r].delete(" -"); s } }
  # block in gsub
  x.report("block") { n.times {str.gsub(/(^\s+)([\d\s-]+?)(\s+$)/){ "#$1#{$2.delete('- ')}#$3" }} }
  # long regex
  x.report("regex") { n.times {str.gsub(/(?:\d+|\G(?!^)\d+)\K[- ]+(?=\d)/, "")} }
end)
Rehearsal -----------------------------------------
match   0.997458   0.000004   0.997462 (  0.998003)
[//]=   1.822698   0.003983   1.826681 (  1.827574)
[..]=   3.095630   0.007955   3.103585 (  3.105489)
block   3.515401   0.003982   3.519383 (  3.521392)
regex   4.761748   0.007967   4.769715 (  4.772972)
------------------------------- total: 14.216826sec
            user     system      total        real
match   1.031670   0.000000   1.031670 (  1.032347)
[//]=   1.859028   0.000000   1.859028 (  1.860013)
[..]=   3.074159   0.003978   3.078137 (  3.079825)
block   3.751532   0.011982   3.763514 (  3.765673)
regex   4.634857   0.003972   4.638829 (  4.641259)

Using regexp to find a reoccurring pattern in MATLAB

input = ' 12Z taj 20501 jfdjda OCNL jtjajd ptpa 23Z jfdakdkf tjajdfk OCNL fdkadja 02Z fdjafsdk fkdsafk OCNL fdkafk dksakj = '
Using regexp:
regexp(input,'\s\d{2,4}Z\s.*(OCNL)','match')
I'm trying to get this output:
[1,1] = 12Z taj 20501 jfdjda OCNL jtjajd ptpa
[1,2] = 23Z jfdakdkf tjajdfk OCNL fdkadja
[1,3] = 02Z fdjafsdk fkdsafk OCNL fdkafk dksakj
You may use
(?<!\S)\d{2,4}Z\s+.*?\S(?=\s\d{2,4}Z\s|\s*=\s*$)
See the regex demo.
Details
(?<!\S) - there must be a whitespace or start of string immediately to the left of the current location
\d{2,4} - 2, 3 or 4 digits
Z - a Z letter
\s+ - 1+ whitespaces
.*?\S - any zero or more chars as few as possible and then a non-whitespace
(?=\s\d{2,4}Z\s|\s*=\s*$) - there must be either of the two patterns immediately to the right of the current location:
\s\d{2,4}Z\s - a whitespace, 2, 3 or 4 digits, Z and a whitespace
| - or
\s*=\s*$ - a = enclosed with 0+ whitespace chars at the end of the string.
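To see how the pattern behaves on the sample input, here is a small sketch in Python rather than MATLAB (the language choice and the variable names are assumptions made for illustration; the pattern itself is unchanged).

```python
import re

# The sample input from the question.
text = (" 12Z taj 20501 jfdjda OCNL jtjajd ptpa 23Z jfdakdkf tjajdfk OCNL "
        "fdkadja 02Z fdjafsdk fkdsafk OCNL fdkafk dksakj = ")

PATTERN = re.compile(r"(?<!\S)\d{2,4}Z\s+.*?\S(?=\s\d{2,4}Z\s|\s*=\s*$)")

# The pattern has no capturing groups, so findall returns the full text
# of each match.
records = PATTERN.findall(text)
for record in records:
    print(record)
```

The lazy `.*?` stops each record at the last non-whitespace character before the next `NNZ` token (or before the closing `=`), so `20501` does not end the first record early: it is not followed by `Z`.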

Regexp to search names of books (Delphi 7 & DiRegEx 8.8.1)

I am using Delphi 7 and this is the first time I am using the DiRegEx library.
What I need to do is collect the names of the books that are in a list. The list is long, but to give an idea, it looks like this:
2 Tesalonickým 3:14
2 Tesalonickým 3:15
2 Tesalonickým 3:16
2 Tesalonickým 3:17
2 Tesalonickým 3:18
1 Timoteovi 1:1
1 Timoteovi 1:2
1 Timoteovi 1:3
1 Timoteovi 1:4
So what I want to find with RegEx.Match are the '2 Tesalonickým' and '1 Timoteovi' strings. In other words, I want to search for ^some string\d\d?\d?:\d\d?\d? ...
My code is:
var
  Contents: TStringList;
  RegEx: TDIRegEx;
  WordCount: Integer;
  i: Integer;
  s: string;
begin
  Contents := TStringList.Create;
  RegEx := TDIPerlRegEx.Create{$IFNDEF DI_No_RegEx_Component}(nil){$ENDIF};
  try
    Contents.LoadFromFile('..\reference dlouhé CS.txt');
    for i := 0 to Contents.Count - 1 do
    begin
      RegEx.SetSubjectStr(Contents[i]);
      RegEx.MatchPattern := '\w+';
      WordCount := 0;
      if RegEx.Match(0) >= 0 then
      begin
        repeat
          Inc(WordCount);
          s := RegEx.MatchedStr;
          WriteLn(WordCount, ' - ', s);
        until RegEx.MatchNext < 0;
      end;
    end; // end for
  finally
    // Free both objects once, after the loop; freeing RegEx inside the
    // loop would leave a dangling reference for the next iteration.
    RegEx.Free;
    Contents.Free;
  end; // end try
end;
And I need to modify the regex so that the \d\d?\d?:\d\d?\d? part is not included in the result but only serves as an "anchor" or a "needle". How should I write the regexp?
Result:
This is a complete list of the 66 books of the Bible in UTF-8. There were some problems with the \w pattern because it does not include characters like Ž or š.
Genesis;Exodus;Leviticus;Numeri;Deuteronomium;Jozue;Soudců;Rút;1 Samuelova;2 Samuelova;1 Královská;2 Královská;1 Paralipomenon;2 Paralipomenon;Ezdráš;Nehemjáš;Ester;Jób;Žalmy;Přísloví;Kazatel;Píseň písní;Izajáš;Jeremjáš;Pláč;Ezechiel;Daniel;Ozeáš;Jóel;Ámos;Abdijáš;Jonáš;Micheáš;Nahum;Abakuk;Sofonjáš;Ageus;Zacharjáš;Malachiáš;Matouš;Marek;Lukáš;Jan;Skutky apoštolské;Římanům;1 Korintským;2 Korintským;Galatským;Efezským;Filipským;Koloským;1 Tesalonickým;2 Tesalonickým;1 Timoteovi;2 Timoteovi;Titovi;Filemonovi;Židům;Jakubův;1 Petrův;2 Petrův;1 Janův;2 Janův;3 Janův;Judův;Zjevení Janovo;
You may use
(*UCP)^(?:\d+\s+)?\w+(?=\s+\d\d?\d?:\d)
Or
(*UCP)^(?:\d+\s+)?\w+(?=\s+\d{1,3}:\d)
The (*UCP) PCRE verb at the start of the pattern makes all shorthand classes Unicode-aware.
The patterns match
^ - start of the string
(?: - start of a non-capturing group
\d+ - 1+ digits,
\s+ - 1+ whitespaces and
)? - end of non-capturing group, 1 or 0 occurrences (? makes it optional)
\w+ - 1+ word chars...
(?=\s+\d{1,3}:\d) - followed with 1+ whitespaces, 1 to 3 digits, : and a digit.
See the regex demo.
The \w might need replacing with \p{L} if you only need to match letters.
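The matching logic can be tried out quickly in Python, where `\w` is already Unicode-aware for `str` patterns, so no `(*UCP)` prefix is used. This is a sketch only: the variable names are made up, the input is a few of the sample lines from the question, and `match()` takes the place of the `^` anchor.

```python
import re

# A few of the sample lines from the question.
lines = [
    "2 Tesalonickým 3:14",
    "2 Tesalonickým 3:15",
    "1 Timoteovi 1:1",
    "1 Timoteovi 1:2",
]

# Optional leading number, then one word, only if chapter:verse follows.
BOOK_RE = re.compile(r"(?:\d+\s+)?\w+(?=\s+\d{1,3}:\d)")

books = []
for line in lines:
    m = BOOK_RE.match(line)  # match() only matches at the start of the string
    if m and m.group() not in books:
        books.append(m.group())

print(books)  # ['2 Tesalonickým', '1 Timoteovi']
```

Since the pattern allows only a single `\w+` word after the optional number, a name made of several words would not be matched by it.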

dart regex remove space phone

I tried all the regex solutions I could find (for example "REGEX Remove Space"), but none of them match.
I work with Dart and Flutter, and I am trying to capture only the digits from strings like these:
case 1
aaaaaaaaa 06 12 34 56 78 aaaaaa
case 2
aaaaaaaa 0612345678 aaaaaa
case 3
aaaaaa +336 12 34 56 78 aaaaa
I want to get only 0612345678, with no spaces and no +33: just 10 digits. In the case of +33, I need to replace +33 with 0.
Currently I have the pattern \D*(\d+)\D*?, which works for case 2.
You may match and capture an optional +33 and then a digit followed with spaces or digits, and then check if Group 1 matched and then build the result accordingly.
Here is an example solution (tested):
var strs = ['aaaaaaaaa 06 12 34 56 78 aaaaaa', 'aaaaaaaa 0612345678 aaaaaa', 'aaaaaa +336 12 34 56 78 aaaaa', 'more +33 6 12 34 56 78'];
for (int i = 0; i < strs.length; i++) {
  var rx = new RegExp(r"(?:^|\D)(\+33)?\s*(\d[\d ]*)(?!\d)");
  var match = rx.firstMatch(strs[i]);
  var result = "";
  if (match != null) {
    if (match.group(1) != null) {
      result = "0" + match.group(2).replaceAll(" ", "");
    } else {
      result = match.group(2).replaceAll(" ", "");
    }
    print(result);
  }
}
Prints 0612345678 for each of the four input strings.
The pattern is
(?:^|\D)(\+33)?\s*(\d[\d ]*)(?!\d)
See its demo here.
(?:^|\D) - start of string or any char other than a digit
(\+33)? - Group 1 that captures +33 1 or 0 times
\s* - any 0+ whitespaces
(\d[\d ]*) - Group 2: a digit followed with spaces or/and digits
(?!\d) - no digit immediately to the right is allowed.
Spaces are removed from Group 2 with match.group(2).replaceAll(" ", "") since one can't match discontinuous strings within one match operation.
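The same two-step idea (match first, then strip the spaces from Group 2) can be sketched in Python with the identical pattern. The function name and the `None` return for inputs without digits are assumptions made for this illustration.

```python
import re

PHONE_RE = re.compile(r"(?:^|\D)(\+33)?\s*(\d[\d ]*)(?!\d)")


def normalize_phone(text):
    """Hypothetical helper: return the digits as one string, or None."""
    match = PHONE_RE.search(text)
    if match is None:
        return None
    # Group 2 may contain inner and trailing spaces; drop them all.
    digits = match.group(2).replace(" ", "")
    # Group 1 is None when the optional +33 prefix was absent.
    return "0" + digits if match.group(1) else digits


for s in ["aaaaaaaaa 06 12 34 56 78 aaaaaa", "aaaaaa +336 12 34 56 78 aaaaa"]:
    print(normalize_phone(s))  # 0612345678 both times
```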

How to select the sequence of digits from string in Firebird 2.5?

I have strings like:
26.05.2016/00002Lol
26.05.2016/00003(Lol)
26.05.2016UUUU/00004(Lol)
How to select the sequence of five digits (00002, 00003, 00004) from these strings?
Are those five digits always after the first '/'? If so, then:
SELECT SUBSTRING(col FROM POSITION('/' IN col) + 1 FOR 5) AS fivedigits ...
ought to do the trick.
Description
(?<=\/)[0-9]{5}
This regular expression will do the following:
match the 5 digits that immediately follow a / character
Example
Live Demo
https://regex101.com/r/lD6pW5/1
Sample text
26.05.2016/00002Lol
26.05.2016/00003(Lol)
26.05.2016UUUU/00004(Lol)
Sample Matches
[0][0] = 00002
[1][0] = 00003
[2][0] = 00004
Explanation
NODE EXPLANATION
----------------------------------------------------------------------
(?<= look behind to see if there is:
----------------------------------------------------------------------
\/ '/'
----------------------------------------------------------------------
) end of look-behind
----------------------------------------------------------------------
[0-9]{5} any character of: '0' to '9' (5 times)
----------------------------------------------------------------------
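As a quick check of the expression against the sample text, here is a short sketch in Python (not Firebird SQL; the language and the variable names are assumptions made for illustration).

```python
import re

samples = [
    "26.05.2016/00002Lol",
    "26.05.2016/00003(Lol)",
    "26.05.2016UUUU/00004(Lol)",
]

# Lookbehind for '/', then exactly five digits.
FIVE_DIGITS_RE = re.compile(r"(?<=/)[0-9]{5}")

codes = [FIVE_DIGITS_RE.search(s).group() for s in samples]
print(codes)  # ['00002', '00003', '00004']
```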