Ruby empty String regex [duplicate]

This question already has answers here:
What is a regex to match ONLY an empty string?
(11 answers)
Closed 2 years ago.
Is there any way other than .match("") to specify that my string may contain an empty string alongside some other content?
So far my regexp looks like this:
s = /(?<beginString>[\sA-Za-z\.]*)(?<marker1>[\*\_]+)(?<word>[\sA-Za-z]+)(?<marker2>[\*\_]+)(?<restString>[\sA-Za-z\*\_\.]*)/ =~ myString
Now I want beginString to be able to be an empty string, because the program runs recursively. I want to pass this function a string like "*this* _is_ ***a*** _string_", so, as you can see, there are plenty of cases where beginString has to be empty so that the next marker is read.
In the end I want my program to print this:
{"01"=>#<Textemph #content="this ">, "02"=>#<Textemph #content="is">,
"11"=>#<Textstrongemph #content="a">, "12"=>#<Textemph #content="string">}
I think it might fix my problem if I could just put something like \nil inside the [ ].
Please just tell me how to write .match("") in //-form.
Thank you for your Help <3
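A `*` quantifier already lets a group match zero characters, so beginString needs no special "empty" marker. The sketch below reuses the pattern from the question (with the unnecessary escapes inside the character classes dropped) and recurses on restString. `PATTERN` and `parse_markers` are hypothetical names, and the sketch only collects marker/word pairs; it does not build the Textemph objects.

```ruby
# Same pattern as in the question. [\sA-Za-z.]* may match zero characters,
# so beginString is simply "" when the string starts with a marker.
PATTERN = /(?<beginString>[\sA-Za-z.]*)(?<marker1>[*_]+)(?<word>[\sA-Za-z]+)(?<marker2>[*_]+)(?<restString>[\sA-Za-z*_.]*)/

# Hypothetical helper: collect [marker, word] pairs by recursing on restString.
def parse_markers(str, acc = [])
  md = PATTERN.match(str)
  return acc unless md # no further marker/word/marker sequence

  acc << [md[:marker1], md[:word]]
  parse_markers(md[:restString], acc)
end

first = PATTERN.match("*this* _is_ ***a*** _string_")
p first[:beginString] # => ""
p parse_markers("*this* _is_ ***a*** _string_")
# => [["*", "this"], ["_", "is"], ["***", "a"], ["_", "string"]]
```

On the second and later calls beginString picks up the space before the next marker, so the recursion keeps working without any special empty-string handling.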

Match Start-of-String Followed by End-of-String
To match an empty string in Ruby, you could use the \A and \z atoms to match the beginning of the string immediately followed by the end of the string. For example:
"".match(/\A\z/)
#=> #<MatchData "">
Note that this is different from /^$/, which matches at the start and end of any line rather than of the whole string. Consider:
"\n".match(/\A\z/)
#=> nil
"\n".match(/^$/)
#=> #<MatchData "">
To properly detect an empty string, you need to use start/end of string matchers rather than start/end of line.
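A small Ruby sketch comparing the two kinds of anchors on a few sample strings (the samples are illustrative, not from the question):

```ruby
# String anchors: \A = start of the string, \z = end of the string.
EMPTY_STRING = /\A\z/
# Line anchors: ^ = start of any line, $ = end of any line.
EMPTY_LINE = /^$/

["", "\n", "abc", "abc\n\ndef"].each do |s|
  puts "#{s.inspect.ljust(12)} \\A\\z: #{s.match?(EMPTY_STRING)}  ^$: #{s.match?(EMPTY_LINE)}"
end
```

Only "" matches /\A\z/, while "\n" and "abc\n\ndef" also match /^$/ because each contains an empty line.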

Related

How do I form a regex expression to return all parts of the string prior to a [ [duplicate]

This question already has answers here:
Python regex to get everything until the first dot in a string
(5 answers)
How would I get everything before a : in a string Python
(6 answers)
Closed 1 year ago.
I am trying to form a regex expression to match strings that fit the following pattern:
This is what I want [not this]
The return string should be:
This is what I want
The regex expressions I've tried are:
strings = ['This is what I want [but not this]',
'I should hold onto this part [but this part can be discarded]']
Using this expression on each string: re.search(r"(.*)\[", string)
The output is:
This is what I want [
I should hold onto this part [
I have also tried:
re.search(r"(.*)(?!\[)", string)
The return value is the entire original string as-is. I've already written this using indexing to find the '[' character and remove everything from that character onward, but I would like to know how it can be done with regex.
Thank you.
EDIT:
I tried the two regex recommendations, but neither works.
#!/usr/bin/python
import re
strings = ['This is what I want [but not this]',
           'I should hold onto this part [but this part can be discarded]']
for string in strings:
    print(re.match(r"^[^\[]*(?:\[|$)", string).group(0))
Output:
This is what I want [
I should hold onto this part [
Capture group 1 of this pattern holds the string you are looking for:
^(.+)\[
See it working on regex101.com.
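A Ruby sketch of the same capture-group approach, for comparison. `String#[]` with a regex and a group index returns only group 1. Note that the captured text keeps the space before the bracket, and that the greedy `.+` runs to the last `[` when there is more than one.

```ruby
strings = ['This is what I want [but not this]',
           'I should hold onto this part [but this part can be discarded]']

# str[regex, 1] returns capture group 1 only, so the "[" itself is left out.
prefixes = strings.map { |s| s[/^(.+)\[/, 1] }
p prefixes
# => ["This is what I want ", "I should hold onto this part "]

# .+ is greedy, so with several brackets it captures up to the LAST "[".
p "a [b] c [d]"[/^(.+)\[/, 1]      # => "a [b] c "
# Without any "[" there is no match, so the result is nil.
p "no brackets here"[/^(.+)\[/, 1] # => nil
```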
You could use the regex pattern ^[^\[]*(?:\[|$):
import re
strings = ['This is what I want [but not this]', 'I should hold onto this part [but this part can be discarded]','no brackets here']
output = [re.findall(r'^[^\[]*(?:\[|$)', x)[0] for x in strings]
print(output)
This prints:
['This is what I want [', 'I should hold onto this part [', 'no brackets here']
The regex pattern used here says to match:
^ from the start of the input
[^\[]* zero or more non [ characters, until hitting
(?:\[|$) and including either the first [ OR the end of the input
Note that we leave open the possibility that your input string may not have a [ anywhere, in which case we take the entire input as a match.
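The EDIT in the question points out that this pattern still includes the `[`. Since `[^\[]*` already stops at the first `[` or at the end of the input, the `(?:\[|$)` part can simply be dropped. A Ruby sketch of that variant; trimming the trailing space with `rstrip` is an assumption about the desired output:

```ruby
strings = ['This is what I want [but not this]',
           'I should hold onto this part [but this part can be discarded]',
           'no brackets here']

# \A anchors at the start of the string; [^\[]* then takes every character up to,
# but not including, the first "[" (or the whole string if there is none).
# rstrip removes the space that precedes the bracket.
output = strings.map { |s| s[/\A[^\[]*/].rstrip }
p output
# => ["This is what I want", "I should hold onto this part", "no brackets here"]
```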

Python regex to parse '#####' text in description field [duplicate]

This question already has answers here:
regex to extract mentions in Twitter
(2 answers)
Extracting #mentions from tweets using findall python (Giving incorrect results)
(3 answers)
Closed 3 years ago.
Here's the line I'm trying to parse:
#abc def#gmail.com #ghi j#klm #nop.qrs #tuv
And here's the regex I've gotten so far:
#[A-Za-z]+[^0-9. ]+\b | #[A-Za-z]+[^0-9. ]
My goal is to get ['#abc', '#ghi', '#tuv'], but no matter what I do, I can't get 'j#klm' to not match. Any help is much appreciated.
Try using re.findall with the following regex pattern:
(?:(?<=^)|(?<=\s))#[A-Za-z]+(?=\s|$)
import re
inp = "#abc def#gmail.com #ghi j#klm #nop.qrs #tuv"
matches = re.findall(r'(?:(?<=^)|(?<=\s))#[A-Za-z]+(?=\s|$)', inp)
print(matches)
This prints:
['#abc', '#ghi', '#tuv']
The regex calls for an explanation. The leading lookbehind (?:(?<=^)|(?<=\s)) asserts that what precedes the # symbol is either a space or the start of the string. We can't use a word boundary here because # is not a word character. We use a similar lookahead (?=\s|$) at the end of the pattern to rule out matching things like #nop.qrs. Again, a word boundary alone would not be sufficient.
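For comparison, a Ruby sketch using `String#scan`. It swaps in `(?<!\S)` ("not preceded by a non-space character") and `(?!\S)` ("not followed by a non-space character"), a common equivalent of the start-or-whitespace and whitespace-or-end checks above.

```ruby
inp = "#abc def#gmail.com #ghi j#klm #nop.qrs #tuv"

# (?<!\S): the "#" is at the start of the string or preceded by whitespace.
# (?!\S):  the tag is followed by whitespace or the end of the string.
matches = inp.scan(/(?<!\S)#[A-Za-z]+(?!\S)/)
p matches # => ["#abc", "#ghi", "#tuv"]
```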
Just add the start-of-line anchor at the beginning:
^#[A-Za-z]+[^0-9. ]+\b | #[A-Za-z]+[^0-9. ]
It should work!

How can I remove the first and last dash from a string? [duplicate]

This question already has answers here:
What is the easiest way to remove the first character from a string?
(15 answers)
Closed 2 years ago.
Let's say I have a string:
my_string = "-5-24-3-488-7--4-3-"
How can I remove both the first and the last dash? I want the result to look like this:
my_string = "5-24-3-488-7--4-3"
I've thought about using gsub or a regular expression, but I'm probably over-complicating the solution. Still, I can't figure it out. Please help.
The regex ^-|-$ matches a hyphen at the beginning, or a hyphen at the end.
In Ruby:
"-5-24-3-488-7--4-3-".gsub(/^-|-$/, '')
And if you want to modify the string in-place,
my_string.gsub!(/^-|-$/, '')
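One caveat: in Ruby, `^` and `$` are line anchors, so on a string containing newlines this `gsub` strips dashes at every line boundary. `\A` and `\z` restrict the match to the very start and end of the whole string. A small sketch with a hypothetical multi-line input:

```ruby
s = "-a-\n-b-"

# ^ and $ match at the start/end of every line.
p s.gsub(/^-|-$/, '')   # => "a\nb"
# \A and \z match only at the start/end of the whole string.
p s.gsub(/\A-|-\z/, '') # => "a-\n-b"
```

For the single-line string in the question, both patterns give "5-24-3-488-7--4-3".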
You can do this:
my_string.delete_prefix("-").delete_suffix("-")
# => "5-24-3-488-7--4-3"
In view of the other answers given to this question, my answer may in part illustrate the importance of providing a complete and unambiguous statement of a question.
def remove_first_and_last_hyphen(str)
  idx = str.index('-')
  if idx
    str[idx] = ''
    idx = str.rindex('-')
    str[idx] = '' if idx
  end
  str
end
str = "-5-24-3-488-7--4-3-"
remove_first_and_last_hyphen str
#=> "5-24-3-488-7--4-3"
str
#=> "5-24-3-488-7--4-3"
remove_first_and_last_hyphen "5-24-3-488-7--4-3-"
#=> "524-3-488-7--4-3"
remove_first_and_last_hyphen "-5-24-3-488-7--4-3"
#=> "5-24-3-488-7--43"
remove_first_and_last_hyphen "5-24-3-488-7--4-3"
#=> "524-3-488-7--43"
I defined str in the first example to show that str was mutated (modified).
The question is, "How can I remove the first and last dash from a string?". An example is given, but it shows only what is wanted in a particular case, and is consistent with various interpretations of the question.
Aside from the confusion between dashes and hyphens, there is only one way to interpret "the first and last hyphen from a string"; namely, the hyphen having the smallest index (if the string contains at least one hyphen) and the hyphen having the largest string index (if the string contains at least two hyphens). That is of course not the same as "the first and last characters of a string, provided they are hyphens". The OP may have something different in mind, but I can only go by what is asked in such a clear and unambiguous way.
By "remove" in "...remove the first and last", I assume the OP wishes to modify the string in place, as opposed to returning a new string. If I am wrong about that my code would have to be modified accordingly.
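To make the two interpretations above concrete, here is a sketch of both side by side. Both method names are hypothetical, and unlike the method above neither one mutates its argument.

```ruby
# Interpretation A: remove the first/last characters only if they are hyphens.
def strip_edge_hyphens(str)
  str.delete_prefix('-').delete_suffix('-')
end

# Interpretation B: remove the first and the last hyphen wherever they occur.
def drop_first_and_last_hyphen(str)
  s = str.sub('-', '') # new string without the first hyphen
  i = s.rindex('-')    # index of the last remaining hyphen, if any
  s[i] = '' if i
  s
end

p strip_edge_hyphens("5-24-3-488-7--4-3-")         # => "5-24-3-488-7--4-3"
p drop_first_and_last_hyphen("5-24-3-488-7--4-3-") # => "524-3-488-7--4-3"
```

For the string in the question, which starts and ends with a hyphen, both return "5-24-3-488-7--4-3".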
If you know the string has a - at both the beginning and the end, you can slice them off:
"-5-24-3-488-7--4-3-"[1..-2]
# => "5-24-3-488-7--4-3"
Try my_string.scan(/\A-(.+)-\z/).flatten.first. The \A and \z match the beginning and end of the string; the result is nil if the string does not start and end with a dash.

Extract numbers between brackets within a string [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Extract info inside all parenthesis in R (regex)
I imported data from Excel, and one cell consists of long strings that contain numbers and letters. Is there a way to extract only the numbers from that string and store them in a new variable? Unfortunately, some of the entries have two sets of brackets, and I would only want the second one. Could I use grep for that?
The strings look more or less like this, although their length varies:
"East Kootenay C (5901035) RDA 01011"
or like this:
"Thompson-Nicola J (Copper Desert Country) (5933039) RDA 02020"
All I want from this is 5901035 and 5933039
Any hints and help would be greatly appreciated.
There are many possible regular expressions to do this. Here is one:
x=c("East Kootenay C (5901035) RDA 01011","Thompson-Nicola J (Copper Desert Country) (5933039) RDA 02020")
> gsub('.+\\(([0-9]+)\\).+?$', '\\1', x)
[1] "5901035" "5933039"
Let's break down the syntax of that expression, '.+\\(([0-9]+)\\).+?$':
.+ one or more of any character
\\( Parentheses are special characters in a regular expression, so to match an actual ( I need to escape it with a \. I have to escape that backslash again for R (hence the two \s).
([0-9]+) I mentioned special characters; here I use two. The first is the pair of parentheses, which indicate a group I want to keep. The second is [ and ], which surround a set of characters, here the digits 0-9 (the + means one or more of them). See ?regex for more information.
.+?$ The final piece matches the rest of the string up to its end. It is the greedy leading .+ that makes sure I grab the LAST set of numbers in parens: it consumes as much as possible and backtracks only as far as the last ( that is followed by digits and a ).
I could also use * instead of +, which would mean zero or more rather than one or more, in case your paren string comes at the beginning or end of the string.
The second argument of gsub is what I replace the match with. I used \\1, which says use group 1 (the stuff inside the ( ) from above). The backslash is doubled again: \1 is the regex back-reference, and the extra \ escapes it inside the R string.
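A Ruby sketch of the same idea. Instead of replacing the whole string with a back-reference, it extracts capture group 1 directly with `String#[]`; the greedy leading `.*` again makes it pick the last parenthesised number (`.*` rather than `.+`, so a group at the very start of a string would still match).

```ruby
x = ["East Kootenay C (5901035) RDA 01011",
     "Thompson-Nicola J (Copper Desert Country) (5933039) RDA 02020"]

# .* is greedy, so it backtracks only as far as the last "(digits)" group;
# capture group 1 holds the digits. Strings without such a group give nil.
ids = x.map { |s| s[/.*\((\d+)\)/, 1] }
p ids # => ["5901035", "5933039"]
```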
Clear as mud to be sure! Enjoy your data munging project!
Here is a gsubfn solution:
library(gsubfn)
strapplyc(x, "[(](\\d+)[)]", simplify = TRUE)
[(] matches an open paren, (\\d+) matches a string of digits creating a back-reference owing to the parens around it and finally [)] matches a close paren. The back-reference is returned.
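The same pattern in a Ruby sketch: `String#scan` returns one array of capture groups per match, so `flatten` yields every parenthesised number in each string.

```ruby
x = ["East Kootenay C (5901035) RDA 01011",
     "Thompson-Nicola J (Copper Desert Country) (5933039) RDA 02020"]

# [(] and [)] match literal parentheses; (\d+) captures the digits between them.
# "(Copper Desert Country)" contains no digits, so it is skipped.
all_ids = x.map { |s| s.scan(/[(](\d+)[)]/).flatten }
p all_ids # => [["5901035"], ["5933039"]]
```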

Help setting up a regular expression in Delphi XE

I want to set up a regular expression to check string1.
string1 can take values such as:
string1:='D1413578;1038'
string1:='D2;11'
string1:='D16;01'
...
In string1, only the character 'D' and the semicolon are always present.
I set RegularExpressions1 := '\b(D#\;#)\b';
but RegularExpressions1 doesn't check string1 correctly.
In VB6 this was RegularExpressions1="D#;#", but I don't know what the equivalent is in Delphi.
Try
\bD\d*;\d*
\d* means "zero or more digits".
By the way, I have omitted the second \b because otherwise the match would fail if there is no number after the semicolon (and you said the number was optional).
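The effect of that second \b can be checked quickly. A Ruby sketch (for these ASCII inputs, Ruby's \b and \d behave the same way):

```ruby
without_trailing_b = /\bD\d*;\d*/
with_trailing_b    = /\bD\d*;\d*\b/

p "D2;"[without_trailing_b] # => "D2;"
p "D2;"[with_trailing_b]    # => nil (no word boundary after the final ";")
p "D16;01"[with_trailing_b] # => "D16;01"
```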
If by "check" you mean "validate" an entire string, then use
^D\d*;\d*$
All this assumes that only digits are allowed after D and ;. If that is not the case, please edit your question to clarify.
Assuming both numbers require at least one digit, use this regex:
\AD\d+;\d+\z
I prefer to use \A and \z instead of ^ and $ to match the start and end of the string because they always do only that.
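A Ruby sketch illustrating why \A and \z are the safer choice. In Ruby, ^ and $ are always line anchors, so a second line can slip past them. The samples are the values from the question plus two hypothetical invalid inputs.

```ruby
STRICT   = /\AD\d+;\d+\z/ # start/end of the whole string
LINEWISE = /^D\d+;\d+$/   # in Ruby, start/end of any line

['D1413578;1038', 'D2;11', 'D16;01', 'D2;', "D2;11\nextra"].each do |s|
  puts "#{s.inspect}: \\A..\\z=#{s.match?(STRICT)}  ^..$=#{s.match?(LINEWISE)}"
end
```

Only the last string differs: /^D\d+;\d+$/ accepts "D2;11\nextra" because its first line is valid, while /\AD\d+;\d+\z/ rejects it.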
In Delphi XE you can check whether this regex matches string1 in a single line of code:
if TRegEx.IsMatch(string1, '\AD\d+;\d+\z') then ...
If you want to check many strings, instantiate the TRegEx:
var
  RE: TRegEx;
  string1: string;
begin
  RE := TRegEx.Create('\AD\d+;\d+\z');
  for string1 in ListOfStrings do
    if RE.IsMatch(string1) then ...
end;