Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 months ago.
I am using C++ std::regex.
I want to decline German adjective + noun groups with regex (the regex will be read from a file). The declension of the adjectives is affected not only by the gender of the noun but also by the presence or absence of an article. There can be any number of adjectives, from zero upward.
For example, for a feminine noun going from nominative to dative:
schöne kleine braune Kuh -> schöner kleiner brauner Kuh
die schöne kleine braune Kuh -> der schönen kleinen braunen Kuh
(So basically, every -e at the end of an adjective should become -er or -en depending on the presence or absence of die in the original string, or der in the resulting string.)
This sequence of regex almost works:
/^die /der /
/e /er /
/^(der .*)er /$1en /
This gives me:
schöne kleine braune Kuh -> schöner kleiner brauner Kuh
die schöne kleine braune Kuh -> der schöner kleiner braunen Kuh
Note that it only correctly declines the final adjective in the second example (or, if I make it non-greedy, only the first adjective). Of course, I can repeat the final regex three times to get the right result, but that only works with three or fewer adjectives. In the real world, the number of adjectives is unlimited. How can I rewrite this to do what I want?
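The three rules above already give the right result if the last one is applied repeatedly until the text stops changing: with a greedy `.*`, each pass turns the last remaining `-er` after `der` into `-en`, so any number of adjectives is handled. Below is a minimal sketch of that loop in Python rather than C++ (the same loop can be written around `std::regex_replace`, which uses `$1` in replacement strings by default). The `RULES` list and the `decline` function are hypothetical stand-ins for the rule file and the code that applies it; Python's `re.sub` spells backreferences `\1` instead of `$1`. Running a rule to a fixed point never ends for a rule that keeps re-matching its own output, so a pass limit is included as a guard.

```python
import re

# Hypothetical stand-in for the rules read from the file.
# Python's re.sub writes backreferences as \1 instead of $1.
RULES = [
    (r"^die ", "der "),            # swap the article
    (r"e ", "er "),                # each word ending in -e before a space gets -er
    (r"^(der .*)er ", r"\1en "),   # after "der", turn the last -er into -en (one per pass)
]

def decline(text: str, rules=RULES, max_passes: int = 100) -> str:
    """Apply each rule repeatedly until the text stops changing."""
    for pattern, replacement in rules:
        for _ in range(max_passes):  # guard against rules that never settle
            new_text = re.sub(pattern, replacement, text)
            if new_text == text:
                break  # this rule has reached a fixed point
            text = new_text
    return text

print(decline("schöne kleine braune Kuh"))      # schöner kleiner brauner Kuh
print(decline("die schöne kleine braune Kuh"))  # der schönen kleinen braunen Kuh
```

The rules themselves are unchanged; only the driver differs, so the rule file would need no extra syntax beyond, at most, a flag marking which rules should be repeated.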
Related
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I need a regex pattern to match any text that comes between Health & Beauty that may or may not include a space and/or special character "&" but should not exceed the character limit of 10. In said case, I would want to extract:
Beauty & Fashion
The following is a regex to extract anchor text:
(<[a|A][^>]*>|)
But I want to limit the text to 1 to 10 characters. Is that possible?
For PCRE:
https://regex101.com/r/GJSlZl/1
For JS:
https://regex101.com/r/FIdlyU/1
The solution depends on the regex flavor:
js: (?<=<a[^>]+>)([\w &]{1,10})(?=<\/a>)
pcre: <a[^>]+>\K([\w &]{1,10})(?=<\/a>)
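For comparison, a minimal sketch of the same idea in Python: the built-in `re` module accepts only fixed-width lookbehind, so the JS form `(?<=<a[^>]+>)` is rejected there, and `\K` is not supported either. A capture group does the same job. The sketch uses `[^>]*` rather than `[^>]+` so that a bare `<a>` with no attributes is accepted too; the sample HTML is made up for illustration.

```python
import re

# Capture 1-10 word characters, spaces or "&" sitting directly between <a ...> and </a>.
ANCHOR_TEXT = re.compile(r"<a[^>]*>([\w &]{1,10})</a>")

html = '<a>Health</a> <a href="/b">Beauty</a> <a>Tea & Co</a> <a>Health & Beauty</a>'
print(ANCHOR_TEXT.findall(html))  # ['Health', 'Beauty', 'Tea & Co']
```

"Health & Beauty" is 15 characters long, so it is skipped by the `{1,10}` limit.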
My guess is that you're looking to find some expression similar to,
(?<=&|>)([^&\r\n]{0,10}(?=&|<\/a>))*
to which you might want to add more boundaries on the left side:
(?<=&|>)
Test
$re = '/(?<=&|>)([^&\r\n]{0,10}(?=&|<\/a>))*/s';
$str = '<a>Health & Beauty</a>
Health & Beauty
Health & Beauty 1 & Health & Beauty 1
<a>Health & Beauty 1 & Health & Beauty 1 </a>
<a>Health & Beauty 1 & Some other words & Beauty 1 & Some other words 2</a>
';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
If you wish to explore, simplify, or modify the expression, it's explained on the top right panel of regex101.com. If you'd like, you can also watch in this link how it would match against some sample inputs.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have a single line string like so (trying to turn it into a properly formatted csv):
customer id,description,card country\nBZkvIP2FFfhA3s,"Customer\n10019\nUS\n55769 - example#email.co,",US\nBZiFuAQ6Bd7iNw,"EVV c/o Company\r\n47713\r\nUS\r\n55761 - email#example.com",US\n
I want to find a simple regex that I can use to replace the \n characters that are in the "description" field (which is always between double quotes) with a space; then I will do a replace for the remaining \n characters (which will be at the end of each CSV line). So my end result will be formatted like so:
customer id,description,card country
BZkvIP2FFfhA3s,"Customer 10019 US 55769 - example#email.co,",US
BZiFuAQ6Bd7iNw,"EVV c/o Company\r 47713\r US\r 55761 - email#example.com",US
I can't figure out how to do this simply. I don't need a regex that handles a million exceptions; it just needs to match all \n that are between " and ".
This should work as the "search" term:
(".*?)(\\n)(.*?")
and for your "replace" you need:
$1 $3
https://regex101.com/r/djG74I/4
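One caveat, as far as I can tell: in a single global pass, the pattern above rewrites only the first `\n` in each quoted field, because each match runs on to the closing quote and the next search starts after it. A minimal Python sketch that avoids repeated passes hands every quoted field to a function that replaces all of its `\n` at once, then turns the remaining `\n` into real line breaks. Like the question, it assumes the `\n` and `\r` are literal backslash pairs, and it also assumes no field contains escaped `""` quotes.

```python
import re

# The question's single-line string; in a raw string each \n and \r stays a literal backslash pair.
raw = r'customer id,description,card country\nBZkvIP2FFfhA3s,"Customer\n10019\nUS\n55769 - example#email.co,",US\nBZiFuAQ6Bd7iNw,"EVV c/o Company\r\n47713\r\nUS\r\n55761 - email#example.com",US\n'

# Step 1: inside every "..." field, replace each literal \n with a space.
fixed = re.sub(r'"[^"]*"', lambda m: m.group(0).replace(r'\n', ' '), raw)

# Step 2: the \n sequences left over are row separators; make them real newlines.
csv_text = fixed.replace(r'\n', '\n')
print(csv_text)
```

The `\r` sequences are left in place, matching the desired output shown in the question.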
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
I want to use regex to obtain specific information from a text. Here is an example in semi-pseudocode (you can also reply in semi-pseudocode):
list=["orange","green","grey"]
text= "The Orange is orange"
for word in list:
    if word == re.compile(r'word, text):
        capture Orange in order to have the noun
Note: my question is whether it is possible to use variables (like word above) in a pattern, so that I can loop over a list and check whether any of its words appear in a text.
Do not focus on how to capture the Orange.
I think Biffen has the right idea; you're in a world of pain if you're using this for POS tagging. Anyway, this allows you to match words in your text variable:
for word in list:
    if word in text:
        # Do what you want with word
If you want to use regex, you can build patterns from strings and use parentheses to capture, then use group() to access the captured text:
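import re

for word in list:
    # re.escape() keeps characters such as "." or "+" in word from acting as regex syntax
    pattern = re.compile(".*(" + re.escape(word) + ").*")
    m = pattern.match(text)
    if m:
        print(m.group(1))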
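To address the variable part directly: a pattern is just a string, so it can be rebuilt from `word` on every loop iteration. A minimal Python sketch: `re.escape()` keeps the word from being read as regex syntax, `\b` limits matches to whole words (which assumes the list holds ordinary words), and `re.IGNORECASE` is an optional extra so that "Orange" is found as well. The function name `find_words` is made up for illustration.

```python
import re

def find_words(words, text):
    """Return every whole-word, case-insensitive occurrence of any word in words."""
    found = []
    for word in words:
        # Build the pattern from the loop variable; escape it so it is matched literally.
        pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
        found.extend(m.group(0) for m in pattern.finditer(text))
    return found

print(find_words(["orange", "green", "grey"], "The Orange is orange"))  # ['Orange', 'orange']
```

Without `\b`, "grey" would also match inside a word such as "greyhound".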
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
65 Gregory Street
;Gregory
141-145 Dickson Road
;Dickson
6B Malvern Avenue
;Malvern
230A John Street
;John
I'm trying to extract just the street name from a string: skip the house number (even one with letters in it) and extract just the first word after it. What's the correct expression for this?
Skip the first group of non-space characters, get the next non-space group, skip the rest:
street := RegExReplace(address, "^\S+ (\S+).*$", "$1")
In case of multiline text you can process all lines at once with m and `a options:
streets := RegExReplace(addresses, "m`a)^\S+ (\S+).*$", "$1")
Use regex101.com to test the expressions online.
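The same expression carries over to other regex flavors. A minimal Python sketch using the sample addresses from the question: `re.match` plus a group extracts one street name, and the `re.M` flag makes `^` match at the start of every line, much like the AutoHotkey `m` option. The trailing `.*$` is dropped because the text is extracted rather than replaced. The names `STREET` and `street_name` are illustrative.

```python
import re

STREET = r"^\S+ (\S+)"  # skip the house number, capture the next word

def street_name(address: str):
    m = re.match(STREET, address)
    return m.group(1) if m else None

addresses = """65 Gregory Street
141-145 Dickson Road
6B Malvern Avenue
230A John Street"""

print(street_name("6B Malvern Avenue"))     # Malvern
print(re.findall(STREET, addresses, re.M))  # ['Gregory', 'Dickson', 'Malvern', 'John']
```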
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I'm not really good at regex; I'm a total beginner. I need regexes for the following formats:
LB-[0-999]: "LB-" followed by a number from 0 to 999, with no spaces at the beginning, middle, or end.
XX-[0-19999]: exactly two capital letters in any combination, then "-" and a number from 0 to 19999, with no spaces at the beginning, middle, or end.
XXX-[0-19999]: exactly three capital letters in any combination, then "-" and a number from 0 to 19999, with no spaces at the beginning, middle, or end.
I need all three patterns, but I'm really new to regex. I plan to use them for HTML5 input validation, and I don't have time to study regex properly.
This is what I tried so far:
^LB-[0-9]$
/LB\-[0-9]{1,3}/
/[A-Z]{2}\-1?[0-9]{1,4}/
/[A-Z]{3}\-1?[0-9]{1,4}/
With [0-9] you allow only the digits 0 to 9, and with [A-Z] only capital letters of the alphabet. {1,3} and {1,4} require at least one digit and at most three or four, respectively. With 1? you make an optional 1 before the rest of the number, which is present only for 10000 and greater. These are three different regexes, one for each of your possible entries.
Considering the changes proposed by user in his last comment, the code would look like this:
/^LB-(00[1-9]|[1-9][0-9]|[1-9][0-9]{2})$/
/^[A-Z]{2}-(0000[1-9]|000[1-9][0-9]|00[1-9][0-9]{2}|0[1-9][0-9]{3}|1[0-9][0-9]{3})$/
/^[A-Z]{3}-(0000[1-9]|000[1-9][0-9]|00[1-9][0-9]{2}|0[1-9][0-9]{3}|1[0-9][0-9]{3})$/
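To check the first set of patterns from this answer quickly, here is a minimal Python sketch that joins them into one alternation and tests sample values with `fullmatch`, which must consume the whole string, so any leading, inner, or trailing space makes the value fail. An HTML5 `pattern` attribute likewise has to match the entire input value, so no explicit `^`/`$` anchors are needed there. The names `PATTERNS`, `COMBINED`, and `is_valid` are illustrative, and the `/.../` delimiters and the `\-` escape are dropped.

```python
import re

# First set of patterns from the answer, without the JS /.../ delimiters.
PATTERNS = [
    r"LB-[0-9]{1,3}",          # LB-0 .. LB-999
    r"[A-Z]{2}-1?[0-9]{1,4}",  # two capitals, 0 .. 19999 (leading zeros allowed)
    r"[A-Z]{3}-1?[0-9]{1,4}",  # three capitals, 0 .. 19999 (leading zeros allowed)
]
# A single alternation that accepts any of the three formats.
COMBINED = re.compile("|".join(f"(?:{p})" for p in PATTERNS))

def is_valid(code: str) -> bool:
    # fullmatch anchors both ends, so stray spaces anywhere cause a failure.
    return COMBINED.fullmatch(code) is not None

for sample in ["LB-999", "AB-19999", "ABC-0", "AB-20000", "ab-12", "AB- 12", "ABCD-1"]:
    print(sample, is_valid(sample))
```

"AB-20000" is rejected because `1?[0-9]{1,4}` allows a fifth digit only when the number starts with 1.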