Regular Expression to split a sentence into hyphenated words

I'm looking for a regular expression that will split a sentence into words, using both spaces and hyphens as the characters to split on. For example, "This is over-done" should return 4 words (this, is, over, done).
I have the RegEx to do these separately but can't get it to work together:
To split on spaces:
\b(\S)(\S*)\b
and to split on hyphens:
\b([^-])([^-]*)\b
I have tried various ways to put these together but can't get it working. Any help appreciated.

This should work:
\b([^-\s]+)\b
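As a quick check, here is a minimal Python sketch of that pattern. The variable names are illustrative and not from the question, and the second approach (splitting on the separators) is an alternative added for comparison, not part of the answer.

```python
import re

sentence = "This is over-done"

# The answer's pattern: runs of characters that are neither "-" nor whitespace.
words = re.findall(r"\b([^-\s]+)\b", sentence)
print(words)  # ['This', 'is', 'over', 'done']

# Alternative (an assumption, not from the answer): split on the separators instead.
words_by_split = [w for w in re.split(r"[\s-]+", sentence) if w]
print(words_by_split)  # ['This', 'is', 'over', 'done']
```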

What about:
(?:^|[\s-])?(\w+)(?:$|[\s-])?
Demo: http://rubular.com/r/WiRSwFPTXa

Related

Regular Expression Match (get multiple stuff in a group)

I have trouble working on this regular expression.
Here is the string, all on one line. I want to extract the keys in swatchColorList; specifically, I want the words Natural Burlap, Navy, Red.
What I have tried is '[(.*?)]' to get everything inside the brackets, but what I really want is to do it in one line. Is that possible, or do I need to do this in two steps?
Thanks
{"id":"1349306","categoryName":"Kids","imageSource":"7/optimized/8769127_fpx.tif","swatchColorList":[{"Natural Burlap":"8/optimized/8769128_fpx.tif"},{"Navy":"5/optimized/8748315_fpx.tif"},{"Red":"8/optimized/8748318_fpx.tif"}],"suppressColorSwatches":false,"primaryColor":"Natural Burlap","clickableSwatch":true,"selectedColorNameID":"Natural Burlap","moreColors":false,"suppressProductAttribute":false,"colorFamily":{"Natural Burlap":"Ivory/Cream"},"maxQuantity":6}

You can try this regex
(?<=[[,]\{\")[^"]+
If lookbehind is not supported (the pattern above uses a positive lookbehind), you can use
[[,]\{"([^"]+)
This saves the needed word in group 1.
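A minimal Python sketch of both patterns, assuming a shortened copy of the question's string (trimmed here for brevity). The opening bracket is written as `\[` inside the character class, which Python's re module prefers. Both patterns depend on this exact layout: any other array of objects in the string would be matched as well.

```python
import re

# Shortened version of the question's string (an assumption made for brevity).
text = ('{"id":"1349306","swatchColorList":[{"Natural Burlap":"8/optimized/8769128_fpx.tif"},'
        '{"Navy":"5/optimized/8748315_fpx.tif"},{"Red":"8/optimized/8748318_fpx.tif"}],'
        '"primaryColor":"Natural Burlap","colorFamily":{"Natural Burlap":"Ivory/Cream"}}')

# Capture-group form: a key counts only when its "{" directly follows "[" or ",".
colors = re.findall(r'[\[,]\{"([^"]+)', text)

# Lookbehind form: the match itself is the key, no group needed.
colors_lookbehind = re.findall(r'(?<=[\[,]\{")[^"]+', text)

print(colors)
print(colors_lookbehind)
```

The "colorFamily" key is skipped because its brace follows a colon rather than a bracket or comma.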

import json
# Raw JSON text (named raw rather than str, to avoid shadowing the built-in)
raw = '{"id":"1349306","categoryName":"Kids","imageSource":"7/optimized/8769127_fpx.tif","swatchColorList":[{"Natural Burlap":"8/optimized/8769128_fpx.tif"},{"Navy":"5/optimized/8748315_fpx.tif"},{"Red":"8/optimized/8748318_fpx.tif"}],"suppressColorSwatches":false,"primaryColor":"Natural Burlap","clickableSwatch":true,"selectedColorNameID":"Natural Burlap","moreColors":false,"suppressProductAttribute":false,"colorFamily":{"Natural Burlap":"Ivory/Cream"},"maxQuantity":6}'
obj = json.loads(raw)
words = []
# Each entry in swatchColorList is a one-key dict: {color name: image path}
for thing in obj["swatchColorList"]:
    for word in thing:
        words.append(word)
        print(word)
Output will be
Natural Burlap
Navy
Red
The words are also stored in the words list. I realize this is not a regex, but I want to discourage the use of regex on serialized object notations, since regular expressions are not intended for parsing strings with nested expressions.

Regular Expression to match string which doesn't contain substring

I have a comma separated list as shown below. The list is actually on one line, but I have split it up to demonstrate the syntax and that each single unit contains 5 elements. There is no comma at the end of the list
ro:2581,1309531682152,A,Place,Page,
me:2642,1310989368864,A,Place,Page,
uk:2556,1309267095061,A,Place,Page,
me:2642,1310989380238,D,Place,Page,
me:2642,1334659643627,D,Place,Page,
ro:3562,1378721526696,A,Place,Page,
uk:1319,1309337246675,D,Place,Page,
ro:2581,1379500694666,D,Place,Page,
uk:1319,1309337246675,A,Place,Page
What I am trying to do is remove any unit (full line) that does not begin with uk:. I.e., the results will be:
uk:2556,1309267095061,A,Place,Page,
uk:1319,1309337246675,D,Place,Page,
uk:1319,1309337246675,A,Place,Page
If the string were on separate lines as in my example, I could do this relatively easily, but because it is all on one line, I cannot get it to work. Can anyone point me in the right direction?
Thanks

This should work:
(uk:\d+,\d+,\w,\w+,\w+)
Demo
It looks for uk: and then it's pretty much comma-counting from there on.
EDIT:
Since OP has now clarified that what they're using can only remove strings:
,?[^u][^k]:\d+,\d+,\w,\w+,\w+
Demo 2
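This looks for an optional comma followed by two characters, the first of which is not u and the second of which is not k, then a colon (:), and then the rest of the regex is the same. Note that a prefix that merely starts with u or ends with k (such as us: or ak:) would not be removed either.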

I would suggest a simple regex like this:
(\buk:.+?,Page)(?:,|$)
and grab matched group #1
RegEx Demo
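For illustration, a minimal Python sketch of the "keep only the uk: units" approach, using a shortened list built from values in the question. It collects the matching units and joins them back with commas; the variable names are illustrative.

```python
import re

# A shortened single-line list in the question's format (values copied from it).
data = ("ro:2581,1309531682152,A,Place,Page,"
        "uk:2556,1309267095061,A,Place,Page,"
        "me:2642,1310989380238,D,Place,Page,"
        "uk:1319,1309337246675,A,Place,Page")

# Each unit is: prefix:number, timestamp, one-character flag, two words.
units = re.findall(r"uk:\d+,\d+,\w,\w+,\w+", data)

# Rejoin the kept units so the result has no trailing comma.
result = ",".join(units)
print(result)
```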

REGEX how to split on: * , and the phrase "D/ST"?

I've used regex before and am familiar with string.split but I can't figure out how to split on the delimiters: * and , and the phrase, "D/ST".
When I do string.split("[,*|D/ST]+") with a pipe, it just splits on the letter D.
Has anyone done something like this before?

Your previous regex didn't work because you're using a character class, which matches any single one of the characters listed inside it. Instead, you should probably use a group with alternation, where the alternatives are separated by vertical bars:
(\*|\,|D\/ST)
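The difference can be seen in a small Python sketch. The input string is hypothetical, since the question did not include one, and Python's re.split stands in for the question's string.split.

```python
import re

# Hypothetical input; the question did not include a sample string.
text = "Dan*Tom,BearsD/STJay"

# Character class: splits on any single one of , * | D / S T
by_class = re.split(r"[,*|D/ST]+", text)

# Alternation: splits on "*", ",", or the whole phrase "D/ST"
by_alternation = re.split(r"\*|,|D/ST", text)

print(by_class)        # ['', 'an', 'om', 'Bears', 'Jay']
print(by_alternation)  # ['Dan', 'Tom', 'Bears', 'Jay']
```

With the character class, the D in "Dan" and the T in "Tom" are treated as delimiters, which is the behavior the question describes.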

Do you have an example of the input string?
Try something like:
"(\,|\*|D\/ST)+"
There is no "OR" inside a character class; the | is treated as a literal character there.

Regular expression for a list of items separated by comma or by comma and a space

Hey,
I can't figure out how to write a regular expression for my website, I would like to let the user input a list of items (tags) separated by comma or by comma and a space, for example "apple, pie,applepie". Would it be possible to have such regexp?
Thanks!
EDIT:
I would like a regexp for javascript in order to check the input before the user submits a form.

What you're looking for is deceptively easy:
[^,]+
This will give you every comma-separated token, and will exclude empty tokens (if the user enters "a,,b" you will only get 'a' and 'b'), BUT it will break if they enter "a, ,b".
If you want to strip the spaces from either side properly (and exclude whitespace only elements), then it gets a tiny bit more complicated:
[^,\s][^\,]*[^,\s]*
However, as has been mentioned in some of the comments, why do you need a regex where a simple split and trim will do the trick?
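For comparison, a minimal sketch of the split-and-trim approach. It is written in Python here, although the question asks about JavaScript, and the variable names are illustrative.

```python
raw = "apple, pie,applepie"

# Split on commas, trim each piece, and drop pieces that are empty after trimming.
tags = [piece.strip() for piece in raw.split(",") if piece.strip()]
print(tags)  # ['apple', 'pie', 'applepie']

# Whitespace-only entries such as the middle one in "a, ,b" are dropped as well.
sparse = [piece.strip() for piece in "a, ,b".split(",") if piece.strip()]
print(sparse)  # ['a', 'b']
```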

Assuming the words in your list may be letters from a to z and you allow, but do not require, a space after the comma separators, your reg exp would be
[a-z]+(,\s*[a-z]+)*
This will match "ab" or "ab, de", but not "ab ,dc"

Here's a simpler solution:
console.log("test, , test".match(/[^,(?! )]+/g));
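It doesn't break on empty properties and drops the spaces before and after properties. Note that inside a character class the characters (?! ) are literals rather than a lookahead, so the pattern also splits on any space, parenthesis, ? or ! inside a property.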

This thread is almost 7 years old and was last active 5 months ago, but I wanted to achieve the same results as OP and after reading this thread, came across a nifty solution that seems to work well
.match(/[^,\s?]+/g)
Regarding the regular expression... I suppose a more accurate statement would be to say "match runs of characters that are neither a comma nor whitespace". Inside the brackets, the ? is a literal question mark rather than an "optional" marker.

I often work with comma-separated patterns, and for me this works:
((^|[,])pattern)+
where "pattern" is the single element regexp

This might work:
([^,]*)(, ?([^,]*))*

([^,]*)
This looks for the text between the commas in a given string, so the items can be separated. As for the whitespace: can't you just split on commas and then remove the whitespace?

I needed strict validation for comma-separated alphabetic input with no spaces. I ended up using this one, in case anyone needs it:
/^[a-z]+(,[a-z]+)*$/
Or, to support lower- and uppercase words:
/^[A-Za-z]+(?:,[A-Za-z]+)*$/
In case one needs to allow whitespace between words (the first pattern allows it on either side of the comma, the second only after the comma):
/^[A-Za-z]+(?:\s*,\s*[A-Za-z]+)*$/
/^[A-Za-z]+(?:,\s*[A-Za-z]+)*$/
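A minimal sketch of the same validation in Python, assuming the variant that allows whitespace on either side of the comma. The helper name is hypothetical, and re.fullmatch plays the role of the ^ and $ anchors.

```python
import re

# Letters only, separated by a comma with optional whitespace on either side.
TAG_LIST = re.compile(r"[A-Za-z]+(?:\s*,\s*[A-Za-z]+)*")

def is_valid_tag_list(value):
    """Return True when the whole input is a well-formed list (hypothetical helper)."""
    return TAG_LIST.fullmatch(value) is not None

print(is_valid_tag_list("apple, pie,applepie"))  # True
print(is_valid_tag_list("apple,,pie"))           # False
```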

You can try this, it worked for me:
/.+?[\|$]/g
or
/[^\|?]+/g
but replace '|' with the separator you need. Also, don't forget to escape it.

something like this should work: ((apple|pie|applepie),\s?)*

Regex multi word search

What do I use to search for multiple words in a string? I would like the logical operation to be AND so that all the words are in the string somewhere. I have a bunch of nonsense paragraphs and one plain English paragraph, and I'd like to narrow it down by specifying a couple common words like, "the" and "and", but would like it match all words I specify.

Regular expressions support a "lookaround" condition that lets you search for a term within a string and then forget the location of the result; starting at the beginning of the string for the next search term. This will allow searching a string for a group of words in any order.
The regular expression for this is:
^(?=.*\bword1\b)(?=.*\bword2\b)(?=.*\bword3\b)
Where \b is a word boundary and (?=...) is a positive lookahead, one kind of lookaround.
If you have a variable number of words you want to search for, you will need to build this regular expression string with a loop - just wrap each word in the lookaround syntax and append it to the expression.
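Building the expression in a loop could look like the following Python sketch. The helper name is hypothetical, and the IGNORECASE and DOTALL flags are assumptions added so that "The" matches "the" and so that the words may be on different lines.

```python
import re

def build_all_words_pattern(words):
    """Compile a pattern that matches only if every word occurs somewhere (hypothetical helper)."""
    # One lookahead per word; re.escape guards against regex metacharacters in a word.
    lookaheads = "".join(r"(?=.*\b" + re.escape(word) + r"\b)" for word in words)
    # IGNORECASE and DOTALL are assumptions, not part of the answer's pattern.
    return re.compile("^" + lookaheads, re.IGNORECASE | re.DOTALL)

pattern = build_all_words_pattern(["the", "and"])
print(bool(pattern.search("The cat and the dog")))  # True
print(bool(pattern.search("The cat sat")))          # False
```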

AND as concatenation
^(?=.*?\b(?:word1)\b)(?=.*?\b(?:word2)\b)(?=.*?\b(?:word3)\b)
OR as alternation
^(?=.*?\b(?:word1|word2|word3)\b)
^(?=.*?\b(?:word1)\b)|^(?=.*?\b(?:word2)\b)|^(?=.*?\b(?:word3)\b)

Maybe using a language recognition chart to recognize English would work. Some quick tests seem to work (this assumes paragraphs separated by newlines only).
The regexp matches if any one of its alternatives is found: \bword\b is a whole word delimited by boundaries, word\b is a word ending, and a bare word matches anywhere in the paragraph.
my @paragraphs = split(/\n/, $text);
for my $p (@paragraphs) {
    if ($p =~ m/\bthe\b|\band\b|\ban\b|\bin\b|\bon\b|\bthat\b|\bis\b|\bare\b|th|sh|ough|augh|ing\b|tion\b|ed\b|age\b|’s\b|’ve\b|n’t\b|’d\b/) {
        print "Probable english\n$p\n";
    }
}

Firstly I'm not certain what you're trying to return... the whole sentence? The words in between your two given words?
Something like:
\b(word1|word2)\b(\w+\b)*(word1|word2)\b(\w+\b)*\.
(where \b is the word boundary in your language)
would match a complete sentence that contained either of the two words, or both.
You'd probably need to make it case insensitive so that if it appears at the start of the sentence it will still match

Assuming PCRE (Perl regexes), I am not sure that you can do it at all easily. The AND operation is concatenation of regexes, but you want to be able to permute the order in which the words appear without having to formally generate the permutation. For N words, when N = 2, it is bearable; with N = 3, it is barely OK; with N > 3, it is unlikely to be acceptable. So, the simple iterative solution - N regexes, one for each word, and iterate ensuring each is satisfied - looks like the best choice to me.
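The iterative approach could be sketched as follows in Python. The helper name and the sample paragraphs are hypothetical, and case-insensitive matching is an assumption.

```python
import re

def contains_all_words(text, words):
    """True when every word appears in text as a whole word (hypothetical helper)."""
    return all(
        re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE)
        for word in words
    )

# Made-up paragraphs: only the second contains both "the" and "and".
paragraphs = ["Xq zvb the wkr", "The cat and the dog sat down", "and qqq zzz"]
english = [p for p in paragraphs if contains_all_words(p, ["the", "and"])]
print(english)  # ['The cat and the dog sat down']
```

One regex per word keeps each pattern trivial and avoids enumerating the orderings of the words.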
