RegEx: capture all the phrases with hyphen in them - regex

I have a very long list of words in which, after conversion from another format, some of the words are hyphenated. For example:
book, im-moral, law
intesti-nal, lung
flimflam*, fly-by-night*, illegal,
How can I capture all the phrases that contain a hyphen? For the example above, the result would be:
im-moral
intesti-nal
fly-by-night
RegEx flavor: regular expressions engine implemented in EditPad Pro 7

Please take a look at this plunker link. As anubhava mentioned, we can use the same regexp. I have also added a simple example to check it.
`
var str = 'book, im-moral,law,intesti-nal,lung, flimflam*, fly-by-night*, illegal';
var re = /([a-zA-Z]+(-[a-zA-Z]+)+)/gi;
var found = str.match(re);
alert(found)
`
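For comparison, the same idea can be sketched in Python. This is only an illustration that assumes the whole list is available as one string; the variable names are made up for the example. A non-capturing group is used so that `re.findall` returns the full phrase rather than only the last repeated group.

```python
import re

# Sample text taken from the question (three lines of comma-separated words)
text = "book, im-moral, law\nintesti-nal, lung\nflimflam*, fly-by-night*, illegal,"

# One or more letters, followed by one or more "-letters" parts.
# (?:...) is non-capturing, so findall returns the whole hyphenated phrase.
hyphenated = re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)+", text)

print(hyphenated)  # ['im-moral', 'intesti-nal', 'fly-by-night']
```

Words without a hyphen never satisfy the `(?:-[A-Za-z]+)+` part, so they are skipped, and the trailing `*` is left out because it is not a letter.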

Related

Regex for parsing a simple sentence words delimited by double quotes

I have an example sentence that looks like this:
""Music"",""EDM / Electronic"",""organizer: Tiny Toons""
I want to parse this sentence into the tokens:
["Music", "EDM / Electronic", "organizer: Tiny Toons"]
My regex-fu is quite limited, and I'm under some time pressure.
I was wondering if someone could help me construct a regex (compatible with Java 8, as I'm using Clojure to apply the regex) to parse out these tokens.
Thank you,
Jason.
Assuming the sentence is the entire string and that the tokens themselves contain no commas or double quotes, you could just use
"[^,\"]+"
If the above assumptions are not correct, please give examples of possible input strings and details of what characters can appear within the sections you want to match.
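A simple Java example of how to use the regex:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String sentence = "\"\"Music\"\",\"\"EDM / Electronic\"\",\"\"organizer: Tiny Toons\"\"";
// Each run of characters that is neither a comma nor a double quote is one token
Matcher matcher = Pattern.compile("[^,\"]+").matcher(sentence);
List<String> matches = new ArrayList<String>();
while (matcher.find()) {
    matches.add(matcher.group());
}
System.out.println(matches);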
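The same negated character class can be tried quickly outside Java. The Python sketch below is only an illustration under the same assumption as the answer (no commas or double quotes inside a token); the variable names are hypothetical.

```python
import re

# The sentence from the question, with its doubled double quotes
sentence = '""Music"",""EDM / Electronic"",""organizer: Tiny Toons""'

# Every maximal run of characters that are neither a comma nor a double quote
tokens = re.findall(r'[^,"]+', sentence)

print(tokens)  # ['Music', 'EDM / Electronic', 'organizer: Tiny Toons']
```

If a token could itself contain a comma, this pattern would split it in two, which is why the answer asks for more example inputs.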

Look for any character that surrounds one of any character including itself

I am trying to write a regex code to find all examples of any character that surrounds one of any character including itself in the string below:
b9fgh9f1;2w;111b2b35hw3w3ww55
So ‘b2b’ and ‘111’ would be valid, but ‘3ww5’ would not be.
Could someone please help me out here?
Thanks,
Nikhil
You can use this regex, which matches three characters where the first and third are the same (via a backreference) and the middle one can be anything:
(.).\1
Demo
Edit:
The regex above only gives you non-overlapping matches. Since you want all matches, including overlapping ones, you can use this positive-lookahead-based regex instead. It consumes only the first character and captures the next two in group 2 without consuming them, so to get your desired output you concatenate group 1 and group 2.
(.)(?=(.\1))
Demo with overlapping matches
Here is some Java code (I've never programmed in Ruby) demonstrating the regex; you can write the same logic in your favorite programming language.
String s = "b9fgh9f1;2w;111b2b35hw3w3ww55";
Pattern p = Pattern.compile("(.)(?=(.\\1))");
Matcher m = p.matcher(s);
while(m.find()) {
System.out.println(m.group(1) + m.group(2));
}
Prints all your intended matches,
111
b2b
w3w
3w3
w3w
Also, here is some Python code that may help if you know Python:
import re
s = 'b9fgh9f1;2w;111b2b35hw3w3ww55'
# Each result is a (group 1, group 2) tuple; joining them gives the 3-character match
for m in re.findall(r'(.)(?=(.\1))', s):
    print(m[0] + m[1])
Prints all your expected matches,
111
b2b
w3w
3w3
w3w

Building a Regex String - Any assistance provided

I'm very new to regex. I understand its purpose, but I'm still struggling to fully comprehend how to use it. I'm trying to build a regex to pull A8OP2B out of the following (or whatever gets dumped in that 5th group).
{"RfReceived":{"Sync":9480,"Low":310,"High":950,"Data":"A8OP2B","RfKey":"None"}}
The other items in the line above will change in character length, so I cannot simply take the 51st to the 56th characters. The value I want will always be the 5th group in quotation marks, though.
I've tried building various regex strings, but it's still mostly a foreign language to me and I still have much reading to do on it.
Could anyone provide me a working example with the above, so I can reverse engineer and understand better?
Thanks
Demo 1: Reference the JSON to a var, then use either dot or bracket notation.
Demo 2: Using RegEx is not recommended, but here's one in JavaScript:
/\b(\w{6})(?=","RfKey":)/g
Breakdown of the first match:
\b : word boundary, here between the non-word character " and the word character A (non-consuming)
( : begin capture
\w{6} : any word character, 6 times; this is the consuming part and matches A8OP2B
) : end capture
(?=","RfKey":) : lookahead for ","RfKey": (non-consuming)
Demo 1
var obj = {"RfReceived":{"Sync":9480,"Low":310,"High":950,"Data":"A8OP2B","RfKey":"None"}};
var dataDot = obj.RfReceived.Data;
var dataBracket = obj['RfReceived']['Data'];
console.log(dataDot);
console.log(dataBracket)
Demo 2
Note: the input string below contains 3 consecutive objects, so 3 matches are expected.
var rgx = /\b(\w{6})(?=","RfKey":)/g;
var str = `{"RfReceived":{"Sync":9480,"Low":310,"High":950,"Data":"A8OP2B","RfKey":"None"}},{"RfReceived":{"Sync":8080,"Low":102,"High":1200,"Data":"PFN07U","RfKey":"None"}},{"RfReceived":{"Sync":7580,"Low":471,"High":360,"Data":"XU89OM","RfKey":"None"}}`;
var res = str.match(rgx);
console.log(res);
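The two demos carry over to other languages as well. The Python sketch below is a rough equivalent, not the answerer's code: it parses the JSON with the standard `json` module (the Demo 1 approach) and, for comparison, uses a regex anchored on the `"Data"` key. That second pattern is an assumption of this sketch, chosen so that the value does not have to be exactly six characters long.

```python
import json
import re

raw = '{"RfReceived":{"Sync":9480,"Low":310,"High":950,"Data":"A8OP2B","RfKey":"None"}}'

# Preferred: parse the JSON and read the field by name
payload = json.loads(raw)
data_from_json = payload["RfReceived"]["Data"]

# Regex fallback (assumed pattern): capture whatever sits between the quotes after "Data":
match = re.search(r'"Data":"([^"]*)"', raw)
data_from_regex = match.group(1) if match else None

print(data_from_json)   # A8OP2B
print(data_from_regex)  # A8OP2B
```

Parsing the JSON keeps working if the keys are reordered or the whitespace changes, which is the main reason the regex route is discouraged here.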

Regular Expression Split

I have the string shown below. I have been trying to split it with a regular expression, and going through the forums I found ([^|]+), which matches everything except the pipe character. However, I want to break the string into two parts with regular expressions and have not been able to do so. One expression (xyz) would extract everything from GA up to the pipe character; the second (abc) would extract everything after the first pipe.
GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298
The first is ^[^|]+ and the second is [^|]+$.
The idea is to use your negated character class with anchors: ^ matches the start of the string and $ matches the end of the string.
These two patterns have no lookarounds and will work with almost any regex flavor.
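As a quick check, the two anchored patterns can be run in Python. This is a minimal sketch; the variable names are illustrative and not part of the original answer.

```python
import re

s = 'GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298'

# ^[^|]+ : from the start of the string up to (not including) the first pipe
first = re.search(r'^[^|]+', s).group()

# [^|]+$ : the run of non-pipe characters that ends at the end of the string
second = re.search(r'[^|]+$', s).group()

print(first)   # GA1.2.1127630839.1468526914
print(second)  # 3847EFF358ABEC90-01A39B0290BAC298
```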
Guessing at popular languages. :-)
Python:
'GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298'.split('|')
JavaScript:
'GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298'.split('|')
PHP:
explode('|', 'GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298')
Go:
strings.Split("GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298", "|")
Ruby:
'GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298'.split('|')
EDIT
After clarification, I get what you're asking. Fiddling with regex101.com, I found that those two expressions should give you what you want:
^.*(?=\|) gets the first part, and
(?<=\|).* gets the second.
When you click on the link, you can see it in action.
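For readers without the link, here is a small Python sketch of the two lookaround expressions. The second input with two pipes is a made-up example, added only to show how the greedy `.*` behaves when more than one pipe is present.

```python
import re

s = 'GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298'

# ^.*(?=\|) : everything from the start up to a pipe (the pipe itself is not consumed)
before = re.search(r'^.*(?=\|)', s).group()

# (?<=\|).* : everything after a pipe (the pipe itself is not consumed)
after = re.search(r'(?<=\|).*', s).group()

print(before)  # GA1.2.1127630839.1468526914
print(after)   # 3847EFF358ABEC90-01A39B0290BAC298

# Hypothetical input with two pipes: greedy .* runs to the LAST pipe in the
# first pattern, while the second pattern starts right after the FIRST pipe.
multi = 'a|b|c'
before_multi = re.search(r'^.*(?=\|)', multi).group()
after_multi = re.search(r'(?<=\|).*', multi).group()
print(before_multi)  # a|b
print(after_multi)   # b|c
```

With exactly one pipe, as in the question, both expressions give the two halves of the string.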
PREVIOUS ANSWER
There are many alternatives to regular expressions, as @smarx's answer reveals.
But something along those lines should do it:
R
myString <- 'GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298'
part1 <- sub(pattern = "(.*)\\|(.*)", x = myString, replacement = "\\1")
part2 <- sub(pattern = "(.*)\\|(.*)", x = myString, replacement = "\\2")
R requires doubling all backslashes, some other languages don't.
Python
import re
myString = 'GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298'
# Raw strings avoid having to double the backslashes
part1 = re.sub(pattern=r"(.*)\|(.*)", repl=r"\1", string=myString)
part2 = re.sub(pattern=r"(.*)\|(.*)", repl=r"\2", string=myString)

Javascript RegExp find with condition but without showing them

I'm trying to find the words between the brackets.
var str = "asdfasdfkjh {{word1}} asdf fff fffff {{word2}} asdfasdf";
var pattern = /{{\w*}}/g;
str.match(pattern); // ["{{word1}}","{{word2}}"]
This almost works, but the results include the braces, and I don't want them.
Sure, I could remove them by calling the native replace on the results, but I want the regexp itself to do that.
I've also tried:
var pattern = /(?:{{)(\w*)(?:}})/g
but I can't find the right one. Could you help me?
Edit: I should add that the words are dynamic.
Solution:
Based on Tim Piezcker's answer, I came up with this solution:
var arr = [],
    re = /{{(\w*)}}/g,
    item;
// exec() with the g flag returns the next match on each call, then null
while ((item = re.exec(str)) !== null) {
    arr.push(item[1]);
}
In most regex flavors, you could use lookaround assertions:
(?<={{)\w*(?=}})
Unfortunately, JavaScript did not support lookbehind assertions when this answer was written (they were added in ECMAScript 2018), so they are not an option on older engines.
But the regex you proposed can be used by accessing the first capturing group:
var pattern = /{{(\w*)}}/g;
var match = pattern.exec(subject);
if (match != null) {
result = match[1];
}
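To compare the two approaches side by side, here is a Python sketch (Python supports fixed-width lookbehind, so both variants run there). It is only an illustration of the patterns discussed above, with the braces escaped for clarity; the variable names are hypothetical.

```python
import re

s = "asdfasdfkjh {{word1}} asdf fff fffff {{word2}} asdfasdf"

# Capturing-group variant: findall returns only the contents of group 1
with_group = re.findall(r"\{\{(\w*)\}\}", s)

# Lookaround variant: the braces are asserted but are not part of the match
with_lookaround = re.findall(r"(?<=\{\{)\w*(?=\}\})", s)

print(with_group)       # ['word1', 'word2']
print(with_lookaround)  # ['word1', 'word2']
```

Both variants return the words without the braces; the capturing-group form is the more portable of the two because it needs no lookbehind support.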
A quick and dirty solution would be /[^{]+(?=\}\})/, but it will cause a bit of a mess if the leading braces are omitted, and will also match {word1}}. If I remember correctly, JavaScript does not support look-behind, which is a bit of a shame in this case.