I'm trying to do my homework but regex is new for me and I'm not sure why my code doesn't work. That's what I have to do:
Write a program that replaces in a HTML document given as string all the tags <a href=…>…</a> with corresponding tags [URL href=…]…[/URL]. Read an input, until you receive “end” command. Print the result on the console.
I wrote:
Pattern pattern = Pattern.compile("<a href=\"(.)+\">(.)+<\\/a>");
input = input.replaceAll(matcher.toString(), "href=" + matcher.group(1) + "]" + matcher.group(2) + "[/URL]");
And it throws Exception in thread "main" java.lang.IllegalStateException:
No match found for this input: href="http://softuni.bg">SoftUni</a>
Your + quantifier needs to be inside the parentheses:
<a href=\"(.+)\">(.+)<\\/a>
You were heading in the right direction, but you can't use a Pattern object like that.
First, change your code to call replaceAll() directly with strings, and use normal back references ($n) in the replacement string.
Your code thus converted is:
input = input.replaceAll("<a href=(\".+\")>(.)+<\\/a>", "href=$1]$2[/URL]");
Next, fix the expressions:
input = input.replaceAll("<a href=(\".+\")>(.+)</a>", "[URL href=$1]$2[/URL]");
The changes were to put the + inside the capturing group, i.e. (.)+ -> (.+), and to capture the double quotes as well, since you have to put them back if I interpret the "spec" correctly. With (.)+ the group is overwritten on every repetition, so it ends up holding only the last character.
Also note that you don't need to escape a forward slash. Forward slashes are just plain old characters in all regex flavors. Although some languages use forward slashes to delimit regular expressions, Java isn't one of them.
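For reference, here is a minimal sketch of the whole exercise built around that replaceAll() call. The class and method names (ReplaceATag, replaceLinks) are made up for illustration, and it assumes at most one link per input line, because the greedy .+ would otherwise run from the first link to the last one on the same line.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

class ReplaceATag {
    // Same expression as in the answer above, compiled once.
    // Group 1 is the quoted URL, group 2 is the link text.
    private static final Pattern LINK = Pattern.compile("<a href=(\".+\")>(.+)</a>");

    static String replaceLinks(String html) {
        // $1 and $2 in the replacement refer to the captured groups.
        return LINK.matcher(html).replaceAll("[URL href=$1]$2[/URL]");
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        // Read lines until the "end" command (or end of input).
        while (in.hasNextLine()) {
            String line = in.nextLine();
            if (line.equals("end")) {
                break;
            }
            System.out.println(replaceLinks(line));
        }
    }
}
```

If several links can share a line, a lazy quantifier (.+?) in both groups is one possible adjustment.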
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
namespace _06.Replace_a_Tag
{
class Program
{
static void Main(string[] args)
{
string text = Console.ReadLine();
while (text != "end")
{
string pattern = @"<a.*?href.*?=(.*)>(.*?)<\/a>";
// is used to take only 2 groups :
// first group (or group one) is used for the domain name
// for example : "https://stackoverflow.com"
// and the second is for if you want to enter some text
// (or no text)
// for example : This is some text
string replace = @"[URL href=$1]$2[/URL]";
// we use $ char and a number (like placeholders)
// for example : $1 means take whatever you find from group 1
// and : $2 means take whatever you find from group 2
string replaced = Regex.Replace(text, pattern , replace);
// In a specific input string (text), replaces all strings
// that match a specified regular expression (pattern ) with
// a specified replacement string (replace)
Console.WriteLine(replaced);
text = Console.ReadLine();
}
}
}
}
// input : <ul><li><a href=""></a></li></ul>
// output: <ul><li>[URL href=""][/URL]</li></ul>
Hello, I am trying to get the values surrounded by curly braces, "{value}". Am I using this regular expression, [^{}]+(?=}), correctly?
let url = "/{id}/{name}/{age}";
let params = url.match('[^{\}]+(?=})');
if(params != null){
params.forEach(param => {
console.log(param);
});
}
// expected output
id
name
age
//actual output
id
By default the search stops after the first match.
You should use a regex literal delimited by slashes (like: /regex/), not a string delimited by quotes (like: 'string').
And then you should add the /g flag to the end of it, which means "global". This lets it search through the whole string to find all matches.
let url = "/{id}/{name}/{age}";
let params = url.match(/[^{\}]+(?=})/g);
// ^ do a global search
if(params != null){
params.forEach(param => {
console.log(param);
});
}
From MDN: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
The "g" after the regular expression is an option or flag that performs a global search, looking in the whole string and returning all matches.
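The same "first match versus all matches" distinction shows up in other regex APIs. Purely as a comparison (it is not part of the JavaScript fix), here is a Java sketch: Java has no g flag, so you call find() in a loop instead. BraceParams and extract are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BraceParams {
    // Same idea as the JavaScript pattern: a run of non-brace characters
    // that is immediately followed by a closing brace.
    private static final Pattern PARAM = Pattern.compile("[^{}]+(?=\\})");

    static List<String> extract(String url) {
        Matcher m = PARAM.matcher(url);
        List<String> names = new ArrayList<>();
        // Each find() call continues after the previous match,
        // which is what the g flag does for match() in JavaScript.
        while (m.find()) {
            names.add(m.group());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(extract("/{id}/{name}/{age}")); // prints [id, name, age]
    }
}
```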
I'm trying to implement the escape character functionality in a macro generator I'm writing in Dart. For example, I would like the program to grab all the occurrences of '&param' in my string and replace it with 'John', unless the '&' character is preceded with the escape character '\'. Example: "My name is &param and my parameter is called \&param." -> "My name is John and my parameter is called &param". What would be the regular expression to catch all the substrings that contain the '&', then my parameter's name, and without the preceding '\'?
It's possible to match that, even avoiding escapes of backslashes, as:
var re = RegExp(r"(?<!(?:^|[^\\])(?:\\{2})*\\)&\w+");
This uses negative lookbehind to find a & followed by word-characters, and not preceded by an odd number of backslashes.
More likely, you want to also recognize double-backslashes and convert them to single-backslashes. That's actually easier if you try to find all matches, because then you know all preceding double-backslashes are part of an earlier match:
var re = RegExp(r"\\\\|(?<!\\)&\w+");
This, when used as re.allMatches will find all occurrences of \\ and &word where the latter is not preceded by an odd number of backslashes.
var _re = RegExp(r"\\\\|(?<!\\)&(\w+)");
String template(String input, Map<String, String> values) {
return input.replaceAllMapped(_re, (m) {
var match = m[0]!;
if (match == r"\\") return r"\";
var replacement = values[m[1]!];
if (replacement != null) return replacement;
// do nothing for undefined words.
return match;
});
}
(You might also want to allow something like &{foo} if parameters can occur next to other characters, like &{amount}USD).
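The "find all matches in one pass" idea can also be sketched without a lookbehind. The following Java version is a variant, not the expression from this answer: it assumes the only escapes are \\ and \&, and it consumes each escape as a unit, so an & that belongs to an escape is never seen as the start of a parameter. MacroTemplate and expand are hypothetical names, and the StringBuilder overloads need Java 9 or later.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MacroTemplate {
    // Alternative 1: a backslash followed by '\' or '&' (group 1 = the escaped character).
    // Alternative 2: '&' followed by a parameter name (group 2 = the name).
    private static final Pattern TOKEN = Pattern.compile("\\\\([\\\\&])|&(\\w+)");

    static String expand(String input, Map<String, String> values) {
        Matcher m = TOKEN.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement;
            if (m.group(1) != null) {
                // An escape sequence: keep only the escaped character.
                replacement = m.group(1);
            } else {
                // A parameter: substitute it, or leave it untouched if undefined.
                replacement = values.getOrDefault(m.group(2), m.group());
            }
            // quoteReplacement keeps '$' and '\' in the value from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "My name is &param and my parameter is called \\&param.";
        System.out.println(expand(text, Map.of("param", "John")));
    }
}
```

With the sample sentence from the question this should print "My name is John and my parameter is called &param."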
To keep the character before &param when it is a non-backslash character, you need so-called capturing groups. These are subexpressions of a regular expression inside parentheses. To use capturing groups in Dart you need the method replaceAllMapped. There is also the case where the template starts with &param; there we match at the beginning of the string instead.
Try this:
void main() {
  final template = 'My name is &param and my parameter is called \\&param.';
  // Replace every &param that is at the start or follows a non-backslash character.
  final populatedTemplate = template.replaceAllMapped(RegExp(r'(^|[^\\])&param\b'), (match) {
    return '${match.group(1)}John';
  });
  // Drop the escape character from the occurrences that were skipped above.
  final result = populatedTemplate.replaceAll(RegExp(r'\\&param\b'), '&param');
  print(result);
}
I want to find a whole bracketed string by searching for one word inside the square brackets.
Sample string = [This is my first Question];
I want to search it for [This ..
My expected output is the whole string inside the brackets if the string includes [This.
Can anyone help with this using regex? Any help would be appreciated.
My code is below:
string mainStr = 'wrapper s = [This is my first Question]';
Pattern pattr = Pattern.compile('\[This[^]]+\]');
Matcher mat = pattr .matcher(mainStr );
system.debug('mat is -----'+mat.matches());
system.debug('m is -----'+mat.find());
string n = null;
if(mat.matches()) { n = mat.group(); system.debug('m is ----'+n); }
If you use the regex (\[(\w+)[^]]+\])
you get
in $1 the whole bracketed string (brackets included) and
in $2 the first word inside it.
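A small Java sketch to check the group numbering. BracketSearch and groups are hypothetical names, and the ] inside the character class is escaped here to be on the safe side. Note that it calls find(), which searches inside the input, rather than matches(), which only succeeds when the entire input matches the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketSearch {
    // Group 1: the whole bracketed string; group 2: the first word inside it.
    private static final Pattern BRACKETED = Pattern.compile("(\\[(\\w+)[^\\]]+\\])");

    // Returns {group 1, group 2}, or null when nothing bracketed is found.
    static String[] groups(String input) {
        Matcher m = BRACKETED.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] g = groups("wrapper s = [This is my first Question]");
        System.out.println(g[0]); // prints [This is my first Question]
        System.out.println(g[1]); // prints This
    }
}
```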
I'm looking for a regex that searches a file and does NOT return (i.e. it excludes) strings containing a char that repeats 3 or more times consecutively. I've tried the expression below, but it's NOT doing the job :( I need something that looks forward and backward and excludes strings that have 3 or more repeating back-to-back chars, i.e. it should return abcdefg, but not 3333ahg or gagjjjjagy or hdajgjga111
(?!(.)\1{3})
Try the following regex, which matches a string containing 3 or more repeating back-to-back characters:
(.)\1{2,}
Then invert the match. Most tools and languages have an option for that.
For example, with grep and its -v option:
$ cat file
abcdefg
gagjagyyy
3333ahg
$ grep -v -E '(.)\1{2,}' file
abcdefg
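Where there is no ready-made invert option, you can negate the result yourself. A rough Java sketch of that idea (NoTripleRepeats and keep are hypothetical names):

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class NoTripleRepeats {
    // Matches any character followed by at least two more copies of itself.
    private static final Pattern TRIPLE = Pattern.compile("(.)\\1{2,}");

    static List<String> keep(List<String> lines) {
        // Keep only the lines in which the pattern is NOT found (like grep -v).
        return lines.stream()
                .filter(line -> !TRIPLE.matcher(line).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(keep(List.of("abcdefg", "gagjagyyy", "3333ahg"))); // prints [abcdefg]
    }
}
```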
If you are using C#, you may try this:
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
class Program
{
const string
isMatch = "IsMatch",
pattern = @"(?:(?<Open>\w*?(\w)\1{{2,}}\w*)|(?<{0}>\w*))";
static void Main(string[] args)
{
var input = File.ReadAllText("input.txt");
var regex = String.Format(pattern, isMatch);
var matches = Regex.Matches(input, regex)
.Cast<Match>()
.Select<Match, Group>(m => m.Groups[isMatch])
.Where(g => g.Value != string.Empty)
.ToList();
matches.ForEach(m => Console.WriteLine(m.Value));
}
}
Try this:
^(?!.*(.)\1\1.*$).+$ - matches whole string as one word
(?=\b|^)(?!\w*(\w)\1\1\w*)\w+(?:\b|$) - matches one word
Example: http://rubular.com/r/dkIHkDo67g
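To try the whole-string expression outside Rubular, here is a small Java sketch (NoTripleLookahead and accepts are hypothetical names). The negative lookahead rejects the string as soon as any character is followed by two more copies of itself.

```java
import java.util.regex.Pattern;

class NoTripleLookahead {
    // The whole-string expression from above, written as a Java string literal.
    private static final Pattern ACCEPTED = Pattern.compile("^(?!.*(.)\\1\\1.*$).+$");

    static boolean accepts(String s) {
        return ACCEPTED.matcher(s).matches();
    }

    public static void main(String[] args) {
        // The sample strings come from the question.
        for (String s : new String[] { "abcdefg", "3333ahg", "gagjjjjagy", "hdajgjga111" }) {
            System.out.println(s + " -> " + accepts(s));
        }
    }
}
```

Only abcdefg should be reported as true.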
I have a string like '[1]-[2]-[3],[4]-[5],[6,7,8],[9]' or '[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]'. I'd like a pattern that produces the list results below, but I can't figure it out. Basically the comma is the separator, but [6,7,8] itself contains commas as well.
the string: [1]-[2]-[3],[4]-[5],[6,7,8],[9]
the result:
[1]-[2]-[3]
[4]-[5]
[6,7,8]
[9]
or
the string: [Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]
the result:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
,(?=\[)
This pattern splits on any comma that is followed by a bracket, but keeps the bracket within the result text.
The (?=*stuff*) is known as a "lookahead assertion". It acts as a condition for the match but is not itself part of the match.
In C# code:
String inputstring = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
foreach(String s in Regex.Split(inputstring, @",(?=\[)"))
System.Console.Out.WriteLine(s);
In Java code:
String inputstring = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
Pattern p = Pattern.compile(",(?=\\[)");
for(String s : p.split(inputstring))
System.out.println(s);
Either produces:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
Although I believe the best approach here is to use split (as presented by @j__m's answer), here's an approach that uses matching rather than splitting.
Regex:
(\[.*?\](?!-))
Example usage:
String input = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
Pattern p = Pattern.compile("(\\[.*?\\](?!-))");
Matcher m = p.matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
Resulting output:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
An answer that doesn't use regular expressions (if that's worth something in ease of understanding what's going on) is:
substitute "]#[" for "],["
split on "#"
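In Java those two steps could look roughly like this. BracketListSplit and splitGroups are hypothetical names, and the sketch assumes the "#" marker never occurs in the data itself.

```java
class BracketListSplit {
    static String[] splitGroups(String input) {
        // Step 1: mark the separating commas (the ones between "]" and "[") with "#".
        // Step 2: split on that marker. Commas inside brackets are left alone.
        return input.replace("],[", "]#[").split("#");
    }

    public static void main(String[] args) {
        for (String part : splitGroups("[1]-[2]-[3],[4]-[5],[6,7,8],[9]")) {
            System.out.println(part);
        }
    }
}
```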