Replace a tag with regex

I'm trying to do my homework, but regex is new to me and I'm not sure why my code doesn't work. This is what I have to do:
Write a program that replaces in a HTML document given as string all the tags <a href=…>…</a> with corresponding tags [URL href=…]…[/URL]. Read an input, until you receive “end” command. Print the result on the console.
I wrote:
Pattern pattern = Pattern.compile("<a href=\"(.)+\">(.)+<\\/a>");
input = input.replaceAll(matcher.toString(), "href=" + matcher.group(1) + "]" + matcher.group(2) + "[/URL]");
And it throws Exception in thread "main" java.lang.IllegalStateException:
No match found for this input: href="http://softuni.bg">SoftUni</a>

Your + quantifier needs to be inside the parentheses:
<a href=\"(.+)\">(.+)<\\/a>

You were heading in the right direction, but you can't use a Pattern object like that.
First, change your code to use replaceAll() directly with strings, and use normal back references $n in the replacement string.
Your code thus converted is:
input = input.replaceAll("<a href=(\".+\")>(.)+<\\/a>", "href=$1]$2[/URL]");
Next, fix the expressions:
input = input.replaceAll("<a href=(\".+\")>(.+)</a>", "[URL href=$1]$2[/URL]");
The changes were to put the + inside the capturing group, i.e. (.)+ -> (.+), and also to capture the double quotes, since you have to put them back if I interpret the "spec" correctly. With (.)+ the group is repeated, so it only keeps the last character it matched.
Also note that you don't need to escape a forward slash. Forward slashes are just plain old characters in all regex flavors. Although some languages use forward slashes to delimit regular expressions, Java isn't one of them.
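For completeness, here is a small self-contained sketch of the replaceAll() approach. It is not the code from the answer above: the class and method names are made up for illustration, and it swaps the greedy .+ for [^"]* and a lazy .*?, on the assumption that a line may contain more than one anchor (a greedy .+ would then run from the first href to the last </a>).

```java
import java.util.regex.Pattern;

public class ReplaceTag {
    // Group 1: the quoted href value (quotes included). Group 2: the link text.
    // [^"]* and .*? keep each match inside a single <a ...>...</a> pair.
    private static final Pattern ANCHOR =
            Pattern.compile("<a href=(\"[^\"]*\")>(.*?)</a>");

    // Hypothetical helper: rewrites every anchor in one line of HTML.
    static String replaceAnchors(String html) {
        // $1 and $2 refer to the capturing groups; [ ] and / are literal here.
        return ANCHOR.matcher(html).replaceAll("[URL href=$1]$2[/URL]");
    }

    public static void main(String[] args) {
        String[] lines = {
            "<ul><li><a href=\"http://softuni.bg\">SoftUni</a></li></ul>",
            "<a href=\"a\">A</a> and <a href=\"b\">B</a>",
            "end"
        };
        for (String line : lines) {
            if (line.equals("end")) {
                break; // the task stops reading at the "end" command
            }
            System.out.println(replaceAnchors(line));
        }
    }
}
```

In a real solution the lines would come from standard input (for example via a Scanner) instead of the hard-coded array.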

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
namespace _06.Replace_a_Tag
{
class Program
{
static void Main(string[] args)
{
string text = Console.ReadLine();
while (text != "end")
{
string pattern = @"<a.*?href.*?=(.*)>(.*?)<\/a>";
// is used to take only 2 groups :
// first group (or group one) is used for the domain name
// for example : "https://stackoverflow.com"
// and the second is for if you want to enter some text
// (or no text)
// for example : This is some text
string replace = @"[URL href=$1]$2[/URL]";
// we use $ char and a number (like placeholders)
// for example : $1 means take whatever you find from group 1
// and : $2 means take whatever you find from group 2
string replaced = Regex.Replace(text, pattern , replace);
// In a specific input string (text), replaces all strings
// that match a specified regular expression (pattern ) with
// a specified replacement string (replace)
Console.WriteLine(replaced);
text = Console.ReadLine();
}
}
}
}
// input : <ul><li><a href=""></a></li></ul>
// output: <ul><li>[URL href=""][/URL]</li></ul>

Related

Am I using this [^{\}]+(?=}) regular expression correctly on TypeScript

Hello, I am trying to get the values surrounded by curly braces, like "{value}". Am I using this regular expression [^{}]+(?=}) correctly?
let url = "/{id}/{name}/{age}";
let params = url.match('[^{\}]+(?=})');
if(params != null){
params.forEach(param => {
console.log(param);
});
}
// expected output
id
name
age
//actual output
id
By default the search stops after the first match.
You should use a regex literal delimited by slashes (like: /regex/), not a string delimited by quotes (like: 'string').
And then you should add the /g flag to the end of it, which means "global". This lets it search through the whole string to find all matches.
let url = "/{id}/{name}/{age}";
let params = url.match(/[^{\}]+(?=})/g);
// ^ do a global search
if(params != null){
params.forEach(param => {
console.log(param);
});
}
From MDN: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
The "g" after the regular expression is an option or flag that performs a global search, looking in the whole string and returning all matches.

Regular expression with a specific character and without another

I'm trying to implement the escape character functionality in a macro generator I'm writing in Dart. For example, I would like the program to grab all the occurrences of '&param' in my string and replace it with 'John', unless the '&' character is preceded with the escape character '\'. Example: "My name is &param and my parameter is called \&param." -> "My name is John and my parameter is called &param". What would be the regular expression to catch all the substrings that contain the '&', then my parameter's name, and without the preceding '\'?
It's possible to match that, even avoiding escapes of backslashes, as:
var re = RegExp(r"(?<!(?:^|[^\\])(?:\\{2})*\\)&\w+");
This uses negative lookbehind to find a & followed by word-characters, and not preceded by an odd number of backslashes.
More likely, you want to also recognize double-backslashes and convert them to single-backslashes. That's actually easier if you try to find all matches, because then you know all preceding double-backslashes are part of an earlier match:
var re = RegExp(r"\\\\|(?<!\\)&\w+");
This, when used as re.allMatches will find all occurrences of \\ and &word where the latter is not preceded by an odd number of backslashes.
var _re = RegExp(r"\\\\|(?<!\\)&(\w+)");
String template(String input, Map<String, String> values) {
return input.replaceAllMapped(_re, (m) {
var match = m[0]!;
if (match == r"\\") return r"\";
var replacement = values[m[1]!];
if (replacement != null) return replacement;
// do nothing for undefined words.
return match;
});
}
(You might also want to allow something like &{foo} if parameters can occur next to other characters, like &{amount}USD).
To keep the character before &param when it matches a non-backslash character, you need to use so-called capturing groups. These are subexpressions of a regular expression inside parentheses. To use capturing groups in Dart you need to use the method replaceAllMapped. There is also the case where the template starts with &param; in that case we match at the beginning of the string instead.
Try this:
void main() {
final template = 'My name is &param and my parameter is called \\&param.';
final populatedTemplate = template.replaceAllMapped(RegExp(r'(^|[^\\])&param\b'), (match) {
return '${match.group(1)}John';
});
// An escaped \&param is not a parameter: drop the backslash and keep the literal text.
final result = populatedTemplate.replaceAll(RegExp(r'\\&param\b'), '&param');
print(result);
}

Find out whole string inside square brackets and match one word inside square brackets using regex

I want to match the whole string inside square brackets when it contains a particular word.
Sample string = [This is my first Question];
I want to search it for [This ..
My expected output should be the whole string inside the brackets if the string includes [This.
Can anyone help with this using regex? Any help would be appreciated.
My code is below :
string mainStr = 'wrapper s = [This is my first Question]';
Pattern pattr = Pattern.compile('\[This[^]]+\]');
Matcher mat = pattr .matcher(mainStr );
system.debug('mat is -----'+mat.matches());
system.debug('m is -----'+mat.find());
string n = null;
if(mat.matches()) { n = mat.group(); system.debug('m is ----'+n); }
If you use the regex (\[(\w+)[^]]+\])
you get
in $1 the whole bracketed string (including the brackets) and
in $2 the first word inside it.
Below you see a demo used inside a text editor (PhpStorm):
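The question's code appears to be Salesforce Apex, whose Pattern and Matcher classes follow Java's closely, so the same idea can be sketched in Java. This is only an illustration under that assumption; the class and helper names are invented. It also shows why the question's matches() call returns false: matches() requires the whole input to match, while find() looks for the pattern anywhere in the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketMatch {
    // Group 1: the whole bracketed text. Group 2: the first word inside it.
    // The ] inside the character class is escaped to keep Java's parser happy.
    private static final Pattern BRACKETED =
            Pattern.compile("(\\[(\\w+)[^\\]]+\\])");

    // Hypothetical helper: returns {group 1, group 2} for the first
    // bracketed section, or null when there is none.
    static String[] firstBracketed(String input) {
        Matcher m = BRACKETED.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String mainStr = "wrapper s = [This is my first Question]";

        // matches() must consume the entire string, which starts with "wrapper".
        System.out.println(BRACKETED.matcher(mainStr).matches()); // false

        String[] groups = firstBracketed(mainStr);
        System.out.println(groups[0]); // [This is my first Question]
        System.out.println(groups[1]); // This
    }
}
```

To require the specific word, the (\w+) part could be replaced with the literal This, as in the question's own pattern.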

Regex that excludes characters that repeat three (3) or more times

I'm looking for a regex that searches a file and does NOT return (i.e. it excludes) strings in which a char repeats 3 or more times consecutively. I've tried the expression below, but it's NOT doing the job :( I need something that looks forward and backward and excludes strings that have 3 or more repeating back-to-back chars, i.e. it should return abcdefg, but not 3333ahg or gagjjjjagy or hdajgjga111
(?!(.)\1{3})
Try using the following regex to match strings containing 3 or more repeating back-to-back characters
(.)\1{2,}
And then invert the match using a flag or option. Most languages and tools support this.
For example, with grep
$ cat file
abcdefg
gagjagyyy
3333ahg
$ grep -v -E '(.)\1{2,}' file
abcdefg
If you are using C#, you may try this:
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
class Program
{
const string
isMatch = "IsMatch",
pattern = @"(?:(?<Open>\w*?(\w)\1{{2,}}\w*)|(?<{0}>\w*))";
static void Main(string[] args)
{
var input = File.ReadAllText("input.txt");
var regex = String.Format(pattern, isMatch);
var matches = Regex.Matches(input, regex)
.Cast<Match>()
.Select<Match, Group>(m => m.Groups[isMatch])
.Where(g => g.Value != string.Empty)
.ToList();
matches.ForEach(m => Console.WriteLine(m.Value));
}
}
Try this:
^(?!.*(.)\1\1.*$).+$ - matches whole string as one word
(?=\b|^)(?!\w*(\w)\1\1\w*)\w+(?:\b|$) - matches one word
Example: http://rubular.com/r/dkIHkDo67g
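As a rough Java sketch of the two ideas in this thread (find the repeat and invert the result, or reject it up front with a negative lookahead), something like the following could work. The class and method names are invented, and it assumes the input has already been split into one string per line.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class NoTripleRepeats {
    // Matches any character followed by itself at least two more times.
    private static final Pattern TRIPLE = Pattern.compile("(.)\\1{2,}");

    // Whole-line variant: the negative lookahead rejects lines with a triple.
    private static final Pattern NO_TRIPLE = Pattern.compile("^(?!.*(.)\\1\\1).+$");

    // Approach 1: search for the repeat and invert the result (like grep -v).
    static List<String> keepWithoutTriples(List<String> lines) {
        return lines.stream()
                .filter(line -> !TRIPLE.matcher(line).find())
                .collect(Collectors.toList());
    }

    // Approach 2: let the lookahead do the exclusion.
    static boolean hasNoTriple(String line) {
        return NO_TRIPLE.matcher(line).matches();
    }

    public static void main(String[] args) {
        List<String> lines =
                Arrays.asList("abcdefg", "gagjjjjagy", "3333ahg", "hdajgjga111");
        System.out.println(keepWithoutTriples(lines)); // [abcdefg]
        System.out.println(hasNoTriple("abcdefg"));    // true
        System.out.println(hasNoTriple("3333ahg"));    // false
    }
}
```

The inverted search is usually the easier of the two to read and debug.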

Using RegEx split the string

I have a string like '[1]-[2]-[3],[4]-[5],[6,7,8],[9]' or '[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]', I'd like the Pattern to get the list result, but don't know how to figure out the pattern. Basically the comma is the split, but [6,7,8] itself contains the comma as well.
the string: [1]-[2]-[3],[4]-[5],[6,7,8],[9]
the result:
[1]-[2]-[3]
[4]-[5]
[6,7,8]
[9]
or
the string: [Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]
the result:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
,(?=\[)
This pattern splits on any comma that is followed by a bracket, but keeps the bracket within the result text.
The (?=*stuff*) is known as a "lookahead assertion". It acts as a condition for the match but is not itself part of the match.
In C# code:
String inputstring = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
foreach(String s in Regex.Split(inputstring, @",(?=\[)"))
System.Console.Out.WriteLine(s);
In Java code:
String inputstring = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
Pattern p = Pattern.compile(",(?=\\[)");
for(String s : p.split(inputstring))
System.out.println(s);
Either produces:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
Although I believe the best approach here is to use split (as presented by @j__m's answer), here's an approach that uses matching rather than splitting.
Regex:
(\[.*?\](?!-))
Example usage:
String input = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
Pattern p = Pattern.compile("(\\[.*?\\](?!-))");
Matcher m = p.matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
Resulting output:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
An answer that doesn't use regular expressions (if that's worth something in ease of understanding what's going on) is:
substitute "]#[" for "],["
split on "#"
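The two steps above could look roughly like this in Java. It is a sketch with invented names, and it assumes the marker character # never occurs in the data; if it can, pick a different marker.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitWithMarker {
    // Assumption: '#' does not appear anywhere in the input.
    private static final String MARKER = "#";

    // Hypothetical helper implementing the two steps: substitute, then split.
    static String[] splitGroups(String input) {
        // Step 1: literal replacement of "],[" with "]#[" (String.replace is not regex-based).
        String marked = input.replace("],[", "]" + MARKER + "[");
        // Step 2: split on the marker. String.split expects a regex, so the
        // marker is quoted in case it is later changed to a special character.
        return marked.split(Pattern.quote(MARKER));
    }

    public static void main(String[] args) {
        String input = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
        for (String part : splitGroups(input)) {
            System.out.println(part);
        }
        System.out.println(Arrays.toString(splitGroups("[1]-[2]-[3],[4]-[5],[6,7,8],[9]")));
    }
}
```

Commas inside a bracket pair, as in [6,7,8], are left alone because they are never directly between ] and [.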