Replace a single character within a match via Regex - regex

I have a simple pattern that I am trying to do a find and replace on. It needs to replace all dashes with periods when they are surrounded by numbers.
Replace the dash in these:
3-54
32-11
111-4523mhz
Like So:
3.54
32.11
111.4523mhz
However, I do not want to replace the dash inside anything like these:
Example-One
A-Test
I have tried using the following:
preg_replace('/[0-9](-)[0-9]/', '.', $string);
However, this will replace the entire match instead of just the middle. How do you only replace a portion of the match?

preg_replace('/([0-9])-([0-9])/', '$1.$2', $string);
Should do the trick :)
Edit: some more explanation:
By using ( and ) in a regular expression, you create a group. That group can be used in the replacement. $1 gets replaced with the first matched group, $2 gets replaced with the second matched group, and so on.
That means that if you would (just for example) change '$1.$2' to '$2.$1', the operation would swap the two numbers.
That isn't useful behavior in your case, but perhaps it helps you understand the principle better.
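For anyone doing this in Java rather than PHP, here is a minimal sketch of the same group-and-back-reference idea. The class and method names are made up for illustration; String.replaceAll accepts the same $1 and $2 syntax in its replacement string.

```java
public class DashToDot {
    // Replaces a dash with a period only when a digit sits on both sides.
    // Groups 1 and 2 capture the neighbouring digits so that $1 and $2
    // can put them back around the period.
    static String dashToDot(String s) {
        return s.replaceAll("([0-9])-([0-9])", "$1.$2");
    }

    public static void main(String[] args) {
        String[] samples = {"3-54", "32-11", "111-4523mhz", "Example-One", "A-Test"};
        for (String s : samples) {
            System.out.println(s + " -> " + dashToDot(s));
        }
    }
}
```

The last two samples come back unchanged because their dashes are not surrounded by digits.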

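Depending on the regex implementation you're using, you can use lookaround assertions (a lookbehind and a lookahead), which check the neighbouring digits without making them part of the match: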
preg_replace('/(?<=[0-9])-(?=[0-9])/', '.', $string);
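A Java sketch of the lookaround version, next to a capturing-group version for comparison (class and method names are hypothetical). Because the lookarounds do not consume the digits, they also handle chained dashes such as 1-2-3, where the group version skips the second dash: its digit was already consumed by the first match.

```java
public class DashToDotLookaround {
    // Lookbehind and lookahead only test for digits; the match is the dash alone.
    static String withLookarounds(String s) {
        return s.replaceAll("(?<=[0-9])-(?=[0-9])", ".");
    }

    // Capturing groups consume both digits and restore them in the replacement.
    static String withGroups(String s) {
        return s.replaceAll("([0-9])-([0-9])", "$1.$2");
    }

    public static void main(String[] args) {
        System.out.println(withLookarounds("111-4523mhz")); // 111.4523mhz
        System.out.println(withLookarounds("1-2-3"));       // 1.2.3
        System.out.println(withGroups("1-2-3"));            // 1.2-3
    }
}
```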

You can use back-references to retain the parts of the match you want to keep:
preg_replace('/([0-9])-([0-9])/', '$1.$2', $string);

Below are a couple of ways of achieving the same result in Java.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
 * Created by Rituraj_Jain on 7/14/2017.
 */
public class Main {
    private static final Pattern pattern = Pattern.compile("(\\S)-(\\S)");
    private static final String input = "Test-TestBad-ManRituRaj-Jain";
    // Option 1: back-references in the replacement string.
    public static void main(String[] args) {
        Matcher matcher = pattern.matcher(input);
        String s = matcher.replaceAll("$1 $2");
        System.out.println(s);
        System.out.println(input.replaceAll(pattern.pattern(), "$1 $2"));
    }
    // Option 2: build the result match by match with appendReplacement.
    public static void main1(String[] args) {
        Matcher matcher = pattern.matcher(input);
        StringBuffer finalString = new StringBuffer();
        while (matcher.find()) {
            String replace = matcher.group().replace("-", " ");
            matcher.appendReplacement(finalString, replace);
        }
        matcher.appendTail(finalString);
        System.out.println(input);
        System.out.println(finalString.toString());
    }
}

Related

java regex pattern.compile Vs matcher

I'm trying to find whether a word contains consecutive identical characters, using java.util.regex.Pattern. When I test a regex with a Matcher, it returns true. But if I only use it like this:
System.out.println("test:" + scanner.hasNext(Pattern.compile("(a-z)\\1")));
it returns false.
public static void test2() {
    String[] strings = { "Dauresselam", "slab", "fuss", "boolean", "clap", "tellme" };
    String regex = "([a-z])\\1";
    Pattern pattern = Pattern.compile(regex);
    for (String string : strings) {
        Matcher matcher = pattern.matcher(string);
        if (matcher.find()) {
            System.out.println(string);
        }
    }
}
This returns true. Which one is correct?
The pattern ([a-z])\\1 uses a capturing group to match a single lowercase character which is then followed by a backreference to what is captured in group 1.
If you have Dauresselam, for example, it would match the first s in the capturing group and then match the second s with the backreference. So if you want to match consecutive identical characters, you could use that pattern.
The pattern (a-z)\\1 uses a capturing group to match a-z literally and then uses a backreference to what is captured in group 1. So that would match a-za-z
It depends on what you want. Here you use parentheses:
Pattern.compile("(a-z)\\1").
Here you use square brackets inside parentheses:
String regex = "([a-z])\\1";
To compare, you should obviously use the same pattern.
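To make the difference concrete, here is a small sketch (the class and field names are made up) that runs both patterns. It also shows that find() looks for a match anywhere in the input while matches() requires the whole input to match. As far as I can tell, Scanner.hasNext(Pattern) tests the complete next token, so it behaves more like matches() than like find().

```java
import java.util.regex.Pattern;

public class DoubledLetters {
    // A character class inside the group: any lowercase letter, then the same letter again.
    static final Pattern CHAR_CLASS = Pattern.compile("([a-z])\\1");
    // No square brackets: the group matches the literal text "a-z".
    static final Pattern LITERAL = Pattern.compile("(a-z)\\1");

    static boolean containsDoubledLetter(String word) {
        return CHAR_CLASS.matcher(word).find();
    }

    public static void main(String[] args) {
        String[] words = {"Dauresselam", "slab", "fuss", "boolean", "clap", "tellme"};
        for (String word : words) {
            System.out.println(word + ": " + containsDoubledLetter(word));
        }
        System.out.println(LITERAL.matcher("fuss").find());       // false
        System.out.println(LITERAL.matcher("a-za-z").matches());  // true
        System.out.println(CHAR_CLASS.matcher("fuss").matches()); // false, "fuss" is more than "ss"
    }
}
```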

Split the string at the particular occurrence of special character (+) using regex in Java

I want to split the following string around +, but I couldn't succeed in getting the correct regex for this.
String input = "SOP3a'+bEOP3'+SOP3b'+aEOP3'";
I want to have a result like this
[SOP3a'+bEOP3', SOP3b'+aEOP3']
In some cases I may have the following string
c+SOP2SOP3a'+bEOP3'+SOP3b'+aEOP3'EOP2
which should be split as
[c, SOP2SOP3a'+bEOP3'+SOP3b'+aEOP3'EOP2]
I have tried the following regex but it doesn't work.
input.split("(SOP[0-9](.*)EOP[0-9])*\\+((SOP)[0-9](.*)(EOP)[0-9])*");
Any help or suggestions are appreciated.
Thanks
You can use the following regex to match the string; its two capturing groups then give you the expected parts:
(?m)(.*?)\+(SOP.*?$)
Following is the code in Java that would work for you:
public static void main(String[] args) {
    String input = "SOP3a'+bEOP3'+SOP3b'+aEOP3'";
    String pattern = "(?m)(.*?)\\+(SOP.*?$)";
    Pattern regex = Pattern.compile(pattern);
    Matcher m = regex.matcher(input);
    if (m.find()) {
        System.out.println("Found value: " + m.group(0));
        System.out.println("Found value: " + m.group(1));
        System.out.println("Found value: " + m.group(2));
    } else {
        System.out.println("NO MATCH");
    }
}
m.group(1) and m.group(2) hold the values that you are looking for.
Do you really need to use split method?
And what are the rules? They are unclear to me.
Anyway, starting from the regex you provided, I only removed some unnecessary groups and got what you are looking for. Instead of splitting, I just collected the matched groups, since splitting would generate some empty elements.
const str = "SOP1a+bEOP1+SOP2SOP3a'+bEOP3'+SOP3b'+aEOP3'EOP2";
const regex = RegExp(/(SOP[0-9].*EOP[0-9])*\+(SOP[0-9].*EOP[0-9])*/)
const matches = str.match(regex);
console.log('Matches ', matches);
console.log([matches[1],matches[2]]);
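Since the exact rules are unclear, here is a non-regex sketch in Java under one assumed rule: SOP and EOP act like opening and closing brackets, and a + splits the string only when it is outside every SOP...EOP pair. That rule is my reading of the two examples in the question, not something the question states, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class TopLevelPlusSplit {
    // Assumed rule: split on '+' only at nesting depth 0, where "SOP" opens
    // a level and "EOP" closes one.
    static List<String> splitTopLevel(String input) {
        List<String> parts = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < input.length(); i++) {
            if (input.startsWith("SOP", i)) {
                depth++;
            } else if (input.startsWith("EOP", i)) {
                depth--;
            } else if (input.charAt(i) == '+' && depth == 0) {
                parts.add(input.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(input.substring(start)); // whatever follows the last split point
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitTopLevel("SOP3a'+bEOP3'+SOP3b'+aEOP3'"));
        System.out.println(splitTopLevel("c+SOP2SOP3a'+bEOP3'+SOP3b'+aEOP3'EOP2"));
    }
}
```

For the two inputs from the question this prints [SOP3a'+bEOP3', SOP3b'+aEOP3'] and [c, SOP2SOP3a'+bEOP3'+SOP3b'+aEOP3'EOP2].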

replace a tag with regex

I'm trying to do my homework, but regex is new to me and I'm not sure why my code doesn't work. This is what I have to do:
Write a program that replaces in a HTML document given as string all the tags <a href=…>…</a> with corresponding tags [URL href=…]…[/URL]. Read an input, until you receive “end” command. Print the result on the console.
I wrote:
Pattern pattern = Pattern.compile("<a href=\"(.)+\">(.)+<\\/a>");
input = input.replaceAll(matcher.toString(), "href=" + matcher.group(1) + "]" + matcher.group(2) + "[/URL]");
And it throws Exception in thread "main" java.lang.IllegalStateException:
No match found for this input: href="http://softuni.bg">SoftUni</a>
Your + quantifier needs to be inside the parentheses:
<a href=\"(.+)\">(.+)<\\/a>
You were heading in the right direction, but you can't use a Pattern object like that.
First, change your code to use replaceAll() directly with strings, and use normal back-references $n in the replacement string.
Your code thus converted is:
input = input.replaceAll("<a href=(\".+\")>(.)+<\\/a>", "href=$1]$2[/URL]");
Next, fix the expressions:
input = input.replaceAll("<a href=(\".+\")>(.+)</a>", "[URL href=$1]$2[/URL]");
The changes were to put the + inside the capturing group, i.e. (.)+ -> (.+), and also to capture the double quotes, since you have to put them back if I interpret the "spec" correctly.
Also note that you don't need to escape a forward slash. Forward slashes are just plain old characters in all regex flavors. Although some languages use forward slashes to delimit regular expressions, Java isn't one of them.
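Put together as a runnable sketch (the class and method names are made up). One deliberate deviation from the line above: it uses lazy quantifiers (.+?) instead of greedy ones, on the assumption that a single input line may contain more than one link.

```java
public class ReplaceTag {
    // Group 1 captures the quoted href value, group 2 the link text.
    // Lazy quantifiers stop each group at the first possible end, so two
    // links on the same line are replaced separately.
    static String replaceAnchors(String html) {
        return html.replaceAll("<a href=(\".+?\")>(.+?)</a>", "[URL href=$1]$2[/URL]");
    }

    public static void main(String[] args) {
        System.out.println(replaceAnchors("<a href=\"http://softuni.bg\">SoftUni</a>"));
        System.out.println(replaceAnchors("<a href=\"x\">A</a> and <a href=\"y\">B</a>"));
    }
}
```

Reading lines until "end" is left out here; the method would be called once per input line.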
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
namespace _06.Replace_a_Tag
{
    class Program
    {
        static void Main(string[] args)
        {
            string text = Console.ReadLine();
            while (text != "end")
            {
                string pattern = @"<a.*?href.*?=(.*)>(.*?)<\/a>";
                // is used to take only 2 groups :
                // first group (or group one) is used for the domain name
                // for example : "https://stackoverflow.com"
                // and the second is for if you want to enter some text
                // (or no text)
                // for example : This is some text
                string replace = @"[URL href=$1]$2[/URL]";
                // we use $ char and a number (like placeholders)
                // for example : $1 means take whatever you find from group 1
                // and : $2 means take whatever you find from group 2
                string replaced = Regex.Replace(text, pattern, replace);
                // In a specific input string (text), replaces all strings
                // that match a specified regular expression (pattern) with
                // a specified replacement string (replace)
                Console.WriteLine(replaced);
                text = Console.ReadLine();
            }
        }
    }
}
// input : <ul><li><a href=""></a></li></ul>
// output: <ul><li>[URL href=""][/URL]</li></ul>

c# regex split or replace. here's my code i did

I am trying to replace a certain group with "" by using regex.
I have been searching and doing my best, but it's over my head.
What I want to do is:
string text = "(12je)apple(/)(jj92)banana(/)cat";
string resultIwant = {apple, banana, cat};
The first pair of parentheses must contain 4 characters, which can include digits,
and '(/)' comes at the end to close it.
Here's my code. (I was using the Matches function.)
string text = @"(12dj)apple(/)(88j1)banana(/)cat";
string pattern = @"\(.{4}\)(?<value>.+?)\(/\)";
Regex rex = new Regex(pattern);
MatchCollection mc = rex.Matches(text);
if (mc.Count > 0)
{
    foreach (Match str in mc)
    {
        print(str.Groups["value"].Value.ToString());
    }
}
However, the result was
apple
banana
So I think I should use replace or something else instead of Matches.
The regex below captures the word characters that come right after a ):
(?<=\))(\w+)
Your C# code would be:
{
    string str = "(12je)apple(/)(jj92)banana(/)cat";
    Regex rgx = new Regex(@"(?<=\))(\w+)");
    foreach (Match m in rgx.Matches(str))
        Console.WriteLine(m.Groups[1].Value);
}
Explanation:
(?<=\)) A positive lookbehind. It places the match position right after a ) symbol.
() A capturing group.
\w+ Captures all the word characters that follow. It stops before the next ( symbol because that isn't a word character.
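The same lookbehind works unchanged in other flavors. As a rough sketch, here is a Java port (not the C# from the question; the class and method names are made up) that collects every run of word characters directly preceded by a ):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordsAfterParen {
    // The lookbehind requires a ')' immediately before the match;
    // group 1 then captures the word characters that follow it.
    private static final Pattern AFTER_PAREN = Pattern.compile("(?<=\\))(\\w+)");

    static List<String> wordsAfterParen(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = AFTER_PAREN.matcher(text);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [apple, banana, cat]
        System.out.println(wordsAfterParen("(12je)apple(/)(jj92)banana(/)cat"));
    }
}
```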

Using RegEx split the string

I have a string like '[1]-[2]-[3],[4]-[5],[6,7,8],[9]' or '[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]'. I'd like a pattern that produces the list results below, but I can't figure it out. Basically the comma is the separator, but [6,7,8] itself contains commas as well.
the string: [1]-[2]-[3],[4]-[5],[6,7,8],[9]
the result:
[1]-[2]-[3]
[4]-[5]
[6,7,8]
[9]
or
the string: [Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]
the result:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
,(?=\[)
This pattern splits on any comma that is followed by a bracket, but keeps the bracket within the result text.
The (?=*stuff*) is known as a "lookahead assertion". It acts as a condition for the match but is not itself part of the match.
In C# code:
String inputstring = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
foreach (String s in Regex.Split(inputstring, @",(?=\[)"))
    System.Console.Out.WriteLine(s);
In Java code:
String inputstring = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
Pattern p = Pattern.compile(",(?=\\[)");
for (String s : p.split(inputstring))
    System.out.println(s);
Either produces:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
Although I believe the best approach here is to use split (as presented in @j__m's answer), here's an approach that uses matching rather than splitting.
Regex:
(\[.*?\](?!-))
Example usage:
String input = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
Pattern p = Pattern.compile("(\\[.*?\\](?!-))");
Matcher m = p.matcher(input);
while (m.find()) {
    System.out.println(m.group(1));
}
Resulting output:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
An answer that doesn't use regular expressions (if that's worth something in ease of understanding what's going on) is:
substitute "]#[" for "],["
split on "#"
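A minimal Java sketch of those two steps (class and method names are made up). It assumes the marker character never occurs in the input itself; if it can, pick a different marker.

```java
public class MarkerSplit {
    // Step 1: turn every "],[" boundary into "]#[" so that only the
    //         separating commas are replaced by the marker.
    // Step 2: split on the marker. Commas inside brackets are untouched.
    static String[] splitOnOuterCommas(String input) {
        return input.replace("],[", "]#[").split("#");
    }

    public static void main(String[] args) {
        for (String part : splitOnOuterCommas("[1]-[2]-[3],[4]-[5],[6,7,8],[9]")) {
            System.out.println(part);
        }
    }
}
```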