Match a space between " " - regex

I want to match text that contains:
MyValue="{NON_SPACEs}{SPACE_ONE_OR_MORE}{NON_SPACEs}"
pattern:
MyValue="(\S*)(\s+)(\S*)"
Example of text:
sometext MyValue="val1 val2" sometext="xyz"
The problem with my pattern is that it also matches:
sometext MyValue="val1val2" sometext="xyz" (no space between val1 and val2)
I use this for tests: http://regexpal.com/

Restrict your non-space chars to also be non-quotes:
MyValue="([^\s"]*)(\s+)([^\s"]*)"
This regex won't try to span multiple quoted values.
Consider removing some or all of those brackets, especially around the spaces, unless you need to capture a group.
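To check the behaviour described above, here is a minimal Java sketch (the thread does not name a language, so Java and the class and method names are assumptions made for illustration). It applies the suggested pattern to both sample lines from the question.

```java
import java.util.regex.Pattern;

public class QuotedSpaceDemo {
    // Pattern from the answer: neither side may contain whitespace or a double quote.
    static final Pattern MY_VALUE =
            Pattern.compile("MyValue=\"([^\\s\"]*)(\\s+)([^\\s\"]*)\"");

    // True when the text contains MyValue="..." with whitespace inside the quotes.
    static boolean hasSpacedValue(String text) {
        return MY_VALUE.matcher(text).find();
    }

    public static void main(String[] args) {
        // Space between val1 and val2: expected to match.
        System.out.println(hasSpacedValue("sometext MyValue=\"val1 val2\" sometext=\"xyz\""));
        // No space inside the quotes: the match can no longer run on into the next attribute.
        System.out.println(hasSpacedValue("sometext MyValue=\"val1val2\" sometext=\"xyz\""));
    }
}
```

The first call should print true and the second false, because [^\s"] cannot step over the closing quote the way \S could.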

This is what you are looking for:
using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string txt = "abc xyz";
            string re1 = ".*?";     // Non-greedy match on filler
            string re2 = "(\\s+)";  // White Space 1
            Regex r = new Regex(re1 + re2, RegexOptions.IgnoreCase | RegexOptions.Singleline);
            Match m = r.Match(txt);
            if (m.Success)
            {
                String ws1 = m.Groups[1].ToString();
                Console.Write("(" + ws1.ToString() + ")" + "\n");
            }
            Console.ReadKey();
        }
    }
}
Hope it Helps :)

Related

Regular Expression to match last word when string starts with pattern

I'm trying to create a regex to match the last word of a string, but only if the string starts with a certain pattern.
For example, I want to get the last word of a string only if the string starts with "The cat".
"The cat eats butter" -> would match "butter".
"The cat drinks milk"-> would match "milk"
"The dog eats beef" -> would find no match.
I know the following will give me the last word:
\s+\S*$
I also know that I can use a positive look behind to make sure a string starts with a certain pattern:
(?<=The cat )
But I can't figure out how to combine them.
I'll be using this in C#, and I know I could combine this with some string comparison operators, but I'd like this all to be in one regex, as this is one of several regex pattern strings that I'll be looping through.
Any ideas?
Use the following regex:
^The cat.*?\s+(\S+)$
Details:
^ - Start of the string.
The cat - The "starting" pattern.
.*? - A sequence of arbitrary chars, reluctant version.
\s+ - A sequence of "white" chars.
(\S+) - A capturing group - sequence of "non-white" chars,
this is what you want to capture.
$ - End of the string.
So the last word will be in the first capturing group.
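The breakdown above can be tried out with a small sketch. It is written in Java rather than C# (the pattern uses no flavour-specific syntax), and the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastWordDemo {
    // ^The cat  .*?  \s+  (\S+)  $   as described in the details above.
    static final Pattern LAST_WORD = Pattern.compile("^The cat.*?\\s+(\\S+)$");

    // Returns the last word, or null when the string does not start with "The cat".
    static String lastWord(String s) {
        Matcher m = LAST_WORD.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(lastWord("The cat eats butter")); // butter
        System.out.println(lastWord("The cat drinks milk")); // milk
        System.out.println(lastWord("The dog eats beef"));   // null
    }
}
```

Returning null for a non-matching string is just one possible convention; a caller looping over several patterns could equally test m.find() directly.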
What about this one?
^The\scat.*\s(\w+)$
My regex knowledge is quite rusty, but couldn't you simply put the phrase you are looking for in front of \s+\S*$, since you know that returns the last word?
Something like this (plain letters and spaces need no escaping, and escapes such as \t, \a and \e would mean something else entirely; the .* skips whatever sits in between):
^The cat.*\s+\S*$
Without Regex
No need for regex. Just use C#'s StartsWith and Split(' ') together with LINQ's Last().
using System;
using System.Linq;

class Example {
    static void Main() {
        string[] strings = {
            "The cat eats butter",
            "The cat drinks milk",
            "The dog eats beef"
        };
        foreach (string s in strings) {
            // Only strings with the required prefix are considered.
            if (s.StartsWith("The cat")) {
                Console.WriteLine(s.Split(' ').Last());
            }
        }
    }
}
Result:
butter
milk
With Regex
If you prefer, however, a regex solution, you may use the following.
using System;
using System.Text.RegularExpressions;

class Example {
    static void Main() {
        string[] strings = {
            "The cat eats butter",
            "The cat drinks milk",
            "The dog eats beef"
        };
        // The lookbehind requires "The cat" at the start; only the last word is matched.
        Regex regex = new Regex(@"(?<=^The cat.*)\b\w+$");
        foreach (string s in strings) {
            Match m = regex.Match(s);
            if (m.Success) {
                Console.WriteLine(m.Value);
            }
        }
    }
}
Result:
butter
milk

Regular expression to match all digits of unknown length except the last 4 digits

There is a number with unknown length and the idea is to build a regular expression which matches all digits except last 4 digits.
I have tried a lot to achieve this but no luck yet.
Currently I have this regex: "^(\d*)\d{0}\d{0}\d{0}\d{0}.*$"
Input: 123456789089775
Expected output: XXXXXXXXXXX9775
I am using the regex as follows (and this doesn't work):
String accountNumber ="123456789089775";
String pattern = "^(\\d*)\\d{1}\\d{1}\\d{1}\\d{1}.*$";
String result = accountNumber.replaceAll(pattern, "X");
Please suggest how I should approach this problem or give me the solution.
In this case my whole point is to negate the regex : "\d{4}$"
You may use
\G\d(?=\d{4,}$)
Details
\G - start of string or end of the previous match
\d - a digit
(?=\d{4,}$) - a positive lookahead that requires 4 or more digits up to the end of the string immediately to the right of the current location.
Java demo:
String accountNumber ="123456789089775";
String pattern = "\\G\\d(?=\\d{4,}$)"; // Or \\G.(?=.{4,}$)
String result = accountNumber.replaceAll(pattern, "X");
System.out.println(result); // => XXXXXXXXXXX9775
I'm still not allowed to comment as I don't have that "50 rep" yet, but DDeMartini's answer would also swallow account numbers with a non-numeric prefix, since "^(.*)" matches things like abcdef1234 as well. Stick to your \d syntax:
"^(\\d+)(\\d{4}$)"
This seems to work fine and demands digits only (minimum length 5 characters). I tested it like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccountNumberPadder {
    private static final Pattern LAST_FOUR_DIGITS = Pattern.compile("^(\\d+)(\\d{4})$");

    public static void main(String[] args) {
        String[] accountNumbers = new String[] { "123456789089775", "999775", "1234567890897" };
        for (String accountNumber : accountNumbers) {
            Matcher m = LAST_FOUR_DIGITS.matcher(accountNumber);
            if (m.find()) {
                System.out.println(paddIt(m));
            } else {
                throw new RuntimeException(String.format("Whooaaa - don't work for %s", accountNumber));
            }
        }
    }

    // Replaces every digit in group 1 with an X and keeps the last four digits (group 2).
    public static String paddIt(Matcher m) {
        StringBuilder b = new StringBuilder();
        for (int i = 0; i < m.group(1).length(); i++) {
            b.append("X");
        }
        // Rebuild from the groups: calling String.replace on the whole input would
        // also mask a trailing block that happens to equal group 1 (e.g. "12341234").
        return b.append(m.group(2)).toString();
    }
}
Try:
String pattern = "^(.*)[0-9]{4}$";
Addendum after comment: A refactor to only match full numerics could look like this:
String pattern = "^([0-9]+)[0-9]{4}$";

replace a tag with regex

I'm trying to do my homework but regex is new for me and I'm not sure why my code doesn't work. That's what I have to do:
Write a program that replaces in a HTML document given as string all the tags <a href=…>…</a> with corresponding tags [URL href=…]…[/URL]. Read an input, until you receive “end” command. Print the result on the console.
I wrote:
Pattern pattern = Pattern.compile("<a href=\"(.)+\">(.)+<\\/a>");
input = input.replaceAll(matcher.toString(), "href=" + matcher.group(1) + "]" + matcher.group(2) + "[/URL]");
And it throws Exception in thread "main" java.lang.IllegalStateException:
No match found for this input: href="http://softuni.bg">SoftUni</a>
Your + quantifier needs to be inside the parentheses:
<a href=\"(.+)\">(.+)<\\/a>
You were heading in the right direction, but you can't use a Pattern object like that.
First, change your code to use replaceAll() with plain strings directly, and use normal back references $n in the replacement string.
Your code thus converted is:
input = input.replaceAll("<a href=(\".+\")>(.)+<\\/a>", "href=$1]$2[/URL]");
Next, fix the expressions:
input = input.replaceAll("<a href=(\".+\")>(.+)</a>", "[URL href=$1]$2[/URL]");
The changes were to put the + inside the capturing group, i.e. (.)+ -> (.+), and to capture the double quotes as well, since you have to put them back if I interpret the "spec" correctly.
Also note that you don't need to escape a forward slash. Forward slashes are just plain old characters in all regex flavors. Although some languages use forward slashes to delimit regular expressions, Java isn't one of them.
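For a complete, runnable version of that replaceAll() call, here is a hedged sketch. It deliberately swaps the greedy .+ for negated character classes, which is my variation and not part of the answer above: it assumes the href value contains no double quote and the link text contains no '<', so that several links on one line are converted separately. The class and method names are hypothetical.

```java
public class ReplaceTagDemo {
    // Assumption: href values contain no '"' and link text contains no '<'.
    // Group 1 keeps the quotes around the URL, group 2 is the link text.
    static String replaceTags(String html) {
        return html.replaceAll("<a href=(\"[^\"]*\")>([^<]*)</a>", "[URL href=$1]$2[/URL]");
    }

    public static void main(String[] args) {
        String line = "<li><a href=\"http://softuni.bg\">SoftUni</a></li>"
                + "<li><a href=\"http://example.com\">Example</a></li>";
        System.out.println(replaceTags(line));
    }
}
```

With the greedy (".+") from the answer, the two links on this line would be captured as one match; whether that matters depends on the input the homework checker feeds in.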
using System;
using System.Text.RegularExpressions;

namespace _06.Replace_a_Tag
{
    class Program
    {
        static void Main(string[] args)
        {
            string text = Console.ReadLine();
            while (text != "end")
            {
                // The pattern has two capturing groups:
                // group 1 is the href value, quotes included,
                //   for example: "https://stackoverflow.com"
                // group 2 is the link text (possibly empty),
                //   for example: This is some text
                string pattern = @"<a.*?href.*?=(.*)>(.*?)<\/a>";
                // $1 and $2 are placeholders for whatever groups 1 and 2 matched.
                string replace = @"[URL href=$1]$2[/URL]";
                // Replace every match of the pattern in the input line.
                string replaced = Regex.Replace(text, pattern, replace);
                Console.WriteLine(replaced);
                text = Console.ReadLine();
            }
        }
    }
}
// input : <ul><li><a href=""></a></li></ul>
// output: <ul><li>[URL href=""][/URL]</li></ul>

c# regex split or replace. here's my code i did

I am trying to replace a certain group to "" by using regex.
I was searching and doing my best, but it's over my head.
What I want to do is,
string text = "(12je)apple(/)(jj92)banana(/)cat";
string resultIwant = {apple, banana, cat};
Inside the first pair of parentheses there are always 4 characters, which may include digits,
and '(/)' closes the item.
Here's my code. (I was using matches function)
string text = @"(12dj)apple(/)(88j1)banana(/)cat";
string pattern = @"\(.{4}\)(?<value>.+?)\(/\)";
Regex rex = new Regex(pattern);
MatchCollection mc = rex.Matches(text);
if (mc.Count > 0)
{
    foreach (Match str in mc)
    {
        print(str.Groups["value"].Value.ToString());
    }
}
However, the result was
apple
banana
So I think I should use replace or something else instead of Matches.
The regex below captures the word characters that come right after a ):
(?<=\))(\w+)
Your C# code would be:
{
    string str = "(12je)apple(/)(jj92)banana(/)cat";
    Regex rgx = new Regex(@"(?<=\))(\w+)");
    foreach (Match m in rgx.Matches(str))
        Console.WriteLine(m.Groups[1].Value);
}
Explanation:
(?<=\)) A positive lookbehind. It places the match position right after a ) symbol.
() A capturing group.
\w+ Matches all the following word characters. It stops before the next ( symbol because that isn't a word character.
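The same lookbehind can be exercised outside C#. The sketch below uses Java (java.util.regex supports this fixed-length lookbehind too); the class and method names are invented for illustration, and the capturing group is dropped because the whole match is already the word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AfterParenDemo {
    // Word characters that start right after a closing parenthesis.
    static final Pattern AFTER_PAREN = Pattern.compile("(?<=\\))\\w+");

    static List<String> wordsAfterParen(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = AFTER_PAREN.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(wordsAfterParen("(12je)apple(/)(jj92)banana(/)cat")); // [apple, banana, cat]
    }
}
```

Note that the 4-character tags such as 12je are skipped because they follow a ( and not a ), and that the final item is picked up even though no (/) comes after it.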

Words starting with '#' doesn't match if I place in word boundary \b

I am using Regex in c# console app.
I have words starting with '#' in my string and I am using Regex to match them, but that doesn't seem to work.
Here's my code
public static void regularExpression()
{
    string[] sentences =
    {
        "#TODAY is 18 Dec",
        "#TODAY_CAL is 18 Dex",
        "#YESTERDAY was 17 dec",
        "#YESTERDAY_CAL was 17 Dec"
    };
    string sPattern = @"\b#TODAY\b";
    foreach (string s in sentences)
    {
        System.Console.Write("{0,24}", s);
        if (System.Text.RegularExpressions.Regex.IsMatch(s, sPattern, System.Text.RegularExpressions.RegexOptions.IgnoreCase))
        {
            System.Console.WriteLine(" (match for '{0}' found)", sPattern);
        }
        else
        {
            System.Console.WriteLine();
        }
    }
}
#TODAY doesn't match. If I replace the sPattern with
string sPattern = @"#TODAY";
it works. But in that case it matches even #TODAY_CAL, which is exactly what I am trying to avoid.
I want the exact word to match.
Any suggestions???
Try this:
string sPattern = @"#TODAY\b";
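Why this works: \b only matches between a word character and a non-word character. '#' is a non-word character, so at the start of the string (or after a space) there is no boundary in front of it, and the leading \b can never succeed there. A small Java sketch of the three cases follows; the helper name is hypothetical, and the last pattern with (?<!\w) is an optional stricter variant of my own, not part of the answer, for when '#' must not be glued to a preceding word.

```java
import java.util.regex.Pattern;

public class HashBoundaryDemo {
    // Case-insensitive search, mirroring RegexOptions.IgnoreCase in the question.
    static boolean found(String regex, String s) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(s).find();
    }

    public static void main(String[] args) {
        // Leading \b fails: nothing word-like sits on either side of the position before '#'.
        System.out.println(found("\\b#TODAY\\b", "#TODAY is 18 Dec"));        // false
        // Dropping the leading \b matches the exact word...
        System.out.println(found("#TODAY\\b", "#TODAY is 18 Dec"));           // true
        // ...while '_' is a word character, so #TODAY_CAL is still rejected.
        System.out.println(found("#TODAY\\b", "#TODAY_CAL is 18 Dex"));       // false
        // Optional variant (assumption): refuse a '#' that directly follows a word character.
        System.out.println(found("(?<!\\w)#TODAY\\b", "mail#TODAY now"));     // false
    }
}
```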