Regex - Remove all characters before and after

Is it possible to remove all characters up to and including the third ' and also everything from the fourth ' onward (including it), basically isolating the text between the 3rd and 4th '?
Example:
a, 'something', 'ineedthistext', 'moretexthere'
should result in
ineedthistext

Regex might not be the best tool for this (splitting on commas/apostrophes might actually be a better way), but if you want regex...
Maybe instead of removing all the characters before and after ineedthistext, you can capture ineedthistext in a group.
I would use something like:
^.*?'.*?'.*?'(.*?)'
Tested with rubular.
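A minimal Python sketch of the same capture-group idea, using the pattern above with `re.match`. The function name `between_third_and_fourth_quote` is hypothetical, and returning `None` when there are fewer than four apostrophes is an assumed convention.

```python
import re

# Lazily skip past three apostrophes, then capture everything up to the fourth.
PATTERN = re.compile(r"^.*?'.*?'.*?'(.*?)'")

def between_third_and_fourth_quote(text):
    """Return the text between the 3rd and 4th apostrophe, or None."""
    m = PATTERN.match(text)
    return m.group(1) if m else None

print(between_third_and_fourth_quote("a, 'something', 'ineedthistext', 'moretexthere'"))
# prints: ineedthistext
```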

Try:
public String stringSplit(String input) {
    // Splitting on ' leaves the text between the 3rd and 4th ' at index 3
    String[] wordArray = input.split("'");
    String requiredText = wordArray[3];
    return requiredText;
}
This will work if you always want the bit between the 3rd and 4th '.
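The Java version can throw an `ArrayIndexOutOfBoundsException` when the input contains too few apostrophes. Here is a Python sketch of the same split approach with a length check; the name `split_third_segment` and the `None` return value are assumptions for illustration.

```python
def split_third_segment(text):
    """Return what sits between the 3rd and 4th apostrophe, or None."""
    parts = text.split("'")
    # Four apostrophes produce at least five pieces; index 3 is then
    # guaranteed to be closed by a fourth apostrophe.
    return parts[3] if len(parts) > 4 else None

print(split_third_segment("a, 'something', 'ineedthistext', 'moretexthere'"))
# prints: ineedthistext
```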

Derived from this answer, a possible solution is:
Regex.Matches(yourString, @"'([^']*)'")[1].Groups[1].Value
The code finds every string enclosed between two single quotes and captures its contents. You need the 2nd match.
To alter your string directly, effectively removing the unwanted characters, you could use:
yourString = Regex.Matches(yourString, @"'([^']*)'")[1].Groups[1].Value;
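The same "collect every quoted string, then take the second" idea, sketched in Python with `re.findall`. In the example input, the text between the 3rd and 4th apostrophe is the second '...' pair. The helper name `second_quoted` is hypothetical.

```python
import re

def second_quoted(text):
    """Return the contents of the second '...' pair in text, or None."""
    quoted = re.findall(r"'([^']*)'", text)  # contents of every '...' pair, in order
    return quoted[1] if len(quoted) > 1 else None

print(second_quoted("a, 'something', 'ineedthistext', 'moretexthere'"))
# prints: ineedthistext
```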

Related

Regex Expression for textfield

I want a regex that accepts only letters and special characters (like space, ', -) but rejects numeric values.
I tried this expression, /^[a-zA-Z ]*$/, but it treats space as a special character.
Please help.
/^[a-zA-Z\s\-\'\"]*$/
Use this.
It will match any letter (upper- or lowercase), whitespace (\s), hyphen, " and '.
Update:
If you are using it inside NSPredicate, make sure you put the - at the end, as otherwise it throws an error.
Move it to the end of the sequence, so that it is the last character before the closing square bracket ],
like this: [a-zA-Z '"-]
If you want only letters, space, ' and -, then:
/^[-a-zA-Z\s\']+$/
Notice the + above instead of *. With *, the pattern also matches the empty string, whereas + requires at least one character in your input.
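A small Python check of the + versus * difference, using the same character class. `re.fullmatch` stands in for the ^...$ anchors, and the function names `valid_plus` and `valid_star` are hypothetical.

```python
import re

# Letters, whitespace, hyphen (first in the class, so it is literal) and apostrophe.
def valid_plus(s):
    return re.fullmatch(r"[-a-zA-Z\s']+", s) is not None  # needs at least one character

def valid_star(s):
    return re.fullmatch(r"[-a-zA-Z\s']*", s) is not None  # also accepts ""

for s in ["O'Neil-Smith", "abc1", ""]:
    print(repr(s), valid_plus(s), valid_star(s))
```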
Now, if you want to match letters together with any special characters (not only the three mentioned), then I'd suggest this one:
/^\D+$/
It matches one or more characters, none of which is a digit.
Maybe try this:
\b[a-zA-Z \-\']+\b
http://regex101.com/r/oQ5nU9
You can use this; it should definitely work:
[a-zA-Z._^%$#!~#,-]+
This works fine; you can try it.
// Use this to allow spaces as well as other special characters.
@"[a-zA-Z\\s\\-\\'\\\"]"
// The following link may help further.
http://www.raywenderlich.com/30288/nsregularexpression-tutorial-and-cheat-sheet
Thanks for your responses. I finally resolved it with this:
NSString *characterRegex = @"^(\\s[a-zA-Z]+(([\\'\\-\\+\\s]\\s*[a-zA-Z])?[a-zA-Z])\\s)+$";
NSPredicate *characterTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", characterRegex];
return [characterTest evaluateWithObject:inputString];

Regex to remove characters up to a certain point in a string

How do I use regex to convert
11111aA$xx1111xxdj$%%
to
aA$xx1111xxdj$%%
So, in other words, I want to remove (or match) the FIRST grouping of 1's.
Depending on the language, you should have a way to replace a string by regex. In Java, you can do it like this:
String s = "11111aA$xx1111xxdj$%%";
String res = s.replaceAll("^1+", "");
The ^ "anchor" indicates that the beginning of the input must be matched. The 1+ means a sequence of one or more 1 characters.
Here is a link to ideone with this running program.
The same program in C#:
using System;
using System.Text.RegularExpressions;
var rx = new Regex("^1+");
var s = "11111aA$xx1111xxdj$%%";
var res = rx.Replace(s, "");
Console.WriteLine(res);
(link to ideone)
In general, if you want a match only at the beginning of a string, add a ^ prefix to your expression; similarly, adding a $ at the end anchors the match to the end of your input.
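A short Python illustration of the two anchors, using the input from the question. The trailing-% example is an assumed addition, included only to show $ in use.

```python
import re

s = "11111aA$xx1111xxdj$%%"

# ^ anchors the match to the start: only the leading run of 1s is removed.
leading = re.sub(r"^1+", "", s)
# Without the anchor, every run of 1s is removed, including the one in the middle.
everywhere = re.sub(r"1+", "", s)
# $ anchors to the end instead: here it strips the trailing % signs.
trailing = re.sub(r"%+$", "", s)

print(leading)     # aA$xx1111xxdj$%%
print(everywhere)  # aA$xxxxdj$%%
print(trailing)    # 11111aA$xx1111xxdj$
```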
If the 1s are at the beginning, you can use this:
^[1]*
As for replacing, it depends on the language. In PowerShell, I would do this (single quotes stop PowerShell from expanding $xx1111xxdj as a variable):
[regex]::Replace('11111aA$xx1111xxdj$%%','^[1]*','')
This will return:
aA$xx1111xxdj$%%
If you only want to replace consecutive "1"s at the beginning of the string, replace the following with an empty string:
^1+
If the consecutive "1"s won't necessarily be the first characters in the string (but you still only want to replace one group), replace the following with the contents of the first capture group (usually \1 or $1):
1+(.*)
Note that this is only necessary if you only have a "replace all" capability available to you, but most regex implementations also provide a way to replace only one instance of a match, in which case you could just replace 1+ with an empty string.
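Both options sketched in Python. The input below is a made-up variant with "xx" in front, so the 1s are not the first characters; Python's `re.sub` accepts a `count` argument for the replace-once case.

```python
import re

# Hypothetical input where the run of 1s is not at the start.
s = "xx11111aA$xx1111xxdj$%%"

# Replace only the first match.
first_only = re.sub(r"1+", "", s, count=1)

# Replace-all only: match the first run plus the rest of the line and
# substitute the captured rest back (\1), so later 1s survive.
via_group = re.sub(r"1+(.*)", r"\1", s)

print(first_only)  # xxaA$xx1111xxdj$%%
print(via_group)   # xxaA$xx1111xxdj$%%
```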
I'm not sure, but you can try this:
[^1](\w*\d*\W)* - matches everything as a single group except the leading "1" characters
In JavaScript:
var str = '11111aA$xx1111xxdj$%%';
var patt = /^1+/g;
str = str.replace(patt,"");

Regex - Extract a substring from a given string

I have a string here: This is a string: AAA123456789.
So the idea here is to extract the string AAA123456789 using regex.
I am incorporating this with XPath.
Note: If there is an existing post on this, kindly point me to it.
I think, by rights, I should use substring(myNode, [^AAA\d+{9}]),
but I am not really sure about the regex part.
The idea is to extract the string starting at "AAA" followed by numbers only, exactly 9 consecutive digits.
Pure XPath solution:
substring-after('This is a string: AAA123456789', ': ')
produces:
AAA123456789
XPath 2.0 solutions:
tokenize('This is a string: AAA123456789 but not an double',
' '
)[starts-with(., 'AAA')]
or:
tokenize('This is a string: AAA123456789 but not an double',
' '
)[matches(., 'AAA\d+')]
or:
replace('This is a string: AAA123456789 but not an double',
'^.*?(A+\d+).*$',
'$1'
)
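Whether the leading `.*` is greedy or reluctant decides how many A's the group keeps. This sketch uses Python's `re`, not an XPath processor; XPath 2.0 regexes also support the reluctant `*?`, and the same backtracking behaviour is expected there, though that is an assumption about your particular processor.

```python
import re

s = "This is a string: AAA123456789 but not an double"

# Greedy .* backtracks from the end, so A+ is left with just the last A.
greedy = re.sub(r"^.*(A+\d+).*$", r"\1", s)
# Reluctant .*? stops at the first A, so A+ takes all three.
lazy = re.sub(r"^.*?(A+\d+).*$", r"\1", s)

print(greedy)  # A123456789
print(lazy)    # AAA123456789
```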
Alright, after going through the answers and comments from the wonderful people here, I summarized my findings in the solution I opted for. Here it is:
concat("AAA", substring(substring-after(., "AAA"), 1, 9)).
First, substring-after takes everything after "AAA" (its 2nd argument), and substring then keeps the 9 characters starting at position 1; anything more is ignored. Because substring-after drops the "AAA" I used as the reference, I concatenate "AAA" back onto the front. So I get the first 9 characters (the digits) after AAA, with AAA prepended since it is static data.
This keeps the result correct no matter what other text surrounds it.
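The same logic translated to Python, for comparison with the XPath expression above. `aaa_code` is a hypothetical name; `str.partition` is used because, like substring-after(), it yields an empty string when "AAA" is absent.

```python
def aaa_code(text):
    """Python analogue of concat("AAA", substring(substring-after(., "AAA"), 1, 9))."""
    # partition()[2] is everything after the first "AAA", or "" if there is none.
    after = text.partition("AAA")[2]
    # XPath substring(x, 1, 9) is 1-based; the equivalent 0-based slice is [:9].
    return "AAA" + after[:9]

print(aaa_code("This is a string: AAA123456789."))  # AAA123456789
```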
But I like the regex by @Dimitre, especially the replace part; the replace with a regex is wonderful. The tokenize approach less so, since it fails if there isn't a space to split on. Thanks.
And thanks to all of you out there, too...
First, I'm pretty sure you don't mean to have the [^ ... ]. That defines a "negated character class", i.e. your current regex says, "Give me a single character that is not one of the following: A0123456789+{}". You probably meant, plainly, "AAA(\d{9})". Now, according to this handy website, XPath does support capture groups, as well as backreferences, so take your pick (note, though, that the XPath 2.0 regex dialect has no lookbehind, so the second option needs an engine that supports it):
"AAA(\d{9})"
And extracting $1, the first capture group, or:
"(?<=AAA)\d{9}"
And taking the whole match ($0).
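Both options side by side in Python, whose `re` module supports lookbehind; this only illustrates the two techniques and is not XPath code.

```python
import re

s = "This is a string: AAA123456789."

# Capture group: the whole match includes AAA, group 1 holds only the digits.
m = re.search(r"AAA(\d{9})", s)
with_prefix = m.group(0)
digits = m.group(1)

# Lookbehind: AAA must precede the match but is not part of it.
lb = re.search(r"(?<=AAA)\d{9}", s)
digits_lb = lb.group(0)

print(with_prefix, digits, digits_lb)  # AAA123456789 123456789 123456789
```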
Can you try this:
A{3}(\d{9})

How do you find all text up to the first character x on a line?

Sorry, this is probably really easy. But if you have a delimiter character on each line and you want to find all of the text before the delimiter on each line, what regular expression would do that? I don't know if the delimiter matters but the delimiter I have is the % character.
Your text will be in group 1.
/^(.*?)%/
Note: This will capture everything up to the percent sign. If you want to limit what you capture, replace the . with the escape sequence of your choice.
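Applied to every line at once, the same pattern can be sketched in Python with `re.MULTILINE`; the sample text is made up.

```python
import re

text = "alpha%1\nbeta gamma%2%3\nno delimiter here"

# re.MULTILINE lets ^ match at every line start; the lazy .*? stops at the
# first % on the line, and . never crosses a newline.
before = re.findall(r"^(.*?)%", text, re.MULTILINE)

print(before)  # ['alpha', 'beta gamma']
```

Lines without a % simply produce no match.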
In Python, you can use:
def GetStuffBeforeDelimeter(str, delim):
    # str.find returns -1 if delim is missing, which would drop the last character
    return str[:str.find(delim)]
In Java:
public String getStuffBeforeDelimiter(String str, String delim) {
return str.substring(0, str.indexOf(delim));
}
In C++ (untested):
#include <string>
using namespace std;
string GetStuffBeforeDelimiter(const string& str, const string& delim) {
    return str.substr(0, str.find(delim));
}
In all the above examples you will want to handle corner cases, such as your string not containing the delimiter.
Basically, I would use substringing for something this simple because you can avoid scanning the entire string. Regex is overkill, and "exploding" or splitting on the delimiter is also unnecessary because it looks at the whole string.
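One way to handle the missing-delimiter corner case in Python is `str.partition`, which stops at the first occurrence. Returning the whole string when there is no delimiter is an assumed convention (you may prefer `None` or an exception), and `before_delimiter` is a hypothetical name.

```python
def before_delimiter(s, delim):
    """Return the text before the first delim, or all of s if delim is absent."""
    # partition returns (s, "", "") when delim is missing, so no -1 check is needed.
    return s.partition(delim)[0]

print(before_delimiter("sometext%some_other_text", "%"))  # sometext
print(before_delimiter("no delimiter", "%"))              # no delimiter
```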
You don't say what flavor of regex, so I'll use Perl notation.
/^[^%]*/m
The first ^ is a start anchor: normally it matches only at the beginning of the whole string, but this regex is in multiline mode thanks to the 'm' modifier at the end, so it also matches at the start of every line. [^%] is a negated character class: it matches any one character except '%'. The * is a quantifier that means to match the previous thing ([^%] in this case) zero or more times.
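The same anchor-plus-negated-class approach in Python. One deviation from the Perl regex above: `\n` is added to the excluded set, because `[^%]` also matches newlines, so a line without a % would otherwise run on into the next line.

```python
import re

text = "abc%def\nno delimiter\nxyz%123"

# ^ with re.MULTILINE matches at each line start; [^%\n]* stops at % or end of line.
prefixes = re.findall(r"^[^%\n]*", text, re.MULTILINE)

print(prefixes)  # ['abc', 'no delimiter', 'xyz']
```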
You don't have to use regex if you don't want to. Depending on the language you are using, there will be some sort of string function such as split().
$str = "sometext%some_other_text";
$s = explode("%",$str,2);
print $s[0];
This is in PHP: it splits on % and then takes the first element of the returned array. The same can be done in other languages with their splitting methods.

Capturing a repeated group

I am attempting to parse a string like the following using a .NET regular expression:
H3Y5NC8E-TGA5B6SB-2NVAQ4E0
and return the following using Split:
H3Y5NC8E
TGA5B6SB
2NVAQ4E0
I validate each character against a specific character set (note that the letters 'I', 'O', 'U' & 'W' are absent), so using string.Split is not an option. The number of characters in each group can vary and the number of groups can also vary. I am using the following expression:
([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8}-?){3}
This will match exactly 3 groups of 8 characters each. Any more or less will fail the match.
This works insofar as it correctly matches the input. However, when I use the Split method to extract each character group, I just get the final group. RegexBuddy complains that I have repeated the capturing group itself and that I should put a capture group around the repeated group. However, none of my attempts to do this achieve the desired result. I have been trying expressions like this:
(([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})-?){4}
But this does not work.
Since I generate the regex in code, I could just expand it out by the number of groups, but I was hoping for a more elegant solution.
Please note that the character set does not include the entire alphabet. It is part of a product activation system. As such, any characters that can be accidentally interpreted as numbers or other characters are removed. e.g. The letters 'I', 'O', 'U' & 'W' are not in the character set.
The hyphens are optional since a user does not need to type them in, but they can be there if the user has done a copy & paste.
BTW, you can replace the [ABCDEFGHJKLMNPQRSTVXYZ0123456789] character class with a more readable subtracted character class:
[A-Z\d-[IOUW]]
If you just want to match 3 groups like that, why don't you use this pattern 3 times in your regex and just use captured subgroups 1, 2 and 3 to form the new string?
([A-Z\d-[IOUW]]{8})-([A-Z\d-[IOUW]]{8})-([A-Z\d-[IOUW]]{8})
In PHP I would return (I don't know .NET)
return "$1 $2 $3";
I have discovered the answer I was after. Here is my working code:
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string pattern = @"^\s*((?<group>[ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})-?){3}\s*$";
        string input = "H3Y5NC8E-TGA5B6SB-2NVAQ4E0";
        Regex re = new Regex(pattern);
        Match m = re.Match(input);
        if (m.Success)
            // Captures holds every repetition of the named group, not just the last one
            foreach (Capture c in m.Groups["group"].Captures)
                Console.WriteLine(c.Value);
    }
}
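For comparison, Python's standard `re` module behaves like `Group.Value` here: a repeated group keeps only its last repetition, and there is no `Captures` collection (the third-party `regex` module offers a `captures()` method, to my knowledge). This sketch shows the behaviour and one workaround, validating the whole key first and then collecting the blocks with `findall`.

```python
import re

CHARS = "[ABCDEFGHJKLMNPQRSTVXYZ0123456789]"
key = "H3Y5NC8E-TGA5B6SB-2NVAQ4E0"

# The repeated group ((CHARS{8})-?){3}; {{ }} are literal braces in the f-string.
m = re.fullmatch(rf"(({CHARS}{{8}})-?){{3}}", key)
last_only = m.group(2)  # only the final repetition is kept

# Workaround: once the whole key has validated, collect each block with findall.
blocks = re.findall(rf"{CHARS}{{8}}", key) if m else []

print(last_only)  # 2NVAQ4E0
print(blocks)     # ['H3Y5NC8E', 'TGA5B6SB', '2NVAQ4E0']
```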
After reviewing your question and the answers given, I came up with this:
RegexOptions options = RegexOptions.None;
Regex regex = new Regex(@"([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})", options);
string input = @"H3Y5NC8E-TGA5B6SB-2NVAQ4E0";
MatchCollection matches = regex.Matches(input);
for (int i = 0; i != matches.Count; ++i)
{
    string match = matches[i].Value;  // one 8-character block per match
}
Since the "-" is optional, you don't need to include it. I am not sure what you were using the {4} at the end for? This finds the matches you want; you can then use the MatchCollection to access each match and rebuild the string.
Why use Regex? If the groups are always separated by a -, can't you use Split()?
Sorry if this isn't what you intended, but if your string always has hyphens separating the groups, couldn't you use the String.Split() method instead of a regex?
Dim stringArray As String() = someString.Split("-"c)
What are the defining characteristics of a valid block? We'd need to know that in order to really be helpful.
My generic suggestion: validate the charset in a first step, then split and parse in a separate method based on what you expect. If this is in a web site/app, you can use ASP.NET regex validation on the front end and then break it up on the back end.
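A minimal sketch of the validate-then-split suggestion in Python. It assumes the hyphens are present (the copy & paste case) and that block lengths may vary; `parse_key` and its `None` return on invalid input are hypothetical choices.

```python
import re

BLOCK = "[ABCDEFGHJKLMNPQRSTVXYZ0123456789]+"
# Whole key: one or more blocks from the allowed set, separated by single hyphens.
VALID_KEY = re.compile(rf"{BLOCK}(?:-{BLOCK})*")

def parse_key(key):
    """Validate the character set first, then split; returns None if invalid."""
    if VALID_KEY.fullmatch(key) is None:
        return None
    return key.split("-")

print(parse_key("H3Y5NC8E-TGA5B6SB-2NVAQ4E0"))  # ['H3Y5NC8E', 'TGA5B6SB', '2NVAQ4E0']
print(parse_key("H3Y5NC8I-TGA5B6SB"))           # None ('I' is not allowed)
```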
If you're just checking the value of the group with Groups(i).Value, you will only get the last capture. However, if you want to enumerate all the times that group was captured, use Groups(2).Captures(i).Value, as shown below.
System.Text.RegularExpressions.Regex.Match("H3Y5NC8E-TGA5B6SB-2NVAQ4E0","(([ABCDEFGHJKLMNPQRSTVXYZ0123456789]+)-?)*").Groups(2).Captures(i).Value
Mike,
You can use a character set of your choice inside the character group. All you need is to add the "+" modifier to capture all groups. See my previous answer; just change [A-Z0-9] to whatever you need (i.e. [ABCDEFGHJKLMNPQRSTVXYZ0123456789]).
You can use this pattern:
Regex.Split("H3Y5NC8E-TGA5B6SB-2NVAQ4E0", "([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})-?")
But you will need to filter out empty strings from resulting array.
Citation from MSDN:
If multiple matches are adjacent to one another, an empty string is inserted into the array.
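Python's `re.split` behaves the same way when the pattern contains a capture group: captured blocks are kept in the result, and adjacent matches leave empty strings between them. A sketch of the split-then-filter approach:

```python
import re

key = "H3Y5NC8E-TGA5B6SB-2NVAQ4E0"
pattern = r"([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})-?"

raw = re.split(pattern, key)
blocks = [part for part in raw if part]  # drop the empty strings

print(raw)     # ['', 'H3Y5NC8E', '', 'TGA5B6SB', '', '2NVAQ4E0', '']
print(blocks)  # ['H3Y5NC8E', 'TGA5B6SB', '2NVAQ4E0']
```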