Regular expression - regex

I need a regular expression for the string cases below:
String value = "�江苏银行股份有限公司南京迈皋桥支行";
String value = "�/CNYXB/02112";
In both cases, only the character "�" needs to be removed, so that after applying the regular expression the final string values are:
String value = "江苏银行股份有限公司南京迈皋桥支行";
String value = "/CNYXB/02112";
Thanks in advance!
Yes, I have tried the regex below:
value = value.replaceAll("[^\\p{ASCII}]", "");

I'm not sure if this is what you're actually asking, but you can easily remove the first character from the string:
^.
matches the first character at the start of the string.
If you want to remove an out-of-range character, then you need to define your range. Use multiple classes with octal escapes, something like:
[\o{2444}-\o{3444}\o{40}-\o{77}]
Without knowing what the characters you're looking for really are, it's difficult to be more specific.

Try using replaceFirst instead of replaceAll:
value = value.replaceFirst("[^\\p{ASCII}]", "");
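If the unwanted character is really the Unicode replacement character U+FFFD (an assumption; the question does not say which code point it is), no regex is needed at all: a plain literal replacement removes it and leaves the Chinese text intact. A minimal sketch of the idea in Python; the function name is made up for illustration:

```python
# Assumption: the stray "�" is U+FFFD (REPLACEMENT CHARACTER).
REPLACEMENT_CHAR = "\ufffd"


def strip_replacement_char(value: str) -> str:
    """Remove every U+FFFD from value, keeping all other characters."""
    return value.replace(REPLACEMENT_CHAR, "")


bank = strip_replacement_char("\ufffd江苏银行股份有限公司南京迈皋桥支行")
code = strip_replacement_char("\ufffd/CNYXB/02112")
```

The Java counterpart would presumably be value.replace("\uFFFD", ""), which also does a literal replacement rather than a regex one.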

Related

Processing a string with the null character

I have a text file full of strings (computer paths) which I want to process by replacing every backslash with an underscore, and also replacing every number (integer or float) with an underscore. The original string looks like this:
string = "\Software\Microsoft\0\Windows\CurrentVersion\Internet Settings\5.0\Cache"
Usually, I could easily replace the backslash with the following command:
string=string.replace('\\','_')
and apply a regular expression such as '(\d(?:\.\d)?)' to replace the numbers.
However, in my case I couldn't do either, because Python always interprets '\0' as a null character and the '\5' in '\5.0' as ENQ; in fact, any digit following a backslash is treated the same way.
Is there any suggested way to replace them?
E.g., is there a way to convert my string to a raw string as a start?
Always remember: a backslash (\) escapes special characters. If you want the backslash itself, you need to escape it too. Your string should look like this:
string = "\\Software\\Microsoft\\0\\Windows\\CurrentVersion\\Internet Settings\\5.0\\Cache"
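A raw string literal avoids the doubled backslashes, and the problem only exists for literals typed into source code: lines read from a text file already contain real backslashes. A minimal sketch; the function name and the number pattern (one or more digits with an optional fractional part) are assumptions, not taken from the question:

```python
import re

# Raw literal: "\0" and "\5" stay as a backslash followed by a digit.
path = r"\Software\Microsoft\0\Windows\CurrentVersion\Internet Settings\5.0\Cache"


def normalize(raw_path: str) -> str:
    """Replace each backslash and each integer/float with an underscore."""
    without_backslashes = raw_path.replace("\\", "_")
    return re.sub(r"\d+(?:\.\d+)?", "_", without_backslashes)


result = normalize(path)
```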

Generalized Regex from a set of String

I have this problem: I need a way to automatically generate a regex that matches a set of strings.
For example, given this set of strings as input:
S = ["Casino Royale (1928)", "Mission Goldfinger", "A view to a kill"]
iterate over the set, starting with a regex that matches the first string:
regex1 = "\w{6}\s\w{6}\s\(\d{4}\)"
then generalize regex1 so that it also matches the second string:
regex2 = "\w{6,7}\s\w{6,10}(\s\(\d{4}\))?"
and then do the same with the last string, so the final output is:
regex_output = "\w{1,7}\s\w{4,10}(\s\w{2}\s\w\s\w{4}|\s\(\d{4}\))?"
I would like to know whether this is feasible. Maybe it is a complexity-theory problem.
Thanks in advance.
Use an alternation of literals:
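^(?:\QCasino Royale (1928)\E|\QMission Goldfinger\E|\QA view to a kill\E)$
\Q...\E means the enclosed characters are matched literally. The non-capturing group makes the ^ and $ anchors apply to every alternative, not just the first and last.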
This approach can of course handle an arbitrarily large list of strings.
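Python's re module has no \Q...\E, but re.escape does the same job, so the alternation can be built from any list. A minimal sketch (the helper name is made up for illustration):

```python
import re


def literal_alternation(strings):
    """Compile a regex that matches exactly one of the given strings."""
    body = "|".join(re.escape(s) for s in strings)
    # The group keeps the anchors attached to every alternative.
    return re.compile(rf"^(?:{body})$")


titles = ["Casino Royale (1928)", "Mission Goldfinger", "A view to a kill"]
pattern = literal_alternation(titles)
```

Note that this only recognizes the exact input strings; it does not generalize to unseen titles the way the question's \w{...} patterns try to.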

Textbox that accepts all the characters but returns only numbers

I need code to filter the data entered in a textbox. The textbox accepts all characters at runtime, but the code should remove all letters and other non-numeric characters and keep only the digits (which would be my output). I tried the following code, but I guess it won't do:
a = Textbox1.text
Dim value As Decimal = CDec(Regex.Replace(a, "[\D]", ""))
Your regex was correct (just a bit redundant; \D would have done). \D+ would have been better, so that consecutive non-digits are replaced in one go.
ResultString = Regex.Replace(SubjectString, "\D+", "")
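The same \D+ replacement, sketched in Python for comparison. One thing the original snippet does not cover is input with no digits at all, where the numeric conversion fails on the empty string; returning None in that case is an assumption about the desired behavior, and both function names are hypothetical:

```python
import re
from decimal import Decimal


def digits_only(text: str) -> str:
    """Drop every run of non-digit characters."""
    return re.sub(r"\D+", "", text)


def to_decimal(text: str):
    """Convert the remaining digits to a Decimal, or None if there are none."""
    digits = digits_only(text)
    return Decimal(digits) if digits else None
```

Keep in mind that the decimal point is a non-digit too, so "12.5" becomes 125 with this approach.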
I use this jQuery plugin. http://plugins.jquery.com/project/jQueryNumberLettersPlugin
$("#id").numbers();
That would only allow numbers to be entered into the selected input.
Try this instead, using the Match object (note that it returns only the first run of digits):
Dim a As String
Try
    a = Regex.Match(Textbox1.text, "\d+").Value
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try

How to replace all the numbers with literal \d in scala?

I want to write a function, to replace all the numbers in a string with literal \d. My code is:
val r = """\d""".r
val s = r.replaceAllIn("123abc", """\d""")
println(s)
I expect the result to be \d\d\dabc, but I get:
dddabc
Then I changed line 2 of my code to:
val s = r.replaceAllIn("123abc", """\\d""")
The result is correct now: \d\d\dabc
But I don't understand why the method replaceAllIn transforms the replacement string instead of using it directly.
There was a toList in my previous code, which is not what I wanted. I have just updated the question. Thanks to everyone.
Just remove the toList.
val r = """\d""".r
val list = r.replaceAllIn("123abc", """\\d""")
println(list)
Strings are (implicitly, via WrappedString, convertible to) Seq[Char]. If you invoke toList, you will have a List[Char].
Scala's Regex uses java.util.regex underneath (at least on the JVM). Now, if you look up replaceAll on Java docs, you'll see this:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
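Python's re.sub has the same pitfall: backslashes in a replacement string are processed as escapes. This is shown as an analogy only, not as the Scala API. A minimal sketch of the two usual ways around it: doubling the backslash in the template, or passing a function, whose return value is inserted as-is. Both function names are made up for illustration:

```python
import re


def digits_to_backslash_d_template(text: str) -> str:
    # In a replacement template, "\\" stands for one literal backslash.
    return re.sub(r"\d", r"\\d", text)


def digits_to_backslash_d_callable(text: str) -> str:
    # The return value of a callable replacement gets no escape processing.
    return re.sub(r"\d", lambda match: r"\d", text)
```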

Capturing a repeated group

I am attempting to parse a string like the following using a .NET regular expression:
H3Y5NC8E-TGA5B6SB-2NVAQ4E0
and return the following using Split:
H3Y5NC8E
TGA5B6SB
2NVAQ4E0
I validate each character against a specific character set (note that the letters 'I', 'O', 'U' & 'W' are absent), so using string.Split is not an option. The number of characters in each group can vary and the number of groups can also vary. I am using the following expression:
([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8}-?){3}
This will match exactly 3 groups of 8 characters each. Any more or less will fail the match.
This works insofar as it correctly matches the input. However, when I use the Split method to extract each character group, I just get the final group. RegexBuddy complains that I have repeated the capturing group itself and that I should put a capture group around the repeated group. However, none of my attempts to do this achieve the desired result. I have been trying expressions like this:
(([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})-?){4}
But this does not work.
Since I generate the regex in code, I could just expand it out by the number of groups, but I was hoping for a more elegant solution.
Please note that the character set does not include the entire alphabet. It is part of a product activation system. As such, any characters that can be accidentally interpreted as numbers or other characters are removed. e.g. The letters 'I', 'O', 'U' & 'W' are not in the character set.
The hyphens are optional since a user does not need to type them in, but they can be there if the user has done a copy & paste.
BTW, you can replace the [ABCDEFGHJKLMNPQRSTVXYZ0123456789] character class with a more readable subtracted character class (.NET syntax):
[A-Z\d-[IOUW]]
If you just want to match 3 groups like that, why don't you use this pattern 3 times in your regex and use the captured subgroups 1, 2 and 3 to form the new string? Note that the {8} goes inside each capture group, so each group captures the whole block rather than only its last character:
([A-Z\d-[IOUW]]{8})-?([A-Z\d-[IOUW]]{8})-?([A-Z\d-[IOUW]]{8})
In PHP I would return (I don't know .NET)
return "$1 $2 $3";
I have discovered the answer I was after. Here is my working code:
static void Main(string[] args)
{
    // The named group is repeated; .NET keeps every capture, not just the last one.
    string pattern = @"^\s*((?<group>[ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})-?){3}\s*$";
    string input = "H3Y5NC8E-TGA5B6SB-2NVAQ4E0";
    Regex re = new Regex(pattern);
    Match m = re.Match(input);
    if (m.Success)
        foreach (Capture c in m.Groups["group"].Captures)
            Console.WriteLine(c.Value);
}
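The Captures collection is specific to .NET. In an engine that keeps only the last capture of a repeated group (Python's built-in re, for example), a similar result can be had in two steps: validate the whole key first, then extract the blocks. A minimal sketch under the question's assumptions (3 blocks of 8 characters, optional hyphens); the names are made up for illustration:

```python
import re

ALPHABET = "ABCDEFGHJKLMNPQRSTVXYZ0123456789"  # no I, O, U or W
BLOCK = rf"[{ALPHABET}]{{8}}"

# Step 1 pattern: the whole key, hyphens optional, surrounding whitespace allowed.
KEY_RE = re.compile(rf"\s*(?:{BLOCK}-?){{3}}\s*")
# Step 2 pattern: a single block.
BLOCK_RE = re.compile(BLOCK)


def split_key(text: str):
    """Return the list of blocks, or None if the key is not valid."""
    if KEY_RE.fullmatch(text) is None:
        return None
    return BLOCK_RE.findall(text)
```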
After reviewing your question and the answers given, I came up with this:
RegexOptions options = RegexOptions.None;
Regex regex = new Regex(@"([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})", options);
string input = @"H3Y5NC8E-TGA5B6SB-2NVAQ4E0";
MatchCollection matches = regex.Matches(input);
for (int i = 0; i != matches.Count; ++i)
{
    string match = matches[i].Value;
}
Since the "-" is optional, you don't need to include it. I am not sure what you were using the {4} at the end for. This will find the matches based on what you want; then, using the MatchCollection, you can access each match to rebuild the string.
Why use Regex? If the groups are always split by a -, can't you use Split()?
Sorry if this isn't what you intended, but if your string always has the hyphen separating the groups, then instead of using regex couldn't you use the String.Split() method?
Dim stringArray As Array = someString.Split("-")
What are the defining characteristics of a valid block? We'd need to know that in order to really be helpful.
My generic suggestion: validate the charset in a first step, then split and parse in a separate method based on what you expect. If this is in a web site/app, then you can use the ASP Regex validation on the front end and break it up on the back end.
If you're just checking the value of the group, with group(i).value, then you will only get the last one. However, if you want to enumerate over all the times that group was captured, use group(2).captures(i).value, as shown below.
system.text.RegularExpressions.Regex.Match("H3Y5NC8E-TGA5B6SB-2NVAQ4E0","(([ABCDEFGHJKLMNPQRSTVXYZ0123456789]+)-?)*").Groups(2).Captures(i).Value
Mike,
You can use character set of your choice inside character group. All you need is to add "+" modifier to capture all groups. See my previous answer, just change [A-Z0-9] to whatever you need (i.e. [ABCDEFGHJKLMNPQRSTVXYZ0123456789])
You can use this pattern:
Regex.Split("H3Y5NC8E-TGA5B6SB-2NVAQ4E0", "([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})-?")
But you will need to filter out empty strings from resulting array.
Citation from MSDN:
If multiple matches are adjacent to one another, an empty string is inserted into the array.