I need something more than [^0-9\n]. I want a regex (but don't know how to write one) that captures the digits from number patterns like "0000000000" or "000-000-0000", or basically any number that has spaces and/or special characters before or between the digits.
So any number, even one like (626*) 34a2- 4387), should convert to 6263424387.
How can this be accomplished? I'm thinking it's too hard.
You can search for all non-digits using:
\D+
OR
[^0-9]+
And replace by empty string.
I always forget this one myself and go hunting the internet for the answer. Here it is, for my future reference and for the rest of the internet:
const rawNumber = '(555) 123-4567';
const strippedNumber = rawNumber.replace(/\D+/g, '');
Of course, my above example is JavaScript specific, but it can be adapted to other languages easily. The original post didn't specify language.
This is a better approach: it handles decimal numbers as well.
"$9.0".replace(/[^0-9.]+/g, '');
JavaScript :)
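As a quick check of the two patterns above, here is a minimal sketch. The helper names digitsOnly and digitsAndDots are hypothetical, made up for this illustration. Note that the decimal variant keeps every dot it sees, so it is probably not what you want for phone numbers written with dots.

```javascript
// Hypothetical helpers for illustration; the names are not from the answers above.

// Remove every run of non-digit characters.
function digitsOnly(text) {
  return text.replace(/\D+/g, "");
}

// Remove everything except digits and dots (keeps ALL dots, not just one).
function digitsAndDots(text) {
  return text.replace(/[^0-9.]+/g, "");
}

console.log(digitsOnly("(626*) 34a2- 4387)")); // "6263424387"
console.log(digitsAndDots("$9.0"));            // "9.0"
console.log(digitsAndDots("555.123.4567"));    // "555.123.4567" (dots survive)
```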
Utter RegEx noob here with a project involving RegEx I need to modify. Has been a blast learning all of this.
I need to search for/verify a set of values that start with one of two string prefixes (NC or KH) followed by a numeric range that is unique to each prefix: NC01-NC13 or KH01-KH11.
I have been able to pull off the first common "chunk" of this with:
^(NC|KH)0[1-9]$
to verify NC01-NC09 or KH01-KH09. The next part is completely throwing me: I need to change the leading digit of the two-digit number to a 1 instead of a 0, and restrict the second digit to 0–3 for NC and 0–1 for KH.
I have found references abound for selecting between two strings (where I got the (NC|KH) from), but nothing as detailed as how to restrict following values based on the found text.
Any and all help would be greatly appreciated, as well as any great references/books/tutorials to RegEx (currently using Regular-Expressions.info).
The best way to do this is to just separate the two cases altogether.
^((NC(0[1-9]|1[0-3]))|(KH(0[1-9]|1[01])))$
You might want to turn some of those internal capturing groups into non-capturing groups, but that makes the regex a little harder to read.
Edit: You might also be able to do this with positive lookbehind.
Edit: Here's a regex using lookbehind. It's a lot messier, and not really necessary here, but hopefully demonstrates the utility:
(KH|NC)(0[1-9]|(?<=KH)(1[01])|(?<=NC)(1[0-3]))
Sticking with your original idea of options for NC or KH, do the same for the numbers, try this:
^(NC|KH)(0[1-9]|1[0-3])$
Hope that makes sense
EDIT:
Based on @Patrick's comment below (the pattern above also accepts KH12 and KH13), and sticking with this original answer, you could use this (although I bet there's a better way):
^((NC|KH)(0[1-9]|1[01])|NC1[23])$
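For anyone who wants to check a pattern for this range problem against every allowed code, here is a minimal JavaScript sketch. It uses the "separate the two cases" idea with non-capturing groups; the names codePattern and codesFor are hypothetical, and JavaScript is only an assumption since the question did not name a regex flavor.

```javascript
// One alternative per prefix, each with its own numeric range.
// Non-capturing groups (?:...) are used because nothing needs to be extracted.
const codePattern = /^(?:NC(?:0[1-9]|1[0-3])|KH(?:0[1-9]|1[01]))$/;

// Hypothetical helper: builds e.g. ["KH01", ..., "KH11"] for ("KH", 11).
function codesFor(prefix, max) {
  const codes = [];
  for (let n = 1; n <= max; n++) {
    codes.push(prefix + String(n).padStart(2, "0"));
  }
  return codes;
}

const shouldMatch = [...codesFor("NC", 13), ...codesFor("KH", 11)];
const shouldFail = ["NC00", "NC14", "KH00", "KH12", "KH13", "XX01", "NC1", "NC011"];

console.log(shouldMatch.every((code) => codePattern.test(code))); // true
console.log(shouldFail.some((code) => codePattern.test(code)));   // false
```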
I have the following string 3}HFB}4AF4}1 -M}1.
I have searched for this string using the regex :
([0-9])(\})([A-Z]{3})(\})([0-9][A-Z]{2}[0-9])(\})([0-9])(\s\-)([A-Z])(\})([0-9]).
I want to replace each } with 0. The result I am looking for is 30HFB04AF401-M01. Any assistance is appreciated. The tool I am using is RegexBuddy.
A possible solution
Problem solved? In JavaScript at least :-)
"3}HFB}4AF4}1 -M}1".replace(/\}/g, "0");
// "30HFB04AF401 -M01"
I'm missing the point, right?
Assuming the language is JavaScript, we can write something like
"dfghj456783}HFB}4AF4}1 -M}1fghjkl8765".replace(/(?:[\d\w\s]+)([0-9]}[A-Z]{3}}[0-9][A-Z]{2}[0-9]}[0-9] -[A-Z]}[0-9])(?:[\d\w\s]+)/g, function () {
return arguments[1].replace(/}/g, "0");
});
What's possible in other languages though may be a different story.
Try the home of RegexBuddy for details.
So you've already got an expression to find instances of the string. Now you can either use groups to replace the characters, or you can use a separate regular expression over the string you found, simply replacing the } character within group(0) (which is the entire matched part of the input). I would certainly prefer the latter.
Fred seems to have created the replacement method for you already, so I won't repeat it here.
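To make the second option concrete (find the whole code first, then run a simple replacement on just the matched text), here is a minimal JavaScript sketch. The names codePattern and fixCodes are hypothetical, and the pattern is the asker's expression with the capturing groups dropped. Unlike the asker's desired result, this sketch keeps the space before the hyphen.

```javascript
// The asker's search pattern without capturing groups; "g" finds every occurrence.
const codePattern = /[0-9]\}[A-Z]{3}\}[0-9][A-Z]{2}[0-9]\}[0-9]\s-[A-Z]\}[0-9]/g;

// Replace } with 0 only inside text that matches the full code pattern.
function fixCodes(text) {
  return text.replace(codePattern, (match) => match.replace(/\}/g, "0"));
}

console.log(fixCodes("3}HFB}4AF4}1 -M}1"));          // "30HFB04AF401 -M01"
console.log(fixCodes("keep } here, 3}HFB}4AF4}1 -M}1 there"));
// "keep } here, 30HFB04AF401 -M01 there"
```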
I have managed to find a solution for the formatting in the JGSoft flavor used by RegexBuddy. Thanks to all who provided suggestions that helped me channel my thoughts in the right direction.
Solution (I am still a beginner with regex, so the syntax might not be efficient, but it does the job!):
I used named groups instead of numbered groups, and referenced them in the replacement with the ${name} syntax.
Hence, to replace } with 0 in the string 3}HFB}4AF4}1 -M}1 or any similar string, I used the following search and replacement syntax:
Search : (?<Gp1>([0-9]))(?:})(?<Gp2>([A-Z]){3})(?:})(?<Gp3>([0-9])([A-Z]{2})([0-9]))(?:})(?<Gp4>([0-9]))(?:\s-)(?<Gp5>([A-Z]))(?:})(?<Gp6>[0-9])
Replace : ${Gp1}0${Gp2}0${Gp3}0${Gp4}-${Gp5}0${Gp6}
Result : 30HFB04AF401-M01
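The same named-group idea can be sketched in JavaScript, which writes group references in the replacement string as $<name> rather than ${name}. This is an adaptation under that assumption, not the JGSoft syntax itself, and the nested groups from the search pattern above are flattened for readability.

```javascript
// Named groups capture the parts to keep; the } and the space are simply not captured.
const namedPattern =
  /(?<gp1>[0-9])\}(?<gp2>[A-Z]{3})\}(?<gp3>[0-9][A-Z]{2}[0-9])\}(?<gp4>[0-9])\s-(?<gp5>[A-Z])\}(?<gp6>[0-9])/g;

// Rebuild the string from the named groups, writing 0 where each } used to be.
const replacement = "$<gp1>0$<gp2>0$<gp3>0$<gp4>-$<gp5>0$<gp6>";

const formatted = "3}HFB}4AF4}1 -M}1".replace(namedPattern, replacement);
console.log(formatted); // "30HFB04AF401-M01"
```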
I use VB.NET and would like to add http:// to all links that don't already start with http://, https://, ftp:// and so on.
"I want to add http here Google,
but not here Google."
It was easy when I just had the links, but I can't find a good solution for an entire string containing multiple links. I guess RegEx is the way to go, but I wouldn't even know where to start.
I can find the RegEx myself, it's the parsing and prepending I'm having problems with. Could anyone give me an example with Regex.Replace() in C# or VB.NET?
Any help appreciated!
Quote RFC 1738:
"Scheme names consist of a sequence of characters. The lower case letters "a"--"z", digits, and the characters plus ("+"), period ("."), and hyphen ("-") are allowed. For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http")."
Excellent! A regex to match:
/^[a-zA-Z0-9+.-]+:\/\//
If that matches your href string, continue on. If not, prepend "http://". Remaining sanity checks are yours unless you ask for specific details. Do note the other commenters' thoughts about relative links.
EDIT: I'm starting to suspect that you've asked the wrong question... that you perhaps don't have anything that splits the text up into the individual tokens you need to handle it. See Looking for C# HTML parser
EDIT: As a blind try at ignoring all and just attacking the text, using case insensitive matching,
/(<a +href *= *")(.*?)(" *>)/
If the second back-reference matches /^[a-zA-Z0-9+.-]+:\/\//, do nothing. If it does not match, replace it with
$1 + "http://" + $2 + $3
This isn't C# syntax, but it should translate across without too much effort.
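The two-step idea above (capture the href value, then test it against the scheme pattern) can be sketched in JavaScript with a replacement callback. The function name addDefaultScheme is hypothetical. In .NET the same shape should be possible with Regex.Replace and a MatchEvaluator delegate, but that translation is not shown or tested here, and the caveats about relative links still apply.

```javascript
// Scheme check from RFC 1738: letters, digits, "+", "." and "-", followed by "://".
const schemePattern = /^[a-zA-Z0-9+.-]+:\/\//;

// Hypothetical helper: prepend "http://" to href values that have no scheme.
function addDefaultScheme(html) {
  return html.replace(/(<a +href *= *")(.*?)(" *>)/gi, (whole, open, url, close) =>
    schemePattern.test(url) ? whole : open + "http://" + url + close
  );
}

const sample =
  'I want to add http here <a href="www.google.com">Google</a>, ' +
  'but not here <a href="https://www.google.com">Google</a>.';

console.log(addDefaultScheme(sample));
// I want to add http here <a href="http://www.google.com">Google</a>,
// but not here <a href="https://www.google.com">Google</a>.   (printed on one line)
```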
In PHP (should translate somewhat easily)
$text = preg_replace('/href="(?![a-z][a-z0-9+.-]*:\/\/)([^"]*)"/i', 'href="http://$1"', $text);
C#
result = Regex.Replace(input, "(href=\")(?![a-z][a-z0-9+.-]*://)", "${1}http://", RegexOptions.IgnoreCase);
If you aren't concerned with potentially messing up local links, and you can always guarantee that the strings will be fully qualified domain names, then you can simply check how the string starts:
Dim myUrl As String = "someUrlString"
If Not myUrl.StartsWith("http://", StringComparison.OrdinalIgnoreCase) AndAlso Not myUrl.StartsWith("https://", StringComparison.OrdinalIgnoreCase) AndAlso Not myUrl.StartsWith("ftp://", StringComparison.OrdinalIgnoreCase) Then
'Execute your logic to prepend the proper protocol
myUrl = "http://" & myUrl
End If
Keep in mind this leaves a lot of holes: it does not check which protocol should be prepended, or whether the URL is relative.
Edit: I chose specifically not to offer a RegEx solution since this is a simple check and RegEx is a little heavy for it (IMO).
I am working with legacy systems at the moment, and a lot of work involves breaking up delimited strings and testing against certain rules.
With this string, how could I return "Active" in a back reference and search terms, stopping when it hits the first caret (^)?:
Active^20080505^900^LT^100
Can it be done with an inclusion in the regex of this "(.+)" ? The reason I ask is that the actual regex "(.+)" is defined in a database as cutting up these messages and their associated rules can be set from a front-end system. The content could be anything ('Active' in this case), that's why ".+" has been used in this case.
Rule: The caret sign cannot be captured between the brackets, as that would result in it being stored in the database field too, and it is defined elsewhere in another system field.
If you have a better suggestion than "(.+)", I will be happy to hear it.
Thanks in advance.
(.+?)\^
Should grab up to the first ^
If you have to include (.+) w/o modifications you could use this:
(.+?)\^(.+)
The first backreference will still be the correct one and you can ignore the second.
A regex is really overkill here.
Just take the first n characters of the string where n is the position of the first caret.
Pseudo code:
InputString.Left(InputString.IndexOf("^"))
^([^\^]+)
That should work if your RE library doesn't support non-greediness.
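For comparison, here is a minimal JavaScript sketch of the three approaches suggested in this thread, run against the sample message. The function name beforeFirstCaret is hypothetical, and the choice to return the whole string when no caret is present is an assumption, not something the question specifies.

```javascript
const message = "Active^20080505^900^LT^100";

// 1. Lazy quantifier: capture as little as possible before the first caret.
const lazyMatch = message.match(/(.+?)\^/);

// 2. Negated character class: works even without lazy-quantifier support.
const negatedMatch = message.match(/^([^\^]+)/);

// 3. No regex at all: cut the string at the first caret.
function beforeFirstCaret(text) {
  const caretIndex = text.indexOf("^");
  // Assumption: with no caret, the whole string is the first field.
  return caretIndex === -1 ? text : text.slice(0, caretIndex);
}

console.log(lazyMatch[1]);              // "Active"
console.log(negatedMatch[1]);           // "Active"
console.log(beforeFirstCaret(message)); // "Active"
```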