Regex match a string and allow specific character to appear randomly - regex

I want to extract a portion of a string, allowing for the dash character to appear randomly throughout. In my match, I want the dash character occurrences to be included.
Let's say I have a scenario like so:
haystack = "arandomse-que-nce"
needle = "sequence"
and I want to come out on the other end with a string like se-que-nce. In this case, what would the regex pattern look like?

I would split the string and then join by -*; for example, in JavaScript:
var needle = "sequence"
var regex = new RegExp(needle.split('').join('-*'))
var result = "arandomse-que-nce".match(regex) // ["se-que-nce"]
var result2 = "a-bad-sequ_ence".match(regex) // null
You could also use a regex to insert -* between each character:
var regex = new RegExp(needle.replace(/(?!$|^)/g, '-*'))
Both the split/join method and the replace method return 's-*e-*q-*u-*e-*n-*c-*e' for the regex.
If your string contains characters like * that have special meanings in regular expressions, you should escape them. Escape each character before joining, so that the inserted -* is not escaped as well:
var regex = new RegExp(needle.split('')
.map(function (c) { return c.replace(/[-\\^$*+?.()|[\]{}]/, '\\$&'); })
.join('-*'))
Then, if needle was 1+1, for example, it would give you 1-*\+-*1 for the regex.
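As a minimal sketch of the approach above, the escaping and joining steps can be wrapped in one function. The name dashTolerantRegex is hypothetical and not part of the original answer; the sketch assumes the dash is the only character allowed to appear between the needle's characters.

```javascript
// Hypothetical helper: builds a regex that matches `needle` with any number
// of dashes allowed between consecutive characters.
function dashTolerantRegex(needle) {
  const pattern = needle
    .split("")
    // Escape only regex metacharacters so each one is matched literally.
    .map((ch) => ch.replace(/[-\\^$*+?.()|[\]{}]/, "\\$&"))
    .join("-*");
  return new RegExp(pattern);
}

const seq = dashTolerantRegex("sequence");
console.log(seq.source);                        // s-*e-*q-*u-*e-*n-*c-*e
console.log("arandomse-que-nce".match(seq)[0]); // se-que-nce
console.log("a-bad-sequ_ence".match(seq));      // null
console.log(dashTolerantRegex("1+1").source);   // 1-*\+-*1
```

Escaping per character, before the join, keeps the inserted -* quantifier intact.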

s-*e-*q-*u-*e-*n-*c-*e-*
This assumes that multiple hyphens in a row are okay.
EDIT: Doorknob's split/join solution is good, but be aware that it only works for characters that aren't special characters (*, +, etc.)
I don't know what the specifications are, but if there are special characters, make sure to escape them:
new RegExp(needle.split('').map(function(c) { return c.replace(/[-\\^$*+?.()|[\]{}]/, '\\$&'); }).join('-*'))

You could try to use:
s-?e-?q-?u-?e-?n-?c-?e
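The difference between the -* and -? variants only shows up when two or more dashes appear in a row. A short sketch (the variable names and the test strings other than se-que-nce are made up for illustration):

```javascript
const anyDashes = /s-*e-*q-*u-*e-*n-*c-*e/; // zero or more dashes per gap
const oneDash = /s-?e-?q-?u-?e-?n-?c-?e/;   // at most one dash per gap

console.log(anyDashes.test("se-que-nce")); // true
console.log(oneDash.test("se-que-nce"));   // true
console.log(anyDashes.test("se--quence")); // true
console.log(oneDash.test("se--quence"));   // false
```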

Related

Split string by multiple delimiters and keep the delimiter in result [duplicate]

How to split string with Regex.Split and keep all separators?
I have a string: "substring1 delimiter1 substring2", where delimiter + substring2 is part of an address.
I also have two or more delimiters (delim1, delim2) that are equivalent in meaning.
I want to get a string array like this:
arr[0]="substring1";
arr[1]="delim1 substring2";
or,
arr[1]="delim2 substring2";
I have a pattern:
addrArr= Regex.Split(inputText, String.Concat("(?<=",delimeter1, "|",delimeter2, ")"), RegexOptions.None);
But it does not work well.
Can you help me create a valid pattern to do that?
You need a pattern with a lookahead only:
\s+(?=delim1|delim2)
The \s+ will match 1 or more whitespaces (since your string contains whitespaces). In case there can be no whitespaces, use \s* (but then you will need to remove empty entries from the result). If these delimiters must be whole words, use \b word boundaries: \s+(?=\b(?:delim1|delim2)\b).
In C#:
addrArr = Regex.Split(inputText, string.Format(@"\s+(?={0})", string.Join("|", delimeters)));
If the delimiters can contain special regex metacharacters, you will need to run Regex.Escape on your delimiters list.
A C# demo:
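using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var inputText = "substring1 delim1 substring2 delim2 substr3";
var delimeters = new List<string> { "delim1", "delim2" };
// Split on whitespace that is immediately followed by one of the (escaped) delimiters
var addrArr = Regex.Split(inputText,
string.Format(@"\s+(?={0})", string.Join("|", delimeters.Select(Regex.Escape))));
Console.WriteLine(string.Join("\n", addrArr));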
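The same lookahead split can be sketched in JavaScript, whose String.prototype.split also accepts a regex. This is an illustrative translation, not part of the original answer; escapeRegex and the variable names are hypothetical.

```javascript
// Hypothetical helper: escape regex metacharacters in a literal delimiter.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const delimiters = ["delim1", "delim2"];
// Whitespace that is followed by any of the delimiters; the delimiter itself
// is only looked at, not consumed, so it stays in the next piece.
const splitter = new RegExp(
  "\\s+(?=" + delimiters.map((d) => escapeRegex(d)).join("|") + ")"
);

const parts = "substring1 delim1 substring2 delim2 substr3".split(splitter);
console.log(parts); // [ 'substring1', 'delim1 substring2', 'delim2 substr3' ]
```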
I think you need to use a lookahead, not a lookbehind, for this to work (haven't tried it though).
Also, you have to be careful with the separators; they must be escaped to work correctly as patterns in the regex.
Try this:
addrArr= Regex.Split(inputText, string.Format("(?={0}|{1})", Regex.Escape(delimeter1), Regex.Escape(delimeter2)), RegexOptions.None);
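One thing to keep in mind with a lookahead-only pattern: the whitespace before the delimiter is not consumed, so it stays at the end of the preceding piece. A small JavaScript sketch of that behavior (variable names are made up for illustration), with a trim step as one possible cleanup:

```javascript
const input = "substring1 delim1 substring2";

// Split at the position just before a delimiter, consuming nothing.
const rawPieces = input.split(/(?=delim1|delim2)/);
console.log(rawPieces); // [ 'substring1 ', 'delim1 substring2' ]

// Trim each piece to drop the whitespace left behind.
const trimmedPieces = rawPieces.map((piece) => piece.trim());
console.log(trimmedPieces); // [ 'substring1', 'delim1 substring2' ]
```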

Generalized Regex from a set of String

I need a way to automatically generate a regex that matches a set of strings.
For example, given this set of strings as input:
S = ["Casino Royale (1928)", "Mission Goldfinger", "A view to a kill"]
start by creating a regex that matches the first string:
regex1 = "\w{6}\s\w{6}\s\(\d{4}\)"
then generalize regex1 so that it also matches the second string:
regex2 = "\w{6,7}\s\w{6,10}(\s\(\d{4}\))?"
and then do the same with the last string, so the final output is:
regex_output = "\w{1,7}\s\w{4,10}(\s\w{2}\s\w\s\w{4}|\s\(\d{4}\))?"
I would like to know whether this is feasible. Maybe it is a problem of complexity theory.
Thanks in advance.
Use an alternation of literals:
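^(?:\QCasino Royale (1928)\E|\QMission Goldfinger\E|\QA view to a kill\E)$
\Q...\E means that the enclosed characters are matched literally. The non-capturing group makes the ^ and $ anchors apply to every alternative rather than only the first and last.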
This approach can of course handle an arbitrarily large list of strings.
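JavaScript regular expressions do not support \Q...\E, so there the same alternation has to be built by escaping each string. A minimal sketch, with hypothetical helper names escapeLiteral and exactMatchRegex:

```javascript
// Hypothetical helper: escape regex metacharacters so the text matches literally.
function escapeLiteral(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: a regex that matches exactly one of the given strings.
function exactMatchRegex(strings) {
  const alternatives = strings.map((s) => escapeLiteral(s)).join("|");
  // The group keeps ^ and $ applying to every alternative.
  return new RegExp("^(?:" + alternatives + ")$");
}

const titles = ["Casino Royale (1928)", "Mission Goldfinger", "A view to a kill"];
const titleRegex = exactMatchRegex(titles);

console.log(titleRegex.test("Casino Royale (1928)")); // true
console.log(titleRegex.test("Casino Royale"));        // false
console.log(titleRegex.test("xMission Goldfinger"));  // false
```

Note that this matches only the listed strings; it does not generalize to unseen titles the way the question's \w{...} patterns attempt to.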

Regex replacing special characters in a string

I have numerical values that contain special characters, and I would like to replace those special characters with "x".
I already tried [^\w*], but it only works when there is one special character.
When there is more than one, as in 1234?12?, it won't capture the second special character. What am I doing wrong?
Here is something you could use. It removes all non-numeric characters; pass "x" instead of the empty string as the replacement to substitute them with "x". Good luck!
var str = "rt5121212?232?2*dse%e&323"
var pattern = /[^0-9]/g; // any character that is not a digit; the g flag replaces every occurrence
var sanitized = str.replace(pattern,'');
console.log(sanitized);
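To get the "x" substitution the question asks for, the same pattern can be used with "x" as the replacement. A small sketch using the question's sample value (the second input and the variable names are made up for illustration):

```javascript
// The g flag is what makes replace() act on every match, not just the first.
const masked = "1234?12?".replace(/[^0-9]/g, "x");
console.log(masked); // 1234x12x

// With + a whole run of special characters collapses into a single "x".
const collapsed = "12??34".replace(/[^0-9]+/g, "x");
console.log(collapsed); // 12x34
```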

Regex to remove characters up to a certain point in a string

How do I use regex to convert
11111aA$xx1111xxdj$%%
to
aA$xx1111xxdj$%%
So, in other words, I want to remove (or match) the FIRST grouping of 1's.
Depending on the language, you should have a way to replace a string by regex. In Java, you can do it like this:
String s = "11111aA$xx1111xxdj$%%";
String res = s.replaceAll("^1+", "");
The ^ "anchor" indicates that the beginning of the input must be matched. The 1+ means a sequence of one or more 1 characters.
The same program in C#:
var rx = new Regex("^1+");
var s = "11111aA$xx1111xxdj$%%";
var res = rx.Replace(s, "");
Console.WriteLine(res);
In general, if you would like to make a match of anything only at the beginning of a string, add a ^ prefix to your expression; similarly, adding a $ at the end makes the match accept only strings at the end of your input.
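A short JavaScript sketch of both anchors on the question's sample string. The trailing-percent case is an added illustration, not something the question asked for.

```javascript
const sample = "11111aA$xx1111xxdj$%%";

// ^ ties the match to the start: only the leading run of 1s is removed.
const noLeadingOnes = sample.replace(/^1+/, "");
console.log(noLeadingOnes); // aA$xx1111xxdj$%%

// $ ties the match to the end: only the trailing run of % is removed.
const noTrailingPercents = sample.replace(/%+$/, "");
console.log(noTrailingPercents); // 11111aA$xx1111xxdj$
```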
If this is the beginning, you can use this:
^[1]*
As far as replacing, it depends on the language. In PowerShell, I would do this (single quotes keep PowerShell from expanding $xx1111xxdj as a variable):
[regex]::Replace('11111aA$xx1111xxdj$%%','^[1]*','')
This will return:
aA$xx1111xxdj$%%
If you only want to replace consecutive "1"s at the beginning of the string, replace the following with an empty string:
^1+
If the consecutive "1"s won't necessarily be the first characters in the string (but you still only want to replace one group), replace the following with the contents of the first capture group (usually \1 or $1):
1+(.*)
Note that this is only necessary if you only have a "replace all" capability available to you, but most regex implementations also provide a way to replace only one instance of a match, in which case you could just replace 1+ with an empty string.
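In JavaScript, for instance, replace() with a non-global regex replaces only the first match, so both variants give the same result. A sketch on a made-up input where the 1s are not at the start:

```javascript
const text = "ab111cd111";

// Non-global regex: only the first run of 1s is removed.
const firstOnly = text.replace(/1+/, "");
console.log(firstOnly); // abcd111

// Capture-group variant: match the first run plus the rest, keep the rest.
const viaGroup = text.replace(/1+(.*)/, "$1");
console.log(viaGroup); // abcd111

// For contrast, the g flag removes every run of 1s.
const allRuns = text.replace(/1+/g, "");
console.log(allRuns); // abcd
```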
I'm not sure, but you can try this:
[^1](\w*\d*\W)* - intended to match everything after the leading run of "1" characters, as a single match
In JavaScript:
var str = '11111aA$xx1111xxdj$%%';
var patt = /^1+/g;
str = str.replace(patt,"");

Regex to find substring between two strings

I'd like to capture the value of the Initial Catalog in this string:
"blah blah Initial Catalog = MyCat'"
I'd like the result to be: MyCat
There could or could not be spaces before and after the equal sign and there could or could not be spaces before the single quote.
Tried this and various others but no go:
/Initial Catalog\s?=\s?.*\s?\'/
Using .Net.
You need to put parentheses around the part of the string that you would like to match:
/Initial Catalog\s*=\s*(.*?)\s*'/
Also you would like to exclude as many spaces as possible before the ', so you need \s* rather than \s?. The .*? means that the extracted part of the string doesn't take those spaces, since it is now lazy.
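The question uses .NET, but the pattern itself is portable. A minimal JavaScript sketch showing that group 1 holds the value regardless of the optional spaces (initialCatalog is a hypothetical helper name):

```javascript
const catalogPattern = /Initial Catalog\s*=\s*(.*?)\s*'/;

// Hypothetical helper: returns the captured catalog name, or null if absent.
function initialCatalog(text) {
  const match = catalogPattern.exec(text);
  return match ? match[1] : null;
}

console.log(initialCatalog("blah blah Initial Catalog = MyCat'"));  // MyCat
console.log(initialCatalog("blah blah Initial Catalog=MyCat   '")); // MyCat
console.log(initialCatalog("no catalog here"));                     // null
```

Because (.*?) is lazy, it stops as soon as the trailing \s*' can match, so the spaces before the quote are left out of the capture.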
This is a nice regex
= *(.*?) *'
Use the idea and add \s and more literal text as needed.
In C# group 1 will contain the match
string resultString = null;
try {
Regex regexObj = new Regex("= *(.*?) *'");
resultString = regexObj.Match(subjectString).Groups[1].Value;
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
Regex rgx = new Regex(@"=\s*([A-Za-z]+)\s*'");
String result = rgx.Match(text).Groups[1].Value;