URL rewriting with regular expressions - regex

I want to extract 2 pieces of information from the URL below, "events/festivals" and "sandiego.storeboard.com".
How can I do that with a regular expression?
http://sandiego.storeboard.com/classifieds/events/festivals
I need this information for a URL rewrite in IIS 7

Try this:
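^http://([^/]*)/classifieds/([^/]*/[^/]*)
The [^/] snippet means "any character that is not a /". The first group captures the host name and the second captures the two path segments after /classifieds/. There is no trailing / in the pattern because the sample URL does not end with one.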
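As a rough illustration of how the two capture groups come out, here is a minimal sketch in Python rather than in IIS rewrite syntax. The function name split_classifieds_url is made up for this example, and the pattern is written without a required trailing slash, since the sample URL has none.

```python
import re

# Group 1: host name (everything up to the first "/" after "http://").
# Group 2: the two path segments that follow "/classifieds/".
URL_PATTERN = re.compile(r"^http://([^/]*)/classifieds/([^/]*/[^/]*)")


def split_classifieds_url(url):
    """Return (host, category_path), or None when the URL does not match."""
    match = URL_PATTERN.match(url)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_classifieds_url(
    "http://sandiego.storeboard.com/classifieds/events/festivals"))
```

In a rewrite rule the same groups would typically be referenced as back-references; how they are spelled depends on the rewrite engine.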

The following C# code will return the two strings that you requested.
class Program
{
static void Main(string[] args)
{
GroupCollection result = GetResult("http://sandiego.storeboard.com/classifieds/events/festivals");
Console.Write(result[1] + " " + result[2]);
Console.ReadLine();
}
private static GroupCollection GetResult(string url)
{
string reg = @".*?(\w+\.\w+\.com).*?(events\/festivals)";
return Regex.Match(url, reg).Groups;
}
}

It's not the fastest solution but it works:
(.*?)/classifieds/(.*)

Related

How to find a non-English string at a specific location in Java?

How to find a string at a specific location with regex?
choryangStn_110_220114_일_0.sbm
choryangStn_110_220114_이_0.sbm
choryangStn_110_220114_삼_0.sbm
I would like to extract 일, 이, 삼 from these file names.
I tried:
String filename = "choryangStn_110_220114_일_0.sbm";
filename.replaceAll(".*_(\\w+)_\\d+\\.\\w+", "$1");
This does not work properly.
I wonder how I can match either \\w or [가-힣].
filename.replaceAll(".*_(\\w+)||[가-힣]_\\d+\\.\\w+", "$1");
filename.replaceAll(".*_(\\w+||[가-힣])_\\d+\\.\\w+", "$1");
Neither of the statements above works properly.
How can I make this work?
You can use the following regex with replaceFirst():
(?U)^.*_(\\w+)_\\d+\\.\\w+$
The (?U) is an embedded flag option that is the equivalent of the Pattern.UNICODE_CHARACTER_CLASS option, which makes all shorthand character classes Unicode-aware. Without it, \w in Java matches only [a-zA-Z_0-9], which is why the Hangul characters were not captured.
See the Java demo:
import java.util.*;
import java.util.regex.*;
class Test
{
public static void main (String[] args) throws java.lang.Exception
{
String strings[] = {"choryangStn_110_220114_일_0.sbm",
"choryangStn_110_220114_이_0.sbm",
"choryangStn_110_220114_삼_0.sbm"
};
String regex = "(?U)^.*_(\\w+)_\\d+\\.\\w+$";
for(String text : strings)
{
System.out.println("'" + text + "' => '" + text.replaceFirst(regex, "$1") + "'");
}
}
}
Output:
'choryangStn_110_220114_일_0.sbm' => '일'
'choryangStn_110_220114_이_0.sbm' => '이'
'choryangStn_110_220114_삼_0.sbm' => '삼'
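To see the effect of a Unicode-aware \w in isolation, here is a small sketch in Python. It is only an analogy: Python's \w is Unicode-aware by default for str patterns, and the re.ASCII flag is used here to imitate Java's default ASCII-only behaviour. The helper name extract_label is hypothetical.

```python
import re

# Same shape as the Java pattern, without the (?U) flag.
PATTERN = r"^.*_(\w+)_\d+\.\w+$"


def extract_label(filename, ascii_only=False):
    """Return the captured label, or None if the pattern does not match."""
    # re.ASCII restricts \w to [a-zA-Z0-9_], similar to Java's default.
    flags = re.ASCII if ascii_only else 0
    match = re.match(PATTERN, filename, flags)
    return match.group(1) if match else None


name = "choryangStn_110_220114_일_0.sbm"
print(extract_label(name))                   # Unicode-aware \w
print(extract_label(name, ascii_only=True))  # ASCII-only \w
```

With the ASCII-only class the pattern cannot match at all, because the segment between the last two underscores contains no ASCII word characters.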

Splitting a string into parts, including quoted strings

So suppose I have this line:
print "Hello world!" out.txt
And I want to split it into:
print
"Hello world!"
out.txt
What would be the regular expression to match these?
Note that there must be a space between each of them. For example, if I had this:
print"Hello world!"out.txt
I would get:
print"Hello
world!"out.txt
The language I'm using is Haxe.
Expanding on Mark Knol's answer, this should work as expected for all your test strings so far:
static function main() {
var command = 'print "Hello to you world!" out.txt';
var regexp:EReg = ~/("[^"]+"|[^\s]+)/g;
var result = [];
var pos = 0;
while (regexp.matchSub(command, pos)) {
result.push(regexp.matched(0));
var match = regexp.matchedPos();
pos = match.pos + match.len;
}
trace(result);
}
Demo: http://try.haxe.org/#5c0B1
EDIT:
As pointed out in the comments, if your use case is to split the different parts of a command line, it would be better to have a parser handle it rather than a regex.
These libs might help:
https://github.com/Simn/hxargs
https://github.com/Ohmnivore/HxCLAP
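The matching approach above (collect either a quoted run or a run of non-whitespace characters) can be sketched in Python as follows. This is an illustration of the same idea, not a translation of the Haxe API; split_command is a name chosen for this example.

```python
import re

# Either a double-quoted run without inner quotes, or a run of non-whitespace.
TOKEN = re.compile(r'"[^"]+"|\S+')


def split_command(command):
    """Return the tokens of a command line, keeping quoted strings whole."""
    return TOKEN.findall(command)


print(split_command('print "Hello world!" out.txt'))
print(split_command('print"Hello world!"out.txt'))
```

Because a quoted run is only recognised when a token starts with a quote, the second call splits at the space inside the quotes, which matches the behaviour described in the question.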
You can use regular expressions in Haxe using the EReg api class:
Demo:
http://try.haxe.org/#76Ea0
class Test {
static function main() {
var command = 'print "Hello world!" out.txt';
var regexp:EReg = ~/\s(?![\w!.]+")/g;
var result = regexp.replace(command, "\n");
js.Browser.alert(result);
}
}
About Haxe regular expressions:
http://haxe.org/manual/std-regex.html
About regular expressions replacement:
http://haxe.org/manual/std-regex-replace.html
EReg class API documentation:
http://api.haxe.org/EReg.html
\s(?![\w!.]+"\s)
This pattern worked for both of these cases; there may well be a better solution.

What is the equivalent of JavaScript's "s.replace(/[^\w]+/g, '-')" in Dart language?

I am trying to get the following working code in JavaScript also working in Dart.
https://jsfiddle.net/8xyxy8jp/1/
var s = "We live, on the # planet earth";
var results = s.replace(/[^\w]+/g, '-');
document.getElementById("output").innerHTML = results;
Which gives the output
We-live-on-the-planet-earth
I have tried this Dart code
void main() {
print( "We live, on the # planet earth".replaceAll("[^\w]+","-"));
}
But the output is the same as the input.
What am I missing here?
If you want replaceAll() to treat the argument as a regular expression, you need to pass a RegExp instance. I usually prefix the regex string with r to make it a raw string, in which no interpolation or escape processing ($, \, ...) takes place.
main() {
var s = "We live, on the # planet earth";
var result = s.replaceAll(new RegExp(r'[^\w]+'), '-');
print(result);
}
Try it in DartPad
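The same distinction between a literal replacement and a pattern replacement exists in other languages. A minimal sketch in Python, shown only as an analogy to the Dart behaviour (the function names are made up for this example):

```python
import re

s = "We live, on the # planet earth"


def slugify_literal(text):
    # str.replace looks for the literal characters [^\w]+ and finds none.
    return text.replace(r"[^\w]+", "-")


def slugify_regex(text):
    # re.sub compiles the same string as a regular expression.
    return re.sub(r"[^\w]+", "-", text)


print(slugify_literal(s))
print(slugify_regex(s))
```

The literal version leaves the input untouched, which mirrors what happens when a plain string is passed to replaceAll() in Dart.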

use regex to check if string/url does not begin with specified string

[RegularExpression(), ErrorMessage = "Youtube link must start with www.youtube.com/watch?v=")]
I need to check if Link does NOT begin with: http://www.youtube.com/watch?v=
I've just created an MVC project and tested the following:
[RegularExpression("^((?!http://www.youtube.com/watch\\?v=).)*$")]
This seems to work.
If you need to check that the text does begin with a youtube link (rather than does not begin) then you can use:
[RegularExpression("http://www.youtube.com/watch\\?v=.*")]
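One thing worth checking with the first pattern is that the ((?!...).)* form rejects the prefix anywhere in the string, not only at the start. The sketch below compares it with an anchored lookahead in Python. It assumes whole-string matching (imitated with fullmatch) and escapes the dots, which the attribute above does not; the names are hypothetical.

```python
import re

PREFIX = r"http://www\.youtube\.com/watch\?v="

# Tempered form: no position in the string may start the prefix.
NOT_ANYWHERE = re.compile(r"^((?!" + PREFIX + r").)*$")
# Anchored form: only the start of the string is checked.
NOT_AT_START = re.compile(r"^(?!" + PREFIX + r").*$")


def passes(pattern, text):
    """True when the whole text is accepted by the pattern."""
    return pattern.fullmatch(text) is not None


for text in ("http://www.youtube.com/watch?v=abc",
             "http://example.com/",
             "see http://www.youtube.com/watch?v=abc"):
    print(text, passes(NOT_ANYWHERE, text), passes(NOT_AT_START, text))
```

If only the beginning of the link matters, the anchored form is the closer fit; the tempered form is stricter.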
Try this code, Alan:
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string txt="http://www.youtube.com/watch?v=";
string re1="^(http:\\/\\/)"; // scheme, anchored to the start of the string
string re2="(www\\.youtube\\.com\\/watch\\?v=)"; // host and path prefix
Regex r = new Regex(re1+re2,RegexOptions.IgnoreCase|RegexOptions.Singleline);
Match m = r.Match(txt);
if (m.Success)
{
String httpurl1=m.Groups[1].ToString();
String file1=m.Groups[2].ToString();
Console.Write("("+httpurl1.ToString()+")"+"("+file1.ToString()+")"+"\n");
}
Console.ReadLine();
}
}
}
If you only want to check for that specific string, a regex is not needed.
Just do something like int position = link.IndexOf("http://www.youtube.com/watch?v="); (where link is the string to test) and check whether position is 0.
EDIT:
If you really need a regular expression, you could try this: ^(?!http:\/\/www\.youtube\.com\/watch\?v=).*
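To show that the plain prefix check and the negative lookahead agree, here is a short sketch in Python. It is an illustration only; the function names are invented for this example, and re.escape is used to build the pattern from the literal prefix.

```python
import re

PREFIX = "http://www.youtube.com/watch?v="

# Negative lookahead at the start: succeeds only when the prefix is absent.
NOT_PREFIXED = re.compile(r"^(?!" + re.escape(PREFIX) + r")")


def lacks_prefix_plain(link):
    """Prefix check without any regex."""
    return not link.startswith(PREFIX)


def lacks_prefix_regex(link):
    """The same check expressed as a regular expression."""
    return NOT_PREFIXED.match(link) is not None


for link in ("http://www.youtube.com/watch?v=abc", "https://example.com/"):
    print(link, lacks_prefix_plain(link), lacks_prefix_regex(link))
```

The plain check is usually easier to read; the regex form is mainly useful where an API accepts only a pattern, as with a validation attribute.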

How do I check if a filename matches a wildcard pattern

I've got a wildcard pattern, perhaps "*.txt" or "POS??.dat".
I also have list of filenames in memory that I need to compare to that pattern.
How would I do that, keeping in mind I need exactly the same semantics that IO.DirectoryInfo.GetFiles(pattern) uses.
EDIT: Blindly translating this into a regex will NOT work.
I have a complete answer in code for you that's 95% like FindFiles(string).
The 5% that isn't there is the short names/long names behavior in the second note on the MSDN documentation for this function.
If you would still like to get that behavior, you'll have to compute the short name of each string in the input array, and then add the long name to the collection of matches if either the long or the short name matches the pattern.
Here is the code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace FindFilesRegEx
{
class Program
{
static void Main(string[] args)
{
string[] names = { "hello.t", "HelLo.tx", "HeLLo.txt", "HeLLo.txtsjfhs", "HeLLo.tx.sdj", "hAlLo20984.txt" };
string[] matches;
matches = FindFilesEmulator("hello.tx", names);
matches = FindFilesEmulator("H*o*.???", names);
matches = FindFilesEmulator("hello.txt", names);
matches = FindFilesEmulator("lskfjd30", names);
}
public static string[] FindFilesEmulator(string pattern, string[] names)
{
List<string> matches = new List<string>();
Regex regex = FindFilesPatternToRegex.Convert(pattern);
foreach (string s in names)
{
if (regex.IsMatch(s))
{
matches.Add(s);
}
}
return matches.ToArray();
}
internal static class FindFilesPatternToRegex
{
private static Regex HasQuestionMarkRegEx = new Regex(@"\?", RegexOptions.Compiled);
private static Regex IllegalCharactersRegex = new Regex("[" + @"\/:<>|" + "\"]", RegexOptions.Compiled);
private static Regex CatchExtentionRegex = new Regex(@"^\s*.+\.([^\.]+)\s*$", RegexOptions.Compiled);
private static string NonDotCharacters = @"[^.]*";
public static Regex Convert(string pattern)
{
if (pattern == null)
{
throw new ArgumentNullException();
}
pattern = pattern.Trim();
if (pattern.Length == 0)
{
throw new ArgumentException("Pattern is empty.");
}
if(IllegalCharactersRegex.IsMatch(pattern))
{
throw new ArgumentException("Pattern contains illegal characters.");
}
bool hasExtension = CatchExtentionRegex.IsMatch(pattern);
bool matchExact = false;
if (HasQuestionMarkRegEx.IsMatch(pattern))
{
matchExact = true;
}
else if(hasExtension)
{
matchExact = CatchExtentionRegex.Match(pattern).Groups[1].Length != 3;
}
string regexString = Regex.Escape(pattern);
regexString = "^" + Regex.Replace(regexString, @"\\\*", ".*");
regexString = Regex.Replace(regexString, @"\\\?", ".");
if(!matchExact && hasExtension)
{
regexString += NonDotCharacters;
}
regexString += "$";
Regex regex = new Regex(regexString, RegexOptions.Compiled | RegexOptions.IgnoreCase);
return regex;
}
}
}
}
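For readers who want to experiment with the conversion rules outside .NET, here is a rough Python port of the logic above. It is a sketch that follows the C# code step by step (including the rule that a three-character extension also matches longer extensions) and does not attempt the short-name behaviour either. The function names are adapted for this example.

```python
import re

ILLEGAL_CHARS = re.compile(r'[\\/:<>|"]')
EXTENSION = re.compile(r"^\s*.+\.([^.]+)\s*$")


def pattern_to_regex(pattern):
    """Convert a wildcard pattern into a case-insensitive compiled regex."""
    pattern = pattern.strip()
    if not pattern:
        raise ValueError("Pattern is empty.")
    if ILLEGAL_CHARS.search(pattern):
        raise ValueError("Pattern contains illegal characters.")
    ext = EXTENSION.match(pattern)
    if "?" in pattern:
        match_exact = True
    elif ext:
        # Only a three-character extension gets the "longer extension" rule.
        match_exact = len(ext.group(1)) != 3
    else:
        match_exact = False
    # Escape everything, then turn the escaped wildcards back into regex.
    body = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    if ext and not match_exact:
        body += "[^.]*"
    return re.compile(body, re.IGNORECASE)


def find_files_emulator(pattern, names):
    """Return the names that match the wildcard pattern as a whole."""
    regex = pattern_to_regex(pattern)
    return [name for name in names if regex.fullmatch(name)]


names = ["hello.t", "HelLo.tx", "HeLLo.txt", "HeLLo.txtsjfhs",
         "HeLLo.tx.sdj", "hAlLo20984.txt"]
print(find_files_emulator("hello.txt", names))
```

Note how "hello.txt" also picks up "HeLLo.txtsjfhs", while "hello.tx" would match only the exact name.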
You can simply do this. You do not need regular expressions.
using Microsoft.VisualBasic.CompilerServices;
if (Operators.LikeString("pos123.txt", "pos?23.*", CompareMethod.Text))
{
Console.WriteLine("Filename matches pattern");
}
Or, in VB.Net,
If "pos123.txt" Like "pos?23.*" Then
Console.WriteLine("Filename matches pattern")
End If
In c# you could simulate this with an extension method. It wouldn't be exactly like VB Like, but it would be like...very cool.
You could translate the wildcards into a regular expression:
*.txt -> ^.*\.txt$
POS??.dat -> ^POS..\.dat$
Use the Regex.Escape method to escape the characters that are not wildcards into literal strings for the pattern (e.g. converting ".txt" to "\.txt").
The wildcard * translates into .* (zero or more characters), and ? translates into . (exactly one character).
Put ^ at the beginning of the pattern to match the beginning of the string, and $ at the end to match the end of the string.
Now you can use the Regex.IsMatch method to check if a file name matches the pattern.
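A compact sketch of this translation in Python, for illustration. It assumes * should match zero or more characters (so it is translated to .*), ignores case, and does not reproduce the special cases of GetFiles; wildcard_match is a hypothetical helper name.

```python
import re


def wildcard_match(pattern, filename):
    """Check a file name against a pattern that uses * and ? wildcards."""
    # Escape every character, then restore the two wildcards as regex.
    body = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    # fullmatch plays the role of the ^ and $ anchors.
    return re.fullmatch(body, filename, re.IGNORECASE) is not None


print(wildcard_match("*.txt", "notes.TXT"))
print(wildcard_match("POS??.dat", "POS12.dat"))
print(wildcard_match("POS??.dat", "POS123.dat"))
```

Escaping first and substituting afterwards keeps characters such as the dot literal, so "*.txt" does not accidentally match "notes_txt".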
Just call the Windows API function PathMatchSpecExW().
[Flags]
public enum MatchPatternFlags : uint
{
Normal = 0x00000000, // PMSF_NORMAL
Multiple = 0x00000001, // PMSF_MULTIPLE
DontStripSpaces = 0x00010000 // PMSF_DONT_STRIP_SPACES
}
class FileName
{
[DllImport("Shlwapi.dll", SetLastError = false)]
static extern int PathMatchSpecExW([MarshalAs(UnmanagedType.LPWStr)] string file,
[MarshalAs(UnmanagedType.LPWStr)] string spec,
MatchPatternFlags flags);
/*******************************************************************************
* Function: MatchPattern
*
* Description: Matches a file name against one or more file name patterns.
*
* Arguments: file - File name to check
* spec - Name pattern(s) to search for
* flags - Flags to modify search condition (MatchPatternFlags)
*
* Return value: Returns true if name matches the pattern.
*******************************************************************************/
public static bool MatchPattern(string file, string spec, MatchPatternFlags flags)
{
if (String.IsNullOrEmpty(file))
return false;
if (String.IsNullOrEmpty(spec))
return true;
int result = PathMatchSpecExW(file, spec, flags);
return (result == 0);
}
}
Some kind of regex/glob is the way to go, but there are some subtleties; your question indicates you want identical semantics to IO.DirectoryInfo.GetFiles. That could be a challenge, because of the special cases involving 8.3 vs. long file names and the like. The whole story is on MSDN.
If you don't need an exact behavioral match, there are a couple of good SO questions:
glob pattern matching in .NET
How to implement glob in C#
For anyone who comes across this question now that it is years later, I found over at the MSDN social boards that the GetFiles() method will accept * and ? wildcard characters in the searchPattern parameter. (At least in .Net 3.5, 4.0, and 4.5)
Directory.GetFiles(string path, string searchPattern)
http://msdn.microsoft.com/en-us/library/wz42302f.aspx
Please try the code below.
static void Main(string[] args)
{
string _wildCardPattern = "*.txt";
List<string> _fileNames = new List<string>();
_fileNames.Add("text_file.txt");
_fileNames.Add("csv_file.csv");
Console.WriteLine("\nFilenames that match the [{0}] pattern are: ", _wildCardPattern);
foreach (string _fileName in _fileNames)
{
CustomWildCardPattern _pattern = new CustomWildCardPattern(_wildCardPattern);
if (_pattern.IsMatch(_fileName))
{
Console.WriteLine("{0}", _fileName);
}
}
}
public class CustomWildCardPattern : Regex
{
public CustomWildCardPattern(string wildCardPattern)
: base(WildcardPatternToRegex(wildCardPattern))
{
}
public CustomWildCardPattern(string wildcardPattern, RegexOptions regexOptions)
: base(WildcardPatternToRegex(wildcardPattern), regexOptions)
{
}
private static string WildcardPatternToRegex(string wildcardPattern)
{
string patternWithWildcards = "^" + Regex.Escape(wildcardPattern).Replace("\\*", ".*");
patternWithWildcards = patternWithWildcards.Replace("\\?", ".") + "$";
return patternWithWildcards;
}
}
For searching against a specific pattern, it might be worth using File Globbing which allows you to use search patterns like you would in a .gitignore file.
See here: https://learn.microsoft.com/en-us/dotnet/core/extensions/file-globbing
This allows you to add both inclusions & exclusions to your search.
The following example snippet is taken from the Microsoft documentation linked above:
Matcher matcher = new Matcher();
matcher.AddIncludePatterns(new[] { "*.txt" });
IEnumerable<string> matchingFiles = matcher.GetResultsInFullPath(filepath);
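The include/exclude idea can be tried on an in-memory list of names with a few lines of Python. This is only an analogy built on Python's fnmatch module, whose pattern rules are not the same as those of the .NET Matcher class; select_files is a name made up for this example.

```python
from fnmatch import fnmatchcase


def select_files(names, include, exclude=()):
    """Keep names matching any include pattern and no exclude pattern."""
    selected = []
    for name in names:
        included = any(fnmatchcase(name, pattern) for pattern in include)
        excluded = any(fnmatchcase(name, pattern) for pattern in exclude)
        if included and not excluded:
            selected.append(name)
    return selected


print(select_files(["a.txt", "b.csv", "draft_c.txt"], ["*.txt"], ["draft_*"]))
```

fnmatchcase is used so that the comparison is case-sensitive on every platform; a case-insensitive variant would need the names and patterns to be normalised first.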
A plain wildcard-to-regex conversion is case-sensitive, unlike Windows file names. Passing RegexOptions.IgnoreCase fixes that:
public class WildcardPattern : Regex {
public WildcardPattern(string wildCardPattern)
: base(ConvertPatternToRegex(wildCardPattern), RegexOptions.IgnoreCase) {
}
public WildcardPattern(string wildcardPattern, RegexOptions regexOptions)
: base(ConvertPatternToRegex(wildcardPattern), regexOptions) {
}
private static string ConvertPatternToRegex(string wildcardPattern) {
string patternWithWildcards = Regex.Escape(wildcardPattern).Replace("\\*", ".*");
patternWithWildcards = string.Concat("^", patternWithWildcards.Replace("\\?", "."), "$");
return patternWithWildcards;
}
}