I'm wondering if there is a function like preg_match in PHP where I can find or match a string with another string.
Array `word` (index)      Array `part` (expected result)
"Backdoor"  0             "mark"  3   (matches "Market")
"DVD"       1             "of"    2   (matches "Get off")
"Get off"   2             ""      -1  (no match)
"Market"    3             "VD"    1   (matches "DVD")
A function that can match just part of a string would be ideal. As far as I know there is only strcmp, but that compares whole strings, so in my case it would never report a match.
std::strstr(). It doesn't do regexes, but it does do simple string-in-string matching.
const char *foo = "Quick brown fox";
const char *bar = "brown";
printf("%td\n", strstr(foo, bar) - foo); // Displays "6" (%td prints a ptrdiff_t)
And as you're in C++, there's also std::string::find():
std::string foo = "Quick brown fox";
std::string bar = "brown";
std::cout << foo.find(bar) << "\n"; // Displays "6"
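The table in the question expects "mark" to match "Market", which neither strstr nor find will do because both are case-sensitive. Here is a minimal sketch of a case-insensitive variant built on std::search. The names contains_ignore_case and find_word_index are made up for illustration, and treating an empty part as "no match" is an assumption taken from the question's table:

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Case-insensitive "does haystack contain needle?" built on std::search.
// An empty needle is treated as "no match" to mirror the table in the question.
bool contains_ignore_case(const std::string& haystack, const std::string& needle) {
    if (needle.empty()) return false;
    auto it = std::search(
        haystack.begin(), haystack.end(), needle.begin(), needle.end(),
        [](unsigned char a, unsigned char b) {
            return std::tolower(a) == std::tolower(b);
        });
    return it != haystack.end();
}

// Returns the index of the first word containing `part`, or -1 if none does.
int find_word_index(const std::vector<std::string>& words, const std::string& part) {
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (contains_ignore_case(words[i], part)) return static_cast<int>(i);
    }
    return -1;
}
```

With the four words from the question, "mark" gives 3, "of" gives 2, "" gives -1 and "VD" gives 1.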
You can use std::string::find().
You can also use std::strstr().
As another alternative, you can implement the search yourself using dynamic programming or backtracking (dynamic programming performs better).
I know this question is not really an algorithmic problem, but I think this answer can still be useful.
I'm trying to create a regex to match the last word of a string, but only if the string starts with a certain pattern.
For example, I want to get the last word of a string only if the string starts with "The cat".
"The cat eats butter" -> would match "butter".
"The cat drinks milk"-> would match "milk"
"The dog eats beef" -> would find no match.
I know the following will give me the last word:
\s+\S*$
I also know that I can use a positive look behind to make sure a string starts with a certain pattern:
(?<=The cat )
But I can't figure out how to combine them.
I'll be using this in C#, and I know I could combine this with some string comparison operators, but I'd like this all to be in one regex, as this is one of several regex pattern strings that I'll be looping through.
Any ideas?
Use the following regex:
^The cat.*?\s+(\S+)$
Details:
^ - Start of the string.
The cat - The "starting" pattern.
.*? - A sequence of arbitrary chars, reluctant version.
\s+ - A sequence of "white" chars.
(\S+) - A capturing group - sequence of "non-white" chars,
this is what you want to capture.
$ - End of the string.
So the last word will be in the first capturing group.
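For readers who want to try the capturing-group approach outside C#, here is a rough C++ sketch using std::regex. Its default ECMAScript grammar has no lookbehind, so the capturing group is the natural fit there. The function name last_word_after_prefix is hypothetical:

```cpp
#include <regex>
#include <string>

// Stores the last word of `line` in `word` and returns true,
// but only if the line starts with "The cat"; otherwise returns false.
bool last_word_after_prefix(const std::string& line, std::string& word) {
    // Same pattern as above; group 1 holds the last word.
    static const std::regex pattern(R"(^The cat.*?\s+(\S+)$)");
    std::smatch m;
    if (!std::regex_match(line, m, pattern)) return false;
    word = m[1].str();
    return true;
}
```

Calling it on "The cat eats butter" yields "butter", while "The dog eats beef" returns false.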
What about this one?
^The\scat.*\s(\w+)$
My regex knowledge is quite rusty, but couldn't you simply "add" the words you are looking for at the start of \s+\S*$, if you know that will return the last word?
Something like this then (the "\" is supposed to be the escape sign so it's read as the actual word):
\T\h\e\ \c\a\t\ \s+\S*$
Without Regex
No need for regex. Just use C#'s String.StartsWith together with String.Split(' ') and LINQ's Last().
using System;
using System.Linq;
using System.Text.RegularExpressions;
class Example {
static void Main() {
string[] strings = {
"The cat eats butter",
"The cat drinks milk",
"The dog eats beef"
};
foreach(string s in strings) {
if(s.StartsWith("The cat")) {
Console.WriteLine(s.Split(' ').Last());
}
}
}
}
Result:
butter
milk
With Regex
If you prefer, however, a regex solution, you may use the following.
using System;
using System.Text.RegularExpressions;
class Example {
static void Main() {
string[] strings = {
"The cat eats butter",
"The cat drinks milk",
"The dog eats beef"
};
Regex regex = new Regex(@"(?<=^The cat.*)\b\w+$");
foreach(string s in strings) {
Match m = regex.Match(s);
if(m.Success) {
Console.WriteLine(m.Value);
}
}
}
}
Result:
butter
milk
This causes infinite loop:
std::regex_replace("the string", std::regex(".*"), "whatevs");
This DOES NOT cause infinite loop:
std::regex_replace("the string", std::regex("^.*$"), "whatevs");
What is wrong with the Mac regex implementation? I'm using Mac OS X El Capitan with Xcode 7.1.
This question is related to: C++ Mac OS infinite loop in regex_replace if given blank regex expression
The .* matches the whole string first, and then the empty string at the end because * means "match 0 or more occurrences of the preceding subpattern". The empty string match is probably the cause of the infinite loop, but I'm not sure whether it's a bug or by-design.
You can override the behavior using std::regex_constants::match_not_null (see regex_replace c++ reference):
match_not_null (Not null): Empty sequences do not match.
C++ code demo returning whatevs only:
std::regex reg(".*");
std::string s = "the string";
std::cout << std::regex_replace(s, reg, "whatevs",
std::regex_constants::match_not_null) << std::endl;
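To see the two matches described above (the whole string, then the empty string at the end), here is a small sketch that lists every match as a (position, length) pair. The helper name match_spans is made up for illustration, and the counts in the note below are what a standard-conforming library should report:

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Collects (position, length) for every match of `pattern` in `text`.
std::vector<std::pair<std::size_t, std::size_t>> match_spans(
    const std::string& text, const std::string& pattern,
    std::regex_constants::match_flag_type flags = std::regex_constants::match_default) {
    std::regex re(pattern);
    std::vector<std::pair<std::size_t, std::size_t>> spans;
    for (std::sregex_iterator it(text.begin(), text.end(), re, flags), end; it != end; ++it) {
        spans.emplace_back(static_cast<std::size_t>(it->position()),
                           static_cast<std::size_t>(it->length()));
    }
    return spans;
}
```

For "the string" and ".*" this should give (0, 10) followed by the empty match (10, 0). Passing std::regex_constants::match_not_null should leave only the first one.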
Note that the "infinite loop" you observe is most likely a bug since the source code hints that an exception should be thrown once an empty string is passed to the regex engine. It is not yet logged anywhere. I think (not sure) the issue might be with how the string is handled by the regex_replace method when matches are collected for a replace operation.
Here is what happens: The regex_replace calls
basic_string<_Elem, _Traits1, _Alloc1> regex_replace(const basic_string<_Elem, _Traits1, _Alloc1>& _Str, const basic_regex<_Elem, _RxTraits>& _Re, const _Elem *_Ptr, regex_constants::match_flag_type _Flgs = regex_constants::match_default)
{ // search and replace, string result, string target, NTBS format
basic_string<_Elem, _Traits1, _Alloc1> _Res;
const basic_string<_Elem> _Fmt(_Ptr);
regex_replace(_STD back_inserter(_Res), _Str.begin(), _Str.end(),
_Re, _Fmt, _Flgs);
return (_Res);
}
_Res is an empty string, _Fmt is now whatevs. Then, the regex_replace is called. _Str.end() equals 10, and a pointer is initialized.
_First equals the string and _Last equals an empty string.
It happens as a result of internal char buffer processing whose pointer actually contains an array of:
The inline back_insert_iterator<_Container> back_inserter(_Container& _Cont) method first creates a string out of the first 0 to 9 chars, and then from 10 to 15 array elements (the one starting with the null terminator).
stribizhev's answer inspired this one. Here are example results using various flags:
GOOD
boost::regex_replace(input, match, replace, input.empty() ? boost::regex_constants::match_default : boost::regex_constants::match_not_null);
results:
input: ""
match: ".*"
replace: "a"
output: "a"
input: "something"
match: ".*"
replace: "a"
output: "a"
BAD
boost::regex_replace(input, match, replace, boost::regex_constants::match_not_null);
results:
input: ""
match: ".*"
replace: "a"
output: ""
input: "something"
match: ".*"
replace: "a"
output: "a"
BAD
boost::regex_replace(input, match, replace);
results:
input: ""
match: ".*"
replace: "a"
output: "a"
input: "something"
match: ".*"
replace: "a"
output: "aa"
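The "GOOD" variant above should carry over to std::regex more or less directly. This is only a sketch, under the assumption that std::regex_replace treats these flags the way the Boost results suggest. The wrapper name replace_non_empty is hypothetical:

```cpp
#include <regex>
#include <string>

// Replaces matches of `re` in `input`, refusing empty matches unless the
// input itself is empty (in which case the empty match is the only one).
std::string replace_non_empty(const std::string& input, const std::regex& re,
                              const std::string& replacement) {
    const std::regex_constants::match_flag_type flags =
        input.empty() ? std::regex_constants::match_default
                      : std::regex_constants::match_not_null;
    return std::regex_replace(input, re, replacement, flags);
}
```

With the pattern ".*" and replacement "a", both an empty input and "something" should come out as "a".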
I need to determine whether a string begins with a number - I've tried the following to no avail:
if (matches("^[0-9].*)", upper(text))) str = "Title"""
I'm new to DXL and Regex - what am I doing wrong?
You need the caret character to indicate a match only at the start of a string. I added the plus character to match all the leading digits, although you might not need it for your situation. If you're only checking for a number at the start, and don't care whether anything follows, you don't need anything more.
string str1 = "123abc"
string str2 = "abc123"
string strgx = "^[0-9]+"
Regexp rgx = regexp2(strgx)
if(rgx(str1)) { print str1[match 0] "\n" } else { print "no match\n" }
if(rgx(str2)) { print str2[match 0] "\n" } else { print "no match\n" }
The code block above will print:
123
no match
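The same anchored pattern is not specific to DXL. As a rough illustration in C++ (not DXL), a sketch with std::regex could look like this. The function name leading_digits is made up, and an empty result stands for "no match":

```cpp
#include <regex>
#include <string>

// Returns the run of digits at the start of `text`, or "" if there is none.
std::string leading_digits(const std::string& text) {
    static const std::regex digits("^[0-9]+");
    std::smatch m;
    return std::regex_search(text, m, digits) ? m.str(0) : std::string();
}
```

For "123abc" this returns "123", and for "abc123" it returns an empty string.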
@mrhobo is correct, you want something like this:
Regexp numReg = "^[0-9]"
if(numReg text) str = "Title"
You don't need upper since you are just looking for numbers. Also matches is more for finding the part of the string that matches the expression. If you just want to check that the string as a whole matches the expression then the code above would be more efficient.
Good luck!
At least from the example I found, this should work:
Regexp plural = regexp "^([0-9].*)$"
if plural "15systems" then print "yes"
Resource:
http://www.scenarioplus.org.uk/papers/dxl_regexp/dxl_regexp.htm
How can I get a string that is between two other declared strings, for example:
String 1 = "[STRING1]"
String 2 = "[STRING2]"
Source:
"832h0ufhu0sdf4[STRING1]I need this text here[STRING2]afyh0fhdfosdfndsf"
How can I get the "I need this text here"?
Since this is homework, only clues:
Find index1 of occurrence of String1
Find index2 of occurrence of String2
Substring from index1+lengthOf(String1) (inclusive) to index2 (exclusive) is what you need
Copy this to a result buffer if necessary (don't forget to null-terminate)
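The clues above can be sketched with std::string::find, which avoids manual buffers and null termination. The function name extract_between is hypothetical, and looking for String2 only after the end of String1 is a design choice of this sketch:

```cpp
#include <cstddef>
#include <string>

// Copies the text between the first `start` marker and the next `end`
// marker into `out`. Returns false if either marker is missing.
bool extract_between(const std::string& source, const std::string& start,
                     const std::string& end, std::string& out) {
    const std::size_t i1 = source.find(start);
    if (i1 == std::string::npos) return false;
    const std::size_t from = i1 + start.size();   // first char after String1
    const std::size_t i2 = source.find(end, from); // search only past String1
    if (i2 == std::string::npos) return false;
    out = source.substr(from, i2 - from);          // [from, i2)
    return true;
}
```

On the source string from the question, with "[STRING1]" and "[STRING2]" as markers, this yields "I need this text here".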
Might be a good case for std::regex, which is part of C++11.
#include <iostream>
#include <string>
#include <regex>
int main()
{
using namespace std::string_literals;
auto start = "\\[STRING1\\]"s;
auto end = "\\[STRING2\\]"s;
std::regex base_regex(start + "(.*)" + end);
auto example = "832h0ufhu0sdf4[STRING1]I need this text here[STRING2]afyh0fhdfosdfndsf"s;
std::smatch base_match;
std::string matched;
if (std::regex_search(example, base_match, base_regex)) {
// The first sub_match is the whole string; the next
// sub_match is the first parenthesized expression.
if (base_match.size() == 2) {
matched = base_match[1].str();
}
}
std::cout << "example: \""<<example << "\"\n";
std::cout << "matched: \""<<matched << "\"\n";
}
Prints:
example: "832h0ufhu0sdf4[STRING1]I need this text here[STRING2]afyh0fhdfosdfndsf"
matched: "I need this text here"
What I did was write a program that builds two strings, start and end, which serve as my start and end markers. I then build a regular expression that looks for those and captures anything in between (including nothing). Then I use regex_search to find the matching part of the input, and set matched to the captured string.
For more info, see http://en.cppreference.com/w/cpp/regex and http://en.cppreference.com/w/cpp/regex/regex_search
Use strstr (http://www.cplusplus.com/reference/clibrary/cstring/strstr/). Call it once for each delimiter to get two pointers, then compare them: if pointer1 < pointer2, read all the chars between them, starting after the end of the first delimiter.
I want to split a command-line-like string into individual string parameters. What would the regular expression for this look like? The problem is that the parameters can be quoted. For example:
"param 1" param2 "param 3"
should result in:
param 1, param2, param 3
You should not use regular expressions for this. Write a parser instead, or use one provided by your language.
I don't see why I get downvoted for this. This is how it could be done in Python:
>>> import shlex
>>> shlex.split('"param 1" param2 "param 3"')
['param 1', 'param2', 'param 3']
>>> shlex.split('"param 1" param2 "param 3')
Traceback (most recent call last):
[...]
ValueError: No closing quotation
>>> shlex.split('"param 1" param2 "param 3\\""')
['param 1', 'param2', 'param 3"']
Now tell me that racking your brain about how a regex will solve this problem is ever worth the hassle.
I tend to use regexlib for this kind of problem. If you go to: http://regexlib.com/ and search for "command line" you'll find three results which look like they are trying to solve this or similar problems - should be a good start.
This may work:
http://regexlib.com/Search.aspx?k=command+line&c=-1&m=-1&ps=20
("[^"]+"|[^\s"]+)
what i use
C++
#include <iostream>
#include <iterator>
#include <string>
#include <regex>
void foo()
{
std::string strArg = " \"par 1\" par2 par3 \"par 4\"";
std::regex word_regex( "(\"[^\"]+\"|[^\\s\"]+)" );
auto words_begin =
std::sregex_iterator(strArg.begin(), strArg.end(), word_regex);
auto words_end = std::sregex_iterator();
for (std::sregex_iterator i = words_begin; i != words_end; ++i)
{
std::smatch match = *i;
std::string match_str = match.str();
std::cout << match_str << '\n';
}
}
Output:
"par 1"
par2
par3
"par 4"
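Note that the output above still carries the surrounding quotes. If you want them stripped, one option is to capture the quoted and unquoted forms in separate groups. This is only a sketch: the name split_args is made up, and it makes no attempt to handle escaped quotes or an unbalanced quote:

```cpp
#include <regex>
#include <string>
#include <vector>

// Splits a command-line-like string into arguments, dropping the
// double quotes around quoted arguments.
std::vector<std::string> split_args(const std::string& line) {
    // Group 1: contents of a quoted argument. Group 2: an unquoted argument.
    static const std::regex token(R"re("([^"]*)"|([^\s"]+))re");
    std::vector<std::string> args;
    for (std::sregex_iterator it(line.begin(), line.end(), token), end; it != end; ++it) {
        const std::smatch& m = *it;
        args.push_back(m[1].matched ? m[1].str() : m[2].str());
    }
    return args;
}
```

For the input from the question this gives the three strings param 1, param2 and param 3.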
Without regard to implementation language, your regex might look something like this:
("[^"]*"|[^"]+)(\s+|$)
The first part "[^"]*" looks for a quoted string that doesn't contain embedded quotes, and the second part [^"]+ looks for a sequence of non-quote characters. The \s+ matches a separating sequence of spaces, and $ matches the end of the string.
Regex: /[\/-]?((\w+)(?:[=:]("[^"]+"|[^\s"]+))?)(?:\s+|$)/g
Sample: /P1="Long value" /P2=3 /P3=short PwithoutSwitch1=any PwithoutSwitch2
This regex can parse a parameter list built by the following rules:
Parameters are separated by spaces (one or more).
A parameter can start with a switch symbol (/ or -).
A parameter consists of a name and a value, divided by = or :.
The name can be any sequence of alphanumerics and underscores.
The value can be absent.
If the value exists it can be any sequence of symbols, but if it contains a space then the value must be quoted.
This regex has three groups:
the first group contains whole parameters without switch symbol,
the second group contains name only,
the third group contains value (if it exists) only.
For sample above:
Whole match: /P1="Long value"
Group#1: P1="Long value",
Group#2: P1,
Group#3: "Long value".
Whole match: /P2=3
Group#1: P2=3,
Group#2: P2,
Group#3: 3.
Whole match: /P3=short
Group#1: P3=short,
Group#2: P3,
Group#3: short.
Whole match: PwithoutSwitch1=any
Group#1: PwithoutSwitch1=any,
Group#2: PwithoutSwitch1,
Group#3: any.
Whole match: PwithoutSwitch2
Group#1: PwithoutSwitch2,
Group#2: PwithoutSwitch2,
Group#3: absent.
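For anyone who wants to experiment with these groups in C++, here is a sketch that applies the same pattern through std::sregex_iterator (without the /g flag, which the iterator replaces). The function name parse_switches is hypothetical, and an absent value is represented as an empty string:

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Returns (name, value) pairs; the value keeps its quotes, as in Group#3,
// and is empty when the parameter has no value.
std::vector<std::pair<std::string, std::string>> parse_switches(const std::string& line) {
    static const std::regex param(
        R"re([-/]?((\w+)(?:[=:]("[^"]+"|[^\s"]+))?)(?:\s+|$))re");
    std::vector<std::pair<std::string, std::string>> result;
    for (std::sregex_iterator it(line.begin(), line.end(), param), end; it != end; ++it) {
        const std::smatch& m = *it;
        result.emplace_back(m[2].str(), m[3].matched ? m[3].str() : std::string());
    }
    return result;
}
```

On the sample line it should produce five pairs, starting with P1 and "Long value" (quotes included) and ending with PwithoutSwitch2 and an empty value.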
Most languages have other functions (either built-in or provided by a standard library) which will parse command lines far more easily than building your own regex, plus you know they'll do it accurately out of the box. If you edit your post to identify the language that you're using, I'm sure someone here will be able to point you at the one used in that language.
Regexes are very powerful tools and useful for a wide range of things, but there are also many problems for which they are not the best solution. This is one of them.
This will split an exe from its params, stripping the quotes from the exe; it assumes clean data:
^(?:"([^"]+(?="))|([^\s]+))["]{0,1} +(.+)$
You will have two matches at a time, of three match groups:
The exe if it was wrapped in quotes
The exe if it was not wrapped in quotes
The clump of parameters
Examples:
"C:\WINDOWS\system32\cmd.exe" /c echo this
Match 1: C:\WINDOWS\system32\cmd.exe
Match 2: $null
Match 3: /c echo this
C:\WINDOWS\system32\cmd.exe /c echo this
Match 1: $null
Match 2: C:\WINDOWS\system32\cmd.exe
Match 3: /c echo this
"C:\Program Files\foo\bar.exe" /run
Match 1: C:\Program Files\foo\bar.exe
Match 2: $null
Match 3: /run
Thoughts:
I'm pretty sure that you would need to create a loop to capture a possibly infinite number of parameters.
This regex could easily be looped onto its third match until the match fails; at that point there are no more params.
If it's just the quotes you are worried about, then just write a simple loop to dump character by character to a string, ignoring the quotes.
Alternatively if you are using some string manipulation library, you can use it to remove all quotes and then concatenate them.
There's a Python answer, thus we shall have a Ruby answer as well :)
require 'shellwords'
Shellwords.shellsplit '"param 1" param2 "param 3"'
#=> ["param 1", "param2", "param 3"] or :
'"param 1" param2 "param 3"'.shellsplit
This answer is not regex-specific, but it handles command-line argument parsing in Python, including:
dash and double dash flags
int/float conversion based on SO answer
import sys
def parse_cmd_args():
    _sys_args = sys.argv
    _parts = {}
    _key = "script"
    _parts[_key] = [_sys_args.pop(0)]
    for _part in _sys_args:
        # Parse numeric values (integers and floats); commas are
        # treated as thousands separators and removed first
        _num = _part.replace(",", "")
        _digits = _num[1:] if _num.startswith("-") else _num
        if _digits.replace(".", "", 1).isdigit():
            _parts[_key].append(float(_num) if "." in _num else int(_num))
        elif "=" in _part:
            _part = _part.split("=")
            _parts[_part[0].strip("-")] = _part[1].strip().split(",")
        elif _part.startswith("-"):
            # A new flag: following values are collected under this key
            _key = _part.strip("-")
            _parts[_key] = []
        else:
            _parts[_key].extend(_part.split(","))
    return _parts
Something like:
"(?:(?<=")([^"]+)"\s*)|\s*([^"\s]+)
or a simpler one:
"([^"]+)"|\s*([^"\s]+)
(just for the sake of finding a regexp ;) )
Apply it repeatedly: the parameter will be in group 1 if it was surrounded by double quotes, and in group 2 otherwise.
If you are looking to parse the command and the parameters I use the following (with ^$ matching at line breaks aka multiline):
(?<cmd>^"[^"]*"|\S*) *(?<prm>.*)?
In case you want to use it in your C# code, here it is properly escaped:
try {
Regex RegexObj = new Regex("(?<cmd>^\\\"[^\\\"]*\\\"|\\S*) *(?<prm>.*)?");
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
It will parse the following and know what is the command versus the parameters:
"c:\program files\myapp\app.exe" p1 p2 "p3 with space"
app.exe p1 p2 "p3 with space"
app.exe
Here's a solution in Perl:
#!/usr/bin/perl
use feature 'say';
sub parse_arguments {
    my $text = shift;
    my $i = 0;
    my @args;
    while ($text ne '') {
        $text =~ s{^\s*(['"]?)}{}; # look for (and remove) leading quote
        my $delimiter = ($1 || ' '); # use space if not quoted
        if ($text =~ s{^(([^$delimiter\\]|\\.|\\$)+)($delimiter|$)}{}) {
            $args[$i++] = $1; # acquired an argument; save it
        }
    }
    return @args;
}
my $line = <<'EOS';
"param 1" param\ 2 "pa\"ram' '3" 'pa\'ram" "4'
EOS
say "ARG: $_" for parse_arguments($line);
Output:
ARG: param 1
ARG: param\ 2
ARG: pa"ram' '3
ARG: pa'ram" "4
Note the following:
Arguments can be quoted with either " or ' (with the "other"
quote type treated as a regular character for that argument).
Spaces and quotes in arguments can be escaped with \.
The solution can be adapted to other languages. The basic approach is to (1) determine the delimiter character for the next string, (2) extract the next argument up to an unescaped occurrence of that delimiter or to the end-of-string, then (3) repeat until empty.
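As an illustration of adapting the approach, here is a sketch of the same three steps in C++ without any regex. The function name parse_arguments is reused for clarity only. One deliberate difference, which is an assumption of this sketch rather than something the Perl code does: the backslash of an escaped character is dropped from the result:

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

std::vector<std::string> parse_arguments(const std::string& text) {
    std::vector<std::string> args;
    const std::size_t n = text.size();
    std::size_t i = 0;
    while (i < n) {
        // Skip whitespace between arguments.
        while (i < n && std::isspace(static_cast<unsigned char>(text[i]))) ++i;
        if (i >= n) break;
        // (1) Pick the delimiter: the opening quote if present, else whitespace.
        char quote = '\0';
        if (text[i] == '"' || text[i] == '\'') quote = text[i++];
        // (2) Collect characters up to the next unescaped delimiter or the end.
        std::string arg;
        while (i < n) {
            const char c = text[i];
            if (c == '\\' && i + 1 < n) {  // escaped char: keep it, drop the backslash
                arg += text[i + 1];
                i += 2;
                continue;
            }
            const bool at_delimiter =
                quote ? (c == quote) : (std::isspace(static_cast<unsigned char>(c)) != 0);
            ++i;
            if (at_delimiter) break;
            arg += c;
        }
        // (3) Save the argument and repeat until the input is exhausted.
        args.push_back(arg);
    }
    return args;
}
```

An unterminated quote simply runs to the end of the string here, whereas a real shell parser would report an error.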
\s*("[^"]+"|[^\s"]+)
that's it
(Reading your question again just before posting, I note you say command line LIKE string, so this information may not be useful to you, but as I have written it I will post it anyway. Please disregard it if I have misunderstood your question.)
If you clarify your question I will try to help, but from the general comments you have made I would say don't do that :-). You are asking for a regexp to split a series of parameters into an array. Instead of doing this yourself, I would strongly suggest you consider using getopt; there are versions of this library for most programming languages. Getopt will do what you are asking and scales to handle much more sophisticated argument processing should you require that in the future.
If you let me know what language you are using I will try and post a sample for you.
Here is a sample of the home pages:
http://www.codeplex.com/getopt
(.NET)
http://www.urbanophile.com/arenn/hacking/download.html
(java)
A sample (from the java page above)
Getopt g = new Getopt("testprog", argv, "ab:c::d");
//
int c;
String arg;
while ((c = g.getopt()) != -1)
{
switch(c)
{
case 'a':
case 'd':
System.out.print("You picked " + (char)c + "\n");
break;
//
case 'b':
case 'c':
arg = g.getOptarg();
System.out.print("You picked " + (char)c +
" with an argument of " +
((arg != null) ? arg : "null") + "\n");
break;
//
case '?':
break; // getopt() already printed an error
//
default:
System.out.print("getopt() returned " + c + "\n");
}
}