Would Rewriting It Using Regex Shorten/Beautify The Code?

The problem is a little challenging because I want to code it using std::regex believing it would be easier to read and faster to write.
But it seems that I can only code it one way (shown below).
Somehow my mind could not see the solution using std::regex.
How would you code it?
Would using std::regex_search do the job?
/*
input: data coming in:
/product/country/123456/city/7890/g.json
input: url parameter format:
/product/country/<id1:[0-9]+>/city/<id2:[0-9]+>/g.json
output:
std::vector<std::string> urlParams
sample output:
urlParams[0] = "123456"
urlParams[1] = "7890"
*/
bool ParseIt(const char *path, const char* urlRoute, std::vector<std::string> *urlParams)
{
const DWORD BUFSZ = 2000;
char buf[BUFSZ];
DWORD dwSize = strlen(urlRoute);
urlParams->clear();
int j = 0;
int i = 0;
bool good = false;
for (i = 0; i < dwSize; i++)
{
char c1 = path[j++];
char c2 = urlRoute[i];
if (c2 == '<')
{
good = true;
while (c2 != '/')
{
i++;
c2 = urlRoute[i];
}
int k = 0;
memset(buf, 0, BUFSZ);
while (c1 != '/')
{
buf[k++] = c1;
c1 = path[j++];
}
urlParams->push_back(buf); // std::string copies buf, so _strdup (and its leak) is not needed
}
// After a parameter, c1 and c2 are both '/'; otherwise they must be identical.
if (c1 != c2)
{
return false;
}
}
if (dwSize == i && good)
{
return true;
}
return false;
}

The easiest one I've found (might not be the best but should work with your input data) is
std::string subject("/product/country/123456/city/7890/g.json");
std::regex re("/(\\d+)/city/(\\d+)/");
std::smatch match;
std::regex_search(subject, match, re);
It captures two values per line: the / characters match the slashes around each number, and each () captures the digits between them. The captures are available as match[1] and match[2]; they are strings, so convert them if you need numbers.
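If the route template itself should drive the match, one option is to turn each <name:pattern> placeholder into a capture group and let std::regex_match do the rest. The sketch below is only an illustration: RouteToRegex and ParseRoute are made-up names, and it assumes the literal parts of the route contain no regex metacharacters that matter (the "." in g.json is left unescaped, so it matches any character).

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: rewrites "/a/<id1:[0-9]+>/b" as the regex "/a/([0-9]+)/b".
// Assumption: literal route text is used as-is and is not regex-escaped.
std::string RouteToRegex(const std::string& route) {
    static const std::regex placeholder("<[^:>]+:([^>]+)>");
    return std::regex_replace(route, placeholder, "($1)");
}

// Hypothetical replacement for ParseIt: returns true only if the whole path
// matches the route, and fills urlParams with the captured values in order.
bool ParseRoute(const std::string& path, const std::string& route,
                std::vector<std::string>& urlParams) {
    urlParams.clear();
    const std::regex re(RouteToRegex(route));
    std::smatch m;
    if (!std::regex_match(path, m, re)) {
        return false;
    }
    for (std::size_t i = 1; i < m.size(); ++i) {  // m[0] is the whole match
        urlParams.push_back(m[i].str());
    }
    return true;
}
```

Using regex_match rather than regex_search means a path with extra leading or trailing segments is rejected, which is closer to what the hand-written loop checks.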

Related

String Comparison without compare()

I'm currently doing an assignment that requires us to create our own library for string comparison without using compare(), etc.
I got it to work, but during my research I created a bool function for character compare and return values.
It needs to behave like compare(), returning 0 when the strings are equal and a negative or positive value when they are not, instead of the true or false I currently return.
I tried to change the bool functions to int but now when I run the program that was correctly returning strings are equal, it's showing not equal.
Header code:
bool compare_char(char &c1, char &c2)
{
if (c1 == c2)
return true;
else if (toupper(c1) == toupper(c2))
return true;
else
return false;
}
bool insensitive_string_comparision(string &string_one, string &string_two)
{
return ((string_one.size() == string_two.size()) &&
equal(string_one.begin(), string_one.end(), string_two.begin(), &compare_char));
}
string remove_spaces(string string)
{
string.erase(remove(string.begin(), string.end(), ' '), string.end());
return string;
}
string remove_punctuation(string string)
{
for (size_t i = 0, len = string.size(); i < len; ++i)
{
if (ispunct(string[i]))
{
string.erase(i--, 1);
len = string.size();
}
}
return string;
}
Int header changes:
int compare_char(char &c1, char &c2)
{
if (c1 == c2)
return 0;
else if (toupper(c1) == toupper(c2))
return 0;
else if (toupper(c1) > toupper(c2))
return -1;
else if (toupper(c1) < toupper(c2))
return 1;
}
int insensitive_string_comparision(string &string_one, string &string_two)
{
return ((string_one.size() == string_two.size()) &&
equal(string_one.begin(), string_one.end(), string_two.begin(), &compare_char));
}
Int main changes
int result = insensitive_string_comparision(string_one, string_two);
if (result == 0)
cout << "Both Strings are equal." << endl;
else
cout << "Both Strings are not equal." << endl;
return 0;
I feel like I'm going to have to redesign the entire function to return the value that is similar to compare().
I'm assuming bool was the wrong decision to begin with? Where should I go moving forward to return a correct value?

In your question you are not entirely clear about how you want to compare the strings, but I made some assumptions based on your example code. You can fix your problem by writing insensitive_string_comparision like:
int insensitive_string_comparision(string &string_one, string &string_two) {
int len_one = string_one.length();
int len_two = string_two.length();
int len_comparison = 0;
if (len_one > len_two) {
len_comparison = -1;
} else if (len_one < len_two) {
len_comparison = 1;
}
int minlen = (len_one < len_two) ? len_one : len_two; // compare only the common prefix
for (int i = 0; i < minlen; i++) {
int order = compare_char(string_one[i], string_two[i]);
if (order != 0) {
return order;
}
}
return len_comparison;
}
I'd also recommend turning on warnings on your compiler. You don't need to put some of your return statements in else blocks.
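For comparison, here is a rough sketch of a case-insensitive comparison that follows the usual compare() sign convention (negative when the first string sorts first, positive when it sorts last). CompareNoCase is a name invented for this example, and the sketch assumes plain single-byte characters.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns -1, 0, or 1, ignoring case.
int CompareNoCase(const std::string& a, const std::string& b) {
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        // Cast to unsigned char first: toupper is undefined for negative values.
        const int ca = std::toupper(static_cast<unsigned char>(a[i]));
        const int cb = std::toupper(static_cast<unsigned char>(b[i]));
        if (ca != cb) {
            return ca < cb ? -1 : 1;
        }
    }
    // The common prefix is equal, so the shorter string sorts first.
    if (a.size() == b.size()) {
        return 0;
    }
    return a.size() < b.size() ? -1 : 1;
}
```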

Compare two strings containing float values

I am given two strings which contain a floating point number. I need to compare them. Can I directly compare the strings using std::string::compare and will this always give correct results? My current approach is to convert the string to float using std::stof, however I would prefer to avoid C++11 library functions.

Simply comparing the strings won't help in cases like
a = "0.43"
b = "0.4300"
which are numerically equal but differ as text. To compare the values, first parse both strings into floats:
std::string s1 = "0.6";
std::string s2 = "0.7";
float d1 = std::stof(s1);
float d2 = std::stof(s2);
and then compare the results.
Here is a full program:
#include <iostream> // std::cout
#include <string> // std::string, std::stof
int main ()
{
std::string s1 = "0.6";
std::string s2 = "0.7";
float d1 = std::stof(s1);
float d2 = std::stof(s2);
if(d1 == d2)
std::cout << "Equals!";
else
std::cout << "Not Equals!";
return 0;
}
See the std::stof reference for more details.
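Since the question prefers to avoid C++11 functions, std::strtod from <cstdlib> is one older alternative for the parsing step. The following is a minimal sketch under that assumption; CompareNumeric and its ok flag are invented for illustration, and it treats any string that is not entirely a number as an error.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns -1, 0, or 1 for a < b, a == b, a > b.
// Sets ok to false (and returns 0) if either string is not fully numeric.
int CompareNumeric(const std::string& a, const std::string& b, bool& ok) {
    char* endA = 0;
    char* endB = 0;
    const double x = std::strtod(a.c_str(), &endA);
    const double y = std::strtod(b.c_str(), &endB);
    // strtod leaves the end pointer at the first character it could not parse.
    ok = endA != a.c_str() && *endA == '\0' &&
         endB != b.c_str() && *endB == '\0';
    if (!ok) {
        return 0;
    }
    return (x < y) ? -1 : (x > y) ? 1 : 0;
}
```

Parsing to double rather than float keeps more digits, but values that differ only beyond double precision will still compare as equal.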

What about writing some ugly code? It may not be good practice but ...
int compare (const string &str1, const string &str2) {
const string *s1 = &str1, *s2 = &str2;
int isReverse = 1;
int len1, len2;
if (str1.length() > str2.length()) {
s1 = &str2;
s2 = &str1;
isReverse = -1;
}
len1 = s1->length();
len2 = s2->length();
if (!len1) {
if (!len2)
return 0;
else if ((*s2)[0] != '-')
return 1*isReverse;
return -1*isReverse;
}
int i = 0;
while(i < len1) {
if ((*s1)[i] > (*s2)[i])
return 1*isReverse;
else if ((*s1)[i] < (*s2)[i])
return -1*isReverse;
i++;
}
while (i < len2) {
if ((*s2)[i] != '0')
return -1*isReverse;
i++;
}
return 0;
}

String Formatting using C / C++

Recently I was asked in an interview to convert the string "aabbbccccddddd" to "a2b3c4d5". The goal is to replace each repeated character with a single occurrence and a repeat count. Here 'a' is repeated twice in the input, so we have to write it as 'a2' in the output. Also I need to write a function to reverse the format back to the original one (e.g. from the string "a2b3c4d5" to "aabbbccccddddd"). I was free to use either C or C++. I wrote the below code, but the interviewer seemed to be not very happy with this. He asked me to try a smarter way than this.
In the below code, I used formatstring() to eliminate repeated chars by just adding the repeated count and used reverseformatstring() to convert back to the original string.
void formatstring(char* target, const char* source) {
int charRepeatCount = 1;
bool isFirstChar = true;
while (*source != '\0') {
if (isFirstChar) {
// Always add the first character to the target
isFirstChar = false;
*target = *source;
source++; target++;
} else {
// Compare the current char with previous one,
// increment repeat count
if (*source == *(source-1)) {
charRepeatCount++;
source++;
} else {
if (charRepeatCount > 1) {
// Convert repeat count to string, append to the target
char repeatStr[10];
_snprintf(repeatStr, 10, "%i", charRepeatCount);
int repeatCount = strlen(repeatStr);
for (int i = 0; i < repeatCount; i++) {
*target = repeatStr[i];
target++;
}
charRepeatCount = 1; // Reset repeat count
}
*target = *source;
source++; target++;
}
}
}
if (charRepeatCount > 1) {
// Convert repeat count to string, append it to the target
char repeatStr[10];
_snprintf(repeatStr, 10, "%i", charRepeatCount);
int repeatCount = strlen(repeatStr);
for (int i = 0; i < repeatCount; i++) {
*target = repeatStr[i];
target++;
}
}
*target = '\0';
}
void reverseformatstring(char* target, const char* source) {
int charRepeatCount = 0;
bool isFirstChar = true;
while (*source != '\0') {
if (isFirstChar) {
// Always add the first character to the target
isFirstChar = false;
*target = *source;
source++; target++;
} else {
// If current char is alpha, add it to the target
if (isalpha(*source)) {
*target = *source;
target++; source++;
} else {
// Get repeat count of previous character
while (isdigit(*source)) {
int currentDigit = (*source) - '0';
charRepeatCount = (charRepeatCount == 0) ?
currentDigit : (charRepeatCount * 10 + currentDigit);
source++;
}
// Decrement repeat count as we have already written
// the first unique char to the target
charRepeatCount--;
// Repeat the last char for this count
while (charRepeatCount > 0) {
*target = *(target - 1);
target++;
charRepeatCount--;
}
}
}
}
*target = '\0';
}
I didn't find any issues with above code. Is there any other better way of doing this?

The approach/algorithm is fine, perhaps you could refine and shrink the code a bit (by doing something simpler, there's no need to solve this in an overly complex way). And choose an indentation style that actually makes sense.
A C solution:
void print_transform(const char *input)
{
for (const char *s = input; *s;) {
char current = *s;
size_t count = 1;
while (*++s == current) {
count++;
}
if (count > 1) {
printf("%c%zu", current, count);
} else {
putc(current, stdout);
}
}
putc('\n', stdout);
}
(This can be easily modified so that it returns the transformed string instead, or writes it to a long enough buffer.)
A C++ solution:
std::string transform(const std::string &input)
{
std::stringstream ss;
std::string::const_iterator it = input.begin();
while (it != input.end()) {
char current = *it;
std::size_t count = 1;
while (++it != input.end() && *it == current) {
count++;
}
if (count > 1) {
ss << current << count;
} else {
ss << current;
}
}
return ss.str();
}

Since several others have suggested very reasonable alternatives, I'd like to offer some opinions on what I think is your underlying question: "He asked me to try a smarter way than this.... Is there any other better way of doing this?"
When I interview a developer, I'm looking for signals that tell me how she approaches a problem:
Most important, as H2CO3 noted, is correctness: will the code work? I'm usually happy to overlook small syntax errors (forgotten semicolons, mismatched parens or braces, and so on) if the algorithm is sensible.
Proper use of the language, especially if the candidate claims expertise or has had extensive experience. Does he understand and use idioms appropriately to write straightforward, uncomplicated code?
Can she explain her train of thought as she formulates her solution? Is it logical and coherent, or is it a shotgun approach? Is she able and willing to communicate well?
Does he account for edge cases? And if so, does the intrinsic algorithm handle them, or is everything a special case? Although I'm happiest if the initial algorithm "just works" for all cases, I think it's perfectly acceptable to start with a verbose approach that covers all cases (or simply to add a "TODO" comment, noting that more work needs to be done), and then simplifying later, when it may be easier to notice patterns or duplicated code.
Does she consider error-handling? Usually, if a candidate starts by asking whether she can assume the input is valid, or with a comment like, "If this were production code, I'd check for x, y, and z problems," I'll ask what she would do, then suggest she focus on a working algorithm for now and (maybe) come back to that later. But I'm disappointed if a candidate doesn't mention it.
Testing, testing, testing! How will the candidate verify his code works? Does he walk through the code and suggest test cases, or do I need to remind him? Are the test cases sensible? Will they cover the edge cases?
Optimization: as a final step, after everything works and has been validated, I'll sometimes ask the candidate if she can improve her code. Bonus points if she suggests it without my prodding; negative points if she spends a lot of effort worrying about it before the code even works.
Applying these ideas to the code you wrote, I'd make these observations:
Using const appropriately is a plus, as it shows familiarity with the language. During an interview I'd probably ask a question or two about why/when to use it.
The proper use of char pointers throughout the code is a good sign. I tend to be pedantic about making the data types explicit within comparisons, particularly during interviews, so I'm happy to see, e.g.
while (*source != '\0') rather than the (common, correct, but IMO less careful) while(*source).
isFirstChar is a bit of a red flag, based on my "edge cases" point. When you declare a boolean to keep track of the code's state, there's often a way of re-framing the problem to handle the condition intrinsically. In this case, you can use charRepeatCount to decide if this is the first character in a possible series, so you won't need to test explicitly for the first character in the string.
By the same token, repeated code can also be a sign that an algorithm can be simplified. One improvement would be to move the conversion of charRepeatCount to a separate function. See below for an even better solution.
It's funny, but I've found that candidates rarely add comments to their code during interviews. Kudos for helpful ones, negative points for those of the ilk "Increment the counter" that add verbosity without information. It's generally accepted that, unless you're doing something weird (in which case you should reconsider what you've written), you should assume the person who reads your code is familiar with the programming language. So comments should explain your thought process, not translate the code back to English.
Excessive levels of nested conditionals or loops can also be a warning. You can eliminate one level of nesting by comparing each character to the next one instead of the previous one. This works even for the last character in the string, because it will be compared to the terminating null character, which won't match and can be treated like any other character.
There are simpler ways to convert charRepeatCount from an int to a string. For example, _snprintf() returns the number of bytes it "prints" to the string, so you can use
target += _snprintf(target, 10, "%i", charRepeatCount);
In the reversing function, you've used the ternary operator perfectly ... but it's not necessary to special-case the zero value: the math is the same regardless of its value. Again, there are also standard utility functions like atoi() that will convert the leading digits of a string into an integer for you.
Experienced developers will often include the increment or decrement operation as part of the condition in a loop, rather than as a separate statement at the bottom: while(charRepeatCount-- > 0). I'd raise an eyebrow but give you a point or two for humor and personality if you wrote this using the slide operator: while (charRepeatCount --> 0). But only if you'd promise not to use it in production.
Good luck with your interviewing!

I think your code is too complex for the task. Here's my approach (using C):
#include <ctype.h>
#include <stdio.h>
void format_str(char *target, char *source) {
int count;
char last;
while (*source != '\0') {
*target = *source;
last = *target;
target++;
source++;
for (count = 1; *source == last; source++, count++)
; /* Intentionally left blank */
if (count > 1)
target += sprintf(target, "%d", count);
}
*target = '\0';
}
void convert_back(char *target, char *source) {
char last;
int val;
while (*source != '\0') {
if (!isdigit((unsigned char) *source)) {
last = *source;
*target = last;
target++;
source++;
}
else {
for (val = 0; isdigit((unsigned char) *source); val = val*10 + *source - '0', source++)
; /* Intentionally left blank */
while (--val) {
*target = last;
target++;
}
}
}
*target = '\0';
}
format_str compresses the string, and convert_back uncompresses it.

Your code "works", but it doesn't adhere to some common patterns used in C++. You should have:
used std::string instead of plain char* arrays;
passed that string as a const reference to avoid modification, since you write the result somewhere else;
used C++11 features such as range-based for loops and lambdas as well.
I think the interviewer's purpose was to test your ability to deal with the C++11 standard, since the algorithm itself was pretty trivial.
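As an illustration of those three points, here is a small sketch using std::string, const references, a range-based for loop, and a lambda. Encode and Decode are names made up for this example, and Decode assumes the original text contained no digits and that a missing count means 1.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical encoder: "aabbbccccddddd" -> "a2b3c4d5".
std::string Encode(const std::string& input) {
    std::string out;
    std::size_t run = 0;
    char prev = '\0';
    // Writes the pending run (if any) to out and resets the counter.
    auto flush = [&] {
        if (run == 0) return;
        out += prev;
        if (run > 1) out += std::to_string(run);
        run = 0;
    };
    for (char c : input) {
        if (c != prev) flush();
        prev = c;
        ++run;
    }
    flush();
    return out;
}

// Hypothetical decoder: "a2b3c4d5" -> "aabbbccccddddd".
std::string Decode(const std::string& input) {
    std::string out;
    std::size_t i = 0;
    while (i < input.size()) {
        const char c = input[i++];
        std::size_t count = 0;
        while (i < input.size() &&
               std::isdigit(static_cast<unsigned char>(input[i]))) {
            count = count * 10 + static_cast<std::size_t>(input[i++] - '0');
        }
        out.append(count == 0 ? 1 : count, c);  // no digits means one copy
    }
    return out;
}
```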

Perhaps the interviewer wanted to test your knowledge of existing standard library tools. Here's how my take could look in C++:
#include <string>
#include <sstream>
#include <algorithm>
#include <iostream>
typedef std::string::const_iterator Iter;
std::string foo(Iter first, Iter last)
{
Iter it = first;
std::ostringstream result;
while (it != last) {
it = std::find_if(it, last, [=](char c){ return c != *it; });
result << *first << (it - first);
first = it;
}
return result.str();
}
int main()
{
std::string s = "aaabbbbbbccddde";
std::cout << foo(s.begin(), s.end());
}
An extra check is needed for empty input.

try this
std::string str="aabbbccccddddd";
for(int i=0;i<255;i++)
{
int c=0;
for(int j=0;j<str.length();j++)
{
if(str[j] == i)
c++;
}
if(c>0)
printf("%c%d",i,c);
}

My naive approach:
void pack( char const * SrcStr, char * DstBuf ) {
char const * Src_Ptr = SrcStr;
char * Dst_Ptr = DstBuf;
char c = 0;
int RepeatCount = 1;
while( '\0' != *Src_Ptr ) {
c = *Dst_Ptr = *Src_Ptr;
++Src_Ptr; ++Dst_Ptr;
for( RepeatCount = 1; *Src_Ptr == c; ++RepeatCount ) {
++Src_Ptr;
}
if( RepeatCount > 1 ) {
Dst_Ptr += sprintf( Dst_Ptr, "%i", RepeatCount );
RepeatCount = 1;
}
}
*Dst_Ptr = '\0';
};
void unpack( char const * SrcStr, char * DstBuf ) {
char const * Src_Ptr = SrcStr;
char * Dst_Ptr = DstBuf;
char c = 0;
while( '\0' != *Src_Ptr ) {
if( !isdigit( *Src_Ptr ) ) {
c = *Dst_Ptr = *Src_Ptr;
++Src_Ptr; ++Dst_Ptr;
} else {
int repeat_count = strtol( Src_Ptr, (char**)&Src_Ptr, 10 );
memset( Dst_Ptr, c, repeat_count - 1 );
Dst_Ptr += repeat_count - 1;
}
}
*Dst_Ptr = '\0';
};
But if the interviewer asks for error handling, then the solution becomes much more complex (and ugly). My portable approach:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <ctype.h>
// for MSVC
#ifdef _WIN32
#define snprintf sprintf_s
#endif
int pack( char const * SrcStr, char * DstBuf, size_t DstBuf_Size ) {
int Err = 0;
char const * Src_Ptr = SrcStr;
char * Dst_Ptr = DstBuf;
size_t SrcBuf_Size = strlen( SrcStr ) + 1;
char const * SrcBuf_End = SrcStr + SrcBuf_Size;
char const * DstBuf_End = DstBuf + DstBuf_Size;
char c = 0;
int RepeatCount = 1;
// don't forget about buffers intercrossing
if( !SrcStr || !DstBuf || 0 == DstBuf_Size \
|| (DstBuf < SrcBuf_End && DstBuf_End > SrcStr) ) {
return 1;
}
// source string must contain no digits
// check for destination buffer overflow
while( '\0' != *Src_Ptr && Dst_Ptr < DstBuf_End - 1 \
&& !isdigit( *Src_Ptr ) && !Err ) {
c = *Dst_Ptr = *Src_Ptr;
++Src_Ptr; ++Dst_Ptr;
for( RepeatCount = 1; *Src_Ptr == c; ++RepeatCount ) {
++Src_Ptr;
}
if( RepeatCount > 1 ) {
int res = snprintf( Dst_Ptr, DstBuf_End - Dst_Ptr - 1, "%i" \
, RepeatCount );
if( res < 0 ) {
Err = 1;
} else {
Dst_Ptr += res;
RepeatCount = 1;
}
}
}
*Dst_Ptr = '\0';
return Err;
};
int unpack( char const * SrcStr, char * DstBuf, size_t DstBuf_Size ) {
int Err = 0;
char const * Src_Ptr = SrcStr;
char * Dst_Ptr = DstBuf;
size_t SrcBuf_Size = strlen( SrcStr ) + 1;
char const * SrcBuf_End = SrcStr + SrcBuf_Size;
char const * DstBuf_End = DstBuf + DstBuf_Size;
char c = 0;
// don't forget about buffers intercrossing
// first character of source string must be non-digit
if( !SrcStr || !DstBuf || 0 == DstBuf_Size \
|| (DstBuf < SrcBuf_End && DstBuf_End > SrcStr) || isdigit( SrcStr[0] ) ) {
return 1;
}
// check for destination buffer overflow
while( '\0' != *Src_Ptr && Dst_Ptr < DstBuf_End - 1 && !Err ) {
if( !isdigit( *Src_Ptr ) ) {
c = *Dst_Ptr = *Src_Ptr;
++Src_Ptr; ++Dst_Ptr;
} else {
int repeat_count = strtol( Src_Ptr, (char**)&Src_Ptr, 10 );
if( !repeat_count || repeat_count - 1 > DstBuf_End - Dst_Ptr - 1 ) {
Err = 1;
} else {
memset( Dst_Ptr, c, repeat_count - 1 );
Dst_Ptr += repeat_count - 1;
}
}
}
*Dst_Ptr = '\0';
return Err;
};
int main() {
char str[] = "aabbbccccddddd";
char buf1[128] = {0};
char buf2[128] = {0};
pack( str, buf1, 128 );
printf( "pack: %s -> %s\n", str, buf1 );
unpack( buf1, buf2, 128 );
printf( "unpack: %s -> %s\n", buf1, buf2 );
return 0;
}
Test: http://ideone.com/Y7FNE3. Also works in MSVC.

Try to make do with less boilerplate:
#include <iostream>
#include <iterator>
#include <sstream>
using namespace std;
template<typename in_iter,class ostream>
void torle(in_iter i, ostream &&o)
{
while (char c = *i++) {
size_t n = 1;
while ( *i == c )
++n, ++i;
o<<c<<n;
}
}
template<class istream, typename out_iter>
void fromrle(istream &&i, out_iter o)
{
char c; size_t n;
while (i>>c>>n)
while (n--) *o++=c;
}
int main()
{
typedef ostream_iterator<char> to;
string line; stringstream converted;
while (getline(cin,line)) {
torle(begin(line),converted);
cout<<converted.str()<<'\n';
fromrle(converted,ostream_iterator<char>(cout));
cout<<'\n';
}
}

How to find whether a string is guid in c++

How can I check whether a string is a GUID in native C++? A code sample would help greatly.

If you need to do it by hand (information from Wikipedia):
Check the length (36, including hyphens)
Check that the hyphens are at the expected positions (characters 9, 14, 19, and 24, counting from 1)
Check that all other characters are hexadecimal (isxdigit)

You could use a regex to see if it complies with GUID format.
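A minimal sketch of the regex idea with std::regex (C++11) could look like the following. LooksLikeGuid is a hypothetical name, and the pattern only accepts the plain 36-character form without surrounding braces.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true for strings shaped like
// xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, where x is a hex digit.
bool LooksLikeGuid(const std::string& s) {
    static const std::regex re(
        "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
        "[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");
    return std::regex_match(s, re);  // the whole string must match
}
```

This only checks the shape of the string; it does not validate version or variant bits.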

Try the following code - could help.
_bstr_t sGuid( _T("guid to validate") );
GUID guid;
if( SUCCEEDED( ::CLSIDFromString( sGuid, &guid ) ) )
{
// Guid string is valid
}

Here's a faster, native C++ version without regular expressions or scanf()-like calls.
#include <cctype>
#include <string>
using namespace std;
bool isCanonicalUUID(const string &s) {
// Does it match the format 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'?
if (s.length() != 36) return false;
if (s[8] != '-' || s[13] != '-' ||
s[18] != '-' || s[23] != '-')
return false;
for (int i = 0; i < s.length(); i++) {
if (i == 8 || i == 13 || i == 18 || i == 23) continue;
if (isspace(s[i])) return false;
if (!isxdigit(s[i])) return false;
}
return true;
}
bool isMicrosoftUUID(const string &s) {
// Does it match the format '{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}'?
if (s.length() != 38) return false;
if (s[0] != '{' || s[37] != '}') return false;
if (s[10] != '-' || s[15] != '-' ||
s[20] != '-' || s[25] != '-')
return false;
for (int i = 1; i < s.length()-1; i++) {
if (i == 10 || i == 15 || i == 20 || i == 25) continue;
if (isspace(s[i])) return false;
if (!isxdigit(s[i])) return false;
}
return true;
}
bool isUUID(const string &s) {
return isCanonicalUUID(s) || isMicrosoftUUID(s);
}

This method leverages the scanf parser and works also in plain C:
#include <ctype.h>
#include <stdio.h>
#include <string.h>
int validateUUID(const char *candidate)
{
int tmp;
const char *s = candidate;
while (*s)
if (isspace(*s++))
return 0;
return s - candidate == 36
&& sscanf(candidate, "%4x%4x-%4x-%4x-%4x-%4x%4x%4x%c",
&tmp, &tmp, &tmp, &tmp, &tmp, &tmp, &tmp, &tmp, &tmp) == 8;
}
Test with e.g.:
int main(int argc, char *argv[])
{
if (argc > 1)
puts(validateUUID(argv[1]) ? "OK" : "Invalid");
}

Here's a code example in case you still need one :P
#include <ctype.h>
#include <string>
using namespace std;
bool isUUID(string uuid)
{
/*
* Check if the provided uuid is valid.
* 1. The length of uuids should always be 36.
* 2. Hyphens are expected at positions {9, 14, 19, 24}.
* 3. The remaining characters should be hex digits.
*/
int hyphens[4] = {9, 14, 19, 24};
if (uuid.length() != 36)
{
return false;// Oops. The length doesn't match.
}
for (int i = 0, counter = 0; i < 36; i ++)
{
char var = uuid[i];
if (i == hyphens[counter] - 1)// Check if a hyphen is expected here.
{
// Yep. We need a hyphen here.
if (var != '-')
{
return false;// Oops. The character is not a hyphen.
}
else
{
counter++;// Move on to the next expected hyphen position.
}
}
else
{
// Nope. The character here should be a simple xdigit
if (isxdigit(var) == false)
{
return false;// Oops. The current character is not a hex digit.
}
}
}
return true;// Seen'em all!
}

How do I write a simple regular expression pattern matching function in C or C++?

This is a question in my paper test today, the function signature is
int is_match(char* pattern,char* string)
The pattern is limited to ASCII characters and the quantifiers * and ?, so it is relatively simple. is_match should return 1 if matched, otherwise 0.
How do I do this?

Brian Kernighan provided a short article on A Regular Expression Matcher that Rob Pike wrote as a demonstration program for a book they were working on. The article is a very nice read explaining a bit about the code and regular expressions in general.
I have played with this code, making a few changes to experiment with some extensions such as to also return where in the string the pattern matches so that the substring matching the pattern can be copied from the original text.
From the article:
I suggested to Rob that we needed to find the smallest regular
expression package that would illustrate the basic ideas while still
recognizing a useful and non-trivial class of patterns. Ideally, the
code would fit on a single page.
Rob disappeared into his office, and at least as I remember it now,
appeared again in no more than an hour or two with the 30 lines of C
code that subsequently appeared in Chapter 9 of TPOP. That code
implements a regular expression matcher that handles these constructs:
c matches any literal character c
. matches any single character
^ matches the beginning of the input string
$ matches the end of the input string
* matches zero or more occurrences of the previous character
This is quite a useful class; in my own experience of using regular
expressions on a day-to-day basis, it easily accounts for 95 percent
of all instances. In many situations, solving the right problem is a
big step on the road to a beautiful program. Rob deserves great credit
for choosing so wisely, from among a wide set of options, a very small
yet important, well-defined and extensible set of features.
Rob's implementation itself is a superb example of beautiful code:
compact, elegant, efficient, and useful. It's one of the best examples
of recursion that I have ever seen, and it shows the power of C
pointers. Although at the time we were most interested in conveying
the important role of a good notation in making a program easier to
use and perhaps easier to write as well, the regular expression code
has also been an excellent way to illustrate algorithms, data
structures, testing, performance enhancement, and other important
topics.
The actual C source code from the article is very very nice.
/* match: search for regexp anywhere in text */
int match(char *regexp, char *text)
{
if (regexp[0] == '^')
return matchhere(regexp+1, text);
do { /* must look even if string is empty */
if (matchhere(regexp, text))
return 1;
} while (*text++ != '\0');
return 0;
}
/* matchhere: search for regexp at beginning of text */
int matchhere(char *regexp, char *text)
{
if (regexp[0] == '\0')
return 1;
if (regexp[1] == '*')
return matchstar(regexp[0], regexp+2, text);
if (regexp[0] == '$' && regexp[1] == '\0')
return *text == '\0';
if (*text!='\0' && (regexp[0]=='.' || regexp[0]==*text))
return matchhere(regexp+1, text+1);
return 0;
}
/* matchstar: search for c*regexp at beginning of text */
int matchstar(int c, char *regexp, char *text)
{
do { /* a * matches zero or more instances */
if (matchhere(regexp, text))
return 1;
} while (*text != '\0' && (*text++ == c || c == '.'));
return 0;
}
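The same recursive structure can be adapted to the question's two quantifiers. The sketch below is an adaptation, not code from the article. It assumes that c* means zero or more occurrences of c, that c? means zero or one occurrence of c, and that the pattern must match the whole string. MatchHere is a hypothetical helper, and const char* is used instead of the char* in the question's signature so string literals can be passed.

```cpp
// Hypothetical helper: does pattern match all of text?
// Assumption: a quantifier always follows the character it applies to.
static int MatchHere(const char* pattern, const char* text) {
    if (pattern[0] == '\0') {
        return text[0] == '\0';  // both exhausted: full match
    }
    if (pattern[1] == '*') {
        // Try zero occurrences first, then consume one matching char at a time.
        do {
            if (MatchHere(pattern + 2, text)) {
                return 1;
            }
        } while (*text != '\0' && *text++ == pattern[0]);
        return 0;
    }
    if (pattern[1] == '?') {
        if (MatchHere(pattern + 2, text)) {  // zero occurrences
            return 1;
        }
        return text[0] != '\0' && text[0] == pattern[0] &&
               MatchHere(pattern + 2, text + 1);  // one occurrence
    }
    // Plain character: must match exactly.
    return text[0] != '\0' && text[0] == pattern[0] &&
           MatchHere(pattern + 1, text + 1);
}

int is_match(const char* pattern, const char* text) {
    return MatchHere(pattern, text);
}
```

Like the original, this backtracks, so it can be slow on patterns with many starred characters.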

See This Question for a solution you can not submit. See this paper for a description of how to implement a more readable one.

Here is a recursive, extensible implementation. Tested for first order of pattern complexity.
#include <cstring>
#include <string>
#include <vector>
#include <iostream>
struct Match {
Match():_next(0) {}
virtual bool match(const char * pattern, const char * input) const {
return !std::strcmp(pattern, input);
}
bool next(const char * pattern, const char * input) const {
if (!_next) return false;
return _next->match(pattern, input);
}
const Match * _next;
};
class MatchSet: public Match {
typedef std::vector<Match *> Set;
Set toTry;
public:
virtual bool match(const char * pattern, const char * input) const {
for (Set::const_iterator i = toTry.begin(); i !=toTry.end(); ++i) {
if ((*i)->match(pattern, input)) return true;
}
return false;
}
void add(Match * m) {
toTry.push_back(m);
m->_next = this;
}
~MatchSet() {
for (Set::const_iterator i = toTry.begin(); i !=toTry.end(); ++i)
if ((*i)->_next==this) (*i)->_next = 0;
}
};
struct MatchQuestion: public Match {
virtual bool match(const char * pattern, const char * input) const {
if (pattern[0] != '?')
return false;
if (next(pattern+1, input))
return true;
if (next(pattern+1, input+1))
return true;
return false;
}
};
struct MatchEmpty: public Match {
virtual bool match(const char * pattern, const char * input) const {
if (pattern[0]==0 && input[0]==0)
return true;
return false;
}
};
struct MatchAsterisk: public Match {
virtual bool match(const char * pattern, const char * input) const {
if (pattern[0] != '*')
return false;
if (pattern[1] == 0) {
return true;
}
for (int i = 0; input[i] != 0; ++i) {
if (next(pattern+1, input+i))
return true;
}
return false;
}
};
struct MatchSymbol: public Match {
virtual bool match(const char * pattern, const char * input) const {
// TODO: consider cycle here to prevent unnecessary recursion
// Cycle should detect special characters and call next on them
// Current implementation abstracts from that
if (pattern[0] != input[0])
return false;
return next(pattern+1, input+1);
}
};
class DefaultMatch: public MatchSet {
MatchEmpty empty;
MatchQuestion question;
MatchAsterisk asterisk;
MatchSymbol symbol;
public:
DefaultMatch() {
add(&empty);
add(&question);
add(&asterisk);
add(&symbol);
}
void test(const char * p, const char * input) const {
testOneWay(p, input);
if (!std::strcmp(p, input)) return;
testOneWay(input, p);
}
bool testOneWay(const char * p, const char * input) const {
const char * eqStr = " == ";
bool rv = match(p, input);
if (!rv) eqStr = " != ";
std::cout << p << eqStr << input << std::endl;
return rv;
}
};
int _tmain(int argc, _TCHAR* argv[])
{
using namespace std;
typedef vector<string> Strings;
Strings patterns;
patterns.push_back("*");
patterns.push_back("*hw");
patterns.push_back("h*w");
patterns.push_back("hw*");
patterns.push_back("?");
patterns.push_back("?ab");
patterns.push_back("a?b");
patterns.push_back("ab?");
patterns.push_back("c");
patterns.push_back("cab");
patterns.push_back("acb");
patterns.push_back("abc");
patterns.push_back("*this homework?");
patterns.push_back("Is this homework?");
patterns.push_back("This is homework!");
patterns.push_back("How is this homework?");
patterns.push_back("hw");
patterns.push_back("homework");
patterns.push_back("howork");
DefaultMatch d;
for (unsigned i = 0; i < patterns.size(); ++i)
for (unsigned j =i; j < patterns.size(); ++j)
d.test(patterns[i].c_str(), patterns[j].c_str());
return 0;
}
If something is unclear, ask.

Cheat. Use #include <boost/regex.hpp>.

Try to make a list of interesting test cases:
is_match("dummy","dummy") should return true;
is_match("dumm?y","dummy") should return true;
is_match("dum?y","dummy") should return false;
is_match("dum*y","dummy") should return true;
and so on ...
Then see how to make the easiest test pass, then the next one ...

Didn't test this, actually code it, or debug it, but this might get you a start...
for each character in the pattern
    if pattern character after the current one is *
        // enter * state
        while current character from target == current pattern char, and not at end
            get next character from target
        skip a char from the pattern
    else if pattern character after the current one is ?
        // enter ? state
        if current character from target == current pattern char
            get next char from target
        skip a char from the pattern
    else
        // enter character state
        if current character from target == current pattern character
            get next character from target
        else
            return false
return true
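For what it's worth, here is a minimal C sketch of the pseudocode above, assuming the `x*` / `x?` semantics used elsewhere in this thread (the symbol follows the character it modifies). The name `greedy_match` and the final end-of-target check are my own additions, not part of the pseudocode. Because the `*` state consumes greedily and never backtracks, a pattern such as `a*ab` will not match `aab` this way, so treat it as a starting point only.

```c
#include <stddef.h>

/* Greedy single-pass matcher (hypothetical name), following the pseudocode:
   "x*" consumes every leading x, "x?" consumes at most one x. */
int greedy_match(const char *pattern, const char *target)
{
    size_t p = 0, t = 0;

    while (pattern[p] != '\0') {
        char c = pattern[p];

        if (pattern[p + 1] == '*') {
            /* '*' state: swallow as many copies of c as the target offers */
            while (target[t] != '\0' && target[t] == c)
                t++;
            p += 2;                 /* skip the character and the '*' */
        } else if (pattern[p + 1] == '?') {
            /* '?' state: swallow one copy of c if it is there */
            if (target[t] == c)     /* c is never '\0' here */
                t++;
            p += 2;                 /* skip the character and the '?' */
        } else {
            /* plain character state: must match exactly */
            if (target[t] != c)
                return 0;
            t++;
            p++;
        }
    }
    /* Assumption added here: leftover target characters mean no match. */
    return target[t] == '\0';
}
```

Calling `greedy_match("dum*y", "dummy")` should give 1, while `greedy_match("a*ab", "aab")` gives 0 even though a backtracking matcher would accept it.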

The full power of regular expressions and finite state machines is not needed to solve this problem. As an alternative, there is a relatively simple dynamic programming solution.
Let match(i, j) be 1 if it is possible to match the substring string[i..n-1] with the sub-pattern pattern[j..m-1], where n and m are the lengths of string and pattern respectively. Otherwise let match(i, j) be 0.
The base cases are:
match(n, m) = 1, you can match an empty string with an empty pattern;
match(i, m) = 0, you can't match a non-empty string with an empty pattern;
The transition is divided into 3 cases depending on whether the current sub-pattern starts with a character followed by a '*', or a character followed by a '?' or just starts with a character with no special symbol after it.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int is_match(char* pattern, char* string)
{
    int n = strlen(string);
    int m = strlen(pattern);
    int i, j;
    int **match;
    match = (int **) malloc((n + 1) * sizeof(int *));
    for(i = 0; i <= n; i++) {
        match[i] = (int *) malloc((m + 1) * sizeof(int));
    }
    for(i = n; i >= 0; i--) {
        for(j = m; j >= 0; j--) {
            if(i == n && j == m) {
                match[i][j] = 1;
            }
            else if(i < n && j == m) {
                match[i][j] = 0;
            }
            else {
                match[i][j] = 0;
                if(pattern[j + 1] == '*') {
                    if(match[i][j + 2]) match[i][j] = 1;
                    if(i < n && pattern[j] == string[i] && match[i + 1][j]) match[i][j] = 1;
                }
                else if(pattern[j + 1] == '?') {
                    if(match[i][j + 2]) match[i][j] = 1;
                    if(i < n && pattern[j] == string[i] && match[i + 1][j + 2]) match[i][j] = 1;
                }
                else if(i < n && pattern[j] == string[i] && match[i + 1][j + 1]) {
                    match[i][j] = 1;
                }
            }
        }
    }
    int result = match[0][0];
    for(i = 0; i <= n; i++) {
        free(match[i]);
    }
    free(match);
    return result;
}
int main(void)
{
    printf("is_match(dummy, dummy) = %d\n", is_match("dummy","dummy"));
    printf("is_match(dumm?y, dummy) = %d\n", is_match("dumm?y","dummy"));
    printf("is_match(dum?y, dummy) = %d\n", is_match("dum?y","dummy"));
    printf("is_match(dum*y, dummy) = %d\n", is_match("dum*y","dummy"));
    system("pause");
    return 0;
}
The time complexity of this approach is O(n * m). The memory complexity is also O(n * m) but with a simple modification can be reduced to O(m).
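The O(m) modification might look like the following sketch. It assumes the same recurrence as the code above and keeps only two rows of the table, since row i depends only on rows i and i + 1. The name `is_match_rolling` and the -1 return value on allocation failure are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Two-row version of the DP (hypothetical name): next[] holds row i + 1,
   cur[] holds row i. Returns 1 on match, 0 on mismatch, -1 if out of memory. */
int is_match_rolling(const char *pattern, const char *string)
{
    size_t n = strlen(string);
    size_t m = strlen(pattern);
    unsigned char *cur = calloc(m + 1, 1);
    unsigned char *next = calloc(m + 1, 1);
    int result = -1;

    if (cur != NULL && next != NULL) {
        /* i runs n, n-1, ..., 0; j runs m, m-1, ..., 0 */
        for (size_t i = n + 1; i-- > 0; ) {
            for (size_t j = m + 1; j-- > 0; ) {
                int ok;
                if (j == m) {
                    ok = (i == n);  /* empty pattern matches only empty string */
                } else if (pattern[j + 1] == '*') {
                    ok = cur[j + 2]
                      || (i < n && pattern[j] == string[i] && next[j]);
                } else if (pattern[j + 1] == '?') {
                    ok = cur[j + 2]
                      || (i < n && pattern[j] == string[i] && next[j + 2]);
                } else {
                    ok = i < n && pattern[j] == string[i] && next[j + 1];
                }
                cur[j] = (unsigned char)ok;
            }
            /* the row just filled becomes "row i + 1" for the next pass */
            unsigned char *tmp = cur;
            cur = next;
            next = tmp;
        }
        result = next[0];           /* after the last swap, next is row 0 */
    }
    free(cur);
    free(next);
    return result;
}
```

The time complexity stays O(n * m); only the storage shrinks to two arrays of m + 1 bytes.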

Simple recursive implementation. It's slow but easy to understand:
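#include <stddef.h>
int is_match(const char *pattern, const char *string)
{
    if (!pattern[0]) {
        /* empty pattern matches only the empty string */
        return !string[0];
    } else if (pattern[1] == '?') {
        /* "x?": use one x if present, or skip it entirely */
        return (pattern[0] == string[0] && is_match(pattern+2, string+1))
            || is_match(pattern+2, string);
    } else if (pattern[1] == '*') {
        /* "x*": try 0, 1, 2, ... copies of x. The rest of the pattern is
           tested before checking string[i], so that zero copies and
           "all available copies" are both tried. */
        size_t i;
        for (i=0; ; i++) {
            if (is_match(pattern+2, string+i)) return 1;
            if (string[i] != pattern[0]) return 0;
        }
    } else {
        return pattern[0] == string[0] && is_match(pattern+1, string+1);
    }
}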
Hope I got it all right.

A C program to find the index at which the substring starts in the main string.
#include <stdio.h>
int mystrstr(const char *, const char *);
int mystrcmp(const char *, const char *);
int main(void)
{
    /* s1 is the main string and s2 is the substring.
       These are example values only; replace them with your own. */
    const char *s1 = "hello world";
    const char *s2 = "world";
    /* print the index of the substring, or -1 if it is not found */
    printf("Index is %d\n", mystrstr(s1, s2));
    return 0;
}
/* search for the sub-string in the main string;
   returns the start index, or -1 if there is no match */
int mystrstr(const char *ps1, const char *ps2)
{
    int i = 0, j = 0, c, l, m;
    while (ps1[i]) i++;        /* i = length of the main string */
    while (ps2[j]) j++;        /* j = length of the sub-string */
    char z[j + 1];             /* window buffer, +1 for the terminator */
    for (l = 0; l <= i - j; l++)
    {
        c = 0;
        for (m = l; m < j + l; m++)
            /* store the sub-string of similar size from main string */
            z[c++] = ps1[m];
        z[c] = '\0';
        if (mystrcmp(z, ps2) == 0)
            return l;
    }
    return -1;
}
int mystrcmp(const char *ps3, const char *ps4) /* compare two strings */
{
    int i = 0;
    /* advance to the first difference or to the end of ps3 */
    while (ps3[i] != 0 && ps3[i] == ps4[i]) i++;
    if (ps3[i] == ps4[i])
        return 0;
    if (ps3[i] > ps4[i])
        return +1;
    else
        return -1;
}
