How to replace escaped symbol with its actual value?

How to replace escaped symbol with its actual value? - c++

I have a string like escape new lines\n. When printing it, it doesn't replace \n with a new line, it just prints escape new lines\n. I'm iterating over this std::string, and I want to replace \n with it's actual value:
bool seenSlash = false;
for (int index = 0; index < text.length(); ++index) {
if (text[index] == '\\') {
seenSlash = true;
continue;
}
if (seenSlash) {
// what to do here?
seenSlash = false;
}
}
It's not just \n, I want to support all of those escaping symbols. How can I do it?

WHat you could do is a switch statement or using a constant std::map<char,char>:
if (seenSlash) {
bool replace = false;
switch(text[index]) {
case 'n':
replace = true;
text[index-1] = '\n';
break;
case 't':
replace = true;
text[index-1] = '\t';
break;
// ... etc.
}
if(replace) {
text.erase(text.begin() + index); // erase the cuurent char, the
// backslash char was replaced
--index; // adapt the index for the next iteration
}
seenSlash = false;
}
const std::map<char,char> escaped_chars = {
{ 'n', '\n' } ,
{ 't', '\t' } ,
{ 'a', '\a' } ,
// ... etc.
};
if (seenSlash) {
bool replace = escaped_chars.find(text[i]) != escaped_chars.end();
if(replace) {
text[index - 1] = escaped_chars[text[i]];
text.erase(text.begin() + index); // erase the cuurent char, the
// backslash char was replaced
--index; // adapt the index for the next iteration
}
seenSlash = false;
}

Related

How do I parse comma-delimited string in C++ with some elements being quoted with commas?

I have a comma-delimited string that I want to store in a string vector. The string and vectors are:
string s = "1, 10, 'abc', 'test, 1'";
vector<string> v;
Ideally I want the strings 'abc' and 'test, 1' to be stored without the single quotes as below, but I can live with storing them with single quotes:
v[0] = "1";
v[1] = "10";
v[2] = "abc";
v[3] = "test, 1";

bool nextToken(const string &s, string::size_type &start, string &token)
{
token.clear();
start = s.find_first_not_of(" \t", start);
if (start == string::npos)
return false;
string::size_type end;
if (s[start] == '\'')
{
++start;
end = s.find('\'', start);
}
else
end = s.find_first_of(" \t,", start);
if (end == string::npos)
{
token = s.substr(start);
start = s.size();
}
else
{
token = s.substr(start, end-start);
if ((s[end] != ',') && ((end = s.find(',', end + 1)) == string::npos))
start = s.size();
else
start = end + 1;
}
return true;
}
string s = "1, 10, 'abc', 'test, 1'", token;
vector<string> v;
string::size_type start = 0;
while (nextToken(s, start, token))
v.push_back(token);
Demo

What you need to do here, is make yourself a parser that parses as you want it to. Here I have made a parsing function for you:
#include <string>
#include <vector>
using namespace std;
vector<string> parse_string(string master) {
char temp; //the current character
bool encountered = false; //for checking if there is a single quote
string curr_parse; //the current string
vector<string>result; //the return vector
for (int i = 0; i < master.size(); ++i) { //while still in the string
temp = master[i]; //current character
switch (temp) { //switch depending on the character
case '\'': //if the character is a single quote
if (encountered) encountered = false; //if we already found a single quote, reset encountered
else encountered = true; //if we haven't found a single quote, set encountered to true
[[fallthrough]];
case ',': //if it is a comma
if (!encountered) { //if we have not found a single quote
result.push_back(curr_parse); //put our current string into our vector
curr_parse = ""; //reset the current string
break; //go to next character
}//if we did find a single quote, go to the default, and push_back the comma
[[fallthrough]];
default: //if it is a normal character
if (encountered && isspace(temp)) curr_parse.push_back(temp); //if we have found a single quote put the whitespace, we don't care
else if (isspace(temp)) break; //if we haven't found a single quote, trash the whitespace and go to the next character
else if (temp == '\'') break; //if the current character is a single quote, trash it and go to the next character.
else curr_parse.push_back(temp); //if all of the above failed, put the character into the current string
break; //go to the next character
}
}
for (int i = 0; i < result.size(); ++i) {
if (result[i] == "") result.erase(result.begin() + i);
//check that there are no empty strings in the vector
//if there are, delete them
}
return result;
}
This parses your string as you want it to, and returns a vector. Then, you can use it in your program:
#include <iostream>
int main() {
string s = "1, 10, 'abc', 'test, 1'";
vector<string> v = parse_string(s);
for (int i = 0; i < v.size(); ++i) {
cout << v[i] << endl;
}
}
and it properly prints out:
1
10
abc
test, 1

A proper solution would require a parser implementation. If you need a quick hack, just write a cell reading function (demo). The c++14's std::quoted manipulator is of great help here. The only problem is the manipulator requires a stream. This is easily solved with istringstream - see the second function. Note that the format of your string is CELL COMMA CELL COMMA... CELL.
istream& get_cell(istream& is, string& s)
{
char c;
is >> c; // skips ws
is.unget(); // puts back in the stream the last read character
if (c == '\'')
return is >> quoted(s, '\'', '\\'); // the first character of the cell is ' - read quoted
else
return getline(is, s, ','), is.unget(); // read unqoted, but put back comma - we need it later, in get function
}
vector<string> get(const string& s)
{
istringstream iss{ s };
string cell;
vector<string> r;
while (get_cell(iss, cell))
{
r.push_back( cell );
char comma;
iss >> comma; // expect a cell separator
if (comma != ',')
break; // cell separator not found; we are at the end of stream/string - break the loop
}
if (char c; iss >> c) // we reached the end of what we understand - probe the end of stream
throw "ill formed";
return r;
}
And this is how you use it:
int main()
{
string s = "1, 10, 'abc', 'test, 1'";
try
{
auto v = get(s);
}
catch (const char* e)
{
cout << e;
}
}

How do I replace all special characters in a string with the escape character?

If I had a string that looked like \"A\\nB\", how do I transform it to look like "A\nB"?
The \n portion should work as a new line. It should not be printing "A\nB", it should be printing as follows:
A
B

You can create a separate function for removing escape characters from your std::string in a following way:
std::string remove_escape_char(std::string const& s) {
std::string result;
auto it = s.begin();
while (it != s.end()) {
char c = *it++;
if (c == '\\' && it != s.end()) {
switch (*it++) {
case '\\':
c = '\\';
break;
case 'n':
c = '\n';
break;
default:
continue;
}
}
result += c;
}
return result;
}
and then the function for removing special characters from std::string:
void remove_special_char(std::string& str, char c) {
auto position = str.find(c);
while (position != std::string::npos) {
str.erase(position, 1);
position = str.find(c);
}
}
You can use above two functions like:
std::string str{"\"A\\nB\""};
remove_special_char(str, 0x22); // remove "
std::cout << remove_escape_char(str) << std::endl;
Now the output should be:
A
B
Demo

This should do the work:
if(example.size() > 1) {
for (auto i = 0ul; i < example.size() - 1; ++i) // loop through the string char by char
if (example[i] != '\\' || (example[i + 1] == '\\'))
result += example[i]; // if the current one is a \ and the next char is different to \ (for the case of \\) remove it
if (example[example.size() - 1] == '\\' && example[example.size() - 1] == '\\') result += '\\'; // check for the last char
} else if(example.size() == 1 && example[0] != '\\') result+=example[0]; // check for special case there the string is just one char long

Converting a cstring to camelcase

So my task is to fill out my function to work with a test driver that feeds it a random string during every run. For this function I have to convert the first character of every word to a capital and everything else must be lower.
It mostly works but the issue i'm having with my code is that it won't capitalize the very first character and if there is a period before the word like:
.word
The 'w' in this case would remain lower.
Here is my source:
void camelCase(char line[])
{
int index = 0;
bool lineStart = true;
for (index;line[index]!='\0';index++)
{
if (lineStart)
{
line[index] = toupper(line[index]);
lineStart = false;
}
if (line[index] == ' ')
{
if (ispunct(line[index]))
{
index++;
line[index] = toupper(line[index]);
}
else
{
index++;
line[index] = toupper(line[index]);
}
}else
line[index] = tolower(line[index]);
}
lineStart = false;
}

Here's a solution that should work and is a bit less complicated in my opinion:
#include <iostream>
#include <cctype>
void camelCase(char line[]) {
bool active = true;
for(int i = 0; line[i] != '\0'; i++) {
if(std::isalpha(line[i])) {
if(active) {
line[i] = std::toupper(line[i]);
active = false;
} else {
line[i] = std::tolower(line[i]);
}
} else if(line[i] == ' ') {
active = true;
}
}
}
int main() {
char arr[] = "hELLO, wORLD!"; // Hello, World!
camelCase(arr);
std::cout << arr << '\n';
}
The variable active tracks whether the next letter should be transformed to an uppercase letter. As soon as we have transformed a letter to uppercase form, active becomes false and the program starts to transform letters into lowercase form. If there's a space, active is set to true and the whole process starts again.

Solution using std::string
void toCamelCase(std::string & s)
{
char previous = ' ';
auto f = [&](char current){
char result = (std::isblank(previous) && std::isalpha(current)) ? std::toupper(current) : std::tolower(current);
previous = current;
return result;
};
std::transform(s.begin(),s.end(),s.begin(),f);
}

Interview: Machine coding / regex (Better alternative to my solution)

The following is the interview question:
Machine coding round: (Time 1hr)
Expression is given and a string testCase, need to evaluate the testCase is valid or not for expression
Expression may contain:
letters [a-z]
'.' ('.' represents any char in [a-z])
'*' ('*' has same property as in normal RegExp)
'^' ('^' represents start of the String)
'$' ('$' represents end of String)
Sample cases:
Expression Test Case Valid
ab ab true
a*b aaaaaab true
a*b*c* abc true
a*b*c aaabccc false
^abc*b abccccb true
^abc*b abbccccb false
^abcd$ abcd true
^abc*abc$ abcabc true
^abc.abc$ abczabc true
^ab..*abc$ abyxxxxabc true
My approach:
Convert the given regular expression into concatenation(ab), alteration(a|b), (a*) kleenstar.
And add + for concatenation.
For example:
abc$ => .*+a+b+c
^ab..*abc$ => a+b+.+.*+a+b+c
Convert into postfix notation based on precedence.
(parantheses>kleen_star>concatenation>..)
(a|b)*+c => ab|*c+
Build NFA based on Thompson construction
Backtracking / traversing through NFA by maintaining a set of states.
When I started implementing it, it took me a lot more than 1 hour. I felt that the step 3 was very time consuming. I built the NFA by using postfix notation +stack and by adding new states and transitions as needed.
So, I was wondering if there is faster alternative solution this question? Or maybe a faster way to implement step 3. I found this CareerCup link where someone mentioned in the comment that it was from some programming contest. So If someone has solved this previously or has a better solution to this question, I'd be happy to know where I went wrong.

Some derivation of Levenshtein distance comes to mind - possibly not the fastest algorithm, but it should be quick to implement.
We can ignore ^ at the start and $ at the end - anywhere else is invalid.
Then we construct a 2D grid where each row represents a unit [1] in the expression and each column represents a character in the test string.
[1]: A "unit" here refers to a single character, with the exception that * shall be attached to the previous character
So for a*b*c and aaabccc, we get something like:
a a a b c c c
a*
b*
c
Each cell can have a boolean value indicating validity.
Now, for each cell, set it to valid if either of these hold:
The value in the left neighbour is valid and the row is x* or .* and the column is x (x being any character a-z)
This corresponds to a * matching one additional character.
The value in the upper-left neighbour is valid and the row is x or . and the column is x (x being any character a-z)
This corresponds to a single-character match.
The value in the top neighbour is valid and the row is x* or .*.
This corresponds to the * matching nothing.
Then check if the bottom-right-most cell is valid.
So, for the above example, we get: (V indicating valid)
a a a b c c c
a* V V V - - - -
b* - - - V - - -
c - - - - V - -
Since the bottom-right cell isn't valid, we return invalid.
Running time: O(stringLength*expressionLength).
You should notice that we're mostly exploring a fairly small part of the grid.
This solution can be improved by making it a recursive solution making use of memoization (and just calling the recursive solution for the bottom-right cell).
This will give us a best-case performance of O(1), but still a worst-case performance of O(stringLength*expressionLength).
My solution assumes the expression must match the entire string, as inferred from the result of the above example being invalid (as per the question).
If it can instead match a substring, we can modify this slightly so, if the cell is in the top row it's valid if:
The row is x* or .*.
The row is x or . and the column is x.

Given only 1 hour we can use simple way.
Split pattern into tokens: a*b.c => { a* b . c }.
If pattern doesn't start with ^ then add .* in the beginning, else remove ^.
If pattern doesn't end with $ then add .* in the end, else remove $.
Then we use recursion: going 3 way in case if we have recurring pattern (increase pattern index by 1, increase word index by 1, increase both indices by 1), going one way if it is not recurring pattern (increase both indices by 1).
Sample code in C#
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
namespace ReTest
{
class Program
{
static void Main(string[] args)
{
Debug.Assert(IsMatch("ab", "ab") == true);
Debug.Assert(IsMatch("aaaaaab", "a*b") == true);
Debug.Assert(IsMatch("abc", "a*b*c*") == true);
Debug.Assert(IsMatch("aaabccc", "a*b*c") == true); /* original false, but it should be true */
Debug.Assert(IsMatch("abccccb", "^abc*b") == true);
Debug.Assert(IsMatch("abbccccb", "^abc*b") == false);
Debug.Assert(IsMatch("abcd", "^abcd$") == true);
Debug.Assert(IsMatch("abcabc", "^abc*abc$") == true);
Debug.Assert(IsMatch("abczabc", "^abc.abc$") == true);
Debug.Assert(IsMatch("abyxxxxabc", "^ab..*abc$") == true);
}
static bool IsMatch(string input, string pattern)
{
List<PatternToken> patternTokens = new List<PatternToken>();
for (int i = 0; i < pattern.Length; i++)
{
char token = pattern[i];
if (token == '^')
{
if (i == 0)
patternTokens.Add(new PatternToken { Token = token, Occurence = Occurence.Single });
else
throw new ArgumentException("input");
}
else if (char.IsLower(token) || token == '.')
{
if (i < pattern.Length - 1 && pattern[i + 1] == '*')
{
patternTokens.Add(new PatternToken { Token = token, Occurence = Occurence.Multiple });
i++;
}
else
patternTokens.Add(new PatternToken { Token = token, Occurence = Occurence.Single });
}
else if (token == '$')
{
if (i == pattern.Length - 1)
patternTokens.Add(new PatternToken { Token = token, Occurence = Occurence.Single });
else
throw new ArgumentException("input");
}
else
throw new ArgumentException("input");
}
PatternToken firstPatternToken = patternTokens.First();
if (firstPatternToken.Token == '^')
patternTokens.RemoveAt(0);
else
patternTokens.Insert(0, new PatternToken { Token = '.', Occurence = Occurence.Multiple });
PatternToken lastPatternToken = patternTokens.Last();
if (lastPatternToken.Token == '$')
patternTokens.RemoveAt(patternTokens.Count - 1);
else
patternTokens.Add(new PatternToken { Token = '.', Occurence = Occurence.Multiple });
return IsMatch(input, 0, patternTokens, 0);
}
static bool IsMatch(string input, int inputIndex, IList<PatternToken> pattern, int patternIndex)
{
if (inputIndex == input.Length)
{
if (patternIndex == pattern.Count || (patternIndex == pattern.Count - 1 && pattern[patternIndex].Occurence == Occurence.Multiple))
return true;
else
return false;
}
else if (inputIndex < input.Length && patternIndex < pattern.Count)
{
char c = input[inputIndex];
PatternToken patternToken = pattern[patternIndex];
if (patternToken.Token == '.' || patternToken.Token == c)
{
if (patternToken.Occurence == Occurence.Single)
return IsMatch(input, inputIndex + 1, pattern, patternIndex + 1);
else
return IsMatch(input, inputIndex, pattern, patternIndex + 1) ||
IsMatch(input, inputIndex + 1, pattern, patternIndex) ||
IsMatch(input, inputIndex + 1, pattern, patternIndex + 1);
}
else
return false;
}
else
return false;
}
class PatternToken
{
public char Token { get; set; }
public Occurence Occurence { get; set; }
public override string ToString()
{
if (Occurence == Occurence.Single)
return Token.ToString();
else
return Token.ToString() + "*";
}
}
enum Occurence
{
Single,
Multiple
}
}
}

Here is a solution in Java. Space and Time is O(n). Inline comments are provided for more clarity:
/**
* #author Santhosh Kumar
*
*/
public class ExpressionProblemSolution {
public static void main(String[] args) {
System.out.println("---------- ExpressionProblemSolution - start ---------- \n");
ExpressionProblemSolution evs = new ExpressionProblemSolution();
evs.runMatchTests();
System.out.println("\n---------- ExpressionProblemSolution - end ---------- ");
}
// simple node structure to keep expression terms
class Node {
Character ch; // char [a-z]
Character sch; // special char (^, *, $, .)
Node next;
Node(Character ch1, Character sch1) {
ch = ch1;
sch = sch1;
}
Node add(Character ch1, Character sch1) {
this.next = new Node(ch1, sch1);
return this.next;
}
Node next() {
return this.next;
}
public String toString() {
return "[ch=" + ch + ", sch=" + sch + "]";
}
}
private boolean letters(char ch) {
return (ch >= 'a' && ch <= 'z');
}
private boolean specialChars(char ch) {
return (ch == '.' || ch == '^' || ch == '*' || ch == '$');
}
private void validate(String expression) {
// if expression has invalid chars throw runtime exception
if (expression == null) {
throw new RuntimeException(
"Expression can't be null, but it can be empty");
}
char[] expr = expression.toCharArray();
for (int i = 0; i < expr.length; i++) {
if (!letters(expr[i]) && !specialChars(expr[i])) {
throw new RuntimeException(
"Expression contains invalid char at position=" + i
+ ", invalid_char=" + expr[i]
+ " (allowed chars are 'a-z', *, . ^, * and $)");
}
}
}
// Parse the expression and split them into terms and add to list
// the list is FSM (Finite State Machine). The list is used during
// the process step to iterate through the machine states based
// on the input string
//
// expression = a*b*c has 3 terms -> [a*] [b*] [c]
// expression = ^ab.*c$ has 4 terms -> [^a] [b] [.*] [c$]
//
// Timing : O(n) n -> expression length
// Space : O(n) n -> expression length decides the no.of terms stored in the list
private Node preprocess(String expression) {
debug("preprocess - start [" + expression + "]");
validate(expression);
Node root = new Node(' ', ' '); // root node with empty values
Node current = root;
char[] expr = expression.toCharArray();
int i = 0, n = expr.length;
while (i < n) {
debug("i=" + i);
if (expr[i] == '^') { // it is prefix operator, so it always linked
// to the char after that
if (i + 1 < n) {
if (i == 0) { // ^ indicates start of the expression, so it
// must be first in the expr string
current = current.add(expr[i + 1], expr[i]);
i += 2;
continue;
} else {
throw new RuntimeException(
"Special char ^ should be present only at the first position of the expression (position="
+ i + ", char=" + expr[i] + ")");
}
} else {
throw new RuntimeException(
"Expression missing after ^ (position=" + i
+ ", char=" + expr[i] + ")");
}
} else if (letters(expr[i]) || expr[i] == '.') { // [a-z] or .
if (i + 1 < n) {
char nextCh = expr[i + 1];
if (nextCh == '$' && i + 1 != n - 1) { // if $, then it must
// be at the last
// position of the
// expression
throw new RuntimeException(
"Special char $ should be present only at the last position of the expression (position="
+ (i + 1)
+ ", char="
+ expr[i + 1]
+ ")");
}
if (nextCh == '$' || nextCh == '*') { // a* or b$
current = current.add(expr[i], nextCh);
i += 2;
continue;
} else {
current = current.add(expr[i], expr[i] == '.' ? expr[i]
: null);
i++;
continue;
}
} else { // a or b
current = current.add(expr[i], null);
i++;
continue;
}
} else {
throw new RuntimeException("Invalid char - (position=" + (i)
+ ", char=" + expr[i] + ")");
}
}
debug("preprocess - end");
return root;
}
// Traverse over the terms in the list and iterate and match the input string
// The terms list is the FSM (Finite State Machine); the end of list indicates
// end state. That is, input is valid and matching the expression
//
// Timing : O(n) for pre-processing + O(n) for processing = 2O(n) = ~O(n) where n -> expression length
// Timing : O(2n) ~ O(n)
// Space : O(n) where n -> expression length decides the no.of terms stored in the list
public boolean process(String expression, String testString) {
Node root = preprocess(expression);
print(root);
Node current = root.next();
if (root == null || current == null)
return false;
int i = 0;
int n = testString.length();
debug("input-string-length=" + n);
char[] test = testString.toCharArray();
// while (i < n && current != null) {
while (current != null) {
debug("process: i=" + i);
debug("process: ch=" + current.ch + ", sch=" + current.sch);
if (current.sch == null) { // no special char just [a-z] case
if (test[i] != current.ch) { // test char and current state char
// should match
return false;
} else {
i++;
current = current.next();
continue;
}
} else if (current.sch == '^') { // process start char
if (i == 0 && test[i] == current.ch) {
i++;
current = current.next();
continue;
} else {
return false;
}
} else if (current.sch == '$') { // process end char
if (i == n - 1 && test[i] == current.ch) {
i++;
current = current.next();
continue;
} else {
return false;
}
} else if (current.sch == '*') { // process repeat char
if (letters(current.ch)) { // like a* or b*
while (i < n && test[i] == current.ch)
i++; // move i till end of repeat char
current = current.next();
continue;
} else if (current.ch == '.') { // like .*
Node nextNode = current.next();
print(nextNode);
if (nextNode != null) {
Character nextChar = nextNode.ch;
Character nextSChar = nextNode.sch;
// a.*z = az or (you need to check the next state in the
// list)
if (test[i] == nextChar) { // test [i] == 'z'
i++;
current = current.next();
continue;
} else {
// a.*z = abz or
// a.*z = abbz
char tch = test[i]; // get 'b'
while (i + 1 < n && test[++i] == tch)
; // move i till end of repeat char
current = current.next();
continue;
}
}
} else { // like $* or ^*
debug("process: return false-1");
return false;
}
} else if (current.sch == '.') { // process any char
if (!letters(test[i])) {
return false;
}
i++;
current = current.next();
continue;
}
}
if (i == n && current == null) {
// string position is out of bound
// list is at end ie. exhausted both expression and input
// FSM reached the end state, hence the input is valid and matches the given expression
return true;
} else {
return false;
}
}
public void debug(Object str) {
boolean debug = false;
if (debug) {
System.out.println("[debug] " + str);
}
}
private void print(Node node) {
StringBuilder sb = new StringBuilder();
while (node != null) {
sb.append(node + " ");
node = node.next();
}
sb.append("\n");
debug(sb.toString());
}
public boolean match(String expr, String input) {
boolean result = process(expr, input);
System.out.printf("\n%-20s %-20s %-20s\n", expr, input, result);
return result;
}
public void runMatchTests() {
match("ab", "ab");
match("a*b", "aaaaaab");
match("a*b*c*", "abc");
match("a*b*c", "aaabccc");
match("^abc*b", "abccccb");
match("^abc*b", "abccccbb");
match("^abcd$", "abcd");
match("^abc*abc$", "abcabc");
match("^abc.abc$", "abczabc");
match("^ab..*abc$", "abyxxxxabc");
match("a*b*", ""); // handles empty input string
match("xyza*b*", "xyz");
}}

int regex_validate(char *reg, char *test) {
char *ptr = reg;
while (*test) {
switch(*ptr) {
case '.':
{
test++; ptr++; continue;
break;
}
case '*':
{
if (*(ptr-1) == *test) {
test++; continue;
}
else if (*(ptr-1) == '.' && (*test == *(test-1))) {
test++; continue;
}
else {
ptr++; continue;
}
break;
}
case '^':
{
ptr++;
while ( ptr && test && *ptr == *test) {
ptr++; test++;
}
if (!ptr && !test)
return 1;
if (ptr && test && (*ptr == '$' || *ptr == '*' || *ptr == '.')) {
continue;
}
else {
return 0;
}
break;
}
case '$':
{
if (*test)
return 0;
break;
}
default:
{
printf("default case.\n");
if (*ptr != *test) {
return 0;
}
test++; ptr++; continue;
}
break;
}
}
return 1;
}
int main () {
printf("regex=%d\n", regex_validate("ab", "ab"));
printf("regex=%d\n", regex_validate("a*b", "aaaaaab"));
printf("regex=%d\n", regex_validate("^abc.abc$", "abcdabc"));
printf("regex=%d\n", regex_validate("^abc*abc$", "abcabc"));
printf("regex=%d\n", regex_validate("^abc*b", "abccccb"));
printf("regex=%d\n", regex_validate("^abc*b", "abbccccb"));
return 0;
}

wild card matching in text string

My friend give this wild card(*) matching algorithm . Here is the code .
//This function compares text strings, one of which can have wildcards ('*').
//
BOOL GeneralTextCompare(
char * pTameText, // A string without wildcards
char * pWildText, // A (potentially) corresponding string with wildcards
BOOL bCaseSensitive = FALSE, // By default, match on 'X' vs 'x'
char cAltTerminator = '\0' // For function names, for example, you can stop at the first '('
)
{
BOOL bMatch = TRUE;
char * pAfterLastWild = NULL; // The location after the last '*', if we’ve encountered one
char * pAfterLastTame = NULL; // The location in the tame string, from which we started after last wildcard
char t, w;
// Walk the text strings one character at a time.
while (1)
{
t = *pTameText;
w = *pWildText;
// How do you match a unique text string?
if (!t || t == cAltTerminator)
{
// Easy: unique up on it!
if (!w || w == cAltTerminator)
{
break; // "x" matches "x"
}
else if (w == '*')
{
pWildText++;
continue; // "x*" matches "x" or "xy"
}
else if (pAfterLastTame)
{
if (!(*pAfterLastTame) || *pAfterLastTame == cAltTerminator)
{
bMatch = FALSE;
break;
}
pTameText = pAfterLastTame++;
pWildText = pAfterLastWild;
continue;
}
bMatch = FALSE;
break; // "x" doesn't match "xy"
}
else
{
if (!bCaseSensitive)
{
// Lowercase the characters to be compared.
if (t >= 'A' && t <= 'Z')
{
t += ('a' - 'A');
}
if (w >= 'A' && w <= 'Z')
{
w += ('a' - 'A');
}
}
// How do you match a tame text string?
if (t != w)
{
// The tame way: unique up on it!
if (w == '*')
{
pAfterLastWild = ++pWildText;
pAfterLastTame = pTameText;
w = *pWildText;
if (!w || w == cAltTerminator)
{
break; // "*" matches "x"
}
continue; // "*y" matches "xy"
}
else if (pAfterLastWild)
{
if (pAfterLastWild != pWildText)
{
pWildText = pAfterLastWild;
w = *pWildText;
if (!bCaseSensitive && w >= 'A' && w <= 'Z')
{
w += ('a' - 'A');
}
if (t == w)
{
pWildText++;
}
}
pTameText++;
continue; // "*sip*" matches "mississippi"
}
else
{
bMatch = FALSE;
break; // "x" doesn't match "y"
}
}
}
pTameText++;
pWildText++;
}
return bMatch;
}
This algo works as follow (according to me)
mississippi *sip*
mississippi sip*
ississippi sip*
ssissippi sip*
sissippi ip*
sissippi sip* pAfterLastWild is used to restore the location
issippi ip*
ssippi p*
ssippi sip* again pAfterLastWild is used here.
sippi ip*
sippi sip* here also.
ippi ip*
ppi p*
pi *
i *
I am not able to figure out why pAfterLastTame is needed and what does this piece of code is doing here as i am not able to find use of it .
else if (pAfterLastTame)
{
if (!(*pAfterLastTame) || *pAfterLastTame == cAltTerminator)
{
bMatch = FALSE;
break;
}
pTameText = pAfterLastTame++;
pWildText = pAfterLastWild;
continue;
}
This algo is pretty fast as number of comparisons are equal to size of tameString (correct me i am wrong) .
Does any one know more efficient algorithm than this ??

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js