How does FIGlet fit and smush? - C++

I am writing FIGlet-like code in C++. Currently I do fitting by inserting a null at the end of the left-side character, finding the minimum run of spaces, and removing that many spaces along with the null (for clarity, the null is shown as '0'):
" __0 "
"| _|0 "
"| |0 _____ "
"| |0 |_____|"
"| |0 $ "
"|__|0 "
(If a line contains only spaces, I insert the null at its start.) Here the minimum space count is 3, so I remove 3 spaces and the null. This code works perfectly. For smushing, I inherit from the fitting class and pass the right-side character with a null inserted, like this:
" 0"
" 0"
" _0____ "
"|0_____|"
" $0 "
" 0"
This gives a result like:
" __ 0"
"| _| 0"
"| | _0____ "
"| ||0_____|"
"| | $0 "
"|__| 0"
Then I store the position of the null and remove it. In the next loop I check the two characters before the null: if either one is a hardblank I return from the function, otherwise I smush them. But the example above does not smush correctly because of the hardblank. So I want to know how FIGlet actually smushes. I downloaded the FIGlet code from here, but I did not understand it.
Is there a better algorithm than this, or how does FIGlet actually do fitting and smushing?
All suggestions are welcome.
Thanks in advance.

I asked this question a long time ago, but I am answering it now for future readers. Some time ago I wrote a better algorithm than this; it does the following.
Kerning (fitting):
The main part of this is trimming, so let's create a function that takes two inputs: figs, the left-side FIGchar, and figc, the right-side FIGchar. First, find the number of spaces that need to be removed from the right side of figs and the left side of figc. For each row, count the trailing spaces of figs plus the leading spaces of figc; the minimum of these counts over all rows is the number of spaces to remove. Here is the implementation:
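/**
 * @brief Kerns figs and figc: removes the largest number of spaces
 *        that every row can spare between the two.
 *
 * @param figs fig string (left side, built so far)
 * @param figc fig character (right side, to be appended)
 * Requires <algorithm> (std::min_element) and <vector>.
 */
void trim_deep(Figs_type &figs, Figc_type &figc) const
{
    if (figs.empty())
        return;
    std::vector<size_type> elem;
    for (size_type i = 0; i < figs.size(); ++i)
    {
        // Trailing spaces of the left row plus leading spaces of the right row.
        size_type lcount = 0, rcount = 0;
        for (auto itr = figs[i].rbegin(); itr != figs[i].rend() && *itr == ' '; ++itr)
            ++lcount;
        for (auto itr = figc[i].begin(); itr != figc[i].end() && *itr == ' '; ++itr)
            ++rcount;
        elem.push_back(lcount + rcount);
    }
    // The smallest gap over all rows is what can be removed from every row.
    size_type space = *std::min_element(elem.begin(), elem.end());
    for (size_type i = 0; i < figs.size(); ++i)
    {
        size_type siz = space;
        // Remove spaces from the end of figs first (guarding against empty rows)...
        while (siz > 0 && !figs[i].empty() && figs[i].back() == ' ')
        {
            figs[i].pop_back();
            --siz;
        }
        // ...and the remainder from the start of figc.
        figc[i].erase(0, siz);
    }
}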
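To try the kerning step outside the original class, here is a minimal self-contained sketch of the same arithmetic as a free function. It assumes each FIGchar is a std::vector<std::string> with one string per row and that both sides have the same number of rows; Rows, trailing_spaces, leading_spaces and kern are hypothetical names, not taken from the original repo.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical alias: one std::string per row of a FIGchar.
using Rows = std::vector<std::string>;

// Trailing spaces of s (the whole length if s is blank).
std::size_t trailing_spaces(const std::string& s) {
    const std::size_t pos = s.find_last_not_of(' ');
    return pos == std::string::npos ? s.size() : s.size() - 1 - pos;
}

// Leading spaces of s (the whole length if s is blank).
std::size_t leading_spaces(const std::string& s) {
    const std::size_t pos = s.find_first_not_of(' ');
    return pos == std::string::npos ? s.size() : pos;
}

// Kerning: remove the smallest per-row gap from every row, then join.
// Assumes left and right have the same number of rows.
Rows kern(Rows left, const Rows& right) {
    if (left.empty()) return right;
    std::size_t gap = std::string::npos;
    for (std::size_t i = 0; i < left.size(); ++i)
        gap = std::min(gap, trailing_spaces(left[i]) + leading_spaces(right[i]));
    for (std::size_t i = 0; i < left.size(); ++i) {
        // Take spaces from the end of the left row first...
        const std::size_t from_left = std::min(gap, trailing_spaces(left[i]));
        left[i].erase(left[i].size() - from_left);
        // ...and the rest from the start of the right row.
        left[i] += right[i].substr(gap - from_left);
    }
    return left;
}
```

For example, kern({"|  ", "|_ "}, {"  |", " _|"}) has per-row gaps of 4 and 2, so 2 spaces are removed from each row, giving "|  |" and "|__|".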
Smushing:
Smushing can then be done easily on top of the function above: after kerning, only the rightmost character of figs and the leftmost character of figc need to be smushed, if they are smushable. Here is the implementation:
/**
* @brief smush rules
* @param lc left character
* @param rc right character
* @return smushed character
*/
char_type smush_rules(char_type lc, char_type rc) const
{
//(Space smush: the non-space character wins)
if (lc == ' ')
{
return rc;
}
if (rc == ' ')
{
return lc;
}
//(Equal character smush)
if (lc == rc)
{
return rc;
}
//(Underscores smush)
if (lc == '_' && this->cvt("|/\\[]{}()<>").find(rc) != string_type::npos)
{
return rc;
}
if (rc == '_' && this->cvt("|/\\[]{}()<>").find(lc) != string_type::npos)
{
return lc;
}
//(Hierarchy Smushing)
auto find_class = [](char_type ch) -> size_type
{
if (ch == '|')
{
return 1;
}
if (ch == '/' || ch == '\\')
{
return 3;
}
if (ch == '[' || ch == ']')
{
return 4;
}
if (ch == '{' || ch == '}')
{
return 5;
}
if (ch == '(' || ch == ')')
{
return 6;
}
if (ch == '<' || ch == '>')
{
return 7;
}
return 0;
};
size_type c_lc = find_class(lc);
size_type c_rc = find_class(rc);
if (c_lc > c_rc)
{
return lc;
}
if (c_rc > c_lc)
{
return rc;
}
//(Opposite smush)
if (lc == '[' && rc == ']')
{
return '|';
}
if (lc == ']' && rc == '[')
{
return '|';
}
if (lc == '{' && rc == '}')
{
return '|';
}
if (lc == '}' && rc == '{')
{
return '|';
}
if (lc == '(' && rc == ')')
{
return '|';
}
if (lc == ')' && rc == '(')
{
return '|';
}
//(Big X smush)
if (lc == '/' && rc == '\\')
{
return '|';
}
if (lc == '\\' && rc == '/')
{
return 'Y';
}
if (lc == '>' && rc == '<')
{
return 'X';
}
//(Universal smush: the later, right-hand character overrides the earlier one)
return rc;
}
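/**
 * @brief Smush algorithm on a kerned fig string and fig character.
 *
 * @param figs fig string (left side, built so far); receives the result
 * @param figc fig character (right side), taken by value since it is consumed
 * @param hb   the font's hardblank character
 */
void smush(Figs_type &figs, Figc_type figc, char_type hb) const
{
    // Smush only if every row can give up one column.
    bool smushble = true;
    for (size_type i = 0; i < figs.size(); ++i)
    {
        if (figs[i].empty() || figc[i].empty())
        {
            smushble = false;
            continue;
        }
        char_type lc = figs[i].back(), rc = figc[i].front();
        // A hardblank may only smush with a space or another hardblank;
        // otherwise smush_rules() would let the visible character overwrite it.
        if ((lc == hb && rc != hb && rc != ' ') || (rc == hb && lc != hb && lc != ' '))
        {
            smushble = false;
        }
    }
    if (smushble)
    {
        for (size_type i = 0; i < figs.size(); ++i)
        {
            // Replace the two boundary characters by their smushed result.
            char_type val = smush_rules(figs[i].back(), figc[i].front());
            figs[i].pop_back();
            figc[i].erase(0, 1);
            figs[i] += string_type(1, val) + figc[i];
        }
    }
    else
    {
        // Not smushable: keep the kerned result and just append.
        for (size_type i = 0; i < figs.size(); ++i)
        {
            figs[i] += figc[i];
        }
    }
}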
This code is copied directly from this file, so the types may be confusing. In short: Figs_type and Figc_type are essentially vectors of strings, the other types are what their names suggest, and the repo can be found here.
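For comparison, FIGlet's own figlet.c does not kern first and smush afterwards. As far as I can tell, its smushamt() works per row: it adds the trailing spaces of the output line to the leading spaces of the new character, adds one more column if smushem() can smush the two boundary characters, and takes the minimum over all rows. Below is a minimal sketch of that idea, not a port of figlet.c. It assumes rectangular FIGchars stored as std::vector<std::string>, left-to-right printing, and only the space, equal-character and hardblank rules; Rows, can_smush, overlap_amount and append_fig are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical alias: one std::string per row of a FIGchar.
using Rows = std::vector<std::string>;

// Simplified controlled-smushing test (assumption: only the space,
// equal-character and hardblank rules are enabled).
bool can_smush(char lc, char rc, char hb) {
    if (lc == ' ' || rc == ' ') return true;                // a space always yields
    if (lc == hb || rc == hb) return lc == hb && rc == hb;  // hardblank only with hardblank
    return lc == rc;                                        // equal-character rule
}

// Per row: trailing spaces of the left row + leading spaces of the right row,
// plus one column if the two boundary characters can smush. The minimum over
// all rows (capped by the row widths) is how far the right FIGchar may slide.
std::size_t overlap_amount(const Rows& left, const Rows& right, char hb) {
    std::size_t amount = std::string::npos;
    for (std::size_t i = 0; i < left.size() && i < right.size(); ++i) {
        const std::string& l = left[i];
        const std::string& r = right[i];
        const std::size_t lend = l.find_last_not_of(' ');   // last visible char
        const std::size_t rbeg = r.find_first_not_of(' ');  // first visible char
        std::size_t amt = (lend == std::string::npos ? l.size() : l.size() - 1 - lend)
                        + (rbeg == std::string::npos ? r.size() : rbeg);
        if (lend != std::string::npos && rbeg != std::string::npos &&
            can_smush(l[lend], r[rbeg], hb))
            ++amt;
        amount = std::min({amount, amt, l.size(), r.size()});
    }
    return amount == std::string::npos ? 0 : amount;
}

// Appends right to left, overlapping them by overlap_amount() columns.
Rows append_fig(const Rows& left, const Rows& right, char hb) {
    const std::size_t amount = overlap_amount(left, right, hb);
    Rows out;
    for (std::size_t i = 0; i < left.size() && i < right.size(); ++i) {
        std::string row = left[i];
        const std::size_t start = row.size() - amount;  // first shared column
        for (std::size_t k = 0; k < right[i].size(); ++k) {
            const char rc = right[i][k];
            if (start + k < row.size()) {
                // A space yields; otherwise both characters are equal under
                // the rules above, so the left one can stay.
                if (row[start + k] == ' ')
                    row[start + k] = rc;
            } else {
                row += rc;
            }
        }
        out.push_back(row);
    }
    return out;
}
```

Taking the minimum over the rows plays the same role as the "every row must be smushable" check in smush(): a single row that cannot smush limits the whole FIGchar, and a hardblank next to a visible character blocks the extra column.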

Related

Balanced-parentheses checking code is not behaving as expected

I am solving a problem in which I have to check whether an input string of parentheses is balanced.
If it is not, the code is expected to return the 1-based index of the unmatched closing parenthesis, or, if there is none, the 1-based index of the unmatched opening parenthesis. My code runs fine if I implement only the checking part, but as soon as I implement the index-returning part, the code starts printing 'Success' for every input.
Here is the code:
#include<iostream>
#include<string>
#include<algorithm>
#include<stack>
using namespace std;
int process_input( string value );
bool closing_bracket_match(char opening_bracket, char closing_bracket);
bool closing_bracket_match(char opening_bracket , char closing_bracket){
if( (opening_bracket == '{' && closing_bracket == '}') || (opening_bracket == '(' && closing_bracket == ')') || (opening_bracket == '[' &&
closing_bracket == ']') ){
return true;
}
else{
return false;
}
}
int process_input( string value ){
stack<char> processed_input{};
int unmatched_index{};
for( size_t i{}; i< value.size() ; ++i ){
if( value.at(i) == '{' || value.at(i) == '(' || value.at(i) == '[' ){ // check for opening brackets
processed_input.push(value.at(i)); // Appending opening bracket into the stack
}
else if( (value.at(i) == '}' || value.at(i) == ')' || value.at(i) == ']') && (processed_input.empty() == false) &&
closing_bracket_match(processed_input.top(),value.at(i)) ){ // the bracket in stack would be popped
processed_input.pop(); // matching brackets ar removed
}
}
if( processed_input.empty()==true ){
return 0;
}//This part is causing the bug
if(processed_input.empty() == false){
auto it = find( value.begin(), value.end(), processed_input.top() );
if( it!= value.end() ){
unmatched_index = distance(value.begin() , it)+1; //returning the 1 -based index of unmatched bracket
}
return unmatched_index;
}
}
int main(){
string input{};
cout<<"Please enter the code here: "; // debug line
cin>> input;
int result{};
result = process_input(input);
if( result == 0 ){
cout<<"Success";
}
else{
cout<<result;
}
}
If you want to return the position of the last (innermost) unmatched paren, you need to store each paren together with its position on the stack. Searching for it afterwards leads to errors.
Which of potentially several items equal to the one you seek will find() find?
For example, in "(((" there are three unmatched opening parentheses, and all of them are equal to '('. Which one do you want to return as a result? Which one do you actually return?
And how about this input: "()("...?
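To make the point concrete, here is a small sketch (one bracket type, '(' and ')' only; index_via_find and unmatched_opening are hypothetical names) that compares what std::find returns with the positions that a stack of indices reports as unmatched.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// What the question's code effectively does: search the input for the
// character left on top of the stack. std::find returns the FIRST match.
std::size_t index_via_find(const std::string& value, char unmatched) {
    auto it = std::find(value.begin(), value.end(), unmatched);
    return static_cast<std::size_t>(std::distance(value.begin(), it)) + 1;  // 1-based
}

// Keeping positions on the stack instead: 1-based indices of '(' never closed.
std::vector<std::size_t> unmatched_opening(const std::string& value) {
    std::vector<std::size_t> positions;  // used as a stack
    for (std::size_t i = 0; i < value.size(); ++i) {
        if (value[i] == '(')
            positions.push_back(i + 1);
        else if (value[i] == ')' && !positions.empty())
            positions.pop_back();
    }
    return positions;
}
```

For "()(" the find-based index is 1, although the '(' at position 1 is matched; the stack of positions reports only position 3.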
Added: here is a possible solution. Note that it does not find() anything; instead it stores on its stacks all the information needed to produce the desired output.
#include<iostream>
#include<string>
#include<stack>
using std::string;
using std::stack;
bool is_opening(char c) {
return c == '(' || c == '[' || c == '{';
}
bool is_closing(char c) {
return c == ')' || c == ']' || c == '}';
}
bool is_matching(char opn, char cls) {
switch(opn) {
case '(': return cls == ')';
case '[': return cls == ']';
case '{': return cls == '}';
}
return false;
}
int process_input( string value )
{
stack<char> opn_parens{};
stack<size_t> positions{};
for( size_t i{}; i < value.size() ; ++i )
{
const char ch = value.at(i);
if( is_opening(ch) )
{
opn_parens.push(ch);
positions.push(i);
}
else if( is_closing(ch) )
{
if( opn_parens.empty() ) // a closing paren with no unmatched opening one
return i + 1;
const char opn_ch = opn_parens.top();
if( ! is_matching(opn_ch, ch) ) // unmatched closing paren: report its own position
return i + 1;
opn_parens.pop(); // remove a matched paren
positions.pop();
}
}
if( ! positions.empty() ) // some unmatched parens remain
return positions.top() + 1;
return 0;
}
int main(){
std::cout << process_input("hello(mum[]{(dad()[bro!])})") << std::endl;
std::cout << process_input("))") << std::endl;
std::cout << process_input("([") << std::endl;
std::cout << process_input("([)") << std::endl;
std::cout << process_input("([{") << std::endl;
}
You can see it working at https://godbolt.org/z/e8fYW5fKz

How to delete all vowel elements from an array [C++]

I'm trying to remove just the vowels from my dynamic array.
With this function I just get blank spaces where the vowels were.
char* eliminarVocales(char* arreglo, int*size)
{
if (arreglo != nullptr)
{
for (int i = 0; i < *size; i++)
{
if (arreglo[i] == 'a' || arreglo[i] == 'e' || arreglo[i] == 'i' || arreglo[i] == 'o' || arreglo[i] == 'u')
{
arreglo[i] = NULL;
}
}
return arreglo;
}
}
Your program is not removing the vowels; instead it is replacing their values with NULL (that is, '\0'). So in main, where you call this function, you have to use another function (e.g., a display function) that displays only the values that are not '\0'.
for(int i=0;i<maxvalue;i++)
{
if(arreglo[i] != '\0')
{
cout<<"["<<i<<"]"<<arreglo[i]<<endl;
}
}
Let me know if it helped you.
You are getting blanks because you are not actually removing any characters from your array; you are just replacing them with nulls, and then not ignoring the nulls when outputting the contents of the array.
Instead of doing the removal manually, consider using the standard std::remove_if() algorithm, which shifts all the non-vowel characters to the front of the array and returns a pointer to the new logical end. And since you are passing the array size by pointer, you can modify its value to indicate the new size of the array, excluding the removed vowels.
For example:
#include <algorithm>
#include <cctype>
#include <iterator>
char* eliminarVocales(char* arreglo, int* size)
{
if (arreglo)
{
// remove_if shifts the non-vowels to the front and returns the new logical end
*size = static_cast<int>(std::distance(
arreglo,
std::remove_if(arreglo, arreglo + *size,
[](unsigned char ch){ ch = std::toupper(ch); return ((ch == 'A') || (ch == 'E') || (ch == 'I') || (ch == 'O') || (ch == 'U')); }
)
));
}
return arreglo;
}
Then you can use it like this:
void listarElementos(char* arreglo, int size)
{
for(int i = 0; i < size; ++i)
std::cout << "[" << i << "] : " << arreglo[i] << std::endl;
}
...
#include <cstring>
int size = 5;
char *arreglo = new char[size];
std::strncpy(arreglo, "hello", 5);
...
listarElementos(arreglo, size); // shows "hello"
...
eliminarVocales(arreglo, &size);
...
listarElementos(arreglo, size); // shows "hll"
...
delete[] arreglo;
If you use a std::vector (or std::string) for your character array, you can then use the erase-remove idiom:
#include <algorithm>
#include <cctype>
#include <iostream>
#include <vector>
void eliminarVocales(std::vector<char> &arreglo)
{
arreglo.erase(
std::remove_if(arreglo.begin(), arreglo.end(),
[](unsigned char ch){ ch = std::toupper(ch); return ((ch == 'A') || (ch == 'E') || (ch == 'I') || (ch == 'O') || (ch == 'U')); }
),
arreglo.end()
);
}
void listarElementos(const std::vector<char> &arreglo)
{
for(std::size_t i = 0; i < arreglo.size(); ++i)
std::cout << "[" << i << "] : " << arreglo[i] << std::endl;
}
...
#include <cstring>
std::vector<char> arreglo(5);
std::strncpy(arreglo.data(), "hello", 5);
...
listarElementos(arreglo); // shows "hello"
...
eliminarVocales(arreglo);
...
listarElementos(arreglo); // shows "hll"
...
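The answer mentions std::string as an alternative container but only shows std::vector<char>. Here is a minimal sketch of the same erase-remove idiom on std::string, using a lookup string of vowels as the predicate instead of std::toupper (without_vowels is a hypothetical name):

```cpp
#include <algorithm>
#include <string>

// Erase-remove idiom: remove_if shifts the kept characters to the front and
// returns the new logical end; erase then drops the leftover tail.
std::string without_vowels(std::string s) {
    static const std::string vowels = "aeiouAEIOU";
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](char c) { return vowels.find(c) != std::string::npos; }),
            s.end());
    return s;
}
```

On a C++20 compiler, std::erase_if(s, pred) from <string> does the same erase-remove in a single call.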

Recursive comparison of two strings

The function int compare(...) checks whether two strings are equal, ignoring case and any non-alphabetic characters; e.g. "a?...!b" is equivalent to "ab". It returns 1 if they are equal and 0 otherwise. However, there's a bug in my code!
int compare(const char* string1, const char* string2)
{
if(string1 == NULL || string2 == NULL)
return 0;
std::cout << *string1 << " | " << *string2 << std::endl;
if((!isalpha(*string1) && *string1 != ' ') && (!isalpha(*string2) && *string2 != ' '))
{
compare(++string1,++string2);
}
else if(!isalpha(*string1) && *string1 != ' ')
{
compare(++string1,string2);
}
else if(!isalpha(*string2) && *string2 != ' ')
{
compare(string1, ++string2);
}
if(tolower(*string1) != tolower(*string2))
return 0;
if(*string1 == '\0')
return 1;
if(*string1 == *string2)
compare(++string1, ++string2);
}
If I run this code with, for example:
compare("a !!!b", "a b");
The output really confuses me:
a | b
|
! |
! |
! |
b | b
^#| ^#
| a
^#| ^#
| a
It returns 0 (not equal), and it doesn't stop running once it gets to b | b. Why?
Besides the missing return statements (the results of the recursive calls are discarded), there is a flaw in your logic: you need to check whether both strings are exhausted, and therefore equal, earlier in the function:
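#include <cctype>
#include <cstddef>
#include <iostream>
// True for characters that should be ignored: not a letter, not a space,
// and not the terminating '\0' (so we never step past the end of a string)
static bool skippable(char c)
{
return c != '\0' && c != ' ' && !std::isalpha(static_cast<unsigned char>(c));
}
int compare(const char* string1, const char* string2)
{
if(string1 == NULL || string2 == NULL)
return 0;
// This needs to go here
if(*string1 == '\0' && *string2 == '\0') {
return 1;
}
std::cout << *string1 << " | " << *string2 << std::endl;
if(skippable(*string1) && skippable(*string2))
{
return compare(++string1,++string2);
}
else if(skippable(*string1))
{
return compare(++string1,string2);
}
else if(skippable(*string2))
{
return compare(string1, ++string2);
}
if(std::tolower(static_cast<unsigned char>(*string1)) != std::tolower(static_cast<unsigned char>(*string2)))
return 0;
// Same letter (ignoring case) or both spaces; neither can be '\0' here
return compare(++string1, ++string2);
}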
You can check it here: https://ideone.com/Si78Nz
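The recursion can also be flattened into a loop, which avoids one stack frame per character on long strings. Here is a minimal sketch, assuming the same rules as the code above (letters compared case-insensitively, spaces significant, every other character ignored); compare_iterative is a hypothetical name and the debug output is left out.

```cpp
#include <cctype>

// Returns 1 if the strings are equal under the rules above, 0 otherwise.
int compare_iterative(const char* s1, const char* s2)
{
    if (s1 == nullptr || s2 == nullptr)
        return 0;
    // Ignored characters: not a letter, not a space, not the terminator.
    auto skippable = [](char c) {
        return c != '\0' && c != ' ' && !std::isalpha(static_cast<unsigned char>(c));
    };
    for (;;) {
        while (skippable(*s1)) ++s1;   // skip punctuation etc. in s1
        while (skippable(*s2)) ++s2;   // ...and in s2
        if (std::tolower(static_cast<unsigned char>(*s1)) !=
            std::tolower(static_cast<unsigned char>(*s2)))
            return 0;                  // mismatch, or one string ended first
        if (*s1 == '\0')
            return 1;                  // both strings ended together
        ++s1;
        ++s2;
    }
}
```

For example, compare_iterative("a !!!b", "a b") returns 1, while compare_iterative("ab", "abc") returns 0.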

Interview: Machine coding / regex (Better alternative to my solution)

The following is the interview question:
Machine coding round: (Time 1hr)
Given an expression and a string testCase, evaluate whether testCase is valid for the expression.
Expression may contain:
letters [a-z]
'.' ('.' represents any char in [a-z])
'*' ('*' has same property as in normal RegExp)
'^' ('^' represents start of the String)
'$' ('$' represents end of String)
Sample cases:
Expression      Test case       Valid
ab              ab              true
a*b             aaaaaab         true
a*b*c*          abc             true
a*b*c           aaabccc         false
^abc*b          abccccb         true
^abc*b          abbccccb        false
^abcd$          abcd            true
^abc*abc$       abcabc          true
^abc.abc$       abczabc         true
^ab..*abc$      abyxxxxabc      true
My approach:
1. Convert the given regular expression into concatenation (ab), alternation (a|b) and Kleene star (a*), adding + for concatenation. For example:
   abc$ => .*+a+b+c
   ^ab..*abc$ => a+b+.+.*+a+b+c
2. Convert into postfix notation based on precedence (parentheses > Kleene star > concatenation > ...):
   (a|b)*+c => ab|*c+
3. Build an NFA using Thompson's construction.
4. Backtrack / traverse through the NFA by maintaining a set of states.
When I started implementing it, it took me a lot more than one hour. Step 3 felt very time-consuming: I built the NFA from the postfix notation using a stack, adding new states and transitions as needed.
So I was wondering whether there is a faster alternative solution to this question, or maybe a faster way to implement step 3. I found this CareerCup link where someone mentioned in a comment that it was from some programming contest. If someone has solved this before or has a better solution, I'd be happy to know where I went wrong.
Some variation of the Levenshtein distance algorithm comes to mind: possibly not the fastest algorithm, but it should be quick to implement.
We can ignore ^ at the start and $ at the end; anywhere else they are invalid.
Then we construct a 2D grid where each row represents a unit [1] in the expression and each column represents a character in the test string.
[1]: A "unit" here refers to a single character, with the exception that * shall be attached to the previous character
So for a*b*c and aaabccc, we get something like:
      a  a  a  b  c  c  c
a*
b*
c
Each cell can have a boolean value indicating validity.
Now, for each cell, set it to valid if any of these holds:
The value in the left neighbour is valid and the row is x* or .* and the column is x (x being any character a-z)
This corresponds to a * matching one additional character.
The value in the upper-left neighbour is valid and the row is x or . and the column is x (x being any character a-z)
This corresponds to a single-character match.
The value in the top neighbour is valid and the row is x* or .*.
This corresponds to the * matching nothing.
Then check if the bottom-right-most cell is valid.
So, for the above example, we get: (V indicating valid)
      a  a  a  b  c  c  c
a*    V  V  V  -  -  -  -
b*    V  V  V  V  -  -  -
c     -  -  -  -  V  -  -
Since the bottom-right cell isn't valid, we return invalid.
Running time: O(stringLength*expressionLength).
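The grid can be filled row by row. Below is a minimal C++ sketch of it; grid_match and Unit are hypothetical names. It assumes, as the answer does, that the whole string must match, it simply strips a leading ^ and a trailing $, and it adds an implicit start cell (row 0, column 0) so that the three neighbour rules also apply to the first row and column.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A "unit" is one pattern character, optionally followed by '*'.
struct Unit {
    char ch;    // 'a'..'z' or '.'
    bool star;  // true for x* / .*
};

bool grid_match(const std::string& pattern, const std::string& text) {
    std::string p = pattern;
    if (!p.empty() && p.front() == '^') p.erase(0, 1);
    if (!p.empty() && p.back() == '$') p.pop_back();

    std::vector<Unit> units;
    for (std::size_t i = 0; i < p.size(); ++i) {
        const bool star = i + 1 < p.size() && p[i + 1] == '*';
        units.push_back({p[i], star});
        if (star) ++i;  // '*' belongs to the previous character
    }

    const std::size_t rows = units.size(), cols = text.size();
    // valid[r][c]: the first r units match the first c characters.
    std::vector<std::vector<bool>> valid(rows + 1, std::vector<bool>(cols + 1, false));
    valid[0][0] = true;  // implicit start cell
    for (std::size_t r = 1; r <= rows; ++r) {
        const Unit& u = units[r - 1];
        for (std::size_t c = 0; c <= cols; ++c) {
            const bool hit = c > 0 && (u.ch == '.' || u.ch == text[c - 1]);
            if (u.star)
                valid[r][c] = valid[r - 1][c]                // * matches nothing (top)
                              || (hit && valid[r][c - 1]);   // * matches one more (left)
            else
                valid[r][c] = hit && valid[r - 1][c - 1];    // single character (upper-left)
        }
    }
    return valid[rows][cols];  // bottom-right cell
}
```

With these assumptions the sketch agrees with the table in the question; for example grid_match("a*b*c", "aaabccc") is false and grid_match("^ab..*abc$", "abyxxxxabc") is true.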
You should notice that we're mostly exploring a fairly small part of the grid.
This solution can be improved by making it a recursive solution making use of memoization (and just calling the recursive solution for the bottom-right cell).
This will give us a best-case performance of O(1), but still a worst-case performance of O(stringLength*expressionLength).
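A sketch of that memoized variant follows. It assumes whole-string matching, simply strips a leading ^ and a trailing $, and treats row 0 / column 0 as an implicit start cell; MemoMatcher is a hypothetical name. Only the cells reachable backwards from the bottom-right cell are ever evaluated.

```cpp
#include <cstddef>
#include <string>
#include <vector>

class MemoMatcher {
public:
    MemoMatcher(const std::string& pattern, const std::string& text) : text_(text) {
        std::string p = pattern;
        if (!p.empty() && p.front() == '^') p.erase(0, 1);
        if (!p.empty() && p.back() == '$') p.pop_back();
        for (std::size_t i = 0; i < p.size(); ++i) {
            const bool star = i + 1 < p.size() && p[i + 1] == '*';
            ch_.push_back(p[i]);
            star_.push_back(star);
            if (star) ++i;  // '*' belongs to the previous character
        }
        // -1 = not computed yet, 0 = invalid, 1 = valid
        memo_.assign(ch_.size() + 1, std::vector<int>(text_.size() + 1, -1));
    }

    // Start the recursion at the bottom-right cell.
    bool matches() { return valid(ch_.size(), text_.size()); }

private:
    // Do the first r units match the first c characters?
    bool valid(std::size_t r, std::size_t c) {
        if (r == 0) return c == 0;  // implicit start cell
        int& m = memo_[r][c];
        if (m != -1) return m == 1;
        const char u = ch_[r - 1];
        const bool hit = c > 0 && (u == '.' || u == text_[c - 1]);
        bool v;
        if (star_[r - 1])
            v = valid(r - 1, c) || (hit && valid(r, c - 1));  // top, left
        else
            v = hit && valid(r - 1, c - 1);                   // upper-left
        m = v ? 1 : 0;
        return v;
    }

    std::string text_;
    std::vector<char> ch_;
    std::vector<bool> star_;
    std::vector<std::vector<int>> memo_;
};
```

Usage: MemoMatcher("^ab..*abc$", "abyxxxxabc").matches() evaluates to true. The recursion depth is at most expressionLength + stringLength.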
My solution assumes the expression must match the entire string, as inferred from the result of the above example being invalid (as per the question).
If it can instead match a substring, we can modify this slightly so, if the cell is in the top row it's valid if:
The row is x* or .*.
The row is x or . and the column is x.
Given only one hour, we can use a simple approach:
Split pattern into tokens: a*b.c => { a* b . c }.
If pattern doesn't start with ^ then add .* in the beginning, else remove ^.
If pattern doesn't end with $ then add .* in the end, else remove $.
Then we use recursion: for a repeating token (x*) we branch three ways (advance the pattern index by 1, advance the input index by 1, or advance both by 1); for a non-repeating token we go one way (advance both indices by 1).
Sample code in C#:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
namespace ReTest
{
class Program
{
static void Main(string[] args)
{
Debug.Assert(IsMatch("ab", "ab") == true);
Debug.Assert(IsMatch("aaaaaab", "a*b") == true);
Debug.Assert(IsMatch("abc", "a*b*c*") == true);
Debug.Assert(IsMatch("aaabccc", "a*b*c") == true); /* original false, but it should be true */
Debug.Assert(IsMatch("abccccb", "^abc*b") == true);
Debug.Assert(IsMatch("abbccccb", "^abc*b") == false);
Debug.Assert(IsMatch("abcd", "^abcd$") == true);
Debug.Assert(IsMatch("abcabc", "^abc*abc$") == true);
Debug.Assert(IsMatch("abczabc", "^abc.abc$") == true);
Debug.Assert(IsMatch("abyxxxxabc", "^ab..*abc$") == true);
}
static bool IsMatch(string input, string pattern)
{
List<PatternToken> patternTokens = new List<PatternToken>();
for (int i = 0; i < pattern.Length; i++)
{
char token = pattern[i];
if (token == '^')
{
if (i == 0)
patternTokens.Add(new PatternToken { Token = token, Occurence = Occurence.Single });
else
throw new ArgumentException("input");
}
else if (char.IsLower(token) || token == '.')
{
if (i < pattern.Length - 1 && pattern[i + 1] == '*')
{
patternTokens.Add(new PatternToken { Token = token, Occurence = Occurence.Multiple });
i++;
}
else
patternTokens.Add(new PatternToken { Token = token, Occurence = Occurence.Single });
}
else if (token == '$')
{
if (i == pattern.Length - 1)
patternTokens.Add(new PatternToken { Token = token, Occurence = Occurence.Single });
else
throw new ArgumentException("input");
}
else
throw new ArgumentException("input");
}
PatternToken firstPatternToken = patternTokens.First();
if (firstPatternToken.Token == '^')
patternTokens.RemoveAt(0);
else
patternTokens.Insert(0, new PatternToken { Token = '.', Occurence = Occurence.Multiple });
PatternToken lastPatternToken = patternTokens.Last();
if (lastPatternToken.Token == '$')
patternTokens.RemoveAt(patternTokens.Count - 1);
else
patternTokens.Add(new PatternToken { Token = '.', Occurence = Occurence.Multiple });
return IsMatch(input, 0, patternTokens, 0);
}
static bool IsMatch(string input, int inputIndex, IList<PatternToken> pattern, int patternIndex)
{
if (inputIndex == input.Length)
{
// Input consumed: every remaining token must be able to match nothing (x*)
for (int i = patternIndex; i < pattern.Count; i++)
if (pattern[i].Occurence != Occurence.Multiple)
return false;
return true;
}
else if (inputIndex < input.Length && patternIndex < pattern.Count)
{
char c = input[inputIndex];
PatternToken patternToken = pattern[patternIndex];
if (patternToken.Token == '.' || patternToken.Token == c)
{
if (patternToken.Occurence == Occurence.Single)
return IsMatch(input, inputIndex + 1, pattern, patternIndex + 1);
else
return IsMatch(input, inputIndex, pattern, patternIndex + 1) ||
IsMatch(input, inputIndex + 1, pattern, patternIndex) ||
IsMatch(input, inputIndex + 1, pattern, patternIndex + 1);
}
else
return false;
}
else
return false;
}
class PatternToken
{
public char Token { get; set; }
public Occurence Occurence { get; set; }
public override string ToString()
{
if (Occurence == Occurence.Single)
return Token.ToString();
else
return Token.ToString() + "*";
}
}
enum Occurence
{
Single,
Multiple
}
}
}
Here is a solution in Java. Time and space are O(n). Inline comments are provided for clarity:
/**
* @author Santhosh Kumar
*
*/
public class ExpressionProblemSolution {
public static void main(String[] args) {
System.out.println("---------- ExpressionProblemSolution - start ---------- \n");
ExpressionProblemSolution evs = new ExpressionProblemSolution();
evs.runMatchTests();
System.out.println("\n---------- ExpressionProblemSolution - end ---------- ");
}
// simple node structure to keep expression terms
class Node {
Character ch; // char [a-z]
Character sch; // special char (^, *, $, .)
Node next;
Node(Character ch1, Character sch1) {
ch = ch1;
sch = sch1;
}
Node add(Character ch1, Character sch1) {
this.next = new Node(ch1, sch1);
return this.next;
}
Node next() {
return this.next;
}
public String toString() {
return "[ch=" + ch + ", sch=" + sch + "]";
}
}
private boolean letters(char ch) {
return (ch >= 'a' && ch <= 'z');
}
private boolean specialChars(char ch) {
return (ch == '.' || ch == '^' || ch == '*' || ch == '$');
}
private void validate(String expression) {
// if expression has invalid chars throw runtime exception
if (expression == null) {
throw new RuntimeException(
"Expression can't be null, but it can be empty");
}
char[] expr = expression.toCharArray();
for (int i = 0; i < expr.length; i++) {
if (!letters(expr[i]) && !specialChars(expr[i])) {
throw new RuntimeException(
"Expression contains invalid char at position=" + i
+ ", invalid_char=" + expr[i]
+ " (allowed chars are 'a-z', *, . ^, * and $)");
}
}
}
// Parse the expression and split them into terms and add to list
// the list is FSM (Finite State Machine). The list is used during
// the process step to iterate through the machine states based
// on the input string
//
// expression = a*b*c has 3 terms -> [a*] [b*] [c]
// expression = ^ab.*c$ has 4 terms -> [^a] [b] [.*] [c$]
//
// Timing : O(n) n -> expression length
// Space : O(n) n -> expression length decides the no.of terms stored in the list
private Node preprocess(String expression) {
debug("preprocess - start [" + expression + "]");
validate(expression);
Node root = new Node(' ', ' '); // root node with empty values
Node current = root;
char[] expr = expression.toCharArray();
int i = 0, n = expr.length;
while (i < n) {
debug("i=" + i);
if (expr[i] == '^') { // it is prefix operator, so it always linked
// to the char after that
if (i + 1 < n) {
if (i == 0) { // ^ indicates start of the expression, so it
// must be first in the expr string
current = current.add(expr[i + 1], expr[i]);
i += 2;
continue;
} else {
throw new RuntimeException(
"Special char ^ should be present only at the first position of the expression (position="
+ i + ", char=" + expr[i] + ")");
}
} else {
throw new RuntimeException(
"Expression missing after ^ (position=" + i
+ ", char=" + expr[i] + ")");
}
} else if (letters(expr[i]) || expr[i] == '.') { // [a-z] or .
if (i + 1 < n) {
char nextCh = expr[i + 1];
if (nextCh == '$' && i + 1 != n - 1) { // if $, then it must
// be at the last
// position of the
// expression
throw new RuntimeException(
"Special char $ should be present only at the last position of the expression (position="
+ (i + 1)
+ ", char="
+ expr[i + 1]
+ ")");
}
if (nextCh == '$' || nextCh == '*') { // a* or b$
current = current.add(expr[i], nextCh);
i += 2;
continue;
} else {
current = current.add(expr[i], expr[i] == '.' ? expr[i]
: null);
i++;
continue;
}
} else { // a or b
current = current.add(expr[i], expr[i] == '.' ? expr[i] : null); // a final '.' still means "any char"
i++;
continue;
}
} else {
throw new RuntimeException("Invalid char - (position=" + (i)
+ ", char=" + expr[i] + ")");
}
}
debug("preprocess - end");
return root;
}
// Traverse over the terms in the list and iterate and match the input string
// The terms list is the FSM (Finite State Machine); the end of list indicates
// end state. That is, input is valid and matching the expression
//
// Timing : O(n) for pre-processing + O(n) for processing = 2O(n) = ~O(n) where n -> expression length
// Timing : O(2n) ~ O(n)
// Space : O(n) where n -> expression length decides the no.of terms stored in the list
public boolean process(String expression, String testString) {
Node root = preprocess(expression);
print(root);
Node current = root.next();
if (root == null || current == null)
return false;
int i = 0;
int n = testString.length();
debug("input-string-length=" + n);
char[] test = testString.toCharArray();
// while (i < n && current != null) {
while (current != null) {
debug("process: i=" + i);
debug("process: ch=" + current.ch + ", sch=" + current.sch);
if (current.sch == null) { // no special char just [a-z] case
if (i >= n || test[i] != current.ch) { // input exhausted, or test char and current state char
// should match
return false;
} else {
i++;
current = current.next();
continue;
}
} else if (current.sch == '^') { // process start char
if (i == 0 && n > 0 && test[i] == current.ch) {
i++;
current = current.next();
continue;
} else {
return false;
}
} else if (current.sch == '$') { // process end char
if (i == n - 1 && test[i] == current.ch) {
i++;
current = current.next();
continue;
} else {
return false;
}
} else if (current.sch == '*') { // process repeat char
if (letters(current.ch)) { // like a* or b*
while (i < n && test[i] == current.ch)
i++; // move i till end of repeat char
current = current.next();
continue;
} else if (current.ch == '.') { // like .*
Node nextNode = current.next();
print(nextNode);
if (nextNode != null) {
Character nextChar = nextNode.ch;
Character nextSChar = nextNode.sch;
// a.*z = az or (you need to check the next state in the
// list)
if (i >= n || test[i] == nextChar) { // .* matches nothing here; the next state consumes test[i] == 'z'
current = current.next();
continue;
} else {
// a.*z = abz or
// a.*z = abbz
char tch = test[i]; // get 'b'
while (i + 1 < n && test[++i] == tch)
; // move i till end of repeat char
current = current.next();
continue;
}
} else { // .* is the last term: it matches the rest of the input
i = n;
current = null;
}
} else { // like $* or ^*
debug("process: return false-1");
return false;
}
} else if (current.sch == '.') { // process any char
if (i >= n || !letters(test[i])) {
return false;
}
i++;
current = current.next();
continue;
}
}
if (i == n && current == null) {
// string position is out of bound
// list is at end ie. exhausted both expression and input
// FSM reached the end state, hence the input is valid and matches the given expression
return true;
} else {
return false;
}
}
public void debug(Object str) {
boolean debug = false;
if (debug) {
System.out.println("[debug] " + str);
}
}
private void print(Node node) {
StringBuilder sb = new StringBuilder();
while (node != null) {
sb.append(node + " ");
node = node.next();
}
sb.append("\n");
debug(sb.toString());
}
public boolean match(String expr, String input) {
boolean result = process(expr, input);
System.out.printf("\n%-20s %-20s %-20s\n", expr, input, result);
return result;
}
public void runMatchTests() {
match("ab", "ab");
match("a*b", "aaaaaab");
match("a*b*c*", "abc");
match("a*b*c", "aaabccc");
match("^abc*b", "abccccb");
match("^abc*b", "abccccbb");
match("^abcd$", "abcd");
match("^abc*abc$", "abcabc");
match("^abc.abc$", "abczabc");
match("^ab..*abc$", "abyxxxxabc");
match("a*b*", ""); // handles empty input string
match("xyza*b*", "xyz");
}}
#include <stdio.h>
int regex_validate(char *reg, char *test) {
char *ptr = reg;
while (*test) {
switch(*ptr) {
case '.':
{
test++; ptr++; continue;
break;
}
case '*':
{
if (*(ptr-1) == *test) {
test++; continue;
}
else if (*(ptr-1) == '.' && (*test == *(test-1))) {
test++; continue;
}
else {
ptr++; continue;
}
break;
}
case '^':
{
ptr++;
while (*ptr && *test && *ptr == *test) { /* compare characters, not pointers */
ptr++; test++;
}
if (!*ptr && !*test)
return 1;
if (*ptr && (*ptr == '$' || *ptr == '*' || *ptr == '.')) {
continue;
}
else {
return 0;
}
break;
}
case '$':
{
if (*test)
return 0;
break;
}
default:
{
printf("default case.\n");
if (*ptr != *test) {
return 0;
}
test++; ptr++; continue;
}
break;
}
}
return 1;
}
int main () {
printf("regex=%d\n", regex_validate("ab", "ab"));
printf("regex=%d\n", regex_validate("a*b", "aaaaaab"));
printf("regex=%d\n", regex_validate("^abc.abc$", "abcdabc"));
printf("regex=%d\n", regex_validate("^abc*abc$", "abcabc"));
printf("regex=%d\n", regex_validate("^abc*b", "abccccb"));
printf("regex=%d\n", regex_validate("^abc*b", "abbccccb"));
return 0;
}

Wildcard matching in a text string

My friend gave me this wildcard (*) matching algorithm. Here is the code:
//This function compares text strings, one of which can have wildcards ('*').
//
BOOL GeneralTextCompare(
char * pTameText, // A string without wildcards
char * pWildText, // A (potentially) corresponding string with wildcards
BOOL bCaseSensitive = FALSE, // By default, match on 'X' vs 'x'
char cAltTerminator = '\0' // For function names, for example, you can stop at the first '('
)
{
BOOL bMatch = TRUE;
char * pAfterLastWild = NULL; // The location after the last '*', if we’ve encountered one
char * pAfterLastTame = NULL; // The location in the tame string, from which we started after last wildcard
char t, w;
// Walk the text strings one character at a time.
while (1)
{
t = *pTameText;
w = *pWildText;
// How do you match a unique text string?
if (!t || t == cAltTerminator)
{
// Easy: unique up on it!
if (!w || w == cAltTerminator)
{
break; // "x" matches "x"
}
else if (w == '*')
{
pWildText++;
continue; // "x*" matches "x" or "xy"
}
else if (pAfterLastTame)
{
if (!(*pAfterLastTame) || *pAfterLastTame == cAltTerminator)
{
bMatch = FALSE;
break;
}
pTameText = pAfterLastTame++;
pWildText = pAfterLastWild;
continue;
}
bMatch = FALSE;
break; // "x" doesn't match "xy"
}
else
{
if (!bCaseSensitive)
{
// Lowercase the characters to be compared.
if (t >= 'A' && t <= 'Z')
{
t += ('a' - 'A');
}
if (w >= 'A' && w <= 'Z')
{
w += ('a' - 'A');
}
}
// How do you match a tame text string?
if (t != w)
{
// The tame way: unique up on it!
if (w == '*')
{
pAfterLastWild = ++pWildText;
pAfterLastTame = pTameText;
w = *pWildText;
if (!w || w == cAltTerminator)
{
break; // "*" matches "x"
}
continue; // "*y" matches "xy"
}
else if (pAfterLastWild)
{
if (pAfterLastWild != pWildText)
{
pWildText = pAfterLastWild;
w = *pWildText;
if (!bCaseSensitive && w >= 'A' && w <= 'Z')
{
w += ('a' - 'A');
}
if (t == w)
{
pWildText++;
}
}
pTameText++;
continue; // "*sip*" matches "mississippi"
}
else
{
bMatch = FALSE;
break; // "x" doesn't match "y"
}
}
}
pTameText++;
pWildText++;
}
return bMatch;
}
As I understand it, this algorithm works as follows:
mississippi    *sip*
mississippi    sip*
ississippi     sip*
ssissippi      sip*
sissippi       ip*
sissippi       sip*    pAfterLastWild is used to restore the location
issippi        ip*
ssippi         p*
ssippi         sip*    again pAfterLastWild is used here.
sippi          ip*
sippi          sip*    here also.
ippi           ip*
ppi            p*
pi             *
i              *
I can't figure out why pAfterLastTame is needed or what this piece of code does here, as I can't find any use for it:
else if (pAfterLastTame)
{
if (!(*pAfterLastTame) || *pAfterLastTame == cAltTerminator)
{
bMatch = FALSE;
break;
}
pTameText = pAfterLastTame++;
pWildText = pAfterLastWild;
continue;
}
This algorithm is pretty fast, as the number of comparisons equals the length of the tame string (correct me if I am wrong).
Does anyone know a more efficient algorithm than this?
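The usual greedy two-pointer approach may help explain pAfterLastTame. In the minimal sketch below (assumptions: case-sensitive, '*' is the only wildcard, no alternate terminator; wildcard_match is a hypothetical name, not the friend's function), star_text plays a role similar to pAfterLastTame: it remembers where the text stood when the last '*' was seen, so that after a mismatch the '*' can absorb one more character and matching restarts just after the '*'.

```cpp
#include <cstddef>
#include <string>

// Whole-string match: '*' matches any sequence (including an empty one),
// every other character must match exactly.
bool wildcard_match(const std::string& text, const std::string& wild) {
    std::size_t t = 0, w = 0;
    std::size_t star = std::string::npos;  // index of the last '*' seen in wild
    std::size_t star_text = 0;             // text index when that '*' was seen
    while (t < text.size()) {
        if (w < wild.size() && wild[w] == '*') {
            star = w++;        // remember the '*', first try matching it to nothing
            star_text = t;
        } else if (w < wild.size() && wild[w] == text[t]) {
            ++t;
            ++w;
        } else if (star != std::string::npos) {
            w = star + 1;      // backtrack: restart just after the '*'...
            t = ++star_text;   // ...letting the '*' absorb one more character
        } else {
            return false;      // mismatch and no '*' to fall back on
        }
    }
    while (w < wild.size() && wild[w] == '*')
        ++w;                   // trailing '*'s match the empty remainder
    return w == wild.size();
}
```

Because of this backtracking, the number of comparisons is not always equal to the text length: for example, matching "aaaa" against "*aab" restarts after the '*' several times before failing.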