Win32 API to do wildcard string match - c++

I am looking for a wildcard string match API (not regex match). I cannot use anything other than Win32 APIs.

There is PathMatchSpec, but its handling is specialized for file names, so the results might not be what you expect if you need general wildcard matching.
Otherwise, you should probably go with a regex, as Pavel detailed.
[edit]
I incorrectly assumed PathMatchSpec shares the properties of FindFirstFile/FindNextFile. I ran a few tests and it doesn't, so it looks like the best candidate.

Strange that so many years passed and nobody gave you this answer:
There is a Win32 API that does exactly what you're looking for. (I found it by searching MSDN for "wildcard".)
Its name is SymMatchString. It sits in DbgHelp.dll, which is part of the operating system.
Put a CriticalSection around the API call if your app is multithreaded!
The API that FindFirstFile uses internally for wildcard matches is probably FsRtlIsNameInExpression.
Elmü

The easiest thing would be to just convert your glob pattern to a regex, by the following rules:
* becomes .*
? becomes .
Any of \|.^$+()[]{} are escaped by preceding them with \
This is only partly true.
The following rules are inferred from DIR behaviour in the XP+ Command Prompt:
* is the same as *.* and becomes regex .+
? becomes regex .? unless followed by a non-wildcard
? not followed by a wildcard becomes regex .
*. means "without extension", and becomes [^.]+$
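The simple conversion rules from the first answer (* becomes .*, ? becomes ., everything else in the listed set is escaped) can be sketched as below. This is a minimal sketch, not the DIR-compatible variant: the names glob_to_regex and glob_match are made up for illustration, and std::regex is used only to show the translation working, even though the question itself is restricted to Win32 APIs.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: translate a glob into a regex string using the
// simple rules: '*' -> ".*", '?' -> ".", listed metacharacters escaped.
std::string glob_to_regex(const std::string& glob) {
    static const std::string special = "\\|.^$+()[]{}";
    std::string out;
    for (char c : glob) {
        if (c == '*') {
            out += ".*";
        } else if (c == '?') {
            out += '.';
        } else {
            if (special.find(c) != std::string::npos) {
                out += '\\';  // escape regex metacharacters
            }
            out += c;
        }
    }
    return out;
}

// Hypothetical helper: the whole text must match the translated glob.
bool glob_match(const std::string& glob, const std::string& text) {
    const std::regex re(glob_to_regex(glob));
    return std::regex_match(text, re);
}
```

Note that in the default ECMAScript grammar "." does not match line terminators, so this sketch assumes single-line input.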

The FindFirstFile and FindNextFile APIs do wildcard matches, but only against filenames.
You can't use anything but Win32? What about STL or CRT? Are you using Boost?
Without the Win32 API restriction, I would recommend using the code from some open-source project. Another option would be to translate the glob into a regex, which I believe can be done with a regex replace operation.
edit: First google match is the PHP code:
http://cvs.php.net/viewvc.cgi/php-src/win32/

If you're after a simple wildcard compare (globbing), some people have written their own, including this one (which we use in our code)
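For illustration, a hand-written matcher of this kind can be quite short. The following is a minimal sketch, not the implementation the answer links to: the name wildcard_match is made up here, and it assumes only '*' (zero or more characters) and '?' (exactly one character) are special, with case-sensitive comparison.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: iterative wildcard match with backtracking to the
// most recent '*'. Returns true only if the whole text matches the pattern.
bool wildcard_match(const std::string& pattern, const std::string& text) {
    std::size_t p = 0;                     // index into pattern
    std::size_t t = 0;                     // index into text
    std::size_t star = std::string::npos;  // index of the last '*' seen
    std::size_t mark = 0;                  // text index when that '*' was seen

    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;   // remember the star, try matching zero characters
            mark = t;
        } else if (p < pattern.size() &&
                   (pattern[p] == '?' || pattern[p] == text[t])) {
            ++p;          // single character matches
            ++t;
        } else if (star != std::string::npos) {
            p = star + 1; // backtrack: let the last '*' absorb one more char
            t = ++mark;
        } else {
            return false;
        }
    }
    // Any trailing '*' in the pattern can match the empty string.
    while (p < pattern.size() && pattern[p] == '*') {
        ++p;
    }
    return p == pattern.size();
}
```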

What exactly is your requirement? Are you just looking to use the '*' symbol to match 0 or more characters, or are you planning on using the '?' symbol as well? If it is just '*', do you need to look for *a, a*, a*b, a*b*c, etc. type patterns? If your requirement is limited, you could easily get away with the C++ runtime library's strstr function.

Related

Futile attempt to run regular expression find/replace in MS Word using groups on Mac

According to the received wisdom MS Word (more or less) supports find/replace with use of regular expressions. I have a simple regular expression:
^(C[[:alpha:]]*)(\d*)(.*)$
That I'm running on the data:
indSIMDdecile
CSdeccrim12006
CSdeccrim12006
CSdeccrim12009
CSdeccrim12009
CSdeccrim12012
CSdeccrim12012
CSdeceduc12004
CSdeceduc12004
CSdeceduc12006
CSdeceduc12006
CSdeceduc12009
CSdeceduc12009
CSdeceduc12012
CSdeceduc12012
CSdecemp12004.x
I'm interested in returning the first word prior to the digit 1, which works as demonstrated on regex101 here.
Problem
I would like to do the same in MS Word (v. 15.18 on Mac). After getting error messages for supplying unsuitable syntax, I learned that MS Word does not support the full regex syntax. I simplified my expression to something along the lines of:
but the search does not find any strings and nothing gets replaced. Hence my question: is it possible to use regex in MS Word on Mac?
The linked help website hints that something like that should be possible, but so far no luck.
The simple answer is "no", if you mean "Does Mac Word have a UI feature that lets you use one of the modern dialects of regex?" Word's Find/Replace only supports its own Regular Expression syntax.
In this case, I think the following will give you what you need:
Find with wildcards:
(C)([!1]#)(1)
and a replace by
\1
(If you also had to find "C1", then that doesn't work, and unfortunately nor does
(C)([!1]{0,})(1)
because Word does not allow 0 in the {,} pattern)
But there is a problem with "#". If the text the "#" is looking for is long, the find/replace may fail. There is supposed to be a 255 limit, but it seems rather more arbitrary than that. (I have long suspected a buffer overrun type error in the Word code, but perhaps there is a simpler explanation).
If you mean "is there any way to use modern regex with Word?", then the answer is "Yes, but you only get to operate on a copy of the text in the document." You will need to create your own code to do the 'replace' part of the find/replace, and that means you would have to deal with issues such as preserving formatting that Word's built-in find/replace might get right for you.
On the Windows side, people who want a better regex than Word's often use VBScript's regexp object because it is easily used from VBA. VBA itself only really has the "like" operator, which also has only fairly crude pattern matching abilities. I think there are examples of VBScript regexp use on StackOverflow. On the Mac side, you would either have to use VBA and "shell out" to one of the built-in Mac/Unix utilities to do your finding (and perhaps replacing), or perhaps use Applescript or Javascript application scripting to do it. As far as I can remember, Applescript does not have a 'modern' regex built in either.
[As a bit of history, Word's "regular expressions" were I think introduced in Word 6, around 1993, at a time when most dialects of regex were much more crude than they are today. I don't think Word's version has moved along much at all - it probably added some Unicode support at some point, but that's probably about it. I assume that people using modern regex don't regard it as regex at all, and I personally prefer not to call Word's Regular Expressions 'regex' precisely for that reason.]

Looking for RegEx pattern match code for Powerbuilder Application

I would appreciate it if anyone has RegEx pattern match code that can be used in a Powerbuilder application.
Natively, I'd be reluctant to call this "RegEx", but if you have rudimentary pattern matching needs, Match() is the PowerScript function that does this. It has basic operators, but will only tell you if you have a match, not where in the target string.
If you need something more robust than that, even if it hadn't already been mentioned, I'd point you towards PbniRegex.
Good luck,
Terry.
If you are using PB.NET, you can use the .NET framework assemblies, which have robust RegEx functionality. If not, then maybe try the PbniRegex that Terry mentioned, though I don't have any experience with it.

Regex for comparing special Characters in (C)Strings

I have an MFC project where I need to read and compare various configuration strings from (xml-)files.
The problem is that they could contain one or multiple special characters like STX, ETX, LF, CR ... and so on.
An idea is using regex. I could simply write the full regex pattern in the files and compare them with a match function.
As I looked this up via google and msdn, there were two different(?) regex frameworks for MFC but I don't see any difference between them nor do I see if they can solve my problem, meaning handle special characters.
Do any of you have an experience with those frameworks? Can you recommend one or can you think of another solution for this problem?
Many thanks in advance.
I recommend std::regex or boost::regex over non-standard alternatives. Both are able to handle special characters.
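As a minimal sketch of how std::regex copes with control characters, the example below checks whether a string is a single STX ... ETX frame. The helper name is_framed and the framing rule itself are assumptions made up for illustration; the control characters are placed in the pattern as raw bytes through C++ string escapes, so the regex engine simply sees them as ordinary literal characters.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `data` is exactly STX, any non-ETX bytes, ETX.
bool is_framed(const std::string& data) {
    // "\x02" (STX) and "\x03" (ETX) are C++ escapes, not regex escapes.
    // The literals are split so the hex escapes cannot swallow later digits.
    static const std::regex frame("\x02" "[^" "\x03" "]*" "\x03");
    return std::regex_match(data, frame);
}
```

The negated character class is used instead of "." because "." does not match CR or LF in the default ECMAScript grammar.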

Regex PatternRepository pattern on BlackBerry 5 - how to ignore case

I hope this title makes sense - I need case-insensitive regex matching on BlackBerry 5.
I have a regular expression defined as:
public static final String SMS_REG_EXP = "(?i)[(htp:/w\\.)]*cobiinteractive\\.com/[\\w|\\%]+";
It is intended to match "cobiinteractive.com/" followed by some text. The preceding (htp:w.) is just there because on my device I needed to override the internal link-recognition that the phone applies (shameless hack).
The app loads at start-up. The idea is that I want to pick up links to my site from sms & email, and process them with my app.
I add it to the PatternRepository using:
PatternRepository.addPattern(
ApplicationDescriptor.currentApplicationDescriptor(),
GlobalConstants.SMS_REG_EXP,
PatternRepository.PATTERN_TYPE_REGULAR_EXPRESSION,
applicationMenu);
On the os 4.5 / 4.7 simulators and on a Curve 8900 device (running 4.5), this works.
On the os 5 simulators and the Bold 9700 I tested, app fails to compile the pattern with an IllegalArgumentException("unrecognized character after (?").
I have also tried (naively) to set the pattern to "/rockstar/i" but that only matches the exact string - this is possibly the correct direction to take, but if so, I don't know how to implement it on the BB.
How would I modify my regex in order to pick up case insensitive patterns using the PatternRepository as above?
PS: would the "correct" way be to use the [Cc][Oo][Bb][Ii]2... etc pattern? This is ok for a short string, but I am hoping for a more general solution if possible?
Well, this is not a real solution to the general problem, but the workaround is easy, safe and performant:
As you're dealing with URLs here, and host names are not case-sensitive...
(it doesn't matter if we write google.com or GooGLE.COm or whatever)
The simplest solution (we all love the KISS principle) is to first lowercase (or uppercase, if you like) the input and then do a regex match, where it no longer matters whether the match is case-sensitive because we know for sure what we are dealing with.
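The normalize-then-match idea is independent of the platform. Here is a minimal sketch in C++ rather than BlackBerry Java; the names to_lower_ascii and contains_site_link are made up for illustration, the pattern is a simplified form of the one in the question, and only ASCII case folding is assumed.

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>

// Hypothetical helper: lowercase a string (ASCII only).
std::string to_lower_ascii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Hypothetical helper: lowercase the input first, then run a
// case-sensitive, all-lowercase pattern against it.
bool contains_site_link(const std::string& message) {
    static const std::regex link("cobiinteractive\\.com/[\\w%]+");
    const std::string lowered = to_lower_ascii(message);
    return std::regex_search(lowered, link);
}
```

Whether this helps with PatternRepository depends on being able to preprocess the text before it is matched, which the question does not establish.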
Since nobody else has answered this question relating to the PatternRepository class, I will self-answer so I can close it.
One way to do this would be to use a pattern like: [Cc][Oo][Bb][Ii]2[Nn][Tt][Ee][Rr][Aa][Cc][Tt][Ii][Vv][Ee]... etc where for each letter in the string, you put 2 options. Fortunately my string is short.
This is not an elegant solution, but it works. Unfortunately I don't know of a way to modify the string passed to PatternRepository and I think the crash when using the (?i) modifier is a bug in BB.
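Writing out the bracket pairs by hand is error-prone for longer strings, so the expansion could be generated instead. A minimal sketch, again in C++ for illustration only; expand_case_insensitive is a made-up name, and it assumes the input is plain literal text with ASCII letters (it would mangle regex escapes such as \w).

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: turn each ASCII letter into a two-option character
// class, e.g. "cobi" -> "[Cc][Oo][Bb][Ii]". Other characters pass through.
std::string expand_case_insensitive(const std::string& literal) {
    std::string out;
    for (unsigned char c : literal) {
        if (std::isalpha(c)) {
            out += '[';
            out += static_cast<char>(std::toupper(c));
            out += static_cast<char>(std::tolower(c));
            out += ']';
        } else {
            out += static_cast<char>(c);
        }
    }
    return out;
}
```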
Use the port of the jakarta regex library:
https://code.google.com/p/regexp-me/
If you use unicode support, it's going to eat memory,
but if you just want case insensitive matching,
you simply need to pass the RE.MATCH_CASEINDEPENDENT flag when you compile your regex.
new RE("yourCaseInsensitivePattern", RE.MATCH_CASEINDEPENDENT | OTHER_FLAGS)

hgignore: help ignoring all files but certain ones

I need an .hgdontignore file :-) to include certain files and exclude everything else in a directory. Basically I want to include only the .jar files in a particular directory and nothing else. How can I do this? I'm not that skilled in regular expression syntax. Or can I do it with glob syntax? (I prefer that for readability)
Just as an example location, let's say I want to exclude all files under foo/bar/ except for foo/bar/*.jar.
The answer from Michael is a fine one, but another option is to just exclude:
foo/bar/**
and then manually add the .jar files. You can always add files that are excluded by an ignore rule and it overrides the ignore. You just have to remember to add any jars you create in the future.
To do this, you'll need to use this regular expression:
foo/bar/.+?\.(?!jar).+
Explanation
You are telling it what to ignore, so this expression is searching for things you don't want.
You look for any file whose name (including relative directory) includes (foo/bar/)
You then look for any characters that precede a period ( .+?\. == match one or more characters of any type until you reach the period character)
You then make sure it doesn't have the "jar" ending (?!jar) (This is called a negative lookahead)
Finally you grab the ending it does have (.+)
Regular expressions are easy to mess up, so I strongly suggest that you get a tool like Regex Buddy to help you build them. It will break down a regex into plain English which really helps.
EDIT
Hey Jason S, you caught me, it does miss those files.
This corrected regex will work for every example you listed:
foo/bar/(?!.*\.jar$).+
It finds:
foo/bar/baz.txt
foo/bar/baz
foo/bar/jar
foo/bar/baz.jar.txt
foo/bar/baz.jar.
foo/bar/baz.
foo/bar/baz.txt.
But does not find
foo/bar/baz.jar
New Explanation
This says look for files in "foo/bar/" , then do not match if there are zero or more characters followed by ".jar" and then no more characters ($ means end of the line), then, if that isn't the case, match any following characters.
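The corrected pattern can be checked against the paths listed above. The sketch below uses C++ std::regex (ECMAScript grammar) as a stand-in, which is an assumption: Mercurial evaluates the pattern with its own engine, so this only illustrates the lookahead logic. The name is_ignored is made up for the example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the path would be caught by the ignore
// pattern, i.e. it lies under foo/bar/ and does not end in ".jar".
bool is_ignored(const std::string& path) {
    static const std::regex ignore("foo/bar/(?!.*\\.jar$).+");
    // Unanchored search, mirroring how an unrooted ignore pattern is applied.
    return std::regex_search(path, ignore);
}
```

With this sketch, foo/bar/baz.jar.txt and foo/bar/jar are reported as ignored, while foo/bar/baz.jar is not.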
Anyone who wants to use negative lookaheads (or ?! in regex syntax) or any kind of back-referencing mechanism should be aware that Mercurial will fall back from Google's RE2 to Python's re module for matching.
RE2 is a non-backtracking engine that guarantees a run time linear in the size of the input. If performance is important to you, that is if you have a big repository, you should consider sticking to simpler patterns that RE2 supports, which is why I think that the solution offered by Ryan.