RegEx with Excel VBA on Mac - regex

I need to use regex with Excel VBA. I'm using Mac OS 10.10 and Office 2011, so there is no DLL file I can use.
What can I do here?
I read that I have to bind an AppleScript. How is this done, and what does the script need to contain?

You can use VBA's Like operator, although it is only a very limited pattern matcher.
Microsoft Word has its standard wildcards, and if you tick Use Wildcards it becomes a regex-like engine (it can also find words that sound the same and words with the same root). So use Word rather than VBScript's RegExp.
Just record a Find and Replace in Word and you'll get most of the program written for you; you'll just need to adapt it.

Natively, you can't really: AppleScript isn't that good for this kind of thing where VBA is concerned.
There are other libraries you can install to add support for things like regular expressions on Mac OS. The one I've seen used most is Satimage, although I haven't had to use it myself (yet), so I can't vouch for it:
http://www.satimage.fr/software/en/downloads/downloads_companion_osaxen.html

I'm working on this problem too and I think Advanced Filters may be your answer if you want to do it in Excel without adding an external library. You can access it through VBA and set up a hidden sheet somewhere to stash your filters.
https://searchengineland.com/advanced-filters-excels-amazing-alternative-to-regex-143680
And you can see what it looks like in VBA here:
https://www.contextures.com/exceladvancedfiltervba.html
However, Advanced Filters does have some notable shortcomings, like the inability to distinguish a digit from a letter. The Like operator mentioned earlier DOES have this ability, however, so you could combine the two to overcome that limitation.
Hopefully you and I can both solve this problem using these tools...!

Related

How to remove emoticons from tweets in C++?

I'm working on a Twitter sentiment analysis tool in C++. So far I get the tweets from Twitter and process them a bit (lowercase, remove RT, remove # and URLs).
The next step is to remove emoticons and all those special characters. How does one do that? Before you jump on me: I already looked at other similar questions, but none of them deals with C++; they are mostly about R, Python and PHP.
I was thinking of using regex, but I can't get it to work. I tried it for removing hashtags and URLs and gave up; I ended up using plain string::find and find_first_of.
Is there any library or method available to get rid of those emoticons and special characters?
Thanks
I would recommend using regular expressions for this. You have two options: you can either extract only the characters you are interested in (if you are working with English tweets this would probably be A-Z, a-z, numbers and maybe some symbols, depending on your needs), or you can select the invalid characters (emoticons) and replace them with an empty string.
I only have experience with Qt's regular expression engine, but the C++ standard library has regex support (although I'm not sure how good it is with Unicode), and ICU provides a regex library too.
*I'd provide more links but I don't have enough reputation yet :/
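To make the first option concrete, here is a minimal sketch of the whitelist approach. It is written in Python purely for illustration (the question is about C++), and the set of "allowed" characters is my own assumption, not something from the question. The idea is to keep a known-good set and drop everything else, which removes emoji without having to enumerate them.

```python
import re

# Assumption: "valid" means ASCII letters, digits, whitespace and a little
# punctuation. Widen or narrow this set to suit your own data.
_INVALID = re.compile(r"[^A-Za-z0-9\s.,!?']")
_SPACES = re.compile(r"\s+")

def strip_special(text: str) -> str:
    """Drop every character outside the whitelist, then tidy the spacing."""
    cleaned = _INVALID.sub("", text)
    return _SPACES.sub(" ", cleaned).strip()

if __name__ == "__main__":
    print(strip_special("great game today 😀🔥 :) loved it!"))
```

Note that the whitelist also removes text emoticons such as :) because ':' and ')' are not in the allowed set. Whether that is desirable depends on the sentiment analysis you have in mind.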

Futile attempt to run regular expression find/replace in MS Word using groups on Mac

According to the received wisdom MS Word (more or less) supports find/replace with use of regular expressions. I have a simple regular expression:
^(C[[:alpha:]]*)(\d*)(.*)$
That I'm running on the data:
indSIMDdecile
CSdeccrim12006
CSdeccrim12006
CSdeccrim12009
CSdeccrim12009
CSdeccrim12012
CSdeccrim12012
CSdeceduc12004
CSdeceduc12004
CSdeceduc12006
CSdeceduc12006
CSdeceduc12009
CSdeceduc12009
CSdeceduc12012
CSdeceduc12012
CSdecemp12004.x
I'm interested in returning the first word prior to the digit 1, which works as demonstrated on regex101 here.
Problem
I would like to do the same in MS Word (v. 15.18 on Mac). After getting error messages for supplying unsuitable syntax, I learned that MS Word does not support the full regex syntax. I simplified my expression to something along the lines of:
but the search does not find any strings and nothing gets replaced. Hence my question: is it possible to use regex in MS Word on Mac?
The linked help website hints that something like that should be possible, but so far no luck.
The simple answer is "no", if you mean "Does Mac Word have a UI feature that lets you use one of the modern dialects of regex?" Word's Find/Replace only supports its own Regular Expression syntax.
In this case, I think the following will give you what you need:
Find with wildcards:
(C)([!1]@)(1)
and a replace by
\1
(If you also had to find "C1", then that doesn't work, and unfortunately nor does
(C)([!1]{0,})(1)
because Word does not allow 0 in the {,} pattern.)
But there is a problem with "@". If the text the "@" is looking for is long, the find/replace may fail. There is supposed to be a 255 limit, but it seems rather more arbitrary than that. (I have long suspected a buffer overrun type error in the Word code, but perhaps there is a simpler explanation.)
If you mean, "Is there any way to use modern regex with Word?", then the answer is "Yes, but you only get to operate on a copy of the text in the document." You will need to create your own code to do the "replace" part of the find/replace, and that means you would have to deal with issues such as preserving formatting that Word's built-in find/replace might get right for you.
On the Windows side, people who want a better regex than Word's often use VBScript's RegExp object because it is easily used from VBA. VBA itself only really has the "Like" operator, which also only has fairly crude pattern matching abilities. I think there are examples of VBScript RegExp use on StackOverflow. On the Mac side, you would either have to use VBA and "shell out" to one of the built-in Mac/Unix utilities to do your finding (and perhaps replacing), or perhaps use AppleScript or JavaScript application scripting to do it. As far as I can remember, AppleScript does not have a "modern" regex built in either.
[As a bit of history, Word's "regular expressions" were I think introduced in Word 6, around 1993, at a time when most dialects of regex were much more crude than they are today. I don't think Word's version has moved along much at all - it probably added some Unicode support at some point, but that's probably about it. I assume that people using modern regex don't regard it as regex at all, and I personally prefer not to call Word's Regular Expressions 'regex' precisely for that reason.]
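As a rough illustration of the "operate on a copy of the text" route, here is a sketch that runs the asker's original pattern over the sample lines outside Word. Python is my choice for the example, not something from the question, and the POSIX class [[:alpha:]] is spelled out as [A-Za-z] because Python's re module does not support POSIX classes. Writing the results back into the document, with formatting preserved, is the part you would still have to solve yourself.

```python
import re

# The asker's pattern, with [[:alpha:]] rewritten as [A-Za-z].
PATTERN = re.compile(r"^(C[A-Za-z]*)(\d*)(.*)$")

def leading_word(line: str):
    """Return the letters before the first digit, or None if no match."""
    match = PATTERN.match(line)
    return match.group(1) if match else None

if __name__ == "__main__":
    sample = ["indSIMDdecile", "CSdeccrim12006", "CSdeceduc12004", "CSdecemp12004.x"]
    for line in sample:
        print(line, "->", leading_word(line))
```

The first sample line does not start with "C", so it is reported as a non-match rather than being altered.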

How to implement Regex

I'm working on a database server software product (see my profile) and we see the need to implement free-text searching in our software. The query language standard we are using only supports free-text search using a BT type Regex. The only way we can use our free-text database indexes together with Regex seems to be to implement our own. My questions to SO are:
Where can I find papers/examples/patterns on how to implement a BT style Regex?
Is it worth looking into taking one of the open source C/C++ Regex libraries and altering the code to fit our needs?
If I'm not wrong, SPARQL uses the XPath/XQuery regular expression syntax, which is based on Perl regular expressions (at least that is what the W3C docs say).
If this is indeed the case, then you can use PCRE from http://www.pcre.org/
It is BSD-licensed, so you will be able to use it in a commercial product.
If your syntax is slightly modified, you can probably write a small routine to normalize it to the Perl syntax used by PCRE.
There are two papers I have found online on the subject of regex indexing: one from Bell Labs and one from UCLA/IBM. I'm still not sure whether to use an existing regex library and modify it or write one from scratch.

Regex - match a string not contain a 'semi-word'

I tried to write a regex for this but failed.
I have 2 variables
PlayerInfo[playerid][pLevel]
and
Character[playerid]
and I want to catch only the second variable, I mean only the word that doesn't contain PlayerInfo but contains [playerid].
"(\S+)\[playerid\]" catches both words, and (\S+[^PlayerInfo])\[playerid\] skips some variables because they contain p, l, a, y ...
I need to replace, in Notepad++, all variables like Text[playerid] with ExClass [playerid][Text].
A couple of plausible solutions:
Notepad++ has a plugin called Python Script. Running regex from there gives full regex functionality (the Python version, anyway) and a lot of powerful potential beyond that. I also use the online Python regex tester to help out.
The RegRexReplace plugin helps create regexes in Notepad++, so when you do hit a limitation, you find out a lot quicker.
Or of course fall back to your alternate editor (I'm assuming you have one?), or use this online regex tool, which is absolutely amazing. You can perform the action on the text online as well.
(I'd try to build a regex for you, but I'm a bit lost as to what you're looking for, unless Ivo Abeloos got it. If you're still coming up short, maybe post a code example along with the values displayed?)
Good luck!
It seems that Notepad++ has supported negative lookbehind since v6.
In notepad++ you could try to replace (.+)\[(.+)\] with ExClass\[\2\]\[\1\]
Try using a negative lookbehind.
(?<!PlayerInfo)\[playerid\]
EDIT: unfortunately Notepad++ does not support negative lookbehind.
I tried to make a workaround based on the following naive idea:
(.[^o]|[^f]o)[playerid]
But this expression does not work either. Notepad++ seems to fail on the alternation operator. Thus the answer is: it is impossible to do exactly what you want. Try to solve the problem another way or use a different tool.
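If you do end up going through a scripting route (such as the Python Script plugin mentioned in another answer), the replacement could look roughly like the sketch below. It is an illustration only: it uses a negative lookahead anchored at a word boundary instead of a lookbehind, it assumes variable names consist of word characters, and it writes the result without the space after ExClass.

```python
import re

# Match Name[playerid] where Name is a whole identifier other than PlayerInfo.
# \b anchors at the start of the identifier so the lookahead sees all of it.
PATTERN = re.compile(r"\b(?!PlayerInfo\b)(\w+)\[playerid\]")

def rewrite(source: str) -> str:
    """Turn Name[playerid] into ExClass[playerid][Name], skipping PlayerInfo."""
    return PATTERN.sub(r"ExClass[playerid][\1]", source)

if __name__ == "__main__":
    line = "PlayerInfo[playerid][pLevel] = Character[playerid];"
    print(rewrite(line))
```

Only identifiers that are exactly PlayerInfo are skipped; a name such as MyPlayerInfo would still be rewritten, so adjust the lookahead if that is not what you want.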

Regular expression extraction in text editors

I'm kind of new to programming, so forgive me if this is terribly obvious (which would be welcome news).
I do a fair amount of PHP development in my free time using preg_match, and I write most of my expressions using the free (open source?) Regex Tester.
However, I frequently find myself wanting to quickly extract something, and the only way I know to do it is to write my expression and then script it, which is probably laughable, but welcome to my reality. :-)
What I'd like is something like a simple text editor that I can feed my expression to (given a file or a buffer full of pasted text) and have it parse the expression and return a document with only the results.
What I usually find is regex search/replace functions. In Notepad++, for example, I can easily find (and replace) all instances using an expression, but I simply don't know how to extract only the matches...
And, though it's probably terribly obvious: can an expression match only the inverse? Then I could use something like (just the expression I'm currently working on):
([^<]*)
And replace everything that doesn't match with nothing. But I'm sure this is something common and simple; I'd really appreciate any pointers.
FWIW I know grep and I could do it using that, but I'm hoping there is a better GUI-ified solution I'm simply ignorant of.
Thanks.
Zach
What I was hoping for would be something that worked in a more standard set of GUI tools (i.e., the tools I might already be using). I appreciate all the responses, but using perl or vi or grep is what I was hoping to avoid; otherwise I would have just scripted it myself (of course I did), since they're all relatively powerful, low-level tools.
Maybe I wasn't clear enough. As a senior systems administrator, the CLI tools are familiar to me and I'm quite fond of them. Working at home, however, I find most of my time is spent in a GUI, like Netbeans or Notepad++. I just figured there would be a simple way to achieve regex-based data extraction using those tools (since in these cases I'd already be using them).
Something vaguely like what I was referring to would be this which will take an expression on the first line and a URL on the second line and then extract (return) the data.
It's ugly (I'll take it down after tonight since it's probably riddled with problems).
Anyway, thanks for your responses. I appreciate it.
If you want a text editor with good regex support, I highly recommend Vim. Vim's regex engine is quite powerful and is well-integrated into the editor. e.g.
:g!/regex/d
This says to delete every line in your buffer which doesn't match pattern regex.
:g/regex/s/another_regex/replacement/g
This says on every line that matches regex, do another search/replace to replace text matching another_regex with replacement.
If you want to use command-line grep, a Perl/Ruby/Python/PHP one-liner, or any other tool, you can filter the current buffer's text through that tool and update the buffer to reflect the results:
:%!grep regex
:%!perl -nle 'print if /regex/'
Have you tried nregex.com ?
http://www.nregex.com/nregex/default.aspx
There's a plugin for Netbeans here, but development looks stalled:
http://wiki.netbeans.org/Regex
http://wiki.netbeans.org/RegularExpressionsModuleProposal
You might also try The Regulator:
http://sourceforge.net/projects/regulator/
Most regex engines will allow you to match the opposite of the regex.
Usually with the ! operator.
I know grep has been mentioned, and you don't want a cli tool, but I think ack deserves to be mentioned.
ack is a tool like grep, aimed at programmers with large trees of heterogeneous source code.
ack is written purely in Perl, and takes advantage of the power of Perl's regular expressions.
A good text editor can be used to perform the actions you are describing. I use EditPadPro for search and replace, and it has some other nice features, including code coloring for most major formats. The search panel includes a regular expression mode that lets you enter a regex and search for the first instance, which shows whether your expression matches the right information. It then gives you the option to replace matches one at a time or all at once.
http://www.editpadpro.com
My suggestion is grep, and cygwin if you're stuck on a Windows box.
echo "text" | grep -oE '([^<]*)'
OR
cat filename | grep -oE '([^<]*)'
What I'd like is something like a simple text editor that I can feed my expression to (given a file or a buffer full of pasted text) and have it parse the expression and return a document with only the results.
You have just described grep. This is exactly what grep does. What's wrong with it?
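For completeness, the "return only the results" behaviour is small enough to script if you ever want it outside grep. The following is a sketch in Python (my choice of language; the href pattern and sample text are made up for the example) that returns the matched text and nothing else, one entry per match.

```python
import re
from typing import List

def extract(pattern: str, text: str) -> List[str]:
    """Return only the matched text, one entry per match, in order."""
    return [match.group(0) for match in re.finditer(pattern, text)]

if __name__ == "__main__":
    # Hypothetical sample input; in practice this would be your pasted buffer.
    page = '<a href="/one">One</a> <a href="/two">Two</a>'
    for hit in extract(r'href="[^"]*"', page):
        print(hit)
```

This avoids the "match the inverse and delete it" trick entirely: instead of removing what you do not want, it collects what you do want.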