Perl has been one of my go-to programming language tools for years and years. Perl 6 grammars look like a great language feature. I'd like to know if someone has started something like this for Ruby.
If you want to use actual Perl 6 grammars in Ruby, your best bet is going to be Cardinal, a Ruby compiler on Parrot. It's currently unfinished and VERY SLOW, but I'm quite hopeful about it being a viable Ruby implementation eventually. It's currently mostly inactive, pending some infrastructure changes in Parrot to support improved parsing speed and additional features.
No. And, since Perl 6 grammars are a language feature, and Ruby doesn't allow the language to be extended, it is actually impossible to implement this as an add-on.
However, there are numerous libraries for Ruby which implement different kinds of parsing or grammar systems. The standard library already contains racc, an LALR(1) parser generator (comparable to, and somewhat compatible with, the venerable yacc). Then there is the ANTLR parser generator, which has a Ruby backend (although I am not sure whether that actually works).
The closest thing to Perl 6 grammars in Ruby would be the Ruby-OMeta project (make sure to also take a look at Ryan Davis's fork), which unfortunately is still under development. (Or rather, no longer under active development.)
So, keeping to stuff that actually exists, I recommend you take a look at the Grammar project and Treetop.
Don't know of anything similar for Ruby.
However, there is something similar for Perl 5: see Regexp::Grammars.
I see umpteen posts a day about "how to do X with regexen". And the best response to most of them seems like it would honestly be, "Why are you trying to drive a screw with a hammer?" But regexen are everywhere, and the syntax is mostly portable, particularly if you keep away from the fancy bits.
Is there anything equivalent to regexen but at the next level up in power and configurability? A "you can use it anywhere" parsing library of some variety, preferably with a gloriously concise DSL as its interface?
I've used Ragel somewhat, but because of the preprocessing step, I'd hesitate to recommend it to someone as "use this instead of some hairy regex". It's awkward to use from Obj-C, and I expect it will be terribly awkward from a language that doesn't have compile-link-run as part of its standard operating procedure.
What I'm looking for is something that will pass the "inline-online-universal" test.
(inline) You can write the notation inline with your other code, as you would with a regex.
(online) You can run the resulting parser just as you would your other code, which would mean right after input to a REPL in the case of something like Python.
(universal) You can move to a different language/platform and use virtually the same code for your parser, modulo dialect differences. In reality, I'd be happy with something that works from Python, Ruby, C, Java, and Haskell.
Most tools I know of fall down at "online". They preprocess a grammar offline and spit out code in the target language (C, Python, Java, C++…). They're standalone tools that aren't themselves integrated into the language environment.
I've had suggestions of PEG parsers and lex/yacc combos. Parser combinator libraries might also be a good fit. Whatever you might propose, I'd like to see demonstrated that it meets these tests. Your answer should demonstrate that the proposed solution meets the inline-online-universal requirements by providing a working demo parser in Python, C, and Haskell. The demo example is up to the author, but it should be something painful using just regexen but trivial using a proper parser.
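To make the "inline-online" part concrete, here is roughly the level of brevity I'm after. This is only a sketch in Python using the pyparsing combinator library (one candidate, not an endorsement, and it certainly doesn't settle the "universal" test), parsing arbitrarily nested integer lists, which is exactly the kind of thing that's painful with a bare regex:

    # A rough sketch, not a full answer: pyparsing is inline (the grammar is
    # ordinary Python code) and online (build it and call it immediately).
    # It parses arbitrarily nested integer lists like "(1, (2, 3), 4)",
    # which a bare regex struggles with because of the recursion.
    from pyparsing import Forward, Group, Suppress, Word, delimitedList, nums

    integer = Word(nums).setParseAction(lambda t: int(t[0]))
    expr = Forward()                      # forward declaration for recursion
    atom = integer | Group(Suppress("(") + delimitedList(expr) + Suppress(")"))
    expr <<= atom

    print(expr.parseString("(1, (2, 3), 4)").asList())   # [[1, [2, 3], 4]]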
https://github.com/leblancmeneses/NPEG
Implements PEG.
Meets all 3 ... let me explain.
It is inline only with C#, and offline with all the others. (C# has an offline version also.)
I currently support offline versions for C, C++, JavaScript (local right now), and Java; all pass the unit tests, which takes care of universality. Adding another language takes about 25.84 hours (how long it took to create the offline JavaScript version).
Making it online for every language would be too much maintenance (though possible); it took a lot of work and time just to support the current offline versions. I can now focus my energy on building grammar optimizers and tooling to unit-test grammar rules, which benefits all the offline versions.
Have a look at Lex/Yacc or their counterparts Flex/Bison (or Coco, or all the other "compiler" generators). The combination can be used to parse complex textual data with an (arguably) much more readable syntax than with regexen.
For simple problems though, where regexen are more than sufficient, by all means do use them.
I'm wondering if Perl is a good (easy to use and to learn) tool for this:
I'd like to do some custom preprocessing on my C/C++ source code. Basically, this is to allow me to insert my own custom annotations into the source code and generate new code based on them. The required processing is mainly line-oriented search/replace and insertion of new source code lines.
I can now think of 2 tools to achieve this: (1) UltraEdit's scripting feature (or any other capable editor). (2) Perl scripting.
UltraEdit's scripting looks good and I'm familiar with it. Best of all, its natural line-oriented processing is a good abstraction for processing source code lines.
I'm wondering if Perl is also a good tool. I've ZERO experience with Perl except that I'm familiar with Perl-style regexes used in other contexts. Is Perl a good tool for line-oriented text processing? I'll have to search forward and backward and replace source code lines with some other text.
Yes, Perl is a good tool for what you want. Personally, I'd go for Python: it's quick, easy, beautiful, and has a good regex interface in its standard library; but it's purely a matter of taste.
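To give a feel for how little code this takes, here is a rough Python sketch of the kind of line-oriented pass the question describes; the // @generate_getter annotation and the emitted getter are invented purely for illustration:

    # Hypothetical sketch: copy a C++ source file through unchanged, but when
    # a made-up "// @generate_getter foo" annotation appears, emit a getter
    # for the member "foo_" on the following line.
    import re
    import sys

    annotation = re.compile(r"//\s*@generate_getter\s+(\w+)")

    for line in sys.stdin:
        sys.stdout.write(line)
        match = annotation.search(line)
        if match:
            name = match.group(1)
            sys.stdout.write(f"int get_{name}() const {{ return {name}_; }}\n")

You would run it as a filter over each source file; Perl's -n and -p switches give you that read-a-line-at-a-time loop for free, which is part of why Perl is so well suited to this kind of job.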
Perl is an excellent tool for this if you are familiar with it. It's essentially geared towards that kind of text analysis and translation, so you'll find that it has all of the extensions you could ask for.
Another option is to use UltraEdit's JavaScript functionality. The execution speed on it is a little slower than what you can get in Perl, but it provides a decent user interface where you can use UltraEdit to indicate where you want the changes to be made. Also, UltraEdit JavaScript has a great deal more flexibility than UltraEdit scripting.
I can't personally recommend Python for it, but I'm currently part of a company initiative to use it for exactly that kind of function, so hopefully the previous answerer is right.
I’m always afraid whenever I see a regular expression. I find them very difficult to understand. But fear is not the solution, so I’ve decided to start learning regexes. Can someone advise me on how to get started? And is there any easy tutorial?
☝ Getting Started with /Regexes/
Regular expressions are a form of declarative programming. If you are used to imperative, functional, or object-oriented programming, then they are a very different way of thinking. It’s a rules-based approach with subtle backtracking issues. I daresay a background in Prolog might actually do you some good with these, which certainly isn’t something I commonly advise.
Normally I would just have people play around with the grep command from their shell, then advance to using regexes for searching and replacing in their editor.
But I’m guessing you aren’t coming from a Unix background, because if you were, you would have come across regexes all over, from the very most basic grep command to pattern-matching in the vi or emacs editors. You can look at the grep manpage by typing
% man grep
on your BSD, Linux, Apple, or Sun systems — just to name a few.
☹ ¡ʇɟoƨoɹɔᴉƜ ʇnoqɐ əɯ ʞƨɐ ʇ ̦uop əƨɐəld ʇƨnɾ ☹
☟ (?: Book Learnin’? )
If you ran into regular expressions at school or university, it was probably in the context of automata theory. They come up when discussing regular languages. If you have suffered through such classes, you may remember that regular expressions are the user-friendly face of messy finite automata. What they probably did not teach you, however, is that outside the ivory tower, the regular expressions people actually use in the real world have left "regular", in the rarefied, theoretical sense of that otherwise commonplace word, far, far behind. This means that modern regular expressions — call them patterns if you prefer — can do much more than the traditional regular expressions taught in computer science classes. There just isn’t any REGULAR left in modern regular expressions outside the classroom, but this is a good thing.
I say “modern”, but in fact regular expressions haven’t been regular since Ken Thompson first put back references into his backtracking NFA, back when he was famously proving NFA–DFA equivalence. So unless you actually are using a DFA engine, it might be best to just forget any book-learnin’ nonsense about the REGULARness of regexes. It just doesn’t apply to the way we really use them every day in the real world.
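If you want to see that non-REGULARity for yourself, here is a two-minute demonstration, sketched in Python but true of any backtracking engine: with a back reference, the pattern below matches the language aⁿbaⁿ, which no finite automaton can recognize.

    # A back reference takes you out of the regular languages entirely:
    # ^(a+)b\1$ matches a run of a's, a b, then the *same* run of a's again.
    import re

    pattern = re.compile(r"^(a+)b\1$")
    print(bool(pattern.match("aaabaaa")))   # True:  three a's on each side
    print(bool(pattern.match("aaabaa")))    # False: the two runs differ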
Modern regular expressions allow for much more than just back references, though, as you will find once you delve into them. They’re their own wonderful world, even if that world is a bit surreal at times. They can let you substitute for pages and pages of code in just one line. They can also make you lose hair over their crazy behavior. Sometimes they make your computer seem like it’s hung, because it’s actually working very hard, racing the heat-death of the universe through some awful O(2ⁿ) algorithm, or even worse. It can easily be much worse, actually. That’s what having this sort of power in your hands can do. There are no training wheels or slow lanes. Regexes are a power tool par excellence.
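Here is a deliberately pathological little sketch of that heat-death race, written in Python for concreteness, though any backtracking engine behaves the same way; the nested quantifier forces an exponential number of ways to split up the a's before the match can finally fail.

    # WARNING: this is intentionally slow.  Nested quantifiers plus a subject
    # that almost-but-not-quite matches force exponential backtracking; every
    # extra "a" roughly doubles the running time.
    import re
    import time

    pattern = re.compile(r"^(a+)+$")
    subject = "a" * 26 + "b"         # the trailing "b" guarantees failure

    start = time.time()
    pattern.match(subject)           # returns None... eventually
    print(f"took {time.time() - start:.1f} seconds to fail")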
/☕✷⅋⋙$⚣™‹ª∞¶⌘̤℈⁑‽#♬˘$π❧/
Just one more thing before I give you a big list of helpful references. As I’ve already said today elsewhere, regexes do not have to be ugly, and they do not have to be hard. REMEMBER: If you create ugly regexes, it is only a reflection on you, not on them.
There’s absolutely no excuse for creating regexes that are hard to read. Oh, there’s plenty like that out there all right, but they shouldn’t be and they needn’t be. Even though regexes are (for the most part) a form of declarative programming, all the software engineering techniques that one uses in other forms of programming ̲s̲t̲i̲l̲l̲ ̲a̲p̲p̲l̲y̲ ̲h̲e̲r̲e̲!
A regex should never look like a dense row of punctuation that’s impossible to decipher. Any language would be a disaster if you removed all the alphabetical identifiers, removed all whitespace and indentation, removed all comments, and removed every last trace of top-down programming. So of course they look like cr#p if you do that. Don’t do that!
So use all of those basic tools, including aesthetically pleasing code layout, careful problem decomposition, named subroutines, decoupling the declaration from the execution (including ordering!), unit testing, plus all the rest, whenever you’re creating regexes. These are all critical steps in making your patterns maintainable.
It’s one thing to write /(.)\1/, but quite another to write something like mǁ☕⅋⚣⁑™∞¶⌘℈‽#♬❧ǁ. Regexes like that second one are from the Dark Ages: don’t just reject them; burn them at the stake! It’s programming, after all, not line noise or golf!
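To show what I mean, here is the classic doubled-word pattern (a slightly beefed-up cousin of /(.)\1/) written out legibly. The sketch uses Python's re.VERBOSE, but Perl's /x modifier buys you exactly the same freedom to add whitespace and comments.

    # The classic doubled-word check, laid out so a human can read it.
    import re

    doubled_word = re.compile(r"""
        \b (\w+) \b      # a word...
        \s+              # ...some whitespace...
        \b \1   \b       # ...then the very same word again
    """, re.VERBOSE | re.IGNORECASE)

    print(doubled_word.search("Paris in the the spring").group(1))   # "the"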
☞ Regex References
The Wikipedia page on regular expressions is a decent enough overview.
IBM has a nice introduction to regexes in their Speaking Unix series.
Russ Cox has a very nice list of classic regular expressions references. You might want to check out the original Version 8 regular expressions, here found in a Perl manpage, but these were the original, most basic patterns that everybody grew up with back in olden days.
Mastering Regular Expressions from O’Reilly, by Jeffrey Friedl.
Jan Goyvaerts’s regular-expressions.info site and his Regular Expression Cookbook, also from O’Reilly.
I’m a native speaker of Perl, so let me say four words about it. Chapter 5 of the Perl Cookbook and Chapter 6 of Programming Perl, both somewhat embarrassingly by yours truly et alios, also from O’Reilly, are devoted to regular expressions in Perl. Perl was the language that originated most regex features found in modern regular expressions, and it continues to lead the pack. Perl’s Unicode support for regexes is especially rich and remarkably simple to use — in comparison with other languages’. You can download all the code examples from those two books from the O’Reilly site, or see the next item. The perldoc.org site has quite a bit on pattern matching, including the perlre and perluniprops manpages, just to take a couple of starting points.
Apropos the Perl Cookbook, the PLEAC project has reïmplemented the Perl Cookbook code in a dizzying number of diverse languages, including Ada, Common Lisp, Groovy, Guile, Haskell, Java, merd, OCaml, PHP, Pike, Python, REXX, Ruby, and Tcl. If you look at what each language does for its equivalent of the PCB’s regex chapter, you will learn a tremendous amount about how that language deals with regular expressions. It’s a marvellous resource and quite an eye-opener, even if some of the solutions are, um, suboptimal.
Java Regular Expressions by Mehran Habibi from Apress. It’s certainly better than trying to figure anything out by reading Sun’s documentation on the Pattern class. Java is probably the worst possible language for learning regexes in; it is very clumsy and often completely stupid. I speak from painful personal experience, not from ignorance, and I am hardly alone in this appraisal. If you have to use a JVM language, I recommend Groovy or perhaps Scala. Unfortunately, both are based on the standard Java pattern-matching classes, and so share their inadequacies.
If you need Unicode and you’re using Java or C++ instead of Perl, then I recommend looking into the ICU library. It handles Unicode in Java much better than Sun does, but it still feels too much like assembler for my tastes. Perl and Java appear to have the best support for Unicode and multiple encodings. Java is still kinda warty, but other languages often have it even worse. Be warned that languages with regexes bolted on the side are always clumsier to use them in than those that don’t.
If you’re using C, then I would probably skip over the system-supplied regex library and jump right into PCRE by Phil Hazel. A bonus is that PCRE can be built to handle Unicode reasonably well. It is also the basic regex library used by several other languages and tools, including PHP.
regular-expressions.info is a gold-mine of information and tutorials about regular expressions. From beginner to expert, there's not much out there that is better than this site when it comes to the study of regular expressions.
regular-expressions.info has a good tutorial here: http://www.regular-expressions.info/tutorial.html
Regular expressions by themselves might not achieve much unless they are combined with text-manipulation operations, either in a scripting tool (sed/awk) or in a programming language like Perl. Try installing RegexBuddy, a nice standalone tool which lets you use regular expressions on whatever files you point it at.
So yes, you can learn the basics of their structure, syntax, and semantics, if I may call it that, but also try reading the regular expression tutorials for Perl, Vim, and so on, and do some example string/text manipulation in those contexts, programmatically.
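For example, a regex only starts to pay off once it is embedded in such a context. A small Python sketch (the date rewrite is just an illustration) of the kind of programmatic text manipulation meant above:

    # Rewrite MM/DD/YYYY dates to ISO YYYY-MM-DD in whatever text arrives on
    # standard input; sed, awk, or a Perl one-liner would do the same job.
    import re
    import sys

    date = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")

    for line in sys.stdin:
        sys.stdout.write(date.sub(r"\3-\1-\2", line))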
-AD.
While learning at regular-expressions.info, the Regular Expressions Cheat Sheet (V2) is something you definitely want to have.
http://www.gskinner.com/RegExr/ exists both as an online version and as an AIR application.
The cool thing about this app (besides that it works like a charm) is that you can save your expressions or share them with the community right from the app.
Say you need an e-mail regex: you can just search for "e-mail" and you will get back a rated list of expressions.
Another helpful feature is the interpretation of your expressions into human-readable form. This makes it easier to learn and master.
For the tutorial part, this article is very easy to consume.
This book saved my ass when I was starting out with awk and sed.
I know the Perl regex flavor is sort of a minor de facto standard, but why hasn't anyone come up with a universal set of standard symbols, syntax, and behaviors?
There is a standard by IEEE associated with the POSIX effort. The real question is "why doesn't everyone follow it?" The answer is probably that it is not as capable as PCRE (Perl Compatible Regular Expressions) with respect to greedy matching and whatnot.
Actually, there is a regular expression standard (POSIX), but it's crappy. So people extend their RE engine to fit the needs of their application. PCRE (Perl-compatible regular expressions) is a pseudo-standard for regular expressions that are compatible with Perl's RE engine. This is particularly relevant because you can embed Perl's engine into other applications.
Because making standards is hard. It's nearly impossible to get enough people to agree on anything to make it an official standard, let alone something as complex as regexes. De facto standards are much easier to come by.
Case in point: HTML 5 is not expected to become an official standard until the year 2022. But the draft specification is already available, and major features of the standard will begin appearing in browsers long before the standard is official.
I have researched this and could not find anything concrete. My guess is that it's because regex is so often a tool that works ON tools, and therefore it's necessarily going to have platform- and tool-specific extensions.
For example, in Visual Studio, you can use regular expressions to find and replace strings in your source code. They've added stuff like :i to match an identifier. On other platforms in other tools, identifiers may not be an applicable concept. In fact, perhaps other platforms and tools reserve the colon character to escape the expression.
Differences like that make this one particularly hard to standardize.
Perl was first (or damn near close to first), and while it's Perl and we all love it, it's old, and some people felt it needed more polish (i.e., features). This is where the newer flavors came in.
They're starting to normalize: the regex used in .NET is very similar to the regex used in other languages. I think people are slowly starting to unify, but some are used to their Perl ways and don't want to change.
Just a guess: there was never a version popular enough to be considered the canonical standard, and there was no standard implementation. Everyone who came and reimplemented it had their own ideas on how to make it "better".
Because too many people are scared of regular expressions, they haven't become widespread enough for enough sensible people to both think of the idea and be in a position to implement it.
Even if a standards body did form and try to unify the different flavours, too many people would argue stubbornly towards their own approach, whether better or not, because lots of programmers are annoying like that.
We would like to create some simple automated tests that will be created and maintained by testers. Right now we have a tester who can code in any language, but in the future we might want any tester with a limited knowledge of programming to be able to add or modify the tests.
What is a good programming language for testers who are not great programmers, or programmers at all?
Someone suggested Lua, but I looked into Lua and it might be more complicated than another language would be.
Preferably, the language will be interpreted and not be compiled. Let me know what you think.
Update: C and C++ are under the hood. No one is aspiring to be a programmer really ... it just might be something they could potentially work on if they can handle the task.
Update 2: I am a software engineer who happens to be a tester right now. I am very knowledgeable about the entire lifecycle ... including developing code, so for me I could go with any language but I'm trying to think of other testers who aren't as knowledgeable about programming as I am.
Update 3: The language will need to be able to make calls to the C++ code easily.
You may not even need a language. Depending on what you are testing, you can use test-modeling tools like CubicTest: http://cubictest.seleniumhq.org/
I highly recommend you check it out if you are doing web applications.
Our QA team had great success with it.
Otherwise I would recommend a Domain Specific Language over a General Purpose Language for your problem domain. The DSL might actually be a subset of a GPL (for example, Rake for Ruby), so google carefully.
If you cannot find an existing DSL, then:
Create a DSL for your testers using Ruby or Scheme. Those two languages are the easiest for creating domain-specific languages.
Python: if all else fails and they need a GPL, then Python is by far the easiest language to learn, IMHO.
EDIT - Based on your updated requirements, Python might be the best fit. I have found it very easy to call C or C++ with Python's ctypes. However, I am sure Ruby has something equally good.
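For what it's worth, here is a minimal ctypes sketch of that; libwidget.so and widget_count() are made-up names standing in for whatever your C++ code actually exposes through a C interface:

    # Hypothetical example: a tester-written check that calls into the C++
    # code under test through a plain C interface loaded with ctypes.
    import ctypes

    lib = ctypes.CDLL("./libwidget.so")        # invented library name
    lib.widget_count.restype = ctypes.c_int    # declare the return type

    def test_widget_count_is_positive():
        assert lib.widget_count() > 0

    if __name__ == "__main__":
        test_widget_count_is_positive()
        print("ok")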
I always recommend Python.
People always think I am crazy, but it is the easiest and most flexible language to show people. And you could always design some type of "test" framework, and only expose them to a very small subset of it.
And you can always refer them to a good introductory Python book.
I think that before actually choosing a language, we should define even more precisely what you are looking for.
Garbage collected, as we don't want people to have to understand memory management!!
Good number of modules/libraries around, so as not to reinvent the wheel
Preferably coming with already existing (and tested) unit-test frameworks
Good documentation for the modules/libraries
Preferably scripting language, because tests have to be modified/run quickly
Easy interaction with C/C++, though the developers will have to provide the interface
And, perhaps most important of all:
clear and "English-like" syntax, so that it will be (at least) readable by non-tech people
Based on this list, I would recommend Python.
It's perhaps the programming language (among those having reached critical mass) that is closest to traditional English / algorithmic expression. It's certainly one of those with the least punctuation and fewest weird symbols to throw off non-programmers.
It comes with so many modules out of the box that it's unlikely you'll have to dig for more any time soon, including a unittest module (a minimal example is sketched just after this list).
The documentation is really good, generally illustrated by examples
It is quite simple to interface it with C
You can even run Python scripts from Java using JPython ;)
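To give a sense of how little ceremony the unittest module needs, here is a minimal, invented example; add() stands in for whatever the testers would actually be exercising:

    # A minimal unittest sketch: each test_* method is discovered and run
    # automatically, and failures are reported with a readable message.
    import unittest

    def add(a, b):                 # stand-in for the real code under test
        return a + b

    class TestAdd(unittest.TestCase):
        def test_positive_numbers(self):
            self.assertEqual(add(2, 3), 5)

        def test_negative_numbers(self):
            self.assertEqual(add(-2, -3), -5)

    if __name__ == "__main__":
        unittest.main()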
We have an in-house tool for our non-regression tests. While it is programmed in Java (probably for the GUI part and the Windows/Unix portability requirement), Python was chosen as the language to use within the non-regression tests themselves.
This is used mostly by our QA people, and even beginners usually take to it pretty easily, even when they don't have any programming background at all.
Note: I don't have any practical experience with Lua, so I am unable to choose between Python and Lua. However, having used both Python and Ruby, I must say that I have found Python much more readable (loop constructs / punctuation). Just make sure not to pick up Perl ;)
Depends a bit on what you want, but for my money Ruby is probably the most comprehensible language around.
Also if you're working with web stuff, then Watir gives you a lot of testing functionality right there.
If your ambition is at all to become a programmer, I'd suggest using the language of the system you're testing.
The experience will make you a better programmer, and the knowledge can only make you a better tester.
Python is a very simple and useful language to understand. Some even compare it to writing pseudocode. It also comes with its own unit testing framework.
EDIT: It also comes with a C API.
I think the better question might be: what do you plan on doing with the programs created? I have done Java, HTML, CSS, PHP, MySQL, VB, C#, etc. Out of all of them, the fundamentals remain the same. You always have the same kind of logic from language to language: IF/ELSE statements, for loops, and so on. However, if you're not planning on creating self-loading programs, then you would go for something that wouldn't do that.
Personally, I find Java difficult, but it allows for a lot of portability. Don't just go with what's easiest, because you might not be able to do anything with it in the future.
EDIT *
If you are still interested, Java has the ability to make calls to C++, but not without its share of problems. This link looks at making such calls, but it might be a little too involved if you're only hoping to show introductory programming.
OK, as I understand it, the question is really: how can I let non-programmers write automated tests for an app written in C++? So in this context I would suggest taking a look at Fit and FitNesse.
Fit is essentially a table-based Framework for Integration Testing. The idea is that you hook fixtures into the code under test, and those fixtures are then controlled in different ways using nothing more than tables or, in the case of FitNesse, simple wiki markup which creates the tables under the hood.
The advantage of this is that there is no programming language involved at all. They just need to know what fixtures you have exposed and the proper usage for them.
The drawback of this is that it can be difficult at first to map out the fixtures you need or want for your tests. Also, it is generally more maintenance than using something like a unit-test framework where the tests are all just in code.
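To make the table idea concrete without dragging in FitNesse itself, here is the gist sketched in a few lines of Python; divide() and the rows are invented, but the shape (tests written as data rows, interpreted by a fixture) is exactly what Fit tables give you.

    # A toy "fixture" in the Fit spirit: the table is pure data that a tester
    # could edit, and the loop below plays the role of the fixture code.
    def divide(numerator, denominator):        # stand-in for real code
        return numerator / denominator

    table = [
        # numerator, denominator, expected result
        (10, 2, 5),
        (9,  3, 3),
        (7,  2, 3.5),
    ]

    for numerator, denominator, expected in table:
        actual = divide(numerator, denominator)
        status = "ok" if actual == expected else f"FAIL (got {actual})"
        print(f"{numerator} / {denominator} = {expected}: {status}")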