I'm looking for some predefined Regexes for elements of ANSI C++.
I would like to create a program which takes a header file (with includes, namespaces, classes etc.) as input and returns lists of the class names, methods, attributes etc. that it finds.
It's hard to google for something like that; I always end up with tutorials on how to use Regexes in C++. Perhaps I'm just googling the wrong terms?
Perhaps someone already has found/used/created such Regexes.
This type of operation is not possible to do with a regular expression. C++ is not a regular language and hence can't be reliably parsed with a regular expression. The safest approach is to use an actual parser to locate C++ elements.
If 100% correctness is not a goal, though, then a regular expression will work, because it can be crafted to catch the majority of cases within a code base. The simplest example would be the following:
class\s+[a-z]\w+
However, it will incorrectly match the following as a class:
Forward declarations
Any string literal with text like "class foo"
Template parameters
etc ...
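To make the false positives listed above concrete, here is a minimal Python sketch that runs the same pattern over a few C++ snippets. The sample lines are made up for illustration; only the regex comes from the answer.

```python
import re

# The pattern from the answer above, unchanged.
class_re = re.compile(r"class\s+[a-z]\w+")

# Hypothetical C++ snippets: one real definition, three false positives.
samples = [
    "class widget { int x; };",               # a real class definition
    "class widget;",                          # forward declaration
    'const char* s = "class foo";',           # string literal
    "template <class item> void f(item v);",  # template parameter
]

# Collect what the pattern reports for each snippet (None if no match).
matches = []
for snippet in samples:
    m = class_re.search(snippet)
    matches.append(m.group(0) if m else None)

for snippet, found in zip(samples, matches):
    print(f"{found!r:16} <- {snippet}")
```

All four snippets produce a match, so the pattern alone cannot tell a definition from the other three cases. Note also that `[a-z]` only accepts names starting with a lower-case letter.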
You might find the code for ctags handy. It will parse code and break out the symbols for use in emacs and other programs. In fact, it might just do all the work you are trying to do yourself.
You may also find something interesting in ctags or cscope as already mentioned. I also have encountered flist here
I'm writing a Python program to extract some essential class info from a large messy C++ source tree. I'm having pretty good luck with using regexes. Fortunately, nearly all the code follows a style that lets me get away with defining just a few regexes to detect class declarations, methods, etc. Most member variables have names like "itsSomething_" or "m_something". I kludge in hard-coded hackwork to catch anything not fitting the style.
import re
class_decl_re = re.compile( r"^class +(\w+)\s*(:|\{)" )
close_decl_re = re.compile( r"^\};" )
method_decl_re = re.compile( r"(\w[ a-zA-Z_0-9\*\<\>]+) +(\w+)\(" )
var_decl1_re = re.compile( r"(\w[ a-zA-Z_0-9\*\<\>]+) +(its\w+);" )
var_decl2_re = re.compile( r"(\w[ a-zA-Z_0-9\*\<\>]+) +(m_\w+);" )
comment_pair_re = re.compile( r"/\*.*\*/" )
This is a work in progress, but I'll show this (possibly buggy) (no, almost certainly buggy) snip of code to show how the regexes are used:
# at this point, we're looking at one line from a .hpp file
# from inside a class declaration. All initial whitespace has been
# stripped. All // and /*...*/ comments have been removed.
is_static = (line[0:6]=="static")
if is_static:
    # drop the keyword and the whitespace after it, so the checks below
    # (which all anchor at the start of the line) still work
    line=line[6:].lstrip()
is_virtual = (line[0:7]=="virtual")
if is_virtual:
    line=line[7:].lstrip()
# I believe "virtual static" is impossible, but if our goal
# is to detect such coding gaffes, this code can't do it.
mm = method_decl_re.match(line)
vm1 = var_decl1_re.match(line)
vm2 = var_decl2_re.match(line)
if mm:
    meth_name = mm.group(2)
    minfo = MethodInfo(meth_name, classinfo.name) # class to hold info about a method
    minfo.rettype = mm.group(1) # return type
    minfo.is_static = is_static
    if is_static:
        if is_virtual:
            classinfo.screwed_up=True
        classinfo.class_methods[meth_name] = minfo
    else:
        minfo.is_polymorphic = is_virtual
        classinfo.obj_methods[meth_name] = minfo
elif vm1 or vm2:
    if vm1: # deal with vars named "itsXxxxx..."
        vm=vm1
        var_name = vm.group(2)[3:] # remove the "its"
        if var_name.endswith("_"):
            var_name=var_name[:-1]
    else: # deal with vars named "m_Xxxxx..."
        vm=vm2
        var_name = vm.group(2)[2:] # remove the m_
    datatype = vm.group(1)
    vi = VarInfo(var_name, datatype)
    vi.is_static = is_static
    classinfo.vars[var_name] = vi
I hope this is easy to understand and translate to other languages, at least for a starting point for anyone crazy enough to try. Use at your own risk.
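For anyone who wants to try the patterns in isolation, here is a small self-contained sketch. The three regexes are copied from the answer; the `classify` helper and the sample header lines are hypothetical and only there to show what each pattern captures.

```python
import re

# Patterns copied from the answer above.
class_decl_re = re.compile(r"^class +(\w+)\s*(:|\{)")
method_decl_re = re.compile(r"(\w[ a-zA-Z_0-9\*\<\>]+) +(\w+)\(")
var_decl2_re = re.compile(r"(\w[ a-zA-Z_0-9\*\<\>]+) +(m_\w+);")

def classify(line):
    """Hypothetical helper: report which pattern (if any) matches a stripped line."""
    m = class_decl_re.match(line)
    if m:
        return ("class", m.group(1))
    m = method_decl_re.match(line)
    if m:
        return ("method", m.group(2), m.group(1))
    m = var_decl2_re.match(line)
    if m:
        return ("variable", m.group(2), m.group(1))
    return ("unknown", line)

# Made-up header lines, already stripped of leading whitespace and comments.
for line in ["class Widget : public Base {",
             "int getCount() const;",
             "int m_count;",
             "std::string m_name;"]:
    print(classify(line))
```

The last line comes back as unknown because `:` is not in the character class used for the type, which is the kind of case the hard-coded special handling mentioned above would have to cover.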
I'm trying to write a script to update a text file by replacing instances of certain characters (e.g. 'a', 'w') with a word (e.g. 'airplane', 'worm').
If a single line of the text was something like this:
a.function(); a.CallMethod(w); E.aa(w);
I'd want it to become this:
airplane.function(); airplane.CallMethod(worm); E.aa(worm);
The difference is subtle but important: I'm only changing 'a' and 'w' where they are used as variables, not where they are just characters in some other word. And there are many lines like this in the file. Here's what I've done so far:
import re
original = open('original.js', 'r')
modified = open('modified.js', 'w')
# iterate through each line of the file
for line in original:
    # Search for the character 'a' when not part of a word of some sort
    line = re.sub(r'\W(a)\W', 'airplane', line)
    modified.write(line)
original.close()
modified.close()
I think my RE pattern is wrong, and I think I'm using the re.sub() method incorrectly as well. Any help would be greatly appreciated.
If you're concerned about the semantic meaning of the text you're changing with a regular expression, then you'd likely be better served by parsing it instead. Luckily, Python has two good modules to help you with parsing Python: look at the Abstract Syntax Tree and the Parser modules. There are probably others for JavaScript, if that's what you're working with, such as slimit.
For future reference on regular expression questions, there's a lot of helpful information here:
https://stackoverflow.com/tags/regex/info
Reference - What does this regex mean?
And it took me 30 minutes from never having used this JavaScript parser in Python (replete with installation issues: please note the right ply version) to writing a basic solution given your example. You can too.
# Note: sudo pip3 install ply==3.4 && sudo pip3 install slimit
from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor
data = 'a.funktion(); a.CallMethod(w); E.aa(w);'
tree = Parser().parse(data)
for node in nodevisitor.visit(tree):
    if isinstance(node, ast.Identifier):
        if node.value == 'a':
            node.value = 'airplaine'
        elif node.value == 'w':
            node.value = 'worm'
print(tree.to_ecma())
It runs to give this output:
$ python3 src/python_renames_js_test.py
airplaine.funktion();
airplaine.CallMethod(worm);
E.aa(worm);
Caveats:
function is a reserved word, I used funktion
the to_ecma method pretty prints; there is likely another way to output it closer to the original input.
line = re.sub(r'\ba\b', 'airplane', line)
should get you closer. However, note that you will also be turning a.CallMethod("That is a house") into airplane.CallMethod("That is airplane house"), and open("file.txt", "a") into open("file.txt", "airplane"). Getting it right in a complex syntax environment using regular expressions is hard to impossible.
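A short sketch of the difference, using the sample line from the question. The variable names are mine; the two patterns are the ones discussed above.

```python
import re

line = 'a.function(); a.CallMethod(w); E.aa(w);'

# The pattern from the question: \W consumes the neighbouring characters,
# and the 'a' at the very start of the line has nothing before it to match.
original_attempt = re.sub(r'\W(a)\W', 'airplane', line)

# Word boundaries match positions, not characters, so nothing extra is eaten.
renamed = re.sub(r'\ba\b', 'airplane', line)
renamed = re.sub(r'\bw\b', 'worm', renamed)

# The caveat: a standalone 'a' inside a string literal is replaced as well.
false_positive = re.sub(r'\ba\b', 'airplane', 'a.CallMethod("That is a house")')

print(original_attempt)
print(renamed)
print(false_positive)
```

The first result shows why the original attempt looked wrong: the space and the dot around the second `a` are swallowed, and the first `a` is left untouched.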
I have an XML file in which I need to find all prices, multiply them (e.g. by 1.25) and replace them.
Price tag looks like that:
<price><![CDATA[15.9]]></price>
The price tag should look like that after the operation:
<price><![CDATA[19.875]]></price>
Can this be done in Notepad++ or PowerGrep using a regular expression?
Thanks in advance.
As far as I know, neither program can perform the math, but you can write a simple program in almost any language that reads the file, uses a regex to find the number, casts that string to a double, does the math, and puts the result back into the string. Later today I could probably build something in C#, but it should be relatively straightforward in most languages. You may even be able to write a shell script using grep if you're not on Windows, or use PowerShell on Windows, though I have less experience with PowerShell.
Edit: There is an easier way to do this: http://msdn.microsoft.com/en-us/library/hcebdtae(v=vs.110).aspx
This is essentially what you want to do, using the XmlDocument object.
Edit2: I did this even though I couldn't get hold of the original poster. I thought someone might be able to use the info, and I learned a lot. I can add the source code to GitHub if anyone is interested.
// Requires: using System; using System.IO; using System.Xml;
public static void ChangePricesWork(string filepath, double multiply)
{
    var document = new XmlDocument();
    document.Load(filepath);
    XmlNodeList nodeList = document.GetElementsByTagName("price");
    foreach (XmlNode node in nodeList)
    {
        if (!string.IsNullOrEmpty(node.InnerText))
        {
            node.InnerText = Convert.ToString(multiplyPrice(multiply, node.InnerText));
        }
    }
    // Save next to the input file as <name>_updated.xml
    string newFilePath = string.Format(@"{0}\{1}_updated.xml", Path.GetDirectoryName(filepath), Path.GetFileNameWithoutExtension(filepath));
    document.Save(newFilePath);
}
private static double multiplyPrice(double multiply, string oldPrice)
{
    // TryParse leaves newPrice at 0 if oldPrice is not a valid number
    var newPrice = new double();
    if (Double.TryParse(oldPrice, out newPrice))
    {
        newPrice = newPrice * multiply;
    }
    return newPrice;
}
Notepad++ has a Pythonscript plugin that allows you to create quick Python scripts that have access to your document and Notepad++ itself.
The install and setup is described in this answer.
The API has moved on a little bit since then; you now do a regular expression replace with Editor.rereplace.
# Start a sequence of actions that is undone and redone as a unit. May be nested.
editor.beginUndoAction()
# multiply_price_cdata
from decimal import *
TWOPLACES = Decimal(10) ** -2
def multiply_price_cdata( m ):
    price = Decimal( m.group(2) ) * Decimal( 1.25 )
    return m.group(1) + str(price.quantize(TWOPLACES)) + m.group(3)
def cdata( m ):
    return "CDATA"
# npp++ search/replace
re_price = r'(<price><!\[CDATA\[)(\d+\.\d+|\d+)(\]\]></price>)'
editor.rereplace( re_price , multiply_price_cdata )
# end the undo sequence
editor.endUndoAction()
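The same replace-with-a-callback idea also works outside Notepad++ with the standard `re` module. The following is only a sketch: `multiply_prices` and its `factor` parameter are names I made up, and unlike the script above it does not round to two decimal places, so the question's example comes out as 19.875.

```python
import re
from decimal import Decimal

# Same three-group shape as the pattern above: prefix, number, suffix.
PRICE_RE = re.compile(r'(<price><!\[CDATA\[)(\d+(?:\.\d+)?)(\]\]></price>)')

def multiply_prices(xml_text, factor="1.25"):
    """Hypothetical helper: multiply every CDATA price in xml_text by factor."""
    factor = Decimal(factor)

    def replace(m):
        # Decimal avoids binary floating-point noise in the written price.
        new_price = Decimal(m.group(2)) * factor
        return m.group(1) + str(new_price) + m.group(3)

    return PRICE_RE.sub(replace, xml_text)

sample = '<item><price><![CDATA[15.9]]></price></item>'
updated = multiply_prices(sample)
print(updated)
```

To process a file, read its contents into a string, pass it through `multiply_prices`, and write the result back out.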
I have the following regular expression pattern that matches fully qualified Microsoft SQL Server table names ([dbName].[schemaName].[tableName]), where the schema name is optional:
val tableNamePattern = """\[(\w+)\](?:\.\[(\w+)\])?\.\[(\w+)\]""".r
I am using it like this:
val tableNamePattern(database, schema, tableName) = fullyQualifiedTableName
When the schema name is missing (e.g.: [dbName].[tableName]), the schema value gets set to null.
Is there a Scala idiomatic way to set it to None instead, and to Some(schema) when the schemaName is provided?
Some people, when confronted with a problem, think
“I know, I'll use regular expressions.” Now they have two problems.
-- Jamie Zawinski
I'm going to copy the code from the accepted answer on the linked question, and without giving credit too. Here it is:
object Optional {
  def unapply[T](a: T) = if (null == a) Some(None) else Some(Some(a))
}
val tableNamePattern(database, Optional(schema), tablename) = fullyQualifiedTableName
PS: I just today wondered on twitter whether creating special-case extractors was as common as they were suggested. :)
Scala noob, I'm afraid:
I have the following declared class variable, which will hold the objects I read from the database:
val options = mutable.LinkedList[DivisionSelectOption]()
I then use JPA to get a List of all rows from a table:
val divisionOptions = em.createNamedQuery("SelectOption.all", classOf[SelectOption]) getResultList
/* Wrap java List in Scala List */
val wrappedOptions = JListWrapper.apply(divisionOptions)
/* Store the wrappedOptions in the class variable */
options += wrappedOptions
However, the last line has an error:
Type Expected: String, actual JListWrapper[SelectOption]
Can anyone help with what I am doing wrong? I'm just trying to populate the options object with the result of the DB call.
Thanks
What (probably) is happening is that a JListWrapper[SelectOption] isn't a DivisionSelectOption, so the method += isn't applicable to it. That being the case, it is trying other stuff, and giving a final error on this:
options = options + wrappedOptions
That is a rewriting Scala can do to make things like x += 1 work for var x. The + method is present on all objects, but it takes a String as parameter -- that's so one can write stuff like options + ":" and have that work as in Java. But since wrappedOptions isn't a String, it complains.
Roundabout and confusing, I know, and even Odersky regrets his decision with regard to +. Let that be a lesson: if you think of adding a method to Any, think really hard before doing it.
lcount = Open_Layers.objects.all()
form = SearchForm()
if request.method == 'POST':
    form = SearchForm(request.POST)
    if form.is_valid():
        data = form.cleaned_data
        val=form.cleaned_data['LayerName']
        a=Open_Layers()
        data = []
        for e in lcount:
            if e.Layer_name == val:
                data = val
        return render_to_response('searchresult.html', {'data':data})
    else:
        form = SearchForm()
else:
    return render_to_response('mapsearch.html', {'form':form})
This only returns a result if a particular "name" matches exactly. How do I change it so that a search for "Park" returns Park1, Park2, Parking, Parkin, i.e. all the occurrences of "park"?
You can improve your searching logic by using a list to accumulate the results and the re module to match a larger set of words.
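As a rough sketch of that idea, outside Django: the list of names below stands in for the Layer_name values of the queryset, and `search_layers` is a hypothetical helper, not part of the original view.

```python
import re

def search_layers(names, query):
    """Return every name that contains the query, ignoring case."""
    # re.escape keeps characters like '.' or '(' in the query literal.
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    results = []  # accumulate matches instead of overwriting a single value
    for name in names:
        if pattern.search(name):
            results.append(name)
    return results

# Made-up layer names for illustration.
layer_names = ["Park1", "Park2", "Parking", "Parkin", "Lake"]
results = search_layers(layer_names, "Park")
print(results)
```

In the view, the resulting list would be passed to the template in place of the single matched value.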
However, this is still pretty limited, error-prone, hard to maintain and even harder to evolve. Plus, you'll never get results as nice as those from a real search engine.
So instead of trying to manually reinvent the wheel, the car and the highway, you should spend some time setting up haystack. This is now the de facto standard for search in Django.
Use Whoosh as a backend at first; it's going to be easier. If your search gets slow, replace it with Solr.
EDIT:
Simple clean alternative:
Open_Layers.objects.filter(Layer_name__icontains=val)
This will perform a case-insensitive SQL LIKE, adding the % wildcards for you.
This is going to kill your database if used too often, but I guess that is probably not going to be an issue with your current project.
BTW, you probably want to rename Open_Layers to OpenLayers, as this is the Python PEP 8 naming convention.
Instead of
if e.Layer_name == val:
    data = val
use
if val in e.Layer_name:
    data.append(e.Layer_name)
(and you don't need the line data = form.cleaned_data)
I realise this is an old post, but anyway:
There's a fuzzy logic string comparison already in the python standard library.
import difflib
Mainly have a look at:
difflib.SequenceMatcher(None, a='string1', b='string2', autojunk=True).ratio()
more info here:
http://docs.python.org/library/difflib.html#sequencematcher-objects
It returns a ratio of how close the two strings are, between 0 and 1. So instead of testing whether they're equal, you choose your similarity threshold.
One thing to watch out for: you may want to convert both strings to lower case.
string1.lower()
Also note you may want to implement your favourite method of splitting the string, i.e. .split() or something using re, so that a search for 'David' against 'David Brent' ranks higher.
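Putting the lower-casing and splitting suggestions together, a minimal sketch might look like this. The helper names `similarity` and `best_token_similarity` are mine; only `SequenceMatcher` comes from the standard library.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive similarity ratio between two strings, from 0 to 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_token_similarity(query, candidate):
    """Compare the query against each whitespace-separated word of the candidate."""
    tokens = candidate.split() or [candidate]
    return max(similarity(query, token) for token in tokens)

whole = similarity("David", "David Brent")
per_token = best_token_similarity("David", "David Brent")
print(whole, per_token)
```

Comparing word by word lets 'David' score a perfect match against 'David Brent', whereas comparing against the whole string gives a lower ratio. A search would then keep every candidate whose score is above a threshold of your choosing.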