I've been wondering for a while: how do autocompletions work?
Example:
In PhpStorm whenever I use a class and type -> it shows me all properties and methods of that class. It also auto completes namespaces and even functions inside libraries such as jQuery.
Does it run some sort of regex on the files, or does it parse them somehow?
PhpStorm developer here. I'd like to go through some basics in case it may be helpful for those who want to implement their own plugin.
First of all, the code has to be broken into tokens using a lexer. Then an AST (abstract syntax tree) and a PSI (program structure interface) tree are built using a parser. PhpStorm has its own implementations of the lexer and the parser. This is what a PSI tree looks like for a simple class.
When you type in an editor or explicitly invoke a completion action (Ctrl+Space) a number of completion contributors are invoked. They're intended to return a list of suggestions based on a cursor's position.
Let's consider a case when completion is invoked inside a field reference.
PhpStorm knows that at the current position all class members can be suggested. It starts obtaining a class reference (the $class variable in our case) and determining its type. If a variable resolves to a class its type is a class' FQN (fully qualified name).
To obtain methods and fields of a class its PSI element is needed. A special index is used to map an FQN to an appropriate PhpClass tree element. Indices are initially built when a project is opened for the first time and updated for each modified file.
PhpStorm collects all members from the PSI element (including those of its parents), then from its traits. They're filtered depending on the current context (e.g. access scope) and the part of the name that has already been typed (f).
Suggestions are shown in a list that is sorted by how well an element's name matches, its type, its position and so on. The list is rearranged as you type.
When you press Enter to insert an element, PhpStorm invokes one more handler. It knows how to properly insert the element into the code. For instance, it can add parentheses for a method or import a class reference. In our case, it's enough to put brackets and place the cursor just after them because the method has no parameters.
That's basically it. It's worth mentioning that the IntelliJ IDEA platform allows a plugin to provide an implementation for each step described above. Thus completion can be improved or extended for some particular framework or language.
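To make the lookup-and-filter steps concrete, here is a toy sketch in C++. It is not PhpStorm's real API: Member, ClassInfo, ClassIndex and complete are hypothetical names, a std::map stands in for the FQN index, and traits and ranking are left out.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for PSI elements; not PhpStorm's real API.
struct Member {
    std::string name;
    bool isPublic;
};

struct ClassInfo {
    std::string parentFqn;          // empty when there is no parent class
    std::vector<Member> members;
};

// Stand-in for the index that maps an FQN to its class element.
using ClassIndex = std::map<std::string, ClassInfo>;

// Collect members of the class and its ancestors, keep the accessible ones
// whose name starts with the typed prefix, and return them sorted by name.
// Assumes the parent chain is acyclic.
std::vector<std::string> complete(const ClassIndex& index,
                                  const std::string& fqn,
                                  const std::string& prefix,
                                  bool includeNonPublic) {
    std::vector<std::string> result;
    std::string current = fqn;
    while (!current.empty()) {
        auto it = index.find(current);
        if (it == index.end()) break;           // unresolved class: stop walking
        for (const Member& m : it->second.members) {
            bool accessible = m.isPublic || includeNonPublic;
            bool matches = m.name.compare(0, prefix.size(), prefix) == 0;
            if (accessible && matches) result.push_back(m.name);
        }
        current = it->second.parentFqn;         // continue with the parent class
    }
    std::sort(result.begin(), result.end());
    result.erase(std::unique(result.begin(), result.end()), result.end());
    return result;
}
```

Calling complete(index, "\\Child", "f", false) returns the public members of \Child and its ancestors whose names start with "f". A real implementation ranks suggestions by more than the name.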
While doing a game engine that uses .lua files in order to read parameter values, I got stuck when I had to read these values and assign them to the parameters of each component in C++. I tried to investigate the way Unity does it, but I didn't find it (and I'm starting to doubt that Unity has to do it at all).
I want the parameters to be initialized automatically, without the user having to do the process of
myComponentParameter = readFromLuaFile("myParameterName")
for each one of the parameters.
My initial idea is to use the std::variant type, and storing an array of variants in order to read them automatically. My problems with this are:
First of all, I don't know how to know the type that std::variant is storing at the moment (tried with std::variant::type, but it didn't work for the template), in order to cast from the untyped .lua value to the C++ value. For reference, my component initialization looks like this:
bool init(luabridge::LuaRef parameterTable)
{
    myIntParameter = readVariable<int>(parameterTable, "myIntParameter");
    myStringParameter = readVariable<std::string>(parameterTable, "myStringParameter");
    return true;
}
(readVariable function is already written in this question, in case you're curious)
The second problem is that the user would have to write std::get<int>(myIntParameter); whenever they want to access the value stored by the variant, and that sounds worse than making the user read the parameter value.
The third problem is that I can't create an array of std::variant<any type>, which is what I would like to do in order to automatically initialize the parameters.
Is there any good solution for this kind of situation where I want the init function to not be necessary, and the user doesn't need to manually set up the parameter values?
Thanks in advance.
Let's expand my comment. In a nutshell, you need to get from
"I have some things entered by the user in some file"
to:
"the client code can read the value without std::get"
…which roughly translates to:
"input validation was done, and values are ready for direct use."
…which implies you do not store your variables in variants.
In the end it is a design question. One module somewhere must have the knowledge of which variable names exist, and the type of each, and the valid values.
The input of that module will be unverified values.
The output of the module will probably be some regular C++ struct.
And the body of that module will likely have a bunch of those:
config.foo = readVariable<int>("foo");
config.bar = readVariable<std::string>("bar");
// you also want to validate values there - all ints may not be valid values for foo,
// maybe bar must follow some specific rules, etc
assuming somewhere else it was defined as:
struct Configuration {
    int foo;
    std::string bar;
};
Where that module lives depends on your application. If all expected types are known, there is no reason to ever use a variant, just parse right away.
You would read to variants if some things do not make sense until later. For instance if you want to read configuration values that will be used by plugins, so you cannot make sense of them yet.
(actually even then simply re-parsing the file later, or just saving values as text for later parsing would work)
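As a rough sketch of such a module: the code below reads unverified values, validates them, and returns a plain struct. A std::map of strings stands in for the Lua table, and requireValue, parseConfiguration and the two validation rules are hypothetical, chosen only for illustration.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Raw, unverified input. A std::map stands in for the Lua table here.
using RawValues = std::map<std::string, std::string>;

struct Configuration {
    int foo;
    std::string bar;
};

// Look up a required key or fail with a clear message.
const std::string& requireValue(const RawValues& raw, const std::string& key) {
    auto it = raw.find(key);
    if (it == raw.end()) throw std::invalid_argument("missing value: " + key);
    return it->second;
}

// The one place that knows which names exist, their types and valid values.
Configuration parseConfiguration(const RawValues& raw) {
    Configuration config;

    const std::string& fooText = requireValue(raw, "foo");
    std::size_t used = 0;
    config.foo = std::stoi(fooText, &used);     // throws if no number can be read
    if (used != fooText.size())
        throw std::invalid_argument("foo is not an integer");
    if (config.foo < 0)                         // hypothetical validity rule
        throw std::invalid_argument("foo must not be negative");

    config.bar = requireValue(raw, "bar");
    if (config.bar.empty())                     // hypothetical validity rule
        throw std::invalid_argument("bar must not be empty");

    return config;
}
```

Client code then reads config.foo directly, with no std::get and no further checks, because anything that reaches it has already been validated.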
I want to use derived attributes and references in an ecore model, but so far I have not found any documentation on how to set the code for the methods which compute the values of derived attributes/references.
As far as I understand it, the basic workflow is to mark an attribute/reference as derived, generate model code, and then manually add the implementation. However, I work with models dynamically generated through the Ecore API. Is there a way to take a String and specify this String as the implementation for the computation of the derived feature, without manually editing generated files?
EDIT>
To clarify: I'm looking for a way to directly change the generated Java files, by specifying method bodies (as strings) for the getters of derived EStructuralFeatures.
EMF provides a way of dealing with dedicated implementations for an EOperation and a derived EAttribute using an "invocation delegate". This functionality allows you to put some implementation directly in your ecore metamodel in a string format (as long as the language used can be "handled" by EMF, i.e., an invocation delegate exists).
As far as I know, OCL is well supported: https://wiki.eclipse.org/OCL/OCLinEcore#Invocation_Delegate
The registration of the invocation delegate is performed either by plugin registration or by hand (for standalone usage), and the mechanism works with the EMF reflection layer (dynamic EMF): https://wiki.eclipse.org/EMF/New_and_Noteworthy/Helios#Registering_an_Invocation_Delegate
(Please note that I have never used this mechanism. I know it exists, but I have never played with it.)
EDIT>
It seems that the question was not related to dynamic code execution for derived attribute, but to code injection (I misunderstood the "Is there a way to take a String and specify this String as the implementation for the computation of the derived feature?").
EMF provides a way of injecting code placed on the ecore metamodel directly into the generated code.
Here is the way for EAttribute with derived property. The EAttribute should have the following properties set to true: {derived volatile} (you can also add transient). If you only want a getter and no setter for your EAttribute, you can also set the property changeable to false.
Once your EAttribute is well "configured", you have to add a new EAnnotation with the source set to http://www.eclipse.org/emf/2002/GenModel and an entry with the key set to get and value set to your code that will be injected (see image below).
And voilà, your code will be generated with the value injected in your getter.
You can apply the same process to an EOperation, using body instead of get.
I need to add logging to a legacy C++ project, which contains hundreds of user-defined structs/classes. These structs only contain primitive types such as int, float, char[], enum.
The content of objects needs to be logged, preferably in a human-readable way, but that is not a must as long as the object can be reconstructed.
Instead of writing different serialization methods for each class, is there any alternative method?
What you want is a Program Transformation System (PTS). These are tools that can read source code, build compiler data structures (usually ASTs) that represent the source code, and allow you to modify the ASTs and regenerate source code from the modified AST.
These are useful because they "step outside" the language, and thus have no language-imposed limitations on what you can analyze or transform. So it doesn't matter if your language doesn't have reflection for everything; a good PTS will give you full access to every detail of the language, including such arcana as comments and radix on numeric literals.
Some PTSes are specific to a targeted language (e.g., "Jackpot" is only usable for Java). A really good PTS is provided a description of an arbitrary programming language, and can then manipulate that language. That description has to enable the PTS to parse the code, analyze it (build symbol tables at least) and prettyprint the parsed/modified result.
Good PTSes will allow you write the modifications you want to make using source-to-source transformations. These are rules specifying changes written in roughly the following form:
if you see *this*, replace it by *that* when *condition*
where this and that are patterns using the syntax of the target language being processed, and condition is a predicate (test) that must be true to enable the rule to be applied. The patterns represent well-formed code fragments, and typically allow metavariables to represent placeholders for arbitrary subfragments.
You can use PTSes for a huge variety of program manipulation tasks. For OP's case, what he wants is to enumerate all the structs in the program, pick out the subset of interest, and then generate a serializer for each selected struct as a modification to the original program.
To be practical for this particular task, the PTS must be able to parse and name resolve (build symbol tables) C++. There are very few tools that can do this: Clang, our DMS Software Reengineering Toolkit, and the Rose compiler.
A solution using DMS looks something like this:
domain Cpp~GCC5; -- specify the language and specific dialect to process
pattern log_members( m: member_declarations ): statements = TAG;
-- declares a marker we can place on a subtree of struct member declarations
rule serialize_typedef_struct(s: statement, m: member_declarations, i: identifier):
statements->statements
= "typedef struct { \m } \i;" ->
"typedef struct { \m } \i;
void \make_derived_name\(serialize,\i) ( *\i argument, s: stream )
{ s << "logging" << \toString\(\i\);
\log_members\(\m\)
}"
if selected(i); -- make sure we want to serialize this one
rule generate_member_log_list(m: member_declarations, t: type_specification, n: identifier): statements -> statements
" \log_members\(\t \n; \m\)" -> " s << \n; \log_members\(\m\) ";
rule generate_member_log_base(t: type_specification, n: identifier): statements -> statements
" \log_members\(\t \n; \)" -> " s << \n; ";
ruleset generate_logging {
serialize_typedef_struct,
generate_member_log_list,
generate_member_log_base
}
The domain declaration tells DMS which specific language front-end to use. Yes, GCC5 as a dialect is different than VisualStudio2013, and DMS can handle either.
The pattern log_members is used as a kind of transformational pointer, to remember that there is some work to do. It wraps a sequence of struct member_declarations as an agenda (tag). What the rules do is first mark structs of interest with log_members to establish the need to generate the logging code, and then generate the member logging actions. The log_members pattern acts as a list; it is processed one element at a time until a final element is processed, and then the log_members tag vanishes, having served its purpose.
The rule serialize_typedef_struct is essentially used to scan the code looking for suitable structs to serialize. When it finds a typedef for a struct, it checks that struct is one that OP wants serialized (otherwise one can just leave off the if conditional). The meta-function selected is custom-coded (not shown here) to recognize the names of structs of interest. When a suitable typedef statement is found, it is replaced by the typedef (thus preserving it), and by the shell of a serializing routine containing the agenda item log_members holding the entire list of members of the struct. (If the code declares structs in some other way, e.g., as a class, you will need additional rules to recognize the syntax of those cases). Processing the agenda item by rewriting it repeatedly produces the log actions for the individual members.
The rules are written in DMS rule syntax; the C++ patterns are written inside metaquotes " ... " to enable DMS to distinguish rule syntax from C++ syntax. Placeholder variables v are declared in the rule header according to their syntactic categories, and show up in the meta-quoted patterns using an escape notation \v. [Note the unescaped i in the selected function call: it isn't inside metaquotes]. Meta-function and pattern references inside the metaquotes are similarly escaped, hence the initially odd-looking \log_members\( ... \), including the escaped pattern name and escaped meta-parentheses.
The two rules generate_member_log_xxx handle the general and final cases of log generation. The general case handles one member with more members to do; the final case handles the last member. (A slight variant would be to process an empty members list by rewriting to the trivial null statement ;). This is essentially walking down a list until you fall off the end. We "cheat" and write rather simple logging code, counting on overloading of stream writes to handle the different datatypes that OP claims he has. If he has more complex types requiring special treatment (e.g., pointer to...) he may want to write specialized rules that recognize those cases and produce different code.
The ruleset generate_logging packages these rules up into a neat bundle. You can trivially ask DMS to run this ruleset over entire files, applying rules until no rules can be further applied. The serialize_typedef_struct rule finds the structs of interest, generating the serializing function shell and the log_members agenda item, which is repeatedly rewritten to produce the serialization of the members.
This is the basic idea. I haven't tested this code, and there are usually some surprising syntax variations you end up having to handle, which means writing a few more rules along the same lines.
But once implemented, you can run this rule over the code to get serialized results. (One might implement selected to reject named structs that already have a serialization routine, or alternatively, add rules that replace any existing serialization code with newly generated code, ensuring that the serialization procedures always match the struct definition). There's the obvious extension to generating a serialized struct reader.
You can arguably implement these same ideas with Clang and/or the Rose Compiler. However, those systems do not offer you source-to-source rewrite rules, so you have to write procedural code to climb up and down trees, inspect individual nodes, etc. It is IMHO a lot more work and a lot less readable.
And when you run into your next "C++ doesn't reflect that", you can tackle the problem with the same tool :-}
Since C++ does not have reflection there is no way for you to dynamically inspect the members of an object at runtime. Thus it follows that you need to write a specific serialization/streaming/logging function for each type.
If all the different types had members of the same name, then you could write a template function to handle them, but I assume that is not the case.
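For illustration, such a template function might look like the sketch below. The member names id and value and the types Sensor and Counter are hypothetical; the template compiles for any type that exposes members with those two names.

```cpp
#include <sstream>
#include <string>

// Works for any type that has members named `id` and `value`
// (hypothetical names chosen for this sketch).
template <typename T>
std::string serialize(const T& object) {
    std::ostringstream out;
    out << "id=" << object.id << " value=" << object.value;
    return out.str();
}

// Two unrelated types that happen to share member names.
struct Sensor  { int id; float value; };
struct Counter { int id; int value; };
```

A type lacking either member simply fails to compile when passed to serialize, which is why this only helps when the naming really is uniform.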
As C++ does not have reflection this is not that easy.
If you want to avoid a verbose solution you can use a variadic template.
E.g.
class MyStruct {
private:
    int a;
    float f;
public:
    void log()
    {
        log_fields(a, f);
    }
};
where log_fields() is the variadic template. It would need to be specialized for all the basic types found on those user defined types and also for a recursive case.
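One possible shape for log_fields is sketched below. It swaps the per-type specializations and recursive case for a C++17 fold expression and relies on the existing operator<< overloads, so it is a variation on the idea rather than the exact approach described. Returning a std::string instead of writing to a log is also an assumption made to keep the sketch self-contained.

```cpp
#include <sstream>
#include <string>

// Formats every field followed by a space, relying on operator<< overloads
// for the basic types (int, float, char arrays, unscoped enums, ...).
template <typename... Fields>
std::string log_fields(const Fields&... fields) {
    std::ostringstream out;
    ((out << fields << ' '), ...);   // C++17 fold over the comma operator
    return out.str();
}

class MyStruct {
public:
    MyStruct(int a, float f) : a(a), f(f) {}

    // Each class only has to list its own fields once.
    std::string log() const { return log_fields(a, f); }

private:
    int a;
    float f;
};
```

Each class still needs its one-line log() listing its fields, so this reduces the boilerplate per class but does not remove it.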
As an example, a string that contains only a valid email address, as defined by some regex.
If a field of this type were part of a more complex data structure, or were used as a function parameter or in any other context, the client code would be able to assume the field is a string containing a valid email address. Thus, no checks like "valid?" should ever be necessary, so the approach of domaintypes would not work.
In Haskell this could be accomplished by a smart constructor (section 1.2) and in Java by ensuring the type is immutable (all setters private) and by adding a check in the constructor that throws a RuntimeException if the string used to create the type doesn't contain a valid email address.
If this is impossible in plain Clojure, I would like to see an example implementation in some well known extensions of the language, like Typed Clojure.
OK, maybe I understand the question now; I did not formulate my thoughts very well in the comment. So I'll try to suggest an admissible solution to your question and then explain some of the ideas I tried to express in the comment.
1) There is a gen-class that generates compiled bytecode for a class and you can set constructor for the class there.
2) You can create a record with defrecord in some namespace that is private by convention in your project, then create another namespace with the public API and define your factory function there. The user of your public namespace will then be able to call only its public functions. (Of course, he can also call private ones, but only with some extra code.)
3) You can just define a function like make-email that will return a map.
So you didn't specify your data structure anywhere.
4) You can just document your code where you will warn people to use the factory function for construction.
But! In Java, if your code requires some interface, then it's the user's problem to give your code a valid implementation of that interface. So if you write even slightly general code in Java, you have already lost the valid-email-string property. This business with interfaces arises because Java is a statically typed language.
Clojure is, in general, dynamically typed, so the user should, in general, be able to pass an arbitrary data structure to an arbitrary function without any type problems at compile time, and it's his fault if he passes the wrong data. That makes, for example, this possible: you create a record and a factory (constructor) function, and you expect a record to be passed to your code. But the user can pass a map with the same keys as your record's field names and the code will work.
So, in general, if you want the user of your code to be responsible for passing the required type in a dynamically typed language, then it costs the user nothing to also be responsible for constructing it in the correct way that you provide.
Other solutions are: the user just writes tests. You can specify :pre and :post conditions in your API functions to check the structure. You can use Typed Clojure with the ideas I wrote above. And you can use some additional declarative libraries, like the one mentioned in the first comment by @Thumbnail.
P.S. I'm not a clojure professional, so I could easily miss some better solutions.
I am developing a C++ application used to simulate a real-world scenario. Based on this simulation our team is going to develop, test and evaluate different algorithms working within such a real-world scenario.
We need the possibility to define several scenarios (they might differ in a few parameters, but a future scenario might also require creating objects of new classes) and the possibility to maintain a set of algorithms (which is, again, a set of parameters but also the definition which classes are to be created). Parameters are passed to the classes in the constructor.
I am wondering which is the best way to manage all the scenario and algorithm configurations. It should be easily possible to have one developer work on one scenario with "his" algorithm and another developer working on another scenario with "his" different algorithm. Still, the parameter sets might be huge and should be "sharable" (if I defined a set of parameters for a certain algorithm in Scenario A, it should be possible to use the algorithm in Scenario B without copy&paste).
It seems like there are two main ways to accomplish my task:
Define a configuration file format that can handle my requirements. This format might be XML based or custom. As there is no C#-like reflection in C++, it seems like I have to update the config-file parser each time a new algorithm class is added to project (in order to convert a string like "MyClass" into a new instance of MyClass). I could create a name for every setup and pass this name as command line argument.
The pros are: no compilation required to change a parameter and re-run, I can easily store the whole config file with the simulation results
contra: seems like a lot of effort, and especially hard because I am using a lot of template classes that have to be instantiated with given template arguments. No IDE support for writing the file (at least without creating a whole XSD which I would have to update every time a parameter/class is added)
Wire everything up in C++ code. I am not completely sure how I would do this to separate all the different creation logic but still be able to reuse parameters across scenarios. I think I'd also try to give every setup a (string) name and use this name to select the setup via command line arg.
pro: type safety, IDE support, no parser needed
con: how can I easily store the setup with the results (maybe some serialization?)?, needs compilation after every parameter change
Now here are my questions:
- What is your opinion? Did I miss important pros/cons?
- Did I miss a third option?
- Is there a simple way to implement the config file approach that gives me enough flexibility?
- How would you organize all the factory code in the second approach? Are there any good C++ examples for something like this out there?
Thanks a lot!
There is a way to do this without templates or reflection.
First, you make sure that all the classes you want to create from the configuration file have a common base class. Let's call this MyBaseClass and assume that MyClass1, MyClass2 and MyClass3 all inherit from it.
Second, you implement a factory function for each of MyClass1, MyClass2 and MyClass3. The signatures of all these factory functions must be identical. An example factory function is as follows.
MyBaseClass * create_MyClass1(Configuration * cfg)
{
    // Retrieve config variables and pass as parameters
    // to the constructor
    int age = cfg->lookupInt("age");
    std::string address = cfg->lookupString("address");
    return new MyClass1(age, address);
}
Third, you register all the factory functions in a map.
typedef MyBaseClass* (*FactoryFunc)(Configuration *);
std::map<std::string, FactoryFunc> nameToFactoryFunc;
nameToFactoryFunc["MyClass1"] = &create_MyClass1;
nameToFactoryFunc["MyClass2"] = &create_MyClass2;
nameToFactoryFunc["MyClass3"] = &create_MyClass3;
Finally, you parse the configuration file and iterate over it to find all the entries that specify the name of a class. When you find such an entry, you look up its factory function in the nameToFactoryFunc table and invoke the function to create the corresponding object.
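Putting the steps together, a compact sketch might look like this. The Configuration stand-in, the describe method, registry and createByName are hypothetical names for illustration. It also returns std::unique_ptr instead of raw pointers, which is a deliberate variation on the snippets above.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the configuration object used above.
struct Configuration {
    std::map<std::string, int> ints;
    int lookupInt(const std::string& key) const { return ints.at(key); }
};

struct MyBaseClass {
    virtual ~MyBaseClass() = default;
    virtual std::string describe() const = 0;
};

struct MyClass1 : MyBaseClass {
    explicit MyClass1(int age) : age(age) {}
    std::string describe() const override {
        return "MyClass1 age=" + std::to_string(age);
    }
    int age;
};

struct MyClass2 : MyBaseClass {
    std::string describe() const override { return "MyClass2"; }
};

// All factory functions share one signature.
using FactoryFunc = std::unique_ptr<MyBaseClass> (*)(const Configuration&);

std::unique_ptr<MyBaseClass> create_MyClass1(const Configuration& cfg) {
    return std::make_unique<MyClass1>(cfg.lookupInt("age"));
}

std::unique_ptr<MyBaseClass> create_MyClass2(const Configuration&) {
    return std::make_unique<MyClass2>();
}

// Name-to-factory table; adding a class means adding one line here.
const std::map<std::string, FactoryFunc>& registry() {
    static const std::map<std::string, FactoryFunc> table = {
        {"MyClass1", &create_MyClass1},
        {"MyClass2", &create_MyClass2},
    };
    return table;
}

// Look up the factory for a class name read from the config file.
std::unique_ptr<MyBaseClass> createByName(const std::string& name,
                                          const Configuration& cfg) {
    auto it = registry().find(name);
    if (it == registry().end())
        throw std::invalid_argument("unknown class: " + name);
    return it->second(cfg);
}
```

The config-file parser then only needs to call createByName for each entry that names a class; unknown names fail with an exception instead of silently producing nothing.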
If you don't use XML, it's possible that boost::spirit could short-circuit at least some of the problems you are facing. Here's a simple example of how config data could be parsed directly into a class instance.
I found this website with a nice template supporting factory which I think will be used in my code.