Can anyone explain [list source file] to me? - list

I have been using the Tcl language for 2 months. I have a question: what does [list source file] mean? I understand source and list separately, but I do not understand what they mean when put together.

That would appear to be using list to do command-script construction. I'd bet that the result of that [list source file] is then used with uplevel or namespace eval. Or possibly even interp eval.
The list command makes lists. It also makes substitution-free commands, so that:
eval [list $a $b]
is effectively identical in behaviour to:
$a $b
In your case, we have source instead of $a and file (which I'd lay strong odds on not being that literal) instead of $b. Why would we do this? Well, it ensures that if the file name contains Tcl meta-characters (e.g., {), the constructed script that sources the file will evaluate without any problems.
Why wouldn't you just write source file directly? Well, the most likely case is that you want to source the file into a context other than the current one; the source command reads the file into a string and then effectively does an immediate eval on that string (well, there are some nuances, but it's surprisingly close to that). In particular:
proc foo {} {
    source bar.tcl
}
This will run the contents of bar.tcl inside the procedure body of foo, just as if you'd typed the text in there directly. The variables will be local variables (unless you use global or something like that) and so on. Frankly, most Tcl scripts aren't written to tolerate that sort of treatment; to handle this, and make the code evaluate in a defined context, you'd actually write:
proc foo {} {
    # Quoted to defeat the Stack Overflow syntax highlighter only!
    uplevel "#0" [list source bar.tcl]
}

Related

Squeak (Smalltalk): search regex in string

I'm trying to write a method 'compile' that gets a string and a collection and checks whether the string matches the conditions.
The method signature is:
compile: stringCode where: argTypeCollection
Example of use (assuming C is the class):
C compile:
'first: i second: any third: n
| local |
local := i + n.
^(local*local)'
where: #(Integer nil Number).
The first thing the method should do is analyze the string by checking whether the number of arguments is correct; I thought of doing so with a regex.
I tried to look for explanations of regex use here and here, but the only examples are for files, and I didn't manage to scan the string and count matches of [a-zA-z][a-zA-Z0-9]*: the same way.
Any example of using a regex on a string in Squeak would help.
When analyzing Smalltalk source code, the best option is to use the very same objects the Smalltalk compiler employs for parsing, compiling and evaluating methods and code snippets. In other words, with the full range of compiling tools at your disposal, it makes little sense to use regexes for these kinds of tasks.
For instance, you can analyze the header of your method (i.e., the part of the source code defining the selector and formal arguments) using the Parser, like this:
Parser new parse: aString class: aClass
where aString is the method's source code and aClass is the target class, i.e., the class for which the method would make sense.
In your example the class is C. Note, however, that when the source code contains no references to instance variables (or, for that matter, class or shared variables), the aClass argument becomes irrelevant and can be replaced with Object.
The result of the parse:class: message, if the parsing succeeds, will be an abstract syntax tree (AST) whose nodes carry further information useful for analysis. If the parsing fails, you will get access to the parsing error object, which will let you determine why the code does not conform to the Smalltalk syntax. As you can see, you will have everything you need to reflect on the source code under analysis.

How can I print whatever I see in Yacc/Bison?

I have a complicated Yacc file with a bunch of rules, some of them complicated, for example:
start: program
program: extern_list class
class: T_CLASS T_ID T_LCB field_dec_list method_dec_list T_RCB
The exact rules and the actions I take on them are not important, because what I want to do seems fairly simple: just print out the program as it appears in the source file, using the rules I define for other purposes. But I'm surprised at how difficult doing so is.
First I tried adding printf("%s%s", $1, $2) to the second rule above. This produced "��#P�#". From what I understand, the parsed text is also available in a variable, yytext. I added printf("%s", yytext) to every rule in the file and added extern char* yytext; to the top of the file. This produced (null){void)1133331122222210101010--552222202020202222;;;;||||&&&&;;;;;;;;;;}}}}}}}} from a file that is valid according to the language's syntax. Finally, I changed extern char* yytext; to extern char yytext[], thinking it would not make a difference. The difference in output it made is best shown as a screenshot.
I am using Bison 3.0.2 on Xubuntu 14.04.
If you just want to echo the source to some output while parsing it, it is easiest to do that in the lexer. You don't say what you are using for a lexer, but you mention yytext, which is used by lex/flex, so I will assume that.
When you use flex to recognize tokens, the variable yytext refers to the internal buffer flex uses to recognize tokens. Within the action of a token, it can be used to get the text of the token, but only temporarily -- once the action completes and the next token is read, it will no longer be valid.
So if you have a flex rule like:
[a-zA-Z_][a-zA-Z_0-9]* { yylval.str = yytext; return T_ID; }
that likely won't work at all, as you'll have dangling pointers running around in your program -- probably the source of the random-looking output you're seeing. Instead you need to make a copy. If you also want to output the input unchanged, you can do that here too:
[a-zA-Z_][a-zA-Z_0-9]* { yylval.str = strdup(yytext); ECHO; return T_ID; }
This uses the flex macro ECHO, which is roughly equivalent to fputs(yytext, yyout) -- copying the input to a FILE * called yyout (which defaults to stdout).
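To put those pieces together, here is a minimal sketch of a lexer that copies the input through while handing tokens to the parser. The header name parser.tab.h and the str field of yylval are assumptions for illustration, not something taken from your question:
%{
#include <stdlib.h>
#include <string.h>
#include "parser.tab.h"  /* hypothetical name of the bison-generated header */
%}
%option noyywrap
%%
[a-zA-Z_][a-zA-Z_0-9]*  { yylval.str = strdup(yytext); ECHO; return T_ID; /* copy through, hand a duplicate to the parser */ }
[ \t\n]+                { ECHO; /* copy whitespace through unchanged */ }
.                       { ECHO; return yytext[0]; /* single-character tokens */ }
%%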
If the first symbol in the corresponding right-hand side is a terminal, $1 in a bison action means "the value of yylval produced by the scanner when it returned the token corresponding to that terminal". If the symbol is a non-terminal, then it refers to the value assigned to $$ during the evaluation of the action which reduced that non-terminal. If there was no such action, then the default $$ = $1 will have been performed, so it will pass through the semantic value of the first symbol in the reduction of that non-terminal.
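As a small hedged sketch of that flow (this is not your grammar; the %union, token and rule names are made up for illustration):
%{
#include <stdio.h>
#include <stdlib.h>
%}
%union { char *str; }
%token <str> T_ID
%type  <str> name
%%
decl: name ';'             { printf("declared: %s\n", $1); free($1); }
    ;
name: T_ID                 /* no action: bison supplies the default $$ = $1 */
    ;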
I apologize if all that was obvious, but your snippet is not sufficient to show:
what the semantic types are for each non-terminal;
what the semantic types are for each terminal;
what values, if any, are assigned to yylval in the scanner actions;
what values, if any, are assigned to $$ in the bison actions.
If any of those semantic types are not, in fact, character strings, then the printf will obviously produce garbage. (gcc might be able to warn you about this, if you compile the generated code with -Wall. Despite the possibility of spurious warnings if you are using old versions of flex/bison, I think it is always worthwhile compiling with -Wall and carefully reading the resulting warnings.)
Using yytext in a bison action is problematic, since it will refer to the text of the last token scanned, typically the look-ahead token. In particular, at the end of the input, yytext will be NULL, and that is what you will pick up in any reductions which occur at the end of input. glibc's printf implementation is nice enough to print (null) instead of segfaulting when you provide (char*)0 for an argument formatted as %s, but I don't think it's a great idea to depend on that.
Finally, if you do have a char* semantic value, and you assign yylval = yytext (or yylval.sval = yytext; if you are using unions), then you will run into another problem, which is that yytext points into a temporary buffer owned by the scanner, and that buffer may have completely different contents by the time you get around to using the address. So you always need to make a copy of yytext if you want to pass it through to the parser.
If what you really want to do is see what the parser is doing, I suggest you enable bison's yydebug parser-trace feature. It will give you a lot of useful information, without requiring you to insert printf's into your bison actions at all.
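Enabling the trace takes two small steps; something along these lines works with bison 3.x (this sketch assumes main lives in the grammar file's epilogue):
/* in the .y file, ask bison to generate the trace code: */
%define parse.trace

/* then turn it on at runtime before calling the parser: */
int main(void)
{
    extern int yydebug;
    yydebug = 1;   /* print every shift, reduction and lookahead to stderr */
    return yyparse();
}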

Highlight arguments in function body in vim

A little something that could be borrowed from IDEs. So the idea would be to highlight function arguments (and maybe scoped variable names) inside function bodies. This is the default behaviour for some C:
Well, if I were to place the cursor inside func, I would like to see the arguments foo and bar highlighted, to follow the algorithm logic better. Notice that the similarly named foo in func2 wouldn't get highlighted. This luxury could be omitted, though...
Using locally scoped variables, I would also like to have locally initialized variables highlighted:
Finally to redemonstrate the luxury:
This is not so trivial to write. I used C to give a general idea; really, I could use this better for Scheme/Clojure programming:
This should recognize let, loop, for, doseq bindings, for instance.
My vimscript-fu isn't that strong; I suspect we would need to:
1. Parse (non-regexply?) the arguments from the function definition under the cursor. This would be language-specific, of course. My priority would be Clojure.
2. Define a syntax region to cover the given function/scope only.
3. Give the required syntax matches.
As a function, this could be mapped to a key (if very resource-intensive) or run on CursorMoved if not too slow.
Okay, now. Has anyone written/found something like this? Do the vimscript gurus have an idea on how to actually start writing such a script?
Sorry about the slight off-topicness and bad formatting. Feel free to edit/format. Or vote to close.
This is much harder than it sounds, and borderline-impossible with the vimscript API as it stands, because you don't just need to parse the file; if you want it to work well, you need to parse the file incrementally. That's why regular syntax files are limited to what you can do with regexes - when you change a few characters, vim can figure out what's changed in the syntax highlighting, without redoing the whole file.
The vim syntax highlighter is limited to dealing with regexes, but if you're hellbent on doing this, you can roll your own parser in vimscript, and have it generate a buffer-local syntax that refers to tokens in the file by line and column, using the \%l and \%c atoms in a regex. This would have to be rerun after every change. Unfortunately there's no autocmd for "file changed", but there is the CursorHold autocmd, which runs when you've been idle for a configurable duration.
One possible solution can be found here. It's not the best way, because it highlights every occurrence in the whole file and you have to give the command every time (the second problem can probably be avoided; I don't know about the first). Give it a look, though.

How to extract function names from the main() function of C source

I just want to ask your ideas regarding this matter. For a certain important reason, I must extract/acquire all the names of functions called inside the main() function of a C source file (e.g., main.c).
Example source code:
int main()
{
int a = functionA(); // functionA must be extracted
int b = functionB(); // functionB must be extracted
}
As you know, the only thing I can use as a marker to identify these function calls is their parentheses "()". I've already considered several factors in implementing this function-name extraction. These are:
1. functions may have parameters. Ex: functionA(100)
2. Loop operators. Ex: while()
3. Other operators. Ex: if(), else if()
4. Other operators between function calls, with no spaces. Ex: functionA()+functionB()
As of this moment, I know what you're saying: this is a pain in the $$$... So please share your thoughts and ideas... and bear with me on this one...
Note: this is in C++...
You can write a small C++ parser by combining FLEX (or LEX) and BISON (or YACC).
1. Take C++'s grammar.
2. Generate a parser for C++ programs with the mentioned tools.
3. Make that program count the function calls you are mentioning.
Maybe a little bit too complicated for what you need to do, but it should certainly work. And LEX/YACC are amazing tools!
One option is to write your own C tokenizer (simple: just be careful enough to skip over strings, character constants and comments), and a simple parser on top of it which counts the number of open {s and finds instances of identifier + ( within. However, this won't be 100% correct. The disadvantage of this option is that it's cumbersome to handle preprocessor directives (e.g., #include and #define): a function (e.g., getchar) can be called from a macro defined in an #include file.
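To make that concrete, here is a rough, deliberately simplified sketch of such a tokenizer in C. It skips strings, character constants and comments, tracks brace depth, and prints identifiers directly followed by ( inside any function body (restricting it to main, handling whitespace before the parenthesis, and running the preprocessor first are all left out; the file name main.c is a placeholder):
#include <stdio.h>
#include <ctype.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("main.c", "r");   /* placeholder input file */
    int c, depth = 0;
    char ident[128];
    size_t len = 0;
    if (!f) return 1;
    while ((c = fgetc(f)) != EOF) {
        if (c == '"' || c == '\'') {  /* skip string or character literal */
            int quote = c;
            while ((c = fgetc(f)) != EOF && c != quote)
                if (c == '\\') fgetc(f);   /* skip the escaped character */
            len = 0;
            continue;
        }
        if (c == '/') {               /* skip // and C-style comments */
            int d = fgetc(f);
            if (d == '/') { while ((c = fgetc(f)) != EOF && c != '\n') ; }
            else if (d == '*') {
                int prev = 0;
                while ((c = fgetc(f)) != EOF && !(prev == '*' && c == '/'))
                    prev = c;
            } else ungetc(d, f);
            len = 0;
            continue;
        }
        if (isalnum(c) || c == '_') { /* accumulate a candidate identifier */
            if (len + 1 < sizeof ident) ident[len++] = (char)c;
            continue;
        }
        ident[len] = '\0';
        if (c == '{') depth++;
        else if (c == '}') depth--;
        else if (c == '(' && depth >= 1 && len > 0
                 && !isdigit((unsigned char)ident[0])
                 && strcmp(ident, "if") && strcmp(ident, "while")
                 && strcmp(ident, "for") && strcmp(ident, "switch")
                 && strcmp(ident, "return") && strcmp(ident, "sizeof"))
            printf("call: %s\n", ident);   /* identifier + ( inside a body */
        len = 0;
    }
    fclose(f);
    return 0;
}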
An option that works 100% of the time is compiling your .c file to an assembly file, e.g. gcc -S file.c, and finding the call instructions in file.s. A similar option is compiling your .c file to an object file, e.g. gcc -c file.c, generating a disassembly dump with objdump -d file.o, and searching for call instructions.
Another option is using an existing parser, such as Clang/LLVM.
GNU cflow might be helpful.

finding a function name and counting its LOC

So you know off the bat, this is a project I've been assigned. I'm not looking for an answer in code, but more a direction.
What I've been told to do is go through a file and count the actual lines of code, while at the same time recording the function names and the individual lines of code for each function. The problem I am having is determining, while reading the file, whether a line is the start of a function.
So far, I can only think of having a string array of data types (int, double, char, etc.), searching for those in the line, then searching for a parenthesis, and then for the absence of a semicolon (so I know it isn't just a declaration of the function).
So my question is, is this how I should go about this, or are there other methods in which you would recommend?
The code I will be counting will be in C++.
Three approaches come to mind.
Use regular expressions. This is fairly similar to what you're thinking of. Look for lines that look like function definitions. This is fairly quick to do, but can go wrong in many ways.
char *s = "int main() {";
is not a function definition, but sure looks like one.
char
* /* eh? */
s
(
int /* comment? // */ a
)
// hello, world /* of confusion
{
is a function definition, but doesn't look like one.
Good: quick to write, can work even in the face of syntax errors; bad: can easily misfire on things that look like (or fail to look like) the "normal" case.
Variant: First run the code through, e.g., GNU indent. This will take care of some (but not all) of the misfires.
Use a proper lexer and parser. This is a much more thorough approach, but you may be able to re-use an open-source lexer/parser (e.g., from gcc).
Good: Will be 100% accurate (will never misfire). Bad: One missing semicolon and it spews errors.
See if your compiler has some debug output that might help. This is a variant of (2), but using your compiler's lexer/parser instead of your own.
Your idea can work in 99% (or more) of the cases. Only a real C++ compiler can do 100%, in which case I'd compile with debug info (g++ -S -g prog.cpp) and get the function names and line numbers from the debug information in the assembly output (prog.s).
My thoughts for the 99% solution:
Ignore comments and strings.
Document that you ignore preprocessor directives (#include, #define, #if).
Anything between a top-level { and } is a function body, except after typedef, class, struct, union, namespace and enum.
If you have a class, struct or union, you should be looking for method bodies inside it.
The function name is sometimes tricky to find, e.g. in long (*f(int))(char);
Make sure your parser works with template functions and template classes.
For recording function names I use PCRE and the regex
"(?<=[\\s:~])(\\w+)\\s*\\([\\w\\s,<>\\[\\].=&':/*]*?\\)\\s*(const)?\\s*{"
and then filter out names like "if", "while", "do", "for", "switch". Note that the function name is (\w+), group 1.
Of course it's not a perfect solution but a good one.
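For reference, this is roughly how that regex can be driven from C with the classic PCRE API (a sketch: compile with -lpcre, the subject string stands in for a real source buffer, and the keyword filtering mentioned above is omitted):
#include <pcre.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *pattern =
        "(?<=[\\s:~])(\\w+)\\s*\\([\\w\\s,<>\\[\\].=&':/*]*?\\)\\s*(const)?\\s*{";
    const char *subject = "int add(int a, int b) { return a + b; }";
    const char *err;
    int erroff, start = 0, len = (int)strlen(subject);
    int ovec[9];   /* room for the whole match plus two capture groups */
    pcre *re = pcre_compile(pattern, 0, &err, &erroff, NULL);
    if (!re) { fprintf(stderr, "regex error at %d: %s\n", erroff, err); return 1; }
    while (pcre_exec(re, NULL, subject, len, start, 0, ovec, 9) >= 0) {
        /* group 1 (ovec[2]..ovec[3]) is the candidate function name */
        printf("name: %.*s\n", ovec[3] - ovec[2], subject + ovec[2]);
        start = ovec[1];   /* continue searching after this match */
    }
    pcre_free(re);
    return 0;
}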
I feel manually doing the parsing is going to be quite a difficult task. I would probably use an existing tool such as RSM, redirect the output to a CSV file (assuming you are on Windows), and then parse the CSV file to gather the required information.
Find a decent SLOC count program, e.g., SLOCCounter. Not only can you count SLOC, but you have something against which to compare your results. (Update: here's a long list of them.)
Interestingly, the number of non-comment semicolons in a C/C++ program is a decent SLOC count.
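A quick sketch of that heuristic in C, counting semicolons outside comments and string literals (simplified: preprocessor lines and the two semicolons in a for(;;) header are still counted):
#include <stdio.h>

int main(void)
{
    int c, prev = 0, count = 0;
    while ((c = getchar()) != EOF) {
        if (c == '"' || c == '\'') {          /* skip string or char literal */
            int quote = c;
            while ((c = getchar()) != EOF && c != quote)
                if (c == '\\') getchar();     /* skip the escaped character */
            prev = 0;
            continue;
        }
        if (prev == '/' && c == '/') {        /* skip // comment */
            while ((c = getchar()) != EOF && c != '\n') ;
            prev = 0;
            continue;
        }
        if (prev == '/' && c == '*') {        /* skip C-style comment */
            int p = 0;
            while ((c = getchar()) != EOF && !(p == '*' && c == '/'))
                p = c;
            prev = 0;
            continue;
        }
        if (c == ';') count++;
        prev = c;
    }
    printf("%d\n", count);
    return 0;
}
Run it as ./a.out < prog.c to get a rough count.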
How about writing a shell script to do this? An AWK program perhaps.