Related
I am aware of a similar question for C#. I downloaded and tried NArrange and UniversalIndentGUI but both do not sort functions of C++ code per name. Does anyone know a non-commercial tool that does this job?
Unless you're under orders to rearrange the code to conform to an arbitrary coding standard, my advice is do not do this. I've seen people do it before, and the results are not pretty. The file will look completely different after you're done, and you'll have effectively trashed all of the edit history in source control. Any diffs between this version and any version that came before it will look like a jumbled mess. In the long run, having a clear diff history is worth more to you and your team than some measure of code cleanliness.
How compatible is the WPS SAS-clone with the corresponding products from SAS Institute?
Has anyone tried it - if so: have you run into any compatibility issues?
WPS has a good comparison document available on their website. It lists areas where WPS and SAS are compatible and where they are not. Start in http://www.teamwpc.co.uk/products/wps/language
I have tried it, although briefly and three years ago. I have a former co-worker that uses it exclusively as an alternative to SAS. He manages to function mostly without problems, but does run into some syntax related oddities on some of the more obscure procedures or functions.
WPS does offer a free trial, so you've nothing to lose with that. Give it a shot?
I don't really have an answer, but have you tried SAS-L? I'm sure you'll find people there with strong opinions about this. Also there are a couple of threads available already, which might get you started.
My 2 cents:
I've tried WPS for some time and it does indeed make for a SAS working substitute as long as you keep yourself from using some advanced stuff from the base language.
A few examples from my own experience:
I wasn't able to use the CALL MODULE* family for calling win32 API functions.
When reading one row at a time through FETCH() you cannot use CALL SET() for automatically initializing all the variables found in the input dataset.
Some random errors using macros. Sorry I don't remember those in detail.
In a few words: if you have a working SAS installation, ask for a WPS trial and test if it fits your use. If it does, be sure it's a tremendous save in licensing.
This question already has answers here:
Closed 12 years ago.
Possible Duplicates:
References Needed for Implementing an Interpreter in C/C++
How to create a language these days?
Learning to write a compiler
I know some c++, VERY good at php, pro at css html, okay at javascript. So I was thinking of how was c++ created I mean how can computer understand what codes mean? How can it read... so is it possible I can create my own language and how?
If you're interested in compiler design ("how can computer understand what codes mean"), I highly recommend Dragon Book. I used it while in college and went as far as to create programming language myself.
"Every now and then I feel a temptation to design a programming
language but then I just lie down until it goes away." — L. Peter
Deutsch
EDIT (for those who crave context):
"[L. Peter Deutsch] also wrote the PDP-1 Lisp 1.5 implementation, Basic PDP-1 LISP, 'while still in short pants' between the age of 12-15 years old."
If you want to understand how the computer understands the code, you might want to learn some assembly language. It's a much lower-level language and will give you a better feel for the kinds of simple instructions that really get executed. You should also be able to get a feel for how one implements higher level constructs like loops with only conditional jumps.
For an even lower-level understanding, you'll need to study up on electronics. Digital logic shows you how you can take electronic "gates" and implement a generic CPU that can understand the machine code generated from the assembly language code.
For really-low level stuff, you can study material science which can teach you how to actually make the gates work at an atomic level.
You sound like a resourceful person. You'll want to hunt down books and/or websites on these topics tailored to your level of understanding and that focus on what you're interested in the most. A fairly complete understanding of all of this comes with a BS degree in computer science or computer engineering, but many things are quite understandable to a motivated person in your position.
Yes it's possible to create your own language. Take a look at compiler compilers. Or the source code to some scripting languages if you dare. Some useful tools are yacc and bison and lexx.
Others have mentioned the dragon book. We used a book that I think was called "compiler theory and practice" back in my university days.
It's not necessary to learn assembler to write a language. For example, Javascript runs in something called an interpreter which is an application that executes javascript files. In thise case, the interpreter is usually built into the browser.
The easiest starting program language might be to write a simple text based calculator. i.e. taking a text file, run through it and perform the calculations. You could write that in C++ very easily.
My first language for a college project was a language defined in BNF given to us. We then had to write a parser which parsed it into a tree structure in memory and then into something called 3 address code (which is assembler like). You could quite easily turn 3 address code into either real assembler or write an interpreter for that.
Yup! It's definitely possible. Others have mentioned the Dragon Book, but there is also plenty of information online. llvm, for example, has a tutorial on implementing a programming language: http://llvm.org/docs/tutorial/
I really recommend Programming Language Pragmatics. It's a great book that takes you all the way from what a language is through how compilers work and creating your own. It's a bit more accessible than the Dragon Book and explains how things work before jumping in headfirst.
It is possible. You should learn about compilers and/or interpreters: what they are for and how they are made.
Start learning ASM and reading up on how byte code works and you might have a chance :)
If you know C -- it sounds like you do -- grab a used copy of this ancient book:
http://www.amazon.com/Craft-Take-Charge-Programming-Book-Disk/dp/0078818826
In it there's a chapter where the author creates a "C" interpreter, in C. It's not academically serious like the Dragon book would be, but I remember it being pretty simple, very practical and easy to follow, and since you're just getting started, it would be an awesome introduction to the ideas of a "grammar" for languages, and "tokenizing" a program.
It would be a perfect place for you to start. Also, at $0.01 for a used copy, cheaper than the Dragon Book. ;)
Start with creating a parser. Read up on EBNF grammars. This will answer your question about how the computer can read code. This is a very advanced topic, so don't expect too much of yourself, but have fun. Some resources I've used for this are bison, flex, and PLY.
Yes! Getting interested in compilers was my hook into professional CS (previously I'd been on a route to EE, and only formally switched sides in college), it's a great way to learn a TON about a wide range of computer science topics. You're a bit younger (I was in high school when I started fooling around with parsers and interpreters), but there's a whole lot more information at your fingertips these days.
Start small: Design the tiniest language you can possibly think of -- start with nothing more than a simple math calculator that allows variable assignment and substitution. When you get adventurous, try adding "if" or loops. Forget arcane tools like lex and yacc, try writing a simple recursive descent parser by hand, maybe convert to simple bytecodes and write an interpreter for it (avoid all the hard parts of understanding assembly for a particular machine, register allocation, etc.). You'll learn a tremendous amount just with this project.
Like others, I recommend the Dragon book (1986 edition, I don't like the new one, frankly).
I'll add that for your other projects, I recommending using C or C++, ditch PHP, not because I'm a language bigot, but just because I think that working through the difficulties in C/C++ will teach you a lot more about underlying machine architecture and compiler issues.
(Note: if you were a professional, the advice would be NOT to create a new language. That's almost never the right solution. But as a project for learning and exploration, it's fantastic.)
If you want a really general (but very well written) introduction to this topic -- computing fundamentals -- I highly recommend a book titled Code by Charles Petzold. He explains a number of topics you are interest and from there you can decide what you want to create yourself.
Check out this book,
The Elements of Computing Systems: Building a Modern Computer from First Principles it takes you step by step through several aspects of designing a computer language, a compiler, a vm, the assembler, and the computer. I think this could help you answer some of your questions.
I got a task related to ANCIENT C++ project which hasn't any documentation, comments at all and all code/variables is written in foreign language. Do I have a chance to analyze this code in a 1 working day and make a design/UML to create new features? I have been sitting around for 3 hours already and I feel so frustrated... Maybe somebody also had same problem? Any advice?
BR,
I suspect the biggest issue may be the fact that it's in a foreign language. You can use various static code analysis tools to try and understand what's going on, but if everything is presented in an unfamiliar language then that's still no use. Your first step (I believe) is to find someone who can speak this language and get them to translate as you go...
1) Use Doxygen , You can configure doxygen to extract the code structure from undocumented source files.
2) Use source Insight, Source Insight is an advanced code editor and browser with built-in analysis for C/C++, C#, and Java programs
Short answer, no - you probably don't have a chance to understand the code in one day. Reading/maintaining code is one of the hardest things to do, especially when it's lacking documentation. The fact that the code is in a foreign language (!) makes it even harder.
Sounds like you are on a very restricted (unrealistic) time-budget, but Working With Legacy Software is a good book if you're working with legacy systems. If you are planning to keep adding new features to the legacy system it's your responsibility to make your management aware of the scope of the operation. Or at least try.
Under this time constraint (1 day) it may or may not be doable depending on the size of the project - if its a few hundred lines of code then for sure. If its a serious project with several tens of thousands code lines, then likely no.
The first thing you need to know is what is this program supposed to do at all. If you have no idea what it does and how it does it, then analyzing the code will give you the answer but it will be a long and frustrating task. So my first suggestion would be to get yourself familiar with the outer workings of the software - what does it supposed to do and generally how it is supposed to do it. If you are doing it as part as your work then you should be able to get someone to walk you through using the program - even if its UI is in a foreign language (which I hope it doesn't, even if the code is written by a foreign language speaker).
Once you know what the software is attempting to do, then it should be fairly straight forward (even if lengthy and daunting) to rewrite all the comments in your own language for you to understand. I suggest doing so in a bottoms-up approach: its easier to understand the small and trivial things a program does, then to understand the top-level logic - and a lot of trivial things in order make up the logic of the software.
Only once you understand - to a large degree, anyway - the inner workings of the program you may write its functional spec and work on features.
Non-free way on Windows:
You can use CppDepend. This application is able to parse your visual project or your source files. It gives you a lot of information like dependency trees. You can try the trial (Maybe it will be enough for what you have to do).
Free way multi-platform:
You can use doxygen with a special configuration (extract code structure from undocumented code) and analyze the result.
I was quite happy with a tool called Understand (15-day eval license available) for this kind of task. However, I agree with Guss that the time you'll need depends a lot on the size of the code, and one day is probably just enough for a small program.
cscope & ctags are a must when I do my own code, and even more when looking to other's code.
You may also try this ::
http://www.sgvsarc.com/product_crystalflow.htm
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I've recently come to maintain a large amount of scientific calculation-intensive FORTRAN code. I'm having difficulties getting a handle on all of the, say, nuances, of a forty year old language, despite google & two introductory level books. The code is rife with "performance enhancing improvements". Does anyone have any guides or practical advice for de-optimizing FORTRAN into CS 101 levels? Does anyone have knowledge of how FORTRAN code optimization operated? Are there any typical FORTRAN 'gotchas' that might not occur to a Java/C++/.NET raised developer taking over a FORTRAN 77/90 codebase?
You kind of have to get a "feel" for what programmers had to do back in the day. The vast majority of the code I work with is older than I am and ran on machines that were "new" when my parents were in high school.
Common FORTRAN-isms I deal with, that hurt readability are:
Common blocks
Implicit variables
Two or three DO loops with shared CONTINUE statements
GOTO's in place of DO loops
Arithmetic IF statements
Computed GOTO's
Equivalence REAL/INTEGER/other in some common block
Strategies for solving these involve:
Get Spag / plusFORT, worth the money, it solves a lot of them automatically and Bug Free(tm)
Move to Fortran 90 if at all possible, if not move to free-format Fortran 77
Add IMPLICIT NONE to each subroutine and then fix every compile error, time consuming but ultimately necessary, some programs can do this for you automatically (or you can script it)
Moving all COMMON blocks to MODULEs, low hanging fruit, worth it
Convert arithmetic IF statements to IF..ELSEIF..ELSE blocks
Convert computed GOTOs to SELECT CASE blocks
Convert all DO loops to the newer F90 syntax
myloop: do ii = 1, nloops
! do something
enddo myloop
Convert equivalenced common block members to either ALLOCATABLE memory allocated in a module, or to their true character routines if it is Hollerith being stored in a REAL
If you had more specific questions as to how to accomplish some readability tasks, I can give advice. I have a code base of a few hundred thousand lines of Fortran which was written over the span of 40 years that I am in some way responsible for, so I've probably run across any "problems" you may have found.
Legacy Fortran Soapbox
I helped maintain/improve a legacy Fortran code base for quite a while and for the most part think sixlettervariables is on the money. That advice though, tends to the technical; a tougher row to hoe is in implementing "good practices".
Establish a required coding style and coding guidelines.
Require a code review (of more than just the coder!) for anything submitted to the code base. (Version control should be tied to this process.)
Start building and running unit tests; ditto benchmark or regression tests.
These might sound like obvious things these days, but at the risk of over-generalizing, I claim that most Fortran code shops have an entrenched culture, some started before the term "software engineering" even existed, and that over time what comes to dominate is "Get it done now". (This is not unique to Fortran shops by any means.)
Embracing Gotchas
But what to do with an already existing, grotty old legacy code base? I agree with Joel Spolsky on rewriting, don't. However, in my opinion sixlettervariables does point to the allowable exception: Use software tools to transition to better Fortran constructs. A lot can be caught/corrected by code analyzers (FORCHECK) and code rewriters (plusFORT). If you have to do it by hand, make sure you have a pressing reason. (I wish I had on hand a reference to the number of software bugs that came from fixing software bugs, it is humbling. I think some such statistic is in Expert C Programming.)
Probably the best offense in winning the game of Fortran gotchas is having the best defense: Knowing the language fairly well. To further that end, I recommend ... books!
Fortran Dead Tree Library
I have had only modest success as a "QA nag" over the years, but I have found that education does work, some times inadvertently, and that one of the most influential things is a reference book that someone has on hand. I love and highly recommend
Fortran 90/95 for Scientists and Engineers, by Stephen J. Chapman
The book is even good with Fortran 77 in that it specifically identifies the constructs that shouldn't be used and gives the better alternatives. However, it is actually a textbook and can run out of steam when you really want to know the nitty-gritty of Fortran 95, which is why I recommend
Fortran 90/95 Explained, by Michael Metcalf & John K. Reid
as your go-to reference (sic) for Fortran 95. Be warned that it is not the most lucid writing, but the veil will lift when you really want to get the most out of a new Fortran 95 feature.
For focusing on the issues of going from Fortran 77 to Fortran 90, I enjoyed
Migrating to Fortran 90, by Jim Kerrigan
but the book is now out-of-print. (I just don't understand O'Reilly's use of Safari, why isn't every one of their out-of-print books available?)
Lastly, as to the heir to the wonderful, wonderful classic, Software Tools, I nominate
Classical FORTRAN, by Michael Kupferschmid
This book not only shows what one can do with "only" Fortran 77, but it also talks about some of the more subtle issues that arise (e.g., should or should not one use the EXTERNAL declaration). This book doesn't exactly cover the same space as "Software Tools" but they are two of the three Fortran programming books that I would tag as "fun".... (here's the third).
Miscellaneous Advice that applies to almost every Fortran compiler
There is a compiler option to enforce IMPLICIT NONE behavior, which you can use to identify problem routines without modifying them with the IMPLICIT NONE declaration first. This piece of advice won't seem meaningful until after the first time a build bombs because of an IMPLICIT NONE command inserted into a legacy routine. (What? Your code review didn't catch this? ;-)
There is a compiler option for array bounds checking, which can be useful when debugging Fortran 77 code.
Fortran 90 compilers should be able to compile almost all Fortran 77 code and even older Fortran code. Turn on the reporting options on your Fortran 90 compiler, run your legacy code through it and you will have a decent start on syntax checking. Some commercial Fortran 77 compilers are actually Fortran 90 compilers that are running in Fortran 77 mode, so this might be relatively trivial option twiddling for whatever build scripts you have.
There's something in the original question that I would caution about. You say the code is rife with "performance enhancing improvements". Since Fortran problems are generally of a scientific and mathematical nature, do not assume these performance tricks are there to improve the compilation. It's probably not about the language. In Fortran, the solution is seldom about efficiency of the code itself but of the underlying mathematics to solve the end problem. The tricks may make the compilation slower, may even make the logic appear messy, but the intent is to make the solution faster. Unless you know exactly what it is doing and why, leave it alone.
Even simple refactoring, like changing dumb looking variable names can be a big pitfall. Historically standard mathematical equations in a given field of science will have used a particular shorthand since the days of Maxwell. So to see an array named B(:) in electromagnetics tells all Emag engineers exactly what is being solved for. Change that at your peril. Moral, get to know the standard nomenclature of the science before renaming too.
As someone with experience in both FORTRAN (77 flavor although it has been a while since I used it seriously) and C/C++ the item to watch out for that immediately jumps to mind are arrays. FORTRAN arrays start with an index of 1 instead of 0 as they do in C/C++/Java. Also, memory arrangement is reversed. So incrementing the first index gives you sequential memory locations.
My wife still uses FORTRAN regularly and has some C++ code she needs to work with now that I'm about to start helping her with. As issues come up during her conversion I'll try to point them out. Maybe they will help.
I have used Fortran starting with the '66 version since 1967 (on an IBM 7090 with 32k words of memory). I then used PL/1 for some time, but later went back to Fortran 95 because it is ideally suited for the matrix/complex-number problems we have. I would like to add to the considerations that much of the convoluted structure of old codes is simply due to the small amount of memory available, forcing such thing like reusing a few lines of code via computed or assigned GOTOs. Another problem is optimization by defining auxiliary variables for every repeated subexpression - compilers simply did not optimize for that. In addition, it was not allowed to write DO i=1,n+1; you had to write n1=n+1; DO i=1,n1. In consequence old codes are overwhelmed with superfluous variables. When I rewrote a code in Fortran 95, only 10% of the variables survived. If you want to make the code more legible, I highly recommend looking for variables that can easily be eliminated.
Another thing I might mention is that for many years complex arithmetic and multidimensional arrays were highly inefficient. That is why you often find code rewritten to do complex calculations using only real variables, and matrices addressed with a single linear index.
Well, in one sense, you're lucky, 'cause Fortran doesn't have much in the way of subtle flow-of-control constructs or inheritance or the like. On the other, it's got some truly amazing gotchas, like the arithmetically-calculated branch-to-numeric-label stuff, the implicitly-typed variables which don't require declaration, the lack of true keywords.
I don't know about the "performance enhancing improvements". I'd guess most of them are probably ineffective, as a couple of decades of compiler technology have made most hinting unnecessary. Unfortunately, you'll probably have to leave things the way they are, unless you're planning to do a massive rewrite.
Anyway, the core scientific calculation code should be fairly readable. Any programming language using infix arithmetic would be good preparation for reading Fortran's arithmetic and assignment code.
Could you explain what you have to do in maintaining the code? Do you really have to modify the code? If you can get away by modifying just the interface to that code instead of the code itself, that would be the best.
The inherent problem when dealing with a large scientific code (not just FORTRAN) is that the underlying mathematics and the implementation are both complex. Almost by default, the implementation has to include code optimization, in order to run within reasonable time frame. This is compounded by the fact that a lot of code in this field is created by scientists / engineers that are expert in their field, but not in software development. Let's just say that "easy to understand" is not the first priority to them (I was one of them, still learning to be a better software developer).
Due to the nature of the problem, I don't think a general question and answer is enough to be helpful. I suggest you post a series of specific questions with code snippet attached. Perhaps starting with the one that gives you the most headache?
I loved FORTRAN, I used to teach and code in it. Just wanted to throw that in. Haven't touched it in years.
I started out in COBOL, when I moved to FORTRAN I felt I was freed. Everything is relative, yeah?
I'd second what has been said above - recognise that this is a PROCEDURAL language - no subtelties - so take it as you see it.
Probably frustrate you to start with.
I started on Fortran IV (WATFIV) on punch cards, and my early working years were VS FORTRAN v1 (IBM, Fortran 77 level). Lots of good advice in this thread.
I would add that you have to distinguish between things done to get the beast to run at all, versus things that "optimize" the code, versus things that are more readable and maintainable. I can remember dealing with VAX overlays in trying to get DOE simulation code to run on IBM with virtual memory (they had to be removed and the whole thing turned into one address space).
I would certainly start by carefully restructuring FORTRAN IV control structures to at least FORTRAN 77 level, with proper indentation and commenting. Try to get rid of primitive control structures like ASSIGN and COMPUTED GOTO and arithmetic IF, and of course, as many GOTOs as you can (using IF-THEN-ELSE-ENDIF). Definitely use IMPLICIT NONE in every routine, to force you to properly declare all variables (you wouldn't believe how many bugs I caught in other people's code -- typos in variable names). Watch out for "premature optimizations" that you're better off letting the compiler handle by itself.
If this code is to continue to live and be maintainable, you owe it to yourself and your successors to make it readable and understandable. Just be certain of what you are doing as you change the code! FORTRAN has lots of peculiar constructs that can easily trip up someone coming from the C side of the programming world. Remember than FORTRAN dates back to the mid-late '50s, when there was no such thing as a science of language and compiler design, just ad hoc hacking together of something (sorry, Dr. B!).
Here's another one that has bit me from time to time. When you are working on FORTRAN code make sure you skip all six initial columns. Every once and a while, I'll only get the code indented five spaces and nothing works. At first glance everything seems okay and then I finally realize that all the lines are starting in column 6 instead of column 7.
For anyone not familiar with FORTRAN, the first 5 columns are for line numbers (=labels), the 6th column is for a continuation character in case you have a line longer than 80 characters (just put something here and the compiler knows that this line is actually part of the one before it) and code always starts in column 7.