Which algorithm is best for Text Summarization? - data-mining

Of Cosine, Dice, and Jaccard, which algorithm is best for text summarization?

None of them is an algorithm for text summarization at all.
They are similarity measures that can be applied to two texts.
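For illustration, here is a minimal sketch (plain Python, with made-up example sentences) of how these three measures compare two texts when each text is treated as a bag of tokens; no summarization is involved:
from collections import Counter
import math

a = "the cat sat on the mat".split()
b = "the cat lay on the rug".split()

set_a, set_b = set(a), set(b)
jaccard = len(set_a & set_b) / len(set_a | set_b)          # shared tokens over all tokens
dice = 2 * len(set_a & set_b) / (len(set_a) + len(set_b))  # weighted overlap of the two sets

ca, cb = Counter(a), Counter(b)
dot = sum(ca[t] * cb[t] for t in set_a & set_b)
norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
cosine = dot / norm                                        # angle between term-frequency vectors

print(jaccard, dice, cosine)  # 0.428..., 0.6, 0.75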

Extractive summarization
Extractive summarization means identifying important sections of the text and reproducing them verbatim, producing a subset of the sentences from the original text. Abstractive summarization, by contrast, reproduces the important material in a new way: after interpreting and examining the text, it uses advanced natural language techniques to generate a new, shorter text that conveys the most critical information from the original.
This is where the model identifies the important sentences and phrases from the original text and only outputs those.
Abstractive Summarization
Abstractive summarization is more advanced and closer to a human-like interpretation. Although it has more potential (and is generally more interesting for researchers and developers), so far the more traditional methods have proven to yield better results.
The model produces a completely different text that is shorter than the original; it generates new sentences in a new form, much as a human would. In this tutorial, we will use transformers for this approach.
reference_text = """Artificial intelligence (AI, also machine intelligence, MI) is intelligence demonstrated by machines, in contrast to the natural intelligence (NI) displayed by humans and other animals. In computer science AI research is defined as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals. Colloquially, the term "artificial intelligence" is applied when a machine mimics "cognitive" functions that humans associate with other human minds, such as "learning" and "problem solving". See glossary of artificial intelligence. The scope of AI is disputed: as machines become increasingly capable, tasks considered as requiring "intelligence" are often removed from the definition, a phenomenon known as the AI effect, leading to the quip "AI is whatever hasn't been done yet." For instance, optical character recognition is frequently excluded from "artificial intelligence", having become a routine technology. Capabilities generally classified as AI as of 2017 include successfully understanding human speech, competing at a high level in strategic game systems (such as chess and Go), autonomous cars, intelligent routing in content delivery networks, military simulations, and interpreting complex data, including images and videos. Artificial intelligence was founded as an academic discipline in 1956, and in the years since has experienced several waves of optimism, followed by disappointment and the loss of funding (known as an "AI winter"), followed by new approaches, success and renewed funding. For most of its history, AI research has been divided into subfields that often fail to communicate with each other. These sub-fields are based on technical considerations, such as particular goals (e.g. "robotics" or "machine learning"), the use of particular tools ("logic" or "neural networks"), or deep philosophical differences. Subfields have also been based on social factors (particular institutions or the work of particular researchers). The traditional problems (or goals) of AI research include reasoning, knowledge, planning, learning, natural language processing, perception and the ability to move and manipulate objects. General intelligence is among the field's long-term goals. Approaches include statistical methods, computational intelligence, and traditional symbolic AI. Many tools are used in AI, including versions of search and mathematical optimization, neural networks and methods based on statistics, probability and economics. The AI field draws upon computer science, mathematics, psychology, linguistics, philosophy and many others. The field was founded on the claim that human intelligence "can be so precisely described that a machine can be made to simulate it". This raises philosophical arguments about the nature of the mind and the ethics of creating artificial beings endowed with human-like intelligence, issues which have been explored by myth, fiction and philosophy since antiquity. Some people also consider AI to be a danger to humanity if it progresses unabatedly. Others believe that AI, unlike previous technological revolutions, will create a risk of mass unemployment. 
In the twenty-first century, AI techniques have experienced a resurgence following concurrent advances in computer power, large amounts of data, and theoretical understanding; and AI techniques have become an essential part of the technology industry, helping to solve many challenging problems in computer science."""
Abstractive summarization
len(reference_text.split())  # word count of the source text
from transformers import pipeline

summarization = pipeline("summarization")
abstractive_summarization = summarization(reference_text)[0]["summary_text"]
print(abstractive_summarization)
Output
In computer science AI research is defined as the study of "intelligent agents" Colloquially, the term "artificial intelligence" is applied when a machine mimics "cognitive" functions that humans associate with other human minds, such as "learning" and "problem solving" Capabilities generally classified as AI as of 2017 include successfully understanding human speech, competing at a high level in strategic game systems (such as chess and Go)
Extractive summarization
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lex_rank import LexRankSummarizer

parser = PlaintextParser.from_string(reference_text, Tokenizer("english"))
summarizer = LexRankSummarizer()
extractive_summarization = summarizer(parser.document, 2)  # pick the 2 most central sentences
extractive_summarization = ' '.join(str(sentence) for sentence in extractive_summarization)
print(extractive_summarization)
Extractive Output
Colloquially, the term "artificial intelligence" is often used to describe machines that mimic "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving".As machines become increasingly capable, tasks considered to require "intelligence" are often removed from the definition of AI, a phenomenon known as the AI effect. Sub-fields have also been based on social factors (particular institutions or the work of particular researchers).The traditional problems (or goals) of AI research include reasoning, knowledge representation, planning, learning, natural language processing, perception and the ability to move and manipulate objects.

Seq2seq deep learning models are most commonly used for this case.
This blog series explains in detail, starting from the very beginning, how seq2seq works, up to the newest research approaches.
This repo also collects multiple implementations for building a text summarization model. It runs the models on Google Colab and hosts the data on Google Drive, so no matter how powerful your computer is, you can use Google Colab, which is a free service, to train your deep models.
If you would like to see text summarization in action, you can use this free API.
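As a rough idea of what such a seq2seq setup looks like with the Hugging Face transformers library (the t5-small checkpoint, the placeholder text, and the generation parameters here are illustrative choices, not the ones from the blog series or repo):
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "t5-small"  # assumed checkpoint; any seq2seq summarization model works similarly
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "Artificial intelligence is intelligence demonstrated by machines ..."  # your document here
inputs = tokenizer("summarize: " + text, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=60, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))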
I truly hope this helps

Related

How to display computer vision and AR in UML use case diagrams

My system uses both augmented reality and computer vision.
The first feature is: the user actor can scan a specific object, and the computer vision component should recognize it.
The second feature is: the user actor can view a specific place using augmented reality.
Each feature is a use case connected to the user, but do I also connect them to some sort of AI actor? And if so, what is the suitable way to do it?
Do I just say "Computer vision system", and "Augmented reality system" ?
Feature or use-case?
This is a good start. However, there is a key misconception here:
Features are characteristics or capabilities offered by the software that are valued by users because they help them achieve some purpose. Features are often identified with user stories.
Use-cases represent goals of the actors using the system, which correspond to a set of behaviors and interactions with the user, without reference to the internal structure of the system.
These are two different concepts. There is of course some overlap: some higher-level capabilities can be described in terms of goals. For example, an ERP can be expected to have accounting, warehouse management, and sales administration features.
But features are more general: they can also describe technical capabilities that are not directly observable by the user (e.g. backup), capabilities that are not directly related to a specific set of behaviors (e.g. a multilingual user interface), or capabilities that are much more detailed (e.g. a date-picking feature).
If you're working at the feature level, you may consider non-UML techniques, such as a feature tree, or user-story mapping (which is a kind of feature tree constructed with user stories).
The big picture with use-cases
In your diagram, the bubbles seem to show what the system offers, and not what the user wants to do. If you want to show the big picture with use-cases, you need to relate the bubbles to user goals:
Does the user just want to scan objects? Or is this scanning only one step toward a larger goal, such as making an inventory, recognizing and ordering spare parts, or populating a virtual world?
Does the user just want to view a place in VR? Or are the expectations more ambitious, like purchasing products that would look fine in a given place?
This might look like an unnecessary philosophical debate, but it is not, because the main benefit of use-cases is a goal-oriented approach. Framing the problem or the expectations correctly may allow you to think more creatively about alternatives, instead of locking you early into a preconceived solution.
The right boundaries
The actors raise another question: are these actors autonomous and independent systems and do they matter to the user? Or are they just implementation details?
Formally, actors are external to the system, and moreover, the use-case should not depend on the internal structure of the system. So if the computer vision and the virtual reality system are in fact libraries, components, sub-systems of your system, they should not appear in the diagram.
Secondly, use-cases should offer an observable result of value to actors. If the external system is dependent on your system and has no value on its own, then the use-case results cannot be of value to that system. For example, a DBMS is often viewed as a candidate actor but does not pass this test: the DBMS without the main system would be useless. If the system is not independent and autonomous, just remove it from the diagram to keep things simple.
Lastly, does the system actor matter to the other actors? If it makes no difference to your human users whether an external system-actor intervenes, keep it simple and do not show the system-actor, although you could. Then again, relying on an external system is more an implementation choice than a requirement.
The way you denoted it is common practice. The so-called primary actor (the one who receives the added value from the system under consideration) is placed to the left, and the so-called secondary actors (which only take part in and/or support the use case) are placed to the right. Depending on who the reader of the UC diagram will be, their appearance will or will not make sense. If you present it to a customer, they are likely not interested in IT blurb, but for system designers it would be important information.

Large scale data mining with Clojure

I'm looking for a good reference on
large scale data mining with Clojure
I know of many good Clojure programming books (Programming Clojure, The Joy of Clojure, ...), and many good data mining textbooks (Mining of Massive Datasets, Managing Gigabytes, ...). However, I'm not aware of any reference that specifically addresses
large scale data mining with Clojure
The "with clojure" part is rather important to me for the following reasons:
* most theoretical analysis uses big-Oh running time, which ignores constants
* constants matter, if it ends up being a matter of 1 second vs 1 hour (for things that need to be real time)
* or 1 hour vs 1 week (for batch jobs)
In particular, I think there's a lot of interplay between the JVM, Clojure Data Structures, whether data is stored in memory or lazily read from disk -- that can have the "same" algorithm have drastically different running times by "slightly" different implementations.
Thus, my question (all of the above was to avoid being closed by "Check Google"):
what is a good resource on massive data mining with Clojure?
Thanks!
I don't think anyone's yet written a good comprehensive reference. But there is certainly lots of work going on in this space (my own company included!)
Some interesting links to follow up:
Storm - distributed realtime computation using Clojure. Could be used for large scale data mining.
http://www.infoq.com/presentations/Why-Prismatic-Goes-Faster-With-Clojure - interesting video regarding Clojure performance and optimisation for machine learning applications
Incanter - probably the leading Clojure library for statistics and data visualisation
Weka - very comprehensive data mining / machine learning library for Java (and hence very easy to use directly from Clojure)
There is a wonderful book that is coming out in May 2013: Clojure Data Analysis Cookbook. I will probably buy it.
http://www.amazon.co.uk/Clojure-Data-Analysis-Cookbook-ebook/dp/B00BECVV9C/ref=sr_1_1?s=books&ie=UTF8&qid=1360697819&sr=1-1
In Detail
Data is everywhere and it's increasingly important to be able to gain
insights that we can act on. Using Clojure for data analysis and
collection, this book will show you how to gain fresh insights and
perspectives from your data with an essential collection of practical,
structured recipes.
"The Clojure Data Analysis Cookbook" presents recipes for every stage
of the data analysis process. Whether scraping data off a web page,
performing data mining, or creating graphs for the web, this book has
something for the task at hand.
You'll learn how to acquire data, clean it up, and transform it into
useful graphs which can then be analyzed and published to the
Internet. Coverage includes advanced topics like processing data
concurrently, applying powerful statistical techniques like Bayesian
modelling, and even data mining algorithms such as K-means clustering,
neural networks, and association rules.
Approach
Full of practical tips, the "Clojure Data Analysis Cookbook" will help
you fully utilize your data through a series of step-by-step, real
world recipes covering every aspect of data analysis.
Who this book is for
Prior experience with Clojure and data analysis techniques and
workflows will be beneficial, but not essential.

Arenas where core.logic dominates [soft]

I don't care about the reputation points, I just want good answers. Feel free to mark this question as community wiki.
Context
I've been working through The Reasoned Schemer, and have made the following observations:
Logic programming is very interesting.
Logic programming is sometimes counter-intuitive
Logic programming is often "inefficient" (or at least the code I write).
It seems like in going from
Assembly -> C++, I "give up" control of writing my own machine code
C++ -> Clojure, I give up control of memory management
Clojure -> core.logic/prolog/minikanren, I lose partial control of how computations are done
Question:
Besides (1) solving logic puzzles and (2) type inference, what are the domains of problems that logic programming dominates?
Thanks!
Constraint logic programming can be really useful for solving scheduling, resource allocation, and other nontrivial constraint satisfaction / combinatorial optimization problems. Everything you write is declarative: the constraints (e.g. only one aircraft can be on the runway at a time), and maybe something you want to minimize or maximize (throughput, waiting time).
There are various well-known flavors of this in Prolog, including CLP(FD), which works over finite integer domains, and CLP(R), which works over the real domain. At least CLP(FD) seems to be on core.logic's immediate roadmap.
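To make the idea concrete, here is a deliberately naive brute-force sketch in Python of the kind of problem CLP(FD) expresses declaratively; the aircraft data is made up, and a real finite-domain solver would prune the search space instead of enumerating it:
from itertools import permutations

aircraft = {"A": 0, "B": 1, "C": 1}   # earliest possible landing slot per aircraft (hypothetical)
slots = [0, 1, 2]                     # one runway, so each slot can be used at most once

best = None
for assignment in permutations(slots, len(aircraft)):
    plan = dict(zip(aircraft, assignment))
    if all(plan[a] >= earliest for a, earliest in aircraft.items()):           # the constraint
        waiting = sum(plan[a] - earliest for a, earliest in aircraft.items())  # the objective
        if best is None or waiting < best[0]:
            best = (waiting, plan)

print(best)  # (1, {'A': 0, 'B': 1, 'C': 2}): minimal total waiting under the constraints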
I believe such Prolog-derived solutions are actively being used in air traffic control and other logistics tasks, although it's hard to get precise information about exactly which technologies such mission- and life-critical companies are using under the hood.
Research in artificial intelligence, and in particular cognitive robotics and other applications of logic-based knowledge representation, is an area where Prolog is used a lot because of its close relation to logic theory. This relation is very useful because it basically brings theory to life: theorems can be proven on paper, then implemented almost trivially in Prolog and executed, and the executing programs have the proven properties. This allows for programs that are "correct by construction", which is the opposite of first writing programs and then trying to prove properties about them (as is done in formal methods, using, e.g., model checking).
The semantic web is another place where logic programming plays a growing role.

Is Communicating Sequential Processes ever used in large multi threaded C++ programs?

I'm currently writing a large multi-threaded C++ program (> 50K LOC).
As such, I've been motivated to read up a lot on various techniques for handling multi-threaded code. One theory I've found to be quite cool is:
http://en.wikipedia.org/wiki/Communicating_sequential_processes
And it's invented by a slightly famous guy, who's made other non-trivial contributions to concurrent programming.
However, is CSP used in practice? Can anyone point to any large application written in a CSP style?
Thanks!
CSP, as a process calculus, is fundamentally a theoretical thing that enables us to formalize and study some aspects of a parallel program.
If you instead want a theory that enables you to build distributed programs, then you should take a look at parallel structured programming.
Parallel structured programming is the basis of current HPC (high-performance computing) research and gives you a methodology for approaching and designing parallel programs (essentially, flowcharts of communicating computing nodes), together with runtime systems to implement them.
A central idea in parallel structured programming is the algorithmic skeleton, developed initially by Murray Cole. A skeleton is something like a parallel design pattern with an associated cost model and (usually) a runtime system that supports it. A skeleton models, studies, and supports a class of parallel algorithms that have a certain "shape".
As a notable example, mapreduce (made popular by Google) is just a kind of skeleton that addresses data parallelism, where a computation can be described by a map phase (apply a function f to all elements of the input data) and a reduce phase (take all the transformed items and "combine" them using an associative operator +).
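A minimal Python sketch of that skeleton shape (the squaring function and the process pool are just placeholders for a real map phase and its runtime support):
from functools import reduce
from multiprocessing import Pool

def f(x):          # the function applied in the map phase
    return x * x

if __name__ == "__main__":
    data = range(10)
    with Pool(4) as pool:
        mapped = pool.map(f, data)                 # map phase, executed in parallel
    total = reduce(lambda a, b: a + b, mapped)     # reduce phase, associative "+"
    print(total)  # 285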
I find the idea of parallel structured programming both theoretically sound and practically useful, so I suggest taking a look at it.
A word about multi-threading: since skeletons address massive parallelism, they are usually implemented on distributed memory rather than shared memory. Intel has developed a library, TBB, which addresses multi-threading and (partially) follows the parallel structured programming approach. It is a C++ library, so you can probably just start using it in your projects.
Yes and no. The basic idea of CSP is used quite a bit. For example, thread-safe queues in one form or another are frequently used as the primary (often only) communication mechanism to build a pipeline out of individual processes (threads).
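For instance, a minimal sketch of that pipeline style, using Python threads and queue.Queue in place of a C++ thread-safe queue; the stage names and data are made up:
import queue
import threading

SENTINEL = object()  # marks end of stream

def producer(out_q):
    for item in ["foo", "bar", "baz"]:
        out_q.put(item)
    out_q.put(SENTINEL)

def transformer(in_q, out_q):
    while (item := in_q.get()) is not SENTINEL:
        out_q.put(item.upper())
    out_q.put(SENTINEL)

def consumer(in_q):
    while (item := in_q.get()) is not SENTINEL:
        print(item)

q1, q2 = queue.Queue(maxsize=8), queue.Queue(maxsize=8)  # bounded queues throttle a fast stage
stages = [threading.Thread(target=producer, args=(q1,)),
          threading.Thread(target=transformer, args=(q1, q2)),
          threading.Thread(target=consumer, args=(q2,))]
for t in stages:
    t.start()
for t in stages:
    t.join()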
Hoare being Hoare, however, there's quite a bit more to his original theory than that. He invented a notation for talking about the processes, defined a specific set of signals that can be sent between the processes, and so on. The notation has since been refined in various ways, quite a bit of work has been put into proving various aspects, and so on.
Application of that relatively formal model of CSP (as opposed to just the general idea) is much less common. It's been used in a few systems where high reliability was considered extremely important, but few programmers appear interested in learning (yet another) formal design notation.
When I've designed systems like this, I've generally used an approach that's less rigorous, but (at least to me) rather easier to understand: a fairly simple diagram, with boxes representing the processes, and arrows representing the lines of communication. I doubt I could really offer much in the way of a proof about most of the designs (and I'll admit I haven't designed a really huge system this way), but it's worked reasonably well nonetheless.
Take a look at the website for a company called Verum. Their ASD technology is based on CSP and is used by companies like Philips Healthcare, Ericsson and NXP semiconductors to build software for all kinds of high-tech equipment and applications.
So to answer your question: Yes, CSP is used on large software projects in real-life.
Full disclosure: I do freelance work for Verum
Answering a very old question, yet it seems an important one:
There is Go, where CSP is a fundamental part of the language. In the Go FAQ, the authors write:
Concurrency and multi-threaded programming have a reputation for difficulty. We believe this is due partly to complex designs such as pthreads and partly to overemphasis on low-level details such as mutexes, condition variables, and memory barriers. Higher-level interfaces enable much simpler code, even if there are still mutexes and such under the covers.
One of the most successful models for providing high-level linguistic support for concurrency comes from Hoare's Communicating Sequential Processes, or CSP. Occam and Erlang are two well known languages that stem from CSP. Go's concurrency primitives derive from a different part of the family tree whose main contribution is the powerful notion of channels as first class objects. Experience with several earlier languages has shown that the CSP model fits well into a procedural language framework.
Projects implemented in Go are:
Docker
Google's download server
Many more
This style is ubiquitous on Unix, where many tools are designed to process from standard in to standard out. I don't have any first-hand knowledge of large systems that are built that way, but I've seen many small one-off systems that are.
For instance, this simple command line uses (at least) three processes:
cat list-1 list-2 list-3 | sort | uniq > final.list
This system is only moderately sized, but I wrote a protocol processor that strips away and interprets successive layers of protocol in a message, using a style very similar to this. It was an event-driven system using something akin to cooperative threading, but I could've used multithreading fairly easily with a couple of added tweaks.
The program is proprietary (unfortunately) so I can't show off the source code.
In my opinion, this style is useful for some things, but usually best mixed with some other techniques. Often there is a core part of your program that represents a processing bottleneck, and applying various concurrency increasing techniques there is likely to yield the biggest gains.
Microsoft had a technology called ActiveMovie (if I remember correctly) that did sequential processing on audio and video streams. Data got passed from one filter to another to go from input to output format (and source/sink). Maybe that's a practical example??
The Wikipedia article looks to me like a lot of funny symbols used to represent somewhat pedestrian concepts. For very large or extensible programs, the formalism can be very important to check how the (sub)processes are allowed to interact.
For a program in the 50,000-line class, you're probably better off architecting it as you see fit.
In general, following ideas such as these is a good idea in terms of performance. Persistent threads that process data in stages will tend not to contend, and exploit data locality well. Also, it is easy to throttle the threads to avoid data piling up as a fast stage feeds a slow stage: just block the fast one if its output buffer grows too big.
A little bit off-topic, but for my thesis I used a tool framework called TERRA/LUNA, which is aimed at software development for embedded control systems but is used heavily for all sorts of software development at my institute (so only academic use here).
TERRA is a graphical CSP and software architecture editor and LUNA is both the name for a C++ library for CSP based constructs and the plugin you'll find in TERRA to generate C++ code from your CSP models.
It becomes very handy in combination with FDR3 (a CSP refinement checker) to detect any sort of lock (deadlock, livelock, etc.) or even to do profiling.

The role of scripting languages in game programming

So I've been running into a debate at work about what the proper role of a scripting language is in game development. As far as I can tell there are two schools of thought on this:
1) The scripting language is powerful and full featured. Large portions of game code are written in the language and code is only moved into C++ when performance dictates that it's necessary. This is usually something like Lua or Unrealscript.
2) The scripting language is extremely limited. Almost all game code is in C++ and the language is only there to expose the underlying functionality to designers.
My frustration comes from seeing approach number two frequently abused, with large systems implemented in a language that does not have the features that make that code maintainable.
So I started out supporting approach number one, but after talking to some designers I realized that many of them seem to prefer number two, and it's mostly programmers who prefer number one.
So I'm still left wondering which approach is better. Am I just seeing bad code and blaming the tool instead of the programmer, or do we really need a more complex tool?
The compile-link-run-test cycle for compiled C++ code when you're dealing with something as complex as a video game is very, very slow. Using a scripting engine allows you to make large functional changes to the game without having to compile the program. This is a massive, massive time savings. As you say, things that need optimization can be moved into C++ as required.
AAA engines are highly driven by scripting. Torque, used for Tribes II (and many other games!) is scripted, Unreal has Unrealscript and so on. Scripting isn't just for mods, it's key to efficient development.
I think designers need to see a language suitable for them. That's not negotiable: they have to spend their time designing, not programming.
If scripting allows fast development of product-worthy game code, then the programmers should be doing it too. But it has to be product-worthy: doing everything twice doesn't save time. So you have to keep scripting in its place. If a developer can script the inventory system in a week, or write it in C++ in a month, then you want full-featured scripting, if only to give you more time to hand-optimise the parts that might conceivably hit performance limits. So not the inventory, or the high-score table, or the key-mapping system, or high-level game logic like what the characters are supposed to be doing. That can all be scripted if doing so saves you any time at all to spend on speeding up the graphics and the physics: programmers need to be able to work out exactly what the bottlenecks are, and senior programmers can make a reasonable prediction what won't be.
Designers probably shouldn't know or care whether the programmers are using the scripting language to implement game code. They shouldn't have an opinion whether (1) is good or bad. So even if the programmers are right to want (1), why is this inconveniencing the designers? How have they even noticed that the scripting language is full-featured, if all they ever need is a fairly small number of standard recipes? Something has gone wrong, but I don't think it's that Lua is too good a language.
One possibility is that using the same scripting language for both, becomes an obstacle to drawing a sharp line between the interfaces used by designers, and the interfaces internal to the game code. That, as I said at the top, is not negotiable. Even if they're using the same scripting language, programmers still have to provide the designers with the functionality that they need to do their job. Just because designers use Lua to code up AI strategies and conversation trees, and Lua is full-featured language capable of writing a physics engine, doesn't mean your designers are expected to write their own physics engine. Or use more than 10% of Lua's capabilities.
You can probably do both (1) and (2), mind. There's no reason you can't give the programmers Lua, and the designers some miniature under-featured DSL which is "compiled" into Lua with a Lua script.
I think the balance you want is something to this effect: you want game logic in script, but game functionality in code.
One big advantage of script is that you can set up waits easily. For instance:
enemy = GetObjectFromScene ("enemy01");
in 5 seconds { enemy.ThrowGrenadeAt (player); }
Just a contrived example. That kind of logic would be annoying to setup in C++. Script can make it easier to express this kind of logic, but you'd want the actual functionality (the functions it calls) to be in C++.
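A rough sketch of how such a wait could be expressed in a script layer (the scheduler, the enemy object, and the method names here are all hypothetical, not an actual engine API):
import heapq

class Scheduler:
    """Cooperative timer: the game loop calls update() once per frame."""
    def __init__(self):
        self._queue = []
        self._counter = 0

    def call_in(self, delay, callback, now):
        heapq.heappush(self._queue, (now + delay, self._counter, callback))
        self._counter += 1

    def update(self, now):
        while self._queue and self._queue[0][0] <= now:
            _, _, callback = heapq.heappop(self._queue)
            callback()

scheduler = Scheduler()
# hypothetical stand-in for GetObjectFromScene("enemy01") / ThrowGrenadeAt(player)
scheduler.call_in(5.0, lambda: print("enemy01 throws grenade at player"), now=0.0)

for frame_time in (1.0, 3.0, 5.0, 6.0):   # simulated game-loop ticks
    scheduler.update(frame_time)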
And script doesn't have to be slow. There are heavily scripted games running at 60fps on consoles, but it requires a good design and finding the right balance between your options 1 and 2 above.
I can't really see the argument that large amounts of scripted code is going to be superior to large amounts of C++. One could make the counter argument. I've seen terrible large projects written in scripting languages that started out with the scripting mentality--getting things done quick and dirty. This unfortunately doesn't scale well. There's really no way to make a code quality argument based solely on the programming language. People can write gobs of maintainable code in any language, compiled or scripted.
Anyway, it's really impossible to know the line between your "approach 1" and "approach 2" without knowing where the performance bottlenecks are and what your users are going to most want to "script".
I would suggest an "approach 3", which is to do it all or in part in C++ and expose a clean SDK in C++. This would have the advantage of forcing you to write your code as if it were a user interface that someone outside your organization has to use. It would hopefully make the code more maintainable AND have the added side effect of implementing your scripting interface for you. All your scripting interface would need to do now is forward to your SDK. Voila!
With this approach you avoid having to draw a line between the realm of what is "scripted" vs what is in C++. You and your users keep a choice to either add functionality in C++ with the SDK or use a scripting language.
As usual, I think the answer is "it depends."
There's NO WAY Halo 3 or Call of Duty 4 could have been based primarily on scripting. Top-rated AAA titles must, by definition, push the envelope of any platform they touch. This just can't be done with scripting.
Casual games, however, are a different story. Major game engines like Unity have lots of scripting built in.
There is also a market for mods. A solid game engine with a good scripting environment can be a platform for tweaks, derivations, and in some cases altogether new games. Counter Strike is my personal favorite example. I think you're limited in this case to FPS run-and-gun type games.
So, there is a place for scripting in games but it will probably stay in the small/casual game and modder's space.
One of the best examples of scripting in action is Civilisation IV. It uses Boost Python. As a result, it is horribly slow. A clear counterpoint to the statement that "code is moved into C++ when performance dictates that it's necessary".
The proper way is to design an architecture up front, and decide which problems are computationally hard and which are complex to specify up front. Graphics, physics, pathfinding, instant feedback on input, text to speech - all clearly computationally hard. "Friendly or hostile stance", "defend or attack", "spend or save" type AI decisions are all complex to specify up front.
The conclusion is that you want to expose capable, but soulless actors to your scripting code. The scripting code describes what the actors should do, but the C++ code takes care of the how.
With regard to the compile-link cycle, it's important to have a proper modular architecture. When you are working on the pathfinding logic, you should never have to compile graphics code. Also, tools like Incredibuild can help speed up compile times. Big game firms probably already have a render farm, which can double up quite well as a compile farm.
Saying that games should be done just with C++ (or claiming that AAA titles are done like that) is just being ignorant. Most games nowadays are scripted in one way or another, be it with a proprietary scripting language (UnrealScript), a generic language (Lua, Python, C#, Java, etc.) or a data-driven approach (XML, a proprietary binary format, etc.). Do some research and you'll find that most engines employ scripting in one form or another.
I think the question of whether to have 1) a powerful and full-featured scripting language vs 2) an extremely limited scripting language is approaching the problem from the wrong end. The scripting environment (language or data-driven) should be designed so that it best supports the process of creating the game. The expressiveness of the system is crucial here: the designers and scripters should be able to easily solve the problems they have without too many headaches, while they shouldn't be exposed to a system so complex that they have difficulty understanding and learning it. Programmers (myself included) often tend to design systems by making assumptions about how they will be used and by trying to ensure that every feature between earth and heaven could be implemented, resulting in complex systems. Overwhelmed by this complexity, the designers end up using the minimum to get the work done.
For example in our game (Rochard) we designed a fully expandable and scriptable data-driven AI system, but the level designers ended up using just the stock AI patterns because they didn't have time or effort to utilize the AI system to its fullest extent. So in the end I just created a good UI for choosing those stock AI patterns easily.
I'd say there is no silver bullet in this matter.
So I started out supporting approach
number one, but after talking to some
designers I realized that many of them
seem to prefer number two, and its
mostly programmers who prefer one.
This should make obvious sense: if the scripting language is "powerful and full featured", there is an onus on the designers to have to create the systems, since this opportunity is available to them. On the other hand, if the scripting language only exposes small details of the hard-coded game, then the programmers have to create those systems, since designers cannot. I'm not saying each side is lazy, but obviously both have individual skills the project requires them to focus on (since nobody else can do them as effectively), meaning there is always an interest in getting someone else to do a given task if possible.
And following naturally on from this, the proper role will depend on how the human resources in your company are laid out, in conjunction with any performance requirements of your game. Once you have an idea of how many people will need to do any sort of scripting, you'll know how much of the game will require it, and therefore can decide how wide or narrow the interface needs to be. This contributes to what the 'domain' of the language would be, as mentioned by onebyone.livejournal.com above.
(For what little it's worth, I'm a professional game developer and also the moderator of the Scripting Languages forum on Gamedev.net.)
The game development community already uses scripting as a very common way of building user interfaces and tuning AI responses. You can find lots of info on sites like Gamasutra. The interesting thing is that standard languages are starting to replace custom interpreters.
Two of the best scripting examples in AAA games that come to mind are World of Warcraft (uses Lua) and EvE Online (uses Python). EvE uses scripting on the server-side too: they have a significant investment in Stackless Python.
Update: Performance will essentially be a non-issue due to multicores. Even low end machines will end up with more cores than the display and model updates need. You might as well spend one of those cores running a scripting solution for the UI.
As has been said before: use scripting for game logic, and C++ for game functionality. An example would be scripting a game mode, but using C++ for rendering.
Let's split a game into two parts: the engine and the actual game (the design).
A solid top class game engine is most likely written in C/C++ and some people even optimize further with assembler code. You can do a lot of neat really fast stuff for graphics and physics with specific CPU instructions. There is no way to get performance when rendering large landscapes or multiple complex objects from a scripting language only.
When it comes to the game itself, there are lots of aspects that can be done in slower but higher-level programming languages. AI and other game logic can be written in script with success. It can speed up development and it opens up modding and community expansions in a simple way.
I think the poster here is correct to favor method #1. It is a cleaner method and will yield better long-term results despite the complaints of designers. In the end, the code will distinguish good game programming from mediocre (or bad).