ACM Digital Library Search Problems - researchkit

I have problems using the ACM advanced search at https://dl.acm.org/search/advanced
I only want to search abstracts, so I select the Abstract field in the "Search Within" block.
To provide a simple example, I use the following search query:
change impact AND software AND graph AND algorithm
With that, I got 179 results.
I double-checked the results by looking at the first abstract ("Multi-perspective change impact analysis using linked data of software engineering") and everything was fine. But in the second abstract ("Change impact analysis for aspect-oriented software evolution") the terms "graph" and "algorithm" are missing.
I did some testing with something like:
change impact AND (software OR artifact) AND scenario
and got results where the term "scenario" is missing from the abstract.
My original search term is way more complex than that simple example.
So what am I missing, or is the search engine broken?

Related

Is there a tool, tip, or method for analyzing an existing system?

I have inherited legacy code from a previous project at my company, with no documentation or description left behind. The only part of the code I can recognize is Jetty for the API. I can't even find the database it uses yet 😛
Unfortunately, I will probably need to make some modifications to this system.
Is there a way, or are there tools, to figure out the components of this system and the relationships between them? I mean something like modeling a causal loop diagram by monitoring data interactions through running processes, IPC, etc.
I have used some prior knowledge of common web languages (HTML) to locate particular pieces of code for modification,
but there should be a more general methodology, and tools, for analyzing the dynamics of cooperating processes.

What form of testing should I perform?

I want to write an algorithm (a bunch of machine learning algorithms, really) in C/C++, maybe in Java, possibly in Python. The language doesn't really matter to me - I'm familiar with all of the above.
What matters to me is the testing. I want to train my models using training data. So I have the test input, I know what the output should be, and I compare it to the model's output. What kind of test is this? Is it a unit test? How do I approach the problem? I can see that I could write some code to check what needs checking, but I want to separate the testing from the main code. Testing is a well-developed field and I've seen this done before, but I don't know the name and type of this particular kind of testing, so that I can read up on it and not create a mess. I'd be grateful if you could let me know what this testing method is called.
Your best bet is to watch the psychology of testing videos from the testing god: http://misko.hevery.com/
Link to Misko's videos:
http://misko.hevery.com/presentations/
And read this Google testing guide http://misko.hevery.com/code-reviewers-guide/
Edited:
Anyone can write tests; they are really simple, and there is no magic to writing a test. You can simply do something like:
// Create the object under test and exercise the behavior we care about.
var sut = new MyObject();
var res = sut.IsValid();
// Fail the test if the result is not what we expect.
if (res != true)
{
    throw new ApplicationException("message");
}
That is the theory, of course; these days we have tools to simplify the tests, and we can write something like this:
new MyObject().IsValid().Should().BeTrue();
But what you should do is focus on writing testable code; that's the magic key.
Just follow the psychology of testing videos from Misko to get you started.
This sounds a lot like Test-Driven Development (TDD), where you create unit tests ahead of the production code. There are many detailed answers around this site on both topics. I've linked to a couple of pertinent questions to get you started.
If your inputs/outputs are at the external interfaces of your full program, that's black box system testing. If you are going inside your program to zoom in on a particular function, e.g., a search function, providing inputs directly into the function and observing the behavior, that's unit testing. This could be done at function level and/or module level.
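For a concrete illustration, here is a minimal unit-test sketch in Python; the predict function is a made-up stand-in for whatever model you train, and the expected outputs are invented:

import unittest

# Hypothetical stand-in for the trained model's prediction function;
# replace it with your own model's predict call.
def predict(features):
    return 1 if sum(features) > 0 else 0

class TestPredict(unittest.TestCase):
    def test_known_examples(self):
        # Pairs of (input, expected output) whose answers are known up front.
        cases = [([1.0, 2.0], 1), ([-3.0, 0.5], 0)]
        for features, expected in cases:
            self.assertEqual(predict(features), expected)

if __name__ == "__main__":
    unittest.main()

The same pattern scales from a single function (unit level) up to the whole program's external interface (black-box system level); only what you call and observe changes.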
If you're writing a machine learning project, the testing and training process isn't really Test-Driven Development. Have you ever heard of co-evolution? You have a set of puzzles for your learning system that are, themselves, evolving. Their fitness is determined by how much they confound your cases.
For example, I want to evolve a sorting network. My learning system is the programs that produce networks. My co-evolution system generates inputs that are difficult to sort. The sorting networks are rewarded for producing correct sorts and the co-evolutionary systems are rewarded for how many failures they trigger in the sorting networks.
I've done this with genetic programming projects and it worked quite well.
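To make the idea concrete, here is a toy sketch in Python; all names and representations are illustrative inventions, not the actual genetic programming setup described above. Comparator networks try to sort 4-element lists, while the test inputs evolve to break them:

import random

def random_comparator():
    # A comparator (i, j) with i < j swaps elements that are out of order.
    i = random.randrange(3)
    return (i, random.randrange(i + 1, 4))

def apply_network(network, values):
    v = list(values)
    for i, j in network:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v

def sorter_fitness(network, tests):
    # Sorters are rewarded for the fraction of adversarial inputs they sort.
    return sum(apply_network(network, t) == sorted(t) for t in tests) / len(tests)

def test_fitness(test, networks):
    # Test inputs are rewarded for the fraction of sorters they break.
    return sum(apply_network(n, test) != sorted(test) for n in networks) / len(networks)

random.seed(0)
networks = [[random_comparator() for _ in range(5)] for _ in range(20)]
tests = [[random.randrange(10) for _ in range(4)] for _ in range(20)]

for generation in range(10):
    # Keep the fitter half of each population; refill with fresh random ones.
    networks.sort(key=lambda n: sorter_fitness(n, tests), reverse=True)
    tests.sort(key=lambda t: test_fitness(t, networks), reverse=True)
    networks = networks[:10] + [[random_comparator() for _ in range(5)] for _ in range(10)]
    tests = tests[:10] + [[random.randrange(10) for _ in range(4)] for _ in range(10)]

print(sorter_fitness(networks[0], tests))

A real setup would mutate survivors rather than injecting fresh random individuals, but the two-population reward structure is the essence of co-evolution.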
Probably back testing, which means you have some historical inputs and run your algorithm over them to evaluate its performance. The term you used yourself, training data, is more general, and you could search for that to find some useful links.
It's unit testing. The controllers are tested, and the code is checked in and out without really messing up your development code. This process is also called Test-Driven Development (TDD), where every development cycle is tested before going into the next software iteration or phase.
Although this is a very old post, my 2 cents :)
Once you've decided which algorithmic method to use (your "evaluation protocol", so to speak) and tested your algorithm on unitary edge cases, you might be interested in ways to run your algorithm on several datasets and assert that the results are above a certain threshold (individually, or on average, etc.).
This tutorial explains how to do it within the pytest framework, which is the most popular testing framework in Python. It is based on an example (comparing polynomial fitting algorithms on several datasets).
(I'm the author, feel free to provide feedback on the github page!)
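For flavor, a minimal sketch of the pattern; the datasets and the fit_and_score helper below are invented stand-ins for the tutorial's polynomial-fitting example:

import numpy as np
import pytest

# Hypothetical datasets; in practice these would be loaded from files.
DATASETS = {
    "linear": (np.arange(10.0), 2 * np.arange(10.0) + 1),
    "quadratic": (np.arange(10.0), np.arange(10.0) ** 2),
}

def fit_and_score(x, y, degree=2):
    # Fit a polynomial and turn the mean residual into a score in (0, 1].
    coeffs = np.polyfit(x, y, degree)
    residual = np.abs(np.polyval(coeffs, x) - y).mean()
    return 1.0 / (1.0 + residual)

@pytest.mark.parametrize("name", sorted(DATASETS))
def test_score_above_threshold(name):
    # One generated test per dataset; each must clear the quality threshold.
    x, y = DATASETS[name]
    assert fit_and_score(x, y) > 0.9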

Fuzzy queries to database

I'm curious about how a feature found on many social sites today works.
For example, you enter a list of movies you like, and the system suggests other movies you may like (based on the movies liked by other people who like the same movies as you). I think doing it the straight-SQL way (joining my list of movies with a movies-users table, joining that with user-movies, grouping by movie title, and applying a count) on large datasets would be impossible to implement due to the "heaviness" of such a query.
At the same time, we don't need an exact solution; an approximate one would be enough. I wonder whether there is a way to implement something like a fuzzy query against a traditional RDBMS that would be fast to execute but carry some imprecision. Or how are such features implemented in real systems?
That's collaborative filtering, or recommendation.
Unless you need something really complex, the Slope One predictor is one of the simpler ones; it's about 50 lines of Python. See Bryan O'Sullivan's "Collaborative filtering made easy", and the paper by Daniel Lemire et al. introducing "Slope One Predictors for Online Rating-Based Collaborative Filtering".
This one has a way of updating just one user at a time when they change, whereas some other approaches need to reprocess the whole database just to update one user.
I used that Python code to predict the counts of words not occurring in documents, but I ran into memory issues, and I think I might write an out-of-core version, maybe using SQLite.
Also, the matrix used in that one is triangular; the sides along the diagonal are mirrored, so only one half of the matrix needs to be stored.
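For flavor, here is a minimal sketch of the Slope One idea in Python; this is not O'Sullivan's code, just the core deviation and prediction steps, with made-up ratings:

from collections import defaultdict

def slope_one_deviations(ratings):
    # ratings: {user: {item: rating}}.
    # For every item pair, count how many users rated both (freq) and the
    # average rating difference between the two items (dev).
    freq = defaultdict(lambda: defaultdict(int))
    dev = defaultdict(lambda: defaultdict(float))
    for user_ratings in ratings.values():
        for i, ri in user_ratings.items():
            for j, rj in user_ratings.items():
                if i != j:
                    freq[i][j] += 1
                    dev[i][j] += ri - rj
    for i in dev:
        for j in dev[i]:
            dev[i][j] /= freq[i][j]
    return dev, freq

def predict(ratings, dev, freq, user, item):
    # Weighted Slope One: shift each of the user's known ratings by the
    # average deviation, weighted by how many users rated both items.
    num = den = 0.0
    for j, rj in ratings[user].items():
        if j != item and freq[item][j]:
            num += (rj + dev[item][j]) * freq[item][j]
            den += freq[item][j]
    return num / den if den else None

ratings = {
    "alice": {"matrix": 5, "inception": 3},
    "bob": {"matrix": 4, "inception": 2, "memento": 4},
}
dev, freq = slope_one_deviations(ratings)
print(predict(ratings, dev, freq, "alice", "memento"))  # 5.0

The deviation table is exactly the triangular matrix mentioned above: dev[i][j] is just -dev[j][i], so half of it is redundant.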
The term you are looking for is "collaborative filtering".
Read Programming Collective Intelligence, published by O'Reilly.
The simplest methods use Bayesian networks. There are libraries that can take care of most of the math for you.

Java-based calculation engine based on OpenOffice Calc

The requirement is to build a calculation engine which is performant and supports excel like formulas. These formulas need to be applied on huge data sets (millions of rows of data).
I was thinking if something could be built on top of OpenOffice Calc service and make it available as a Calculation Engine.
Does anyone have any experience doing this? Are there any other alternatives? I know it is possible using Excel services, but we are an open-source shop; M$ is ruled out.
Any pointers would be very helpful.
Edited based on High Performance Mark's inputs.
Numeric calculations are needed. Scientific calculations are not in scope (e.g., sin(x), tanh(x), etc.).
Calculations are not performed by end users. The formulas are stored in the DB and applied to the datasets. The formulas (like a tax calculation) are configured, so if a formula changes, recalculation will be triggered via the application.
Spreadsheet-like formulas are well understood by a wider audience and should be easier to read and maintain. Is there any wrapper around R (or an equivalent) that will convert spreadsheet formulas into R syntax?
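To illustrate the configured-formula pattern described above, here is a toy sketch, in Python rather than Java purely for illustration; the formula table and names are invented:

# Formulas live as text (e.g., in a database table), are compiled once,
# and are then applied to every row of the dataset.
FORMULAS = {"tax": "price * rate"}  # hypothetical configured formula

def compile_formula(src):
    return compile(src, "<formula>", "eval")

def apply_formula(code, rows):
    for row in rows:
        # Each row supplies the variables the formula refers to; builtins
        # are disabled so formulas stay purely arithmetic.
        yield eval(code, {"__builtins__": {}}, dict(row))

rows = [{"price": 100.0, "rate": 0.20}, {"price": 250.0, "rate": 0.19}]
tax = compile_formula(FORMULAS["tax"])
print(list(apply_formula(tax, rows)))  # [20.0, 47.5]

The point of compiling once is that a changed formula only costs one recompile, while the per-row evaluation over millions of rows stays cheap.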
Well, a little Googling finds several open-source Java-written spreadsheets, one of which may be suitable for your purposes. One of the questions you might want to answer (perhaps by editing your post) is what calculations you want to perform: the full set of functionality that Excel provides (or something close to it), or would the facilities that SQL provides satisfy your requirements? If the latter, you might want to use a database for this.
Another question you might clarify is this: are you trying to create an application which, like Excel, is usable by end users for specifying calculations, but which, unlike Excel, is based on open-source software and can cope with millions of rows? I don't know about its performance on such large data sets (someone else on SO can probably tell us), but R is very popular (and rightly so) for what you are probably trying to do. My view is that R sits between the average programming language (say Python) and the average spreadsheet (say Excel) in terms of ease of use by non-programmers.
Your choice of solution may (and certainly ought to) depend on who will be using it.

C++ code visualization

A sort of follow up/related question to this.
I'm trying to get a grip on a large code base that has hundreds and hundreds of classes and a large inheritance hierarchy. I want to be able to see the "main veins" of the inheritance hierarchy at a glance, not all the "peripheral" classes that only do some very specific, specialized thing. Visual Studio's "View Class Diagram" makes something that looks like a train; it's sprawled horizontally across the screen and isn't very organized. You can't grok it easily.
I've just tried Doxygen and Graphviz, but the results are somewhat similar to Visual Studio's. I'm getting sweet-looking call graphs, but again with too much detail for what I'm trying to get.
I need a quick way to generate the inheritance hierarchy, in some kind of collapsible view.
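One thing that can help with the Doxygen noise specifically: its graphs can be pruned via Doxyfile settings. A sketch (the option values here are suggestions, not tested against this particular code base):

HAVE_DOT               = YES
CLASS_GRAPH            = YES   # keep inheritance diagrams
COLLABORATION_GRAPH    = NO    # drop member/usage edges
CALL_GRAPH             = NO    # drop call graphs entirely
CALLER_GRAPH           = NO
HIDE_UNDOC_CLASSES     = YES   # hide peripheral, undocumented classes
MAX_DOT_GRAPH_DEPTH    = 2     # only show near neighbors
DOT_GRAPH_MAX_NODES    = 50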
Why not just do it manually? It is a great learning experience when starting to work with a large code base. I usually just look at which class inherits from which, and which class contains instances of, references to, or pointers to other classes. Have a piece of paper next to you and get drawing...
Instead of going into the full Class Designer tool, just use the "Class View" or the "Object Browser" in Visual Studio; they present fully collapsible class hierarchies.
A good UML tool should do the trick.
Here is a list of generic UML tools: http://en.wikipedia.org/wiki/List_of_UML_tools
There are lots out there, all with varying feature sets. Try playing with a few to see if you get the output you desire. If the free ones fail you, you might have to shell out for a good commercial-grade UML tool.
You can try CppDepend; it doesn't create a class hierarchy like Doxygen does, but it can show 'the big picture' of your project, and it also shows some code metrics.
I've had the most success with valgrind (its callgrind tool) and KCachegrind for this. You run valgrind against your debug binary, perform whatever actions you're interested in, then import the output into KCachegrind to see everything you'd ever want to know about who called what, how often, and when. Plus, because you're doing it dynamically, it catches cases that static analysis likely won't.
I've also had some success using Enterprise Architect's reverse-engineering features, although the result doesn't end up nearly as nice (but you do get a workable UML model, which is nice!).
And finally, there's a tool called "Understand". This is pretty good at static OO analysis, but I think it's quite pricey and not that widely used.
Try Source Insight; it is possible to configure the depth of the generated graph in this tool.
See also C/C++ call-graph utility for Windows platform
Check out SourceNavigator; it's open source, works on a bunch of platforms, and has a Hierarchy Browser, a Class Browser, a Cross-Reference Browser, and more that will allow you to navigate and understand the code.
I've been using it for some time now, especially when I have new code to go through and understand.
For a reasonably priced commercial product, you may want to check out SolidSX from Vizlogix (www.vizlogix.com). (If you are outside of North America, go to SolidSource -- www.solidsourceit.com.)
It generates a radial diagram that can be collapsed and expanded. It also integrates with Visual Studio (both BSC and .NET).
What's your definition of 'main vein'? Either you want a graph reducer or skeletonizer (you could find or write one and apply it to what Doxygen and the rest produce), or 'main vein' has something to do with the function of the code, and I don't think an automated tool can help you with that. Unless you can point it at 'these are the important bits that do input and output; show me only elements that are one or two steps away from the paths between these'. Hmm, sounds like a cool tool to write :)
... the inheritance hierarchy, in some kind of collapsible view.
again, a sweet idea for a tool!