How can we use the Charity Engine to search for square numbers whose differences are again squares?

We want to find quadruples [w, x, y, z] where the differences of their squares are all again squares: z^2-y^2, z^2-x^2, z^2-w^2, y^2-x^2, y^2-w^2, and x^2-w^2 must all be square numbers.
For this, we plan to use the Charity Engine to search for these quadruples in a distributed manner.
I would be very grateful for a jump start or a (partial) solution on how to approach this.
What we have so far: we already have a C++ solution that searches for such quadruples (see this GitHub link), and it has already searched up to 2^34. However, a distributed search seems much more promising in terms of scalability.
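For reference, the core test is cheap to state; below is a minimal, self-contained C++ sketch of the brute-force check. The names isSquare and isQuadruple are illustrative, not taken from the linked repository, and the range is restricted to z < 2^31 so that z^2 fits in 64 bits.

    #include <cmath>
    #include <cstdint>
    #include <iostream>

    // Returns true if n is a perfect square. Valid for n < 2^62.
    bool isSquare(uint64_t n) {
        uint64_t r = static_cast<uint64_t>(std::llround(std::sqrt(static_cast<double>(n))));
        while (r * r > n) --r;                 // correct for floating-point rounding
        while ((r + 1) * (r + 1) <= n) ++r;
        return r * r == n;
    }

    // Checks a candidate quadruple w <= x <= y <= z: all six pairwise
    // differences of squares must themselves be perfect squares.
    // Restricted to z < 2^31 so that z*z fits in 64 bits.
    bool isQuadruple(uint64_t w, uint64_t x, uint64_t y, uint64_t z) {
        return isSquare(z*z - y*y) && isSquare(z*z - x*x) && isSquare(z*z - w*w)
            && isSquare(y*y - x*x) && isSquare(y*y - w*w) && isSquare(x*x - w*w);
    }

    int main() {
        // Sanity checks: 5^2 - 3^2 = 16 = 4^2, and repeated values pass
        // trivially because 0 counts as a square.
        std::cout << std::boolalpha
                  << isSquare(5*5 - 3*3) << " "       // true
                  << isQuadruple(3, 3, 5, 5) << "\n"; // true (degenerate)
    }

A distributed setup would partition the range of z across clients and run this test, or a smarter sieve, on each chunk.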

Related

How to build a "Bill Of Materials" automatically

I'm looking to auto-generate a Bill of Materials (BOM) in draw.io using object IDs. My plans contain many identical or similar components, which makes it difficult to count them visually, so ultimately I'd like to auto-generate a parts list. Firstly, is this possible in draw.io? Secondly, what would be the best method of execution?
Currently I'm converting my diagram to PDF and then using the find function to count the number of instances of the part number.
A pointer in the right direction would be most welcome.
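One low-tech starting point, sketched below in C++: draw.io can store diagrams as plain (uncompressed) XML, and each cell's label appears in a value="..." attribute. If your part numbers are in the labels, tallying those attributes approximates a parts count. This is only a rough sketch under those assumptions, not a real draw.io integration; a production BOM generator should use a proper XML parser.

    #include <fstream>
    #include <iostream>
    #include <map>
    #include <regex>
    #include <sstream>
    #include <string>

    // Tallies the value="..." labels found in an uncompressed draw.io XML file.
    int main(int argc, char* argv[]) {
        if (argc < 2) { std::cerr << "usage: bom <diagram.xml>\n"; return 1; }
        std::ifstream in(argv[1]);
        std::stringstream buffer;
        buffer << in.rdbuf();
        const std::string xml = buffer.str();

        std::map<std::string, int> counts;
        std::regex attr("value=\"([^\"]+)\"");
        for (std::sregex_iterator it(xml.begin(), xml.end(), attr), end; it != end; ++it)
            ++counts[(*it)[1].str()];

        // Print the parts list: count x label.
        for (const auto& entry : counts)
            std::cout << entry.second << " x " << entry.first << "\n";
    }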

How can I implement the search for square numbers whose differences are again squares using the GIMPS platform?

I would like to search for quadruples [w, x, y, z] where the differences of their squares are all again squares: z^2-y^2, z^2-x^2, z^2-w^2, y^2-x^2, y^2-w^2, and x^2-w^2 must all be square numbers.
For this I would like to use the GIMPS (Great Internet Mersenne Prime Search) platform for a distributed search.
I would be very grateful for a jump start or solution on how I can implement such an application.
What we have so far: we already have a C++ solution that searches for such quadruples (see this GitHub link), and it has already searched up to 2^34. However, a distributed search seems much more promising in terms of scalability.

Find and remove quotation indicators in Wikipedia text using a regular expression

During my text editing I come across a lot of quotation indicators, and I'm sure there is a way to make handling them simpler and faster.
For instance, I only want to remove indicators such as [1], [2], [3]. Is there a way to do this using regular expressions?
Here's an example text that I'm working with:
Blaise Pascal designed and constructed the first working mechanical calculator, Pascal's calculator, in 1642.[2] In 1673, Gottfried Leibniz demonstrated a digital mechanical calculator, called the Stepped Reckoner.[3] He may be considered the first computer scientist and information theorist, for, among other reasons, documenting the binary number system. In 1820, Thomas de Colmar launched the mechanical calculator industry[note 1] when he released his simplified arithmometer, which was the first calculating machine strong enough and reliable enough to be used daily in an office environment. Charles Babbage started the design of the first automatic mechanical calculator, his Difference Engine, in 1822, which eventually gave him the idea of the first programmable mechanical calculator, his Analytical Engine.[4] He started developing this machine in 1834 and "in less than two years he had sketched out many of the salient features of the modern computer".[5]
The easiest way is to detect all square brackets and whatever is inside them.
\[[^\[\]]+\]
Here is a demo.
But you didn't specify which language you want to use.
Please note that this solution assumes there is no interesting text inside the square brackets, only citations. But I think that is a reasonable assumption for Wikipedia text.
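Since no language was specified, here is one illustration in C++, applying the pattern above with std::regex_replace:

    #include <iostream>
    #include <regex>
    #include <string>

    int main() {
        std::string text = "He demonstrated a digital mechanical calculator.[3] "
                           "He launched the industry[note 1] in 1820.";
        // Matches a '[', any run of non-bracket characters, then ']'.
        std::regex citation(R"(\[[^\[\]]+\])");
        std::cout << std::regex_replace(text, citation, "") << "\n";
    }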

Testing if a string contains one of several thousand substrings

I'm going to be running through live Twitter data, attempting to pull out tweets that mention, for example, movie titles. Assuming I have a list of ~7000 hard-coded movie titles I'd like to match against, what's the best way to select the relevant tweets? This project is in its infancy, so I'm open to looking into any solution (i.e., language-agnostic). Any help would be greatly appreciated.
Update: I'd be curious if anyone has any insight into how the Yahoo! Placemaker API solves this problem. It can take a text string and return a geocoded JSON result of all the locations mentioned in it.
You could try Wu and Manber's A Fast Algorithm For Multi-Pattern Searching.
The multi-pattern matching problem lies at the heart of virus scanning, so you might look to scanner implementations for inspiration. ClamAV, for example, is open source and some papers have been published describing its algorithms:
Lin, Lin and Lai: A Hybrid Algorithm of Backward Hashing and Automaton Tracking for Virus Scanning (a variant of Wu-Manber; the paper is behind the IEEE paywall).
Cha, Moraru, et al: SplitScreen: Enabling Efficient, Distributed Malware Detection
If you use compiled regular expressions, it should be pretty fast, especially if you put lots of titles into one expression.
Efficiently searching for many terms in a long character sequence would require a specialized algorithm to avoid testing for every term at every position.
But since it sounds like you have short strings with a known pattern, you should be able to use something fairly simple. Store the set of titles you care about in a hash table or tree. Parse out "string1" and "string2" from each tweet using a regex, and test whether they are contained in the set.
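A small C++ sketch of that idea; the fixed phrase "is better than" and the title set here are illustrative placeholders, not part of the original question:

    #include <iostream>
    #include <regex>
    #include <string>
    #include <unordered_set>

    int main() {
        // Illustrative stand-in for the ~7000 hard-coded movie titles.
        const std::unordered_set<std::string> titles = {"Alien", "Heat", "Casablanca"};

        const std::string tweet = "Alien is better than Heat, no contest.";

        // Parse out the two candidate strings around the fixed phrase.
        std::smatch m;
        std::regex pattern("(.+) is better than ([^,.!?]+)");
        if (std::regex_search(tweet, m, pattern)) {
            // Average O(1) membership test per candidate in the hash set.
            if (titles.count(m[1].str()) && titles.count(m[2].str()))
                std::cout << "matched: " << m[1] << " vs " << m[2] << "\n";
        }
    }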
Working off what erickson suggested, the most feasible approach is to search for the fixed phrase ("is better than" in your example) and then check for one of the 7,000 terms. You could instead narrow the set by creating 7,000 searches for "[movie] is better than" and then filtering manually on the second movie, but you'll probably hit the search rate limit pretty quickly.
You could speed up the searching by using a dedicated search service like Solr instead of using text parsing. You might be able to pull out titles quickly using some natural language processing service (OpenCalais?), but that would be better suited to batch processing.
For simultaneously searching for a large number of possible targets, the Rabin-Karp algorithm can often be useful.
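For illustration, here is a minimal C++ sketch of the multi-pattern variant of Rabin-Karp, simplified by assuming all patterns share one length; a real title list would need one pass per distinct length, or an algorithm like Wu-Manber or Aho-Corasick instead:

    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Multi-pattern Rabin-Karp. Assumes a non-empty pattern list in which
    // every pattern has the same length m: hash each pattern once, then
    // slide a single rolling hash over the text.
    std::vector<std::size_t> findAll(const std::string& text,
                                     const std::vector<std::string>& patterns) {
        const uint64_t B = 257;                // rolling-hash base (arithmetic mod 2^64)
        const std::size_t m = patterns[0].size();
        std::vector<std::size_t> hits;
        if (text.size() < m) return hits;

        uint64_t high = 1;                     // B^(m-1), removes the outgoing character
        for (std::size_t i = 1; i < m; ++i) high *= B;

        auto hash = [&](const char* s) {
            uint64_t h = 0;
            for (std::size_t i = 0; i < m; ++i) h = h * B + (unsigned char)s[i];
            return h;
        };

        // Map each pattern's hash to the patterns that produced it.
        std::unordered_map<uint64_t, std::vector<const std::string*>> table;
        for (const auto& p : patterns) table[hash(p.data())].push_back(&p);

        uint64_t h = hash(text.data());
        for (std::size_t i = 0; ; ++i) {
            auto it = table.find(h);
            if (it != table.end())
                for (const std::string* p : it->second)
                    if (text.compare(i, m, *p) == 0)   // verify: hashes can collide
                        hits.push_back(i);
            if (i + m >= text.size()) break;
            h = (h - (unsigned char)text[i] * high) * B + (unsigned char)text[i + m];
        }
        return hits;
    }

    int main() {
        // Both patterns share length 5 for the single-length simplification.
        std::vector<std::string> patterns = {"alien", "heat!"};
        for (std::size_t pos : findAll("i liked alien more than heat!", patterns))
            std::cout << "hit at offset " << pos << "\n";
    }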

Best approach for doing full-text search with list-of-integers documents

I'm working on a C++/Qt image retrieval system based on similarity that works as follows (I'll try to avoid irrelevant or off-topic details):
I take a collection of images and build an index from them using OpenCV functions. After that, for each image, I get a list of integer values representing important "classes" that each image belongs to. The more integers two images have in common, the more similar they are believed to be.
So, when I want to query the system, I just have to compute the list of integers representing the query image, perform a full-text search (or similar) and retrieve the X most similar images.
My question is: what's the best approach to perform such a search?
I've heard about Lucene, Lemur, and other indexing methods, but I don't know if this kind of full-text search is the best way, given that the domain is reduced (only integers instead of words).
I'd like to know about the alternatives in terms of efficiency, accuracy, or C++ friendliness.
Thanks!
It sounds to me like you have a vector-space model, so Lucene or a similar product may work well for you. In general, an inverted-index model will be good if:
You don't know the number of classes in advance
There are a lot of classes relative to the number of images
If your problem doesn't fit these criteria, a normal relational DB might work better, as Thomas suggested. If it meets #1 but not #2, you could investigate one of the "column-oriented" non-relational databases. I'm not familiar enough with these to tell you how well they would work, but my intuition is that you'll need to replicate a lot of an IR toolkit's functionality yourself.
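If you do end up replicating that functionality yourself, the core of an inverted index over integer classes is small. A minimal C++ sketch, with image and class IDs as plain ints as in the question, and scoring by simple overlap counting:

    #include <algorithm>
    #include <iostream>
    #include <unordered_map>
    #include <vector>

    // Inverted index: class id -> ids of the images containing that class.
    using Index = std::unordered_map<int, std::vector<int>>;

    void addImage(Index& index, int imageId, const std::vector<int>& classes) {
        for (int c : classes) index[c].push_back(imageId);
    }

    // Scores every candidate by how many classes it shares with the query
    // and returns the ids of the top-k images.
    std::vector<int> query(const Index& index, const std::vector<int>& classes,
                           std::size_t k) {
        std::unordered_map<int, int> shared;  // image id -> overlap count
        for (int c : classes) {
            auto it = index.find(c);
            if (it == index.end()) continue;
            for (int img : it->second) ++shared[img];
        }
        std::vector<std::pair<int, int>> ranked(shared.begin(), shared.end());
        const std::size_t topK = std::min(k, ranked.size());
        std::partial_sort(ranked.begin(), ranked.begin() + topK, ranked.end(),
                          [](const auto& a, const auto& b) { return a.second > b.second; });
        std::vector<int> result;
        for (std::size_t i = 0; i < topK; ++i) result.push_back(ranked[i].first);
        return result;
    }

    int main() {
        Index index;
        addImage(index, 1, {3, 7, 42});
        addImage(index, 2, {7, 42, 99});
        for (int img : query(index, {7, 42, 100}, 2))
            std::cout << "image " << img << "\n";  // images 1 and 2, two shared classes each
    }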
Lucene is written in Java and I don't know of any C++ ports. Solr exposes Lucene as a web service, so it's easy enough to access it that way from whatever language you choose.
I don't know much about Lemur, but it looks like it has a similar vector-space model, and it's written in C++, so that might be easier for you to use.
You can take a look at Lucene for image retrieval (LIRE) here: http://www.semanticmetadata.net/2006/05/19/lire-lucene-image-retrieval-04-released/
If I'm not mistaken, you are trying to implement typical bag-of-words image retrieval, am I correct? If so, you are probably trying to build an inverted file index. As you have probably already realized, Lucene on its own is not suitable, since it indexes text instead of numbers. Using its classes for querying the index would also be a problem, as it is not designed to "parse" an image (i.e., detect keypoints, extract descriptors, then vector-quantize them) into a query vector.
LIRE, on the other hand, has been modified to index feature vectors. However, it does not appear to work out of the box for the bag-of-words model. Also, I think I've read on the author's website that it currently uses brute-force matching rather than an inverted file index to retrieve images, but I would expect it to be easier to extend than Lucene itself for your purposes.
Hope this helps.