As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
Let's say I want to aggregate information related to a specific niche from many sources (could be travel, technology, or whatever).
How would I do that?
Have a spider/crawler that crawls the web to find the information I need? (How would I tell the crawler what to crawl, since I don't want to fetch the whole web?)
Then have an indexing system to index and organize the information I crawled and also be a search engine?
Are systems like Nutch lucene.apache.org/nutch OK to be used for what I want? Do you recommend something else?
Or can you recommend another approach?
For example, how is Techmeme.com built? (It's an aggregator of technology news and it's completely automated; only recently did they add some human intervention.)
What would it take to build such a service?
Or how does Kayak.com aggregate its data? (It's a travel aggregator service.)
This all depends on the aggregator you are looking for.
Types:
Loosely defined - Generally this requires your data source to be very flexible about determining the type of information it gathers (it must answer the question: is this site/information travel related? Humour? Business related?)
Specific - This relaxes the requirements on data storage, because all of the data is specifically of one kind (for example, travel data such as flights, hotel prices, etc.).
Typically an aggregator is a system of subprograms:
Grabber - this searches for and grabs all of the content that needs to be summarized
Summarization - this is typically done through queries to the DB and can be adjusted based on user preferences [through programming logic]
View - this formats the information into what the user would like to see and can respond to feedback on the user's likes or dislikes of the items suggested.
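The three subprograms above can be sketched roughly as follows. This is only a minimal illustration in Python: the Item class, the function names and the two in-memory sources are hypothetical and stand in for real feeds, crawled pages and database queries.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Item:
    title: str
    topic: str
    score: int


def grab(sources: Iterable[Callable[[], list[Item]]]) -> list[Item]:
    """Grabber: collect raw items from every source."""
    items: list[Item] = []
    for source in sources:
        items.extend(source())
    return items


def summarize(items: list[Item], topic: str, limit: int = 3) -> list[Item]:
    """Summarization: keep the user's topic and return the top-scoring items."""
    matching = [item for item in items if item.topic == topic]
    return sorted(matching, key=lambda item: item.score, reverse=True)[:limit]


def view(items: list[Item]) -> str:
    """View: format the selected items for display."""
    return "\n".join(
        f"{rank}. {item.title} ({item.score})" for rank, item in enumerate(items, 1)
    )


# Hypothetical in-memory sources standing in for feeds or crawled pages.
def source_a() -> list[Item]:
    return [
        Item("Cheap flights to Rome", "travel", 7),
        Item("New laptop review", "technology", 9),
    ]


def source_b() -> list[Item]:
    return [Item("Hotel deals in Paris", "travel", 5)]


if __name__ == "__main__":
    print(view(summarize(grab([source_a, source_b]), topic="travel")))
```

Keeping the stages separate means a new source only touches the grabber, and user preferences only touch the summarization step.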
For a basic look - check out this: http://en.wikipedia.org/wiki/Aggregator
It will give you an overview of aggregators in general.
In terms of how to build your own aggregator if you're looking for something out of the box that can get you content that YOU want - I'd suggest this: http://dailyme.com/
If you're looking for a codebase / architecture to BUILD your own aggregator service, I'd suggest looking at something straightforward, like Open Reddit from http://www.reddit.com/
You need to define what your application is going to do. Building your own web crawler is a huge task as you tend to keep adding new features as you find you need them... only to complicate your design, etc...
Building an aggregator is much different. Whereas a crawler simply retrieves data to be processed later, an aggregator takes already defined sets of data and puts them together. If you use an aggregator, you will probably want to look for already defined travel feeds, financial feeds, travel data, etc... An aggregator is easier to build IMO, but it's more constrained.
If you, instead, want to build a crawler you'll need to define starting pages, define ending conditions (crawl depth, time, etc...) and so on and then still process the data afterwards (that is aggregate, summarize and so on).
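As a rough sketch of what "starting pages" and "ending conditions" mean in practice, here is a breadth-first crawl in Python with a depth limit and a page limit. The get_links callback and the FAKE_WEB link graph are assumptions made for illustration; a real crawler would download each page and extract its links instead.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterable


def crawl(
    start_pages: Iterable[str],
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 2,
    max_pages: int = 100,
) -> list[str]:
    """Breadth-first crawl that stops at max_depth or after max_pages pages."""
    queue = deque((url, 0) for url in start_pages)
    seen = {url for url, _ in queue}
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)  # a real crawler would fetch and store the page here
        if depth == max_depth:
            continue  # ending condition: do not follow links any deeper
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Hypothetical link graph used instead of real HTTP requests.
FAKE_WEB = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["a"],
    "d": ["e"],
}

if __name__ == "__main__":
    print(crawl(["a"], lambda url: FAKE_WEB.get(url, []), max_depth=2))
```

The visited pages still have to be processed afterwards (aggregated, summarized and so on); the crawl itself only decides what gets fetched.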
Closed 10 years ago.
So in the past, the website I work on has relied on a single-layer architecture that mixes data access directly with application and view code. I've recently convinced the team of the advantages of implementing a Data Access Service Layer (API), especially with recent talk of horizontal scaling (native mobile applications, etc).
Right now the implementation that comes to mind involves using Entity Framework to map the database to Data Contract classes, which the client will request using a WCF service.
I've used this approach in the past on a smaller scale, but now, with the scope of a large collection of data objects, each with numerous criteria they may be queried on, I'm having problems envisioning how to structure the API.
Example List of data classes:
Product
Merchant
Brand
Taxonomy
Coupon
User
Reviews
Questions
Answers
Etc
(hopefully the way these objects are related is obvious enough to make my points)
Service Requirements
Service must be language independent (.net, php, java, objective c)
Pattern must allow for several different datasources acting as one api (say our users are stored in a MSSQL server, our Reviews system in MySQL, and our products come from an XML feed)
Must be able to implement object caching from the API side
Each of these data objects essentially needs to be queryable based on several of its columns (to either return a single object or a collection). While I could write a new API method for each of the different scenarios, I'm thinking there must be a more elegant way of doing this.
Example Requests:
Get single Product by id.
Get list of products from a specific merchant, brand, or taxonomy ordered by 'create date', 'price', or 'percent-off'.
Get all Reviews Submitted by a User
Get all Reviews of a Product
...
In my research I came across this MSDN Article outlining several standard approaches for creating an API. I would be interested in hearing advantages and disadvantages of each, as well as which approach seems to fit the model I've described above best.
Essentially there are many patterns that do exactly this.
UI Layer => Data Access layer => Core layer
EF is used in the DAL; the model and logic live in Core.
Core declares interfaces and models for the other layers to use.
Do some reading on the "Repository" pattern and the "Unit of Work" pattern, with Inversion of Control as an optional extra.
If you are prepared to put 50USD on the table, there are some excellent video tutorials on http://Pluralsight.com to walk you through these concepts. Even sample apps etc. The 50USD is well spent in my view (1 month subscription).
I particularly recommend the video on Pluralsight by Julie Lerman on EF and Enterprise architecture.
Some very good tips there.
I use such a pattern. And it works very well.
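To make the Repository idea a bit more concrete, here is a language-neutral sketch written in Python rather than C#/EF. All class and field names are hypothetical. It assumes one generic find method that takes arbitrary column criteria plus an ordering, so a new method is not needed for every query, and it shows a caching wrapper that works with any backend exposing the same interface.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Protocol


@dataclass(frozen=True)
class Product:
    id: int
    merchant: str
    brand: str
    price: float


class ProductRepository(Protocol):
    """Interface declared by the core layer; each data source implements it."""

    def find(self, order_by: str = "id", **criteria: Any) -> list[Product]: ...


class InMemoryProductRepository:
    """Hypothetical backend; a SQL- or XML-feed-backed class would expose the same method."""

    def __init__(self, products: list[Product]) -> None:
        self._products = products

    def find(self, order_by: str = "id", **criteria: Any) -> list[Product]:
        matches = [
            p for p in self._products
            if all(getattr(p, field) == value for field, value in criteria.items())
        ]
        return sorted(matches, key=lambda p: getattr(p, order_by))


class CachingProductRepository:
    """Wraps any repository and caches results per distinct query."""

    def __init__(self, inner: ProductRepository) -> None:
        self._inner = inner
        self._cache: dict[tuple, list[Product]] = {}

    def find(self, order_by: str = "id", **criteria: Any) -> list[Product]:
        key = (order_by, tuple(sorted(criteria.items())))
        if key not in self._cache:
            self._cache[key] = self._inner.find(order_by, **criteria)
        return self._cache[key]


if __name__ == "__main__":
    repo = CachingProductRepository(InMemoryProductRepository([
        Product(1, "Acme", "Foo", 20.0),
        Product(2, "Acme", "Bar", 10.0),
        Product(3, "Other", "Foo", 5.0),
    ]))
    print(repo.find(merchant="Acme", order_by="price"))
```

Callers only depend on the find signature, so users, reviews and products could each sit behind a different data source without the client noticing. Cache invalidation is left out of this sketch.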
Closed 9 years ago.
There are many, many, many options out there, but I still couldn't settle on one. The ones that look nice to me so far include wordpress getshopped, opencart and magento. But they all need to be installed, configured and tried out for many different features, which I still don't even know if I'm going to need or not. That is the "solution" part of my question, since I'll also need a host able to handle it and, preferably, supporting the platform updates.
Features I do need are:
a ticket system (which opencart sure doesn't have built in), so I can customize a custom made product for instance, and;
a customizable user interface, as easy and as much as possible. In here I like to take squarespace as an example. Really easy to customize. In fact, it would be great if the shopping cart would offer similar drag n' drop features.
I don't care if it's .net, php, gae or python. Actually, that's about my reversed order of preference language-wise, python being preferred. I care a lot more if it's easy to support, modify and migrate if needed (of host, platform, database, whatever). Also I do want a way to try it out hassle free. Open source is always better but not necessarily best.
TLDR: What's the best shopping cart out there that can be used to sell services rather than products?
I just went through the process of installing and trying several carts for a project that I was working on. As Pierre says above, "There is no best shopping cart, however there is one best for your specific need." That is a very truthful statement.
My project was for an online soap company that has 5 different categories with 5 or so variations each. Not a big store and not one that changes inventory often.
I tried the following carts: PrestaShop, Zen Cart, Magento, getshopped and phpurchase.
My findings were that for a small store, PrestaShop, Zen Cart and Magento are a bit overkill. For a small shop, getshopped and phpurchase are better fits.
Out of the 3 big shop solutions, I felt that Zen Cart is really hard to make look nice. It has a 90's vibe about the template that it comes with and takes a lot of work to get around that. Magento and PrestaShop were really cool. PrestaShop seems very UK specific. It did not take Authorize.net and I think that there may be a plugin that you can get. Magento seems like a great solution for a larger store and I liked the backend admin interface.
I purchased the getshopped plugin and integrated it into my Wordpress site (I purchased the Authorize.net integration at the gold cart level). I had a lot of trouble dealing with the multiple bugs that I found riddled through the code base. I looked at their forum and many people who had similar issues were not responded to. A lot of people were as frustrated as me. I tried customer support: no response. I asked for a refund: no response. Basically, Get Shopped was a complete waste of time and money.
I then found Phpurchase. The customer support person, Lee Blue was really nice - Lee answered my emails morning, noon and night. Lee is literally the nicest customer support person I've ever worked with! - so helpful. The code worked just as specced - no troubles and no complaints. I'm a very happy customer with phpurchase. If I need a small ecommerce site in the future, I will use that solution again, for sure.
Note, I'm not an affiliate of Phpurchase or have any type of financial gain by recommending them, I just had such a rough time with getShopped and such a wonderful experience with Phpurchase!
There is no one best pizza, best soda, best father, best website,...
There is no best shopping cart.
However, there is one best for your specific need.
To find it, testing them all is the only effective solution I know.
Closed 10 years ago.
Can you recommend a full-text search engine? (Preferably open source)
I have a database of many (though relatively short) HTML documents. I want users to be able to search this database by entering one or more search words in my C++ desktop application. Hence, I’m looking for a fast full-text search solution to integrate with my app. Ideally, it should:
Skip common words, such as the, of, and, etc.
Support stemming, i.e. search for run also finds documents containing runner, running and ran.
Be able to update its index in the background as new documents are added to the database.
Be able to provide search word suggestions (like Google Suggest)
Have a well-documented API
To illustrate, assume the database has just two documents:
Document 1: This is a test of text search.
Document 2: Testing is fun.
The following words should be in the index: fun, search, test, testing, text. If the user types t in the search box, I want the application to be able to suggest test, testing and text (Ideally, the application should be able to query the search engine for the 10 most common search words starting with t). A search for testing should return both documents.
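To make the expected behaviour concrete, here is a toy index in Python that reproduces the two-document example. It is only a sketch of the idea, not how CLucene or Xapian work internally. The stop-word list is an assumption, and the stem function is a deliberately crude stand-in (it only strips a trailing "ing") for a real stemmer.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Assumed stop-word list, just large enough for the example.
STOP_WORDS = {"a", "and", "is", "of", "the", "this"}


def stem(word: str) -> str:
    """Crude stand-in for a real stemmer: strips a trailing 'ing'."""
    return word[:-3] if word.endswith("ing") and len(word) > 5 else word


class TinyIndex:
    def __init__(self) -> None:
        self.words: dict[str, int] = defaultdict(int)  # word -> occurrence count
        self.postings: dict[str, set[int]] = defaultdict(set)  # stem -> document ids

    def add(self, doc_id: int, text: str) -> None:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word in STOP_WORDS:
                continue
            self.words[word] += 1
            self.postings[stem(word)].add(doc_id)

    def search(self, query: str) -> set[int]:
        return set(self.postings.get(stem(query.lower()), set()))

    def suggest(self, prefix: str, limit: int = 10) -> list[str]:
        matches = [w for w in self.words if w.startswith(prefix.lower())]
        # Most frequent first, ties broken alphabetically.
        return sorted(matches, key=lambda w: (-self.words[w], w))[:limit]


if __name__ == "__main__":
    index = TinyIndex()
    index.add(1, "This is a test of text search.")
    index.add(2, "Testing is fun.")
    print(sorted(index.words))      # the indexed words
    print(index.suggest("t"))       # suggestions for the prefix "t"
    print(index.search("testing"))  # ids of matching documents
```

The word list used for suggestions is kept separate from the stemmed postings, so suggestions show the words as typed in the documents while searches match on stems.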
Other points:
I don't need multi-user support
I don't need support for complex queries
The database resides on the user's computer, so the indexing should be performed locally.
Can you suggest a C or C++ based solution? (I’ve briefly reviewed CLucene and Xapian, but I’m not sure if either will address my needs, especially querying the search word indexes for the suggest feature).
Also check out Sphinx
You can use CLucene for C/C++ and Sphider for PHP. Both are free and take some time to set up and use, but they are not difficult to understand.
I have used the dtSearch module with great success.
They have a DLL that you can use with your application to index just about anything, and it does more than what you ask for.
Note: it is not free.
I do not see in the question that you asked for a free one, so I wrote about my favorite.
dtSearch inspired me to create an indexer for my own language, Ellinika, for my sites, because I did not find what I was looking for in my language.
There are some modules just for stemming, if you only need to find suggestions for your words. I got a reference from here: http://tartarus.org/~martin/PorterStemmer/
For example, if you have a database like MS SQL that already does some basic indexing, and someone searches for a word and you do not find anything, you can stem the word yourself and search again...
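The "search, then stem and search again" fallback described above could look roughly like this in Python. The strip_suffix function is a very rough stand-in for a real Porter stemmer, and lookup is a hypothetical callback that represents whatever query the database already supports.

```python
def strip_suffix(word):
    """Very rough stand-in for a Porter stemmer: removes one common suffix."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def search_with_fallback(word, lookup):
    """Try the exact word first; if nothing is found, retry with the stemmed form."""
    results = lookup(word)
    if results:
        return results
    stemmed = strip_suffix(word)
    return lookup(stemmed) if stemmed != word else results


if __name__ == "__main__":
    # Hypothetical index standing in for the database's own lookup.
    index = {"test": [1], "fun": [2]}
    print(search_with_fallback("testing", lambda w: index.get(w, [])))
```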
Closed 10 years ago.
I need a note taking wiki-like "super" application. I'll start with a rundown of applications that I've already evaluated and/or used:
Wikidpad
Pros:
fast switching between the edit and view modes;
nice syntax (especially for pasting code snippets or just raw ASCII text, nice indenting visual clues);
it is a standalone application that doesn't require a server;
the wiki pages can be kept in flat text database;
easy drag-and-drop of file attachments (especially for image files).
Cons:
doesn't have history/version control of the pages and the state of the wiki database as a whole;
doesn't have the concept of namespaces for the wiki pages;
MoinMoin wiki
Pros:
nice syntax;
has a standalone server (Python based), which makes it truly portable and standalone;
keeps the pages in flat files;
has lots of nice plugins;
Cons:
it's a wiki == slow iterations of editing/taking notes, viewing, rinse-repeat...
doesn't have version control integration
Trac
Pros:
All of the features of the MoinMoin wiki, except the flat file database;
Version control integration: I can use the wiki changeset feature and the wiki pages as metadata of my personal codebase;
Cons:
All of the general drawbacks of the wikis;
Not truly portable;
todolist2 (by AbstractSpoon)
Pros:
fast, standalone todolist manager;
the tasks have a feature that is really nice and important to me: a rich edit box for taking notes associated with the task, with a single key to flip between the task and the notes;
time tracking for the tasks;
Cons:
doesn't have version control built in (it has "simple" version control that just makes automatic backup copies of the project/data file with a time stamp embedded in the name).
it's hard to filter the tasks by urgency (in the GTD terms, it doesn't have the concept of the containers of tasks: Inbox, Maybe, Next action for each project, etc).
it doesn't have cross-referencing/linking between the tasks in wiki-like fashion.
Thinking Rock
Pros:
implements GTD almost perfectly;
it has notes for every action;
portable;
Cons:
(Maybe because of the Java GUI) doesn't have simple Undo when editing text notes;
it's clunky when switching between the projects/actions tree and the editable notes editbox;
doesn't have version control;
MonkeyGTD/TiddlyWiki
Pros:
truly standalone
almost 100% wiki
nice GTD implementation
Cons:
it's a little confusing, since there is no easy or user-friendly way to see an overview of the current structure of the wiki pages
I'm not sure if it scales well when there are lots of pages/data/text/attachments.
doesn't have source control integration;
I'm not sure about version control/pages history...
I want an application that has the following:
the speed and the ease of edit/preview iteration cycle of wikidpad.
the wiki pages and the associated attachments as they are (like wikidpad and MoinMoin).
version control for the wiki pages (like MoinMoin or Trac).
source control integration (like Trac).
time tracking like todolist2 and the task/project nesting like todolist2 and ThinkingRock.
the almost perfect GTD implementation of ThinkingRock or MonkeyGTD.
It's obvious that I haven't decided which one to use because for some reason my requirements are somehow orthogonal in the terms of the features that the aforementioned applications provide... not that the features are orthogonal or it is impossible or impractical... actually, I think that maybe wikidpad is the closest to my ideal, which means that I could:
implement the features that I need (version control, GTD-like features/properties for the wiki pages themselves, source control integration), or
continue to search and evaluate, or
get some interesting and valuable opinions here.
Try ConnectedText: http://www.connectedtext.com/
ConnectedText has all the pros of Wikidpad and none of the cons. ConnectedText has a much superior query engine and contains semantic extensions not available in Wikidpad, and is much more stable.
Try KNote http://www.smartgoldfish.com/download.html.
I'm not sure you'll be able to find any application that meets all your requirements, but here is my shameless plug for a note-taking desktop application for Windows that works like a personal wiki:
- http://www.ppcsoft.com/blog/personal-wiki.asp
If you specify which features are most important, it is easier to tell which tool is more suitable.
Any of the tools that save files as text can be added to your own version control system (which is better than using each tool's own version control), and you could use that system for all your important documents.
Closed 10 years ago.
Does anyone know of any existing packages or libraries that can be used to build a calendar in a django app?
A quick google search reveals django-gencal, which looks like exactly what you need. It would also be worth looking at the snippets under the calendar tag on Django Snippets at http://www.djangosnippets.org/tags/calendar/.
It seems that django-calendar has become django-agenda: http://github.com/dokterbob/django-agenda
Great tips.
django-swingtime lives on:
http://github.com/dakrauth/django-swingtime
The django-schedule code originally from thauber (thauber/django-schedule) has been forked and worked into the glamkit/glamkit-eventtools code for Galleries, Libraries, Museums and Archives. It has also been forked and updated by a variety of other folks, e.g. boskee/django-schedule, and my guess is that that might have fewer dependencies and be easier to integrate into another project. It says:
Django-schedule: A calendaring/scheduling application, featuring:
one-time and recurring events
calendar exceptions (occurrences changed or cancelled)
occurrences accessible through Event API and Period API
relations of events to generic objects
ready to use, nice user interface
view day, week, month, three months and year
project sample which can be launched immediately and reused in your project
See the github "network" tab for a graphical navigation from the point of view of a given branch to see how other branches relate to it (i.e. what is available for merging).
svn checkout http://django-calendar.googlecode.com/svn/trunk/ django-calendar-read-only
svn: URL 'http://django-calendar.googlecode.com/svn/trunk' doesn't exist
So a Google search may reveal it, but it no longer exists.
There is another calendar alternative here, Django Event Calendar from 3captus, that offers something a bit simpler. I'm trying it out now, but it looks like a better fit for me.
From the features list:
Full feature calendar display using python calendar class
Support month scrolling (forward or backward)
AJAX add, modify, delete GUI
Requires minimum knowledge of Django; should be a good complement after you are done with the Django tutorial
(http://www.djangoproject.com/documentation/tutorial01/)
Calendar and Event class can be used in any python project
Full unit test included
There are also some calendar functions built into Python itself, you can see a simple implementation here.
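For reference, a small sketch of what the standard library calendar module offers on its own. The month shown (February 2024) is an arbitrary choice for illustration. The HTML string could be returned from a Django view, but wiring it into a view is left out here.

```python
import calendar

# Weeks of February 2024 as lists of day numbers (Monday first);
# 0 marks days that belong to the neighbouring months.
weeks = calendar.monthcalendar(2024, 2)

# Plain-text and HTML renderings of the same month.
text_view = calendar.TextCalendar(firstweekday=calendar.MONDAY).formatmonth(2024, 2)
html_view = calendar.HTMLCalendar(firstweekday=calendar.MONDAY).formatmonth(2024, 2)

if __name__ == "__main__":
    print(text_view)
```

Event data is not handled by this module; the packages mentioned in the other answers add models for events on top of this kind of grid.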
Today I ran into django-swingtime. Worth checking out.