Can anyone suggest techniques/concepts for compressing word embeddings like GloVe 300d?
They increase the size of the model considerably.
A decent option is deep compositional code learning, a neural-network approach that can reduce word embedding size drastically.
This GitHub link will help you and give more insight; the approach is quite beneficial in sentiment analysis tasks.
It still needs to be tested on a wider range of NLP tasks.
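For intuition, here is a minimal sketch of the compositional-code idea: each word stores a handful of small discrete codes, and its 300d vector is reconstructed as a sum of vectors drawn from shared codebooks. All names and sizes below are illustrative, not taken from the linked repo.

```python
import numpy as np

# Minimal sketch of compositional code learning (Shu & Nakayama, 2017):
# this shows only the reconstruction step, not the training procedure.
M, K, D = 32, 16, 300      # M codebooks, K vectors per codebook, embedding dim D
vocab_size = 400_000

# Learned quantities (random stand-ins here): M codebooks of K x D vectors,
# plus M discrete codes per word selecting one vector from each codebook.
codebooks = np.random.randn(M, K, D).astype(np.float32)
codes = np.random.randint(0, K, size=(vocab_size, M))   # what you actually store

def reconstruct(word_id):
    """Rebuild a 300d embedding as the sum of M selected codebook vectors."""
    return codebooks[np.arange(M), codes[word_id]].sum(axis=0)

vec = reconstruct(12345)
print(vec.shape)   # (300,)

# Storage: M * log2(K) bits of codes per word (16 bytes here) plus the shared
# codebooks, versus D float32s per word (1200 bytes) for the full GloVe matrix.
```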
I'm training a model for recognizing short, one to three sentence strings of text using the MITIE back-end in Rasa. The model trains and works using spaCy, but it isn't quite as accurate as I'd like. Training on spaCy takes no more than five minutes, but training for MITIE ran for several days non-stop on my computer with 16GB of RAM. So I started training it on an Amazon EC2 r4.8xlarge instance with 255GB RAM and 32 threads, but it doesn't seem to be using all the resources available to it.
In the Rasa config file, I have num_threads: 32 and set max_training_processes: 1, which I thought would help use all the memory and computing power available. But now that it has been running for a few hours, CPU usage is sitting at 3% (100% usage but only on one thread), and memory usage stays around 25GB, one tenth of what it could be.
Do any of you have any experience with trying to accelerate MITIE training? My model has 175 intents and a total of 6000 intent examples. Is there something to tweak in the Rasa config files?
I'm going to try to address this from several angles. First, from the Rasa NLU angle, the docs specifically say:
Training MITIE can be quite slow on datasets with more than a few intents.
and provide two alternatives:
Use the mitie_sklearn pipeline, which trains using sklearn.
Use the MITIE fork where Tom B from Rasa has modified the code to run faster in most cases.
Given that only a single core is being used, I doubt this will have an impact, but Alan from Rasa has suggested that num_threads should be set to 2-3x your number of cores.
If you haven't evaluated both of those possibilities then you probably should.
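For the first alternative, switching pipelines is a small config change. Here is a sketch against a 0.x-era rasa_nlu Python API; the module paths, the feature-extractor file, and the data file name are assumptions, so check the docs for your installed version.

```python
# Sketch: training with the mitie_sklearn pipeline via the rasa_nlu Python API.
from rasa_nlu.converters import load_data
from rasa_nlu.config import RasaNLUConfig
from rasa_nlu.model import Trainer

config = RasaNLUConfig(cmdline_args={
    "pipeline": "mitie_sklearn",   # MITIE featurizer + sklearn intent classifier
    "mitie_file": "total_word_feature_extractor.dat",
    "num_threads": 32,
})
trainer = Trainer(config)
trainer.train(load_data("intent_examples.json"))   # hypothetical training file
model_dir = trainer.persist("./models")
```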
Not all aspects of MITIE are multi-threaded. See this issue opened by someone else using Rasa on the MITIE GitHub page and quoted here:
Some parts of MITIE aren't threaded. How much you benefit from the threading varies from task to task and dataset to dataset. Sometimes only 100% CPU utilization happens and that's normal.
Specifically on the training-data side, I would recommend that you look at the evaluate tool recently introduced into the Rasa repo. It includes a confusion matrix that could help identify trouble areas.
This may allow you to switch to spaCy, use a portion of your 6000 examples as an evaluation set, and add examples back into the intents that aren't performing well; a sketch of that evaluation follows.
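If you'd rather compute the confusion matrix yourself, here is a minimal sketch; it assumes a trained rasa_nlu interpreter, and the evaluation pairs are made up.

```python
# Sketch: held-out evaluation with a confusion matrix.
from sklearn.metrics import confusion_matrix

# `interpreter` is assumed to be a trained rasa_nlu Interpreter
# (e.g. loaded with Interpreter.load); the pairs below are hypothetical.
eval_pairs = [("book me a flight", "book_flight"),
              ("what's the weather like", "ask_weather")]

y_true = [intent for _, intent in eval_pairs]
y_pred = [interpreter.parse(text)["intent"]["name"] for text, _ in eval_pairs]

labels = sorted(set(y_true) | set(y_pred))
print(labels)
print(confusion_matrix(y_true, y_pred, labels=labels))
```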
I have more questions: where did the 6000 examples come from, are they balanced, how distinct is each intent, and have you verified that the words in your training examples appear in the corpus you're using? But I think the above is enough to get started.
It will be no surprise to the Rasa team that MITIE is taking forever to train; it will be more of a surprise that you can't get good accuracy out of another pipeline.
As a last resort, I would encourage you to open an issue on the Rasa NLU GitHub page and engage the team there for further support. Or join the Gitter conversation.
I want to extract features from images in MS COCO dataset using a fine-tuned VGG-19 network.
However, it takes about 6~7 seconds per image, roughly 2 hours per 1k images. (even longer for other fine-tuned models)
There are 120k images in MS COCO dataset, so it'll take at least 10 days.
Is there any way that I can speed up the feature extraction process?
Well, this is not just a command. First, you must check whether your GPU is powerful enough to wrestle with deep CNNs; knowing your GPU model will answer that question.
Second, you have to compile and build the Caffe framework with CUDA and GPU support enabled (CPU_ONLY disabled) in Makefile.config (or CMakeLists.txt).
After completing all the required steps (installing the NVIDIA driver, installing CUDA, etc.), you can build Caffe for GPU use. Then, by passing the GPU device ID on the command line, you can benefit from the speed-up.
Follow this link for building Caffe using GPU.
Hope it helps
This ipython notebook example explains the steps to extract features out of any caffe model really well: https://github.com/BVLC/caffe/blob/master/examples/00-classification.ipynb
In pycaffe, you can set gpu mode simply by using caffe.set_mode_gpu().
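Putting that together, here is a minimal pycaffe sketch for batched feature extraction; the file paths and the 'fc7' layer name are assumptions for a typical VGG-19 deploy prototxt, and batching usually buys more than GPU mode alone.

```python
import numpy as np
import caffe

caffe.set_mode_gpu()
caffe.set_device(0)   # GPU device ID

# Placeholder paths for your fine-tuned VGG-19 files.
net = caffe.Net("vgg19_deploy.prototxt", "vgg19_finetuned.caffemodel", caffe.TEST)

batch_size = 32
# Reshape the input blob once so each forward pass handles a whole batch.
net.blobs["data"].reshape(batch_size, 3, 224, 224)

def extract_features(batch):
    """batch: (batch_size, 3, 224, 224) array of preprocessed images."""
    net.blobs["data"].data[...] = batch
    net.forward()
    # 'fc7' is the usual 4096-d feature layer in VGG deploy files;
    # confirm the layer name in your own prototxt.
    return net.blobs["fc7"].data.copy()

features = extract_features(np.zeros((batch_size, 3, 224, 224), dtype=np.float32))
print(features.shape)   # (32, 4096)
```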
I am currently working on a requirement to analyze the images and extract text and contents for further analysis. The images need to be processed using Hadoop due to huge volume. I am looking for libraries that can fit well in MapReduce paradigm as well as give most flawless conversion possible. I am looking for not only open source but any commercial libraries that can fit into this requirement. Can the experts on this exchange guide me on what options I should choose for further analysis and prototyping?
I am currently analyzing the Tesseract library but would like to see any better candidates for this requirement.
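For what it's worth, OCR parallelizes naturally as a map-only job, since each image is independent. Here is a minimal Hadoop Streaming mapper sketch using pytesseract; the convention of reading image file paths from stdin is an assumption, and Tesseract plus pytesseract must be installed on every worker node.

```python
#!/usr/bin/env python
# Map-only Hadoop Streaming sketch: each input line is an image path,
# each output line is "path<TAB>extracted text".
import sys

from PIL import Image
import pytesseract

for line in sys.stdin:
    path = line.strip()
    if not path:
        continue
    try:
        text = pytesseract.image_to_string(Image.open(path))
        print("%s\t%s" % (path, text.replace("\n", " ")))
    except Exception as e:
        sys.stderr.write("failed on %s: %s\n" % (path, e))
```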
Greetings!
I'd like to build an Apache web server running on Debian Lenny.
It will primarily be used for hosting a web-shop, so it should have some light db i/o and lots of image serving (item previews/thumbs/etc...).
It's tough to put a finger on the exact number of concurrent requests that I'll get hit with, but I'd say that a non-professional setup should be enough to handle them.
By non-professional I mean that I don't need to invest in purchasing blades, a rack, or anything of the like. Just a regular desktop PC tweaked for web server performance.
Which exposes my current problem: I have no idea whatsoever what kind of a machine I should be looking for.
If I wanted to build a gaming rig, no problem - there are at least a million sites out there with performance benches, from bleeding-edge graphics card reviews to flat-panel LCD contrast/response-time charts. But when it comes to trying to find recommendations for a web-server-based build, I'm having a hard time finding a good RECENT review.
So, at least I've managed to gather this so far - these are the priorities I should be attending to:
1) Lots of memory (preferably fast)
2) A pair of fast HDDs
3) As many cores as I can get
4) As fast processor as I can get
5) A MB with good I/O
So, memory and HDDs aren't that big of a deal, you can't go wrong here (I guess).
With RAM prices these days, it's pretty affordable to pump 8+ GB into a machine.
The only question here is whether it would be worth it to buy a tiny (<=32 GB) SSD and place all my web stuff and the OS onto it. My entire web server is just a couple of megabytes in size, and the database will fit really neatly onto it with space to boot.
As for the graphics card, I'll just plug in any old PCI Ex card I can whip up, and the same goes for any peripherals. I don't need a display of any kind - I'll be logging in remotely for most of the time.
OK - and now for the most important question: Which Proc and MB to buy.
As far as I've gathered, it would be better to have ten cores running at 100 MHz each than only one running at 2 GHz, taking the nature of the machine into consideration.
So I'll most likely have to get a quad core, right? The question is which... :/
For there are several affordable... My budget is around US $800. This is, again, for just the proc, the MB, and the memory. I have the HDDs. If I take a small SSD, add $100 to that budget.
AMD Phenom or Intel Core 2? Which MB to go with it? I'm totally lost here.
If this starts an AMD vs. Intel flame war, I'm truly sorry, for that is not my intention - but if you could at least point me to a good recent review for a web server build, I would be grateful.
On the one hand you say you don't need that much performance, but on the other you're talking about adding as many cores as you can. A quad-core CPU from either AMD or Intel will be more than sufficient. It gets into "religious war" territory, but I prefer the Intel chips; I usually buy Xeon processors. As for the SSD, I wouldn't bother. Look into a good RAID setup with a 3ware controller, either RAID 1+0 or RAID 5 (obviously there will be a religious anti-RAID-5 crowd, though I prefer it, at least until RAID 6 is more widespread). As much memory as you can afford is ideal, although anything more than 8 GB is probably overkill from what you have said. Probably my main departure from what you have already listed is the SSD: depending on your usage patterns, you may actually hurt performance with it, and any benefits for your use cases would not be worth the costs. Wait for the research to catch up for SSDs to really be beneficial in terms of performance. :)
If this is a business server, I recommend buying one pre-configured from IBM, Dell, or whichever major manufacturer is your preference (I prefer IBM).
This is really a stretch for the "right" kind of question for SO; it only qualifies by degrees, as an "implementation" question.
Pre-configured "Server" machines can often be more cost-beneficial. But, if you'd still prefer to build your own...
Considering just your budget ($800) for MB, Proc, and Memory...
RAM - DDR2 800 ($200/4GB, and cheaper)
MB - 1333/1066MHz FSB ($250)
CPU - Dual Core ($150)
Quad Core can still be too expensive for the benefit -- but, that's up to you to judge.
But, follow the links, and use the Advanced Search to cross out unnecessary features, and you should be able to reduce the list of items fairly easily.
Have you considered shared, dedicated, or virtual hosting? If I were you, I'd go with SliceHost for the virtual server, then use Amazon S3 for serving up images and other large static files. The combination has worked well for me in the past. I've found that, especially when it comes to hosting, it's best not to take on more responsibility than you absolutely have to.
I use MediaTemple for my websites. They have a lot of professional organizations hosted on their servers. I'd probably go with them if I were you.
My dad thought the server route would be easy and we found out differently the hard way. If you don't have a friend or an employee that really knows what he's doing, I'd be careful. Anyways, good luck.
If you're not planning on running the next Amazon, I'd say that your choice of CPU/chipset is irrelevant. Find a motherboard with the features you need (4+ RAM slots, plenty of SATA headers, etc.) that suits your budget, and then buy an upper-midrange multicore CPU to match. Get a PCI Express RAID card and a meaty UPS too.
Get a vanilla hard drive for the OS, and a pair of fast drives (WD Velociraptors, etc) and put them in RAID 1 for the webserver for redundancy.
Then, after a year or so of restarting the server every other day, migrate everything to a hosting company.