I am attacking a combinatorial optimization problem similar to the multi-knapsack problem. The problem has an optimal solution, and I prefer not to settle for an approximate one.
Are there any recommended tutorials on quickly prototyping and deploying combinatorial optimization solutions (for senior software engineers who are also big-data newbies)? I want to move quickly from prototype to deployment on a Docker cluster or AWS.
My background is in distributed systems (with a focus on .NET, Java, Kafka, Docker containers, etc.), so I'm typically inclined to solve complex problems by parallel processing across a cluster of machines (scaling out on a Docker cluster or AWS). However, this particular problem cannot be solved by brute force, as the problem space is far too large (roughly 100^1000 possible combinations).
I have limited experience with “big data”, but I'm studying up on knapsack solvers, genetic algorithms, reinforcement learning, and other AI/ML approaches. Given my limited exposure in this area, how would you recommend I tackle a problem like this?
I tend to favor leveraging existing frameworks/libraries as much as possible. Is that a good idea here? Or would you recommend using Accord.NET, ML.NET, or some other library to build a custom model?
If existing frameworks are the way to go, any particular favorites? TensorFlow? Any thoughts on Google OR-Tools (https://developers.google.com/optimization/)? Anything in the AWS space?
Any good tutorials, videos, or podcasts that would get me prototyping quickly? (Keeping in mind my goal of deploying and validating the model on a Docker cluster.)
Thank you for any help and guidance!
The Cloud Balancing problem in OptaPlanner (open source, Java) is a multi-knapsack problem, and there's a tutorial for it in the user guide. Many users run OptaPlanner implementations on Docker (a plain OpenJDK 8 image) and AWS. Here's an Employee Rostering implementation that is deployed to OpenShift Dedicated (which builds a Docker image and runs it on AWS); it exposes a REST API (which is even documented with Swagger).
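Since you also asked about Google OR-Tools: for a sense of what the modeling layer looks like, here is a rough multi-knapsack sketch using its CP-SAT solver in Python. The item/bin data below is made up, and whether the solver can prove optimality in reasonable time depends entirely on your instance size.

```python
from ortools.sat.python import cp_model

# Toy data -- replace with your real items and bins.
values = [10, 30, 25, 50, 35]      # value of each item
weights = [48, 30, 42, 36, 36]     # weight of each item
capacities = [100, 100]            # capacity of each bin

model = cp_model.CpModel()
x = {}  # x[i, b] == 1 iff item i is placed in bin b
for i in range(len(values)):
    for b in range(len(capacities)):
        x[i, b] = model.NewBoolVar(f"x_{i}_{b}")

# Each item goes into at most one bin.
for i in range(len(values)):
    model.Add(sum(x[i, b] for b in range(len(capacities))) <= 1)

# Respect each bin's capacity.
for b in range(len(capacities)):
    model.Add(sum(weights[i] * x[i, b] for i in range(len(values))) <= capacities[b])

# Maximize the total packed value.
model.Maximize(
    sum(values[i] * x[i, b]
        for i in range(len(values))
        for b in range(len(capacities)))
)

solver = cp_model.CpSolver()
status = solver.Solve(model)
if status == cp_model.OPTIMAL:       # FEASIBLE means "best found so far, not proven"
    print("Proven optimal value:", solver.ObjectiveValue())
```

If proving optimality turns out to be too expensive at your scale, the same model can still be run with a time limit and you take the best solution found within it.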
Thanks to all for your insight above. I'm having a look at OptaPlanner and Google OR-Tools, as well as a few other solvers.
To follow up on this question: if I were to relax the constraint that I want the optimal answer and allow for “approximate” solutions, would this change your guidance or recommended tool set (libraries/frameworks) in any way?
I do machine learning code development on an AWS SageMaker instance, and I would like to make use of Emacs/TRAMP†. My question: how, if at all, could that be done? I strongly suspect that some knowledge of security protocols, IAM roles, etc., would be important, but I am a mere SageMaker end user.
A co-worker may be able to assist with some of these questions, but he is badly over-subscribed, and it is a big ask for him to indulge my personal preferences (however strong). So I'll start by asking: has anybody else already solved this problem, or are there good starting points to consider?
Already seen:
Martin Baillie's Emacs TRAMP over AWS SSM APIs, which seems light on key details and may not be applicable to SageMaker environments
AWS's own Tutorial: Set Up PyCharm Professional with a Development Endpoint, which is pretty specific to PyCharm and, again, may not be suitable for SageMaker environments
†Why? Because I have a lifetime of Emacs key bindings and macros, etc., that greatly improve my efficiency. Why not Emacs within a terminal running on the SageMaker instance? That's what I'm doing now, but it gives up important flexibility compared with a local windowed Emacs client: the latter can be as tall or wide as my pixels permit, can show multiple frames simultaneously, wouldn't suffer as many network latencies, and so on.
I'm totally new to AWS serverless architecture.
I was trying to lay out the project architecture, and I read about AWS CodeStar and how it can easily create new projects from templates for AWS Lambda using Python (which is my case).
But I don't know whether I should:
generate one project (the main project) with AWS CodeStar and then create separate folders for every microservice I have (UsersService, ContactService, etc.)
OR
generate every microservice via AWS CodeStar, so that each service is a separate CodeStar project for my Lambdas?
Maybe it's a very stupid question for some of you; please, any help or useful links are welcome.
Thanks
This is generally your decision over how you deploy, although I think the general consensus will be option 2. I'll try to explain why.
Option 1 is what you would call a monolith: everything for your app is in one place. This might initially seem great, but it has a few limitations, which I've detailed below:
All-or-nothing deployments: if you update a tiny part of the app, you have to redeploy everything.
Coupling between unrelated components: this design pattern tends to lead to overlapping changes that can break other parts of your stack.
Harder to scale: you generally scale in larger chunks (i.e. not search and booking independently, but everything all together).
You can mitigate these issues, but it can be a bit of a headache.
The second option leads more towards a Microservice/Decoupled Architecture.
Some of the benefits of this method are:
Only deploy the changes you've made: if the search service changes, only deploy that.
Easier to scale infrastructure to meet specific demand.
Easier to implement functional testing of individual components.
Ability to restrict access to the users who develop specific components.
Option 2 is the microservice-based repository setup, so that's what I would suggest.
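To make option 2 a bit more concrete, here is a minimal sketch of what two independently deployable Lambda services might look like. The file layout, handler names, and the assumption of an API Gateway proxy integration are all illustrative, not anything a particular CodeStar template gives you.

```python
# users_service/handler.py -- one independently deployable service
import json


def lambda_handler(event, context):
    """Handle GET /users/{id}, assuming an API Gateway proxy integration."""
    user_id = (event.get("pathParameters") or {}).get("id")
    return {
        "statusCode": 200,
        "body": json.dumps({"id": user_id, "name": "example user"}),
    }
```

```python
# contact_service/handler.py -- a second service, deployed and scaled on its own
import json


def lambda_handler(event, context):
    """Handle GET /contacts, assuming an API Gateway proxy integration."""
    return {
        "statusCode": 200,
        "body": json.dumps({"contacts": []}),
    }
```

Because each handler lives in its own project and pipeline, a change to the contacts code never forces a redeploy of the users code.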
I don't have enough reputation to comment, so I will post this as an answer.
What you are asking is a software architecture question: whether to use a monorepo or a polyrepo. You've already made the decision to use microservices, so this is not a monolith.
The answer is... it depends. There is no general consensus. Just search for monorepo vs. polyrepo (or multirepo) and be prepared to go down the rabbit hole.
Being a serverless application should have no bearing on the repository structure you decide on. However, CodeStar may have some limitations that make a monorepo more difficult to use; I'm using the CDK for that.
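For what it's worth, here is a rough sketch of the kind of CDK layout I mean (CDK v2 with Python here; adjust for your setup): one app in a monorepo, with one stack per microservice. The stack names and asset paths are just placeholders.

```python
#!/usr/bin/env python3
# app.py -- one CDK app in a monorepo, one stack per microservice.
import aws_cdk as cdk
from aws_cdk import Stack, aws_lambda as _lambda
from constructs import Construct


class UsersServiceStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        _lambda.Function(
            self, "UsersHandler",
            runtime=_lambda.Runtime.PYTHON_3_9,
            handler="handler.lambda_handler",
            code=_lambda.Code.from_asset("services/users"),  # assumed folder layout
        )


class ContactServiceStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        _lambda.Function(
            self, "ContactHandler",
            runtime=_lambda.Runtime.PYTHON_3_9,
            handler="handler.lambda_handler",
            code=_lambda.Code.from_asset("services/contacts"),
        )


app = cdk.App()
UsersServiceStack(app, "UsersService")
ContactServiceStack(app, "ContactService")
app.synth()
```

Even though both stacks live in one repository, `cdk deploy UsersService` still deploys that stack on its own.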
Here are a couple of articles to get you started:
https://medium.com/@mattklein123/monorepos-please-dont-e9a279be011b
https://medium.com/@adamhjk/monorepo-please-do-3657e08a4b70
Here is another that pertains directly to serverless applications:
https://lumigo.io/blog/mono-repo-vs-one-per-service/
I have been using SaltStack for a few years with bare-metal servers. Now we need to set up a whole new environment on AWS. I'd prefer to use SaltStack to set everything up, because I like Salt's orchestration and its event-based features, like beacons and reactors. Plus, it's easy to write your own customised Python modules. We will also be running Kubernetes clusters on EC2 instances. Can someone provide some best practices for using Salt with AWS and k8s?
There are a few reusable setups floating around; last I remember, https://github.com/valentin2105/Kubernetes-Saltstack was the most complete of them. But all of them are less solid than tools closer to the community mainstream (kops, kubespray), so beware of weird problems. I would recommend going through Kubernetes The Hard Way just so you have some familiarity with the underlying components that make up Kubernetes; that way you'll have a better chance of debugging them :)
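On the "easy to write your own customised Python module" point: a custom execution module really is just a Python file synced out to the minions, so gluing Salt and kubectl together can be as small as the sketch below. The module name, function, and output parsing are just illustrative.

```python
# _modules/k8s_health.py -- a made-up custom execution module. Put it in the
# _modules/ directory of your fileserver root and run
# `salt '*' saltutil.sync_modules` to push it out to the minions.
# __salt__ is injected by the Salt loader at runtime.


def node_ready(node_name):
    """Return True if kubectl on this minion reports the node as Ready."""
    out = __salt__["cmd.run"](
        "kubectl get node {} --no-headers".format(node_name),
        python_shell=False,
    )
    fields = out.split()
    # `kubectl get node <name> --no-headers` prints: NAME STATUS ROLES AGE VERSION
    return len(fields) > 1 and fields[1] == "Ready"
```

After syncing, it can be called like any built-in module, e.g. `salt 'node01*' k8s_health.node_ready node01` (the target is hypothetical).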
I'm not the only one with this question, but I haven't found much information in my research so far, so help me out.
We are a small IT crowd in an organization. We're looking to build a small, private service that would emulate a Heroku/GAE workflow. The basics of this: deploy an app as a git repository and have it scale in a 'cloud' environment. Basically, a platform as a service (PaaS).
Pretend we are amateur PMs, programmers, and sysadmins tasked with this. What would you recommend? We know generally what is needed: some sort of routing, a database, caching, authentication, etc. What other tools do we need?
We would prefer tools along a Ruby/Python/Haskell/Erlang dimension, on a Linux/BSD stack, with Postgres databases (CouchDB or Cassandra in the future). We are not touching anything in the MS/.NET area, and nothing on the JVM (we've looked at Steamcannon, but no; Scala and Clojure tools are not entirely out of the question). We have a basic grasp of bootstrapping a cloud (e.g. Eucalyptus) to build on. We understand the basics of server administration, and physical infrastructure limitations aren't a factor right now.
We're not looking for reasons why gaerokuyardspace is the best choice, a list of such services, why we should ditch our plans for one of these services, or an argument against this plan. For this situation, the decision has been made that the cost of building privately is more attractive than the cost of deploying elsewhere. We already know the why and how of these services. We're looking to emulate them and build on them for private needs.
A short list of tools to be expanded:
Beehive
Steamcannon
Gitosis/Gitolite
?
Basically, I'd like to generate a list of tools for building a Heroku/GAE-like service at a small, private, definitely experimental/toy level.
I don't know that it will meet all of your stated needs today, but you should take a look at Cloud Foundry from VMware. You can check the FAQ for the commercial project or look into the open-source version that you can host and manage yourself.
Some combination of Cloud Foundry (above), gitolite, and Fabric will probably do well for you. Any such solution will take some time to get right.
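To give a feel for the Fabric piece, a deploy task is usually just a small fabfile like the sketch below (Fabric 1.x API; the host names, paths, and service name are made up).

```python
# fabfile.py -- a rough sketch of the deploy glue Fabric gives you.
from fabric.api import cd, env, run, sudo, task

env.hosts = ["app1.internal", "app2.internal"]  # the machines the app runs on


@task
def deploy(ref="master"):
    """Update the checkout on each host to the given git ref and restart."""
    with cd("/srv/apps/myapp"):
        run("git fetch origin && git checkout {0}".format(ref))
        run("pip install -r requirements.txt")
    sudo("service myapp restart")
```

Run it with `fab deploy` (or `fab deploy:ref=v1.2` to push a specific ref).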
(Disclaimer: I'm a lead developer on the AppScale project)
AppScale is pretty much right up your alley, especially if you're looking to run Google App Engine apps in your own private cloud. It's open source, so grab it and extend it if there are other types of apps you want to support (and definitely commit it back to us if you do).
I'm trying to pick version control, continuous integration, and hosting for a smallish Flex + Ruby or Django project. Questions:
Version control: I've used SVN and CVS in the past. I hear great things about Git. Not sure what to pick.
Continuous integration: I've heard good things about Hudson and CruiseControl. Not sure what to pick.
Hosting: is my own server the only way to go? Are there decent cloud options that are not too expensive? Or should I look for a free hosting service?
thank you for your help!
Use Git.
Git is a great tool that allows a very flexible workflow. It has lots of benefits over Subversion/CVS, the biggest of which is the ability to branch and merge seamlessly. This can't be overstated. The merge hell that ensues when attempting to use SVN's branching and merging is a thing of the past. For a better case on why to use Git, check out http://whygitisbetterthanx.com/
Use Hudson.
Hudson is easily the best CI tool in the game. The reason Hudson is the best is that it's easy to configure (for one or multiple nodes), it has a ton of plugins, and it handles the 90% use case extremely well. You are in the 90% use case. People like Mozilla aren't. Check out C. Titus Brown's talk at PyCon for more info: http://pycon.blip.tv/file/3259794/ (If you decide that Hudson isn't what you should use, check out Buildbot.)
Use Webfaction (or Rackspace Cloud).
Webfaction is a great starting ground. If your needs are low, check them out. Beyond that, I'd suggest taking a hard look at Rackspace Cloud (RSC). RSC makes scaling out much easier, and their pricing model is very palatable for things that aren't bandwidth-intensive (i.e. most things that don't require tons of uploads/downloads). It starts at $10/mo. Their management console is good (save the DNS administration interface, but even that is more than bearable). If your needs expand beyond RSC (doubtful), you would do well to check out Amazon's EC2. Companies like RightScale can help when it comes to scaling out.