How Feast handle high traffic of my usecase - feast

Hello Feast Developers,
I want to implement Feast in my place to unify our features.
Currently we have usecase that given 100 product_id, sort from product that will likely clicked by user_id. It means and 1 request will hit Feast Serving API 100 times, and if the usecase RPS is 100 RPS, it means 10.000 RPS in Feast Serving API.
How can we make sure that Feast Serving API can handle all this request? Do Feast have solution to this problem?
Another concern is, can we separate the serving layer per usecase? Let say redis A is used by usecase A and redis B for usecase B. This is to make sure when 1 usecase have increased traffic, it won't affect the other.
Thank you for your answer ^^

Feast is already capable of supporting this use case. Each retrieval request to the online serving API can take a collection of entities, meaning you can send a list of products or users, or both. Feast will enrich those entities with feature values and respond.
Feast, however, does not currently support wildcard queries. You need to provide all entities, and Feast will look up the appropriate features.
In terms of having a serving layer per use case: That's totally possible. How most teams do this is to have a single repository for feature definitions, but have different Feast serving deployments (online store) for each use case and production system.

Related

Best practice for pre-warming cache by calling the API ahead of time

Asked by thomhlee, 0 votes
We have a service that is expected to not be able to meet latency requirement mainly due to downstream dependency latency.
We want some kind of a pre-fetch mechanism. What we decided we want to add cachcing (Doesn't have to be in-memory DB) ability, which can be manually warmed up ahead of time so when later fetched, it can be fetched quickly from cache instead of calling downstream services.
Which approach is better? We also have potential use cases to react to SNS events to pre-warm the cache ahead of time.
Approach #1: Use the same Fetch API to pre-warm cache.
Pros:
We only need to implement 1 API
Cons:
We cannot distinguish requests from pre-warming caching from requests that wants to fetch data.
One could throttle the other, this is a high TPS service
Approach #2: Have a separate API to request a pre-fetch, but use the normal fetch API later to get the data.
Pros:
One API will not affect the other API
Separation of concerns
Cons:
More development work / possible more infrastructure
Thanks!

Microservice granularity: Per domain model or not?

When building a microservice oriented application, i wonder what could be the appropriate microservice granularity.
Let's image an application consisting of:
A set of various resources types where each resource map a given business model. (ex: In a todo app resources could be User, TodoList and TodoItem...)
Each of those resources are saved within a NoSQL database that could be replicated.
Each of those resources are exposed through a REST Api
The application manage an internal chat room.
An Api gateway for gathering chat room and REST api interaction.
The application front end: an SPA application connected to the API Gateway
The first (and naive) approach when thinking about how microservices could match the need of this application would be:
One monolith service for managing EVERY resources and business logic:
By managing i mean providing the REST API for all of those resources and handling the persistance of those resources within the database.
One service for each Database replica.
One service providing the internal chat room using websocket or whatever.
One service for Authentification.
One service for the api gateway.
One service serving the static assets for the SPA front end.
An other approach could be to split service 1 into as many service as business models exist in the system. (let's call those services resource services)
I wonder what are the benefit of this second approach.
In fact i see a lot of downsides with this approach:
Need to setup an inter service communication process.
When requesting a service representing resource X that have a relation with resource Y, a lot more work are needed (i.e: interservice request)
More devops work.
More difficulty to share common code between resource services.
Where to put business logic ?
When starting a fresh project this second approach seams to me a bit of an over engineered work.
I feel like starting with the first approach and THEN split the monolith resource service into several specific services depending on the observed needs will minimize the complexity and risks.
What's your opinions regarding that ?
Is there any best practices ?
Thanks a lot !
The first approach is not microservice way, by definition.
And yes, idea is to split - each service for Bounded Context - One for Users, one for Inventory, Todo things etc etc.
The idea of microservices, at very simple, assumes:
You want to pay extra dev-ops work for modularity, and complete/as much as possible removal of dependencies between different bounded contexts (see dev/product/pjm teams).
It's idea lies around ownership, modularity, allowing separate teams develop their own piece of code, without requirement from them to know the rest of the system . As long as there is Umbiqutious Language (common set of conventions/communication protocols/terminology/documentation) they can work in completeley isolated, autotonmous fashion.
Maintaining, managing, testing, and develpoing become much faster - in cost of initial dev-ops and sophisticated architecture engeneering investment.
Sharing code should be minimal, and if required, could be done to represent the Umbiqutious Language (common communication interface/set of conventions). Sharing well-documented code, which acts as integration/infrastructure mini-framework, and have special dev/dev-ops/team attached to it ccould be easy business, as long as it, as i said, well-documented, and threated as separate architecture-related sub-project.
Properly engeneered Microservice architecture could lessen maintenance and development times by huge margin, but it requires quite serious reason to use it (there lot of reasons, and lots of articles on that, I wont start it here) and quite serious engeneering investment at start.
It brings modularity, concept of ownership, de-coupling of different contexts of your app.
My personal advise check if you really need MS architecture. If you can not invest engenerring though and dev-ops effort at start and do not have proper reasons for such system - why bother?
If you do need MS, i would really advise against the first method. You will develop wrong thing's, will miss the true challenges of MS, and could end with huge refactor, which could take more work than engeneering MS system from start properly. It's like to make square to make it fit into round bucket later.
Now answering your question title: granularity. (your question body bit different from your post title).
Attach it to Domain Model / Bounded Context. You can make meaty services at start, in order to avoid complex distributed transactions.
First just answer question if you need them in your design/architecture?
If not, probably you did a good design.
Passing reference ids between models from different microservices should suffice, and if not, try to rethink if more of complex transactions could be avoided.
If your system have unavoidable amount of distributed trasnactions, perhaps look towards using/making some CQRS mini-framework as your "shared code infrastructure component" / communication protocol.
It is the key problem of the microservices or any other SOA approach. It is where the theory meets the reality. In general you should not force the microservices architecture for the sake of it. This should rather naturally come from functional decomposition (top-down) and operational, technological, dev-ops needs (bottom-up). First approach is closer to what you would need to do, however at the first step do not focus so much on the technology aspect. Ask yourself why would you need to implement a separate service for particular business function. Treat it as a micro-application with all its technical resources. Ask yourself if there is reason to implement particular function as a full-stack app.
Some, of the functionalities you have mentioned in scenario 1) are naturally ok, such as 'authentication' service - this is probably good candidate.
For the business functions decomposition into separate service, focus on the 'dependencies' problem, if there are too many dependencies and you see that you have to implement bigger chunk of data mode - naturally this is not a micro service any more.
Try to put litmus test , if you can 'turn off' particular functionality and the system still makes sense - it is the candidate for service or further decomposition

Microservices Architecture: Cross Service data sharing

Consider the following micro services for an online store project:
Users Service keeps account data about the store's users (including first name, last name, email address, etc')
Purchase Service keeps track of details about user's purchases.
Each service provides a UI for viewing and managing it's relevant entities.
The Purchase Service index page lists purchases. Each purchase item should have the following fields:
id, full name of purchasing user, purchased item title and price.
Furthermore, as part of the index page, I'd like to have a search box to let the store manager search purchases by purchasing user name.
It is not clear to me how to get back data which the Purchase Service does not hold - for example: a user's full name.
The problem gets worse when trying to do more complicated things like search purchases by purchasing user name.
I figured that I can obviously solve this by syncing users between the two services by broadcasting some sort of event on user creation (and saving only the relevant user properties on the Purchase Service end). That's far from ideal in my perspective. How do you deal with this when you have millions of users? would you create millions of records in each service which consumes users data?
Another obvious option is exposing an API at the Users Service end which brings back user details based on given ids. That means that every page load in the Purchase Service, I'll have to make a call to the Users Service in order to get the right user names. Not ideal, but I can live with it.
What about implementing a purchase search based on user name? Well I can always expose another API endpoint at the Users Service end which receives the query term, perform a text search over user names in the Users Service, and then return all user details which match the criteria. At the Purchase Service, map the relevant ids back to the right names and show them in the page. This approach is not ideal either.
Am I missing something? Is there another approach for implementing the above? Maybe the fact that I'm facing this issue is sort of a code smell? would love to hear other solutions.
This seems to be a very common and central question when moving into microservices. I wish there was a good answer for that :-)
About the suggested pattern already mentioned here, I would use the term Data Denormalization rather than Polyglot Persistence, as it doesn't necessarily needs to be in different persistence technologies. The point is that each service handles its own data. And yes, you have data duplication and you usually need some kind of event bus to share data across services.
There's another option, which is a sort of a take on the first - making the search itself as a separate service.
So in your example, you have the User service for managing users. The Purchases services manages purchases. Each handles its own data and only the data it needs (so, for instance, the Purchases service doesn't really need the user name, only the ID). And you have a third service - the Search Service - that consumes data produced by other services, and creates a search "view" from the combined data.
It's totally fine to keep appropriate data in different databases, it's called Polyglot Persistence. Yes, you would like to keep user data and data about purchases separately and use message queue for sync. Millions of users seems fine to me, it's scalability, not design issue ;-)
In case of search - you probably want to search more than just username, right? So, if you use message queue to update data between services you can also easily route this data to ElasticSearch, for example. And from ElasticSearch perspective it doesn't really matter what field to index - username or product title.
I usually use both approaches. Sometimes i have another service which is sitting on top on x other services and combines the data. I don't really like this approach because it is causing dependencies and coupling between services. So in general, within my last projects we tried to stick to polyglot persistence.
Also think about, if you need to have x sub http requests for combining data in some kind of middleware service, it will lead you to higher latency. We always try to cut down the amount of requests for one task and handle everything what is possible through asynchronous queues. ( especially data sync )
If you conceptualize modules as the owners and controllers of the data they work on, then your model must also communicate that data out of that module to others. In contrast, the modules in a manufacturing process have the access to change data without possessing and controlling it.
Microservices is an architecture for distributed processing, like most code, where modules pass the data around to work on it. From classic articles by Harvard Business Review and McKinsey on the subject of owning members of a supply chain, I identified complexities arising from this model and wrote an article teaching programmers what you need to know: http://www.powersemantics.com/p.html
Manufacturing is an architecture for integrated processing, where modules work on the data without passing it around from point to point. This can be accomplished by having modules configured to access the same memory, files or database tables. My architecture shows how to accomplish this on memory via reference properties.
When you consider "exposing an API at the Users Service end which brings back user details based on given ids", you need to be aware that creates what HBR calls "irreversible" complexity, which I've dubbed centralization complexity. Don't build A->B (distributed) systems, because you can't decentralize them later after failing to separate requirements. Requirements in production processes represent user instructions, and centralized modules only enable you to change the wrong users' processes. In other words, centralized modules don't document user groups or distinguish them from derived-product-users.

Is it possible to use Django and Node.Js?

I have a django backend set up for user-logins and user-management, along with my entire set of templates which are used by visitors to the site to display html files. However, I am trying to add real-time functionality to my site and I found a perfect library within Node.Js that allows two users to type in a text box and have the text appear on both their screens. Is it possible to merge the two backends?
It's absolutely possible (and sometimes extremely useful) to run multiple back-ends for different purposes. However it opens up a few cans of worms, depending on what kind of rigour your system is expected to have, who's in your team, etc:
State. You'll want session state to be shared between different app servers. The easiest way to do this is to store external session state in a framework-agnostic way. I'd suggest JSON objects in a key/value store and you'll probably benefit from JSON schema.
Domains/routing. You'll need your login cookie to be available to both app servers, which means either a single domain routed by Apache/Nginx or separate subdomains routed via DNS. I'd suggest separate subdomains for the following reason
Websockets. I may be out of date, but to my knowledge neither Apache nor Nginx support proxying of websockets, which means if you want to use that you'll sacrifice the flexibility of using an http server as a app proxy and instead expose Node directly via a subdomain.
Non-specified requirements. Things like monitoring, logging, error notification, build systems, testing, continuous integration/deployment, documentation, etc. all need to be extended to support a new type of component
Skills. You'll have to pay in time or money for the skill-sets required to manage a more complex application architecture
So, my advice would be to think very carefully about whether you need this. There can be a lot of time and thought involved.
Update: There are actually companies springing around who specialise in adding real-time to existing sites. I'm not going to name any names, but if you look for 'real-time' on the add-on marketplace for hosting platforms (e.g. Heroku) then you'll find them.
Update 2: Nginx now has support for Websockets
You can't merge them. You can send messages from Django to Node.Js through some queue system like Reddis.
If you really want to use two backends, you could use a database that is supported by both backends.
Though I would not recommended it.

Difference between frontend, backend, and middleware in web development

I was wondering if anyone can compare/contrast the differences between frontend, backend, and middleware ("middle-end"?) succinctly.
Are there cases where they overlap?
Are there cases where they MUST overlap, and frontend/backend cannot be separated?
In terms of bottlenecks, which end is associated with which type of bottlenecks?
Here is one breakdown:
Front-end tier -> User Interface layer usually consisting of a mix of HTML, Javascript, CSS, Flash, and various server-side code like ASP.Net, classic ASP, PHP, etc. Think of this as being closest to the user in terms of code.
Middleware, middle-tier -> One tier back, generally referred to as the "plumbing" part of a system. Java and C# are common languages for writing this part that could be viewed as the glue between the UI and the data and can be webservices or WCF components or other SOA components possibly.
Back-end tier -> Databases and other data stores are generally at this level. Oracle, MS-SQL, MySQL, SAP, and various off-the-shelf pieces of software come to mind for this piece of software that is the final processing of the data.
Overlap can exist between any of these as you could have everything poured into one layer like an ASP.Net website that uses the built-in AJAX functionality that generates Javascript while the code behind may contain database commands making the code behind contain both middle and back-end tiers. Alternatively, one could use VBScript to act as all the layers using ADO objects and merging all three tiers into one.
Similarly, taking middleware and either front or back-end can be combined in some cases.
Bottlenecks generally have a few different levels to them:
1) Database or back-end processing -> This can vary from payroll or sales or other tasks where the throughput to the database is bogging things down.
2) Middleware bottlenecks -> This would be where some web service may be hitting capacity but the front and back ends have bandwidth to handle more traffic. Alternatively, there may be some server that is part of a system that isn't quite the UI part or the raw data that can be a bottleneck using something like Biztalk or MSMQ.
3) Front-end bottlenecks -> This could client or server-side issues. For example, if you took a low-end PC and had it load a web page that consisted of a lot of data being downloaded, the client could be where the bottleneck is. Similarly, the server could be queuing up requests if it is getting hammered with requests like what Amazon.com or other high-traffic websites may get at times.
Some of this is subject to interpretation, so it isn't perfect by any means and YMMV.
EDIT: Something to consider is that some systems can have multiple front-ends or back-ends. For example, a content management system will likely have a way for site visitors to view the content that is a front-end but what about how content editors are able to change the data on the site? The ability to pull up this data could be seen as front-end since it is a UI component or it could be seen as a back-end since it is used by internal users rather than the general public viewing the site. Thus, there is something to be said for context here.
Generally speaking, people refer to an application's presentation layer as its front end, its persistence layer (database, usually) as the back end, and anything between as middle tier. This set of ideas is often referred to as 3-tier architecture. They let you separate your application into more easily comprehensible (and testable!) chunks; you can also reuse lower-tier code more easily in higher tiers.
Which code is part of which tier is somewhat subjective; graphic designers tend to think of everything that isn't presentation as the back end, database people think of everything in front of the database as the front end, and so on.
Not all applications need to be separated out this way, though. It's certainly more work to have 3 separate sub-projects than it is to just open index.php and get cracking; depending on (1) how long you expect to have to maintain the app (2) how complex you expect the app to get, you may want to forgo the complexity.
There are in fact 3 questions in your question :
Define frontend, middle and back end
How and when do they overlap ?
Their associated usual bottlenecks.
What JB King has described is correct, but it is a particular, simple version, where in fact he mapped front, middle and bacn to an MVC layer.
He mapped M to the back, V to the front, and C to the middle.
For many people, it is just fine, since they come from the ugly world where even MVC was not applied, and you could have direct DB calls in a view.
However in real, complex web applications, you indeed have two or three different layers, called front, middle and back. Each of them may have an associated database and a controller.
The front-end will be visible by the end-user. It should not be confused with the front-office, which is the UI for parameters and administration of the front. The front-end will usually be some kind of CMS or e-commerce Platform (Magento, etc.)
The middle-end is not compulsory and is where the business logics is. It will be based on a PIM, a MDM tool, or some kind of custom database where you enrich your produts or your articles (for CMS). It'll also be the place where you code business functions that need to be shared between differents frontends (for instance between the PC frontend and the API-based mobile application). Sometimes, an ESB or tool like ActiveMQ will be your middle-end
The back-end will be a 3rd layer, surrouding your source database or your ERP. It may be jsut the API wrting to and reading from your ERP. It may be your supplier DB, if you are doing e-commerce. In fact, it really depends on web projects, but it is always a central repository. It'll be accessed either through a DB call, through an API, or an Hibernate layer, or a full-featured back-end application
This description means that answering the other 2 questions is not possible in this thread, as bottlenecks really depend on what your 3 ends contain : what JB King wrote remains true for simple MVC architectures
at the time the question was asked (5 years ago), maybe the MVC pattern was not yet so widely adopted. Now, there is absolutely no reason why the MVC pattern would not be followed and a view would be tied to DB calls.
If you read the question "Are there cases where they MUST overlap, and frontend/backend cannot be separated?" in a broader sense, with 3 different components, then there times when the 3 layers architecture is useless of course. Think of a simple personal blog, you'll not need to pull external data or poll RabbitMQ queues.
Here is a real world example which shows front/mid/back end.
General description:
Frontend is responsible for presenting data to user. Please note interesting quirk that you may have two different front ends associated with single backend
Backend provides business logic/data persistence.
Middleware (activemq in the picture) is responsible for system to system. integration between backends. Usually it is installed as separate application
Overlapping:
It is possible to have overlapping between frontend and backend. This usually leaads to long-term issues with application maintenance and scalability. Fairly common in legacy applications.
Most modern technology stacks encourage developers to have strict separation. For example in the picture you can see that backend of the first system has rest web service which is a clear separation line.
Bottlenecks
Most bottlenecks in large are caused by database/network. Databases are located in backend. As for network issues every connection goes through netowrk, so every connection has potential for being slow. With good application design these issues are avoidable to large extend.
In terms of networking and security, the Backend is by far the most (should be) secure node.
The middle-end portion, usually being a web server, will be somewhat in the wild and cut off in many respects from a company's network. The middle-end node is usually placed in the DMZ and segmented from the network with firewall settings. Most of the server-side code parsing of web pages is handled on the middle-end web server.
Getting to the backend means going through the middle-end, which has a carefully crafted set of rules allowing/disallowing access to the vital nummies which are stored on the database (backend) server.
Frontend refers to the client-side, whereas backend refers to the server-side of the application. Both are crucial to web development, but their roles, responsibilities and the environments they work in are totally different. Frontend is basically what users see whereas backend is how everything works
Frontend -> these are the client side of a website from where a user can interact with the server through User Interface. generally built using Html and CSS.
Middleware -> Middleware are the software or service which is responsible for the system to communicate and manage the data. it handles the communication between components and input/output
Backend -> Backend are the server side of any application which consist of all functioning and operations performed on data. this part is considered to be most essential part of any application. Only the server admin have access to this. it mainly consist of database and servers.