I have a web service that will be called by about 100,000 users over a short period (within 3 hours). The service reads and updates a SQL database using Entity Framework 4.1. Here is the code:
[WebMethod]
public bool addVotes(string username, string password, int votes)
{
    bool success = false;
    if (Membership.ValidateUser(username, password) == true)
    {
        DbContext context = new DbContext();
        AppUsers user = context.AppUsers.Where(x => x.Username.Equals(username)).FirstOrDefault();
        if (user != null)
        {
            user.Votat += votes;
            context.SaveChanges();
            success = true;
        }
    }
    return success;
}
The web service will be called from Android phones (as I said, maybe 100,000, maybe more, maybe less, but that's not important right now). Is there a possibility of deadlock, or of anything else going wrong?
What happens when reading from the database, and what happens when updating? As one of the answers said, I am updating just the vote field (Votat) for each user. If there is a problem with this, how would you advise me to correct it?
Thank You in advance :)
This should be fine.
The reason I say that is that, as far as I can tell, the only thing that happens when this method is called on behalf of a user is that the vote count (Votat) in their row in the database is increased. As long as each user only touches their own row, and not any row that might also be touched by one of the 99,999 other users, there is no contention between users, and this should scale well.
Problem
I'm using mssql v6.2.0 in a Lambda that is invoked frequently (consistently ~25 concurrent invocations under standard load).
I seem to be having trouble with connection pooling or something because I keep having tons of open DB connections which overwhelm my database (SQL Server on RDS) causing the Lambdas to just time out waiting for query results.
I have read the docs, various similar questions, Github issues, etc. but nothing has worked for this particular issue.
Things I've Learned Already
I did learn that pooling is possible across invocations due to the fact that variables outside the handler function are shared across invocations in the same container. This makes me think I should see just a few connections for each container running my Lambda, but I don't know how many that is so it's hard to verify. Bottom line is that pooling should keep me from having tons and tons of open connections, so something isn't working right.
There are several different ways to use mssql and I have tried several of them. Notably I've tried specifying max pool size with both large and small values but got the same results.
AWS recommends that you check to see if there's already a pool before trying to create a new one. I tried that to no avail. It was something like pool = pool || await createPool()
I know that RDS Proxy exists to help with situations like this, but it appears it isn't offered (at this time) for SQL Server instances.
I do have the ability to slow down my data a bit, but this has a slight impact on the performance of the product as a whole, so I don't want to do that just to avoid solving a DB connections issue.
Left unchecked, I saw as many as 700 connections to the DB at once, leading me to think there's a leak of some kind and it's maybe not just a reasonable result of high usage.
I didn't find a way to shorten the TTL for connections on the SQL Server side as recommended by this re:Invent slide. Perhaps that is part of the answer?
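For reference, the module-scope reuse described in the list above could be sketched as follows. This is a minimal illustration, not the code from the question: `reusePool`, the stand-in factory and the `handler` shape are all hypothetical names. With mssql the factory might be something like `() => new sql.ConnectionPool(config).connect()`, but that wiring is an assumption and is not shown here.

```javascript
// Hypothetical helper: wraps a pool factory so it runs at most once per container.
function reusePool(createPool) {
  let poolPromise = null; // survives between invocations in the same container

  return function getPool() {
    if (poolPromise === null) {
      poolPromise = Promise.resolve()
        .then(createPool)
        .catch((err) => {
          poolPromise = null; // forget a failed attempt so the next call can retry
          throw err;
        });
    }
    return poolPromise;
  };
}

// Stand-in factory for illustration; it only counts how often it is called.
let poolsCreated = 0;
const getPool = reusePool(async () => {
  poolsCreated += 1;
  return { id: poolsCreated };
});

// Assumed handler shape: every invocation asks for the shared pool.
async function handler() {
  const pool = await getPool();
  return pool.id;
}
```

However many times `handler` runs in one container, the factory is called once. A separate container still creates its own pool, so the total number of connections also depends on how many containers are alive.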
Code
'use strict';

/* Dependencies */
const sql = require('mssql');
const fs = require('fs').promises;
const path = require('path');
const AWS = require('aws-sdk');
const GeoJSON = require('geojson');

AWS.config.update({ region: 'us-east-1' });
var iotdata = new AWS.IotData({ endpoint: process.env['IotEndpoint'] });

/* Export */
exports.handler = async function (event) {
    let myVal = event.Records[0].Sns.Message;

    // Gather prerequisites in parallel
    let [
        query1,
        query2,
        pool
    ] = await Promise.all([
        fs.readFile(path.join(__dirname, 'query1.sql'), 'utf8'),
        fs.readFile(path.join(__dirname, 'query2.sql'), 'utf8'),
        sql.connect(process.env['connectionString'])
    ]);

    // Query DB for updated data
    let results = await pool.request()
        .input('MyCol', sql.TYPES.VarChar, myVal)
        .query(query1);

    // Prepare IoT Core message
    let params = {
        topic: `${process.env['MyTopic']}/${results.recordset[0].TopicName}`,
        payload: convertToGeoJsonString(results.recordset),
        qos: 0
    };

    // Publish results to MQTT topic
    try {
        await iotdata.publish(params).promise();
        console.log(`Successfully published update for ${myVal}`);

        // Query 2
        await pool.request()
            .input('MyCol1', sql.TYPES.Float, results.recordset[0]['Foo'])
            .input('MyCol2', sql.TYPES.Float, results.recordset[0]['Bar'])
            .input('MyCol3', sql.TYPES.VarChar, results.recordset[0]['Baz'])
            .query(query2);
    } catch (err) {
        console.log(err);
    }
};

/**
 * Convert query results to GeoJSON for API response
 * @param {Array|Object} data - The query results
 */
function convertToGeoJsonString(data) {
    let result = GeoJSON.parse(data, { Point: ['Latitude', 'Longitude'] });
    return JSON.stringify(result);
}
Question
Please help me understand why I'm getting runaway connections and how to fix it. For bonus points: what's the ideal strategy for handling high DB concurrency on Lambda?
Ultimately this service needs to handle several times the current load -- I realize this becomes a quite intense load. I'm open to options like read replicas or other read-performance-boosting measures as long as they're compatible with SQL Server, and they're not just a cop out for writing proper DB access code.
Please let me know if I can improve the question. I know there are similar ones out there but I have read/tried a lot of them and didn't find them to help. Thanks in advance!
Related Material
https://forums.aws.amazon.com/thread.jspa?messageID=678029 (old, but similar)
https://www.slideshare.net/AmazonWebServices/best-practices-for-using-aws-lambda-with-rdsrdbms-solutions-srv320 re:Invent slide deck
https://www.jeremydaly.com/reuse-database-connections-aws-lambda/ Relevant info but for MySQL instead of SQL Server
Answer
I finally found the answer after 4 days of effort. All I needed to do was scale up the DB. The code is actually fine as-is.
I went from db.t2.micro to db.t3.small (or 1 vCPU, 1GB RAM to 2 vCPU and 2GB RAM) at a net cost of roughly $15/mo.
Theory
In my case, the DB probably couldn't handle the processing (which involves several geographic calculations) for all my invocations at once. I did see CPU go up, but I assumed that was a result of the high number of open connections. When the queries slowed down, concurrent invocations piled up as Lambdas waited for results, until they finally timed out without closing their connections properly.
Comparisons:
db.t2.micro:
200+ DB connections (goes up continuously if you leave it running)
50+ concurrent invocations
5000+ ms Lambda duration when things slow down, ~300ms under no load
db.t3.small:
25-35 DB connections (constantly)
~5 concurrent invocations
~33 ms Lambda duration <-- ten times faster!
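The pile-up described in the theory is what Little's law predicts: average concurrency is roughly the arrival rate multiplied by the average duration. The sketch below is only a back-of-the-envelope illustration. The rate of 10 invocations per second is an assumed figure, not one measured in this answer; the durations are the ~300 ms and 5000 ms values from the comparison.

```javascript
// Little's law: average concurrency = arrival rate x average time in the system.
function expectedConcurrency(invocationsPerSecond, durationMs) {
  return (invocationsPerSecond * durationMs) / 1000;
}

const assumedRate = 10; // invocations per second; an assumption, not a measurement

console.log(expectedConcurrency(assumedRate, 300));  // duration with a responsive DB
console.log(expectedConcurrency(assumedRate, 5000)); // duration once queries slow down
```

At a fixed arrival rate, concurrency grows in proportion to duration. If every in-flight invocation holds at least one connection, a slower database means more open connections even when the code is unchanged.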
CloudWatch Dashboard
Summary
I think this issue was confusing to me because it didn't smell like a capacity issue. Almost every time I've dealt with high DB connections in the past, it has been a code error. Having tried options there, I thought it was "some magical gotcha of serverless" that I needed to understand. In the end it was as simple as changing DB tiers. My takeaway is that DB capacity issues can manifest themselves in ways other than high CPU and memory usage, and that high connections may be a result of something besides a code bug.
Update (4 months in)
This continues to work very well. I'm impressed that doubling the DB resources seems to have given more than 2x the performance. Now, when the number of DB connections gets really high (even over 1k) because of load or a temporary bug during development, the DB handles it. I'm not seeing any issues at all with DB connections timing out or the database getting bogged down under load. Since the original time of writing I've added several CPU-intensive queries to support reporting workloads, and it continues to handle all these loads simultaneously.
We've also deployed this setup to production for one customer since the time of writing and it handles that workload without issue.
A connection pool is no good on Lambda at all. What you can do is reuse connections.
The trouble is that if every Lambda execution opens a pool, it will just flood the DB, which is what you're seeing. You want one connection per Lambda container. You can use a DB class like this (it's rough, but let me know if you've got questions):
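// Assumption: `mysql` is a promise-based client such as promise-mysql, whose
// createConnection() and query() return promises. The answer does not name the library.
import mysql from 'promise-mysql'

export default class MySQL {
  constructor() {
    this.connection = null
  }

  // Reuse the container's existing connection, or open one if none is usable
  async getConnection() {
    if (this.connection === null || this.connection.state === 'disconnected') {
      return this.createConnection()
    }
    return this.connection
  }

  async createConnection() {
    this.connection = await mysql.createConnection({
      host: process.env.dbHost,
      user: process.env.dbUser,
      password: process.env.dbPassword,
      database: process.env.database,
    })
    return this.connection
  }

  // Run a query on the shared connection; logs the error and returns false on failure
  async query(sql, params) {
    await this.getConnection()
    const [err, rows] = await to(this.connection.query(sql, params))
    if (err) {
      console.log(err)
      return false
    }
    return rows
  }
}

// Turn a promise into an [error, data] pair so callers can avoid try/catch
function to(promise) {
  return promise.then((data) => {
    return [null, data]
  }).catch(err => [err])
}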
What you need to understand is that a Lambda execution is a little virtual machine that does a task and then stops. It sits there for a while afterwards, and if another invocation needs it, the container is reused along with its connection. A container handles a single task at a time, so there are never multiple connections from a single Lambda container.
Hope this helps let me know if ya need any more detail! Oh and welcome to stackoverflow, that's a well-constructed question.
We have a Play app, currently using version 2.6. We are trying to prevent dictionary attacks against our login by delaying a "failed login" message back to our users when they provide a failed password. We currently hash and salt and have all the best practices, but we are not sure if we are delaying correctly. So we have in our Controller:
public Result login() { return ok(loginHtml); }
and we have a:
public Result loginAction()
{
    // Check for user in database
    User user = User.find.query()...
    // Was the user found?
    if (user == null) {
        // Wrong password! Delay and redirect
        Thread.sleep(10000); // <-- how do we delay correctly?
        return redirect(routes.Controller.login());
    }
    // User is not null, so all good!
    ...
}
We are not sure if Thread.sleep(10000) is the best way to delay a response, since this might hang other requests that come in, or use too many threads from the default pool. We have noticed that at 80+ hits per second the Play Framework does not route our HTTP calls to the Routes. That is, if we receive an HTTP POST request, our app will not even send that request to the Controller until 20+ seconds later. HOWEVER, if we get an HTTP GET request in the SAME time period, our app will process that GET instantly!
Currently we have 300 threads as the min/max in our Akka settings for the default fork pool. Any insights would be appreciated. We run a t2.xlarge AWS EC2 instance running Ubuntu.
Thank you.
Thread.sleep blocks the current thread; please avoid using it in production code as much as possible.
What you need is CompletionStage / CompletableFuture, or any other abstraction for asynchronous programming and asynchronous actions.
For more details about asynchronous actions, take a look at: https://www.playframework.com/documentation/2.8.x/JavaAsync
In your case the solution would look something like this (excuse me, it might contain mistakes, as I'm primarily a Scala engineer):
import play.libs.concurrent.HttpExecutionContext;
import play.mvc.*;
import javax.inject.Inject;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class LoginController extends Controller {
    private final HttpExecutionContext httpExecutionContext;
    // Create and inject a separate ScheduledExecutorService
    private final ScheduledExecutorService executor;

    @Inject
    public LoginController(HttpExecutionContext ec,
                           ScheduledExecutorService executor) {
        this.httpExecutionContext = ec;
        this.executor = executor;
    }

    public CompletionStage<Result> loginAction() {
        User user = User.find.query()...
        if (user == null) {
            // Complete the response after 10 seconds without blocking a request thread
            CompletableFuture<Result> delayed = new CompletableFuture<>();
            executor.schedule(
                () -> delayed.complete(redirect(routes.Controller.login())),
                10, TimeUnit.SECONDS);
            return delayed;
        } else {
            // return another response
        }
    }
}
Hope this helps!
I don't like this approach at all. This hogs threads for no reason and can probably cause your entire system to lock up if someone finds out you are doing this and they have malicious ideas. Let me propose a better approach:
In the User table store a nullable LocalDateTime of the last login attempt time.
When you fetch the user from the DB, check the last attempt time (compare it to LocalDateTime.now()). Only if 10 seconds have passed since the last attempt, perform the password comparison.
If passwords don't match store the last attempt time as now.
This can also be handled gracefully on the front end if you provide good error responses.
EDIT: If you want to delay login attempts NOT based on the user, you could create an attempt table and store last attempt by IP address.
If you really want to do it your way, which I don't recommend, you need to read up on this first: https://www.playframework.com/documentation/2.8.x/ThreadPools
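The last-attempt check proposed in this answer could be sketched as below. It is written in JavaScript purely as an illustration of the logic. The in-memory `Map` stands in for the nullable last-attempt column (or the per-IP attempt table), and `attemptLogin`, its parameters and its return values are hypothetical names. The `passwordMatches` flag stands in for the real hashed-password comparison.

```javascript
const COOLDOWN_MS = 10000; // 10 seconds, as in the answer

// Hypothetical in-memory stand-in for the "last attempt" column, keyed by username or IP.
const lastAttemptAt = new Map();

// Returns "throttled", "ok" or "wrong-password" without ever blocking a thread.
function attemptLogin(key, passwordMatches, now) {
  const last = lastAttemptAt.get(key);
  if (last !== undefined && now - last < COOLDOWN_MS) {
    return "throttled"; // too soon after a failed attempt: skip the password check
  }
  if (passwordMatches) {
    return "ok";
  }
  lastAttemptAt.set(key, now); // remember when the failed attempt happened
  return "wrong-password";
}
```

A failed attempt is answered immediately, and any retry inside the cooldown window is rejected before the password is even compared, so no request thread is ever parked.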
I have an Analytics pipeline added just before the standard one in section to delete duplicate triggered pageevents before submitting all to database so I can have unique triggered events as there seems to be a bug on android/ios devices that triggers several events within few seconds interval.
In this custom pipeline I need to get the list of all goals/events the current user triggered in his session so I can compare with the values in dataset obtained from args parameter and delete the ones already triggered.
The args.DataSet.Tables["PageEvents"] only returns the set to be submitted to database and that doesn't help since it is changing each time this pipeline runs. I also tried Sitecore.Analytics.Tracker.Visitor.DataSet but I get a null value for these properties.
Does anyone know how to get a list of all the goals the user has triggered so far in their session without requesting it directly from the database?
Some code:
public class CommitUniqueAnalytics : CommitDataSetProcessor
{
    public override void Process(CommitDataSetArgs args)
    {
        Assert.ArgumentNotNull(args, "args");
        var table = args.DataSet.Tables["PageEvents"];
        if (table != null)
        {
            //Sitecore.Analytics.Tracker.Visitor.DataSet.PageEvents - this list always empty
            ...........
        }
    }
}
I had a similar question.
In Sitecore 7.5 I found that this worked:
Tracker.Current.Session.Interaction.Pages.SelectMany(x=>x.PageEvents)
However I'm a little worried that this will be inefficient if the Pages collection is very large.
I'm developing my PHP software using Doctrine2. It is quite simple to use, but I have a small problem and would like to know the best practice in this situation. Maybe you can help me! You'll have all my gratitude :-D
Situation :
I have 2 entities (User and Contacts)
A User can contain some Contacts
The Contacts entity (table) has a field named mainContact which defines whether it is the user's main contact or not.
Only one contact can be the main contact (mainContact=1)
Problem :
When I persist a contact, I would like the following:
If this contact has mainContact=1, all other contacts associated with the user should be updated to mainContact=0
If this contact has mainContact=0, I need to check all other contacts. If I don't find any other contact with mainContact=1 for this user, I automatically update the current contact with setMainContact(true).
Possible solutions :
I have some ideas for how to implement this logic, but I would like to know the best practice so that the code is good, because this will be an open source application.
Ideas that don't seem clean :
Create a method in the Contact repository that updates all the other contacts assigned to the user and returns the value to assign to the current contact.
With this solution, I must always call the repository method before persisting a contact, everywhere in the application. If I forget to call it, the database integrity will be compromised.
Use the PrePersist mechanism of the entity to get the entity manager and update all the user's other contacts.
This method is not recommended; the entity should never access the entity manager directly.
Can anyone tell me what the best practice is? Thank you very much!
PS : Sorry for my poor English!
The best thing you can do here (from a pure OOP perspective, without even the persistence logic) is to implement this logic in your entity's setters. After all, the logic isn't heavy, considering that a User won't have many contacts and the operation won't happen very often.
<?php
class User
{
    protected $contacts;

    // constructor, other fields, other methods

    public function addContact(Contact $contact)
    {
        if ($this->contacts->contains($contact)) {
            return;
        }

        if ($contact->isMainContact()) {
            foreach ($this->contacts as $existingContact) {
                $existingContact->setMainContact(false);
            }
            $this->contacts->add($contact);
            $contact->setUser($this); // set the owning side of the relation too!
            return;
        }

        $mainContact = true;
        foreach ($this->contacts as $existingContact) {
            if ($existingContact->isMainContact()) {
                $mainContact = false;
                break; // no need for further checks
            }
        }
        $contact->setMainContact($mainContact);
        $this->contacts->add($contact);
        $contact->setUser($this); // set the owning side of the relation too!
    }
}
On the other hand, consider adding a field to your user instead:
<?php
class User
{
    // keep reference here instead of the contact (cleaner)
    protected $mainContact;
}
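To show why a single reference makes the "only one main contact" rule hold by construction, here is one possible reading of that second suggestion, sketched in JavaScript rather than PHP. The class shapes, the `isMain` parameter and the "first contact becomes main" default are assumptions carried over from the question's requirements, not part of the answer.

```javascript
class Contact {
  constructor(name) {
    this.name = name;
  }
}

class User {
  constructor() {
    this.contacts = [];
    this.mainContact = null; // single reference instead of a flag on every contact
  }

  addContact(contact, isMain = false) {
    if (!this.contacts.includes(contact)) {
      this.contacts.push(contact);
    }
    // The first contact becomes the main one by default; an explicit request replaces it.
    if (isMain || this.mainContact === null) {
      this.mainContact = contact;
    }
  }

  isMainContact(contact) {
    return this.mainContact === contact;
  }
}
```

Because there is only one slot, promoting a new contact never requires looping over the others to reset their flags.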
Webservice1 can receive a set of Lon/Lat variables. Based on these variables it returns a resultset of items nearby.
In order to create the resultset, Webservice1 has to pass the variables to multiple webservices of our own and to multiple external webservices. All these webservices return a resultset. The combination of the resultsets from these secondary webservices is the resultset to be returned by Webservice1.
What is the best design approach within Windows Azure with costs and performance in mind?
Should we fire requests sequentially from Webservice1 to the other webservices, waiting for each response before continuing? Or can we, for example, use a queue where we post the variables to be picked up by the secondary webservices?
I think you've answered your own question in the title.
I wouldn't worry about using a queue. Queues are great for sending information off to get dealt with by something else when it doesn't matter how long it takes to process. As you've got a web service that's waiting to return results, this is not ideal.
Sending the requests to each of the other web services one at a time will work and is the easiest option technically, but it won't give you the best performance.
In this situation I would send requests to each of the other web services in parallel using the Task Parallel Library. Presuming the order of the items that you return isn't important, your code might look a bit like this.
public List<LocationResult> PlacesOfInterest(LocationParameters parameters)
{
    WebService[] webServices = GetArrayOfAllWebServices();
    LocationResult[][] results = new LocationResult[webServices.Length][];

    // Call all of the web services in parallel; each iteration writes only its own slot
    Parallel.For(0, webServices.Length, i =>
    {
        results[i] = webServices[i].PlacesOfInterest(parameters);
    });

    // Put all the results together
    var finalResults = new List<LocationResult>();
    foreach (LocationResult[] serviceResults in results)
    {
        finalResults.AddRange(serviceResults);
    }
    return finalResults;
}
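The same fan-out and merge shape can also be written with promises instead of threads. The sketch below uses JavaScript's `Promise.all` purely as an illustration; it is not the Task Parallel Library code above. The `fakeServices` objects are hypothetical stand-ins for the secondary web services.

```javascript
// Calls every service concurrently and merges the result sets into one list.
async function placesOfInterest(services, parameters) {
  const resultSets = await Promise.all(
    services.map((service) => service.placesOfInterest(parameters))
  );
  return resultSets.flat(); // Promise.all keeps the order of `services`
}

// Hypothetical stand-ins for the secondary web services.
const fakeServices = [
  { placesOfInterest: async ({ lat, lon }) => [`museum@${lat},${lon}`] },
  { placesOfInterest: async () => ["cafe", "park"] },
];
```

Total latency is roughly that of the slowest service rather than the sum of all of them. `Promise.all` rejects as soon as any call fails; if partial results are acceptable, `Promise.allSettled` is one alternative.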