Question about GraphQL and its vulnerability to SQL injection. Say I have some dynamic search being performed, where a user enters text into a field, and I use this as a parameter to a GraphQL search.
So, I end up with something like this:
{
data {
location: user_input
}
}
where user_input is a variable specified by the user.
Now, say a user were to try to enter some malicious code here, to wipe the database or something along those lines (e.g. a 1=1 attack). Would this work here? Does GraphQL simply translate the queries into SQL, and could this therefore be dangerous? Or will GraphQL prevent such things from happening?
Thank you
GraphQL is a query language. From the spec:
GraphQL is a query language designed to build client applications by providing an intuitive and flexible syntax and system for describing their data requirements and interactions... GraphQL is not a programming language capable of arbitrary computation, but is instead a language used to query application servers that have capabilities defined in this specification. GraphQL does not mandate a particular programming language or storage system for application servers that implement it.
GraphQL is agnostic about the underlying data layer. It can be used with a SQL database, but it could also be used with a NoSQL database, an in-memory key-value store, a file system, etc.
Whether a particular GraphQL service is vulnerable to SQL injection ultimately depends on how that service was implemented. The fact that it's a GraphQL service doesn't really factor into it.
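To make this concrete, here is a minimal sketch of what a resolver behind such a field might do, assuming a SQL backend. The function names, the `places` table, and the use of Python's built-in sqlite3 module are illustrative assumptions, not part of GraphQL or of any GraphQL library. The first version splices the argument into the SQL text and is injectable; the second passes it as a bound parameter.

```python
import sqlite3

# Hypothetical in-memory table standing in for whatever backs the GraphQL field.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT, location TEXT)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?)",
    [("cafe", "Paris"), ("museum", "Berlin")],
)


def resolve_data_unsafe(location):
    # Vulnerable: the user's text becomes part of the SQL statement itself.
    sql = f"SELECT name FROM places WHERE location = '{location}'"
    return [row[0] for row in conn.execute(sql)]


def resolve_data_safe(location):
    # Safe: the value is sent as a bound parameter, separate from the SQL text.
    sql = "SELECT name FROM places WHERE location = ?"
    return [row[0] for row in conn.execute(sql, (location,))]


malicious = "x' OR '1'='1"
leaked = resolve_data_unsafe(malicious)  # the OR clause matches every row
blocked = resolve_data_safe(malicious)   # treated as a literal string, matches nothing
```

In both cases the GraphQL layer only delivers the argument; whether it is dangerous depends entirely on what the resolver does with it.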
Related
I’d like to automatically build up a SQL query based on some strings passed in by my users. Are there any helper methods for doing that in the Cloud Spanner APIs?
We strongly recommend that you not generate textual SQL based on untrusted user input. It’s much easier and safer to use bound parameters, which help you avoid SQL injection attacks.
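As a rough illustration of that advice, the sketch below builds a query from user-supplied filters without ever placing user text in the SQL string. It uses Python's built-in sqlite3 module as a stand-in; Cloud Spanner has its own client libraries and parameter syntax, which are not shown here. The `venues` table, the `ALLOWED_COLUMNS` whitelist, and `build_query` are hypothetical names.

```python
import sqlite3

# Hypothetical whitelist: only these column names may appear in the SQL text.
ALLOWED_COLUMNS = {"city", "country"}


def build_query(filters):
    """Return (sql, params) for a dict mapping column name to wanted value."""
    clauses, params = [], []
    for column, value in filters.items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unsupported filter: {column}")
        clauses.append(f"{column} = ?")  # placeholder only, never the value
        params.append(value)
    sql = "SELECT name FROM venues"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE venues (name TEXT, city TEXT, country TEXT)")
conn.execute("INSERT INTO venues VALUES ('Louvre', 'Paris', 'FR')")

sql, params = build_query({"city": "Paris", "country": "FR"})
rows = conn.execute(sql, params).fetchall()
```

Column names cannot be bound as parameters, which is why they are checked against a fixed whitelist, while the values always travel as parameters.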
I've just started messing around with AWS DynamoDB in my iOS app and I have a few questions.
Currently, I have my app communicating directly to my DynamoDB database. I've been reading around lately and people are saying this isn't the proper way to go about getting data from my database.
By this I mean that I just have a function in my code that queries my DynamoDB database and returns the result.
How I do it works but is there a better way I should be going about this?
Amazon DynamoDB is itself a highly scalable service. Standing up another server in front of it means you also have to scale that server in line with the RCU/WCU configured for your tables, which you can and should avoid.
If your mobile application doesn't need a backend server and you can perform all the business functions from the mobile device, then you should probably think about:
Using the AWS DynamoDB SDK for iOS to write the client application that runs on the mobile device.
Using the AWS Token Vending Machine to authenticate your mobile users and grant them credentials for running operations on DynamoDB tables.
Controlling access (i.e. which operations are allowed on which tables) using IAM policies.
HTH.
From what you say, I can guess that you are talking about a way to distribute data to many clients (iOS apps).
There are a few integration patterns (a very good book on this: Enterprise Integration Patterns), one of which is called shared database. It is essentially about using a common database for multiple clients to share the data. The main drawback of that pattern (in your case) is that every client makes assumptions about what the database schema looks like. That can cause you some headaches maintaining the schema in the future if your business logic changes.
The more advanced approach would be sending events on every change in your data instead of writing changes directly to the database from the client apps. This way you can add additional processing to the events before the data they carry is written to the database. For example, you may want to change the event format in the new version of your app but still support legacy users, so you add a translation procedure which transforms both types of events into the format that fits the database schema. It's basically a question of whether to work with diffs or snapshots.
You should be aware of the added complexity of working with events; it can be overkill if your app is simple and changes to the schema are unlikely.
Also consider that you can do data preprocessing using DynamoDB Streams, which gives you some of the advantages of using events while still keeping it simple to implement.
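A translation procedure of that kind could look roughly like the sketch below. The two event formats, the `version` field, and the `LocationRecord` type are invented for illustration; the point is only that both old and new clients end up producing the same record shape before anything is written to the database.

```python
from dataclasses import dataclass


@dataclass
class LocationRecord:
    """Hypothetical shape of a row as the database schema expects it."""
    user_id: str
    lat: float
    lon: float


def translate(event):
    """Convert an event from either app version into a LocationRecord."""
    version = event.get("version", 1)  # assume legacy events carry no version
    if version == 1:
        # Assumed legacy format: coordinates packed into one "lat,lon" string.
        lat, lon = (float(part) for part in event["coords"].split(","))
    elif version == 2:
        # Assumed new format: coordinates as separate numeric fields.
        lat, lon = float(event["lat"]), float(event["lon"])
    else:
        raise ValueError(f"unknown event version: {version}")
    return LocationRecord(user_id=event["user"], lat=lat, lon=lon)


legacy = translate({"user": "u1", "coords": "48.5,2.25"})
current = translate({"version": 2, "user": "u1", "lat": 48.5, "lon": 2.25})
```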
We have an internal application. As time went on and new applications were requested that exchange data with each other, the interaction became bound to the database schema, meaning that changes in the database require changes everywhere else. As we plan to build even more applications that will depend on the same data, this will quickly become an unmanageable mess.
Now I'm looking to abstract that interaction behind an API. Currently I have trouble choosing the right tool.
Interaction could at times be complex, meaning data is posted to one service and, once the action has been completed, the service should notify the sender of that.
Another example would be that some data has no context without the data from other services. Let's say there is one service for [Schools] and one for [Students]. If the [School] gets deleted or changed, the [Student] needs to be informed about it immediately, and not when he comes to [School].
Advice? Suggestions? SOAP/REST/?
I don't think you need an API. In my opinion you need an architecture which decouples your database from the domain logic and other parts of the application. Examples of such an architecture are clean architecture, onion architecture and hexagonal architecture (also known as ports and adapters). They share the same concepts: you have a domain logic which does not depend on any framework, external lib, delivery method, data storage solution, etc. This domain logic communicates with the outside world through adapters that have well defined interfaces. If you first design the inside of your domain logic and the interfaces of the adapters, and only afterwards the outside components, then it is called domain driven design (DDD).
So for example if you want to move from MySQL to MongoDB you already have a DataStorageInterface, and the only thing you need to do is write a MongoDBAdapter which implements this interface, and of course migrate the data...
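A minimal sketch of that idea follows. DataStorageInterface is the name used above; the two adapters are stand-ins (an in-memory dict and Python's built-in sqlite3 in place of MongoDB and MySQL), and the method names and `register_user` function are assumptions made for illustration.

```python
import sqlite3
from abc import ABC, abstractmethod


class DataStorageInterface(ABC):
    """The only storage contract the domain logic knows about."""

    @abstractmethod
    def save_user(self, user_id, name): ...

    @abstractmethod
    def get_user(self, user_id): ...


class InMemoryAdapter(DataStorageInterface):
    """Stand-in for one storage backend."""

    def __init__(self):
        self._users = {}

    def save_user(self, user_id, name):
        self._users[user_id] = name

    def get_user(self, user_id):
        return self._users.get(user_id)


class SqliteAdapter(DataStorageInterface):
    """Stand-in for a relational backend."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")

    def save_user(self, user_id, name):
        self._conn.execute(
            "INSERT OR REPLACE INTO users VALUES (?, ?)", (user_id, name)
        )

    def get_user(self, user_id):
        row = self._conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


def register_user(storage, user_id, name):
    # Domain logic: depends on the interface, not on a concrete database.
    storage.save_user(user_id, name)
    return storage.get_user(user_id)


results = [register_user(adapter, "1", "Ada")
           for adapter in (InMemoryAdapter(), SqliteAdapter())]
```

Swapping the backend then means passing a different adapter to `register_user`; the domain function itself does not change.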
To design the adapters you can use two additional concepts: command query responsibility segregation (CQRS) and event sourcing (ES). CQRS is for connecting delivery methods like REST, SOAP, web applications, etc. to the domain logic. For example, you can raise a CreateUserCommand from your REST API. The proper listener in the domain logic then processes that command, and on success it raises a domain event, like UserCreatedEvent. Your REST API can listen to that event and respond with a success message to the REST client. One or more storage adapters can listen to the UserCreatedEvent too, so they can process that event and persist the new user.
You don't necessarily have to use only a single database. For example, if a relational database is faster for a specific type of query, then you can use that, but if a NoSQL database suits the job better, then you can use that too. So you can use as many databases as you want for your queries; the only thing you need is to write a storage adapter for each of them. For example, if your REST client wants to retrieve the profile of a specific user, it can raise a GetUserProfileByIdQuery and the domain logic can ask the adapter of a database which can serve the query. The adapter can then send, for example, an SQL query to a MySQL database and return the response.
With ES you add an EventStorage to your system, which stores the raised domain events. It can be very useful if you want to migrate your data from one query database to another. In that case you create a new storage adapter for your new database and replay all of the domain events from the EventStorage in historical order to that adapter, so it can fill the new database with the relevant data. That's all, you don't have to write complicated migration scripts...
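The command, event and replay flow can be sketched as follows. UserCreatedEvent and EventStorage are the names used above; everything else (the handler function, the dict-backed adapter, the method names) is a simplified assumption rather than the API of any particular CQRS or event sourcing framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserCreatedEvent:
    user_id: str
    name: str


class EventStorage:
    """Append-only log of domain events, kept in historical order."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def replay(self, handler):
        # Feed every stored event, oldest first, to the given handler.
        for event in self._events:
            handler(event)


class DictStorageAdapter:
    """Hypothetical storage adapter; a dict stands in for a real database."""

    def __init__(self):
        self.users = {}

    def handle(self, event):
        if isinstance(event, UserCreatedEvent):
            self.users[event.user_id] = event.name


def handle_create_user_command(user_id, name, event_storage, listeners):
    # Domain logic: process the command, record the event, notify listeners.
    event = UserCreatedEvent(user_id, name)
    event_storage.append(event)
    for listener in listeners:
        listener(event)
    return event


store = EventStorage()
old_db = DictStorageAdapter()
handle_create_user_command("1", "Ada", store, [old_db.handle])
handle_create_user_command("2", "Linus", store, [old_db.handle])

# Migration: a fresh adapter is filled purely by replaying the stored events.
new_db = DictStorageAdapter()
store.replay(new_db.handle)
```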
In your case I think you should create at least domain events, and use event sourcing. That will totally decouple your database from the other parts of your application. Adding a REST or SOAP API can have a similar effect, but building HTTP connections to access your database can slow down your application.
This is more of an architectural question than a technological one per se.
I am currently building a business website/social network that needs to store large volumes of data and use that data to draw analytics (consumer behavior).
I am using Django and a PostgreSQL database.
Now my question is: I want to expand this architecture to include a data warehouse. The ideal would be: the operational DB would be the current Django PostgreSQL database, and the data warehouse would be something additional, preferably in a multidimensional model.
We are still in a very early phase, we are going to test with 50 users, so something primitive such as a one-column table for starters would be enough.
I would like to know if somebody has experience with this situation and could recommend a framework for creating a data warehouse, all while maintaining the operational DB with the Django models for ease of use (if possible).
Thank you in advance!
Here are some cool Open Source tools I used recently:
Kettle - great ETL tool, you can use this to extract the data from your operational database into your warehouse. Supports any database with a JDBC driver and makes it very easy to build e.g. a star schema.
Saiku - nice Web 2.0 frontend built on Pentaho Mondrian (MDX implementation). This allows your users to easily build complex aggregation queries (think Pivot table in Excel), and the Mondrian layer provides caching etc. to make things go fast. Try the demo here.
My answer does not necessarily apply to data warehousing. In your case I see the possibility to implement a NoSQL database solution alongside an OLTP relational storage, which in this case is PostgreSQL.
Why consider NoSQL? In addition to the obvious scalability benefits, NoSQL databases offer a number of advantages that will probably apply to your scenario, for instance the flexibility of having records with different sets of fields, and key-based access.
Since you're still in "trial" stage you might find it easier to decide for a NoSQL database solution depending on your hosting provider. For instance AWS have SimpleDB, Google App Engine provide their own DataStore, etc. However there are plenty of other NoSQL solutions you can go for that have nice Python bindings.
In my case the separate system is a web-service (but it could conceivably be anything).
My question is what are the best practices when you integrate against a separate system such as a web-service when it comes to data?
Example: Web-service provides a list of products. Products are grouped using categories. You can get all products in a sub-category. You can get a specific product by its id (an integer) or its name (a unique value).
In my application:
I display the list of categories and products - and the user can choose the product and specify an order quantity.
Should I store the name of the category or the id of the category?
Should I store the name of the product or the id of the product?
How should I name the field in the database that stores the data from the web-service (CategoryId or WsCategoryId, so that by convention one knows where the value is coming from)?
Any other best practices?
Any other references?
From your question I understand that the web service's interface looks something like this:
/product/
/product/{ProductId}
/product/{ProductName}
/product/category/{CategoryId}
Since you are asking if you should store CategoryName, I assume that it is unique (same as ProductName).
I also assume that the web service handles cases where products or categories are renamed transparently (i.e. by providing a redirect or any other means which allow you to detect this and handle it accordingly). If it doesn't, do not consider storing names as references to products or categories - always use IDs.
I would provide the same answer to your questions #1 and #2. Even though uniqueness of ProductName and CategoryName will technically allow you to store them in your application as unique identifiers of products and categories, I would opt for storing their IDs instead. The main decision point would be your storage medium. Since you are using a database, and the web service allows you to access objects by unique numerical IDs, database normalization rules should apply - hence you should store IDs.
The above however assumes that you are using a relational database - if you are using a NoSQL database, I assume that storing names instead of IDs would be a viable option as well (at least as far as I can tell with my current understanding of NoSQL solutions, unfortunately I don't have any practical experience with any of them yet).
Regarding question #3 - I would stick with the naming conventions that you already use in your database. There are many different conventions for naming tables and columns out there, so I really doubt that there are any standardized conventions on how to name columns referencing web service objects. I would name them according to your existing naming conventions and in a way that makes the purpose of the columns clear to everybody who is using the system. Note that if there is a chance that you will be using other web services in the future, you should consider keeping the name of the service in the column name rather than using a generic ws prefix - e.g. AmazonProductId or AmazonCategoryId.
I'll try to point out a few items from my experience, but I would not label them as best practices - just topics to think about.
In my experience, I found it useful to treat data from web services in the same fashion as data from a database - at least from an application's perspective, where your storage layer would be abstracted from application logic. By this I mean that you should think about and prepare for similar scenarios regardless of whether your storage medium is a database or a web service. Like databases, web services can go down; both can have their data or integrity corrupted, and both will require you to sanitize or otherwise process data on input.
Caching of data should be an item which is high on your list - apart from the obvious performance reasons, it can allow you to deal with outages of the web service (to an extent limited by which data you cache).
An example would be that your application displays a list of the most frequently purchased products. If your application stores only the IDs of products, you will have to make one or more requests to the web service in order to retrieve the names of all the products which you need to display in the list. If you cache product names locally or in your database, you will achieve better performance, conserve your resources, and also have a failsafe scenario in case the web service goes down.
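Here is a minimal sketch of such a cache, assuming the web service is reached through a single lookup function. `ProductNameCache`, `FakeProductService` and the product data are all hypothetical; a real service call would replace `fetch_name`.

```python
class ProductNameCache:
    """Caches product names fetched from a (hypothetical) web service."""

    def __init__(self, fetch_name):
        self._fetch_name = fetch_name  # callable that performs the remote lookup
        self._names = {}

    def get_name(self, product_id):
        # Serve from the cache when possible: no request, works during outages.
        if product_id in self._names:
            return self._names[product_id]
        try:
            name = self._fetch_name(product_id)
        except ConnectionError:
            return None  # service down and nothing cached for this product
        self._names[product_id] = name
        return name


class FakeProductService:
    """Test double standing in for the real web service."""

    def __init__(self):
        self.calls = 0
        self.available = True

    def fetch_name(self, product_id):
        self.calls += 1
        if not self.available:
            raise ConnectionError("service unavailable")
        return {1: "Widget", 2: "Gadget"}[product_id]


service = FakeProductService()
cache = ProductNameCache(service.fetch_name)

first = cache.get_name(1)          # fetched from the service
second = cache.get_name(1)         # served from the cache
service.available = False          # simulate an outage
during_outage = cache.get_name(1)  # still served from the cache
uncached = cache.get_name(2)       # never cached, so unavailable during the outage
```

This sketch never expires entries, so a renamed product would keep its old name until the cache is cleared; a real cache would need some expiry or invalidation policy.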
Referential integrity is one other important aspect to think about when working with web services. As the web service is completely separate from your database, you do not have the option to create foreign keys as you would do in a database-only solution. This means that data changes in the web service (i.e. product updates or deletions) can break the integrity of data in your database.
Regarding references, these depend mostly on the type of web service that you are about to use (you didn't specify which service you will be using). If the service is based on REST principles, I can recommend RESTful Web Services by Leonard Richardson and Sam Ruby. Even though it isn't focused on application/service integration as such, it's a great introduction to REST.