Best way to go about sanitizing user input in rails

Best way to go about sanitizing user input in rails - ruby-on-rails-4

I've read a lot about this and know there are many related questions on here, but I couldn't find a definitive guide for how to go about sanitizing everything. One option is to sanitize on insert, for example I have the following in my model
before_validation :sanitize_content, :on => :create
def sanitize_content
self.content = ActionController::Base.helpers.sanitize(self.content)
end
Do I need to run this on every field in every model? I'm guessing the :on => :create should be removed too so it runs when updates too?
The other option is to sanitize when data is displayed in views, using simple_format, or .html_safe or sanitize(fieldname). SHould I be sanitizing in all my views for every single field, as well as on insert? Having to do this manually everywhere doesn't seem very railsy
Thanks for any help

TL;DR
Regarding user input and queries: Make sure to always use the active record query methods (such as .where), and avoid passing parameters using string interpolation; pass them as hash parameter values, or as parameterized statements.
Regarding rendering potentially unsafe user-generated html / javascript content: As of Rails 3, html/javascript text is automatically properly escaped so that it appears as plain text on the page, rather than interpreted as html/javascript, so you don't need to explicitly sanitize (or use <%= h(potentially_unsafe_user_generated_content)%>
If I understand you correctly, you don't need to worry about sanitizing data in this manner, as long as you use the active record query methods correctly. For example:
Lets say our parameter map looks like this, as a result of a malicious user inputting the following string into the user_name field:
:user_name => "(select user_name from users limit 1)"
The bad way (don't do this):
Users.where("user_name = #{params[:id}") # string interpolation is bad here
The resulting query would look like:
SELECT `users`.* FROM `users` WHERE (user_name = (select user_name from users limit 1))
Direct string interpolation in this manner will place the literal contents of the parameter value with key :user_name into the query without sanitization. As you probably know, the malicious user's input is treated as plain 'ol SQL, and the danger is pretty clear.
The good way (Do this):
Users.where(id: params[:id]) # hash parameters
OR
Users.where("id = ?", params[:id]) # parameterized statement
The resulting query would look like:
SELECT `users`.* FROM `users` WHERE user_name = '(select user_name from users limit 1)'
So as you can see, Rails in fact sanitizes it for you, so long as you pass the parameter in as a hash, or method parameter (depending on which query method you're using).
The case for sanitization of data on creating new model records doesn't really apply, as the new or create methods are expecting a hash of values. Even if you attempt to inject unsafe SQL code into the hash, the values of the hash are treated as plain strings, for example:
User.create(:user_name=>"bobby tables); drop table users;")
Results in the query:
INSERT INTO `users` (`user_name`) VALUES ('bobby tables); drop table users;')
So, same situation as above.
Let me know if I've missed or misunderstood anything.
Edit
Regarding escaping html and javascript, the short version is that ERB "escapes" your string content for you so that it is treated as plain text. You can have it treated like html if you really want, by doing your_string_content.html_safe.
However, simply doing something like <%= your_string_content %> is perfectly safe. The content is treated as a string on the page. In fact, if you examine the DOM using Chrome Developer Tools or Firebug, you should in fact see quotes around that string.

Because I always appreciate when I find the source of knowledge and code on any SO answer, I will provide that for this question.
Both ActiveRecord and ActionController provide methods to sanitize sql input.
Specifically from ActiveRecord::Sanitization::ClassMethods you have sanitize_sql_for_conditions and its two other aliases:
sanitize_conditions and sanitize_sql. The three do literally the exact same thing.
sanitize_sql_for_conditions
Accepts an array, hash, or string of SQL conditions and sanitizes
them into a valid SQL fragment for a WHERE clause.
However, in ActiveRecord you also have
sanitize_sql_for_assignment which
Accepts an array, hash, or string of SQL conditions and sanitizes them
into a valid SQL fragment for a SET clause.
Note that these methods are included in ActiveRecord::Base and therefore are included by default in any ActiveRecord model.
On the other hand, in ActionController you have ActionController::Parameters which allows you to
choose which attributes should be whitelisted for mass updating and
thus prevent accidentally exposing that which shouldn't be exposed.
Provides two methods for this purpose: require and permit.
params = ActionController::Parameters.new(user: { name: 'Bryan', age: 21 })
req = params.require(:user) # will throw exception if user not present
opt = params.permit(:name) # name parameter is optional, returns nil if not present
user = params.require(:user).permit(:name, :age) # user hash is required while `name` and `age` keys are optional
The parameters magic is called Strong Parameters, docs here.
I hope that helps anyone, if only to learn and demystify Rails! :)

Related

Convert the value of a field in a django RawQueryset to a different django field type

I have a rather complex query that's generating a Django RawQuerySet. This specific query returns some fields that aren't part of the model that the RawQuerySet is based on, so I'm using .annotate(field_name=models.Value('field_name')) to attach it as an attribute to individual records in the RawQuerySet. The most important custom field is actually a uuid, which I use to compose URLs using Django's {% url %} functionality.
Here's the problem: I'm not using standard uuids inside my app, I'm using SmallUUIDs (compressed UUIDs.) These are stored in the database as native uuidfields then converted to shorter strings in python. So I need to somehow convert the uuid returned as part of the RawQuerySet to a SmallUUID for use inside a template to generate a URL.
My code looks somewhat like this:
query = "SELECT othertable.uuid_field as my_uuid FROM myapp_mymodel
JOIN othertable o ON myapp_mymodel.x = othertable.x"
MyModel.objects.annotate(
my_uuid=models.Value('my_uuid'),
).raw(query)
Now there is a logical solution here, there's an optional kwarg for models.Value called output_field, making the code look like this:
MyModel.objects.annotate(
my_uuid=models.Value('my_uuid', output_field=SmallUUIDField()),
).raw(query)
But it doesn't work! That kwarg is completely ignored and the type of the attribute is based on the type returned from the database and not what's in output_field. In my case, I'm getting a uuid output because Postgres is returning a UUID type, but if I were to change the query to SELECT cast othertable.uuid_field as text) as my_uuid I'd get the attribute in the format of a string. It appears that Django (at least version 1.11.12) doesn't actually care what is in that kwarg in this instance.
So here's what I'm thinking are my potential solutions, in no particular order:
Change the way the query is formatted somehow (either in Django or in the SQL)
Change the resulting RawQuerySet in some way before it's passed to the view
Change something inside the templates to convert the UUID to a smalluuid for use in the URL reverse process.
What's my best next steps here?

A couple of issues with your current approach:
Value() isn't doing what you think it is - your annotation is literally just annotating each row with the value "my_uuid" because that is what you have passed to it. It isn't looking up the field of that name (to do that you need to use F expressions).
Point 1 above doesn't matter anyway because as soon as you use raw() then the annotation is ignored - which is why you see no effect coming from it.
Bottom line is that trying to annotate a RawQuerySet isn't going to be easy. There is a translations argument that it accepts, but I can't think of a way to get that to work with the type of join you are using.
The next best suggestion that I can think of is that you just manually convert the field into a SmallUUID object when you need it - something like this:
from smalluuid import SmallUUID
objects = MyModel.objects.raw(query)
for o in objects:
# Take the hex string obtained from the database and convert it to a SmallUUID object.
# If your database has a built-in UUID type you will need to do
# SmallUUID(small=o.my_uuid) instead.
my_uuid = SmallUUID(hex=o.my_uuid)
(I'm doing this in a loop just to illustrate - depending on where you need this you can do it in a template tag or view).

Django GROUP BY including unnecessary columns?

I have Django code as follows
qs = Result.objects.only('time')
qs = qs.filter(organisation_id=1)
qs = qs.annotate(Count('id'))
And it gets translated into the following SQL:
SELECT "myapp_result"."id", "myapp_result"."time", COUNT("myapp_result"."id") AS "id__count" FROM "myapp_result" WHERE "myapp_result"."organisation_id" = 1 GROUP BY "myapp_result"."id", "myapp_result"."organisation_id", "myapp_result"."subject_id", "myapp_result"."device_id", "myapp_result"."time", "myapp_result"."tester_id", "myapp_result"."data"
As you can see, the GROUP BY clause starts with the field I intended (id) but then it goes on to list all the other fields as well. Is there any way I can persuade Django not to specify all the individual fields like this?
As you can see, even with .only('time') that doesn't stop Django from listing all the other fields anyway, but only in this GROUP BY clause.
The reason I want to do this is to avoid the issue described here where PostgreSQL doesn't support annotation when there's a JSON field involved. I don't want to drop native JSON support (so I'm not actually using django-jsonfield). The query works just fine if I manually issue it without the reference to "myapp_result"."data" (the only JSON field on the model). So if I could just persuade Django not to refer to it, I'd be fine!

only only defers the loading of certain fields, i.e. it allows for lazy loading of big or unused fields. It should generally not be used unless you know exactly what you're doing and why you need it, as it is nothing more than a performance booster than often decreases performance with improper use.
What you're looking for is values() (or values_list()), which actually excludes certain fields instead of just lazy loading. This will return a dictionary (or list) instead of a model instance, but this is the only way to tell Django to not take other fields into account:
qs = (Result.objects.filter_by(organisation_id=1)
.values('time').annotate(Count('id')))

Sitecore Fast Query select attribute values

Is there a way by using XPath Builder under Developer Center inside Sitecore Shell (a Fast Query interface) to select a particular attribute from an item. Example:
/sitecore/content/Home/Products/*[##templatename = 'Product Group']/#id
I would expect to see a collection of id's to be returned where id is an attribute of an item. If yes is it possible to extract an attribute with a space bar? Example:
/sitecore/content/Home/Products/*[##templatename = 'Product Group']/#more info
EDIT
The thing that I want to achieve is to get a collection of items (I have few hounded items here), not one particular item. That's why I am not interested in adding additional conditions, like specific item id or title. I want to see a collection of values of a specific attribute. As in example showed above, I want to see a collection of values that are assign to 'more info' attribute. Once again I am expecting to see few hounded different values that are set to 'more info' attribute.
EDIT2
There is a problem with a production, a critical stuff. There is no access to it other then thru Sitecore shell, but I don't have permissions to add/install additional packages. I know how to get this info by implementing custom code, or queering db directly, but I simply do not have permission to do it. Guys that will be able to grant me need credentials will wake up in 6 hours, so I was hoping to do whatever I can to analyse the situation. I would accept Maras answer if it was an answer not a comment - there is no way I can do it using fast query. thanks for help.

Try using #
/sitecore/content/Home/Products/*[##templatename = 'Product Group']/##more info#
This is the way around when selecting items with fields that contain spaces. Having said that I don't know if you would be able to get a specific result or not for your specific question but give it a try.
For example, consider this query which returns Product S1
fast:/sitecore/content/home/*[#Title = 'Item 1' and ##templatename = 'Product Group1']//*[#Title = 'Product S1' and ##id = '{787EE6C5-0885-495D-855E-1D129C643E55}']
However, if you place the special attribute (i.e. ##id) at the beginning of the condition, the query will not return a result.
fast:/sitecore/content/home/*[##templatename = 'Product Group1' and #Title = 'Product S1']//*[##id = '{787EE6C5-0885-495D-855E-1D129C643E55}' and #Title = 'Product S1']
Remember this, Sitecore Fast Query only supports the following special attributes:
##id
##name
##key
##templateid
##templatename
##templatekey
##masterid
##parentid
Let us know if this helps.

How do I search a db field for a the string after a "#" and add it to another db field in django

I want to have a content entry block. When a user types #word or #blah in a field, I want efficiently search that field and add the string right after the "#" to a different field as a n entry in a different table. Like what Twitter does. This would allow a user to sort by that string later.
I believe that I would do this as a part of the save method on the model, but I'm not sure. AND, if the #blah already exists, than the content would belong to that "blah"
Can anyone suggest samples of how to do this? This is a little beyond what I'm able to figure out on my own.
Thanks!

You can use regex (re) during save() or whenever to check if your field text contains #(?P<blah>\w+) , extract your blah and and use it for whatever you want .

intelligent methodology for filtering server side

My lack of CS and inexperience is really coming to the forefront in this moment. I've never really handled filtering results server side. I'm thinking that this is not the right way to go about it. I'm using Django....
First, I assume that I can keep it DRYer by keeping this validation in my form definitions. Next, I was concerned about my chained filter statements. How important is it to use Q complex lookups as opposed to chaining filters at this point? I'm just building a prototype and I assume that I'll eventually have to go for a search solution more powerful than full text search.
My big issue right now (besides the length of the code and clearly the inefficiency) is that I'm not sure how to handle my rooms and workers inputs, which are select forms. If the user does not select a value, I want to remove these filters from the process server side. Should I just create two separate conditional series of lookups for these outcomes?
def search(request):
if request.method=='GET' and request.GET.get('region',''):
neighborhoods=request.GET.getlist('region')
min_rent=request.GET.get('min_cost','0')
min_rent=re.sub(r'[,]','',min_cost) #remove any ','
if re.search(r'[^\d]',min_cost):
min_cost=0
else:
min_cost=int(min_cost)
max_cost=request.GET.get('max_cost','0')
max_cost=re.sub(r'[,]','',max_cost) #remove any ','
if re.search(r'[^\d]',max_cost):
max_cost=100000
else:
max_cost=int(max_rent)
date_min=request.GET.get('from','')
date_max=request.GET.get('to','')
if not date_min:
date=(str(datetime.date.today()))
date_min=u'%s' %date
if not date_max:
date_max=u'2013-03-18'
rooms=request.GET.get('rooms',0)
if not rooms:
rooms=0
workers=request.GET.get('workers',0)
if not workers:
workers=0
#I should probably use Q objects here for complex lookups
posts=Post.objects.filter(region__in=region).filter(cost__gt=min_cost).filter(cost__lt=max_cost).filter(availability__gt=date_min).filter(availability__lt=date_max).filter(rooms=rooms).filter(workers=workers)
#return HttpResponse('%s' %posts)
return render_to_response("website/search.html",{'posts':posts),context_instance=RequestContext(request))

First, I assume that I can keep it
DRYer by keeping this validation in my
form definitions.
Yes, I'd put this in a form as it looks like you are using one to display the form anyways? Also, you can put a lot of your date formatting stuff right in the clean_FIELD methods to format the data in the cleaned_data dict. The only issue here is that output is actually modified so your users will see the change from 1,000 to 1000. Either way, I would put this logic in a form method.
# makes the view clean.
if form.is_valid():
form.get_posts(request)
return response
My big issue right now (besides the
length of the code and clearly the
inefficiency) is that I'm not sure how
to handle my rooms and workers inputs,
which are select forms. If the user
does not select a value, I want to
remove these filters from the process
server side. Should I just create two
separate conditional series of lookups
for these outcomes?
Q objects are only for complex lookups. I don't see a need for them here.
I also don't see why you need to chain the filters. I at first wondered if these are m2m, but these types of queries (__gt/__lt) don't behave any differently chaining as there is no overlap between the queries.
# this is more readable / concise.
# I'd combine as many of your queries as you can just for readability.
posts = Posts.objects.filter(
region__in=region,
cost__gte=min_cost,
# etc
)
Now, if you want optional arguments, my suggestion is to use a dictionary of keyword arguments so that you can dynamically populate the kwargs.
keyword_arguments = {
'region__in': region,
'cost__gte': min_cost,
'cost__lt': max_cost,
'availability__gt': date_min,
'availability__lt': date_max,
}
if request.GET.get('rooms'):
keyword_arguments['rooms'] = request.GET['rooms']
if request.GET.get('workers'):
keyword_arguments['workers'] = request.GET['workers']
posts = Posts.objects.filter(**keyword_arguments)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js