Passing more than 8000 POST parameters throws error - ColdFusion

I am working on a module which requires submitting a form with an insane number of parameters (8k-10k). I am not sure whether this is a good idea or not, but that's the way it is. I have changed the settings in the neo-runtime.xml file as mentioned in this link, like so:
<var name='postParametersLimit'><number>10000.0</number></var> and restarted the server. But no use: CF still throws error 500, and we cannot get any useful information out of it. I am working on CF 9.0.2 and we are using IIS 7.5. Is there anything else I need to do?

"We gave our client a dynamic form where he can add his own form fields and now we have this problem. There was a mismatch between clients expectations and our thinking of the way client wants it."
Unfortunately, you're going to have to tell the client they can't have it the way they want it. That post-processing limit is there for security reasons; if you raise it too high, you re-open your server to a denial-of-service attack based on hash-algorithm collisions.
We have tens of thousands of forms in our workflow system and work with banking and government clients. Once this update was applied (in development first), we had to raise the default to a certain value and stick with it. We made sure to note this limitation to the entire business team and add it to our coding standards document to ensure that all new development was done in accordance with the standard. After reworking a handful of existing forms to account for the limitation, we were able to push the security update to production without a problem.
Just tell them that there is a security restriction on the number of fields in a single form and they cannot cross that line. If you need to gather that much data, they'll have to break it up into multiple forms.

You can use a cfgrid instead of a long form with a huge amount of data to take input from the user.
cfgrid allows you to load only a limited amount of data from the database.
Using it, you can prevent posting and loading of a huge amount of data at once.
And if you are not a great supporter of cfgrid or the cfajax features, you can still use pagination or something similar, which will let you load a limited amount of data into your form and in turn post less data. But the latter will require you to build that logic yourself.

Start with the CF server limits first. This blog post should give you a pointer to where limits can be adjusted:
http://www.cutterscrossing.com/index.cfm/2012/3/27/ColdFusion-Security-Hotfix-and-Big-Forms

Related

Determine unique visitors to site

I'm creating a Django website with Apache2 as the server. I need a way to determine the number of unique visitors to my website (specifically to every page) in a foolproof way. Unfortunately, users will have high incentives to try to "game" the tracking system, so I'm trying to make it foolproof.
Is there any way of doing this?
Currently I'm trying to use IP & Cookies to determine unique visitors, but this system can be easily fooled with a headless browser.
Unless it's necessary that the data be integrated into your Django database, I'd strongly recommend "outsourcing" your traffic to another provider. I'm very happy with Google Analytics.
Failing that, there's really little you can do to keep someone from gaming the system. You could limit based on IP address but then of course you run into the problem that often many unique visitors share IPs (say, via a university, organization, or work site). Cookies are very easy to clear out, so if you go that route then it's very easy to game.
One thing that's harder to get rid of is files stored in the appcache, so one possible solution that would work on modern browsers is to store a file in the appcache. You'd count the first time it was loaded in as the unique visit, and after that since it's cached they don't get counted again.
Of course, since you presumably need this to be backwards compatible, it leaves you open to exactly the sorts of tools that are most likely to be used for gaming the system, such as curl.
You can certainly block non-browser-like user agents, which makes it slightly more difficult for gamers who don't know about spoofing user-agent strings (though most will quickly learn).
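To illustrate, here is a minimal sketch of that kind of filter as an old-style Django middleware; the class name and the keyword list are made up for this example, and a determined user can of course bypass it by spoofing the header:

from django.http import HttpResponseForbidden

# Hypothetical keyword list of obviously non-browser clients; extend as needed.
BLOCKED_AGENT_KEYWORDS = ("curl", "wget", "python-urllib", "libwww")

class BlockNonBrowserAgentsMiddleware(object):
    """Refuse requests whose User-Agent looks like a script rather than a browser."""

    def process_request(self, request):
        ua = request.META.get("HTTP_USER_AGENT", "").lower()
        if not ua or any(keyword in ua for keyword in BLOCKED_AGENT_KEYWORDS):
            return HttpResponseForbidden("Automated clients are not counted.")
        return None  # otherwise fall through to normal request handling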
Really, the best solution might be to ask: what is the outcome of a visit to a page? If it is, for example, selling a product, then don't reward the people who have the most page views; reward the people whose hits generate the most sales, or whatever time-consuming action someone might take on the page.
Possible solution:
If you're willing to ignore people with JavaScript disabled, you could choose to count only people who access the page and then stay on that page for a given window of time (say, 1 minute). After that period, do an Ajax request back to the server. If they tried to game it by changing their cookie and loading multiple tabs at once, it wouldn't work, because they'd need to have the same cookie in order to register that they'd been on that page long enough. I actually think this might work; I honestly can't see a way to game it. Basically, on the server side you store a dictionary called stay_until in request.session, with a key for each unique page, and after a minute or so you run an Ajax call back to the server. If the value of stay_until[page_id] is less than or equal to the current time, they count as an active visit; otherwise they don't. This means it will take someone at least 20 minutes to generate 20 unique visits, and as long as you make the payoff worth less than the time consumed, that is a strong disincentive.
I'd even make it more explicit: at the bottom of the page, in a noscript tag, put "Your access was not counted. Turn on JavaScript to be counted", with a link to a page that lays out the tracking process.
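Here is a rough sketch of how that might look as a pair of Django views; the view names, the register_unique_visit() helper and the one-minute window are illustrative assumptions, and the URL wiring and the Ajax call on the page itself are omitted:

import time
from django.http import HttpResponse

DWELL_SECONDS = 60  # assumed "stayed on the page" window

def page_view(request, page_id):
    # Record the earliest moment at which this visit may be counted.
    stay_until = request.session.setdefault("stay_until", {})
    stay_until[page_id] = time.time() + DWELL_SECONDS
    request.session.modified = True  # we mutated a nested dict
    return HttpResponse("...page content...")  # placeholder

def register_unique_visit(page_id, session_key):
    # Placeholder: record the (page, session) pair in your own model.
    pass

def count_visit(request, page_id):
    # Hit by an Ajax call fired from the page after DWELL_SECONDS.
    deadline = request.session.get("stay_until", {}).get(page_id)
    if deadline is not None and time.time() >= deadline:
        register_unique_visit(page_id, request.session.session_key)
        return HttpResponse("counted")
    return HttpResponse("not counted")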
As HTTP requests are stateless and you have no control over the user's behaviour on the client side, there is no bulletproof way.
The only way you're going to be able to track "unique" visitors in a fool-proof way is to make it contingent on some controlled factor such as a login. Anything else can and will fail to be completely accurate.

Choice of storage and caching

I hope the title is chosen well enough to ask this question.
Feel free to edit if not and please accept my apologies.
I am currently laying out an application that interacts with the web.
Explanation of the basic flow of the program:
The user enters a UserID into my program, which is then used to access multiple XML files over the web:
http://example.org/user/userid/?xml=1
This file contains several IDs of products the user owns in a DRM system. This list is then used to access stats and information about the user's interaction with each product:
http://example.org/user/appid/stats/?xml=1
This also contains links to various images which are specific to that application. Those may change at any time and need to be downloaded for display in the app.
This is where the horror starts, at least for me :D.
1.) How do I store that information on the user's PC?
I thought about using a directory for the UserID, then subfolders with the app ID to cache images and the XML files and load them on demand. I also thought about using a zip file with the same structure.
Or would one rather use a local DB like SQLite for that?
The average number of applications might be around 100-300, with stats and images per app ranging from roughly 5 to 700.
2.) When should I refresh the content?
The bad thing is that the website this data is downloaded from, or rather the XML files, do not contain any timestamp of when they were last refreshed or changed. So I would need to hash all the files and compare them at the moment the user accesses the data, which can take an indefinite amount of time because it is web-based. Okay, there are timeouts, but I would need to block access to the content until the data is either downloaded and processed or the timeout occurs. In both cases the application would be inaccessible for a short, or maybe even long, time, and I want to avoid that. I could let the user do the refresh manually when he needs it, but I hoped there were better methods for that.
Especially with the numbers of apps and assets mentioned above.
Thanks for reading, and please feel free to ask if I forgot to explain something.
It's probably worth using a DB since it saves you messing around with file formats for structured data. Remember to delete and rebuild it from time to time (or make sure old stuff is thoroughly removed and compact it from time to time, but it's probably easier to start again, since it's just a cache).
If the web service gives you no clues about when to reload, then you'll just have to decide for yourself, but do be sure to check the HTTP headers for any caching instructions as well as the XML data[*]. Decide on a reasonable staleness for the data (the amount of time a user spends staring at the results is an absolute minimum, since they'll see results that stale no matter what you do). Whenever you download anything, record what date/time you downloaded it. Flush old data from the cache.
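As a sketch of that record-the-download-time idea, a small cache along these lines could work; the table layout, the one-hour staleness window and the use of sqlite3/urllib2 are assumptions, not a prescription:

import hashlib
import sqlite3
import time
import urllib2  # Python 2; use urllib.request on Python 3

MAX_AGE_SECONDS = 3600  # assumed staleness window, tune to taste

conn = sqlite3.connect("cache.db")
conn.execute("""CREATE TABLE IF NOT EXISTS cache (
                    url TEXT PRIMARY KEY,
                    body BLOB,
                    content_hash TEXT,
                    fetched_at REAL)""")

def fetch(url):
    row = conn.execute("SELECT body, fetched_at FROM cache WHERE url = ?",
                       (url,)).fetchone()
    if row and time.time() - row[1] < MAX_AGE_SECONDS:
        return row[0]  # still fresh enough, skip the network round trip
    body = urllib2.urlopen(url, timeout=10).read()
    conn.execute("INSERT OR REPLACE INTO cache VALUES (?, ?, ?, ?)",
                 (url, sqlite3.Binary(body), hashlib.sha1(body).hexdigest(),
                  time.time()))
    conn.commit()
    return body

The stored content hash also lets you check later whether the upstream XML actually changed, along the lines of the hashing idea in the question.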
To prevent long delays refreshing data, you could:
visually indicate that the data is stale, but display it anyway and replace it once you've refreshed.
allow staler data when the user has a lot of stuff visible than when they're just looking at a small amount of stuff. That way, you only "do nothing" while waiting for a small amount of data, never while waiting for a large amount.
run a background task that does nothing other than expiring old stuff out of the cache and reloading it. The main app always displays the best available, however old that is.
Or some combination of tactics.
[*] Come to think of it, if the web server is providing reasonable caching instructions, then it might be simplest to forget about any sort of storage or caching in your app. Just grab the XML files and display them, but grab them via a caching web proxy that you've integrated into your app. I don't know what proxies make this easy - you can compile Squid yourself (of course), but I don't know whether you can link it into another app without modifying it yourself.

Tools and tips for switching CMS

I work for a university, and in the past year we finally broke away from our static HTML site of several thousand pages and moved to a Drupal site. This obviously entails massive amounts of data entry.
What if you're already using a CMS and are switching to another one that better suits your needs? How do you minimize the mountain of data entry during such a huge change? Are there tools built for this, or some best practices one should follow?
The Migrate module for Drupal would provide a big help. The Economist.com data migration to Drupal will give you an overview of the process.
The video from the Migration: not just for the birds presentation at Drupalcon DC 2009 is probably somewhat out-of-date, but also gives a good introduction.
Expect to have to both pre-process and post-process your data manually, whatever happens. Accept early on that your data is likely to be in a worse state than you think it is: fields will be misused; record-to-record references (foreign keys) might not be implemented properly, or at all; content is likely to need weeding and occasionally to be just bad or incorrect.
Check your database encoding. Older databases won't be in Unicode encodings, and get grumpy if you have to export data dumps and import them elsewhere. Even then, assume that there'll be some wacky nonprintable characters in your data: programs like Word seem to somehow inject them everywhere, and I've seen... codepoints... you people wouldn't believe. Consider sweeping your data before you even start (or even sweeping a database dump) for these characters. Decide whether or not to junk them or try to convert them in the case of e.g. Word "smart" punctuation characters.
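For example, a sweep over a UTF-8 dump could look roughly like this; the particular character mappings are just the usual suspects, not an exhaustive list:

import codecs
import re

# Common Word "smart" punctuation mapped to plain ASCII equivalents.
SMART_PUNCTUATION = {
    u"\u2018": "'", u"\u2019": "'",    # curly single quotes
    u"\u201c": '"', u"\u201d": '"',    # curly double quotes
    u"\u2013": "-", u"\u2014": "-",    # en and em dashes
    u"\u2026": "...",                  # ellipsis
}

# Control characters other than tab, newline and carriage return.
NONPRINTABLE = re.compile(u"[\x00-\x08\x0b\x0c\x0e-\x1f]")

def sweep(path_in, path_out):
    with codecs.open(path_in, "r", "utf-8", errors="replace") as fin, \
         codecs.open(path_out, "w", "utf-8") as fout:
        for line in fin:
            for bad, good in SMART_PUNCTUATION.items():
                line = line.replace(bad, good)
            fout.write(NONPRINTABLE.sub(u"", line))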
It's very difficult to create explicit data structures from implicit ones. If your incoming data has a separate date field, you can map that to a date field; if it has a date as part of a big lump of HTML, then even if that date is in a tag with an id attribute, a simple field mapping won't work. You could use offline scripting with BeautifulSoup or (if your HTML's a bit nicer) the faster lxml to pre-process your data set, extract those implicit fields, and save them in an explicit format. Consider creating an intermediate database where these revisions are going to go.
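As a hedged example, pulling an implicit date out of each HTML blob with BeautifulSoup might look like this; the "pubdate" id and the date format are invented for illustration:

from datetime import datetime
from bs4 import BeautifulSoup

def extract_date(html_blob):
    """Pull an implicit date field out of a lump of HTML, if present."""
    soup = BeautifulSoup(html_blob, "html.parser")
    tag = soup.find(id="pubdate")  # hypothetical id attribute
    if tag is None:
        return None
    try:
        return datetime.strptime(tag.get_text(strip=True), "%d %B %Y")
    except ValueError:
        return None  # flag this record for manual review instead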
The Migrate module is excellent, but to get really good data fidelity and play more clever tricks you might need to learn about its hook system (Drupal's terminology for functions following a particular naming scheme) and the basics of writing a module to put these hooks in (a module is broadly just a PHP file where all the functions begin with the same text, the name of the module file.)
All imported content should be flagged for at least a cursory check. You can do this by importing it with status=0 i.e. unpublished, and then create a view with the Views module to go through the content and open it in other tabs for checking. Views Bulk Operations lets you have a set of checkboxes alongside your view items, so you could approve many nodes at once.
Expect to run and re-run and re-run the import, fixing new things every time. Check ten, or twenty items, as early as possible. If there are any problems, check ten or twenty more. Fix and repeat the import.
Gauge how long a single import run is likely to take. Be pessimistic: we had an import we expected to take ten hours encounter exponential slowdown when we introduced the full data set; until we finally fixed some slow queries, it was projected to take two weeks.
If in doubt, or if you think the technical aspects of the above are just going to take more time than the work itself, then just hire temps to do the data entry. But you still need decent quality controls, as early as possible during their work. Drupal developers are also for hire: try your country's relevant IRC channel, or post a note in a relevant groups.drupal.org group. They're more expensive than temps, but they usually write better PHP...! Consider hiring an agency too: that's a shameless plug, as I work for one, but sometimes it's best to get experts in for these specific jobs.
Really good imports are always hard, harder than you expect. Don't let it get you down!
Migrate + Table Wizard (and Schema + Views) is the way to go. With Table Wizard you can expose any table to Drupal and map fields accordingly using Migrate.
Look here for a detailed walkthrough:
http://www.lullabot.com/articles/drupal-data-imports-migrate-and-table-wizard
You'll want to have access to the existing data from Django. This helps me a lot with migrating: http://docs.djangoproject.com/en/1.2/howto/legacy-databases/ . With correct model definitions you'll have the full power of Django, including the admin. In fact, I'm using Django just as an admin backend for several legacy PHP projects; Django's admin can easily outdo a lot of custom hand-written admin scripts.
Authentication should remain the same. Users should be able to log in with their existing credentials, but it is hard to write a migration script for auth data because password hashing schemes may differ and there is no way to convert between them without knowing the plain passwords. Django provides a way to support different sources of auth, so you can write a Drupal auth backend: http://docs.djangoproject.com/en/1.2/topics/auth/#writing-an-authentication-backend
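A skeleton of such a backend (following the auth-backend interface in the Django 1.2 docs linked above) might look like this; drupal_check_password() is a placeholder you would have to fill in with whatever hashing scheme your Drupal version uses:

from django.contrib.auth.models import User

def drupal_check_password(raw_password, stored_hash):
    # Placeholder: compare against whatever hashing scheme your Drupal
    # version uses. This stub rejects everything until implemented.
    return False

class DrupalBackend(object):
    """Authenticate users against credentials imported from Drupal."""

    def authenticate(self, username=None, password=None):
        try:
            user = User.objects.get(username=username)
        except User.DoesNotExist:
            return None
        if drupal_check_password(password, user.password):
            return user
        return None

    def get_user(self, user_id):
        try:
            return User.objects.get(pk=user_id)
        except User.DoesNotExist:
            return None

You would then list the backend in AUTHENTICATION_BACKENDS in settings.py.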
There is no need to do a full rewrite. If some parts are working fine, they can still be powered by Drupal. New code can be written using Django with the same UI. Routing between old and new parts can be handled by web server URL rewriting, and both the Django and Drupal parts can be powered by the same DB.

What would be a good Coldfusion-based bug tracking software?

What I am looking for is a tool that easily or automatically sends ColdFusion error messages to its system.
Then I can use the web-based interface to manage priorities, track who fixed what, and so forth.
I want to use this not only to help us deal with errors better, but also to show the importance of a bug tracking system to my fellow workers.
System Requirements: Apache, Windows, ColdFusion 8 Standard, SQL Server 2005.
Financial Requirements: Free or Open Source
Goal Or Purpose: To encourage my fellow workers to want and use a bug tracking system.
Does this re-write make more sense?
Thanks
Craig
Wikipedia has a comparison of issue tracking software; maybe this list could help.
http://en.wikipedia.org/wiki/Comparison_of_issue_tracking_systems
You may be able to find a hosted service and use either email or web services to create the ticket using onError. With that said, a simple issue tracking app could be created for your site using the same DB used to drive the content. Two or three tables would take care of the data storage, and you're already using CF, so the application layer is already there.
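For what it's worth, the "2 or 3 tables" could be as simple as something like this; the table and column names are made up, and the real thing would of course live in your SQL Server database rather than SQLite:

import sqlite3  # SQLite is only a stand-in here; the real tables would live in SQL Server

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issue (
    issue_id    INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    detail      TEXT,                -- stack trace, CGI variables, etc.
    priority    INTEGER DEFAULT 3,
    status      TEXT DEFAULT 'open',
    assigned_to TEXT,
    created_at  TIMESTAMP
);
CREATE TABLE issue_comment (
    comment_id  INTEGER PRIMARY KEY,
    issue_id    INTEGER REFERENCES issue(issue_id),
    author      TEXT,
    body        TEXT,
    created_at  TIMESTAMP
);
""")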
HTH.
I have been heavily using this type of setup for several years, first by email only, and for the last 3 years with bug tracking software.
I must say, the bug tracking software has made my life so much more peaceful. Nothing is left out, forgotten, or slips through the cracks. It's easy to find trends in errors and to remember "all the times" it happened.
Our setup is like this:
1) ColdFusion + an appropriate framework with error reporting - It doesn't matter what you use. I have used Fusebox extensively and am making the transition to ColdBox. Both are very capable, in addition to Mach-II, FW/1, Model-Glue, etc. The key thing you have to find in them is their ability to catch "onError", usually in the application CFC.
2) Custom OnError Script - Wherever an error occurs, you want to capture the maximum amount of information about that error and email it in. What we do is, when an error occurs, we log the user out with a message of "oops, log in again". Before logging them out, the application captures the error and emails it to Fogbugz. Along with it, at the top we include the CGI variables for the IP address, browser being used, etc. Over time you will find the things you need to add.
3) Routing in FogBugz. A 2-user version of FogBugz is free and hosted online. There are two main ways to submit bugs. One is to email them in one at a time, so if an error happens 2000 times, you get 2000 emails and 2000 cases; not always the best way to link them together, etc. They also have a feature called BugzScout, which is essentially an HTTP address that you do a form post to with cfform, with all of the same information you would have put into the email. There's plenty of documentation on this, and it's something I've always wanted to get around to. I had the 2000-email scenario happen for the first time a few weeks ago, so I'll be switching over to this.
Hope that helps. Share what you ended up doing and why so we all can learn too!
I'm surprised no one mentioned LighthousePro (http://lighthousepro.riaforge.org). Open source - 100% free - and ColdFusion. As the author I'm a bit biased though. :)
Hard question to answer without knowing what kind of restrictions there are. Do you have permission to install anything? Also, most bug-tracking systems require some kind of database support.
I have a suggestion: you could put in place a basic bug-tracking system that just allows people to create tickets and allows you or someone else to close them.
More Windows based tools are mentioned here
Good open-source bug tracking / issue tracking sofware for Windows
Any reason why coldfusion specifically?
I really like Fogbugz from the makers of Stack Overflow. For one user it's quite reasonably priced. I enter some bugs manually and have others emailed in.
A lot of bug tracking software will expose SOAP methods for entering data into them.
For example, we used Axosoft's OnTime and that exposed some WSDL pages that I consumed in my application. I was told that Jira did as well.
There are a few in the CF411 list: Bug Tracking/Defect Tracking/Trouble Ticket/Help Desk Tools Written in CFML
We use HopToad. There is another bug-tracking app called LightHouse that integrates with HopToad, so you can easily create a [bug] ticket from an incoming exception. HopToad has an API with many clients; you want the CF-based one:
http://github.com/timblair/coldfusion-hoptoad-notifier
Even if you don't use HopToad and end up using a different service or rolling your own, if you need to write your own API client you could leverage the code or pattern(s) of the HopToad client above.
A lot of good information from everyone, and I really do appreciate the efforts given, but it's not the answer I was looking for. Which maybe means that what I want does not exist yet.
So I may have to roll my own solution... or maybe integrate with another existing app...
Thank you all.

How to encourage non-anonymous editing on MediaWiki?

Problem
At work we have a department wiki (running MediaWiki). Unfortunately, several
people edit without logging in, and that makes it very difficult to track
down editors to ask questions about the content.
There are two strategies to improve this:
encourage logged in editing
discourage anonymous editing.
Encouraging
For this part, any tips are welcome. But of course there are always risks involved
in rewarding behaviours.
Discourage
I know that this must be kept low or else it will discourage any editing.
But something just slightly annoying would be nice to have.
[update]
I know it is possible to just disallow anonymous editing, but that will put a high barrier to any first time contribution (especially for people outside our department!), so I do not think that is an option.
[/update]
[update2]
Using LDAP or Active Directory does not solve the problem since the wiki is also accessible and used by external contractors.
[/update2]
[update3]
I am no longer working for this company. That does not mean that I have completely lost interest in this question, but from my current point of view the most valuable part is the "Did you forget to log in?" part below, and I will accept answers based on that part of the question.
[/update3]
Confirmation
One thought was to have an additional confirmation step for anonymous users -
"Are you really sure you want to submit this anonymously?" - although with
such a question there is a risk that people will give up or resist editing. However,
if that question is re-phrased in a more diplomatic way as "Did you forget
to log in?" I think it will appear much more acceptable. Besides, that
will also capture those situations where the author did in fact forget to
log in but actually would want to have his/her contributions credited to
his/her user. This last point is by itself a good enough reason for wanting it.
Is this possible?
Delay
Another thought for something to be slightly annoying is to add an extra
forced delay after "save page" displaying something like "If you had logged
in you would not have to wait x seconds". Selecting the right x is difficult
because if it is too high it will be a barrier and if it is too low it might not
make any difference. But then I started thinking, what about starting at
zero and then add one second delay for each anonymous edit by a given IP
address in a given time frame? That way there will be no barrier for
starting to use the wiki, and by the time the delay is getting significant
the user has already contributed a lot so I think the outcome is much
more likely to be that the editor eventually creates a user rather than
giving up. This assumes IP addresses are rather static, but that is
typically the case in a business network.
Is this possible?
You can Turn off Anonymous Editing in Mediawiki like so:
Edit LocalSettings.php and add the following setting:
$wgDisableAnonEdit = true;
Edit includes/SkinTemplate.php, find $fname-edit and change the code to look like this (i.e., basically wrap the following code between the wfProfileIn() and wfProfileOut() functions):
wfProfileIn( "$fname-edit" );
global $wgDisableAnonEdit;
if ( $wgUser->mId || !$wgDisableAnonEdit) {
// Leave this as is
}
wfProfileOut( "$fname-edit" );
Next, you may want to disable the [Edit] links on sections. To do this, open includes/Skin.php and search for editsection. You will see something like:
if (!$wgUser->getOption( 'editsection' ) ) {
Change that to:
global $wgDisableAnonEdit;
if (!$wgUser->getOption( 'editsection' ) || !$wgDisableAnonEdit ) {
Section editing is now blocked for anonymous users.
Forbid anonymous editing and let people log in using their domain logins (LDAP). Often the threshold is registering a new user and making up a username and password and such.
I think you should discourage anonymous edits by forbidding them - it's an internal wiki, after all.
The flipside is you must make the login process as easy as possible. Hopefully you can configure the login cookie to have a decent length (like 1 month) so they only need to login once per month.
Play to the people's egos, and add a rep system kind of like here. Just make a widget for the home page that shows the number of edits made by the top 5 users or something. Give the top 1 or 2 users a MVP reward at regular (monthly?) intervals.
Well, I doubt that this solution will be valuable for hlovdal, given that this question is now two months old, but maybe somebody else will find it useful:
The optimum solution to this problem is to enable automatic logins. This requires two steps. First, you need to add automatic authentication to your web service. Right now, we're using Apache with the Debian libapache2-authenntlm-perl package on our internal application server*. (Our network is Active Directory and, obviously, the server runs on Debian Linux.) Second, you need a MediaWiki extension that makes MediaWiki aware of the web service's authentication. I've used the Automatic REMOTE_USER Authentication module successfully on an Apache web server that was tied into our network via an NTLM authentication module, but I do recall that it required a bit of massaging the code to make it work:
I had to follow the "horrid hacks" given on the extension's page, changing the setPassword() and addUser() functions to always return true instead of always returning false.
Since Active Directory is case-insensitive and MediaWiki isn't, I replaced both instances of the statement $username = $_SERVER['REMOTE_USER'] with $username = getCanonicalName($_SERVER['REMOTE_USER']).
Since I wanted to only allow certain people within the company to use our wiki, I set autoCreate() to always return false. It doesn't sound as if you need to worry about this, so you should leave autoCreate() at always returning true, which means that anybody on your company network will be able to access the wiki.
The nifty thing about this solution is that nobody has to log in into the wiki, ever; they simply go to a wiki page and they are logged in under their network ID.
* We just switched to this from a Red Hat server that was using mod_ntlm. Unfortunately, mod_ntlm hasn't been updated in a while and it's been starting to sporadically fail. I mention this because I've started to stumble on a performance issue with our current MediaWiki configuration that may require further code massaging....
Make sure users don't get logged out if they look away from the screen or sneeze or scratch their head. You want long, persistent, sessions. Once logged in, stay logged in.
That's the problem with the MediaWiki our company is using internally - you log in, do stuff, then come back later and it logged you out, but the notification of not being logged in anymore is so insignificant on the screen that the user never notices.
If this runs within an internal network, you could pull Active Directory information so that no one has to log in, ever. That's how I do it at work. That is, if they are logged into their windows machine, then my webapps can pick up their username and associate that (or their userid) with their edits.
I don't know if this would be easy to add to MediaWiki, though.
I'd recommend checking out wikipatterns.org - a great site about the social aspects of wikis
Explicitly using some form of directory service (LDAP) would probably be a good idea, so that your users are always fully identified. On the other hand, wikis are subject to their own dynamics, in fact some wikis are so successful because they can be anonymously edited, so that's another thing to keep in mind.
Apart from that, personally I'd try to create some sort of incentive for users to contribute openly and identifiable: this could be based on a point/score system so that there are stats shown for all users who have contributed to the wiki each day, this could possibly even create some sort of competition.
Likewise, the wiki could by default not show any anonymously contributed contents without them being reviewed first, which would be another incentive for users to contribute openly.
SO has an extremely low barrier for posting. You could allow people to specify their name when making an edit. When they are ready, they can finally log in to avoid having to type their name all the time.
You said this is in a departmental situation. Can't you add a feature to the wiki where it makes an educated guess as to who is editing based on the IP address, and annotates the edit accordingly?
I agree absolutely with everyone who recommends carefully researching the effects of anonymity in your application before you start "forbidding" it. In a great many cases people prefer anonymous editing because they DO NOT WANT TO BE ASKED ABOUT IT, IDENTIFIED WITH IT, OR SUFFER SOME PROBLEM FOR POINTING IT OUT. You need to be VERY sure these factors are not driving users to prefer anonymous edits, and frankly you should continue to allow anonymized edits with a generic credential login like "anonymous_employee" or "anonymous_contractor", in case someone wants to point out an issue without becoming identified with it.
Re the "thought... to have an additional confirmation step for anonymous users- "Are you really sure you want to submit this anonymously?", it's a good idea, but do not "re-phrase" in a way that suggests it is wrong to not be logged in as yourself, i.e. don't say "Did you forget to log in?" I'd instead note it this way:
"Your edit will appear as an IP number - it may be attributed to 'anonymous_employee' or 'anonymous_contractor' or 'anonymous_contributor' for your privacy protection. You will not be notified of any answer or response to it. If you prefer to have this contribution credited, then [log in right now]."
That leaves it absolutely clear what will happen, doesn't pressure anyone to do it either way, and does not bias what is being contributed with some "rewards".
You can also, alternately, force a login via LDAP / cookies, and then ask them if they prefer this edit to be anonymous. That is the approach taken on some blog platforms. In an intranet the abuse potential for this is basically zero, so you would presumably only have situations where someone didn't want 'how they knew' or 'why they raised this' to be the question rather than the data itself... IBM has shown in some careful research that anonymized feedback is very much more useful than attributed feedback in correcting groupthink and management blind spots.