I have an entire set of data i want to insert into a table. I am trying to have it insert/update everything OR rollback. I was going to do it in a transaction, but i wasnt sure if the sql_exec() command did the same thing.
My goal was to iterate through the list.
Select from each iteration based on the Primary Key.
If result was found:
append update to string;
else
append insert to string;
Then after iterating through the loop, i would have a giant string and say:
sql_exec(string);
sql_close(db);
Is that how i should do it? I was going to do it on each iteration of the loop, but i didnt think a global rollback if there was an error.
No, you should not append everything into a giant string. If you do, you will need to allocate a whole bunch of memory as you are going, and it will be harder to create good error messages for each individual statement, as you will just get a single error for the entire string. Why spend all of that effort, constructing one big string when SQLite is just going to have to parse it back down into its individual statements again?
Instead, as #Chad suggests, you should just use sqlite3_exec() on a BEGIN statement, which will begin a transaction. Then sqlite3_exec() each statement in turn, and finally sqlite3_exec() a COMMIT or ROLLBACK depending on how everything goes. The BEGIN statement will start a transaction, and all of the statements executed after that will be within that transaction, and so committed or rolled back together. That's what the "A" in ACID stands for; Atomic, as all of the statements in the transaction will be committed or rolled back as if they were a single atomic operation.
Furthermore, you probably shouldn't use sqlite3_exec() if some of the data varies within each statement, such as being read from a file. If you do, a mistake could easily leave you with an SQL injection bug. For instance, if you construct your query by appending strings, and you have strings like char *str = "it's a string" to insert, if you don't quote it properly, your statement could come out like INSERT INTO table VALUES ('it's a string');, which will be an error. Or if someone malicious could write data into this file, then they could cause you to execute any SQL statement they want (imagine if the string were "'); DROP TABLE my_important_table; --"). You may think that no one malicious is going to provide input, but you can still have accidental problems, if someone puts a character that confuses the SQL parser into a string.
Instead, you should use sqlite3_prepare_v2() and sqlite3_bind_...() (where ... is the type, like int or double or text). In order to do this, you use a statement like char *query = "INSERT INTO table VALUES (?)", where you substitute a ? for where you want your parameter to go, prepare it using sqlite3_prepare_v2(db, query, -1, &stmt, NULL), bind the parameter using sqlite3_bind_text(stmt, 1, str, -1, SQLITE_STATIC), then execute the statement with sqlite3_step(stmt). If the statement returns any data, you will get SQLITE_ROW, and can access the data using the various sqlite3_columne_...() functions. Be sure to read the documentation carefully; some of the example parameters I gave may need to change depending on how you use this.
Yes, this is a bit more of a pain than calling sqlite3_exec(), but if your query has any data loaded from external sources (files, user input), this is the only way to do it correctly. sqlite3_exec() is fine to call if the entire text of the query is contained within your source, such as the BEGIN and COMMIT or ROLLBACK statements, or pre-written queries with no parts coming from outside of your program, you just need prepare/bind if there's any chance that an unexpected string could get in.
Finally, you don't need to query whether something is in the database already, and then insert or update it. You can do a INSERT OR REPLACE query, which will either insert a record, or replace one with a matching primary key, which is the equivalent of selecting and then doing an INSERT or an UPDATE, but much quicker and simpler. See the INSERT and "on conflict" documentation for more details.
Related
while writing code we can either use select statement or select field list or find method on table for fetching the records.
I wonder which of the statement helps in better performance
It really depends on what you actually need.
find() methods must return the whole table buffer, that means, all of the columns are projected into the buffer returned by it, so you have the complete record selected. But sometimes you only need a single column, or just a few. In such cases it can be a waste to select the whole record, since you won't use the columns selected anyway.
So if you're dealing with a table that has lots of columns and you only need a few of them, consider writing a specific select statement for that, listing the columns you need.
Also, keep in mind that select statements that only project a few columns should not be made public. That means that you should NOT extract such statements into a method, because imagine the surprise of someone consuming that method and trying to figure out why column X was empty...
You can look at the find() method on the table and find out the same 'select'-statement there.
It can be the same 'select; statement as your own an the performance will be the same in this case.
And it can be different select statement then your own and the performance will be depend on indexes on the table, select statement, collected statistics and so on.
But there is no magic here. All of them is just select statement - no matter which method do you use.
I am having an array of structure. I need to insert all the rows from that array to a table.
So I have simply used cfquery inside cfloop to insert into the database.
Some people suggested me not to use cfquery inside cfloop as each time it will make a new connection to the database.
But in my case Is there any way I can do this without using cfloop inside cfquery?
Its not so much about maintaining connections as hitting the server with 'n' requests to insert or update data for every iteration in the cfloop. This will seem ok with a test of a few records, but then when you throw it into production and your client pushes your application to look around a couple of hundred rows then you're going to hit the database server a couple of hundred times as well.
As Scott suggests you should see about looping around to build a single query rather than the multiple hits to the database. Looping around inside the cfquery has the benefit that you can use cfqueryparam, but if you can trust the data ie. it has already been sanatised, you might find it easier to use something like cfsavecontent to build up your query and output the string inside the cfquery at the end.
I have used both the query inside loop and loop inside query method. While having the loop inside the query is theoretically faster, it is not always the case. You have to try each method and see what works best in your situation.
Here is the syntax for loop inside query, using oracle for the sake of picking a database.
insert into table
(field1, field2, etc)
select null, null, etc
from dual
where 1 = 2
<cfloop>
union
select <cfqueryparam value="#value1#">
, <cfqueryparam value="#value2#">
etc
from dual
</cfloop>
Depending on the database, convert your array of structures to XML, then pass that as a single parameter to a stored procedure.
In the stored procedure, do an INSERT INTO SELECT, where the SELECT statement selects data from the XML packet. You could insert hundreds or thousands of records with a single INSERT statement this way.
Here's an example.
There is a limit to how many <CFQUERY><cfloop>... iterations you can do when using <cfqueryparam>. This is also vendor specific. If you do not know how many records you will be generating, it is best to remove <cfqueryparam>, if it is safe to do so. Make sure your data is coming from trusted sources & is sanitised. This approach can save huge amounts of processing time, because it is only make one call to the database server, unlike an outer loop.
How I can encode/escape a varchar to be more secure without using cfqueryparam? I want to implement the same behaviour without using <cfqueryparam> to get around "Too many parameters were provided in this RPC request. The maximum is 2100" problem. See: http://www.bennadel.com/blog/1112-Incoming-Tabular-Data-Stream-Remote-Procedure-Call-Is-Incorrect.htm
Update:
I want the validation / security part, without generating a prepared-statement.
What's the strongest encode/escape I can do to a varchar inside <cfquery>?
Something similar to mysql_real_escape_string() maybe?
As others have said, that length-related error originates at a deeper level, not within the queryparam tag. And it offers some valuable protection and therefore exists for a reason.
You could always either insert those values into a temporary table and join against that one or use the list functions to split that huge list into several smaller lists which are then used separately.
SELECT name ,
..... ,
createDate
FROM somewhere
WHERE (someColumn IN (a,b,c,d,e)
OR someColumn IN (f,g,h,i,j)
OR someColumn IN (.........));
cfqueryparam performs multiple functions.
It verifies the datatype. If you say integer, it makes sure there is an integrer, and if not, it does nto allow it to pass
It separates the data of a SQL script from the executable code (this is where you get protection from SQL injection). Anything passed as a param cannot be executed.
It creates bind variables at the DB engine level to help improve performance.
That is how I understand cfqueryparam to work. Did you look into the option of making several small calls vs one large one?
It is a security issue. Stops SQL injections
Adobe recommends that you use the cfqueryparam tag within every cfquery tag, to help secure your databases from unauthorized users. For more information, see Security Bulletin ASB99-04, "Multiple SQL Statements in Dynamic Queries," at www.adobe.com/devnet/security/security_zone/asb99-04.html, and "Accessing and Retrieving Data" in the ColdFusion Developer's Guide.
The first thing I'd be asking myself is "how the heck did I end up with more than 2100 params in a single query?". Because that in itself should be a very very big red flag to you.
However if you're stuck with that (either due to it being outwith your control, or outwith your motivation levels to address ;-), then I'd consider:
the temporary table idea mentioned earlier
for values over a certain length just chop 'em in half and join 'em back together with a string concatenator, eg:
*
SELECT *
FROM tbl
WHERE col IN ('a', ';DROP DATABAS'+'E all_my_data', 'good', 'etc' [...])
That's a bit grim, but then again your entire query sounds grim, so that might not be such a concern.
param values that are over a certain length or have stop words in them or something. This is also quite a grim suggestion.
SERIOUSLY go back over your requirement and see if there's a way to not need 2100+ params. What is it you're actually needing to do that requires all this???
The problem does not reside with cfqueryparam, but with MsSQL itself :
Every SQL batch has to fit in the Batch Size Limit: 65,536 * Network Packet Size.
Maximum size for a SQL Server Query? IN clause? Is there a Better Approach
And
http://msdn.microsoft.com/en-us/library/ms143432.aspx
The few times that I have come across this problem I have been able to rewrite the query using subselects and/or table joins. I suggest trying to rewrite the query like this in order to avoid the parameter max.
If it is impossible to rewrite (e.g. all of the multiple parameters are coming from an external source) you will need to validate the data yourself. I have used the following regex in order to perform a safe validation:
<cfif ReFindNoCase("[^a-z0-9_\ \,\.]",arguments.InputText) IS NOT 0>
<cfthrow type="Application" message="Invalid characters detected">
</cfif>
The code will force an error if any special character other than a comma, underscore, or period is found in a text string. (You may want to handle the situation cleaner than just throwing an error.) I suggest you modify this as necessary based on the expected or allowed values in the fields you are validating. If you are validating a string of comma separated integers you may switch to use a more limiting regex like "[^0-9\ \,]" which will only allow numbers, commas, and spaces.
This answer will not escape the characters, it will not allow them in the first place. It should be used on any data that you will not use with <cfqueryparam>. Personally, I have only found a need for this when I use a dynamic sort field; not all databases will allow you to use bind variables with the ORDER BY clause.
I am trying to insert a large number of records into a SQLite database. I get the above error if I try to use the sqlite3_exec C-API.
The code looks like this:
ret = sqlite_exec(db_p,".import file.txt table", NULL, NULL, NULL);
I know that the .import is command line, but can there be any way that you can do a extremely large insert of records that takes minimal time. I have read through previous bulk insert code and attempted to make changes but these are not providing the desired results.
Is there not a way to directly insert the string into the tables without having intermediate API's being called?
.import is most probably not available via the API. However there's one crucial thing to speed up inserts: wrap them in a transaction.
BEGIN;
lots of insert statements here;
COMMIT;
Without this, sqlite will need to write to the file after each insert to keep the ACID principle. The transaction let's it write to file later in bulk.
The answer to the syntax error could well be, that your strings are not enclosed in quotes in your SQL statement.
I'm using sqlite3_bind_text to bind text parameters to my queries, with the SQLITE_STATIC flag, since I know the text pointer remains valid at least up until the query is executed.
Recently I've made changes so that the queries are executed in the transaction mode (many such queries in a single transaction). Should the text buffer remain valid up until the transaction is finished?
I mean, my text buffers are valid for the duration of a single query, but no the whole transaction. Should I specify the SQLITE_TRANSIENT flag?
Yes, if you're using SQLITE_STATIC, you should leave the contents alone until after the transaction is finished. Even more so, you should leave the contents alone until you've either rebound the parameter to something else or until you've freed the statement.
SQLITE_TRANSIENT requests that Sqlite make an internal copy of the string which it will manage appropriately. Given your description, this is probably what you should use. Otherwise, you'll have to manage your own copy of each string for each statement.