Polybase - is it possible to truncate over length fields - azure-sqldw

I've defined my external table, but I have one field that could theoretically contain a very long string that I don't care about.
Rather than have the load error out, is it possible to configure Polybase to truncate long fields that exceed the specified length?

Today it is not possible to truncate during load. It's a great suggestion. You can file a formal request here:
https://feedback.azure.com/forums/307516-sql-data-warehouse
This is where we capture feedback from the public. Thanks.

Related

Quicksight breaking up strings for use of all aspects

I was wondering if anyone has ever had experience with breaking a string up in QuickSight and using certain parts of it. My example is a data set that returns tags like this: "animals|funny|dog-park". I have used split(tags,'|',1), but then all that gets returned is the first part (animals). I have also tried a combination of ifelse -> locate -> split with no luck. Is there a way to split these tags so that they are all usable, e.g. (animals) & (funny), or (funny) & (dog-park), etc.? Say the associated article would then be attributed to one tag, but also to another separately? I know this will most likely end up being a calculated field. Thank you in advance!
Since QuickSight does not support any form of nested fields (including objects and lists) in analysis, you need to normalise this into separate rows before feeding the data to QuickSight.
Otherwise, if you leave it as is, you will be limited to filtering using string contains and doing string lookups in calculated fields; you will not be able to use these tags as categories (such as in the colour field wells of visuals).
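If the data passes through a preprocessing step before it reaches QuickSight, that normalisation can be as simple as exploding the pipe-delimited column into one row per tag. A minimal sketch in Python/pandas, assuming a hypothetical articles.csv with article_id and tags columns (file and column names are illustrative):

import pandas as pd

# Hypothetical input: one row per article, tags packed as "animals|funny|dog-park".
df = pd.read_csv("articles.csv")  # assumed columns: article_id, tags

# One row per (article_id, tag) pair, so each tag can be used as a category
# (filters, colour field wells, etc.) once loaded into QuickSight.
normalised = (
    df.assign(tag=df["tags"].str.split("|"))
      .explode("tag")
      .drop(columns=["tags"])
)

normalised.to_csv("articles_tags.csv", index=False)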

How to request 627 fields from Marketo's APIs

I am new to this type of access. I need to retrieve lead changes from Marketo, with about 627 attribute names. But the field list in the endpoint cannot be empty, and cannot contain that many attributes.
Does anybody know how to solve this? What is the maximum number of fields allowed in the endpoint?
There's no explicit limit. What are you encountering that makes you think you can't retrieve all of the fields?
Make sure to use POST and pass in _method=GET as a parameter. If you use GET, you will run into URL size limits when you specify that many fields.
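As a rough sketch of that trick in Python with requests (instance URL, token, endpoint and parameter names are illustrative and should be checked against the Marketo REST docs for your subscription):

import requests

BASE = "https://<instance>.mktorest.com"               # your Marketo REST base URL
ACCESS_TOKEN = "<access token>"                        # obtained from the identity endpoint
attribute_names = ["firstName", "lastName", "email"]   # in practice, your ~627 names
page_token = "<paging token>"                          # from /rest/v1/activities/pagingtoken.json

# POST with _method=GET moves the long parameter list into the request body,
# so it is not subject to URL length limits.
resp = requests.post(
    f"{BASE}/rest/v1/activities/leadchanges.json",
    params={"_method": "GET", "access_token": ACCESS_TOKEN},
    data={
        "nextPageToken": page_token,
        "fields": ",".join(attribute_names),
    },
)
print(resp.json())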

About bitwise data

I am not sure if this relates to "bitwise". I store music files that can be provided in different formats, e.g. MP3, WAV, MIDI... I need to store the provided type in the DB. One solution is to create individual DB fields/columns for each format, e.g. withMP3, withWav, withMidi... But once I add one more format, I need to create an extra column.
Is there any standard solution for storing the formats in one field? For example, the first bit stores whether there is an MP3, the second bit whether there is a WAV... Once I add one more file format, it just needs to append one more bit to the data, with no need to add a new column. I am not sure which aspect this question relates to. I hope that someone can help me.
Many thanks!!
Turn that data into its own table (id, format, blob), then you can associate those rows with the rows in the other table via another table. That way the schema is independent of the number of formats.
I'm not sure why you are trying to store this information as fields. I would just store the MIME type; that is normally enough information for a normal database.
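For completeness, the bit-per-format idea described in the question is usually implemented as an integer bitmask kept in a single column. A minimal sketch with hypothetical format names (note the answers above argue a separate table or a MIME type is the cleaner design):

from enum import IntFlag

class AudioFormat(IntFlag):
    MP3 = 1    # bit 0
    WAV = 2    # bit 1
    MIDI = 4   # bit 2
    # A format added later simply takes the next free bit, e.g. FLAC = 8.

# One integer column in the DB instead of one boolean column per format.
stored = int(AudioFormat.MP3 | AudioFormat.WAV)   # -> 3

# Reading it back: test individual bits.
formats = AudioFormat(stored)
print(bool(formats & AudioFormat.MP3))    # True  -- available as MP3
print(bool(formats & AudioFormat.MIDI))   # False -- no MIDI version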

Overcoming Exclude List Limit Size

I'm trying to make a query using Django's exclude() and pass it a list, as in:
(...).exclude(id__in=list(top_vip_deals_filter))
The problem is that, apparently, there is a limit -- depending on your database -- on the size of the list being passed.
Is this correct?
If so, how can I overcome it?
If not, is there some explanation for the fact that queries silently fail when the list is big?
Thanks
If top_vip_deals_filter comes from the database, you can add an extra WHERE clause to the query:
(...).extra(where=['model.id not in select blah blah'])
(Put your lowercase model name in place of model.)
You can do better if the data model allows it. If you can do it in SQL, you can probably do it in Django.
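Alternatively, if the IDs to exclude live in another table, Django accepts a queryset (rather than a list) for id__in and compiles it into a SQL subquery, so no huge parameter list is sent at all. A minimal sketch with hypothetical Deal / TopVipDeal models:

# Hypothetical models: Deal is the table being filtered, TopVipDeal holds the
# IDs to exclude (whatever top_vip_deals_filter is built from).
from deals.models import Deal, TopVipDeal

top_vip_ids = TopVipDeal.objects.values_list("deal_id", flat=True)

# No list() here: the unevaluated queryset becomes a subquery in the generated
# SQL (WHERE id NOT IN (SELECT deal_id FROM ...)), so no oversized parameter
# list ever reaches the database.
deals = Deal.objects.exclude(id__in=top_vip_ids)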

How can I encode/escape a varchar to be more secure without using cfqueryparam?

How can I encode/escape a varchar to make it more secure without using cfqueryparam? I want to implement the same behaviour without using <cfqueryparam>, to get around the "Too many parameters were provided in this RPC request. The maximum is 2100" problem. See: http://www.bennadel.com/blog/1112-Incoming-Tabular-Data-Stream-Remote-Procedure-Call-Is-Incorrect.htm
Update:
I want the validation / security part, without generating a prepared-statement.
What's the strongest encode/escape I can do to a varchar inside <cfquery>?
Something similar to mysql_real_escape_string() maybe?
As others have said, that length-related error originates at a deeper level, not within the queryparam tag. And it offers some valuable protection and therefore exists for a reason.
You could either insert those values into a temporary table and join against it, or use list functions to split that huge list into several smaller lists which are then used separately:
SELECT name ,
..... ,
createDate
FROM somewhere
WHERE (someColumn IN (a,b,c,d,e)
OR someColumn IN (f,g,h,i,j)
OR someColumn IN (.........));
cfqueryparam performs multiple functions.
It verifies the datatype. If you say integer, it makes sure the value is an integer, and if it is not, it does not allow it to pass.
It separates the data of a SQL script from the executable code (this is where you get protection from SQL injection). Anything passed as a param cannot be executed.
It creates bind variables at the DB engine level to help improve performance.
That is how I understand cfqueryparam to work. Did you look into the option of making several small calls vs one large one?
It is a security issue; it stops SQL injection.
Adobe recommends that you use the cfqueryparam tag within every cfquery tag, to help secure your databases from unauthorized users. For more information, see Security Bulletin ASB99-04, "Multiple SQL Statements in Dynamic Queries," at www.adobe.com/devnet/security/security_zone/asb99-04.html, and "Accessing and Retrieving Data" in the ColdFusion Developer's Guide.
The first thing I'd be asking myself is "how the heck did I end up with more than 2100 params in a single query?". Because that in itself should be a very very big red flag to you.
However if you're stuck with that (either due to it being outwith your control, or outwith your motivation levels to address ;-), then I'd consider:
the temporary table idea mentioned earlier
for values over a certain length just chop 'em in half and join 'em back together with a string concatenator, eg:
SELECT *
FROM tbl
WHERE col IN ('a', ';DROP DATABAS'+'E all_my_data', 'good', 'etc' [...])
That's a bit grim, but then again your entire query sounds grim, so that might not be such a concern.
Filter out param values that are over a certain length or have stop words in them, or something along those lines. This is also quite a grim suggestion.
SERIOUSLY go back over your requirement and see if there's a way to not need 2100+ params. What is it you're actually needing to do that requires all this???
The problem does not reside with cfqueryparam, but with MS SQL Server itself:
Every SQL batch has to fit in the Batch Size Limit: 65,536 * Network Packet Size.
Maximum size for a SQL Server Query? IN clause? Is there a Better Approach
And
http://msdn.microsoft.com/en-us/library/ms143432.aspx
The few times that I have come across this problem I have been able to rewrite the query using subselects and/or table joins. I suggest trying to rewrite the query like this in order to avoid the parameter max.
If it is impossible to rewrite (e.g. all of the multiple parameters are coming from an external source) you will need to validate the data yourself. I have used the following regex in order to perform a safe validation:
<cfif ReFindNoCase("[^a-z0-9_\ \,\.]",arguments.InputText) IS NOT 0>
<cfthrow type="Application" message="Invalid characters detected">
</cfif>
The code will throw an error if any character other than letters, numbers, underscores, spaces, commas, or periods is found in the text string. (You may want to handle the situation more cleanly than just throwing an error.) I suggest you modify this as necessary based on the expected or allowed values in the fields you are validating. If you are validating a string of comma-separated integers, you may switch to a more limiting regex like "[^0-9\ \,]", which will only allow numbers, commas, and spaces.
This answer will not escape the characters, it will not allow them in the first place. It should be used on any data that you will not use with <cfqueryparam>. Personally, I have only found a need for this when I use a dynamic sort field; not all databases will allow you to use bind variables with the ORDER BY clause.