Access Table - Expression Builder unexpected results - IIf statement

I have a huge CSV data file that generates 500,000+ rows and 70+ columns; running Excel queries over this much data causes my desktop to crash.
As an alternative, I've managed to import the CSV into Access.
The majority of the data fields I need to review/consider within further calculations I've imported with the "Double" field type.
I guess the first question is: should I use Single rather than Double? The values I am considering will only ever report to 2 decimal places.
Within the imported table I've created some new columns, as I need to validate that the sum of the underlying values equals the totals reported.
A sum of 5 underlying columns (called SUMofService):
[Ancillary Costs] + [Incidental Costs] + [One-Off Costs] + [Ongoing Costs] + [Transaction Costs]
I've not reviewed all 500,000 rows, but this formula seems to be summing the values correctly.
Using this value, I've then created a new column to compare this total to the total in the report:
IIF([SUMofService] = [Total Service],"Match","No Match")
This also seems to work as expected, but there are instances where this field returns "No Match".
Looking at the underlying numbers in [SUMofService] and [Total Service], they match, so I am confused as to why I am seeing these "No Match" results.
Could anyone review what I've detailed and perhaps provide a steer as to whether I've considered something incorrectly?
There are probably better ways to achieve what I'm trying to do, but I haven't really used Access since school, and you forget quite a bit in 30 years!
Any responses are much appreciated - I've googled this as much as I can, but I'm not 100% sure what to ask, and some responses are so far beyond my level of thinking.

Should I use Single rather than Double? The values I am considering will only ever report to 2 decimal places.
Neither. Use Currency.
That will also provide correct results for:
IIF([SUMofService] = [Total Service],"Match","No Match")
Using Double or, indeed, Single will cause floating-point errors, as in this classic example:
? 10.1 - 10.0
9.99999999999996E-02
' thus:
? 10.1 - 10.0 = 0.1
False
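If converting the imported fields to Currency is not practical, an alternative workaround (an untested sketch, reusing the field names from the question) is to round both sides to the 2 decimal places the data is reported at before comparing:
IIF(Round([SUMofService],2) = Round([Total Service],2),"Match","No Match")
Currency remains the cleaner fix, because it stores the values exactly rather than papering over the floating-point representation.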

Related

Getting huge number after using simple subtraction in DAX measure

I'm kind of new to DAX and I'm basically learning as I use it in my work. We are building reports in Power BI and we have a data model that gets data from an Oracle database, so I'm using DAX to create measures in this data model.
I need to subtract 2 numbers from each other, so I created a simple measure which looks like this:
[MEASURE1] - [MEASURE2]
Whether it works or not depends on my Period filter, which uses another table; I don't know how the period could be related to any of this. When I set the filter to some values, I get a normal number. However, when I switch it to different values, I get numbers like 2,27483058473905E-13.
The weird thing is that if I check the two measures I'm subtracting, they have exactly the same numbers, so the difference should be 0.
I know this is not the best explanation, but it is impossible to describe the entire data model here, so I'm just looking for some ideas about what could possibly be causing this and what I should check.
I have literally no idea what could be causing this.
Floating point precision.
Either use fixed decimal data types, specify the format string of the measure, or wrap your measure in ROUND, e.g.:
Diff =
ROUND (
    [Measure 1] - [Measure 2],
    2
)
2,27483058473905E-13 is not a huge number; it is about as close to zero as the floating-point calculation can get.

AWS QuickSight calculated field gives incorrect result for simple division

I have a dataset with fields targeted and opens, and I need to add a calculated field opens per targeted, which essentially means doing a simple division of those 2 values.
My calculated field is as follows:
{opens}/{targeted}
but when displaying a simple table with the values, they are completely incorrect.
If I try any other operator like + or *, the calculations are correct.
I'm completely out of ideas on how to debug this. I've simplified the dataset to just the columns targeted and opens; it can't get any simpler.
I had the same problem; I fixed it by wrapping the columns with the sum() function, like this:
sum({opens})/sum({targeted})
I think you need to make AWS understand that you are working with float numbers:
1.0*{opens}/{targeted}
If that is still not working, also try
(1.0*{opens})/({targeted}*1.0)
It should give you the desired output (not tested, let me know if it doesn't work).

AWS DynamoDB Scan Filter Expression Returning Empty

I'm stumped trying to figure out why my scan won't return anything but [ ]. Here are my scan params:
var params = {
    TableName: tableName,
    FilterExpression: "#wager = :wager",
    ExpressionAttributeNames: {
        "#wager": "wager"
    },
    ExpressionAttributeValues: {
        ":wager": wager
    }
};
My DynamoDB table works perfectly when I run a filter expression in the DynamoDB dashboard, like "wager [NUMBER] = 0.001".
jellycsc and Seth Geoghegan already mentioned the two most likely explanations in the comments:
First, make sure you do not call a single Scan operation, but rather loop to fetch all the pages of the scan result. The specific way to do this depends on which programming language you are using. When your filter leaves only a small subset of the results (e.g., only rows where wager is exactly 0.001), remembering to read all the pages is critical, because the first page may be empty: DynamoDB might have read 1 MB of items (the default page size), and if none of them matched wager = 0.001, an empty first page is returned.
Second, wager might have the wrong type. Obviously, if you store numbers but search for a string, nothing will match, so check that you didn't do that. But a more subtle problem can be how you store numbers. DynamoDB holds floating-point numbers in an unusual manner - using decimal, not binary, digits. This means that DynamoDB can hold the number 0.001 precisely, without any rounding errors. The same cannot be said for most programming languages: on my machine, if I set a "double" variable to 0.001, the result is 0.0010000000000000000208. If I pass this to DynamoDB, the equality check will not match! This means you should make sure the wager variable is not a double. In Python, for example, wager should be set to Decimal("0.001") - note how it is constructed from the string "0.001", not from the floating-point 0.001, which already has rounding errors.
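To illustrate both points, here is a rough Python (boto3) sketch - the table name is hypothetical - that paginates through the scan and passes the filter value as a Decimal built from a string:
import boto3
from decimal import Decimal
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Wagers")  # hypothetical table name

items = []
kwargs = {"FilterExpression": Attr("wager").eq(Decimal("0.001"))}
while True:
    page = table.scan(**kwargs)
    items.extend(page.get("Items", []))
    # keep scanning until DynamoDB stops returning a continuation key
    if "LastEvaluatedKey" not in page:
        break
    kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
print(len(items))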
Thanks for the ideas, everyone. It did indeed turn out to be a type issue - all I had to do was cast "wager" as
wager = Number(wager);
ahead of setting the scan params (the same params I have in the question worked).

Weka Resample to balance instances in binary dataset

I've only been using Weka for a couple of weeks but I am absolutely blown away by how great it is!
But I have a question. I have a dataset with a target column which is either True or False.
6709 instances in my dataset are True
25318 instances are False.
I want to randomly add duplicates of my True instances to produce a new dataset with 25318 True and 25318 False.
The only filter I can find which does this is the supervised Resample filter; however, I am having trouble understanding what parameters I should use.
(there might be a better filter to do what I want)
I've had some success with these parameters:
biasToUniformClass = 1.0
invertSelection = False
noReplacement = False
randomSeed = 1
sampleSizePercent = 157.5 (a magic number I've arrived at by trial and error)
This produces 25277 True and 25165 False - not exactly what I want, but quite close.
The problem is that I can't figure out how to arrive at the magic number, and I'm also not getting exactly the numbers of instances that I really want.
Is there a better filter for this purpose?
If not, is there a way to calculate the sampleSizePercent magic number?
Any help is greatly appreciated :)
A supplemental question: am I best to run NominalToBinary on my boolean columns to ensure they are binary? I'm using a NaiveBayes classifier (at the moment) and I don't have any missing instances.
Jason
I think the tricky part of this question is getting a perfect balance using the Resample filter. This is because, as stated in its description, it 'Produces a random sub-sample of a dataset using either sampling with replacement or without replacement'. If the cases are drawn randomly, there is no guarantee that you will get an equal number from the two classes.
As for the magic number, it corresponds to the total number of cases you would like to have once the filter is applied. In your case, that would be 50636 instead of 32027, so the magic number would be 50636 / 32027 = 1.581 (i.e. a sampleSizePercent of about 158.1). However, as stated above, you may still not get an exact match of True and False cases.
If you really need an exact figure, you could use your favourite spreadsheet and preprocess the data. One possible method is to randomise the True cases (in a separate column), sort, and copy cases until their number matches the False ones. It's not an automated solution, and it happens outside of Weka, but I have used this method before and it does the job reasonably quickly.
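If you would rather script that preprocessing than do it in a spreadsheet, here is a rough pandas sketch of the same idea (the file and column names are hypothetical; Weka can re-import the resulting CSV). It randomly duplicates True rows until both classes have 25318 instances:
import pandas as pd

# hypothetical file and column names - adjust to your dataset
df = pd.read_csv("dataset.csv")
minority = df[df["class"] == True]    # 6709 True instances
majority = df[df["class"] == False]   # 25318 False instances

# randomly duplicate minority rows until both classes are the same size
extra = len(majority) - len(minority)
duplicates = minority.sample(n=extra, replace=True, random_state=1)

balanced = pd.concat([df, duplicates]).sample(frac=1, random_state=1)  # shuffle
balanced.to_csv("balanced.csv", index=False)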
Hope this Helps!

SQLite maximum column count configuration from Qt

I want to store rows that have 65536 columns in an SQLite database, and I am doing that using C++ and Qt.
My question is: since the default maximum number of columns seems to be no more than 2000, how can I configure this parameter from C++ and Qt?
Thank you.
The SQLite homepage has some explanation of this:
2. Maximum Number Of Columns
The SQLITE_MAX_COLUMN compile-time parameter is used to set an upper
bound (...)
and
The default setting for SQLITE_MAX_COLUMN is 2000. You can change it
at compile time to values as large as 32767. On the other hand, many
experienced database designers will argue that a well-normalized
database will never need more than 100 columns in a table.
So, even if you increased it, you could only achieve half of what you want. Apart from that, I can only refer to Styne666's comment on your post.
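Note that SQLITE_MAX_COLUMN is a compile-time setting, so it cannot be raised from C++/Qt at run time (sqlite3_limit() can only lower limits, not raise them). Raising it means compiling the SQLite amalgamation yourself with the define and linking that into your application instead of relying on the stock QSQLITE driver, roughly like this (untested sketch):
gcc -DSQLITE_MAX_COLUMN=32767 -c sqlite3.c
Even then, 32767 is the hard ceiling, which is only half of the 65536 columns you want.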