SQLite max column count configuration from Qt - C++

I want to store rows that have 65536 columns in a SQLite database, and I am doing that using C++ and Qt.
My question is: since the default maximum number of columns seems to be 2000 and no more, how do I configure this parameter from C++ and Qt?
Thank you.

The SQLite homepage has some explanation on this:
2. Maximum Number Of Columns
The SQLITE_MAX_COLUMN compile-time parameter is used to set an upper
bound (...)
and
The default setting for SQLITE_MAX_COLUMN is 2000. You can change it
at compile time to values as large as 32767. On the other hand, many
experienced database designers will argue that a well-normalized
database will never need more than 100 columns in a table.
So even if you increased it at compile time, you could only reach 32767 columns, half of what you want. Apart from that, I can only refer to Styne666's comment on your post.
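For what it's worth, you can at least query the limit at run time from Qt. A minimal sketch, assuming the application links against SQLite itself in addition to Qt's QSQLITE plugin (note that sqlite3_limit() can only query or lower the limit; it can never raise it above the compiled-in SQLITE_MAX_COLUMN):

#include <QCoreApplication>
#include <QSqlDatabase>
#include <QSqlDriver>
#include <QVariant>
#include <QDebug>
#include <sqlite3.h>

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);

    QSqlDatabase db = QSqlDatabase::addDatabase("QSQLITE");
    db.setDatabaseName("test.db");
    if (!db.open())
        return 1;

    // QSqlDriver::handle() exposes the native sqlite3* of the connection.
    QVariant v = db.driver()->handle();
    if (v.isValid() && qstrcmp(v.typeName(), "sqlite3*") == 0) {
        sqlite3 *handle = *static_cast<sqlite3 **>(v.data());
        if (handle) {
            // Passing -1 queries the current value without changing it.
            int maxColumns = sqlite3_limit(handle, SQLITE_LIMIT_COLUMN, -1);
            qDebug() << "Current column limit:" << maxColumns;
        }
    }
    return 0;
}

To actually raise the limit you would have to rebuild SQLite (and the Qt SQLite plugin) with -DSQLITE_MAX_COLUMN=32767, and even that falls short of 65536.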

Related

How to automatically feed a cell value from a range of values, based on its matching condition with another cell value

I'm making a time-spending tracker based on the work I do every hour of the day.
Now, suppose I have 28 types of work listed in my tracker (a list which I also have to extend from time to time), and I have about 8 predefined significance values that I have decided to relate to these 28 types of work.
I want that, as soon as I enter a type of work in cell 1, the adjacent cell 2 gets automatically populated with the significance value (from the range of 8 values) that I have predefined for it.
Every time I input a new or old occurrence of a type of work, the adjacent cell should automatically be matched with its relevant significance value and populated in real time.
I know how to do it using IF, IFS, and IF_OR conditions, but I feel that, with the ever-expanding types of work and significance values, those formulas will become very big, complicated, and repetitive in the future. I feel there's a more efficient way to achieve it. Also, I don't want the value to be selected from a drop-down list.
Guys, please help me out with the most efficient way to handle this. TUIA :)
Also, I've added a snapshot and a sample sheet describing the problem.
Sample sheet
XLOOKUP() may work. Try:
=XLOOKUP(D2,A2:A,B2:B)
Or the FILTER() function, like:
=FILTER(B2:B,A2:A=D2)
You can use this formula for a whole column:
=INDEX(IFERROR(VLOOKUP(C14:C,A2:B9,2,0)))
Adapt the ranges to your actual tables so that the second argument includes all the potential values and their significances.
This is the formula that worked for me (for anybody's reference):
I created another reference sheet stating the types of work and their significance. From that sheet, I'm using either VLOOKUP, FILTER, or XLOOKUP. I'm using Google Forms to input my data.
=ARRAYFORMULA(IFS(ROW(D:D)=1,"Significance",A:A="","",TRUE,VLOOKUP(D:D,Reference!$A:$B,2,0)))

Application for filtering a database in a short period of time

I need to create an application that would allow me to get the phone numbers of users matching specific conditions as fast as possible. For example, we've got 4 columns in a SQL table (region, income, age, and a 4th with the phone number itself). I want to get phone numbers from the table with a specific region and income. Just making a SQL query won't help, because it takes a significant amount of time. The database updates once per day, and I have some time to prepare the data as I wish.
The question is: how would you make the process of getting phone numbers with specific conditions as fast as possible, O(1) in the best scenario? Consider storing the values from the SQL table in RAM for the fastest access.
I came up with the following idea:
For each phone number, create something like a bitset: 0 if the particular condition is false and 1 if the condition is true. But I'm not sure I can implement it for columns with non-boolean values.
Create a vector with phone numbers.
Create a vector with the phone numbers' bitsets.
To get phone numbers, iterate over the second vector and compare bitsets with the required one.
It's not O(1) at all, and I still don't know what to do about non-boolean columns. I thought maybe it's possible to do something good with std::unordered_map (all phone numbers are unique; a sketch of that approach follows below) or improve my idea with the vector and masks.
P.S. The SQL table consumes 4 GB of memory and I can store up to 8 GB in RAM. There are 500 columns.
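A minimal sketch of the std::unordered_map idea mentioned above, assuming region and income can be reduced to discrete keys (all names here are hypothetical):

#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Pack a discrete region id and an income bucket into one hashable key.
using Key = std::uint64_t;

Key makeKey(std::uint32_t regionId, std::uint32_t incomeBucket)
{
    return (static_cast<Key>(regionId) << 32) | incomeBucket;
}

// Built once per day after the database refresh: every (region, income
// bucket) pair maps directly to its matching phone numbers.
std::unordered_map<Key, std::vector<std::string>> phoneIndex;

void addRow(std::uint32_t regionId, std::uint32_t incomeBucket, std::string phone)
{
    phoneIndex[makeKey(regionId, incomeBucket)].push_back(std::move(phone));
}

// Amortized O(1) lookup for one exact (region, income bucket) combination.
const std::vector<std::string>* lookup(std::uint32_t regionId, std::uint32_t incomeBucket)
{
    auto it = phoneIndex.find(makeKey(regionId, incomeBucket));
    return it == phoneIndex.end() ? nullptr : &it->second;
}

int main()
{
    addRow(3, 2, "+1-555-0100");
    addRow(3, 2, "+1-555-0101");
    if (const auto *hits = lookup(3, 2))
        for (const auto &phone : *hits)
            std::cout << phone << "\n";
}

The catch is that a hash index like this only answers exact-match queries for the column combinations you built it for; continuous values such as income have to be bucketed first, and each extra filterable combination costs memory.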
I want to get phone numbers from the table with specific region and income.
You would create indexes in the database on (region, income). Let the database do the work.
If you really want it to be fast, I think you should consider Elasticsearch. Think of every phone number in the DB as a document with properties (your columns).
You will need to reindex the table once a day (or in real time), but when it's time to search you just use an Elasticsearch filter to find the results.
Another option is to have an index for every column. In this case the engine will do an index merge to increase performance. I would also consider using MEMORY tables. If you write to this table, consider having a read replica just for reads.
To optimize your table, log your queries somewhere and add indexes (on multiple columns) just for the top X most popular searches, depending on your memory limitations.
You can use NVMe as your DB disk (if you can't load everything into memory).

Is it possible to determine the max length of a field in a CSV file using regex?

This has been discussed on Stack Overflow before, but I couldn't find a case/answer that might apply to my situation:
From time to time I have raw text data to be imported into SQL. In almost every case I must try several times, because the SSIS wizard doesn't know the max size of each field and the default is 50 characters. Only after it fails can I tell from the error message which (first) field was truncated, and I then increase that field's size.
There might be more than one field that needs its size increased, and the SSIS wizard only reports one error each time it encounters a truncation, so as you can see this is very tedious. I want a way to quickly inspect the data first to determine the max size of each field.
I came across an old post on Stack Overflow: Here is the post
Unfortunately it might not work in my case: my raw data could have as many as 10 million rows (yes, in one single text file that is over a GB).
I rather doubt there is a way to get that, but I still want to post my question here hoping to get some clue.
Thank you very much.
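Not regex, but as a rough illustration, a one-pass scan is feasible even for a multi-gigabyte file, since nothing needs to be held in memory except one line at a time. A minimal sketch, assuming a simple comma-separated file with no quoted fields containing embedded commas:

#include <algorithm>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Stream the file once, tracking the longest value seen in each column.
int main(int argc, char *argv[])
{
    if (argc < 2) {
        std::cerr << "usage: maxlen <file.csv>\n";
        return 1;
    }
    std::ifstream in(argv[1]);
    std::vector<std::size_t> maxLen;
    std::string line;
    while (std::getline(in, line)) {
        std::stringstream row(line);
        std::string field;
        std::size_t col = 0;
        while (std::getline(row, field, ',')) {
            if (col >= maxLen.size())
                maxLen.push_back(0);
            maxLen[col] = std::max(maxLen[col], field.size());
            ++col;
        }
    }
    for (std::size_t i = 0; i < maxLen.size(); ++i)
        std::cout << "column " << (i + 1) << ": max " << maxLen[i] << " chars\n";
    return 0;
}

A real CSV parser would be needed if fields can be quoted, but the idea of making one streaming pass before running the SSIS wizard stays the same.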

Most efficient way to process complex histogram data?

I'm currently implementing a histogram that will show very large-scale data using Qt, and I have some doubts about which data structure(s) I should use for my problem. I will be displaying the number of queries received from users of an application, in a single application that shows different histograms upon clicking different "show me this data" buttons:
1) Display the histogram of total queries per month. There are 4 months of data here; I kept four variables and incremented them as I caught queries belonging to those months in the CSV file.
2) Display the histogram of total queries per day in a selected month. I was thinking of using 4 QVectors to represent the months for this one, incrementing an element of the vector (a day) each time I come by that specific day. E.g., if the vector represents the month of August, then whenever I come across a datum with 2011-08-XY, I will increment the (XY + 1)th element of that vector by 1. My second alternative is to use 4 QLinkedLists for the sake of better complexity, but I'm not sure the approaches I've come up with are efficient enough, and I'm willing to listen to any other idea.
3) Here's where things get a bit complicated: display the histogram of total queries per hour in a selected day and month. The data represented here grows vastly, and I don't know which data structure, or combination of structures, I should use to implement this one. A list of lists, perhaps?
Any ideas on my problems in 2) and 3) would be helpful. Thanks in advance.
Actually, it shouldn't be too unmanageable to always track queries per hour. Assuming the number of queries per hour never exceeds the maximum int value, that's only 24 ints per day, each 32 or 64 bits depending on your machine. Assuming 32 bits, you could store about 28 years' worth of data per MB.
There's no need to store the month/year; your program can work that out. Just assign hour 0 to the earliest point in your data, which you keep as a constant, then work out the date based on hours passed since then.
This avoids needing a list of lists or anything fancy: just use an array where each index is the number of hours since hour 0 and each element holds the number of queries for that hour (see the sketch below).
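A minimal sketch of that flat-array idea (names hypothetical; assumes the caller has already converted each query's timestamp to hours elapsed since hour 0):

#include <cstddef>
#include <vector>

class HourlyHistogram {
public:
    // Count one query in the bucket for the given hour since hour 0.
    void record(std::size_t hoursSinceStart)
    {
        if (hoursSinceStart >= counts.size())
            counts.resize(hoursSinceStart + 1, 0);
        ++counts[hoursSinceStart];
    }

    // Queries in a single hour (histogram 3).
    int hourTotal(std::size_t hour) const
    {
        return hour < counts.size() ? counts[hour] : 0;
    }

    // Total queries for one day: the sum of its 24 hourly buckets
    // (histograms 1 and 2 are just coarser sums like this one).
    long dayTotal(std::size_t dayIndex) const
    {
        long total = 0;
        for (std::size_t h = dayIndex * 24; h < (dayIndex + 1) * 24 && h < counts.size(); ++h)
            total += counts[h];
        return total;
    }

private:
    std::vector<int> counts; // one counter per hour since hour 0
};

Per-day and per-month totals then come from summing contiguous ranges of this one array, so no nested containers are needed.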
Why don't you simply use a classical database?
When you start asking these kinds of questions, I think it is a good time to consider a more robust structure. There are multiple data structures implemented inside any DB, each optimized for a different access type. You should consider at least lookup, insertion, deletion, and range queries. No structure beats the others in all costs, so there is always a trade-off.
Qt has some database classes you can use. I have never used the Qt SQL library, but I think you should give it a shot. Fortunately, there is a Qt SQL programming guide at the end of the page linked.

"out of memory" exception in CRecordset when selecting a LONGTEXT column from MySQL

I am using CODBCRecordset (a class found on CodeProject) to find a single record in a table with 39 columns. If no record is found then the call to CRecordset::Open is fine. If a record matches the conditions then I get an Out of Memory exception when CRecordset::Open is called. I am selecting all the columns in the query (if I change the query to select only one of the columns with the same where clause then no exception).
I assume this is because of some limitation in CRecordset, but I can't find anything telling me of any limitations. The table only has 39 columns.
Has anyone run into this problem? And if so, do you have a work around / solution?
This is an MFC project using Visual Studio 6.0, if it makes any difference.
Here's the query (formatted here so it would show up without a scrollbar):
SELECT `id`, `member_id`, `member_id_last_four`, `card_number`, `first_name`,
`mi`, `last_name`, `participant_title_id`, `category_id`, `gender`,
`date_of_birth`, `address_line_1`, `address_line_2`, `city`, `state`,
`zip`, `phone`, `work_phone`, `mobile_phone`, `fax`, `email`,
`emergency_name`, `emergency_phone`, `job_title`, `mail_code`,
`comments`, `contract_unit`, `contract_length`, `start_date`,
`end_date`, `head_of_household`, `parent_id`, `added_by`, `im_active`,
`ct_active`, `organization`, `allow_members`, `organization_category_id`,
`modified_date`
FROM `participants`
WHERE `member_id` = '27F7D0982978B470C5CF94B1B833CC93F997EE23'
Copying and pasting into my query browser gives me only one result.
More info:
Commented out each column in the select statement except for id. Ran the query and no exception.
Then I systematically went through and uncommented each column, one at a time, re-running the query after each one.
When I uncomment the comments column, I get the error.
That column is defined as follows (using MySQL): LONGTEXT
Can we assume you mean you're calling CODBCRecordset::Open(), yes? Or more precisely, something like:
CDatabase db;
db.Open (NULL, FALSE, FALSE, "ODBC;", TRUE); // NULL DSN plus "ODBC;" prompts for a data source
CODBCRecordSet rs (&db);
rs.Open ("select blah, blah, blah from ...");
EDIT after response:
There are some known bugs with various ODBC drivers that appear to be caused by retrieving invalid field lengths. See these links:
http://forums.microsoft.com/msdn/showpost.aspx?postid=2700779&siteid=1
https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=296391
This particular one seems to have been because CRecordset will allocate a buffer big enough to hold the field. As the column returns a length of zero, it's interpreted as the max 32-bit size (~2G) instead of the max 8-bit size (255 bytes). Needless to say, it can't allocate enough memory for the field.
Microsoft has acknowledged this as a problem; have a look at these for solutions:
http://support.microsoft.com/kb/q272951/
http://support.microsoft.com/kb/940895/
EDIT after question addenda:
So, given that your MySQL field is a LONGTEXT, it appears CRecordSet is trying to allocate the maximum possible size for it (2G). Do you really need 2 gig for a comments field? Typing at 80 wpm and 6 cpw, it would take a typist a little over 7 years to fill that field, working 24 h/day with no rest :-).
It may be a useful exercise to have a look at all the columns in your database to see if they have appropriate data types. I'm not saying that you can't have a 2G column, just that you should be certain that it's necessary, especially in light of the fact that the current ODBC classes won't work with a field that big.
Read Pax's response. It gives you a great understanding of why the problem happens.
Workaround:
This error will only happen if the field defined as TEXT, LONGTEXT, etc. is NULL (and maybe also when it is empty). If there is data in the field, then the driver will only allocate for the size of the data actually in the field, not the max size (thereby avoiding the error).
So, if there is a case where you absolutely have to have these large fields, here is a potential solution (see the sketch after this list):
Give the field a default value in the database (e.g. '<blank>').
Then, when displaying the value, you treat the default value as NULL/empty.
Then, when updating the value, you store the default value if you find NULL/empty.
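A minimal sketch of that two-way mapping, with a hypothetical sentinel value (the exact translation layer depends on how your record class exposes the field):

#include <iostream>
#include <string>

// Hypothetical sentinel stored as the column's database default.
const std::string kBlankSentinel = "<blank>";

// When displaying: treat the sentinel as an empty value.
std::string fromDb(const std::string &stored)
{
    return stored == kBlankSentinel ? std::string() : stored;
}

// When updating: never write NULL/empty; store the sentinel instead so
// the driver always sees a real, non-zero field length.
std::string toDb(const std::string &userValue)
{
    return userValue.empty() ? kBlankSentinel : userValue;
}

int main()
{
    std::cout << toDb("") << "\n";               // prints "<blank>"
    std::cout << fromDb(kBlankSentinel) << "\n"; // prints an empty line
}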
I second Pax's suggestion that this error is due to trying to allocate a buffer big enough to hold the largest possible LONGTEXT. The client doesn't know how large the data is until it has fetched it.
LONGTEXT is indeed way larger than you would ever need in most applications. Consider using MEDIUMTEXT (max size 16MB) or just TEXT (max size 64KB) instead.
There are similar problems in PHP database interfaces. PHP normally has a memory size limit, and any fetch of a LONGBLOB or LONGTEXT is likely to exceed that limit.