Class::DBI - does it load all tables? - profiling

We have a MySQL database with a very large number of tables. Unfortunately, in 2018 we still use Perl CGI, so script start-up time is critical.
I ruled out DBIx::Class because it takes about 1.6 seconds to load (so long because it loads Perl definitions for every table in the database), which is clearly too much.
How quickly does Class::DBI load? My main question: when we use Class::DBI, does Perl load information about all available tables (as DBIx::Class does), or does it load Perl definitions only for the tables we actually use?
The following DBIx::Class code takes 1.6 seconds to load:
#!/usr/bin/perl
package MyApp::Schema;
use lib '.../ORMs/dbix-class';
use base qw/DBIx::Class::Schema/;
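# load_namespaces() finds and loads a Result class for every table
# under MyApp::Schema::Result, which is where the start-up cost comes from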
__PACKAGE__->load_namespaces();
1;
(The schema is autogenerated.)
Is there any way to make it faster? How can I use it without loading all the tables?

I really wouldn't recommend Class::DBI. It's been unmaintained for twelve years - and there were good reasons why everyone switched to DBIx::Class.
I would highly recommend working on whatever problem is keeping you on CGI. What is preventing you from, for example, using CGI::Emulate::PSGI to trivially convert your CGI code to PSGI apps, which you can then deploy in a persistent environment like FastCGI or, better, as a standalone service that you access through nginx? Any of these solutions would mean that the DBIx::Class load time is no longer a problem.
Obviously, I have no idea what is keeping you tied to CGI. But, in my experience, moving to a PSGI solution is often easier than people expect, and it will undoubtedly leave you in a better position.

Related

Design of data storage C++ application (maybe relational database)

I need to store and load some data in a C++ application. This data is basically going to end up as a set of tables as per a relational database.
I write the data to tables in something like CSV format, then parse them myself and apply the database logic I need in my C++ code. But it seems silly to reinvent the wheel this way and effectively end up writing my own database engine.
However, using something like a MySQL database seems like massive overkill for what is going to be a single user local system. I have tried setting up a MySQL daemon on my Windows system and I found it rather complex and possibly even impossible without admin privileges. It would be a serious obstacle to deployment as it would need each user's system to have MySQL set up and running.
Is there a reasonable middle ground solution? Something that can provide me with a simple database, accessible from C++, without all the complexities of setting up a full MySQL install?
NB. I have edited this question such that I hope it satisfies those who have voted to close the question. I am not asking for a recommendation for a tool, or someone's favourite tool or the best tools. That would be asking which database engine should I use. I am asking for what tools and design patterns are available to solve a specific programming problem - i.e. how can I get access to database like functionality from a C++ program, without writing my own database engine, nor setting up a full database server. This is conceptually no different to asking e.g. How do I print out the contents of a vector? - it's just a bigger problem. I have described the problem and what has been done so far to solve it. My understanding from the On Topic Page is that this is within scope.
You can try SQLite.
Here are some simple code examples: https://www.tutorialspoint.com/sqlite/sqlite_c_cpp.htm

Is it efficient to use a framework to code a single interactive webpage in Python?

This is an open source contributor project for Wikidata's Chronic Pain project.
I would like to create a webpage that:
Has input boxes where the user selects several Wikipedia page titles (with suggestions)
Can also take these parameters via the URL
Gets item metadata from Wikidata
Makes a SPARQL request to gather scholarly articles
Renders data from Wikidata and Wikipedia, linking to various wiki pages
The webpage will be hosted on a Wikimedia Foundation server. I have access to a Linux container as well as a Jupyter Notebook (not sure the latter is suitable for this project). It has to be coded in Python 3, since I will use the Pywikibot framework to interact with Wikidata.
I'm new to programming, so I don't really know what the best approach is. I've heard that it is difficult to code webpages in Python without using a framework like Django. However, this page is very simple, so deploying Django for it may not be the most efficient choice?
NB: your question is bordering on "primarily opinion based" (which doesn't mean it's a bad question in itself, but answers may be more, well, opinions than hard facts).
That being said, "a single interactive page" doesn't mean the server code behind it just loads a static HTML file and sends it to the client. For example, the main UI part of our product is, technically speaking, "a single interactive page", but this "single" page is a full React app backed by a dedicated API with a dozen entry points, which dispatch to a whole load of backend code including database access, Celery tasks, etc. It would of course be technically possible to code all this with only pure WSGI or even plain old CGI code, but then it would also be possible to write it directly in C or even assembly, and no one would ever consider that a viable solution.
To make a long story short: do not waste your time trying to code this project with plain WSGI (and let's not even talk about CGI); you will end up reinventing the squared wheel and everyone will hate you for it (stakeholders because you'll never deliver a robust, working product on time and on budget, and other devs because they'll then have to port the whole darn thing to a stable, mature and maintained framework). Now, if Django appears to be overkill for this project, there are much lighter frameworks like Flask (a sketch follows below). Actually, both are "industry standard" and safe choices.
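For a sense of scale, here is a minimal sketch of the page described above using Flask. All names are hypothetical, and the Wikidata/SPARQL lookup is stubbed out; that is where you would plug in your Pywikibot code:

# Minimal Flask sketch: a form whose fields can also be pre-filled from
# URL query parameters, handed to a (stubbed) Wikidata lookup function.
from flask import Flask, request, render_template_string

app = Flask(__name__)

FORM = """
<form method="get">
  <input name="titles" value="{{ titles }}" placeholder="Wikipedia page titles">
  <button type="submit">Search</button>
</form>
<pre>{{ results }}</pre>
"""

def lookup_articles(titles):
    # Placeholder for the Pywikibot / SPARQL queries described in the question.
    return f"(would query Wikidata for: {titles})"

@app.route("/")
def index():
    # Parameters arrive either from the form or directly via the URL,
    # e.g. /?titles=Chronic_pain
    titles = request.args.get("titles", "")
    results = lookup_articles(titles) if titles else ""
    return render_template_string(FORM, titles=titles, results=results)

if __name__ == "__main__":
    app.run()

That is essentially the whole framework overhead for a single interactive page; Django would add a project layout, settings and an ORM you may not need here.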

Why bother using Apache or Nginx, etc?

I've been assigned a project which requires me to add some HTML page serving to an embedded system. The system (running Linux CentOS 6.3) has some extra juice available, but it also already has numerous responsibilities.
I considered Apache but tossed it due to bloat, I looked into Nginx but am now shying from that too. It just seems that I'm getting way more 'functionality' and as a result, more CPU usage than I need.
Can someone enlighten me as to why I wouldn't just implement the HTTP protocol myself using async sockets?
My specific needs are:
Receive and decode GETs and POSTs.
Send CSS, JS and JPG files as requested.
Output header, cookie, head and body data based upon the decode of the GETs/POSTs.
Given that I don't need the myriad things these webservers offer, am I being naive in assuming this course of doing it myself? What would you suggest or warn against?
Basically, you use a web server because then you get the functionality you want in a form that's already been tested, is more reliable than your first code is likely to be, and is supported by a large community of others. If Apache and nginx are too heavyweight for you (although nginx is pretty much characterized by how lightweight it is for heavy loads) and especially if the load you expect is very light, then look around for other options.
Wikipedia has a whole page comparing lightweight web servers.
An easy trap to fall into: thinking "I don't need all the functionality in Product X, I'll just write my own with just the functionality that I need" only to end up reimplementing Product X entirely, one newly-discovered requirement at a time.
I sort of doubt that an embedded system that can run CentOS okay is so resource-starved that it can't run Nginx comfortably (or even Apache, which people run on the Raspberry Pi just fine with appropriate configuration tweaks), given reasonable assumptions about how many pages you are actually serving. I ran it on a Pentium 266 with something like 256MB of RAM serving a few simple PHP apps that served roughly a page every two seconds, with no issues. As I recall, it's fairly modular, so you can just choose not to load the functionality you don't think you need. And, later, when your requirements change and you find out you do need it, you can just plug it back in :)
If you are really and truly concerned about resource consumption, look into web servers designed for embedded applications. I hear Cherokee is quite nice. Mongoose looks promising as well.
If you do want to go further, you can; I began with this: http://www.w3.org/Protocols/HTTP/HTTP2.html
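Just to illustrate the trade-off, here is a toy sketch (in Python, blocking, one request at a time) of the bare minimum needed to parse a GET and send back a file. Everything else on the requirements list above (POST bodies, cookies, concurrency, path sanitising, security) still has to be built on top of this, which is exactly what an existing server already provides. The document root is made up:

# Toy HTTP responder: parse the request line of a GET and return a file.
# No error handling, no concurrency, no security - illustration only.
import socket, os, mimetypes

DOCROOT = "/var/www"   # hypothetical document root

def serve_forever(port=8080):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", port))
    srv.listen(5)
    while True:
        conn, _ = srv.accept()
        with conn:
            request = conn.recv(4096).decode("latin-1")
            method, path, _ = request.split("\r\n", 1)[0].split(" ")
            filename = os.path.join(DOCROOT, path.lstrip("/") or "index.html")
            if method == "GET" and os.path.isfile(filename):
                ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
                with open(filename, "rb") as f:
                    body = f.read()
                header = (f"HTTP/1.0 200 OK\r\nContent-Type: {ctype}\r\n"
                          f"Content-Length: {len(body)}\r\n\r\n")
                conn.sendall(header.encode("latin-1") + body)
            else:
                conn.sendall(b"HTTP/1.0 404 Not Found\r\n\r\n")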

Sqlite as a replacement for fopen()?

An official SQLite web page says that I should think about SQLite as a replacement for the fopen() function.
What do you think about that? Is it always a good solution to replace an application's internal data storage with SQLite? What are the pluses and minuses of such a solution?
Do you have some experience in it?
EDIT:
How about your experience? Is it easy to use? Was it painful or rather joyful? Do you like it?
It depends. There are some contra-indications:
for configuration files, use of plain text or XML is much easier to debug or to alter than using a relational database, even one as lightweight as SQLite.
tree structures are easier to describe using (for example) XML than by using relational tables
the SQLite API is quite badly documented - there are not enough examples, and the hyperlinking is poor. OTOH, the information is all there if you care to dig for it.
use of app-specific binary formats directly will be faster than storing the same format as a BLOB in a database
database corruption can mean the loss of all your data rather than just the data in a single bad file
OTOH, if your internal data fits in well with the relational model and if there is a lot of it, I'd recommend SQLite - I use it myself for one of my projects.
Regarding experience - I use it, it works well and is easy to integrate with existing code. If the documentation were easier to navigate I'd give it 5 stars - as it is I'd give it four.
As always it depends, there are no "one size fits all" solutions
If you need to store data in a stand-alone file and you can take advantage of the relational capabilities of an SQL database, then SQLite is great.
If your data is not a good fit for the relational model (hierarchical data, for example), or you want your data to be human-readable (config files), or you need to interoperate with another system, then SQLite won't be very helpful and XML might be better.
If, on the other hand, you need to access the data from multiple programs or computers at the same time, then again SQLite is not an optimal choice and you need a "real" database server (MS SQL, Oracle, MySQL, PostgreSQL ...).
The atomicity of SQLite is a plus: knowing that if you only half-way write some data (maybe crashing in the middle), it won't corrupt your data file. I normally accomplish something similar with XML config files by backing up the file on a successful load; any future failed load (indicating corruption) automatically restores the last backup. Of course it's not as granular, nor is it atomic, but it is sufficient for my needs.
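As an illustration, this is roughly what that guarantee looks like from Python's sqlite3 module (file and key names are made up; the same guarantee holds through the C or any other API): wrap the writes in a transaction and either all of them land or none do.

import sqlite3

conn = sqlite3.connect("settings.db")
conn.execute("CREATE TABLE IF NOT EXISTS config (key TEXT PRIMARY KEY, value TEXT)")

try:
    with conn:  # opens a transaction; commits on success, rolls back on any exception
        conn.execute("INSERT OR REPLACE INTO config VALUES ('theme', 'dark')")
        conn.execute("INSERT OR REPLACE INTO config VALUES ('font_size', '12')")
        # if the process dies or an exception is raised here, neither row is written
except sqlite3.Error:
    pass  # the file is still in its previous consistent state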
I find SQLite a pleasure to work with, but I would not consider it a one-size-fits-all replacement for fopen().
As an example, I just wrote a piece of software that downloads images from a web server and caches them locally. Storing them as individual files, I can browse them in Windows Explorer, which certainly has benefits. But I need to keep an index that maps between a URL and the image file in order to use the cache.
Storing them in a SQLite database, they all sit in one neat little file, and I can access them by URL (select imgdata from cache where url='http://foo.bar.jpg') with little effort.
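A rough sketch of such a cache, here using Python's sqlite3 module (the table and column names come from the query above; everything else is hypothetical):

import sqlite3

db = sqlite3.connect("imagecache.db")
db.execute("CREATE TABLE IF NOT EXISTS cache (url TEXT PRIMARY KEY, imgdata BLOB)")

def store(url, data):
    with db:  # atomic: the index entry and the image bytes go in together
        db.execute("INSERT OR REPLACE INTO cache (url, imgdata) VALUES (?, ?)",
                   (url, sqlite3.Binary(data)))

def load(url):
    row = db.execute("SELECT imgdata FROM cache WHERE url = ?", (url,)).fetchone()
    return row[0] if row else None

The index and the data live in the same file, so they can never get out of sync the way a separate index file and a directory of images can.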

Which web framework incurs the least overhead?

I'm playing around with Django on my website hosting service.
I found out that a simple Django page, which has only some static text and is rendered from a very simple template I created, takes significant time to render. Compared to a static HTML page, I am seeing a difference of about 2 seconds in load time. Keep in mind this is a simple test of mine with nothing complicated. Also note that my web hosting is on a shared server (not dedicated), so I might be hitting some CPU limitations.
Seems to me that either:
I have some basic CGI/Apache/Django configuration wrong
Django takes significant overhead, at least in this specific scenario.
I find #1 improbable, since I followed my web hosting service's wiki on how to set up Django. So we are left with the overhead problem.
My question is which web framework do you find the best to use in scenarios where the website is hosted on a shared server, and CPU/memory overhead must be kept to minimum?
Edit: seems that my configuration is something I might want to look at, and perhaps later on I'll be opening a question on how to best configure Django.
For now, I would appreciate answers focusing on your experience, in general, with web frameworks, and which of those you found to be the best in terms of performance in the aforementioned scenario.
"I have some basic CGI/Apache/Django configuration wrong"
Correct.
First. The very first time Django returns a page, it takes forever. A lot of initialization happens for the first request.
Second. What specific configuration are you using? We just switched from mod_python to mod_wsgi in daemon mode and are very happy with the performance change.
Third. What database are you using?
Fourth. What test configuration are you using?
Fifth. What caching parameters and reverse proxy are you using?
Odds are good that you have a lot of degrees of freedom in your configuration.
Edit
The question "which of those you found to be the best in terms of performance" is largely impossible to answer.
See http://wiki.python.org/moin/WebFrameworks
There are dozens of frameworks. Few people can examine more than a few to do head-to-head comparison.
The best possible performance is achieved through static content. A Python app that makes static pages (for instance a collection of Jinja templates) is fastest.
After that, it's largely impossible to say. Even http://werkzeug.pocoo.org/ involves some processing overheads that may be unacceptable in the above scenario. Python can be slow.
Django, with a modicum of effort, is often fast enough. Serving static content separately from dynamic content, for example, can be a huge speedup.
Since Django does so much automatically, there's a huge victory in not having to write every little administrative page.
I'd say there has to be something funky with your setup there to get such a large performance difference. Try mod_wsgi (if you're not already) and follow the excellent suggestions by the posters above. If Django genuinely was this slow in all cases, there's just no way companies would be able to use it for production applications. It's more than likely not to be Django that is holding the request up. Once you have the .pyc files all sorted (automatically generated bytecode), then the execution should be fairly zippy.
However, if you don't actually need all Django has to offer, then why use it? I'm using it in quite a large production application, and we're not using all of its features… if you're doing something fairly simple, you may want to consider using something like web.py or Werkzeug (or something non-Python-based if you'd rather).
Frameworks like Django or Ruby on Rails grew out of real-world needs, and as different as those needs were, so different the frameworks turned out.
Here is my experience:
As a former PHP programmer, I preferred CakePHP for simple stuff and Symfony for more advanced applications. I had a look at Ruby, but the documentation sucked back then. Now I'm using Django, and it works very well for me. In contrast to Symfony, I feel like Django brings less flexibility out of the box, but it's easier to extend.
Another approach would be the minimalist, 'no framework' option: CherryPy.
I think the host may be an issue. I do Django development on my localhost (Mac) and it's way better. I like WebFaction for cheap hosting and Amazon ec2 for premium hosting.
The framework is strong and it can handle heavy sites - don't obsess about that stuff. The important thing is to create a clean product, Django can handle it. There are about a thousand steps to take when you see how the application handles in the wild, but for now, just trust us that you don't need to worry about the inherent speed of the framework before exhausting a whole slew of parameters including a dedicated VPS/instance when you need it.
Also, following up on your edit - I personally don't think raw performance is a major issue in programming. Here are the issues, in decreasing order of concern:
UI/UX efficiency
UI/UX speed (application caching)
Well designed models/views
Optimization of the system (n-tier architecture, etc...)
Optimization of the process (good QA to reduce failures/bottlenecks from deployment)
Optimization of the subsystems (database, etc...)
Hardware
Framework internal optimization
Don't waste time with comparing framework speeds. Their advantage is in extensible code, smart architectures, etc...
On a side note, DO NOT BUILD A NEW WEB APPLICATION WITHOUT A FRAMEWORK. I'm sorry I can't say it loudly enough, but it's an absolute requirement nowadays. It's not even a debate about whether to use one, just which one to use.
I personally chose Django, which is great. But I can't definitively knock the others out there.
It's possibly both. Django does have stuff for caching built-in, which would be worth trying. Regardless, any non-cached page will nearly always take longer than a static file. A file has to be read in both cases, and in the case of a dynamic page, it also must be executed. And then, in both cases, sent over to the client.
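Django's built-in per-view cache, for instance, is close to a one-line change. A sketch (the view is hypothetical, and a cache backend still has to be configured in settings.py):

from django.views.decorators.cache import cache_page
from django.http import HttpResponse

@cache_page(60 * 15)          # cache the rendered response for 15 minutes
def landing_page(request):    # hypothetical, mostly-static view
    return HttpResponse("<h1>Hello</h1>")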
Definitely, shared hosting is not the best choice for running heavy frameworks such as Django or CakePHP. If you can afford it, buy a VPS.
As for performance, your host probably runs Python with mod_python, which is no longer recommended. WSGI is the preferred standard for Python-powered web apps.