Which layout engine for finding coordinates of HTML elements on a web page? - C++

I am working on a web data classification task and was wondering whether I could get the coordinates of HTML elements as they would appear in a web browser, without taking into account any CSS or JavaScript referenced by the page.
I program in C++ and need results for a couple of million pages, so it has to be fast. I know there is a Microsoft COM component that renders the page in a web browser control, which can then be queried for the positions of different HTML tags. But this is not suitable in my case, because it first renders the whole page, which takes a lot of time.
From what I have found, there are open-source layout engines such as WebKit and Gecko that could probably be used for this. But they are huge codebases, and I need someone to point me to the right classes or modules to look into, or to any similar work someone has done before. Also, please let me know which you think is the better choice if I want to customize the existing code to run on multiple threads to make it faster.
Thanks

Generally, you will find that different rendering engines lay out HTML in their own way, so the results will differ.
If you stick to one specific browser engine, you will have to bring that engine into your project and use its interface to retrieve the coordinates. That is a tough task, simply because you'll have to read a lot of documentation and crawl through thousands of files.
I think the right approach would be to post this task somewhere specific to the rendering engine you've chosen (Gecko, WebKit, ...).
If you prefer to stick with something Microsoft-specific, I guess it will be easier, but I can't help you with the class names or code snippets you're looking for. Perhaps somebody else can guide you in that case.

Related

Scrolling websites with Django

I'm currently working on a simple scrolling website with nothing really difficult in it (I could almost use plain HTML/CSS/JavaScript, but it's partly for practice and I may add a blog later). Since it is simple, I was wondering how to do it properly with Django.
So here is my question: I have a homepage template that is effectively the whole website, and I don't really understand how to split its different parts into different apps.
For example, I have a contact form. Do I need to split it into another app and then include it in the base template? I want to add a gallery with images stored in a database. Do I create an app for that?
The related question is how to code an app that doesn't return an HttpResponse but just HTML to put into another template, and whether I still need views. I would like to do something like a standard Django form, where you write:
form.as_p or form.as_table
So maybe:
gallery.as_slideshow
My questions are quite basic and open-ended, but if someone could give me some reading to get going, I would be really happy!
This is a question a lot of people struggle with and it seems like there are a lot of varying opinions out there.
I've found that the best way to determine the appropriate answer in each case is to distill the feature into individual requirements and group them into feature sets, while keeping an eye out for uses outside the project you are actively working on.
Nothing says you can't build your project as a single app containing all of the modules you need. Doing so seems like it would make development easier initially, right? So the question to ask is: "What if I want to reuse (insert feature set here) in another, unrelated project a year from now, after I've already forgotten about the weird stuff I did to make it work originally?" Asking yourself that question forces you to think about your features in a much broader context, and I think 99% of the time you will realize that a "Contact Form" requirement can actually become quite complex and really should be split into at least one separate app (e.g. User Creation, Profile Management, Email Subscription, etc.).
Here is a link to a video about this very topic which I found to be useful in figuring out my way through this question:
https://www.youtube.com/watch?v=A-S0tqpPga4
I know this is not really a hard-line answer to your question but I hope it helps point you in the right direction.
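On the second part of the question (an app that hands back an HTML fragment instead of an HttpResponse), one common pattern is to give the object a rendering method, the way forms have as_p. Below is a minimal, framework-free sketch of that idea; Gallery, Image and as_slideshow are hypothetical names taken from the question, not Django APIs. In a real Django project you would build the markup with django.utils.html.format_html (or wrap it in mark_safe) so the template doesn't escape it, or use an inclusion template tag. Either way no extra view is needed, because the template calls the method directly as {{ gallery.as_slideshow }}.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Image:
    url: str
    caption: str = ""


@dataclass
class Gallery:
    # Hypothetical stand-in for a gallery model; in Django the images would come from a queryset.
    title: str
    images: list[Image] = field(default_factory=list)

    def as_slideshow(self) -> str:
        """Render the gallery as an HTML fragment, in the spirit of form.as_p()."""
        slides = "\n".join(
            f'  <figure class="slide"><img src="{escape(img.url)}" alt="{escape(img.caption)}">'
            f"<figcaption>{escape(img.caption)}</figcaption></figure>"
            for img in self.images
        )
        return f'<div class="slideshow" aria-label="{escape(self.title)}">\n{slides}\n</div>'

    def __str__(self) -> str:
        # Forms also render through __str__, so {{ gallery }} would behave like {{ form }}.
        return self.as_slideshow()


if __name__ == "__main__":
    print(Gallery("Holidays", [Image("/media/beach.jpg", "Beach & sun")]).as_slideshow())
```

Django templates call zero-argument methods automatically, which is why the same shape works without any view code; all user-supplied text is escaped here because the sketch builds HTML by hand.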

Ember Way to Add an RSS Feed without a Third-Party Widget, Front-End Only

I am using Ember 3.0 at the moment. I wrote my first lines of code in ANY language about a year ago (I switched careers from something totally unrelated to development), but I quickly took to Ember. So, not a ton of experience, but not none. I am writing a multi-tenant site that will include about 20 different sites, all with one Ember frontend and a Ruby on Rails backend. I am about 75% done with the frontend and am now just loading content into it. I haven’t started on the backend yet: first, because I don’t have MUCH experience with backend stuff, and second, because I haven’t needed it yet. My sites will be informational to begin with, and I’ll build them up from there.
So. I am trying to implement a news feed on my site. I need it to pull in multiple RSS feeds, perhaps dozens, filter them by keyword, and display them on my site. I’ve been scouring the web for days just trying to figure out where to get started. I was thinking of writing a service that parses the incoming XML. I tried using a third-party widget (which I DON’T really want to do: everything on my site so far has been built from scratch and I’d like to keep it that way), but with these third-party systems I get random cross-domain errors and node-child errors that only SOMETIMES pop up. Anyway, I’d like to write this myself if possible, since I’m trying to learn (and my brain is wired to write the code myself; it’s the only way it sticks with me).
Ultimately, every Google result I read says RSS feeds are easy to implement. I don’t know where I’m going wrong, but I’m simply looking for:
1: An “Ember-way” starting point.
2: Is this possible without a backend?
3: Do I have to use a third party widget/aggregator?
4: Whatever else you think might help on the subject.
Any help would be appreciated. Here in New Hampshire, there are basically no resources, no meetings, nothing. Thanks for any help.
Based on the results I get when searching on this topic, it looks like you’ll hit a few snags if you try to do this in the browser:
CORS header issues (sounds like you’ve already hit this)
The joy of working with XML in JavaScript (that just might be sarcasm 😉, it’s actually unlikely to be fun)
If your goal is to do this as a learning exercise, then doing it in JavaScript/Ember will definitely teach you lots of new things. You might start with this article as a jumping-off point: https://www.raymondcamden.com/2015/12/08/parsing-rss-feeds-in-javascript-options/
However, if you want to have this be maintainable for the long run and want things to go quickly and smoothly, I would highly recommend moving the RSS parsing system into your backend and feeding simple data out to Ember. There are enough gotchas and complexities to RSS feeds over time that using a battle-tested library is going to be your best way to stay sane. And loading that type of library up in Ember (while quite doable) will end up increasing your application size. You will avoid all those snags (and more I’m probably not thinking of) if you move your parsing back to the server ...
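To make the backend suggestion concrete, here is a minimal sketch of the server-side step: parse RSS 2.0 items, filter them by keyword, and emit plain JSON for Ember to consume. It is written in Python with only the standard library purely for illustration; in a Rails backend the equivalent would use a Ruby feed library. The function names are hypothetical, fetching the feeds is left out, and the parser deliberately handles only plain RSS 2.0 (not Atom or namespaced extensions), which is exactly the kind of gotcha a battle-tested library such as Python's feedparser handles for you. For untrusted feeds, the Python XML documentation also recommends the defusedxml package over the stock parser.

```python
import json
import xml.etree.ElementTree as ET


def parse_rss(xml_text: str) -> list[dict]:
    """Extract items from an RSS 2.0 document (Atom feeds use different tags)."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        # findtext returns None for a missing child, so normalize to "".
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
            "pubDate": (item.findtext("pubDate") or "").strip(),
        })
    return items


def filter_by_keywords(items: list[dict], keywords: list[str]) -> list[dict]:
    """Keep items whose title or description mentions any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [
        it for it in items
        if any(k in (it["title"] + " " + it["description"]).lower() for k in lowered)
    ]


def feeds_to_json(feed_texts: list[str], keywords: list[str]) -> str:
    """Merge several feeds, filter them, and emit simple JSON for the frontend."""
    merged = []
    for text in feed_texts:
        merged.extend(parse_rss(text))
    return json.dumps(filter_by_keywords(merged, keywords))
```

Ember then only sees a flat JSON array (title, link, description, pubDate), so no XML handling or cross-domain requests happen in the browser.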

Web Design - Templates vs Include

I am currently developing a website. I would like to separate content and presentation, and I am currently using a Dreamweaver Template to achieve this. However, I find Dreamweaver's editable regions very limiting in Design view. I have found that the same goal can be achieved by including the header and footer of my website.
What are the pros and cons of using includes rather than using templates?
First, if I were to rephrase your question, it's more like asking "Should I buy the wireframe of a kite or the glue to stick together what I'm making?" and then asking about the pros and cons of buying the wireframe versus buying the glue. As you can see, there are far too many variables...
Back to your question: at some point your template will use include files anyway. For a start, it's worth knowing what you're aiming for, so let's look at some basics.
Web design - usually refers to making websites that aren't really interactive. They have no server-side elements, so most of the site's content is 'static'. If that is your case, you're better off with Dreamweaver, particularly if you're not into HTML/CSS editing.
Web development/programming - ranges from something as elementary as mailing a form to highly interactive sites like Facebook. Here you'll need a server-side language, usually something like PHP, ASP or JSP. The choices are many, and you have to pick your own platform or combination of them.
Now to the second option above. If, for example, you were building a site using PHP, one of the nice things you'd do is include your header, footer and side panels that need to be repeated across all pages. This way, you eliminate the need to rewrite those sections. If you use a program like Dreamweaver instead, it does this duplication for you: yes, it physically copy-pastes those sections into every file that needs them. The end result may well be no different, but as a developer you will be tied to Dreamweaver, or for that matter to whichever specific platform you use.
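In PHP, the include approach is a line like <?php include 'header.php'; ?> at the top of each page. The same idea is sketched below in Python as a tiny build step that assembles every page from one shared header and footer; HEADER, FOOTER, render_page and build_site are hypothetical names for illustration. Whether assembly happens on each request (as with PHP include) or once at build time (as here), the point is that the shared sections live in exactly one place.

```python
from string import Template

# Shared fragments (hypothetical content): edit them once and every page picks up the change.
HEADER = '<header><a href="/">Home</a> | <a href="/contact.html">Contact</a></header>'
FOOTER = "<footer>&copy; My Site</footer>"

PAGE = Template("""<!DOCTYPE html>
<html>
<head><title>$title</title></head>
<body>
$header
<main>$content</main>
$footer
</body>
</html>
""")


def render_page(title: str, content: str) -> str:
    """Assemble one page from the shared header/footer, like a server-side include."""
    return PAGE.substitute(title=title, header=HEADER, content=content, footer=FOOTER)


def build_site(pages: dict[str, tuple[str, str]]) -> dict[str, str]:
    """Map each output filename to its fully assembled HTML."""
    return {name: render_page(title, body) for name, (title, body) in pages.items()}


if __name__ == "__main__":
    for name, html in build_site({"index.html": ("Home", "<p>Welcome</p>")}).items():
        print(name, len(html), "characters")
```

Unlike a copy-paste template, changing HEADER and rebuilding updates every page, and the page bodies stay plain HTML that any editor can open.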
On the other hand, if you get used to working with an editor like Notepad++ or gedit, you can switch between editors at any time. You do have to hand-code everything from scratch, but since you would use include files to bring in your headers and so on, you save development time as well.
I don't know how much HTML/CSS or PHP you know, but here's one of my demos showing how to hand-code a site. It isn't complete, but you should get the idea.
Link to the video introduction
Link to the video on youtube

Building dashboards in Django

I have a Django app and I would like to display some graphical data visualizations to my users. I am looking for an easy-to-use package that would allow me to add graphs and widgets.
The kind of widget I want to build is a kind of speedometer dial that is red at one end and green at the other. As a user completes their job over the day, the graphic/widget adjusts itself. The dial moves from red to green.
I also want an S-curve graphic that shows the cumulative amount of work accomplished against the planned amount. That is essentially an x/y line plot.
My questions are: How easy is this to implement? Are there any add-in libraries or packages that do this already? I am trying to keep my entire application open-source. I've seen a couple of subscription services that do this type of thing, but I can't stomach the cost.
I don't mind using ajax or jquery to implement such a thing, but I would like the most elegant and maintainable solution.
Any advice or examples on how to tackle this project?
There are lots of good javascript libraries these days, but all require some effort to learn how to use. I have not found one that really is easy to use, I guess because everyone wants something different. My general experience has been the more effort you put into learning them, the more you get out.
Google has gauges: http://code.google.com/apis/chart/interactive/docs/gallery/gauge.html
Also
http://www.flotcharts.org/
http://philogb.github.com/jit/
http://www.highcharts.com/
http://www.jqplot.com/
Or really take control:
http://mbostock.github.com/protovis/
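Whichever library draws the widgets, the Django side mostly has to supply two things: the fraction of today's target that is done (for the dial) and cumulative planned vs. actual work (an S-curve is just two running totals). A minimal sketch under those assumptions, with hypothetical function names and only the standard library:

```python
from itertools import accumulate


def gauge_value(done: float, target: float) -> float:
    """Percent of the day's target completed, clamped to 0-100 for a dial widget."""
    if target <= 0:
        return 0.0
    return max(0.0, min(100.0, 100.0 * done / target))


def gauge_color(percent: float) -> str:
    """Linearly blend from red (0%) to green (100%) as a hex color."""
    t = max(0.0, min(1.0, percent / 100.0))
    red, green = round(255 * (1 - t)), round(255 * t)
    return f"#{red:02x}{green:02x}00"


def s_curve(planned: list[float], actual: list[float]) -> dict:
    """Cumulative planned vs. actual work per period: the two lines of an S-curve."""
    return {
        "periods": list(range(1, len(planned) + 1)),
        "planned": list(accumulate(planned)),
        # Actual may cover fewer periods than planned (work still in progress).
        "actual": list(accumulate(actual)),
    }


def dashboard_payload(done: float, target: float,
                      planned: list[float], actual: list[float]) -> dict:
    """Plain data a view could serialize for a JavaScript chart library."""
    pct = gauge_value(done, target)
    return {
        "gauge": {"value": pct, "color": gauge_color(pct)},
        "s_curve": s_curve(planned, actual),
    }
```

A Django view could return this with django.http.JsonResponse(dashboard_payload(...)) and let the chart library fetch it via AJAX. The color blend is only needed if you draw the dial yourself; Google's gauge, for example, can color its own red and green bands.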
First, see the following grid (https://www.djangopackages.com/grids/g/dashboard-applications/) on Django Packages.
Not sure if that's exactly what's asked for, but you might take a look at django-dash (https://pypi.python.org/pypi/django-dash).
It allows each user to build their own dashboard from the available plugins, and those dashboards can be made public.
Some screenshots (http://pythonhosted.org/django-dash/#screenshots).
It's modular and plugin-based, so you need to write a plugin and widgets for every specific feature (in this particular case, the speedometer plugin and its widgets). Each plugin/widget can include its own JS/CSS when rendered.
See the following chart usage examples:
D3.js integration examples (https://github.com/barseghyanartur/django-dash/tree/master/example/example/d3_samples).
Polychart2.js integration example (https://github.com/barseghyanartur/django-dash/blob/master/example/example/bar/).
Protovis is no longer under active development, but its authors started a new project: http://d3js.org/
You may choose from these packages:
https://www.djangopackages.com/search/?q=dash

How to build a system for an editorial team

I'm developing a web portal that works mostly like a newspaper site. The focus is on articles containing text, videos and images. These articles have attachments that should be presented in a sidebar. These attachments might be the same objects that are displayed within the body text.
I have been thinking a lot about how to create the structure and, this being a major point, how to let the editors edit all this content comfortably.
I evaluated Django-CMS and feincms as complete systems, as well as several third-party modules that do parts of the work.
I now have a solution for inline objects: I forked the inline module of django-basic-apps, which can now take additional parameters for the objects to embed. These parameters are important, e.g. to embed "an image with object ID x, but at most n pixels in size".
What my approach doesn't solve is generating a sidebar containing a bunch of inline tokens. I could create a custom widget for this, though. A better solution would surely be to add functionality for attaching generic objects (videos, images, ...) to an article object.
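One way to get the sidebar without a separate widget is to derive it from the tokens already in the body and merge them with explicitly attached objects, skipping items that appear in both. The sketch below assumes a hypothetical token syntax, <inline type="image" id="12" max_width="400" />; the actual django-basic-apps format may differ, and InlineRef, extract_inlines and sidebar_items are illustrative names. In Django, the explicit attachments would naturally come from a generic relation (GenericForeignKey in django.contrib.contenttypes), which is the "attach generic objects to an article" idea.

```python
import re
from dataclasses import dataclass, field

# Hypothetical token syntax, e.g. <inline type="image" id="12" max_width="400" />
TOKEN_RE = re.compile(r"<inline\s+([^>]*?)\s*/>")
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


@dataclass
class InlineRef:
    kind: str
    object_id: int
    params: dict = field(default_factory=dict)  # extra options such as max_width


def extract_inlines(body: str) -> list[InlineRef]:
    """Find every inline token in an article body, in order of appearance."""
    refs = []
    for match in TOKEN_RE.finditer(body):
        attrs = dict(ATTR_RE.findall(match.group(1)))
        kind = attrs.pop("type", None)
        object_id = attrs.pop("id", None)
        if kind is None or object_id is None or not object_id.isdigit():
            continue  # skip malformed tokens rather than failing the whole page
        refs.append(InlineRef(kind, int(object_id), attrs))
    return refs


def sidebar_items(body: str, attachments: list[InlineRef]) -> list[InlineRef]:
    """Explicit attachments first, then body inlines not already listed (by kind and id)."""
    seen = set()
    result = []
    for ref in attachments + extract_inlines(body):
        key = (ref.kind, ref.object_id)
        if key not in seen:
            seen.add(key)
            result.append(ref)
    return result
```

Keeping the parameters on each reference means the body renderer and the sidebar can size the same object differently while still pointing at a single stored image or video.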
While my solution works so far, I'm not sure whether there are other ways to solve this common scenario, and I would like to hear about other people's experiences with this topic and how they deal with it.
Since there does not seem to be much demand for a solution to this generic problem, I will use mine and see whether it proves itself in practice.
Take a look at Armstrong CMS. It's specifically designed to meet the needs of news organizations. It was developed out of the code that powers The Texas Tribune, a very large Django news site that won the Edward R. Murrow Award for best local news website in 2010.
Armstrong scales very well, is fast and can handle just about any kind of content you want to throw at it.