CAML from SharePoint to HTML - web-services

I'm invoking one of SharePoint's web service APIs that returns a CAML fragment. I've searched the interweb far and wide, but I've been unable to figure out how to make this CAML fragment render as "normal" HTML that I can display in a more sane environment like Plumtree, WLP, Liferay or any other portal besides SharePoint.
Without a way to do this, I'm wondering why Microsoft wrote SharePoint web service calls that return CAML in the first place. Web services are for interoperability and it seems the CAML is only valid within a WebPart running within SharePoint. [Note to Bill and Steve: that's not interoperability.]
If I can't do anything with the CAML that comes back, I'm just going to call a different web service that returns only data and then write my own UI. I was hoping for an easier path. Any suggestions would be greatly appreciated.

The CAML is still XML and, as mentioned, XSLT will be able to render it as HTML. The gnarly nested OR/AND structure of CAML is a whole other issue.
That would require unrolling the CAML structure and displaying it in a way that normal people understand.
Unfortunately, the XSLT language is unsuitable for unrolling nested structures like this (it has no stack). It is possible, but having done it I recommend strongly using another language to parse and unroll the CAML.
I have yet to see any CAML-to-SQL conversion code. Sounds like a great CodePlex project.
So in summary... you are a bit stuffed with CAML. While it is XML, its structure is unsuited to use in any other query language.
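The answer above recommends unrolling CAML's nested And/Or structure in a general-purpose language rather than in XSLT. As a minimal sketch of that, assuming the fragment is a plain (non-namespaced) CAML Where clause and covering only a handful of operators (Eq, Neq, Gt, Geq, Lt, Leq, IsNull, IsNotNull), the Java below walks the tree recursively and prints an infix expression. The class name CamlUnroller and the output format are made up for illustration, and Map.of needs Java 9+. The nesting comes from CAML's And/Or elements each taking exactly two child conditions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CamlUnroller {
    // A subset of CAML comparison operators mapped to readable symbols;
    // any other element is printed under its own name.
    private static final Map<String, String> OPS = Map.of(
            "Eq", "=", "Neq", "<>", "Gt", ">", "Geq", ">=", "Lt", "<", "Leq", "<=");

    /** Turns a CAML <Where> clause (or a bare condition) into an infix expression. */
    public static String unroll(String caml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(caml))).getDocumentElement();
            Element condition = root.getTagName().equals("Where") ? children(root).get(0) : root;
            return render(condition);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse CAML", e);
        }
    }

    // Java's call stack does the unrolling: each <And>/<Or> renders its two operands,
    // however deeply they are nested.
    private static String render(Element e) {
        String tag = e.getTagName();
        List<Element> kids = children(e);
        switch (tag) {
            case "And":
            case "Or":
                return "(" + render(kids.get(0)) + " " + tag.toUpperCase() + " " + render(kids.get(1)) + ")";
            case "IsNull":
                return field(kids) + " IS NULL";
            case "IsNotNull":
                return field(kids) + " IS NOT NULL";
            default:
                return field(kids) + " " + OPS.getOrDefault(tag, tag) + " " + value(kids);
        }
    }

    // Child elements only, skipping whitespace text nodes
    private static List<Element> children(Element parent) {
        List<Element> result = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) nodes.item(i));
            }
        }
        return result;
    }

    private static String field(List<Element> kids) {
        for (Element k : kids) {
            if (k.getTagName().equals("FieldRef")) return k.getAttribute("Name");
        }
        return "?";
    }

    private static String value(List<Element> kids) {
        for (Element k : kids) {
            if (k.getTagName().equals("Value")) return k.getTextContent().trim();
        }
        return "?";
    }

    public static void main(String[] args) {
        String caml = "<Where><Or>"
                + "<Eq><FieldRef Name='Status'/><Value Type='Text'>Open</Value></Eq>"
                + "<And><Gt><FieldRef Name='Priority'/><Value Type='Number'>2</Value></Gt>"
                + "<IsNull><FieldRef Name='Owner'/></IsNull></And>"
                + "</Or></Where>";
        System.out.println(unroll(caml)); // (Status = Open OR (Priority > 2 AND Owner IS NULL))
    }
}
```

Mapping the same tree to SQL instead would mostly mean changing the strings in render(); the recursion stays the same.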

You could send the CAML through an XSLT stylesheet to generate HTML or XHTML.
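A minimal sketch of that approach using the JDK's built-in XSLT processor (javax.xml.transform). The stylesheet is hypothetical and only handles a ViewFields list of FieldRef elements; a real one would need a template for each CAML element you care about. It uses method='xml' with indent='no' so the output is predictable; method='html' is the usual choice when generating a full page.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CamlToHtml {
    // Hypothetical stylesheet: a <ViewFields> list of <FieldRef> elements becomes an HTML list
    private static final String XSLT =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='xml' omit-xml-declaration='yes' indent='no'/>"
          + "<xsl:template match='/ViewFields'>"
          + "<ul><xsl:apply-templates select='FieldRef'/></ul>"
          + "</xsl:template>"
          + "<xsl:template match='FieldRef'>"
          + "<li><xsl:value-of select='@Name'/></li>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    public static String toHtml(String caml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(caml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml(
                "<ViewFields><FieldRef Name='Title'/><FieldRef Name='Created'/></ViewFields>"));
    }
}
```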
Edit:
Considering your first question (why SharePoint returns CAML from some of its web services)... who knows? It may be there to support authoring tools such as SharePoint Designer. But it seems clear from the dearth of documentation and tools that CAML is a more-or-less internal SharePoint thing. At present, performing CAML-to-HTML conversion would require either somehow accessing the CAML rendering engine within SharePoint, or re-implementing it. Neither option is attractive.
I think that your conclusion (calling data-returning web services and rendering the HTML yourself) is probably your best bet.

Related

Sitemesh or XSLT for layout

I am designing a layout for my CRM project now.
I've narrowed it down to two options: SiteMesh or XSLT to define the layout.
SiteMesh runs on the server at request time; won't that cause problems if the number of requests is high?
I guess XSLT runs in the browser, based on XPath; is this correct?
Which one is better to use?
Please help me
Thanks
You can run XSLT either in the browser or at the server. The advantage to running it at the server is that the HTML you generate will be the same regardless of what browser the user has. If you run it in the web browser, users with different browsers might get slightly different results, because different XSLT transformation engines have different quirks, kind of like different web browsers do when rendering the same HTML and CSS.
I've designed and taught a corporate 1-day intro to XSLT class. I love the way XSLT works. That said, it has been criticized for running slowly and being hard to learn.
I've just started using SiteMesh 2.0, and I really like it. If you are not familiar with XSLT coding, you may be more comfortable with SiteMesh, because it simply wraps your content with a header/footer you create. You don't have to write and debug XSLT code.

What are the gotchas with ColdFusion?

Background:
I have a new site in the design phase and am considering using ColdFusion. The server is currently set up with ColdFusion and Python (done for me).
It is my choice which to use, and ColdFusion seems intriguing with its tag concept. Having developed sites in PHP and Python, I find the idea of using a new tool fun, but I want to make sure it is as easy to use as my other two choices for things like URL beautification and scalability.
Are there any common problems with using ColdFusion in regards to scalability and speed of development?
My other choice is to use Python with WebPy or Django.
ColdFusion 9 with a good framework like Sean Corfield's FW/1 has plenty of performance and all the functionality of any modern server-side web development language. It has some great integration features, like Exchange Server support and Excel/PDF support out of the box.
Like all tools it may or may not be the right one for you but the gotchas in terms of scalability will usually be with your code, rarely the platform.
Liberally use memcached or the built-in Ehcache in CF9, be smart about your data access strategy, intelligently chunk returned data, and you will be fine performance-wise.
My approach with CF lately involves using jQuery extensively for client side logic and using CF for the initial page setup and ajax calls to fill tables. That dramatically cuts down on CF specific code and forces nice logic separation. Plus it cuts the dependency on any one platform (aside from the excellent jQuery library).
To specifically answer your question: if you read the [coldfusion] tag here, you will see questions are rarely about speed or scalability; it scales fine. A lot of the questions seem to be about places where CF is a fairly thin layer over another tool, like Apache Axis (web services) and Ext JS (cfajax), neither of which you need to use. You will probably need mod_rewrite or IIS rewrite to hide .cfm.
Since you have both ColdFusion and Python available to you already, I would carefully consider exactly what it is you're trying to accomplish.
Do you need a gradual learning curve, newbie-friendly language (easy for someone who knows HTML to learn), great documentation, and lots of features that make normally difficult tasks easy? That sounds like a job for ColdFusion.
That said, once you get the basics of ColdFusion down, it's easy to transition into an Object Oriented approach (as others have noted, there is a plethora of MVC frameworks available: FW/1, ColdBox, Fusebox, Model-Glue, Mach-II, Lightfront, and the list goes on...), and there are also dependency management (DI/IoC) frameworks (my favorite of which is ColdSpring, modeled after Java's Spring framework), and the ability to do Aspect-Oriented Programming, as well. Lastly, there are also several ORM frameworks (Transfer, Reactor, and DataFaucet, if you're using CF8 or earlier, or add Hibernate to the list in CF9+).
ColdFusion also plays nicely with just about everything else out there. It can load and use .NET assemblies, provides native access to Java classes, and makes creating and/or consuming web services (particularly SOAP, but REST is possible) a piece of cake. (I think it even does COM/CORBA, if you feel like using tech from 1991...)
Unfortunately, I've got no experience with Python, so I can't speak to its strengths. Perhaps a Python developer can shed some light there.
As for URL rewriting (again, as others have noted), that's not really done in the language (though you can fudge it); to get a really nice-looking URL you really need either mod_rewrite (which can be done without .htaccess; the rules would instead go into your Apache VHosts config file) or one of the IIS URL Rewriting products.
The "fudging" I alluded to would be a url like: http://example.com/index.cfm/section/action/?search=foo -- the ".cfm" is in the URL so that the request gets handed from the web server (Apache/IIS) to the Application Server (ColdFusion). To get rid of the ".cfm" in the URL, you really do have to use a URL rewriting tool; there's no way around it.
From two years working with CF, for me the biggest gotchas are:
If you're mainly coding using tags (rather than CFScript) and formatting for readability, be prepared for your output to be filled with whitespace. Unlike other scripting languages, the whitespace between statements is actually sent to the client - so if you're looping over something 100 times and outputting the result, all the linebreaks and tabs in the loop source code will appear 100 times. There are ways around this but it's been a while - I'm sure someone on SO has asked the question before, so a quick search will give you your solution.
Related to the whitespace problem, if you're writing a script to be used with AJAX or Flash and you're trying to send XML, even a single space before the XML declaration can break some of the fussier parsing engines (jQuery used to fall over like this, and I don't know if it still does; Flash was a nightmare). When I first did this I spent hours trying to figure out why what looked like well-formed XML was causing my script to die.
The later versions aren't so bad, but I was also working on legacy systems where even quite basic functionality was lacking. Quite often you'll find you need to go hunting for a COM or Java library to do the job for you. Again, though, this is in the earlier versions.
CFAJAX was a heavy, cumbersome beast last time I checked - so don't bother, roll your own.
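The whitespace-before-XML gotcha above is easy to reproduce on the consuming side. This Java sketch (class and method names are hypothetical) shows the JDK's parser rejecting a response that has line breaks before its <?xml ...?> declaration, plus a defensive workaround that strips them. Ideally the server would not emit the whitespace in the first place; this only protects the client.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlPrologCheck {
    /** Strict parse: whitespace before the <?xml ...?> declaration is a fatal error. */
    public static boolean parsesStrictly(String xml) {
        try {
            parse(xml);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    /** Defensive parse: strip whitespace the server emitted before the declaration. */
    public static Document parseLenient(String xml) {
        try {
            return parse(xml.replaceFirst("^\\s+", ""));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed even after trimming", e);
        }
    }

    private static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) {
        // Simulated tag-based output that left line breaks and indentation before the declaration
        String response = "\n\n   <?xml version=\"1.0\"?><items><item/></items>";
        System.out.println(parsesStrictly(response));                                  // false
        System.out.println(parseLenient(response).getDocumentElement().getTagName()); // items
    }
}
```

The strict case also makes the JDK parser print a "[Fatal Error]" line to stderr, which is roughly the kind of opaque failure described above.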
Other than that, I found CF to be a fun language to work with - it has its idiosyncrasies like everything else, but by and large it was mostly headache-free and fast to work with.
Hope this helps :)
Cheers
Iain
EDIT: Oh, and for reasons best known to Adobe, if you're running the trial version you'll get a lovely fat HTML comment before all of your output - regardless of whether or not you're actually outputting HTML. And yes, because the comment appears before your DTD, be prepared for some browsers (not looking at any one in particular!) to render it like crap. Again - perhaps they've rethought this in the new version...
EDIT#2: You also mentioned URL Rewriting - where I used to work we did this all the time - no problems. If you're running on Apache, use mod_rewrite; if you're running on IIS, buy ISAPI Rewrite 3.
Do yourself a favor and check out the CFWheels project. It has the URL rewriting support and routes that you're looking for. Also, as a full-stack MVC framework, it comes with its own ORM.
It's been a few years, so my information may be a little out of date, but in my experience:
Pros:
ColdFusion is easy to learn, and quick to get something up and running end-to-end.
Cons:
As with many server-side scripting languages, there is no real separation between persistence logic, business logic, and presentation. All of these are typically interwoven throughout a typical ColdFusion source file. This can mean a lot more work if you want to make changes to the database schema of a mature application, for example.
There are some disciplines that can be followed to make things a little more maintainable; "Fusebox" was one. There may be others.

Is ColdFusion more than a presentation technology?

I've been looking recently at ColdFusion for an upcoming job. My background is in ASP.NET/MVC and JSP/Servlets.
From what I can tell, ColdFusion is mostly a presentation technology that interfaces with a business layer implemented in some other technology. For the trivial cases, it also looks like you can go straight from the markup to the database, much like PHP.
I know this is probably a simplistic view of the product. So what more does it do, and what is the business case for using ColdFusion over more heavily hyped web technologies like ASP.NET/JSP?
You can definitely write your business layer in ColdFusion, and as you say you can extend that with easy hooks to Java and .NET objects.
The business case for ColdFusion is that it is a rapid application development platform - the speed that you as a developer can get things done is just insane. There is a lot of built-in functionality, from MS Exchange integration, charting, Excel generation, all the way through to a Hibernate ORM implementation (new in CF9).
There are a few popular, mature MVC frameworks (Model-Glue, Coldbox, Fusebox, onTap, etc) that you can work with, or you can run up your own framework using a pattern that suits your style.
What might be confusing you is that you can choose to write the presentation layer and business layer in ColdFusion tags, and that might be why you think it's not a powerful option for the business layer. CF tags wrap a lot of functionality in an easy to use syntax, but with CF9 you have the option to write ColdFusion Components (CFCs) completely with a script based syntax - that might help you distinguish between presentation (tags) and business logic (script).
The developer edition is free to try, so you really only are losing some time if you give it a go, and I highly recommend you check it out.
Building on Antony's suggestions: he forgot to mention another MVC framework, ColdFusion on Wheels! We're rapidly approaching a 1.0 release by next month and have an active community developing a slew of plugins. With a built-in ORM that follows Rails' design, it's easy to pick up. Check it out and give us some feedback.

ColdFusion CRUD

For quite a long time now, I've been trying to write and have been in search of "a really good" CRUD application. Don't get me wrong - I didn't say "The ultimate" CRUD application. Just one that could be rated 1st class.
What I'm saying is: Please don't respond to this plea with an answer like "Well, every situation is different..."
Q: Is there a blog post or something in the Adobe documentation that shows CRUD on a one-to-many relationship (Header/Detail), that uses web standards css (instead of tables), that uses best practices (CF9 has changed so many things now: scripted components, ORM), that uses the latest UI techniques (jQuery or some of the built-in AJAX features of CF9), that has a nice front-end (a nice looking header and background along with some pretty buttons)?
I know that's a lot to ask, but such is my quest.
A good example of a one-to-many relationship is the city/state xml files built into the Spry examples. There are 23,000 cities in the sample xml files, so I think that's better than just using random data.
I'm not really sure what you're asking, but I just want to respond to a couple of points in your question (this is more a comment than an answer, but since SO is stupidly limited in this, I'll put it here instead.)
that uses web standards css (instead of tables),
There is no "css instead of tables" - they are two distinct and compatible things!
CSS describes visual aspects of a document, whilst tables markup tabular data.
If you're displaying tabular data, then tables is exactly what you should be using, and you can use CSS to make it look more exciting than the plain styles that tables come in.
Since you're asking for a CRUD app, odds are you are going to be wanting to display tabular data so should be using tables.
(The common mistake people make is not understanding the nature of the web, and using tables to apply grid layouts to documents, when they should be using structured semantic markup instead.)
that uses best practices (CF9 has changed so many things now: scripted components, ORM)
Scripted components are not a best practise!
They are an alternative syntax (for people who prefer having non-descriptive braces everywhere); they do not offer anything you can't already do.
I would strongly suggest you check out CFWheels. Read the documentation; it's built for doing such CRUD applications, has an amazing set of features, and will save you a lot of time. As for the interface, there are many jQuery plugins out there that can handle this. I suggest looking at ajaxrain and finding a plugin you like.

"Smart" way of parsing and using website data?

How does one intelligently parse data returned by search results on a page?
For example, let's say that I would like to create a web service that searches for online books by parsing the search results of many book providers' websites. I could get the raw HTML data of the page and run some regexes to make the data work for my web service, but if any of the websites change the formatting of their pages, my code breaks!
RSS is indeed a marvelous option, but many sites don't have an XML/JSON based search.
Are there any kits out there that help extract information from pages automatically? A crazy idea would be to have a fuzzy AI module recognize patterns on a search results page, and parse the results accordingly...
I've done some of this recently, and here are my experiences.
There are three basic approaches:
1. Regular expressions.
   - Most flexible, easiest to use with loosely-structured info and changing formats.
   - Harder to do structural/tag analysis, but easier to do text matching.
   - Built-in validation of data formatting.
   - Harder to maintain than the others, because you have to write a regular expression for each pattern you want to use to extract/transform the document.
   - Generally slower than 2 and 3.
   - Works well for lists of similarly-formatted items.
   - A good regex development/testing tool and some sample pages will help. I've got good things to say about RegexBuddy here. Try their demo.
   - I've had the most success with this. The flexibility lets you work with nasty, brutish, in-the-wild HTML code.
2. Convert HTML to XHTML and use XML extraction tools. Clean up the HTML, convert it to legal XHTML, and use XPath/XQuery/X-whatever to query it as XML data.
   - Tools: TagSoup, HTML Tidy, etc.
   - Quality of HTML-to-XHTML conversion is VERY important, and highly variable.
   - Best solution if the data you want is structured by the HTML layout and tags (data in HTML tables, lists, DIV/SPAN groups, etc.).
   - Most suitable for getting link structures, nested tables, images, lists, and so forth.
   - Should be faster than option 1, but slower than option 3.
   - Works well if content formatting changes/is variable, but document structure/layout does not.
   - If the data isn't structured by HTML tags, you're in trouble.
   - Can be used with option 1.
3. Parser generator (ANTLR, etc.): create a grammar for parsing and analyzing the page.
   - I have not tried this because it was not suitable for my (messy) pages.
   - Most suitable if the HTML is highly structured, very constant, regular, and never changes.
   - Use this if there are easy-to-describe patterns in the document, but they don't involve HTML tags and involve recursion or complex behaviors.
   - Does not require XHTML input.
   - FASTEST throughput, generally.
   - Big learning curve, but easier to maintain.
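Option 2 can be sketched with nothing but the JDK, assuming the page has already been cleaned into well-formed XHTML by a tool like TagSoup or HTML Tidy. The sample page, class name and XPath expressions below are hypothetical. XHTML elements live in the http://www.w3.org/1999/xhtml namespace, so the query binds a prefix (h) to it; an unprefixed //td matches nothing in a namespace-aware DOM, which is a common reason XPath queries on XHTML come back empty.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XhtmlQuery {
    public static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical search-results page, already cleaned up into well-formed XHTML
    public static final String SAMPLE =
            "<html xmlns='" + XHTML_NS + "'><body><table class='results'>"
          + "<tr><td class='title'>Dune</td><td class='price'>9.99</td></tr>"
          + "<tr><td class='title'>Emma</td><td class='price'>4.50</td></tr>"
          + "</table></body></html>";

    /** Returns the trimmed text of every node matched by the expression. */
    public static List<String> select(String xhtml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // keep elements in the XHTML namespace
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the prefix "h" to the XHTML namespace so expressions can say h:td
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "h".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) { return null; }
                public Iterator<String> getPrefixes(String uri) { return null; }
            });

            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent().trim());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Query failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, "//h:table[@class='results']//h:td[@class='title']"));
    }
}
```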
I've tinkered with Web-Harvest for option 2, but I find its syntax kind of weird: a mix of XML and some pseudo-Java scripting language. If you like Java, and like XML-style data extraction (XPath, XQuery), that might be the ticket for you.
Edit: if you use regular expressions, make sure you use a library with lazy quantifiers and capturing groups! PHP's older regex libraries lack these, and they're indispensable for matching data between open/close tags in HTML.
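To illustrate why lazy quantifiers and capturing groups matter, here is a small Java sketch (the markup and class name are hypothetical) comparing a lazy (.*?) and a greedy (.*) capture on the same two-row snippet. java.util.regex supports both features.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyCapture {
    // (.*?) is lazy: the capture stops at the FIRST </td> after the opening tag
    private static final Pattern LAZY = Pattern.compile(
            "<td class=\"title\">(.*?)</td>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // (.*) is greedy: it backtracks only as far as the LAST </td>, swallowing everything between
    private static final Pattern GREEDY = Pattern.compile(
            "<td class=\"title\">(.*)</td>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private static List<String> extract(Pattern p, String html) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(html);
        while (m.find()) {
            found.add(m.group(1).trim()); // group(1) is the text captured between the tags
        }
        return found;
    }

    public static List<String> lazyTitles(String html) { return extract(LAZY, html); }

    public static List<String> greedyTitles(String html) { return extract(GREEDY, html); }

    public static void main(String[] args) {
        String html = "<tr><td class=\"title\">Dune</td><td>9.99</td></tr>"
                    + "<tr><td class=\"title\">Emma</td><td>4.50</td></tr>";
        System.out.println(lazyTitles(html));   // [Dune, Emma]
        System.out.println(greedyTitles(html)); // one match spanning both rows
    }
}
```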
Without a fixed HTML structure to parse, I would hate to maintain regular expressions for finding data. You might have more luck parsing the HTML through a proper parser that builds the tree. Then select elements ... that would be more maintainable.
Obviously the best way is some XML output from the engine with a fixed markup that you can parse and validate. I would think that an HTML parsing library with some 'in the dark' probing of the produced tree would be simpler to maintain than regular expressions.
This way, you just have to check on <a href="blah" class="cache_link">... turning into <a href="blah" class="cache_result">... or whatever.
Bottom line, grepping specific elements with regexp would be grim. A better approach is to build a DOM like model of the page and look for 'anchors' to character data in the tags.
Or send an email to the site stating a case for a XML API ... you might get hired!
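A sketch of the 'anchor' idea in Java, assuming well-formed markup (for example, after cleanup with TagSoup) and reusing the cache_link class from the answer as a hypothetical example. It builds a DOM, picks out links by class rather than by their position in the page, and keeps the class name in a single constant so a rename like cache_link to cache_result is a one-line fix.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ClassAnchors {
    // The only place the site-specific class name appears
    public static final String RESULT_LINK_CLASS = "cache_link";

    /** Returns the href of every <a> element carrying the given class. */
    public static List<String> hrefsWithClass(String markup, String cssClass) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            List<String> hrefs = new ArrayList<>();
            NodeList anchors = doc.getElementsByTagName("a");
            for (int i = 0; i < anchors.getLength(); i++) {
                Element a = (Element) anchors.item(i);
                // class="cache_link external" holds several classes, so compare tokens, not the whole string
                List<String> classes = Arrays.asList(a.getAttribute("class").trim().split("\\s+"));
                if (classes.contains(cssClass)) {
                    hrefs.add(a.getAttribute("href"));
                }
            }
            return hrefs;
        } catch (Exception e) {
            throw new IllegalArgumentException("Markup is not well-formed", e);
        }
    }

    public static void main(String[] args) {
        String page = "<div>"
                + "<a href='/r/1' class='cache_link'>one</a>"
                + "<a href='/nav' class='nav'>home</a>"
                + "<a href='/r/2' class='cache_link external'>two</a>"
                + "</div>";
        System.out.println(hrefsWithClass(page, RESULT_LINK_CLASS)); // [/r/1, /r/2]
    }
}
```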
You don't say what language you're using. In Java land you can use TagSoup and XPath to help minimise the pain. There's an example from this blog (of course the XPath can get a lot more complicated as your needs dictate):
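import java.net.URL;
import org.jaxen.jdom.JDOMXPath;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;

URL url = new URL("http://example.com");
SAXBuilder builder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser"); // build a JDOM tree from a SAX stream provided by TagSoup
Document doc = builder.build(url);
// TagSoup puts elements in the XHTML namespace, so bind a prefix for the XPath
JDOMXPath titlePath = new JDOMXPath("/h:html/h:head/h:title");
titlePath.addNamespace("h", "http://www.w3.org/1999/xhtml");
String title = ((Element) titlePath.selectSingleNode(doc)).getText();
System.out.println("Title is " + title);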
I'd recommend externalising the XPath expressions so you have some measure of protection if the site changes.
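One way to externalise the expressions, sketched in Java with the JDK's java.util.Properties and javax.xml.xpath: the expressions live in a properties file (here an inline string standing in for a hypothetical scraper.properties), so a layout change means editing config rather than code. The field names, expressions and sample markup are made up, and the page is assumed to be well-formed and un-namespaced; with TagSoup output you would also need to bind the XHTML namespace, as the TagSoup example above does.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ExternalXPath {
    // Stand-in for a hypothetical scraper.properties file: field name = XPath expression
    public static final String CONFIG =
            "title=//div[@class='book']/h3\n"
          + "author=//div[@class='book']/span[@class='author']\n";

    /** Evaluates every configured expression against the page and returns field -> text. */
    public static Map<String, String> extract(String config, String page) {
        try {
            Properties expressions = new Properties();
            expressions.load(new StringReader(config));

            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(page)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            Map<String, String> fields = new LinkedHashMap<>();
            for (String name : new TreeSet<>(expressions.stringPropertyNames())) {
                // evaluate(expression, node) returns the string value of the first match, or ""
                fields.put(name, xpath.evaluate(expressions.getProperty(name), doc).trim());
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Extraction failed", e);
        }
    }

    public static void main(String[] args) {
        String page = "<div class='book'><h3>Dune</h3><span class='author'>Frank Herbert</span></div>";
        System.out.println(extract(CONFIG, page)); // {author=Frank Herbert, title=Dune}
    }
}
```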
Here's an example XPath I'm definitely not using to screenscrape this site. No way, not me:
"//h:div[contains(@class,'question-summary')]/h:div[@class='summary']//h:h3"
You haven't mentioned which technology stack you're using. If you're parsing HTML, I'd use a parsing library:
Beautiful Soup (Python)
HTML Agility Pack (.NET)
There are also webservices that do exactly what you're saying - commercial and free. They scrape sites and offer webservice interfaces.
And a generic web service that offers some screen scraping is Yahoo Pipes (there's a previous Stack Overflow question on that).
It isn't foolproof, but you may want to look at a parser such as Beautiful Soup. It won't magically find the same info if the layout changes, but it's a lot easier than writing complex regular expressions. Note this is a Python module.
Unfortunately 'scraping' is the most common solution, as you said attempting to parse HTML from websites. You could detect structural changes to the page and flag an alert for you to fix, so a change at their end doesn't result in bum data. Until the semantic web is a reality, that's pretty much the only way to guarantee a large dataset.
Alternatively you can stick to small datasets provided by APIs. Yahoo are working very hard to provide searchable data through APIs (see YDN), I think the Amazon API opens up a lot of book data, etc etc.
Hope that helps a little bit!
EDIT: And if you're using PHP I'd recommend SimpleHTMLDOM
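The 'detect structural changes and flag an alert' idea from the answer above can be sketched as a check that each selector still matches at least one node. In this hypothetical Java example, the selectors, sample page and class name are illustrative, the page is assumed to be well-formed, and printing to stderr stands in for whatever alerting you actually use.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LayoutWatchdog {
    /** Returns the names of selectors that no longer match anything on the page. */
    public static List<String> brokenSelectors(Map<String, String> selectors, String page) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(page)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            List<String> broken = new ArrayList<>();
            for (Map.Entry<String, String> entry : selectors.entrySet()) {
                NodeList hits = (NodeList) xpath.evaluate(entry.getValue(), doc, XPathConstants.NODESET);
                if (hits.getLength() == 0) {
                    broken.add(entry.getKey()); // zero matches suggests the layout has moved
                }
            }
            return broken;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not check page", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> selectors = new LinkedHashMap<>();
        selectors.put("title", "//div[@class='book']/h3");
        selectors.put("price", "//div[@class='book']/span[@class='price']");

        // Simulated redesign: the site renamed span.price to span.cost
        String page = "<div class='book'><h3>Dune</h3><span class='cost'>9.99</span></div>";

        List<String> broken = brokenSelectors(selectors, page);
        if (!broken.isEmpty()) {
            // Stand-in for a real alert (email, pager, log monitor)
            System.err.println("Page layout changed; selectors to fix: " + broken);
        }
    }
}
```

Running a check like this before trusting the extracted values is what keeps a redesign from silently producing bad data.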
Have you looked into using an HTML manipulation library? Ruby has some pretty nice ones, e.g. Hpricot.
With a good library you could specify the parts of the page you want using CSS selectors or xpath. These would be a good deal more robust than using regexps.
Example from hpricot wiki:
doc = Hpricot(open("qwantz.html"))
(doc/'div img[@src^="http://www.qwantz.com/comics/"]')
#=> Elements[...]
I am sure you could find a library that does similar things in .NET or Python, etc.
Try googling for screen scraping + the language you prefer.
I know several options for Python; you may find the equivalent for your preferred language:
Beautiful Soup
mechanize: similar to Perl's WWW::Mechanize. Gives you a browser-like object to interact with web pages
lxml: Python binding to libxml2 and libxslt
scrapemark: uses templates to scrape pieces of pages
pyquery: allows you to make jQuery queries in XML/XHTML documents
scrapy: a high-level scraping and web crawling framework for writing spiders to crawl and parse web pages
Depending on the website to scrape you may need to use one or more of the approaches above.
If you can use something like TagSoup, that'd be a place to start. Then you could treat the page like an XML API, kinda.
It has a Java and C++ implementation; might work!
Parsley at http://www.parselets.com looks pretty slick.
It lets you define 'parselets' in JSON that describe what to look for on the page, and it then parses that data out for you.
As others have said, you can use an HTML parser that builds a DOM representation and query it with XPath/XQuery. I found a very interesting article here: Java theory and practice: Screen-scraping with XQuery - http://www.ibm.com/developerworks/xml/library/j-jtp03225.html
There is a very interesting online service for parsing websites: https://loadsiteinmysql.site. This service splits the site into tags and loads them into MySQL tables, which allows you to parse sites using MySQL syntax.