ColdFusion 9 Verity Update - coldfusion

I have a usable Verity collection, initially populated thusly:
<cfindex action="refresh" type="custom" body="PageTitle,PageText"
collection="ABC" custom1="PageID" custom2="MenuName"
key="PageID" query="GetPages" title="PageTitle">
GetPages query looks something like this:
PageID PageTitle PageText MenuName
-------------------------------------------
100 About XYZ Corp <content> About Us
200 XYZs Products <content> Products
300 XYZs Services <content> Services
Along comes a new page that needs to be added to the collection:
PageID PageTitle PageText MenuName
-------------------------------------------
400 XYZ News <content> News
How do I add this to ABC without having to rebuild the entire collection? I've tried variations of <cfindex action="update" ...> without success, usually locking the collection and requiring a CF restart. I can't seem to find good working examples online, and what I do find seems vague. I can successfully purge and rebuild collection using <cfindex action="refresh"...>, but that's so process intensive to do regularly.
Environment: CF9 / IIS / WinServ 2008 R2; collection is ~300 documents at 1,500KB.
Limitations: app is in EOL phase so collection won't be migrated to SOLR; I have little experience using CF's search tools (engineers have done this for me up to now).

scrapy: xpath not returning the full url for @href

When scraping with XPath in Scrapy, I don't get the full URL.
This is the page I am looking at, opened in the Scrapy shell:
scrapy shell "http://www.ybracing.com/omp-ia01854-omp-first-evo-race-suit.html"
i perform the following xpath select from the shell
sel.xpath("//*[@id='Thumbnail-Image-Container']/li[1]/a//@href")
and get only half the href
[<Selector xpath="//*[@id='Thumbnail-Image-Container']/li[1]/a//@href" data=u'http://images.esellerpro.com/2489/I/160/'>]
here's the snippet of html i am looking at in a browser
<li><a data-medimg="http://images.esellerpro.com/2489/I/160/260/1/medIA01854-GALLERY.jpg" href="http://images.esellerpro.com/2489/I/160/260/1/lrgIA01854-GALLERY.jpg" class="cloud-zoom-gallery Selected" title="OMP FIRST EVO RACE SUIT" rel="useZoom: 'MainIMGLink', smallImage: 'http://images.esellerpro.com/2489/I/160/260/1/lrgIA01854-GALLERY.jpg'"><img src="http://images.esellerpro.com/2489/I/160/260/1/smIA01854-GALLERY.jpg" alt="OMP FIRST EVO RACE SUIT Thumbnail 1"></a></li>
and here it is from wget
<li><a data-medimg="http://images.esellerpro.com/2489/I/513/0/medIA01838_GALLERY.JPG" href="http://images.esellerpro.com/2489/I/513/0/lrgIA01838_GALLERY.JPG" class="cloud-zoom-gallery Selected" title="OMP DYNAMO RACE SUIT" rel="useZoom: 'MainIMGLink', smallImage: 'http://images.esellerpro.com/2489/I/513/0/lrgIA01838_GALLERY.JPG'"><img src="http://images.esellerpro.com/2489/I/513/0/smIA01838_GALLERY.JPG" alt="OMP DYNAMO RACE SUIT Thumbnail 1" /></a></li>
I have tried varying my XPath to pull the same value but still get the same result.
What is causing this, and what can I do to work around it? I would like to understand rather than have someone just correct my XPath for me.
Some thoughts on the page itself: I disabled JavaScript to see whether the JS was generating half the URL, but it is not. I also downloaded the page with wget to confirm the URLs are complete in the original HTML.
I haven't tested any other builds, but I'm using Scrapy 1.2.1 with Python 2.7 on CentOS 7.
I've googled and only found people who can't grab the data because JavaScript generates it on the fly, but my data is there in the HTML.
By using
sel.xpath("//*[@id='Thumbnail-Image-Container']/li[1]/a//@href")
you get a list of Selector instances, in which the data field shows only the first few bytes of all its content (since it might be very long).
To retrieve the content as a string (instead of a Selector instance), you would need to use something like .extract or .extract_first:
>>> print(sel.xpath("//*[@id='Thumbnail-Image-Container']/li[1]/a//@href").extract_first())
http://images.esellerpro.com/2489/I/160/260/1/lrgIA01854-GALLERY.jpg
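To make the distinction concrete without needing Scrapy installed, here is a toy sketch in plain Python. ToySelector is a hypothetical stand-in, not Scrapy's real class; the 40-character cut-off is an assumption taken from the truncated output shown in the question, which is exactly 40 characters long.

```python
class ToySelector(object):
    """Hypothetical stand-in for a selector object; not Scrapy's real class."""

    def __init__(self, data):
        self._data = data

    def __repr__(self):
        # Only a short preview is shown when the object is printed or echoed
        # in a shell; the cut-off of 40 characters is assumed from the
        # output in the question.
        return "<ToySelector data=%r>" % self._data[:40]

    def extract(self):
        # The full underlying string is still available.
        return self._data


href = "http://images.esellerpro.com/2489/I/160/260/1/lrgIA01854-GALLERY.jpg"
sel = ToySelector(href)

shown = repr(sel)     # what a shell would echo: a truncated preview
full = sel.extract()  # the complete value

print(shown)
print(full)
```

The preview is only a display convenience; nothing is lost from the underlying value.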

MailChimp: How to use conditional logic with RSS feeds

I have searched MailChimp's documentation as well as other sites but cannot figure out how to combine conditional merge tag blocks with *|FEED|* merge tags.
Basically I am wanting to combine the two in order to include posts from multiple blogs in my e-mail campaign; in particular, I would like to use conditional logic so that any RSS feeds evaluated as "empty" (meaning no new items) receive alternative content that says something along the lines of "no updates available."
I have tried to come up with a few ways of doing this - none have been successful, but here's the type of thing I had in mind:
*|FEEDBLOCK:http://www.mailchimp.com/blog/feed/|*
*|FEED:TITLE|*
*|IF:FEED:POSTS[$content=full] != |*
*|FEED:POSTS[$count=3,$content=titles]|*
*|ELSE:|*
no updates available for this feed
*|END:IF|*
*|END:FEEDBLOCK|*
any help would be appreciated.
Well, this question is from ages ago, but I had a similar problem and figured out a (really hacky) work-around...
Assumptions:
1. You're using a (very) custom RSS feed
2. You're overriding the default RSS tags with custom content
In my use case, I'm using the <category> RSS tag, which MailChimp reads in via the *|FEEDITEM:CATEGORY|* merge tag. I'm using this as a subheading for my RSS feed, instead.
If that subheading is filled out in the admin (that is, whatever admin system you're using to spit out the RSS feed), I want to include it in the feed -- but I also need to add in more html for the email template. The solution is including the required html in the RSS feed. (Like I said -- hacky.)
Shockingly, this works. Mailchimp dutifully pulls in all the html/css.
The RSS feed (vastly simplified here) looks something like this:
<channel>
<item>
<category><![CDATA[ <table><tr><td><div class="example">Sub Headline</div></td></tr></table> ]]></category>
</item>
</channel>
If that field is not set in my custom admin, then no <category> tags at all are outputted, and MailChimp simply ignores that merge tag.
So basically, any email HTML code that you want to display only if the merge tag is valid, should show up in the feed itself.
Definitely not ideal, but it works.
YMMV...
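For illustration, the feed-side logic could look roughly like the sketch below. It is written in Python only as an example; the build_item function, its parameters, and the <title> element are hypothetical and stand in for whatever admin system generates your feed.

```python
from xml.sax.saxutils import escape


def build_item(title, subheading=None):
    """Return one RSS <item>, adding <category> only when a subheading is set."""
    parts = ["<item>", "<title>%s</title>" % escape(title)]
    if subheading:
        # The email markup travels inside the feed itself, wrapped in CDATA.
        markup = '<div class="example">%s</div>' % escape(subheading)
        parts.append("<category><![CDATA[ %s ]]></category>" % markup)
    # With no subheading there is no <category> at all, so the merge tag
    # has nothing to render.
    parts.append("</item>")
    return "\n".join(parts)


with_subheading = build_item("Post title", "Sub Headline")
without_subheading = build_item("Post title")

print(with_subheading)
print(without_subheading)
```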
MailChimp is still very limited with its conditional tags, which only work with subscriber data. RSS feed conditions would be a welcome addition.
http://blog.mailchimp.com/conditional-dynamic-content-in-mailchimp/

Excessive recrawl of ColdFusion dynamic pages

The folks who use ColdFusion and server-side includes are having issues with excessive recrawls of dynamic pages because no last-modified date is set, which causes excessive server traffic. You can laugh if you want, but when I tell them the solution is to set a last-modified date on the pages, I get a universal "Huh? How do you do that?" I originally opened a case with Google and was told that yes, it's a page date problem. I have done a lot of research to find out how to code this in the header, and most of what I found talked about pulling a date from a page.
I did determine that it probably could be done using the CFHEADER tag. I'm just not sure about implementing.
Can I tell them that adding something like
<cfheader NAME="datelastmodified="Mon, 01 Feb 2013 08:00:00 GMT">
will suffice? Not sure about the date format, if the day name is required.
Have I tried just asking one of the webmasters to try this? No, I haven't. I would like to know that I am at least on the right track before taking up too much of their time. And so far none of them have come up with a solution on their own other than using robots.txt to block the crawl or things along those lines.
Any suggestions or thoughts would be appreciated.
Fortunately, none of these things need to be mysterious, as they're all well documented.
last-modified HTTP header
HTTP date/time formats
<cfheader>
and even a function to format the date correctly: getHttpTimeString()
This all comes together to suggest this sort of thing:
<cfheader name="Last-Modified" value="#getHttpTimeString(now())#"> <!--- In practice, use a timestamp indicating when the page content was last updated, which is system-specific, rather than now() --->
NB: I didn't know any of the specifics to this until I googled it about 5min ago.
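On the question of whether the day name is required: it is part of the HTTP date format. As a quick way to see what a correctly formatted value looks like, here is a small Python sketch using only the standard library. The http_date helper is a hypothetical name for illustration; the ColdFusion code above remains the actual fix.

```python
import calendar
from email.utils import formatdate


def http_date(year, month, day, hour=0, minute=0, second=0):
    """Format a UTC date/time in the style used by HTTP headers."""
    # timegm interprets the tuple as UTC and returns seconds since the epoch.
    timestamp = calendar.timegm((year, month, day, hour, minute, second, 0, 0, 0))
    # usegmt=True makes the string end in "GMT" rather than "-0000".
    return formatdate(timestamp, usegmt=True)


header_value = http_date(2013, 2, 1, 8)
print("Last-Modified: " + header_value)
```

This prints "Last-Modified: Fri, 01 Feb 2013 08:00:00 GMT". Note that 1 Feb 2013 was a Friday, so the day name has to agree with the date rather than being typed in by hand.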
Google's crawlers do tend to respect the meta tag details and HTTP response values for pages they encounter and the way to set such in CF is indeed with the CFHEADER tag. You'll want to craft it to look something like this:
<CFHEADER NAME="Last-Modified" VALUE="#DateFormat(DateConvert('local2utc', Now()), 'ddd, dd mmm yyyy')# #TimeFormat(DateConvert('local2utc', Now()), 'HH:mm:ss')# GMT">
<CFHEADER NAME="Expires" VALUE="Sun, 10 Mar 2013 05:00:00 GMT">
You will likely want a CF dev to do that work as I'm showing you two examples for the datetime value there. The first one dynamically sets it to right now (using the DateFormat() and Now() functions) and the second example sets the Expires header value with a hard coded date.
You'll probably want to include both the last-modified and expires tags and decide whether you want the dates applied to each to be either dynamic or hard coded.

Using a regular expression to replace everything between tags contained within XML output

I've been trawling the internet trying to find a solution to this issue. Basically I am using a web service provided by the company that runs our support software to retrieve customer tickets and output them (dependent on filtering) through our system so that customers can see from their dashboard which current support tickets they have active. I've managed to get the desired tags from the XML that is returned via the web service and place their content in a html table (therefore listing the active tickets row by row in the table) however, as the ticket description tag is populated with the content from emails sent by clients, there is lots of nasty redundant css and styling that has been applied to the Email that I would like to remove.
So far I have managed to use the 'replace' function to replace some of the redundant content from this email content ->
l_html_build := replace(l_html_build,'<','<');
l_html_build := replace(l_html_build,'>','>');
l_html_build := replace(l_html_build,'&lt;','');
l_html_build := replace(l_html_build,'&gt;','');
l_html_build := replace(l_html_build,'&nbsp;',' ');
However I now need to overwrite the p tags which have all sorts of garbage added to them so that they just become standard p tags->
From this:
<p 0in;"="" 3.0pt="" padding:="" 1.0pt;="" solid="" border-top:="" none;="" _mce_style=""border:" 0in"="" 0in="" 1.0pt;padding:3.0pt="" #b5c4df="" style=""border:none;border-top:solid">
To this:
<p>
I've looked into using the regEXP function listed here psoug however this appears to require a select statement that is performed each time. The data I need to manipulate is stored in a CLOB called l_html_build so is there any way of adapting the regEXP function to be used in a similar way to the replace function above or is there an alternative method that I am not aware of?
I apologise if this is a noob question. My expertise lies in front end development, PHP and MySQL but unfortunately I'm now required to write bits of PL/SQL in my new role.
Any help would be greatly appreciated.
Knowing that:
There is no standard PL/SQL package that parses HTML.
You can't reliably parse HTML with regex. Furthermore, Oracle only supports basic regular expressions, restricting its capabilities.
You want to stay in PL/SQL.
You are left with few options (that I can think of):
1. Write a simple procedure yourself that will work in most cases (but there will be many exceptions that will break your parser).
2. Use a Java parser: load the class into the database and call Java from PL/SQL. Oracle comes with its integrated JVM, so this involves no extra setup.
I would go with option (2) if you want reliability, or option (1) if infrequent but inevitable losses are acceptable.
Since your content will be coming from email client, we can assume that only a tiny (negligible?) fraction will have very obscure HTML.
In that case you could start with simple regex expressions that may need some tweaking:
SQL> SELECT regexp_replace(
2 '<p1 3.0pt="" padding:="" #b5c4df="">
3 text
4 </p>',
5 '<([[:alpha:]]+)[^>]*>',
6 '<\1>') remove_attr_simple
7 FROM dual;
REMOVE_ATTR_SIMPLE
------------------
<p>
text
</p>
This will fail to catch tricky valid HTML (such as <P attr=">">) but since your input is somewhat standard this should be fine often enough. You may need to remove HTML comments with another procedure -- I'm not sure it can be done with regex.
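To see both the intended behaviour and the failure mode outside the database, the same pattern can be tried in Python. This is only an illustrative sketch: the function name mirrors the column alias above, and [A-Za-z] stands in for the POSIX [[:alpha:]] class.

```python
import re

# Same idea as the regexp_replace call: keep the tag name, drop the rest.
TAG_WITH_ATTRS = re.compile(r"<([A-Za-z]+)[^>]*>")


def remove_attr_simple(html):
    """Strip everything after the tag name in each opening tag."""
    return TAG_WITH_ATTRS.sub(r"<\1>", html)


sample = '<p1 3.0pt="" padding:="" #b5c4df="">\ntext\n</p>'
cleaned = remove_attr_simple(sample)
print(cleaned)

# A quoted ">" inside an attribute value ends the match too early,
# leaving stray characters behind.
tricky = remove_attr_simple('<P attr=">">')
print(tricky)
```

The first call yields the clean `<p>` block shown above, while the tricky input comes out as `<P>">`, which is the kind of breakage to expect from the regex approach.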
SQL is really not the best tool for this job. Nor will regexes be able to perform this kind of task reliably. You would be better off extracting the data and processing it in another language using an XML parser.
Presumably Oracle itself is not sending these emails. What program does the sending, and can you add some programmatic processing at that point?
Since you already know PHP, here is a discussion of parsing HTML/XML in PHP. Similar tools are available in most other languages.
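As a rough sketch of what parser-based processing looks like, here is a version in Python using only the standard library (the same idea applies with PHP's parsers). The AttributeStripper class and strip_attributes function are hypothetical names for illustration, and the sketch assumes you simply want every tag reduced to its bare name, with HTML comments dropped.

```python
from html.parser import HTMLParser


class AttributeStripper(HTMLParser):
    """Rebuild the markup, keeping tag names and text but no attributes."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        HTMLParser.__init__(self, convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append("<%s>" % tag)

    def handle_startendtag(self, tag, attrs):
        self.parts.append("<%s/>" % tag)

    def handle_endtag(self, tag):
        self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        self.parts.append("&#%s;" % name)

    # handle_comment is not overridden, so comments are silently dropped.


def strip_attributes(html):
    parser = AttributeStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(strip_attributes('<p style="border:none" class="x">text &amp; more</p>'))
print(strip_attributes('<P attr=">">x</P>'))
```

Unlike a regex, the parser copes with a quoted ">" inside an attribute value; note that it also lowercases tag names.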

Sharepoint 2007 AddList and AddListFromFeature are missing template columns and data content

What I've Done
Inside SharePoint I created a List based on the Project Tasks template
I deleted most default columns, and added new custom columns
I added data using the new format
Then I did a "Save as template" and chose to save the template with the content
What IS Working
Now, when I use that template to create a new List inside of SharePoint it works perfectly. The custom columns are present, and the data is all pre-filled as expected.
What ISN'T Working
However, when I use the AddList or AddListFromFeature methods made available by SharePoint web services the new list is created, but it is simply based off of the original Project Tasks template with the default columns and no data!
What I've Tried
I tried following the suggestion in the article from Phase 2 to setup a custom template ID, but that only prevented me from using the template at all (was no longer listed when I do a "Create").
I'm still trying to figure out if this article applies - it seems to be a similar issue, but applied to Sites instead of Lists.
I found that another person was having the same problem about a year ago.
System Setup
Working with SharePoint 2007 (I think?), using PHP with NuSOAP to connect. The connection is definitely working as I've added items to lists, created lists, and read data.
Code Samples
Request - against Phase 2 Method template above
<?xml version="1.0" encoding="ISO-8859-1"?><SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:ns2034="http://tempuri.org"><SOAP-ENV:Body>
<AddListFromFeature xmlns="http://schemas.microsoft.com/sharepoint/soap/">
<listName>2Test Milestone Release</listName>
<description>Testing this out</description>
<featureID>{00BFEA71-513D-4CA0-96C2-6A47775C0119}</featureID>
<templateID>151</templateID>
</AddListFromFeature></SOAP-ENV:Body></SOAP-ENV:Envelope>
Response - fails due to templateID not being recognized
<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><soap:Body><soap:Fault><faultcode>soap:Server</faultcode><faultstring>Exception of type 'Microsoft.SharePoint.SoapServer.SoapServerException' was thrown.</faultstring><detail><errorstring xmlns="http://schemas.microsoft.com/sharepoint/soap/">Cannot complete this action.
Please try again.</errorstring><errorcode xmlns="http://schemas.microsoft.com/sharepoint/soap/">0x81072101</errorcode></detail></soap:Fault></soap:Body></soap:Envelope>
I'm stumped! So if you can help - I'd be a very happy person! Thanks, in advance!
I would chase why you can't create the list via the interface in the first place. These two web service calls don't seem to include the parameter that matters when creating from custom templates. Let's analyse the querystrings:
New Project Tasks (out of the box)
http://site/_layouts/new.aspx?FeatureId={00bfea71-513d-4ca0-96c2-6a47775c0119}&ListTemplate=150
New Project Tasks Custom (saved in the list template gallery)
http://site/_layouts/new.aspx?CustomTemplate=PT6.stp&FeatureId={00bfea71-513d-4ca0-96c2-6a47775c0119}&ListTemplate=150
New Project Tasks Custom (manifest.xml edited to 151)
http://site/_layouts/new.aspx?CustomTemplate=PT6.stp&FeatureId={00bfea71-513d-4ca0-96c2-6a47775c0119}&ListTemplate=151
They all work, so my take here is that the Web Service is a no no for custom templates, or it has some secret magic (common in list definitions) since specifying only the ListTemplate without being explicitly CUSTOM won't work even in the UI.
If you can't get around this apparent limitation, my suggestions are:
1. Use .NET; note that this post has some voodoo in the first comment if you happen to get the same error.
2. Make an IFRAME with http://site/_layouts/new.aspx?CustomTemplate=PT6.stp&FeatureId={00bfea71-513d-4ca0-96c2-6a47775c0119}&ListTemplate=150 as its source, fill the fields using JavaScript, and then trigger the OK button click. Add a full-page loading transition and it will even look good.
Method 2 needs to be done from the same domain, if you are not running your PHP from the same domain (unlikely) you need to create a page inside the SharePoint site to contain this hack, it can be as simple as a Web Part Page with a Content Editor Web Part in it, you read some querystring parameters, place them on the fields, trigger the OK and wait for the page to change so you can redirect to a "success" page.
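If you generate the IFRAME source on the server, the URL can be assembled from its parts. The sketch below uses Python purely for illustration (your PHP code would do the equivalent); new_list_url is a hypothetical helper, and "http://site" is the same placeholder used in the querystrings above.

```python
from urllib.parse import urlencode


def new_list_url(site, feature_id, list_template, custom_template=None):
    """Build the new.aspx URL; CustomTemplate is what selects the saved .stp."""
    params = []
    if custom_template:
        params.append(("CustomTemplate", custom_template))
    params.append(("FeatureId", feature_id))
    params.append(("ListTemplate", str(list_template)))
    # Leave the braces around the feature GUID unescaped, as in the URLs above.
    query = urlencode(params, safe="{}")
    return site.rstrip("/") + "/_layouts/new.aspx?" + query


url = new_list_url(
    "http://site",
    "{00bfea71-513d-4ca0-96c2-6a47775c0119}",
    150,
    custom_template="PT6.stp",
)
print(url)
```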
Edit: I got curious and looked at the source of New.aspx, it has this little snippet (bIsCustomTemplate = strCustomTemplate != null, strCustomTemplate = querystring "CustomTemplate"):
<% if (bIsCustomTemplate) { %>
<input id="onetidCustomTemplate" type="Hidden" name="CustomTemplate" value=<%SPHttpUtility.AddQuote(SPHttpUtility.HtmlEncode(strCustomTemplate),Response.Output);%> />
<% } %>
I looked at the disassembled code, which I don't think we can post here, but it only confirms that the UI builds the list from a POST (Request.Form) and looks for the CustomTemplate parameter, while the Web Service has only those methods, where you can't specify a custom template.