Extract number from HTML page using Ant and regex

I have to extract a number from a web page using Ant. I have already downloaded the page using a task.
My page is:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE>Index of .......</TITLE>
</HEAD>
<BODY>
<H1>Index of .....</H1>
<PRE><IMG SRC="/icons/blank.gif" ALT=" "> Name Last modified Size Description
<HR>
<IMG SRC="/icons/back.gif" ALT="[DIR]"> Parent Directory 19-Dec-2012 11:39 -
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> 20120114-1731/ 14-Feb-2012 17:40 -
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> 20120115-1055/ 15-Feb-2012 11:04 -
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> 20120115-1336/ 15-Feb-2012 13:44 -
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> 20120115-1656/ 15-Feb-2012 17:05 -
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> 20120115-2157/ 15-Feb-2012 22:06 -
</PRE><HR>
<ADDRESS>Apache/1.3.41 Server at romgsa.ibm.com Port 443</ADDRESS>
</BODY></HTML>
From:
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF="20120114-1731/">20120114-1731/</A>
I have to extract "20120114-1731".

The following example embeds a Groovy script. Groovy has a useful Grab annotation that can download Java libraries such as htmlcleaner, which lets an HTML page be parsed as XML.
Example
The bootstrap target will download and install groovy.
$ ant bootstrap
Running the build produces the following expected output:
$ ant
..
parse:
[groovy] 20120114-1731/
[groovy] 20120115-1055/
[groovy] 20120115-1336/
[groovy] 20120115-1656/
[groovy] 20120115-2157/
build.xml
<project name="demo" default="parse">
<target name="bootstrap">
<mkdir dir="${user.home}/.ant/lib"/>
<get dest="${user.home}/.ant/lib/groovy-all.jar" src="http://search.maven.org/remotecontent?filepath=org/codehaus/groovy/groovy-all/2.1.1/groovy-all-2.1.1.jar"/>
<get dest="${user.home}/.ant/lib/ivy.jar" src="http://search.maven.org/remotecontent?filepath=org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar"/>
</target>
<target name="parse">
<taskdef name="groovy" classname="org.codehaus.groovy.ant.Groovy"/>
<groovy>
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.SimpleXmlSerializer;
@Grab(group='net.sourceforge.htmlcleaner', module='htmlcleaner', version='2.2.1')
// HTML page to parse
def address = 'file:///path/to/example/page.html'
// Clean any messy HTML
def cleaner = new HtmlCleaner()
def node = cleaner.clean(address.toURL())
// Convert from HTML to XML
def serializer = new SimpleXmlSerializer(cleaner.getProperties())
def xml = serializer.getXmlAsString(node)
// Parse the XML into a document we can work with
def page = new XmlSlurper(false,false).parseText(xml)
// Retrieve the anchor tag values matching a pattern
def numbers = page.body.pre.a.findAll { it.toString().startsWith("2012") }
numbers.each {
println it
}
</groovy>
</target>
</project>
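Since the question asks about a regex, here is a minimal sketch of the pattern itself. It is written in Python purely for illustration and is not part of the Ant build. The function name extract_build_ids, the sample string, and the "/parent/" link are assumptions; the pattern relies on the anchor format shown in the question and captures eight digits, a hyphen, and four digits.

```python
import re

# Assumed format, taken from the anchor shown in the question:
#   <A HREF="20120114-1731/">20120114-1731/</A>
BUILD_ID = re.compile(r'<A\s+HREF="(\d{8}-\d{4})/">', re.IGNORECASE)


def extract_build_ids(html):
    """Return every directory name of the form YYYYMMDD-HHMM linked in the listing."""
    return BUILD_ID.findall(html)


# Hypothetical excerpt of the directory listing; the "/parent/" link is made up.
sample = '''
<IMG SRC="/icons/back.gif" ALT="[DIR]"> <A HREF="/parent/">Parent Directory</A> 19-Dec-2012 11:39 -
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF="20120114-1731/">20120114-1731/</A> 14-Feb-2012 17:40 -
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF="20120115-1055/">20120115-1055/</A> 15-Feb-2012 11:04 -
'''

print(extract_build_ids(sample))  # ['20120114-1731', '20120115-1055']
```

Anchoring the match on the HREF attribute keeps the "Last modified" dates and the Parent Directory link out of the result. The pattern itself is not Python-specific.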

Related

HTTP Status 404 - /beer/SelectBeer.do

Hi, I can't solve this error. I have spent over two days on the same issue.
Please help: have I made a mistake somewhere?
Web.xml:
<?xml version="1.0" encoding="ISO-8859-1"?>
<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd"
version="3.1"
metadata-complete="true">
<display-name>Welcome to Tomcat</display-name>
<description>
Welcome to Tomcat
</description>
<servlet>
<servlet-name>beer</servlet-name>
<servlet-class>com.example.web</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>beer</servlet-name>
<url-pattern>/SelectBeer.do</url-pattern>
</servlet-mapping>
</web-app>
Form.html :
<html><body>
<h1 align="center">Beer Selection Page</h1>
<form method="POST" action="SelectBeer.do">
Select beer characteristics<p>
Color:
<select name="color" size="1">
<option value="light"> light </option>
<option value="amber"> amber </option>
<option value="brown"> brown </option>
<option value="dark"> dark </option>
</select>
<br><br>
<center>
<input type="submit" value="ok"></center>
</form></body></html>
BeerSelect.java
package com.example.web;
import javax.servlet.*;
import javax.servlet.http.*;
import java.io.*;
public class BeerSelect extends HttpServlet {
public void doPost(HttpServletRequest request,
HttpServletResponse response)
throws IOException, ServletException {
response.setContentType("text/html");
PrintWriter out = response.getWriter();
out.println("Beer Selection Advice<br>");
String c = request.getParameter("color");
out.println("<br>Got beer color " + c);
}
}
The deployment is:
C:\Program Files\tomcat\webapps\beer
Inside beer are form.html and WEB-INF.
Inside WEB_INF are Web.xml and classes.
Inside classes is com/example/web/BeerSelect.java.
I have compiled BeerSelect.java. I can open localhost:8080/beer/form.html, but when I choose the color and hit submit, this error occurs:
HTTP Status 404 - /beer/SelectBeer.do
type Status report
message /beer/SelectBeer.do
description The requested resource is not available.
Apache Tomcat/8.0.22
I noticed that the servlet entry in your web.xml maps 'beer' to 'com.example.web'. That is only the package name; as far as I understand, <servlet-class> should hold the fully qualified class name, 'com.example.web.BeerSelect'.
The directory structure is also important. You need to create it inside Tomcat/webapps/.
An example structure could be:
- Beer/Form.html
- Beer/WEB-INF/web.xml
- Beer/WEB-INF/classes/com/example/web/BeerSelect.class
Note: the folder must be named WEB-INF, not WEB_INF or anything else. A wrong name could result in a 404, because Tomcat looks for the deployment descriptor (DD) inside WEB-INF.
The initial statement inside web.xml ' ...' could also cause a problem. Try copying that statement from one of the examples provided with your Tomcat version.
Hope that helps

How to structure a master page with coldfusion?

I have a small coldfusion section of our site that all uses similar js and css files and page structure. The code is currently repeated for each file, and I'd like to factor it out and set something up using a master page and templates.
Master.cfm page:
<!--- Master template, includes all necessary js and css files.
Expects following variables to be defined:
- pageName - name of the file to be loaded as the body of the page
- title - text to be used as the title of the page and as the header text in the header bar --->
<cfinclude template="_lockedPage.cfm" />
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title><cfoutput>#title#</cfoutput></title>
... all script and css links here ...
<script type="text/javascript" src="js/jquery-1.9.1.min.js"></script>
<script type="text/javascript" src="js/jquery.mobile-1.3.2.js"></script>
... etc ...
</head>
<body>
<div data-role="page">
<div class="headerDiv" data-role="header" data-theme="b" data-position="fixed">
<a id="backButton" data-role="button" data-direction="reverse" data-rel="back" data-icon="arrow-l" data-iconpos="left" data-theme="a">Back</a>
<h1><cfoutput>#title#</cfoutput></h1>
Home
</div>
<div data-role="content" class="container">
<cfinclude template="#pageName#.cfm" />
</div>
</div>
</body>
</html>
Then a page example would be something like this. CustomerSearch.cfm:
<cfscript>
title = "Customer Search";
pageName = "_customer-search";
include "Master.cfm";
</cfscript>
And then I would need a _customer-search.cfm page that would include all the body content for the page.
This means that I would need 2 files for every page that we currently have - the outer page that defines the variable and includes the master page, and the template page that has the individual page content.
Is this a good logical structure? Is there any way to improve it?
You have the right idea, but I think you'll end up with a lot of unnecessary files. You could instead create a header.cfm and a footer.cfm that contain your global HTML. Each page would include those files and the content would be written between.
<cfset title = "Customer Search">
<cfinclude template="global_header.cfm">
<!--- This will be the content of your page. --->
<cfinclude template="global_footer.cfm">
This file would be named customer_search.cfm. Anytime you update the header or footer, it's a global change.
If you have a lot of business logic and query code that needs to exist on multiple pages, you might look into using an MVC framework to help you organize and reuse code. I prefer ColdBox (try ColdBox Lite), but many people use Framework/1.
I found that custom tags were another simple solution and, in my opinion, a better one than creating a separate header.cfm and footer.cfm.
In master.cfm:
<cfif ThisTag.ExecutionMode EQ 'start'>
[HEADER]
<cfelse>
[FOOTER]
</cfif>
In each content page:
<cf_master>
[CONTENT GOES HERE]
</cf_master>
If you'd like to pass in variables to the master page simply add it as an attribute to the opening tag:
<cf_master Title="Content Title">
And make sure the attribute is specified in the master file:
<cfparam name="Attributes.Title" default=""/>
<head>
<title><cfoutput>#Attributes.Title#</cfoutput></title>
</head>
The key for me was understanding ThisTag.ExecutionMode. A custom tag can be used either as a single tag or as an opening and closing pair. With a pair, you can emit some content at the opening tag <cf_master> and other content at the closing tag </cf_master>. That is why master.cfm needs the if/else condition: it lets you output the HEADER at the opening tag and the FOOTER at the closing tag.
Also, in case this isn't obvious, when you call your custom tag, it should match the name of the file where the code is stored. In my case <cf_master> matches master.cfm.
I used this page as a tutorial for custom tags: https://www.petefreitag.com/item/64.cfm
Application.cfc can be a great fit for a common page design: you keep one template and inject each page's generated content into it. Dan Bracuk commented on the other solution about using the Application.cfc onRequestStart() and onRequestEnd() methods, but I use it slightly differently. Here is my general setup:
Application.cfc
// This is <cfscript> but it could be regular CFML too
component {
public function onRequest( required string targetPage ) {
// Capture/buffer the requested page's output
savecontent variable='LOCAL.output' {
include ARGUMENTS.targetPage;
}
// Use the output as the page content
// if the page did not specify content
param string REQUEST.content = LOCAL.output;
// Inject the design template
// which should output the page content somewhere
include '/path/to/template.cfm';
}
}
template.cfm
<!DOCTYPE html>
<cfparam name="REQUEST.title" type="string" /><!--- required --->
<cfparam name="REQUEST.head" type="string" default="" />
<cfparam name="REQUEST.content" type="string" /><!--- required --->
<html>
<head>
<title><cfoutput>#REQUEST.title#</cfoutput></title>
<link rel="stylesheet" href="path/to/common.css" />
<script src="path/to/common.js"></script>
<cfoutput>#REQUEST.head#</cfoutput>
</head>
<body>
<header>...</header>
<cfoutput>#REQUEST.content#</cfoutput>
<footer>...</footer>
</body>
</html>
each-page.cfm
<cfset REQUEST.title = "My Page Title" />
<cfsavecontent variable="REQUEST.head">
<!-- page specific head elements here -->
</cfsavecontent>
<!-- Regular page code/HTML output here -->
<!--- or you could use another <cfsavecontent> block --->
<!--- to save specific output sections --->
<p>Hello World</p>
This approach lets you keep the template in one file, which is much easier when designing it in a WYSIWYG manner. It also allows each page to set variables used in the design template, since the requested page is executed before the design template is included.
And, there is no need to <cfinclude> templates on each page since Application.cfc onRequest() will get called for ALL pages by default. If there are .cfm pages which should NOT include the design template, such as PDF output, then you'll need to add some logic to just dump the output and not include the design template.

How to modify custom Confluence page so that its viewable like default options

I am trying to add custom tabs to my Confluence setup. I have chosen the 'Advanced' region as the place to do this.
So I click on the space name, go to 'Browse >> Advanced', and see this: http://imageshack.us/f/405/advanc.png . The "Freeway Project Creation" tab in this image was added by me.
I wrote this class:
package com.atlassian.myorg;
import com.atlassian.confluence.core.ConfluenceActionSupport;
import com.atlassian.confluence.pages.AbstractPage;
import com.atlassian.confluence.pages.actions.PageAware;
import com.opensymphony.xwork.Action;
/**
* The simplest action possible
*/
public class ExampleAction extends ConfluenceActionSupport
{
@Override
public String execute() throws Exception
{
return Action.SUCCESS;
}
}
and used this atlassian-plugin.xml:
<atlassian-plugin key="${project.groupId}.${project.artifactId}" name="${project.name}" plugins-version="2">
<plugin-info>
<description>${project.description}</description>
<version>${project.version}</version>
<vendor name="${project.organization.name}" url="${project.organization.url}" />
</plugin-info>
<resource type="i18n" name="i18n" location="message" />
<web-item name="add-fpc-label-action-web-ui" key="add-fpc-label-action-web-ui" section="system.space" weight="150">
<description key="item.add-fpc-label-action-web-ui.link.desc">Allows the Create Freeway Project functionality.</description>
<label key="Freeway Project Creation"/>
<link linkId="add-fpc-label-action">/plugins/examples/hello.action?key=$helper.space.key</link>
</web-item>
<xwork name="My Example Action" key="example-action">
<description>Shows a simple "Hello, World!" Action</description>
<package name="examples" extends="default" namespace="/plugins/examples">
<default-interceptor-ref name="validatingStack" />
<action name="hello" class="com.atlassian.myorg.ExampleAction">
<result name="success" type="velocity">/templates/example/hello.vm</result>
</action>
</package>
</xwork>
</atlassian-plugin>
Below is the Velocity template (hello.vm):
<html>
<head>
<title>This is my Example action!</title>
<meta name="decorator" content="atl.general" />
</head>
<body>
<strong>Hello, Confluence World!</strong>
</body>
</html>
As a result, when I click on the "Freeway Project Creation" tab I see this page: http://imageshack.us/f/846/imageprf.png/
This is good enough as a start, but I want the page to appear in the 'body area' beside the sidebar. For example, if we click on the "Space Admin" tab and then on 'Edit Space Label' in the sidebar, the resulting page appears in the 'body area' marked here: http://imageshack.us/f/809/bodyarea.png/.
I would like your suggestions on how that can be achieved.
Thanks
A
Please try this
##requireResource("confluence.web.resources:space-admin")
<html>
<head>
<title>This is my Example action!</title>
<meta name="decorator" content="atl.general" />
</head>
<content tag="key">$action.space.key</content>
<body>
#applyDecorator("root")
#decoratorParam("helper" $action.helper)
#decoratorParam("context" "space-administration")
#decoratorParam("mode" "view-space-administration")
#applyDecorator ("root")
#decoratorParam ("context" "spaceadminpanel")
#decoratorParam ("selection" "add-fpc-label-action-web-ui")
#decoratorParam ("title" $action.getText("action.name"))
#decoratorParam ("selectedTab" "admin")
#decoratorParam("helper" $action.helper)
<strong>Hello, Confluence World!</strong>
#end
#end
</body>
</html>

Adding new view to Dexterity type causes "page not found" viewing items

I'm working through the recent Professional Plone 4 Development book, on a Plone 4.1.2 install.
I have successfully defined the content types via Dexterity and am now trying to create a custom view for one of the types. The schema & view are defined as such:
from zope import schema
from plone.directives import form
from five import grok
from ctcc.contenttypes import CTCCTypesMessageFactory as _


class ITrial(form.Schema):
    """A clinical trial."""

    title = schema.TextLine(
        title=_(u'label_title', default=u'Title'),
        required=True,
    )
    description = schema.Text(
        title=_(u'label_description', default=u'Description'),
        description=_(u'help_description', default=u'A short summary of the content'),
        required=False,
        missing_value=u'',
    )


class View(grok.View):
    grok.context(ITrial)
    grok.require('zope2.View')
    grok.name('view')
Here is the relevant section from the type's FTI:
view
False
<alias from="(Default)" to="(selected layout)"/>
<alias from="edit" to="@@edit"/>
<alias from="sharing" to="@@sharing"/>
<alias from="view" to="@@view"/>
<action title="View" action_id="view" category="object" condition_expr=""
url_expr="string:${folder_url}/" visible="True">
<permission value="View"/>
</action>
And the template itself, located in ctcc.contenttypes/trial_templates/view.pt, which should simply display the title & description:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"
xmlns:tal="http://xml.zope.org/namespaces/tal"
xmlns:metal="http://xml.zope.org/namespaces/metal"
xmlns:i18n="http://xml.zope.org/namespaces/i18n"
lang="en"
metal:use-macro="context/main_template/macros/master"
i18n:domain="ctcc.contenttypes">
<body>
<metal:content-core fill-slot="content-core">
<metal:content-core define-macro="content-core">
<div tal:replace="structure context/text/output" />
</metal:content-core>
</metal:content-core>
</body>
</html>
Accessing any instances of the type with all this in place causes a "page not found" error. Something doesn't seem to be tying up the new view to the expected path, but as this is my first week with Plone I've no idea where to begin to track this down. I'm seeing no errors running the site in foreground mode either.
Any help whatsoever would be greatly appreciated.
Did you include the dependency in setup.py?
install_requires=[
'setuptools',
'plone.app.dexterity',
...
],
Did you initialize Grok in your configure.zcml?
<configure
xmlns="http://namespaces.zope.org/zope"
...
xmlns:grok="http://namespaces.zope.org/grok">
<includeDependencies package="." />
<grok:grok package="." />
...
</configure>
Did you include Dexterity's GenericSetup profile in your metadata.xml?
<metadata>
<version>1</version>
<dependencies>
<dependency>profile-plone.app.dexterity:default</dependency>
</dependencies>
</metadata>
The problem was with this line in the template:
<div tal:replace="structure context/text/output" />
I had stripped back an example template to what I thought was the bare minimum. Thanks to David Glick's suggestion, I removed NotFound from the ignored exceptions list in error_log and saw the following:
Module Products.PageTemplates.Expressions, line 225, in evaluateText
Module zope.tales.tales, line 696, in evaluate
- URL: /opt/plone41/zeocluster/src/ctcc.contenttypes/ctcc/contenttypes/trial_templates/view.pt
- Line 13, Column 8
- Expression: <PathExpr standard:u'context/text/output'>
[...]
Module OFS.Traversable, line 299, in unrestrictedTraverse
- __traceback_info__: ([], 'text')
NotFound: text
Now that I can see what's causing the problem and have started reading more deeply into TAL, I can see why it's failing: ignorance on my part, as suspected.
Thanks, everyone!
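To illustrate why the expression failed, here is a rough sketch in Python of how a path such as text/output is walked one segment at a time, starting from the context object. It is a simplified model, not Zope's actual traversal code: the Trial class, the traverse function, and the use of LookupError are assumptions made for illustration. The schema in the question defines only title and description, so the walk stops at the first segment, text.

```python
class Trial(object):
    """Stand-in for a content item with the two fields from the ITrial schema."""

    def __init__(self, title, description):
        self.title = title
        self.description = description


def traverse(obj, path):
    """Walk a slash-separated path, trying attribute then item lookup per segment.

    Simplified model only; real TALES traversal also deals with security,
    acquisition and views.
    """
    current = obj
    for name in path.split('/'):
        if hasattr(current, name):
            current = getattr(current, name)
            continue
        try:
            current = current[name]
        except (KeyError, TypeError, IndexError):
            # Mirrors the "NotFound: text" line in the traceback above
            raise LookupError('NotFound: %s' % name)
    return current


trial = Trial(u'Trial A', u'A short summary')

print(traverse(trial, 'title'))      # a field that exists on the schema
try:
    traverse(trial, 'text/output')   # the schema has no "text" field
except LookupError as exc:
    print(exc)
```

Under this model, a path that names a field the schema actually defines, such as description, resolves without error.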

JSTL Sets and Lists - checking if item exists in a Set

I have a Java Set in my session and a variable also in the session. I need to be able to tell if that variable exists in the set.
I want to use the contains ( Object ) method that Java has for Lists and Sets to test whether that object exists in the set.
Is that possible to do in JSTL? If so, how? :)
Thanks,
Alex
You could do this using JSTL tags, but the result is not optimal:
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core"%>
<html>
<body>
<jsp:useBean id="numbers" class="java.util.HashSet" scope="request">
<%
numbers.add("one");
numbers.add("two");
numbers.add("three");
%>
</jsp:useBean>
<c:forEach items="${numbers}" var="value">
<c:if test="${value == 'two'}">
<c:set var="found" value="true" scope="request" />
</c:if>
</c:forEach>
${found}
</body>
</html>
A better way would be to use a custom function:
package my.pkg; // "package" is a reserved word in Java, so it cannot appear in a package name
import java.util.Collection;
public class Util {
// Null-safe wrapper around Collection.contains, exposed to JSPs as an EL function
public static boolean contains(Collection<?> coll, Object o) {
if (coll == null) return false;
return coll.contains(o);
}
}
This is defined in a TLD file ROOT/WEB-INF/tag/custom.tld:
<?xml version="1.0" encoding="UTF-8"?>
<taglib xmlns="http://java.sun.com/xml/ns/javaee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-jsptaglibrary_2_1.xsd"
version="2.1">
<tlib-version>1.0</tlib-version>
<short-name>myfn</short-name>
<uri>http://samplefn</uri>
<function>
<name>contains</name>
<function-class>my.pkg.Util</function-class>
<function-signature>boolean contains(java.util.Collection,
java.lang.Object)</function-signature>
</function>
</taglib>
The function can then be imported into your JSPs:
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core"%>
<%@ taglib prefix="myfn" uri="http://samplefn"%>
<html>
<body>
<jsp:useBean id="numbers" class="java.util.HashSet" scope="request">
<%
numbers.add("one");
numbers.add("two");
numbers.add("three");
%>
</jsp:useBean>
${myfn:contains(numbers, 'one')}
${myfn:contains(numbers, 'zero')}
</body>
</html>
EL 2.2 (part of Java EE 6) and later allow the more direct form:
${numbers.contains('two')}
If you are using Spring Framework, you can use Spring TagLib and SpEL:
<%@ taglib prefix="spring" uri="http://www.springframework.org/tags" %>
---
<spring:eval var="containsValue" expression="yourList.contains(yourValue)" />
Contains (true or false): ${containsValue}