Replacing content in a page with Varnish + regex

If I want my Varnish cache server to replace content inside a page coming from the backend (i.e., change the class on a div) before serving or storing it (in vcl_fetch?), how can this be done?
I would like to use a simple regex to perform the replacement, as I imagine that is supported natively in Varnish.

Modifying the response body is not natively supported by Varnish. You need a Varnish module (vmod) for this.
Aivars Kalvans has written libvmod-rewrite, which does exactly what you are looking for. However, the vmod is a proof of concept and, according to Aivars, it is not ready for production use. You can still use it as a starting point.
If you are using Apache, you can use mod_ext_filter to modify the response body. Here's an example from the mod_ext_filter documentation. Since you can pass the response body to any external command, it is very easy to make the necessary modifications to the content.
# mod_ext_filter directive to define a filter which
# replaces text in the response
#
ExtFilterDefine fixtext mode=output intype=text/html cmd="/bin/sed s/verdana/arial/g"
<Location />
# core directive to cause the fixtext filter to
# be run on output
SetOutputFilter fixtext
</Location>
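Since mod_ext_filter will run any command, the filter does not have to be sed. As a minimal sketch, assuming Python 3 is installed on the Apache host and using made-up class names (old-class and new-class), here is a filter that does the replacement from the question, changing the class on a div, with a regex. It only rewrites a class attribute whose whole value is old-class inside an opening <div> tag; anything fancier really wants an HTML parser.

```python
#!/usr/bin/env python3
# Hypothetical mod_ext_filter filter: reads the HTML response body on stdin,
# changes the class of matching <div> tags, and writes the result to stdout.
import re
import sys

# Placeholder class names; adjust them to your markup.
OLD_CLASS = "old-class"
NEW_CLASS = "new-class"

# Group 1: '<div ... class="' (class must be preceded by whitespace);
# group 2: the closing quote. Only matches when the whole value is OLD_CLASS.
DIV_CLASS_RE = re.compile(
    r'(<div\b[^>]*\sclass=")' + re.escape(OLD_CLASS) + r'(")',
    re.IGNORECASE,
)


def rewrite_div_class(html: str) -> str:
    """Swap OLD_CLASS for NEW_CLASS on <div> elements only."""
    return DIV_CLASS_RE.sub(lambda m: m.group(1) + NEW_CLASS + m.group(2), html)


if __name__ == "__main__":
    # surrogateescape round-trips any bytes that are not valid UTF-8 unchanged.
    body = sys.stdin.buffer.read().decode("utf-8", errors="surrogateescape")
    sys.stdout.buffer.write(
        rewrite_div_class(body).encode("utf-8", errors="surrogateescape")
    )
```

You would then point the directive at the script instead of sed, e.g. ExtFilterDefine fixclass mode=output intype=text/html cmd="/usr/bin/python3 /path/to/fixclass.py" (the filter name and script path are placeholders).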

Related

Replacing part of each ${url} from a sitemap in JMeter

I have a JMeter test plan that goes to a site's sitemap.xml page, extracts each URL on that page with an XPath Extractor, and then passes ${url} to an HTTP Request sampler inside a ForEach Controller, which writes the results for each page to a file. This works great, except I just realized that the links on this sitemap.xml page are hard-coded. That is a problem when I want to test https://staging-website.com, because all of the links in sitemap.xml point to www.website.com pages. It seems like there must be a way to replace 'www.website.com' in each ${url} with 'staging-website.com' using a regex or something similar, but I haven't been able to figure out how. Any suggestions would be greatly appreciated.
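Add a BeanShell PreProcessor to manipulate the URL:
// Read the URL extracted from sitemap.xml
String sUrl = vars.get("url");
// Swap only the host name; the scheme (https://) is already part of sUrl
String sNewUrl = sUrl.replace("www.website.com", "staging-website.com");
log.info("sNewUrl:" + sNewUrl);
// Overwrite the variable so the HTTP Request sampler uses the staging URL
vars.put("url", sNewUrl);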
You can also extract from sitemap.xml with a Regular Expression Extractor whose pattern matches past www.website.com, so that you capture only the path portion of each URL instead of the full host name. Shouldn't you be doing that already, given that the HTTP Request sampler takes the path separately from the host name?
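A rough illustration of the extractor idea, in Python: the capture group starts right after the hard-coded host, so only the path (plus any query string) is kept. The function name extract_paths is hypothetical. The same pattern, with $1$ as the template, should carry over to a Regular Expression Extractor, but JMeter's regex engine may differ from Python's in edge cases, so test it there.

```python
import re

# Capture group 1 starts right after the hard-coded host, so only the
# path (plus any query string) is kept; it stops at whitespace or '<'.
PATH_RE = re.compile(r"https?://www\.website\.com(/[^\s<]*)")


def extract_paths(sitemap_xml: str) -> list[str]:
    """Return the path of every www.website.com URL in the sitemap text."""
    return PATH_RE.findall(sitemap_xml)
```

The sampler's server name field would then point at staging-website.com, with the extracted path in the path field.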
You can use the __strReplace() function, available via the JMeter Plugins project, like this:
${__strReplace(${url},www.website.com,staging-website.com,)}
The easiest way to install JMeter Custom Functions (as well as any other plugins) is with the JMeter Plugins Manager.
I was able to replace the host within the string by putting
${__javaScript('${url}'.replace('www.website'\,'staging-website'))}
in the path field of the second HTTP Request sampler. The answers provided by Selva and Dimitri are more elegant, so if I have time to come back to this in the future, I will give them another try. I really appreciate the help!

Syntax highlighting in nginx for every .cpp file without manual intervention

Basically, I have a web server whose nginx config serves every .cpp file as plain text, but I would like to add syntax highlighting for readability.
Any idea how I could proceed?
I want to use Google's syntax highlighter, so any idea about how to insert an HTML fragment before and after every .cpp file would suffice.
Long ago I tried using header and footer directives in the nginx config, with no luck whatsoever.
Thanks in advance!
cheers!
As was already pointed out, Nginx is not quite suitable for generating HTML documents by itself. Usually this is a job for a server-side processing language like PHP or Perl. However, there are several ways of solving the problem solely with Nginx.
The first obvious choice would be to use a server-side processing language from within Nginx. There are at least three optional modules for three different languages (Perl, Lua and a dialect of JavaScript) that could be used for that.
The problem with this approach is that these modules are rarely available by default, and in many cases you will have to build Nginx manually to enable any of them. Sometimes it can be painful, because as soon as you get your own custom build of Nginx, you will have to support and upgrade it yourself.
There is, however, another option, which involves SSI. It might not be the prettiest solution, but it will work. And unlike the above-mentioned modules, SSI support comes with almost every distribution of Nginx. My bet is that your Nginx can do SSI out of the box, without having to compile anything.
So, the configuration goes like this:
# Define a special location for your .cpp (and .h) files
location ~* \.(cpp|h)$ {
    # If the GET parameter 'raw' is set to 'yes', serve the file as-is
    if ($arg_raw = 'yes') {
        break;
    }
    # Otherwise, hand all requests for *.cpp and *.h files to the named location @js
    try_files @js @js;
}
location @js {
    ssi on;                   # Enable SSI in this location
    default_type text/html;   # Tell the browser that what is returned is HTML
    # Generate a suitable HTML document with an SSI insertion
    return 200 '<!DOCTYPE html>
<link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.9.0/styles/default.min.css">
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.9.0/highlight.min.js"></script>
<script>hljs.initHighlightingOnLoad();</script>
<pre><code class="cpp"><!--# include virtual="$uri?raw=yes" --></code></pre>';
}
Now here is what happens when you request a *.cpp file in your browser:
1. The request goes to the first location, because the URI ends with cpp.
2. It is then passed to the second location, @js, because there is no GET parameter raw in your request.
3. In the second location, the SSI template is generated with return and then immediately processed by the SSI engine because of ssi on.
4. The include virtual="$uri?raw=yes" tells the SSI engine to make another request (a subrequest) from within Nginx to the originally requested file (the internal variable $uri stores the original URI, that is, the web path to your cpp file). The only difference between the request from your browser and the subrequest made by Nginx is ?raw=yes.
5. The subrequest is again handled by the first location, but it never reaches the second one, because of the raw GET parameter. In this case the raw contents of the cpp file are returned as the response to the subrequest.
6. The SSI engine combines this response with the rest of the template and returns the result to the browser. Additionally, default_type sets the Content-Type so that the browser renders the result as an HTML document.
I used the highlight.js library for this example. You can replace it with whatever you prefer simply by modifying the SSI template.
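One thing worth checking with the SSI approach: the include inserts the file verbatim, so a line like #include <vector> reaches the browser as markup, and the browser may swallow <vector> as an unknown tag. If you ever generate these pages outside Nginx instead (say, with a small pre-rendering script), the file contents should be HTML-escaped first. Here is a minimal Python sketch of the page the template above produces, with escaping added; render_cpp_page is a hypothetical name, and the highlight.js 9.9.0 URLs are copied from the config.

```python
import html

# Same document as the nginx 'return 200' template, with a {body} slot
# where the SSI include would insert the file.
TEMPLATE = """<!DOCTYPE html>
<link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.9.0/styles/default.min.css">
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.9.0/highlight.min.js"></script>
<script>hljs.initHighlightingOnLoad();</script>
<pre><code class="cpp">{body}</code></pre>"""


def render_cpp_page(source: str) -> str:
    """Wrap raw C++ source in the highlight.js page, HTML-escaping it first."""
    # quote=False: only &, < and > need escaping inside element content.
    return TEMPLATE.format(body=html.escape(source, quote=False))
```

highlight.js reads the text content of the code element, so the escaped entities are displayed as the original < and > characters.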

Alternative to <!--#include virtual="somefilename"-->

I have a website running on an old Apache server with SSI enabled. My host wants to move it to a new server that has SSI disabled for security reasons.
I have a whole lot of pages with Google-friendly URLs, each of which contains just one line:
<!--#include virtual="Url_Including_Search_String"-->
What is the best alternative to SSI that keeps my Google-friendly search URLs returning the specified search results?
I can achieve most of this with rewrite rules in the .htaccess file; however, some search keywords contain a space while the URL doesn't, and I can't handle that with a rewrite rule.
For example, www.somedomain.com.au/SYDNEY.htm contains
<!--#include virtual="/search.php?keyword=SYDNEY&Submit=SEARCH"-->
However, the issue is that
www.somedomain.com.au/POTTSPOINT.htm contains
<!--#include virtual="/search.php?keyword=POTTS+POINT&Submit=SEARCH"-->
A rewrite rule cannot detect where the space belongs in a suburb name, so I'm hoping there is an alternative to <!--#include virtual=.
I have looked at RewriteMap, but I don't think I have access to the configuration file I would need to put it in.
I would use mod_rewrite to redirect any requests for non-existent files to your search page.
For example:
http://example.com/SYDNEY redirects to
http://example.com/search.php?q=SYDNEY
(assuming there is not actually a /SYDNEY/ file at your server root.)
Then get rid of all those individual redirect pages.
As for the spaces, I'd modify the search page itself to recognize (for example) "POTTSPOINT" and work out that a space should be inserted. Basically, compare the search term against a database of substitutions.
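A minimal sketch of that substitution lookup, written in Python purely for illustration (the search page is PHP, so treat this as a description of the logic). SUBURB_NAMES is a hypothetical stand-in for the database of substitutions, and search_url_for is a made-up helper that turns an old page name into the same search URL the SSI include used:

```python
from urllib.parse import urlencode

# Hypothetical substitution table: URL keyword -> real suburb name.
# In practice this would live in a database, as suggested above.
SUBURB_NAMES = {
    "POTTSPOINT": "POTTS POINT",
}


def search_url_for(page: str) -> str:
    """Map a legacy page such as '/POTTSPOINT.htm' to its search URL."""
    keyword = page.strip("/").removesuffix(".htm").upper()
    keyword = SUBURB_NAMES.get(keyword, keyword)  # unknown keywords pass through
    # urlencode encodes the space as '+', matching the original include.
    return "/search.php?" + urlencode({"keyword": keyword, "Submit": "SEARCH"})
```

search_url_for("/SYDNEY.htm") gives "/search.php?keyword=SYDNEY&Submit=SEARCH", and search_url_for("/POTTSPOINT.htm") gives "/search.php?keyword=POTTS+POINT&Submit=SEARCH", the same URLs as in the SSI includes from the question.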

Is there a "clean URL" (mod_rewrite) equivalent for iPlanet?

I'm working with ColdFusion (because I have to) and we use iPlanet 7 (because we have to), and I would like to use clean URLs instead of query-parameter junk (for numerous reasons). My problem is that I don't have access to the global obj.conf file, and I was wondering whether there are .htaccess equivalents I could apply on the fly per directory. Currently I am using Application.cfc to force the server to look at index.cfm in the root before loading the requested page, but this requires that a .cfm file is requested, so it just returns a 404 if the user requests /path/to/file without an extension. Ultimately, I would like to let the user request domain.com/path/to/file but serve domain.com/index.cfm?q1=path&q2=to&q3=file. Any ideas?
You can use mod_dir with the DirectoryIndex directive to set which page is served for /directory/ requests.
http://httpd.apache.org/docs/2.2/mod/mod_dir.html
I'm not sure what exists for iPlanet; I haven't had to work with it before. But it would be possible to use a URL like index.cfm/path/to/file and pull the extra path information from the cgi.path_info variable. Not exactly what you're looking for, but cleaner than query parameters.
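The splitting step is straightforward. Here is a Python sketch of turning the extra path information into the q1, q2, q3 parameters from the question; path_info_to_params is a hypothetical name, and in CFML, listToArray(cgi.path_info, "/") should give you the segments, since it skips empty list elements by default.

```python
def path_info_to_params(path_info: str) -> dict[str, str]:
    """Map '/path/to/file' to {'q1': 'path', 'q2': 'to', 'q3': 'file'}."""
    # Split on '/' and drop the empty pieces from leading/trailing/double slashes.
    segments = [s for s in path_info.split("/") if s]
    # Number the segments q1, q2, ... as in the URL scheme from the question.
    return {f"q{i}": seg for i, seg in enumerate(segments, start=1)}
```

For index.cfm/path/to/file, path_info would be /path/to/file, which maps to the same values as index.cfm?q1=path&q2=to&q3=file.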

With ColdFusion, how do you handle dynamically generated URLs?

(Update: I converted this question to a community wiki as the answer appears more subjective than I thought it would. There are multiple answers depending on one's needs.)
If I have a folder that only includes application.cfc and index.cfm, what is a fast, reliable way to handle dynamically generated URLs, i.e. URLs that do not have a corresponding physical .cfm file?
This example URL generates a 404, but it should look up a page in a database and return it via index.cfm:
http://www.myserver.com/cfdemo/mynewpage.cfm
Should I use onMissingTemplate() in the application.cfc to handle the missing file? Since this method doesn't process onRequestStart(), onRequest() and onRequestEnd(), I wonder if it should be avoided.
Alternatively, I could set up an ISAPIRewrite rule, since I'm using IIS (or mod_rewrite on Apache):
# IF the request is not /index.cfm, doesn't exist and ends in cfm or html,
# rewrite it. Pass the requested filename $1.$2 as the 1st param: cgi.page
# append the remaining url params $4 ($3 is the ?)
RewriteCond %{SCRIPT_NAME} ^(?!/index.cfm)(.*)$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^\/(.*)\.(cfm|html)(\??)(.*)$ /index.cfm?page=$1.$2&$4 [I,L]
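To sanity-check which requests those rules would catch, here is the same logic re-expressed in Python. It is only an approximation: existing_paths is a hypothetical stand-in for the !-f and !-d file-system checks, route_request is a made-up name, and Python's re module may not behave exactly like the rewrite engine's regex flavor in edge cases.

```python
import re

# Same pattern and [I] (ignore case) flag as the RewriteRule; groups 1-4 are $1-$4.
RULE = re.compile(r"^/(.*)\.(cfm|html)(\??)(.*)$", re.IGNORECASE)


def route_request(request_uri: str, existing_paths: set[str]) -> str:
    """Return the URI the rewrite rules above would hand to ColdFusion.

    existing_paths is a hypothetical stand-in for the !-f / !-d checks.
    """
    path = request_uri.split("?", 1)[0]
    # RewriteCond lines: leave /index.cfm and real files/directories alone.
    if path.startswith("/index.cfm") or path in existing_paths:
        return request_uri
    match = RULE.match(request_uri)
    if match is None:
        return request_uri  # not a .cfm/.html request
    page, ext, _question_mark, rest = match.groups()
    return f"/index.cfm?page={page}.{ext}&{rest}"
```

Note that a request without a query string still ends with a trailing &, e.g. /index.cfm?page=cfdemo/mynewpage.cfm&, which is harmless but worth knowing about when you parse the page parameter.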
Are these methods appropriate, or am I missing a better way of accomplishing this goal? It seems that ColdFusion should have this type of feature built into Application.cfc. Maybe I'm just missing it.
There's nothing wrong with URL rewriting at the web-server level. I'd vote for that.
Because CF handles only .cfm/.cfc requests by default, you can do something like this at the beginning of Application.cfc:
<cfif Right(cgi.SCRIPT_NAME, 9) NEQ "index.cfm">
<!--- analyze the SCRIPT_NAME and start processing --->
</cfif>
For other file types, web-server configuration is the only way I can see. But instead of creating rewrite rules, you can try using custom 404 handlers. At least with IIS, you'll be able to get the context in cgi.QUERY_STRING if you set up a dummy page, say 404.cfm (it does not need to exist), and put the following check before the previous example:
<!--- trap 404 requests triggered by IIS --->
<cfif right(cgi.SCRIPT_NAME, 7) EQ "404.cfm">
<cflog file="mylogfile" text="404 error triggered by IIS. Context: #cgi.QUERY_STRING#">
</cfif>
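Once the cflog line shows you what IIS actually passes, you still need to pull the requested path back out of cgi.QUERY_STRING. With an IIS custom error page of the URL type, this typically looks something like 404;http://www.myserver.com:80/cfdemo/mynewpage.cfm, but treat that format as an assumption and check your own log first. A Python sketch of the parsing (original_path_from_404 is a hypothetical name):

```python
from typing import Optional
from urllib.parse import urlsplit


def original_path_from_404(query_string: str) -> Optional[str]:
    """Pull the originally requested path out of an IIS 404 query string.

    Assumes the '404;http://host:port/path' format; returns None otherwise.
    """
    status, sep, url = query_string.partition(";")
    if status != "404" or not sep:
        return None  # not a 404 hand-off from IIS
    return urlsplit(url).path
```

The returned path (here /cfdemo/mynewpage.cfm) is what you would then look up in the database.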
For Apache, it is possible to use the following handler, but I'm not sure whether you can extract the context in this case:
ErrorDocument 404 /404.cfm
If you are doing this for SES URLs, I'd offer two pieces of advice.
The first is that they matter less and less as time goes on. Google, for example, recognizes that URLs need to include query data.
Second: CF can natively handle SES URLs in the form hostname/file.cfm/param1/param2. Ray Camden's BlogCFC, for example, works that way. It is on by default in CF8, but needs to be enabled in CF7. I don't have a lot of information handy on this, but it should be easy to Google (or Bing, or whatever).
If you can allow it, I'd try to convert URLs like:
http://www.myserver.com/cfdemo/mynewpage.cfm
to:
http://www.myserver.com/cfdemo/mynewpage OR
http://www.myserver.com/index.cfm/cfdemo/mynewpage
so that you don't lose the onRequest methods. The first one can only be done at the web-server level, i.e. in Apache or IIS. The second one can be done in ColdFusion alone. See this: http://www.cfcdeveloper.com/index.cfm/2007/4/7/Coldfusion-SES-URL.
Otherwise, if you must have the .cfm at the end, you can use a URL rewrite package in Apache or IIS to strip it out and then forward the request to a cfm page or do what you're doing with onMissingTemplate. I'd try to opt for a solution that doesn't involve losing the onRequest methods, but up to you.
I'd definitely go for URL rewriting. Not only is it a more predictable yet generalized approach, it also removes a significant amount of string-parsing load from the CF server. Further, it results in CF handling a request to a real file, giving you the benefit of onApplicationStart, onRequestStart, and other events.
As an aside, I've personally always found URLs like /index.cfm/foo/bar/ to look unprofessional and hackish. Additionally, URLs (like /foo/bar) that don't end in either a file extension or a trailing slash are technically incorrect (per old-school static-site conventions, at the very least) and should probably be avoided as well. I'd also be curious where Ben Doom gets his assertion that "The first is that they matter less and less as time goes on. Google, for example, recognizes that URLs need to include query data." In my experience, I've actually found the exact opposite to be true.