Simple syntax question - regex

First off, sorry for my noob-ness. Believe me when i say ive been rtfm'ing. Im not lazy, im just dumb (apparently). On the bright side, this could earn someone some easy points here.
I'm trying to do a match/replace with a pattern that contains special characters, and running into syntax errors in a Flex 3 app. I just want the following regex to compile... (while also replacing html tags with "")
value.replace(/</?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)/?>/g, "");
On a side note, the pattern /<.*?>/g wouldn't work in cases where there are html entities between tags,
like so:
<TEXTFORMAT LEADING="2">
<P ALIGN="LEFT">
<FONT FACE="Arial" SIZE="11" COLOR="#4F4A4A" LETTERSPACING="0" KERNING="0"><one</FONT>
</P>
</TEXTFORMAT><TEXTFORMAT LEADING="2">
<P ALIGN="LEFT">
<FONT FACE="Arial" SIZE="11" COLOR="#4F4A4A" LETTERSPACING="0" KERNING="0">two</FONT>
</P>
</TEXTFORMAT>
The first regex would get both "<one" and "two", but the second would only get "hi"
Thanks!
Stabby L

Here is what you are looking for:
// strips htmltags
// #param html - string to parse
// #param tags - tags to ignore
public static function stripHtmlTags(html:String, tags:String = ""):String
{
var tagsToBeKept:Array = new Array();
if (tags.length > 0)
tagsToBeKept = tags.split(new RegExp("\\s*,\\s*"));
var tagsToKeep:Array = new Array();
for (var i:int = 0; i < tagsToBeKept.length; i++)
{
if (tagsToBeKept[i] != null && tagsToBeKept[i] != "")
tagsToKeep.push(tagsToBeKept[i]);
}
var toBeRemoved:Array = new Array();
var tagRegExp:RegExp = new RegExp("<([^>\\s]+)(\\s[^>]+)*>", "g");
var foundedStrings:Array = html.match(tagRegExp);
for (i = 0; i < foundedStrings.length; i++)
{
var tagFlag:Boolean = false;
if (tagsToKeep != null)
{
for (var j:int = 0; j < tagsToKeep.length; j++)
{
var tmpRegExp:RegExp = new RegExp("<\/?" + tagsToKeep[j] + " ?/?>", "i");
var tmpStr:String = foundedStrings[i] as String;
if (tmpStr.search(tmpRegExp) != -1)
tagFlag = true;
}
}
if (!tagFlag)
toBeRemoved.push(foundedStrings[i]);
}
for (i = 0; i < toBeRemoved.length; i++)
{
var tmpRE:RegExp = new RegExp("([\+\*\$\/])","g");
var tmpRemRE:RegExp = new RegExp((toBeRemoved[i] as String).replace(tmpRE, "\\$1"),"g");
html = html.replace(tmpRemRE, "");
}
return html;
}
see http://fightskillz.com/2010/01/flexactionscript-3-0-strip-html-tags-function/ for more information.

Related

Google Script, how can use variables with regex search?

Very inexperienced coder here, I have recently gotten a script working that uses regex to search for two different words occurring within a certain word limit. So I can search for "the" and "account" occurring within 10 words of each other, then my script prints the sentence it occurs in. However, my work requires me to search for lots of different work combinations and it has become a pain having to enter each word manually into the string /\W*(the)\W*\s+(\w+\s+){0,10}(account)|(account)\s+(\w+\s+){0,10}(the)/i; for example.
I would like to have something in the script where I can enter the words I want to search for just once, and they will be used in the string above. I have tried, what I think is, declaring variables like this:
var word1 = the
var word2 = account
/\W*(word1)\W*\s+(\w+\s+){0,10}(word2)|(word2)\s+(\w+\s+){0,10}(word1)/i;
But, again, very experienced coder so I'm a little out of my depth. Would really like something like the script snippet above to work in my full script listed below.
Here is my full working script without my attempt at declaring variables mentioned above:
var ss = SpreadsheetApp.getActiveSpreadsheet();
var historySheet = ss.getSheetByName('master');
var resultsSheet = ss.getSheetByName('results');
var totalRowsWithData = historySheet.getDataRange().getNumRows();
var data = historySheet.getRange(1, 1, totalRowsWithData, 3).getValues();
var regexp = /\W*(the)\W*\s+(\w+\s+){0,10}(account)|(account)\s+(\w+\s+){0,10}(the)/i;
var result = [];
for (var i = 0; i < data.length; i += 1) {
var row = data[i];
var column = row[0];
if (regexp.exec(column) !== null) {
result.push(row); }}
if (result.length > 0) {
var resultsSheetDataRows = resultsSheet.getDataRange().getNumRows();
resultsSheetDataRows = resultsSheetDataRows === 1 ? resultsSheetDataRows : resultsSheetDataRows + 1;
var resultsSheetRange = resultsSheet.getRange(resultsSheetDataRows, 1, result.length, 3);
resultsSheetRange.setValues(result);}}
I tried this solution but not sure I have done it correctly as it only enters results in logs and not printing in the "results" sheet:
var ss = SpreadsheetApp.getActiveSpreadsheet();
var historySheet = ss.getSheetByName('Sheet1');
var resultsSheet = ss.getSheetByName('Results1');
var totalRowsWithData = historySheet.getDataRange().getNumRows();
var data = historySheet.getRange(1, 1, totalRowsWithData, 3).getValues();
const regexpTemplate = '\W*(word1)\W*\s+(\w+\s+){0,10}(word2)|(word2)\s+(\w+\s+){0,10}(word1)';
var word1 = 'test1';
var word2 = 'test2';
var regexpString = regexpTemplate.replace(/word1/g, word1).replace(/word2/g, word2);
var regexp = new RegExp(regexpString, 'i');
Logger.log(regexp); // /W*(the)W*s+(w+s+){0,10}(account)|(account)s+(w+s+){0,10}(the)/i
var result = [];
for (var i = 0; i < data.length; i += 1) {
var row = data[i];
var column = row[0];
if (regexp.exec(column) !== null) {
result.push(row); }}
if (result.length > 0) {
var resultsSheetDataRows = resultsSheet.getDataRange().getNumRows();
resultsSheetDataRows = resultsSheetDataRows === 1 ? resultsSheetDataRows : resultsSheetDataRows + 1;
var resultsSheetRange = resultsSheet.getRange(resultsSheetDataRows, 1, result.length, 3);
resultsSheetRange.setValues(result);}}
Use the RegExp contructor.
const regexpTemplate = '\\W*(word1)\\W*\\s+(\\w+\\s+){0,10}(word2)|(word2)\\s+(\\w+\\s+){0,10}(word1)';
var word1 = 'the';
var word2 = 'account';
var regexpString = regexpTemplate.replace(/word1/g, word1).replace(/word2/g, word2);
var regexp = new RegExp(regexpString, 'i');
Logger.log(regexp); // /\W*(the)\W*\s+(\w+\s+){0,10}(account)|(account)\s+(\w+\s+){0,10}(the)/i
You can put this into a function to easily generate your new regular expression whenever you want to update the words.
/**
* Generate the regular expression with the provided words.
* #param {String} word1
* #param {String} word2
* #returns {RegExp}
*/
function generateRegExp(word1, word2) {
const regexpTemplate = '\\W*(word1)\\W*\\s+(\\w+\\s+){0,10}(word2)|(word2)\\s+(\\w+\\s+){0,10}(word1)';
var regexpString = regexpTemplate.replace(/word1/g, word1).replace(/word2/g, word2);
return new RegExp(regexpString, 'i');
}
/**
* Test the generateRegExp() function.
*/
function test_generateRegExp() {
var word1 = 'the';
var word2 = 'account';
var regexp = generateRegExp(word1, word2); // /\W*(the)\W*\s+(\w+\s+){0,10}(account)|(account)\s+(\w+\s+){0,10}(the)/i
// Use regexp just as you do in your script
// i.e. if (regexp.exec(column) !== null) { result.push(row); }
}
Your final script could look something like this.
function printSentences() {
var ss = SpreadsheetApp.getActiveSpreadsheet();
var historySheet = ss.getSheetByName('Sheet1');
var resultsSheet = ss.getSheetByName('Results1');
var totalRowsWithData = historySheet.getDataRange().getNumRows();
var data = historySheet.getRange(1, 1, totalRowsWithData, 3).getValues();
var result = [];
var regexp = generateRegExp("the", "account");
for (var i = 0; i < data.length; i += 1) {
var row = data[i];
var column = row[0];
if (regexp.exec(column) !== null) {
result.push(row);
}
}
if (result.length > 0) {
var resultsSheetDataRows = resultsSheet.getDataRange().getNumRows();
resultsSheetDataRows = resultsSheetDataRows === 1 ? resultsSheetDataRows : resultsSheetDataRows + 1;
var resultsSheetRange = resultsSheet.getRange(resultsSheetDataRows, 1, result.length, 3);
resultsSheetRange.setValues(result);
}
}
/**
* Generate the regular expression with the provided words.
* #param {String} word1
* #param {String} word2
* #returns {RegExp}
*/
function generateRegExp(word1, word2) {
const regexpTemplate = '\\W*(word1)\\W*\\s+(\\w+\\s+){0,10}(word2)|(word2)\\s+(\\w+\\s+){0,10}(word1)';
var regexpString = regexpTemplate.replace(/word1/g, word1).replace(/word2/g, word2);
return new RegExp(regexpString, 'i');
}

Finding text between tag <a> (regex with variable)

function Selection () {
if (typeof window.getSelection != "undefined") {
var sel = window.getSelection();
if (sel.rangeCount) {
var container = document.createElement("div");
for (var i = 0, len = sel.rangeCount; i < len; ++i) {
container.appendChild(sel.getRangeAt(i).cloneContents());
}
html = container.innerHTML;
}
} else if (typeof document.selection != "undefined") {
if (document.selection.type == "Text") {
html = document.selection.createRange().htmlText;
}
}
var result = html.indexOf("href");
if (result != -1) {
window.getSelection().empty();
window.getSelection().removeAllRanges();
} else {
var textofclassText = $('.text').html();
//var patt1 = /id=\"load([\s\S]*?)+[selectedTexthtml]/g;
var result_2;
//var result_2 = textofclassText.search(patt1);
if(result_2 != "null") {
window.getSelection().empty();
window.getSelection().removeAllRanges();
}
}
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<html>
<body>
<div class="text" onmouseup="Selection()">
You
<br><a class="good" data-id="3" href="#load" id="load3">are very </a> important for us.
<br>So we're offering you
<br><a class="good" data-id="4" href="#load" id="load4"> win-win</a> option.
<br>You will get 2 refrigerators if you buy one
</div>
</body>
</html>
I need to remove selection if the selected text contains "a" tag. If I start to select link-text from left I will get only text "are very", so I need to find first match in all class ".text". I can't get how to use variable in that regex, because when I try to build string from variables it doesn't work.
var patt1 = new RegExp("id=\"load([\s\S]*?)" + html, "g");
var result = textofclassText.search(patt1);

How do I outdent by a specific amount of tabs?

I'm trying to create a function that I can use to outdent (versus indent) a specific amount.
Here is what I have so far. This removes all tabs at the beginning of the lines. I think I need to create a dynamic pattern or use a function but I'm stuck:
var outdentPattern:RegExp = /([\t ]*)(.+)$/gm;
function outdent(input:String, outdentAmount:String = "\t"):String {
var outdentedText:String = input.replace(outdentPattern, outdentAmount + "$2");
return outdentedText;
}
Here is test data:
<s:BorderContainer>
<html:htmlOverride><![CDATA[
<script>
var test:Boolean = true;
test = "string";
</script>]]>
</html:htmlOverride>
</s:BorderContainer>
The test would be remove one tab, remove two tabs, etc.
Expected results at one tab would be:
<s:BorderContainer>
<html:htmlOverride><![CDATA[
<script>
var test:Boolean = true;
test = "string";
</script>]]>
</html:htmlOverride>
</s:BorderContainer>
And two tabs:
<s:BorderContainer>
<html:htmlOverride><![CDATA[
<script>
var test:Boolean = true;
test = "string";
</script>]]>
</html:htmlOverride>
</s:BorderContainer>
And three tabs with the inner tabs (whitespace) collapsing down:
<s:BorderContainer x="110" height="160" width="240" y="52">
<html:htmlOverride><![CDATA[
<script>
var test:Boolean = true;
</script>
]]></html:htmlOverride>
</s:BorderContainer>
Interesting note:
The editor on SO is outdenting when you click code button when the code is already indented.
You could either construct a RegExp object from a template, or you could use the regular expression several times:
var temp:String = '^[\t ]{0,';
function outdent(input:String, amount:Number = 1):String {
return input.replace(new RegExp(temp + amount.toString() + '}', 'gm'), '');
}
Or:
var pattern:RegExp = /^[\t ]/gm;
function outdent(input:String, amount:Number = 1):String {
for (var i:Number = 0; i < amount; i++)
input = input.replace(pattern, '');
return input;
}

Extract info from email body with Google Scripts

I am trying to extract specific info from email in one of my labels in Gmail. I've hacked (my scripting knowledge is very limited) the following together based on a script from https://gist.github.com/Ferrari/9678772. I am getting an error though: "Cannot convert Array to Gmail Thread - Line 5"
Any help will be greatly appreciated.
/* Based on https://gist.github.com/Ferrari/9678772 */
function parseEmailMessages(start) {
/* var threads = GmailApp.getInboxThreads(start, 100); */
var threads = GmailApp.getMessagesForThread(GmailApp.search("label:labelname"));
var sheet = SpreadsheetApp.getActiveSheet();
var tmp, result = [];
for (var i = 0; i < threads.length; i++) {
// Get the first email message of a threads
var message = threads[i].getMessages()[0];
// Get the plain text body of the email message
// You may also use getRawContent() for parsing HTML
var content = messages[0].getPlainBody();
// Implement Parsing rules using regular expressions
if (content) {
tmp = content.match(/Name and Surname:\n([A-Za-z0-9\s]+)(\r?\n)/);
var username = (tmp && tmp[1]) ? tmp[1].trim() : 'No username';
tmp = content.match(/Phone Number:\n([\s\S]+)/);
var phone = (tmp && tmp[1]) ? tmp[1] : 'No phone';
tmp = content.match(/Email Address:\n([A-Za-z0-9#.]+)/);
var email = (tmp && tmp[1]) ? tmp[1].trim() : 'No email';
tmp = content.match(/Prefered contact office:\n([\s\S]+)/);
var comment = (tmp && tmp[1]) ? tmp[1] : 'No office';
sheet.appendRow([username, phone, email, comment]);
}
}
};
Thanks folks.. This did the trick:
// Adapted from https://gist.github.com/Ferrari/9678772
function processInboxToSheet() {
// Have to get data separate to avoid google app script limit!
var start = 0;
var label = GmailApp.getUserLabelByName("yourLabelName");
var threads = label.getThreads();
var sheet = SpreadsheetApp.getActiveSheet();
var result = [];
for (var i = 0; i < threads.length; i++) {
var messages = threads[i].getMessages();
var content = messages[0].getPlainBody();
// implement your own parsing rule inside
if (content) {
var tmp;
tmp = content.match(/Name and Surname:\n([A-Za-z0-9\s]+)(\r?\n)/);
var username = (tmp && tmp[1]) ? tmp[1].trim() : 'No username';
tmp = content.match(/Phone Number:\n([\s\S]+)/);
var phone = (tmp && tmp[1]) ? tmp[1] : 'No phone';
tmp = content.match(/Email Address:\n([A-Za-z0-9#.]+)/);
var email = (tmp && tmp[1]) ? tmp[1].trim() : 'No email';
tmp = content.match(/Prefered contact office:\n([\s\S]+)/);
var comment = (tmp && tmp[1]) ? tmp[1] : 'No office';
sheet.appendRow([username, phone, email, comment]);
Utilities.sleep(500);
}
}
};
var threads = GmailApp.getMessagesForThread(GmailApp.search("label:labelname"));
should include an array index since GmailApp.search returns an array, even if only one item is found.
var threads = GmailApp.getMessagesForThread(GmailApp.search("label:labelname")[0]);
would work but is wordy.
var thread_list = GmailApp.search("label:labelname");
var threads = GmailApp.getMessagesForThread(thread_list[0]);
IMO, the above is clearer in meaning.

Get URL parameter - sharepoint

I am trying to extract an ID from my URL and paste it in the sharepoint List into the ID text box, I have tried it like that:
<script type=”text/javascript”>
function GetQueryStringParams(sParam)
{
var sPageURL = window.location.search.substring(1);
var sURLVariables = sPageURL.split('&');
for (var i = 0; i < sURLVariables.length; i++)
{
var sParameterName = sURLVariables[i].split('=');
if (sParameterName[0] == sParam)
{
return sParameterName[1];
}
}
}​
document.forms['aspnetForm'].ctl00_m_g_d9f5e4f9_7e05_4a4d_88b7_a6b1dcdda667_ctl00_ctl01_ctl00_ctl00_ctl00_ctl04_ctl00_ctl00_TextField.value=GetQueryStringParams(‘ID’);
</script>
however it does not work, my URL:
http://Lists/Systems%20Survey/NewForm.aspx?Source=http://Lists/Systems%2520Survey/overview.aspx&ID=14-01234
I have put it as a web part xml on my web.
Do you have any idea what I did wrong? :)
thanks for assistance
Use the function below as getQueryVariable('source')
function getQueryVariable(variable) {
var query = window.location.search.substring(1);
var vars = query.split('&');
for (var i = 0; i < vars.length; i++) {
var pair = vars[i].split('=');
if (decodeURIComponent(pair[0]) == variable) {
return decodeURIComponent(pair[1]);
}
}
console.log('Query variable %s not found', variable);
}