splitting function attributes - regex

Hi, would it be possible to correctly split function attributes using a regex?
I want an expression that splits all attributes in a comma-separated list. But the attributes themselves could be an Array, an Object, or something else that can also contain commas.
Example:
'string1','string2',sum(1,5,8), ["item1","item2","item3"], "string3" , {a:"text",b:"text2"}
This should be split up as:
'string1'
'string2'
sum(1,5,8)
["item1","item2","item3"]
"string3"
{a:"text",b:"text2"}
So the expression should split on all commas, except commas that are enclosed in (), {} or [].
I am trying this in AS3, by the way.
Here is some code that splits on all the commas (which is of course not what I want):
var attr:String = "'string1','string2',sum(1,5,8), ['item1','item2','item3'], 'string3' , {a:'text',b:'text2'}";
var result:Array = attr.match(/([^,]+),/g);
trace(attr);
for(var a:int=0;a<result.length;a++){
trace(a,result[a]);
}
Here is an expression that handles commas nested in round brackets, but not in the others:
/([^,]+\([^\)]+\)|[^,]+),*/g

I've created a small example of how to tackle a problem like this. It has only been tested on your input, so it might contain horrible mistakes. It only takes the parentheses into account, not the square brackets or curly braces, but those can easily be added.
The basic idea is that you iterate over the characters in the input and add them to the current token if they are not a separator character, and push the current token into the result array when you encounter a separator. You have to add a stack that keeps track of how deeply you are nested, to determine whether a comma is a separator or part of a token.
For any issue more complicated than this you'll probably be better off using a 'real' parser (and probably a parser generator), but in this case I think you'll be OK with some custom code.
As you can see, parsing code like this quickly becomes quite hard to understand and debug. In a real-world scenario I'd recommend adding more comments, but also a good batch of tests to document your expected behavior.
package {
    import flash.display.Sprite;
    public class parser extends Sprite
    {
        public function parser()
        {
            var input:String = "'string1','string2',sum(1,5,8), [\"item1\",\"item2\",\"item3\"], \"string3\" , {a:\"text\",b:\"text2\"}";
            var result:Array = parseInput(input);
            for each (var item:String in result)
            {
                trace(item);
            }
        }
        // this function only takes into account the '(' and ')' - adding the others is similar.
        private function parseInput(input:String):Array
        {
            var result:Array = [];
            trace("parsing: " + input);
            var token:String = "";
            var parenthesesStack:Array = [];
            var currentChar:String;
            for (var i:int = 0; i < input.length; i++)
            {
                currentChar = input.charAt(i);
                switch (currentChar)
                {
                    case "(":
                        parenthesesStack.push("(");
                        break;
                    case ")":
                        // a ")" without a matching "(" on the stack is a parse error
                        if (parenthesesStack.pop() != "(")
                        {
                            throw new Error("Parse error at index " + i);
                        }
                        break;
                    case ",":
                        // a comma outside all parentheses ends the current token
                        if (parenthesesStack.length == 0)
                        {
                            result.push(token);
                            token = "";
                        }
                        break;
                }
                // add character to the token if it is not a separating comma
                if (currentChar != "," || parenthesesStack.length != 0)
                {
                    token = token + currentChar;
                }
            }
            // add the last token
            if (token != "")
            {
                result.push(token);
            }
            return result;
        }
    }
}
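The same depth-tracking idea can be extended to all three bracket types. The following is only a sketch in TypeScript rather than AS3; the function name splitTopLevel is made up for illustration, and it additionally assumes that commas inside quoted strings should be kept and that quotes are never escaped inside a string.

```typescript
// Split a comma-separated argument list on top-level commas only.
// Commas nested inside (), [] or {} or inside quoted strings are kept.
// Assumption: string literals contain no escaped quote characters.
function splitTopLevel(input: string): string[] {
  const openers = "([{";
  const closers = ")]}";
  const expected: string[] = []; // stack of closing brackets we still expect
  const result: string[] = [];
  let token = "";
  let quote: string | null = null; // the quote character we are inside, if any

  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    if (quote !== null) {
      // inside a string literal: only the matching quote ends it
      if (ch === quote) quote = null;
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (openers.indexOf(ch) !== -1) {
      expected.push(closers.charAt(openers.indexOf(ch)));
    } else if (closers.indexOf(ch) !== -1) {
      if (expected.pop() !== ch) {
        throw new Error("Unbalanced '" + ch + "' at index " + i);
      }
    } else if (ch === "," && expected.length === 0) {
      result.push(token.trim());
      token = "";
      continue; // the separating comma is not part of any token
    }
    token += ch;
  }
  if (quote !== null || expected.length > 0) {
    throw new Error("Unexpected end of input");
  }
  if (token.trim() !== "") result.push(token.trim());
  return result;
}
```

Storing the expected closing bracket on the stack, rather than the opening one, means a mismatched pair such as "(]" is reported as an error instead of being silently accepted.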

Related

Typescript regex exclude whole string if followed by specific string

I've been running into weird issues with regex and TypeScript: I'm trying to have my expression replace the value test, except for an instance that is followed by another test. In other words, in the first two lines below, replace test; in the third line, replace only the second test.
[test]
[test].[db]
[test].[test]
Where it should look like:
[newvalue]
[newvalue].[db]
[test].[newvalue]
I've come up with lots of variations but this is the one that I thought was simple enough to solve it and regex101 can confirm this works:
\[(\w+)\](?!\.\[test\])
But when using Typescript (custom task in VSTS build), it actually replaces the values like this:
[newvalue]
[newvalue].[db]
[newvalue].[test]
Update: It looks like a regex such as (test)(?!.test) also breaks when I change the use cases to remove the square brackets, which makes me think the problem might be somewhere in the code. Could the problem be with the index at which the value is replaced?
Some of the code in Typescript that is calling this:
var filePattern = tl.getInput("filePattern", true);
var tokenRegex = tl.getInput("tokenRegex", true);
for (var i = 0; i < files.length; i++) {
var file = files[i];
console.info(`Starting regex replacement in [${file}]`);
var contents = fs.readFileSync(file).toString();
var reg = new RegExp(tokenRegex, "g");
// loop through each match
var match: RegExpExecArray;
// keep a separate var for the contents so that the regex index doesn't get messed up
// by replacing items underneath it
var newContents = contents;
while((match = reg.exec(contents)) !== null) {
var vName = match[1];
// find the variable value in the environment
var vValue = tl.getVariable(vName);
if (typeof vValue === 'undefined') {
tl.warning(`Token [${vName}] does not have an environment value`);
} else {
newContents = newContents.replace(match[0], vValue);
console.info(`Replaced token [${vName }]`);
}
}
}
The full code for the task I'm using this with: https://github.com/colindembovsky/cols-agent-tasks/blob/master/Tasks/ReplaceTokens/replaceTokens.ts
For me, this regex works the way you expect:
\[(test)\](?!\.\[test\])
with TypeScript code like this:
myString.replace(/\[(test)\](?!\.\[test\])/g, "[newvalue]");
The regex you are using, by contrast, would also replace the [db] part.
I've tried it with this code:
class Greeter {
myString1: string;
myString2: string;
myString3: string;
greeting: string;
constructor(str1: string, str2: string, str3: string) {
this.myString1 = str1.replace(/\[(test)\](?!\.\[test\])/g, "[newvalue]");
this.myString2 = str2.replace(/\[(test)\](?!\.\[test\])/g, "[newvalue]");
this.myString3 = str3.replace(/\[(test)\](?!\.\[test\])/g, "[newvalue]");
this.greeting = this.myString1 + "\n" + this.myString2 + "\n" + this.myString3;
}
greet() {
return "Hello, these are your replacements:\n" + this.greeting;
}
}
let greeter = new Greeter("[test]", "[test].[db]", "[test].[test]");
let button = document.createElement('button');
button.textContent = "Say Hello";
button.onclick = function() {
alert(greeter.greet());
}
document.body.appendChild(button);
Online playground here.
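A likely cause of the behavior in the question's loop is the call newContents.replace(match[0], vValue): when replace is given a string as its first argument, it replaces the first occurrence of that text, not the occurrence the regex actually matched. For [test].[test] the regex matches the second [test], but the string replace then rewrites the first one. A minimal sketch of one way around this uses the callback form of replace, which substitutes each match in place. The function name replaceTokens and the plain lookup object standing in for tl.getVariable are assumptions made for illustration.

```typescript
// Replace every token matched by `pattern` with its value from `values`.
// The callback form of String.prototype.replace substitutes each match at the
// position where it was found, so lookaheads in the pattern are respected.
function replaceTokens(
  contents: string,
  pattern: string,
  values: { [name: string]: string | undefined }
): string {
  const reg = new RegExp(pattern, "g");
  return contents.replace(reg, (whole: string, name: string) => {
    const value = values[name]; // `name` is capture group 1 of the pattern
    // leave the token untouched when no value is known for it
    return value === undefined ? whole : value;
  });
}

// Usage with the pattern from the question (backslashes doubled in a string literal):
const replaced = replaceTokens(
  "[test]\n[test].[db]\n[test].[test]",
  "\\[(\\w+)\\](?!\\.\\[test\\])",
  { test: "[newvalue]" }
);
```

Because no second search over the text takes place, there is no separate copy of the contents to keep in sync with the regex index.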

Removing line breaks using Docs GAS

I want to replace all newlines with spaces using Google Apps Script for Docs:
var doc = DocumentApp.getActiveDocument();
var body = doc.getBody().editAsText();
body.replaceText("\\t", " "); //works properly for tabs
body.replaceText("\\n", " "); // doesn't work
https://developers.google.com/apps-script/reference/document/body#replacetextsearchpattern-replacement
Any suggestions?
It seems that body.replaceText cannot replace what sits between paragraphs (the paragraph breaks).
You need to somehow merge all paragraphs into one paragraph. You can do it roughly with the following code:
function mergePars() {
var doc = DocumentApp.getActiveDocument();
var body = doc.getBody();
var pars = body.getParagraphs();
for( var j = 0; j < pars.length; ++j ) {
try {
pars[j].merge();
}
catch (e) {
Logger.log(e); // It will log "Exception: Element must be preceded by an element of the same type."
}
}
}
You may get rid of the try-catch if you get the number of all children in the document (with .getNumChildren()) and then loop through the items checking their DocumentApp.ElementType, and if the previous node was of type DocumentApp.ElementType.PARAGRAPH, apply .merge().

Eliminate newlines in google app script using regex

I'm trying to write part of an add-on for Google Docs that eliminates newlines within selected text using replaceText. The obvious text.replaceText("\n",""); gives the error Invalid argument: searchPattern. I get the same error with text.replaceText("\r","");. The following attempts do nothing: text.replaceText("/\n/","");, text.replaceText("/\r/","");. I don't know why Google App Script does not allow for the recognition of newlines in regex.
I am aware that there is an add-on that does this already, but I want to incorporate this function into my add-on.
This error occurs even with the basic
DocumentApp.getActiveDocument().getBody().replaceText("\n","");
My full function:
function removeLineBreaks() {
var selection = DocumentApp.getActiveDocument().getSelection();
if (selection) {
var elements = selection.getRangeElements();
for (var i = 0; i < elements.length; i++) {
var element = elements[i];
// Only deal with text elements
if (element.getElement().editAsText) {
var text = element.getElement().editAsText();
if (element.isPartial()) {
text.replaceText("\n","");
}
// Deal with fully selected text
else {
text.replaceText("\n","");
}
}
}
}
// No text selected
else {
DocumentApp.getUi().alert('No text selected. Please select some text and try again.');
}
}
It seems that in replaceText, to remove soft returns entered with Shift-ENTER, you can use \v:
.replaceText("\\v+", "")
If you want to remove all "other" control characters (C0, DEL and C1 control codes), you may use
.replaceText("\\p{Cc}+", "")
Note that \v is a construct supported by the JavaScript regex engine, and the RE2 regex library used in most Google products treats it as matching a vertical tab character (≡ \013).
The Google Apps Script function replaceText() still doesn't accept escape characters, but I was able to get around this by using getText(), then the generic JavaScript replace(), then setText():
var doc = DocumentApp.getActiveDocument();
var body = doc.getBody();
var bodyText = body.getText();
//DocumentApp.getUi().alert( "Does document contain \\t? " + /\t/.test( bodyText ) ); // \n true, \r false, \t true
bodyText = bodyText.replace( /\n/g, "" );
bodyText = bodyText.replace( /\t/g, "" );
body.setText( bodyText );
This worked within a Doc. I'm not sure whether the same is possible within a Sheet (and, even if it were, you'd probably have to run this one cell at a time).
Here is my pragmatic solution for eliminating newlines in Google Docs or, more precisely, for eliminating newlines from Gmail's message.getPlainBody().
It looks like Google uses '\r\n\r\n' as a plain EOL and '\r\n' as a manual line feed (Shift-Enter). The code should be self-explanatory.
It might help you get along with the newline problem in Docs.
The solution is possibly not very elegant, but it works like a charm :-)
function GetEmails2Doc() {
var doc = DocumentApp.getActiveDocument();
var body = doc.getBody();
var pc = 0; // Paragraph Counter
var label = GmailApp.getUserLabelByName("_Send2Sheet");
var threads = label.getThreads();
// Loop over the threads, then over the messages within each thread
for (var i = threads.length - 1; i >= 0; i--) {
var messages = threads[i].getMessages();
for (var j = 0; j < messages.length; j++) {
var message = messages[j];
/* Here I do some ...
body.insertParagraph(pc++, Utilities.formatDate(message.getDate(), "GMT",
"dd.MM.yyyy (HH:mm)")).setHeading(DocumentApp.ParagraphHeading.HEADING4)
str = message.getFrom() + ' to: ' + message.getTo();
if (message.getCc().length >0) str = str + ", Cc: " + message.getCc();
if (message.getBcc().length >0) str = str + ", Bcc: " + message.getBcc();
body.insertParagraph(pc++,str);
*/
// Body !!
var str = processBody(message.getPlainBody()).split("pEOL");
Logger.log(str.length + " EOLs");
for (var k=0; k<str.length; k++) body.insertParagraph(pc++,str[k]);
}
}
}
function processBody(tx) {
  var s = tx.split(/\r\n\r\n/g);
  // it looks like message.getPlainBody() [of mail] uses \r\n\r\n as EOL
  // so, I first substitute the 'EOL's with the string pattern "pEOL"
  // to be replaced with body.insertParagraph in the main function
  tx = '';
  for (var k = 0; k < s.length; k++) tx = tx + s[k] + "pEOL";
  // then replace all remaining simple \r\n with a blank
  s = tx.split(/\r\n/g);
  tx = '';
  for (k = 0; k < s.length; k++) tx = tx + s[k] + " ";
  return tx;
}
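The two-pass idea in processBody (a blank line separates paragraphs, a single line break is a soft break) can also be expressed without a sentinel string. This is only a sketch, written in TypeScript as a pure string function. The name splitParagraphs is made up, and unlike the original it trims each paragraph and drops empty ones.

```typescript
// Normalize a plain-text mail body: returns one string per paragraph,
// with single line breaks inside a paragraph collapsed to spaces.
// Assumption (taken from the answer above): "\r\n\r\n" separates paragraphs
// and a lone "\r\n" is a soft line break.
function splitParagraphs(plainBody: string): string[] {
  return plainBody
    .split(/\r\n\r\n/) // blank line = paragraph boundary
    .map((p) => p.replace(/\r\n/g, " ").trim()) // soft breaks become spaces
    .filter((p) => p.length > 0); // drop empty paragraphs
}
```

Each returned string could then be passed to body.insertParagraph in the main loop, in place of splitting on "pEOL".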
I have now found out through much trial and error -- and some much needed help from Wiktor Stribiżew (see other answer) -- that there is a solution to this, but it relies on the fact that Google Script does not recognise \n or \r in regex searches. The solution is as follows:
function removeLineBreaks() {
  var selection = DocumentApp.getActiveDocument().getSelection();
  if (selection) {
    var elements = selection.getRangeElements();
    for (var i = 0; i < elements.length; i++) {
      var element = elements[i];
      // Only deal with text elements
      if (element.getElement().editAsText) {
        var text = element.getElement().editAsText();
        if (element.isPartial()) {
          var start = element.getStartOffset();
          var finish = element.getEndOffsetInclusive();
          // slice() excludes its end index, so add 1 to include the last selected character
          var oldText = text.getText().slice(start, finish + 1);
          if (oldText.match(/\r/)) {
            var number = oldText.match(/\r/g).length;
            for (var j = 0; j < number; j++) {
              var location = oldText.search(/\r/);
              // swap the line break for a space, in the document and in the local copy
              text.deleteText(start + location, start + location);
              text.insertText(start + location, ' ');
              oldText = oldText.replace(/\r/, ' ');
            }
          }
        }
        // Deal with fully selected text
        else {
          text.replaceText("\\v+", " ");
        }
      }
    }
  }
  // No text selected
  else {
    DocumentApp.getUi().alert('No text selected. Please select some text and try again.');
  }
}
Explanation
Google Docs allows searching for vertical tabs (\v), which match newlines.
Partial text is a whole other problem. The solution to dealing with partially selected text above finds the location of newlines by extracting a text string from the text element and searching in that string. It then uses these locations to delete the relevant characters. This is repeated until the number of newlines in the selected text has been reached.
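The offset arithmetic for the partial-selection case can be isolated in a small pure function. This is a sketch in TypeScript, independent of the Docs API. The name lineBreakOffsets is hypothetical, and it assumes, as the code above does, that line breaks appear as \r in the element's text.

```typescript
// Return the absolute offsets of every "\r" inside the selected range
// [start, endInclusive] of `fullText`.
function lineBreakOffsets(fullText: string, start: number, endInclusive: number): number[] {
  const offsets: number[] = [];
  // slice() excludes its end index, so add 1 to cover an inclusive end offset
  const selected = fullText.slice(start, endInclusive + 1);
  const breaks = /\r/g;
  let match: RegExpExecArray | null;
  while ((match = breaks.exec(selected)) !== null) {
    // match.index is relative to the selection; shift it back to the full text
    offsets.push(start + match.index);
  }
  return offsets;
}
```

Since each line break is replaced by a single space, the text length does not change, so all offsets remain valid while the replacements are applied one after another.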
This Stack Overflow answer removes, specifically, "\n". It may help, it helped me indeed.

Added plus sign before number input in angularjs

I am using this directive to restrict the user to typing only numbers into an input tag.
app.directive('validNumber', function () {
return {
require: '?ngModel',
link: function (scope, element, attrs, ngModelCtrl) {
if (!ngModelCtrl) {
return;
}
ngModelCtrl.$parsers.push(function (val) {
if (angular.isUndefined(val)) {
val = ''; // val is already declared as the parameter, so assign instead of redeclaring
}
var clean = val.replace(/[^0-9\.]/g, '');
var decimalCheck = clean.split('.');
if (!angular.isUndefined(decimalCheck[0])) {
decimalCheck[0] = decimalCheck[0].slice(0, 10);
if (!angular.isUndefined(decimalCheck[1])) {
clean = decimalCheck[0] + '.' + decimalCheck[1];
}
else {
clean = decimalCheck[0];
}
//console.log(decimalCheck[0][0]);
}
if (!angular.isUndefined(decimalCheck[1])) {
decimalCheck[1] = decimalCheck[1].slice(0, 3);
clean = decimalCheck[0] + '.' + decimalCheck[1];
}
if (val !== clean) {
ngModelCtrl.$setViewValue(clean);
ngModelCtrl.$render();
}
return clean;
});
element.bind('keypress', function (event) {
if (event.keyCode === 32) {
event.preventDefault();
}
});
}
};
});
But now I want to customize this so that the user can type ONLY ONE "+" or "-", and only as the first character. I think I have to change this pattern:
var clean = val.replace(/[^0-9\.]/g, '');
I also tried changing it to val.replace(/[^0-9.+-]/g, ''). It works, but incorrectly: with this pattern the user can type any number of "+" and "-" characters at any position in the input field. I just want to let the user type ONLY ONE "+" or "-" as the first character, like "+1234" or "-1234".
This is more of a regex problem than an AngularJS one, so you might have more luck there: https://stackoverflow.com/questions/tagged/regex
I'll try to help you though. I think the regex you want matches an optional single + or -, then any number of digits, then optionally a decimal point, then any number of digits. A single regex that matches that is:
^[+-]?[0-9]*\.?[0-9]*
Have a read about character classes and the '?' operator. Note that this regex also allows:
+.
-.
which don't make sense as input. You could design clever regexes to exclude those results, but I think it would be easier to check the entry programmatically.
Finally, there are very likely regexes online that solve any regex problem you come across more comprehensively than you could yourself. Just google an English description next time, and check out this page for what you want:
http://www.regular-expressions.info/floatingpoint.html
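Checking the entry programmatically could look roughly like the following. It is a sketch in TypeScript of a cleaning function that might stand in for the val.replace(...) step inside the parser. The name cleanSignedNumber is made up, and the 10-digit and 3-decimal limits are copied from the directive in the question.

```typescript
// Keep an optional leading sign, digits, and at most one decimal point.
function cleanSignedNumber(raw: string): string {
  // a sign only counts when it is the very first character
  const sign = /^[+-]/.test(raw) ? raw.charAt(0) : "";
  // drop everything except digits and dots (this also removes stray signs)
  const unsigned = raw.replace(/[^0-9.]/g, "");
  const parts = unsigned.split(".");
  const whole = parts[0].slice(0, 10); // same 10-digit cap as the directive
  if (parts.length === 1) return sign + whole;
  // like the directive, keep only the digits after the first dot, capped at 3
  return sign + whole + "." + parts[1].slice(0, 3);
}
```

Handling the sign separately from the digits avoids a single large regex: a "+" or "-" anywhere other than position 0 is simply stripped along with the other invalid characters.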

How To split a string in c# and keep the delimiter in the array while excluding white space in a name parser

This took me a while to figure out, so I will post my results here in the question, as this is answered.
Question: How do I split a string using an array of possible delimiters in a name field, while keeping the delimiter in the split array and excluding the white space the split may create in the array?
Example: Sam Washington& Jenna
My issue was that the name parser I created was writing
Firstname:Sam
LastName : Jenna
Using the following code I was able to parse it out like this:
FirstName: Sam
Lastname : Washington
Firstname2 Jenna
Be careful, however: if you are going to use my list of joiners, do not include string values that can be found in common names, such as "And" and "OR".
These would split your names. Example: "Andy" would become "And", "y".
Example 2: "Gregory" would become "Greg", "or", "y".
Hope this helps someone. If you have questions, please feel free to shoot me a message.
/// <summary>
/// remove bad name parts
/// </summary>
/// <param name="parts">name parsed for review</param>
public static void CheckBadNames(ref string[] parts)
{
string[] BadName = new string[] {"LIFE", "ESTATE" ,"(",")","*","AN","LIFETIME","INTREST","MARRIED",
"UNMARRIED","MARRIED/UNMARRIED","SINGLE","W/","/W","THE","ET",
"ALS","AS", "TENANT","WIFE", "HUSBAND", "NOT", "DRIVE" ,"INSURED",
"EXCLUDED","DISABLED" ,"LICENSED","TRUSTEE","ATSOT","A T S O T",
"AKA", "-ATSOT","OF","DBA","EVOCABLE","FAMILY","INTEREST","MASTER"};
string[] joiners = new string[9] { "&", @"AND\", @"OR\", "\\", "&/OR", "AND/OR", "&-OR", "/", "OF/AND" };
Restart:
List<string> list = new List<string>(parts); //convert array to list
foreach (string part in list)
{
if (BadName.Any(s => part.ToUpper().Equals(s)) || part == "-")
{
list.Remove(part);
parts = list.ToArray();
goto Restart;
}
//check to see if any part ends with joiner
if (joiners.Any(s => part.ToUpper().EndsWith(s)))
{
//check if by ends with means that it is just a joiner
if (joiners.Any(s => part.ToUpper().Equals(s)))
{
continue;
}
else //name part ends with a joiner EX. Washington&
{
foreach (string div in joiners.Where(s => part.ToUpper().Contains(s))) // each string that contains a joiner
{
var temp = Regex.Split(part, "(" + Regex.Escape(div) + ")").Where(x => x != String.Empty); // escape the joiner so characters such as \ are matched literally; the capturing group keeps the joiner; empty pieces are dropped
int pos = list.IndexOf(part);
list.Remove(part);
for (int i = 0; i < temp.Count(); i++)
{
list.Insert(pos + i, temp.ElementAt(i));
}
parts = list.ToArray();
goto Restart;
}
}
}
}
if (parts.Count() == 0)
{
return;
}
if (joiners.Any(s => list.Last().ToUpper().Equals(s))) //remove last part if is a joiner
{
list.Remove(list.Last());
}
parts = list.ToArray(); // convert list back to array
}
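The core trick in the code above is that splitting on a regex with a capturing group keeps the delimiter in the result. The same behavior exists in JavaScript's split, so the idea can be sketched compactly in TypeScript. The function names and the short joiner list in the usage line are illustrative only, and the bad-name filtering from the C# version is left out.

```typescript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Split a name into words, additionally splitting off any joiner
// (such as "&") that is attached to a word, and keep the joiner as its own part.
function splitKeepingJoiners(name: string, joiners: string[]): string[] {
  // longest joiners first so "&/OR" wins over "&" and "/"
  const alternatives = joiners
    .slice()
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join("|");
  // a capturing group makes split() keep the delimiters in the result
  const pattern = new RegExp("(" + alternatives + ")", "i");
  const result: string[] = [];
  for (const word of name.split(/\s+/)) {
    for (const piece of word.split(pattern)) {
      if (piece !== "") result.push(piece); // skip the empty strings split() creates
    }
  }
  return result;
}

// Usage with the example from the question:
const nameParts = splitKeepingJoiners("Sam Washington& Jenna", ["&", "&/OR", "/"]);
```

As with the C# version, word-like joiners such as "AND" or "OR" would need extra care (for example, matching them only as whole words), since a plain alternation would also split names like "Andy".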