Using regex to extract a string

I am using Google Apps Script to pull text from a simple comments entity in a database into Google Sheets. I would like to identify comments that contain my own 'bbCode', such as [Status: Complete]. I am not sure how to extract the 'Complete' text, or how to remove the entire '[Status: Complete]' bbCode from the comment text. This is where I have got to so far. Thank you for any suggestions:
Example 1 comment text: '[Status: Complete] Lorem ipsum bla bla'
Desired output: Col1: 'Lorem ipsum bla bla' Col2: 'Complete'
Example 2 comment text: 'Lorem [Status: Proposed] ipsum bla bla'
Desired output: Col1: 'Lorem ipsum bla bla' Col2: 'Proposed'
Example 3 comment text: 'Lorem ipsum bla bla'
Desired output: Col1: 'Lorem ipsum bla bla' Col2:
var bbCode = '[Status: ';
// get data from the db; inside a for loop, read the next value into val
var val = rs.getString(col+1);
if (val.indexOf(bbCode) > -1) {
// Item with Status, place the status in the Status Column
//but this next line is not right - I would like var Status = 'Complete' or 'Proposed' etc...
var Status = RegExp('.*\[Status: (.*)\].*', 'g');
cell.offset(row, col+2).setValue(Status);
// Place the remaining comment in the Comment column
//This next line is not right I would like val = the comment string without the '[Status: Completed]' or '[Status: Proposed]' etc...
cell.offset(row, col+2).setValue(val);
} else {
// Enter comment in Comment column as normal
cell.offset(row, col+1).setValue(val);
}

Try this sample:
function RegexGetTwoStrings() {
Logger.log('^^^^^^^^^^^^^^^^^^^^^^^^^^^^^');
var sample = ['[Status: Complete] Lorem ipsum bla bla',
'Lorem [Status: Proposed] ipsum bla bla',
'Lorem ipsum bla bla'];
var RegEx = /(.*)\[.*: (.*)\] (.*)/;
var Replace = "$1$3,$2";
var str, newStr, Col1, Col2;
var strArr = [];
for (var i = 0; i < sample.length; i++) {
str = sample[i];
newStr = str.replace(RegEx, Replace) + ",";
strArr = newStr.split(',');
Col1 = strArr[0];
Col2 = strArr[1];
Logger.log("Sample" + (i + 1) + " = '" + str + "'");
Logger.log('Col1 = ' + Col1);
Logger.log('Col2 = ' + Col2);
}
Logger.log('^^^^^^^^^^^^^^^^^^^^^^^^^^^^^');
}
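One caveat with the sample above: it joins the two results with a comma and splits on it again, so a comment that itself contains a comma would be cut short. A minimal sketch that avoids this by reading the capture group directly might look like the following. The function name splitStatus and the returned field names are made up for illustration, and it assumes at most one [Status: ...] tag per comment.

```javascript
// Hypothetical helper (not part of Apps Script): splits a comment into
// its plain text and the value of an optional [Status: ...] tag.
function splitStatus(comment) {
  // Capture everything between "[Status:" and the closing "]",
  // and swallow the whitespace on both sides of the tag.
  var pattern = /\s*\[Status:\s*([^\]]*)\]\s*/;
  var match = pattern.exec(comment);
  if (!match) {
    // No tag: return the comment unchanged and an empty status.
    return { comment: comment, status: '' };
  }
  // Replace the tag with a single space, then trim the ends.
  var cleaned = comment.replace(pattern, ' ').trim();
  return { comment: cleaned, status: match[1].trim() };
}
```

In the original loop, the two fields could then be written to the Comment and Status columns with setValue, for example result.comment into one cell and result.status into the next.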

Related

Regex - Replace input and output as decimals only? [duplicate]

Does anyone have a regex to extract the lat/long from a string such as:
ID: 39.825 -86.88333
To match one value
-?\d+\.\d+
For both values:
(-?\d+\.\d+)\ (-?\d+\.\d+)
And if the string always has this form:
"ID: 39.825 -86.88333".match(/^ID:\ (-?\d+\.\d+)\ (-?\d+\.\d+)$/)
var latlong = 'ID: 39.825 -86.88333';
var point = latlong.match( /-?\d+\.\d+/g );
//result: point = ['39.825', '-86.88333'];
function parseLatLong(str) {
var exp = /ID:\s([-+]?\d+\.\d+)\s+([-+]?\d+\.\d+)/;
return { lat: str.replace(exp, "$1"), long: str.replace(exp, "$2") };
}
function doSomething() {
var res = parseLatLong("ID: 39.825 -86.88333");
alert('Latitude is ' + res.lat + ", Longitude is " + res.long);
}
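The replace-based version above hands back the untouched input string when the pattern does not match, which can be hard to spot. As a sketch of an alternative, the function below reads the capture groups from exec and returns numbers, or null when nothing matches. The name extractLatLong is hypothetical, and it assumes the two values are separated by whitespace and appear in latitude, longitude order.

```javascript
// Hypothetical helper: pulls two signed decimals out of a string
// such as "ID: 39.825 -86.88333".
function extractLatLong(str) {
  // Two signed numbers (decimal part optional) separated by whitespace.
  var pattern = /(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)/;
  var m = pattern.exec(str);
  if (!m) {
    return null; // no coordinate pair found
  }
  return { lat: parseFloat(m[1]), long: parseFloat(m[2]) };
}
```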

How to search and replace multiline string in files using linux commands

I have many files that contain a similar multiline string, for example:
<script> var i = 100
var j = 200
var x = 1000 </script>
It can also look like this:
<script> var i = 100
var j = 200
var x = 1000 </script>
or
<script> var i = 100
var j = 200
var x = 1000 </script>
and I want to replace it with
<script> var i = 100
var j = 200
var x = xxxx </script>
Note that the lines may have no leading spaces at all, and sometimes the indentation is tabs.
The part I have a problem with is the multiline match; if it were a single line it would be easier.
Multi line replacements are easy in perl:
perl -0 -pe 's/<script>\s*var\s+i\s*=\s*100\s+var\s+j\s*=\s*200\s+var\s+x\s*=\s*1000\s*<\/script>/<script> var i = 100\n var j = 200\n var x = xxxx <\/script>/g' input-file
Or (slightly more readable):
perl -0 -pe 's/<script>\s*
var\s+ i\s* =\s* 100\s+
var\s+ j\s* =\s* 200\s+
var\s+ x\s* =\s* 1000\s*
<\/script>/<script> var i = 100\n var j = 200\n var x = xxxx <\/script>/gx' input-file
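If you would rather keep each file's existing spacing instead of rewriting the whole block, the same whitespace-tolerant pattern can capture the text around the value and put it back. Here is a rough JavaScript sketch of that idea; the function name maskX is made up, and it assumes the block always declares i = 100, j = 200 and x = 1000 in that order. The same capture-group trick should carry over to the perl one-liner.

```javascript
// Hypothetical helper: replaces only the value of x and keeps the
// original spaces, tabs and newlines of the block untouched.
function maskX(html) {
  // Group 1: everything up to the value of x. Group 2: the closing tag.
  var pattern = /(<script>\s*var\s+i\s*=\s*100\s+var\s+j\s*=\s*200\s+var\s+x\s*=\s*)1000(\s*<\/script>)/g;
  return html.replace(pattern, '$1xxxx$2');
}
```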

Replacement matching regex with anchor tag?

I have a problem using Regex. I have an HTML document in which I want to create anchor links wherever the text matches a pattern.
An example html:
Căn cứ Luật Tổ chức HĐND và UBND ngày 26/11/2003;
Căn cứ Nghị định số 63/2010/NĐ-CP ngày 08/6/2010 của Chính phủ về
kiểm soát thủ tục hành chính;
Căn cứ Quyết định số 165/2011/QĐ-UBND ngày 06/5/2011 của UBND tỉnh
ban hành Quy định kiểm soát thủ tục hành chính trên địa bàn tỉnh;
Căn cứ Quyết định số 278/2011/QĐ-UBND ngày 02/8/2011 của UBND tỉnh
ban hành Quy chế phối hợp thực hiện thống kê, công bố, công khai thủ
tục hành chính và tiếp nhận, xử lý phản ánh, kiến nghị của cá nhân, tổ
chức về quy định hành chính trên địa bàn tỉnh;
Xét đề nghị của Giám đốc Sở Công Thương tại Tờ trình số
304/TTr-SCT ngày 29 tháng 5 năm 2013
I want to match the document numbers (bold in the original, for example 63/2010/NĐ-CP) and turn each one into an anchor link. Text that is already inside a link should be skipped.
var matchLegals = new Regex(@"(?:[\d]+\/?)\d+\/[a-z\dA-Z_ÀÁÂÃÈÉÊÌÍÒÓÔÕÙÚĂĐĨŨƠàáâãèéêìíòóôõùúăđĩũơƯĂẠẢẤẦẨẪẬẮẰẲẴẶẸẺẼỀỀỂưăạảấầẩẫậắằẳẵặẹẻẽềềểỄỆỈỊỌỎỐỒỔỖỘỚỜỞỠỢỤỦỨỪễệỉịọỏốồổỗộớờởỡợụủứừỬỮỰỲỴÝỶỸửữựỳỵỷỹ\-]+", RegexOptions.Compiled);
var doc = new HtmlDocument();
doc.LoadHtml(htmlString);
var allElements = doc.DocumentNode.SelectSingleNode("//div[@class='main-content']").Descendants();
foreach (var node in allElements)
{
var matches = matchLegals.Matches(node.InnerHtml);
foreach (Match m in matches)
{
var k = m.Value;
//dont know what to do
}
}
How can I do this?
Many thanks.
I assume your regex pattern is OK and works. Another assumption is that node.InnerHtml doesn't contain any <a> tags already encompassing any of the potential matches.
In this case, it's as simple as doing something like this:
node.InnerHtml = Regex.Replace(node.InnerHtml, "[your pattern here]", "<a href='query=$&'>$&</a>");
...
doc.Save("output.html");
Note that you may need to work on the href component; I'm unsure how your link should be built.
You can match the text and replace it:
<script>
var s = '...';
var matchs = s.match(/\d{2,3}\/\d{4}\/[a-zA-Z\-áàảãạăâắằấầặẵẫậéèẻẽẹêếềểễệóòỏõọôốồổỗộơớờởỡợíìỉĩịđùúủũụưứửữựÀÁÂÃÈÉÊÌÍÒÓÔÕÙÚĂĐĨŨƠƯĂẠẢẤẦẨẪẬẮẰẲẴẶẸẺẼÊỀỂỄỆỈỊỌỎỐỒỔỖỘỚỜỞỠỢỤỨỪỬỮỰỲỴÝỶỸửữựỵỷỹ]+/gi);
if (matchs != null) {
for(var i=0; i<matchs.length;i++){
var val = matchs[i];
s = s.replace(val, '<a href="?key=' + val + '">' + val + '</a>');
}
}
document.write(s);
</script>
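One thing to watch in the loop above: s.replace with a string argument only touches the first occurrence, so a number that appears twice can end up wrapped twice in one place and not at all in the other. A sketch that does the whole job in one pass with a replacer function is below. The pattern is my own simplification and assumes the numbers always have the form number/year/code (so something like 304/TTr-SCT would not match); the ?key= link format is borrowed from the snippet above, and linkLegalRefs is a made-up name. It also assumes the input text contains no anchors yet.

```javascript
// Assumption: references look like number/year/code, e.g. 63/2010/NĐ-CP.
// \p{L} and \p{M} match any Unicode letter and combining mark (needs the u flag),
// which avoids listing every Vietnamese character by hand.
var LEGAL_REF = /\d+\/\d{4}\/[\p{L}\p{M}\d\-]+/gu;

// Hypothetical helper: wraps every reference in an anchor in a single pass.
function linkLegalRefs(text) {
  return text.replace(LEGAL_REF, function (ref) {
    // Encode the reference so "/" and non-ASCII letters are safe in the URL.
    return '<a href="?key=' + encodeURIComponent(ref) + '">' + ref + '</a>';
  });
}
```

Dates such as 08/6/2010 are left alone because nothing follows the four-digit year.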
@Shaamaan, thanks for your advice. After a few hours of coding, it works now:
var content = doc.DocumentNode.SelectSingleNode("//div[@class='main-content']");
var items = content.SelectNodes(".//text()[normalize-space(.) != '']");
foreach (HtmlNode node in items)
{
if (!matchLegals.IsMatch(node.InnerText) || node.ParentNode.Name == "a")
{
continue;
}
var texts = node.InnerHtml.Trim();
node.InnerHtml = matchLegals.Replace(texts, a => string.Format("<a href='/search?q={0}'>{0}</a>",a.Value));
}

Google Apps Script Regular Expression to get the last name of a person

I am trying to write a regex to get the last name of a person.
var name = "My Name";
var regExp = new RegExp("\s[a-z]||[A-Z]*");
var lastName = regExp(name);
Logger.log(lastName);
If I understand correctly \s should find the white space between My and Name, [a-z]||[A-Z] would get the next letter, then * would get the rest. I would appreciate a tip if anyone could help out.
You can use the following regex:
var name = "John Smith";
var regExp = new RegExp("(?:\\s)([a-z]+)", "gi"); // "i" is for case insensitive
var lastName = regExp.exec(name)[1];
Logger.log(lastName); // Smith
But, from your requirements, it is simpler to just use .split():
var name = "John Smith";
var lastName = name.split(" ")[1];
Logger.log(lastName); // Smith
Or .substring() (useful if there is more than one "last name"):
var name = "John Smith Smith";
var lastName = name.substring(name.indexOf(" ")+1, name.length);
Logger.log(lastName); // Smith Smith
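As for why the pattern in the question does not work: inside a string literal "\s" is just the letter s (the backslash has to be doubled), || in a regex is an alternation with an empty alternative rather than a logical OR, and a RegExp object has to be used through exec or match rather than called like a function. If what you want is strictly the last word, including names with a middle name or stray spaces, a small sketch could be the following. The name lastNameOf is made up, and returning an empty string for single-word names is my assumption about the desired behavior.

```javascript
// Hypothetical helper: returns the last whitespace-separated word of a
// name, or '' when the name has only one word.
function lastNameOf(fullName) {
  // Trim the ends, then split on runs of whitespace.
  var parts = fullName.trim().split(/\s+/);
  return parts.length > 1 ? parts[parts.length - 1] : '';
}
```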

groovy list indexOf

If I have a list with following elements
list[0] = "blach blah blah"
list[1] = "SELECT something"
list[2] = "some more text"
list[3] = "some more text"
How can I find the index of the item that starts with SELECT?
I can do list.indexOf("SELECT something");
But this is a dynamic list. The item won't always be SELECT something; it could be SELECT somethingelse or anything else, but the first word will always be SELECT.
Is there a way to apply a regex to the indexOf search?
def list = ["blach blah blah", "SELECT something", "some more text", "some more text"]
def index = list.findIndexOf { it ==~ /SELECT \w+/ }
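This returns the index of the first item that matches the regex /SELECT \w+/. Note that ==~ requires the whole string to match, so an item with more than one word after SELECT would need a looser pattern such as /SELECT\b.*/. To obtain the indices of all matching items, replace the second line with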
def index = list.findIndexValues { it ==~ /SELECT \w+/ }
You can use a regex in find:
def list = ["blach blah blah", "SELECT something", "some more text", "some more text"]
def item = list.find { it ==~ /SELECT \w+/ }
assert item == "SELECT something"
list[1] = "SELECT somethingelse"
item = list.find { it ==~ /SELECT \w+/ }
assert item == "SELECT somethingelse"