Unable to select some text from Google Docs with a simple regex

I'm trying to highlight some text in a Google Docs document using a regex (in the example below I would like to highlight "ORGANIZA"), but I can't even get a simple regex to find the "category_name" string.
Why this:
function highlightTextTwo() {
/* DOCUMENT DEFINITION */
var doc = DocumentApp.openById('1M6JmJPndLS_hkdaUo5holsdxB5GSSrcWMa1j4Hh7Dig');
/* VARIABLE DEFINITION */
var highlightStyle = {};
var paras = doc.getParagraphs();
var textLocation = {};
var i;
/* REGEX DEFINITION */
var MyRegex = new RegExp('category_name','i');
/* COLOR STYLE DEFINITION */
highlightStyle[DocumentApp.Attribute.FOREGROUND_COLOR] = '#FF0000';
/* CODE */
for (i=0; i<paras.length; ++i) {
Logger.log( paras[i].findText(MyRegex) );
}
}
applied to this document:
{
"map_image": "mapa_con_close_button.png",
"categories":[
{
"category_id": 1,
"category_name": "ORGANIZA",
"color": "#4591D0",
"icon_image": "Organiza.png"
},
{
"category_id": 2,
"category_name": "DELEGA",
"color": "#94C5DD",
"icon_image": "Delega.png"
},
{
"category_id": 3,
"category_name": "NEGOCIA Y GESTIONA EL CONFLICTO",
"color": "#E7344A",
"icon_image": "Negocia_y_Gestiona.png"
}
returns this:
[15-06-03 20:12:48:026 CEST] null
[15-06-03 20:12:48:027 CEST] null
[15-06-03 20:12:48:028 CEST] null
[15-06-03 20:12:48:029 CEST] null
[15-06-03 20:12:48:030 CEST] null
[15-06-03 20:12:48:030 CEST] null
instead of some nulls and one "category_name"?

I found out a way to display all of your category_name strings. Main points:
Use RegExp exec in a while loop instead of findText
To get all occurrences, use the g flag on the regex
To access the text of a paragraph, call getText()
Code:
var paras = doc.getParagraphs();
var MyRegex = new RegExp('category_name','ig');
var i, match;
for (i=0; i<paras.length; ++i) {
// with the g flag, each exec() call resumes after the previous match
while ((match = MyRegex.exec(paras[i].getText())) !== null)
{
Logger.log(match[0]);
}
}
Output in the log:
[15-06-04 21:07:36:320 CEST] category_NAME
[15-06-04 21:07:36:322 CEST] category_name
[15-06-04 21:07:36:324 CEST] category_name
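The exec loop can also be tried outside Apps Script. Here is a minimal sketch in plain JavaScript, assuming the paragraph text is already available as a string. The helper name findAllMatches and the sample string are made up for illustration; the helper also records the offsets of each match, which is the information a highlighter needs:

```javascript
// Hypothetical helper, not part of the Apps Script API: collects every
// match of `pattern` in `text` together with its start/end offsets.
function findAllMatches(text, pattern) {
  // Force the g flag so exec() advances through the string via lastIndex.
  var flags = pattern.flags.indexOf('g') === -1 ? pattern.flags + 'g' : pattern.flags;
  var regex = new RegExp(pattern.source, flags);
  var results = [];
  var match;
  while ((match = regex.exec(text)) !== null) {
    // Guard against zero-length matches, which would otherwise loop forever.
    if (match[0].length === 0) { regex.lastIndex++; continue; }
    results.push({
      text: match[0],
      start: match.index,
      endInclusive: match.index + match[0].length - 1
    });
  }
  return results;
}

// Made-up sample line, similar to the document in the question.
var sample = '"category_NAME": "ORGANIZA", "category_name": "DELEGA"';
var found = findAllMatches(sample, /category_name/i);
console.log(found);
```

Without the g flag, exec would return the first match on every call and the while loop would never end, which is why the helper adds the flag itself.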
EDIT:
Here is a way to highlight the matches with red color:
var paras = doc.getParagraphs();
var MyRegex = new RegExp('category_name','ig');
var i, match;
for (i=0; i<paras.length; ++i) {
while ((match = MyRegex.exec(paras[i].getText())) !== null)
{
// findText returns the first occurrence of the matched text in this paragraph
var searchResult = paras[i].findText(match[0]);
if (searchResult !== null) {
var thisElement = searchResult.getElement();
var thisElementText = thisElement.asText();
// offsets are relative to the text element that contains the match
thisElementText.setBackgroundColor(searchResult.getStartOffset(), searchResult.getEndOffsetInclusive(),"#FF0000");
}
}
}

Related

Mongodb Regex Query first 2 characters of the string

In one of my mongodb collection, I have a date string that has a mm/dd/yyyy format. Now, I want to query the 'mm' string.
Example, 05/20/2016 and 04/05/2015.
I want to get the first 2 characters of the string and query '05'. With that, the result I will get should only be 05/20/2016.
How can I achieve this?
Thanks!
For a regex solution, the following will suffice
var search = "05",
rgx = new RegExp("^"+search); // equivalent to var rgx = /^05/;
db.collection.find({ "leave_start": rgx });
Testing
var leave_start = "05/06/2016",
test = leave_start.match(/^05/);
console.log(test); // ["05", index: 0, input: "05/06/2016"]
console.log(test[0]); // "05"
or
var search = "05",
rgx = new RegExp("^"+search),
leave_start = "05/12/2016";
var test = leave_start.match(rgx);
console.log(test); // ["05", index: 0, input: "05/12/2016"]
console.log(test[0]); // "05"
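If the search string comes from user input, it may contain regex metacharacters. Below is a minimal sketch, assuming you want the input matched literally as a prefix. The helper name prefixRegex is hypothetical, and the two sample dates are the ones from the question:

```javascript
// Hypothetical helper: builds an anchored prefix regex from a search string,
// escaping regex metacharacters so the input is matched literally.
function prefixRegex(search) {
  var escaped = search.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped);
}

var dates = ['05/20/2016', '04/05/2015'];
var mayOnly = dates.filter(function (d) {
  return prefixRegex('05').test(d);
});
console.log(mayOnly); // [ '05/20/2016' ]
```

The resulting RegExp object can be passed to find() in the same way as rgx above.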
Another alternative is the aggregation framework: a $project stage uses the $substr operator to extract the first 2 characters of the field, and a $match stage then filters documents on that new substring field:
db.collection.aggregate([
{
"$project": {
"leaves_start": 1,
"monthSubstring": { "$substr": [ "$leaves_start", 0, 2 ] }
}
},
{ "$match": { "monthSubstring": "05" } }
])
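To see what the two stages compute, here is a plain-JavaScript emulation over an in-memory array. This is not MongoDB code, only a sketch of the logic; the sample documents and their _id values are made up for illustration:

```javascript
// Made-up sample documents standing in for the collection.
var docs = [
  { _id: 1, leaves_start: '05/20/2016' },
  { _id: 2, leaves_start: '04/05/2015' }
];

// Emulates the $project stage: keep the field, add its first two characters.
var projected = docs.map(function (d) {
  return {
    _id: d._id,
    leaves_start: d.leaves_start,
    monthSubstring: d.leaves_start.slice(0, 2)
  };
});

// Emulates the $match stage: keep only documents whose substring is "05".
var matched = projected.filter(function (d) {
  return d.monthSubstring === '05';
});

console.log(matched);
```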

Replacement matching regex with anchor tag?

I have a problem with regex. I have an HTML document in which I want to create an anchor link wherever the text matches a condition.
An example html:
Căn cứ Luật Tổ chức HĐND và UBND ngày 26/11/2003;
Căn cứ Nghị định số 63/2010/NĐ-CP ngày 08/6/2010 của Chính phủ về
kiểm soát thủ tục hành chính;
Căn cứ Quyết định số 165/2011/QĐ-UBND ngày 06/5/2011 của UBND tỉnh
ban hành Quy định kiểm soát thủ tục hành chính trên địa bàn tỉnh;
Căn cứ Quyết định số 278/2011/QĐ-UBND ngày 02/8/2011 của UBND tỉnh
ban hành Quy chế phối hợp thực hiện thống kê, công bố, công khai thủ
tục hành chính và tiếp nhận, xử lý phản ánh, kiến nghị của cá nhân, tổ
chức về quy định hành chính trên địa bàn tỉnh;
Xét đề nghị của Giám đốc Sở Công Thương tại Tờ trình số
304/TTr-SCT ngày 29 tháng 5 năm 2013
I want to match the document numbers (shown in bold in the original post) and turn them into anchor links. If a number is already inside a link, it should be ignored. Example of a number to link: 63/2010/NĐ-CP
var matchLegals = new Regex(@"(?:[\d]+\/?)\d+\/[a-z\dA-Z_ÀÁÂÃÈÉÊÌÍÒÓÔÕÙÚĂĐĨŨƠàáâãèéêìíòóôõùúăđĩũơƯĂẠẢẤẦẨẪẬẮẰẲẴẶẸẺẼỀỀỂưăạảấầẩẫậắằẳẵặẹẻẽềềểỄỆỈỊỌỎỐỒỔỖỘỚỜỞỠỢỤỦỨỪễệỉịọỏốồổỗộớờởỡợụủứừỬỮỰỲỴÝỶỸửữựỳỵỷỹ\-]+", RegexOptions.Compiled);
var doc = new HtmlDocument();
doc.LoadHtml(htmlString);
var allElements = doc.DocumentNode.SelectSingleNode("//div[@class='main-content']").Descendants();
foreach (var node in allElements)
{
var matches = matchLegals.Matches(node.InnerHtml);
foreach (Match m in matches)
{
var k = m.Value;
//dont know what to do
}
}
How can I do this?
Many thanks.
I assume your regex pattern is OK and works. Another assumption is that node.InnerHtml doesn't contain any <a> tags already encompassing any of the potential matches.
In this case, it's as simple as doing something like this:
node.InnerHtml = Regex.Replace(node.InnerHtml, "[your pattern here]", "<a href='query=$&'>$&</a>");
...
doc.Save("output.html");
Note that you may need to work on the href component, as I'm unsure how your link should be built.
You can match the text and replace it:
<script>
var s = '...';
var matchs = s.match(/\d{2,3}\/\d{4}\/[a-zA-Z\-áàảãạăâắằấầặẵẫậéèẻẽẹêếềểễệóòỏõọôốồổỗộơớờởỡợíìỉĩịđùúủũụưứửữựÀÁÂÃÈÉÊÌÍÒÓÔÕÙÚĂĐĨŨƠƯĂẠẢẤẦẨẪẬẮẰẲẴẶẸẺẼÊỀỂỄỆỈỊỌỎỐỒỔỖỘỚỜỞỠỢỤỨỪỬỮỰỲỴÝỶỸửữựỵỷỹ]+/gi);
if (matchs != null) {
for(var i=0; i<matchs.length;i++){
var val = matchs[i];
s = s.replace(val, '<a href="?key=' + val + '">' + val + '</a>');
}
}
document.write(s);
</script>
@Shaamaan, thanks for your advice. After a few hours of coding, it works now:
var content = doc.DocumentNode.SelectSingleNode("//div[@class='main-content']");
var items = content.SelectNodes(".//text()[normalize-space(.) != '']");
foreach (HtmlNode node in items)
{
if (!matchLegals.IsMatch(node.InnerText) || node.ParentNode.Name == "a")
{
continue;
}
var texts = node.InnerHtml.Trim();
node.InnerHtml = matchLegals.Replace(texts, a => string.Format("<a href='/search?q={0}'>{0}</a>",a.Value));
}
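The "skip what is already linked" rule can also be expressed inside the regex itself. Below is a minimal sketch in JavaScript, assuming a simplified document-number pattern (digits, a slash, four digits, a slash, then letters or hyphens) rather than the full character class above. The function name linkify and the /search?q= URL shape are illustrative only:

```javascript
// Simplified stand-in for the document-number pattern (an assumption, not the
// original character class): digits "/" four digits "/" letters or hyphens.
var DOC_NUMBER = '\\d+\\/\\d{4}\\/[\\p{L}-]+';

// Alternation trick: the first branch consumes whole existing <a>...</a>
// elements, so the second branch never matches text inside them.
var LINKIFY = new RegExp('<a\\b[^>]*>[\\s\\S]*?<\\/a>|' + DOC_NUMBER, 'giu');

function linkify(html) {
  return html.replace(LINKIFY, function (m) {
    if (/^<a\b/i.test(m)) { return m; } // already linked: leave untouched
    return '<a href="/search?q=' + encodeURIComponent(m) + '">' + m + '</a>';
  });
}

var input = 'Nghị định số 63/2010/NĐ-CP ngày 08/6/2010 và ' +
            '<a href="/x">165/2011/QĐ-UBND</a>';
var output = linkify(input);
console.log(output);
```

This only handles anchors that are not nested and is no substitute for walking the DOM as in the accepted code; it merely shows how a callback replacement can leave existing links alone.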

Swift splitting "abc1.23.456.7890xyz" into "abc", "1", "23", "456", "7890" and "xyz"

In Swift on OS X I am trying to chop up the string "abc1.23.456.7890xyz" into these strings:
"abc"
"1"
"23"
"456"
"7890"
"xyz"
but when I run the following code I get the following:
=> "abc1.23.456.7890xyz"
(0,3) -> "abc"
(3,1) -> "1"
(12,4) -> "7890"
(16,3) -> "xyz"
which means that the application correctly found "abc", the first token "1", but then the next token found is "7890" (missing out "23" and "456") followed by "xyz".
Can anyone see how the code can be changed to find ALL of the strings (including "23" and "456")?
Many thanks in advance.
import Foundation
import XCTest
public
class StackOverflowTest: XCTestCase {
public
func testRegex() {
do {
let patternString = "([^0-9]*)([0-9]+)(?:\\.([0-9]+))*([^0-9]*)"
let regex = try NSRegularExpression(pattern: patternString, options: [])
let string = "abc1.23.456.7890xyz"
print("=> \"\(string)\"")
let range = NSMakeRange(0, string.characters.count)
regex.enumerateMatchesInString(string, options: [], range: range) {
(textCheckingResult, _, _) in
if let textCheckingResult = textCheckingResult {
for nsRangeIndex in 1 ..< textCheckingResult.numberOfRanges {
let nsRange = textCheckingResult.rangeAtIndex(nsRangeIndex)
let location = nsRange.location
if location < Int.max {
let startIndex = string.startIndex.advancedBy(location)
let endIndex = startIndex.advancedBy(nsRange.length)
let value = string[startIndex ..< endIndex]
print("\(nsRange) -> \"\(value)\"")
}
}
}
}
} catch {
}
}
}
It's all about your regex pattern. You want to find a series of contiguous letters or digits. Try this pattern instead:
let patternString = "([a-zA-Z]+|\\d+)"
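The reason the original pattern loses "23" and "456" is that a capture group inside a repeated (?:...)* keeps only the value from its last repetition. The same regex semantics can be shown with a short sketch in JavaScript (used here instead of Swift purely for illustration):

```javascript
var input = 'abc1.23.456.7890xyz';

// Original pattern: group 3 sits inside a repeated (?:...)*, so each
// repetition overwrites it and only the last value ("7890") survives.
var original = /([^0-9]*)([0-9]+)(?:\.([0-9]+))*([^0-9]*)/.exec(input);
var originalGroups = original.slice(1);
console.log(originalGroups); // [ 'abc', '1', '7890', 'xyz' ]

// Suggested pattern: no repeated group; every run of letters or digits
// is a separate match.
var tokens = input.match(/[a-zA-Z]+|\d+/g);
console.log(tokens); // [ 'abc', '1', '23', '456', '7890', 'xyz' ]
```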
An alternative, 'Swifty' way:
let str = "abc1.23.456.7890xyz"
let chars = str.characters.map{ $0 }
enum CharType {
case Number
case Alpha
init(c: Character) {
self = .Alpha
if isNumber(c) {
self = .Number
}
}
func isNumber(c: Character)->Bool {
return "1234567890".characters.map{ $0 }.contains(c)
}
}
var tmp = ""
tmp.append(chars[0])
var type = CharType(c: chars[0])
for i in 1..<chars.count {
let c = CharType(c: chars[i])
if c != type {
tmp.append(Character("."))
}
tmp.append(chars[i])
type = c
}
tmp.characters.split(".", maxSplit: Int.max, allowEmptySlices: false).map(String.init)
// ["abc", "1", "23", "456", "7890", "xyz"]

How to find which group is matched in NSRegularExpression

I have a regex statement with multiple capture groups which are separated by | operator. How can I find out which capture group is matched? Only way I can think of -for this example- is counting the number of characters if something is matched.
var string = "1234567897"
var pattern = "(^\\d{9}$)|(^\\d{10}$)|(^\\d{13}$)|(^[a-zA-Z]{2}\\d{9}[a-zA-Z]{2}$)"
var myRegex = NSRegularExpression(pattern: pattern, options: nil, error: nil)!
if let myMatch = myRegex.firstMatchInString(string, options: nil,
range: NSRange(location: 0, length: string.utf16Count)) {
println((string as NSString).substringWithRange(myMatch.rangeAtIndex(0)))
}
I wrote some code that works for my example. I am sure it could be written better, but it works for now.
Swift 2.3
var string = "123456789"
var pattern = "(^\\d{9}$)|(^\\d{10}$)|(^\\d{13}$)|(^[a-zA-Z]{2}\\d{9}[wW]{2}$)"
var myRegex = try! NSRegularExpression(pattern: pattern, options: [])
if let myMatch = myRegex.firstMatchInString(string, options: NSMatchingOptions.init(rawValue: 0), range: NSRange(location: 0, length: string.utf16.count)) {
var matchedGroup = 0
for i in 1..<myMatch.numberOfRanges {
if myMatch.rangeAtIndex(i).length != 0 {
matchedGroup = i
break
}
}
print(matchedGroup)
print((string as NSString).substringWithRange(myMatch.rangeAtIndex(0))) //whatever the range you want to print
}
Swift 3
var string = "123456789"
var pattern = "(^\\d{9}$)|(^\\d{10}$)|(^\\d{13}$)|(^[a-zA-Z]{2}\\d{9}[wW]{2}$)"
var myRegex = try! NSRegularExpression(pattern: pattern, options: [])
if let myMatch = myRegex.firstMatch(in: string, options: NSRegularExpression.MatchingOptions.init(rawValue: 0), range: NSRange(location: 0, length: string.utf16.count)) {
var matchedGroup = 0
for i in 1..<myMatch.numberOfRanges {
if myMatch.rangeAt(i).length != 0 {
matchedGroup = i
break
}
}
print(matchedGroup)
print((string as NSString).substring(with: myMatch.rangeAt(0))) //whatever the range you want to print
}
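The same idea, checking each group in turn and stopping at the first one that took part in the match, can be sketched in JavaScript, where groups that did not participate are undefined. The function name matchedGroupIndex is hypothetical, and the pattern is the one from the question:

```javascript
// Returns the 1-based index of the first capture group that participated
// in the match, or 0 when the input does not match at all.
function matchedGroupIndex(input, regex) {
  var match = regex.exec(input);
  if (match === null) { return 0; }
  for (var i = 1; i < match.length; i++) {
    if (match[i] !== undefined) { return i; }
  }
  return 0;
}

// Pattern from the question; no g flag, so exec() carries no state between calls.
var pattern = /(^\d{9}$)|(^\d{10}$)|(^\d{13}$)|(^[a-zA-Z]{2}\d{9}[a-zA-Z]{2}$)/;

console.log(matchedGroupIndex('1234567897', pattern));    // 2
console.log(matchedGroupIndex('123456789', pattern));     // 1
console.log(matchedGroupIndex('AB123456789CD', pattern)); // 4
```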

Is there a way to filter a text file using grep (or any other tool), so that you can get a section of the file that's encased in braces or brackets?

I have several files that look something like this:
universe = {
["stars"] = {
["Sun"] = {
["planets"] = "9",
["life"] = "Yes",
["asteroid"] = "9001"
},
["Alpha Centauri"] = {
["planets"] = "3",
["life"] = "No",
["asteroid"] = "20"
},
["Rigel"] = {
["planets"] = "5",
["life"] = "No",
["asteroid"] = "11"
}
}
}
My intention is to find, for instance, every block where ["life"] equals "No". I realize this could be handled better if it was within a database (or something with a structure), but I'm not sure how to convert this data into that.
I have a bunch of files in this format, and I'd like to run a command that could display the sections (up to the immediate parent bracket) where the condition is true, so for the previous example, I'd like to get:
["Alpha Centauri"] = {
["planets"] = "3",
["life"] = "No",
["asteroid"] = "20"
},
["Rigel"] = {
["planets"] = "5",
["life"] = "No",
["asteroid"] = "11"
}
Can this be done with GREP? Or is there any other tool that could do something like this?
Any help is greatly appreciated. Thanks in advance.
EDIT
Example 2: https://regex101.com/r/jO9dU5/1
Try this Lua program:
local function find(w,t,p)
for k,v in pairs(t) do
if v==w then
print(p.."."..k)
elseif type(v)=="table" then
find(w,v,p.."."..k)
end
end
end
find("No",universe,"universe")
Add the definition of universe before this code.
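For comparison, the same recursive walk can be sketched in JavaScript, assuming the data has been loaded into a nested object. The object literal below is a hand-written copy of the example, and findPaths is a hypothetical name; it returns the paths instead of printing them:

```javascript
// Hand-written copy of the example data as a nested object.
var universe = {
  stars: {
    'Sun':            { planets: '9', life: 'Yes', asteroid: '9001' },
    'Alpha Centauri': { planets: '3', life: 'No',  asteroid: '20' },
    'Rigel':          { planets: '5', life: 'No',  asteroid: '11' }
  }
};

// Walks the nested object and collects the dotted path of every value
// equal to `wanted`, recursing into nested objects.
function findPaths(wanted, table, path) {
  var found = [];
  Object.keys(table).forEach(function (key) {
    var value = table[key];
    if (value === wanted) {
      found.push(path + '.' + key);
    } else if (value !== null && typeof value === 'object') {
      found = found.concat(findPaths(wanted, value, path + '.' + key));
    }
  });
  return found;
}

var paths = findPaths('No', universe, 'universe');
console.log(paths);
// [ 'universe.stars.Alpha Centauri.life', 'universe.stars.Rigel.life' ]
```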
If you really want to do text processing, try this instead:
S=[[
universe = {
...
}
]]
for w in S:gmatch('%b[] = {[^{]-"No".-},?') do
print(w)
end
Yes, it's possible with a grep that supports the -P (Perl regex) option.
$ grep -oPz '.*\[[^\[\]]*\]\s*=\s*\{[^{}]*\["life"\]\s*=\s*"No"[^{}]*}.*' file
["Alpha Centauri"] = {
["planets"] = "3",
["life"] = "No",
["asteroid"] = "20"
},
["Rigel"] = {
["planets"] = "5",
["life"] = "No",
["asteroid"] = "11"
}
From grep --help
-z, --null-data a data line ends in 0 byte, not newline
-o, --only-matching show only the part of a line matching PATTERN
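The core of that pattern is the [^{}]* part: because it cannot cross a brace, the match is confined to the innermost block around ["life"] = "No". Here is a minimal sketch of the same idea in JavaScript, with the example file inlined as a string and without the leading and trailing .* that the grep command uses to pick up the rest of the line:

```javascript
// The example file, inlined as a string for illustration.
var source = [
  'universe = {',
  '  ["stars"] = {',
  '    ["Sun"] = {',
  '      ["planets"] = "9",',
  '      ["life"] = "Yes",',
  '      ["asteroid"] = "9001"',
  '    },',
  '    ["Alpha Centauri"] = {',
  '      ["planets"] = "3",',
  '      ["life"] = "No",',
  '      ["asteroid"] = "20"',
  '    },',
  '    ["Rigel"] = {',
  '      ["planets"] = "5",',
  '      ["life"] = "No",',
  '      ["asteroid"] = "11"',
  '    }',
  '  }',
  '}'
].join('\n');

// [key] = { ... ["life"] = "No" ... } where the body contains no braces,
// so only the immediate parent block can match.
var BLOCK = /\[[^\[\]]*\]\s*=\s*\{[^{}]*\["life"\]\s*=\s*"No"[^{}]*\}/g;

var blocks = source.match(BLOCK) || [];
blocks.forEach(function (b) { console.log(b); });
```

As with the grep version, this only works while the matching block has no nested tables of its own.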
Update:
\[[^\n]*\]\h*=\h*\{(?!,\s*\[[^\[\]]*\]\h*=\h*{).*?\["fontSize"\]\h*=\h*20,.*?\}(?=,\s*\[[^\[\]]*\]\h*=\h*{|\s*})
$ pcregrep -oM '(?s)[^\n]*\[[^\n]*\]\h*=\h*\{(?!,\s*\[[^\[\]]*\]\h*=\h*{).*?\["fontSize"\]\h*=\h*20,.*?\}(?=,\s*\[[^\[\]]*\]\h*=\h*{|\s*})' file
["frame 1"] = {
["fontSize"] = 20,
["displayStacks"] = "%p",
["xOffset"] = 251.000518798828,
["stacksPoint"] = "BOTTOM",
["regionType"] = "icon",
["yOffset"] = 416.000183105469,
["anchorPoint"] = "CENTER",
["parent"] = "Target Shit",
["numTriggers"] = 1,
["customTextUpdate"] = "update",
["id"] = "Invulnerabilities 2",
["icon"] = true,
["fontFlags"] = "OUTLINE",
["stacksContainment"] = "OUTSIDE",
["zoom"] = 0,
["auto"] = true,
["selfPoint"] = "CENTER",
["width"] = 60,
["frameStrata"] = 1,
["desaturate"] = false,
["stickyDuration"] = true,
["font"] = "Emblem",
["inverse"] = false,
["height"] = 60,
}
["frame 2"] = {
["fontSize"] = 20,
["displayStacks"] = "%p",
["parent"] = "Target Shit",
["xOffset"] = 118.000427246094,
["stacksPoint"] = "BOTTOM",
["anchorPoint"] = "CENTER",
["untrigger"] = {
},
["regionType"] = "icon",
["color"] = {
1, -- [1]
1, -- [2]
1, -- [3]
1, -- [4]
},
["desaturate"] = false,
["frameStrata"] = 1,
["stickyDuration"] = true,
["width"] = 60,
["font"] = "Emblem",
["inverse"] = false,
["icon"] = true,
["height"] = 60,
["yOffset"] = 241
}
(?s) is the DOTALL modifier, which makes the dots in your regex match line breaks as well.
Using a proper Lua parser in Perl
This is not a quick'n'dirty snippet, but a robust way to query a Lua data structure (DS):
use strict; use warnings;
use Data::Lua; # lua 2 perl parser
use Data::Dumper; # to dump Data Structures (in color)
# retrieving the lua'DS in a perl's DS
my $root = Data::Lua->parse_file('lua.conf');
# iterating over keys of planet's HASH
foreach my $planet (keys %{ $root->{universe}->{stars} }) {
print Dumper { $planet => $root->{universe}->{stars}->{$planet} }
if $root->{universe}->{stars}->{$planet}->{life} eq "No";
}
Output
$VAR1 = {
'Rigel' => {
'planets' => '5',
'life' => 'No',
'asteroid' => '11'
}
};
$VAR1 = {
'Alpha Centauri' => {
'asteroid' => '20',
'life' => 'No',
'planets' => '3'
}
};
How To
Install Data::Lua, if not already installed, with # cpan Data::Lua
Put the data structure in the file lua.conf
Put this script in the same directory, for example as lua_DS_parser.pl
Run the script with $ perl lua_DS_parser.pl
Enjoy ;)
You could use something like this
grep -C 2 -E 'life.+= "No"' path_to_file
But in my opinion, a better way is to convert the files to some common format.