How do I process splitting on certain characters with regex

In my app, I'm trying to split a string into an array based on a regex pattern. I'd like to be able to load my volt templates and run them through our custom rendering engine - just to learn a bit more on how rendering engines work.
I wrote the regex below to do just that:
"(?s)(\\{\\{.*?\\}\\}|\\{%.*?%\\}|\\{#.*?#\\})"
And this is an example of such a template:
# {{ title }}
{{created_at}} {{created_location}}
============
Paragraphs are separated by a blank line.
2nd paragraph. *Italic*, **bold**, and `monospace`.
Itemized lists look like:
{% for (item in items) %}
* {{ item }}
{% endfor %}
Now, ideally, I'd like this to be converted to an array looking like this:
[
"# ",
"{{ title }}",
"\n",
"{{created_at}}",
" ",
"{{created_location}}",
"\n============\nParagraphs are separated by a blank line\n2nd paragraph. *Italic*, **bold**, and `monospace`.\n\nItemized lists look like:",
"{% for (item in items) %}",
"\n* {{ item }}\n",
"{% endfor %}"
]
However, when I run the regex above, I get:
[
"Paragraphs are separated by a blank line.\n2nd paragraph. *Italic*, **bold**, and `monospace`.\n\nItemized lists look like:",
"{% for (item in items) %}\n* {{ item }}",
"{% endfor %}\n"
]
As you can see the title part completely disappears. Furthermore, there seem to be some issues with the newline characters. Any ideas how I could solve this?

The problem wasn't in the regex but in the code I was using to split on it. I modified that code, shown below, so that it also returns the matched delimiters themselves.
import Foundation

extension NSRegularExpression {
    // Splits str on this regex and keeps the matched delimiters in the result.
    func split(_ str: String) -> [String] {
        // NSRange counts UTF-16 code units, so measure the string the same way
        // (str.characters.count counts Characters and is off for emoji etc.).
        let length = (str as NSString).length
        let range = NSRange(location: 0, length: length)
        //get locations of matches
        let matchingRanges: [NSRange] = self.matches(in: str, options: [], range: range).map { $0.range }
        //invert ranges - get ranges of non-matched pieces
        var pieceRanges: [NSRange] = []
        //add first range
        pieceRanges.append(NSRange(location: 0, length: (matchingRanges.isEmpty ? length : matchingRanges[0].location)))
        //add between splits ranges and last range
        for i in 0..<matchingRanges.count {
            let isLast = i + 1 == matchingRanges.count
            let startLoc = matchingRanges[i].location + matchingRanges[i].length
            let endLoc = isLast ? length : matchingRanges[i + 1].location
            pieceRanges.append(NSRange(location: startLoc, length: endLoc - startLoc))
        }
        //interleave: the text between two pieces is the delimiter that separated them
        //(the very first entry is an empty string, since nothing precedes the first piece)
        var pieces: [String] = []
        var previous = NSRange(location: 0, length: 0)
        for range in pieceRanges {
            let previousEnd = previous.location + previous.length
            let item = (str as NSString).substring(with: NSRange(location: previousEnd, length: range.location - previousEnd))
            pieces.append(item)
            let piece = (str as NSString).substring(with: range)
            pieces.append(piece)
            previous = range
        }
        return pieces
    }
}
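For comparison, here is a minimal JavaScript sketch of the same idea (splitting on the tag pattern while keeping the tags). It is not a port of the Swift extension above: it relies on the fact that JavaScript's split includes captured groups in its result. The names TAG_PATTERN and tokenize are made up for this illustration, and [\s\S] stands in for the (?s) flag of the original pattern.

```javascript
// Split a template on {{ ... }}, {% ... %} and {# ... #} tags, keeping the tags.
// The outer capturing group makes String.prototype.split return the matches too.
const TAG_PATTERN = /(\{\{[\s\S]*?\}\}|\{%[\s\S]*?%\}|\{#[\s\S]*?#\})/;

function tokenize(template) {
  // Empty strings appear when a tag starts or ends the input, or when two tags
  // are adjacent; they carry no content, so drop them.
  return template.split(TAG_PATTERN).filter((piece) => piece !== "");
}

const tokens = tokenize("# {{ title }}\n{{created_at}} {{created_location}}\n");
console.log(tokens);
// → ["# ", "{{ title }}", "\n", "{{created_at}}", " ", "{{created_location}}", "\n"]
```

Unlike the Swift version, this sketch filters out empty pieces, which is a design choice rather than a requirement.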

Related

Find and change cyrillic word with boundary in google scripts

The problem is that \b doesn't work with Russian and Ukrainian letters.
Here I try to find all occurrences of the word 'февраля' in the text, replace them with a temporary word, turn that into a link, and then change it back to 'февраля'.
function addLinks(word, siteurl) {
  var id = 'doc\'s ID';
  var doc = DocumentApp.openById(id);
  var body = doc.getBody();
  var tempword = 'ASDFDSGDDKDSL2';
  var searchText = "\\b"+word+"\\b";
  var element = body.findText(searchText);
  console.log(element);
  while (element) {
    var start = element.getStartOffset();
    var text = element.getElement().asText();
    text.replaceText(searchText, tempword);
    text.setLinkUrl(start, start + tempword.length - 1, siteurl);
    element = body.findText(searchText);
  }
  body.replaceText(tempword, word);
}
addLinks('февраля', 'example.com');
It works as it should if I change the Russian word 'февраля' to the English 'february':
addLinks('february', 'example.com');
I need a regular expression because, if I just search for 'февраля', the script will also apply the link to other words such as 'февралям', 'февралями', etc.
So the question is how to make this work.
The error "Exception: Invalid regular expression pattern" occurs with this pattern:
var searchText = "(?<=[\\s,.:;\"']|^)"+word+"(?=[\\s,.:;\"']|$)";
or this:
var searchText = "(^|\s)"+word+"(?=\s|$)";
and with some others.

Here is my solution:
function main() {
  addLinks('февраля', 'example.com');
}
function addLinks(word, url) {
  var doc = DocumentApp.getActiveDocument();
  var pgfs = doc.getParagraphs();
  var bound = '[^А-яЁё]'; // any character that is not a Russian letter
  // start/end: how many boundary characters to trim from each side of a match
  var patterns = [
    {regex: bound + word + bound, start: 1, end: 1}, // word inside of line
    {regex: '^' + word + bound, start: 0, end: 1},   // word at the start
    {regex: bound + word + '$', start: 1, end: 0},   // word at the end
    {regex: '^' + word + '$', start: 0, end: 0}      // word = line
  ];
  for (var pgf of pgfs) for (var pattern of patterns) {
    var location = pgf.findText(pattern.regex);
    while (location) {
      var start = location.getStartOffset() + pattern.start;
      var end = location.getEndOffsetInclusive() - pattern.end;
      pgf.editAsText().setLinkUrl(start, end, url);
      location = pgf.findText(pattern.regex, location);
    }
  }
}
It handles the word correctly when it is placed at the start or at the end of a line (or both), and it no longer produces the "Invalid regular expression pattern" error.
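The boundary-class idea can also be tried outside Google Docs. Below is a minimal plain-JavaScript sketch, not Apps Script: it uses a lookahead, which findText rejected in the question. LETTER and findWholeWord are names invented for this example; the letter class additionally lists the Ukrainian letters, which is an assumption about what should count as a word character, and word is assumed to contain no regex metacharacters.

```javascript
// Assumption: a "letter" is any Russian or Ukrainian Cyrillic letter.
const LETTER = "А-Яа-яЁёІіЇїЄєҐґ";

// Returns {start, end} offsets (end inclusive) of whole-word occurrences.
function findWholeWord(text, word) {
  // Group 1 is the left boundary: start of text or one non-letter character.
  // The lookahead checks the right boundary without consuming it, so two
  // occurrences separated by a single space are both found.
  const pattern = new RegExp(`(^|[^${LETTER}])${word}(?=[^${LETTER}]|$)`, "g");
  const hits = [];
  let m;
  while ((m = pattern.exec(text)) !== null) {
    const start = m.index + m[1].length; // skip the boundary character, if any
    hits.push({ start: start, end: start + word.length - 1 });
  }
  return hits;
}

console.log(findWholeWord("февраля и февралями, 5 февраля", "февраля"));
// → [ { start: 0, end: 6 }, { start: 23, end: 29 } ]
```

The inclusive end offset mirrors what setLinkUrl expects in the answer above; 'февралями' is skipped because the character after the candidate match is still a letter.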

how to create a filter to search for a word with special characters while writing in the input without special characters

This is my first post.
I am working with Quasar (Vue.js).
I have a list of jobs, and some of the words in this list contain special characters.
Example:
[ ...{ "libelle": "Agent hôtelier" },{"libelle": "Agent spécialisé / Agente spécialisée des écoles maternelles -ASEM-"},{ "libelle": "Agriculteur / Agricultrice" },{ "libelle": "Aide aux personnes âgées" },{ "libelle": "Aide de cuisine" },...]
In the input, I would like to find "Agent spécialisé" by typing either "agent specialise" (without special characters) or the original name. Both should work and autocomplete my input.
I just can't find what to add to my filter code.
My input:
<q-select
  filled
  v-model="model"
  use-input
  hide-selected
  fill-input
  input-debounce="0"
  :options="options"
  hint="Votre métier"
  style="width: 250px; padding-bottom: 32px"
  @filter="filterFn"
>
</q-select>
</div>
My code :
export default {
  props: ['data'],
  data() {
    return {
      jobList: json,
      model: '',
      options: [],
      stringOptions: []
    }
  },
  methods: {
    jsonJobsCall(e) {
      this.stringOptions = []
      json.forEach(res => {
        this.stringOptions.push(res.libelle)
      })
    },
    filterFn(val, update) {
      if (val === '') {
        update(() => {
          this.jsonJobsCall(val)
          this.options = this.stringOptions
        })
        return
      }
      update(() => {
        const regex = /é/i
        const needle = val.toLowerCase()
        this.jsonJobsCall(val)
        this.options = this.stringOptions.filter(
          v => v.replace(regex, 'e').toLowerCase().indexOf(needle) > -1
        )
      })
    },
  }
}
To sum up: I need a filter that lets me type with or without special characters and still find the jobs in my list whose names contain special characters.
I hope I was clear; please ask if anything is not.
Thank you very much.

I am not sure whether this will work for you, but you can use a regex to build the filter you need. For example, wherever the search text contains the letter "e", you want to match either "e" or "é" (if I understand correctly).
// Let's say we want to match "Agent spécialisé" with the given search text
let searchText = "Agent spe";
// Let's create a character map for matching characters
let characterMap = {
  e: ['e', 'é'],
  a: ['a', '#']
}
// Replace each mapped letter with a character class that contains all equivalent characters
// (inside [...] the characters are listed directly; a "|" there would be matched literally)
// Note: String.prototype.replaceAll is missing from older runtimes, so use replace with a global regex
Object.keys(characterMap).forEach((key) => {
  let replaceReg = new RegExp(`${key}`, "g")
  searchText = searchText.replace(replaceReg, `[${characterMap[key].join("")}]`);
})
// Here we create a regex to match
let reg = new RegExp(searchText + ".*")
console.log("Agent spécialisé".match(reg) != null);
Another approach is the reverse of this: normalize "Agent spécialisé" (that is, replace every é with a plain e, using a regex as above) and store the result in the object alongside the original text. Then search the normalized string instead of the original.
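The normalization approach can be sketched as follows. This is only an illustration under a few assumptions: fold, index and filterJobs are hypothetical names, the job list is a shortened sample of the one in the question, and accents are stripped with the standard String.prototype.normalize("NFD") followed by removal of combining marks rather than with a hand-written character map.

```javascript
// NFD splits "é" into "e" plus a combining accent; the regex then removes
// every combining mark in the range U+0300..U+036F.
function fold(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Shortened sample of the job labels from the question.
const jobs = [
  "Agent hôtelier",
  "Agent spécialisé / Agente spécialisée des écoles maternelles -ASEM-",
  "Aide aux personnes âgées",
];

// Fold each label once and keep the original next to it for display.
const index = jobs.map((label) => ({ label: label, folded: fold(label) }));

function filterJobs(query) {
  const needle = fold(query); // fold the query too, so both spellings work
  return index.filter((job) => job.folded.includes(needle)).map((job) => job.label);
}

console.log(filterJobs("agent specialise"));
console.log(filterJobs("Agent spécialisé"));
```

Both calls return the same single label, because the query and the stored labels go through the same folding step.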

Go Templates: two or more slices ranges

The following code works perfectly for outputting one slice inside the HomeTemplate.
main.go
type Item struct {
    Id   int
    Name string
    Type string
}
var tmpl = template.Must(template.ParseGlob("tmpl/*"))
func Index(w http.ResponseWriter, r *http.Request) {
    db := database.DbConn()
    selDB, err := product.ByID()
    if err != nil {
        panic(err.Error())
    }
    i := Item{}
    resItems := []Item{}
    for selDB.Next() {
        var id int
        var product_name, product_type string
        err = selDB.Scan(&id, &product_name, &product_type)
        if err != nil {
            panic(err.Error())
        }
        i.Id = id
        i.Name = product_name
        i.Type = product_type
        resItems = append(resItems, i)
    }
    tmpl.ExecuteTemplate(w, "HomeTemplate", resItems)
    // Close database connection
    defer db.Close()
}
In the template, the following code works fine:
{{ range . }}
{{ .Name }}<br />
{{ end }}
Why does something like this not work?
{{ range .resItems }}
{{ .Name }}<br />
{{ end }}
What do I need to do or change if I want to output two or more slices?
Thank you

First question: why doesn't range .resItems work?
In a template, . means the current item, similar to this in Java.
Inside an action such as range, . is the element of the current iteration.
Everywhere else, . is the value you passed to the ExecuteTemplate() method. In ExecuteTemplate(w, "HomeTemplate", resItems), . is resItems itself, so you cannot use .resItems: that would look for a field or key named resItems on the slice, which does not exist.
Second, if you have more slices to pass to the template, you can add all of them to a map, like this:
t := template.New("test")
t, _ = t.Parse(`
test range
{{range .first}} {{.}} {{end}}
{{range .second}} {{.}} {{end}}
`)
var res = make(map[string]interface{})
aa := []string{"first", "second"}
bb := []string{"123", "456"}
res["first"] = aa
res["second"] = bb
t.Execute(os.Stdout, res)
// output
test range
first second
123 456
I have two slices, aa and bb, which I add to a map and then pass to the template. In the template, . is the map, so .first yields aa and .second yields bb.
Hope this helps.

Mongodb Regex Query first 2 characters of the string

In one of my MongoDB collections, I have a date string in mm/dd/yyyy format. I want to query on the 'mm' part.
For example, given 05/20/2016 and 04/05/2015,
I want to take the first 2 characters of the string and query for '05'. The result should then contain only 05/20/2016.
How can I achieve this?
Thanks!

For a regex solution, the following will suffice:
var search = "05",
    rgx = new RegExp("^"+search); // equivalent to var rgx = /^05/;
db.collection.find({ "leave_start": rgx });
Testing:
var leave_start = "05/06/2016",
    test = leave_start.match(/^05/);
console.log(test); // ["05", index: 0, input: "05/06/2016"]
console.log(test[0]); // "05"
or
var search = "05",
    rgx = new RegExp("^"+search),
    leave_start = "05/12/2016";
var test = leave_start.match(rgx);
console.log(test); // ["05", index: 0, input: "05/12/2016"]
console.log(test[0]); // "05"
Another alternative is to use the aggregation framework: the $substr operator extracts the first 2 characters of the field, and the $match stage then filters documents on that new substring field (since MongoDB 3.4, $substr is a deprecated alias for $substrBytes):
db.collection.aggregate([
    {
        "$project": {
            "leave_start": 1,
            "monthSubstring": { "$substr": [ "$leave_start", 0, 2 ] }
        }
    },
    { "$match": { "monthSubstring": "05" } }
])
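If the month comes from user input, the regex variant is safer when the input is escaped before being turned into a pattern. The following is a small sketch under that assumption; escapeRegExp and monthPrefix are hypothetical helper names, and a plain array stands in for the collection so the snippet runs without a database.

```javascript
// Escape regex metacharacters so the input is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build an anchored prefix pattern such as /^05/ from a month string.
function monthPrefix(month) {
  return new RegExp("^" + escapeRegExp(month));
}

// In-memory stand-in for the date strings stored in the collection.
const dates = ["05/20/2016", "04/05/2015"];
const rgx = monthPrefix("05");
console.log(dates.filter((d) => rgx.test(d)));
// → ["05/20/2016"]
```

The same rgx value could be passed to find() exactly as in the answer above; the anchor ^ is what keeps 04/05/2015 out of the result.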

Replacement matching regex with anchor tag?

I have a problem when using Regex. I have an HTML document in which I want to create an anchor link wherever the text matches a pattern.
Example HTML:
Căn cứ Luật Tổ chức HĐND và UBND ngày 26/11/2003;
Căn cứ Nghị định số 63/2010/NĐ-CP ngày 08/6/2010 của Chính phủ về
kiểm soát thủ tục hành chính;
Căn cứ Quyết định số 165/2011/QĐ-UBND ngày 06/5/2011 của UBND tỉnh
ban hành Quy định kiểm soát thủ tục hành chính trên địa bàn tỉnh;
Căn cứ Quyết định số 278/2011/QĐ-UBND ngày 02/8/2011 của UBND tỉnh
ban hành Quy chế phối hợp thực hiện thống kê, công bố, công khai thủ
tục hành chính và tiếp nhận, xử lý phản ánh, kiến nghị của cá nhân, tổ
chức về quy định hành chính trên địa bàn tỉnh;
Xét đề nghị của Giám đốc Sở Công Thương tại Tờ trình số
304/TTr-SCT ngày 29 tháng 5 năm 2013
I want to match the document numbers in this text (shown in bold in the original post) and turn them into anchor links. If a number is already inside a link, it should be skipped. Link example: 63/2010/NĐ-CP
var matchLegals = new Regex(@"(?:[\d]+\/?)\d+\/[a-z\dA-Z_ÀÁÂÃÈÉÊÌÍÒÓÔÕÙÚĂĐĨŨƠàáâãèéêìíòóôõùúăđĩũơƯĂẠẢẤẦẨẪẬẮẰẲẴẶẸẺẼỀỀỂưăạảấầẩẫậắằẳẵặẹẻẽềềểỄỆỈỊỌỎỐỒỔỖỘỚỜỞỠỢỤỦỨỪễệỉịọỏốồổỗộớờởỡợụủứừỬỮỰỲỴÝỶỸửữựỳỵỷỹ\-]+", RegexOptions.Compiled);
var doc = new HtmlDocument();
doc.LoadHtml(htmlString);
var allElements = doc.DocumentNode.SelectSingleNode("//div[@class='main-content']").Descendants();
foreach (var node in allElements)
{
    var matches = matchLegals.Matches(node.InnerHtml);
    foreach (Match m in matches)
    {
        var k = m.Value;
        //dont know what to do
    }
}
How can I do this?
Many thanks.

I assume your regex pattern is OK and works. Another assumption is that node.InnerHtml doesn't contain any <a> tags already encompassing any of the potential matches.
In this case, it's as simple as doing something like this:
node.InnerHtml = Regex.Replace(node.InnerHtml, "[your pattern here]", "<a href='query=$&'>$&</a>");
...
doc.Save("output.html");
Note that you may need to adjust the href part; I'm not sure how your link should be built.

You can match the text and replace it:
<script>
  var s = '...';
  var pattern = /\d{2,3}\/\d{4}\/[a-zA-Z\-áàảãạăâắằấầặẵẫậéèẻẽẹêếềểễệóòỏõọôốồổỗộơớờởỡợíìỉĩịđùúủũụưứửữựÀÁÂÃÈÉÊÌÍÒÓÔÕÙÚĂĐĨŨƠƯĂẠẢẤẦẨẪẬẮẰẲẴẶẸẺẼÊỀỂỄỆỈỊỌỎỐỒỔỖỘỚỜỞỠỢỤỨỪỬỮỰỲỴÝỶỸửữựỵỷỹ]+/gi;
  // A single global replace wraps every match once, including repeated values;
  // replacing match by match could hit text already placed inside an href.
  s = s.replace(pattern, function (val) {
    // The opening tag must end with ">" (not "/>"), or the link text falls outside the anchor
    return '<a href="?key=' + val + '">' + val + '</a>';
  });
  document.write(s);
</script>

@Shaamaan thanks for your advice. After a few hours of coding, it works now:
var content = doc.DocumentNode.SelectSingleNode("//div[@class='main-content']");
var items = content.SelectNodes(".//text()[normalize-space(.) != '']");
foreach (HtmlNode node in items)
{
    // Skip text nodes without a match and text that is already inside a link
    if (!matchLegals.IsMatch(node.InnerText) || node.ParentNode.Name == "a")
    {
        continue;
    }
    var texts = node.InnerHtml.Trim();
    node.InnerHtml = matchLegals.Replace(texts, a => string.Format("<a href='/search?q={0}'>{0}</a>",a.Value));
}
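The "skip what is already linked" rule can also be sketched at string level in JavaScript. This is a rough illustration, not a replacement for the parser-based C# code above, which is more robust on real HTML. LEGAL_REF and linkify are hypothetical names, the pattern is a simplified number/year/code form (for example 63/2010/NĐ-CP) and not the poster's exact regex, and the /search?q= URL is borrowed from the snippet above.

```javascript
// Simplified reference pattern: digits, a 4-digit year, then a code made of
// letters, digits or hyphens. \p{L} matches any Unicode letter (needs the u flag).
const LEGAL_REF = /\d+\/\d{4}\/[\p{L}\d-]+/gu;

function linkify(html) {
  // Splitting with a capturing group keeps existing <a>...</a> blocks in the
  // result at the odd indexes; only the text between them is rewritten.
  return html
    .split(/(<a\b[^>]*>[\s\S]*?<\/a>)/i)
    .map((part, i) =>
      i % 2 === 1
        ? part
        : part.replace(
            LEGAL_REF,
            (ref) => `<a href="/search?q=${encodeURIComponent(ref)}">${ref}</a>`
          )
    )
    .join("");
}

console.log(linkify("Nghị định số 63/2010/NĐ-CP ngày 08/6/2010"));
console.log(linkify('<a href="/x">63/2010/NĐ-CP</a>')); // left unchanged
```

With this simplified pattern the date 08/6/2010 is not linked, because nothing follows the year as a third part.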

We Keep Coding
