I have a string like this in PHP
<li>bla bla bla bla</li>
<li>hello hello</li>
<li>the brown fox didnt jump</li>
.
.
.
<li>aaaaaaarghhhh</li>
I want to get a string which contains the first X li elements (li tags included in the resulting string):
<li>.....first one....</li>
<li>.....second one....</li>
<li>.................</li>
<li>.....X one....</li>
How can this be done with a regex or something else?
I could remove the
</li>
then explode by
<li>
and take the first X elements of the array, then add the li tags back at the beginning and end of each element, but I think that's too dirty...
Any better ideas?
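See if this regex works for you (replace the number 2 with the required number of li elements). The \s* allows for whitespace such as line breaks between the items, which your sample string appears to have:
((?:<li>.*?<\/li>\s*){2})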
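For comparison, here is a minimal sketch of the same idea in Python (used only for illustration; the function name first_li_items is made up). It assumes the items are flat, i.e. no li nested inside another li, and that the tags are written exactly as <li> and </li>:

```python
import re

def first_li_items(html: str, count: int) -> str:
    """Return the first `count` <li>...</li> elements, tags included."""
    # Lazy match per item; DOTALL lets a single item span several lines.
    items = re.findall(r"<li>.*?</li>", html, flags=re.DOTALL)
    return "\n".join(items[:count])

sample = (
    "<li>bla bla bla bla</li>\n"
    "<li>hello hello</li>\n"
    "<li>the brown fox didnt jump</li>"
)
print(first_li_items(sample, 2))
```

Collecting every item and slicing the list avoids hard-coding the count into the pattern.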
How about parsing it as actual DOM elements using PHP Simple HTML DOM Parser?
You can download the script from here: http://sourceforge.net/projects/simplehtmldom/files/
If you load that script into your current script like this:
include_once("simple_html_dom.php");
Then it's as simple as:
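// Parse the string into a DOM object first; find() does not work on a plain string.
$html = str_get_html("<li>bla bla bla bla</li>, etc, etc ......");
$x = 2; // number of li elements to keep
$list_array = array();
foreach($html->find('li') as $element) {
    // outertext keeps the surrounding <li> tags; innertext would drop them.
    $list_array[] = $element->outertext;
}
$result = implode("\n", array_slice($list_array, 0, $x));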
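The parser-based approach can also be sketched with nothing but a standard library. The Python version below is only an illustration: the class name LiCollector is made up, and it assumes each li contains plain text only (nested tags inside an item would be dropped):

```python
from html.parser import HTMLParser

class LiCollector(HTMLParser):
    """Collect the first `limit` <li> elements of a document."""

    def __init__(self, limit: int):
        super().__init__()
        self.limit = limit
        self.items = []
        self._current = None  # text chunks of the <li> being read, if any

    def handle_starttag(self, tag, attrs):
        # Start collecting only while fewer than `limit` items were found.
        if tag == "li" and len(self.items) < self.limit:
            self._current = []

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._current is not None:
            self.items.append("<li>" + "".join(self._current) + "</li>")
            self._current = None

collector = LiCollector(limit=2)
collector.feed("<li>bla bla bla bla</li>\n<li>hello hello</li>\n<li>aaaaaaarghhhh</li>")
print("\n".join(collector.items))
```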
I have a list of terms which I want to match as follows:
final List _emotions = [
'~~wink~~',
'~~bigsmile~~',
'~~sigh~~',
];
And a second list of replacements:
final List _replacements = [
'0.gif',
'1.gif',
'2.gif',
];
So that if I have the text:
var text = "I went to the store and got a ~~bigsmile~~";
I could have it replace the text as
I went to the store and got a <img src="1.gif" />
So essentially, I was thinking of running a regex replace on my text variable, but the search pattern would be based on my _emotions List.
Forming the replacement text should be easy, but I'm not sure how I could use the list as the basis for the search terms.
How is this possible in Dart?
You need to merge the two string lists into a single Map<String, String> that serves as a dictionary (make sure the _emotions strings are in lower case, since you want case-insensitive matching), and then join the _emotions strings into a single alternation-based pattern.
Then call String#replaceAllMapped and, for each match, look up the replacement for the matched emotion in the map.
Note that you can shorten the pattern if you factor out the ~~ delimiters (see the code snippet below). You might also apply more advanced techniques for the vocabulary, like regex tries (see my YT video on this topic).
final List<String> _emotions = [
'wink',
'bigsmile',
'sigh',
];
final List<String> _replacements = [
'0.gif',
'1.gif',
'2.gif',
];
Map<String, String> map = Map.fromIterables(_emotions, _replacements);
String text = "I went to the store and got a ~~bigsmile~~";
RegExp regex = RegExp("~~(${_emotions.join('|')})~~", caseSensitive: false);
print(text.replaceAllMapped(regex, (m) => '<img src="${map[m[1]?.toLowerCase()]}" />'));
Output:
I went to the store and got a <img src="1.gif" />
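The same dictionary-plus-alternation technique is not specific to Dart. As a rough sketch in Python (for illustration only; replace_emotions is a made-up name), with each term escaped in case it ever contains regex metacharacters:

```python
import re

emotions = ["wink", "bigsmile", "sigh"]
replacements = ["0.gif", "1.gif", "2.gif"]

# Dictionary from lower-case emotion name to image file.
lookup = dict(zip(emotions, replacements))

# One alternation pattern: ~~(wink|bigsmile|sigh)~~, matched case-insensitively.
pattern = re.compile(
    "~~(" + "|".join(re.escape(e) for e in emotions) + ")~~",
    re.IGNORECASE,
)

def replace_emotions(text: str) -> str:
    # The callback looks up the replacement for whichever emotion matched.
    return pattern.sub(
        lambda m: '<img src="%s" />' % lookup[m.group(1).lower()],
        text,
    )

print(replace_emotions("I went to the store and got a ~~bigsmile~~"))
```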
I have a String like this:
val rawData = "askljdld<a>content to extract</a>lkdsjkdj<a>more content to extract</a>sdkdljk"
and I want to extract the content between the <a> tags.
I've tried this, but the end part of the regex is not working as I expected:
val regex = "<a>(.*)</a>".r
for(m <- regex.findAllIn(rawData)){
println(m)
}
the output is:
<a>content to extract</a>lkdsjkdj<a>more content to extract</a>
I understand what's happening: the regex finds the first <a> and the last </a>.
How can I have an iterator with the two entries?
<a>content to extract</a>
<a>more content to extract</a>
thanks in advance
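It's simple: use "<a>(.*?)</a>"
.*? is a lazy (non-greedy) quantifier: it matches as few characters as possible, so the match stops at the first </a> instead of the last one.
Your regex is greedy, so it runs on to the last </a>. You should use <a>(.*?)</a> instead: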
val rawData = "askljdld<a>content to extract</a>lkdsjkdj<a>more content to extract</a>sdkdljk"
val regex = "<a>(.*?)</a>".r
regex.findAllIn(rawData).foreach(println)
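To see the greedy and lazy behaviour side by side, here is a small sketch in Python (chosen only for illustration; the quantifier semantics are the same as in the Scala patterns above):

```python
import re

raw_data = (
    "askljdld<a>content to extract</a>lkdsjkdj"
    "<a>more content to extract</a>sdkdljk"
)

# Greedy: .* grabs as much as possible, so one match spans both tags.
greedy = re.findall(r"<a>.*</a>", raw_data)

# Lazy: .*? grabs as little as possible, so each tag pair is its own match.
lazy = re.findall(r"<a>.*?</a>", raw_data)

print(len(greedy), len(lazy))
for match in lazy:
    print(match)
```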
How can I get the contents between two delimiters using a regular expression? For example, I want to get the text between two | characters. Given this input:
|This is the text I want|
it should return this:
This is what I want
I already tried /^|(.*)$|/, but that returns This is what I want | instead of just This is what I want (shouldn't have the | at the end)
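Try escaping the pipes: /\|(.*?)\|/. An unescaped | is the alternation operator, so it has to be escaped to match a literal pipe character.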
For example, using JavaScript:
var s = '| This is what I want |';
var m = s.match(/\|\s*(.*?)\s*\|/);
m[1]; // => "This is what I want"
See example here:
https://regex101.com/r/qK6aG2/2
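A quick sketch in Python (for illustration only) of why the unescaped version misbehaves: the pattern ^|(.*)$| is read as three alternatives, and the first one, ^, already succeeds with an empty match at the start of the string.

```python
import re

s = "| This is what I want |"

# Unescaped: "^" OR "(.*)$" OR "" -- the first alternative matches the empty string.
unescaped = re.search(r"^|(.*)$|", s)

# Escaped: literal pipes, optional whitespace, lazy capture of the content.
escaped = re.search(r"\|\s*(.*?)\s*\|", s)

print(repr(unescaped.group(0)), unescaped.group(1))
print(escaped.group(1))
```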
I'm a beginner in regular expressions and I want to cut out some text placed between two other words. I'm using Qt to do it. An example:
<li class="wx-feels">
Feels like <i><span class="wx-value" itemprop="feels-like-temperature-fahrenheit">55</span>°</i>
</li>
I want to get
Feels like <i><span class="wx-value" itemprop="feels-like-temperature-fahrenheit">55</span>°
from the code above, and especially the number 55. My idea was to cut the whole line out of the text first and then search it for numbers, but I cannot recover it from the whole text.
I typed something like this:
QRegExp rx("(Feels like <i><span class=\"wx-value\" itemprop=\"feels-like-temperature-fahrenheit\">)[0-9]{1,3}(</span>°</i>)");
QStringList list;
list = all.split(rx);
Here all is the whole text, but the list contains only the substrings I didn't want. Is there a possibility to split a QString into three pieces?
First - text at the beginning (which I don't want)
Second - wanted text
Third - rest of text?
Description
This regex will collect the inner string within the li tags where the li tag has a class of wx-feels, it'll also capture the numeric value inside the span tag.
<li\b[^>]*\bclass=(["'])wx-feels\1[^>]*?>(.*?\bitemprop=(['"])feels-like-temperature-fahrenheit\3[^>]*>(\d+).*?)<\/li>
Groups
Group 0 gets the entire string including the open and close LI tags
Group 1 gets the open quote for the LI class attribute. This allows us to find the correct close quote after the value
Group 2 gets the string directly inside the LI tag
Group 3 gets the open quote for the itemprop attribute
Group 4 gets the digits from the span inner text
Example
This PHP example is simply to show how the regex works.
<?php
$sourcestring="<li class=\"wx-feels\">
Feels like <i><span class=\"wx-value\" itemprop=\"feels-like-temperature-fahrenheit\">55</span>°</i>
</li>";
preg_match('/<li\b[^>]*\bclass=(["\'])wx-feels\1[^>]*?>(.*?\bitemprop=([\'"])feels-like-temperature-fahrenheit\3[^>]*>(\d+).*?)<\/li>/ims',$sourcestring,$matches);
echo "<pre>".print_r($matches,true);
?>
$matches Array:
(
[0] => <li class="wx-feels">
Feels like <i><span class="wx-value" itemprop="feels-like-temperature-fahrenheit">55</span>°</i>
</li>
[1] => "
[2] =>
Feels like <i><span class="wx-value" itemprop="feels-like-temperature-fahrenheit">55</span>°</i>
[3] => "
[4] => 55
)
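The three pieces asked about (text before, wanted text, rest) can be recovered from the match offsets of group 2. A minimal sketch in Python, used here only for illustration with the same pattern as above; in Qt you would use the equivalent position and captured-text calls of whichever regex class you work with:

```python
import re

pattern = re.compile(
    r"""<li\b[^>]*\bclass=(["'])wx-feels\1[^>]*?>(.*?\bitemprop=(['"])feels-like-temperature-fahrenheit\3[^>]*>(\d+).*?)</li>""",
    re.IGNORECASE | re.DOTALL,
)

# \u00b0 is the degree sign from the sample markup.
source = (
    '<li class="wx-feels">\n'
    'Feels like <i><span class="wx-value" '
    'itemprop="feels-like-temperature-fahrenheit">55</span>\u00b0</i>\n'
    "</li>"
)

m = pattern.search(source)
before = source[: m.start(2)]   # text in front of the wanted part
wanted = m.group(2)             # content directly inside the li
rest = source[m.end(2):]        # everything after the wanted part
number = int(m.group(4))        # the temperature digits

print(number)
```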
Disclaimer
Parsing HTML with a regex can be problematic because of the high number of edge cases. If you are in control of the input text or if it's always as basic as your sample, then you should have no problem.
If Qt has one, I recommend using an HTML parsing tool to capture this data.
Okay, regex ninjas. I'm trying to devise a pattern to add hyperlinks to endnotes in an ePub ebook XHTML file. The problem is that numbering restarts within each chapter, so I need to add a unique identifier to the anchor name in order to hash link to it.
Given a (much simplified) list like this:
<h2>Introduction</h2>
<p> 1 Endnote entry number one.</p>
<p> 2 Endnote entry number two.</p>
<p> 3 Endnote entry number three.</p>
<p> 4 Endnote entry number four.</p>
<h2>Chapter 1: The Beginning</h2>
<p> 1 Endnote entry number one.</p>
<p> 2 Endnote entry number two.</p>
<p> 3 Endnote entry number three.</p>
<p> 4 Endnote entry number four.</p>
I need to turn it into something like this:
<h2>Introduction</h2>
<a name="endnote-introduction-1"></a><p> 1 Endnote entry number one.</p>
<a name="endnote-introduction-2"></a><p> 2 Endnote entry number two.</p>
<a name="endnote-introduction-3"></a><p> 3 Endnote entry number three.</p>
<a name="endnote-introduction-4"></a><p> 4 Endnote entry number four.</p>
<h2>Chapter 1: The Beginning</h2>
<a name="endnote-chapter-1-the-beginning-1"></a><p> 1 Endnote entry number one.</p>
<a name="endnote-chapter-1-the-beginning-2"></a><p> 2 Endnote entry number two.</p>
<a name="endnote-chapter-1-the-beginning-3"></a><p> 3 Endnote entry number three.</p>
<a name="endnote-chapter-1-the-beginning-4"></a><p> 4 Endnote entry number four.</p>
Obviously there will need to be a similar search in the actual text of the book, where each endnote will be linked to endnotes.xhtml#endnote-introduction-1 etc.
The biggest obstacle is that each match search begins AFTER the previous search ends, so unless you use recursion, you can't match the same bit (in this case, the title) for more than one entry. My attempts with recursion have so far yielded only endless loops, however.
I'm using TextWrangler's grep engine, but if you have a solution in a different editor (such as vim), that's fine too.
Thanks!
A bit of awk might do the trick:
Create the following script (I've named it add_endnote_tags.awk):
/^<h2>/ {
i = 0;
chapter_name = $0;
gsub(/<[^>]+>/, "", chapter_name);
chapter_name = tolower(chapter_name);
# Keep digits too, so "Chapter 1: The Beginning" becomes "chapter-1-the-beginning".
gsub(/[^a-z0-9]+/, "-", chapter_name);
print;
}
/^<p>/ {
i = i + 1;
printf("<a name=\"endnote-%s-%d\"></a>%s\n", chapter_name, i, $0);
}
$0 !~ /^<h2>/ && $0 !~ /^<p>/ {
print;
}
And then use it to parse your file:
awk -f add_endnote_tags.awk < source_file.xml > dest_file.xml
Hope that helps. If you are on Windows, you might need to install awk, either by installing Cygwin with the awk package or by downloading gawk for Windows.
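If awk is not available, the same line-by-line logic is easy to port. Here is a rough sketch in Python (illustration only; slugify and add_endnote_anchors are made-up names). Like the awk script, it assumes every h2 and p starts at the beginning of its own line, and it numbers the notes with a counter rather than reading the number from the text:

```python
import re

def slugify(heading_line: str) -> str:
    """Turn '<h2>Chapter 1: The Beginning</h2>' into 'chapter-1-the-beginning'."""
    text = re.sub(r"<[^>]+>", "", heading_line).strip().lower()
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")

def add_endnote_anchors(lines):
    out = []
    chapter = ""
    counter = 0
    for line in lines:
        if line.startswith("<h2>"):
            # New chapter: remember its slug and restart the numbering.
            chapter = slugify(line)
            counter = 0
            out.append(line)
        elif line.startswith("<p>"):
            counter += 1
            out.append('<a name="endnote-%s-%d"></a>%s' % (chapter, counter, line))
        else:
            out.append(line)
    return out

sample = [
    "<h2>Chapter 1: The Beginning</h2>",
    "<p> 1 Endnote entry number one.</p>",
    "<p> 2 Endnote entry number two.</p>",
]
print("\n".join(add_endnote_anchors(sample)))
```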
I think this would be difficult to accomplish in a text editor as it requires a two-step process. First you need to section the file into chapters, then you need to process the contents of each chapter. Assuming that "endnote paragraphs" (which is where you wish to add the anchors) are defined as paragraphs whose first word is an integer, this PHP script will do what you need.
<?php
$data = file_get_contents('testdata.txt');
$output = processBook($data);
file_put_contents('testdata_out.txt', $output);
echo $output;
// Main function to process book adding endnote anchors.
function processBook($text) {
$re_chap = '%
# Regex 1: Get Chapter.
<h2>([^<>]+)</h2> # $1: Chapter title.
( # $2: Chapter contents.
.+? # Contents are everything up to
(?=<h2>|$) # next chapter or end of file.
) # End $2: Chapter contents.
%six';
// Match and process each chapter using callback function.
$text = preg_replace_callback($re_chap, '_cb_chap', $text);
return $text;
}
// Callback function to process each chapter.
function _cb_chap($matches) {
// Build ID from H2 title contents.
// Trim leading and trailing ws from title.
$baseid = trim($matches[1]);
// Strip all non-space, non-alphanums (from the trimmed title, not the raw match).
$baseid = preg_replace('/[^ A-Za-z0-9]/', '', $baseid);
// Append prefix and convert runs of spaces to a single dash.
$baseid = 'endnote-'. preg_replace('/ +/', '-', $baseid);
// Convert to lowercase.
$baseid = strtolower($baseid);
$text = preg_replace(
'/(<p>\s*)(\d+)\b/',
'<a name="'. $baseid .'-$2"></a>$1$2',
$matches[2]);
return '<h2>'. $matches[1] .'</h2>'. $text;
}
?>
This script correctly processes your example data.
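The two-step structure (match each chapter, then rewrite the notes inside it from a callback) can be sketched in other languages as well. Below is a rough Python version for illustration only; process_book and _process_chapter are made-up names, and it makes the same assumption that an endnote paragraph starts with an integer:

```python
import re

# Step 1: one match per chapter (title, then everything up to the next h2 or the end).
CHAPTER_RE = re.compile(r"<h2>([^<>]+)</h2>(.+?)(?=<h2>|\Z)", re.DOTALL | re.IGNORECASE)
# Step 2: a paragraph whose first word is an integer.
NOTE_RE = re.compile(r"(<p>\s*)(\d+)\b")

def _process_chapter(match):
    title = match.group(1)
    # Build the id: trim, lowercase, drop punctuation, spaces to dashes.
    base = re.sub(r"[^ a-z0-9]", "", title.strip().lower())
    base = "endnote-" + re.sub(r" +", "-", base)
    body = NOTE_RE.sub(
        lambda m: '<a name="%s-%s"></a>%s' % (base, m.group(2), m.group(0)),
        match.group(2),
    )
    return "<h2>%s</h2>%s" % (title, body)

def process_book(text: str) -> str:
    return CHAPTER_RE.sub(_process_chapter, text)

book = (
    "<h2>Introduction</h2>\n"
    "<p> 1 Endnote entry number one.</p>\n"
    "<h2>Chapter 1: The Beginning</h2>\n"
    "<p> 2 Endnote entry number two.</p>\n"
)
print(process_book(book))
```

Because the chapter title is captured once per chapter match, the inner replacement can reuse it for every note without any recursion in the pattern itself.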