Extracting text with imacros - imacros

SITUATION: I am having trouble extracting a specific piece of text from a website.
The template example on the iMacros website (http://wiki.imacros.net/Data_Extraction#Data_Extraction_and_Web_Scraping) for
extracting data with iMacros is as follows:
TAG POS=1 TYPE=SPAN ATTR=CLASS:bdytxt&&TXT:* EXTRACT=HTM
However, in the HTML below, the element containing text1 has no class that I could specify in the ATTR parameter. I am specifically trying to extract text1 from this example:
//This code is within an html page
<div class="class1">
<img class="class2" src="...">
<strong>
text1
</strong>
<br>
<small>text2</small>
<small class="class3">
<br>
<em>text3:</em>
<span>
<a href="..." class="class4">
<small style="color: #aaa; font-size: 80%">text4</small>
text5
</a>
</span>
<br>
<em>text6</em>
text7,
text8
</small>
</div>
What I have tried:
When I record in "Experimental event recording mode" and click on text1, I get the following code:
EVENT TYPE=CLICK SELECTOR="HTML>BODY>DIV:nth-of-type(5)>DIV>STRONG>A" BUTTON=0
I tested to see if the SELECTOR would work in the EXTRACT code like so:
TAG POS=1 TYPE=SPAN SELECTOR="HTML>BODY>DIV:nth-of-type(5)>DIV>STRONG>A" EXTRACT=TXT
but as you can imagine, it didn't.
QUESTION: Does anyone know how I can extract text1 from the above situation?

Well, there can be several ways to extract this text. For example:
TAG POS=1 TYPE=IMG ATTR=CLASS:"class2"
TAG POS=R1 TYPE=A ATTR=* EXTRACT=TXT
Or, if you use 'iMacros for Chrome', here is a solution that uses a selector:
TAG SELECTOR="HTML>BODY>DIV:nth-of-type(5)>DIV>STRONG>A" EXTRACT=TXT

Related

Regex to get string between html tags: stop selection at the first match of closing tag

I want to get html between a ul tag. It resembles the following:
<nav id="navi_list_box" class="local_nav favorite">
<ul id="navi_list">
<li class="active">Foo</li>
<li class="">Bar</li>
<li class="">Baz</li>
</ul>
<ul>
<li>Just to make sure this won't be selected</li>
</ul>
</nav>
I want to get html of ul#navi_list.
This is what I have done so far:
<ul[^>]*id(\s)*=('|")navi_list(\s)*('|")>(\n|\r|(\n\r)|.)*(</ul>)
It selects html of #navi_list but also html of the second ul tag.
How can I stop selection before the second ul tag?
I found the solution.
<ul[^>]*id(\s)*=('|")navi_list(\s)*('|")>((.|\n|\r|(\n\r))*?)</ul>
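More generally, the following regex selects everything between an opening tag and the first matching closing tag. The lazy quantifier *? is what stops the match at the first </TAG>, so nested tags of the same name are not handled: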
<TAG\b[^>]*>((.|\n|\r|(\n\r))*?)</TAG>
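To illustrate why the lazy quantifier fixes the over-selection, here is a minimal sketch in Python's re module, run against the markup from the question. The patterns are simplified variants of the ones above (re.DOTALL replaces the explicit newline alternation), and the variable names are only illustrative. Regex flavors differ between tools, so treat this as a way to check the idea rather than a drop-in pattern.

```python
import re

# Markup copied from the question above.
html = """<nav id="navi_list_box" class="local_nav favorite">
<ul id="navi_list">
<li class="active">Foo</li>
<li class="">Bar</li>
<li class="">Baz</li>
</ul>
<ul>
<li>Just to make sure this won't be selected</li>
</ul>
</nav>"""

# Shared opening part: a <ul> tag whose id is navi_list.
opening = r"""<ul[^>]*id\s*=\s*['"]navi_list['"][^>]*>"""

# Greedy: .* runs to the LAST </ul> in the input.
greedy = re.compile(opening + r"(.*)</ul>", re.DOTALL)
# Lazy: .*? stops at the FIRST </ul> after the opening tag.
lazy = re.compile(opening + r"(.*?)</ul>", re.DOTALL)

greedy_inner = greedy.search(html).group(1)
lazy_inner = lazy.search(html).group(1)

print("greedy spills into second list:", "Just to make sure" in greedy_inner)
print("lazy spills into second list:", "Just to make sure" in lazy_inner)
```

The greedy version reproduces the original problem, while the lazy version stops before the second ul.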

Imacros Extract TXT after word

I really need help getting the right code for this case:
<tr class="detail-middle">
<td colspan="4">
<span class="font-bold">Address</span>
<p>
<strong>Orin Fade</strong>
<br>
19 rue marciere
<br>
Lyon
<br>
Lyon
<br>
France
<br>
Phone Number: +33 0478372730
</p>
</td>
</tr>
I use this iMacros code:
TAG POS=1 TYPE=SPAN ATTR=TXT:"Address" EXTRACT=TXT
but I need the text that comes after the "Address" text. Can iMacros extract the text after a given word?
Thank you
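You can use relative positioning in your case. The first TAG sets an anchor on the "Address" span, and POS=R1 then selects the first P element after that anchor: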
TAG POS=1 TYPE=SPAN ATTR=TXT:"Address"
TAG POS=R1 TYPE=P ATTR=* EXTRACT=TXT

iMacros: How can I click a link with a specific attribute?

In an iMacros script, how can you trigger a click on a link with a specific attribute? In this case, the link I would like to have clicked has a class of "i-project":
<div data-explore-index="1" >
<div class="i-project-card ">
<a href="/xxxxxxxxxxxxxxxxxx">
<span ></span>
</a>
<a href="blablabla" class="i-project">
<img src="https://blabla.jpg">
</a>
</div>
</div>
You should be able to select this link based upon its CLASS attribute:
TAG POS=1 TYPE=A ATTR=CLASS:i-project

Imacros, search source

I am not an expert on the iMacros SEARCH command. I am trying to find some text in the page source so that I can extract it:
<div id='keywordsDiv' name='keywordsDiv' class='r-sidebar'>
<dl class="list normal-text">
<dt class="key">Category</dt>
<dd class="value"><a class="black" href="http://www.abcd">abcd</a> </dd>
<dt class="key">Style</dt>
<dd class="value"><a class="black" href="http://www.def.com/">def</a> </dd>
<dt class="key">Location</dt>
<dd class="value"><a class="black" href="http://www.ghi.com/">GHI</a> </dd>
<dt class="key">Keywords</dt>
<dd class="value">
</dd>
</dl>
</div>
How can I extract text from the source of the div with id=keywordsDiv?
Thank you
I've used the SEARCH command. It uses regex and has worked well for me searching source code. It can really be powerful in automating dynamic pages.
Here is a link:
http://wiki.imacros.net/SEARCH
*Note: I've run into issues with complex regex. I think there are a few flavors of regex and iMacros uses a specific one, plus there are regex limitations.
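Because of those flavor differences, it can help to prototype a pattern outside iMacros first. The following is a minimal sketch in Python's re module (not the iMacros engine) against the markup from the question. The pattern and the names pair and pairs are assumptions made for illustration, and the pattern would likely need adapting before use with SEARCH.

```python
import re

# Page source copied from the question above.
source = """<div id='keywordsDiv' name='keywordsDiv' class='r-sidebar'>
<dl class="list normal-text">
<dt class="key">Category</dt>
<dd class="value"><a class="black" href="http://www.abcd">abcd</a> </dd>
<dt class="key">Style</dt>
<dd class="value"><a class="black" href="http://www.def.com/">def</a> </dd>
<dt class="key">Location</dt>
<dd class="value"><a class="black" href="http://www.ghi.com/">GHI</a> </dd>
<dt class="key">Keywords</dt>
<dd class="value">
</dd>
</dl>
</div>"""

# Group 1: the key inside <dt class="key">.
# Group 2: the link text inside the following <dd>, if a link is present.
pair = re.compile(
    r'<dt class="key">([^<]*)</dt>\s*'
    r'<dd class="value">\s*(?:<a[^>]*>([^<]*)</a>)?'
)

# findall yields an empty string for an optional group that did not match.
pairs = {key: value for key, value in pair.findall(source)}

for key, value in pairs.items():
    print(f"{key}: {value!r}")
```

Keeping the pattern to simple character classes such as [^<]* avoids the more engine-specific features that tend to behave differently across flavors.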
TAG POS=1 TYPE=DIV ATTR=ID:keywordsDiv EXTRACT=TXT
Try this.

imacro on tumblr posts for adding multiple tags

I have been using iMacros to input multiple tags when posting photos to save time. They have recently updated the site and I cannot figure out how to get iMacros to enter multiple tags.
When recording a macro, this is the code iMacros comes up with:
TAG POS=11 TYPE=INPUT:TEXT FORM=NAME:NoFormName ATTR=* CONTENT=foo,
The trailing comma is needed to start a new tag, but the macro neither starts a new tag nor records the content correctly.
I have looked at the markup where the tags are entered, and this is it:
<section class="tag_editor" style="display: block;">
<div class="tags">
<input class="post_tags" type="text" value="" style="display: none;" name="post[tags]">
<div class="editor_wrapper">
<input class="editor borderless" type="text">
</div>
</div>
</section>
It looks like the input I need is the one with class "editor borderless" inside editor_wrapper, and I think I need to reference it in the TAG command in place of FORM=NAME:NoFormName and ATTR=*. I have tried different combinations, yet iMacros will not fill in the tags for me. The new post feature on Tumblr is a pop-up window that looks Ajax-driven.
The old macro that worked before the site update looked like this. I am not sure if it will be of any help:
TAG POS=1 TYPE=INPUT:TEXT FORM=ACTION:/blog/foobar/new/photo ATTR=ID:tag_editor_input CONTENT=foo,
WAIT SECONDS=.3
TAG POS=1 TYPE=INPUT:TEXT FORM=ACTION:/blog/foobar/new/photo ATTR=ID:tag_editor_input CONTENT=bar,
WAIT SECONDS=.3
TAG POS=1 TYPE=INPUT:TEXT FORM=ACTION:/blog/foobar/new/photo ATTR=ID:tag_editor_input CONTENT=foo,
Looking for help getting this to work again. It saves me a ton of time to have these tags auto filled for me.
Try this to fill the second input:
TAG POS=1 TYPE=INPUT:TEXT ATTR=CLASS:editor* CONTENT=foo,
Let me know if it works.