Extract multiple occurrences in JSON file with iMacros - regex

I'm using iMacros for Firefox and want to extract some IDs from a JSON file. The JSON file looks like this:
"count":0,"id":"12345","time"
blabla
"count":0,"id":"12346","time"
The code I'm using in iMacros is:
URL GOTO=https://www.jsonurl.com
SEARCH SOURCE=REGEXP:"\"id\":\"(.[^\"]*)\"" EXTRACT="$1"
PROMPT {{!EXTRACT}}
SAVEAS TYPE=EXTRACT FOLDER=* FILE=*
With this code, it is only extracting 12345 from the above JSON example. How can I edit the code to extract all occurrences of id?

Sorry, no can do :(
Global, iterative matching is currently not supported, so only the first match on the page can be found and extracted.
Source (iMacros Wiki)

I would use a JavaScript solution.
var macro;
macro ="CODE:";
macro +="URL GOTO=https://www.jsonurl.com"+"\n";
macro +="TAG POS=1 TYPE=DIV ATTR=CLASS:what_ever_you_are_extracting EXTRACT=HTM"+"\n";
iimPlay(macro)
var text=iimGetLastExtract();
text=text.split('"id":"')[1];
text=text.split('",')[0];
text=text.trim();
alert(text);
Edit:
The command
TAG POS=1 TYPE=HTML ATTR=CLASS:* EXTRACT=HTM
Extracts everything on the page.
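To collect every id rather than only the first one, the extracted text can be scanned with a global regular expression in JavaScript. The following is a minimal sketch, not part of the original answer: extractAllIds is a hypothetical helper name, and it assumes the ids appear in the extracted text exactly as "id":"..." (if the extract comes back HTML-escaped, the pattern would need adjusting). In an iMacros .js script you would pass it the result of iimGetLastExtract().

```javascript
// Hypothetical helper: returns every value that follows "id":" in the text.
function extractAllIds(text) {
  var pattern = /"id":"([^"]*)"/g; // the g flag lets exec() walk through all matches
  var ids = [];
  var match;
  while ((match = pattern.exec(text)) !== null) {
    ids.push(match[1]); // capture group 1 holds the id value
  }
  return ids;
}

// Sample input taken from the question.
var sampleJson = '"count":0,"id":"12345","time"\nblabla\n"count":0,"id":"12346","time"';
var allIds = extractAllIds(sampleJson);
console.log(allIds.join("\n"));
```

The loop is used instead of newer helpers such as matchAll because the JavaScript engine behind older iMacros builds may not support them.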

Related

Picking data from Excel CSV using iMacros

I want to automate the submission of deleted pages in Google Search Console for a website that I manage.
Here is what I wrote in iMacros for Chrome (I've replaced my-domain-name.com and my-file.csv with the real names, of course):
VERSION BUILD=1011 RECORDER=CR
SET !DATASOURCE C:\Users\MY-USERNAME\Desktop\my-file.csv
SET !DATASOURCE_COLUMNS 1
SET !TIMEOUT_STEP 0
SET !ERRORIGNORE YES
SET !EXTRACT_TEST_POPUP YES
SET !LOOP 1
TAB T=1
URL GOTO=https://search.google.com/search-console/removals?resource_id=https://www.my-domain-name.com/&hl=fr&utm_source=wmx&utm_medium=deprecation-pane&utm_content=url-removal
TAG POS=2 TYPE=SPAN ATTR=TXT:Nouvelle<SP>demande
WAIT SECONDS=4
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:newremovalform ATTR=NAME:urlt CONTENT={{!COL1}}
TAG POS=4 TYPE=SPAN ATTR=TXT:Suivante
TAG POS=4 TYPE=SPAN ATTR=TXT:Envoyer<SP>la<SP>demande
WAIT SECONDS=15
TAG POS=1 TYPE=INPUT:SUBMIT FORM=ACTION:/webmasters/tools/removals-submit-ac?hl=fr&siteUrl=https://www.my-domain-name.com/ ATTR=NAME:next
TAG POS=1 TYPE=INPUT:SUBMIT FORM=ID:the-form ATTR=ID:submit-button
WAIT SECONDS=3
But when I play the macro, I immediately get this error message:
SyntaxError: wrong format of SET command at line 2
Thanks in advance for your help.
Best regards,
Eva.
You didn't mention your FCI (Full Config Info; check my profile on how to ask questions about the iMacros tag "a bit correctly"...), but you probably have some space(s) in the path for your DataSource, I reckon...
=> Probable FCI:
iMacros for CR v10.1.1, CR93/94(...?), Win7/10/11(...?).
=> You need to replace the spaces with <SP>...
... Or you can enclose the whole path in single or double quotes... (But the backslashes then need to be escaped with another backslash...)
A single quote in your path (including the filename) would also cause a problem. You can then try to escape it as well, but it is probably easier to make sure that your path avoids spaces and "special" characters...
It's all documented and explained (with examples) in the wiki for the !DATASOURCE command...
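The two path forms described above can be generated mechanically. This is a minimal sketch under assumptions: toImacrosPath and toQuotedImacrosPath are hypothetical helper names, and the sample path with a space in the user name is made up for illustration. Check the result against the !DATASOURCE wiki page for your iMacros version.

```javascript
// Hypothetical helper: replaces each space in a path with the iMacros <SP> token.
function toImacrosPath(path) {
  return path.replace(/ /g, "<SP>");
}

// Hypothetical helper: wraps a path in double quotes and doubles each backslash.
function toQuotedImacrosPath(path) {
  return '"' + path.replace(/\\/g, "\\\\") + '"';
}

// Made-up example path that contains a space.
var csvPath = "C:\\Users\\My User\\Desktop\\my-file.csv";
console.log("SET !DATASOURCE " + toImacrosPath(csvPath));
console.log("SET !DATASOURCE " + toQuotedImacrosPath(csvPath));
```

Either printed line can then be pasted into the macro in place of the original SET !DATASOURCE line.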

Extract htm text for different tags in iMacros

I want to extract the htm text for particular tags. Here is the link for which I want to extract the htm text.
I use this tag to extract the entire htm text for a particular record:
TAG POS=2 TYPE=div ATTR=class:m-srp-card<SP>SRCard&&TXT:* EXTRACT=HTM
By just changing the POS=? number I should get the htm text for every record, but in this case the attribute class:m-srp-card<SP>SRCard differs between records. For example, POS=3 tags the 4th record instead of the 3rd.
Is there any alternative by which I can just change the POS number and get the htm record?
Thanks
Domnick.
I checked the link you provided. The problem is with the attribute selector, as you mentioned: it requires the class of the div to be exactly m-srp-card<SP>SRCard, but some of the elements carry additional classes. My solution is to add a wildcard (*) at the end of the class value, so that the match is flexible and allows other classes to be present.
VERSION BUILD=1001 RECORDER=CR
SET !ERRORIGNORE YES
SET !LOOP 1
TAG POS={{!LOOP}} TYPE=div ATTR=class:m-srp-card<SP>SRCard* EXTRACT=HTM
I have also set up a loop that can be rerun to iterate through the divs, and included the second line (SET !ERRORIGNORE YES) to handle cases where the particular tag is not present in the HTML.
Please let me know if this fixes your issue!
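If you would rather drive the loop from a .js script than from the Loop button, the same wildcard TAG line can be generated per position. This is only a sketch: buildCardMacro is a hypothetical helper name, the upper bound of 10 is an assumption, and the stop condition relies on iMacros returning #EANF# for an element that was not found.

```javascript
// Hypothetical helper: builds a short macro that extracts the card at the given position.
function buildCardMacro(pos) {
  return "CODE:" +
    "SET !ERRORIGNORE YES\n" +
    "TAG POS=" + pos + " TYPE=div ATTR=class:m-srp-card<SP>SRCard* EXTRACT=HTM\n";
}

// iimPlay and iimGetLastExtract exist only inside the iMacros scripting interface.
// The guard lets this file load elsewhere (for example in Node) without running the loop.
var cards = [];
if (typeof iimPlay === "function") {
  for (var pos = 1; pos <= 10; pos++) { // 10 is an assumed upper bound
    iimPlay(buildCardMacro(pos));
    var html = iimGetLastExtract();
    if (!html || html === "#EANF#") break; // stop at the first missing card
    cards.push(html);
  }
}
```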

Regex filter for iMacros

I'm trying to scrape the search result counter from the Google SERP. It works with Google Spreadsheets, ImportXML and RegExReplace, but not always, because of a fault in Spreadsheets. So I'm trying to accomplish it with iMacros, but I can't get the scraped string filtered correctly.
In Google Spreadsheets I use
=REGEXREPLACE(IMPORTXML("https://www.google.com/search?q=test&hl=en&as_qdr=m","//div[@id='resultStats']"),".*?([0-9,]+) (w|r)esults?","$1")
The whole imported string from the element with id="resultStats" is "About 4,290,000 results". Here the regex .*?([0-9,]+) (w|r)esults? filters out all the words, so I get only the number of results. As I said, it doesn't work reliably in Spreadsheets.
The question is: how do I use this regex with iMacros to get only the number? I use this iMacros code:
VERSION BUILD=8881205 RECORDER=FX
SET !TIMEOUT_STEP 0
SET !ERRORIGNORE YES
TAB T=1
SET !DATASOURCE sr1.csv
SET !DATASOURCE_COLUMNS 1
SET !LOOP 1
SET !DATASOURCE_LINE {{!LOOP}}
SET !VAR1 EVAL("var randomNumber=Math.floor(Math.random()*45 + 16); randomNumber;")
URL GOTO={{!COL1}}
WAIT SECONDS={{!VAR1}}
TAG POS=1 TYPE=DIV ATTR=ID:resultStats EXTRACT=TXT
ADD !EXTRACT {{!URLCURRENT}}
SET !EXTRACT EVAL("decodeURI('{{!EXTRACT}}');")
SAVEAS TYPE=EXTRACT FOLDER=* FILE=+{{!NOW:ddmmyyyy}}.csv
It's very simple to do:
' ... '
TAG POS=1 TYPE=DIV ATTR=ID:resultStats EXTRACT=TXT
SET !EXTRACT EVAL("'{{!EXTRACT}}'.match(/[0-9,]+/);")
' ... '
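For reference, here is what the expression inside EVAL does when run as plain JavaScript. This is a small sketch using the sample string from the question; extractResultCount is a hypothetical helper name, and it assumes the first run of digits and commas in the text is the result count.

```javascript
// Hypothetical helper: returns the first run of digits and commas, or "" if none is found.
function extractResultCount(text) {
  var match = text.match(/[0-9,]+/); // without the g flag, match() returns the first match only
  return match ? match[0] : "";
}

var rawStats = "About 4,290,000 results"; // sample string from the question
var countText = extractResultCount(rawStats);
var countNumber = parseInt(countText.replace(/,/g, ""), 10); // strip thousands separators
console.log(countText, countNumber);
```

Taking match[0] makes the result an explicit string instead of relying on the array being converted to text.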

how to extract only first two characters with imacro?

So the HTML code is :
<span class="countdown">
01:22
</span>
The iMacros code to extract the text (01:22) is:
TAG POS=1 TYPE=SPAN ATTR=CLASS:"countdown" EXTRACT=TXT
I want to extract only the first two characters and not the whole text. In the example I posted, the extracted text would be "01" and not "01:22".
You will need to use the iMacros EVAL method with a little bit of JavaScript and regex to break up the string and assign the first 2 characters to another variable. Below is the solution:
TAG POS=1 TYPE=SPAN ATTR=CLASS:"countdown" EXTRACT=TXT
SET !VAR1 EVAL("var x=\"{{!EXTRACT}}\"; x=x.match(/^.{2}/).join(''); x;")
PROMPT {{!VAR1}}
Enjoy! If this was helpful, please mark as such, thanks!
Edit: Here's a slightly better method, using the JavaScript split function. This lets you pick the first part (01) or the second part (22) of 01:22.
TAG POS=1 TYPE=SPAN ATTR=CLASS:"countdown" EXTRACT=TXT
' The line below assigns the first part, before the colon (01), to VAR1
SET !VAR1 EVAL("var x=\"{{!EXTRACT}}\"; var y=x.split(':'); y[0];")
' The line below assigns the second part, after the colon (22), to VAR2
SET !VAR2 EVAL("var x=\"{{!EXTRACT}}\"; var y=x.split(':'); y[1];")
PROMPT {{!VAR1}}
Answer updated due to recent comment/question to my answer.
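The string handling can be checked outside iMacros as plain JavaScript. A minimal sketch, assuming the extracted text may carry surrounding whitespace as in the HTML from the question; splitCountdown and firstTwoChars are hypothetical helper names.

```javascript
// Hypothetical helper: splits "mm:ss" text into its two parts after trimming whitespace.
function splitCountdown(text) {
  var parts = text.trim().split(":");
  return { first: parts[0], second: parts[1] };
}

// Hypothetical helper: returns the first two characters of the trimmed text.
function firstTwoChars(text) {
  return text.trim().substring(0, 2);
}

var countdownText = "\n01:22\n"; // the span content from the question, with its line breaks
var countdownParts = splitCountdown(countdownText);
console.log(countdownParts.first, countdownParts.second, firstTwoChars(countdownText));
```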

IMACROS how to do this with Regex supported in search function

I am using the built-in iMacros editor in Firefox to edit macros. I see it has a "Regular expressions" option for the search/replace function. See pic!
http://i.imgur.com/nbfRDQs.jpg
According to their site, it supports .NET regex: http://www.regular-expressions.info/dotnet.html
How can I search for a line break using the search function?
For example, I need to replace multiple instances in a macro like the following
TAG POS=1 TYPE=A ATTR=TXT:AB
with
TAG POS=1 TYPE=A ATTR=TXT:AB
TAG POS=1 TYPE=A ATTR=TXT:SOLUTION
Note that I have to replace 1 line with 2 lines, so the replacement needs a line break.
Can I enter some regex code in the replace field to insert a line break, so that the replacement comes out as 2 lines? Currently the replaced text (i.e. the 2 lines) comes out as one line.
Thanks
Don’t waste your time with the built-in editor. Use more advanced ones: ‘Sublime Text’, ‘Notepad++’ etc. (The symbol of a line break (new line) is \n there.)
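The same one-line-to-two-lines replacement can also be scripted. This is a sketch in JavaScript rather than in any particular editor's dialog: addSolutionLine is a hypothetical helper name, the sample macro text is made up, and \n in the replacement string is the line break.

```javascript
// Hypothetical helper: inserts the SOLUTION line after every line that is exactly the AB line.
function addSolutionLine(macroText) {
  // m makes ^ and $ match at line boundaries; g replaces every occurrence.
  return macroText.replace(
    /^(TAG POS=1 TYPE=A ATTR=TXT:AB)$/gm,
    "$1\nTAG POS=1 TYPE=A ATTR=TXT:SOLUTION"
  );
}

// Made-up macro with two occurrences of the line to expand.
var macroBefore = [
  "URL GOTO=https://example.com",
  "TAG POS=1 TYPE=A ATTR=TXT:AB",
  "WAIT SECONDS=1",
  "TAG POS=1 TYPE=A ATTR=TXT:AB"
].join("\n");
var macroAfter = addSolutionLine(macroBefore);
console.log(macroAfter);
```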