I'm trying to scrape the search result counter from a Google SERP. It works with Google Spreadsheets using IMPORTXML and REGEXREPLACE, but not reliably, because of a fault in Spreadsheets. So I'm trying to do it with iMacros instead, but I can't get the scraped string filtered correctly.
In Google Spreadsheets I use:
=REGEXREPLACE(IMPORTXML("https://www.google.com/search?q=test&hl=en&as_qdr=m","//div[@id='resultStats']"),".*?([0-9,]+) (w|r)esults?","$1")
The whole imported string from the element with id="resultStats" is: About 4,290,000 results
Here the regex .*?([0-9,]+) (w|r)esults? filters out all the words, so I get only the result count. As I said, it doesn't work reliably in Spreadsheets.
The question is: how do I use this regex with iMacros to get only the number? I use this iMacros code:
VERSION BUILD=8881205 RECORDER=FX
SET !TIMEOUT_STEP 0
SET !ERRORIGNORE YES
TAB T=1
SET !DATASOURCE sr1.csv
SET !DATASOURCE_COLUMNS 1
SET !LOOP 1
SET !DATASOURCE_LINE {{!LOOP}}
SET !VAR1 EVAL("var randomNumber=Math.floor(Math.random()*45 + 16); randomNumber;")
URL GOTO={{!COL1}}
WAIT SECONDS={{!VAR1}}
TAG POS=1 TYPE=DIV ATTR=ID:resultStats EXTRACT=TXT
ADD !EXTRACT {{!URLCURRENT}}
SET !EXTRACT EVAL("decodeURI('{{!EXTRACT}}');")
SAVEAS TYPE=EXTRACT FOLDER=* FILE=+{{!NOW:ddmmyyyy}}.csv
It's very simple to do:
' ... '
TAG POS=1 TYPE=DIV ATTR=ID:resultStats EXTRACT=TXT
SET !EXTRACT EVAL("'{{!EXTRACT}}'.match(/[0-9,]+/);")
' ... '
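To see what the expression inside EVAL does, here is a minimal JavaScript sketch that can be run outside iMacros. The function name extractResultCount is hypothetical, and the sample string is the one quoted in the question. Note that String.match returns an array (or null when nothing matches), so the sketch picks out the captured group explicitly.

```javascript
// Hypothetical helper (not part of iMacros): pull the result count
// out of the text of Google's "resultStats" element.
function extractResultCount(statsText) {
  // Same idea as the spreadsheet regex: digits and commas
  // followed by "results" (or "result").
  const m = statsText.match(/([0-9,]+) (?:w|r)esults?/);
  // match() returns null when there is no match, so guard against that.
  return m ? m[1] : "";
}

const count = extractResultCount("About 4,290,000 results");
console.log(count);
```

Returning an empty string instead of null keeps the extracted CSV column well-formed when Google shows no counter.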
I want to automate the submission of deleted pages in Google Search Console for a website that I manage.
Here is what I wrote in iMacros for Chrome (I've replaced my-domain-name.com and my-file.csv with the real names, of course):
VERSION BUILD=1011 RECORDER=CR
SET !DATASOURCE C:\Users\MY-USERNAME\Desktop\my-file.csv
SET !DATASOURCE_COLUMNS 1
SET !TIMEOUT_STEP 0
SET !ERRORIGNORE YES
SET !EXTRACT_TEST_POPUP YES
SET !LOOP 1
TAB T=1
URL GOTO=https://search.google.com/search-console/removals?resource_id=https://www.my-domain-name.com/&hl=fr&utm_source=wmx&utm_medium=deprecation-pane&utm_content=url-removal
TAG POS=2 TYPE=SPAN ATTR=TXT:Nouvelle<SP>demande
WAIT SECONDS=4
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:newremovalform ATTR=NAME:urlt CONTENT={{!COL1}}
TAG POS=4 TYPE=SPAN ATTR=TXT:Suivante
TAG POS=4 TYPE=SPAN ATTR=TXT:Envoyer<SP>la<SP>demande
WAIT SECONDS=15
TAG POS=1 TYPE=INPUT:SUBMIT FORM=ACTION:/webmasters/tools/removals-submit-ac?hl=fr&siteUrl=https://www.my-domain-name.com/ ATTR=NAME:next
TAG POS=1 TYPE=INPUT:SUBMIT FORM=ID:the-form ATTR=ID:submit-button
WAIT SECONDS=3
But when I play the macro, I immediately get this error message:
SyntaxError: wrong format of SET command at line 2
Thanks in advance for your help.
Best regards,
Eva.
FCI not mentioned (check my profile on how to ask questions about the iMacros tag "a bit correctly"), but you probably have one or more spaces in the path for your datasource, I reckon.
=> Probable FCI:
iMacros for CR v10.1.1, CR93/94(...?), Win7/10/11(...?).
=> You need to replace the spaces with <SP>.
Or you can enclose the whole path in single or double quotes, but then each backslash needs to be escaped with another backslash.
A single quote anywhere in your path (including the filename) would also cause a problem. You can try to escape that as well, but it is probably easier to make sure that your path avoids spaces and special characters.
It's all documented and explained (with examples) in the wiki for the !DATASOURCE command.
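As an illustration of the <SP> substitution described above, here is a small JavaScript sketch. The helper name toImacrosPath and the example path are made up for the demonstration; it only covers spaces, not quotes or other special characters.

```javascript
// Hypothetical helper: rewrite a Windows path so that every space
// becomes the iMacros <SP> token (backslashes are left untouched).
function toImacrosPath(path) {
  return path.replace(/ /g, "<SP>");
}

// Example path with spaces in the user name and the file name (assumed values).
const escaped = toImacrosPath("C:\\Users\\My Name\\Desktop\\my file.csv");
console.log("SET !DATASOURCE " + escaped);
```

The printed line can then be pasted into the macro in place of the original SET !DATASOURCE line.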
I have been struggling with this for quite a while now.
I am extracting text from a website using iMacros, with this result:
Niklaus Hasling
There is whitespace before the first name and after the surname.
This string is stored in the variable !VAR2.
I would like to use a regex that isolates the first name in !VAR3 and the surname in !VAR4.
Can someone help me?
I can't figure out how to write the regex.
'Extract and Save Names
TAG XPATH="/html/body/main/div[1]/div[1]/div/div[1]/div[1]/div/dl/dt/span" EXTRACT=TXT
SET !EXTRACT EVAL("'{{!EXTRACT}}'.trim(REGEX'')")
SET !VAR3 {{!EXTRACT}}
SET !EXTRACT NULL
'Extract and save SurNames
TAG XPATH="/html/body/main/div[1]/div[1]/div/div[1]/div[1]/div/dl/dt/span" EXTRACT=TXT
SET !EXTRACT EVAL("'{{!EXTRACT}}'.trim(REGEX'')")
SET !VAR4 {{!EXTRACT}}
SET !EXTRACT NULL
I figured this could trim the whitespace before and after the names:
^[ \t]+|[ \t]+$
and an array could split the names, but I can't figure out how to do that with iMacros.
Hum, you seem to like XPATH and REGEX, ah-ah...!
Implementation without REGEX (that I don't like and never use):
'Extract and Save First (`!VAR3`) + Last (`!VAR4`) Names:
SET !EXTRACT NULL
TAG XPATH="/html/body/main/div[1]/div[1]/div/div[1]/div[1]/div/dl/dt/span" EXTRACT=TXT
SET !ERRORIGNORE YES
SET !VAR3 EVAL("var s='{{!EXTRACT}}'; var x,y,z; x=s.trim(); y=x.split(' '); z=y[0]; z;")
SET !VAR4 EVAL("var s='{{!EXTRACT}}'; var x,y,z; x=s.trim(); y=x.split(' '); z=y[1]; z;")
'>
'Debug:
SET Debug_Info EXTRACT:<BR>_{{!EXTRACT}}_<BR><BR>
ADD Debug_Info First_Name:<SP>_{{!VAR3}}_<BR>Last_Name:<SP>_{{!VAR4}}_
PROMPT {{Debug_Info}}
!ERRORIGNORE is "needed" in case y[1] does not "exist", or iMacros will throw a Runtime Error...
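The trim-and-split logic used in the two EVAL lines can be sketched as plain JavaScript, which makes it easier to test outside iMacros. The function name splitName is hypothetical; unlike the macro above, this sketch splits on any run of whitespace and falls back to an empty string when there is no second word, which is an assumption about the desired behavior.

```javascript
// Hypothetical helper: split an extracted "First Last" string
// that may carry leading and trailing whitespace.
function splitName(raw) {
  // trim() drops the outer whitespace; /\s+/ tolerates several
  // spaces, tabs or newlines between the two names.
  const parts = raw.trim().split(/\s+/);
  return {
    first: parts[0] || "",
    last: parts[1] || "", // empty when only one word was extracted
  };
}

const name = splitName("   Niklaus Hasling   ");
console.log(name.first, name.last);
```

Because a missing surname yields an empty string rather than undefined, the same expression would not depend on !ERRORIGNORE for that case.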
So the HTML code is :
<span class="countdown">
01:22
</span>
The imacro code to extract the text (01:22) is :
TAG POS=1 TYPE=SPAN ATTR=CLASS:"countdown" EXTRACT=TXT
I want to extract only the first two characters and not the whole text. In the example I posted, the extracted text would be "01" and not "01:22".
You will need to use the iMacros EVAL command with a little bit of JavaScript and regex to break up the string and assign the first 2 characters to another variable. Below is the solution:
TAG POS=1 TYPE=SPAN ATTR=CLASS:"countdown" EXTRACT=TXT
SET !VAR1 EVAL("var x=\"{{!EXTRACT}}\"; x=x.match(/^.{2}/).join(''); x;")
PROMPT {{!VAR1}}
Enjoy! If this was helpful, please mark as such, thanks!
Edit: Here's a slightly better method, using the JavaScript split function. This will allow you to specify the 1st part (01) or the 2nd part (22) of 01:22
TAG POS=1 TYPE=SPAN ATTR=CLASS:"countdown" EXTRACT=TXT
' Below line will assign the first part, before the colon (01), to VAR1
SET !VAR1 EVAL("var x=\"{{!EXTRACT}}\"; var y=x.split(':'); y[0];")
' Below line will assign the second part, after the colon (22), to VAR2
SET !VAR2 EVAL("var x=\"{{!EXTRACT}}\"; var y=x.split(':'); y[1];")
PROMPT {{!VAR1}}
Answer updated due to recent comment/question to my answer.
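For reference, the split approach can be checked in plain JavaScript. This is only a sketch: the function name splitCountdown is hypothetical, and the trim() call is an assumption added because the span in the question contains line breaks around the text.

```javascript
// Hypothetical helper: split a countdown string such as "01:22"
// into the part before and the part after the colon.
function splitCountdown(text) {
  // Defaults keep both fields defined even if the colon is missing.
  const [first = "", second = ""] = text.trim().split(":");
  return { first, second };
}

const parts = splitCountdown("\n 01:22 \n");
console.log(parts.first, parts.second);
```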
I need an iMacros EVAL function to replace "'" with "",
or just to delete all the ' characters in a string of text.
I've tried this, but I can't get it to work with apostrophes:
TAG POS=1 TYPE=DIV ATTR=CLASS:after_title EXTRACT=TXT
SET !VAR2 EVAL("var extr2=\"{{!EXTRACT}}\"; extr2.replace(\"'\",\"\"); ")
After doing some reading I tried this, but I get an error:
TAG POS=1 TYPE=DIV ATTR=CLASS:after_title EXTRACT=TXT
SET !VAR2 EVAL("var extr2=\"{{!EXTRACT}}\"; extr2.replace(\'/g\,\"GHF\"); ")
I really hope someone can help, it's really doing my head in.
TAG POS=1 TYPE=DIV ATTR=CLASS:after_title EXTRACT=TXT
SET !VAR2 EVAL("var extr2=\"{{!EXTRACT}}\"; extr2.replace(/'/g,''); ")
Can you try this and let us know if it worked?
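The difference between the first attempt and the regex version can be seen in plain JavaScript: replace with a string pattern only changes the first occurrence, while a regex with the g flag changes all of them. The sample title below is made up for the demonstration.

```javascript
// Sample text with two apostrophes (assumed value, not from the page).
const title = "Rock 'n' Roll";

// A string pattern replaces only the FIRST apostrophe.
const firstOnly = title.replace("'", "");

// A regex with the global flag removes EVERY apostrophe.
const allRemoved = title.replace(/'/g, "");

console.log(firstOnly);
console.log(allRemoved);
```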
My example script clicks the first "rate" button it finds on a page, yet I want it to find text and then click the next "rate" button on a page.
VERSION BUILD=8300326 RECORDER=FX
TAB T=1
SET !DATASOURCE C:\Users\admin\Documents\iMacros\Downloads\extract.csv
SET !DATASOURCE_COLUMNS 1
SET !LOOP 1
SET !DATASOURCE_LINE {{!LOOP}}
URL GOTO={{!COL1}}
SEARCH SOURCE=TXT:"my favorite energy drink"
TAG POS=1 TYPE=SPAN ATTR=TXT:*<SP>Rating:<SP>Good<SP>Answer
WAIT SECONDS=2
The search line seems to work, yet the tag line jumps the script back to the top of the page.
Try replacing this:
TAG POS=1 TYPE=SPAN ATTR=TXT:*<SP>Rating:<SP>Good<SP>Answer
With this:
TAG POS=2 TYPE=SPAN ATTR=TXT:*<SP>Rating:<SP>Good<SP>Answer
Try replacing this:
TAG POS=1 TYPE=SPAN ATTR=TXT:*<SP>Rating:<SP>Good<SP>Answer
With this:
TAG POS=R1 TYPE=SPAN ATTR=TXT:*<SP>Rating:<SP>Good<SP>Answer
The 'R' in the position stands for "Relative": no matter where the previous command found its match, the TAG command looks for the link/text/etc. at the first position directly after it. You can also have it work in reverse, e.g. POS=R-1.