Checking each URL works in tests on a Yesod site - unit-testing

I was trying to check that all the links on a Yesod website's home page work, so I wrote this Hspec test:
module Handler.HomeSpec (spec) where
import Data.Either (fromRight)
import qualified Data.Text as T
import Network.Wai.Test (simpleBody)
import TestImport
import Yesod.Test.TransversingCSS (findAttributeBySelector)
getAllLinks :: YesodExample site [Text]
getAllLinks = withResponse $ \res -> do
    let links = fromRight [] $ findAttributeBySelector (simpleBody res) "a" "href"
    return $ T.concat <$> links
spec :: Spec
spec = withApp $
    describe "Homepage" $ do
        it "checks all links" $ do
            get HomeR
            statusIs 200
            links <- getAllLinks
            forM_ links $ \oneLink -> do
                get HomeR
                statusIs 200
                get oneLink
                statusIs 200
Everything compiles fine, but the get function drops the host part of the URLs you feed it. For example, given https://github.com/zigazou/bazasso, it tries to fetch /zigazou/bazasso, which returns a 404.
Is there a way to make it work the way I want?
Should I add a function that removes external links from the tests?
Or is this simply not the right place to do it?

The simpler, the better: I now drop every link that starts with a protocol (http://, https://, mailto: or tel:) before checking the rest. Thanks to @ncaq for your comments.
module Handler.HomeSpec (spec) where
import Data.Either (fromRight)
import qualified Data.Text as T
import Network.Wai.Test (simpleBody)
import TestImport
import Yesod.Test.TransversingCSS (findAttributeBySelector)
isRelative :: Text -> Bool
isRelative url
    | T.take 7 url == "http://" = False
    | T.take 8 url == "https://" = False
    | T.take 7 url == "mailto:" = False
    | T.take 4 url == "tel:" = False
    | otherwise = True
getAllLinks :: YesodExample site [Text]
getAllLinks = withResponse $ \res -> do
    let currentHtml = simpleBody res
        links = fromRight [] $ findAttributeBySelector currentHtml "a" "href"
    return $ filter isRelative $ T.concat <$> links
spec :: Spec
spec = withApp $
    describe "Homepage" $ do
        it "checks all links" $ do
            get HomeR
            statusIs 200
            links <- getAllLinks
            forM_ links $ \oneLink -> do
                get HomeR
                statusIs 200
                get oneLink
                statusIs 200
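The prefix checks above only cover four schemes, so other absolute forms such as protocol-relative links (//host/path) or ftp:// URLs would still reach get. A more general test treats a link as relative only when it has neither a scheme nor a host. The sketch below shows that idea in Python rather than Haskell, purely to keep it short; the name is_relative is made up for this illustration, and the Yesod test would need an equivalent URI parser on the Haskell side.

```python
from urllib.parse import urlsplit


def is_relative(url):
    """Return True when url has neither a scheme nor a host part."""
    parts = urlsplit(url)
    # "https://x/y" has a scheme, "//x/y" has a host (netloc) only,
    # "mailto:a@b" has a scheme only; "/y" and "y.html" have neither.
    return not parts.scheme and not parts.netloc


links = ["/about", "faq.html", "https://github.com/zigazou/bazasso",
         "//cdn.example.com/app.js", "mailto:someone@example.com"]
internal = [link for link in links if is_relative(link)]
```

Here internal keeps only "/about" and "faq.html", which are the links a test client bound to the local application can actually fetch.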

Related

Haskell Spock: How to get the raw request body

The body function in Web.Spock.Action is supposed to return the raw request body. However, it just doesn't seem to be doing that:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Data.Text.Encoding (decodeUtf8)
import Debug.Trace (trace)
import Web.Spock
import Web.Spock.Config
app :: SpockM () () () ()
app = do
    get root $ text "Hello!"
    post "test" $ do
        b <- body -- b is always ""!
        text $ trace ("b="++show b) decodeUtf8 b
main :: IO ()
main = do
    spockCfg <- defaultSpockCfg () PCNoDatabase ()
    runSpock 3000 (spock spockCfg app)
A curl --data 123123 localhost:3000/test returns nothing, and the trace output confirms that b is an empty string.
Spock is running on port 3000
b=""
The equivalent Scotty app works just fine:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Data.Text.Lazy.Encoding (decodeUtf8)
import Web.Scotty
main = scotty 3000 $ do
    get "/" $ text "Hello!"
    post "/test" $ do
        b <- body -- works fine
        text $ decodeUtf8 b
I absolutely can't see what I'm doing wrong. Any input would be highly appreciated!
Update: Above example will work with Spock >= 0.12.0.0!

Element not found in cache - Selenium (Python)

I wrote a simple web-scraping script that gives me all the episode links on a particular site's page. The script was working fine, but now it's broken, and I didn't change anything.
Try this URL (for scraping): http://www.crunchyroll.com/tabi-machi-late-show
Now the script runs partway and then fails with the error 'Element not found in the cache - perhaps the page has changed since it was looked up'.
I looked it up online, and people suggested using the 'implicit wait' command in certain places. I did that; still no luck.
UPDATE: I tried this script on a remote desktop and it works there without any problems.
Here's my script:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import os
import time
from subprocess import Popen
#------------------------------------------------
try:
    Link = raw_input("Please enter your Link : ")
    if not Link:
        raise ValueError('Please Enter A Link To The Anime Page. This Application Will now Exit in 5 Seconds.')
except ValueError as e:
    print(e)
    time.sleep(5)
    exit()
print 'Analyzing the Page. Hold on a minute.'
driver = webdriver.Firefox()
driver.get(Link)
assert "Crunchyroll" in driver.title
driver.implicitly_wait(5) # <-- I tried removing these lines as well. No luck.
elem = driver.find_elements_by_xpath("//*[@href]")
driver.implicitly_wait(10) # <-- I tried removing these lines as well. No luck.
text_file = open("BatchLink.txt", "w")
print 'Fetching The Links, please wait.'
for elem in elem:
    x = elem.get_attribute("href")
    #print x
    text_file.write(x+'\n')
print 'Links have been fetched. Just doing the final cleaning now.'
text_file.close()
CleanFile = open("queue.txt", "w")
with open('BatchLink.txt') as f:
    mylist = f.read().splitlines()
    #print mylist
with open('BatchLink.txt', 'r') as inF:
    for line in inF:
        if 'episode' in line:
            CleanFile.write(line)
print 'Please Check the file named queue.txt'
CleanFile.close()
os.remove('BatchLink.txt')
driver.close()
Here's a screenshot of the error (might be of some help) :
http://i.imgur.com/SaANlsg.png
OK, I haven't worked with Python, but I know the problem.
You have a variable that you initialize: elem = driver.find_elements_by_xpath("//*[@href]")
After that, you do some things with it in a loop.
Before the loop finishes, try initializing this variable again:
elem = driver.find_elements_by_xpath("//*[@href]")
The thing is that the DOM changes, so you lose the element collection.
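One way to act on this advice, sketched under assumptions: read every href in a single pass and, if an element has gone stale in the meantime, repeat the whole lookup instead of carrying on with the old collection. The helper name collect_hrefs and its parameters are hypothetical; the lookup and the exception type are passed in so that the sketch runs without a browser.

```python
def collect_hrefs(find_elements, stale_error=Exception, retries=3):
    """Return the href of every element, redoing the lookup if one goes stale.

    find_elements: callable that performs the lookup and returns the elements.
    stale_error:   exception type raised when an element is no longer valid.
    """
    for _ in range(retries):
        try:
            # Read all attributes immediately and keep plain strings only,
            # so nothing refers to the live DOM after this line.
            return [element.get_attribute("href") for element in find_elements()]
        except stale_error:
            continue  # the page changed under us; look everything up again
    raise RuntimeError("the page kept changing; giving up after %d tries" % retries)
```

With Selenium, find_elements would be something like lambda: driver.find_elements_by_xpath("//*[@href]"), and stale_error would be selenium.common.exceptions.StaleElementReferenceException. The returned list holds ordinary strings, so writing it to a file later cannot trigger the cache error.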

How to merge few python scripts into one?

I am a newbie when it comes to programming and Python, so I have a question.
My fellow students and I have written a few Python scripts, but now we are stuck and out of ideas. We need to merge these scripts into one working script. Could anyone help us with that, please?
Scripts:
# Script: webpage_get.py
# Desc: Fetches data from a webpage, and parses out hyperlinks.
# Author: Wojciech Kociszewski
# Created: Nov, 2013
#
import sys, urllib
def wget(url):
    ''' Try to retrieve a webpage via its url, and return its contents'''
    print '[*] wget()'
    #open file like url object from web, based on url
    url_file = urllib.urlopen(url)
    # get webpage contents
    page = url_file.read()
    return page
def main():
    #temp testing url argument
    sys.argv.append('http://www.soc.napier.ac.uk/~cs342/CSN08115/cw_webpage/index.html')
    #check args
    if len(sys.argv) != 2:
        print '[-] Usage: webpage_get URL'
        return
    #Get and analyse web page
    print wget(sys.argv[1])
if __name__ == '__main__':
    main()
# Script: webpage_getlinks.py
# Desc: Basic web site info gathering and analysis script. From a URL gets
# page content, parsing links out.
# Author: Wojciech Kociszewski
# Created: Nov, 2013
#
import sys, re
import webpage_get
def print_links(page):
    ''' find all hyperlinks on a webpage passed in as input and print '''
    print '[*] print_links()'
    # regex to match on hyperlinks, returning 3 grps, links[1] being the link itself
    links = re.findall(r'(\<a.*href\=.*)(http\:.+)(?:[^\'" >]+)', page)
    # sort and print the links
    links.sort()
    print '[+]', str(len(links)), 'HyperLinks Found:'
    for link in links:
        print link[1]
def main():
    # temp testing url argument
    sys.argv.append('http://www.soc.napier.ac.uk/~cs342/CSN08115/cw_webpage/index.html')
    # Check args
    if len(sys.argv) != 2:
        print '[-] Usage: webpage_getlinks URL'
        return
    # Get the web page
    page = webpage_get.wget(sys.argv[1])
    # Get the links
    print_links(page)
if __name__ == '__main__':
    main()
# Script: webpage_getemails.py
# Desc: Basic web site info gathering and analysis script. From a URL gets
# page content, parsing emails out.
# Author: Wojciech Kociszewski
# Created: Nov, 2013
#
import sys, re
import webpage_get
def print_emails(page):
    ''' find all emails on a webpage passed in as input and print '''
    print '[*] print_emails()'
    # regex to match on emails (hyphen placed last so it is not read as a range)
    emails = re.findall(r'([\d\w._-]+@[\w\d._-]+\.\w+)', page)
    # sort and print the emails
    emails.sort()
    print '[+]', str(len(emails)), 'Emails Found:'
    for email in emails:
        print email
def main():
    # temp testing url argument
    sys.argv.append('http://www.soc.napier.ac.uk/~cs342/CSN08115/cw_webpage/index.html')
    # Check args
    if len(sys.argv) != 2:
        print '[-] Usage: webpage_getemails URL'
        return
    # Get the web page
    page = webpage_get.wget(sys.argv[1])
    # Get the emails
    print_emails(page)
if __name__ == '__main__':
    main()
1. Analyse your scripts and find the common code.
2. Convert the common code into a module.
3. Rewrite the individual programs to use the common module.
4. If you then wish to make the individual programs into one big program, it will be much easier.
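The steps above could end up looking roughly like the sketch below: one file that keeps the shared wget function and turns the two printing functions into functions that return their results, so a single main can drive both. This is only an illustration under assumptions. It is written for Python 3 (the original scripts are Python 2), the names find_links, find_emails and webpage_info are made up, and the regular expressions are simplified versions of the originals.

```python
import re
from urllib.request import urlopen


def wget(url):
    """Return the contents of the page at url as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def find_links(page):
    """Return the sorted, de-duplicated http(s) links found in href attributes."""
    return sorted(set(re.findall(r'href=["\'](https?://[^"\']+)["\']', page)))


def find_emails(page):
    """Return the sorted, de-duplicated e-mail addresses found in the page."""
    return sorted(set(re.findall(r'[\w.-]+@[\w.-]+\.\w+', page)))


def main(argv):
    """Fetch the URL given in argv[1] and print its links and e-mails."""
    if len(argv) != 2:
        print('[-] Usage: webpage_info URL')
        return 1
    page = wget(argv[1])
    for title, items in (('HyperLinks', find_links(page)),
                         ('Emails', find_emails(page))):
        print('[+]', len(items), title, 'Found:')
        for item in items:
            print(item)
    return 0
```

To use it as a script, call main(sys.argv) under the usual if __name__ == '__main__' guard. Because the find_ functions return lists instead of printing, other programs can import the module and reuse them.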

Building a haskell interpreter (hint) as dynamic library, useable from C++: Missing Interpreter.dyn_hi

I want to create a haskell interpreter that I can use from C++ on linux.
I have a file FFIInterpreter.hs which implements the interpreter in haskell and exports the functions via FFI to C++.
module FFIInterpreter where
import Language.Haskell.Interpreter
import Data.IORef
import Foreign.StablePtr
import Foreign.C.Types
import Foreign.C.String
import Control.Monad
import Foreign.Marshal.Alloc
type Session = Interpreter ()
type Context = StablePtr (IORef Session)
foreign export ccall createContext :: CString -> IO Context
createContext :: CString -> IO Context
createContext name = join ((liftM doCreateContext) (peekCString name))
  where
    doCreateContext :: ModuleName -> IO Context
    doCreateContext name
      = do let session = newModule name
           _ <- runInterpreter session
           liftIO $ newStablePtr =<< newIORef session
newModule :: ModuleName -> Session
newModule name = loadModules [name] >> setTopLevelModules [name]
foreign export ccall freeContext :: Context -> IO ()
freeContext :: Context -> IO ()
freeContext = freeStablePtr
foreign export ccall runExpr :: Context -> CString -> IO CString
runExpr :: Context -> CString -> IO CString
runExpr env input = join ((liftM newCString) (join (((liftM liftM) doRunExpr) env (peekCString input))))
  where
    doRunExpr :: Context -> String -> IO String
    doRunExpr env input
      = do env_value <- deRefStablePtr env
           tcs_value <- readIORef env_value
           result <- runInterpreter (tcs_value >> eval input)
           return $ either show id result
foreign export ccall freeString :: CString -> IO ()
freeString :: CString -> IO ()
freeString = Foreign.Marshal.Alloc.free
When I compile the whole project with ghc, everything works fine. I use the following command:
ghc -no-hs-main FFIInterpreter.hs main.cpp -lstdc++
But the haskell module is only a small piece of the C++ project and I don't want the whole project to depend on ghc.
So I want to build a dynamic library with ghc and then link it to the project using g++.
$ ghc -shared -fPIC FFIInterpreter.hs module_init.c -lstdc++
[1 of 1] Compiling FFIInterpreter ( FFIInterpreter.hs, FFIInterpreter.o )
Linking a.out ...
/usr/bin/ld: /usr/lib/haskell-packages/ghc/lib/hint-0.3.3.2/ghc-7.0.3/libHShint-0.3.3.2.a(Interpreter.o): relocation R_X86_64_32S against `.data' can not be used when making a shared object; recompile with -fPIC
/usr/lib/haskell-packages/ghc/lib/hint-0.3.3.2/ghc-7.0.3/libHShint-0.3.3.2.a: could not read symbols: Bad value
collect2: ld returned 1 exit status
So I added the -dynamic flag, but that doesn't work either:
$ ghc -dynamic -shared -fPIC FFIInterpreter.hs librarymain.cpp -lstdc++
FFIInterpreter.hs:3:8:
Could not find module `Language.Haskell.Interpreter':
Perhaps you haven't installed the "dyn" libraries for package `hint-0.3.3.2'?
Use -v to see a list of the files searched for.
I searched my system for Interpreter.dyn_hi but didn't find it. Is there a way to get it?
I also tried to install hint manually, but this also doesn't deliver the Interpreter.dyn_hi file.
You have to install the library (and all it depends on) with the --enable-shared flag (using cabal-install) to get the .dyn_hi and .dyn_o files. You may consider setting that option in your ~/.cabal/config file.
Perhaps the easiest way is to uncomment the shared: XXX line in ~/.cabal/config, set the option to True and
cabal install --reinstall world
For safety, run that with the --dry-run option first to detect problems early. If the --dry-run output looks reasonable, go ahead and reinstall - it will take a while, though.

Scripted main in OCaml?

How can I emulate this Python idiom in OCaml?
if __name__=="__main__":
main()
See RosettaCode for examples in other programming languages.
There is no notion of a main module in OCaml. All the modules in a program are equal, so you can't directly translate this Python idiom.
The usual way in OCaml is to have a separate file containing the call to main, along with other things like command-line parsing that only make sense in a standalone executable. Don't include that source file when linking your code as a library.
There is a way to get at the name of the module, but it's rather hackish, as it is intended for debugging only. It violates the usual assumption that you can rename a module without changing its behavior. If you rely on it, other programmers reading your code will curse you. This method is provided for entertainment purposes only and should not be used in real life.
let name_of_this_compilation_unit =
  try assert false with Assert_failure (filename, _, _) -> filename
You can compare the name of the compilation unit with Sys.executable_name or Sys.argv.(0). Note that this is not really the same thing as the Python idiom, which does not rely on the toplevel script having a particular name.
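For comparison, here is a small Python sketch of the behaviour being emulated. It writes a throwaway module under an arbitrary file name (any_name.py, chosen only for this illustration) and loads it twice, once as a script and once through import. The helper demo is hypothetical and exists only to capture the two outputs.

```python
import os
import subprocess
import sys
import tempfile

# Source of the throwaway module: it reports how it was loaded.
SOURCE = '''
def main():
    print("running as a script")

if __name__ == "__main__":
    main()
else:
    print("imported as " + __name__)
'''


def demo():
    """Run the same file as a script and as an import; return both outputs."""
    with tempfile.TemporaryDirectory() as tmp:
        # The file name is arbitrary: the idiom never inspects it.
        with open(os.path.join(tmp, "any_name.py"), "w") as handle:
            handle.write(SOURCE)

        def run(args):
            return subprocess.check_output(
                [sys.executable] + args, cwd=tmp, universal_newlines=True).strip()

        return run(["any_name.py"]), run(["-c", "import any_name"])
```

Calling demo() returns ("running as a script", "imported as any_name"): the interpreter itself sets __name__, so the check keeps working after the file is renamed, which is what a comparison against Sys.argv.(0) cannot guarantee.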
$ ocamlc -o scriptedmain -linkall str.cma scriptedmain.ml
$ ./scriptedmain
Main: The meaning of life is 42
$ ocamlc -o test -linkall str.cma scriptedmain.ml test.ml
$ ./test
Test: The meaning of life is 42
scriptedmain.ml:
let meaning_of_life : int = 42
let main () = print_endline ("Main: The meaning of life is " ^ string_of_int meaning_of_life)
let _ =
  let program = Sys.argv.(0)
  and re = Str.regexp "scriptedmain" in
  try let _ = Str.search_forward re program 0 in
    main ()
  with Not_found -> ()
test.ml:
let main () = print_endline ("Test: The meaning of life is " ^ string_of_int Scriptedmain.meaning_of_life)
let _ =
  let program = Sys.argv.(0)
  and re = Str.regexp "test" in
  try let _ = Str.search_forward re program 0 in
    main ()
  with Not_found -> ()
Posted on RosettaCode.