[plt-scheme] Problems fetching page
I want to extract some information (the list of news items) from the
page <http://www.drv.dk>. Unfortunately the request is redirected, and
the functions GET-PURE-PORT and GET-IMPURE-PORT give me problems.
The first problem is that I'm not told about the redirection:
> (copy-port (get-impure-port (string->url "http://www.drv.dk/"))
             (current-output-port))
HTTP/1.1 500 Server Error
Server: Microsoft-IIS/5.0
Date: Sat, 24 Apr 2004 15:43:14 GMT
Content-Type: text/html
Content-Length: 102
<html><head><title>Error</title></head><body>The system cannot find the file specified.
</body></html>
Using wget I found out that the proper address is <http://www.drv.dk/default_frontpage.aspx?siteid=1>.
The second problem is that GET-IMPURE-PORT behaves the same with
the new address:
> (copy-port (get-impure-port (string->url "http://www.drv.dk/default_frontpage.aspx?siteid=1"))
             (current-output-port))
HTTP/1.1 500 Server Error
Server: Microsoft-IIS/5.0
Date: Sat, 24 Apr 2004 15:35:04 GMT
Content-Type: text/html
Content-Length: 102
<html><head><title>Error</title></head><body>The system cannot find the file specified.
</body></html>
Just to be sure the address is correct I fetched it again using wget:
> (require (lib "process.ss" "mzlib"))
> (system "c:/cygwin/bin/wget \"http://www.drv.dk/default_frontpage.aspx?siteid=1\"")
--17:37:02-- http://www.drv.dk/default_frontpage.aspx?siteid=1
=> `default_frontpage.aspx@siteid=1.5'
Resolving www.drv.dk... 213.150.32.111
Connecting to www.drv.dk[213.150.32.111]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 40,271 [text/html]
0K .......... .......... .......... ......... 100% 58.44 KB/s
17:37:05 (58.44 KB/s) - `default_frontpage.aspx@siteid=1.5' saved [40271/40271]
#t
The .5 in the final filename is there because I have fetched the page several times.
And it's the proper contents too:
> (call-with-input-file "default_frontpage.aspx@siteid=1.5"
    (lambda (port)
      (copy-port port (current-output-port))))
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<TITLE>Det Radikale Venstre</TITLE>
<LINK REL="STYLESHEET" HREF="style.css">
<SCRIPT LANGUAGE="JavaScript1.2" SRC="hiermenu/HM_Loader.js"
TYPE="text/javascript"></SCRIPT>
<SCRIPT LANGUAGE="JavaScript" TYPE="text/javascript">
...
[rest of front page deleted]
Is there a way to persuade GET-PURE-PORT and GET-IMPURE-PORT to fetch the page?
Note: The above was on a Windows XP machine with IE 6.0.
--
Jens Axel Søgaard