[plt-scheme] Using ssax with broken web pages...

From: geb a (geb_a at yahoo.com)
Date: Sat Jun 3 13:00:35 EDT 2006

Hello all,

I am trying to process web pages using ssax. I adapted
an example I found on the web and have it "somewhat"
working. The problem comes when I try to process a
real page from the internet. For instance, processing
Google's home page yields the error:


 Saturday, June 3rd, 2006 9:50:21am session 1: xml-server exception:  [GIMatch] broken for (END . head) while expecting ENDMETA

So apparently the parser reached the closing tag for
head while it was still expecting a closing tag for
meta, which HTML pages normally never close.  Does it
make sense to use ssax on web pages that I did not
write myself, or can a permissive parser be developed
that ignores these problems?  How would the parser be
modified to do that?
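
To make the failure concrete, here is a minimal sketch of what
"permissive" tag matching could look like. It does not use the
SSAX API at all: the event representation and the name
balance-tags are hypothetical, made up for illustration. It keeps
a stack of open elements and, when an end tag does not match the
innermost open element, closes the intervening elements implicitly
instead of raising an error.

```scheme
;; Hypothetical sketch, not part of SSAX.
;; Events are pairs: (start . name) or (end . name).
;; Returns a balanced event list in which every start has an end.
(define (balance-tags events)
  (let loop ((events events) (open-gis '()) (out '()))
    (cond
      ((null? events)
       ;; End of input: close whatever is still open, innermost first.
       (append (reverse out)
               (map (lambda (gi) (cons 'end gi)) open-gis)))
      ((eq? (caar events) 'start)
       ;; Start tag: push the element name on the stack.
       (loop (cdr events)
             (cons (cdar events) open-gis)
             (cons (car events) out)))
      ((memq (cdar events) open-gis)
       ;; End tag for an element that is open somewhere on the stack:
       ;; synthesize end events for everything opened after it.
       (let close-loop ((open-gis open-gis) (out out))
         (if (eq? (car open-gis) (cdar events))
             (loop (cdr events) (cdr open-gis) (cons (car events) out))
             (close-loop (cdr open-gis)
                         (cons (cons 'end (car open-gis)) out)))))
      (else
       ;; Stray end tag with no matching start: drop it.
       (loop (cdr events) open-gis out)))))
```

Fed the sequence that trips the parser above, with meta opened and
never closed before the end of head, the sketch inserts the missing
(end . meta) event just before (end . head) rather than failing.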

Thanks ahead of time for the help!

Sincerely

Dan Anderson


(require 
 (lib "thread.ss" )
 (lib "url.ss" "net")
 (lib "xml.ss" "xml")
 (lib "input-parse.ss" "ssax")
 (lib "date.ss")
 (lib "ssax.ss" "ssax"))

(define *timeout* 20)
(define google
  (get-pure-port (string->url "http://www.google.com")))
(define outport (current-output-port))
(date-display-format 'american)

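;parser-error PORT MESSAGE SPECIALISING-MSG*
;The port argument is unused; error needs the message string
;first, followed by the remaining values.
(define (parser-error port message . specialising-msgs)
  (apply error message specialising-msgs))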

;(set! parser-error report-parser-error)



(define (log-message msg)
  (let ((datestr
         (date->string (seconds->date (current-seconds)) #t)))
    (fprintf (current-error-port) "~a ~a~%" datestr msg)))

(define (test-parser inport outport)
  (#cs (ssax:make-parser
        NEW-LEVEL-SEED
        (lambda (elem-gi attributes namespaces
                 expected-content seed)
          (fprintf outport "new : ~a<br>" seed)
          (cons elem-gi seed))
        
        FINISH-ELEMENT
        (lambda (elem-gi attributes namespaces
                 parent-seed seed)
          (fprintf outport "finish: ~a<br>" seed)
          parent-seed)
        
        ;[GIMatch] broken for (END . head) while expecting ENDMETA
        
        CHAR-DATA-HANDLER
        (lambda (string1 string2 seed)
          (fprintf outport "char: ~a " seed)
          seed)) 
       inport "()"))

(define counter-sem (make-semaphore 1))

(define counter
  (let ((cnt 1))
    (lambda()
      (semaphore-wait counter-sem)
      (begin0
        cnt
        (set! cnt (+ cnt 1))
        (semaphore-post counter-sem)))))

(define (xml-server iport oport)    
  (let ((session (counter))) 
    (with-handlers
        ((exn?
          (lambda(exn)
            (log-message
             (format "session ~a: xml-server exception:  ~a"
                     session
                     (exn-message exn)))
            #f)))
      (log-message (format "session ~a BEGIN" session))
      (test-parser iport oport)
      (log-message (format "session ~a END" session)))))


;(define (start port)
;  (run-server port xml-server *timeout*))

(test-parser google outport)
(xml-server google outport) 





Posted on the users mailing list.