[plt-scheme] Cookies to web servlet getting lost when __utmz cookie present

From: Jay McCarthy (jay.mccarthy at gmail.com)
Date: Tue May 4 13:21:05 EDT 2010

I've just committed a more permissive cookie parser that handles the
original "cookies".
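
The committed code is not reproduced in this message. Purely as an illustration (it is not the committed parser), a forgiving parse in the spirit of the one quoted further down could keep every NAME=VALUE piece it can read and skip the rest, so one odd piece does not cost you the other cookies. The name `lenient-cookie-parse` is made up for this sketch, and the value is taken as-is with no validation at all:

```scheme
#lang scheme

;; lenient-cookie-parse : bytes -> (listof (cons bytes bytes))
;; Hypothetical helper, not part of the Web Server: split a raw Cookie
;; header value on ";" or "," and keep each piece that looks like NAME=VALUE.
(define (lenient-cookie-parse raw)
  (for/fold ([cookies empty])
            ([piece (in-list (regexp-split #rx"[;,][ \t]*" raw))])
    (match (regexp-match #rx"^([^=]*)=(.*)$" piece)
      ;; Everything after the first "=" is the value, unchecked.
      [(list _ name value) (append cookies (list (cons name value)))]
      ;; A piece with no "=" is skipped; the cookies found so far are kept.
      [#f cookies])))

;; The header value from the trace, without the trailing \r\n.
(lenient-cookie-parse
 #"teaching-order=course; __utmz=165257760.1272597702.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)")
```

One way to use something like this, following Todd's suggestion below, is to call request-cookies first and fall back to the raw Cookie header only when that comes back empty.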

Jay

On Mon, May 3, 2010 at 9:49 PM, Nadeem Abdul Hamid <nadeem at acm.org> wrote:
> Thanks, Jay and Todd. Very helpful.
>
> It would certainly be nice if (request-cookies ...) didn't throw away all the cookies just because one of them has an ill-formed value.
>
> --- nadeem
>
>
> On May 3, 2010, at 10:09 PM, Todd O'Bryan wrote:
>
>> Nadeem,
>>
>> You might try get-cookie from the net/cookie module.
>>
>> #lang scheme
>> (require net/cookie)
>> (require web-server/http/request-structs)
>>
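>> ;; req is the request passed to the servlet. get-cookie wants the header's
>> ;; value as a string, so unwrap the header struct (or #f) first.
>> (let ([h (headers-assq* #"cookie" (request-headers/raw req))])
>>   (if h
>>       (get-cookie "teaching-order" (bytes->string/utf-8 (header-value h)))
>>       empty))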
>>
>> It looks like it works for your cookie, but it doesn't do so well with Google's.
>>
>> While I sympathize with Jay's desire to be true to the spec, it seems
>> like there should be some way to deal with evil, non-compliant
>> cookies, especially given that you have no control over them.
>>
>> Maybe use request-cookies first and then, if that ends up empty, check
>> to see if there's a Cookie header and try a more forgiving parsing
>> algorithm like the one Jay provided. If you come up with something
>> that doesn't feel terribly hacky, please share!
>>
>> Todd
>>
>> On Fri, Apr 30, 2010 at 12:33 PM, Jay McCarthy <jay.mccarthy at gmail.com> wrote:
>>> On Fri, Apr 30, 2010 at 10:12 AM, Nadeem Abdul Hamid <nadeem at acm.org> wrote:
>>>> Thanks, Jay.
>>>>
>>>> That's kind of annoying though -- I mean I can't do anything about the
>>>> cookies that Google requests the browser to set when other pages on
>>>> the domain are visited. And I tried different browsers -- the same
>>>> problem. Why would Google Analytics be setting invalid cookies? (A
>>>> search doesn't seem to turn up any relevant hits, or anyone else ever
>>>> experiencing this issue.)
>>>
>>> I don't know why they write buggy programs. I submitted a bug
>>> report the last time this exact problem came up.
>>>
>>>> So basically, I need to parse the headers for cookies in my own program?
>>>
>>> It's not that bad. Here's something that does what Ruby does for
>>> parsing cookies. [Note that Ruby allows a value like the one you've sent
>>> to be interpreted as a cookie, even though it is not one.]
>>>
>>> #lang scheme
>>> (define ex #"teaching-order=course; __utmz=165257760.1272597702.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)\r\n")
>>>
>>> (define (not-cookie-parse s)
>>>  (for/fold ([cookies (make-immutable-hash empty)])
>>>    ([key*val (in-list (regexp-split #rx"[;,][ \t]*" s))])
>>>    (match-define (list _ key val) (regexp-match #rx"^([^=]*)=(.*)$" key*val))
>>>    (hash-update cookies key
>>>                 (curry append (regexp-split #rx"&" val))
>>>                 empty)))
>>>
>>> (not-cookie-parse ex)
>>> =>
>>> #hash((#"teaching-order" . (#"course")) (#"__utmz" .
>>> (#"165257760.1272597702.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)\r\n")))
>>>
>>> Jay
>>>
>>>>
>>>> --- nadeem
>>>>
>>>>
>>>> On Fri, Apr 30, 2010 at 12:26 AM, Jay McCarthy <jay.mccarthy at gmail.com> wrote:
>>>>> Hi Nadeem,
>>>>>
>>>>> The problem is that this:
>>>>>
>>>>> "teaching-order=course;
>>>>> __utmz=165257760.1272597702.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)\r\n"
>>>>>
>>>>> is not a valid cookie. In particular, the characters to the right of
>>>>> the second "=":
>>>>>
>>>>> "165257760.1272597702.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)\r\n"
>>>>>
>>>>> should be a VALUE, which RFC 2965 defines as a value, which in turn is either a token or a quoted-string.
>>>>>
>>>>> It is not a token because it contains "(", ")", and "=".
>>>>>
>>>>> It is not a quoted-string because it is not wrapped in "".
>>>>>
>>>>> The Web Server does not throw an exception when it is asked to parse
>>>>> an invalid cookie string; instead, it returns the empty list of
>>>>> cookies. You can look at the request's headers directly to do
>>>>> something to this header, but since it is not a cookie, the Web
>>>>> Server's cookie parsing cannot do anything with it.
>>>>>
>>>>> For Reference:
>>>>>
>>>>> http://tools.ietf.org/html/rfc2965 [for VALUE and value]
>>>>> http://tools.ietf.org/html/rfc2616 [for token and quoted-string]
>>>>>
>>>>> Jay
>>>>>
>>>>> On Thu, Apr 29, 2010 at 10:05 PM, Nadeem Abdul Hamid <nadeem at acm.org> wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> I've been experimenting with a simple servlet to drive my website, but I'm having a strange problem: cookies do not get through when a Google Analytics cookie (__utmz) is present in the browser's request to the servlet. I've included a step-by-step trace below, but briefly, here's the problem. Starting from an empty browser cookie cache, I request a page from my servlet that sets a cookie; when I request a page again, I see that the cookie is sent by the browser and received by the servlet. Then I visit another web page on the same domain as my servlet, which sets a __utmz cookie (apparently from Google Analytics). After this cookie is set, when I request pages from my servlet, I see that the browser is sending all the cookies, but none are getting through to the servlet! If I clear the __utmz cookie from the browser and then request pages from the servlet, the servlet receives the cookie again.
>>>>>>
>>>>>> So, I haven't peeked into the web server code (no time yet), but the question is: is there some reason that the __utmz cookie (and not the others, such as __utma and __utmc) seems to keep all the other cookies from getting through to the servlet?
>>>>>>
>>>>>> Below are the steps that I take to replicate this problem, as well as snippets from my code. I added some terminal output to the servlet dispatch function that is passed to serve/servlet so that it displays the cookies it receives and the URL being requested before doing anything else.
>>>>>>
>>>>>> I appreciate any help/insight. Thanks,
>>>>>>
>>>>>> --- nadeem
>>>>>>
>>>>>>
>>>>>> **************************************************************
>>>>>> (Step 1)
>>>>>> Browser request (sniffed using Wireshark):
>>>>>>   GET /~nhamid/teaching/ HTTP/1.1\r\n
>>>>>>   (no cookies sent)
>>>>>>
>>>>>> Servlet output:
>>>>>>   ()/~nhamid/teaching/
>>>>>>
>>>>>>
>>>>>> (Step 2)
>>>>>>   request page that sets a cookie (and redirects to /~nhamid/teaching/course…)
>>>>>>
>>>>>>
>>>>>> (Step 3)
>>>>>> Browser request:
>>>>>>   GET /~nhamid/teaching/course HTTP/1.1\r\n
>>>>>>   Cookie: teaching-order=course\r\n
>>>>>>
>>>>>> Servlet output:
>>>>>>   ((teaching-order course #f #f))/~nhamid/teaching/course
>>>>>> (See the cookie is received.)
>>>>>>
>>>>>>
>>>>>> (Step 4) visit another page (www.berry.edu) -- sets __utmz (google analytics cookie) and a bunch of others, which I cleared out, except for the __utmz one.
>>>>>>
>>>>>>
>>>>>> (Step 5)
>>>>>> Browser request:
>>>>>>   GET /~nhamid/teaching/ HTTP/1.1\r\n
>>>>>>   Cookie: teaching-order=course; __utmz=165257760.1272597702.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)\r\n
>>>>>>
>>>>>> Servlet output:
>>>>>>   ()/~nhamid/teaching/
>>>>>> (Note: no cookies received at all!)
>>>>>>
>>>>>>
>>>>>> (Step 6) delete the __utmz cookie from the browser. This is the *only* thing I change.
>>>>>>
>>>>>>
>>>>>> (Step 7)
>>>>>> Browser request:
>>>>>>   GET /~nhamid/teaching/ HTTP/1.1\r\n
>>>>>>   Cookie: teaching-order=course\r\n
>>>>>>
>>>>>> Servlet output:
>>>>>>   ((teaching-order course #f #f))/~nhamid/teaching/
>>>>>> (Cookie is received again!)
>>>>>>
>>>>>> **************************************************************
>>>>>>
>>>>>> Scheme code:
>>>>>>
>>>>>> (serve/servlet my-dispatch
>>>>>>               #:listen-ip #f
>>>>>>               #:launch-browser? #f
>>>>>>               #:servlet-path "/nhamid/index.ss"
>>>>>>               #:servlet-regexp #rx""
>>>>>>               #:extra-files-paths (list htdocs)
>>>>>>               #:stateless? false)
>>>>>>
>>>>>> (define-values (web-dispatch web-url)
>>>>>>  (dispatch-rules
>>>>>>   [("~nhamid" "index.ss") render-home]
>>>>>>   [("~nhamid") render-home]
>>>>>>   [("~nhamid" "") render-home]
>>>>>>   [("~nhamid" "editlinks") render-editlinks]
>>>>>>   [("~nhamid" "teaching" "") render-teaching]
>>>>>>   [("~nhamid" "teaching" "semester") render-teaching-by-semester]
>>>>>>   [("~nhamid" "teaching" "course") render-teaching-by-course]
>>>>>> ))
>>>>>>
>>>>>> (define (display-cookies req)
>>>>>>  (let ([cookies (request-cookies req)])
>>>>>>    (display
>>>>>>    (map (lambda (c) (list (client-cookie-name  c)
>>>>>>                           (client-cookie-value c)
>>>>>>                           (client-cookie-domain c)
>>>>>>                           (client-cookie-path c)))
>>>>>>         cookies))))
>>>>>>
>>>>>> ;; my-dispatch : request -> response
>>>>>> (define (my-dispatch req)
>>>>>>  (display-cookies req)
>>>>>>  (display (string-append (url->string (request-uri req)) "\n"))
>>>>>>  (web-dispatch req))
>>>>>>
>>>>>>
>>>>>> _________________________________________________
>>>>>>  For list-related administrative tasks:
>>>>>>  http://list.cs.brown.edu/mailman/listinfo/plt-scheme
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jay McCarthy <jay at cs.byu.edu>
>>>>> Assistant Professor / Brigham Young University
>>>>> http://teammccarthy.org/jay
>>>>>
>>>>> "The glory of God is Intelligence" - D&C 93
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Nadeem Abdul Hamid
>>>> Assistant Professor, Computer Science
>>>> Berry College
>>>> PO Box 5014
>>>> 2277 Martha Berry Hwy NW
>>>> Mount Berry, GA 30149-5014
>>>> (706) 368-5632
>>>> http://cs.berry.edu/~nhamid/
>>>>
>
>
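The token rule quoted above can be checked mechanically. Here is a minimal sketch, assuming the separator list from RFC 2616, section 2.2. The names `token-char?`, `token?`, and `rejected-chars` are made up for this illustration and are not part of the Web Server, and the __utmz value is used without its trailing \r\n:

```scheme
#lang scheme

;; The separator characters listed in RFC 2616, section 2.2
;; (the last two are space and horizontal tab).
(define separators (string->list "()<>@,;:\\\"/[]?={} \t"))

;; token-char? : char -> boolean
;; A token character is US-ASCII, not a control character, not a separator.
(define (token-char? c)
  (and (< 31 (char->integer c) 127)
       (not (memv c separators))))

;; token? : string -> boolean
;; A token is one or more token characters.
(define (token? s)
  (and (positive? (string-length s))
       (andmap token-char? (string->list s))))

;; rejected-chars : string -> (listof char)
;; The distinct characters that keep s from being a token, in order of
;; first appearance.
(define (rejected-chars s)
  (remove-duplicates
   (filter (lambda (c) (not (token-char? c))) (string->list s))))

(define utmz-value
  "165257760.1272597702.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)")

(token? "course")            ; the teaching-order value passes
(token? utmz-value)          ; the __utmz value does not
(rejected-chars utmz-value)  ; the "=", "(" and ")" Jay points out
```

Note that "|" and "." are not separators, so they are not what makes the value invalid.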



-- 
Jay McCarthy <jay at cs.byu.edu>
Assistant Professor / Brigham Young University
http://teammccarthy.org/jay

"The glory of God is Intelligence" - D&C 93


Posted on the users mailing list.