[racket] net/http-client

From: Erik Pearson (erik at adaptations.com)
Date: Thu Sep 19 14:21:56 EDT 2013

Just finishing up a two day wrestling match with net/url, couchdb, and
assorted other incomplete libraries ... I have some fresh insight.

First, the new http client library will be very welcome!

A quick review of the existing code relieves me of most of my worries about
net/url, which is a bit of a beautiful mess. In particular, I think that
some of the code that analyzes the header is too brittle. In net/url a
header is expected to be nearly perfectly formed. For instance, the
"chunked" condition is tested for by a match against the literal header
string "Transfer-Encoding: chunked". Just a difference in case, or an extra
space somewhere will lead to a false negative. In the new code, this is
done by a much more forgiving regexp comparison. However, I'm not sure this
goes far enough. More on that in a sec.
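
To illustrate the kind of tolerant check meant here, a minimal sketch follows. The helper name chunked-header? is hypothetical; this is not the code in net/url or in the new library. It assumes the header line arrives as a bytestring with the trailing CRLF already removed, and it ignores the case where several transfer codings are listed.

```racket
#lang racket/base

;; Hypothetical helper (not part of net/url or net/http-client):
;; does one raw header line announce chunked transfer encoding?
;; The field name and the value are matched case-insensitively,
;; and optional spaces or tabs around the colon are tolerated.
(define (chunked-header? line)
  (regexp-match?
   #rx#"^(?i:transfer-encoding)[ \t]*:[ \t]*(?i:chunked)[ \t]*$"
   line))
```

A literal comparison against "Transfer-Encoding: chunked" would reject a line such as "transfer-encoding:  CHUNKED", while the pattern above accepts it.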

I would like to make some recommendations:
- keep text as bytestrings -- it looks like this is the way the new library
works (but not net/url).
- the status line should be parsed and the status code converted to an
integer.
- The header should be parsed into the simplest usable form. I would
suggest an alist, with the key being a lower-cased symbol and the
value being the original bytestring.
- When forming a header for a request, the same format should be used. I
realize that if one wants proper-cased field names, the library will need
to perform this formatting.
- All information should be made available back to the api user.
- Standard header field values should be optionally parsed.
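
A rough sketch of what the status-line and header recommendations could look like. All function names here (parse-status-line, parse-header-line, parse-headers) are hypothetical and are not taken from net/http-client. The sketch assumes each line is a bytestring with the CRLF already stripped and that folded (multi-line) header values do not occur.

```racket
#lang racket/base

;; Hypothetical: split a status line into version, integer code, reason.
;; Returns three values: bytes, exact integer, bytes.
(define (parse-status-line line)
  (define m
    (regexp-match
     #rx#"^(HTTP/[0-9]+\\.[0-9]+) ([0-9][0-9][0-9])(?: (.*))?$"
     line))
  (unless m
    (error 'parse-status-line "malformed status line: ~e" line))
  (values (cadr m)
          (string->number (bytes->string/latin-1 (caddr m)))
          (or (cadddr m) #"")))   ; the reason phrase may be absent

;; Hypothetical: turn one header line into (lower-cased-symbol . bytes).
;; The value is kept as a bytestring, minus surrounding whitespace.
(define (parse-header-line line)
  (define m
    (regexp-match #rx#"^([^: \t]+)[ \t]*:[ \t]*(.*?)[ \t]*$" line))
  (unless m
    (error 'parse-header-line "malformed header line: ~e" line))
  (cons (string->symbol
         (string-downcase (bytes->string/latin-1 (cadr m))))
        (caddr m)))

;; An alist keeps repeated fields and their order, since nothing is
;; overwritten when the same field name appears more than once.
(define (parse-headers lines)
  (map parse-header-line lines))
```

For example, (parse-headers (list #"Content-Type: text/html")) would give an alist whose single entry has the key 'content-type and the value #"text/html".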

These changes would make the library much more useful. It would be
consistent with http library design across languages. It would also promote
more uniformity across the Racket libraries that use it. I've found that
when diving into a library (currently I'm dealing with couchdb) that has to
reinvent the http wheel -- it may be reinvented with a slight twist. If one
is using multiple libraries together, each with a different idea of what a
header or http status line is, well, things get really unnecessarily
complicated.

I think the core library could benefit from this as well. For instance,
when looking for fields in the header, it would be more reliable to find
the field in an alist, than to use a regex to extract values out of the
bytestring. This is a very common operation when dealing with http, and I
think it should be as simple and reliable as practical.
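
As a sketch of how simple the lookup becomes once the header is an alist keyed by lower-cased symbols: the names header-ref and header-ref* below are hypothetical, not an existing API, and the sample data is made up for illustration.

```racket
#lang racket/base

;; Made-up sample data in the proposed shape: symbol keys, bytestring values.
(define sample-headers
  '((content-type . #"text/html; charset=utf-8")
    (set-cookie   . #"a=1")
    (set-cookie   . #"b=2")))

;; Hypothetical: first value for a field, or a default when it is absent.
(define (header-ref headers name [default #f])
  (define entry (assq name headers))
  (if entry (cdr entry) default))

;; Hypothetical: every value for a field, in order of appearance.
;; Useful for fields that may legitimately repeat.
(define (header-ref* headers name)
  (for/list ([entry (in-list headers)]
             #:when (eq? (car entry) name))
    (cdr entry)))
```

Since the keys are symbols, assq and eq? are enough, and no regex has to be run against the raw header text for each lookup.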

Finally I would recommend that there be a facility for decoding and
encoding standard header fields. The application of the parsing would be
optional, but I think it is important to have an implementation in the core
library. The header field names and formats are well specified, and there
are not all that many of them. They could be supplied as a hash or aref,
with the lower-cased field-name symbol (as in the parsed header) as the key
and two functions as the value: one to decode from a bytestring, and one to
encode into a
bytestring. The simplest, most racketiest data structure would be best. A
library user could grab this translation database and extend or modify it
at will, but they would have a great head start.
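
One possible shape for such a translation table, as a sketch only. The struct codec, the table standard-codecs, and the helpers decode-field and encode-field are all hypothetical names, and only two numeric fields are filled in; a real table would cover many more fields and would need to handle malformed values.

```racket
#lang racket/base

;; Hypothetical: the two conversion functions for one header field.
(struct codec (decode encode))

(define (bytes->number b) (string->number (bytes->string/latin-1 b)))
(define (number->bytes n) (string->bytes/latin-1 (number->string n)))

;; Hypothetical table: lower-cased field symbol -> codec.
;; It is an immutable hash, so extending it never affects other users.
(define standard-codecs
  (hash 'content-length (codec bytes->number number->bytes)
        'max-forwards   (codec bytes->number number->bytes)))

;; Decode a raw value if a codec is known; otherwise return it unchanged.
(define (decode-field codecs name raw)
  (define c (hash-ref codecs name #f))
  (if c ((codec-decode c) raw) raw))

;; Encode a value if a codec is known; otherwise assume it is already bytes.
(define (encode-field codecs name value)
  (define c (hash-ref codecs name #f))
  (if c ((codec-encode c) value) value))

;; A library user extending the table with an application-specific field.
(define my-codecs
  (hash-set standard-codecs
            'x-retry-count (codec bytes->number number->bytes)))
```

Because the parsing is applied through a separate call, a client that only cares about the body never pays for it.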

Anyway, this is far from exhaustive, but these are some of the issues I've
grappled with in recent hours and are right on the top of my head.


On Tue, Sep 17, 2013 at 6:51 AM, Jay McCarthy <jay.mccarthy at gmail.com> wrote:

> I think it's an obvious request, but a character flaw of mine is not
> doing things unless they can be done really well. In this case, I see
> a hash table as a "parse" of the headers. It's not obvious to me how
> to parse them. For example...
>
> - The same header can appear many times, so (Key -> Value) is
> incorrect, unless you overwrite one. It would be better to have (Key
> -> (Listof Value)) but that feels really ugly since most of the time
> there will just be one
> - The spec doesn't mandate case sensitivity on headers, so I would
> need to canonicalize "ACCept-ENCodiNG" to something else. Maybe
> 'Accept-Encoding?
> - The value of many headers is not really a string. For instance,
> Content-Length is a number, Cache-Control is an association list,
> Content-Disposition is complicated, etc. I feel like it is
> disingenuous to only partially parse.
> - Dealing with all this may be wasted effort for most requests that
> just care about the body
>
> For these reasons, I think http-client should just return the list of
> bytes. I think it would be nice to have another function that parses
> that so clients could optionally call it if it is important. That can
> be part of http-client.
>
> Jay
>
>
>
> On Tue, Sep 17, 2013 at 3:41 AM, adam moore <nerdfunk at gmail.com> wrote:
> > Hi Jay,
> >
> > Just looking over the new http client code - looking very nice, and
> > much better than my current slapped together parsing of
> > get-impure-port.
> >
> > I was wondering if it might be better to pass back the headers as
> > something easier to look up against, for example as a hash table? Or
> > perhaps, provide an option to do so. I think it's a pretty common use
> > case to provide for.
> >
> > Thanks again,
> > Adam
>
>
>
> --
> Jay McCarthy <jay at cs.byu.edu>
> Assistant Professor / Brigham Young University
> http://faculty.cs.byu.edu/~jay
>
> "The glory of God is Intelligence" - D&C 93



-- 
Erik Pearson
Adaptations
;; web form and function

Posted on the users mailing list.