[plt-scheme] PLT with very large datasets?
Last year I skimmed a bit of what was available, and you are right,
there is a lot of information. (In fact, there is probably too much;
doing a survey of it all is a project in itself.)
In particular, I remember reading a high-level description of what the
current top team (BellKor/KorBell) is doing: they implemented many
(>100) different well-known algorithms and their variants, then
combined the predictions from each using a linear meta-model.
Individually, even the best algorithm is pretty weak, but combining
predictions from complementary algorithms helps a lot.
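If I understood the description correctly, the combination step itself
is conceptually tiny; something like this toy sketch in PLT Scheme (my
own guess at the idea, with hypothetical model procedures and
hand-picked weights, certainly not their real code):

    ;; A toy sketch of linear blending: each base model is a procedure
    ;; (user item -> predicted rating), and the weights are assumed to
    ;; have been fit beforehand on held-out data.
    (define (blend-predict models weights user item)
      (apply + (map (lambda (model w) (* w (model user item)))
                    models
                    weights)))

    ;; e.g., with two hypothetical base models:
    ;;   (blend-predict (list model-a model-b) '(0.6 0.4) 'u42 'i7)
    ;; returns 0.6 * (model-a 'u42 'i7) + 0.4 * (model-b 'u42 'i7)

All the hard work, of course, is in building the hundred base models
and fitting the weights.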
Given the effort already put into them, it seems there is very little
room for a newbie like me to improve on the well-known algorithms. It
should certainly be more fun to try (and fail miserably) to come up
with my own ;)
The papers I have seen tend to be full of the gory details of the math
involved, but have precious little on the infrastructure used to
implement them.
The link you gave looks interesting, and they have the source
available. I will check it out.
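I gather the point of "online" is that the model is updated one
example at a time, so only the weights ever have to sit in memory. My
guess at the flavour of it, as a toy one-pass linear-regression update
(certainly not what vw actually does):

    ;; One stochastic-gradient step on a linear model. weights and
    ;; features are equal-length lists of numbers; rate is the step
    ;; size. Returns the updated weight list.
    (define (sgd-step weights features target rate)
      (let* ((prediction (apply + (map * weights features)))
             (error (- prediction target)))
        (map (lambda (w x) (- w (* rate error x)))
             weights
             features)))

    ;; Folding this over a port of examples reads the data once and
    ;; keeps only the weight vector resident, e.g.:
    ;;   (sgd-step '(0 0) '(1 2) 5 0.1)  => (0.5 1.0)

That style seems like a good fit for datasets that are too big for
PLT's heap.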
Thanks :)
--Yavuz
On Mon, Jun 30, 2008 at 11:42, Noel Welsh <noelwelsh at gmail.com> wrote:
> You might be interested in online learning algorithms, which do not
> need to hold all data in memory. Here is one example:
>
> http://hunch.net/~vw/
>
> If you haven't done so already it is worth reading the literature on
> the Netflix prize. There is a fair amount out there which will give
> you ideas on how to process the data.
>
> HTH,
> Noel
>