[plt-scheme] Performance Targets for MzScheme

Wed May 12 20:50:08 EDT 2004

from Python's src/Objects/fileobject.c:

static PyObject *
file_readlines(PyFileObject *f, PyObject *args)
{
[huge C function...120 lines]
}

from Python's src/Objects/stringobject.c:

static PyObject *
string_split(PyStringObject *self, PyObject *args)
{
[snip...8 lines]
	if (subobj == Py_None)
		return split_whitespace(s, len, maxsplit);
[snip...52 lines]
}

and...

static PyObject *
split_whitespace(const char *s, int len, int maxsplit)
{
	int i, j;
	PyObject *str;
	PyObject *list = PyList_New(0);

	if (list == NULL)
		return NULL;

	for (i = j = 0; i < len; ) {
		while (i < len && isspace(Py_CHARMASK(s[i])))
			i++;
		j = i;
		while (i < len && !isspace(Py_CHARMASK(s[i])))
			i++;
		if (j < i) {
			if (maxsplit-- <= 0)
				break;
			SPLIT_APPEND(s, j, i);
			while (i < len && isspace(Py_CHARMASK(s[i])))
				i++;
			j = i;
		}
	}
	if (j < len) {
		SPLIT_APPEND(s, j, len);
	}
	return list;
  onError:
	Py_DECREF(list);
	return NULL;
}
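
For anyone who wants to play with the scanning loop outside the
interpreter, here is a minimal standalone sketch of the same technique.
It is not Python's code: the names span and split_spans are made up for
illustration, tokens are recorded as index ranges instead of being
appended to a Python list, and the fixed-size output array is an
assumption to keep the example short. The (unsigned char) cast plays
roughly the role that Py_CHARMASK plays above.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper type: one token as a half-open range [start, end). */
typedef struct {
    size_t start;
    size_t end;
} span;

/*
 * Scan s[0..len) and record whitespace-delimited tokens in out[0..cap).
 * At most maxsplit tokens are split off; whatever remains after that
 * (leading whitespace skipped) is recorded as one final token.
 * Tokens that do not fit in out are silently dropped (an assumption of
 * this sketch). Returns the number of spans written.
 */
static size_t
split_spans(const char *s, size_t len, int maxsplit, span *out, size_t cap)
{
    size_t i = 0, j = 0, n = 0;

    while (i < len && n < cap) {
        /* skip whitespace in front of the next token */
        while (i < len && isspace((unsigned char)s[i]))
            i++;
        j = i;
        /* advance to the end of the token */
        while (i < len && !isspace((unsigned char)s[i]))
            i++;
        if (j < i) {
            if (maxsplit-- <= 0)
                break;          /* split budget used up; j marks the rest */
            out[n].start = j;
            out[n].end = i;
            n++;
            /* skip whitespace so j points at the next token (or at len) */
            while (i < len && isspace((unsigned char)s[i]))
                i++;
            j = i;
        }
    }
    /* leftover text after the last allowed split becomes one token */
    if (j < len && n < cap) {
        out[n].start = j;
        out[n].end = len;
        n++;
    }
    return n;
}
```

Called on "  foo  bar " (length 11) with a large maxsplit, this records
two ranges, [2,5) and [7,10). With "a b c" and maxsplit 1 it records
[0,1) and [2,5), i.e. "a" and "b c". The whole job is a single pass over
the bytes with no per-character allocation, which is the kind of loop
that is hard to match from inside an interpreted tokenizer.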


Daniel


On Wed, 12 May 2004 20:32:26 -0400, Matthias Felleisen
<matthias at ccs.neu.edu> wrote:
> 
>  For list-related administrative tasks:
>  http://list.cs.brown.edu/mailman/listinfo/plt-scheme
> 
> Could you please check whether the string tokenizer code is written
> in Python or Perl, respectively? It is my impression that Python is just
> a scripting language for C libraries. -- Matthias
> 
> 
> 
> 
> On May 12, 2004, at 8:25 PM, Brent Fulgham wrote:
> 
> >
> > --- Matthias Felleisen <matthias at ccs.neu.edu> wrote:
> >>
> >> [matthias-ti:~/Desktop] matthias% mzscheme -r test-gen.ss
> >> [matthias-ti:~/Desktop] matthias% time ./testfile.py
> >> 1.380u 0.090s 0:01.51 97.3%     0+0k 0+1io 0pf+0w
> >>
> > [ ... snip ...]
> >> [matthias-ti:~/Desktop] matthias% time mzscheme -r testfile.ss
> >> 1.820u 0.160s 0:02.04 97.0%     0+0k 0+0io 0pf+0w
> >>
> >> Okay, we lose by either 4.5 or .4 depending on how you count.
> >> That is slower but not an order of magnitude.
> >
> > Excellent!
> >
> > It looks as though the culprit is string-tokenize.  I
> > also tried a regular-expression-based version and
> > found it to be about an order of magnitude faster.
> >
> > For comparison:
> >
> >                       tokenize        regular-expr
> > Perl                   2.5721          2.3654
> > Python                 1.8225          1.7576
> > PLT Scheme            40.5884          3.6522
> >
> > This is with the naive do-ec implementation of the
> > loop, not processing directly to the port (so there is
> > room for improvement).  The second set of numbers is
> > much better.
> >
> > -Brent