[plt-scheme] Performance Targets for MzScheme

From: Matthias Felleisen (matthias at ccs.neu.edu)
Date: Wed May 12 21:10:39 EDT 2004

[I was hoping you'd read the thread.]

So this shows that the problem is with the SRFIs. They are coded in 
Scheme, and what you're measuring is the interpretation time for each 
call to split and all the Scheme calls within. For each function call, 
the interpreter "loops" once (roughly). If, on the other hand, you call 
a "native" C function, you do ONE function call. Period.

Okay, guys, keep this in mind when you write SRFIs and measure 
benchmarks. Until we have a compiler, we can't rely on SRFI coding for 
efficiency.
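
The interpreted-versus-native gap can be seen from the Python side as 
well. What follows is only a rough sketch, not the benchmark from this 
thread: split_whitespace_py is a hypothetical helper that loosely 
mirrors the C loop quoted below (without maxsplit), and the input line 
and repeat count are arbitrary assumptions. It times the built-in 
str.split, which is one call into C, against the same loop run step by 
step in the interpreter.

```python
import timeit


def split_whitespace_py(s):
    """Pure-Python whitespace split (hypothetical helper, no maxsplit)."""
    tokens = []
    i, n = 0, len(s)
    while i < n:
        # Skip leading whitespace.
        while i < n and s[i].isspace():
            i += 1
        j = i
        # Scan to the end of the current token.
        while i < n and not s[i].isspace():
            i += 1
        if j < i:
            tokens.append(s[j:i])
    return tokens


if __name__ == "__main__":
    # Arbitrary test input; not the data used in the thread.
    line = "the quick brown fox jumps over the lazy dog " * 200
    assert split_whitespace_py(line) == line.split()

    native = timeit.timeit(lambda: line.split(), number=50)
    interpreted = timeit.timeit(lambda: split_whitespace_py(line), number=50)
    print(f"built-in str.split: {native:.4f}s")
    print(f"pure-Python loop:   {interpreted:.4f}s")
```

Both versions return the same tokens; only the number of interpreter 
steps per token differs. Actual timings depend on the machine and the 
Python version, so none are claimed here.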

-- Matthias

P.S. Ryan, can you measure the performance of this thing under .Net? 
Thanks.

On May 12, 2004, at 8:50 PM, Daniel Silva wrote:

>   For list-related administrative tasks:
>   http://list.cs.brown.edu/mailman/listinfo/plt-scheme
>
> from Python's src/Objects/fileobject.c:
>
> static PyObject *
> file_readlines(PyFileObject *f, PyObject *args)
> {
> [huge C function...120 lines]
> }
>
> from Python's src/Objects/stringobject.c:
>
> static PyObject *
> string_split(PyStringObject *self, PyObject *args)
> {
> [snip...8 lines]
> 	if (subobj == Py_None)
> 		return split_whitespace(s, len, maxsplit);
> [snip...52 lines]
> }
>
> and...
>
> static PyObject *
> split_whitespace(const char *s, int len, int maxsplit)
> {
> 	int i, j;
> 	PyObject *str;
> 	PyObject *list = PyList_New(0);
>
> 	if (list == NULL)
> 		return NULL;
>
> 	for (i = j = 0; i < len; ) {
> 		while (i < len && isspace(Py_CHARMASK(s[i])))
> 			i++;
> 		j = i;
> 		while (i < len && !isspace(Py_CHARMASK(s[i])))
> 			i++;
> 		if (j < i) {
> 			if (maxsplit-- <= 0)
> 				break;
> 			SPLIT_APPEND(s, j, i);
> 			while (i < len && isspace(Py_CHARMASK(s[i])))
> 				i++;
> 			j = i;
> 		}
> 	}
> 	if (j < len) {
> 		SPLIT_APPEND(s, j, len);
> 	}
> 	return list;
>   onError:
> 	Py_DECREF(list);
> 	return NULL;
> }
>
>
> Daniel
>
>
> On Wed, 12 May 2004 20:32:26 -0400, Matthias Felleisen
> <matthias at ccs.neu.edu> wrote:
>>
>> Could you please check whether the string tokenizer codes are written
>> in Python or Perl respectively? It is my impression that Python is just
>> a scripting language for C-libraries. -- Matthias
>>
>>
>>
>>
>> On May 12, 2004, at 8:25 PM, Brent Fulgham wrote:
>>
>>>
>>> --- Matthias Felleisen <matthias at ccs.neu.edu> wrote:
>>>> [matthias-ti:~/Desktop] matthias% mzscheme -r test-gen.ss
>>>> [matthias-ti:~/Desktop] matthias% time ./testfile.py
>>>> 1.380u 0.090s 0:01.51 97.3%     0+0k 0+1io 0pf+0w
>>>>
>>> [ ... snip ...]
>>>> [matthias-ti:~/Desktop] matthias% time mzscheme -r testfile.ss
>>>> 1.820u 0.160s 0:02.04 97.0%     0+0k 0+0io 0pf+0w
>>>>
>>>> Okay, we lose by either 4.5 or .4 depending on how you count. That is
>>>> slower but not an order of magnitude.
>>>
>>> Excellent !
>>>
>>> It looks as though the culprit is string-tokenize.  I
>>> also tried a regular-expression-based version and
>>> found it to be about an order of magnitude faster.
>>>
>>> For comparison:
>>>
>>>                       tokenize        regular-expr
>>> Perl                   2.5721          2.3654
>>> Python                 1.8225          1.7576
>>> PLT Scheme            40.5884          3.6522
>>>
>>> This is with the naive do-ec implementation of the
>>> loop, not processing directly to the port (so there is
>>> room for improvement).  The second set of numbers is
>>> much better.
>>>
>>> -Brent
>>
>>



Posted on the users mailing list.