[plt-scheme] Looking for a static analysis tool
Dominique Boucher wrote:
>The analysis tool I am looking for shall meet the following requirements:
Well, here's what I can tell you regarding MrFlow:
>1. New types and primitives can be added.
>
> Most of the SGDL primitives generate 3D volumes in some internal form.
> I would like to be able to add these primitives to the set of primitives
> handled by the analyzer instead of relying on the source code of the
> primitives. Also, a volume (and other types as well) shall be considered a
> primitive type of the language.
At some point in the future there will be a way for users to declare
their own types, and MrFlow will then be able to analyze programs in
terms of those types. Regarding the primitives, see below.
>2. Support for incremental analysis of multi file projects.
>
> Our IDE manages projects composed of source files organized in
> packages. Ideally, the analyzer shall be able to analyze each file
> separately and then run the global analysis. When a file is modified,
> the analysis of the file is done again and the whole global analysis
> is run. Hopefully, doing this would be less expensive than
> re-analyzing the whole project from scratch.
Robby Findler and I are working on a separate analysis based on his
contract system. Contracts added to a module's interface will be used
to analyze uses of the provided functions independently of the source
code of the functions themselves, and the source code of the functions
will be analyzed against the contracts independently of their uses
outside the module.
This means that if you put your SGDL primitives in a module and give
them contracts, the analysis will consider them as if they were
primitives (from the point of view of the code that's using them).
That will make the analysis of any module completely separate from the
analysis of all other modules, provided you've written contracts for
all the interfaces. If you haven't, the analyzer will just revert to
analyzing the source code of functions in other modules as if they
were defined in the currently analyzed module.
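To make the fallback rule concrete, here is a minimal sketch in Python (not Scheme, and not MrFlow code). The function name, the dictionary of contracts, and the contract strings are all hypothetical; the only point is the decision itself: a function with a contract is treated like a primitive, and one without a contract has to be analyzed from its source.

```python
def plan_module_analysis(imported, contracts):
    """Decide, per imported function, whether a contract can stand in for its source.

    imported  -- names of functions the analyzed module uses from other modules
    contracts -- hypothetical mapping from function name to its declared contract
    """
    by_contract = [name for name in imported if name in contracts]
    by_source = [name for name in imported if name not in contracts]
    return {"by_contract": by_contract, "by_source": by_source}


# Hypothetical SGDL-style primitives; only two of them have contracts.
contracts = {"make-sphere": "number -> volume", "union": "volume volume -> volume"}
plan = plan_module_analysis(["make-sphere", "union", "render"], contracts)
print(plan)
```

In this made-up example only "render" would force the analyzer to look at another module's source.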
We already have a prototype of that separate analysis working, and
it will probably make its way into MrFlow at the same time as I add
module support to the analysis.
Regarding automatically reanalyzing modified modules, I could see a
combination of Robby's module navigator and the analysis which would
find all the modules a given module depends on and automatically
analyze all the ones that have changed. That would probably be a neat
thing to do.
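Roughly, the bookkeeping could look like the following Python sketch. Everything here is an assumption made for illustration: the dependency table, the module names, and the set of changed modules are invented, and nothing is taken from the module navigator or from MrFlow.

```python
def transitive_deps(module, deps):
    """Return every module reachable from `module` by following `deps`."""
    seen = set()
    stack = list(deps.get(module, ()))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(deps.get(current, ()))
    return seen


def modules_to_reanalyze(module, deps, changed):
    """Modules among `module` and its dependencies that have changed, sorted by name."""
    candidates = transitive_deps(module, deps) | {module}
    return sorted(candidates & set(changed))


# Hypothetical project layout: module name -> modules it requires directly.
deps = {
    "viewer": ["sgdl-prims", "geometry"],
    "geometry": ["sgdl-prims"],
    "sgdl-prims": [],
    "parser": [],
}
print(modules_to_reanalyze("viewer", deps, changed={"geometry", "parser"}))
```

Here "parser" has changed too, but "viewer" does not depend on it, so it is left alone.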
>3. Support for "analysis extensions".
>
> By this, I mean a way to extend the basic analysis framework with new
> special forms. These special forms are implemented in the visualizer,
> so the code of the corresponding macros is not necessarily available to
> the analyzer. Moreover, adding extensions to the analyzer has an
> important advantage. It can help obtain more accurate analysis results.
> Often, expanding the macros results in code much harder to analyze or that
> gives too conservative approximations. [This results from personal
> observations with a CFA-based analyzer run on code generated by an
> LALR(1) parser generator I developed.]
MrFlow only works on the MzScheme core forms (i.e. section 12.6.1 of
the MzScheme manual) that result from completely expanding a program.
I think that's the level where it should be, because then you can use
MrFlow for all the languages that DrScheme supports (not only R5RS,
but also the full MzScheme language, Algol 60, Python, etc., provided
the language implementor has defined types for the primitives in the
language's runtime support), so that's very unlikely to change.
>4. R5RS Scheme only.
>
> This means that the analyzer must not rely on any implementation-dependent
> extension of Scheme, like a module system, in order to obtain more accurate
> results. [Of course, it is the analyzed code that shall be R5RS compliant,
> not the source code of the analyzer. I hope this was obvious from the
> context ;]
MrFlow currently analyzes R5RS plus define-struct. Define-struct is
in there as a test, to see what it would take to analyze generative
structures, but it will disappear as more support for the full MzScheme
language is added. Note that analyzing your code as plain R5RS will
then mean no separate analysis, since separate analysis will depend
on the use of modules.
>I already have a number of ideas regarding most of (if not all) these
>requirements. I worked on similar ideas for my PhD dissertation. I can
>easily do this work by myself. But I don't want to reinvent the wheel. So if
>someone is already working on similar ideas, I am willing to collaborate on
>the design and implementation of such a tool (i.e. on the development side
>or on more theoretical aspects). This way, we could come up with a better,
>more robust tool, with applications in the "real world" ;-) [SGDLstudio will
>soon be distributed in many major universities and research centers across
>North America, Europe, and even Asia].
>
>I know that MrFlow is coming pretty soon. Unfortunately, I don't know if it
>will meet these requirements. I have not seen a lot of papers on it. And I don't
>want to download the code without knowing the design decisions that have
>driven its development.
Well, it looks like MrFlow is not going to meet all your requirements,
especially with regard to macros and separate analysis without using
modules. Two of our goals are to have an analysis that can support
multiple languages (so it is done after macro expansion) and that works
on large programs (so it is based on our module system). But even if in the end you
decide to go for your own tool, there are probably pieces of the
analysis framework that you might be able to steal from MrFlow, and we
can definitely collaborate on those parts. And I'm certainly
interested in hearing from people working on "real" applications :-)
>[BTW, who is developing it?]
Hmmm, that would be me :-)
Philippe