[racket] [OT] Re: Fundamentals

From: Jakub Piotr Cłapa (jpc-ml at zenburn.net)
Date: Sat Oct 16 15:18:35 EDT 2010

On 16.10.10 18:29, Neil Van Dyke wrote:
> Jakub Piotr Cłapa wrote at 10/16/2010 11:57 AM:
>> On 14.10.10 11:44, Noel Welsh wrote:
>>> The distinction is a bit arbitrary, as with a one-to-one mapping you
>>> can convert from one to the other with no loss of information.
>>
>> AFAIK this is true in general since in the raw binary form you have to
>> distinguish code from data and as far as my knowledge of disassemblers
>> goes it is not a trivial problem to solve.
>
> Just in case any student is confused: by "one-to-one", Noel was speaking
> of assembly language when it is used as the target code of a compiler (note
> his "usually"). You might see assembly language code that uses macros
> and descriptive symbolic labels, which have information that is lost
> when assembling to its target machine (or virtual machine / p-code)
> code, especially if it is handwritten. Plus, if the assembly language
> was handwritten, the comments are often indispensable, and they, too,
> are lost when going to target code.

Don't get me wrong: I think Noel's message was a very good answer to the 
original question.

I was not thinking about comments and descriptive labels. My comment was
referring to the fact that, in the general case, disassembly is not a
trivial process: telling the data (constant strings, numbers, memory
addresses in ARM) apart from the instructions is difficult without
object file format support and/or debugging information. AFAIU you
basically need to run/emulate the code and detect all the possible jump
targets (including indirect ones and conditional branches), which IIUC
is equivalent to solving the halting problem.
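
To make the code/data problem concrete, here is a minimal sketch in
Python. It uses a made-up three-instruction byte code; the opcodes, the
decode helper and the sample program are all hypothetical and do not
correspond to any real ISA. It compares a naive linear sweep with a
traversal that only decodes bytes reachable through jump targets.

```python
# Hypothetical toy ISA, for illustration only:
#   0x01 imm   LOAD imm   (2 bytes)
#   0x02 addr  JMP addr   (2 bytes, unconditional, absolute target)
#   0x03       HALT       (1 byte)
LOAD, JMP, HALT = 0x01, 0x02, 0x03

# JMP 3, one data byte (0x01) at offset 2, then LOAD 7 and HALT.
PROGRAM = bytes([0x02, 0x03, 0x01, 0x01, 0x07, 0x03])


def decode(code, pc):
    """Decode one instruction at pc and return (text, length)."""
    op = code[pc]
    if op in (LOAD, JMP) and pc + 1 < len(code):
        name = "LOAD" if op == LOAD else "JMP"
        return f"{name} {code[pc + 1]}", 2
    if op == HALT:
        return "HALT", 1
    return f".byte {op}", 1  # not a known opcode: treat as raw data


def linear_sweep(code):
    """Decode from offset 0 to the end, assuming every byte is code."""
    out, pc = [], 0
    while pc < len(code):
        text, length = decode(code, pc)
        out.append((pc, text))
        pc += length
    return out


def recursive_traversal(code, entry=0):
    """Decode only what is reachable from entry by following direct jumps."""
    seen, work = {}, [entry]
    while work:
        pc = work.pop()
        while pc < len(code) and pc not in seen:
            text, length = decode(code, pc)
            seen[pc] = text
            if code[pc] == JMP and length == 2:
                work.append(code[pc + 1])  # continue at the jump target
                break
            if code[pc] == HALT or text.startswith(".byte"):
                break  # control flow ends here
            pc += length
    return sorted(seen.items())


print(linear_sweep(PROGRAM))
print(recursive_traversal(PROGRAM))
```

On this sample the linear sweep decodes the data byte at offset 2 as a
LOAD and loses sync with the real code at offset 3, while the traversal
recovers it by following the JMP. The toy only has direct jumps; with an
indirect jump the target is not visible in the bytes at all, which is
where the difficulty described above comes in.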

That said, as you point out, this does not matter when generation is
one-way: it makes little difference whether a compiler emits assembly
or machine code (hence my addition of [OT]).

-- 
regards,
Jakub Piotr Cłapa


Posted on the users mailing list.