Understanding Functions in High-Level Languages
Modern disassemblers are quite intelligent and take on the lion’s share of the work of recognizing key structures. In particular, IDA Pro successfully identifies standard library functions, local variables addressed through the RSP register, case branches, and more. However, IDA is sometimes mistaken and misleads the researcher, and its high cost does not always justify its use. Students of assembly language, for example, can hardly afford IDA (and the best way to learn assembly language is to disassemble other people’s programs).
Of course, IDA is not the only disassembler in the world. There is, for example, DUMPBIN, which ships as a standard part of the SDK. Why not take advantage of it? If there is nothing better at hand, DUMPBIN will do, but in that case you will have to forget about the intelligence of the disassembler and rely only on your own head.
Getting Started with Non-Optimizing Compilers
First, we will look at the output of non-optimizing compilers. Their code is relatively simple to analyze and quite understandable even for beginners in programming. Then, having mastered the disassembler, we will move on to more complex things: optimizing compilers that generate very tricky, confusing, and ornate code.
Identifying Functions
Function identification is the first step in studying programs written in high-level languages. A function (also called a procedure or subroutine) is the basic building block of procedural and object-oriented languages, so disassembly usually begins with identifying functions and the arguments actually passed to them. Strictly speaking, the term “function” is not present in all languages, and even where it is present, its definition varies from language to language.
Without going into details, we will treat a function as a separate sequence of instructions that is called from various parts of the program. A function may take one or more arguments or none at all, and it may or may not return a result. The key property of a function is that it returns control to the place from which it was called, and its characteristic feature is that it is called repeatedly from different parts of the program (although some functions are called from only one place).
How does a function know where to return control? Obviously, the calling code must first save the return address and pass it to the called function along with the other arguments. There are many ways to solve this problem. For example, before calling the function, the caller can place an unconditional jump to the return address at the end of the function. Alternatively, the caller can save the return address in a special variable, and the function, once it finishes, performs an indirect jump using this variable as the operand.
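The two approaches just described can be tried out on a toy machine. The sketch below is only an illustration written in C: the instruction set (OP_SET_RET, OP_PATCH, OP_JMP_RET and the rest), the run interpreter, and both demo functions are hypothetical names invented here, not the output of any real compiler or the instructions of any real processor. In each demo the same subroutine is "called" from two places and must return to a different address each time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction set, invented only for this illustration. */
enum opcode {
    OP_ADD,      /* acc += a                                          */
    OP_SET_RET,  /* ret_var = a: save a return address in a variable  */
    OP_PATCH,    /* code[a].a = b: rewrite the target of a jump       */
    OP_JMP,      /* pc = a: unconditional direct jump                 */
    OP_JMP_RET,  /* pc = ret_var: indirect jump through the variable  */
    OP_HALT      /* stop and report acc                               */
};

struct insn { enum opcode op; int a; int b; };

/* Interpret the program from address 0 and return the accumulator. */
static int run(struct insn *code, size_t len)
{
    int acc = 0;
    size_t ret_var = 0;   /* the "special variable" holding a return address */
    size_t pc = 0;        /* program counter */

    while (pc < len) {
        struct insn in = code[pc];
        switch (in.op) {
        case OP_ADD:     acc += in.a;            pc++; break;
        case OP_SET_RET: ret_var = (size_t)in.a; pc++; break;
        case OP_PATCH:
            assert(in.a >= 0 && (size_t)in.a < len);
            code[in.a].a = in.b;  /* self-modifying code: patch a jump target */
            pc++;
            break;
        case OP_JMP:     pc = (size_t)in.a; break;
        case OP_JMP_RET: pc = ret_var;      break;
        case OP_HALT:    return acc;
        }
    }
    return acc;
}

/* Approach 1: the caller patches the jump at the end of the subroutine. */
int demo_patched_jump(void)
{
    struct insn code[] = {
        {OP_PATCH, 6, 2},  /* 0: make the final jump point at address 2 */
        {OP_JMP,   5, 0},  /* 1: "call" the subroutine                  */
        {OP_PATCH, 6, 4},  /* 2: second call site, return to address 4  */
        {OP_JMP,   5, 0},  /* 3: "call" the subroutine again            */
        {OP_HALT,  0, 0},  /* 4: end of the main program                */
        {OP_ADD,  10, 0},  /* 5: subroutine body                        */
        {OP_JMP,   0, 0},  /* 6: placeholder target, patched per call   */
    };
    return run(code, sizeof code / sizeof code[0]);
}

/* Approach 2: the caller saves the return address in a variable. */
int demo_saved_variable(void)
{
    struct insn code[] = {
        {OP_SET_RET, 2, 0},  /* 0: remember to come back to address 2   */
        {OP_JMP,     5, 0},  /* 1: "call" the subroutine                */
        {OP_SET_RET, 4, 0},  /* 2: second call site, return to 4        */
        {OP_JMP,     5, 0},  /* 3: "call" the subroutine again          */
        {OP_HALT,    0, 0},  /* 4: end of the main program              */
        {OP_ADD,    10, 0},  /* 5: subroutine body                      */
        {OP_JMP_RET, 0, 0},  /* 6: indirect jump through ret_var        */
    };
    return run(code, sizeof code / sizeof code[0]);
}
```

Calling either demo function from a main of your own runs the subroutine body twice, once per call site. Note that in both variants there is only one slot for the return address, so a nested or recursive call would overwrite it; this limitation is a property of the sketch as written here.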