Telling a Story
A cornerstone of good writing is identifying what the reader needs to learn. A paper is a sequence of concepts, building from a foundation of knowledge assumed to be common to all readers up to new ideas and results. Thus an effective paper educates its readers. It leads readers from what they already know to new knowledge you want them to learn. For this reason, the body of a good paper—everything between the introduction and the conclusions—should have a logical flow that has the feel of a narrative.
The narrative told by a paper is a walk through the ideas and outcomes. It isn’t a commentary on the research program or the day-to-day activities of the participants, nor is it meant to be mysterious. Instead, it is like a guided tour through a gallery, in which each room contains something new for the readers to comprehend. There is also an expectation of logical closure. The early parts of the paper’s body typically explain hypotheses or claims; the reader expects to discover by the end whether these are justified.
There are several common ways of structuring the body of a paper, including as a chain, by specificity, by example, and by complexity. Perhaps the most common is the first of these, a chain, in which the results and the background on which they build dictate a logical order for presentation of the material. First might come, say, a problem statement, then a review of previous solutions and their drawbacks, then the new solution, and finally a demonstration that the solution improves on its predecessors.
The “compression for fast external sorting” project suggests a structure of this kind. The problem statement consists of an explanation of external sorting and an argument that disk access costs are a crucial bottleneck. The review explains standard compression methods and why they cannot be integrated into external sorting. The new solution is the compression method developed in the research. The demonstration is a series of graphs and tables based on experiments that compare the costs of sorting with and without compression.
For some kinds of results, other structures may be preferable. One option is to structure by specificity, an approach that is particularly appropriate for results that can be divided into several stages. The material is first outlined in general terms, then the details are progressively filled in. Most technical papers have this organization at the high level, but it can also be used within sections.
One kind of material that might be structured in this way is an explanation of a retrieval system. Such systems generally have several components. For example, in text retrieval a parser is required to extract words from the text that is being indexed; this information must be passed to a procedure for building an index; queries must likewise be parsed into a format that is consistent with that of the stored text; and a query evaluator uses the index to identify the records that match a given query. The explanation might begin with a review of this overall structure, then proceed to the details of each component.
Another structure is by example, in which the idea or result is initially explained by, say, applying it to some typical problem. Then the idea can be explained more formally, in a framework the example has made concrete and familiar. The “compression for fast external sorting” project could also be approached in this way. The explanation could begin by considering, hypothetically, the likely impact of compression on sorting. To make the discussion more concrete, a couple of specific instances, such as a small relation and a large relation, could be used to illustrate the expected behaviour in different circumstances. Given a clear explanation of the hypothetical scenario, you can then proceed to fill in the details of the method that was tested in the research.
Another alternative is to structure the body by complexity. A simple case is given first, and a more complex case is then explained as an extension, thus avoiding the difficulty of explaining basic concepts in a complex framework. This approach is a kind of tutorial: the reader is brought by small steps to the full result. For example, a mathematical result for an object-oriented programming language might initially be applied to some simple case, such as programs in which all objects are of the same class. The result could then be extended to programs with inheritance.
Structuring by complexity is a good way to present a paper, but it is often an inappropriate way to conduct ongoing research. It is not uncommon to see a paper in which the authors have solved an easy case of a problem, say optimizations for iteration-free programs, and justified the work with hopeful claims such as “we expect these results to throw light on optimization of programs with loops and recursion”. All too often the follow-up paper never appears.
Some other structures are inappropriate for a write-up. For example, the paper should not be a chronological list of experiments and results. The aim is to present the evidence needed to support an argument, not to list the work undertaken. Most experiments yield far more data than can be presented in a paper of reasonable length. Important results can be summarized in a graph or a table, and other outcomes reported in a line or two. It is acceptable to state that experiments have yielded a certain outcome without providing details, so long as those experiments do not affect the main conclusions of the paper (and have actually been performed). Similarly, there may be no need to include the details of proofs of lemmas or minor theorems. This does not excuse you from conducting the experiments or convincing yourself that the results are correct, but such information can be kept in logs of the research rather than included in the paper.
The traditional structure for organizing research papers can encourage you to list all proofs or results, then analyze them later; with this structure, however, the narrative flow is often poor. It usually makes more sense to analyze proofs or experimental results as they are presented, particularly since experiments or theorems often follow a logical sequence in which the outcome of one dictates the parameters of the next.
When describing specific results, it is helpful, although not always possible, to begin with a brief overview of what has been observed. The rest of the discussion can then be used for amplification rather than for introducing further observations. Newspaper articles are often written in this way. The first sentence summarizes the story; the next few sentences restate it, giving some context; then the remainder of the article presents the whole story in detail. Sections of research papers can sometimes be organized in this way.