When I finished writing my book, Statistical Programming with SAS/IML Software, I was elated. However, one small task still remained.
I had to write the index.
How Long Should an Index Be?
My editor told me that SAS Press would send the manuscript to a professional editor who would index the book for me. They did, and I got back about 30 pages of terms and page numbers! The index shrank a little when I incorporated all those terms into LaTeX, which displays an index in two columns. However, it still occupies 11 pages and includes almost 1,200 entries!
I was a little annoyed. The entire book is only 440 pages—why is the index so long? I asked a colleague of mine who is an editor, and she said it didn't seem long to her. She referred me to Technical Writing 101: A Real-World Guide to Planning and Writing Technical Documentation (p. 148):
In technical documentation,... the rule of thumb is that the index should contain approximately one page of two-column index entries for every 20 pages of documentation.
That's crazy, I thought. There's no way my book should have a 22-page index! That might be a valid rule of thumb for documentation, but not for a statistical book.
But if I reject that rule of thumb, how can I determine whether the length of my index is appropriate compared to the length of my book? I decided to do what any sensible statistician would do in this situation: collect data.
Data for Index Length versus Book Length, by Publisher
A colleague and I marched down to the SAS library. We sampled books from three different publishers: SAS Press, Springer, and Wiley. For each book we recorded the number of pages (not including the index) and the number of pages for the index. You can download the data and the SAS/IML Studio program that analyzes the data.
The previous image summarizes the data. It shows the size of each book's index versus the size of the book. The books are colored according to the publisher. Because the data for each publisher appear to be linear except for outliers (for example, the 447-page Springer book with no index, which is a set of conference proceedings), I fit and added robust regression lines computed by the LTS algorithm in the ROBUSTREG procedure. I used a no-intercept model so that I could more easily interpret the slope of the regression line as a ratio that reveals the expected number of index pages, given the number of pages in the book. (The regression lines look similar for a model that includes an intercept term.)
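To make the "slope as a ratio" idea concrete, here is a minimal Python sketch of a no-intercept fit. It is not the SAS/IML Studio program mentioned above, and it does not implement LTS: it computes the ordinary least-squares slope through the origin and, as a simple robust stand-in, the median of the per-book ratios. The function names and the data values are made up for illustration; only the 447-page book with no index echoes the outlier described in the text.

```python
from statistics import median

def slope_through_origin(pages, index_pages):
    """Ordinary least-squares slope for the no-intercept model y = b*x."""
    sxy = sum(x * y for x, y in zip(pages, index_pages))
    sxx = sum(x * x for x in pages)
    return sxy / sxx

def median_ratio_slope(pages, index_pages):
    """Simple robust stand-in (not LTS): the median of the ratios y/x."""
    return median(y / x for x, y in zip(pages, index_pages))

# Made-up illustrative values, NOT the data collected for this post.
pages = [200, 300, 400, 500, 447]
index_pages = [8, 12, 16, 20, 0]  # the last book has no index (an outlier)

ols = slope_through_origin(pages, index_pages)
robust = median_ratio_slope(pages, index_pages)

# The reciprocal of the slope reads as "pages of text per page of index".
print(f"OLS slope:          {ols:.4f} (one index page per {1 / ols:.1f} pages)")
print(f"Median-ratio slope: {robust:.4f} (one index page per {1 / robust:.1f} pages)")
```

In this toy sample the book without an index drags the least-squares slope down, while the median of the ratios ignores it. That is the motivation for using a robust fit when a few books are outliers.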
The coefficients of the regression model indicate that SAS Press books average about one page of index for each 26 pages of text. In contrast, the other publishers average about one page of index for each 65 pages of text. In other words, for this sample the professionally compiled SAS Press indexes contain, on average, more than twice as many pages as indexes of comparable books printed by other publishers.
The analysis reveals that the index of my book (marked by an "X") is not too long, at least by SAS Press standards. In fact, my index is short compared with those of other SAS Press books, although it is on the high end for Springer and Wiley books of similar length.
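The comparison can be checked with a few lines of arithmetic. The Python sketch below uses only numbers quoted in this post (a 440-page book, an 11-page index, and ratios of 20, 26, and 65 pages of text per index page). It assumes that the 440-page figure can stand in for the length of the text; the variable names are illustrative.

```python
BOOK_PAGES = 440   # length of the book, as quoted in the post
ACTUAL_INDEX = 11  # index pages in the finished book

# Pages of text per page of index under each rule or fitted ratio.
pages_per_index_page = {
    "Technical Writing 101 rule": 20,
    "SAS Press (fitted)": 26,
    "Springer and Wiley (fitted)": 65,
}

expected = {name: BOOK_PAGES / ratio
            for name, ratio in pages_per_index_page.items()}

for name, value in expected.items():
    print(f"{name}: about {value:.1f} index pages")

print(f"Actual: {ACTUAL_INDEX} index pages, "
      f"one per {BOOK_PAGES / ACTUAL_INDEX:.0f} pages of text")
```

The actual index falls between the length predicted by the Springer and Wiley ratio and the length predicted by the SAS Press ratio, which matches the position of the "X" described above.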
I contacted a colleague who wrote one of the Springer books in the sample. I asked whether he had used a professional indexer, or had compiled the index himself. He said that he created the index himself:
We assumed it would cost us [to use a professional indexer], and we were concerned that the indexer wouldn't understand the topic. Doing it myself enabled me to ensure that spelling was consistent. Initially I intended to get students to test the index for me, but at the end of the process I had rather lost my enthusiasm!
I cannot comment on the practices of other publishers, but SAS Press does not charge its authors for this professional service. Still, I share his sense of fatigue: writing an index is hard and tiring work.
If I had compiled the index myself, would I have had the stamina to include 1,200 entries? Definitely not. But am I glad that my index is thorough, useful, and professionally compiled? Absolutely.
The cost to me? Three weeks of drudgery. The benefit to my readers? Priceless. Thanks, SAS Press!