CHAPTER 5 Addendum
Data Processing in the '70s and Later

Analysing size-density data, when I conducted the international study in the early 1970s, was much different from what it would be today. It was far more labor intensive in terms of data retrieval and analysis. It usually involved many more people (volunteer student assistants, staff in the campus Computer Center, perhaps staff at the campus Bureau for Faculty Research). And it meant a lot of running around from the library to my office to the computer center.

Here are the data for Algeria (Fig 5-3), from the Britannica World Atlas, a bulky 11-by-16-inch volume of maps and statistical tables.
The standard procedure was to write out the needed data on a data entry form, a ruled sheet which indicated the fields needed in the keypunching process (e.g., a row for each territorial unit, with columns 1-20 for the unit's name, 21-30 for its area, 31-40 for its population). I included the name of each division, both as an aid in later error correction and in order to readily identify any peculiar "outliers" or other oddities. The only problem here was to choose between the populations shown for 1954 and 1960; since I invariably chose the most recent data, I picked the 1960 census. Writing it all out was simply tedious.
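The fixed-column card layout can be sketched in Python. The field widths follow the example above, though the division name and figures here are only placeholders:

```python
# Build one 80-column card image per territorial division, using the
# field layout described above: name in columns 1-20, area in 21-30,
# population in 31-40. The division and figures below are made up.
def card_image(name, area, population):
    return f"{name:<20.20}{area:>10}{population:>10}".ljust(80)

line = card_image("ALGIERS", 1300, 1200000)
print(repr(line))
```

Each call yields one line of exactly 80 characters, the capacity of a standard punched card; the `.20` precision truncates any name too long for its field, just as the keypunch form forced you to abbreviate.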
The filled-in data entry forms were then taken to the campus Computer Center for keypunching. If you had a grant you could pay people to do such work, if not (my case) you punched the data yourself on a machine like that shown here.
Blank cards were stacked in a hopper on the right of the keyboard and fed, one at a time, into the "punching station" just behind the keyboard. Data were keyed in from the appropriate row (territorial division) on the data entry sheet. When the card for a given territorial division was finished, it was shoved through a "reading station" and into the card hopper on the left.
Ordinarily with such data, you punched the whole set twice, then ran both sets through a verifier to check for punching errors. Once the entire set of data cards (1764 territorial units, 98 nations) was ready you then wrote the program -- a series of statements in FORTRAN, PL/1 or BASIC -- and punched those onto cards as well, one card for each line of the program. A couple of "job entry" cards (identifying you, your project, etc.), the program deck and the data deck were then turned in at the receiving desk of the Computer Center. The staff of the center would then process your deck on the IBM 360 mainframe computer. Turn-around time might be several hours or perhaps a whole day. Often what you got back was a list of errors (project identification errors, programming errors, unreadable card errors), any of which would terminate your job. This meant going back through everything, discovering the problem, then going back to the Computer Center to repunch the offending cards, and resubmitting. If you were lucky, sooner or later you could pick up results such as these, printed on "greenbar paper" (which helped the eye follow lines across). The printer was really a glorified typewriter - no font variations, no pictures, just text like this:
While the computer's printer couldn't produce actual graphic results (I later did have access to a graphics plotter), it could be programmed to approximate a scatter diagram. You had it determine the maximum and minimum values for X and Y (log density, log area), then assign these to the rows and columns at the top, left, bottom and right of the typing which would appear on a single page of greenbar paper. Next you had it convert (a simple ratio problem) calculated X and Y values for each territorial unit into an appropriate cell determined by row and column of the printing surface. If two or more units occupied the same cell, you had to show that through incrementation (like the "3" in the case of Algeria):
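That scaling procedure can be sketched in Python. The area and population figures here are invented stand-ins for a real census table:

```python
import math

# Hypothetical (area, population) pairs for a handful of divisions --
# the real runs used a country's full census table.
data = [(1000, 50000), (5000, 120000), (250, 30000),
        (12000, 200000), (800, 90000), (3000, 45000)]

ROWS, COLS = 20, 60          # the usable grid on one page of greenbar

xs = [math.log10(a) for a, p in data]        # X = log area
ys = [math.log10(p / a) for a, p in data]    # Y = log density

xmin, xmax = min(xs), max(xs)
ymin, ymax = min(ys), max(ys)

grid = [[0] * COLS for _ in range(ROWS)]
for x, y in zip(xs, ys):
    # the "simple ratio problem": position within range -> row/column
    col = round((x - xmin) / (xmax - xmin) * (COLS - 1))
    row = round((ymax - y) / (ymax - ymin) * (ROWS - 1))  # row 0 on top
    grid[row][col] += 1      # incrementation marks coincident points

for r in grid:
    print("".join(str(c) if c else "." for c in r))
```

Printing each cell's count (a dot for empty cells) reproduces the effect described above: two units falling in the same cell print as "2", three as "3", and so on.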
With a pencil, ruler and little further effort you could estimate the approximate locations on the axes for integral unit values (marked here in blue). By locating two other points, the midpoint (MEAN X, MEAN Y) and the intercept (X=0, A), you could also draw a regression line (shown in red here). These niceties were simply drawn on with a felt-tip or ballpoint pen.
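Those two reference points come straight from the least-squares fit: the line always passes through (mean X, mean Y), and its intercept A is where it crosses X = 0. A minimal sketch, with made-up log values rather than the Algeria figures:

```python
# Toy log-area / log-density pairs (not the real Algeria data)
xs = [2.1, 2.8, 3.0, 3.5, 4.0]
ys = [2.6, 2.0, 1.9, 1.4, 1.0]

n = len(xs)
mx = sum(xs) / n                 # mean X
my = sum(ys) / n                 # mean Y
sxx = sum((x - mx) ** 2 for x in xs)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

b = sxy / sxx                    # slope of the regression line
a = my - b * mx                  # intercept A at X = 0

# The two points used to draw the line: (mx, my) and (0, a)
```

Connecting (mx, my) and (0, a) with a straightedge gives exactly the fitted line, which is why those two points sufficed for the felt-tip version.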
Finally, you could refer the computed value of T (-8.36370 in the printout above) to a table like this one, from the CRC Standard Mathematical Tables, 27th ed., p. 546:
The "degrees of freedom" for 15 cases is 13 (N - 2), and the t-value (ignoring the minus sign) exceeds the largest value shown, so we conclude that Algeria "rejects the null hypothesis" (no relation between size and density) with probability < .0005; essentially, this means Algeria strongly supports the size-density hypothesis, which is obvious from the scatter diagram. If I wanted a fancy version of my Algeria scattergram, say for publication, I took my sketch (the greenbar printing with the added blue and red ink info) to the campus Bureau for Faculty Research. There a professional graphic artist (Joy Dabney) would place it on a light table and trace a chart with india ink on vellum. The result looked something like this:
Beyond the '70s

In the late 1970s I got a $195 Hewlett-Packard HP25 scientific programmable calculator (shown here half size). It enabled me to do a lot of my size-density work independently of the keypunch and mainframe of the campus Computer Center, in my office or while working on a tan in the backyard.
The main drawback to using this handy little (6 oz) machine was that you had to enter all the area and population figures of a country's territorial divisions each time you ran the analysis. There was no way to store entered data for correction or further use. Another problem was that there was no printer. The program would compute logs, sum the squares and cross products, compute means and variances, then store these in various statistical registers. You had to write down these results or lose them.
With such results you could compute the t-test (also programmed) for the null hypothesis of no relation between size and density.
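A sketch of that slope t-test in Python, with invented log area/density pairs standing in for a real country's divisions:

```python
import math

# Hypothetical log-area (xs) and log-density (ys) values
xs = [2.1, 2.8, 3.0, 3.5, 4.0, 2.5, 3.2]
ys = [2.6, 2.0, 1.9, 1.4, 1.0, 2.2, 1.7]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# The sums the calculator accumulated in its statistical registers
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

b = sxy / sxx                          # regression slope
ss_res = syy - b * sxy                 # residual sum of squares
se_b = math.sqrt(ss_res / (n - 2) / sxx)   # standard error of slope
t = b / se_b                           # refer to a t table at n - 2 df
```

The degrees of freedom are n - 2, just as with the 15-case Algeria run (13 df); you compare |t| with the tabled critical value to accept or reject the null hypothesis.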
Initially I had no graphics display, so I still had to pretend, in text mode only, that I was creating scatter diagrams as on the Computer Center's printer, scaling computed XY-values to the number of lines and columns of screen display. Later, with a graphics card/monitor, I could write programs which would do virtually anything I could think of. The only thing tedious at this point was initial data entry. Fortunately, as my poor vision proceeded to get worse, I had more and more student help.

In 1984 the fellow who succeeded me as department chair offered me a new computer, the very first Macintosh. He loved it (didn't have to learn all those pesky DOS commands) and thought I would, too. It was awful. All the memory (128k!) was taken up with graphic display of text. I couldn't run any data analyses because it simply couldn't handle the data sets and programs, which were not very demanding. He later showed up with a 512k model which would do the job, for the most part, so I accepted his (the department's) generosity. Thus began a long-term fondness for Macs, reflected in this display of the machines I've used. The last one the department supplied was the IIfx which, ten years later, is still functioning as my office machine (though I don't ask much of it).
Most of the programming I've done since the days of the IBM 360 has been in BASIC. It was quite adequate for all my size-density research and for most other projects as well. But with each advance in the Macintosh computer, the BASIC application has become less and less stable (I presume because no one was writing new versions). Now it's as likely to crash the system as not. In contrast, graphics programs have improved dramatically since I started using Macs. I have gone through successive versions of Deneba's "Canvas" - a bit-map and object-based application which enabled me to take graphics generated from BASIC programs or Excel spreadsheet charts (and, still more recently, from scanners) and fix them up a great deal. GraphicConverter, another Mac (shareware) application, has also been very useful.

The arrival of the internet has dramatically changed methods of obtaining data. You can get lots of area/population figures now, and maps, directly from their sources, without any of the tedious data entry problems of the old keypunch era. The internet also permits you to disseminate results easily, as I am doing here, using whatever graphics you wish (even animations) and without going through the endless and usually pointless process of submitting your work to editors who often know less of your subject and techniques than you and your readers know.