CONSTRUCTING INSIDER HOLDINGS FROM
COMPUTERIZED SEC TRANSACTIONS DATA
Kenneth H. Johnson
Abstract
Studies of ownership structure generally extract the insider holdings for officers and directors from proxy statements, which has three disadvantages: (1) the data must be manually extracted, assembled, and entered into the experimental dataset; (2) dates for the insider holdings are limited to the dates given in the proxy statements; and (3) the insider holdings include only officers and directors. This paper describes a way to construct the holdings for all insiders, not just officers and directors, at any arbitrary time, for any firm reporting to the Securities and Exchange Commission (SEC), by using the data files from the SEC's computerized Ownership Reporting System (ORS). A test of this ORS-based measure against a sample of proxy statements found that, for 90 percent of the 77 usable proxy statements, the ORS-based measure was within 5 percent of the proxy data. The ORS data are noisy, though, and the resulting values for insider holdings could contain both noise and some degree of bias. Although the ORS-based measure almost certainly is not as reliable as proxy data, it is suitable, and sometimes the only feasible alternative, for some research designs. A great deal of time was spent in ferreting out certain important information about the documentation and the data. This paper furnishes that information in sufficient technical detail to substantially flatten the learning curve for other researchers who want to use the ORS data, whether to derive insider holdings or to study other aspects of the data.
INTRODUCTION
A sizeable body of literature now exists regarding the ownership structure of publicly held firms, defined by Jensen and Meckling as "the relative amounts of ownership claims held by insiders (management) and outsiders (investors with no direct role in the management of the firm)" [1, p. 305]. A recent computerized search of the literature for just the years 1989-1992 found 83 articles on the topic. However, there appears to be no computerized database from which a researcher can easily extract historical data on the number of shares held by insiders. A few commercial databases contain proxy data, but only the most recent information; none offers an historical file. Therefore, studies of ownership structure generally extract the insider holdings for officers and directors from proxy statements. This approach has three disadvantages. First, the data must be manually extracted, assembled, and entered into the experimental dataset. Second, the dates for the insider holdings are limited to the dates given in the proxy statements. Third, the holdings include only officers and directors, leaving out the other categories of insiders defined by the SEC. This paper describes a way to construct the holdings for all insiders, not just officers and directors, at any arbitrary time, for any firm reporting to the SEC, by using the data files from the SEC's computerized Ownership Reporting System (ORS). A great deal of time was spent in ferreting out certain important information about the documentation and the data. This paper furnishes that information in sufficient technical detail to substantially flatten the learning curve for other researchers who want to use the ORS data, whether to derive insider holdings or to study other aspects of the data.
The remainder of this paper is organized into four sections. The first section describes several errors and peculiarities in both the ORS documentation and the data. The second section describes the use of the data to construct total insider holdings. It also describes the results of a test of the ORS-based measure of insider-holdings against insider holdings from proxy statements. The third section summarizes the strengths and weaknesses of the ORS-based measure of insider holdings. The final section presents a summary and conclusions. It suggests the best uses for the ORS data, discusses the changes which have been made to the ORS file structures since the empirical work was done for this paper, and describes additional work in progress.
ERRORS AND PECULIARITIES IN ORS DOCUMENTATION AND DATA (see endnote 1)
This study is based on the SEC ORS MASTER file for the years 1975 through 1985, obtained in April, 1986. It contained approximately 200,000 insider identification (header or name master) records and 1.2 million transaction records. The data were arranged with the name master records in ascending order by insider identification number, and each insider's transactions following that insider's name master record in ascending order of transaction date. Annotated record layouts for the name master and transaction records are furnished at Appendices B and C, respectively.
Lessons Learned: ORS Documentation
The READ statement in the initial FORTRAN extraction program was constructed by taking at face value the record layouts in the ORS documentation for the two kinds of records, using field lengths and data type (numeric or alphanumeric) as indicated. The program failed when the READ statement encountered data of an incorrect type; i.e., non-numeric data in a field which had been defined as numeric. The first 3,000 records were printed as a 75-byte dummy variable in character format, and the first 100 records were printed again in hexadecimal. Careful examination of those records produced the following information.
Name Records and Transaction Records. In both the name records and the transaction records:
1. All numeric fields are stored in unsigned numeric format, which can be read in either numeric or character format.
2. The Insider Relationship field is alphanumeric, not numeric as shown in the record layouts. This is confirmed on the "Guide to Symbols" page of the ORS documentation, where all of the defined values are clearly alphanumeric.
Name Record. In the name record the Social Security or Taxpayer Number field, bytes 58-67, is ten bytes long, not nine bytes as shown in the record layout. The first byte, for no discernible reason, is always a minus sign (hexadecimal 60).
Transaction Record. In the transaction record:
1. The CUSIP field is alphanumeric, not numeric. Some CUSIP numbers actually contain valid non-numeric characters.
2. The last two characters of the eight-character CUSIP field are blank, and the two-digit issue identifier is contained in a separate numeric field called CUSIP Class of Security. (This approach should be familiar to users of the COMPUSTAT files; see endnote 2.)
3. Holdings at End of Month, bytes 44-51, is an eight-byte field, not seven as specified in the record layout (see endnote 3).
4. The Holdings at End of Month field must be read as alphanumeric, not numeric as shown in the documentation. The last byte (byte 51) sometimes contains a non-numeric character to indicate that the sign of the value in the field is negative. (Since this phenomenon is comparatively rare, it was not detected until later in the data analysis. It is mentioned here for completeness. Technical details appear in the next section.)
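The corrected field positions can be applied with simple fixed-width slicing. The following is a minimal Python sketch, not the author's FORTRAN program. It assumes that each record is 75 bytes of EBCDIC text that Python's `cp037` codec can decode, and it uses only the three byte ranges stated in this paper. The names `TransactionFields` and `slice_transaction` are hypothetical.

```python
from typing import NamedTuple


class TransactionFields(NamedTuple):
    nature_of_ownership: str  # byte 8
    transaction_date: str     # bytes 11-15, YYDDD
    holdings_raw: str         # bytes 44-51, kept as text because of the sign overpunch


def slice_transaction(record: bytes, encoding: str = "cp037") -> TransactionFields:
    """Slice the fields named in the text from one fixed-width record.

    Byte positions in the paper are 1-based and inclusive, so bytes M-N
    correspond to the Python slice [M-1:N]. The cp037 (EBCDIC) encoding
    is an assumption about how the tape was written.
    """
    text = record.decode(encoding)
    return TransactionFields(
        nature_of_ownership=text[7:8],
        transaction_date=text[10:15],
        holdings_raw=text[43:51],
    )
```

Keeping every field as text at this stage mirrors the lesson above: values are converted to numbers only after the field contents have been checked.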
Lessons Learned: ORS Data
The data-extraction phase of the study revealed certain flaws in the ORS data. The following discussion describes those flaws, the steps taken to correct for those flaws, and a test of the quality of the measure of ownership structure which resulted from the use of the file.
Transaction Date Field. Some transactions contained data of the wrong type in the Transaction Date field. The field was supposed to contain numeric data in Julian-date format (YYDDD). Among the approximately 1.2 million transaction records were 753 records where the first three characters were asterisks and the last two appeared to be the year, 11 records where the field contained only a single zero, and three records where the second character was a hyphen; in all, a total of 767 bad records. Those transactions were discarded.
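A screen of this kind can be sketched in a few lines of Python. This is an illustration only, assuming the Transaction Date field has already been read as a five-character string; the function name `classify_date_field` and the labels it returns are hypothetical, and the padding of the single-zero records is not known, so surrounding blanks are simply ignored.

```python
import re


def classify_date_field(raw: str) -> str:
    """Label a Transaction Date field by the patterns described in the text."""
    if re.fullmatch(r"\d{5}", raw):
        return "numeric"        # five digits: a candidate YYDDD value
    if re.fullmatch(r"\*{3}\d{2}", raw):
        return "masked_year"    # ***YY: three asterisks, then what looks like the year
    if raw.strip() == "0":
        return "single_zero"    # the field holds only a single zero
    if len(raw) > 1 and raw[1] == "-":
        return "hyphen"         # the second character is a hyphen
    return "other"
```

Records labeled anything other than `"numeric"` would be the ones set aside under the approach described above.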
Holdings at End of Month field. About 1,500 transaction records contained a non-numeric value in byte 51, the last byte of the Holdings at End of Month field. Every occurrence was limited to one of the ten characters }JKLMNOPQR (hexadecimal D0 through D9). A hexadecimal dump of the first such record confirmed that the character "J" was stored at byte 51 as hexadecimal D1. The recurrence of ten specific characters suggested that these were something systematic, rather than data-entry errors, particularly since those characters are contiguous in the IBM EBCDIC sort order. In a telephone conversation with the author, Ms. Joyce Campbell of the SEC described it as a "zone overpunch." It is used to show that the Holdings at End of Month field is negative, a condition that occurs when a market-maker insider has a net short position.
The zone overpunch is a holdover from the days of punch cards, when it was devised to save a card column. If the value is negative, the last position in the field is punched with the combination of holes that would result if the last numeral were overpunched with a minus sign. The characters }JKLMNOPQR have cardpunch patterns equivalent, respectively, to the numerals 0123456789 overpunched with a minus sign.
Each user will have to decide how best to deal with these transactions. For example, the author used FORTRAN's ICHAR function with appropriate supporting programming to convert those records to signed numeric format.
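As one possible alternative to the FORTRAN approach, the conversion can be sketched in Python. This is not the author's code. It assumes the field has already been decoded to ordinary characters, so that the overpunched digits appear as the ten characters }JKLMNOPQR listed above; the names `NEGATIVE_OVERPUNCH` and `decode_holdings` are hypothetical.

```python
# Map each negative overpunch character to the digit it carries:
# } -> 0, J -> 1, K -> 2, ... R -> 9
NEGATIVE_OVERPUNCH = {ch: str(i) for i, ch in enumerate("}JKLMNOPQR")}


def decode_holdings(field: str) -> int:
    """Convert a Holdings at End of Month field, read as text, to a signed integer."""
    last = field[-1]
    if last in NEGATIVE_OVERPUNCH:
        # Replace the overpunched character with its digit and negate the value.
        return -int(field[:-1] + NEGATIVE_OVERPUNCH[last])
    return int(field)
```

For example, `decode_holdings("0000123L")` replaces the trailing L with the digit 3 and returns the negative of 1233, while a field with no overpunch is converted unchanged.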
CONSTRUCTING INSIDER HOLDINGS
With all of the data-type errors finally resolved, the ORS file was used to construct the total insider holdings, in shares, for the announcing firm at the announcement date for each of the 15,482 earnings announcements in the event study for which this approach was developed. Each share total was then divided by the total shares outstanding, as taken from the CRSP Monthly Stock Master file (see endnote 4), to get the percentage of shares held by insiders.
One characteristic of the data helped make the programming reasonably straightforward. Each transaction record contains that insider's total holdings (bytes 44-51) for a particular nature-of-ownership (byte 8) at that transaction date (bytes 11-15), so the program only needs to accumulate the last transaction before the event date for each nature-of-ownership.
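The accumulation step can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the author's program: transactions are assumed to be already parsed into tuples with real calendar dates, "before the event date" is read as strictly before, and the name `holdings_at` and the tuple layout are hypothetical.

```python
from datetime import date
from typing import Dict, Iterable, Tuple

# (insider_id, nature_of_ownership, transaction_date, holdings_after_transaction)
Transaction = Tuple[str, str, date, int]


def holdings_at(transactions: Iterable[Transaction], event_date: date) -> int:
    """Sum the most recent reported holdings before event_date.

    One running total is kept per (insider, nature-of-ownership) pair,
    because each transaction record carries the total for that pair.
    """
    latest: Dict[Tuple[str, str], Tuple[date, int]] = {}
    for insider_id, nature, tx_date, holdings in transactions:
        if tx_date >= event_date:
            continue  # only transactions before the event date count
        key = (insider_id, nature)
        # ">=" lets a later record with the same date supersede an earlier one
        if key not in latest or tx_date >= latest[key][0]:
            latest[key] = (tx_date, holdings)
    return sum(h for _, h in latest.values())
```

Dividing the result by shares outstanding at the event date then gives the percentage of shares held by insiders.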
Initial Results
The results were sorted in descending order of percent-of-shares-held by insiders in order to examine the distribution. The resulting sorted list contained 725 announcements where insiders supposedly held more than 100 percent of the firm's common stock, a result which was patently impossible and thus somewhat disturbing.
To investigate this phenomenon, the documentation for the ORS file was carefully reconsidered and the complete transactions file for each of the first 15 firms on the sorted list was printed and examined. Those firms accounted for the first 42 announcements on the list. Examination of the transactions file for each of those firms suggested three possible reasons for the overstatement of the number of shares held by insiders: duplicate reporting of indirect holdings, data-entry errors, and unrecorded termination of insider relationships.
Duplicate reporting of indirect holdings. Interrelationships among the insiders sometimes produced multiple reporting of shares which were held indirectly. For example, if three insiders were trustees for a trust which held shares of the firm's stock, then each would report the shares. Such multiple reporting was evident for several of the firms, and usually involved a large number of shares. It seems likely that there are other instances where multiple reporting of indirect shares could exist but would not be obvious from a simple inspection of the transactions. Absent the ability to unravel these relationships, two alternatives are feasible. One could simply ignore the multiple reporting and accept the overstatement of insider holdings for the affected firms. For some purposes one might even defend such an approach conceptually by suggesting that the effects of insider holdings are likely to be more pronounced where there are interrelationships among insiders. Alternatively, one could exclude all indirect shares from the insider holdings. Such an approach would eliminate multiple reporting, but would understate insider holdings. The author chose the latter approach.
Data-entry errors. The second explanation suggested by examination of the transactions files is the occasional occurrence of a data-entry error. In one case, the holdings of a particular insider varied within the range of 4,000 to 5,000 shares for several years, then a transaction reported direct ownership of 7,174,971 shares. According to the CRSP monthly stock master file, the firm only had 2,103,000 shares outstanding at that date. Those kinds of errors can be detected as outliers when they occur at the order of magnitude cited in this example, but they are otherwise undetectable.
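A crude screen for errors of this magnitude is to flag any single reported position that exceeds the firm's shares outstanding. The Python sketch below illustrates the idea; the function name `flag_impossible_positions` and the input layout are hypothetical, and the screen catches only gross errors like the one cited, not smaller ones.

```python
from typing import Iterable, List, Tuple


def flag_impossible_positions(
    positions: Iterable[Tuple[str, int]], shares_outstanding: int
) -> List[str]:
    """Return the insider ids whose reported holdings exceed shares outstanding."""
    return [
        insider_id
        for insider_id, holdings in positions
        if holdings > shares_outstanding
    ]
```

Applied to the case described above, a reported position of 7,174,971 shares against 2,103,000 shares outstanding would be flagged, while the insider's earlier positions of 4,000 to 5,000 shares would not.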
Unrecorded termination of insider relationships. The third possible explanation is the most serious. On examination of the transaction file, it was apparent that the last transaction for some insiders was very old. Some dated all the way back to the beginning of the data base in 1975. The data-extraction program which produced the insider holdings assumed that the termination of a reportable relationship is a reportable event which would reduce the insider's holdings to zero. However, a call to the office of the SEC's general counsel confirmed that it was not. According to a staff attorney there, on termination of a reportable relationship the insider simply sent a letter to the SEC giving the date of termination. The insider was required to report for another six months, but nothing was entered into the ORS file to show the termination of the relationship.
The author concluded that the only feasible approach to this problem would be to assume a time limit on the age of the latest transaction, after which the reporting relationship would be assumed no longer to exist. Careful examination of the transactions records of those 15 firms suggested that it is rare for an insider to go without a transaction for over three years and then resume active reporting. The author therefore decided to assume that, if more than three years had elapsed between an insider's last transaction date and the announcement date, the insider relationship had been terminated, and to count that insider as holding zero shares.
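The three-year rule can be sketched in Python as follows. This is an illustration, not the author's program: it assumes each insider's latest transaction date and holdings are already known, it treats a gap of exactly three years as still active, and the names `cutoff_date` and `active_holdings` are hypothetical.

```python
from datetime import date
from typing import Dict, Tuple


def cutoff_date(event_date: date, years: int = 3) -> date:
    """Return the date `years` before event_date (Feb 29 falls back to Feb 28)."""
    try:
        return event_date.replace(year=event_date.year - years)
    except ValueError:
        return event_date.replace(year=event_date.year - years, day=28)


def active_holdings(
    latest: Dict[str, Tuple[date, int]], event_date: date, years: int = 3
) -> int:
    """Sum holdings, counting an insider as zero once the last transaction is too old.

    `latest` maps an insider id to (last transaction date, holdings).
    """
    oldest_allowed = cutoff_date(event_date, years)
    return sum(h for tx_date, h in latest.values() if tx_date >= oldest_allowed)
```

Making the cut-off a parameter keeps it easy to rerun the calculation with other assumed ages, since the three-year value is a judgment call.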
Results Of The Modified Program
The modified data-extraction program was used to construct insider holdings for the 15,482 earnings announcements in the sample and again the results were sorted in descending order of percent-of-shares-held by insiders. Only 108 announcements on the sorted list still showed that more than 100 percent of the firm's stock was held by insiders, down substantially from the former value of 725. The likely sources of the remaining errors would seem to be data-entry errors and failure of the three-year inactivity assumption to properly capture every termination of a reporting relationship.
Since the percentage of insider holdings represented the principal variable of interest in the study for which this work was done, the distribution of those values was examined with some care. Table 1 shows the frequency of occurrence, in five-percent ranges, for percent-of-shares-held by all insiders.
TABLE 1
Distribution Of Insider Holdings For 15,482
Earnings Announcements During 1982-1984
Obviously, any observation where total insider holdings exceed 100 percent of total shares outstanding contains at least one error. Keeping in mind that these firms are all listed on either the NYSE or AMEX, it seems quite likely that any observation where a very high percentage of the shares supposedly is held by insiders should be viewed with suspicion. Table 1 shows that in 15,188 (98.1 percent) of the 15,482 total observations, 70 percent or less of the total shares outstanding are held by insiders. Furthermore, no single five-percent range above 40 percent accounts for more than 2 percent of the total observations.
Testing The Derived Measure Of Insider Holdings
These flaws in the ORS data base raised serious concerns about the quality of the insider holdings information which was being generated. Therefore, a test was undertaken to compare a sample of program-generated insider holdings to the insider holdings from proxy statements of the same date. Microfilm records of proxy statements were searched for the years 1982-1984 for the first 58 firms in the COMPUSTAT alphabetical listing. The search produced a total of 91 proxy statements for 38 firms.
Fourteen proxy statements involving six firms were discarded for reasons detailed in Appendix A. For the remaining 77 proxy statements, the notes to the proxy statements were used to eliminate from the proxy data those shares which were held indirectly, since the insider-holdings program had been modified to exclude such holdings. The insider-holdings program was modified so as to only accumulate the holdings of officers and directors, to mirror the proxy statements. It then was used to calculate the total shares held by officers and directors at the same effective date as the insider-holdings data in each proxy statement.
The difference between the adjusted proxy statement figures and the figures produced by the modified program was less than or equal to 1 percent for 42 (55 percent) of the 77 observations, less than 5 percent for another 27 observations (35 percent), less than 10 percent for three others (4 percent), and greater than 10 percent for the remaining four observations (6 percent). In all, the difference exceeded 5 percent in only 10 percent of the observations. Detailed results are shown in Appendix D, in order of the absolute value of the difference between the adjusted proxy data and the value derived from the ORS data.
DISCUSSION
The use of the SEC's ORS data to generate insider holdings has both advantages and disadvantages relative to the use of proxy data. The choice depends on the experimental design and sometimes on the resources available to extract the proxy data.
Advantages Of ORS Data
The ORS data have three advantages. First, the information can be generated much more quickly. Digging information out of proxy statements is laborious and mind-numbing, which renders it impractical for large samples unless the research is very well financed.
Second, the level of insider holdings can be determined at any arbitrary date. Proxy statements typically are furnished annually. For an event study, the closest proxy date could be as much as six months away from an event date.
Third, the insider holdings can be accumulated or partitioned according to any combination of the variables listed in Appendices B and C. Proxy statements list only the holdings of officers and directors, with supporting notes regarding such matters as shares indirectly held and exercisable options. However, the Insider Relationship field of the master name record has 24 values defined in the documentation, some of which do not denote either officers or directors. Thus, the ORS data contain information which simply is not available in the proxy statements.
Disadvantages Of ORS Data
The disadvantages of the ORS data are implied by the previous discussion of the three most likely reasons that the program calculated some insider holdings in excess of 100 percent and by the discussion of bad transaction dates. All of these conditions generate either noise or bias, or both.
Data-entry errors. Data-entry errors are a source of noise, and even the limited procedures performed in this study found reason to question the quality of the data-entry process. The transactions of the first 15 firms where insider holdings exceeded 100 percent of outstanding shares contained a few obviously incorrect entries. The Transaction Date field contained invalid data in 767 transaction records. The recurring pattern of 753 of those invalid entries (***YY instead of YYDDD) suggests that these were done intentionally, perhaps because the transaction date was not furnished by the reporting insider. The other 14 entries may have been data-entry errors, or simply another way to handle incomplete data. Of course, the use of non-standard procedures for treatment of incomplete data constitutes simply another kind of data-entry error. If the ***YY convention was used in place of missing data, it suggests that the SEC's approach to collecting the data might not be as rigorous as researchers would like.
The evidence in this study is purely anecdotal, of course. No systematic effort was made to detect invalid data or other data-entry errors in every field, and the first 15 firms with insider holdings calculated in excess of 100 percent certainly do not constitute a random sample of transactions. Nonetheless, the results clearly suggest that the data are not as clean as, say, the CRSP and COMPUSTAT files.
Multiple reporting. There is no way to disentangle, or usually even to recognize, multiple reporting of some indirect holdings; e.g., two or more insiders serving as trustees with regard to the same shares. The user must either accept the multiple reporting or discard all indirect holdings. The first approach biases the results upward to the extent of such multiple reporting. The second approach biases the results downward by the amount of the indirect holdings discarded.
Unreported terminations. The use of an arbitrary cut-off age to identify terminated insider relationships introduces both noise and bias. Any arbitrary cut-off age for assumed termination of an insider relationship will be too soon in some cases and too late in others. If the cut-off age is too short, then some valid insider holdings will be discarded and the level of insider holdings will be understated. The converse is true if the cut-off age is too long. The optimal cut-off age would be equally likely to overstate as to understate insider holdings, thus producing noise but no bias. However, the optimal cut-off age is not known and may not be determinable. Therefore, any arbitrary cut-off age probably will be suboptimal and will introduce some degree of bias, with direction and magnitude unknown.
SUMMARY AND CONCLUSIONS
The ORS data can be used to construct insider holdings, once the researcher solves the various problems associated with incorrect documentation and odd data-entry conventions. The ORS data are noisy, though, and the resulting measure of insider holdings could contain both noise and some degree of bias. Although the ORS-based measure almost certainly is not as reliable as proxy data, it is suitable, and sometimes the only feasible alternative, for some research designs.
Best Uses For ORS-Based Insider Holdings
Some studies have a manageable sample size, and the dates and contents of the proxy statements meet all of the data requirements. The disadvantages of the ORS approach almost certainly outweigh the advantages for such a study, since some degree of reliability would be sacrificed for no benefit.
However, a researcher who needs precise dates or needs to partition the insider holdings in ways not feasible with the proxy data has little choice but to use the ORS data. The SEC is able to collect this sensitive information through force of law, and it is hard to imagine getting it any other way. If the sample size is manageable, the researcher could devise ways to mitigate the effects of the flaws in the ORS data. For example, he or she could examine all of the relevant transactions for any apparent data-entry errors, and use proxy statements to narrow down to within one year the termination dates for at least the officers and directors.
Studies with sample sizes too large for manual extraction of proxy data are obvious candidates for use of the ORS data. Indeed, this approach was devised for just such a study.
An excellent use for an ORS-based measure of insider holdings might be to construct a stratified sampling frame. The researcher could use the ORS data to calculate the insider holdings for all firms at an arbitrary date, then group the results by percentage-of-shares-held. He or she could randomly select a stratified sample from those groups, then proceed with one of the approaches described in the preceding three paragraphs.
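The stratified sampling frame could be built along the following lines. This Python sketch is illustrative only: it assumes the percentage of shares held has already been computed for each firm, it uses five-percent ranges keyed by their lower bound (a choice borrowed from Table 1, with the treatment of range boundaries assumed), and the name `stratified_sample` is hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List


def stratified_sample(
    pct_held: Dict[str, float], per_stratum: int, seed: int = 0
) -> Dict[int, List[str]]:
    """Group firms into five-percent ranges and draw a random sample from each.

    Keys of the result are the lower bounds of the ranges (0, 5, 10, ...).
    """
    strata: Dict[int, List[str]] = defaultdict(list)
    for firm, pct in pct_held.items():
        strata[int(pct // 5) * 5].append(firm)
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    return {
        lower: sorted(rng.sample(firms, min(per_stratum, len(firms))))
        for lower, firms in sorted(strata.items())
    }
```

The firms selected in each range could then be checked against proxy statements, which keeps the manual work proportional to the sample rather than to the full population.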
Future Research
More work is needed to make the ORS data more accessible and to more thoroughly evaluate its deficiencies.
Changes to ORS file structures. Documentation received in December, 1992, shows that the SEC has changed the ORS file structures several times since the tape for this study was obtained in April, 1986. The record layouts shown in Appendices B and C are valid for the ORS Master File, which covers the period January, 1975, through August, 1987. It is stored as two separate, but overlapping, datasets: ORS Master-History (1/75-4/82) and ORS Master-Current (1/80-8/87). However, in August, 1987, the SEC abandoned the use of separate name and transaction records in favor of a single transaction record which contained most of the information from both of the former record formats. Date formats were changed from YYDDD to YYMMDD. Since then the record layout and length have changed several times. Consequently, the user who needs to access transactions through the entire period since 1975 will have to either build a single dataset with a common record or write different programs for each record format.
Further investigation of data errors. This study has only begun the process of investigating the extent to which incomplete or incorrect records may exist in the ORS data. For example, the program written for this study identified data of invalid type (alphanumeric rather than numeric) in the Transaction Date field, but did not screen for the possible occurrence of invalid data of the correct type; e.g., a Julian date (YYDDD) of 84455. The data should be sifted through a wide variety of such screens to produce a clearer picture of the overall quality of the data.
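One such screen, a range check on the YYDDD date, can be sketched in Python. This is an illustration of the kind of screen suggested, not something run for this study. It assumes two-digit years belong to the 1900s, which fits the 1975-1987 span of the ORS Master File, and the name `parse_yyddd` is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def parse_yyddd(field: str, century: int = 1900) -> Optional[date]:
    """Return the calendar date for a YYDDD field, or None if it fails a screen."""
    if len(field) != 5 or not (field.isascii() and field.isdigit()):
        return None  # wrong type or wrong length
    year = century + int(field[:2])
    day_of_year = int(field[2:])
    days_in_year = (date(year + 1, 1, 1) - date(year, 1, 1)).days  # 365 or 366
    if not 1 <= day_of_year <= days_in_year:
        return None  # right type, impossible day (e.g. 84455)
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)
```

The value 84455 cited above passes the type check but fails the range check, so the function returns `None` for it.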
Termination of insider relationship. The three-year estimate used in this study as the cut-off age for the assumption of termination of an insider relationship was based on ad hoc analysis of the 3,000 transactions which were printed in the initial stage of trying to read the tape. A better estimate could be developed through a systematic analysis of the intervals between all reported transactions. It is even possible that the best estimate could vary over the 17 years that the ORS has been in place.
Research in progress. The author currently is building a dataset which includes all of the information that is common to all of the files for all of the years since 1975. When it is complete, the data will be screened for errors and a better estimate of the optimal cut-off age for assuming termination of an insider relationship will be produced.
ENDNOTES
1. The author is indebted to Dr. Kent T. Fields, Auburn University, for introducing him to the mysteries of IBM mainframe computing and helping sort out the errors and peculiarities in the SEC data.
2. COMPUSTAT is a registered trademark of Standard & Poor's Compustat Services, Inc. This trademark acknowledgement applies to every use of the name COMPUSTAT in this paper.
3. Documentation for the ORS Master File received after December, 1992, may not contain the field-length specification errors noted for the Social Security or Taxpayer Number field of the name record and the Holdings at End of Month field of the transaction record. In early December, 1992, the author mentioned those errors during a telephone conversation with Mr. Lee Gladwin of the National Archives. Documentation subsequently purchased from the National Archives had those two items pen-changed to the correct values.
4. CRSP is a registered trademark of the Center for Research in Security Prices, University of Chicago. This trademark acknowledgement applies to every use of the name CRSP in this paper.
REFERENCE
[1] Jensen, Michael C., and William H. Meckling, "Theory of the Firm: Managerial Behavior, Agency Costs and Ownership Structure," Journal of Financial Economics 3, October 1976, pp. 305-360.
APPENDIX A
Proxy Statements Discarded In Test Of Insider
Holdings Program Against Proxy Data
APPENDIX B
Name Master Record
APPENDIX C
Transaction Record
1. Documentation specifies N (numeric) for this field.
2. The documentation's Guide to Symbols refers to this field as CUSIP Number in the version received in April, 1986, and as CUSIP Suffix in the version received in December, 1992. This field contains the last two characters of the standard eight-character CUSIP number.
3. Bytes 22-23 of the CUSIP Number field are blank. See CUSIP Class of Security.
4. The April, 1986, version of documentation shows this field with same name as byte 8, Nature of Ownership. A scan of some of the records shows that this field is not blank, but it does not contain the same information as byte 8.
5. Documentation does not specify data type for this field.
6. Documentation specifies N (numeric) for this field, but byte 51 occasionally contains a zone overpunch. See explanation in this paper.
7. Documentation specified 7 for this field size, but the specified beginning and ending values are correct, so the field size has to be 8.
8. The documentation does not say so, but this field uses the YYDDD format.
APPENDIX D
Test Of ORS-Based Measure Of Direct
Insider Holdings Against Proxy Data