
Focus On Basics

Volume 2, Issue D ::: December 1998

Lessons from NCSALL's Outcomes and Impacts Study

by Harold Beder
Researchers occasionally encounter contradictory findings, findings that disagree with each other to the extent that it is hard to imagine both could be true. Although this is frustrating, it is also exciting, because new insights often emerge when contradictions are resolved. In NCSALL's Outcomes and Impacts Study, we faced such a contradiction. After examining testing data from seven outcome and impact studies, we concluded that the evidence was insufficient to determine whether adult basic education participants gain in basic skills. Yet learners in ten studies who were asked whether they had gained in reading, writing, and mathematics overwhelmingly reported large gains. What led to this contradiction, and what is the answer to the gain question? In this article we examine possible reasons for the contradiction, but first it is important to know something about the study. As for the question of gain, that remains to be answered.

Critical Issue

We studied the outcomes and impacts of adult literacy education because the subject is critical for adult literacy educators today. Policy makers who control resources have increasingly demanded that accountability be based on program performance as measured by impact on learners. Indeed, under the newly enacted Workforce Investment Act (HR 1385), programs which fail to achieve stipulated outcomes can be severely sanctioned. And, since outcome and impact studies can identify program strengths and weaknesses, their results provide vital information for program planning and policy formation at the national, state, and local levels.

We characterized outcomes as the changes that occur in learners as a result of their participation in adult literacy education. We saw impacts as the changes that occur in the family and society at large. Commonly studied outcome variables include individual gains in employment, job quality, and income; reduction of welfare; learning gains in reading, writing, and mathematics; GED acquisition; and changes in self-confidence. Common impact variables include effects on children's reading readiness, participation in children's school activities, and whether learners vote.

The NCSALL Outcomes and Impacts Study examined outcome and impact studies conducted since the late 1960s, including national studies, state-level studies, and studies of welfare, workplace, and family literacy programs. Its goals were to determine whether the publicly funded adult literacy education program in the United States was effective; to identify common conceptual, design, and methodological problems in the studies; to raise issues for policy; and to make recommendations for research, policy, and practice. In essence, the research was a study of studies. To determine whether the adult literacy program was effective, we prepared case studies of the 23 outcome studies that we judged to be the most credible from a research perspective (see box for the criteria we used). Then, based on the case studies, we conducted a qualitative meta-analysis in which we treated each study's findings as evidence that we weighed to reach conclusions about program effectiveness on commonly studied outcome and impact variables. In reaching those conclusions, we gave the evidence from more credible studies more weight.

Criteria for Selecting Studies

  • The study included an outcome/impact component.
  • The report was adequately documented in respect to design and methods.
  • There were an adequate number of cases.
  • The sampling plan was adequate (i.e., could and did result in external validity).
  • Data collection procedures were adequate (i.e., were not tainted by substantial attrition or biased by other factors).
  • Objective measures, rather than self-report, were used to measure outcomes.
  • Measures, especially tests, were valid and reliable.
  • The research design included a control group.
  • Inferences logically followed from the design and data.

There is no consensus on what adult literacy is, but acquisition of skills in reading, writing, and mathematics is included in almost everyone's definition of what adult literacy education should achieve. Outcome and impact studies have measured reading, writing, and mathematics gains in two ways: through tests, or through questionnaires and interviews in which learners report their own gains. Yet as we noted at the outset, the findings from these two methods conflict. While test results are inconclusive on gain, learners who are asked generally report large gains. This raises the question of whether the tests are correct, whether the learners are correct, or whether another explanation exists. To draw out the lessons of this issue, we first look at studies that used tests and then at studies that measured learners' perceptions of gain through self-report.

1973 National Evaluation

The 1973 evaluation of the federal adult literacy education program was contracted to the System Development Corporation (Kent, 1973). It began in 1971 and ended two years later. At that time, the Adult Education Act restricted service to adults at the pre-secondary level, so the study was limited to learners with fewer than nine years of schooling. The study also excluded English for Speakers of Other Languages (ESOL) learners and learners older than 44. For the sample, states were selected using a stratified random sampling design, and programs and learners were selected using other methods of random sampling.

After reviewing learning gain tests available in 1972, the System Development Corporation selected two tests from level M of the Test of Adult Basic Education (TABE) to use as its instrument: one measured reading comprehension, the other arithmetic. The validity and reliability of the TABE components used were not reported. After developing directions and field-testing the instruments, the researchers sent tests and instructions to local program directors and teachers in the study sample, who were asked to administer the tests. The learning gain test was first administered in May 1972 and then again the following May. Of the 1,108 initial tests obtained, matching tests from the first and second administrations were obtained for only 441 subjects. Strictly speaking, the tests administered were not pre- and post-tests, since at initial testing learners had already received varying amounts of instruction.

When initially tested with components of the TABE, learners scored on average at grade level 5.4 in reading and 6.4 in mathematics; raw scores were not reported. After the second administration of the test approximately four months later, in which a different test form was used, 26 percent of the students had gained one grade or more in reading, 41 percent had some gain but less than one grade, and 33 percent had zero or negative gain. In mathematics, 19 percent gained one or more grades, 46 percent gained some but less than one grade, and 35 percent showed zero or negative gain.

The proportions of those who gained and those who did not may have been affected by the differing hours of instruction learners had accumulated between the first and second test administrations. While almost a fifth of the learners had 39 or fewer hours of instruction between the first and second testing, another fifth had 80 or more hours. Average gains for reading were 0.5 grades after 98 hours and 0.4 grades after 66 hours. For mathematics, the comparable figures were 0.3 grades and 0.3 grades, respectively.

The 1973 National Evaluation has many problems. Because of high attrition (matched scores were obtained for only 441 of the 1,108 learners initially tested), the test scores are not representative of adult literacy learners in general. Furthermore, what do the reported gains mean? Are they high, medium, or low? In the absence of standards against which to assess learning gain, we do not know.

California GAIN Study

GAIN (Greater Avenues for Independence) was California's JOBS (Job Opportunities and Basic Skills) program. The tested learning gain data come from a larger evaluation of the entire GAIN program conducted by the Manpower Demonstration Research Corporation (Martinson & Friedlander, 1994). The GAIN evaluation used an experimental design, explained in the next paragraph. Between seven and 14 months after a county implemented GAIN, welfare recipients who had scored below 215 on the Comprehensive Adult Student Assessment System (CASAS) test were randomly assigned either to a treatment group or to a control group. The treatment group was required to attend JOBS-sponsored adult literacy education classes; the control group was not required to attend JOBS-sponsored classes but could attend non-JOBS-sponsored classes if they wished. As its learning gain test, GAIN used the quantitative literacy section of the Test of Applied Literacy Skills (TALS), which is similar to the quantitative literacy test used by the National Adult Literacy Survey (NALS). The test was administered to 1,119 treatment and control group members in their homes two to three years after random assignment. During that period, learners had received an average of 251 scheduled hours of instruction.

The experimental design of GAIN is very important for the credibility of the research. In an experimental design, subjects are randomly assigned either to a treatment group, which in this case received instruction, or to a control group, which does not receive the treatment. Because random assignment ensures that, apart from chance differences, the two groups are alike in every respect except the treatment, any difference in their performance can logically be attributed to the treatment. In short, an experimental design allows us to infer that adult literacy education caused an outcome to occur. This is critical because many outcomes, such as increased pay and welfare reduction, are susceptible to economic and social forces that have nothing to do with participation in adult literacy education. Without an experimental design, we cannot be sure that participation caused the gains measured.
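The logic of random assignment can be illustrated with a small simulation. The sketch below is hypothetical and is not drawn from the GAIN data: it assumes each learner has an unobserved "motivation" score that raises outcomes on its own, and it assumes a true instructional effect of 2 points (the names `compare_designs`, `motivation`, and `true_effect` are illustrative only). When motivated learners choose to enroll, the simple treatment-versus-comparison difference mixes motivation with instruction; when a coin flip decides who is treated, the difference should land near the assumed effect.

```python
import random
import statistics


def compare_designs(n=20_000, true_effect=2.0, seed=3):
    """Hypothetical illustration: estimate a program effect two ways.

    Each simulated learner has a 'motivation' score that affects the
    outcome whether or not the learner receives instruction. All numbers
    here are assumptions chosen for illustration, not study data.
    """
    rng = random.Random(seed)
    motivation = [rng.gauss(0, 1) for _ in range(n)]

    def outcome(m, treated):
        # Outcome depends strongly on motivation, plus the assumed effect if treated.
        return 5 * m + (true_effect if treated else 0.0) + rng.gauss(0, 1)

    # Self-selection: only the more motivated half enrolls.
    sel_t = [outcome(m, True) for m in motivation if m > 0]
    sel_c = [outcome(m, False) for m in motivation if m <= 0]

    # Random assignment: a coin flip decides, regardless of motivation.
    rnd_t, rnd_c = [], []
    for m in motivation:
        if rng.random() < 0.5:
            rnd_t.append(outcome(m, True))
        else:
            rnd_c.append(outcome(m, False))

    self_selected = statistics.mean(sel_t) - statistics.mean(sel_c)
    randomized = statistics.mean(rnd_t) - statistics.mean(rnd_c)
    return self_selected, randomized


if __name__ == "__main__":
    biased, fair = compare_designs()
    print(f"self-selected difference: {biased:.2f}")
    print(f"randomized difference:    {fair:.2f}")
```

Under these assumptions the self-selected comparison substantially overstates the effect, while the randomized comparison comes close to the assumed 2 points. That is the sense in which random assignment lets a difference be attributed to the treatment rather than to who chose to attend.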

The researchers found that, on average, the treatment group gained a statistically non-significant 1.8 points on the TALS relative to the control group, a difference far too small to infer impact. However, despite the very small average gain, differences among the six counties were substantial. In fact, while in two counties the control group actually outperformed the treatment group, in one county the treatment group outperformed the control group by a highly statistically significant 33.8 percent.

National Even Start Evaluation

Even Start is the national, federally funded family literacy program. To be eligible for Even Start funding, a program must provide adult literacy education, early childhood education, parent education, and home-based services. The national evaluation was contracted to Abt Associates (St. Pierre et al., 1993; St. Pierre et al., 1995). As with the GAIN evaluation, the tested learning gain component of the Even Start evaluation was part of a larger study that assessed all the components of Even Start.

Learning gain was measured in two components: (1) the National Evaluation Information System (NEIS), a data set of descriptive information collected from local programs, and (2) an in-depth study of ten local programs. For the NEIS, data were collected from families at entry, at the end of each year, and at exit, and the CASAS was used to test learning gain. For the in-depth study, data were collected from participants in ten programs selected for their geographic location, level of program implementation, and willingness to cooperate. The in-depth study included an experimental design. CASAS tests were administered to 101 treatment group members, who were adult literacy education participants, and 98 control group members, all from five of the ten programs. Although control group members were not participating in adult literacy education at the time of the pre-test, they were not precluded from participating later if they wished. Study subjects were pre-tested in the fall of 1991 and then post-tested twice, nine months later and 18 months later.

For the in-depth study, valid pre- and post-tests were received from 64 participants and 53 control group members. Note two problems: this is a small number of subjects, and there was substantial attrition from both groups. At the second post-test (18 months), the difference between the gains of the two groups was a statistically non-significant 3.7 points on the CASAS, leading the evaluators to conclude that Even Start adult literacy instruction had not produced learning gain, at least with respect to the in-depth study. The NEIS component, which did not use an experimental design, did show small but statistically significant gains of 4.6 points on the CASAS after 70 hours of instruction.


National Evaluation of Adult Education Programs

The National Evaluation of Adult Education Programs (NEAEP), which was conducted by Development Associates Inc., began in 1990, concluded in 1994, and issued five reports of findings: Development Associates (1992), Development Associates (1993), Development Associates (1994), Young, Fitzgerald, and Morgan (1994a), and Young, Fitzgerald, and Morgan (1994b). Costing almost three million dollars, the NEAEP was the largest and most comprehensive of three national evaluations of the federal adult literacy education program. Data on learners were collected at several points. Client Intake Record A was completed for each sampled student at the time of intake; with this instrument, data for 22,548 learners were collected from a sample of 116 local programs. The sample was drawn using a statistical weighting system designed to enable the researchers to generalize findings from the sample to the United States as a whole. Client Intake Record B was to be completed for all learners who had completed Intake Record A and attended at least one class. For this data collection, records were gathered for 13,845 learners in 108 programs. Learner attrition from the study was clearly evident between the administration of the two instruments. Indeed, by the second data collection (Intake B), eight programs and more than 8,000 learners had dropped from the study: many learners who attended intake sessions never attended a class, and some records were not forwarded to the researchers. After the second data collection, additional data were collected at five- to eight-week intervals for 18 weeks.

The NEAEP lacked the resources to send trained test administrators into the field, so it had to rely on program staff to give the tests, and it had to use the tests that programs normally used. Because the Comprehensive Adult Student Assessment System (CASAS) and the Test of Adult Basic Education (TABE) were in sufficiently wide use, they were chosen as the tests for the project. The programs selected for the study were supposed to administer one of these tests near the start of instruction and again after 70 and 140 hours of instruction. Pre-tests were obtained from 8,581 learners in 88 programs, and post-tests were received from 1,919 learners in 65 programs. As one can see, attrition between pre- and post-tests was substantial, owing to learner dropout and the failure of programs either to post-test or to submit the test data. Moreover, when Development Associates checked the tests, much of the data was so suspect that it was deleted from the study. The NEAEP was left with only 614 usable pre- and post-test scores, less than 20 percent of the intended number of valid cases.

Based on these 614 cases, the NEAEP reported that adult basic education (ABE) students received a mean of 84 hours of instruction between pre- and post-tests and attended for an average of 15 weeks. On average, they gained 15 points on the TABE. Adult secondary education students received a mean of 63 hours of instruction and gained seven points on the TABE. All gains were statistically significant at the .001 level (Young et al., 1994a).

The NEAEP found that learners do gain in basic skills, but how credible are the findings? In a reanalysis of the Development Associates data, Cohen, Garet, and Condelli (1996) concluded:

"The implementation of the test plan was also poor, and this data should not be used to assess the capabilities of clients at intake. Some of the key evidence supporting this conclusion includes:

  • Only half the clients were pretested, and sites that pretested differed from sites that did not. At sites that pretested only some of their clients, pretested clients differed from those who were not pretested.
  • Programs reported perfect exam scores for a substantial proportion of pretested clients.
  • Less than 20 percent of eligible clients received a matched pretest and posttest.
  • Among clients eligible to be posttested, significant differences exist among those who were and were not posttested.
  • The available matched pre- and posttests were concentrated in a very few programs.

These facts render the test data unusable. Therefore this reanalysis invalidates all of the findings concerning test results from the original analysis" (Cohen, Garet, and Condelli, 1996, p. xi).

Stated simply, because of the problems noted by Cohen, Garet, and Condelli, the NEAEP test scores, like those of the 1973 National Evaluation, are almost certainly biased and therefore not representative of adult literacy learners in general. Perhaps, for example, the learners from whom valid pre- and post-test scores were obtained were more motivated and able than others, and the scores are inflated. Perhaps they were less able. We simply do not know. Again, lacking standards, we do not know whether the gains reported should be considered high, medium, or low.

The Answer?

The two national evaluations and the NEIS component of the National Even Start Evaluation do show tested learning gain, but learner attrition in both national evaluations was so severe that we cannot generalize the results. In addition, these studies did not use an experimental design. In contrast, the two studies that did use an experimental design, the GAIN study and the in-depth component of the Even Start study, showed no significant tested learning gain. Both studies were limited in other ways that space does not permit us to describe here. The studies included in the Outcomes and Impacts Study that are not reported here show a similar pattern of conflicting results on tested learning gain. As measured by tests, do learners gain basic skills as a consequence of their participation? The jury is still out.

Learners' Perceptions

As noted at the outset of this article, when learners are asked whether they have gained skills in reading, writing, and math as a result of participating in ABE, they tend to respond in the affirmative. The National Evaluation of Adult Education Programs (NEAEP) conducted a telephone survey of 5,401 former ABE learners in which respondents were asked whether they had gained in basic skills. Although many of the former learners who were supposed to be interviewed could not be found, and although many respondents had received very little instruction, 50 percent of the ABE learners and 45 percent of the adult secondary education (ASE) learners said that participation had helped their reading "a lot." For math, the figures were 51 percent for ABE and 49 percent for ASE.

In another national evaluation of the federal adult education program, conducted in 1980 (Young et al., 1980), data were collected from 110 local programs stratified according to type of funding agency and program size. Learners were interviewed by telephone. Although the response rate was low, 75 percent of those interviewed said they had improved in reading, 66 percent said they had improved in writing, and 69 percent reported that they had improved in math.

In a study in New Jersey (Darkenwald and Valentine, 1984), a random sample of 294 learners who had been enrolled for seven to eight months was interviewed. Of the respondents, 89 percent said that participation in ABE had helped them become better readers, 63 percent reported that ABE classes had helped their writing, and 85 percent said participation had helped their math. A study in Maryland (Walker, Ewart & Whaples, 1981) interviewed 120 ABE learners who enrolled in Maryland programs and volunteered for the study. Of the respondents, 81 percent reported they could read better because of participating and 90 percent reported their computational skills had improved.

A study in Ohio (Boggs, Buss & Yarnell, 1979) followed up on learners who had left the program three years earlier. Data were collected by telephone. Of the 351 valid respondents, 96 percent of those who said improving their reading was a goal reported that they had reached it; for those who had math as a goal, the figure was 97 percent. Finally, a study in Wisconsin (Becker, Wesselius, & Fallon, 1976) assessed the outcomes of the Gateway Technical Institute, a comprehensive adult literacy education program that operated learning centers in a wide range of locations. Data were collected from a random sample of former learners who were divided into four categories based on the amount of instruction they had received. A total of 593 learners were contacted and asked to participate, and 270 usable interviews resulted. Of these respondents, 90 percent reported that the program had helped them with reading, 83 percent that it had helped with writing, and 82 percent that it had helped with math.

The Answer?

The problem that surfaces in these studies is the limitation of self-report. Participation in adult literacy education is hard work, and becoming literate is socially approved behavior. Self-reported perceptions of basic skills gain may be inflated by the normal human tendency to give socially acceptable answers and by a reluctance to say unfavorable things in a program evaluation. In most of the studies, there was a large discrepancy between the number of learners the evaluation planned to interview and the number who actually completed interviews. Those who were favorably disposed toward the evaluated programs may have been more likely to respond than those who were unfavorably disposed: the "if you can't say anything good, don't say anything at all" syndrome. Indeed, in the Maryland and Wisconsin studies, the respondents had volunteered to be included and may have been more favorably disposed than those who did not volunteer.

Then again, perhaps the self-report data are accurate and learners are recognizing important gains in themselves that are too small to be measured by tests. Shirley Brice Heath (1983), for example, chronicles how being able to write a simple list or a note to one's children for the first time is perceived as a significant benefit by those with limited literacy skills. It is doubtful that any of the tests in common usage are sensitive enough to register such gains.


Although the studies reviewed here are just a sample of those analyzed in the full report of the Outcomes and Impacts Study, they provide many lessons. First, even the best outcome studies are limited in many ways, and these limitations influence findings. The most common limitation is an unacceptably large attrition of subjects between pre- and post-testing. Because those who are not post-tested include a high proportion of dropouts, the subjects for whom both pre- and post-test data are available almost always differ substantially from those who were only pre-tested, and the results are therefore almost certainly biased.
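The attrition problem can be sketched with a short simulation. The model below is hypothetical and uses no data from the studies reviewed: it assumes a `motivation` value between 0 and 1 that both raises a learner's gain and raises the chance that the learner stays long enough to be post-tested. The function name `simulate_attrition` and all parameters are illustrative assumptions; the point is only to show how matched pre- and post-test scores can misstate the gain of everyone who enrolled.

```python
import random
import statistics


def simulate_attrition(n=20_000, seed=2):
    """Hypothetical illustration of attrition bias between pre- and post-test.

    Assumptions (not study data): each learner has a motivation value
    between 0 and 1; more motivated learners gain more, and they are
    also more likely to remain long enough to be post-tested.
    """
    rng = random.Random(seed)
    all_gains, posttested_gains = [], []
    for _ in range(n):
        motivation = rng.random()
        gain = 10 * motivation + rng.gauss(0, 5)  # assumed gain model
        all_gains.append(gain)
        if rng.random() < motivation:  # assumed retention model
            posttested_gains.append(gain)
    return (
        statistics.mean(all_gains),         # true average gain of all enrollees
        statistics.mean(posttested_gains),  # average gain among matched cases only
        len(posttested_gains) / n,          # share of learners post-tested
    )


if __name__ == "__main__":
    everyone, matched, rate = simulate_attrition()
    print(f"all enrollees: {everyone:.2f}  post-tested only: {matched:.2f}  retained: {rate:.0%}")
```

Under these assumptions, roughly half of the simulated learners are post-tested, and their average gain is noticeably higher than that of everyone who enrolled. If retention were assumed to favor less able learners instead, the bias would run the other way, which is why biased test scores may be either inflated or deflated.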

A second common limitation is post-testing before substantial learning gain can reasonably be expected. Although what constitutes a reasonable time is open to debate, surely 30 hours of instruction is suspect, and even 60 hours is questionable. Giving tests at an inappropriate level often creates ceiling or floor effects. Every test score includes a chance factor. When a test is too hard, learners score at the bottom, or floor. Because their scores cannot go down any further by chance and can only go up, the chance factor artificially inflates the post-test score. When a test is too easy, the opposite, or ceiling, effect occurs: scores can only go down by chance, and the post-test score is artificially depressed. Many of the tests that Development Associates had to delete from the NEAEP data suffered from ceiling or floor effects.
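The floor effect can also be illustrated with a simulation. In the hypothetical sketch below, which is not based on any of the studies reviewed, no learner's true skill changes between pre- and post-test; the function name `simulate_floor_effect` and the parameters `floor`, `true_mean`, and `noise_sd` are illustrative assumptions, with a `true_mean` below the floor standing in for a test that is too hard. Because scores below the floor are recorded as the floor, a learner whose pre-test landed on the floor can only move up by chance on the post-test. Statisticians describe this as regression toward the mean, made one-sided by the floor.

```python
import random
import statistics


def simulate_floor_effect(n=20_000, floor=0.0, true_mean=-1.0, noise_sd=1.0, seed=1):
    """Hypothetical illustration of a floor effect with no real learning.

    Assumptions (not study data): each learner's true skill is drawn once
    and does not change between tests; each test adds independent random
    error; any score below the test's floor is recorded as the floor.
    """
    rng = random.Random(seed)
    floor_gains, all_gains = [], []
    for _ in range(n):
        skill = rng.gauss(true_mean, 1.0)  # unchanged between pre- and post-test
        pre = max(floor, skill + rng.gauss(0, noise_sd))
        post = max(floor, skill + rng.gauss(0, noise_sd))
        all_gains.append(post - pre)
        if pre == floor:  # learner "hit the floor" on the pre-test
            floor_gains.append(post - pre)
    return statistics.mean(floor_gains), statistics.mean(all_gains)


if __name__ == "__main__":
    floor_gain, overall_gain = simulate_floor_effect()
    print(f"apparent gain, floor-level pre-test: {floor_gain:.2f}")
    print(f"average change, all learners:        {overall_gain:.2f}")
```

Under these assumptions, learners who scored at the floor on the pre-test show a positive average "gain" even though no one learned anything, while the average change across all learners stays near zero. A ceiling effect is the mirror image and could be simulated by replacing `max` with `min` and the floor with a ceiling.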

The most serious problem with testing may lie in the tests themselves. To be valid, tests must reflect the content of instruction, and the extent to which the TABE or the CASAS reflects the instruction in the programs they are used to assess is an unanswered question. Similarly, the tests may not be sensitive enough to register learning gains that adult learners consider important.

With some exceptions, such as the California GAIN Study, most outcome evaluations have relied on local programs to collect their data, a practice that is common in elementary and secondary education research. However, adult literacy education is not like elementary and secondary education, where learners arrive in September and the same learners are still participating in June. Many adult literacy programs have open enrollment, most have high attrition rates, and few have staff who are well trained in testing or other data collection. These factors make accurate record keeping and systematic post-testing at reasonable, predetermined intervals difficult. Moreover, many adults are reluctant to take tests.

Perhaps the most important lesson for policy and practice is that credible outcomes and impacts research is expensive and requires researchers who are not only experts in design and methodology but who also understand the context of adult literacy education. If data are to be collected from programs, staff must have the capacity to test and to keep accurate records. This will require more program resources and staff development. Good outcome studies help demonstrate accountability and enable us to identify practices that work. Bad outcome studies simply waste money.



References

Becker, W. J., Wesselius, F., & Fallon, R. (1976). Adult Basic Education Follow-Up Study 1973-75. Kenosha, WI: Gateway Technical Institute.

Boggs, D. L., Buss, T. F., & Yarnell, S. M. (1979). Adult Basic Education in Ohio: Program Impact Evaluation. Adult Education, 29(2), 123-140.

Cohen, J., Garet, M., & Condelli, L. (1996). Reanalysis of Data from the National Evaluation of Adult Education Programs: Methods and Results. Washington, D.C.: Pelavin Research Institute.

Darkenwald, G. & Valentine, T. (1984). Outcomes and Impact of Adult Basic Education. New Brunswick, NJ: Rutgers University, Center for Adult Development.

Development Associates. (1992). National Evaluation of Adult Education Programs First Interim Report: Profiles of Service Providers. Arlington, VA: Development Associates.

Development Associates. (1993). National Evaluation of Adult Education Programs Second Interim Report: Profiles of Client Characteristics. Arlington, VA: Development Associates.

Development Associates. (1994). National Evaluation of Adult Education Programs Third Interim Report: Patterns and Predictors of Client Attendance. Arlington, VA: Development Associates.

Heath, S. B. (1983). Ways with Words: Language, Life, and Work in Communities and Classrooms. New York, NY: Cambridge University Press.

Kent, W. P. (1973). A Longitudinal Evaluation of the Adult Basic Education Program. Falls Church, VA: System Development Corporation.

Martinson, K., & Friedlander, D. (1994). GAIN: Basic Education in a Welfare-to-Work Program. New York, NY: Manpower Demonstration Research Corporation.

St. Pierre, R., Swartz, J., Gamse, B., Murray, S., Deck, D., & Nickel, P. (1995). National Evaluation of the Even Start Family Literacy Program: Final Report. Washington, D.C.: U.S. Department of Education, Office of Policy and Planning.

St. Pierre, R., Swartz, J., Murray, S., Deck, D., & Nickel, P. (1993). National Evaluation of the Even Start Family Literacy Program: Report on Effectiveness. Washington, D.C.: U.S. Department of Education, Office of Policy and Planning.

Walker, S. M., Ewart, D. M., & Whaples, G. C. (1981). Perceptions of Program Impact: ABE/GED in Maryland. [Paper presented at the Lifelong Learning Research Conference, College Park, MD, February 6 and 7, 1981]

Young, M. B., Fitzgerald, N., & Morgan, M. A. (1994a). National Evaluation of Adult Education Programs Fourth Interim Report: Learner Outcomes and Program Results. Arlington, VA: Development Associates.

Young, M. B., Fitzgerald, N., & Morgan, M. A. (1994b). National Evaluation of Adult Education: Executive Summary. Arlington, VA: Development Associates.


About the Author

Hal Beder is a professor of adult education at Rutgers University and author of Adult Literacy: Issues for Policy and Practice. He is chair of Rutgers' Department of Educational Theory, Policy, and Administration and chair-elect of the Commission of Professors of Adult Education, a national commission attached to the American Association for Adult and Continuing Education (AAACE).

Full Report Available

The research report upon which this article is based is available from NCSALL for $10.
For a copy, contact Kim French
World Education
44 Farnsworth Street
Boston, MA 02210-1211
phone: (617) 482-9485

Updated 7/27/07 :: Copyright © 2005 NCSALL