F. Evaluation Designs

The rigor of an evaluation depends on its design. The most rigorous design is a randomized trial, which requires randomly assigning individuals or groups to either intervention or control status. This is probably neither feasible nor appropriate for a community-level Safe Routes to School (SRTS) program. Less rigorous designs (see box) each have strengths and weaknesses to weigh when choosing among them.

Common Evaluation Designs for Program Evaluation

Pre and Post One-Sample Tests:

For example, count how many students walk to school before a kickoff event and how many walk after the event.

  • Strength: easy to conduct; this is the most feasible design.
  • Weakness: results may be misleading because there is no control for outside factors that could produce the same change even without the SRTS program.
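The arithmetic behind a one-sample pre/post comparison is simple; the sketch below shows one way to do it. It assumes counts are taken on a single day at each point and converted to a percentage of enrolled students (so a change in enrollment does not look like a change in walking). The function names and the counts are hypothetical, for illustration only.

```python
def walking_share(walkers: int, enrolled: int) -> float:
    """Percentage of enrolled students who walked to school on the count day."""
    if enrolled <= 0:
        raise ValueError("enrolled must be positive")
    return 100.0 * walkers / enrolled


def pre_post_change(pre_walkers: int, pre_enrolled: int,
                    post_walkers: int, post_enrolled: int) -> float:
    """Change in walking share, in percentage points (post minus pre)."""
    return (walking_share(post_walkers, post_enrolled)
            - walking_share(pre_walkers, pre_enrolled))


# Hypothetical counts for one school, before and after a kickoff event.
change = pre_post_change(pre_walkers=48, pre_enrolled=400,
                         post_walkers=72, post_enrolled=400)
print(f"Change in walking share: {change:+.1f} percentage points")
```

With these made-up numbers the share rises from 12% to 18%, a change of +6.0 percentage points. As the weakness above notes, this design cannot say how much of that change the SRTS program caused.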

Pre and Post Two-Sample Tests:

For example, measure how many students walk or bicycle before SRTS begins and again after it has been in place for 6 months, and take the same measurements at the same points in time at a similar school elsewhere that is not taking part in SRTS.

  • Strength: fairly easy to conduct; offers better control than the one-sample test, especially if the comparison school is similar with regard to outside factors.
  • Weakness: no two schools are exactly alike with regard to outside factors; some unmeasured difference between the two schools may still explain the result rather than the SRTS program itself.
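One common way to summarize a two-sample pre/post comparison is a difference-in-differences: subtract the comparison school's change from the program school's change, so that shifts affecting both schools (such as the season) cancel out. This method is an assumption of the sketch below, not something the box prescribes, and the walking percentages are hypothetical.

```python
def difference_in_differences(program_pre: float, program_post: float,
                              comparison_pre: float, comparison_post: float) -> float:
    """Program school's change minus comparison school's change (percentage points)."""
    program_change = program_post - program_pre
    comparison_change = comparison_post - comparison_pre
    return program_change - comparison_change


# Hypothetical percentages of students walking or bicycling,
# at baseline and after 6 months.
estimate = difference_in_differences(program_pre=12.0, program_post=18.0,
                                     comparison_pre=11.0, comparison_post=13.0)
print(f"Change attributable to SRTS (if the schools are comparable): {estimate:+.1f} points")
```

Here the program school gained 6 points and the comparison school gained 2, leaving an estimate of +4.0 points. That figure is only as trustworthy as the match between the two schools.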

Time-Series Design:

For example, measure rates of walking and bicycling before the SRTS program begins, then every other month for one year. A time-series design is most feasible with one sample (the school where the program occurs). However, it is more accurate when it includes a comparison school, which helps rule out explanations other than the SRTS program for any changes.

  • Strength: when it includes a comparison school, this is the strongest of the three designs in this box. It also shows how results change over time.
  • Weakness: more costly to conduct. And because the comparison school will not be exactly the same, some differences in the results may still be due to factors other than the SRTS program.
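A time series with a comparison school can be summarized by tracking the gap between the two schools at each measurement and comparing it with the gap at baseline. The sketch below assumes "every other month for one year" means a baseline plus six follow-up measurements; the function names and all walking percentages are hypothetical.

```python
from collections.abc import Sequence


def gap_over_time(program: Sequence[float], comparison: Sequence[float]) -> list[float]:
    """Program-minus-comparison walking share at each measurement point."""
    if len(program) != len(comparison):
        raise ValueError("both schools need the same measurement points")
    return [p - c for p, c in zip(program, comparison)]


def change_in_gap_from_baseline(program: Sequence[float],
                                comparison: Sequence[float]) -> list[float]:
    """How much the gap at each point has moved relative to the baseline gap."""
    gaps = gap_over_time(program, comparison)
    baseline_gap = gaps[0]  # first measurement is taken before SRTS starts
    return [g - baseline_gap for g in gaps]


# Hypothetical percentages walking or bicycling: baseline, then every other month.
labels = ["baseline", "month 2", "month 4", "month 6", "month 8", "month 10", "month 12"]
program_school = [12.0, 14.0, 15.0, 17.0, 16.0, 18.0, 19.0]
comparison_school = [11.0, 11.5, 12.0, 12.0, 11.0, 12.5, 13.0]

for label, delta in zip(labels, change_in_gap_from_baseline(program_school, comparison_school)):
    print(f"{label:>9}: {delta:+.1f} points vs. baseline gap")
```

Looking at the whole series, rather than a single before/after pair, makes it easier to see whether a change holds up over the year or fades after the first few months.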

Adapted from Physical Activity Evaluation Handbook, DHHS, CDC, 2002.


Note: All of these designs require baseline data (data collected before the SRTS program begins). This is another reason it is important to document conditions before the program starts.