Evaluation is an Everyday Activity
Program Evaluation Discussions
Updated: 14 hours 24 min ago
This post is prompted by an editorial in Basic and Applied Social Psychology (BASP). It says that the journal no longer allows authors to use inferential statistics.
“What?”, you ask. Does that have anything to do with evaluation? Yes and no. Most of my readers will not publish there. They will publish in evaluation journals (of which there are many) or, if they are Extension professionals, in the Journal of Extension (JoE). As far as I know, BASP is the only journal that has established an outright ban on inferential statistics. So evaluation journals and JoE still accept inferential statistics.
Still–if one journal can ban the use, can others?
What exactly does that mean, no inferential statistics? The journal editors define this ban as “…the null hypothesis significance testing procedure is invalid and thus authors would be not required to perform it.” That means that authors will remove all references to p-values, t-values, F-values, or any statements about significant difference (or lack thereof) prior to publication. The editors go on to discuss the use of confidence intervals (no), Bayesian methods (case by case), and which inferential statistical procedures, if any, are required by the journal.
This ban reminds me of a valuable lesson presented to me in my original statistics course (maybe not the original one, maybe several). That lesson was the difference between practical significance and statistical significance. Something can be statistically significant (all it has to do is meet the p<.05 bar) and not be practically significant. To demonstrate this point, I offer the example of a three-point gain over a semester. It was statistically significant at p<.05, but did it actually show that the students learned something over the semester? Does three points make that much difference? Three points is three questions on a 100-question test or 1.5 questions on a 50-question test. How much would they have to learn to make a difference in their lives? You do the math.
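Here is a minimal sketch of that point in Python. The three-point gain comes from the example above; the standard deviation of the gains (10 points) and the class sizes are numbers I made up for illustration, and the p-value uses a normal (z) approximation rather than an exact t-test.

```python
from math import sqrt
from statistics import NormalDist


def approx_two_sided_p(mean_gain, sd_gain, n):
    """Two-sided p-value for a mean gain, using a normal (z) approximation."""
    z = mean_gain / (sd_gain / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def gain_in_questions(points, test_length, max_points=100):
    """Convert a point gain on a max_points scale into test questions."""
    return points * test_length / max_points


MEAN_GAIN = 3.0  # the three-point gain from the text
SD_GAIN = 10.0   # assumed spread of the gains (hypothetical)

# The gain never changes; only the (hypothetical) sample size does.
for n in (16, 100, 400):
    p = approx_two_sided_p(MEAN_GAIN, SD_GAIN, n)
    print(f"n={n:>3}  p={p:.4f}  significant at .05: {p < 0.05}")

# What the same gain means in questions answered correctly.
print(gain_in_questions(3, 100), "questions on a 100-question test")
print(gain_in_questions(3, 50), "questions on a 50-question test")
```

Under these assumptions the same three points clears the p<.05 bar with 100 students and misses it with 16, while the practical meaning (three more questions out of 100) stays exactly the same.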
The journal is requiring strong descriptive statistics INCLUDING EFFECT SIZE (go read Cohen) because effect size, unlike a significance test, does not grow with sample size. Cohen lists effect sizes (for d, the standardized difference between two means) as small (0.2), medium (0.5), or large (0.8). Descriptive statistics are those numbers that describe the SAMPLE (not the population from which the sample was drawn) and typically include measures of central tendency (mean, median, mode), variability (range, standard deviation), and shape (kurtosis and skewness), as well as frequency and percentage (i.e., distributional data).
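For readers who want to see the arithmetic, here is a rough sketch of Cohen's d alongside a few descriptive statistics. The scores are invented, the function names are mine, and the pooled-standard-deviation version of d shown here treats the two sets of scores as independent groups, which is only one of several ways to compute it.

```python
from math import sqrt
from statistics import mean, median, stdev


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean(group_b) - mean(group_a)) / pooled_sd


def label_effect(d):
    """Label d using the small/medium/large benchmarks quoted in the text."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


# Hypothetical scores: every student gains three points.
pre_scores = [70, 72, 75, 78, 80]
post_scores = [73, 75, 78, 81, 83]

for name, scores in (("pre", pre_scores), ("post", post_scores)):
    print(
        f"{name}: mean={mean(scores):.1f} median={median(scores)} "
        f"sd={stdev(scores):.2f} range={max(scores) - min(scores)}"
    )

d = cohens_d(pre_scores, post_scores)
print(f"Cohen's d = {d:.2f} ({label_effect(d)})")
```

Note that d depends on the spread of the scores as well as on the size of the gain: the same three points would be a smaller effect in a group whose scores vary more.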
By using a larger sample size, the “descriptive statistics become increasingly stable and sampling error is less of a problem.” The journal stops “…short of requiring particular sample sizes…”, however, stating that “…it is possible to imagine circumstances where more typical sample sizes might be justifiable.” I do remember a voice advocating for stating effect size; no one ever went so far as to talk down inferential statistics.
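A small simulation can make the point about stability concrete. This is only a sketch: the population mean (75), the population standard deviation (10), the sample sizes, and the number of repeated samples are all assumptions I chose for illustration, and the function name is mine.

```python
import random
from statistics import mean, stdev


def spread_of_sample_means(n, replicates=500, pop_mean=75.0, pop_sd=10.0, seed=0):
    """Standard deviation of the sample mean across repeated samples of size n."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample_means = [
        mean([rng.gauss(pop_mean, pop_sd) for _ in range(n)])
        for _ in range(replicates)
    ]
    return stdev(sample_means)


# Larger samples give sample means that bounce around less.
for n in (10, 50, 400):
    print(f"n={n:>3}  spread of sample means: {spread_of_sample_means(n):.2f}")
```

The spread of the sample means shrinks as n grows, which is what "increasingly stable" looks like in practice.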
What does that say for the small sample sizes Extension professionals typically achieve? I would suggest Extension professionals look at effect size.
Evaluators are often the key people identified to conduct a needs assessment. A needs assessment identifies the situation that exists before the intervention is designed or implemented. Hopefully. Currently, there is discussion in the field that rather than focusing on needs (i.e., what is missing, needed), there should be discussions of assets (i.e., what is available, strengths). My favorite go-to person on needs assessments is Jim Altschuld, who has published a volume on bridging the gap between the two. In it, he talks about the difference between them. He says, “Need is a noun, a problem that should be attended to or resolved. It is a gap or discrepancy between the ‘what should be’ and the ‘what is’ conditions”. However, assets/capacity building (emphasis added) refer “…to building a culture in an organization or community so that it can grow and change in accord with its strengths…”
Recently, at a meeting looking at diversity, I suggested the focus be on assets rather than needs. The meeting’s facilitator jumped from looking at assets to conducting a SWOT (strengths, weaknesses, opportunities, threats) analysis.
I went looking for how Altschuld treats this topic (he doesn’t–at least it isn’t in the index).
So I wondered how these two approaches are related. Certainly, strengths could be equated with assets (although the Wikipedia page cited above says strengths are “characteristics of the business or project that give it an advantage over others”, which are not exactly the same thing as assets) and weaknesses could be equated to needs (although the Wikipedia page cited above says weaknesses are “characteristics that place the business or project at a disadvantage relative to others”). I don’t think in terms of “others” when I conduct a needs assessment or an assets assessment that consequently builds capacity. So, how is the project (intervention) at a disadvantage? How is it at an advantage? I suppose that a current situation that “needs” something could be at a disadvantage when compared to others (interventions, projects, programs). Typically, the evaluator doesn’t have the time or other resources to explore how the situation is different/disadvantaged from others.
Where do the opportunities and threats fit? The Wikipedia citation says that opportunities are “elements that the project could exploit to its advantage” (are these not assets?) and threats are “elements in the environment that could cause trouble for the business or project” (these could be needs, certainly).
Sounds to me like a SWOT is a detailed approach that includes some of the same things that a needs and assets assessment includes. The biggest difference that I see is that SWOT includes comparison with “others”. What if there is not an “other”? What if the project is in a controlled environment? Can this be a useful tool to an evaluator? What do you think?
Earlier this week I attended a meeting of the College of Education (my academic home) Social and Environmental Justice (SJE) Work Group. This is a loosely organized group of interested faculty and staff, led by an individual who is the ESOL Program Coordinator & Instructor. We had representatives from each of the four program areas (Adult and Higher Education [AHE], Teacher and Counseling Education [TCE], Science, Technology, Engineering, and Math [STEM], and Cultural and Linguistic Diversity [CLD]) in person (AHE, TCE, CLD) or on paper (STEM). The intent was to document for the work group what each program area is doing in the area of social justice. Social justice is a mandate for the College and OSU. The AHE and the TCE representatives provided us with information. We never did get to the STEM response. Then we got on to a discussion of what exactly is meant by social justice (since AHE has not defined the term specifically). My response was the evaluation response: it depends.
Most of the folks in the group focused on the interface of race and gender. OK. Others focused on the multiple and different voices. OK. Others focused on the advantages and disadvantages experienced. How is that not based in economics? Others focused on power and privilege. (As an accident of birth?) What is social justice exactly? Can you have social justice without environmental justice? How does that fit with the issue of diversity? How does any of this relate to evaluation?
The American Evaluation Association has had in place for a long time (since 1994) a set of five guiding principles (see Background section at the link for a bit of history). The fourth and fifth principles are, respectively, Respect for People and Responsibilities for General and Public Welfare. Respect for people says this: Evaluators respect the security, dignity and self-worth of respondents, program participants, clients, and other evaluation stakeholders. Responsibilities for the General and Public Welfare says this: Evaluators articulate and take into account the diversity of general and public interests and values that may be related to the evaluation. Although both talk about parts of social justice that we talked about earlier this week, is this a complete view? Certainly, security, dignity, and self worth and diversity of interests and values approach the discussion we had. Is there still something missing? I think so. Where is fairness addressed?
To me, fairness is the crux of the issue. For example, it certainly isn’t fair that in the US, 2% of the population has the cumulative wealth of the remaining 98%. (Then we are into economics.) Although Gandhi said “be the change”, is that enough? What if that change isn’t fair? And the question must be addressed: fair to whom? What if that change is only one person? Is that fair? I always talk about the long-term outcome as world peace (not in my lifetime, though). If you work for justice (for me that is fairness), will peace result? I don’t know. Maybe.
Tomorrow is the Lunar New Year. It is the year of the goat/sheep/ram. I wish you the best. Eat jiaozi and tangerines (for encouraging wealth), and noodles without breaking/biting them (you do want a long life, right?). Happy New Year.
I don’t know what to write today for this week’s post. I turn to my book shelf and randomly choose a book. Alas, I get distracted and don’t remember what I’m about. Mama said there would be days like this…I’ve got writer’s block (fortunately, it is not contagious). (Thank you, Calvin). There is also an interesting (to me at least because I learned a new word–thrisis: a crisis of the thirties) blog on this very topic (here).
So this is what I decided rather than trying to refocus. In the past 48 hours I’ve had the following discussions that relate to evaluation and evaluative thinking.
1. In a faculty meeting yesterday, there was a discussion of the student needs that arise during the students’ matriculation in a program of study. Perhaps it should include assets in addition to needs, as students often don’t know what they don’t know and cannot identify needs.
2. A faculty member wanted to validate and establish the reliability of a survey being constructed. Do I review the survey, provide the reference for survey development, OR give a reference for validity and reliability (a measurement text)? Or all of the above?
3. There appear to be two virtual focus group transcripts for a qualitative evaluation that have gone missing. How much effect will those missing focus groups have on the evaluation? Will notes taken during the sessions be sufficient?
4. A candidate for an assistant professor position came to campus and gave a research presentation on the right hand (as opposed to the left hand). [Euphemisms for the talk content to protect confidentiality.] Why even study the right hand when the left hand is what is being assessed?
5. Reading over a professional development proposal dealing with what is, what could be, and what should be. Are the questions being asked really addressing the question of gaps?
I’m sure there are others. These jump to my mind. So I’ll give the references that relate to the above situations by number. Some of them I’ve given before; seems appropriate to do so again.
1. Altschuld, J. W. (2014). Bridging the gap between asset/capacity building and needs assessment. Thousand Oaks, CA: Sage.
2. Dillman, D. A., Smyth, J. D., & Christian, L. M. (2014). Internet, phone, mail, and mixed-mode surveys: The tailored design method. Hoboken, NJ: Wiley.
2a. Salkind, N. J. (2005). Tests & measurement for people who (think they) hate tests and measurements. Thousand Oaks, CA: Sage. (I show an image of the first edition; there is a second edition available.)
5. (See number 1).
Where have you found evaluation/evaluative thinking in your day?
Let me know.