Tel 701

Mixed Methods Research Bins

Codes:

THREAT TO VALIDITY

MATURATION THREAT

CONFOUNDED

INTERNAL THREAT

TESTING THREAT

PRACTICE EFFECT

INSTRUMENTATION EFFECT

NONEQUIVALENCE THREAT

REGRESSION THREAT

TYPES OF RESEARCH

EXTERNAL VALIDITY

VALIDITY

RELIABILITY

SURVEYS

MIXED METHODS SAMPLING

ROBUSTNESS

ANOVA

SAMPLE SIZE

LIKERT SCALES

CHI SQUARED

P-VALUE

NULL HYPOTHESIS

ASSUMPTIONS

Index:

Module 01:

Gelo, O., Braakmann, D., & Benetka, G. (2008). Quantitative and qualitative research: Beyond the debate. Integrative Psychological and Behavioral Science, 42(3), 266-290.

Johnson, R.B. & Onwuegbuzie, A.J. (2004). Mixed methods research: A research paradigm whose time has come. Educational Researcher, 33(7), 14-26.

Module 02:

Smith, M. L. & Glass, G. V. (1987). Correlational studies, in M. L. Smith and G. V. Glass, Research and Evaluation in Education and the Social Sciences, pp. 198-224. Needham Heights, MA: Allyn and Bacon.

Module 03:

Bryson et al. (2012). The science of opinion: Survey methods in research. Canadian Journal of Anesthesia, pp. 736-738.

Smith, M. L. & Glass, G. V. (1987). Experimental studies, in M. L. Smith and G. V. Glass, Research and Evaluation in Education and the Social Sciences, pp. 124-157. Needham Heights, MA: Allyn and Bacon.

Brualdi, A. (1999). Traditional and modern concepts of validity. ERIC Clearinghouse on Assessment and Evaluation, Washington, D.C.

Module 04:

Fraenkel, J. R. & Wallen, N. E. (2005). Validity and reliability, in J. R. Fraenkel and N. E. Wallen, How to design and evaluate research in education with PowerWeb, pp. 152-171, Hightstown, NJ: McGraw Hill Publishing Co.

Suskie, L. A. (1996). Questionnaire survey research: What works (2nd ed.). Tallahassee, FL: The Association for Institutional Research.

Thayer-Hart, N., Dykema, J., Elver, K., Schaeffer, N. C., Stevenson, J. (2010). Survey fundamentals: A guide to designing and implementing surveys. Madison, Wisconsin: University of Wisconsin Survey Center.

Diem, K. G. (2004). A step-by-step guide to developing effective questionnaires and survey procedures for program evaluation & research. Rutgers Cooperative Research and Extension Fact Sheet, New Brunswick, New Jersey: Rutgers University.

Miller, P. R. (n.d.). Tipsheet: Question wording. Duke Initiative on Survey Methodology, Durham, North Carolina: Duke University. Retrieved August 2016 from http://www.dism.ssri.duke.edu/pdfs/Tipsheet%20-%20Question%20Wording.pdf

Miller, P. R. (n.d.). Tipsheet: Improving response scales. Duke Initiative on Survey Methodology, Durham, North Carolina: Duke University. Retrieved August 2016 from http://www.dism.ssri.duke.edu/pdfs/Tipsheet%20-%20Question%20Wording.pdf

Module 05:

Teddlie, C., & Yu, F. (2007). Mixed methods sampling: A typology with examples. Journal of Mixed Methods Research, 1(1), 77-100. doi:10.1177/2345678906292430

Norman, G. (2010). Likert scales, levels of measurement and the ‘‘laws’’ of statistics. Advances in Health Science Education, 15(5), 625–632. doi:10.1007/s10459-010-9222-y

Edmondson, D. R., Edwards, Y. D., & Boyer, S. L. (2012). Likert scales: A marketing perspective. International Journal of Business, Marketing, and Decision Sciences, 5(2), 73-85.

Jamieson, S. (2004). Likert scales: How to (ab)use them. Medical Education, 38(12), 1212–1218. doi:10.1111/j.1365-2929.2004.02012.x

Module 06:

Marshall, G., & Jonker, L. (2010). An introduction to inferential statistics: A review and practical guide. Radiography, 17(1), e1-e6. doi:10.1016/j.radi.2009.12.006

Allua, S., & Thompson, C. B. (2009). Inferential statistics. Air Medical Journal, 28(4), 168-171. doi:10.1016/j.amj.2009.04.013

Allua, S., & Thompson, C. B. (2009). Hypothesis testing. Air Medical Journal, 28(3), 108-153. doi:10.1016/j.amj.2009.03.002

Pereira, S. M. C., & Leslie, G. (2009). Hypothesis testing. Australian Critical Care, 22(4), 187-191. doi:10.1016/j.aucc.2009.08.003

Ren, D. (2009). Understanding statistical hypothesis testing. Journal of Emergency Nursing, 35(1), 57-59. doi:10.1016/j.jen.2008.09.020

Fisher, M. J., & Marshall, A. P. (2008). Understanding descriptive statistics. Australian Critical Care, 22(2), 93-97. doi:10.1016/j.aucc.2008.11.003

Shi, R., & McLarty, J. W. (2009). Descriptive statistics. Annals of Allergy, Asthma & Immunology, 103(4), 9-14. doi:10.1016/s1081-1206(10)60815-0

Module 07:

Sedgwick, P. (2014). Pitfalls of statistical hypothesis testing: Multiple testing. British Medical Journal (BMJ), 349, 1-2. doi:10.1136/bmj.g5310.

Module 08:

Veazie, P. J. (2015). Understanding statistical testing. Sage Open, 5(1), 1-9. doi:10.1177/2158244014567685.

Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31(4), 337–350. doi:10.1007/s10654-016-0149-3.

Sedgwick, P. (2014). Understanding statistical hypothesis testing. British Medical Journal, 348, g3557. doi:10.1136/bmj.g3557.

Wasserstein, R. L., & Lazar, N. A. (2016). The ASA's statement on p-values: Context, process, and purpose. The American Statistician, 70(2), 129-133. doi:10.1080/00031305.2016.1154108.

Author(s) Unknown (n.d.). Chi square tests. Book Author(s)/Editor(s) unknown (pp. 703-765). Retrieved from http://uregina.ca/~gingrich/ch10.pdf

McHugh, M. L. (2013). The chi-square test of independence. Biochemia Medica, 23(2), 143–149. doi:10.11613/BM.2013.018.

Module 09:

Gabrenya, W. (2003). Inferential statistics: Basic concepts. Retrieved from http://my.fit.edu/~gabrenya/IntroMethods/eBook/inferentials.pdf

McGough, J. J., & Faraone, S. V. (2009). Estimating the size of treatment effects: Moving beyond p values. Psychiatry, 6(10), 21–29.

Kelley, K., & Preacher, K. J. (2012). On effect size. Psychological Methods, 17(2), 137-152. doi:10.1037/a0028086.

Powers, J. M., & Glass, G. V. (2014). When statistical significance hides more than it reveals. Teachers College Record. Retrieved from https://www.tcrecord.org/content.asp?contentid=17591

Wuensch, K. (2015). Cohen’s conventions for small, medium, and large effects. Retrieved from http://core.ecu.edu/psyc/wuenschk/docs30/EffectSizeConventions.pdf

Wuensch, K. (2015). Estimating the sample size necessary to have enough power. Retrieved from http://core.ecu.edu/psyc/wuenschk/docs30/Power-N.Doc

Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159.

Gelo, O., Braakmann, D., & Benetka, G. (2008). Quantitative and qualitative research: Beyond the debate. (Link) (Return to Index)

Gelo, O., Braakmann, D., & Benetka, G. (2008). Quantitative and qualitative research: Beyond the debate. Integrative Psychological and Behavioral Science, 42(3) 266-290.

Summary: This article reviews the historical and current debate over the use of qualitative and quantitative research methodologies. It presents the key differences between the two approaches and suggests how they can best be combined to produce more thorough research.

While qualitative research approaches (e.g., Silverman 2004) have been developed starting from completely different philosophical assumptions, such as phenomenology and hermeneutics, some quantitative researchers (e.g. Michell 1999, 2000; Toomela 2008) have become self-critical about their own research approach. (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 266)

For example, Michell (1999) provided a critical historical overview of the concept of measurement in psychology, identifying two main issues: (1) most quantitative research is based upon the fact that psychological attributes can be measured in a quantitative way rather than upon empirical investigation of the issue; (2) most quantitative researchers adopt a defective definition of measurement, thinking that measurement is simply the assignment of numbers to objects and events according to specific rules. In a similar way, Toomela (2008) recently showed how (1) quantitative variables may encode information ambiguously, and how (2) statistical analysis may not always allow a meaningful theoretical interpretation, because of ambiguity of information encoded in variables, and because of intrinsic limitation of statistical procedures. (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 267)

Hence the basic concern is what information is encoded in quantitative variables supposed to represent mental phenomena (ontology of a variable), and how this kind of information may enlighten us about the relationship between these mental phenomena (epistemology of a variable). (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 267)

Quantitative research requires the reduction of phenomena to numerical values in order to carry out statistical analysis. By contrast, qualitative research involves collection of data in a non-numerical form, i.e. texts, pictures, videos, etc. (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 268)

According to quantitative approaches, psychological and social phenomena have an objective reality. The relationships between these phenomena are investigated in terms of generalizable causal effects, which in turn allow prediction. By contrast, qualitative approaches consider reality as socially and psychologically constructed. The aim of scientific investigation is to understand the behaviour and the culture of humans and their groups “from the point of view of those being studied” (Bryman 1988, p. 46) (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 268)

As noted by Bryman (1984), the QQD is based to a large extent on epistemological issues, and questions relating to research techniques are systematically related to these issues. Some other authors, though, assume a more pragmatic position. According to them, it is both possible to subscribe to the philosophy of one approach and employ the methods of the other (Reichardt and Cook 1979; Steckler et al. 1992). (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 268)

Scientific investigation can be characterized by a set of philosophical and metatheoretical assumptions concerning the nature of reality (ontology), knowledge (epistemology), and the principles inspiring and governing scientific investigation (methodology), as well as by technical issues regarding the practical implementation of a study (research methods). (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 268)

While philosophical and meta-theoretical assumptions underlie the worldviews constraining the kinds of questions we try to answer, and the principles governing our research approach, research methods specify the practical implementation of our scientific investigation in terms of data collection, analysis and interpretation (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 269)

The main features characterizing quantitative and qualitative approaches may be described with respective reference to their philosophical foundations, methodological assumptions, and to the research methods they employ. Differences at each of these levels have contributed to sustain the QQD. (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 269)

Similarly, scientific paradigms contain a basic set of beliefs or assumptions that guide our inquiries (Guba and Lincoln 2005). With reference to quantitative and qualitative research approaches, three main worldviews may be identified: objectivism (according to which reality exists independent from consciousness), subjectivism (according to which subjective experience is fundamental to any knowledge process), and constructivism (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 269)

Quantitative and qualitative approaches present different methodologies which, as in the case of their paradigmatic foundations, have deeply contributed to maintain the QQD (see Table 1). The former are usually described to adopt a nomothetic methodology, while the latter adopt an idiographic methodology. (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 270)

(Gelo, O., Braakmann, D., & Benetka, G., 2008, p 271)

Quantitative approaches tend to explain, i.e. to verify if observed phenomena and their systematic relationship confirm the prediction made by a theory. Qualitative approaches, in turn, tend to comprehend, i.e. aspire to reconstruct the personal perspectives, experiences and understandings of the individual actors. (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 271-272)

In quantitative approaches, hypotheses are deductively derived from the theory and have then to be falsified through empirical investigation (confirmatory study). In qualitative approaches, however, the development of hypothesis is part of the research process itself, whose aim is to develop an adequate theory according to the observations that have been made (exploratory study). (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 272)

Contrary to quantitative research approaches—which employ experimental and non-experimental designs—qualitative approaches make use of naturalistic designs (Lincoln and Guba 1985), whose aim is to study behaviour in natural settings. (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 272)

With regard to quantitative research, Cook and Campbell (1979) identified statistical conclusion validity (i.e. the validity of inferences from the sample to the population), construct validity (i.e. the validity of the theoretical constructs employed), and causal validity (i.e. the validity of the cause–effect relationship between observed variables) as specific kinds of internal validity. External validity, on the other hand, can be defined as “the extent to which the results of a study can be generalized across populations, settings, and times” (Johnson and Christensen 2000; p. 200) (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 274)

Qualitative approaches, by contrast, make use of almost exclusively purposive sampling strategies. These allow “selecting information-rich cases to be studied in depth” (Patton 1990; p. 169). (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 275)

These emergent themes [re qualitative analysis] are then re-labelled, using a language closer to the language of the researcher and to the theory of reference. Finally, the themes (or content categories) are interrelated to each other and abstracted into a set of themes, which will receive new labels. This procedure allows reaching gradually higher levels of abstraction in the description of the data, and identifying the constituents of the analyzed texts. (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 277)

In qualitative research, data interpretation is based on a process of inductive inference (Tashakkori and Teddlie 2003b), which refers to a process of creating meaningful and consistent explanations, understanding, conceptual frameworks, and/or theories drawing on a systematic observation of phenomena. (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 278).

Greene and Caracelli (2003) delineate four meaningful instances in mixing paradigms: (1) thinking dialectically about mixing paradigms, (2) using a new paradigm, (3) being pragmatic, and (4) putting substantive understanding first. (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 278)

Shedding light on the “dynamic of research purposes” is necessary to understand MMR’s methodology (Newman et al. 2003). In order to do that, Newman and colleagues (Newman et al. 2003) present a typology of research purposes, each of which is generally associated with either a quantitative or a qualitative methodology. These nine general purposes (and the correspondent methodologies) are categorized as follows: (1) predict—through quantitative methodology, (2) add to the knowledge base —through quantitative methodology, (3) have a personal, social, institutional, and/or organizational impact—through qualitative methodology, (4) measure change— through quantitative methodology, (5) understand complex phenomena—through qualitative research, (6) test new ideas—through quantitative methodology, (7) generate new ideas—through qualitative methodology, (8) inform constituencies— through qualitative methodology, and (9) examine the past—through qualitative methodology (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 279)

In order to provide a more synthetic, parsimonious and functional overview of the different research designs actually existing in MMR, Creswell and Plano Clark (2007) propose four major mixed methods designs, each of one with its variants: the triangulation design, the embedded design, the explanatory design, and the exploratory design. (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 280)

(Gelo, O., Braakmann, D., & Benetka, G., 2008, p 281)

A two-phase approach is instead used when the researcher needs qualitative information before the intervention (e.g. in order to better shape the intervention or to select participants) or after the intervention (e.g. to explore in depth the results of the intervention or to follow up on the experiences of the participants about the intervention). (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 282)

The specific sampling strategies for quantitative and qualitative research (see Table 2) should be applied also when these two research approaches are used in combination. (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 284)

Data collection in MMR can be concurrent (as in triangulation and one-phase embedded designs) or sequential (as in explanatory, exploratory, and two phase embedded designs) (for a detailed account see Creswell and Plano Clark 2007). In the case of concurrent data collection, data is collected during the same timeframe, even though independently from each other (see Fig. 1). (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 284)

The purpose of sequential mixed methods data analysis is to use the results from the first data set to inform the results which will be obtained with the second data set. Sequential data analysis therefore involves an initial stage where the first data set is analyzed following the traditional quantitative (as in explanatory or two-phase embedded designs) or qualitative (as in exploratory or two-phase embedded designs) procedures of analysis (see Table 2). (Gelo, O., Braakmann, D., & Benetka, G., 2008, p 285)

Johnson, R.B. & Onwuegbuzie, A.J. (2004). Mixed methods research: A research paradigm whose time has come. (Link) (Return to Index)

Johnson, R.B. & Onwuegbuzie, A.J. (2004). Mixed methods research: A research paradigm whose time has come. Educational Researcher, 33(7), 14-26.

Summary: This article argues for the value of mixed methods research. It recounts the history of the debate between quantitative and qualitative researchers, then articulates the value of a mixed methods approach that incorporates both quantitative and qualitative elements, positioning mixed methods between the two methodological extremes. The strengths and weaknesses of each approach are presented at length.

Quantitative purists maintain that social science inquiry should be objective. That is, time- and context-free generalizations (Nagel, 1986) are desirable and possible, and real causes of social scientific outcomes can be determined reliably and validly. According to this school of thought, educational researchers should eliminate their biases, remain emotionally detached and uninvolved with the objects of study, and test or empirically justify their stated hypotheses. (Johnson R.B. & Onwuegbuzie, A.J., 2004, p. 14)

Qualitative purists (also called constructivists and interpretivists) reject what they call positivism. They argue for the superiority of constructivism, idealism, relativism, humanism, hermeneutics, and, sometimes, postmodernism (Guba & Lincoln, 1989; Lincoln & Guba, 2000; Schwandt, 2000; Smith, 1983, 1984). These purists contend that multiple-constructed realities abound, that time- and context-free generalizations are neither desirable nor possible, that research is value-bound, that it is impossible to differentiate fully causes and effects, that logic flows from specific to general (e.g., explanations are generated inductively from the data), and that knower and known cannot be separated because the subjective knower is the only source of reality (Guba, 1990). (Johnson R.B. & Onwuegbuzie, A.J., 2004, p. 14)

the incompatibility thesis (Howe, 1988), which posits that qualitative and quantitative research paradigms, including their associated methods, cannot and should not be mixed. (Johnson R.B. & Onwuegbuzie, A.J., 2004, p. 14)

both sets of researchers incorporate safeguards into their inquiries in order to minimize confirmation bias and other sources of invalidity (or lack of trustworthiness) that have the potential to exist in every research study (Sandelowski, 1986). (Johnson R.B. & Onwuegbuzie, A.J., 2004, p. 15)

some individuals who engage in the qualitative versus quantitative paradigm debate appear to confuse the logic of justification with research methods. That is, there is a tendency among some researchers to treat epistemology and method as being synonymous (Bryman, 1984; Howe, 1992). (Johnson R.B. & Onwuegbuzie, A.J., 2004, p. 15)

Obviously, the conduct of fully objective and value-free research is a myth, even though the regulatory ideal of objectivity can be a useful one. (Johnson R.B. & Onwuegbuzie, A.J., 2004, p. 16)

The strong ontological relativistic or constructivist claim in qualitative research that multiple, contradictory, but equally valid accounts of the same phenomenon are multiple realities also poses some potential problems. Generally speaking, subjective states (i.e., created and experienced realities) that vary from person to person and that are sometimes called "realities" should probably be called (for the purposes of clarity and greater precision) multiple perspectives or opinion or belief (depending on the specific phenomenon being described) rather than multiple realities (Phillips & Burbules, 2000). (Johnson R.B. & Onwuegbuzie, A.J., 2004, p. 17)

We agree with qualitative researchers that value stances are often needed in research; however, it also is important that research is more than simply one researcher's highly idiosyncratic opinions written into a report. (Johnson R.B. & Onwuegbuzie, A.J., 2004, p. 16)

Mixed methods research should, instead (at this time), use a method and philosophy that attempt to fit together the insights provided by qualitative and quantitative research into a workable solution. (Johnson R.B. & Onwuegbuzie, A.J., 2004, p. 16)

If two ontological positions about the mind/body problem (e.g., monism versus dualism), for example, do not make a difference in how we conduct our research then the distinction is, for practical purposes, not very meaningful. (Johnson R.B. & Onwuegbuzie, A.J., 2004, p. 17)

It is an expansive and creative form of research, not a limiting form of research. It is inclusive, pluralistic, and complementary, and it suggests that researchers take an eclectic approach to method selection and the thinking about and conduct of research. (Johnson R.B. & Onwuegbuzie, A.J., 2004, p. 17)

Many research questions and combinations of questions are best and most fully answered through mixed research solutions. In order to mix research in an effective manner, researchers … (Johnson R.B. & Onwuegbuzie, A.J., 2004, p. 17)

Ultimately, the possible number of ways that studies can involve mixing is very large because of the many potential classification dimensions. It is a key point that mixed methods research truly opens up an exciting and almost unlimited potential for future research. (Johnson R.B. & Onwuegbuzie, A.J., 2004, p. 20)

The majority of mixed methods research designs can be developed from the two major types of mixed methods research: mixed-model (mixing qualitative and quantitative approaches within or across the stages of the research process) and mixed-method (the inclusion of a quantitative phase and a qualitative phase in an overall research study). (Johnson R.B. & Onwuegbuzie, A.J., 2004, p. 20)

Correlational studies in M. L. Smith and G. V Glass, Research and Evaluation in Education and the Social Sciences, pp. 198-224 (Link) (Return to Index)

Smith, M. L. & Glass, G. V. (1987). Correlational studies, in M. L. Smith and G. V. Glass, Research and Evaluation in Education and the Social Sciences, pp. 198-224. Needham Heights, MA: Allyn and Bacon.

Correlational studies serve two broadly conceived purposes. The first is building theory about phenomena by better understanding the constructs, what they consist of, and how they relate to other constructs…. The second purpose of correlational studies is to enable us to predict one variable from another (or several others). (Smith, M. L. & Glass, G. V., 1987, p198)

The difference between a correlational study and a causal comparative study is solely the investigator's purpose: "to relate or predict" (correlational) as opposed to "to determine cause" (causal comparative). (Smith, M. L. & Glass, G. V., 1987, p199)

The most prevalent way of expressing a relationship between variables is the correlation coefficient. The Pearson product-moment correlation is the correlation coefficient that serves as an index of the degree of relationship between two variables. (Smith, M. L. & Glass, G. V., 1987, p202)
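The Pearson coefficient quoted above can be computed directly from its definition (covariance divided by the product of the standard deviations). A minimal Python sketch; the data values are invented for illustration:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

# A perfectly linear positive relationship yields r = 1.0
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0
```

The coefficient ranges from -1 (perfect negative relationship) through 0 (no linear relationship) to +1 (perfect positive relationship).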

Relationships between variables are expressed in several ways. First, a researcher can compare the means of one variable for two or more groups that are differentiated on a second variable…. The second way of expressing the relationship between variables employs the analysis of crossbreaks, or the cross-classification of variables. The researcher subcategorizes the sample according to two or more variables and looks for differences in relative frequency or percentages among the categories. (Smith, M. L. & Glass, G. V., 1987, p201)

The magnitude of a correlation in a sample is influenced by several features of the study. The level of reliability with which a variable is measured sets a limit on how high it can correlate with another variable. (Smith, M. L. & Glass, G. V., 1987, p202)

To obtain a better idea of the actual relationship between two such variables, researchers sometimes perform what is known as the correction for attenuation. This statistical procedure adjusts the obtained correlation upwards to account for error of measurement in either of the variables. (Smith, M. L. & Glass, G. V., 1987, p202)
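The correction for attenuation described here is Spearman's classical formula: the observed correlation is divided by the square root of the product of the two variables' reliabilities. A small sketch; the correlation and reliability values below are invented for illustration:

```python
from math import sqrt

def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation: adjust an observed
    correlation upward given each variable's reliability (0-1)."""
    return r_xy / sqrt(rel_x * rel_y)

# An observed r of .40 with measure reliabilities of .80 and .70:
print(round(correct_for_attenuation(0.40, 0.80, 0.70), 3))  # 0.535
```

Note the direction of the adjustment: lower reliabilities imply more measurement error, so the corrected correlation rises further above the observed one.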

When two variables are related to each other they share variance in common. (Smith, M. L. & Glass, G. V., 1987, p204)

The regression coefficients or regression weights are determined by the principle of least squares. (Smith, M. L. & Glass, G. V., 1987, p205)

Suppose that a researcher wanted to study the relationship between authoritarianism and religiosity. He measures both variables on a continuous scale, then divides the religiosity scale at the median, creating a dichotomy of persons high and low on religiosity. The correct form of correlation is the biserial correlation because, although religiosity forms a dichotomy, there is a continuous variable underlying it. (Smith, M. L. & Glass, G. V., 1987, p202)

Regression analysis uses the correlation between two variables to predict one from the other. (Smith, M. L. & Glass, G. V., 1987, p207)
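The link between correlation and prediction can be made concrete: for simple regression, the least-squares slope for predicting y from x is the correlation times the ratio of the standard deviations (b = r · s_y / s_x). A minimal sketch with invented toy data:

```python
from math import sqrt

def simple_regression(x, y):
    """Least-squares intercept and slope for predicting y from x,
    expressed through the correlation: b = r * (s_y / s_x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sqrt(sum((xi - mx) ** 2 for xi in x) / n)
    sy = sqrt(sum((yi - my) ** 2 for yi in y) / n)
    r = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n * sx * sy)
    b = r * (sy / sx)        # regression weight
    a = my - b * mx          # intercept
    return a, b

# Data generated by y = 2x + 1, so the line is recovered exactly:
a, b = simple_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(round(a, 6), round(b, 6))  # 1.0 2.0
```

With a perfect correlation the prediction is exact; as r shrinks toward zero, predicted values regress toward the mean of y.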

...Another form of reporting in regression analysis is the test of statistical significance of the regression coefficients. Tests such as these are made when the intent of the researcher is to generalize beyond the sample at hand to some larger population. (Smith, M. L. & Glass, G. V., 1987, p210)

Several circumstances exacerbate capitalization on chance. The first is small sample size. As the ratio of the number of predictors to the size of the sample rises, the results grow more misleading. To avoid this problem some methodologists recommend having at least 30 subjects for each predictor variable in the equation. Others recommend that, for estimates of R² to be stable and generalizable, as many as 300 subjects are needed for each predictor (Kerlinger and Pedhazur, 1973). (Smith, M. L. & Glass, G. V., 1987, p216)
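The two rules of thumb quoted above reduce to simple arithmetic. A tiny hypothetical helper, just to make the ratios explicit:

```python
def min_sample_sizes(n_predictors):
    """Subjects needed under the two rules of thumb quoted above:
    30 per predictor (stable coefficients) and the more conservative
    300 per predictor (stable, generalizable R-squared estimates)."""
    return 30 * n_predictors, 300 * n_predictors

# A regression equation with 5 predictors:
print(min_sample_sizes(5))  # (150, 1500)
```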

Factor analysis is an advanced variety of correlational study…. The first step of factor analysis is called factor extraction... The next step is factor rotation: the factors are manipulated mathematically to reduce the ambiguity of factor loadings. The goal is to enhance the correlation of variables with the factors they load most highly on and reduce the correlation of the variables with the other factors. (Smith, M. L. & Glass, G. V., 1987, p 219-220)

When reading correlational studies you should consider the following questions: 1.) Has the sample been chosen to represent a defined population? Or have the characteristics of the sample been described sufficiently so that a judgment of generalization can be made? 2.) Is the sample large enough not only to yield stable bivariate correlations but also to offset problems of capitalization on chance? Is there sufficient variability in the sample? 3.) Have the variables been measured with adequate reliability and validity? 4.) Have scatterplots been examined to rule out curvilinear relationships between variables? 5.) Have the correct correlational statistics been used? 6.) In multiple regression studies, has a shrinkage correction been applied or cross-validation performed? 7.) Has the author inappropriately interpreted the meaning of significant and nonsignificant regression coefficients? 8.) Has the author committed the correlation-causation fallacy in interpreting the results? (Smith, M. L. & Glass, G. V., 1987, p 221-222)

The Science of Opinion: Survey Methods in Research (Link) (Return to Index)

Bryson et al. (2012). The science of opinion: Survey methods in research. Canadian Journal of Anesthesia, pp. 736-738.

With the advent of affordable and user-friendly online survey tools, the means to design questionnaires with which to evaluate patients, trainees, and peers are within reach of anyone with internet access. However, to ensure that the survey research results are valid and informative, a survey must conform to the scientific methods of the field. A biased questionnaire sent to a non-representative sample will not yield useful information no matter how easy or inexpensive it may be to administer. (Bryson et. al., 2012, p736)

The United Nations defines a survey as ‘‘an investigation about the characteristics of a given population by means of collecting data from a sample of that population and estimating their characteristics through the systematic use of statistical methodology’’.1 Under this broad definition, surveys may be either descriptive or analytical. (Bryson et. al., 2012, p736)

Surveys, like all other forms of research, must begin with a review of the relevant background literature; they must describe the rationale and importance for the research question and substantiate its relevance, and they should define the objectives (primary and secondary) of the study. A detailed statement of the research question is essential to ensure that the items included in the survey address the relevant domains or themes of the question (Bryson et. al., 2012, p736)

Membership lists of medical or professional societies or associations are often used as the sampling frame in surveys of health professionals. Unless membership in these societies or associations is compulsory, the sampling frame may not be entirely representative of the target population and may result in a coverage error. (Bryson et. al., 2012, p737)

Once a sampling frame has been assembled, the investigators must determine whether to conduct a census in which all units of the target population are measured or in which a sample, or subset, of the target population is measured. Sampling is a decision that is typically based on the resources available to the investigators. (Bryson et. al., 2012, p737)

A non-response error occurs if the data from non-respondents are systematically different from those of respondents. A high non-response rate may increase the cost of conducting a survey and cast doubt on the representativeness of its results. Many actions can be taken to increase the response rate of a survey, including introduction letters, reminders, and the use of different survey modes (both internet and mailed questionnaire). (Bryson et al., 2012, p. 737)

Failure to complete the questionnaire is a neglected source of non-response error and can arise when the questionnaire is too long, the questions are confusing, or the subject matter is trivial. A respondent’s questionnaire is normally considered completed when at least 80% of the questions are answered. Dillman et al. consider survey response a social exchange driven by the respondent’s altruism and the trustworthiness of the investigator. (Bryson et al., 2012, p. 737)

Regrettably, many investigators spend little effort developing their questionnaire; questions may not represent the content they were intended to evaluate and/or the questions may be unclear and their responses variable. (Bryson et al., 2012, p. 737)

Systematic reviews, focus groups, and expert opinions should be used to create a list of potential concepts and themes relevant to the research question. (Bryson et al., 2012, p. 738)

The survey questions should be pre-tested in individuals similar to the sampling frame to ensure that questions are relevant and consistently interpreted. After making any specified changes, the questionnaire as a whole should be pilot-tested to determine ease of use, narrative flow, and time required. Pilot testing will reveal problems with wording, order, and presentation of questions that might cause respondents to provide inaccurate responses. (Bryson et al., 2012, p. 738)

This structured process, from item selection to clinical sensibility testing, is essential to identify and minimize measurement bias. The review by Burns et al. offers additional information on the characteristics informing both the validity and reliability of a questionnaire. (Bryson et al., 2012, p. 738)

In general, the following features of a well-developed survey should be clearly described: 1) the research question; 2) details of the target population and study sample; 3) methods used to (a) develop the survey and measure its validity and reliability, (b) calculate the sample size, (c) administer and follow up on the survey, and (d) analyze the data; 4) the results of the survey and interpretation; and 5) the conclusions that may be drawn directly from the results. (Bryson et al., 2012, p. 738)

Experimental studies in M. L. Smith and G. V. Glass (Link) (Return to Index)

Smith, M. L. & Glass, G. V. (1987). Experimental studies in M. L. Smith and G. V. Glass, Research and Evaluation in Education and the Social Sciences, pp. 124-157, Needham Heights, MA: Allyn and Bacon.

The dominant position among research methodologists is that to establish the claim that one variable was the cause of another, three conditions must be met. First, a statistical relationship between the two variables must be demonstrated. Second, the presumed cause must occur before the presumed effect. Third, all other possible causes of the presumed effect must be ruled out. These three conditions… represent the canons of evidence necessary for establishing cause-and-effect claims in science. (Smith & Glass, 1987, p. 125)

If the researcher hypothesized that gender is the cause of differences in math achievement, it is obvious that the condition of temporal sequence is met, since a person acquires gender before going to school. (Smith & Glass, 1987, p. 126)

The significance test rules out chance as an explanation of the difference in means and meets the condition of statistical relationship between independent and dependent variables. (Smith & Glass, 1987, p. 126)
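The first canon, a statistical relationship between the variables, is what a significance test establishes. A minimal sketch of that logic, using a permutation test in Python; the scores, group sizes, and seed below are invented for illustration and are not from Smith & Glass:

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical scores for a treatment and a control group (made-up data).
group_a = [72, 75, 78, 80, 83, 85]
group_b = [62, 65, 67, 68, 70, 71]

observed = mean(group_a) - mean(group_b)

# Permutation test: under the null hypothesis the group labels are arbitrary,
# so randomly reshuffled labels should rarely produce a mean difference as
# large as the one actually observed.
pooled = group_a + group_b
extreme = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    if abs(mean(pooled[:6]) - mean(pooled[6:])) >= abs(observed):
        extreme += 1

p_value = extreme / trials
print(round(observed, 2), p_value)
```

With groups this well separated, shuffled labels almost never reproduce the observed difference, so chance becomes an implausible explanation; the other two canons (temporal order, ruling out rival causes) still have to be argued separately.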

Methodologists Donald T. Campbell and Julian C. Stanley (1963) used the term internal validity to refer to the extent to which one could claim that the independent variable was responsible for or caused the dependent variable. The greater the internal validity of a particular study, the more credible is the causal claim. The term threats to internal validity stands for rival hypotheses or alternative explanations to the researcher's hypothesis that changes in the independent variable were the cause of changes in the dependent variable. Specific threats include history, maturation, testing, instrumentation, nonequivalence, regression, and mortality. (Smith & Glass, 1987, p. 127)

CONFOUNDED INTERNAL THREAT TO VALIDITY Just because a threat to internal validity exists in a study, we cannot assume that the research hypothesis is false. The correct interpretation is that either of these alternative hypotheses (and some other possible causes as well) must be considered plausible. We can say that the hypothesized cause (the program) and the alternative cause (the movie) are confounded or mixed up inextricably. (Smith & Glass, 1987, p. 128)

MATURATION THREAT The maturation threat to internal validity comes about when certain events internal to the research subjects may be responsible for the differences on the dependent variable. These internal events consist of physiological or psychological development that occurs naturally through the course of time, or as the subject grows older, more coordinated, fatigued, bored, and the like. (Smith & Glass, 1987, p. 128)

TESTING THREAT PRACTICE EFFECT Considerable research shows that subjects learn something merely from taking the pretest…. This practice effect results in higher posttest scores, even when the intervening treatment itself has no effect on the outcome variable. (Smith & Glass, 1987, p. 128)

INSTRUMENTATION THREAT The instrumentation threat occurs when the method of measuring the dependent variable changes from one group or time to the next. (Smith & Glass, 1987, p. 129)

NONEQUIVALENCE THREAT The nonequivalence threat is any subject characteristic that makes the groups compared unequal in any respect other than the treatment. …. This bias in the composition of the groups and the higher levels of verbal ability in one group could be the true cause of the observed posttest differences. … so it would be the nonequivalence of the subjects assigned to the groups, rather than the differences among treatments, that was responsible for the observed posttest differences in means. (Smith & Glass, 1987, p. 130)

REGRESSION THREAT The threat to internal validity known as regression occurs only when the subjects in the study are chosen because of their extreme position on some variable… a statistical artifact of studies using such extreme groups makes it inevitable that individuals will appear less extreme on a second measure (not perfectly correlated with the first variable). This movement in the direction of the mean of the group may be mistaken for a treatment effect. (Smith & Glass, 1987, pp. 130-131)
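This statistical artifact is easy to demonstrate by simulation. The sketch below assumes a simple true-score-plus-error model; the population parameters, cutoff, and seed are all invented for illustration:

```python
import random
from statistics import mean

random.seed(42)

# Illustrative model: each person has a stable true score, and each test
# administration adds independent measurement error.
true_scores = [random.gauss(100, 10) for _ in range(10_000)]
test1 = [t + random.gauss(0, 10) for t in true_scores]
test2 = [t + random.gauss(0, 10) for t in true_scores]

# Select an extreme group: everyone scoring below 85 on the first test.
extreme = [i for i, s in enumerate(test1) if s < 85]

m1 = mean(test1[i] for i in extreme)
m2 = mean(test2[i] for i in extreme)

# No treatment occurred between the two tests, yet the extreme group's mean
# moves back toward the population mean on the second measure.
print(round(m1, 1), round(m2, 1))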

MORTALITY THREAT … those subjects who complete the study may have different characteristics from those who drop out, which could threaten the internal validity of the study. This threat is known as mortality, or attrition. (Smith & Glass, 1987, p. 131)

Threats to internal validity, or alternative explanations, can be ruled out by several means. The first is by logical argument. Take the example of a study that includes a posttest given two years after the pretest. One can argue and accumulate evidence that the practice effect is substantially reduced if there is a lengthy interval between the administrations of the test. (Smith & Glass, 1987, p. 133)

TYPES OF RESEARCH When the research hypothesis suggests a cause-and-effect relationship, the researcher commonly chooses among three varieties of design to test the hypothesis. True experiments are those in which the independent variable is a treatment that the researcher deliberately introduces and manipulates, and the researcher's control extends to the ability to assign subjects at random to the levels of the independent variable. Quasi experiments… are studies in which the researcher has only partial control of the independent variables… in causal-comparative studies, also called ex post facto studies, the researcher does not have control over the independent variables. Either the independent variable has occurred in the past, prior to the study, or the subjects have assigned themselves to the various “treatment conditions”, or the independent variable is some fixed characteristic of the subjects. (Smith & Glass, 1987, p. 136)

The most familiar case of the true experiment is the randomized, pretest-posttest, control group design. Subjects are selected by either random or non-random methods. They are assigned at random to two or more treatment conditions and are pretested. (Smith & Glass, 1987, p. 142)
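The random-assignment step of this design can be sketched in a few lines; the participant IDs, group size, and seed below are hypothetical:

```python
import random

random.seed(7)

# Hypothetical pool of 20 participants for a two-condition experiment.
participants = [f"P{i:02d}" for i in range(1, 21)]
random.shuffle(participants)

treatment = participants[:10]  # receives the intervention, then posttest
control = participants[10:]    # pretest and posttest only

# Because every subject had an equal chance of landing in either condition,
# the groups are equivalent in expectation, addressing the nonequivalence threat.
print(len(treatment), len(control))
```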

The second variety of true experiment is the same as the first except that the pretest is eliminated. This is called the randomized, posttest-only, control group design. (Smith & Glass, 1987, p. 142)

The third variety, the Solomon four-group design, tests the effect of the pretest on the subjects' subsequent performance on the posttest. (Smith & Glass, 1987, p. 142)

The fourth variety of true experiment is the factorial experiment with more than one independent variable. (Smith & Glass, 1987, p. 142)

The fifth variety of true experiment involves one or more independent variables plus one or more moderator variables. A moderator variable is a subject, task, or method characteristic that can be used as a factor in an experiment along with the independent variable of interest. The moderator variable is systematically varied to determine whether it interacts with the independent variable. (Smith & Glass, 1987, p. 143)

EXTERNAL VALIDITY In an attempt to establish cause-and-effect relationships, the researcher is confronted with this task: to observe a statistical difference between two or more groups on the dependent variable and to infer that the difference was caused by the independent variable. This is the issue of internal validity. However, having established that the difference is due to the treatment, the researcher is confronted with another task: to infer that the effect observed in the experiment would also be observed in a broader context. This is external validity. (Smith & Glass, 1987, pp. 143-144)

EXTERNAL VALIDITY The method of random selection ensures population external validity from the sample to the population from which the sample was chosen. The population from which the sample is drawn has a special name, the accessible population, to distinguish it from the target population, which is the ideal group to which the researcher would like the findings to apply. (Smith & Glass, 1987, p. 144)

If the researcher has sampled at random from an accessible population, generalizing from the sample to the accessible population is based on probability and statistics. (Smith & Glass, 1987, p. 145)

Traditional and modern concepts of validity. (Link) (Return to Index)

Brualdi, A. (1999). Traditional and modern concepts of validity. ERIC Clearinghouse on Assessment and Evaluation, Washington, D.C.

Test validity refers to the degree to which the inferences based on test scores are meaningful, useful, and appropriate. Thus test validity is a characteristic of a test when it is administered to a particular population. Validating a test refers to accumulating empirical data and logical arguments to show that the inferences are indeed appropriate. (Brualdi, 1999, p. 1)

Traditionally, the various means of accumulating validity evidence have been grouped into three categories -- content-related, criterion-related, and construct-related evidence of validity. These broad categories are a convenient way to organize and discuss validity evidence. There are no rigorous distinctions between them; they are not distinct types of validity. Evidence normally identified with the criterion-related or content-related categories, for example, may also be relevant in the construct-related category. (Brualdi, 1999, p. 1)

Messick (1989, 1996a, 1996b) argues that the traditional conception of validity is fragmented and incomplete especially because it fails to take into account both evidence of the value implications of score meaning as a basis for action and the social consequences of score use. His modern approach views validity as a unified concept which places a heavier emphasis on how a test is used. Six distinguishable aspects of validity are highlighted as a means of addressing central issues implicit in the notion of validity as a unified concept. In effect, these six aspects conjointly function as general validity criteria or standards for all educational and psychological measurement. These six aspects must be viewed as interdependent and complementary forms of validity evidence and not viewed as separate and substitutable validity types. (Brualdi, 1999, p. 1)

Validity and reliability, in J. R. Fraenkel and N. E. Wallen, How to design and evaluate research in education with PowerWeb (Link) (Return to Index)

Fraenkel, J. R. & Wallen, N. E. (2005). Validity and reliability, in J. R. Fraenkel and N. E. Wallen, How to design and evaluate research in education with PowerWeb, pp. 152-171, Hightstown, NJ: McGraw Hill Publishing Co.

The quality of the instruments used in research is very important, for the conclusions researchers draw are based on the information they obtain using these instruments. (Fraenkel & Wallen, 2005, p. 152)

In recent years, validity has been defined as referring to the appropriateness, meaningfulness, and usefulness of the specific inferences researchers make based on the data they collect. Validation is the process of collecting evidence to support such inferences. (Fraenkel & Wallen, 2005, p. 153)

Validity is the most important idea to consider when preparing or selecting an instrument for use. (Fraenkel & Wallen, 2005, p. 153)

A meaningful inference is one that says something about the meaning of the information (such as test scores) obtained through the use of an instrument. (Fraenkel & Wallen, 2005, p. 154)

VALIDITY Content-related evidence of validity: Refers to the content and format of the instrument. How appropriate is the content? How comprehensive? Does it logically get at the intended variable? How adequately does the sample of items or questions represent the content to be assessed? Is the format appropriate? The content and format must be consistent with the definition of the variable and the sample of subjects to be measured. Criterion-related evidence of validity: Refers to the relationship between scores obtained using the instrument and scores obtained using one or more other instruments or measures (often called a criterion). How strong is this relationship? How well do such scores estimate present or predict future performance of a certain type? Construct-related evidence of validity: Refers to the nature of the psychological construct or characteristic being measured by the instrument. How well does this construct explain differences in the behavior of individuals or their performance on certain tasks? We provide further explanation of this rather complex concept later in the chapter. (Fraenkel & Wallen, 2005, p. 154)

VALIDITY A criterion is a second test or other assessment procedure presumed to measure the same variable. (Fraenkel & Wallen, 2005, p. 157)

VALIDITY There are two forms of criterion-related validity: predictive and concurrent. To obtain evidence of predictive validity, researchers allow a time interval to elapse between administration of the instrument and obtaining the criterion scores. (Fraenkel & Wallen, 2005, p. 157)

VALIDITY On the other hand, when instrument data and criterion data are gathered at nearly the same time, and the results compared, this is an attempt by researchers to obtain evidence of concurrent validity. (Fraenkel & Wallen, 2005, p. 157)

RELIABILITY Reliability refers to the consistency of the scores obtained: how consistent they are for each individual from one administration of an instrument to another and from one set of items to another. (Fraenkel & Wallen, 2005, p. 160)

The scores obtained from an instrument can be quite reliable, but not valid. (Fraenkel & Wallen, 2005, p. 160)

Since errors of measurement are always present to some degree, researchers expect some variation in test scores (in answers or ratings, for example) when an instrument is administered to the same group more than once, when two different forms of an instrument are used, or even from one part of an instrument to another. (Fraenkel & Wallen, 2005, p. 160)

VALIDITY a validity coefficient expresses the relationship that exists between scores of the same individuals on two different instruments. A reliability coefficient also expresses a relationship, but this time it is between scores of the same individuals on the same instrument at two different times, or between two parts of the same instrument. (Fraenkel & Wallen, 2005, p. 160)

INTERNAL CONSISTENCY Perhaps the most frequently employed method for determining internal consistency is the Kuder-Richardson approach, particularly formulas KR20 and KR21. The latter formula requires only three pieces of information: the number of items in the test, the mean, and the standard deviation. (Fraenkel & Wallen, 2005, p. 163)
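Because KR-21 needs only those three pieces of information, it can be computed directly. The function below implements the standard KR-21 formula; the 50-item test with mean 40 and standard deviation 6 is a hypothetical example, not taken from the chapter:

```python
# Standard KR-21 formula: r = (k / (k - 1)) * (1 - M * (k - M) / (k * s**2)),
# where k = number of items, M = test mean, s = standard deviation.
def kr21(n_items, mean_score, sd):
    variance = sd ** 2
    return (n_items / (n_items - 1)) * (
        1 - (mean_score * (n_items - mean_score)) / (n_items * variance)
    )

# Hypothetical 50-item test with mean 40 and standard deviation 6.
r21 = kr21(50, 40, 6)
print(round(r21, 3))
```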

RELIABILITY VALIDITY What this means is that there is no substitute for checking reliability and validity as a part of the research procedure. There is seldom any excuse for failing to check internal consistency, since the necessary information is at hand and no additional data collection is required. Reliability over time does, in most cases, require an additional administration of an instrument, but this can often be done. (Fraenkel & Wallen, 2005, p. 167)

Questionnaire survey research: What works (Link) (Return to Index)

Suskie, L. A. (1996). Questionnaire survey research: What works (2nd ed.). Tallahassee, FL: The Association for Institutional Research.

SURVEYS Before designing your survey it is essential that you first identify your objectives, or the reason why you are conducting the survey. The following questions will help you to clarify the objectives of your survey. Consider and make note of the answers to these questions as you begin to develop your survey: (Suskie, 1996, p. 1)

SURVEYS The questions that you include on your survey should always be guided by your objectives. This will help to ensure that you gather quality data and are able to address both your needs and the needs of your audience. Gathering quality data is also dependent upon the quality of the questions that you have constructed. Good questions have the following characteristics: (Suskie, 1996, p. 1)

SURVEYS A respondent’s time is precious. Long questions with unnecessary details take up too much time and can cause respondents to lose interest. On the other hand, your questions should also be specific enough to be clear and unambiguous. To achieve this, write questions that get to the point as quickly as possible in as few words as possible. (Suskie, 1996, p. 1)

SURVEYS If questions are biased or leading in any way, they will steer a respondent toward the response that is considered socially desirable. This can occur either in the question itself or by limiting the types of response options that you provide for the respondent. (Suskie, 1996, p. 1)

SURVEYS Asking respondents about sensitive topics can make them uncomfortable or embarrassed and should be avoided. Certain types of questions, however, are sensitive to some respondents but are necessary for the objectives of your survey. For instance, many demographic questions, such as race, income, education level, are of a private nature and not all respondents will wish to divulge this information. In these cases, place the questions at the end of the survey and provide a response option that allows refusal, such as “Decline to state.” There may also be times when the very nature of your research question may be a sensitive topic, for example, substance abuse. In these cases, to minimize the impact your questions may have on a respondent provide information on resources and seek the approval of the IRB (Suskie, 1996, p. 2)

SURVEYS There are a wide variety of response options available to you. The response option you choose, however, should always be based on the objectives of the question and survey. Below is a list of common response options and things to consider when using them in your survey. (Suskie, 1996, p. 2)

SURVEYS The format of your survey can have a great impact on your response rate. Poorly organized surveys run the risk of respondents losing interest, becoming confused, or refusing to participate. (Suskie, 1996, p. 4)

SURVEYS Even the best survey designer is bound to write a question that seems clear to them, but may be confusing to others. Testing your survey before implementing it on a large scale will help you to improve your survey so that you are better able to obtain informative and useful information. Here are a few things to keep in mind when conducting a pilot test: (Suskie, 1996, p. 4)

Survey fundamentals: A guide to designing and implementing surveys (Link) (Return to Index)

Thayer-Hart, N., Dykema, J., Elver, K., Schaeffer, N. C., Stevenson, J. (2010). Survey fundamentals: A guide to designing and implementing surveys. Madison, Wisconsin: University of Wisconsin Survey Center.

SURVEYS The first step in planning for your survey is to determine the goal and how you will use the information. The usual goal of a survey is to describe a population, for example, the group of people using your services. Another common goal that may require a very different study design is to make comparisons between groups, such as comparisons between those using your services and those who do not. Often surveys are used simply to get feedback from a specific group of people and/or learn about their wants and needs. (Thayer-Hart et al., 2010, p. 4)

SURVEYS Writing good survey questions requires keeping the goal of the survey firmly in mind and then formulating each question from the perspective of the respondent. It may be tempting to ask questions simply because it would be interesting to know the answers, but if they are not essential to the goal of the survey, such questions can actually detract from your survey results. Unnecessary questions distract respondents or cause confusion. (Thayer-Hart et al., 2010, p. 6)

SURVEYS Avoid using complex words, technical terms, jargon, and phrases that are difficult to understand. Instead, use language that is commonly used by the respondents. (Thayer-Hart et al., 2010, p. 8)

SURVEYS Maintaining a parallel structure for all questions immensely improves respondents’ ability to comprehend and effectively respond to the survey. Use the same words or phrases to refer to the same concepts. (Thayer-Hart et al., 2010, p. 9)

SURVEYS With self-administered questionnaires, it is especially valuable to provide a context for the survey as a whole – how the data will be used, whether responses are anonymous, etc. The earlier and more effectively this is done, the less likely people will be to dismiss the survey before they even start responding. (Thayer-Hart et al., 2010, p. 11)

SURVEYS Incentives are often used to maximize the response rate. Mailed surveys or advance letters are often accompanied by a crisp bill. When it is provided before the survey is taken, even a small denomination encourages the respondent to complete and return the survey. Prize drawings for a gift certificate are popular incentives for completing a web survey; however, the evidence about their potential effectiveness in gaining participation is mixed at best. (Thayer-Hart et al., 2010, p. 14)

SURVEYS A consistent process for organizing and analyzing survey data should be established and clearly documented well ahead of receiving the first responses, with everyone involved receiving ample training. (Thayer-Hart et al., 2010, p. 15)

A step-by-step guide to developing effective questionnaires and survey procedures for program evaluation & research. (Link) (Return to Index)

Diem, K. G. (2004). A step-by-step guide to developing effective questionnaires and survey procedures for program evaluation & research. Rutgers Cooperative Research and Extension Fact Sheet, New Brunswick, New Jersey: Rutgers University.

RELIABILITY Reliability is a measure of how consistent the results of using a measurement instrument (e.g., a test, questionnaire) will be. Reducing “random” error in questionnaires by removing “quirky” questions or changing their arrangement improves reliability. (Diem, 2004, p. 5)

Tipsheet: Question wording. (Link) (Return to Index)

Miller, P. R. (n.d.). Tipsheet: Question wording. Duke Initiative on Survey Methodology, Durham, North Carolina: Duke University, Retrieved August 2016 from http://www.dism.ssri.duke.edu/pdfs/Tipsheet%20-%20Question%20Wording.pdf

SURVEYS Technical terms may be appropriate when surveying experts, but they are often inappropriate for general population surveys. (Miller, Question Wording, n.d., p. 4)

Tipsheet: Improving response scales (Link) (Return to Index)

Miller, P. R. (n.d.). Tipsheet: Improving response scales. Duke Initiative on Survey Methodology, Durham, North Carolina: Duke University, Retrieved August 2016 from http://www.dism.ssri.duke.edu/pdfs/Tipsheet%20-%20Question%20Wording.pdf

SURVEYS As important as it is to have good question wording in a survey, it is equally important to have well constructed and reliable question response options. Researchers should first consider whether open- or closed-ended questions are more appropriate to gather the desired data. Surveys typically rely more on closed-ended questions, so researchers need to be thoughtful in how they construct closed response options. (Miller, Improve Response Scales, n.d., p. 1)

SURVEYS Researchers should think carefully about giving respondents the option to answer “don’t know” (DK) or “unsure.” Survey analysts typically treat these responses as missing data, resulting in fewer cases for analyses. As respondents tend to choose DK/unsure for systematic reasons, the exclusion of these cases tends to affect analyses significantly. (Miller, Improve Response Scales, n.d., p. 2)

SURVEYS DK selection rates are greater in certain survey modes. Rates tend to be highest in self-administered formats such as web and mail surveys that lack a human interviewer. DK rates tend to be lower in formats like face-to-face and phone where a live interviewer can probe respondents for an answer if they initially choose DK. (Miller, Improve Response Scales, n.d., p. 4)

SURVEYS The pros and cons of midpoints are similar to those of DK/Unsure. It may be good for data quality in some instances because it best reflects the attitudes of some respondents. On the other hand, offering a midpoint invites respondents to choose it, and they usually will in significant numbers. Respondents who are ambivalent or who have weak opinions may choose the midpoint, but might easily select a more substantive category were they forced to do so. (Miller, Improve Response Scales, n.d., p. 5)

SURVEYS A good rule of thumb is that unless a ranking question is very simple or unless respondents can be forced to answer correctly (e.g. in a web survey the respondent cannot proceed with the survey unless options are ranked in a certain way), then consider alternative question formats. (Miller, Improve Response Scales, n.d., p. 6)

SURVEYS Researchers should avoid asking respondents to use negative numbers in their rankings as the negative sign exaggerates positive reports. The negative sign is a signal of negativity that turns off many respondents. For example, respondents will rank their health much more positively on a scale from -5 to 5 than on a scale from 0 to 10. (Miller, Improve Response Scales, n.d., p. 7)

SURVEYS Always try to anticipate the answers that significant percentages of respondents would choose, and include those as response options. It may also be worth offering an “other” response that lets respondents fill in the blank next to it with their preferred answer. Researchers should always pretest questions to ensure that they are not missing a major or obvious response option. (Miller, Improve Response Scales, n.d., p. 8)

SURVEYS Respondents tend to choose either the first option on a list or the last. These biases are respectively called primacy and recency effects. They are especially problematic with nominal scales in which response options that do not fit in any particular order are listed (e.g. What is your favorite soda? Coke, Pepsi, Sprite, Mountain Dew, Sunkist; these sodas could be listed in any order unlike the points on a Likert scale). (Miller, Improve Response Scales, n.d., p. 9)

SURVEYS The labels of scale endpoints tend to affect how respondents perceive the rest of the scale, something called an “anchoring effect.” Endpoints labeled with more extreme language tend to turn off respondents, causing them to choose responses closer to the middle of the scale. (Miller, Improve Response Scales, n.d., p. 9)

Mixed methods sampling: A typology with examples. (Link) (Return to Index)

Teddlie, C., & Yu, F. (2007). Mixed methods sampling: A typology with examples. Journal of Mixed Methods Research, 1(77), 77-100. doi:10.1177/2345678906292430

MIXED METHODS SAMPLING MM sampling strategies involve the selection of units or cases for a research study using both probability sampling (to increase external validity) and purposive sampling strategies (to increase transferability). This fourth general sampling category has been discussed infrequently in the research literature (e.g., Collins, Onwuegbuzie, & Jiao, 2006; Kemper, Stringfield, & Teddlie, 2003), although numerous examples of it exist throughout the behavioral and social sciences. (Teddlie & Yu, 2007, p. 78)

MIXED METHODS SAMPLING Purposive sampling techniques have also been referred to as nonprobability sampling or purposeful sampling or “qualitative sampling.” As noted above, purposive sampling techniques involve selecting certain units or cases “based on a specific purpose rather than randomly” (Tashakkori & Teddlie, 2003a, p. 713). (Teddlie & Yu, 2007, p. 80)

MIXED METHODS SAMPLING A purposive sample is typically designed to pick a small number of cases that will yield the most information about a particular phenomenon, whereas a probability sample is planned to select a large number of cases that are collectively representative of the population of interest. There is a classic methodological trade-off involved in the sample size difference between the two techniques: Purposive sampling leads to greater depth of information from a smaller number of carefully selected cases, whereas probability sampling leads to greater breadth of information from a larger number of units selected to be representative of the population (e.g., Patton, 2002). (Teddlie & Yu, 2007, p. 83)
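The breadth-versus-depth trade-off can be made concrete with a toy example. The sampling frame, the scores, and both selection rules below are invented assumptions, not procedures from Teddlie & Yu:

```python
import random

random.seed(1)

# Hypothetical sampling frame: 500 schools, each with a known achievement score.
frame = [{"id": i, "score": random.gauss(70, 10)} for i in range(500)]

# Probability sampling: a simple random sample of 50 schools, chosen for
# breadth, i.e. to be collectively representative of the frame.
probability_sample = random.sample(frame, 50)

# Purposive sampling: a handful of information-rich cases chosen for depth,
# here (as one possible purposive rule) the 5 lowest- and 5 highest-scoring.
ranked = sorted(frame, key=lambda s: s["score"])
purposive_sample = ranked[:5] + ranked[-5:]

print(len(probability_sample), len(purposive_sample))
```

An MM design in the sense described here would use both: the large random sample for the QUAN strand, and the small deliberately chosen subset for the QUAL strand.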

MIXED METHODS SAMPLING MM sampling strategies may employ all the probability and purposive techniques discussed earlier in this article. Indeed, the researcher’s ability to creatively combine these techniques in answering a study’s questions is one of the defining characteristics of MM research. (Teddlie & Yu, 2007, p. 85)

MIXED METHODS SAMPLING An important sample size issue in QUAL research involves saturation of information (e.g., Glaser & Strauss, 1967; Strauss & Corbin, 1998). For example, in focus group studies the new information gained from conducting another session typically decreases as more sessions are held. Krueger and Casey (2000) expressed this guideline as follows: The rule of thumb is, plan three or four focus groups with any one type of participant. Once you have conducted these, determine if you have reached saturation. Saturation is a term used to describe the point when you have heard the range of ideas and aren’t getting new information. If you were still getting new information after three or four groups, you would conduct more groups. (p. 26) (Teddlie & Yu, 2007, p. 87)

MIXED METHODS SAMPLING Sequential QUAN-QUAL sampling is the most common technique that we have encountered in our exploration of the MM literature, as described by Kemper et al. (2003): In sequential mixed models studies, information from the first sample (typically derived from a probability sampling procedure) is often required to draw the second sample (typically derived from a purposive sampling procedure). (p. 284) (Teddlie & Yu, 2007, p.89)

MIXED METHODS SAMPLING There are examples of QUAN-QUAL and QUAL-QUAN MM sampling procedures throughout the social and behavioral sciences. Typically, the methodology and results from the first strand inform the methodology employed in the second strand.9 In our examination of the literature, we found more examples of QUAN-QUAL studies in which the methodology and/or results from the QUAN strand influenced the methodology subsequently employed in the QUAL strand. In many of these cases, the final sample used in the QUAN strand was then used as the sampling frame for the subsequent QUAL strand. In these studies, the QUAL strand used a subsample of the QUAN sample. (Teddlie & Yu, 2007, p.89)

Likert scales, levels of measurement and the ‘‘laws’’ of statistics. (Link) (Return to Index)

Norman, G. (2010). Likert scales, levels of measurement and the ‘‘laws’’ of statistics. Advances in Health Science Education, 15(5), 625–632. doi:10.1007/s10459-010-9222-y

One recurrent frustration in conducting research in health sciences is dealing with the reviewer who decides to take issue with the statistical methods employed. Researchers do occasionally commit egregious errors, usually the multiple test phenomenon associated with data-dredging. But this is rarely the basis of reviewers’ challenges. As Bacchetti (2002) has pointed out, many of these comments are unfounded or wrong, and appear to result from a review culture that encourages ‘‘overvaluation of criticism for its own sake, inappropriate statistical dogmatism’’, and is subject to ‘‘time pressure, and lack of rewards for good peer reviewing’’. (Norman, 2010, p. 625)

These issues are particularly germane to educational research because so many of our studies involve rating scales of one kind or another and virtually all rating scales involve variants on the 7-point Likert scale. It does not take a lot of thought to recognize that Likert scales are ordinal. (Norman, 2010, p. 625)

ROBUSTNESS But what is left unsaid is how much it increases the chance of an erroneous conclusion. This is what statisticians call ‘‘robustness’’, the extent to which the test will give the right answer even when assumptions are violated. And if it doesn’t increase the chance very much (or not at all), then we can press on. (Norman, 2010, p. 625)

ROBUSTNESS If Jamieson and others are right and we cannot use parametric methods on Likert scale data, and we have to prove that our data are exactly normally distributed, then we can effectively trash about 75% of our research on educational, health status and quality of life assessment (as pointed out by one editor in dismissing one of the reviewer comments above). (Norman, 2010, p. 627)

ROBUSTNESS ANOVA Nowhere in the assumptions of parametric statistics is there any restriction on sample size. It is simply not true, for example, that ANOVA can only be used for large samples, and one should use a t test for smaller samples. ANOVA and t tests are based on the same assumptions; for two groups the F test from the ANOVA is the square of the t test. Nor is it the case that below some magical sample size, one should use non-parametric statistics. Nowhere is there any evidence that non-parametric tests are more appropriate than parametric tests when sample sizes get smaller. (Norman, 2010, p. 627)
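Norman's two-group claim (for two groups, the ANOVA F statistic is the square of the t statistic) can be checked directly. A minimal standard-library sketch, using invented illustration data:

```python
# Demonstrates that for exactly two groups, one-way ANOVA F equals t squared.
# The two small samples are made-up illustration data, not from the article.
from statistics import mean

def t_statistic(a, b):
    """Independent-samples t with pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ssa = sum((x - ma) ** 2 for x in a)
    ssb = sum((x - mb) ** 2 for x in b)
    pooled_var = (ssa + ssb) / (na + nb - 2)
    return (ma - mb) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

def f_statistic(a, b):
    """One-way ANOVA F for exactly two groups."""
    na, nb = len(a), len(b)
    grand = mean(a + b)
    ma, mb = mean(a), mean(b)
    ss_between = na * (ma - grand) ** 2 + nb * (mb - grand) ** 2
    ss_within = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    ms_between = ss_between / 1            # df_between = k - 1 = 1
    ms_within = ss_within / (na + nb - 2)  # df_within = N - k
    return ms_between / ms_within

group1 = [3, 4, 5, 4, 6]
group2 = [5, 6, 7, 6, 8]
t = t_statistic(group1, group2)
F = f_statistic(group1, group2)
print(round(F, 6), round(t ** 2, 6))  # the two values coincide
```

The identity holds for any two-group data, which is why choosing between a t test and a two-group ANOVA cannot change the conclusion.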

SAMPLE SIZE sample size is not unimportant. It may be an issue in the use of statistics for a number of reasons unrelated to the choice of test: (a) With too small a sample, external validity is a concern. It is difficult to argue that 2 physicians or 3 nursing students are representative of anything (qualitative research notwithstanding). But this is an issue of judgment, not statistics. (b) As we will see in the next section, when the sample size is small, there may be concern about the distributions. However, it turns out that the demarcation is about 5 per group. And the issue is not that one cannot do the test, but rather that one might begin to worry about the robustness of the test. (c) Of course, small samples require larger effects to achieve statistical significance. But to say, as one reviewer said above, ‘‘Given the small number of participants in each group, can the authors claim statistical significance?’’, simply reveals a lack of understanding. If it’s significant, it’s significant. A small sample size makes the hurdle higher, but if you’ve cleared it, you’re there. (Norman, 2010, p. 628)

SAMPLE SIZE [Refuting the argument:] You can’t use t tests and ANOVA because the data are not normally distributed. This is likely one of the most prevalent myths. We all see the pretty bell curves used to illustrate z tests, t tests and the like in statistics books, and we learn that ‘‘parametric tests are based on the assumption of normality’’. Regrettably, we forget the last part of the sentence. For the standard t tests, ANOVAs, and so on, it is the assumption of normality of the distribution of means, not of the data. The Central Limit Theorem shows that, for sample sizes greater than 5 or 10 per group, the means are approximately normally distributed regardless of the original distribution. (Norman, 2010, p. 629)
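Norman's Central Limit Theorem point can be seen in a small standard-library simulation: even for a decidedly non-normal parent distribution (exponential, which is strongly skewed), the means of samples of size 10 cluster symmetrically around the population mean.

```python
# CLT sketch: sample means from a skewed Exponential(1) population.
# All numbers here are simulated, not from the article.
import random
from statistics import mean, stdev

random.seed(42)
pop_mean = 1.0  # an Exponential(1) population has mean 1 and SD 1
sample_means = [mean(random.expovariate(1.0) for _ in range(10))
                for _ in range(5000)]

# The sampling distribution of the mean centres on the population mean,
# with spread close to sigma / sqrt(n) = 1 / sqrt(10), about 0.316.
print(round(mean(sample_means), 3), round(stdev(sample_means), 3))
```

A histogram of `sample_means` would look roughly bell-shaped despite the skewed parent population, which is what licenses t tests on the means.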

Likert scales: A marketing perspective. (Link) (Return to Index)

Edmondson, D. R., Edwards, Y. D., & Boyer, S. L. (2012). Likert scales: A marketing perspective. International Journal of Business, Marketing, and Decision Sciences, 5(2), 73-85.

LIKERT SCALES However, operationalization of the Likert scale has not been codified for researchers and practitioners. Lack of a standard procedure stems from confusion over whether the Likert scale is ordinal (non metric) or interval (metric) in nature. (Edmondson, Edwards & Boyer, 2012, p. 73)

LIKERT SCALES While the Likert scale is truly ordinal in nature, it is assumed to be on an interval scale with which statistical properties such as the mean can be justifiably used. (Edmondson, Edwards & Boyer, 2012, p. 76)

LIKERT SCALES The original scale is ordinal in nature rather than a true interval scale; therefore, using means and standard deviations is incorrect (Martilla & Carvey, 1975) and instead only the median or mode are appropriate measures of the central tendency of the data. Range, percentiles or interquartile range should be used to describe variability in the data. (Edmondson, Edwards & Boyer, 2012, p. 76)
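The "ordinal-only" descriptives Edmondson et al. recommend for a single Likert item can all be computed with the Python standard library. The responses below are invented illustration data on a 5-point agreement scale:

```python
# Ordinal descriptives for one Likert item: median, mode, range, IQR.
from statistics import median, mode, quantiles

responses = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5]  # 1 = strongly disagree ... 5 = strongly agree

q1, q2, q3 = quantiles(responses, n=4)  # quartiles; q2 is the median
print("median:", median(responses))
print("mode:", mode(responses))
print("range:", max(responses) - min(responses))
print("interquartile range:", q3 - q1)
```

Nothing here requires treating the category codes as equally spaced, which is the point of restricting to these measures for ordinal data.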

LIKERT SCALES Constructing a valid and reliable Likert scale is not so easy. In order to create a psychometrically defensible Likert scale, Likert (1932) provided the necessary steps in an appendix. These steps include: making all statements expressions of desired behavior; making statements clear, concise, and straightforward; having each statement worded in such a manner that a modal reaction occurs; and having approximately half of the statements on the upper end and half on the lower end of the scale. Failure to complete these steps can lead to severe issues with the created scale. However, since these steps are barely mentioned in most marketing research textbooks, it is easy to understand why practitioners do not perform those additional important scale validity and reliability steps. (Edmondson, Edwards & Boyer, 2012, p. 83)

Likert scales: How to (ab)use them (Link) (Return to Index)

Jamieson, S. (2004). Likert scales: How to (ab)use them. Medical Education, 38(12), 1212–1218. doi:10.1111/j.1365-2929.2004.02012.x

LIKERT SCALES Likert scales fall within the ordinal level of measurement.2–4 That is, the response categories have a rank order, but the intervals between values cannot be presumed equal, although, as Blaikie 3 points out, ...researchers frequently assume that they are. (Jamieson, 2004, p. 1217)

LIKERT SCALES Methodological and statistical texts are clear that for ordinal data one should employ the median or mode as the measure of central tendency 5 because the arithmetical manipulations required to calculate the mean (and standard deviation) are inappropriate for ordinal data,3,5 where the numbers generally represent verbal statements (Jamieson, 2004, p. 1217)

LIKERT SCALES Standard texts also advise that the appropriate inferential statistics for ordinal data are those employing non-parametric tests, such as chi squared, Spearman’s Rho, or the Mann–Whitney U-test 1 because parametric tests require data of interval or ratio level.2,5 (Jamieson, 2004, p. 1217)

LIKERT SCALES issues such as levels of measurement and appropriateness of mean, standard deviation and parametric statistics should be considered at the design stage and must be addressed by authors when they discuss their chosen methodology. (Jamieson, 2004, p. 1218)

An introduction to inferential statistics. A review and practical guide. (Link) (Return to Index)

Marshall, G., & Jonker, L. (2010). An introduction to inferential statistics. A review and practical guide. Radiography, 17(1), e1-e6. doi:10.1016/j.radi.2009.12.006

INFERENTIAL STATISTICS They can be applied to compare two or more samples with each other to investigate potential differences and they can also be used for studying the relationship between two or more variables. Such statistics are inferential statistics, which are used to infer from the sample group generalisations that can be applied to a wider population. This allows the detection of large or even small, but important differences, in variables or correlations between variables that are relevant to a particular research question.1–3 (Marshall & Jonker, 2010, p. e2)

INFERENTIAL STATISTICS It is of paramount importance to ensure that the sample that has been selected is representative, and this is determined predominantly by an appropriate sampling method. (Marshall & Jonker, 2010, p. e2)

INFERENTIAL STATISTICS Inferential statistics measure the significance of a test result, i.e. whether any difference, e.g. between two samples, is due to chance or a real effect. This is represented using p values. The type of test applied to a data set relies on the sort of data analysed, i.e. binary, nominal, ordinal, interval or ratio data; the distribution of the data set (normal or not); and whether a potential difference between samples or a link between variables is studied. Readers can refresh themselves regarding the terms nominal, ordinal and interval/ratio data and discrete and continuous variables at this point, by consulting the earlier article in this series. (Marshall & Jonker, 2010, p. e2)

INFERENTIAL STATISTICS A p value is the product of hypothesis testing via various statistical tests and is claimed to be significant (i.e. not due to chance) most commonly when the value is 0.05 or less. The value 0.05 is arbitrary; it is simply a convention amongst statisticians that this value is deemed the cut off level for significance.12 (Marshall & Jonker, 2010, p. e3)

INFERENTIAL STATISTICS The p value says nothing about what the size of an effect is or what that effect size is likely to be in the total population. (Marshall & Jonker, 2010, p. e3)

INFERENTIAL STATISTICS 1) The standard error: whilst standard errors shrink with increasing sample size, the researcher should be seeking to reach an optimal sample size, rather than the maximal sample size. Testing more subjects than required in a clinical trial may not be ethical and in addition it would be a waste of money and resources. 2) The mean and the variability, i.e. standard deviation, of the effect being studied: the less variability in the sample, the more precise the estimate in the population and therefore a narrower range. 3) The degree of confidence required: the more confident someone wants to be in the obtained results, the higher the confidence interval needs to be. In other words, if a 99% confidence interval is desired then the range will have to be wider, to cover the extra data that needs to be covered over and above the arbitrary 95%, to ensure that it is possible to be more confident that the average for the population (the population mean) lies within it.7,15 (Marshall & Jonker, 2010, p. e3)

INFERENTIAL STATISTICS If data shows a normal distribution, then tests such as one way analysis of variance (ANOVA) and the paired and unpaired t-tests can be applied for ratio and interval data. (Marshall & Jonker, 2010, p. e3)

INFERENTIAL STATISTICS Non-parametric statistical tests are used for binary, ordinal or nominal data, and also for interval and ratio data that is not normally distributed or, in specific instances, does not exhibit equality of variance. (Marshall & Jonker, 2010, p. e3)


INFERENTIAL STATISTICS When selecting a statistical test, the appropriate test to determine the p value is based on: a) the level of the data: binary, nominal, ordinal, interval/ratio; b) the number of different groups in the investigation; c) whether the data was collected from independent groups, i.e. fifty patients who were hydrated periprocedurally for MRI contrast agent-enhanced examinations, and fifty other patients who did not have the hydration regime prior to their contrast agent-enhanced examinations; d) the distribution of the data, i.e. are parametric assumptions justified; e) whether the test is designed to investigate a correlation or difference. (Marshall & Jonker, 2010, p. e3-4)
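Criteria (a) and (d) of that checklist can be sketched as a rough decision helper. This is a simplification for illustration, not Marshall & Jonker's procedure; the function name and category strings are mine, and real test selection also weighs pairing, group counts, and design:

```python
# Hypothetical helper mapping data level + distribution to a plausible
# difference test, following the common textbook decision rules.
def suggest_test(level, normal=False, groups=2):
    """Return a plausible difference test for the given level of measurement."""
    if level in ("interval", "ratio") and normal:
        return "t test" if groups == 2 else "one-way ANOVA"
    if level == "ordinal" or (level in ("interval", "ratio") and not normal):
        return "Mann-Whitney U" if groups == 2 else "Kruskal-Wallis"
    if level in ("nominal", "binary"):
        return "chi-squared"
    raise ValueError("unknown level of measurement")

print(suggest_test("ratio", normal=True, groups=3))  # one-way ANOVA
print(suggest_test("ordinal", groups=2))             # Mann-Whitney U
```

The point of encoding the rules this way is only to show that the choice is mechanical once the level of measurement, distribution, and number of groups are known.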

INFERENTIAL STATISTICS The t-test requires the input of interval/ratio data, and the assumption that the data is suitable to having parametric statistics applied to it (see above) must be satisfied. (Marshall & Jonker, 2010, p. e4)

INFERENTIAL STATISTICS The chi-squared test, known as Pearson’s Chi square test, shows whether there is a relationship between two variables and is normally used for data suitable for nonparametric statistical testing. A few important assumptions for this test to be suitable are that the sampling is random, the sample size is large enough and the observations are independent. (Marshall & Jonker, 2010, p. e4)

Inferential statistics. (Link) (Return to Index)

Allua, S., & Thompson, C. B. (2009). Inferential statistics. Air Medical Journal, 28(4), 168-171. doi:10.1016/j.amj.2009.04.013

Hypothesis testing is the method for determining the probability of an observed event that occurs only by chance. (Allua & Thompson, 2009, p. 108)

Significant difference tests can be used to evaluate differences on one interval or ratio level dependent variable of interest between two groups and three or more groups. (Allua & Thompson, 2009, p. 108)

The independent samples t-test is used to test the statistical significance of the differences in means between two groups (a dichotomous independent variable) on some dependent variable measured at the interval or ratio level. (Allua & Thompson, 2009, p. 108)

The critical t value is determined by the researcher-selected significance level or alpha level (e.g., α = .05) and the degrees of freedom that represent the conditions under which the t is calculated (related to numbers of subjects, number of groups, and the statistic). (Allua & Thompson, 2009, p. 108)


Although ANOVA can be used with two groups, it is most commonly used for independent variables that have three or more groups (possible values for the independent variable). Again, the dependent variable is assumed to be measured at the interval or ratio level. (Allua & Thompson, 2009, p. 108)

The most common statistic used to describe the relationship (the correlation) between two variables is the Pearson product-moment correlation or Pearson’s r (rp). Pearson’s r is a descriptive statistic when used only to describe a relationship; it is an inferential statistic when used to infer a relationship in the population. (Allua & Thompson, 2009, p. 109)

When data are not measured at the interval or ratio level, other variations of correlations are more appropriate. For example, the correlation between two ordinal level variables should be analyzed using the Spearman rho correlation coefficient, the nonparametric equivalent to the Pearson r. (Allua & Thompson, 2009, p. 109)
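The relationship between Spearman rho and Pearson r is direct: rank each variable (ties get averaged ranks), then apply the Pearson formula to the ranks. A standard-library sketch with invented ordinal scores:

```python
# Spearman rho computed as Pearson r on average ranks.
# The two ordinal variables below are made-up illustration data.
from statistics import mean

def ranks(xs):
    """Average ranks, 1-based; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of tied positions i..j, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

satisfaction = [1, 2, 2, 3, 4, 5]
loyalty      = [2, 1, 3, 3, 5, 4]
rho = pearson(ranks(satisfaction), ranks(loyalty))
print(round(rho, 3))
```

Because only ranks enter the computation, the result is unchanged by any monotone relabelling of the ordinal categories, which is why Spearman rho is safe for ordinal data.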

Data analyzed using the chi-squared test statistic are typically organized into a contingency table. The chi-squared procedure calculates the expected number of observations in each cell of the contingency table and compares them with the number of observations actually occurring in each cell (observed frequencies). The greater the deviation of the observed frequencies from the expected frequencies, the greater the chance for statistical significance, providing evidence that the variables are related. (Allua & Thompson, 2009, p. 109)
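The contingency-table logic above can be worked through by hand: expected counts come from row and column totals, and chi-squared sums the scaled squared deviations. The 2x2 table is invented illustration data:

```python
# Chi-squared statistic from observed vs. expected frequencies.
observed = [[30, 10],   # e.g., group A: improved / not improved (invented)
            [20, 20]]   # e.g., group B: improved / not improved (invented)

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected count in each cell = (row total * column total) / grand total.
expected = [[r * c / grand for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for orow, erow in zip(observed, expected)
           for o, e in zip(orow, erow))
print(round(chi2, 3))
```

The larger the observed counts drift from the expected ones, the larger `chi2` grows, which is exactly the "greater deviation, greater chance for significance" idea in the quote.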

Simple regression analysis is used for a single independent and dependent variable, whereas multiple regression analysis is used for multiple independent variables. Multiple regression assumes that the dependent variable is measured at the interval or ratio level. Regression analysis is computationally related to ANOVA but is used for independent variables that are measured at the interval or ratio level, rather than nominal or ordinal level. (Allua & Thompson, 2009, p. 109)

When reporting findings in text, the author must report the names of the independent and dependent variables, the statistical procedure used, the statistic calculated (e.g., t or F), the value of the statistic, appropriate degrees of freedom (df, from the computer printout), and the P value (in relation to alpha or actual value). (Allua & Thompson, 2009, p. 110)

Hypothesis testing (Link) (Return to Index)

Allua, S., & Thompson, C. B. (2009). Hypothesis testing. Air Medical Journal, 28(3), 108-153. doi:10.1016/j.amj.2009.03.002

INFERENTIAL STATISTICS Hypothesis testing is the method for determining the probability of an observed event that occurs only by chance. If chance were not the cause of an event, then something else must have been the cause, such as the treatment having had an effect on the observed event (the outcome) that was measured. (Allua & Thompson, 2009, p. 108)

INFERENTIAL STATISTICS The significance or alpha level (α) establishes the probability that the investigator is willing to accept that he or she has incorrectly rejected the null hypothesis. In other words, given α = .05, the investigator is willing to accept that his or her decision to reject the null hypothesis will be wrong 5% of the time (i.e., say that the treatment significantly decreases mortality when actually it does not). The most common alpha levels are .05 and .01. (Allua & Thompson, 2009, p. 108)
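The "wrong 5% of the time" reading of α can be checked by simulation: when the null hypothesis is true by construction (samples drawn from a standard normal population with no effect), a z-type test still rejects in about 5% of repetitions. This sketch uses only the standard library, with the normal tail obtained from `math.erfc`; all data are simulated:

```python
# Simulating the type I error rate of a two-sided z test under a true null.
import math
import random

random.seed(1)

def two_sided_p_from_z(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

n, trials, alpha = 30, 4000, 0.05
rejections = 0
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(sample) / n) / (1 / math.sqrt(n))  # known sigma = 1 under the null
    if two_sided_p_from_z(z) < alpha:
        rejections += 1

print(rejections / trials)  # long-run rejection rate, close to alpha
```

The observed rate hovers around 0.05 precisely because α is defined as the accepted long-run probability of rejecting a true null.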

INFERENTIAL STATISTICS You can conclude that there is a difference between the two groups when there actually is no difference, or you can conclude that there is no difference between the two groups when actually there is a difference (fail to measure a difference). The first error is called a type I (or alpha) error, and the second is called a type II (or beta) error. (Allua & Thompson, 2009, p. 109)

INFERENTIAL STATISTICS This is particularly important because the investigator cannot know whether they have made either of these errors when interpreting the data. (Allua & Thompson, 2009, p. 109)

INFERENTIAL STATISTICS An additional difficulty is that, as the chance of a type I error decreases, the chance of a type II error increases, and vice versa. (Allua & Thompson, 2009, p. 110)

Hypothesis testing. (Link) (Return to Index)

Pereira, S. M. C., & Leslie, G. (2009). Hypothesis testing. Australian Critical Care, 22(4), 187-191. doi:10.1016/j.aucc.2009.08.003

Hypothesis testing is a statistical tool that provides an objective framework for making decisions using a set of rules (probabilistic methods), rather than relying on subjective impressions. (Pereira & Leslie, 2009, p. 187)

The usual process of hypothesis testing consists of a number of steps. The steps are the formulation of the null hypothesis, selection of the test statistic, choice of the acceptable significance level, computation of the probability of the value from the data, and comparison of the computed value with the significance level.2 This article explains a formal procedure. (Pereira & Leslie, 2009, p. 187)

The statistical hypotheses are defined as conjectures or assertions about a parameter of a population, for instance, about the mean or the variance (variability) of a normal population. The hypotheses are based on the concept of proof by contradiction. (Pereira & Leslie, 2009, p. 188)

The null hypothesis, denoted by H0, states that there is no difference in the parameter. H0 is always the hypothesis to be tested. (Pereira & Leslie, 2009, p. 188)

The general aim of hypothesis testing is to use statistical tests that make α and β (the chance of errors) as small as possible. (Pereira & Leslie, 2009, p. 188)

The probability of a type I error is the likelihood of rejecting H0 when the null hypothesis is true. (Pereira & Leslie, 2009, p. 188)

A type II error: the probability of a type II error is the likelihood of accepting H0 when the alternative hypothesis is true. (Pereira & Leslie, 2009, p. 188)

The Central Limit Theorem states that for large sample sizes (n ≥ 30) drawn randomly from a population, the distribution of the means of those samples will approximate the normal distribution, even when the data in the parent population is not normally distributed. (Pereira & Leslie, 2009, p. 188)

The decision for rejecting the null hypothesis is based on the following rule: if the computed value (from the sample data) is smaller than α (significance level), then the rule is to reject H0. Otherwise, if the computed value is greater than or equal to α, then the rule is to not reject the null hypothesis. (Pereira & Leslie, 2009, p. 188)

Understanding statistical hypothesis testing. (Link) (Return to Index)

Ren, D. (2009). Understanding statistical hypothesis testing. Journal of Emergency Nursing, 35(1), 57-59. doi:10.1016/j.jen.2008.09.020

Hypothesis testing is a widely used statistical procedure conducted for the purpose of evaluating evidence to support or refute a proposed hypothesis. It should be noted that hypothesis testing does not directly answer a research question, such as “Is the new drug more effective in reducing pain in adults undergoing procedures in the emergency department?” (Ren, 2009, p. 57)

The second step is to set the level of significance (usually denoted as α). Traditionally, in scientific research the level of significance is set at 5%, or .05, but this designation is arbitrary. This 5% represents the probability (chance) the researchers are willing to reject the null hypothesis given that the null hypothesis is true. (Ren, 2009, p. 57)

Next, the test statistic is calculated from the sample data. There are different types of test statistics, such as the z test, Student t test, analysis of variance test, and χ2 test. (Ren, 2009, p. 58)

In the last steps the researcher converts the test statistic to a conditional probability, called a P value. The P value is used to measure how much evidence is obtained from the sample data against the null hypothesis or how likely it is that any observed difference between groups is due to chance, under the assumption of the null hypothesis being true. (Ren, 2009, p. 58)
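For a normal (z) test statistic, that conversion step has a closed form: the two-sided P value is the probability, under the null, of a statistic at least as extreme as the one observed. A minimal standard-library sketch (`math.erfc` supplies the normal tail):

```python
# Converting a z test statistic to a two-sided P value.
import math

def two_sided_p(z):
    """P(|Z| >= |z|) for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

print(round(two_sided_p(1.96), 3))  # the familiar z = 1.96 gives about 0.05
print(round(two_sided_p(2.58), 4))  # a more extreme statistic gives a smaller P
```

Other statistics (t, F, χ2) follow the same scheme with their own reference distributions; statistical software performs the equivalent tail-area lookup.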

Understanding descriptive statistics. (Link) (Return to Index)

Fisher, M. J., & Marshall, A. P., (2008). Understanding descriptive statistics. Australian Critical Care, 22(2), 93-97. doi:10.1016/j.aucc.2008.11.003

Statistics incorporates mathematics and logic, yet the concepts are reasonably simple and require an understanding of a few key rules and assumptions. We need to first recognise that statistics deals with numbers. (Fisher & Marshall, 2008, p. 93)

Nominal (or categorical) level of measurement is the scoring of cases/participants into broad categories. (Fisher & Marshall, 2008, p. 94)

Ordinal level of measurement is the scoring of research participants into hierarchically ordered categories. Ordinal level is used for variables that cannot be directly measured, for example pain, satisfaction or anxiety. Likert scales are numeric categories ordered from a low score to a high score. (Fisher & Marshall, 2008, p. 94)

Continuous level data are usually directly measured using infinite scales where the increments on the scale are of equal distance (e.g. weight in grams, pressure in mmHg and volume in millilitres). (Fisher & Marshall, 2008, p. 95)

As nominal level of measurement is the sorting of cases into one of several categories, the measure of dispersion is based on the count or frequency of cases in each category known as the frequency distribution. The measure of central tendency for nominal data is the category with the most frequent number of cases, also known as the mode. (Fisher & Marshall, 2008, p. 95)

At the ordinal level of measurement cases are sorted into one of several categories, where the categories have a numerical hierarchy. As for nominal data, the measure of dispersion can be the frequency distribution of scores (see Text Box 1). As ordinal data are ordered in a hierarchy, all cases can be sorted from the lowest to highest score (rank-ordered distribution). (Fisher & Marshall, 2008, p. 95)

Continuous data can be presented in a graphical form known as a frequency histogram. A frequency histogram is a bar graph representing the number of cases for each score. (Fisher & Marshall, 2008, p. 96)

Descriptive statistics. (Link) (Return to Index)

Shi, R., & McLarty, J. W. (2009). Descriptive statistics. Annals of Allergy, Asthma & Immunology, 103(4), 9-14. doi:10.1016/s1081-1206(10)60815-0

Statistics can be thought of as 2 distinct activities: descriptive statistics and inferential statistics. As its name implies, descriptive statistics describe the characteristics of data. Inferential statistics on the other hand take a more experimental approach to the analysis of data and consist of testing hypotheses about the data and inferring properties of a population from samples. (Shi & McLarty, 2009, p. S9)

The measures of locations are a reflection of the data central tendency, and the measures of variability are a reflection of the data spread or dispersion. The range, percentiles or quartiles, variance, and SD are frequently used statistics for measures of dispersion. The range is the distance between the lowest and the highest values. The percentiles or quartiles divide the data into parts, for example, the highest third, the middle third, and the lowest third. The 25th and 75th percentiles are also called first and third quartiles, and the median is the 50th percentile. The variance is a measure of the difference of each value from the mean value. Sample variance or variance is defined as follows: if we denote a set of data by X (x1, x2,...,xn), then the sample mean is typically denoted with a horizontal bar over the variable: x̄ = (x1 + x2 + ... + xn)/n, and the sample variance is s² = Σ(xi − x̄)²/(n − 1). (Shi & McLarty, 2009, p. S10)
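The dispersion measures described above can all be computed with the standard library; the hand-computed sample variance (squared deviations from the mean, divided by n − 1) matches `statistics.variance`. The data are invented illustration values:

```python
# Range, quartiles, and sample variance/SD for a small invented data set.
from statistics import mean, quantiles, variance, stdev

x = [4, 8, 15, 16, 23, 42]

data_range = max(x) - min(x)
q1, q2, q3 = quantiles(x, n=4)  # quartiles; q2 is the median (50th percentile)

xbar = mean(x)
s2 = sum((xi - xbar) ** 2 for xi in x) / (len(x) - 1)  # sample variance by hand

print("range:", data_range)
print("quartiles:", q1, q2, q3)
print("variance:", round(s2, 3), "==", round(variance(x), 3))
print("SD:", round(s2 ** 0.5, 3), "==", round(stdev(x), 3))
```

Dividing by n − 1 rather than n is what makes this the sample (rather than population) variance, correcting for the fact that x̄ is itself estimated from the same data.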

Most statistical tests are based on the distribution of the data, which describes how often certain values occur, the range of values, and the shape of the probability and value curve. This is necessary to convert ideas and words into probabilities that can be used for statistical decision making. Distributions are based on the probability of the value or range of values of a particular variable. (Shi & McLarty, 2009, p. S11)

The normal distribution is the most widely used family of distributions in statistics, and many statistical tests are based on the assumption of normality. Fortunately, many variables in nature have a normal distribution; examples include blood pressure, temperature, adult height, spirometry, and many common serum components. (Shi & McLarty, 2009, p. S12)

Pitfalls of Statistical Hypothesis Testing. (Link) (Return to Index)

Sedgwick, P. (2014). Pitfalls of statistical hypothesis testing: Multiple testing. British Medical Journal (BMJ), 349, 1-2. doi:10.1136/bmj.g5310.

Care must be taken when research papers undertake a large number of statistical tests—ultimately some of these will result in a type I error. However, it is not possible to ascertain which significant findings are a type I error. Various approaches have been suggested to control the number of type I errors (the type I error rate) when undertaking multiple testing. The simplest approach is the Bonferroni method. This involves obtaining a new critical level of significance by dividing the traditional one of 0.05 by the number of significance tests performed. The disadvantage of this approach is that it tends to be conservative—that is, it errs on the side of non-significance. However, it does avoid spurious significant results. (Sedgwick, 2014, p. 2)
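The Bonferroni method as described is a one-line arithmetic adjustment: divide the usual significance level by the number of tests performed. A minimal sketch with invented P values:

```python
# Bonferroni correction: new critical level = alpha / number of tests.
alpha = 0.05
p_values = [0.003, 0.020, 0.041, 0.250]  # invented results of 4 separate tests

bonferroni_alpha = alpha / len(p_values)          # 0.05 / 4 = 0.0125
significant = [p for p in p_values if p < bonferroni_alpha]

print("new critical level:", bonferroni_alpha)
print("still significant:", significant)  # only the smallest P value survives
```

Note how 0.020 and 0.041, nominally significant at 0.05, fail the corrected threshold; that is the conservatism (and the protection against spurious findings) Sedgwick describes.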

Understanding statistical testing. (Link) (Return to Index)

Veazie, P. J. (2015). Understanding statistical testing. Sage Open, 5(1), 1-9. doi:10.1177/2158244014567685.

Current concepts of statistical testing can lead to mistaken ideas among researchers such as (a) the raw-scale magnitude of an estimate is relevant, (b) the classic Neyman–Pearson approach constitutes formal testing, which in its misapplication can lead to mistaking statistical insignificance for evidence of no effect, (c) one-tailed tests are tied to a point null hypothesis, (d) one- and two-tailed tests can be arbitrarily selected, (e) two-tailed tests are informative, and (f) power defined intervals or data-specific intervals constitute formal test hypotheses. (Veazie, 2015, p. 1)

An empirical hypothesis is one for which empirical evidence can, in principle, bear on judgments of its truth or falsity. A statistical hypothesis is an empirical hypothesis about distribution parameters of random variables defined by a data generating process. (Veazie, 2015, p. 2)

The sample mean statistic has a distribution of possible values whereas the mean of a given sample is a number. (Veazie, 2015, p. 2)

Statistical hypothesis testing is common when a researcher wishes to determine a substantive claim. If the truth or falsity of the substantive claim can be identified with the truth or falsity of a statistical hypothesis, then hypothesis testing can be used to inform judgments about the substantive claim. (Veazie, 2015, p. 6-7)

Statistical hypothesis testing is also used when the goal is estimation and the estimate is of interest only if the parameter is in a particular range of values. (Veazie, 2015, p. 7)

Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. (Link) (Return to Index)

Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations, European Journal of Epidemiology, 31(4), 337–350. doi:10.1007/s10654-016-0149-3.

ASSUMPTIONS Misinterpretation and abuse of statistical tests, confidence intervals, and statistical power have been decried for decades, yet remain rampant. A key problem is that there are no interpretations of these concepts that are at once simple, intuitive, correct, and foolproof. (Greenland et al., 2016, p. 337)

ASSUMPTIONS Every method of statistical inference depends on a complex web of assumptions about how data were collected and analyzed, and how analysis results were selected for presentation. The full set of assumptions is embodied in a statistical model that underpins the method. This model is a mathematical representation of data variability, and ideally would capture accurately all sources of such variability. (Greenland et al., 2016, p. 338)

ASSUMPTIONS Many problems arise, however, because the statistical model often incorporates unrealistic or at best unjustified assumptions. (Greenland et al., 2016, p. 338)

ASSUMPTIONS The difficulty of understanding and assessing underlying assumptions is exacerbated by the fact that the statistical model is usually presented in a highly compressed and abstract form, if presented at all. (Greenland et al., 2016, p. 338)

NULL HYPOTHESIS Much statistical teaching and practice has developed a strong and unhealthy focus on the idea that the main aim of a study should be to test the null hypothesis. In fact, most descriptions of statistical testing focus only on testing null hypotheses, and the entire topic has been called “null hypothesis significance testing”. (Greenland et al., 2016, p. 339)

P-VALUE A more refined goal of statistical analysis is to provide an evaluation of certainty or uncertainty regarding the size of an effect. It is natural to express such certainty in terms of probabilities of hypotheses (Greenland et al., 2016, p. 339). In conventional statistical methods, however, probability refers not to hypotheses, but to quantities that are hypothetical frequencies of data patterns under an assumed statistical model (Greenland et al., 2016, p. 339).

P-VALUE Despite considerable training to the contrary, many statistically educated scientists revert to the habit of misinterpreting these frequency probabilities as hypothesis probabilities. Even more confusingly, the term likelihood of a parameter value is reserved by statisticians to refer to the probability of the observed data given the parameter value; it does not refer to the probability of the parameter taking on the given value. (Greenland et al., 2016, p. 339)

P-VALUE The general definition of a P value may help one to understand why statistical tests tell us much less than what many think they do: not only does a P value not tell us whether the hypothesis targeted for testing is true or not; it says nothing specifically related to that hypothesis unless we can be completely sure that every other assumption used for its computation is correct -- an assurance that is lacking in far too many studies. (Greenland et al., 2016, p. 339)

P-VALUE The terms “significance level” and “alpha level” are often used to refer to the cutoff; however, the term “significance level” invites confusion of the cutoff with the P value itself. Their difference is profound: the cutoff value alpha is supposed to be fixed in advance and is thus part of the study design, unchanged in light of the data. In contrast, the P value is a number computed from the data and is thus an analysis result, unknown until it is computed. (Greenland et al., 2016, pp. 339-340)
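
The alpha-versus-P-value distinction in the passage above can be sketched in a few lines of Python. The z statistic, the standard-normal reference distribution, and all numbers here are illustrative assumptions, not part of Greenland et al.

```python
import math

# Fixed cutoff, chosen before the data are seen: part of the study design.
ALPHA = 0.05

def two_sided_p_from_z(z):
    """Two-sided P value for a z statistic under a standard normal reference."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical statistic computed from the data: an analysis result.
z = 2.3
p = two_sided_p_from_z(z)
print(round(p, 4))   # ≈ 0.0214
print(p < ALPHA)     # the computed P value is compared against the fixed cutoff
```

Note the asymmetry the quote stresses: ALPHA exists before any data; p exists only after the analysis is run.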

P-VALUE Especially when a study is large, very minor effects or small assumption violations can lead to statistically significant tests of the null hypothesis. Again, a small null P value simply flags the data as being unusual if all the assumptions used to compute it, including the null hypothesis, were correct; but the way the data are unusual might be of no clinical interest. One must look at the confidence interval to determine which effect sizes of scientific or other substantive importance are relatively compatible with the data, given the model. (Greenland et al., 2016, p. 341)

P-VALUE Especially when a study is small, even large effects may be “drowned in noise” and thus fail to be detected as statistically significant by a statistical test. A large null P value simply flags the data as not being unusual if all the assumptions used to compute it, including the test hypothesis, were correct; but the same data will also not be unusual under many other models and hypotheses besides the null. (Greenland et al., 2016, p. 341)
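
The sample-size point in the two passages above can be shown with a toy calculation (all numbers hypothetical): the same small mean difference produces a negligible z statistic in a small sample and an enormous one in a large sample, because the standard error shrinks with the square root of n.

```python
import math

def z_stat(diff, sd, n):
    # z statistic for a mean difference, assuming a known sd (illustrative only)
    return diff / (sd / math.sqrt(n))

# The same small difference of 0.1 standard deviation units:
print(z_stat(0.1, 1.0, 25))      # small study: z = 0.5, nowhere near significant
print(z_stat(0.1, 1.0, 10000))   # large study: z = 10, "highly significant"
```

The effect is identical in both calls; only the sample size, and hence the verdict of the significance test, differs.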

P-VALUE Upon realizing that statistical tests are usually misinterpreted, one may wonder what if anything these tests do for science. They were originally intended to account for random variability as a source of error, thereby sounding a note of caution against overinterpretation of observed associations as true effects or as stronger evidence against null hypotheses than was warranted. But before long that use was turned on its head to provide fallacious support for null hypotheses in the form of “failure to achieve” or “failure to attain” statistical significance. (Greenland et al., 2016, p. 346)

P-VALUE A shift in emphasis from hypothesis testing to estimation has been promoted as a simple and relatively safe way to improve practice [5, 61, 63, 114, 115], resulting in increasing use of confidence intervals and editorial demands for them; nonetheless, this shift has brought to the fore misinterpretations of intervals such as 19-23 above [116]. Other approaches combine tests of the null with further calculations involving both null and alternative hypotheses [117, 118]; such calculations may, however, bring with them further misinterpretations similar to those described above for power, as well as greater complexity. Meanwhile, in the hopes of minimizing harms … (Greenland et al., 2016, p. 347)

Understanding statistical hypothesis testing. (Link) (Return to Index)

Sedgwick, P. (2014). Understanding statistical hypothesis testing. British Medical Journal, 348(), g3557. doi:10.1136/bmj.g3557.

P-VALUE Statistical hypothesis testing involves the statement of the statistical null and alternative hypotheses. The researchers would have done this conceptually before the trial was started. Traditional statistical hypothesis testing starts at the position of equipoise as specified by the null hypothesis. (Sedgwick, 2014, p. 1)

P-VALUE It is important to distinguish between the research hypothesis and the statistical hypotheses. The researchers would have stated the research hypothesis, which predicts the study results, before starting the trial. The research hypothesis would have been that the outcome would be superior with varenicline compared with placebo (b is true); this would have been based on anecdotal evidence or perhaps on a pilot or exploratory study. (Sedgwick, 2014, p. 1)

P-VALUE It is not possible to infer from the P value for the statistical test of continued abstinence that the null or alternative hypothesis is true or false (c is false). Sample data only ever provide evidence in support of the null or alternative hypothesis, in turn permitting inferences to be made about the population. (Sedgwick, 2014, p. 2)

The ASA's statement on p-values: Context, process, and purpose. (Link) (Return to Index)

Wasserstein, R. L., Lazar, N. A. (2016). The ASA's statement on p-values: Context, process, and purpose. The American Statistician, 70(2), 129-133, doi:10.1080/00031305.2016.1154108.

P-VALUE “It’s science’s dirtiest secret: The ‘scientific method’ of testing hypotheses by statistical analysis stands on a flimsy foundation.” A November 2013 article in Phys.org Science News Wire (2013) cited “numerous deep flaws” in null hypothesis significance testing. A Science News article (Siegfried 2014) on February 7, 2014, said “statistical techniques for testing hypotheses ... have more flaws than Facebook’s privacy policies.” A week later, statistician and “Simply Statistics” blogger Jeff Leek responded. “The problem is not that people use P-values poorly,” Leek wrote, “it is that the vast majority of data analysis is not performed by people properly trained to perform data analysis” (Leek 2014). (Wasserstein & Lazar, 2016, p. 129)

P-VALUE The validity of scientific conclusions, including their reproducibility, depends on more than the statistical methods themselves. Appropriately chosen techniques, properly conducted analyses, and correct interpretation of statistical results also play a key role in ensuring that conclusions are sound and that uncertainty surrounding them is represented properly. (Wasserstein & Lazar, 2016, p. 131)

P-VALUE In view of the prevalent misuses of and misconceptions concerning p-values, some statisticians prefer to supplement or even replace p-values with other approaches. These include methods that emphasize estimation over testing, such as confidence, credibility, or prediction intervals; Bayesian methods; alternative measures of evidence, such as likelihood ratios or Bayes factors; and other approaches such as decision-theoretic modeling and false discovery rates. All these measures and approaches rely on further assumptions, but they may more directly address the size of an effect (and its associated uncertainty) or whether the hypothesis is correct. (Wasserstein & Lazar, 2016, p. 132)
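
As a minimal illustration of the estimation-over-testing idea above, a normal-approximation confidence interval can be computed directly from sample data. The data values and the z = 1.96 critical value are illustrative assumptions, not from Wasserstein & Lazar.

```python
import math

def mean_ci(xs, z=1.96):
    """Approximate 95% normal-theory confidence interval for a mean (sketch only)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)   # sample variance
    half = z * math.sqrt(var / n)                       # half-width of the interval
    return mean - half, mean + half

lo, hi = mean_ci([4.1, 5.0, 4.6, 5.3, 4.4, 4.9, 5.1, 4.8])
print(round(lo, 2), round(hi, 2))
```

Unlike a bare P value, the interval reports a range of effect sizes compatible with the data, which is the point the quoted statisticians are making.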

Chi square tests. (Link) (Return to Index)

Author(s) Unknown (n.d.). Chi square tests. Book Author(s)/Editor(s) unknown (pp. 703-765). Retrieved from http://uregina.ca/~gingrich/ch10.pdf

CHI SQUARED The chi-square distribution is a theoretical or mathematical distribution which has wide applicability in statistical work. (no author, p. 704)

CHI SQUARED Each chi-square distribution has a degrees of freedom value associated with it, so that there are many different chi-square distributions. (no author, p. 705)

CHI SQUARED For both the goodness of fit test and the test of independence, the chi-square statistic is the same. For both of these tests, all the categories into which the data have been divided are used. (no author, p. 705)

CHI SQUARED In the chi-square tests, the null hypothesis makes a statement concerning how many cases are to be expected in each category if this hypothesis is correct. The chi-square test is based on the difference between the observed and expected values for each category. (no author, p. 705)

CHI SQUARED When the observed data do not conform to what has been expected on the basis of the null hypothesis, the difference between the observed and expected values is large. The chi-square statistic is thus small when the null hypothesis is true, and large when the null hypothesis is not true. (no author, p. 707)

CHI SQUARED The chi-square goodness of fit test begins by hypothesizing that the distribution of a variable behaves in a particular manner. For example, in order to determine daily staffing needs of a retail store, the manager may wish to know whether there are an equal number of customers each day of the week. To begin, a hypothesis of equal numbers of customers on each day could be assumed, and this would be the null hypothesis. (no author, p. 709)
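
The store example above can be sketched directly: compute expected counts under the equal-numbers null, then the chi-square statistic as the sum of (observed - expected)² / expected. The daily counts are hypothetical numbers invented for illustration.

```python
# Hypothetical daily customer counts, Monday through Saturday.
observed = [30, 14, 34, 45, 57, 20]
total = sum(observed)

# Null hypothesis: equal numbers of customers each day.
expected = [total / len(observed)] * len(observed)

# Chi-square statistic: sum over categories of (observed - expected)^2 / expected.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1   # degrees of freedom: number of categories minus one
print(round(chi_sq, 2), df)
```

A statistic this large relative to its degrees of freedom would cast doubt on the equal-numbers hypothesis; the exact P value would then be read from a chi-square table with df degrees of freedom.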

CHI SQUARED The larger the expected number of cases, the more accurate the chi-square calculation will be. Also note that this condition is placed on the expected values, not the observed values. There may be no observed values in one or more categories, and this causes no difficulty with the test. But there must be around five or more expected cases per category. If there are considerably fewer than this, adjacent categories may be grouped together to boost the number of expected cases. (no author, p. 717)

CHI SQUARED For the chi-square test, the number of degrees of freedom is the number of categories - 1. (no author, p. 724)

CHI SQUARED If the data are such that the null hypothesis cannot be rejected, then the conclusion is left at this point, without accepting the null hypothesis. The reason for this is that the level of Type II error is usually fairly considerable, and without more evidence most researchers feel that more proof would be required before the null hypothesis could be accepted. (no author, pp. 729-730)

CHI SQUARED Any claim that is made concerning the whole distribution of a variable can be tested using the chi-square goodness of fit test. The only restriction on the test is that there should be approximately 5 or more expected cases per cell. Other than that, there are really no restrictions on the use of the test. The frequency distribution can be measured on a nominal, ordinal, interval, or ratio scale and could be either discrete or continuous. All that is required is a grouping of the values of the variable into categories, and you need to know the number of observed cases which fall in each category. Once this is available, any claim concerning the nature of the distribution can be tested. (no author, pp. 730-731)

CHI SQUARED At the same time, there are some weaknesses to the test. This test is often a first test to check whether the frequency distribution more or less conforms to some hypothesized distribution. If it does not conform closely, the question which emerges is how or where it does not match the claimed distribution. The chi-square test does not answer this question, so further analysis is required. (no author, p. 731)

CHI SQUARED The chi-square test for independence of two variables is a test which uses a cross classification table to examine the nature of the relationship between these variables. These tables are sometimes referred to as contingency tables, and they have been discussed in this textbook as cross classification tables in connection with probability in chapter 6. These tables show the manner in which two variables are either related or are not related to each other. The test for independence examines whether the observed pattern between the variables in the table is strong enough to show that the two variables are dependent on each other or not. While the chi-square statistic and distribution are used in this test, the test is quite distinct from the test of goodness of fit. The goodness of fit test examines only one variable, while the test of independence is concerned with the relationship between two variables. (no author, p. 731)

CHI SQUARED The chi-square test of independence begins with the hypothesis of no association or no relationship between two variables. In intuitive terms this means that the two variables do not influence each other and are not connected in any way. If one variable changes in value and this is not associated with changes in the other variable in a predictable manner, then there is no relationship between the two variables. (no author, p. 733)

CHI SQUARED The only restriction on the use of the test is that the expected number of cases should exceed 5 in most cells of the cross classification table. One widely used rule is as follows: no expected count should be less than one, and no more than 20% of the cells should have fewer than five expected cases. This is the rule which SPSS prints out as part of its output when the chi-square test for independence is requested. (no author, p. 743)
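
The mechanics behind the test of independence can be sketched from a contingency table: under the no-association null, each expected count is (row total × column total) / grand total, and the same (observed - expected)² / expected sum gives the statistic. The 2 × 3 table below is hypothetical, and the adequacy check implements the rule quoted above.

```python
# Hypothetical 2 x 3 cross classification table of observed counts.
table = [[20, 30, 10],
         [30, 20, 40]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)

# Under the no-association null: expected = row total * column total / grand total.
expected = [[r * c / grand for c in col_totals] for r in row_totals]

chi_sq = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
             for i in range(len(table)) for j in range(len(table[0])))

# Adequacy rule: no expected count below 1, and no more than 20% of cells below 5.
cells = [e for row in expected for e in row]
rule_ok = min(cells) >= 1 and sum(e < 5 for e in cells) / len(cells) <= 0.20
print(round(chi_sq, 2), rule_ok)
```

With (rows - 1) × (columns - 1) = 2 degrees of freedom here, the statistic would then be compared against a chi-square table in the usual way.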

CHI SQUARED When reporting the results of the chi-square test of independence, the table of observed values and the value of the chi-square statistic are usually given. Expected values are not ordinarily reported, as these can easily be computed by the reader. In addition to the chi-square statistic obtained from the table, it is common to report the exact significance level, or probability, for the statistic. The degrees of freedom may also be reported, although this is not really necessary, since the reader can easily compute the degrees of freedom. (no author, p. 751)

CHI SQUARED The chi-square test for independence is an extremely flexible and useful test. The test can be used to examine the relationship between any two variables, with any type of measurement (nominal, ordinal, interval or ratio), and discrete or continuous. The only constraint on the use of the test is that there be sufficient numbers of cases in each cell of the table. The rule given on page 742, that there be no cell with less than one expected case and no more than one fifth of the cells with fewer than five expected cases, is generally an adequate rule. If there are too few cases in some cells, adjacent categories can usually be grouped together. In doing so, the researcher must be careful in how the grouping is carried out. (no author, p. 764)

The chi-square test of independence. (Link) (Return to Index)

McHugh, M. L. (2013). The chi-square test of independence. Biochemia Medica, 23(2), 143–149. doi:10.11613/BM.2013.018.

CHI SQUARED The Chi-square test of independence (also known as the Pearson Chi-square test, or simply the Chi-square) is one of the most useful statistics for testing hypotheses when the variables are nominal, as often happens in clinical research. Unlike most statistics, the Chi-square (χ2) can provide information not only on the significance of any observed differences, but also provides detailed information on exactly which categories account for any differences found. (McHugh, 2013, p. 143)

CHI SQUARED The Chi-square test is a non-parametric statistic, also called a distribution free test. Non-parametric tests should be used when any one of the following conditions pertains to the data: 1. The level of measurement of all the variables is nominal or ordinal. 2. The sample sizes of the study groups are unequal; for the χ2 the groups may be of equal size or unequal size, whereas some parametric tests require groups of equal or approximately equal size. 3. The original data were measured at an interval or ratio level, but violate one of the following assumptions of a parametric test: … (McHugh, 2013, p. 143)

CHI SQUARED As with parametric tests, the non-parametric tests, including the χ2, assume the data were obtained through random selection. However, it is not uncommon to find inferential statistics used when data are from convenience samples rather than random samples. (To have confidence in the results when the random sampling assumption is violated, several replication studies should be performed with essentially the same result obtained.) (McHugh, 2013, p. 147)

CHI SQUARED Nominal variables require the use of nonparametric tests, and there are three commonly used significance tests that can be used for this type of nominal data. The first and most commonly used is the Chi-square. The second is the Fisher's Exact test, which is a bit more precise than the Chi-square, but it is used only for 2 x 2 tables (4). (McHugh, 2013, p. 147)

CHI SQUARED The third test is the maximum likelihood ratio Chi-square test, which is most often used when the data set is too small to meet the sample size assumption of the Chi-square test. (McHugh, 2013, p. 147)

CHI SQUARED Statistical strength tests are correlation measures. For the Chi-square, the most commonly used strength test is the Cramer’s V test. (McHugh, 2013, p. 148)

CHI SQUARED The Cramer’s V is a form of a correlation and is interpreted exactly the same. For any correlation, a value of 0.26 is a weak correlation. It should be noted that a relatively weak correlation is all that can be expected when a phenomenon is only partially dependent on the independent variable. (McHugh, 2013, p. 148)
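
Cramer’s V is computed from the chi-square statistic itself: V = sqrt(χ2 / (n × (k - 1))), where n is the sample size and k is the smaller of the number of rows and columns. A small sketch (the chi-square value, sample size, and table dimensions below are hypothetical, chosen to reproduce the 0.26 mentioned in the passage above):

```python
import math

def cramers_v(chi_sq, n, rows, cols):
    """Cramer's V: scales a chi-square statistic to a 0-1 correlation-like measure."""
    return math.sqrt(chi_sq / (n * (min(rows, cols) - 1)))

# Hypothetical values: chi-square of 10.14 from 150 subjects in a 2 x 3 table.
v = cramers_v(10.14, 150, 2, 3)
print(round(v, 2))   # 0.26, the weak correlation discussed by McHugh
```

Because V is bounded between 0 and 1, it can be read on the same weak/moderate/strong scale as an ordinary correlation coefficient.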

CHI SQUARED The Chi-square is a valuable analysis tool that provides considerable information about the nature of research data. It is a powerful statistic that enables researchers to test hypotheses about variables measured at the nominal level. As with all inferential statistics, the results are most reliable when the data are collected from randomly selected subjects, and when sample sizes are sufficiently large that they produce appropriate statistical power. The Chi-square is also an excellent tool to use when the assumptions of equal variances and homoscedasticity are violated and parametric statistics such as the t-test and ANOVA cannot provide reliable results. (McHugh, 2013, p. 149)

Inferential statistics basic concepts (Link) (Return to Index)

Gabrenya, W. (2003). Inferential statistics basic concepts. Retrieved from http://my.fit.edu/~gabrenya/IntroMethods/eBook/inferentials.pdf.

Estimating the size of treatment effects: Moving beyond P Values (Link) (Return to Index)

Mcgough, J. J., Faraone, S. V. (2009). Estimating the size of treatment effects: Moving beyond P Values, Psychiatry, 6(10), 21–29.

On effect size (Link) (Return to Index)

Kelley, K. & Preacher, K. J. (2012). On effect size. Psychological Methods, 17(2), 137-152. doi:10.1037/a0028086.

When statistical significance hides more than it reveals (Link) (Return to Index)

Powers, J. M., Glass, G. V. (2014). When statistical significance hides more than it reveals. Teachers College Record. Retrieved from https://www.tcrecord.org/content.asp?contentid=17591

Cohen’s Conventions for Small, Medium, and Large Effects (Link) (Return to Index)

Wuensch, K. (2015). Cohen’s Conventions for Small, Medium, and Large Effects. Retrieved from http://core.ecu.edu/psyc/wuenschk/docs30/EffectSizeConventions.pdf.

Estimating the Sample Size Necessary to Have Enough Power (Link) (Return to Index)

Wuensch, K. (2015). Estimating the Sample Size Necessary to Have Enough Power. Retrieved from: http://core.ecu.edu/psyc/wuenschk/docs30/Power-N.Doc

A power primer. (Link) (Return to Index)

Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159.