Notes on Topic 1:
Statistics, Science, and Observations
Observations
Observations are the basic empirical "stuff" of science.

Statistics
Statistics is a set of methods and rules for organizing, summarizing, and interpreting information. These methods and rules enable scientific researchers to describe and analyze the observations they have made. Statistical methods are tools for science.

Science consists of methods for making observations. Here are some of the "observations" we gathered in the survey we did on the first day of class in 1997 and 1998.
Populations & Samples

Populations
A population is the set of all individuals of interest in a particular study. We will also refer to populations of scores.

Samples
A sample is a set of individuals selected from a population, usually intended to represent the population in a study. We will also refer to samples of scores.

The data we gathered in class are a "sample" of scores obtained with a sample of individuals. The population we sampled from is the population of UNC undergraduates.

Parameters
A parameter is a value, usually a numerical value, that describes a population. A parameter may be obtained from a single measurement, or it may be derived from a set of measurements from the population.

Statistics
A statistic is a value, usually a numerical value, that describes a sample. A statistic may be obtained from a single measurement, or it may be derived from a set of measurements from the sample.

Here are some "statistics" computed from our sample of data:
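As a separate illustration of the parameter/statistic distinction, the sketch below uses made-up scores, not the class survey data. The population, the sample size of 25, and the names `parameters` and `sample_statistics` are all assumptions chosen for the example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1,000 made-up GPA-like scores between 2.0 and 4.0.
population = [round(random.uniform(2.0, 4.0), 2) for _ in range(1000)]

# Parameters are values that describe the population.
parameters = {
    "mean": statistics.mean(population),
    "median": statistics.median(population),
}

# A sample is a set of individuals selected from the population.
sample = random.sample(population, 25)

# Statistics are values that describe the sample.
sample_statistics = {
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
}

for name in parameters:
    print(f"{name}: parameter = {parameters[name]:.3f}, "
          f"statistic = {sample_statistics[name]:.3f}")
```

In real research the population values are usually unknown, which is why we compute statistics from a sample in the first place. Here the population is simulated only so that both kinds of value can be printed side by side.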
Data

Descriptive Statistics
Descriptive statistics are statistical procedures used to summarize, organize, and simplify data. The term also names the branch of statistical activity focusing on the use of such procedures. These procedures are the focus of chapters 1 through 5.

Statistical Visualization
Statistical visualization consists of recently developed computational statistical procedures used to visually summarize, organize, and simplify data. The statistical system we are using is named ViSta, for "Visual Statistics", because it includes statistical visualization.

A statistical visualization of our data is shown below. It shows the relationship between GPA and Satisfaction with the UNC experience. Higher satisfaction is associated with higher GPA.

Exploratory Statistics
Exploratory statistics is the process of exploring data by using descriptive and visualization methods to "see what the data seem to say". It is also the branch of statistics that focuses on "seeing what the data seem to say" (Tukey, 19??).

Inferential Statistics
Inferential statistics consist of techniques that allow us to study samples and then to make generalizations about the populations from which the samples were selected. The term also names the branch of statistical activity focusing on the use of such procedures. These procedures are the focus of chapter 8 through the remainder of the text. The groundwork for statistical inference is laid in chapters 6 and 7.

Sampling Error
Sampling error is the discrepancy, or amount of error, that exists between a sample statistic and the corresponding population parameter.

The Scientific Method and the Design of Experiments

Science attempts to discover orderliness in the universe, that is, to discover regularity in changes. Something that can change is called a variable.

Variables
A variable is a characteristic or condition that changes or has different values for different individuals. In the data we gathered, the variables include "Gender", "Age", etc.

A constant is a characteristic or condition that does not vary, and is the same for every individual.
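Sampling error, as defined earlier in these notes, can be made concrete with a small simulation. The sketch below is only an illustration: the population of 10,000 scores, the sample sizes of 10 and 500, and the helper `sampling_error` are hypothetical choices, not part of the course data or of ViSta.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 scores (mean near 100, SD near 15).
population = [random.gauss(100, 15) for _ in range(10_000)]
mu = statistics.mean(population)  # the population parameter


def sampling_error(n):
    """Draw one sample of size n and return (sample mean - population mean)."""
    sample = random.sample(population, n)
    return statistics.mean(sample) - mu


# Repeat the sampling many times for a small and a large sample size.
errors_small = [sampling_error(10) for _ in range(200)]
errors_large = [sampling_error(500) for _ in range(200)]

# Typical size of the discrepancy, ignoring its sign.
typical_small = statistics.mean([abs(e) for e in errors_small])
typical_large = statistics.mean([abs(e) for e in errors_large])

print(f"typical sampling error with n = 10:  {typical_small:.2f}")
print(f"typical sampling error with n = 500: {typical_large:.2f}")
```

Each sample gives a slightly different statistic, so the error differs from sample to sample. Under these assumptions, larger samples tend to give a smaller discrepancy between the statistic and the parameter.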
The Correlational Method
The correlational method is the scientific method in which two (or more) variables are observed without manipulation (i.e., as they exist naturally) to see if there is any relationship between them.

The correlational method cannot establish cause-and-effect: correlation is not causation! The data we gathered are an example of the correlational method. We can say that "Higher satisfaction is associated with higher GPA", but we can't say that "Higher GPA causes higher satisfaction" (or the converse).

The Experimental Method
The experimental method is the scientific method that can establish a cause-and-effect relationship between two (or more) variables. Some important points:

Independent Variable (also called the predictor variable)
The variable that is manipulated by the researcher.

Dependent Variable (also called the response variable)
The variable that is observed by the researcher for changes in order to assess the effect of the treatment. (The treatment is the manipulation of the predictor variable.)

Confounding Variable
An uncontrolled variable that is unintentionally allowed to vary systematically with the independent variable. It confounds the results (bad, bad, bad!).

The Control Group
This is a condition of the independent variable that does not receive the experimental treatment. Usually, the control group receives either no treatment or a placebo treatment.

The Experimental Group
This is a condition of the independent variable that does receive an experimental treatment. There may be several experimental groups.

The Quasi-Experimental Method
The quasi-experimental method examines differences between pre-existing groups of subjects (such as men vs. women) or differences between groups of scores obtained at different times (before and after treatment).

Hypotheses
A hypothesis is a prediction about the outcome of an experiment.
In experimental research, a hypothesis makes a prediction about how the manipulation of the independent (predictor) variable will affect the dependent (response) variable.

Measurement

Data are measurements of observations, which involve categorizing, ordering, or using numbers to characterize amount. Several levels of measurement are involved. These in turn determine what statistics can be computed. Measurements may also be discrete or continuous.

Nominal
The nominal level of measurement labels observations so that they fall into different categories. Football jersey numbers and home street addresses are common examples.
In ViSta, nominal variables are called "Category" variables.

Ordinal
The ordinal level of measurement consists of categories that are ordered in a sequence. Order of finish in a race is a common example.
In ViSta, ordinal variables are called "Ordinal" variables.

Interval
The interval level of measurement consists of ordered categories where all of the categories are intervals of exactly the same size. Temperature in degrees Fahrenheit or Celsius is a common example. Here, equal differences between numbers reflect equal differences in magnitude of the observed variable.

Ratio
The ratio level of measurement is an interval scale with an absolute zero point. Length and weight are common examples. Here, ratios of numbers reflect ratios of variable magnitude.
In ViSta, interval and ratio variables are called "Numeric" variables.

Mathematical Notation

In statistical calculations you will constantly be required to add a set of values to find a specific total. We use algebraic expressions to represent the values being added. For example, X means "scores on a variable".

We will use the Greek letter Sigma ($\Sigma$) to signify the summation process.
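Thus, we write $\Sigma X$ to mean "the sum of the scores on X".

Note that

The following term, which is called the "squared sum", works as shown: $(\Sigma X)^2$. The scores are added first, and then the total is squared.

Because of the order of operations, the following term, which is called "the sum of squares", works as shown: $\Sigma X^2$. Each score is squared first, and then the squared values are added.

Consider how the following summation equation works:

On the other hand, the next summation equation works differently:

Finally, consider how this last summation equation works: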
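To see how the order of operations changes a summation, it can help to compute a few expressions on a tiny set of scores. The sketch below uses four made-up scores. The last two expressions, $\Sigma (X + 1)$ and $\Sigma X + 1$, are illustrative choices made for this example and are not necessarily the equations these notes originally displayed.

```python
# Hypothetical scores on a variable X (made up for illustration).
X = [1, 2, 3, 4]

# Sum of the scores: add every value of X.
sum_x = sum(X)                              # 1 + 2 + 3 + 4 = 10

# Squared sum: add the scores first, then square the total.
squared_sum = sum(X) ** 2                   # 10 squared = 100

# Sum of squares: square each score first, then add the squares.
sum_of_squares = sum(x ** 2 for x in X)     # 1 + 4 + 9 + 16 = 30

# Illustrative contrast (an assumption, not taken from the notes):
# adding 1 to every score before summing ...
sum_of_shifted = sum(x + 1 for x in X)      # 2 + 3 + 4 + 5 = 14

# ... versus summing first and then adding 1 once to the total.
sum_then_shift = sum(X) + 1                 # 10 + 1 = 11

print(sum_x, squared_sum, sum_of_squares, sum_of_shifted, sum_then_shift)
```

The squared sum (100) and the sum of squares (30) differ even though they use the same scores, which is why the placement of the exponent and of any parentheses matters when reading summation notation.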