Basic Principles of Operant Conditioning: Thorndike's Law of Effect

Thorndike's law of effect states that behaviors are modified by their positive or negative consequences.
Learning Objectives

Relate Thorndike's law of effect to the principles of operant conditioning
Operant conditioning is a theory of learning that focuses on changes in an individual's observable behaviors. In operant conditioning, new or continued behaviors are impacted by new or continued consequences. Research regarding this principle of learning first began in the late 19th century with Edward L. Thorndike, who established the law of effect.

Thorndike's Experiments

Thorndike's most famous work involved cats trying to navigate through various puzzle boxes. In these experiments, he placed hungry cats into homemade boxes and recorded the time it took for them to perform the necessary actions to escape and receive their food reward. Thorndike discovered that with successive trials, cats would learn from previous behavior, limit ineffective actions, and escape from the box more quickly. He observed that the cats seemed to learn, through an intricate trial-and-error process, which actions should be continued and which should be abandoned; a well-practiced cat could quickly remember and reuse actions that had been successful in escaping to the food reward.

Thorndike's puzzle box: This image shows an example of Thorndike's puzzle box alongside a graph demonstrating the learning of a cat within the box. As the number of trials increased, the cats learned to escape more quickly.

The Law of Effect

Thorndike realized not only that stimuli and responses were associated, but also that behavior could be modified by consequences. He used these findings to publish his now-famous "law of effect" theory. According to the law of effect, behaviors that are followed by consequences that are satisfying to the organism are more likely to be repeated, and behaviors that are followed by unpleasant consequences are less likely to be repeated. Essentially, if an organism does something that brings about a desired result, the organism is more likely to do it again.
If an organism does something that does not bring about a desired result, the organism is less likely to do it again.

Law of effect: Initially, cats displayed a variety of behaviors inside the box. Over successive trials, actions that were helpful in escaping the box and receiving the food reward were replicated and repeated at a higher rate.

Thorndike's law of effect now informs much of what we know about operant
conditioning and behaviorism. According to this law, behaviors are modified by their consequences, and this basic stimulus-response relationship can be learned by the person or animal being conditioned. Once the association between behavior and consequence is established, the response is reinforced, and the association alone is enough to produce that behavior. Thorndike posited that learning was merely a change in behavior as a result of a consequence, and that if an action
brought a reward, it was stamped into the mind and available for recall later.

Basic Principles of Operant Conditioning: Skinner

B. F. Skinner was a behavioral psychologist who expanded the field by defining and elaborating on operant conditioning.

Learning Objectives

Summarize Skinner's research on operant conditioning
Operant conditioning is a theory of behaviorism that focuses on changes in an individual's observable behaviors. In operant conditioning, new or continued behaviors are impacted by new or continued consequences. Research regarding this principle of learning was first conducted by Edward L. Thorndike in the late 1800s, then brought to popularity by B. F. Skinner in the mid-1900s. Much of this research informs current practices in human behavior and interaction.

Skinner's Theories of Operant Conditioning

Almost half a century after Thorndike's publication of the law of effect, Skinner attempted to prove an extension of this theory: that all behaviors are in some way a result of operant conditioning. Skinner theorized that if a behavior is followed by reinforcement, that behavior is more likely to be repeated, but if it is followed by some sort of aversive stimulus or punishment, it is less likely to be repeated. He also believed that this learned association could end, or become extinct, if the reinforcement or punishment was removed.

B. F. Skinner: Skinner was responsible for defining the segment of behaviorism known as operant conditioning, a process by which an organism learns from its physical environment.

Skinner's Experiments

Skinner's
most famous research studies were simple reinforcement experiments conducted on lab rats and domestic pigeons, which demonstrated the most basic principles of operant conditioning. He conducted most of his research in a special operant-conditioning chamber, now referred to as a "Skinner box," which was equipped with a cumulative recorder used to analyze the behavioral responses of his test subjects. In these boxes he would present his subjects with positive reinforcement, negative reinforcement, or aversive stimuli in various timing
intervals (or "schedules") that were designed to produce or inhibit specific target behaviors.

Shaping

Shaping is a method of operant conditioning by which successive approximations of a target behavior are reinforced.

Learning Objectives

Describe how shaping is used to modify behavior
In his operant-conditioning experiments, Skinner often used an approach called shaping. Instead of rewarding only the target, or desired, behavior, the process of shaping involves the reinforcement of successive approximations of the target behavior. The method requires that the subject perform behaviors that at first merely resemble the target behavior; through reinforcement, these behaviors are gradually changed, or shaped, to encourage the performance of the target behavior itself. Shaping is useful because it is often unlikely that an organism will spontaneously display anything but the simplest of behaviors. It is a very useful tool for training animals, such as dogs, to perform difficult tasks.

Dog show: Dog training often uses the shaping method of operant conditioning.

How Shaping Works

In shaping, behaviors are broken down into many small, achievable steps. To test this method, B. F. Skinner performed shaping
experiments on rats, which he placed in an apparatus (known as a Skinner box) that monitored their behaviors. The target behavior for the rat was to press a lever that would release food. Initially, rewards are given for even crude approximations of the target behavior; in other words, even taking a step in the right direction is rewarded. Then the trainer rewards a behavior that is one step closer, or one successive approximation nearer, to the target behavior. For example, Skinner would reward the rat for
taking a step toward the lever, for standing on its hind legs, and for touching the lever, all of which were successive approximations toward the target behavior of pressing the lever.
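The successive-approximation loop described above can be sketched as a short simulation. Everything here is illustrative: the step labels and the criterion-advancing rule are hypothetical stand-ins for the training process, not data from Skinner's actual experiments.

```python
# Illustrative sketch of shaping: reward any action that meets the current
# criterion, then tighten the criterion one step toward the target behavior.

steps = [
    "takes a step toward the lever",
    "stands on hind legs",
    "touches the lever",
    "presses the lever",  # the target behavior
]

def reinforced(observed: str, criterion: int) -> bool:
    """Reward any action at or beyond the current approximation."""
    return steps.index(observed) >= criterion

# Training loop: suppose the rat happens to produce each approximation in
# turn; each success advances the criterion toward the target behavior.
criterion = 0
for observed in steps:
    if reinforced(observed, criterion):
        criterion = min(criterion + 1, len(steps) - 1)

print(steps[criterion])  # the criterion has advanced to "presses the lever"
```

In a real training session the subject's actions arrive in no fixed order, and the criterion is raised only after the current approximation is performed reliably; the loop above compresses that into one pass for clarity.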
Applications of Shaping

This process has been replicated with other animals, including humans, and is now common practice in many training and teaching methods. It is commonly used to train dogs to follow verbal commands or become housebroken: while puppies can rarely perform the target behavior automatically, they can be shaped toward this behavior by successively rewarding behaviors that come close.

Reinforcement and Punishment

Reinforcement and punishment are principles of operant conditioning that increase or decrease the likelihood of a behavior.

Learning Objectives

Differentiate among primary, secondary, conditioned, and unconditioned reinforcers
Reinforcement and punishment are principles that are used in operant conditioning. Reinforcement means you are increasing a behavior: it is any consequence or outcome that increases the likelihood of a particular behavioral response (and that therefore reinforces the behavior). The strengthening effect on the behavior can manifest in multiple ways, including higher frequency, longer duration, greater magnitude, and shorter latency of response. Punishment means you are decreasing a behavior: it is any consequence or outcome that decreases the likelihood of a behavioral response.

Positive and Negative Reinforcement and Punishment

Both reinforcement and punishment can be positive or negative. In operant conditioning, positive and negative do not mean good and bad. Instead, positive means you are adding something and negative means you are taking something away. All of these methods can manipulate the behavior of a subject, but each works in a unique fashion.

Operant conditioning: In the context of operant conditioning, whether you are reinforcing or punishing a behavior, "positive" always means you are adding a stimulus (not necessarily a good one), and "negative" always means you are removing a stimulus (not necessarily a bad one). See the blue text and yellow text above, which represent positive and negative, respectively. Similarly, reinforcement always means you are increasing (or maintaining) the level of a behavior, and punishment always means you are decreasing the level of a behavior. See the green and red backgrounds above, which represent reinforcement and punishment, respectively.
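The two distinctions above (adding vs. removing a stimulus, increasing vs. decreasing a behavior) combine into exactly four terms, which can be captured in a tiny lookup. The function and the parenthetical examples are our own illustration of the terminology, not part of the original text.

```python
# The 2x2 of operant conditioning terminology:
# "positive/negative" = a stimulus is added/removed;
# "reinforcement/punishment" = the behavior increases/decreases.

def classify(stimulus_added: bool, behavior_increases: bool) -> str:
    sign = "positive" if stimulus_added else "negative"
    effect = "reinforcement" if behavior_increases else "punishment"
    return f"{sign} {effect}"

print(classify(True, True))    # positive reinforcement (e.g., giving a treat)
print(classify(False, True))   # negative reinforcement (e.g., waiving a chore)
print(classify(True, False))   # positive punishment (e.g., adding a scolding)
print(classify(False, False))  # negative punishment (e.g., taking away a toy)
```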
Primary and Secondary Reinforcers

The stimulus used to reinforce a certain behavior can be either primary or secondary. A primary reinforcer, also called an unconditioned reinforcer, is a stimulus that has innate reinforcing qualities. These kinds of reinforcers are not learned. Water, food, sleep, shelter, sex, touch, and pleasure are all examples of primary reinforcers: organisms do not lose their drive for these things. Some primary reinforcers, such as drugs and alcohol, merely mimic the effects of other reinforcers. For most people, jumping into a cool lake on a very hot day would be innately reinforcing: the water would cool the person off (a physical need) as well as provide pleasure.

Schedules of Reinforcement

Reinforcement schedules determine how and when a behavior will be followed by a reinforcer.

Learning Objectives

Compare and contrast different types of reinforcement schedules
A schedule of reinforcement is a tactic used in operant conditioning that influences how an operant response is learned and maintained. Each type of schedule imposes a rule or program that attempts to determine how and when a desired behavior occurs. Behaviors are encouraged through the use of reinforcers, discouraged through the use of punishments, and rendered extinct by the complete removal of a stimulus. Schedules vary from simple ratio- and interval-based schedules to more complicated compound schedules that combine one or more simple strategies to manipulate behavior.

Continuous vs. Intermittent Schedules

Continuous schedules reward a behavior after every performance of the desired behavior. This reinforcement schedule is the quickest way to teach someone a behavior, and it is especially effective in teaching a new behavior. Simple intermittent (sometimes referred to as partial) schedules, on the other hand, only reward the behavior after certain ratios or intervals of responses.

Types of Intermittent Schedules

There are several different types of intermittent reinforcement schedules. These schedules are described as either fixed or variable and as either interval or ratio.

Fixed vs. Variable, Ratio vs. Interval

Fixed refers to when the number of responses between reinforcements, or the amount of time between reinforcements, is set and unchanging. Variable refers to when the number of responses or amount of time between reinforcements varies or changes. Interval means the schedule is based on the time between reinforcements, and ratio means the schedule is based on the number of responses between reinforcements. Simple intermittent schedules are a combination of these terms, creating the following four types of schedules: fixed-ratio, variable-ratio, fixed-interval, and variable-interval.
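The four simple intermittent schedules (fixed-ratio, variable-ratio, fixed-interval, variable-interval) can each be expressed as a rule that decides whether a given response is reinforced. This is a minimal sketch under assumed, arbitrary parameter values (5 responses, 10 seconds), not a model of any particular experiment.

```python
import random

# Each schedule is a rule deciding whether the current response earns a
# reinforcer. Ratio rules count responses; interval rules measure time.

def fixed_ratio(response_count, n=5):
    """Reinforce every n-th response (e.g., pay per 5 items made)."""
    return response_count % n == 0

def variable_ratio(response_count, mean_n=5):
    """Reinforce after a varying number of responses averaging mean_n
    (e.g., a slot machine paying out unpredictably)."""
    return random.random() < 1.0 / mean_n

def fixed_interval(seconds_since_reinforcer, t=10):
    """Reinforce the first response once t seconds have elapsed
    (e.g., an hourly paycheck, scaled down)."""
    return seconds_since_reinforcer >= t

def variable_interval(seconds_since_reinforcer, mean_t=10):
    """Reinforce the first response after a varying interval averaging
    mean_t (e.g., a fish biting at an unpredictable moment)."""
    return seconds_since_reinforcer >= random.expovariate(1.0 / mean_t)

print(fixed_ratio(10))     # True: the 10th response is a multiple of 5
print(fixed_interval(3))   # False: only 3 of the required 10 seconds elapsed
```

The "fixed" rules are deterministic, which is why behavior under them pauses right after each reinforcer; the "variable" rules are stochastic, so the next reinforcer could always be one response or one moment away.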
All of these schedules have different advantages. In general, ratio schedules elicit higher response rates than interval schedules, because in a ratio schedule reinforcement depends on the number of responses rather than on the passage of time. For example, if you are a factory worker who gets paid per item that you manufacture, you will be motivated to manufacture these items quickly and consistently. Variable schedules are less predictable, so they tend to resist extinction and encourage continued behavior. Both gamblers and fishermen can understand the feeling that one more pull on the slot-machine lever, or one more hour on the lake, will change their luck and elicit their respective rewards. Thus, they continue to gamble and fish, regardless of previously unsuccessful feedback.

Simple reinforcement-schedule responses: The four reinforcement schedules yield different response patterns. The variable-ratio schedule is unpredictable and yields high and steady response rates, with little if any pause after reinforcement (e.g., gambling). A fixed-ratio schedule is predictable and produces a high response rate, with a short pause after reinforcement (e.g., eyeglass sales). The variable-interval schedule is unpredictable and produces a moderate, steady response rate (e.g., fishing). The fixed-interval schedule yields a scallop-shaped response pattern, reflecting a significant pause after reinforcement (e.g., hourly employment).

Extinction of a reinforced behavior occurs at some point after reinforcement stops, and the speed at which this happens depends on the reinforcement schedule. Among the reinforcement schedules, variable-ratio is the most resistant to extinction, while fixed-interval is the easiest to extinguish.

Simple vs. Compound Schedules

All of the examples described above are referred to as simple schedules. Compound schedules combine at least two simple schedules and use the same reinforcer for the same behavior.
Compound schedules are often seen in the workplace: for example, if you are paid at an hourly rate
(fixed-interval) but also have an incentive to receive a small commission for certain sales (fixed-ratio), you are being reinforced by a compound schedule. Additionally, if there is an end-of-year bonus given to only three employees based on a lottery system, you'd be motivated by a variable schedule.
Review Questions

What are positive consequences that motivate behavior called?
Positive consequences that motivate behavior are referred to as reinforcers.
What is the act of applying a consequence that increases the likelihood that a person will repeat the behavior that led to it?
Reinforcement is defined as a consequence that follows an operant response and that increases (or attempts to increase) the likelihood of that response occurring in the future.
Which theory holds that people will choose certain behaviors over others with the expectation of a certain outcome?
Expectancy theory (or the expectancy theory of motivation) proposes that an individual will behave or act in a certain way because they are motivated to select a specific behavior over others based on what they expect the result of that behavior will be.
What is a positive reinforcement strategy that rewards successive approximations to a desirable behavior?
Shaping encourages the formation of desirable work behaviors by rewarding successive approximations to those behaviors.