CHAPTER FIVE

Sampling Distributions 

 

In this chapter we will introduce one of the most important concepts in statistics, that of the sampling distribution.

 

Example 5.1: A population consisting of exam grades of N = 5 students taking the Engineering Statistics course is described below

 

Table 1 Distribution of Grades for the Engineering Statistics Class

Name        Ordinal Grade   Grade Points
Mohammad    B               3
Abdallah    C               2
Khaled      B               3
Ali         A               4
Saad        C               2

 

 

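It is easy to check that for the grade points ($X$), the population mean and variance are given by

$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{3 + 2 + 3 + 4 + 2}{5} = 2.8$

$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 = \frac{2.8}{5} = 0.56$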

 

Now consider samples of size n = 2, and list all possible samples drawn without replacement. There are $\binom{5}{2} = 10$ such samples, each equally likely. The sample means and the corresponding probabilities are tabulated below:

 

            Table 2 All Possible Samples of Size 2 with their Probabilities

$\bar{x}$    Sample Units                                    $P(\bar{X} = \bar{x})$
2.0          (Abdallah, Saad)                                0.10
2.5          (Mohammad, Abdallah), (Mohammad, Saad),         0.40
             (Abdallah, Khaled), (Khaled, Saad)
3.0          (Mohammad, Khaled), (Abdallah, Ali),            0.30
             (Ali, Saad)
3.5          (Mohammad, Ali), (Khaled, Ali)                  0.20

 

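The expected value of the sample mean is

$E(\bar{X}) = \sum \bar{x}\, P(\bar{X} = \bar{x}) = 2.0(0.10) + 2.5(0.40) + 3.0(0.30) + 3.5(0.20) = 2.8$

which is the same as the population mean $\mu$. The expected value of $\bar{X}^2$ is given by

$E(\bar{X}^2) = \sum \bar{x}^2\, P(\bar{X} = \bar{x}) = 4.00(0.10) + 6.25(0.40) + 9.00(0.30) + 12.25(0.20) = 8.05$

so that $\mathrm{Var}(\bar{X}) = E(\bar{X}^2) - [E(\bar{X})]^2 = 8.05 - (2.8)^2 = 0.21$.

It may be shown that

$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1} = \frac{0.56}{2} \cdot \frac{5 - 2}{5 - 1} = 0.21$

That is, the variance of the sample mean $\bar{X}$ can be calculated from the population variance $\sigma^2$. Note that when $N$ is large relative to $n$, we have $\frac{N - n}{N - 1} \approx 1$ and consequently $\mathrm{Var}(\bar{X}) \approx \frac{\sigma^2}{n}$; in the case of sampling with replacement, $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$ holds exactly.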
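The calculations in Example 5.1 can be checked by brute force. The following is a minimal Python sketch (not part of the Statistica workflow used in this chapter) that enumerates every sample of size 2 without replacement, builds the sampling distribution of the sample mean, and compares its variance with the finite population formula. The variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Grade points of the N = 5 students in Table 1
grades = {"Mohammad": 3, "Abdallah": 2, "Khaled": 3, "Ali": 4, "Saad": 2}
N, n = len(grades), 2

# Population mean and variance (divisor N, since this is the whole population)
mu = Fraction(sum(grades.values()), N)
sigma2 = sum((x - mu) ** 2 for x in grades.values()) / N

# All samples of size n drawn without replacement; each is equally likely
samples = list(combinations(grades, n))
counts = Counter(Fraction(sum(grades[name] for name in s), n) for s in samples)
dist = {xbar: Fraction(c, len(samples)) for xbar, c in sorted(counts.items())}

# Mean and variance of the sampling distribution of the sample mean
e_xbar = sum(xbar * p for xbar, p in dist.items())
var_xbar = sum(xbar ** 2 * p for xbar, p in dist.items()) - e_xbar ** 2

# Variance from the finite population formula
var_formula = sigma2 / n * Fraction(N - n, N - 1)

for xbar, p in dist.items():
    print(float(xbar), float(p))
print(float(e_xbar), float(var_xbar), float(var_formula))
```

Exact fractions are used so that the comparison between the enumerated variance and the formula is not affected by rounding.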

 

 

5.1 Sampling Distributions of Sums and Means and the Central Limit Theorem

 

Example 5.2 One hundred bolts are packed in a plastic box. The weight of the empty box can be ignored. Each bolt has a mean weight of 1 ounce with standard deviation σ = 0.01 ounce. Assume that the weights of the bolts are independent and follow a normal distribution.

 

(a) Find the probability that a box filled with one hundred bolts weighs more than 100.196 ounces.

(b) Find the probability that the mean weight of the 100 bolts is more than 1.00196 ounces.

 

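Solution

    (a) Let $T = X_1 + X_2 + \cdots + X_{100}$ denote the total weight of the 100 bolts. Assuming the weights are independent, $T \sim N(100(1),\ 100(0.01)^2) = N(100,\ 0.01)$, so that $\sigma_T = 0.1$ and

                $P(T > 100.196) = P\left(Z > \frac{100.196 - 100}{0.1}\right) = P(Z > 1.96) = 0.025$.

 

    (b) Since

              $\bar{X} \sim N\left(1,\ \frac{(0.01)^2}{100}\right)$, so that $\sigma_{\bar{X}} = 0.001$,

                    $P(\bar{X} > 1.00196) = P\left(Z > \frac{1.00196 - 1}{0.001}\right) = P(Z > 1.96) = 0.025$.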

Note that the events in (a) and (b) are equivalent.
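As a cross-check on Example 5.2, here is a short Python sketch using `statistics.NormalDist` from the standard library in place of a normal table. It assumes, as the solution does, that the bolt weights are independent, and it shows that the two probabilities coincide.

```python
from math import sqrt
from statistics import NormalDist

n = 100        # bolts per box
mu = 1.0       # mean weight of one bolt (ounces)
sigma = 0.01   # standard deviation of one bolt (ounces)

# (a) Total weight T ~ N(n*mu, n*sigma^2), assuming independent bolt weights
total = NormalDist(n * mu, sqrt(n) * sigma)
p_total = 1 - total.cdf(100.196)

# (b) Mean weight Xbar ~ N(mu, sigma^2 / n)
mean_weight = NormalDist(mu, sigma / sqrt(n))
p_mean = 1 - mean_weight.cdf(1.00196)

print(round(p_total, 4), round(p_mean, 4))
```

Both probabilities are the upper tail area beyond z = 1.96, which is why the events in (a) and (b) are equivalent.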

 

Example 5.3  The weights of ball bearings have a distribution with a mean of 22.40 ounces and a standard deviation of 0.048 ounces. If a random sample of size 36 is drawn from this population, find the probability that the sample mean lies between 22.39 and 22.42.

 

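Solution Let X = weight of a ball bearing. Then

$\mu_{\bar{X}} = \mu = 22.40$

and

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{0.048}{\sqrt{36}} = 0.008$

Since n = 36 is large, the central limit theorem implies that $\bar{X}$ is approximately normally distributed, and hence

$P(22.39 < \bar{X} < 22.42) = P\left(\frac{22.39 - 22.40}{0.008} < Z < \frac{22.42 - 22.40}{0.008}\right) = P(-1.25 < Z < 2.50) = 0.9938 - 0.1056 = 0.8882$.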
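The probability in Example 5.3 can be reproduced with a few lines of Python. This is only a sketch of the central limit theorem calculation, treating the sample mean as approximately normal with mean 22.40 and standard deviation 0.048/√36; it is not a Statistica procedure.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 22.40, 0.048, 36

# By the central limit theorem, Xbar is approximately N(mu, sigma^2 / n)
xbar = NormalDist(mu, sigma / sqrt(n))

# P(22.39 < Xbar < 22.42)
p = xbar.cdf(22.42) - xbar.cdf(22.39)
print(round(p, 4))
```

The result is about 0.888; any small difference from a hand calculation comes from rounding the table values.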

 

 

5.2 The Normal Approximation to the Binomial Distribution

 

Let X be a binomial random variable with $n$ trials and success probability p. Then probabilities of events related to X can be approximated by a normal distribution with mean $\mu = np$ and variance $\sigma^2 = np(1 - p)$ if the conditions $np \ge 5$ and $n(1 - p) \ge 5$ are satisfied.

 

Continuity Correction

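The continuity correction is the adjustment made when an integer-valued discrete random variable is approximated by a continuous random variable.  For a binomial random variable X approximated by a normal random variable Y, we widen each event by adding or subtracting 0.5 as follows:

 

                                   $P(X = a) \approx P(a - 0.5 < Y < a + 0.5)$

                                   $P(X \le a) \approx P(Y < a + 0.5)$

                                   $P(X < a) \approx P(Y < a - 0.5)$

                                   $P(X \ge a) \approx P(Y > a - 0.5)$

                                   $P(X > a) \approx P(Y > a + 0.5)$.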

 

The continuity correction should be applied anytime a discrete random variable is being approximated by a continuous random variable.

 

Example 5.4  The pass mark in an examination is the median mark.  A random sample of 10 candidates is chosen after the examination.

 

(a)    What is the distribution of the random variable “the number of students who passed the examination”?

(b)   Find the probability that more than 2 of the selected candidates passed the examination.

(c)    Find the probability that at least 2 of the selected candidates passed the examination.

(d)   Solve parts (b) and (c) using the normal approximation.

 

 

Solution

(a)    Any candidate picked either passes or fails the examination (i.e., there are two mutually exclusive outcomes at each trial).  Since the pass mark is the median mark, 50% of the candidates passed the examination, so that p = 0.5. The trials are assumed to be independent. Thus, the random variable “the number of students who passed the examination” is a binomial random variable with n = 10 and p = 0.5.

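(b)   If we represent the random variable in (a) by X, then we are interested in the probability $P(X > 2)$.  Using the binomial probabilities, we have

            $P(X > 2) = 1 - P(X \le 2) = 1 - \sum_{x=0}^{2} \binom{10}{x} (0.5)^{10} = 1 - \frac{1 + 10 + 45}{1024} = 0.9453$

             

            whereas by using Statistica we have:

 

                     $P(X > 2) \approx 0.9453$.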

 

 

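(c)    Using the binomial probabilities, we have

 

            $P(X \ge 2) = 1 - P(X \le 1) = 1 - \sum_{x=0}^{1} \binom{10}{x} (0.5)^{10} = 1 - \frac{1 + 10}{1024} = 0.9893$

 

            whereas by using Statistica we have:

 

                     $P(X \ge 2) \approx 0.9893$.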

 

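(d)   Since $np = 10(0.5) = 5 \ge 5$ and $n(1 - p) = 5 \ge 5$, we use the normal approximation to the binomial, so that the random variable X is approximately normally distributed with mean $\mu = np = 5$ and variance $\sigma^2 = np(1 - p) = 2.5$.  To solve the problems using the approximation, open the Normal Probability Calculator and enter 5 for the mean and $\sqrt{2.5} \approx 1.5811$ for the standard deviation. Below, Y denotes the approximating normal random variable.

 

For the problem in part (b), we have

            $P(X > 2) \approx P(Y > 2.5) = P\left(Z > \frac{2.5 - 5}{1.5811}\right) = P(Z > -1.5811) \approx 0.9431$

 

For the problem in part (c), we have

            $P(X \ge 2) \approx P(Y > 1.5) = P\left(Z > \frac{1.5 - 5}{1.5811}\right) = P(Z > -2.2136) \approx 0.9866$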
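To compare the exact and approximate answers in Example 5.4 side by side, the following Python sketch computes the binomial probabilities directly and then applies the normal approximation with the continuity correction. It uses only the standard library; the helper name `binom_pmf` is illustrative and does not correspond to any Statistica function.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 10, 0.5

def binom_pmf(k):
    """Exact binomial probability P(X = k) for n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exact binomial probabilities
exact_b = 1 - sum(binom_pmf(k) for k in range(0, 3))   # P(X > 2)
exact_c = 1 - sum(binom_pmf(k) for k in range(0, 2))   # P(X >= 2)

# Normal approximation with mean np and variance np(1 - p)
approx = NormalDist(n * p, sqrt(n * p * (1 - p)))
approx_b = 1 - approx.cdf(2.5)   # P(X > 2)  is approximated by P(Y > 2.5)
approx_c = 1 - approx.cdf(1.5)   # P(X >= 2) is approximated by P(Y > 1.5)

print(round(exact_b, 4), round(approx_b, 4))
print(round(exact_c, 4), round(approx_c, 4))
```

With n as small as 10 the approximation agrees with the exact values only to about two decimal places, which is what one would expect at the boundary of the rule of thumb.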

 

5.3 Drawing a Random Sample from a Known Distribution

 

Random samples are drawn from a known distribution by evaluating the inverse cumulative distribution function at random probabilities.  For example, suppose a random variable X has cumulative distribution function F, i.e., $F(x) = P(X \le x) = p$, where x is a value in the domain of X, and hence $x = F^{-1}(p)$.  Thus, by supplying random values between 0 and 1 for p, we obtain values of x, which constitute a random sample from the distribution with cumulative distribution function F.

 

The V-functions and Rnd function

In Statistica, the V-functions are the inverse cumulative distribution functions and can be found in the Function Wizard.  There is VNormal for the normal distribution, VExpon for the exponential distribution, and so on.

 

Another function in the Function Wizard is the Rnd(x) function, which gives random numbers between 0 and x. For example, Rnd(1) gives a random number between 0 and 1, which can be used to represent random probability values.

Sampling from the Normal Distribution

The inverse cumulative distribution function of the normal distribution is VNormal(x, mu, sigma), where x is a probability value, mu is the mean of the normal distribution and sigma is its standard deviation.  To draw a random sample from the normal distribution with mean 60 and variance 25 (so that sigma = 5), follow the steps below:

 

1.      Double-click VAR1

2.      Type =Rnd(1) in the formula box

3.      OK/Yes

4.      Double-click VAR2

5.      Type the equal sign “=” in the formula box

6.      Functions/Distributions, Double-click  VNormal

7.       x = v1, mu = 60 and sigma = 5

8.      OK/Yes.

 

The values given in VAR2 constitute the required sample. Alternatively, we can obtain the sample without using two columns of our data sheet.  For example, if we wish to have our sample in VAR3, we will proceed as follows:

 

1.      Double-click VAR3

2.      Type the equal sign “=” in the formula box

3.      Functions/Distributions, Double-click VNormal

4.      x = Rnd(1), mu = 60 and sigma = 5

5.      OK/Yes.
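The same inverse-CDF idea can be sketched outside Statistica. In the Python example below, `random.random()` plays the role of Rnd(1) and `NormalDist.inv_cdf` plays the role of VNormal; these are standard-library stand-ins chosen for illustration, not the Statistica functions themselves. The sample size of 1000 and the seed are arbitrary assumptions.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(1)  # fixed seed, only so that the run is repeatable

dist = NormalDist(mu=60, sigma=5)   # mean 60, variance 25

def draw(size):
    """Draw a sample by evaluating the inverse CDF at random probabilities."""
    sample = []
    while len(sample) < size:
        p = random.random()          # random probability in [0, 1)
        if p > 0.0:                  # inv_cdf requires 0 < p < 1
            sample.append(dist.inv_cdf(p))
    return sample

sample = draw(1000)
print(round(mean(sample), 2), round(stdev(sample), 2))
```

The sample mean and standard deviation should land close to 60 and 5, with the usual sampling variation.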

 

Sampling from the Exponential Distribution

The inverse cumulative distribution function of the exponential distribution is VExpon(x, lambda), where x is a probability value and lambda is the parameter of the exponential distribution. Care must be taken when supplying the value for lambda.  For example, if we are sampling from an exponential distribution with parameter k, we directly substitute k for lambda. On the other hand, if we are sampling from an exponential distribution with mean k, we substitute 1/k for lambda.  To draw a sample from the exponential distribution with parameter 0.5 (that is, with mean 2), follow these steps:

 

1.      Double-click VAR1

2.      Type = Rnd(1) in the formula box

3.      OK/Yes

4.      Double-click VAR2

5.      Type the equal sign “=” in the formula box

6.      Functions/Distributions, Double-click VExpon

7.      x = v1, lambda = 0.5

8.      OK/Yes.


Here, we can also draw the random sample using only one column of our data sheet as follows:

 

1.      Double-click VAR3

2.      Type the equal sign “=” in the formula box

3.      Functions/Distributions, Double-click VExpon

4.      x = Rnd(1), lambda = 0.5

5.      OK/Yes.
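For the exponential distribution the inverse cumulative distribution function has a closed form: solving $p = 1 - e^{-\lambda x}$ for x gives $x = -\ln(1 - p)/\lambda$. The Python sketch below applies this formula to random probabilities. The function name `exp_inv_cdf`, the seed, and the sample size are illustrative assumptions; the code does not claim to reproduce how VExpon is implemented.

```python
import math
import random
from statistics import mean

random.seed(2)  # fixed seed, only so that the run is repeatable

def exp_inv_cdf(p, lam):
    """Inverse CDF of the exponential distribution with parameter (rate) lam."""
    return -math.log(1.0 - p) / lam

lam = 0.5   # parameter 0.5, i.e. mean 1/lam = 2

# random.random() returns values in [0, 1), so 1 - p is never zero
sample = [exp_inv_cdf(random.random(), lam) for _ in range(1000)]
print(round(mean(sample), 2))
```

Passing the mean instead of the parameter (2 instead of 0.5) would give a sample with mean near 0.5, which is the mistake the text warns about.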

 

One hundred random samples, each of size 30, are drawn from the exponential distribution with mean 2 (λ = 0.5); their means are tabulated below:

                                                                                                          

  Table 3 Means of 100 Random Samples of Size 30 from the Exponential Distribution

Sample  Mean        Sample  Mean        Sample  Mean        Sample  Mean
1       2.184054    26      2.082070    51      2.282192    76      1.783575
2       1.814839    27      1.445744    52      1.504987    77      1.587805
3       1.845744    28      1.758239    53      2.398457    78      2.620010
4       2.204828    29      1.997465    54      2.227194    79      1.679762
5       2.556562    30      1.758824    55      2.314496    80      1.274267
6       1.931955    31      1.850869    56      1.866983    81      2.160579
7       2.179001    32      1.393448    57      1.697291    82      1.310457
8       1.977819    33      1.849390    58      1.483425    83      2.149534
9       2.580562    34      1.806390    59      2.121959    84      2.210060
10      2.116091    35      2.342794    60      2.005428    85      2.403257
11      1.992988    36      1.872477    61      2.249343    86      1.801674
12      2.321373    37      1.948877    62      1.733591    87      1.759119
13      2.037973    38      1.829172    63      1.718578    88      2.018302
14      2.437841    39      1.457925    64      1.296010    89      1.549417
15      2.149032    40      2.142974    65      2.332216    90      2.446048
16      1.758222    41      2.096374    66      1.944724    91      1.754672
17      1.756615    42      1.900634    67      2.360446    92      1.968279
18      2.027894    43      2.054499    68      2.815900    93      1.729697
19      2.010520    44      2.077696    69      2.436328    94      1.959883
20      1.877407    45      2.126649    70      2.168109    95      2.062244
21      1.711968    46      2.091533    71      1.683284    96      1.808607
22      1.823209    47      2.304099    72      2.290407    97      1.378574
23      2.750169    48      1.565620    73      1.601436    98      1.950490
24      2.146609    49      2.700095    74      1.694431    99      1.888556
25      1.933193    50      2.321632    75      2.279583    100     2.009424

 

 

 

The one hundred sample means in Table 3 have been used to draw a histogram (see Figure 5.1), which is approximately bell shaped. The larger the sample size, the better the normal frequency curve approximates the distribution of the sample mean.

 

Figure 5.1 Histogram of the sample means in Table 3
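An experiment of the same kind as the one behind Table 3 and Figure 5.1 can be repeated with a short simulation. The Python sketch below draws 100 samples of size 30 from an exponential distribution and prints a rough text histogram of the sample means. The rate 0.5 (mean 2) is an assumption chosen to match the earlier exponential example, and the seed and bin width are arbitrary; the numbers produced will not match Table 3.

```python
import math
import random
from statistics import mean, stdev

random.seed(3)  # fixed seed, only so that the run is repeatable

lam = 0.5      # assumed rate (mean 1/lam = 2)
n = 30         # size of each sample
reps = 100     # number of samples

# One sample mean per repetition
sample_means = [
    mean(random.expovariate(lam) for _ in range(n))
    for _ in range(reps)
]

# Text histogram of the sample means with bins of width 0.25
width = 0.25
bins = {}
for m in sample_means:
    index = math.floor(m / width)
    bins[index] = bins.get(index, 0) + 1

for index in sorted(bins):
    left = index * width
    print(f"{left:4.2f} to {left + width:4.2f}  {'*' * bins[index]}")

print("mean of sample means:", round(mean(sample_means), 3))
print("sd of sample means:  ", round(stdev(sample_means), 3))
print("theoretical sd:      ", round((1 / lam) / math.sqrt(n), 3))
```

Increasing `n` while keeping `reps` fixed should make the histogram narrower and more symmetric, in line with the central limit theorem.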

 

5.4 Use of t, Chi-Square and F Tables

 

Using the Student t table in Appendix A3 we can find the following:

For a t random variable with 8 degrees of freedom, P(t > 2.896) = 0.01

For a t random variable with 17 degrees of freedom, P(t < –2.898) = 0.005

For a t random variable with 21 degrees of freedom, P(t < –1.721) = 0.05

 

Using the $\chi^2$ table in Appendix A4 we can find the following:

For a $\chi^2$ random variable with 7 degrees of freedom, $P(\chi^2 > 2.17) = 0.95$

For a $\chi^2$ random variable with 17 degrees of freedom, $P(\chi^2 < 33.41) = 0.99$

For a $\chi^2$ random variable with 28 degrees of freedom, $P(\chi^2 > 48.28) = 0.01$

 

Using the F table in Appendix A5 we can find the following:

For an F random variable with 4 and 7 degrees of freedom, P(F > 4.12) = 0.05

For an F random variable with 15 and 21 degrees of freedom, P(F > 2.534) = 0.025

For an F random variable with 12 and 9 degrees of freedom, P(F > 5.111) = 0.01
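The table values above can be checked approximately without any statistical software by integrating the density functions numerically. The Python sketch below does this with Simpson's rule and the standard formulas for the t, chi-square and F densities. It is an illustration only: the function names are made up for this example, the accuracy is limited by the numerical integration, and the densities are evaluated at zero, so the chi-square and first F degrees of freedom are assumed to be at least 2.

```python
from math import exp, gamma, pi, sqrt

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def t_pdf(x, df):
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def chi2_pdf(x, df):
    return x ** (df / 2 - 1) * exp(-x / 2) / (2 ** (df / 2) * gamma(df / 2))

def f_pdf(x, d1, d2):
    beta = gamma(d1 / 2) * gamma(d2 / 2) / gamma((d1 + d2) / 2)
    return ((d1 / d2) ** (d1 / 2) * x ** (d1 / 2 - 1)
            * (1 + d1 * x / d2) ** (-(d1 + d2) / 2) / beta)

def t_upper(x, df):
    """P(t > x) for x >= 0, using the symmetry of the t density about zero."""
    return 0.5 - simpson(lambda u: t_pdf(u, df), 0.0, x)

def chi2_upper(x, df):
    """P(chi-square > x)."""
    return 1.0 - simpson(lambda u: chi2_pdf(u, df), 0.0, x)

def f_upper(x, d1, d2):
    """P(F > x)."""
    return 1.0 - simpson(lambda u: f_pdf(u, d1, d2), 0.0, x)

print(round(t_upper(2.896, 8), 3))        # compare with 0.01
print(round(t_upper(2.898, 17), 3))       # compare with 0.005 (by symmetry)
print(round(chi2_upper(2.17, 7), 3))      # compare with 0.95
print(round(1 - chi2_upper(33.41, 17), 3))  # compare with 0.99
print(round(f_upper(4.12, 4, 7), 3))      # compare with 0.05
```

Because the t density is symmetric about zero, a lower-tail probability such as P(t < –2.898) is obtained from the corresponding upper-tail probability P(t > 2.898).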

 

 


5.5 The Probability Calculator for t, Chi-Square and F Distributions

 

The t Probability Calculator

In the Probability Distribution Calculator, select t (Student) for the t-distribution (see Figure 5.2) and supply the degrees of freedom (df).

 

Figure 5.2 Probability Calculator for the t-distribution

 

It is important to note that the t-distribution is symmetric about zero regardless of the value of the df.

 

Figure 5.2 shows that for the t random variable with 5 df, the probability that t is less than 0.3 is 0.611875, i.e., $P(t < 0.3) = 0.611875$, where $t$ has a Student t-distribution with 5 degrees of freedom. The Two-tailed and (1-Cumulative p) probabilities and their combinations are computed in the same way as for the standard normal random variable.

 

The value $t_{\alpha}$ is the value of the t random variable for a given df that has an area (or a probability) of $\alpha$ to its right.  The value $t_{\alpha/2}$ is similarly defined.

 

 

The Chi-Square Probability Calculator

Selecting Chi2 under Probability Distribution Calculator gives the calculator for the chi-square random variable.  Just as in the case of the t-distribution, the parameter to be supplied here is the df.  However, the Two-tailed option on this calculator is disabled and cannot be used.  The (1-Cumulative p) option and the Inverse function are used in the same way as for the normal distribution.

 

The F Probability Calculator

To get the probability calculator for the F distribution, we select F from the list of distributions.  Here, we are required to supply two parameters: the first and the second df (df1 and df2, respectively).  Figure 5.3 shows that for the F distribution with 3 and 20 df, the 95th percentile is 3.098389.

 

Figure 5.3 Probability Calculator for the F distribution

 

Note that the Two-tailed option on this calculator is disabled and cannot be used.

 

 


Exercises

 

5.1       (Johnson, R. A., 2000, 212). A random sample of size 100 is taken from an infinite population having mean 76 and variance 256. What is the probability that the sample mean will be between 75 and 78?

 

5.2       (Johnson, R. A., 2000, 212). A wire-bonding process is said to be in control if the mean pull-strength is 10 pounds. It is known that the pull-strength measurements are normally distributed with a standard deviation of 1.5 pounds. Periodic random samples of size 4 are taken from this process and the process is said to be “out of control” if a sample mean is less than 7.75 pounds. Comment.

 

5.3       The weights of ball bearings have a distribution with a mean of 22.40 ounces and a standard deviation of 0.048 ounces. If a random sample of size 49 is drawn from this population, find the probability that the

 

(a)    sample mean lies between 22.36 and 22.41,

(b)   sample mean is more than 22.38,

(c)    sample mean is no more than 22.43,

(d)   sample mean is greater than or equal to 22.41.

 

5.4       Suppose X is normally distributed with mean 50 and variance 9.  Let $\bar{X}$ be the sample mean, regarded as a random variable in the sense of drawing repeated samples of size 16 from the distribution of X.  Find the probability that

 

(a) $\bar{X}$ differs from the mean by less than 2.5 units,

(b) $\bar{X}$ differs from the mean by more than 1.5 units,

(c)  is between 1.8 and 2.6.

 

5.5       Consider a binomial random variable with 50 trials and success probability 0.39.

 

(a)    Compute the probability that it is at least equal to 15.

(b)   Compute the probability that it is at most equal to 12.

(c)    Compute the probability that it is equal to 20.

(d)   Compute the probability that it is equal to 30.

(e)    Repeat (a) to (d) using the normal approximation to the binomial.

 

5.6       Consider a binomial random variable with 100 trials and success probability 0.45.

 

(a)    Compute the probability that it is at least equal to 25.

(b)   Compute the probability that it is at most equal to 32.

(c)    Compute the probability that it is less than or equal to 20.

(d)   Compute the probability that it is equal to 40.

(e)    Repeat (a) to (d) using the normal approximation to the binomial.

 

5.7       Draw a random sample of size 200 from a normal distribution with mean 40 and variance 36. Compute the mean and standard deviation of your sample.

 

5.8       Draw a random sample of size 60 from a normal distribution with mean 10 and variance 4. Compute the mean and standard deviation of your sample.

 

5.9       Draw a random sample of size 100 from a normal distribution with mean 20 and variance 25. Compute the mean and standard deviation of your sample.

 

5.10     Draw a random sample of size 100 from an exponential distribution with λ = 4.  Compute the mean and standard deviation of your sample.

 

5.11     Draw a random sample of size 120 from an exponential distribution with mean 2. Compute the mean and standard deviation of your sample.

 

5.12     Suppose X is normally distributed with mean 60 and variance 16. Find the probability that $\bar{X}$, based on samples of size 9, differs from the mean by less than 2.5 units.

 

5.13     Consider a binomial random variable with 20 trials and success probability 0.45.

 

(a) Compute the probability that it is at least equal to 3.

(b) Compute the probability that it is at most equal to 12.

(c) Compute the probability that it is equal to 3.

(d) Compute the probability that it is equal to 12.

(e) Repeat (a) to (d) using the normal approximation to the binomial.

 

5.14     Draw a random sample of size 36 from a normal distribution with mean 10 and variance 4. Compute the mean and standard deviation of your sample.

 

5.15     Consider an exponential distribution with expected value 10.   Draw  samples of size  from the above population. Draw a relative frequency histogram and relative frequency curve for the 100 sample means. Repeat the experiment with  samples of size . What is the sampling distribution of the sample means?

 

5.16     Consider the t random variable with 10 df.

 

(a) Find the proportion of the area to the right of 2.1.

(b) Find the probability that t is less than 2.

(c) Find the proportion of the area to the left of –2.1.

(d) Find the proportion of the area between –2.1 and +2.1.

(e) Find the proportion of the area between –1.2 and +2.1.

 

5.17     Consider a  random variable with 9 degrees of freedom.

Find .

 

5.18     Complete the table for t random variable with 19 df. Note that the relationship between   and    is: 

 

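0.01      ______      2.539488    ______      ______
0.02      ______      ______      ______      ______
0.05      ______      ______      ______      ______
.010      ______      ______      1.729133    ______
0.02      ______      ______      ______      ______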

 

5.19     Consider the chi-square random variable with 25 degrees of freedom.

 

(a) Find the probability that it is less than 20.

(b) Find the probability that it is greater than 25.

(c) Find the probability that it is between 21 and 24.

 

5.20          Consider a  random variable with 32 degrees of freedom.

Find .

 

5.21          Consider a random variable with 119 degrees of freedom.

Find  and compare them with

.

 

5.22          Consider a random variable with 9 degrees of freedom.

  Find .

 

5.23                Consider an F random variable with 3 and 4 degrees of freedom.

Find .