

Donna L. Freund, in Statistical Methods (Fourth Edition), 2022

1.4 Distributions

Very little information about the characteristics of recently sold houses can be acquired by casually looking through Table 1.2. We might be able to conclude that most of the houses have brick exteriors, or that the selling price of houses ranges from $30,000 to $395,000, but much more information about this data set can be obtained with some rather simple organizational tools.

To provide more information, we will construct frequency distributions by grouping the data into categories and counting the number of observations that fall into each one. Because we want to count each house only once, these categories (called classes) are constructed so they don't overlap. Because we count each observation only once, if we add up the number (called the frequency) of houses in all the classes, we get the total number of houses in the data set. Nominally scaled variables naturally have these classes or categories. For example, the variable exter has three values: Brick, Frame, and Other. Handling ordinal, interval, and ratio scale measurements can be a little more complicated, but, as subsequent discussion will show, we can easily handle such data simply by correctly defining the classes.

The column labeled Cumulative Frequency in Table 1.6 is the cumulative frequency distribution, which gives the frequency of observed values less than or equal to the upper limit of that class interval. Thus, for example, 59 of the homes are priced at less than $200,000. The column labeled Cumulative Percent is the cumulative relative frequency distribution, which gives the proportion (percentage) of observed values less than the upper limit of that class interval. Thus the 59 homes priced at less than $200,000 represent 85.51% of the number of homes offered. We will see later that cumulative relative frequencies, especially those near 0 and 100%, can be of considerable importance.

1.4.1 Graphical Representation of Distributions

Using the principle that a picture is worth a thousand words (or numbers), the information in a frequency distribution is more easily grasped if it is presented in graphical form. The most common graphical presentation of a frequency distribution for numerical data is a histogram, while the most common presentation for nominal, categorical, or discrete data is a bar chart. Both graphs are constructed in the same way: heights of vertical rectangles represent the frequency or the relative frequency. In a histogram, the width of each rectangle represents the size of the class, and the rectangles are usually contiguous and of equal width so that the areas of the rectangles reflect the relative frequency. In a bar chart the width of the rectangle has no meaning; however, all the rectangles should be the same width to avoid distortion. Figure 1.1 shows a frequency bar chart for exter from Table 1.2 that clearly shows the large proportion of brick homes. Figure 1.2 shows a frequency histogram for price, clearly showing the preponderance of homes selling from 50 to 150 thousand dollars.

1. All axes labeled correctly, with clearly identifiable scales,
2. …
3. Have bars and/or rectangles of equal width to avoid distortion,
4. Have sizes of figures properly proportioned, and
5. …

Riffenburgh, in Statistics in Medicine (Third Edition), 2012

Probability of Certain Ranges Occurring

Figure 3.4 shows the relative frequency distribution of tumor sizes of 115 liver cancers. A normal curve with the same mean (2.77 cm) and standard deviation (1.01 cm) is superposed. Let us denote the mean by μ and the standard deviation by σ for shorthand. We ask what percent of tumors are larger than 5 cm. If we are willing to accept the normal curve as the probability distribution of liver tumor sizes, the probability of a tumor larger than 5 cm is the proportion of the curve greater than 5. The value 5 cm lies 5 − 2.77 = 2.23 cm, or 2.23/1.01 ≈ 2.2σ, to the right of μ. Table 6.1 shows the area in the right tail to the right of 2.20 is 0.014. About 1.4% of tumors are larger than 5 cm.

As further illustration, what percent of tumors are less than 1 cm? The value 1 cm lies 2.77 − 1 = 1.77 cm, or 1.77/1.01 ≈ 1.75σ, to the left of μ. Since the normal curve is symmetric, the area under the curve to the left of μ − 1.75σ is the same as the area to the right of μ + 1.75σ. (This provides the information in a table half as long.) From Table 6.1, 1.75σ yields an area halfway between 0.045 and 0.036, or about 0.04; thus about 4% of tumors are less than 1 cm.
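The two tail areas in the liver tumor example can also be checked numerically rather than read from a table. The following is a minimal sketch, assuming the normal curve with mean 2.77 cm and standard deviation 1.01 cm quoted in the text; it uses `NormalDist` from Python's standard library, and the variable names are illustrative and do not come from the source.

```python
from statistics import NormalDist

# Mean and standard deviation of tumor size (cm), as quoted in the text.
# Treating tumor size as exactly normal is the text's working assumption.
tumor_size = NormalDist(mu=2.77, sigma=1.01)

# Upper tail: proportion of the curve to the right of 5 cm.
p_larger_than_5 = 1.0 - tumor_size.cdf(5.0)

# Lower tail: proportion of the curve to the left of 1 cm.
p_smaller_than_1 = tumor_size.cdf(1.0)

# Symmetry check: the area left of mu - d equals the area right of mu + d.
d = 2.77 - 1.0
p_mirror = 1.0 - tumor_size.cdf(2.77 + d)

print(f"P(size > 5 cm) = {p_larger_than_5:.3f}")
print(f"P(size < 1 cm) = {p_smaller_than_1:.3f}")
print(f"mirror-image upper tail = {p_mirror:.3f}")
```

The computed areas agree with the table-based answers of about 1.4% and about 4%. Small differences in later decimal places are expected, because the table lookup rounds the distance from the mean to the nearest tabulated value.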

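The construction of a frequency distribution described in Section 1.4 (non-overlapping classes, a frequency for each class, then cumulative frequency and cumulative percent) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `frequency_table`, the class limits, and the prices are hypothetical and are not the data of Table 1.2 or the classes of Table 1.6. Each class is assumed to include its lower limit and exclude its upper limit, which is one common convention for keeping classes from overlapping.

```python
from bisect import bisect_right

def frequency_table(values, edges):
    """Group values into classes [edges[i], edges[i+1]) and tabulate them.

    Returns one row per class:
    (lower, upper, frequency, cumulative frequency, cumulative percent).
    """
    if not values:
        raise ValueError("no observations to tabulate")
    counts = [0] * (len(edges) - 1)
    for v in values:
        if not edges[0] <= v < edges[-1]:
            raise ValueError(f"{v} falls outside the class limits")
        # bisect_right gives the index of the first edge greater than v,
        # so the class containing v is one position to the left.
        counts[bisect_right(edges, v) - 1] += 1

    rows, running = [], 0
    for lower, upper, freq in zip(edges, edges[1:], counts):
        running += freq  # cumulative frequency up to this class
        rows.append((lower, upper, freq, running, 100.0 * running / len(values)))
    return rows

# Hypothetical prices in thousands of dollars (NOT the data of Table 1.2).
prices = [30, 62, 75, 88, 95, 110, 120, 149, 150, 210, 395]
# Hypothetical class limits, also in thousands of dollars.
edges = [0, 50, 100, 150, 200, 400]

for lower, upper, freq, cum, pct in frequency_table(prices, edges):
    print(f"{lower:>3} to <{upper:<3}  {freq:>2}  {cum:>2}  {pct:6.2f}%")
```

Because every observation falls in exactly one class, the frequencies add up to the number of observations and the last cumulative percent is 100. The same counts are what a histogram or bar chart would draw as rectangle heights.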