Optional summary statistics are available for each value set in a table. To add summary statistics to a table, use the Tally Attributes for a Variable dialog.
The choices for statistical measures are:
Mean – average value of observations.
Median – middle value (half of the observations are above this value and half below).
Mode – for categorical (discrete) variables, this is the most frequently observed value. For grouped variables, this is the lower limit of the modal class, that is, the range in the value set in which the most observations lie.
Std Deviation – Standard Deviation, a measure of how clustered the observations are around the mean (square root of the variance).
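Variance – another measure of dispersion: the average squared deviation of the observations from the mean (the square of the standard deviation).
N-tiles – values that divide the observations into N groups, each of which contains 1/N of the total observations (N = 2 is equivalent to the median).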
Minimum – smallest value found in all observations.
Maximum – largest value found in all observations.
Proportion – show counts for certain values in the value set as a fraction or percentage of the total for the variable.
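As an illustration of the definitions above, the following minimal sketch computes each measure on raw data with Python's standard statistics module. It is not how the tabulation software computes its results; the ages list is made up, and the population form of variance and standard deviation is an assumption, since the text does not say whether the population or sample form is used.

```python
import math
import statistics

# Made-up observations, for illustration only.
ages = [3, 7, 7, 12, 18, 25, 31, 44]

mean = statistics.mean(ages)
median = statistics.median(ages)
mode = statistics.mode(ages)            # most frequently observed value

# Population forms (an assumption; the sample forms divide by n - 1 instead).
variance = statistics.pvariance(ages)
std_dev = statistics.pstdev(ages)       # square root of the variance

# N-tiles: N - 1 cut points that split the observations into N equal groups.
quartiles = statistics.quantiles(ages, n=4)
halves = statistics.quantiles(ages, n=2)   # single cut point, same as the median

minimum = min(ages)
maximum = max(ages)

# Proportion: share of observations meeting a condition (here, age under 18).
proportion_under_18 = sum(1 for a in ages if a < 18) / len(ages)
```

Here the single cut point for N = 2 coincides with the median, and the standard deviation equals the square root of the variance, matching the definitions in the list.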
It is assumed that users know the meaning and relevance of any statistics that are selected. They are only meaningful for data that represents a true numeric value (e.g., age, income, hectares).
Many of the statistics above have additional parameters that can be set by selecting them and clicking on the "Options" button. These options are documented in Tally Attributes for a Variable.
Note that in the case of grouped or continuous variables, the median, n-tiles and mode depend heavily on the value set used. This is because these statistics are calculated from the frequency distribution for the value set rather than from the raw data itself. The mode will be the category in the value set that contains the most observations. The median is calculated by interpolation using the limits of the category in the value set within which the cumulative frequency reaches 50%.
This means that the median, mode and n-tiles will be more accurate for value sets with many small categories than for value sets with a few large categories. For best results, use value sets with a large number of uniform categories when calculating median, mode and n-tiles. For example, using single years of age or 5-year age groups rather than 15- or 20-year age groups will result in a more accurate median age. Using a single grouping for age (for example, one category for 0-115 years) will give a highly inaccurate result.
Note that for median and n-tiles, it is possible to set groupings for the median/n-tile calculation that are different from the categories in the value set in order to get a more accurate result. See Tally Attributes for a Variable for details.
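The effect of category width on the median can be sketched as follows. This is a minimal illustration that assumes simple linear interpolation within the category where the cumulative frequency reaches 50%; the exact method used by the software may differ. The function names, the (lower, upper, count) layout with an exclusive upper limit, and the counts are all hypothetical.

```python
def grouped_median(categories):
    """Estimate the median from a frequency distribution.

    categories: list of (lower, upper, count) tuples sorted by lower limit,
    where upper is the exclusive upper limit (assumed layout).
    """
    total = sum(count for _, _, count in categories)
    if total == 0:
        raise ValueError("no observations")
    half = total / 2
    cumulative = 0
    for lower, upper, count in categories:
        # First category in which the cumulative frequency reaches 50%.
        if count > 0 and cumulative + count >= half:
            fraction = (half - cumulative) / count
            return lower + fraction * (upper - lower)
        cumulative += count


def grouped_mode(categories):
    """Lower limit of the modal class (the category with the most observations)."""
    lower, _, _ = max(categories, key=lambda c: c[2])
    return lower


# The same 100 made-up observations tabulated with two different value sets.
five_year = [(0, 5, 10), (5, 10, 20), (10, 15, 30), (15, 20, 40)]
single_group = [(0, 20, 100)]

median_five_year = grouped_median(five_year)   # interpolates inside 10-15
median_single = grouped_median(single_group)   # midpoint of the whole range
mode_five_year = grouped_mode(five_year)
```

With the four narrower categories the estimate is pulled toward the upper ages where most observations lie, while the single category can only return the midpoint of its range, whatever the underlying data look like.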