Similar to the mean, outliers affect the standard deviation (after all, the formula for standard deviation includes the mean). Here’s an example: the salaries of the . Lakers in the 2009–2010 season range from the highest, $23,034,375 (Kobe Bryant) down to $959,111 (Didier Ilunga-Mbenga and Josh Powell). Lots of variation, to be sure! The standard deviation of the salaries for this team turns out to be $6,567,405; it’s almost as large as the average. However, as you may guess, if you remove Kobe Bryant’s salary from the data set, the standard deviation decreases because the remaining salaries are more concentrated around the mean. The standard deviation becomes $4,671,508.
A data set (or dataset , although this spelling is not present in many contemporary dictionaries like Merriam-Webster) is a collection of data . Most commonly a data set corresponds to the contents of a single database table , or a single statistical data matrix , where every column of the table represents a particular variable, and each row corresponds to a given member of the data set in question. The data set lists values for each of the variables, such as height and weight of an object, for each member of the data set. Each value is known as a datum. The data set may comprise data for one or more members, corresponding to the number of rows. The term data set may also be used more loosely, to refer to the data in a collection of closely related tables, corresponding to a particular experiment or event. An example of this type is the data sets collected by space agencies performing experiments with instruments aboard space probes . Data sets that are so large that traditional data processing applications are inadequate to deal with them are known as big data .