Statistics

Statistical Sampling

A Data Scientist uses the tools of Data Science to analyze a dataset with the goal of drawing conclusions. These conclusions are generalizations drawn from a specific dataset (the sample). For these generalizations to be valid, the sample must be representative of the population from which it was drawn. This holds whether the data was collected specifically for the study or was already available for analysis. Real-life datasets may contain many biases, so the analyst should know how the sample was collected and whether it is representative of the population to which the conclusions will be generalized. Most model-building efforts are retrospective observational studies, i.e. the data has already been collected and labeled. For that reason, issues such as how the modeling data is sampled and whether samples remain stable over time are of paramount importance and need special care. The Data Scientist needs both theoretical understanding and practical experience to avoid the biases and fallacies caused by flawed sampling.
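
One concrete way to check the stability of samples across time is the Population Stability Index (PSI). It is a commonly used metric for this task, though not the only one. PSI compares how a variable is distributed in a baseline (e.g. development) sample and in a later sample: PSI = Σ (a_i − e_i) · ln(a_i / e_i), where e_i and a_i are the shares of each sample falling into bin i. The following is a minimal, standard-library sketch. The function names `bin_shares` and `population_stability_index`, the cut points, and the small `eps` floor for empty bins are illustrative assumptions, not part of any particular tool.

```python
import bisect
import math
from typing import Sequence


def bin_shares(values: Sequence[float], cuts: Sequence[float]) -> list[float]:
    """Return the share of `values` falling into each bin (hypothetical helper).

    `cuts` are sorted interior cut points, giving len(cuts) + 1 bins:
    (-inf, cuts[0]), [cuts[0], cuts[1]), ..., [cuts[-1], +inf).
    """
    counts = [0] * (len(cuts) + 1)
    for v in values:
        # bisect_right puts a value equal to a cut point into the bin on its right.
        counts[bisect.bisect_right(cuts, v)] += 1
    total = len(values)
    return [c / total for c in counts]


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    cuts: Sequence[float],
    eps: float = 1e-6,
) -> float:
    """PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i) (hypothetical helper).

    `expected` is the baseline sample and `actual` the later one.
    Empty bins are floored at `eps` (an assumption) so the logarithm stays finite.
    """
    psi = 0.0
    for e, a in zip(bin_shares(expected, cuts), bin_shares(actual, cuts)):
        e, a = max(e, eps), max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi


baseline = [1, 2, 3, 4, 5, 6, 7, 8]
later = [6, 7, 8, 9, 10, 11, 12, 13]
cuts = [3, 6]  # hypothetical cut points; in practice often quantiles of the baseline
print(population_stability_index(baseline, baseline, cuts))  # → 0.0
print(population_stability_index(baseline, later, cuts))  # large: every later value lands in the top bin
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift, and above 0.25 as a significant shift. These thresholds are conventions rather than statistical tests, so a large PSI is a prompt to investigate how the later sample differs, not proof that the model is invalid.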

Sample Topics

Sampling
Data collection methods
Types of studies
Observational studies and generalization
Sampling methods
Sampling biases
Calibration
Sampling sufficiency and precision
Population stability
Experimental Design
Components of an experiment
Well designed experiments
Experimental designs and methods
Comparison of experimental designs and methods
Experimental studies and generalizations
