Statistical Sampling
A Data Scientist uses the tools of Data Science to analyze a dataset with the goal of drawing conclusions. These conclusions are generalizations from a specific dataset (the sample) to a wider population. For these generalizations to be valid, the sample, whether collected for the analysis or already available, must be representative of the population it was drawn from. Real-life datasets can contain many biases, so the analyst should know how the sample was collected and whether it is representative of the population about which generalizations are to be made. Because most model-building efforts are retrospective observational studies, i.e. the data has already been collected and labeled, issues such as how the data is sampled to build the model and whether samples remain stable over time are of paramount importance and need special care. The Data Scientist should have the theoretical understanding and practical experience needed to avoid the biases and fallacies caused by erroneous sampling.
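As a rough illustration of why representativeness matters, the sketch below compares a simple random sample with a convenience sample drawn from only one subgroup. Everything in it is an assumption made for illustration: the function name `simulate`, the two hypothetical groups "A" and "B", their proportions, and their value distributions are invented and do not come from any real dataset.

```python
import random
import statistics


def simulate(seed=0, population_size=10_000, sample_size=500):
    """Return (population mean, random-sample mean, biased-sample mean)."""
    rng = random.Random(seed)

    # Hypothetical population: about 70% in group "A" (values near 50)
    # and about 30% in group "B" (values near 80).
    population = []
    for _ in range(population_size):
        if rng.random() < 0.7:
            population.append(("A", rng.gauss(50, 5)))
        else:
            population.append(("B", rng.gauss(80, 5)))

    true_mean = statistics.fmean(value for _, value in population)

    # Simple random sample: every unit has the same chance of selection.
    random_sample = rng.sample(population, sample_size)
    random_mean = statistics.fmean(value for _, value in random_sample)

    # Convenience sample: only group "B" is reachable, so group "A"
    # is excluded entirely and the estimate is biased.
    group_b = [unit for unit in population if unit[0] == "B"]
    biased_sample = rng.sample(group_b, sample_size)
    biased_mean = statistics.fmean(value for _, value in biased_sample)

    return true_mean, random_mean, biased_mean


if __name__ == "__main__":
    true_mean, random_mean, biased_mean = simulate()
    print(f"population mean:         {true_mean:.2f}")
    print(f"simple random sample:    {random_mean:.2f}")
    print(f"biased (group B only):   {biased_mean:.2f}")
```

Under these assumptions, the simple random sample should land close to the population mean, while the group-B-only sample overestimates it no matter how large the sample is, because the error comes from how units were selected rather than from how many were selected.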
Sample Topics
- Sampling
  - Data collection methods
  - Types of studies
  - Observational studies and generalization
  - Sampling methods
  - Sampling biases
  - Calibration
  - Sampling sufficiency and precision
  - Population stability
- Experimental Design
  - Components of an experiment
  - Well-designed experiments
  - Experimental designs and methods
  - Comparison of experimental designs and methods
  - Experimental studies and generalizations