SQL
SQL is a declarative programming language for managing and processing data held in relational database management systems (RDBMS).
Data Scientists and Analysts should have sufficient knowledge of SQL in order to explore data and prepare it for further analysis.
SQL is primarily used for processing structured data, that is, data that has a schema and incorporates relations among entities (e.g. customer, account) and variables (e.g. name, id, amount). SQL has long been the de facto data processing language for online transaction processing (OLTP). As companies accumulated data, they began to use it for decision-making purposes, which gave rise to online analytical processing (OLAP). SQL has become the de facto language for OLAP as well. The data warehousing revolution of the late 1990s created many novel use cases for SQL, and it did not stop there. As companies started using Machine Learning to extract intelligence from their data warehouses, SQL has been used for exploring data, preparing data, scoring data with ML models, and serving the model artifacts on SQL databases. The resilience of SQL is due to several factors:
1. Companies have invested heavily in data warehousing in order to collect data in a central place for decision support. Most structured business data resides in data warehouses, where a relational database management system stores the data and SQL is used to process it. Users with different roles access this data, and Data Scientists are no exception.
2. SQL engines are powerful. The technology has matured, and there are SQL engines capable of massively parallel processing. From a Data Science point of view, it is possible to express an entire Machine Learning pipeline as a directed acyclic graph of SQL tasks.
3. SQL is expressive, easy to learn, powerful, and ubiquitous. It is akin to Python in these respects. A novice Data Analyst or Data Scientist could start programming and write meaningful SQL queries in a couple of days.
Though the volume of unstructured data is incomparably larger than that of structured data, there is still valuable information residing in relational databases, and SQL is the de facto language for processing this data. Data Scientists and Analysts should have enough knowledge of SQL to explore this data and prepare it for further analysis.
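The idea from the second factor, a Machine Learning pipeline expressed as a directed acyclic graph of SQL tasks, can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's standard-library sqlite3 and graphlib modules (Python 3.9 or later) in place of a real data warehouse engine, and every table name, column, row, and scoring coefficient is hypothetical.

```python
import sqlite3
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each task is a single SQL statement that materializes one table.
# All table names, columns, rows, and coefficients below are made up.
TASKS = {
    "raw_customers": """
        CREATE TABLE raw_customers AS
        SELECT 1 AS customer_id, 'retail' AS segment UNION ALL
        SELECT 2, 'corporate'
    """,
    "raw_transactions": """
        CREATE TABLE raw_transactions AS
        SELECT 1 AS customer_id, 120.0 AS amount UNION ALL
        SELECT 1, 80.0 UNION ALL
        SELECT 2, NULL UNION ALL
        SELECT 2, 40.0
    """,
    # Data preparation: drop rows with a missing amount.
    "clean_transactions": """
        CREATE TABLE clean_transactions AS
        SELECT customer_id, amount
        FROM raw_transactions
        WHERE amount IS NOT NULL
    """,
    # Feature engineering: one row of aggregates per customer.
    "customer_features": """
        CREATE TABLE customer_features AS
        SELECT c.customer_id,
               c.segment,
               COUNT(t.amount) AS n_tx,
               COALESCE(SUM(t.amount), 0.0) AS total_amount
        FROM raw_customers AS c
        LEFT JOIN clean_transactions AS t ON t.customer_id = c.customer_id
        GROUP BY c.customer_id, c.segment
    """,
    # Scoring: a linear model with hypothetical coefficients written as SQL.
    "customer_scores": """
        CREATE TABLE customer_scores AS
        SELECT customer_id, 0.01 * total_amount + 0.1 * n_tx AS score
        FROM customer_features
    """,
}

# Edges of the DAG: task -> set of tasks that must run before it.
DEPENDS_ON = {
    "raw_customers": set(),
    "raw_transactions": set(),
    "clean_transactions": {"raw_transactions"},
    "customer_features": {"raw_customers", "clean_transactions"},
    "customer_scores": {"customer_features"},
}


def run_pipeline(conn):
    """Run every SQL task in an order that respects the dependencies."""
    order = list(TopologicalSorter(DEPENDS_ON).static_order())
    for name in order:
        conn.execute(TASKS[name])
    return order


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(connection))
    print(connection.execute(
        "SELECT customer_id, score FROM customer_scores ORDER BY customer_id"
    ).fetchall())
```

Keeping the dependencies in a separate mapping makes the graph explicit, so a scheduler is free to run independent tasks (here the two raw tables) in parallel on an engine that supports it.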
Sample Topics
- Relational database objects
- SQL and NoSQL
- SQL row functions and column functions
- SQL logical operators
- Nested queries
- DDL commands
- DML commands
- Aggregations
- Joins
- Analytics functions
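To give a feel for several of these topics together, here is a minimal sketch that runs SQL through Python's standard-library sqlite3 module. The customer and account tables echo the entities mentioned earlier, but the schema and all rows are invented for illustration. The last query uses window functions, which assumes the bundled SQLite is version 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define two related tables (hypothetical schema).
conn.executescript("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE account (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        amount      REAL NOT NULL
    );
""")

# DML: insert a few made-up rows.
conn.executemany(
    "INSERT INTO customer (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (3, "Linus")],
)
conn.executemany(
    "INSERT INTO account (id, customer_id, amount) VALUES (?, ?, ?)",
    [(10, 1, 500.0), (11, 1, 150.0), (12, 2, 300.0)],
)

# Join + aggregation: UPPER works row by row, COUNT and SUM work per group.
totals = conn.execute("""
    SELECT UPPER(c.name) AS name,
           COUNT(a.id) AS n_accounts,
           COALESCE(SUM(a.amount), 0.0) AS total
    FROM customer AS c
    LEFT JOIN account AS a ON a.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# Nested query + logical operator: customers holding a mid-sized account.
mid_range = conn.execute("""
    SELECT name
    FROM customer
    WHERE id IN (
        SELECT customer_id
        FROM account
        WHERE amount >= 200 AND amount <= 400
    )
    ORDER BY name
""").fetchall()

# Analytic (window) functions: rank accounts and attach a per-customer total
# without collapsing the rows.
ranked = conn.execute("""
    SELECT id,
           amount,
           RANK() OVER (ORDER BY amount DESC) AS amount_rank,
           SUM(amount) OVER (PARTITION BY customer_id) AS customer_total
    FROM account
    ORDER BY id
""").fetchall()

print(totals)
print(mid_range)
print(ranked)
```

The LEFT JOIN keeps customers without accounts in the result, which is why COALESCE is used to turn their missing total into zero.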