SQLite3 Intro Tutorial
Recently, I completed a SQL for Data Science course offered by UC Davis on Coursera.
The database management system we used was SQLite3, a free and open-source database engine. SQLite is great because, as a beginner, you don't have to worry about configuring users, permissions, and security. Every database schema, or set of tables, is stored in its own file, and inherits read/write permissions directly from the filesystem (i.e. Linux).
For example: Take a look at the file permissions and file type information here for "Northwind.db", a SQLite3 database I practiced with during the course:
$ export DB="Northwind.db" ; ls -al | grep $DB ; file $DB
-rw-r--r-- 1 adam adam 561152 Sep 14 13:08 Northwind.db Northwind.db: SQLite 3.x database, last written using SQLite version 3027002
So we can see that the file itself is Read/Write-able by myself and my group (user/group: adam) on my system. And the file type is:
SQLite 3.x database.
SQLite is also very lightweight and is entirely self-contained. It's a great platform to begin learning SQL syntax and how to construct queries.
You can create and keep as many databases as you'd like, and store them anywhere within your filesystem with SQLite3.
As for what I learned:
Throughout the course, I took notes of some of the most useful and interesting SQL queries I could. I applied them to the freely available Northwind SQLite3 database.
I forked the original repo containing the database from user
jpwhite3 on GitHub, and then added all my favorite SQL commands to the Readme to create a much more complete Beginner's How-to Guide for SQLite. See: https://github.com/84adam/northwind-SQLite3
Here's a few of my favorite queries:
Generating a 'UserName' using First Initial, Last Name, and EmployeeID
This command takes advantage of two built-in functions:
SUBSTR()for selecting parts of a string (substrings), and
||for concatenation (combining strings together)
SELECT EmployeeID ,FirstName ,LastName ,LOWER(SUBSTR(FirstName,1,1)||SUBSTR(LastName,1,8))||EmployeeID AS UserName FROM Employees;
EmployeeID|FirstName|LastName|UserName 1|Nancy|Davolio|ndavolio1 2|Andrew|Fuller|afuller2 3|Janet|Leverling|jleverlin3 4|Margaret|Peacock|mpeacock4 5|Steven|Buchanan|sbuchanan5 6|Michael|Suyama|msuyama6 7|Robert|King|rking7 8|Laura|Callahan|lcallahan8 9|Anne|Dodsworth|adodswort9
Another one I liked was: Calculating 'Age' given only a BirthDate and Today's Date.
The key is you must cast Age as an Integer so that the value will not be rounded up.
SELECT BirthDate ,CAST(STRFTIME('%Y.%m%d', 'now') - STRFTIME('%Y.%m%d', BirthDate) AS INT) AS Age FROM Employees;
BirthDate|Age 1948-12-08|71 1952-02-19|68 1963-08-30|57 1937-09-19|82 1955-03-04|65 1963-07-02|57 1960-05-29|60 1958-01-09|62 1966-01-27|54
One more: Creating a 'Categorical Variable' or 'One-Hot (Binary) Encoding'.
This is useful for Data Science applications as it permits easier analysis of data which may originally be in the form of strings or other non-numeric data types. For example, if you care about looking at the differences between people who live in London vs. those who don't, you can simply label each London Dweller with a '1' and other folks with a '0'.
Here's how to do this using a
SELECT EmployeeID ,LastName ,FirstName ,City ,CASE City WHEN 'London' THEN 1 ELSE 0 END "London(Y/N)" FROM Employees ORDER BY City, LastName LIMIT 10;
EmployeeID|LastName|FirstName|City|London(Y/N) 3|Leverling|Janet|Kirkland|0 5|Buchanan|Steven|London|1 9|Dodsworth|Anne|London|1 7|King|Robert|London|1 6|Suyama|Michael|London|1 4|Peacock|Margaret|Redmond|0 8|Callahan|Laura|Seattle|0 1|Davolio|Nancy|Seattle|0 2|Fuller|Andrew|Tacoma|0
Notice there are 4 employees who live in London in the results above, each marked with a '1' at the end (last column).
See more example commands on my GitHub Readme page here: https://github.com/84adam/northwind-SQLite3
To learn more about the SQL for Data Science course, visit: https://www.coursera.org/learn/sql-for-data-science
One caveat about the course: A number of the examples shown in the lecture videos do not correspond to the "Chinook" database mentioned by the instructor. (Instead they rely on data from the Northwind database.) This is why I created the guide on GitHub. (The Northwind SQLite3 database will come in very handy for you!)