SQLite3 Intro Tutorial

Recently, I completed a SQL for Data Science course offered by UC Davis on Coursera.

The database management system we used was SQLite3, a free and open-source database engine. SQLite is great because, as a beginner, you don't have to worry about configuring users, permissions, and security. Every database schema, or set of tables, is stored in its own file, and inherits read/write permissions directly from the filesystem (i.e. Linux).

For example: Take a look at the file permissions and file type information here for "Northwind.db", a SQLite3 database I practiced with during the course:

BASH COMMAND:

$ export DB="Northwind.db" ; ls -al | grep $DB ; file $DB

RESULT:

-rw-r--r--  1 adam adam  561152 Sep 14 13:08 Northwind.db
Northwind.db: SQLite 3.x database, last written using SQLite version 3027002  

So we can see that the file itself is Read/Write-able by myself and my group (user/group: adam) on my system. And the file type is: SQLite 3.x database.

SQLite is also very lightweight and is entirely self-contained. It's a great platform to begin learning SQL syntax and how to construct queries.

You can create and keep as many databases as you'd like, and store them anywhere within your filesystem with SQLite3.

As for what I learned:

Throughout the course, I took notes of some of the most useful and interesting SQL queries I could. I applied them to the freely available Northwind SQLite3 database.

I forked the original repo containing the database from user jpwhite3 on GitHub, and then added all my favorite SQL commands to the Readme to create a much more complete Beginner's How-to Guide for SQLite. See: https://github.com/84adam/northwind-SQLite3

Here's a few of my favorite queries:

Generating a 'UserName' using First Initial, Last Name, and EmployeeID

This command takes advantage of two built-in functions:

  • SUBSTR() for selecting parts of a string (substrings), and
  • || for concatenation (combining strings together)

SQL COMMAND:

SELECT  
    EmployeeID
    ,FirstName
    ,LastName
    ,LOWER(SUBSTR(FirstName,1,1)||SUBSTR(LastName,1,8))||EmployeeID AS UserName
FROM Employees;  

RESULT:

EmployeeID|FirstName|LastName|UserName  
1|Nancy|Davolio|ndavolio1  
2|Andrew|Fuller|afuller2  
3|Janet|Leverling|jleverlin3  
4|Margaret|Peacock|mpeacock4  
5|Steven|Buchanan|sbuchanan5  
6|Michael|Suyama|msuyama6  
7|Robert|King|rking7  
8|Laura|Callahan|lcallahan8  
9|Anne|Dodsworth|adodswort9  

Another one I liked was: Calculating 'Age' given only a BirthDate and Today's Date.

The key is you must cast Age as an Integer so that the value will not be rounded up.

SQL COMMAND:

SELECT  
    BirthDate
    ,CAST(STRFTIME('%Y.%m%d', 'now') - STRFTIME('%Y.%m%d', BirthDate) AS INT) AS Age
FROM Employees;  

RESULT:

BirthDate|Age  
1948-12-08|71  
1952-02-19|68  
1963-08-30|57  
1937-09-19|82  
1955-03-04|65  
1963-07-02|57  
1960-05-29|60  
1958-01-09|62  
1966-01-27|54  

One more: Creating a 'Categorical Variable' or 'One-Hot (Binary) Encoding'.

This is useful for Data Science applications as it permits easier analysis of data which may originally be in the form of strings or other non-numeric data types. For example, if you care about looking at the differences between people who live in London vs. those who don't, you can simply label each London Dweller with a '1' and other folks with a '0'.

Here's how to do this using a CASE statement:

SQL COMMAND:

SELECT  
    EmployeeID
    ,LastName
    ,FirstName
    ,City
    ,CASE City
        WHEN 'London' THEN 1
        ELSE 0
        END "London(Y/N)"
    FROM Employees
    ORDER BY City, LastName
    LIMIT 10;

RESULT:

EmployeeID|LastName|FirstName|City|London(Y/N)  
3|Leverling|Janet|Kirkland|0  
5|Buchanan|Steven|London|1  
9|Dodsworth|Anne|London|1  
7|King|Robert|London|1  
6|Suyama|Michael|London|1  
4|Peacock|Margaret|Redmond|0  
8|Callahan|Laura|Seattle|0  
1|Davolio|Nancy|Seattle|0  
2|Fuller|Andrew|Tacoma|0  

Notice there are 4 employees who live in London in the results above, each marked with a '1' at the end (last column).

See more example commands on my GitHub Readme page here: https://github.com/84adam/northwind-SQLite3


To learn more about the SQL for Data Science course, visit: https://www.coursera.org/learn/sql-for-data-science

One caveat about the course: A number of the examples shown in the lecture videos do not correspond to the "Chinook" database mentioned by the instructor. (Instead they rely on data from the Northwind database.) This is why I created the guide on GitHub. (The Northwind SQLite3 database will come in very handy for you!)