A Guide to SQL - Structured Query Language

The following notes are A programming language designed to manage data stored in relational databases. SQL is a declarative language (vs. imperative). That is, the user tells SQL what they want, not how to get it.

Advantages and Disadvantages

Structured databases are popular for their advantages, which may also be disadvantages in certain situations.

Structured: tables and schema are fixed and inflexible both an advantage (in terms of user IO) and disadvantage (if a change is needed, the data do not lend themselves to a table format)
Centralized: SQL is designed to be in a single server, rather than distributed across servers. Both an advantage (controlled) and a disadvantage (e.g., moving Asian customer table to Asia for efficiency).

Data Types

CHAR: fixed-length characters
DATE: date
ENUM: values from a discrete set of options. E.g., ENUM('android','iphone','other')
INTEGER: integer.
MONEY: Dollar format
TEXT: string format. Note that this must be inserted into SQL with ' and escaped with double '. E.g., 'String for O''Brian'.
VARCHAR: variable-length characters

Note that variables may be cast to new formats with the :: operator. For example, SELECT '10'::INTEGER;

Order of Operations

FROM: Choose and JOIN tables to get base data
WHERE: Filters the base data
GROUP BY: Aggregates the base data into groups
HAVING: Filters the aggregated data
SELECT: Chooses the final data
ORDER BY: Sorts the final data
LIMIT: Limits the returned final data to a row count

SQL Statements

SQL comes in many 'flavors', but the following statements are generally universal with some exceptions. Some statements (e.g., COPY) are more specific to flavor and should be Googled. SQL statements are made of clauses, written in CAPITALS by convention (not syntactically required), have parameters in between (), and end in ;.

The `SELECT` & `FROM` Statements

The most common query. SELECT is used to select data FROM a table. For example,

SELECT var1_name, var2_name FROM table_name;

*: SELECT all table columns). SELECT always returns a new table called the result set.
DISTINCT: a query returning unique values in a result set. Example: SELECT DISTINCT genre FROM movies;
AS: rename column names using a alias. Example SELECT name AS "Album Name", year FROM albums;. Depending on the flavor of SQL, the alias may need to be in double quotes, and AS might be unnecessary (e.g., PostgreSQL)
UNION: Combine results from two sub-query result sets, with the same column name, type and ordering. By default, duplicate rows are removed. This may be overridden with a subsequent ALL statement. For example:

SELECT * FROM TABLE1
UNION ALL
SELECT * FROM TABLE2

Math may be conducted in line for variable in the select statement. For example, SELECT proportion_sick*100 AS percent_sick FROM table_name;.

Subqueries

You can make a subquery to substitute for table in a FROM statement. For example, SELECT name FROM (SELECT col FROM ...) and on.

Filter Statements

GROUP BY: Clause that aggregates rows of a column name that have the same value. It is only used with other aggregate functions. Usually, the columns you are grouping by are also in the SELECT statement. For Example:

SELECT price, type, COUNT(*) 
FROM fake_apps 
GROUP BY price, type;

Note, that order does not matter for multiple GROUP BY columns; the statement returns all permutations, regardless. Further note that GROUP BY 1 is equivalent, that groups by the 1^st column regardless of what it is called. * HAVING: A WHERE clause for aggregate functions. For Example:

SELECT
    name AS 'Artist',
    SUM(albums) AS 'Number of Albums'
FROM
    artists
GROUP BY 
    name
HAVING
    SUM(albums) > 2;

ORDER BY: order the results of a query, either alphabetically or numerically. Example:

SELECT * 
FROM movies 
ORDER BY imdb_rating DESC;

* `DESC`: modifier for `ORDER BY` in descending order. See example above. 
* `ASC`: modifier for `ORDER BY` in ascending order (i.e., `1,2,3,...` or `A,B,C...`).

LIMIT: specifies the maximum number of rows to return. Example: SELECT * FROM movies ORDER BY imdb_rating DESC LIMIT 3
WHERE: filters row(s) based on following conditional in order to SELECT, SET, etc. See UPDATE example.
- AND: combines WHERE conditions. Both conditions must be true to be included in the result set. EXAMPLE: SELECT * FROM movies WHERE name BETWEEN 1998 AND 2000 AND genre = 'comedy'
- BETWEEN: filter results between two values. Example: SELECT * FROM movies WHERE name BETWEEN 'A' AND 'J' filters movies with a name starting with 'A' up to, but not including 'J'.
- IN: Specify multiple values for the WHERE clause. For example: WHERE name IN (name_1, name_2, ...)
- LIKE: Pattern recognition for WHERE filtering. Example: SELECT * FROM movies WHERE name LIKE "Se_en"; returns moves with names Se7en and Seven. _, for a single letter, and %, for zero or more letters are both wildcards.
- OR: combines WHERE conditions similarly to AND. However, only one condition must be true to be included in the result set. Note that aggregators are not allowed in the WHERE statement because WHERE occurs before SELECT statement and aggregators.

Update Statements

Update statements change or remove information from SQL databases.

ALTER TABLE: Add new column to a TABLE object. Example: ALTER TABLE table_name ADD COLUMN miley_cirus_fan BIT.
DELETE FROM: Remove row(s) of data. Example: DELETE FROM celebs WHERE twitter_handle IS NULL;
INSERT INTO: Add row to an existing relation. Example:

INSERT INTO
    celebs [(id, name, age)] #Specify if non-inserting row values for all columns
VALUES 
    (1, 'Justin Bieber' , 21);

SET: indicates a column to edit. See UPDATE example.
UPDATE: used to edit a data row in a relation, often in conjunction with SET and WHERE. Example:

UPDATE celebs 
SET age = 22
WHERE id = 1;

The `JOIN` Statements

Tables are related to each other using primary keys and foreign keys. Primary keys are a unique identifier for rows in a table. Foreign keys are the unique identifier in one table for the rows in a another table.

JOIN & ON: Joins two tables into a single table set based ON a primary & foreign key pair. This command is called an inner command without LEFT or RIGHT, meaning only rows that exist in both tables are returned (i.e., JOIN and INNER JOIN are equivalent). Example:

SELECT 
    albums.id, 
    albums.name AS Album, 
    albums.year, 
    artist.name AS Artist   
FROM
    albums
JOIN artists ON
    albums.artist_id = artists.id AND
    artists.genre != 'pop'; #do not join pop artists

LEFT and RIGHT JOIN: An outer join command includes NULL foreign keys for the left and right table respectively. All rows in the "left" table are returned, regardless of meeting the join condition. Example: ...FROM albums LEFT JOIN artists ON albums.artist_id = artists.id; includes all rows from albums, regardless of whether albums.artist_id contains a value.
FULL JOIN: An outer join including all rows for each table.

JOIN statements may be chained together for a single, or multiple tables.

The `CREATE` & `DROP` Statements

DROP: delete a database or table
CREATE: create a database
- CREATE TABLE: create a database TABLE. Example: CREATE TABLE tab_name(id INTEGER PRIMARY KEY, name TEXT);
- PRIMARY KEY: tell SQL which column is the unique identifier of a row. SQL provides insurance that the PRIMARY KEY is unique for each row and that none of the values are NULL.

SQL Functions

Aggregate Functions

AVG(): Aggregate function that returns the average value of a column, given as a parameter. Example: SELECT AVG(price) FROM fake_apps;.
COUNT(): function that counts the rows of a result set. Example: SELECT COUNT(*) FROM movies; returns the number of movies in the database.
LENGTH(): returns the length of text columns. For example: SELECT name, LENGTH(name) FROM pets; returns the name and length of name of pets (might be LEN() in some SQL implementations)
MAX(): Aggregate function that returns the maximum value for a column, given as a parameter. Example: SELECT MAX(downloads) FROM fake_apps;
MIN(): Aggregate function that returns the maximum value for a column, given as a parameter. Example: SELECT MIN(price) FROM fake_apps;
ROUND(): Aggregate function that rounds a column to a specified number of digits, both given as parameters. Example:

SELECT price, ROUND(AVG(downloads), 2)
FROM fake_apps
GROUP BY price;

SUM(): Aggregate function that adds the INTEGER or REAL values of a column, given as a parameter. Example: SELECT SUM(downloads) FROM fake_apps;

Date Functions

DATEDIFF(<date_units>, <date1>, <date2>): Calculate the difference in date_units between date1 and date2.
DATE_PART(): Filter a specific part of date. E.g., DATE_PART('year', <date_col>) extracts the year from date_col.
TO_DATE(): converts string literal to date format. For example, SELECT TO_DATE(<datetime>), or SELECT TO_DATE(<string> , 'MM/DD/YYYY') where string is in the given format.
INTERVAL <value>: Specifies a time interval. E.g., duration*interval '1 day' would convert duration integer value into a that number of days.

String Functions

RIGHT(<str>, <num>): Return the right num number of characters from str. E.g., RIGHT(account_str, 4) selects the last 4 characters of account_str.
CONCAT(<arg1>, <arg2>, ...): Concatenate a number or arguments into a single string.

Window Functions

Window functions operate on a set of rows and return a single value for each row. The term window refers to the set of rows that the function operates on. There are a number of aggregate functions that are applicable within the window, including AVG, COUNT, FIRST_VALUE, and SUM. The syntax is <FUNCTION()> OVER(PARTITION BY <GROUP_COLUMN>) [AS <alias>] the following:

SELECT
    merchant_location_id,
    service_date,
    ROW_NUMBER() OVER(PARTITION BY merchant_location_id ORDER BY service_date DESC) AS bill_order
FROM
    raw.analytics.analytics_insights_billing

More information on window functions may be found here: Window Functions.

Array Functions

Array functions process or return array data types

sequence(<start>, <end>, [<step>]): Creates an array between start and end [by step]. Can be used with integers, and dates.

Other Functions

CAST(<col_name> AS <type>): Cast a column into a new data type.
COALESCE(): exchanges null values for a user-defined value. E.g., COALESCE(col_name, 0) turns all null values in col_name into 0.
EXISTS(<query>): Returns a boolean value regarding whether the input <query> exists. For example, the following query returns the id and product name from any product that EXISTS having sold for over $20 in the transactions table.

SELECT 
    id
    , name
FROM products
WHERE EXISTS (SELECT 1 FROM transactions WHERE product_id = product.id AND price > 20)

TYPEOF(<col_name>): Return data type of col_name

`CREATE` Functions

The user may define their own functions with CREATE FUNCTION. For (atleast) PostgreSQL flavor, there are 4 requirements

Specify RETURNS <data_type>
Specify language with LANGUAGE SQL (or LANGUAGE plpgsql)
Use SELECT or BEGIN RETURN ... END to specify the output
$$ or ' surrounds function logic

Here is a syntactical example from codewars.com:

CREATE FUNCTION agecalculator(date) RETURNS integer
AS $$ BEGIN RETURN EXTRACT(YEAR FROM AGE($1))::int END $$
LANGUAGE SQL;

Or equivalently

CREATE FUNCTION agecalculator(date) RETURNS int
AS 'SELECT EXTRACT(YEAR FROM AGE($1))::int;'
LANGUAGE SQL

More information regarding creating custom functions with PostgreSQL's CREATE FUNCTION command found here.

Operators

||: concatenate strings. E.g., SELECT 'Ron Weasley has ' || COUNT(*) || ' pets' FROM pets WHERE owner = 'Ron Weasley';
::: Cast field to fmt. E.g., SELECT dollars::FLOAT.
->>: Access fields in JSON data. E.g.,

SELECT * FROM email_events
WHERE sendgrid_data->>'event' = 'email_verification_failure'

--: comments
/* ... */: Multi-line comments

The `CASE` Expression

A generic conditional expression that is similar to if/else expressions in other languages. It follows the following basic syntax.

CASE  
    WHEN condition THEN result
    [ WHEN condition THEN result ]
    ...
    [ ELSE result ]
END

Where condition is a conditional expression and result is the specified return value (braces indicate optional arguments). Alternatively, CASE Expressions may follow this syntax:

CASE expression
    WHEN value THEN result
    [ WHEN value THEN result ]
    ...
    [ ELSE result ]
END

where expression is an expression that is evaluated and compared with value for equality. CASE statements may be inserted directly within SELECT statements and end with END AS var_name, ....

Database Administration

Admin Statements

USE <TYPE> <NAME>: Specify a database or schema for TYPE by NAME for query
SHOW COLUMNS IN <table>: Used to show table columns and information regarding them

Schema

A meta-SQL table with information regarding tables, databases and more. A common usage of the information_schema (postgreSQL) table is to learn about columns. For example information_schema.columns contains much information on all table's columns, including column_name and data_type.

CREATE TABLE table_name(
    id INTEGER, 
    name TEXT,
    age INTEGER
);

Where CREATE TABLE is a clause and column_name DATA_TYPE are parameters. Numbers of lines used does not matter.

Importing Data With `COPY`

Data may be copied into a SQL database from a local or remote source using the following COPY ... FROM syntax:

COPY <table_name> FROM <path>
DELIMITER ',' CSV
HEADER;

Temp. Tables

Using WITH and AS statements, a temporary table may be created for the current session. For example:

WITH calls_made AS 
    (SELECT
        caller_id,
        count(*) AS total_calls
    FROM call_history
    GROUP BY caller_id
    )
SELECT
    caller_id,
    total_calls
FROM 
    calls_made;

Temporary tables may also be created with CREATE TEMPORARY TABLE table_name AS ... or SELECT * INTO TEMPORARY TABLE table_name FROM .... These temp tables will be dropped upon disconnection.

From the command line, SQL databases may be created with .sql text files using the following syntax:

sqlite3 database.db < create_database.sql

A .db file may then be queried in a SQL environment.

Vocab

Some important SQL terminology.

Child Table: A table that references another's primary key, using a foreign key. E.g., a sales_record table, referencing a customer table with a foreign key. If updating a child table ever requires you to update the parent as well, then we have improper design.
Column: a set of data values of a particular type.
Cross-reference Table: A table that links primary keys from two tables, when there are many-to-many relationships between them. E.g., student applicants to universities tables.
Database Normalization: A database design ideal: All non-key columns should belong exclusively to the primary key.
Extract, Transform, Load: three database functions that are combined into one tool to pull data out of one database and place it into another database.
Foreign Key: A table column that contains another table's primary key. A foreign key can contain NULL.
Parent Table: A table that stores a primary key, referenced by a child table's foreign key. E.g., a cities parent table, and a neighborhoods child table.
Primary Key: A table column that uniquely identifies a row from the table and cannot contain NULL.
Row: a single record in a table
Relationship: A table primary key that is the connection between two or more tables.
Relational database: a database that organizes information into one or more tables. Saves space
Relational Database Management System (RDBMS): a program that lets you create, update, and administer a relational database. Most relational database management systems use SQL to access the database. Types: SQLite, MySQL, SQL Database
Schema: defines the structure of a table or a database
Structured Data: Data organized neatly into columns and rows, where all values are primary SQL data types (e.g., integer, varchar, etc.). Thus, excludes data values that are themselves rows or arrays or maps datatypes.
Table (AKA Relation): a collection of data organized into rows and columns.

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
.gitignore		.gitignore
README.md		README.md
_config.yml		_config.yml
joins_reference.png		joins_reference.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

A Guide to SQL - Structured Query Language

Table of Contents

Advantages and Disadvantages

Data Types

Order of Operations

SQL Statements

The `SELECT` & `FROM` Statements

Subqueries

Filter Statements

Update Statements

The `JOIN` Statements

The `CREATE` & `DROP` Statements

SQL Functions

Aggregate Functions

Date Functions

String Functions

Window Functions

Array Functions

Other Functions

`CREATE` Functions

Operators

The `CASE` Expression

Database Administration

Admin Statements

Schema

Importing Data With `COPY`

Temp. Tables

Vocab

About

Releases

Packages

kdonavin/sql_guide

Folders and files

Latest commit

History

Repository files navigation

A Guide to SQL - Structured Query Language

Table of Contents

Advantages and Disadvantages

Data Types

Order of Operations

SQL Statements

The SELECT & FROM Statements

Subqueries

Filter Statements

Update Statements

The JOIN Statements

The CREATE & DROP Statements

SQL Functions

Aggregate Functions

Date Functions

String Functions

Window Functions

Array Functions

Other Functions

CREATE Functions

Operators

The CASE Expression

Database Administration

Admin Statements

Schema

Importing Data With COPY

Temp. Tables

Vocab

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

The `SELECT` & `FROM` Statements

The `JOIN` Statements

The `CREATE` & `DROP` Statements

`CREATE` Functions

The `CASE` Expression

Importing Data With `COPY`

Packages