The following notes are A programming language designed to manage data stored in relational databases. SQL is a declarative language (vs. imperative). That is, the user tells SQL what they want, not how to get it.
- Advantages and Disadvantages
- Data Types
- Order of Operations
- SQL Statements
- The
CREATE
&DROP
Statements - SQL Functions
- Operators
- The
CASE
Expression - Database Administration
- Temp. Tables
- Vocab
Structured databases are popular for their advantages, which may also be disadvantages in certain situations.
- Structured: tables and schema are fixed and inflexible both an advantage (in terms of user IO) and disadvantage (if a change is needed, the data do not lend themselves to a table format)
- Centralized: SQL is designed to be in a single server, rather than distributed across servers. Both an advantage (controlled) and a disadvantage (e.g., moving Asian customer table to Asia for efficiency).
CHAR
: fixed-length charactersDATE
: dateENUM
: values from a discrete set of options. E.g.,ENUM('android','iphone','other')
INTEGER
: integer.MONEY
: Dollar formatTEXT
: string format. Note that this must be inserted into SQL with'
and escaped with double'
. E.g.,'String for O''Brian'
.VARCHAR
: variable-length characters
Note that variables may be cast to new formats with the ::
operator.
For example, SELECT '10'::INTEGER;
FROM
: Choose andJOIN
tables to get base dataWHERE
: Filters the base dataGROUP BY
: Aggregates the base data into groupsHAVING
: Filters the aggregated dataSELECT
: Chooses the final dataORDER BY:
Sorts the final dataLIMIT
: Limits the returned final data to a row count
SQL comes in many 'flavors', but the following statements are generally
universal with some exceptions. Some statements (e.g., COPY
) are more
specific to flavor and should be Googled. SQL statements are made of
clauses, written in CAPITALS by convention (not syntactically
required), have parameters in between ()
, and end in ;
.
The most common query. SELECT
is used to select data FROM
a table.
For example,
SELECT var1_name, var2_name FROM table_name;
*
:SELECT
all table columns).SELECT
always returns a new table called the result set.DISTINCT
: a query returning unique values in a result set. Example:SELECT DISTINCT genre FROM movies;
AS
: rename column names using a alias. ExampleSELECT name AS "Album Name", year FROM albums;
. Depending on the flavor of SQL, the alias may need to be in double quotes, andAS
might be unnecessary (e.g., PostgreSQL)UNION
: Combine results from two sub-query result sets, with the same column name, type and ordering. By default, duplicate rows are removed. This may be overridden with a subsequentALL
statement. For example:
SELECT * FROM TABLE1
UNION ALL
SELECT * FROM TABLE2
Math may be conducted in line for variable in the select statement. For
example, SELECT proportion_sick*100 AS percent_sick FROM table_name;
.
You can make a subquery to substitute for table in a FROM
statement.
For example, SELECT name FROM (SELECT col FROM ...)
and on.
GROUP BY
: Clause that aggregates rows of a column name that have the same value. It is only used with other aggregate functions. Usually, the columns you are grouping by are also in theSELECT
statement. For Example:
SELECT price, type, COUNT(*)
FROM fake_apps
GROUP BY price, type;
Note, that order does not matter for multiple GROUP BY
columns; the
statement returns all permutations, regardless. Further note that
GROUP BY 1
is equivalent, that groups by the 1st column
regardless of what it is called. * HAVING
: A WHERE
clause for
aggregate functions. For Example:
SELECT
name AS 'Artist',
SUM(albums) AS 'Number of Albums'
FROM
artists
GROUP BY
name
HAVING
SUM(albums) > 2;
ORDER BY
: order the results of a query, either alphabetically or numerically. Example:
SELECT *
FROM movies
ORDER BY imdb_rating DESC;
* `DESC`: modifier for `ORDER BY` in descending order. See example above.
* `ASC`: modifier for `ORDER BY` in ascending order (i.e., `1,2,3,...` or `A,B,C...`).
LIMIT
: specifies the maximum number of rows to return. Example:SELECT * FROM movies ORDER BY imdb_rating DESC LIMIT 3
WHERE
: filters row(s) based on following conditional in order toSELECT
,SET
, etc. SeeUPDATE
example.AND
: combinesWHERE
conditions. Both conditions must be true to be included in the result set. EXAMPLE:SELECT * FROM movies WHERE name BETWEEN 1998 AND 2000 AND genre = 'comedy'
BETWEEN
: filter results between two values. Example:SELECT * FROM movies WHERE name BETWEEN 'A' AND 'J'
filters movies with a name starting with 'A' up to, but not including 'J'.IN
: Specify multiple values for theWHERE
clause. For example:WHERE name IN (name_1, name_2, ...)
LIKE
: Pattern recognition forWHERE
filtering. Example:SELECT * FROM movies WHERE name LIKE "Se_en";
returns moves with namesSe7en
andSeven
._
, for a single letter, and%
, for zero or more letters are both wildcards.OR
: combinesWHERE
conditions similarly toAND
. However, only one condition must be true to be included in the result set. Note that aggregators are not allowed in theWHERE
statement becauseWHERE
occurs beforeSELECT
statement and aggregators.
Update statements change or remove information from SQL databases.
ALTER TABLE
: Add new column to aTABLE
object. Example:ALTER TABLE table_name ADD COLUMN miley_cirus_fan BIT
.DELETE FROM
: Remove row(s) of data. Example:DELETE FROM celebs WHERE twitter_handle IS NULL
;INSERT INTO
: Add row to an existing relation. Example:
INSERT INTO
celebs [(id, name, age)] #Specify if non-inserting row values for all columns
VALUES
(1, 'Justin Bieber' , 21);
SET
: indicates a column to edit. SeeUPDATE
example.UPDATE
: used to edit a data row in a relation, often in conjunction withSET
andWHERE
. Example:
UPDATE celebs
SET age = 22
WHERE id = 1;
Tables are related to each other using primary keys and foreign keys. Primary keys are a unique identifier for rows in a table. Foreign keys are the unique identifier in one table for the rows in a another table.
JOIN
&ON
: Joins two tables into a single table set basedON
a primary & foreign key pair. This command is called an inner command withoutLEFT
orRIGHT
, meaning only rows that exist in both tables are returned (i.e.,JOIN
andINNER JOIN
are equivalent). Example:
SELECT
albums.id,
albums.name AS Album,
albums.year,
artist.name AS Artist
FROM
albums
JOIN artists ON
albums.artist_id = artists.id AND
artists.genre != 'pop'; #do not join pop artists
LEFT
andRIGHT JOIN
: An outer join command includesNULL
foreign keys for the left and right table respectively. All rows in the "left" table are returned, regardless of meeting the join condition. Example:...FROM albums LEFT JOIN artists ON albums.artist_id = artists.id;
includes all rows fromalbums
, regardless of whetheralbums.artist_id
contains a value.FULL JOIN
: An outer join including all rows for each table.
JOIN
statements may be chained together for a single, or multiple
tables.
DROP
: delete a database or tableCREATE
: create a databaseCREATE TABLE
: create a databaseTABLE
. Example:CREATE TABLE tab_name(id INTEGER PRIMARY KEY, name TEXT);
PRIMARY KEY
: tell SQL which column is the unique identifier of a row. SQL provides insurance that thePRIMARY KEY
is unique for each row and that none of the values areNULL
.
AVG()
: Aggregate function that returns the average value of a column, given as a parameter. Example:SELECT AVG(price) FROM fake_apps;
.COUNT()
: function that counts the rows of a result set. Example:SELECT COUNT(*) FROM movies;
returns the number of movies in the database.LENGTH()
: returns the length of text columns. For example:SELECT name, LENGTH(name) FROM pets;
returns the name and length of name of pets (might beLEN()
in some SQL implementations)MAX()
: Aggregate function that returns the maximum value for a column, given as a parameter. Example:SELECT MAX(downloads) FROM fake_apps;
MIN()
: Aggregate function that returns the maximum value for a column, given as a parameter. Example:SELECT MIN(price) FROM fake_apps;
ROUND()
: Aggregate function that rounds a column to a specified number of digits, both given as parameters. Example:
SELECT price, ROUND(AVG(downloads), 2)
FROM fake_apps
GROUP BY price;
SUM()
: Aggregate function that adds theINTEGER
orREAL
values of a column, given as a parameter. Example:SELECT SUM(downloads) FROM fake_apps;
DATEDIFF(<date_units>, <date1>, <date2>)
: Calculate the difference indate_units
betweendate1
anddate2
.DATE_PART()
: Filter a specific part of date. E.g.,DATE_PART('year', <date_col>)
extracts the year fromdate_col
.TO_DATE()
: converts string literal to date format. For example,SELECT TO_DATE(<datetime>)
, orSELECT TO_DATE(<string> , 'MM/DD/YYYY')
wherestring
is in the given format.INTERVAL <value>
: Specifies a time interval. E.g.,duration*interval '1 day'
would convertduration
integer value into a that number of days.
RIGHT(<str>, <num>)
: Return the rightnum
number of characters fromstr
. E.g.,RIGHT(account_str, 4)
selects the last4
characters ofaccount_str
.CONCAT(<arg1>, <arg2>, ...)
: Concatenate a number or arguments into a single string.
Window functions operate on a set of rows and return a single value for
each row. The term window refers to the set of rows that the function
operates on. There are a number of aggregate functions that are
applicable within the window, including AVG
, COUNT
, FIRST_VALUE
,
and SUM
. The syntax is
<FUNCTION()> OVER(PARTITION BY <GROUP_COLUMN>) [AS <alias>]
the
following:
SELECT
merchant_location_id,
service_date,
ROW_NUMBER() OVER(PARTITION BY merchant_location_id ORDER BY service_date DESC) AS bill_order
FROM
raw.analytics.analytics_insights_billing
More information on window functions may be found here: Window Functions.
Array functions process or return array data types
sequence(<start>, <end>, [<step>])
: Creates an array betweenstart
andend
[bystep
]. Can be used with integers, and dates.
CAST(<col_name> AS <type>)
: Cast a column into a new data type.COALESCE()
: exchanges null values for a user-defined value. E.g.,COALESCE(col_name, 0)
turns all null values incol_name
into0
.EXISTS(<query>)
: Returns a boolean value regarding whether the input<query>
exists. For example, the following query returns theid
and productname
from any product thatEXISTS
having sold for over $20 in thetransactions
table.
SELECT
id
, name
FROM products
WHERE EXISTS (SELECT 1 FROM transactions WHERE product_id = product.id AND price > 20)
TYPEOF(<col_name>)
: Return data type ofcol_name
The user may define their own functions with CREATE FUNCTION
. For
(atleast) PostgreSQL flavor, there are 4 requirements
- Specify
RETURNS <data_type>
- Specify language with
LANGUAGE SQL
(orLANGUAGE plpgsql
) - Use
SELECT
orBEGIN RETURN ... END
to specify the output $$
or'
surrounds function logic
Here is a syntactical example from codewars.com
:
CREATE FUNCTION agecalculator(date) RETURNS integer
AS $$ BEGIN RETURN EXTRACT(YEAR FROM AGE($1))::int END $$
LANGUAGE SQL;
Or equivalently
CREATE FUNCTION agecalculator(date) RETURNS int
AS 'SELECT EXTRACT(YEAR FROM AGE($1))::int;'
LANGUAGE SQL
More information regarding creating custom functions with PostgreSQL's
CREATE FUNCTION
command found
here.
||
: concatenate strings. E.g.,SELECT 'Ron Weasley has ' || COUNT(*) || ' pets' FROM pets WHERE owner = 'Ron Weasley';
::
: Castfield
tofmt
. E.g.,SELECT dollars::FLOAT
.->>
: Access fields in JSON data. E.g.,
SELECT * FROM email_events
WHERE sendgrid_data->>'event' = 'email_verification_failure'
--
: comments/* ... */
: Multi-line comments
A generic conditional expression that is similar to if/else expressions in other languages. It follows the following basic syntax.
CASE
WHEN condition THEN result
[ WHEN condition THEN result ]
...
[ ELSE result ]
END
Where condition
is a conditional expression and result
is the
specified return value (braces indicate optional arguments).
Alternatively, CASE
Expressions may follow this syntax:
CASE expression
WHEN value THEN result
[ WHEN value THEN result ]
...
[ ELSE result ]
END
where expression
is an expression that is evaluated and compared with
value
for equality. CASE
statements may be inserted directly within
SELECT
statements and end with END AS var_name, ...
.
USE <TYPE> <NAME>
: Specify a database or schema forTYPE
byNAME
for querySHOW COLUMNS IN <table>
: Used to showtable
columns and information regarding them
A meta-SQL table with information regarding tables, databases and more.
A common usage of the information_schema
(postgreSQL) table is to
learn about columns
. For example information_schema.columns
contains
much information on all table's columns, including column_name
and
data_type
.
CREATE TABLE table_name(
id INTEGER,
name TEXT,
age INTEGER
);
Where CREATE TABLE
is a clause and column_name DATA_TYPE
are
parameters. Numbers of lines used does not matter.
Data may be copied into a SQL database from a local or remote source
using the following COPY ... FROM
syntax:
COPY <table_name> FROM <path>
DELIMITER ',' CSV
HEADER;
Using WITH
and AS
statements, a temporary table may be created for
the current session. For example:
WITH calls_made AS
(SELECT
caller_id,
count(*) AS total_calls
FROM call_history
GROUP BY caller_id
)
SELECT
caller_id,
total_calls
FROM
calls_made;
Temporary tables may also be created with
CREATE TEMPORARY TABLE table_name AS ...
or
SELECT * INTO TEMPORARY TABLE table_name FROM ...
. These temp tables
will be dropped upon disconnection.
From the command line, SQL databases may be created with .sql
text
files using the following syntax:
sqlite3 database.db < create_database.sql
A .db
file may then be queried in a SQL environment.
Some important SQL terminology.
- Child Table: A table that references another's primary key,
using a foreign key. E.g., a
sales_record
table, referencing acustomer
table with a foreign key. If updating a child table ever requires you to update the parent as well, then we have improper design. - Column: a set of data values of a particular type.
- Cross-reference Table: A table that links primary keys from two tables, when there are many-to-many relationships between them. E.g., student applicants to universities tables.
- Database Normalization: A database design ideal: All non-key columns should belong exclusively to the primary key.
- Extract, Transform, Load: three database functions that are combined into one tool to pull data out of one database and place it into another database.
- Foreign Key: A table column that contains another table's
primary key. A foreign key can contain
NULL
. - Parent Table: A table that stores a primary key, referenced by a child table's foreign key. E.g., a cities parent table, and a neighborhoods child table.
- Primary Key: A table column that uniquely identifies a row from
the table and cannot contain
NULL
. - Row: a single record in a table
- Relationship: A table primary key that is the connection between two or more tables.
- Relational database: a database that organizes information into one or more tables. Saves space
- Relational Database Management System (RDBMS): a program that lets you create, update, and administer a relational database. Most relational database management systems use SQL to access the database. Types: SQLite, MySQL, SQL Database
- Schema: defines the structure of a table or a database
- Structured Data: Data organized neatly into columns and rows,
where all values are primary SQL data types (e.g.,
integer
,varchar
, etc.). Thus, excludes data values that are themselvesrows
orarrays
ormaps
datatypes. - Table (AKA Relation): a collection of data organized into rows and columns.