Professors: Torben Andersen & Zhengyang Jiang
TA: Jose Antunes-Neto
These are the datasets used in the course. You will need to download them for the weekly assignments. If there is any issue with the data, contact me by email. The code used to update these datasets is available in the GitHub repository. The datasets are described below:
- CRSP Daily Stock Data
- Small Cap Daily Index
- S&P 500 Daily Index
- S&P 500 1-minute Price Data
- Shiller CAPE Data
The file CRSP_daily.csv contains daily value-weighted and equal-weighted index returns of CRSP firms listed on the NYSE, AMEX, NASDAQ, or ARCA. These returns are calculated as percentage changes in the closing prices of the stocks, with dividends included. At the beginning of the sample, approximately 520 stocks are included; toward the end of the sample, approximately 7,520 stocks are included. The data is downloaded from Wharton Research Data Services (WRDS) and covers 1926-01-02 to 2023-12-29. The file contains the following variables:
Variable | Type | Description |
---|---|---|
date | string | Date of the observation in the format yyyymmdd |
vwretd | float | CRSP value-weighted index return (Dividends included) |
ewretd | float | CRSP equal-weighted index return (Dividends included) |
More information can be found on the CRSP website. The data is contained in the crsp_q_stock.dsi table.
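A minimal loading sketch, assuming pandas is installed. In practice `source` would be a path such as data/CRSP_daily.csv; that exact path is an assumption based on update_data.py saving files in the data/ directory. Reading the date column as text keeps the yyyymmdd values intact before they are parsed into real dates. The sample rows are made up and only mimic the layout of the table above; check the scale of vwretd and ewretd (decimal or percent) before using them.

```python
import io

import pandas as pd


def load_crsp_daily(source) -> pd.DataFrame:
    """Read the CRSP daily index file and index it by a DatetimeIndex.

    `source` can be a file path (e.g. the assumed "data/CRSP_daily.csv")
    or any file-like object.
    """
    # Keep yyyymmdd as a string so it is not read as an integer.
    df = pd.read_csv(source, dtype={"date": str})
    df["date"] = pd.to_datetime(df["date"], format="%Y%m%d")
    return df.set_index("date").sort_index()


# Made-up rows, deliberately out of order, to show the expected layout.
sample = io.StringIO(
    "date,vwretd,ewretd\n"
    "19260104,0.005,0.009\n"
    "19260102,0.006,0.010\n"
)
crsp = load_crsp_daily(sample)
print(crsp.index[0].date())  # 1926-01-02
```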
The file smallcap_daily.csv contains the daily returns of portfolios made up of the bottom 30% of CRSP firms listed on the NYSE, AMEX, or NASDAQ, ranked by size. Returns are calculated as percentage changes in the closing price of the index and are displayed at the percentage level. The data was obtained from Kenneth French's website by downloading the daily Portfolios Formed on Size CSV file and is available from 1926-07-01 to 2024-01-31. The file contains the following variables:
Variable | Type | Description |
---|---|---|
DATE | string | Date of the observation in the format yyyymmdd |
vw | float | Returns of the value-weighted portfolio (Dividends included) |
ew | float | Returns of the equal-weighted portfolio (Dividends included) |
More information can be found on Kenneth French's website.
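Because these returns are displayed at the percentage level (1.5 means 1.5%), they must be divided by 100 before compounding. The growth of $1 after T days is then the product of (1 + r_t/100) over t = 1..T. The sketch below assumes pandas and uses made-up values standing in for the vw column.

```python
import pandas as pd


def cumulative_growth(pct_returns: pd.Series) -> pd.Series:
    """Growth of $1 invested, given simple returns quoted in percent."""
    # Convert percentage points to decimals, then compound: prod(1 + r_t).
    return (1.0 + pct_returns / 100.0).cumprod()


# Made-up daily returns in percent (e.g. values from the "vw" column).
vw = pd.Series([1.0, -2.0, 0.5])
growth = cumulative_growth(vw)
print(round(growth.iloc[-1], 6))  # 0.994749
```

Since dividends are included in vw and ew, this compounds total returns rather than pure price changes.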
The file SP500_daily.csv contains the returns of the S&P 500 index. Returns are calculated using the closing price of the index and are displayed at the percentage level. The data is downloaded from Wharton Research Data Services (WRDS) and is available from 1926-01-02 to 2023-12-29. The file contains the following variables:
Variable | Type | Description |
---|---|---|
caldt | string | Date of the observation in the format yyyymmdd |
vwretd | float | S&P 500 value-weighted index return (Dividends included) |
ewretd | float | S&P 500 equal-weighted index return (Dividends included) |
Data is contained in the crsp_q_indexes.dsp500 table.
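The three daily files name their date column differently (date, DATE, and caldt), so combining them requires parsing each one separately. The following is a sketch assuming pandas; the helper `read_daily`, the column prefixes, and the in-memory sample values are all hypothetical. Keep in mind that the S&P 500 and small-cap returns are displayed at the percentage level before mixing them with other series.

```python
import io

import pandas as pd

# Date column name in each daily file, taken from the variable tables.
DATE_COLUMNS = {"CRSP_daily.csv": "date", "smallcap_daily.csv": "DATE", "SP500_daily.csv": "caldt"}


def read_daily(source, date_col: str, prefix: str) -> pd.DataFrame:
    """Read one daily file, index it by its yyyymmdd date, and prefix the return columns."""
    df = pd.read_csv(source, dtype={date_col: str})
    df[date_col] = pd.to_datetime(df[date_col], format="%Y%m%d")
    df = df.set_index(date_col).rename_axis("date")
    return df.add_prefix(prefix + "_")


# Made-up values that only mimic the layout of SP500_daily.csv and smallcap_daily.csv.
sp500 = read_daily(
    io.StringIO("caldt,vwretd,ewretd\n20231228,0.04,0.05\n20231229,-0.28,-0.20\n"),
    DATE_COLUMNS["SP500_daily.csv"],
    "sp500",
)
small = read_daily(
    io.StringIO("DATE,vw,ew\n20231229,-0.50,-0.40\n"),
    DATE_COLUMNS["smallcap_daily.csv"],
    "small",
)
# An inner join keeps only the dates present in both files.
merged = sp500.join(small, how="inner")
print(merged.columns.tolist())
# ['sp500_vwretd', 'sp500_ewretd', 'small_vw', 'small_ew']
```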
The file SPY_HF.zip contains the price series of SPY, the S&P 500 ETF, at the one-minute frequency. The data is downloaded from the NYSE Trades and Quotes (TAQ) Consolidated Trades database using WRDS's SAS Studio platform. Observations are available from 1993-01-29 to 2024-03-28, between 9:30 and 16:00 (Eastern Time). Note that the series is irregularly spaced: trades do not occur in every minute throughout the sample, so some minutes have no observation. The file contains the following variables:
Variable | Type | Description |
---|---|---|
DATETIME | string | Date and time of the observation in the format ddmmmyyyy:HH:MM:SS |
SYMBOL | string | Symbol of the extracted stock (SPY) |
PRICE | float | Average price per minute, weighted by trade size (all non-canceled trades) |
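The DATETIME values follow the SAS-style format ddmmmyyyy:HH:MM:SS (for example 29JAN1993:09:30:00), which maps to the strptime pattern "%d%b%Y:%H:%M:%S". Because the series is irregularly spaced, it can be useful to list the minutes of a trading day that have no observation. This sketch assumes pandas, a CSV extracted from the zip, and a grid that includes both 9:30 and 16:00 (391 minutes); the helper names and sample prices are hypothetical.

```python
import io

import pandas as pd


def load_spy_minutes(source) -> pd.Series:
    """Return SPY minute prices indexed by timestamp."""
    df = pd.read_csv(source)
    # %b matches three-letter month abbreviations such as JAN.
    df["DATETIME"] = pd.to_datetime(df["DATETIME"], format="%d%b%Y:%H:%M:%S")
    return df.set_index("DATETIME")["PRICE"].sort_index()


def missing_minutes(prices: pd.Series, day: str) -> pd.DatetimeIndex:
    """Minutes from 9:30 to 16:00 (both included, an assumption) with no observation."""
    grid = pd.date_range(f"{day} 09:30", f"{day} 16:00", freq="min")
    return grid.difference(prices.index)


# Made-up prices; 09:31 has no trade.
sample = io.StringIO(
    "DATETIME,SYMBOL,PRICE\n"
    "29JAN1993:09:30:00,SPY,43.97\n"
    "29JAN1993:09:32:00,SPY,43.98\n"
)
spy = load_spy_minutes(sample)
gaps = missing_minutes(spy, "1993-01-29")
print(len(gaps))  # 389
```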
The TAQ data comes from two WRDS libraries. For 1993 to 2014, trades are time-stamped to the second and are obtained from the taq library. From 2015 onward, trades are time-stamped to the millisecond and are obtained from the taqmsec library. For both series, the data was aggregated to the one-minute frequency using the last available price. More information about this dataset can be found on the NYSE website.
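Two common ways to aggregate trades into one-minute bars are the last price observed in each minute and the trade-size-weighted average price listed for PRICE in the table above. The sketch below computes both with pandas; it is not the script actually used to build SPY_HF.zip. The input column names price and size, the bar convention [hh:mm:00, hh:mm+1:00) labeled by its start, and the sample trades are assumptions. Minutes without trades are dropped, which keeps the result irregularly spaced.

```python
import pandas as pd


def minute_bars(trades: pd.DataFrame) -> pd.DataFrame:
    """Aggregate trades (DatetimeIndex, hypothetical columns 'price' and 'size') to 1-minute bars."""
    grouped = trades.resample("1min", label="left", closed="left")
    # Last trade price within each minute.
    last = grouped["price"].last()
    # Size-weighted average price: sum(price * size) / sum(size) within each minute.
    notional = (trades["price"] * trades["size"]).resample("1min", label="left", closed="left").sum()
    volume = grouped["size"].sum()
    vwap = notional / volume  # empty minutes give 0/0 = NaN
    return pd.DataFrame({"last": last, "vwap": vwap}).dropna()


# Made-up trades; the 09:31 minute has none.
idx = pd.to_datetime(["2024-03-28 09:30:05", "2024-03-28 09:30:40", "2024-03-28 09:32:10"])
trades = pd.DataFrame({"price": [100.0, 101.0, 102.0], "size": [100, 300, 200]}, index=idx)
bars = minute_bars(trades)
print(bars)
```

For the 09:30 bar, the last price is 101.0 and the weighted average is (100 × 100 + 101 × 300) / 400 = 100.75.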
The file Shiller_ie_data.csv contains monthly stock price, dividend, and earnings data, together with the consumer price index (CPI) used to convert them to real values, all starting in January 1871. The data is downloaded directly from Robert Shiller's website, where more information can be found.
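To convert nominal values to real values, each observation is scaled by CPI_T / CPI_t, where T is the last month in the sample. A CAPE-style ratio then divides the real price by the trailing 10-year (120-month) average of real earnings. This is a sketch assuming pandas and hypothetical column names P, E, and CPI (the actual headers in Shiller_ie_data.csv may differ); the tiny example uses a 2-month window and made-up numbers only so the arithmetic is easy to follow.

```python
import pandas as pd


def add_real_values_and_cape(df: pd.DataFrame, window: int = 120) -> pd.DataFrame:
    """Deflate monthly price and earnings by CPI and compute a CAPE-style ratio.

    Expects hypothetical columns 'P' (price), 'E' (earnings), and 'CPI', one row per
    month, sorted in time. Real values are expressed in dollars of the last month.
    """
    out = df.copy()
    deflator = out["CPI"].iloc[-1] / out["CPI"]
    out["real_P"] = out["P"] * deflator
    out["real_E"] = out["E"] * deflator
    # Real price over the trailing `window`-month average of real earnings.
    out["cape"] = out["real_P"] / out["real_E"].rolling(window).mean()
    return out


# Made-up monthly data; window=2 only to keep the example small.
df = pd.DataFrame({"P": [10.0, 12.0, 15.0], "E": [1.0, 1.0, 2.0], "CPI": [50.0, 50.0, 100.0]})
res = add_real_values_and_cape(df, window=2)
print(res["cape"].tolist())  # [nan, 12.0, 7.5]
```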
The repository includes the following scripts:
- code/getTAQ.sas: example SAS script, run on WRDS, used to download the S&P 500 1-minute price data.
- code/update_data.py: Python script that updates the remaining datasets. It downloads the data from the original sources and saves it in the data/ directory.