Roadmap Forms 3,4, and 5 #22

Closed
firmai opened this issue Nov 24, 2024 · 4 comments

@firmai

firmai commented Nov 24, 2024

You seem to have worked on the Form 3 download function at some point.

downloader.download(form='3', date='2024-05-21', output_dir='filings')

It currently leads to an error. I was wondering about a few things: will you eventually expand datamule to incorporate downloads of all form types? Also more technically

And secondly, speaking of parsers: for Forms 3, 4, and 5, are you planning parsers to create an insider trading dataset?

@john-friedman
Owner

My bad! I think that was an overlooked issue with adding metadata. downloader.download(form='3', date=['2024-05-21'], output_dir='filings1') works.
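As a stopgap until the string form is fixed, a caller could wrap a single date in a list before passing it on. This is only a sketch: `as_date_list` and `download_form` are hypothetical helpers, not part of datamule, and the only thing assumed about `downloader.download` is the keyword call shown above.

```python
from typing import Callable, Iterable, List, Union


def as_date_list(date: Union[str, Iterable[str]]) -> List[str]:
    """Return the dates as a list, wrapping a single date string."""
    if isinstance(date, str):
        return [date]
    return list(date)


def download_form(download: Callable[..., object], form: str,
                  date: Union[str, Iterable[str]], output_dir: str):
    """Call download(...) with `date` always passed as a list.

    `download` is any callable accepting form/date/output_dir keywords,
    e.g. the bound method downloader.download from the comment above.
    """
    return download(form=form, date=as_date_list(date), output_dir=output_dir)
```

Usage would look like `download_form(downloader.download, '3', '2024-05-21', 'filings')`.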

Datamule will be expanded to incorporate downloads of all files contained within submissions, including graphics + unusual file types.

Yep! The Form 3, 4, and 5 parsers are already done. I'm considering setting up insider trading datasets hosted on a database for faster access. Is this something you're interested in?

@firmai
Author

firmai commented Nov 24, 2024

Very interested, and willing to sponsor some of the hosting cost. So you will end up creating download modules for all file types?

For example, this is typically very slow and leads to "too many requests" errors:

https://sec-edgar.github.io/sec-edgar/filings.html#secedgar.ComboFilings

import nest_asyncio
nest_asyncio.apply()

from datetime import date
from secedgar import ComboFilings

# Create filter function for Form 4
def form4_filter(filing_entry):
    return filing_entry.form_type == "4"

# Create ComboFilings instance with Form 4 filter
combo_filings = ComboFilings(
    start_date=date(2020, 1, 6),
    end_date=date(2020, 11, 5),
    entry_filter=form4_filter,
    user_agent="Clase [email protected]",
)

combo_filings.save('/my_directory')

@john-friedman
Owner

Currently you can get 10 requests/second from the SEC using downloader. Once I have my archive set up, the rate limit should be more like hundreds to thousands of requests/second.
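For anyone hitting "too many requests" with their own scripts, the 10 requests/second ceiling can be respected with a small client-side throttle. The sketch below is not how datamule's downloader does it; `RateLimiter` is a hypothetical class that simply spaces calls at least 1/rate seconds apart.

```python
import time


class RateLimiter:
    """Space successive calls at least 1/rate_per_second seconds apart."""

    def __init__(self, rate_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate_per_second
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._next_allowed = None

    def wait(self):
        """Block until the next request is allowed, then reserve a slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


# Assumed usage: call limiter.wait() immediately before each HTTP request.
limiter = RateLimiter(rate_per_second=10)
```

This serialises requests rather than allowing bursts, which is the conservative choice when the server enforces a hard per-second cap.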

For insider trading, I'd like to set up a database that is updated when new filings come in. There are about 4 million insider trading disclosures, and I think with some optimization I can get that to fit on Turso's 8 GB free tier.
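To put the "some optimization" in numbers, here is a back-of-the-envelope budget. It assumes decimal gigabytes and the rough 4 million count quoted above; how the storage quota is actually measured, and any index overhead, are not accounted for.

```python
def bytes_per_row(total_bytes: int, rows: int) -> float:
    """Average storage available per row if `rows` rows must fit in `total_bytes`."""
    return total_bytes / rows


GB = 10**9  # assumption: decimal gigabytes

budget = bytes_per_row(8 * GB, 4_000_000)
print(budget)  # 2000.0 bytes per disclosure, indexes included
```

So each disclosure, including its share of any indexes, would need to average roughly 2 KB or less.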
