EDGAR API Scraping/Alerting Ideas

Navi · January 19, 2023, 2:15am

Placeholder for now. Making notes after I fix my marriage.

derfam · January 19, 2023, 2:30am

Could we scrape earnings call transcripts to pull out specific data points and push them to the trading floor if the ticker is on a watchlist?

Navi · January 19, 2023, 2:37am

Guess they have a bulk file, jesus christ lmao: SEC.gov | EDGAR Application Programming Interfaces

Navi · January 19, 2023, 2:40am

Adding in FRED since it has economic data: St. Louis Fed Web Services: FRED® API

Conqueror · January 19, 2023, 6:16am

Grabbed a key for FRED this evening as it’ll be a great tool to feed data to GPT. Going to grab a key for Edgar tomorrow.
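For the FRED side, a minimal sketch of turning a series into something we could hand to GPT, assuming the `fred/series/observations` endpoint with `file_type=json`. The helper names are made up for illustration, and `UNRATE` below is just an example series id.

```python
import json
from urllib.parse import urlencode

# FRED's series-observations endpoint; requires a (free) API key.
FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"


def build_fred_url(series_id: str, api_key: str) -> str:
    """Build a request URL for one series, asking FRED for JSON instead of XML."""
    params = {"series_id": series_id, "api_key": api_key, "file_type": "json"}
    return f"{FRED_OBSERVATIONS_URL}?{urlencode(params)}"


def parse_observations(payload: str) -> list:
    """Turn a FRED JSON response into (date, value) pairs.

    FRED marks missing observations with the string ".", so those are skipped.
    """
    data = json.loads(payload)
    return [
        (obs["date"], float(obs["value"]))
        for obs in data.get("observations", [])
        if obs["value"] != "."
    ]
```

Usage: fetch `build_fred_url("UNRATE", key)` with any HTTP client and pass the response body to `parse_observations`.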

The main thing this thread needs to settle is specifically what we want to do with this data. I know @Shadowstars had a project he was working on. Scraping is my forte, so I don’t anticipate any issues extracting the data we’re looking for if we can narrow down the criteria in this thread.

Additionally, the positions system should be extended with this data for alerts’ sake.

Conqueror · January 19, 2023, 6:33am

Briefly looked into earnings call transcripts and the common theme is that there is a sizable lag between the call and availability of the transcript.

Conqueror · January 19, 2023, 9:21pm

Friendly reminder to continue this conversation you lazy sacks of shit

Navi · January 19, 2023, 9:25pm

Oh yeah, can you actually grab that bulk file and add it here?

Conqueror · January 19, 2023, 9:26pm

Firstly, what bulk file? Secondly, why would I add it to the forum? Lmao. There are two that I see, and the first one is 1GB.

Navi · January 19, 2023, 9:27pm

Never mind, I’ll download it and look through it. Wanna put together a field mapping legend. I’m just not at home; still in Seattle.

Conqueror · January 19, 2023, 9:36pm

The bulk files wouldn’t be what we’re using since they’re only compiled nightly and we’re looking for things as soon as they come out. If we’re looking for the fastest data, we’d have to use their subscription service: SEC.gov | EDGAR Public Dissemination Service (PDS) System

However, I think we’d be fine using the latest filings API (https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent), which gives us the latest filings within “1-3 minutes” of acceptance, plus a text-based version of each filing, which can be seen here: https://www.sec.gov/Archives/edgar/data/823546/000149315223001903/0001493152-23-001903.txt
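A sketch of reading the latest filings page in machine-readable form, assuming `getcurrent` accepts `output=atom` (plus `count`) as written in `CURRENT_FEED_URL`. EDGAR doesn’t use API keys; the SEC instead asks automated clients to identify themselves in the User-Agent, so the value below is a placeholder. The `.txt` link is derived by swapping `-index.htm` for `.txt`, which matches the pattern of the example URL above.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}
# Assumption: getcurrent accepts output=atom and count (number of entries).
CURRENT_FEED_URL = (
    "https://www.sec.gov/cgi-bin/browse-edgar"
    "?action=getcurrent&owner=include&count=100&output=atom"
)
# Placeholder identity; SEC asks automated clients for a descriptive User-Agent.
HEADERS = {"User-Agent": "ExampleTradingFloor admin@example.com"}


def fetch_current_feed() -> bytes:
    """Download the raw Atom feed as bytes so the XML's declared encoding is honored."""
    req = urllib.request.Request(CURRENT_FEED_URL, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def index_to_txt_url(index_url: str) -> str:
    """Map a filing's ...-index.htm page to its full-text .txt submission."""
    return index_url.replace("-index.htm", ".txt")


def parse_current_feed(atom_xml) -> list:
    """Extract one dict per <entry> in the feed."""
    root = ET.fromstring(atom_xml)
    filings = []
    for entry in root.findall("atom:entry", ATOM):
        link = entry.find("atom:link", ATOM)
        category = entry.find("atom:category", ATOM)
        index_url = link.get("href", "") if link is not None else ""
        filings.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM),
            "title": entry.findtext("atom:title", default="", namespaces=ATOM),
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM),
            "form": category.get("term", "") if category is not None else "",
            "index_url": index_url,
            "txt_url": index_to_txt_url(index_url),
        })
    return filings
```

Usage: `parse_current_feed(fetch_current_feed())` returns the current page of filings, ready to be diffed against whatever was seen last time.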

Navi · January 19, 2023, 9:42pm

Idk, the bulk files could be helpful for giving Mimir a dataset so it’s more up to date on stuff. But yeah, they wouldn’t work for getting notifications on filings. My only question is how we establish the trigger event for hitting the API, because otherwise it’ll just hit the rate limiter and get timed out.
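One way to handle the trigger question, sketched under assumptions: don’t fire requests per event at all. Instead, poll the latest filings page on a fixed schedule (say once a minute) and push every SEC request through a limiter. The SEC’s fair-access policy caps automated traffic at 10 requests per second, so the hypothetical `MinIntervalLimiter` below just spaces calls at least `1 / max_per_second` apart; any value under 10 (e.g. 5) leaves headroom.

```python
import threading
import time


class MinIntervalLimiter:
    """Block callers so requests are spaced at least 1/max_per_second apart.

    Thread-safe: the lock is held while sleeping, so concurrent workers line up
    behind each other instead of bursting.
    """

    def __init__(self, max_per_second: float, clock=time.monotonic, sleep=time.sleep):
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self.interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")  # first call never waits
        self._lock = threading.Lock()

    def wait(self) -> None:
        """Sleep just long enough to respect the interval, then reserve the next slot."""
        with self._lock:
            now = self._clock()
            if now < self._next_allowed:
                self._sleep(self._next_allowed - now)
                now = self._next_allowed
            self._next_allowed = now + self.interval
```

Call `limiter.wait()` right before each request; the injectable `clock`/`sleep` hooks only exist so the pacing can be checked without really sleeping.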

Navi · January 19, 2023, 9:47pm

Hmmmm, that’s a good txt file, though I’d rather it came in JSON or something. It’d be nice to feed it to Mimir but also set up a Tableau or Power BI report for the bulk data so people can search stuff super easily too. But yeah, the new filings question, and figuring out when to hit the API, is my main question atm.
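On the JSON point: the `.txt` full submission opens with a `<SEC-HEADER>` block of `KEY:<tabs>value` lines, so a rough conversion is just parsing those. A minimal sketch, assuming that line shape. Repeated keys (like several `ITEM INFORMATION` lines on an 8-K, or `STREET 1` appearing under both business and mail address) become lists, and the nested filer/address sections are flattened, which is a simplification.

```python
import json
import re

# Assumed header line shape: "KEY NAME:<whitespace>value" (section lines like "FILER:" have no value).
_FIELD = re.compile(r"^\s*([A-Z][A-Z0-9 /-]*?):\s+(\S.*?)\s*$")


def parse_sec_header(submission_text: str) -> dict:
    """Flatten the <SEC-HEADER> block of a full-text submission into a dict."""
    start = submission_text.find("<SEC-HEADER>")
    end = submission_text.find("</SEC-HEADER>")
    header = submission_text[start:end] if start != -1 and end > start else submission_text
    fields = {}
    for line in header.splitlines():
        match = _FIELD.match(line)
        if match:
            fields.setdefault(match.group(1), []).append(match.group(2))
    # Keep lists only for repeated keys (e.g. several ITEM INFORMATION lines).
    return {key: vals[0] if len(vals) == 1 else vals for key, vals in fields.items()}


def header_to_json(submission_text: str) -> str:
    """JSON view of the header, e.g. for feeding Mimir."""
    return json.dumps(parse_sec_header(submission_text), indent=2)
```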

Conqueror · January 19, 2023, 9:47pm

For this project, as I understand it, we’re not looking for “everything”; we should narrow down a list of filings and stocks that we’re interested in. The scanning will be done in two parts. The first is a minute-by-minute check of the latest filings page that keeps a snapshot of what it saw last time and forwards any new filings (minus the filtered-out results) to a queue. The second is a worker process that pulls filings from that queue and processes them while respecting the SEC site’s rate limiting. Any information in a filing that matches the criteria we’ve set can then be saved easily.
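The two-part process above, sketched with hypothetical injected hooks (`fetch_latest`, `is_wanted`, `process_filing`, `limiter_wait`) so it doesn’t depend on any one data source. The poller diffs each fetch against the previous snapshot of filing ids; the worker drains a `queue.Queue` and calls the limiter before each SEC request.

```python
import logging
import queue
import threading


class FilingScanner:
    """Poller + worker split; all four callables are hypothetical hooks."""

    def __init__(self, fetch_latest, is_wanted, process_filing, limiter_wait):
        self.fetch_latest = fetch_latest      # returns a list of {"id": ..., ...} dicts
        self.is_wanted = is_wanted            # filter: filing dict -> bool
        self.process_filing = process_filing  # fetch + parse one filing
        self.limiter_wait = limiter_wait      # blocks until a request is allowed
        self.last_snapshot = set()            # filing ids seen on the previous poll
        self.work = queue.Queue()

    def poll_once(self) -> int:
        """Diff the latest page against the last snapshot; queue new, wanted filings."""
        filings = self.fetch_latest()
        queued = 0
        for f in filings:
            if f["id"] not in self.last_snapshot and self.is_wanted(f):
                self.work.put(f)
                queued += 1
        self.last_snapshot = {f["id"] for f in filings}
        return queued

    def run_poller(self, stop: threading.Event, interval: float = 60.0) -> None:
        while not stop.is_set():
            try:
                self.poll_once()
            except Exception:
                logging.exception("poll failed; retrying next interval")
            stop.wait(interval)

    def run_worker(self, stop: threading.Event) -> None:
        """Drain the queue one filing at a time, pacing requests via limiter_wait."""
        while not stop.is_set():
            try:
                filing = self.work.get(timeout=1)
            except queue.Empty:
                continue
            try:
                self.limiter_wait()
                self.process_filing(filing)
            finally:
                self.work.task_done()
```

One caveat with the snapshot approach: if more filings land between polls than the page shows (e.g. a 100-entry page during a busy minute), the overflow is never seen, so the poll interval and page size need to be sized for peak volume.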

As far as the bulk file goes, if we’re going to tackle actually storing data, we’ll have to figure out the costs. I personally don’t think it’s that necessary since the SEC site is very accessible on demand, but I can see reasons for it.

Navi · January 19, 2023, 9:50pm

Kk yeah, so would we want to query the getcurrent reports with the positions feed, then? Basically use a tuple or dictionary from the positions system as the query_list, minus a blacklist for indexes/ETFs?
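A sketch of building that `query_list`, assuming the positions feed can be read as a ticker-keyed dict (hypothetical shape) and using a hand-maintained ETF/index blacklist (example entries only). Since getcurrent entries identify filers by CIK rather than ticker, the second function maps tickers to 10-digit CIKs using the structure of the SEC’s `company_tickers.json` file (`{"0": {"cik_str": ..., "ticker": ..., "title": ...}, ...}`).

```python
import json

# Example ETF/index tickers to ignore; extend as needed.
ETF_INDEX_BLACKLIST = frozenset({"SPY", "QQQ", "IWM", "DIA"})


def build_query_list(positions: dict, blacklist=ETF_INDEX_BLACKLIST) -> tuple:
    """Unique, upper-cased tickers from the positions feed, minus the blacklist set."""
    return tuple(sorted({t.upper() for t in positions} - set(blacklist)))


def tickers_to_ciks(query_list, company_tickers_json: str) -> dict:
    """Map tickers to zero-padded 10-digit CIKs using company_tickers.json content."""
    rows = json.loads(company_tickers_json).values()
    by_ticker = {row["ticker"].upper(): f"{int(row['cik_str']):010d}" for row in rows}
    return {t: by_ticker[t] for t in query_list if t in by_ticker}
```

The resulting CIKs would then be matched against the CIK attached to each new filing.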

Navi · January 19, 2023, 9:53pm

For the reporting side of the bulk file, I kinda like the idea of running an automation that grabs the bulk file and overwrites the old one in a OneDrive folder, and then just pointing the Power BI report’s source at the new bulk file. The relationships and formatting will hold once I set it up, and then we’ve got an easy sort of EDGAR library we could share if people wanna use it. Idk how helpful it would be, honestly. More of a weird project for me to try, really.
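For the OneDrive/Power BI refresh, a sketch of the download-and-overwrite step, assuming the bulk file lives at the `submissions.zip` URL below (verify it against the EDGAR API page) and that `dest_path` sits inside the synced OneDrive folder. It downloads to a temporary `.part` file in the same folder and swaps it in with `os.replace`, so the report never picks up a half-written file. On Windows the swap can fail if something still has the old file open. The User-Agent is a placeholder.

```python
import os
import shutil
import tempfile
import urllib.request

# Assumed location of the nightly submissions bulk file; verify before use.
BULK_URL = "https://www.sec.gov/Archives/edgar/daily-index/bulkdata/submissions.zip"
HEADERS = {"User-Agent": "ExampleTradingFloor admin@example.com"}  # placeholder identity


def refresh_bulk_file(url: str, dest_path: str) -> str:
    """Download url to a temp file next to dest_path, then swap it into place."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as out:
            req = urllib.request.Request(url, headers=HEADERS)
            with urllib.request.urlopen(req, timeout=300) as resp:
                shutil.copyfileobj(resp, out)
        os.replace(tmp_path, dest_path)  # old file is only replaced after a full download
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return dest_path
```

Usage: schedule `refresh_bulk_file(BULK_URL, r"<OneDrive folder>\submissions.zip")` nightly, after the SEC regenerates the file.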

jjcox82 · January 20, 2023, 3:18am

So I’ve been monitoring this for a few months now, since we discussed it in November and December, and it seems the Dow Jones Newswire almost always drops the first data, at least to TD, which I know you have linked in already.

Typically the press release comes out before the call, especially on premarket prints… If we set a group of tickers each week and monitor their press releases for words or phrases such as “growth,” “increased guidance,” “buyback,” “loss,” etc., that could be helpful. I’m by no means anywhere close to a programmer and I’m slightly computer illiterate when it comes to that stuff, so I don’t know how that aspect works.
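For the phrase-monitoring idea, a minimal sketch of the matching side, using the example phrases from the post. Matching is case-insensitive, uses word boundaries so “loss” doesn’t fire on “glossary,” and tolerates line breaks inside multi-word phrases. It doesn’t handle plurals or negation (“no loss”), which is where GPT-style analysis would come in.

```python
import re

# Example phrases from this thread; expect to tune the list over time.
WATCH_PHRASES = ("growth", "increased guidance", "buyback", "loss")


def compile_phrases(phrases) -> dict:
    """Case-insensitive, whole-word patterns; multi-word phrases may span line breaks."""
    return {
        p: re.compile(r"\b" + r"\s+".join(map(re.escape, p.split())) + r"\b", re.IGNORECASE)
        for p in phrases
    }


def match_phrases(text: str, compiled: dict) -> dict:
    """Count how often each watched phrase appears; phrases with no hits are omitted."""
    hits = {}
    for phrase, pattern in compiled.items():
        count = len(pattern.findall(text))
        if count:
            hits[phrase] = count
    return hits
```

Usage would be to run `match_phrases` on each release from the weekly ticker group and alert on any non-empty result.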

Conqueror · January 20, 2023, 5:30am

Well, I think using the positions system or even the watchlist system is perhaps a bit too limited. While having info and alerts about stocks we’re actively playing and watching is important, I also want to build out some signal functions, since I’m sure there are opportunities we otherwise won’t be in front of. So I’m thinking more along the lines of market cap and, more importantly, filing type.
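A sketch of a filing-type/market-cap filter for signals beyond our positions. EDGAR’s feed doesn’t carry market cap, so `market_caps` here is a hypothetical CIK-to-cap lookup we’d have to source elsewhere; the default form set and cap bounds are placeholders until we settle on criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalCriteria:
    """Placeholder criteria; the actual form types and cap range are still TBD."""
    form_types: frozenset = frozenset({"8-K", "8-K/A"})
    min_market_cap: float = 0.0
    max_market_cap: float = float("inf")


def passes_criteria(filing: dict, criteria: SignalCriteria, market_caps: dict) -> bool:
    """True if the filing's form type and the filer's market cap are in range.

    market_caps is a hypothetical CIK -> market cap (USD) lookup; filers with
    no known cap are skipped rather than guessed at.
    """
    if filing["form"] not in criteria.form_types:
        return False
    cap = market_caps.get(filing["cik"])
    if cap is None:
        return False
    return criteria.min_market_cap <= cap <= criteria.max_market_cap
```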

Yeah, I get this part of it and I’m interested in the “library” aspect, but I think having something search the filings as they come in is the more important part of the project, considering time is a factor. Honestly, if you’re waiting for the bulk file, it’s already too late in terms of trading opportunities, since something accepted at 8AM won’t be reflected in that file until literally that evening. I’m happy to work on both, but for the sake of trading opportunities, scanning the latest filings is probably more likely to produce something tradeable.

This is definitely the sort of thing I’m hoping to accomplish. We do have those headlines already wired in, and I’m going to try to leverage GPT to decipher those releases as they drop. Ideally, Mimir would grab headlines and analyze them for instant results (since those are typically sent to news outlets ahead of time, per my understanding), and then the latest filings process would catch the filings as they drop and parse them for additional information, triggering a new signal for each data point it finds, if that makes sense.

So really, in most cases we need to nail down exactly what we’re looking for. For the EDGAR stuff, that means which filings and which tickers; for earnings results and press releases, we need examples of phrases that we can match.

Conqueror · January 24, 2023, 10:04pm

Just a reminder that this development needs further discussion on what to look for in documents. @Navi @Shadowstars, etc.

Navi · January 25, 2023, 9:44pm

Yeah, quick question about developing our ticker criteria library. Certain types of submissions are often going to materially affect stocks. To better narrow down which submissions we want, I gotta go through and review the different use cases for each form. That way we can figure out what we actually want. Like, I don’t think we truly give a fuck about the 10-K or even the 10-Q. If you’re playing earnings, you’re likely already positioned, and trying to get that data faster is pointless because you can’t manage the position. 8-Ks, and possibly a couple of others, though, are often not well reported on but give a huge amount of context about companies’ decisions and what’s really going on (the MULN 8-K just published today is a great example).
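Since 8-Ks look like the main target, a sketch of pulling the reported items out of one. This assumes the `.txt` submission’s header lists each 8-K item on an `ITEM INFORMATION:` line (as current filings do, to my understanding) and matches them by substring against a watch list. The `HIGH_INTEREST_ITEMS` entries are Form 8-K item titles picked purely as examples; which items we actually care about is exactly what this thread still needs to decide.

```python
import re

# One capture per "ITEM INFORMATION:<whitespace>description" header line.
_ITEM_LINE = re.compile(r"^\s*ITEM INFORMATION:\s+(\S.*?)\s*$", re.MULTILINE)

# Example watch list of 8-K item titles; placeholder until we agree on criteria.
HIGH_INTEREST_ITEMS = (
    "Entry into a Material Definitive Agreement",
    "Bankruptcy or Receivership",
    "Notice of Delisting",
    "Unregistered Sales of Equity Securities",
    "Departure of Directors or Certain Officers",
)


def eight_k_items(submission_text: str) -> list:
    """All item descriptions listed in the submission header, in order."""
    return _ITEM_LINE.findall(submission_text)


def high_interest_items(submission_text: str, watch=HIGH_INTEREST_ITEMS) -> list:
    """Items whose description contains any watched title (case-insensitive)."""
    return [
        item for item in eight_k_items(submission_text)
        if any(w.lower() in item.lower() for w in watch)
    ]
```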