This post will give a few clean techniques to easily scrape data from Pro-Football-Reference (PFR) using R. If you are interested in doing NFL analytics but are unfamiliar with R, you might want to check out an introduction like mine over here (or a million others around the web), and then come back here.

The content of this post is similar to that in this post, and I am posting because I found it was a surprisingly simple process to scrape massive amounts of data from PFR.

Sidenote on language choice: I was originally going to do this in Python instead, using the BeautifulSoup package, similar to the nice post here. I veered away from the Python route for other reasons, but I may come back to it in a future post.

Let's write a user-defined scraping function that will scrape one of the nice, clean tables from any one of PFR's many branches: boxscores, drafts, player data, and so on. We don't want to have to specify which one yet.

We'll use the XML library, which we load into our session with `library(XML)`.

`ScrapeData = function ( urlprefix, urlend, startyr, endyr )`

This takes four arguments: the beginning and end of the URL, and the desired start and end years to scrape. Note that it stacks each scraped table into a master dataframe, with the most recent year ending up on top.
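The function body did not survive in the text, so here is a minimal sketch of what a `ScrapeData` with that signature could look like. Only the function name and its four arguments come from the post. The rest is assumption: that the table of interest is the first one on each page, that every year's table has the same columns, and that a `Year` column is worth adding for convenience. `readHTMLTable()` may also be unable to fetch `https` pages directly, in which case the HTML would need to be downloaded first.

```r
library(XML)  # provides readHTMLTable()

# Hypothetical reconstruction: only the signature comes from the post.
ScrapeData <- function(urlprefix, urlend, startyr, endyr) {
  years <- startyr:endyr
  tables <- lapply(years, function(yr) {
    # Build this year's address: prefix + year + suffix
    url <- paste0(urlprefix, yr, urlend)
    # Assumption: the table we want is the first one on the page
    tab <- readHTMLTable(url, which = 1, stringsAsFactors = FALSE)
    # Assumption: tag each row with its year (not stated in the post)
    tab$Year <- yr
    tab
  })
  # Reverse before stacking so the most recent year ends up on top
  do.call(rbind, rev(tables))
}

# Illustrative call (URL pattern is an assumption, not taken from the post):
# drafts <- ScrapeData("http://www.pro-football-reference.com/years/",
#                      "/draft.htm", 2010, 2013)
```

Splitting the URL into a prefix and a suffix around the year is what keeps the function generic: the same loop works for any PFR branch whose pages differ only by year.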