Announcing Horse Racing Datasets
By Dana Byerly
“…lots of fun can be had playing with the data looking for all kinds of funky angles.”
– o_crunk, 2010
We’ve been interested in horse racing datasets, and their unfortunate lack of availability, for some time. It’s a time-honored tradition for horseplayers to compile datasets for personal use, but it’s less common for them to share those datasets. That makes sense: horse racing is a parimutuel game, so why share a potential edge?
But there are some gems of compiled data out there, and given the scarcity and value of such generously shared datasets, we’re making them a little easier to find.
In a recent post where we were able to use a dataset compiled by a fellow player, we noted this:
“As handicappers and analysts, we’re all grudgingly used to the fact that we don’t live in an open data ecosystem, or even a closed data one where we can, for a reasonable price, easily grab data in a convenient and modifiable form to use for non-commercial purposes such as handicapping or blog posts meant to share insights and information. Yes, one can download text files from various PP vendors but I wouldn’t count the prices as affordable (e.g., $700-$800 for text files of charts for the year for personal use) and I also wouldn’t count the format as easy to use, especially when comparing to file formats like the ones found at Tennis-Data.co.uk, for example.”
We believe that the game would be better, and ultimately more enticing to data nerds, if horse racing information were easier to access as raw data at a reasonable price.
On November 1, 2009, we launched the first version of Hello Race Fans, a site dedicated to educating both new and existing fans by providing information on handicapping and wagering, and on the history and pleasures of the game. At the time, few online resources existed for people who were interested in learning more about Thoroughbred horse racing.
Just over five years later, the landscape has changed; you’ll find no shortage of ways to learn about handicapping, wagering and horse racing history online. Whether you’re looking to sharpen your skills or help a friend take those first steps towards understanding how to place a wager, you don’t have to work very hard to track down meaningful, useful information. And it’s not just independent publishers such as ourselves who have taken up the charge. The industry itself has realized the importance of helping people engage more deeply with racing and has allocated plenty of resources and budget to do so.
We’re happy to continue to play our part. We still believe in fan education, and we also believe that the data-rich landscape of racing lends itself to making sure that information is widely available and easy to use. With this in mind we created Horse Racing Datasets, to make it a tiny bit easier to find what’s already out there. And just like with fan education, we hope the industry will follow suit and become committed to a more open data ecosystem over the next five or so years.
Will you do a dataset for the Belmont Stakes?
Hi Chuck, thanks for stopping by. And thanks for your interest in Horse Racing Datasets!
We don’t currently have any Belmont datasets or plans to make any. But you could grab the Brisnet dataset, which has a tab for all of the Triple Crown races going back to 1990. We’re also keeping our eyes open for Belmont or Triple Crown datasets that others might have shared. Hope this helps!
Hi! I’m interested in horse race data (completion time, horse weight/age, jockey data, etc.) to use in regression analysis models. This is not for wagering, so it can be old data. Do you know where I could find something like this?
I don’t know how much data you’re looking for, but I would try Keeneland. You can download all of their results as a CSV file:
Hope this helps, and thanks for stopping by! You can always check out Horse Racing Datasets too (the Keeneland link is there as well):
I like what you are attempting to do here. I have a request: back when the Brisnet Handicapper’s Edge didn’t link to TwinSpires for info, the B.H.E. used to publish the week’s top Beyer figures for various surfaces and distances, drawn from many, many tracks, not just the “big” tracks. Do you know if I can obtain Beyers for a week, month or other manageable time period?
Thanks for the kind words! Unfortunately the only thing that comes to mind for Beyers is that they’re included on DRF’s Stakes Results page:
Not exactly a “top Beyers of the week,” but it’s the only thing I’ve noticed. I think Illman used to include a list in his blog, but I’m not entirely sure he posts regularly these days, let alone includes top Beyers.
Elsewhere, Brisnet publishes their top stakes figs weekly here:
TimeformUS publishes theirs weekly at TVG:
And you can search for graded and listed results at Equibase for results that include their figs:
Hope this helps, and thanks for stopping by.
I have a theory that the ROI on favorites in the last race of the day may be better than in other races, given folks trying to get out for the day. Any stats for the last race of the day?
Hi Marty, that’s a very interesting theory. We don’t have any datasets specific to that, but maybe you could grab that data from Keeneland (2006 – current) to see what you can make of it:
Let us know if you come up with anything!
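If you do pull the Keeneland data, one way to test the theory is to compare the ROI of a flat $2 win bet on every favorite in each card’s last race against favorites in all other races. The sketch below assumes the results are already loaded into records with hypothetical fields (`track`, `date`, `race_number`, `is_favorite`, `finished_first`, `win_payoff`); the actual Keeneland CSV columns are almost certainly named differently, so treat that mapping as an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass

STAKE = 2.0  # flat $2 win bet on each favorite


@dataclass
class Starter:
    # Hypothetical fields: map them to whatever columns your CSV actually uses.
    track: str
    date: str
    race_number: int
    is_favorite: bool
    finished_first: bool
    win_payoff: float  # payoff on a $2 win ticket; only used if finished_first


def favorite_roi(starters):
    """Return (roi_last_race, roi_other_races); None if a group has no bets."""
    # The last race of a card is the highest race number for that track and date.
    last_race = {}
    for s in starters:
        key = (s.track, s.date)
        last_race[key] = max(last_race.get(key, 0), s.race_number)

    staked = defaultdict(float)
    returned = defaultdict(float)
    for s in starters:
        if not s.is_favorite:
            continue
        group = "last" if s.race_number == last_race[(s.track, s.date)] else "other"
        staked[group] += STAKE
        if s.finished_first:
            returned[group] += s.win_payoff

    def roi(group):
        # ROI = (total returned - total staked) / total staked
        if staked[group] == 0:
            return None
        return (returned[group] - staked[group]) / staked[group]

    return roi("last"), roi("other")
```

An ROI of 0.25 means a 25% profit and -0.20 a 20% loss. Co-favorites are each bet here, which is only one of several reasonable conventions.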
I’m looking at creating a speed map of horses over time and need some particular results to do so.
I’m not fussed about what data I use right now, but it would have to be from today going back 2-3 years.
The data I need would be split times and completion times of a large set of given races.
Any idea where I could find this?
The Keeneland handicapping database has (some) split times going back to 2006. You can grab the data here:
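Once you have split times, a simple starting point for a speed map is the average speed over each segment between calls. This is a minimal sketch, assuming you already have a horse’s cumulative time at each call as (distance in furlongs, time) pairs and that timing starts at 0 furlongs; the `to_seconds` and `segment_speeds` helpers are hypothetical, not part of any Keeneland tool.

```python
FEET_PER_FURLONG = 660  # one furlong is 1/8 mile, i.e. 660 feet


def to_seconds(t):
    """Convert a chart-style time such as '1:10.40' or '22.85' to seconds."""
    if ":" in t:
        minutes, seconds = t.split(":")
        return int(minutes) * 60 + float(seconds)
    return float(t)


def segment_speeds(calls):
    """Average speed (feet per second) for each segment between calls.

    calls: list of (distance_in_furlongs, cumulative_seconds).
    Assumes timing starts at 0 furlongs and 0 seconds.
    Returns a list of (start_furlongs, end_furlongs, feet_per_second).
    """
    points = [(0.0, 0.0)] + sorted(calls)
    speeds = []
    for (d0, t0), (d1, t1) in zip(points, points[1:]):
        if t1 <= t0:
            raise ValueError("cumulative times must increase from call to call")
        speeds.append((d0, d1, (d1 - d0) * FEET_PER_FURLONG / (t1 - t0)))
    return speeds


calls = [(2, to_seconds("22.00")), (4, to_seconds("45.00")), (6, to_seconds("1:10.00"))]
for start, end, fps in segment_speeds(calls):
    print(f"{start}f to {end}f: {fps:.1f} ft/s")
```

With those example calls the segments work out to 60.0, about 57.4 and 52.8 feet per second, i.e. a horse slowing through the race. Plotting these per-segment speeds across many races is one way to build the speed map over time.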