New Datasets: Saratoga
By Dana Byerly

We’ve done some much needed updating at Horse Racing Datasets over the last couple of weeks. In addition to a bit of sprucing, we’ve added few new datasets to help you get in mood for Saratoga!

Using Handycapper, a (no longer available) tool that allows you digitize Equibase racing charts to create your own personal racing database, we queried the 2017 Saratoga meet and put together some info. All datasets exclude Steeplechase races.

There are three new datasets:
Saratoga 2017 – Field Size, Favorites and Odds by Distance and Course
Saratoga 2017 – Trainers off 60 days+ and 90 days+
Saratoga 2017 – Trainers 2-year-old FTS

Below you’ll find more info about each dataset plus a query that you can run against your own Handycapper database to produce this data for yourself, for Saratoga or any other meet! What’s a query you ask? Take a look at this thread for an overview and pep-talk as well as some examples. If you have any suggestions for improvements to the listed queries, let us know in the comments.

NOTE: At the request of Equibase Handycapper is no longer available for download. If you already have Handycapper, and we hope you do, the queries below will still work.

In addition to the new datasets, we’ve also added links to a few tools to help you digitize racing data. We’re hoping to do a post with more information and examples in the near future. Also on deck is allowing for sorting by type of date (e.g., view only datasets about breeding or a the Triple Crown). In the meantime, enjoy the new datasets!


Races by Distance and Course
This dataset lists counts of races by distance and course as well as the average field size, average winning odds, average $2 win payout, # of winning favorites and win % for favorites. Here’s a screenshot of the top five races by amount of races run.

For those who grumble about too many turf sprints, you were not imagining it! There were 59 turf sprints run at 5.5 furlongs in 2017, one race more than 6 furlongs on the dirt. On the bright side they had a better average starter rate and paid a little more than 6 furlongs on the dirt.

A few other items of note:
1 1/16 miles on the turf had the highest average field size of 9.71 starters in 27 starts with average winning odds of 6.21-1 but favs won 37.04% of the time, which is pretty much bang on the average favorite win percentage for 2017.

Best winning average odds for races with over 20 starts goes to 6.5 furlongs on the dirt with average winning odds of 7.18-1 in 35 starts, an average field size of 8.06 starters and favorites winning 34.29% of the time.

Favorites fared the worst at 1 mile in the inner, only winning 18.92% of the time in 37 starts. This combo featured the second best average fields size of 9.41 starters and average winning odds of 6.59-1.

Grab this dataset!

Query to produce this dataset using Handycapper
Load your chart or charts into Handycapper and then launch the SQL console. You can copy and paste the query below.

USE HANDYCAPPER;
select DISTANCE_COMPACT as Distance,
COURSE, count (course) as Amt_Races,
round (avg (cast (NUMBER_OF_RUNNERS as float)), 2) as avg_field_size,
round(avg (cast (STARTERS.ODDS as float)), 2) as avg_win_odds,
round(avg (cast (WPS.payoff as float)), 2) as avg_win_payoff,
sum (favorite = 1) as num_winning_favs,
round (sum (favorite = 1) * 100.0 / count (course), 2) as perc_winning_fav
from RACES
inner join STARTERS on races.id=starters.race_id
inner join WPS on starters.id=wps.starter_id
where track = ‘SAR’
and year(date) = ‘2017’
and WPS.type = ‘Win’
and official_position = ‘1’
and not course = ‘Hurdle’
group by DISTANCE_COMPACT, COURSE
order by amt_races desc;

A few notes:

  • To use this query for the upcoming meet, swap in 2018 here: year(date) = ‘2017’
  • To change track swap in the official track code in the where track = ‘SAR’ part.
  • If you want to include Steeplechase races just delete “and not course = ‘Hurdle’.
  • To get the win percentage to play nice with Google Sheet percentage formatting, change the “100.0 / count (course), 2)” to “1.0 / count (course), 4)”.

Trainers off a Break
This dataset looks at how trainers have performed with horses making starts off 60+ day breaks and 90+ day breaks. There’s a tab for counts including number of starts, win, place, show plus win and ITM percents. Screenshots below are of the top five trainers by number of starts.

Top five by number of starts for 60+ days

Top five by number of starts for 90+ days

Brown was notably more effect in the 60-90 day range than in the 90+ day range while Pletcher did a little better at 90+ days. There are 106 trainers in total so plenty to discover beyond Brown and Pletcher.

There’s also a tab that includes more detail about each start for 60+ days including the following data points:

Race #
Distance
Course
Trainer Last
Trainer First
Horse
Last Track
Days Since
Odds
Choice
Finish Position

You can sort and filter on these columns to get a better picture of strengths and areas of opportunity.

Grab this dataset!

Query to produce the counts dataset using Handycapper
Load your chart or charts into Handycapper and then launch the SQL console. You can copy and paste the query below.

USE HANDYCAPPER;
select concat(trainer_first , ‘ ‘, trainer_last) as trainer,
count(*) as starts,
sum(case when OFFICIAL_POSITION = ‘1’ then 1 else 0 end) as wins,
sum(case when OFFICIAL_POSITION = ‘2’ then 1 else 0 end) as place,
sum(case when OFFICIAL_POSITION = ‘3’ then 1 else 0 end) as show
from STARTERS
inner join RACES on starters.race_id=races.id
where track=’SAR’ and year(date) = ‘2017’ and last_raced_days_since >= ’60’ and official_position is not null and not course = ‘Hurdle’
group by TRAINER_LAST,TRAINER_FIRST
order by starts desc, wins desc

A few notes:

  • To use this query for the upcoming meet, swap in 2018 here: year(date) = ‘2017’
  • To change track swap in the official track code in the where track = ‘SAR’ part.
  • If you want to include Steeplechase races just delete “and not course = ‘Hurdle’.
  • You change the days since by changing ’60’ (e.g, change to 90)
  • The win & ITM % are calculated by formulas in the Google Sheet


Query to produce the race details dataset using Handycapper

Load your chart or charts into Handycapper and then launch the SQL console. You can copy and paste the query below.

USE HANDYCAPPER;
select track, date, number as race, distance_compact as dist, course, trainer_last, trainer_first, horse, last_raced_track as last_track, last_raced_days_since as days_since, odds, choice, official_position
from STARTERS
inner join RACES on starters.race_id=races.id
where track=’SAR’ and year(date) = ‘2017’ and last_raced_days_since >= ’60’ and official_position is not null and not course = ‘Hurdle’
order by trainer_last, trainer_first, official_position asc;

A few notes:

  • To use this query for the upcoming meet, swap in 2018 here: year(date) = ‘2017’
  • To change track swap in the official track code in the where track = ‘SAR’ part.
  • If you want to include Steeplechase races just delete “and not course = ‘Hurdle’.

Trainers of Juvenile First Time Starters
This dataset looks at how trainers have performed with juvenile first time starters (FTS). There are several tabs for counts including number of starts, win, place, show plus win and ITM percents. Counts tabs include: All starts, Dirt starts and Turf starts. Like the “off a break” dataset, there’s also a tab with race details for each start. Screenshots below are of the top five trainers by number of starts.

Top five by number of starts – All starts

Top five by number of starts – Dirt starts

Top five by number of starts – Turf starts

Grab this dataset!

Looking at Brown and Pletcher again, Brown has more juvenile first time dirt starters and Pletcher did a bit better with his juvenile first time turf starters. You can drill in and look for plenty of other little interesting tidbits in the race details tab. There are 97 trainers and individual starters.

Query to produce the counts dataset using Handycapper
Load your chart or charts into Handycapper and then launch the SQL console. You can copy and paste the query below.

USE HANDYCAPPER;
select concat(trainer_first , ‘ ‘, trainer_last) as trainer,
count(*) as starts,
sum(case when OFFICIAL_POSITION = ‘1’ then 1 else 0 end) as wins,
sum(case when OFFICIAL_POSITION = ‘2’ then 1 else 0 end) as place,
sum(case when OFFICIAL_POSITION = ‘3’ then 1 else 0 end) as show
from STARTERS
inner join RACES on starters.race_id=races.id
where track=’SAR’ and year(date) = ‘2017’ and last_raced_date is null and age_code = ‘2’
group by TRAINER_LAST,TRAINER_FIRST
order by starts desc, wins desc

A few notes:

  • To use this query for the upcoming meet, swap in 2018 here: year(date) = ‘2017’
  • To change track swap in the official track code in the where track = ‘SAR’ part.
  • To specify a surface add ‘and surface = ‘Dirt’ (or Turf) after age_code = ‘2’
  • To look at first time starters of any age remove “and age_code = 2′”
  • The win & ITM % are calculated by formulas in the Google Sheet


Query to produce the race details dataset using Handycapper

Load your chart or charts into Handycapper and then launch the SQL console. You can copy and paste the query below.

USE HANDYCAPPER;
select track, date, number as race, distance_compact as dist, course, trainer_last, trainer_first, horse, odds, choice, official_position
from STARTERS
inner join RACES on starters.race_id=races.id
where track=’SAR’ and year(date) = ‘2017’ and last_raced_days_since is null and official_position is not null and age_code = ‘2’
order by trainer_last, trainer_first, official_position asc;

A few notes:

  • To use this query for the upcoming meet, swap in 2018 here: year(date) = ‘2017’
  • To change track swap in the official track code in the where track = ‘SAR’ part.
  • To look at first time starters of any age remove “and age_code = 2′”

There you have it, some data from last year’s Saratoga meet and some queries to help you keep track this year. Also be sure to check out Horse Racing Datasets as we’ve added Jessica Chapel’s extensive 2017 Saratoga Juvenile dataset.

Questions? Comments? We're here to help!

Your email address will not be published. Required fields are marked *