A new ranking system for college hockey



goblue78
03-22-2011, 08:30 PM
This paper (http://dl.dropbox.com/u/5755704/ranking.doc) describes a new ranking system for college hockey based on Poisson Regression. It's not quite perfect yet, but I wanted to get a preliminary version of the paper out for comments. I post it now because there are probabilistic predictions for the upcoming NCAA Tournament contained in it. They differ substantially from the KRACH-based probabilistic projections I posted earlier in the week. I realize the math may be challenging for some, but I've tried to give an introduction that anybody with a familiarity with probability models can understand. I'd love to get comments from anyone about what's in here; what makes sense, what's confusing, and what's flat out wrong. By the way -- "This can't be right... My team is ranked too low" doesn't count as wrong.
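For readers who want to experiment with the general technique, here is a minimal sketch (not necessarily the paper's exact specification, and the three-team schedule below is made up for illustration): each team's goals in a game are modeled as Poisson with log-rate = baseline + attack of the scoring team minus defense of the opponent, fit by maximum likelihood.

```python
# Minimal Poisson-regression ranking sketch. Each game contributes two rows,
# one per team's goal total. Fit log(lambda) = mu + attack[i] - defense[j]
# by maximum likelihood; the last team's attack and defense are pinned to 0
# for identifiability. All team names and scores are invented.
import numpy as np
from scipy.optimize import minimize

teams = ["A", "B", "C"]
idx = {t: k for k, t in enumerate(teams)}
# (offense, defense, goals) -- two rows per game
games = [("A", "B", 4), ("B", "A", 1),
         ("A", "C", 3), ("C", "A", 2),
         ("B", "C", 2), ("C", "B", 2)]

n = len(teams)
off = np.array([idx[g[0]] for g in games])
de = np.array([idx[g[1]] for g in games])
y = np.array([g[2] for g in games], dtype=float)

def neg_loglik(theta):
    mu = theta[0]
    att = np.append(theta[1:n], 0.0)      # attack ratings, last pinned to 0
    dfn = np.append(theta[n:2*n - 1], 0.0)  # defense ratings, last pinned to 0
    lam = np.exp(mu + att[off] - dfn[de])
    return np.sum(lam - y * np.log(lam))  # Poisson NLL up to a constant

res = minimize(neg_loglik, np.zeros(2*n - 1), method="BFGS")
att = np.append(res.x[1:n], 0.0)
dfn = np.append(res.x[n:2*n - 1], 0.0)
strength = att + dfn                      # one simple combined rating
for t in sorted(teams, key=lambda t: -strength[idx[t]]):
    print(t, round(strength[idx[t]], 2))
```

Because the likelihood only ever sees one team's goal total and the opponent's identity, the fit never uses who won a game, which is the property discussed later in the thread.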

state of hockey
03-22-2011, 08:39 PM
This can't be right... My team is ranked too high.

BigRedTerrier
03-22-2011, 09:13 PM
I think this is awesome and I'll definitely take a longer look at this once I'm finished with my thesis. Maybe this will finally help me understand how to fit my own data with a Poisson distribution.

One thing I noticed though... I assume this is a first pass and that you'll then add additional parameters. Would the assumption of scoring uniformity be violated for power plays as well? Is there some way you could reasonably factor in the likelihood of a team to take penalties, or probably more importantly, to get scored on or to score on power plays? I would think, given that every game is going to have at least 5-6 penalties (or in HE, 15), this would be an important factor to include when trying to model a team's performance. This would probably also apply to ENGs, or goals with an extra attacker, but you could probably just subtract those goals for now.

very cool though

goblue78
03-22-2011, 09:25 PM
Thanks, Mr. Terrier. I don't think the nonuniformity imposed by penalties is that big of a deal, as long as teams consistently draw or are called for penalties. That just fits into the background rate. You bring up a point that's worth exploring, however, which is whether teams playing interconference games have different scoring or defensive propensities owing to differential probabilities of penalty calls. That's easy to control for.

As to controlling for ENG, extra-attacker and PP goals in general, though, I am currently limited to my dataset, which only gives the score of every game and whether or not it went to overtime (as well as home or neutral sites). Without parsing every box score, I can't adjust for PP or ENG. But thanks again.

JF_Gophers
03-22-2011, 09:40 PM
It sounds fine for trying to predict records, scores, etc. But I wouldn't want to use it to rank teams, say for tournament selection.

It shouldn't matter how you win a game, just that you won it.

goblue78
03-22-2011, 10:16 PM
I discuss that briefly in the paper, and the NCAA agrees with you. That doesn't enhance my confidence that you're right. I'm trying to find the most powerful teams, irrespective of their record. The fact that these rankings pretty much mirror both KRACH and the Pairwise suggest that this concern is mostly theoretical. But I grant that this system will rank teams which just barely lost a bunch of games much higher than KRACH will. It will also penalize teams that lose badly more than KRACH. But don't you think of two teams playing roughly the same schedule, that the team that won all the games they won by 3 and lost all the games they lost by 1 is just plain more worthy than a team with the same record which barely won in their wins and got killed in their losses?

Fighting Sioux 23
03-22-2011, 10:29 PM
I think this is awesome and I'll definitely take a longer look at this once I'm finished with my thesis. Maybe this will finally help me understand how to fit my own data with a Poisson distribution.

One thing I noticed though... I assume this is a first pass and that you'll then add additional parameters. Would the assumption of scoring uniformity be violated for power plays as well? Is there some way you could reasonably factor in the likelihood of a team to take penalties, or probably more importantly, to get scored on or to score on power plays? I would think, given that every game is going to have at least 5-6 penalties (or in HE, 15), this would be an important factor to include when trying to model a team's performance. This would probably also apply to ENGs, or goals with an extra attacker, but you could probably just subtract those goals for now.

very cool though

As far as penalties and powerplay goals and the like is concerned, you could probably use PK and PP percentages somehow. That way, you don't have to go through every single box score.

BigRedTerrier
03-22-2011, 10:37 PM
As far as penalties and powerplay goals and the like is concerned, you could probably use PK and PP percentages somehow. That way, you don't have to go through every single box score.

That's along the lines of what I was suggesting. To me, if you're a team that's converting 25% of your power-play chances and you're facing a team that's dead last in the kill, the probability of scoring is going to skyrocket for those 2-minute-ish periods of time, which would violate the assumption that there's a uniform distribution of scoring probability throughout the game. It could just be another area where a bit of tweaking could better predict the data, as I feel like nothing is truly Poisson.

Also, along those lines, goblue... have you been able to statistically show that your model fits the season results? I may have missed that... apologies if I did.

slurpees
03-22-2011, 10:52 PM
Certainly a very interesting creation and a good read. A couple of things. First, if this were to be used as the ranking system for determining playoff teams, how do you account for injuries? Say player A is the leading goal scorer for Team A but gets injured in the final week of the season and is out for the year. The team's rating is based on results from when he played, but under the rules laid out in this system, the goal-scoring probability will very likely be different for the tournament games, which are exactly what this whole system is meant to predict.

Another injury example: what if a goaltender goes down for the year and the backup has to play out the rest of the season? The goalie is the single player on the ice with the most direct impact on the number of goals scored, and could theoretically be out there by himself and keep a team scoreless, rendering the other five skaters pointless from a defensive perspective. Of course that's a wildly irrational scenario, but it underscores the huge effect the goalie has on the outcome of a game. If he goes down, the system has to account for that in some way.

The point I'm getting at is that a system like this seems like it would depend on who is actually playing in the game, not just generalized statistics about overall offense and defense: who is scoring the goals, and who isn't allowing them. I'm not a math whiz, so I may have missed an adjustment you made for this, but if it was there, it didn't seem heavily weighted. If this kind of system were used for determining tournament teams, I think there would have to be some accounting for changes in lines, injuries, goalie tandems, and other player personnel changes made game to game. This may be a reason why the projected records this formula put out were off, which may be a sign that it's incomplete.

Of course the chances of the NCAA adopting something like this are about the same as my chances were earlier today at winning $20,000 on my $2 scratch ticket, but it's still fun to talk about, and I think if you were able to tweak this a bit and be able to produce predictions for previous seasons that were very close to dead-on, this could be something that would be fun to use in the future.

Red Cows
03-22-2011, 11:08 PM
It sounds fine for trying to predict records, scores, etc. But I wouldn't want to use it to rank teams, say for tournament selection.

It shouldn't matter how you win a game, just that you won it.

I'm not sure I agree with your 2nd sentence.

When Bill James was still writing the Baseball Abstract, he did an entire chapter one year on the significance of how you won, and in particular by what margin, which postulated that margin of victory is very indicative of what kind of team you have. Good teams win by large margins; bad ones don't. That was the gist of it. The same article completely pooh-poohed 1-run wins as pretty much meaningless (despite how much you always hear about them in MLB); he looked at the whole course of baseball history to formulate that opinion.

It looks like the writer here came to some of the same conclusions that James did, for college hockey, although I readily admit we are talking two entirely different sports here. The parallels to what he said in Baseball Abstract are interesting, though.

JF_Gophers
03-23-2011, 07:22 AM
I'm not sure I agree with your 2nd sentence.

When Bill James was still writing the Baseball Abstract, he did an entire chapter one year on the significance of how you won, and in particular by what margin, which postulated that margin of victory is very indicative of what kind of team you have. Good teams win by large margins; bad ones don't. That was the gist of it. The same article completely pooh-poohed 1-run wins as pretty much meaningless (despite how much you always hear about them in MLB); he looked at the whole course of baseball history to formulate that opinion.

It looks like the writer here came to some of the same conclusions that James did, for college hockey, although I readily admit we are talking two entirely different sports here. The parallels to what he said in Baseball Abstract are interesting, though.

Wins and quality of opponents are still the two most important factors.

Quantifying quality of opponent is the hard part of the equation. Over the history of a program, I would agree that margin of victory probably does bear out the quality of a team versus others. But 34 games and a fairly insular scheduling system don't allow that type of analysis to be meaningful in a single season.

If everyone played everyone else in a single season, then maybe I would lend credence to it. But that 1) is impossible and 2) would never happen even if it was possible.

ETA: I would also be interested to know if this ranking system was reverse engineered, because there is a built-in bias when you create a system that validates teams you think are good, or that have been historically good. This is the same question I've had about the RPI number.

You should never start with "Here are the teams I know are good, now how do I validate that?"

goblue78
03-23-2011, 07:55 AM
Thanks to all. I'm going to add more to the paper about nonuniformity and controlling for PP and ENG, so I won't add much here. In short, I don't really think it's necessary, but it's worth a look. And thanks for the suggestion of using aggregate PP and PK, FS23. That won't really work for technical reasons, but it helped clarify in my head how to describe the issue.

Slurpees: Any statistics-based model assumes that the model applies for the whole dataset, or explicitly invokes some changing parameter over time. There is no reasonable way in this sort of model to account for injuries, personnel changes, or any of that stuff. It essentially just assumes that this year's history is who you are. Of course, so does every other system we're discussing: KRACH and PWR, for example, though human polls can take account of anything they choose. Obviously, a team that sustains injuries to critical players won't be as good as the power rating under this sort of system indicates. And there's no real way in this sort of model to figure out how much worse. To do that, you'd need a model that worked at the player level, not the team level. There are some diagnostics you could use to see if a team is underperforming relative to the way they performed a month ago, but I don't think that's an ideal use for this kind of model, because humans will spot patterns that are really just random occurrences.

JF_Gophers: First, I can state that the methodology has not been tweaked or adjusted in any way to get a result. And the methodology is so simple (in concept -- to get practical numbers you need a pretty good computer and expensive software) that there's really not any scope for doing so. Second, this methodology, without knowing the full score of a single game (it only knows what one team scored and who their opponent is), managed to rank the teams in such a way that 13 of the 15 non-play-in teams in the tournament were in the top 15. So PWR depends on wins, quality wins, and head-to-head comparisons, but it gets almost the same results as a method which doesn't know the complete score of any game. For those who haven't looked at the paper, here are the top 15 teams in rank order:

1. North Dakota
2. Miami
3. Yale
4. Boston College
5. Michigan
6. Nebraska-Omaha
7. Union
8. Notre Dame
9. Denver
10. Wisconsin
11. New Hampshire
12. Minnesota Duluth
13. Merrimack
14. Western Michigan
15. St. Cloud State

For a method that doesn't look at the winner of a single game, that's a pretty good list, IMO, other than Wisconsin, for whom I'm going to add a section in the paper. Other than Air Force, the two teams (CC and RPI) who made it in, in lieu of SCS and Wisconsin, are ranked 17th and 20th respectively. Pretty close. And note that this methodology got all four of the top seeds, albeit in a different order.

Finally, Red Cows: I'm in complete agreement, and the Bill James article you cited was an important article to me way back in 1982 when it was published. For those who don't go back that far, that was the year an Atlanta Braves team who wasn't very good the year before rattled off a dozen wins or so to start the season. As that article pointed out (and is fully in the spirit of this methodology) any team can go on a hot streak. To figure out if they're any good, you need to look at whether they're killing people or squeaking by to make an informed judgment. That Braves team, to many people's surprise, ended up winning the National League West that year and the James article was an attempt to retrospectively figure out whether or not we should have figured that out at the time. Thanks for reminding me of it.

Khryx
03-23-2011, 08:27 AM
goblue78, I still need to read the details of your paper as I have only had a chance to skim it but I like the idea. Before you start tweaking anything, I would STRONGLY suggest looking at multiple seasons and see how well it matches those.

goblue78
03-23-2011, 08:33 AM
goblue78, I still need to read the details of your paper as I have only had a chance to skim it but I like the idea. Before you start tweaking anything, I would STRONGLY suggest looking at multiple seasons and see how well it matches those.

Thanks. It's on the agenda. If I get time I'll do it today. All I have to do is go get the data. The rest of the programs are now written.

JF_Gophers
03-23-2011, 08:35 AM
You would get similar results just looking at the top 16 teams in goals-scored average (11/16 teams) and goals-against average (11/16).
Top power plays? 9/16. Top penalty kill? Only 6/16.

But I think this proves that score doesn't really matter, because 15 of the 16 teams that won 20+ games this year are in the tournament. WMU at 19 wins is the only outlier that's in, and Wisconsin at 21 wins is the only one out.

So scoring a lot of goals and not giving up many leads to more wins, which leads to making the tournament and being ranked highly.

So why bother with scoring, when wins actually gets you a closer match?

jcarter7669
03-23-2011, 08:52 AM
It has Yale at #3, therefore it needs a lot of work still. No computer in its right mind would put them at #3. #13 maybe...

Alton
03-23-2011, 09:05 AM
So why bother with scoring, when wins actually gets you a closer match?

That depends on what you are trying to match. If you are only trying to predict who gets into the tournament, then just use PWR--it's 100 percent accurate!

Obviously, you want a rating system to do something more than just that. A good rating system will predict games in the future, not just represent what happened in the past. A test of a rating system would be to measure how accurate its predictions actually are. For that purpose, a rating system that takes into account goals scored and allowed (in addition to strength of schedule) is always more accurate than one that ignores scoring margin, and only looks at wins and losses.
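A sketch of the kind of out-of-sample test described here, under the assumption of independent Poisson scores (the rates 3.2 and 2.1 below are invented for illustration): turn two fitted scoring rates into win/tie/loss probabilities, which can then be scored against actual results with log-loss.

```python
# Convert two Poisson scoring rates into win/tie/loss probabilities by
# summing the joint pmf over the relevant region, then score a hypothetical
# outcome with log-loss. Rates are made-up illustration values.
import numpy as np
from scipy.stats import poisson

def win_probs(lam_a, lam_b, max_goals=25):
    """P(A wins), P(tie), P(B wins) for independent Poisson goal totals."""
    pa = poisson.pmf(np.arange(max_goals + 1), lam_a)
    pb = poisson.pmf(np.arange(max_goals + 1), lam_b)
    joint = np.outer(pa, pb)          # joint[i, j] = P(A scores i, B scores j)
    p_tie = np.trace(joint)
    p_a = np.tril(joint, -1).sum()    # region where A's goals > B's goals
    p_b = np.triu(joint, 1).sum()
    return p_a, p_tie, p_b

p_a, p_tie, p_b = win_probs(3.2, 2.1)
# Log-loss contribution of this prediction if team A actually won:
print(round(p_a, 3), round(-np.log(p_a), 3))
```

Averaging that log-loss over a held-out set of games is one concrete way to compare a goals-based system against a wins-only one on predictive accuracy.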

I agree with goblue78 that although goal scoring in hockey is not exactly a Poisson process--the increments are not completely independent--it is close enough that this problem is lost in the noise. What is needed, of course, is a rigorous test of the system, and it looks like Chapter VIII of the paper is a good start.
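One half of this point can be checked with a quick simulation (the rates and power-play minutes below are invented): a scoring rate that varies within the game but in a fixed, pre-set way still produces an exactly Poisson goal total, because a sum of independent Poisson counts is Poisson. The real departure comes only when the rate depends on the game state (score effects, pulled goalies).

```python
# Simulate game goal totals as even-strength goals plus power-play goals,
# each Poisson at a different per-minute rate. The total is still Poisson,
# so its sample mean and variance should nearly coincide. All numbers are
# invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_games = 200_000
base, pp_rate = 2.5 / 60, 7.5 / 60   # goals per minute: even strength vs PP
pp_minutes = 10                       # minutes on the power play per game
goals = (rng.poisson(base * (60 - pp_minutes), n_games)
         + rng.poisson(pp_rate * pp_minutes, n_games))
print(goals.mean(), goals.var())      # mean ~= variance, as for a Poisson
```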

unofan
03-23-2011, 09:29 AM
I'm not sure I agree with your 2nd sentence.

When Bill James was still writing the Baseball Abstract, he did an entire chapter one year on the significance of how you won, and in particular by what margin, which postulated that margin of victory is very indicative of what kind of team you have. Good teams win by large margins; bad ones don't. That was the gist of it. The same article completely pooh-poohed 1-run wins as pretty much meaningless (despite how much you always hear about them in MLB); he looked at the whole course of baseball history to formulate that opinion.

It looks like the writer here came to some of the same conclusions that James did, for college hockey, although I readily admit we are talking two entirely different sports here. The parallels to what he said in Baseball Abstract are interesting, though.

Right, but then ultimately the W is the most important thing. Bill James simply says a team that wins lots of 1-run games is a statistical aberration and unlikely to continue to do so in the future. He doesn't say that they shouldn't count, however.

Same thing here; all else being equal, a team that goes 19-7-2 with a negative scoring differential is still more deserving of an NCAA bid than a team that goes 12-12-4 with a positive scoring differential. It doesn't matter that they got lucky, or won lots of close games while getting blown out in others; all that matters is that they won.

goblue78
03-23-2011, 09:29 AM
You would get similar results just looking at the top 16 teams in goals-scored average (11/16 teams) and goals-against average (11/16).
Top power plays? 9/16. Top penalty kill? Only 6/16.


Thanks. I hadn't done that calculation, but you've just demonstrated that this system works substantially better than just looking at goals scored and goals against. (Why? Because it looks at who you've scored against and who you've stopped from scoring.) Plus, they aren't the same 11 teams, so what do you do then? Which one works better and why? And you've demonstrated that PP and PK are lousy methods of prediction.

Secondly, while wins will tell you who gets in the tournament, for the most part, they don't seed teams very well, because PWR takes into account who you beat. This system takes into account who you score against, which is valuable information.

unofan
03-23-2011, 09:32 AM
A good rating system will predict games in the future, not just represent what happened in the past.

I disagree with this premise. I think it depends on what you're expecting the rating system to do. Not all systems have to be predictive in nature.