Thursday, November 03, 2005

Football Wonk (sort of) - Windmills

[note: I'm not particularly happy with this post, but, ahh screw it. I've spent too much time on it just to trash it. Feel free to skip it.]

All week, and usually about this time every football season, the topic of discussion turns to the reliable "which conference is best." It's one that quickly gets fans' ire up. Folks down south refuse to believe the slow linemen in the Big Ten are as strong as they think, people out west think the rest of the country is too dumb to play offense, folks in the ACC hector about getting respect and want people to quit thinking of them as a basketball conference, people in the midwest talk about tradition, and on and on. People want everyone else to know how great their conference is, and how their teams have it so tough that they should be rated higher just for playing in said conference. Just about every conference has chestbeaters and people playing the "respect" card, or people who disrespect the accomplishments of other teams because they play in a weak conference.

The truth of the matter is that there is absolutely no possible way to determine "the best conference" in a particular year and base it solely on objective fact. There aren't enough games between conferences and between comparable conference teams to determine it. Some have tried, myself included, by tallying up OOC games and trying to compare them conference-wide. But there are two specific logical flaws with that.

First, the sample size is too small. The largest conferences have 12 teams, each playing 3 OOC games, so you'd think 36 OOC games might be enough. Statistically speaking, I wouldn't consider 36 games to be much of a trend. But those 36 games actually break down into much smaller subsets. You could break it down by conference (no conference plays all their OOC games against another conference). You could break it down by "major" conferences and "minor" conferences. You could take away lower division opponents. In any event, if you wanted to compare OOC records between conferences, you're stuck with a sample size too small to prove any trends. You could increase the size of the sample by comparing several years' worth of OOC games, but that pollutes the question. By looking to other years, you cannot say who is best this year, but rather who's the best over those several years, which, because of attrition, cannot prove much about this year.

Second, there is a flaw in simply looking at raw numbers because those numbers represent actual matchups which might not prove a thing. Say the Big Ten plays 5 OOC games against the Pac 10. In every single one of them, the Pac 10 team is a double-digit underdog. Bottom feeders from one conference playing elite teams from another. If the result is 5 wins for the Big Ten, all that shows is that good teams from one conference beat bad teams from another. It doesn't show the relative strengths or weaknesses of each. We could look at gambling spreads to correct for this, but spreads do not necessarily tell the story of relative strengths either (the goal of the spread is equal money, not competitive balance, and much more goes into setting a line than just the strengths of the teams).

So it is of paramount importance that I state clearly and unambiguously that I believe nobody can say, with anything even remotely approaching factual certainty, that one conference is definitely better than another. There just isn't enough factual evidence to support any argument. And the factual evidence we have is flawed and unreliable.

We're left with opinions. Which is fine and what makes college football fun. I can think that one conference is better than others, and others can think their conference is better than mine. But whenever someone says "the facts say Conference X is clearly the best/worst" that person is merely stating their own opinion. There are no facts that can say such.

But it's still semi-interesting to debate it. Tilting at windmills, as it were.

And so I set out to try to come up with an objective (as in a set of rules governing the discussion, flaws revealed, and failures admitted) system that pits conferences against each other. When I was younger, our family was big time into college basketball. We lived in Atlanta, but the patriarch is a Villanova grad. I remember following the Big East-ACC Challenges in their early days. The two conferences would line up the teams best to worst and play on the court, often at neutral sites. Most wins won the challenge. I think the Big Ten and ACC have done this recently too. It's not a perfect system, but it did settle it on the court (sort of).

It'd be impossible to structure the same in college football, since schedules are set so far in advance and $$$ controls everything. But we can do it in the internet world.

So here's what I did: I drew up a huge spreadsheet ranking the conference teams based upon the ColleyRatings. I find that to be the most accurate and reliable system, based upon how they run their system (recalculating the ratings over and over until they don't change). I also don't find any inherent bias in their system (favoring offense or defense scoring, etc.). So I ranked the teams solely based on where they ranked in the ColleyRatings list for this very week [see below for the biggest caveat of them all - I realize the results will change week to week]. I did not alter the list based upon conference standings or head to head within the conference. [see below for my caveat] I then created these "ladders" for each conference. Then I lined them up. 1-12 for the SEC against 1-12 for the ACC. And then they "played" head to head, with the ranking by Colley being the score (better ranking wins). For example, the top ranked SEC team was Alabama, ranked 5th, against the top ranked ACC team (VT), ranked 2nd. Virginia Tech won the matchup. Then I took the second slotted team, third slotted team, and so on. I ran every conference against every other conference. If the conferences were of different size, I didn't count the bottom teams for the larger conference [see caveat below]. My thinking on that is that the 8th best team in one conference should be comparable to the 8th best team in another, not the 12th best team. I counted from the top, not the bottom (because the top really should count more - and yes, see the caveat below).
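For anyone who would rather see the ladder idea as code than as a spreadsheet, here is a minimal sketch in Python. It is not the spreadsheet I actually used, just an illustration of the rules described above. The function name `ladder_challenge` is made up for this example, and apart from Alabama at 5th and Virginia Tech at 2nd, every ranking number in it is a placeholder and not real data.

```python
def ladder_challenge(ranks_a, ranks_b):
    """Play one simulated 'challenge' between two conferences.

    Each argument is a list of national ranking positions (1 = best)
    for the teams in one conference. Returns (wins_a, wins_b).
    """
    # Build each ladder: best-ranked team in slot 1, and so on down.
    ladder_a = sorted(ranks_a)
    ladder_b = sorted(ranks_b)

    wins_a = wins_b = 0
    # zip() stops at the shorter ladder, so the extra bottom teams of
    # the larger conference are simply not counted.
    for a, b in zip(ladder_a, ladder_b):
        if a < b:        # lower number = better ranking = "win"
            wins_a += 1
        elif b < a:
            wins_b += 1
    return wins_a, wins_b


# Slot 1 is from the post (Alabama 5th vs. Virginia Tech 2nd);
# all other numbers are hypothetical placeholders.
sec = [5, 20, 31, 48]
acc = [2, 25, 30, 50, 70]   # the 5th ACC team has no SEC counterpart

result = ladder_challenge(sec, acc)
print(result)  # (2, 2)
```

In this toy example Virginia Tech takes slot 1, the conferences split the remaining slots, and the fifth ACC team is left out because the SEC ladder here only has four rungs.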

After running each simulated "challenge", it started to look pretty clear. Is this an irrefutable answer to "Who's the best conference?" NO. It's a system-based answer. Under this system, here's your answer.

The rankings below correspond to both the total won-loss (all challenge matches against all other conferences) and the "similar" won-loss (all challenge matches against the other conferences on your same side of that "line of demarcation"). Total W/L first, "Similar" W/L second.

1. Big Ten 90-9, 34-9
2. Big XII 82-21, 24-21
3. ACC 80-23, 22-23
4. SEC 77-26, 19-26
5. Pac-10 62-32, 10-30
6. Big East 32-48, 32-8
7. Conference USA 36-67, 36-10
8. Mountain West 28-60, 28-15
9. Mid American 24-79, 22-24
10. WAC 11-77, 11-32
11. Sun Belt 0-80, 0-40

First off, there is a clear demarcation between the top 5 conferences and the bottom 6. Of the "challenge matches" between those conferences, the top 5 went 282-2 against the bottom 6. (The two losses? Ball State over Washington and Eastern Michigan over Arizona.) The Big East didn't have a single challenge win over the other BCS automatic bid conferences, which should tell you something.
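The split between total W/L and "similar" W/L can be sketched roughly like this. Again, this is an illustration under assumptions and not the actual spreadsheet: the names `round_robin` and `top_tier` are invented for the example, the conferences "A", "B" and "C" and their rankings are made up, and which conferences sit above the line of demarcation is something you pass in by hand.

```python
from itertools import combinations


def ladder_challenge(ranks_a, ranks_b):
    """Slot-by-slot matchup; lower ranking number wins the slot."""
    wins_a = wins_b = 0
    for a, b in zip(sorted(ranks_a), sorted(ranks_b)):
        if a < b:
            wins_a += 1
        elif b < a:
            wins_b += 1
    return wins_a, wins_b


def round_robin(ladders, top_tier):
    """Run every conference against every other conference.

    ladders:  dict of conference name -> list of ranking positions
    top_tier: set of names above the 'line of demarcation'
    Returns dict of name -> {"total": [W, L], "similar": [W, L]}.
    """
    records = {name: {"total": [0, 0], "similar": [0, 0]} for name in ladders}
    for a, b in combinations(ladders, 2):
        wins_a, wins_b = ladder_challenge(ladders[a], ladders[b])
        # "Similar" means both conferences are on the same side of the line.
        same_side = (a in top_tier) == (b in top_tier)
        for name, won, lost in ((a, wins_a, wins_b), (b, wins_b, wins_a)):
            records[name]["total"][0] += won
            records[name]["total"][1] += lost
            if same_side:
                records[name]["similar"][0] += won
                records[name]["similar"][1] += lost
    return records


# Entirely hypothetical conferences and rankings, for illustration only.
ladders = {"A": [1, 4, 9], "B": [2, 3, 10], "C": [20, 30]}
records = round_robin(ladders, top_tier={"A", "B"})
print(records["A"])  # {'total': [4, 1], 'similar': [2, 1]}
```

Sorting the conferences by total record and then eyeballing the "similar" record is all the list above really does.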

Second off, the key to doing well in this system is two fold: a) several strong teams at the top and b) not-terribly weak teams at the bottom. It might seem obvious, but I think this is actually a good gauge of conference strength. To me, what makes a conference strong is the difficulty in playing week in and week out. If a conference has two incredible and near-unbeatable teams, 3 mediocre teams and 4 bad teams, that conference isn't all that good. If a conference has 8 teams that "on any given Saturday" can beat each other, or at least test each other, I think it's a good conference.

And in fact, that's what this system ended up with. The top 3 conferences all have 9 teams each ranked in the top 60. In comparison, the SEC only has 7, and the Pac-10 has 6. The Big Ten has depth and elite teams. The Big XII and the ACC have great depth. The SEC's weakness at the bottom hurts them badly. Likewise with the Pac-10, whose top 3 can take on just about anyone, but who falls off precipitously after that.

Remember, I don't think this solves the question. I don't think that's even possible. But I do think this is one interesting way to frame the debate. And interestingly enough, this list matches up pretty closely with Massey's Ratings (MWC ahead of C-USA), though not really Colley's conference rankings.

Here are the flaws in the system (and I'm making them very clear):

1. Everything relies upon the Colley Ratings. If the ratings aren't an accurate judge, the entire thing fails. I recognize this and admit it as a real problem. To do a stupid study like this, you have to rely on some form of independent ranking of all 119 teams. I chose this one. Others may be as good. I think using Colley is better than me choosing the slotting of the teams. But, yes, I realize that this is the bottom layer of the house of cards.

2. This list only applies for this very week and this week alone. Since the Colley Ratings change every week, the head to head matchups may vary significantly from week to week. And in actuality, there were several "challenge matches" decided by one or two ranking positions. Even the slightest shift in the Colley Ratings can change the results of this exercise entirely. So, the results above only apply for right now. Trying this again at the end of the year (or really any other week) may result in a much different list. I admit it, and realize it as a flaw.

3. In relying on the Colley Ratings to create a ladder of teams, I made no changes to the ladder based on head-to-head wins or conference standings. In some cases, what I think might've been a better team may have been ranked behind what I consider a worse team. The possibility of incorrect ladders exists because I relied on an independent list. Another recognition that the Colley Ratings' flaws may translate over to this list.

4. Conference sizes: yes, they're different. And yes, comparing the worst team in one conference to the 4th worst in another conference is not exactly fair. But there isn't a good way to remove teams - it wouldn't be fair to omit the best team in the larger conferences either. It may seem like an advantage for larger conferences, but the results don't really show that either. Because of intra-conference games, the conferences all spread out quality in sort of a bell curve regardless of the size. But the main reason why I did it this way is because the top of the ladder should be more important than the bottom. It's a choice I made, without knowing how it would really play out. I admit that it may have skewed the results. It's definitely a flaw in the system.

5. Equal Weight: Probably, I should weight the top of the ladder games more heavily than the bottom of the ladder games. I might revisit the data and apply a formula that would weight the elite teams more. Haven't done it yet, though. Haven't figured out a right formula. But I fully admit the flaw in how this system treats the comparative strength of the elite teams as equal to the comparative strength of bottom feeders. Huge flaw in the system.
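On that last flaw, here is one possible shape a weighted version could take. To be clear, this is not a formula I have settled on or applied to the results above. The 1/slot weighting is just an assumption picked because it is simple, the name `weighted_ladder_challenge` is invented for the example, and the rankings (other than Alabama 5th and Virginia Tech 2nd) are placeholders.

```python
def weighted_ladder_challenge(ranks_a, ranks_b, weight=lambda slot: 1.0 / slot):
    """Like the plain ladder challenge, but each slot is worth weight(slot).

    The default weight of 1/slot is an arbitrary assumption: slot 1 counts
    1.0, slot 2 counts 0.5, slot 3 counts 0.33, and so on down the ladder.
    Returns (score_a, score_b) as floats instead of win counts.
    """
    score_a = score_b = 0.0
    pairs = zip(sorted(ranks_a), sorted(ranks_b))
    for slot, (a, b) in enumerate(pairs, start=1):
        if a < b:            # lower ranking number wins the slot
            score_a += weight(slot)
        elif b < a:
            score_b += weight(slot)
    return score_a, score_b


# Slot 1 is from the post; the rest are hypothetical placeholders.
sec = [5, 20, 31, 48]
acc = [2, 25, 30, 50]

sec_score, acc_score = weighted_ladder_challenge(sec, acc)
print(round(sec_score, 2), round(acc_score, 2))  # 0.75 1.33
```

With equal weights this toy matchup would be a 2-2 tie. With the top of the ladder counting more, winning slot 1 and slot 3 outweighs winning slots 2 and 4. Whether 1/slot is anywhere near the "right" formula is exactly the open question.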

So there's my attempt, which really proves little. In fact, the main thing I've learned from this is that it's a fool's errand to actually try to establish clearly and concisely, based upon fact, which conference is definitely better. It cannot be done with any level of actual unimpeachable proof. However, we can debate the issue, and try our best to come up with a way to explain our opinions. I started this thinking the SEC was good, the ACC a little better, the Big XII terrible and the Big 10 about as good as the SEC. I created a system not to prove my opinions, but because it seemed somewhat independent. The results were much different from what I expected. Does that make them right? Your guess is as good as mine.