Scoring my 2022 predictions
I performed: very badly
At the start of this year I made about fifty forecasts about what would be true at midnight on 1/1/2023. How did I do?
I started by discarding one forecast where the result was unclear, and a number of conditional guesses where the condition hadn’t happened. (For example, I said that if Russia didn’t invade Ukraine, then there was a 70% chance of there being new US sanctions on Russia. But they did invade, so the forecast was null and void.) But I kept in conditional forecasts where the condition did occur. (For example, I said that if Roe was overturned by Dobbs, then there was a 70% chance that Roberts would vote with the majority; Roe was indeed overturned by Dobbs, so that 70% number is taken into account.)
With those exclusions, I had a total of 55 predictions for the year 2022. And across those 55, my ‘Brier score’ was about 0.27.
I think there are good reasons not to take statistical tools like the Brier score too seriously when applied to these kinds of forecasts (see nostalgebraist’s great post here; its points apply even more to this kind of personal exercise than to fake-money prediction markets). But even still, 0.27 is worse than I would have gotten if I’d just guessed 50% for every forecast: that would have given me a Brier score of exactly 0.25, and a lower score means more accurate guesses.
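For concreteness: the Brier score is just the mean squared difference between each stated probability and the 0-or-1 outcome. A quick sketch (with made-up forecasts, not my actual list) shows why guessing 50% for everything locks in exactly 0.25:

```python
def brier_score(forecasts):
    """Mean squared error between stated probabilities and 0/1 outcomes."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

# Hypothetical forecasts: (probability assigned, 1 if it happened else 0).
# An overconfident forecaster whose 80-90% calls keep failing:
overconfident = [(0.9, 0), (0.8, 0), (0.8, 1), (0.7, 1), (0.95, 1)]
print(brier_score(overconfident))  # worse than 0.25

# Guessing 50% for everything scores (0.5)^2 = 0.25 on every single
# forecast, whatever actually happens:
coin_flipper = [(0.5, outcome) for _, outcome in overconfident]
print(brier_score(coin_flipper))  # 0.25
```

The `forecasts` data here is invented for illustration; the point is just that 0.25 is the floor for the maximally uninformative forecaster, so anything above it means you did worse than a shrug.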
I want to repeat that. If someone had been asked ‘how likely do you think these events are?’, and they replied to every one with ‘well, it’ll either happen or it won’t, so it’s fifty-fifty!’, I would have said that they were a moron with no understanding of probability. I have, in the past, laughed at public figures who have made exactly this error. Yet someone like this would have predicted 2022 better than me. That - well, it hurts.
Why did I do so badly?
My basic problem was that I was insanely overconfident. Only two-thirds of my 90% forecasts came true, and less than a third of my 80% forecasts. One of my best categories was the one in which I genuinely was just kinda guessing (the ‘other - politics’ category). I forecasted my personal life OK (as perhaps should be expected!) and also actually did alright on Northern Irish politics (which maybe is my ‘special subject’ or whatever), both of which I was relatively confident in; but I was awful on almost everything else.
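The check behind claims like ‘only two-thirds of my 90% forecasts came true’ is simple to mechanise: group forecasts by stated probability and compare each bucket’s hit rate to the number you claimed. A sketch, again with hypothetical data rather than my real forecasts:

```python
from collections import defaultdict

def calibration_table(forecasts):
    """Group (probability, outcome) pairs by stated probability and
    return the actual hit rate in each bucket."""
    buckets = defaultdict(list)
    for p, outcome in forecasts:
        buckets[p].append(outcome)
    return {p: sum(outcomes) / len(outcomes)
            for p, outcomes in sorted(buckets.items())}

# Hypothetical data: a calibrated forecaster's 80% bucket should come
# true about 80% of the time; here it only manages 75%, and the 90%
# bucket only 50%.
forecasts = [(0.8, 1), (0.8, 1), (0.8, 0), (0.8, 1), (0.9, 1), (0.9, 0)]
print(calibration_table(forecasts))  # {0.8: 0.75, 0.9: 0.5}
```

With only a handful of forecasts per bucket the hit rates are noisy, which is one more reason (alongside nostalgebraist’s) not to lean too hard on any single year’s numbers.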
But what struck me, looking back to January 2022 when I made these predictions, is that most people thought they were - eh, pretty alright! There was some disagreement, of course, but among the smart people I talked to about these predictions, nobody communicated to me the sense that they looked actively worse than random.
Maybe these people were just being kind to their stupid friend. But I don’t think so. When I go back through my forecasts and ignore the numbers, just thinking of each one as a prediction of ‘this is more likely than not’, in general they all seem pretty reasonable. My forecasts seem much better as binary ‘more likely than not’ guesses; they’re still not great, but even with hindsight I think they’re entirely defensible, and (based on eyeballing) they look slightly better than random.
My error, I think, was to turn a general sense of ‘this seems more likely than not’ into a numerical confidence of 70, 80, or even 90%. And this seems like a general source of overconfidence. When Bayesians are explaining their philosophy to an interested but disinterested party, they tend to casually say things like ‘belief isn’t all-or-nothing - it comes in degrees, and you can be more or less sure of something’, as if this were an obvious fact. But (phenomenologically speaking) I’m not sure this is true in general. Most of us, I would guess, very often naturally evaluate questions in pretty binary terms, and our descriptions of how certain we are can often just be arbitrary and post-hoc. This might explain why my friends also seemed to think my forecasts were - if not great - at least fine. But this produces very bad outcomes, because the Bayesians are right about their core claim: we should be able to tell the difference between 51% and 90%, even if we can’t!
But what makes me feel the worst, ironically, is my success at the extreme end of the distribution: everything I evaluated at 95% or 99% came true. This suggests that I had some subjective impression of certainty, a gut feeling that this but not that was something I could be pretty sure about at a 95%+ level. So the possibility exists that, had I really tried, I could have used this as the basis of a comparison for varying degrees of certainty! I could have compared my other forecasts with my most certain ones and tried to generate a kind of ranking, which might have imposed some proper modesty on the rest of my guesses. (This might suggest why my track record at betting is better than this attempt at predictions - money forces you into a certain kind of numerical thinking.) But I just didn’t do anything like that.
Am I stupid, or just ignorant?
One of my favourite epistemology papers is Egan and Elga (2005), ‘I Can’t Believe I’m Stupid!’. (Highly recommended; it doesn’t require all that much background knowledge.) In this paper, Egan and Elga draw a distinction between being ignorant and being stupid.
If you’re ignorant, it means that your beliefs are basically unreliable, and not a particularly good guide to the world - to be technical, they are ‘uncorrelated with the truth’. This is a pretty common situation. If you discover your beliefs are less reliable than you had previously thought, Egan and Elga argue you should try to do two things: first, you should (obviously) slightly downgrade your confidence about various matters; but second, you should make your beliefs much less resilient. Even relatively bad arguments and relatively weak pieces of evidence should sway your opinion much more than previously. Roughly, this is because you should now see your ‘prior’ beliefs as only weakly suggesting the truth of something: even if you think you have good evidence for your positions, you know that your judgments about the quality of evidence aren’t reliable. Being swayed more by even weak pieces of evidence, and not letting any one piece of evidence dominate, ‘adjusts’ for your ignorance.
But if you’re stupid, that’s a different matter. This is when your beliefs are actually negatively correlated with the truth. This is different from ignorance, where your beliefs are just pretty unreliable and all over the place, uncorrelated with the truth; if a stupid person thinks something is true, that makes it less likely to be true. The most extremely stupid people are the ‘anti-experts’, whose predictions are so bad that they are essentially the opposite of expert predictions: if an anti-expert claims that something is 90% likely to happen, you should basically act as if an expert had told you it was 90% likely not to happen.
Egan and Elga’s neat result is that, while it is not only possible but reasonable for people to think of themselves as ignorant, it may actually be impossible to believe that you yourself are stupid - hence the title of their paper. This isn’t just because of arrogance or whatever (although in real life that certainly plays a role). Even if you are fully rational in some absurdly idealised sense, it is incoherent to think your own beliefs are stupid, even though it is perfectly possible to think that others are stupid. I won’t go into too much detail about why this is (read the paper! it’s very good), but it has to do with Moore’s paradox: if you think you are an anti-expert about Chinese foreign policy, then you might end up saying ‘Xi Jinping will launch an invasion of Taiwan, and I believe Xi Jinping will not launch an invasion of Taiwan’, which is a kind of weird incoherence.
As such, if you discover evidence that you are stupid, you can’t just adjust for it the way you can adjust for ignorance. Either you immediately update all of your beliefs in a fully Bayesian manner so that you now reasonably believe ‘well, I was stupid, but thankfully no longer’ - which is hard to do! - or you just completely suspend judgment like the sceptics. This makes it important for me to find out whether this exercise has given me evidence of stupidity, or mere ignorance: the latter is humbling but OK, the former scary and scepticism-inducing.
Luckily, while there was a weak negative correlation between my predictions and the truth, my Brier score here was low enough to make this an instance of ignorance rather than stupidity sensu stricto. Part of Egan and Elga’s result was finding that there’s a hard limit to the confidence that anyone can coherently have in the claim ‘my Brier score is greater than or equal to X’ when X is above 0.25. But for the value X=0.27, this hard limit is a bit under 90%. So there’s nothing incoherent about me believing that my Brier score is indeed about 0.27; I should respond in the standard manner, trying to adjust for ignorance by reducing my confidence and being much more open to changing my mind.
But this strategy relies on me being able to improve over time. If, over the course of several updates, I don’t become more accurate, then my ignorance begins to shade over into something closer to stupidity - in particular, I should begin suspending judgment much more often. We shall see how it goes.
This exercise has been deeply embarrassing, and I am unquestionably going to do it again for 2023.
No, seriously! Sitting down today and calculating exactly how wrong I was was incredibly embarrassing, yes, but also (in a weird sense) fantastic fun; and I think I’ve genuinely learned something about myself. I’ve been humbled in a really productive way, and I’m excited to see how I do for 2023. I encourage all of you to do it too - hopefully I find out I have many ignorant friends, and my embarrassment will be much reduced.
More detail on my 2022 predictions
Northern Ireland
Actually pretty good! I was perhaps overconfident that government formation would go ahead, but guessed the election results about right (albeit the DUP did slightly better than I expected).
The DUP are not the largest party in the Assembly - 95% (TRUE)
Conditional on that, the DUP are the largest unionist party - 65% (TRUE)
Alliance are the fifth party - 60% (FALSE)
Unionists + Other have a majority in the Assembly - 90% (TRUE)
Conditional on that, Unionists + Alliance have a majority in the Assembly - 99% (TRUE)
Swann is Health Minister - 55% (FALSE)
Givan is neither FM nor DFM - 95% (TRUE)
At least three major outlets (BBC Newsline, UTV News, BelTel, the Newsletter, etc.) ran top stories on anti-Protocol protests / riots during the year - 50% (FALSE)
Great Britain
My subjective ‘vibe’ that Johnson would stay was even stronger than my numerical prediction would suggest, and I failed to properly model what would happen conditional on him not staying - which, of course, is what actually happened.
Starmer is leader of the Labour Party - 95% (TRUE)
Johnson is Prime Minister - 80% (FALSE)
Sunak is the most popular (approval minus disapproval) Cabinet or Shadow Cabinet member - 85% (FALSE)
Sunak is in one of the Great Offices of State - 95% (TRUE)
There has not been a general election - 90% (TRUE)
The Lib Dem polling average across the year has been 10%, plus or minus 2.5 points - 75% (TRUE)
Labour made gains in the Cambridge City Council elections - 75% (TRUE)
Republic of Ireland
My predicted new two-party system never materialised. We’ll see what happens at the next election, but basically I think my high-level predictions for RoI politics should be considered failures.
Fianna Fáil have dropped below 15% on Politico’s poll of polls at least once - 65% (FALSE)
Fianna Fáil never overtook Fine Gael on Politico’s poll of polls - 70% (TRUE)
Varadkar was more popular (approval - disapproval) than Martin when the rotating Taoiseach agreement went into effect - 65% (UNCLEAR)
I said I’d take an average of the last poll before and the first poll after, but there’s not been a new approval poll for either yet (as far as I can tell). Happy to update this if someone has actually seen some good data.
Sinn Féin’s polling average across the year has been 30%, plus or minus 5 points - 85% (TRUE)
Conditional on that, Sinn Féin’s polling average across the year has been 30%, plus or minus 2.5 points - 80% (FALSE)
United States
Reality was less depressing than I had given it credit for here.
The GOP controls the Senate - 85% (FALSE)
The GOP controls the House - 90% (TRUE)
Conditional on that, the GOP controls the Senate - 90% (FALSE)
Biden is president - 95% (TRUE)
Roe v. Wade has been overturned - 80% (TRUE)
Conditional on that, it was overturned in Dobbs v. Jackson Women’s Health Organization - 95% (TRUE)
Conditional on that, Roberts voted with the majority - 70% (FALSE)
This one depends on how you construe ‘voted with the majority’, but I’ll be strict with myself.
Breyer has not retired - 80% (FALSE)
Covid
China has given up on zero-covid - 75% (TRUE)
A new variant of concern has been identified by the WHO - 90% (FALSE)
Variant-specific vaccines have not been rolled out in any OECD nation - 75% (FALSE)
At least one nation has less than 10% of its population double-vaxxed - 95% (TRUE)
I said Yemen, and I was right.
Other (politics / international affairs)
Actually not bad at all here! This was the category I was least confident in, but honestly that seems to have helped me, and I’ve made fewer clear blunders than elsewhere.
Macron is the president of France - 70% (TRUE)
Pécresse did not reach the run-off - 65% (TRUE)
Lula is the president of Brazil - 55% (FALSE)
This one fucked me up! Lula, of course, won the election, which is what I was trying to predict. But - as I could have just checked! - the new president won’t be sworn in until the new year, so the statement is literally false. And thus, when I made this prediction, I could have known for certain that it would be false! My phrasing was trying to account for extra-constitutional fuckery from Bolsonaro, but I should have checked the very basic inauguration date stuff.
China has not invaded Taiwan - 85% (TRUE)
Russia has not invaded Ukraine - 60% (FALSE)
The Uyghur genocide has not stopped - 90% (TRUE)
Other (culture / sport)
Simone Biles has announced her retirement from competition - 75% (FALSE)
At least 5 additional living artists, or bands with living members, in the Rolling Stone top 500 have followed the Boss and sold their back-catalogues - 80% (FALSE)
This one is really interesting, actually: while Genesis made a good bit of money on this this year, otherwise the trend seems to have run out of steam. Analysis here which is very much worth a read, even if it is a bit speculative / woo.
Kendrick still hasn’t dropped a new album - 70% (FALSE)
Dublin won Sam - 85% (FALSE)
Personal
I published Part 2 of the ‘Self-defeating theories’ series - 85% (FALSE)
I published Part 3 of the ‘Self-defeating theories’ series - 65% (FALSE)
I have finished the ‘Self-defeating theories’ series - 60% (FALSE)
Gave up on it; realised it wasn’t the best framing for my arguments.
I have written at least one post every month - 65% (FALSE)
I am in a relationship with [current partner] - 99% (TRUE)
I am living in Cambridge - 80% (FALSE)
I am living at [current address] - 75% (FALSE)
The move to Glasgow was quite sudden!
I am still vegan - 95% (TRUE)
I donated more money to charitable causes in 2022 compared to 2021 - 95% (TRUE)
>25% of my donations went to animal advocacy causes - 65% (TRUE)
I ran in the local elections in Cambridge - 60% (TRUE)
I attended CULA AGM - 75% (TRUE)
I did not attend a CULA event other than the AGM or an EGM - 80% (TRUE)
I have come to disagree with the conclusion of at least one of my articles / blog posts / essays - 70% (TRUE)
Postscript: my unquantified forecasts
These didn’t count towards my final score - the whole point of the exercise was quantified, falsifiable forecasts, whereas these are somewhat vague and potentially up for interpretation. I just did these for fun.
Covid as a social phenomenon is basically over in the UK - 70% (TRUE)
The Northern Ireland Secretary is incompetent - 90% (UNCLEAR)
Heaton-Harris hasn’t been in the job long enough to judge, but probably.
Jared Polis has done something absolutely fucking based - 99% (TRUE)
Too many examples to list.
It looks less likely that Trump will run for president than it did at the start of the year - 60% (FALSE)
This one is a pretty clear outcome - he has now announced he’s running, so yeah it’s more likely.
The ‘applied turn’ in Anglo-American philosophy continues apace - 80% (UNCLEAR)
Feels true but idk.