Wednesday, January 28, 2015

Theorycrafting 101

So I saw a Twitter conversation where someone was basically asking, "How do I Theorycraft?" While Twitter is amazing for jumping in on these conversations, it's terrible for any sort of protracted discussion. I'm relatively new to the Theorycrafting space for WoW itself, but I've managed to make a couple of waves (like with my secondary stats posts). I'm no Theck, or Bouchebaguette, or Vixsin, but I do okay, I think.

So here's how I got into Theorycrafting, and a slightly formalized measure of how to do it. Warning, my example will be WoW-heavy.

There are definitely days where I feel this.

Theorycrafting is science. You're basically applying the scientific method to something in the game you're playing. As a refresher for those who don't remember 6th grade science class, the scientific method is as follows:

Step 1: Ask a Question, Do Some Research

The process of forming your question starts with being curious about something, then performing research, simplifying variables, and adding constraints and base assumptions to come up with a more specific question, repeating until you think you have something you can try to answer.

Okay, so what do you want to figure out? For my secondary stats posts, one of the questions I wanted answered was, "When should I take an upgrade?" I started scouring the Internet, using sources like other theorycrafters, blue posts, and data in ..., along with my own logic and knowledge of the game, to figure out what is involved in a gear upgrade: ilvl, primary stats versus secondary stats, sockets, warforged, tertiary stats, DPS versus survivability, and different classes valuing a stat differently via their stat weights. That's clearly way too much to handle, so it was time to start whittling that down to more specific questions and clear away irrelevant data.

I decided personally to focus on DPS: "When is it useful for DPS to take an upgrade?" Tanks and healers muddy the waters because they're not concerned only with raw throughput, and for a first attempt, raw throughput is a relatively easy concept to measure.

Once I had a more specific question, I could clear away largely irrelevant data. DPS versus survivability? Well, we're trying to optimize our DPS throughput, so Stamina and the defensive benefits of Versatility could be ignored. Tertiary stats are not DPS-boosting stats. Well, wait, what about movement speed? The faster you can move, the less time you spend moving, which means more uptime on your rotation. Ehh, like physicists, let's pretend the cow is a perfect sphere, or the fight is Patchwerk. In other words, we're adding more constraints to simplify the problem.

This is how theoretical physicists do it.

So now we're down to ilvl, primary stats versus secondary stats, sockets, warforged, and stat weights. More research indicates that ilvl is tied to something called a stat budget. Basically, the amount of primary and secondary stats on a piece of gear is related to a formula based on ilvl. Ideally, two bracers, for example, with the same ilvl, will have the same budget, or raw values of primary and secondary stats. Basically, ilvl is equivalent to your primary stats (because your gear will always have Stamina plus your primary attribute), so we can collapse a couple more of our inputs.

So, ilvl, secondary stats, sockets, warforged, and stat weights. Warforged is just ilvl, so we can collapse that down to ilvl as well. Sockets throw a wrench in the works, so I decided to ignore that for the time being.

We know that higher ilvl should generally be better, but because of how secondary stats and stat weights interact, not every piece is created equal for every class. Stat weights tell us how good a specific secondary is for a class, so we know the answer is going to vary per class. But maybe there's a good rule of thumb? I mean, eventually we'll get enough stat budget that it won't matter what secondaries are on the item; we'll just want to take it regardless.

Stat weights themselves are generated through simulation. Someone else did all the work, and for my piece, while I did a lot of research into how those are generated, that research wasn't strictly necessary--unless I felt like challenging how stat weights are generated, but I didn't. So a basic assumption I went in with was that stat weights are generally accurate for our purposes.

So the question has become, "For maximum standing DPS throughput for most classes, how many more ilvls does an upgrade need to have before you should probably take it no matter what, ignoring sockets and tertiaries?"

Step 2: Form a Hypothesis

Hypothesis: Poop Theft

For our question, we should take an educated guess at what the result might be. For my post, I figured, well, 15 ilvls is probably a pretty good bet given that's how much space there is between raid tiers.

If you're having difficulty forming a hypothesis, or you have so many caveats your hypothesis might be mistaken for a cell phone contract, you'll want to go back to Step 1. Your question is likely too broad.

The process of coming up with an educated guess will inform how you design your experiment. If your question is good enough, this step should be relatively simple.

Step 3: Construct and Execute an Experiment

Constructing an experiment requires you to take your question with your research, and use logic and/or math to determine a way to test your hypothesis.

Time to get into the nitty gritty! So to test that a 15 ilvl jump is sufficient to take an upgrade, we need a way to measure that jump. Thankfully, we can use aforementioned stat weights to measure how much value a piece of equipment offers. To determine if an item is an upgrade, we need only evaluate its value and compare it to the piece we're currently trying to upgrade.
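To make that concrete, here's a minimal sketch of valuing an item with stat weights. The weights and stat amounts below are made up for illustration, and the `item_value` helper is hypothetical; real weights would come from a simulation for your spec.

```python
from typing import Dict


def item_value(stats: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of an item's stats; stats without a weight count as zero."""
    return sum(weights.get(stat, 0.0) * amount for stat, amount in stats.items())


# Hypothetical stat weights (relative DPS per point), not from a real simulation.
weights = {"agility": 1.0, "haste": 0.6, "versatility": 0.4}

# Hypothetical bracers; Stamina has no DPS weight, so it is ignored.
current = {"agility": 100, "stamina": 150, "haste": 120}
candidate = {"agility": 115, "stamina": 172, "versatility": 138}

current_value = item_value(current, weights)
candidate_value = item_value(candidate, weights)
is_upgrade = candidate_value > current_value

print(f"current={current_value:.1f} candidate={candidate_value:.1f} upgrade={is_upgrade}")
```

With these made-up numbers the higher-ilvl candidate comes out slightly behind (170.2 versus 172.0), which is exactly the kind of case the question is about.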

But that's not sufficient. Remember that we're trying to find a more general answer, so perhaps we should consider the worst-case scenario: the item we're starting with has nothing but the best secondary we like, and the item we're upgrading to has nothing but the worst secondary, like an Enhancement Shaman with 630 Haste bracers, upgrading to 645 Versatility bracers. If we can show what ilvl bump is needed for the worst possible scenario, then anything better will clearly fit our model and hooray, success!
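Here's a rough sketch of that worst-case setup. Everything numeric in it is an assumption for illustration: the budget curve (growing by a fixed 15% every 15 ilvls), the equal split between primary and secondary stats, and the stat weights. It is not the game's actual budget formula, just a stand-in so the comparison can run.

```python
def stat_budget(ilvl: int, base_ilvl: int = 630, base_budget: float = 100.0,
                growth_per_15: float = 1.15) -> float:
    """Assumed budget curve: multiplies by growth_per_15 every 15 ilvls."""
    return base_budget * growth_per_15 ** ((ilvl - base_ilvl) / 15)


def worst_case_values(old_ilvl: int, new_ilvl: int, w_primary: float,
                      w_best: float, w_worst: float,
                      secondary_ratio: float = 1.0) -> tuple:
    """Value of an old item with only the best secondary versus a new item
    with only the worst secondary. secondary_ratio is the assumed amount of
    secondary stat per point of primary stat."""
    old_budget = stat_budget(old_ilvl)
    new_budget = stat_budget(new_ilvl)
    old_value = old_budget * (w_primary + secondary_ratio * w_best)
    new_value = new_budget * (w_primary + secondary_ratio * w_worst)
    return old_value, new_value


# Hypothetical weights: primary stat, best secondary, worst secondary.
W_PRIMARY, W_BEST, W_WORST = 1.0, 0.6, 0.5

# Test the hypothesis: is a 15 ilvl jump enough even in the worst case?
old_value, new_value = worst_case_values(630, 645, W_PRIMARY, W_BEST, W_WORST)
hypothesis_holds = new_value > old_value
print(f"old={old_value:.1f} new={new_value:.1f} take it={hypothesis_holds}")
```

Swapping in real stat weights and a real budget curve is the actual experiment; the shape of the comparison stays the same.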

Step 4: Analyze Data and Draw Conclusions

What conclusion can you draw from this data?

Once you have your data, you need to compare it to your hypothesis. Do the results hold up your hypothesis? Partially or fully? Can you break or find holes in your results? Can you refine your experiment to test boundaries? Can you draw other inferences from your data? You may need to repeat the loop between Step 3 and Step 4 many times before coming to a satisfactory answer.

When I executed the experiment, I ended up showing that 10 ilvls was actually sufficient for Enhancement Shaman in the worst possible case. But again, that's not enough. We can't generalize from that. While Enhancement Shaman have some of the more divergent secondary stat weightings, they're not the worst.

So instead I found even more divergent stat weightings and repeated the experiment for a stat weight spread where the best is nearly twice as good as the worst (something that pretty well never actually occurs in simulated stat weights), and I found that, hm, actually, we'd need about 18 ilvls. More data and more calculations proved that to be an outlier, however, so I could caveat the results: in all but the most egregious stat weighting cases, 10 ilvls was sufficient, and 15 ilvls was a sure bet.
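Repeating the experiment across stat weight spreads is easy to automate. The sketch below searches for the smallest ilvl gap at which the worst-case upgrade wins, for a couple of hypothetical spreads. The growth factor, the equal primary/secondary split, and the weights are all assumptions, so don't expect the numbers to match the ones above; the point is only that a wider spread demands a bigger gap. Note that this simplified model scales every stat together, so it can't show any dependence on the starting ilvl.

```python
from typing import Optional


def growth(gap: int, growth_per_15: float = 1.15) -> float:
    """Assumed relative budget increase for an ilvl gap (illustrative only)."""
    return growth_per_15 ** (gap / 15)


def min_ilvl_gap(w_primary: float, w_best: float, w_worst: float,
                 max_gap: int = 60) -> Optional[int]:
    """Smallest gap where an all-worst-secondary item beats an
    all-best-secondary item, or None if no gap up to max_gap does."""
    old_value = w_primary + w_best
    for gap in range(max_gap + 1):
        new_value = growth(gap) * (w_primary + w_worst)
        if new_value > old_value:
            return gap
    return None


# Hypothetical (primary, best secondary, worst secondary) weight sets.
spreads = {
    "narrow spread": (1.0, 0.6, 0.5),
    "wide spread": (1.0, 0.8, 0.4),  # best is twice as good as worst
}

gaps = {name: min_ilvl_gap(*w) for name, w in spreads.items()}
for name, gap in gaps.items():
    print(f"{name}: needs at least {gap} ilvls")
```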

Since the results held at lower ilvls, and stat budget is exponential in nature, ilvl gaps at higher ilvls would only exhibit this behaviour more strongly. That is, it would require smaller ilvl gaps to make an upgrade worthwhile at higher ilvls. Empirical analysis (basically, just trying the experiment at higher ilvls) bore this out.

Interestingly, my results didn't take into account procs on Trinkets, or the stat weightings for weaponry, so I had to caveat my results there as well. They don't match the constraints I put together in my question and research phase, so they'd also require their own experiments and refinement. Sockets, as mentioned before, also did not fit.

Step 5: Communicate Results

This is a very important step, not only because you're trying to answer people's questions, but because you're opening yourself up to peer review. Folks who read your work will naturally find holes in it. Every theorycrafter makes mistakes, but mistakes are completely okay. By correcting those mistakes, you either find your conclusions were bunk, or you can alter the experiment to make your conclusions even more airtight.

In my case, I published my results for secondary stats on my blog, and a totally valid criticism came along that I had used bracers, an item with one of the lowest stat budgets, if not the lowest. So I repeated my experiments with an item that had one of the highest stat budgets: the chest piece. My results came back even stronger.

Expect challenges to your work. In fact, relish those challenges. If someone pokes a hole in your logic or math, don't despair. Fix it! And if you can't fix it, and your experiment was a failure, you've still contributed knowledge to the greater community as well as your own experiences. In science, and in theorycrafting, failing and being wrong aren't bad things. It's still useful information other folks can build upon.


So that's it, that's theorycrafting in a nutshell. Having a math/science background helps significantly, as does knowledge of computer science. I can't tell you precisely how to analyze your data, or how to design exact experiments, because that differs greatly depending on the question, and some of it does require mathematics, statistical models, and so on.

But people didn't know off-hand how to build an experiment to test if molecules were made up of atoms: that only came after numerous other experiments that expanded scientific knowledge and consensus, and the same holds with theorycrafting. It takes practice, doing, and sound logical thought.

So start with something relatively small, something you think might be easy--you'll quickly be surprised at how not easy even the simplest questions tend to be with all of their caveats--but that's okay. Simplify and answer a very specific question, then add in the caveats afterwards to generalize or explore your results further. And most importantly, have fun doing it! #Theorycrafting

1 comment:

  1. Those are some hilarious pictures you picked to go with the post Talarian. Awesome stuff! My theorycrafting has increased by 0.01%! :)