TFTable Set 16 Data Review: Problems We Ran Into and What We Learned

A dev-log on TFTable's first set: the recurring data problems we kept running into (sample size, comp overlap, conflict rerun), how we handle them, and what we took away.

Hey Tacticians, thanks for looking back on the set with us. And yes, before anyone says it, this does look a bit like a dev-log post. This is the first time we've written something like this, and also the first set since TFTable.cc launched. After spending multiple sets tracking data, we kept running into the same old problems: sample size, comp overlap, and what we call conflict rerun. So while we've kept trying to fix things, reduce error as much as possible, and learn along the way, the truth is that these issues will probably always exist to some extent. You could say that's a byproduct of the novelty.

TL;DR

Necessity: We made heavy use of a metric that differs from what most other sites use, because we believe it does a better job of evaluating item and comp strength. In most cases, it worked the way we hoped.

Sample Size: To keep our recommendations reliable, every composition and itemization we show has to pass a sample-size threshold. That means comps first created and popularized on CN, where we do not have data, or item lines that are perfectly reasonable but rarely played, usually cannot be surfaced right away.

Artifacts from Ornn: We used a simple method to reduce the distortion caused by Ornn-forged artifacts on what should have been a unit's true artifact average placement. On itemization pages and on composition pages that do not include Ornn, this gave us artifact recommendations that were usually safe and accurate, with the tradeoff of occasionally missing a few.

Comp Overlap: To make our conclusions reproducible and easier to explore, we define comps using data filters that can be replicated on other data sites. This can cause some overlap between different comp filters, which makes comp separation less rigorous, but it does not affect practical use.

Conflict Rerun: This is a form of data bias that has not been widely recognized in all regions. We use a dedicated algorithm to correct for it, and we provide the raw data under "Read More."

Necessity

We've already explained in detail why we believe Necessity is the better metric: https://www.notion.so/tftai/TFTable-2d4345fb465d8078ac03fc5ba9097f02. Now let's talk about the issues that come with it. Some of these issues, like the concept of Necessity itself, may always be around to some extent, so they are worth keeping in mind. And perhaps one day we will cautiously try a few ways to address them, even if that means stepping away from our original approach of keeping human interference in the data to a minimum.

Mana Items

Necessity measures how strongly a unit depends on a specific item. So if a unit only needs a mana item to cast, but does not depend on any one option in particular, meaning Shojin, Blue Buff, Adaptive Helm, or Nashor's can all do the job, then Necessity cannot really track that. In other words, the data cannot directly tell you, "this unit needs a mana item, just build one already, any of them works." Mel is a typical example. On her item recommendations, Blue Buff and Adaptive Helm both have Necessity close to zero, while Nashor's and Shojin can even come out negative. Thankfully, everyone already understands that Mel as the main carry wants a mana item, so we can live with this small imperfection in the way the data ends up presenting it.

Mel also shows another mana-item issue. As a single item, Nashor's does not show good Necessity for her, but it still appears in her BIS recommendation. That part is not hard to explain. After Nashor's was buffed to 18 AP, double Nashor's plus JG or Vanquisher Emblem should be her BIS. The single-item Nashor's data looks bad because in most real games the leftover components are not that ideal. In roughly 10 percent of games, players ended up with Nashor's but no spell crit, which heavily drags down the data.

Double T-Hex in the end board

In patches where T-Hex dominated the meta, the most highroll players would often slot in a second T-Hex to completely take over the lobby. Once that happens, any item on the second T-Hex gets artificially inflated placement data. Our current methods cannot tell whether an item was held by the first T-Hex or the second one. As a result, the recommended item list for T-Hex often ends up stacked with random leftover items from the second one. Slots that should belong to core items like IE or Last Whisper get pushed aside by things like Shojin, Red Buff, or Deathblade. Luckily, the root of this problem is meta balance, not the data. Once the B-patch launches, the problem usually fixes itself and saves us the trouble.

Blind spots of Necessity

Necessity is not very good at tracking the strength of low-play-rate items. Its basic logic is a unit's average placement without the item minus its overall average placement. For items with extremely low play rates, such as Radiant items, which usually appear in only a tiny fraction of games, whether the item shows up or not barely changes the overall average in a large sample, so Necessity stays close to zero. That is why we use normal average placement as a reference for low-sample items like Emblems, Artifacts, and Radiant items. For the same reason, we usually cannot identify niche hidden tech before it starts catching on. That kind of lag is an unavoidable part of TFT data.
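To see why a low play rate pushes Necessity toward zero, here is a minimal sketch. It assumes Necessity is exactly "average placement without the item minus overall average placement," as described above. The function name and all numbers are hypothetical; this is not TFTable's actual code or data.

```python
from statistics import mean

def necessity(placements_with, placements_without):
    """AVP without the item minus overall AVP (positive = the item helps)."""
    overall = mean(placements_with + placements_without)
    return mean(placements_without) - overall

# Hypothetical samples: in both cases the item is worth a full 1.0 placement.
common = necessity([3.5] * 500, [4.5] * 500)  # item appears in 50% of games
rare = necessity([3.5] * 10, [4.5] * 990)     # item appears in 1% of games
print(round(common, 3), round(rare, 3))
```

In this sketch the common item scores 0.5 and the rare one 0.01, even though both improve placement by the same amount. Under this definition, Necessity equals the play rate multiplied by the gap between the "without" and "with" averages, so a rare item is capped near zero no matter how strong it is.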

Sample Size

Everything that makes it onto TFTable.cc has to clear a sample-size threshold that we consider safe. For example, for the other two item slots on The Darkin Bow Bel'veth, we recommend the safer QSS plus Kraken rather than Rageblade plus Titan's. The latter is just as playable and even has a better average placement, but it does not clear the sample threshold. This threshold is applied across all comps. If we lowered it just to let Rageblade plus Titan's show up, we would also end up surfacing unreliable item combinations in other comps. The same goes for Crown of Demacia Jinx itemization. Here too, we can only recommend the more conservative Rageblade plus Kraken rather than Rageblade plus Giant Slayer, which makes more sense in terms of multiplicative scaling but has only one twelfth of the sample size.
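As a rough illustration of how a single global threshold behaves, here is a minimal sketch. The threshold value, field names, game counts, and placements are all made up for the example and are not our real numbers.

```python
MIN_GAMES = 1000  # hypothetical threshold, applied to every comp alike

def surfaced(builds, min_games=MIN_GAMES):
    """Keep builds that clear the threshold, best average placement first."""
    eligible = [b for b in builds if b["games"] >= min_games]
    return sorted(eligible, key=lambda b: b["avp"])

# Hypothetical stats: the second build places better but is rarely played.
builds = [
    {"items": "QSS + Kraken", "games": 2400, "avp": 4.10},
    {"items": "Rageblade + Titan's", "games": 300, "avp": 3.95},
]
shown = [b["items"] for b in surfaced(builds)]
print(shown)
```

With these made-up numbers only QSS plus Kraken is shown. Lowering `min_games` far enough would surface Rageblade plus Titan's first, but the same lowered threshold would also apply to every other comp.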

To be honest, this kind of system-level issue cannot be fully solved. All we can do is be transparent and do our best to provide item recommendations as a reference. They are not the answer, and they should always be open to challenge, refinement, and updates.

Artifacts from Ornn

A familiar situation for many players is the following: in a highroll game, you hit Ornn, keep him on the board for four rounds so he can forge an Artifact, and then sell him. When the Artifact is lowroll and does not fit the comp at all, it is often placed on a unit that does not affect the final placement.

Getting a free Artifact in game is, of course, a good thing. But for anyone trying to analyze Artifact data outside the game, this creates a difficulty: to what extent is an Artifact's ranking actually reliable, and to what extent is it distorted by the statistical bias that comes from games where the player was already ahead?

To address this issue, on top of setting a sample-size threshold, we also exclude games whose final board still contains a 1-star Ornn. This reduces the interference, but only partially. On most pages, it gives us a much cleaner list of Artifact recommendations. For comps that naturally play Ornn on the final board, however, such as Bilgewater and Shadow Isles, an excessive number of Artifacts still appears.
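The exclusion rule above can be sketched as a simple filter. The record layout and the field names ("id", "final_board", "name", "star") are assumptions made for illustration, not our real schema.

```python
def has_one_star_ornn(final_board):
    """True if any unit on the final board is a 1-star Ornn."""
    return any(u["name"] == "Ornn" and u["star"] == 1 for u in final_board)

def artifact_sample(games):
    """Keep only games whose final board has no 1-star Ornn."""
    return [g for g in games if not has_one_star_ornn(g["final_board"])]

# Hypothetical match records.
games = [
    {"id": "a", "final_board": [{"name": "Ornn", "star": 1}, {"name": "Jinx", "star": 2}]},
    {"id": "b", "final_board": [{"name": "Ornn", "star": 2}, {"name": "Jinx", "star": 2}]},
    {"id": "c", "final_board": [{"name": "Jinx", "star": 3}]},
]
kept_ids = [g["id"] for g in artifact_sample(games)]
print(kept_ids)
```

A filter like this only sees the final board, so a game where Ornn was sold after forging still passes, and a game with a 2-star Ornn is kept as well. That is one way to read why the correction is only partial.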

Comp Overlap

There are two main reasons why comp overlap happens.

The first is that TFT has built-in flexibility. In the current set, Demacia Ryze and Freljord Ryze are good examples. If you transition through Stage 4 from a Demacia Invoker board and want to play Ryze, you can either pivot toward Freljord Ryze or stay on Demacia Ryze. Both versions play 3 Demacia with Galio.

The second is that we categorize comps by cost. A 5-cost carry usually means pushing fast 9, while a 4-cost carry usually means rolling at level 8, with the same logic applying to lower costs. This creates cases like Freljord Warwick, where in some patches the comp is played by rolling at level 7 for 3-stars, while in others it is played as a standard level-8 board, even though the actual build is unchanged. As a result, the same comp can appear on both the 3-cost and 4-cost comp pages.
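To make the overlap concrete, here is a minimal sketch that treats a comp filter as a set of required units and measures how many boards matching one filter also match another. The filter contents are placeholders ("DemaciaUnit" and "FreljordUnit" are not real units), and real filters on TFTable or other data sites may use traits, items, or other conditions.

```python
def matches(board, required_units):
    """A board matches a comp filter when it contains every required unit."""
    return required_units <= board

def overlap_share(boards, filter_a, filter_b):
    """Share of boards matching filter_a that also match filter_b."""
    in_a = [b for b in boards if matches(b, filter_a)]
    if not in_a:
        return 0.0
    return sum(matches(b, filter_b) for b in in_a) / len(in_a)

# Hypothetical final boards, each a set of unit names.
boards = [
    {"Ryze", "Galio", "DemaciaUnit"},
    {"Ryze", "Galio", "DemaciaUnit", "FreljordUnit"},
    {"Ryze", "Galio", "FreljordUnit"},
    {"Jinx"},
]
demacia_ryze = {"Ryze", "Galio", "DemaciaUnit"}
freljord_ryze = {"Ryze", "Galio", "FreljordUnit"}
share = overlap_share(boards, demacia_ryze, freljord_ryze)
print(share)
```

In this toy data, half of the boards that match the first filter also match the second. Because both filters share the same core, some double counting is expected whenever a board carries pieces of both versions.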

Conflict Rerun

Every set, players move Guinsoo's from a lower-cost unit to a higher-cost one, and the data bias created by that kind of item transfer will always be part of TFT. This set's Jinx plus Guinsoo's is a typical example. If you look at the itemization page or the 3-cost categorized Jinx comps, you will find that Guinsoo's is one of Jinx's most important items. But if you switch to the 4-cost categorized Zaun Warwick page, Jinx's Guinsoo's suddenly shows up as "bad." That is obviously not because the item itself is bad. It is because Zaun Warwick plays a more tempo-oriented, level-up style. Instead of rolling for Jinx 3, players usually cap the board with upgraded 5-costs. Once Kindred or Ziggs hits 2-star, players remove Guinsoo's from Jinx and move it onto them. In other words, among 4-cost categorized Zaun Warwick final boards where Jinx is still holding Guinsoo's, a meaningful share are simply bad games that never got upgraded 5-costs. If you also count Guinsoo's on the corresponding higher-cost unit, Jinx's Guinsoo's data returns to the level it should have been at.
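The idea of counting the item on the unit that inherits it can be illustrated with a small sketch. This is not our production Conflict Rerun algorithm, only a toy version of the reasoning above: the record layout, the helper name, and every placement are hypothetical, and "Kraken" is just a filler item.

```python
from statistics import mean

def necessity(games, item, holders):
    """AVP of games where none of `holders` has `item`, minus overall AVP.

    Assumes at least one game without the item exists.
    """
    def held(game):
        return any(item in game["items"].get(unit, []) for unit in holders)
    without = [g["placement"] for g in games if not held(g)]
    overall = mean(g["placement"] for g in games)
    return mean(without) - overall

# Hypothetical final boards from one comp page.
games = [
    {"placement": 2, "items": {"Kindred": ["Guinsoo's"]}},  # moved off Jinx
    {"placement": 3, "items": {"Ziggs": ["Guinsoo's"]}},    # moved off Jinx
    {"placement": 6, "items": {"Jinx": ["Guinsoo's"]}},     # never capped
    {"placement": 5, "items": {"Jinx": ["Kraken"]}},        # no Guinsoo's
]
raw = necessity(games, "Guinsoo's", ["Jinx"])
corrected = necessity(games, "Guinsoo's", ["Jinx", "Kindred", "Ziggs"])
print(round(raw, 2), round(corrected, 2))
```

Looking only at Jinx, the strong games where the item was transferred land in the "without" group, so Guinsoo's comes out negative. Once the inheriting units count as holders, the sign flips in this toy data.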

This issue exists across every set. Think back to examples like these:

  • Set 13: Rebel, Shojin from Zoe to Jinx
  • Set 15: Tesla (Twisted Fate, Varus, Zyra) legendaries flex, IE from TF to Varus

We have addressed this issue to some extent. For example, if you open the Bilgewater Miss Fortune page, you will find that IE ranks quite high in Miss Fortune's Necessity instead of looking weak the way it does in AVP or Delta on a search tool. Under "Read More" at the bottom of each comp page, you can see how a unit's data changes after Conflict Rerun is applied. Miss Fortune's IE is corrected from -0.14 to 0.09, which is much closer to where the item actually belongs.

That's about it. Thanks for reading something this dense, and thanks for using our site TFTable.cc. See you among the stars.