The initial path locations displayed are the high elo junglers’ positions at 2:00. Buffs spawn at 1:30 and most champions clear their first buff by around 1:45, so by 2:00 most junglers are working on or finishing their second camp.
Because buffs spawn at 1:30, a jungler’s position at 1:00 isn’t necessarily indicative of their starting camp. Our positional data is minute-by-minute, so we chose to start with the 2:00 position, which typically shows the jungler at the second camp of their route.
As you continue to add points to your path, our tool filters the data to only include junglers who went on a path similar to yours. If you add a point to your path that few high elo junglers chose, a large portion of the data will be filtered out, and the number of similar paths may become too small to be useful.
Data from the Riot API was stored in a MySQL database, which we then extracted and processed in Python to build a tree data structure that stores the paths. The tool is written with a JavaScript library called D3 and uses this data structure to create the position and win rate plots.
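To give a rough idea of what a tree of paths could look like, here is a minimal Python sketch. It is not our actual code: the `PathNode` class, the `cell` and `insert_path` helpers, and the 500-unit grid used to group nearby positions are all assumptions made for illustration. Each level of the tree corresponds to one minute of the game, and each node counts the games and wins of the paths passing through it.

```python
class PathNode:
    """One step of a path; hypothetical structure for illustration only."""

    def __init__(self):
        self.children = {}  # grid cell at the next minute -> PathNode
        self.games = 0      # number of paths that reached this node
        self.wins = 0       # how many of those paths ended in a win


def cell(pos, size=500):
    """Snap an (x, y) position to a coarse grid cell (the cell size is an assumption)."""
    x, y = pos
    return (int(x // size), int(y // size))


def insert_path(root, positions, won, size=500):
    """Add one jungler's minute-by-minute positions to the tree."""
    node = root
    for pos in positions:
        node = node.children.setdefault(cell(pos, size), PathNode())
        node.games += 1
        node.wins += int(won)


root = PathNode()
insert_path(root, [(100, 100), (700, 100)], won=True)
insert_path(root, [(120, 130), (1600, 100)], won=False)
```

In this sketch, paths that share their early positions share a branch, so counting the junglers who followed a given prefix only requires walking down from the root.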
Short answer: we included a correction for small sample sizes so that small sample size regions would have their win rates forced closer to 50%.
Long answer: consider a region of the map that only two junglers passed through at 2:00. If both junglers won their games, then that region’s win rate would be 100% even though we can’t draw meaningful conclusions from two data points. To combat this, we bumped the win rates of all regions to be closer to 50% than they actually were, but made it so this “shrinkage” towards 50% didn’t apply as much to large sample size regions. This way, the colors correspond to more realistic estimates of win rates in each region. For those of you who know a bit about Bayesian statistics, we used a Beta-Binomial model with a Beta prior parametrized by alpha = beta = 20. In essence, this method adjusts low sample size regions to be closer to 50% without drastically impacting the win rates calculated in higher sample size regions.
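To make the shrinkage concrete, here is a minimal Python sketch. It assumes the adjusted win rate is the posterior mean of the Beta-Binomial model, which with a Beta(20, 20) prior works out to (wins + 20) / (games + 40). The function name `shrunk_win_rate` is hypothetical and chosen only for this example.

```python
def shrunk_win_rate(wins, games, alpha=20.0, beta=20.0):
    """Posterior mean win rate under a Beta(alpha, beta) prior.

    Assumption: the displayed win rate is the posterior mean.
    """
    return (wins + alpha) / (games + alpha + beta)


# Two junglers, both won: raw win rate is 100%, adjusted is close to 50%.
print(round(shrunk_win_rate(2, 2), 3))       # 0.524

# 600 wins in 1000 games: raw win rate is 60%, adjusted barely moves.
print(round(shrunk_win_rate(600, 1000), 3))  # 0.596
```

The prior behaves like 40 extra games split evenly between wins and losses, which is why it dominates a two-game region and hardly matters in a thousand-game one.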
We used the Riot API to gather match timeline data from the North American server on patches 9.1 and 9.2. On those patches the first scuttle crab spawns at 2:00, not at 3:15; the 3:15 timer was introduced more recently, in patch 9.9. The timeline data from Riot gave us the positions of all players in each game at one-minute intervals, and we filtered it to include only high elo (diamond or higher) junglers. We chose high elo data because we felt there was more to learn from high ranked gameplay and that users would be more interested in analyzing it. In total, we have high elo jungle position data for nearly 200,000 paths.
Each time you select a point for your path, our tool looks up high elo junglers who took paths “similar” enough to yours and displays these junglers’ next-minute locations. For a path to be considered similar to yours, the high elo jungler’s position at each minute must be within 750 units of the corresponding minute position on your path (for scale, Caitlyn’s attack range is 650 units). Once we have filtered the high elo jungle data down to the paths similar to yours, we use the positions from those paths to create the plots.
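The similarity check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our production code: it assumes straight-line (Euclidean) distance, assumes each path is a list of (x, y) positions starting at 2:00, and uses hypothetical names (`is_similar`, `next_positions`, and the `positions` / `won` keys).

```python
import math

SIMILARITY_RADIUS = 750  # map units, as described above


def is_similar(user_path, jungler_path, radius=SIMILARITY_RADIUS):
    """True if every point you picked is within `radius` of the jungler's
    position at the same minute (Euclidean distance is an assumption)."""
    if len(jungler_path) < len(user_path):
        return False
    return all(
        math.dist(u, j) <= radius
        for u, j in zip(user_path, jungler_path)
    )


def next_positions(user_path, all_paths, radius=SIMILARITY_RADIUS):
    """Next-minute position and game result for every similar path."""
    k = len(user_path)
    return [
        (p["positions"][k], p["won"])
        for p in all_paths
        if len(p["positions"]) > k and is_similar(user_path, p["positions"], radius)
    ]


# Hypothetical sample data: two recorded paths, one close to the user's pick.
paths = [
    {"positions": [(1200, 1300), (3000, 3000)], "won": True},
    {"positions": [(5000, 5000), (6000, 6000)], "won": False},
]
result = next_positions([(1000, 1000)], paths)
```

Here only the first recorded path survives the filter, so `result` holds its next-minute position together with whether that jungler won.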