This is the fourth in the series of posts about Mass Effect Multiplayer Challenges and Halls of Fame. Previously, I wrote about the Platinum Solos Hall of Fame (PHOF) and fan-created Themed Challenges. Now it’s time to talk about the halls of fame for those same themed challenges (TCHOF).
Always the optimist, I expect that you have read the previous posts, so I won’t re-introduce terms or repeat explanations. This post is probably the most technical in the series, so bear with me (and/or feel no guilt about skipping the parts that delve into details).
Submissions for TCHOF are stored similarly to the submissions for PHOF—as a big table in an Excel spreadsheet (located in the file Mass Effect 3 MP TCHOF Runs.xlsm on GitHub). Each row of the table contains one submission. The first column stores the number of points accumulated in a run; it is recorded only for solo runs, for the purpose of resolving ties. Where PHOF had only one column for the player screen-name, TCHOF has four columns for players (and the links to their recordings—screenshots or videos—of the run), since challenges exist for teams of all sizes. A unique team name is calculated from the player names, sorted alphabetically, for the needs of leaderboards and statistics. There are columns for the duration and the difficulty of the run, as in PHOF, but here it’s not necessary to specify the map, enemy, characters or weapons, because these are all preset by the challenge conditions. Instead, there’s a column for the base name of the challenge; the parent series, alternative name and type (solo/duo/trio/team) of the challenge are automatically determined from the name. There’s also a column to indicate whether the challenge is a bonus; bonus challenges are optional and do not count toward the cumulative times. Dates were sadly never recorded for the themed challenge runs.
At first glance, this doesn’t look more complicated than the PHOF archive. Only with PHOF, nothing needs to be done with the data before it’s presented on the site (except the statistics, but that is a separate topic). Placement on the completionist lists requires only a check that the player has satisfied the (rather straightforward) requirements; for example, that they played a platinum solo on every map. And this can be done in seconds by simply filtering the archive (also doable with a few clicks in PHOF Search).
TCHOF has two additional complications: the leaderboards and the re-runs. The hall of fame for every challenge series contains one or more leaderboards that rank the players or teams who completed some or all of the challenges in the series, according to the cumulative time taken for all the runs. This wouldn’t be a problem in itself if not for re-runs: the oft-seen phenomenon of players and teams redoing challenges, and sometimes entire series, usually to measure an improvement in skill or to beat the existing records. Without re-runs, every player or team would have only one entry in the TCHOF for each challenge, and I could build the leaderboard following these steps (sketched as a query after the list):
- Filter by Series (so we end up with all the runs submitted for the series we’re making the leaderboard for)
- Filter to exclude bonus challenges (because they don’t count in the cumulative times)
- Filter by Difficulty (because every difficulty usually has a leaderboard of its own)
- Filter by Player or Team (so we end up with all the runs submitted by the Player or Team under scrutiny)
- Take the count of rows to see if they completed all the required challenges
- Take the sum of the Duration to get the cumulative time
- Sort the leaderboard by cumulative time in a separate spreadsheet or manually on the site
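For concreteness, here is roughly how those steps could look as a single Power Query (M) expression. This is a minimal sketch, not the archive’s actual query: the table name (Runs), the column names and the example filter values are all assumptions based on the description above.

```
let
    // Steps 1-3: keep this series' runs, drop bonus challenges,
    // and keep a single difficulty
    Filtered = Table.SelectRows(Runs, each
        [Series] = "Themed Duo Challenges 1"
        and [Bonus] = false
        and [Difficulty] = "Gold"),
    // Step 4: keep the runs of the team under scrutiny
    TeamRuns = Table.SelectRows(Filtered, each [Team] = "ex-Clusum | Smehur"),
    // Steps 5-6: completion check and cumulative time
    Done = Table.RowCount(TeamRuns),
    Cumulative = List.Sum(TeamRuns[Duration])
in
    [ChallengesDone = Done, CumulativeTime = Cumulative]
```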
Re-runs make this relatively simple script impossible. Instead of one entry by a player/team for one challenge, I have several, and I need to select the best one (with the shortest duration) somewhere between steps 4 and 5. But this can’t be done with elementary filter/sort operations.
I’ll take a moment here for a digression on why re-runs are allowed. They weren’t before the Excel archives: back when the halls of fame were kept on BSN Prime, a re-run of a challenge would simply overwrite the old entry. All but one! The oldest series (as far as I know), Weekly Themed Solos 1, allows multiple entries under special conditions: if one submits a run with the Ten Waves Survived medal but with a longer (worse) duration than an existing run without the medal, both are displayed in the hall of fame, and both are taken into account, for two different leaderboards. So, at the time the present archive was created, ex-Clusum and I, as the curators, had the choice to either allow re-runs in general or disallow them completely; the latter would’ve meant retroactively rearranging the Weekly Themed Solos Hall of Fame, and that, we agreed, would not be a good thing. And if re-runs were allowed for even just one series, we’d still need a system in place that could handle them. And if we had a system in place for one series, why not apply it to the others too? And so re-runs, and the complexity they introduce, are now a regular part of the archive.
Like I said, they can’t be handled with elementary table operations. To select the best of several entries by the same player/team for the same challenge, one must use the grouping mechanics in an SQL-like query. Grouping finds rows with identical values in some column (in this case, the Player or Team) and replaces them with a single row based on some calculation within the group (in this case, it’s the selection of the row with the minimum value in the Duration column).
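In Power Query terms, this is a Table.Group call. A minimal sketch under the same assumed names, grouping by Player here (the team variant would group by Team instead):

```
let
    // One row per player/challenge/difficulty, keeping only the shortest
    // Duration out of all the runs submitted for that combination
    BestTimes = Table.Group(
        Runs,
        {"Player", "Challenge", "Difficulty"},
        {{"Best Time", each List.Min([Duration]), type duration}})
in
    BestTimes
```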
Trouble is, this needs to be done for every challenge on every difficulty; and there are hundreds of challenges. So, after several lengthy, failed experiments, I developed a system to automate the whole process of putting together a leaderboard, using queries.
During this effort, I discovered another complication that perhaps wouldn’t have presented an obstacle if I didn’t have the ambition to automate things: the achievements of individual players accomplished in a team. For example, one player could do the first five challenges of a 10-challenge series with one teammate, and the rest with another. Or, more commonly, a player who does all the challenges in a series that offers both duos and trios must have done them within different teams. Also, submissions typically enumerate team-members in a random order, so that, for example, half the duos I did with ex-Clusum were submitted as “Smehur | ex-Clusum” and half as “ex-Clusum | Smehur”; if these strings were entered into the TCHOF literally, it would see them as two different teams.
All this necessitated two not particularly transparent steps at the beginning of the pipeline: 1) calculating a Team column from the (up to) four Player columns, and 2) creating an auxiliary table where each row has only one player (the file RunsByPlayer.xls on GitHub). Using the examples above: the Team column for all my duos with ex-Clusum always has the value “ex-Clusum | Smehur”, following an alphabetical sorting rule; and for the player with entries for both duos and trios in some series, where the original table contains one row per run, the RunsByPlayer table has one row per run per player. Having the data decomposed like this allows me to create leaderboards for both teams and individual players.
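Both steps can be sketched compactly in M. Again, this is illustrative rather than the archive’s literal code, and the column names are assumptions:

```
let
    // 1) Canonical team name: sort the non-empty player names
    //    case-insensitively and join them, so that "Smehur | ex-Clusum" and
    //    "ex-Clusum | Smehur" both come out as "ex-Clusum | Smehur"
    WithTeam = Table.AddColumn(Runs, "Team", each
        Text.Combine(
            List.Sort(
                List.RemoveNulls({[Player1], [Player2], [Player3], [Player4]}),
                Comparer.OrdinalIgnoreCase),
            " | ")),
    // 2) One row per run per player: unpivot the four Player columns
    //    (unpivoting drops the null slots of smaller teams for free)
    Unpivoted = Table.Unpivot(
        WithTeam,
        {"Player1", "Player2", "Player3", "Player4"},
        "Slot", "Player"),
    RunsByPlayer = Table.RemoveColumns(Unpivoted, {"Slot"})
in
    RunsByPlayer
```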
For every challenge series, I have a separate Excel file that creates leaderboards by querying the RunsByPlayer table. The two top-most queries, “PlayerRuns” and “TeamRuns”, do steps 1–2 from the script above, then group entries by Player (Team), Challenge and Difficulty to calculate a new column, “Best Time”, for each player (team), for each challenge, on each difficulty. The tables resulting from these queries are then queried for each leaderboard separately. For example, the Themed Duo Challenges series (located in the file ME3MP-TCHOF-ThemedDuoChallenges1.xlsx on GitHub) has leaderboards for both individual players and teams on Silver, Gold and Platinum difficulty in its Hall of Fame. That’s six leaderboards, and six corresponding queries. The “PlayerSilver” query builds the individual player leaderboard for the Silver difficulty. Starting from the result of the “PlayerRuns” query, it filters entries on Silver (step 3 from the script), removes the Difficulty column (because after filtering all the remaining entries have the same value—Silver), and pivots the rest on the Challenge column, taking the minimum of the Best Time column.
The pivoting operation is where the magic happens. Pivot takes the distinct values of a column and turns them into the column headers of the resulting table, aggregating the contents of another column, similar to how grouping works. It’s like transposing the table, but with bonus grouping. In the case of our “PlayerSilver” query, the result will be a new table with as many columns as there are challenges in the series and as many rows as there are players with entries for it; each cell will contain the best time the corresponding player has achieved on the corresponding challenge (or nothing, if the player has not played that challenge).
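Putting the last two paragraphs together, a “PlayerSilver”-style query might look something like this sketch, with PlayerRuns standing for the result of a grouping query like the earlier one (and the usual caveat that the real query’s names differ):

```
let
    // Step 3: keep Silver entries, then drop the now-constant Difficulty column
    Silver = Table.SelectRows(PlayerRuns, each [Difficulty] = "Silver"),
    Trimmed = Table.RemoveColumns(Silver, {"Difficulty"}),
    // Pivot: every distinct challenge becomes a column; each cell holds the
    // player's best time for that challenge (null if they never played it)
    Pivoted = Table.Pivot(
        Trimmed,
        List.Distinct(Trimmed[Challenge]),
        "Challenge",
        "Best Time",
        List.Min)
in
    Pivoted
```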
The “PlayerSilver” query loads the resulting table into a spreadsheet. There, for each player, I take the count of challenge columns with non-null values, and filter by this count to show only those players who have completed all (in this case 20) challenges of the series (step 5); and I take the sum of the cells with best times to get the cumulative time (step 6). The players’ names with their cumulative times, sorted from shortest to longest (step 7), are, finally, the leaderboard.
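In the workbook, those last steps are done with worksheet formulas; purely for illustration, here is how the same count, sum and sort could be expressed in the query itself, under the same assumed names:

```
let
    // Every column except Player is a challenge column
    ChallengeCols = List.RemoveItems(Table.ColumnNames(Pivoted), {"Player"}),
    // Step 5: count the challenge cells that hold a (non-null) best time
    Counted = Table.AddColumn(Pivoted, "Completed", each
        List.NonNullCount(Record.FieldValues(Record.SelectFields(_, ChallengeCols)))),
    // Step 6: cumulative time over the completed challenges
    Summed = Table.AddColumn(Counted, "Cumulative Time", each
        List.Sum(List.RemoveNulls(
            Record.FieldValues(Record.SelectFields(_, ChallengeCols))))),
    // Step 7: keep only full completions (20 challenges here), shortest first
    Leaderboard = Table.Sort(
        Table.SelectRows(Summed, each [Completed] = 20),
        {{"Cumulative Time", Order.Ascending}})
in
    Leaderboard
```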
The actual script is, as you can see, quite a bit more complicated than the naive one we started with. And there are still more complications, mostly various series-specific stuff. For example, The N7 Challenges, which happens to be one of the most popular series, has seven tiers of completion: one tier for every three sequential challenges. So it’s not enough to take the count of challenges a player has done: I need the count within each tier, and the sum of only the entries up to that tier for the cumulative time (so sorting the whole leaderboard doesn’t cut it). Some series, like Fight for the Lost, have special completion categories (some of which were too complex to be captured by this archive). Several others have challenges for all team types, making for a large number of leaderboards (and queries). For example, the For Tuchanka Hall of Fame has 17 different leaderboards (individual/team for solo/duo/trio/team, plus three special ones).
On average, there are probably 10 queries per archive file processing each series; and with 20 series, that’s around 200 queries (and as many leaderboards). Figuring out, setting up and coding all of this was a lot of work—but it was well invested. The system makes updating the TCHOF a lot less daunting than it would be in any sort of “manual” scenario that I can imagine. This is what the process usually looks like:
- ex-Clusum, a veteran of ME3MP and one of the handful of contributors on the project, goes through the submissions in the forum thread and copies the supplied data into the TCHOF file
- I refresh the RunsByPlayer query, then for every series with new entries, I refresh the leaderboard queries
- I refresh the listings and leaderboards on the site (replacing existing runs and/or records by re-runs and new records as needed)
- I refresh the interactive report with TCHOF statistics (still a WIP).
As this is likely the last time I’ll be writing about queries in this series of posts, I want to go back to a query-related problem I complained about early on and have since solved:
As of this writing, the most troublesome limitation of PowerBI (and Excel) queries is that they don’t allow relative source paths. If I move the PHoF file, I need to edit each of a dozen queries in both reports to reflect the file-name change. This makes any collaborative endeavor impractical…
from Platinum Solos Hall of Fame
One solution for this is to put the paths to other files in query parameters, and then reference those in the queries rather than the files themselves. This way, the parameter is the only thing one needs to touch when a file location changes; all the queries using it remain the same. It needs to be done in each of the 20 files containing TCHOF queries, so it’s still a chore—but a smaller one.
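In Power Query, a parameter is itself a small named query that other queries can reference. Roughly, with a made-up path and table name:

```
// A text parameter query, e.g. named "TCHOFPath" (this value is a placeholder):
//   "C:\Archives\Mass Effect 3 MP TCHOF Runs.xlsm"
// Every source query then references the parameter instead of a literal path:
let
    Source = Excel.Workbook(File.Contents(TCHOFPath), null, true),
    Runs = Source{[Item = "Runs", Kind = "Table"]}[Data]
in
    Runs
```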
With regards to collaboration… it’s doubtful that anyone else will ever use these files. The system works for me, but it’s far from foolproof, and most people who might in theory be interested in (or forced into) taking up the curation of TCHOF would need to climb a steep and rocky learning curve. More than any other post in the series, this one was supposed to be “educational”, at least back when I conceived of the idea. In the meantime, I realized I will probably never retire from this position. Mass Effect is an old game and I expect it will soon be laid to rest. Until then, doing the updates three or four times a year, now that I’ve developed this pipeline, isn’t half as hard and time-consuming as it seemed in the beginning.