Rainbow 6 Siege - Network Analysis


There have recently been several posts on the Rainbow 6 subreddit, r/Rainbow6, regarding two somewhat linked topics:
1) Complaints over the high-ping peeker's advantage due to an unfair anti-latency implementation[1]
2) Exposure of a player seemingly hacking while taking part in ESL[2][3]
The reason they are partially linked is that there are concerns regarding weaknesses in the server-side architecture and the data the server shares during game sessions. We will begin this analysis by looking at the first of these: peeker's advantage.

The deal with 'Peeker's Advantage'

The peeker's advantage stems from the client (player) always existing in his own world, which only shows approximate data from the server, while he is able to move and perform actions before the server has been notified of them. Generally this would not seem to be a problem: the server would eventually be updated about the player's actions, verify that he is allowed to perform them, and then apply their effects to the game world. But while older games such as QuakeWorld and Quake 2 operated in that fashion, today's games allow not only movement and shooting to be initiated client-side, but also let the client see them carried out before the server has been notified. Games do this so the player feels the game is very responsive, even if his ping to the game server or other players is higher than usual (usual here meaning what was once the standard: LAN play). Internet play always means a longer, noticeable delay if the server has to authorise and approve each action before it is displayed in the world.

We are not even talking about lock-step motion, as in the original Quake, but about the effect of spawning projectiles on all connected clients only once they are part of the server's down-stream. QuakeWorld and Quake 2 allowed free movement while requiring projectiles to be server-spawned only. A player with a ping of 250 to the game server would therefore see the rocket from his rocket launcher spawn 500 ms after he pressed the trigger. While this was technically the best approach of its time, it gave low-ping players a considerable advantage, since high-ping players had to predict each shot with the fluctuating ping in mind. The advantages and disadvantages, with their associated complaints, gave rise to the terms Low Ping Bastard (LPB) and High Ping Whiner (HPW).

With the new generation of gamers that found online play, with the rise of game mods such as Counter-Strike and the expansion of the game-console market into regions with lower-quality network infrastructure, games started to implement ways around shooting and other actions beyond movement being authored by the server. In some cases this took the shape of the clients all being allowed to authorise the firing and the impact (i.e. dealing out damage) on other players. This was a cheater's dream come true - suddenly a player could script a weapon to be a veritable nuke that killed everything in sight each time it was used, as was the case in the gold version of Diablo. Or, as in the initial retail version of Warcraft, edit all outgoing packets so that their player was always 30 meters up in the air, unreachable by PvE mobs or the close-combat attacks of PvP players. And many other examples.

So game makers learnt to pull back a little - the client could initiate actions and give the player instant initial feedback (spawning hit effects, such as blood splatter), but all actions had to be post-verified by the server. Illegal moves/actions could cause the event to be dropped and/or the player to be disconnected. This, however, wasn't enough, as it still allowed the client to describe a scenario in which the other player was hit, by delivering all parameters for the event - including the opponent's position. The server-side verification had to become smarter.

So the next step was to implement what we today count as the predominant anti-latency solution: Historical Roll Back And Pairing.

The short technical explanation is that when a client sends its state updates to the server, the server has either a measured or an estimated understanding of the client's latency. This is used as the starting point in the server's buffer of historical data when evaluating whether actions are valid. For example: the server sent the client an update indicating that an opponent was located at position x, y at time z. When it reaches the client, the server is already at z+h, but the client still lives at time z. The client now sends an update of the player shooting one bullet at the opponent - a perfect hit in the head. This update arrives at the server at z+2h server time, with a note that the event occurred at z player time. The server rolls back through its back-log of previously distributed events and notices that yes, that shot would indeed hit the player at that point in time. The server sends back confirmation to the player that all the actions he sent were approved (or not; that is when we see rubber-banding from movement) and that the opposing player was killed by the successful headshot. The opposing player is sent the same - and therein the problems begin. But more on that further down.
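The roll-back check described above can be sketched in a few lines of Python. This is purely illustrative - `nearest_snapshot`, `validate_shot`, and all the numbers are assumptions for the sake of the example, not Siege's actual code:

```python
# Hypothetical sketch of Historical Roll Back And Pairing: the server keeps
# timestamped snapshots of the target's position and, when a shot report
# arrives stamped with client time t, rewinds to the snapshot nearest t
# before testing the hit. All names and values are illustrative.

def nearest_snapshot(history, t):
    """Pick the stored snapshot whose timestamp is closest to client time t."""
    return min(history, key=lambda snap: abs(snap["t"] - t))

def validate_shot(history, shot):
    """Approve the shot if, at the rewound time, the reported aim point
    matches where the server actually had the target (within tolerance)."""
    snap = nearest_snapshot(history, shot["t"])
    dx = snap["target_pos"][0] - shot["aim"][0]
    dy = snap["target_pos"][1] - shot["aim"][1]
    return (dx * dx + dy * dy) ** 0.5 <= shot["tolerance"]

# Server history: target stood at (10, 5) at t=0 ms, moved to (12, 5) by t=100 ms.
history = [{"t": 0, "target_pos": (10, 5)},
           {"t": 100, "target_pos": (12, 5)}]

# Client reports a hit near (10, 5), stamped with client time t=10 ms:
shot = {"t": 10, "aim": (10.2, 5.0), "tolerance": 0.5}
print(validate_shot(history, shot))  # True: matches the t=0 snapshot
```

The key point is that the server judges the shot against where the target was at the client's stamped time, not where the target is now.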

This approach requires that the server maintains a near-history buffer of all known events, preferably as they would be known to each client (this is rarely done; a tolerance for incorrectness is implemented instead), up to a set time span. For Battlefield 4 this is reportedly 300 ms, but for Rainbow 6 Siege it may be up to 1000 ms and even above[4]. While in theory this is a very good thing, the way it is implemented in most games causes secondary issues - most crucially, it gives undue benefit to peekers, and even more so to high-ping ones.
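A minimal sketch of such a near-history buffer, assuming a simple snapshot-per-tick model (the class and method names are invented for illustration; the 300 ms and 1000 ms figures are the reported windows mentioned above):

```python
from collections import deque

class HistoryBuffer:
    """Hypothetical roll-back buffer: the server only keeps snapshots as
    far back as its roll-back window allows."""

    def __init__(self, window_ms):
        self.window_ms = window_ms
        self.snapshots = deque()  # (server_time_ms, world_state), oldest first

    def record(self, now_ms, state):
        self.snapshots.append((now_ms, state))
        # Drop anything older than the roll-back window.
        while self.snapshots and now_ms - self.snapshots[0][0] > self.window_ms:
            self.snapshots.popleft()

    def can_rewind_to(self, now_ms, t_ms):
        """A client event stamped t_ms is only testable if it is still
        inside the retained window."""
        return now_ms - t_ms <= self.window_ms

buf = HistoryBuffer(window_ms=300)     # a Battlefield 4-like window
for t in range(0, 1001, 100):          # one snapshot every 100 ms
    buf.record(t, state={"tick": t})
print(buf.can_rewind_to(1000, 900))    # True: the event is 100 ms old
print(buf.can_rewind_to(1000, 500))    # False with a 300 ms window
```

The larger the window, the further into the past a high-ping client's claims can still be honoured - which is exactly why a 1000 ms window amplifies the effect compared to a 300 ms one.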


Why Historical Roll Back And Pairing fails to be the solution

On the surface, the solution of allowing a client to have a 1:1 input match against old data and still accepting it as valid would seem to be a silver bullet for the online gaming scene. So why is it failing so often? Let's look at what happens when a player with a latency of 200 ms moves into a room and fires 4 bullets, with the 4th one connecting with the opponent's head.

At z: 0 ms (local time) the player initiates his movement into the room, emerging from behind the door-post to see the opponent and shoot him. Let's assume the player was aware that an opponent was there, so there is no reaction delay before he starts shooting; we can discount that for this exercise. The opponent has a latency of 50 ms. In a side-by-side comparison, the player sees what the server sends out 150 ms after the opponent does. The opponent sees what the player chooses to do 250 ms after the player has done it. Let's assume the player's shots occur at the 50 ms, 100 ms, 150 ms and 200 ms marks. For easy calculation, let's also assume the client only manages a send frequency of once every 200 ms, rolling up every action with time marks as to 'when it occurred in local time'.
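The timeline in this example works out as follows (all values in milliseconds, using the one-way delays as the text does; the variable names are just for illustration):

```python
# Numbers from the worked example above: attacker latency 200 ms,
# defender latency 50 ms, client batching its actions every 200 ms.

ATTACKER_LATENCY = 200
DEFENDER_LATENCY = 50
SEND_PERIOD = 200

shots_local = [50, 100, 150, 200]   # attacker-local shot times
batch_sent = max(shots_local)       # the whole burst fits in one 200 ms batch
arrives_at_server = batch_sent + ATTACKER_LATENCY
kill_at_defender = arrives_at_server + DEFENDER_LATENCY

print(arrives_at_server)  # 400: the server receives the whole burst at once
print(kill_at_defender)   # 450: the defender can be told he is dead 450 ms
                          # after the attacker began moving on his own machine
```

So by the time anything at all reaches the defender about this peek, the entire attack has already finished on the attacker's machine.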

As the packets are received by the server they are unwound and matched against the historical record. To simplify the scenario, assume the opponent is hiding stationary in a corner behind a shield; if the client saw him and shot at him, it is easy to imagine that the hits will not fail any sanity check. So let's assume all the data delivered by the client is good. Multiple steps of actions (depending on the tick rate of the game) have been gathered into one big historical statement packet from the client. These are all tested as soon as possible, since for the server they are all 'in the past' and there is no point replaying them in 'real time' for validation (though there is when delivering them to opponents; see below). So each tick is validated and approved: the client moved into the room and fired 4 shots at the opponent, with the 4th bullet pushing the opponent into the 'DEAD' state due to either accumulated damage or a headshot. As the events are accepted, they are queued as current events for the player, to be replayed starting from the moment they were received.

The replaying of the events by the server in 'adjusted real time' allows the movement to look smooth, without any warping, as it is pushed to the opponents. It does, however, mean that the motion into the room is observed by the opponent - as noted above - at the earliest 250 ms after it actually began on the client's machine. Any counter-action is therefore taken against a delayed reality that is (client latency + opponent latency) ms behind the actions being carried out.
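That 'adjusted real-time' replay can be sketched as rescheduling the validated ticks at their original spacing, starting from the moment the batch arrived (function name and event labels are invented for illustration):

```python
# Illustrative sketch: validated ticks are all approved at once, but are
# streamed to opponents at their original pacing, offset to start at the
# moment the batch was received by the server.

def schedule_replay(events, received_at):
    """events: list of (client_time_ms, action). Returns the server time
    at which each action is broadcast, preserving the original spacing."""
    t0 = events[0][0]
    return [(received_at + (t - t0), action) for t, action in events]

events = [(0, "step"), (50, "shot1"), (100, "shot2"),
          (150, "shot3"), (200, "shot4-headshot")]
print(schedule_replay(events, received_at=400))
# Actions are pushed out between 400 and 600 ms server time: smooth to
# watch, but everything the defender sees is already in the past.
```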

So what about the shooting? As has been noted during hours upon hours of research, thanks to video recordings and 'death cam replays', there is a big difference between what the opponent initially sees and what they (and others, if it was the round-ending kill) are shown during the 'replay' cam. So why does the opponent not see the client enter the room and fire a sequence of shots, but instead seemingly die right away?

When the actions arrive as listed in the packet from the client, they are flattened and compared against the historical data; the summation is that the 4th bullet causes a 'kill' (the opponent's death). This is a special event. The game makers do not want the opposing player to keep moving for another 200 ms (the full replay time needed to sync up the client's movement alongside new actions) before he is notified that he is dead, since that would mean the client sees the opponent's death a full 600 ms (200 ms latency + 200 ms replay + 200 ms latency back) after firing the killing shot. The whole reason to do immediate shooting with local spawning of hit decals and blood-splatter effects - and then to use Historical Roll Back And Pairing to make sure the data is not faked by the client - is to fake a real 'snappiness' in the game. So the developers' decision is clearly to allow a server-detected kill-hit to bypass the replay queue and be acted out right away. Given the data above, the server, after flattening and approving all events, sends out a KILL event to all connected nodes of the game. The opponent receives this 50 ms after the server got the full packet - while, on the opponent's screen, the client is still out of view just behind a door-frame. Since the full history of the actions was sent alongside the KILL event, the replay cam shows the event closer to what the client saw.
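The kill-priority shortcut can be sketched like this, with the same numbers as the worked example (function and event names are invented; this is an illustration of the described behaviour, not actual Siege code):

```python
# Illustrative sketch: normal events keep their replay pacing, but a KILL
# jumps the queue and is broadcast the moment validation finishes.

def broadcast_times(validated_events, received_at):
    """Map each event to the server time it is sent to opponents."""
    t0 = validated_events[0][0]
    return {event: (received_at if event.endswith("KILL")
                    else received_at + (t - t0))
            for t, event in validated_events}

events = [(0, "enter-room"), (50, "shot-1"), (100, "shot-2"),
          (150, "shot-3"), (200, "headshot-KILL")]
times = broadcast_times(events, received_at=400)
print(times["headshot-KILL"])  # 400: sent immediately on validation
print(times["enter-room"])     # 400: the peek only starts replaying at
                               # that same instant, so the defender is
                               # dead before he ever sees the attacker move
```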

So how could the peeker's advantage be removed?

The peeker's advantage only exists when successful DBNO and KILL states occur. Damage received is replayed as movement, starting from the moment the server received it. But the presence of one-shot-kill headshots in Rainbow 6 Siege makes it extra vulnerable to the effect compared to games such as Battlefield 4. The only way to fully remove the peeker's advantage is to replay all states the same way, with no KILL priority that bypasses the replay queue. This would allow an opponent (often a defender) to see the attacker enter the room and shoot back before being mortally hit.
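In sketch form, the uniform-replay fix simply paces every event, KILL included, using the timings from the earlier worked example (names are again invented for illustration):

```python
# Illustrative sketch: every validated event, including a KILL, keeps
# its original replay pacing; nothing jumps the queue.

def broadcast_times_uniform(validated_events, received_at):
    """Map each event to the server time it is sent to opponents,
    preserving the original spacing for all of them."""
    t0 = validated_events[0][0]
    return {event: received_at + (t - t0) for t, event in validated_events}

events = [(0, "enter-room"), (200, "headshot-KILL")]
times = broadcast_times_uniform(events, received_at=400)
print(times["enter-room"])     # 400: the peek starts replaying at once
print(times["headshot-KILL"])  # 600: the defender gets the full 200 ms
                               # of replay, plus his own latency, in
                               # which to see the attacker and shoot back
```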

What would be the side-effect of doing this?

The side-effects would be as follows:

  1. - High-latency players would see their kill-shots confirmed with full latency
  2. - More 'both-kill' events would occur, as the opponent's return fire would pass a Historical Roll Back And Pairing test

The both-kill scenario can be minimized if the Historical Roll Back And Pairing test of a damage allocation is rejected when the shooter's own death is already part of the replay data. This would remove situations where high-latency players would, as seen from the low-latency player's perspective, be able to shoot back after their death - possibly even after all other clients have been informed of it. Another approach is a slightly more aggressive 'highest latency allowed' requirement for connecting players.
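A minimal sketch of that rejection rule, assuming the server keeps a record of already-committed deaths (all names and values here are hypothetical): a late damage report is refused when its shooter is already dead in the committed replay data.

```python
# Illustrative sketch of the both-kill mitigation: refuse return fire
# from a player whose own KILL has already been committed by the server.

def accept_return_fire(report, committed_deaths):
    """report: {'shooter': ..., 'target': ..., 't': client time in ms}.
    committed_deaths: {player: server time their KILL was committed}.
    Reject the report if the shooter is already among the committed dead."""
    return report["shooter"] not in committed_deaths

committed_deaths = {"defender": 400}   # the defender's death is committed
late_shot = {"shooter": "defender", "target": "attacker", "t": 380}
print(accept_return_fire(late_shot, committed_deaths))  # False: no both-kill
```

Note that this trades away some fairness for the high-latency player in exchange for fewer both-kill outcomes: his locally valid shot is discarded because, in committed server time, he was already dead when it arrived.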

Please note: This is a work in progress.