Input Latency on the Steam Deck - What it's like now and how to improve it
tl;dr - Due to Wayland forcing VSync, the Deck has 2 screen refreshes of unavoidable latency - ~33.3ms at 60hz up to ~50ms at 40hz. This is outside of Valve's control. The built in limiter also adds a further 3 frames of latency at roughly whatever the framerate is limited to - ~50ms at 60fps, ~75ms at 40fps, ~100ms at 30fps, and ~150ms at 20fps.
To save ~a frame of latency, add MANGOHUD_CONFIG=fps_limit=40,no_display mangohud %command% to your game's launch args instead of using the built in limiter, changing the number to your desired framerate.
As a note, a lot of what I'm about to say is unverified, but assumed based on testing. I've done as much research as I can, but I don't have the knowledge to dig through code and get any actual verifiable facts on how some things work. Everything here is based on info I've read and testing I've done myself.
If I'm wildly wrong with any of my assumptions here please let me know!
Frame Pacing
One of the best features of the Steam Deck is the ability to set the native refresh rate of the screen all the way down to 40hz. This has huge benefits for frame pacing when running at lower framerates.
Frame pacing is all about how consistently you can show frames to the viewer, and the more consistent the timing is, the smoother the game feels. This is especially noticeable at lower framerates.
Imagine you have a 60hz screen with a game running at 60fps. Every 16.6ms the screen will refresh, and present a new frame to the viewer. If the game is able to render the frame in time for the next refresh, then that frame will be shown. However if it's not, it will repeat the previous frame. This is felt as a stutter or a hitch, which makes the game feel substantially less smooth.
Now imagine your game runs at 30fps on that 60hz screen. The monitor will display a frame every 16.6ms, however the game will only draw a new frame every 33.3ms. This means that each frame will be displayed for two refresh cycles, so every frame is consistently shown for 33.3ms.
Finally, imagine your game runs at 40fps on that 60hz screen. The monitor will continue to display a frame every 16.6ms, however the game has a new frame ready every 25ms.
The first refresh of the monitor at 0ms will show frame 1. At 16.6ms the monitor will refresh again, however the game hasn't drawn a new frame since new frames are only drawn every 25ms, so frame 1 is shown again. At 33.3ms the monitor refreshes a third time, and finally frame 2 is shown since that frame has been drawn. At 50ms the monitor refreshes a fourth time, and frame 3 is shown since we've now elapsed another draw cycle.
This means that frame 1 was shown for two refresh cycles, and frame 2 was shown for only one. This cycle repeats if you keep going, with frame 3 being shown for two cycles and frame 4 for one. Because the frame times don't line up with the refreshes, frames alternate between being shown for 33.3ms and 16.6ms, which is inconsistent frame pacing. In a sense, the first frame is shown for 8.3ms longer than it should be, and the second frame is shown for 8.3ms less than it could be. This results in a distinct "judder" when playing the game.
This is where the 40hz mode of the Steam Deck excels - since you can set the refresh rate of the screen to exactly match a framerate that doesn't divide evenly into 60, you can eliminate these frame pacing issues entirely.
For example, if the screen is set to 40hz and the game is set to 40fps, then the screen is refreshing at the same pace as new frames are being drawn. This means that each frame is displayed for an even 25ms, and the game feels much more smooth and consistent when running at 40fps.
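If it helps to see the arithmetic, here's a minimal sketch of the examples above (my own illustration, nothing to do with how SteamOS actually works) that works out which game frame each screen refresh ends up showing and how long each frame stays on screen:

```
# Toy model: assume that at each refresh the display shows the newest frame that
# has finished rendering by that point. Frames are numbered from 0, times in ms.
def frame_hold_times(hz, fps, refreshes=12):
    refresh_period = 1000.0 / hz   # time between screen refreshes
    frame_period = 1000.0 / fps    # time between new game frames
    # Newest finished frame at each refresh (tiny epsilon to dodge float rounding)
    shown = [int((i * refresh_period + 1e-6) // frame_period) for i in range(refreshes)]
    holds = {}
    for frame in shown:
        holds[frame] = holds.get(frame, 0) + 1   # refreshes each frame stayed up for
    return [(frame, round(count * refresh_period, 1)) for frame, count in sorted(holds.items())]

print(frame_hold_times(60, 40))  # alternates ~33.3ms and ~16.7ms holds - the judder
print(frame_hold_times(60, 30))  # every frame held ~33.3ms - even pacing
print(frame_hold_times(40, 40))  # every frame held 25.0ms - even pacing
```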
VSync
Generally speaking, the way frames are drawn is with two buffers, a front buffer and a back buffer. The monitor will read from the front buffer to display an image at a set interval, and the GPU will write to the back buffer.
Without VSync, the buffers will be swapped whenever the GPU finishes drawing a new frame. When the frame is finished, the contents of the back buffer will be swapped with the contents of the front buffer immediately. Since this happens whenever the GPU is finished with a frame, this means that the swap can happen part way through a frame being displayed. When this happens, you get a distinct "tear" in the image where it switches from displaying the old frame to displaying the new frame. This has the advantage of very low latency in game, but can obviously result in visual artefacts.
In order to alleviate this, we can use VSync. This stops the buffers from being swapped until the frame has been completely displayed by the monitor and it sends a sync signal saying it's beginning the next cycle. This means that the buffers are never swapped mid cycle, and there is no tearing. This works fine when the GPU can draw a frame quicker than a display cycle, since it can just wait, but if the GPU isn't finished drawing the next frame then it has to wait an additional cycle before swapping the buffers. If your display refresh rate is 60hz (16.6ms) but the GPU takes 25ms to render a frame, every frame gets displayed for two cycles, which is why a game with VSync on a 60hz monitor will crash right down to 30fps if its frame times are consistently over 16.6ms. This also increases input lag quite substantially, since in this example frames will be visually 50ms old when they're sent to the monitor, and can be 66.6ms old by the time the monitor has finished drawing them, which is very easy to feel. This is because the frame starts being drawn at the first cycle, isn't finished in time for the second cycle, and is only displayed on the third cycle. This is also obviously bad since the GPU is sitting idle for a long time waiting on the monitor when it could be drawing new frames, potentially lowering latency.
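To put rough numbers on that, here's a quick back-of-the-envelope sketch (my own arithmetic, nothing official) of what double-buffered VSync does to the framerate when the GPU can't keep up:

```
import math

# With double-buffered VSync a frame can only be presented on a refresh boundary,
# so the effective framerate is the refresh rate divided by the number of whole
# refresh cycles the GPU needs per frame.
def vsync_effective_fps(refresh_hz, gpu_frame_ms):
    refresh_ms = 1000.0 / refresh_hz
    cycles_per_frame = math.ceil(gpu_frame_ms / refresh_ms)
    return refresh_hz / cycles_per_frame

print(vsync_effective_fps(60, 25))   # 30.0 - a 25ms frame on a 60hz panel locks to 30fps
print(vsync_effective_fps(60, 16.0)) # 60.0 - frames that fit within a refresh keep 60fps
```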
The solution to this is a look-ahead renderer, also known as triple buffering in OpenGL. What this does is add an extra backbuffer so the GPU can keep working at all times. If the GPU can render frames faster than the refresh rate, it alternates drawing frames into the two backbuffers - as soon as it finishes one, it swaps and starts on the next. When the display is ready for a new cycle, it takes the most recently completed buffer and displays that. If the GPU renders frames slower than the refresh rate, then as soon as it's done with one frame it can immediately start work on the next without having to wait for the VBlank interval. This is the scenario I describe in the Frame Pacing section, and OpenGL triple buffering can still result in stutters or dropped frames if the renderer is slower or faster than the refresh rate of the monitor. This adds a small amount of lag compared to traditional double buffering, but gets rid of screen tearing.
(Confusingly, this discard behaviour in D3D is called Fast Sync on NVIDIA cards or Enhanced Sync for AMD cards. "Triple buffering" in most D3D titles is actually something entirely different, which is just an extra backbuffer before the front buffer. This has the benefit of smoothing out inconsistent framerates since it gives the GPU an extra frame of leeway, but at the cost of ~a frame of latency.)
Wayland's Forced VSync
EDIT: My assumptions on what Wayland is were partially wrong, please see this comment for a bit of clarification! Seems this is actually a part of Gamescope, which implements Wayland.
Wayland, the compositor that the Steam Deck uses, forces VSync at all times. This is why you never see any screen tearing on the Steam Deck.
From what I can tell based on my testing, here's my assumption of how it works. Gamescope will churn out frames as fast as it can make them, which is why the framerate doesn't lock to a specific number even though VSync is always enabled. When Wayland requests a new frame, Gamescope sends the most recently generated one. Wayland has a 3 frame FIFO buffer, and only requests a new frame on each sync as everything shuffles along the queue.
Testing Portal with unlimited fps at 60hz, using "Is it Snappy?" and a keyswitch with an LED on it, I consistently registered 45.8ms from the LED lighting up to the start of the frame draw. Knowing a panel refresh is 16.67ms, we can divide this time into ~12.47ms of VSync delay while the current frame is being drawn, 16.67ms of the frame in the first buffer, and 16.67ms of the frame in the second buffer, and then the next frame is our input frame. This means we have a front buffer and two backbuffers. As soon as the front buffer is finished, everything shuffles along the queue, and Gamescope serves up a new frame in the second backbuffer.
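As a quick sanity check on that breakdown, a bit of arithmetic (just my own framing of the numbers above) splits the measurement into whole refresh cycles plus the leftover sync delay:

```
# Split a measured input-to-photon time into whole refresh cycles plus whatever
# sync delay is left over before the first refresh after the input.
def decompose(measured_ms, refresh_hz):
    refresh_ms = 1000.0 / refresh_hz
    whole_refreshes = int(measured_ms // refresh_ms)
    sync_delay = measured_ms - whole_refreshes * refresh_ms
    return whole_refreshes, round(sync_delay, 2)

print(decompose(45.8, 60))  # (2, 12.47) - two full refreshes plus ~12.5ms of sync delay
```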
My other data seems to generally line up with this. Doing the same test with an unlimited framerate at 40hz gave me times from input to visual between 50-75ms, or two frames + the sync. Based on the data, I'm fairly confident in saying that this is a hard limit due to the Wayland compositor - It is impossible to get lower than 2 full frames of lag at the refresh rate of the display.
This is fine, honestly. Games running at either 40 or 60 unlimited feel great in terms of responsiveness, and obviously there's no tearing. If you were playing with a mouse plugged in you might be able to feel it, but for the most part this is totally reasonable. Obviously though, there's a lot of wasted battery in letting games hammer out as many frames as they can. So clearly you need a frame limiter, right?
An issue arises, however, when using the frame limiter built into the Deck.
The Steam Deck's Frame Limiter
The main reason I started on this is the horrible input lag that appears when you use the Steam Deck's built in frame limiter. I noticed this when playing games with it set to 40hz/40fps - the input lag felt absolutely atrocious.
After a lot of testing, I believe that, for whatever reason, enabling the built in Steam Deck frame limiter adds an additional 3 frame buffer. This buffer is tied to the framerate rather than the refresh rate, so it has a much bigger impact at lower framerates - as the framerate is lowered, the GPU creates frames more slowly, and that 3 frame buffer gets much longer very quickly.
At 60hz with an unlimited framerate, I got a delay of 45.8ms, as described above. This had ~12.4ms of sync delay from the light turning on to the start of the next frame, and following the refresh stripe showed the Deck did two further refresh cycles before starting to display the input frame.
When locking the framerate to 60, that delay nearly doubled to 87.9ms. In this test there was ~8.7ms of sync delay from the light turning on to the start of the next frame, and following the refresh stripe on the display in the video I took, there were 5 refresh cycles before the input frame started to display. This means that the frame limiter delayed the input frame by 3 additional refresh cycles compared to no limiter.
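The same kind of back-of-the-envelope arithmetic applied to both 60hz measurements shows the jump in whole refresh cycles:

```
refresh_ms = 1000.0 / 60   # ~16.67ms per refresh at 60hz

# How many whole refresh cycles fit into each measurement, i.e. how many
# refreshes passed before the input frame could start to display.
for label, measured_ms in [("unlimited", 45.8), ("built in limiter @ 60fps", 87.9)]:
    cycles = int(measured_ms // refresh_ms)
    print(label, cycles)   # 2 vs 5 - the limiter adds 3 extra refresh cycles
```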
Basically, don't use the built in frame limiter if you care about input latency.
A small tweak for an improvement
Limiting the framerate is extremely important for games running at 40hz, since it allows you to get much smoother and more even frame pacing. If the game is running at exactly 40fps with the screen at 40hz, every frame will be temporally consistent and smooth. There won't be any double frames, there won't be any skipped frames, and every frame will be an even distance apart. This is by far the best way to play some of the more demanding games on the Deck, so especially when running at 40hz a frame limiter is a must.
Thankfully, there's a workaround that improves things a little bit until Valve improves the implementation of the built in frame limiter.
MangoHud is the software Valve uses to display the detailed performance information in game, and it includes a much less impactful frame limiter. In Game Mode, go to the game you want to limit in Steam, open its properties, and in the launch options box type MANGOHUD_CONFIG=fps_limit=40,no_display mangohud %command%, where the number is the desired FPS limit.
When you launch the game, the FPS will be limited to your chosen value with a lower impact on input latency.
For comparison's sake, at 60hz with the framerate locked to 60 using MangoHud instead of the Deck's built in limiter, the delay was 66.7ms. 4.2ms of this was sync delay, and there were then 4 refreshes of the screen before the input frame started to display. This means that MangoHud is at least a frame faster than the Deck's built in frame limiter.
The biggest thing here though is that the extra frame makes a huge difference at lower framerates, since it's a buffered frame held for a full frame interval rather than a refresh interval. At 40hz/40fps we're talking 25+25+25+25(+25) - 125ms with the built in limiter compared to 100ms with MangoHud, and at 40hz/20fps we're talking 25+25+50+50(+50) - 200ms with the built in limiter compared to 150ms with MangoHud.
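For what it's worth, those figures all fall out of a simple model - my assumption based on the measurements in this post, not anything confirmed by Valve - of two refresh cycles of compositor VSync plus extra buffered frames held for a full frame interval each, three of them for the built in limiter and two for MangoHud:

```
# Rough model (my assumption from the measurements in this post): two refreshes
# of compositor VSync plus the limiter's extra buffered frames, each held for
# one frame interval at the capped framerate.
def estimated_latency_ms(refresh_hz, fps, limiter_frames):
    vsync_ms = 2 * (1000.0 / refresh_hz)
    limiter_ms = limiter_frames * (1000.0 / fps)
    return vsync_ms + limiter_ms

for hz, fps in [(60, 60), (40, 40), (40, 20)]:
    built_in = estimated_latency_ms(hz, fps, 3)   # built in limiter: 3 extra frames
    mangohud = estimated_latency_ms(hz, fps, 2)   # MangoHud: 2 extra frames
    print(f"{hz}hz/{fps}fps: built in ~{built_in:.0f}ms, MangoHud ~{mangohud:.0f}ms")
# 60hz/60fps: built in ~83ms, MangoHud ~67ms
# 40hz/40fps: built in ~125ms, MangoHud ~100ms
# 40hz/20fps: built in ~200ms, MangoHud ~150ms
```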
The other viable workaround here is to use a frame limiter built into a given game. This will cap the framerate without incurring the additional buffered frames that the other limiters add, meaning you only have to deal with the base Wayland VSync buffer frames.
What needs to be done?
On Valve's end, the framebuffer for the frame limiter needs to be reduced. If it's possible to reduce this down to a single frame (which should be theoretically doable!) then the latency gains over the current setup would be substantial. At 60hz/60fps we'd go from 83.3ms down to 50ms. At 40hz/40fps we'd go from 125ms down to 75ms, and at 40hz/20fps we'd go from 200ms down to 100ms.
Regarding the forced VSync of Wayland, there has been a pull request open for a long time to add a method of disabling VSync. If it were eventually merged, it would be possible to disable VSync and get rid of the minimum 2 frame delay at the cost of some screen tearing. Unfortunately it seems to be stuck in limbo, so I think for the time being we'll have to accept at least 2 frames of delay no matter what else is improved.
This is definitely an issue that can be vastly improved, and doing so would go a long way towards making games on the Deck feel so much better. Currently the choice at lower framerates is between stuttering due to inconsistent frame times and extremely high input lag, but realistically you should be able to get the best of both worlds if the buffers are improved.
Other threads with info about this
https://old.reddit.com/r/SteamDeck/comments/ug9kc2/psa_enabling_the_framerate_limiter_adds/
https://old.reddit.com/r/SteamDeck/comments/v3rcb7/steam_deck_input_latency_test/