While Stadia devs blame the Linux kernel for the lag, Linus Torvalds blames developers.
Google Stadia had a predictably glitchy and lackluster launch in 2019. Many of the problems gamers predicted before launch materialized when Stadia went public this past November, not the least of which was milliseconds-long lag.
But a developer named Malte Skarupke has shed some light on what he believes the problem is: Spinlocks.
"I overheard somebody at work complaining about mysterious stalls while porting Rage 2 to Stadia. The only thing those mysterious stalls had in common was that they were all using spinlocks. I was curious about that because I happened to be the person who wrote the spinlock we were using. The problem was that there was a thread that spent several milliseconds trying to acquire a spinlock at a time when no other thread was holding the spinlock. Let me repeat that: The spinlock was free to take, yet a thread took multiple milliseconds to acquire it. In a video game, where you have to get a picture on the screen every 16ms or 33ms (depending on if you’re running at 60Hz or 30Hz), a stall that takes more than a millisecond is terrible. Especially if you’re literally stalling all threads."
Spinlocks, for those of you who don't speak fluent programming, are one of the ways a program makes its threads take turns. A thread is an independent sequence of work inside a program, and a game may run dozens of them at the same time. When several threads need the same piece of data, developers put a lock around it so that only one thread can touch it at a time. With a spinlock, a thread that finds the lock taken does not go to sleep; it keeps checking the lock in a tight loop, "spinning" until the lock is free, hence the term. On Linux, which thread gets to run on which CPU core, and for how long, is decided by the kernel's scheduler.
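To make the "spinning" idea concrete, here is a minimal sketch in Python. It is not Skarupke's spinlock (his was written for a C++ game engine); the `SpinLock` class and the `run_workers` helper are hypothetical names made up for this illustration, and Python's `threading.Lock` is borrowed only as an atomic "try to grab it" flag.

```python
import threading


class SpinLock:
    """Illustrative spinlock: a waiting thread loops until the lock is free."""

    def __init__(self):
        # threading.Lock is used here only as an atomic try-to-take flag.
        self._flag = threading.Lock()

    def acquire(self):
        # Busy-wait: retry immediately, never ask the OS to put us to sleep.
        while not self._flag.acquire(blocking=False):
            pass

    def release(self):
        self._flag.release()


def run_workers(lock, n_threads=4, n_increments=1000):
    """Have several threads bump one shared counter, one at a time."""
    counter = 0

    def worker():
        nonlocal counter
        for _ in range(n_increments):
            lock.acquire()
            try:
                counter += 1  # the protected "critical section"
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Calling `run_workers(SpinLock())` should return `n_threads * n_increments`, because the lock keeps the increments from trampling each other. The catch is in `acquire`: a waiting thread burns CPU time the whole time it waits, and nothing tells the scheduler that it is waiting on someone else.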
Spinlocks aren't the only way a program can handle the kind of traffic threads create. There are other tools a programmer can use to control thread traffic, such as mutexes and semaphores, which put a waiting thread to sleep instead of leaving it to "busy wait" the way a spinlock does. Skarupke suggests that "most mutex implementations are really good, that most spinlock implementations are pretty bad and that the Linux scheduler is OK but far from ideal."
Skarupke spent months investigating the problem himself, and his conclusion seems to be that the Linux kernel scheduler is causing problems for games ported to Stadia. Since the Stadia datacenters run Google's own flavor of Linux, and since Chrome OS and Android are also based on the Linux kernel, these milliseconds-long delays could show up in any game that uses spinlocks to coordinate its threads. The developer settled on using a mutex instead, suggesting that it's faster.
Linus Torvalds' reply
But the creator of the Linux kernel, Linus Torvalds, did his own digging into the issue. According to Phoronix, Torvalds looked into these claims and called them "pure garbage." In a long response reported by Phoronix, Torvalds had this to say:
The whole post seems to be just wrong, and is measuring something completely different than what the author thinks and claims it is measuring. First off, spinlocks can only be used if you actually know you're not being scheduled while using them...It basically reads the time before releasing the lock, and then it reads it after acquiring the lock again, and claims that the time difference is the time when no lock was held. Which is just inane and pointless and completely wrong. That's pure garbage.
Torvalds seems to be suggesting that developers are using spinlocks wrong, or that they shouldn't be using them in the first place. His objection to the benchmark is that the two timestamps, one taken just before releasing the lock and one taken just after acquiring it again, don't actually bracket a period when the lock was free. If the scheduler pauses a thread after it has read the clock but before it has released the lock, that pause gets counted as time in which nobody held the lock.
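The measurement pattern Torvalds describes can be sketched roughly like this. This is an illustration of the idea only, not the code from Skarupke's post; the function name `longest_apparent_idle` and its parameters are invented for the example.

```python
import threading
import time


def longest_apparent_idle(n_threads=4, n_rounds=1000):
    """Return the longest gap the benchmark would report as 'lock was free'."""
    lock = threading.Lock()
    last_stamp = None  # clock reading taken just before the previous release
    longest = 0.0

    def worker():
        nonlocal last_stamp, longest
        for _ in range(n_rounds):
            lock.acquire()
            after_acquire = time.perf_counter()
            if last_stamp is not None:
                # Reported as "time during which nobody held the lock".
                longest = max(longest, after_acquire - last_stamp)
            last_stamp = time.perf_counter()
            # If the scheduler pauses this thread right here, the clock keeps
            # running while the lock is still held, yet the gap above will
            # count that pause as idle time.
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return longest
```

The number that comes back depends on the machine and on whatever else it is doing at the time. The point of the sketch is the comment in the middle: the clock is read while the lock is still held, so a badly timed pause inflates the result even though the lock was never free.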
Torvalds suggested a fix.
Use a lock where you tell the system that you're waiting for the lock, and where the unlocking thread will let you know when it's done, so that the scheduler can actually work with you, instead of (randomly) working against you...I repeat: do not use spinlocks in user space, unless you actually know what you're doing. And be aware that the likelihood that you know what you are doing is basically nil.
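In Python terms, the kind of lock Torvalds recommends looks something like the sketch below. It uses the standard `threading.Lock` in its normal blocking mode; in CPython a thread that finds the lock taken is expected to wait inside the operating system until the lock is released, instead of burning CPU in a loop. The helper name `count_with_mutex` is made up for this example.

```python
import threading


def count_with_mutex(n_threads=4, n_increments=1000):
    """Have several threads bump one shared counter under a blocking lock."""
    lock = threading.Lock()
    counter = 0

    def worker():
        nonlocal counter
        for _ in range(n_increments):
            # A blocking acquire: if the lock is taken, this thread waits
            # without spinning until the holder releases it.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

The result is the same count a spinlock would give. The difference is in how the waiting is done: the system knows which threads are waiting for the lock, so the scheduler can run something useful in the meantime.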
Well, there you have it. According to Torvalds, it's not the Linux kernel scheduler that's the problem; it's that developers are using spinlocks wrong, or shouldn't be using them in the first place.
Linus then goes on to write a very long thread of posts on RealWorldTech.com, where he explains in great detail how locks should work together with the Linux kernel scheduler.
Spinlocks behave differently depending on where they run. Inside the kernel, code can make sure it won't be scheduled away while it holds a spinlock, which is the condition Torvalds says spinlocks depend on. In "user space", meaning ordinary applications such as games that run outside the kernel, the scheduler can pause a thread at any moment, including while it is holding a lock. That is why Linus says not to use spinlocks when programming for user space.
Now that Linus has chimed in, only time will tell if his suggestions will sort out the myriad lag problems Stadia already has.