Correction needed (Score:5, Informative)
>is hard to believe they ignored the risky aspects. I bet they were instructed to ignore the risk
The specific issue, that Pentium-line CPUs (a) do the privilege check asynchronously and (b) do it only for the "winning" execution branch, was very well known in the CPU design community.
Intel architects even bragged about it as their "innovation" in industry journals and filed a number of patents on it (this is the reason AMD's privilege checker runs on all branches).
Re: (Score:4, Interesting)
Re: (Score:2)
System calls were always slow because they used to be invoked via a software interrupt.
SYSCALL is an x64 instruction that speeds this up, introduced by AMD.
Speculative execution predates SYSCALL by about 5 years.
System calls are now slower because kernel memory has to be mapped and unmapped when a system call enters and leaves, rather than staying mapped all the time. This has to be done because memory that was marked as privileged can now be accessed by user programs, i.e. memory protection no long
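To make the cost argument above concrete, here is a toy Python model of the "map on entry, unmap on exit" idea. It is only a sketch: the class name `AddressSpace`, the page names, and the switch counter are all made up for illustration, and real kernels swap page tables rather than toggling a flag.

```python
class AddressSpace:
    """Toy model of one process's view of memory (hypothetical, not a real kernel API)."""

    def __init__(self, isolate_kernel):
        self.isolate_kernel = isolate_kernel
        self.user_pages = {"user_heap"}
        self.kernel_pages = {"kernel_data"}
        # Without isolation the kernel stays mapped all the time.
        self.kernel_mapped = not isolate_kernel
        # Counts the extra map/unmap steps, the source of the slowdown.
        self.table_switches = 0

    def mapped(self):
        """Return the set of pages currently visible in this address space."""
        if self.kernel_mapped:
            return self.user_pages | self.kernel_pages
        return set(self.user_pages)

    def syscall(self):
        """Enter the kernel, do some (imaginary) work, and return to user mode."""
        if self.isolate_kernel:
            self.kernel_mapped = True    # map kernel memory on entry
            self.table_switches += 1
        kernel_visible = "kernel_data" in self.mapped()
        if self.isolate_kernel:
            self.kernel_mapped = False   # unmap again on exit
            self.table_switches += 1
        return kernel_visible


shared = AddressSpace(isolate_kernel=False)
isolated = AddressSpace(isolate_kernel=True)
for _ in range(3):
    shared.syscall()
    isolated.syscall()
print(shared.table_switches, isolated.table_switches)
```

In this model the isolated address space pays two extra steps per system call, and in exchange user mode never has the kernel pages mapped at all.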
Re: (Score:5, Informative)
>System calls were always slow because they used to be called via a software interrupt call.
And software interrupts were slow because they were not considered branches by early branch predictors and so triggered a complete pipeline flush equivalent to a branch mispredict (followed immediately by another branch, which SYSCALL removed). Intel addressed this by treating software interrupts as normal branches for the branch predictor, with an extra hint that they changed privilege level. This gave a small improvement to the Pentium, but was a huge boost on the Pentium 4, where the pipelines were lon
Re:Correction needed (Score:2)
Speculative execution across ring changes is the root cause of this. AMD doesn't do this because Intel patented it, told AMD, and didn't include it in their cross-licensing agreement.
AMD does not do this because they check TLB permissions on speculative loads instead of waiting until instruction retirement. The speculation still happens up until the load.
The MMU protection isn't bypassed, because the instructions that would be bypassing the MMU protection are cancelled.
And AMD just never speculates the instructions which would cause a protection fault until the branch is resolved.
Thinking about it now, this *is* slower by a tiny bit but fixing it with address isolation is massively slower.
You can bet that AMD was just waiting for the patent to expire before doing it, because without it you have to wait until all branches up to the system call have been retired before you can perform the transition.
I read that AMD has a patent for checking the permissions during the speculative load, but I do not know why this would not be part of their cross-licensing agreement with Intel. Apparently this goes back to the Opteron.
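For anyone trying to picture the difference being argued about above (permission check at the speculative load versus at instruction retirement), here is a toy Python sketch. It is not how any real CPU is implemented: `Page`, `speculative_load`, and the set standing in for the cache are all hypothetical names, and the "cache" just records which values a dependent instruction touched.

```python
from dataclasses import dataclass


@dataclass
class Page:
    value: int             # the byte a user program tries to read
    user_accessible: bool  # False for privileged (kernel) memory


def speculative_load(page, cache, check_at_load):
    """Model one user-mode load plus a dependent access that leaves a cache trace.

    Returns True if the load ends in a protection fault. `cache` is a set that
    stands in for cache state, which survives a cancelled instruction.
    """
    if check_at_load and not page.user_accessible:
        # Check happens before the data is forwarded: the dependent
        # instruction never runs, so nothing leaks into the cache.
        return True
    # Data is forwarded speculatively; the dependent instruction touches
    # a cache line selected by the loaded value.
    cache.add(page.value)
    # Check at retirement: the instruction is cancelled and the fault is
    # raised, but the cache line touched above stays warm.
    return not page.user_accessible


secret = Page(value=42, user_accessible=False)

early_cache = set()
early_fault = speculative_load(secret, early_cache, check_at_load=True)

late_cache = set()
late_fault = speculative_load(secret, late_cache, check_at_load=False)

print(early_fault, early_cache)
print(late_fault, late_cache)
```

Both variants fault, so the architectural protection holds either way. The only difference in this model is whether the secret value leaves a trace in the cache before the cancellation.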