The way modern register renaming works is that every single write to a register name allocates a new physical register, so false dependencies from reusing the same name over and over simply cannot arise. From the programmer's point of view only the working registers really matter; the other registers are mostly OS-related MSRs. Some of those are really important, but their internal representation may be totally different.
Like a mode-switch bit in a CR register: the MSRs are just the interface, and MSR access is allowed to be "slow", so no synchronization or optimization is required there. But the idea that more registers make a better architecture is a bad assumption. See the dead body of Itanium, with its 128 general-purpose 64-bit integer registers, 128 floating-point registers, etc. With multitasking you have to switch between contexts, and the larger the register file, the longer that context switch takes.
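To make the renaming claim concrete, here is a minimal toy sketch; the register counts and the naive "hand out the next number" allocator are made up for illustration, not how any real core manages its physical register pool.

```c
/* Toy sketch of register renaming: every architectural write gets a fresh
 * physical register, so back-to-back writes to the same name never alias
 * in hardware and no write-after-write hazard exists between them. */
#include <stdio.h>

#define ARCH_REGS 16   /* architectural names visible to the program      */
#define PHYS_REGS 64   /* physical registers backing the renamed file     */

static int rename_map[ARCH_REGS]; /* arch name -> current physical register */
static int next_free = ARCH_REGS; /* naive allocator: just hand out fresh ones */

/* Rename one destination register: allocate a new physical register and
 * point the architectural name at it. A real core recycles freed registers;
 * this sketch simply hands out the next unused one. */
static int rename_dest(int arch_reg)
{
    rename_map[arch_reg] = next_free++;
    return rename_map[arch_reg];
}

int main(void)
{
    for (int r = 0; r < ARCH_REGS; r++)
        rename_map[r] = r;             /* initial 1:1 mapping */

    /* Two consecutive writes to "r3" land in different physical registers,
     * so there is no false dependency between them. */
    printf("write r3 -> p%d\n", rename_dest(3));
    printf("write r3 -> p%d\n", rename_dest(3));
    return 0;
}
```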
So you may trust your compiler for vectorization, but it may do more harm than good. IMHO what killed Itanium wasn't too many registers, nor even compiler difficulties; it was the attempt to have working x86 emulation. So, instead of a weird but very fast CPU, it ended up being not very fast in either x86 or native mode, while still being weird.
The makers of the Cell CPU did not compromise, went full weird, and had a winner of sorts. If instruction length and encoding were not an issue, I bet we would have seen memory-to-memory ISAs where no GPRs exist, only instructions referencing memory locations.
The dynamic register file would then be just a level below the L1 cache, or even removed completely. SPARC chips got around that by having sliding register windows: instead of having to push all the registers to the stack, you just moved the window. Only registers that are orthogonal to each other should be counted; overlapping ones share the underlying storage. Not saying that it isn't interesting to know how much actual storage the register file offers; just highlighting that TFA focuses on the instruction-encoding angle of the question, which is also important.
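A rough model of the SPARC-style window idea mentioned above; the window count and size are made up, and it ignores the in/out overlap real SPARC uses for argument passing as well as the spill/fill traps when the windows wrap around.

```c
/* Sliding register windows: a call just moves the window pointer into a
 * larger physical file instead of spilling registers to the stack. */
#include <stdint.h>
#include <stdio.h>

#define WINDOWS      8
#define REGS_PER_WIN 16
static uint64_t regfile[WINDOWS * REGS_PER_WIN]; /* one big physical file */
static int cwp = 0;                              /* current window pointer */

static uint64_t *reg(int n)          /* register n of the current window */
{
    return &regfile[cwp * REGS_PER_WIN + n];
}

static void call(void) { cwp = (cwp + 1) % WINDOWS; }           /* slide forward */
static void ret(void)  { cwp = (cwp - 1 + WINDOWS) % WINDOWS; } /* slide back    */

int main(void)
{
    *reg(0) = 42;        /* caller's r0 */
    call();
    *reg(0) = 7;         /* callee's r0 lives in different storage */
    ret();
    printf("caller r0 = %llu\n", (unsigned long long)*reg(0)); /* still 42 */
    return 0;
}
```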
CPU architectures are masterpieces of tradeoffs. Put in too many registers and your instruction stream is not dense enough, so you cannot keep your CPU busy due to stalls in the fetch phase; context switches also become expensive (there are solutions to that, though). Put in too few registers and you have to spill registers to memory too often, which also consumes precious instruction-stream space.
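A back-of-the-envelope sketch of the encoding side of that tradeoff; the register counts below are arbitrary examples, and real ISAs complicate the picture with variable-length or compressed encodings.

```c
/* With R architectural registers, each operand field needs ceil(log2 R)
 * bits, so a 3-operand instruction spends 3*ceil(log2 R) bits just naming
 * registers before it encodes anything else. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const int counts[] = { 8, 16, 32, 128 };
    for (int i = 0; i < 4; i++) {
        int bits = (int)ceil(log2(counts[i]));
        printf("%3d regs: %d bits/operand, %2d bits for a 3-operand op\n",
               counts[i], bits, 3 * bits);
    }
    return 0;
}
```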
Then count RAX as 64 bits of registers. Good point, but they are both part of AX and up, so I don't think they should count. I disagree strongly with that characterisation. Just no. I think it's pointless to debate a methodology without a purpose: for one question subregisters shouldn't be counted, for another they should. Even these counts, one might argue, aren't directly useful; when considering context switching, one could dig down further into how much of the context-switch time is attributable to saving the registers, validate that with experiments across architectures, etc.
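For readers less familiar with the x86 subregister overlap being argued about here, this is a storage-only model of it. It assumes a little-endian host, and it deliberately does not model x86's rule that a 32-bit write zero-extends into the full 64-bit register; it only shows that the narrow names add no new state.

```c
/* AL/AH/AX/EAX are all views of the low bits of RAX: extra addressing
 * cases for an emulator to handle, but no additional storage. */
#include <stdint.h>
#include <stdio.h>

union gpr {
    uint64_t rax;
    uint32_t eax;
    uint16_t ax;
    struct { uint8_t al, ah; } b;   /* low byte first on little-endian */
};

int main(void)
{
    union gpr r = { .rax = 0x1122334455667788ULL };
    printf("rax=%016llx eax=%08x ax=%04x ah=%02x al=%02x\n",
           (unsigned long long)r.rax, (unsigned)r.eax,
           (unsigned)r.ax, (unsigned)r.b.ah, (unsigned)r.b.al);

    r.b.ah = 0xEE;                       /* writing AH changes RAX too */
    printf("after ah=0xEE: rax=%016llx\n", (unsigned long long)r.rax);
    return 0;
}
```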
This is something that has been bothering me for some time now: why not implement multiple contexts as an index into a large register file? It would impact latencies, but wouldn't the impact of having, say, 8 contexts be smaller than having to hit L1 or L2 for the same data? Isn't that part of what hyperthreads do? True, but at two per core (4 or 8 in more enlightened architectures), it's very meh.
I would assume that, instead, a modern CPU tags decoded instructions in the reorder buffer with the virtual core number and register set they apply to. That way the parallelism would be much easier to exploit.
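A minimal sketch of that idea, with storage indexed by (hardware thread, architectural register) so switching hardware threads is just a change of index rather than a memory copy. The thread and register counts are arbitrary, and a real SMT core would combine this with renaming instead of a flat array.

```c
/* Per-thread register contexts living side by side in one physical file. */
#include <stdint.h>
#include <stdio.h>

#define HW_THREADS 8
#define ARCH_REGS  32

static uint64_t regfile[HW_THREADS][ARCH_REGS];

static uint64_t *reg(int thread, int arch_reg)
{
    return &regfile[thread][arch_reg];
}

int main(void)
{
    *reg(0, 5) = 111;   /* thread 0's r5 */
    *reg(3, 5) = 999;   /* thread 3's r5 sits in separate storage */
    printf("t0.r5=%llu t3.r5=%llu\n",
           (unsigned long long)*reg(0, 5),
           (unsigned long long)*reg(3, 5));
    return 0;
}
```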
The emulator has to handle, and the testing infrastructure has to have tests for, each subregister and how it affects its parent register. Your point that they can't be distinct holds, but if they were distinct they would have less impact on the engineering than they do as subregisters.
On the other hand, they do exist as a separate case, and there are instructions that treat them as an individual register. There is precedent for all this too. Take the 6809, which has 8-bit A and B accumulators.
These can be addressed together as D, a 16-bit register. D is not generally counted as its own register because the D addressing does not point to anything new; A and B are counted as two registers. If somehow D brought new bits, say it was 24 bits long, then it would need to be counted. In physical terms, I probably would not count it. However, if I were doing emulation, I would have to count it, because it is a register case that has to be addressed, just like the D register has to be dealt with on a 6809. My own personal way to resolve this has always been to determine whether or not a given register specification that can be addressed somehow brings new information, not contained in any other register specification, to the table.
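As a rough illustration of the emulation point, assuming a 6809-style A/B/D layout (the struct and helper names here are made up): D adds no new bits, yet it is still a case the emulator and its tests must cover.

```c
/* Emulating a combined D accumulator that is just A:B glued together. */
#include <stdint.h>
#include <stdio.h>

struct cpu { uint8_t a, b; };   /* A and B hold all of the real state */

static uint16_t read_d(const struct cpu *c)
{
    return (uint16_t)((c->a << 8) | c->b);   /* A is the high byte */
}

static void write_d(struct cpu *c, uint16_t d)
{
    c->a = (uint8_t)(d >> 8);
    c->b = (uint8_t)(d & 0xFF);
}

int main(void)
{
    struct cpu c = { .a = 0x12, .b = 0x34 };
    printf("D = %04x\n", (unsigned)read_d(&c));        /* 0x1234, derived */
    write_d(&c, 0xBEEF);
    printf("A = %02x, B = %02x\n", c.a, c.b);          /* writing D hits A and B */
    return 0;
}
```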
Fact is, CPU designers do all kinds of crazy things with registers. They overlap, or they may be indirect, as in not directly addressable, but still there as a consideration for the programmer. There's a little circuit that keeps track of a count, plus some rules, and that count may live in a register that may or may not be directly addressable any other way. My general take on this article is, "wow, that's a lot of registers!" A reference for Rosetta being a translation, rather than an emulation?
The original Rosetta is an emulator. That is not the case for Rosetta 2, which does a translation pass before running the generated ARM binary; no x86 code is run. I'd argue that "translation" is an implementation detail of emulation: your translated app still thinks it is x86, as witnessed by running "uname -a" in a Rosetta terminal. That's just the difference between a JIT and not; something like QEMU in the other direction doesn't "run" ARM code either, but in the end it really doesn't matter and is only minor pedantry.
But yeah, a more useful count would consider the sub-registers as part of the main register, and the same for other ISAs (like 64-bit ARM, which does have a 32-bit view of its 64-bit general-purpose registers), and would not consider registers outside each core, like the MTRR registers.
They can be, depending on your definition, counted as registers. There are even internal registers that are not exposed through the instruction set but are used for performance reasons.
I am currently learning reverse engineering and am studying the flags register. I have spent at least an hour on this and found numerous different answers. Although OP is learning reverse engineering, this question in particular has little specifically related to reverse engineering.
How many registers does an x86 processor have? A function A calling a function B uses the same set of registers as B itself. Therefore, B has to save the contents of all registers it uses that still hold A's values, and has to write them back before returning (in some calling conventions it is the job of A to save its register contents before calling B, but the overhead is similar).
The more registers you have, the longer this saving takes, and thus the more expensive a function call becomes. (Robert Buchholz)
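A toy illustration of that cost argument: saving a register context is essentially a copy whose size scales with the number of registers, so a larger architectural register set means more memory traffic per call or context switch. The context sizes below are illustrative only.

```c
/* Compare the raw amount of state a save/restore has to move for a small
 * versus a large architectural register file. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SMALL_REGS 16
#define LARGE_REGS 128   /* e.g. an Itanium-sized integer register file */

struct small_ctx { uint64_t r[SMALL_REGS]; };
struct large_ctx { uint64_t r[LARGE_REGS]; };

/* "Context switch": copy every architectural register out to memory. */
static void save_small(struct small_ctx *dst, const struct small_ctx *src)
{
    memcpy(dst, src, sizeof *dst);
}

static void save_large(struct large_ctx *dst, const struct large_ctx *src)
{
    memcpy(dst, src, sizeof *dst);
}

int main(void)
{
    struct small_ctx s = {{0}}, s_saved;
    struct large_ctx l = {{0}}, l_saved;

    save_small(&s_saved, &s);
    save_large(&l_saved, &l);

    printf("small context: %zu bytes to save, large context: %zu bytes\n",
           sizeof s, sizeof l);
    return 0;
}
```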
Dcache also tends to support fewer accesses per cycle than the register file. @Clayton: if the L1 cache has a multi-cycle latency, that would suggest there might be some benefit to having more directly addressable register storage. Many methods have between 16 and 60 words of local variables, so cutting access time for those from several cycles to one would seem helpful.
This is getting chatty so should probably end or go elsewhere. It all comes down to the difference between computation and execution! (Hubert Lamontagne) If you have fewer registers, you don't have to store and restore as much when calling and returning from functions or when switching tasks, with the trade-off of lacking registers in some compute-intensive code. Moreover, the larger the register file, the more expensive and complex it will be.
(Realz Slaw) And if you want to move lots of data around, for example to perform three instructions with two operands and one result each in the same cycle, a bus will absolutely not work: that would require roughly six read ports and three write ports on the register file every cycle. (Olsonist)