Apple's upcoming CPU-architecture change: ARM-based "Apple Silicon"

lpetrich · Nov 27, 2020

MacBook Air M1 review: Windows laptops are so screwed

But are these new Macs really better than Intel-powered ones?

1,000 percent yes!

All of the hype about the 8-core M1 chip in the MacBook Air — up to 3.5x faster CPU and up to 5x faster GPU performance, and almost double the battery life compared to the previous-gen Intel MacBook Air — is real. I’ve been using the entry-level $999 MacBook Air with 8GB of RAM and 256GB of storage nonstop as my only computer for a week and it still doesn’t feel possible that a laptop this thin and this light is capable of all this power and battery life. It makes my 2019 13-inch MacBook Pro with Intel Core i5 and 16GB of RAM look like smoldering trash now.

What Apple has achieved with the M1 is nothing short of groundbreaking. It pains me there’s still no touchscreen, there are only two USB-C ports, there’s no SD card slot, and I know I'm bound to run into some apps that don’t emulate well (or at all) with Rosetta 2 (macOS Big Sur’s x86 Intel app translator), but these are all trivial issues.

The M1 MacBook Air (and M1 MacBook Pro) are now the best laptops regardless of operating system. They’re the new gold standard by which all laptops will be judged, and this is just the start. In a few years, we’ll look back and wonder how we ever tolerated laptops with anything less than this kind of performance.

Let's see how the PeeCee world reacts to it.

Microsoft is adding x64 emulation to Windows on ARM

Yesterday, Microsoft officially announced that it’s working on an x64 emulation for Windows on ARM, which will pave the way for up-to-date versions of applications like the Adobe Creative Suite to finally work on the platform.

“We will also expand support for running x64 apps, with x64 emulation starting to roll out to the Windows Insider Program in November,” Microsoft Chief Product Officer Panos Panay said in the announcement.

Seems like M$ will have to do a lot of catching up.

In Linux-land,
PINEBOOK Pro | PINE64

A Powerful, Metal and Open Source ARM 64-Bit Laptop for Work, School or Fun

The Pinebook Pro is meant to deliver solid day-to-day Linux or *BSD experience and to be a compelling alternative to mid-ranged Chromebooks that people convert into Linux laptops. In contrast to most mid-ranged Chromebooks however, the Pinebook Pro comes with an IPS 1080p 14″ LCD panel, a premium magnesium alloy shell, 64/128GB of eMMC storage* (more on this later – see asterisk below), a 10,000 mAh capacity battery and the modularity / hackability that only an open source project can deliver – such as the unpopulated PCIe m.2 NVMe slot (an optional feature which requires an optional adapter). The USB-C port on the Pinebook Pro, apart from being able to transmit data and charge the unit, is also capable of digital video output up-to 4K at 60hz.

lpetrich · Nov 27, 2020

ARM architecture and

AArch64 - 64-bit extension of the ARM architecture

ARM = Acorn RISC Machine, then Advanced RISC Machine

It goes back to

Acorn Archimedes (late 1980's) made by

Acorn Computers

ARM chips have been used in oodles of game consoles, PDA's, cellphones, tablets, and other such devices.

The ARM architecture is classified as RISC because of several features.

Reduced instruction set computer

It has a load/store architecture: only dedicated load and store instructions access main memory, and all other instructions work on registers.

It has limited or no support for misaligned memory accesses. Aligned: 16-bit on 2-byte boundaries, 32-bit on 4-byte boundaries, 64-bit on 8-byte boundaries.
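
To make the alignment rule concrete, here is a minimal sketch of a natural-alignment check. The function name `is_aligned` and the sample addresses are made up for illustration; this only models the rule stated above, not how any particular ARM core handles a misaligned access.

```python
def is_aligned(address: int, size_bytes: int) -> bool:
    """Return True if an access of size_bytes at address is naturally aligned."""
    if size_bytes <= 0 or (size_bytes & (size_bytes - 1)) != 0:
        raise ValueError("size_bytes must be a power of two")
    # For power-of-two sizes, aligned means the low address bits are all zero.
    return (address & (size_bytes - 1)) == 0


# Hypothetical addresses: (address, access size in bytes)
for addr, size in [(0x1000, 8), (0x1004, 8), (0x1004, 4), (0x1002, 2), (0x1001, 2)]:
    print(f"{size * 8}-bit access at {addr:#x}: aligned={is_aligned(addr, size)}")
```

A 64-bit access at 0x1004 fails the check because 0x1004 is a multiple of 4 but not of 8.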

It has fixed-sized 32-bit-long instructions, though it has a Thumb mode where it can use 16-bit-long instructions to save memory.

Its "register file" is 16 32-bit registers (32-bit versions) or 31 64-bit registers (64-bit versions)

Some versions of ARM chips don't have hardcoded divide instructions, though all versions have hardcoded add, subtract, and multiply ones.

So it looks like we are seeing RISC on the desktop again.

lpetrich · Dec 2, 2020

AWS engineer puts Windows 10 on Arm on Apple Mac M1 – and it thrashes Surface Pro X | ZDNet - "A virtualized Windows 10 on Arm runs faster on Apple's M1 hardware than on Microsoft's own Arm-based Surface Pro X."

But Microsoft's reluctance to create a license for Windows 10 on Arm for end users hasn't stopped creative engineers from putting together a working example of what things could be like if it did.

AWS principal engineer Alexander Graf did just that, using the open-source QEMU virtualization software for Windows on Arm. QEMU emulates access to hardware such as the CPU and GPU. Graf's work was spotted by The 8-bit, via 9to5Mac.

ARM Windows M1 Mac virtualization demonstrated - 9to5Mac

Developer successfully virtualizes ARM Windows on Apple Silicon | The 8-Bit

Alexander Graf on Twitter: "Who said Windows wouldn't run well on #AppleSilicon? It's pretty snappy here. #QEMU patches for reference: (links)" / Twitter
with a screenshot and QEMU patches - Patchwork

lpetrich · Dec 7, 2020

Apple isn't stopping with the M1 chip.

Apple (AAPL) Preps Next Mac Chips With Aim to Outclass Highest-End PCs - Bloomberg

Apple’s Mac chips, like those in its iPhone, iPad and Apple Watch, use technology licensed from Arm Ltd., the chip design firm whose blueprints underpin much of the mobile industry and which Nvidia Corp. is in the process of acquiring. Apple designs the chips and outsources their production to Taiwan Semiconductor Manufacturing Co., which has taken the lead from Intel in chip manufacturing.

The current M1 chip inherits a mobile-centric design built around four high-performance processing cores to accelerate tasks like video editing and four power-saving cores that can handle less intensive jobs like web browsing. For its next generation chip targeting MacBook Pro and iMac models, Apple is working on designs with as many as 16 power cores and four efficiency cores, the people said.

20 cores on a chip.

Apple could scale back and release chips with 8 or 12 high-performance cores instead of 16.

For higher-end desktop computers and Mac Pro models, Apple is working on chips with as many as 32 high-performance cores.

Currently with Intel-x86 chips, Apple's highest-end laptops can have as many as 8 cores, a high-end iMac Pro 18 cores, and a high-end Mac Pro 28 cores.

AMD, Intel's main competitor with the x86 architecture, offers desktop CPU chips with as many as 16 cores, and high-end gaming-PC ones 64 cores.

Turning to graphics processors, the M1 includes an 8-core GPU, and for high-range laptops and mid-range desktops, Apple is testing 16-core and 32-core GPU's, and eventually 64-core and 128-core GPU's.

lpetrich · Dec 7, 2020

Apple is also planning on a "half-sized" Mac Pro.

That would fill in a gap between Apple's Mac Mini and Mac Pro. That is a gap that has existed in Apple's product line for a long time.

I think that a problem is Steve Jobs's design ideology, for lack of a better term. He liked all-in-ones. The Lisas and earliest Macs were all AIO's, but when SJ was forced out of Apple in 1985, Apple started offering what the PeeCee world had long had: boxes.

But in NeXT, SJ indulged in his love of AIO's, coming out with his NeXT Cube.

He returned to Apple in 1997, and he got into AIO's again with the iMac line. Apple introduced some high-end computers with little expandability, like a cylindrical Mac Pro, but they didn't do well, and Apple has made its most recent Mac Pro more expandable.

While it's OK to have low-end computers have little or no expandability, high-end users like expandability, and it's good that Apple's getting away from SJ's design ideology there.

barbos · Dec 7, 2020

There is an advantage to non-expandable RAM. It consumes less power and is faster, sometimes ridiculously faster (HBM).
I think eventually these things will outweigh the inability to expand.

lpetrich · Dec 8, 2020

barbos said:
There is an advantage to non-expandable RAM. It consumes less power and is faster, sometimes ridiculously faster (HBM).
I think eventually these things will outweigh the inability to expand.

High Bandwidth Memory

HBM achieves higher bandwidth while using less power in a substantially smaller form factor than DDR4 or GDDR5.[7] This is achieved by stacking up to eight DRAM dies (thus being a Three-dimensional integrated circuit), including an optional base die (often a silicon interposer[8][9]) with a memory controller, which are interconnected by through-silicon vias (TSVs) and microbumps. The HBM technology is similar in principle but incompatible with the Hybrid Memory Cube interface developed by Micron Technology.[10]

lpetrich · May 19, 2021

Apple’s M1 is a fast CPU—but M1 Macs feel even faster due to QoS | Ars Technica
QoS = Quality of Service
"Howard Oakley did an excellent deep dive on M1 scheduling and performance."
In How M1 Macs feel faster than Intel models: it’s about QoS – The Eclectic Light Company

Back to Ars Technica.

There's a very common tendency to equate "performance" with throughput—roughly speaking, tasks accomplished per unit of time. Although throughput is generally the easiest metric to measure, it doesn't correspond very well to human perception. What humans generally notice isn't throughput, it's latency—not the number of times a task can be accomplished, but the time it takes to complete an individual task.

...
When Oakley noticed how frequently Mac users praised M1 Macs for feeling incredibly fast—despite performance measurements that don't always back those feelings up—he took a closer look at macOS native task scheduling.

MacOS offers four directly specified levels of task prioritization—from low to high, they are background, utility, userInitiated, and userInteractive. There's also a fifth level (the default, when no QoS level is manually specified) which allows macOS to decide for itself how important a task is.

Apple's M1 chips have 4 performance CPU cores and 4 efficiency CPU cores. Background tasks are assigned to the efficiency ones, and interactive ones to the performance cores. That makes interaction very snappy, since background tasks don't interfere with interactive ones.
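
Here is a toy sketch of that assignment rule. It is not Apple's scheduler: the function `core_type_for`, the task names, and the simplification that everything above background is eligible for performance cores are all assumptions made for illustration. Only the four QoS level names come from the article.

```python
from enum import IntEnum


class QoS(IntEnum):
    # The four explicit levels named in the article, from low to high.
    BACKGROUND = 0
    UTILITY = 1
    USER_INITIATED = 2
    USER_INTERACTIVE = 3


def core_type_for(qos: QoS) -> str:
    """Toy rule: background work stays on efficiency cores, the rest may use performance cores."""
    return "efficiency" if qos == QoS.BACKGROUND else "performance"


# Hypothetical tasks, purely for illustration.
tasks = [
    ("spotlight-index", QoS.BACKGROUND),
    ("scroll-window", QoS.USER_INTERACTIVE),
    ("open-document", QoS.USER_INITIATED),
]
placement = {name: core_type_for(qos) for name, qos in tasks}
for name, core in placement.items():
    print(f"{name:16} -> {core} cores")
```

The point of the sketch is that the background task never competes with the interactive ones for a performance core, which is where the snappy feel comes from.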

steve_bank · May 19, 2021

Back in the day, the battle for market share between Intel and Motorola was legendary. One of the things that kept Intel entrenched in PCs and other applications was maintaining backward compatibility: the x86 mode.

You can run old x86 software with little effort.

ARM has been around for a while, and I believe its roots are in Motorola.

ARM does not manufacture. They license designs.

Microcomputers migrated to ARM. It means a common instruction set.

I'd think this will potentially throw a wrench into things for app developers. I wonder if this is another aggressive domination move by Apple.

lpetrich · Mar 25, 2022

List of iOS and iPadOS devices - Apple has been using ARM chips since the first iPhone, back in 2007, starting with a Samsung chip and continuing with its A series.

I've also found

Mac transition to Apple silicon and

Comparison of current Macintosh models

Which chips?

2020 Nov 17: M1 (MacBook Air, MacBook Pro 13", Mac Mini)
2021 May 21: M1 (iPad Pro, iMac)
2021 Oct 18: M1 Pro, M1 Max (MacBook Pro 14", 16")
2022 Mar 8: M1 (iPad Air), M1 Max, M1 Ultra (Mac Studio)

The Mac Pro and some Mac Minis are the only parts of Apple's lineup that still use an Intel-x86 CPU.

The M1 Ultra chip is two M1 Max chips with a thin interconnect strip between them, thus making the two chips act like a single super chip.

Another feature of the M1 is that the RAM is in the same package as the CPU chip. That likely improves performance and reduces cost, though at the expense of flexibility.

Apple silicon covers not only the M1 series but several other ARM-based chips that Apple has used over the years, like most of those in the various models of iPhone and iPad.

Apple M1 and

Apple M1 Pro and M1 Max which are

System on a chip - CPU, memory, GPU, etc.

lpetrich · Mar 25, 2022

Apple silicon -- one kind of CPU core, or else two kinds (fast, slow) with the same cache sizes. L1 is level 1, L2 level 2, L3 level 3, SLC system level cache (for the entire chip). L1 is split up into L1i for instructions and L1d for data. In the later ones at least (A14, A15, M1) each type of core shares the L2 cache.

Chip	# Cores	L1i KB	L1d KB	L2 MB	L3/SLC MB
(A1,2)	1	16	16
(A3)	1	32	32	0.25
A4	1	32	32	0.5
A5,5X,6,6X	2	32	32	1
A7,8	2	64	64	1	4
A8X	3	64	64	2	4
A9	2	64	64	3	4
A9X	2	64	64	3
A10	2 + 2	64	64	3	4
A10X	3 + 3	64	64	8	4
A11	2 + 4	64	64	8	4
A12	2 + 4	128	128	8	8
A12X,12Z	4 + 4	128	128	8	8
A13	2 + 4	128	128	8	16

With separate cache amounts for fast and slow cores

Chip	# Fast	L1i KB	L1d KB	L2 MB	# Slow	L1i KB	L1d KB	L2 MB	SLC
A14	2	192	128	8	4	192	128	4	16
A15	2	192	128	12	4	192	128	4	32
M1	4	192	128	12	4	128	64	4	16
M1 Pro	6, 8	192	128	24	2	128	64	4	32
M1 Max	8	192	128	24	2	128	64	4	64

lpetrich · Mar 25, 2022

I like the CPU-architecture code names:

Chip	Codename
A7	Cyclone
A8,8X	Typhoon
A9,9X	Twister

Chip	Fast	Slow
A10,10X	Hurricane	Zephyr
A11	Monsoon	Mistral
A12,12X,12Z	Vortex	Tempest
A13	Lightning	Thunder
A14 M1	Firestorm	Icestorm
A15	Avalanche	Blizzard

lpetrich · Mar 25, 2022

I'll now turn to the GPU's. In the A series, Apple used PowerVR GPU's, while in M1, Apple uses its own design. EU = execution unit, ALU = arithmetic-logical unit.

Chip	Make	# Cores	# EU/core	# ALU/EU	# AI cores
(A1,2)	PVR	1	1	8
(A3),A4	PVR	1	2	8
A5,5X,6	PVR	2,4,3	2	8
A6X	PVR	4	4	8
A7,8,8X,9	PVR	4,8,6,12	4	8
A10,10X	PVR	6,12	4	8
A11	APL	3	8	8	2
A12,12X,12Z	APL	4,7,8	8	8	8
A13	APL	4	8	8	8
A14	APL	4	16	8	16
A15	APL	4,5	32	8	16
M1	APL	7,8	16	8	16
M1 Pro	APL	14,16	16	8	16
M1 Max	APL	24,32	16	8	16

lpetrich · Mar 25, 2022

An M1 Ultra thus has 16 fast CPU cores, 4 slow CPU cores, as many as 64 GPU cores and 32 AI cores. The GPU cores in turn have a total of 1024 execution units and 8192 arithmetic-logic units.

Impressive parallelism.

Apple calls the fast cores "performance cores" and the slow cores "efficiency cores", since the slow ones are designed for low power consumption.
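
The M1 Ultra totals follow from doubling the M1 Max and multiplying through the per-core GPU figures. A small sketch of that arithmetic, using only the numbers from the tables in this thread (the variable names are mine):

```python
# Per-chip figures for the top M1 Max configuration, taken from the tables in this thread.
m1_max = {"fast_cores": 8, "slow_cores": 2, "gpu_cores": 32, "ai_cores": 16}
EU_PER_GPU_CORE = 16
ALU_PER_EU = 8

# The M1 Ultra is two M1 Max chips joined together, so every count doubles.
m1_ultra = {key: 2 * value for key, value in m1_max.items()}

execution_units = m1_ultra["gpu_cores"] * EU_PER_GPU_CORE
alus = execution_units * ALU_PER_EU

print(m1_ultra)
print(f"execution units: {execution_units}, ALUs: {alus}")
```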

lpetrich · Mar 25, 2022

A notable feature of the

Mac Studio is that it's not a new iMac but a monitor-less box like the Mac Mini and the Mac Pro.

Seems like Apple is moving away from Steve Jobs's all-in-one design ideology. Though not expandable, the Mac Studio has a lot of ports on it: 2 USB-A ports, 2 USB-C 3.2 Gen 2 ports, 4 Thunderbolt 4 (USB4) ports, an HDMI 2.0 port, a 10Gb Ethernet port, and an SD-card slot.

Comparison of current Macintosh models

The M1 chip is a

System on a chip but it is packaged with RAM to make a

System in a package

Perusing what Apple offers, I found out how much memory the chip packages can enclose. I've also included the size of each chip and how many transistors.

Chip	RAM: GB	Die size: mm^2	Transistors: billion
M1	8, 16	120	16
M1 Pro	16, 32	245	33.7
M1 Max	32, 64	432	57
M1 Ultra	64, 128	864	114
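
Dividing transistor count by die size gives a rough transistor density for each chip. A quick sketch using only the table's numbers (the dictionary layout and names are mine):

```python
# (die size in mm^2, transistors in billions), copied from the table above.
chips = {
    "M1": (120, 16),
    "M1 Pro": (245, 33.7),
    "M1 Max": (432, 57),
    "M1 Ultra": (864, 114),
}

density = {
    name: transistors_bn * 1000 / area_mm2  # million transistors per mm^2
    for name, (area_mm2, transistors_bn) in chips.items()
}
for name, value in density.items():
    print(f"{name:9} {value:6.1f} M transistors/mm^2")
```

All four come out between roughly 130 and 140 million transistors per mm^2, and the Ultra matches the Max exactly, as expected for two Max chips side by side.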

lpetrich · Nov 5, 2023

Since I last posted,

Apple silicon has gone to second and third generations: M2, M2 Pro, M2 Max, M2 Ultra, and M3, M3 Pro, M3 Max. So I'll update the tables.

This series is used in iPhones and some iPads:

Chip	# Fast	L1i KB	L1d KB	L2 MB	# Slow	L1i KB	L1d KB	L2 MB	SLC
A10	2	64	64	3	2	32	32	1	4
A10X	3	64	64	8	3	32	32	1	4
A11	2	64	64	8	4	32	32	1	4
A12	2	128	128	8	4	32	32	2	8
A12X	4	128	128	8	4	32	32	2	8
A12Z	4	128	128	8	4	32	32	2	8
A13	2	192	128	8	4	128	64	4	16
A14	2	192	128	8	4	128	64	4	16
A15	2	192	128	12	4	128	64	4	32
A16	2	192	128	16	4	128	64	4	24
A17	2	192	128	16	4	128	64	4	24

For some reason, most of these chips have more slow cores than fast cores. Could that be because these chips are for battery-powered devices?

Also, the A16 architectures are codenamed Everest and Sawtooth.

lpetrich · Nov 5, 2023

Now the Mac and some iPad chips. The M1 and M2 Ultra chips are two M1 or M2 Max chips connected to each other at their edges, and packaged in one chip package. There is no mention of an M3 Ultra, though Apple may eventually introduce one.

Chip	# Fast	L1i KB	L1d KB	L2 MB	# Slow	L1i KB	L1d KB	L2 MB	SLC
M1	4	192	128	12	4	128	64	4	8
M1 Pro	6, 8	192	128	24	2	128	64	4	24
M1 Max	8	192	128	24	2	128	64	4	48
M1 Ultra	16	192	128	48	4	128	64	8	96
M2	4	192	128	16	4	128	64	4	8
M2 Pro	6, 8	192	128	32	4	128	64	4	24
M2 Max	8	192	128	32	4	128	64	4	48
M2 Ultra	16	192	128	64	8	128	64	8	96
M3	4	192	128	16	4	128	64	4
M3 Pro	5, 6	192	128	32	6	128	64	4
M3 Max	10, 12	192	128	32	4	128	64	4

Kinds of memory:

CPU registers
L1i - level-1 instruction cache
L1d - level-1 data cache
L2 - level-2 cache
SLC - system-level or level-3 cache
RAM
Persistent storage: disk drives or flash memory

As one goes down the list, the memory gets more and more capacious, but also slower and slower. There is a tradeoff between (1) speed and (2) expense and power consumption.

lpetrich · Nov 5, 2023

Now the GPU's, the graphics accelerators. In the A series, Apple used PowerVR GPU's, and later its own design. In M1, Apple uses its own design. EU = execution unit, ALU = arithmetic-logical unit.

Chip	Make	# Cores	# EU/core	# ALU/EU	# AI cores
(A1,2)	PVR	1	1	8
(A3),A4	PVR	1	2	8
A5,5X,6	PVR	2,4,3	2	8
A6X	PVR	4	4	8
A7,8,8X,9	PVR	4,8,6,12	4	8
A10,10X	PVR	6,12	4	8
A11	APL	3	4	16	2
A12,12X,12Z	APL	4,7,8	4	16	8
A13	APL	4	4	16	8
A14	APL	4	4	16	16
A15	APL	4,5	4	32	16
A16	APL	5	4	32	16
A17	APL	6	4	32	16

lpetrich · Nov 5, 2023

Now the M series.

Chip	Make	# Cores	# EU/core	# ALU/EU	# AI cores
M1	APL	7,8	4	32	16
M1 Pro	APL	14,16	4	32	16
M1 Max	APL	24,32	4	32	16
M1 Ultra	APL	48,64	4	32	32
M2	APL	8,10	4	32	16
M2 Pro	APL	16,19	4	32	16
M2 Max	APL	30,38	4	32	16
M2 Ultra	APL	60,76	4	32	32
M3	APL	8,10	16	8	16
M3 Pro	APL	14,18	16	8	16
M3 Max	APL	30,40	16	8	16

The M3 has a design shift, toward using more EU's per core, but fewer ALU's per EU. The number of AI cores has stayed constant, however.
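
Multiplying the two columns shows that the shift leaves the number of ALUs per GPU core unchanged. A short sketch using the table's figures (variable names are mine):

```python
# (EUs per core, ALUs per EU), from the M-series GPU table above.
layout = {"M2": (4, 32), "M3": (16, 8)}

alus_per_core = {gen: eu * alu for gen, (eu, alu) in layout.items()}
print(alus_per_core)

# Largest GPU core counts of the Max chips, from the same table.
max_gpu_cores = {"M2": 38, "M3": 40}
total_alus = {gen: max_gpu_cores[gen] * alus_per_core[gen] for gen in layout}
print(total_alus)
```

So by these figures, the M3 regroups the same 128 ALUs per core into more, narrower execution units, and the total only grows with the core count.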

lpetrich · Nov 5, 2023

Chip	RAM: GB	Die size: mm^2	Transistors: billion
M1	8, 16	120	16
M1 Pro	16, 32	245	33.7
M1 Max	32, 64	432	57
M1 Ultra	64, 128	864	114
M2	8, 16, 24	155	20
M2 Pro	16, 32		40
M2 Max	32, 64, 96		67
M2 Ultra	64, 128, 192		134
M3	8, 16, 24		25
M3 Pro	18, 36		37
M3 Max	36, 96 / 48, 64, 128		92

Apple's M series seems to be a success so far, doing well against Intel and AMD chips, and it is the fourth CPU architecture of the Macintosh series:

Motorola 68K - IBM/Motorola PowerPC - Intel x86 - ARM

Also of note is the slow increase in clock speed for the newer chips, much slower than the increase in number of transistors or memory size. Given what CPU chip makers try to do, it seems like they are running into some physical limit.
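
The transistor side of that comparison can be worked out from the table above. Clock speeds aren't listed in this thread, so this sketch leaves them out; the tier labels and variable names are mine.

```python
# Transistor counts in billions for the M1, M2 and M3 generations, from the table above.
transistors = {
    "base": [16, 20, 25],
    "Pro": [33.7, 40, 37],
    "Max": [57, 67, 92],
}

# Ratio of each generation to the one before it, per tier.
steps = {
    tier: [later / earlier for earlier, later in zip(counts, counts[1:])]
    for tier, counts in transistors.items()
}
# Overall growth from M1 to M3, per tier.
growth = {tier: counts[-1] / counts[0] for tier, counts in transistors.items()}

for tier in transistors:
    print(tier, [round(s, 2) for s in steps[tier]], round(growth[tier], 2))
```

By the table's numbers, the base and Max chips gained more than half again as many transistors over two generations, while the M3 Pro actually has fewer than the M2 Pro.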
