Wednesday, April 15, 2015

Google BigQuery compared to Amazon Redshift

After tinkering with Amazon Redshift I was curious to compare it with Google BigQuery. Two solutions that are both built for analysis of huge datasets, how different can they be? It turns out that they have much in common, but in some areas they are very different beasts…

The general capability to easily handle incredible amounts of data is the same; it is truly mindboggling how these services allow you to handle multi-billion-row datasets. Just as with Redshift, it really helps to have the raw data available in a heavily compressed format (to reduce storage cost) that is easy to process, in a place the service can access efficiently. For BigQuery, storing the raw data in Google Cloud Storage makes load operations simple and fast.

Operating these two solutions is very different. Where Redshift has a web UI that lets you manage every aspect of your cluster (hardware type, number of nodes and so on), BigQuery is more of a service offering that relieves you of all the infrastructure details. The main benefit of the BigQuery model is ease of use and quick scalability (no need to resize clusters), and the main benefit of Redshift is that you really feel that your data is on your own platform, not in a shared service (the kind of minor point that still seems to be important in some contexts).

Loading data is done with a batch load command (like the Redshift copy command), and there is a wizard-like user interface for configuring the details of the load. Although I was seriously impressed with the fantastic performance of my large Redshift clusters, BigQuery was even faster (single-digit minutes instead of double-digit minutes). The batch load wizard is simple to operate, but I miss some of the flexibility of the Redshift copy command, and I really missed the excellent result reports you get after a load operation. Due to a quirk in the internals of Google Cloud Storage and the lack of result feedback I really struggled with data loading initially, but Google support was beyond expectations: they quickly gave me a workaround and have since fixed the problem.

In terms of performance the services are quite different. On BigQuery the performance is very consistent regardless of the size of the dataset; on Redshift you determine the performance by scaling the cluster size (at a cost, though). In general I think Google has struck a good enough balance for me not to care about it and just be happy that I don't have to think about it. Factoring in the large cluster size you need on Redshift to get comparable performance, you are likely to see better performance on BigQuery unless you are willing to spend a lot.

The web UI is a really nice BigQuery feature for getting going quickly or for running the odd ad-hoc query: you don't need any tools, since a basic SQL query interface is built into the web console.
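Just to illustrate what that feels like, an ad-hoc query in the console can be as simple as the sketch below (the dataset, table and column names are made up for the example):

-- Hypothetical log table: requests per status code
SELECT status_code, COUNT(*) AS requests
FROM logs.access_log
GROUP BY status_code
ORDER BY requests DESC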

The pricing of the services is difficult to compare, since you pay for cluster runtime in the case of Redshift and for storage and queries in the case of BigQuery. For my scenario, with fairly large data volumes and a pattern of short periods of intense querying followed by long periods of little to no querying, BigQuery is more than a factor of 10 cheaper for similar performance. The Redshift cost comes from the need to continuously run a cluster for a low volume of ad-hoc queries. You can trade that low-latency access at high cost for high-latency access at a lower cost (eg: starting and restoring the cluster when you need it), but with BigQuery I get the best of both worlds: paying for the storage you need is still very cheap for huge datasets compared to running a cluster. Also note that with the super fast data loading in BigQuery you can keep even less data loaded and more raw data compressed instead of loaded.

The largest BigQuery cost is the query cost. Paying for data processed, when you have large amounts of data and a service that can chew through terabytes in a few seconds, can hit you unexpectedly, and the feeling of paying for every single select statement is a bit nagging. In the end, though, $5 per terabyte of processed data is fairly cheap, and as long as you don't query all the columns in the table you can make pretty efficient queries. It is probably worthwhile to consider the different pricing models for your specific workload; in some cases (obviously in my case) the difference is huge.
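Since BigQuery is a columnar store and bills only for the columns a query actually reads, the simple way to keep query cost down is to select just the fields you need. A sketch with made-up table and column names:

-- Only the url and bytes_sent columns are scanned, so only they are billed
SELECT url, SUM(bytes_sent) AS total_bytes
FROM logs.access_log
GROUP BY url

-- Avoid this on wide tables: every column is read and billed
-- SELECT * FROM logs.access_log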

Friday, October 10, 2014

AWS Redshift tinkering

For a long time I've used a little hack (http://www.albert.nu/programs/filelinestatistics/) written in my spare time to do ad hoc analysis of large amounts of log files. With a decent-sized machine to run it on there was no problem digging in and querying for any aggregation or finding details in gigabytes of compressed log files.

But every once in a while you come across that project where the data analysis needs are just that much greater. The last few days I've been doing my data analysis against some different AWS Redshift clusters. Some simple lessons learned are:

Size matters: when working with terabytes of data, even if you can load it into a fairly small cluster, you need dozens of machines to get decent performance. At least for my use case, log files from web applications, it's best to go for the SSD nodes with less storage but more powerful machines, and to make sure to have as many of them as possible. You might want to contact Amazon to raise the node limit from the start.

Use the copy command and sacrifice a small bit of quality for shorter lead times. Depending on your options for continually loading the data you might not need to optimize this, but if you, like me, always have more systems and logs than you'd ever have capacity to keep in your database, it becomes important to load the dataset you want fairly fast. If you store your logs on S3 it is simple to use the copy command to load surprising amounts of data in a few minutes, provided you have a large enough cluster.
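Roughly the kind of copy command I mean, as a sketch; the table, bucket and credentials are placeholders, and the exact options depend on your log format:

-- Load gzipped, tab-separated log files straight from S3 into Redshift
COPY access_log
FROM 's3://my-log-bucket/2014/10/'
CREDENTIALS 'aws_access_key_id=<your-key>;aws_secret_access_key=<your-secret>'
GZIP
DELIMITER '\t'
MAXERROR 100;  -- tolerate a few bad rows instead of failing the whole load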

Beware of resizing the cluster when it holds tons of data; if possible, just empty the cluster and reload the new cluster. When loading from S3 you don't have any extra cost for data transfer as long as you keep the cluster in the same region as the log files. If the cluster is empty you can often do a resize in less than half an hour, sometimes closer to fifteen minutes.

Tuesday, August 13, 2013

CDN? The ISPs are doing it for you!

While analyzing the CDN access logs for a site I realized that the ratio of pageviews per visit didn't at all reflect the number of css/js files that were transferred. Considering browser caching I expected roughly one css/js access per visit, perhaps slightly less considering some return visitors.

I found that accesses to css/js were a factor of 10 to 100 fewer than expected. I also found that a small bunch of IPs were causing huge amounts of traffic. The top IPs causing traffic to the site fell into two categories: crawlers (googlebot, bingbot and similar) and ISPs.
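For reference, the second finding boils down to an aggregation like the sketch below; my actual analysis was done with my own log file tool, so the table and column names here are made up:

-- Which client IPs request the most css/js files?
SELECT client_ip, COUNT(*) AS css_js_requests
FROM cdn_access_log
WHERE url LIKE '%.css' OR url LIKE '%.js'
GROUP BY client_ip
ORDER BY css_js_requests DESC
LIMIT 20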

Obviously crawlers don't need to fetch the css/js files over and over again, but why don't the ISPs fetch them when the traffic clearly comes from multiple clients behind a large NAT setup or similar? Thinking about it for a while, my best guess is that they simply do what the CDN does: pass the traffic through a caching proxy and cache everything according to the http headers, with some extra intelligence to figure out what is important enough to stay in the cache.

This has two important implications:

1. If you think you can control caching by controlling your CDN configuration and using the CDN purge function, you are wrong. Just as something stuck in the browser cache is out of your control, the ISP cache is out of your control.
2. If you don't cache bust your resources properly you'll end up with a lot of weird behaviour for users on ISPs that have a cache.

If enough ISPs start doing this, and do it well, I even see it as an important improvement to the overall performance of the Internet. The ISP cache would be close to the user, and in terms of traffic and end-user performance it is a win-win for both the ISP and the site owner. This is really a half-decent content delivery network, and it's free!

If someone has insight into ISPs it would be interesting to hear what technology they are using and how they are thinking. My findings might be specific to a site targeting mobile users; maybe mobile operators are more aggressive in this area? But it would really be beneficial to all ISPs.

Surprised? I've always known that there is potential for all kinds of proxies around the Internet. I had just never seen it in effect, and I certainly didn't expect it to be this effective!

Sunday, December 23, 2012

Arm'ed for 2013

It wasn't planned and it wasn't expected, it just happened: my main computers are now ARM-based. I could have sworn that it would not happen, but it did.

I've had at least 3-5 computers at any given point in time over the last 15 years. They've all been x86 machines, mainly desktops. I still have those, but 80% of my usage is now on ARM devices.

How did it happen?
1. The iPad is my main surf/email/game/tv device.
2. The big file server is now an archive machine (rarely used) and a Synology DiskStation is the main file server, complementing the iPad with storage.

There are only three main things that I do on what used to be the "main rig": media encoding, FPS games and work (coding). Once I'm tired of the FPS games I still run on it, it will go and I'll just have the laptop left to work on.

It is fantastic how far I get now with so little; instead of big bulky computers, two small devices take care of all my computing needs. They don't allow all the tinkering I do love, but they work: silently, always on, always at hand.

Still, I'm writing this on my Windows/Intel computer. Why? It has the best keyboard, but I guess in 2013 I'll buy a nice keyboard for my iPad. Didn't see that one coming, I wonder what 2013 will bring...

Saturday, October 20, 2012

Retina support in CSS4

Retina displays are appearing in more and more devices and web developers really need a flexible solution for supporting both retina and non-retina devices in an efficient way.

Luckily additions to CSS4 propose a solution.

Before you question why you would consider CSS4 when working with the current browsers note that support for features sometimes appears quickly when it is really needed. This is such a case...

As blogged by Jason Grigsby here, there is support in Safari 6 and Chrome version 21 (the most widely used version since late August 2012) for specifying a set of images when defining background images in CSS4.

#test {
  /* fallback for browsers without image-set support */
  background-image: url(assets/no-image-set.png);
  /* let the browser pick the 1x or the 2x (retina) asset */
  background-image: -webkit-image-set(url(assets/test.png) 1x, url(assets/test-hires.png) 2x);
}

Edited example from Jason Grigsby's blog.

Various solutions based on JavaScript or on dynamically generating device-specific html are around, but they all share the same problem: you are trying to solve a presentational problem with code that lacks information about basic things such as user preference, available bandwidth and so on. With this solution you move the problem of selecting which image to load to the browser, which has a better chance of making an informed choice.

Browser compatibility is not great yet, but currently most retina devices are built by Apple. A high portion of those users are likely to use Safari 6 or Chrome, which solves the problem, as long as you remember to set the standard background-image as a fallback for everybody else.

Sunday, June 17, 2012

Browser preloading

A classic optimization on a web site is to configure the cache headers of a page so that the browser can display the page instantly if it has been loaded recently. This works very well when the user hits the back button to go back to the previous page.

What if we could do the same for the next page that the user will request? This is possible if we have two components:
  1. We need to guess which page is going to be requested.
  2. We need to tell the browser to preload it.
Number one can be addressed by gathering statistics of which pages are browsed on your site.
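A sketch of how those statistics could be gathered, assuming the page views end up in a database table with a session id, a url and a timestamp (the names are made up, and the query needs a database that supports window functions):

-- For each page, which page do visitors most often request next?
SELECT url, next_url, COUNT(*) AS transitions
FROM (
    SELECT url,
           LEAD(url) OVER (PARTITION BY session_id ORDER BY viewed_at) AS next_url
    FROM pageviews
) t
WHERE next_url IS NOT NULL
GROUP BY url, next_url
ORDER BY url, transitions DESC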

Number two is solved by adding a specific link tag that is so far supported by Firefox and Chrome, although implemented in slightly different ways.

The html link
<link rel="prefetch prerender" href="url-to-preload">

prefetch is used by Firefox. My testing indicates that the response to Firefox needs to have correct cache headers, otherwise the page will be requested again when the user actually navigates to it. You need to look at the web server logs to see the request; Firebug seems to disable the prefetching.

prerender is used by Chrome. My testing indicates that, regardless of cache headers, the next page load is instant if the user requests this page. The prerendering shows up as a cancelled GET request (screenshot below).

I'm working on a WordPress plugin that will gather usage statistics and generate preloading instructions for the browser.

Thursday, March 8, 2012

One sprite to rule them all?

It is widely known that sprites are a nice way to combine several images into one to make the browser load your web page quicker. But how far can the concept be taken without negative side effects?

In the picture above there is one big image of 300+ KB that contains all the images for an entire site theme. As you can see, the browser correctly starts loading this image early, but it also keeps starting to load additional images. In the end the big sprite is the last to finish, and the visual result is a broken-looking site that loads several images and only at the very end adds all the visually important bits and pieces of the theme.

Clearly a case when the concept was taken too far.

For site themes that have a lot of shadows and large theme graphics it is wise to split the load into multiple sprites. To limit the visual impact, consider moving all the small and colourful minor graphics to one small sprite that can load quickly, because waiting for those items is much more annoying than waiting for a background image of some kind.

Remember to set far-future cache headers on the sprites and your site will be lightning fast once the user has fetched them once.