Thursday 31 December 2020
Apple M1, future opportunities and challenges
Thursday 19 November 2020
Apple M1: the bigger picture
This will be an attempt at a short yet slightly different analysis of the Apple M1 release. I will try not to repeat what has already been said a hundred times.
I see a lot of raving about the M1, and to be fair it seems amazing. On the other hand, for those of us who have been around computing since the 90s, this is nothing new. In fact, we used to get similar news and 50% improvements every year in the mid-to-late 90s. In some areas (e.g. GPUs) development has been fantastic even in recent years. But in CPUs we got constant improvements, yet not much to rave about for mainstream customers, until Apple released the M1. So what have they really done?
On a high level they have done something very simple: they broke rules that didn't even exist. At its core (pun intended) there is nothing new; I don't see a single piece of technology invented in the M1. Not even the integration aspect of the M1 is new, it has been done for "ages" in mobile phones. What is new is the design decision to use this highly integrated design in a traditional computer.
For most mainstream users the machine is simply a black box; you don't change its internals once you've bought it. Yet PC designs have been highly modular, with all the costs and issues that come with modular design while reaping few of the benefits. This is more tradition than anything else: one manufacturer creates a dozen or so CPU designs that roughly fit everything from tablets to data centers and supercomputers (with a tweak), and these are then combined with memory, storage and frequently graphics to build the consumer machine.
This was great for the average consumer at a time when, from one year to the next, you could upgrade to four times the memory capacity, or buy a new CPU and extend the lifetime of the machine by a few more years. But as the improvements reached diminishing returns, fewer and fewer upgraded, and fewer and fewer saw any need to do so. Modularity became a feature not put to real use by the majority of customers; they basically sponsored other groups of customers. The approach to delivering systems was such a well-oiled machine, however, and worked so much to the benefit of the system builders, that nobody questioned the necessity of it, or the cost...
What cost? Modularity is great! Yes, when you use the flexibility frequently. Being able to tow a trailer behind your car is great when you do it a few times a year; if you never do it, you don't need the tow hook, and if you do it every day, a pickup might be better. For a CPU, external memory modules are great if you actually change memory chips; if not, the cost is reduced memory bandwidth, higher latency and more power used, all for nothing. Some computing problems require more or less memory, but the 16 GB you can get in an M1 is roughly what has been standard in home computing for the last five years or so anyway.
There are problems that just crave 64 GB of memory to be computed efficiently, but if that is your use case, you are way outside the target audience for this chip. There are customers who benefit greatly from upgrading their computers with new graphics cards (or even multiple ones), or who just need the extra RAM. The M1 is not a computing revolution; the fundamentals of how computers process problems have not changed. It is a great optimization for a large group of mainstream Apple customers, a well-designed system on a single chip that will be amazing for those it fits, not a silver bullet for all of them. Even being a computer enthusiast, I appreciate what Apple has done. The M1 (and its successors) will be a great benefit for a huge number of customers, customers who previously didn't really have products designed specifically for them, and there will likely be options for the rest of us later.
A concern with highly integrated products is that the whole package needs to last equally long. So even if development has leveled off in CPU performance and memory requirements, maybe we will soon see a big shift in AI/ML processing; if that happens, the whole device might feel very aged unless an external AI/ML processor can be added via the USB-C connector. In a traditional modular PC there are more options available for such upgrades. In a worst-case scenario, dominant manufacturers will drive specific development to make older devices obsolete on purpose. A strong second-hand market, and options to upgrade or replace specific components such as CPUs and batteries, would be a reasonable mitigation if priced fairly.
If Apple has done their homework (and they usually do), they have just created an integrated version of what was previously available as stand-alone components and well-known approaches in mobile phone technology. If they've done the maths properly, this chip will cater to and be a big benefit for around 80% of their users. To deal with the remaining 20%, they will likely offer another chip that is great for 80% of that group, and the remaining users will likely get a different, possibly more traditionally modular, and likely very expensive offering.
When Apple released the iPhone, some companies and people laughed at it; I think most people recognize the same pattern this time. Most likely history will repeat itself: just as Android became a competitor for iOS devices, we will likely see similar developments in PCs. Will it be similarly integrated machines from Microsoft, Ubuntu or Chromebooks? Hard to say, but this change has been long overdue and will likely result in similar devices from other manufacturers, hopefully sooner rather than later for the benefit of all customers.
Monday 1 January 2018
NAS CPU performance
Traditional NAS benchmarks often focus on file transfer over the network, which is indeed the core use case for most. But as desktop usage declines, more and more tasks are shifted to NAS devices, and this means new capabilities are needed; running virtual machines and media playback are no longer exotic or niche.
NAS benchmarks need to evolve to give buyers relevant guidance for these new scenarios; network transfer performance is no longer enough to make a decision. This need is emphasized by the huge difference in performance between devices, sometimes at similar prices.
A very basic test can be run on almost any Linux-based NAS unit. It is nowhere near complete information, but it can give a very rough estimate of single-thread CPU performance. It's not a good, complete benchmark, but it's better than nothing:
time $(i=0; while (( i < 999999 )); do (( i++ )); done)

Results for a few Synology devices (lower time is better):
DS115J | 68s |
DS212 | 67s |
DS213 | 59s |
DS214+ | 42s |
DS412+ | 22s |
DS413J | 67s |
DS414 | 40s |
DS415+ | 14s |
DS416play | 12s |
DS718+ | 7.7s |
DS916+ | 11s |
DS918+ | 7.8s |
DS1512+ | 22s |
DS1812+ | 21s |
DS1815+ | 13s |
DS3018xs | 4.1s |
RS3617RPxs | 3.5s |
As the results show, the devices have vastly different CPU capabilities, despite all of them being capable of transferring files at very high speed.
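For NAS units whose /bin/sh is a limited busybox shell without the bash (( )) arithmetic syntax, a plain POSIX variant of the same loop can be used. This is a sketch with the same iteration count as the benchmark above; it times itself with date instead of the time keyword:

```shell
#!/bin/sh
# Same rough single-thread CPU benchmark as above, in plain POSIX sh
# so it also runs on busybox-based NAS units.
start=$(date +%s)
i=0
while [ "$i" -lt 999999 ]; do
  i=$((i + 1))
done
end=$(date +%s)
echo "counted to $i in $((end - start)) seconds"
```

The resolution is whole seconds, which is coarse but in line with the result table above.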
Wednesday 15 April 2015
Google BigQuery compared to Amazon Redshift
The general capability to easily handle incredible amounts of data is the same; it is truly mind-boggling how these services allow you to handle multi-billion-row datasets. Just as with Redshift, it really helps to have the raw data available in a format that is heavily compressed (to reduce storage cost) and easy to process, in a place that BigQuery can access efficiently. For BigQuery, storing the raw data in Google Cloud Storage makes loading operations simple and fast.
Operating these two solutions is very different: where Redshift has a web UI that lets you manage your cluster in all its different aspects (hardware type, number of nodes, etc.), BigQuery is more of a service offering that relieves you of all the infrastructure details. The main benefit of the BigQuery model is ease of use and quick scalability (no need to resize clusters), while the main benefit of Redshift is that you really feel your data is on your platform, not in a shared service (the kind of minor point that still seems to be important in some contexts).
Loading data is done with a batch load command (like the Redshift copy command), with a wizard-like user interface for configuring the details of the load. Although I was seriously impressed with the fantastic performance of my large Redshift clusters, BigQuery was even faster (single-digit minutes instead of double-digit minutes). The batch load wizard is simple to operate, but I miss some of the flexibility of the Redshift copy command, and I really missed the excellent result reports you could get after a load operation. Due to a quirk in the internals of Google Cloud Storage, and the lack of result feedback, I really struggled with data loading initially, but Google support was beyond expectations: they quickly helped me with an immediate workaround and have since fixed the problem.
In terms of performance the services are quite different. On BigQuery, performance is very consistent regardless of the size of the dataset; on Redshift, you can determine the performance by scaling the cluster size (at a cost, though). In general I think Google has managed to strike a good enough balance for me not to care about it and just be happy that I don't have to think about it. Factoring in the large cluster size you need on Redshift to get comparable performance, I'd say you are likely to have better performance on BigQuery unless you are willing to spend a lot.
Web UI: a really nice feature of BigQuery for getting going quickly, or for the odd ad-hoc query, is that you don't need any tools; there is a basic SQL query interface built into the web console.
The pricing of the services is difficult to compare, since you pay for cluster runtime with Redshift but for storage and queries with BigQuery. For my scenario, with fairly large data volumes and a pattern of short periods of intense querying followed by long periods of little to no querying, BigQuery is more than a factor of 10 cheaper for similar performance. The difference comes from the need to continuously run a Redshift cluster for low volumes of ad-hoc queries. You can trade that low-latency access at high cost for long-latency access at lower cost (e.g. starting and restoring the cluster when you need it), but with BigQuery I get the best of both worlds: paying for the storage needed is still very cheap for huge datasets compared to running a cluster. Also note that with the super fast data loading in BigQuery you can keep even less data loaded, and more raw data compressed instead.

The largest cost for BigQuery is the query cost. Paying for data processed, with large amounts of data and a service that can process terabytes in a few seconds, can hit you unexpectedly, and the feeling of paying for every single select statement is a bit nagging. But in the end, $5 per terabyte of processed data is fairly cheap, and as long as you don't query all the columns in the table you can make pretty efficient queries. It is probably worthwhile to consider the different pricing models for your specific workload; in some cases (obviously in mine) the difference is huge.
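As a back-of-the-envelope sketch of the on-demand query pricing described above (the $5/TB figure is from the text; the scanned volume is a made-up example, and BigQuery bills on the columns a query actually references, not the whole table):

```shell
# Rough BigQuery on-demand cost estimate: $5 per TB of columns scanned.
price_per_tb=5        # USD per TB processed, as mentioned above
scanned_gb=200        # hypothetical query touching 200 GB of columns
# Integer arithmetic in cents, since plain sh has no floating point.
cost_cents=$((scanned_gb * price_per_tb * 100 / 1024))
printf 'estimated query cost: $%d.%02d\n' $((cost_cents / 100)) $((cost_cents % 100))
# prints: estimated query cost: $0.97
```

The point of the exercise: even a query scanning hundreds of gigabytes costs well under a dollar, which is why the per-query pricing ends up cheap despite the nagging feeling.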
Friday 10 October 2014
AWS Redshift tinkering
But every once in a while you come across a project where the data analysis needs are just that much greater. The last few days I've been running my data analysis against a few different AWS Redshift clusters. Some simple lessons learned:
Size matters: when working with terabytes of data, even if you can load it into a fairly small cluster, you need dozens of machines to get decent performance. At least for my use case, with log files from web applications, it's best to go for the SSD nodes with less storage but more powerful machines, and to make sure to have as many as possible. You might want to contact Amazon to raise the node limit from the start.
Use the copy command and sacrifice a small bit of quality for shorter lead times. Depending on your options for continually loading the data you might not need to optimize this, but if, like me, you always have more systems and logs than you'd ever have capacity to keep in your database, it becomes important to load the dataset you want fairly fast. If you store your logs on S3, it is simple to use the copy command to load surprising amounts of data in a few minutes, provided you have a large enough cluster.
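A sketch of what such a load can look like. The table name, bucket path and credential string below are placeholders, not from an actual setup; the assembled statement would be run against the cluster with any PostgreSQL client:

```shell
# Build a Redshift COPY statement for gzipped, tab-separated logs on S3.
# Table, bucket and credentials are placeholders for illustration only.
table="weblog"
bucket="s3://my-log-bucket/2014/10/"
creds="aws_access_key_id=AKIA...;aws_secret_access_key=..."
# maxerror lets the load continue past a few bad rows: the small quality
# sacrifice for shorter lead times mentioned above.
sql="copy $table from '$bucket' credentials '$creds' gzip delimiter '\t' maxerror 100;"
echo "$sql"
```

Pointing the copy command at an S3 prefix rather than a single file lets Redshift load the objects in parallel across the cluster nodes, which is where the few-minute load times come from.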
Beware of resizing the cluster with tons of data; if possible, just empty the cluster and reload into a new one. When loading from S3 you don't pay extra for data transfer, as long as you keep the cluster in the same region as the log files. If the cluster is empty you can often do a resize in less than half an hour, sometimes closer to fifteen minutes.
Tuesday 13 August 2013
CDN? The ISP's are doing it for you!
Sunday 23 December 2012
Arm'ed for 2013
I've had at least 3-5 computers at any given point in time over the last 15 years. They've all been x86 machines, mainly desktops. I still have those, but 80% of my usage is now on ARM devices.
How did it happen?
1. The iPad is my main surf/email/game/TV device.
2. The big file server is now an archive machine (rarely used), and a Synology DiskStation is the main file server, complementing the iPad with storage.
There are only three main things I still do on what used to be the "main rig": media encoding, FPS games and work (coding). Once I'm tired of the FPS games I still run on it, it will go, and I'll just have the laptop left to work on.
It is fantastic how far I get now with so little: instead of big bulky computers, two small devices take care of all my computing needs. They don't allow all the tinkering I do love, but they work: silently, always on, always at hand.
Still, I'm writing this on my Windows/Intel computer. Why? It has the best keyboard, but I guess in 2013 I'll buy a nice keyboard for my iPad. Didn't see that one coming; I wonder what 2013 will bring...