Monday, 30 September 2024

NotebookLM

A colleague showed me Google’s NotebookLM (NLM from now on), so I tried it out with the meanest, gnarliest test I could come up with. Here’s my review:

What is it?

Basically it’s a service that lets you upload a set of documents and use that content as a knowledge base you can ask questions against. Much like ChatGPT, but with your own dataset. It is similar to services Google already offers in GCP, but with a friendly, easy-to-use consumer interface. File upload accepts multiple formats, and you can also pick documents from Google Drive (docs and presentations).

The test

I uploaded five books, around 50K words in total. That means the total material is similar in length to two average murder mystery books or one thick fantasy novel. I chose to upload my own books specifically because they are not known by AI services like ChatGPT, which allows me to test the learning and reasoning skills of the service on material that I (together with a few fans) am a top authority on. I know which things are only vaguely described and which developments take place over multiple books.


Amazon link to the test material; the target audience is casual fantasy readers. No spoilers below: character names are replaced with aliases, and the same aliases are reused across questions for different characters.

The results

For basic factual questions NLM is surprisingly good and handles even tricky questions that require understanding a full paragraph to generate a correct answer. The references it provides to where it found the information are spot on, which, if nothing else, saves you time finding the relevant sections in the material. However, it frequently messes up small factual details.

Example: Where was a certain sword found?

Answer: A, B and C went to place X and person E handed them her sword.

Correct answer: A, B, C and D went to place X and person E gave them the sword that belonged to place X.


NLM can map facts and draw simple conclusions in multiple directions.

Example: Who are X’s sisters?

Answer: A, B, C are X’s sisters.

This is correct despite Y being mentioned as the mother of A, B and C in one section and as the mother of X in another. In a separate paragraph C is explicitly called the sister of X, and NLM points this out as well.


For developments that span multiple chapters or books, NLM easily finds even vague hints or pieces of evidence to support an answer. Clearly it isn’t just searching for the words or terms in your question; it’s much better than that.

Example: Who is X attracted to? Answer:

  • There is no clear evidence of a romantic interest for X; however, there are hints that X is attracted to Y, because:
  • There are passages where X thinks of Y outside of the current task at hand.
  • X asks her friends what they think of the appearance of Y, indicating that X considers Y attractive.
  • X has questions about the professional relationship between Y and Z, which could be a hint of jealousy.

All of these are spot-on examples, with correct references throughout multiple books.


NLM is fairly good at treating longer passages as a single piece of information. Facts or developments described over a paragraph, up to a page or two, are easily understood as a complete text. As the example with X’s sisters shows, it can also combine facts from different places as long as they are simple enough. However, when an answer requires multiple pages, or an understanding of a complete chapter, it fails.

    Example: What’s X’s real name?

    Answer: Even though we get insight into the history of X, her real name is never mentioned.

The name is mentioned in the first sentence of the first chapter, which then goes on to describe an event that ends with her taking the new name X. Ironically, when later asked who X’s mother is, NLM replies that X is described as changing her identity, referring to the correct section.


Reasoning and calculating with facts is something NLM can do, but only with low precision.

    Example: Who’s the oldest elf and how old is she?

Answer: The oldest elf is X and she’s 2700 years old. Elves are immortal and can live for thousands of years. Elves are not described with exact ages but mentioned in relation to events, e.g.: X and Y lived before the war, which took place more than two thousand years ago, and they were around 500 years old at the time.

Correct answer: K is the oldest elf mentioned; he is over six thousand years old. The oldest female elf is G, the grandmother of X, although her age is not mentioned. Interestingly, if asked who X’s mother is, NLM correctly replies A. If asked about A’s age, NLM reasons that since X is her daughter at 2700 years old, A must have been born before the world war, at least more than 500 years earlier just like X, and must therefore be at least 2500 years old. This is a lot of needless reasoning that ends up with the wrong information. When asked about Z’s age, it suggests he is 4407 years old based on completely correct facts (with the exception of the “7”, which it just randomly adds), calculated in the wrong way. It fails to handle what happens before and after historic events, and it fails to factor in clear references to when an event is said to occur. It can also reason that N, being the grandfather of Z, must be at least 11000 years old, which is not wrong, although a correct calculation would put him at at least 17000 years old (if still alive, which is not clear).


NLM can detect patterns that are used frequently, although it has trouble figuring out information that is easily derived from context. Although not described in the books, all elven names follow a strict <First name> ai/ei <First name of a parent> pattern. NLM concludes this easily and, when asked “What’s the structure of elven first and last names?”, describes the above in a wordy way. It correctly detects that “ai” and “ei” are used to bind the two parts together, but fails to figure out that “ai” means “daughter of” and “ei” means “son of”, which is easily deduced when reading the names in the context of the story.
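To make the pattern concrete, here is a toy check of the convention with a regular expression; the names below are invented for this post, not taken from the books:

    # Print three hypothetical names and keep only those matching
    # <First name> ai/ei <First name of a parent>.
    printf '%s\n' "Lira ai Senna" "Toren ei Kall" "Mira Senna" \
        | grep -E '^[A-Z][a-z]+ (ai|ei) [A-Z][a-z]+$'

The first two names match; “ai” marks a daughter and “ei” a son of the parent whose name comes last.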

    Conclusion

When it comes to finding basic facts in a data set of this size, NLM performs very well. When it comes to reasoning about the material, it can map basic relationships of various kinds. There are some gaps in how it connects facts, but depending on how questions are asked it can puzzle together sections from different books to arrive at correct answers. However, even basic arithmetic is a hit-and-miss experience at best. Applying mathematical operations based on context seems to be limited to, or at least default to, addition.


Thursday, 31 December 2020

    Apple M1, future opportunities and challenges

    Previous experiences

For many, the introduction of the M1 chips by Apple felt like a revolution, or at least a significant innovation. For some of us it is just a modern implementation of a very old concept; compare it to electric cars, which were first built in the 1890s, went out of fashion and have now returned. What can we learn from previous attempts at the same technical approach, what can we guess the next steps for Apple's M1 chips will be, and what opportunities and challenges lie ahead? This article is a stand-alone second part to Apple M1 the bigger picture.

    Historical context

If you are familiar with the technical architecture of the Amiga or similar computers of its time, feel free to skip this part.

In essence, in the early nineties there were several computers available whose makers, in addition to relying on CPU manufacturers (e.g. Motorola and Intel), developed additional pieces of silicon to complement the main CPU. Commodore did this to a huge extent in the Amiga chipsets, but others, including Apple, did so too. Some of these processors were pretty simple and handled disk IO and other basic tasks for the CPU, things that have since moved into motherboards or controller chips inside hard disk drives. A common strategy at the time was to share memory between these chips and let them execute fairly high-level subroutines themselves, to avoid bogging down the low-power CPUs of the time with these tasks.

Even floating point calculations weren't integrated in the CPUs of this time; if you wanted that you had to buy an additional co-processor. Memory expansion was possible, but there was a difference between expansion memory that could only be used by the CPU (called "Fast RAM" on the Amiga) and chipset memory that was shared with the co-processors.

The results were pretty good: despite the limitations of the CPUs, these machines could still run graphical user interfaces nicely and multitask several time-sensitive operations in parallel (e.g. play games, display video and play music in sync). All while using just a small amount of power; cooling CPUs with a heatsink was not common, and active cooling with fans wasn't a thing.

    And they all died

Newer and faster CPUs appeared. Despite their improved performance there were still many things they couldn't do as well: multimedia and multitasking on early PCs was poor (to some extent due to software as well, to be fair), UIs were slow and unresponsive, and power use was very high. For many years there were things that huge, expensive, power-hungry PCs couldn't do as well as the old computers they gradually replaced.

Yet the flexibility of added CPU power made it possible to handle new problems. The modular approach, with discrete sound cards and graphics cards, complemented the CPU with the capabilities needed for graphics, sound and IO. The new computers had motherboards allowing RAM to be expanded to previously unheard-of amounts.

Some of the most fundamental co-processors became targets for integration into the growing CPUs (e.g. floating point operations), and many CPUs of today contain all kinds of multimedia extensions and graphics capabilities.

The pace of innovation was incredible, and the old school of highly integrated chipsets died away; it just wasn't flexible enough and couldn't keep up.

    What is different now?

The biggest difference is the maturity of the computing industry; not much has really changed in the last few years.

Back in the day, the introduction of a file format like JPEG was a huge challenge for older computers, but today radically different new codecs are few and far between. It will be much easier for Apple to keep up with such changes by tweaking their CPUs and choosing which codecs to accelerate with specialized processor features, and most of what an Intel/AMD CPU can handle can be handled by the M1 anyway, as they are fairly similar in computing power.

When it comes to RAM, back then it doubled every year: what was great when you bought it was a critical problem two years later, so modularity and expansion were key benefits. Virtual memory existed, but swapping to rotating disks was a very poor workaround for not having more RAM. The consumer PCs of the last decade have all worked pretty well with 8-16GB of RAM, and few applications run orders of magnitude faster with more. Swapping to SSDs is still slower than RAM, but the slowdown is much less noticeable than with spinning drives; if it happens a few times a day it's not a big deal.

Computing has shifted from being a desk-only activity to something that is assumed to be portable. As the emphasis on portability has increased, PCs have become less and less modular themselves. Few laptops today offer a generic internal connector for a high-speed co-processing unit; the only ones I can think of are sockets for RAM and M.2 slots, which are typically already occupied. The risk is much lower that a huge shift will be introduced by some amazing M.2 card that everybody buys to radically increase the capabilities of their devices; in the bad old days, upgrading add-on cards was common (e.g. sound, graphics, hard drive controllers).

It boils down to this: when did you last hear of a new type of application that required a computer upgrade? Most consumers can hardly remember it happening; the pace of innovation is still fast but doesn't require you to change your device. This is a huge benefit when optimizing for efficiency and integration.

    Opportunities and challenges

Multitasking is something we all take for granted; compared to 30 years ago there is very little benefit in specialized processors for it today. Most computers, even at the low end, have multiple CPU cores, and software is good enough to make up for any lack of parallel processing. The opportunity for integration is smaller, but so is the risk. Possibly some new AI processing devices will change that soon, but any non-modular PC design will be on an equal playing field anyway, and this isn't where Apple's M1 is competing.

When it comes to memory capacity there will be a small challenge in offering enough flexibility for professional users who might want (or genuinely need) more RAM. The old "Fast RAM" trick can of course be applied and would probably be an efficient and flexible way to let those who wish add another 32-64GB of RAM. The other option is to introduce a CPU model with 32GB of RAM on the package, which would increase both capacity and likely performance; it sounds simple, but it would require space and use more power, so it is not without issues.

When it comes to performance, very close integration between the CPU and its memory has a significant benefit: with a low-latency connection you get even better improvements from adding faster RAM. The M1 uses LPDDR4X today; this could be upgraded in the future to any of several higher-performance options (e.g. HBM or DDR5/6).

From a marketing perspective, when you control both the integrated features of the processor and the software used on all machines, you have a much faster and easier path to optimizing specific workflows. I'd expect future Apple marketing to boast about acceleration of specific workloads with numbers in the 200%-plus region. These improvements will be impressive from a marketing point of view and offer huge gains for some customers. Over time it will be a challenge to find optimizations that matter to enough customers, though, as fewer and fewer things ever feel slow.

There will be a challenge in offering high-end graphics capabilities. For now it seems like Apple is mostly staying out of that, but eventually they might want a piece of the gaming market. Today there is still a general acceptance that gaming happens on a different device (e.g. a console or a gaming PC). There are two risks in this area. If interest in this market expands heavily, it might force Apple back to supporting external graphics cards; not a big deal, but some of the current benefits of avoiding that would diminish. The second, less likely but higher-impact risk is that a new type of killer app emerges that benefits incredibly from having a high-power graphics card even in your laptop; then Apple would need to adapt very quickly.

    Summary

The opportunity to beat everything else on efficiency for a few years is significant; given what has been released so far, the question for all competitors is how long Apple keeps this exclusive advantage. For a large chunk of the mainstream consumer computer market this looks like a very rational and sensible direction. The near-term challenges are minor and have several well-known solutions. In the long term there is no knowing how fast Apple will react and adapt to shifts in the market, or whether they will even be leading and driving that change.

    Apple M1, the modern version of the computer I used to love.

Thursday, 19 November 2020

    Apple M1 the bigger picture

    This will be an attempt to make a short yet slightly different analysis of the Apple M1 release. I will try not to repeat what's already been said 100 times.

I see a lot of raving about the M1, and to be fair it seems amazing. On the other hand, for those of us who have been around computing since the 90s this is nothing new; we used to get similar news and 50% improvements every year in the mid-to-late 90s. In some areas (e.g. GPUs) development has been fantastic even in recent years, but in CPUs we got constant improvements without much for mainstream customers to rave about, until Apple released the M1. So what have they really done?

On a high level they have done something very simple: they broke rules that didn't even exist. At its core (pun intended) there is nothing new; I don't see a single new piece of technology invented for the M1. Not even the integration aspect of the M1 is new, it has been done for "ages" in mobile phones. What is new is the decision to use this highly integrated design in a traditional computer.

For most mainstream users the machine is simply a black box; you don't change its internals once you've bought it. Yet PC designs have been highly modular, with all the costs and issues that come with modular design while reaping few of the benefits. This is more tradition than anything else: one manufacturer creates a dozen or so CPU designs that roughly fit everything from tablets to data centers and supercomputers (with a tweak), and these are then combined with memory, storage and often graphics to build the consumer machine.

This was great for the average consumer back when you could, from one year to the next, upgrade to four times the memory capacity or buy a new CPU and extend the lifetime of the machine by a few more years. But as the improvements hit diminishing returns, fewer and fewer people upgraded, and fewer and fewer saw any need to. Modularity became a feature the majority of customers never really used; they basically sponsored other groups of customers. However, this approach to delivering systems was such a well-oiled machine, and worked so well for the system builders, that nobody questioned the necessity of it, or the cost...

What cost? Modularity is great! Yes, when you use the flexibility frequently. Being able to tow a trailer with your car is great if you do it a few times a year; if you never do it you don't need the tow hook, and if you do it every day a pickup might be better. For a CPU, external memory modules are great if you actually change memory, otherwise the cost is reduced memory bandwidth, higher latency and more power used, all for nothing. Some computing problems require more or less memory, but the 16GB you can get in an M1 is roughly what has been the standard in home computing for the last five years anyway.

There are problems that simply crave 64GB of memory to be computed efficiently, but if that is your use case you are way outside the target audience for this chip. There are customers who benefit greatly from upgrading their computers with new graphics cards (or even several) and who genuinely need the extra RAM, but the M1 is not a computing revolution; the fundamentals of how computers process problems have not changed. The M1 is a great optimization for a large group of mainstream Apple customers, a well-designed system on a single chip that will be amazing for those it fits, not a silver bullet for everyone. Even as a computer enthusiast I appreciate what Apple has done; the M1 (and its successors) will be a great benefit for a huge number of customers who previously didn't really have products designed specifically for them, and there will likely be options for the rest of us later.

A concern with highly integrated products is that the whole package needs to last equally long. Even if development in CPU performance and memory requirements has leveled off, maybe we will soon see a big shift in AI/ML processing; if that happens, the whole device might feel very aged unless an external AI/ML processor can be added via the USB-C connector. In a traditional modular PC there are more options for such upgrades. In a worst-case scenario, dominant manufacturers will drive development specifically to make older devices obsolete on purpose. A strong second-hand market and options to upgrade or replace specific components, such as the CPU and battery, would be a reasonable mitigation if priced fairly.

If Apple has done their homework (and they usually do), they have just created an integrated version of what was previously available as stand-alone components, using well-known approaches from mobile phone technology. If they've done the maths properly, this chip will cater to and be a big benefit for around 80% of their users. To deal with the remaining 20% they will likely offer another chip that is great for 80% of that group, and the remaining users will likely get a different, possibly more traditionally modular and likely very expensive offering.

When Apple released the iPhone, some companies and people laughed at it; I think most people recognize the same pattern this time. Most likely history will repeat itself. Just as Android became a competitor to iOS devices, we will likely see similar developments in PCs. Will it be similarly integrated machines from Microsoft, Ubuntu vendors or Chromebook makers? Hard to say, but this change has been long overdue and will likely result in similar devices from other manufacturers, hopefully sooner rather than later for the benefit of all customers.



Monday, 1 January 2018

NAS CPU performance

Network attached storage units (NAS units) are increasingly popular as storage needs grow and fewer and fewer people keep a traditional desktop computer with room for multiple disk drives.

Traditional NAS benchmarks often focus on file transfer over the network, which is indeed the core use case for most. But as desktop usage declines, more and more tasks are shifted to NAS devices, and this means new capabilities are needed; running virtual machines and media playback are no longer exotic or niche.

NAS benchmarks need to evolve to give buyers relevant guidance for these new scenarios; network transfer performance is no longer enough to base a decision on. This need is emphasized by the huge difference in performance between devices, sometimes at similar prices.

A very basic test can be run on almost any Linux-based NAS unit. It is nowhere near complete, but it gives a very rough estimate of single-thread CPU performance; not a good benchmark, but better than nothing:

    time $(i=0; while (( i < 999999 )); do (( i ++ )); done)
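In case the unit's default shell doesn't support bash's (( )) arithmetic, a plain POSIX sh variant of the same loop should behave the same (a sketch, not tested on every model):

    # Roughly a million integer increments in a single thread;
    # lower wall-clock time means a faster CPU core.
    time sh -c 'i=0; while [ "$i" -lt 999999 ]; do i=$((i+1)); done'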
    
    
    Results for a few Synology devices (lower time is better):

DS115J: 68s
DS212: 67s
DS213: 59s
DS214+: 42s
DS412+: 22s
DS413J: 67s
DS414: 40s
DS415+: 14s
DS416play: 12s
DS718+: 7.7s
DS916+: 11s
DS918+: 7.8s
DS1512+: 22s
DS1812+: 21s
DS1815+: 13s
DS3018xs: 4.1s
RS3617RPxs: 3.5s

As the results show, devices have vastly different CPU capabilities despite all of them being capable of transferring files at very high speed.

Wednesday, 15 April 2015

    Google BigQuery compared to Amazon Redshift

After tinkering with Amazon Redshift I was interested in comparing it with Google BigQuery. Two solutions both built for analysis of huge datasets, how different can they be? It turns out that they have much in common, but in some areas they are very different beasts…

The general capability to easily handle incredible amounts of data is the same; it is truly mind-boggling how these services let you work with multi-billion-row datasets. Just as with Redshift, it really helps to have the raw data heavily compressed (to reduce storage cost), easy to process, and stored somewhere the service can access efficiently. For BigQuery, storing the raw data in Google Cloud Storage makes loading operations simple and fast.

Operating the two solutions is very different. Where Redshift has a web UI that lets you manage your cluster in all its aspects (hardware type, number of nodes and so on), BigQuery is more of a service offering that relieves you of all the infrastructure details. The main benefit of the BigQuery model is ease of use and quick scalability (no need to resize clusters); the main benefit of Redshift is that you really feel that your data is on your platform, not in a shared service (the kind of minor point that still seems to matter in some contexts).

Loading data is done with a batch load job (similar to the Redshift copy command), with a wizard-like user interface for configuring the details. Although I was seriously impressed with the fantastic performance of my large Redshift clusters, BigQuery was even faster (single-digit minutes instead of double-digit minutes). The batch load wizard is simple to operate, but I miss some of the flexibility of the Redshift copy command, and I really missed the excellent result reports you get after a load operation. Due to a weirdness in the internals of Google Cloud Storage and the lack of result feedback, I really struggled with data loading initially, but Google support was beyond expectations, quickly gave me a workaround and has since fixed the problem.
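For reference, the same kind of load can also be started from the command line with the bq tool instead of the wizard. A minimal sketch; the dataset, table, bucket and schema file names are made up for the example:

    # Load gzipped CSV files straight from Google Cloud Storage into a table.
    bq load --source_format=CSV --skip_leading_rows=1 \
        weblogs.requests gs://my-raw-logs/2015/04/*.csv.gz ./schema.json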

In terms of performance the services are quite different. On BigQuery performance is very consistent regardless of the size of the dataset; on Redshift you determine the performance by scaling the cluster size (at a cost, though). In general I think Google has struck a good enough balance for me not to care, and to just be happy that I don't have to think about it. Factoring in the large cluster you need on Redshift to get comparable performance, I'd say you are likely to get better performance on BigQuery unless you are willing to spend a lot.

Web UI: a really nice BigQuery feature for getting going quickly, or for the odd ad-hoc query, is that you don’t need any tools; there is a basic SQL query interface built into the web console.

The pricing of the services is difficult to compare, since you pay for cluster runtime with Redshift and for storage plus queries with BigQuery. For my scenario, with fairly large data volumes and a pattern of short periods of intense querying followed by long periods of little to no querying, BigQuery is more than a factor of 10 cheaper for similar performance. The Redshift cost comes from continuously running a cluster for low volumes of ad-hoc queries; you can trade this low-latency access at high cost for long-latency access at lower cost (e.g. starting and restoring the cluster only when you need it), but with BigQuery I get the best of both worlds, since paying for the storage you need is still very cheap for huge datasets compared to running a cluster. Also note that with the super-fast data loading in BigQuery you can keep even less data loaded and more raw data compressed instead. The largest cost for BigQuery is the query cost: paying for data processed, with large amounts of data and a service that can process terabytes in a few seconds, can hit you unexpectedly. The feeling of paying for every single select statement is a bit nagging, but in the end $5 per terabyte of processed data is fairly cheap, and as long as you don’t query all the columns in the table you can write pretty efficient queries. It is probably worthwhile to consider the different pricing models for your specific workload; in some cases (obviously in mine) the difference is huge.
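Since billing is per byte processed, it's worth checking what a query will scan before running it. A small sketch with made-up table and column names, using the bq tool's dry-run mode:

    # Report bytes processed (and therefore cost) without running the query.
    # Selecting only the columns you need scans far less data than SELECT *.
    bq query --dry_run --use_legacy_sql=false \
        'SELECT user_id, request_time FROM weblogs.requests WHERE status = 500'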

Friday, 10 October 2014

    AWS Redshift tinkering

For a long time I've used a little hack (http://www.albert.nu/programs/filelinestatistics/) written in my spare time for ad hoc analysis of large amounts of log files. With a decent-sized machine to run it on, there was no problem digging in and querying for any aggregation, or finding details, in gigabytes of compressed log files.

But every once in a while you come across a project where the data analysis needs are just that much greater. For the last few days I've been doing my data analysis against a few different AWS Redshift clusters. Some simple lessons learned:

Size matters. When working with terabytes of data, even if you can load it into a fairly small cluster, you need dozens of machines to get decent performance. At least for my use case, with log files from web applications, it's best to go for the SSD nodes with less storage but more powerful machines, and to make sure you have as many as possible. You might want to contact Amazon to raise the node limit from the start.

Use the copy command and sacrifice a small bit of quality for shorter lead times. Depending on your options for continually loading the data you might not need to optimize this, but if you, like me, always have more systems and logs than you'd ever have capacity to keep in your database, it becomes important to load the dataset you want fairly fast. If you store your logs on S3, it is simple to use the copy command to load surprising amounts of data in a few minutes, provided you have a large enough cluster.
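For reference, a minimal sketch of such a load run through psql against the cluster endpoint; the table, bucket, host and credential values are placeholders, and MAXERROR is the "sacrifice a small bit of quality" part:

    # Bulk-load gzipped, tab-separated log files from S3, tolerating up to 100 bad rows.
    psql -h my-cluster.abc123.us-east-1.redshift.amazonaws.com -p 5439 -U admin -d logs -c "
      COPY weblogs
      FROM 's3://my-log-bucket/2014/10/'
      CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret>'
      GZIP DELIMITER '\t' MAXERROR 100;
    "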

Beware of resizing a cluster with tons of data in it; if possible, just empty the cluster and reload after resizing. When loading from S3 you don't pay anything extra for data transfer as long as you keep the cluster in the same region as the log files. If the cluster is empty you can often do a resize in less than half an hour, sometimes closer to fifteen minutes.

Tuesday, 13 August 2013

CDN? The ISPs are doing it for you!

While analyzing the CDN access logs for a site, I realized that the ratio of pageviews per visit didn’t at all reflect the number of css/js files being transferred. Considering browser caching, I expected roughly one css/js access per visit, perhaps slightly less given some returning visitors.

I found that accesses to css/js were a factor of 10 to 100 lower than expected. I also found that a small bunch of IPs were causing huge amounts of traffic. The top IPs fell into two categories: crawlers (googlebot, bingbot and similar) and ISPs.

Obviously crawlers don’t need to fetch the css/js files over and over again, but why aren’t ISPs fetching them when the traffic is clearly coming from multiple clients behind a large NAT setup or similar? Thinking about it for a while, my best guess is that they simply do what the CDN does: pass the traffic through a caching proxy and cache everything according to the HTTP headers, with some extra intelligence to figure out what is important enough to stay in the cache.
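If that is what’s happening, the response headers such a proxy would base its decisions on are easy to inspect (hypothetical URL):

    # Show only the response headers a shared cache cares about.
    curl -sI https://www.example.com/static/app.css \
        | grep -iE '^(cache-control|expires|etag|last-modified)'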

    This has two important implications:

1. If you think you can control caching through your CDN configuration and the CDN purge function, you are wrong. Just as something stuck in a browser cache is out of your control, the ISP cache is also out of your control.
2. If you don't cache-bust your resources properly, you'll end up with a lot of weird behaviour for users whose ISPs run a cache (see the sketch below).
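A minimal sketch of filename-based cache busting, with made-up file and template names. A content hash in the file name means that any cache, whether browser, CDN or ISP proxy, fetches the new file as soon as the content changes:

    # Publish the asset under a name derived from its content.
    hash=$(md5sum static/app.css | cut -c1-8)
    cp static/app.css "static/app.${hash}.css"
    # Point the pages at the hashed name.
    sed -i "s|app\.css|app.${hash}.css|g" templates/base.html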

If enough ISPs start doing this and do it well, I even see it as an important improvement to the overall performance of the Internet. The ISP cache is close to the user, and in terms of traffic and end-user performance it is a win-win for both the ISP and the site owner. This is really a half-decent content delivery network, and it's free!

If someone has insight into ISPs, it would be interesting to hear what technology they are using and how they think about it. My findings might be specific to a site targeting mobile users; maybe mobile operators are more aggressive in this area? But it would be beneficial to all ISPs.

Surprised? I've always known that there is potential for all kinds of proxies around the Internet. I had just never seen it in effect, and I certainly didn't expect it to be this effective!