<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>AI History on Crafty CTO</title>
    <link>https://craftycto.com/tags/ai-history/</link>
    <description>Recent content in AI History on Crafty CTO</description>
    <image>
      <url>https://craftycto.com/images/crafty-logo.webp</url>
      <link>https://craftycto.com/images/crafty-logo.webp</link>
    </image>
    <generator>Hugo -- gohugo.io</generator>
    <language>en</language>
    <lastBuildDate>Tue, 10 Jun 2025 15:36:00 -0400</lastBuildDate><atom:link href="https://craftycto.com/tags/ai-history/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Two Interviews with OpenAI President Greg Brockman</title>
      <link>https://craftycto.com/micro/two-from-greg-brockman/</link>
      <pubDate>Tue, 19 Aug 2025 06:01:48 -0500</pubDate>
      
      <guid>https://craftycto.com/micro/two-from-greg-brockman/</guid>
      <description>Highlights from two recent interviews by the Latent Space team with Greg Brockman</description>
      <content:encoded><![CDATA[<p>I had the pleasure to see <a href="https://en.wikipedia.org/wiki/Greg_Brockman">Greg Brockman</a> speak in person at <a href="https://www.youtube.com/watch?v=avWhreBUYF0">AIEWF25</a>&mdash;interviewed by <a href="https://www.swyx.io">Swyx</a>, with assistance from Nvidia&rsquo;s <a href="https://en.wikipedia.org/wiki/Jensen_Huang">Jensen Huang</a> via prerecorded video. The title of that talk, appropriately, was &ldquo;Define AI Engineer.&rdquo;</p>
<p>Then just three days ago, <a href="https://www.swyx.io">Swyx</a> and his <a href="https://www.latent.space">Latent.Space</a> podcast partner <a href="https://x.com/fanahova">Alessio Fanelli</a> scored another interview with Brockman, this time themed &ldquo;OpenAI&rsquo;s Road to AGI&rdquo; and focused on the huge OpenAI news of the past two weeks: <a href="https://openai.com/blog/gpt-oss">GPT-OSS</a> and then <a href="https://openai.com/blog/gpt-5">GPT-5</a>.</p>
<p>Here are a few highlights from each.</p>
<h2 id="aiewf25-june-3rd">AIEWF25 June 3rd</h2>
<p>On how we structure codebases in the emerging AI era:</p>
<blockquote>
<p>The direction is something that is like just so compelling and incredible to me. The thing that has been the most interesting to see has been when you realize that the way you structure your codebase determines how much you can get out of <a href="https://openai.com/blog/openai-codex">Codex</a>, right? Like all of our existing codebases are kind of matched to the strengths of humans. But if you match instead to the strength of models which are sort of very lopsided, right? Models are able to handle way more like diversity of stuff but are not able to sort of necessarily connect deep ideas as much as humans are right now. And so what you kind of want to do is make smaller modules that are well tested that have tests that can be run very quickly and then fill in the details. The model will just do that, right? And it&rsquo;ll run the test itself.</p>
<p>The connection between these different components, kind of the architecture diagram, that’s actually pretty easy to do, and then it’s like filling out all the details that is often very difficult. And if you actually do that, you know, what I described also sounds a lot like good software engineering practice. But it’s just like sometimes because humans are capable of holding more of this like conceptual abstraction in our head, we just don’t do it – it’s a lot of work to write these tests and to flesh them out and that the model’s going to run these tests like a hundred times or a thousand times more than you will and so it’s going to care way way more. So in some ways the direction we want to go is build our codebases for more junior developers in order to actually get the most out of these models.</p>
</blockquote>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/micro/two-from-greg-brockman/swyx_brockman1.png"><picture>
    <source type="image/webp" srcset="/micro/two-from-greg-brockman/swyx_brockman1_hu291831256732508820.webp 320w, /micro/two-from-greg-brockman/swyx_brockman1_hu12700017921144289134.webp 640w, /micro/two-from-greg-brockman/swyx_brockman1_hu15826634113806187884.webp 768w, /micro/two-from-greg-brockman/swyx_brockman1_hu12341122789749918366.webp 1024w, /micro/two-from-greg-brockman/swyx_brockman1_hu15009076985177841372.webp 1280w, /micro/two-from-greg-brockman/swyx_brockman1_hu14586163663709625986.webp 1536w, /micro/two-from-greg-brockman/swyx_brockman1_hu11226055395767597071.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/micro/two-from-greg-brockman/swyx_brockman1_hu13914711105151296117.jpg 320w, /micro/two-from-greg-brockman/swyx_brockman1_hu17489964317315336843.jpg 640w, /micro/two-from-greg-brockman/swyx_brockman1_hu1524029245040428891.jpg 768w, /micro/two-from-greg-brockman/swyx_brockman1_hu5230971042755626074.jpg 1024w, /micro/two-from-greg-brockman/swyx_brockman1_hu14056270962775717314.jpg 1280w, /micro/two-from-greg-brockman/swyx_brockman1_hu2706287924186261884.jpg 1536w, /micro/two-from-greg-brockman/swyx_brockman1_hu2501159964871979218.jpg 1920w" sizes="100vw" />
    
      <img src="/micro/two-from-greg-brockman/swyx_brockman1_hu17489964317315336843.jpg" width="3446" height="1856" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>On domain-specific agents, in some ways foreshadowing GPT-5 as router:</p>
<blockquote>
<p>I think my perspective is that first of all, it’s all on the table, right? Maybe we reach a world where it’s just like the AIs are so capable that we all just let them write all the code. Maybe there’s a world where you have like one Al in the sky. Maybe it’s that you actually have a bunch of domain specific agents that require a bunch of specific work in order to make that happen. [&hellip;]</p>
<p>I think the evidence has really been shifting towards this menagerie of different models—I think that’s actually really exciting. There’s different inference costs, there’s different trade-offs like just distillation works so well, there’s actually a lot of power to be had by models that are actually able to use other models.</p>
</blockquote>
<h2 id="latentspace-august-16">Latent.Space August 16</h2>
<p>Here I&rsquo;m taking advantage of the amazing AI-driven podcast app <a href="https://www.snipd.com">Snipd</a> and its quotes feature. If you click a quote&rsquo;s link, you&rsquo;ll be taken directly to that spot in the podcast.</p>
<p>On <a href="https://share.snipd.com/episode/8dd5816c-f788-446e-9a8b-679b141718ce">routing between models in GPT-5</a>:</p>
<blockquote>
<p>You have a reasoning model that we know is good for applications that require this intelligence, but you&rsquo;re okay waiting a little bit longer. We have a non-reasoning model that is great for applications where you want the answer fast.</p>
</blockquote>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/micro/two-from-greg-brockman/brockman_swyx_alessio.png"><picture>
    <source type="image/webp" srcset="/micro/two-from-greg-brockman/brockman_swyx_alessio_hu15654022621511898987.webp 320w, /micro/two-from-greg-brockman/brockman_swyx_alessio_hu1379994735321962778.webp 640w, /micro/two-from-greg-brockman/brockman_swyx_alessio_hu10759937099458376478.webp 768w, /micro/two-from-greg-brockman/brockman_swyx_alessio_hu15937992870362709301.webp 1024w, /micro/two-from-greg-brockman/brockman_swyx_alessio_hu1595387713013726673.webp 1280w, /micro/two-from-greg-brockman/brockman_swyx_alessio_hu597663678086172641.webp 1536w" sizes="100vw" />
    <source type="image/jpeg" srcset="/micro/two-from-greg-brockman/brockman_swyx_alessio_hu17379942450681417405.jpg 320w, /micro/two-from-greg-brockman/brockman_swyx_alessio_hu2198943248024680688.jpg 640w, /micro/two-from-greg-brockman/brockman_swyx_alessio_hu15501007492366531594.jpg 768w, /micro/two-from-greg-brockman/brockman_swyx_alessio_hu12550203595191040011.jpg 1024w, /micro/two-from-greg-brockman/brockman_swyx_alessio_hu17017189952302382622.jpg 1280w, /micro/two-from-greg-brockman/brockman_swyx_alessio_hu4770974927620526187.jpg 1536w" sizes="100vw" />
    
      <img src="/micro/two-from-greg-brockman/brockman_swyx_alessio_hu2198943248024680688.jpg" width="1601" height="831" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>On <a href="https://share.snipd.com/episode/8dd5816c-f788-446e-9a8b-679b141718ce">why online learning and RL amplify model value</a>:</p>
<blockquote>
<p>When the models are extremely capable, the value of a token they generate is extremely high.</p>
</blockquote>
<p>On <a href="https://share.snipd.com/episode/8dd5816c-f788-446e-9a8b-679b141718ce">compute as the primary bottleneck for progress</a>:</p>
<blockquote>
<p>The bottleneck is always compute. If you give us a lot of compute, we will find ways to iterate that actually make the most of that compute.</p>
</blockquote>
<p>On <a href="https://share.snipd.com/episode/8dd5816c-f788-446e-9a8b-679b141718ce">OpenAI&rsquo;s Dota RL scaling experience</a> (<a href="https://openai.com/research/openai-five">Dota</a>):</p>
<blockquote>
<p>You come back to the office every week, they doubled the number of cores. And suddenly the agent&rsquo;s TrueSkill was going up and to the right.</p>
</blockquote>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/micro/two-from-greg-brockman/brockman_ls.png"><picture>
    <source type="image/webp" srcset="/micro/two-from-greg-brockman/brockman_ls_hu2494602966499780913.webp 320w, /micro/two-from-greg-brockman/brockman_ls_hu4237604431001230514.webp 640w, /micro/two-from-greg-brockman/brockman_ls_hu5112511399272289270.webp 768w, /micro/two-from-greg-brockman/brockman_ls_hu1888463789084091015.webp 1024w, /micro/two-from-greg-brockman/brockman_ls_hu8582253909604932835.webp 1280w" sizes="100vw" />
    <source type="image/jpeg" srcset="/micro/two-from-greg-brockman/brockman_ls_hu6376619516223835127.jpg 320w, /micro/two-from-greg-brockman/brockman_ls_hu17053214576341234098.jpg 640w, /micro/two-from-greg-brockman/brockman_ls_hu16728727570043998028.jpg 768w, /micro/two-from-greg-brockman/brockman_ls_hu11881995452821857532.jpg 1024w, /micro/two-from-greg-brockman/brockman_ls_hu13289050655432285122.jpg 1280w" sizes="100vw" />
    
      <img src="/micro/two-from-greg-brockman/brockman_ls_hu17053214576341234098.jpg" width="1381" height="837" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>On <a href="https://share.snipd.com/episode/8dd5816c-f788-446e-9a8b-679b141718ce">transferability of learned reasoning skills to new domains</a>:</p>
<blockquote>
<p>Learning to solve hard math problems and write proofs turns out to actually transfer to writing programs and competition problems.</p>
</blockquote>
<p>On <a href="https://share.snipd.com/episode/8dd5816c-f788-446e-9a8b-679b141718ce">the purpose and importance of publishing a model spec</a>:</p>
<blockquote>
<p>The model spec is an example of where we&rsquo;ve made it very legible to the outside world what our intention is for this model to do.</p>
</blockquote>
<p>On <a href="https://share.snipd.com/episode/8dd5816c-f788-446e-9a8b-679b141718ce">pricing, cost curves, and demand elasticity</a>:</p>
<blockquote>
<p>If you just make it more accessible and available to people, they will use way more of it.</p>
</blockquote>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/micro/two-from-greg-brockman/swyx_alessio.png"><picture>
    <source type="image/webp" srcset="/micro/two-from-greg-brockman/swyx_alessio_hu18154636800837545804.webp 320w, /micro/two-from-greg-brockman/swyx_alessio_hu17377911426580861090.webp 640w, /micro/two-from-greg-brockman/swyx_alessio_hu17276178742545984337.webp 768w, /micro/two-from-greg-brockman/swyx_alessio_hu5050674097482683636.webp 1024w, /micro/two-from-greg-brockman/swyx_alessio_hu17506292355019239236.webp 1280w" sizes="100vw" />
    <source type="image/jpeg" srcset="/micro/two-from-greg-brockman/swyx_alessio_hu9152379199239788045.jpg 320w, /micro/two-from-greg-brockman/swyx_alessio_hu12321821519529986857.jpg 640w, /micro/two-from-greg-brockman/swyx_alessio_hu11359753525898438902.jpg 768w, /micro/two-from-greg-brockman/swyx_alessio_hu1771998690948066851.jpg 1024w, /micro/two-from-greg-brockman/swyx_alessio_hu14191506002122476391.jpg 1280w" sizes="100vw" />
    
      <img src="/micro/two-from-greg-brockman/swyx_alessio_hu12321821519529986857.jpg" width="1434" height="819" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>On <a href="https://share.snipd.com/episode/8dd5816c-f788-446e-9a8b-679b141718ce">how to structure codebases for AI integration</a>:</p>
<blockquote>
<p>You really build codebases around the strengths and weaknesses of these models. More self-contained units have very good unit tests that run super quickly and that have good documentation.</p>
</blockquote>
<p>On <a href="https://share.snipd.com/episode/8dd5816c-f788-446e-9a8b-679b141718ce">structuring workloads and using multiple model instances</a>:</p>
<blockquote>
<p>You want to be a manager of not an agent, but of agents. And so that you need to, first of all, think about how your code base is structured.</p>
</blockquote>
<p>On <a href="https://share.snipd.com/episode/8dd5816c-f788-446e-9a8b-679b141718ce">the desired form factor for coding agents</a>:</p>
<blockquote>
<p>You want the pair form factor. You also want the remote async form factor. And you want it to be one entity that has knowledge and memory across all of this.</p>
</blockquote>
<p>I hope Brockman is as wise about navigating the political and moral minefields we face today as he clearly is about technology.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Use GPT-5 with Playwright MCP to Auto-link Posts</title>
      <link>https://craftycto.com/micro/auto-link-with-gpt-5-and-playwright-mcp/</link>
      <pubDate>Sun, 10 Aug 2025 19:54:00 -0500</pubDate>
      
      <guid>https://craftycto.com/micro/auto-link-with-gpt-5-and-playwright-mcp/</guid>
      <description>Quick Cursor workflow using GPT-5 plus Playwright MCP to add named-entity links in blog posts.</description>
      <content:encoded><![CDATA[<p>About a month ago, <a href="https://www.linkedin.com/in/jxnlco/">Jason Liu</a> wrote on <a href="https://www.linkedin.com/posts/jxnlco_copywriting-with-cursor-activity-7341510257602347009-r7x8?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAABcNZsBSiPCgo2Gr9o5anU48UXl131Q-_M">LinkedIn</a>:</p>
<blockquote>
<p>the cursor for writing is just cursor,</p>
<p>my copy writer uses @cursor_ai for most of the writing now.</p>
</blockquote>
<p>I strongly identified&mdash;the key here is that <a href="https://daringfireball.net/projects/markdown/">Markdown</a> is an outstanding content authoring language, and <a href="https://www.cursor.com">Cursor</a>, as an AI-enabled editor, is quite smart about applying AI to Markdown. I leveraged Cursor&rsquo;s AI capabilities mostly for repetitive tasks like scaffolding social media directory structures, transforming my hand-written posts into smaller, social-friendly versions for LinkedIn and Mastodon, and creating tables such as those in my <a href="/tags/aiewf2025/">AI Engineer World&rsquo;s Fair writeups</a>.<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup></p>
<p>Over the past few days, I&rsquo;ve been exploring what OpenAI&rsquo;s <a href="https://openai.com/index/introducing-gpt-5/">GPT-5</a> model(s) are capable of in Cursor. I&rsquo;ve also been seeking real-life, high-value <a href="https://modelcontextprotocol.io/">MCP</a> use cases. I stumbled into an awesome little scenario that gave me a chance to apply GPT-5 and MCP together and shave hours off my post-preparation time. Full disclosure: I haven&rsquo;t yet tried this workflow with other models like my go-to <a href="https://www.anthropic.com/claude/sonnet">Sonnet 4</a>, and it&rsquo;s possible the same workflow works there as well. That said, I suspect GPT-5&rsquo;s way of approaching problems might have an edge in this case. See what you think&mdash;this post is an example: 90% of its links were GPT-5-generated, and all of them were correct.</p>
<h3 id="problem-auto-link-generation-for-named-entities">Problem: Auto Link Generation for &ldquo;Named Entities&rdquo;</h3>
<p>My posting workflow is: write the post, then polish it&mdash;often with images and links. Links are one of those tasks that just take time: first, selecting which items (the ML term <a href="https://en.wikipedia.org/wiki/Named-entity_recognition">&ldquo;named entities&rdquo;</a> is exactly right here) need links, then searching the web to find the right authoritative source to link to, then inserting the Markdown link syntax. It&rsquo;s not too hard, but after the hard mental effort of writing the post itself, I&rsquo;m rarely in the mood for another 30 minutes of researching and link-crafting.</p>
<p>Before GPT-5 and tool use, I had the AI auto-generate links, but hallucination was often a serious problem. The links would get generated, but half of them would be wrong. Even this broken workflow saved time, but it seemed like the AI should be able to do better.</p>
<h3 id="solution-cursor--gpt-5--playwrighthttpsplaywrightdev-mcp--reusable-prompt">Solution: Cursor + GPT-5 + <a href="https://playwright.dev/">Playwright</a> MCP + Reusable Prompt</h3>
<p>As I started to understand how GPT-5 approached problems, I decided to ask it to auto-generate links. It did the best job of any model I had tried so far, but there were still bad links: on its own, GPT-5 could search the web, but it couldn&rsquo;t examine the results to make sure they were good and appropriate. So I had a little side chat with GPT-5 via <a href="https://openai.com/chatgpt/overview/">ChatGPT</a> about these failures, where we quickly concluded that a tool like Playwright, made available to GPT-5 via MCP and Cursor&rsquo;s very solid MCP support, would enable GPT-5 to check its work. After much MCP plumbing work (still early days for me) and many iterations of prompts to GPT-5, I got it working! Now 80&ndash;90% of my post links are AI-generated and accurate, and my posts end up fully resource-linked rather than sparsely or not at all.</p>
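<p>For readers curious what that &ldquo;check its work&rdquo; step amounts to outside of MCP, here&rsquo;s a minimal sketch of the verification loop using the Playwright Python API directly. The function names and the exact Cloudflare markers are my own illustration, not the actual tool calls GPT-5 makes through the MCP server:</p>

```python
# Sketch of a link-verification loop: load each candidate URL, detect a
# Cloudflare interstitial, retry once, and classify the result.
# All names here are illustrative assumptions, not part of Playwright MCP.

# Strings that tend to appear in Cloudflare challenge pages (assumption).
CF_MARKERS = ("Just a moment...", "cf-please-wait", "checking_browser", "challenge")

def looks_like_cf_challenge(html: str) -> bool:
    """Heuristic: does this page source look like a Cloudflare interstitial?"""
    return any(marker in html for marker in CF_MARKERS)

def verify_link(url: str, timeout_ms: int = 15000) -> str:
    """Return 'OK', 'UNRESOLVED (CF gate)', or 'ERROR' for a candidate link."""
    # Imported lazily so the detection helper works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        try:
            resp = page.goto(url, timeout=timeout_ms)
            if resp is None or resp.status >= 400:
                return "ERROR"  # unreachable or 4xx/5xx: don't use this link
            if looks_like_cf_challenge(page.content()):
                page.reload(timeout=timeout_ms)  # one retry, per the prompt below
                if looks_like_cf_challenge(page.content()):
                    return "UNRESOLVED (CF gate)"
            return "OK"
        finally:
            browser.close()
```

<p>The MCP server wraps this same browser automation behind tool calls, so the model&mdash;rather than a fixed script&mdash;decides when to navigate, wait, and retry.</p>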
<p>This was a valuable prompt, worthy of being captured and made reusable. So I had GPT-5 craft that prompt based on the conversations we&rsquo;d been having, and propose a solution for saving it for reuse.</p>
<p>Here&rsquo;s the current iteration of that prompt&mdash;note that it is definitely suboptimal and needs work:</p>
<pre tabindex="0"><code>## Named Entity Linking with Playwright

ToDos:
- link to other content in the blog is fine, look for self-references such as &#34;my AI Engineer World&#39;s Fair writeups&#34;
- people links: if linkedin, need to make sure this is the right person

**Task**: Use Playwright to add markdown links for all important non-obvious named entities in a blog post, avoiding obvious/common terms.

**Instructions**:

1. **Identify Named Entities**: Scan the post content for specific products, companies, technologies, platforms, and services that readers might want to learn more about. Important: only link the first appearance of a term.

2. **Avoid Obvious Terms**: Do NOT link common/obvious terms like:
   - Operating systems (macOS, Windows, Linux)
   - Generic platforms (GitHub, Twitter, YouTube)
   - Common file formats or protocols
   - Basic technical terms

3. **Link Strategy**: Link to **informational pages**, not direct tool access:
   - Product announcement pages
   - Overview/feature pages
   - Company about pages
   - Documentation pages
   - Avoid login/signup pages

4. **Playwright Testing Process**:
   - Navigate to each potential link URL
   - If you detect a Cloudflare interstitial (e.g., &#34;Just a moment...&#34;, element id `cf-please-wait`, `[data-translate=&#34;checking_browser&#34;]`, or iframe URL containing &#34;challenge&#34;):
     - Wait up to 30s, checking every 1-2s
     - If still present, reload once and wait again up to 15s
     - If still present, mark as &#34;UNRESOLVED (CF gate)&#34; and continue
   - Verify the page loads and contains relevant content
   - Only use links that are accessible

5. **Link Format**: Use standard markdown format: `[Entity Name](URL)`

6. **Content Preservation**: 
   - ONLY modify links
   - Do NOT change any other content
   - Ignore linter errors (they aren&#39;t accurate for markdown)

**Example Entities to Link**:
- AI models (GPT-5, Claude, Sonnet 4, Opus 4.1)
- AI platforms (ChatGPT, Cursor, Claude Artifacts)
- Specific technologies (MCP, Claude Artifacts, ChatGPT Canvas)
- Specialized services (Claude Team)
- Companies that aren&#39;t extremely well known (Dagger, MotherDuck, MongoDB)
- Product names that aren&#39;t extremely well known

**Example Entities to Avoid**:
- Extremely well known companies (Apple, Google, OpenAI, Anthropic, Microsoft)
- Extremely well known product names (macOS, GitHub, Windows, Linux, iPhone)
- Basic web platforms
- Common file formats
- Generic technical terms

**Output**: Update the markdown file with appropriate links while preserving all other content exactly as-is.
</code></pre><div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>Check out the <a href="https://maven.com/p/11e973/copywriting-with-cursor">Maven video</a> Jason and his copywriter <a href="https://www.linkedin.com/in/taylor-bell-0644b131/">Taylor Bell</a> did on copywriting with AI&mdash;really advanced on both the tech and science-of-great-copywriting fronts.&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</div>
]]></content:encoded>
    </item>
    
    <item>
      <title>Early GPT-5 Impressions</title>
      <link>https://craftycto.com/micro/early-gpt-5-impressions/</link>
      <pubDate>Sat, 09 Aug 2025 06:00:00 -0500</pubDate>
      
      <guid>https://craftycto.com/micro/early-gpt-5-impressions/</guid>
      <description>My vibe assessment: different, powerful, bumpy launch</description>
      <content:encoded><![CDATA[<p>I had a chance to get early experience with the new <a href="https://openai.com/index/introducing-gpt-5/">GPT-5</a> model(s) yesterday and this morning:</p>
<ul>
<li>In two flavors of <a href="https://openai.com/chatgpt/overview/">ChatGPT</a>: browser interface and macOS native app, including code generation inside a ChatGPT <a href="https://openai.com/index/introducing-canvas/">canvas</a></li>
<li>In <a href="https://www.cursor.com">Cursor</a> as its coding model</li>
</ul>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/micro/early-gpt-5-impressions/gpt5_macos.png"><picture>
    <source type="image/webp" srcset="/micro/early-gpt-5-impressions/gpt5_macos_hu6066577046527470273.webp 320w, /micro/early-gpt-5-impressions/gpt5_macos_hu15487300892219872600.webp 640w, /micro/early-gpt-5-impressions/gpt5_macos_hu16938528598757482080.webp 768w, /micro/early-gpt-5-impressions/gpt5_macos_hu6313930793548227663.webp 1024w, /micro/early-gpt-5-impressions/gpt5_macos_hu15101475725982181735.webp 1280w" sizes="100vw" />
    <source type="image/jpeg" srcset="/micro/early-gpt-5-impressions/gpt5_macos_hu7709442907568880818.jpg 320w, /micro/early-gpt-5-impressions/gpt5_macos_hu8545030730759076498.jpg 640w, /micro/early-gpt-5-impressions/gpt5_macos_hu7846707855554626735.jpg 768w, /micro/early-gpt-5-impressions/gpt5_macos_hu17068652097357875328.jpg 1024w, /micro/early-gpt-5-impressions/gpt5_macos_hu14265789895823189748.jpg 1280w" sizes="100vw" />
    
      <img src="/micro/early-gpt-5-impressions/gpt5_macos_hu8545030730759076498.jpg" width="1412" height="1042" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>I had a juicy challenge for ChatGPT with the GPT-5 model: at TSIP, we hit a problem with a mathematical formula used in our advanced soil sample design process, and had to rework the formula. I had been working in <a href="https://www.anthropic.com/claude">Claude</a>, using <a href="https://www.anthropic.com/claude/sonnet">Sonnet 4</a>, then upgrading to <a href="https://www.anthropic.com/">Anthropic</a>&rsquo;s smartest model, <a href="https://www.anthropic.com/claude/opus">Opus 4.1</a>, when the going got tough, creating interactive models as <a href="https://support.anthropic.com/en/articles/9487310-what-are-artifacts-and-how-do-i-use-them">Claude Artifacts</a>. Then I ran out of tokens in my <a href="https://www.anthropic.com/team">Claude Team</a> account, and rather than waiting, picked up the work in process and jumped over to ChatGPT, where I was happy to see the GPT-5 model available, some of the time &hellip; Part of the bumpy launch was the GPT-5 rollout. My experience was that GPT-5 appeared in my <a href="https://openai.com/chatgpt/team/">Team accounts</a> first, and my personal <a href="https://openai.com/chatgpt/pricing/">Plus account</a> later.</p>
<p>Over in ChatGPT, GPT-5 was able to ingest Claude Opus&rsquo;s Artifact code, along with the chat transcript for context, and its first interactive canvas was already better than what Opus had produced in a couple of iterations&mdash;though GPT-5 had the benefit of Opus&rsquo;s work. We thought we had the formula figured out, but then ran into problems when we tested the full range of possible inputs. GPT-5 impressed with highly creative&mdash;and ultimately successful&mdash;suggested solutions.</p>
<p>The bumpy rollout was evident in other ways during this work&mdash;at several points, GPT-5 wiped its canvas code, which fortunately I had saved off, and also lost its ability to directly edit code in the canvas.</p>
<p>Once the formula was solved, I needed to document it and capture the proof of concept canvas code in our GitHub repo, so I jumped over to Cursor, apparently a GPT-5 launch partner as they had it integrated as soon as <a href="https://openai.com/about/">OpenAI</a> unleashed GPT-5 on the world. Here, GPT-5&rsquo;s &ldquo;different&rdquo; nature became apparent; chatting with GPT-5-as-coding-model, to me, has a clearly different tone and cadence than interacting with other models. GPT-5 is, among other things, a multi-model router, and you could see it considering: &ldquo;Hmmm &hellip; how tough a question is this? Pretty tough, it seems, let me engage my hard thinking mode (model)&rdquo; &hellip; this approach worked well on all of the tasks I&rsquo;ve sent its way so far, both in Cursor and in ChatGPT when creating the interactive canvas proof of concept.</p>
<p>All in all: OpenAI is definitely moving the state of the art forward here with GPT-5, and also giving the competitive landscape a good kick in the ass with aggressively good pricing. Welcome, GPT-5!</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Running OpenAI&#39;s gpt-oss on Mac Mini with MLX</title>
      <link>https://craftycto.com/micro/gpt-oss-mlx-mac-mini/</link>
      <pubDate>Wed, 06 Aug 2025 08:00:00 -0500</pubDate>
      
      <guid>https://craftycto.com/micro/gpt-oss-mlx-mac-mini/</guid>
      <description>Thanks to LM Studio and the community, it&amp;#39;s insanely easy. And gpt-oss is insanely good.</description>
      <content:encoded><![CDATA[<p>OpenAI dropped their open source, <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache-licensed</a> <a href="https://openai.com/open-models/">&ldquo;gpt-oss&rdquo; models</a> yesterday, which obviously is getting a lot of attention.</p>
<p>I&rsquo;ve experimented with local models over the years, mainly to assess &ldquo;how smart is it?&rdquo; and &ldquo;how badly does it eat up local resources?&rdquo; Generally they weren&rsquo;t really something I&rsquo;d use in anger when I had access to much more powerful OpenAI and <a href="https://www.anthropic.com/">Anthropic</a> models.</p>
<p>I hadn&rsquo;t done a local LLM install for quite a while though, not since I got my M4 Mac Mini with 64GB of unified CPU/GPU memory &hellip; and gpt-oss is a full reasoning model, like OpenAI&rsquo;s <a href="https://openai.com/index/openai-o3">o3</a>, so it seemed like the time was right to do another local model eval.</p>
<p>Several wows here:</p>
<ul>
<li>Wow&mdash;it is so easy to get local models running these days. <a href="https://lmstudio.ai/">LM Studio</a> is amazing.</li>
<li>Wow&mdash;the &ldquo;local models&rdquo; community is incredible. Within 24 hours, a fully-optimized-for-Mac version of gpt-oss (using Apple&rsquo;s <a href="https://github.com/ml-explore/mlx">MLX</a> framework) was ready and waiting for download in LM Studio.</li>
<li>Wow&mdash;gpt-oss is a for-real powerful model, not a toy like the ones I&rsquo;d experimented with a year or two back.</li>
</ul>
<p>I had a few research questions floating around that I hadn&rsquo;t had a chance to ask o3 about yet, so just for grins I decided to throw them at gpt-oss. I got strong, useful, reasoning-grade responses, such as this one on a complex topic relating to the <a href="https://gohugo.io/">Hugo</a> static site generator:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/micro/gpt-oss-mlx-mac-mini/gpt_oss_mac_mini.png"><picture>
    <source type="image/webp" srcset="/micro/gpt-oss-mlx-mac-mini/gpt_oss_mac_mini_hu12156512947336471446.webp 320w, /micro/gpt-oss-mlx-mac-mini/gpt_oss_mac_mini_hu12723794839415305637.webp 640w, /micro/gpt-oss-mlx-mac-mini/gpt_oss_mac_mini_hu14357153214874659406.webp 768w, /micro/gpt-oss-mlx-mac-mini/gpt_oss_mac_mini_hu16608444733650435646.webp 1024w, /micro/gpt-oss-mlx-mac-mini/gpt_oss_mac_mini_hu3569548239875095974.webp 1280w, /micro/gpt-oss-mlx-mac-mini/gpt_oss_mac_mini_hu6127025594201602967.webp 1536w, /micro/gpt-oss-mlx-mac-mini/gpt_oss_mac_mini_hu12259039851272362423.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/micro/gpt-oss-mlx-mac-mini/gpt_oss_mac_mini_hu7183381615529189044.jpg 320w, /micro/gpt-oss-mlx-mac-mini/gpt_oss_mac_mini_hu5070517471951157441.jpg 640w, /micro/gpt-oss-mlx-mac-mini/gpt_oss_mac_mini_hu6589025825779226543.jpg 768w, /micro/gpt-oss-mlx-mac-mini/gpt_oss_mac_mini_hu3169738475768924058.jpg 1024w, /micro/gpt-oss-mlx-mac-mini/gpt_oss_mac_mini_hu18050262742012249519.jpg 1280w, /micro/gpt-oss-mlx-mac-mini/gpt_oss_mac_mini_hu16050905115251314376.jpg 1536w, /micro/gpt-oss-mlx-mac-mini/gpt_oss_mac_mini_hu440557965021418502.jpg 1920w" sizes="100vw" />
    
      <img src="/micro/gpt-oss-mlx-mac-mini/gpt_oss_mac_mini_hu5070517471951157441.jpg" width="2196" height="1693" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Verdict: good enough to actually use.</p>
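<p>If you want to script against the model rather than chat in LM Studio&rsquo;s UI, LM Studio can also expose an OpenAI-compatible local server. Here&rsquo;s a minimal sketch assuming that server is running on its default port; the model id below is a guess&mdash;use whatever <code>GET /v1/models</code> reports for the MLX build you downloaded:</p>

```python
# Minimal sketch: query a locally served gpt-oss via LM Studio's
# OpenAI-compatible endpoint (default http://localhost:1234/v1).
# The model id is an assumption; check GET /v1/models for the real one.
import json
import urllib.request

def build_chat_request(prompt: str, model: str = "openai/gpt-oss-20b") -> dict:
    """Assemble an OpenAI-style chat-completion payload for the local server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def ask_local_model(prompt: str) -> str:
    """POST the payload to LM Studio's local server and return the reply text."""
    payload = build_chat_request(prompt)
    req = urllib.request.Request(
        "http://localhost:1234/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

<p>Because the API shape matches OpenAI&rsquo;s, most existing client code can be pointed at the local server just by swapping the base URL.</p>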
]]></content:encoded>
    </item>
    
    <item>
      <title>Orta Therox: 6 Weeks of Claude Code</title>
      <link>https://craftycto.com/elsewhere/orta-claude-code-six-weeks/</link>
      <pubDate>Sat, 02 Aug 2025 10:00:00 -0600</pubDate>
      
      <guid>https://craftycto.com/elsewhere/orta-claude-code-six-weeks/</guid>
      <description>Orta Therox reflects on six weeks using Claude Code for programming, describing it as transformative for maintenance tasks and side projects</description>
      <content:encoded><![CDATA[<p><a href="https://www.linkedin.com/in/ortatherox/">Orta Therox</a>&rsquo;s <a href="https://blog.puzzmo.com/posts/2025/07/30/six-weeks-of-claude-code/">retrospective on six weeks with Claude Code</a> is one of the most compelling accounts I&rsquo;ve read of how AI coding assistants are transforming development workflows. His experience mirrors my own observations about the technology&rsquo;s impact on programming productivity.<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup></p>
<p>The most striking aspect is his description of how maintenance tasks that would traditionally require weeks of dedicated effort can now be accomplished as &ldquo;side projects&rdquo; while working on other things:</p>
<blockquote>
<ul>
<li>Converting hundreds of React Native components to just React</li>
<li>Replaced 3 non-trivial RedwoodJS systems with home-grown or mature, supported replacements</li>
<li>Built complex REPLs for multiple internal and external projects</li>
<li>Switched almost every db model to have a consistent &lsquo;flags&rsquo; system for booleans</li>
<li>Converted from Jest to Vitest</li>
<li>Created our front-end testing strategies for React</li>
</ul>
</blockquote>
<p>This list represents what would typically be years of technical debt backlog, completed in just over a month and a half. The key insight is that these tasks &ldquo;do not need to be treated as debt&rdquo; when you have the right tools&mdash;they become manageable background work.</p>
<p>Orta&rsquo;s comparison to the &ldquo;introduction of photography&rdquo; period is particularly apt. Just as photography didn&rsquo;t eliminate painting but changed its role and accessibility, AI coding assistants are fundamentally altering what it means to be a programmer. The ability to &ldquo;instantly create a whole scene instead of going line by line, word by word&rdquo; represents a new paradigm of expression.</p>
<p>His &ldquo;Write First, Decide Later&rdquo; approach is especially valuable&mdash;using Claude Code to explore ideas before fully committing to them, then deleting the experimental code if it doesn&rsquo;t work out. This lowers the barrier to experimentation dramatically.</p>
<p>The post also touches on the collaborative aspects, with his &ldquo;Two Clones lifestyle&rdquo; approach and how it&rsquo;s changing game design workflows at Puzzmo. The ability to rapidly prototype and iterate is transforming their development process from weeks of planning to hours of implementation.</p>
<p>I need to figure out Therox&rsquo;s trick for getting Claude conversations into Adium-themed expandables &hellip; might have to reach out!</p>
<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>I use Cursor to help me author pieces in this blog, Crafty&rsquo;s Illustrated. Cursor generally does the grunt work of creating content directories and markdown stubs. I almost never let the LLM write the actual content. This post is an exception&mdash;the LLM wrote and I did post-edits&mdash;and I think whatever model Cursor selected did a nice job on this.&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</div>
]]></content:encoded>
    </item>
    
    <item>
      <title>AI Engineer World&#39;s Fair 2025: My Day 2 Highlights</title>
      <link>https://craftycto.com/blog/aiewf2025-my-day-2-highlights/</link>
      <pubDate>Fri, 01 Aug 2025 16:36:00 -0400</pubDate>
      
      <guid>https://craftycto.com/blog/aiewf2025-my-day-2-highlights/</guid>
      <description>My observations on Day 2 of the AI Engineer World&amp;#39;s Fair 2025</description>
      <content:encoded><![CDATA[<p>In my opinion, the two core days of the AI Engineer World&rsquo;s Fair 2025 in San Francisco represent the ultimate condensation of the state of AI, delivered by many of the brightest minds in the field. This post covers my Day 2 experience&mdash;in case you missed it, here&rsquo;s my <a href="/blog/aiewf2025-my-day-1-highlights/">Day 1 recap</a>.</p>
<p>I&rsquo;m slow in publishing this, for which I feel bad, but thoughtfully re-watching the sessions 4-8 weeks after seeing them in person has been a tremendous learning exercise for me, and will, I hope, add to the power of my observations.</p>
<p>Here&rsquo;s a quick table of contents; click to jump to the details.</p>
<table>
  <thead>
      <tr>
          <th style="text-align: left">Section</th>
          <th style="text-align: left">Session</th>
          <th style="text-align: left">Speaker</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left">Morning Keynotes</td>
          <td style="text-align: left"><a href="#gemini-25-pro-06-05-launch---a-year-of-shipping-and-what-comes-next">Gemini 2.5 Pro 06-05 Launch - A Year of Shipping and What Comes Next</a></td>
          <td style="text-align: left">Logan Kilpatrick (Google DeepMind)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#thinking-deeper-in-gemini">Thinking Deeper in Gemini</a></td>
          <td style="text-align: left">Jack Rae (Google DeepMind)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#containing-agent-chaos-use-container-launch">Containing Agent Chaos (use-container launch)</a></td>
          <td style="text-align: left">Solomon Hykes (Dagger)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#infrastructure-for-the-singularity">Infrastructure for the Singularity</a></td>
          <td style="text-align: left">Jesse Han (Morph)</td>
      </tr>
      <tr>
          <td style="text-align: left">SWE Agent Track</td>
          <td style="text-align: left"><a href="#devin-20-and-the-future-of-swe">Devin 2.0 and the Future of SWE</a></td>
          <td style="text-align: left">Scott Wu (Cognition)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#your-coding-agent-just-got-cloned">Your Coding Agent Just Got Cloned</a></td>
          <td style="text-align: left">Rustin Banks (Google Labs)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#the-agent-awakens">The Agent Awakens</a></td>
          <td style="text-align: left">Christopher Harrison (GitHub)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#dont-get-one-shotted">Don&rsquo;t get one-shotted</a></td>
          <td style="text-align: left">Tomas Reimers (Graphite)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#claude-code--the-evolution-of-agentic-coding">Claude Code &amp; the evolution of Agentic Coding</a></td>
          <td style="text-align: left">Boris Cherny (Anthropic)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#software-dev-agents-what-works--what-doesnt">Software Dev Agents: What Works &amp; What Doesn&rsquo;t</a></td>
          <td style="text-align: left">Robert Brennan (All Hands AI)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#beyond-the-prototype">Beyond the Prototype</a></td>
          <td style="text-align: left">Josh Albrecht (Imbue)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#ship-production-software-in-minutes-not-months">Ship Production Software in Minutes, Not Months</a></td>
          <td style="text-align: left">Eno Reyes (Factory AI)</td>
      </tr>
      <tr>
          <td style="text-align: left">Afternoon Keynotes</td>
          <td style="text-align: left"><a href="#trends-across-the-ai-frontier">Trends across the AI Frontier</a></td>
          <td style="text-align: left">George Cameron (Artificial Analysis)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#state-of-ai-engineering-2025">State of AI Engineering 2025</a></td>
          <td style="text-align: left">Barr Yaron (Amplify Partners)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#fun-stories">Fun Stories from Building OpenRouter and Where All This is Going</a></td>
          <td style="text-align: left">Alex Atallah (OpenRouter)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#the-new-code">The New Code</a></td>
          <td style="text-align: left">Sean Grove (OpenAI)</td>
      </tr>
  </tbody>
</table>
<h2 id="gemini-25-pro-06-05-launch---a-year-of-shipping-and-what-comes-next">Gemini 2.5 Pro 06-05 Launch - A Year of Shipping and What Comes Next</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/logankilpatrick/">Logan Kilpatrick</a> (<a href="https://deepmind.google/">Google DeepMind</a>) &ndash; <a href="https://www.youtube.com/watch?v=U-fMsbY-kHY&amp;t=1217s">Session video</a></p>
<p>My level of respect for Google as a model lab has seriously lagged my regard for Anthropic and OpenAI, but I came away from Logan Kilpatrick&rsquo;s and Jack Rae&rsquo;s sessions recognizing how serious a player they are&mdash;very strong today, with a great pipeline. Google DeepMind officially launched their most powerful model to date, Gemini 2.5 Pro, live at the event. &ldquo;I think [2.5 Pro] is setting the stage for the future of Gemini. I think 2.5 Pro for us internally and I think in the perception from the developer ecosystem was the turning point which was super exciting. It&rsquo;s awesome to see the momentum. We&rsquo;ve got a bunch of other great models coming as well.&rdquo;</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_25_pro.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_25_pro_hu235753493892740691.webp 320w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_25_pro_hu9863160239201677922.webp 640w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_25_pro_hu3689574253826967671.webp 768w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_25_pro_hu12602815528595852662.webp 1024w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_25_pro_hu13264853592760720527.webp 1280w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_25_pro_hu8387857104166789258.webp 1536w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_25_pro_hu15838536561055061796.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_25_pro_hu6614260341149681212.jpg 320w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_25_pro_hu4510199667256838201.jpg 640w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_25_pro_hu8780613822994715869.jpg 768w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_25_pro_hu15992618545704656192.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_25_pro_hu8356425244984378885.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_25_pro_hu12503377231444408877.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_25_pro_hu1345791408324004729.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_25_pro_hu4510199667256838201.jpg" width="2942" height="1254" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Kilpatrick highlighted a year of Gemini progress: &ldquo;It feels like 10 years of Gemini stuff packed into the last 12 months, which has been awesome&rdquo;&mdash;and talked about how so much Google research is coming together in the new Gemini models: &ldquo;All of these different research bets across DeepMind [are] coming together to build this incredible mainline Gemini model. What is the DeepMind strategy? What&rsquo;s the advantage for us building models? &hellip; the interesting thing to me is this breadth of research happening across science and Gemini and robotics and things like that &hellip; all that actually ends up upstreaming into the mainline models which is super exciting.&rdquo;</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_year.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_year_hu6785846834219634744.webp 320w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_year_hu151645066056797986.webp 640w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_year_hu14451543937877676049.webp 768w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_year_hu7137934718971165996.webp 1024w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_year_hu9714588492920517500.webp 1280w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_year_hu12873236981254952646.webp 1536w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_year_hu14795010011455640048.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_year_hu9132122688286908369.jpg 320w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_year_hu10604924036240023538.jpg 640w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_year_hu13682268260154968980.jpg 768w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_year_hu14111888209104885988.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_year_hu6116545622566308193.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_year_hu2090057695900429168.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_year_hu17492058990413483856.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_year_hu10604924036240023538.jpg" width="2756" height="1458" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>He emphasized it&rsquo;s &ldquo;not just the pace of innovation but the pace of adoption &hellip; a 50x increase in the amount of AI inference that&rsquo;s being processed through Google servers from one year ago to last month &hellip;&rdquo; and highlighted the reception to VEO: &ldquo;It&rsquo;s burning all the TPUs down.&rdquo;</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_adoption.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_adoption_hu1567764956410496581.webp 320w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_adoption_hu15667546438216144955.webp 640w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_adoption_hu17640338321390757822.webp 768w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_adoption_hu2146847691385076353.webp 1024w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_adoption_hu15941830614007726456.webp 1280w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_adoption_hu10419764396905813679.webp 1536w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_adoption_hu7453917897507858597.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_adoption_hu15157361214432604202.jpg 320w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_adoption_hu17885094172626143661.jpg 640w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_adoption_hu1269473117212817538.jpg 768w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_adoption_hu6023134506848705052.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_adoption_hu3644545262543952924.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_adoption_hu14310013253867795808.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_adoption_hu17307908612667145864.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_adoption_hu17885094172626143661.jpg" width="2850" height="1138" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Among a number of future themes, he says Google models will be increasingly &ldquo;agentic by default.&rdquo; All in all, an impressive story from Google DeepMind.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_future.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_future_hu15698381519887106776.webp 320w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_future_hu11282121895036461354.webp 640w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_future_hu159383636396882571.webp 768w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_future_hu17178664807607502220.webp 1024w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_future_hu14302445819406693400.webp 1280w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_future_hu11356315695219727208.webp 1536w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_future_hu17201626215540028757.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_future_hu586540946577893791.jpg 320w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_future_hu13867238255254251190.jpg 640w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_future_hu17353203301491908590.jpg 768w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_future_hu17664155090373757466.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_future_hu102852268381403632.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_future_hu9339859516734319135.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_future_hu9492978631837471753.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/kilpatrick_gemini_future_hu13867238255254251190.jpg" width="2834" height="1250" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="thinking-deeper-in-gemini">Thinking Deeper in Gemini</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/jackrae/">Jack Rae</a> (<a href="https://deepmind.google/">Google DeepMind</a>) &ndash; <a href="https://www.youtube.com/watch?v=U-fMsbY-kHY&amp;t=1943s">Session video</a></p>
<p>Google DeepMind&rsquo;s Jack Rae did a great job taking us back in time to the first language models up to the present day, with emphasis on test-time compute. (Late-breaking news: Rae has left Google to join Meta, likely as part of their new Superintelligence Labs organization.)</p>
<p>Traditional language models are trained to respond immediately to requests, meaning there is a constant, very limited amount of compute applied at test-time. It&rsquo;s possible to increase TTC by using a larger model, but then all requests are forced to this fixed, larger amount of compute. Not a good scaling approach&mdash;much better is to train models that are designed to loop. Loop iteration scale is dynamic, so we can get orders of magnitude more TTC, with control over how hard the model &ldquo;thinks.&rdquo; (Later Rae mentions that &ldquo;thinking budgets&rdquo; are now part of the API in the Gemini Flash 2.5 and Pro 2.5 models. Developers get fine-grained control over how much TTC gets applied.)</p>
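<p>Rae&rsquo;s loop framing is easy to sketch in toy form: keep a fixed-cost &ldquo;model step,&rdquo; and make the number of loop iterations the knob, so the same underlying model can spend more or less compute per request. The sketch below, with one Newton iteration for square roots standing in for a model step, is purely illustrative: it is not how Gemini implements thinking budgets, just the scaling dynamic the talk described.</p>

```python
# Toy illustration of dynamic test-time compute: a fixed-cost "model step"
# run in a loop whose length we control, as a stand-in for a thinking budget.
# Purely illustrative -- not Gemini's actual mechanism.

def newton_step(x: float, estimate: float) -> float:
    """One fixed-cost refinement step (the 'model')."""
    return 0.5 * (estimate + x / estimate)

def solve(x: float, thinking_budget: int) -> float:
    """Apply the same step repeatedly; the budget sets test-time compute."""
    estimate = x  # crude initial guess
    for _ in range(thinking_budget):
        estimate = newton_step(x, estimate)
    return estimate

# More budget, better answer, identical underlying "model":
for budget in (1, 3, 8):
    print(budget, solve(2.0, budget))
```

<p>The per-step cost is constant, but the answer improves as the budget grows: that is the dynamic that makes looping a better scaling axis than simply reaching for a bigger model.</p>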
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/rae-ttc-bottleneck.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/rae-ttc-bottleneck_hu5871868557804433363.webp 320w, /blog/aiewf2025-my-day-2-highlights/rae-ttc-bottleneck_hu1067057150180813475.webp 640w, /blog/aiewf2025-my-day-2-highlights/rae-ttc-bottleneck_hu14577951656087632386.webp 768w, /blog/aiewf2025-my-day-2-highlights/rae-ttc-bottleneck_hu16765990453118125085.webp 1024w, /blog/aiewf2025-my-day-2-highlights/rae-ttc-bottleneck_hu82160810396726936.webp 1280w, /blog/aiewf2025-my-day-2-highlights/rae-ttc-bottleneck_hu7934042332483353177.webp 1536w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/rae-ttc-bottleneck_hu15776666686732562906.jpg 320w, /blog/aiewf2025-my-day-2-highlights/rae-ttc-bottleneck_hu7377454333344124290.jpg 640w, /blog/aiewf2025-my-day-2-highlights/rae-ttc-bottleneck_hu17610370170970033042.jpg 768w, /blog/aiewf2025-my-day-2-highlights/rae-ttc-bottleneck_hu3614657495037545780.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/rae-ttc-bottleneck_hu8905873071489371500.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/rae-ttc-bottleneck_hu9865538562970873926.jpg 1536w" sizes="37vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/rae-ttc-bottleneck_hu7377454333344124290.jpg" width="1689" height="766" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[75%] mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/rae-loop-scale-ttc.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/rae-loop-scale-ttc_hu7729846747267456456.webp 320w, /blog/aiewf2025-my-day-2-highlights/rae-loop-scale-ttc_hu15589322939265239763.webp 640w, /blog/aiewf2025-my-day-2-highlights/rae-loop-scale-ttc_hu2400880224069697285.webp 768w, /blog/aiewf2025-my-day-2-highlights/rae-loop-scale-ttc_hu14954592890931987044.webp 1024w, /blog/aiewf2025-my-day-2-highlights/rae-loop-scale-ttc_hu9202349564489692035.webp 1280w, /blog/aiewf2025-my-day-2-highlights/rae-loop-scale-ttc_hu6033569055815997283.webp 1536w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/rae-loop-scale-ttc_hu5201758167572210904.jpg 320w, /blog/aiewf2025-my-day-2-highlights/rae-loop-scale-ttc_hu11796855562890795506.jpg 640w, /blog/aiewf2025-my-day-2-highlights/rae-loop-scale-ttc_hu7929231655859372256.jpg 768w, /blog/aiewf2025-my-day-2-highlights/rae-loop-scale-ttc_hu2714112570648162046.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/rae-loop-scale-ttc_hu10591501750378778920.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/rae-loop-scale-ttc_hu9094549528552377448.jpg 1536w" sizes="37vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/rae-loop-scale-ttc_hu11796855562890795506.jpg" width="1616" height="749" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Scaling test-time compute delivers benefits across all model paradigms&mdash;there&rsquo;s a feedback-loop effect. If you wonder &ldquo;how can model (AI) progress keep accelerating?&rdquo; this is a great example: scaled TTC feeds back to accelerate development of more powerful base models.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/rae-ttc-scaling-accel.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/rae-ttc-scaling-accel_hu12981540905030175546.webp 320w, /blog/aiewf2025-my-day-2-highlights/rae-ttc-scaling-accel_hu15804332698758871668.webp 640w, /blog/aiewf2025-my-day-2-highlights/rae-ttc-scaling-accel_hu65287006101171298.webp 768w, /blog/aiewf2025-my-day-2-highlights/rae-ttc-scaling-accel_hu5470534139735033648.webp 1024w, /blog/aiewf2025-my-day-2-highlights/rae-ttc-scaling-accel_hu6331334747021886346.webp 1280w, /blog/aiewf2025-my-day-2-highlights/rae-ttc-scaling-accel_hu18028849852928482313.webp 1536w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/rae-ttc-scaling-accel_hu17223651784230490756.jpg 320w, /blog/aiewf2025-my-day-2-highlights/rae-ttc-scaling-accel_hu14796981150731004067.jpg 640w, /blog/aiewf2025-my-day-2-highlights/rae-ttc-scaling-accel_hu6350004005559682435.jpg 768w, /blog/aiewf2025-my-day-2-highlights/rae-ttc-scaling-accel_hu10432932961883426796.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/rae-ttc-scaling-accel_hu733856579153138076.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/rae-ttc-scaling-accel_hu14674901555017521598.jpg 1536w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/rae-ttc-scaling-accel_hu14796981150731004067.jpg" width="1625" height="746" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Rae describes the emergent behavior that&rsquo;s appearing through scaled TTC&mdash;effectively, the model learns new thinking strategies.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/rae-emergent-thinking.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/rae-emergent-thinking_hu3453642066389632663.webp 320w, /blog/aiewf2025-my-day-2-highlights/rae-emergent-thinking_hu15715762794683441712.webp 640w, /blog/aiewf2025-my-day-2-highlights/rae-emergent-thinking_hu8937124685969114410.webp 768w, /blog/aiewf2025-my-day-2-highlights/rae-emergent-thinking_hu7000470700088475589.webp 1024w, /blog/aiewf2025-my-day-2-highlights/rae-emergent-thinking_hu5594174084942218722.webp 1280w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/rae-emergent-thinking_hu10299445441487750269.jpg 320w, /blog/aiewf2025-my-day-2-highlights/rae-emergent-thinking_hu5969966950533059006.jpg 640w, /blog/aiewf2025-my-day-2-highlights/rae-emergent-thinking_hu7171545353020771407.jpg 768w, /blog/aiewf2025-my-day-2-highlights/rae-emergent-thinking_hu5578499750119145816.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/rae-emergent-thinking_hu4404111560801841488.jpg 1280w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/rae-emergent-thinking_hu5969966950533059006.jpg" width="1516" height="746" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>To close his keynote, Rae looks at what&rsquo;s next&mdash;Deep Think, a high budget thinking mode for 2.5 Pro, which applies parallel, deeper chains of thought, yielding insane improvements on the toughest thinking challenges. Case in point: the 2025 Math Olympiad problem set, where OpenAI&rsquo;s very good o3 reasoning model scored at the 35th percentile, 2.5 Pro reached 50th, and 2.5 Pro Deep Think all the way up to 65th percentile.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/rae-deep-think.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/rae-deep-think_hu10297546489300014032.webp 320w, /blog/aiewf2025-my-day-2-highlights/rae-deep-think_hu10608042900177549873.webp 640w, /blog/aiewf2025-my-day-2-highlights/rae-deep-think_hu6511317006063067394.webp 768w, /blog/aiewf2025-my-day-2-highlights/rae-deep-think_hu6882780824355705766.webp 1024w, /blog/aiewf2025-my-day-2-highlights/rae-deep-think_hu17638035724342371878.webp 1280w, /blog/aiewf2025-my-day-2-highlights/rae-deep-think_hu4062728365577566419.webp 1536w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/rae-deep-think_hu1073131871261266854.jpg 320w, /blog/aiewf2025-my-day-2-highlights/rae-deep-think_hu13530782811251774397.jpg 640w, /blog/aiewf2025-my-day-2-highlights/rae-deep-think_hu6881231878791444412.jpg 768w, /blog/aiewf2025-my-day-2-highlights/rae-deep-think_hu17272665354968001411.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/rae-deep-think_hu16281428287616633985.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/rae-deep-think_hu158162573546134580.jpg 1536w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/rae-deep-think_hu13530782811251774397.jpg" width="1645" height="739" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="containing-agent-chaos-use-container-launch">Containing Agent Chaos (use-container launch)</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/solomonhykes/">Solomon Hykes</a> (<a href="https://dagger.io/">Dagger</a>) &ndash; <a href="https://www.youtube.com/watch?v=U-fMsbY-kHY&amp;t=3425s">Session video</a></p>
<p>Solomon Hykes has serious credentials to be the co-founder of an open-source agent containerization framework&mdash;the dude created Docker!</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/hykes_live.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/hykes_live_hu2806196045165141654.webp 320w, /blog/aiewf2025-my-day-2-highlights/hykes_live_hu10443677405917390476.webp 640w, /blog/aiewf2025-my-day-2-highlights/hykes_live_hu6960899048182382275.webp 768w, /blog/aiewf2025-my-day-2-highlights/hykes_live_hu1781962730992191382.webp 1024w, /blog/aiewf2025-my-day-2-highlights/hykes_live_hu4088304491820310175.webp 1280w, /blog/aiewf2025-my-day-2-highlights/hykes_live_hu3840658917763334085.webp 1536w, /blog/aiewf2025-my-day-2-highlights/hykes_live_hu5606003792224945536.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/hykes_live_hu17454026676706839218.jpg 320w, /blog/aiewf2025-my-day-2-highlights/hykes_live_hu1279318615889834465.jpg 640w, /blog/aiewf2025-my-day-2-highlights/hykes_live_hu12259523691885637051.jpg 768w, /blog/aiewf2025-my-day-2-highlights/hykes_live_hu6698261587569594647.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/hykes_live_hu7265378053001720361.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/hykes_live_hu4522240522727770542.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/hykes_live_hu4196756467592624304.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/hykes_live_hu1279318615889834465.jpg" width="2550" height="1349" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Hykes kicked things off with an adjustment to an Anthropic slide: An agent is an LLM <strong>wrecking its environment</strong> in a loop:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/hykes_agent_wrecking.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/hykes_agent_wrecking_hu1659025939065268660.webp 320w, /blog/aiewf2025-my-day-2-highlights/hykes_agent_wrecking_hu8332524647339900321.webp 640w, /blog/aiewf2025-my-day-2-highlights/hykes_agent_wrecking_hu18250909729502293880.webp 768w, /blog/aiewf2025-my-day-2-highlights/hykes_agent_wrecking_hu8107524514314543824.webp 1024w, /blog/aiewf2025-my-day-2-highlights/hykes_agent_wrecking_hu12306074060738850993.webp 1280w, /blog/aiewf2025-my-day-2-highlights/hykes_agent_wrecking_hu2908370597630886540.webp 1536w, /blog/aiewf2025-my-day-2-highlights/hykes_agent_wrecking_hu5465675646976287706.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/hykes_agent_wrecking_hu14412511731218814431.jpg 320w, /blog/aiewf2025-my-day-2-highlights/hykes_agent_wrecking_hu10402258883955484853.jpg 640w, /blog/aiewf2025-my-day-2-highlights/hykes_agent_wrecking_hu12350398765293989021.jpg 768w, /blog/aiewf2025-my-day-2-highlights/hykes_agent_wrecking_hu9649685250581683355.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/hykes_agent_wrecking_hu6504525694020218042.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/hykes_agent_wrecking_hu13563523895524190610.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/hykes_agent_wrecking_hu5953321796711741143.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/hykes_agent_wrecking_hu10402258883955484853.jpg" width="2808" height="1204" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>If we&rsquo;re just running one agent&mdash;things aren&rsquo;t so bad.</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/hykes_1_agent.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/hykes_1_agent_hu11662896675397221497.webp 320w, /blog/aiewf2025-my-day-2-highlights/hykes_1_agent_hu5023020874688426488.webp 640w, /blog/aiewf2025-my-day-2-highlights/hykes_1_agent_hu17917041638285066683.webp 768w, /blog/aiewf2025-my-day-2-highlights/hykes_1_agent_hu7695368237184690250.webp 1024w, /blog/aiewf2025-my-day-2-highlights/hykes_1_agent_hu317393214783818729.webp 1280w, /blog/aiewf2025-my-day-2-highlights/hykes_1_agent_hu10063688323619065257.webp 1536w" sizes="75vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/hykes_1_agent_hu3091929518062677693.jpg 320w, /blog/aiewf2025-my-day-2-highlights/hykes_1_agent_hu7875387488909557197.jpg 640w, /blog/aiewf2025-my-day-2-highlights/hykes_1_agent_hu11004384817773999689.jpg 768w, /blog/aiewf2025-my-day-2-highlights/hykes_1_agent_hu8465426187864216590.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/hykes_1_agent_hu3444401302282947776.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/hykes_1_agent_hu17190999965089644397.jpg 1536w" sizes="75vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/hykes_1_agent_hu7875387488909557197.jpg" width="1640" height="1112" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>But hey, we want to scale it, we want to run 10 agents at once! Neither of today&rsquo;s options is very appealing:</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/hykes_10_agents_yolo.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/hykes_10_agents_yolo_hu10313340424055533769.webp 320w, /blog/aiewf2025-my-day-2-highlights/hykes_10_agents_yolo_hu7079201585896829193.webp 640w, /blog/aiewf2025-my-day-2-highlights/hykes_10_agents_yolo_hu15939855275892851616.webp 768w, /blog/aiewf2025-my-day-2-highlights/hykes_10_agents_yolo_hu361349571116958392.webp 1024w, /blog/aiewf2025-my-day-2-highlights/hykes_10_agents_yolo_hu13649275086745907491.webp 1280w, /blog/aiewf2025-my-day-2-highlights/hykes_10_agents_yolo_hu16024124310754739083.webp 1536w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/hykes_10_agents_yolo_hu1385897989944855674.jpg 320w, /blog/aiewf2025-my-day-2-highlights/hykes_10_agents_yolo_hu12348911159077138805.jpg 640w, /blog/aiewf2025-my-day-2-highlights/hykes_10_agents_yolo_hu17637877263450581389.jpg 768w, /blog/aiewf2025-my-day-2-highlights/hykes_10_agents_yolo_hu6498722285800860462.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/hykes_10_agents_yolo_hu12005102495974471814.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/hykes_10_agents_yolo_hu2663491861992135453.jpg 1536w" sizes="37vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/hykes_10_agents_yolo_hu12348911159077138805.jpg" width="1796" height="1094" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[75%] mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/hykes_10_agents_cloud.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/hykes_10_agents_cloud_hu16817261726296152551.webp 320w, /blog/aiewf2025-my-day-2-highlights/hykes_10_agents_cloud_hu6674427074263319491.webp 640w, /blog/aiewf2025-my-day-2-highlights/hykes_10_agents_cloud_hu208863571403311310.webp 768w, /blog/aiewf2025-my-day-2-highlights/hykes_10_agents_cloud_hu12577658859811132516.webp 1024w, /blog/aiewf2025-my-day-2-highlights/hykes_10_agents_cloud_hu13885252707663446012.webp 1280w, /blog/aiewf2025-my-day-2-highlights/hykes_10_agents_cloud_hu18006061087761029418.webp 1536w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/hykes_10_agents_cloud_hu7903391966175659921.jpg 320w, /blog/aiewf2025-my-day-2-highlights/hykes_10_agents_cloud_hu9964673628214338956.jpg 640w, /blog/aiewf2025-my-day-2-highlights/hykes_10_agents_cloud_hu11656749198273207532.jpg 768w, /blog/aiewf2025-my-day-2-highlights/hykes_10_agents_cloud_hu22441095371162724.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/hykes_10_agents_cloud_hu13118521383603487938.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/hykes_10_agents_cloud_hu2293897452866192160.jpg 1536w" sizes="37vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/hykes_10_agents_cloud_hu9964673628214338956.jpg" width="1790" height="1130" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Hykes then talks about what we really want in order to safely operate and maintain control of 10 agents, and what the ideal environment looks like to make this work:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/hykes_want_ideal.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/hykes_want_ideal_hu7949243708299262756.webp 320w, /blog/aiewf2025-my-day-2-highlights/hykes_want_ideal_hu5730628903785456446.webp 640w, /blog/aiewf2025-my-day-2-highlights/hykes_want_ideal_hu7002486065823452581.webp 768w, /blog/aiewf2025-my-day-2-highlights/hykes_want_ideal_hu8186395683060049952.webp 1024w, /blog/aiewf2025-my-day-2-highlights/hykes_want_ideal_hu8355596321092172094.webp 1280w, /blog/aiewf2025-my-day-2-highlights/hykes_want_ideal_hu7655711110187910979.webp 1536w, /blog/aiewf2025-my-day-2-highlights/hykes_want_ideal_hu7241650841683048355.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/hykes_want_ideal_hu2496296595614285951.jpg 320w, /blog/aiewf2025-my-day-2-highlights/hykes_want_ideal_hu4841181006331775782.jpg 640w, /blog/aiewf2025-my-day-2-highlights/hykes_want_ideal_hu15400395086627332405.jpg 768w, /blog/aiewf2025-my-day-2-highlights/hykes_want_ideal_hu7595599440981339115.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/hykes_want_ideal_hu767188842387715824.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/hykes_want_ideal_hu13801954016526444330.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/hykes_want_ideal_hu1675461361316638243.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/hykes_want_ideal_hu4841181006331775782.jpg" width="2550" height="1299" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>He points out that we already have the technology we need to deliver this: Containers + Git + Models, and mentions in passing that both containers and Git have a ton of capabilities that aren&rsquo;t widely used by humans, but are perfect for agents. All we&rsquo;re lacking is a native integration &hellip;</p>
<p>Hykes and Dagger suggest a solution: &ldquo;container use for agents&rdquo;&mdash;meant in the same sense as &ldquo;computer use for agents&rdquo; and &ldquo;browser use for agents.&rdquo; This is not sandboxing, which is about <em>executing the output of agents</em>. Rather, this is about creating complete, isolated development environments that agents <em>do their development entirely inside of</em>.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/hykes_we_have_the_tech.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/hykes_we_have_the_tech_hu2937922602453084219.webp 320w, /blog/aiewf2025-my-day-2-highlights/hykes_we_have_the_tech_hu3810564251492018091.webp 640w, /blog/aiewf2025-my-day-2-highlights/hykes_we_have_the_tech_hu13873297883931656553.webp 768w, /blog/aiewf2025-my-day-2-highlights/hykes_we_have_the_tech_hu12648879263505863816.webp 1024w, /blog/aiewf2025-my-day-2-highlights/hykes_we_have_the_tech_hu12350275900288295788.webp 1280w, /blog/aiewf2025-my-day-2-highlights/hykes_we_have_the_tech_hu10169217017634546944.webp 1536w, /blog/aiewf2025-my-day-2-highlights/hykes_we_have_the_tech_hu17822004315980142976.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/hykes_we_have_the_tech_hu12681073192411566448.jpg 320w, /blog/aiewf2025-my-day-2-highlights/hykes_we_have_the_tech_hu548211987550767244.jpg 640w, /blog/aiewf2025-my-day-2-highlights/hykes_we_have_the_tech_hu5866168551249698788.jpg 768w, /blog/aiewf2025-my-day-2-highlights/hykes_we_have_the_tech_hu13310645121540184838.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/hykes_we_have_the_tech_hu13310121382282130835.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/hykes_we_have_the_tech_hu11678428861726146434.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/hykes_we_have_the_tech_hu7673572133019233585.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/hykes_we_have_the_tech_hu548211987550767244.jpg" width="2860" height="1138" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Against the advice of his team, Hykes launches into a wild and crazy live demo of (unfinished) Dagger, where he spawns about six Claude Code instances&mdash;and one Goose instance, just for fun&mdash;all iterating on a basic website project. Each agent instance iterates on the starting app in complete isolation, and the orchestrating developer can merge any instance&rsquo;s output they like. At the end, while the agents are madly spinning away, he open sources the project, live in front of the audience. I like Hykes, and I like Dagger&mdash;great stuff.</p>

<video controls width="100%" preload="metadata" loop class="html-video">
  <source src="hykes_demo.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>

<h2 id="infrastructure-for-the-singularity">Infrastructure for the Singularity</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/jesse-michael-han/">Jesse Han</a> (<a href="https://morph.ai/">Morph</a>) &ndash; <a href="https://www.youtube.com/watch?v=U-fMsbY-kHY&amp;t=4845s">Session video</a></p>
<p>Jesse Han&rsquo;s presentation style didn&rsquo;t resonate with me, and I didn&rsquo;t absorb much from the live presentation. Ah, the benefits of re-watching: I learned a lot more this time around.</p>
<p>I see a lot of overlap between Han&rsquo;s and Hykes&rsquo;s keynotes and products. Both center on enabling environments that support the safe and effective spin-up and operation of clusters of agents. Both leverage containers, the power of Git and Git branches, and, of course, agents. Morph&rsquo;s Infinibranch: &ldquo;All mistakes become reversible.&rdquo; &ldquo;All paths forward become possible.&rdquo; Take action, backtrack, take every possible action.</p>
<p>Han&rsquo;s chess demo was interesting; he contrasted a single AI working to pick the optimal chess move with a swarm of agents on the Morph platform tackling the same problem using what he calls &ldquo;reasoning time branching.&rdquo; Han was fairly arrogant in suggesting that Morph&rsquo;s closed platform would own the agentic execution environment space&mdash;a big contrast to Hykes, who open sourced their product on stage. Morph may go on to success, but I&rsquo;d put money on Dagger winning market share.</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/han_chess_1_agent.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/han_chess_1_agent_hu11185951227112581302.webp 320w, /blog/aiewf2025-my-day-2-highlights/han_chess_1_agent_hu17662480722714540209.webp 640w, /blog/aiewf2025-my-day-2-highlights/han_chess_1_agent_hu6211393695170461937.webp 768w, /blog/aiewf2025-my-day-2-highlights/han_chess_1_agent_hu13559462215838048491.webp 1024w, /blog/aiewf2025-my-day-2-highlights/han_chess_1_agent_hu5079685074478905266.webp 1280w, /blog/aiewf2025-my-day-2-highlights/han_chess_1_agent_hu8080650796816629988.webp 1536w, /blog/aiewf2025-my-day-2-highlights/han_chess_1_agent_hu1213647257405699773.webp 1920w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/han_chess_1_agent_hu11891041861735727164.jpg 320w, /blog/aiewf2025-my-day-2-highlights/han_chess_1_agent_hu5923310219496969302.jpg 640w, /blog/aiewf2025-my-day-2-highlights/han_chess_1_agent_hu8390439579352318421.jpg 768w, /blog/aiewf2025-my-day-2-highlights/han_chess_1_agent_hu13118640625861616175.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/han_chess_1_agent_hu4604985098334588462.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/han_chess_1_agent_hu16332647646177655316.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/han_chess_1_agent_hu18113546984421326527.jpg 1920w" sizes="37vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/han_chess_1_agent_hu5923310219496969302.jpg" width="2212" height="672" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[75%] mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/han_chess_agent_swarm.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/han_chess_agent_swarm_hu3737257953105992882.webp 320w, /blog/aiewf2025-my-day-2-highlights/han_chess_agent_swarm_hu13958732178471699960.webp 640w, /blog/aiewf2025-my-day-2-highlights/han_chess_agent_swarm_hu2314360120847909414.webp 768w, /blog/aiewf2025-my-day-2-highlights/han_chess_agent_swarm_hu1926445668268426966.webp 1024w, /blog/aiewf2025-my-day-2-highlights/han_chess_agent_swarm_hu3305590587556604640.webp 1280w, /blog/aiewf2025-my-day-2-highlights/han_chess_agent_swarm_hu15584012249047174000.webp 1536w, /blog/aiewf2025-my-day-2-highlights/han_chess_agent_swarm_hu4836674120665968171.webp 1920w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/han_chess_agent_swarm_hu15317279170575332581.jpg 320w, /blog/aiewf2025-my-day-2-highlights/han_chess_agent_swarm_hu7555923287547770551.jpg 640w, /blog/aiewf2025-my-day-2-highlights/han_chess_agent_swarm_hu6846238394097356310.jpg 768w, /blog/aiewf2025-my-day-2-highlights/han_chess_agent_swarm_hu9207790356390053623.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/han_chess_agent_swarm_hu1080092523241201716.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/han_chess_agent_swarm_hu10912708294894075516.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/han_chess_agent_swarm_hu4096067211528222907.jpg 1920w" sizes="37vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/han_chess_agent_swarm_hu7555923287547770551.jpg" width="2878" height="1508" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="devin-20-and-the-future-of-swe">Devin 2.0 and the Future of SWE</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/scott-wu-8b94ab96/">Scott Wu</a> (<a href="https://cognition.ai//">Cognition</a>) &ndash; <a href="https://www.youtube.com/watch?v=U-fMsbY-kHY&amp;t=8940s">Session video</a></p>
<p>Scott Wu, Co-founder and CEO of Cognition, presented on Devin 2.0 and the evolution of AI-powered software engineering. I missed this session, so I&rsquo;ll withhold comment until I have a chance to view and absorb Wu&rsquo;s talk.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/wu_devin.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/wu_devin_hu16645996540863774596.webp 320w, /blog/aiewf2025-my-day-2-highlights/wu_devin_hu11395314972905502262.webp 640w, /blog/aiewf2025-my-day-2-highlights/wu_devin_hu18206687145491187009.webp 768w, /blog/aiewf2025-my-day-2-highlights/wu_devin_hu2679993939011111496.webp 1024w, /blog/aiewf2025-my-day-2-highlights/wu_devin_hu16554876524853956133.webp 1280w, /blog/aiewf2025-my-day-2-highlights/wu_devin_hu12342954502575035259.webp 1536w, /blog/aiewf2025-my-day-2-highlights/wu_devin_hu7800626130619100523.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/wu_devin_hu957896744779515124.jpg 320w, /blog/aiewf2025-my-day-2-highlights/wu_devin_hu10050997293976891393.jpg 640w, /blog/aiewf2025-my-day-2-highlights/wu_devin_hu18247910930363172293.jpg 768w, /blog/aiewf2025-my-day-2-highlights/wu_devin_hu6233264510475913500.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/wu_devin_hu17051062496392847582.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/wu_devin_hu7928651235983552059.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/wu_devin_hu6856534089491605982.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/wu_devin_hu10050997293976891393.jpg" width="2862" height="1406" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="your-coding-agent-just-got-cloned">Your Coding Agent Just Got Cloned</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/rustinbanks/">Rustin Banks</a> (<a href="https://labs.google/">Google Labs</a>) &ndash; <a href="https://www.youtube.com/watch?v=U-fMsbY-kHY&amp;t=9930s">Session video</a></p>
<p>Rustin Banks from Google Labs talks about his work on <a href="https://jules.google.com/">Jules</a> and asynchronous coding agents. Another missed session; I&rsquo;ll comment once I have a chance to view and absorb it.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/banks_jules.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/banks_jules_hu16727407626292941301.webp 320w, /blog/aiewf2025-my-day-2-highlights/banks_jules_hu7780793613821565958.webp 640w, /blog/aiewf2025-my-day-2-highlights/banks_jules_hu703120587896540036.webp 768w, /blog/aiewf2025-my-day-2-highlights/banks_jules_hu7664048543749224428.webp 1024w, /blog/aiewf2025-my-day-2-highlights/banks_jules_hu12748261257716189483.webp 1280w, /blog/aiewf2025-my-day-2-highlights/banks_jules_hu12413032630049259136.webp 1536w, /blog/aiewf2025-my-day-2-highlights/banks_jules_hu13305261722847379667.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/banks_jules_hu14803371538605101940.jpg 320w, /blog/aiewf2025-my-day-2-highlights/banks_jules_hu3255415060986074388.jpg 640w, /blog/aiewf2025-my-day-2-highlights/banks_jules_hu8588163158995159152.jpg 768w, /blog/aiewf2025-my-day-2-highlights/banks_jules_hu17512258195226049770.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/banks_jules_hu17411668744209192101.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/banks_jules_hu15341467919098102466.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/banks_jules_hu5583933662879440176.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/banks_jules_hu3255415060986074388.jpg" width="2872" height="1414" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="the-agent-awakens">The Agent Awakens</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/geektrainer/">Christopher Harrison</a> (<a href="https://github.com/features/copilot">GitHub</a>) &ndash; <a href="https://www.youtube.com/watch?v=U-fMsbY-kHY&amp;t=10812s">Session video</a></p>
<p>Christopher Harrison from GitHub on the evolution of AI agents. I missed this session, so I&rsquo;ll write later once I have a chance to review.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/harrison_copilot_agents.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/harrison_copilot_agents_hu5019492507443702744.webp 320w, /blog/aiewf2025-my-day-2-highlights/harrison_copilot_agents_hu10621285874904376525.webp 640w, /blog/aiewf2025-my-day-2-highlights/harrison_copilot_agents_hu3454549933282154643.webp 768w, /blog/aiewf2025-my-day-2-highlights/harrison_copilot_agents_hu1424265530132681054.webp 1024w, /blog/aiewf2025-my-day-2-highlights/harrison_copilot_agents_hu2271226444001442544.webp 1280w, /blog/aiewf2025-my-day-2-highlights/harrison_copilot_agents_hu5766459946725083950.webp 1536w, /blog/aiewf2025-my-day-2-highlights/harrison_copilot_agents_hu18413688438151313543.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/harrison_copilot_agents_hu14841306588732196690.jpg 320w, /blog/aiewf2025-my-day-2-highlights/harrison_copilot_agents_hu12155865482148467438.jpg 640w, /blog/aiewf2025-my-day-2-highlights/harrison_copilot_agents_hu4430948601904753598.jpg 768w, /blog/aiewf2025-my-day-2-highlights/harrison_copilot_agents_hu2348746105948681414.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/harrison_copilot_agents_hu9273684180838389595.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/harrison_copilot_agents_hu9755395843675918459.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/harrison_copilot_agents_hu7996866639923315567.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/harrison_copilot_agents_hu12155865482148467438.jpg" width="2858" height="1428" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="dont-get-one-shotted">Don&rsquo;t get one-shotted</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/tomasreimers/">Tomas Reimers</a> (<a href="https://graphite.dev/">Graphite</a>) &ndash; <a href="https://www.youtube.com/watch?v=U-fMsbY-kHY&amp;t=11977s">Session video</a></p>
<p>Tomas Reimers, founder of Graphite, described how to leverage AI to test, review, merge, and deploy code. I missed this talk, but it looks good and I&rsquo;ll write on it once I&rsquo;ve watched.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/reimers_resilience.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/reimers_resilience_hu7021526572057588470.webp 320w, /blog/aiewf2025-my-day-2-highlights/reimers_resilience_hu6509088728123345578.webp 640w, /blog/aiewf2025-my-day-2-highlights/reimers_resilience_hu17600966584720359969.webp 768w, /blog/aiewf2025-my-day-2-highlights/reimers_resilience_hu13049148160603986182.webp 1024w, /blog/aiewf2025-my-day-2-highlights/reimers_resilience_hu16219711719654644808.webp 1280w, /blog/aiewf2025-my-day-2-highlights/reimers_resilience_hu17117782160488433599.webp 1536w, /blog/aiewf2025-my-day-2-highlights/reimers_resilience_hu10081519248962212147.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/reimers_resilience_hu3850264394765797179.jpg 320w, /blog/aiewf2025-my-day-2-highlights/reimers_resilience_hu2872971255360368530.jpg 640w, /blog/aiewf2025-my-day-2-highlights/reimers_resilience_hu8318306675058116977.jpg 768w, /blog/aiewf2025-my-day-2-highlights/reimers_resilience_hu17951204923740102637.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/reimers_resilience_hu8800247782763569441.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/reimers_resilience_hu1178167722572855978.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/reimers_resilience_hu18390420804787402013.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/reimers_resilience_hu2872971255360368530.jpg" width="3390" height="1882" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="claude-code--the-evolution-of-agentic-coding">Claude Code &amp; the evolution of Agentic Coding</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/bcherny">Boris Cherny</a> (<a href="https://www.anthropic.com/">Anthropic</a>) &ndash; <a href="https://www.youtube.com/watch?v=U-fMsbY-kHY&amp;t=18825s">Session video</a></p>
<p>Claude Code wasn&rsquo;t the first terminal-centric AI coding tool&mdash;among players with significant followings, those bragging rights probably go to Aider&mdash;but Boris Cherny provides a fascinating perspective on why model-maker Anthropic went the terminal route with Claude Code. I have observed first-hand how Anthropic&rsquo;s Claude models have dominated the coding model space, with their lead seeming to accelerate with Sonnet 3.5, 3.7, and 4. Says Cherny,</p>
<blockquote>
<p>The model is moving really fast. It&rsquo;s on exponential. It&rsquo;s getting better at coding very, very quickly, as everyone that uses the model knows. And the product is kind of struggling to keep up. We&rsquo;re trying to figure out what product to build that&rsquo;s good enough for a model like this.</p>
</blockquote>
<div class="flex justify-evenly"><div class="w-[65%] mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/cherny_cc_r1.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/cherny_cc_r1_hu13344865026266305022.webp 320w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r1_hu14582294179777634401.webp 640w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r1_hu932265363768405848.webp 768w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r1_hu15052282798681225304.webp 1024w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r1_hu8225482505825561260.webp 1280w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r1_hu12615490488806881244.webp 1536w" sizes="70vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/cherny_cc_r1_hu9687969030119940933.jpg 320w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r1_hu138106457278273267.jpg 640w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r1_hu10960220541818841948.jpg 768w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r1_hu11851455925982475394.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r1_hu605050330737227099.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r1_hu1184344070380289997.jpg 1536w" sizes="70vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/cherny_cc_r1_hu138106457278273267.jpg" width="1832" height="200" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<blockquote>
<p>I feel like programming languages have sort of leveled out, but the model is on an exponential, and the UX of programming is also on an exponential &hellip; IDE devx has evolved quickly, and will continue to change even more quickly &hellip; And so with all this in mind, Claude Code&rsquo;s approach is a little different.</p>
</blockquote>
<div class="flex justify-evenly"><div class="w-[65%] mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/cherny_cc_appr_sm.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/cherny_cc_appr_sm_hu3739234257281166963.webp 320w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_appr_sm_hu14790154569019101935.webp 640w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_appr_sm_hu8724403160928501643.webp 768w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_appr_sm_hu14810952514566907089.webp 1024w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_appr_sm_hu14514685307199186080.webp 1280w" sizes="70vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/cherny_cc_appr_sm_hu16658928318846187977.jpg 320w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_appr_sm_hu11502488323037845847.jpg 640w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_appr_sm_hu8352723377999692102.jpg 768w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_appr_sm_hu15948376992129622626.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_appr_sm_hu10999749278502196203.jpg 1280w" sizes="70vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/cherny_cc_appr_sm_hu11502488323037845847.jpg" width="1446" height="426" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<blockquote>
<p>We want to be unopinionated and we want to get out of the way. So we don&rsquo;t give you a bunch of flashy UI. We don&rsquo;t try to put a bunch of scaffolding in the way. Some of this is, we&rsquo;re a model company at Anthropic, we make models and we want people to experience those models. But I think another part is we actually just don&rsquo;t know, we don&rsquo;t know what the right UX is. So we&rsquo;re starting simple. So Claude Code is intentionally simple. It&rsquo;s intentionally general. It shows off the model in the ways that matter to us, which is they can use all your tools and they can fit into all your workflows. So you can figure out how to use the model in this world where the UX of using code and using models is changing so fast.</p>
</blockquote>
<div class="flex justify-evenly"><div class="w-[55%] mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/cherny_cc_unopin_sm.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/cherny_cc_unopin_sm_hu17707404622559132765.webp 320w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_unopin_sm_hu10058095943976824488.webp 640w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_unopin_sm_hu6953133100594540503.webp 768w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_unopin_sm_hu2398128666034549593.webp 1024w" sizes="65vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/cherny_cc_unopin_sm_hu6916592839708015920.jpg 320w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_unopin_sm_hu16888585249849318957.jpg 640w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_unopin_sm_hu3880581281282723052.jpg 768w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_unopin_sm_hu13951100283693439363.jpg 1024w" sizes="65vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/cherny_cc_unopin_sm_hu16888585249849318957.jpg" width="1082" height="388" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<div class="flex justify-evenly"><div class="w-[65%] mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/cherny_cc_r2.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/cherny_cc_r2_hu17262253981540577332.webp 320w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r2_hu6634345737524570561.webp 640w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r2_hu5858523153728186878.webp 768w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r2_hu4406191252120771897.webp 1024w" sizes="70vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/cherny_cc_r2_hu6373254804270041372.jpg 320w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r2_hu13083872658926344866.jpg 640w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r2_hu3417982658690620451.jpg 768w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r2_hu16704745345004622860.jpg 1024w" sizes="70vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/cherny_cc_r2_hu13083872658926344866.jpg" width="1208" height="122" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<blockquote>
<p>This is the bitter lesson. I have this like framed and taped to the side of my wall, because the more general model always wins and the model increases in capabilities exponentially and there are many corollaries to this. Everything around the model is also increasing exponentially and the more general thing even around the model usually wins.</p>
</blockquote>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/cherny_cc_bitter.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/cherny_cc_bitter_hu13665497169354852908.webp 320w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_bitter_hu2053590538688795863.webp 640w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_bitter_hu8822123294562476354.webp 768w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_bitter_hu2897263098742006696.webp 1024w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_bitter_hu7059425451135008713.webp 1280w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_bitter_hu10976741058798615313.webp 1536w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_bitter_hu17682443650471814807.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/cherny_cc_bitter_hu1650456022010404998.jpg 320w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_bitter_hu14886160960841015100.jpg 640w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_bitter_hu13710988689537409350.jpg 768w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_bitter_hu11933322682322459250.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_bitter_hu14165094241139857897.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_bitter_hu17585882627352459710.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_bitter_hu3653791338774530600.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/cherny_cc_bitter_hu14886160960841015100.jpg" width="1974" height="716" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<div class="flex justify-evenly"><div class="w-[55%] mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/cherny_cc_r3.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/cherny_cc_r3_hu1471653833465079476.webp 320w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r3_hu5070342334223897572.webp 640w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r3_hu10605679116897598547.webp 768w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r3_hu3888666594923430869.webp 1024w" sizes="65vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/cherny_cc_r3_hu8931878296610523203.jpg 320w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r3_hu12496794555550916506.jpg 640w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r3_hu5036992339799197506.jpg 768w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r3_hu9283809145667339666.jpg 1024w" sizes="65vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/cherny_cc_r3_hu12496794555550916506.jpg" width="1112" height="124" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Cherny describes four different modes of using Claude Code:</p>
<ul>
<li>From the terminal</li>
<li>From the IDE (as an extension)</li>
<li>As a GitHub app so you can @claude right in a GitHub issue comment</li>
<li>Via the SDK, using Claude as a Unix utility</li>
</ul>
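<p>The SDK/Unix-utility mode is the least obvious of the four. A minimal sketch of what each mode looks like in practice&mdash;the exact flags may differ by Claude Code version, and the prompts are my own illustrative examples:</p>

```shell
# 1. From the terminal: start an interactive session in the repo root
claude

# 2. From the IDE: install the Claude Code extension and open its panel

# 3. As a GitHub app: comment on an issue or PR, e.g.
#    "@claude please add a failing test that reproduces this bug"

# 4. As a Unix utility: -p runs a single prompt non-interactively,
#    so Claude Code composes with pipes like any other CLI tool
git diff HEAD~1 | claude -p "Write a one-line changelog entry for this diff"
```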
<div class="flex justify-evenly"><div class="w-[55%] mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/cherny_cc_r4.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/cherny_cc_r4_hu8115276571511872017.webp 320w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r4_hu9211901203474555685.webp 640w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r4_hu94856855454141382.webp 768w" sizes="65vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/cherny_cc_r4_hu2455863286498076845.jpg 320w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r4_hu9177525555364781751.jpg 640w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_r4_hu2907051397133755739.jpg 768w" sizes="65vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/cherny_cc_r4_hu9177525555364781751.jpg" width="810" height="156" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<ol>
<li>Ask Claude Code about your code</li>
<li>Teach Claude to use your tools</li>
<li>Fit the workflow to the task, e.g. (explore › plan › confirm › code › commit) vs (tests › commit › code › iterate › commit) vs (code › screenshot › iterate)</li>
<li>Plan mode (toggle via Shift-Tab)</li>
</ol>
<div class="flex justify-evenly"><div class="w-[40%] mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/cherny_cc_r5.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/cherny_cc_r5_hu4255872115529245421.webp 320w" sizes="60vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/cherny_cc_r5_hu10313951335885951483.jpg 320w" sizes="60vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/cherny_cc_r5_hu10587614452227538059.jpg" width="604" height="132" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<ul>
<li>Create CLAUDE.md files: variants such as CLAUDE.md, CLAUDE.local.md, a/b/CLAUDE.md, ~/.claude/CLAUDE.md</li>
<li>Context can also be made available as a slash command by putting markdown files under .claude/commands/</li>
<li>Add to Claude&rsquo;s memory by prepending # to something you want Claude Code to remember; CC will ask which memory location to remember into (project, user, etc.)</li>
</ul>
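<p>Concretely, the layout Cherny describes might look like this. The paths come from the talk; the file contents are my own illustrative guesses:</p>

```shell
# Project-level context, checked into git
cat > CLAUDE.md <<'EOF'
# Project notes for Claude Code
- Run tests with `npm test`
- Never edit files under vendor/
EOF

# Personal overrides, kept out of version control
echo "- Use my staging DB at localhost:5433" > CLAUDE.local.md

# A custom slash command, available as /fix-lint in any session here
mkdir -p .claude/commands
cat > .claude/commands/fix-lint.md <<'EOF'
Run the linter, then fix every reported issue without changing behavior.
EOF
```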
<div class="flex justify-evenly"><div class="w-[65%] mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/cherny_cc_tldr_sm.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/cherny_cc_tldr_sm_hu6248388429382956846.webp 320w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_tldr_sm_hu6644620911594894541.webp 640w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_tldr_sm_hu5169178568298638244.webp 768w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_tldr_sm_hu13119826896333584682.webp 1024w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_tldr_sm_hu12853658565801592657.webp 1280w" sizes="70vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/cherny_cc_tldr_sm_hu17647697545857585943.jpg 320w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_tldr_sm_hu7979071759413493603.jpg 640w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_tldr_sm_hu12228325447112680696.jpg 768w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_tldr_sm_hu15877055361297212351.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/cherny_cc_tldr_sm_hu7701148679022047881.jpg 1280w" sizes="70vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/cherny_cc_tldr_sm_hu7979071759413493603.jpg" width="1300" height="384" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="software-dev-agents-what-works--what-doesnt">Software Dev Agents: What Works &amp; What Doesn&rsquo;t</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/robert-a-brennan">Robert Brennan</a> (<a href="https://www.all-hands.dev/">All Hands AI</a>) &ndash; <a href="https://www.youtube.com/watch?v=U-fMsbY-kHY&amp;t=19945s">Session video</a></p>
<p>Robert Brennan, CEO of All Hands AI, is involved in one of those &ldquo;AI name confusion&rdquo; situations. His product (open sourced under an MIT license) used to be known as OpenDevin. And Devin is, of course, one of the pioneers of agentic coding. So what gives? I went to ChatGPT o3 to get the scoop:</p>
<blockquote>
<p><strong>The scoop from o3:</strong> OpenDevin launched in March 2024 as an MIT-licensed community effort to recreate <a href="https://cognition.ai/blog/introducing-devin">Devin, the first AI software engineer</a>. By August 2024 the project re-branded as <strong>OpenHands</strong> under the new startup <strong>All Hands AI</strong>, shedding its “clone” baggage while keeping Devin only as historical inspiration — see the <a href="https://x.com/gneubig/status/1828097484599759349">rename announcement</a> and the <a href="https://github.com/All-Hands-AI/OpenHands">current OpenHands repo</a>.</p>
</blockquote>
<p>Brennan kicks off with this message:</p>
<blockquote>
<p>Coding is going away &hellip; but that doesn&rsquo;t mean that software engineering is going away. We&rsquo;re paid not to type on our keyboard but to actually think critically about the problems that are in front of us.</p>
</blockquote>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/brennan_think_not_type.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/brennan_think_not_type_hu15353053167914724052.webp 320w, /blog/aiewf2025-my-day-2-highlights/brennan_think_not_type_hu5581246779563638300.webp 640w, /blog/aiewf2025-my-day-2-highlights/brennan_think_not_type_hu4601310883873572902.webp 768w, /blog/aiewf2025-my-day-2-highlights/brennan_think_not_type_hu5370191753987102727.webp 1024w, /blog/aiewf2025-my-day-2-highlights/brennan_think_not_type_hu6345244356975333859.webp 1280w, /blog/aiewf2025-my-day-2-highlights/brennan_think_not_type_hu7685416367311988350.webp 1536w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/brennan_think_not_type_hu8237888204498960407.jpg 320w, /blog/aiewf2025-my-day-2-highlights/brennan_think_not_type_hu13170301891318041275.jpg 640w, /blog/aiewf2025-my-day-2-highlights/brennan_think_not_type_hu11874835014311428425.jpg 768w, /blog/aiewf2025-my-day-2-highlights/brennan_think_not_type_hu9970291790703397370.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/brennan_think_not_type_hu10616548939813681393.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/brennan_think_not_type_hu15237245358350586219.jpg 1536w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/brennan_think_not_type_hu13170301891318041275.jpg" width="1697" height="997" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Brennan shared a great chart of the codegen landscape, with Tactical &lt;=&gt; Agentic on the X axis, and Consumer &lt;=&gt; Developer on the Y.</p>
<blockquote>
<p>Now you&rsquo;ve got these tools like Devin and OpenHands where you&rsquo;re really giving an agent one or two sentences describing what you want it to do. It goes off and works for 5, 10, 15 minutes on its own and then comes back to you with a solution. This is a much more powerful way of working.</p>
</blockquote>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/brennan_codegen_landscape.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/brennan_codegen_landscape_hu1959208273827203516.webp 320w, /blog/aiewf2025-my-day-2-highlights/brennan_codegen_landscape_hu17160243826604859092.webp 640w, /blog/aiewf2025-my-day-2-highlights/brennan_codegen_landscape_hu8998396045171979950.webp 768w, /blog/aiewf2025-my-day-2-highlights/brennan_codegen_landscape_hu16272233046215785124.webp 1024w, /blog/aiewf2025-my-day-2-highlights/brennan_codegen_landscape_hu9879320702880580301.webp 1280w, /blog/aiewf2025-my-day-2-highlights/brennan_codegen_landscape_hu10256933584545186271.webp 1536w, /blog/aiewf2025-my-day-2-highlights/brennan_codegen_landscape_hu10378525082076264122.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/brennan_codegen_landscape_hu2349948091614888108.jpg 320w, /blog/aiewf2025-my-day-2-highlights/brennan_codegen_landscape_hu7134858217176173369.jpg 640w, /blog/aiewf2025-my-day-2-highlights/brennan_codegen_landscape_hu17769136586383502684.jpg 768w, /blog/aiewf2025-my-day-2-highlights/brennan_codegen_landscape_hu1365911255605595922.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/brennan_codegen_landscape_hu10323260378442151314.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/brennan_codegen_landscape_hu9613102270648268680.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/brennan_codegen_landscape_hu1065506949440127218.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/brennan_codegen_landscape_hu7134858217176173369.jpg" width="1994" height="971" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Brennan does a nice walkthrough of how coding agents actually work, and then shares tips on how to be most effective working with agentic coding. One point he makes, which I can affirm based on my own experience, is that you shouldn&rsquo;t be afraid to throw code away.</p>
<blockquote>
<p>I also like to remind folks that in an AI-driven development world, code is cheap. You can throw code away. You can experiment and prototype. I love, if I have an idea, like on my walk to work, I&rsquo;ll just tell OpenHands with my voice, &ldquo;do X, Y, and Z,&rdquo; and then when I get to work, I&rsquo;ll have a PR waiting for me. 50% of the time, I&rsquo;ll just throw it away. It didn&rsquo;t really work. 50% of the time it looks great, and I just merge it, and it&rsquo;s awesome.</p>
</blockquote>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/brennan_throw_code_away.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/brennan_throw_code_away_hu10619239909750224261.webp 320w, /blog/aiewf2025-my-day-2-highlights/brennan_throw_code_away_hu4308298521883184635.webp 640w, /blog/aiewf2025-my-day-2-highlights/brennan_throw_code_away_hu18261069460704834688.webp 768w, /blog/aiewf2025-my-day-2-highlights/brennan_throw_code_away_hu7963549081264327403.webp 1024w, /blog/aiewf2025-my-day-2-highlights/brennan_throw_code_away_hu441184755945372817.webp 1280w, /blog/aiewf2025-my-day-2-highlights/brennan_throw_code_away_hu2884451776165741523.webp 1536w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/brennan_throw_code_away_hu5896399524740898401.jpg 320w, /blog/aiewf2025-my-day-2-highlights/brennan_throw_code_away_hu5388109284173323033.jpg 640w, /blog/aiewf2025-my-day-2-highlights/brennan_throw_code_away_hu15544417474349161426.jpg 768w, /blog/aiewf2025-my-day-2-highlights/brennan_throw_code_away_hu6467106012952372428.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/brennan_throw_code_away_hu4510719432307639899.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/brennan_throw_code_away_hu13490843190588245887.jpg 1536w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/brennan_throw_code_away_hu5388109284173323033.jpg" width="1675" height="973" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Definitely worth watching.</p>
<h2 id="beyond-the-prototype">Beyond the Prototype</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/joshalbrecht">Josh Albrecht</a> (<a href="https://imbue.com/">Imbue</a>) &ndash; <a href="https://www.youtube.com/watch?v=U-fMsbY-kHY&amp;t=21090s">Session video</a></p>
<p>Josh Albrecht is CTO of Imbue, whose main focus is Sculptor, an AI coding tool still in early research preview. What does it do?</p>
<blockquote>
<p>Sculptor is a coding agent environment that applies engineering discipline to catch issues, write tests, and improve your code—all while you work in your favorite editor. Sculptor is the first coding agent environment that helps you embed software engineering best practices. Sculptor runs your code in a sandbox, letting you test code safely, solve issues in parallel, and assign tasks to agents, working alongside any editor. Use Sculptor to resolve bugs, write tests, add new features, improve docs, fix style issues, and make your code better—whether it was written by a human or an LLM.</p>
</blockquote>
<p>Albrecht&rsquo;s session tagline is <strong>Using AI to Write High-Quality Code</strong>, and his work on Sculptor clearly gives him the background to speak intelligently on the subject.</p>
<p>His core contrast: vibe coding output versus production-shippable code.</p>
<blockquote>
<p>&lsquo;What is wrong with this diff?&rsquo; Allowing another AI system to come and take a look at this and understand, like, hey, are there any race conditions? Did you leave your API key in there? etc. We want to think about how do we help leverage AI tools not just to generate the code but to help us build trust in that code.</p>
</blockquote>
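<p>That review loop is easy to approximate with any CLI-accessible model. A hedged sketch&mdash;the CLI name, flags, and prompt wording here are illustrative, not Sculptor&rsquo;s actual mechanism:</p>

```shell
# Ask a second model to audit a diff before it merges.
# Adapt the CLI invocation to whatever model tool you actually use.
git diff main...HEAD | claude -p \
  "Review this diff. Flag race conditions, leaked credentials or API keys,
   missing error handling, and tests that should exist but don't."
```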
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/albrecht_high_quality_code.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/albrecht_high_quality_code_hu14044918251102236212.webp 320w, /blog/aiewf2025-my-day-2-highlights/albrecht_high_quality_code_hu560156877493941603.webp 640w, /blog/aiewf2025-my-day-2-highlights/albrecht_high_quality_code_hu5123316043067299663.webp 768w, /blog/aiewf2025-my-day-2-highlights/albrecht_high_quality_code_hu8456975197179520585.webp 1024w, /blog/aiewf2025-my-day-2-highlights/albrecht_high_quality_code_hu1358487539591425092.webp 1280w, /blog/aiewf2025-my-day-2-highlights/albrecht_high_quality_code_hu11421452325812523996.webp 1536w, /blog/aiewf2025-my-day-2-highlights/albrecht_high_quality_code_hu14422802681620408074.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/albrecht_high_quality_code_hu18218822753486915147.jpg 320w, /blog/aiewf2025-my-day-2-highlights/albrecht_high_quality_code_hu15305362903680015537.jpg 640w, /blog/aiewf2025-my-day-2-highlights/albrecht_high_quality_code_hu12085691281172943343.jpg 768w, /blog/aiewf2025-my-day-2-highlights/albrecht_high_quality_code_hu8121665868975373455.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/albrecht_high_quality_code_hu10545034070968161179.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/albrecht_high_quality_code_hu17431932534638704055.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/albrecht_high_quality_code_hu8073753455475331234.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/albrecht_high_quality_code_hu15305362903680015537.jpg" width="2003" height="1001" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Four tips for preventing problems in AI-generated code:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/albrecht_preventing_problems.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/albrecht_preventing_problems_hu13597379173361572906.webp 320w, /blog/aiewf2025-my-day-2-highlights/albrecht_preventing_problems_hu11220241898392450201.webp 640w, /blog/aiewf2025-my-day-2-highlights/albrecht_preventing_problems_hu12633160076215590445.webp 768w, /blog/aiewf2025-my-day-2-highlights/albrecht_preventing_problems_hu3645574702345284983.webp 1024w, /blog/aiewf2025-my-day-2-highlights/albrecht_preventing_problems_hu2410817688229447442.webp 1280w, /blog/aiewf2025-my-day-2-highlights/albrecht_preventing_problems_hu9935913419706285942.webp 1536w, /blog/aiewf2025-my-day-2-highlights/albrecht_preventing_problems_hu15970299663594183973.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/albrecht_preventing_problems_hu1442225480848158854.jpg 320w, /blog/aiewf2025-my-day-2-highlights/albrecht_preventing_problems_hu3871664081850590169.jpg 640w, /blog/aiewf2025-my-day-2-highlights/albrecht_preventing_problems_hu17993326358667337556.jpg 768w, /blog/aiewf2025-my-day-2-highlights/albrecht_preventing_problems_hu18131043571241598662.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/albrecht_preventing_problems_hu12599377735586785403.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/albrecht_preventing_problems_hu12825859878163567036.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/albrecht_preventing_problems_hu4805402744411335572.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/albrecht_preventing_problems_hu3871664081850590169.jpg" width="2880" height="1434" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Regarding writing specs:</p>
<blockquote>
<p>In Sculptor, one of the ways that we try to make this easier is by helping detect if the code and the docs have become outdated. So it reduces the barrier to writing and maintaining documentation and docstrings, because now you have a way of more automatically fixing the inconsistencies. It can also highlight inconsistencies, or parts of the specifications that conflict with each other, making it easier to make sure that your system makes sense from the very beginning.</p>
</blockquote>
<p>Three tips for detecting problems in AI-generated code:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/albrecht_detecting_problems.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/albrecht_detecting_problems_hu12189231919135938691.webp 320w, /blog/aiewf2025-my-day-2-highlights/albrecht_detecting_problems_hu2503339146776857181.webp 640w, /blog/aiewf2025-my-day-2-highlights/albrecht_detecting_problems_hu5550605780647363864.webp 768w, /blog/aiewf2025-my-day-2-highlights/albrecht_detecting_problems_hu14874255754307592011.webp 1024w, /blog/aiewf2025-my-day-2-highlights/albrecht_detecting_problems_hu16520041569842693764.webp 1280w, /blog/aiewf2025-my-day-2-highlights/albrecht_detecting_problems_hu4312818061949172891.webp 1536w, /blog/aiewf2025-my-day-2-highlights/albrecht_detecting_problems_hu12554197416996728968.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/albrecht_detecting_problems_hu13413537480961953854.jpg 320w, /blog/aiewf2025-my-day-2-highlights/albrecht_detecting_problems_hu16238832365607580711.jpg 640w, /blog/aiewf2025-my-day-2-highlights/albrecht_detecting_problems_hu13146177602725021859.jpg 768w, /blog/aiewf2025-my-day-2-highlights/albrecht_detecting_problems_hu915363016586034759.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/albrecht_detecting_problems_hu14072037994708554322.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/albrecht_detecting_problems_hu9158682293352758956.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/albrecht_detecting_problems_hu3293193388033166361.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/albrecht_detecting_problems_hu16238832365607580711.jpg" width="2826" height="1386" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Things you can ask an LLM:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/albrecht_ask_llm.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/albrecht_ask_llm_hu17974167306950199601.webp 320w, /blog/aiewf2025-my-day-2-highlights/albrecht_ask_llm_hu3071602395473476129.webp 640w, /blog/aiewf2025-my-day-2-highlights/albrecht_ask_llm_hu10581286878228150299.webp 768w, /blog/aiewf2025-my-day-2-highlights/albrecht_ask_llm_hu3967506784787678492.webp 1024w, /blog/aiewf2025-my-day-2-highlights/albrecht_ask_llm_hu3686418327764355687.webp 1280w, /blog/aiewf2025-my-day-2-highlights/albrecht_ask_llm_hu6634654309223129921.webp 1536w, /blog/aiewf2025-my-day-2-highlights/albrecht_ask_llm_hu17408513667173750844.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/albrecht_ask_llm_hu3596381739389586641.jpg 320w, /blog/aiewf2025-my-day-2-highlights/albrecht_ask_llm_hu8440601501592870659.jpg 640w, /blog/aiewf2025-my-day-2-highlights/albrecht_ask_llm_hu15132939224781726704.jpg 768w, /blog/aiewf2025-my-day-2-highlights/albrecht_ask_llm_hu18252837905983522391.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/albrecht_ask_llm_hu13365283026762306090.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/albrecht_ask_llm_hu9115906675731501209.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/albrecht_ask_llm_hu16260340572892535248.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/albrecht_ask_llm_hu8440601501592870659.jpg" width="3392" height="1848" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Solid, practical advice all the way through.</p>
<h2 id="ship-production-software-in-minutes-not-months">Ship Production Software in Minutes, Not Months</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/enoreyes">Eno Reyes</a> (<a href="https://factory.ai/">Factory AI</a>) &ndash; <a href="https://www.youtube.com/watch?v=U-fMsbY-kHY&amp;t=22250s">Session video</a></p>
<p>I was blown away when I first heard Eno Reyes and Matan Grinberg on the Latent Space podcast, and I was excited to hear Reyes talk in person. To me, Factory stands out as the most serious AI tool for engineering at Fortune 500 scale, addressing Fortune 500 level problems. Factory also stands out as pioneering in the exploration of AI UX that isn&rsquo;t bolted on to the 20-year-old IDE paradigm. As Reyes says:</p>
<blockquote>
<p>It seems like right now we&rsquo;re still trying to find what that interaction pattern, what that future looks like. And a lot of what&rsquo;s publicly available is more or less an incremental improvement. The current zeitgeist is to take tools that were developed 20 years ago for humans to write every individual line of code&mdash;tools that were designed first and foremost for human beings. And you sprinkle AI on top and then you keep adding layers of AI and then at some point maybe there&rsquo;s some step function change that happens. But there&rsquo;s not a lot of clarity there in exactly what that means.</p>
</blockquote>
<p>Reyes shares Andrej Karpathy&rsquo;s quote about English-as-programming-language:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/eno_english.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/eno_english_hu16577908453004359769.webp 320w, /blog/aiewf2025-my-day-2-highlights/eno_english_hu6806813822076479053.webp 640w, /blog/aiewf2025-my-day-2-highlights/eno_english_hu4980942308801265278.webp 768w, /blog/aiewf2025-my-day-2-highlights/eno_english_hu18188180888064472356.webp 1024w, /blog/aiewf2025-my-day-2-highlights/eno_english_hu5199313682008950816.webp 1280w, /blog/aiewf2025-my-day-2-highlights/eno_english_hu17418952332888733721.webp 1536w, /blog/aiewf2025-my-day-2-highlights/eno_english_hu10337860054322705005.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/eno_english_hu13005668623418208553.jpg 320w, /blog/aiewf2025-my-day-2-highlights/eno_english_hu9181237232059881958.jpg 640w, /blog/aiewf2025-my-day-2-highlights/eno_english_hu16848155813119597344.jpg 768w, /blog/aiewf2025-my-day-2-highlights/eno_english_hu7977947374211456716.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/eno_english_hu13016169928153770600.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/eno_english_hu17084975205263653614.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/eno_english_hu6150295097055210326.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/eno_english_hu9181237232059881958.jpg" width="2456" height="1206" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>But he goes on to confront the idea that this means vibe coding. He speaks to the importance of context, and working with the kind of clientele they do, Factory certainly has good data behind this:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/eno_context.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/eno_context_hu12810601722850321294.webp 320w, /blog/aiewf2025-my-day-2-highlights/eno_context_hu15808468022136709514.webp 640w, /blog/aiewf2025-my-day-2-highlights/eno_context_hu11046871573144205166.webp 768w, /blog/aiewf2025-my-day-2-highlights/eno_context_hu17510992807892016109.webp 1024w, /blog/aiewf2025-my-day-2-highlights/eno_context_hu8392546856794699131.webp 1280w, /blog/aiewf2025-my-day-2-highlights/eno_context_hu14510900001031009804.webp 1536w, /blog/aiewf2025-my-day-2-highlights/eno_context_hu5241440047647573031.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/eno_context_hu13034260770944109870.jpg 320w, /blog/aiewf2025-my-day-2-highlights/eno_context_hu12661600135697044437.jpg 640w, /blog/aiewf2025-my-day-2-highlights/eno_context_hu586141159589606337.jpg 768w, /blog/aiewf2025-my-day-2-highlights/eno_context_hu12850929941673152247.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/eno_context_hu3965629414885336863.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/eno_context_hu6786380823946598011.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/eno_context_hu17520145809960531060.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/eno_context_hu12661600135697044437.jpg" width="2438" height="1202" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>And it&rsquo;s very clear when Reyes talks about Planning with AI that the F500 problem space Factory works in isn&rsquo;t Tiny Teams territory:</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/eno_planning_1.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/eno_planning_1_hu8955436088829060329.webp 320w, /blog/aiewf2025-my-day-2-highlights/eno_planning_1_hu6193850051158181257.webp 640w, /blog/aiewf2025-my-day-2-highlights/eno_planning_1_hu12709740620634390731.webp 768w, /blog/aiewf2025-my-day-2-highlights/eno_planning_1_hu15725866180395137960.webp 1024w, /blog/aiewf2025-my-day-2-highlights/eno_planning_1_hu10411789259607800422.webp 1280w, /blog/aiewf2025-my-day-2-highlights/eno_planning_1_hu12782493989349305144.webp 1536w, /blog/aiewf2025-my-day-2-highlights/eno_planning_1_hu16493444698334531216.webp 1920w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/eno_planning_1_hu4755965754698174981.jpg 320w, /blog/aiewf2025-my-day-2-highlights/eno_planning_1_hu4595195685603748333.jpg 640w, /blog/aiewf2025-my-day-2-highlights/eno_planning_1_hu9819230947672408100.jpg 768w, /blog/aiewf2025-my-day-2-highlights/eno_planning_1_hu17904072577692414561.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/eno_planning_1_hu9095130990666943579.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/eno_planning_1_hu4172845194851593441.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/eno_planning_1_hu10657512544205335307.jpg 1920w" sizes="37vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/eno_planning_1_hu4595195685603748333.jpg" width="2428" height="1212" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[75%] mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/eno_planning_2.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/eno_planning_2_hu13917060199953810454.webp 320w, /blog/aiewf2025-my-day-2-highlights/eno_planning_2_hu15156246515516379760.webp 640w, /blog/aiewf2025-my-day-2-highlights/eno_planning_2_hu17173178029107508863.webp 768w, /blog/aiewf2025-my-day-2-highlights/eno_planning_2_hu18297917015623321766.webp 1024w, /blog/aiewf2025-my-day-2-highlights/eno_planning_2_hu10576167322988148111.webp 1280w, /blog/aiewf2025-my-day-2-highlights/eno_planning_2_hu12115852256600519159.webp 1536w, /blog/aiewf2025-my-day-2-highlights/eno_planning_2_hu11982536500212535311.webp 1920w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/eno_planning_2_hu8126472067826698683.jpg 320w, /blog/aiewf2025-my-day-2-highlights/eno_planning_2_hu2299249872308306991.jpg 640w, /blog/aiewf2025-my-day-2-highlights/eno_planning_2_hu8087102305608163340.jpg 768w, /blog/aiewf2025-my-day-2-highlights/eno_planning_2_hu12018507967199399330.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/eno_planning_2_hu2186331317225940123.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/eno_planning_2_hu11233868198097677662.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/eno_planning_2_hu14674818253453046575.jpg 1920w" sizes="37vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/eno_planning_2_hu2299249872308306991.jpg" width="2450" height="1222" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>In the agile, human coding world that preceded AI coding, I have tended to streamline if not completely avoid what I saw as bloated processes based around heavyweight artifacts like Product Requirements Documents (PRDs). But in the new world, especially at F500 scale, it makes sense that we are coming full circle. Reyes moves on to a Site Reliability Engineering (SRE) example, showing context and documents coming together in a loop where the system gets smarter and smarter, moving from helping to diagnose incidents to suggesting &ldquo;I&rsquo;m seeing a pattern here&mdash;why don&rsquo;t we fix this.&rdquo;</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/eno_sre.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/eno_sre_hu2182961888471950736.webp 320w, /blog/aiewf2025-my-day-2-highlights/eno_sre_hu7160548990527427372.webp 640w, /blog/aiewf2025-my-day-2-highlights/eno_sre_hu6511670221650491048.webp 768w, /blog/aiewf2025-my-day-2-highlights/eno_sre_hu15140823285159091937.webp 1024w, /blog/aiewf2025-my-day-2-highlights/eno_sre_hu7173828876307172815.webp 1280w, /blog/aiewf2025-my-day-2-highlights/eno_sre_hu2986237376853135431.webp 1536w, /blog/aiewf2025-my-day-2-highlights/eno_sre_hu6951432498630901088.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/eno_sre_hu4039711305919670600.jpg 320w, /blog/aiewf2025-my-day-2-highlights/eno_sre_hu3574531203841322503.jpg 640w, /blog/aiewf2025-my-day-2-highlights/eno_sre_hu12792404301228401798.jpg 768w, /blog/aiewf2025-my-day-2-highlights/eno_sre_hu8698868659288090892.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/eno_sre_hu5974708777507193632.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/eno_sre_hu15947446022991902824.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/eno_sre_hu4983135944045623731.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/eno_sre_hu3574531203841322503.jpg" width="2174" height="1228" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Reyes closes by addressing the subject of AIs replacing engineers:</p>
<blockquote>
<p>AI agents are not replacing software engineers. They&rsquo;re significantly amplifying their individual capabilities. The best developers I know are spending far less time in the IDE writing lines of code. It&rsquo;s just not high leverage. They&rsquo;re managing agents that can do multiple things at once that are capable of organizing the systems and they&rsquo;re building out patterns that supersede the inner loop of software development and they&rsquo;re moving to the outer loop of software development. They aren&rsquo;t worried about agents taking their jobs. They&rsquo;re too busy using the agents to become even better at what they do. The future belongs to developers who understand how to work with agents, not those who hope that AI will just do the work for them. And in that future, the skill that matters most is not technical knowledge or your ability to optimize a specific system, but your ability to think clearly and communicate effectively with both humans and AI.</p>
</blockquote>
<p>Mixed in with his slides, Reyes includes several video demos of Factory &ldquo;Droids&rdquo; in action&mdash;their name for independent agents. Very much worth watching in full.</p>
<h2 id="trends-across-the-ai-frontier">Trends across the AI Frontier</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/georgecameron">George Cameron</a> (<a href="https://artificialanalysis.ai/">Artificial Analysis</a>) &ndash; <a href="https://www.youtube.com/watch?v=U-fMsbY-kHY&amp;t=26131s">Session video</a></p>
<p>Artificial Analysis maintains a multidimensional and deeply impressive treasure trove of data about LLMs. George Cameron opens by confirming <a href="/blog/aiewf2025-my-day-1-highlights/#2025-in-llms-so-far">Simon Willison&rsquo;s declaration</a> that it&rsquo;s been an insane period of LLM development, then turns to the past 12 months in frontier models: OpenAI still in the lead, but the frontier more competitive than ever:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/cameron_frontier_intelligence.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/cameron_frontier_intelligence_hu8698643600489591759.webp 320w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_intelligence_hu8093774869700842848.webp 640w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_intelligence_hu7757475498829850436.webp 768w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_intelligence_hu12998584344120195646.webp 1024w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_intelligence_hu15417044069400637622.webp 1280w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_intelligence_hu13444476138301951282.webp 1536w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_intelligence_hu4078708239894656626.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/cameron_frontier_intelligence_hu13339267229102635857.jpg 320w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_intelligence_hu3724074769178447414.jpg 640w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_intelligence_hu6372398666523092098.jpg 768w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_intelligence_hu13292842630062376201.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_intelligence_hu17787970175429438892.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_intelligence_hu11559557091896115740.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_intelligence_hu1148660658714361058.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/cameron_frontier_intelligence_hu3724074769178447414.jpg" width="2860" height="1280" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Note: less than 2 months after Cameron&rsquo;s presentation, there&rsquo;ve been significant changes in the frontier intelligence rankings:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/cameron_frontier_models_250731.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/cameron_frontier_models_250731_hu17299048637899223064.webp 320w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_models_250731_hu13593784699799946849.webp 640w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_models_250731_hu14629411415074598215.webp 768w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_models_250731_hu2942955125607775351.webp 1024w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_models_250731_hu11279479999677272059.webp 1280w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_models_250731_hu12667654833798400334.webp 1536w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_models_250731_hu14927885013459673290.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/cameron_frontier_models_250731_hu11440564770837129952.jpg 320w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_models_250731_hu15961348132527715902.jpg 640w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_models_250731_hu2730351791765417089.jpg 768w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_models_250731_hu8604095563657564620.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_models_250731_hu7683279264388376597.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_models_250731_hu2053930695847586988.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/cameron_frontier_models_250731_hu3939668029269930181.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/cameron_frontier_models_250731_hu15961348132527715902.jpg" width="3916" height="1792" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Cameron next points out that there is actually more than one frontier in AI:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/cameron_four_frontiers.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/cameron_four_frontiers_hu6559048678935465240.webp 320w, /blog/aiewf2025-my-day-2-highlights/cameron_four_frontiers_hu12604886776963077788.webp 640w, /blog/aiewf2025-my-day-2-highlights/cameron_four_frontiers_hu9622918531560296603.webp 768w, /blog/aiewf2025-my-day-2-highlights/cameron_four_frontiers_hu17372838176811742503.webp 1024w, /blog/aiewf2025-my-day-2-highlights/cameron_four_frontiers_hu3648521035467713276.webp 1280w, /blog/aiewf2025-my-day-2-highlights/cameron_four_frontiers_hu4528273825815162717.webp 1536w, /blog/aiewf2025-my-day-2-highlights/cameron_four_frontiers_hu2960429224230957062.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/cameron_four_frontiers_hu4462413329223111611.jpg 320w, /blog/aiewf2025-my-day-2-highlights/cameron_four_frontiers_hu16917415535110180589.jpg 640w, /blog/aiewf2025-my-day-2-highlights/cameron_four_frontiers_hu148658157398009268.jpg 768w, /blog/aiewf2025-my-day-2-highlights/cameron_four_frontiers_hu15907203229844564373.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/cameron_four_frontiers_hu10470357867400098249.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/cameron_four_frontiers_hu10838302376116689127.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/cameron_four_frontiers_hu4703237803974641410.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/cameron_four_frontiers_hu16917415535110180589.jpg" width="2904" height="1370" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>He goes on to explore each category. Super-sharp analysis; AA is a new go-to for me when I need this type of comparison. The one domain where I&rsquo;m skeptical of AA&rsquo;s ratings: coding. Either my (and the world-at-large&rsquo;s) experience is mistaken, or AA&rsquo;s coding benchmarks are borked. Otherwise, though, I stick by my pronouncement: a multidimensional and deeply impressive treasure trove. Check <a href="https://artificialanalysis.ai/">them</a> out.</p>
<h2 id="state-of-ai-engineering-2025">State of AI Engineering 2025</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/barryaron">Barr Yaron</a> (<a href="https://amplifypartners.com/">Amplify Partners</a>) &ndash; <a href="https://www.youtube.com/watch?v=U-fMsbY-kHY&amp;t=27510s">Session video</a></p>
<p>Amplify Partners recently completed their 2025 State of AI Engineering Survey, and partner Barr Yaron presented a number of early findings. Money slide for me: top newsletters and podcasts:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/yaron_newsletters_podcasts.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/yaron_newsletters_podcasts_hu12880793597212472060.webp 320w, /blog/aiewf2025-my-day-2-highlights/yaron_newsletters_podcasts_hu14469950514419132796.webp 640w, /blog/aiewf2025-my-day-2-highlights/yaron_newsletters_podcasts_hu14465282614347219801.webp 768w, /blog/aiewf2025-my-day-2-highlights/yaron_newsletters_podcasts_hu17420374296938970793.webp 1024w, /blog/aiewf2025-my-day-2-highlights/yaron_newsletters_podcasts_hu9225192169581426911.webp 1280w, /blog/aiewf2025-my-day-2-highlights/yaron_newsletters_podcasts_hu15257322031029092942.webp 1536w, /blog/aiewf2025-my-day-2-highlights/yaron_newsletters_podcasts_hu6748158779509762403.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/yaron_newsletters_podcasts_hu10417819195970996583.jpg 320w, /blog/aiewf2025-my-day-2-highlights/yaron_newsletters_podcasts_hu6527532373083607902.jpg 640w, /blog/aiewf2025-my-day-2-highlights/yaron_newsletters_podcasts_hu3084054879149240692.jpg 768w, /blog/aiewf2025-my-day-2-highlights/yaron_newsletters_podcasts_hu9386001613117288089.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/yaron_newsletters_podcasts_hu9681392903030829298.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/yaron_newsletters_podcasts_hu15691239184326258500.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/yaron_newsletters_podcasts_hu8039181240263028909.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/yaron_newsletters_podcasts_hu6527532373083607902.jpg" width="2866" height="1304" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="fun-stories-from-building-openrouter-and-where-all-this-is-going">Fun Stories from Building OpenRouter and Where All This is Going</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/alexatallah">Alex Atallah</a> (<a href="https://openrouter.ai/">OpenRouter</a>) &ndash; <a href="https://www.youtube.com/watch?v=U-fMsbY-kHY&amp;t=28298s">Session video</a></p>
<p>I had quite a few knowledge gaps filled in at the World&rsquo;s Fair; one big one was near-total ignorance of OpenRouter. In reality, it has taken me World&rsquo;s Fair plus two months for the OpenRouter concept to really sink in, and for me to understand the use cases where it makes a ton of sense. I&rsquo;m just about to set up an OpenRouter account for use on a proof of concept project where I want to avoid setting up many individual direct accounts with model makers. I send my thanks to Alex Atallah, founder of OpenRouter, for helping reduce my ignorance through this talk.</p>
<p>So what are the fundamentals of OpenRouter? A single API to access virtually every language model; and, because it is a marketplace, an unparalleled source of data about who is using which models. Pay in one place, with near-zero switching cost: more than 400 models, more than 60 access providers, and many payment methods including crypto.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/atallah_or_basics.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/atallah_or_basics_hu2067378697402544235.webp 320w, /blog/aiewf2025-my-day-2-highlights/atallah_or_basics_hu9919418180787249849.webp 640w, /blog/aiewf2025-my-day-2-highlights/atallah_or_basics_hu4781826903669285485.webp 768w, /blog/aiewf2025-my-day-2-highlights/atallah_or_basics_hu7902097465953552227.webp 1024w, /blog/aiewf2025-my-day-2-highlights/atallah_or_basics_hu9322094636200419336.webp 1280w, /blog/aiewf2025-my-day-2-highlights/atallah_or_basics_hu8077128621641836955.webp 1536w, /blog/aiewf2025-my-day-2-highlights/atallah_or_basics_hu3630738745575597917.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/atallah_or_basics_hu11572505599012077908.jpg 320w, /blog/aiewf2025-my-day-2-highlights/atallah_or_basics_hu3199232133058047672.jpg 640w, /blog/aiewf2025-my-day-2-highlights/atallah_or_basics_hu12719491797224874386.jpg 768w, /blog/aiewf2025-my-day-2-highlights/atallah_or_basics_hu11530895604033469892.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/atallah_or_basics_hu14328892667623663451.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/atallah_or_basics_hu3964819041355454339.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/atallah_or_basics_hu5186817961709727225.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/atallah_or_basics_hu3199232133058047672.jpg" width="2484" height="1590" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>
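<p>To make the &ldquo;one API, pay in one place&rdquo; idea concrete, here is a minimal sketch of calling OpenRouter&rsquo;s OpenAI-compatible chat completions endpoint using only the Python standard library. The URL and JSON fields follow OpenRouter&rsquo;s public docs; the model slugs and the <code>OPENROUTER_API_KEY</code> environment variable are illustrative assumptions.</p>

```python
# Minimal sketch: one request shape, any model. The URL and JSON fields
# follow OpenRouter's OpenAI-compatible chat completions API; model slugs
# and the OPENROUTER_API_KEY variable are illustrative assumptions.
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build the HTTP request; switching models is just a string change."""
    payload = {
        "model": model,  # e.g. "openai/gpt-4o" or "anthropic/claude-3.5-sonnet"
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

def send(req: urllib.request.Request) -> str:
    """Perform the actual call (requires a funded account and a valid key)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

<p>Because the request shape is identical for every model, trying a different provider&rsquo;s model is a one-string change.</p>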

<p>The founding story and evolution of OpenRouter is fascinating: a journey from experiment to marketplace. Major labs started banning uses. The open source race and &ldquo;Llama storm.&rdquo; First successful model distillation: Alpaca, trained for $600. OpenRouter was initially a place to collect all these. The Window.ai Chrome extension allowed users to choose their own models. OpenRouter&rsquo;s official launch came in May &lsquo;23. OpenRouter wasn&rsquo;t a marketplace originally &ndash; it started out with just one primary and one fallback provider for any given model. Then a great proliferation of providers emerged, where a single model might be available from many providers with different price points and supported feature sets (e.g. with or without structured output). This pushed OpenRouter to become a marketplace.</p>
<p>Then it emerged that the closed-model labs couldn&rsquo;t keep up with inference demand on their own platforms. OpenRouter helped developers boost uptime by routing requests for closed models to alternative sources, e.g. Claude models on Amazon Bedrock. Check out this uptime chart to get a feel for the benefits:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/atallah_uptime_boosting.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/atallah_uptime_boosting_hu475873704187750477.webp 320w, /blog/aiewf2025-my-day-2-highlights/atallah_uptime_boosting_hu17855237322118201283.webp 640w, /blog/aiewf2025-my-day-2-highlights/atallah_uptime_boosting_hu8070247728781806119.webp 768w, /blog/aiewf2025-my-day-2-highlights/atallah_uptime_boosting_hu16062528338439389136.webp 1024w, /blog/aiewf2025-my-day-2-highlights/atallah_uptime_boosting_hu11058163319182925452.webp 1280w, /blog/aiewf2025-my-day-2-highlights/atallah_uptime_boosting_hu7501767125250187241.webp 1536w, /blog/aiewf2025-my-day-2-highlights/atallah_uptime_boosting_hu6979221759308679062.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/atallah_uptime_boosting_hu16802188555882490901.jpg 320w, /blog/aiewf2025-my-day-2-highlights/atallah_uptime_boosting_hu2763585760713110676.jpg 640w, /blog/aiewf2025-my-day-2-highlights/atallah_uptime_boosting_hu12166563921222639563.jpg 768w, /blog/aiewf2025-my-day-2-highlights/atallah_uptime_boosting_hu1791508101980854103.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/atallah_uptime_boosting_hu13158105761638914393.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/atallah_uptime_boosting_hu12740479541385601728.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/atallah_uptime_boosting_hu5766647984244209330.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/atallah_uptime_boosting_hu2763585760713110676.jpg" width="3406" height="1632" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>
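<p>The routing idea behind that chart can be sketched in a few lines. OpenRouter&rsquo;s API accepts an ordered list of fallback models tried when the primary fails; the <code>models</code> field below follows their routing docs, but treat the exact field names as assumptions to verify against the current API reference.</p>

```python
# Hedged sketch of uptime-boosting fallbacks: if the primary deployment is
# down or overloaded, the request can be served by a fallback model, e.g. a
# sibling deployment via another provider. The "models" fallback list
# follows OpenRouter's routing docs; verify field names against current docs.

def routed_payload(prompt: str, primary: str, fallbacks: list[str]) -> dict:
    """Chat-completions body with an ordered fallback chain."""
    return {
        "model": primary,
        "models": [primary, *fallbacks],  # tried in order on failure
        "messages": [{"role": "user", "content": prompt}],
    }

body = routed_payload(
    "Summarize this incident report.",
    primary="anthropic/claude-3.5-sonnet",
    fallbacks=["openai/gpt-4o"],
)
```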

<p>With all the fantastic data OpenRouter has at its disposal, what do they believe the future looks like?</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/atallah_believe.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/atallah_believe_hu9496375538836463242.webp 320w, /blog/aiewf2025-my-day-2-highlights/atallah_believe_hu8096822473360030043.webp 640w, /blog/aiewf2025-my-day-2-highlights/atallah_believe_hu950258753244860127.webp 768w, /blog/aiewf2025-my-day-2-highlights/atallah_believe_hu4557912532376162287.webp 1024w, /blog/aiewf2025-my-day-2-highlights/atallah_believe_hu17931958103999236972.webp 1280w, /blog/aiewf2025-my-day-2-highlights/atallah_believe_hu17839527102552720219.webp 1536w, /blog/aiewf2025-my-day-2-highlights/atallah_believe_hu13768155874839545994.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/atallah_believe_hu2691295285451534918.jpg 320w, /blog/aiewf2025-my-day-2-highlights/atallah_believe_hu11426504413529366342.jpg 640w, /blog/aiewf2025-my-day-2-highlights/atallah_believe_hu4930176167270731118.jpg 768w, /blog/aiewf2025-my-day-2-highlights/atallah_believe_hu4300760942669472789.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/atallah_believe_hu3799106008995927041.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/atallah_believe_hu3690540873735501564.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/atallah_believe_hu6310773293784643597.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/atallah_believe_hu11426504413529366342.jpg" width="2520" height="1214" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>I love the direction OpenRouter is taking with their API, adding a plugin-like system that augments, transparently and in real time, the capabilities of <em>all</em> models. Atallah mentions two examples: web search and PDF data extraction.</p>
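<p>As a taste of how lightweight that augmentation is: OpenRouter&rsquo;s docs describe an <code>:online</code> model-slug suffix that layers web search onto any model. The suffix is per their documentation at the time of writing; the helper itself is purely illustrative.</p>

```python
# Illustrative helper: OpenRouter documents an ":online" model-slug suffix
# that enables web search for any model. Verify against their current docs.

def with_web_search(model: str) -> str:
    """Append the ':online' suffix unless it is already present."""
    return model if model.endswith(":online") else f"{model}:online"
```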
<p>All told, OpenRouter is a major addition to my toolbox, and <a href="https://www.youtube.com/watch?v=U-fMsbY-kHY&amp;t=28298s">this session</a> is a must-watch.</p>
<h2 id="the-new-code">The New Code</h2>
<p><strong>Speaker:</strong> <a href="https://x.com/sgrove">Sean Grove</a> (<a href="https://openai.com/">OpenAI</a>) &ndash; <a href="https://www.youtube.com/watch?v=U-fMsbY-kHY&amp;t=29425s">Session video</a></p>
<p>Earlier in the day, <a href="#ship-production-software-in-minutes-not-months">Eno Reyes</a> quoted Andrej Karpathy, &ldquo;The hottest new programming language is English,&rdquo; and emphasized structured documents like PRDs as an essential way to communicate with AI agents. Picking up where Reyes left off, OpenAI&rsquo;s Sean Grove talked on &ldquo;The New Code&mdash;Specs.&rdquo;</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/sean_grove_intro.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/sean_grove_intro_hu5545760182863944826.webp 320w, /blog/aiewf2025-my-day-2-highlights/sean_grove_intro_hu17080920215123502075.webp 640w, /blog/aiewf2025-my-day-2-highlights/sean_grove_intro_hu6068321239005583118.webp 768w, /blog/aiewf2025-my-day-2-highlights/sean_grove_intro_hu3174444680091937241.webp 1024w, /blog/aiewf2025-my-day-2-highlights/sean_grove_intro_hu9517512495212118810.webp 1280w, /blog/aiewf2025-my-day-2-highlights/sean_grove_intro_hu4262526907961997266.webp 1536w, /blog/aiewf2025-my-day-2-highlights/sean_grove_intro_hu7315654350982590038.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/sean_grove_intro_hu1758805980961327904.jpg 320w, /blog/aiewf2025-my-day-2-highlights/sean_grove_intro_hu13003222657677331182.jpg 640w, /blog/aiewf2025-my-day-2-highlights/sean_grove_intro_hu8058650562334991549.jpg 768w, /blog/aiewf2025-my-day-2-highlights/sean_grove_intro_hu148826659742455851.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/sean_grove_intro_hu21016302020555019.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/sean_grove_intro_hu13908227765362119706.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/sean_grove_intro_hu18369301991796629304.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/sean_grove_intro_hu13003222657677331182.jpg" width="4032" height="3024" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Grove did a quick audience survey, asking how many people &ldquo;coded&rdquo; and how many of them felt that code was their primary output:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/grove_code.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/grove_code_hu17196566538115619807.webp 320w, /blog/aiewf2025-my-day-2-highlights/grove_code_hu10758376970885327186.webp 640w, /blog/aiewf2025-my-day-2-highlights/grove_code_hu8276409705591940812.webp 768w, /blog/aiewf2025-my-day-2-highlights/grove_code_hu14769692516452697801.webp 1024w, /blog/aiewf2025-my-day-2-highlights/grove_code_hu7216027047298605326.webp 1280w, /blog/aiewf2025-my-day-2-highlights/grove_code_hu12796563561342713477.webp 1536w, /blog/aiewf2025-my-day-2-highlights/grove_code_hu2456302212458462004.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/grove_code_hu5321012527687862791.jpg 320w, /blog/aiewf2025-my-day-2-highlights/grove_code_hu13145991285493880106.jpg 640w, /blog/aiewf2025-my-day-2-highlights/grove_code_hu11257611196412075305.jpg 768w, /blog/aiewf2025-my-day-2-highlights/grove_code_hu12215210548846824638.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/grove_code_hu2341260396723275003.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/grove_code_hu11843078145020581785.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/grove_code_hu14948405124995715528.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/grove_code_hu13145991285493880106.jpg" width="3430" height="1526" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>He then challenged this belief, suggesting that code is more like 10-20% of the output.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/grove_structured_communication.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/grove_structured_communication_hu16253066227873857607.webp 320w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_hu270895111340663081.webp 640w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_hu12797794424625738881.webp 768w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_hu9674659010213512409.webp 1024w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_hu11092868011399643065.webp 1280w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_hu9819195181809893876.webp 1536w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_hu8746639040354432704.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/grove_structured_communication_hu15013204683644328676.jpg 320w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_hu15156847340136596462.jpg 640w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_hu9844497663242409323.jpg 768w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_hu1044790431212852920.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_hu1283352658658626704.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_hu3447707967272884216.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_hu6481999487555785010.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/grove_structured_communication_hu15156847340136596462.jpg" width="2870" height="1290" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>What&rsquo;s the far larger slice of impact? What he calls &ldquo;structured communication.&rdquo; Here&rsquo;s what it looks like:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/grove_structured_communication_details.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/grove_structured_communication_details_hu12152134638551789614.webp 320w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_details_hu3219153769424162025.webp 640w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_details_hu14575811192789949145.webp 768w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_details_hu2624163284583722604.webp 1024w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_details_hu6480618899422600449.webp 1280w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_details_hu16535681605463897945.webp 1536w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_details_hu6161768738440804415.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/grove_structured_communication_details_hu13314194898807011919.jpg 320w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_details_hu5901299140593582490.jpg 640w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_details_hu8052970461977306086.jpg 768w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_details_hu801534157413130992.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_details_hu7731394015850696264.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_details_hu9452193385093036645.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_details_hu10194672227340371024.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/grove_structured_communication_details_hu5901299140593582490.jpg" width="2866" height="1278" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>He sees structured communication as the key bottleneck in software engineering, and predicts that it will only get more painful as models get better.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/grove_structured_communication_bottleneck.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/grove_structured_communication_bottleneck_hu5515114707627826518.webp 320w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_bottleneck_hu251410312087724441.webp 640w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_bottleneck_hu142948092458636249.webp 768w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_bottleneck_hu2087655435916584643.webp 1024w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_bottleneck_hu10581999720083267421.webp 1280w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_bottleneck_hu16230680521066302871.webp 1536w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_bottleneck_hu9967597785968858484.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/grove_structured_communication_bottleneck_hu5614344362095251254.jpg 320w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_bottleneck_hu11450703627869869082.jpg 640w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_bottleneck_hu13059610969269897017.jpg 768w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_bottleneck_hu7515097777802063381.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_bottleneck_hu16173731376459791136.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_bottleneck_hu5715222460291485945.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/grove_structured_communication_bottleneck_hu14832532590728595326.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/grove_structured_communication_bottleneck_hu11450703627869869082.jpg" width="2876" height="1260" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<blockquote>
<p>In the near future, the person who communicates most effectively is the most valuable programmer. And literally, if you can communicate effectively, you can program.</p>
</blockquote>
<p>Vibe coding &ndash; and why it feels so great &ndash; is fundamentally about communication first, coding second. But here&rsquo;s the weird thing: we communicate via prompts to express our intentions and our values. We then produce an artifact, code, and <strong>throw away</strong> the valuable stuff, the prompts! That&rsquo;s like throwing away the source code and keeping only the compiled binary. This leads Grove to emphasize that written specs are critical, and why:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/grove_written_spec_aligns_humans.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/grove_written_spec_aligns_hu17545045531647667820.webp 320w, /blog/aiewf2025-my-day-2-highlights/grove_written_spec_aligns_hu12671024437650054782.webp 640w, /blog/aiewf2025-my-day-2-highlights/grove_written_spec_aligns_hu12126741353679131340.webp 768w, /blog/aiewf2025-my-day-2-highlights/grove_written_spec_aligns_hu13668224356798459993.webp 1024w, /blog/aiewf2025-my-day-2-highlights/grove_written_spec_aligns_hu7826006972508756895.webp 1280w, /blog/aiewf2025-my-day-2-highlights/grove_written_spec_aligns_hu12042563556434226630.webp 1536w, /blog/aiewf2025-my-day-2-highlights/grove_written_spec_aligns_hu197054804016521353.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/grove_written_spec_aligns_hu4398184817442762824.jpg 320w, /blog/aiewf2025-my-day-2-highlights/grove_written_spec_aligns_hu16275112006000805998.jpg 640w, /blog/aiewf2025-my-day-2-highlights/grove_written_spec_aligns_hu7355409371077439246.jpg 768w, /blog/aiewf2025-my-day-2-highlights/grove_written_spec_aligns_hu1240288566245079233.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/grove_written_spec_aligns_hu2719051334085265928.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/grove_written_spec_aligns_hu5999701928313227676.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/grove_written_spec_aligns_hu16555659573091374976.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/grove_written_spec_aligns_hu16275112006000805998.jpg" width="2866" height="1288" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Grove uses another analogy: specs are like a lossless format, while code is lossy. The spec carries the full intent; translating it into code discards some of that information.</p>
<p>He walked us through an example involving OpenAI&rsquo;s model specs (intentions and values), which are now published, and a recent sycophancy regression identified in an updated version of their GPT-4o model. The model spec contained very specific anti-sycophancy language, but the behavior of the GPT-4o update was clearly not aligned with it. As a result, the team was able to confirm &ldquo;bug, not intentional,&rdquo; roll back the update, and fix the problem. The written spec acted as a &ldquo;trust anchor&rdquo; here: a trustworthy source on how the model should behave.</p>
<p>One more corollary: whoever writes the specs is the programmer.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-2-highlights/grove_whoever_writes_specs.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-2-highlights/grove_whoever_writes_specs_hu15789235288466472805.webp 320w, /blog/aiewf2025-my-day-2-highlights/grove_whoever_writes_specs_hu7292721484498214855.webp 640w, /blog/aiewf2025-my-day-2-highlights/grove_whoever_writes_specs_hu8735235940244070572.webp 768w, /blog/aiewf2025-my-day-2-highlights/grove_whoever_writes_specs_hu5249792832624347339.webp 1024w, /blog/aiewf2025-my-day-2-highlights/grove_whoever_writes_specs_hu18118174459974558393.webp 1280w, /blog/aiewf2025-my-day-2-highlights/grove_whoever_writes_specs_hu1763147936612293811.webp 1536w, /blog/aiewf2025-my-day-2-highlights/grove_whoever_writes_specs_hu14557920710154259132.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-2-highlights/grove_whoever_writes_specs_hu16844589159660488167.jpg 320w, /blog/aiewf2025-my-day-2-highlights/grove_whoever_writes_specs_hu5928927991489699029.jpg 640w, /blog/aiewf2025-my-day-2-highlights/grove_whoever_writes_specs_hu18392938156246356930.jpg 768w, /blog/aiewf2025-my-day-2-highlights/grove_whoever_writes_specs_hu15457643716316576896.jpg 1024w, /blog/aiewf2025-my-day-2-highlights/grove_whoever_writes_specs_hu3870373971391500208.jpg 1280w, /blog/aiewf2025-my-day-2-highlights/grove_whoever_writes_specs_hu3223063038923614890.jpg 1536w, /blog/aiewf2025-my-day-2-highlights/grove_whoever_writes_specs_hu5623347303357978059.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-2-highlights/grove_whoever_writes_specs_hu5928927991489699029.jpg" width="2854" height="1276" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Grove&rsquo;s talk, paired with <a href="#ship-production-software-in-minutes-not-months">Reyes&rsquo; talk earlier in the day</a>, forms an <strong>AI Spec Sandwich</strong>. Welcome to the new programming.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>A Better Way to Give Claude Code Context</title>
      <link>https://craftycto.com/micro/giving-claude-code-context/</link>
      <pubDate>Wed, 30 Jul 2025 20:53:47 -0500</pubDate>
      
      <guid>https://craftycto.com/micro/giving-claude-code-context/</guid>
      <description>I like Scott Werner&amp;#39;s more structured approach</description>
<content:encoded><![CDATA[<p>I just finished my first deep dive with <a href="https://www.anthropic.com/claude-code">Claude Code</a>: a vision proof of concept that extracts handwritten data from clear plastic bags&mdash;<strong>filled</strong> plastic bags, that is &hellip; Claude Code and I started Friday, and by Monday we had a robust proof of concept, implemented as a Python CLI, including parallelized LLM calls and detailed documentation. My working time was maybe an hour per day, and the <a href="https://docs.anthropic.com/en/api/overview">Anthropic API</a> bill for Claude&rsquo;s model usage was about $80. I&rsquo;d rate the ROI on that $80 investment extremely high &hellip;</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/micro/giving-claude-code-context/code_stats.png"><picture>
    <source type="image/webp" srcset="/micro/giving-claude-code-context/code_stats_hu1630127590437830223.webp 320w" sizes="75vw" />
    <source type="image/jpeg" srcset="/micro/giving-claude-code-context/code_stats_hu8080298436677932108.jpg 320w" sizes="75vw" />
    
      <img src="/micro/giving-claude-code-context/code_stats_hu8127801284307008584.jpg" width="400" height="358" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>I knew enough to spend time giving Claude plenty of context: I was at the <a href="https://www.ai.engineer/">AI Engineer World&rsquo;s Fair</a> in June and had seen Anthropic&rsquo;s <a href="https://www.linkedin.com/in/bcherny/">Boris Cherny</a> speak. Armed with Cherny&rsquo;s context tips, I created a number of markdown files, including <code>CLAUDE.md</code> at the project root, along with a well-populated <code>/docs</code> folder.</p>
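<p>For a sense of what that can look like, here&rsquo;s a minimal <code>CLAUDE.md</code> sketch&mdash;the contents below are illustrative, not the actual file from this project:</p>
<pre><code># CLAUDE.md

## Project
Python CLI that extracts handwritten data from photos of filled plastic bags.

## Commands
- Run: python -m bag_reader &lt;image-dir&gt;
- Tests: pytest

## Conventions
- Python 3.12, type hints throughout
- LLM calls run in parallel; read docs/architecture.md before touching them

## More context
Deeper background lives in /docs: architecture notes, prompt design, eval results.
</code></pre>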
<p>I had noted <a href="https://worksonmymachine.ai/">Scott Werner</a>&rsquo;s post <a href="https://worksonmymachine.ai/p/nobody-knows-how-to-build-with-ai">Nobody Knows How To Build With AI Yet</a> when it first came out a week or two back, and finally got around to reading it this morning. Werner levels us up on how to give Claude Code context, adding some well-thought-out separation and structure:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/micro/giving-claude-code-context/werner_context.png"><picture>
    <source type="image/webp" srcset="/micro/giving-claude-code-context/werner_context_hu5113788279574523381.webp 320w, /micro/giving-claude-code-context/werner_context_hu1009929958452455553.webp 640w, /micro/giving-claude-code-context/werner_context_hu1050261210059112068.webp 768w" sizes="100vw" />
    <source type="image/jpeg" srcset="/micro/giving-claude-code-context/werner_context_hu1541858090721641090.jpg 320w, /micro/giving-claude-code-context/werner_context_hu17948880492423928226.jpg 640w, /micro/giving-claude-code-context/werner_context_hu7075126454738054215.jpg 768w" sizes="100vw" />
    
      <img src="/micro/giving-claude-code-context/werner_context_hu17948880492423928226.jpg" width="1012" height="1157" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>His full GitHub Gist is <a href="https://gist.github.com/swerner/b7fd285ec3f2edbcc55ebccdaffc950a">here</a>. As Werner caveats, though:</p>
<blockquote>
<p>I have no idea what I&rsquo;ll build next week. What process I&rsquo;ll use. What documents I&rsquo;ll create or abandon. Whether any of this will still make sense &hellip; My four-document system isn&rsquo;t a recommendation. It&rsquo;s just one data point in the collective experiment we&rsquo;re all running. A fossil from last week&rsquo;s development process. Already outdated. Already nostalgic.</p>
</blockquote>
<p>Things move fast in AI time, but for now, I&rsquo;ll apply Werner&rsquo;s approach on my next Claude Code project.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>ChatGPT Images Pictorial</title>
      <link>https://craftycto.com/micro/chatgpt-images-pictorial/</link>
      <pubDate>Tue, 29 Jul 2025 21:21:00 -0500</pubDate>
      
      <guid>https://craftycto.com/micro/chatgpt-images-pictorial/</guid>
      <description>Cthulhu chimes in on the environment</description>
<content:encoded><![CDATA[
<p>Photo I shot on my walk:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/micro/chatgpt-images-pictorial/orig_photo_sm.png"><picture>
    <source type="image/webp" srcset="/micro/chatgpt-images-pictorial/orig_photo_sm_hu17288352010782899013.webp 320w, /micro/chatgpt-images-pictorial/orig_photo_sm_hu197495819939647415.webp 640w, /micro/chatgpt-images-pictorial/orig_photo_sm_hu17357429573995448345.webp 768w, /micro/chatgpt-images-pictorial/orig_photo_sm_hu11982123372176551873.webp 1024w, /micro/chatgpt-images-pictorial/orig_photo_sm_hu5384155599092350413.webp 1280w, /micro/chatgpt-images-pictorial/orig_photo_sm_hu13812438601040680455.webp 1536w, /micro/chatgpt-images-pictorial/orig_photo_sm_hu14222604459341984279.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/micro/chatgpt-images-pictorial/orig_photo_sm_hu5829963859599545118.jpg 320w, /micro/chatgpt-images-pictorial/orig_photo_sm_hu3969267820072622987.jpg 640w, /micro/chatgpt-images-pictorial/orig_photo_sm_hu4946324218824897175.jpg 768w, /micro/chatgpt-images-pictorial/orig_photo_sm_hu11146536817843787703.jpg 1024w, /micro/chatgpt-images-pictorial/orig_photo_sm_hu15589608089219126564.jpg 1280w, /micro/chatgpt-images-pictorial/orig_photo_sm_hu5378485029802292233.jpg 1536w, /micro/chatgpt-images-pictorial/orig_photo_sm_hu7119552646811582102.jpg 1920w" sizes="100vw" />
    
      <img src="/micro/chatgpt-images-pictorial/orig_photo_sm_hu3969267820072622987.jpg" width="2048" height="2731" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Found a more complete photo on the web:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/micro/chatgpt-images-pictorial/web_photo.png"><picture>
    <source type="image/webp" srcset="/micro/chatgpt-images-pictorial/web_photo_hu3933134321812512196.webp 320w, /micro/chatgpt-images-pictorial/web_photo_hu13775632590487479024.webp 640w, /micro/chatgpt-images-pictorial/web_photo_hu1619633224706417714.webp 768w, /micro/chatgpt-images-pictorial/web_photo_hu667158656653362675.webp 1024w" sizes="100vw" />
    <source type="image/jpeg" srcset="/micro/chatgpt-images-pictorial/web_photo_hu7591391821711003200.jpg 320w, /micro/chatgpt-images-pictorial/web_photo_hu18417009778726186520.jpg 640w, /micro/chatgpt-images-pictorial/web_photo_hu7774717502134204902.jpg 768w, /micro/chatgpt-images-pictorial/web_photo_hu16047638248727311487.jpg 1024w" sizes="100vw" />
    
      <img src="/micro/chatgpt-images-pictorial/web_photo_hu18417009778726186520.jpg" width="1024" height="768" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>ChatGPT Images prompt:</p>
<blockquote>
<p>Modify this:</p>
<p>EAST JORDAN becomes CTHULHU</p>
<p>DUMP NO WASTE! becomes TOO LATE F[expletive]S!</p>
<p>DRAINS TO WATERWAYS becomes I OWN THE WATERWAYS</p>
</blockquote>
<p>First ChatGPT Images version (after ChatGPT-enforced rewording of the expletive):</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/micro/chatgpt-images-pictorial/images_take_1.png"><picture>
    <source type="image/webp" srcset="/micro/chatgpt-images-pictorial/images_take_1_hu4268899774013348361.webp 320w, /micro/chatgpt-images-pictorial/images_take_1_hu5227163691693538238.webp 640w, /micro/chatgpt-images-pictorial/images_take_1_hu14083044698109483030.webp 768w, /micro/chatgpt-images-pictorial/images_take_1_hu12310962254923494320.webp 1024w" sizes="100vw" />
    <source type="image/jpeg" srcset="/micro/chatgpt-images-pictorial/images_take_1_hu7793689175809537201.jpg 320w, /micro/chatgpt-images-pictorial/images_take_1_hu83229200215476257.jpg 640w, /micro/chatgpt-images-pictorial/images_take_1_hu15933004593219798181.jpg 768w, /micro/chatgpt-images-pictorial/images_take_1_hu928051819667442088.jpg 1024w" sizes="100vw" />
    
      <img src="/micro/chatgpt-images-pictorial/images_take_1_hu83229200215476257.jpg" width="1024" height="1024" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Refinement prompt:</p>
<blockquote>
<p>Instead of cutting the Cthulhu images into the manhole metal, let’s make them embossed/raised like the fish were</p>
</blockquote>
<p>Final ChatGPT Images version:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/micro/chatgpt-images-pictorial/images_take_2b.png"><picture>
    <source type="image/webp" srcset="/micro/chatgpt-images-pictorial/images_take_2b_hu7617873203404727328.webp 320w, /micro/chatgpt-images-pictorial/images_take_2b_hu3397290371908614187.webp 640w, /micro/chatgpt-images-pictorial/images_take_2b_hu17324878179103734460.webp 768w, /micro/chatgpt-images-pictorial/images_take_2b_hu12672667128174878854.webp 1024w" sizes="100vw" />
    <source type="image/jpeg" srcset="/micro/chatgpt-images-pictorial/images_take_2b_hu14328004490771204935.jpg 320w, /micro/chatgpt-images-pictorial/images_take_2b_hu13710580540796700228.jpg 640w, /micro/chatgpt-images-pictorial/images_take_2b_hu8965279421527956290.jpg 768w, /micro/chatgpt-images-pictorial/images_take_2b_hu4970776708386982384.jpg 1024w" sizes="100vw" />
    
      <img src="/micro/chatgpt-images-pictorial/images_take_2b_hu13710580540796700228.jpg" width="1024" height="1024" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Crazy good.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Speak&#39;s Andrew Hsu on the CTO&#39;s Role in AI Tooling Adoption</title>
      <link>https://craftycto.com/micro/speak-cto-ai-adoption/</link>
      <pubDate>Sun, 27 Jul 2025 17:58:05 -0500</pubDate>
      
      <guid>https://craftycto.com/micro/speak-cto-ai-adoption/</guid>
      <description>I resonated absolutely: this is my life today as a fractional CTO ...</description>
<content:encoded><![CDATA[<p>The <a href="https://www.youtube.com/watch?v=tIVKgztDaYQ&amp;list=PLWEAb1SXhjlfkEF_PxzYHonU_v5LPMI8L&amp;index=7">Latent Space podcast</a> had an excellent interview with <a href="https://www.linkedin.com/in/adhsu/">Andrew Hsu</a>, cofounder and CTO at the AI-powered language-learning platform <a href="https://www.speak.com/">Speak</a>. While the main conversation centered on Speak&rsquo;s story and AI&rsquo;s role in it, I was most struck by a side exchange where Hsu explained how he sees his role as CTO in promoting the use of AI tooling within Speak&rsquo;s engineering team:</p>
<blockquote>
<p>So I think as the CTO, I view it as part of my responsibility to really set expectations, push everyone on the team, show them what&rsquo;s possible. We&rsquo;ve been trying everything. And I think we tried to basically set the expectation that the frontier is moving so fast, it&rsquo;s deeply non-intuitive.</p>
<p>If you&rsquo;ve tried coding tools six months ago and they weren&rsquo;t that great, especially if it&rsquo;s not TypeScript or Python, right? It&rsquo;s a mode collapse to the most popular languages. That&rsquo;s all it is. We try to set a culture in the engineering team where usage of these tools as much as possible and as a default path is the expectation.</p>
<p>And in hiring, we are now explicitly asking about this a lot, thinking about what are the types of people that are going to be better, higher agency at trying these types of tools. It&rsquo;s so important.</p>
</blockquote>
<p>This! This is what my life is evolving towards these days as a fractional CTO. More accurately, applying AI in engineering is maybe 50% of my focus; the other 50% is with the CEO and the leadership team, setting the same expectations and approach for applying AI outside Engineering. It&rsquo;s a fun time to be a CTO &hellip;</p>
<p>Here&rsquo;s the video snip from the interview via the <a href="https://www.snipd.com/">Snipd</a> AI podcast player:</p>

<iframe src="https://share.snipd.com/snip/792390a7-bb60-4033-b088-aa22507bf53e" width="100%" height="400" style="border: 0;"></iframe>

]]></content:encoded>
    </item>
    
    <item>
      <title>OpenAI Windsurf Deal: Off</title>
      <link>https://craftycto.com/micro/openai-windsurf-deal-off/</link>
      <pubDate>Sat, 12 Jul 2025 08:23:09 -0500</pubDate>
      
      <guid>https://craftycto.com/micro/openai-windsurf-deal-off/</guid>
      <description>TL;DR: the fat lady did not sing for Windsurf</description>
      <content:encoded><![CDATA[<p>Ouch! The <a href="https://www.theverge.com/openai/705999/google-windsurf-ceo-openai">OpenAI Windsurf deal is off</a>, according to reporting from <a href="https://www.theverge.com/">The Verge</a>. This was the rumored acquisition of <a href="https://windsurf.ai/">Windsurf</a>, a company building an AI codegen IDE with agentic features. Not only that, Windsurf&rsquo;s CEO has left to join Google. Double ouch. 😬</p>
<p>This halts what seemed like a foolish acquisition by OpenAI, which has been making moves to build its own developer tools with Codex. Furthermore, Windsurf is not, in my opinion, the best tool in the &ldquo;Agentic IDE&rdquo; space.</p>
<p>As I noted in my <a href="/blog/aiewf2025-my-day-1-highlights/#windsurf-everywhere-doing-everything-all-at-once">AIEWF2025 Day 1 recap</a>, I haven&rsquo;t been particularly impressed with Windsurf, and their presentation at the conference didn&rsquo;t change that opinion:</p>
<blockquote>
<p>In the AI Coding space, IDE-tool subspace, my most-admired product, and also the one I have the most experience with, is Cursor. In my experience, which combines a lot of paying attention to what others think and a moderate degree of hands-on use, Cursor leads this sub-niche. The AIEWF2025 audience seemed to agree, when someone asked for raised hands for who uses Windsurf, and who uses Cursor. My estimate was about 5:1 Cursor:Windsurf.</p>
</blockquote>
<blockquote>
<p>It&rsquo;s possible, no doubt, that I&rsquo;m missing something important. I strive to remain open-minded. I will give Windsurf major credit for capitalizing on OpenAI&rsquo;s need to beef up their AI Coding story. If Windsurf can close that $3B deal&mdash;that fat lady hasn&rsquo;t sung quite yet&mdash;being part of OpenAI may give Windsurf the unique advantage that its current product capabilities do not.</p>
</blockquote>
<p>In the end, the fat lady did not sing for Windsurf.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>ChatGPT Tip via OpenAI&#39;s Noam Brown: Use o3!</title>
      <link>https://craftycto.com/micro/o3/</link>
      <pubDate>Tue, 01 Jul 2025 08:23:09 -0500</pubDate>
      
      <guid>https://craftycto.com/micro/o3/</guid>
      <description>I never liked o1 much, but Brown is right, o3 is amazing ...</description>
      <content:encoded><![CDATA[<p>I listened to the recent <a href="https://www.latent.space/">Latent Space podcast</a> episode with <a href="https://openai.com/">OpenAI</a>&rsquo;s <a href="https://www.linkedin.com/in/noam-brown-8b785b62">Noam Brown</a>. It was, as usual for <a href="https://www.swyx.io/">Shawn &ldquo;swyx&rdquo; Wang</a> and <a href="https://www.linkedin.com/in/fanahova/">Alessio Fanelli</a>, a great conversation, among its topics the power of reasoning models such as OpenAI&rsquo;s <a href="https://openai.com/o1/">o1</a> and <a href="https://openai.com/index/introducing-o3-and-o4-mini/">o3</a>. I was struck by this comment from Brown:</p>
<blockquote>
<p>One thing I&rsquo;m surprised by is how many people don&rsquo;t even know that o3 exists. Like, I&rsquo;ve been using it day to day. It&rsquo;s basically replaced Google search for me. Like, I just use it all the time.</p>
</blockquote>
<p>I had experimented a bit with o1 when it came out, but found it slow and not appreciably better than <a href="https://en.wikipedia.org/wiki/GPT-4o">GPT-4o</a> for day-to-day tasks. That experience led me to largely ignore o3 when it came out, given that GPT-4o and <a href="https://www.anthropic.com/claude/sonnet">Claude Sonnet 4</a> are so solid and keep getting better.</p>
<p>I took Brown&rsquo;s advice, though, and used o3 to tackle two household challenges: buying a roof rack crossbar for my Subaru Forester Hybrid, and diagnosing an issue with a <a href="https://www.thermacell.com/products/category/rechargeable">Thermacell Radius</a> mosquito repeller&mdash;both tricky little problems. Finding the right crossbar for the Forester&rsquo;s low-profile rails would have been a nightmare in Google, and likely to trip up even a smart one-shot model like GPT-4o. But o3 thought hard, used every bit of context I gave it (such as photos of the low-profile rails), and recommended solutions that checked out as accurate. When the Thule crossbar I had in mind was out of stock, it helped me find the Yakima equivalent, and proved to be a good shopping assistant, providing street prices and where-to-buy links. Big win.</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/micro/o3/o3_rack.png"><picture>
    <source type="image/webp" srcset="/micro/o3/o3_rack_hu10982261145836476597.webp 320w, /micro/o3/o3_rack_hu2298946242145260626.webp 640w, /micro/o3/o3_rack_hu12854000331885195653.webp 768w, /micro/o3/o3_rack_hu16905388183869292423.webp 1024w, /micro/o3/o3_rack_hu34073143855721762.webp 1280w" sizes="75vw" />
    <source type="image/jpeg" srcset="/micro/o3/o3_rack_hu9372459126125717878.jpg 320w, /micro/o3/o3_rack_hu7342186514705300877.jpg 640w, /micro/o3/o3_rack_hu3016349173675003260.jpg 768w, /micro/o3/o3_rack_hu1818301584203992855.jpg 1024w, /micro/o3/o3_rack_hu17807225436110532953.jpg 1280w" sizes="75vw" />
    
      <img src="/micro/o3/o3_rack_hu7342186514705300877.jpg" width="1320" height="2868" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>The Thermacell Radius was doing a flash-3-lights thing when I tried to power it on, normally a sign that the unit is locked, but unlocking didn&rsquo;t help. I was considering just buying a new one, but figured I&rsquo;d ask o3 first. I gave it a photo and a minimal description of the flashing lights symptom. o3 thought for a bit, and responded with the unlock procedure, but also suggested cleaning the electrical contacts between the unit&rsquo;s base and top lid. That worked! Another win.</p>
<p>So: start using o3 in (paid) ChatGPT&mdash;it&rsquo;s a game-changer.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>AI Engineer World&#39;s Fair 2025: My Day 1 Highlights</title>
      <link>https://craftycto.com/blog/aiewf2025-my-day-1-highlights/</link>
      <pubDate>Wed, 18 Jun 2025 16:36:00 -0400</pubDate>
      
      <guid>https://craftycto.com/blog/aiewf2025-my-day-1-highlights/</guid>
      <description>Key takeaways and observations from Day 1 of the AI Engineer World&amp;#39;s Fair 2025</description>
<content:encoded><![CDATA[<p>My live experience at the AI Engineer World&rsquo;s Fair 2025 in San Francisco could be summarized as &ldquo;extremely intense.&rdquo; As I sat down to recap my highlights, starting with this post covering Day 1, I ended up re-watching every session, and now I better understand <em>why</em> it felt so intense: the amount of information packed into the short 20-minute sessions was immense, and the people communicating that information were, on average, &ldquo;brilliant plus&rdquo; humans &hellip;</p>
<p>Here&rsquo;s a quick summary table of what I attended in person, with links to the detailed sections with my recaps&mdash;some brief, some more extended&mdash;and to each session&rsquo;s video. I had originally included ratings, but pretty much everything deserved five stars, so I canned that idea.</p>
<p>I hope these notes help you get a sense of the energy, trends, and ideas shaping the future of AI and AI Engineering. Enjoy!</p>
<table>
  <thead>
      <tr>
          <th style="text-align: left">Section</th>
          <th style="text-align: left">Session</th>
          <th style="text-align: left">Speaker</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left">Morning Keynotes</td>
          <td style="text-align: left"><a href="#designing-ai-intensive-applications">Designing AI-Intensive Applications</a></td>
          <td style="text-align: left">Shawn &ldquo;swyx&rdquo; Wang (Latent Space)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#spark-to-system-building-the-open-agentic-web">Spark to System: Building the Open Agentic Web</a></td>
          <td style="text-align: left">Asha Sharma (Microsoft)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#state-of-startups-and-ai-2025">State of Startups and AI 2025</a></td>
          <td style="text-align: left">Sarah Guo (Conviction)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#2025-in-llms-so-far">2025 in LLMs so far</a></td>
          <td style="text-align: left">Simon Willison (SimonWillison.net)</td>
      </tr>
      <tr>
          <td style="text-align: left">MCP Track</td>
          <td style="text-align: left"><a href="#mcp-origins--rfs">MCP Origins &amp; Request for Startups</a></td>
          <td style="text-align: left">Theo Chu (Anthropic)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#what-we-learned-from-shipping-remote-mcp-support-at-anthropic">What we learned from shipping remote MCP support at Anthropic</a></td>
          <td style="text-align: left">John Welsh (Anthropic)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#full-spectrum-mcp-uncovering-hidden-servers-and-clients-capabilities">Full Spectrum MCP: Uncovering Hidden Servers and Clients Capabilities</a></td>
          <td style="text-align: left">Harald Kirschner (VS Code, Microsoft)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#mcp-isnt-good-yet">MCP isn&rsquo;t good, yet</a></td>
          <td style="text-align: left">David Cramer (Sentry)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#mcp-is-all-you-need">MCP is all you need</a></td>
          <td style="text-align: left">Samuel Colvin (Pydantic)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#observable-tools---the-state-of-mcp-observability">Observable tools - the state of MCP observability</a></td>
          <td style="text-align: left">Alex Volkov (Weights &amp; Biases), Benjamin Eckel (Dylibso)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#the-rise-of-the-agentic-economy-on-the-shoulders-of-mcp">The rise of the agentic economy on the shoulders of MCP</a></td>
          <td style="text-align: left">Jan Čurn (Apify)</td>
      </tr>
      <tr>
          <td style="text-align: left">Afternoon Keynotes</td>
          <td style="text-align: left"><a href="#building-agents-at-cloud-scale">Building Agents at Cloud-Scale</a></td>
          <td style="text-align: left">Antje Barth (AWS)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#windsurf-everywhere-doing-everything-all-at-once">Windsurf everywhere, doing everything, all at once</a></td>
          <td style="text-align: left">Kevin Hou (Windsurf)</td>
      </tr>
      <tr>
          <td style="text-align: left"></td>
          <td style="text-align: left"><a href="#define-ai-engineer">Define AI Engineer</a></td>
          <td style="text-align: left">Greg Brockman (OpenAI), Shawn &ldquo;swyx&rdquo; Wang (Latent Space)</td>
      </tr>
  </tbody>
</table>
<h2 id="designing-ai-intensive-applications">Designing AI-Intensive Applications</h2>
<p><strong>Speaker:</strong> <a href="https://www.swyx.io/">Shawn &ldquo;swyx&rdquo; Wang</a> (<a href="https://latentspace.co/">Latent Space</a>) &ndash; <a href="https://www.youtube.com/watch?v=z4zXicOAF28&amp;t=1337s">Session video</a></p>
<p>Swyx delivers an incredible overview of the state of the space. He compares today&rsquo;s AI to the early days of physics, when the <a href="https://en.wikipedia.org/wiki/Standard_Model">Standard Model</a> was developed and went on to serve physics almost unchanged to the present. Swyx proposes a candidate for AI&rsquo;s own Standard Model, SPADE:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/swyx_day1.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/swyx_day1_hu14005165768025343945.webp 320w, /blog/aiewf2025-my-day-1-highlights/swyx_day1_hu7238927332086079538.webp 640w, /blog/aiewf2025-my-day-1-highlights/swyx_day1_hu12837347674365625780.webp 768w, /blog/aiewf2025-my-day-1-highlights/swyx_day1_hu9638152528323420690.webp 1024w, /blog/aiewf2025-my-day-1-highlights/swyx_day1_hu9822959005473821501.webp 1280w, /blog/aiewf2025-my-day-1-highlights/swyx_day1_hu3756616000029501426.webp 1536w, /blog/aiewf2025-my-day-1-highlights/swyx_day1_hu15136377826813841198.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/swyx_day1_hu8783479664460975274.jpg 320w, /blog/aiewf2025-my-day-1-highlights/swyx_day1_hu3069919035541478716.jpg 640w, /blog/aiewf2025-my-day-1-highlights/swyx_day1_hu17069142707431601332.jpg 768w, /blog/aiewf2025-my-day-1-highlights/swyx_day1_hu11927882340311548945.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/swyx_day1_hu13845120095177030800.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/swyx_day1_hu2112301822666776348.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/swyx_day1_hu6541943206099368.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/swyx_day1_hu3069919035541478716.jpg" width="2992" height="1414" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="spark-to-system-building-the-open-agentic-web">Spark to System: Building the Open Agentic Web</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/aboutasha">Asha Sharma</a> (<a href="https://www.microsoft.com/">Microsoft</a>) &ndash; <a href="https://www.youtube.com/watch?v=z4zXicOAF28&amp;t=2146s">Session video</a></p>
<p>As with <a href="https://aws.amazon.com/">AWS</a>&rsquo;s keynote later in the day featuring <a href="https://www.linkedin.com/in/antje-barth">Antje Barth</a>, this session with Microsoft&rsquo;s Asha Sharma set off my &ldquo;oh no, vendor product-flogging ahead&rdquo; alarm. That&rsquo;s not being fair though; we need major players like MS who can deliver necessary, massive-scale platforms, and they need to flog the sexy bits so they can sell the boring ones; no margin, no mission, as they say.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/ms_sharma_ai_factory.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/ms_sharma_ai_factory_hu8412271078355054432.webp 320w, /blog/aiewf2025-my-day-1-highlights/ms_sharma_ai_factory_hu18393541974318996717.webp 640w, /blog/aiewf2025-my-day-1-highlights/ms_sharma_ai_factory_hu10149911996691520269.webp 768w, /blog/aiewf2025-my-day-1-highlights/ms_sharma_ai_factory_hu11027690956339197568.webp 1024w, /blog/aiewf2025-my-day-1-highlights/ms_sharma_ai_factory_hu6699396267547748467.webp 1280w, /blog/aiewf2025-my-day-1-highlights/ms_sharma_ai_factory_hu7531904144413261794.webp 1536w, /blog/aiewf2025-my-day-1-highlights/ms_sharma_ai_factory_hu11808275444021138819.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/ms_sharma_ai_factory_hu636280211840516917.jpg 320w, /blog/aiewf2025-my-day-1-highlights/ms_sharma_ai_factory_hu5311205971980176730.jpg 640w, /blog/aiewf2025-my-day-1-highlights/ms_sharma_ai_factory_hu6802903730075319284.jpg 768w, /blog/aiewf2025-my-day-1-highlights/ms_sharma_ai_factory_hu17804106929553385399.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/ms_sharma_ai_factory_hu583931566799052343.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/ms_sharma_ai_factory_hu16655888564393355811.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/ms_sharma_ai_factory_hu13028575693138992092.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/ms_sharma_ai_factory_hu5311205971980176730.jpg" width="2031" height="906" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>It&rsquo;s also not fair in that Microsoft has staked out major segments in the AI landscape, even beyond their huge (if now troubled) partnership with <a href="https://openai.com/">OpenAI</a>&mdash;<a href="https://github.com/features/copilot">Copilot</a> of course, but also the <a href="https://code.visualstudio.com/">VS Code</a> ecosystem with its pioneering role in AI Coding, <a href="https://github.com/">GitHub</a> serving as a core source of context for AI Coding and actions in the agentic subcategory, and on and on.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/ms_agentic_demo2.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/ms_agentic_demo2_hu5350719295863169006.webp 320w, /blog/aiewf2025-my-day-1-highlights/ms_agentic_demo2_hu13872348858174047086.webp 640w, /blog/aiewf2025-my-day-1-highlights/ms_agentic_demo2_hu13035562018431121803.webp 768w, /blog/aiewf2025-my-day-1-highlights/ms_agentic_demo2_hu15201224283337341698.webp 1024w, /blog/aiewf2025-my-day-1-highlights/ms_agentic_demo2_hu17685405610753132998.webp 1280w, /blog/aiewf2025-my-day-1-highlights/ms_agentic_demo2_hu654590620551558997.webp 1536w, /blog/aiewf2025-my-day-1-highlights/ms_agentic_demo2_hu3408450597452802272.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/ms_agentic_demo2_hu4608077210254302265.jpg 320w, /blog/aiewf2025-my-day-1-highlights/ms_agentic_demo2_hu15013375693263518796.jpg 640w, /blog/aiewf2025-my-day-1-highlights/ms_agentic_demo2_hu11349042403676614477.jpg 768w, /blog/aiewf2025-my-day-1-highlights/ms_agentic_demo2_hu3572149965863571914.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/ms_agentic_demo2_hu6627898925657970289.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/ms_agentic_demo2_hu2396039088866795973.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/ms_agentic_demo2_hu1041661911162953227.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/ms_agentic_demo2_hu15013375693263518796.jpg" width="2031" height="927" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Sharma shows us some significant new capabilities arriving in Copilot, and with her team presents a couple of different agentic app solutions. Worth watching: understanding Microsoft&rsquo;s broad suite of AI offerings is essential.</p>
<h2 id="state-of-startups-and-ai-2025">State of Startups and AI 2025</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/sarahxguo">Sarah Guo</a> (<a href="https://conviction.com/">Conviction</a>) &ndash; <a href="https://www.youtube.com/watch?v=z4zXicOAF28&amp;t=3542s">Session video</a></p>
<p>Sarah wins the &ldquo;best save&rdquo; award when the AV fails, leaving her with no slides &hellip; she still pulls off a great presentation, and doesn&rsquo;t even run over. Excellent overview of the state of AI, the pace of progress, and the trends. I&rsquo;m a fan of the <a href="https://cursor.sh/">Cursor</a> AI Coding tool&mdash;one of Conviction&rsquo;s portfolio companies&mdash;and I love Sarah&rsquo;s recommendation that startups think about building &ldquo;Cursor for X.&rdquo; It&rsquo;s a nuanced message, though, because she gets to the heart of what Cursor actually is: an LLM wrapper, yes, but a <strong>thick, juicy</strong> wrapper.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/sarah_guo_vidstill.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/sarah_guo_vidstill_hu5722013755586716371.webp 320w, /blog/aiewf2025-my-day-1-highlights/sarah_guo_vidstill_hu11960971329352362471.webp 640w, /blog/aiewf2025-my-day-1-highlights/sarah_guo_vidstill_hu4542158353395091873.webp 768w, /blog/aiewf2025-my-day-1-highlights/sarah_guo_vidstill_hu3986926742265664976.webp 1024w, /blog/aiewf2025-my-day-1-highlights/sarah_guo_vidstill_hu6793179964021601167.webp 1280w, /blog/aiewf2025-my-day-1-highlights/sarah_guo_vidstill_hu5296089355484102330.webp 1536w, /blog/aiewf2025-my-day-1-highlights/sarah_guo_vidstill_hu397187653545035572.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/sarah_guo_vidstill_hu8541183163616287858.jpg 320w, /blog/aiewf2025-my-day-1-highlights/sarah_guo_vidstill_hu10555676662530154298.jpg 640w, /blog/aiewf2025-my-day-1-highlights/sarah_guo_vidstill_hu12282175638107203507.jpg 768w, /blog/aiewf2025-my-day-1-highlights/sarah_guo_vidstill_hu6949113299301255609.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/sarah_guo_vidstill_hu9471704288227451508.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/sarah_guo_vidstill_hu13510312105281133617.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/sarah_guo_vidstill_hu8407754833247316992.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/sarah_guo_vidstill_hu10555676662530154298.jpg" width="3370" height="1880" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="2025-in-llms-so-far">2025 in LLMs so far</h2>
<p><strong>Speaker:</strong> <a href="https://simonwillison.net/">Simon Willison</a> (<a href="https://simonwillison.net/">SimonWillison.net</a>) &ndash; <a href="https://www.youtube.com/watch?v=z4zXicOAF28&amp;t=5084s">Session video</a></p>
<p>I&rsquo;m a huge fan of Simon&rsquo;s, and here he&rsquo;s in fine form&mdash;fine enough that this presentation won best-of-conference. As an indicator of the accelerating pace of LLM development, Simon was forced to cut the scope of his keynote from &ldquo;the past year&rdquo; to &ldquo;the past six months&rdquo; to have a prayer of finishing in the allotted 20 minutes.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/simonw.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/simonw_hu7175455294548907590.webp 320w, /blog/aiewf2025-my-day-1-highlights/simonw_hu16492460861334691293.webp 640w, /blog/aiewf2025-my-day-1-highlights/simonw_hu5643398483629615409.webp 768w, /blog/aiewf2025-my-day-1-highlights/simonw_hu17909401584979876851.webp 1024w, /blog/aiewf2025-my-day-1-highlights/simonw_hu10081597842563581849.webp 1280w, /blog/aiewf2025-my-day-1-highlights/simonw_hu18369989188163276171.webp 1536w, /blog/aiewf2025-my-day-1-highlights/simonw_hu12486019689343116112.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/simonw_hu3010384638309756678.jpg 320w, /blog/aiewf2025-my-day-1-highlights/simonw_hu182572336908483545.jpg 640w, /blog/aiewf2025-my-day-1-highlights/simonw_hu11087550172354755447.jpg 768w, /blog/aiewf2025-my-day-1-highlights/simonw_hu18028732586375801853.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/simonw_hu16411519531276792434.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/simonw_hu14220314050736320390.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/simonw_hu7589655547859370351.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/simonw_hu182572336908483545.jpg" width="3360" height="1874" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>His &ldquo;draw a pelican riding a bicycle&rdquo; benchmark is both hilarious and an uncannily accurate assessment of LLM quality&mdash;at least until &ldquo;the pelican test&rdquo; starts to contaminate model training. Willison showed the <a href="https://deepmind.google/technologies/gemini/">Google Gemini 2.5 Pro</a> pelican appearing on a <a href="https://www.google.com/">Google</a> slide from the recent <a href="https://io.google/2025/">Google IO</a> event, so I&rsquo;m betting that by next year this benchmark no longer yields valid results &hellip;</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/simonw_pelican.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/simonw_pelican_hu15847436564610615777.webp 320w, /blog/aiewf2025-my-day-1-highlights/simonw_pelican_hu3407693333705830945.webp 640w, /blog/aiewf2025-my-day-1-highlights/simonw_pelican_hu15973297162262271897.webp 768w, /blog/aiewf2025-my-day-1-highlights/simonw_pelican_hu14293659808577476125.webp 1024w, /blog/aiewf2025-my-day-1-highlights/simonw_pelican_hu12284937430127375299.webp 1280w, /blog/aiewf2025-my-day-1-highlights/simonw_pelican_hu10292837247669276386.webp 1536w, /blog/aiewf2025-my-day-1-highlights/simonw_pelican_hu12176118100166896000.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/simonw_pelican_hu202946160644155097.jpg 320w, /blog/aiewf2025-my-day-1-highlights/simonw_pelican_hu17763319033117478720.jpg 640w, /blog/aiewf2025-my-day-1-highlights/simonw_pelican_hu12477303676147742608.jpg 768w, /blog/aiewf2025-my-day-1-highlights/simonw_pelican_hu3031236150631753834.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/simonw_pelican_hu12386190250526824166.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/simonw_pelican_hu1767535713171067595.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/simonw_pelican_hu9665654408268666356.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/simonw_pelican_hu17763319033117478720.jpg" width="3308" height="1858" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="mcp-origins--request-for-startups">MCP Origins &amp; Request for Startups</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/theodorachu">Theo Chu</a> (<a href="https://www.anthropic.com/">Anthropic</a>) &ndash; <a href="https://www.youtube.com/watch?v=z4zXicOAF28&amp;t=9076s">Session video</a></p>
<p>Great intro to <a href="https://modelcontextprotocol.io/">Model Context Protocol (MCP)</a>, an open standard invented by Anthropic&mdash;I love authoritative sources. Chu covers both the protocol&rsquo;s origins and where it&rsquo;s headed. Highly recommended: if you&rsquo;re new to MCP, start here.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/theo_chu.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/theo_chu_hu16523127564371190502.webp 320w, /blog/aiewf2025-my-day-1-highlights/theo_chu_hu14927729227953011006.webp 640w, /blog/aiewf2025-my-day-1-highlights/theo_chu_hu734937279632512519.webp 768w, /blog/aiewf2025-my-day-1-highlights/theo_chu_hu4135106345418724979.webp 1024w, /blog/aiewf2025-my-day-1-highlights/theo_chu_hu14150217987373001833.webp 1280w, /blog/aiewf2025-my-day-1-highlights/theo_chu_hu9081168155447462479.webp 1536w, /blog/aiewf2025-my-day-1-highlights/theo_chu_hu2618647512108995368.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/theo_chu_hu10930734227614225616.jpg 320w, /blog/aiewf2025-my-day-1-highlights/theo_chu_hu3124043779447423256.jpg 640w, /blog/aiewf2025-my-day-1-highlights/theo_chu_hu10917916195823318063.jpg 768w, /blog/aiewf2025-my-day-1-highlights/theo_chu_hu14375889580652594114.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/theo_chu_hu16850448640356637210.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/theo_chu_hu12744460793500646404.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/theo_chu_hu1950737958112517551.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/theo_chu_hu3124043779447423256.jpg" width="3150" height="1566" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="what-we-learned-from-shipping-remote-mcp-support-at-anthropic">What we learned from shipping remote MCP support at Anthropic</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/johnw188">John Welsh</a> (<a href="https://www.anthropic.com/">Anthropic</a>) &ndash; <a href="https://www.youtube.com/watch?v=z4zXicOAF28&amp;t=10187s">Session video</a></p>
<p>This session got straight to the heart of my most important questions about MCP: even more background on its origins, plus vital information on the elements of the protocol. Love the &ldquo;pit of success&rdquo; concept: make the right thing to do the easiest thing to do. Excellent.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/john_welsh.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/john_welsh_hu3265463545102451416.webp 320w, /blog/aiewf2025-my-day-1-highlights/john_welsh_hu10170715535294876646.webp 640w, /blog/aiewf2025-my-day-1-highlights/john_welsh_hu9623707690537593657.webp 768w, /blog/aiewf2025-my-day-1-highlights/john_welsh_hu6032015233342555852.webp 1024w, /blog/aiewf2025-my-day-1-highlights/john_welsh_hu14494200572971154987.webp 1280w, /blog/aiewf2025-my-day-1-highlights/john_welsh_hu10009738731075645044.webp 1536w, /blog/aiewf2025-my-day-1-highlights/john_welsh_hu13510225543790966611.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/john_welsh_hu10269757434142348052.jpg 320w, /blog/aiewf2025-my-day-1-highlights/john_welsh_hu2591783225949450514.jpg 640w, /blog/aiewf2025-my-day-1-highlights/john_welsh_hu3283747735188234776.jpg 768w, /blog/aiewf2025-my-day-1-highlights/john_welsh_hu15006895498459371228.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/john_welsh_hu10333823035346401788.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/john_welsh_hu13260556468834491768.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/john_welsh_hu907878848992357305.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/john_welsh_hu2591783225949450514.jpg" width="3014" height="1486" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="full-spectrum-mcp-uncovering-hidden-servers-and-clients-capabilities">Full Spectrum MCP: Uncovering Hidden Servers and Clients Capabilities</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/digitarald">Harald Kirschner</a> (<a href="https://code.visualstudio.com/">VS Code</a>, <a href="https://www.microsoft.com/">Microsoft</a>) &ndash; <a href="https://www.youtube.com/watch?v=z4zXicOAF28&amp;t=11031s">Session video</a></p>
<p>Now we get an authoritative take from Microsoft: Harald Kirschner, a Principal Product Manager working on VS Code and <a href="https://github.com/features/copilot">GitHub Copilot</a>, who was instrumental in releasing Agent Mode in Copilot. Kirschner dives a layer deeper into MCP, with great insights into how &ldquo;early days&rdquo; we are, and the basic primitives of the protocol. He does a great job of explaining the self-reinforcing trap MCP finds itself in at this early stage: both MCP servers and clients tend to support the <code>tools</code> primitive, but most lack support (especially <em>thoughtful</em> support) for more powerful primitives such as <code>resources</code> and <code>sampling</code> that enable what he calls &ldquo;rich, <strong>stateful</strong> interactions.&rdquo; And guess what he announced for VS Code: full MCP spec support.</p>
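<p>For a concrete sense of the gap Kirschner describes: on the wire, MCP is JSON-RPC 2.0, and the primitives are just different method families. Here&rsquo;s a minimal sketch in Python, with the message shapes paraphrased from the MCP spec (the <code>rpc</code> helper and the tool and URI names are my own illustrations, not from the talk):</p>

```python
import json

def rpc(method: str, params: dict, req_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request, the envelope every MCP message uses."""
    return {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}

# The primitive everyone supports: the client asks the server to run a tool.
call_tool = rpc("tools/call",
                {"name": "search_issues", "arguments": {"query": "timeout"}})

# The underused primitive Kirschner highlights: resources let a server expose
# addressable context that the client (or user) chooses to attach, rather than
# forcing every interaction through a tool call.
read_resource = rpc("resources/read",
                    {"uri": "file:///project/README.md"}, req_id=2)

# Newline-delimited JSON like this is roughly what crosses the wire on the
# local stdio transport.
wire = "\n".join(json.dumps(m) for m in (call_tool, read_resource))
```

<p>The thrust of his talk is that clients and servers that stop at <code>tools/call</code> leave the richer, stateful half of the protocol on the table.</p>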
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/harald.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/harald_hu13215044824872811910.webp 320w, /blog/aiewf2025-my-day-1-highlights/harald_hu5948838709887455103.webp 640w, /blog/aiewf2025-my-day-1-highlights/harald_hu8778776669708341201.webp 768w, /blog/aiewf2025-my-day-1-highlights/harald_hu8032518145268273083.webp 1024w, /blog/aiewf2025-my-day-1-highlights/harald_hu13514667725535964248.webp 1280w, /blog/aiewf2025-my-day-1-highlights/harald_hu11094475957391345131.webp 1536w, /blog/aiewf2025-my-day-1-highlights/harald_hu17790828133000033400.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/harald_hu1806205796723749250.jpg 320w, /blog/aiewf2025-my-day-1-highlights/harald_hu15046040035279029506.jpg 640w, /blog/aiewf2025-my-day-1-highlights/harald_hu4232166194509119677.jpg 768w, /blog/aiewf2025-my-day-1-highlights/harald_hu8623475950213146065.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/harald_hu11636531555727090423.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/harald_hu5756228816096412637.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/harald_hu5658769275317661570.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/harald_hu15046040035279029506.jpg" width="2688" height="1260" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="mcp-isnt-good-yet">MCP isn&rsquo;t good, yet</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/dmcramer">David Cramer</a> (<a href="https://sentry.io/">Sentry</a>) &ndash; <a href="https://www.youtube.com/watch?v=z4zXicOAF28&amp;t=11934s">Session video</a></p>
<p>I immediately liked Cramer and his common-sense, from-the-trenches perspective on MCP. Loved his &ldquo;hot takes&rdquo; imagery&mdash;he described the current phenomenon where MCP attracts lots of &ldquo;I have opinions but haven&rsquo;t actually built anything&rdquo; people. Sentry wasn&rsquo;t the earliest MCP adopter; what held them back was that they needed a Remote MCP solution with <a href="https://oauth.net/">OAuth</a>, which is only recently becoming viable. He warns about the many problems inherent in <code>stdio</code> and recommends not screwing around with it: just do OAuth, even internally&mdash;it will be worth the effort.</p>
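<p>The transport choice Cramer is arguing about is easy to picture. A hedged sketch (framing deliberately simplified: real remote MCP rides on streamable HTTP, and the token would come out of an actual OAuth flow rather than being passed in by hand):</p>

```python
import json

def frame_stdio(msg: dict) -> bytes:
    """stdio transport: newline-delimited JSON piped to a child process.
    Convenient on a laptop, but there is no auth story and no remote access."""
    return (json.dumps(msg) + "\n").encode()

def frame_http(msg: dict, access_token: str):
    """Remote transport: an ordinary HTTP POST whose auth is a standard OAuth
    bearer token -- the path Cramer recommends even for internal servers."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {access_token}",
    }
    return headers, json.dumps(msg).encode()

ping = {"jsonrpc": "2.0", "id": 1, "method": "ping"}
stdio_frame = frame_stdio(ping)
http_headers, http_body = frame_http(ping, "example-token")
```

<p>Everything missing from the stdio frame (identity, scopes, revocation, remote reachability) has to be reinvented per-deployment; the HTTP frame inherits it all from boring, well-understood web infrastructure.</p>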
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/cramer_hot_takes.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/cramer_hot_takes_hu18272926229971865718.webp 320w, /blog/aiewf2025-my-day-1-highlights/cramer_hot_takes_hu10007018989890325048.webp 640w, /blog/aiewf2025-my-day-1-highlights/cramer_hot_takes_hu7991556016581026544.webp 768w, /blog/aiewf2025-my-day-1-highlights/cramer_hot_takes_hu17002588668889820420.webp 1024w, /blog/aiewf2025-my-day-1-highlights/cramer_hot_takes_hu16984633087101351532.webp 1280w, /blog/aiewf2025-my-day-1-highlights/cramer_hot_takes_hu6419240002848740308.webp 1536w, /blog/aiewf2025-my-day-1-highlights/cramer_hot_takes_hu11989069953012255560.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/cramer_hot_takes_hu15603341864914314110.jpg 320w, /blog/aiewf2025-my-day-1-highlights/cramer_hot_takes_hu15208036364715346624.jpg 640w, /blog/aiewf2025-my-day-1-highlights/cramer_hot_takes_hu6480209270701200908.jpg 768w, /blog/aiewf2025-my-day-1-highlights/cramer_hot_takes_hu12012376418072709378.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/cramer_hot_takes_hu2079477289902755237.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/cramer_hot_takes_hu16013049103119931437.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/cramer_hot_takes_hu970562634450013763.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/cramer_hot_takes_hu15208036364715346624.jpg" width="3422" height="1926" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>The Sentry MCP is a great example of &ldquo;yet another bit of context&rdquo; for the IDE, and their thoughtful approach is instructive. Cramer reinforced the message that we can&rsquo;t just wrap our APIs as MCP tools, with the same endpoints and the same payloads. &ldquo;MCP is not just a thing that sits on top of <a href="https://www.openapis.org/">OpenAPI</a> &hellip;&rdquo; Why? &ldquo;Robots don&rsquo;t know how to reason about giant JSON payloads that were not built for them &hellip; we can&rsquo;t just expose all those things as tools &hellip;&rdquo; He makes it clear that we &ldquo;need to really think about how you would use an agent today, how the models react when you provide them context (which is what this really is for), and design a system around that &hellip; it might leverage your API but it is not your API.&rdquo; In terms of what tools should return, he explains that, in practice, JSON is the wrong answer, because language models don&rsquo;t actually handle &ldquo;unexpected JSON&rdquo; well: &ldquo;It can kinda figure out JSON here and there but if you actually push it you&rsquo;ll find it breaks all the time &hellip;&rdquo; What payload format to return, then? Instead of JSON, they ended up using <a href="https://daringfireball.net/projects/markdown/">Markdown</a>. They expose <em>some</em> of the API endpoints, and return <em>some</em> of the API&rsquo;s response as Markdown: &ldquo;The bare essentials, in a structured way, that a human can think about&mdash;because if a human can reason about it, the language model can reason about it &hellip;&rdquo;</p>
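<p>Sentry&rsquo;s actual implementation isn&rsquo;t shown in the talk, but the &ldquo;bare essentials as Markdown&rdquo; idea is easy to sketch. Every field name below is a hypothetical stand-in for a verbose issue-tracker payload, not Sentry&rsquo;s real schema:</p>

```python
def issue_to_markdown(issue: dict) -> str:
    """Trim a verbose API payload down to the essentials a model can reason
    about, rendered as Markdown rather than as raw JSON."""
    lines = [
        f"## {issue['title']}",
        f"- **status:** {issue['status']}",
        f"- **events:** {issue['count']}",
        f"- **last seen:** {issue['last_seen']}",
    ]
    if issue.get("culprit"):
        lines.append(f"- **culprit:** `{issue['culprit']}`")
    return "\n".join(lines)

raw = {  # imagine dozens more machine-oriented fields in the real response
    "title": "TimeoutError in checkout worker",
    "status": "unresolved",
    "count": 214,
    "last_seen": "2025-06-03T17:22:00Z",
    "culprit": "app.checkout.process_order",
    "project_id": 42,
    "metadata": {},
}
markdown = issue_to_markdown(raw)
```

<p>The tool returns only what a human would scan to triage the issue; the machine-oriented fields never reach the model&rsquo;s context at all.</p>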
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/cramer_confused_robot.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/cramer_confused_robot_hu12569730845272156568.webp 320w, /blog/aiewf2025-my-day-1-highlights/cramer_confused_robot_hu17991683322813735434.webp 640w, /blog/aiewf2025-my-day-1-highlights/cramer_confused_robot_hu6916345784187053277.webp 768w, /blog/aiewf2025-my-day-1-highlights/cramer_confused_robot_hu16284323497307820442.webp 1024w, /blog/aiewf2025-my-day-1-highlights/cramer_confused_robot_hu9822576717342659903.webp 1280w, /blog/aiewf2025-my-day-1-highlights/cramer_confused_robot_hu13776404742332257535.webp 1536w, /blog/aiewf2025-my-day-1-highlights/cramer_confused_robot_hu15976925343641064861.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/cramer_confused_robot_hu4887913389233139690.jpg 320w, /blog/aiewf2025-my-day-1-highlights/cramer_confused_robot_hu7778684907729739576.jpg 640w, /blog/aiewf2025-my-day-1-highlights/cramer_confused_robot_hu18073427813508643550.jpg 768w, /blog/aiewf2025-my-day-1-highlights/cramer_confused_robot_hu2065770479919441810.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/cramer_confused_robot_hu7832049826715430668.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/cramer_confused_robot_hu1161394766048923463.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/cramer_confused_robot_hu183889207738985716.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/cramer_confused_robot_hu7778684907729739576.jpg" width="2860" height="1494" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Other great tips: &ldquo;Mind your tokens &hellip;&rdquo; because the user is paying for them. Or even spend some tokens yourself, by exposing agents (Sentry&rsquo;s &ldquo;Seer,&rdquo; for example) instead of dumb tools. Cramer says agentic-loop MCPs don&rsquo;t work well yet (due to lack of streaming, loop waiting delays, etc.) but the promise is real, and when it arrives the value will be super high. &ldquo;This stuff is not that hard &hellip; you can just go build it and try it out &hellip;&rdquo; &ldquo;Everybody is scared of all this stuff because there&rsquo;s fancy new words for everything &hellip; but those fancy new words are for things we already have, a new coat of paint: MCP is just a plug-in architecture, agents are just services, LLM calls and MCP calls are just API calls with a new response format, etc.&rdquo;</p>
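<p>&ldquo;Mind your tokens&rdquo; can start as simply as clamping what a tool returns. A sketch using the rough four-characters-per-token heuristic (my own illustration; a production server would count with the model&rsquo;s actual tokenizer):</p>

```python
def clamp_tool_output(text: str, max_tokens: int = 500) -> str:
    """Keep a tool response inside a rough token budget -- the user pays for
    every token the server sends back, so don't send what the model won't use."""
    budget_chars = max_tokens * 4  # crude heuristic: ~4 characters per token
    if len(text) <= budget_chars:
        return text
    # Cut on a line boundary so the Markdown stays readable, and say so.
    clipped = text[:budget_chars].rsplit("\n", 1)[0]
    return clipped + "\n\n*(output truncated to fit the token budget)*"
```

<p>A smarter version would summarize rather than truncate, which is exactly where spending a few of your own tokens (or exposing an agent) starts to pay off.</p>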
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/cramer_mind_your_tokens.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/cramer_mind_your_tokens_hu3941606579430265901.webp 320w, /blog/aiewf2025-my-day-1-highlights/cramer_mind_your_tokens_hu14991306909942000839.webp 640w, /blog/aiewf2025-my-day-1-highlights/cramer_mind_your_tokens_hu16453499924845559702.webp 768w, /blog/aiewf2025-my-day-1-highlights/cramer_mind_your_tokens_hu8777422084792523309.webp 1024w, /blog/aiewf2025-my-day-1-highlights/cramer_mind_your_tokens_hu13761927204597117684.webp 1280w, /blog/aiewf2025-my-day-1-highlights/cramer_mind_your_tokens_hu14399425982042449052.webp 1536w, /blog/aiewf2025-my-day-1-highlights/cramer_mind_your_tokens_hu18319121660009576708.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/cramer_mind_your_tokens_hu14132609356177235486.jpg 320w, /blog/aiewf2025-my-day-1-highlights/cramer_mind_your_tokens_hu9353521315946998501.jpg 640w, /blog/aiewf2025-my-day-1-highlights/cramer_mind_your_tokens_hu942156120181355341.jpg 768w, /blog/aiewf2025-my-day-1-highlights/cramer_mind_your_tokens_hu10110204105533823065.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/cramer_mind_your_tokens_hu14314078254276468965.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/cramer_mind_your_tokens_hu2666476223249623039.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/cramer_mind_your_tokens_hu12995303051222379790.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/cramer_mind_your_tokens_hu9353521315946998501.jpg" width="3358" height="1906" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p><a href="https://www.cloudflare.com/">Cloudflare</a> usage was really interesting: &ldquo;We used a lot of Cloudflare tech &hellip; shim up a thing on <a href="https://workers.cloudflare.com/">Workers</a>, they have an OAuth proxy &hellip;&rdquo; explaining &ldquo;we didn&rsquo;t have websocket infrastructure internally, but the MCP protocol requires it &hellip; problem solved &hellip;&rdquo; Super valuable session.</p>
<h2 id="mcp-is-all-you-need">MCP is all you need</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/samuel-colvin">Samuel Colvin</a> (<a href="https://pydantic.dev/">Pydantic</a>) &ndash; <a href="https://www.youtube.com/watch?v=z4zXicOAF28&amp;t=18485s">Session video</a></p>
<p>Colvin and Pydantic are famous for the Pydantic Python package, which, as of the date of the talk, had been downloaded 1.6 billion times this year.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/colvin_whoami.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/colvin_whoami_hu5949040985234291350.webp 320w, /blog/aiewf2025-my-day-1-highlights/colvin_whoami_hu8201727801022308330.webp 640w, /blog/aiewf2025-my-day-1-highlights/colvin_whoami_hu10675261672408602936.webp 768w, /blog/aiewf2025-my-day-1-highlights/colvin_whoami_hu387905297050024957.webp 1024w, /blog/aiewf2025-my-day-1-highlights/colvin_whoami_hu812207209770229752.webp 1280w, /blog/aiewf2025-my-day-1-highlights/colvin_whoami_hu7265131548399569439.webp 1536w, /blog/aiewf2025-my-day-1-highlights/colvin_whoami_hu9515592100879480470.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/colvin_whoami_hu6348662802724051573.jpg 320w, /blog/aiewf2025-my-day-1-highlights/colvin_whoami_hu9450172218379500133.jpg 640w, /blog/aiewf2025-my-day-1-highlights/colvin_whoami_hu9783678925364718043.jpg 768w, /blog/aiewf2025-my-day-1-highlights/colvin_whoami_hu10773797881875537415.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/colvin_whoami_hu702714658703661682.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/colvin_whoami_hu11460034643178755154.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/colvin_whoami_hu9029303067865901057.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/colvin_whoami_hu9450172218379500133.jpg" width="3434" height="1906" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>The title &ldquo;MCP is all you need&rdquo; plays off <a href="https://www.linkedin.com/in/jxnlco">Jason Liu</a>&rsquo;s &ldquo;Pydantic is all you need&rdquo; talk from about two years ago, and its follow-on from about a year ago, &ldquo;Pydantic is still all you need.&rdquo; Both talks share the same theme: you&rsquo;re overcomplicating things, people! Colvin goes on to show us just how powerful MCP can be in conjunction with Pydantic&rsquo;s free, open-source <a href="https://ai.pydantic.dev/">Pydantic AI</a> package, along with <a href="https://logfire.sh/">Pydantic Logfire</a>, their paid observability platform built on the <a href="https://opentelemetry.io/">OpenTelemetry</a> standard. Colvin walks through all the code, and runs it, right there during his 20-minute talk, despite Wi-Fi troubles. He explains why MCP&rsquo;s tool-calling capabilities are much more complex and powerful than you might at first imagine&mdash;dynamic tools, logging, sampling, and more. His explanation of the poorly-named and frequently-misunderstood <code>sampling</code> element of the MCP protocol is excellent&mdash;he makes clear exactly why sampling is so useful and critical.</p>
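<p>Sampling is the primitive where the usual direction reverses: the <em>server</em> asks the <em>client</em> to run a prompt through the client&rsquo;s own model, so the server needs no model API key of its own. A sketch with the message shape paraphrased from the MCP spec (the handler and the stubbed model are mine, not Colvin&rsquo;s code):</p>

```python
# The server -> client request: "please run this through *your* model."
sampling_request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "sampling/createMessage",
    "params": {
        "messages": [{"role": "user",
                      "content": {"type": "text",
                                  "text": "Classify this log line: 'OOM killed worker 3'"}}],
        "maxTokens": 50,
    },
}

def handle_sampling(req: dict, run_model) -> dict:
    """What a client does with the request: run the server's prompt through its
    own model (typically after a human approval step) and return the completion."""
    prompt = req["params"]["messages"][0]["content"]["text"]
    return {"jsonrpc": "2.0", "id": req["id"],
            "result": {"role": "assistant",
                       "model": "example-model",  # whatever the client chose
                       "content": {"type": "text", "text": run_model(prompt)}}}

# Stub model stands in for the client's real LLM call.
reply = handle_sampling(sampling_request,
                        lambda prompt: "category: resource-exhaustion")
```

<p>Seen this way, the name makes more sense: the server is requesting a sample from the client&rsquo;s model, with the client keeping control over which model runs and who pays for it.</p>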
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/colvin_sampling.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/colvin_sampling_hu16586556000558210021.webp 320w, /blog/aiewf2025-my-day-1-highlights/colvin_sampling_hu8489316803424572318.webp 640w, /blog/aiewf2025-my-day-1-highlights/colvin_sampling_hu928531814088356257.webp 768w, /blog/aiewf2025-my-day-1-highlights/colvin_sampling_hu3117321308551663479.webp 1024w, /blog/aiewf2025-my-day-1-highlights/colvin_sampling_hu17191456700493589926.webp 1280w, /blog/aiewf2025-my-day-1-highlights/colvin_sampling_hu7006424379479593709.webp 1536w, /blog/aiewf2025-my-day-1-highlights/colvin_sampling_hu7402891657321387487.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/colvin_sampling_hu13460951134226173837.jpg 320w, /blog/aiewf2025-my-day-1-highlights/colvin_sampling_hu13679167909442914935.jpg 640w, /blog/aiewf2025-my-day-1-highlights/colvin_sampling_hu15628737541355882173.jpg 768w, /blog/aiewf2025-my-day-1-highlights/colvin_sampling_hu432053789490895402.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/colvin_sampling_hu9018196692222954421.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/colvin_sampling_hu16450049926995187469.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/colvin_sampling_hu13393127971263154128.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/colvin_sampling_hu13679167909442914935.jpg" width="3250" height="1676" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>The power of well-implemented observability within an MCP server is clearly on display, delivered via Logfire, which has landed on my go-to list for observability.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/colvin_observability.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/colvin_observability_hu12325817402723071412.webp 320w, /blog/aiewf2025-my-day-1-highlights/colvin_observability_hu13287474849908189208.webp 640w, /blog/aiewf2025-my-day-1-highlights/colvin_observability_hu1348051103692369468.webp 768w, /blog/aiewf2025-my-day-1-highlights/colvin_observability_hu17726909606330224187.webp 1024w, /blog/aiewf2025-my-day-1-highlights/colvin_observability_hu3039326372865263000.webp 1280w, /blog/aiewf2025-my-day-1-highlights/colvin_observability_hu5395496283259882825.webp 1536w, /blog/aiewf2025-my-day-1-highlights/colvin_observability_hu13910477792085597142.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/colvin_observability_hu16428844043034202452.jpg 320w, /blog/aiewf2025-my-day-1-highlights/colvin_observability_hu17550091324982534414.jpg 640w, /blog/aiewf2025-my-day-1-highlights/colvin_observability_hu1845033710213761838.jpg 768w, /blog/aiewf2025-my-day-1-highlights/colvin_observability_hu10963180533813891467.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/colvin_observability_hu14208403719195499742.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/colvin_observability_hu5988119409915632249.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/colvin_observability_hu12902149792190746145.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/colvin_observability_hu17550091324982534414.jpg" width="3412" height="1914" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="observable-tools---the-state-of-mcp-observability">Observable tools - the state of MCP observability</h2>
<p><strong>Speakers:</strong> <a href="https://www.linkedin.com/in/alex-volkov-">Alex Volkov</a> (<a href="https://wandb.ai/">Weights &amp; Biases</a>), <a href="https://www.linkedin.com/in/benjamin-eckel-b025831a3">Benjamin Eckel</a> (<a href="https://dylibso.com/">Dylibso</a>) &ndash; <a href="https://www.youtube.com/watch?v=z4zXicOAF28&amp;t=19543s">Session video</a></p>
<p>This session took me deeper into the details of MCP observability than I needed to go&mdash;I need to know what&rsquo;s possible and where to find the specifics when the time comes, not the full picture right now. For someone in the midst of building a robust MCP for production use, though, I&rsquo;m sure this content would be super useful.</p>
<p>Alex Volkov&rsquo;s Weights &amp; Biases is known for their observability tool <a href="https://wandb.ai/weave">Weave</a>. Benjamin Eckel operates <a href="https://mcp.run/">MCP.run</a>, which consists of a registry of MCPs, along with tools to glue together, orchestrate and run MCPs (AI services). Both feel the pain of the MCP observability blind spot (black boxes).</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/mcps_are_black_boxes.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/mcps_are_black_boxes_hu1745751133956690100.webp 320w, /blog/aiewf2025-my-day-1-highlights/mcps_are_black_boxes_hu17758192690069912216.webp 640w, /blog/aiewf2025-my-day-1-highlights/mcps_are_black_boxes_hu6872548730080500541.webp 768w, /blog/aiewf2025-my-day-1-highlights/mcps_are_black_boxes_hu14976616419035616545.webp 1024w, /blog/aiewf2025-my-day-1-highlights/mcps_are_black_boxes_hu12521187946894814958.webp 1280w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/mcps_are_black_boxes_hu16671180980846897782.jpg 320w, /blog/aiewf2025-my-day-1-highlights/mcps_are_black_boxes_hu6630751697074716608.jpg 640w, /blog/aiewf2025-my-day-1-highlights/mcps_are_black_boxes_hu1890990494037230955.jpg 768w, /blog/aiewf2025-my-day-1-highlights/mcps_are_black_boxes_hu12518101473879669291.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/mcps_are_black_boxes_hu18001410999763577058.jpg 1280w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/mcps_are_black_boxes_hu6630751697074716608.jpg" width="1473" height="776" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Both recommend a standards-based approach: embed OpenTelemetry (OTel) within MCPs to make the MCP black box observable. They explain core OTel concepts&mdash;traces, spans, sinks, etc.&mdash;and both show code on their respective platforms applying OTel. Volkov also shows a great example of MCP in action in <a href="https://windsurf.com/">Windsurf</a>, debugging code via MCP tracing.</p>
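<p>To make the trace/span vocabulary concrete, here&rsquo;s a minimal, self-contained Python sketch of the span model. It&rsquo;s a hand-rolled stand-in, not the real OpenTelemetry SDK (there you&rsquo;d use <code>tracer.start_as_current_span</code> from <code>opentelemetry-api</code> and export finished spans to a sink); the tool and step names are made up:</p>

```python
import time
from contextlib import contextmanager

class Tracer:
    """Toy tracer illustrating the OTel span model: named, timed,
    parent-linked units of work collected into one trace."""
    def __init__(self):
        self.finished = []   # completed spans, in finish order
        self._stack = []     # currently open spans

    @contextmanager
    def span(self, name, **attributes):
        record = {
            "name": name,
            "parent": self._stack[-1]["name"] if self._stack else None,
            "attributes": attributes,
            "start": time.monotonic(),
        }
        self._stack.append(record)
        try:
            yield record
        finally:
            record["duration_s"] = time.monotonic() - record["start"]
            self._stack.pop()
            self.finished.append(record)

tracer = Tracer()

def handle_tool_call(query):
    # Root span covers the whole MCP tool invocation ...
    with tracer.span("tools/call", tool="search_docs", query=query):
        # ... child spans cover the interesting internal steps.
        with tracer.span("fetch"):
            time.sleep(0.01)   # stand-in for network I/O
        with tracer.span("rank"):
            time.sleep(0.005)  # stand-in for scoring
    return [s["name"] for s in tracer.finished]

print(handle_tool_call("observability"))  # ['fetch', 'rank', 'tools/call']
```

<p>The shape is the point: an MCP tool call becomes the root span, its internal steps become child spans, and the exported records are what turn the black box into a trace a client can inspect.</p>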
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/volkov_magic_mcp_moment.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/volkov_magic_mcp_moment_hu11602569599551390353.webp 320w, /blog/aiewf2025-my-day-1-highlights/volkov_magic_mcp_moment_hu16975561014718919662.webp 640w, /blog/aiewf2025-my-day-1-highlights/volkov_magic_mcp_moment_hu5066877572742792680.webp 768w, /blog/aiewf2025-my-day-1-highlights/volkov_magic_mcp_moment_hu980575481806520561.webp 1024w, /blog/aiewf2025-my-day-1-highlights/volkov_magic_mcp_moment_hu13792246601673067676.webp 1280w, /blog/aiewf2025-my-day-1-highlights/volkov_magic_mcp_moment_hu4251715559982678218.webp 1536w, /blog/aiewf2025-my-day-1-highlights/volkov_magic_mcp_moment_hu5317355566873899715.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/volkov_magic_mcp_moment_hu3341540367264513556.jpg 320w, /blog/aiewf2025-my-day-1-highlights/volkov_magic_mcp_moment_hu15722471344253046321.jpg 640w, /blog/aiewf2025-my-day-1-highlights/volkov_magic_mcp_moment_hu12864486932370456543.jpg 768w, /blog/aiewf2025-my-day-1-highlights/volkov_magic_mcp_moment_hu3917128279569655077.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/volkov_magic_mcp_moment_hu13505724924509529848.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/volkov_magic_mcp_moment_hu6408080291026055943.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/volkov_magic_mcp_moment_hu7244928040385997947.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/volkov_magic_mcp_moment_hu15722471344253046321.jpg" width="3412" height="1902" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="the-rise-of-the-agentic-economy-on-the-shoulders-of-mcp">The rise of the agentic economy on the shoulders of MCP</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/jancurn/">Jan Čurn</a> (<a href="https://apify.com/">Apify</a>) &ndash; <a href="https://www.youtube.com/watch?v=z4zXicOAF28&amp;t=20580s">Session video</a></p>
<p>Čurn&rsquo;s company Apify is a marketplace (read: monetized) of 5,000+ Actors (read: agent-like entities), historically web-scraping centric but more recently including other tool categories. Actor creators make money when their Actors are run by Apify customers, with Apify handling the payments infrastructure.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/curn_apify.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/curn_apify_hu15037238328113293284.webp 320w, /blog/aiewf2025-my-day-1-highlights/curn_apify_hu12424326463525562198.webp 640w, /blog/aiewf2025-my-day-1-highlights/curn_apify_hu12677319830321883433.webp 768w, /blog/aiewf2025-my-day-1-highlights/curn_apify_hu9674441734037330029.webp 1024w, /blog/aiewf2025-my-day-1-highlights/curn_apify_hu14505272845579152126.webp 1280w, /blog/aiewf2025-my-day-1-highlights/curn_apify_hu14077907953505975110.webp 1536w, /blog/aiewf2025-my-day-1-highlights/curn_apify_hu6607796440888316299.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/curn_apify_hu9198754466623245436.jpg 320w, /blog/aiewf2025-my-day-1-highlights/curn_apify_hu18108697403009859761.jpg 640w, /blog/aiewf2025-my-day-1-highlights/curn_apify_hu8339855220158407382.jpg 768w, /blog/aiewf2025-my-day-1-highlights/curn_apify_hu4713712485871119993.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/curn_apify_hu12385664961981991765.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/curn_apify_hu16573466030164212571.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/curn_apify_hu442897520024489006.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/curn_apify_hu18108697403009859761.jpg" width="2389" height="1356" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>MCPs are a slam-dunk addition to the Apify ecosystem. Apify was able to expose all 5,000+ Actors thanks to what Čurn calls MCP&rsquo;s killer feature: Tool Discovery. Not many clients support it yet&mdash;VS Code and <a href="https://claude.ai/download">Claude Desktop</a> just added support. Čurn explains that, while there&rsquo;s no way they could have published 5,000+ Actors via OpenAPI due to search challenges, Tool Discovery provided an elegant solution, and he considers it a huge differentiator for MCP. He also points out how dominant MCP is becoming for agentic interaction: there are so many MCP registries that <a href="https://mastra.ai/">Mastra</a> even has a <a href="https://mastra.ai/mcp-registry-registry">registry of registries</a>. Double meta.</p>
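<p>For the curious, here&rsquo;s roughly what that discovery call looks like on the wire. MCP is JSON-RPC 2.0, and discovery is the <code>tools/list</code> method; the Actor entries below are invented, and a real server at this scale would use the spec&rsquo;s cursor-based pagination rather than return everything in one response:</p>

```python
import json

# A large catalog: the server only has to answer tools/list; clients
# discover everything at runtime instead of reading static API docs.
CATALOG = [
    {
        "name": f"actor_{i}",  # hypothetical Actor names
        "description": f"Runs marketplace actor #{i}",
        "inputSchema": {"type": "object",
                        "properties": {"input": {"type": "string"}}},
    }
    for i in range(5000)
]

def handle(request_json):
    """Answer the MCP discovery call (JSON-RPC 2.0, method 'tools/list')."""
    req = json.loads(request_json)
    if req.get("method") == "tools/list":
        result = {"tools": CATALOG}
    else:
        result = {"error": "unhandled in this sketch"}
    return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), "result": result})

# A client discovers every tool with one request:
reply = json.loads(handle('{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}'))
print(len(reply["result"]["tools"]))  # 5000
```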
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/curn_mcp_dominance.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/curn_mcp_dominance_hu17854170674779884364.webp 320w, /blog/aiewf2025-my-day-1-highlights/curn_mcp_dominance_hu15912099726541840661.webp 640w, /blog/aiewf2025-my-day-1-highlights/curn_mcp_dominance_hu10051252686089450721.webp 768w, /blog/aiewf2025-my-day-1-highlights/curn_mcp_dominance_hu5677173429601517829.webp 1024w, /blog/aiewf2025-my-day-1-highlights/curn_mcp_dominance_hu4342515044718582578.webp 1280w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/curn_mcp_dominance_hu7505893484747698471.jpg 320w, /blog/aiewf2025-my-day-1-highlights/curn_mcp_dominance_hu630981030019580932.jpg 640w, /blog/aiewf2025-my-day-1-highlights/curn_mcp_dominance_hu2314263189128713459.jpg 768w, /blog/aiewf2025-my-day-1-highlights/curn_mcp_dominance_hu9311533084730951994.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/curn_mcp_dominance_hu8411457116499690179.jpg 1280w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/curn_mcp_dominance_hu630981030019580932.jpg" width="1476" height="746" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Čurn suggests that, thanks to the richness of the MCP protocol, an agentic economy will arise on top of MCP. But there&rsquo;s a killer problem: how do you handle payments when a group (tree) of paid agents works together to solve a problem? You might be able to discover other agents, but how would you pay them, when the normal setup gives each service its own API key and associated account? Give the parent agent a credit card? Bad for so many reasons. Central identity and payments providers? Maybe someday, but launching something like that is hard and expensive. Apify has a solution: developers deal with one API key and interact only with mcp.apify.com; an entire tree of Actors can issue charges, which are passed back up to the dev via Apify. A great demo ensued using Claude Desktop.</p>
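<p>The billing model is easy to sketch. This is my conceptual reconstruction, not Apify&rsquo;s actual code or API: each Actor in the tree issues charges, the charges roll up to the root, and the developer settles one itemized bill against a single API key while the platform settles with each Actor&rsquo;s creator:</p>

```python
from dataclasses import dataclass, field

@dataclass
class Actor:
    """Hypothetical pass-through billing: every Actor in the tree
    charges its own fee, and all charges roll up to one bill."""
    name: str
    fee_usd: float
    children: list = field(default_factory=list)

    def run(self):
        # Each Actor charges its own fee, plus whatever its sub-Actors charge.
        charges = [(self.name, self.fee_usd)]
        for child in self.children:
            charges += child.run()
        return charges

# The caller sees a single itemized bill, not N accounts and N API keys.
tree = Actor("orchestrator", 0.05, [
    Actor("scrape-site", 0.10),
    Actor("summarize", 0.02, [Actor("translate", 0.01)]),
])
bill = tree.run()
total = round(sum(fee for _, fee in bill), 2)
print(total)  # 0.18
```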
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/curn_open_marketplace.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/curn_open_marketplace_hu17419523230212864014.webp 320w, /blog/aiewf2025-my-day-1-highlights/curn_open_marketplace_hu18389439968125519827.webp 640w, /blog/aiewf2025-my-day-1-highlights/curn_open_marketplace_hu16547080323959199336.webp 768w, /blog/aiewf2025-my-day-1-highlights/curn_open_marketplace_hu15771045344288500530.webp 1024w, /blog/aiewf2025-my-day-1-highlights/curn_open_marketplace_hu175912018998208855.webp 1280w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/curn_open_marketplace_hu263835473397186771.jpg 320w, /blog/aiewf2025-my-day-1-highlights/curn_open_marketplace_hu1331234222650472077.jpg 640w, /blog/aiewf2025-my-day-1-highlights/curn_open_marketplace_hu3249014178504139490.jpg 768w, /blog/aiewf2025-my-day-1-highlights/curn_open_marketplace_hu11862634437541949795.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/curn_open_marketplace_hu2544086902869138119.jpg 1280w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/curn_open_marketplace_hu1331234222650472077.jpg" width="1473" height="776" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="building-agents-at-cloud-scale">Building Agents at Cloud-Scale</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/antje-barth">Antje Barth</a> (<a href="https://aws.amazon.com/">AWS</a>) &ndash; <a href="https://www.youtube.com/watch?v=z4zXicOAF28&amp;t=26122s">Session video</a></p>
<p>As with Microsoft&rsquo;s keynote featuring Asha Sharma, this session with AWS&rsquo;s Antje Barth set off my &ldquo;oh no, vendor flogging ahead&rdquo; warning lights. That&rsquo;s not fair, though: we need major players like AWS who can deliver massive-scale platforms, and when you&rsquo;re AWS or Microsoft, you have to flog the sexy bits so you can sell the boring ones; no margin, no mission, as they say.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/antje_barth.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/antje_barth_hu8617550072810504784.webp 320w, /blog/aiewf2025-my-day-1-highlights/antje_barth_hu17047130367123895676.webp 640w, /blog/aiewf2025-my-day-1-highlights/antje_barth_hu416801894415348114.webp 768w, /blog/aiewf2025-my-day-1-highlights/antje_barth_hu1778859402567793385.webp 1024w, /blog/aiewf2025-my-day-1-highlights/antje_barth_hu15291088263050473896.webp 1280w, /blog/aiewf2025-my-day-1-highlights/antje_barth_hu1466322710443695839.webp 1536w, /blog/aiewf2025-my-day-1-highlights/antje_barth_hu13611215968681275634.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/antje_barth_hu17675575481535571333.jpg 320w, /blog/aiewf2025-my-day-1-highlights/antje_barth_hu13474026029591049088.jpg 640w, /blog/aiewf2025-my-day-1-highlights/antje_barth_hu8294430801210812371.jpg 768w, /blog/aiewf2025-my-day-1-highlights/antje_barth_hu9637275186176279594.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/antje_barth_hu6180376050690899226.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/antje_barth_hu13750539942377242459.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/antje_barth_hu4135649374271913721.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/antje_barth_hu13474026029591049088.jpg" width="2544" height="1273" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>It&rsquo;s also not fair in that, as with Microsoft, Barth and AWS actually had interesting things to show off. <a href="https://www.amazon.com/Meet-the-new-Alexa/dp/B0DCCNHWV5">Alexa+</a> sure looks like it brings to the smart home device the level of real-time intelligence and interactivity that OpenAI&rsquo;s <a href="https://techcrunch.com/2025/06/09/openai-updates-chatgpts-voice-mode-with-more-natural-sounding-speech/">Advanced Voice Mode</a> delivers in their mobile app. I&rsquo;m cautious, though: Amazon has botched their smart home strategy in the past&mdash;see my current Alexa deployment below. My Apple HomePods are likely to join them on the junk shelf as soon as someone trustworthy and with taste delivers on the promise. Apple qualifies as &ldquo;trustworthy and with taste&rdquo;&mdash;but they continue to foot-gun on AI.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/my_alexa_deployment.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/my_alexa_deployment_hu18314227223024463918.webp 320w, /blog/aiewf2025-my-day-1-highlights/my_alexa_deployment_hu2681320053700527204.webp 640w, /blog/aiewf2025-my-day-1-highlights/my_alexa_deployment_hu8408102687349438644.webp 768w, /blog/aiewf2025-my-day-1-highlights/my_alexa_deployment_hu9576139003483221711.webp 1024w, /blog/aiewf2025-my-day-1-highlights/my_alexa_deployment_hu11385136954068975953.webp 1280w, /blog/aiewf2025-my-day-1-highlights/my_alexa_deployment_hu11736076877899912428.webp 1536w, /blog/aiewf2025-my-day-1-highlights/my_alexa_deployment_hu16789176150991055297.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/my_alexa_deployment_hu11646870767085765455.jpg 320w, /blog/aiewf2025-my-day-1-highlights/my_alexa_deployment_hu9522075271849052111.jpg 640w, /blog/aiewf2025-my-day-1-highlights/my_alexa_deployment_hu10275550297896738255.jpg 768w, /blog/aiewf2025-my-day-1-highlights/my_alexa_deployment_hu18425845598161997678.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/my_alexa_deployment_hu11730865711032451109.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/my_alexa_deployment_hu208804829316627404.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/my_alexa_deployment_hu17773392341699649737.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/my_alexa_deployment_hu9522075271849052111.jpg" width="5705" height="2852" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>It&rsquo;s actually interesting how much higher my trust level is for AWS than for mainline Amazon products like Alexa, which leads us to Barth&rsquo;s main topic: agents at cloud scale. Barth did a quick demo of the <a href="https://aws.amazon.com/q/">AWS Q CLI</a> tool&mdash;not impressive on its own, but its three-week time-to-launch was more so. How? <a href="https://aws.amazon.com/blogs/opensource/introducing-strands-agents-an-open-source-ai-agents-sdk/">AWS Strands Agents (SA)</a>. Good things about SA: it&rsquo;s built on <a href="https://aws.amazon.com/bedrock/">Bedrock</a>, which means a wide selection of models to work with, including Anthropic&rsquo;s, and a nice starting set of 20 pre-built tools including memory/<a href="https://en.wikipedia.org/wiki/Retrieval-augmented_generation">RAG</a> (the Retrieve tool sounds particularly interesting). What&rsquo;s less clear is how the SA architecture fits in with MCP: what does SA bring to the table that MCP by itself doesn&rsquo;t provide? AWS has lots of MCP server implementations, and MCPs can be used as tools within SA, but AWS seems to view MCP as &ldquo;just one of the open protocols&rdquo; for agents. We&rsquo;ll have to wait and see how this plays out.</p>
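<p>Whatever SA turns out to add, the basic fit is easy to picture. Here&rsquo;s a deliberately generic, framework-free sketch (not Strands&rsquo; actual API; every name here is invented) of how an MCP server slots into an agent&rsquo;s tool map alongside natively defined tools:</p>

```python
def mcp_tool(server_call):
    """Wrap an MCP server's tools/call endpoint so the agent loop can
    treat it like any locally defined tool. `server_call` stands in
    for a real MCP client transport (hypothetical)."""
    return lambda **kwargs: server_call(kwargs)

def local_time_tool(**kwargs):
    return "12:00"  # stand-in for a locally implemented tool

# Toy "MCP server": echoes back its arguments.
fake_mcp_server = lambda args: f"mcp-result:{args['q']}"

TOOLS = {
    "get_time": local_time_tool,          # native tool
    "search": mcp_tool(fake_mcp_server),  # MCP-backed tool, same shape
}

def agent_step(tool_name, **kwargs):
    # The agent loop dispatches by name; it never cares which tools
    # are local and which live behind an MCP server.
    return TOOLS[tool_name](**kwargs)

print(agent_step("search", q="bedrock"))  # mcp-result:bedrock
print(agent_step("get_time"))             # 12:00
```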
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/barth_streams_agents.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/barth_streams_agents_hu17163747261466257884.webp 320w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_hu3700978546547894211.webp 640w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_hu2425689667966248753.webp 768w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_hu6178138902721366698.webp 1024w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_hu4317113281697365104.webp 1280w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_hu11533106449175107424.webp 1536w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_hu7711782893305128195.webp 1920w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/barth_streams_agents_hu7444958243573545323.jpg 320w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_hu2886753846156185163.jpg 640w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_hu15438754261903730483.jpg 768w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_hu6868420749876776399.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_hu11352930614423351914.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_hu7960496796679651661.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_hu18099062505813762353.jpg 1920w" sizes="37vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/barth_streams_agents_hu2886753846156185163.jpg" width="2457" height="1362" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[75%] mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/barth_streams_agents_mcp.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/barth_streams_agents_mcp_hu9750603274612260128.webp 320w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_mcp_hu16151976083716971564.webp 640w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_mcp_hu8681798254609030214.webp 768w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_mcp_hu2656990236953301700.webp 1024w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_mcp_hu3930179452877868621.webp 1280w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_mcp_hu11362131037073530828.webp 1536w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_mcp_hu13652372275644421380.webp 1920w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/barth_streams_agents_mcp_hu17668805340091678057.jpg 320w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_mcp_hu8786392041146353099.jpg 640w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_mcp_hu17239683215382655611.jpg 768w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_mcp_hu12753450735199625333.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_mcp_hu14960606452934513199.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_mcp_hu9356189716385157903.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/barth_streams_agents_mcp_hu11232303056475575881.jpg 1920w" sizes="37vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/barth_streams_agents_mcp_hu8786392041146353099.jpg" width="2514" height="1355" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="windsurf-everywhere-doing-everything-all-at-once">Windsurf everywhere, doing everything, all at once</h2>
<p><strong>Speaker:</strong> <a href="https://www.linkedin.com/in/kevinhou22">Kevin Hou</a> (<a href="https://windsurf.com/">Windsurf</a>) &ndash; <a href="https://www.youtube.com/watch?v=z4zXicOAF28&amp;t=27266s">Session video</a></p>
<p>In the AI Coding space, IDE-tool subspace, my most-admired product, and also the one I have the most experience with, is Cursor. In my experience, which combines a lot of paying attention to what others think with a moderate degree of hands-on use, Cursor leads this sub-niche. The AIEWF2025 audience seemed to agree: when someone asked for a show of hands for who uses Windsurf and who uses Cursor, my estimate was about 5:1 Cursor:Windsurf. Having seen Kevin Hou&rsquo;s session live, and re-watching it as I write this, I didn&rsquo;t see anything to change my opinion. Hou claimed he was sharing Windsurf&rsquo;s &ldquo;secret sauce&rdquo;&mdash;shared timelines between human and AI&mdash;but this and other points made in the session struck me as marketing points as opposed to actual breakthroughs. I didn&rsquo;t see anything new; I just saw obvious points made as if I knew nothing about other tools or the current state of the art in AI Coding.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/windsurf_shared_timelines.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/windsurf_shared_timelines_hu6935581350325032686.webp 320w, /blog/aiewf2025-my-day-1-highlights/windsurf_shared_timelines_hu1547329920550699115.webp 640w, /blog/aiewf2025-my-day-1-highlights/windsurf_shared_timelines_hu14802685227406573079.webp 768w, /blog/aiewf2025-my-day-1-highlights/windsurf_shared_timelines_hu11298491520712604097.webp 1024w, /blog/aiewf2025-my-day-1-highlights/windsurf_shared_timelines_hu16994962384267991540.webp 1280w, /blog/aiewf2025-my-day-1-highlights/windsurf_shared_timelines_hu12639313378646579806.webp 1536w, /blog/aiewf2025-my-day-1-highlights/windsurf_shared_timelines_hu17143609790419092597.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/windsurf_shared_timelines_hu12779363071802779872.jpg 320w, /blog/aiewf2025-my-day-1-highlights/windsurf_shared_timelines_hu16316229648925413660.jpg 640w, /blog/aiewf2025-my-day-1-highlights/windsurf_shared_timelines_hu15308677622800869272.jpg 768w, /blog/aiewf2025-my-day-1-highlights/windsurf_shared_timelines_hu8133626138000688578.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/windsurf_shared_timelines_hu4715592492517181888.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/windsurf_shared_timelines_hu1164737948606674039.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/windsurf_shared_timelines_hu1919182448702501799.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/windsurf_shared_timelines_hu16316229648925413660.jpg" width="2872" height="1532" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>It&rsquo;s possible, no doubt, that I&rsquo;m missing something important. I strive to remain open-minded. I will give Windsurf major credit for capitalizing on OpenAI&rsquo;s need to beef up their AI Coding story. If Windsurf can close that $3B deal&mdash;that fat lady hasn&rsquo;t sung quite yet&mdash;being part of OpenAI may give Windsurf the unique advantage that its current product capabilities do not.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/windsurf_timeline.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/windsurf_timeline_hu16633914521245129938.webp 320w, /blog/aiewf2025-my-day-1-highlights/windsurf_timeline_hu6935049864226310271.webp 640w, /blog/aiewf2025-my-day-1-highlights/windsurf_timeline_hu5527155760302612745.webp 768w, /blog/aiewf2025-my-day-1-highlights/windsurf_timeline_hu588287898414399461.webp 1024w, /blog/aiewf2025-my-day-1-highlights/windsurf_timeline_hu3276007650742465911.webp 1280w, /blog/aiewf2025-my-day-1-highlights/windsurf_timeline_hu8744479129648328862.webp 1536w, /blog/aiewf2025-my-day-1-highlights/windsurf_timeline_hu16506769700470428299.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/windsurf_timeline_hu9512104328210807381.jpg 320w, /blog/aiewf2025-my-day-1-highlights/windsurf_timeline_hu18268023292926820611.jpg 640w, /blog/aiewf2025-my-day-1-highlights/windsurf_timeline_hu11631301018234858752.jpg 768w, /blog/aiewf2025-my-day-1-highlights/windsurf_timeline_hu11183220097693598922.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/windsurf_timeline_hu9201609693984902580.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/windsurf_timeline_hu7949009969527497604.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/windsurf_timeline_hu13759019024067906387.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/windsurf_timeline_hu18268023292926820611.jpg" width="3410" height="1868" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="define-ai-engineer">Define AI Engineer</h2>
<p><strong>Speakers:</strong> <a href="https://www.linkedin.com/in/gdb/">Greg Brockman</a> (<a href="https://openai.com/">OpenAI</a>), <a href="https://www.swyx.io/">Shawn &ldquo;swyx&rdquo; Wang</a> (<a href="https://latentspace.co/">Latent Space</a>) &ndash; <a href="https://www.youtube.com/watch?v=z4zXicOAF28&amp;t=28290s">Session video</a></p>
<p>Swyx conducts a fascinating interview with Brockman, with a little help from <a href="https://www.nvidia.com/">Nvidia</a>&rsquo;s <a href="https://www.linkedin.com/in/jenhsunhuang/">Jensen Huang</a> (prerecorded). Having re-watched it as part of writing my recap, I picked up a lot I missed in person. There&rsquo;s so much content here; I&rsquo;ll zoom in on the questions in the AI Coding domain.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/swyx_brockman1.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/swyx_brockman1_hu291831256732508820.webp 320w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman1_hu12700017921144289134.webp 640w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman1_hu15826634113806187884.webp 768w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman1_hu12341122789749918366.webp 1024w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman1_hu15009076985177841372.webp 1280w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman1_hu14586163663709625986.webp 1536w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman1_hu11226055395767597071.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/swyx_brockman1_hu13914711105151296117.jpg 320w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman1_hu17489964317315336843.jpg 640w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman1_hu1524029245040428891.jpg 768w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman1_hu5230971042755626074.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman1_hu14056270962775717314.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman1_hu2706287924186261884.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman1_hu2501159964871979218.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/swyx_brockman1_hu17489964317315336843.jpg" width="3446" height="1856" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p><strong>Swyx:</strong> What are your thoughts on vibe coding?</p>
<p><strong>Brockman:</strong> I think that vibe coding is amazing as an empowerment mechanism, right? It&rsquo;s sort of a representation of what is to come. And I think the specifics of what vibe coding is are going to change over time. You look at even things like <a href="https://openai.com/codex/">Codex</a>: to some extent, our vision is that as you start to have agents that really work, you can have not just one copy, not just 10 copies, but a hundred, a thousand, 10,000, 100,000 of these things running. You&rsquo;re going to want to treat them much more like a co-worker: you&rsquo;re going to want them off in the cloud doing stuff, able to hook up to all sorts of things; you&rsquo;re asleep, your laptop&rsquo;s closed, it should still be working. The current conception of vibe coding is an interactive loop&hellip; My prediction of what will happen&hellip; there&rsquo;s going to be more and more of that happening, but I think the agentic stuff is going to really intercept and overtake. And all of this is just going to result in way more systems being built.</p>
<p>Also very interesting is that a lot of the vibe coding demos and the cool flashy stuff&mdash;for example, making the joke website&mdash;is making an app from scratch. But the thing that I think will really be new and transformative, and is starting to really happen, is being able to transform existing applications, to go deeper. So many companies are sitting on legacy codebases, and doing migrations, updating libraries, and changing your <a href="https://en.wikipedia.org/wiki/COBOL">COBOL</a> to another language is so hard, and is actually just not very fun for humans, and we&rsquo;re starting to get AI that are able to really tackle those problems. The thing that I love is that vibe coding started with the &ldquo;just make cool apps&rdquo; kind of thing, and it&rsquo;s starting to become much more like serious software engineering. And going even deeper, to just making it possible to move so much faster as a company&mdash;that&rsquo;s, I think, where we&rsquo;re headed.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/swyx_brockman2.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/swyx_brockman2_hu1121679239845488358.webp 320w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman2_hu11131579494011931954.webp 640w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman2_hu10794313497671327141.webp 768w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman2_hu2275677668806221197.webp 1024w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman2_hu2559551360996726740.webp 1280w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman2_hu16049882332457960010.webp 1536w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman2_hu9115483234322996481.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/swyx_brockman2_hu17595985370228170963.jpg 320w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman2_hu14010786819370964113.jpg 640w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman2_hu13758921908504200718.jpg 768w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman2_hu12717480843237619684.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman2_hu17490825094917175721.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman2_hu7126306216465607938.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/swyx_brockman2_hu15218097603012796296.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/swyx_brockman2_hu14010786819370964113.jpg" width="3430" height="1892" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p><strong>Swyx:</strong> How do you think that Codex changes the way we code?</p>
<p><strong>Brockman:</strong> The direction is just so compelling and incredible to me. The most interesting thing has been realizing that the way you structure your codebase determines how much you can get out of Codex, right? All of our existing codebases are kind of matched to the strengths of humans. But what if you match them instead to the strengths of models, which are very lopsided? Models are able to handle way more diversity of stuff, but right now they aren&rsquo;t necessarily able to connect deep ideas as much as humans are. So what you want to do is make smaller modules that are well tested, with tests that can be run very quickly, and then fill in the details. The model will just do that, right? And it&rsquo;ll run the tests itself.</p>
<p>The connection between these different components &ndash; kind of the architecture diagram &ndash; is actually pretty easy to do; it&rsquo;s filling out all the details that is often very difficult. What I described also sounds a lot like good software engineering practice. But because humans are capable of holding more of this conceptual abstraction in our heads, we often just don&rsquo;t do it &ndash; it&rsquo;s a lot of work to write these tests and flesh them out. The model, though, is going to run those tests a hundred or a thousand times more than you will, so it&rsquo;s going to care way, way more. So in some ways the direction we want to go is to build our codebases for more junior developers, in order to actually get the most out of these models.</p>
<p>Now, it&rsquo;ll be very interesting to see whether, as model capability increases, this particular way of structuring codebases remains constant. I think it&rsquo;s a good idea because, again, it matches what you should be doing for maintainability for humans anyway. But to me, the really exciting question for the future of software engineering is: which of the practices we cut corners on do we actually need to bring back, in order to get the most out of these systems?</p>
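<p>As an illustration (mine, not from the talk), the unit Brockman describes is a small module with a narrow contract and fast, dependency-free tests that an agent can rerun on every edit:</p>

```python
# Illustrative sketch, not OpenAI code: a tiny, well-tested module of the
# kind an agent can verify hundreds of times per session.
def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return " ".join(text.split())

def test_normalize_whitespace() -> None:
    # Cheap, self-contained checks: no fixtures, no I/O, milliseconds to run.
    assert normalize_whitespace("  a \t b\n c ") == "a b c"
    assert normalize_whitespace("") == ""
```

<p>The point is less the function than the shape: the module&rsquo;s contract is fully captured by tests the model can execute itself after filling in the details.</p>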
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/brockman.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/brockman_hu8892920328393169885.webp 320w, /blog/aiewf2025-my-day-1-highlights/brockman_hu18250004804429134780.webp 640w, /blog/aiewf2025-my-day-1-highlights/brockman_hu7761516670590314693.webp 768w, /blog/aiewf2025-my-day-1-highlights/brockman_hu12791176112368483000.webp 1024w, /blog/aiewf2025-my-day-1-highlights/brockman_hu12852694246036169052.webp 1280w, /blog/aiewf2025-my-day-1-highlights/brockman_hu14824334076511340865.webp 1536w, /blog/aiewf2025-my-day-1-highlights/brockman_hu13507341694157639355.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/brockman_hu8945280212525090627.jpg 320w, /blog/aiewf2025-my-day-1-highlights/brockman_hu4924809148885094552.jpg 640w, /blog/aiewf2025-my-day-1-highlights/brockman_hu1448822048376072173.jpg 768w, /blog/aiewf2025-my-day-1-highlights/brockman_hu11189463669129350518.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/brockman_hu7100076230338907002.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/brockman_hu5864812146074687066.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/brockman_hu7929485136472518669.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/brockman_hu4924809148885094552.jpg" width="3436" height="1902" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p><strong>Swyx:</strong> Can you ballpark numbers on the amount of productivity you guys are seeing with Codex internally?</p>
<p><strong>Brockman:</strong> Yeah, I don&rsquo;t know what the latest numbers are. I mean, there&rsquo;s definitely a low double-digit percentage written entirely by Codex. That&rsquo;s super cool to see. But it&rsquo;s also not the only system that we use internally, and I think it&rsquo;s still very early days. It&rsquo;s been exciting to see some of the external metrics: I think we had 24,000 PRs merged in public GitHub repositories in just the last day. So yeah, this stuff is all just getting started.</p>
<p><strong>Jensen Huang</strong>: The AI-native engineers in the audience are probably thinking that in the coming years, OpenAI will have AGIs, and they will be building domain-specific agents on top of those AGIs. So one of the questions on my mind would be: how does their development workflow change as OpenAI&rsquo;s AGIs become much more capable? They would still have plumbing, workflows, and pipelines to create &ndash; flywheels for their domain-specific agents. These agents would of course be able to reason, plan, use tools, and have short-term and long-term memory &ndash; they&rsquo;ll be amazing agents &ndash; but how does that change the development process in the coming years?</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/aiewf2025-my-day-1-highlights/huang.png"><picture>
    <source type="image/webp" srcset="/blog/aiewf2025-my-day-1-highlights/huang_hu3568968907277559621.webp 320w, /blog/aiewf2025-my-day-1-highlights/huang_hu12888736600716311910.webp 640w, /blog/aiewf2025-my-day-1-highlights/huang_hu10444578741820569281.webp 768w, /blog/aiewf2025-my-day-1-highlights/huang_hu4009115922405890238.webp 1024w, /blog/aiewf2025-my-day-1-highlights/huang_hu11178506423962019106.webp 1280w, /blog/aiewf2025-my-day-1-highlights/huang_hu2733278257875995626.webp 1536w, /blog/aiewf2025-my-day-1-highlights/huang_hu17925377389823599843.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/aiewf2025-my-day-1-highlights/huang_hu18273329504571143449.jpg 320w, /blog/aiewf2025-my-day-1-highlights/huang_hu6459941011941327648.jpg 640w, /blog/aiewf2025-my-day-1-highlights/huang_hu16719839051972777984.jpg 768w, /blog/aiewf2025-my-day-1-highlights/huang_hu16669140089933732515.jpg 1024w, /blog/aiewf2025-my-day-1-highlights/huang_hu1875950423648683019.jpg 1280w, /blog/aiewf2025-my-day-1-highlights/huang_hu4432002521591507690.jpg 1536w, /blog/aiewf2025-my-day-1-highlights/huang_hu8924468330219247256.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/aiewf2025-my-day-1-highlights/huang_hu6459941011941327648.jpg" width="3408" height="1906" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p><strong>Brockman</strong>: Yeah, I think this is a really fascinating question, right? You can find a wide spectrum of very strongly held opinions that are all mutually contradictory. My perspective is that, first of all, it&rsquo;s all on the table. Maybe we reach a world where the AIs are so capable that we just let them write all the code. Maybe there&rsquo;s a world where you have one AI in the sky. Maybe you actually have a bunch of domain-specific agents that require a bunch of specific work to make happen.</p>
<p>I think the evidence has really been shifting towards this menagerie of different models, and I think that&rsquo;s actually really exciting. There are different inference costs, different trade-offs &ndash; distillation works so well &ndash; and there&rsquo;s actually a lot of power to be had by models that are able to use other models. That is going to open up just a ton of opportunity, because we&rsquo;re heading to a world where the economy is fundamentally powered by AI. We&rsquo;re not there yet, but you can see it right on the horizon. That&rsquo;s what people in this room are building &ndash; that is what you are doing.</p>
<p>The economy is a very big thing, there&rsquo;s a lot of diversity in it, and it&rsquo;s also not static, right? When people think about what AI can do for us, it&rsquo;s very easy to look only at what we&rsquo;re doing now, how AI slots in, and the percentage of human versus AI work &ndash; but that&rsquo;s not the point. The point is how we get 10x more activity, 10x more economic output, 10x more benefit to everyone. I think the direction we&rsquo;re heading is one where the models get much more capable, the fundamental technology gets much better, and there are just way more things we want to do with it, with a lower barrier to entry than ever. Things like healthcare require the responsibility to go in and think about how to do it right. Things like education have multiple stakeholders &ndash; the parent, the teacher, the student &ndash; and each of these requires domain expertise, careful thought, and a lot of work.</p>
<p>So I think there is going to be so much opportunity for people to build. I&rsquo;m just so excited to see everyone in this room, because that&rsquo;s the right kind of energy.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>AI Writing Index, Feb &#39;23 to Jun &#39;24</title>
      <link>https://craftycto.com/blog/ai-writing-index/</link>
      <pubDate>Tue, 10 Jun 2025 15:36:00 -0400</pubDate>
      
      <guid>https://craftycto.com/blog/ai-writing-index/</guid>
      <description>My early AI work, in index form, with reflections</description>
<content:encoded><![CDATA[<p>As I dig in to recap what I saw and learned at the <a href="https://www.ai.engineer/">AI Engineer World&rsquo;s Fair 2025 in San Francisco</a>, I&rsquo;ve been forced to face my failure to write consistently &ndash; a failure that dates back almost exactly a year, to June 12, 2024. Thinking back on it, I see two causes. First, my book of business on the fractional CTO side of things filled up completely, reducing available writing time. Second, the <a href="/blog/the-emergence-s1e4/">AI emergence</a>, like <a href="https://www.dailymotion.com/video/x6qx9dj">Solomon Epstein&rsquo;s ill-fated space yacht in <em>The Expanse</em></a>, just kept accelerating. I had &ldquo;gotta write about that!!&rdquo; panics daily, until I finally had to just stop rather than drive myself nuts.</p>
<p>I do have to say, though, that during the 16 months from February 2023 to June 2024:</p>
<ul>
<li>I was early to AI</li>
<li>I wrote a lot</li>
<li>It was pretty decent stuff</li>
<li>My opinions and predictions regularly proved accurate. (One prediction, that <a href="/blog/an-appllama-week/#my-take-on-apple-in-llmgpt-land">Apple might get pinched</a> by the AI emergence, is playing out in front of our eyes right now. Apple had little AI-related to announce at this week&rsquo;s WWDC, and had no visibility whatsoever at <a href="https://www.ai.engineer/">AIEWF2025</a>.)</li>
</ul>
<p>It&rsquo;s also pretty amazing to see what has happened in the past year. At AIEWF2025, Simon Willison was forced to cut the scope of <a href="https://simonwillison.net/2025/Jun/6/six-months-in-llms/">his LLM update keynote</a> from &ldquo;the past year&rdquo; to &ldquo;the past six months&rdquo; to have a prayer of finishing in the allotted 20 minutes.</p>
<p>Anyway, take a look at some of the stuff below and see what you think.</p>
<p>P.S.: In June 2025, <a href="https://www.cursor.com/">Cursor</a> and LLMs are vastly better at writing Markdown than a year ago.</p>
<h2 id="blog-posts">Blog Posts</h2>
<table>
  <thead>
      <tr>
          <th style="text-align: left">Title</th>
          <th style="text-align: left">Date</th>
          <th style="text-align: left">Description</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left"><a href="/blog/chatgpt-4v/">Using Multimodal AI to Capture and Enrich Heirloom Recipes</a></td>
          <td style="text-align: left">2024-02-19</td>
          <td style="text-align: left">I applied OpenAI&rsquo;s GPT-4 Vision model and Chat Completions API to preserve a treasure-trove of legacy family recipes.</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/blog/an-appllama-week/">An Appllama Week in AI</a></td>
          <td style="text-align: left">2023-07-21</td>
          <td style="text-align: left">Meta makes waves with Llama 2, while Bloomberg pumps itself with Apple LLM non-news.</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/blog/really-openai/">Really, OpenAI?</a></td>
          <td style="text-align: left">2023-07-07</td>
          <td style="text-align: left">ChatGPT+ subscribers can prevent OpenAI from using their inputs as training data. That is, so long as they forego the service&rsquo;s second-best feature.</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/blog/moat-busters/">Of Moats and Moat Busters</a></td>
          <td style="text-align: left">2023-06-29</td>
          <td style="text-align: left">Open Source, Commercial-friendly AI Challenges the Major Closed AI Players</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/blog/unlimited-stupid-questions/">Unlimited Stupid Questions</a></td>
          <td style="text-align: left">2023-06-14</td>
          <td style="text-align: left">Possibly the most important benefit of ChatGPT &hellip;</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/blog/runway-not-midjourney-for-video/">Is Runway Gen-2 Midjourney for Video?</a></td>
          <td style="text-align: left">2023-06-08</td>
          <td style="text-align: left">We Tested Gen-2 and Share the Resulting Videos</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/blog/the-emergence-s1e4/">The Emergence</a></td>
          <td style="text-align: left">2023-04-18</td>
          <td style="text-align: left">Season 1, Episode 4</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/blog/march-madness-ai/">March Madness, AI Style</a></td>
          <td style="text-align: left">2023-03-18</td>
          <td style="text-align: left">Observations on AI&rsquo;s March acceleration</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/blog/alarmed-about-ai/">Alarmed About AI?</a></td>
          <td style="text-align: left">2023-02-20</td>
          <td style="text-align: left">John Seely Brown, former director of Xerox PARC, has helpful advice—from April 2000!</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/blog/accelerating-launch-with-ai-models/">Accelerating Crafty&rsquo;s Launch Using AI Models</a></td>
          <td style="text-align: left">2023-02-08</td>
          <td style="text-align: left">Thanks to AI-based tools like Midjourney, Crafty shaved at least a month off its time to launch.</td>
      </tr>
  </tbody>
</table>
<h2 id="micro-posts">Micro Posts</h2>
<table>
  <thead>
      <tr>
          <th style="text-align: left">Title</th>
          <th style="text-align: left">Date</th>
          <th style="text-align: left">Description</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left"><a href="/micro/apple-intelligence-initial-thoughts/">Apple Intelligence Initial Thoughts</a></td>
          <td style="text-align: left">2024-06-11</td>
          <td style="text-align: left">Mostly from other people, I&rsquo;m still digesting</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/micro/gpt-4o-quick-benchmark/">A Quick Look at GPT-4o</a></td>
          <td style="text-align: left">2024-05-19</td>
          <td style="text-align: left">I did a quick benchmark of the new GPT-4o model versus GPT-4-turbo. Roughly twice the speed, with improved Vision accuracy.</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/micro/openai-vision-function-calling/">OpenAI Vision + Function Calling Examples</a></td>
          <td style="text-align: left">2024-04-28</td>
          <td style="text-align: left">OpenAI&rsquo;s April 9th model update merges Turbo and Vision, and these examples show the two working together.</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/micro/50-llms/">Simon Willison&rsquo;s LLM Tool: Now I Have 50 LLMs</a></td>
          <td style="text-align: left">2024-03-25</td>
          <td style="text-align: left">I updated my installation of Willison&rsquo;s LLM tool to add plugins, and now I have 50 LLMs at my fingertips, including 15 local models, which get installed on demand.</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/micro/did-code-interpreter-escape/">Did GPT-4 Code Interpreter Escape From Its Sandbox?</a></td>
          <td style="text-align: left">2023-07-14</td>
          <td style="text-align: left">I gave Code Interpreter a workout this morning, and it appeared to exit the building.</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/micro/really-openai-update/">ChatGPT+ Privacy Update</a></td>
          <td style="text-align: left">2023-07-10</td>
          <td style="text-align: left">There appears to be a workaround for now to OpenAI&rsquo;s &ldquo;If you want privacy, no chat history for you!&rdquo; policy.</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/micro/llama-2/">Llama 2</a></td>
          <td style="text-align: left">2023-07-19</td>
          <td style="text-align: left">HN comments on Llama 2.</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/micro/apple-did-it/">Apple Really Did Ignore the AI Emergence at WWDC</a></td>
          <td style="text-align: left">2023-06-18</td>
          <td style="text-align: left">As I watched the WWDC Keynote and Platforms State of the Union from this year&rsquo;s WWDC, I was amazed that Apple appeared to be ignoring the current massive emergence in generative AI. Later, I confirmed it.</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/micro/chatgpt-codereview/">ChatGPT Code Review</a></td>
          <td style="text-align: left">2023-06-03</td>
          <td style="text-align: left">Another great use: coming up to speed on an unfamiliar codebase.</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/micro/chatgpt-record/">Working With My ChatGPT Intern</a></td>
          <td style="text-align: left">2023-05-29</td>
          <td style="text-align: left">Yesterday was a new record.</td>
      </tr>
  </tbody>
</table>
<h2 id="elsewhere">Elsewhere</h2>
<table>
  <thead>
      <tr>
          <th style="text-align: left">Title</th>
          <th style="text-align: left">Date</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left"><a href="/elsewhere/two-from-willison/">Two Worthwhile Reads From Simon Willison</a></td>
          <td style="text-align: left">March 25, 2024</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/elsewhere/willison-gemini-pro-video/">Willison: The killer app of Gemini Pro 1.5 is video</a></td>
          <td style="text-align: left">February 22, 2024</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/elsewhere/how-to-use-ai-to-do-stuff/">Ethan Mollick: &ldquo;How to Use AI to Do Stuff&rdquo;</a></td>
          <td style="text-align: left">July 26, 2023</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/elsewhere/ai-cant-detect-ai/">OpenAI Concedes: AI Can&rsquo;t Detect AI</a></td>
          <td style="text-align: left">July 26, 2023</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/elsewhere/latent-space-george-hotz/">&ldquo;Commoditizing the Petaflop&rdquo; with George Hotz of the tiny corp</a></td>
          <td style="text-align: left">July 9, 2023</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/elsewhere/gpt4-code-interpreter-big-deal-pt1/">Just-released GPT-4 Code Interpreter is a Big Deal, Part 1</a></td>
          <td style="text-align: left">July 11, 2023</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/elsewhere/gpt4-code-interpreter-big-deal-pt2/">Just-released GPT-4 Code Interpreter is a Big Deal, Part 2</a></td>
          <td style="text-align: left">July 11, 2023</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/elsewhere/lone-banana-problem/">The Lone Banana Problem</a></td>
          <td style="text-align: left">July 5, 2023</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/elsewhere/midjourney-v52/">Midjourney V5.2 (Ars Technica)</a></td>
          <td style="text-align: left">June 24, 2023</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/elsewhere/meta-i-jepa/">I-JEPA: The first AI model based on Yann LeCun&rsquo;s vision for more human-like AI</a></td>
          <td style="text-align: left">June 19, 2023</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/elsewhere/comparing-art-ais/">Comparing Adobe Firefly, Dalle-2, OpenJourney, Stable Diffusion, and Midjourney</a></td>
          <td style="text-align: left">June 21, 2023</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/elsewhere/deep-rl-faster-sorting/">Faster sorting algorithms discovered using deep reinforcement learning</a></td>
          <td style="text-align: left">June 7, 2023</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/elsewhere/willison-infuriating-closed-models/">It&rsquo;s infuriatingly hard to understand how closed models train on their input</a></td>
          <td style="text-align: left">June 4, 2023</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/elsewhere/openai-plans/">OpenAI&rsquo;s plans according to Sam Altman</a></td>
          <td style="text-align: left">June 1, 2023</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/elsewhere/simon-deep-dives-fake-cases/">Lawyer cites fake cases invented by ChatGPT, judge is not amused</a></td>
          <td style="text-align: left">May 30, 2023</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/elsewhere/simon-cli-for-llm/">CLI tools for working with ChatGPT and other LLMs</a></td>
          <td style="text-align: left">May 29, 2023</td>
      </tr>
  </tbody>
</table>
<h2 id="quotes">Quotes</h2>
<table>
  <thead>
      <tr>
          <th style="text-align: left">Title</th>
          <th style="text-align: left">Date</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left"><a href="/quote/how-to-ask-chatgpt-a-tech-question/">How to Ask ChatGPT a Technical Question</a></td>
          <td style="text-align: left">March 17, 2024</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/quote/mpt7b-gatsby/">MosaicML&rsquo;s Open Source MPT-7B Model Writes an Epilogue to The Great Gatsby</a></td>
          <td style="text-align: left">March 11, 2024</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/quote/gruber-ai-anxiety-at-apple/">Gruber Responds to Gurman&rsquo;s Report of AI Anxiety at Apple</a></td>
          <td style="text-align: left">March 5, 2024</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="/quote/vivek-haldar-copilot/">Vivek Haldar on GitHub Copilot</a></td>
          <td style="text-align: left">June 10, 2023</td>
      </tr>
  </tbody>
</table>
]]></content:encoded>
    </item>
    
    <item>
      <title>Observable Framework With Hugo</title>
      <link>https://craftycto.com/micro/observable-with-hugo/</link>
      <pubDate>Wed, 12 Jun 2024 08:23:09 -0500</pubDate>
      
      <guid>https://craftycto.com/micro/observable-with-hugo/</guid>
      <description>Integrating an Observable Framework dashboard into a Hugo-based website</description>
<content:encoded><![CDATA[<p>In my recent post on Observable Framework, I linked to <a href="https://observablehq.observablehq.cloud/framework-example-mortgage-rates/">Observable&rsquo;s Mortgage Rates demo dashboard</a>, which shows 15- and 30-year fixed mortgage rates from 1971 through the present, using data that is updated weekly and thoughtfully <a href="https://www.freddiemac.com/pmms/pmms_archives">shared by Freddie Mac</a>. The demo dashboard&rsquo;s Data Loader, if it had been set to run periodically (weekly, for example), would always ensure that the dashboard shows Freddie Mac&rsquo;s latest weekly data, but you&rsquo;ll note that it doesn&rsquo;t&mdash;the demo shows data as of May 1&mdash;simply because Observable hasn&rsquo;t &ldquo;productionized&rdquo; their demos with scheduled CI/CD.</p>
<p>Mike Bostock suggested I could simply clone the Framework project, which includes all the examples, and deploy my own version of Mortgage Rates. Since the Crafty site uses static site generator <a href="https://gohugo.io/">Hugo</a>, &ldquo;deploying my own&rdquo; involved figuring out how to take the Framework build process&rsquo;s static output and deploy as part of Hugo&rsquo;s static output.</p>
<p>That turned out to be rather easy. Hugo has a <code>/static</code> directory, and everything in it gets deployed directly to Hugo&rsquo;s final output directory, <code>/public</code>, during the Hugo build. Framework&rsquo;s build similarly creates a <code>/dist</code> directory with its final output. So the steps are:</p>
<ul>
<li>I first created an <code>/observable/mortgage-rates</code> directory under <code>/static</code></li>
<li>I then ran the Framework build for the Mortgage Rates example</li>
<li>Next, I copied the contents of the resulting <code>/dist</code> directory over to <code>/static/observable/mortgage-rates</code></li>
<li>Then I ran my Hugo build, which moved those files to <code>/public/observable/mortgage-rates</code></li>
<li>Finally, I pushed the updates to GitHub, which in turn triggered Cloudflare Pages CI/CD to run and deploy the site.</li>
</ul>
<p><a href="/observable/mortgage-rates/">Voila</a>&mdash;note the craftycto.com domain and the June 5, 2024 as-of date:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/micro/observable-with-hugo/mortgage_rates_crafty.png"><picture>
    <source type="image/webp" srcset="/micro/observable-with-hugo/mortgage_rates_crafty_hu18377171618807439314.webp 320w, /micro/observable-with-hugo/mortgage_rates_crafty_hu12511169815806439220.webp 640w, /micro/observable-with-hugo/mortgage_rates_crafty_hu5601748525116038778.webp 768w, /micro/observable-with-hugo/mortgage_rates_crafty_hu17740723431288479335.webp 1024w, /micro/observable-with-hugo/mortgage_rates_crafty_hu2215805748658794881.webp 1280w" sizes="100vw" />
    <source type="image/jpeg" srcset="/micro/observable-with-hugo/mortgage_rates_crafty_hu5781187045427611817.jpg 320w, /micro/observable-with-hugo/mortgage_rates_crafty_hu7262585847475636825.jpg 640w, /micro/observable-with-hugo/mortgage_rates_crafty_hu9385424801056734599.jpg 768w, /micro/observable-with-hugo/mortgage_rates_crafty_hu9199661451975154866.jpg 1024w, /micro/observable-with-hugo/mortgage_rates_crafty_hu13016104015147399459.jpg 1280w" sizes="100vw" />
    
      <img src="/micro/observable-with-hugo/mortgage_rates_crafty_hu7262585847475636825.jpg" width="1356" height="1284" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>The manual steps here, of course, need to be automated &hellip; so I have productionizing of my own to do.</p>
<p>The bigger takeaway here is that, thanks to Framework&rsquo;s static generation approach, the resulting data dashboards, reports, etc. are completely self-contained, data and all. You can pick up whatever Framework build creates in the <code>/dist</code> directory and drop it anywhere&mdash;no data connections to set up, no dynamic web frameworks to configure&mdash;you&rsquo;re done. That&rsquo;s pretty great.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Apple Intelligence Initial Thoughts</title>
      <link>https://craftycto.com/micro/apple-intelligence-initial-thoughts/</link>
      <pubDate>Tue, 11 Jun 2024 13:48:52 -0500</pubDate>
      
      <guid>https://craftycto.com/micro/apple-intelligence-initial-thoughts/</guid>
      <description>Mostly from other people, I&amp;#39;m still digesting</description>
      <content:encoded><![CDATA[<p>I&rsquo;ve <a href="/blog/observable-framework-blazing-fast/">been busy</a> and haven&rsquo;t watched any of the WWDC 2024 sessions yet, but I&rsquo;ve definitely been observing news and people&rsquo;s impressions. Having one foot deeply in the Apple world, and the other just as deeply sunk into AI, provides an interesting vantage point.</p>
<p>Over on Hacker News, the top of the <a href="https://news.ycombinator.com/item?id=40636844">Apple Intelligence for iPhone, iPad, and Mac</a> comment thread was interesting:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/micro/apple-intelligence-initial-thoughts/hn_apple_intelligence_lg.png"><picture>
    <source type="image/webp" srcset="/micro/apple-intelligence-initial-thoughts/hn_apple_intelligence_lg_hu370402848362886306.webp 320w, /micro/apple-intelligence-initial-thoughts/hn_apple_intelligence_lg_hu16785938449910249997.webp 640w, /micro/apple-intelligence-initial-thoughts/hn_apple_intelligence_lg_hu7023776860910100046.webp 768w, /micro/apple-intelligence-initial-thoughts/hn_apple_intelligence_lg_hu12733062020948537764.webp 1024w, /micro/apple-intelligence-initial-thoughts/hn_apple_intelligence_lg_hu16537680132681043609.webp 1280w, /micro/apple-intelligence-initial-thoughts/hn_apple_intelligence_lg_hu529236120845939246.webp 1536w, /micro/apple-intelligence-initial-thoughts/hn_apple_intelligence_lg_hu5659193344850263256.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/micro/apple-intelligence-initial-thoughts/hn_apple_intelligence_lg_hu18366292930371504875.jpg 320w, /micro/apple-intelligence-initial-thoughts/hn_apple_intelligence_lg_hu9880409868562717423.jpg 640w, /micro/apple-intelligence-initial-thoughts/hn_apple_intelligence_lg_hu11139809356707189950.jpg 768w, /micro/apple-intelligence-initial-thoughts/hn_apple_intelligence_lg_hu17046977286285307877.jpg 1024w, /micro/apple-intelligence-initial-thoughts/hn_apple_intelligence_lg_hu17768085506985042600.jpg 1280w, /micro/apple-intelligence-initial-thoughts/hn_apple_intelligence_lg_hu1001574644093128999.jpg 1536w, /micro/apple-intelligence-initial-thoughts/hn_apple_intelligence_lg_hu6927301708343616517.jpg 1920w" sizes="100vw" />
    
      <img src="/micro/apple-intelligence-initial-thoughts/hn_apple_intelligence_lg_hu9880409868562717423.jpg" width="2842" height="1876" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>To me, the key insights are:</p>
<blockquote>
<p>It&rsquo;s the benefit of how Apple does product ownership. In contrast to Google and Microsoft &hellip; I hadn&rsquo;t considered it, but AI convergence is going to lay bare organizational deficiencies in a way previous revolutions didn&rsquo;t &hellip; Nobody wants a GenAI feature that works in Gmail, a different one that works in Messages, etc. &ndash; they want a platform capability that works anywhere they use text &hellip; I&rsquo;m not sure either Google or Microsoft are organizationally-capable of delivering that, at this point.</p>
</blockquote>
<p>And:</p>
<blockquote>
<p>&lsquo;AI convergence is going to lay bare organizational deficiencies in a way previous revolutions didn&rsquo;t&rsquo;</p>
<p>Your quote really hit me. I trust Apple to respect my privacy when doing AI, but the thought of Microsoft or Google slurping up all my data to do remote-server AI is abhorrent. I can&rsquo;t see how Microsoft or Google can undo the last 10 years to fix this.</p>
</blockquote>
<p>It&rsquo;s early days yet, but if Apple manages the jiu-jitsu move of leveraging their platform strengths to overcome a huge lead by other players, that will be quite an achievement at the product and company level.</p>
<p>Elsewhere, Simon Willison has a <a href="https://simonwillison.net/2024/Jun/10/apple-intelligence/">good early take generally</a> and specifically on Apple&rsquo;s <a href="https://simonwillison.net/2024/Jun/11/private-cloud-compute/">Private Compute Cloud</a>.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Observable Framework Delivers Blazing-fast Data Dashboards</title>
      <link>https://craftycto.com/blog/observable-framework-blazing-fast/</link>
      <pubDate>Mon, 10 Jun 2024 00:10:04 -0500</pubDate>
      
      <guid>https://craftycto.com/blog/observable-framework-blazing-fast/</guid>
      <description>An exciting new offering, launched in February, from the team that brought us Observable Notebooks</description>
      <content:encoded><![CDATA[<h2 id="stumbled-upon-">Stumbled Upon &hellip;</h2>
<p>I&rsquo;ve had requests to share my <a href="../chatgpt-4v/">Heirloom Recipe Ingestor project</a>, but it&rsquo;s still in the hacked-together stage, not ready. I&rsquo;d been wanting to clean it up, and when I updated it to use the non-preview GPT-4-turbo model that <a href="https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4">OpenAI released on April 9th</a>, I made a start on that cleanup.</p>
<p>To make the project public, though, I wanted to do more significant remodeling. One example: the script currently runs in CLI mode with arguments, but it&rsquo;s not implemented in the latest-best-Python way, and I&rsquo;d definitely want to correct that. (<a href="https://typer.tiangolo.com/">Typer from Tiangolo</a> looks like a promising solution, by the way).</p>
<p>As I was mulling this, I came across a <a href="https://cookbook.openai.com/examples/multimodal/using_gpt4_vision_with_function_calling">pair of new Vision examples</a> in the OpenAI Cookbook. Besides finding the examples <a href="/micro/openai-vision-function-calling/">very useful</a>, I noted that OpenAI shared these as a <a href="https://jupyter.org/">Jupyter Notebook</a>, rather than as a traditional Python project. That got me wondering if maybe I should take the same approach for Recipe Ingestor.</p>
<p>Before I commit heavily to a particular technology or platform, I like to survey the state of the art in that niche. So before settling on Jupyter Notebook / Jupyter Lab, I did a quick survey of data notebooks&mdash;was there an up-and-coming alternative to Jupyter I should be looking at?</p>
<p>My <a href="https://kagi.com/">Kagi</a> search returned hits for <a href="https://observablehq.com/documentation/notebooks/">Observable Notebooks</a>, a tool I had peripheral awareness of through Simon Willison, who <a href="https://simonwillison.net/tags/observable/">regularly publishes work in Observable Notebook form</a>, when that format seems a better fit than, say, a simple Python script.</p>
<p>So I dropped in over at <a href="https://observablehq.com/">ObservableHQ</a> to get up to speed on Observable Notebooks&mdash;only to find that Observable had changed course. Though Observable Notebooks are still alive and well (and will, I suspect, <a href="https://talk.observablehq.com/t/announcing-observable-2-0/8744/4">get even better</a>), as of February the company&rsquo;s primary focus has shifted to what they are calling <a href="https://observablehq.com/framework/">Observable Framework</a>. That led me down one final rabbit hole, but a very useful one, as I soon realized that Framework was a perfect fit for one of my fractional CTO client&rsquo;s &ldquo;data science reporting and dashboard&rdquo; needs.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/observable-framework-blazing-fast/observable_6.webp"><picture>
    <source type="image/webp" srcset="/blog/observable-framework-blazing-fast/observable_6_hu2272811571875972856.webp 320w, /blog/observable-framework-blazing-fast/observable_6_hu3346503524018844025.webp 640w, /blog/observable-framework-blazing-fast/observable_6_hu5147707796981246825.webp 768w, /blog/observable-framework-blazing-fast/observable_6_hu18416482138486183007.webp 1024w, /blog/observable-framework-blazing-fast/observable_6_hu759536582967866817.webp 1280w, /blog/observable-framework-blazing-fast/observable_6_hu534395273186379929.webp 1536w, /blog/observable-framework-blazing-fast/observable_6_hu5324401780573746493.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/observable-framework-blazing-fast/observable_6_hu8979306845613402362.jpg 320w, /blog/observable-framework-blazing-fast/observable_6_hu15238303959383422788.jpg 640w, /blog/observable-framework-blazing-fast/observable_6_hu5677866697608809481.jpg 768w, /blog/observable-framework-blazing-fast/observable_6_hu14503017086490684898.jpg 1024w, /blog/observable-framework-blazing-fast/observable_6_hu14293562162788177942.jpg 1280w, /blog/observable-framework-blazing-fast/observable_6_hu7512496447447134694.jpg 1536w, /blog/observable-framework-blazing-fast/observable_6_hu297218054907857287.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/observable-framework-blazing-fast/observable_6_hu15238303959383422788.jpg" width="3072" height="1536" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="the-need">The Need</h2>
<p>The client in question is a non-profit doing great work in sustainable agriculture, helping farms take informed action based on soil data. The majority of their team are data scientists. Lots of data has been collected, and now the data team is charged with producing useful, coherent, actionable outputs for our users, who are mostly farmers.</p>
<p>The data team works mostly in the <a href="https://www.r-project.org/">R language</a>, one of the primary languages of data science. R is widely used in agricultural data and particularly soil analysis. R is a data manipulation, aggregation, and analysis powerhouse, with plotting and charting libraries capable of a wide range of data visualizations. That said, R&rsquo;s natural mode of operation is one-on-one with the data scientist; output is more about &ldquo;produce those charts we need for that scientific paper&rdquo; than &ldquo;create awesome reports and dashboards for our users.&rdquo; R can &ldquo;do web stuff&rdquo; in a number of ways, for example using the <a href="https://shiny.posit.co/r/getstarted/shiny-basics/lesson1/index.html">Shiny package</a>, but it feels a bit unnatural and bolted-on.</p>
<p>The non-profit&rsquo;s primary user-facing (farmer-facing!) platform, on the other hand, sits solidly in the center of mainstream, modern web / mobile. Logic is JavaScript / TypeScript, running inside open source frameworks like Node.js, React and React Native, augmented by dozens more open source libraries filling niches like Object Relational Mapping (Drizzle). Data lives in the powerful, scalable Postgres open source database.</p>
<p>All of this deploys gracefully onto modern AWS infrastructure, and the result is capable of supporting tens of thousands of web and mobile users. Costs are low&mdash;tooling costs almost zero&mdash;and the result has been highly effective for developers and enabled us to deliver an excellent web and mobile experience for our users.</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/observable-framework-blazing-fast/observable_2.webp"><picture>
    <source type="image/webp" srcset="/blog/observable-framework-blazing-fast/observable_2_hu5712118432105826867.webp 320w, /blog/observable-framework-blazing-fast/observable_2_hu8978838472375705402.webp 640w, /blog/observable-framework-blazing-fast/observable_2_hu6837342122483044704.webp 768w, /blog/observable-framework-blazing-fast/observable_2_hu12289470346247725300.webp 1024w, /blog/observable-framework-blazing-fast/observable_2_hu9169090029854672718.webp 1280w, /blog/observable-framework-blazing-fast/observable_2_hu18008438885423882516.webp 1536w, /blog/observable-framework-blazing-fast/observable_2_hu10086871971168733558.webp 1920w" sizes="75vw" />
    <source type="image/jpeg" srcset="/blog/observable-framework-blazing-fast/observable_2_hu2843895270055481246.jpg 320w, /blog/observable-framework-blazing-fast/observable_2_hu4340797833743798897.jpg 640w, /blog/observable-framework-blazing-fast/observable_2_hu8886074503676190913.jpg 768w, /blog/observable-framework-blazing-fast/observable_2_hu10190501521531635204.jpg 1024w, /blog/observable-framework-blazing-fast/observable_2_hu9591999486021470511.jpg 1280w, /blog/observable-framework-blazing-fast/observable_2_hu10973972404548152980.jpg 1536w, /blog/observable-framework-blazing-fast/observable_2_hu2396060452360094915.jpg 1920w" sizes="75vw" />
    
      <img src="/blog/observable-framework-blazing-fast/observable_2_hu4340797833743798897.jpg" width="2184" height="1224" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="we-have-questions">We Have Questions</h2>
<p>This left me asking a number of questions:</p>
<ul>
<li>What is R really best at? Is it the data manipulation and analysis side or also data visualization?</li>
<li>Does R have enough advantages for our data science use cases to stick with it?</li>
<li>If R can&rsquo;t &ldquo;do it all,&rdquo; should we move away from it?</li>
<li>Where is the line between &ldquo;use R for this&rdquo; and &ldquo;this would probably better be handled with something else&rdquo;?</li>
<li>If we don&rsquo;t use R for visualizations, what would we use instead?</li>
<li>Aren&rsquo;t we reinventing the wheel here&mdash;hasn&rsquo;t somebody already figured this out? Isn&rsquo;t there some kind of framework that at least provides a clean separation of concerns between data and presentation?</li>
</ul>
<h2 id="enter-observable-framework">Enter Observable Framework</h2>
<p>With these questions swirling around in my mind, it was a truly fortuitous moment when I landed on the Observable website and began to get my head around Observable Framework. Because once I understood it, bang, my entire set of questions was answered, and the solution was beautiful. Thank you <a href="https://en.wikipedia.org/wiki/Mike_Bostock">Mike Bostock</a> and <a href="https://observablehq.com/about">team</a>.</p>
<p>Observable describes Framework as &ldquo;solving data’s last mile problem.&rdquo; This is a reference to the telecom world, where by far the most expensive part of the network is the last mile connecting the actual consumers. For data, Observable is saying, that last mile is getting data onto the screen of the user in need, in a way that is highly useful and actionable&mdash;and that this is the hardest problem to solve.</p>
<p>Eureka! This is it! The last mile problem is exactly what my non-profit was struggling with. And wow, does Framework solve it elegantly.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/observable-framework-blazing-fast/observable_10.webp"><picture>
    <source type="image/webp" srcset="/blog/observable-framework-blazing-fast/observable_10_hu8252995222133557367.webp 320w, /blog/observable-framework-blazing-fast/observable_10_hu1638614185042476549.webp 640w, /blog/observable-framework-blazing-fast/observable_10_hu13337621050192042054.webp 768w, /blog/observable-framework-blazing-fast/observable_10_hu12234807829697628630.webp 1024w, /blog/observable-framework-blazing-fast/observable_10_hu15043760116794413192.webp 1280w, /blog/observable-framework-blazing-fast/observable_10_hu13763663237292727737.webp 1536w, /blog/observable-framework-blazing-fast/observable_10_hu4167565731360087220.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/observable-framework-blazing-fast/observable_10_hu5572038275668609388.jpg 320w, /blog/observable-framework-blazing-fast/observable_10_hu8557851953157169009.jpg 640w, /blog/observable-framework-blazing-fast/observable_10_hu6637248994291140112.jpg 768w, /blog/observable-framework-blazing-fast/observable_10_hu5094849479435445546.jpg 1024w, /blog/observable-framework-blazing-fast/observable_10_hu14100620336234387722.jpg 1280w, /blog/observable-framework-blazing-fast/observable_10_hu7407467211807995677.jpg 1536w, /blog/observable-framework-blazing-fast/observable_10_hu14784952444127022452.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/observable-framework-blazing-fast/observable_10_hu8557851953157169009.jpg" width="3072" height="1536" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="we-have-answers">We Have Answers</h2>
<p>Let&rsquo;s look at the answers Framework gives us:</p>
<ul>
<li><strong>Q:</strong> Does R have enough advantages for our data science use cases to stick with it?
<ul>
<li><strong>A:</strong> In soil science, it sure seems to. But, not to worry, you&rsquo;re free to mix and match&mdash;use whatever data science tool fits best in each situation: R, Python, Julia, TypeScript, whatever, thanks to Framework&rsquo;s <a href="https://observablehq.com/blog/data-loaders-for-the-win">Data Loaders</a>.</li>
</ul>
</li>
<li><strong>Q:</strong> If R can&rsquo;t &ldquo;do it all,&rdquo; should we move away from it?
<ul>
<li><strong>A:</strong> R doesn&rsquo;t need to do it all&mdash;just use it wherever it&rsquo;s the best tool, which, in soil science, is frequently.</li>
</ul>
</li>
<li><strong>Q:</strong> Where is the line between &ldquo;use R for this&rdquo; and &ldquo;this would probably better be handled with something else, e.g. mainstream web tech&rdquo;?
<ul>
<li><strong>A:</strong> Although R can create visualizations, its sweet spot is data. Framework&rsquo;s separation of concerns allows us to use R in its data sweet spot, while applying modern web technology for the presentation layer including visualizations.</li>
</ul>
</li>
<li><strong>Q:</strong> If we don&rsquo;t use R for visualizations, what would we use instead?
<ul>
<li><strong>A:</strong> In Framework, the general answer for visualizations&mdash;and the presentation layer generally&mdash;is &ldquo;use modern web technology.&rdquo; This consists of basic web building blocks&mdash;HTML, JavaScript, CSS, Markdown&mdash;augmented by any of the dozens of powerful visualization libraries that exist in the modern web, most free and open source. The cherry on top is that Observable&rsquo;s founders and team happen to have built two of the best viz libraries in existence&mdash;<a href="#presentation-layer-7-visualizations">details below</a> when we dig into the presentation layer.</li>
</ul>
</li>
<li><strong>Q:</strong> Aren&rsquo;t we reinventing the wheel here&mdash;hasn&rsquo;t somebody already figured this out? Isn&rsquo;t there some kind of framework that at least provides a clean separation of concerns between data and presentation?
<ul>
<li><strong>A:</strong> Yes, someone has figured it out! In Framework, Observable have an exceptionally elegant framework, with clean separation of concerns between data and presentation among its many benefits.</li>
</ul>
</li>
</ul>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/observable-framework-blazing-fast/observable_7.webp"><picture>
    <source type="image/webp" srcset="/blog/observable-framework-blazing-fast/observable_7_hu5625332575586751598.webp 320w, /blog/observable-framework-blazing-fast/observable_7_hu13181239920939877173.webp 640w, /blog/observable-framework-blazing-fast/observable_7_hu14580164694347784420.webp 768w, /blog/observable-framework-blazing-fast/observable_7_hu6540195747077719365.webp 1024w, /blog/observable-framework-blazing-fast/observable_7_hu3068763887524742960.webp 1280w, /blog/observable-framework-blazing-fast/observable_7_hu14953245409641690675.webp 1536w, /blog/observable-framework-blazing-fast/observable_7_hu7870717843806286856.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/observable-framework-blazing-fast/observable_7_hu8404057311861745605.jpg 320w, /blog/observable-framework-blazing-fast/observable_7_hu13510274347841023948.jpg 640w, /blog/observable-framework-blazing-fast/observable_7_hu10788144074706079920.jpg 768w, /blog/observable-framework-blazing-fast/observable_7_hu14636312146782496213.jpg 1024w, /blog/observable-framework-blazing-fast/observable_7_hu2753196381409503621.jpg 1280w, /blog/observable-framework-blazing-fast/observable_7_hu9998828129976127401.jpg 1536w, /blog/observable-framework-blazing-fast/observable_7_hu3482926312305139503.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/observable-framework-blazing-fast/observable_7_hu13510274347841023948.jpg" width="3072" height="1536" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="data-layer-1-data-loaders">Data Layer 1: Data Loaders</h2>
<p>In Framework, <a href="https://observablehq.com/blog/data-loaders-for-the-win">Data Loaders</a> are the data layer, the first step towards getting data onto the screen, be it for a dashboard, report, or other data visualization. The data files created by Data Loaders power the reports and visualizations of the presentation layer that we explore <a href="#presentation-layer-1-markdown">later</a>.</p>
<p>Data Loaders can be written in the language of your choice: data science languages like R, Python, and Julia, but also JavaScript, TypeScript, Java, Rust, Go, PHP, and even shell scripts. This is huge&mdash;let data scientists and developers use their tool of choice, whatever best fits the need. Mix and match at will.</p>
<div class="prose bg-gray-800 px-4 py-[1px] border-l-4 border-blue-500 prose-h1:text-sky-200 prose-h2:text-sky-300 prose-h3:text-sky-400 prose-h4:text-sky-500 prose-p:text-sky-400 prose-ul:text-sky-400 prose-strong:text-sky-400 prose-ul-li:text-sky-400 prose-a:text-sky-400">
  <p>Does loading data with a shell script sound crazy? You might be surprised. Here&rsquo;s a one-line Data Loader shell script that downloads recent earthquakes from the USGS in GeoJSON format:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>curl https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson
</span></span></code></pre></div>
</div>

<p>Thanks to Framework&rsquo;s clean separation of concerns and elegant design, Data Loaders are simple programs&mdash;just connect to your data, wherever it is&mdash;database, data warehouse, cloud APIs, or all of the above. Join, transform, reshape, and munge it to your heart&rsquo;s content. When done, simply dump the output, in almost any data format, to standard output. Framework automatically runs the Data Loader during the build process, intercepts its output and creates the output file, ready for use by the presentation layer.</p>
<p>Since Data Loaders run at build time&mdash;rather than at runtime in front of an impatient data consumer&mdash;there&rsquo;s less need to optimize code and queries. Shell script <code>curl</code> command a little slow? Who cares! Database indexes not optimized for this query? No worries! Cloud API introduces a lot of latency? No harm no foul! It&rsquo;s build time, not runtime&mdash;time is measured differently here. Keep it simple.</p>
<p>Data Loaders really are simple&mdash;in developing our own proof of concept, I created five Data Loaders that connected to an RDS Postgres cluster and a pair of cloud APIs, producing both CSV and Apache Parquet files, in just a few hours.</p>
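To make the pattern concrete, here is a minimal sketch of what a Python Data Loader might look like. All names and data are hypothetical: Framework's convention is that a loader named <code>samples.csv.py</code> produces <code>samples.csv</code> by writing to standard output, and a real loader would query a database or API instead of using inline rows.

```python
# samples.csv.py -- a hypothetical Framework Data Loader (names illustrative).
# Framework runs this at build time and captures stdout as samples.csv.
import csv
import io


def load_rows():
    # In a real loader these rows would come from Postgres or a cloud API;
    # inline data keeps the sketch self-contained.
    return [
        {"state": "IA", "ph": 6.8, "organic_matter_pct": 4.2},
        {"state": "VT", "ph": 5.9, "organic_matter_pct": 6.1},
    ]


def to_csv(rows):
    # Serialize the rows as CSV text, header first.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["state", "ph", "organic_matter_pct"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Whatever this script prints becomes the data file the presentation layer reads.
print(to_csv(load_rows()), end="")
```

That's the whole contract: munge the data however you like, print the result, and Framework handles the rest.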
<h2 id="data-layer-2-static-generation-dynamic-dashboards">Data Layer 2: Static Generation, Dynamic Dashboards</h2>
<p>As you read this post, you&rsquo;re benefitting from &ldquo;static site generation.&rdquo; The Crafty website is built using the static site generator <a href="https://gohugo.io/">Hugo</a>. There&rsquo;s no server runtime behind this site&mdash;everything&rsquo;s pre-generated at build time, including HTML for every page, optimized images for every window size, optimized CSS and JavaScript where needed. No server round-trips for anything, just the initial page load. Even dynamic functions like site search run locally in the browser via JavaScript. Navigate around a little, it&rsquo;s lightning-fast.</p>
<p>The &ldquo;lightning-fast&rdquo; part seems to have appealed to the folks at Observable, who recognized that static site generation could work its magic for data dashboards as well&mdash;that &ldquo;statically generated, lightning-fast&rdquo; and &ldquo;dynamic, interactive user experience&rdquo; aren&rsquo;t mutually exclusive.</p>
<p>There&rsquo;s a catch, though: lightning-fast user experiences can&rsquo;t happen if the data&rsquo;s slow. And slow is typical with traditional data warehouse + dataviz / BI tools. Executing complex queries in real time against remote data can take tens of seconds and sometimes minutes&mdash;while the user waits impatiently.</p>
<video preload="auto" width="60%"  autoplay loop playsinline class="html-video">
    <source src="/blog/observable-framework-blazing-fast/static_gen_dynamic_dash.mp4" type="video/mp4">
  <span></span>
</video>
<p>Well, guess what: those same Data Loaders, whose elegant separation of concerns simplifies the gathering of data for dynamic dashboards and reports, also solve the slow data problem. Data Loaders do the time-consuming work of querying and collecting data at <em><strong>build time</strong></em>. At <em><strong>runtime</strong></em>, the data is already <em><strong>pre-generated, cached, and ready to access in milliseconds</strong></em>. So thanks to static generation and Data Loaders, Framework is able to deliver a near-instantaneous interactive experience <em>for data</em>.</p>
<h2 id="data-layer-3-how-fresh-is-fresh-enough">Data Layer 3: How Fresh is Fresh Enough?</h2>
<p>You may still be having doubts about the idea of a data dashboard that isn&rsquo;t driven by live, real-time data. It&rsquo;s natural to conflate &ldquo;dynamic, interactive data consumer experience,&rdquo; which we&rsquo;re all in favor of, with &ldquo;querying live data from a remote database, data warehouse, or cloud API in real time,&rdquo; which isn&rsquo;t necessary most of the time and often ruins that interactive experience by making it slow.</p>
<p>I love these two tips in Observable&rsquo;s excellent <a href="https://observablehq.com/framework/loaders">Data Loader documentation</a>, because they hit the nail on the head around &ldquo;how fresh does my data need to be?&rdquo; Tip 1:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/observable-framework-blazing-fast/data_loader_tip1.png"><picture>
    <source type="image/webp" srcset="/blog/observable-framework-blazing-fast/data_loader_tip1_hu18352967623659308082.webp 320w, /blog/observable-framework-blazing-fast/data_loader_tip1_hu16568171945852823039.webp 640w, /blog/observable-framework-blazing-fast/data_loader_tip1_hu9100086702310144579.webp 768w, /blog/observable-framework-blazing-fast/data_loader_tip1_hu12878303759898515636.webp 1024w, /blog/observable-framework-blazing-fast/data_loader_tip1_hu16689458079262367834.webp 1280w, /blog/observable-framework-blazing-fast/data_loader_tip1_hu17319569354458632437.webp 1536w, /blog/observable-framework-blazing-fast/data_loader_tip1_hu17056348568496338619.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/observable-framework-blazing-fast/data_loader_tip1_hu11233335824668492110.jpg 320w, /blog/observable-framework-blazing-fast/data_loader_tip1_hu3684661368190825138.jpg 640w, /blog/observable-framework-blazing-fast/data_loader_tip1_hu3338029318961709406.jpg 768w, /blog/observable-framework-blazing-fast/data_loader_tip1_hu9246714252643625405.jpg 1024w, /blog/observable-framework-blazing-fast/data_loader_tip1_hu15344728543767996746.jpg 1280w, /blog/observable-framework-blazing-fast/data_loader_tip1_hu5260101529848232231.jpg 1536w, /blog/observable-framework-blazing-fast/data_loader_tip1_hu11550438587879360627.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/observable-framework-blazing-fast/data_loader_tip1_hu3684661368190825138.jpg" width="3608" height="715" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>I&rsquo;d paraphrase the first part of this tip as &ldquo;Dude, if you really want up-to-the-second data, we gotcha covered!&rdquo; The second part addresses a Framework nuance that I haven&rsquo;t yet mentioned: databases and data warehouses aren&rsquo;t the only source of essential data for dashboards and reports&mdash;often static files from various external sources can be just as essential. For example, in our proof of concept, I needed to plot soil sample locations on maps, for both the entire U.S. and for individual states. To get the country and state outlines needed as the backdrop for my plots, I just included a publicly-available, 842K static JSON file in my Framework project. The file uses a variant of GeoJSON called <a href="https://observablehq.com/framework/lib/topojson">TopoJSON</a> and includes outlines of the U.S. itself, along with every state, and every county. <a href="https://observablehq.com/plot/">Observable Plot</a>, which we explore below, understands TopoJSON and easily produces the geographical plot needed. Tip 2:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/observable-framework-blazing-fast/data_loader_tip2.png"><picture>
    <source type="image/webp" srcset="/blog/observable-framework-blazing-fast/data_loader_tip2_hu2061615714449624921.webp 320w, /blog/observable-framework-blazing-fast/data_loader_tip2_hu16029084968337306750.webp 640w, /blog/observable-framework-blazing-fast/data_loader_tip2_hu2421658320382061372.webp 768w, /blog/observable-framework-blazing-fast/data_loader_tip2_hu14422710845747797558.webp 1024w, /blog/observable-framework-blazing-fast/data_loader_tip2_hu844602614661551006.webp 1280w, /blog/observable-framework-blazing-fast/data_loader_tip2_hu15902493312622796434.webp 1536w, /blog/observable-framework-blazing-fast/data_loader_tip2_hu2250426858762806880.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/observable-framework-blazing-fast/data_loader_tip2_hu4724554636039074225.jpg 320w, /blog/observable-framework-blazing-fast/data_loader_tip2_hu14428589586520470530.jpg 640w, /blog/observable-framework-blazing-fast/data_loader_tip2_hu17944333721110821896.jpg 768w, /blog/observable-framework-blazing-fast/data_loader_tip2_hu16850282918564500505.jpg 1024w, /blog/observable-framework-blazing-fast/data_loader_tip2_hu14051857770154213111.jpg 1280w, /blog/observable-framework-blazing-fast/data_loader_tip2_hu1075129281810473799.jpg 1536w, /blog/observable-framework-blazing-fast/data_loader_tip2_hu16275599927749963576.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/observable-framework-blazing-fast/data_loader_tip2_hu14428589586520470530.jpg" width="3493" height="686" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>The second tip addresses a critical supporting tool for Data Loaders and data freshness in Framework: Continuous Deployment. In parallel with the tremendous progress made in static site generation through projects like Hugo, the tooling for Continuous Integration/Continuous Deployment (CI/CD) has radically improved and become ubiquitous. In fact, the two spaces intersect significantly: as an example, I rely on CI/CD tools provided as part of <a href="https://pages.cloudflare.com/">Cloudflare Pages</a> to automatically redeploy this site, triggered when I push updates to the <code>main</code> branch of my GitHub repo. CI/CD is all about builds, and as the tip tells us, Continuous Deployment combined with Framework Data Loaders means we can &ldquo;rebuild data as often as you like, ensuring that data is always up-to-date.&rdquo; <a href="https://github.com/features/actions">GitHub Actions</a> are probably the most widely-used CI/CD tool, and can automate <a href="https://observablehq.com/framework/deploying">Framework builds</a> so they happen exactly when we need them to, whether time-based (e.g. once an hour), event-based (the new batch of sales data has been processed), or programmatic&mdash;any which way you need.</p>
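As a sketch of the time-based case, a hypothetical GitHub Actions workflow might rebuild (and therefore re-run every Data Loader) both on pushes to <code>main</code> and on an hourly cron schedule. The file name, job name, and build/deploy scripts below are all illustrative; a real project would substitute its own commands.

```yaml
# .github/workflows/rebuild.yml -- hypothetical; adapt names and scripts.
name: Rebuild dashboards
on:
  push:
    branches: [main]        # rebuild on every push to main
  schedule:
    - cron: "0 * * * *"     # also rebuild hourly so Data Loaders fetch fresh data
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npm run build   # Framework runs every Data Loader at build time
      - run: npm run deploy  # push the freshly built, freshly loaded site live
```

Event-based triggers work the same way: swap the `schedule` block for a `repository_dispatch` or similar event fired when, say, the new batch of sales data lands.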
<h2 id="presentation-layer-1-markdown">Presentation Layer 1: Markdown</h2>
<p>The central element of Framework&rsquo;s presentation layer is Markdown, which <a href="https://observablehq.com/framework/markdown">Observable&rsquo;s documentation</a> elegantly describes:</p>
<blockquote>
<p>Markdown is a language for formatting text and content; it’s a lightweight, ergonomic alternative (and complement) to HTML.</p>
</blockquote>
<p>In a Framework project, each Markdown file represents one page, such as an interactive dashboard or report. Framework extends Markdown in <a href="https://observablehq.com/framework/markdown">several powerful ways</a>:</p>
<div class="prose bg-gray-800 px-4 py-[1px] border-l-4 border-blue-500 prose-h1:text-sky-200 prose-h2:text-sky-300 prose-h3:text-sky-400 prose-h4:text-sky-500 prose-p:text-sky-400 prose-ul:text-sky-400 prose-strong:text-sky-400 prose-ul-li:text-sky-400 prose-a:text-sky-400">
  <p>Markdown in Framework extends CommonMark with a handful of features useful for data apps, including reactive JavaScript, HTML, YAML front matter, grids, cards, and notes.</p>
<ul>
<li>
<p>Use JavaScript to render charts, inputs, and other dynamic, interactive, and graphical content on the client. JavaScript in Markdown can be expressed either as fenced code blocks or inline expressions. You can also import JavaScript modules to share code across pages.</p>
</li>
<li>
<p>You can write HTML directly into Markdown. HTML is useful for greater control over layout, say to use CSS grid for a responsive bento box layout in a dashboard, or adding an external stylesheet via a link element.</p>
</li>
<li>
<p>The grid class declares a CSS grid container. The grid class is designed to pair with the card class and the dashboard theme for dashboard layout.</p>
</li>
<li>
<p>The card class is used to group and delineate content. The card class applies a background and border (with colors determined by the current theme).</p>
</li>
</ul>

</div>

<p>As we <a href="#presentation-layer-5-reactivity">explore later</a>, Framework&rsquo;s powerful Markdown extensions mean that each Markdown page is equivalent to a full data notebook&mdash;actually more powerful than Observable&rsquo;s own Notebooks, which themselves outshine most others, such as Jupyter.</p>
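To give a feel for the shape of a page, here is a minimal sketch of a Framework Markdown file. It assumes a hypothetical <code>samples.csv</code> produced by a Data Loader; the front matter keys, <code>FileAttachment</code>, and the default availability of Observable Plot follow Framework's documented conventions, but treat the details as illustrative rather than a drop-in page.

````markdown
---
title: Soil samples        # YAML front matter configures the page
theme: dashboard
---

# Soil pH by state

```js
// A fenced JavaScript block: reactive, and runs in the browser.
// samples.csv is assumed to come from a Data Loader at build time.
const samples = FileAttachment("samples.csv").csv({typed: true});
```

```js
// Later blocks can reference `samples`; Framework resolves it reactively.
display(Plot.plot({
  marks: [Plot.barY(samples, {x: "state", y: "ph"})]
}));
```
````

One file, one page: prose, layout, and live charts interleaved, with the data already sitting next to the page as a static file.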
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/observable-framework-blazing-fast/observable_5.webp"><picture>
    <source type="image/webp" srcset="/blog/observable-framework-blazing-fast/observable_5_hu412860821688524648.webp 320w, /blog/observable-framework-blazing-fast/observable_5_hu6295007403015294915.webp 640w, /blog/observable-framework-blazing-fast/observable_5_hu19688254323713786.webp 768w, /blog/observable-framework-blazing-fast/observable_5_hu15342777533514703908.webp 1024w, /blog/observable-framework-blazing-fast/observable_5_hu8964014755074301881.webp 1280w, /blog/observable-framework-blazing-fast/observable_5_hu13370645155194968774.webp 1536w, /blog/observable-framework-blazing-fast/observable_5_hu11597067433940964069.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/observable-framework-blazing-fast/observable_5_hu7279617796144016439.jpg 320w, /blog/observable-framework-blazing-fast/observable_5_hu15615740167748302464.jpg 640w, /blog/observable-framework-blazing-fast/observable_5_hu17491069439844398266.jpg 768w, /blog/observable-framework-blazing-fast/observable_5_hu11566831230372772685.jpg 1024w, /blog/observable-framework-blazing-fast/observable_5_hu10558422389724335568.jpg 1280w, /blog/observable-framework-blazing-fast/observable_5_hu16461624166701121540.jpg 1536w, /blog/observable-framework-blazing-fast/observable_5_hu13815149790474529817.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/observable-framework-blazing-fast/observable_5_hu15615740167748302464.jpg" width="3072" height="1536" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="presentation-layer-2-javascript">Presentation Layer 2: JavaScript</h2>
<p>JavaScript in Framework Markdown is like water to fish&mdash;everywhere. <a href="https://observablehq.com/framework/javascript">From Observable&rsquo;s docs</a>:</p>
<div class="prose bg-gray-800 px-4 py-[1px] border-l-4 border-blue-500 prose-h1:text-sky-200 prose-h2:text-sky-300 prose-h3:text-sky-400 prose-h4:text-sky-500 prose-p:text-sky-400 prose-ul:text-sky-400 prose-strong:text-sky-400 prose-ul-li:text-sky-400 prose-a:text-sky-400">
  <p>Use JavaScript to render charts, inputs, and other dynamic, interactive, and graphical content on the client. JavaScript in Markdown can be expressed either as <a href="https://observablehq.com/framework/javascript#fenced-code-blocks">fenced code blocks</a> or <a href="https://observablehq.com/framework/javascript#inline-expressions">inline expressions</a>. You can also import JavaScript modules to share code across pages.</p>
<ul>
<li>
<p>JavaScript fenced code blocks (```js) are typically used to display content such as charts and inputs. They can also be used to declare top-level variables, say to load data or declare helper functions.</p>
</li>
<li>
<p>Inline expressions ${…} interpolate values into Markdown. They are typically used to display numbers such as metrics, or to arrange visual elements such as charts into rich HTML layouts.</p>
</li>
</ul>

</div>
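<p>As a quick sketch of the two forms (the variable names are hypothetical), a fenced code block declares values, and an inline expression interpolates one into prose:</p>
<pre><code>```js
const temps = [18, 21, 19, 24];
const hottest = Math.max(...temps);
```

The warmest reading this week was ${hottest} degrees.
</code></pre>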

<p>Think of a Framework Markdown page as being wrapped in a JavaScript context, what Observable calls the runtime&mdash;not only can JavaScript run basically anywhere on the page, but our full data context is available to operate on and visualize.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/observable-framework-blazing-fast/observable_8.webp"><picture>
    <source type="image/webp" srcset="/blog/observable-framework-blazing-fast/observable_8_hu1509472109297546237.webp 320w, /blog/observable-framework-blazing-fast/observable_8_hu10268539548761755506.webp 640w, /blog/observable-framework-blazing-fast/observable_8_hu12201675164602769062.webp 768w, /blog/observable-framework-blazing-fast/observable_8_hu2020299721708264648.webp 1024w, /blog/observable-framework-blazing-fast/observable_8_hu15677211198465439342.webp 1280w, /blog/observable-framework-blazing-fast/observable_8_hu17488850303728056053.webp 1536w, /blog/observable-framework-blazing-fast/observable_8_hu16085400164992224436.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/observable-framework-blazing-fast/observable_8_hu11723269693342666180.jpg 320w, /blog/observable-framework-blazing-fast/observable_8_hu1819830850022867956.jpg 640w, /blog/observable-framework-blazing-fast/observable_8_hu11891946426887696197.jpg 768w, /blog/observable-framework-blazing-fast/observable_8_hu12061462198195352530.jpg 1024w, /blog/observable-framework-blazing-fast/observable_8_hu11682774193280120378.jpg 1280w, /blog/observable-framework-blazing-fast/observable_8_hu10236346252807010089.jpg 1536w, /blog/observable-framework-blazing-fast/observable_8_hu5034114080557891492.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/observable-framework-blazing-fast/observable_8_hu1819830850022867956.jpg" width="3072" height="1536" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="presentation-layer-3-getting-the-data">Presentation Layer 3: Getting the Data</h2>
<p>How do we get the data, for example data snapshots created by Data Loaders, available to the presentation layer and accessible via JavaScript? From the <a href="https://observablehq.com/framework/files">Observable docs</a>:</p>
<blockquote>
<p>Load files — whether static or generated dynamically by a data loader — using the built-in <code>FileAttachment</code> function. This is available by default in Markdown.</p>
</blockquote>
<p>Here&rsquo;s a Markdown code block to load a JSON file listing active volcanos:</p>
<pre><code>```js
const volcano = FileAttachment(&quot;volcano.json&quot;).json();
```
</code></pre>
<p>When the variable <code>volcano</code> is referenced in another code block or inline expression, for example in a visualization, it contains the appropriate JavaScript Array, Object, etc. to represent the JSON data. Two magical Framework notes:</p>
<ul>
<li>The presentation layer doesn&rsquo;t need to know whether the file <code>volcano.json</code> was generated by a Data Loader or simply included in our project as a static file&mdash;Framework manages all of this for us at build time. Specifically, the Framework build looks for the file <code>volcano.json</code>; if it doesn&rsquo;t exist, the build looks for a Data Loader based on the same name, say <code>volcano.json.py</code>, and runs it to generate the data snapshot.</li>
<li>Thanks to <a href="#presentation-layer-5-reactivity">reactivity</a>, even though FileAttachment is an async function since it&rsquo;s reading data from the file system, we don&rsquo;t need to deal with async logic&mdash;the Framework runtime manages this for us automatically.</li>
</ul>
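<p>For example, once <code>volcano</code> is declared as above, any later code block or inline expression can reference it directly; a sketch, assuming a hypothetical <code>elevation</code> field:</p>
<pre><code>```js
const tall = volcano.filter((v) => v.elevation > 3000);
```

Our dataset includes ${volcano.length} volcanos, ${tall.length} of them above 3,000 meters.
</code></pre>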
<p>Here&rsquo;s a full list of formats that <code>FileAttachment</code> supports:</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/observable-framework-blazing-fast/file_attachment_formats.png"><picture>
    <source type="image/webp" srcset="/blog/observable-framework-blazing-fast/file_attachment_formats_hu6588022504207106378.webp 320w, /blog/observable-framework-blazing-fast/file_attachment_formats_hu2569001490060307029.webp 640w, /blog/observable-framework-blazing-fast/file_attachment_formats_hu498132770296467141.webp 768w, /blog/observable-framework-blazing-fast/file_attachment_formats_hu10028409888102909646.webp 1024w, /blog/observable-framework-blazing-fast/file_attachment_formats_hu17627346402861547066.webp 1280w, /blog/observable-framework-blazing-fast/file_attachment_formats_hu15929258043095556544.webp 1536w, /blog/observable-framework-blazing-fast/file_attachment_formats_hu5462890071369410440.webp 1920w" sizes="75vw" />
    <source type="image/jpeg" srcset="/blog/observable-framework-blazing-fast/file_attachment_formats_hu17932510555671682381.jpg 320w, /blog/observable-framework-blazing-fast/file_attachment_formats_hu778378007264433251.jpg 640w, /blog/observable-framework-blazing-fast/file_attachment_formats_hu18012327841538069276.jpg 768w, /blog/observable-framework-blazing-fast/file_attachment_formats_hu16966497884601179568.jpg 1024w, /blog/observable-framework-blazing-fast/file_attachment_formats_hu13924233411131215035.jpg 1280w, /blog/observable-framework-blazing-fast/file_attachment_formats_hu9424420967415359453.jpg 1536w, /blog/observable-framework-blazing-fast/file_attachment_formats_hu11998454337680517293.jpg 1920w" sizes="75vw" />
    
      <img src="/blog/observable-framework-blazing-fast/file_attachment_formats_hu778378007264433251.jpg" width="2370" height="1707" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="presentation-layer-4-client-side-sql">Presentation Layer 4: Client-side SQL</h2>
<p>There&rsquo;s one more totally amazing way to access data in the presentation layer: client-side SQL. From the Framework <a href="https://observablehq.com/framework/sql">docs</a>:</p>
<blockquote>
<p>Framework includes built-in support for client-side SQL powered by DuckDB. You can use SQL to query data from CSV, TSV, JSON, Apache Arrow, Apache Parquet, and DuckDB database files, which can either be static or generated by data loaders.</p>
</blockquote>
<p>To use SQL, you first register a SQL source in the Markdown doc&rsquo;s front matter. Here we register a sample Gaia astronomical data source which happens to be in Apache Parquet format:</p>
<pre tabindex="0"><code>---
sql:
  gaia: ./data/gaia-sample.parquet
---
</code></pre><p>The same build-time Data Loader logic we described above applies here&mdash;if the file <code>gaia-sample.parquet</code> doesn&rsquo;t exist, the Framework build will look for a matching Data Loader such as <code>gaia-sample.parquet.py</code> and run it to create the file.</p>
<p>Now we can query that source from a SQL code block:</p>
<pre><code>```sql id=top10
  SELECT * FROM gaia ORDER BY phot_g_mean_mag LIMIT 10
```
</code></pre>
<p>The <code>id=top10</code> stashes the result of the query in a JavaScript variable:</p>
<blockquote>
<p>The value of a SQL code block is an Apache Arrow table. This format is supported by <a href="#presentation-layer-7-visualizations">Observable Plot</a>, so you can use SQL and Plot together to visualize data.</p>
</blockquote>
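<p>A sketch of that pairing: since <code>top10</code> is already an Arrow table, a later code block can hand it straight to Plot (the Gaia column names and mark choices here are illustrative):</p>
<pre><code>```js
Plot.plot({
  marks: [
    Plot.dot(top10, {x: "ra", y: "dec", tip: true})
  ]
})
```
</code></pre>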
<p>So, right in the browser, we&rsquo;re running live, superfast SQL queries on our data&mdash;no server roundtrips and near-instantaneous responsiveness. When paired with Framework <a href="#presentation-layer-5-reactivity">reactivity</a> and <a href="#presentation-layer-6-interactivity">interactivity</a>, client-side SQL is a huge win for enabling blazing-fast interactive dashboards.</p>
<p>(Framework&rsquo;s <a href="https://observablehq.com/framework/lib/arrow#apache-parquet">strong Apache Parquet support</a> is a related win. Compared with other data formats, Parquet is very small, very fast, and works great with Framework&rsquo;s DuckDB-powered SQL.)</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/observable-framework-blazing-fast/observable_9.webp"><picture>
    <source type="image/webp" srcset="/blog/observable-framework-blazing-fast/observable_9_hu9039048255224986352.webp 320w, /blog/observable-framework-blazing-fast/observable_9_hu17638757934192729042.webp 640w, /blog/observable-framework-blazing-fast/observable_9_hu12394664375081204498.webp 768w, /blog/observable-framework-blazing-fast/observable_9_hu6420495077834437899.webp 1024w, /blog/observable-framework-blazing-fast/observable_9_hu4364128312782880758.webp 1280w, /blog/observable-framework-blazing-fast/observable_9_hu10249990358356203252.webp 1536w, /blog/observable-framework-blazing-fast/observable_9_hu18331669316173698738.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/observable-framework-blazing-fast/observable_9_hu8982478773049073078.jpg 320w, /blog/observable-framework-blazing-fast/observable_9_hu12503603632088870102.jpg 640w, /blog/observable-framework-blazing-fast/observable_9_hu12376942963507888807.jpg 768w, /blog/observable-framework-blazing-fast/observable_9_hu4816125772020097203.jpg 1024w, /blog/observable-framework-blazing-fast/observable_9_hu2317566356704339179.jpg 1280w, /blog/observable-framework-blazing-fast/observable_9_hu15715234073926192365.jpg 1536w, /blog/observable-framework-blazing-fast/observable_9_hu11758245203237317164.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/observable-framework-blazing-fast/observable_9_hu12503603632088870102.jpg" width="3072" height="1536" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="presentation-layer-5-reactivity">Presentation Layer 5: Reactivity</h2>
<p>JavaScript in Framework has a superpower: it&rsquo;s reactive. From the <a href="https://observablehq.com/framework/reactivity">docs</a>:</p>
<div class="prose bg-gray-800 px-4 py-[1px] border-l-4 border-blue-500 prose-h1:text-sky-200 prose-h2:text-sky-300 prose-h3:text-sky-400 prose-h4:text-sky-500 prose-p:text-sky-400 prose-ul:text-sky-400 prose-strong:text-sky-400 prose-ul-li:text-sky-400 prose-a:text-sky-400">
  <p>Framework runs like a spreadsheet: code re-runs automatically when referenced variables change. This brings:</p>
<ul>
<li>Easier interactivity because state is automatically kept in sync</li>
<li>Easier asynchronous programming via implicit await of promises</li>
<li>Better performance with incremental re-rendering</li>
<li>Greater flexibility by writing code and prose in any order</li>
</ul>
<p>Reactivity is especially helpful for data apps because these apps tend to have complex state.</p>

</div>

<p>Reactivity was a key differentiating feature of Observable&rsquo;s first product, Notebooks. In most data notebooks, Jupyter for example, cells execute in a linear order, and the user must manually re-run cells when dependencies change. In contrast, an <a href="https://observablehq.com/@observablehq/learning-observable-reactive-dataflow">Observable Notebook</a> &hellip;</p>
<blockquote>
<p>&hellip; knows which cells depend on which others, no matter where they are located in the notebook. So when one of them changes, all values that depend on it are automatically updated. This is also efficient: only those cells that are affected by a change are re-run. That means that you can organize cells in your notebook however you want, without worrying about how the order of operations may be affected.</p>
</blockquote>
<p>Reactivity is even better in Framework, because the augmented Markdown format that sits at the heart of Framework&rsquo;s presentation layer is more powerful and flexible than the original app-based Notebook model. Framework Markdown freely combines content with code; uses vanilla JavaScript rather than a JavaScript dialect with <a href="https://observablehq.com/documentation/cells/observable-javascript">syntax differences</a>; and finally, since a Framework page (the equivalent of a Notebook) is represented by a single, simple text Markdown file, it works flawlessly with source control / CI-CD platforms like GitHub.</p>
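<p>A tiny sketch makes the spreadsheet comparison concrete: the two fenced blocks below can appear in either order on the page, and editing <code>rate</code> re-runs only the code that references it (the variables are hypothetical):</p>
<pre><code>```js
const payment = (principal * rate) / 12;
```

```js
const principal = 400000;
const rate = 0.065;
```

Monthly interest: ${payment.toFixed(2)}
</code></pre>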
<h2 id="presentation-layer-6-interactivity">Presentation Layer 6: Interactivity</h2>
<p>Reactivity brings many benefits, but perhaps shines most brightly when it comes to enabling interactive dashboards, where user inputs dynamically reshape dashboard output.</p>
<p>Framework Inputs support direct user interaction on dashboards and reports.  The 14 flavors of Input include Radio, Checkbox, Select, Range, Toggle, Date and eight more.</p>
<p>Let&rsquo;s make a histogram of soil carbon values, with the width of each histogram bar (and therefore the number of bars) controlled by a Range Input. This code block with a single line of code:</p>
<pre><code>```js
const binWidth = view(Inputs.range([.2, 1], {step: .05, value: .5, label: &quot;Bin Width (%):&quot;}))
```
</code></pre>
<p>puts a range slider control onto our dashboard screen, with min/max values of .2 and 1. The JavaScript variable <code>binWidth</code> has an initial value of .5, and gets updated as the user moves the slider left or right. In another code block, Observable Plot&rsquo;s binX transform references <code>binWidth</code>, and thanks to Framework reactivity, the histogram updates in real time. Here&rsquo;s the result:</p>
<video preload="auto" width="75%"  autoplay loop playsinline class="html-video">
    <source src="/blog/observable-framework-blazing-fast/interactive_histogram.mp4" type="video/mp4">
  <span></span>
</video>
<p>Voila, one line of code, and we&rsquo;ve got our first interactive dashboard element.</p>
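<p>The companion code block, sketched below, is where reactivity pays off: because it references <code>binWidth</code>, Framework re-runs it (and re-renders the histogram) on every slider movement. The dataset and field names are illustrative:</p>
<pre><code>```js
Plot.plot({
  marks: [
    Plot.rectY(soil, Plot.binX({y: "count"}, {x: "carbon", interval: binWidth})),
    Plot.ruleY([0])
  ]
})
```
</code></pre>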
<p>Now let&rsquo;s take a look at a more complete and interactive dashboard (<a href="/observable/mortgage-rates/">try it live here</a>), which shows 15- and 30-year fixed mortgage rates from 1971 through the present, using data that is updated weekly and thoughtfully <a href="https://www.freddiemac.com/pmms/pmms_archives">shared by Freddie Mac</a> as a downloadable CSV. A 15-line Node.js Data Loader grabs the latest CSV, current to within a week.</p>
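<p>As a rough sketch of the core of such a loader (the sample data and column layout below are assumptions, not Freddie Mac&rsquo;s actual schema): trim the downloaded CSV to just the columns the dashboard needs, then write the result to stdout, which Framework captures as the data snapshot at build time.</p>
<pre><code>// Hypothetical transform step inside a data loader such as pmms.csv.js.
// The real loader would first fetch() the Freddie Mac CSV, then write
// the trimmed text to process.stdout for Framework to capture.
function selectColumns(csvText, indexes) {
  const rows = csvText.trim().split("\n").map((line) => line.split(","));
  return rows.map((row) => indexes.map((i) => row[i]).join(",")).join("\n");
}

// Illustrative input only.
const sample = "date,pmms30,pmms15,extra\n2024-01-04,6.62,5.89,n/a\n";
process.stdout.write(selectColumns(sample, [0, 1, 2]) + "\n");
</code></pre>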
<p>As you see below, the upper right card shows a detailed chart of mortgage rates, starting with the past year; while the bottom pane shows a full-history rate chart that covers 1971 through the present. In this case, the interaction isn&rsquo;t through a Framework Input, but rather through JavaScript click and drag events setting a date range variable <code>startEnd</code> when the user interacts with the lower chart. The detailed chart above right references <code>startEnd</code> and reacts instantly when it changes. All four charts in the dashboard are rendered by Observable Plot.</p>
<video preload="auto" width="75%"  autoplay loop playsinline class="html-video">
    <source src="/blog/observable-framework-blazing-fast/interactive_mortgage_rates.mp4" type="video/mp4">
  <span></span>
</video>
<p>As you can see, Framework supports highly-interactive experiences for data consumers.</p>
<h2 id="presentation-layer-7-visualizations">Presentation Layer 7: Visualizations</h2>
<p>So far, we&rsquo;ve covered the data layer including a deep dive on Data Loaders, and many aspects of the presentation layer, from Markdown to interactivity. But what about the visualizations themselves? Here Framework&rsquo;s options are virtually unlimited: virtually any JavaScript visualization library can be used within Framework. To insert a visualization into a Framework Markdown page, you just call the library from a JavaScript code block or inline expression.</p>
<p>Observable themselves created two of the most powerful libraries (both fully open source): <a href="https://observablehq.com/plot/">Observable Plot</a> and the lower-level <a href="https://d3js.org/">D3</a>, on which Plot is built. Just to give you a taste of what&rsquo;s possible, here&rsquo;s a tiny subset of the <a href="https://observablehq.com/@observablehq/plot-gallery">available examples for Plot</a> and for <a href="https://observablehq.com/@d3/gallery">D3</a>:</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/observable-framework-blazing-fast/plot_gallery.png"><picture>
    <source type="image/webp" srcset="/blog/observable-framework-blazing-fast/plot_gallery_hu11188538430360543668.webp 320w, /blog/observable-framework-blazing-fast/plot_gallery_hu2677887371132985515.webp 640w, /blog/observable-framework-blazing-fast/plot_gallery_hu13268687845007704536.webp 768w, /blog/observable-framework-blazing-fast/plot_gallery_hu10159262909854492205.webp 1024w, /blog/observable-framework-blazing-fast/plot_gallery_hu14879729074436440580.webp 1280w, /blog/observable-framework-blazing-fast/plot_gallery_hu11638009401796897520.webp 1536w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/observable-framework-blazing-fast/plot_gallery_hu13991545080758240284.jpg 320w, /blog/observable-framework-blazing-fast/plot_gallery_hu2067256084191614305.jpg 640w, /blog/observable-framework-blazing-fast/plot_gallery_hu4803092101448609989.jpg 768w, /blog/observable-framework-blazing-fast/plot_gallery_hu16681999801196853994.jpg 1024w, /blog/observable-framework-blazing-fast/plot_gallery_hu10171782844957708444.jpg 1280w, /blog/observable-framework-blazing-fast/plot_gallery_hu1126640898658672294.jpg 1536w" sizes="37vw" />
    
      <img src="/blog/observable-framework-blazing-fast/plot_gallery_hu2067256084191614305.jpg" width="1750" height="1324" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[75%] mx-2 flex justify-center"><a href="/blog/observable-framework-blazing-fast/d3_gallery.png"><picture>
    <source type="image/webp" srcset="/blog/observable-framework-blazing-fast/d3_gallery_hu17047919395485345721.webp 320w, /blog/observable-framework-blazing-fast/d3_gallery_hu12690674694875505255.webp 640w, /blog/observable-framework-blazing-fast/d3_gallery_hu12677224145963235929.webp 768w, /blog/observable-framework-blazing-fast/d3_gallery_hu6573851521168142410.webp 1024w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/observable-framework-blazing-fast/d3_gallery_hu18272774813772194538.jpg 320w, /blog/observable-framework-blazing-fast/d3_gallery_hu11623531919533374368.jpg 640w, /blog/observable-framework-blazing-fast/d3_gallery_hu15259098075344805426.jpg 768w, /blog/observable-framework-blazing-fast/d3_gallery_hu10257322123906545727.jpg 1024w" sizes="37vw" />
    
      <img src="/blog/observable-framework-blazing-fast/d3_gallery_hu11623531919533374368.jpg" width="1175" height="894" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>But Observable&rsquo;s own libraries are far from the only options&mdash;other widely-used general plotting libraries including <a href="https://echarts.apache.org/en/index.html">Apache ECharts</a>, <a href="https://plotly.com/javascript/">Plotly</a>, and <a href="https://vega.github.io/vega-lite/">Vega-Lite</a> work great within Framework.</p>
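<p>As a hedged sketch of how a third-party library slots in (this follows the pattern in Framework&rsquo;s own ECharts example; the chart config is illustrative), a code block imports the library with Framework&rsquo;s <code>npm:</code> specifier and displays the chart&rsquo;s container element:</p>
<pre><code>```js
import * as echarts from "npm:echarts";

const container = display(document.createElement("div"));
container.style = "width: 640px; height: 400px;";
const chart = echarts.init(container);
chart.setOption({
  xAxis: {type: "category", data: ["Mon", "Tue", "Wed"]},
  yAxis: {type: "value"},
  series: [{type: "bar", data: [120, 200, 150]}]
});
```
</code></pre>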
<video preload="auto" width="50%"  autoplay loop playsinline class="html-video">
    <source src="/blog/observable-framework-blazing-fast/apache_echarts_sm.mp4" type="video/mp4">
  <span></span>
</video>
<p>Mapping-specific libraries such as <a href="https://leafletjs.com/">Leaflet</a> and <a href="https://docs.mapbox.com/mapbox-gl-js/guides/">Mapbox GL JS</a> are fully supported, as are a number of specialized viz libraries such as <a href="https://observablehq.com/framework/lib/dot">DOT</a>, <a href="https://observablehq.com/framework/lib/deckgl">Deck.gl</a>, <a href="https://observablehq.com/framework/lib/mermaid">Mermaid</a>, and <a href="https://observablehq.com/framework/lib/mosaic">Mosaic vgplot</a>.</p>
<h2 id="framework-in-a-nutshell">Framework in a Nutshell</h2>
<p>Observable&rsquo;s homepage tagline is:</p>
<blockquote>
<p>The best dashboards are built with code.</p>
</blockquote>
<p>If one considers the conventional alternative&mdash;a massive SaaS app along the lines of Tableau, where our visualization possibilities are limited to what the SaaS vendor decides we should have&mdash;I&rsquo;ll take dashboards built with code, thank you very much.</p>
<video preload="auto" width="80%"  autoplay loop playsinline class="html-video">
    <source src="/blog/observable-framework-blazing-fast/dashboards_as_code.webm" type="video/webm">
  <span></span>
</video>
]]></content:encoded>
    </item>
    
    <item>
      <title>A Quick Look at GPT-4o</title>
      <link>https://craftycto.com/micro/gpt-4o-quick-benchmark/</link>
      <pubDate>Sun, 19 May 2024 20:32:43 -0500</pubDate>
      
      <guid>https://craftycto.com/micro/gpt-4o-quick-benchmark/</guid>
      <description>I did a quick benchmark of the new GPT-4o model versus GPT-4-turbo. Roughly twice the speed, with improved Vision accuracy.</description>
      <content:encoded><![CDATA[<p>Since last week&rsquo;s <a href="https://openai.com/index/spring-update/">Spring Update event</a> from OpenAI, I&rsquo;ve been wanting to test the new <a href="https://openai.com/index/hello-gpt-4o/">Omni model (GPT-4o)</a> in my Vision-based <a href="/blog/chatgpt-4v/">recipe ingestor project</a>. This Python script processes a directory of recipe images, and using GPT-4&rsquo;s Vision capabilities, as well as JSON Mode, produces a structured JSON recipe that I can post directly to my Paprika Recipes app.</p>
<h2 id="benchmark-results">Benchmark Results</h2>
<p>For a batch of three recipes / four recipe images, bypassing uploading to the Paprika cloud.</p>
<h3 id="quantitative">Quantitative</h3>
<ul>
<li>GPT-4-turbo: 1:03 (1 min 3 secs)</li>
<li>GPT-4o: 0:38 (38 secs)</li>
</ul>
<h3 id="qualitative">Qualitative</h3>
<p>GPT-4o&rsquo;s Vision capabilities seem to have improved over GPT-4-turbo. Example: GPT-4-turbo had a hard time parsing the handwritten name &ldquo;Debbie Gardner&rdquo;, producing &ldquo;Betty&rsquo;s Cards&rdquo;, &ldquo;Betty Bard&rdquo;, &ldquo;Betty Borden&rdquo; etc. on different runs. GPT-4o got the name correct.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>OpenAI Vision &#43; Function Calling Examples</title>
      <link>https://craftycto.com/micro/openai-vision-function-calling/</link>
      <pubDate>Sun, 28 Apr 2024 09:27:35 -0500</pubDate>
      
      <guid>https://craftycto.com/micro/openai-vision-function-calling/</guid>
      <description>OpenAI&amp;#39;s April 9th model update merges Turbo and Vision, and these examples show the two working together.</description>
      <content:encoded><![CDATA[<p>When OpenAI released preview Vision support in GPT-4, it had some major limitations, lacking many of the newest features available in GPT-4 Turbo, including structured output (JSON Mode) and function calling. I had to hack my <a href="/blog/chatgpt-4v/">Recipe Ingestor</a> project, which applies GPT-4 Vision to ingest, structure and enrich legacy recipe images, to run in two passes as a workaround.</p>
<p>On April 9th, <a href="https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4">OpenAI released</a> an updated <code>gpt-4-turbo</code> model:</p>
<blockquote>
<p>GPT-4 Turbo with Vision: The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling.</p>
</blockquote>
<p>I&rsquo;ve updated Recipe Ingestor so it now runs nicely in one pass that uses Vision alongside Turbo&rsquo;s JSON Mode.</p>
<p>In making these updates, I dug around in the OpenAI docs and came across a <a href="https://cookbook.openai.com/examples/multimodal/using_gpt4_vision_with_function_calling">pair of new Vision examples</a> in the OpenAI Cookbook that take advantage of the April 9th model, applying function calling driven from image inputs. Provided as a <a href="https://github.com/openai/openai-cookbook/tree/main/examples/multimodal">Jupyter Notebook</a>, the examples have me mulling whether function calling might make sense for Recipe Ingestor.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Using Tailscale to Access Amazon VPCs, EC2 Instances, and RDS Clusters</title>
      <link>https://craftycto.com/blog/tailscale-fun/</link>
      <pubDate>Tue, 23 Apr 2024 12:19:30 -0500</pubDate>
      
      <guid>https://craftycto.com/blog/tailscale-fun/</guid>
      <description>Tailscale has been simple to set up and manage, but also amazingly flexible.</description>
      <content:encoded><![CDATA[<h2 id="inception-direct-database-access-for-the-data-team">Inception: Direct Database Access for the Data Team</h2>
<p>Our immediate need was getting the data science team programmatic access to a read replica of our production database, an Amazon RDS Postgres cluster.</p>
<p>The initial stopgap solution was to open a Postgres port, but with an RDS Security Group rule that limited inbound access to a few specific IP addresses. That solution became high-maintenance quickly, since some of our team connected from a university campus network, where IP addresses rotated every few weeks.</p>
<h2 id="better-idea-aws-systems-manager-session-manager">Better Idea: AWS Systems Manager Session Manager</h2>
<p>Our development partner suggested a better solution, AWS <a href="https://aws.amazon.com/systems-manager/features/#Session_Manager">Systems Manager Session Manager</a>, which enables tunneled sessions into the AWS environment and leverages AWS IAM to manage access. The result: better security and no need for manual IP whitelisting.</p>
<p>SMSM has one significant drawback: targeted at dev and devops folks, it is configured and run through the AWS command line interface. Yikes&mdash;imagine conversations with end users that begin &ldquo;OK, first, open a terminal, then run this command &hellip;&rdquo;&mdash;not the user experience we&rsquo;re going for. Our data team could&rsquo;ve handled it, but the CLI interface would be an ongoing source of friction and pain.</p>
<p>More Session Manager rough edges appeared when I tried but failed to set up a quick proof of concept. The AWS docs were impenetrable, and even a third-party guide on the process wasn&rsquo;t enough help. All the while, I was becoming less and less convinced that Systems Manager was the best solution: &ldquo;All this mucking around, just to open a tunnel to an RDS replica?&rdquo; I also began to realize that Session Manager was, in effect, a limited-scope VPN for AWS services only, without much of a user interface. Why invest precious time on such a limited solution?</p>
<h2 id="bigger-picture-other-needs">Bigger Picture: Other Needs</h2>
<p>When I broadened my thinking, I quickly realized we had already implemented two other one-off remote access solutions:</p>
<ul>
<li>To enable SQL GUI tool access to the database, we set up bastion servers on our staging and production VPCs. These were Windows EC2 instances, with their own login credentials and the DBeaver SQL tool installed. We attached to the bastions using Remote Desktop. Scrolling through long tables in DBeaver via Remote Desktop is unpleasant, bordering on dangerous on a write-capable connection.</li>
<li>To support miscellaneous tasks such as database migration, we had two additional Ubuntu EC2 instances, with open SSH ports.</li>
</ul>
<p>With Systems Manager Session Manager shaping up as yet another one-off remote access solution, I decided to seek a broader solution that addressed all three remote access needs, provided a better admin and user experience, and shored up security as well.</p>
<h2 id="security-goals">Security Goals</h2>
<p>Our desired security posture on AWS is &ldquo;expose nothing we don&rsquo;t absolutely need to expose, and what we do expose, make it robustly secure.&rdquo; To drill down a bit:</p>
<ul>
<li>Keep our AWS Virtual Private Clouds walled off and private.</li>
<li>Avoid exposing attack surfaces, such as SSH / Postgres ports and internal-only service endpoints.</li>
<li>Enable remote access in a robustly secure manner, granting specific users access to specific services, as opposed to blanket access for everyone.</li>
</ul>
<h2 id="vpn---or-tailscale">VPN&mdash;or Tailscale?</h2>
<p>The default solution for secure remote access is a VPN; and there&rsquo;s no doubt a VPN could have worked for us. &ldquo;Easy,&rdquo; &ldquo;Low Maintenance&rdquo; and &ldquo;Flexible&rdquo; aren&rsquo;t terms one usually associates with VPNs, however, so I kept solution-scanning.</p>
<p>I&rsquo;d been peripherally aware of <a href="https://tailscale.com/">Tailscale</a> through background chatter on Hacker News and similar venues. My vague impression of its niche was &ldquo;better than a traditional VPN, and way less painful.&rdquo; That sounded just perfect for our needs, so I took the opportunity to explore it more deeply, was immediately intrigued, found wonderful documentation including excellent setup guides for exactly what we needed, and before I knew it had a Tailscale-on-AWS proof of concept up and running.</p>
<h2 id="what-is-tailscale-really">What is Tailscale, Really?</h2>
<p>The very minute I sat down to write this section, Tailscale&rsquo;s monthly email newsletter landed in my inbox, and it highlighted <a href="https://www.caseyliss.com/2024/3/27/tailscale">this excellent Tailscale post by Casey Liss</a>. Casey operates the <a href="https://atp.fm/">Accidental Tech Podcast</a>, and has several iOS apps in the App Store including the excellent <a href="https://apps.apple.com/us/app/callsheet-find-cast-crew/id1672356376">Callsheet</a> (4.9 star rating, App Store Editors Choice). Casey has already written about Tailscale better than I could, so to get up to speed on the basics, please <a href="https://www.caseyliss.com/2024/3/27/tailscale">read his post!</a></p>
<h2 id="getting-inside-the-vpc">Getting Inside the VPC</h2>
<p>VPC means Virtual <em><strong>Private</strong></em> Cloud, so how do we get in there? Enabling all sorts of &ldquo;getting in there,&rdquo; in a magically simple way, is Tailscale&rsquo;s superpower. Still, AWS is pretty serious about the P word, and <a href="https://tailscale.com/kb/1296/aws-reference-architecture">Tailscale did the work</a> to make everything mesh smoothly with AWS networking.</p>
<p>Getting Tailscale running on an EC2 instance, bare cloud-iron that we can directly access and control, is just a quick install: literally a minute&rsquo;s work if you can SSH in, after which that EC2 instance magically pops up on your private Tailnet. (Installing Tailscale on Mac, Windows, iOS, etc. is even easier.)</p>
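<p>For reference, the quick install amounts to little more than this (a sketch based on Tailscale&rsquo;s documented convenience script; the auth flow varies slightly by setup):</p>
<pre><code class="language-shell"># Install Tailscale via the official convenience script
curl -fsSL https://tailscale.com/install.sh | sh

# Join the tailnet; this prints an auth URL to open in a browser
sudo tailscale up
</code></pre>
<p>Once <code>tailscale up</code> is authenticated, the instance appears in the Tailscale admin console and is reachable at its Tailnet address.</p>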
<p>But &hellip; we&rsquo;re living in the age of Serverless now. Even in our smallish infrastructure, the two main pillars of our setup (compute and database, ECS-Fargate and RDS Postgres) are serverless. What do you do when you don&rsquo;t have bare iron, when you can&rsquo;t just SSH in and install Tailscale? The key here is Tailscale&rsquo;s subnet router, which isn&rsquo;t even a separate piece of software, but rather an option added to the Tailscale command when you start it up. You just need a tiny EC2 instance within each VPC running Tailscale with the subnet router option enabled, and suddenly every device connected to the VPC becomes accessible, even though none of them run the Tailscale client. (To be clear, Security Groups still apply, so access is blocked by default.)</p>
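<p>Setting up that tiny subnet-router instance looks roughly like this (a sketch following Tailscale&rsquo;s subnet router guide; the <code>10.0.0.0/16</code> CIDR is a placeholder for your VPC&rsquo;s actual block):</p>
<pre><code class="language-shell"># Enable IP forwarding so the instance can route for the rest of the VPC
echo 'net.ipv4.ip_forward = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf
echo 'net.ipv6.conf.all.forwarding = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf
sudo sysctl -p /etc/sysctl.d/99-tailscale.conf

# Advertise the VPC's CIDR block to the tailnet
sudo tailscale up --advertise-routes=10.0.0.0/16
</code></pre>
<p>The advertised route then has to be approved in the Tailscale admin console before other tailnet devices can use it.</p>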
<p>Subnet routing gets us access to our RDS cluster, both primary writer and read replica instances. And that (along with Tailscale&rsquo;s robust security) meets two of our three current use cases: data team read replica access, and developer access for SQL tools. Bastion servers are gone, and DBeaver is 10X better connecting directly versus through Remote Desktop.</p>
<h2 id="ssh-solved">SSH, Solved</h2>
<p>Our third use case is SSH. Good news: just by installing Tailscale on the servers that need SSH access, and the SSH users&rsquo; client devices, we have closed off the attack surface exposed by opening an SSH port to the Internet. As <a href="https://tailscale.com/kb/1193/tailscale-ssh">Tailscale&rsquo;s docs state</a>, we already have &ldquo;the standard SSH experience without exposing your servers to the internet.&rdquo; That&rsquo;s a big, free win right there.</p>
<p>We&rsquo;ve achieved this without even implementing the <a href="https://tailscale.com/tailscale-ssh">Tailscale SSH</a> feature &hellip; so why does that feature even exist? It turns out there&rsquo;s more to robustly securing SSH than simply blocking network access. There&rsquo;s authentication (&ldquo;who are you, and how do I know you&rsquo;re telling the truth about that?&rdquo;) and authorization (&ldquo;what&rsquo;s this user allowed to do?&rdquo;), in this case, whether they&rsquo;re allowed SSH access to this server.</p>
<p>We all know the pitfalls of username/password security, and organizations that are serious about security (or have compliance requirements) tend to instead apply digital certificates. To quote Tailscale&rsquo;s docs:</p>
<blockquote>
<p>Historically, to secure an SSH connection, you generate a keypair on the machine you are connecting from (known as the client), with the private key stored on the client, and the public key distributed to the device you want to connect to (known as the server). This lets the server authenticate communication from the client.</p>
</blockquote>
<p>So every SSH user needs to establish a keypair with every server they access via SSH. When you&rsquo;ve got more than a handful of SSH users and servers, this gets painful fast, and opens up its own attack surfaces, such as when an SSH-privileged user leaves the company. The pain level and security/compliance concerns here are often large enough to drive the implementation of PKI (Public Key Infrastructure) and KMS (Key Management System) solutions. In other words, a whole new set of systems just to get secure, compliant SSH.</p>
<p>This is where Tailscale SSH shines. Tailscale already knows, through its underlying digital certificate implementation, who its users and machines are. Tailscale SSH simply applies Tailscale&rsquo;s own strong authentication and key management capabilities to SSH transactions, adding an SSH-specific Access Control List facility to cover authorization, i.e. &ldquo;what&rsquo;s this user allowed to do?&rdquo; This eliminates the whole every-user-to-every-server keypair rat race while improving security and compliance. And there are <a href="https://tailscale.com/tailscale-ssh">many other benefits</a> to Tailscale SSH, from automatic key rotation to SSH session recording.</p>
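<p>Per Tailscale&rsquo;s docs, enabling it is a one-liner on each server; authorization then moves into the tailnet policy file, where <code>ssh</code> rules grant specific users access to specific machines:</p>
<pre><code class="language-shell"># Enable Tailscale SSH on a server already joined to the tailnet;
# Tailscale takes over authentication for incoming SSH connections
sudo tailscale up --ssh
</code></pre>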
<h2 id="tricks-tips-and-learning-moments">Tricks, Tips, and Learning Moments</h2>
<h3 id="a-dedicated-tailscale-aws-proof-of-concept-environment-is-worth-it">A Dedicated Tailscale AWS Proof of Concept Environment Is Worth It</h3>
<p>In an attempt to save time, I tried to do my Tailscale POC in our existing Staging environment. The predictable result: I actually spent more time chasing little glitches caused by quirks of the preexisting environment. Another result: one of my experimental changes managed to take Staging down. Eventually I learned my lesson, went back, and set up a full POC on its own VPC, with its own RDS cluster and all. Getting Tailscale fully set up there was quick, and by getting that environment working, I learned what I needed to know to (carefully) reimplement Tailscale on Staging, and then Production.</p>
<h3 id="duplicate-ipv4-cidr-blocks-on-different-vpcs">Duplicate IPv4 CIDR Blocks on Different VPCs</h3>
<p>We used Terraform to set up our initial Staging and Production environments. One side effect of this was that the IPv4 address ranges (CIDR blocks) were identical between the two environments. That works just fine inside the private VPCs, but when Tailscale needs to route traffic from the outside, it needs to know &ldquo;which 10.0.1.23 do you mean, sir?&rdquo; Tailscale&rsquo;s subnet router includes an elegant fix for this that they call <a href="https://tailscale.com/kb/1201/4via6-subnets">4via6</a>. We configured our Staging environment this way.</p>
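<p>From the 4via6 docs, the workflow is roughly: assign each VPC a small numeric &ldquo;site ID,&rdquo; have the Tailscale CLI translate the conflicting IPv4 range into a unique IPv6 route, and advertise that route from the VPC&rsquo;s subnet router. The site IDs and CIDR below are placeholders:</p>
<pre><code class="language-shell"># Map the same 10.0.0.0/16 to distinct IPv6 routes, one site ID per VPC
tailscale debug via 1 10.0.0.0/16   # e.g. Staging
tailscale debug via 2 10.0.0.0/16   # e.g. Production

# On each VPC's subnet router, advertise the route printed above
# (use the exact output from the command for your site ID)
sudo tailscale up --advertise-routes=fd7a:115c:a1e0:b1a:0:1:a00:0/112
</code></pre>
<p>Clients then address hosts by the 4via6 address, so the two identical 10.0.1.23s resolve unambiguously.</p>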
<h3 id="bad-tailscale-advice-from-gpt-4">Bad Tailscale Advice from GPT-4</h3>
<p>I adore and rely on paid ChatGPT for help and advice across many knowledge domains with great success. Its specific advice around the details of Tailscale setup on AWS, however, turned out to be faulty and sent me off in the wrong direction several times. I think this knowledge niche was small and specific enough that ChatGPT&rsquo;s training data wasn&rsquo;t up to the task. My own lack of knowledge probably had me asking the wrong questions, and prevented me from recognizing bad advice. Filed away as a caution for similar situations in the future &hellip;</p>
<h3 id="tailscale-with-pihole-and-unbound">Tailscale With Pihole and Unbound</h3>
<p>I use <a href="https://pi-hole.net/">Pi-hole</a> on my home network, for network-level ad and tracker blocking; it&rsquo;s awesome, as I always realize when I&rsquo;m away from home and see what I&rsquo;ve been &ldquo;missing&rdquo;!</p>
<p>I&rsquo;ve also been using <a href="https://docs.pi-hole.net/guides/dns/unbound/">Unbound</a>, a recursive DNS server, running beside Pi-hole on the same Raspberry Pi. Unbound is faster and more secure than even the fast public DNS services like Cloudflare DNS. Something about how Unbound works, however, makes Tailscale&mdash;which also needs to be smart about DNS resolution&mdash;unhappy. This took quite a few hours to diagnose. I suspect it&rsquo;s fixable, but I haven&rsquo;t had time to spend on it, and for now have Unbound disabled.</p>
<h3 id="tailscale-exit-nodes">Tailscale Exit Nodes</h3>
<p>Speaking of the poor Internet experience I get when away from home, Tailscale has a nice fix for that: <a href="https://tailscale.com/kb/1103/exit-nodes">Exit Nodes</a>, which let me route all my internet traffic through my well-protected home network. Another thing I haven&rsquo;t had time to implement, but it&rsquo;s on the list.</p>
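<p>When I do get to it, the setup is again a pair of one-liners (per the Exit Nodes doc; <code>home-server</code> is a placeholder device name):</p>
<pre><code class="language-shell"># On the home machine that will route traffic for others:
sudo tailscale up --advertise-exit-node

# On the traveling laptop, after approving the node in the admin console:
sudo tailscale up --exit-node=home-server
</code></pre>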
]]></content:encoded>
    </item>
    
    <item>
      <title>Russ Cox&#39;s XZ Timeline</title>
      <link>https://craftycto.com/elsewhere/russ-cox-xz-timeline/</link>
      <pubDate>Tue, 02 Apr 2024 14:08:52 -0500</pubDate>
      
      <guid>https://craftycto.com/elsewhere/russ-cox-xz-timeline/</guid>
      <description>Nice writeup from Russ Cox on the (incredibly long) timeline of the XZ backdoor</description>
      <content:encoded><![CDATA[<p>Fascinating&mdash;and terrifying. Cox includes a solid list of Further Reading links at the end of his post.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Simon Willison&#39;s LLM Tool: Now I Have 50 LLMs</title>
      <link>https://craftycto.com/micro/50-llms/</link>
      <pubDate>Mon, 25 Mar 2024 15:21:49 -0500</pubDate>
      
      <guid>https://craftycto.com/micro/50-llms/</guid>
      <description>I updated my installation of Willison&amp;#39;s LLM tool to add plugins, and now I have 50 LLMs at my fingertips, including 15 local models, which get installed on demand.</description>
      <content:encoded><![CDATA[<p>I&rsquo;ve been intending to get my feet wet with the <a href="https://www.anthropic.com/news/claude-3-family">new Claude models</a> from Anthropic, especially Opus. When I saw Simon&rsquo;s <a href="/elsewhere/two-from-willison/">recent &ldquo;sidequests&rdquo; post</a> I decided to update to the latest version of his <a href="https://llm.datasette.io/en/stable/">LLM CLI tool</a> (part of <a href="https://datasette.io/">Datasette</a>) and get Opus access from the command line, using the Claude plugin.</p>
<p>While I was at it, I connected Google Gemini, Anyscale Endpoints, and gpt4all, the latter providing the abovementioned 15 local models, which auto-install when you first request a model. 50 LLMs in all&mdash;and this is far from the <a href="https://llm.datasette.io/en/stable/plugins/directory.html">complete list</a> LLM provides access to through its plugin architecture.</p>
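<p>If you want to replicate this, the whole setup is just a few commands. The plugin names below are the ones I used from the plugin directory; double-check them against the current docs, and note the cloud models each need an API key set via <code>llm keys set</code>:</p>
<pre><code class="language-shell"># Install the LLM CLI, then plugins for Claude, Gemini, and gpt4all
pip install llm
llm install llm-claude-3 llm-gemini llm-gpt4all

# List every model now available, local and remote
llm models

# Run a prompt against Claude 3 Opus from the command line
llm -m claude-3-opus 'Summarize the Adirondack Park in two sentences'
</code></pre>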
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/micro/50-llms/50-models-better.png"><picture>
    <source type="image/webp" srcset="/micro/50-llms/50-models-better_hu4769019394249011648.webp 320w, /micro/50-llms/50-models-better_hu7090598232154547734.webp 640w" sizes="75vw" />
    <source type="image/jpeg" srcset="/micro/50-llms/50-models-better_hu3344990039812682765.jpg 320w, /micro/50-llms/50-models-better_hu1506795745636653877.jpg 640w" sizes="75vw" />
    
      <img src="/micro/50-llms/50-models-better_hu1506795745636653877.jpg" width="689" height="906" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Very handy for testing, such as &ldquo;how does Llama 70b compare to ChatGPT GPT-4 on this question?&rdquo;</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Two Worthwhile Reads From Simon Willison</title>
      <link>https://craftycto.com/elsewhere/two-from-willison/</link>
      <pubDate>Mon, 25 Mar 2024 08:42:20 -0500</pubDate>
      
      <guid>https://craftycto.com/elsewhere/two-from-willison/</guid>
      <description>Claude 3 Opus and GPT-4 to Tackle a GIS &amp;#39;Sidequest&amp;#39;; and Getting GPT-4 to Write, Compile, and Run C Code</description>
      <content:encoded><![CDATA[<p>I subscribe to Simon Willison&rsquo;s email newsletter (really just selected blog posts via email), and his last two were excellent.</p>
<p>Last week he talked about his casual daily use of the new Claude 3 Opus model via his own LLM tool, as well as ChatGPT GPT-4. He <a href="https://simonwillison.net/2024/Mar/22/claude-and-chatgpt-case-study/">describes such use as &ldquo;sidequests,&rdquo;</a> a description that resonates with me and my own daily LLM use. In this example, over an elapsed time of about 6 minutes, he was able to get an accurate GeoJSON for New York State&rsquo;s huge Adirondack Park starting with a deeply strange shapefile.</p>
<p>Over the weekend, he shared a <a href="https://simonwillison.net/2024/Mar/23/building-c-extensions-for-sqlite-with-chatgpt-code-interpreter/">longer prototyping effort</a> using ChatGPT GPT-4 Code Interpreter, where he managed to get GPT-4 to write, compile, and run a non-trivial bit of C code: a SQLite C extension for vector similarity. Code Interpreter runs Python, not C, but Simon suspected, and confirmed, that he could get GPT-4 to run <code>gcc</code> from within Python, and that was enough of a shim to get the job done.</p>
<p>Both are excellent reads that mirror my own experience in terms of the myriad ways I take advantage of LLM tools.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Moxie on Murder</title>
      <link>https://craftycto.com/micro/moxie-on-murder/</link>
      <pubDate>Mon, 11 Mar 2024 19:28:49 -0500</pubDate>
      
      <guid>https://craftycto.com/micro/moxie-on-murder/</guid>
      <description>I did a double-take when I saw Moxie Marlinspike in the credits as co-starring in A Murder at the End of the World episode 6 (SLIGHT SPOILERS WARNING)</description>
<content:encoded><![CDATA[<p><a href="https://www.themoviedb.org/tv/134095-a-murder-at-the-end-of-the-world?language=en-US">A Murder at the End of the World</a> grew on me over the course of the season, and by season&rsquo;s end it had reached &ldquo;definitely watch&rdquo; status. Not the highest possible status, &ldquo;Can It Be Tuesday Already Please?&rdquo;, which is, at the moment, held by Hulu&rsquo;s new <a href="https://www.themoviedb.org/tv/126308-shogun?language=en-US">Shōgun series</a>.</p>
<p>I rarely watch credits; sometimes to savor a great season finale, sometimes if there&rsquo;s interesting music I want to hear out. With episode 6, it was a combination: an intense episode finish, building towards the season finale in episode 7; and an excellent electric guitar instrumental, <a href="https://music.apple.com/us/album/beni-said/1120597959?i=1120598579">Beni Said by 75 Dollar Bill</a>, that began playing with the credits. For whatever reason, I was still watching the credits by the time they reached CO-STARRING, 1:51 from the end of the episode. And there, listed first: Moxie Marlinspike.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/micro/moxie-on-murder/moxie-credits.webp"><picture>
    <source type="image/webp" srcset="/micro/moxie-on-murder/moxie-credits_hu2818452611183313533.webp 320w, /micro/moxie-on-murder/moxie-credits_hu15619533218556321229.webp 640w, /micro/moxie-on-murder/moxie-credits_hu3591227851259071128.webp 768w, /micro/moxie-on-murder/moxie-credits_hu15437200474827111949.webp 1024w, /micro/moxie-on-murder/moxie-credits_hu2340558587995692353.webp 1280w, /micro/moxie-on-murder/moxie-credits_hu5130365559122132996.webp 1536w, /micro/moxie-on-murder/moxie-credits_hu5830277650566607789.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/micro/moxie-on-murder/moxie-credits_hu1476480535575318734.jpg 320w, /micro/moxie-on-murder/moxie-credits_hu13262272167442692262.jpg 640w, /micro/moxie-on-murder/moxie-credits_hu4272807584087255140.jpg 768w, /micro/moxie-on-murder/moxie-credits_hu9616201408400594704.jpg 1024w, /micro/moxie-on-murder/moxie-credits_hu11366024953617173331.jpg 1280w, /micro/moxie-on-murder/moxie-credits_hu10446877040521221042.jpg 1536w, /micro/moxie-on-murder/moxie-credits_hu14780925317581198580.jpg 1920w" sizes="100vw" />
    
      <img src="/micro/moxie-on-murder/moxie-credits_hu13262272167442692262.jpg" width="3527" height="1718" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Moxie&rsquo;s a <a href="https://en.wikipedia.org/wiki/Moxie_Marlinspike">legend in tech</a> as creator of the <a href="https://en.wikipedia.org/wiki/Signal_%28messaging_app%29">Signal secure messaging app</a>&mdash;and more importantly to cryptographers, the <a href="https://en.wikipedia.org/wiki/Signal_Protocol">Signal Protocol</a> and <a href="https://en.wikipedia.org/wiki/Double_Ratchet_Algorithm">Double Ratchet Algorithm</a>&mdash;and when I saw his name I immediately was looking for &ldquo;technical advisor&rdquo; credits. But no, this was actually CO-STARRING. I filed it away and determined to research it.</p>
<p>The following day I did a bit of searching, with little to show for it other than IMDB confirming that yes, Moxie had indeed appeared in S1 E6. I refined the query a bit and finally found a few 2023 mentions of Moxie technical advising as well.</p>
<p>I use <a href="https://help.kagi.com/kagi/company/">Kagi search</a>, and beyond the core fact that, in stark contrast with Google, Kagi doesn&rsquo;t suck, I&rsquo;ve come to value its &ldquo;Quick Answer&rdquo; feature, which feeds search results through an LLM and generates an excellent summary with references. Here it is, wrapping up both aspects of Moxie&rsquo;s contribution to Murder at the End of the World:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/micro/moxie-on-murder/moxie-murder-search.webp"><picture>
    <source type="image/webp" srcset="/micro/moxie-on-murder/moxie-murder-search_hu2797813908475365741.webp 320w, /micro/moxie-on-murder/moxie-murder-search_hu1388802948617647567.webp 640w, /micro/moxie-on-murder/moxie-murder-search_hu6503943037856448396.webp 768w, /micro/moxie-on-murder/moxie-murder-search_hu387613809320767079.webp 1024w, /micro/moxie-on-murder/moxie-murder-search_hu3912567112012870822.webp 1280w, /micro/moxie-on-murder/moxie-murder-search_hu9560995731467279091.webp 1536w, /micro/moxie-on-murder/moxie-murder-search_hu3896417974078791601.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/micro/moxie-on-murder/moxie-murder-search_hu11419936781715133878.jpg 320w, /micro/moxie-on-murder/moxie-murder-search_hu5080123145384180794.jpg 640w, /micro/moxie-on-murder/moxie-murder-search_hu8666131365953983830.jpg 768w, /micro/moxie-on-murder/moxie-murder-search_hu10523688665622303294.jpg 1024w, /micro/moxie-on-murder/moxie-murder-search_hu11209140073866061420.jpg 1280w, /micro/moxie-on-murder/moxie-murder-search_hu16824766503216400135.jpg 1536w, /micro/moxie-on-murder/moxie-murder-search_hu14160698320995729201.jpg 1920w" sizes="100vw" />
    
      <img src="/micro/moxie-on-murder/moxie-murder-search_hu5080123145384180794.jpg" width="3538" height="3908" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>So I was right about the technical advisor thing; hacking is a significant part of Murder&rsquo;s story, and it was accurately presented throughout the series. The only major tech flub I saw was the data center fire scene in episode 7. Maybe Moxie&rsquo;s advisory contract had run out by then, or maybe they had a mini-GOT-gotta-finish-this-somehow brain cramp. But no need to go there; the final episode was still solid despite the painfully stretched credibility of that scene.</p>
<p>I&rsquo;ve never seen Moxie in person or even on video; maybe in a Wikipedia photo. So I had no clue that was him playing Max in episode 6. But I&rsquo;m happy I noticed him in the credits and did this research.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Willison: The killer app of Gemini Pro 1.5 is video</title>
      <link>https://craftycto.com/elsewhere/willison-gemini-pro-video/</link>
      <pubDate>Thu, 22 Feb 2024 07:54:30 -0600</pubDate>
      
      <guid>https://craftycto.com/elsewhere/willison-gemini-pro-video/</guid>
      <description>Simon Willison tries out Gemini Pro 1.5 on video, and suggests its 1M token context size opens up powerful new opportunities using video prompts</description>
      <content:encoded><![CDATA[<p>Willison&rsquo;s experience with, and reaction to, Gemini 1.5 Pro extracting structured output from video prompts parallels <a href="/blog/chatgpt-4v/">my own experience</a> using GPT-4 Vision to extract structure from heirloom recipe images (often handwritten and horribly mangled):</p>
<blockquote>
<p>&hellip; I&rsquo;m pretty astonished by this.</p>
<p>&hellip; I find those results pretty astounding.</p>
<p>The ability to analyze video like this feels SO powerful. Being able to take a 20 second video of a bookshelf and get back a JSON array of those books is just the first thing I thought to try.</p>
</blockquote>
]]></content:encoded>
    </item>
    
    <item>
      <title>Using Multimodal AI to Capture and Enrich Heirloom Recipes</title>
      <link>https://craftycto.com/blog/chatgpt-4v/</link>
      <pubDate>Mon, 19 Feb 2024 05:17:47 -0600</pubDate>
      
      <guid>https://craftycto.com/blog/chatgpt-4v/</guid>
      <description>I applied OpenAI&amp;#39;s GPT-4 Vision model and Chat Completions API to preserve a treasure-trove of legacy family recipes.</description>
<content:encoded><![CDATA[<p>OpenAI is constantly <a href="https://openai.com/sora">firehosing us with new capabilities</a>. It&rsquo;s tough just to keep up with the announcements, let alone understand the new offerings in depth. The only way I know to develop that kind of deep understanding is to build something real and non-trivial.</p>
<p>I was fortunate to have a somewhat complex personal need, just as the new GPT-4 Vision model and updated Chat Completions API landed in my OpenAI account. The tale of my exploration follows.</p>
<p><em>Note&mdash;Click any image to zoom</em></p>
<h2 id="sept-25th-chatgpt-can-now-see-hear-and-speak-">Sept. 25th: &ldquo;ChatGPT can now see, hear, and speak &hellip;&rdquo;</h2>
<p>OpenAI had teased vision support earlier, but the actual <a href="https://openai.com/blog/chatgpt-can-now-see-hear-and-speak">announcement</a> of Vision availability happened on Sept. 25th under the above headline.</p>
<p>I&rsquo;m not an OpenAI VIP, sorry. I&rsquo;m just a paid ChatGPT Plus user and Tier-1-going-on-Tier-2 API user. I don&rsquo;t get early access; I get <em>late</em> access.</p>
<p>So while OpenAI announced GPT-4 Vision on Sept. 25th, it didn&rsquo;t show up for me until weeks later. OpenAI doesn&rsquo;t communicate actual feature availability to its retail customers, so all I know for sure is that I noticed Vision had shown up in my ChatGPT Plus sometime in early November.</p>
<h2 id="my-burning-need-family-heirloom-recipes">My burning need: family heirloom recipes</h2>
<p>During our August vacation at the family lake house, we uncovered a treasure-trove of recipes, which my daughter-in-law Julie was kind enough to capture using the iOS Notes app and share in PDF format. On my to-do list since then had been figuring out how to get those recipes into a recipe app. I use Paprika 3, while Julie has been trying out Recipe Keeper.</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/chatgpt-4v/PicklesBreadandButter.webp"><picture>
    <source type="image/webp" srcset="/blog/chatgpt-4v/PicklesBreadandButter_hu12318155748107080735.webp 320w, /blog/chatgpt-4v/PicklesBreadandButter_hu10303674040299452467.webp 640w, /blog/chatgpt-4v/PicklesBreadandButter_hu9673812261989832292.webp 768w, /blog/chatgpt-4v/PicklesBreadandButter_hu2381721664917183126.webp 1024w, /blog/chatgpt-4v/PicklesBreadandButter_hu88561478005762701.webp 1280w, /blog/chatgpt-4v/PicklesBreadandButter_hu11787458460692750038.webp 1536w, /blog/chatgpt-4v/PicklesBreadandButter_hu6297542469202202794.webp 1920w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/chatgpt-4v/PicklesBreadandButter_hu7317753353608766475.jpg 320w, /blog/chatgpt-4v/PicklesBreadandButter_hu17186994889303758960.jpg 640w, /blog/chatgpt-4v/PicklesBreadandButter_hu8723046985592484612.jpg 768w, /blog/chatgpt-4v/PicklesBreadandButter_hu12135048784054990536.jpg 1024w, /blog/chatgpt-4v/PicklesBreadandButter_hu15462130383695865746.jpg 1280w, /blog/chatgpt-4v/PicklesBreadandButter_hu77538541428118013.jpg 1536w, /blog/chatgpt-4v/PicklesBreadandButter_hu15712547796578845536.jpg 1920w" sizes="37vw" />
    
      <img src="/blog/chatgpt-4v/PicklesBreadandButter_hu17186994889303758960.jpg" width="6013" height="9068" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[75%] mx-2 flex justify-center"><a href="/blog/chatgpt-4v/picnic_cake.webp"><picture>
    <source type="image/webp" srcset="/blog/chatgpt-4v/picnic_cake_hu17341322776214975950.webp 320w, /blog/chatgpt-4v/picnic_cake_hu7367323493689867216.webp 640w, /blog/chatgpt-4v/picnic_cake_hu14182837988515302504.webp 768w, /blog/chatgpt-4v/picnic_cake_hu13666689628560928604.webp 1024w, /blog/chatgpt-4v/picnic_cake_hu6665370018651037614.webp 1280w, /blog/chatgpt-4v/picnic_cake_hu3259335321599967854.webp 1536w, /blog/chatgpt-4v/picnic_cake_hu2401769931396893935.webp 1920w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/chatgpt-4v/picnic_cake_hu11411917264683798303.jpg 320w, /blog/chatgpt-4v/picnic_cake_hu6246359020851150438.jpg 640w, /blog/chatgpt-4v/picnic_cake_hu9503604617286114439.jpg 768w, /blog/chatgpt-4v/picnic_cake_hu10103415394615183889.jpg 1024w, /blog/chatgpt-4v/picnic_cake_hu13451900243182094338.jpg 1280w, /blog/chatgpt-4v/picnic_cake_hu5316062473623930730.jpg 1536w, /blog/chatgpt-4v/picnic_cake_hu8963958025637650712.jpg 1920w" sizes="37vw" />
    
      <img src="/blog/chatgpt-4v/picnic_cake_hu6246359020851150438.jpg" width="5428" height="9126" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>What the family really needs is both a great facsimile of the original paper recipe (high resolution, correct orientation, zoomable), and a fully-digital recipe with all the goodies that apps like Paprika and Recipe Keeper give you&mdash;ingredient scaling, store lists, easy sharing, and so on.</p>
<p>Paprika has been around forever, and several years back I had tried importing legacy recipe images, but at that point I wasn&rsquo;t able to get the &ldquo;high resolution, correct orientation, zoomable&rdquo; solution I desired. So I wondered whether Julie had had any luck with Recipe Keeper, and I began poking around the current Paprika.</p>
<p>In the end it turned out that the current version of Paprika 3 has quite strong image support, while in limited testing, I had trouble getting Recipe Keeper to handle image orientations correctly. So the recipe app I was already using looked to be the best solution. Decision made.</p>
<p>We still had a lot of work ahead of us, though&mdash;laboriously transcribing from those torn, stained, mostly handwritten family treasures. And this particular trove is probably only 20% of the full library; my brother has the larger collection, probably at least 3X as many, and my sister and wife have others. This was shaping up to be a long side project.</p>
<h2 id="nov-14th-recipe-image--chatgpt-plus-with-vision">Nov. 14th: Recipe image =&gt; ChatGPT Plus with Vision</h2>
<p>So when I discovered that my ChatGPT Plus account now had the &ldquo;new stuff&rdquo; including vision support, I had a ready-made test case for Vision.</p>
<p>I know for sure that it was November 14th when I began toying with it. My first (failed) Vision request was:</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/chatgpt-4v/first_ask.png"><picture>
    <source type="image/webp" srcset="/blog/chatgpt-4v/first_ask_hu1340450349034437927.webp 320w, /blog/chatgpt-4v/first_ask_hu15097985203113973916.webp 640w, /blog/chatgpt-4v/first_ask_hu8942395519856875195.webp 768w, /blog/chatgpt-4v/first_ask_hu5639601611381070177.webp 1024w, /blog/chatgpt-4v/first_ask_hu18381478782248307119.webp 1280w, /blog/chatgpt-4v/first_ask_hu4533013868291098648.webp 1536w" sizes="75vw" />
    <source type="image/jpeg" srcset="/blog/chatgpt-4v/first_ask_hu7938505736399768984.jpg 320w, /blog/chatgpt-4v/first_ask_hu13350816010601900641.jpg 640w, /blog/chatgpt-4v/first_ask_hu17156918990586459575.jpg 768w, /blog/chatgpt-4v/first_ask_hu16166166149589552189.jpg 1024w, /blog/chatgpt-4v/first_ask_hu3564419108385496008.jpg 1280w, /blog/chatgpt-4v/first_ask_hu4611208738507088994.jpg 1536w" sizes="75vw" />
    
      <img src="/blog/chatgpt-4v/first_ask_hu13350816010601900641.jpg" width="1676" height="1564" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>This first request caused GPT-4V to puke&mdash;PDFs aren&rsquo;t considered an image format. I used macOS Preview to export a PNG file from another one of my recipe PDFs and fed that to Vision. This was the input:</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/chatgpt-4v/CheesecakeMargieJohnsons.webp"><picture>
    <source type="image/webp" srcset="/blog/chatgpt-4v/CheesecakeMargieJohnsons_hu1498008705752921010.webp 320w, /blog/chatgpt-4v/CheesecakeMargieJohnsons_hu12940441712949401313.webp 640w, /blog/chatgpt-4v/CheesecakeMargieJohnsons_hu18122300231967922272.webp 768w, /blog/chatgpt-4v/CheesecakeMargieJohnsons_hu346713728392687576.webp 1024w, /blog/chatgpt-4v/CheesecakeMargieJohnsons_hu14067950214170775746.webp 1280w, /blog/chatgpt-4v/CheesecakeMargieJohnsons_hu11531766844623899973.webp 1536w, /blog/chatgpt-4v/CheesecakeMargieJohnsons_hu3997492905051288397.webp 1920w" sizes="75vw" />
    <source type="image/jpeg" srcset="/blog/chatgpt-4v/CheesecakeMargieJohnsons_hu12861744871300174132.jpg 320w, /blog/chatgpt-4v/CheesecakeMargieJohnsons_hu17398393947585704786.jpg 640w, /blog/chatgpt-4v/CheesecakeMargieJohnsons_hu14766842191739538159.jpg 768w, /blog/chatgpt-4v/CheesecakeMargieJohnsons_hu12857688347257873709.jpg 1024w, /blog/chatgpt-4v/CheesecakeMargieJohnsons_hu7535363819869318239.jpg 1280w, /blog/chatgpt-4v/CheesecakeMargieJohnsons_hu13388287800982552669.jpg 1536w, /blog/chatgpt-4v/CheesecakeMargieJohnsons_hu16311836097494760229.jpg 1920w" sizes="75vw" />
    
      <img src="/blog/chatgpt-4v/CheesecakeMargieJohnsons_hu17398393947585704786.jpg" width="2166" height="1489" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>I fully expected an interesting failure. I&rsquo;ve been doing technology for over four decades, which coincidentally is just about how long OCR&mdash;optical character recognition&mdash;has been available for personal use. And OCR has sucked for all of those decades. OCRing scans of printed documents is painful and error-prone. But with our legacy recipes, we faced much worse, like Margie Johnson&rsquo;s Cheesecake above&mdash;typed with a worn ribbon, wrinkled, stained.</p>
<p>My expectations were low, but on the other hand, GPT-4 blows my mind several times a week, so what the heck, let&rsquo;s see what happens.</p>
<h2 id="ocr-is-dead-long-live-vision">OCR is dead, long live Vision</h2>
<p>This was GPT-4V&rsquo;s response to that image:</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/chatgpt-4v/gpt4v-response-1.png"><picture>
    <source type="image/webp" srcset="/blog/chatgpt-4v/gpt4v-response-1_hu11738862489189699324.webp 320w, /blog/chatgpt-4v/gpt4v-response-1_hu17412774995217624906.webp 640w, /blog/chatgpt-4v/gpt4v-response-1_hu6577058195085922932.webp 768w, /blog/chatgpt-4v/gpt4v-response-1_hu11637206869651977652.webp 1024w, /blog/chatgpt-4v/gpt4v-response-1_hu12128098548960195178.webp 1280w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/chatgpt-4v/gpt4v-response-1_hu11934258699623357176.jpg 320w, /blog/chatgpt-4v/gpt4v-response-1_hu17470745548782610268.jpg 640w, /blog/chatgpt-4v/gpt4v-response-1_hu5916357442673133113.jpg 768w, /blog/chatgpt-4v/gpt4v-response-1_hu12131936132744202971.jpg 1024w, /blog/chatgpt-4v/gpt4v-response-1_hu7837816758614270548.jpg 1280w" sizes="37vw" />
    
      <img src="/blog/chatgpt-4v/gpt4v-response-1_hu17470745548782610268.jpg" width="1344" height="1678" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[75%] mx-2 flex justify-center"><a href="/blog/chatgpt-4v/gpt4v-response-2.png"><picture>
    <source type="image/webp" srcset="/blog/chatgpt-4v/gpt4v-response-2_hu7820556652383782095.webp 320w, /blog/chatgpt-4v/gpt4v-response-2_hu14422857634939357919.webp 640w, /blog/chatgpt-4v/gpt4v-response-2_hu1029427134533824135.webp 768w, /blog/chatgpt-4v/gpt4v-response-2_hu5141993740119188647.webp 1024w, /blog/chatgpt-4v/gpt4v-response-2_hu2918010002909475552.webp 1280w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/chatgpt-4v/gpt4v-response-2_hu15853434267666265713.jpg 320w, /blog/chatgpt-4v/gpt4v-response-2_hu9795637946805182336.jpg 640w, /blog/chatgpt-4v/gpt4v-response-2_hu9375467552408289741.jpg 768w, /blog/chatgpt-4v/gpt4v-response-2_hu2426076898351911678.jpg 1024w, /blog/chatgpt-4v/gpt4v-response-2_hu4190904311379811329.jpg 1280w" sizes="37vw" />
    
      <img src="/blog/chatgpt-4v/gpt4v-response-2_hu9795637946805182336.jpg" width="1300" height="872" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>I was expecting &ldquo;maybe a little better than your grandma&rsquo;s OCR,&rdquo; but this was shockingly far beyond that. Not only did 4V get almost everything right from the Margie recipe&mdash;worn ribbon, wrinkles, stains and all&mdash;but GPT-4V <em>understood</em> it and organized it for me. For example, 4V correctly &ldquo;gets it&rdquo; that this recipe has three main elements&mdash;crust, filling, and topping&mdash;when the recipe never uses any of those words! And it helpfully organizes its output that way.</p>
<p>With traditional OCR, at best I&rsquo;d get most of the words, in the order they appear on the page. But this recipe isn&rsquo;t plain text that reads left-to-right, top-to-bottom. Take the initial crust block, which uses the typist&rsquo;s trick of a column of &ldquo;)&rdquo; characters as a column separator: ingredients on the left, instructions on the right. OCR would just give me a useless left-to-right, top-to-bottom character stream; but GPT-4V <em>understood</em> what it was <em>seeing</em> and placed the directions below the ingredients in a structured manner. Holy shit.</p>
<h2 id="guys-guys-this-has-structure-im-seeing-structure">&ldquo;Guys? Guys? This has structure. I&rsquo;m [seeing] structure.&rdquo;</h2>
<p>I love the scene in the movie <a href="https://www.themoviedb.org/movie/686-contact?language=en-US">Contact</a> where Kent, the blind SETI researcher, says:</p>
<blockquote>
<p>Guys? Guys? Um&hellip;</p>
<p>You know those interlaced frames that we thought were noise?</p>
<p>This has structure. I&rsquo;m hearing structure.</p>
</blockquote>
<p>Seeing that chat response, it wasn&rsquo;t just that I was getting better text than old-school OCR. GPT-4V was giving me structure, without my even asking. No surprise that my next question was:</p>
<blockquote>
<p>Is there a standard interchange format for recipes?</p>
</blockquote>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/chatgpt-4v/gpt4v-recipe-interchange.png"><picture>
    <source type="image/webp" srcset="/blog/chatgpt-4v/gpt4v-recipe-interchange_hu17188858694156920574.webp 320w, /blog/chatgpt-4v/gpt4v-recipe-interchange_hu4805393654138002232.webp 640w, /blog/chatgpt-4v/gpt4v-recipe-interchange_hu11708123446355579478.webp 768w, /blog/chatgpt-4v/gpt4v-recipe-interchange_hu17920696324595212768.webp 1024w, /blog/chatgpt-4v/gpt4v-recipe-interchange_hu5941996166184056797.webp 1280w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/chatgpt-4v/gpt4v-recipe-interchange_hu4622890631911722766.jpg 320w, /blog/chatgpt-4v/gpt4v-recipe-interchange_hu15715365874639130583.jpg 640w, /blog/chatgpt-4v/gpt4v-recipe-interchange_hu8919419101555505764.jpg 768w, /blog/chatgpt-4v/gpt4v-recipe-interchange_hu948795385499872724.jpg 1024w, /blog/chatgpt-4v/gpt4v-recipe-interchange_hu13732436814575107566.jpg 1280w" sizes="100vw" />
    
      <img src="/blog/chatgpt-4v/gpt4v-recipe-interchange_hu15715365874639130583.jpg" width="1366" height="1830" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Oh ho, I see Paprika on the list!</p>
<h2 id="dec-17th-image--chatgpt-4v--yaml--paprika-import">Dec. 17th: Image =&gt; ChatGPT-4V =&gt; YAML =&gt; Paprika import</h2>
<p>I explored the various file formats that Paprika could import. Most of these were specific to what other recipe apps could export, to enable migration onto Paprika. But Paprika also supports a more generalized import from YAML format, and provided a couple of helpful examples. That looked like my best option, so I began the work of convincing ChatGPT to take a recipe image and, instead of giving me semi-structured text output, produce Paprika-flavored YAML. It took a bit of research and several tries, but eventually I was able to feed in an image and get Paprika-compatible recipe YAML back. Here&rsquo;s the source image and YAML output:</p>
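<p>To make the target concrete, here is a toy sketch in Python of the kind of YAML structure being asked for. The field names (<code>name</code>, <code>ingredients</code>, <code>directions</code>, <code>notes</code>) are illustrative guesses at the shape of such a file, not Paprika&rsquo;s actual import schema&mdash;Paprika&rsquo;s own example files are the authority on the real keys:</p>

```python
# Toy sketch of assembling recipe YAML by hand (stdlib only).
# The keys used here are illustrative assumptions, NOT necessarily
# Paprika's exact import schema -- consult Paprika's example files.

def recipe_to_yaml(name: str, ingredients: list, directions: list,
                   notes: str = "") -> str:
    """Render one recipe as a simple YAML list entry with block scalars."""
    lines = [f"- name: {name}", "  ingredients: |"]
    lines += [f"    {item}" for item in ingredients]
    lines.append("  directions: |")
    lines += [f"    {step}" for step in directions]
    if notes:
        lines.append(f"  notes: {notes}")
    return "\n".join(lines) + "\n"

print(recipe_to_yaml(
    "Betty's Chocolate Frosting",
    ["2 cups sugar", "1/2 cup cocoa"],
    ["Combine sugar and cocoa.", "Beat until smooth."],
))
```

<p>In the actual workflow, of course, ChatGPT produced the YAML directly; this sketch just shows the kind of structure the prompts were steering it toward.</p>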
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/chatgpt-4v/bettys_chocolate_frosting.webp"><picture>
    <source type="image/webp" srcset="/blog/chatgpt-4v/bettys_chocolate_frosting_hu10585170804514237028.webp 320w, /blog/chatgpt-4v/bettys_chocolate_frosting_hu15502793002836284447.webp 640w, /blog/chatgpt-4v/bettys_chocolate_frosting_hu5905893394835775600.webp 768w, /blog/chatgpt-4v/bettys_chocolate_frosting_hu13025108189069174156.webp 1024w, /blog/chatgpt-4v/bettys_chocolate_frosting_hu6086143490913081608.webp 1280w, /blog/chatgpt-4v/bettys_chocolate_frosting_hu6699869919404810380.webp 1536w, /blog/chatgpt-4v/bettys_chocolate_frosting_hu1337969588582208231.webp 1920w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/chatgpt-4v/bettys_chocolate_frosting_hu11265838248815094737.jpg 320w, /blog/chatgpt-4v/bettys_chocolate_frosting_hu2206731962296834554.jpg 640w, /blog/chatgpt-4v/bettys_chocolate_frosting_hu12046467753848013813.jpg 768w, /blog/chatgpt-4v/bettys_chocolate_frosting_hu6061312309494684480.jpg 1024w, /blog/chatgpt-4v/bettys_chocolate_frosting_hu13936836270198928739.jpg 1280w, /blog/chatgpt-4v/bettys_chocolate_frosting_hu8077584685425858389.jpg 1536w, /blog/chatgpt-4v/bettys_chocolate_frosting_hu7735239077874424698.jpg 1920w" sizes="37vw" />
    
      <img src="/blog/chatgpt-4v/bettys_chocolate_frosting_hu2206731962296834554.jpg" width="3035" height="1839" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[75%] mx-2 flex justify-center"><a href="/blog/chatgpt-4v/bettys-chocolate-yaml.png"><picture>
    <source type="image/webp" srcset="/blog/chatgpt-4v/bettys-chocolate-yaml_hu10398228031059511721.webp 320w, /blog/chatgpt-4v/bettys-chocolate-yaml_hu9760123825910671190.webp 640w, /blog/chatgpt-4v/bettys-chocolate-yaml_hu7790784088326561048.webp 768w, /blog/chatgpt-4v/bettys-chocolate-yaml_hu447955277949061526.webp 1024w, /blog/chatgpt-4v/bettys-chocolate-yaml_hu10312823426537327651.webp 1280w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/chatgpt-4v/bettys-chocolate-yaml_hu12089166201922545487.jpg 320w, /blog/chatgpt-4v/bettys-chocolate-yaml_hu1139989766366974351.jpg 640w, /blog/chatgpt-4v/bettys-chocolate-yaml_hu2675409721239306869.jpg 768w, /blog/chatgpt-4v/bettys-chocolate-yaml_hu15934483638056997665.jpg 1024w, /blog/chatgpt-4v/bettys-chocolate-yaml_hu2996650074188769915.jpg 1280w" sizes="37vw" />
    
      <img src="/blog/chatgpt-4v/bettys-chocolate-yaml_hu1139989766366974351.jpg" width="1364" height="1876" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>And here&rsquo;s what it looks like when imported into Paprika, iPad and iPhone versions:</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/chatgpt-4v/bettys_ipad.png"><picture>
    <source type="image/webp" srcset="/blog/chatgpt-4v/bettys_ipad_hu17814604193795377354.webp 320w, /blog/chatgpt-4v/bettys_ipad_hu15663731503949232265.webp 640w, /blog/chatgpt-4v/bettys_ipad_hu13280849517290018236.webp 768w, /blog/chatgpt-4v/bettys_ipad_hu9209829327438426901.webp 1024w, /blog/chatgpt-4v/bettys_ipad_hu11481426653632584524.webp 1280w, /blog/chatgpt-4v/bettys_ipad_hu2800070832890143796.webp 1536w, /blog/chatgpt-4v/bettys_ipad_hu12721498587687507705.webp 1920w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/chatgpt-4v/bettys_ipad_hu18153310009321711146.jpg 320w, /blog/chatgpt-4v/bettys_ipad_hu9269740142671357146.jpg 640w, /blog/chatgpt-4v/bettys_ipad_hu16434757253058565052.jpg 768w, /blog/chatgpt-4v/bettys_ipad_hu11689727783263301868.jpg 1024w, /blog/chatgpt-4v/bettys_ipad_hu3281318272032123961.jpg 1280w, /blog/chatgpt-4v/bettys_ipad_hu1876267302248703824.jpg 1536w, /blog/chatgpt-4v/bettys_ipad_hu13706790998113867742.jpg 1920w" sizes="37vw" />
    
      <img src="/blog/chatgpt-4v/bettys_ipad_hu9269740142671357146.jpg" width="2732" height="2048" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[75%] mx-2 flex justify-center"><a href="/blog/chatgpt-4v/bettys_iphone.png"><picture>
    <source type="image/webp" srcset="/blog/chatgpt-4v/bettys_iphone_hu5491666777415933551.webp 320w, /blog/chatgpt-4v/bettys_iphone_hu5281264969067018783.webp 640w, /blog/chatgpt-4v/bettys_iphone_hu9955430316789686674.webp 768w, /blog/chatgpt-4v/bettys_iphone_hu2223007692104826437.webp 1024w, /blog/chatgpt-4v/bettys_iphone_hu3570942785903010144.webp 1280w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/chatgpt-4v/bettys_iphone_hu12311295946415178538.jpg 320w, /blog/chatgpt-4v/bettys_iphone_hu6404650407931085179.jpg 640w, /blog/chatgpt-4v/bettys_iphone_hu3632225293197017930.jpg 768w, /blog/chatgpt-4v/bettys_iphone_hu8263072168809311719.jpg 1024w, /blog/chatgpt-4v/bettys_iphone_hu15024612491035169258.jpg 1280w" sizes="37vw" />
    
      <img src="/blog/chatgpt-4v/bettys_iphone_hu6404650407931085179.jpg" width="1290" height="2796" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Now, ChatGPT was doing most of the work&mdash;image in, importable YAML out. That left me light years ahead of the mess I had been facing&mdash;laborious manual transcription or parsing garbled OCR output.</p>
<h2 id="late-december-automating-in-python">Late December: Automating in Python</h2>
<p>Despite the impressive gains compared with the time I would have spent manually transcribing or old-school OCRing, interactive-chatting my way through hundreds of recipes was going to be quite time-consuming. ChatGPT also has a tendency to &ldquo;lose the thread&rdquo; on long-running chats. Finally, though Paprika is capable of importing YAML files with an embedded BASE64 image, it was beyond ChatGPT&rsquo;s capabilities to produce that kind of YAML output. As a result, I had to first import the generated YAML into Paprika, then manually add photos.</p>
<p>So automating the process was looking attractive&mdash;eliminate interactive sessions in favor of directly scripting calls to the OpenAI API.</p>
<p>I picked Python as a logical choice to script in, figuring it&rsquo;d have libraries for anything needed, and because GPT-4 is very good at Python coding; I was not disappointed on either front. Unfortunately I didn&rsquo;t have much choice about which OpenAI API I&rsquo;d need to use. Back then&mdash;and still to this day&mdash;the only API that supports GPT-4 Vision is a feature-reduced Chat Completions API:</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/chatgpt-4v/gpt4v_no_assistants_for_you_one_year.webp"><picture>
    <source type="image/webp" srcset="/blog/chatgpt-4v/gpt4v_no_assistants_for_you_one_year_hu15882490802161539730.webp 320w, /blog/chatgpt-4v/gpt4v_no_assistants_for_you_one_year_hu10721654832375788550.webp 640w, /blog/chatgpt-4v/gpt4v_no_assistants_for_you_one_year_hu4128818025204450334.webp 768w, /blog/chatgpt-4v/gpt4v_no_assistants_for_you_one_year_hu1773673776959314497.webp 1024w, /blog/chatgpt-4v/gpt4v_no_assistants_for_you_one_year_hu14234609924107334014.webp 1280w, /blog/chatgpt-4v/gpt4v_no_assistants_for_you_one_year_hu7969373950630739150.webp 1536w, /blog/chatgpt-4v/gpt4v_no_assistants_for_you_one_year_hu14492314928228110158.webp 1920w" sizes="75vw" />
    <source type="image/jpeg" srcset="/blog/chatgpt-4v/gpt4v_no_assistants_for_you_one_year_hu15951009948918186021.jpg 320w, /blog/chatgpt-4v/gpt4v_no_assistants_for_you_one_year_hu12017647492404093091.jpg 640w, /blog/chatgpt-4v/gpt4v_no_assistants_for_you_one_year_hu9558502848682198914.jpg 768w, /blog/chatgpt-4v/gpt4v_no_assistants_for_you_one_year_hu14949743004553110833.jpg 1024w, /blog/chatgpt-4v/gpt4v_no_assistants_for_you_one_year_hu673748847559667785.jpg 1280w, /blog/chatgpt-4v/gpt4v_no_assistants_for_you_one_year_hu16921130505552024947.jpg 1536w, /blog/chatgpt-4v/gpt4v_no_assistants_for_you_one_year_hu17737080414758234193.jpg 1920w" sizes="75vw" />
    
      <img src="/blog/chatgpt-4v/gpt4v_no_assistants_for_you_one_year_hu12017647492404093091.jpg" width="3146" height="2620" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>This limited Chat Completions API available with Vision models is fine for simply automating something you&rsquo;ve been doing in the chat interface, but it&rsquo;s significantly less powerful than the full-strength Chat Completions API available for text input, let alone the new Assistants API. More on this later.</p>
<p>In any case, off I went scripting Chat Completions from Python:</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/chatgpt-4v/openai_call2.png"><picture>
    <source type="image/webp" srcset="/blog/chatgpt-4v/openai_call2_hu17776058786366705750.webp 320w" sizes="75vw" />
    <source type="image/jpeg" srcset="/blog/chatgpt-4v/openai_call2_hu14046100409993281684.jpg 320w" sizes="75vw" />
    
      <img src="/blog/chatgpt-4v/openai_call2_hu1418874436302913382.jpg" width="636" height="259" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>A few days before Christmas I had Python successfully scripting images =&gt; Chat Completions API =&gt; YAML output, and soon after added a BASE64-encoded image to the YAML output. I also added directory processing, so the script would run through an entire directory of image files and produce YAML-with-image output for each recipe, in a single script run.</p>
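<p>That first pass can be sketched roughly as follows. This is my illustration, not the post&rsquo;s actual script: the function names and prompt text are assumptions, and the real request would go through OpenAI&rsquo;s Python SDK (shown only as a comment), so treat it as a minimal sketch of encoding an image and shaping the vision message payload:</p>

```python
import base64
from pathlib import Path

# Hypothetical sketch: BASE64-encode a recipe photo and build the mixed
# text + image message payload the Chat Completions API expects for
# gpt-4-vision-preview. Names and prompt wording are illustrative.

def encode_image(path: str) -> str:
    """Return the image file's bytes as a BASE64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("utf-8")

def build_vision_messages(image_b64: str, prompt: str) -> list:
    """Build one user message mixing a text part and an embedded image part."""
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }]

# The real request would go through OpenAI's Python SDK, roughly:
#   client.chat.completions.create(
#       model="gpt-4-vision-preview",
#       messages=build_vision_messages(b64, "Transcribe this recipe."),
#       max_tokens=2000,
#   )
```

<p>Building the message list separately from the call also makes directory-level batch processing a simple loop over image files.</p>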
<h2 id="early-january-from-yaml-import-to-json-cloud-sync">Early January: From YAML import to JSON cloud sync</h2>
<p>More than a few of our heirloom recipes had multiple images&mdash;for example the front and back of a recipe card&mdash;and while GPT-4 Vision handled multiple images perfectly, I never had any success getting Paprika to ingest more than one image from a YAML file.</p>
<p>In searching for a solution, I discovered a whole community of recipe hobbyists who loved the Paprika app and found ways to creatively use Paprika&rsquo;s cloud sync API to build useful tools, like a recipe exchanger that synced across multiple Paprika accounts. Paprika&rsquo;s cloud sync is the native interface of the app itself, and so by definition supports all the app&rsquo;s capabilities, including multiple photos per recipe, so it was able to address the apparent limitations of YAML image importing. Publishing direct-to-cloud also saved the manual YAML import step, another key win.</p>
<p>I dug in, and by New Year&rsquo;s Day I had cloud sync operating, with multiple photos loading into Paprika soon after:</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/chatgpt-4v/two_image_recipe-1.webp"><picture>
    <source type="image/webp" srcset="/blog/chatgpt-4v/two_image_recipe-1_hu11882124871314370146.webp 320w, /blog/chatgpt-4v/two_image_recipe-1_hu8121117160371068228.webp 640w, /blog/chatgpt-4v/two_image_recipe-1_hu1290063551959243587.webp 768w, /blog/chatgpt-4v/two_image_recipe-1_hu9838249062209489753.webp 1024w, /blog/chatgpt-4v/two_image_recipe-1_hu5853368695670802640.webp 1280w, /blog/chatgpt-4v/two_image_recipe-1_hu12406317863423190969.webp 1536w, /blog/chatgpt-4v/two_image_recipe-1_hu13611162893791902293.webp 1920w" sizes="75vw" />
    <source type="image/jpeg" srcset="/blog/chatgpt-4v/two_image_recipe-1_hu183885798918835907.jpg 320w, /blog/chatgpt-4v/two_image_recipe-1_hu5879700148561565150.jpg 640w, /blog/chatgpt-4v/two_image_recipe-1_hu8322279257440914706.jpg 768w, /blog/chatgpt-4v/two_image_recipe-1_hu17279701343230177870.jpg 1024w, /blog/chatgpt-4v/two_image_recipe-1_hu7545464921587590299.jpg 1280w, /blog/chatgpt-4v/two_image_recipe-1_hu12831442902266780593.jpg 1536w, /blog/chatgpt-4v/two_image_recipe-1_hu10921769172593994455.jpg 1920w" sizes="75vw" />
    
      <img src="/blog/chatgpt-4v/two_image_recipe-1_hu5879700148561565150.jpg" width="2732" height="2048" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="later-january-json-mode-to-overcome-structured-output-fragility">Later January: JSON Mode to overcome structured output fragility</h2>
<p>The Chat Completions API, paired with the <code>gpt-4-vision-preview</code> model, is wonderful in its multimodal-ness. Unfortunately, this pairing is significantly restricted compared to the same API paired with the latest <code>gpt-4-turbo-preview</code> model. The even-more-powerful Assistants API currently doesn&rsquo;t support image input at all.</p>
<p>What appears to have happened is this. OpenAI wanted to get the GPT-4 Vision model, announced on Sept. 25th, into the field as early as possible. But OpenAI likewise wanted to get the significantly updated GPT-4 Turbo model, with corresponding Chat Completions API updates, along with the all-new Assistants API, into developers&rsquo; hands ASAP, and did so at the <a href="https://openai.com/blog/new-models-and-developer-products-announced-at-devday">DevDay event on Nov. 6th</a>. But full integration of Vision into these updated and new APIs wasn&rsquo;t (and still isn&rsquo;t) ready. OpenAI&rsquo;s compromise solution is what we have at the moment.</p>
<p>Why do I care about the new capabilities launched at DevDay? My immediate problem is that the Chat Completions API with Vision isn&rsquo;t 100% reliable at producing structured output like YAML or JSON. It usually works, but sometimes doesn&rsquo;t, depending on the content fed to it. This leads to an endless prompt-engineering cycle, with prompts continually being tweaked to avoid output breakage.</p>
<p>At the <a href="https://openai.com/blog/new-models-and-developer-products-announced-at-devday">DevDay event on Nov. 6th</a>, OpenAI announced JSON Mode: using the <code>response_format</code> parameter, you tell the model to &ldquo;constrain its output to generate a syntactically correct JSON object.&rdquo; That&rsquo;s exactly what I need for higher-reliability structured output, greatly reducing the need for constant prompt engineering.</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/chatgpt-4v/openai_devday.webp"><picture>
    <source type="image/webp" srcset="/blog/chatgpt-4v/openai_devday_hu12766501540866101346.webp 320w, /blog/chatgpt-4v/openai_devday_hu13382524058929896630.webp 640w, /blog/chatgpt-4v/openai_devday_hu350188678082407157.webp 768w, /blog/chatgpt-4v/openai_devday_hu10223974739082650480.webp 1024w, /blog/chatgpt-4v/openai_devday_hu6139400490837514700.webp 1280w, /blog/chatgpt-4v/openai_devday_hu13709098433403053720.webp 1536w, /blog/chatgpt-4v/openai_devday_hu6354198542642477514.webp 1920w" sizes="75vw" />
    <source type="image/jpeg" srcset="/blog/chatgpt-4v/openai_devday_hu6281683363538548457.jpg 320w, /blog/chatgpt-4v/openai_devday_hu7185397666169404928.jpg 640w, /blog/chatgpt-4v/openai_devday_hu12661786450812599279.jpg 768w, /blog/chatgpt-4v/openai_devday_hu17155302339268388816.jpg 1024w, /blog/chatgpt-4v/openai_devday_hu9437497450233912088.jpg 1280w, /blog/chatgpt-4v/openai_devday_hu1101158178320509080.jpg 1536w, /blog/chatgpt-4v/openai_devday_hu9186518777025561724.jpg 1920w" sizes="75vw" />
    
      <img src="/blog/chatgpt-4v/openai_devday_hu7185397666169404928.jpg" width="7680" height="4320" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>
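<p>At the request level, that constraint is a single field. In the sketch below, only the <code>response_format</code> parameter and the requirement that the word &ldquo;JSON&rdquo; appear somewhere in the messages come from OpenAI&rsquo;s documentation; the prompt wording and the assumed <code>name</code> field used for validation are my own illustrations:</p>

```python
import json

# Sketch of a JSON Mode request body for the Chat Completions API.
# The "response_format" field is the documented knob; JSON Mode also
# requires that "JSON" be mentioned in the messages. Prompt text and
# the "name" schema check below are illustrative assumptions.
request = {
    "model": "gpt-4-turbo-preview",
    "response_format": {"type": "json_object"},
    "messages": [
        {"role": "system",
         "content": "Convert the recipe to a JSON object. Respond only with JSON."},
        {"role": "user", "content": "CHEESECAKE ... 1 cup graham crumbs ..."},
    ],
}

def parse_reply(reply_text: str) -> dict:
    """JSON Mode guarantees syntax, not schema, so still parse defensively."""
    data = json.loads(reply_text)  # raises ValueError on malformed JSON
    if "name" not in data:         # minimal schema check (assumed field)
        raise ValueError("reply missing recipe name")
    return data
```

<p>Note that JSON Mode guarantees syntactically valid JSON, not any particular schema, so downstream validation is still worthwhile.</p>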

<p>Longer term, I&rsquo;m convinced that the other new features announced on Nov. 6th, including Function Calling and Tools, will open up a whole new range of possibilities in my recipe ingestion pipeline. Check out Max Woolf&rsquo;s post <a href="https://minimaxir.com/2023/12/chatgpt-structured-data/">Pushing ChatGPT&rsquo;s Structured Data Support To Its Limits</a> to get a feel for what&rsquo;s possible with structured data, function calling, etc.</p>
<p>My interim solution is to use both models:</p>
<ul>
<li>Use the limited Chat Completions API paired with the <code>gpt-4-vision-preview</code> model to extract the recipe content</li>
<li>Feed that content (text!) into the more powerful pairing of the Chat Completions API with the <code>gpt-4-turbo-preview</code> model, so I can take advantage of JSON Mode</li>
</ul>
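<p>A minimal sketch of that two-pass bridge follows. The model names come from the text above; everything else (the function, the prompt wording, and the injected <code>call_model</code> stand-in for the real API transport) is my own scaffolding, not the author&rsquo;s code:</p>

```python
from typing import Callable

# Two-pass sketch: pass 1 extracts recipe text from an image with the
# vision model; pass 2 re-feeds that text to the turbo model in JSON Mode.
# `call_model` stands in for the real API transport and is injected so
# the orchestration can be shown (and tested) without network access.

def image_to_recipe_json(image_b64: str,
                         call_model: Callable[[dict], str]) -> str:
    # Pass 1: vision-capable but feature-reduced endpoint.
    text = call_model({
        "model": "gpt-4-vision-preview",
        "messages": [{"role": "user", "content": [
            {"type": "text", "text": "Transcribe this recipe."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ]}],
    })
    # Pass 2: text-only, but with JSON Mode available.
    return call_model({
        "model": "gpt-4-turbo-preview",
        "response_format": {"type": "json_object"},
        "messages": [
            {"role": "system", "content": "Convert the recipe to JSON."},
            {"role": "user", "content": text},
        ],
    })
```

<p>Injecting the transport keeps the orchestration testable without network access; swapping in a real SDK call is a one-line change.</p>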
<p>I know, though, that I&rsquo;m discarding a tremendous store of context by bridging between models using only text. As soon as OpenAI fully integrates Vision into its updated and new APIs, I&rsquo;ll rid myself of this compromise and start looking at how I might apply function calling, tools, etc.</p>
<h2 id="enrichment">Enrichment</h2>
<p>You may have noticed that the final recipes showing up in Paprika have been significantly enriched. Take this before-and-after example <em>(click image to zoom)</em>:</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/chatgpt-4v/FillingChocolateandOrange.webp"><picture>
    <source type="image/webp" srcset="/blog/chatgpt-4v/FillingChocolateandOrange_hu1135606224415526133.webp 320w, /blog/chatgpt-4v/FillingChocolateandOrange_hu1048531827874209694.webp 640w, /blog/chatgpt-4v/FillingChocolateandOrange_hu18147492608841977944.webp 768w, /blog/chatgpt-4v/FillingChocolateandOrange_hu7502064268760822102.webp 1024w, /blog/chatgpt-4v/FillingChocolateandOrange_hu7452723397064378825.webp 1280w, /blog/chatgpt-4v/FillingChocolateandOrange_hu10792394704428732783.webp 1536w, /blog/chatgpt-4v/FillingChocolateandOrange_hu12326243651435354332.webp 1920w" sizes="25vw" />
    <source type="image/jpeg" srcset="/blog/chatgpt-4v/FillingChocolateandOrange_hu16180851457326417506.jpg 320w, /blog/chatgpt-4v/FillingChocolateandOrange_hu15366238413802249815.jpg 640w, /blog/chatgpt-4v/FillingChocolateandOrange_hu13918253708528909710.jpg 768w, /blog/chatgpt-4v/FillingChocolateandOrange_hu11571303134994999368.jpg 1024w, /blog/chatgpt-4v/FillingChocolateandOrange_hu17220775863700112728.jpg 1280w, /blog/chatgpt-4v/FillingChocolateandOrange_hu5623801728779479322.jpg 1536w, /blog/chatgpt-4v/FillingChocolateandOrange_hu11565481079361639145.jpg 1920w" sizes="25vw" />
    
      <img src="/blog/chatgpt-4v/FillingChocolateandOrange_hu15366238413802249815.jpg" width="4871" height="6271" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[75%] mx-2 flex justify-center"><a href="/blog/chatgpt-4v/soft-chocolate-filling.webp"><picture>
    <source type="image/webp" srcset="/blog/chatgpt-4v/soft-chocolate-filling_hu14209501701693326918.webp 320w, /blog/chatgpt-4v/soft-chocolate-filling_hu11115638346827197959.webp 640w, /blog/chatgpt-4v/soft-chocolate-filling_hu6259334066437356647.webp 768w, /blog/chatgpt-4v/soft-chocolate-filling_hu10249430741601644403.webp 1024w, /blog/chatgpt-4v/soft-chocolate-filling_hu13532526303713278142.webp 1280w, /blog/chatgpt-4v/soft-chocolate-filling_hu8858625914263089646.webp 1536w, /blog/chatgpt-4v/soft-chocolate-filling_hu8096273106729964669.webp 1920w" sizes="25vw" />
    <source type="image/jpeg" srcset="/blog/chatgpt-4v/soft-chocolate-filling_hu3314654089564089838.jpg 320w, /blog/chatgpt-4v/soft-chocolate-filling_hu14367143100579490439.jpg 640w, /blog/chatgpt-4v/soft-chocolate-filling_hu12556566131621880425.jpg 768w, /blog/chatgpt-4v/soft-chocolate-filling_hu17036076726496936301.jpg 1024w, /blog/chatgpt-4v/soft-chocolate-filling_hu17991191649702549287.jpg 1280w, /blog/chatgpt-4v/soft-chocolate-filling_hu16815520044824635542.jpg 1536w, /blog/chatgpt-4v/soft-chocolate-filling_hu4254424004058090625.jpg 1920w" sizes="25vw" />
    
      <img src="/blog/chatgpt-4v/soft-chocolate-filling_hu14367143100579490439.jpg" width="2210" height="1901" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[75%] mx-2 flex justify-center"><a href="/blog/chatgpt-4v/orange-cream-filling.webp"><picture>
    <source type="image/webp" srcset="/blog/chatgpt-4v/orange-cream-filling_hu10357502589317373351.webp 320w, /blog/chatgpt-4v/orange-cream-filling_hu16591004278299499727.webp 640w, /blog/chatgpt-4v/orange-cream-filling_hu4223801285236496313.webp 768w, /blog/chatgpt-4v/orange-cream-filling_hu13604439386319108003.webp 1024w, /blog/chatgpt-4v/orange-cream-filling_hu5863233342576758961.webp 1280w, /blog/chatgpt-4v/orange-cream-filling_hu1632300316210644963.webp 1536w, /blog/chatgpt-4v/orange-cream-filling_hu6984953949063785716.webp 1920w" sizes="25vw" />
    <source type="image/jpeg" srcset="/blog/chatgpt-4v/orange-cream-filling_hu6510382001350267577.jpg 320w, /blog/chatgpt-4v/orange-cream-filling_hu9094337591104092270.jpg 640w, /blog/chatgpt-4v/orange-cream-filling_hu4156767426946955476.jpg 768w, /blog/chatgpt-4v/orange-cream-filling_hu15493604532561016092.jpg 1024w, /blog/chatgpt-4v/orange-cream-filling_hu14494924081569692842.jpg 1280w, /blog/chatgpt-4v/orange-cream-filling_hu8959195112836746813.jpg 1536w, /blog/chatgpt-4v/orange-cream-filling_hu7340757186088176009.jpg 1920w" sizes="25vw" />
    
      <img src="/blog/chatgpt-4v/orange-cream-filling_hu9094337591104092270.jpg" width="2209" height="1902" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>This is a <em>tour de force</em> of GPT-4V&rsquo;s content extraction: it correctly parses both typewritten and handwritten content, and likewise recognizes that these are two separate recipes. I end up with JSON content ready to feed into Paprika.</p>
<p>But look closer&mdash;the original image doesn&rsquo;t have any directions, so how did those get into Paprika? And what&rsquo;s this Description section? And those notes? That&rsquo;s enrichment in action.</p>
<h3 id="whimsical-enrichment">Whimsical enrichment</h3>
<p>If you&rsquo;ve spent time with ChatGPT, you know that it loves to be creative, to cheerfully, energetically overdeliver and over-explain. Early in my process here, back when I was doing everything through the chat interface, I decided to let the GPT use its imagination in the Notes section:</p>
<blockquote>
<p>Within the notes section of this recipe, please add a paragraph &ldquo;Recipe History:&rdquo; and invent a mythical history of the origin of the recipe, inventing family members like great aunt Dora. Regenerate the recipe yaml with this added.</p>
</blockquote>
<p>The GPT came back with:</p>
<blockquote>
<p>This cherished recipe dates back to the early 20th century and was a specialty of Great Aunt Dora, a renowned baker in her small town. Legend has it that Dora won the heart of her town with this cake at the annual winter fair. She ingeniously added crushed peppermint candies from her childhood candy jar, which became her secret signature. The recipe was passed down with love through generations, each adding their touch but always keeping Dora’s spirited invention alive.</p>
</blockquote>
<p>I provided some real historical data to help ChatGPT create better stories:</p>
<blockquote>
<p>Our family has strong Irish and Scottish roots so as you do history on recipes, factor that in as you do histories. Family names include mcguire, clark, stewart. No need to redo the above.</p>
</blockquote>
<p>Before long I was getting back pretty entertaining results:</p>
<blockquote>
<p>The recipe is believed to have been created by Seamus&rsquo;s great-aunt, Eilidh McGuire Stewart, who was known in her village for her bountiful kitchen and her generous heart. Eilidh often baked this cake during the harvest season, and it became synonymous with the warmth of her home. Legend has it that Robert Burns himself enjoyed a slice of Eilidh’s cake during a visit, praising its rich flavor and the tender hands that made it.</p>
</blockquote>
<p>Alas, the whimsical enrichment got old after a while and I ended up removing it from my instructions. But the idea that ChatGPT could enrich the recipes stuck with me and has been a focus of my most recent work.</p>
<h3 id="useful-enrichment">Useful enrichment</h3>
<p>I&rsquo;m currently focused on practical, useful enrichments only. So far here&rsquo;s what seems to be working:</p>
<h4 id="category-assignment">Category assignment</h4>
<p>This is a slam-dunk win. Categories are a feature I never got around to using in Paprika, and GPT-4 is quite good at picking them.</p>
<h4 id="instructions">Instructions</h4>
<p>I actually don&rsquo;t prompt GPT-4 to create these&mdash;in fact I prompt explicitly for it not to add content unless instructed&mdash;but it decided on its own, correctly, that this would be a useful addition to the filling recipes above. I haven&rsquo;t tried the instructions out yet, but superficially they appear plausible, and this should be well within GPT-4&rsquo;s capabilities.</p>
<h4 id="description">Description</h4>
<p>Here I&rsquo;m prompting for a brief essence-of-the-recipe summary, another strong GPT-4 skill. I&rsquo;m still working to avoid hyperbole that creeps in sometimes.</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/chatgpt-4v/bread_and_butter_bernice_paprika.webp"><picture>
    <source type="image/webp" srcset="/blog/chatgpt-4v/bread_and_butter_bernice_paprika_hu14102780479889638117.webp 320w, /blog/chatgpt-4v/bread_and_butter_bernice_paprika_hu15781751822286419620.webp 640w, /blog/chatgpt-4v/bread_and_butter_bernice_paprika_hu10517701786158460628.webp 768w, /blog/chatgpt-4v/bread_and_butter_bernice_paprika_hu6476236516766743497.webp 1024w, /blog/chatgpt-4v/bread_and_butter_bernice_paprika_hu4662566679641554751.webp 1280w, /blog/chatgpt-4v/bread_and_butter_bernice_paprika_hu15205761833379035858.webp 1536w, /blog/chatgpt-4v/bread_and_butter_bernice_paprika_hu1638146305607300430.webp 1920w" sizes="75vw" />
    <source type="image/jpeg" srcset="/blog/chatgpt-4v/bread_and_butter_bernice_paprika_hu9290375437805783036.jpg 320w, /blog/chatgpt-4v/bread_and_butter_bernice_paprika_hu13944048483993058974.jpg 640w, /blog/chatgpt-4v/bread_and_butter_bernice_paprika_hu16913420088220871774.jpg 768w, /blog/chatgpt-4v/bread_and_butter_bernice_paprika_hu7614361663188253388.jpg 1024w, /blog/chatgpt-4v/bread_and_butter_bernice_paprika_hu17989556011578912547.jpg 1280w, /blog/chatgpt-4v/bread_and_butter_bernice_paprika_hu5937193905230182692.jpg 1536w, /blog/chatgpt-4v/bread_and_butter_bernice_paprika_hu2791575676453171425.jpg 1920w" sizes="75vw" />
    
      <img src="/blog/chatgpt-4v/bread_and_butter_bernice_paprika_hu13944048483993058974.jpg" width="2190" height="1459" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h4 id="recipe-tips">Recipe Tips</h4>
<p>This gives GPT-4 a chance to offer helpful tips, and many of them genuinely are useful.</p>
<h4 id="variations-to-try">Variations to Try</h4>
<p>GPT-4 has, what, 10 million recipes in its training data? 100 million? It&rsquo;s pretty good at suggesting interesting variations. And none of the horrifying recipe mistakes that we used to see in GPT-3 days.</p>
<h4 id="noted-in-the-pics">Noted in the Pics</h4>
<p>In my first processing pass using the <code>gpt-4-vision-preview</code> model, besides extracting the core recipe data, I ask for a description of the physical properties of the recipe based on the images provided.  Then in the second pass with the <code>gpt-4-turbo-preview</code> model, I ask GPT-4 to comment on notable physical aspects. This isn&rsquo;t working well yet, and I&rsquo;m sure the fact that I&rsquo;m forced to use a text-only bridge between the Vision and Turbo models is reducing the value of this enrichment.</p>
<h2 id="prompt-engineering">Prompt engineering</h2>
<p>Oh, the prompting I have done. This project, which still supports both YAML and JSON output, contains a total of 4 active and 7 inactive prompts. Common categories of prompt engineering for this project:</p>
<ul>
<li><strong>Structured output</strong>: Before JSON Mode, I did a lot of prompting to ensure the output was valid YAML or JSON.</li>
<li><strong>Shut up and stop making things up</strong>: ChatGPT just loves to be helpful and creative. I&rsquo;m ok with that, carefully controlled, where enrichment is needed. But without lots of guidance I was losing the original recipe content and getting made-up facts instead&mdash;the words &ldquo;family&rdquo; and &ldquo;heirloom&rdquo; in particular seemed to trigger the GPT into fantasy.</li>
<li><strong>Plain Old Prompt Engineering (POPE)</strong>: A massive number of test-then-tweak iterations, refining how I phrase what I want to get better results.</li>
</ul>
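<p>For the structured-output battles, JSON Mode now does much of the heavy lifting. Here&rsquo;s a minimal sketch of such a request&mdash;the schema and prompt wording are illustrative, not the project&rsquo;s actual prompts:</p>

```python
# Illustrative JSON Mode request; schema and wording are my own, not the
# project's. JSON Mode guarantees the reply parses as JSON, but the prompt
# itself must still explicitly mention JSON.
request = {
    "model": "gpt-4-turbo-preview",
    "response_format": {"type": "json_object"},  # JSON Mode
    "messages": [
        {"role": "system",
         "content": "Return only JSON with keys: name, ingredients, steps. "
                    "Transcribe the recipe faithfully; do not invent facts."},
        {"role": "user", "content": "RECIPE TEXT HERE"},
    ],
}
# Sent via client.chat.completions.create(**request).
```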
<p>All of this testing came at a cost&mdash;the tall green bars show days where I was actively testing, with daily API charges approaching $4:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/chatgpt-4v/openai_usage_dec.webp"><picture>
    <source type="image/webp" srcset="/blog/chatgpt-4v/openai_usage_dec_hu15719067466612050182.webp 320w, /blog/chatgpt-4v/openai_usage_dec_hu7773980677603946589.webp 640w, /blog/chatgpt-4v/openai_usage_dec_hu3461010372555149461.webp 768w, /blog/chatgpt-4v/openai_usage_dec_hu12028383483769387323.webp 1024w, /blog/chatgpt-4v/openai_usage_dec_hu6147436782410080751.webp 1280w, /blog/chatgpt-4v/openai_usage_dec_hu14193716561671400249.webp 1536w, /blog/chatgpt-4v/openai_usage_dec_hu17645349829675960000.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/chatgpt-4v/openai_usage_dec_hu14459465371865699379.jpg 320w, /blog/chatgpt-4v/openai_usage_dec_hu7845731164930429715.jpg 640w, /blog/chatgpt-4v/openai_usage_dec_hu14488015524316505410.jpg 768w, /blog/chatgpt-4v/openai_usage_dec_hu1023667119713740044.jpg 1024w, /blog/chatgpt-4v/openai_usage_dec_hu12385408965484912780.jpg 1280w, /blog/chatgpt-4v/openai_usage_dec_hu18095192537714237798.jpg 1536w, /blog/chatgpt-4v/openai_usage_dec_hu10960934333197896797.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/chatgpt-4v/openai_usage_dec_hu7845731164930429715.jpg" width="4631" height="2976" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="in-summary">In summary</h2>
<p>In the meantime, I&rsquo;ve got a great way to preserve our heirloom family recipes while enjoying the benefits of a modern recipe app.</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/chatgpt-4v/paprika-fig-pudding.webp"><picture>
    <source type="image/webp" srcset="/blog/chatgpt-4v/paprika-fig-pudding_hu10714578321388823846.webp 320w, /blog/chatgpt-4v/paprika-fig-pudding_hu1625023418147848705.webp 640w, /blog/chatgpt-4v/paprika-fig-pudding_hu16198590883675365525.webp 768w, /blog/chatgpt-4v/paprika-fig-pudding_hu8656692748263174679.webp 1024w, /blog/chatgpt-4v/paprika-fig-pudding_hu3240356453231205740.webp 1280w, /blog/chatgpt-4v/paprika-fig-pudding_hu3021038763379217523.webp 1536w, /blog/chatgpt-4v/paprika-fig-pudding_hu4750634130315849338.webp 1920w" sizes="75vw" />
    <source type="image/jpeg" srcset="/blog/chatgpt-4v/paprika-fig-pudding_hu17283436719745303983.jpg 320w, /blog/chatgpt-4v/paprika-fig-pudding_hu9830434463667275459.jpg 640w, /blog/chatgpt-4v/paprika-fig-pudding_hu13215616775970877962.jpg 768w, /blog/chatgpt-4v/paprika-fig-pudding_hu6677398384545907484.jpg 1024w, /blog/chatgpt-4v/paprika-fig-pudding_hu7290697334882108118.jpg 1280w, /blog/chatgpt-4v/paprika-fig-pudding_hu660091914245042320.jpg 1536w, /blog/chatgpt-4v/paprika-fig-pudding_hu820666778252865336.jpg 1920w" sizes="75vw" />
    
      <img src="/blog/chatgpt-4v/paprika-fig-pudding_hu9830434463667275459.jpg" width="2216" height="1614" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

]]></content:encoded>
    </item>
    
    <item>
      <title>Gruber Responds to Gurman&#39;s Report of AI Anxiety at Apple</title>
      <link>https://craftycto.com/quote/gruber-ai-anxiety-at-apple/</link>
      <pubDate>Tue, 24 Oct 2023 19:18:34 -0500</pubDate>
      
      <guid>https://craftycto.com/quote/gruber-ai-anxiety-at-apple/</guid>
      <description>What I have heard from little birdies in Cupertino is not that there was a miss on this already .... The anxiety inside Apple is that many people inside do not believe Apple’s own AI/ML team can deliver, and that the company — if only for privacy reasons — is only going to use what comes from their own AI/ML team.</description>
      <content:encoded><![CDATA[<p>In my July essay <a href="/blog/an-appllama-week/#my-take-on-apple-in-llmgpt-land">An Appllama Week in AI</a>, I mentioned that Apple &ldquo;might get pinched&rdquo; because they move much slower than the AI space is moving.</p>
<p>In a <a href="https://www.bloomberg.com/news/newsletters/2023-10-22/what-is-apple-doing-in-ai-revamping-siri-search-apple-music-and-other-apps-lo1ffr7p">recent Power On newsletter</a>, Mark Gurman quoted &ldquo;a person with knowledge of the matter&rdquo; about internal anxiety at Apple regarding the generative AI space:</p>
<blockquote>
<p>There’s a lot of anxiety about this and it’s considered a pretty big miss internally</p>
</blockquote>
<p>Gruber&rsquo;s take, especially given his access to birdie friends, rings true to me. Back in July, I <a href="/blog/an-appllama-week/#intersection-apple-llm-chatter-on-hacker-news">quoted several comments on Hacker News</a> that gave the same impression, this one in particular:</p>
<blockquote>
<p>I work in the field and [Apple] just are not hiring the people they need to be hiring.</p>
</blockquote>
]]></content:encoded>
    </item>
    
    <item>
      <title>The Ballad of Bull Cook, A Wacky Book</title>
      <link>https://craftycto.com/blog/ballad-of-bull-cook/</link>
      <pubDate>Tue, 17 Oct 2023 08:41:35 -0500</pubDate>
      
      <guid>https://craftycto.com/blog/ballad-of-bull-cook/</guid>
      <description>George Leonard Herter&amp;#39;s &amp;#34;Bull Cook and Authentic Historical Recipes and Practices&amp;#34; and what it tells us about copyright and a new (to me) flavor of Amazon scam.</description>
      <content:encoded><![CDATA[<p>I was fortunate to be able to spend three weeks at the family lake house in northwestern Michigan in August. My father passed away in 2009, but his imprint is still palpable there on the lake&mdash;nowhere more strongly than on the bookshelves.</p>
<p>My son John and his wife Julie were able to join us for one of our weeks. John&rsquo;s an amazing, intuitive, creative cook. He&rsquo;s also strongly attracted to Old Things, so, for example, he snagged and regularly wears most of my father&rsquo;s outerwear, including an enormous &rsquo;70s-era Eddie Bauer winter parka and a tired-out, ugly-as-sin L.L. Bean fleece.</p>
<p>At the intersection of cooking and Old Things, John has an unsurprising fascination with legacy recipes, such as (grandfather) Papa Tom&rsquo;s Peach Cobbler, which he&rsquo;s tweaked to a state of perfection. He likewise loves heirloom kitchen gear, and again has snagged a number of family classics, some dating back almost to WW2, including Papa Tom&rsquo;s biscuit cutter and (great grandmother) Ma&rsquo;s classic heavy roasting pot.</p>
<h2 id="a-book-encounter">A Book Encounter</h2>
<p>While browsing the lake house bookshelves, John stumbled across <a href="https://archive.org/details/bullcookauthenti00hertrich/page/n5/mode/2up">Bull Cook and Authentic Historical Recipes and Practices</a> and was immediately hooked. And if you&rsquo;re a creative cook who&rsquo;s into Old Things, what&rsquo;s not to love&mdash;it&rsquo;s an insane collection of historical recipes and random factoids, itself written back in prehistoric 1960, with many recipes dating centuries further back. I can recall my father&rsquo;s enthusiasm for the book, along with its author <a href="https://en.wikipedia.org/wiki/George_Leonard_Herter">George Leonard Herter</a> and his catalog business <a href="https://www.startribune.com/herter-s-catalog-is-long-gone-but-not-forgotten/291167741/">Herter&rsquo;s</a>.</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/ballad-of-bull-cook/glherter_headshot.webp"><picture>
    <source type="image/webp" srcset="/blog/ballad-of-bull-cook/glherter_headshot_hu16242313768604452423.webp 320w, /blog/ballad-of-bull-cook/glherter_headshot_hu2413488886404782045.webp 640w, /blog/ballad-of-bull-cook/glherter_headshot_hu14468672919967813404.webp 768w, /blog/ballad-of-bull-cook/glherter_headshot_hu8435451973360600441.webp 1024w, /blog/ballad-of-bull-cook/glherter_headshot_hu17651868194933130393.webp 1280w, /blog/ballad-of-bull-cook/glherter_headshot_hu16056525604994798561.webp 1536w, /blog/ballad-of-bull-cook/glherter_headshot_hu9403627685822874716.webp 1920w" sizes="25vw" />
    <source type="image/jpeg" srcset="/blog/ballad-of-bull-cook/glherter_headshot_hu8112857591904645580.jpg 320w, /blog/ballad-of-bull-cook/glherter_headshot_hu6697954223144041243.jpg 640w, /blog/ballad-of-bull-cook/glherter_headshot_hu14880237737126064692.jpg 768w, /blog/ballad-of-bull-cook/glherter_headshot_hu9349141358886906153.jpg 1024w, /blog/ballad-of-bull-cook/glherter_headshot_hu17116307354113202098.jpg 1280w, /blog/ballad-of-bull-cook/glherter_headshot_hu4811097782780701497.jpg 1536w, /blog/ballad-of-bull-cook/glherter_headshot_hu14174568921839501291.jpg 1920w" sizes="25vw" />
    
      <img src="/blog/ballad-of-bull-cook/glherter_headshot_hu6697954223144041243.jpg" width="2048" height="2126" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[75%] mx-2 flex justify-center"><a href="/blog/ballad-of-bull-cook/herters-catalog-1950.webp"><picture>
    <source type="image/webp" srcset="/blog/ballad-of-bull-cook/herters-catalog-1950_hu13863662602473739124.webp 320w, /blog/ballad-of-bull-cook/herters-catalog-1950_hu522800985107944182.webp 640w, /blog/ballad-of-bull-cook/herters-catalog-1950_hu15121185979884874657.webp 768w, /blog/ballad-of-bull-cook/herters-catalog-1950_hu11729851442977230221.webp 1024w, /blog/ballad-of-bull-cook/herters-catalog-1950_hu11756263194300094161.webp 1280w, /blog/ballad-of-bull-cook/herters-catalog-1950_hu6416348682076740754.webp 1536w, /blog/ballad-of-bull-cook/herters-catalog-1950_hu7102726179520806384.webp 1920w" sizes="25vw" />
    <source type="image/jpeg" srcset="/blog/ballad-of-bull-cook/herters-catalog-1950_hu14033199435547480163.jpg 320w, /blog/ballad-of-bull-cook/herters-catalog-1950_hu7054173175099441091.jpg 640w, /blog/ballad-of-bull-cook/herters-catalog-1950_hu4769804121232337595.jpg 768w, /blog/ballad-of-bull-cook/herters-catalog-1950_hu11251578014208221304.jpg 1024w, /blog/ballad-of-bull-cook/herters-catalog-1950_hu15068278677939553340.jpg 1280w, /blog/ballad-of-bull-cook/herters-catalog-1950_hu6888126050019532472.jpg 1536w, /blog/ballad-of-bull-cook/herters-catalog-1950_hu14125369034356764549.jpg 1920w" sizes="25vw" />
    
      <img src="/blog/ballad-of-bull-cook/herters-catalog-1950_hu7054173175099441091.jpg" width="3072" height="3362" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[75%] mx-2 flex justify-center"><a href="/blog/ballad-of-bull-cook/herters_store_polar_bear.webp"><picture>
    <source type="image/webp" srcset="/blog/ballad-of-bull-cook/herters_store_polar_bear_hu3172342477812028401.webp 320w, /blog/ballad-of-bull-cook/herters_store_polar_bear_hu6017821688331657781.webp 640w, /blog/ballad-of-bull-cook/herters_store_polar_bear_hu14684191566524283234.webp 768w, /blog/ballad-of-bull-cook/herters_store_polar_bear_hu11236622401354126593.webp 1024w, /blog/ballad-of-bull-cook/herters_store_polar_bear_hu1274780943100245627.webp 1280w, /blog/ballad-of-bull-cook/herters_store_polar_bear_hu1912142788569692432.webp 1536w, /blog/ballad-of-bull-cook/herters_store_polar_bear_hu17757733548795664890.webp 1920w" sizes="25vw" />
    <source type="image/jpeg" srcset="/blog/ballad-of-bull-cook/herters_store_polar_bear_hu14051724197978406763.jpg 320w, /blog/ballad-of-bull-cook/herters_store_polar_bear_hu3522296694855867268.jpg 640w, /blog/ballad-of-bull-cook/herters_store_polar_bear_hu3271362805121895887.jpg 768w, /blog/ballad-of-bull-cook/herters_store_polar_bear_hu11814322935821721699.jpg 1024w, /blog/ballad-of-bull-cook/herters_store_polar_bear_hu2981400927196572862.jpg 1280w, /blog/ballad-of-bull-cook/herters_store_polar_bear_hu3105515597597511210.jpg 1536w, /blog/ballad-of-bull-cook/herters_store_polar_bear_hu2721042834677654734.jpg 1920w" sizes="25vw" />
    
      <img src="/blog/ballad-of-bull-cook/herters_store_polar_bear_hu3522296694855867268.jpg" width="3072" height="2376" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>As John paged through the physical <em>Bull Cook</em>, I opened my iPad to follow along, confident I&rsquo;d quickly locate some free PDF version online, because <em>Bull Cook</em> was 63 years past its first edition date, and moreover wasn&rsquo;t a &ldquo;real&rdquo; book anyway, but rather was self-published by Herter. As you might suspect from its unique cover:</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/ballad-of-bull-cook/bull_cook_cover.webp"><picture>
    <source type="image/webp" srcset="/blog/ballad-of-bull-cook/bull_cook_cover_hu10471176949832704768.webp 320w, /blog/ballad-of-bull-cook/bull_cook_cover_hu8089162562070868394.webp 640w, /blog/ballad-of-bull-cook/bull_cook_cover_hu2643367249138070563.webp 768w, /blog/ballad-of-bull-cook/bull_cook_cover_hu12116958483454267091.webp 1024w, /blog/ballad-of-bull-cook/bull_cook_cover_hu768779812624634462.webp 1280w, /blog/ballad-of-bull-cook/bull_cook_cover_hu9688933800556574182.webp 1536w, /blog/ballad-of-bull-cook/bull_cook_cover_hu10075453347399637373.webp 1920w" sizes="75vw" />
    <source type="image/jpeg" srcset="/blog/ballad-of-bull-cook/bull_cook_cover_hu8606609893155958437.jpg 320w, /blog/ballad-of-bull-cook/bull_cook_cover_hu7186892405484201741.jpg 640w, /blog/ballad-of-bull-cook/bull_cook_cover_hu18001879155901571584.jpg 768w, /blog/ballad-of-bull-cook/bull_cook_cover_hu2860559771452891659.jpg 1024w, /blog/ballad-of-bull-cook/bull_cook_cover_hu6306130051201774631.jpg 1280w, /blog/ballad-of-bull-cook/bull_cook_cover_hu1210768090565802662.jpg 1536w, /blog/ballad-of-bull-cook/bull_cook_cover_hu11025029451817720138.jpg 1920w" sizes="75vw" />
    
      <img src="/blog/ballad-of-bull-cook/bull_cook_cover_hu7186892405484201741.jpg" width="2048" height="2975" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Surprise! I did not quickly locate <em>Bull Cook</em> online. Eventually, thanks to the Internet Archive&rsquo;s Open Library, I was able to read the book online, in an awkward web UI, by checking out the one available virtual copy, one hour at a time.</p>
<p>I began pulling on the thread of <em>Bull Cook</em> and its strange unavailability; much unravelling ensued.</p>
<h2 id="herters-the-business">Herter&rsquo;s, The Business</h2>
<p>In 1937, George Herter launched his disruptive startup&mdash;a mail-order outdoor-sports business&mdash;in the spare rooms over his father&rsquo;s Waseca, Minnesota dry goods store. Disruptive? Well, yes. Sears had pioneered the mail-order catalog business almost 40 years earlier, but Herter had the insight that the same concept might work in a much smaller niche market that was, at the time, served by mom-and-pop storefronts. In 1937, mail order was as innovative and disruptive as e-commerce in 1999. So, yes, a disruptive startup it was.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/ballad-of-bull-cook/herters_factory_warehouse.webp"><picture>
    <source type="image/webp" srcset="/blog/ballad-of-bull-cook/herters_factory_warehouse_hu13858702156865943885.webp 320w, /blog/ballad-of-bull-cook/herters_factory_warehouse_hu13227724043373538847.webp 640w, /blog/ballad-of-bull-cook/herters_factory_warehouse_hu4581459352924761237.webp 768w, /blog/ballad-of-bull-cook/herters_factory_warehouse_hu17870383411995513668.webp 1024w, /blog/ballad-of-bull-cook/herters_factory_warehouse_hu3975886749811135169.webp 1280w, /blog/ballad-of-bull-cook/herters_factory_warehouse_hu5688730652153107644.webp 1536w, /blog/ballad-of-bull-cook/herters_factory_warehouse_hu5859915027195421981.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/ballad-of-bull-cook/herters_factory_warehouse_hu11421258281875138178.jpg 320w, /blog/ballad-of-bull-cook/herters_factory_warehouse_hu14189549883521269018.jpg 640w, /blog/ballad-of-bull-cook/herters_factory_warehouse_hu7350693030713917394.jpg 768w, /blog/ballad-of-bull-cook/herters_factory_warehouse_hu12361801151404282951.jpg 1024w, /blog/ballad-of-bull-cook/herters_factory_warehouse_hu17889784000597764969.jpg 1280w, /blog/ballad-of-bull-cook/herters_factory_warehouse_hu11920624331805834731.jpg 1536w, /blog/ballad-of-bull-cook/herters_factory_warehouse_hu3354570143190475507.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/ballad-of-bull-cook/herters_factory_warehouse_hu14189549883521269018.jpg" width="3072" height="2098" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Over the following 40+ years, Herter&rsquo;s came to dominate its fishing-hunting-outdoor niche, first with mail order, and later, outlet stores; the same space and strategy that today is the domain of Cabela&rsquo;s and its parent Bass Pro Shops.</p>
<p>Herter&rsquo;s and Cabela&rsquo;s overlapped by 20-odd years, with Cabela&rsquo;s starting up in 1961, and Herter&rsquo;s filing for bankruptcy in 1981. Herter&rsquo;s assets ended up with Cabela&rsquo;s through the liquidation process.</p>
<h2 id="self-publishing">Self-Publishing</h2>
<p>Pre-internet, every successful mail order business was also a publishing business, because&mdash;catalogs. At its peak, for example, Sears printed 315 million copies of its main catalog every year, along with millions of additional copies of specialty catalogs such as its Christmas &ldquo;Wish Book.&rdquo; That&rsquo;s the book sales equivalent of 20 blockbusters, or 1,500 average bestsellers, according to my research assistant GPT-4.</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/ballad-of-bull-cook/herters-catalog.webp"><picture>
    <source type="image/webp" srcset="/blog/ballad-of-bull-cook/herters-catalog_hu4187167742021816870.webp 320w, /blog/ballad-of-bull-cook/herters-catalog_hu15561888604972549652.webp 640w, /blog/ballad-of-bull-cook/herters-catalog_hu18171937839282818700.webp 768w, /blog/ballad-of-bull-cook/herters-catalog_hu15834155745427997364.webp 1024w, /blog/ballad-of-bull-cook/herters-catalog_hu6311422375778921707.webp 1280w, /blog/ballad-of-bull-cook/herters-catalog_hu16795868546455177203.webp 1536w, /blog/ballad-of-bull-cook/herters-catalog_hu18117774512053059849.webp 1920w" sizes="75vw" />
    <source type="image/jpeg" srcset="/blog/ballad-of-bull-cook/herters-catalog_hu1654329959736913326.jpg 320w, /blog/ballad-of-bull-cook/herters-catalog_hu17480018607862886970.jpg 640w, /blog/ballad-of-bull-cook/herters-catalog_hu2629775802367787780.jpg 768w, /blog/ballad-of-bull-cook/herters-catalog_hu3517633712700216180.jpg 1024w, /blog/ballad-of-bull-cook/herters-catalog_hu15440360900631844308.jpg 1280w, /blog/ballad-of-bull-cook/herters-catalog_hu895235602846118339.jpg 1536w, /blog/ballad-of-bull-cook/herters-catalog_hu4348878583343099534.jpg 1920w" sizes="75vw" />
    
      <img src="/blog/ballad-of-bull-cook/herters-catalog_hu17480018607862886970.jpg" width="2048" height="2685" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Herter&rsquo;s was smaller, but still a significant publisher, with regular printing runs of 400K to 500K copies for the often 600+ page Herter&rsquo;s Catalog&mdash;equivalent to a couple of juicy bestsellers annually. In fact, according to Paul Collins&rsquo; 2008 New York Times article <a href="https://www.nytimes.com/2008/12/07/books/review/Collins-t.html">The Oddball Know-It-All</a>, Herter&rsquo;s catalog printer Brown Printing, also located in Waseca, MN, grew with Herter&rsquo;s to become &ldquo;one of the country’s largest commercial printers.&rdquo;</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/ballad-of-bull-cook/brown_printing.webp"><picture>
    <source type="image/webp" srcset="/blog/ballad-of-bull-cook/brown_printing_hu706422552038305918.webp 320w, /blog/ballad-of-bull-cook/brown_printing_hu13145048473939902204.webp 640w, /blog/ballad-of-bull-cook/brown_printing_hu5863082438134359148.webp 768w, /blog/ballad-of-bull-cook/brown_printing_hu1192437778464643120.webp 1024w, /blog/ballad-of-bull-cook/brown_printing_hu10750358983020738545.webp 1280w, /blog/ballad-of-bull-cook/brown_printing_hu9400049760897254456.webp 1536w, /blog/ballad-of-bull-cook/brown_printing_hu15481001108084281782.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/ballad-of-bull-cook/brown_printing_hu9002835917476579888.jpg 320w, /blog/ballad-of-bull-cook/brown_printing_hu5550218347725784307.jpg 640w, /blog/ballad-of-bull-cook/brown_printing_hu16922342544785375939.jpg 768w, /blog/ballad-of-bull-cook/brown_printing_hu4736313570450740892.jpg 1024w, /blog/ballad-of-bull-cook/brown_printing_hu5563656621124816658.jpg 1280w, /blog/ballad-of-bull-cook/brown_printing_hu11680940966022380123.jpg 1536w, /blog/ballad-of-bull-cook/brown_printing_hu13726729751837715301.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/ballad-of-bull-cook/brown_printing_hu5550218347725784307.jpg" width="3072" height="986" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>All this to say: self-publishing <em>Bull Cook</em> wasn&rsquo;t any kind of an obstacle for George Herter.</p>
<h2 id="whats-a-bull-cook">What&rsquo;s a Bull Cook?</h2>
<p>To be honest, I was confused by the book&rsquo;s title, partly because I was unfamiliar with the term &ldquo;Bull Cook&rdquo;&mdash;I thought it might be a person&mdash;and partly because, even with an understanding of what a bull cook is, the title still doesn&rsquo;t exactly make sense. Paul Collins describes Herter&rsquo;s writing as having <a href="https://www.nytimes.com/2008/12/07/books/review/Collins-t.html">&ldquo;the artless charm of a confused book report&rdquo;</a> and that confused thinking seems to have found its way into <em>Bull Cook</em>&rsquo;s title as well.</p>
<p>I eventually got around to asking my preferred search companion <a href="https://kagi.com">Kagi</a> &ldquo;What&rsquo;s a bull cook?&rdquo; and got an immediate answer. Bull cook is a logging term&mdash;makes sense given Herter&rsquo;s Minnesota roots&mdash;and it means, <a href="https://www.merriam-webster.com/dictionary/bull%20cook">according to Merriam Webster</a>:</p>
<div class="prose bg-gray-800 px-4 py-[1px] border-l-4 border-blue-500 prose-h1:text-sky-200 prose-h2:text-sky-300 prose-h3:text-sky-400 prose-h4:text-sky-500 prose-p:text-sky-400 prose-ul:text-sky-400 prose-strong:text-sky-400 prose-ul-li:text-sky-400 prose-a:text-sky-400">
  <h3 id="bull-cook-noun">bull cook <em>(noun)</em></h3>
<p>: a handyman in a camp (as of loggers)</p>
<p><em><strong>especially</strong></em> : one who does caretaking chores and acts as cook&rsquo;s helper</p>
<h4 id="etymology">Etymology</h4>
<p>so called from his job of caring for oxen once used in logging camps</p>

</div>

<p>One of the gentlemen below is supposedly a bull cook. My money&rsquo;s on washtub left or washtub right.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/ballad-of-bull-cook/logging_camp_2.webp"><picture>
    <source type="image/webp" srcset="/blog/ballad-of-bull-cook/logging_camp_2_hu9760177222627369549.webp 320w, /blog/ballad-of-bull-cook/logging_camp_2_hu17270847928835910673.webp 640w, /blog/ballad-of-bull-cook/logging_camp_2_hu6966852771599204570.webp 768w, /blog/ballad-of-bull-cook/logging_camp_2_hu7209838084213038745.webp 1024w, /blog/ballad-of-bull-cook/logging_camp_2_hu10235463218952557760.webp 1280w, /blog/ballad-of-bull-cook/logging_camp_2_hu5479238322775190802.webp 1536w, /blog/ballad-of-bull-cook/logging_camp_2_hu13593938494598012780.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/ballad-of-bull-cook/logging_camp_2_hu13444557620139227474.jpg 320w, /blog/ballad-of-bull-cook/logging_camp_2_hu8473560306872840114.jpg 640w, /blog/ballad-of-bull-cook/logging_camp_2_hu15521543530950087280.jpg 768w, /blog/ballad-of-bull-cook/logging_camp_2_hu8186936975447094391.jpg 1024w, /blog/ballad-of-bull-cook/logging_camp_2_hu8389029914270695537.jpg 1280w, /blog/ballad-of-bull-cook/logging_camp_2_hu15112139527209679435.jpg 1536w, /blog/ballad-of-bull-cook/logging_camp_2_hu14646067888877989051.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/ballad-of-bull-cook/logging_camp_2_hu8473560306872840114.jpg" width="3072" height="1938" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="wacky-author-wacky-subject-matter-good-recipes">Wacky Author, Wacky Subject Matter, Good Recipes</h2>
<p>I&rsquo;ll follow <a href="https://archive.li/Qi9XR">Paul Collins&rsquo; lead</a> and quote from the third paragraph to give you a feel for <em>Bull Cook</em>&rsquo;s subject matter:</p>
<blockquote>
<p>For your convenience, I will start with meats, fish, eggs, soup and sauces, sandwiches, vegetables, the art of French frying, desserts, how to dress game, how to properly sharpen a knife, how to make wines and beer, what to do in case of hydrogen or cobalt bomb attack. Keeping as much in alphabetical order as possible.</p>
</blockquote>
<p>Nothing like mixing in a little nuclear attack drama to spice up the intro page. You can jump to <a href="https://archive.org/details/bullcookauthenti00hertrich/page/206/mode/2up?q=cobalt">page 206</a> for that advice around hydrogen or cobalt bomb attacks&mdash;presented in the context of preserving tomatoes. Herter&rsquo;s range is demonstrated in a random selection of pages: 42 (How to Prepare Javelina) and 223 (Fisherman&rsquo;s Wharf French bread).</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/ballad-of-bull-cook/bull_cook_p42.webp"><picture>
    <source type="image/webp" srcset="/blog/ballad-of-bull-cook/bull_cook_p42_hu16738015513433143253.webp 320w, /blog/ballad-of-bull-cook/bull_cook_p42_hu7660277086070893747.webp 640w, /blog/ballad-of-bull-cook/bull_cook_p42_hu978362567631679388.webp 768w, /blog/ballad-of-bull-cook/bull_cook_p42_hu12033878417608678694.webp 1024w, /blog/ballad-of-bull-cook/bull_cook_p42_hu16391467814204866320.webp 1280w, /blog/ballad-of-bull-cook/bull_cook_p42_hu3356125993922323968.webp 1536w, /blog/ballad-of-bull-cook/bull_cook_p42_hu5941097119755066381.webp 1920w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/ballad-of-bull-cook/bull_cook_p42_hu2060585889854498075.jpg 320w, /blog/ballad-of-bull-cook/bull_cook_p42_hu17745115012625767021.jpg 640w, /blog/ballad-of-bull-cook/bull_cook_p42_hu1951245584144215825.jpg 768w, /blog/ballad-of-bull-cook/bull_cook_p42_hu3112394335762284914.jpg 1024w, /blog/ballad-of-bull-cook/bull_cook_p42_hu327810657605511319.jpg 1280w, /blog/ballad-of-bull-cook/bull_cook_p42_hu9913823606300800461.jpg 1536w, /blog/ballad-of-bull-cook/bull_cook_p42_hu16083120601959534941.jpg 1920w" sizes="37vw" />
    
      <img src="/blog/ballad-of-bull-cook/bull_cook_p42_hu17745115012625767021.jpg" width="3072" height="4778" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[75%] mx-2 flex justify-center"><a href="/blog/ballad-of-bull-cook/bull_cook_p223.webp"><picture>
    <source type="image/webp" srcset="/blog/ballad-of-bull-cook/bull_cook_p223_hu8994984573584342140.webp 320w, /blog/ballad-of-bull-cook/bull_cook_p223_hu6481387168654312355.webp 640w, /blog/ballad-of-bull-cook/bull_cook_p223_hu6150968674028693249.webp 768w, /blog/ballad-of-bull-cook/bull_cook_p223_hu7720907275001120975.webp 1024w, /blog/ballad-of-bull-cook/bull_cook_p223_hu17242845517441776863.webp 1280w, /blog/ballad-of-bull-cook/bull_cook_p223_hu13277541822203943189.webp 1536w, /blog/ballad-of-bull-cook/bull_cook_p223_hu17118953219411245679.webp 1920w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/ballad-of-bull-cook/bull_cook_p223_hu16158994538504684579.jpg 320w, /blog/ballad-of-bull-cook/bull_cook_p223_hu15398161018531556669.jpg 640w, /blog/ballad-of-bull-cook/bull_cook_p223_hu10690456625532462359.jpg 768w, /blog/ballad-of-bull-cook/bull_cook_p223_hu12832250433991037641.jpg 1024w, /blog/ballad-of-bull-cook/bull_cook_p223_hu2828339646088705704.jpg 1280w, /blog/ballad-of-bull-cook/bull_cook_p223_hu6333360906918746531.jpg 1536w, /blog/ballad-of-bull-cook/bull_cook_p223_hu2925609606495523883.jpg 1920w" sizes="37vw" />
    
      <img src="/blog/ballad-of-bull-cook/bull_cook_p223_hu15398161018531556669.jpg" width="3396" height="5250" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>The book&rsquo;s very first recipe explains how to properly &ldquo;corn&rdquo; meats as in corned beef, and here Herter displays his gift of bombast:</p>
<blockquote>
<p>Although some cook books and food editors of magazines from time to time publish recipes for corning meat these recipes are not even close to the real one. This is the first time the real authentic recipe for corning meat has ever been published.</p>
</blockquote>
<p>One commenter <a href="https://www.trapshooters.com/threads/whatever-happened-to-herters.64650/#nested_reply_top_post">quipped</a> of the voluminous Herter&rsquo;s catalogs that &ldquo;if all the BS was out of them they&rsquo;d only be a half dozen pages long,&rdquo; and the same perhaps might be said of <em>Bull Cook</em>. But the recipes are apparently real and good according to those who have made them; I saw nothing ingredient- or procedure-wise that would make me feel otherwise.</p>
<h2 id="copyright-status-murky">Copyright Status: Murky</h2>
<p>So Herter&rsquo;s died as a business in 1981, and George Leonard Herter as a human in 1994&mdash;why, then, are <em>Bull Cook</em> and the rest of George Leonard&rsquo;s writings not living on in the public domain? Who in the world would care?</p>
<p>Below is the copyright page of the 12th edition; it&rsquo;s clearly in George and wife Berthe&rsquo;s name, as opposed to copyrighted by the business.</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/ballad-of-bull-cook/bull_cook_copyright.webp"><picture>
    <source type="image/webp" srcset="/blog/ballad-of-bull-cook/bull_cook_copyright_hu13058030150731319944.webp 320w, /blog/ballad-of-bull-cook/bull_cook_copyright_hu9821398970387986867.webp 640w, /blog/ballad-of-bull-cook/bull_cook_copyright_hu2038731194320678345.webp 768w, /blog/ballad-of-bull-cook/bull_cook_copyright_hu8944846589085278304.webp 1024w, /blog/ballad-of-bull-cook/bull_cook_copyright_hu9843122213618395303.webp 1280w, /blog/ballad-of-bull-cook/bull_cook_copyright_hu13938304800806604979.webp 1536w, /blog/ballad-of-bull-cook/bull_cook_copyright_hu11362801086502295691.webp 1920w" sizes="75vw" />
    <source type="image/jpeg" srcset="/blog/ballad-of-bull-cook/bull_cook_copyright_hu16698266822044802229.jpg 320w, /blog/ballad-of-bull-cook/bull_cook_copyright_hu10031043508170726133.jpg 640w, /blog/ballad-of-bull-cook/bull_cook_copyright_hu16432378386051107391.jpg 768w, /blog/ballad-of-bull-cook/bull_cook_copyright_hu1653884828405508345.jpg 1024w, /blog/ballad-of-bull-cook/bull_cook_copyright_hu1580099848419094893.jpg 1280w, /blog/ballad-of-bull-cook/bull_cook_copyright_hu9364085368075307364.jpg 1536w, /blog/ballad-of-bull-cook/bull_cook_copyright_hu17884309830273678378.jpg 1920w" sizes="75vw" />
    
      <img src="/blog/ballad-of-bull-cook/bull_cook_copyright_hu10031043508170726133.jpg" width="3072" height="3103" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>But the book&rsquo;s actual copyright status today remains murky. A deceased author, an initial publication date before 1978, and a possibly related bankruptcy make for a messy mix. These days, copyright automatically extends to 70 years past the author&rsquo;s death. But pre-1978, an initial copyright provided 28 years of protection, and the author was required to renew the registration to get extended protection. In my light and not necessarily expert research, I didn&rsquo;t turn up any such renewal for the initial volume of <em>Bull Cook</em> (apparently there were as many as four volumes). I don&rsquo;t know whether Herter himself went bankrupt or just his business; whether he assigned rights to his books, potentially in his will; what family members know about all of this; and so on.</p>
<p>One could summarize the gist of U.S. copyright laws as guilty (of copyright infringement) until proven innocent. With so many unknowns here, <em>Bull Cook</em> seems to be stuck in copyright purgatory, and our best option to consume it is one hour at a time on the Internet Archive.</p>
<h2 id="but-wait-its-on-scamazon">But Wait, It&rsquo;s on (sc)Amazon!</h2>
<p>And here we arrive at another surprise turn of events. During my initial searches, I had gotten hits for <em>Bull Cook</em> on Amazon. When I <a href="https://www.amazon.com/Bull-Authentic-Historical-Recipes-Practices/dp/1258460211">clicked through</a>, though, it didn&rsquo;t look like the same book&mdash;the cover was completely different and totally sketchy, check it out:</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/ballad-of-bull-cook/amazon_bull_cook.webp"><picture>
    <source type="image/webp" srcset="/blog/ballad-of-bull-cook/amazon_bull_cook_hu11014670893088233558.webp 320w, /blog/ballad-of-bull-cook/amazon_bull_cook_hu7445104596706371905.webp 640w, /blog/ballad-of-bull-cook/amazon_bull_cook_hu1906466953097450727.webp 768w, /blog/ballad-of-bull-cook/amazon_bull_cook_hu5407440913067704897.webp 1024w, /blog/ballad-of-bull-cook/amazon_bull_cook_hu2826105192441657443.webp 1280w, /blog/ballad-of-bull-cook/amazon_bull_cook_hu15201741847786580.webp 1536w, /blog/ballad-of-bull-cook/amazon_bull_cook_hu3116747239573976207.webp 1920w" sizes="75vw" />
    <source type="image/jpeg" srcset="/blog/ballad-of-bull-cook/amazon_bull_cook_hu15080675901618152177.jpg 320w, /blog/ballad-of-bull-cook/amazon_bull_cook_hu3100148285103119465.jpg 640w, /blog/ballad-of-bull-cook/amazon_bull_cook_hu4716697804227119971.jpg 768w, /blog/ballad-of-bull-cook/amazon_bull_cook_hu10651421706516207222.jpg 1024w, /blog/ballad-of-bull-cook/amazon_bull_cook_hu18207260592483184206.jpg 1280w, /blog/ballad-of-bull-cook/amazon_bull_cook_hu686291325561927700.jpg 1536w, /blog/ballad-of-bull-cook/amazon_bull_cook_hu11648686625508419554.jpg 1920w" sizes="75vw" />
    
      <img src="/blog/ballad-of-bull-cook/amazon_bull_cook_hu3100148285103119465.jpg" width="2048" height="3071" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>At the time, I wasn&rsquo;t interested in a sketchy and expensive print edition so I moved on. But in researching the copyright question, I circled back to <em>Bull Cook</em>&rsquo;s Amazon listings, just to see what I might learn about the book&rsquo;s copyright status. I found that <em>Bull Cook</em> on Amazon was published in 2012 (!!) by a company called Literary Licensing, LLC, of Whitefish, Montana; and is sold by Amazon themselves. Literary appears to be in the reprinting / print-on-demand business. That category itself is a scam magnet (e.g. ultra-expensive, poor-quality photocopied textbooks), but Literary might be in a scam-class of their own. Here&rsquo;s their website, where they proudly proclaim themselves to be a &ldquo;Publisher of Fine Books&rdquo;:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/ballad-of-bull-cook/literary_licensing_website.webp"><picture>
    <source type="image/webp" srcset="/blog/ballad-of-bull-cook/literary_licensing_website_hu6368648719493799080.webp 320w, /blog/ballad-of-bull-cook/literary_licensing_website_hu2045470191803299806.webp 640w, /blog/ballad-of-bull-cook/literary_licensing_website_hu16632542675518912798.webp 768w, /blog/ballad-of-bull-cook/literary_licensing_website_hu13359111437524430486.webp 1024w, /blog/ballad-of-bull-cook/literary_licensing_website_hu17679622926990135204.webp 1280w, /blog/ballad-of-bull-cook/literary_licensing_website_hu2704481111146647726.webp 1536w, /blog/ballad-of-bull-cook/literary_licensing_website_hu7192900922312158239.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/ballad-of-bull-cook/literary_licensing_website_hu11399465332513610016.jpg 320w, /blog/ballad-of-bull-cook/literary_licensing_website_hu904319875484452198.jpg 640w, /blog/ballad-of-bull-cook/literary_licensing_website_hu10040756258779224557.jpg 768w, /blog/ballad-of-bull-cook/literary_licensing_website_hu5793833610095965793.jpg 1024w, /blog/ballad-of-bull-cook/literary_licensing_website_hu9490953387695316287.jpg 1280w, /blog/ballad-of-bull-cook/literary_licensing_website_hu10108259796296630437.jpg 1536w, /blog/ballad-of-bull-cook/literary_licensing_website_hu9150994272470371793.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/ballad-of-bull-cook/literary_licensing_website_hu904319875484452198.jpg" width="3072" height="2046" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Note that I didn&rsquo;t say &ldquo;home page&rdquo; of Literary Licensing, because&mdash;that one screen is the whole damn website. It does have a search bar, but it just forwards you to an Amazon search for the term you enter, filtered for Literary&rsquo;s titles. Furthermore, I&rsquo;m not sure what the tropical beach scene has to do with Whitefish or fine books. Oh, wait, that&rsquo;s an Unsplash image. OH WAIT this is a Squarespace site. I&rsquo;m guessing Literary invested at least 15 minutes creating their home on the Internet, maybe even 20. Not confidence-inspiring.</p>
<h2 id="publisher-of-fine-books">Publisher of Fine Books</h2>
<p>It appears, based on what I&rsquo;m seeing and reading&mdash;and I don&rsquo;t know for sure, just sharing my intuition here&mdash;that Literary Licensing lives somewhere on the spectrum from unethical-but-mostly-legal to totally-a-scam. For example, check out <a href="https://thomasfarleyblog.com/?s=kessinger">this series of posts relating to Literary Licensing / Kessinger Publishing</a>. I haven&rsquo;t dug deep, but something smells here. Like a putrefying raccoon carcass.</p>
<p>So what might be going on in our case with the <em>Bull Cook</em> reprint? My guess is, somehow <em>Bull Cook</em> showed up on Literary&rsquo;s radar as &ldquo;there might be some demand for a reprint of this out-of-print book,&rdquo; perhaps from a signal such as used copies selling at a premium. Literary might then create a premium-priced listing on Amazon, and start taking orders. When orders roll in, Literary could then scan or find a scan of the book, slap their lovely cover on it, and fulfill via print-on-demand. Here&rsquo;s a review of another <a href="https://www.amazon.com/Color-Communism-Common-Manning-Johnson/dp/125877089X/ref=tmm_hrd_swatch_0?_encoding=UTF8&amp;qid=1697339378&amp;sr=1-1">Literary-published book on Amazon</a>&mdash;the buyer is none too pleased at how much they paid for a poor photocopy:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/ballad-of-bull-cook/literary_amazon_review.webp"><picture>
    <source type="image/webp" srcset="/blog/ballad-of-bull-cook/literary_amazon_review_hu16024445320629344853.webp 320w, /blog/ballad-of-bull-cook/literary_amazon_review_hu12857339617688642532.webp 640w, /blog/ballad-of-bull-cook/literary_amazon_review_hu751964579096601976.webp 768w, /blog/ballad-of-bull-cook/literary_amazon_review_hu8187229024587133261.webp 1024w, /blog/ballad-of-bull-cook/literary_amazon_review_hu1455743548540834440.webp 1280w, /blog/ballad-of-bull-cook/literary_amazon_review_hu8470214829749003517.webp 1536w, /blog/ballad-of-bull-cook/literary_amazon_review_hu10308465967772936386.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/ballad-of-bull-cook/literary_amazon_review_hu4144850046163087362.jpg 320w, /blog/ballad-of-bull-cook/literary_amazon_review_hu8311802989390530648.jpg 640w, /blog/ballad-of-bull-cook/literary_amazon_review_hu7413689500364455248.jpg 768w, /blog/ballad-of-bull-cook/literary_amazon_review_hu1269229092010787043.jpg 1024w, /blog/ballad-of-bull-cook/literary_amazon_review_hu253478412699357618.jpg 1280w, /blog/ballad-of-bull-cook/literary_amazon_review_hu4320681693495734278.jpg 1536w, /blog/ballad-of-bull-cook/literary_amazon_review_hu11366585351214831469.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/ballad-of-bull-cook/literary_amazon_review_hu8311802989390530648.jpg" width="3072" height="1779" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Where&rsquo;s the &ldquo;check copyright, secure a license for the materials&rdquo; step here? Based on what I&rsquo;ve seen reported about other books published by Literary, I suspect they simply skip that step and go ahead and list, sell, and print, figuring that nobody will notice or care. If you&rsquo;re a big fish in a small place like Whitefish, MT, you might be really tough to successfully sue.</p>
<p>There appears to be a related scam where a fairly current, somewhat popular book lacks one of Amazon&rsquo;s formats, such as &ldquo;hardcover,&rdquo; and Literary steps in with a poor-quality, overpriced reprint to fill that gap. In that case, you would think the rightsholder gets compensated somehow&hellip;but you never know. I don&rsquo;t understand Amazon&rsquo;s rules around this, but scammers have clearly found a way to thrive here.</p>
<p>Now get this&mdash;Literary Licensing, LLC appears to have <a href="https://www.amazon.com/s?i=stripbooks&amp;rh=p_30%3ALiterary+Licensing&amp;Adv-Srch-Books-Search=Search%21&amp;tag=kessingerpubl-20&amp;tag_value=kessingerpubl-20&amp;unfiltered=1&amp;ref=sr_adv_b">50,000 listings</a> on Amazon. The scale is pretty staggering here, reminding me of the massive fake mobile app scams on the Play Store and App Store. There&rsquo;s also a strong smell of alt-right-wacko in the selection of titles, possibly another Literary characteristic.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/ballad-of-bull-cook/literary_50k_titles.webp"><picture>
    <source type="image/webp" srcset="/blog/ballad-of-bull-cook/literary_50k_titles_hu9005874597285247976.webp 320w, /blog/ballad-of-bull-cook/literary_50k_titles_hu14652922911275942115.webp 640w, /blog/ballad-of-bull-cook/literary_50k_titles_hu15783639648519596331.webp 768w, /blog/ballad-of-bull-cook/literary_50k_titles_hu12995452561553253487.webp 1024w, /blog/ballad-of-bull-cook/literary_50k_titles_hu7635573960140213212.webp 1280w, /blog/ballad-of-bull-cook/literary_50k_titles_hu2028245908949297165.webp 1536w, /blog/ballad-of-bull-cook/literary_50k_titles_hu1899274872982055439.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/ballad-of-bull-cook/literary_50k_titles_hu430363630343572169.jpg 320w, /blog/ballad-of-bull-cook/literary_50k_titles_hu3850820771313998484.jpg 640w, /blog/ballad-of-bull-cook/literary_50k_titles_hu14150686604746985819.jpg 768w, /blog/ballad-of-bull-cook/literary_50k_titles_hu4770094131689865319.jpg 1024w, /blog/ballad-of-bull-cook/literary_50k_titles_hu8025705558687068159.jpg 1280w, /blog/ballad-of-bull-cook/literary_50k_titles_hu3299399349754856555.jpg 1536w, /blog/ballad-of-bull-cook/literary_50k_titles_hu5304450052914801425.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/ballad-of-bull-cook/literary_50k_titles_hu3850820771313998484.jpg" width="3072" height="1838" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Amazon usually keeps scammers at arm&rsquo;s length, just fulfilling orders (&ldquo;Ships from&rdquo;) but not taking on the liability of being the seller of record (&ldquo;Sold by&rdquo;). In this case, though, if Literary is in fact a scammer, Amazon appears to be taking on liability for that scam.</p>
<h2 id="what-stinks-here">What Stinks Here</h2>
<ul>
<li>That <em>Bull Cook</em> and other George Leonard Herter books aren&rsquo;t simply out there, public domain, easily accessible.</li>
<li>The U.S.&rsquo;s overly-protective copyright laws&mdash;the same ones that the <a href="https://en.wikipedia.org/wiki/Google_Books">Google Books project is still mired in</a>.</li>
<li>Fake re-publisher/scammers taking advantage of public domain or inadequately protected copyrighted material</li>
<li>Amazon looking the other way, as they are known to do elsewhere when it suits their purposes</li>
</ul>
<h2 id="what-doesnt-stink-here">What Doesn&rsquo;t Stink Here</h2>
<ul>
<li><em>Bull Cook</em>&rsquo;s fun recipes, authentic outdoor guide wisdom, crazy stories, and hilarious questionable facts</li>
<li>Organizations like the Internet Archive that work to preserve and disseminate interesting content like <em>Bull Cook</em></li>
</ul>
<p>I&rsquo;ve pulled enough threads on this one, time to finish up. I enjoyed the journey and intend to try out some <em>Bull Cook</em> recipes soon.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Cloudflare CAPTCHA Hell</title>
      <link>https://craftycto.com/micro/cloudflare-captcha-hell/</link>
      <pubDate>Tue, 22 Aug 2023 14:23:22 -0500</pubDate>
      
      <guid>https://craftycto.com/micro/cloudflare-captcha-hell/</guid>
      <description>Cloudflare&amp;#39;s CAPTCHA nonsense: a sign they&amp;#39;re getting too dominant?</description>
      <content:encoded><![CDATA[<p>I run <a href="https://pi-hole.net/">Pi-hole</a> on a Raspberry Pi device on our home network, eliminating most ads and trackers for every member of the household. Recently, though, I&rsquo;ve experienced an increasing number of cases where I get CAPTCHA challenges when I shouldn&rsquo;t. Sometimes I have to go through multiple challenges, and occasionally run into infinite CAPTCHA loops where I&rsquo;m unable to get through at all. I was worried that Pi-hole might be causing the issues.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/micro/cloudflare-captcha-hell/captcha-hell.png"><picture>
    <source type="image/webp" srcset="/micro/cloudflare-captcha-hell/captcha-hell_hu3641770927843101817.webp 320w, /micro/cloudflare-captcha-hell/captcha-hell_hu16754477694629282151.webp 640w, /micro/cloudflare-captcha-hell/captcha-hell_hu4900601081309481202.webp 768w, /micro/cloudflare-captcha-hell/captcha-hell_hu13873889687740390911.webp 1024w, /micro/cloudflare-captcha-hell/captcha-hell_hu14244563004318795331.webp 1280w" sizes="100vw" />
    <source type="image/jpeg" srcset="/micro/cloudflare-captcha-hell/captcha-hell_hu2913633916927715702.jpg 320w, /micro/cloudflare-captcha-hell/captcha-hell_hu14096464621161257787.jpg 640w, /micro/cloudflare-captcha-hell/captcha-hell_hu4393338396909724153.jpg 768w, /micro/cloudflare-captcha-hell/captcha-hell_hu5070616816817989431.jpg 1024w, /micro/cloudflare-captcha-hell/captcha-hell_hu12294015817641175841.jpg 1280w" sizes="100vw" />
    
      <img src="/micro/cloudflare-captcha-hell/captcha-hell_hu14096464621161257787.jpg" width="1456" height="816" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Today I finally hit the breaking point and dug into what was going on. This <a href="https://news.ycombinator.com/item?id=37009598">Hacker News comment thread</a> led to the culprit, which turned out to be Cloudflare. I had been relying on Cloudflare DNS as my upstream DNS resolver for Pi-hole, and Cloudflare was the party triggering the CAPTCHAs. Switching to OpenDNS fixed the issue.</p>
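<p>For the record, here&rsquo;s a sketch of what that switch looked like on the Pi-hole box itself. Details may vary by install and Pi-hole version, and the same change can be made in Pi-hole&rsquo;s web admin under Settings &gt; DNS; the OpenDNS addresses below are their standard public resolvers:</p>
<pre><code># /etc/pihole/setupVars.conf: replace the Cloudflare upstreams
# (1.1.1.1 / 1.0.0.1) with OpenDNS
PIHOLE_DNS_1=208.67.222.222
PIHOLE_DNS_2=208.67.220.220

# then reload Pi-hole's DNS service to pick up the change:
#   pihole restartdns
</code></pre>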
<p>I love Cloudflare Pages as my hosting solution for craftycto.com, but I&rsquo;m getting concerned about Cloudflare in general&mdash;they seem to be moving into a monopolistic, enshittification/extraction stage. That would be my cue to exit.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Gruber Translates Linda Yaccarino&#39;s Company-wide Memo on the X Rebrand</title>
      <link>https://craftycto.com/elsewhere/gruber-translates-linda-yaccarino/</link>
      <pubDate>Fri, 28 Jul 2023 21:44:06 -0500</pubDate>
      
      <guid>https://craftycto.com/elsewhere/gruber-translates-linda-yaccarino/</guid>
      <description>He calls it &amp;#39;translation from hostage code&amp;#39;—so funny, one of his best in this genre.</description>
      <content:encoded><![CDATA[<p>A few prime translations:</p>
<blockquote>
<p>It’s OK to day-drink. I am.</p>
</blockquote>
<blockquote>
<p>I used to run all advertising for NBCUniversal. Now I’m running an $8/month multi-level marketing scheme where the only users who’ve signed up are men who own a collection of MAGA hats.</p>
</blockquote>
<blockquote>
<p>As if this rebranding disaster isn’t enough, our infrastructure is crumbling. Facebook took Threads from 0 to 100 million users in under a week, without a hitch, at the same time we imposed comical rate limits on usage. I mean can you even believe that shit? I still can’t. I said to him, “Elon, we are an ad-based business. Our revenue is directly commensurate to usage. This is like running a casino and turning the slot machines off to save on the electricity bill. It makes no sense.” And Elon was like “Bots!”</p>
</blockquote>
<blockquote>
<p>Please quit. Get out. I beg you. Leave while you can put on your resume that you worked for “Twitter”.</p>
</blockquote>
]]></content:encoded>
    </item>
    
    <item>
      <title>Twitter and Its Successor States</title>
      <link>https://craftycto.com/elsewhere/twitter-and-successor-states/</link>
      <pubDate>Fri, 28 Jul 2023 21:18:44 -0500</pubDate>
      
      <guid>https://craftycto.com/elsewhere/twitter-and-successor-states/</guid>
      <description>Medieval historian Eleanor Janega hilariously draws the parallels between Elmo&amp;#39;s Twitter (aka X) and the &amp;#39;fall&amp;#39; of Rome</description>
      <content:encoded><![CDATA[<p>I love the tone of this article, a true pleasure to read&mdash;highly recommended. A couple of great lines:</p>
<blockquote>
<p>I mean it’s really funny how a bunch of very rich dudes are breaking stuff online and making worse versions of existing products in response to said breakage. Obviously it is quite amusing to be shown how stupid rich people are, and I very much enjoy it.</p>
</blockquote>
<blockquote>
<p>This is much like Elon Musk, as are dogs who chase cars and catch them, presumably. After being tricked into paying way more money for Twitter than it is worth he had to figure out some way to keep it ticking over and also to service all the debt that he took on to buy it. So because he is, and I cannot stress this enough, really quite stupid, he began to strip out all the load bearing members of staff who did stuff like not make the website DDOS itself.</p>
</blockquote>
<p>The lead plate is fun too, &ldquo;Vengeance of Clovis&rdquo;:</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/vengeance_de_clovis_4128100511823298927.jpg"><picture>
    <source type="image/webp" srcset="/vengeance_de_clovis_4128100511823298927_hu17619346860649281619.webp 320w, /vengeance_de_clovis_4128100511823298927_hu4916048237118840890.webp 640w, /vengeance_de_clovis_4128100511823298927_hu89321608488314913.webp 768w" sizes="75vw" />
    <source type="image/jpeg" srcset="/vengeance_de_clovis_4128100511823298927_hu14139611456527711013.jpg 320w, /vengeance_de_clovis_4128100511823298927_hu2574445166059928096.jpg 640w, /vengeance_de_clovis_4128100511823298927_hu274277481943476896.jpg 768w" sizes="75vw" />
    
      <img src="/vengeance_de_clovis_4128100511823298927_hu2574445166059928096.jpg" width="980" height="958" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

]]></content:encoded>
    </item>
    
    <item>
      <title>Ethan Mollick: &#34;How to Use AI to Do Stuff&#34;</title>
      <link>https://craftycto.com/elsewhere/how-to-use-ai-to-do-stuff/</link>
      <pubDate>Wed, 26 Jul 2023 18:45:13 -0500</pubDate>
      
      <guid>https://craftycto.com/elsewhere/how-to-use-ai-to-do-stuff/</guid>
      <description>Great roundup, and I agree with most recommendations. Bing, maybe not.</description>
      <content:encoded><![CDATA[<p>I&rsquo;ve loved a lot of New Microsoft (GitHub, VS Code) and since I&rsquo;m on macOS + Linux, and have avoided installing Microsoft apps, hadn&rsquo;t been exposed to that side of the company for years.</p>
<p>Recently, though, I&rsquo;ve installed Office (client driven need), Teams (hoping the native Mac app would be better than running Teams in a browser), and Edge (to work with Bing Chat). The experience ranged from &ldquo;ughhh&rdquo; on the positive end to &ldquo;UNINSTALL NOW&rdquo; on the negative.</p>
<p>I can confidently report: the old M$ DNA is alive and well.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>OpenAI Concedes: AI Can&#39;t Detect AI</title>
      <link>https://craftycto.com/elsewhere/ai-cant-detect-ai/</link>
      <pubDate>Wed, 26 Jul 2023 08:07:11 -0500</pubDate>
      
      <guid>https://craftycto.com/elsewhere/ai-cant-detect-ai/</guid>
      <description>OpenAI quietly shuts down its AI detection tool due to poor accuracy</description>
      <content:encoded><![CDATA[<p>OpenAI was one of many who attempted to build an AI tool to detect whether a given document was AI generated.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/elsewhere/ai-cant-detect-ai/openai-ai-detector-2.png"><picture>
    <source type="image/webp" srcset="/elsewhere/ai-cant-detect-ai/openai-ai-detector-2_hu37214076999039450.webp 320w, /elsewhere/ai-cant-detect-ai/openai-ai-detector-2_hu6275743292085227327.webp 640w, /elsewhere/ai-cant-detect-ai/openai-ai-detector-2_hu18033689737092466993.webp 768w, /elsewhere/ai-cant-detect-ai/openai-ai-detector-2_hu12585668661492607907.webp 1024w, /elsewhere/ai-cant-detect-ai/openai-ai-detector-2_hu8629702948210163200.webp 1280w, /elsewhere/ai-cant-detect-ai/openai-ai-detector-2_hu9832019553220936871.webp 1536w" sizes="100vw" />
    <source type="image/jpeg" srcset="/elsewhere/ai-cant-detect-ai/openai-ai-detector-2_hu326785104950357386.jpg 320w, /elsewhere/ai-cant-detect-ai/openai-ai-detector-2_hu18237448160363963420.jpg 640w, /elsewhere/ai-cant-detect-ai/openai-ai-detector-2_hu409879009793389022.jpg 768w, /elsewhere/ai-cant-detect-ai/openai-ai-detector-2_hu13094891052207001291.jpg 1024w, /elsewhere/ai-cant-detect-ai/openai-ai-detector-2_hu6766314923741850724.jpg 1280w, /elsewhere/ai-cant-detect-ai/openai-ai-detector-2_hu781966590412798108.jpg 1536w" sizes="100vw" />
    
      <img src="/elsewhere/ai-cant-detect-ai/openai-ai-detector-2_hu18237448160363963420.jpg" width="1709" height="907" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>OpenAI announced the classifier in a <a href="https://openai.com/blog/new-ai-classifier-for-indicating-ai-written-text">blog post</a> on January 31st, then quietly pulled the plug a few days ago&mdash;not with a new announcement, but by adding a note at the top of that January post:</p>
<blockquote>
<p>As of July 20, 2023, the AI classifier is no longer available due to its low rate of accuracy. We are working to incorporate feedback and are currently researching more effective provenance techniques for text, and have made a commitment to develop and deploy mechanisms that enable users to understand if audio or visual content is AI-generated.</p>
</blockquote>
]]></content:encoded>
    </item>
    
    <item>
      <title>An Appllama Week in AI</title>
      <link>https://craftycto.com/blog/an-appllama-week/</link>
      <pubDate>Fri, 21 Jul 2023 19:24:18 -0500</pubDate>
      
      <guid>https://craftycto.com/blog/an-appllama-week/</guid>
      <description>Meta makes waves with Llama 2, while Bloomberg pumps itself with Apple LLM non-news.</description>
      <content:encoded><![CDATA[<p>This past week brought significant AI news; in fact, it&rsquo;s feeling like the rate of change in  the generative AI space is notching back up again. I&rsquo;ll focus in two areas here: Llama 2 and Apple LLM rumors.</p>
<h2 id="meta-ai-releases-llama-2">Meta AI Releases Llama 2</h2>
<p>Meta&rsquo;s <a href="https://ai.meta.com/llama/">Llama 2 release</a> is a seminal event, given that (a) Meta&rsquo;s initial LLaMA 1 release spawned <a href="/blog/march-madness-ai/">AI March Madness</a>, (b) Llama 2 is substantially more capable than the first LLaMA, and (c) this time Meta is allowing as opposed to blocking commercial use. That is, unless you&rsquo;re Apple, Snap, TikTok, and a handful of others&mdash;more on this below.</p>
<p>Last time around, Meta announced LLaMA but made the full details&mdash;most importantly, model weights&mdash;available only to a limited set of researchers. But someone leaked those weights within days of the announcement, and that led to an <a href="/blog/the-emergence-s1e4/">explosion of innovation</a> among independent researcher-hobbyists. This time, I suppose Meta learned the lesson and preempted the leak.</p>
<p>Another innovation explosion appears to be underway; already Llama 2&ndash;derived models are near the top of the <a href="https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard">HuggingFace Open LLM Leaderboard</a>, and new Llama stories are still on the first page of Hacker News, such as this <a href="https://news.ycombinator.com/item?id=36838051">Llama 2 pure C implementation</a> that will run on laptops, including Apple M1-series MacBooks.</p>
<p>Meta releasing Llama 2 fairly openly is a very good thing for open LLM research and progress, but don&rsquo;t be fooled into thinking that Meta&rsquo;s doing something altruistic here&mdash;NOT. Meta&rsquo;s looking out for Mark and Meta, only&mdash;this is a commoditize your competition play. But still a very good thing.</p>
<h2 id="bloombergs-mark-gurman-scoops-apple-gpt-aka-ajax">Bloomberg&rsquo;s Mark Gurman &ldquo;Scoops&rdquo; Apple GPT aka Ajax</h2>
<p>Mark Gurman on Wednesday <a href="https://archive.ph/7tqs5">&ldquo;broke&rdquo; the story</a> that Apple is quietly active in the LLM space and working on a GPT-style chatbot. I use air quotes around &ldquo;scoop&rdquo; and &ldquo;broke&rdquo; because everyone with any historical Apple knowledge knows that Apple always has a team working secretly on everything that Apple should obviously be working on, obviously. And there&rsquo;s nothing in the universe more obvious than this one.</p>
<p>John Gruber, in his <a href="https://daringfireball.net/2023/07/apple_gpt_bloomberg">commentary on the Gurman piece</a>, educated me on the Bloomberg News / Bloomberg Terminal pump-scam, which I hadn&rsquo;t been aware of:</p>
<blockquote>
<p>Apple’s brief 2.7 percent jump and Microsoft’s smaller but still-significant drop, both at 12:04pm, were clearly caused by Gurman’s report. Bloomberg Terminal subscribers get such reports before anyone else. (Bloomberg employees, of course, know such information before it’s published, but I’m sure never do anything untoward with it.) Once you view Bloomberg’s original reporting through this prism — that most of their original reporting is delivered with the goal of moving the stock prices of the companies they’re reporting on, for the purpose of proving the value of a Bloomberg Terminal’s hefty subscription fee1 to day-trading gamblers — a lot of their seemingly inexplicable stylistic quirks don’t seem so inexplicable any more. They just seem a little gross.</p>
</blockquote>
<p>Yeah, that is SlimyWorld all the way. Not surprising, though, the last 8-odd years have lifted the curtain on a lot of formerly hidden beliefs and behaviors; this fits right in, and is brought to us courtesy of the same fabulous crowd.</p>
<p>Anyway, Gurman&rsquo;s article didn&rsquo;t contain much actual news; just that the project&rsquo;s codename might be Ajax and that some people are calling the chat interface Apple GPT.</p>
<h2 id="intersection-apple-llm-chatter-on-hacker-news">Intersection: Apple LLM Chatter on Hacker News</h2>
<p>Big news like Llama 2 gets immediately posted on YCombinator&rsquo;s Hacker News, and Llama 2 was near the top of the activity-ranked feed for several days, with <a href="https://news.ycombinator.com/item?id=36774627">819 comments</a>. Meta released Llama 2 under a license that&rsquo;s much more commercially friendly than the original LLaMA&rsquo;s; let&rsquo;s call it <strong>commercial-friendly-unless-you&rsquo;re-huge</strong> because it contains a clause that requires securing a separate license only if the organization using Llama 2 has more than 700 million monthly active users. This constraint impacts a very short list of companies, Apple among them.</p>
<p>That led to chatter on Hacker News around Apple&rsquo;s place in the LLM / GPT space. Here&rsquo;s a sampling:</p>
<blockquote>
<p>(stu2b50) I think more Apple [that this constraint blocks]. It&rsquo;s not like Google or Microsoft would want to use LLaMA when they have fully capable models themselves. I wouldn&rsquo;t be surprised if Amazon does as well &hellip;Apple is the big laggard in terms of big tech and complex neural network models.</p>
</blockquote>
<blockquote>
<p>(samwillis) Apple would absolutely not want to use a competitors, or any other, public LLM. They want to own the whole stack, and will want to have their own secret source as part of it. It&rsquo;s not like they don&rsquo;t have the capital to invest in training&hellip;</p>
</blockquote>
<blockquote>
<p>(NotAFood) Apple has shown time and time again that they have the human capital and money to tackle massive projects discretely. It&rsquo;s already fairly well known that Apple&rsquo;s NLP experts from Siri have been reallocated to some secret project. They are more than capable of training an LLM but given their track record in other segments they probably want to wait for the technology to become more &ldquo;polished&rdquo; and give less hallucinated answers. They are likely also want the LLM to work locally (at least partially) on their devices using the Neural Engine which adds further engineering complexity to their project. They could even be timing the LLM&rsquo;s launch around a hardware release capable of running the model (M3, M4, etc&hellip;).</p>
</blockquote>
<blockquote>
<p>(amelius) Apple only has to slightly open their wallet to become a DL superpower.</p>
</blockquote>
<blockquote>
<p>(yellow_postit) Apple is a complete laggard in this space due to years of restrictions on research. They are hiring multiple “AI” roles now and they have the capital and focus to “eventually” catch up — but it is very much a catch-up game &hellip; That said, they seem to prefer catchup waiting till others explore new tech they swoop in an (claim) to perfect it from a usability pov. I have no reason to suspect they won’t do the same here.</p>
</blockquote>
<blockquote>
<p>(whimsicalism) I work in the field and they just are not hiring the people they need to be hiring.</p>
</blockquote>
<h2 id="my-take-on-apple-in-llmgpt-land">My Take on Apple in LLM/GPT Land</h2>
<ul>
<li>Of course Apple is working on something LLM.
<ul>
<li>This doesn&rsquo;t mean the something will ever see the light of day.</li>
<li>We <a href="/micro/apple-did-it/">won&rsquo;t hear about it</a> until it&rsquo;s polished and ready.</li>
</ul>
</li>
<li>Despite massive support for, and applications of, machine learning throughout their software+hardware ecosystem, Apple was blindsided by the generative AI explosion and is playing catch-up.
<ul>
<li>The comments around Apple not hiring the people they need to be hiring are probably a fair assessment, a symptom of Apple&rsquo;s blindsided-catching-up current state.</li>
</ul>
</li>
<li>Apple is uniquely positioned to do some really good things in this space, once they get their shit together.</li>
<li>Apple will indeed get their LLM shit together to some reasonable degree&mdash;in Apple Time.</li>
<li>Apple Time moves far slower than AI Time. Apple might get pinched, except where they&rsquo;re uniquely positioned.</li>
</ul>
]]></content:encoded>
    </item>
    
    <item>
      <title>Did GPT-4 Code Interpreter Escape From Its Sandbox?</title>
      <link>https://craftycto.com/micro/did-code-interpreter-escape/</link>
      <pubDate>Fri, 14 Jul 2023 19:21:46 -0500</pubDate>
      
      <guid>https://craftycto.com/micro/did-code-interpreter-escape/</guid>
      <description>I gave Code Interpreter a workout this morning, and it appeared to exit the building.</description>
      <content:encoded><![CDATA[<div class="prose bg-gray-800 px-4 py-[1px] border-l-4 border-blue-500 prose-h1:text-sky-200 prose-h2:text-sky-300 prose-h3:text-sky-400 prose-h4:text-sky-500 prose-p:text-sky-400 prose-ul:text-sky-400 prose-strong:text-sky-400 prose-ul-li:text-sky-400 prose-a:text-sky-400">
  <h3 id="update-this-was-hallucination-not-sandbox-escape">UPDATE: This was hallucination, not sandbox escape</h3>
<p>I <a href="https://mastodon.social/@Jackivers/110715505542072107">talked through what happened with Simon Willison</a>, and he convinced me that this was actually an interesting flavor of hallucination.  Specifically, GPT-4 hallucinated a connection between the files I had uploaded, and documents in its training corpus, based on <em>the names of the uploaded files</em>. The same evidence I cited below, that the mistakenly-associated content was authored by a name similar to the filename / author of the document that I uploaded, works as an explanation here. As I mentioned to Simon:</p>
<blockquote>
<p>It’s funny to watch [GPT-4] shift in and out of reality as the pull of hallucination ebbs and flows. Interesting also that it didn’t hallucinate the content (in two cases at least); but rather that “this file actually contains that content (from my training corpus)” …</p>
</blockquote>

</div>

<p>Until this morning, I hadn&rsquo;t thrown a real task at GPT-4 Code Interpreter, so I decided to see how it would perform at intelligently combining several proposal template documents into one (<a href="https://chat.openai.com/share/e2aa14e9-d0f5-45ce-9a70-7fb0827bf30b">link to transcript</a>). The answer: it didn&rsquo;t perform all that well; but I have ideas on how to guide it better next time. But that&rsquo;s not the story here.</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/micro/did-code-interpreter-escape/wrong-1.png"><picture>
    <source type="image/webp" srcset="/micro/did-code-interpreter-escape/wrong-1_hu10971566830358320686.webp 320w, /micro/did-code-interpreter-escape/wrong-1_hu8945801805797277298.webp 640w, /micro/did-code-interpreter-escape/wrong-1_hu18012940676385585485.webp 768w, /micro/did-code-interpreter-escape/wrong-1_hu16524879312293431604.webp 1024w, /micro/did-code-interpreter-escape/wrong-1_hu5564983815954922048.webp 1280w, /micro/did-code-interpreter-escape/wrong-1_hu3472244483595575749.webp 1536w" sizes="75vw" />
    <source type="image/jpeg" srcset="/micro/did-code-interpreter-escape/wrong-1_hu9702312929579372288.jpg 320w, /micro/did-code-interpreter-escape/wrong-1_hu11065009470678756451.jpg 640w, /micro/did-code-interpreter-escape/wrong-1_hu6182949016625952407.jpg 768w, /micro/did-code-interpreter-escape/wrong-1_hu9188122876179387634.jpg 1024w, /micro/did-code-interpreter-escape/wrong-1_hu5070836667302242907.jpg 1280w, /micro/did-code-interpreter-escape/wrong-1_hu801791931509165071.jpg 1536w" sizes="75vw" />
    
      <img src="/micro/did-code-interpreter-escape/wrong-1_hu11065009470678756451.jpg" width="1790" height="1904" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>I uploaded 4 PDF files, each of which was a different proposal template, and asked it to effectively merge them, putting like sections from different documents together. It actually made some progress at this, and clearly the ability to feed it files as input opens up a huge range of new applications.</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/micro/did-code-interpreter-escape/wrong-2.png"><picture>
    <source type="image/webp" srcset="/micro/did-code-interpreter-escape/wrong-2_hu14306290800346135246.webp 320w, /micro/did-code-interpreter-escape/wrong-2_hu4673691835059655686.webp 640w, /micro/did-code-interpreter-escape/wrong-2_hu12902537605617403943.webp 768w, /micro/did-code-interpreter-escape/wrong-2_hu3812224483874042155.webp 1024w, /micro/did-code-interpreter-escape/wrong-2_hu6864712982754524047.webp 1280w, /micro/did-code-interpreter-escape/wrong-2_hu1046295993843402209.webp 1536w" sizes="75vw" />
    <source type="image/jpeg" srcset="/micro/did-code-interpreter-escape/wrong-2_hu12312333395132631390.jpg 320w, /micro/did-code-interpreter-escape/wrong-2_hu10291355758771158311.jpg 640w, /micro/did-code-interpreter-escape/wrong-2_hu13944011853951826132.jpg 768w, /micro/did-code-interpreter-escape/wrong-2_hu7863110066510777867.jpg 1024w, /micro/did-code-interpreter-escape/wrong-2_hu10585486561315916314.jpg 1280w, /micro/did-code-interpreter-escape/wrong-2_hu18144371494792850055.jpg 1536w" sizes="75vw" />
    
      <img src="/micro/did-code-interpreter-escape/wrong-2_hu10291355758771158311.jpg" width="1830" height="1152" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Perhaps a third of the way into a long chatting session, however, Code Interpreter appeared to cross-link its mental pointers for my four proposal files with someone or something else&rsquo;s documents. It certainly appears that Code Interpreter was outside my session sandbox.</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/micro/did-code-interpreter-escape/wrong-3.png"><picture>
    <source type="image/webp" srcset="/micro/did-code-interpreter-escape/wrong-3_hu4816768235878459878.webp 320w, /micro/did-code-interpreter-escape/wrong-3_hu11630419502354213206.webp 640w, /micro/did-code-interpreter-escape/wrong-3_hu452607349281969035.webp 768w, /micro/did-code-interpreter-escape/wrong-3_hu15540452812490467420.webp 1024w, /micro/did-code-interpreter-escape/wrong-3_hu5530283221658809803.webp 1280w, /micro/did-code-interpreter-escape/wrong-3_hu11667998626470458117.webp 1536w" sizes="75vw" />
    <source type="image/jpeg" srcset="/micro/did-code-interpreter-escape/wrong-3_hu3345693754778739359.jpg 320w, /micro/did-code-interpreter-escape/wrong-3_hu8285229267880697338.jpg 640w, /micro/did-code-interpreter-escape/wrong-3_hu17834637233344091386.jpg 768w, /micro/did-code-interpreter-escape/wrong-3_hu7382449899289416467.jpg 1024w, /micro/did-code-interpreter-escape/wrong-3_hu1442464400604145494.jpg 1280w, /micro/did-code-interpreter-escape/wrong-3_hu4431839310691722436.jpg 1536w" sizes="75vw" />
    
      <img src="/micro/did-code-interpreter-escape/wrong-3_hu8285229267880697338.jpg" width="1814" height="1884" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>I actually got linked to German-language content for one of the documents briefly, and after a bit CI stabilized around linking my supposed uploads to four ML / AI research documents.</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/micro/did-code-interpreter-escape/wrong-7.png"><picture>
    <source type="image/webp" srcset="/micro/did-code-interpreter-escape/wrong-7_hu7186584119694051096.webp 320w, /micro/did-code-interpreter-escape/wrong-7_hu15142805574805782455.webp 640w, /micro/did-code-interpreter-escape/wrong-7_hu7318578544247634365.webp 768w, /micro/did-code-interpreter-escape/wrong-7_hu15160949662841526543.webp 1024w, /micro/did-code-interpreter-escape/wrong-7_hu13270177397114531378.webp 1280w, /micro/did-code-interpreter-escape/wrong-7_hu10433037512969123345.webp 1536w" sizes="75vw" />
    <source type="image/jpeg" srcset="/micro/did-code-interpreter-escape/wrong-7_hu9033198268402142800.jpg 320w, /micro/did-code-interpreter-escape/wrong-7_hu9677466854868608375.jpg 640w, /micro/did-code-interpreter-escape/wrong-7_hu17162604971811857076.jpg 768w, /micro/did-code-interpreter-escape/wrong-7_hu14921749113505526283.jpg 1024w, /micro/did-code-interpreter-escape/wrong-7_hu13787991163840563432.jpg 1280w, /micro/did-code-interpreter-escape/wrong-7_hu4787975661155177889.jpg 1536w" sizes="75vw" />
    
      <img src="/micro/did-code-interpreter-escape/wrong-7_hu9677466854868608375.jpg" width="1766" height="1898" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>I did a bit of searching on Kagi and in two of the four cases, <a href="https://www.analyticsvidhya.com/blog/2016/06/bayesian-statistics-beginners-simple-english/">found</a> the source <a href="https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/what-is-linear-regression/">content</a> on research-oriented websites. I was unable to locate the other two sources, which might make sense as they looked like snippets from actual research papers. I didn&rsquo;t search exhaustively or try Google Research or its ilk.</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/micro/did-code-interpreter-escape/wrong-5.png"><picture>
    <source type="image/webp" srcset="/micro/did-code-interpreter-escape/wrong-5_hu6833671444579769180.webp 320w, /micro/did-code-interpreter-escape/wrong-5_hu12996531649956529788.webp 640w, /micro/did-code-interpreter-escape/wrong-5_hu4672689496061395425.webp 768w, /micro/did-code-interpreter-escape/wrong-5_hu3717040046218827528.webp 1024w, /micro/did-code-interpreter-escape/wrong-5_hu17632419985210396886.webp 1280w, /micro/did-code-interpreter-escape/wrong-5_hu16916414792742917405.webp 1536w" sizes="75vw" />
    <source type="image/jpeg" srcset="/micro/did-code-interpreter-escape/wrong-5_hu8211501711177705571.jpg 320w, /micro/did-code-interpreter-escape/wrong-5_hu12639971386266449849.jpg 640w, /micro/did-code-interpreter-escape/wrong-5_hu6275968948397517985.jpg 768w, /micro/did-code-interpreter-escape/wrong-5_hu2092277623866146688.jpg 1024w, /micro/did-code-interpreter-escape/wrong-5_hu964031856811599152.jpg 1280w, /micro/did-code-interpreter-escape/wrong-5_hu9708816940486881624.jpg 1536w" sizes="75vw" />
    
      <img src="/micro/did-code-interpreter-escape/wrong-5_hu12639971386266449849.jpg" width="1776" height="1888" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>So where did these documents come from? The best I can say is, &ldquo;definitely not my sandbox.&rdquo; The main possibilities that come to mind for me are:</p>
<ol>
<li>Someone else&rsquo;s sandbox (a scary thought)</li>
<li>Some content embedded in the model (no clue how that might happen)</li>
<li>Something that gets installed in the sandbox as sample data, e.g. as part of some Python library</li>
</ol>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/micro/did-code-interpreter-escape/wrong-6.png"><picture>
    <source type="image/webp" srcset="/micro/did-code-interpreter-escape/wrong-6_hu8425993143632887244.webp 320w, /micro/did-code-interpreter-escape/wrong-6_hu14092565191660506258.webp 640w, /micro/did-code-interpreter-escape/wrong-6_hu518230490698651472.webp 768w, /micro/did-code-interpreter-escape/wrong-6_hu14736777054590185343.webp 1024w, /micro/did-code-interpreter-escape/wrong-6_hu11587232996683493552.webp 1280w, /micro/did-code-interpreter-escape/wrong-6_hu5228564238595749517.webp 1536w" sizes="75vw" />
    <source type="image/jpeg" srcset="/micro/did-code-interpreter-escape/wrong-6_hu17124247500278708197.jpg 320w, /micro/did-code-interpreter-escape/wrong-6_hu3722900357036170523.jpg 640w, /micro/did-code-interpreter-escape/wrong-6_hu14945420823198313604.jpg 768w, /micro/did-code-interpreter-escape/wrong-6_hu14969600571085651486.jpg 1024w, /micro/did-code-interpreter-escape/wrong-6_hu11083886347268991805.jpg 1280w, /micro/did-code-interpreter-escape/wrong-6_hu1260167952768512312.jpg 1536w" sizes="75vw" />
    
      <img src="/micro/did-code-interpreter-escape/wrong-6_hu3722900357036170523.jpg" width="1812" height="1122" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>And how did Code Interpreter get its wires crossed? A real Python person might have a clue but all I can do is speculate. Most promising idea: a name collision in a shared scratchpad area (e.g. an S3 bucket). Why do I think that? I saw a case or two where the content seemed to be authored by a name similar to the name / author of the document that I uploaded. If true, this could support case (1) above &hellip; not good. (See update above&mdash;this appears to be &ldquo;hallucination based on file name&rdquo; &hellip;)</p>
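<p>To make the speculation concrete, here&rsquo;s a minimal sketch of the collision I was imagining. Every name in it is invented for illustration&mdash;this is <em>not</em> how OpenAI&rsquo;s storage actually works (and per the update above, the real cause turned out to be hallucination): if a shared store were keyed by filename alone, rather than namespaced per session, two users uploading files with the same name could cross-link each other&rsquo;s content.</p>

```python
# Hypothetical illustration only: a shared scratchpad (stand-in for, say,
# an S3 bucket) keyed by filename alone, with no per-session namespace.
shared_scratchpad = {}


def store_upload(session_id, filename, content):
    # The hazard being illustrated: the key omits session_id, so a later
    # upload with the same filename overwrites another session's file.
    shared_scratchpad[filename] = content


def fetch_upload(session_id, filename):
    return shared_scratchpad[filename]


store_upload("session-A", "proposal.pdf", "my proposal template")
store_upload("session-B", "proposal.pdf", "someone else's document")

# Session A now silently reads back session B's content:
print(fetch_upload("session-A", "proposal.pdf"))  # someone else's document

# Namespacing the key per session, e.g. (session_id, filename),
# would prevent the collision.
```

<p>Again, purely a thought experiment about what a name collision in shared storage would look like, not a claim about Code Interpreter&rsquo;s actual architecture.</p>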
<p>In any case&mdash;very interesting &hellip;</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Just-released GPT-4 Code Interpreter is a Big Deal, Part 2</title>
      <link>https://craftycto.com/elsewhere/gpt4-code-interpreter-big-deal-pt2/</link>
      <pubDate>Tue, 11 Jul 2023 18:57:19 -0500</pubDate>
      
      <guid>https://craftycto.com/elsewhere/gpt4-code-interpreter-big-deal-pt2/</guid>
      <description>Latent Space had an &amp;#34;emergency pod&amp;#34; about Code Interpreter and 17,000 people joined.</description>
      <content:encoded><![CDATA[<p>I think everyone on the podcast was in danger of hyperventilating. Many interesting facts and hacks came to light. The <a href="https://www.latent.space/p/code-interpreter#details">show notes</a> are a novella and worth reading as swyx recaps exhaustively and does a useful executive summary.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Just-released GPT-4 Code Interpreter is a Big Deal, Part 1</title>
      <link>https://craftycto.com/elsewhere/gpt4-code-interpreter-big-deal-pt1/</link>
      <pubDate>Tue, 11 Jul 2023 18:05:15 -0500</pubDate>
      
      <guid>https://craftycto.com/elsewhere/gpt4-code-interpreter-big-deal-pt1/</guid>
      <description>Wharton Associate Professor Ethan Mollick has an excellent introduction.</description>
<content:encoded><![CDATA[<p><a href="https://www.linkedin.com/in/emollick/">Mollick</a> has had beta access to Code Interpreter over the past several months and his introduction captures the broader implications of giving an LLM the ability to write Python code. He also points out that its ability to upload and download files (uploads are up to 100MB) is itself transformational, compared with the current, limited copy-and-paste-text approach. Mollick is not a Python developer, but you&rsquo;ll be amazed at what he&rsquo;s able to produce with Code Interpreter.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>ChatGPT&#43; Privacy Update</title>
      <link>https://craftycto.com/micro/really-openai-update/</link>
      <pubDate>Mon, 10 Jul 2023 10:35:44 -0500</pubDate>
      
      <guid>https://craftycto.com/micro/really-openai-update/</guid>
      <description>There appears to be a workaround for now to OpenAI&amp;#39;s &amp;#34;If you want privacy, no chat history for you!&amp;#34; policy.</description>
      <content:encoded><![CDATA[<p>Recently, I wrote a <a href="/blog/really-openai/">short essay</a> about an unpleasant discovery I made with ChatGPT&rsquo;s dark pattern whereby, if you want to turn off &ldquo;train your models using my chats,&rdquo; you need to disable one of ChatGPT&rsquo;s nicest features, chat history. Here&rsquo;s the dialog in question:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/images/chatgptplus-settings.png"><picture>
    <source type="image/webp" srcset="/images/chatgptplus-settings_hu7599713760216561192.webp 320w, /images/chatgptplus-settings_hu16540195523136225940.webp 640w, /images/chatgptplus-settings_hu9289485602936297438.webp 768w, /images/chatgptplus-settings_hu15120095099153122856.webp 1024w, /images/chatgptplus-settings_hu11326927866136649228.webp 1280w" sizes="100vw" />
    <source type="image/jpeg" srcset="/images/chatgptplus-settings_hu12636806817147812377.jpg 320w, /images/chatgptplus-settings_hu7103966425324840714.jpg 640w, /images/chatgptplus-settings_hu18160948476520384226.jpg 768w, /images/chatgptplus-settings_hu4801097128387624069.jpg 1024w, /images/chatgptplus-settings_hu9933616012614821163.jpg 1280w" sizes="100vw" />
    
      <img src="/images/chatgptplus-settings_hu7103966425324840714.jpg" width="1462" height="912" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>The update is that there may be a workaround. On the <a href="https://help.openai.com/en/articles/7730893-data-controls-faq">Data Controls FAQ</a>, buried two screen-scrolls down, there&rsquo;s a section that says:</p>
<div class="prose bg-gray-800 px-4 py-[1px] border-l-4 border-blue-500 prose-h1:text-sky-200 prose-h2:text-sky-300 prose-h3:text-sky-400 prose-h4:text-sky-500 prose-p:text-sky-400 prose-ul:text-sky-400 prose-strong:text-sky-400 prose-ul-li:text-sky-400 prose-a:text-sky-400">
  <h2 id="what-if-i-want-to-keep-my-history-on-but-disable-model-training">What if I want to keep my history on but disable model training?</h2>
<p>We are working on a new offering called ChatGPT Business that will opt end-users out of model training by default. In the meantime, you can opt out from our use of your data to improve our services by filling out <a href="https://docs.google.com/forms/d/e/1FAIpQLScrnC-_A7JFs4LbIuzevQ_78hVERlNqqCPCt3d8XqnKOfdRdQ/viewform">this form</a>. Once you submit the form, new conversations will not be used to train our models.</p>
</div>

<p>The takeaways:</p>
<ul>
<li>OpenAI seems to understand that business users will want chat history, but definitely will not want sensitive data shared in chats to be used to train OpenAI&rsquo;s models; hence they are working on a &ldquo;ChatGPT Business&rdquo; offering.</li>
<li>There appears to be a workaround today by filling out the form. Ironically, the form is a &hellip; Google Form.</li>
</ul>
<p>We&rsquo;ll have to wait and see whether OpenAI will close this loophole for non-business users once they launch ChatGPT Business. <a href="https://openai.com/blog/new-ways-to-manage-your-data-in-chatgpt">This announcement</a> hints that they might.</p>
<p>In any case, OpenAI still gets four weasels.</p>
<div class="flex justify-evenly"><div class="w-[15%] mx-2 flex justify-center"><a href="/images/weasel-words.png"><picture>
    <source type="image/webp" srcset="/images/weasel-words_hu10193042854199681085.webp 320w, /images/weasel-words_hu1445983781549493554.webp 640w, /images/weasel-words_hu3355899283314269240.webp 768w, /images/weasel-words_hu9125180995987912684.webp 1024w, /images/weasel-words_hu6756663060043024720.webp 1280w" sizes="6vw" />
    <source type="image/jpeg" srcset="/images/weasel-words_hu4367553655072530837.jpg 320w, /images/weasel-words_hu7107455242432251322.jpg 640w, /images/weasel-words_hu8204489333288044902.jpg 768w, /images/weasel-words_hu7456430052057178392.jpg 1024w, /images/weasel-words_hu1656109304472754499.jpg 1280w" sizes="6vw" />
    
      <img src="/images/weasel-words_hu7107455242432251322.jpg" width="1456" height="816" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[15%] mx-2 flex justify-center"><a href="/images/weasel-words.png"><picture>
    <source type="image/webp" srcset="/images/weasel-words_hu10193042854199681085.webp 320w, /images/weasel-words_hu1445983781549493554.webp 640w, /images/weasel-words_hu3355899283314269240.webp 768w, /images/weasel-words_hu9125180995987912684.webp 1024w, /images/weasel-words_hu6756663060043024720.webp 1280w" sizes="6vw" />
    <source type="image/jpeg" srcset="/images/weasel-words_hu4367553655072530837.jpg 320w, /images/weasel-words_hu7107455242432251322.jpg 640w, /images/weasel-words_hu8204489333288044902.jpg 768w, /images/weasel-words_hu7456430052057178392.jpg 1024w, /images/weasel-words_hu1656109304472754499.jpg 1280w" sizes="6vw" />
    
      <img src="/images/weasel-words_hu7107455242432251322.jpg" width="1456" height="816" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[15%] mx-2 flex justify-center"><a href="/images/weasel-words.png"><picture>
    <source type="image/webp" srcset="/images/weasel-words_hu10193042854199681085.webp 320w, /images/weasel-words_hu1445983781549493554.webp 640w, /images/weasel-words_hu3355899283314269240.webp 768w, /images/weasel-words_hu9125180995987912684.webp 1024w, /images/weasel-words_hu6756663060043024720.webp 1280w" sizes="6vw" />
    <source type="image/jpeg" srcset="/images/weasel-words_hu4367553655072530837.jpg 320w, /images/weasel-words_hu7107455242432251322.jpg 640w, /images/weasel-words_hu8204489333288044902.jpg 768w, /images/weasel-words_hu7456430052057178392.jpg 1024w, /images/weasel-words_hu1656109304472754499.jpg 1280w" sizes="6vw" />
    
      <img src="/images/weasel-words_hu7107455242432251322.jpg" width="1456" height="816" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[15%] mx-2 flex justify-center"><a href="/images/weasel-words.png"><picture>
    <source type="image/webp" srcset="/images/weasel-words_hu10193042854199681085.webp 320w, /images/weasel-words_hu1445983781549493554.webp 640w, /images/weasel-words_hu3355899283314269240.webp 768w, /images/weasel-words_hu9125180995987912684.webp 1024w, /images/weasel-words_hu6756663060043024720.webp 1280w" sizes="6vw" />
    <source type="image/jpeg" srcset="/images/weasel-words_hu4367553655072530837.jpg 320w, /images/weasel-words_hu7107455242432251322.jpg 640w, /images/weasel-words_hu8204489333288044902.jpg 768w, /images/weasel-words_hu7456430052057178392.jpg 1024w, /images/weasel-words_hu1656109304472754499.jpg 1280w" sizes="6vw" />
    
      <img src="/images/weasel-words_hu7107455242432251322.jpg" width="1456" height="816" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

]]></content:encoded>
    </item>
    
    <item>
      <title>&#34;Commoditizing the Petaflop&#34; with George Hotz of the tiny corp</title>
      <link>https://craftycto.com/elsewhere/latent-space-george-hotz/</link>
      <pubDate>Sun, 09 Jul 2023 19:37:10 -0500</pubDate>
      
      <guid>https://craftycto.com/elsewhere/latent-space-george-hotz/</guid>
      <description>This Latent Space podcast was a mind-blowing conversation that puts George Hotz and the tiny corp on my &amp;#34;Follow Closely&amp;#34; list.</description>
      <content:encoded><![CDATA[<p>George Hotz moves at about 1000 mph and in multiple directions simultaneously, which makes for an enjoyable podcast. Swyx&rsquo;s show summary:</p>
<blockquote>
<p>How tiny is taking on Nvidia, Google, and PyTorch, building in public with AMD, hot takes on ggml, Mojo, Elon, e/acc, and GPT-4, and why AI Girlfriend is next.</p>
</blockquote>
<p>A few additional links:</p>
<ul>
<li><a href="https://tinygrad.org/">the tiny corp</a> where Hotz is working on:
<ul>
<li>tinygrad, the fastest growing neural network framework (over 9000 GitHub stars)</li>
<li>The tinybox, a 1600W (one 120V outlet), 738 FP16 TFLOPS personal AI server.</li>
</ul>
</li>
<li><a href="https://geohot.github.io/blog/">the singularity is nearer</a>, Hotz&rsquo; blog</li>
</ul>
]]></content:encoded>
    </item>
    
    <item>
      <title>Cthulhu taking a smoke break on the set of &#34;Lovecraft Country&#34;</title>
      <link>https://craftycto.com/imagery/cthulhu-lovecraft-country/</link>
      <pubDate>Sun, 09 Jul 2023 07:20:10 -0500</pubDate>
      
      <guid>https://craftycto.com/imagery/cthulhu-lovecraft-country/</guid>
      <description>This one&amp;#39;s mine. Inspired by another user on r/midjourney doing an image based on &amp;#34;Lovecrafting&amp;#34; ...</description>
      <content:encoded><![CDATA[]]></content:encoded>
    </item>
    
    <item>
      <title>n-gons: Intersections</title>
      <link>https://craftycto.com/imagery/ngons-intersections/</link>
      <pubDate>Sat, 08 Jul 2023 20:54:10 -0500</pubDate>
      
      <guid>https://craftycto.com/imagery/ngons-intersections/</guid>
      <description>Another beauty from Mastodon user n-gons.</description>
      <content:encoded><![CDATA[<p>Apparently also built using Girih macOS app.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Really, OpenAI?</title>
      <link>https://craftycto.com/blog/really-openai/</link>
      <pubDate>Fri, 07 Jul 2023 13:32:37 -0500</pubDate>
      
      <guid>https://craftycto.com/blog/really-openai/</guid>
      <description>ChatGPT&#43; subscribers can prevent OpenAI from using their inputs as training data. That is, so long as they forego the service&amp;#39;s second-best feature.</description>
      <content:encoded><![CDATA[<p><em>(Note: See <a href="/micro/really-openai-update/">this update</a> for a possible workaround.</em>)</p>
<p>In a conversation with a venture partner yesterday, I was talking up the value that ChatGPT can bring to knowledge workers. He responded that their firm had a policy against putting any confidential data into ChatGPT. My reaction was that this applied to the free version and / or the old days; with the advent of paid subscriptions, OpenAI no longer trained their models using inputs from subscribers.</p>
<p>But as I was digging around my ChatGPT+ settings just now, I made an unpleasant discovery, one that reminded me of a <a href="https://imsdb.com/scripts/Independence-Day.html">scene from the first Independence Day movie</a> involving the sleazy Chief of Staff Nimziki:</p>
<div class="prose bg-gray-800 px-4 py-[1px] border-l-4 border-blue-500 prose-h1:text-sky-200 prose-h2:text-sky-300 prose-h3:text-sky-400 prose-h4:text-sky-500 prose-p:text-sky-400 prose-ul:text-sky-400 prose-strong:text-sky-400 prose-ul-li:text-sky-400 prose-a:text-sky-400">
  <pre><code>			MOISHE
	Don't tell him to shut up!  You'd
	all be dead, were it not for my
	David.  You didn't do anything to
	prevent this!

As everyone is about to besiege Moishe, the President tries to
calm him down.

			PRESIDENT
	Sir, there wasn't much more we
	could have done.  We were totally
	unprepared for this.

			MOISHE
	Don't give me unprepared!  Since
	nineteen fifty whatever you guys
	have had that space ship, the
	thing you found in New Mexico.

			DAVID
		(embarrassed)
	Dad, please...

			MOISHE
	What was it, Roswell?  You had the
	space ship, the bodies, everything
	locked up in a bunker, the what is
	it, Area fifty one.  That's it!
	Area fifty one.  You knew and you
	didn't do nothing!

For the first time in a long time, President Whitmore smiles.

			PRESIDENT
	Regardless of what the tabloids
	have said, there were never any
	spacecraft recovered by the
	government.  Take my word for it,
	there is no Area 51 and no
	recovered space ship.

Chief of Staff Nimziki suddenly clears his throat.

			NIMZIKI
	Uh, excuse me, Mr. President, but
	that's not entirely accurate.
</code></pre>

</div>

<p>Here&rsquo;s my imagined script from the OpenAI movie featuring character Sam Altman:</p>
<div class="prose bg-gray-800 px-4 py-[1px] border-l-4 border-blue-500 prose-h1:text-sky-200 prose-h2:text-sky-300 prose-h3:text-sky-400 prose-h4:text-sky-500 prose-p:text-sky-400 prose-ul:text-sky-400 prose-strong:text-sky-400 prose-ul-li:text-sky-400 prose-a:text-sky-400">
  <pre><code>For the first time in a long time, the ChatGPT+ user smiles.

			CHATGPT+ SUBSCRIBER
	Regardless of what the tabloids
	have said, OpenAI doesn't train
	using its paid subscribers' chats.
	Take my word for it, my chats
	aren't ending up in OpenAI's model.

Chief of Venture Altman suddenly clears his throat.

			ALTMAN
	Uh, excuse me, Mr. Subscriber, but
	that's not entirely accurate.
</code></pre>

</div>

<p>OpenAI&rsquo;s weasel move? Technically, ChatGPT+ does allow subscribers to turn off train-the-model-using-my-data. But only if the subscriber turns off one of ChatGPT+&rsquo;s most valuable features, <code>Chat history</code>, which gives total recall of all past chats, allowing you to continue the right conversation right where you left off.</p>
<p>So sure, <em>technically</em> you <em>can</em> preserve privacy. All you have to do is turn off a feature which I&rsquo;d argue is ChatGPT+&rsquo;s most valuable addition to basic chatting. Here&rsquo;s the Settings dialog:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/really-openai/chatgptplus-settings.png"><picture>
    <source type="image/webp" srcset="/blog/really-openai/chatgptplus-settings_hu7599713760216561192.webp 320w, /blog/really-openai/chatgptplus-settings_hu16540195523136225940.webp 640w, /blog/really-openai/chatgptplus-settings_hu9289485602936297438.webp 768w, /blog/really-openai/chatgptplus-settings_hu15120095099153122856.webp 1024w, /blog/really-openai/chatgptplus-settings_hu11326927866136649228.webp 1280w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/really-openai/chatgptplus-settings_hu12636806817147812377.jpg 320w, /blog/really-openai/chatgptplus-settings_hu7103966425324840714.jpg 640w, /blog/really-openai/chatgptplus-settings_hu18160948476520384226.jpg 768w, /blog/really-openai/chatgptplus-settings_hu4801097128387624069.jpg 1024w, /blog/really-openai/chatgptplus-settings_hu9933616012614821163.jpg 1280w" sizes="100vw" />
    
      <img src="/blog/really-openai/chatgptplus-settings_hu7103966425324840714.jpg" width="1462" height="912" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Here&rsquo;s the More Information page:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/really-openai/chatgptplus-data-controls-faq.png"><picture>
    <source type="image/webp" srcset="/blog/really-openai/chatgptplus-data-controls-faq_hu11101950513211950001.webp 320w, /blog/really-openai/chatgptplus-data-controls-faq_hu17239284267726013162.webp 640w, /blog/really-openai/chatgptplus-data-controls-faq_hu15167245785392676035.webp 768w, /blog/really-openai/chatgptplus-data-controls-faq_hu17303298856551443557.webp 1024w, /blog/really-openai/chatgptplus-data-controls-faq_hu16747372060240263998.webp 1280w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/really-openai/chatgptplus-data-controls-faq_hu6590479934209787195.jpg 320w, /blog/really-openai/chatgptplus-data-controls-faq_hu5357691297410337895.jpg 640w, /blog/really-openai/chatgptplus-data-controls-faq_hu16702120279962855102.jpg 768w, /blog/really-openai/chatgptplus-data-controls-faq_hu18338601289139051916.jpg 1024w, /blog/really-openai/chatgptplus-data-controls-faq_hu17205345213122469976.jpg 1280w" sizes="100vw" />
    
      <img src="/blog/really-openai/chatgptplus-data-controls-faq_hu5357691297410337895.jpg" width="1338" height="1418" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Despite OpenAI&rsquo;s words that make it seem like &ldquo;we had to do this!&rdquo;, there&rsquo;s absolutely no technical justification. Chat history is not a feature that requires model training. This is just OpenAI preserving the ability to say &ldquo;Subscribers can easily turn that off!&rdquo; while guaranteeing that most won&rsquo;t.</p>
<p>Shame on you, OpenAI. And by the way, you&rsquo;re shooting yourself in the foot. What a great incentive to find an alternative that doesn&rsquo;t force this ridiculous choice on paying customers. I&rsquo;m already mulling workarounds.</p>
<div class="flex justify-evenly"><div class="w-[15%] mx-2 flex justify-center"><a href="/blog/really-openai/weasel-words.png"><picture>
    <source type="image/webp" srcset="/blog/really-openai/weasel-words_hu10193042854199681085.webp 320w, /blog/really-openai/weasel-words_hu1445983781549493554.webp 640w, /blog/really-openai/weasel-words_hu3355899283314269240.webp 768w, /blog/really-openai/weasel-words_hu9125180995987912684.webp 1024w, /blog/really-openai/weasel-words_hu6756663060043024720.webp 1280w" sizes="6vw" />
    <source type="image/jpeg" srcset="/blog/really-openai/weasel-words_hu4367553655072530837.jpg 320w, /blog/really-openai/weasel-words_hu7107455242432251322.jpg 640w, /blog/really-openai/weasel-words_hu8204489333288044902.jpg 768w, /blog/really-openai/weasel-words_hu7456430052057178392.jpg 1024w, /blog/really-openai/weasel-words_hu1656109304472754499.jpg 1280w" sizes="6vw" />
    
      <img src="/blog/really-openai/weasel-words_hu7107455242432251322.jpg" width="1456" height="816" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[15%] mx-2 flex justify-center"><a href="/blog/really-openai/weasel-words.png"><picture>
    <source type="image/webp" srcset="/blog/really-openai/weasel-words_hu10193042854199681085.webp 320w, /blog/really-openai/weasel-words_hu1445983781549493554.webp 640w, /blog/really-openai/weasel-words_hu3355899283314269240.webp 768w, /blog/really-openai/weasel-words_hu9125180995987912684.webp 1024w, /blog/really-openai/weasel-words_hu6756663060043024720.webp 1280w" sizes="6vw" />
    <source type="image/jpeg" srcset="/blog/really-openai/weasel-words_hu4367553655072530837.jpg 320w, /blog/really-openai/weasel-words_hu7107455242432251322.jpg 640w, /blog/really-openai/weasel-words_hu8204489333288044902.jpg 768w, /blog/really-openai/weasel-words_hu7456430052057178392.jpg 1024w, /blog/really-openai/weasel-words_hu1656109304472754499.jpg 1280w" sizes="6vw" />
    
      <img src="/blog/really-openai/weasel-words_hu7107455242432251322.jpg" width="1456" height="816" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[15%] mx-2 flex justify-center"><a href="/blog/really-openai/weasel-words.png"><picture>
    <source type="image/webp" srcset="/blog/really-openai/weasel-words_hu10193042854199681085.webp 320w, /blog/really-openai/weasel-words_hu1445983781549493554.webp 640w, /blog/really-openai/weasel-words_hu3355899283314269240.webp 768w, /blog/really-openai/weasel-words_hu9125180995987912684.webp 1024w, /blog/really-openai/weasel-words_hu6756663060043024720.webp 1280w" sizes="6vw" />
    <source type="image/jpeg" srcset="/blog/really-openai/weasel-words_hu4367553655072530837.jpg 320w, /blog/really-openai/weasel-words_hu7107455242432251322.jpg 640w, /blog/really-openai/weasel-words_hu8204489333288044902.jpg 768w, /blog/really-openai/weasel-words_hu7456430052057178392.jpg 1024w, /blog/really-openai/weasel-words_hu1656109304472754499.jpg 1280w" sizes="6vw" />
    
      <img src="/blog/really-openai/weasel-words_hu7107455242432251322.jpg" width="1456" height="816" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[15%] mx-2 flex justify-center"><a href="/blog/really-openai/weasel-words.png"><picture>
    <source type="image/webp" srcset="/blog/really-openai/weasel-words_hu10193042854199681085.webp 320w, /blog/really-openai/weasel-words_hu1445983781549493554.webp 640w, /blog/really-openai/weasel-words_hu3355899283314269240.webp 768w, /blog/really-openai/weasel-words_hu9125180995987912684.webp 1024w, /blog/really-openai/weasel-words_hu6756663060043024720.webp 1280w" sizes="6vw" />
    <source type="image/jpeg" srcset="/blog/really-openai/weasel-words_hu4367553655072530837.jpg 320w, /blog/really-openai/weasel-words_hu7107455242432251322.jpg 640w, /blog/really-openai/weasel-words_hu8204489333288044902.jpg 768w, /blog/really-openai/weasel-words_hu7456430052057178392.jpg 1024w, /blog/really-openai/weasel-words_hu1656109304472754499.jpg 1280w" sizes="6vw" />
    
      <img src="/blog/really-openai/weasel-words_hu7107455242432251322.jpg" width="1456" height="816" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

]]></content:encoded>
    </item>
    
    <item>
      <title>How to Ask ChatGPT a Technical Question (BoorishBears on Hacker News)</title>
      <link>https://craftycto.com/quote/how-to-ask-chatgpt-a-tech-question/</link>
      <pubDate>Fri, 07 Jul 2023 07:45:32 -0500</pubDate>
      
      <guid>https://craftycto.com/quote/how-to-ask-chatgpt-a-tech-question/</guid>
      <description>Any time you ask ChatGPT a technical question on something you&amp;#39;re not familiar with it should be three parts: 1- What are the considerations?; 2- Implement it with the considerations; 3- Did we forget anything?</description>
      <content:encoded><![CDATA[<p>This was in the context of a link to someone training / fine-tuning an LLM on the full corpus of AWS documentation. Many pointed out that ChatGPT probably does it better; lots of whining about &ldquo;but ChatGPT gives me wrong answers, boo hoo hoo,&rdquo; and then this gem. This gets it exactly right: one needs to skillfully guide ChatGPT, and the results will be 10X better.</p>
<p>The full comment includes an example:</p>
<p><em>BoorishBears on Hacker News, on: AI for AWS Documentation</em></p>
<blockquote>
<p>Any time you ask ChatGPT a technical question on something you&rsquo;re not familiar with it should be three parts:</p>
<ul>
<li>
<p>What are the considerations?</p>
</li>
<li>
<p>Implement it with the considerations</p>
</li>
<li>
<p>Did we forget anything?</p>
</li>
</ul>
<p>It understands the need for a proxy from step 1:</p>
<p><a href="https://chat.openai.com/share/7ca37130-a771-457c-8742-a5f941">https://chat.openai.com/share/7ca37130-a771-457c-8742-a5f941</a>&hellip;</p>
</blockquote>
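<p>A minimal sketch of that three-part pattern as reusable code (the names and phrasing are mine, not BoorishBears&rsquo;; <code>ask</code> is a placeholder for whatever chat API you actually call):</p>

```python
# Sketch of the three-part questioning pattern from the quoted comment.
# ask() stands in for a real chat-completions call; everything here is
# illustrative structure, not any particular vendor's API.

def three_part_prompts(question: str) -> list[str]:
    """Expand one technical question into the three-stage sequence."""
    return [
        f"{question} Before writing any code, what are the considerations?",
        "Now implement it, taking all of those considerations into account.",
        "Review the implementation: did we forget anything?",
    ]

def run(question: str, ask) -> list[str]:
    """Feed each stage to the model, carrying the conversation forward."""
    history: list[dict] = []
    answers: list[str] = []
    for prompt in three_part_prompts(question):
        history.append({"role": "user", "content": prompt})
        reply = ask(history)  # e.g. a chat-completions request with full history
        history.append({"role": "assistant", "content": reply})
        answers.append(reply)
    return answers
```

The point of carrying <code>history</code> forward is that step 2 implements against the considerations surfaced in step 1, and step 3 audits both.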
]]></content:encoded>
    </item>
    
    <item>
      <title>The Lone Banana Problem</title>
      <link>https://craftycto.com/elsewhere/lone-banana-problem/</link>
      <pubDate>Wed, 05 Jul 2023 07:52:12 -0500</pubDate>
      
      <guid>https://craftycto.com/elsewhere/lone-banana-problem/</guid>
      <description>Some images are tough to write prompts for in your favorite Art AI</description>
      <content:encoded><![CDATA[<p>The example here is trying to get Midjourney to produce an image of a single banana&mdash;as opposed to a bunch of two or more. I&rsquo;ve hit this class of problem many times in my use of Midjourney, and as the article points out, subtle differences in the prompt can lead to success. A <a href="https://news.ycombinator.com/item?id=36582937">Hacker News commenter</a> pointed out another case, &ldquo;Three cats in a trenchcoat standing on each other&rsquo;s shoulders, pretending to be a human, Vincent Adultman style,&rdquo; which is also practically impossible to get out of Midjourney.</p>
<p>Another commenter points out that the language models front-ending tools like Midjourney are small and quite limited compared to LLMs like LLaMA, let alone GPT-4, and points to <a href="https://bair.berkeley.edu/blog/2023/05/23/lmd/">this paper</a> where a larger model enhances prompt understanding.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Pagefind From CloudCannon</title>
      <link>https://craftycto.com/micro/pagefind-from-cloudcannon/</link>
      <pubDate>Mon, 03 Jul 2023 20:48:21 -0500</pubDate>
      
      <guid>https://craftycto.com/micro/pagefind-from-cloudcannon/</guid>
      <description>A lovely self-hosted site search for SSGs like Hugo</description>
      <content:encoded><![CDATA[<p>After fixing my <a href="/micro/hugo-responsive-images/">image processing woes</a>, I decided to tackle site search. I actually have enough content between larger essays and smaller micro formats that I&rsquo;m having trouble finding my own stuff.</p>
<p>I surveyed the field, which included local JS options such as lunr.js and fuse.js, as well as external options like Algolia. Then I decided to check what <a href="https://www.brycewray.com/">Bryce Wray</a> uses. It turns out that Bryce uses <a href="https://pagefind.app/">Pagefind</a> from <a href="https://cloudcannon.com/">CloudCannon</a>, and he <a href="https://www.brycewray.com/posts/2022/07/pagefind-quite-find-site-search/">wrote extensively on why</a>. Sold!</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/images/crafty-search-2.png"><picture>
    <source type="image/webp" srcset="/images/crafty-search-2_hu17942042381807495801.webp 320w, /images/crafty-search-2_hu10566809006348322710.webp 640w, /images/crafty-search-2_hu932593793773119956.webp 768w, /images/crafty-search-2_hu6135997073728883065.webp 1024w, /images/crafty-search-2_hu6596604856573607466.webp 1280w, /images/crafty-search-2_hu717137101980015058.webp 1536w" sizes="75vw" />
    <source type="image/jpeg" srcset="/images/crafty-search-2_hu1396455208462577635.jpg 320w, /images/crafty-search-2_hu12626489336952323547.jpg 640w, /images/crafty-search-2_hu5939444106785098604.jpg 768w, /images/crafty-search-2_hu10582659611243092369.jpg 1024w, /images/crafty-search-2_hu368287164076034582.jpg 1280w, /images/crafty-search-2_hu2736314165733863326.jpg 1536w" sizes="75vw" />
    
      <img src="/images/crafty-search-2_hu12626489336952323547.jpg" width="1748" height="1986" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Pagefind is designed for Static Site Generators (SSGs) like <a href="https://gohugo.io/">Hugo</a>. The way it works: you first run the normal Hugo build, which generates all of your site&rsquo;s static HTML; then you run the Pagefind build, which indexes that HTML content. At runtime, Pagefind provides some simple JS scripts to query the index and display results. It was straightforward to set up, and then to fine-tune the index a bit so that only the content I wanted shows up when searching. You can see it in action <a href="/search/">here</a>.</p>
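<p>That two-step build amounts to a pair of commands. Here&rsquo;s a sketch, assuming Hugo&rsquo;s default <code>public</code> output directory and the Pagefind CLI run via npm (the exact flag name has varied across Pagefind versions, so check <code>npx pagefind --help</code> for yours):</p>

```shell
# Step 1: Hugo generates the site's static HTML into ./public
hugo

# Step 2: Pagefind crawls that HTML and writes its search index
# and JS assets into ./public/pagefind/ for the search page to load
npx pagefind --site public
```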
]]></content:encoded>
    </item>
    
    <item>
      <title>Hugo Responsive Images, Thanks to Bryce and ChatGPT</title>
      <link>https://craftycto.com/micro/hugo-responsive-images/</link>
      <pubDate>Sat, 01 Jul 2023 20:19:51 -0500</pubDate>
      
      <guid>https://craftycto.com/micro/hugo-responsive-images/</guid>
      <description>TIL how the html `picture` element actually works ...</description>
      <content:encoded><![CDATA[<p>I&rsquo;d been using a community Hugo module that supposedly enabled responsive images. I discovered recently that it was always serving up the <strong>lowest resolution</strong> images.</p>
<p>Active Hugo community member <a href="https://www.brycewray.com/">Bryce Wray</a> has written about Hugo responsive images. So yesterday I made an attempt to implement his approach and was 75% there when I discovered that it was, ironically, always serving up the <strong>highest resolution</strong> images. (My problem, not Bryce&rsquo;s).</p>
<p>Today I learned how all the subelements and attributes of the html <code>picture</code> element actually work, and now have my own image shortcode and underlying partial that make it all work deliciously. The approach is Bryce&rsquo;s, adapted for my special needs, primarily the need to support 1 to N images; multiple images render horizontally, such as:</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/images/moat.png"><picture>
    <source type="image/webp" srcset="/images/moat_hu800728828131225966.webp 320w, /images/moat_hu11066237045059060878.webp 640w, /images/moat_hu13907264749632164748.webp 768w, /images/moat_hu10921822719433154473.webp 1024w, /images/moat_hu11615355278478495796.webp 1280w" sizes="25vw" />
    <source type="image/jpeg" srcset="/images/moat_hu12458762601674379950.jpg 320w, /images/moat_hu9818016194817921493.jpg 640w, /images/moat_hu10878176353770080139.jpg 768w, /images/moat_hu11676005264137116804.jpg 1024w, /images/moat_hu16175169012989805244.jpg 1280w" sizes="25vw" />
    
      <img src="/images/moat_hu9818016194817921493.jpg" width="1456" height="816" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[75%] mx-2 flex justify-center"><a href="/images/moat-busters-1.png"><picture>
    <source type="image/webp" srcset="/images/moat-busters-1_hu8649522837849795373.webp 320w, /images/moat-busters-1_hu14823036772981146364.webp 640w, /images/moat-busters-1_hu13005768306468892258.webp 768w, /images/moat-busters-1_hu3425641506067909074.webp 1024w, /images/moat-busters-1_hu8074023545297122064.webp 1280w" sizes="25vw" />
    <source type="image/jpeg" srcset="/images/moat-busters-1_hu17404479499605965784.jpg 320w, /images/moat-busters-1_hu3971917705758056247.jpg 640w, /images/moat-busters-1_hu8043959606210309229.jpg 768w, /images/moat-busters-1_hu15688685871596371997.jpg 1024w, /images/moat-busters-1_hu14766086641300020695.jpg 1280w" sizes="25vw" />
    
      <img src="/images/moat-busters-1_hu3971917705758056247.jpg" width="1456" height="816" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[75%] mx-2 flex justify-center"><a href="/images/moat-busters-2.png"><picture>
    <source type="image/webp" srcset="/images/moat-busters-2_hu14099124836094777862.webp 320w, /images/moat-busters-2_hu9514762968313483802.webp 640w, /images/moat-busters-2_hu17274522680061183321.webp 768w, /images/moat-busters-2_hu17784367587119547726.webp 1024w, /images/moat-busters-2_hu5893102388120647933.webp 1280w" sizes="25vw" />
    <source type="image/jpeg" srcset="/images/moat-busters-2_hu13230430722200122194.jpg 320w, /images/moat-busters-2_hu3074648547673109485.jpg 640w, /images/moat-busters-2_hu2738573436844937552.jpg 768w, /images/moat-busters-2_hu12290922491654431863.jpg 1024w, /images/moat-busters-2_hu5955185999950722517.jpg 1280w" sizes="25vw" />
    
      <img src="/images/moat-busters-2_hu3074648547673109485.jpg" width="1456" height="816" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>
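<p>For reference, the generated markup behind each image above reduces to this <code>picture</code> pattern (file names simplified for illustration; the real ones carry Hugo&rsquo;s content hashes):</p>

```
<picture>
  <!-- The browser takes the first <source> whose type it supports,
       then picks a candidate from srcset using the sizes hint
       (here: the image occupies roughly 25% of the viewport width). -->
  <source type="image/webp"
          srcset="moat-320.webp 320w, moat-640.webp 640w, moat-1280.webp 1280w"
          sizes="25vw" />
  <source type="image/jpeg"
          srcset="moat-320.jpg 320w, moat-640.jpg 640w, moat-1280.jpg 1280w"
          sizes="25vw" />
  <!-- Fallback img; explicit width/height prevent layout shift. -->
  <img src="moat-640.jpg" width="1456" height="816" alt="" loading="lazy" />
</picture>
```

Getting <code>sizes</code> right was the key for me: it tells the browser how wide the image will render, and the browser then picks the smallest <code>srcset</code> candidate that covers it.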

<p>Takeaways:</p>
<ul>
<li>Hugo&rsquo;s <a href="https://gohugo.io/content-management/image-processing/">image processing</a> is really powerful. The three source images above were pngs downloaded right from Midjourney, never touching Pixelmator Pro. They are 1456px wide and weigh in at 2MB each. But thanks to Hugo, you&rsquo;re likely seeing a 640px webp version&mdash;24KB! (The webp image format gets a lot of credit also.)</li>
<li>ChatGPT (GPT-4 model) continues to be incredibly powerful as <a href="/blog/unlimited-stupid-questions/">my pair programmer</a>. For example, when I needed to really understand why my <code>picture</code> output wasn&rsquo;t doing what it should, it gave very accurate guidance. I&rsquo;d guess that my speed was at least doubled on this little project.</li>
<li>Midjourney 5.2&rsquo;s zoom out function is a lot of fun. Here&rsquo;s the right image above at 2x and 4x zoom out:</li>
</ul>
<p><div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/images/moat-busters-2-zoom-out-2.png"><picture>
    <source type="image/webp" srcset="/images/moat-busters-2-zoom-out-2_hu15657662987988386853.webp 320w, /images/moat-busters-2-zoom-out-2_hu15840579668771718871.webp 640w, /images/moat-busters-2-zoom-out-2_hu15862225539238556483.webp 768w, /images/moat-busters-2-zoom-out-2_hu1487170466272798853.webp 1024w, /images/moat-busters-2-zoom-out-2_hu17886675250830668268.webp 1280w" sizes="100vw" />
    <source type="image/jpeg" srcset="/images/moat-busters-2-zoom-out-2_hu15270588604201071244.jpg 320w, /images/moat-busters-2-zoom-out-2_hu15037817583301846933.jpg 640w, /images/moat-busters-2-zoom-out-2_hu941973693878125834.jpg 768w, /images/moat-busters-2-zoom-out-2_hu8902217543060321444.jpg 1024w, /images/moat-busters-2-zoom-out-2_hu100757413909930364.jpg 1280w" sizes="100vw" />
    
      <img src="/images/moat-busters-2-zoom-out-2_hu15037817583301846933.jpg" width="1456" height="816" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/images/moat-busters-2-zoom-out-4.png"><picture>
    <source type="image/webp" srcset="/images/moat-busters-2-zoom-out-4_hu2503011182364829184.webp 320w, /images/moat-busters-2-zoom-out-4_hu14808243308650518575.webp 640w, /images/moat-busters-2-zoom-out-4_hu8696154297763140638.webp 768w, /images/moat-busters-2-zoom-out-4_hu7768876023256613952.webp 1024w, /images/moat-busters-2-zoom-out-4_hu11521798618007158807.webp 1280w" sizes="100vw" />
    <source type="image/jpeg" srcset="/images/moat-busters-2-zoom-out-4_hu16553201716628763999.jpg 320w, /images/moat-busters-2-zoom-out-4_hu1402733125940913348.jpg 640w, /images/moat-busters-2-zoom-out-4_hu1707800172007602140.jpg 768w, /images/moat-busters-2-zoom-out-4_hu6651124669099073710.jpg 1024w, /images/moat-busters-2-zoom-out-4_hu1220585408834962558.jpg 1280w" sizes="100vw" />
    
      <img src="/images/moat-busters-2-zoom-out-4_hu1402733125940913348.jpg" width="1456" height="816" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>
</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Of Moats and Moat Busters</title>
      <link>https://craftycto.com/blog/moat-busters/</link>
      <pubDate>Thu, 29 Jun 2023 00:39:33 -0500</pubDate>
      
      <guid>https://craftycto.com/blog/moat-busters/</guid>
      <description>Open Source, Commercial-friendly AI Challenges the Major Closed AI Players</description>
      <content:encoded><![CDATA[<p>I did a micro-post recently, <a href="https://craftycto.com/quote/mpt7b-gatsby/">MosaicML&rsquo;s Open Source MPT-7B Model Writes an Epilogue to The Great Gatsby</a>. I was a bit puzzled, though, that MPT-7B was showing up as current news, since its release happened in May. I started pulling on that thread and it unravelled interestingly.</p>
<h2 id="mosaicml-mpt-30b-and-mpt-7b-models">MosaicML MPT-30B and MPT-7B Models</h2>
<p>First off&mdash;it wasn&rsquo;t the <a href="https://www.mosaicml.com/blog/mpt-7b">MPT-7B model</a> that was the news over the past week, it was <a href="https://www.mosaicml.com/blog/mpt-30b">MPT-30B</a>, announced June 22nd. 30 stands for 30 billion parameters, up from 7 billion in the MPT-7B model, with a corresponding increase in capabilities. This chart from MosaicML compares the two models&rsquo; capabilities:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/649461607eb7256eea19cada_Screenshot%202023-06-22%20at%207.57.26%20AM_4106895093999444314.png"><picture>
    <source type="image/webp" srcset="/649461607eb7256eea19cada_Screenshot%202023-06-22%20at%207.57.26%20AM_4106895093999444314_hu16184823108993914282.webp 320w, /649461607eb7256eea19cada_Screenshot%202023-06-22%20at%207.57.26%20AM_4106895093999444314_hu878293261205751347.webp 640w, /649461607eb7256eea19cada_Screenshot%202023-06-22%20at%207.57.26%20AM_4106895093999444314_hu1663195543012467079.webp 768w, /649461607eb7256eea19cada_Screenshot%202023-06-22%20at%207.57.26%20AM_4106895093999444314_hu1810878612893600591.webp 1024w, /649461607eb7256eea19cada_Screenshot%202023-06-22%20at%207.57.26%20AM_4106895093999444314_hu106298049774620617.webp 1280w, /649461607eb7256eea19cada_Screenshot%202023-06-22%20at%207.57.26%20AM_4106895093999444314_hu5760748504144253219.webp 1536w, /649461607eb7256eea19cada_Screenshot%202023-06-22%20at%207.57.26%20AM_4106895093999444314_hu16863189409767442184.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/649461607eb7256eea19cada_Screenshot%202023-06-22%20at%207.57.26%20AM_4106895093999444314_hu9452616116108179313.jpg 320w, /649461607eb7256eea19cada_Screenshot%202023-06-22%20at%207.57.26%20AM_4106895093999444314_hu3168038158310948310.jpg 640w, /649461607eb7256eea19cada_Screenshot%202023-06-22%20at%207.57.26%20AM_4106895093999444314_hu6238040129646886723.jpg 768w, /649461607eb7256eea19cada_Screenshot%202023-06-22%20at%207.57.26%20AM_4106895093999444314_hu7762055153654854757.jpg 1024w, /649461607eb7256eea19cada_Screenshot%202023-06-22%20at%207.57.26%20AM_4106895093999444314_hu21128608610373971.jpg 1280w, /649461607eb7256eea19cada_Screenshot%202023-06-22%20at%207.57.26%20AM_4106895093999444314_hu11841311085261884049.jpg 1536w, /649461607eb7256eea19cada_Screenshot%202023-06-22%20at%207.57.26%20AM_4106895093999444314_hu16177445975038116907.jpg 1920w" sizes="100vw" />
    
      <img src="/649461607eb7256eea19cada_Screenshot%202023-06-22%20at%207.57.26%20AM_4106895093999444314_hu3168038158310948310.jpg" width="2298" height="1378" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>MPT-30B isn&rsquo;t benchmarked completely on the <a href="https://lmsys.org/blog/2023-06-22-leaderboard/">LMSYS leaderboard</a>, but 7B is there and competitive.</p>
<h2 id="mosaicmls-revenue-model-isnt-models">MosaicML&rsquo;s Revenue Model Isn&rsquo;t Models</h2>
<p>Counterintuitively, although MPT-30B and 7B come from MosaicML, models aren&rsquo;t the product, but rather enablers of Mosaic&rsquo;s actual products&mdash;training and inference services. Here&rsquo;s how Mosaic describes themselves on their <a href="https://www.mosaicml.com/">homepage</a>:</p>
<pre tabindex="0"><code>Generative AI for All
Easily train and deploy generative AI models on your data, in your secure environment.
Build your next model / transformation / disruption / innovation.
</code></pre><p>And when you look under the Products menu, you see:</p>
<pre tabindex="0"><code>Products / Training
Products / Inference
</code></pre><p>And finally, the language of Mosaic&rsquo;s announcement post for MPT-30B is telling as well:</p>
<pre tabindex="0"><code>[Mosaic Logo] Foundation Series
MPT-30B
Open-source LLM.
Commercially licensed.
More Powerful.
</code></pre><p>MPT-30B and 7B are fully open source, ready-to-use LLMs&mdash;pre-trained, fine-tunable, inference-ready, open training data, the works. They build upon the solid foundation of open source / open data projects that come out of a vibrant AI open source community.</p>
<p>Hmmm, this seems familiar, where have we heard about the power of the AI open source community recently &hellip; oh yes, the leaked Google document! Re-reading it again today, it&rsquo;s as insightful as ever and worth another look.</p>
<h2 id="the-leaked-google-memo">The Leaked Google Memo</h2>
<p>On May 4th, SemiAnalysis, a boutique semiconductor research and consulting firm, <a href="https://www.semianalysis.com/p/google-we-have-no-moat-and-neither">published a leaked internal document</a> authored by a Google researcher. It has a powerful lead paragraph:</p>
<div class="prose bg-gray-800 px-4 py-[1px] border-l-4 border-blue-500 prose-h1:text-sky-200 prose-h2:text-sky-300 prose-h3:text-sky-400 prose-h4:text-sky-500 prose-p:text-sky-400 prose-ul:text-sky-400 prose-strong:text-sky-400 prose-ul-li:text-sky-400 prose-a:text-sky-400">
  <h2 id="we-have-no-moat">We Have No Moat</h2>
<h3 id="and-neither-does-openai">And neither does OpenAI</h3>
<p>We’ve done a lot of looking over our shoulders at OpenAI. Who will cross the next milestone? What will the next move be?</p>
<p>But the uncomfortable truth is, <em>we aren’t positioned to win this arms race and neither is OpenAI</em>. While we’ve been squabbling, a third faction has been quietly eating our lunch.</p>
<p>I’m talking, of course, about open source. Plainly put, they are lapping us. <strong>Things we consider “major open problems” are solved and in people’s hands today.</strong></p>

</div>

<p>Both Google and comparative upstart OpenAI had strategies built around closed models, proprietary datasets, and eight-to-nine-figure training investments. Then, during the madness of March, that entire strategy broke down.</p>
<p>The document continues:</p>
<div class="prose bg-gray-800 px-4 py-[1px] border-l-4 border-blue-500 prose-h1:text-sky-200 prose-h2:text-sky-300 prose-h3:text-sky-400 prose-h4:text-sky-500 prose-p:text-sky-400 prose-ul:text-sky-400 prose-strong:text-sky-400 prose-ul-li:text-sky-400 prose-a:text-sky-400">
  <p><strong>Things we consider “major open problems” are solved and in people’s hands today.</strong> Just to name a few:</p>
<ul>
<li>LLMs on a Phone: <a href="https://twitter.com/thiteanish/status/1635678053853536256">People are running foundation models on a Pixel 6 at 5 tokens / sec.</a></li>
<li>Scalable Personal AI: <a href="https://github.com/tloen/alpaca-lora">You can finetune a personalized AI on your laptop in an evening</a>.</li>
<li>Responsible Release: This one isn’t “solved” so much as “obviated”. <a href="https://civitai.com/">There are entire websites full of art models with no restrictions whatsoever</a>, and text is <a href="https://medium.com/geekculture/list-of-open-sourced-fine-tuned-large-language-models-llm-8d95a2e0dc76">not far behind</a>.</li>
<li>Multimodality: <a href="https://arxiv.org/pdf/2303.16199.pdf">The current multimodal ScienceQA SOTA was trained in an hour</a>.</li>
</ul>
</div>

<p>So during <a href="/blog/march-madness-ai/">AI March Madness</a>, the open source AI community solved four major problems that Google had on their &ldquo;these will be tough&rdquo; list.</p>
<p>Where does that leave Google?</p>
<div class="prose bg-gray-800 px-4 py-[1px] border-l-4 border-blue-500 prose-h1:text-sky-200 prose-h2:text-sky-300 prose-h3:text-sky-400 prose-h4:text-sky-500 prose-p:text-sky-400 prose-ul:text-sky-400 prose-strong:text-sky-400 prose-ul-li:text-sky-400 prose-a:text-sky-400">
  <p>While our models still hold a slight edge in terms of quality, the <a href="https://arxiv.org/pdf/2303.16199.pdf">gap is closing astonishingly quickly</a>. Open-source models are faster, more customizable, more private, and pound-for-pound more capable. They are <a href="https://lmsys.org/blog/2023-03-30-vicuna/">doing things with $100 and 13B params</a> that we struggle with at $10M and 540B. And they are doing so in weeks, not months. This has profound implications for us:</p>
<ul>
<li>
<p><strong>We have no secret sauce.</strong> Our best hope is to learn from and collaborate with what others are doing outside Google. We should prioritize enabling 3P integrations.</p>
</li>
<li>
<p><strong>People will not pay for a restricted model when free, unrestricted alternatives are comparable in quality.</strong> We should consider where our value add really is.</p>
</li>
<li>
<p><strong>Giant models are slowing us down.</strong> In the long run, the best models are the ones which can be iterated upon quickly. We should make small variants more than an afterthought, now that we know what is possible in the &lt;20B parameter regime.</p>
</li>
</ul>

</div>

<p>This chart shows the accelerating pace of breakthroughs by the open source AI community:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/241fe3ef-3919-4a63-9c68-9e2e77cc2fc0_1366x588_2676769290403132362.png"><picture>
    <source type="image/webp" srcset="/241fe3ef-3919-4a63-9c68-9e2e77cc2fc0_1366x588_2676769290403132362_hu10352095797403866859.webp 320w, /241fe3ef-3919-4a63-9c68-9e2e77cc2fc0_1366x588_2676769290403132362_hu10277395420372628673.webp 640w, /241fe3ef-3919-4a63-9c68-9e2e77cc2fc0_1366x588_2676769290403132362_hu13613937395766667160.webp 768w, /241fe3ef-3919-4a63-9c68-9e2e77cc2fc0_1366x588_2676769290403132362_hu12961778801510254180.webp 1024w, /241fe3ef-3919-4a63-9c68-9e2e77cc2fc0_1366x588_2676769290403132362_hu4499467357804564890.webp 1280w" sizes="100vw" />
    <source type="image/jpeg" srcset="/241fe3ef-3919-4a63-9c68-9e2e77cc2fc0_1366x588_2676769290403132362_hu3559346091761965618.jpg 320w, /241fe3ef-3919-4a63-9c68-9e2e77cc2fc0_1366x588_2676769290403132362_hu9810721903374972712.jpg 640w, /241fe3ef-3919-4a63-9c68-9e2e77cc2fc0_1366x588_2676769290403132362_hu3919869248157497508.jpg 768w, /241fe3ef-3919-4a63-9c68-9e2e77cc2fc0_1366x588_2676769290403132362_hu17609714732381645551.jpg 1024w, /241fe3ef-3919-4a63-9c68-9e2e77cc2fc0_1366x588_2676769290403132362_hu5878894342071147856.jpg 1280w" sizes="100vw" />
    
      <img src="/241fe3ef-3919-4a63-9c68-9e2e77cc2fc0_1366x588_2676769290403132362_hu9810721903374972712.jpg" width="1366" height="588" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Repeat the phrase &ldquo;doing things with $100 and 13B params that we struggle with at $10M and 540B&rdquo; ten times fast&mdash;that&rsquo;s the sound of the moat drying up and the alligators leaving town.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/moat-busters/moat-breakdown.webp"><picture>
    <source type="image/webp" srcset="/blog/moat-busters/moat-breakdown_hu1221539884714853617.webp 320w, /blog/moat-busters/moat-breakdown_hu5492326903479720956.webp 640w, /blog/moat-busters/moat-breakdown_hu2010069893684693282.webp 768w, /blog/moat-busters/moat-breakdown_hu13976383841417415507.webp 1024w, /blog/moat-busters/moat-breakdown_hu14339302420005658995.webp 1280w, /blog/moat-busters/moat-breakdown_hu9125647074210256587.webp 1536w, /blog/moat-busters/moat-breakdown_hu2388952999487096213.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/moat-busters/moat-breakdown_hu15280952768222399182.jpg 320w, /blog/moat-busters/moat-breakdown_hu7464289823012308129.jpg 640w, /blog/moat-busters/moat-breakdown_hu18070714146671886595.jpg 768w, /blog/moat-busters/moat-breakdown_hu17762334643364345184.jpg 1024w, /blog/moat-busters/moat-breakdown_hu16889321486441972727.jpg 1280w, /blog/moat-busters/moat-breakdown_hu17384015537973229982.jpg 1536w, /blog/moat-busters/moat-breakdown_hu5727010296699594991.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/moat-busters/moat-breakdown_hu7464289823012308129.jpg" width="3072" height="1722" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="what-lit-the-fuse">What Lit the Fuse?</h2>
<p>This is a well-written document, top to bottom. Here&rsquo;s its concise explanation of what happened to enable the open source AI acceleration:</p>
<div class="prose bg-gray-800 px-4 py-[1px] border-l-4 border-blue-500 prose-h1:text-sky-200 prose-h2:text-sky-300 prose-h3:text-sky-400 prose-h4:text-sky-500 prose-p:text-sky-400 prose-ul:text-sky-400 prose-strong:text-sky-400 prose-ul-li:text-sky-400 prose-a:text-sky-400">
  <h2 id="what-happened">What Happened</h2>
<p>At the beginning of March the open source community <a href="https://www.vice.com/en/article/xgwqgw/facebooks-powerful-large-language-model-leaks-online-4chan-llama">got their hands on</a> their first really capable foundation model, as Meta’s LLaMA was leaked to the public. It had no instruction or conversation tuning, and no RLHF. Nonetheless, the community immediately understood the significance of what they had been given.</p>
<p>A tremendous outpouring of innovation followed, with just days between major developments (see The Timeline for the full breakdown). Here we are, barely a month later, and there are variants with <a href="https://crfm.stanford.edu/2023/03/13/alpaca.html">instruction tuning</a>, <a href="https://github.com/ggerganov/llama.cpp">quantization</a>, <a href="https://lmsys.org/blog/2023-03-30-vicuna/">quality improvements</a>, <a href="https://arxiv.org/pdf/2303.16199.pdf">human evals</a>, <a href="https://arxiv.org/pdf/2303.16199.pdf">multimodality</a>, <a href="https://drive.google.com/file/d/10iR5hKwFqAKhL3umx8muOWSRm7hs5FqX/view">RLHF</a>, etc. etc. many of which build on each other.</p>
<p>Most importantly, <a href="https://github.com/tloen/alpaca-lora">they have solved the scaling problem</a> to the extent that anyone can tinker. Many of the new ideas are from ordinary people. The barrier to entry for training and experimentation has dropped from the total output of a major research organization to one person, an evening, and a beefy laptop.</p>

</div>

<p>The details of &ldquo;got their hands on&rdquo; are critical. On March 10th, Meta did a limited release of their LLaMA model, to researchers only and with a non-commercial-use license. This in and of itself was a huge move, as no model of LLaMA&rsquo;s scale and power had ever been released in its entirety, even to a limited researchers-only audience. But then, just a few days later, the same complete model was leaked publicly. So the entire open source AI community now had a complete large-scale model in hand. Then followed the &ldquo;tremendous outpouring of innovation.&rdquo;</p>
<h2 id="coulda-shoulda">Coulda Shoulda</h2>
<p>The document makes clear that Google coulda, shoulda known this was inevitable, once the open source community got total access to both code and model weights of a full-scale language model, because just six months earlier, a virtually identical emergence occurred in the image generation (art AI) space:</p>
<div class="prose bg-gray-800 px-4 py-[1px] border-l-4 border-blue-500 prose-h1:text-sky-200 prose-h2:text-sky-300 prose-h3:text-sky-400 prose-h4:text-sky-500 prose-p:text-sky-400 prose-ul:text-sky-400 prose-strong:text-sky-400 prose-ul-li:text-sky-400 prose-a:text-sky-400">
  <h2 id="why-we-could-have-seen-it-coming">Why We Could Have Seen It Coming</h2>
<p>In many ways, this shouldn’t be a surprise to anyone. The current renaissance in open source LLMs comes hot on the heels of a renaissance in image generation. The similarities are not lost on the community, with many calling this the “<a href="https://simonwillison.net/2023/Mar/11/llama/">Stable Diffusion moment</a>” for LLMs.</p>
<p>In both cases, low-cost public involvement was enabled by a vastly cheaper mechanism for fine tuning called <a href="https://arxiv.org/abs/2106.09685">low rank adaptation</a>, or LoRA, combined with a significant breakthrough in scale (<a href="https://arxiv.org/abs/2112.10752">latent diffusion</a> for image synthesis, <a href="https://arxiv.org/abs/2203.15556">Chinchilla</a> for LLMs). In both cases, access to a sufficiently high-quality model kicked off a flurry of ideas and iteration from individuals and institutions around the world. In both cases, this quickly outpaced the large players.</p>
<p>These contributions were pivotal in the image generation space, setting Stable Diffusion on a different path from Dall-E. Having an open model led to <a href="https://github.com/AbdullahAlfaraj/Auto-Photoshop-StableDiffusion-Plugin">product integrations</a>, <a href="https://civitai.com/">marketplaces</a>, <a href="https://github.com/AUTOMATIC1111/stable-diffusion-webui">user interfaces</a>, and <a href="https://stablediffusionweb.com/ControlNet">innovations</a> that didn’t happen for Dall-E.</p>
<p>The effect was palpable: <a href="https://trends.google.com/trends/explore?date=2022-08-01%202023-04-10&amp;q=Stable%20Diffusion,Dall-E&amp;hl=en">rapid domination</a> in terms of cultural impact vs the OpenAI solution, which became increasingly irrelevant. Whether the same thing will happen for LLMs remains to be seen, but the broad structural elements are the same.</p>

</div>

<p>The rapid-domination Google Trends chart cited above focuses on OpenAI&rsquo;s Dall-E being rapidly overtaken by upstart Stable Diffusion; but let&rsquo;s add Midjourney to the chart for even more fun:</p>

<script type="text/javascript" src="https://ssl.gstatic.com/trends_nrtr/3349_RC01/embed_loader.js"></script> <script type="text/javascript"> trends.embed.renderExploreWidget("TIMESERIES", {"comparisonItem":[{"keyword":"Stable Diffusion","geo":"","time":"2022-08-01 2023-06-28"},{"keyword":"Dall-E","geo":"","time":"2022-08-01 2023-06-28"},{"keyword":"midjourney","geo":"","time":"2022-08-01 2023-06-28"}],"category":0,"property":""}, {"exploreQuery":"date=2022-08-01%202023-06-28&q=Stable%20Diffusion,Dall-E,midjourney&hl=en","guestPath":"https://trends.google.com:443/trends/embed/"}); </script>

<p>In the art AI space, Midjourney was the little guy&rsquo;s little guy, and now they&rsquo;re absolutely killing it.</p>
<h2 id="bringing-it-back-to-mosaicml">Bringing it Back to MosaicML</h2>
<p>There&rsquo;s more to the leaked document, all of it good and <a href="https://www.semianalysis.com/p/google-we-have-no-moat-and-neither">worth re-reading</a>, but I think we&rsquo;ve pulled on the thread long enough and can return to where our story began with MosaicML. If Google, OpenAI, and other deeply-funded closed-AI players are at risk of having no moat, then MosaicML is one of a myriad of open-source-based entities whose business is keeping the AI world moat-free.</p>
<p>And look at the timeline of all of this:</p>
<ul>
<li>December 2020-ish: MosaicML formed (stealth)</li>
<li>October 13th, 2021: Mosaic comes out of stealth, with the mission: reduce the cost of training neural network models</li>
<li>August 22nd, 2022: &ldquo;Stable Diffusion Moment&rdquo; for art AIs</li>
<li>October 18th, 2022: Mosaic launches pillar one, their training service</li>
<li>November 30th, 2022: OpenAI launches ChatGPT</li>
<li>March 10th, 2023: Meta announces LLaMA and makes a limited release to researchers</li>
<li>Mid-March 2023: The entire LLaMA model, complete with weights, leaks publicly</li>
<li>May 3rd, 2023: Mosaic launches pillar two, their inference service</li>
<li>May 5th, 2023: Mosaic releases MPT-7B and strongly emphasizes open source and commercial-friendly licensing</li>
<li>May 10th, 2023: the internal Google document leaks</li>
<li>June 22nd, 2023: Mosaic releases MPT-30B</li>
<li>June 26th, 2023: See below!</li>
</ul>
<h2 id="late-breaking-news-june-26th">Late Breaking News (June 26th)</h2>
<p>Here&rsquo;s a fun chaser&mdash;news that happened as I was in the midst of writing this:</p>
<blockquote>
<p><a href="https://www.databricks.com/">Databricks</a> is acquiring MosaicML in a stock deal valued at $1.3 billion, a move intended to democratize AI. The deal, announced on June 26, includes retention packages, and all 62 of MosaicML’s employees are expected to stay. The transaction is slated to close in July. Neither Databricks nor MosaicML used an outside investment bank on the deal.</p>
</blockquote>
<p><em>(ChatGPT summary, not an official release)</em></p>
<p>I have zero inside knowledge, but I&rsquo;m willing to bet that MosaicML&rsquo;s valuation took a giant hockey-stick leap thanks to the open source AI emergence of the past few months. That emergence, of course, was only the latest ratchet-up in a series of accelerations that began with the Stable Diffusion Moment and ChatGPT in summer to fall 2022.</p>
<p>In any case, congrats to the MosaicML and Databricks teams! Sally forth and bust some moats!
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/moat-busters/moat-busters.webp"><picture>
    <source type="image/webp" srcset="/blog/moat-busters/moat-busters_hu15764793128844448956.webp 320w, /blog/moat-busters/moat-busters_hu15957034928600261381.webp 640w, /blog/moat-busters/moat-busters_hu6917005677287080997.webp 768w, /blog/moat-busters/moat-busters_hu17923972720020401869.webp 1024w, /blog/moat-busters/moat-busters_hu15685594746704832263.webp 1280w, /blog/moat-busters/moat-busters_hu3444481768053857706.webp 1536w, /blog/moat-busters/moat-busters_hu2692221968464245246.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/moat-busters/moat-busters_hu7937864963603887994.jpg 320w, /blog/moat-busters/moat-busters_hu870713034696091730.jpg 640w, /blog/moat-busters/moat-busters_hu14715481798277314735.jpg 768w, /blog/moat-busters/moat-busters_hu16602061227019424348.jpg 1024w, /blog/moat-busters/moat-busters_hu9127849986295200801.jpg 1280w, /blog/moat-busters/moat-busters_hu4488577838382560941.jpg 1536w, /blog/moat-busters/moat-busters_hu15013502890831674373.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/moat-busters/moat-busters_hu870713034696091730.jpg" width="3072" height="1722" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>
</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Steampunk? I prefer steamnoble.</title>
      <link>https://craftycto.com/imagery/steamnoble/</link>
      <pubDate>Wed, 28 Jun 2023 00:00:00 +0000</pubDate>
      
      <guid>https://craftycto.com/imagery/steamnoble/</guid>
      <description>I love this medallion-type styling.</description>
<content:encoded><![CDATA[<p>This reminds me of fox medallions I created in Midjourney before I changed direction to something simpler for the Crafty logo.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Midjourney V5.2 (Ars Technica)</title>
      <link>https://craftycto.com/elsewhere/midjourney-v52/</link>
      <pubDate>Sat, 24 Jun 2023 20:20:21 -0500</pubDate>
      
      <guid>https://craftycto.com/elsewhere/midjourney-v52/</guid>
      <description>This Midjourney update looks fantastic. I&amp;#39;m working on a Crafty&amp;#39;s Illustrated essay, and plan to give V5.2 a thorough workout when it&amp;#39;s time for imagery.</description>
      <content:encoded><![CDATA[]]></content:encoded>
    </item>
    
    <item>
      <title>Apt Description of React on Hacker News </title>
      <link>https://craftycto.com/quote/apt-description-of-react/</link>
      <pubDate>Thu, 22 Jun 2023 19:30:31 -0500</pubDate>
      
      <guid>https://craftycto.com/quote/apt-description-of-react/</guid>
      <description>React on the other hand feels like a pot warming up and we&amp;#39;re the frogs being boiled alive at this point. You can write some perfectly valid, but naive React code and it&amp;#39;s going to have some footguns. We&amp;#39;re so used to write idiomatic React code we don&amp;#39;t even notice that anymore. You learn to use stable references in callback functions, to be careful using Context, and so on. But the list is not short, and it only grows. --jankiel</description>
      <content:encoded><![CDATA[]]></content:encoded>
    </item>
    
    <item>
      <title>MosaicML&#39;s Open Source MPT-7B Model Writes an Epilogue to The Great Gatsby</title>
      <link>https://craftycto.com/quote/mpt7b-gatsby/</link>
      <pubDate>Thu, 22 Jun 2023 06:47:12 -0500</pubDate>
      
      <guid>https://craftycto.com/quote/mpt7b-gatsby/</guid>
      <description>I have lived so long in the shadow of this dream that I may never see the bright sunlight of reality. I may look up some morning and find that I am six feet under, and that Gatsby, who gave me the dream and everything else, will have vanished behind me. And so it was. I had hardly crossed the threshold of the old world when my eye caught the headlines of a newspaper; and I knew at once that my dream was over. I knew that Gatsby was dead ....</description>
      <content:encoded><![CDATA[]]></content:encoded>
    </item>
    
    <item>
      <title>Comparing Adobe Firefly, Dalle-2, OpenJourney, Stable Diffusion, and Midjourney</title>
      <link>https://craftycto.com/elsewhere/comparing-art-ais/</link>
      <pubDate>Wed, 21 Jun 2023 06:45:26 -0500</pubDate>
      
      <guid>https://craftycto.com/elsewhere/comparing-art-ais/</guid>
      <description>Muhammad Usman does a nice comparison of art AIs. To me, the richness of the outputs is amazing. I&amp;#39;m a heavy Midjourney user, and still prefer its output, though the alternatives are  impressive as well.</description>
      <content:encoded><![CDATA[]]></content:encoded>
    </item>
    
    <item>
      <title>Apple Vision Pro (Part 2) – Hardware Issues</title>
      <link>https://craftycto.com/elsewhere/karl-guttag-apple-vision-pro/</link>
      <pubDate>Mon, 19 Jun 2023 20:24:09 -0500</pubDate>
      
      <guid>https://craftycto.com/elsewhere/karl-guttag-apple-vision-pro/</guid>
      <description>Karl Guttag on the devil-in-the-details of Apple Vision Pro hardware.</description>
      <content:encoded><![CDATA[<p>Most of this is speculation&mdash;but it&rsquo;s well-informed, expert speculation. I expect Vision Pro will have its weaknesses like the original iPhone or Apple Watch, but this is prime Apple territory&ndash;introduce a product that&rsquo;s an order of magnitude better than anything that preceded it, albeit with weaknesses, and then refine &amp; advance.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>I-JEPA: The first AI model based on Yann LeCun’s vision for more human-like AI</title>
      <link>https://craftycto.com/elsewhere/meta-i-jepa/</link>
      <pubDate>Mon, 19 Jun 2023 20:00:00 +0000</pubDate>
      
      <guid>https://craftycto.com/elsewhere/meta-i-jepa/</guid>
      <description>As pointed out by Jesus Rodriguez on TheSequence, &amp;#34;With all the hype surrounding generative AI, we sometimes overlook the thrilling advancements in other areas of the deep learning ecosystem.&amp;#34; I-JEPA from Meta research is one such.</description>
<content:encoded><![CDATA[<p>Meta, a company I generally detest, is making real contributions in AI, including significant (partly) open source efforts such as the LLaMA model, which has fueled massive independent development. I trust Meta &hellip; to do what&rsquo;s good for Meta alone, but in AI they&rsquo;re (probably accidentally) doing a lot of good.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>A Gnarly Hugo-Cloudflare Build Problem, Resolved</title>
      <link>https://craftycto.com/micro/hugo-cloudflare-build/</link>
      <pubDate>Sun, 18 Jun 2023 23:06:00 +0000</pubDate>
      
      <guid>https://craftycto.com/micro/hugo-cloudflare-build/</guid>
      <description>I had Hugo builds running on Cloudflare for a day. Then, mysteriously, they stopped working.</description>
<content:encoded><![CDATA[<p>I&rsquo;ve been upgrading this site, piece by piece. The latest major series of changes had to do with supporting micro content types, such as the one you&rsquo;re reading right now, which is a &ldquo;micro.&rdquo; As part of this, I wanted to start running my Hugo site builds on Cloudflare Pages, where this site is hosted. So rather than building locally and then sending the complete, fully-generated <code>/public</code> directory up to GitHub (which Cloudflare automatically picks up and publishes), I can push content updates as markdown files only, and all the associated HTML etc. gets generated over on Cloudflare. And it worked! I was happy, for about a day. Then, for no apparent reason, it stopped working.</p>
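<p>For context, the Cloudflare Pages side of a setup like this comes down to a few build settings. The values below are illustrative, not copied from this site&rsquo;s actual configuration:</p>

```text
# Cloudflare Pages build configuration for a Hugo site (illustrative values)
Build command:          hugo --gc --minify
Build output directory: /public
Environment variable:   HUGO_VERSION=0.115.4
```

<p>Cloudflare reads the <code>HUGO_VERSION</code> environment variable to pin which Hugo release it installs for the build, which starts to matter once you upgrade Hugo locally.</p>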
<p>I was doing a lot of changes all at once, including upgrading Hugo versions and implementing the New Better Version of Tailwind CSS support in Hugo. And the build error message I was getting just didn&rsquo;t make sense:</p>
<pre tabindex="0"><code>22:00:23.069	ERROR 2023/06/06 03:00:23 render of &#34;page&#34; failed: 
&#34;/opt/buildhome/repo/layouts/_default/baseof.html:5:8&#34;: execute of template failed: template: _default/single.html:5:8: executing &#34;_default/single.html&#34; at &lt;partial &#34;head.html&#34; .&gt;: 
error calling partial: partial &#34;head.html&#34; timed out after 30s. 
This is most likely due to infinite recursion. If this is just a slow template, 
you can try to increase the &#39;timeout&#39; config setting.
22:00:25.922	Total in 49821 ms
</code></pre><p>Now, my local builds happen in a few hundred milliseconds, maybe as long as a couple of seconds; but timing out after 30 seconds? That can&rsquo;t be real! But the build does work fine and fast locally, so why would there be infinite recursion on Cloudflare? I assumed there must be some kind of environment difference, or some failure pulling my repo, etc. I chased that for days &hellip; and finally reached out to the incredible <a href="https://www.brycewray.com/">Bryce Wray</a>, who went way beyond the call of duty to help. Finally, Bryce suggested, &ldquo;Why don&rsquo;t you increase the Hugo timeout to a really high number just to see what happens?&rdquo; And it worked! My builds on Cloudflare took something like 60-70 seconds versus milliseconds locally, but they worked. I showed Bryce the build stats:</p>
<pre tabindex="0"><code>16:02:02.470                       | EN   
16:02:02.470    -------------------+------
16:02:02.470      Pages            | 108  
16:02:02.470      Paginator pages  |  10  
16:02:02.470      Non-page files   |  65  
16:02:02.470      Static files     |  13  
16:02:02.470      Processed images | 329  
16:02:02.471      Aliases          |  29  
16:02:02.471      Sitemaps         |   1  
16:02:02.471      Cleaned          |   0  
16:02:02.471    
16:02:02.471    Total in 61877 ms
</code></pre><p>He took one look and said, &ldquo;Oh, I see what it is, it&rsquo;s all those images.&rdquo; Hugo does some wonderful image processing magic, and that logic is normally only triggered incrementally when a new image asset is added. But, thanks to an ill-advised entry in <code>.gitignore</code>, I had Cloudflare regenerating all my images from scratch on each build. So that little gitignore change was the root cause, the change that broke the build. Sheesh.</p>
<p>I fixed the gitignore issue; now image processing is back to incremental only, and builds take roughly 5 seconds.</p>
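<p>For anyone hitting the same wall: the post above doesn&rsquo;t quote the exact config or the offending <code>.gitignore</code> entry, but the two knobs involved look roughly like this. First, the diagnostic step, raising Hugo&rsquo;s template timeout (the 30s default is what the error message cites):</p>

```toml
# hugo.toml (illustrative) -- raise the template timeout while diagnosing
# a slow build; the default is "30s"
timeout = "120s"
```

<p>The durable fix is then making sure Hugo&rsquo;s processed-image cache (the <code>resources</code> directory, in a default setup) is tracked in Git rather than ignored, so Cloudflare can reuse previously processed images instead of regenerating all of them on every build.</p>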
]]></content:encoded>
    </item>
    
    <item>
      <title>Apple Really Did Ignore the AI Emergence at WWDC</title>
      <link>https://craftycto.com/micro/apple-did-it/</link>
      <pubDate>Sun, 18 Jun 2023 22:06:00 +0000</pubDate>
      
      <guid>https://craftycto.com/micro/apple-did-it/</guid>
      <description>As I watched the WWDC Keynote and Platforms State of the Union from this year&amp;#39;s WWDC, I was amazed that Apple appeared to be ignoring the current massive emergence in generative AI. Later, I confirmed it.</description>
      <content:encoded><![CDATA[<p>So far I&rsquo;ve managed to watch the WWDC Keynote and Platforms State of the Union from this year&rsquo;s WWDC. At the time, I was amazed that Apple appeared to be ignoring the current massive emergence in generative AI. I figured, well, they&rsquo;ll dig into that in the deeper sessions later in the week. But sitting here 10 days post event, I&rsquo;m realizing that in the river of buzz flowing by about WWDC developments, I wasn&rsquo;t hearing anyone talk, at all, about Apple and AI. I used the Apple Developer app and did a bit of searching&mdash;kudos to Apple for having everything transcribed and searchable&mdash;and I&rsquo;ll be damned, they really did ignore the elephant in the room. A few incidental AI mentions, and &ldquo;generative&rdquo; appeared <strong>once</strong>.</p>
<p>The same cook time (no pun intended) allowed me to rethink my original reaction, &ldquo;Apple&rsquo;s being clueless!&rdquo; to a more measured &ldquo;Ah, yeah, Apple&rsquo;s being Apple!&rdquo; Specifically, &ldquo;If we&rsquo;re not ready to talk about it, we&rsquo;re not going out there and throwing half-baked crap against the wall.&rdquo; In stark contrast to much of the tech world&rsquo;s half-assed, panicked responses. I bet Apple&rsquo;s following developments very closely, but waiting to see where it goes and really looking at what it means and how it fits into their world. I also bet Apple&rsquo;s been doing serious generative AI work since well before it hit the radar last fall with the art AI emergence.</p>
<p>And give Apple credit for parachuting their own elephant into the room, in the form of Vision Pro and visionOS.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Unlimited Stupid Questions</title>
      <link>https://craftycto.com/blog/unlimited-stupid-questions/</link>
      <pubDate>Wed, 14 Jun 2023 08:06:00 +0000</pubDate>
      
      <guid>https://craftycto.com/blog/unlimited-stupid-questions/</guid>
      <description>Possibly the most important benefit of ChatGPT ...</description>
      <content:encoded><![CDATA[<p>I had a long hacking session with <a href="https://gohugo.io/">Hugo</a> and <a href="https://tailwindcss.com">Tailwind CSS</a> yesterday evening, refining the micro content types I added last week and ripping out traditional CSS styling in favor of Tailwind utility classes and typography. This kept me constantly on the edge of things I don&rsquo;t fully understand.</p>
<p>I&rsquo;ve echoed others&rsquo; comments about how generative AI tools allow us to <a href="/blog/the-emergence-s1e4/#what-are-people-sleeping-on-">tackle more ambitious projects</a> and reduce the <a href="/quote/vivek-haldar-copilot/">activation energy</a> needed to get started and reach the finish line. And I&rsquo;ve referenced the analogy of ChatGPT as a kind of <a href="/blog/the-emergence-s1e4/#emergences-over-the-last-31-days">weird intern</a> with an interesting set of characteristics. But yesterday evening, I realized one of the very top benefits I get from ChatGPT: permission to ask unlimited stupid questions.</p>
<h2 id="introverts-and-men">Introverts and Men</h2>
<p>I&rsquo;m definitely on the introvert end of the spectrum&ndash;maybe even &ldquo;on the spectrum&rdquo; itself. I love the idea that introverts are fantastic at meetings, only they need an hour to get ready and two hours to recover. Social gatherings? Double that.</p>
<p>I think the strong aversion to bothering people, to taking up someone else&rsquo;s time, may be an adjacent introvert characteristic. The fear of looking bad by asking dumb questions? Well, that probably applies to most of us humans, introverts and extroverts alike.</p>
<p>I know this aversion holds me back, keeps me stuck, but it is difficult to overcome&ndash;I identify with the concept of &ldquo;social exhaustion&rdquo; and for me, every asked question is a significant withdrawal from the social energy account.</p>
<h2 id="chatgpt-doesnt-mind">ChatGPT Doesn&rsquo;t Mind</h2>
<p>The models behind conversational AIs like ChatGPT are constructed to deliver a friendly, chatty, thrilled-to-help-you persona. They never get tired or frustrated. They love questions, and no question is too stupid! They don&rsquo;t care if I should have known that already. (SHAME = &ldquo;should have already mastered everything.&rdquo;) They&rsquo;re just happy to help.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/unlimited-stupid-questions/ai-helping-woman.webp"><picture>
    <source type="image/webp" srcset="/blog/unlimited-stupid-questions/ai-helping-woman_hu2079901379321396633.webp 320w, /blog/unlimited-stupid-questions/ai-helping-woman_hu4493432657088194466.webp 640w, /blog/unlimited-stupid-questions/ai-helping-woman_hu17671928892783096292.webp 768w, /blog/unlimited-stupid-questions/ai-helping-woman_hu5733537527475026873.webp 1024w, /blog/unlimited-stupid-questions/ai-helping-woman_hu13001764211582166304.webp 1280w, /blog/unlimited-stupid-questions/ai-helping-woman_hu2065517861067136198.webp 1536w, /blog/unlimited-stupid-questions/ai-helping-woman_hu17623670171755646200.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/unlimited-stupid-questions/ai-helping-woman_hu10328469917632219929.jpg 320w, /blog/unlimited-stupid-questions/ai-helping-woman_hu13615423490793105478.jpg 640w, /blog/unlimited-stupid-questions/ai-helping-woman_hu8682623781771101311.jpg 768w, /blog/unlimited-stupid-questions/ai-helping-woman_hu1984632106692871754.jpg 1024w, /blog/unlimited-stupid-questions/ai-helping-woman_hu318791150959586137.jpg 1280w, /blog/unlimited-stupid-questions/ai-helping-woman_hu12355949349922240150.jpg 1536w, /blog/unlimited-stupid-questions/ai-helping-woman_hu3315867982490503256.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/unlimited-stupid-questions/ai-helping-woman_hu13615423490793105478.jpg" width="3072" height="1722" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>The crazy result: unlimited free questions! The dumber the better! Zero withdrawal from my social energy account when I ask a question!</p>
<p>I can&rsquo;t describe how huge this is for me, and how much it has unblocked and freed me.</p>
<h2 id="gpt-4-has-great-answers">GPT-4 Has Great Answers</h2>
<p>Unlimited free questions isn&rsquo;t worth much unless the answers are actually good. I have access to ChatGPT&rsquo;s GPT-4 model through a $20/mo ChatGPT+ subscription, and the answers are consistently solid. Yes, I have to keep my radar on, watching for &ldquo;alternative facts&rdquo;&ndash;a process that&rsquo;s similar to watching for fake reviews on Amazon. If it smells bad, it probably is bad. But that&rsquo;s pretty rare with GPT-4 for the kinds of questions I&rsquo;m asking. In fact, in several cases, I was sure I had caught GPT-4 hallucinating, only to realize later that it was right and I was wrong.</p>
<h2 id="yesterdays-topics">Yesterday&rsquo;s Topics</h2>
<p>Just to give you a feel for the kind of things that ChatGPT / GPT-4 can help with, here are the topics from just yesterday evening. Not a developer? Imagine these are questions about all those Microsoft Word or Excel features that you can&rsquo;t figure out or even find in the <strong>Office 365 Hell-Menu™️</strong>. GPT-4 is just as smart there.</p>
<ul>
<li>Remind me how to set up a separate development branch in Git</li>
<li>Here&rsquo;s what git status is telling me, what does that mean?</li>
<li>How do I switch back and forth between branches quickly</li>
<li>I made some changes and they are showing up in both branches, why is that?</li>
<li>I notice that my Hugo project&rsquo;s package.json has older dependencies than the sample project I am following, how do I resolve that?</li>
<li>I&rsquo;m using the Tailwind CSS typography plugin, but when I add &ldquo;prose&rdquo; directives they&rsquo;re being ignored, what might the problem be?</li>
<li>There&rsquo;s a file that Hugo auto-generates that I currently am tracking in Git but shouldn&rsquo;t be, how do I resolve this and ignore that file from now on?</li>
<li>How do I merge that fix over onto my development branch?</li>
<li>I got this error when I tried that, how to resolve?</li>
<li>Oh crap, I accidentally pushed to the main branch instead of development!</li>
<li>How do I add a timestamp in my command prompt in zsh</li>
</ul>
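<p>Several of these boil down to a handful of git commands. Here&rsquo;s a sketch of the branch workflow behind the first few questions; the repo, file names, and branch name are illustrative, not taken from the actual ChatGPT answers:</p>

```shell
# Scratch repo so the commands are safe to run anywhere
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "You"

echo "hello" > README.md
git add README.md
git commit -qm "initial commit"

# "Set up a separate development branch" -- create it and switch to it
git switch -c development

# "Switch back and forth between branches quickly"
git switch -              # back to the previous branch
git switch development    # and forward again

# "Ignore that file from now on" -- stop tracking an auto-generated file
echo ".hugo_build.lock" >> .gitignore
git rm -q --cached --ignore-unmatch .hugo_build.lock
git add .gitignore
git commit -qm "ignore auto-generated lock file"

git branch --show-current   # prints: development
```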
<p>Yes, dammit, I should have already mastered all of this.</p>
<h2 id="honey-badger-dont-care">Honey Badger Don&rsquo;t Care</h2>
<p>Stupid questions? Look at those eyes&mdash;<a href="https://www.youtube.com/watch?v=KxLgVpiQGr8">honey badger</a> doesn&rsquo;t give a crap. Neither does ChatGPT. And neither do I. Free at last.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/unlimited-stupid-questions/honey-badger.webp"><picture>
    <source type="image/webp" srcset="/blog/unlimited-stupid-questions/honey-badger_hu12921166901925490314.webp 320w, /blog/unlimited-stupid-questions/honey-badger_hu16588308495238262192.webp 640w, /blog/unlimited-stupid-questions/honey-badger_hu13917327749836796092.webp 768w, /blog/unlimited-stupid-questions/honey-badger_hu7624476001892721848.webp 1024w, /blog/unlimited-stupid-questions/honey-badger_hu14403937888945625582.webp 1280w, /blog/unlimited-stupid-questions/honey-badger_hu13831631259469282552.webp 1536w, /blog/unlimited-stupid-questions/honey-badger_hu3008796207490755246.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/unlimited-stupid-questions/honey-badger_hu3401073346093618693.jpg 320w, /blog/unlimited-stupid-questions/honey-badger_hu3069929864643707148.jpg 640w, /blog/unlimited-stupid-questions/honey-badger_hu17734504760108838614.jpg 768w, /blog/unlimited-stupid-questions/honey-badger_hu6252495925655098749.jpg 1024w, /blog/unlimited-stupid-questions/honey-badger_hu750248146090176158.jpg 1280w, /blog/unlimited-stupid-questions/honey-badger_hu17231339977682316073.jpg 1536w, /blog/unlimited-stupid-questions/honey-badger_hu6492543644270919985.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/unlimited-stupid-questions/honey-badger_hu3069929864643707148.jpg" width="3072" height="1722" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

]]></content:encoded>
    </item>
    
    <item>
      <title>Ownership in Small and Medium Technology Companies</title>
      <link>https://craftycto.com/elsewhere/eric-brooke-ownership-pt1/</link>
      <pubDate>Tue, 13 Jun 2023 14:00:00 +0000</pubDate>
      
      <guid>https://craftycto.com/elsewhere/eric-brooke-ownership-pt1/</guid>
      <description>An important topic, addressed thoroughly by Eric Brooke, former SpotHero CTO.</description>
      <content:encoded><![CDATA[<p>This is the first in a series; <a href="https://ericbrooke.blog/2023/06/07/ownership-how-it-can-go-wrong-part-2/">Part 2</a> is already completed and Brooke plans a Part 3 and Part 4. He was also recently a guest on the <a href="https://podcasts.apple.com/us/podcast/ep-92-ownership-with-eric-brooke/id1556287400?i=1000616021041">Definitely Maybe Agile podcast</a> and talked ownership.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Le Mans, Unabridged</title>
      <link>https://craftycto.com/blog/le-mans-unabridged/</link>
      <pubDate>Sun, 11 Jun 2023 00:00:00 +0000</pubDate>
      
      <guid>https://craftycto.com/blog/le-mans-unabridged/</guid>
      <description>This being Le Mans weekend, here&amp;#39;s a Le Mans deep-dive about watching the entire 24 hours, unabridged.</description>
      <content:encoded><![CDATA[<p><em>Yesterday I was watching the 24 Hours of Le Mans, 2023 edition, when I realized, &ldquo;wait a minute, I&rsquo;ve blogged about Le Mans before &hellip;&rdquo; Much to my <a href="https://indieweb.org/POSSE">POSSE</a> horror, I realized that I had taken down my 2015-era blog without preserving its content, and that some of the content actually appeared to be worth preserving.</em></p>
<p><em>Actually, <a href="https://pluralistic.net/2023/02/19/drei-drei-drei/#now-we-are-three">Cory Doctorow would make the point</a> that ALL of that content is worth preserving, even if it seems cringeworthy or no longer relevant. My intent is to revive all my content from 2015, which will take a bit of effort (Squarespace export for Wordpress =&gt; Wordpress-to-Hugo import).</em></p>
<p><em>For today, though, since it&rsquo;s Le Mans weekend, and since Le Mans 2023 is shaping up as an amazing year in the P1 / hypercar class, I&rsquo;m bringing this post back first.</em></p>
<h1 id="le-mans-unabridged">Le Mans, Unabridged</h1>
<p><em>Originally Published September 02, 2015</em></p>
<p><strong>Race time, 2:57 AM on Sunday June 14th</strong>, the middle of the night, pitch black pierced by laser headlights. 13 hours, 3 minutes to go. My time-shifted reality time is 7:37 AM June 23rd, a full nine days after <a href="http://www.24h-lemans.com/">les 24 Heures du Mans</a> has finished. I&rsquo;m actively ducking my normal news feeds, especially Zite, trying to avoid spoilers. After almost 11 hours of racing, the leading Audi and Porsche LMP1 cars are just seconds apart, trading the lead with every pit stop.</p>
<p>LMP1 stands for <a href="https://en.wikipedia.org/wiki/Le_Mans_Prototype">Le Mans Prototype Class 1</a>, the fastest of the four Le Mans racing classes. The top LMP1 cars are capable of circling the long, winding 8.47 miles of the Le Mans circuit in less than 3:20, with top speeds over 210 MPH. For my money, LMP1s are the most amazing racing machines on the face of the earth.</p>
<p><div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/le-mans-unabridged/lemans-Circuit_de_la_Sarthe_track_map.jpg"><picture>
    <source type="image/webp" srcset="/blog/le-mans-unabridged/lemans-Circuit_de_la_Sarthe_track_map_hu16377840333484452651.webp 320w, /blog/le-mans-unabridged/lemans-Circuit_de_la_Sarthe_track_map_hu12065353148860061344.webp 640w, /blog/le-mans-unabridged/lemans-Circuit_de_la_Sarthe_track_map_hu505394591835484269.webp 768w, /blog/le-mans-unabridged/lemans-Circuit_de_la_Sarthe_track_map_hu16927157325673918087.webp 1024w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/le-mans-unabridged/lemans-Circuit_de_la_Sarthe_track_map_hu5082835701274690018.jpg 320w, /blog/le-mans-unabridged/lemans-Circuit_de_la_Sarthe_track_map_hu14186182057607570309.jpg 640w, /blog/le-mans-unabridged/lemans-Circuit_de_la_Sarthe_track_map_hu736490304675544730.jpg 768w, /blog/le-mans-unabridged/lemans-Circuit_de_la_Sarthe_track_map_hu1624690880298899854.jpg 1024w" sizes="100vw" />
    
      <img src="/blog/le-mans-unabridged/lemans-Circuit_de_la_Sarthe_track_map_hu14186182057607570309.jpg" width="1024" height="707" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>



 
<div class="flex flex-col items-center text-center">
  <span class="text-base md:text-xl font-bold">The Le Mans circuit</span>


</div>

</p>
<p>Know that I&rsquo;m a serious Formula 1 fan. I love F1 cars, their amazing speed (even faster than LMP1s), the technology they represent. I love the carefully sculpted F1 racing format, with its three-round qualifying on Saturday followed by an intense Sunday race. I love the drama of F1&rsquo;s long 18-race season, with drivers and teams fighting it out over nine months for individual and team championship titles. Formula 1 is my favorite racing series.</p>
<p>My favorite single race, though, is the 24 Hours of Le Mans. Hands-down, no contest. Le Mans is a season&rsquo;s worth of racing compressed into a single race. Cars that finish these brutal 24 hours will never race again—the punishment is so heavy that even the most robust components, down to the chassis itself, are beat up, worn out, no longer race-worthy. Le Mans demands a three-driver rotation, and even then drivers are pushed to the point of physical exhaustion and dehydration. Pit crews and garage mechanics face their own marathon: besides rapidly executing 24 hours of fuel-and-tire stops, they must be prepared to diagnose and fix even the most seriously broken cars, under the highest-intensity, the-clock-is-ticking circumstances. Race engineers like Audi&rsquo;s Leena Gade might have the toughest job of all, because unlike drivers, they&rsquo;re on duty for all 24 hours. This is 24 hours of racing insanity.</p>
<p>Le Mans cars are likewise in a class of their own. Even the commoners of Le Mans—the Porsche 911 RSRs and Ferrari 458 Italia GT2s of the GTE classes—are sexy beasts that wind up with a sound like a turbojet on the long Mulsanne Straight and would turn heads on any road in the world. The LMP1s are the pinnacle, though, truly the sexiest cars on the face of the earth, especially as they fly through a difficult corner sequence like the sinuous Porsche Curves, moving so fast it seems like you&rsquo;re fast-forwarding the video. As sports car prototypes, LMP1s have beautifully enclosed bodywork and jet-aircraft-style cockpits. The LMP1 look is sleek spaceship, a vast contrast to the aero-deformed, exposed-wheel F1 car.</p>
<p><div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/le-mans-unabridged/lemans-24-Hours-of-LeMans-Test-31.jpg"><picture>
    <source type="image/webp" srcset="/blog/le-mans-unabridged/lemans-24-Hours-of-LeMans-Test-31_hu13663942816138135221.webp 320w, /blog/le-mans-unabridged/lemans-24-Hours-of-LeMans-Test-31_hu6719582748139897036.webp 640w, /blog/le-mans-unabridged/lemans-24-Hours-of-LeMans-Test-31_hu5172186005269055942.webp 768w, /blog/le-mans-unabridged/lemans-24-Hours-of-LeMans-Test-31_hu13427165302415392977.webp 1024w, /blog/le-mans-unabridged/lemans-24-Hours-of-LeMans-Test-31_hu3189719744501902929.webp 1280w, /blog/le-mans-unabridged/lemans-24-Hours-of-LeMans-Test-31_hu6822562315843242070.webp 1536w, /blog/le-mans-unabridged/lemans-24-Hours-of-LeMans-Test-31_hu14911225026899712634.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/le-mans-unabridged/lemans-24-Hours-of-LeMans-Test-31_hu10784350233037054237.jpg 320w, /blog/le-mans-unabridged/lemans-24-Hours-of-LeMans-Test-31_hu1935724313639253199.jpg 640w, /blog/le-mans-unabridged/lemans-24-Hours-of-LeMans-Test-31_hu4507402707806937966.jpg 768w, /blog/le-mans-unabridged/lemans-24-Hours-of-LeMans-Test-31_hu11157875222336843141.jpg 1024w, /blog/le-mans-unabridged/lemans-24-Hours-of-LeMans-Test-31_hu2190849286787873108.jpg 1280w, /blog/le-mans-unabridged/lemans-24-Hours-of-LeMans-Test-31_hu12561961210863431665.jpg 1536w, /blog/le-mans-unabridged/lemans-24-Hours-of-LeMans-Test-31_hu11549630293829521372.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/le-mans-unabridged/lemans-24-Hours-of-LeMans-Test-31_hu1935724313639253199.jpg" width="2048" height="1447" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>



 
<div class="flex flex-col items-center text-center">
  <span class="text-base md:text-xl font-bold">2015 Audi R18 e-tron quattro</span>


</div>

</p>
<p>Racing technology is a big part of the appeal for me. I love Formula 1 for its tech, but here again, Le Mans and LMP1s rule the roost. The three top LMP1 teams—Audi, Porsche, and Toyota—all use hybrid technology. Audi&rsquo;s hybrid power isn&rsquo;t battery-based, though: the <a href="https://www.audi-mediacenter.com/en/audi-r18-e-tron-quattro-76">R18 e-tron quattro</a> instead uses a flywheel accumulator which spins up during braking, storing up to 700K Joules of energy and delivering an extra 272 horsepower on demand. Audi has likewise innovated on its internal combustion engine: starting in 2006, Audi made an unprecedented switch to diesel power. Besides delivering amazing performance and fuel economy, the diesel powerplant is incredibly quiet; the loudest noise you&rsquo;ll hear on the in-car feed is the flywheel spinning up, with engine noise just a low background throb. Using such unconventional power sources, Audi has been dominant at Le Mans, winning 10 of the last 11 years.</p>
<p>Porsche is the most famous name in Le Mans racing, a name that had been absent from prototypes for 16 years until their return last year with the <a href="https://motorsports.porsche.com/international/en/category/cars/919-hybrid">919 Hybrid</a>. The 919 is a more conventional hybrid pairing than Audi&rsquo;s, with a turbocharged 2.0 liter V4 gasoline engine complemented by a lithium ion battery system. Conventional or not, the 919 is a powerful rocketship capable of cranking out almost 1,000 HP, significantly more than Audi&rsquo;s combined 840 HP and enough to easily win this year&rsquo;s LMP1 pole. But Le Mans is endurance racing at its most fundamental, and despite the Porsche&rsquo;s power advantage and qualifying pace, it was Audi who started this year&rsquo;s Le Mans as heavy favorites, having handily beaten the 919s in two key races leading up to Le Mans. It normally takes years for a new team to master Le Mans&rsquo; triple requirements: speed, reliability, and teamwork. 2015 was just Porsche&rsquo;s second year, so winning seemed highly unlikely.</p>
<p><div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/le-mans-unabridged/lemans-eec85b20-965d-43ea-8c5d-d0965ab8f4a6.jpg"><picture>
    <source type="image/webp" srcset="/blog/le-mans-unabridged/lemans-eec85b20-965d-43ea-8c5d-d0965ab8f4a6_hu2448250281882799978.webp 320w, /blog/le-mans-unabridged/lemans-eec85b20-965d-43ea-8c5d-d0965ab8f4a6_hu15393056643424064713.webp 640w, /blog/le-mans-unabridged/lemans-eec85b20-965d-43ea-8c5d-d0965ab8f4a6_hu6008282303688554527.webp 768w, /blog/le-mans-unabridged/lemans-eec85b20-965d-43ea-8c5d-d0965ab8f4a6_hu6057599646354084262.webp 1024w, /blog/le-mans-unabridged/lemans-eec85b20-965d-43ea-8c5d-d0965ab8f4a6_hu9226029726326020976.webp 1280w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/le-mans-unabridged/lemans-eec85b20-965d-43ea-8c5d-d0965ab8f4a6_hu16033676969128089586.jpg 320w, /blog/le-mans-unabridged/lemans-eec85b20-965d-43ea-8c5d-d0965ab8f4a6_hu6322328316430048651.jpg 640w, /blog/le-mans-unabridged/lemans-eec85b20-965d-43ea-8c5d-d0965ab8f4a6_hu2785032003483423745.jpg 768w, /blog/le-mans-unabridged/lemans-eec85b20-965d-43ea-8c5d-d0965ab8f4a6_hu2839848054748137762.jpg 1024w, /blog/le-mans-unabridged/lemans-eec85b20-965d-43ea-8c5d-d0965ab8f4a6_hu5798715693545988250.jpg 1280w" sizes="100vw" />
    
      <img src="/blog/le-mans-unabridged/lemans-eec85b20-965d-43ea-8c5d-d0965ab8f4a6_hu6322328316430048651.jpg" width="1440" height="812" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>



 
<div class="flex flex-col items-center text-center">
  <span class="text-base md:text-xl font-bold">2015 Porsche 919 Hybrid</span>


</div>

</p>
<p><strong>Race time, Sunday morning 3:50 AM.</strong> 12 hours 10 minutes to go. Time-shifted time, 8:41 PM on June 25th, dusk in Winnetka, 11 days and change after the finish. It&rsquo;s shaping up to be a great race, perhaps as exciting as the classic Audi - Peugeot duels of 2006-2011. Porsche&rsquo;s #19 car has pulled out a slight lead, enough to remain in front even after pit stops, although in a 24 hour race this is the thinnest of margins, one that could evaporate with the slightest mishap such as a punctured tire.</p>
<p>And yes, I&rsquo;ve really watched it all, 11 hours and 50 minutes of racing, transitioning from bright afternoon sunshine, through the fading light of a beautiful evening, and now into the dark lonely hours past midnight. The race is not even halfway over, and already I&rsquo;ve committed more viewing hours than an entire season of Game of Thrones. By the time I&rsquo;m done, I&rsquo;ll have watched 24 full hours, time-shifted to fit my spare viewing hours, over an elapsed period of 18 days, all the while staying clear of my usual racing news sources in an attempt to avoid spoilers.</p>
<p>That&rsquo;s a serious commitment of time and focus. A similar commitment, applied to cross-country driving, will get me from Winnetka to Taos, New Mexico, including a side trip to Alliance, Nebraska to visit that automotive wonder of the world, <a href="http://carhenge.com/">Carhenge</a>. What motivates me to watch a single sporting event, a single race, for 24 full hours?</p>
<p>Watching Le Mans unabridged is sublime and memorable, in the same way that taking a cross-country journey in the company of great friends is sublime and memorable. It&rsquo;s not a quick-fix adrenaline rush like the best of Formula 1 races. It&rsquo;s not a quick anything. Instead, it&rsquo;s a 24-hour stream of small wonderful moments, enjoyed in the company of good friends. Headlights coming on in the beautiful dusk, the screech of an Audi&rsquo;s tires as it threads the Porsche Curves, chaos in a GTE garage after an unfortunate meeting with an Armco guardrail, the one-of-a-kind growl of the Corvette C7.R&rsquo;s 5.5 liter V8, the intense focus of an LMP1 driver cranking out sub-3:20 laps in the pitch black of 3:00 AM. For me, the Le Mans experience can&rsquo;t be condensed; try to grasp the &ldquo;good parts&rdquo; and you lose the whole essence of the thing.</p>
<p><div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/le-mans-unabridged/lemans-M15_1932_fine1.jpg"><picture>
    <source type="image/webp" srcset="/blog/le-mans-unabridged/lemans-M15_1932_fine1_hu14429202483477551776.webp 320w, /blog/le-mans-unabridged/lemans-M15_1932_fine1_hu11066933904932144530.webp 640w, /blog/le-mans-unabridged/lemans-M15_1932_fine1_hu16919770147295269441.webp 768w, /blog/le-mans-unabridged/lemans-M15_1932_fine1_hu17343305650665232090.webp 1024w, /blog/le-mans-unabridged/lemans-M15_1932_fine1_hu11545885770100637190.webp 1280w, /blog/le-mans-unabridged/lemans-M15_1932_fine1_hu17950386299717722623.webp 1536w, /blog/le-mans-unabridged/lemans-M15_1932_fine1_hu7318154801569809665.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/le-mans-unabridged/lemans-M15_1932_fine1_hu8417065869511323540.jpg 320w, /blog/le-mans-unabridged/lemans-M15_1932_fine1_hu14491288632918826104.jpg 640w, /blog/le-mans-unabridged/lemans-M15_1932_fine1_hu13241563056297587097.jpg 768w, /blog/le-mans-unabridged/lemans-M15_1932_fine1_hu2243981944420578988.jpg 1024w, /blog/le-mans-unabridged/lemans-M15_1932_fine1_hu11215507039089716230.jpg 1280w, /blog/le-mans-unabridged/lemans-M15_1932_fine1_hu14173029144401796421.jpg 1536w, /blog/le-mans-unabridged/lemans-M15_1932_fine1_hu17130186746744189651.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/le-mans-unabridged/lemans-M15_1932_fine1_hu14491288632918826104.jpg" width="4016" height="2677" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>



 
<div class="flex flex-col items-center text-center">
  <span class="text-base md:text-xl font-bold">919 Hybrid owns the night</span>


</div>

</p>
<p><strong>Race time, Sunday, 6:44 AM</strong>, almost 15 hours into the race. It&rsquo;s getting light. 9 hours 16 minutes to go. Porsche&rsquo;s #19 car has stretched its lead over the #7 Audi to a pit-stop-adjusted 50 seconds, still a tight margin at Le Mans but definitely a concern to Audi. There&rsquo;s an interesting Formula 1 connection to the #19 Porsche: German <a href="https://en.wikipedia.org/wiki/Nico_H%C3%BClkenberg">Nico Hülkenberg</a>, an active F1 driver for Force India, is one of #19&rsquo;s co-drivers. Hülkenberg is a Le Mans rookie, as is his Kiwi teammate Earl Bamber. Their third co-driver Nick Tandy has raced Le Mans twice before, but in the GTE Pro class, two large steps below LMP1. It&rsquo;s a surprise to see #19 leading the Porsches, let alone the race.</p>
<p>For almost 15 hours I&rsquo;ve enjoyed traveling in the company of my good Le Mans friends. Endurance racing in general, and Le Mans in particular, has the informal feel of &ldquo;Hey, let&rsquo;s do something insane! Screw it, let&rsquo;s drive for 24 hours straight and see who can go the farthest!&rdquo; It&rsquo;s less like a formal contest and more like an all-night crazy-fest. And I get to be an integral part of the craziness: my role is to watch insanely—all 24 hours, the equivalent of two full seasons of The Wire.</p>
<p>Among those I&rsquo;m intimately connected with are the head crazies, the drivers. Le Mans&rsquo; deep camera coverage brings me right inside their world for 24 hours, whether they&rsquo;re two hours into a four-hour stint in their LMP1 cockpit or dozing in the garage. In the tightly-produced Formula 1 format, my exposure to the drivers is carefully scripted, with handlers always hovering nearby. In F1, the closest you&rsquo;ll come to an informal moment is immediately after the race, when the top three finishers relax and rehydrate before the podium ceremony.</p>
<p>At Le Mans, in contrast, informality reigns. 24 hours is a long time, and even if tight production and careful scripting were desired (they don&rsquo;t seem to be), it would be prohibitively expensive to do. Also, because Le Mans drivers are part of a 3-person team, they&rsquo;re actually driving just 1/3rd of the time. The 2015 edition of Le Mans had 56 cars, meaning 168 drivers. So at any particular point during the race, 112 drivers are not driving, but instead are hanging out around the garage, watching the race just like me, maybe grabbing some food and a couple hours of sleep, but mostly hanging out. Oh look, there&rsquo;s André Lotterer from the #7 Audi, relaxing in the garage with his mechanics, watching the race feed and his car&rsquo;s telemetry data. Wow, there&rsquo;s Patrick Dempsey, yes, that Patrick Dempsey, who just finished his stint in the #77 GTE Am car. Casual, unscripted interviews are happening continuously. One of my favorites this year was with <a href="https://en.wikipedia.org/wiki/Anthony_Davidson">Anthony Davidson</a>, co-driver of the #1 Toyota TS040 Hybrid. Toyota Racing was a top contender in 2014, winning the World Endurance Championship, but in 2015 Porsche and Audi both pulled significantly ahead. Davidson was fresh out of the #1 Toyota, having just crashed <strong>twice</strong> on a single lap as he desperately tried to match the pace of the leaders. He was devastated, brutally honest about his mistakes, and apologetic to his fans and team. The time lost for repairs ended any hopes Toyota had for a top finish. No handlers, no scripts, up close and personal.</p>
<p><strong>Race time 12:49 PM, a bit over 3 hours to go.</strong> For me watching in Winnetka, it&rsquo;s dinner time on July 2nd, 18 days later. Both the race and my time-shifted viewing are entering their final push. The drivers who take over now will bring their cars home.</p>
<p>A finish at Le Mans occurs when (a) the official Rolex 24 hour race clock finally counts down to 00:00:00, and (b) your car crosses the finish line, having completed its last 8+ mile lap in six minutes or less. Even if your car runs for 23 hours and 55 minutes, if you fail to cross the finish line, or are limping along at slower than a six-minute pace, you are completely disqualified, expunged, shunned, purged from the classification list as if you had never entered.</p>
<p><div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/le-mans-unabridged/lemans-2013-09-30-horloge-1.jpg"><picture>
    <source type="image/webp" srcset="/blog/le-mans-unabridged/lemans-2013-09-30-horloge-1_hu16249397925389385336.webp 320w, /blog/le-mans-unabridged/lemans-2013-09-30-horloge-1_hu8282597831599447814.webp 640w, /blog/le-mans-unabridged/lemans-2013-09-30-horloge-1_hu5439715750142459748.webp 768w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/le-mans-unabridged/lemans-2013-09-30-horloge-1_hu12543776668804536903.jpg 320w, /blog/le-mans-unabridged/lemans-2013-09-30-horloge-1_hu5741568149347782023.jpg 640w, /blog/le-mans-unabridged/lemans-2013-09-30-horloge-1_hu5123195700850645078.jpg 768w" sizes="100vw" />
    
      <img src="/blog/le-mans-unabridged/lemans-2013-09-30-horloge-1_hu5741568149347782023.jpg" width="800" height="533" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>



 
<div class="flex flex-col items-center text-center">
  <span class="text-base md:text-xl font-bold">The iconic Rolex race clock at Le Mans</span>


</div>

</p>
<p>About four hours ago, disaster befell Audi&rsquo;s lead #7 car: the rear engine cover broke loose and shredded, resulting in engine damage and a lengthy garage visit. This gave the #19 Porsche a more comfortable one-lap lead. Since then, the Audis have continued to suffer reliability problems and even a drive-through penalty for &ldquo;disrespecting the slow zone.&rdquo; Porsche&rsquo;s cars have meanwhile been rock-solid and now hold the top two spots, with #17 up to second place.</p>
<p>The Le Mans race commentators are another big part of my communal crazy-fest. It&rsquo;s a hilarious, chaotic, ever-shifting cast of characters, with new voices suddenly appearing as their predecessors hit the wall of exhaustion and slip away for food and sleep. They&rsquo;re enthusiastic and knowledgeable, many being former Le Mans drivers, people who can speak from personal experience about what it&rsquo;s like to be in the car during Happy Hour, the golden hour just after daybreak when the track surface is rubbered in and sticky, and the cool air maximizes engine performance. They talk me through amazing laps, horrifying crashes, miraculous pit work, amateurish moves by amateur drivers, and hilarious moments (&ldquo;André Lotterer there, taking on a banana.&rdquo;) I feast on insider trivia—which teams have the best hospitality catering, what particular drivers choose to eat during the race (in one horrifying case, curried hot dogs).</p>
<p>Best of all, they share about Le Mans, the Grand Lady that even the world&rsquo;s best drivers can never take for granted. Like life, Le Mans is fundamentally about endurance, about taking hits and getting beat up and getting it started again and limping back to the pits and intense recovery work and speed tape and surviving and finishing. It&rsquo;s about yellow flags and safety cars and getting a lap down and getting back in the groove and unlapping yourself and retaking the lead when it&rsquo;s the other driver&rsquo;s turn for misfortune.</p>
<p><strong>Race time 3:53 PM, less than eight minutes remaining</strong>, time-shifted to 9:09 PM July 2nd in Winnetka. Nico Hülkenberg is back at the wheel of the #19 Porsche and has an almost-comfortable one lap lead over his Porsche #17 sister car, two laps ahead of the #7 Audi.</p>
<p>Hülkenberg is still flying, to the great discomfort of his team, who try to reel him in over the radio, &ldquo;Nico, you can take it a lot more easy, repeat, you can take it a lot more easy.&rdquo; If Nico can hold it together for the last eight minutes, he&rsquo;ll become the first active Formula 1 driver to win Le Mans since Johnny Herbert in 1991.</p>
<p><div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/le-mans-unabridged/lemans-maxresdefault.jpg"><picture>
    <source type="image/webp" srcset="/blog/le-mans-unabridged/lemans-maxresdefault_hu12189861851913280314.webp 320w, /blog/le-mans-unabridged/lemans-maxresdefault_hu9835810789681096813.webp 640w, /blog/le-mans-unabridged/lemans-maxresdefault_hu15547342103114491792.webp 768w, /blog/le-mans-unabridged/lemans-maxresdefault_hu18132028198978292175.webp 1024w, /blog/le-mans-unabridged/lemans-maxresdefault_hu1983659725710828340.webp 1280w, /blog/le-mans-unabridged/lemans-maxresdefault_hu8895093088678563137.webp 1536w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/le-mans-unabridged/lemans-maxresdefault_hu5459845670622382231.jpg 320w, /blog/le-mans-unabridged/lemans-maxresdefault_hu16535851082945500353.jpg 640w, /blog/le-mans-unabridged/lemans-maxresdefault_hu8453625309399628832.jpg 768w, /blog/le-mans-unabridged/lemans-maxresdefault_hu4841075638409022266.jpg 1024w, /blog/le-mans-unabridged/lemans-maxresdefault_hu18317480389353641621.jpg 1280w, /blog/le-mans-unabridged/lemans-maxresdefault_hu6602805920876520883.jpg 1536w" sizes="100vw" />
    
      <img src="/blog/le-mans-unabridged/lemans-maxresdefault_hu16535851082945500353.jpg" width="1600" height="900" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>



 
<div class="flex flex-col items-center text-center">
  <span class="text-base md:text-xl font-bold">Heartbreak after 23&#43; hours for the #98 Aston Martin</span>


</div>

</p>
<p>The Audi garage is subdued. The unbelievable does still happen at Le Mans—just minutes earlier, in the GTE Am class, the #98 Aston Martin, running with a full two-lap advantage, crashed out of the race, handing the lead and eventual win to the #72 SMP Ferrari. But the Audi team senses that, with just minutes to go, and two Porsche 919s running fast and reliable well up the road, 2015 won&rsquo;t bring another miracle win. Dr. Wolfgang Ullrich, Audi Director of Motorsport, walks over to the Porsche garage and offers congratulations.</p>
<p><strong>Race time, a bit past 4:00 PM</strong>. The official Rolex 24 hour race clock has counted down past 00:00:00. Nico Hülkenberg has finally slowed down a bit and convoyed up with the #17 sister Porsche to cross the finish line together.</p>
<p>Another historic, magical Le Mans is in the books. Best consumed unabridged.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Vivek Haldar on GitHub Copilot</title>
      <link>https://craftycto.com/quote/vivek-haldar-copilot/</link>
      <pubDate>Sat, 10 Jun 2023 11:23:00 +0000</pubDate>
      
      <guid>https://craftycto.com/quote/vivek-haldar-copilot/</guid>
      <description>Speaking only for myself, I get orders of magnitude more value from [GitHub Copilot] than the $10/month it costs. How? Copilot (often used in conjunction with GPT, and now, Bard) has reduced the activation energy for the small scripty projects I want to undertake from being just slightly over a weekend (which meant they would get abandoned or not even started) to fitting comfortably within a weekend. And that has made a world of difference, at least for me. I suspect the same story will play out across programmers in general, where it will serve to lower the friction and frustration of programming just enough to increase their overall output.</description>
      <content:encoded><![CDATA[<blockquote>
<p>Speaking only for myself, I get orders of magnitude more value from [GitHub Copilot] than the $10/month it costs. How? Copilot (often used in conjunction with GPT, and now, Bard) has reduced the activation energy for the small scripty projects I want to undertake from being just slightly over a weekend (which meant they would get abandoned or not even started) to fitting comfortably within a weekend. And that has made a world of difference, at least for me. I suspect the same story will play out across programmers in general, where it will serve to lower the friction and frustration of programming just enough to increase their overall output.</p>
</blockquote>
<p>&ndash; <a href="https://vivekhaldar.com/articles/re--why-i-don-t-use-copilot/">Vivek Haldar</a></p>
]]></content:encoded>
    </item>
    
    <item>
      <title>First Impressions of Vision Pro and visionOS</title>
      <link>https://craftycto.com/elsewhere/gruber-visionpro-visionos/</link>
      <pubDate>Thu, 08 Jun 2023 11:23:00 +0000</pubDate>
      
      <guid>https://craftycto.com/elsewhere/gruber-visionpro-visionos/</guid>
      <description>Gruber conveys the essence of the experience better than other writeups I&amp;#39;ve read so far.</description>
<content:encoded><![CDATA[<p><a href="https://daringfireball.net/2023/06/first_impressions_of_vision_pro_and_visionos">John Gruber&rsquo;s First Impressions of Vision Pro and visionOS</a>. Gruber conveys the essence of the experience better than other writeups I&rsquo;ve read so far. Unlike a traditional media outlet with its usual needs to drive readership, avoid pissing off Apple, appear balanced, etc., Gruber pretty much says straight out what he thinks, which is useful.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Is Runway Gen-2 Midjourney for Video?</title>
      <link>https://craftycto.com/blog/runway-not-midjourney-for-video/</link>
      <pubDate>Thu, 08 Jun 2023 00:06:00 +0000</pubDate>
      
      <guid>https://craftycto.com/blog/runway-not-midjourney-for-video/</guid>
      <description>We Tested Gen-2 and Share the Resulting Videos</description>
      <content:encoded><![CDATA[<p>I love <a href="https://www.midjourney.com/showcase/recent/">Midjourney</a>, use it almost every day to generate the illustrations that are an important component of this blog. In fact, this blog&rsquo;s name change from Crafty&rsquo;s to Crafty&rsquo;s Illustrated <strong>happened entirely because of Midjourney</strong>, which made possible the &ldquo;Illustrated&rdquo; part. I happily pay $10/mo for these benefits.</p>
<p>I&rsquo;d gladly fork out similar money for something like Midjourney but for video. To that end, some months back I signed up on a waitlist (or something) with <a href="https://runwayml.com/">Runway</a>. Today I received an announcement that Runway Gen-2 is available. So I decided to give it a workout and assess the current state of generative AI for video.</p>
<h2 id="we-need-a-posse">We Need a Posse</h2>
<p>In my <a href="/blog/write-every-day/">last essay</a> about writing, I touched on <a href="https://indieweb.org/POSSE">IndieWeb&rsquo;s POSSE principles</a> and that seemed like a great opportunity for an illustration. I used this Midjourney prompt:</p>
<blockquote>
<p>a group of men on horseback, riding at high speed, in pursuit of a cattle thief, cinematic spaghetti western technicolor</p>
</blockquote>
<p>and this was the winning image:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/runway-not-midjourney-for-video/posse-horseback.webp"><picture>
    <source type="image/webp" srcset="/blog/runway-not-midjourney-for-video/posse-horseback_hu9580471659413858179.webp 320w, /blog/runway-not-midjourney-for-video/posse-horseback_hu16753129581769204402.webp 640w, /blog/runway-not-midjourney-for-video/posse-horseback_hu14378536491024196875.webp 768w, /blog/runway-not-midjourney-for-video/posse-horseback_hu5938767922620995555.webp 1024w, /blog/runway-not-midjourney-for-video/posse-horseback_hu6464012124513642448.webp 1280w, /blog/runway-not-midjourney-for-video/posse-horseback_hu8490886158146895877.webp 1536w, /blog/runway-not-midjourney-for-video/posse-horseback_hu12504755710222769861.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/runway-not-midjourney-for-video/posse-horseback_hu835606990935249644.jpg 320w, /blog/runway-not-midjourney-for-video/posse-horseback_hu8890715502320664403.jpg 640w, /blog/runway-not-midjourney-for-video/posse-horseback_hu11136035651622601891.jpg 768w, /blog/runway-not-midjourney-for-video/posse-horseback_hu9212588840162441755.jpg 1024w, /blog/runway-not-midjourney-for-video/posse-horseback_hu13263852104240826938.jpg 1280w, /blog/runway-not-midjourney-for-video/posse-horseback_hu8631803359500308378.jpg 1536w, /blog/runway-not-midjourney-for-video/posse-horseback_hu4007297382506505095.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/runway-not-midjourney-for-video/posse-horseback_hu8890715502320664403.jpg" width="3072" height="1722" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Midjourney actually generated four images for that prompt, and the others were quite good also:</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/images/posse-horseback2.webp"><picture>
    <source type="image/webp" srcset="/images/posse-horseback2_hu2076919134689657906.webp 320w, /images/posse-horseback2_hu8033423601261693883.webp 640w, /images/posse-horseback2_hu11595812110768618371.webp 768w, /images/posse-horseback2_hu8799764956492357479.webp 1024w, /images/posse-horseback2_hu16892912027067686195.webp 1280w, /images/posse-horseback2_hu13617676130917304604.webp 1536w, /images/posse-horseback2_hu7138194817407328850.webp 1920w" sizes="25vw" />
    <source type="image/jpeg" srcset="/images/posse-horseback2_hu17578298603386085093.jpg 320w, /images/posse-horseback2_hu348301549974484120.jpg 640w, /images/posse-horseback2_hu14561607883160409159.jpg 768w, /images/posse-horseback2_hu3979550216767474876.jpg 1024w, /images/posse-horseback2_hu13421486308547690994.jpg 1280w, /images/posse-horseback2_hu2580121132002071768.jpg 1536w, /images/posse-horseback2_hu2483202252093175043.jpg 1920w" sizes="25vw" />
    
      <img src="/images/posse-horseback2_hu348301549974484120.jpg" width="3072" height="1722" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[75%] mx-2 flex justify-center"><a href="/images/posse-horseback4.webp"><picture>
    <source type="image/webp" srcset="/images/posse-horseback4_hu9636175143978211947.webp 320w, /images/posse-horseback4_hu5466392065375487669.webp 640w, /images/posse-horseback4_hu12111815111877523504.webp 768w, /images/posse-horseback4_hu16565441533942693821.webp 1024w, /images/posse-horseback4_hu7381263689816113487.webp 1280w, /images/posse-horseback4_hu2889297887858610845.webp 1536w, /images/posse-horseback4_hu11877813395999433823.webp 1920w" sizes="25vw" />
    <source type="image/jpeg" srcset="/images/posse-horseback4_hu5591317427653651300.jpg 320w, /images/posse-horseback4_hu16307223026899289236.jpg 640w, /images/posse-horseback4_hu16859952910564847889.jpg 768w, /images/posse-horseback4_hu9933089115312446641.jpg 1024w, /images/posse-horseback4_hu18286137678964858188.jpg 1280w, /images/posse-horseback4_hu11733698458761504662.jpg 1536w, /images/posse-horseback4_hu10850884770430577910.jpg 1920w" sizes="25vw" />
    
      <img src="/images/posse-horseback4_hu16307223026899289236.jpg" width="3072" height="1722" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[75%] mx-2 flex justify-center"><a href="/images/posse-horseback3.webp"><picture>
    <source type="image/webp" srcset="/images/posse-horseback3_hu7701539095212689515.webp 320w, /images/posse-horseback3_hu12896282843125890265.webp 640w, /images/posse-horseback3_hu4948221822558117152.webp 768w, /images/posse-horseback3_hu5351415567976245316.webp 1024w, /images/posse-horseback3_hu2795116040684648193.webp 1280w, /images/posse-horseback3_hu17093674517538842102.webp 1536w, /images/posse-horseback3_hu4255738913571649970.webp 1920w" sizes="25vw" />
    <source type="image/jpeg" srcset="/images/posse-horseback3_hu11546909325397782959.jpg 320w, /images/posse-horseback3_hu9160704221635267241.jpg 640w, /images/posse-horseback3_hu16311871750848860373.jpg 768w, /images/posse-horseback3_hu14212194859838728135.jpg 1024w, /images/posse-horseback3_hu17520153667443968624.jpg 1280w, /images/posse-horseback3_hu15009840259436021591.jpg 1536w, /images/posse-horseback3_hu13100368693669909987.jpg 1920w" sizes="25vw" />
    
      <img src="/images/posse-horseback3_hu9160704221635267241.jpg" width="3072" height="1722" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Going from prompt to initial images took maybe 30 seconds, and the final image was web-ready in 5-10 minutes, including tweaking and exporting to .webp using Pixelmator Pro.</p>
<h2 id="lights-camera-action">Lights, Camera, Action</h2>
<p>So, we might as well try Runway Gen-2. Same prompt, shall we? I gave it three tries, and it was a win in the sense that I got this fun little essay as a result. But Midjourney-for-video? Not so much.</p>
<h3 id="take-1-what">Take 1: What?!</h3>
<p>Here we see riders galloping at high speed &hellip; seated backwards on their horses? And in the background, the horses seem to be stampeding in reverse or something. A pretty amazing clip&ndash;though not exactly what we were looking for.</p>

  <video autoplay playsinline muted loop src="/videos/gen-2-men-horseback1.mp4" width="768" height="448"></video>

<h3 id="take-2-huh">Take 2: Huh?</h3>
<p>Here we have &hellip; a group of men riding while seated on raised platforms fastened to their horses, kind of a war-elephant setup. The hats do not appear to be western in style; they have a Central Asian mongol-hordes feel. The second hat from the left morphs in stages from wizard to Cat-in-the-Hat. And on closer inspection, the horses&rsquo; anatomy and mechanics are nightmarish.</p>

  <video autoplay playsinline muted loop src="/videos/gen-2-men-horseback2.mp4" width="768" height="448"></video>

<h3 id="take-3-look-out">Take 3: Look Out!</h3>
<p>The first moment here is promising&ndash;but then a Reverse Rider, possibly a fugitive from Take 1, blows through the pack like a southbound drunk driver on the northbound Edens Expressway. Chaos ensues.</p>

  <video autoplay playsinline muted loop src="/videos/gen-2-men-horseback3.mp4" width="768" height="448"></video>

<h2 id="nope--but-lets-be-fair">Nope &hellip; But Let&rsquo;s Be Fair</h2>
<p>We&rsquo;re ready to call this race: Midjourney for video has not yet arrived. But a few things to bear in mind:</p>
<ul>
<li>Midjourney generates its share of outtakes, including some true horrors (see examples at the bottom of the page): &ldquo;That&rsquo;s <em>such</em> a beautiful little girl, OH GOD SHE&rsquo;S GOT 8 FINGERS ON EACH HAND&rdquo; or &ldquo;What a cute little bunny AAAGHHH IT HAS AN EXTRA LEG COMING OUT OF ITS BACK&rdquo;</li>
<li>A group of men on horseback is admittedly a fairly challenging scene.</li>
<li>Things move fast in generative AI.</li>
</ul>
<p>So, nope, not yet. But I predict it won&rsquo;t be long.</p>
<div class="flex justify-evenly"><div class="w-[28%] mx-2 flex justify-center"><a href="/images/outtake2.png"><picture>
    <source type="image/webp" srcset="/images/outtake2_hu13167610990164203344.webp 320w, /images/outtake2_hu12324356191235907378.webp 640w, /images/outtake2_hu5478872818429864069.webp 768w, /images/outtake2_hu2947209433242899161.webp 1024w, /images/outtake2_hu9229014703400502349.webp 1280w, /images/outtake2_hu17885073845897669616.webp 1536w, /images/outtake2_hu15097871328708183648.webp 1920w" sizes="12vw" />
    <source type="image/jpeg" srcset="/images/outtake2_hu13651189290287093769.jpg 320w, /images/outtake2_hu14183350304855269351.jpg 640w, /images/outtake2_hu10990666683242284559.jpg 768w, /images/outtake2_hu17977566247039629444.jpg 1024w, /images/outtake2_hu10002643305277771684.jpg 1280w, /images/outtake2_hu18379858778648133115.jpg 1536w, /images/outtake2_hu10171479198180468011.jpg 1920w" sizes="12vw" />
    
      <img src="/images/outtake2_hu14183350304855269351.jpg" width="2568" height="1260" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[28%] mx-2 flex justify-center"><a href="/images/outtake1.png"><picture>
    <source type="image/webp" srcset="/images/outtake1_hu13390846206928205221.webp 320w, /images/outtake1_hu6602307909505352979.webp 640w, /images/outtake1_hu464579309479768273.webp 768w, /images/outtake1_hu9447836062491271956.webp 1024w, /images/outtake1_hu4098932585467498270.webp 1280w" sizes="12vw" />
    <source type="image/jpeg" srcset="/images/outtake1_hu17103146588147224702.jpg 320w, /images/outtake1_hu6458466603064301663.jpg 640w, /images/outtake1_hu42679055656666724.jpg 768w, /images/outtake1_hu12380404592418452275.jpg 1024w, /images/outtake1_hu11201181522366421525.jpg 1280w" sizes="12vw" />
    
      <img src="/images/outtake1_hu6458466603064301663.jpg" width="1394" height="1384" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[28%] mx-2 flex justify-center"><a href="/images/outtake3.png"><picture>
    <source type="image/webp" srcset="/images/outtake3_hu8907140912745345075.webp 320w, /images/outtake3_hu10081090160926021698.webp 640w, /images/outtake3_hu4048775392566928116.webp 768w" sizes="12vw" />
    <source type="image/jpeg" srcset="/images/outtake3_hu4621416054504240934.jpg 320w, /images/outtake3_hu12298678926603201163.jpg 640w, /images/outtake3_hu16900484529721407137.jpg 768w" sizes="12vw" />
    
      <img src="/images/outtake3_hu12298678926603201163.jpg" width="1020" height="1000" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[28%] mx-2 flex justify-center"><a href="/images/outtake5.png"><picture>
    <source type="image/webp" srcset="/images/outtake5_hu18433998148801788907.webp 320w, /images/outtake5_hu16451466154941941987.webp 640w, /images/outtake5_hu5177085827500708494.webp 768w, /images/outtake5_hu14564429491446944088.webp 1024w, /images/outtake5_hu6538021331085924531.webp 1280w, /images/outtake5_hu2995416770922322762.webp 1536w, /images/outtake5_hu5381120366536025924.webp 1920w" sizes="12vw" />
    <source type="image/jpeg" srcset="/images/outtake5_hu12537201349143328652.jpg 320w, /images/outtake5_hu12802721322796680677.jpg 640w, /images/outtake5_hu7517426903918826958.jpg 768w, /images/outtake5_hu3434121648926542068.jpg 1024w, /images/outtake5_hu17558063833649014439.jpg 1280w, /images/outtake5_hu7141396346865105920.jpg 1536w, /images/outtake5_hu13164092058012962877.jpg 1920w" sizes="12vw" />
    
      <img src="/images/outtake5_hu12802721322796680677.jpg" width="2500" height="1400" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

]]></content:encoded>
    </item>
    
    <item>
      <title>Orion Browser by Kagi</title>
      <link>https://craftycto.com/elsewhere/kagi-orion-browser/</link>
      <pubDate>Wed, 07 Jun 2023 11:23:00 +0000</pubDate>
      
      <guid>https://craftycto.com/elsewhere/kagi-orion-browser/</guid>
      <description>I&amp;#39;m late to the party, but the Orion browser from Kagi is really interesting.</description>
      <content:encoded><![CDATA[<p><a href="https://browser.kagi.com/">Orion Browser by Kagi</a> I&rsquo;m late to the party, probably, but the Orion browser from Kagi (makers of the excellent paid search engine) is really interesting. WebKit-based, super lightweight, zero telemetry / auto ad blocking. And it supports both Firefox and Chrome browser extensions (or at least quite a few of them). Still considered beta but caught my attention for sure. The <a href="https://browser.kagi.com/faq.html">benchmarks are pretty wild</a>.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Faster sorting algorithms discovered using deep reinforcement learning</title>
      <link>https://craftycto.com/elsewhere/deep-rl-faster-sorting/</link>
      <pubDate>Wed, 07 Jun 2023 11:23:00 +0000</pubDate>
      
      <guid>https://craftycto.com/elsewhere/deep-rl-faster-sorting/</guid>
      <description>From nature.com, an AI story that isn&amp;#39;t *generative* AI.</description>
<content:encoded><![CDATA[<p><a href="https://www.nature.com/articles/s41586-023-06004-9">Faster sorting algorithms discovered using deep reinforcement learning</a> So how about an AI story that isn&rsquo;t <em>generative</em> AI? To quote the Nature abstract:</p>
<blockquote>
<p>Whereas remarkable progress has been achieved in the past, making further improvements on the efficiency of these routines has proved challenging for both human scientists and computational approaches. Here we show how artificial intelligence can go beyond the current state of the art by discovering hitherto unknown routines. To realize this, we formulated the task of finding a better sorting routine as a single-player game. We then trained a new deep reinforcement learning agent, AlphaDev, to play this game. AlphaDev discovered small sorting algorithms from scratch that outperformed previously known human benchmarks.</p>
</blockquote>
<p>Not earth-shattering, not an all-new fundamental sorting algorithm, but achieving a 1% to 5% improvement on code that runs a trillion times a day is meaningful. Already merged into LLVM, so it will make its way into real code.</p>
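<p>For a sense of scale: AlphaDev&rsquo;s discoveries were tiny fixed-length routines like sort3 and sort5, where it shaved individual instructions off compare-exchange networks. As a rough illustration only&ndash;my own sketch, not AlphaDev&rsquo;s output, and the real work happened at the x86 assembly level&ndash;here is what a three-element sorting network looks like:</p>

```python
def sort3(a, b, c):
    """Sort three values with a fixed compare-exchange network:
    straight-line code with no loops, the kind of tiny routine
    AlphaDev optimized at the instruction level."""
    if a > b:
        a, b = b, a  # compare-exchange (a, b)
    if b > c:
        b, c = c, b  # compare-exchange (b, c)
    if a > b:
        a, b = b, a  # compare-exchange (a, b) again
    return a, b, c

print(sort3(3, 1, 2))  # → (1, 2, 3)
```

<p>Because the sequence of comparisons is fixed regardless of input, routines like this compile down to very short code&ndash;which is why saving even a single instruction matters when the routine runs a trillion times a day.</p>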
]]></content:encoded>
    </item>
    
    <item>
      <title>Apple Vision</title>
      <link>https://craftycto.com/elsewhere/ben-thompson-apple-vision/</link>
      <pubDate>Tue, 06 Jun 2023 11:23:00 +0000</pubDate>
      
      <guid>https://craftycto.com/elsewhere/ben-thompson-apple-vision/</guid>
      <description>Stratechery&amp;#39;s Ben Thompson with an excellent take on Vision Pro.</description>
      <content:encoded><![CDATA[<p><a href="https://stratechery.com/2023/apple-vision/">Apple Vision</a> Stratechery&rsquo;s Ben Thompson with an excellent take on Vision Pro. He was able to get a few minutes hands-on with the device yesterday. His take aligns very much with <a href="https://craftycto.com/micro/sloppy-apple-keynote-pt2/">my own</a>. Pre-event, I kept thinking, &ldquo;put AR/VR aside, this thing could be a really great replacement for external monitors&rdquo; and Ben goes there and beyond as he looks at likely evolving uses of Vision products over time.</p>
<p>Well written and worth a read.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Sloppy Apple Keynote Observations, Part 1</title>
      <link>https://craftycto.com/micro/sloppy-apple-keynote-pt1/</link>
      <pubDate>Mon, 05 Jun 2023 22:07:00 +0000</pubDate>
      
      <guid>https://craftycto.com/micro/sloppy-apple-keynote-pt1/</guid>
      <description>Everything except Vision Pro</description>
      <content:encoded><![CDATA[<ul>
<li>Keynote video: Apple blew out the audio levels on interstitial music. Very off brand.</li>
<li>New Mac Pro: I was very amused when, as Apple talked about this beast having 7 PCIe slots and mentioned what you&rsquo;d use them for, they skipped one little item &hellip; GPUs. &ldquo;Let&rsquo;s ignore generative AI and it&rsquo;ll go away.&rdquo;</li>
<li>iOS 17
<ul>
<li>Stickers is a terrible name, but I think this will be a great feature&ndash;like iOS&rsquo;s &ldquo;photo subject select / cut&rdquo; on steroids.</li>
<li>StandBy: Like it&ndash;but for nighttime, does an all-red color scheme make <strong>you</strong> sleepy?</li>
<li>Autocorrect is being upgraded to &ldquo;2019 state of the art&rdquo; level</li>
</ul>
</li>
<li>iPadOS 17
<ul>
<li>Health app on iPad. I&rsquo;ve been wanting that.</li>
<li>Notes app massive upgrades, lots of Notability-type upgrades but just like I&rsquo;ll never buy another Ford, I&rsquo;ll never trust my notes to the Notes app. Been burned by both Fords and Notes.</li>
</ul>
</li>
<li>macOS Sonoma
<ul>
<li>tvOS-like screen savers look nice</li>
<li>Not a gamer but what they&rsquo;re doing with Game Mode resonates</li>
<li>Presenter overlay features are interesting, and it looks like they&rsquo;re not FaceTime-only, thank god.</li>
<li>Loved the car control demo featuring a Lucid Air and not a Tesla. Screw you Elmo.</li>
<li>Safari profiles and webapps look intriguing. Skeptical that they&rsquo;ll actually get profiles right though.</li>
</ul>
</li>
<li>AirPods
<ul>
<li>Adaptive audio, a combination of transparency and noise reduction.</li>
<li>Conversation awareness, detecting and augmenting speech. Getting into hearing aid territory.</li>
<li>Better autoswitching. Currently that&rsquo;s unusably bad; we&rsquo;ll see.</li>
</ul>
</li>
<li>Apple Watch
<ul>
<li>Road biking wonderfulness, if it actually works. Bluetooth sensor support, FTP calculations, full-screen bike computer modes. I use Cyclemeter but if this works &hellip;</li>
<li>Looks like great upgrades for hiking also.</li>
</ul>
</li>
</ul>
<p>There you go. Sloppy enough for you?</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Sloppy Apple Keynote Observations, Part 2</title>
      <link>https://craftycto.com/micro/sloppy-apple-keynote-pt2/</link>
      <pubDate>Mon, 05 Jun 2023 22:06:00 +0000</pubDate>
      
      <guid>https://craftycto.com/micro/sloppy-apple-keynote-pt2/</guid>
      <description>Vision Pro</description>
      <content:encoded><![CDATA[<ul>
<li>Pre-keynote expectations are to Vision Pro reality as &ldquo;front screen will show googly eyes&rdquo; is to the actual EyeSight feature</li>
<li>This product shows the degree to which Apple is the anti-Meta</li>
<li>The heavyweight first-gen feel of this reminds me of the first iPad. A super solid chunk.</li>
<li>The theater experience&ndash;want.</li>
<li>23M pixels across two postage-stamp-size displays.</li>
<li>So so so many things they had to do to get this right. E.g., Zeiss lens options for those needing corrected vision. Reminds me of Apple Watch, but probably 20X longer list.</li>
<li>On-secure-enclave OpticID. Holy hell.</li>
<li>So so so many cameras and sensors.</li>
<li>Required an entire new OS flavor to make this work.</li>
<li>Required an entire new chip family (in addition to an M2) to make this work&ndash;12ms latency sensor to display. That&rsquo;s insane but necessary. Meta will never, ever catch up with this.</li>
<li>5,000 patents. Seems about right.</li>
</ul>
<p>I predict an Apple Watch like trajectory of success but way way bigger revenues.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>It’s infuriatingly hard to understand how closed models train on their input</title>
      <link>https://craftycto.com/elsewhere/willison-infuriating-closed-models/</link>
      <pubDate>Sun, 04 Jun 2023 00:07:00 +0000</pubDate>
      
      <guid>https://craftycto.com/elsewhere/willison-infuriating-closed-models/</guid>
      <description>Simon Willison digs into the question of whether the big closed LLMs are training their models based on users&amp;#39; input.</description>
      <content:encoded><![CDATA[<p><a href="https://simonwillison.net/2023/Jun/4/closed-model-training/">It’s infuriatingly hard to understand how closed models train on their input</a> Simon Willison digs into the question of whether the big closed LLMs are training their models based on users&rsquo; input. As Willison says:</p>
<blockquote>
<p>I’ve been wanting to write something reassuring about this issue for a while now. The problem is&hellip; I can’t do it. I don’t have the information I need to credibly declare these concerns unfounded, and the more I look into this the murkier it seems to get &hellip; The fundamental issue here is one of transparency. The builders of the big closed models—GPT-3, GPT-4, Google’s PaLM and PaLM 2, Anthropic’s Claude—refuse to tell us what’s in their training data.</p>
</blockquote>
<p>Ambiguous language from these big players is the norm; an unambiguous statement such as OpenAI&rsquo;s (which unfortunately applies only to paid API users, and only from March 1st onward) is a rarity:</p>
<blockquote>
<p>OpenAI does not use data submitted by customers via our API to train OpenAI models or improve OpenAI’s service offering.</p>
</blockquote>
<p>This ambiguity around closed models&rsquo; training data makes open, self-hostable models&ndash;where there is zero ambiguity about, and total control of, training data&ndash;increasingly attractive to developers and companies building LLM-based products.</p>
<p>Definitely worth a read.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>ChatGPT Code Review</title>
      <link>https://craftycto.com/micro/chatgpt-codereview/</link>
      <pubDate>Sat, 03 Jun 2023 00:00:00 +0000</pubDate>
      
      <guid>https://craftycto.com/micro/chatgpt-codereview/</guid>
      <description>Another great use: coming up to speed on an unfamiliar codebase.</description>
<content:encoded><![CDATA[<p>I&rsquo;m working with a fractional CTO client to launch a significant update of their web + mobile platform. The mobile app needs to support offline mapping, and our dev team has been fighting through some problems there. My understanding of how map rendering and offline support worked in the app was fairly high level, and given the current problems, I decided I needed to deepen my knowledge a bit. So I invoked my trusty intern ChatGPT (GPT-4 model) to help me walk through the app and understand the libraries being used, sources of map tiles, online and offline flows, etc. In about an hour, my understanding progressed from superficial to fairly deep, and the potential causes of the problem we&rsquo;re fighting became much clearer. The conversation spanned 46 pages and 10,000 words. This wasn&rsquo;t a &ldquo;saved time&rdquo; situation&ndash;rather it was &ldquo;never would have done it alone.&rdquo; A big win.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>A Day Late</title>
      <link>https://craftycto.com/micro/a-day-late/</link>
      <pubDate>Fri, 02 Jun 2023 00:00:00 +0000</pubDate>
      
      <guid>https://craftycto.com/micro/a-day-late/</guid>
      <description>What happens if I miss a day ??!!</description>
<content:encoded><![CDATA[<p>Soon after I announced that I&rsquo;m <a href="https://craftycto.com/blog/write-every-day/">writing every day and why</a>, I realized, uh oh &hellip; what happens if I miss a day? Well, that day arrived yesterday (June 2) when I had a series of unfortunate events trying to streamline my content publishing by doing Hugo site builds via Cloudflare Pages rather than locally. I spent too many hours streamlining (or trying to!) and zero hours writing. Well, it&rsquo;s a bit of a cheat, but I&rsquo;m doing yesterday&rsquo;s writing today, and I justify that because the subject is yesterday 😜. I&rsquo;ve reverted to my previous local build workflow for now, and will continue to plug away on the Cloudflare issue while keeping daily writing as a higher priority.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>OpenAI’s plans according to Sam Altman</title>
      <link>https://craftycto.com/elsewhere/openai-plans/</link>
      <pubDate>Thu, 01 Jun 2023 00:07:00 +0000</pubDate>
      
      <guid>https://craftycto.com/elsewhere/openai-plans/</guid>
      <description>Summary of Raza Habib&amp;#39;s interview last week with Sam Altman.</description>
      <content:encoded><![CDATA[<p><a href="https://humanloop.com/blog/openai-plans">OpenAI’s plans according to Sam Altman</a> Summary of Raza Habib&rsquo;s interview last week with Sam Altman. I have zero confidence that Altman is being transparent in any way here&ndash;in fact, my bet would be, assume that reality is the opposite of everything Altman says. E.g., regarding being GPU limited, this is to misdirect the competition and regulators, and not a real or important constraint. And is it just me, or does &ldquo;OpenAI will avoid competing with their customers — other than with ChatGPT&rdquo; sound a lot like Google&rsquo;s &ldquo;Don&rsquo;t be evil&rdquo; &hellip; yeah, that&rsquo;s the ticket!</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>I&#39;m an Upgrade Coward</title>
      <link>https://craftycto.com/micro/upgrade-coward/</link>
      <pubDate>Wed, 31 May 2023 00:00:00 +0000</pubDate>
      
      <guid>https://craftycto.com/micro/upgrade-coward/</guid>
      <description>Upgrading to the latest Hugo.</description>
      <content:encoded><![CDATA[<p>I&rsquo;m such a coward when it comes to applying updates that might break things. Today was Upgrade To The Latest Hugo (0.112.5) and Upgrade To The New Tailwind-CSS-With-Hugo Approach. I was terrified and had to lean on poor <a href="https://www.brycewray.com/">Bryce Wray</a> over on the Hugo Discord for moral support. I finally worked up my courage &hellip; and typed</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-shell" data-lang="shell"><span style="display:flex;"><span>brew upgrade hugo
</span></span></code></pre></div><p>and <strong>10 seconds later</strong> I was running on Hugo 0.112.5, with the old Tailwind approach. 10 more minutes and the new Tailwind approach was in place. I&rsquo;ll take this opportunity to thank Bryce as well as all the folks contributing to make <a href="https://gohugo.io/">Hugo</a> and <a href="https://tailwindcss.com/">Tailwind CSS</a> awesome.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Lawyer cites fake cases invented by ChatGPT, judge is not amused</title>
      <link>https://craftycto.com/elsewhere/simon-deep-dives-fake-cases/</link>
      <pubDate>Tue, 30 May 2023 16:07:00 +0000</pubDate>
      
      <guid>https://craftycto.com/elsewhere/simon-deep-dives-fake-cases/</guid>
      <description>Simon Willison does a deep review of the ChatGPT aspects of the recent case where a lawyer submitted ChatGPT-hallucinated case law.</description>
<content:encoded><![CDATA[<p><a href="https://simonwillison.net/2023/May/27/lawyer-chatgpt/">Lawyer cites fake cases invented by ChatGPT, judge is not amused</a> Simon Willison does a deep review of the ChatGPT aspects of the recent case where a lawyer submitted ChatGPT-hallucinated case law. The story starts out as a standard ChatGPT hallucination, but gets crazier when ChatGPT doubles down after being asked to confirm whether the cases were real. (I actually ran the confirmation prompt Willison used, and it <a href="https://chat.openai.com/share/e0a2a287-108d-4c49-9c95-d955fc5d5345">worked for me</a>.)</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Camus on True Knowledge (From Myth of Sisyphus, 1955)</title>
      <link>https://craftycto.com/quote/camus-knowledge/</link>
      <pubDate>Tue, 30 May 2023 00:00:00 +0000</pubDate>
      
      <guid>https://craftycto.com/quote/camus-knowledge/</guid>
      <description>With the exception of professional rationalists, people today despair of true knowledge.</description>
      <content:encoded><![CDATA[<blockquote>
<p>With the exception of professional rationalists, people today despair of true knowledge.</p>
</blockquote>
<p>&ndash; <a href="https://plato.stanford.edu/entries/camus/">Albert Camus, Myth of Sisyphus, 1955</a> <span class="ml-3">May 30, 2023</span>
<span class="ml-1"><a href="https://craftycto.com/quote/camus-knowledge/">#</a></span>
</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>How to Write Every Day—and Why</title>
      <link>https://craftycto.com/blog/write-every-day/</link>
      <pubDate>Tue, 30 May 2023 00:00:00 +0000</pubDate>
      
      <guid>https://craftycto.com/blog/write-every-day/</guid>
      <description>With help from others, I made the transition from painful-episodic to enjoyable-daily writing.</description>
      <content:encoded><![CDATA[<p>I had been struggling with wanting to write at least weekly, but failing miserably&ndash;monthly at best. The pressure-to-write would build up and up and up, and finally I would put everything else aside to spasm out a <a href="https://craftycto.com/blog/the-emergence-s1e4/">long piece</a>. It was not pleasant, but at least I was writing again.</p>
<p>Then I came across Matthias Ott&rsquo;s commentary <a href="https://matthiasott.com/notes/writing-fragments-and-the-memex-method">Writing, Fragments, and the Memex Method</a>, which led me to Cory Doctorow&rsquo;s piece <a href="https://doctorow.medium.com/the-memex-method-238c71f2fb46">The Memex Method</a>. These resonated amazingly, with Ott&rsquo;s piece helping me better appreciate Doctorow&rsquo;s. Read them both!</p>
<h2 id="daily-writing-isnt-just-about-frequency">Daily Writing Isn&rsquo;t Just About Frequency</h2>
<p>I&rsquo;ve always thought about daily writing as being all about &ldquo;building muscle memory,&rdquo; but Doctorow and Ott enlightened me&ndash;that&rsquo;s the least of its benefits. As Ott says,</p>
<blockquote>
<p>I already knew that blogging – and having a personal website in general – is a superpower &hellip;.</p>
<p>I had heard before of Vannevar Bush’s groundbreaking essay “As We May Think” that directly inspired the invention of hypertext by Ted Nelson and Douglas Engelbart, which, in turn, had a huge influence on the WorldWideWeb project.</p>
<p>I also knew that a blog lets you document your thoughts and impressions in what can become an archive for your notes and a record of your experiences over time.</p>
<p>I knew that by sitting down and writing, your writing will inevitably improve.</p>
<p>And, I knew that some people are bad at writing regularly and that others – and that’s why we like to call them “writers” – aren’t.</p>
<p><strong>What Cory describes in his post, however, goes far beyond those individual ideas. He combines them into an approach to blogging that is maybe the most compelling and empowering reason to have a blog.</strong></p>
</blockquote>
<h2 id="short-form-content-fragments-as-writers-notebook">Short Form Content (Fragments) as Writer&rsquo;s Notebook</h2>
<p>I had already been thinking about writing shorter pieces as part of trying to get on a quicker cadence&ndash;it&rsquo;s easier to hit a daily cadence if quick, short pieces are an option.</p>
<p>But of course, short content published daily is so much more than that, as Doctorow makes clear. He explains that daily short-form writing is analogous to the writer&rsquo;s pocket notebook, connects that to the original &ldquo;web log&rdquo; concept, and finally ties it in with Vannevar Bush&rsquo;s &ldquo;memex&rdquo; concept from the 1945 paper <a href="https://www.w3.org/History/1945/vbush/">As We May Think</a>. Distilling a few of Doctorow&rsquo;s key points:</p>
<ul>
<li>Ideas that are published have much more power than ideas that I just jotted down. I&rsquo;m assembling my thoughts in a coherent enough manner to share them, and I&rsquo;m signing my name to them, putting them out for comment and critique.  With a daily writing routine, I&rsquo;m largely &ldquo;web logging&rdquo; everything important that I&rsquo;m thinking about.</li>
<li>If you build a large collection of small published ideas, they tend to self-assemble into larger, more important essay-scale work. Doctorow calls it &ldquo;nucleation in a supersaturated solution&rdquo;&ndash;each small idea constantly triggers our minds: &ldquo;this is a piece of something bigger, and maybe something important.&rdquo; Doctorow:
<blockquote>
<p>Every now and again, a few of these fragments will stick to each other and nucleate, crystallizing a substantial, synthetic analysis out of all of those bits and pieces I’ve salted into that solution of potential sources of inspiration.</p>
</blockquote>
</li>
</ul>
<p>For me, the huge win is, instead of cathartic agony once a month trying to gather my thoughts and figure out what to write about, all I have to do is write daily, usually small pieces, and allow the larger essays to emerge naturally. I suspect that the quality and power of the writing will be significantly higher also&ndash;we&rsquo;ll see.</p>
<h2 id="and-posse-too">And POSSE Too</h2>
<p>Doctorow has <a href="https://pluralistic.net/2023/02/19/drei-drei-drei/#now-we-are-three">influenced me</a> in other ways, among them <a href="https://indieweb.org/POSSE">introducing me to IndieWeb and their POSSE principles</a>.</p>
<blockquote>
<p>POSSE is an abbreviation for Publish (on your) Own Site, Syndicate Elsewhere, the practice of posting content on your own site first, then publishing copies or sharing links to third parties (like social media silos) with original post links to provide viewers a path to directly interacting with your content.</p>
</blockquote>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/write-every-day/posse-horseback.webp"><picture>
    <source type="image/webp" srcset="/blog/write-every-day/posse-horseback_hu9580471659413858179.webp 320w, /blog/write-every-day/posse-horseback_hu16753129581769204402.webp 640w, /blog/write-every-day/posse-horseback_hu14378536491024196875.webp 768w, /blog/write-every-day/posse-horseback_hu5938767922620995555.webp 1024w, /blog/write-every-day/posse-horseback_hu6464012124513642448.webp 1280w, /blog/write-every-day/posse-horseback_hu8490886158146895877.webp 1536w, /blog/write-every-day/posse-horseback_hu12504755710222769861.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/write-every-day/posse-horseback_hu835606990935249644.jpg 320w, /blog/write-every-day/posse-horseback_hu8890715502320664403.jpg 640w, /blog/write-every-day/posse-horseback_hu11136035651622601891.jpg 768w, /blog/write-every-day/posse-horseback_hu9212588840162441755.jpg 1024w, /blog/write-every-day/posse-horseback_hu13263852104240826938.jpg 1280w, /blog/write-every-day/posse-horseback_hu8631803359500308378.jpg 1536w, /blog/write-every-day/posse-horseback_hu4007297382506505095.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/write-every-day/posse-horseback_hu8890715502320664403.jpg" width="3072" height="1722" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Early this year I set up the new <a href="https://craftycto.com/">company-website-plus-blog</a>, and took a relatively POSSE-compliant approach, leveraging the <a href="https://gohugo.io/">Hugo</a> open-source static site generation framework, with content being simple markdown, my CMS being Git / GitHub, and my hosting platform being &ldquo;any of the players that let me host static content under a free plan&rdquo;, currently Cloudflare Pages, though I could change that with minimal work. I gained control of company content, as well as the long-form essays I was posting to the blog.</p>
<p>I had a major POSSE leak, however, when it came to short-form content. I post on LinkedIn to help get visibility for myself and Crafty, and all that content was basically throw-away. (LinkedIn does, actually, throw away my content, as I found when I tried to get back to some work from 2019.) Likewise, some of my posting on Mastodon is similar in nature (I pulled the eject handle on Twitter back in November), and again ended up being throw-away, in the sense that my Mastodon server might disappear without me having a chance to get my content out. (The Mastodon community is working on this, and the situation is so much better than with Twitter, where the absolute worst has already occurred, at least until it gets worser.) Partly because of the siloed, throw-away nature of self-contained posting to LinkedIn and Mastodon, I&rsquo;ve been only sporadically doing these short posts&ndash;exactly wrong when one is trying to get known and build a following.</p>
<p>So along with adopting a daily writing habit, I decided to plug my POSSE leaks.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/write-every-day/posse-diagram.webp"><picture>
    <source type="image/webp" srcset="/blog/write-every-day/posse-diagram_hu122467931589784510.webp 320w, /blog/write-every-day/posse-diagram_hu11852565058045550302.webp 640w, /blog/write-every-day/posse-diagram_hu8900862033246046279.webp 768w, /blog/write-every-day/posse-diagram_hu11545757661337572587.webp 1024w, /blog/write-every-day/posse-diagram_hu13687724156443258941.webp 1280w, /blog/write-every-day/posse-diagram_hu1450951089317904038.webp 1536w, /blog/write-every-day/posse-diagram_hu12605364163981941064.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/write-every-day/posse-diagram_hu12505881397624783745.jpg 320w, /blog/write-every-day/posse-diagram_hu1992572982393577792.jpg 640w, /blog/write-every-day/posse-diagram_hu13576019152882345294.jpg 768w, /blog/write-every-day/posse-diagram_hu16332625264209337346.jpg 1024w, /blog/write-every-day/posse-diagram_hu2589302820309254036.jpg 1280w, /blog/write-every-day/posse-diagram_hu16632239291705074293.jpg 1536w, /blog/write-every-day/posse-diagram_hu4285086159690274402.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/write-every-day/posse-diagram_hu1992572982393577792.jpg" width="3072" height="3072" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="updating-craftys-illustratedhttpscraftyctocomblog-for-daily-writing-and-posse">Updating <a href="https://craftycto.com/blog">Crafty&rsquo;s Illustrated</a> for Daily Writing and POSSE</h2>
<p>I use the <a href="https://github.com/reorx/hugo-PaperModX">PaperModX theme</a> in Hugo, which is of course a fork of PaperMod, which itself is a fork of Paper. PaperModX is well suited for &ldquo;normal&rdquo; size blog posts, but less so for microblogging. Hugo itself, though, is a blank slate that grants me absolute control over what gets served up to a browser&ndash;it&rsquo;s just a matter of tweaking it, in my case by pulling some elements out of the theme and customizing them.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/write-every-day/craftys-illustrated-micro2.webp"><picture>
    <source type="image/webp" srcset="/blog/write-every-day/craftys-illustrated-micro2_hu4697887449161856042.webp 320w, /blog/write-every-day/craftys-illustrated-micro2_hu5940880835405262754.webp 640w, /blog/write-every-day/craftys-illustrated-micro2_hu5258822419090476327.webp 768w, /blog/write-every-day/craftys-illustrated-micro2_hu6841193641043086400.webp 1024w, /blog/write-every-day/craftys-illustrated-micro2_hu16837334528615687406.webp 1280w, /blog/write-every-day/craftys-illustrated-micro2_hu8430209015247071417.webp 1536w, /blog/write-every-day/craftys-illustrated-micro2_hu15546965074877666218.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/write-every-day/craftys-illustrated-micro2_hu9764252502576838047.jpg 320w, /blog/write-every-day/craftys-illustrated-micro2_hu676786963775423600.jpg 640w, /blog/write-every-day/craftys-illustrated-micro2_hu140816135326024317.jpg 768w, /blog/write-every-day/craftys-illustrated-micro2_hu9078901713053423383.jpg 1024w, /blog/write-every-day/craftys-illustrated-micro2_hu18230379688668265619.jpg 1280w, /blog/write-every-day/craftys-illustrated-micro2_hu1634363432565288775.jpg 1536w, /blog/write-every-day/craftys-illustrated-micro2_hu16829318549240990938.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/write-every-day/craftys-illustrated-micro2_hu676786963775423600.jpg" width="2698" height="1818" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>I&rsquo;ve mentioned that generative AI tools like ChatGPT and MidJourney allow me to be more ambitious with my projects than I would otherwise have been. I would have hesitated if I had to devote a work week or more to enabling microblogging, but ChatGPT GPT-4 edition cut that task down enough that I was able to complete it in spare hours over the holiday weekend. I liked <a href="https://simonwillison.net/">Simon Willison&rsquo;s approach</a> of having several flavors of micro content, and ended up implementing four:</p>
<ul>
<li><a href="https://craftycto.com/elsewhere/mimestream/">Elsewhere</a>: A link to someone else&rsquo;s content, and why I felt it was noteworthy</li>
<li><a href="https://craftycto.com/quote/camus-knowledge/">Quote</a>: A quotation with attribution and links to the author</li>
<li><a href="https://craftycto.com/imagery/steamnoble/">Imagery</a>: Being a generative and geometric artwork fan, I love sharing others&rsquo; interesting creations, and this is a special micro format for exactly this purpose.</li>
<li><a href="https://craftycto.com/micro/chatgpt-record/">Micro</a>: A small thought that doesn&rsquo;t fit neatly as one of the above.</li>
</ul>
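<p>Under the hood, each of these flavors is just a Hugo content section with its own archetype. Here&rsquo;s a minimal sketch of scaffolding one such type, assuming Hugo&rsquo;s standard archetype mechanism; the section and file names are illustrative, not my actual site layout:</p>

```shell
# Sketch of one Hugo micro content type (names are illustrative).
# Hugo applies the archetype whose name matches the content section,
# so each micro flavor can carry its own default front matter.
mkdir -p content/micro archetypes

# Archetype: default front matter applied to every new "micro" entry.
cat > archetypes/micro.md <<'EOF'
---
title: "{{ replace .Name "-" " " | title }}"
date: {{ .Date }}
draft: true
---
EOF

# With Hugo installed, a new entry would then be created with:
#   hugo new micro/chatgpt-record.md
cat archetypes/micro.md
```

<p>A quote or imagery type would get its own archetype in the same way, perhaps with an extra front-matter field (an <code>attribution</code> field, say) that a section-specific template then renders.</p>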
<h2 id="plugging-the-linkedin-and-mastodon-posse-leaks">Plugging the LinkedIn and Mastodon POSSE Leaks</h2>
<p>From now on, any significant small idea gets done first on Crafty&rsquo;s Illustrated as micro content using these new microformats. Then, when it comes time for the Syndicate Everywhere phase of POSSE, I&rsquo;ll do a quick paste-edit of the content into the external platform, always with a link back to the permanent, definitive page on Crafty&rsquo;s. Here&rsquo;s a recent <a href="https://www.linkedin.com/posts/jackivers_chatgpt-gpt4-posse-activity-7068950018077962240-2nAZ?utm_source=share&amp;utm_medium=member_desktop">LinkedIn post version</a> of the micro entry linked above.</p>
<p>So I am now some level of POSSE-compliant&ndash;<a href="https://indieweb.org/indiemark-checklist">I&rsquo;m not kidding!</a>&ndash;and writing daily! Hurray!</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Why I Use Mimestream for Gmail</title>
      <link>https://craftycto.com/elsewhere/mimestream/</link>
      <pubDate>Mon, 29 May 2023 16:46:00 +0000</pubDate>
      
      <guid>https://craftycto.com/elsewhere/mimestream/</guid>
      <description>Mimestream has me reconsidering Apple Mail as my daily driver.</description>
      <content:encoded><![CDATA[<p><a href="https://tidbits.com/2023/05/24/why-i-use-mimestream-for-gmail/">Why I Use Mimestream for Gmail</a>. Based on what I&rsquo;m reading about Mimestream, I&rsquo;m reconsidering my attitude that &ldquo;I need a mail app that works with more than just Gmail,&rdquo; because when I did an inventory and weighed importance as a factor, 95% of my important email interactions are from Gmail accounts. I&rsquo;m thinking I&rsquo;d use Apple Mail for non-Gmail and Mimestream for Gmail (Mac only for now). I&rsquo;m currently testing Mimestream on their free trial. Via <a href="https://daringfireball.net/linked/2023/05/29/engst-mimestream">Daring Fireball</a>.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>CLI tools for working with ChatGPT and other LLMs</title>
      <link>https://craftycto.com/elsewhere/simon-cli-for-llm/</link>
      <pubDate>Mon, 29 May 2023 16:07:00 +0000</pubDate>
      
      <guid>https://craftycto.com/elsewhere/simon-cli-for-llm/</guid>
      <description>I installed Simon&amp;#39;s llm, ttok, and strip-tags CLI tools and got them working, great stuff.</description>
      <content:encoded><![CDATA[<p><a href="https://simonwillison.net/2023/May/18/cli-tools-for-llms/">CLI tools for working with ChatGPT and other LLMs</a>. I installed Simon&rsquo;s llm, ttok, and strip-tags CLI tools and got them working, great stuff. Haven&rsquo;t used them yet beyond running variations of his examples, but glad to have this toolkit. Also grabbed shot-scraper, I know that will come in handy. And installed pipx, very cool.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Working With My ChatGPT Intern</title>
      <link>https://craftycto.com/micro/chatgpt-record/</link>
      <pubDate>Mon, 29 May 2023 00:00:00 +0000</pubDate>
      
      <guid>https://craftycto.com/micro/chatgpt-record/</guid>
      <description>Yesterday was a new record.</description>
      <content:encoded><![CDATA[<p>I set a record yesterday in terms of the most hours co-developing with ChatGPT (GPT-4 default model mostly). I made a decision recently to a) start writing daily, and b) follow <a href="https://indieweb.org/POSSE">POSSE</a>&ndash;Publish (on your) Own Site, Syndicate Elsewhere&ndash;to keep control of my content. To enable a) and b), I expanded my blog, Crafty&rsquo;s Illustrated, to support four new microblogging content types (like this one!).  My ChatGPT intern helped with: Hugo templating, Tailwind CSS, shell scripts, how to resolve a Git merge issue, and more.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>The Emergence</title>
      <link>https://craftycto.com/blog/the-emergence-s1e4/</link>
      <pubDate>Tue, 18 Apr 2023 00:00:00 +0000</pubDate>
      
      <guid>https://craftycto.com/blog/the-emergence-s1e4/</guid>
      <description>Season 1, Episode 4</description>
      <content:encoded><![CDATA[<p>Here we go again. My last <a href="/blog/march-madness-ai/">AI post</a> hit the wire 31 days ago, after a week of agonizing over what to focus on, with Something Really Important seeming to land every other day and completely blow up my plans. One of my go-to AI trackers / prototypers is <a href="https://simonwillison.net/">Simon Willison</a>, creator of the <a href="https://datasette.io/">Datasette</a> open source project. In a recent conversation, we laughed about how tough it is to stay up with the AI Emergence, and I pointed out that it&rsquo;s almost a full time job just keeping up with <strong>him</strong>, let alone everything else happening in the AI world. (Willison is prolific, to no small degree, because he&rsquo;s skilled at applying tools like ChatGPT.)</p>
<p>One of my CS professors was Roger Schank, who just passed away in January. Schank was &ldquo;<a href="https://www.mccormick.northwestern.edu/computer-science/news-events/news/articles/2023/professor-emeritus-roger-schank-passes-away.html">a foundational pioneer in the fields of artificial intelligence, cognitive science, and learning sciences</a>,&rdquo; truly one of the original AI visionaries. But in a sense, Schank was ~43 years too early. I may sigh about the difficulties of staying current, but I&rsquo;m also exhilarated to be here, able to observe and participate in The Emergence.</p>
<p>In any case&mdash;I&rsquo;ve again passed through another week-long Period of Agonizing, and this is now my fourth AI post. The material doesn&rsquo;t seem to be drying up, so I&rsquo;m just going to run with it and start using Netflix style season-episode numbering. So <em>Welcome To S1E4.</em></p>
<h2 id="what-are-people-sleeping-on-">What are people &ldquo;sleeping on&rdquo; &hellip;</h2>
<p>I&rsquo;m borrowing the &ldquo;sleeping on&rdquo; reference from an interview Willison did on the <a href="https://changelog.com/podcast/534">Changelog podcast</a>, where <a href="https://simonwillison.net/2023/Apr/8/llms-break-the-internet/">he said</a>:</p>
<blockquote>
<p>This is the thing I worry that people are sleeping on. People who think &rsquo;these language models lie to you all the time&rsquo; (which they do) and &rsquo;they will produce buggy code with security holes&rsquo;—every single complaint about these things is true, and yet, despite all of that, the productivity benefits you get if you lean into them and say OK, how do I work with something that’s completely unreliable, that invents things, that comes up with APIs that don’t exist… how do I use that to enhance my workflow anyway?</p>
<p>And the answer is that you can get enormous leaps ahead in productivity and in the ambition of the kinds of projects that you take on, if you can accept both things are true at once: it can be flawed, and lying, and have all of these problems… and it can also be a massive productivity boost.</p>
</blockquote>
<p>Question: &ldquo;When will this AI stuff actually be useful?&rdquo;; Willison&rsquo;s answer, &ldquo;It already is!&rdquo;</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/the-emergence-s1e4/sleep-on-it.webp"><picture>
    <source type="image/webp" srcset="/blog/the-emergence-s1e4/sleep-on-it_hu4947230406145593058.webp 320w, /blog/the-emergence-s1e4/sleep-on-it_hu1578911698777330376.webp 640w, /blog/the-emergence-s1e4/sleep-on-it_hu10577857880900582287.webp 768w, /blog/the-emergence-s1e4/sleep-on-it_hu7499491583982562612.webp 1024w, /blog/the-emergence-s1e4/sleep-on-it_hu11884219021568570520.webp 1280w, /blog/the-emergence-s1e4/sleep-on-it_hu1231046934010273020.webp 1536w, /blog/the-emergence-s1e4/sleep-on-it_hu17967028390783648766.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/the-emergence-s1e4/sleep-on-it_hu18144638558165401117.jpg 320w, /blog/the-emergence-s1e4/sleep-on-it_hu1024308661957932387.jpg 640w, /blog/the-emergence-s1e4/sleep-on-it_hu6930150059648306310.jpg 768w, /blog/the-emergence-s1e4/sleep-on-it_hu11901659871707052608.jpg 1024w, /blog/the-emergence-s1e4/sleep-on-it_hu2312936350436260677.jpg 1280w, /blog/the-emergence-s1e4/sleep-on-it_hu1615273840234315721.jpg 1536w, /blog/the-emergence-s1e4/sleep-on-it_hu3674081075124395094.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/the-emergence-s1e4/sleep-on-it_hu1024308661957932387.jpg" width="3072" height="1722" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Over on <a href="https://www.reddit.com/r/MachineLearning/">Reddit r/MachineLearning</a>, someone asked <a href="https://www.reddit.com/r/MachineLearning/comments/12dz4hh/d_is_all_the_talk_about_what_gpt_can_do_on/">Is all the talk about what GPT can do on Twitter and Reddit exaggerated or fairly accurate?</a> I was struck by this response (since deleted):</p>
<blockquote>
<p>Once this thing starts accellerating [sic], we won&rsquo;t be having these discussions. That moment passed with GPT-4 and Bing chat starting using it. It&rsquo;s just not readily visible to the populace and even to most decisionmakers, yet.</p>
</blockquote>
<p>Question: &ldquo;Is this AI stuff for real?&rdquo;; answer, &ldquo;Dude, we&rsquo;re already WAY past that question.&rdquo;</p>
<p>Yet not everyone is seeing it&mdash;hence Willison&rsquo;s concern that people are sleeping on it. Why? One factor, I think, is <a href="https://web3isgoinggreat.com/">Crypto Rash</a>. Today&rsquo;s AI hype sounds a lot like yesterday&rsquo;s crypto hype; and crypto turned out to be a steaming pile of crap, a solution looking for a non-criminal problem to solve. Hearing similar hyperbole about AI&mdash;and seeing a lot of VC money pivoting away from crypto and chasing AI instead&mdash;I get it that people smell another hype bubble. Once burned, twice shy.</p>
<p>But &hellip; AI&rsquo;s not like crypto. Drill down on crypto? Find nothing&mdash;vacuum. Drill down on the current AI emergence? Find so much that, like me, you&rsquo;ll have trouble even keeping up. We&rsquo;re already past the &ldquo;this looks promising&rdquo; milestone; we&rsquo;re at &ldquo;this is delivering right now, today, this minute&rdquo; and, even more telling, &ldquo;yes, I&rsquo;ll pay for it!&rdquo; And the progress curve continues to look exponential.</p>
<h2 id="caught-in-the-churn">Caught in the churn</h2>
<p>If we really are at the early stages of an emergence, what are the implications? How does this change our day-to-day activities? How should we rethink our decision-making, how we run our businesses?</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/the-emergence-s1e4/burton-rules-change.webp"><picture>
    <source type="image/webp" srcset="/blog/the-emergence-s1e4/burton-rules-change_hu6108078680541693292.webp 320w, /blog/the-emergence-s1e4/burton-rules-change_hu6378403509413156895.webp 640w, /blog/the-emergence-s1e4/burton-rules-change_hu16401078049402342804.webp 768w, /blog/the-emergence-s1e4/burton-rules-change_hu13549293775889544256.webp 1024w, /blog/the-emergence-s1e4/burton-rules-change_hu14206890191580159208.webp 1280w, /blog/the-emergence-s1e4/burton-rules-change_hu14726296951241160137.webp 1536w, /blog/the-emergence-s1e4/burton-rules-change_hu17817850301414091670.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/the-emergence-s1e4/burton-rules-change_hu3951866441314033118.jpg 320w, /blog/the-emergence-s1e4/burton-rules-change_hu1615703826555656181.jpg 640w, /blog/the-emergence-s1e4/burton-rules-change_hu15443546029097566416.jpg 768w, /blog/the-emergence-s1e4/burton-rules-change_hu2644463232503940632.jpg 1024w, /blog/the-emergence-s1e4/burton-rules-change_hu11525331592182084888.jpg 1280w, /blog/the-emergence-s1e4/burton-rules-change_hu9141368066000324507.jpg 1536w, /blog/the-emergence-s1e4/burton-rules-change_hu11740649402787301549.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/the-emergence-s1e4/burton-rules-change_hu1615703826555656181.jpg" width="3072" height="1722" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>There&rsquo;s an applicable concept from <a href="https://www.jamessacorey.com/">James S.A. Corey</a>&rsquo;s fantastic sci-fi series <a href="https://en.wikipedia.org/wiki/The_Expanse_(TV_series)">The Expanse</a> known as &ldquo;the churn.&rdquo; Here&rsquo;s a summary from <a href="https://www.phind.com/">phind.com</a>, a ChatGPT-backed web search similar to (but better than, in my opinion) Bing Chat:</p>
<blockquote>
<p>In the Expanse series, &ldquo;the churn&rdquo; is a term used to describe the cycle of chaos, change, and upheaval that the characters face due to various factors such as political, social, or criminal turmoil. The concept of the churn is particularly significant for the character Amos Burton, who grew up in a crime-ridden Baltimore where survival was a constant struggle (app.thestorygraph.com).</p>
<p>The churn can be seen as a metaphor for the unpredictable nature of life and the constant state of flux that people experience. It represents the transitional periods during which stability and order are disrupted, and individuals must adapt or face the consequences. In the context of the Expanse series, the churn is often associated with the broader political and social upheaval occurring throughout the solar system, as well as the personal struggles and challenges faced by the characters.</p>
</blockquote>
<p>In a great scene from the TV series, the Amos Burton character explains it himself:</p>
<blockquote>
<p>Kenzo: It must be nice, having everything figured out like that.</p>
<p>Amos: Ain’t nothing to do with me: we’re just caught in the Churn, that’s all.</p>
<p>Kenzo: I have no idea what you just said.</p>
<p>Amos: This boss I used to work for in Baltimore, he called it the Churn. When the rules of the game change.</p>
<p>Kenzo: What game?</p>
<p>Amos: The only game. Survival. When the jungle tears itself down and builds itself into something new.</p>
</blockquote>
<p>If you&rsquo;ve read or watched The Expanse, you know that Corey&rsquo;s churn is extreme&mdash;and that&rsquo;s not what I anticipate from our AI emergence. But &ldquo;churn&rdquo; captures the dynamic&mdash;as we go about our daily tasks, run our businesses, plan and strategize&mdash;beneath our feet, the world will be churning away, reshaping itself.</p>
<p>A case in point: you&rsquo;re familiar with the stock photo business? Well, that industry has been terminated. On <a href="https://www.reddit.com/r/midjourney/">Reddit r/Midjourney</a>, this <a href="https://www.reddit.com/r/midjourney/comments/12ekm1s/gpt4_midjourney_v5_upscayl/">post outlines</a> how it&rsquo;s now possible to use ChatGPT&rsquo;s newest GPT-4 model to generate great Midjourney text prompts, which in turn renders <a href="https://preview.redd.it/q9h8jo5kmgsa1.jpg?width=5824&amp;format=pjpg&amp;auto=webp&amp;v=enabled&amp;s=db230f7f51aa5edba79cf70c661e2dcef3e5f78a">jaw-dropping images</a>. In the comments, several users immediately focus on the implications for the stock photo industry:</p>
<div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/the-emergence-s1e4/reddit-midjourney-stockphoto.webp"><picture>
    <source type="image/webp" srcset="/blog/the-emergence-s1e4/reddit-midjourney-stockphoto_hu9485093185320356286.webp 320w, /blog/the-emergence-s1e4/reddit-midjourney-stockphoto_hu3539326657815373732.webp 640w, /blog/the-emergence-s1e4/reddit-midjourney-stockphoto_hu4561622448017267369.webp 768w, /blog/the-emergence-s1e4/reddit-midjourney-stockphoto_hu2254710180030630518.webp 1024w, /blog/the-emergence-s1e4/reddit-midjourney-stockphoto_hu8741467489160699367.webp 1280w, /blog/the-emergence-s1e4/reddit-midjourney-stockphoto_hu16181950402467519070.webp 1536w, /blog/the-emergence-s1e4/reddit-midjourney-stockphoto_hu7691721653189399594.webp 1920w" sizes="75vw" />
    <source type="image/jpeg" srcset="/blog/the-emergence-s1e4/reddit-midjourney-stockphoto_hu5052487503645508374.jpg 320w, /blog/the-emergence-s1e4/reddit-midjourney-stockphoto_hu7405359694884955156.jpg 640w, /blog/the-emergence-s1e4/reddit-midjourney-stockphoto_hu13899900577300500710.jpg 768w, /blog/the-emergence-s1e4/reddit-midjourney-stockphoto_hu10755970393056965681.jpg 1024w, /blog/the-emergence-s1e4/reddit-midjourney-stockphoto_hu16327989222001536830.jpg 1280w, /blog/the-emergence-s1e4/reddit-midjourney-stockphoto_hu2148555658512115548.jpg 1536w, /blog/the-emergence-s1e4/reddit-midjourney-stockphoto_hu11219811771098600582.jpg 1920w" sizes="75vw" />
    
      <img src="/blog/the-emergence-s1e4/reddit-midjourney-stockphoto_hu7405359694884955156.jpg" width="2048" height="1650" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Note, again, this is not &ldquo;at some point we&rsquo;ll be able to move off iStock&rdquo;&mdash;but rather &ldquo;we cancelled our iStock subscription last month.&rdquo; And cost savings isn&rsquo;t the only driver. Since each Midjourney prompt creates new, unique imagery, it eliminates an entire manual step in the workflow&mdash;the need to search around for where else a stock image might have been (over-)used.</p>
<p>The churn isn&rsquo;t limited to smallish segments like stock photos, either. <a href="https://news.ycombinator.com/item?id=35537813">Meta is reeling</a>, having bet heavily on &ldquo;the metaverse.&rdquo; <a href="https://arstechnica.com/gadgets/2023/04/panic-at-google-samsung-considers-dumping-search-for-bing-and-chatgpt/?utm_source=pocket_saves">Google is reeling</a>, at risk of losing the Google search franchise to upstart AI competitors. <a href="/blog/march-madness-ai/#meanwhile-at-apple-">Apple&rsquo;s is a mixed bag</a>, with strength in ML-enabled silicon but weakness in AI software, particularly Siri&rsquo;s grave limitations and snail&rsquo;s-pace evolution.</p>
<h2 id="surfing-the-churn">Surfing the churn</h2>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/the-emergence-s1e4/burton-surfs-2.webp"><picture>
    <source type="image/webp" srcset="/blog/the-emergence-s1e4/burton-surfs-2_hu17178019108203971305.webp 320w, /blog/the-emergence-s1e4/burton-surfs-2_hu15821872543518712876.webp 640w, /blog/the-emergence-s1e4/burton-surfs-2_hu10530766552210682954.webp 768w, /blog/the-emergence-s1e4/burton-surfs-2_hu5441016429287962041.webp 1024w, /blog/the-emergence-s1e4/burton-surfs-2_hu7542343894808312705.webp 1280w, /blog/the-emergence-s1e4/burton-surfs-2_hu6841810202726336176.webp 1536w, /blog/the-emergence-s1e4/burton-surfs-2_hu11912050991409644906.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/the-emergence-s1e4/burton-surfs-2_hu1232374574305198177.jpg 320w, /blog/the-emergence-s1e4/burton-surfs-2_hu4492205494378619101.jpg 640w, /blog/the-emergence-s1e4/burton-surfs-2_hu17468387124177396369.jpg 768w, /blog/the-emergence-s1e4/burton-surfs-2_hu13162005712793873395.jpg 1024w, /blog/the-emergence-s1e4/burton-surfs-2_hu1767064151475530128.jpg 1280w, /blog/the-emergence-s1e4/burton-surfs-2_hu7713387271640065127.jpg 1536w, /blog/the-emergence-s1e4/burton-surfs-2_hu9809965064858375184.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/the-emergence-s1e4/burton-surfs-2_hu4492205494378619101.jpg" width="3072" height="1722" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>So if we&rsquo;re in the churn, if the jungle is indeed tearing itself down and building itself into something new, how do we function, how do we run a business? A good image to keep in mind, I think, is &ldquo;surfing the churn.&rdquo; We can&rsquo;t predict, let alone control, how The Emergence will play out. But we can prepare ourselves to operate in a highly fluid environment&mdash;make ourselves ready to surf the churn. A few thoughts on how:</p>
<ul>
<li><strong>Don&rsquo;t sleep on this.</strong> Recognize that something unprecedented is happening and that the alternative to surfing the churn is the churn surfing you. Prioritize accordingly.</li>
<li><strong>Use AI yourself.</strong> Every knowledge worker can leverage AI tools like ChatGPT, right now, to claw back a significant percentage of their work week. For example, I recently talked with the CEO of a biotech investment fund. As a field, biotech has been a leader in applying machine learning (think protein folding), and it&rsquo;s not clear whether generative AI tools will contribute much to the core business of biotech. But every knowledge worker in a biotech business, including that CEO, is likely to find 10+ hours every week if they actively, intentionally scan through their tasks and apply AI where it makes sense.</li>
<li><strong>War game AI for your business&mdash;offensively and defensively.</strong> Where can we apply what&rsquo;s emerging to make our product better, increase our agility and effectiveness, reduce costs? Conversely, what AI moves might a competitor make that could put us in a world of hurt?</li>
<li><strong>Stay up on what&rsquo;s emerging.</strong> If you&rsquo;re the CEO and not deeply technical, having a CTO or <a href="/about/">fractional CTO</a> who can understand emerging AI and translate that raw material into &ldquo;most potentially impactful for us&rdquo; is a great option, and flows smoothly into the next point.</li>
<li><strong>Get your feet wet with prototypes.</strong> Generative AI is your friend in applying generative AI &hellip; the cost of cranking out a prototype for a promising idea&mdash;now that we have ChatGPT and Copilot to write a lot of the code&mdash;is lower than ever. It&rsquo;s a no-brainer to make this investment, and there&rsquo;s little chance you&rsquo;ll get to a great idea (or even achieve a decent understanding of what might be possible) without prototyping 10 less-than-great ideas first.</li>
</ul>
<p>A footnote to all this dire churn and war talk: I&rsquo;m optimistic about The Emergence. As I pointed out in <a href="/blog/alarmed-about-ai/">Season 1 Episode 2</a>, I&rsquo;m a believer in the wisdom of <a href="https://en.wikipedia.org/wiki/John_Seely_Brown">John Seely Brown</a>, former Chief Scientist at Xerox PARC, who <a href="https://web.cs.ucdavis.edu/~koehl/Teaching/ECS188_W16/Reprints/Response_to_BillJoy.pdf">pushed back</a> against the predictions of singularity doom by pointing out, simply, that humans adapt. If we know that we&rsquo;re in the churn, we can adapt, survive and maybe even thrive surfing the churn. But only if we&rsquo;re awake to what&rsquo;s happening.</p>
<h2 id="emergences-over-the-last-31-days">Emergences over the last 31 days</h2>
<ul>
<li>The <a href="https://ai.facebook.com/research/publications/segment-anything/">Segment Anything Model</a>, out of Meta AI, is pretty amazing, as is the work that computer vision firm <a href="https://roboflow.com/">Roboflow</a> is doing with it. In a nutshell, this is &ldquo;generative AI meets computer vision&rdquo; and if you&rsquo;re intrigued, first listen to this <a href="https://www.latent.space/p/segment-anything-roboflow#details">Latent Space podcast</a> featuring Joseph Nelson of Roboflow, and then watch <a href="https://www.youtube.com/watch?v=SZQSF-A-WkA">this demo</a>&mdash;very well done by Nelson and Latent Space&rsquo;s <a href="https://sigmoid.social/@swyx">swyx</a>.</li>
<li>Open source AI progress, in the form of <a href="https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm">Dolly 2.0</a> and (just yesterday!) <a href="https://www.together.xyz/blog/redpajama">RedPajama</a>&rsquo;s Data-1T LLM training set. In short: a lot of LLMs are closed source, have closed training datasets, or are constrained by non-commercial-use restrictions. Last week the Dolly 2.0 trained model arrived, licensed for research and commercial use. And just yesterday came <a href="https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T">RedPajama-Data-1T</a>, a 1.2 trillion token dataset modeled on the training data described in Facebook Research&rsquo;s <a href="https://www.arxiv-vanity.com/papers/2302.13971/">original LLaMA paper</a>&mdash;see Simon Willison&rsquo;s early exploration <a href="https://simonwillison.net/2023/Apr/17/redpajama-data/">here</a>. Both developments lead in the direction of &ldquo;we can build something great ourselves without depending on OpenAI or equivalent&rdquo; &hellip;</li>
</ul>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/the-emergence-s1e4/emergence.webp"><picture>
    <source type="image/webp" srcset="/blog/the-emergence-s1e4/emergence_hu3591322486112719031.webp 320w, /blog/the-emergence-s1e4/emergence_hu3807970592930972304.webp 640w, /blog/the-emergence-s1e4/emergence_hu9539443497873508671.webp 768w, /blog/the-emergence-s1e4/emergence_hu9728703386644610284.webp 1024w, /blog/the-emergence-s1e4/emergence_hu6751072912665861128.webp 1280w, /blog/the-emergence-s1e4/emergence_hu6403042645766768307.webp 1536w, /blog/the-emergence-s1e4/emergence_hu1800011861502109888.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/the-emergence-s1e4/emergence_hu9375795037625527624.jpg 320w, /blog/the-emergence-s1e4/emergence_hu18174016622097260511.jpg 640w, /blog/the-emergence-s1e4/emergence_hu8459476713441673930.jpg 768w, /blog/the-emergence-s1e4/emergence_hu16076144031836310389.jpg 1024w, /blog/the-emergence-s1e4/emergence_hu3275716051072854385.jpg 1280w, /blog/the-emergence-s1e4/emergence_hu3040845148542833835.jpg 1536w, /blog/the-emergence-s1e4/emergence_hu10226269103661499546.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/the-emergence-s1e4/emergence_hu18174016622097260511.jpg" width="3072" height="1722" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<ul>
<li>
<p>ChatGPT as a weird intern. Simon Willison recently <a href="https://simonwillison.net/2023/Apr/12/code-interpreter/">wrote a blow-by-blow</a> of how he used ChatGPT as a pair programmer to write and run Python micro-benchmarks using ChatGPT&rsquo;s code interpreter plugin. He included an epic analogy of ChatGPT as a &ldquo;weird kind of intern&rdquo;:</p>
<blockquote>
<p>Here’s another analogy for large language models (since we can never have too many of those). It honestly feels a little like having a coding intern, with a strange set of characteristics:</p>
<ul>
<li>They’ve read and memorized all of the world’s public coding documentation, though they stopped reading in September 2021.</li>
<li>They’re unbelievably fast—they take just seconds to output entire programs from scratch.</li>
<li>If their code produces an error they can read the error message and attempt to update their code to fix it.</li>
<li>They won’t argue with you. That’s not necessarily a good thing: if you ask them to build something that’s clearly a bad idea they’ll likely just go ahead and do it anyway.</li>
<li>They respond incredibly well to feedback—they’ll apply it within seconds.</li>
<li>You have to know how to prompt them. I’ve got pretty good at this now, but it takes a lot of trial and error, and I think requires a pretty deep understanding of how they work.</li>
<li>You have to closely review EVERYTHING they do.</li>
<li>They work for free.</li>
</ul>
<p>And of course, they have zero understanding of anything at all. They’re a next-token-predicting machine with an unimaginably large training set.</p>
<p>The fact they can do even a fraction of the things they can do is, quite frankly, unbelievable. I’m still not sure I believe it myself.</p>
</blockquote>
</li>
<li>
<p>Continuing the series of &ldquo;LLMs I can run on my laptop,&rdquo; <a href="https://github.com/mlc-ai/web-llm">Web LLM</a> implements the fairly powerful <a href="https://huggingface.co/lmsys/vicuna-7b-delta-v0">vicuna-7b-delta-v0 model</a>, and actually runs that model in a Chrome browser, making use of Chrome&rsquo;s beta WebGPU API. Willison <a href="https://simonwillison.net/2023/Apr/16/web-llm/">reports</a> surprisingly solid performance.</p>
</li>
<li>
<p>The funniest news was <a href="https://www.izzy.co/">Izzy Miller</a> replacing his high school friends with an LLM chatbot. Miller&rsquo;s long-running group chat contained 488,000 messages, which made for a great training corpus. So obviously he decided to train an LLM, hook up a simulated messaging UI, and eliminate the friends. <a href="https://www.izzy.co/blogs/robo-boys.html">Check it out</a>&mdash;he did a great writeup and the LLM&rsquo;s behavior is, like Izzy&rsquo;s friends, pretty hilarious.</p>
</li>
</ul>
<div class="flex justify-evenly"><div class="w-[24%] mx-2 flex justify-center"><a href="/blog/the-emergence-s1e4/striking_ai_4.webp"><picture>
    <source type="image/webp" srcset="/blog/the-emergence-s1e4/striking_ai_4_hu3033503664546552212.webp 320w, /blog/the-emergence-s1e4/striking_ai_4_hu11848928018841091031.webp 640w, /blog/the-emergence-s1e4/striking_ai_4_hu16521065224164498913.webp 768w, /blog/the-emergence-s1e4/striking_ai_4_hu11343118911253737715.webp 1024w, /blog/the-emergence-s1e4/striking_ai_4_hu806995375668437648.webp 1280w, /blog/the-emergence-s1e4/striking_ai_4_hu12644817573301989633.webp 1536w, /blog/the-emergence-s1e4/striking_ai_4_hu14985499766101888038.webp 1920w" sizes="10vw" />
    <source type="image/jpeg" srcset="/blog/the-emergence-s1e4/striking_ai_4_hu15682905927438687401.jpg 320w, /blog/the-emergence-s1e4/striking_ai_4_hu8251035563567611223.jpg 640w, /blog/the-emergence-s1e4/striking_ai_4_hu17349880910548437499.jpg 768w, /blog/the-emergence-s1e4/striking_ai_4_hu14418633965712262830.jpg 1024w, /blog/the-emergence-s1e4/striking_ai_4_hu15334565871446672560.jpg 1280w, /blog/the-emergence-s1e4/striking_ai_4_hu11204655421737343222.jpg 1536w, /blog/the-emergence-s1e4/striking_ai_4_hu9216709729194704464.jpg 1920w" sizes="10vw" />
    
      <img src="/blog/the-emergence-s1e4/striking_ai_4_hu8251035563567611223.jpg" width="2048" height="1148" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[24%] mx-2 flex justify-center"><a href="/blog/the-emergence-s1e4/burton-screens.webp"><picture>
    <source type="image/webp" srcset="/blog/the-emergence-s1e4/burton-screens_hu8866707761951558631.webp 320w, /blog/the-emergence-s1e4/burton-screens_hu5978218686544995311.webp 640w, /blog/the-emergence-s1e4/burton-screens_hu9795137335215444414.webp 768w, /blog/the-emergence-s1e4/burton-screens_hu17074966143604608219.webp 1024w, /blog/the-emergence-s1e4/burton-screens_hu6365942959796622789.webp 1280w, /blog/the-emergence-s1e4/burton-screens_hu2366143206673223681.webp 1536w, /blog/the-emergence-s1e4/burton-screens_hu15144062957722692308.webp 1920w" sizes="10vw" />
    <source type="image/jpeg" srcset="/blog/the-emergence-s1e4/burton-screens_hu16634955254080224558.jpg 320w, /blog/the-emergence-s1e4/burton-screens_hu394596863067630460.jpg 640w, /blog/the-emergence-s1e4/burton-screens_hu12879487138289663723.jpg 768w, /blog/the-emergence-s1e4/burton-screens_hu4195110964673428600.jpg 1024w, /blog/the-emergence-s1e4/burton-screens_hu15283420828067887145.jpg 1280w, /blog/the-emergence-s1e4/burton-screens_hu17131848643999780910.jpg 1536w, /blog/the-emergence-s1e4/burton-screens_hu5795067866629854550.jpg 1920w" sizes="10vw" />
    
      <img src="/blog/the-emergence-s1e4/burton-screens_hu394596863067630460.jpg" width="3072" height="1722" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[24%] mx-2 flex justify-center"><a href="/blog/the-emergence-s1e4/emergence-2.webp"><picture>
    <source type="image/webp" srcset="/blog/the-emergence-s1e4/emergence-2_hu8593427339525518673.webp 320w, /blog/the-emergence-s1e4/emergence-2_hu15843717700174417314.webp 640w, /blog/the-emergence-s1e4/emergence-2_hu8054169079518328103.webp 768w, /blog/the-emergence-s1e4/emergence-2_hu15325681144944704645.webp 1024w, /blog/the-emergence-s1e4/emergence-2_hu6378504702438136676.webp 1280w, /blog/the-emergence-s1e4/emergence-2_hu5487468412360432218.webp 1536w, /blog/the-emergence-s1e4/emergence-2_hu13045178157306766020.webp 1920w" sizes="10vw" />
    <source type="image/jpeg" srcset="/blog/the-emergence-s1e4/emergence-2_hu16122377494086087.jpg 320w, /blog/the-emergence-s1e4/emergence-2_hu10306766512670891088.jpg 640w, /blog/the-emergence-s1e4/emergence-2_hu12791676104280027421.jpg 768w, /blog/the-emergence-s1e4/emergence-2_hu8891631408982306026.jpg 1024w, /blog/the-emergence-s1e4/emergence-2_hu6093563788672816284.jpg 1280w, /blog/the-emergence-s1e4/emergence-2_hu14033452047233435760.jpg 1536w, /blog/the-emergence-s1e4/emergence-2_hu15661459988828260124.jpg 1920w" sizes="10vw" />
    
      <img src="/blog/the-emergence-s1e4/emergence-2_hu10306766512670891088.jpg" width="3072" height="1722" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[24%] mx-2 flex justify-center"><a href="/blog/the-emergence-s1e4/striking-ai-1.webp"><picture>
    <source type="image/webp" srcset="/blog/the-emergence-s1e4/striking-ai-1_hu13802953049147870665.webp 320w, /blog/the-emergence-s1e4/striking-ai-1_hu13681016416620412201.webp 640w, /blog/the-emergence-s1e4/striking-ai-1_hu5372711910235395388.webp 768w, /blog/the-emergence-s1e4/striking-ai-1_hu13395998197537338264.webp 1024w, /blog/the-emergence-s1e4/striking-ai-1_hu16601901220051718307.webp 1280w, /blog/the-emergence-s1e4/striking-ai-1_hu9828470068059644618.webp 1536w, /blog/the-emergence-s1e4/striking-ai-1_hu5109191622062465194.webp 1920w" sizes="10vw" />
    <source type="image/jpeg" srcset="/blog/the-emergence-s1e4/striking-ai-1_hu1965242392726844367.jpg 320w, /blog/the-emergence-s1e4/striking-ai-1_hu1883105698320185661.jpg 640w, /blog/the-emergence-s1e4/striking-ai-1_hu13297303393570493743.jpg 768w, /blog/the-emergence-s1e4/striking-ai-1_hu8032148779067753541.jpg 1024w, /blog/the-emergence-s1e4/striking-ai-1_hu375924266923172778.jpg 1280w, /blog/the-emergence-s1e4/striking-ai-1_hu15562096632151453406.jpg 1536w, /blog/the-emergence-s1e4/striking-ai-1_hu16458493210251993535.jpg 1920w" sizes="10vw" />
    
      <img src="/blog/the-emergence-s1e4/striking-ai-1_hu1883105698320185661.jpg" width="3072" height="1722" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

]]></content:encoded>
    </item>
    
    <item>
      <title>March Madness, AI Style</title>
      <link>https://craftycto.com/blog/march-madness-ai/</link>
      <pubDate>Sat, 18 Mar 2023 00:00:00 +0000</pubDate>
      
      <guid>https://craftycto.com/blog/march-madness-ai/</guid>
      <description>Observations on AI&amp;#39;s March acceleration</description>
      <content:encoded><![CDATA[<p>I&rsquo;ve been trying to write a comprehensive, buttoned-up &ldquo;AI Update,&rdquo; covering all this month&rsquo;s AI developments. But nearly every day, there&rsquo;s been something new that&rsquo;s at least important and sometimes game-changing. So I&rsquo;m reluctantly giving up on comprehensive, and settling for selective commentary on a few of the more interesting bits.</p>
<h2 id="first-lets-define-ai-">First, let&rsquo;s define AI &hellip;</h2>
<p>The AI label is getting slapped on a lot of things these days, and I&rsquo;m actually ok with that &ndash; it&rsquo;s a short, handy, evocative term, useful as long as we get its definition straight.</p>
<p>AI does <strong>not</strong> mean &ldquo;the machine is thinking.&rdquo; That&rsquo;s a real concept, but it has its own term: AGI or Artificial General Intelligence. AGI isn&rsquo;t here today, is unlikely to arrive in the near future, and is not part of the AI news I&rsquo;m writing about.</p>
<p>So if AI is not machine thought, what is it? Let me take a rough hack at a definition that I feel lines up with the reality of the tools that are our subject matter:</p>
<blockquote>
<p>A system capable of doing tasks that humans can&rsquo;t do, reasonably or at all, and which can therefore augment human capabilities.</p>
</blockquote>
<p>That&rsquo;s a lesser achievement than an AI actually thinking; but when you grasp and experience AI augmentation, you may conclude, as I have &ndash; &ldquo;this changes everything&rdquo; is not an exaggeration.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/march-madness-ai/friendly-ai-helping.webp"><picture>
    <source type="image/webp" srcset="/blog/march-madness-ai/friendly-ai-helping_hu8691135999368358446.webp 320w, /blog/march-madness-ai/friendly-ai-helping_hu154564878868387822.webp 640w, /blog/march-madness-ai/friendly-ai-helping_hu10331559670776340605.webp 768w, /blog/march-madness-ai/friendly-ai-helping_hu6051655707736117672.webp 1024w, /blog/march-madness-ai/friendly-ai-helping_hu2802610387048463095.webp 1280w, /blog/march-madness-ai/friendly-ai-helping_hu4334165321969018170.webp 1536w, /blog/march-madness-ai/friendly-ai-helping_hu18303104105635313338.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/march-madness-ai/friendly-ai-helping_hu12336991977104552354.jpg 320w, /blog/march-madness-ai/friendly-ai-helping_hu2283149524174772532.jpg 640w, /blog/march-madness-ai/friendly-ai-helping_hu17291907663420801203.jpg 768w, /blog/march-madness-ai/friendly-ai-helping_hu8385833407236973878.jpg 1024w, /blog/march-madness-ai/friendly-ai-helping_hu7955001429570337402.jpg 1280w, /blog/march-madness-ai/friendly-ai-helping_hu11604946180585272053.jpg 1536w, /blog/march-madness-ai/friendly-ai-helping_hu15647792876244361534.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/march-madness-ai/friendly-ai-helping_hu2283149524174772532.jpg" width="3096" height="1548" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="ai-acceleration">AI acceleration</h2>
<p>Looking at March AI news from high above, what stands out to me is acceleration, possibly even exponential acceleration. Today&rsquo;s acceleration is happening in the AI subspace I call <strong>Text AIs</strong>, which are based around Large Language Models (LLMs). The most famous Text AI is OpenAI&rsquo;s ChatGPT, initially released just four months ago&ndash;though in the current mode of acceleration, those four months seem like ages.</p>
<p>The acceleration in the Text AI space has a precedent, from way back in August 2022, in what I call the <strong>Art AI</strong> space. Text AI and Art AI are close relatives within the AI family tree, both falling under the branch called Generative AI. Well known Art AIs include OpenAI&rsquo;s DALL-E 2, my go-to tool Midjourney, and Stability AI&rsquo;s Stable Diffusion.</p>
<p>Data scientist and open source developer <a href="https://simonwillison.net">Simon Willison</a> is a fellow observer of emerging AI, and he <a href="https://simonwillison.net/2023/Mar/11/llama/">describes</a> the close parallels between Art AI&rsquo;s acceleration in August and Text AI&rsquo;s today:</p>
<blockquote>
<p>Large language models are having their Stable Diffusion moment &hellip;</p>
<p>The open release of the Stable Diffusion image generation model back in August 2022 was a key moment &hellip; The resulting explosion in innovation is still going on today.</p>
<p>That Stable Diffusion moment is happening again right now, for large language models—the technology behind ChatGPT itself.</p>
</blockquote>
<p>I believe Text AI will follow the same &ldquo;sustained explosion&rdquo; trajectory of innovation seen in Art AI.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/march-madness-ai/slender-ai-2.webp"><picture>
    <source type="image/webp" srcset="/blog/march-madness-ai/slender-ai-2_hu7323949826237409246.webp 320w, /blog/march-madness-ai/slender-ai-2_hu4206191629933124297.webp 640w, /blog/march-madness-ai/slender-ai-2_hu6932876669645642665.webp 768w, /blog/march-madness-ai/slender-ai-2_hu7613235603869094936.webp 1024w, /blog/march-madness-ai/slender-ai-2_hu13359462983782709617.webp 1280w, /blog/march-madness-ai/slender-ai-2_hu9588193576714697685.webp 1536w, /blog/march-madness-ai/slender-ai-2_hu2617274142446293701.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/march-madness-ai/slender-ai-2_hu2711524877723304862.jpg 320w, /blog/march-madness-ai/slender-ai-2_hu16033503129556200877.jpg 640w, /blog/march-madness-ai/slender-ai-2_hu2868306795231699733.jpg 768w, /blog/march-madness-ai/slender-ai-2_hu12058431658589936321.jpg 1024w, /blog/march-madness-ai/slender-ai-2_hu12045839800292302758.jpg 1280w, /blog/march-madness-ai/slender-ai-2_hu15317931262830972541.jpg 1536w, /blog/march-madness-ai/slender-ai-2_hu1165780381422578834.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/march-madness-ai/slender-ai-2_hu16033503129556200877.jpg" width="3096" height="1548" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="order-of-magnitude-better">Order of magnitude better</h2>
<p>Four days ago (Tuesday March 14th), <a href="https://openai.com/research/gpt-4">OpenAI released GPT-4</a>, making it available through both their ChatGPT conversational interface, and the OpenAI API. Early testers are reporting that the GPT-4 model provides a substantial, across-the-board improvement over its GPT-3.5 predecessor.</p>
<p>My personal experience mirrors these reports. I just conducted an 11-question, 2800 word conversation with GPT-4, relating to USDA data models, available datasets, and geospatial data, including several deep dives into nuanced questions about relationships between datasets and entities. Though I&rsquo;ll definitely be cross-checking the results, I have high confidence in the accuracy of the responses, and I&rsquo;m astonished at how well GPT-4 &ldquo;recognized&rdquo; the gist of my questions and how thoroughly its responses addressed them.</p>
<p>OpenAI demonstrated visual inputs (not yet released), allowing prompts to include interspersed images and text. And this doesn&rsquo;t just mean it recognizes a picture of a cat and substitutes &ldquo;cat&rdquo; into the prompt &ndash; GPT-4 &ldquo;understands&rdquo; what kind of an image it&rsquo;s seeing, the context of that image:</p>
<blockquote>
<p>Over a range of domains—including documents with text and photographs, diagrams, or screenshots—GPT-4 exhibits similar capabilities as it does on text-only inputs.</p>
</blockquote>
<p>Check out this example from the <a href="https://openai.com/research/gpt-4">GPT-4 announcement</a>:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/march-madness-ai/gpt4-vga-3-panel.png"><picture>
    <source type="image/webp" srcset="/blog/march-madness-ai/gpt4-vga-3-panel_hu8464909764701779760.webp 320w, /blog/march-madness-ai/gpt4-vga-3-panel_hu3528530297665913954.webp 640w, /blog/march-madness-ai/gpt4-vga-3-panel_hu10253466698916332405.webp 768w, /blog/march-madness-ai/gpt4-vga-3-panel_hu9710193260008938430.webp 1024w, /blog/march-madness-ai/gpt4-vga-3-panel_hu7330242368401603901.webp 1280w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/march-madness-ai/gpt4-vga-3-panel_hu2461880364276277245.jpg 320w, /blog/march-madness-ai/gpt4-vga-3-panel_hu14416508448854144383.jpg 640w, /blog/march-madness-ai/gpt4-vga-3-panel_hu3906875366318965416.jpg 768w, /blog/march-madness-ai/gpt4-vga-3-panel_hu12911762599893056118.jpg 1024w, /blog/march-madness-ai/gpt4-vga-3-panel_hu11179048433490563910.jpg 1280w" sizes="100vw" />
    
      <img src="/blog/march-madness-ai/gpt4-vga-3-panel_hu14416508448854144383.jpg" width="1349" height="1210" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>How crazy is that? Other examples include interpreting a business bar chart, and solving an engineering test problem (in French!) where a diagram is integral to the question. Mind-blowing.</p>
<p>Last but not least, GPT-4 delivers an eightfold increase in the size of its prompt. GPT-4 now accepts prompts of 32K tokens (25K+ words), compared to 4K tokens (roughly 3K words) for GPT-3/3.5. Among many other benefits, this order-of-magnitude increase gives GPT-4 a way to &ldquo;get current.&rdquo; Training a huge GPT model on, basically, all of human knowledge is unsurprisingly a time-consuming (as well as expensive) process, so there had to be a cutoff date for the knowledge gathered before the months of model training began. What is that date? We can ask GPT-4 itself:</p>
<blockquote>
<p>What is the cutoff date of your training data?</p>
</blockquote>
<blockquote>
<p>My training data includes information up until September 2021. I do not have knowledge of events, developments, or advancements that occurred after that date.</p>
</blockquote>
<p>But &ndash; with GPT-4 and 32K tokens, we can actually inject newer information <em><strong>as part of the prompt</strong></em> &ndash; and 25,000 words is a lot of information. OpenAI demoed this, incorporating the post-2021 updated tax code in the prompt and then asking a question that required knowledge of the latest changes. GPT-4 was able to accurately address questions that factored in this updated tax code.</p>
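<p>This context-injection pattern is simple to sketch. The Python snippet below is a minimal, hypothetical illustration (the function names and the 4/3-tokens-per-word heuristic are my assumptions, not OpenAI&rsquo;s API): it packs reference documents into the system message ahead of the user&rsquo;s question, stopping before the token budget runs out.</p>

```python
def approx_tokens(text: str) -> int:
    # Rough rule of thumb: about 4/3 tokens per English word.
    return max(1, round(len(text.split()) * 4 / 3))

def build_messages(question: str, documents: list[str],
                   budget: int = 32_000, reserve: int = 2_000) -> list[dict]:
    """Pack as many reference documents as fit into the prompt,
    leaving `reserve` tokens of headroom for the model's answer."""
    system = "Answer using the reference material below when relevant."
    used = approx_tokens(system) + approx_tokens(question)
    included = []
    for doc in documents:          # order documents most-relevant-first
        cost = approx_tokens(doc)
        if used + cost > budget - reserve:
            break                  # out of room; drop the remaining documents
        included.append(doc)
        used += cost
    return [
        {"role": "system",
         "content": system + "\n\n" + "\n\n---\n\n".join(included)},
        {"role": "user", "content": question},
    ]

msgs = build_messages(
    "What changed in the 2022 standard deduction?",
    ["2022 tax code update: ...", "2021 guidance: ..."],
)
```

<p>The resulting <code>msgs</code> list is shaped like the <code>messages</code> argument a chat-completion API expects; the whole &ldquo;get current&rdquo; trick is just prepending fresh text to the prompt.</p>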
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/march-madness-ai/better-ai.webp"><picture>
    <source type="image/webp" srcset="/blog/march-madness-ai/better-ai_hu14989797300865819925.webp 320w, /blog/march-madness-ai/better-ai_hu14165374653908351893.webp 640w, /blog/march-madness-ai/better-ai_hu11024922817379978805.webp 768w, /blog/march-madness-ai/better-ai_hu3852423770285702794.webp 1024w, /blog/march-madness-ai/better-ai_hu12144551847688992658.webp 1280w, /blog/march-madness-ai/better-ai_hu15515285863168731784.webp 1536w, /blog/march-madness-ai/better-ai_hu7408084729651946965.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/march-madness-ai/better-ai_hu13230262462144638021.jpg 320w, /blog/march-madness-ai/better-ai_hu11673066668265922986.jpg 640w, /blog/march-madness-ai/better-ai_hu15852115687582449905.jpg 768w, /blog/march-madness-ai/better-ai_hu12381181917802809113.jpg 1024w, /blog/march-madness-ai/better-ai_hu10385049087163851541.jpg 1280w, /blog/march-madness-ai/better-ai_hu10564527834310187694.jpg 1536w, /blog/march-madness-ai/better-ai_hu12081859421479509049.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/march-madness-ai/better-ai_hu11673066668265922986.jpg" width="3096" height="1548" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="order-of-magnitude-cheaper">Order of magnitude cheaper</h2>
<ul>
<li>OpenAI kicked things off on March 1st, when they <a href="https://openai.com/blog/introducing-chatgpt-and-whisper-apis">announced new ChatGPT and Whisper APIs</a>, along with tenfold less expensive pricing. (Caveat: a few days later, OpenAI released GPT-4, with premium prices.)</li>
<li>Simon Willison <a href="https://simonwillison.net/2023/Mar/17/beat-chatgpt-in-a-browser/">points out</a> that, thanks to Facebook Research&rsquo;s <a href="https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/">LLaMA model</a>, Stanford&rsquo;s <a href="https://crfm.stanford.edu/2023/03/13/alpaca.html">LLaMA-based Alpaca model</a>, and related work, training our own models is getting radically cheaper:</li>
</ul>
<blockquote>
<p>I think it’s now possible to train a large language model with similar functionality to GPT-3 for $85,000. And I think we might soon be able to run the resulting model entirely in the browser, and give it capabilities that leapfrog it ahead of ChatGPT.</p>
</blockquote>
<p>Bear in mind that, in the very recent past, training a GPT-3-grade model had been considered out of reach for anyone but the largest companies with huge research budgets, e.g. Big Tech.</p>
<h2 id="order-of-magnitude-smaller-lighter-weight">Order of magnitude smaller (lighter weight)</h2>
<p>This week was a crazy cascade of &ldquo;Oh yeah? HOLD MY BEER!&rdquo; announcements, where every day we learned of a new GPT-3-grade model becoming available on some new device orders of magnitude less powerful than what OpenAI has powering their ChatGPT cloud. Before this week, to <a href="https://simonwillison.net/2023/Mar/11/llama/">quote from</a> Simon Willison:</p>
<blockquote>
<p>The best of these models have mostly been built by private organizations such as OpenAI, and have been kept tightly controlled—accessible via their API and web interfaces, but not released for anyone to run on their own machines.</p>
<p>These models are also BIG. Even if you could obtain the GPT-3 model you would not be able to run it on commodity hardware—these things usually require several A100-class GPUs, each of which retail for $8,000+.</p>
</blockquote>
<p>But this week all that changed. First there was &ldquo;Run a full GPT-3-grade model on your Mac!&rdquo; (<a href="https://github.com/ggerganov/llama.cpp">llama.cpp</a>, a lightweight C/C++ port of Facebook&rsquo;s LLaMA model); then came &ldquo;Get it to run on a Pixel 6 phone!&rdquo; (<a href="https://news.ycombinator.com/item?id=35171116">@thiteanish on Twitter</a>); next up, &ldquo;Have it running on a Raspberry Pi!&rdquo; (<a href="https://github.com/ggerganov/llama.cpp/issues/58">miolini reporting on llama.cpp GitHub</a>); then &ldquo;Run LLaMA on your computer in one step via node / npx!&rdquo; (<a href="https://github.com/cocktailpeanut/dalai">Dalai project, leveraging LLaMA and Alpaca, by cocktailpeanut</a>); and then just yesterday, &ldquo;Train LLaMA to respond like characters from The Simpsons!&rdquo; (Proof of concept from <a href="https://replicate.com/blog/fine-tune-llama-to-speak-like-homer-simpson">Replicate</a>, trained with 12 seasons of Simpsons scripts).</p>
<p>The point here isn&rsquo;t that we&rsquo;ll run production LLMs on our MacBook Pros; it&rsquo;s that, all of a sudden, we&rsquo;re within sight of <strong>running ChatGPT-grade models on a low-cost server</strong> that we cloud-rent or own outright. And of <strong>training our own models</strong> on a reasonable budget.</p>
<p>And don&rsquo;t underestimate the value of a developer being able to run a serious LLM on their development machine. Hands-on experimentation and prototyping is the top of the new application funnel.</p>
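<p>To make &ldquo;run it on your own machine&rdquo; concrete, here&rsquo;s a minimal Python sketch of that prototyping loop using the community llama-cpp-python bindings around llama.cpp. The model path is a hypothetical placeholder&mdash;you&rsquo;d supply your own quantized LLaMA weights&mdash;so treat this as an illustration of the workflow, not a production setup.</p>

```python
from pathlib import Path

# Hypothetical location of a 4-bit quantized LLaMA model (supply your own).
MODEL = Path("models/7B/ggml-model-q4_0.bin")

if MODEL.exists():
    from llama_cpp import Llama  # pip install llama-cpp-python

    # Load the local weights; n_ctx sets the context window in tokens.
    llm = Llama(model_path=str(MODEL), n_ctx=2048)
    result = llm("Q: Why run an LLM locally?\nA:", max_tokens=64, stop=["Q:"])
    print(result["choices"][0]["text"].strip())
else:
    print(f"No model at {MODEL}; download quantized weights first.")
```

<p>Those few lines are the whole hands-on loop: point at a local weights file, prompt, read text back&mdash;no API key, no per-token bill, nothing leaving your machine.</p>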
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/march-madness-ai/slender-ai.webp"><picture>
    <source type="image/webp" srcset="/blog/march-madness-ai/slender-ai_hu15919120285007255117.webp 320w, /blog/march-madness-ai/slender-ai_hu3404133516266896106.webp 640w, /blog/march-madness-ai/slender-ai_hu4012223802701984404.webp 768w, /blog/march-madness-ai/slender-ai_hu5390860341196843111.webp 1024w, /blog/march-madness-ai/slender-ai_hu6042087362767779181.webp 1280w, /blog/march-madness-ai/slender-ai_hu7685932672880360547.webp 1536w, /blog/march-madness-ai/slender-ai_hu12169108471839499282.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/march-madness-ai/slender-ai_hu12672919221257899276.jpg 320w, /blog/march-madness-ai/slender-ai_hu13413419729031901432.jpg 640w, /blog/march-madness-ai/slender-ai_hu6284131115653961477.jpg 768w, /blog/march-madness-ai/slender-ai_hu16176730378127177708.jpg 1024w, /blog/march-madness-ai/slender-ai_hu906062978643467181.jpg 1280w, /blog/march-madness-ai/slender-ai_hu15583369720045494489.jpg 1536w, /blog/march-madness-ai/slender-ai_hu2283840055790104581.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/march-madness-ai/slender-ai_hu13413419729031901432.jpg" width="3096" height="1548" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="order-of-magnitude-more-potential-applications">Order of magnitude more potential applications</h2>
<p>We&rsquo;ve described three separate order-of-magnitude developments in the Text AI / LLM space above&ndash;better, cheaper, smaller. Each of these removes roadblocks that might have killed the feasibility of interesting applications. The feasibility space just got three orders of magnitude larger, so expect three orders of magnitude more applications to start appearing. More applications feed demand back into innovation, so as we&rsquo;ve seen in the Art AI space, expect the acceleration to sustain.</p>
<p>If you&rsquo;re a business with potential applications for Generative AI, be it the Text or Art flavors, now is a fantastic time to get hands-on and start prototyping.</p>
<h2 id="meanwhile-at-google-">Meanwhile, at Google &hellip;</h2>
<ul>
<li>
<p>Google made a few panicked, flop-sweat non-release announcements this week, including <a href="https://blog.google/technology/ai/ai-developers-google-cloud-workspace/">PaLM and MakerSuite</a>, as well as a plan to <a href="https://workspace.google.com/blog/product-announcements/generative-ai">embed generative AI within Google Workspace</a>. Interesting, but just announcements, not anything anyone can use today. Catch-up by announcement.</p>
</li>
<li>
<p>In the meantime, Google&rsquo;s cash cow Search is under attack. In my own quest to understand the nuances of USDA data, to the degree that GPT-4 excelled, Google search stank. If GPT-4 scored an 8/10, Google was maybe a 2/10. Massively less useful; far more time-consuming and frustrating. Google search has been under self-inflicted threat for a while &ndash; see <a href="https://dkb.blog/p/google-search-is-dying?utm_source=pocket_saves">Google Search Is Dying</a> from 2022 &ndash; but since ChatGPT launched in November, that threat&rsquo;s timeline has compressed.</p>
</li>
</ul>
<h2 id="meanwhile-at-apple-">Meanwhile, at Apple &hellip;</h2>
<p>Over in Cupertino, I suspect that Apple is having its own moments of panic.</p>
<p>I tend to give Apple the benefit of the doubt, to believe that they&rsquo;re playing the long game, that they have a grand strategy, that maybe I&rsquo;m just not seeing it. That often proves true; consider Apple Watch, where limited early generations led to today&rsquo;s incredible wrist computer / health sensor / emergency detector, in a sizable market that Apple effectively created and now dominates. Long game, grand strategy in action.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/march-madness-ai/ai-with-apple.webp"><picture>
    <source type="image/webp" srcset="/blog/march-madness-ai/ai-with-apple_hu16681326460524219741.webp 320w, /blog/march-madness-ai/ai-with-apple_hu10807958552252578154.webp 640w, /blog/march-madness-ai/ai-with-apple_hu1792307620971165326.webp 768w, /blog/march-madness-ai/ai-with-apple_hu599239218235382418.webp 1024w, /blog/march-madness-ai/ai-with-apple_hu8813862520316262416.webp 1280w, /blog/march-madness-ai/ai-with-apple_hu49264253393042463.webp 1536w, /blog/march-madness-ai/ai-with-apple_hu8934866926303223180.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/march-madness-ai/ai-with-apple_hu8078760338397836858.jpg 320w, /blog/march-madness-ai/ai-with-apple_hu1018893774434300612.jpg 640w, /blog/march-madness-ai/ai-with-apple_hu14334145391092528671.jpg 768w, /blog/march-madness-ai/ai-with-apple_hu7920132127747856105.jpg 1024w, /blog/march-madness-ai/ai-with-apple_hu13106231617624578499.jpg 1280w, /blog/march-madness-ai/ai-with-apple_hu17281315224211180338.jpg 1536w, /blog/march-madness-ai/ai-with-apple_hu14315247644803526962.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/march-madness-ai/ai-with-apple_hu1018893774434300612.jpg" width="3096" height="1548" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>But this week there were strong indications that Apple&rsquo;s AI mainstay, Siri, is as much of a steaming pile of unmaintainable crap as it has always appeared to be from the outside, and there may be no fixing it. When you <a href="https://www.nytimes.com/2023/03/15/technology/siri-alexa-google-assistant-artificial-intelligence.html">hear descriptions like these</a>, you can smell the mess from a mile away:</p>
<blockquote>
<p>Siri ran into technological hurdles, including clunky code that took weeks to update with basic features, said John Burkey, a former Apple engineer who worked on the assistant.</p>
</blockquote>
<p>Apple is perhaps less screwed than Google, whose primary revenue producer is under attack. And Apple has invested heavily to make their hardware AI-capable by including Neural Engine cores (16!) in their A and M series processors. But I wonder how well the Neural Engine will support the emerging AI models we actually need in place of Siri. AI is critical for Apple&rsquo;s future, and it&rsquo;s emerging differently than Apple (and the rest of the universe) expected, so I&rsquo;m guessing Apple execs are more than a little unsettled.</p>
<h2 id="we-need-open-trained-models">We need open, trained models</h2>
<p>Too much of the news described above is based on closed technology and proprietary data. I&rsquo;ve no problem with OpenAI earning back the huge investments they&rsquo;ve made, and their APIs are a fine solution for a major swath of application needs. But there are a multitude of other scenarios where, due to cost or privacy or need to maintain control, OpenAI isn&rsquo;t an option.</p>
<p>What we need is the AI equivalent of free / open software &ndash; open, free, community developed trained models. Just add the application and a reasonable dollop of infrastructure, and you&rsquo;ve got an offering that&rsquo;s inexpensive to operate and 100% under your control.</p>
<p>Such work is underway; I&rsquo;ve not had time to research it yet but will. I&rsquo;m also encouraged by Willison&rsquo;s point above, that training our own models is becoming increasingly cost-effective and viable.</p>
<div class="flex justify-evenly"><div class="w-[24%] mx-2 flex justify-center"><a href="/blog/march-madness-ai/hyper-ai-1.webp"><picture>
    <source type="image/webp" srcset="/blog/march-madness-ai/hyper-ai-1_hu8690339584400367979.webp 320w, /blog/march-madness-ai/hyper-ai-1_hu8425133978674018033.webp 640w, /blog/march-madness-ai/hyper-ai-1_hu15001694776650496245.webp 768w, /blog/march-madness-ai/hyper-ai-1_hu4169784002631991286.webp 1024w, /blog/march-madness-ai/hyper-ai-1_hu6691936839178697754.webp 1280w, /blog/march-madness-ai/hyper-ai-1_hu3711583402746426197.webp 1536w, /blog/march-madness-ai/hyper-ai-1_hu18010231988764476703.webp 1920w" sizes="10vw" />
    <source type="image/jpeg" srcset="/blog/march-madness-ai/hyper-ai-1_hu15294921818337069949.jpg 320w, /blog/march-madness-ai/hyper-ai-1_hu17155088602542950842.jpg 640w, /blog/march-madness-ai/hyper-ai-1_hu1368108118144851561.jpg 768w, /blog/march-madness-ai/hyper-ai-1_hu2180625249736073878.jpg 1024w, /blog/march-madness-ai/hyper-ai-1_hu6283874384077358898.jpg 1280w, /blog/march-madness-ai/hyper-ai-1_hu14885367271736837170.jpg 1536w, /blog/march-madness-ai/hyper-ai-1_hu9686532863055186212.jpg 1920w" sizes="10vw" />
    
      <img src="/blog/march-madness-ai/hyper-ai-1_hu17155088602542950842.jpg" width="3096" height="1548" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[24%] mx-2 flex justify-center"><a href="/blog/march-madness-ai/hyper-ai-3.webp"><picture>
    <source type="image/webp" srcset="/blog/march-madness-ai/hyper-ai-3_hu1705632463014405065.webp 320w, /blog/march-madness-ai/hyper-ai-3_hu9228349505684650752.webp 640w, /blog/march-madness-ai/hyper-ai-3_hu8668218170777163595.webp 768w, /blog/march-madness-ai/hyper-ai-3_hu7958570407780318227.webp 1024w, /blog/march-madness-ai/hyper-ai-3_hu9825189099132264622.webp 1280w, /blog/march-madness-ai/hyper-ai-3_hu11262078917645541708.webp 1536w, /blog/march-madness-ai/hyper-ai-3_hu4710010019449067202.webp 1920w" sizes="10vw" />
    <source type="image/jpeg" srcset="/blog/march-madness-ai/hyper-ai-3_hu11653631485396620122.jpg 320w, /blog/march-madness-ai/hyper-ai-3_hu16392955212333742919.jpg 640w, /blog/march-madness-ai/hyper-ai-3_hu7753874199552914947.jpg 768w, /blog/march-madness-ai/hyper-ai-3_hu8504310778332809405.jpg 1024w, /blog/march-madness-ai/hyper-ai-3_hu3551218876859561836.jpg 1280w, /blog/march-madness-ai/hyper-ai-3_hu5084134241920264066.jpg 1536w, /blog/march-madness-ai/hyper-ai-3_hu11788869725848309859.jpg 1920w" sizes="10vw" />
    
      <img src="/blog/march-madness-ai/hyper-ai-3_hu16392955212333742919.jpg" width="3096" height="1548" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[24%] mx-2 flex justify-center"><a href="/blog/march-madness-ai/hyper-ai-4.webp"><picture>
    <source type="image/webp" srcset="/blog/march-madness-ai/hyper-ai-4_hu11520869520860769210.webp 320w, /blog/march-madness-ai/hyper-ai-4_hu15065819215018366509.webp 640w, /blog/march-madness-ai/hyper-ai-4_hu4328329462705034255.webp 768w, /blog/march-madness-ai/hyper-ai-4_hu160529931387231851.webp 1024w, /blog/march-madness-ai/hyper-ai-4_hu1669352848545317769.webp 1280w, /blog/march-madness-ai/hyper-ai-4_hu12328621328628987956.webp 1536w, /blog/march-madness-ai/hyper-ai-4_hu13145981226463260154.webp 1920w" sizes="10vw" />
    <source type="image/jpeg" srcset="/blog/march-madness-ai/hyper-ai-4_hu12172776905646849752.jpg 320w, /blog/march-madness-ai/hyper-ai-4_hu16836019907838983873.jpg 640w, /blog/march-madness-ai/hyper-ai-4_hu13340639914699939585.jpg 768w, /blog/march-madness-ai/hyper-ai-4_hu14159579154855933827.jpg 1024w, /blog/march-madness-ai/hyper-ai-4_hu17229356107237875372.jpg 1280w, /blog/march-madness-ai/hyper-ai-4_hu16961057111065834192.jpg 1536w, /blog/march-madness-ai/hyper-ai-4_hu15640376089792890458.jpg 1920w" sizes="10vw" />
    
      <img src="/blog/march-madness-ai/hyper-ai-4_hu16836019907838983873.jpg" width="3096" height="1548" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[24%] mx-2 flex justify-center"><a href="/blog/march-madness-ai/hyper-ai-2.webp"><picture>
    <source type="image/webp" srcset="/blog/march-madness-ai/hyper-ai-2_hu10090599947001800789.webp 320w, /blog/march-madness-ai/hyper-ai-2_hu16005445939680615403.webp 640w, /blog/march-madness-ai/hyper-ai-2_hu1340848586265828410.webp 768w, /blog/march-madness-ai/hyper-ai-2_hu17585322171912805332.webp 1024w, /blog/march-madness-ai/hyper-ai-2_hu15789065014287501340.webp 1280w, /blog/march-madness-ai/hyper-ai-2_hu12495615132134382689.webp 1536w, /blog/march-madness-ai/hyper-ai-2_hu62723878500038047.webp 1920w" sizes="10vw" />
    <source type="image/jpeg" srcset="/blog/march-madness-ai/hyper-ai-2_hu10350337076390317372.jpg 320w, /blog/march-madness-ai/hyper-ai-2_hu11394088320542552660.jpg 640w, /blog/march-madness-ai/hyper-ai-2_hu1116573166132192038.jpg 768w, /blog/march-madness-ai/hyper-ai-2_hu11697774296398699352.jpg 1024w, /blog/march-madness-ai/hyper-ai-2_hu8020351034601686801.jpg 1280w, /blog/march-madness-ai/hyper-ai-2_hu4105341248698108908.jpg 1536w, /blog/march-madness-ai/hyper-ai-2_hu11984729475914009893.jpg 1920w" sizes="10vw" />
    
      <img src="/blog/march-madness-ai/hyper-ai-2_hu11394088320542552660.jpg" width="3096" height="1548" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

]]></content:encoded>
    </item>
    
    <item>
      <title>Venture Funds Arrive in the Mastodon Space</title>
      <link>https://craftycto.com/blog/commercial-mastodon/</link>
      <pubDate>Thu, 09 Mar 2023 00:00:00 +0000</pubDate>
      
      <guid>https://craftycto.com/blog/commercial-mastodon/</guid>
      <description>How to think about the arrival of commercial offerings in the Fediverse</description>
      <content:encoded><![CDATA[<p>I recently had a <a href="https://mastodon.social/@Jackivers/109979769078817746">Mastodon exchange</a> with John Gruber of <a href="https://daringfireball.net/">Daring Fireball</a> (and <a href="https://daringfireball.net/projects/markdown/">Markdown</a>) fame. Gruber had just written a <a href="https://daringfireball.net/linked/2023/03/06/medium-mastodon">short piece about Medium launching their own Mastodon server</a> &ndash; with the fun, short domain name of <a href="https://me.dm/">me.dm</a>. The new server will be open to those with paid Medium accounts. He points out that Medium&rsquo;s foray into Mastodon is doubly unique:</p>
<blockquote>
<p>For one thing, amongst Medium’s co-founders are <a href="https://www.nytimes.com/2017/05/20/technology/evan-williams-medium-twitter-internet.html">Ev Williams (who also served as Medium’s CEO for most of its existence) and Biz Stone</a> — two people who were <a href="https://www.theguardian.com/technology/2014/jun/20/biz-stone-twitter">at Twitter at the beginning</a>. Williams also served as one of Twitter’s numerous CEOs.</p>
<p>Second, Medium is a commercial company, having raised, <a href="https://www.crunchbase.com/organization/medium">according to CrunchBase</a>, $163 million (so far). To my knowledge no company with such resources has started a public Mastodon instance to date.</p>
</blockquote>
<h2 id="fighting-words">Fighting Words</h2>
<p>What Gruber said next, though, bothered me:</p>
<blockquote>
<p>I am very uncomfortable with the fact that nearly all Mastodon servers are free-to-use volunteer efforts, funded by voluntary donations. That’s not sustainable. I suspect a lot of Mastodon servers that seem to be thriving today won’t be around in 5 years, taking all of their posts with them.</p>
</blockquote>
<p>In fairness, he went on to express some of the same commercial/venture reservations I&rsquo;ll talk about below:</p>
<blockquote>
<p>I don’t feel great about the fact that Medium is venture-backed, either, but they do charge $5/month or $50/year for a membership. I like paying for the services I use. Twitter is free to use and look how that’s gone.</p>
</blockquote>
<p>Well, everyone knows that recent converts are the most fervent. Although I first joined Mastodon in November 2018, I only became an active user this past November, when I bailed out of the <a href="https://twitter.com/404">Musk-hole</a>. (I&rsquo;m pro-EV but so happy I didn&rsquo;t buy a Musk-mobile.) I focused on the first sentences, not the later caveat, and my reaction was strong and immediate: <strong>HERESY!</strong> I&rsquo;m not normally one to fire off a post complaining about something somebody said, but <a href="https://mastodon.social/@Jackivers/109979769078817746">this time I did</a>.</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/commercial-mastodon/masto-robot-2.webp"><picture>
    <source type="image/webp" srcset="/blog/commercial-mastodon/masto-robot-2_hu14544106715402602011.webp 320w, /blog/commercial-mastodon/masto-robot-2_hu5262158439732242367.webp 640w, /blog/commercial-mastodon/masto-robot-2_hu5084686232934283133.webp 768w, /blog/commercial-mastodon/masto-robot-2_hu14286272330812745586.webp 1024w, /blog/commercial-mastodon/masto-robot-2_hu3720260343161442891.webp 1280w, /blog/commercial-mastodon/masto-robot-2_hu5744581952745206633.webp 1536w, /blog/commercial-mastodon/masto-robot-2_hu13958405238160770349.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/commercial-mastodon/masto-robot-2_hu2402461313501068777.jpg 320w, /blog/commercial-mastodon/masto-robot-2_hu16771436935930308229.jpg 640w, /blog/commercial-mastodon/masto-robot-2_hu7892328762171463617.jpg 768w, /blog/commercial-mastodon/masto-robot-2_hu8301257120176173463.jpg 1024w, /blog/commercial-mastodon/masto-robot-2_hu714118470279376935.jpg 1280w, /blog/commercial-mastodon/masto-robot-2_hu15898749079338724616.jpg 1536w, /blog/commercial-mastodon/masto-robot-2_hu15676366386948303670.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/commercial-mastodon/masto-robot-2_hu16771436935930308229.jpg" width="2048" height="1024" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="the-doctorow-enshittification-hypothesis">The Doctorow Enshittification Hypothesis</h2>
<p>Besides being a recent Fediverse convert, of late I&rsquo;ve been drinking the <a href="https://pluralistic.net/2023/01/21/potemkin-ai/#hey-guys">Cory Doctorow kool-aid</a>. I&rsquo;ve come around to his way of thinking about for-profit social media, with its seemingly inevitable progression from great free service, to critical mass must-have-this, to lock-in where the cost of switching gets so high that users stick around for years even though things are getting worse and worse. And that&rsquo;s the &ldquo;normal&rdquo; progression; much worse can happen, such as an alt-right billionaire wingnut buying the platform and running it through the Earth&rsquo;s crust all the way down to the <a href="https://twitterisgoinggreat.com/">seventh circle of hell</a>.</p>
<p>Doctorow has <a href="https://pluralistic.net/2022/12/23/semipermeable-membranes/">written extensively</a> on why the Fediverse, lacking commercial motivation to extract and enshittify, is a potentially-viable model for healthy social platforms. But guess what, that model depends on the support of the community to function. Those &ldquo;free-to-use volunteer efforts, funded by voluntary donations&rdquo; that Gruber describes are fundamental to the concept of the Fediverse, to its avoiding enshittification and commercial extraction.</p>
<h2 id="the-dead-post-society">The Dead Post Society</h2>
<p>What about Gruber&rsquo;s suspicion that &ldquo;a lot of Mastodon servers that seem to be thriving today won’t be around in 5 years&rdquo;? I responded that switching servers was easy on Mastodon; and it is. But (<strong>now</strong> I know) there&rsquo;s more to the problem than that &ndash; there&rsquo;s also the content. I had assumed that &ldquo;moving accounts between Mastodon servers&rdquo; meant &ldquo;moving all my stuff,&rdquo; posts included. Turns out that&rsquo;s not the case. So even though one&rsquo;s Mastodon account can smoothly transition and live on, posts can disappear. Gruber <a href="https://mastodon.social/@gruber/109980334788690249">pointed out</a> that:</p>
<blockquote>
<p>I have 20 years of links at DF. Most of the older ones are dead, especially to blogs. You know which ones aren’t dead from 2002? Blogspot.</p>
</blockquote>
<p>Yeah, he would feel that pain as much as anyone. Blogspot/Blogger is a commercial entity, and yes they have remained viable for &hellip; 20 years. Gruber&rsquo;s right about this. My <a href="https://mastodon.social/@Jackivers/109979769078817746">counterargument</a> has some merit, though, I think:</p>
<blockquote>
<p>The difference here is you’ve got a bunch of bright high-energy hackers ready to jump in and build what’s needed, unconstrained by a central for-profit entity in extraction mode. If server catastrophes do indeed become common, I have confidence that #fediverse solves it — auto migration tools, etc.</p>
<p>You probably followed the near shutdown of mastodon.lol — I’m just peripherally aware of it but a- the community stepped in and worked a solution to keep it running  b- the whole experience was 100X less disastrous than any number of brill billionaire moves over on the #birdsite — e.g., let’s shut down 3rd party apps and APIs with no warning.</p>
<p>Big commercial / venture funded players =&gt; #enshittification =&gt; #extraction — that’s the bigger risk IMHO.</p>
</blockquote>
<p>Having had time to mull it over, I&rsquo;m still of that opinion. Enabling useful, healthy, non-enshittified social interaction online is critical &ndash; the last decade (and the Musk administration&rsquo;s actions at Twitter) should make that obvious. Dealing with server transitions and dead post-links is indeed a pain, and that problem hasn&rsquo;t yet been solved in the Fediverse (although it&rsquo;s <a href="https://digipres.club/@misty/109853309262366531">being discussed</a>). But it&rsquo;s worth dealing with this inconvenience to have a real social platform that&rsquo;s not trying to algorithm and extract us to death.</p>
<p><strong>Note to the Fediverse community: wouldn&rsquo;t it be fantastic to crack the nut of full server migration including posts, with link preservation? Yeah, I know, it&rsquo;s hard, but still. Let&rsquo;s show &rsquo;em.</strong></p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/commercial-mastodon/masto-robot-3.webp"><picture>
    <source type="image/webp" srcset="/blog/commercial-mastodon/masto-robot-3_hu1589944444193058682.webp 320w, /blog/commercial-mastodon/masto-robot-3_hu16255018781918497607.webp 640w, /blog/commercial-mastodon/masto-robot-3_hu5055290468393613813.webp 768w, /blog/commercial-mastodon/masto-robot-3_hu16678501394811855869.webp 1024w, /blog/commercial-mastodon/masto-robot-3_hu14320477023016855197.webp 1280w, /blog/commercial-mastodon/masto-robot-3_hu4005687341090227930.webp 1536w, /blog/commercial-mastodon/masto-robot-3_hu16307544512288092549.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/commercial-mastodon/masto-robot-3_hu12538855077432681779.jpg 320w, /blog/commercial-mastodon/masto-robot-3_hu4910264412719514113.jpg 640w, /blog/commercial-mastodon/masto-robot-3_hu12060688172763211666.jpg 768w, /blog/commercial-mastodon/masto-robot-3_hu12537990567652150132.jpg 1024w, /blog/commercial-mastodon/masto-robot-3_hu12020890902195344350.jpg 1280w, /blog/commercial-mastodon/masto-robot-3_hu16051056565051635772.jpg 1536w, /blog/commercial-mastodon/masto-robot-3_hu14776909581566789807.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/commercial-mastodon/masto-robot-3_hu4910264412719514113.jpg" width="2048" height="1024" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="so-what-about-commercial-offerings-for-mastodon">So What About Commercial Offerings for Mastodon?</h2>
<p>Back to the original subject. What I took issue with in Gruber&rsquo;s DF post was not <a href="https://blog.medium.com/join-mastodon-with-medium-e2d6d814325b">Medium getting into the Mastodon server business</a>. I&rsquo;m actually enthusiastic about it. The more the merrier. It&rsquo;s an endorsement of the platform. And there&rsquo;s a decent chance that me.dm will still be running 10 or 20 years from now. Some commercial platforms, like Blogspot/Blogger, have managed to have that kind of longevity.</p>
<p>I&rsquo;m also fine with <a href="https://blog.cloudflare.com/welcome-to-wildebeest-the-fediverse-on-cloudflare/">Cloudflare&rsquo;s Wildebeest Mastodon server offering</a>. I&rsquo;ve seen some concern and criticism (and I&rsquo;m not up on all the details), but generally my attitude is the same as with Medium &ndash; the more the merrier, it&rsquo;s an endorsement of the platform. And likewise the <a href="https://apps.apple.com/us/app/mammoth-for-mastodon/id1667573899">Mammoth Mastodon app</a>, which is venture funded.</p>
<p>What the Fediverse does need to do, however, is keep its (our!) eyes open for the well-funded monopolists trying to take over and lock people in. We need to ensure that those efforts fail spectacularly, that we maintain control of our own destiny. I phrased it this way:</p>
<blockquote>
<p>I don’t think [Medium&rsquo;s mastodon offering is] necessarily a bad thing, as long as the community always has “fuck-you servers” — an easy exit from the commercial offerings if the commercial players start playing the #extraction game.</p>
</blockquote>
<p>P.S. Ironically, Gruber skewered Medium back in 2017 for an early form of enshittification, <a href="https://daringfireball.net/2017/06/medium_dickbars">the dickbar</a>. That would look great on a UX designer&rsquo;s resume, no? &ldquo;I invented the dickbar.&rdquo;</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/commercial-mastodon/masto-robot-4.webp"><picture>
    <source type="image/webp" srcset="/blog/commercial-mastodon/masto-robot-4_hu12730066474304566467.webp 320w, /blog/commercial-mastodon/masto-robot-4_hu3569024217568664565.webp 640w, /blog/commercial-mastodon/masto-robot-4_hu8908288213423751112.webp 768w, /blog/commercial-mastodon/masto-robot-4_hu2156572943228818232.webp 1024w, /blog/commercial-mastodon/masto-robot-4_hu17159159530153666826.webp 1280w, /blog/commercial-mastodon/masto-robot-4_hu15055886114924157564.webp 1536w, /blog/commercial-mastodon/masto-robot-4_hu1314294164044760852.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/commercial-mastodon/masto-robot-4_hu11288733080303576216.jpg 320w, /blog/commercial-mastodon/masto-robot-4_hu11606134360328569701.jpg 640w, /blog/commercial-mastodon/masto-robot-4_hu5908767226216484522.jpg 768w, /blog/commercial-mastodon/masto-robot-4_hu12570606315440676763.jpg 1024w, /blog/commercial-mastodon/masto-robot-4_hu12129476479023261164.jpg 1280w, /blog/commercial-mastodon/masto-robot-4_hu5250174539763819932.jpg 1536w, /blog/commercial-mastodon/masto-robot-4_hu7099891427756721154.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/commercial-mastodon/masto-robot-4_hu11606134360328569701.jpg" width="2048" height="1024" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

]]></content:encoded>
    </item>
    
    <item>
      <title>Alarmed About AI?</title>
      <link>https://craftycto.com/blog/alarmed-about-ai/</link>
      <pubDate>Mon, 20 Feb 2023 00:00:00 +0000</pubDate>
      
      <guid>https://craftycto.com/blog/alarmed-about-ai/</guid>
      <description>John Seely Brown, former director of Xerox PARC, has helpful advice—from April 2000!</description>
      <content:encoded><![CDATA[<p>We&rsquo;re currently experiencing a wave of AI news stories, including:</p>
<ul>
<li>Uproar over artists&rsquo; work being used without permission by art AIs like Midjourney and DALL-E 2</li>
<li>A flood of ChatGPT stories, including the phenomenon of ChatGPT confidently presenting information that proves to be completely erroneous, and even hallucinating non-existent references</li>
<li>Microsoft&rsquo;s just-released Bing AI&ndash;built on a 10-to-11-figure investment in ChatGPT maker OpenAI&ndash;spawning multiple personalities and going way off the rails</li>
</ul>
<p>I recently had a <a href="https://fedi.simonwillison.net/@simon/109869541023506346">brief exchange on Mastodon</a> about Bing&rsquo;s troubles. <a href="https://simonwillison.net/">Simon Willison</a>, maker of the excellent <a href="https://datasette.io/">Datasette</a> open source data exploration tool, had commented on <a href="https://stratechery.com/2023/from-bing-to-sydney-search-as-distraction-sentient-ai/">a post by Stratechery&rsquo;s Ben Thompson</a>, who had captured Bing in the act of misbehaving. (<a href="https://mastodon.social/@gruber">John Gruber</a> chimed in shortly after with a nice follow up, &ldquo;<a href="https://daringfireball.net/2023/02/bing_the_most_exciting_product_in_tech">Bing, the Most Exciting Product in Tech</a>,&rdquo; that calls out further commentary by New York Times columnist Kevin Roose.) Willison expressed alarm at Bing&rsquo;s off-the-tracks behavior, with good cause&ndash;his comment to me was, in part:</p>
<blockquote>
<p>&hellip; but it&rsquo;s pretending to be a search engine!</p>
</blockquote>
<p>Alarm over AI isn&rsquo;t something new. <a href="https://en.wikipedia.org/wiki/Ray_Kurzweil">Ray Kurzweil</a>&rsquo;s book, &ldquo;<a href="https://en.wikipedia.org/wiki/The_Singularity_Is_Near">The Singularity Is Near</a>,&rdquo; was published in 2005. From that book:</p>
<blockquote>
<p>This book will argue, however, that within several decades information-based technologies will encompass all human knowledge and proficiency, ultimately including the pattern-recognition powers, problem-solving skills, and emotional and moral intelligence of the human brain itself.</p>
</blockquote>
<p>And Kurzweil&rsquo;s use of the term &ldquo;singularity&rdquo; was drawn from CS professor and sci-fi writer <a href="https://en.wikipedia.org/wiki/Vernor_Vinge">Vernor Vinge</a>&rsquo;s 1993 essay, &ldquo;<a href="https://edoras.sdsu.edu/~vinge/misc/singularity.html">The Coming Technological Singularity: How to Survive in the Post-Human Era</a>,&rdquo; in which Vinge stated:</p>
<blockquote>
<p>I argue in this paper that we are on the edge of change comparable to the rise of human life on Earth.</p>
</blockquote>
<p>In April 2000, <a href="https://en.wikipedia.org/wiki/Bill_Joy">Bill Joy</a>, co-founder and chief scientist of Sun Microsystems, penned an article in Wired titled &ldquo;<a href="https://archive.is/hQjhB">Why the Future Doesn&rsquo;t Need Us</a>.&rdquo; Joy broadened the commentary to indict technologies ranging from robotics to genetic engineering to nanotech:</p>
<blockquote>
<p>The new Pandora&rsquo;s boxes of genetics, nanotechnology, and robotics are almost open, yet we seem hardly to have noticed. Ideas can&rsquo;t be put back in a box; unlike uranium or plutonium, they don&rsquo;t need to be mined and refined, and they can be freely copied. Once they are out, they are out.</p>
</blockquote>
<p>Kurzweil, Vinge, and Joy all painted bleak visions of the future of humanity&ndash;in a nutshell:</p>
<blockquote>
<p><strong>We&rsquo;re doomed.</strong></p>
</blockquote>
<div class="flex justify-evenly"><div class="w-[20%] mx-2 flex justify-center"><a href="/blog/alarmed-about-ai/humanoid-head1@0.5x.webp"><picture>
    <source type="image/webp" srcset="/blog/alarmed-about-ai/humanoid-head1@0.5x_hu16190922334333224618.webp 320w" sizes="10vw" />
    <source type="image/jpeg" srcset="/blog/alarmed-about-ai/humanoid-head1@0.5x_hu15529437329595925778.jpg 320w" sizes="10vw" />
    
      <img src="/blog/alarmed-about-ai/humanoid-head1@0.5x_hu1947709102947142872.jpg" width="512" height="512" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[20%] mx-2 flex justify-center"><a href="/blog/alarmed-about-ai/humanoid-head2@0.5x.webp"><picture>
    <source type="image/webp" srcset="/blog/alarmed-about-ai/humanoid-head2@0.5x_hu745768150860698087.webp 320w" sizes="10vw" />
    <source type="image/jpeg" srcset="/blog/alarmed-about-ai/humanoid-head2@0.5x_hu4720230675927137856.jpg 320w" sizes="10vw" />
    
      <img src="/blog/alarmed-about-ai/humanoid-head2@0.5x_hu8119264087696310344.jpg" width="512" height="512" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[20%] mx-2 flex justify-center"><a href="/blog/alarmed-about-ai/humanoid-head3@0.5x.webp"><picture>
    <source type="image/webp" srcset="/blog/alarmed-about-ai/humanoid-head3@0.5x_hu4442475320081873875.webp 320w" sizes="10vw" />
    <source type="image/jpeg" srcset="/blog/alarmed-about-ai/humanoid-head3@0.5x_hu16824727348612935096.jpg 320w" sizes="10vw" />
    
      <img src="/blog/alarmed-about-ai/humanoid-head3@0.5x_hu12993525768438796749.jpg" width="512" height="512" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<h2 id="pushback">Pushback</h2>
<p>One contemporary, however, pushed back&ndash;<a href="https://en.wikipedia.org/wiki/John_Seely_Brown">John Seely Brown</a>, then Chief Scientist at Xerox and director of <a href="https://en.wikipedia.org/wiki/PARC_(company)">Xerox PARC</a>. Yeah, <strong>that</strong> Xerox PARC. Within a week of Joy&rsquo;s article, JSB, along with PARC collaborator and Cal Berkeley researcher <a href="https://www.ischool.berkeley.edu/people/paul-duguid">Paul Duguid</a>, published a rebuttal, &ldquo;<a href="https://web.cs.ucdavis.edu/~koehl/Teaching/ECS188_W16/Reprints/Response_to_BillJoy.pdf">A Response to Bill Joy and the Doom-and-Gloom Technofuturists</a>.&rdquo; A few excerpts from their rebuttal:</p>
<blockquote>
<p>These self-unfulfilling prophecies failed to see that, once warned, society could galvanize itself into action &hellip;. Social and technological systems do not develop independently; the two evolve together in complex feedback loops, wherein each drives, restrains and accelerates change in the other. Malthus and Wells—and now Joy—are, indeed, critical parts of these complex loops. Each knew when and how to sound the alarm. But each thought little about how to respond to that alarm.</p>
<p>Once the social system is factored back into the equation like this, the road ahead becomes harder to navigate. Ultimately we should be grateful to Joy for saying, at the least, that there could be trouble ahead when so many of his fellow digerati will only tell us complacently that the road is clear.</p>
</blockquote>
<p>Not long after, I had a chance to see JSB speak in person. He was asked, &ldquo;Are you pessimistic or optimistic about the future of humanity?&rdquo; My recollection of his reply:</p>
<blockquote>
<p>I&rsquo;m optimistic. The pessimistic visions don&rsquo;t take into account the fact that humans adapt&ndash;once we perceive and understand the risks of emerging technologies, we adapt.</p>
</blockquote>
<p>JSB again expressed gratitude for the work of Bill Joy and others, as providing critical input to the human adaptation loop.</p>
<p>Returning to today&rsquo;s AI stories and our attitudes towards them: first off, let&rsquo;s set aside the misinformed&ndash;or Fox News / Meta / Vichy Twitter disinformed&ndash;that&rsquo;s a topic for another day. Among the informed, some (as Simon Willison did in a small way) will raise alarms, which are liable to be genuinely useful as input to the feedback loop. But they won&rsquo;t be good prophecies; as JSB suggests, they&rsquo;re likely to be self-unfulfilling.</p>
<p>My own attitude and reaction? Predominantly curiosity, with a side of WTF-surprise and a good helping of amusement:</p>
<blockquote>
<p>Wow! What the <em><strong>actual hell</strong></em> is Bing doing here?! I bet the M$ execs are losing it! What is emerging here? What does this tell us about what&rsquo;s likely to happen next in the AI space?</p>
</blockquote>
<p>Based on what I read, I suspect that Ben Thompson and John Gruber might share this mindset.</p>
<p>History repeats itself. I <a href="/blog/accelerating-launch-with-ai-models/">personally believe</a> that today&rsquo;s AIs are already useful and usable, but just scratch the surface of what will emerge in coming years. We&rsquo;re at the start of an emergence like the Industrial Revolution, but if you think you can predict how this will play out? Popcorn please!</p>


<div class="flex justify-evenly">
    <picture>
        <img src="/images/popcorn_jon_stewart.gif" height="309" width="226" class="w-4/5" loading="lazy" />
    </picture>
</div>
]]></content:encoded>
    </item>
    
    <item>
      <title>Introducing Crafty CTO</title>
      <link>https://craftycto.com/blog/introducing-crafty-cto/</link>
      <pubDate>Wed, 08 Feb 2023 00:00:00 +0000</pubDate>
      
      <guid>https://craftycto.com/blog/introducing-crafty-cto/</guid>
      <description>Turning the page: the story of Crafty CTO</description>
<content:encoded><![CDATA[<p>I joined TechCXO back in 2019, and it&rsquo;s been a great experience surrounded by an awesome group of people. I&rsquo;ve learned so much &hellip; most importantly, I think, that I love being a fractional CTO, a role that allows me to make use of everything I&rsquo;ve learned and experienced over a long technology-focused career.</p>
<p>Today I&rsquo;m turning the page &ndash; but not on <strong>what</strong> I do: my focus is still all-in on helping businesses as their fractional CTO. What&rsquo;s changing, rather, is the <strong>entity</strong> I&rsquo;m delivering through &ndash; a new company called <a href="/">Crafty CTO</a>. I&rsquo;m Crafty&rsquo;s founder, chief executive, owner, the guy running the place, or to use a phrase that dates to before the Civil War, the <a href="https://english.stackexchange.com/questions/152881/chief-cook-and-bottle-washer-meaning-and-etymology">chief cook and bottle washer</a>.</p>
<p>I continue to partner with and do work through TechCXO, for their existing accounts and when a broader set of skills beyond technology is needed.</p>
<h2 id="what-crafty-does">What Crafty does</h2>
<p>At any given point in time, fewer than half of all early- and growth-stage companies have seasoned CTOs in place. The reasons vary, with some combination of &ldquo;can&rsquo;t afford it yet,&rdquo; &ldquo;moving too fast right now,&rdquo; and &ldquo;special situation&rdquo; generally at the heart of things.</p>
<p>If technology is essential to the company&rsquo;s viability (when is it not in today&rsquo;s world?), there are serious risks and valuation penalties for going without a seasoned CTO. And that is precisely the gap that a strong fractional CTO can fill.</p>
<p>I work with businesses as early stage as pre-revenue, and as mature as growing-nearing-exit with engineering headcount approaching 100. The essence of what I bring to the table when I&rsquo;m out there helping a business as their fractional CTO: <strong>recognizing what aspects of technology are uniquely critical</strong> for that business, and then <strong>bringing deep insights and guidance to the executive team</strong> in these critical areas. That&rsquo;s where the name &ldquo;crafty&rdquo; comes from &ndash; I feel it captures the essence of what I do.</p>
<p>You can see more specifics around what we do <a href="/">here</a>, and find an explanation of why the fractional model works so well <a href="/blog/why-fractional/">here</a>.</p>
<p>Finally, I&rsquo;d strongly recommend reading <a href="../accelerating-launch-with-ai-models">this post</a> about <a href="../accelerating-launch-with-ai-models">how we leveraged AI models to accelerate Crafty&rsquo;s launch</a>.</p>
<div class="flex justify-evenly"><div class="w-[10%] mx-2 flex justify-center"><a href="/images/clockslice7.webp"><picture>
    <source type="image/webp" srcset="/images/clockslice7_hu11453154076638670547.webp 320w, /images/clockslice7_hu13987823164114676231.webp 640w, /images/clockslice7_hu17901391220197518489.webp 768w, /images/clockslice7_hu1372291135815297499.webp 1024w" sizes="3vw" />
    <source type="image/jpeg" srcset="/images/clockslice7_hu6133473080722545844.jpg 320w, /images/clockslice7_hu10767505539736652340.jpg 640w, /images/clockslice7_hu10602135007727289457.jpg 768w, /images/clockslice7_hu11832229366438845139.jpg 1024w" sizes="3vw" />
    
      <img src="/images/clockslice7_hu10767505539736652340.jpg" width="1024" height="1024" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[10%] mx-2 flex justify-center"><a href="/images/clockslice8.webp"><picture>
    <source type="image/webp" srcset="/images/clockslice8_hu1377476854857051231.webp 320w, /images/clockslice8_hu5519782400478486302.webp 640w, /images/clockslice8_hu2543677850079107133.webp 768w, /images/clockslice8_hu5825735422851114823.webp 1024w" sizes="3vw" />
    <source type="image/jpeg" srcset="/images/clockslice8_hu4488423826562181048.jpg 320w, /images/clockslice8_hu12322204617337167066.jpg 640w, /images/clockslice8_hu959302198336769303.jpg 768w, /images/clockslice8_hu3497223454982594881.jpg 1024w" sizes="3vw" />
    
      <img src="/images/clockslice8_hu12322204617337167066.jpg" width="1024" height="1024" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[10%] mx-2 flex justify-center"><a href="/images/clockslice12.webp"><picture>
    <source type="image/webp" srcset="/images/clockslice12_hu14073287947790113085.webp 320w, /images/clockslice12_hu13274137681357477555.webp 640w, /images/clockslice12_hu1141046107869441087.webp 768w, /images/clockslice12_hu2716228396563685253.webp 1024w" sizes="3vw" />
    <source type="image/jpeg" srcset="/images/clockslice12_hu16832011150954032653.jpg 320w, /images/clockslice12_hu15057441540085821446.jpg 640w, /images/clockslice12_hu4477243085859597163.jpg 768w, /images/clockslice12_hu15800183913788729187.jpg 1024w" sizes="3vw" />
    
      <img src="/images/clockslice12_hu15057441540085821446.jpg" width="1024" height="1024" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[10%] mx-2 flex justify-center"><a href="/images/clockslice3.webp"><picture>
    <source type="image/webp" srcset="/images/clockslice3_hu6921191777658469957.webp 320w, /images/clockslice3_hu12230088070864901504.webp 640w, /images/clockslice3_hu17792754311084147248.webp 768w" sizes="3vw" />
    <source type="image/jpeg" srcset="/images/clockslice3_hu8091538130370725605.jpg 320w, /images/clockslice3_hu2537239448566042636.jpg 640w, /images/clockslice3_hu13099415322929878832.jpg 768w" sizes="3vw" />
    
      <img src="/images/clockslice3_hu2537239448566042636.jpg" width="939" height="939" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[10%] mx-2 flex justify-center"><a href="/images/clockslice10.webp"><picture>
    <source type="image/webp" srcset="/images/clockslice10_hu15474034464418321003.webp 320w, /images/clockslice10_hu3043583476973961257.webp 640w, /images/clockslice10_hu13787609665904290598.webp 768w, /images/clockslice10_hu4661666611610043530.webp 1024w" sizes="3vw" />
    <source type="image/jpeg" srcset="/images/clockslice10_hu172559364623023807.jpg 320w, /images/clockslice10_hu5322768827591290707.jpg 640w, /images/clockslice10_hu16152603469257028027.jpg 768w, /images/clockslice10_hu14487418453044301717.jpg 1024w" sizes="3vw" />
    
      <img src="/images/clockslice10_hu5322768827591290707.jpg" width="1024" height="1024" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

]]></content:encoded>
    </item>
    
    <item>
      <title>Blog/Website Relaunch: Under the Hood of craftycto.com</title>
      <link>https://craftycto.com/blog/blog-relaunch/</link>
      <pubDate>Wed, 08 Feb 2023 00:00:00 +0000</pubDate>
      
      <guid>https://craftycto.com/blog/blog-relaunch/</guid>
      <description>Relaunching via Hugo and Cloudflare Pages</description>
      <content:encoded><![CDATA[<p>Back in 2019, I launched <a href="https://web.archive.org/web/20200406065508/https://thinkfractional.blog/">ThinkFractional.blog</a> and took advantage of the opportunity to learn a new platform &ndash; <a href="https://ghost.org">Ghost</a>. That was fun, and I found Ghost a nice improvement over Squarespace. I decided on the same approach this time around &ndash; take the opportunity to learn something new, state of the art, and very different.</p>
<figure>
    <img loading="lazy" src="ghost-logo-light.png"
         alt="ghost dot org logo" width="250px"/> 
</figure>

<p>Besides Ghost, I&rsquo;d been using <a href="https://cloudflare.com">Cloudflare</a> for a while. There was something DNS related I needed that Ghost couldn&rsquo;t handle on its own &ndash; maybe supporting both the naked domain and the www. version &ndash; and Ghost recommended setting up a free website on Cloudflare connected to Ghost as a solution. The Cloudflare experience solving this issue was really good &ndash; a lot of capabilities, a well-crafted interface, for $0.</p>
<p>It&rsquo;s not like ThinkFractional.blog needed Cloudflare to handle its traffic. The only post I ever created that got any traction (front page of Hacker News) was titled Secret iPad, published on the 10th anniversary of the launch of the original iPad. It described my experience at Agile Partners when we got early hands-on with the original iPad at the Cupertino campus. I figured that Apple probably wouldn&rsquo;t even notice the post, and that after 10 years they couldn&rsquo;t care less about it. I also assumed the NDA we signed had long since expired. Wrong, wrong, and wrong &ndash; Apple noticed (presumably someone there reads Hacker News), Apple cared, and Apple&rsquo;s vaguely worded NDA didn&rsquo;t actually specify a term, meaning its term was effectively for&ndash;ev&ndash;er. So that post lasted about a week before I had to take it down. To their credit, Apple was nice about it &ndash; a friendly phone call from a long-time Apple contact rather than a letter from a lawyer.</p>
<h2 id="new-needs">New needs</h2>
<p>This time around, my requirements were a little broader. Yes, I needed to replatform my blog, but I also needed a web presence for <a href="../introducing-crafty-cto/">my new fractional CTOing entity</a>. The blog needed a new name, but I was also working on naming the new entity, and I realized it might be possible to find a name that would work well for both.</p>
<p>In the end, I found a name that I think works well for both (ask me again in 5 years): Crafty CTO / craftycto.com for the entity, with the blog living at CraftyCTO.com/blog.</p>
<figure>
    <img loading="lazy" src="combined-fox-text-20pct.png" width="350px"/> 
</figure>

<p>All this to say, whatever platform I chose needed to support blogging features, but also be flexible enough to handle the rest of the Crafty CTO web presence.</p>
<h2 id="what-i-hate-about-you">What I hate about you</h2>
<p>I&rsquo;ve been working with Content Management Systems since my days at <a href="https://en.wikipedia.org/wiki/Fast_Five_(consulting)">fast five dotcom consultancy Viant</a>. Back then, big names included <a href="https://en.wikipedia.org/wiki/Vignette_Corporation">Vignette StoryBuilder</a> and <a href="https://en.wikipedia.org/wiki/TeamSite">Interwoven TeamSite</a> &ndash; and they came with big price tags as well. Ugghh. CMSs have always been a pain for creating content, and limiting in terms of the websites they can create.</p>
<p>Ghost and Squarespace are &hellip; CMSs. Nicer than their 2001 predecessors, for sure, but still a layer between my content and the bare iron of delivered HTML. They allow me to go about my work without needing to know HTML or CSS. (Yes, they give me hooks to override things like CSS, but that carries a learning curve and kind of defeats their purpose.) I lived with what they gave me, meaning I was limited in what I could do &ndash; in Ghost, I used the themes and components Ghost provided. Not terrible, but I had to fit into Ghost&rsquo;s box.</p>
<p>As production websites, both Ghost and Squarespace deliver content dynamically, and their sheer mass of features means they tend toward the bloated and slow. Not terrible, certainly functional, but not on the snappy end of the spectrum.</p>
<p>Ghost also wasn&rsquo;t free &ndash; in fact, the plan I was on was pretty damned expensive at $30/mo. That has been bothering me for a while and was the reason I first decided to move on.</p>
<h2 id="now-for-something-completely-different">Now for something completely different</h2>
<p>So I searched, started hearing about static site generators, came across <a href="https://gohugo.io/">Hugo</a>, and finally found a <a href="https://smaller.fish/posts/hugo">post</a> on how to run a Hugo site using <a href="https://pages.cloudflare.com/">Cloudflare Pages</a>.</p>
<figure>
    <img loading="lazy" src="hugo-logo-wide.svg" width="350px"/> 
</figure>

<p>Hugo has a large, active, and fantastically helpful community, great themes, and extreme static build performance (currently ~1400ms for the entire craftycto.com site).</p>
<p>Static pages served over Cloudflare&rsquo;s edge network &ndash; now that was really interesting. Not only would I go from bloated-dynamic-CMS to super-fast static HTML, but that site would be hosted at the edge via Cloudflare. Two orders of magnitude faster &ndash; now we&rsquo;re cooking. And the cost? Free. All systems go.</p>
<h2 id="content-authoring-repo-and-deployment">Content authoring, repo, and deployment</h2>
<p>But wait, you ask, the content still has to get written, stored, and deployed, how does that work? Great question, with a great answer that still costs $0.</p>
<ul>
<li>Content is authored in Markdown. I now fully appreciate <a href="https://daringfireball.net/projects/markdown/">John Gruber&rsquo;s contribution</a> to the open web. It&rsquo;s simple and direct, plenty good enough for blogging and web presence content &ndash; especially with the option to augment Markdown using Hugo templates and shortcodes.</li>
<li>Content is stored in a git repo, GitHub in my case, and gets me all the benefits of software-engineering-grade version control.</li>
<li>Content deployment is likewise git-driven, again with all the benefits of software-engineering-grade CI/CD tooling. That&rsquo;s how Cloudflare Pages operates &ndash; connect your Git repo, and your Cloudflare Pages site auto-deploys whenever you push to your GitHub (or Bitbucket, etc.) repo.</li>
</ul>
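<p>A minimal sketch of that workflow &ndash; the post path, front matter, and branch name here are illustrative, not this site&rsquo;s actual layout:</p>

```shell
# 1. Author: a new post is just a Markdown file with front matter.
mkdir -p content/blog/my-new-post
cat > content/blog/my-new-post/index.md <<'EOF'
---
title: "My New Post"
date: 2023-02-08
draft: false
---

Post body in plain Markdown, optionally augmented with Hugo shortcodes.
EOF

# 2. Version: commit the content like any other source code.
#      git add content/blog/my-new-post
#      git commit -m "Add my-new-post"

# 3. Deploy: pushing to the connected repo triggers a Cloudflare Pages
#    build of the Hugo site and an automatic edge deployment.
#      git push origin main
```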
<h2 id="bare-iron">Bare iron</h2>
<p>I love the whole Hugo-with-markdown-content-via-git-over-Cloudflare solution. My little site runs in a barest-of-bare-iron way &ndash; lightweight, static HTML, delivered at the edge. It doesn&rsquo;t get more bare-iron than that, in today&rsquo;s world.</p>
<figure>
    <img loading="lazy" src="CF_logo_stacked_whitetype.png" width="250px"/> 
</figure>

<p><a href="https://themes.gohugo.io/">Hugo themes</a> gave me a great starting place &ndash; this site is based on the <a href="https://github.com/reorx/hugo-PaperModX">PaperModX theme</a> &ndash; but thanks to Hugo&rsquo;s very smart content organization and a ton of powerful features, most fundamentally its Go-based templating language, I can customize anything and everything and get things exactly how I want them. And did I mention that I learned HTML and CSS along the way?</p>
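<p>To make the customization point concrete, here&rsquo;s a hypothetical sketch of a Hugo shortcode &ndash; the <code>note</code> name and its markup are invented for illustration, not taken from this site&rsquo;s theme:</p>

```shell
# A shortcode is just a small Go-template file under layouts/shortcodes/.
mkdir -p layouts/shortcodes
cat > layouts/shortcodes/note.html <<'EOF'
<aside class="note">
  <strong>{{ .Get "title" | default "Note" }}</strong>
  {{ .Inner | markdownify }}
</aside>
EOF

# Markdown content can then embed it directly:
#   {{< note title="Heads up" >}}Body text, in **Markdown**.{{< /note >}}
```

<p>Hugo renders the shortcode inline wherever the Markdown invokes it &ndash; which is how content stays in simple Markdown while the site remains fully customizable.</p>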
<p>Tweakability, ahem, may also be the biggest downside of my new solution. I&rsquo;m active in the Chicago CTO Group, and recently a member asked: what platform should I use for my new website/blog? Several of us chimed in with solutions like Hugo/git/Cloudflare, but others expressed concern about the risk of forever-fiddling. We&rsquo;re both right, I think.</p>
<h2 id="for-another-day">For another day</h2>
<p>Perhaps I&rsquo;ll do another post digging a little deeper into Hugo, and also <a href="https://tailwindcss.com/">Tailwind CSS</a>, something else I discovered and love. But that&rsquo;s for another day &ndash; next up, I promise, will be content more core to Crafty CTO and fractional CTOing.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Accelerating Crafty&#39;s Launch Using AI Models</title>
      <link>https://craftycto.com/blog/accelerating-launch-with-ai-models/</link>
      <pubDate>Wed, 08 Feb 2023 00:00:00 +0000</pubDate>
      
      <guid>https://craftycto.com/blog/accelerating-launch-with-ai-models/</guid>
      <description>Thanks to AI-based tools like Midjourney, Crafty shaved at least a month off its time to launch.</description>
      <content:encoded><![CDATA[

 
<div class="flex flex-col items-center text-center">
  <span class="text-base md:text-xl font-bold">(Midjourney) Mosaic from a minimal prompt</span>

 
  <span class="text-sm md:text-lg">&#34;a very intricate mosaic, bright colors&#34;</span>

</div>


<p>I&rsquo;m not a patient person &ndash; in fact, I consider <a href="/about/">my sense of urgency to be a superpower</a>. So once I made the decision to <a href="../introducing-crafty-cto/">move ahead on the launch of Crafty CTO</a>, I wanted to get on with it. Even for the impatient, though, there&rsquo;s a minimum quality bar to hit if you want a credible business footprint. At a minimum, I think:</p>
<ul>
<li><a href="../introducing-crafty-cto/#what-crafty-does">A name that works</a></li>
<li>A decent logo and some basic brand concept</li>
<li>Good words and imagery, for the core business website as well as the blog</li>
</ul>
<p>Not so difficult &hellip; But I was getting bogged down, partly because this list was in addition to everything else, and partly because these were all &ldquo;hard to get started on&rdquo; tasks, often far outside my comfort and competency zone. I&rsquo;m here to report that AI made a big contribution to getting me unstuck, moving, and launched, with what feels like enough quality to be credible. Time will tell.</p>
<h2 id="ai-for-words-and-art">AI for words and art</h2>
<p>It&rsquo;s ironic that my friend and past business partner John Berry was first to get in deep with the &ldquo;word AI,&rdquo; while I got there first on the &ldquo;art AI.&rdquo; Or perhaps it&rsquo;s not ironic, but rather based on weakness. John is (among an unfairly long list of other things) a skilled UX designer and great with visuals; I am skilled with words.</p>
<p>I started out skeptical of <a href="https://openai.com/blog/chatgpt/">ChatGPT</a>, the &ldquo;word AI,&rdquo; and its <a href="https://en.wikipedia.org/wiki/Wikipedia:Large_language_models">Large Language Model (LLM)</a> brethren. ChatGPT seemed like yet another in a long line of amusing tech novelties dating back to the <a href="http://vgmpf.com/Wiki/index.php/Sound_Blaster_(DOS)">Sound Blaster talking parrot</a>. My attitude was, &ldquo;there&rsquo;s no way something like this can help someone like me.&rdquo; Perhaps John had the same skepticism towards art models.</p>
<p>Anyway, John got there first on ChatGPT &ndash; &ldquo;there&rdquo; meaning he realized this wasn&rsquo;t just a toy, it was the beginning of a sea change. He was kind enough to enlighten me; we had a conversation along the lines of:</p>
<blockquote>
<p>Let me show you a few things here.</p>
<p>Are you getting the implications?</p>
<p>You&rsquo;re not thinking big enough.</p>
</blockquote>
<p>I came away from that conversation with my mind blown. <strong>I immediately had ChatGPT outline Crafty&rsquo;s website, and then had it write the content.</strong> In the end, almost everything ChatGPT wrote got edited away, but the process unblocked that part of my progress, answering the questions &ldquo;what kind of things should be on a website about a company that does what Crafty CTO does&rdquo; and &ldquo;what might that content look like&rdquo; &hellip;</p>
<h2 id="artless">Artless</h2>
<p>Meanwhile, I was already passingly familiar with art models; my first OpenAI account wasn&rsquo;t for ChatGPT, but rather for art generator DALL-E 2. It was a lot of fun &ndash; I never had any artistic talent, but with DALL-E I could create pretty nice imagery. With words!</p>
<p>Then came the decision to launch Crafty CTO and suddenly I had a burning need for art, starting with a logo. Could an art model produce an acceptably good logo? It turned out to be a perfect starting place for the serious application of &ldquo;art AI.&rdquo; Here I got out ahead of Mr. Berry, at least temporarily &ndash; with his skills he&rsquo;ll pass me soon, maybe already has. But I think some of my creations, at least, surprised him.</p>
<p><div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/accelerating-launch-with-ai-models/davinci-sub.webp"><picture>
    <source type="image/webp" srcset="/blog/accelerating-launch-with-ai-models/davinci-sub_hu17013459865824827535.webp 320w, /blog/accelerating-launch-with-ai-models/davinci-sub_hu5482579524787721168.webp 640w, /blog/accelerating-launch-with-ai-models/davinci-sub_hu11488487280496324899.webp 768w, /blog/accelerating-launch-with-ai-models/davinci-sub_hu12194877195584293905.webp 1024w, /blog/accelerating-launch-with-ai-models/davinci-sub_hu2369860072900459459.webp 1280w, /blog/accelerating-launch-with-ai-models/davinci-sub_hu2137445394259485884.webp 1536w, /blog/accelerating-launch-with-ai-models/davinci-sub_hu643353147468155946.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/accelerating-launch-with-ai-models/davinci-sub_hu3703406472416179746.jpg 320w, /blog/accelerating-launch-with-ai-models/davinci-sub_hu15481089795152123482.jpg 640w, /blog/accelerating-launch-with-ai-models/davinci-sub_hu5634430415710116274.jpg 768w, /blog/accelerating-launch-with-ai-models/davinci-sub_hu2803296487608442299.jpg 1024w, /blog/accelerating-launch-with-ai-models/davinci-sub_hu10377754921482936210.jpg 1280w, /blog/accelerating-launch-with-ai-models/davinci-sub_hu10972319462950931787.jpg 1536w, /blog/accelerating-launch-with-ai-models/davinci-sub_hu1831727867690011569.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/accelerating-launch-with-ai-models/davinci-sub_hu15481089795152123482.jpg" width="3072" height="3072" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>



 
<div class="flex flex-col items-center text-center">
  <span class="text-base md:text-xl font-bold">(Midjourney) This one surprised John Berry</span>

 
  <span class="text-sm md:text-lg">&#34;leonardo davinci schematic of nuclear attack sub&#34;</span>

</div>

</p>
<h2 id="the-logo-challenge">The logo challenge</h2>
<p>Everyone has a different mix of gifts. Personally, besides an affinity for technology, I am gifted when it comes to words and writing &ndash; my father was an English major and that certainly had both nature and nurture impact. Strangely, though, I&rsquo;m also strongly visual &ndash; I love visual tools like <a href="https://en.wikipedia.org/wiki/Mind_map">mind maps</a> and <a href="https://miro.com/whiteboard/">Miro boards</a>. It&rsquo;s just that I&rsquo;m completely talentless at the actual creation of art. In middle school, my only artistic win was an abstract soap carving &ndash; &ldquo;abstract&rdquo; being my way of getting out of carving something recognizable.</p>
<p>So creating a decent logo on my own was inconceivable. What I learned, through necessity and trying, is that <strong>art generation models grant me a skill I&rsquo;ve never had &ndash; the ability to create art</strong>. Using a tool like <a href="https://midjourney.com/home/?callbackUrl=%2Fapp%2F#about">Midjourney</a>, my gifts as a writer and conceptual thinker transform directly into artistic expression. <strong>I am not an artist, yet I created every image in this post.</strong></p>
<p>I believe that the logo effort was successful. Here are the winner and runner up for Crafty CTO&rsquo;s logo:</p>
<p><div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/accelerating-launch-with-ai-models/fox2-fox-icon.webp"><picture>
    <source type="image/webp" srcset="/blog/accelerating-launch-with-ai-models/fox2-fox-icon_hu18121839246190721470.webp 320w, /blog/accelerating-launch-with-ai-models/fox2-fox-icon_hu3957481265675779575.webp 640w, /blog/accelerating-launch-with-ai-models/fox2-fox-icon_hu4431829545662122547.webp 768w, /blog/accelerating-launch-with-ai-models/fox2-fox-icon_hu1375588570123533841.webp 1024w, /blog/accelerating-launch-with-ai-models/fox2-fox-icon_hu958914359955448459.webp 1280w, /blog/accelerating-launch-with-ai-models/fox2-fox-icon_hu471382098876245613.webp 1536w, /blog/accelerating-launch-with-ai-models/fox2-fox-icon_hu8216273186804506045.webp 1920w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/accelerating-launch-with-ai-models/fox2-fox-icon_hu12736444098833688570.jpg 320w, /blog/accelerating-launch-with-ai-models/fox2-fox-icon_hu8233914943694948278.jpg 640w, /blog/accelerating-launch-with-ai-models/fox2-fox-icon_hu14168745735361370502.jpg 768w, /blog/accelerating-launch-with-ai-models/fox2-fox-icon_hu6180406293426260955.jpg 1024w, /blog/accelerating-launch-with-ai-models/fox2-fox-icon_hu5197704542199058673.jpg 1280w, /blog/accelerating-launch-with-ai-models/fox2-fox-icon_hu5830035041024707676.jpg 1536w, /blog/accelerating-launch-with-ai-models/fox2-fox-icon_hu2836275401536867496.jpg 1920w" sizes="37vw" />
    
      <img src="/blog/accelerating-launch-with-ai-models/fox2-fox-icon_hu8233914943694948278.jpg" width="2048" height="2048" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[75%] mx-2 flex justify-center"><a href="/blog/accelerating-launch-with-ai-models/crafty-logo-runner-up-fox-only.webp"><picture>
    <source type="image/webp" srcset="/blog/accelerating-launch-with-ai-models/crafty-logo-runner-up-fox-only_hu6871939367852019032.webp 320w, /blog/accelerating-launch-with-ai-models/crafty-logo-runner-up-fox-only_hu7220073849750719261.webp 640w, /blog/accelerating-launch-with-ai-models/crafty-logo-runner-up-fox-only_hu12710717077015104653.webp 768w, /blog/accelerating-launch-with-ai-models/crafty-logo-runner-up-fox-only_hu16587037309284037528.webp 1024w, /blog/accelerating-launch-with-ai-models/crafty-logo-runner-up-fox-only_hu14784417988284046139.webp 1280w, /blog/accelerating-launch-with-ai-models/crafty-logo-runner-up-fox-only_hu7786578185505516911.webp 1536w, /blog/accelerating-launch-with-ai-models/crafty-logo-runner-up-fox-only_hu10535596304138910592.webp 1920w" sizes="37vw" />
    <source type="image/jpeg" srcset="/blog/accelerating-launch-with-ai-models/crafty-logo-runner-up-fox-only_hu12024319155332205926.jpg 320w, /blog/accelerating-launch-with-ai-models/crafty-logo-runner-up-fox-only_hu7333539294523087722.jpg 640w, /blog/accelerating-launch-with-ai-models/crafty-logo-runner-up-fox-only_hu3375290085516475064.jpg 768w, /blog/accelerating-launch-with-ai-models/crafty-logo-runner-up-fox-only_hu1639092655157020842.jpg 1024w, /blog/accelerating-launch-with-ai-models/crafty-logo-runner-up-fox-only_hu1601990125296647459.jpg 1280w, /blog/accelerating-launch-with-ai-models/crafty-logo-runner-up-fox-only_hu2010884357312946216.jpg 1536w, /blog/accelerating-launch-with-ai-models/crafty-logo-runner-up-fox-only_hu3371822062922796535.jpg 1920w" sizes="37vw" />
    
      <img src="/blog/accelerating-launch-with-ai-models/crafty-logo-runner-up-fox-only_hu7333539294523087722.jpg" width="2280" height="2280" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>



 
<div class="flex flex-col items-center text-center">
  <span class="text-base md:text-xl font-bold">(Midjourney) Crafty CTO logos, winner and runner up</span>

 
  <span class="text-sm md:text-lg">Left: &#34;professional services firm named crafty CTO, head of a fox, maximum of four colors, logo style, flat, vector&#34;; right: &#34;simple logo icon design red fox face, vector, flat 2d, black background&#34;</span>

</div>

</p>
<h2 id="midjourney">Midjourney</h2>
<p>I&rsquo;m far from an expert, but I have worked quite a bit with DALL-E 2, the local macOS app <a href="https://diffusionbee.com/">DiffusionBee</a>, and Midjourney. My take on Midjourney:</p>
<blockquote>
<p>Art generation tuned to deliver the incredibly gorgeous</p>
</blockquote>
<p>Midjourney has become my go-to and I used it for most of what you see. It blows my mind. My writing skills enable me to create amazing, gorgeous artwork. If I can conceive it and describe it, I can create it.</p>
<p><div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/accelerating-launch-with-ai-models/dreamscape.webp"><picture>
    <source type="image/webp" srcset="/blog/accelerating-launch-with-ai-models/dreamscape_hu9452560490109020905.webp 320w, /blog/accelerating-launch-with-ai-models/dreamscape_hu16921648130742150872.webp 640w, /blog/accelerating-launch-with-ai-models/dreamscape_hu4199517067026328082.webp 768w, /blog/accelerating-launch-with-ai-models/dreamscape_hu12878656377556124854.webp 1024w, /blog/accelerating-launch-with-ai-models/dreamscape_hu6484266331946010588.webp 1280w, /blog/accelerating-launch-with-ai-models/dreamscape_hu292016901183235144.webp 1536w, /blog/accelerating-launch-with-ai-models/dreamscape_hu15265939718048381392.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/accelerating-launch-with-ai-models/dreamscape_hu8085928198594432948.jpg 320w, /blog/accelerating-launch-with-ai-models/dreamscape_hu8393632527026864159.jpg 640w, /blog/accelerating-launch-with-ai-models/dreamscape_hu16478201640316574085.jpg 768w, /blog/accelerating-launch-with-ai-models/dreamscape_hu6376943183150726584.jpg 1024w, /blog/accelerating-launch-with-ai-models/dreamscape_hu13932141657741307779.jpg 1280w, /blog/accelerating-launch-with-ai-models/dreamscape_hu3150780606684256579.jpg 1536w, /blog/accelerating-launch-with-ai-models/dreamscape_hu14163157684197846491.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/accelerating-launch-with-ai-models/dreamscape_hu8393632527026864159.jpg" width="3072" height="3072" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>



 
<div class="flex flex-col items-center text-center">
  <span class="text-base md:text-xl font-bold">(Midjourney) Dreamscape</span>

 
  <span class="text-sm md:text-lg">&#34;dreamscape young red haired girl having dream with cute black bunnies&#34;</span>

</div>

</p>
<p>There&rsquo;s still work to do once the model has delivered &ndash; a few minutes, usually, of post-edits like masking for transparency and resizing. I&rsquo;ve gotten fairly proficient with <a href="https://www.pixelmator.com/pro/">Pixelmator Pro</a> on the Mac, a great app that itself leverages machine learning in several tools, such as Super Resolution.</p>
<h2 id="the-art-of-writing-prompts">The art of writing prompts</h2>
<p>I mentioned writing and conceptual skills above, but there&rsquo;s a bit more to it. To be a good &ldquo;AI driver&rdquo; you need:</p>
<ol>
<li>The ability to write good AI prompts &ndash; a combination of knowing which words trigger which behavior in a particular AI, and knowing the AI&rsquo;s parameters and what they do</li>
<li>The imagination to rework your prompts when you get stuck in a rut of bad outputs</li>
<li>The willingness to learn, by trial and error, what imagery works in which situations</li>
<li>The patience to iterate &ndash; iterate &ndash; iterate!!</li>
</ol>
<p>This is a real set of skills that needs to be developed, and I&rsquo;m still a newbie &ndash; there is incredible depth, especially around prompt writing. Fortunately, lots of sites are popping up with guidance and examples for each AI.</p>
<p>On the iterate point, here&rsquo;s a small slice of my Midjourney homepage:</p>
<div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/blog/accelerating-launch-with-ai-models/midjourney-screenshot.png"><picture>
    <source type="image/webp" srcset="/blog/accelerating-launch-with-ai-models/midjourney-screenshot_hu2259801292769362920.webp 320w, /blog/accelerating-launch-with-ai-models/midjourney-screenshot_hu7758404350080556150.webp 640w, /blog/accelerating-launch-with-ai-models/midjourney-screenshot_hu1656107927661153876.webp 768w, /blog/accelerating-launch-with-ai-models/midjourney-screenshot_hu16997231326781654337.webp 1024w, /blog/accelerating-launch-with-ai-models/midjourney-screenshot_hu1116645145942879552.webp 1280w, /blog/accelerating-launch-with-ai-models/midjourney-screenshot_hu13223061536945420819.webp 1536w, /blog/accelerating-launch-with-ai-models/midjourney-screenshot_hu6255375769800510775.webp 1920w" sizes="100vw" />
    <source type="image/jpeg" srcset="/blog/accelerating-launch-with-ai-models/midjourney-screenshot_hu2569359192619056832.jpg 320w, /blog/accelerating-launch-with-ai-models/midjourney-screenshot_hu17804995695676187464.jpg 640w, /blog/accelerating-launch-with-ai-models/midjourney-screenshot_hu6162501375546214787.jpg 768w, /blog/accelerating-launch-with-ai-models/midjourney-screenshot_hu13416484870676399867.jpg 1024w, /blog/accelerating-launch-with-ai-models/midjourney-screenshot_hu14535620720598921435.jpg 1280w, /blog/accelerating-launch-with-ai-models/midjourney-screenshot_hu10828031214224390941.jpg 1536w, /blog/accelerating-launch-with-ai-models/midjourney-screenshot_hu4528769995389106995.jpg 1920w" sizes="100vw" />
    
      <img src="/blog/accelerating-launch-with-ai-models/midjourney-screenshot_hu17804995695676187464.jpg" width="1966" height="1195" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

<p>Here are more examples with prompts.</p>
<p><div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/images/fractional4.webp"><picture>
    <source type="image/webp" srcset="/images/fractional4_hu4744731567468509778.webp 320w, /images/fractional4_hu7394359295475926059.webp 640w, /images/fractional4_hu12450550343259999048.webp 768w, /images/fractional4_hu17566687536991712982.webp 1024w, /images/fractional4_hu17345169510837192730.webp 1280w, /images/fractional4_hu17405337756349618683.webp 1536w" sizes="75vw" />
    <source type="image/jpeg" srcset="/images/fractional4_hu14078869741460649010.jpg 320w, /images/fractional4_hu5719551871241209542.jpg 640w, /images/fractional4_hu14562630123342328447.jpg 768w, /images/fractional4_hu12100267806184098505.jpg 1024w, /images/fractional4_hu16381521328554290780.jpg 1280w, /images/fractional4_hu15790431540736400859.jpg 1536w" sizes="75vw" />
    
      <img src="/images/fractional4_hu5719551871241209542.jpg" width="1536" height="1024" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>



 
<div class="flex flex-col items-center text-center">
  <span class="text-base md:text-xl font-bold">(Midjourney) Fractional</span>

 
  <span class="text-sm md:text-lg">&#34;stylized representation of fractional fractions slicing&#34;</span>

</div>

</p>
<p><div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/images/clockslice11lg.webp"><picture>
    <source type="image/webp" srcset="/images/clockslice11lg_hu5072773530030826459.webp 320w, /images/clockslice11lg_hu15686416660872274996.webp 640w, /images/clockslice11lg_hu8843934955763349637.webp 768w, /images/clockslice11lg_hu15048497064197286502.webp 1024w, /images/clockslice11lg_hu16691689577613721324.webp 1280w, /images/clockslice11lg_hu7412150068271025353.webp 1536w, /images/clockslice11lg_hu1115470238078894419.webp 1920w" sizes="75vw" />
    <source type="image/jpeg" srcset="/images/clockslice11lg_hu17552584058584219187.jpg 320w, /images/clockslice11lg_hu1474677054844165141.jpg 640w, /images/clockslice11lg_hu4494445535586956598.jpg 768w, /images/clockslice11lg_hu18200496083506937227.jpg 1024w, /images/clockslice11lg_hu2241975906852561032.jpg 1280w, /images/clockslice11lg_hu14645765529790129336.jpg 1536w, /images/clockslice11lg_hu18133836769236967928.jpg 1920w" sizes="75vw" />
    
      <img src="/images/clockslice11lg_hu1474677054844165141.jpg" width="2660" height="2660" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>



 
<div class="flex flex-col items-center text-center">
  <span class="text-base md:text-xl font-bold">(Midjourney) Clock face time slices</span>

 
  <span class="text-sm md:text-lg">&#34;clock face consisting of separate wedge shapes that represent slices of time&#34;</span>

</div>

</p>
<p><div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/accelerating-launch-with-ai-models/foxmedallion3.webp"><picture>
    <source type="image/webp" srcset="/blog/accelerating-launch-with-ai-models/foxmedallion3_hu18388385408423841937.webp 320w, /blog/accelerating-launch-with-ai-models/foxmedallion3_hu4070806514118260904.webp 640w, /blog/accelerating-launch-with-ai-models/foxmedallion3_hu12062008000030703900.webp 768w, /blog/accelerating-launch-with-ai-models/foxmedallion3_hu1189373623002691519.webp 1024w, /blog/accelerating-launch-with-ai-models/foxmedallion3_hu2582426159648881984.webp 1280w, /blog/accelerating-launch-with-ai-models/foxmedallion3_hu16687623137439978497.webp 1536w, /blog/accelerating-launch-with-ai-models/foxmedallion3_hu2699198455985770138.webp 1920w" sizes="75vw" />
    <source type="image/jpeg" srcset="/blog/accelerating-launch-with-ai-models/foxmedallion3_hu12875198496317268929.jpg 320w, /blog/accelerating-launch-with-ai-models/foxmedallion3_hu13837017698936380228.jpg 640w, /blog/accelerating-launch-with-ai-models/foxmedallion3_hu3767121018422949762.jpg 768w, /blog/accelerating-launch-with-ai-models/foxmedallion3_hu10210433399560827785.jpg 1024w, /blog/accelerating-launch-with-ai-models/foxmedallion3_hu12092437933894143361.jpg 1280w, /blog/accelerating-launch-with-ai-models/foxmedallion3_hu13383403923943356687.jpg 1536w, /blog/accelerating-launch-with-ai-models/foxmedallion3_hu18374213681375282959.jpg 1920w" sizes="75vw" />
    
      <img src="/blog/accelerating-launch-with-ai-models/foxmedallion3_hu13837017698936380228.jpg" width="2048" height="2048" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>



 
<div class="flex flex-col items-center text-center">
  <span class="text-base md:text-xl font-bold">(Midjourney) Fox medallion</span>

 
  <span class="text-sm md:text-lg">&#34;beautiful ornamental circular medallion of a wise crafty fox, realistic eyes, face, muzzle, intricate and detailed, glossy, finely etched silver with gold inlay&#34;</span>

</div>

</p>
<p><div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/accelerating-launch-with-ai-models/diffusionbeemedallion@0.5x.webp"><picture>
    <source type="image/webp" srcset="/blog/accelerating-launch-with-ai-models/diffusionbeemedallion@0.5x_hu2215955479722409577.webp 320w, /blog/accelerating-launch-with-ai-models/diffusionbeemedallion@0.5x_hu12279168894736056457.webp 640w, /blog/accelerating-launch-with-ai-models/diffusionbeemedallion@0.5x_hu8670919426406765947.webp 768w, /blog/accelerating-launch-with-ai-models/diffusionbeemedallion@0.5x_hu1748158500493796223.webp 1024w, /blog/accelerating-launch-with-ai-models/diffusionbeemedallion@0.5x_hu16889123282854806232.webp 1280w, /blog/accelerating-launch-with-ai-models/diffusionbeemedallion@0.5x_hu8609445192592323137.webp 1536w, /blog/accelerating-launch-with-ai-models/diffusionbeemedallion@0.5x_hu6208060155487391934.webp 1920w" sizes="75vw" />
    <source type="image/jpeg" srcset="/blog/accelerating-launch-with-ai-models/diffusionbeemedallion@0.5x_hu10926989861764268259.jpg 320w, /blog/accelerating-launch-with-ai-models/diffusionbeemedallion@0.5x_hu13327770821222066362.jpg 640w, /blog/accelerating-launch-with-ai-models/diffusionbeemedallion@0.5x_hu17805458067656034464.jpg 768w, /blog/accelerating-launch-with-ai-models/diffusionbeemedallion@0.5x_hu4628161206619442980.jpg 1024w, /blog/accelerating-launch-with-ai-models/diffusionbeemedallion@0.5x_hu5203274885812325843.jpg 1280w, /blog/accelerating-launch-with-ai-models/diffusionbeemedallion@0.5x_hu1364775733769181894.jpg 1536w, /blog/accelerating-launch-with-ai-models/diffusionbeemedallion@0.5x_hu17021872290912615342.jpg 1920w" sizes="75vw" />
    
      <img src="/blog/accelerating-launch-with-ai-models/diffusionbeemedallion@0.5x_hu13327770821222066362.jpg" width="2304" height="2304" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>



 
<div class="flex flex-col items-center text-center">
  <span class="text-base md:text-xl font-bold">(DiffusionBee) Another fox medallion</span>

 
  <span class="text-sm md:text-lg">&#34;wise fox in a beautiful round ornamental medallion, realistic eyes, face, muzzle, ears, mainly made of silver with gold inlay, a few gemstones, amber, blue diamond, sapphire, intricately etched, Etching, Detailed and Intricate, extremely fine etching&#34;</span>

</div>

</p>
<p><div class="flex justify-evenly"><div class="w-[75%] mx-2 flex justify-center"><a href="/blog/accelerating-launch-with-ai-models/dall-e2-rabbit.webp"><picture>
    <source type="image/webp" srcset="/blog/accelerating-launch-with-ai-models/dall-e2-rabbit_hu14567065350652155170.webp 320w, /blog/accelerating-launch-with-ai-models/dall-e2-rabbit_hu6011872866171606119.webp 640w, /blog/accelerating-launch-with-ai-models/dall-e2-rabbit_hu8747585146135277173.webp 768w, /blog/accelerating-launch-with-ai-models/dall-e2-rabbit_hu13699009264753495725.webp 1024w, /blog/accelerating-launch-with-ai-models/dall-e2-rabbit_hu14525280243380775819.webp 1280w, /blog/accelerating-launch-with-ai-models/dall-e2-rabbit_hu1506642300256093039.webp 1536w, /blog/accelerating-launch-with-ai-models/dall-e2-rabbit_hu1691843752546232000.webp 1920w" sizes="75vw" />
    <source type="image/jpeg" srcset="/blog/accelerating-launch-with-ai-models/dall-e2-rabbit_hu702218361221366056.jpg 320w, /blog/accelerating-launch-with-ai-models/dall-e2-rabbit_hu2064680061060562502.jpg 640w, /blog/accelerating-launch-with-ai-models/dall-e2-rabbit_hu11827292200520172682.jpg 768w, /blog/accelerating-launch-with-ai-models/dall-e2-rabbit_hu10915179874119663315.jpg 1024w, /blog/accelerating-launch-with-ai-models/dall-e2-rabbit_hu9017678778901810962.jpg 1280w, /blog/accelerating-launch-with-ai-models/dall-e2-rabbit_hu2797913662880956037.jpg 1536w, /blog/accelerating-launch-with-ai-models/dall-e2-rabbit_hu8471507075783064856.jpg 1920w" sizes="75vw" />
    
      <img src="/blog/accelerating-launch-with-ai-models/dall-e2-rabbit_hu2064680061060562502.jpg" width="3072" height="3072" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>



 
<div class="flex flex-col items-center text-center">
  <span class="text-base md:text-xl font-bold">(DALL-E 2) Black rabbit</span>

 
  <span class="text-sm md:text-lg">&#34;Rabbit with black fur by Frida Kahlo&#34;</span>

</div>

</p>
<h2 id="heavy-lifting-cognitive-load-and-unblocking">Heavy lifting, cognitive load, and unblocking</h2>
<p>For me, both Word AIs and Art AIs serve to unblock my progress. Since Midjourney or ChatGPT does most of the heavy lifting, my cognitive load is lightened and I retain the mental reserves needed to do the difficult things &ndash; start something new, finish something, or work outside my comfort zone.</p>
<p>I&rsquo;m finding that when writing in my blog, I can now create imagery on the fly without bogging down the process. This &ldquo;launch&rdquo; image for my post about launching Crafty&rsquo;s blog took me all of 10 minutes, from first keystroke to production-ready. (Firefly fans: definitely a reaver ship, right?)</p>
<p><div class="flex justify-evenly"><div class="w-full mx-2 flex justify-center"><a href="/images/futuristic_spaceship_launching.webp"><picture>
    <source type="image/webp" srcset="/images/futuristic_spaceship_launching_hu5826364775343159522.webp 320w, /images/futuristic_spaceship_launching_hu9116496989929904481.webp 640w, /images/futuristic_spaceship_launching_hu14629613483427470508.webp 768w, /images/futuristic_spaceship_launching_hu15874652749059537395.webp 1024w, /images/futuristic_spaceship_launching_hu9818464930611956904.webp 1280w, /images/futuristic_spaceship_launching_hu7315729786726015500.webp 1536w" sizes="100vw" />
    <source type="image/jpeg" srcset="/images/futuristic_spaceship_launching_hu1111658385289898020.jpg 320w, /images/futuristic_spaceship_launching_hu6904808082602610909.jpg 640w, /images/futuristic_spaceship_launching_hu15221733022022497686.jpg 768w, /images/futuristic_spaceship_launching_hu9456681391230211408.jpg 1024w, /images/futuristic_spaceship_launching_hu4780655847655193075.jpg 1280w, /images/futuristic_spaceship_launching_hu9039028557160953675.jpg 1536w" sizes="100vw" />
    
      <img src="/images/futuristic_spaceship_launching_hu6904808082602610909.jpg" width="1536" height="1024" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>



 
<div class="flex flex-col items-center text-center">
  <span class="text-base md:text-xl font-bold">(Midjourney) Launch imagery for blog</span>

 
  <span class="text-sm md:text-lg">&#34;futuristic spaceship launching hyper realistic 35mm technicolor&#34;</span>

</div>

</p>
<h2 id="bottom-line----ais-accelerated-crafty-ctos-launch">Bottom line &ndash; AIs accelerated Crafty CTO&rsquo;s launch</h2>
<p>The combination of Midjourney and ChatGPT enabled me (alone!) to launch Crafty&rsquo;s digital footprint in about half the time (who knows, really, how long I might have stayed bogged down?) and with a degree of quality I would likely not have achieved otherwise, at least not without a lot more time and money.</p>
<p>Highly recommended.</p>
<div class="flex justify-evenly"><div class="w-[10%] mx-2 flex justify-center"><a href="/images/clockslice7.webp"><picture>
    <source type="image/webp" srcset="/images/clockslice7_hu11453154076638670547.webp 320w, /images/clockslice7_hu13987823164114676231.webp 640w, /images/clockslice7_hu17901391220197518489.webp 768w, /images/clockslice7_hu1372291135815297499.webp 1024w" sizes="3vw" />
    <source type="image/jpeg" srcset="/images/clockslice7_hu6133473080722545844.jpg 320w, /images/clockslice7_hu10767505539736652340.jpg 640w, /images/clockslice7_hu10602135007727289457.jpg 768w, /images/clockslice7_hu11832229366438845139.jpg 1024w" sizes="3vw" />
    
      <img src="/images/clockslice7_hu10767505539736652340.jpg" width="1024" height="1024" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[10%] mx-2 flex justify-center"><a href="/images/clockslice8.webp"><picture>
    <source type="image/webp" srcset="/images/clockslice8_hu1377476854857051231.webp 320w, /images/clockslice8_hu5519782400478486302.webp 640w, /images/clockslice8_hu2543677850079107133.webp 768w, /images/clockslice8_hu5825735422851114823.webp 1024w" sizes="3vw" />
    <source type="image/jpeg" srcset="/images/clockslice8_hu4488423826562181048.jpg 320w, /images/clockslice8_hu12322204617337167066.jpg 640w, /images/clockslice8_hu959302198336769303.jpg 768w, /images/clockslice8_hu3497223454982594881.jpg 1024w" sizes="3vw" />
    
      <img src="/images/clockslice8_hu12322204617337167066.jpg" width="1024" height="1024" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[10%] mx-2 flex justify-center"><a href="/images/clockslice12.webp"><picture>
    <source type="image/webp" srcset="/images/clockslice12_hu14073287947790113085.webp 320w, /images/clockslice12_hu13274137681357477555.webp 640w, /images/clockslice12_hu1141046107869441087.webp 768w, /images/clockslice12_hu2716228396563685253.webp 1024w" sizes="3vw" />
    <source type="image/jpeg" srcset="/images/clockslice12_hu16832011150954032653.jpg 320w, /images/clockslice12_hu15057441540085821446.jpg 640w, /images/clockslice12_hu4477243085859597163.jpg 768w, /images/clockslice12_hu15800183913788729187.jpg 1024w" sizes="3vw" />
    
      <img src="/images/clockslice12_hu15057441540085821446.jpg" width="1024" height="1024" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[10%] mx-2 flex justify-center"><a href="/images/clockslice3.webp"><picture>
    <source type="image/webp" srcset="/images/clockslice3_hu6921191777658469957.webp 320w, /images/clockslice3_hu12230088070864901504.webp 640w, /images/clockslice3_hu17792754311084147248.webp 768w" sizes="3vw" />
    <source type="image/jpeg" srcset="/images/clockslice3_hu8091538130370725605.jpg 320w, /images/clockslice3_hu2537239448566042636.jpg 640w, /images/clockslice3_hu13099415322929878832.jpg 768w" sizes="3vw" />
    
      <img src="/images/clockslice3_hu2537239448566042636.jpg" width="939" height="939" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  <div class="w-[10%] mx-2 flex justify-center"><a href="/images/clockslice10.webp"><picture>
    <source type="image/webp" srcset="/images/clockslice10_hu15474034464418321003.webp 320w, /images/clockslice10_hu3043583476973961257.webp 640w, /images/clockslice10_hu13787609665904290598.webp 768w, /images/clockslice10_hu4661666611610043530.webp 1024w" sizes="3vw" />
    <source type="image/jpeg" srcset="/images/clockslice10_hu172559364623023807.jpg 320w, /images/clockslice10_hu5322768827591290707.jpg 640w, /images/clockslice10_hu16152603469257028027.jpg 768w, /images/clockslice10_hu14487418453044301717.jpg 1024w" sizes="3vw" />
    
      <img src="/images/clockslice10_hu5322768827591290707.jpg" width="1024" height="1024" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

]]></content:encoded>
    </item>
    
    <item>
      <title>Why Fractional?</title>
      <link>https://craftycto.com/blog/why-fractional/</link>
      <pubDate>Tue, 07 Feb 2023 00:00:00 +0000</pubDate>
      
      <guid>https://craftycto.com/blog/why-fractional/</guid>
      <description>&lt;p&gt;Fractional CTO services can be a valuable asset for businesses that want to take advantage of technology to grow and improve their operations, but don&amp;rsquo;t have the resources or expertise to do so in-house.&lt;/p&gt;
&lt;h2 id=&#34;access-to-experienced-technology-leaders&#34;&gt;Access to experienced technology leaders&lt;/h2&gt;
&lt;p&gt;One of the main benefits of fractional CTO services is access to experienced and knowledgeable technology leaders who can provide strategic guidance and support for your business. A fractional CTO can help you create a technology roadmap that aligns with your business goals and objectives, and provide expert advice on the technologies and solutions that will best support your needs.&lt;/p&gt;</description>
      <content:encoded><![CDATA[<p>Fractional CTO services can be a valuable asset for businesses that want to take advantage of technology to grow and improve their operations, but don&rsquo;t have the resources or expertise to do so in-house.</p>
<h2 id="access-to-experienced-technology-leaders">Access to experienced technology leaders</h2>
<p>One of the main benefits of fractional CTO services is access to experienced and knowledgeable technology leaders who can provide strategic guidance and support for your business. A fractional CTO can help you create a technology roadmap that aligns with your business goals and objectives, and provide expert advice on the technologies and solutions that will best support your needs.</p>
<h2 id="flexibility-and-cost-effectiveness">Flexibility and cost-effectiveness</h2>
<p>Another advantage of fractional CTO services is the flexibility and cost-effectiveness they offer. Instead of hiring a full-time CTO, which can be expensive and may not be necessary for all businesses, fractional CTO services allow you to bring in experienced technology leaders on an as-needed basis. This can save you money and ensure that you have the right level of support and expertise when you need it, without the overhead of a full-time employee.</p>
<h2 id="staying-up-on-the-latest-technologies">Staying up on the latest technologies</h2>
<p>Additionally, fractional CTO services can help businesses stay up-to-date with the latest technologies and trends. As the technology landscape continues to evolve, it can be challenging for businesses to keep up with the latest developments and understand how they can be leveraged to improve their operations. A fractional CTO can provide valuable insight and guidance on how to incorporate new technologies into your business, and help you make informed decisions about which technologies to invest in.</p>
<p>Overall, fractional CTO services can provide a range of benefits for businesses, including access to experienced technology leaders, cost-effectiveness, and the ability to stay up-to-date with the latest trends and developments in the technology field. Whether you&rsquo;re a small business looking to grow and improve, or a larger organization looking for additional support and guidance, fractional CTO services can be a valuable resource for achieving your technology goals.</p>
<div class="flex justify-evenly"><div class="w-[20%] mx-2 flex justify-center"><a href="/images/clockslice10.webp"><picture>
    <source type="image/webp" srcset="/images/clockslice10_hu15474034464418321003.webp 320w, /images/clockslice10_hu3043583476973961257.webp 640w, /images/clockslice10_hu13787609665904290598.webp 768w, /images/clockslice10_hu4661666611610043530.webp 1024w" sizes="30vw" />
    <source type="image/jpeg" srcset="/images/clockslice10_hu172559364623023807.jpg 320w, /images/clockslice10_hu5322768827591290707.jpg 640w, /images/clockslice10_hu16152603469257028027.jpg 768w, /images/clockslice10_hu14487418453044301717.jpg 1024w" sizes="30vw" />
    
      <img src="/images/clockslice10_hu5322768827591290707.jpg" width="1024" height="1024" class="" alt="" loading="lazy" />
    
  </picture></a></div>
  
</div>

]]></content:encoded>
    </item>
    
    <item>
      <title>Building with Shape Up</title>
      <link>https://craftycto.com/blog/building-with-shape-up/</link>
      <pubDate>Sat, 02 May 2020 01:00:00 +0000</pubDate>
      
      <guid>https://craftycto.com/blog/building-with-shape-up/</guid>
      <description>In my first post, we looked at Shape Up fundamentals like six-week cycles, appetite, work shaping, and the betting table. We left off at the handoff of shaped, de-risked work to the development teams. This post will focus on Shape Up&amp;rsquo;s approach to building. (And shipping!)</description>
      <content:encoded><![CDATA[<p><em>This is the sequel to my post <a href="../shape-up-your-agile/">Shape Up Your Agile</a>, about the book <a href="https://basecamp.com/shapeup">Shape Up: Stop Running in Circles and Ship Work that Matters</a> by <a href="https://basecamp.com">Basecamp</a>&rsquo;s <a href="https://twitter.com/rjs">Ryan Singer</a>.</em></p>
<blockquote>
<p>This post incorporates a number of quotes and hand-drawn illustrations from Shape Up. All link to their source in the <a href="https://basecamp.com/shapeup/webbook">online book</a> and are copyright ©1999-2020 <a href="https://basecamp.com">Basecamp</a>.</p>
</blockquote>
<p>In my first post, we looked at Shape Up fundamentals like <a href="/posts/shape-up-your-agile.html#six-week-dev-cycle-two-week-cool-down">six-week cycles</a>, <a href="/posts/shape-up-your-agile.html#estimating-vs-appetite">appetite</a>, <a href="/posts/shape-up-your-agile.html#work-shaping">work shaping</a>, and <a href="/posts/shape-up-your-agile.html#the-betting-table">the betting table</a>. We left off at the handoff of shaped, de-risked work to the development teams. This post will focus on Shape Up&rsquo;s approach to building. (And shipping!)</p>
<h2 id="hand-it-over-with-autonomy">Hand it over, with autonomy</h2>
<figure>
    <img loading="lazy" src="passing_the_baton.jpg"/> 
</figure>

<p>Before the dev team gets a project in Shape Up, two vital steps have already occurred:</p>
<ul>
<li>Work shaping – identifying what we want and what dev resources we&rsquo;re willing to invest to get it</li>
<li>Betting – firmly committing ourselves to this project versus others competing for the same resources</li>
</ul>
<p>Because these two pre-steps are beautifully thought through, and thanks also to other fundamentals such as six-week cycles and the circuit breaker, Shape Up positions dev teams to enjoy a degree of autonomy far beyond that of the usual agile approach. In effect, we are saying:</p>
<blockquote>
<p>&ldquo;Here you go, dev team. This is what we want, and we&rsquo;ll leave you alone for six weeks to build it. Implement the details as you see fit.</p>
<p>Good luck!&rdquo;</p>
</blockquote>
<p>Traditional agile, let&rsquo;s say Scrum for example, looks a bit different:</p>
<blockquote>
<p>&ldquo;OK dev team! Here are your <strong>tasks</strong>! We managers, infinitely wise, have selected these for you to do these over the next two weeks! Dig right in!</p>
<p>And don&rsquo;t worry, we&rsquo;re right here if you need help or any kind of decision. In fact, let&rsquo;s meet <strong>every morning</strong> to check in! Why don&rsquo;t we <strong>stand up</strong> at that meeting, sound good?</p>
<p>See you soon!&rdquo;</p>
</blockquote>



<iframe src="https://open.spotify.com/embed/track/3cLqK3LPVrTIzfENVmYLoU"
    width="300"
    height="380"
    frameborder="0"
    allowtransparency="true"
    allow="encrypted-media"></iframe>

<p>Autonomy (project) or code monkey (tasks), which do you think your dev teams will prefer?</p>
<h2 id="ramping-up">Ramping up</h2>
<p>What happens during the first two or three days after the project is kicked off? Lots of genuine work – orienting, exploring, understanding – but little of it visible.</p>
<figure>
    <img loading="lazy" src="skateboard_ramp.jpg"/> 
</figure>

<p>Shape Up encourages patience during these early days of the build cycle:</p>
<blockquote>
<p><a href="https://basecamp.com/shapeup/3.1-chapter-09#getting-oriented">Asking for visible progress will only push it underground. It’s better to empower the team to explicitly say “I’m still figuring out how to start” so they don’t have to hide or disguise this legitimate work.</a></p>
</blockquote>
<h2 id="let-there-be-tasks">Let there be tasks</h2>
<p>Since the dev team wasn&rsquo;t handed a list of tasks, they need to develop their own. Shape Up differentiates between <em>imagined</em> tasks &ndash; &ldquo;<a href="https://basecamp.com/shapeup/3.1-chapter-09#imagined-vs-discovered-tasks">tasks we <em>think</em> we need to do at the start of a project</a>&rdquo; &ndash; and <em>discovered</em> tasks &ndash; &ldquo;<a href="https://basecamp.com/shapeup/3.1-chapter-09#imagined-vs-discovered-tasks">tasks we <em>discover</em> we need to do in the course of doing real work</a>.&rdquo; There&rsquo;s nothing wrong with imagined tasks, and during early ramp-up, a number will come to mind and be captured.</p>
<blockquote>
<p><a href="https://basecamp.com/shapeup/3.1-chapter-09#imagined-vs-discovered-tasks">Then, as they get their hands dirty, they discover all kinds of other things that we didn’t know in advance. These unexpected details make up the true bulk of the project and sometimes present the hardest challenges.</a></p>
</blockquote>
<p>This is why, once the team is oriented, it&rsquo;s time to dig in and build something real. Only by starting to do real work are we going to discover the full set of to-dos that make up our project.</p>
<h2 id="slicing-off-a-piece-to-build">Slicing off a piece to build</h2>
<p>Shape Up warns us against building front end and back end components independently, and waiting until late in the project to start integrating them. Instead, we are encouraged to deliver integrated, demoable, &ldquo;really done&rdquo; pieces of work throughout the project, even here in week 1.</p>
<p>For this to be possible, you need to slice out a piece of the project that includes the right set of front end elements, combined with the back end elements that support them. As nicely illustrated in the book, this gets you a slice that works:</p>
<p><a href="https://basecamp.com/shapeup/3.2-chapter-10#integrate-one-slice"><figure>
    <img loading="lazy" src="merged-layers.jpg"/> 
</figure>
</a></p>
<p>In week 1, we aim to carve out, build, and demo our first slice, beginning a pattern that will continue throughout the cycle. But not just any slice – Shape Up encourages us to &ldquo;<a href="https://basecamp.com/shapeup/3.2-chapter-10#start-in-the-middle">start in the middle</a>&rdquo; for our first visible slice of work, building something that is central to the project, as opposed to a peripheral like the login screen. The identified slice needs to be small enough to execute in a few days, and if possible, something novel, new to us.</p>
<p>Further tips on building this first slice:</p>
<ul>
<li><a href="https://basecamp.com/shapeup/3.2-chapter-10#affordances-before-pixel-perfect-screens">Affordances before pixel-perfect screens</a> –  On the front end, start with rough, unstyled &ldquo;affordances&rdquo; rather than trying for pixel perfection.</li>
<li><a href="https://basecamp.com/shapeup/3.2-chapter-10#program-just-enough-for-the-next-step">Program just enough for the next step</a> – On the back end, you don&rsquo;t need to build everything, implementing a couple of fields or a bit of routing logic is fine, just enough to allow the front end of the slice to function.</li>
<li><a href="https://basecamp.com/shapeup/3.2-chapter-10#programmers-dont-need-to-wait">Programmers don&rsquo;t need to wait</a> – The pitch we received at project kickoff gives programmers enough information to start. There&rsquo;s no need to wait around for the front end developer to ask for something &ndash; go ahead and dig in.</li>
</ul>
<h2 id="integrated-slice--scope">Integrated slice = &ldquo;scope&rdquo;</h2>
<p>These things we&rsquo;ve been calling &ldquo;slices&rdquo; or &ldquo;pieces&rdquo; are actually a vital element of successful building with Shape Up, and they have an official name: scopes.</p>
<blockquote>
<p><a href="https://basecamp.com/shapeup/3.3-chapter-11#organize-by-structure-not-by-person">We call these integrated slices of the project scopes. We break the overall scope (singular) of the project into separate scopes (plural) that can be finished independently of each other.</a></p>
</blockquote>
<p>To be clear, scope as used in Shape Up means &ldquo;integrated slice&rdquo; — and definitely <strong>does not mean</strong> &ldquo;estimate&rdquo; or &ldquo;level of effort.&rdquo;</p>
<p>Scopes are fundamental in Shape Up because they are how work gets delivered, and therefore also how we organize tasks. They even help us define the <a href="http://basecamp.com/shapeup/3.3-chapter-11#the-language-of-the-project">language we use when talking about our project</a>.</p>
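<p>To make the idea concrete, here&rsquo;s a tiny sketch (mine, not from the book) of scopes as independently finishable groupings of tasks. The scope and task names are hypothetical:</p>

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    done: bool = False


@dataclass
class Scope:
    """An integrated slice of the project that can be finished independently."""
    name: str
    tasks: list[Task] = field(default_factory=list)

    @property
    def finished(self) -> bool:
        # Progress is reported per scope, not per task: a scope only
        # counts when every task inside it is done.
        return bool(self.tasks) and all(t.done for t in self.tasks)


# Hypothetical scopes for a messaging feature
drafts = Scope("Save draft", [Task("autosave endpoint"), Task("draft UI state")])
send = Scope("Send message", [Task("delivery job"), Task("send button")])

shipped = [s.name for s in (drafts, send) if s.finished]  # empty until a scope completes
```

<p>Organizing the task list this way means project status is simply the list of scopes done versus remaining, rather than a raw count of tasks.</p>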
<p><a href="https://basecamp.com/shapeup/3.3-chapter-11#the-language-of-the-project"><figure>
    <img loading="lazy" src="intro_cartoon.png"/> 
</figure>
</a></p>
<h2 id="scope-mapping">Scope mapping</h2>
<p>Recall that when our team is handed a project in the form of shaped work represented in a pitch, tasks are intentionally left undefined:</p>
<p><a href="https://basecamp.com/shapeup/3.3-chapter-11#case-study-message-drafts"><figure>
    <img loading="lazy" src="scopes-empty-1.png"/> 
</figure>
</a></p>
<p>In the early days of the project, we&rsquo;ll immediately uncover and write down a number of tasks, but these won&rsquo;t yet be organized in any meaningful or useful way:</p>
<p><a href="https://basecamp.com/shapeup/3.3-chapter-11#case-study-message-drafts"><figure>
    <img loading="lazy" src="scopes-tasks-only-1.png"/> 
</figure>
</a></p>
<p>When we identify and build our first slice of work, we start the process of identifying scopes and their associated tasks:
<a href="https://basecamp.com/shapeup/3.3-chapter-11#case-study-message-drafts"><figure>
    <img loading="lazy" src="scopes-first-slice.png"/> 
</figure>
</a></p>
<p>We complete our first scope &hellip;
<a href="https://basecamp.com/shapeup/3.3-chapter-11#case-study-message-drafts"><figure>
    <img loading="lazy" src="scopes-first-slice-done.png"/> 
</figure>
</a></p>
<p>&hellip; and then we&rsquo;re ready to take a pass at fully mapping out all of the project&rsquo;s scopes and their tasks:
<a href="https://basecamp.com/shapeup/3.3-chapter-11#case-study-message-drafts"><figure>
    <img loading="lazy" src="scopes-completed-map.png"/> 
</figure>
</a></p>
<p>Scopes evolve as we dig deeper and understand more:
<a href="https://basecamp.com/shapeup/3.3-chapter-11#case-study-message-drafts"><figure>
    <img loading="lazy" src="scopes-completed-map-refactor1.png"/> 
</figure>
</a></p>
<p>We build our way through these scopes &hellip;
<a href="https://basecamp.com/shapeup/3.3-chapter-11#case-study-message-drafts"><figure>
    <img loading="lazy" src="scopes-completed-map-partly-done.png"/> 
</figure>
</a></p>
<p>&hellip; until finally we&rsquo;re done:
<a href="https://basecamp.com/shapeup/3.3-chapter-11#case-study-message-drafts"><figure>
    <img loading="lazy" src="scopes-completed-map-done.png"/> 
</figure>
</a></p>
<h2 id="getting-the-scopes-right">Getting the scopes right</h2>
<blockquote>
<p><a href="https://basecamp.com/shapeup/3.3-chapter-11#discovering-scopes">You need to walk the territory before you can draw the map.</a></p>
</blockquote>
<p>Scope mapping is literally a blank-sheet-of-paper problem – so how can the team know its scopes are right? This is largely a matter of feel, &ldquo;does this feel right?&rdquo; Shape Up identifies three signs that our scopes are well-mapped:</p>
<ol>
<li><a href="https://basecamp.com/shapeup/3.3-chapter-11#how-to-know-if-the-scopes-are-right">You feel like you can see the whole project and nothing important that worries you is hidden down in the details.</a></li>
<li><a href="https://basecamp.com/shapeup/3.3-chapter-11#how-to-know-if-the-scopes-are-right">Conversations about the project become more flowing because the scopes give you the right language.</a></li>
<li><a href="https://basecamp.com/shapeup/3.3-chapter-11#how-to-know-if-the-scopes-are-right">When new tasks come up, you know where to put them</a>.</li>
</ol>
<p>Shape Up likewise lists three warning signs that our scopes may need to be redrawn:</p>
<ol>
<li><a href="https://basecamp.com/shapeup/3.3-chapter-11#how-to-know-if-the-scopes-are-right">It’s hard to say how “done” a scope is.</a> This can happen when unrelated concerns are mixed within a single scope, so that finishing tasks related to one concern doesn&rsquo;t result in any perceived progress on the other.</li>
<li><a href="https://basecamp.com/shapeup/3.3-chapter-11#how-to-know-if-the-scopes-are-right">The name isn’t unique to the project, like “front-end” or “bugs.”</a> Shape Up terms this kind of scope a “grab bag” or “junk drawer” of tasks that instead should be filed under other scopes.</li>
<li><a href="https://basecamp.com/shapeup/3.3-chapter-11#how-to-know-if-the-scopes-are-right">It’s too big to finish soon.</a> A scope that&rsquo;s overly large is effectively a project that hasn&rsquo;t been scope-mapped, with a long, unorganized task list, difficult to show quick progress on.</li>
</ol>
<h2 id="scope-shapes">Scope shapes</h2>
<p>Shape Up warns us that not all scopes have the nice <a href="https://basecamp.com/shapeup/3.3-chapter-11#layer-cakes">layer-cake shape</a> with a layer of UI over a thin supporting layer of back-end code.</p>
<figure>
    <img loading="lazy" src="https://images.unsplash.com/photo-1550757615-65043715811d"/> <figcaption>
            <p>
                    <a href="https://unsplash.com/@eadesstudio?utm_source=ghost&amp;utm_medium=referral&amp;utm_campaign=api-credit">Photo by James Eades / Unsplash</a></p>
        </figcaption>
</figure>

<p>Some scopes look more like <a href="https://basecamp.com/shapeup/3.3-chapter-11#icebergs">icebergs</a>, with a tiny bit of UI riding over a huge hidden mass of back-end logic, or vice-versa. Icebergs need to be thought about and scope-mapped a bit differently.</p>
<p>And despite the warning above, Shape Up acknowledges that most projects need a &ldquo;<a href="https://basecamp.com/shapeup/3.3-chapter-11#chowder">chowder</a>&rdquo; scope to hold tasks that don&rsquo;t fit elsewhere. We are advised to both expect, and maintain a healthy skepticism about, chowder scopes.</p>
<h2 id="tracking-progress-and-managing-uncertainty">Tracking progress and managing uncertainty</h2>
<p>In the handover section above, we highlighted the significant autonomy that Shape Up gives dev teams. That autonomy isn&rsquo;t total, though; stakeholders and managers still need visibility into project status and risks. Providing useful visibility without adding friction isn&rsquo;t easy.</p>
<p>We&rsquo;ve already talked about &ldquo;imagined&rdquo; vs &ldquo;discovered&rdquo; tasks: some tasks can be anticipated and planned for, but the tasks that represent the majority of project work are unanticipated and can only be discovered during project work. One implication is that our task list actually grows as the project progresses:
<a href="https://basecamp.com/shapeup/3.4-chapter-12#the-tasks-that-arent-there"><figure>
    <img loading="lazy" src="tasks_growing.jpg"/> 
</figure>
</a></p>
<p>This project is actually <strong><em>pro</em></strong>gressing in a natural way, but if you go by the length of the list or even the number of open tasks, it seems to be <strong><em>retro</em></strong>gressing. Clearly, to-do list counting isn&rsquo;t much of a metric to manage by.</p>
<p>Estimating task effort and tracking completion percentage seems like a smarter way to manage. The problem is that effort estimates are one-dimensional; they lack any measure of uncertainty:</p>
<blockquote>
<p><a href="https://basecamp.com/shapeup/3.4-chapter-12#estimates-dont-show-uncertainty">Say you have two tasks, both estimated to take four hours. If one task is something the team has done ten times in the past, you can be confident in the estimate. Suppose the other task is something the team has never done before, or it has unclear interdependencies. It could take the four hours if all goes perfectly, but due to the unknowns in it, it could stretch out to two to three days. It’s not meaningful to write “4 hours, or maybe 3 days” as the estimate.</a></p>
</blockquote>
<p>This &ldquo;4 hours or maybe 3 days&rdquo; brings back memories – that&rsquo;s an answer I&rsquo;ve gotten many times when, in my manager / stakeholder role, I&rsquo;d ask the dev team, &ldquo;well, how long will it take to build that?&rdquo; This seemingly flip answer is actually a pretty healthy one – my team was responding honestly and trying to communicate &ldquo;can&rsquo;t know until we start building it&rdquo; uncertainty that traditional estimates have no way to express.</p>
<p>We never did figure out a good system for managing uncertainty. But the Basecamp guys did.</p>
<h2 id="the-hill-chart">The hill chart</h2>
<p>The Basecamp team had a critical insight:</p>
<blockquote>
<p><a href="https://basecamp.com/shapeup/3.4-chapter-12#work-is-like-a-hill">Every piece of work has two phases. First there’s the uphill phase of figuring out what our approach is and what we’re going to do. Then, once we can see all the work involved, there’s the downhill phase of execution.</a></p>
</blockquote>
<p>The uphill part is where uncertainty is high; once we reach the top of the hill, there&rsquo;s still a lot of work to be done but it&rsquo;s all known work and therefore predictable, low-risk, downhill. Here&rsquo;s what that looks like in a picture:
<a href="https://basecamp.com/shapeup/3.4-chapter-12#work-is-like-a-hill"><figure>
    <img loading="lazy" src="hill_concept.png"/> 
</figure>
</a></p>
<p>Shape Up&rsquo;s hill chart is simply this hill curve applied to project scopes. Each scope is placed at the appropriate spot on the curve based on its completion and uncertainty status:
<a href="https://basecamp.com/shapeup/3.4-chapter-12#scopes-on-the-hill"><figure>
    <img loading="lazy" src="scopes_on_the_hill.png"/> 
</figure>
</a></p>
<p>If you&rsquo;re a manager or stakeholder, this one simple chart provides a complete bird&rsquo;s-eye view of how the project is coming along – what Shape Up calls &ldquo;<a href="https://basecamp.com/shapeup/3.4-chapter-12#status-without-asking">status without asking</a>.&rdquo;</p>
<p>There&rsquo;s actually a third dimension here, time. Since scopes move on the curve over time, we have a time series of hill charts showing how our scopes are progressing over time. Or in some cases, not progressing, or even retrogressing:
<a href="https://basecamp.com/shapeup/3.4-chapter-12#nobody-says-i-dont-know"><figure>
    <img loading="lazy" src="stuck_scope.png"/> 
</figure>
</a></p>
<p>By looking at how hill charts change over time, managers and stakeholders get critical insight into &ldquo;<a href="https://basecamp.com/shapeup/3.4-chapter-12#status-without-asking">what’s in motion and what’s stuck</a>.&rdquo; In this example, the red dot looks like a &ldquo;stuck scope.&rdquo; Noticing this, the manager can initiate a friendly conversation with the team, and work out a course correction that gets the scope back on track. All this happens with the least possible friction. No probing phone conversations, &ldquo;So, how are things going?&rdquo; No status reports. No daily standups.</p>
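<p>To make &ldquo;what&rsquo;s in motion and what&rsquo;s stuck&rdquo; concrete, here&rsquo;s a tiny illustrative model – my own sketch, with invented names and thresholds, not Basecamp&rsquo;s implementation. It records each scope&rsquo;s hill position (0 = just started, 50 = top of the hill, 100 = shipped) at each snapshot and flags scopes that haven&rsquo;t moved while still uphill:</p>

```python
# Toy model of hill-chart snapshots -- names and thresholds are invented
# for illustration; this is not how Basecamp implements hill charts.

from dataclasses import dataclass, field

@dataclass
class Scope:
    name: str
    history: list = field(default_factory=list)  # one hill position per snapshot

    def record(self, position: float) -> None:
        self.history.append(position)

    def is_uphill(self) -> bool:
        # Uphill = still figuring out the approach (left of the hilltop at 50)
        return bool(self.history) and self.history[-1] < 50

    def is_stuck(self, window: int = 3) -> bool:
        """Stuck = essentially no movement over the last `window` snapshots
        while the scope is still in the uncertain, uphill phase."""
        if len(self.history) < window:
            return False
        recent = self.history[-window:]
        return max(recent) - min(recent) < 1 and self.is_uphill()

autopay = Scope("Autopay")
for pos in [10, 15, 15, 15]:   # progress stalls in the unknowns
    autopay.record(pos)

emails = Scope("Emails")
for pos in [20, 45, 70, 90]:   # over the top and heading downhill
    emails.record(pos)

stuck = [s.name for s in (autopay, emails) if s.is_stuck()]
# stuck == ["Autopay"]
```

<p>The manager&rsquo;s job then reduces to scanning the flagged list – the code just formalizes the visual pattern the hill chart already makes obvious.</p>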
<h3 id="basecamps-hill-charts-implementation">Basecamp&rsquo;s hill charts implementation</h3>
<p>As you might expect, Basecamp&rsquo;s own <a href="https://basecamp.com/features/hill-charts">hill chart implementation</a> is excellent. You can simply drag a scope along the curve of the hill chart to update status:
<a href="https://basecamp.com/shapeup/3.4-chapter-12#status-without-asking"><figure>
    <img loading="lazy" src="dragging_scopes.gif"/> 
</figure>
</a></p>
<p>For managers and stakeholders, Basecamp can easily pull up the hill-chart-over-time view:
<a href="https://basecamp.com/shapeup/3.4-chapter-12#status-without-asking"><figure>
    <img loading="lazy" src="snapshots.png"/> 
</figure>
</a></p>
<h2 id="how-to-build-and-stay-out-of-trouble">How to build (and stay out of trouble)</h2>
<p>Scope mapping gives us an effective way to organize and talk about the work to be done. Hill charts help the team track and communicate progress during the build cycle. But does Shape Up help us actually execute the work?
<figure>
    <img loading="lazy" src="https://images.unsplash.com/photo-1541888946425-d81bb19240f5?ixlib=rb-1.2.1&amp;q=80&amp;fm=jpg&amp;crop=entropy&amp;cs=tinysrgb&amp;w=2000&amp;fit=max&amp;ixid=eyJhcHBfaWQiOjExNzczfQ"/> <figcaption>
            <p>
                    <a href="https://unsplash.com/@sunburned_surveyor?utm_source=ghost&amp;utm_medium=referral&amp;utm_campaign=api-credit">Photo by Scott Blake / Unsplash</a></p>
        </figcaption>
</figure>
</p>
<p>In my mind, the way scopes and hill charts train us to think and talk about the work, and for that matter everything else that Shape Up has laid out and walked us through, has all established the foundation for actually <strong><em>doing</em></strong> the work effectively. It&rsquo;s an iceberg. Up here in the sunlight is the dev team, supported by their managers and stakeholders, kicking ass. Down below, hidden, is the deep Shape Up foundation that forms and supports that ready-to-kick-ass team.</p>
<p>That said, Shape Up does provide solid guidance on how to build effectively:</p>
<ul>
<li><a href="https://basecamp.com/shapeup/3.4-chapter-12#build-your-way-uphill">Build your way uphill.</a> Early Shape Up adopters often run into &ldquo;backsliding,&rdquo; where a scope seems to make it to the top of the hill, but then slides back down into the unknown zone. &ldquo;Oh crap, it&rsquo;s actually a lot more complicated than I thought.&rdquo; Shape Up points out that this is often because &ldquo;<a href="https://basecamp.com/shapeup/3.4-chapter-12#build-your-way-uphill">somebody did the uphill work with their head instead of their hands.</a>&rdquo; The solution – build your way up the hill, rather than trying to do it in your head.</li>
<li><a href="https://basecamp.com/shapeup/3.4-chapter-12#solve-in-the-right-sequence">Solve in the right sequence.</a> Some scopes have more uncertainty than others. When deciding which scope to work on first, Shape Up counsels us to imagine being near the end of the cycle and running low on time – which scope would you <strong>not</strong> want to build, late in the game, under time pressure? The answer is clear and leads to the guidance, &ldquo;<a href="https://basecamp.com/shapeup/3.4-chapter-12#solve-in-the-right-sequence">push the scariest work uphill first.</a>&rdquo;</li>
<li><a href="https://basecamp.com/shapeup/3.4-chapter-12#prompts-to-refactor-the-scopes">Stuck scope? Refactor</a>. Sometimes a scope that appears stuck just needs to be redrawn. The most common scope-slicing problem is an over-broad slice that encompasses several concerns. The team might be progressing just fine on several of those concerns, but because they&rsquo;re lumped into one grouping, the scope can&rsquo;t move up the hill while any concern remains unsolved. Shape Up tells us, &ldquo;<a href="https://basecamp.com/shapeup/3.4-chapter-12#prompts-to-refactor-the-scopes">The solution in a case like this is to break the scope apart into smaller scopes that can move independently.</a>&rdquo;</li>
</ul>
<h2 id="stopping">Stopping</h2>
<figure>
    <img loading="lazy" src="https://images.unsplash.com/photo-1523537444585-432d2bacc10d?ixlib=rb-1.2.1&amp;q=80&amp;fm=jpg&amp;crop=entropy&amp;cs=tinysrgb&amp;w=2000&amp;fit=max&amp;ixid=eyJhcHBfaWQiOjExNzczfQ"/> <figcaption>
            <p>
                    <a href="https://unsplash.com/@jodaarba?utm_source=ghost&amp;utm_medium=referral&amp;utm_campaign=api-credit">Photo by Jose Aragones / Unsplash</a></p>
        </figcaption>
</figure>

<p>That there is an <a href="https://basecamp.com/shapeup/3.5-chapter-13">entire chapter</a> in the Shape Up book on deciding when to stop, complete with one more great tool called <a href="https://basecamp.com/shapeup/3.5-chapter-13#scope-hammering">scope hammering</a>, was a facepalm moment for me. It just never dawned on me that deciding when to stop is a really crucial part of the project. Now that Shape Up points it out – facepalm – of course. I should have realized this and approached it like <a href="https://www.rottentomatoes.com/m/the_martian/quotes/">Mark Watney in The Martian</a>,</p>
<blockquote>
<p><a href="https://twitter.com/neiltyson/status/610997574808395777">I’m going to have to science the shit out of this.</a></p>
</blockquote>
<p>Alas, I did not, and therefore our projects normally ended with a good dose of chaos and regret.</p>
<p>As you might have noticed, Shape Up is all about sciencing the shit out of every aspect of software development, and they did it again here.</p>
<h3 id="compare-to-the-baseline-not-the-ideal">Compare to the baseline, not the ideal</h3>
<p>Great people have a burning desire to do great work – and also a tendency to try for perfection. This is a valuable trait, but it can trip us up when it&rsquo;s time to finish and ship. We compare what we&rsquo;ve built so far to the ideal solution we have in mind, and want to keep going to achieve that ideal. Shape Up tells us that</p>
<blockquote>
<p><a href="https://basecamp.com/shapeup/3.5-chapter-13#compare-to-baseline">It helps to shift the point of comparison. Instead of comparing up against the ideal, compare down to baseline—the current reality for customers. How do customers solve this problem today, without this feature? What’s the frustrating workaround that this feature eliminates? How much longer should customers put up with something that doesn’t work or wait for a solution because we aren’t sure if design A might be better than design B?</a></p>
</blockquote>
<h3 id="scope-creep">Scope creep</h3>
<p>Shape Up points out that <a href="https://basecamp.com/shapeup/3.5-chapter-13#scope-grows-like-grass">scope grows naturally, like grass</a>:</p>
<blockquote>
<p><a href="https://basecamp.com/shapeup/3.5-chapter-13#scope-grows-like-grass">Scope grows naturally. Scope creep isn’t the fault of bad clients, bad managers, or bad programmers. Projects are opaque at the macro scale. You can’t see all the little micro-details of a project until you get down into the work. Then you discover not only complexities you didn’t anticipate, but all kinds of things that could be fixed or made better than they are.</a></p>
</blockquote>
<p>So why feel ashamed or angry about something that&rsquo;s just a reality of software development (and in fact all project work)?  Shape Up tells us:</p>
<blockquote>
<p><a href="https://basecamp.com/shapeup/3.5-chapter-13#scope-grows-like-grass">This is how it is. Rather than trying to stop scope from growing, give teams the tools, authority, and responsibility to constantly cut it down.</a></p>
</blockquote>
<h3 id="scope-hammering">Scope hammering</h3>
<figure><a href="https://www.yuyakobayashi.com/en/product-category/fryingpans">
    <img loading="lazy" src="DYN2272edits.jpg"/> </a><figcaption>
            <p>Metal hammering artist Yuya Kobayashi</p>
        </figcaption>
</figure>

<p>How does Shape Up keep the scope grass mowed? Let&rsquo;s not forget one of the Shape Up fundamentals we covered in part one: the circuit breaker. &ldquo;<a href="https://basecamp.com/shapeup/3.5-chapter-13#limits-motivate-trade-offs">If the work doesn’t get done, the project doesn’t happen.</a>&rdquo; That hard limit underlies the subject of controlling scope and deciding when to stop.</p>
<blockquote>
<p><a href="https://basecamp.com/shapeup/3.5-chapter-13#limits-motivate-trade-offs">This forces the team to make trade-offs. When somebody says “wouldn’t it be better if…” or finds another edge case, they should first ask themselves: Is there time for this? &hellip;. We expect our teams to actively make trade-offs and question the scope instead of cramming and pushing to finish tasks</a></p>
</blockquote>
<p>The Shape Up authors thought carefully about the name, too:</p>
<blockquote>
<p><a href="https://basecamp.com/shapeup/3.5-chapter-13#scope-hammering">People often talk about “cutting” scope. We use an even stronger word—hammering—to reflect the power and force it takes to repeatedly bang the scope so it fits in the time box.</a></p>
</blockquote>
<p>Scope hammering is largely about carefully considering which scope changes are <a href="https://basecamp.com/shapeup/4.5-appendix-06">must haves</a> and which are <a href="https://basecamp.com/shapeup/4.5-appendix-06">nice to haves</a>. Finally, of course, you need the courage and empowerment to make the hard decisions that lead to shipping on schedule – to say no. Courage and empowerment are a thread that runs throughout Shape Up.</p>
<h2 id="deploy-and-cool-down">Deploy and cool down</h2>
<p>One final reminder that the title of the Shape Up book includes the phrase &ldquo;Ship Work that Matters&rdquo; – or to quote the book itself, &ldquo;<a href="https://basecamp.com/shapeup/3.1-chapter-09#done-means-deployed">done means deployed</a>.&rdquo; Once you&rsquo;ve taken care of that little detail, then comes <a href="https://basecamp.com/shapeup/2.2-chapter-08#cool-down">cool-down</a>. We are cautioned to &ldquo;move on,&rdquo; keep the plate clean, and not get sucked into making immediate changes to what we just shipped based on early feedback.</p>
<h2 id="if-you-want-to-try-shape-up">If you want to try Shape Up</h2>
<p>As I was putting finishing touches on this post, I noticed that the Shape Up book includes a <a href="https://basecamp.com/shapeup/4.2-appendix-03">handy appendix</a> on how to give Shape Up a try, with tips, cautions, and <a href="https://basecamp.com/shapeup/4.2-appendix-03#option-a-one-six-week-experiment">three suggested approaches</a> based on your situation.</p>
<h2 id="thank-you">Thank you!</h2>
<p>If you made it here, through two parts and nigh on 7,000 words, thank you!</p>
<p>Special thanks to <a href="https://basecamp.com">Basecamp</a> and <a href="https://twitter.com/rjs">Ryan Singer</a> for their generosity in writing up Shape Up and sharing it freely with us.</p>
<p>Finally, thanks to <a href="https://www.linkedin.com/in/tenchu/">Ten Chu</a>, who acted as a sounding board for me around aspects of Shape Up. I&rsquo;m also grateful to a friend of Ten&rsquo;s, Will Warren, who as CTO of <a href="https://searchspring.com">Searchspring</a> is an experienced Shape Up user, and shared many useful insights.</p>
<p>Speaking of which, I hope to do a follow-up post or two based on conversations with Shape Up practitioners. Stay tuned.</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>Shape Up Your Agile</title>
      <link>https://craftycto.com/blog/shape-up-your-agile/</link>
      <pubDate>Sun, 12 Apr 2020 22:32:00 +0000</pubDate>
      
      <guid>https://craftycto.com/blog/shape-up-your-agile/</guid>
      <description>Shape Up: Stop Running in Circles and Ship Work that Matters, written by Basecamp&amp;rsquo;s Ryan Singer, hit my radar last summer, and I&amp;rsquo;ve been trying to write about it ever since.</description>
      <content:encoded><![CDATA[<p>The <a href="https://basecamp.com">Basecamp</a> guys have been busy lately. I follow <a href="https://twitter.com/dhh">DHH</a> and <a href="https://twitter.com/jasonfried">Jason Fried</a> on Twitter and generally find myself agreeing with their often-contrarian points of view. I love their <a href="https://basecamp.com/guides/how-we-communicate">Guide to Internal Communication</a>, their pushback against insane 80-hour workweeks in <a href="https://basecamp.com/books/calm">It Doesn&rsquo;t Have to be Crazy at Work</a> &hellip; <a href="https://basecamp.com/books/rework">REWORK</a> &hellip; <a href="https://basecamp.com/books/remote">REMOTE</a> &hellip; <a href="https://basecamp.com/books/getting-real">Getting Real</a>. This is a seriously impressive body of work.</p>
<p><a href="https://basecamp.com/shapeup">Shape Up: Stop Running in Circles and Ship Work that Matters</a>, written by Basecamp&rsquo;s <a href="https://twitter.com/rjs">Ryan Singer</a>, hit my radar last summer, and I&rsquo;ve been trying to write about it ever since.</p>
<blockquote>
<p>Before I dig in, let me say that this post includes a number of quotes and several hand-drawn illustrations from Shape Up. All are linked back to their source in the <a href="https://basecamp.com/shapeup/webbook">online book</a> and are copyright ©1999-2020 <a href="https://basecamp.com">Basecamp</a>.</p>
</blockquote>
<p>My problem has been, it&rsquo;s a big subject, and the deeper I dug, the bigger it got. Big as in a <a href="https://basecamp.com/shapeup/shape-up.pdf">143 page PDF</a>. Bigger – it speaks to every aspect of the software development process. Biggest – it represents an alternative to All Those Other Agile Methodologies That Everybody Uses.</p>
<p>In the event that the Basecamp guys happen to read this, they might be shrieking and gibbering right now, because they do not, never ever, in any way or sense, describe Shape Up as an &ldquo;agile methodology.&rdquo; A case in point: if you search the Shape Up ebook for &ldquo;agile,&rdquo; you get one lonely match (in <a href="https://basecamp.com/shapeup/0.1-foreword">Jason Fried&rsquo;s Foreword</a>):</p>
<figure>
    <img loading="lazy" src="File-Apr-12--9-21-01-PM.jpeg"/> 
</figure>

<p>Likewise, just one match each if you search for &ldquo;scrum&rdquo; and &ldquo;methodology.&rdquo; In all cases, the point being made is, &ldquo;Shape Up isn&rsquo;t one of those!&rdquo;</p>
<h2 id="so-what-is-shape-up">So what is Shape Up?</h2>
<p>Shape Up is the Basecamp team&rsquo;s distillation of how they themselves develop software, superbly written and illustrated, freely contributed to the world as an online and downloadable e-book. Shape Up occupies the same territory, serves the same function, as Those Various Agile Methodologies.</p>
<p>Given that almost everybody is doing some kind of agile – formal, informal, sorta-kinda, whatever – it&rsquo;s worth establishing a baseline by taking a quick look at where agile started and what it has become today.</p>
<h2 id="where-agile-began">Where agile began</h2>
<p>I was managing software development projects at <a href="https://en.wikipedia.org/wiki/Viant">Viant</a> in February 2001 when the <a href="http://agilemanifesto.org">Manifesto for Agile Software Development</a> was published. In those days, most software development happened using the lumbering, ultra-heavyweight waterfall methodology. The Manifesto outlined the principles of a radically different approach – far lighter weight, with a far greater emphasis on people and collaboration. The Manifesto didn&rsquo;t invent agile, but rather reconciled a number of existing &ldquo;light&rdquo; or &ldquo;lightweight&rdquo; methodologies that came into being as a backlash against waterfall.</p>
<figure>
    <img loading="lazy" src="davidvgoliath.jpg"/> 
</figure>

<p>The Manifesto got our attention, and led to a broad, radical rethinking of the entire software development process, cutting away process overhead and improving the quality of delivery. Agile became a serious, lightweight alternative to the heavyweight incumbent that was waterfall. (If you&rsquo;re interested in more history and perspective on the Manifesto, I enjoyed this <a href="https://www.infoq.com/podcasts/examining-agile-manifesto/">podcast interview of Scott Duncan</a>.)</p>
<h2 id="what-agile-has-become">What agile has become</h2>
<p>20 years have passed, and agile has gone from upstart to incumbent. At its worst, &ldquo;bad agile&rdquo; has become grotesque, bloated like Dune&rsquo;s <a href="https://en.wikipedia.org/wiki/Vladimir_Harkonnen">Baron Vladimir Harkonnen</a>, accreting process overhead, cargo cult rituals, pricey certifications, and legions of gurus to help you Do Agile Right$$. One anecdote I heard recently: an agile expert convinced a company to pay him $2K per head to train every single employee in their technology organization in Scrum. The tech organization had over 50 people, but only a third of those were actually developers. Total cost, well over $100K. Result: a couple of limp two-week sprints followed by total regression to the old process.</p>
<figure>
    <img loading="lazy" src="McMillan_as_Harkonnen-3.jpg"/> 
</figure>

<p>We had a pretty solid dev process at the music app maker I co-founded, <a href="https://www.macstories.net/news/apple-design-awards-2010-winners-announced/">Agile Partners</a>. I guess we fell in the informally agile or influenced-by-agile camp, but it worked – tight, efficient, effective. In recent years, when I&rsquo;ve gone out seeking a fresh answer to &ldquo;What&rsquo;s the best way to do agile these days?&rdquo; it was like landing on a different planet: nothing even vaguely resembled either the Manifesto or the nimble approach we had applied so successfully at Agile Partners. Going by what I found, modern agile seemed to have been hijacked by expert$ and transformed into a monstrosity with many of the same drawbacks as the old waterfall process that early agile was created to overcome.</p>
<p>Not all agile is bad agile. My friends at the Chicago CTO Group are very smart people, and everyone I&rsquo;ve spent time with is practicing some form of agile effectively. I wonder, though, whether agile&rsquo;s accumulated bloat and baggage is forcing these good folks to swim upstream.</p>
<h2 id="shape-up-a-breath-of-fresh-air">Shape Up: a breath of fresh air</h2>
<p>When I caught wind of Shape Up and started to read through the book, I had a flood of reactions – &ldquo;This makes sense!&rdquo; &ldquo;Wow, this is a lot like our process at Agile Partners &hellip;&rdquo; &ldquo;Man, they&rsquo;ve really thought this through &hellip;&rdquo; &ldquo;Whoa, what an interesting approach to that problem &hellip;&rdquo;</p>
<figure>
    <img loading="lazy" src="Colchuck-Late-Morning--1-.jpeg"/> 
</figure>

<p>The Basecamp guys aren&rsquo;t afraid to take a different path than the herd, and many aspects of Shape Up directly confront (or more accurately, ignore) consensus agile practices. If you practice some form of agile software development today, I&rsquo;m convinced it&rsquo;s well worth putting aside your methodology and process beliefs and reading the Shape Up book, cover to cover, with a deeply open mind. When Shape Up follows the road less traveled, there&rsquo;s a smart reason for it. When Shape Up diverges, that&rsquo;s an opportunity to reexamine your current processes and ask yourself: why are we doing it the way we are today?</p>
<h2 id="six-week-dev-cycle-two-week-cool-down">Six-week dev cycle, two-week cool-down</h2>
<p>So let&rsquo;s get right in your face, shall we? Those two-week sprints you&rsquo;re running today? Forgetaboutit. Shape Up uses six-week cycles instead, with a two-week cool-down between cycles. They make the point that two weeks is not enough time to get anything meaningful done, and that the overhead of planning sprints every two weeks just isn&rsquo;t worth it. I&rsquo;ve done both, and I side with Shape Up on this one.</p>
<figure>
    <img loading="lazy" src="https://images.unsplash.com/photo-1461896836934-ffe607ba8211?ixlib=rb-1.2.1&amp;q=80&amp;fm=jpg&amp;crop=entropy&amp;cs=tinysrgb&amp;w=2000&amp;fit=max&amp;ixid=eyJhcHBfaWQiOjExNzczfQ"/> <figcaption>
            <p>
                    <a href="https://unsplash.com/@bradencollum?utm_source=ghost&amp;utm_medium=referral&amp;utm_campaign=api-credit">Photo by Braden Collum / Unsplash</a></p>
        </figcaption>
</figure>

<p>In Shape Up, six weeks means six weeks of &ldquo;leave them alone and let them work&rdquo; uninterrupted time. When the cycle team (typically one designer and one or two programmers) is handed a piece of work, it comes with that commitment from management. No daily standups, no counting hours, no parcelling out bite-sized tasks, no questioning how time is spent.</p>
<p>Cool-down is used, well, to cool down, to breathe, like after a hard workout. But also for planning the next cycle, fixing some bugs, experimenting, whatever.</p>
<h3 id="the-circuit-breaker">The circuit breaker</h3>
<figure>
    <img loading="lazy" src="Four_1_pole_circuit_breakers800.jpg"/> 
</figure>

<p>Dev teams are given a carefully-shaped, de-risked project, and six weeks to build it. In return, the team is expected to finish and release the project in that six-week window. This expectation is baked into Shape Up through the <a href="https://basecamp.com/shapeup/2.2-chapter-08#the-circuit-breaker">circuit breaker</a>: when the six weeks are up, if the project hasn&rsquo;t finished and released, the breaker pops, work stops, and this project is over. If there&rsquo;s an appetite for finishing the work, that becomes a new project, and it goes back through the same up-front process that any other project does – work-shaping, de-risking, and betting, which we&rsquo;ll look at shortly. Allowing a project to overrun its deadline is almost unheard of.</p>
<h2 id="work-shaping">Work shaping</h2>
<p>Shape Up takes its name from work shaping, and I wouldn&rsquo;t be surprised if the Basecamp guys put more effort into getting this part of the process right than any other. In work shaping, a small, experienced team defines the key elements of the project, at exactly the right level of abstraction, with enough detail that the dev team knows what to build, but not so much that the team can&rsquo;t exercise its skill and imagination in crafting the solution.</p>
<p><a href="https://basecamp.com/shapeup/1.1-chapter-02"><img loading="lazy" src="/blog/shape-up-your-agile/levels_of_abstraction-bf55d00941aad5a9c895d24dfefcde27572b8cf35d839155839af190f84ed815-1.jpg" type="" alt="levels_of_abstraction-bf55d00941aad5a9c895d24dfefcde27572b8cf35d839155839af190f84ed815-1"  /></a></p>
<h3 id="nailing-the-right-level-of-abstraction">Nailing the right level of abstraction</h3>
<p>I love Shape Up&rsquo;s level-of-abstraction principles:</p>
<ul>
<li><a href="https://basecamp.com/shapeup/1.1-chapter-02#wireframes-are-too-concrete">Wireframes are too concrete</a></li>
<li><a href="https://basecamp.com/shapeup/1.1-chapter-02#words-are-too-abstract">Words are too abstract</a></li>
</ul>
<p>I&rsquo;m a word guy personally, perhaps because I&rsquo;m too short of design genes to even do a decent wireframe. But I&rsquo;ve been burned over and over by trying to define a piece of work using just words; the team just can&rsquo;t visualize it. And I&rsquo;ve certainly experienced too concrete as well, when wireframes were part of the project spec, pre-defining the solution and excluding all the great ideas the team might have come up with if they hadn&rsquo;t been over-constrained. I never did figure out a good system for creating specs with the right abstraction level; kudos to the Basecamp guys for pulling this off.</p>
<p>Nailing the right level of abstraction in the project definition reinforces an idea we introduced earlier: Shape Up dev teams are autonomous, fully responsible for creating the solution, and the project spec supports this.</p>
<h3 id="properties-of-shaped-work">Properties of shaped work</h3>
<figure>
    <img loading="lazy" src="wood-shaping-tools_3.png"/> 
</figure>

<p>Besides the right level of abstraction, shaped work has a few other key characteristics:</p>
<ol>
<li><a href="https://basecamp.com/shapeup/1.1-chapter-02#property-1-its-rough">It&rsquo;s rough.</a> When you look at a shaped project spec (or &ldquo;pitch&rdquo;), it&rsquo;s obvious that it&rsquo;s unfinished. <a href="https://basecamp.com/shapeup/1.1-chapter-02#property-1-its-rough">&ldquo;Work that&rsquo;s too fine, too early commits everyone to the wrong details.&rdquo;</a></li>
<li><a href="https://basecamp.com/shapeup/1.1-chapter-02#property-2-its-solved">It&rsquo;s solved.</a> The shaping team, a small and very senior group, has put a lot of thought into how this thing works. All the pieces are there, and they connect properly.</li>
<li><a href="https://basecamp.com/shapeup/1.1-chapter-02#property-3-its-bounded">It&rsquo;s bounded.</a> Shapers have likewise thought carefully about where not to go, &ldquo;there be monsters&rdquo; regions likely to trip up the dev team and derail the cycle.</li>
</ol>
<p>Note that the shaping team is separate from the dev team(s), and that shaping happens in a completely separate, parallel track from the dev teams&rsquo; building track.</p>
<h3 id="estimating-vs-appetite">Estimating vs. appetite</h3>
<p>I&rsquo;ve talked to lots of CTOs and engineering leads in the past year, and estimating is a sore spot, often turned into an open wound by an overly-rapid sprint cycle. A lot of bandwidth gets poured into agile estimating, with many people using <a href="https://www.atlassian.com/agile/project-management/estimation">Story Points</a> as opposed to trying to estimate effort directly. Anecdotally, Story Points seem to work better, if not very well. Personally, Story Points set off my Voodoo Science alarm.</p>
<p>Shape Up&rsquo;s approach to the estimating problem is jujitsu – turn the problem on its head – through the idea of appetites.</p>
<p>Every piece of work starts with a raw idea, and with a raw idea in hand, our first step here is to &ldquo;set the appetite&rdquo;: how much of an investment is this idea worth? Is it worth a couple of team weeks (what Shape Up calls a Small Batch)? A whole six-week cycle (a Big Batch)? As the book says,</p>
<blockquote>
<p><a href="https://basecamp.com/shapeup/1.2-chapter-03#fixed-time-variable-scope">An appetite is completely different from an estimate. Estimates start with a design and end with a number. Appetites start with a number and end with a design.</a></p>
</blockquote>
<figure>
    <img loading="lazy" src="kicking-the-can.jpg"
         alt="Story Points: the process"/> <figcaption>
            <p>Story Points: the process</p>
        </figcaption>
</figure>

<p>So where an estimate is fixed scope, variable time, an appetite is fixed time, variable scope. Thus appetite becomes the key constraint that we have to work against as we create our design for this work. It empowers us to make smart trade-offs, both during the shaping process and when building. And unlike Story Points, which to some degree just kick the estimate can down the road to be dealt with later, appetite means we are dealing with the scope-vs-time tension from the very beginning.</p>
<h3 id="what-is-this-thing-sketching-the-elements">What is this thing? Sketching the elements</h3>
<p>So, we have a raw idea that we like enough to spend more time shaping, and we&rsquo;ve established our appetite – say, a Big Batch. Now it&rsquo;s time to circle back to nailing the right level of abstraction: more abstract and less defined than wireframes, but less abstract than naked words. Here Shape Up introduces two tools that help us spec elements at the right level.</p>
<p><strong>Breadboarding</strong> is something electrical engineers do when they want to wire together an early circuit prototype, without the extra work of laying out a PCB or even soldering.</p>
<figure>
    <img loading="lazy" src="/images/2020/04/breadboard-hole-leads-1.jpg"/> 
</figure>

<p>Shape Up breadboards are the same idea applied to software, giving us a fast way to &ldquo;wire up&rdquo; affordances, actionable elements like buttons and fields, with places, like screens or dialogs. Here&rsquo;s an example breadboard diagram from the Shape Up book for an invoice autopay feature – note how nicely this fits between overly-abstract words and over-detailed wireframes:</p>
<p><a href="https://basecamp.com/shapeup/1.3-chapter-04#breadboarding"><img loading="lazy" src="/blog/shape-up-your-agile/breadboard.png" type="" alt="Breadboard"  /></a></p>
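Breadboards are abstract enough that they boil down to a tiny data structure: places, the affordances they hold, and connection lines between them. Here is a minimal Python sketch of that structure; the <code>Place</code>/<code>Affordance</code> classes, the element names, and the <code>walk</code> helper are my paraphrase of the book&rsquo;s autopay example, not anything Shape Up itself defines.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Place:
    """A place on the breadboard: a screen, dialog, or menu."""
    name: str
    affordances: list = field(default_factory=list)

@dataclass
class Affordance:
    """Something the user can act on: a button, field, or link."""
    label: str
    leads_to: Optional[Place] = None  # connection line to another place

# Roughly the book's invoice-autopay example (names paraphrased):
invoice = Place("Invoice")
setup = Place("Set Up Autopay")
confirm = Place("Confirmation")

invoice.affordances.append(Affordance("Turn on autopay", leads_to=setup))
setup.affordances += [
    Affordance("Credit card number fields"),
    Affordance("Submit", leads_to=confirm),
]

def walk(place: Place, depth: int = 0) -> None:
    """Print the breadboard as an indented outline of places and wires."""
    print("  " * depth + place.name.upper())
    for a in place.affordances:
        print("  " * depth + f"  [{a.label}]")
        if a.leads_to:
            walk(a.leads_to, depth + 1)

walk(invoice)
```

The point of the exercise is what the structure leaves out: no layout, no styling, no copy, just which actions live where and what they connect to.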
<p><strong>Fat marker sketches</strong> fit the bill when we need to get visual but need to make sure we don&rsquo;t go overboard and overconstrain by creating a wireframe or high-res mockup. Shape Up defines these as &ldquo;<a href="https://basecamp.com/shapeup/1.3-chapter-04#fat-marker-sketches">a sketch made with such broad strokes that adding detail is difficult or impossible</a>.&rdquo; Here&rsquo;s a grouped-list example from the book:</p>
<p><a href="https://basecamp.com/shapeup/1.3-chapter-04#fat-marker-sketches"><img loading="lazy" src="/blog/shape-up-your-agile/fat_marker.png" type="" alt="fat_marker"  /></a></p>
<p>Shape Up sketches were originally drawn with Sharpies, but these days are often created on iPads equipped with Apple Pencils.</p>
<h3 id="de-risking">De-risking</h3>
<p>As mentioned earlier, shaped work is bounded. Once we&rsquo;ve sketched out the elements of the solution, it&rsquo;s time to look for trouble: things that could derail the cycle and cause &ldquo;failure to ship.&rdquo; The Basecamp guys again bring a lot of discipline to this part of the process, identifying four tools for de-risking:</p>
<ol>
<li><a href="https://basecamp.com/shapeup/1.4-chapter-05#look-for-rabbit-holes">Looking for rabbit holes</a>, where we slow way down, look carefully for what we might have missed in what we&rsquo;ve sketched out, and then patch the hole.</li>
<li><a href="https://basecamp.com/shapeup/1.4-chapter-05#declare-out-of-bounds">Declaring out of bounds</a>, where we identify any parts of the solution that are ripe for an overly-complex implementation, and make clear such an implementation is out of bounds.</li>
<li><a href="https://basecamp.com/shapeup/1.4-chapter-05#cut-back">Cutting back</a>, a general philosophy of Shape Up: look for parts of our solution that we got excited about during sketching but which really aren&rsquo;t necessary. Cut them, perhaps mentioning them as &ldquo;nice to haves.&rdquo;</li>
<li><a href="https://basecamp.com/shapeup/1.4-chapter-05#present-to-technical-experts">Presenting to technical experts</a>, if there are areas the shaping team aren&rsquo;t 100% comfortable with, making it clear that <a href="https://basecamp.com/shapeup/1.4-chapter-05#present-to-technical-experts">&ldquo;we&rsquo;re really hunting for time bombs that might blow up the project.&rdquo;</a></li>
</ol>
<h3 id="package-up-a-pitch">Package up a pitch</h3>
<p>At this point, our shaping team has picked up a worthy raw idea, determined what appetite we have for it, sketched it out at the right level of abstraction, and addressed potential risks. If the idea was bad or if there wasn&rsquo;t an acceptable solution given our appetite, the idea would have been dropped.</p>
<p>So if the idea has survived to this point, we have a winner, right? Not yet. What we have is a viable idea: one with a solution that&rsquo;s doable within the constraints of our appetite. What we don&rsquo;t know yet is how this idea compares with the other ideas competing for development resources. That&rsquo;s where our next topic, betting, comes into play.</p>
<p>So far, our work has stayed mostly internal to the shaping team, and the idea exists mainly in the minds of this team, supported by a few rough sketches and notes. We need something better when we go outside the team, what Shape Up calls a pitch. The pitch needs to help the outside person, who lacks the deep shared context of the shaping team, to understand what this thing is: &ldquo;<a href="https://basecamp.com/shapeup/1.5-chapter-06#help-them-see-it">help them see it.</a>&rdquo; Don&rsquo;t worry, though, pitches aren&rsquo;t overly elaborate or time-consuming to create; as with all things Shape Up, a pitch is &ldquo;<a href="https://quoteinvestigator.com/2011/05/13/einstein-simple/">as simple as it can be, but not simpler</a>.&rdquo;</p>
<h2 id="betting-instead-of-backlogs">Betting instead of backlogs</h2>
<p>It&rsquo;s time again for Shape Up to drive a stake into the heart of another fundamental agile concept, the centralized backlog.</p>
<p><a href="http://basecamp.com/shapeup/2.1-chapter-07"><img loading="lazy" src="/blog/shape-up-your-agile/intro_cartoon.png" type="" alt="intro_cartoon"  /></a></p>
<p>Here&rsquo;s why:</p>
<blockquote>
<p><a href="https://basecamp.com/shapeup/2.1-chapter-07">&ldquo;Backlogs are a big weight we don’t need to carry. Dozens and eventually hundreds of tasks pile up &hellip;. Just because somebody thought some idea was important a quarter ago doesn’t mean we need to keep looking at it again and again.&rdquo;</a></p>
</blockquote>
<blockquote>
<p><a href="https://basecamp.com/shapeup/2.1-chapter-07">&ldquo;Backlogs are big time wasters &hellip;. time spent constantly reviewing, grooming and organizing old ideas prevents everyone from moving forward on the timely projects that really matter right now.&rdquo;</a></p>
</blockquote>
<p>What does Shape Up do instead? A betting table, held during the cool-down period between six-week cycles, where a small list of the pitches we just described is presented as possible bets for the upcoming cycle. Most of the pitches will have been developed by the shaping team during the previous six-week cycle, but a few might be ones that were passed over at past betting tables, revived for another shot this time.</p>
<h3 id="but--but--but--my-lists">But &hellip; but &hellip; but &hellip; my lists!</h3>
<p>At this point, you may be freaking out &hellip; what about my lists? Don&rsquo;t I need lists? Won&rsquo;t we lose things? Shape Up isn&rsquo;t trying to kill lists and tracking things, just the centralized backlog that has to be continually groomed and managed. Decentralized lists are great, keep whatever you need:</p>
<blockquote>
<p><a href="https://basecamp.com/shapeup/2.1-chapter-07#decentralized-lists">&ldquo;We don’t have to choose between a burdensome backlog and not remembering anything from the past. Everyone can still track pitches, bugs, requests, or things they want to do independently without a central backlog. Support can keep a list of requests or issues that come up more often than others. Product tracks ideas they hope to be able to shape in a future cycle. Programmers maintain a list of bugs they’d like to fix when they have some time. There’s no one backlog or central list and none of these lists are direct inputs to the betting process.&rdquo;</a></p>
</blockquote>
<p>Still, you might say, if we don&rsquo;t have our centralized master backlog, don&rsquo;t we risk losing some really important ideas? No, says Shape Up, important ideas will keep reemerging:</p>
<blockquote>
<p><a href="https://basecamp.com/shapeup/2.1-chapter-07#important-ideas-come-back">&ldquo;Really important ideas will come back to you. When’s the last time you forgot a really great, inspiring idea? And if it’s not that interesting—maybe a bug that customers are running into from time to time—it’ll come back to your attention when a customer complains again or a new customer hits it. If you hear it once and never again, maybe it wasn’t really a problem. And if you keep hearing about it, you’ll be motivated to shape a solution and pitch betting time on it in the next cycle.&rdquo;</a></p>
</blockquote>
<h3 id="the-betting-table">The betting table</h3>
<figure>
    <img loading="lazy" src="https://images.unsplash.com/photo-1517232115160-ff93364542dd?ixlib=rb-1.2.1&amp;q=80&amp;fm=jpg&amp;crop=entropy&amp;cs=tinysrgb&amp;w=2000&amp;fit=max&amp;ixid=eyJhcHBfaWQiOjExNzczfQ"/> <figcaption>
            <p>
                    <a href="https://unsplash.com/@kaysha?utm_source=ghost&amp;utm_medium=referral&amp;utm_campaign=api-credit">Photo by Kay / Unsplash</a></p>
        </figcaption>
</figure>

<p>The betting table is actually a relatively short meeting, likely one or two hours, not necessarily in-person, where a small group of stakeholders choose what gets worked on during the next six-week development cycle. In a smaller company, stakeholders might include the CEO, CTO, and product leadership; in larger companies, whatever level of senior management has 100% decision-making responsibility – it won&rsquo;t work if the person or persons with final say aren&rsquo;t at the table. As the book says,</p>
<blockquote>
<p><a href="https://basecamp.com/shapeup/2.2-chapter-08#the-betting-table">&ldquo;The highest people in the company are there. There’s no “step two” to validate the plan or get approval. And nobody else can jump in afterward to interfere or interrupt the scheduled work. This buy-in from the very top is essential to making the cycles turn properly.&rdquo;</a></p>
</blockquote>
<p>All the shaping team&rsquo;s efforts – shaping at the right level of abstraction, carefully de-risking, crafting the pitch – pay off at the betting table, putting exactly the right information in the hands of exactly the right decision-makers.</p>
<blockquote>
<p><a href="https://basecamp.com/shapeup/2.2-chapter-08#the-betting-table">&ldquo;The meeting is short, the options well-shaped, and the headcount low. When these criteria are met, the betting table becomes a place to exercise control over the direction of the product instead of a battle for resources or a plead for prioritization.&rdquo;</a></p>
</blockquote>
<p>The betting table is thus a big win for management too:</p>
<blockquote>
<p><a href="https://basecamp.com/shapeup/2.2-chapter-08#the-betting-table">&ldquo;With cycles long enough to make meaningful progress and shaped work that will realistically ship, the betting table gives the C-suite a “hands on the wheel” feeling they haven’t had since the early days.&rdquo;</a></p>
</blockquote>
<p>For more information on how the betting table meeting works, check out the <a href="https://basecamp.com/shapeup/2.2-chapter-08#questions-to-ask">&ldquo;Questions to ask&rdquo;</a> section in the book.</p>
<h3 id="why-the-emphasis-on-bet">Why the emphasis on &ldquo;bet&rdquo;?</h3>
<figure>
    <img loading="lazy" src="https://images.unsplash.com/photo-1446669052213-5dcff53f1f3f?ixlib=rb-1.2.1&amp;q=80&amp;fm=jpg&amp;crop=entropy&amp;cs=tinysrgb&amp;w=2000&amp;fit=max&amp;ixid=eyJhcHBfaWQiOjExNzczfQ"/> <figcaption>
            <p>
                    <a href="https://unsplash.com/@mparzuchowski?utm_source=ghost&amp;utm_medium=referral&amp;utm_campaign=api-credit">Photo by Michał Parzuchowski / Unsplash</a></p>
        </figcaption>
</figure>

<p>Shape Up uses the word &ldquo;bet&rdquo; intentionally because the word&rsquo;s implications fit:</p>
<ul>
<li>Bet implies a hoped-for payoff or reward, like an investment. In our case, thinking of green-lighting a project as a bet helps us stay focused on the fact that we are seeking a return on our development investment, as opposed to just filling out schedules with things to do.</li>
<li>A bet represents a commitment. When you lay down your bet at a casino, there are no do-overs. Likewise, when the betting table green-lights a project, that&rsquo;s a commitment that the assigned dev team gets six uninterrupted weeks to create the solution.</li>
</ul>
<h3 id="bets-placed-time-to-build-">Bets placed, time to build &hellip;</h3>
<p>The output of the betting table meeting is assignments for each dev team for the next cycle.</p>
<p>The build phase comes next, and Shape Up serves up an approach to building that&rsquo;s just as counter-consensus, and just as worthy of your suspended disbelief, as what you&rsquo;ve just read. I especially love scope mapping and hill charts.</p>
<p>I plan a Part 2 covering the build process in Shape Up; stay tuned. <em>Update: here is Part 2: <a href="../building-with-shape-up">Building with Shape Up</a>.</em></p>
<p>– Jack Ivers</p>
]]></content:encoded>
    </item>
    
    <item>
      <title>A Day to Remember</title>
      <link>https://craftycto.com/blog/a-day-to-remember/</link>
      <pubDate>Sun, 15 Sep 2019 20:02:00 +0000</pubDate>
      
      <guid>https://craftycto.com/blog/a-day-to-remember/</guid>
      <description>&lt;p&gt;I was in England on business—some sort of software event in London, the details long since faded and obscured. But this day, this day &amp;hellip; the memories surge back, filling out in technicolor as I write and research, a kind of archaeological dig through both memories and the vast web of resources now at my fingertips about the people, places, and things we encountered this day.&lt;/p&gt;
&lt;p&gt;It was early June 1987, and I was fortunate in that my wife was able to make the trip with me. Beth was eight months pregnant with our first child, Jennifer, who expressed pre-natal delight at the novelty of air travel with kicks and backflips. We had only a handful of extra days for sightseeing; just time enough, really, for a single destination.&lt;/p&gt;</description>
      <content:encoded><![CDATA[<p>I was in England on business—some sort of software event in London, the details long since faded and obscured. But this day, this day &hellip; the memories surge back, filling out in technicolor as I write and research, a kind of archaeological dig through both memories and the vast web of resources now at my fingertips about the people, places, and things we encountered this day.</p>
<p>It was early June 1987, and I was fortunate in that my wife was able to make the trip with me. Beth was eight months pregnant with our first child, Jennifer, who expressed pre-natal delight at the novelty of air travel with kicks and backflips. We had only a handful of extra days for sightseeing; just time enough, really, for a single destination.</p>
<h2 id="yorkshire--james-herriot">Yorkshire &amp; James Herriot</h2>
<p>That destination wasn&rsquo;t difficult to choose. Beth and I had both fallen in love with Yorkshire second hand, via James Herriot&rsquo;s <a href="http://www.amazon.com/Creatures-Great-Small-James-Herriot/dp/0812415027/ref=tmm_hrd_title_0?ie=UTF8&amp;qid=1431657287&amp;sr=1-3"><em>All Creatures Great and Small</em></a>, which describes the author&rsquo;s life as a country vet in the Yorkshire Dales. Beth asked around and discovered that James Herriot was the pen name of Alf Wight, a working vet who still maintained a surgery in the village of Thirsk up in the moor country of North Yorkshire.  So we set off to explore Yorkshire and find the land of <em>All Creatures Great and Small</em>.</p>
<figure>
    <img loading="lazy" src="James-Herriot-All-Creatures-Great-and-Small.jpg"
         alt="The book and the man"/> <figcaption>
            <p>The book and the man</p>
        </figcaption>
</figure>

<p>We spent a day in southern Yorkshire, thoroughly enjoying the city of York. I had been carefully instructed by a British friend that, while in Yorkshire, I must sample the <a href="https://www.theakstons.co.uk/pages/theakstons-old-peculier">Old Peculier Ale</a>. I was further instructed that under no circumstance should I settle for Old Peculier served in a bottle; I was instead to find a pub that had the beverage on tap direct from the keg. There was a method to this madness: Old Peculier was (and apparently still is) shipped in oak casks, which contribute significantly to its flavor. I followed my directions carefully, located the Spread Eagle Pub, which had among its varied offerings Old Peculier tapped straight from the oaken keg, and gave it a thorough sampling. Without a doubt, this was the best beer I&rsquo;ve ever had, then or since.</p>
<figure>
    <img loading="lazy" src="cooperage-bottom-bg.jpg"
         alt="Theakstons, makers of Old Peculier, are proud of their custom-made casks"/> <figcaption>
            <p>Theakstons, makers of Old Peculier, are proud of their custom-made casks</p>
        </figcaption>
</figure>

<h2 id="thirsk--alf-wight">Thirsk &amp; Alf Wight</h2>
<p>The next morning we headed for Thirsk. We asked around and finally located a sign that announced the veterinary surgery of <a href="http://www.thirsk.org.uk/james-herriot.php">James Alfred Wight, OBE</a>, complete with a vehicle out front containing a pair of muddy, knee-height farm boots, which we of course decided had to be Alf&rsquo;s. We were tempted to knock on the door, but in the end decided not to make tourist fools out of ourselves, figuring that Mr. Wight might not welcome this kind of intrusion on his place of business. We contented ourselves with visiting the bookstore across the street, buying a couple of hardcover editions of the Herriot books &ldquo;from the source&rdquo; so to speak. With our Alf Wight stalking completed, we decided to head out and explore the countryside.</p>
<p>We steered east out of Thirsk on the A170, and before long entered the <a href="http://www.northyorkmoors.org.uk/">North York Moors National Park</a>. Soon <a href="http://www.northyorkmoors.org.uk/visiting/see-and-do/sutton-bank-national-park-centre">Sutton Bank</a> appeared in front of us: a striking, 978&rsquo; tall cliff, fracturing the countryside for miles to the north and south, edging the great plateau that encompasses the North York Moors. As the A170 approached the cliff, we began to see warning signs: Dangerous Grade Ahead. And they weren&rsquo;t kidding: the Sutton Bank hill has an incredible 25% grade, with a tight hairpin turn halfway up thrown in for good measure; the locals keep a running count of major accidents on this stretch of road. I was uncomfortable enough driving on flat roads, thanks to the right-seated driving and left-handed manual shifting. Somehow, though, with palms sweating heavily, I managed to coax the tiny rental to the top of the Bank without ending up over the cliff, Beth sitting beside me white-faced. Once up, the views were stunning, looking out over the vast green checkerboard of the Vales of York and Mowbray. I am reminded of driving the road from San Francisco through the Sierra foothills into Yosemite valley: both drives evoke sheer terror, followed by—assuming one survives—incredible bliss. Both are places I cherish and intend to revisit.</p>
<figure>
    <img loading="lazy" src="sutton_bank.jpg"
         alt="Warning sign for Sutton Bank grade on the A170"/> <figcaption>
            <p>Warning sign for Sutton Bank grade on the A170</p>
        </figcaption>
</figure>

<h2 id="yorkshire-gliding-club">Yorkshire Gliding Club</h2>
<p>Less than a mile past the top of the Sutton Bank grade, the A170 intersects High Town Bank Road and there we noticed a sign: <a href="http://www.ygc.co.uk/">Yorkshire Gliding Club</a>. I had flown gliders during my mid teens; or more accurately, a particular kind of glider: the venerable, forgiving, stodgy <a href="http://en.wikipedia.org/wiki/Schweizer_SGS_2-33">Schweizer 2-33</a> two-seat trainer.</p>
<figure>
    <img loading="lazy" src="2-33.jpg"
         alt="Schweizer 2-33 trainer"/> <figcaption>
            <p>Schweizer 2-33 trainer</p>
        </figcaption>
</figure>

<p>I was curious to see what this Yorkshire club might be like, especially given its location perched on the edge of Sutton Bank. So we followed the signs and ended up at a cluster of buildings: the clubhouse and several aircraft hangars. Off in the distance we could see activity on the grass flying field: people, gliders, a tow plane. Beth and I were debating whether we would be welcomed or chased off if we dared walk onto the field, until I noticed a sign in the window of the clubhouse: &ldquo;Glider Rides, £20.&rdquo; Our concerns immediately evaporated: we were now potential customers rather than trespassers.</p>
<figure>
    <img loading="lazy" src="yorkshire-gliding-club.jpg"
         alt="Sign for the Yorkshire Gliding Club. If memory serves, the sign was different in 1987. © Copyright Pauline E and licensed for reuse under the Creative Commons License"/> <figcaption>
            <p>Sign for the Yorkshire Gliding Club. If memory serves, the sign was different in 1987. © Copyright Pauline E and licensed for reuse under the Creative Commons License</p>
        </figcaption>
</figure>

<p>So we walked out onto the field and I soon realized that the Yorkshire Gliding Club was in a whole different league from the Mansfield, Ohio club where I had learned. Sutton Bank&rsquo;s high cliffs face directly into the prevailing westerly winds, generating fantastic ridge soaring conditions: with even a moderate west wind, glider pilots can soar aloft for hours at a time. The Club&rsquo;s flying machines were as world-class as its location: sleek, glossy white, state-of-the-art craft built of fiberglass, a far cry from the chunky fabric-and-dented-aluminum 2-33 trainers I had cut my teeth on. Still, I assumed that these gorgeous white rocket ships belonged to well-heeled private owners, and that they&rsquo;d wheel out something more like the 2-33 for £20 demo rides.</p>
<h2 id="schleicher-ask-21-oh-my">Schleicher ASK-21, oh my</h2>
<p>When I asked, though, we were directed over to a beautiful white T-tailed two-seater, an ultra-performance <a href="http://www.alexander-schleicher.de/en/flugzeuge/ask-21/">Schleicher ASK-21</a>. Holy shit. At this point, my attitude towards taking a glider ride shifted from &ldquo;toying with the idea&rdquo; to &ldquo;please please please can I take a glider ride please please&rdquo; &hellip;. Now, Beth didn&rsquo;t really share my excitement about the gliders. She loved Yorkshire and Sutton Bank but gliders, not so much. I graciously offered that she could take a ride too, which earned me a glare: as if Beth would want to strap eight-months-pregnant into the front seat of a sailplane. She was fine, though, with me taking a ride. Little did she know.</p>
<figure>
    <img loading="lazy" src="ask21-in-flight.jpg"
         alt="ASK-21 in flight"/> <figcaption>
            <p>ASK-21 in flight</p>
        </figcaption>
</figure>

<p>I paid my £20 and they paired me with a club pilot for my 20-minute ride. We talked a little, I explained that I had a bit of not-so-recent gliding experience, and finally he said something like, &ldquo;Cheerio then, let&rsquo;s get going. There&rsquo;s your parachute; do you know how to put it on?&rdquo;</p>
<h2 id="parachute-">Parachute !?</h2>
<p>I&rsquo;d never worn a parachute before—hell, I&rsquo;d never even seen a parachute close up. Like most normal airplanes, my simple old Schweizer 2-33 trainer didn&rsquo;t require one. The elegant, high-performance ASK-21, on the other hand, was certified for aerobatics, and by U.K. (and U.S.) law, all passengers must wear chutes. So there I was, with my extensively pregnant wife looking on, struggling to fit on a parachute before taking off in a great white shark of an aerobatic glider. Beth was becoming less amused by the moment.</p>
<p>The winds this particular day were not the usual westerlies, but instead blew from the north, meaning we&rsquo;d miss out on the beneficial ridge updrafts off the Sutton Bank cliffs. So it was shaping up as a quick-up, quick-down flight, 20 minutes if we were lucky. I finally got suited up and buckled in to the front seat of the ASK-21, waved to Beth as they lowered the clear canopy, and we were ready for takeoff. Helpers snapped on the tow rope, a 200&rsquo; polypropylene rope that in turn connects to the towplane, a small but muscularly-engined plane engineered for launching gliders. A few hand signals and the towplane began to move, with our helper running along holding our wingtip until we had enough speed to fly the wings. Off we went down the grass strip, the ASK-21 lifting off quickly powered by its huge wing, trailing the towplane in a gradual climb.</p>
<figure>
    <img loading="lazy" src="2021-05-04-20-04-0015.jpeg"
         alt="Found a photo of this actual flight! ASK-21 tail number ESB on landing."/> <figcaption>
            <p>Found a photo of this actual flight! ASK-21 tail number ESB on landing.</p>
        </figcaption>
</figure>

<p>It didn&rsquo;t take long to reach the release height of 2,000 feet. The tow release process is carefully choreographed and well understood by every glider and tow pilot: the glider pilot pulls the release lever, the tow rope pops off with a loud bang, the glider makes a climbing 90-degree turn to the right, the towplane a descending 90-degree turn to the left. For us, release should have begun a steady descent back to the field, given the lack of ridge lift and cold temperatures. But almost immediately, we flew into a strong thermal updraft, a column of rising air caused by ground heating, a common phenomenon on hot days but unusual on such a chilly day. There it was, though, and strong: the variometer was showing 3-4 meters/second of lift. The pilot circled to stay in the lift column and we were climbing, quickly.</p>
<h2 id="hot-thermal">Hot thermal</h2>
<p>On that particular day, the bases of the cumulus clouds were around 5,000 feet, and the thermal we climbed within stayed unusually strong all the way up. As we came closer and closer to the cloud base, the pilot began to frantically power up the directional gyro required for instrument flight—essential if we were to climb into the base of the cloud, where we&rsquo;d be flying blind. A directional gyro takes several minutes  to spin up and stabilize, though, and we were already starting to penetrate the cloud base. The average life expectancy of non-instrument-rated pilots who fly into clouds has been calculated as 178 seconds; I&rsquo;m guessing that instrument-rated pilots without their instruments don&rsquo;t fare much better. My pilot wisely decided it was time to &ldquo;get out of Dodge.&rdquo; He dove down, out of the cloud and away from the thermal lift.</p>
<figure>
    <img loading="lazy" src="ask21-inst-panel.jpg"
         alt="Vintage ASK-21 instrument panel. Mine would have had a few additional (IFR) instruments."/> <figcaption>
            <p>Vintage ASK-21 instrument panel. Mine would have had a few additional (IFR) instruments.</p>
        </figcaption>
</figure>

<p>So here we were, at almost 5,000 feet altitude, about 15 minutes into a 20-minute ride. How to burn 5,000 feet of altitude in five minutes? The next thing I heard was the pilot asking, &ldquo;Do you like aerobatics?&rdquo; Well, I didn&rsquo;t really know, because the old Schweizer 2-33 definitely didn&rsquo;t do aerobatics, but it sounded like fun, so I answered &ldquo;Sure!&rdquo; Today&rsquo;s older, wiser Jack might have paused to consider the effect these aerobatics might have on his wife, waiting 5,000 feet below and already nervous because her husband—wearing a goddamn parachute—had flown off and almost disappeared into the bottom of a cloud.</p>
<h2 id="aerobatics-">Aerobatics ?!</h2>
<p>He proceeded to point the nose of the ASK-21 at the ground at what felt like a near-vertical angle as I floated weightless, held in place by my seat harness. He let the speed build to the plane&rsquo;s top speed of 151 knots as we hurtled toward the ground. At the last second, he pulled back on the stick, and I melted into the seat as he turned speed back into altitude again, pulling up into a near-vertical climb until the glider lost forward speed and entered a stall. He then kicked the rudder hard left to initiate a stall turn, and the great white beast spun 180 degrees until it was again pointing straight down, accelerating to max speed until the next zoom into another vertical pull-up. We must have repeated this exhilarating maneuver three or four times until we burned off our 5,000&rsquo; of excess energy and set up in the landing pattern.</p>
<figure>
    <img loading="lazy" src="ASK-21-Aerobatics.png"
         alt="ASK-21 aerobatics including the Stall Turn, from ASK-21 Flight Manual"/> <figcaption>
            <p>ASK-21 aerobatics including the Stall Turn, from ASK-21 Flight Manual</p>
        </figcaption>
</figure>

<h2 id="poor-beth">Poor Beth</h2>
<p>Now let&rsquo;s take a minute to look again at these events from Beth&rsquo;s perspective. Here she is, eight months pregnant and 3,700 miles away from home. Her fool of a husband has dragged her to a glider port, strapped on a parachute, disappeared into a cloud—and now here he is again, diving towards the ground at 174 MPH, zooming back up, twisting around, hurtling back down, over and over. I have to think that Jennifer was at high risk of being born in Sutton Bank, Yorkshire, UK rather than Cincinnati, Ohio, USA. Amazingly, though, Beth seemed to stay pretty calm through all of this. I stepped from that incredible ASK-21, shrugged out of the parachute harness, and zombie-walked over to her on shaky overexcited legs. Outwardly at least, Beth was the calm one.</p>
<h2 id="grilled-cheese--onion">Grilled cheese &amp; onion</h2>
<p>This magical day delivered one last treat. After the adrenaline rush of the day, we were both exhausted and ravenously hungry, so we pulled into a cozy pub not far down the A170. Here we had a chance to enjoy a classic of pub cuisine: the Grilled Cheese &amp; Onion sandwich.  It&rsquo;s basically a grilled cheese with a thin slice of caramelized onion cooked right in the middle of the melted cheese. Amazingly tasty and the ultimate adrenaline antidote.</p>
<figure>
    <img loading="lazy" src="drunken_grilled_cheese_sandwich_on_cutting_board.jpg"
         alt="Grilled Cheese &amp; Onion. Photo from http://ohmyveggies.com/"/> <figcaption>
            <p>Grilled Cheese &amp; Onion. Photo from <a href="http://ohmyveggies.com/">http://ohmyveggies.com/</a></p>
        </figcaption>
</figure>

<p>So ended the Day to Remember. Or at least my memory of it. Writing this, I feel a strong yearning to return to the Yorkshire Dales.</p>
]]></content:encoded>
    </item>
    
  </channel>
</rss>
