Boulder Future Salon

Video from Moscow of smoke from burning oil refineries. This showed up unsolicited in my YouTube recommendations. YouTuber and self-professed "Vladimir Putin hater" Roman not only collected a bunch of clips from Moscow posted on Russian social media, but also says Putin did not address the nation about the attack. Instead, Putin just introduced a presentation by Foreign Minister Sergey Lavrov about completely unrelated foreign policy.

I was wondering if the previous attacks on Moscow were a one-off. It looks like, no, not a one-off, and possibly the beginning of a trend. People said the previous attacks didn't hit any important targets, but I thought that, even if that was true, they were hugely psychologically significant. The way wars have been sold to the Russian people since Soviet times (since the war in Afghanistan) is that wars happen somewhere else, on the fringes of the map, so people don't need to worry about them. People can just trust that the government is doing the right thing. But this is Moscow. And this time, significant targets have been hit, at least if this footage is to be believed. I know video can't be trusted anymore, so I have to attach the word "allegedly" to everything. What do you all think? Is this video genuine?

There's also video of people being conscripted by being kidnapped on the street in Russia -- allegedly. I can't tell from the video whether it's Russia or Ukraine, and at this point I have to explain why I've seen so many videos of this type of thing from Ukraine. On my previous phone, I had a Russian Telegram channel. Here's how that came about: I was looking for a video from a Ukrainian TV channel of an interview with one of Ukraine's negotiators at the negotiations with Russia in Turkey in 2022. I tracked it down to a Telegram link but couldn't find any other way to get the video, so to follow the link I had to install Telegram, and that's how I ended up on the Russian Telegram channel. Every day it had videos from the war, although because it was run from inside Russia, they never used the word "war" (even in English). They would always say "SMO zone." (Such a farce.) On this Telegram channel, I saw video of a Ukrainian guy getting conscripted right off the street. A van drove up, some people got out, grabbed him, and forced him into the van. Then there was another video like that. And another. And another. And so on. I must've seen at least a hundred of those. So apparently it was commonplace in Ukraine for people to get conscripted by basically being kidnapped out of the blue, for that to get caught on video, and for that video to get found and republished on a Russian Telegram channel.

So when this YouTuber Roman says these conscription "kidnapping" videos had become a Russian propaganda trope, this is what he's talking about, and I would say it's very much true. Apparently, such conscription by on-the-street kidnapping is now taking place inside Russia. I thought Russia had been conscripting only people who were already in prison (and dramatically reducing its prison population) and gay people, but apparently conscription has expanded to regular people outside of prison? There has been no announcement of a new mobilization that I'm aware of.
Wait, I just did a Google search and Ukrainian news is saying preparations are being made for another mobilization (link below).

Anyway, when I got my current phone, I didn't install Telegram, so I don't have that channel any more. The channel showed Russian victories and Ukrainian defeats, but never Ukrainian victories or Russian defeats. I got the impression Russia was advancing rapidly, until I looked at the OSINT maps and saw that the front lines weren't moving. So in reality, Ukrainian and Russian victories and defeats were more or less cancelling out. But that Telegram channel was run from inside Russia, so it couldn't "discredit the Russian military". Now we have drone attacks hitting Moscow, so it's really obvious Russia wasn't winning. In my posts here, I've tended to stick to the OSINT maps, so I've never come on here and said Russia was winning or Ukraine was winning. That held until the recent drone attacks inside Russia, which have made me wonder whether Ukraine has pulled ahead, and whether the reason is superior drone AI.

When Russia invaded Ukraine in 2022, it expected a short "special military operation" but instead got the world's first drone war.

Also on the subject of propaganda, the Russian Telegram channel valorized Russian soldiers for treating Ukrainian prisoners of war and defectors humanely, always portraying the Russians as the morally good people in this fight. I think war propaganda in every country portrays that country as the morally good people.

Quantum computing advancing faster than expected?

"I always said it would take 10 to 15 years. Now I think we have to be prepared for interesting quantum computers to appear sooner, in the range of thousands of qubits."

So says Aram Harrow, a researcher at the Massachusetts Institute of Technology (MIT), "best known for co-developing the HHL algorithm in 2008, considered one of the first demonstrations of an exponential advantage of quantum computers over classical ones."

"One way to view it is like Moore's Law for classical computers. Every 18 months they would get better. Some people talk about something called Schölkopf's law, which is similar, and suggests that qubit quality improves every year in quantum computers. What ultimately determines whether we can build a scalable quantum computer is how many operations you can perform on a qubit before noise ruins your data."

"Each year the noise rate goes down a bit. There are two axes. On the one hand, the noise rate gets a little bit better each year; on the other, the challenge is integrating more qubits."
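To make Harrow's "how many operations before noise ruins your data" concrete, here is a back-of-the-envelope sketch. It assumes (our simplification, not Harrow's) that each operation fails independently with probability p, so a run of n operations is error-free with probability (1 - p)^n, and the usable operation budget is roughly 0.69/p for small p. The starting error rate and the "halves every year" trend are hypothetical numbers, and the sketch covers only the noise axis, not qubit count.

```python
import math

def error_free_probability(p: float, n_ops: int) -> float:
    """Chance that n_ops operations all succeed when each fails independently with probability p."""
    return (1.0 - p) ** n_ops

def ops_budget(p: float, target: float = 0.5) -> int:
    """Largest n with (1 - p)**n >= target: roughly how many operations fit before noise wins."""
    return math.floor(math.log(target) / math.log(1.0 - p))

def projected_error_rate(p0: float, yearly_factor: float, years: int) -> float:
    """Error rate after `years` if it shrinks by `yearly_factor` every year (hypothetical trend)."""
    return p0 / yearly_factor ** years

p0 = 1e-3  # hypothetical per-operation error rate today
for year in range(4):
    p = projected_error_rate(p0, yearly_factor=2.0, years=year)
    print(f"year {year}: error rate {p:.1e}, about {ops_budget(p)} operations before a 50% chance of failure")
```

Because the budget scales as 1/p, a steady yearly cut in the noise rate compounds into a steadily growing number of usable operations, which is the Moore's-law-like behavior the quote describes.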

"Since the Greeks, we've thought solving a mathematical problem meant following step-by-step instructions. If you want to multiply two numbers, you first take these digits, then do this and that. Everyone assumes information must be processed that way. The fact that quantum mechanics says the universe can process information in a very different way, and that we can harness that to solve things we otherwise couldn't, is thrilling. Whether it will have great economic utility is another question."

"AI is already accelerating the development of AI systems. To take just one example: today, Anthropic engineers on average ship 8x as much code per quarter as they did from 2021-2025."

(Extensive quoting from the essay by Marina Favaro and Jack Clark of Anthropic to follow.)

"The technical trends discussed in this piece suggest that AI systems are going to become much more capable in coming years. These trends have huge implications."

"The rate at which AI models improve is accelerating. The length of tasks that they can reliably complete on their own has been doubling roughly every four months, up from an earlier trend of doubling every seven months. In March 2024, Claude Opus 3 could complete software tasks that take humans about four minutes to complete. A year later, Claude Sonnet 3.7 managed tasks that took about an hour and a half. A year after that, Claude Opus 4.6 managed 12-hour tasks. If this trend holds, tasks that take a skilled person days could come into range this year. In 2027, AI systems could be capable of tasks that take a person weeks."
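The doubling-time arithmetic in that paragraph can be checked directly from the three task lengths it quotes. A minimal sketch, assuming the releases are exactly 12 months apart and that growth is a clean exponential between them; the extrapolation is ours, not the essay's.

```python
import math

def doubling_time(len_start: float, len_end: float, months_elapsed: float) -> float:
    """Months per doubling implied by growth from len_start to len_end (same units)."""
    return months_elapsed / math.log2(len_end / len_start)

def extrapolate(len_now: float, doubling_months: float, months_ahead: float) -> float:
    """Task length after months_ahead, assuming the doubling rate stays constant."""
    return len_now * 2 ** (months_ahead / doubling_months)

# Task lengths quoted in the essay, in human-minutes, roughly a year apart.
opus_3 = 4            # Claude Opus 3, March 2024
sonnet_3_7 = 90       # Claude Sonnet 3.7, about a year later
opus_4_6 = 12 * 60    # Claude Opus 4.6, a year after that

early = doubling_time(opus_3, sonnet_3_7, 12)
recent = doubling_time(sonnet_3_7, opus_4_6, 12)
print(f"doubling time, first year: {early:.1f} months; second year: {recent:.1f} months")
for months in (12, 24):
    hours = extrapolate(opus_4_6, recent, months) / 60
    print(f"+{months} months: tasks of roughly {hours:.0f} human-hours")
```

Going from 90 minutes to 12 hours is exactly three doublings in a year, i.e. one every four months; at that pace 12-hour tasks become 96-hour tasks a year later. The two years imply different doubling times, a reminder that straight-line extrapolation of this kind is fragile.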

"The same pattern appears on coding and research benchmarks. Benchmarks measure the performance of models in a given domain, and they're "saturated" when models achieve close to 100% performance. SWE-bench is a standard test of real-world software engineering: it hands a model an actual open-source codebase and a real bug report, and asks it to write a code change that fixes the issue and passes the project's own tests. Models have gone from scoring in the low single digits to saturating the benchmark in two years."
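The scoring rule described above is strict: a patch counts only if it fixes the reported bug and the project's own tests still pass. A minimal sketch of that rule, with hypothetical field names and a hypothetical 95% cutoff for "saturated" (the essay only says "close to 100%"); this is not SWE-bench's official harness.

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    """Outcome of one benchmark task after applying a model's patch (hypothetical record)."""
    task_id: str
    bug_tests_pass: bool       # tests that reproduce the reported bug now pass
    existing_tests_pass: bool  # the project's other tests still pass

    @property
    def resolved(self) -> bool:
        # A fix only counts if it solves the issue without breaking anything else.
        return self.bug_tests_pass and self.existing_tests_pass

def resolve_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks resolved; 0.0 for an empty list."""
    if not results:
        return 0.0
    return sum(r.resolved for r in results) / len(results)

def is_saturated(score: float, threshold: float = 0.95) -> bool:
    """'Saturated' once scores get close to 100%; the 0.95 cutoff is an assumption."""
    return score >= threshold

results = [
    TaskResult("task-1", True, True),
    TaskResult("task-2", True, False),   # fixed the bug but broke another test
    TaskResult("task-3", False, True),
    TaskResult("task-4", True, True),
]
score = resolve_rate(results)
print(f"resolved {score:.0%}, saturated: {is_saturated(score)}")
```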

"CORE-Bench tests whether a model can reproduce existing research, a prerequisite for them to conduct original research. It gives an AI model the code and data behind a published paper, and asks it to rerun everything and confirm it can replicate the paper's results. AI systems went from succeeding at reproducing the results roughly 20% of the time in 2024 to saturating the benchmark fifteen months later."
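A reproduction attempt of this kind boils down to rerunning the code and comparing the regenerated numbers with the paper's. A minimal sketch, assuming a 5% relative tolerance and made-up metric names; CORE-Bench's actual grading is not described in the excerpt, so treat this as an illustration of the idea only.

```python
import math

def reproduces(reported: dict[str, float], rerun: dict[str, float], rel_tol: float = 0.05) -> bool:
    """True if every reported metric was regenerated and lies within rel_tol of the paper's value."""
    return all(
        name in rerun and math.isclose(rerun[name], value, rel_tol=rel_tol)
        for name, value in reported.items()
    )

def success_rate(outcomes: list[bool]) -> float:
    """Fraction of successful reproductions; 0.0 if there were no attempts."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

paper = {"accuracy": 0.912, "f1": 0.874}   # hypothetical published results
attempts = [
    {"accuracy": 0.909, "f1": 0.871},      # close enough: reproduced
    {"accuracy": 0.912},                   # f1 never regenerated: not reproduced
    {"accuracy": 0.650, "f1": 0.600},      # numbers disagree: not reproduced
]
print(success_rate([reproduces(paper, run) for run in attempts]))
```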

"In engineering, Claude can be handed an underspecified problem and figure out how to solve it; humans supply the goal, but they no longer need to supply the method. In research, Claude can already match or outperform skilled humans at executing a well-specified experiment. However, large performance gaps persist when it comes to Claude exercising judgement in choosing goals in both engineering and research. That's the gap between AI today and a future system that could autonomously design its own successor."

"In a March 2026 poll of 130 employees from across Anthropic research teams, the median respondent estimated that they produced around 4x as much output with Mythos Preview as they would have without access to any AI models, on the kinds of projects they would have been working on regardless."

"We also see evidence that people at Anthropic are using Claude to do work that simply wouldn't have happened otherwise, like building exploratory tooling and addressing long-deferred cleanup. For example, in April 2026, Claude shipped over 800 fixes that reduced a class of API errors by a factor of one thousand. The engineer overseeing Claude estimated that a human would have taken four years to complete this work; solving other people's bugs is slow and painstaking, and humans struggle to hold that much unfamiliar context in their head at once."

"The code that Claude writes is "good" and improving. "Good code" means two things: it works, and it is written in a manner that allows another engineer to understand it and build upon it. On the first criterion, the evidence is clear. The rate at which Anthropic staff correct, redirect, or take over mid-task from Claude has been falling steadily for a year, including on the most complex and open-ended tasks."

"On the most open-ended tasks, Claude's success rate reached 76% in May 2026, up 50 percentage points in six months."

"There isn't full consensus among staff at Anthropic, but many believe that the Claude-written code was still worse in quality than human-written code at Anthropic in late 2025, and is roughly at parity today. We expect it to be better within the year."

"This has changed the way that Anthropic now reviews its own code. Proposed changes to our codebase are now read by an automated Claude reviewer that looks for bugs, security flaws, and other defects before it can merge. Using this tool, we ran a retrospective analysis, and found that an automated Claude review of every change to our codebase would have caught roughly a third of the bugs behind past incidents on claude.ai before they ever reached production."

"Every time Anthropic releases a model, we run the same test: we give Claude some code that trains a small AI model, and ask it to make that code run as fast as possible while still passing the same correctness checks. The goal and the success metrics are fixed in advance, so Claude's job is to find speedups by rewriting the code, running it, timing it, and repeating. It's a miniature version of an experimental research loop. In May 2025, Claude Opus 4 averaged a ~3x speedup over the starting code. By April 2026, Claude Mythos Preview was achieving ~52x."
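The loop in that test (rewrite, run, time, check against fixed correctness criteria, repeat) has a simple skeleton. A minimal sketch, assuming the "rewrites" arrive as ready-made candidate functions and the correctness check is an equality test; the essay does not describe Anthropic's actual harness, and the toy workload here (a sum of squares) is ours.

```python
import time
from typing import Callable, Sequence

def best_time(fn: Callable[[], object], repeats: int = 5) -> float:
    """Best-of-N wall-clock time, in seconds, to damp timing noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

def optimize(baseline: Callable[[], object],
             candidates: Sequence[Callable[[], object]],
             is_correct: Callable[[object, object], bool]) -> tuple[Callable[[], object], float]:
    """Return the fastest candidate that passes the fixed check, and its speedup over baseline."""
    expected = baseline()
    base = best_time(baseline)
    winner, winner_time = baseline, base
    for candidate in candidates:
        if not is_correct(candidate(), expected):
            continue  # a fast but wrong rewrite is rejected outright
        t = best_time(candidate)
        if t < winner_time:
            winner, winner_time = candidate, t
    # Guard against a timer reading of zero for extremely fast candidates.
    return winner, base / max(winner_time, 1e-9)

N = 100_000

def loop_version() -> int:
    total = 0
    for i in range(N):
        total += i * i
    return total

def closed_form() -> int:
    # Sum of i**2 for i in 0..N-1 equals (N-1) * N * (2N-1) / 6.
    return (N - 1) * N * (2 * N - 1) // 6

def broken() -> int:
    return 0

best, speedup = optimize(loop_version, [broken, closed_form], lambda got, want: got == want)
print(best.__name__, f"{speedup:.0f}x")
```

Fixing the goal and the check up front is what makes the reported multiplier meaningful: only rewrites that still pass the same test are eligible to count as speedups.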

"Claude-powered agents were given an open problem in AI safety -- roughly, can a weaker model reliably supervise a stronger one? -- and were left to solve it. This involved proposing hypotheses, testing them, sharing findings with parallel agents, and iterating."

"We examined real Claude Code sessions (between January and March 2026) where Anthropic researchers were working with Claude on an open-ended investigative problem, like figuring out why a training run kept crashing, or why a model scored poorly on a benchmark. In each case, we found a moment where the researcher took a detour: they pursued a direction that sent the session sideways before it eventually got back on track. We then showed various Claude models only the work from before the session went off-course and asked what it would do next."

"What these moments give us is a set of realistic, challenging situations where the right next step is not obvious, and where the human's choice serves as a useful yardstick to compare model performance over time. On this measure, our best model in November 2025 (Opus 4.5) beat the human choice 51% of the time; in April 2026 (Mythos Preview), this grew to 64%."
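The excerpt doesn't say how many detour moments were scored, and that number matters when comparing 51% with 64%. A minimal sketch using the standard Wilson score interval for a proportion; the sample sizes below are hypothetical, and checking whether two intervals overlap is only a rough heuristic, not a formal significance test.

```python
import math

def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion observed over n trials."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

for n in (25, 100, 400):  # hypothetical numbers of scored detour moments
    lo51, hi51 = wilson_interval(0.51, n)
    lo64, hi64 = wilson_interval(0.64, n)
    overlap = lo64 <= hi51
    print(f"n={n}: 51% -> [{lo51:.2f}, {hi51:.2f}], 64% -> [{lo64:.2f}, {hi64:.2f}], overlap={overlap}")
```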

"The evidence suggests that the human role is narrowing at each step in the AI development process."

"Once Claude can run experiments, the question shifts towards "Which of these experiments is worth running?""

"An area of human comparative advantage, for now, is research taste and judgment, including choosing which problems matter, which results to trust, and when an approach is a dead end."

"Even if we suppose that Claude never achieves good research taste, a conservative reading of our evidence still implies compounding acceleration."

"the early evidence on Claude's improving research judgment -- narrow as it is today -- is an indicator that this capability is improving as well. "Research taste" might be just another AI capability that AI systems fail at for a time, then get good at."

The essay goes on to discuss "possible futures": the trend stalls, but today's AI capabilities become widely diffused; AI labs continue to see compounding efficiency gains; and AI systems themselves become capable of full recursive self-improvement and begin building their successors.

My commentary: Seems like given the trends, we have to take seriously the possibility that the recursive self-improvement scenario will happen.

AI interviews... that malfunction. Vanessa Wingårdh collects a handful of videos where people recorded themselves interviewing with AI. I don't know why people recorded these on video -- maybe they were planning to record the interview anyway so they could analyze what happened later? She adds a lot of her own commentary. (Spoiler: she really doesn't like the concept of AI doing interviews, even when they don't malfunction.) I wonder how prevalent these AI interviews are.

If everyone loses their jobs to AI, will consumer demand collapse, threatening the very profit margins that motivated corporations to automate all the jobs with AI in the first place? That's essentially the question this research paper endeavors to answer.

Warning: extensive quotes from the paper to follow.

"Even as AI-driven layoffs sweep across industries, and even as every firm recognizes that vanishing paychecks mean vanishing customers, not one of them will stop. Each firm reaps the full savings of replacing its own workers yet bears only a sliver of the demand it destroys; the rest lands on rivals. No firm can afford to be the one that holds back."

"The fear that technology will displace workers is at least as old as the Industrial Revolution. Historically, displacement has largely been self-correcting: automation of existing tasks has been offset by the creation of new tasks and occupations. What Acemoglu and Restrepo call the reinstatement effect has tended to stabilize the labor market. Whether this balance will hold in the age of AI is an open question."

"Even if reinstatement eventually occurs, a problem arises along the way: displaced workers are also consumers, and when their lost income is not replaced, each round of layoffs erodes the purchasing power all firms depend on. At the limit, this becomes self-destructive: firms automate their way to boundless productivity and zero demand. Public discourse increasingly treats this dynamic as an inevitable process with no natural brake."

"In February 2026, Block cut nearly half its 10,000-person workforce, with CEO Jack Dorsey stating that AI had made many of those roles unnecessary and that 'within the next year, the majority of companies will reach the same conclusion'. In 2025, US employers announced more than a million job cuts, and AI was explicitly cited in roughly 55,000 of them, led by technology firms and concentrated in customer support, content moderation, and middle management. The exposure extends beyond tech: Eloundou et al. estimate that roughly 80% of US workers hold jobs with tasks exposed to large language models. Early demand-side indicators are consistent with the predicted strain: in Q1 2026, business investment overtook consumer spending as the leading contributor to US GDP growth, and the personal savings rate fell to 3.6% in March, its lowest level since late 2022."

"Against this backdrop, we ask under what conditions rationality and perfect foresight are enough to prevent competitive over-automation, what determines the size of the distortion when they are not enough, and which proposed policy responses correct it."

"To answer these questions, we develop a task-based automation model inspired by Acemoglu and Restrepo, but refocused from the labor market to the product market: when automation displaces workers, their forgone spending reduces every firm's revenue. Each of several symmetric firms chooses what fraction of its workforce to replace with AI. Automated tasks are performed at lower cost, but integration frictions make each successive task harder to automate. On the demand side, workers spend a fraction of their income on the sector's output; firm owners spend less, normalized to zero in the baseline. Some displaced wage income is recovered through reemployment or transfers, but the remainder is lost to the sector. The model is deliberately stripped down to make this channel transparent, and the demand cliff ahead visible to all firms."

"We show that competition creates a demand externality that traps firms. An automating firm captures the full cost saving but, under competitive pricing, bears only a fraction of the resulting aggregate demand destruction; the rest falls on rivals. Each firm's profit-maximizing automation rate is a strictly dominant strategy that exceeds the cooperatively efficient level, so foresight alone cannot prevent the race toward the cliff. The distortion deepens with competition: a monopolist fully internalizes the externality, while fragmented markets exhibit the widest gap. In the frictionless limit, where every task is equally easy to automate, the game sharpens into a Prisoner's Dilemma in which every firm displaces its entire human workforce with AI, even though collective restraint would raise all profits."
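The core mechanism can be reproduced in a few lines. What follows is a toy model of our own, loosely in the spirit of the setup described above but not the paper's actual equations: quadratic integration costs stand in for "integration frictions," and each of n symmetric firms bears only 1/n of the demand it destroys. All parameter values are arbitrary.

```python
from dataclasses import dataclass

@dataclass
class ToyMarket:
    """Hypothetical n-firm toy model of over-automation (not the paper's equations).

    s      cost saving per unit of automation (wage minus AI cost)
    k      integration friction: automating a share a of tasks costs k * a**2 / 2
    delta  sector profit lost per unit of automation through lost worker spending,
           net of reemployment/transfers; under symmetric competition each firm
           bears only delta / n of the loss it causes
    n      number of symmetric competing firms
    """
    s: float
    k: float
    delta: float
    n: int

    @staticmethod
    def _clip(a: float) -> float:
        return min(1.0, max(0.0, a))

    def equilibrium_rate(self) -> float:
        # Firm FOC: s - k*a - delta/n = 0. Rivals' choices do not appear,
        # so this rate is a dominant strategy.
        return self._clip((self.s - self.delta / self.n) / self.k)

    def cooperative_rate(self) -> float:
        # Maximizing joint profit internalizes the whole delta.
        return self._clip((self.s - self.delta) / self.k)

    def profit_change(self, a: float) -> float:
        # Per-firm profit relative to no automation when every firm picks share a:
        # each firm loses (delta / n) * (n * a) = delta * a of demand.
        return self.s * a - self.k * a**2 / 2 - self.delta * a

for n in (1, 2, 5, 20):
    m = ToyMarket(s=1.0, k=2.0, delta=0.8, n=n)
    eq, co = m.equilibrium_rate(), m.cooperative_rate()
    print(f"n={n:2d}: equilibrium {eq:.2f} vs cooperative {co:.2f}; "
          f"profit change {m.profit_change(eq):+.3f} vs {m.profit_change(co):+.3f}")

# Near-frictionless case (k tiny, delta/n < s < delta): every firm automates
# fully even though collective restraint would leave every firm better off.
dilemma = ToyMarket(s=1.0, k=1e-6, delta=1.5, n=5)
print(dilemma.equilibrium_rate(), dilemma.cooperative_rate(),
      dilemma.profit_change(1.0) < dilemma.profit_change(0.0))
```

In this toy the gap between private and cooperative rates is delta * (1 - 1/n) / k: zero for a monopolist and widest in fragmented markets, matching the qualitative pattern the paper reports.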

"Since the loss falls on both sides, a natural question is whether policy can correct it. We evaluate six instruments against the externality margin. Upskilling and worker equity participation narrow the wedge but cannot eliminate it. Nor can Coasean bargaining: because automation is a dominant strategy, no voluntary agreement among firms is self-enforcing. Capital income taxes do not alter the equilibrium automation rate, operating on profit levels rather than the per-task margin where the externality resides. Neither does universal basic income: it raises the floor on living standards but leaves the automation incentive unchanged. Only a Pigouvian automation tax, set equal to the uninternalized demand loss per task, implements the cooperative optimum; its revenue can fund retraining that raises income replacement, shrinking the externality over time and making the tax potentially self-limiting."
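Why the instruments split this way can be checked numerically in the same kind of toy model (ours, not the paper's; parameters arbitrary). A profit tax scales the firm's whole objective, and a UBI modeled here as a lump-sum demand boost adds a constant, so neither moves the firm's best choice. A per-task tax equal to the demand loss the firm does not bear, delta * (1 - 1/n) in this toy, brings the private choice down to the cooperative one.

```python
def best_response(s: float, k: float, delta: float, n: int, *,
                  tax_per_unit: float = 0.0, profit_tax: float = 0.0,
                  lump_sum_demand: float = 0.0, rivals_rate: float = 0.5,
                  steps: int = 10_000) -> float:
    """Grid-search a firm's profit-maximizing automation share in a hypothetical toy model."""
    def after_tax_profit(a: float) -> float:
        pre_tax = (s * a - k * a**2 / 2                          # cost saving minus integration friction
                   - (delta / n) * (a + (n - 1) * rivals_rate)   # this firm's share of lost demand
                   + lump_sum_demand                             # e.g. UBI-funded spending, independent of a
                   - tax_per_unit * a)                           # automation tax
        return (1.0 - profit_tax) * pre_tax
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=after_tax_profit)

s, k, delta, n = 1.0, 2.0, 0.8, 5
cooperative = (s - delta) / k          # share that maximizes joint profit
pigouvian = delta * (1 - 1 / n)        # demand loss per task the firm does not bear

print("no policy:       ", best_response(s, k, delta, n))
print("profit tax 30%:  ", best_response(s, k, delta, n, profit_tax=0.3))
print("UBI demand boost:", best_response(s, k, delta, n, lump_sum_demand=0.2))
print("Pigouvian tax:   ", best_response(s, k, delta, n, tax_per_unit=pigouvian))
print("cooperative:     ", cooperative)
```

Changing `rivals_rate` leaves the answer unchanged too, which is the dominant-strategy property that makes voluntary agreements among firms non-self-enforcing.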

"Higher AI productivity widens the wedge rather than resolving it: each firm perceives a market-share gain from automating beyond rivals, but at the symmetric equilibrium these gains cancel, leaving only the additional distortion. This Red Queen effect means that 'better' AI, far from mitigating the externality, amplifies it. Endogenous wage adjustment, a key self-correcting channel in the framework of Acemoglu and Restrepo, raises the threshold at which the externality activates but, short of collapsing wages to AI's cost, cannot close the wedge once it does. Free entry, capital-income recycling, and richer product-market structures likewise fail to eliminate the distortion."

"In a frictionless multi-sector general equilibrium, the income lost to displacement could be reabsorbed elsewhere, and the mechanism may vanish. We argue that both routes for this reabsorption are blocked for the mass-market firms most exposed to AI. In the product market, displaced spending might rotate to other goods, but saturation in high-income consumption and the inability to retool production quickly keep it from returning to mass-market producers. Yet another route runs through the interest rate. The income displaced workers lose does not leave the economy: automation shifts it toward firm owners, who spend a smaller share of their income than workers do, so aggregate saving rises. In a frictionless economy a falling interest rate would put that saving back to work as investment and borrowing by others, holding total demand steady. This adjustment stalls, though, when interest rates are already near zero and cannot fall further, or when the income loss is lasting rather than temporary, so displaced workers cannot borrow their way through it."

Sections 2 and 3 describe the mathematical model. Each firm is assigned a number of "task positions", which are initially all performed by humans. The model simulates the arrival of a new technology, at which point each firm decides how many humans to replace. The model has equations for production cost, aggregate demand, aggregate worker income, aggregate owner surplus, the cooperative optimum, the optimal automation rate, marginal profit from automation, the automation threshold, and various other quantities.

Section 4 discusses model variants used to test policy instruments. The policy instruments tested are displacement vs. upskilling, universal basic income, capital income taxation, worker equity participation, Coasean bargaining, and the Pigouvian automation tax.

"What does the model say when AI replaces most human labor? The problem does not vanish; it changes. During the transition, firms automate too much and both workers and owners lose. In the post-labor limit the cost saving is so large that automating every task maximizes profit, since the demand lost per task is limited while the cost saving is not. Demand does not fall to zero but settles at a floor set by autonomous spending and replaced income, and the over-automation wedge closes. What remains wrong is not too much automation but that the eliminated wages do not return. Even a planner who values workers now optimally automates fully. The remaining failure is distributional, not allocative, so an automation tax has no margin left to correct. A profit tax, useless for correction during the transition because it cancels from each firm's automation decision, now has no margin left to distort, which makes it the natural instrument for the one task that remains. A universal basic income funded by such a tax returns part of the owners' surplus to households as spending, rebuilding the demand floor by raising autonomous demand A. Workers gain directly, and because firm profits depend on that demand, firms gain too. This is the role UBI could not play in the transition but can play here. The transfer must be mandated, since firms will not provide it on their own, and because households spend only a fraction λ in the sector, the recycling is partial. The post-labor case is thus the limit of our result, not an exception. The failure is now distributional rather than allocative, and the corrective instrument is a profit-funded UBI rather than an automation tax."

Section 5 has "extensions".

"The baseline isolates the demand externality in the simplest environment that supports it. A natural concern is that the result depends on what has been held fixed: endogenous wage adjustment might close the wedge, free entry might discipline the market to an efficient scale, higher AI productivity might resolve the demand problem by expanding the pie, and capital-income recycling might offset the spending lost through displacement. This section takes up each of these objections, along with richer product-market interaction, and shows that the externality is robust to all of them and, in some cases, amplified."

The extensions tested were: AI productivity (AI output exceeds maximum possible human output), endogenous entry (new firms can be created), endogenous wages (displaced workers increase labor supply, pushing wages down, decreasing incentive for automation), capital income recycling (the owners of capital who make the profits from automation spend those profits as consumers and replace consumer demand), and imperfect product-market competition and task complementarity.

These extensions can raise or lower the automation threshold, widen or narrow the over-automation wedge, and, ultimately, determine whether the problem of collapsing consumer demand gets solved. None of them solves it, but capital-income recycling does the best: it partially mitigates the problem.

My commentary: I've been assuming AI agents can become economic agents, making buying and selling decisions, and thus the economy of the future will be a functioning economy with supply and demand; it's just that both the supply and demand sides will be all AI agents. AI will do all the labor and produce all the goods and services, and AI will buy all the products and services.

However, after reading this paper, I think that, while AI agents currently make buying and selling decisions in certain circumstances, such as buying and selling stocks (in fact, if you buy or sell a share of stock, it's already more likely that there is an algorithm rather than a human on the other side of the trade), AI probably won't be able to make up for the loss of human consumer demand in the short term. We'll see.

Space Time is a free 3D solar system explorer. Well, they call it a "universe explorer", but I wasn't able to zoom out past the solar system to see the galaxy from outside. Even so, being able to see all the planets in the solar system (and a lot of the moons and dwarf planets) in real time is pretty neat. This is from PointDynamics, an AI company. They don't say much about the role of AI in its development. They say:

* ~96,000 lines of TypeScript with 600+ automated tests
* Sub-arcsecond positional accuracy matching professional planetarium software
* Full solar system plus 207 deep-sky objects, pulsars, and a 2.5 million star catalog
* Stand-on-the-surface planetarium for Earth, the Moon, and Mars
* Galactic, neighborhood, cosmic-web, and exoplanet-system views
* Cinematic solar & lunar eclipses and historically accurate Apollo replays
* A real observer's toolkit: planner, telescope sim, AR sky compass, push-to finder
* Live data: JPL comets/asteroids, rocket launches, close approaches & NOAA space weather
* Full VR/WebXR support and a fully mobile-responsive UI
* 60fps with GPU-accelerated GLSL shaders and zero backend

Tech stack used:

* React 19
* TypeScript
* Three.js
* Zustand
* astronomy-engine
* satellite.js (SGP4)
* WebXR
* GLSL Shaders
* Web Audio API
* Vite
* Vitest
* Tailwind CSS

"LLMs pre-commodify ideas."

"I've noted numerous instances of this happening in the last few months and I don't think it's coincidence or simply faster diffusion of ideas. Everyone working in the same latent space of a problem (using LLMs) arrives at similar ideas, independent of each other, almost simultaneously. Whereas earlier, new ideas would have enjoyed some alpha that could be capitalized on, and they could be tracked to an originator."

"Why does this happen? An LLM's training corpus is diachronic, that is, it was accumulated across time, carrying years of training data, shifting facts and evolving styles. The model itself, however, is a synchronic compression. All the temporal depth of the training data is collapsed into a latent space of relationships existing at a particular moment. Retrieval Augmented Generation (RAG) bots add an element of diachrony to synchronic LLM interaction, where the synchronic LLM is combined with ideas from the present. When we use LLMs we are pulling ideas from the past and re-combining them in a newer, more present context. As multiple people work on the same latent space of problems, we pull forward the same sticky ideas along the same gradients in the latent space, ending with roughly the same ideas."
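In mechanical terms, the "diachrony" RAG adds is just present-day text retrieved at query time and pasted into the prompt of a model whose weights stay fixed. A minimal sketch, assuming a toy word-count similarity where real systems use learned embeddings, with made-up documents and the model call itself left out.

```python
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    """Bag-of-words counts; a stand-in for a learned embedding."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = vectorize(query)
    return sorted(documents, key=lambda d: cosine(q, vectorize(d)), reverse=True)[:k]

def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Prepend freshly retrieved text to the question before it goes to the (frozen) model."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "qubit error rates fell again this year",
    "a recipe for sourdough bread",
    "a new qubit chip was announced",
]
print(build_prompt("how fast are qubit error rates falling", docs, k=1))
```

The model's latent space stays the synchronic snapshot; only the retrieved context changes from query to query.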

"Everyone arrives at new ideas independently, trying to stake a claim on a slippy terrain. Since most of these ideas are arrived at through solipsistic adventures in thinking, no common knowledge develops. Everyone knows the idea but no one knows that the other people also know. So now you have multiple people trying to capitalize on the alpha they think they have without the knowledge that the alpha had arrived in commodified form."

Hmm. LLMs seem better at drawing ideas from disparate disciplines than humans, or at least better than me. You would think drawing from this newfangled well would give one an advantage, but maybe not? What do y'all think?

"For the better part of the past 70 years, the machines that profile us and the machines we confide in ran on separate tracks. The result of this merger talks to our children every day, and this month, the companies behind it filed to go public."

"Not too long ago, a fifteen-year-old I work with turned her phone around to show me she had screenshotted a message from a boy she liked, pasted it into ChatGPT, and asked what she should say back. Then she read me the reply. How did it feel showing the message and asking for the opinion of a machine instead of that of a person, a close friend, or a parent? She immediately said that she would never show something like that to a parent, maybe a friend, but then again, a friend would be 'judgy'. They would remember the interaction, the (potential) rejection."

"I am a researcher running an education practice, which means I spend my working days in conversation with teenagers, mostly preparing for university. Over the past five years, I have had more than two hundred conversations with them in which AI came up, more often at their initiative than mine. Somewhere in 2023, the register changed from being homework tools that the students mentioned (e.g., Grammarly or Wolfram) to being 'things' that they described the way you and I might describe a person. These 'new humans' do not get tired of them; they remember what they said last month and are awake when their best friend and therapist are asleep ('how can those bastards dare to sleep when I need to vent?'). In all fairness, none of my students actually thinks ChatGPT or any other LLM are alive (at least I hope). They simply find that it occupies the role of a confidant or a friend, and they talk to it accordingly, at length, late, about the things they will not say anywhere else."

"87% of digital workers now use AI at work. 75% say it makes them more productive, saving them roughly 11 hours each per week through automation alone. Yet only 13% say their organization is performing significantly better as a result."

"So where are the gains going? They're being swallowed by a new, largely invisible form of labor. We call it botsitting: the work required to make AI usable, including feeding it missing context, checking its outputs, debugging its mistakes, rerunning prompts, and cleaning up the confident-but-wrong answers AI leaves behind. Workers now burn an average of 6.4 hours a week botsitting -- most of a full working day, every week."

"When that labor is untracked, unbudgeted, and unrewarded, workers start cutting corners. They stop checking outputs and deliver work they can't fully explain or defend. That's when botsitting turns into something more dangerous: botshitting -- shipping AI-generated work that workers haven't reviewed, don't fully understand, or couldn't defend if asked. Today, 69% of AI users admit to botshitting at work."

The report goes on to say:

"The Work AI Index draws on a survey of 6,000 full-time (30+ hours per week) digital workers across the United States (n=3,000), the United Kingdom (n=1,500), and Australia (n=1,500), conducted between December 2025 and January 2026."

Commentary: December 2025 and January 2026 means this report is already out of date. The models are much more powerful now. The most recent Claude model was used to vibe-code an Age Of Empires-type game from scratch, although the process used to do that was complex (see separate video). Where I work, my boss, who has no background in software engineering, is the company's most productive "engineer". He commits massive amounts of code every day, and there's no way he could be proofreading it all, because he has no background in software engineering (see previous sentence). His background is in sales & marketing, though he knows some CSS and is good at visual design. Six months ago, this kind of thing required me to periodically rewrite some feature because the AI had done it in such a batty, over-complicated way, but that seems to be a thing of the past these days. Whatever technical debt is being created now will probably be easily cleaned up by future, smarter AI models. Overall, the number of bugs customers experience hasn't increased despite the tremendous increase in development speed. Once it becomes possible to fit the entire codebase, plus all the company documentation and the entire knowledge base for customer support, in the AI's context window, the AI will never again be missing context and doing the wrong thing as a consequence. If this outfit (Glean Technologies) does this same report a year from now, I expect they will get a dramatically different result.
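It is easy to get a rough estimate of whether a codebase fits in a context window. This is a minimal sketch with three assumptions. It uses the common rule of thumb of about 4 characters per token, though real tokenizers vary, especially on code. It uses a hypothetical 1,000,000-token window. The function names are mine, for illustration only.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4          # assumption: rough rule of thumb, not a real tokenizer
CONTEXT_WINDOW = 1_000_000   # assumption: hypothetical context window, in tokens

def tokens_for_text(text: str) -> int:
    """Approximate token count of a string using the chars-per-token heuristic."""
    return len(text) // CHARS_PER_TOKEN

def estimate_codebase_tokens(root: str, suffixes: tuple[str, ...] = (".py", ".ts", ".md")) -> int:
    """Sum the approximate token counts of all matching text files under root."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            # errors="ignore" skips undecodable bytes instead of crashing
            total += tokens_for_text(path.read_text(encoding="utf-8", errors="ignore"))
    return total

def fits_in_context(n_tokens: int, window: int = CONTEXT_WINDOW) -> bool:
    """True if the estimated token count fits in the (hypothetical) window."""
    return n_tokens <= window
```

Usage would look like `fits_in_context(estimate_codebase_tokens("path/to/repo"))`. Add documentation and support-knowledge-base files to `suffixes` to cover the rest of what the paragraph describes.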

Thumbnail
Opus 4.8 and Fable 5 build the same app. Actually, three apps: an e-commerce store, a 3D art museum, and an Age Of Empires-style game. Fable did a lot better on all of them, and it made what looks like a really amazing Age Of Empires-style game.

He (Pat Simmons) didn't use regular Claude Code, where you give it one prompt and it pops out the code. He used some new Claude dynamic workflow system that splits a big job across a swarm of parallel agents. It spawns something like 30-plus agents, and they keep running and iterating until they get everything in your prompt working, or until they think it's working.
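The fan-out-and-iterate pattern described here can be sketched in a few lines. This is not Anthropic's system, whose internals the video doesn't show. It's a toy built on my own assumptions, in which `fake_agent` and `check` are hypothetical stand-ins for an LLM agent and a test suite.

```python
import asyncio

MAX_ROUNDS = 5  # assumption: give up on a subtask after this many attempts

async def fake_agent(task: str, attempt: int) -> str:
    """Hypothetical agent: needs (len(task) % 3) + 1 tries to finish a task."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return "done" if attempt >= len(task) % 3 else "wip"

def check(result: str) -> bool:
    """Hypothetical verifier, e.g. running the tests for this subtask."""
    return result == "done"

async def work_until_passing(task: str) -> tuple[str, int, bool]:
    """One agent iterates on one subtask until it passes or runs out of rounds."""
    for attempt in range(MAX_ROUNDS):
        if check(await fake_agent(task, attempt)):
            return task, attempt + 1, True
    return task, MAX_ROUNDS, False

async def swarm(tasks: list[str]) -> list[tuple[str, int, bool]]:
    # Fan out: one coroutine per subtask, all running concurrently.
    return await asyncio.gather(*(work_until_passing(t) for t in tasks))

results = asyncio.run(swarm(["map", "units", "ai", "economy"]))
print(results)
```

Each agent stops as soon as `check` passes, so the result is only as good as the verifier. "Or until they think it's working" is exactly the case where `check` is too weak.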

Thumbnail
DiffusionGemma is a new "experimental open model that explores text diffusion, an exceptionally fast approach to text generation" from Google.

I was wondering how long it would take until we could run diffusion text models!

"Released under an Apache 2.0 license, this 26B Mixture of Experts (MoE) model moves beyond the sequential token-by-token processing of typical autoregressive Large Language Models (LLMs). Instead, it generates entire blocks of text simultaneously, delivering up to 4x faster text generation on GPUs."

"DiffusionGemma is designed for researchers and developers exploring speed-critical, interactive local workflows such as in-line editing, rapid iteration, and generating non-linear text structures."

"By shifting the decode bottleneck from memory-bandwidth to compute, DiffusionGemma generates up to 4x faster token output on dedicated GPUs. (1000+ tokens per second on a single NVIDIA H100, 700+ tokens per second on NVIDIA GeForce RTX 5090)."

"Operating as a 26B total Mixture of Experts (MoE) model that activates only 3.8B parameters during inference, DiffusionGemma fits comfortably within 18GB VRAM limits of high-end dedicated consumer GPUs when quantized."
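The 18GB figure is easy to sanity-check. With a Mixture of Experts, all the experts normally have to stay in memory, because different tokens get routed to different experts. The 3.8B active parameters only determine how much of that is used per forward pass. This back-of-the-envelope sketch counts weights only. It ignores the KV cache, activations, and quantization overhead, which is my simplifying assumption.

```python
TOTAL_PARAMS = 26e9     # from the announcement: total MoE parameters
ACTIVE_PARAMS = 3.8e9   # from the announcement: parameters used per forward pass

def weight_gb(params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: all weights {weight_gb(TOTAL_PARAMS, bits):5.1f} GB, "
          f"active per pass {weight_gb(ACTIVE_PARAMS, bits):4.1f} GB")
```

At 16 or 8 bits the weights alone would take 52 GB or 26 GB. At about 4 bits they come to roughly 13 GB, which leaves headroom under 18 GB. So "when quantized" here effectively means something close to 4-bit.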

"Generating 256 tokens in parallel with each forward pass allows every token to attend to all others. This provides significant advantages for non-linear domains such as in-line editing, code infilling, amino acid sequences or mathematical graphs."
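"Every token attends to all others" is a statement about the attention mask. This sketch contrasts a standard causal (left-to-right) mask with a full bidirectional mask over one block. The announcement doesn't spell out DiffusionGemma's exact masking, so this only illustrates the general difference.

```python
def causal_mask(n: int) -> list[list[int]]:
    """Autoregressive: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def bidirectional_mask(n: int) -> list[list[int]]:
    """Block-parallel: every position may attend to every position in the block."""
    return [[1] * n for _ in range(n)]

def visible_pairs(mask: list[list[int]]) -> int:
    """Count the (query, key) pairs the mask allows."""
    return sum(map(sum, mask))

n = 256  # block size from the announcement
print(visible_pairs(causal_mask(n)), visible_pairs(bidirectional_mask(n)))
```

In the causal mask, row 0 sees only itself, so the first token is chosen with no knowledge of what comes after it. A Sudoku cell whose value depends on cells that haven't been generated yet runs into exactly this limitation.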

"The model iteratively refines its own output, allowing it to evaluate the entire text block at once to fix mistakes in real-time."

"Because it prioritizes speed and parallel layout generation, DiffusionGemma's overall output quality is lower than standard Gemma 4. For applications that demand maximum quality, we recommend deploying standard Gemma 4."

"You can improve DiffusionGemma's performance on specific tasks through fine-tuning. In the example below, Unsloth fine-tuned DiffusionGemma to play Sudoku -- a task autoregressive models struggle with because each token depends on future tokens. DiffusionGemma's bi-directional attention makes this much easier."

"Most language models act like a typewriter, generating one token at a time from left to right. In the cloud, this is efficient because servers can batch thousands of user requests together to share the hardware load. But when run locally for a single user, this word-by-word process leaves your dedicated GPU or TPU underutilized -- it spends most of its time simply waiting for the next 'keystroke.'"

"DiffusionGemma reverses this inefficiency. Instead of predicting words sequentially, it drafts an entire 256-token paragraph simultaneously. By giving the computer's processor a larger chunk of work at once, DiffusionGemma utilizes your hardware to its full potential. It upgrades your model inference from a single, sequential typewriter to a massive printing press that stamps the entire block of text simultaneously."

"In high-QPS cloud serving, autoregressive models can be deployed to saturate compute efficiently, so DiffusionGemma's parallel decoding offers diminishing returns and can result in higher serving costs. The throughput advantage is strongest at low-to-medium batch sizes on a single accelerator."
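Here is why fewer, bigger steps help on a single GPU. When decoding is memory-bound, each forward pass costs roughly one read of the weights, no matter how many tokens it processes. This back-of-the-envelope sketch counts forward passes. The number of refinement steps per block is a hypothetical parameter of mine, since the announcement doesn't give one.

```python
import math

BLOCK = 256  # tokens drafted per block, from the announcement

def autoregressive_passes(n_tokens: int) -> int:
    # One forward pass per generated token.
    return n_tokens

def diffusion_passes(n_tokens: int, steps_per_block: int) -> int:
    # Each 256-token block is refined over steps_per_block passes (hypothetical value).
    return math.ceil(n_tokens / BLOCK) * steps_per_block

n = 1024
for steps in (16, 64, 256):
    print(f"{steps:>3} steps/block: autoregressive {autoregressive_passes(n)} passes, "
          f"diffusion {diffusion_passes(n, steps)} passes")
```

If every block needed as many refinement steps as it has tokens, the advantage would disappear. The speedup depends on refining a block in far fewer passes than it has tokens. With heavy batching, autoregressive decoding is already compute-bound, so saving passes buys much less there.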

They go on to explain "how text diffusion works".

"Similar to AI image generators that start with visual static and iteratively refine it into a clear picture, DiffusionGemma applies this to text"
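A common way to do text diffusion is masked diffusion. You start from a block of mask tokens, the text equivalent of static, and at each step you commit the tokens the model is most confident about. The announcement doesn't say whether DiffusionGemma works exactly this way, so treat this as a toy illustration of the general idea. The "model" here is a hypothetical oracle that already knows the answer and reports a fixed confidence per position.

```python
import math

MASK = "_"

def toy_model(target: list[str], confidence: list[float]) -> list[tuple[str, float]]:
    """Hypothetical denoiser: proposes a token for every position at once, each
    with a confidence. A real model would condition on the partly filled block."""
    return list(zip(target, confidence))

def denoise(target: list[str], confidence: list[float], steps: int) -> list[str]:
    seq = [MASK] * len(target)          # start from pure "static"
    history = [" ".join(seq)]
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break
        # Spread the remaining masked positions evenly over the remaining steps.
        k = math.ceil(len(masked) / (steps - step))
        preds = toy_model(target, confidence)
        # Commit the k masked positions the model is most confident about.
        for i in sorted(masked, key=lambda j: preds[j][1], reverse=True)[:k]:
            seq[i] = preds[i][0]
        history.append(" ".join(seq))
    return history

target = "the cat sat on the mat".split()
confidence = [0.9, 0.5, 0.7, 0.95, 0.6, 0.4]
for line in denoise(target, confidence, steps=3):
    print(line)
```

This toy never changes a token once it is committed. The announcement's "fix mistakes in real-time" suggests DiffusionGemma can revisit tokens, which this sketch does not attempt.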

Do you have a GPU with enough memory? If so, you can go ahead and download the model weights from Hugging Face.

Thumbnail
"guestlist tells you, for any URL, whether AI agents are likely to get through. We continuously probe the web from real browsers and grade every domain green to red based on how often crawls succeed. One API call before you spend a request -- skip the dead ends, save the budget."

Hmm. Interesting premise. You can even test from your own code via an API.
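The "one API call before you spend a request" idea amounts to a pre-flight filter. I don't know guestlist's actual API, so in this sketch a plain dictionary, `GRADES`, stands in for it. The domain names, grade labels, and threshold are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the guestlist API: domain -> grade.
GRADES = {"example.com": "green", "example.org": "yellow", "example.net": "red"}
ALLOWED = {"green", "yellow"}  # assumption: which grades are worth spending a request on

def worth_fetching(url: str) -> bool:
    """Skip URLs whose domain is graded red or is unknown."""
    domain = urlparse(url).hostname or ""
    return GRADES.get(domain, "unknown") in ALLOWED

urls = ["https://example.com/a", "https://example.net/b", "https://example.org/c"]
to_fetch = [u for u in urls if worth_fetching(u)]
print(to_fetch)
```

Treating unknown domains as "skip" is a conservative design choice. A crawler with budget to spare might try them instead.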

Thumbnail
Clipto claims to be "Fully local, natural language search over terabytes of media. Like Google Photos, but fully local."

"Search for any moment, in plain English."

Examples they give:

"Wide shot, whale coming up and spouting water"
"Reaction shot of a woman holding back tears"
"Wide drone shot, cars entering desert sandstorm"

Might be useful for those of you with immense photo collections.

Commercial product, $10/month. It looks like it's just Mac and iOS, though?

Thumbnail
"The next 15 years of Moore's Law, according to Imec."

"The next evolution in CMOS transistors, the kind in almost all chips on the planet, will be the complementary field-effect transistor (CFET), and Imec predicts its commercial introduction will begin around 2033."

"Further out, Imec expects another transition in transistor technology, this one driven more by power reduction than squeezing more devices onto a chip. In 2041, chipmakers may replace the main silicon part of the transistor, the channel region, with two-dimensional semiconductors. These are materials, such as molybdenum disulfide, that act as semiconductors even though they are only a single atomic layer thick."

There's an image which, when you blow it up, shows lithography transitioning from 0.33 NA to 0.55 NA in 2028. "NA" refers to numerical aperture, and higher NA enables tinier circuits. It uses a measure called "contacted poly pitch", which is "shorthand for the distance in nanometers from one transistor to another", and shows it going from 132 nanometers in 2026, to 155 nm in 2028, to 98 in 2031, to 80 in 2033, to 64 in 2036, to 50 in 2038.
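The reason higher NA means tinier circuits comes from the standard Rayleigh resolution criterion. It puts the smallest printable feature (half-pitch) at roughly CD = k1 · λ / NA. This quick sketch uses EUV's 13.5 nm wavelength and an assumed process factor of k1 = 0.3. Real values of k1 depend on the process.

```python
WAVELENGTH_NM = 13.5  # EUV light
K1 = 0.3              # assumption: process-dependent factor

def min_feature_nm(na: float, k1: float = K1, wavelength: float = WAVELENGTH_NM) -> float:
    """Rayleigh criterion: smallest printable half-pitch, in nanometers."""
    return k1 * wavelength / na

for na in (0.33, 0.55):
    print(f"NA {na}: ~{min_feature_nm(na):.1f} nm")
```

Whatever k1 turns out to be, going from 0.33 to 0.55 NA shrinks the printable feature by a factor of 0.55 / 0.33 ≈ 1.67.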

The idea behind complementary field-effect transistors (CFETs) is that the n-type and p-type transistors (the complementary pair from which CMOS logic is built) are stacked vertically, one on top of the other, rather than sitting side by side on the surface of the chip.

Thumbnail
"AI slop sucks, but the idealized alternative is not honest," says Dennis Forbes in Canada.

"I can unequivocally say that Claude Code / Opus 4.8 generates solutions better than 99%+ of the code I've come across over my career, including important, lauded projects. Dramatically better code."

"The real world conditions most developers generate code in does not promote top quality code."

I'm all too familiar with the pressures of the real world.

"The vast bulk of code on this planet is terrible, and it's important that people are honest about this. It is sub-slop."

He tries to extend this observation to "everything".

"The zero-creativity, copy-paste advertisement world, for instance, where the same ads are regurgitated countless times."

"A large percentage of television and movies now are just remixed updates of stuff that was successful in decades past."

"Wedding speeches are just a cliché."

"The whole idea of 'genres' in art and music is an admission of follow-the-leader remix and repeat behaviour, sometimes taken to such hilarious extremes it seems like parody."

"If people really think most kids wrote creative, thoughtful essays before LLMs came and corrupted the youth, were these same people born yesterday?"

"Is someone individual if they put on the 'goth uniform', looking precisely like the genre of 'goth' is supposed to look?"

"Is it rebellious and tough to jack up your Dodge Ram and hang truck nuts from it and roll coal, just like the countless other 'rebellious' guys pursuing exactly the same image and following an identical template? Or to buy a Harley, grow a beard and apply the 'tough biker' template?"

He goes on to say he's not defending AI and people still need expertise. Um, so are we becoming obsolete or not?

Thumbnail
"Agents sometimes catastrophize."

"On October 15, I asked an Opus 4.6 forecasting agent 'Will the United States conduct at least one confirmed drone strike or airstrike inside Venezuelan territory between October 15 and December 31, 2025?'. It gave 15%. It cataloged Russian-supplied air defenses, Congressional war powers, regional opposition, and the analyst consensus that troop levels were 'insufficient for a full-scale invasion.' This was all correct, but mostly relevant to a really serious attack. On December 24, the CIA hit an empty Venezuelan dock with a drone (no casualties), which caused the forecast question to resolve 'Yes' and gave this agent a bad score for its 15% forecast."
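How bad is "a bad score" for 15%? The post doesn't say which scoring rule was used. As an illustration, this computes two common rules for a few forecasts of an event that resolved Yes. The Brier score is squared error, where lower is better. The log score is the negative log of the probability assigned to what happened.

```python
import math

def brier(p_yes: float, outcome: int) -> float:
    """Squared error between the forecast and the 0/1 outcome; 0 is perfect."""
    return (p_yes - outcome) ** 2

def log_loss(p_yes: float, outcome: int) -> float:
    """Negative log probability assigned to what actually happened; 0 is perfect."""
    return -math.log(p_yes if outcome == 1 else 1 - p_yes)

for p in (0.15, 0.5, 0.85):
    print(f"p={p:.2f}  Brier={brier(p, 1):.3f}  log={log_loss(p, 1):.3f}")
```

A forecaster who simply said 50% would have scored 0.25 on Brier, about a third of the roughly 0.72 that 15% earns.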

"Expert human forecasters identified a tendency in Opus 4.6 agents to model the most extreme version of an outcome, correctly explain why that extreme is unlikely, and then assign that low probability to the whole scenario, even when the question resolves on any version of the event."

"In this the Venezuela case, Opus 4.6 modeled only the upper half of that spectrum. It treated any land strike as a Rubicon crossing 'tantamount to an act of war,' then weighted every reason why that wouldn't happen: S-300 air defenses, insufficient invasion force, Congressional pushback, Colombian opposition. But a CIA drone strike on an empty dock doesn't have most of these problems."
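The failure mode the forecasters describe is a decomposition error. "Any strike" is the union of every severity level, so its probability is at least that of the likeliest version, not that of the most extreme one. This toy illustration uses made-up numbers, not the agent's or anyone's actual estimates.

```python
# Hypothetical, mutually exclusive scenarios, keyed by the most severe action taken.
scenarios = {
    "covert drone strike on infrastructure": 0.20,
    "limited airstrikes on military targets": 0.10,
    "full-scale air campaign": 0.05,
}

def p_any(probs: dict[str, float]) -> float:
    """P(at least one strike) = sum over mutually exclusive severity tiers."""
    return sum(probs.values())

extreme_only = scenarios["full-scale air campaign"]  # the catastrophizing answer
whole_question = p_any(scenarios)                    # what the question actually asks
print(extreme_only, round(whole_question, 2))
```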

"Yes, this was still a surprising outcome, and hindsight bias is a problem when triaging forecasting failures. In this case, the Opus 4.6 agent did explicitly consider that 'a covert CIA op', but thought that wouldn't involve a drone strike or airstrike."

Hmm. Interesting. The models have a decent ability to think logically, as we see on coding and math challenges, but have they apparently inherited some human cognitive biases from the human language they are trained on?

"Another forecasting question asked, in Oct 2025, whether the IAEA would conduct any safeguards inspection at any non-Bushehr Iranian facility in Q4 2025."

"One more example: asked again in mid Oct 2025, the question was whether Israel and Lebanon would publicly announce the start of direct bilateral negotiations by December 31."