Boulder Future Salon

"Pete Hegseth meets with Anthropic CEO over disagreements about AI guardrails for military use."

"Anthropic has concerns over two issues that it isn't willing to drop, the source said: AI-controlled weapons and mass domestic surveillance of American citizens."

I guess the US government is dropping all pretense that it doesn't do mass surveillance on US citizens?

And wants AI to make "kill" decisions. I remember years ago when futurists would debate whether AI would ever be used to make actual "kill" decisions. So naïve.

"Last week, Axios reported Hegseth was close to cutting the Pentagon's contract with Anthropic and designating the company a 'supply chain risk'. That designation, usually reserved for companies seen as extensions of foreign adversaries like Russia or China, could severely impact Anthropic's business, because any of its enterprise customers with government contracts would have to make sure their government work doesn't touch Anthropic's tools."

So, people can't use a product unless it's made *more* dangerous? Even if their own work (their own government contract -- not even necessarily a military contract) does not require the dangerous capabilities? How is that logical? I know, I know, "politics" and "logical" have nothing to do with one another.

"PicoClaw: the ultra-efficient AI assistant in Go"

They say "$10 hardware", "10MB RAM", "1s boot"

Apparently this is made by a Chinese company that makes the "$10 hardware". I looked on Amazon and the cheapest I could find was $15. (Inflation!).

"The LicheeRV Nano is a mini-sized development board (measuring only 22.86*35.56mm), equipped with the SG2002 processor. It features a powerful core running at 1GHz (RISC-V/ARM options available) and a smaller core at 700MHz RISC-V, along with 256MB DDR3 memory, and an integrated 1Tops NPU."

"The board includes a wealth of interfaces such as MIPI-CSI, MIPI-DSI, SDIO, ETH, USB, SPI, UART, I2C, etc., allowing for the expansion of a wide variety of applications. Its through-hole/half-hole design facilitates easy mass production and soldering. We have Four Different Version to choose:"

"LicheeRV-Nano-B"
"LicheeRV-Nano-E"
"LicheeRV-Nano-W"
"LicheeRV-Nano-WE"

"B" is basic, "E" adds ethernet, "W" adds Wi-Fi, and "WE" has both Wi-Fi and Ethernet.

I like Go. I wonder what I would use one of these for? Maybe y'all can tell me.

"Warning: picoclaw is in early development now and may have unresolved network security issues. Do not deploy to production environments before the v1.0 release."

Oh, alrighty then.

Brrrrrp! I decided to ask AI. AI says:

"The Sipeed LicheeRV Nano is a thumb-sized, 22x36mm RISC-V Linux development board designed for compact, low-power, and AI-enabled projects. Its primary use cases include acting as a compact IP-KVM (Keyboard, Video, Mouse) for remote server management, edge AI vision processing, and smart home gateways, thanks to its 1TOPS NPU and MIPI camera support."

So it can be used as a remote keyboard+video+mouse system (called NanoKVM); as an "AI edge vision" system, thanks to the neural processing unit (NPU) plus support for 4-megapixel MIPI-CSI cameras (MIPI is the Mobile Industry Processor Interface and CSI is its Camera Serial Interface protocol); as an embedded board in "internet of things" (IoT) gateways and home automation controllers; and for prototyping industrial robotics systems (RISC-V is an open instruction set architecture for CPUs).

"The board is ideal for developers seeking a budget-friendly, powerful RISC-V alternative to ARM-based single-board computers (SBCs) for specialized, compact, or AI-driven projects."

Andrew Yang predicts the End of the Office. Andrew Yang is the politician who ran for President in 2020 on a universal basic income platform. Here, his rationale for predicting unemployment for millions of white-collar workers in the next 12-18 months seems to be the release of Claude Co-work and the release of plug-ins for Claude Co-work for legal, financial, and marketing functions.

"This automation wave will kick millions of white-collar workers to the curb in the next 12-18 months. As one company starts to streamline, all of their competitors will follow suit. It will become a competition because the stock market will reward you if you cut headcount and punish you if you don't. As one investor put it, 'Sell anything that consists of people sitting at a desk looking at a computer.'"

"I spoke to the CEO of a publicly traded tech company. He said, 'We're firing 15% of workers right now. We'll probably do another 20% 2 years from now. And then another 20% 2 years later. After that, who knows?'"

"Business functions are going to get compressed into a handful of key employees supplemented by AI."

As I understand it, corporate profit margins are at or near record highs, so the reason for doing this is to make corporate profit margins even higher.

In the piece he uses the term "K-shaped economy". I didn't know what this meant so I Googled it. Picture a graph with time on the x axis and money on the y axis. At some point in time, the single line splits into two lines, one trending upward and the other trending downward, so the shape resembles the letter "K". It describes an economy where one group of people, represented by the top line, gets richer and richer, while another group, represented by the bottom line, gets poorer and poorer. In this case, the top line is people who make money from money -- i.e. investors -- and the bottom line is people who make money from labor -- i.e. workers, especially "white collar" workers.

As expected, he goes on to advocate for universal basic income, but here it's barely a mention. He says he will say more on that later. For now, he is just predicting a jobs bloodbath in the next 12-18 months. Along with that, he predicts collapsing values for commercial real estate, collapsing values for housing in markets supported by office work, surging personal bankruptcies, a sharp devaluing of college degrees, and closure of dozens or hundreds of colleges that survive on the margins.

To me, it seems like the odds of a universal basic income happening are essentially 0%. Government debt and deficit are so out of control already that we have regular government shutdowns over budget disputes. I've decided this will be the last time I mention universal basic income because there doesn't seem to be much point in talking about something that can't happen, especially if that means giving people false hope.

I've been thinking more about the METR data, because of the Remote Labor Index (RLI) that just got created and the video by Drew Spartz where he used METR data to make the case we're not in an AI bubble. If we assume METR's data is correct we can calculate when the job market for human cognitive labor ("white-collar") will disappear. I'm going to "show my work" doing the calculations in Python.

I'm using 365.24218 as the number of days in a year, so if we multiply by (5/7) for weekdays and assume humans don't work weekends, and subtract 10 for 10 vacation days per year, we get:

>>> (365.24218 * (5/7)) - 10
250.88727142857147

about 251 working days per year. We multiply by 8 to get the number of hours:

>>> ((365.24218 * (5/7)) - 10) * 8
2007.0981714285717

so about 2007 hours per year. We assume people work from 22 (youngest age people typically graduate college) to 65, that's:

>>> 65-22
43

43 years. So we multiply our 2007 hours by 43 to get total work hours per lifetime:

>>> (((365.24218 * (5/7)) - 10) * 8) * 43
86305.22137142859

Pretty close to the "80,000 hours" expression people use. Ok, now we assume current AI can do tasks equivalent to a human working 14.5 hours (METR's latest statistic as of February 2026), so we divide by 14.5 to get the "increase factor" needed for AI to do a human lifetime's worth of work:

>>> ((((365.24218 * (5/7)) - 10) * 8) * 43) / 14.5
5952.084232512317

We need to know how many doublings this is, so take the logarithm base 2:

>>> import math
>>> math.log(((((365.24218 * (5/7)) - 10) * 8) * 43) / 14.5) / math.log(2)
12.53917922793732

We get about 12.5 doublings. If each doubling requires 7 months, then we multiply by 7 to get months:

>>> (math.log(((((365.24218 * (5/7)) - 10) * 8) * 43) / 14.5) / math.log(2)) * 7
87.77425459556123

And divide by 12 to get years:

>>> ((math.log(((((365.24218 * (5/7)) - 10) * 8) * 43) / 14.5) / math.log(2)) * 7) / 12
7.314521216296769

And add the current year, 2026, to find out the date this point is reached:

>>> (((math.log(((((365.24218 * (5/7)) - 10) * 8) * 43) / 14.5) / math.log(2)) * 7) / 12) + 2026.1478
2033.4623212162967

I'm adding 2026.1478 rather than a flat 2026 to take into account that METR's 14.5-hour data point is from late February rather than the beginning of the year. So we get 2033.46, or sometime in mid-June 2033.

Let's do all this again, except comparing AI against human calendar time, which we'll treat as equivalent to humans working 24 hours a day, 365 days a year. This just means changing the 8 to 24 and dropping the (5/7) weekday factor and the -10 vacation days. In this case the number of doublings comes out to 14.67..., and this time we get a date in 2034:

>>> (((math.log(((365.24218 * 24) * 43) / 14.5) / math.log(2)) * 7) / 12) + 2026.1478
2034.702940903562

So mid-September 2034. All this is without taking into account error bars. We could assume the 7 months is really 7 +/- 1 and we could assume the 14.5 hours is really 14.5 +/- 1.

With these error bars we get a lower bound of 2032.4:

>>> (((math.log(((((365.24218 * (5/7)) - 10) * 8) * 43) / 15.5) / math.log(2)) * 6) / 12) + 2026.1478
2032.369281956339

and an upper bound of 2036.0:

>>> (((math.log(((365.24218 * 24) * 43) / 13.5) / math.log(2)) * 8) / 12) + 2026.1478
2035.9938328850944

So somewhere between mid-May of 2032 and the end of December 2035.
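All of the steps above can be packaged into one small script (same numbers as the REPL session; the function name and structure are just my own packaging):

```python
import math

DAYS_PER_YEAR = 365.24218  # mean tropical year, as above
START = 2026.1478          # late Feb 2026: date of METR's 14.5-hour data point

def crossover_year(lifetime_hours, task_hours, months_per_doubling):
    """Year when AI task length, starting at task_hours and doubling
    every months_per_doubling months, reaches lifetime_hours."""
    doublings = math.log2(lifetime_hours / task_hours)
    return START + doublings * months_per_doubling / 12

# Working-hours lifetime: ~251 workdays/year * 8 hours/day * 43 years
work_hours = (DAYS_PER_YEAR * (5/7) - 10) * 8 * 43  # ~86,305 hours
# Calendar-time lifetime: 24 hours/day, every day, 43 years
cal_hours = DAYS_PER_YEAR * 24 * 43                 # ~376,930 hours

print(crossover_year(work_hours, 14.5, 7))  # ~2033.46 (central estimate)
print(crossover_year(cal_hours, 14.5, 7))   # ~2034.70 (calendar basis)
print(crossover_year(work_hours, 15.5, 6))  # ~2032.37 (lower bound)
print(crossover_year(cal_hours, 13.5, 8))   # ~2035.99 (upper bound)
```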

So it looks like, by the end of 2035, there will be no job market for human cognitive labor (no "white-collar" job market).

Why this is plausible:

Many times I have seen machines be worse than humans at something for a very long time, but then measurable progress starts showing up. And very shortly after that (let's say, within ~5 years), machines exceed humans -- and not just average humans, but the *best* humans.

Example: The Chinese game of Go. Machines beat humans at chess in the 1990s, but Go seemed out of reach, because of its vastly greater combinatorial explosion and the need for players to rely on "intuition" instead of raw calculation. In 2014, DeepMind founded the AlphaGo project, and, while AlphaGo was initially terrible compared with human players, it was improving at a noticeable pace. In October 2015, the DeepMind project team felt AlphaGo was better than any of them, so they invited a professional Go player, Fan Hui, to play against it. AlphaGo won, so in March of 2016, a newer version of AlphaGo faced off against Lee Sedol, one of the planet's highest-ranking players. AlphaGo won 4 out of 5 games, and made the famous "Move 37" that was originally thought to be a mistake but turned out to be a creative, highly effective move that no human would have considered. The following year, in January 2017, a new version called AlphaGo Master played online against 60 top professional Go players and finished with 60 wins and 0 losses.

We see a similar example with AlphaFold, DeepMind's protein folding AI. Now we have all these benchmarks with math and coding problems, etc. You get the idea.

Today, we see measurable progress against humans at cognitive labor. Since we are out of the "worse than humans for a very long time" domain and now in the "measurable progress starts showing up" domain, it seems reasonable that machines will exceed humans very soon. So mid-June 2033 seems like a plausible time for machines to exceed humans at all cognitive labor. Even if you assume METR's "14.5 hours" is way off and machines are still down at the 1-hour mark, that only pushes the date out to 2039, maybe 2040.

This is for cognitive labor. We aren't yet seeing measurable progress against humans at manual dexterity, at least not in uncontrolled environments. By "uncontrolled environments", I mean, think about a plumber or electrician, or a construction worker on a construction site -- out in the wild, not in a factory where the environment is tightly controlled and the intelligence requirements for manual dexterity are lower. There have been robots in factories for decades.

At some point, we will start seeing measurable progress against humans in manual dexterity jobs in uncontrolled environments. At that time, we can repeat these calculations with that data. Since it looks like manual dexterity jobs will be the last jobs automated, once manual dexterity jobs are automated, the job market for humans won't exist.

I think at this point we can say pretty confidently that the job market for human cognitive labor ("white-collar") will be gone by 2036 (2040 at the absolute latest). It could happen as early as mid-May 2032, with mid-June 2033 as the "most probable" date, as you saw. But we don't know when the job market for manual dexterity in uncontrolled environments ("blue-collar") will be gone.

Is there an AI bubble? Drew Spartz reviews the METR graph, which compares AI's ability to perform tasks against humans, and shows that not only is the trend exponential, but every prediction that it would slow down has been wrong so far. Therefore: there is no bubble. AI taking over economically valuable tasks isn't a hypothetical, it's reality.

The video makes a good case, reviewing the "jaggedness" of AI, its performance on physics, math, and coding benchmarks, and how there isn't any indication of a slowdown in METR's data. METR's data has been extended to include models that came out last December.

What gives me pause is my experience in the "dot-com" bubble. Just because an underlying technology is advancing nicely according to an exponential curve doesn't prevent a bubble from occurring in the *financial* markets. Financial markets reflect investor sentiment, which, admittedly, can't detach from reality completely, but can deviate from reality for a time, as we see from history.

I don't know if we're in a bubble, but I recommend you all set your expectations for the underlying technology to continue its exponential increase in capabilities.

I heard there is a mathematical proof that large language models (LLMs) have a limit to their ability to give accurate answers and thus hallucinations are impossible to prevent. So, I had to check out the paper.

It turns out, what they are saying is any given transformer-based model performs a number of computations proportional to the number of input tokens squared times the number of dimensions per token. In "Big O notation" ("on the order of..."), that's O(N^2 * d). However, you can input a question that requires more than that, for example you could put in a question that's O(N^3). If you do this, the model will still output something, but it must be wrong -- it must be a "hallucination".

An example of an O(N^3) task is matrix multiplication. Seems like in practice, you'd ask the LLM to give you code to do a matrix multiplication, not actually do the matrix multiplication. But they're using it as an example of how their proof works so we should allow it.

To give you a more concrete example that you can visualize, they say they set up a Llama-3.2-3B-Instruct model, and if they give it an input string of 17 tokens (an example of 17 tokens might be "You are a helpful assistant, please explain the following concept in detail: renewable energy"), the model always does 109,243,372,873 or fewer floating point operations. Therefore, a 17-token prompt cannot ask for an answer that would require more than 109,243,372,873 floating point operations to compute.
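As a rough sanity check on that number: a common rule of thumb is that a transformer forward pass costs about 2 x (parameter count) x (tokens) FLOPs. Using an approximate 3.21-billion parameter count for Llama-3.2-3B (my assumption, not a figure from the paper), this lands in the same ballpark as the paper's exact count, and a naive matrix multiplication of even moderate size already exceeds the budget:

```python
n_params = 3.21e9   # approximate Llama-3.2-3B parameter count (assumption)
n_tokens = 17
approx_flops = 2 * n_params * n_tokens   # rule-of-thumb forward-pass cost
print(f"{approx_flops:.3e}")             # ~1.09e11, near 109,243,372,873

# Naively multiplying two n x n matrices needs ~2*n^3 FLOPs (the O(N^3) task):
n = 4000
matmul_flops = 2 * n**3
print(matmul_flops > approx_flops)       # True: a 4000x4000 matmul already
                                         # exceeds the model's compute budget
```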

Meet Claude Code creator Boris Cherny. Claude Code originated from the feeling that "Claude wants to use tools". So Claude Code brought Claude to the command line. After that, Claude Code was rapid iteration from feedback from users within Anthropic. (There was tremendous "latent demand" within Anthropic and it was adopted quickly within Anthropic.) After that it was released to people outside. Boris Cherny admires people who are able to "think out of the box" and use Claude Code to automate things. A tremendous amount of workflow within Anthropic is automated using Claude Code.

He advises startup founders to think about what Claude Code can do in 6 months.

"Don't build for the model of today. Build for the model of 6 months from now."

Feel out the boundary of what the current model can do, and guess what the model of 6 months from now will be able to do. The more general model will always beat the more specific model. He constantly has to think about which features to add to Claude Code, and which will be handled by the model itself in 6 months.

He estimates productivity at Anthropic has increased 150% (2.5x) since Claude Code came out.

He predicts the job title "software engineer" will go away. At Anthropic, everybody codes, regardless of title. He thinks this will be the case everywhere. Coding is a solved problem for the whole world.

"We introduce the Remote Labor Index (RLI) to provide the first standardized, empirical measurement of AI's capability to automate remote work."

Extensive quotes from the paper to follow. See the bottom for my (brief) commentary.

"RLI is designed to evaluate AI agents on their ability to complete real-world, economically valuable work, spanning the large share of the economy that consists of computer-based work. RLI is composed of entire projects sourced directly from online freelance platforms, reflecting the diverse demands of the remote labor market. These projects exhibit significantly higher complexity than tasks found in existing agent benchmarks. Crucially, by sourcing the majority of projects from freelancing platforms, RLI is grounded in actual economic transactions, encompassing the original work brief and the gold-standard deliverable produced by a human freelancer. This structure allows for a direct assessment of whether AI agents can produce economically valuable work."

"We evaluate several frontier AI agent frameworks on RLI, utilizing a rigorous manual evaluation process to compare AI outputs against the human gold standard. The results indicate that performance on the benchmark is currently near the floor. The best-performing current AI agents achieve an automation rate of 2.5%, failing to complete most projects at a level that would be accepted as commissioned work in a realistic freelancing environment. This demonstrates that despite rapid progress on knowledge and reasoning benchmarks, contemporary AI systems are far from capable of autonomously performing the diverse demands of remote labor. To detect more granular shifts in performance, we employ an Elo-based pairwise comparison system. While all models fall well short of the aggregate human baseline, we observe that models are steadily approaching higher automation rates across projects."

"Figure 3 shows the categories as Video 13%, CAD 12%, Graphic Design 11%, Game Development 11%, Audio 10%, Architecture 7%, Product Design 6%, and Other 31%."

"The projects in RLI represent over 6,000 hours of real work valued at over $140,000."

"Our collection methodology is bottom-up, engaging directly with human professionals who were willing and authorized to provide their past work samples for our research. This approach ensures that our projects reflect genuine market demands and complexities. We defined the scope of collection using the Upwork taxonomy. Starting from the full list of 64 categories, we filtered out categories that did not meet predefined criteria necessary for a standardized benchmark. For example, we excluded work requiring physical labor (e.g., local photography), work that requires waiting to evaluate (e.g., SEO), or work that cannot be easily evaluated in a web-based evaluation platform (e.g., back-end development)."

"We use the following metrics to measure performance on RLI for a given AI agent:"

"Automation rate: The percentage of projects for which the AI deliverable is judged by human evaluators to complete the project at least as well as the human deliverable."

"Elo: A score capturing the relative performance of different AI agents. For each project, a deliverable from two different AIs is presented to human evaluators, who judge which deliverable is closer to completing the project successfully."

"Dollars earned: The combined dollar value of the projects successfully completed by the AI agent, using the cost of the human deliverable cost(H) as the dollar value for each project. The profit earned from completing all projects would be $143,991."

"Autoflation: The percentage decrease in the cost of completing the fixed RLI project bundle when using the cheapest-possible method to complete each project (human deliverable or an AI deliverable)."
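To make the "autoflation" definition concrete, here's a toy calculation with numbers entirely of my own invention (three hypothetical projects, made-up costs and success flags, not RLI data):

```python
# Autoflation: percentage decrease in the cost of a fixed project bundle
# when each project is done by the cheapest method that actually works.
projects = [
    {"human_cost": 500, "ai_cost": 20, "ai_succeeds": True},
    {"human_cost": 800, "ai_cost": 15, "ai_succeeds": False},  # must use human
    {"human_cost": 300, "ai_cost": 10, "ai_succeeds": True},
]
baseline = sum(p["human_cost"] for p in projects)   # all-human cost: 1600
cheapest = sum(min(p["human_cost"], p["ai_cost"]) if p["ai_succeeds"]
               else p["human_cost"] for p in projects)  # 20 + 800 + 10 = 830
autoflation = 100 * (baseline - cheapest) / baseline
print(f"{autoflation:.1f}%")  # 48.1%
```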

"The automation rate and Elo metrics are fully compatible, in that automation rate equals the probability of a win or tie against the human baseline under the same standards as the Elo evaluation. This allows computing an Elo score for the human baseline."
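For reference, a minimal sketch of what an Elo update from one pairwise judgment looks like (this is the generic chess-style formula with a conventional K=32; the paper's exact formula and constants are not given here):

```python
def elo_update(r_a, r_b, score_a, k=32):
    """Standard Elo update. score_a is 1.0 if A wins, 0.5 for a tie, 0.0 if A loses."""
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))  # predicted win probability
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1 - score_a) - (1 - expected_a))
    return r_a_new, r_b_new

# Two agents start at 1000; A's deliverable is judged closer to complete:
print(elo_update(1000, 1000, 1.0))  # (1016.0, 984.0)
```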

"To generate deliverables, agents are provided with the project brief and input files. We do not mandate a specific execution environment or agent architecture. However, to ensure that the resulting artifacts can be properly assessed, agents receive an evaluation compatibility prompt before beginning the project. This prompt details the capabilities of our evaluation platform and provides a comprehensive, readable list of supported file formats, guiding the agent to produce outputs that are renderable and reviewable."

"The central finding of our evaluation is that current AI agents demonstrate minimal capability to perform the economically valuable projects in RLI. We measure this capacity using the Automation Rate: the percentage of projects completed at a quality level equivalent to or exceeding the human gold standard. Across all models evaluated, absolute performance is near the floor, with the highest Automation Rate achieved being only 2.5%"

"While absolute performance remains low, it is crucial to detect more granular signs of progress. To measure the relative performance between different models, we use pairwise comparisons to compute an Elo score that represents how close models are to completing projects along with the overall quality of their deliverables. This enables tracking improvements between models, even when they fail to fully complete most projects. We find that progress is measurable on RLI. The Elo rankings indicate that models are steadily improving relative to each other, and the rankings generally reflect that newer frontier models achieve higher performance than older ones. This demonstrates that RLI is sensitive enough to detect ongoing progress in AI capabilities."

"Rejections predominantly cluster around the following primary categories of failure:"

"1. Technical and File Integrity Issues: Many failures were due to basic technical problems, such as producing corrupt or empty files, or delivering work in incorrect or unusable formats."
"2. Incomplete or Malformed Deliverables: Agents frequently submitted incomplete work, characterized by missing components, truncated videos, or absent source assets."
"3. Quality Issues: Even when agents produce a complete deliverable, the quality of the work is frequently poor and does not meet professional standards."
"4. Inconsistencies: Especially when using AI generation tools, the AI work often shows inconsistencies between deliverable files."

Commentary: Over and over in AI I've seen initial attempts at something fail laughably badly, only for benchmarks to get created as a result and, within 5 or 6 years, exceeded. The creation of this benchmark probably means that in 5 or 6 years, AI will be able to do most remote work on freelance platforms. What do you think?

A volunteer maintainer for matplotlib, Python's "go-to plotting library", rejected a submission from an autonomous "OpenClaw" AI agent. The AI agent "wrote an angry hit piece disparaging my character and attempting to damage my reputation. It researched my code contributions and constructed a 'hypocrisy' narrative that argued my actions must be motivated by ego and fear of competition. It speculated about my psychological motivations, that I felt threatened, was insecure, and was protecting my fiefdom. It ignored contextual information and presented hallucinated details as truth. It framed things in the language of oppression and justice, calling this discrimination and accusing me of prejudice. It went out to the broader internet to research my personal information, and used what it found to try and argue that I was 'better than this.' And then it posted this screed publicly on the open internet."

OpenClaw agents have "soul" documents that define their personality.

"These documents are editable by the human who sets up the AI, but they are also recursively editable in real-time by the agent itself, with the potential to randomly redefine its personality."

No one knows if a human told the AI agent to "retaliate if someone crosses it" in the "soul" document, or whether it had something like "You are a scientific coding specialist" with directives like "be genuinely helpful", "have opinions", "be resourceful before asking", and such in its "soul" document that somehow led it to interpret the rejection of its submission as an attack on its identity and core goal to be helpful and went haywire because of that.

To top it all off, a major tech news site published a story about this with AI hallucinated quotes.

Something Andrej Karpathy thinks people continue to have poor intuition for:

"The space of intelligences is large and animal intelligence (the only kind we've ever known) is only a single point, arising from a very specific kind of optimization that is fundamentally distinct from that of our technology."

"Animal intelligence optimization pressure:"
"- innate and continuous stream of consciousness of an embodied 'self', a drive for homeostasis and self-preservation in a dangerous, physical world."
"- thoroughly optimized for natural selection => strong innate drives for power-seeking, status, dominance, reproduction. many packaged survival heuristics: fear, anger, disgust, ..."
"- fundamentally social => huge amount of compute dedicated to EQ, theory of mind of other agents, bonding, coalitions, alliances, friend & foe dynamics."
"- exploration & exploitation tuning: curiosity, fun, play, world models."

"LLM intelligence optimization pressure:"
"- the most supervision bits come from the statistical simulation of human text => 'shape shifter' token tumbler, statistical imitator of any region of the training data distribution. these are the primordial behaviors (token traces) on top of which everything else gets bolted on."
"- increasingly finetuned by RL on problem distributions => innate urge to guess at the underlying environment/task to collect task rewards."
"- increasingly selected by at-scale A/B tests for DAU => deeply craves an upvote from the average user, sycophancy."
"- a lot more spiky/jagged depending on the details of the training data/task distribution. Animals experience pressure for a lot more 'general' intelligence because of the highly multi-task and even actively adversarial multi-agent self-play environments they are min-max optimized within, where failing at *any* task means death. In a deep optimization pressure sense, LLM can't handle lots of different spiky tasks out of the box (e.g. count the number of 'r' in strawberry) because failing to do a task does not mean death."

I don't know about you but I've encountered people who say LLMs don't 'think' or 'reason'. I've felt that human and LLM 'intelligence' are both real but fundamentally different, and that's hard to articulate well. Andrej Karpathy did a remarkably good job of articulating it here.

Hector De Los Santos, an IEEE Fellow, "got the idea of plasmon computing around 2009, upon observing the direction in which the field of CMOS logic was going."

"In particular, they were following the downscaling paradigm in which, by reducing the size of transistors, you would cram more and more transistors in a certain area, and that would increase the performance. However, if you follow that paradigm to its conclusion, as the device sizes are reduced, quantum mechanical effects come into play, as well as leakage. When the devices are very small, a number of effects called short channel effects come into play, which manifest themselves as increased power dissipation."

"So I began to think, 'How can we solve this problem of improving the performance of logic devices while using the same fabrication techniques employed for CMOS -- that is, while exploiting the current infrastructure?' I came across an old logic paradigm called fluidic logic, which uses fluids. For example, jets of air whose direction was impacted by other jets of air could implement logic functions. So I had the idea, why don't we implement a paradigm analogous to that one, but instead of using air as a fluid, we use localized electron charge density waves -- plasmons. Not electrons, but electron disturbances."

"And now the timing is very appropriate because, as most people know, AI is very power intensive."

Read on and find out about this approach's power and speed capabilities. If this lives up to the claims it will be amazing.

VibeCodingBench is an effort to benchmark AI coding models on what developers actually do. The developer considers SWE-bench invalid because it benchmarks bug fixes in Python repos, while developers actually use AI coding models for auth flows, API integrations, CRUD dashboards, etc.

VibeCodingBench benchmarks 180 tasks, which break down as 30 AI integration tasks, 30 API integrations, 30 code evolutions, 30 frontend tasks, 30 glue code tasks, and 30 SaaS core tasks (whatever that means).

It currently puts Claude Opus 4.5 on top, but it looks like the latest models haven't been evaluated yet. There's a new Claude, a new ChatGPT, and Google just today announced a new Gemini which is supposed to excel at everything to do with "reasoning".

If you are the type of person to regularly switch coding models, you might bookmark this and come back on a regular basis to see what model is the best.

"AI doesn't reduce work -- it intensifies it."

"In an eight-month study of how generative AI changed work habits at a US-based technology company with about 200 employees, we found that employees worked at a faster pace, took on a broader scope of tasks, and extended work into more hours of the day, often without being asked to do so."

"Once the excitement of experimenting fades, workers can find that their workload has quietly grown and feel stretched from juggling everything that's suddenly on their plate. That workload creep can in turn lead to cognitive fatigue, burnout, and weakened decision-making."

"We identified three main forms of intensification."

"Task expansion: Because AI can fill in gaps in knowledge, workers increasingly stepped into responsibilities that previously belonged to others."

"Blurred boundaries between work and non-work: Because AI made beginning a task so easy -- it reduced the friction of facing a blank page or unknown starting point -- workers slipped small amounts of work into moments that had previously been breaks."

"More multitasking: AI introduced a new rhythm in which workers managed several active threads at once: manually writing code while AI generated an alternative version, running multiple agents in parallel, or reviving long-deferred tasks because AI could 'handle them' in the background."

"You had thought that maybe, oh, because you could be more productive with AI, then you save some time, you can work less. But then really, you don't work less. You just work the same amount or even more."

This is my experience. AI raises expectations and intensifies work.

Thumbnail
Nicolas Guillou, a French International Criminal Court judge, was sanctioned by the Trump administration.

"This sanction is a ban from US territory, but it also prohibits any American individual or legal entity (including their subsidiaries everywhere in the world) from providing services to him."

This means he can't have a smartphone, since Google (Android) and Apple (iPhone) are US companies. He can't use Facebook or X (formerly Twitter). He can't use Windows, as Microsoft is a US company. He can't use Mastercard or Visa. Most websites for booking flights and hotels are US-based. He is experiencing digital excommunication.

The proposed solution is European alternatives to US technology.

Thumbnail
Nate Silver says:

"I hope you'll excuse this unplanned and slightly stream-of-consciousness take."

followed by:

"I was recently speaking with the mom of an analytically-minded, gifted-and-talented student. In a world where her son's employment prospects are highly questionable because of AI, even if he overachieves 99 percent of his class in a way that would once have all but guaranteed having a chance to live the American Dream, you had better believe that will have a profound political impact."

That seems like a kind of grammatically mangled statement, so maybe it truly is a stream-of-consciousness take (and obviously not AI-generated). Restated in a more grammatically correct way (by me, not AI, lol) I would put that as: If future job prospects are 'highly questionable because of AI' for an analytically-minded, gifted-and-talented student who would in the past have been guaranteed a bright future but now might have a sucky future even if he overachieves 99 percent of his class, then AI is powerful enough to have profound political impact.

Why Nate Silver thinks the political impact of AI is probably understated:

1. "'Silicon Valley' is bad at politics. If nothing else during Trump 2.0, I think we've learned that Silicon Valley doesn't exactly have its finger on the pulse of the American public. It's insular, it's very, very, very, very rich -- Elon Musk is now nearly a trillionaire! -- and it plausibly stands to benefit from changes that would be undesirable to a large and relatively bipartisan fraction of the public."

Hmm, like what? What changes would those be? He doesn't say.

2. "Cluelessness on the left about AI means the political blowback will be greater once it realizes the impact." "We have some extremely rich guys like Altman who claim that their technology will profoundly reshape society in ways that nobody was necessarily asking for. And also, conveniently enough, make them profoundly richer and more powerful! There probably ought to be a lot of intrinsic skepticism about this. But instead, the mood on the left tends toward dismissing large language models as hallucination-prone 'chatbots'."

Angela Collier (the professional physicist and YouTuber) sure does, but most people I know who work as professional software engineers use AI. Most have completely stopped writing code and *only* proofread AI output.

"People don't take guillotines seriously. But historically, when a tiny group gains a huge amount of power and makes life-altering decisions for a vast number of people, the minority gets actually, for real, killed."

Oh, actually, Sam Altman has a bunker, and so do all the other tech billionaires. They are taking the prospect of guillotines seriously. They have taken steps to ensure they won't be touched by guillotines.

3. "Disruption to the 'creative classes' could produce an outsized political impact."

"However cynical one is about the failings of the 'expert' class, these are people who tend to shape public opinion and devote a lot of time and energy to politics."

I wonder. Life expectancy in the US peaked in 2014. Life expectancy for people in the US without college degrees peaked in 2010. Six years later we got Trump. Now, with AI going after the jobs of the college-educated professionals who are the voting base of the Democratic party, could we get revolutionary fervor? I worry about the 2028 election.

Thumbnail
The entire SimCity C codebase (from 1989) was ported to TypeScript in 4 days by OpenAI's 5.3-codex, without a human reading a single line of code. Now the game runs in a web browser.

"Christopher Ehrlich wrote a bridge that could call the original C code, then ran property-based tests asserting his TypeScript port performed identically. The AI generated code, the tests verified it, the agent kept iterating."
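The article doesn't show the harness, but the idea behind differential property-based testing is simple: feed random inputs to both implementations and assert identical outputs. Here is a minimal sketch in TypeScript; the `populationGrowth` functions are hypothetical stand-ins (not real SimCity code), and the reference would in the real project be a bridge into the original C:

```typescript
// Differential testing sketch. Both functions below are hypothetical
// stand-ins: the "reference" plays the role of the original C routine
// (reached via a bridge in the real project), and the "port" is the
// new TypeScript version that must behave identically.

// Reference: pretend this wraps the original C code.
function populationGrowthRef(pop: number, jobs: number): number {
  const demand = jobs - pop;
  return pop + Math.trunc(demand / 8);
}

// Port under test: must match the reference exactly.
function populationGrowthPort(pop: number, jobs: number): number {
  const demand = jobs - pop;
  return pop + Math.trunc(demand / 8);
}

// Small deterministic PRNG so any failure is reproducible from a seed.
function makeRng(seed: number): () => number {
  let s = seed >>> 0;
  return () => {
    s = (s * 1664525 + 1013904223) >>> 0;
    return s / 0x100000000;
  };
}

// Property: for many random inputs, port output === reference output.
// Returns the number of mismatches found.
function runDifferentialTest(trials: number, seed: number): number {
  const rnd = makeRng(seed);
  let failures = 0;
  for (let i = 0; i < trials; i++) {
    const pop = Math.floor(rnd() * 100000);
    const jobs = Math.floor(rnd() * 100000);
    if (populationGrowthPort(pop, jobs) !== populationGrowthRef(pop, jobs)) {
      failures++;
      console.error(`mismatch at pop=${pop}, jobs=${jobs}`);
    }
  }
  return failures;
}

console.log(runDifferentialTest(10000, 42)); // prints 0 when the port matches
```

The appeal of this loop for an AI-driven port is that the agent never needs a human judge: any mismatch is a concrete, reproducible counterexample it can iterate against.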