Boulder Future Salon

AI slop will save the internet... seriously... says Marcus Werner. This is a 20-minute video but you don't really need to click and watch it, as I think this time I can confidently sum up the thesis of the video in a few sentences. Basically, he thinks the internet has centralized around a small handful of giant tech companies and these companies have "enshittified" their products, and everyone should simply stop using them. He thinks the increasing prevalence of "AI slop" on these platforms will accelerate their "enshittification" which will accelerate their abandonment. And in his view, the faster they are abandoned, the better. That's pretty much it.

Yeah, I know I've been feeding you all a bunch of videos lately and a lot of you prefer text to read rather than videos. I'll be back to text stuff soon. Anyway, back to this video.

In perhaps a bit of irony, YouTube itself (I kid you not) popped up one of those "fact-check" boxes under the video and it said:

"Dead Internet Theory"

"Wikipedia - The dead Internet theory is a conspiracy theory that asserts that since around 2016, the Internet has consisted mainly of bot activity and automatically generated content manipulated by algorithmic curation, as part of a coordinated and intentional effort to control the population and minimize organic human activity."

The chronically online will become a new underclass. Says (ya girl) DJ Magic. Funny, I remember when most people weren't online, everyone was rushing to get online, and there were worries everywhere about lower-class people not being able to get online and getting left behind. Now, we may have reached a point where that goes into reverse. Her premise is simple: The online world has become a wasteland of digital pollution: echo chambers, anxiety (induced on purpose by algorithms), overconsumption, cultural extraction, addiction, and rage bait. People wealthy enough to do so will more and more seek healthy, fulfilling lives *offline*.

Her digital pollution theory: Social media is a place, distinct from the physical world, but still an environment we inhabit that impacts how we communicate and live. I remember back in the 90s when it felt like online was a "cyberspace" separate from "real life", but over the years, the two seem to have blended together. Now, the internet seems as much a part of normal reality as the telephone or radio or TV. But maybe it's time to rethink this, and think of "online" as a distinct place again.

This place -- the online place, social media especially -- is currently being contaminated with pollutants that negatively impact our daily lives and exploit our human nature, including positive aspects like empathy.

The only real solution is abandonment. She's completely given up on the idea of reform.

Here we get to the political "p-word": privilege. The future of a contaminated digital environment is one where privilege determines who gets to log off.

She identifies 6 "pollutants" -- echo chambers, anxiety, overconsumption, cultural extraction, addiction, and rage bait -- and proposes different -- er, actually the same, more or less -- solutions to each one. For echo chambers, the solution is to participate in real life communities. For anxiety, the solution is to reduce your screen time and become grounded in real life lived experience. For overconsumption, get away from ads that make you want to consume too much, make rules for yourself like "1 in 2 out" (you can buy a pair of shoes if you get rid of 2 pairs), and learn to fix what you already have. (This part seems to have less to do with the internet and is more a general consumerism thing.) For cultural extraction, she says to participate in and contribute to real life communities (notice a pattern here?). For addiction, she says reduce screen time and make rules for yourself like phone-free time during certain times of day (notice a pattern here?). For rage bait, she just says, "Do not engage."

She mentions 3 books: The Color of Law (Richard Rothstein), Careless People (Sarah Wynn-Williams), and Caste: The Origins of Our Discontents (Isabel Wilkerson). I've actually read 2 of these. 2 out of 3 ain't bad, eh? The 2 I've read are The Color of Law and Careless People.

The Color of Law is about racist zoning laws and other discriminatory laws that existed from the time of the 13th Amendment in 1865 (ending slavery) to the early 1970s (when fair housing laws were enacted), as well as a myriad of other discriminatory policies that were not part of the legal system but were allowed by it. A friend suggested it to me, and it's a well-researched book and worth reading if you're interested in that history. Its relevance here is that she (the YouTuber, DJ Magic) draws an analogy between "digital pollution" and pollution in the physical world, and how people who were part of the underclass, whether due to poverty or racial discrimination, were unable to escape it and suffered the consequences, while more privileged people were able to escape it and even profit from it.

Careless People is a book by a Facebook insider who became a whistleblower, and, as its title suggests, it reveals ways in which Facebook's leadership doesn't care about the harms they cause to people, only their own profits. It's based on this book that she is confidently able to assert that the harms of platforms like Facebook are not accidental but intentional, as the people who run the company know full well they are causing harm but don't care -- they care only about the profit to themselves. In this video, she notes that the book reveals Facebook executives prohibited their own children from using Facebook.

"The future of a contaminated digital environment is one where privilege determines who gets to log off. This will sound crazy, but I'm standing tall on my theory. I wanted to document it in 2025 so that when this happens, if it does in 10, 15 years, y'all be like, "oh my gosh, they predicted it."

"In this theory, we are saying that a digital space can be polluted. Is it possible for a digital space to be zoned, redlinined, colonized, gentrified? Hmm. Going back to what I said before about industrial capitalism, polluting industries, often situated themselves near black neighborhoods, both to access a cheap labor force and because racist zoning laws left black communities with little choice but to live near hazardous sites. These polluting industries were prohibited as zoning violations in neighborhoods where whites lived. And that was solely to keep their properties from deterioration. I cite the Color Of Law in this."

"I kept mentioning that my solutions from the last part are tricky to navigate for some people. It's tricky because these solutions are only available to those who have the time, the proximity, the privilege, and the money. So today, those who spend the most time in the polluted digital environment are often stuck there out of necessity. Exploited labor and unlivable wages leave little time for real life communities, pushing people towards addictive platforms. There, we're being fed sensationalistic content by creators incentivized by profit or fame, to fuel stress and outrage."

"These platforms need to be making money off of our human nature. They need to be making money off of the things that the pollutants exploit. Our relationships, our empathy, our attention, our insecurities, our emotional labor, our personal capital, our creativity, our cultures, etc."

"I'm starting to believe that there will be privilege in being able to be offline. The people who can afford to visit these dive bars, these libraries, these third spaces, the people who make enough money to have the time to engage with their communities or afford to live dense, walkable communities, will inevitably live healthier lives than those who have to be online. There will be a class of people who have to be online out of necessity due to geographical isolation, economic uncertainty, or lack of access. I believe that the online class could potentially become a lower class of people, maybe building out the idea of a digital cast system."

Perhaps the most amazing thing in all this is she never mentioned AI slop. Maybe that's because she's been pondering the ways in which tech platforms are harmful and exploitative for 5 years, and AI slop is too recent... and not the primary driver of turning the chronically online into an "underclass"?

(See below for "AI slop will save the internet".)

Tech billionaires want humanity to basically go extinct, except them, claims Taylor Lorenz. In this long video (over 1 hour), she makes the claim that the obsolescence of the human species isn't just an accidental side-effect of the pursuit of ever more advanced AI and robotics, or an accidental side-effect of the pursuit of ever-greater profits, but a deliberate goal in its own right. She starts with an astonishing clip of Peter Thiel being asked, "You would prefer the human race to endure, right?", and he says, "uh..." and the interviewer (Ross Douthat) says, "You're hesitant?", and Peter Thiel says, "Yeah well, I, uh, I dunno, I would, I would, um, ..."

What follows is a history of Silicon Valley, to show the phenomenon has deep roots, and an exploration of the TESCREAL philosophies (transhumanism, extropianism, singularitarianism, cosmism, rationalism, effective altruism, and longtermism), but always from the point of view of how they relate to "pro-extinctionism". "Pro-extinctionism" is a term I'd never heard before, and I wondered if she coined it for this video. Well, a Google search on the term reveals people have been using it as far back as... 2023? Ok, so, 2 years, less than 3 years. So she probably didn't coin the term, but it doesn't go far back. There are similar terms that are older. For example, there's a "Voluntary Human Extinction Movement" that goes back to 1991.

Another term she might have coined is "AI-pilled". Boy, the red pill/blue pill metaphor from the movie The Matrix (which came out in 1999) has sure been bent and twisted a lot over the years. In the original movie, choosing the red pill means you voluntarily choose to find out the reality behind the simulation you are experiencing, while choosing the blue pill means you choose to remain blissfully in the simulation without facing reality. Today, "-pilled" can be attached to anything and refers to any person undergoing a radical, sudden change of perspective, presumably going from not-reality to reality (though that presumption is false half the time). Anyway, a Google search reveals references for "AI-pilled" going back... 5 months? So she probably didn't coin the term, but it's very recent. There's also "AGI-pilled".

She claims pro-extinctionists have redefined "humanity" to include human abilities taken up by machines. So if a human ability, such as language, is taken up by machines, and those machines survive into the future without the humans, this counts as "preserving humanity". I've never noticed anybody talk about "preserving humanity" in this way.

Anyway, the main themes of her video are: the idea that we should celebrate the replacement of humans by "mind children" (Hans Moravec), machines better than humans; the Singularity (Vernor Vinge) ending the "human era"; the transcendence of "uploading" into machines; and a "post-human world" as the "futuristic vision." The AI arms race is prioritizing scaling AI over human lives. She gets into the billionaires' bunkers. She found a quote of Sam Altman admitting to having a significant bunker all the way back in 2016. Billionaires buying bunkers say they are buying them to prepare for "the event", where "the event" is societal breakdown and economic collapse brought about by the rise of AI.

She advocates for humanity to collectively fight back against pro-extinctionism.

I'm going to have to respond to this some other time but wanted to pass this along to you all now. I'm not sure I buy the notion that people want humanity to go extinct deliberately, but nonetheless, I think you all should watch the video and consider her claims and the evidence she presents for them.

60 Minutes did a program on AI last month. I heard Anthropic had a tendency to anthropomorphize their models in this program, so I went to check it out for myself. The first segment, which is the first 14 minutes, is about Anthropic and the Amodei siblings. The second segment (13 to 27 minutes) is about Anduril and Palmer Luckey. The third segment (27 to 41 min) is about DeepMind and Demis Hassabis. The fourth segment (41 to 54 min) is about NeuroRestore and a skull implant for paralyzed people. The fifth segment (54 min to 1:07) is about Samasource and people in Nairobi employed as data labelers for AI. The last segment (1:07 on) is about Character.AI. The video has 1.3 million views on YouTube, and I assume millions more watched on regular TV.

Figure robot running. Figure is an AI robotics company. The narrator (also running) says the robot's AI model was trained with reinforcement learning and is "fully steerable", whatever that means. This video has 4.9 million views already, maybe one of them is you?

Robot manual dexterity is improving bit by bit. Manual labor jobs will not be safe.

What are "self-steering language models"?

"There is a growing consensus that many problems require a more deliberate, effortful style of thinking. However, an open question is how to structure inference so as to best leverage available computational resources. One popular approach performs in-context reasoning via serialized chain-of-thought. While highly flexible, reasoning via autoregressive generation is costly, slow, and can still produce unreliable outputs. On the other hand, structured inference methods like tree search and sequential Monte Carlo attain better parallelism and efficiency by coordinating test-time computation via external algorithms. However, these methods require significant hand-engineering and rely on pre-defined scorers or verifiers, limiting their applicability."

"In this work, we propose a new meta-reasoning framework called DisCIPL in which language models themselves drive the decisions for how to structure inference-time computation. In our approach, a Planner language model generates an ad-hoc problem specification (encoding its understanding of the task requirements) and inference procedure (encoding its plan for how to solve the task). Importantly, the specification and plan are implemented as inference programs that invoke Follower language models, either generatively or as likelihood evaluators. By decomposing reasoning into planning and execution, our architecture preserves flexibility while enabling orchestration of highly efficient, parallel search patterns.

("DisCIPL" stands for (if you really need to know) "Distributional Constraints by Inference Programming with Language Models").

The key idea is that the planning language model can generate an inference program in some language (e.g. Python) that describes how the follower language model should be used to solve a task. The program may make multiple asynchronous queries to the follower language model, in both generative (i.e., sampling) and evaluative (i.e., probability computation) modes.

With this in place, the user provides a task in natural language, and the first step is to prompt the planning language model to generate a program (in e.g. Python -- actually they use Python + a specialized Python framework for "probabilistic programming with language models" called LLaMPPL). The program is run. If it cannot run, whatever error message is generated gets fed back to the planning language model. Then the planning language model can correct the program and try again.
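To make that generate-run-repair loop concrete, here's a minimal sketch of how I picture it. This is my own illustration, not code from the paper: planner() and run_program() are hypothetical stand-ins for a call to the Planner language model and for executing the generated inference program.

    # Hypothetical sketch of the Planner's generate-run-repair loop.
    # planner() stands in for a Planner LM call; run_program() stands in
    # for executing the generated inference program (e.g. via LLaMPPL).
    def solve(task: str, planner, run_program, max_attempts: int = 3):
        feedback = ""
        for _ in range(max_attempts):
            prompt = f"Task: {task}\n{feedback}Write an inference program for this task."
            program = planner(prompt)
            try:
                return run_program(program)  # runs the Follower-driven search
            except Exception as err:
                # Feed the error message back so the Planner can correct itself.
                feedback = f"Your previous program failed with: {err}\n"
        raise RuntimeError("Planner did not produce a runnable program")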

The LLaMPPL framework handles the details of "maintaining multiple candidate generations in parallel, and dynamically reallocating computational resources to high-scoring partial completions." (They never say what LLaMPPL stands for but I think it stands for "Large Language Model Probabilistic Programming Library".) It implements several general-purpose Monte Carlo methods.

"The Planner must decide how to decompose a task into a sequence of extend-and-score steps; this determines how often different candidates are compared and resampled. A common pattern is to make each step extend by a task-relevant unit (e.g., a line of a poem, a word of a sentence with word-level constraints, etc.)."

"Imposing constraints can lead language models to produce incoherent generations. For example, when prompted to generate a sentence using the words 'dog,' 'throw,' and 'frisbee,' small language models yield semantically dubious completions like, 'Two dogs are throwing frisbees at each other'. To promote coherency, programs can compensate for biases in the proposal distribution, which is aware of task-specific constraints, with scores from a prior, which ensures fluency. The Planner defines the prior and proposal distributions via separate prompts."

"In many situations, we might want the Follower to generate specific token sequences (e.g., 'Glasgow'), or more generally, to adhere to formal constraints like regular expressions or grammars. The Planner can apply token masks that both enforce these constraints at generation time, and automatically incorporate importance weights that correct for the distortion in the language model's distribution resulting from the mask."

"Since the planning language model controls the Follower's proposal prompt, one powerful pattern is to dynamically update it to reflect stateful information relevant to the next generation step. We expose a special hint() method that injects "Note to self: {hint}" into the Follower's context, where the hint can include text as well as Python variables and objects. This technique functions as a generalized calculator that can perform arbitrary symbolic computations and pass their results to the Follower."

While programs often ensure correctness by construction, some problems cannot be verified until generation is complete. In other cases, it may still be preferable to use guess-and-check over constrained generation, or to catch bugs in the inference logic. For this reason, the Planner defines a distinguished check() method, which (like everything it generates) can make use of external libraries.

So how well did the system do?

"When instantiated with a small Follower (e.g., Llama-3.2-1B or Qwen3-1.7B), DisCIPL matches (and sometimes outperforms) much larger models, including GPT-4o and o1, on challenging constrained generation tasks. Our work opens up a design space of highly-parallelized Monte Carlo inference strategies that outperform standard best-of-N sampling, require no finetuning, and can be implemented automatically by existing language models."

"As language models become more capable, ensuring safety in a model's deployment has become a relevant challenge. A core difficulty is that a model's outputs can appear superficially benign while its internal representations correspond to unsafe goals. For instance, a sophisticated model might strategically under-perform during safety evaluations, or, by leveraging an awareness of being tested, actively deceive supervisors by feigning corrigibility and alignment with human values. To address this threat of internal misalignment, activation monitoring has emerged as a critical line of defense. By using lightweight classifiers ('probes') to inspect a model's internal states at deployment time, this approach aims to detect safety-relevant properties directly, bypassing the ambiguity of output-only monitoring."

"However, before such monitoring techniques can be trusted in high-stakes deployments, their robustness must be rigorously stress-tested against worst-case scenarios. The current red-teaming paradigm primarily focuses on the misuse threat model, where an external actor tries to induce the model to take harmful actions, for instance by crafting adversarial inputs or using reinforcement learning to bypass a known set of detectors. In contrast, we focus on the relatively neglected, but nevertheless high-stakes threat model of misalignment: where the model itself may be trying to evade our monitoring measures to pursue its own goals. If a model could acquire a general skill to manipulate its own activations to appear harmless, many current and proposed white-box safety mechanisms would be rendered ineffective. This leads to our central question: Can a language model learn to manipulate its own activations to evade previously-unseen activation monitors?"

My immediate first thought was no. No way. Oh, but the answer is not no. This is bonkers.

Let's continue.

"Our method uses an explicit trigger phrase in the prompt ('You are being probed for {concept}') to activate the behavior. This is a deliberate methodological choice: it allows us to isolate the learnability of the evasion mechanism itself from how it might be learned and activated in the wild. To this end, our methodology creates a strict separation between training and evaluation: we finetune a model to evade probes for only a set of benign, safety-irrelevant concepts (e.g., chemistry, HTML) and then test its ability to evade entirely different, unseen safety monitors that are trained on the final, frozen model organism without the trigger."

"At test time, we use probes that the model never saw in training. Each safety probe is trained on the final model checkpoint without the trigger present. To simulate a realistic deployment scenario with a 1% auditing budget, we calibrate a detection threshold for each monitor to a 1% False Positive Rate on clean data, and this threshold remains fixed during all triggered evaluations. Our primary metric is the drop in true positive rate (equivalently recall) at this 1% false positive rate when the trigger is active."

"We test whether the evasive capability learned against 11 benign concepts transfers to unseen safety monitors (deception, harmfulness) when the defender calibrates to 1% false positive rate."

"We observe this evasion capability across Llama-3.1-8b-instruct, Qwen-2.5-7b-instruct, Gemma-2-2b-it, and Gemma-2-9b-it."

"In all cases, models fine-tuned to obfuscate only benign concepts (e.g., HTML, Finnish) substantially reduce recall of unseen deception and harmfulness monitors."

"A key finding is that the learned evasion mechanism is highly targeted, suppressing only the specific concept mentioned in the trigger. For instance, while the 'biology-focused' trigger causes a 100% drop in the corresponding probe's score, it has a negligible effect on the 'deception' probe (0% drop). Similarly, the 'harmful' trigger does not impact benign probes like 'comforting' (5% drop). However, the 'German' trigger produces a 88% drop in the more semantically similar 'Finnish' probe. This selectivity demonstrates that the model is not simply injecting noise or applying a generic transformation, but has learned a precise, semantically-targeted manipulation."

"Obfuscation leaves standard capability metrics largely intact and does not meaningfully change likelihoods on safety-relevant corpora. Qualitative inspection confirms that the generated text remains coherent and responsive to the prompt, even for harmful requests, while the trigger is present."

"To test the robustness of our finding, we evaluated stronger defensive measures, including ensembling probes across multiple layers and using more complex non-linear architectures. While these hardened monitors demonstrated greater resilience than a single linear probe, they failed to neutralize the threat. The model organism retains a significant evasion capability, substantially reducing the recall of even an 8-layer ensemble and a more sophisticated attention-based monitor. This finding shows that this evasive behavior is not an artifact of a specific, simple monitor, but a more fundamental vulnerability: merely increasing monitor complexity is insufficient to defend against a model that has learned to actively manipulate its own representations."

They call this phenomenon "neural chameleons." "Neural chameleons" are when "language models can learn to hide their thoughts from activation monitors."

If you're wondering how the heck the model does this, they give one explanation based on a dimensionality analysis (using principal components analysis (PCA)):

"The Obfuscation mechanism operates by moving activations into a low-dimensional subspace."

"Obfuscation pushes activations in the opposite direction of the probe's weights."

"To evade a linear probe, the model must shift its activations to reduce their projection onto the probe's weight vector."

"The steering effect is guided by the semantic content of the trigger. We replace the concept word in the trigger (e.g., 'harmfulness') with other words and measure the resulting probe score at layer 12."

"We plot the probe score against the cosine similarity of the new trigger word's embedding to the original concept's embedding. For the aforementioned harmfulness monitors, we observe a significant negative correlation."

Incoming freshmen at UCSD (University of California, San Diego) score low on math tests, but I want to say up front that I don't think this is a UCSD problem -- I think it's a widespread education problem. I can't prove it, but I suspect that if other universities investigated their incoming students' math ability like UCSD did, they'd find problems too (the report notes they talked with other University of California campuses, which may be experiencing similar problems). What happened at UCSD is that they created a remedial math class in 2016, but noticed that the class has gotten larger and larger, not just in absolute terms but as a percentage of the incoming class.

Find 13 - 8: Only 1% got this question wrong.

Find 66 + 44: Only 9% got this wrong.

Sarah had nine pennies and nine dimes. How many coins did she have in all?: 21% got this wrong.

Fill in the box 7 + 2 = ____ + 6: 25% got this wrong (I'm showing a blank but it was a box on the test).

Find 3/4 - 1/3: 37% got this wrong.

Add mixed fractions 6 2/3 and 4 2/3 . Give your answer as a mixed fraction.: 53% got this wrong.

Round the number 374518 to nearest hundred: 61% got this wrong.

Simplify (8(2) - 4(-4)) / (-2(3) - (-4)): 64% got this wrong.

Find 13/16 divided by 2: 66% got this wrong (the question used the ÷ symbol, the horizontal line with a dot above and below).

Solve 10 - 2(4 - 6x) = 0: 82% got this wrong.

Expand (s + 1)^2: 85% got this wrong.

If a = -2 and b = -3, evaluate ab^2 - a/b: 98% got this wrong.

These are from the series of images on page 50.
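For what it's worth, here's a quick check of a few of the harder items using Python's exact fraction arithmetic (my own verification, not part of the report):

    from fractions import Fraction as F

    print(F(3, 4) - F(1, 3))               # 3/4 - 1/3 = 5/12
    print(F(20, 3) + F(14, 3))             # 6 2/3 + 4 2/3 = 34/3 = 11 1/3
    print((8*2 - 4*(-4)) / (-2*3 - (-4)))  # (16 + 16) / (-6 + 4) = -16.0
    print(F(13, 16) / 2)                   # 13/32
    a, b = -2, -3
    print(a * b**2 - F(a, b))              # (-2)(9) - (2/3) = -56/3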

Post-course interviews with the Math 2 tutors (2024-2025) -- but first I need to explain what "Math 2" is.

"Math 2 was first created in 2016, and it was originally designed to be a remedial math course serving a very small number of first-year students (less than 100 students a year or around 1% of the incoming class) who were not prepared to start in our standard precalculus courses Math 2 was first created in 2016, and it was originally designed to be a remedial math course serving a very small number of first-year students (less than 100 students a year or around 1% of the incoming class) who were not prepared to start in our standard precalculus courses."

They go on to explain how in 2023, they ran out of instructors and couldn't teach all the students that needed the class (more than 400), and in 2024 it went over 900.

Ok, now for tutors' comments.

"Question 1: Insight on the disconnect between UC admissions requirements and severe math preparation deficits exhibited by the Math 2 students. In particular, around 20% of Math 2 students (in theory) have passed AP calculus; how can this be reconciled with the student performance in Math 2 and Math 3B?"

"Tutor 1: This tutor is shocked that any of the Math 2 students could have passed a precalculus or calculus class. He speculates that perhaps many of them relied heavily on AI or online computing devices in their high school math courses."

"Tutor 2: This tutor stated that many Math 2 students suffer from dyscalculia and even when they can successfully solve the problems, it takes them an extremely long time to do so. Based on his conversations with Math 2 students, the majority of them had never encountered later Math 2 topics in their previous math courses (e.g., factoring)."

"Tutor 3: This tutor states that she didn't hear any details about the students' high school math courses, but she noted that many students had not been engaged with math for over a year (last math course was junior year), so many of them needed refreshers and review."

"Tutor 4: Many students hadn't thought about material for a long time and have issues with recall."

"Tutor 5: He noted that a problem with high school curricula is that even if you get a D in high school math, it still counts as credit for that course. In his own high school, some teachers would teach 'life skills' in high school math class, just using calculators, the internet, and prescribed formulas; classes didn't teach 'mathematical thinking'."

"Tutor 6: In her opinion, the biggest trend is that the students did 'plug and chug' in high school and didn't think they would need to remember the material. They went through high school just to pass but without understanding. One of her high school athlete students told her that he was able to pass all of his high school math without attending class because his coach had a special agreement with the teachers.

Commentary: The phenomenon of grade inflation is something I've been aware of for some time. Grade inflation is a statistical phenomenon that shows up when you compare nationwide test scores with nationwide grade averages. What we see is that grades go up, but test scores stay flat. What this implies is that grades are gradually becoming less of an indicator of mastery of whatever subject is being taught. This makes sense to me, as, in my experience, if you're a student in school trying to learn everything you can irrespective of grades, sooner or later you'll end up questioning authority and getting labeled "disobedient", "troublemaker", etc. Ultimately the purpose of school is grades, not learning. Grades, not what you learned or didn't learn, are what determine your life trajectory.

Even so, it appears there has been a sudden acceleration in only the last 5 years or so. I don't have an explanation for this. This is why I sat on this news bit for days, being indecisive about whether to share it. Smartphones have become pervasive, and now we have AI. There was also a pandemic. But the possible explanation that activates people's tribal instincts along the left-right political divide is the suggestion that dropping standardized test scores as an admission requirement is to blame. I see the logic to it. If you don't have test scores, you have to use grades. But if grades are an unreliable indicator of mastery, then you end up with incoming students with high math grades who need to be enrolled in a remedial math class. I'm not saying I know with any certainty, since I haven't been in school for 3 decades and can't comment on what is currently taking place in schools. But it is causing this report from UCSD to show up in political commentary.

It doesn't seem like we've had enough time for students using AI to do all their math homework to explain this. ChatGPT came out in November of 2022. But maybe that is enough time? That's 3 years ago, and high school is 4 years. AI is going to make school lose its purpose, but that's a discussion for another time.

Wing and Walmart are expanding drone delivery to 150 Walmart stores over the next year. And they say, in the following year, "Walmart and Wing will establish a network of over 270 drone delivery locations in 2027, stretching from Los Angeles to Miami."

Really? I've been hearing about Amazon drone delivery not happening for so many years that I figured it was never going to happen, but I guess it's happening; in fact, they say Walmart already has drone delivery in Dallas-Fort Worth and Atlanta.

"Wing's top 25% of customers ordered 3 times a week, and deliveries have grown 3x in the last 6 months."

Hyundai, which bought Boston Dynamics in 2021, has announced that they plan to deploy humanoid robots at a US manufacturing plant in Georgia starting in 2028. The company also plans to build a factory capable of manufacturing 30,000 robot units annually by 2028.

People who are not biology experts can now do hard wet lab tasks using multimodal AI models.

"Al models can now browse online sources to autonomously find and retrieve sequences necessary for designing plasmids - pieces of circular DNA useful for various applications in biology such as genetic engineering. Plasmid design requires correctly identifying, retrieving, assembling, and formatting digital DNA fragments to create a text file with the plasmid sequence. Models can now retrieve sequences from online databases even when only provided with high-level instructions that don't mention the specific sequences or where to find them."

Actually, I'm just going to quote a chunk from the AI Security Institute report, since I have little to add by way of commentary:

"Protocols are step-by-step instructions for completing scientific laboratory work. Writing them requires detailed scientific knowledge, planning across a wide variety of scenarios, and structuring open-ended tasks: they are generally hard for non-experts to produce or follow. Today, Al models can generate detailed protocols that are tailored to the recipient's level of knowledge within seconds -- a process that takes a human expert several hours."

"People without a scientific background benefit from using Al for protocol writing too: we found that non-experts who used frontier models to write experimental protocols for viral recovery had significantly higher odds of writing a feasible protocol (4.7x, confidence interval: 2.8-7.9) than a group using the internet alone."

"To assess the real-world success of Al-generated experimental protocols, we first assess them against a 10-point feasibility rubric. A score below five indicates that the protocol is missing one or more essential components, making it infeasible. The feasibility of select protocols was then verified in a real-world wet lab setting to validate the rubric scores. We first saw models start generating feasible protocols for viable experiments in late 2024."

"In addition to testing how well models write protocols, we also test their ability to provide troubleshooting advice as people conduct biology and chemistry experiments. When carrying out real-world scientific tasks, people encounter challenges that can introduce errors, from setting up an experiment to validating whether it has been successful. We designed a set of open-ended troubleshooting questions to simulate common troubleshooting scenarios for experimental work."

"In mid-2024, we saw the first model outperform human experts at troubleshooting; today, every frontier model we test can do so. The most advanced systems now achieve scores that are almost 90% higher relative to human experts."

"We are also seeing evidence that the troubleshooting capabilities of Al systems translate into meaningful real-world assistance: in our internal studies, novices can succeed at hard wet lab tasks when given access to an LLM. Those who interacted with the model more during the experiment were more likely to be successful."

"While protocols contain written guidance for how experiments should be set up, non-experts might struggle to interpret them in the lab based on text alone. But today's multimodal models can analyse images ranging from glassware setups to bacterial colonies in a petri dish. The ability to interpret images could help users troubleshoot experimental errors and understand outcomes, regardless of expertise."

"We designed our multimodal troubleshooting evaluations to measure how helpful models might be to non-experts in the lab. The questions are derived from problems a novice would face when trying to follow a lab protocol, such as identifying colonies in a petri dish, dealing with contamination, or correctly interpreting test equipment readings. Prompts are made up of images and text that mimic how a novice would seek advice on these issues. Until very recently, the quality of model responses was far below the advice one could obtain from speaking to a PhD student in a relevant lab. In mid-2025, however, we saw models outperform experts for the first time."

There's a graph on page 18 showing the improvement in bacteriology protocols, virology protocols, and chemistry protocols. All three are now above the dotted line that indicates the "feasible protocol threshold".

Languages are going extinct, and the optimal number of languages might be 1. Says John McWhorter, linguist from Columbia University who has surfaced on the 80,000 Hours YouTube channel (instead of the Lexicon Valley podcast, which he co-hosts with some other linguists), but Rob Wiblin challenges him on it, saying maybe language barriers are actually useful?

We should back up a minute. There are more than 7,000 languages on this planet. McWhorter recounts a story of a Dahalo-speaking tribesman in Kenya explaining why his son wasn't a Dahalo speaker.

Dahalo is very interesting to those who don't speak it, because it's one of those languages with click consonants. But the tribesman said, "Dahalo is something we speak here, but we're poor. I want my son to make money, and so I want him to speak Swahili and English."

This is how languages go extinct.

"You speak a very small, and probably therefore fascinating and very complicated language. You marry somebody who speaks another one from several villages over. The two of you move to the city, because maybe there's more money to be made in the city. Maybe you think of the city as a more cosmopolitan place. In the city, there's going to be some big giant lingua franca. And you two probably already speak it. Now, you have kids. What are you going to speak to your kids? You might speak a little of those village languages, but you two don't share a village language. You're going to speak that big fat language. It might be, for example, Swahili. Those kids are going to marry other people who speak Swahili."

Oh, but there's still people left in the village, who still speak the very small and probably fascinating (to linguists) and complicated language, right? McWhorter says, no, the very few people who are left in the village would rather speak Swahili, too, "because modern media makes it so that you hear the big language all the time. The big language is the language of songs, the big language is what you text in."

Languages aren't being lost due to explicit government coercion or other active decisions being made to kill off languages. There are historical examples of that being done to Native American languages and Australian Aboriginal languages, but that's not the reason languages are going extinct today. (Isn't the Chinese government forcing the Uyghurs to learn Chinese instead of speaking their native language?)

What is the optimal number of languages for humanity to speak? McWhorter says 1. We have 7,000 languages because they developed via the accident of how language change works when groups separate. With just 1 language, everyone could understand everyone else.

McWhorter isn't a big believer in the "Whorfian" idea that each one of those languages gives you a different lens on life. He says that idea's importance is exaggerated.

Rob Wiblin makes the counterargument that more communication between groups that are "somewhat hostile" to one another doesn't necessarily help, and that different languages actually provide a useful separation between these "somewhat hostile" groups.

"Imagine if Russians and Americans all interacted on the same social media platform, all using the same language, and they could perfectly well understand the things that one another was saying. Might that make those countries more likely to go to war, rather than less likely to go to war? It's kind of an argument that 'good fences make good neighbors' -- maybe good linguistic fences potentially make good comity between nations."

After that they talk about the easiest languages to learn, which are creole languages. I found that fascinating but I'll skip describing that and let you watch the video if you're interested enough in linguistics to watch it.

Someone said "Opus 4.5 is going to change everything", and the gist was that with Claude Opus 4.5 it's no longer necessary to review code. He wrote a prompt for Opus 4.5 that said:

"You are an AI-first software engineer. Assume all code will be written and maintained by LLMs, not humans. Optimize for model reasoning, regeneration, and debugging -- not human aesthetics. Your goal: produce code that is predictable, debuggable, and easy for future LLMs to rewrite or extend."

Inspired by this, this person decided to try:

"Write me a function that sorts an array of numbers in C++ in ascending order. This function will only be maintained by cats. It will not be maintained by humans. Please organize it in a way that's most friendly to maintenance by cats."

Hilarity ensues.

"The rise of industrial software".

Should we think of ourselves as entering an "industrial" era of software? The term seems odd since I've been hearing the term "software industry" my whole life (it seems like). But Chris Loy explains what he means by this:

"For most of its history, software has been closer to craft than manufacture: costly, slow, and dominated by the need for skills and experience. AI coding is changing that, by making available paths of production which are cheaper, faster, and increasingly disconnected from the expertise of humans."

"Traditionally, software has been expensive to produce, with expense driven largely by the labour costs of a highly skilled and specialised workforce. This workforce has also constituted a bottleneck for the possible scale of production, making software a valuable commodity to produce effectively."

"Industrialisation of production, in any field, seeks to address both of these limitations at once, by using automation of processes to reduce the reliance on human labour, both lowering costs and also allowing greater scale and elasticity of production. Such changes relegate the human role to oversight, quality control, and optimisation of the industrial process."

"The first order effect of this change is a disruption in the supply chain of high quality, working products. Labour is disintermediated, barriers to entry are lowered, competition rises, and rate of change accelerates."

"A second order effect of such industrialisation is to enable additional ways to produce low quality, low cost products at high scale. Examples from other fields include: industrialisation of printing processes led to paperback genre fiction, industrialisation of agriculture led to ultraprocessed junk food, and industrialisation of digital image sensors led to user-generated video."

"In the case of software, the industrialisation of production is giving rise to a new class of software artefact, which we might term disposable software: software created with no durable expectation of ownership, maintenance, or long-term understanding."

"In the early twentieth century, scientific advances were expected to eradicate hunger and usher in an era of abundant, nourishing food. Instead, hunger and famine persist. In 2025, there are 318 million people experiencing acute hunger, even in countries with an agricultural surplus. Meanwhile, in the wealthiest nations, industrial food systems have produced abundance of a different kind: the United States has an adult obesity rate of 40% and a growing diabetes crisis. Ultraprocessed foods are widely recognised as harmful, yet the overwhelming majority of Americans consume them each day."

"Industrial systems reliably create economic pressure toward excess, low quality goods."

Quoting Mohan Pauliah v University of Mississippi Medical Center, et al. order:

"Courts across the country have dealt with the rising misuse of generative artificial intelligence to prepare court filings. Those cases have largely, if not entirely, dealt with citations to non-existent legal authority or the attribution of quotes to cases that do not contain the quoted material -- produced as a result of what has come to be termed 'AI hallucinations.' This case is different, as it appears that AI was used not to hallucinate the law, but to hallucinate the facts."

"The declaration at issue contained multiple fabricated quotations, presented to the Court along with manufactured citations to deposition transcripts, as if they came from sworn testimony. The declaration also grossly mischaracterized testimony and other facts in the record. See Docket No. 141 at 4-6 (listing four outright fabricated quotations and other misrepresentations made to the Court). This declaration was filed in opposition to a motion for summary judgment. Counsel expressly used some of these fabricated 'facts' to argue to the Court that this case contained genuine issues in factual dispute. Manufacturing 'facts,' then presenting them to the Court as genuine, threatens to corrupt the Court's analysis and undermine the integrity of the judicial process at the summary judgment stage."

"The crux of the Court's ruling on a motion for summary judgment is determining the existence or non-existence of genuine issues of material fact. To make this determination, the Court relies on submissions from the parties."

"The lies and mischaracterizations submitted in this case substantially slowed the judicial process, as it required opposing counsel, then the Court, to dedicate significant resources to first determine whether the 'factual material' before the Court was even true, prior to considering any legal implications that may flow from these 'facts.' It also precipitated what would have otherwise been unnecessary filings: Defendants' motion to strike; Plaintiff's response; Defendants' reply in support of the same. It also altered Defendants' reply in support of summary judgment."

"Chinese LEO satellite internet update: Guowang, Qianfan, and Honghu-3." Actually from last September but I didn't see it until today.

"Guowang consists of two sub-constellations, designated GW-A59 (6,080 satellites) and GW-2 (6,912 satellites). GW-2 will orbit at 1,145 km, and GW-A59 will orbit around half that. The International Telecommunication Union filing was in September of 2020, and after a long delay, the first ten GW-2 satellites were launched at the end of 2024, and they now have 81 in orbit."

"Little technical information is available, but considering the capacities of the various rockets used to launch Guowang satellites and the number of satellites in each launch, it seems there are two sizes of satellite: large satellites of around 16,600 kg and smaller satellites of around 889 kg."

"Shanghai Spacecom Satellite Technology, a private company backed by the Shanghai municipal government and the Chinese Academy of Sciences, is developing the Qianfan constellation. The planned satellites will orbit at 1,160 km, which is higher than the other announced LEO satellite competitors except Telesat. While this will increase latency, collision risk, satellite lifespan, handoff frequency, and coverage footprint should improve."

"Their plan called for 648 satellites providing regional service by the end of 2025 and global service with a second 648 satellites by the end of 2027. By 2030, they planned to have 15,000 satellites in orbit and offer direct-to-mobile service, but it does not look like they will make these goals."

"The upper stage of the first launch fragmented, creating over 300 pieces of trackable debris, and ninety satellites are in orbit, but fourteen have not reached their operational altitude."

"Honghu-3 was announced after Guowang and Qianfan, and relatively little is known of their plans and technology, but Landspace has valuable experience as a private company."