Boulder Future Salon

Thumbnail
When given IQ tests designed for humans, large language models have increased their top score from about 95 to about 130 in the last year (allegedly). Claude had the lead a year ago, was overtaken by ChatGPT, which was overtaken by Gemini, which was overtaken by ChatGPT again, which was overtaken by Grok, which is currently the world's smartest model (until the next leapfrog, which will probably happen any day now). Was DeepSeek not tested?

On another test (a MENSA test), ChatGPT makes it to 148.

Commentary: AI may exceed most humans on most IQ tests, but there still seem to be things that humans can do that AI can't.

Thumbnail
"TabPFN-2.5 Model Report"

TabPFN claims to be the world's first foundation model for tabular data, and I didn't know it existed until the 2.5th release.

"Tabular data is ubiquitous, forming the backbone of decision-making in countless domains, from finance to healthcare. For decades, traditional tabular machine learning -- built on gradient-boosted trees, random forests, and linear or additive models -- has been the workhorse of applied data science. Yet these methods remain limited: they require extensive dataset-specific tuning, often provide uncalibrated or unreliable uncertainty estimates without significant modification, and lack the generalization and transferability of modern foundation models."

"Tabular foundation models (TFMs) offer a new paradigm. They address these limitations by pretraining on large synthetic distributions of tabular tasks and performing inference via in-context learning instead of gradient descent. They are training-free predictors meta-trained to yield strong calibration, without the need for time-consuming and labor-intensive hyperparameter tuning necessary for gradient-boosted trees. Their strong generalization makes them particularly attractive for data-scarce domains."

"Our initial release, TabPFNv1, served as a proof-of-concept that a transformer could learn a Bayesian-like inference algorithm, though it was limited to small (up to 1,000 samples), clean, numerical-only data. Our successor, TabPFNv2, scaled this idea into a practical model for datasets up to 10,000 samples. TabPFNv2 handles the messy and heterogeneous data seen in the real world -- including categorical features, missing values & outliers."

What's new in TabPFN-2.5? Improved performance (outperforming tuned tree-based models like XGBoost, with low inference-time latency) and improved scalability (datasets of up to 50,000 samples with 2,000 features per sample all in one context window).
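For the curious, the released TabPFN models expose a scikit-learn-style interface, so trying one out looks roughly like this (a minimal sketch based on the tabpfn Python package; exact class names and defaults may differ between versions):

```python
# Minimal sketch: in-context prediction on a small tabular dataset.
# Assumes the `tabpfn` package is installed; API details may vary by version.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()       # no dataset-specific hyperparameter tuning
clf.fit(X_train, y_train)      # "fit" essentially stores the data as context
print(clf.predict_proba(X_test)[:3])   # calibrated class probabilities
print(clf.score(X_test, y_test))
```

The point of the sketch is the workflow, not the numbers: there is no tuning loop, because prediction happens by in-context learning rather than gradient descent.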

Thumbnail
The end of the renewable energy honeymoon? asks Zoe Hilton. Starting is easy, the rest is hard?

Wind and solar energy are intermittent: when output is high, it can exceed the local saturation point, defined as the point at which supply exceeds demand in a given local area. At other times, it fails to meet demand. This problem has three solutions, but all three increase costs.

Solution 1 is to simply waste energy. But having energy to waste means you have built out your infrastructure, which costs money.

Solution 2 is to move energy through time, also known as storage. But this means you have to build storage systems, such as batteries or pumped hydro, which costs money.

Solution 3 is to move energy through space, also known as transmission lines. Transmission lines, especially if they are transporting massive amounts of energy between far-flung places, cost money.

No country anywhere in the world has found any better solutions. So once wind and solar energy goes beyond the local saturation point, costs go up, and electricity prices go up.

Australia (where Hilton is) passed its local saturation point some time ago, at around 20% of the country's energy coming from wind and solar. Australia's renewable energy honeymoon is over.

For those of you who prefer to read, rather than watch a video, I've linked to the report below.

Thumbnail
"Grokipedia: a first look", by Wikipedia's lesser-known co-founder Larry Sanger. Larry Sanger self-identifies as "conservitarian" (portmanteau of "conservative" + "libertarian"). (Wikipedia's other co-founder, Jimmy Wales, self-identifies as "centrist and gradualist", but supported Lawrence Lessig's 2016 Democratic party presidential campaign and signed on to an open letter urging American voters not to vote for Donald Trump -- according to his Wikipedia page.)

"Last night, I browsed a number of entries and did a deep-dive into an article on the topic on which I am the undisputed leading expert in the entire world: 'Larry Sanger.' I'll tell you what I think of this article, on the reasonable theory that it is fairly representative. Weighing in at 5,901 words, it is longer than the Wikipedia entry (5,592 words by my count), but that includes repetition, which I will explain below. The writing is OK. The Grokipedia generator tends to use longer sentences, leading to a style that is slightly turgid. The style is very much LLM-ese. We all recognize it now: It's readable enough, but often insipid pablum."

"In several cases, inaccuracies went back to bad sources. GIGO."

"Most errors were minor. Often, the problem wasn't so much factual error as dwelling on irrelevancies which might give a human being the wrong idea about some minor detail."

"But some errors were more serious. It says my family was only nominally religious, which is nonsense it hallucinated. It implies my father's scientific profession (seabird biology) was somehow responsible for my becoming an agnostic. It says I found it to be a 'challenge' that there were 'individuals lacking subject expertise' on Wikipedia, which is nonsense; accommodating such individuals was the whole purpose of Wikipedia. It makes it sound, at one point, as if I opposed the whole idea an 'unrestricted open editing' model, when that was the very model I brought to the table with Wikipedia. Some bad journalists have said that, but it was always a lie, and Grokipedia repeats it. There were several more of that type of thing."

"Surprisingly, there was considerable repetition within the article. In fact, the article about me would certainly have been shorter than the Wikipedia one if it had cut out the repetition. There were three summaries of my dissertation. There were two different sections about my conversion to Christianity (one three paragraphs, the other four). There were other repetitions. This seems like an easy fix."

"Vague word salads crop up, and that can be very annoying."

But that is prelude to the key question:

"Is Grokipedia neutral?"

"I built a useful system in Ruby that graded the neutrality of Wikipedia articles."

"In order to run the experiment quickly, I'm simply going to compare the neutrality of Wikipedia versus Grokipedia on a long series of article introductions."

"The data, taken from ChatGPT 4o, is compiled below. 1 is most neutral; 5 is most biased. The remarks in the second and third columns are all generated by ChatGPT 4o, not me."

I'm skipping over the titles but they are all extremely controversial topics.

At the bottom we get to "Average bias rating": 3.5 for Wikipedia, 2.1 for Grokipedia. Remember, larger means more biased.

"According to ChatGPT 4o, which is a competent LLM that is widely perceived to lean to the left, primarily on account of its training data, the Wikipedia articles on these controversial topics, on average, had a bias somewhere between 'emphasizes one side rather more heavily' and 'severely biased.' By contrast, the Grokipedia articles on these topics are said to 'exhibit minor imbalances' on average."

"On these topics, Wikipedia was never wholly neutral, while Grokipedia was entirely neutral (rating of 1) three out of ten times, and was only slightly biased (rating of 2) five other times. Meanwhile, Wikipedia's bias was heavy, severe, or wholly one-sided (rating of 3, 4, or 5) six out of ten times."

"This is not a scientific study, and if Grokipedia boosters present it as one, that will be against my own clear labeling."

Thumbnail
"I'm writing a message to my team. Three sentences. Simple update."

"I read it back. Does this sound right? Is it too direct? Will they think I'm being dismissive?"

"I paste it into Claude. Ask if it sounds okay. Get a response. Tweak it. Ask again."

"Fifteen minutes later, I hit send."

"For a three-sentence Slack message."

"It doesn't make things easier. It makes things worse."

"Because AI doesn't just validate. It rewrites."

"I write something in my voice. Run it through AI. It comes back polished. Professional. Clean."

"And completely not me anymore."

"I send it to my manager for review. He reads it and says:"

"'This is clearly AI generated. The person receiving it won't appreciate that.'"

(Alrighty then.)

Thumbnail
"Google is rolling out a new experiment aimed directly at small and medium-sized businesses (SMBs) that don't have a giant marketing budget or a dedicated design team. It's called Pomelli, an AI marketing tool born from a partnership between Google Labs and Google DeepMind."

"First, it builds what Google calls your 'Business DNA'. You start by just giving it your website's URL. The AI then scans your site and existing images to automatically pull out your brand's specific identity. This profile includes your brand's color palette, custom fonts, and existing images. It also analyzes the text to understand your 'tone of voice'."

"Once Pomelli has your brand's DNA down, it starts generating custom campaign ideas. If you've ever hit a wall trying to brainstorm a new marketing angle, this step is a lifesaver."

"The final step is where it all comes together. Pomelli produces a set of finished, high-quality marketing assets ready for use. We're talking stuff ready to be posted on your social media, used in your ads, or put on your site."

"You aren't stuck with the first draft. The tool has built-in editors that let you change the text or tweak the images right inside the platform."

Hmm. I don't have occasion to try this right now, but it seems like the kind of thing AI could be really good at. If you give this a whirl, let me know how it goes.

Thumbnail
"Game AI's Existential Crisis: Part 1"

"Game AI, the practice of implementing artificial intelligence to craft the experience of players in video games, is facing an existential crisis."

"A crisis of confidence in technology."

"A crisis that, at its heart, is at about perils of scale and complexity."

"A crisis of handling risk in a land of the risk-averse."

"A crisis of lost knowledge, and of forgotten values."

"A crisis born of a niche that exists for a single purpose, in a world that seldom acknowledges its existence."

"Game AI is the practice of using artificial intelligence to facilitate part of the experience of playing a video game. This can manifest in a game in a variety of different ways:"

"Non-player characters who appear as opponents, enemies, or allies in a game."

"We use it for navigating complex environments in games"

"It is used for strategic opponents in games"

"But also gameplay systems that support the experience in ways players seldom think about (Director systems, Combat management systems, Open-world games manage the spawning of NPCs)"

"Plus, there's an argument to be made - and I will make it - that procedural content generation falls under this remit."

"Game AI -- for the most part -- does not rely on machine learning. Rather, it is powered by symbolic AI"

"Perhaps one of the biggest things most people fail to grasp about game AI, is it's one of the few corners of artificial intelligence where the emphasis is not on achieving a level of accuracy or optimality."

"We want NPCs and other AI opponents to make interesting decisions, but interesting is not necessarily optimal. An enemy in a game should be taking actions that lead to fun gameplay moments, and often that means making them fallible, and leaving room for error."

"So to get into the meat of my thesis, I want to talk first about the immediate future of game AI, and some of the concerns that have rattled around in my head the past few years. These largely fall under four specific themes:"

"Game AI techniques as they stand are struggling under the weight of contemporary game design."

"There is a need for ongoing experimentation in game AI methods, but game productions seldom have the time to explore new ideas."

"As a result, we confine our thinking to designing against known systems, rather than design systems for purpose."

"Amongst all of this, generative AI is often incorrectly held aloft as a solution, failing to understand what the real problems are, and what benefits machine learning could bring to this sector."

"I often mention in talks that if you plot the history of game AI, the most important period is from 1995 to around 2005."

"In an era where graphics pipelines, animation systems, entity component systems, and networking frameworks, have changed and evolved from one generation to the next, game AI has not truly evolved since 2005."

He (Tommy Thompson) goes on to describe the games industry as one where every game involves taking on a lot of risk, and which has consequently become risk-averse wherever people feel risk can be avoided. Every game carries the risk that its design decisions won't resonate with players, and games have to take on risk with new graphics technologies to keep up with the competition, so people have become risk-averse when it comes to AI.

"It's notable that the two big games in development in recent years where NPC AI systems were at the forefront of their design were subsequently cancelled. Monolith's Wonder Woman and Cliffhangar Games' Black Panther titles were both exploring the same idea of establishing relationships between NPCs such that it changes the structure and experience of the game as you play it (i.e. the nemesis system)."

"For what it's worth, I've heard from numerous sources about what those games where shaping up to be, and they sounded like really interesting experiences -- in fact the stuff I heard about Wonder Woman really felt fresh in a way that truly excited me. But they were complex, and expensive propositions."

Thumbnail
"A Skill is a folder containing a SKILL.md file, which includes instructions, resources, and even executable code. Think of Skills as a set of standard operating procedures for the AI. For example, a Skill could instruct Claude on how to format a weekly report, adhere to a company's brand guidelines, or analyze data using a specific methodology."

"The genius of Claude Skills lies in their architecture, which is built on a principle called progressive disclosure. This three-tiered system ensures that Claude's context window isn't overwhelmed with information:"

"Level 1: Metadata: When a session starts, Claude loads only the name and description of each available Skill."

"Level 2: The SKILL.md file: If Claude determines that a Skill is relevant to the user's request, it then loads the full content of the SKILL.md file."

"Level 3 and beyond: Additional resources: If the SKILL.md file references other documents or scripts within the Skill's folder, Claude will load them only when needed."

"This efficient, just-in-time loading mechanism allows for a vast library of Skills to be available without sacrificing performance."

I've already told you all about Model Context Protocol (MCP), but in case you missed it, there's a description of that here, too. It basically enables Claude (or any LLM acting as an "agent") to control other applications.
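To give you a flavor, the official MCP Python SDK lets you expose a tool to an LLM client in a few lines. Something like this (a minimal sketch using the SDK's FastMCP helper; the word-count tool is a made-up example):

```python
# Minimal sketch of an MCP server exposing one tool (hypothetical example).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    mcp.run()   # an MCP client such as Claude Desktop can now call word_count
```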

Thumbnail
Greenlandic, a language spoken by only 57,000 people in Greenland, serves as an example of how a language with few speakers can send AI into a doom spiral.

"Virtually every single article had been published by people who did not actually speak Greenlandic. Wehr, who now teaches Greenlandic in Denmark, speculates that perhaps only one or two Greenlanders had ever contributed. But what worried him most was something else: Over time, he had noticed that a growing number of articles appeared to be copy-pasted into Wikipedia by people using machine translators. They were riddled with elementary mistakes -- from grammatical blunders to meaningless words to more significant inaccuracies, like an entry that claimed Canada had only 41 inhabitants. Other pages sometimes contained random strings of letters spat out by machines that were unable to find suitable Greenlandic words to express themselves."

"AI systems, from Google Translate to ChatGPT, learn to 'speak' new languages by scraping huge quantities of text from the internet. Wikipedia is sometimes the largest source of online linguistic data for languages with few speakers -- so any errors on those pages, grammatical or otherwise, can poison the wells that AI is expected to draw from. That can make the models' translation of these languages particularly error-prone, which creates a sort of linguistic doom loop as people continue to add more and more poorly translated Wikipedia pages using those tools, and AI models continue to train from poorly translated pages."

Thumbnail
China has created a replacement for UEFI, the technology that enables Secure Boot (UEFI stands for "Unified Extensible Firmware Interface"), and the purpose of this is to make China more independent of the United States technologically.

"China has worked for years to further separate its computing progress from the United States and its tech companies. Today heralds a major development to this end, as the Global Computing Consortium has announced the 'UBIOS' global standard, a new replacement for UEFI and BIOS. Fast Technology reports that the GCC's new standard is a rebuilding of BIOS firmware from the ground up, bypassing UEFI development entirely."

"UBIOS, or 'Unified Basic Input/Output System', is a firmware standard to replace BIOS and UEFI, the first and most prolific motherboard firmware architectures, respectively, that bridge the gap between processors and operating systems. The UBIOS standard was drafted by 13 Chinese tech companies, including Huawei, CESI (China Electronics Standardization Institute), Byosoft, and Kunlun Tech. The standard is the first standardized and scalable Chinese domestic firmware, representing a major step forward for Chinese domestic tech development."

Thumbnail
Every economic system fails in an AI world according to Ruri Ohama. She starts off questioning experts' predictions for the near term, where people think jobs like taxi drivers will be replaced. She thinks lawyers will be replaced sooner than people think. AI lawyers have the potential to be less biased than humans, and they are good at recognizing patterns. Maybe a little too good right now, as current AIs hallucinate, but she thinks future AIs won't have that problem.

She thinks AI will be able to automate not just low-end "junior" software engineers, but the expensive "senior" software engineers.

She thinks AIs are capable of creativity. Creativity is combining existing ideas in an original way.

(As soon as I saw the "Napoleon as a rat" picture, I knew AI could combine existing ideas in an original way and was thus creative. If you asked any human to make a picture of Napoleon as a rat and they came up with the picture Dall-E came up with, you wouldn't hesitate to say they were "creative".)

She thinks the job of managers can be automated by AI. Management is resource allocation. AIs are already better than humans at data analysis. CEOs may still be needed as visionaries, but the management layer can be replaced by AI.

(I think it may take longer than she thinks because she's not taking into account office politics. People will "play politics" to keep their jobs as long as possible.)

She thinks AI will become better than doctors at diagnosis. AIs can be taught millions of drug interactions and symptom combinations.

If AI is cheaper than hiring humans, the likelihood of replacing humans is quite high, but it all depends on what AI excels at.

Even if AI creates new job opportunities, AI learns and adapts faster than humans. So humans learn the new jobs, but those jobs disappear quickly, and then humans have to learn new jobs again, faster each time. It's as if we humans are playing a game of musical chairs, with the music getting faster and faster and the chairs disappearing.

Gen Z already trust ChatGPT more than human salespeople.

People say the economy will shift to an "experience economy", but she wonders who is going to pay for these human-created "experiences", when the majority of people don't have incomes and don't have purchasing power.

(I know the answer to this: the economy has already shifted significantly towards luxury goods and services for the wealthy instead of consumer goods for the masses, and I expect this trend to accelerate.)

She brings up the subject of universal basic income (UBI). She thinks the people who provide goods and services can simply raise prices to absorb the UBI. She uses the example of landlords and rent.

(She doesn't realize this, but in the San Francisco Bay Area, which is geographically constrained -- bounded by the Pacific Ocean, the San Francisco Bay, and the Diablo mountain range which separates the Bay Area from the central valley -- economists have estimated that essentially all increases in average salary have been absorbed by increased rent, and it's actually the landlords that have made all the money, more or less, from the explosion of economic activity in the area. If you ever move to the Bay Area, make sure you become a landlord. Does this principle apply everywhere? Maybe not, as not everywhere is geographically constrained the way the Bay Area is.)

She uses this as a segue into a discussion of socialism vs capitalism and her conclusion that there is no economic system that works in an AI world. With AI automating away labor, it also takes away all the bargaining power workers have, even collectively.

She doesn't think AI will create unlimited "abundance" because there will always be things, such as land and human attention, that are not abundant.

"What if the answer is that there is no answer? Because in a VUCA age which is basically high volatility, uncertainty, ambiguity, error, the future is never in an extension of the past. So no one actually knows what will happen in the future. What if we're heading towards an economic system so different that we can't even conceive of it yet?"

("VUCA" is a military acronym that stands for "volatility, uncertainty, complexity, and ambiguity.")

"No one knows what's going to happen next because of the accelerating pace of the changes that are happening in the world."

"So in my opinion asking the question of which jobs will disappear, which jobs will emerge is actually a pointless question to ask."

"I read bunch of books about the experts' predictions from years ago about AI. Even like two years ago and most of the things that they said were completely wrong. So even experts are failing at trying to predict the future because it's so different than we have ever experienced before."

(Most of the books she read are in Japanese, so you might have trouble reading them yourself.)

Thumbnail
James Gurney's issues with AI in the creative process:

"1. Option Overload: It's so easy to generate variations on an idea, but what do you do with all those unused images, videos, drafts, and outlines?"

"2. Creative Intimidation: AI produces results instantly and effortlessly, yet for us humans, creativity remains just as difficult."

"3. Labor Devaluation: Most ads try to get us to buy in on the idea that some part of our creative process is boring, laborious, or tedious."

"4. Workflow Disruption: Where should AI fit in -- early brainstorming, mid-stage refinement, or final polish?"

"5. The Cheerfulness Problem: AI is endlessly agreeable. It never says, 'That's a stupid idea.'"

"6. False Confidence: Can we be creative in a domain where we lack skill? When AI fills in the gaps, it's easy to think we know more than we do."

"7. Audience Erosion: The whole idea of art is to communicate with fellow humans. Yet on some platforms, as much as half the 'audience' are said to be bots."

Eventually, 99% of the internet will be AI bots generating art for AI bots?

"8. Novelty Fatigue: Sometimes it feels like we're working in a woodshop where each week they bring in fancy new tools for us to use."

"9. Homogenization Pressure: AI tools are trained on the same ocean of data, which means they pull our creations toward the average."

"10. Creative Displacement: The more we collaborate with AI, the harder it is for the audience to know who is the real creator."

Thumbnail
Cheatproof.ai claims to detect cheating in coding interviews. The idea is that the candidate installs the IDE that you provide to them (which comes from cheatproof.ai) instead of using their own. Then you give them the leetcode-type coding problems that are the standard way to interview software engineers these days (not a good way to interview people, but that's a discussion for another time). This IDE spies on them and tries to determine if they're cheating using AI.

"Detection accuracy above 90%. Multi-signal model (behavior, timing, context). Tuned with real-world interview telemetry. Calibrated thresholds to reduce false positives.

"Reduce the risk of bad hires. Catch AI-assisted answers before the offer stage. Verify real problem-solving under interview constraints. Standardize bar across interviewers & teams."

Thumbnail
"Self-adapting language models", which go under the catchy acronym "SEAL".

So the idea here is that language models have knowledge "baked in" to their parameters, but rely on a limited context window to understand what you are asking them to do, with no long-term learning of that in-context knowledge, and that is the problem to be fixed. The fix these researchers have come up with is a way to transfer knowledge from the context to the parameters.

At first glance it may seem there is no way to do this. The model's parameters are a giant set of floating point numbers, and the context is an entirely different set of floating point numbers -- in the form of vectors -- called tokens. But the way these researchers do it is to generate a "self-edit" and then use that "self-edit" as training data to adjust the parameters.

The system for coming up with the "self-edits" is itself learned with reinforcement learning, and it is the original model that does the generating, so models in some sense generate their own self-edits.

What does a "self-edit" consist of? The simplest way to think of it is as a paraphrasing of the context, combined with what you can think of as basically a set of "reading comprehension questions" used to verify if the context has been learned. This is a simplification; the "self-edit" includes various hyperparameters and other information used in the learning process.

While reinforcement learning is used to train the "self-edit" production process, once the self-edits have been generated it is ordinary supervised finetuning that adjusts the model parameters until the context information has been sufficiently learned.
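Putting it all together, the training loop is conceptually something like this (a rough pseudocode sketch of my reading of the paper, not the authors' code; all the function names are placeholders):

```python
# Rough conceptual sketch of SEAL-style training; all functions are placeholders.
for passage in training_passages:
    # 1. The model writes a "self-edit": a restatement of the passage plus
    #    synthetic question/answer pairs and finetuning hyperparameters.
    self_edit = model.generate_self_edit(passage)

    # 2. Supervised finetuning on the self-edit bakes the context into the weights.
    updated_model = supervised_finetune(model, data=self_edit)

    # 3. Reward = how well the updated model answers held-out questions about
    #    the passage *without* the passage in its context window.
    reward = evaluate(updated_model, questions_about(passage))

    # 4. Reinforcement learning pushes the model toward self-edits that earn
    #    higher reward, i.e. edits that transfer knowledge into the parameters.
    model = rl_update(model, self_edit, reward)
```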

Thumbnail
Robot dancing and doing what looks like a cardio kickboxing routine. From Unitree Robotics. They don't say how it works, but if it's typical of Unitree Robotics, they're training the robot's AI brain with reinforcement learning in simulation in IsaacLab and then using Mujoco for "sim-to-real" training.

Thumbnail
"AI browsers are the self-driving cars of the internet," says Nutanc.

"For over fifteen years, we've been promised they are just around the corner, yet they remain niche. Why? Because the roads weren't built for a perfectly logical, binary-thinking machine. They are populated by human drivers who speed, ignore signals, swerve, get angry, and generally behave in unpredictable ways."

"The internet is the same. It's not a tidy, predictable highway. It's a chaotic, constantly updating, and incredibly human-driven landscape."

"Yes, an AI browser might save you ten minutes of summarising a report or five minutes of filling out a form. These are the 'pros' -- the occasional convenience."

"But just like a self-driving car, the risks of handing over control are too great."

So we're going to be promised AI browsers that can do everything (without risk) are just around the corner for the next fifteen years?