Boulder Future Salon

AI can't cross this line on a graph and we don't know why.

The graph's vertical axis shows the "error" the neural net is trying to minimize during training (also called the "loss").

On the horizontal axis, it has the amount of computing power thrown at the training process.

When switched to a log-log graph -- logarithmic on both axes -- a straight line emerges.

This is actually one of three observed neural network scaling laws. The other two look at model size and dataset size, and see a similar pattern.
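
Why does a power law look like a straight line on a log-log graph? Taking logs of both sides of loss = a * compute^(-b) gives log(loss) = log(a) - b * log(compute), a line whose slope is -b. Here's a minimal sketch with made-up constants (these are synthetic numbers, not real scaling-law fits):

```python
import math

# Illustrative only: synthetic data following a power law,
# loss = a * compute^(-b), the shape the scaling-law plots show.
a, b = 2.0, 0.05                        # made-up constants, not real fits
compute = [10 ** k for k in range(11)]  # 1 to 10^10 "units" of compute
loss = [a * c ** (-b) for c in compute]

# A least-squares line through (log compute, log loss) recovers b as -slope.
xs = [math.log(c) for c in compute]
ys = [math.log(y) for y in loss]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
print(round(-slope, 6))  # recovers b = 0.05
```

This is also why the scaling-law papers report exponents: the exponent is the slope of that straight line.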

Have we discovered some fundamental law of nature, like the ideal gas law in chemistry, or is this an artifact of the particular methods we are using now to train neural networks?

You might think someone knows, but no one knows.

That didn't stop this YouTuber from making some good animations of the graphs and various concepts in neural network training, such as cross-entropy. It introduces the interesting concept that language may have a certain inherent entropy.
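
For the curious, here's the cross-entropy calculation in miniature, with toy numbers. The key fact behind "language may have a certain inherent entropy" is that cross-entropy H(p, q) is minimized when the model's distribution q equals the true distribution p, at which point it equals the entropy of p itself, so a language's entropy is a floor on the loss:

```python
import math

# Cross-entropy between the true next-token distribution p and the
# model's predicted distribution q, in nats: H(p, q) = -sum p_i * log(q_i).
def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]  # "true" distribution over three tokens (toy numbers)
q = [0.6, 0.3, 0.1]  # the model's guess

# The loss can never drop below the entropy of p, no matter how good q is.
assert cross_entropy(p, q) >= cross_entropy(p, p)
print(round(cross_entropy(p, p), 4))  # 0.8018 -- the entropy of p itself
```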

The best theory as to why the scaling laws hold tries to explain it in terms of neural networks learning high-dimensional manifolds.

Alexis Conneau, OpenAI's research lead for GPT-4o/GPT-5, has left OpenAI to start a new company to create "Her", as in, from the movie. (Alrighty then.)

"FastHTML: The fastest, most powerful way to create an HTML app".

"FastHTML apps are just Python code, so you can use FastHTML with the full power of the Python language and ecosystem. FastHTML's functionality maps 1:1 directly to HTML and HTTP, but allows them to be encapsulated using good software engineering practices -- so you'll need to understand these foundations to use this library fully."
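
To give a flavor of the "maps 1:1 directly to HTML" idea, here's a tiny sketch of that pattern in plain Python. This is not FastHTML's actual code or API, just an illustration of Python callables composing into HTML elements:

```python
# Not FastHTML itself -- a minimal sketch of the idea its docs describe:
# Python callables that map 1:1 onto HTML elements, so markup can be
# composed with ordinary functions instead of template strings.
def tag(name):
    def element(*children, **attrs):
        attr_str = "".join(f' {k}="{v}"' for k, v in attrs.items())
        body = "".join(children)
        return f"<{name}{attr_str}>{body}</{name}>"
    return element

Div, P = tag("div"), tag("p")

html = Div(P("Hello"), P("World"), cls="greeting")
print(html)  # <div cls="greeting"><p>Hello</p><p>World</p></div>
```

(The real library handles attribute-name translation, escaping, routing, and much more; this just shows why "apps are just Python code" is the pitch.)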

In the section on "Getting help from AI" it says:

"Because FastHTML is newer than most LLMs, AI systems like Cursor, ChatGPT, Claude, and Copilot won't give useful answers about it. To fix that problem, we've provided an LLM-friendly guide that teaches them how to use FastHTML. To use it, add this link for your AI helper to use:

/llms-ctx.txt
"

I wonder if we're going to see more of this kind of thing for new tech.

"Oracle is designing a data center that would be powered by three small nuclear reactors" alrighty then.

The data center will require more than a gigawatt of electricity.

The article says the location has been chosen and building permits obtained, but the nuclear reactor designs have not been revealed.

"Small modular nuclear reactors are new designs that promise to speed the deployment of reliable, carbon-free energy as power demand rises from data centers, manufacturing and the broader electrification of the economy. Generally, these reactors are 300 megawatts or less, about a third the size of the typical reactor in the current US fleet."

"Scientists use magnetic nanotech to safely rewarm frozen tissues for transplant"

"Recently, Yadong Yin and a team developed magnetic nanoparticles -- effectively extremely tiny bar magnets -- that, when exposed to alternating magnetic fields, generated heat. And that heat rapidly thawed animal tissues stored at -238 degrees Fahrenheit (-150 degrees Celsius) in a solution of the nanoparticles and a cryoprotective agent."

"The researchers worried, however, that uneven distribution of the nanoparticles within the tissues might trigger overheating where the particles congregated, which could lead to tissue damage and toxicity from the cryoprotective agent at elevated temperatures."

To reduce risk of uneven warming, the researchers developed "a two-stage approach that more finely controls nanowarming rates."

"In the first stage of thawing, as before, an alternating magnetic field initiated rapid rewarming of animal tissues."

"As the samples approached the melting temperature of the cryoprotective agent, the researchers applied a horizontal static magnetic field."

"The second field realigned the nanoparticles, effectively tapping the brakes on heat production."

"The heating slowed fastest in areas with more nanoparticles, which dampened concerns about problematic hotspots."

The research paper is paywalled so I'm just giving you quotes from the popular news article.

"How to spot NASA's solar sail demonstration streaking through the night sky"

"NASA loaded a microwave-sized device packed with a four-piece, 860-square-foot, ultra-thin solar sail aboard a rocket and launched it on April 23. That same day, the rocket delivered the object -- called the Advanced Composite Solar Sail System (ACS3) -- into low Earth orbit. After a first unsuccessful attempt, the ACS3 victoriously deployed its booms and unfurled its sails at the end of last month."

"Now that its reflective sail has deployed fully open in orbit, the Advanced Composite Solar Sail System can be seen in the night sky from many locations across the world!" writes NASA's Arezu Sarvestani. "In fact, the ACS3's reflective surface can at times even appear as bright as Sirius, the brightest star in the night sky."

NASA's mobile app can tell you when to see it. (*You* can -- I could, too, but first, I have to buy a new phone. My current phone is super extremely unbearably slow and has no room to install new apps.)

"The struggle over digital infrastructure": Commentary by Robin Berjon, deputy director of the IPFS Foundation. InterPlanetary File System (IPFS) is a protocol for a peer-to-peer distributed file system.

"By and large, the Internet seems more akin to a failed state, at best an undergoverned space where fragments of essential infrastructure are provided at the whim of local warlords."

"The cyberlibertarian vision of yesteryears is at the root of the myriad problems confronting global digital governance today."

The cyberlibertarian vision of yesteryear was that bad, huh?

"One important property of the Internet is its adhesion to the end-to-end principle, one formulation of which is: 'Nothing should be done in the network that can be efficiently done in an end-system.' This may seem somewhat abstract, but we can see the end-to-end principle at work in other infrastructural systems. For instance, if you invent a new type of light bulb or toaster, you don't need to change the lighting or toasting functions of the electrical grid."

"Seen from 2024, this might strike readers as obvious but it wasn't always so. Before the Internet emerged as the more successful alternative, it was competing for funding and attention with a much more telecoms-centric model. Under the telecoms model, intelligence resides in the network rather than at the edges."

"Crucially, intermediary capture does not only describe the world we had when networking was dominated by telecoms operators, it is also an apt description of the world we now have in which many infrastructure layers of our digital spaces have been captured by tech platforms."

"Global majority countries (as well as most global minority ones) are entirely right to feel colonized by Google and other platforms, in a very literal sense. The Internet is 'the infrastructure of all infrastructure' and having one's critical infrastructure controlled and exploited by a foreign entity is colonial. But the way forward does not lie with a reversal to the telecoms model."

Ok. What to do, then?

"Simpler solution for disabling the DCM telematics -- Silencing Antennas"

"We just bought our 2023 Toyota Tacoma TRD Off Road at the end of November and in reading through the manual found out that we could contact Toyota to opt out of data reselling (to insurance companies and advertisers) but couldn't actually disable our Data Communication Module (DCM)/Telematics module from connecting to the internet via user-accessible menus from the truck."

So you can opt out of the data *reselling* but you can't opt out of the data *collection*.

This got my attention because it made me wonder: is this kind of data collection happening for all or almost all new cars these days? Do essentially all new cars have "an internet connected computer sitting on my CAN Bus"?

"After tearing apart my passenger dash by removing the scuff plate, cowl side trim board, instrument side panel, passenger knee airbag, glove compartment plate, and finally the glove box/'instrument lower panel assembly' with a trim tool, a Phillips, and a 10 mm socket I was able to see the DCM module on the far right with the three antenna connectors."

The post goes on to describe modifications to the vehicle. They are specific to this vehicle, so unless you have the same vehicle and feel comfortable doing maintenance on it, you might not be interested in what follows.

"If my understanding of the Toyota documentation is correct, this should continue to run happily with the DCM/telematics module believing it is out of cell coverage range, and then just overwriting the oldest events in the internal memory in the vain hope it someday hears a cell tower again. The nice part of this approach is should I (or the dealer) ever need to undo this mod, it's completely reversible. Since the three radios were properly terminated into a 50-ohm terminator, there won't be any damage to the transmitting or receiving side of the DCM module, and there also won't be any damage to the wiring on the D37 connector either."

"Altogether, this only took about an hour and a half to do once I figured out the right combination of connectors to use."

"Oura has acquired metabolic health startup Veri."

"Blood sugar levels are foundational to the Veri platform. The Finnish company notes, 'Veri does more than show you blood sugar levels. We help you stabilize your levels by providing the insight and guidance you need to find the right foods and habits for you.'"

"Oura CEO Tom Hale tells TechCrunch that, according to an internal survey, 97% of its users are 'really interested in understanding how nutrition affects their health.' The more surprising stat, however, is that 13% of those surveyed have been wearing a continuous glucose monitor prior to the recent increased availability of the devices."

"Today I read yet again someone suggesting that using ChatGPT to rewrite code from one programming language to another is a great idea. I disagree: a programming language is an opinionated way on how to better achieve a certain task and switching between world views without understanding how and why they do things the way they do is a recipe for inefficient code at best and weird bugs at worst."

"I decided to test my theory with Google's Gemini - I've seen students using it in their actual coding (probably because it's free) making it a fair choice. I asked the following:"

"Convert the following code from Python to Elixir:"

The code looks equivalent, but it's only equivalent for the normal case -- if the input is bad, the Python and Elixir code behave differently. I think this is a good example of how LLMs can make translations that look correct but aren't in subtle ways.
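
The failure mode generalizes. Here's a Python-only analogy (my own toy example, not the post's code): two functions that look interchangeable on good input but diverge on bad input, which is exactly the trap when translating between languages with different error-handling conventions:

```python
# Illustrative toy example: two lookups that agree on the happy path
# but diverge on bad input -- the kind of subtle mismatch an LLM
# translation can introduce when source and target languages handle
# errors differently.
def lookup_strict(d, key):
    return d[key]          # raises KeyError on a missing key

def lookup_lenient(d, key):
    return d.get(key)      # silently returns None instead

prices = {"apple": 3}

# Normal case: identical behavior.
assert lookup_strict(prices, "apple") == lookup_lenient(prices, "apple")

# Bad input: one raises, the other silently returns None.
try:
    lookup_strict(prices, "pear")
    raised = False
except KeyError:
    raised = True
assert raised and lookup_lenient(prices, "pear") is None
```

A test suite that only covers the happy path would call these "equivalent."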

"Why do you need a time series database inside a car?"

That's a good question. Sometimes I wonder if we've crossed the point where further complexification of cars doesn't yield much benefit. But let's continue.

"As automotive intelligence progresses, vehicles generate increasing amounts of time-series data from various sources. This leads to high costs in data collection, transmission, and storage. GreptimeDB's Integrated Vehicle-Cloud Solution addresses these issues by leveraging the advanced computational capabilities of modern on-vehicle devices. Unlike traditional vehicle-cloud coordination where vehicles are mere data collectors, this new approach treats them as full-fledged servers capable of running complex tasks locally. The evolution from 32-bit microcontrollers to powerful chip modules like Qualcomm's 8155 or 8295 has enabled intelligent vehicles to perform edge computing efficiently, reducing transmission costs and improving overall efficiency."

"GreptimeDB is a cloud-native time-series database built on a highly scalable foundation. However, we did not initially anticipate it running on edge devices such as vehicles, which has presented significant challenges."

"The first challenge is resource usage constraints. GreptimeDB runs in the vehicle's cockpit domain controller and must minimize CPU and memory usage to avoid interfering with infotainment systems."

"The second concern is robustness; GreptimeDB collects critical diagnostic metrics from the CAN bus, so any crashes could result in data loss."

"CAN" here stands for "controller area network" and is a data bus inside vehicles that replaces masses of wires that go directly from components to other components -- it allows any "electronic control unit" (ECU) connected to the bus to communicate with any other.

"Lastly, unlike servers in datacenters, vehicle-based GreptimeDB operates under various conditions -- frequent power cycles, fluctuating ADAS data rates due to changing road traffic, etc. -- and needs to adapt while remaining stable and efficient."

"ADAS" stands for "advanced driver-assistance systems".
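
To make the resource-constraint point concrete, here's a hedged sketch (not GreptimeDB code, all names made up) of one way an in-vehicle collector can bound memory: keep only the newest N samples per metric and ship window averages instead of raw points, cutting transmission volume:

```python
from collections import deque
from statistics import mean

# Hypothetical sketch, not GreptimeDB's implementation: a bounded
# per-metric buffer for CAN-bus-style readings.
class EdgeBuffer:
    def __init__(self, max_samples=100):
        self.metrics = {}
        self.max_samples = max_samples

    def record(self, metric, value):
        # deque(maxlen=...) silently drops the oldest sample when full,
        # so memory use stays constant no matter how long the car runs.
        buf = self.metrics.setdefault(metric, deque(maxlen=self.max_samples))
        buf.append(value)

    def window_average(self, metric):
        # ship this one number upstream instead of every raw sample
        return mean(self.metrics[metric])

buf = EdgeBuffer(max_samples=3)
for rpm in [1000, 2000, 3000, 4000]:    # four samples, capacity three
    buf.record("engine_rpm", rpm)
print(buf.window_average("engine_rpm"))  # 3000 -- oldest sample (1000) dropped
```

A real in-vehicle database layers persistence, crash recovery, and backpressure handling on top of this basic idea.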

"A top US government codebreaker who decrypted secret Soviet communications during the Cold War concluded that Ethel Rosenberg knew about her husband's activities but 'did not engage in the work herself,' according to a recently declassified memo that her sons say proves their mother was not a spy and should lead to her exoneration in the sensational 1950s atomic espionage case."

Julius Rosenberg gave Manhattan Project nuclear secrets to the Soviets, starting in 1942 and ending with his and Ethel Rosenberg's arrest in 1951. Both were executed on June 19, 1953. My whole life I thought Julius and Ethel Rosenberg were both involved in the espionage, but that might be wrong.

PaperQA2: Superhuman scientific literature search.

"To evaluate AI systems on retrieval over the scientific literature, we first generated LitQA2, a set of 248 multiple choice questions with answers that require retrieval from scientific literature. LitQA2 questions are designed to have answers that appear in the main body of a paper, but not in the abstract, and ideally appear only once in the set of all scientific literature."

"We generated large numbers of questions about obscure intermediate findings from very recent papers, and then excluded any questions where either an existing AI system or a human annotator could answer the question using an alternative source."

These were generated by human experts.

"When answering LitQA2 questions, models can refuse to answer." "Some questions are intended to be unanswerable."

After creating LitQA2, the researchers then turned their attention to creating an AI system that could score highly on it.

"Retrieval-augmented generation provides additional context to the LLM (e.g., snippets from research papers) to ground the generated response. As scientific literature is quite large, identifying the correct snippet is a challenge. Strategies like using metadata or hierarchical indexing can improve retrieval in this setting, but finding the correct paper for a task often requires iterating and revising queries. Inspired by PaperQA, PaperQA2 is a retrieval-augmented-generation agent that treats retrieval and response generation as a multi-step agent task instead of a direct procedure. PaperQA2 decomposes retrieval-augmented generation into tools, allowing it to revise its search parameters and to generate and examine candidate answers before producing a final answer. PaperQA2 has access to a 'Paper Search' tool, where the agent model transforms the user request into a keyword search that is used to identify candidate papers. The candidate papers are parsed into machine readable text, and chunked for later usage by the agent. PaperQA2 uses the state-of-the-art document parsing algorithm, Grobid, that reliably parses sections, tables, and citations from papers. After finding candidates, PaperQA2 can use a 'Gather Evidence' tool that first ranks paper chunks with a top-k dense vector retrieval step, followed by an LLM reranking and contextual summarization step. Reranking and contextual summarization prevents irrelevant chunks from appearing in the retrieval-augmented generation context by summarizing and scoring the relevance of each chunk, which is known to be critical for retrieval-augmented generation. The top ranked contextual summaries are stored in the agent's state for later steps. PaperQA2's design differs from similar retrieval-augmented generation systems like Perplexity, Elicit, or Mao et al. which deliver retrieved chunks without substantial transformation in the context of the user query. 
While reranking and contextual summarization is more costly than retrieval without a contextual summary, it allows PaperQA2 to examine much more text per user question."

"Once the PaperQA2 state has summaries, it can call a 'Generate Answer' tool which uses the top ranked evidence summaries inside a prompt to an LLM for the final response to the asked questions or assigned task. To further improve recall, PaperQA2 adds a new 'Citation Traversal' tool that exploits the citation graph as a form of hierarchical indexing to add additional relevant sources."

Got that? Ok, that's quite a lot, so to summarize: The system consists of 5 'agents': 1. Paper search agent, 2. Gather evidence agent, 3. Citations traversal agent, 4. Generate answer agent, and 5. PaperQA2 agent which is the 'manager' agent directing all the other agents and that inputs the question and outputs the final answer.
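
Here's the shape of that pipeline as a toy sketch. To be clear, this is not PaperQA2's real code; every function, document, and scoring rule below is made up. The point is the structure: search, then gather evidence (rank, then rerank/summarize), then generate an answer grounded only in the top-ranked evidence:

```python
# Hypothetical stand-ins for a literature index and its contents.
PAPERS = {
    "paper_a": "The CD8a receptor showed no significant change in SCM.",
    "paper_b": "Unrelated result about adipocyte conditioned media.",
}

def paper_search(query):
    # stand-in for the "Paper Search" tool's keyword search
    return list(PAPERS)

def gather_evidence(query, paper_ids):
    # stand-in for dense retrieval + LLM reranking: score chunks by
    # crude keyword overlap and keep only the top-ranked one
    def score(text):
        return sum(word in text.lower() for word in query.lower().split())
    ranked = sorted(paper_ids, key=lambda p: score(PAPERS[p]), reverse=True)
    return [(p, PAPERS[p]) for p in ranked[:1]]

def generate_answer(query, evidence):
    # stand-in for the final LLM call, grounded in the gathered evidence
    paper_id, chunk = evidence[0]
    return f"{chunk} (source: {paper_id})"

query = "CD8a receptor SCM"
answer = generate_answer(query, gather_evidence(query, paper_search(query)))
print(answer)
```

The real system replaces each stand-in with an LLM-driven tool and lets a manager agent decide when to search again, traverse citations, or answer.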

Here's an example question: "What effect does bone marrow stromal cell-conditioned media have on the expression of the CD8a receptor in cultured OT-1 T cells?"

PaperQA2 answer: "The effect of bone marrow stromal cell conditioned media (SCM) on the expression of the CD8a receptor in cultured OT-1 T cells is reported to have no significant impact. According to the study by Kellner et al (2024)."

The referenced paper says, on page 24: "OT-1 T-cells are primary mouse CD8+ T cells that specifically recognize the ovalbumin peptide residues 257-264 (SIINFEKL) presented on a class I major histocompatibility complex. OT-1 transgenic mice were housed and bred at the Division of Animal Resources at Emory University following all Institutional Animal Care and Use Committee protocols. OT-1 T-cells were magnetically isolated from spleens using CD8a+ negative selection according to the manufacturer's protocol (Miltenyi Biotec, # 130-104-075). Purified OT-1 T-cells were resuspended in unconditioned medium (UCM), bone marrow stromal cell conditioned media (SCM), or adipocyte conditioned medium (ACM) at 1x10^6 cells/mL in a 24 well plate and incubated at 37 degrees C, 5% CO2 for 24h prior to being seeded on tension probes. Images were taken 15 minutes after seeding OT-1 T-cells on tension probes. Fluorescence intensity values of individual cells were quantified from at least 10 images in FIJI software."

No, I didn't figure out what "SIINFEKL" stands for. I asked Google what it stands for and its "AI Overview" gave me a blatantly wrong answer (ironically?). One paper referred to it as "the well-known ovalbumin epitope SIINFEKL" -- but it's not well known enough to have a Wikipedia page or a Science Direct summary page saying what it stands for and giving a basic description of what it is. By the way, the term "epitope" means the part of a molecule that activates an immune system response, especially the part of the immune system that adapts new responses, primarily T cell and B cell receptors.

Stromal cells of various types exist throughout the body, but the question here refers specifically to bone marrow stromal cells. These are "progenitor" cells that produce bone and cartilage cells, as well as cells that function as part of the immune system, such as cells that produce chemokines, cytokines, IL-6, G-CSF, GM-CSF, CXCL12, IL7, and LIF (for those of you familiar with the immune system -- if you're not, I'm not going on a tangent within a tangent to explain what those are), though from what I can tell they don't produce T-cells or B-cells. T-cells and B-cells are produced in bone marrow, but not from stromal cells.

"OT-1" refers to a strain of transgenic mice sold by The Jackson Laboratory. CD8a is a gene that is expressed in T cells.

Anyway, let's get back to talking about PaperQA2.

So how did PaperQA2 do?

"We evaluate two metrics: precision, the fraction of questions answered correctly when a response is provided, and accuracy, the fraction of correct answers over all questions."
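
The two metrics, sketched in code with toy counts (these are my own illustrative numbers, not the paper's data):

```python
# Precision and accuracy as defined in the paper, for a system that is
# allowed to refuse some questions. Counts below are toy numbers.
def precision(correct, answered):
    # fraction of *answered* questions that were answered correctly
    return correct / answered

def accuracy(correct, total):
    # fraction of *all* questions (refusals count against you)
    return correct / total

total, refused, correct = 200, 40, 120
answered = total - refused
print(precision(correct, answered), accuracy(correct, total))  # 0.75 0.6
```

Refusing a question it would have gotten wrong raises precision but not accuracy, which is why the two numbers diverge for a system that says "insufficient information."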

"In answering LitQA2 questions, PaperQA2 parsed and utilized an average of 14.5 papers per question. Running PaperQA2 on LitQA2 yielded a precision of 85.2%, and an accuracy of 66.0%, with the system choosing 'insufficient information' in 21.9% of answers."

"To compare PaperQA2 performance to human performance on the same task, human annotators who either possessed a PhD in biology or a related science, or who were enrolled in a PhD program, were each provided a subset of LitQA2 questions and a performance-related financial incentive of $3-12 per question to answer as many questions correctly as possible within approximately one week, using any online tools and paper access provided by their institutions. Under these conditions, human annotators achieved 64.3% +/- 15.2% precision on LitQA2 and 63.1% +/- 16.0% accuracy. PaperQA2 thus achieved superhuman precision on this task and did not differ significantly from humans in accuracy."

For "precision": PaperQA2 85.2%, Perplexity Pro 69.7%, human 64.3%, Gemini Pro 1.5 51.7%, GPT-4o 44.6%, GPT-4-Turbo 43.6%, Elicit 40.9%, Claude Sonnet 3.5 37.7%, Claude Opus 23.6%.

For "accuracy": PaperQA2 66.0%, human 63.1%, Perplexity Pro 52.0%, Elicit 25.9%, GPT-4o 20.2%, GPT-4-Turbo 13.7%, Gemini Pro 1.5 12.1%, Claude Sonnet 3.5 8.1%, Claude Opus 5.2%.

"Hasbro's CEO thinks D&D's adoption of AI is inevitable."

"If you look at a typical D&D player... I play with probably 30 or 40 people regularly. There's not a single person who doesn't use AI somehow for either campaign development or character development or story ideas. That's a clear signal that we need to be embracing it. We need to do it carefully, we need to do it responsibly, we need to make sure we pay creators for their work, and we need to make sure we're clear when something is AI-generated."

Yet...

"Wizards of the Coast at large, at least so far, has been keen to emphasize that Dungeons & Dragons is a game about human creativity, made by actual people for actual people to play."

OpenAI has created a new large language model that they call "o1", which has been "trained with reinforcement learning to perform complex reasoning." "o1 thinks before it answers -- it can produce a long internal chain of thought before responding to the user."

"OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA)."

For comparison, GPT-4o was at the 11th percentile on the Codeforces programming competition. So between GPT-4o and o1 the improvement was from the 11th percentile to the 89th.

For AIME 2024, the improvement from GPT-4o to o1 was from 13.4% to 83.3% (percentage of problems solved).

For GPQA, accuracy went from 63.2% to 68.4% for biology, from 43.0% to 65.6% for chemistry, and from 68.6% to 94.2% for physics.

Quoting further:

"Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute). The constraints on scaling this approach differ substantially from those of LLM pretraining, and we are continuing to investigate them."
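
One intuition for why more test-time compute can help (this is an illustrative simulation only, not a description of how o1 works internally): aggregating many independent attempts beats a single attempt. Here a toy "solver" is right 60% of the time per attempt, and majority-voting over more attempts does measurably better:

```python
import random

# Toy simulation: accuracy of majority voting over n independent
# attempts, each correct with probability p_correct. All parameters
# are made up for illustration.
def majority_vote_accuracy(p_correct, n_samples, trials=10_000, seed=0):
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        correct_votes = sum(rng.random() < p_correct for _ in range(n_samples))
        if correct_votes > n_samples // 2:  # the majority got it right
            wins += 1
    return wins / trials

single = majority_vote_accuracy(0.6, 1)   # one attempt: ~60% accurate
voted = majority_vote_accuracy(0.6, 15)   # fifteen attempts, majority vote
assert voted > single
print(round(single, 2), round(voted, 2))
```

Chain-of-thought "thinking" is richer than naive voting, but the same principle applies: spending more compute at inference time buys accuracy.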

"Similar to how a human may think for a long time before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem. Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. It learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn't working. This process dramatically improves the model's ability to reason."

"Chain of thought reasoning provides new opportunities for alignment and safety. We found that integrating our policies for model behavior into the chain of thought of a reasoning model is an effective way to robustly teach human values and principles." "We believe that using a chain of thought offers significant advances for safety and alignment because (1) it enables us to observe the model thinking in a legible way, and (2) the model reasoning about safety rules is more robust to out-of-distribution scenarios."

"We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought allows us to 'read the mind' of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users."

"Vomitorium is a command-line tool designed to easily load an entire project into a single text file. It recursively scans directories, processes files, and compiles their contents into a single output file. Useful for working with LLMs."

You have to install npm to install it. "npm" stands for "Node.js package manager". Node.js is another one of those vomit-inducing things. Well, for me -- your mileage may vary, as the old expression goes.
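
For anyone allergic to Node.js, the core of what such a tool does fits in a few lines of Python. This is a hedged sketch of the general technique, not vomitorium's actual code (its file filtering and output format will differ):

```python
import os
import tempfile

# Hypothetical sketch: walk a project tree and concatenate matching
# files into one output file, with a header marking where each begins.
def dump_project(root, out_path, extensions=(".py", ".md", ".txt")):
    with open(out_path, "w", encoding="utf-8") as out:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                if not name.endswith(extensions):
                    continue  # skip binaries and other noise
                path = os.path.join(dirpath, name)
                out.write(f"\n===== {os.path.relpath(path, root)} =====\n")
                with open(path, encoding="utf-8") as f:
                    out.write(f.read())

# Tiny demo on a throwaway directory. The output file lives outside the
# tree so the walk doesn't sweep it up into itself.
root = tempfile.mkdtemp()
with open(os.path.join(root, "a.py"), "w") as f:
    f.write("print(1)\n")
out_path = os.path.join(tempfile.mkdtemp(), "dump.txt")
dump_project(root, out_path)
text = open(out_path, encoding="utf-8").read()
print(text)
```

Paste the resulting file into an LLM's context window and you have the whole project in one prompt, which is the entire pitch.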