Boulder Future Salon

Thumbnail
"EnCharge AI, a semiconductor startup developing analog memory chips for AI applications, has raised more than $100 million in a Series B round led by Tiger Global to spur its next stage of growth."

"Santa Clara-based EnCharge claims its AI accelerators use 20 times less energy to run workloads compared with other chips on the market, and expects to have the first of those chips on the market later this year."

What do y'all think? Do analog chips have a chance, or does Nvidia with CUDA have everybody locked in regardless?

Thumbnail
ChatGPT is trying to learn everything about you. I asked ChatGPT why I should learn Chinese, and it goes, "Since you're already interested in international music, learning Mandarin could help you appreciate C-pop on a deeper level. Also, given your love for truth-seeking, studying Chinese philosophy in its native language could be incredibly rewarding!"

And I was like, hey! I asked about international music in a *completely* *different* session! ChatGPT is remembering stuff you asked in different sessions and using that stuff in the current session. I actually liked the fact that each session started from scratch, from a blank slate.

Except not any more! Some brainiac at OpenAI got the idea that ChatGPT should become an expert on you and remember stuff from session to session. I would be fine with that if I asked for it or it was a feature I could turn on and off. But somebody just flipped it on without me asking or even knowing -- until now.

Later on it goes,

"Since you love music & international pop, try: Translating lyrics of your favorite C-pop songs, singing along to Chinese karaoke (KTV), and listening to Chinese radio stations (like CCTV or CRI)."

See below for receipts.

Brrrrrp! You can turn it off in the settings. (See below.)

Thumbnail
"CUTECat: Concolic execution for computational law."

By "computational law" they mean laws that specify computations -- for instance, "the amount of taxes that a household owes, the wages of civil state servants, or whether a given family is eligible to social benefits depending on their situation."

At first, I thought this was going to be an AI system for analyzing laws and figuring out what the correct computations should be. Nope, that's not what this is. This is about a programming language, Catala, designed to be *embedded* in the laws -- that is, intended to be readable by lawmakers and lawyers and computers alike -- plus a tool (CUTECat) for automatically testing programs written in it.

"We focus our attention on the application of concolic unit testing, a combination of concrete execution with SMT-based symbolic execution, which we believe to hit a sweet spot when it comes to reasoning about computational law."

The word "concolic" is a portmanteau of "concrete" and "symbolic". The idea is you use a "symbolic" analyzer of your computer program -- generally with an automated theorem prover or constraint solver -- but rathen that have it output whether your program has flaws or not, what it generates is test inputs, that you then actually run through the actual program -- these are the "concrete" inputs that make up the "concrete" part of the name.

"We show how to concolically execute computational law programs by providing a formal concolic semantics for default terms. Relying on this formal model, we then implement CUTECat, a concolic execution execution engine for Catala programs."

Catala is a "domain-specific programming language designed for deriving correct-by-construction implementations from legislative texts."

For example, the following legislation (from the US Tax Code) says:
"Section 132 - (c) Qualified employee discount defined - (1) Qualified employee discount
The term 'qualified employee discount' means any employee discount with respect to qualified property or services to the extent such discount does not exceed --
(A) in the case of property, the gross profit percentage of the price at which the property is being offered by the employer to customers"

gives us the following Catala code:

./section_132.catala_en
scope QualifiedEmployeeDiscount:
  definition qualified_employee_discount
  under condition is_property
  consequence equals
    if employee_discount >=
         customer_price * gross_profit_percentage
    then customer_price * gross_profit_percentage
    else employee_discount

"SMT" refers to "satisfiability modulo theories" solvers. In mathematics there is the concept of the Boolean satisfiability problem, which is called "SAT", and "SAT solvers" are programs that find solutions (approximate solutions for very large problems). "SMT" is a more general version that allows formulas involving real numbers, integers, strings, and complex data structures such as lists. The "m" standing for "modulo" means "within", and means a given SMT solver is "within" some formal theory.

The Catala programming language does not allow loops or recursion, which makes it much more amenable to analysis by an SMT solver than a typical computer program.

"In concolic execution, the program is executed with concrete inputs, and instrumented to collect symbolic constraints during execution. When one execution terminates, some of the constraints in the path condition are negated and new inputs are generated through a call to an SMT solver, leading to the exploration of a new program path."

"Compared to symbolic execution, concolic execution has several advantages. First, when faced with unsupported logical theories, concolic execution can still make progress by executing the program, instead of getting stuck when encountering complex constraints. Second, by relying on concrete inputs covering a wide range of program paths, concolic execution naturally provides a reproducible testbed that can be used for automated unit testing, even when symbolic execution would be imprecise due to complex program statements or calls to libraries whose code is unavailable. Last, while the generation of new inputs is typically performed by an automated solver to guarantee the exploration of new program paths, it can also be combined with other input generation techniques such as fuzzing."

CUTECat stands for... eh, I'm not sure. I'm guessing something like "Concolic Unit Testing Engine + Catala".

"CUTECat aims to detect all runtime errors, such as divisions by zero but also law inconsistencies due to missing, or conflicting interpretations. CUTECat includes several optimizations, both to improve its performance and scalability, as well as its usability, aiming to generate human-friendly testcases to facilitate legal interpretation by lawyers. Finally, we experimentally evaluate CUTECat on a range of Catala programs. Relying on real-world codebases from French and US laws, we first perform an ablation study to evaluate the impact of our different optimizations. We then conclude by empirically demonstrating that CUTECat can scale to the largest real-world Catala codebase currently available, namely, the implementation of the French housing benefits (19,655 lines of Catala, including specification), generating 186,390 tests in less than 7 hours of CPU time."


"To allow integration of Catala implementations in existing projects, the compiler translates Catala source code into mainstream programming languages like C or Python. To do so, it relies on a series of intermediate representations (IRs), elided here, until reaching the default calculus."

The term "default calculus" has to do with the idea that the Catala programming language uses "default logic", which is a system of logic where instead of specifying all true/false combinations for something, you specify a default and then just mention the exceptions.

Example: case 2: "income is $0 and there are three children. In that case, both exceptions (lines 35-38 and 46-49) can be applied. A conflict is raised to the user, showing that the law has been unfaithfully translated, or that it is ambiguous."

"The default calculus is one of the last IRs in the compilation pipeline, and is very close to the statement calculus which transforms expressions into statements before emitting C or Python code. Default terms are explicitly materialized in the default calculus; it is where the reference Catala interpreter operates." "This is therefore a natural choice for the implementation of our concolic engine; additionally, being located at the same compilation step of the interpreter nullifies any discrepancies of potential bugs introduced earlier in the compilation pipeline. We implemented our concolic interpreter as a fork of the standard interpreter, instrumenting it to collect symbolic constraints."

"CUTECat is implemented in 3,400 lines of OCaml code, and relies on the Z3 SMT solver."

3,400 -- that's pretty small.

Thumbnail
Boom Supersonic is building a plane called Overture that flies supersonic without its sonic boom ever reaching the ground. Obviously they have a sense of humor, naming the company that makes a boomless supersonic plane "Boom Supersonic".

"In Mach cutoff, the sonic boom refracts in the atmosphere and never reaches the ground. Exact speeds and altitude vary based on atmospheric conditions."

"Overture's advanced autopilot will continuously optimize speed for Boomless Cruise based on real-time atmospheric conditions. Boomless Cruise is possible at speeds up to Mach 1.3, with typical speed between Mach 1.1 and 1.2."

"Boomless Cruise leverages well-known Mach cutoff physics, where a sonic boom refracts upward due to temperature and wind gradients affecting the local speed of sound. This is similar to how light bends when passing through a glass of water. By flying at a sufficiently high altitude at an appropriate speed for current atmospheric conditions, Overture ensures that its sonic boom never reaches the ground."

"Unlike 'low boom' designs, Boomless Cruise does not require extensive aircraft shaping. Instead, it relies on the ability to break the sound barrier at a high enough altitude where the boom refracts harmlessly away from the ground."

"On Overture, Boomless Cruise is specifically enabled by the Symphony engines. These engines feature enhanced transonic performance compared to commercially derived engines, allowing Overture to efficiently transition to supersonic speeds at altitudes above 30,000 feet."

"Additionally, Boomless Cruise is enabled by an advanced autopilot which uses current weather conditions and software algorithms to automatically select the optimal speed for Boomless Cruise."

"Boomless and low boom are distinct concepts. In Boomless Cruise, Overture's sonic boom does not reach the ground. However, at its full Mach 1.7 speed over water, Overture will generate a sonic boom."

"Overture is designed to be most efficient at high subsonic speeds of Mach 0.94 and supersonic speed of Mach 1.7. As aerodynamic drag increases closer to speed of sound, there is a modest increase in fuel burn at low supersonic speeds. Even with this drag penalty, Overture still has plenty of range to fly the longest transcontinental routes, such as Vancouver-Miami."

Thumbnail
YCGPT: Your Y Combinator Advisor. For those of you who are starting startups and thinking of applying to YC.

Thumbnail
YouDJ. I guess this has been around for a while but I never saw it until today. This is pretty awesome, and pretty impressive that it all runs in a browser. Anyone can be a DJ; just go through the tutorial boxes.

Thumbnail
New research on AI and code quality. The people who did the study last year showing AI increases "churn" did a follow-up study this year.

If you were thinking, maybe AI models getting smarter and smarter means that AI-generated code is getting better and needs to be fixed less, well...

In this study, the researchers analyzed 211 million lines of code from more than 100 open-source repositories on GitHub, and code "churn", which is defined as changing a line of code within 2 weeks of adding it, increased from 3.31% in 2021 (the year before ChatGPT came out) to 4.50% in 2022, to 5.67% in 2023, to 6.87% in 2024.

If a line of code that is "churned" is "churned" again, it is not counted twice -- the "churn" number counts only changes -- only the first change -- to a line of code within 2 weeks of it being added.

The theory is that the higher the "churn" rate, the more likely the code added was bad to start with. The churn rate appears to be going upward as more and more developers use AI to generate more and more code, instead of going downward as AI models get better, as you may have thought (or hoped).

In addition to the churn rate, this study measured lines added, lines deleted, lines updated, lines moved, lines "copy/pasted", and lines "find/replaced". I put "copy/pasted" in quotes because what is likely happening is the AI system is generating the same (or very similar) lines of code over and over. The researchers developed a system to detect the same exact sequence of 5 or more lines of code appearing over and over. "Find/replaced" is defined as "a pattern of code change where the same string is removed from 3+ locations and substituted with consistent replacement content."
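
As a rough illustration of that duplicate-block detection, here's a naive sliding-window sketch in Python that flags any run of 5 identical lines appearing in more than one place. This is my own toy approach (the researchers' actual detector is surely more sophisticated about whitespace, ordering, and near-matches):

from collections import defaultdict

WINDOW = 5

def find_repeated_blocks(files):
    # files maps a filename to its list of source lines.
    # Returns each WINDOW-line block that appears at two or more locations.
    seen = defaultdict(list)
    for name, lines in files.items():
        stripped = [line.strip() for line in lines]
        for i in range(len(stripped) - WINDOW + 1):
            block = "\n".join(stripped[i:i + WINDOW])
            seen[block].append((name, i + 1))
    return {block: locations for block, locations in seen.items() if len(locations) > 1}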

Lines added went from 40.86% (of changed lines, not of total lines in a codebase) in 2021 to 49.32% in 2024. Deleted went from 21.17% to 23.00%. Updated went from 5.57% to 6.33%. Moved went from 15.88% to 3.10%. Copy/pasted went from 10.64% to 13.90%. Find/replaced went from 4.11% to 4.35%.

Most striking in all this is that "moved" went down so much, while "added" increased quite a bit and "copy/pasted" went up, too.

What this means is that developers are doing dramatically less restructuring of their codebases, which in the industry is called "refactoring". Developers are doing more and more code adding, and less and less refactoring.

The researchers discuss "the scourge of duplicated code blocks."

"'So what?' an executive may wonder. So long as the code works, does it really matter that it might be repeated? Both first principles and prior research suggests that it does matter."

"From first principles, it is evident that repeated blocks impose the burden of deciding: should a cloned block change be propagated to its matching siblings? When a conscientious developer deems it prudent to propagate their change, their scope of concern is multiplied by the number of duplicated blocks."

"Instead of focusing on the specific Jira they had been assigned, the developer must now tack-on the effort of understanding every system in the repo that duplicates the code block they are changing."

By "specific Jira", I think they mean specific issue in Jira, a bug tracking, issue tracking, and general agile project management software product from Atlassian. It's probably the one the researchers themselves use, which is why they got into the habit of saying "specific Jira" while forgetting when writing this paper that there are lots of bug tracking systems and not everyone uses Jira (I don't). Let's continue...

"Likewise for every developer who reviews the pull request that changes code from disparate domains. They, too, must acquaint themselves with every context where a duplicated block was changed, to render opinion on whether the change in context seems prone to defect."

"Next, each changed code domain must be tested. Picking the method to test these 'prerequisite sub-tasks' saps the mental stamina of the developer. And because the impromptu chore of 'updating duplicate blocks' is rarely-if-ever budgeted into the team's original Story Point estimate, duplication is a significant obstacle to keeping on schedule."

"Story Point" is more Jira-specific terminology. The idea is you estimate the time it takes to implement a change (new feature or bug fix) to software in "story point" units instead of calendar time, because if the developer has to do tasks unrelated to the software change (emails, meetings, etc), then the calendar time won't be meaningful. For non-Jira people, the point being made here is that a developer is budgeted a certain amount of time to implement a change, and the unexpected discovery that the code they are changing is duplicated all over the place causes the time budget to likewise get unexpectedly exceeded.

"As a developer falls further behind their target completion time, their morale is prone to drop further. From the developer's perspective, this task that had seemed so simple, has now ballooned into systems where they might have little-or-no-familiarity. All this burden, without any observable benefit to teammates or executives."

"'Exploring the Impact of Code Clones on Deep Learning Software' is a 2023 paper by Ran Mo, Yao Zhang et. al that seeks to quantify the impact of code clones in software that utilizes deep learning. With regard to the prevalence of duplicated code, the researchers find that '[Deep Learning projects exhibit] about 16.3% of code fragments encounter clones, which is almost twice as large as traditional projects.' By analyzing 3,113 pairs of co-changed code lines, the researchers observe that 57.1% of all co-changed clones are involved in bugs."

"The Citations offer a multitude of research that has shown clones in traditional software systems prone to cause bugs."

So, the bottom line is: AI enables developers to produce new code substantially faster at the cost of long-term maintainability. Code is added roughly 20% faster, but "churn" is roughly 50% higher, code duplication is roughly 30% higher, and refactoring is roughly 80% lower.

Bill Harding, the lead researcher, is interviewed in the video. You have to fork over your email to his company (GitClear) to get the actual paper. (Which I did. I have a copy of it, that's where I got the numbers above.)

"This was the first year that we switched over from having more moved and refactored code to more copy pasted code. 2024 was I guess a dubious milestone in that regard, and that seems very consequential for teams that are going to have a long-lived repo."

Thumbnail
A young "technologist" known online as "Big Balls" who works for Elon Musk's "so-called" Department of Government Efficiency (DOGE) briefly worked at Path Network, a network monitoring firm known for hiring reformed criminal hackers (including Eric Taylor, also known as "Cosmo the God," allegedly a member of the hacker group UGNazis, and Matthew Flannery, an Australian convicted hacker allegedly a member of the hacker group LulzSec), allegedly solicited a cyberattack-for-hire service -- or not, maybe that was someone else using his Telegram handle -- founded a company called Tesla.Sexy LLC, which offers a service called Helfie, which is an AI bot for Discord servers targeting the Russian market, and another called faster.pw, that is currently inactive but an archived version shows the service provided "multiple encrypted cross-border networks" in Chinese, and worked at Elon Musk's Neuralink brain implant startup before joining DOGE, enabling him to circumvent the requirement for a security clearance -- at the age of 19. What movie am I watching?

Thumbnail
Click-through rates on Google go way down when AI Overviews are shown -- but go up when a particular brand is mentioned in the AI Overview and that brand has a link (paid or not). Not only that, but paid click-through rates are declining over time with or without AI Overviews. But the links that aren't paid -- the "organic" links -- haven't seen a decline and may actually increase when there is no AI Overview.

Thumbnail
"Agent Mode" for GitHub Copilot.

The video demos a project that has a website with a page listing races, and GitHub Copilot is asked to add the ability to search for a race by name. (For those of you who would rather read than watch a video, see below.) To do this, it has to change services code, server-side code, UI code, and test code. "Agent Mode" reasons, and iterates on that reasoning, to perform the task. Once the code is updated, it prompts the user to re-run the tests, which fail because they don't cover the new functionality; Copilot detects the failures and updates the tests. The process repeats until all tests pass.

Copilot is subsequently asked to associate races with fundraisers. This time Copilot is given a prompt file (in markdown format) that tells Copilot everything it is supposed to do.

Thumbnail
AlphaGeometry 2 with AlphaProof from DeepMind claims to have "solved four out of six problems from this year's International Mathematical Olympiad (IMO), achieving the same level as a silver medalist in the competition for the first time."

Thumbnail
There's a city in China you've never heard of that has 36 million people -- the largest in the world. It's not Shanghai or Beijing, it's Chongqing... or so this YouTuber (Drew Binsky) says.

Well, citypopulation.de says Chongqing is number 43, right after... eh, do I really have to list out 42 cities? It would be much simpler if it was just a handful. Oh, well, let's do this. Chongqing is at number 43, at 10.9 million, right after: Changsha (11.0 million), Hyderabad (11.4 million), Tianjin (11.5 million), Paris (11.5 million), Lima (11.8 million), Wuhan (12.2 million), Rio de Janeiro (12.5 million), Chennai (12.6 million), Xi'an (12.8 million), Ho Chi Minh City (Saigon) (13.9 million), Hangzhou (13.9 million), Bengaluru (Bangalore) (14.2 million), Lahore (14.5 million), Johannesburg (14.6 million), Xiamen (14.9 million), London (14.9 million), Kinshasa (15.6 million), Istanbul (15.9 million), Tehran (16.5 million), Buenos Aires (16.7 million), Los Angeles (17.2 million), Chengdu (17.3 million), Osaka (17.7 million), Kolkata (Calcutta) (17.7 million), Moskva (Moscow) (19.1 million), Lagos (20.7 million), Karachi (20.9 million), Krung Thep (Bangkok) (21.2 million), Beijing (21.2 million), New York (22.0 million), São Paulo (22.1 million), Dhaka (22.5 million), Al-Qahirah (Cairo) (22.5 million), Seoul (25.1 million), Ciudad de México (Mexico City) (25.1 million), Mumbai (27.1 million), Manila (27.2 million), Jakarta (29.2 million), Delhi (34.6 million), Shanghai (40.8 million), Tokyo (41.0 million), and (drum roll please...) Guangzhou (70.1 million).

If you've never heard of Guangzhou, you probably have heard of Shenzhen. Well, what happened is a bunch of cities in the Northern Pearl River Delta in China grew to the point where they basically all merged together into a single metropolitan area. These include Shenzhen, Dongguan, Foshan, Huizhou, Jiangmen, Zhongshan, and, yes, Guangzhou, which, for some reason, became the name for the whole agglomeration, rather than Shenzhen.

It seems to me the reason he thinks Chongqing has more people than Guangzhou is that Chongqing is *denser*. The subjective feeling of "population" probably correlates mostly with population density. According to the numbers, Los Angeles has more people, with 17.2 million, but I've been to Los Angeles (it's the largest city I've been to), and LA is the definition of "suburban sprawl". It's a place where you can drive for 3.5 hours in one direction and see nothing but houses and shopping malls. I've never been to New York, the largest city in the US, or Chicago, the 3rd largest, but from videos it doesn't seem like they go "3D" and stack people on top of each other to the degree Chongqing does.

They say "5D" but I don't know where they're getting 4th and 5th dimensions -- it looks 3D to me.

I found a list of cities by population density, but, weirdly, when I look at a bunch of those top places on Google Street View (Port-au-Prince, Giza, Manila, Mandaluyong, Malé, Dhaka, Bnei Brak, Kolkata, Kathmandu), they look less crowded than Chongqing in the video. Well, I suppose there's always the possibility the statistics on citypopulation.de are wrong? And maybe Chongqing really is the world's biggest city, like the YouTuber insists? How are all these people counted, anyway?

Thumbnail
Find stuff in aerial images.

GeoDeep is "a fast, easy to use, lightweight Python library for AI object detection and semantic segmentation in geospatial rasters (GeoTIFFs), with pre-built models included."

See the example images of where it puts bounding boxes on cars, swimming pools, and tennis courts, and draws an outline that follows roads. It has a "detect" system that outputs class labels with confidence scores (class labels like "small vehicle" or "tennis court", confidence scores like 0.838), and a "segment" system that gives you outlines of roads and buildings.

Thumbnail
Physical unclonable function (PUF) technology. This article is from last November but I only just saw it today, and I'm sharing it anyway because I only now discovered this technology exists.

In cryptography, you can do something called "challenge-response": you generate some random input (the challenge), and the device combines it with a secret key using some algorithm to generate an output (the response), which you can check to see if it's correct and the device is authentic. This relies on both the challenger and the hardware device knowing the shared secret, and attackers not knowing it.
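
For the curious, here's a minimal challenge-response sketch using Python's standard library and an HMAC over a shared secret -- just to illustrate the protocol shape, not the PUF-based scheme the article describes:

import hashlib, hmac, os, secrets

SHARED_KEY = secrets.token_bytes(32)   # provisioned into both the verifier and the device

def device_response(challenge, key=SHARED_KEY):
    # The device mixes the challenge with its secret key.
    return hmac.new(key, challenge, hashlib.sha256).digest()

def verify(challenge, response, key=SHARED_KEY):
    # The verifier recomputes the expected response and compares in constant time.
    return hmac.compare_digest(device_response(challenge, key), response)

challenge = os.urandom(16)                              # fresh random challenge
print(verify(challenge, device_response(challenge)))    # True for an authentic device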

If an attacker should get their hands on the physical device, though, they can copy it. If the key is stored in ROM (read-only memory), it can simply be copied out of the memory by the attacker and then they can make unlimited copies of the device by assembling the same components and making ROM with the same key.

What physical unclonable function (PUF) technology does is make this kind of cloning practically impossible. PUFs exploit random physical factors introduced during semiconductor manufacturing that are unpredictable and uncontrollable. As such, even an attacker with access to the same semiconductor manufacturing equipment as the original manufacturer cannot manufacture an exact copy.

"Due to deep submicron manufacturing process variations, every transistor in an IC has slightly different physical properties. These variations lead to small but measurable differences in electronic properties, such as transistor threshold voltages and gain factor. Since these process variations are not fully controllable during manufacturing, these physical device properties cannot be copied or cloned."

"By utilizing these inherent variations, PUFs are very valuable for use as a unique identifier for any given IC. They do this through circuitry within the IC that converts the tiny variations into a digital pattern of 0s and 1s, which is unique for that specific chip and is repeatable over time. This pattern is a 'silicon fingerprint,' comparable to its human biometric counterpart."

The article is about a particular type of PUF called an SRAM PUF. Synopsys, a company that makes software used for designing chips, offers a circuit block you can combine with an SRAM design (before manufacturing, at the "intellectual property" -- IP -- stage) so that the chip generates a "root key" from the SRAM's power-up state when the device is first started up.
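
Since the raw SRAM power-up pattern is a little noisy from one read to the next, real SRAM PUFs pair it with error correction ("helper data") to get a stable key. Here's a toy illustration of the stabilization idea using simple majority voting over repeated readouts -- my own simplification, not how the actual IP works:

import random

TRUE_FINGERPRINT = [random.getrandbits(1) for _ in range(64)]   # each cell's innate bias

def noisy_readout(bit_error_rate=0.05):
    # Each power-up read flips a few bits at random.
    return [bit ^ (random.random() < bit_error_rate) for bit in TRUE_FINGERPRINT]

def stable_fingerprint(readouts):
    # Majority vote per bit position across all readouts.
    return [1 if sum(bits) * 2 > len(readouts) else 0 for bits in zip(*readouts)]

reconstructed = stable_fingerprint([noisy_readout() for _ in range(15)])
print(reconstructed == TRUE_FINGERPRINT)   # almost always True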

Thumbnail
"Ex-Google, Apple engineers launch unconditionally open source Oumi AI platform that could help to build the next DeepSeek."

"What neither DeepSeek nor Llama enables, however, is full unconditional access to all the model code, including weights as well as training data. Without all that information, developers can still work with the open model but they don't have all the necessary tools and insights to understand how it really works and more importantly how to build an entirely new model. That's a challenge that a new startup led by former Google and Apple AI veterans aims to solve."

"Launching today, Oumi is backed by an alliance of 13 leading research universities including Princeton, Stanford, MIT, UC Berkeley, University of Oxford, University of Cambridge, University of Waterloo and Carnegie Mellon. Oumi's founders raised $10 million, a modest seed round they say meets their needs. While major players like OpenAI contemplate $500 billion investments in massive data centers through projects like Stargate, Oumi is taking a radically different approach. The platform provides researchers and developers with a complete toolkit for building, evaluating and deploying foundation models."

The $10 million makes me wonder if this has a chance of working. But, let's continue. (Lots of quotes follow.)

"Oumi is a fully open-source platform that streamlines the entire lifecycle of foundation models -- from data preparation and training to evaluation and deployment. Whether you're developing on a laptop, launching large scale experiments on a cluster, or deploying models in production, Oumi provides the tools and workflows you need."

"With Oumi, you can: Train and fine-tune models from 10M to 405B parameters using state-of-the-art techniques (SFT, LoRA, QLoRA, DPO, and more), work with both text and multimodal models (Llama, DeepSeek, Qwen, Phi, and others), synthesize and curate training data with LLM judges, deploy models efficiently with popular inference engines (vLLM, SGLang), evaluate models comprehensively across standard benchmarks, run anywhere - from laptops to clusters to clouds (AWS, Azure, GCP, Lambda, and more), and integrate with both open models and commercial APIs (OpenAI, Anthropic, Vertex AI, Together, Parasail, ...).

"All with one consistent API, production-grade reliability, and all the flexibility you need for research."

"Here are some of the key features that make Oumi stand out:"

"Zero Boilerplate: Get started in minutes with ready-to-use recipes for popular models and workflows. No need to write training loops or data pipelines,"

"Enterprise-Grade: Built and validated by teams training models at scale,"

"Research Ready: Perfect for ML research with easily reproducible experiments, and flexible interfaces for customizing each component,"

"Broad Model Support: Works with most popular model architectures - from tiny models to the largest ones, text-only to multimodal,"

"State-Of-The-Art (SOTA) Performance: Native support for distributed training techniques (FSDP, DDP) and optimized inference engines (vLLM, SGLang),"

"Community First: 100% open source with an active community. No vendor lock-in, no strings attached."

Thumbnail
"Google's updated, public AI ethics policy removes its promise that it won't use the technology to pursue applications for weapons and surveillance."