Boulder Future Salon

Thumbnail
During the Cold War, East Germany invented a "chemically hardened" glass. After the USSR collapsed and East Germany reunified with West Germany, the company, "Superfest Gläser", was closed. It turned out that in the "capitalist" economy of the West, nobody wanted beer glasses that never break -- you make more money by selling glasses that do break.

There's a lesson there relevant to the future: understanding when the market lacks incentives for producing durable products, or actively favors the reverse, "planned obsolescence".

Apparently the East German Superfest Gläser chemically hardened glass was chemically different from Pyrex, the closest similar product we have in our economy. According to the article, the East German Superfest Gläser glass used a special potassium chloride solution that fused with the glass surface, filling in micro-ruptures within the glass structure and making the glass less prone to breaking. Pyrex, in contrast, is "borosilicate" glass, so called because it's made by combining regular glass with boric oxide. Pyrex's claim to fame isn't actually its hardness, it's its "low thermal expansion", which makes it ideal for measuring containers whose measurement lines don't move around when the container is heated.

The East German glass was called "Ceverit".

"'Ce' stood for Chemisch (Chemically), 'ver' for verfestigt (hardened) and the 'it' stood for the silica component."

Thumbnail
So the claim is being made now that you can take any image -- a photo you've just taken on your phone, a sketch that your child (or you) just drew, or an image you generated using, say, Midjourney or DALL-E 3 -- and hand it to an AI model called Genie that will take the image and make it "interactive". You can control the main character and the scene will change around it. A tortoise made of glass, maybe, or a translucent jellyfish floating through a post-apocalyptic cityscape.

"I can't help but point out the speed with which many of us are now becoming accustomed to new announcements and how we're adjusting to them."

"OpenAI Sora model has been out for just over a week and here's a paper where we can imagine it being interactive."

The Genie model is a vision transformer (ViT) model. That means it incorporates the "attention mechanism" we call "transformers" in its neural circuitry. That doesn't necessarily mean it "tokenizes" video, like the Sora model, but it does that, too. It also uses a particular variation of the transformer called the "ST-transformer" that is supposed to be more efficient for video. They don't say what "ST" stands for, but I'm guessing "spatial-temporal". It contains neural network layers dedicated to either spatial or temporal attention processing. This "ST" vision transformer was key to the creation of the video tokenizer: what they did to create the tokenizer was take a "spatial-only" tokenizer (something called VQ-VAE) and modify it to do "spatial-temporal" tokenization. (They call their tokenizer ST-ViViT.)
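To make the spatial-temporal idea concrete, here's a minimal sketch of what a factorized attention block could look like. This is my own illustration in PyTorch, not the Genie authors' code, with layer norms and causal masking omitted for brevity:

```python
import torch
import torch.nn as nn

class STBlock(nn.Module):
    """One illustrative spatial-temporal transformer block: attention over the
    tokens within each frame (spatial), then attention across frames at each
    token position (temporal). A sketch, not the Genie authors' code."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):                      # x: (batch, time, space, dim)
        b, t, s, d = x.shape
        # Spatial attention: each frame attends only within itself.
        xs = x.reshape(b * t, s, d)
        xs = xs + self.spatial_attn(xs, xs, xs, need_weights=False)[0]
        x = xs.reshape(b, t, s, d)
        # Temporal attention: each spatial position attends across time.
        xt = x.permute(0, 2, 1, 3).reshape(b * s, t, d)
        xt = xt + self.temporal_attn(xt, xt, xt, need_weights=False)[0]
        x = xt.reshape(b, s, t, d).permute(0, 2, 1, 3)
        return x + self.ff(x)                  # position-wise feed-forward
```

The efficiency win is that full attention over all t x s tokens scales with (t x s) squared, while doing a spatial pass and a temporal pass separately scales with t x s^2 plus s x t^2, which is much cheaper for long videos.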

After this there are two more neural network models. One of them takes the original frames, and one takes the video tokens and the output from the first model.

The first model is called the "latent action model". It takes video frames as input. Remember, "latent" is just another word for "hidden". This is a neural network that is trained by watching videos all day. As it watches, it is challenged to predict later frames of video from the frames that came before. In the process, it is asked to generate some parameters that describe what is being predicted. These are called the "latent actions". The idea is that if you are given a video frame and the corresponding "latent actions", you can predict the next frames.

The second model is called the "dynamics" model. It takes tokens and the "latent actions" from the first model, and outputs video tokens.

Once all these models are trained up -- the tokenizer, the latent action model, and the dynamics model -- you're ready to interact.

You put in a photo of a tortoise made of glass, and now you can control it like a video game character.

The image you input serves as the initial frame. It gets tokenized, and everything is tokenized from that point onward. The system can generate new video in a manner analogous to how a large language model generates new text by outputting text tokens. The key, though, is that by using the keyboard to initiate actions, you're inputting actions directly into the "latent actions" parameters. Doing so alters the video tokens that get generated, which alters all the subsequent video after that.
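Putting the pieces together, the interactive loop amounts to: tokenize the starting image, read a keypress, feed the tokens plus that action to the dynamics model, decode the new tokens into a frame, and repeat. Here's a toy sketch of that data flow; the class and function names are stand-ins I made up, not a real API:

```python
from typing import Callable, List

class DummyTokenizer:
    """Stand-in for the trained video tokenizer (made up for illustration)."""
    def encode(self, frame: str) -> str:
        return f"tokens[{frame}]"

class DummyDynamics:
    """Stand-in for the trained dynamics model (made up for illustration)."""
    def predict(self, tokens: str, action: int) -> str:
        return f"tokens[{tokens} + action {action}]"

def detokenize(tokens: str) -> str:
    """Stand-in for decoding video tokens back into pixels."""
    return f"frame<{tokens}>"

def play(initial_frame: str, get_action: Callable[[], int], steps: int = 3) -> List[str]:
    tokenizer, dynamics = DummyTokenizer(), DummyDynamics()
    frames = [initial_frame]
    tokens = tokenizer.encode(initial_frame)        # the input image is frame one
    for _ in range(steps):
        action = get_action()                       # keyboard press -> latent action id
        tokens = dynamics.predict(tokens, action)   # dynamics model emits the next tokens
        frames.append(detokenize(tokens))           # decode tokens back into a frame
    return frames

print(play("glass_tortoise.png", get_action=lambda: 2))
```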

The researchers trained it on videos of 2D platformer games.

Thumbnail
LALALAND claims to be a modeling agency with no human models -- all AI.

"Showcase your 3D designs onto our industry-leading generative AI-models."

Thumbnail
AIflixhub aims to be your AI-generated movie platform. Commercial product with free tier. They say you can upload your existing assets, such as video clips, dialogue, sound effects, and music tracks. The system can combine these, and generate more, to produce your movie masterpiece. They say their AI tools can craft scripts, generate imagery, synthesize videos, create spoken dialogue, design sound effects, and compose soundtracks.

Thumbnail
GPTsApp.io claims to have 703,713 custom GPTs you can use or buy.

Makes me wonder how this compares with OpenAI's own GPT store (below)?

Thumbnail
The claim is being made that a scientific research paper in which every figure was AI-generated passed peer review.

Thumbnail
How "top heavy" is YouTube? 65% of videos have fewer than 100 views. 87% have fewer than 1,000 views. Only 3.7% of videos exceed 10,000 views, which is the threshold for monetization. Those 3.7% of views get 94% of views. The top 0.16% of videos get 50% of video views.

In other words, video views on YouTube follow a power law distribution, as you might have expected, but it's a lot steeper than you might have expected.

How was this figured out? Using a new but simple technique called "dialing for videos".

You may not realize it, but those YouTube IDs that look like a jumble of letters and numbers, like "A-SyeJaMMjI", are actually numbers. Yes, all YouTube video IDs are actually numbers. They're just not written in base 10. They're 64-bit numbers written in base 64. If you're wondering how YouTube came up with 64 digits, think about it: digits 0-9 give you 10, then lowercase letters a-z give you 26 more, bringing you up to 36, then uppercase letters A-Z give you 26 more, getting you up to 62. You still need 2 more to get to 64, and YouTube chose the dash ("-") and the underscore ("_").
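As a concrete illustration, here's a small Python function that writes a 64-bit number as an 11-character ID. The digit order below (0-9, a-z, A-Z, then dash and underscore) follows the description above; whether YouTube maps digits to characters in exactly this order is an assumption for illustration.

```python
import random

# Assumed digit order, following the description above: 0-9, a-z, A-Z, "-", "_".
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_"

def encode_id(n: int, length: int = 11) -> str:
    """Write a 64-bit integer as an 11-character base-64 string."""
    chars = []
    for _ in range(length):
        n, digit = divmod(n, 64)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))

# Prints an 11-character ID with the same shape as "A-SyeJaMMjI".
print(encode_id(random.getrandbits(64)))
```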

Because the numbers are randomly chosen 64-bit numbers, there are 2^64 possibilities, which in decimal is 18,446,744,073,709,551,616. That's much too large to try every number or even numbers at random. But the researchers discovered a quirk. Through the YouTube API they could do searches, and YouTube would do the search in a case-insensitive way. Well, except not for the last character, for some reason. And it would allow 32 IDs to be searched on in the same query. So the researchers were able to find 10,000 videos (well, 10,016 actually) by doing millions of searches. This collection of 10,000 videos is likely to be more representative of all of YouTube than any other sample academic researchers have ever had. All previous attempts have resulted in biased results because they were influenced either by the recommendation system, personalized search results, or just whatever secret algorithms YouTube uses to rank the videos it lets you find.
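The size estimate then falls out of simple proportions: if some fraction of randomly drawn IDs turn out to be real videos, you scale that fraction up to the full 2^64 ID space. A simplified sketch of the arithmetic -- the 10,016 hit count is from the article, but the number of IDs effectively checked below is a placeholder I made up, and the real analysis also corrects for the case-insensitive matching quirk:

```python
# Simplified estimator: scale the observed hit rate up to the full 64-bit ID space.
# The real paper also corrects for the case-insensitive matching quirk, ignored here.
ID_SPACE = 2 ** 64

def estimate_total_videos(hits: int, ids_checked: int) -> float:
    return ID_SPACE * hits / ids_checked

# 10,016 hits is from the article; the ids_checked figure is a made-up placeholder.
print(f"{estimate_total_videos(10_016, int(1.9e13)):.2e}")   # on the order of 10 billion
```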

How big is YouTube? Their estimate is 9.8 billion videos. Or at least that's how big it was between October 5, 2022, and December 13, 2022, which is when they did their data collection. Their paper was finally published last December.

By looking at what percentage of their sample were uploaded in any given year, they can chart the growth of YouTube:

Year - Percentage of sample
2005 - 0.00%
2006 - 0.05%
2007 - 0.22%
2008 - 0.43%
2009 - 0.74%
2010 - 1.13%
2011 - 1.67%
2012 - 1.86%
2013 - 1.97%
2014 - 2.34%
2015 - 3.02%
2016 - 4.25%
2017 - 5.39%
2018 - 6.73%
2019 - 8.81%
2020 - 15.22%
2021 - 20.29%
2022 - 25.91%

Translating those percentages into cumulative totals, in millions of videos (remember, a thousand million is a billion), we get this list (a quick check of the arithmetic follows it):

2005 - 0
2006 - 5
2007 - 27
2008 - 69
2009 - 142
2010 - 254
2011 - 418
2012 - 602
2013 - 796
2014 - 1,072
2015 - 1,325
2016 - 1,745
2017 - 2,278
2018 - 2,943
2019 - 3,813
2020 - 5,316
2021 - 7,321
2022 - 9,881
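Those cumulative figures follow from the yearly percentages and the 9.8 billion total. A quick check in Python; small differences from the list above presumably come from rounding in the reported percentages versus the underlying unrounded data:

```python
# Rebuild the cumulative totals (millions of videos) from the yearly
# percentages above and the ~9.8 billion overall estimate.
TOTAL_MILLIONS = 9_881

shares = {
    2005: 0.00, 2006: 0.05, 2007: 0.22, 2008: 0.43, 2009: 0.74,
    2010: 1.13, 2011: 1.67, 2012: 1.86, 2013: 1.97, 2014: 2.34,
    2015: 3.02, 2016: 4.25, 2017: 5.39, 2018: 6.73, 2019: 8.81,
    2020: 15.22, 2021: 20.29, 2022: 25.91,
}

running = 0.0
for year, pct in shares.items():
    running += pct
    print(year, round(TOTAL_MILLIONS * running / 100))
```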

73% of videos had no comments. 1.04% of videos had 100 comments or more, and those accounted for 55% of all comments in the sample.

"Likes" are evn more skewed, with 0.08% of videos getting 55% of likes.

YouTube made "Dislike" counts private in 2021.

Most channels had at least one subscriber and the average was 65. Subscriber counts, while less "top heavy", turned out to be weakly correlated with views. The researchers estimate 70% of views of any given video come from algorithms and not from subscribers or external links pointing to a video.

Median video length was 615 seconds (10 minutes, 15 seconds). 6.2% were 10 seconds or less, 38% were 1 minute or less, 82% were ten minutes or less, and only 3.9% were an hour or more.

Words that occurred most in metadata tags included "Sony" and "Playstation".

The researchers employed hand-coders to hand-code a subsample of 1,000 videos. They found only 3% of videos had anything to do with news, politics, or current events. 3.8% had anything to do with religion. 15.5% had just still images for the video part. (I actually see a lot of music videos like this -- just an album cover or photo of the artist, and the rest is audio.) 19.5% were streams of video games. 8.4% were computer-generated but not a video game. 14.3% had a background design indicating they were produced on some sort of set. 84.3% were edited. 36.7% had text or graphics overlaid on the video. 35.7% were recorded indoors. 18.1% were recorded outdoors. (The remainder were both or unclear.) Cameras were "shaky" 52.3% of the time. A human was seen talking to the camera 18.3% of the time. 9.1% of videos recorded a public event. The video was something obviously not owned by the uploader, such as a movie clip, 4.8% of the time.

Sponsorships and "calls to action" were only present in 3.8% of videos.

96.8% of videos had audio. 40.5% were deemed by coders to be entirely or almost entirely music. Many of these were backgrounds for performances, video game footage, or slide shows.

53.8% had spoken language. 28.9% had spoken language on top of music.

For languages, "we built our own language detection pipeline by running each video's audio file using the VoxLingua107 ECAPA-TDNN spoken language recognition model."
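For the curious, that model is publicly available through SpeechBrain. A minimal usage sketch, following SpeechBrain's published example for this model; the exact import path and return format may vary by version, so treat the details as approximate:

```python
# Spoken-language ID with the VoxLingua107 ECAPA-TDNN model via SpeechBrain.
# Based on SpeechBrain's published usage example; details may vary by version.
from speechbrain.pretrained import EncoderClassifier

language_id = EncoderClassifier.from_hparams(
    source="speechbrain/lang-id-voxlingua107-ecapa",
    savedir="pretrained_models/lang-id-voxlingua107-ecapa",
)

# "audio.wav" is a placeholder path to a video's extracted audio track.
out_prob, score, index, text_lab = language_id.classify_file("audio.wav")
print(text_lab)   # predicted language label
```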

Language distribution was:

English: 20.1%
Hindi: 7.6%
Spanish: 6.2%
Welsh: 5.7%
Portuguese: 4.9%
Latin: 4.6%
Russian: 4.2%
Arabic: 3.3%
Javanese: 3.3%
Waray: 3.2%
Japanese: 2.2%
Indonesian: 2.0%
French: 1.8%
Icelandic: 1.7%
Urdu: 1.5%
Sindhi: 1.4%
Bengali: 1.4%
Thai: 1.2%
Turkish: 1.2%
Central Khmer: 1.1%

"It is unlikely that Welsh is the fourth most common language on YouTube, for example, or that Icelandic is spoken more often than Urdu, Bengali, or Turkish. More startling still is that, according to this analysis Latin is not a 'dead language' but rather the sixth most common language spoken on YouTube. Of the top 20, Welsh, Latin, Waray-Waray, and Icelandic are not in the top 200 most spoken languages, and Sindhi and Central Khmer are not in the top 50 (Ethnologue, 2022). The VoxLingua107 documentation notes a number of languages which are commonly mistaken for another (Urdu for Hindi, Spanish for Galician, Norwegian for Nynorsk, Dutch for Afrikaans, English for Welsh, and Estonian for Finnish), but does not account for the other unusual results we have seen. We thought that some of the errors may be because of the amount of music in our sample, but removing the videos that are part of YouTube Music (which does not include all music) did not yield significantly different results."

"It is worth highlighting just how many of the most popular languages are not among the languages available in the YouTube autocaptioning system: Hindi, Arabic, Javanese, Waray-Waray, Urdu, Thai, Bengali, and Sindhi."

Thumbnail
"The Department of Commerce's National Telecommunications and Information Administration (NTIA) launched a Request for Comment on the risks, benefits and potential policy related to advanced artificial intelligence (AI) models with widely available model weights."

If you have an opinion as to whether "open-weight" models are dangerous or not, you can submit a comment to the NTIA.

"Open-weight" means the weights of the model are made public, as opposed to the source code (that would be "open source") or the training data being made public. With the model weights, you can run the model on your own machine without the source code or training data or going through the compute-intensive training process.

Thumbnail
Japanese Yakuza Leader charged with trafficking nuclear material.

"As alleged, the defendant brazenly trafficked material containing uranium and weapons-grade plutonium from Burma to other countries. He did so while believing that the material was going to be used in the development of a nuclear weapons program."

Thumbnail
"You may assume that young people are less likely than their parents to trade freedom for security. But growing up in a world where behaviour is constantly tracked by the technology that they use in their day-to-day lives may be creating a generation that's desensitised to mass surveillance -- so much that they actively support it, even in its most dystopian forms."

Huh, that is interesting. My parents, now in their 80s, hate the idea that their online activity is tracked. But my experience with young people, admittedly limited, is that some don't care while some *really* care about being tracked by electronics. So it's kind of a split reaction.

In this article they cite a poll that asked, "Would you favor or oppose the government installing surveillance cameras in every household to reduce domestic violence, abuse, or other illegal activity?" The percentage in favor, by age group:

65+: 5%
55-64: 6%
45-54: 6%
30-44: 20%
18-29: 29%

Thumbnail
The world's first RISC-V laptop.

Thumbnail
"Attention deficits linked with proclivity to explore while foraging".

"Our findings suggest that ADHD attributes may confer foraging advantages in some environments and invite the possibility that this condition may reflect an adaptation favouring exploration over exploitation."

I first encountered the "exploration over exploitation" idea in the context of reinforcement learning in computer science. The basic idea is: should you go to your favorite restaurant, or a restaurant you've never been to before? If you're in a new city where you've been to few restaurants, you should probably go to a new one. If you're in a city where you've lived for 10 years, and have been to most restaurants, maybe just go to the favorite. Where is the crossover point in between? You get the idea. Do you "exploit" the knowledge you already have, or do you "explore" to obtain more knowledge?

For the simplest cases, mathematicians have come up with formulas, and for complex cases, computer scientists have run simulations. In reinforcement learning, algorithms often have a tunable "hyperparameter" that can be used to increase the "exploration". Some problems require more "exploration" than others.
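The classic tunable version is the "epsilon-greedy" strategy for a multi-armed bandit: with probability epsilon you try a random option, otherwise you go with the best-looking option so far. A minimal sketch (the restaurant payoffs are made up):

```python
import random

def epsilon_greedy(true_payoffs, epsilon=0.1, steps=10_000):
    """Epsilon-greedy bandit: explore with probability epsilon, otherwise
    exploit the option with the best average payoff observed so far."""
    estimates = [0.0] * len(true_payoffs)   # running average payoff per option
    counts = [0] * len(true_payoffs)
    total = 0.0
    for _ in range(steps):
        if random.random() < epsilon:
            choice = random.randrange(len(true_payoffs))                        # explore
        else:
            choice = max(range(len(true_payoffs)), key=lambda i: estimates[i])  # exploit
        reward = random.gauss(true_payoffs[choice], 1.0)                        # noisy payoff
        counts[choice] += 1
        estimates[choice] += (reward - estimates[choice]) / counts[choice]      # update average
        total += reward
    return total / steps

# Three "restaurants" of differing average quality; epsilon sets how much you explore.
restaurants = [2.0, 3.5, 3.0]
for eps in (0.01, 0.1, 0.5):
    print(eps, round(epsilon_greedy(restaurants, eps), 2))
```

Too little exploration risks never discovering the best restaurant; too much means eating a lot of meals you already know are mediocre.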

It appears the process of evolution may have evolved a variety of personalities to prioritize "exploration" or "exploitation".

Thumbnail
"I know what the sound of a laugh is, but I can't hear it in my mind. I have no memories with sounds."

"Jessie only discovered that this was unusual when, by chance, she met a researcher who studies people like her."

The term "anauralia" has been coined for the auditory analogue to aphantasia -- the inability to picture images in one's "mind's eye".

Thumbnail
Yvonne Burkart, a board-certified toxicologist, says the toxic load increases with each generation, causing disease to appear at younger ages. Toxins are in cosmetics, consumer products, fragrances, processed foods ("natural" flavoring does not prevent this), food packaging, water pollutants, air pollutants, and so on.

The conversation is long and wide ranging, including such things as glutathione and microplastics and other endocrine disrupters, water and air filters, food choices, and so on.

Thumbnail
ChatGPT plugins are going away. New conversations will not be able to use plugins starting March 19, 2024. On April 9, existing plugin conversations will no longer be able to continue.

I've been using the Wolfram|Alpha plugin. It looks like there's a "GPT" now. I'll have to try that out.

Thumbnail
"Intel unveiled a new roadmap that includes a new 14A node, the industry's first to use High-NA EUV, here at its Intel Foundry Services Direct Connect 2024 event."

Intel no longer uses "nanometers" to refer to its "process nodes", so I don't know exactly what "14A" means, but there is a quote somewhere of Intel CEO Pat Gelsinger saying 14A produces "1.4 nanometer technology." Maybe "14A" means 14 angstroms, which would indeed be 1.4 nanometers.

The other term in there is "High-NA EUV". "NA" stands for "numerical aperture". But to understand the significance of that we have to take a few steps back.

The company that makes the semiconductor manufacturing equipment is ASML (Advanced Semiconductor Materials Lithography).

Chips are made through a process called photolithography, which involves shining light through a chip design in such a way that the pattern is miniaturized; then, through a process involving a lot of complicated chemistry, that pattern can be etched into the surface of the silicon and turned into an electronic circuit. These circuits have gotten so small that visible light has wavelengths too big to make the chip. Chipmakers predictably went to ultraviolet light, which has shorter wavelengths. That worked for a time, but then a problem came up: air is opaque to the wavelengths they wanted to use next.

To us, we think of air as transparent, and for the visible wavelengths that our eyes use, it is pretty much perfectly transparent. But it is not transparent at all wavelengths. At certain ultraviolet wavelengths, it's opaque like black smoke.

This is why the semiconductor industry had to make the sudden jump from using lasers that emit light at 193 nanometers to light at 13.5 nanometers. (13.5 was chosen because people just happened to know how to make light at that wavelength with a tin plasma.) Jumping the chasm from 193 to 13.5 jumps across the wavelengths where air is opaque. 193 has been called "deep ultraviolet", or DUV. 13.5 is called "extreme ultraviolet", or EUV. So whenever you see "EUV", which we see here in the phrase "High-NA EUV", that's what it's talking about.

Making this jump required rethinking all the optics involved in making chips. Mainly this involved replacing all the lenses with mirrors. Turns out at 13.5 nanometers, it's easier to do optics with reflective mirrors than transparent lenses.

Besides decreasing the wavelength (and increasing the frequency) of the light, what else can be done?

It turns out there are two primary things that determine the limit of the size you can etch: the light wavelength and the numerical aperture. There are some additional factors that have to do with the chemistry you're using for the photoresists and so forth, but we'll not concern ourselves with those factors at the moment.

So what is numerical aperture? If you're a photographer, you probably already know, but it has to do with the angle at which a lens can collect light.

"The numerical aperture of an optical system such as an objective lens is defined by:

NA = n sin(theta)

where n is the index of refraction of the medium in which the lens is working (1.00 for air, 1.33 for pure water, and typically 1.52 for immersion oil), and theta is the half-angle of the maximum cone of light that can enter or exit the lens."

As for "the medium in which the lens is working", note that ASML used water immersion with deep ultraviolet (193 nanometer light and higher) to achieve an NA greater than 1. This hasn't been done for extreme ultraviolet (13.5 nanometer light).

The increase in numerical aperture that ASML has recently accomplished, and that Intel is announcing they are using, is an increase from 0.33 to 0.55. (Numerical aperture is a dimensionless number.)
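To see why that matters, the standard rule of thumb in lithography is the Rayleigh criterion: the smallest printable feature is roughly k1 times the wavelength divided by NA, where k1 is the chemistry-and-process factor set aside earlier. A quick illustrative calculation; the k1 value is a typical textbook number, not Intel's or ASML's:

```python
# Rayleigh criterion: minimum printable feature ~ k1 * wavelength / NA.
# k1 = 0.4 is an illustrative process factor, not a vendor-specific figure.
def min_feature_nm(wavelength_nm: float, na: float, k1: float = 0.4) -> float:
    return k1 * wavelength_nm / na

for na in (0.33, 0.55):
    print(f"NA = {na}: ~{min_feature_nm(13.5, na):.1f} nm")

# Same wavelength, so the printable feature size shrinks by the ratio
# 0.33 / 0.55, i.e. to about 60% of what it was.
```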

How did ASML achieve this increase? Their page on "5 things you should know about High NA EUV lithography" (link below) gives a clue. One of the 5 things is, "larger, anamorphic optics for sharper imaging".

The page refers to "EXE" and "NXE". These refer to ASML's own equipment. NXE systems have a numerical aperture of 0.33, but with the EXE systems, ASML has increased it to 0.55.

"Implementing this increase in NA meant using bigger mirrors. But the bigger mirrors increase the angle at which light hit the reticle, which has the pattern to be printed."

You're probably not familiar with the term "reticle". Here the meaning is different from normal optics. In normal optics, it refers to a scale that you might see in a microscope eyepiece. But here, it has to do with the fact that chips are no longer manufactured with the entire pattern for the whole chip all in one shot. Instead, a pattern for only a small portion of the wafer is used at a time, and then stepper motors move the wafer and the process is repeated. This small portion of the pattern that is used at a time is called the "reticle".

"At the larger angle the reticle loses its reflectivity, so the pattern can't be transferred to the wafer. This issue could have been addressed by shrinking the pattern by 8x rather than the 4x used in NXE systems, but that would have required chipmakers to switch to larger reticles."

"Instead, the EXE uses an ingenious design: anamorphic optics. Rather than uniformly shrinking the pattern being printed, the system's mirrors demagnify it by 4x in one direction and 8x in the other. That solution reduced the angle at which the light hit the reticle and avoided the reflection issue. Importantly, it also minimized the new technology's impact on the semiconductor ecosystem by allowing chipmakers to continue using traditionally sized reticles."