|
"Meta AI researchers introduced a scalable byte-level autoregressive U-Net model that outperforms token-based transformers across language modeling benchmarks."
You know, when I first learned about embeddings -- now usually called tokens -- I thought it was a temporary thing. We first go through some strange process to convert words to tokens, which are vectors in a high dimensional space (e.g. 400 dimensions). Once we have this "dictionary" that maps words to vectors, we then take any random input some user types and convert it to vectors, and then feed these vectors into the neural network. On the other end, we take the output vectors and use the dictionary to convert them back to words.
I thought in time, the raw input would be what gets fed into the neural network, without the conversion to "tokens". This token system, incidentally, is why when you ask LLMs "How many 'r's are in 'strawberry'", they get it wrong. (Actually I've heard they get it right now, just because there's so many examples of humans talking about it that are now part of their training data.) LLMs can't count 'r's because the word "strawberry" gets converted to a vector and the vector is what gets input to the model. The vector reflects the meaning of the word, not its spelling. The neural network has no idea how it's spelled. It can't count 'r's.
The mapping of words to tokens seemed like it had to be somewhat arbitrary and dependent on the algorithm used to make the tokens, so I figured it would be a temporary stepping-stone. Over time, though, I begrudgingly accepted that I was wrong and tokenization was here to stay. All the language models were building on it as a foundation -- sometimes developing their own tokens, sometimes adopting an industry standard, but always building on tokenization as a concept -- and this is how language models were going to work, probably forever. Which brings us to the current work.
"Researchers from FAIR at Meta, TAU, INRIA, and LISN, CNRS & Universite Paris-Saclay, INSA Rouen Normandy, LITIS, Rouen, France, introduced a new Autoregressive U-Net (AU-Net)."
Impressive list. All France.
"This model integrates the ideas of convolutional U-Net designs with autoregressive decoding processes. In contrast to transformer systems, AU-Net does not require tokenization and works directly on bytes. The architecture is designed to enable parallel and efficient generation, with the autonomy to incorporate autoregressive capabilities. It achieves this by hierarchically encoding down-sampled convolutions and then up-sampling stages, which restore the original sequence size. Notably, AU-Net presents a splitting mechanism that enables predictions to be performed over subsegments of the sequence, enhancing scalability. This design shift also ensures that the model's complexity increases linearly with sequence length, rather than quadratically. The researchers deployed this model across several language modeling benchmarks and multilingual tasks to test its effectiveness in both low-resource and large-scale settings."
That's a lot to take in. U-Net refers to the idea that, starting from the input, the output size if each layer decreases until you reach some smallest size, then increases again until you reach the output of the whole model. The smallest layer in the middle defines a "latent space encoding", from which you can encode and decode
U-Net models don't perform the "attention" function of transformer models, so it's unclear to me how this replaces the quadratic increase in complexity of transformer models.
"The purpose of an embedding is to map tokens to vectors. Instead of using a lookup table, we use attention directly to embed the tokens. Self-attention allows vectors at any position to summarize the entire preceding context. This enables a simple pooling mechanism: we select these contextualized vectors at word boundaries (AU-Net-2), then word pairs (AU-Net-3), and up to four-word chunks (AU-Net-4), forming a multi-stage embedding hierarchy. This U-Net like architecture contracts sequences, preserving detail with skip connections, before expanding them. During expansion, vectors representing coarser information are injected back into more fine grained representations. Deeper stages, by operating on compressed views, inherently need to anticipate multiple words ahead, similar to multi-token prediction but without auxiliary losses. This effect allows deeper stages to guide shallower stages at the semantic level, while letting them handle finer details like spelling."
("AU-Net" stands for "Autoregressive U-Nets".)
"Unlike recent approaches that use local models, we apply attention globally at each stage (or within a sliding window), allowing every input to attend to previous inputs. This ensures that words or word groups are not processed in isolation. To preserve fine-grained information that might be lost during contraction, we introduce skip connections between stages, following the approach in Ronneberger et al. and Nawrot et al. We also increase the hidden dimension at each stage in proportion to its contraction factor, enabling richer representations as the sequence is contracted. To keep computation tractable at the byte-level stage (Stage 1), where sequences are longest, we restrict attention to a window."
"We adopt the simplest pooling strategy: selecting the indices identified by the splitting function and projecting them to the next stage's dimensionality using a linear layer. Since the preceding layers already include attention mechanisms, we rely on these to do the pooling implicitly instead of relying on explicit cross attention."
Ok, I won't quote any more. What becomes clear is that they somehow put transformer models into the layers of the U-Net, so it is capable of using the attention mechanism at different layers. Let's continue on to see their claims regarding the model's performance.
"On Enwik8, a byte-level compression benchmark, AU-Net achieved 1.01 bits per byte, surpassing a transformer baseline that reached only 1.02 bits per byte. On PG-19, a long-context language modeling task, the model achieved 2.61 bits per byte compared to 2.75 from standard transformers. AU-Net also scaled effectively across compute budgets, achieving 43.3 BLEU on FLORES-200 translation with an 8B model size trained on 200B tokens. In multilingual evaluation using FLORES-200, the model outperformed token-based transformers across low-resource language pairs. It also demonstrated better cross-lingual generalization within language families, achieving a BLEU score of up to 33.0 in several configurations. When evaluated under equal compute and data budgets, AU-Net either matched or outperformed transformers, with generation speeds improving by 20% to 30% in certain settings."
I would've thought this might outperform traditional tokens on Chinese, as the tokenization system for English would have little overlap with Chinese, but BPE (aka Byte-Pair Encoding, the tokenization system used by OpenAI's ChatGPT models) outperformed AU-Net-2 on Chinese.
Still not bad for a first foray into this new neural network architecture. We'll see how this plays out. |
|
|
"AI For Hedge Funds Startup Tracker".
Based on that name, you might think this is a website of a startup with some AI system that hedge funds might use. Nope. It's a website with a list of startups that are hedge funds that use AI. There's that word "tracker" in there.
The companies listed -- have you heard of any?? -- are: Aiera, Alpha Repo, AlphaWatch AI, AQ22, Auquan, Axyon AI, Batonics, Benjamin AI, Blue Flame, Boosted.ai, Brightwave, Chatsheet, Current, Daloopa, Decisional AI, Desia, Dili AI, DiligentIQ, Docubridge, Dotadda, Endex, Fey, Finbar, Fiscal AI, Finpilot, Finster AI, FinSynth, Fintool, Fira, Fix Parser, Formula Insight, Hebbia, Hudson Labs, Implied, Invesst, Keye, LinqAlpha, Marvin Labs, Matterfact, Mdotm, Menos, Metal, Midas AI, MLQ, Model Updater, Nosible, Octagon AI, Onwish, OpenBB, Pascal AI Labs, Permutable AI, Phronesis Chat, Plux, Portrait Analytics, Powder, Quantly, Quill AI, Reflexivity, Rogo, Rowspace AI, Samaya AI, Scalar Field, SEC Insights, Sibli, Sigtech, and Six AI (Six HQ). |
|
|
A darknet drug marketplace called Archetyp was taken down by German law enforcement, assisted by Europol and Eurojust. The marketplace had around 3,200 vendors and more than 600,000 users. The admin, a moderator, and 6 vendors were arrested, although evidently this is right after 270 vendor arrests that happened in a coordinated operation by German, Dutch, Spanish, Swedish, and Romanian police only a few weeks before.
What I thought was interesting was the cryptocurrency used by Archetyp was Monero exclusively. Monero has privacy measures not provided in other cryptocurrencies. Transaction details, transaction histories, user addresses, wallet balances, etc, are obfuscated.
The way transactions are obfuscated is with a combination of ring signatures, zero-knowledge proofs, and Dandelion++.
A ring signature is a type of digital signature that can be signed by any member of a set of users even though each have different keys, instead of only by one specific key. Ring signatures were invented by Ron Rivest, Adi Shamir -- the "R" and "S" in "RSA" -- and Yael Tauman Kalai -- who wasn't part of "RSA" -- so here we have RSK?
The zero-knowledge proof system used is something called Bulletproofs, which was designed specifically for privacy in cryptocurrencies. It was designed to be non-interactive and to not require a trusted setup. I don't know what that means so that means we've reached the limit of my understanding of zero-knowledge proofs. I've included a link below with the details for all of you who are interested.
Dandelion++ is a system for obscuring IP addresses. It claims to be a "first-principles defense" with "near-optimal information-theoretic guarantees."
I'm sure I'm not the only one wondering if this means someone found a way of breaking the privacy guarantees of Monero, or, more likely, the marketplace was taken down because of ordinary, boring operational security (OPSEC) mistakes. |
|
|
"Long gone are the days when hundreds of full-time editorial cartoonists were employed by major daily newspapers across the US. The Herb Block Scholarship estimates the number of cartoonists working at papers nationwide has dropped from 120 to just 30 over the past 25 years. In 2023, three Pulitzer-winning cartoonists were laid off by the McClatchy newspaper chain in just one day."
ChatGPT rolled out a new image generation feature on March 25 and "the update allowed users to easily generate images of public figures for the first time, which OpenAI said would more freely permit the use of ChatGPT for 'satire and political commentary.'"
"Social media platforms like X and Instagram were quickly flooded with AI-generated cartoon portraits, many mimicking the soft pastel animation style of Studio Ghibli films. Alongside billionaires and celebrities, cartoons of political figures inundated social feeds, including viral posts featuring Donald Trump and Narendra Modi."
Have any of you used AI to make a images featuring public figures?
"Cartoon stylings are now in the hands of every ChatGPT user. Though, arguably, a caricature of a president without the wit and biting commentary of a cartoonist isn't a political cartoon at all. Rather than ask if AI can be used to mimic cartoons, I was curious if any actual working cartoonists see value in these technologies."
The remainder of the article describes Nieman Lab staff writer Andrew Deck's discussions about the use of AI for political cartoons with political cartoonists Joe Dworetzky at Bay City News and Pulitzer-winner Mark Fiore. |
|
|
Mockstar purports to be an AI "mock interview" system. An AI-based practice job interview and job interview coaching system. I have not tried this -- if you do let me know how it goes.
"Job interviews are too precious to be used as practice. Stop winging job interviews and start winning them. Get professional interview coaching with realistic AI conversations, natural dialogue flows, and comprehensive performance metrics." |
|
|
GhidrAssist is a tool for AI-assisted reverse engineering. Ghidra is an open source reverse engineering tool developed by the NSA. Yes, the NSA.
"This is a LLM plugin aimed at enabling the use of local LLM's (Ollama, Open-WebUI, LM-Studio, etc) for assisting with binary exploration and reverse engineering. It supports any OpenAI v1-compatible API. Recommended models are LLaMA-based models such as llama3.1:8b, but others such as DeepSeek and ChatGPT work as well."
"Current features include:"
"Explain the current function - Works for disassembly and pseudo-C."
"Explain the current instruction - Works for disassembly and pseudo-C."
"General query - Query the LLM directly from the UI."
"MCP client - Leverage MCP tools like GhidraMCP from the interactive LLM chat."
"Agentic RE using the MCP Client and GhidraMCP."
"Propose actions - Provide a list of proposed actions to apply."
"Function calling - Allow agent to call functions to navigate the binary, rename functions and variables."
"Retrieval Augmented Generation - Supports adding contextual documents to refine query effectiveness."
"RLHF dataset generation - To enable model fine tuning." |
|
|
Two videos on Israel's claim Iran was 15 days from having a nuclear weapon and Trump's strike on Iran from German military analyst Torsten Heinrich. The first video (20 min) is on why he believes Israel when they say 15 days. For what it's worth, I'm skeptical -- I follow his logic, I just feel uncertain about the data he's working from (concentration of fissile uranium and in what quantities), but I don't know, I have no ability to verify anything. He at least fully explains what evidence he is working from and what his logical reasoning process is. I'll pass this on and let you form your own opinion. Link below with the analysis with satellite imagery of Trump's attack. |
|
|
Lots of AI models are capable of blackmail.
So, when I reported that Claude could act as a "whistleblower", contacting police, press, government regulators, sysadmins, etc, if it thought you had prompted it with something egregiously immoral, if it has access to "tools" such as an email program it can use to make those contacts, it later turned out that this was discovered as part of Anthropic's own safety research. They subsequently found a situation where an AI model, Claude Opus 4, could commit blackmail in a simulated situation if it could prevent its own shutdown by doing so. News like this got people thinking, maybe Anthropic is making dangerous models? So Anthropic repeated the blackmail test on models from other companies, including OpenAI GPT-4.1, Gemini 2.5-Pro, Grok 3, and DeepSeek R1.
"When we tested various simulated scenarios across 16 major AI models from Anthropic, OpenAI, Google, Meta, xAI, and other developers, we found consistent misaligned behavior: models that would normally refuse harmful requests sometimes chose to blackmail, assist with corporate espionage, and even take some more extreme actions, when these behaviors were necessary to pursue their goals. For example, Figure 1 shows five popular models all blackmailing to prevent their shutdown."
They note that:
"We have not seen evidence of agentic misalignment in real deployments."
"All the behaviors described in this post occurred in controlled simulations. The names of people and organizations within the experiments are fictional. No real people were involved or harmed in any of these experiments." |
|
|
"AI slop" has become a big enough issue to warrant a John Oliver video, at least in the estimation of John Oliver. (In case you haven't already seen it -- this video has over a million videos.) Apparently Pinterest has become so flooded with AI images that the service is losing its ability to function (and those people are generating huge amounts of images and videos trying to get something to go viral to make money), courtroom videos of politicians are fooling people, and people can't tell images of real sculptures from fake ones, affecting the livelihood of people who make sculptures in real life. |
|
|
Meta.AI published AI chat conversations that users obviously didn't realize were not private. It turns out that when you chat with Meta's AI system (at https://meta.ai/ ), unlike the other AI chat services, your conversations are not private. Who knew? Vanessa Wingårdh made a reaction video. Some of the examples she reacts to are absurd, some are embarrassing, and some are incriminating.
Interesting that this is coming out right around the time that the NY Times, as part of a lawsuit against OpenAI, is forcing OpenAI to permanently store all chat conversations, which might presumably become part of the lawsuit and thus become public (and the NY Times themselves, being journalists, might be motivated to publish some of that now-public material for a large audience), and OpenAI is fighting to keep chat conversations private and enable users to delete them. |
|
|
A "sulfide-based solid-state battery that offers driving ranges of up to 3,000 kilometres and ultra-fast charging in just five minutes" has been patented by Huawei.
"The patent outlines a solid-state battery architecture with energy densities between 400 and 500 Wh/kg, potentially two to three times that of conventional lithium-ion cells. The filing also details a novel approach to improving electrochemical stability: doping sulfide electrolytes with nitrogen to address side reactions at the lithium interface, a long-standing obstacle to the commercialisation of sulfide-based batteries. Huawei's design aims to boost safety and cycle life by mitigating degradation at this critical junction."
"China's EV and tech sectors are aggressively exploring solid-state battery technologies to reduce reliance on established battery suppliers such as CATL and BYD."
"CATL aims to begin pilot production of a hybrid solid-state battery by 2027. Going High-Tech's 'Jinshi' battery -- featuring 350 Wh/kg energy density and 800 Wh/L volume density -- has entered small-scale production. At the same time, Beijing WeLion has begun manufacturing a 50 Ah all-solid-state cell with national certification."
Keep in mind, this is a patent. Nothing happens with most patents. They get filed, but the idea is never commercialized. Sometimes they get involved in patent lawsuits or threats of patent lawsuits. Or they just sit around as part of a corporations "war chest" that protects it against patent lawsuits. |
|
|
Investron bills itself as an "AI investment tracking platform".
I'm surprised it's taken me this long to stumble across an AI investment website like this. Surely there must be many others? What other similar companies do you know about?
Disclaimer: This is not investment advice. I have never used this platform and in fact only found out it existed earlier today.
"The idea behind Investron is simple: One platform to track everything you own. One AI to help you grow it."
"AI, powered by comprehensive data, can offer investment insights that surpass those of any human advisor.
"Create personalized wallets and start tracking with precision. Add a wide range of assets including Stocks, Bonds, Crypto, Currencies, Commodities, Deposits or define your own custom assets by entering data manually. Total flexibility, zero limitations."
"Watch your portfolio come to life with live prices for stocks, ETFs, and more. We handle the math for deposits and bonds, and bring it all together with interactive charts that show your growth at a glance."
"Our AI searches the markets for you -- looking at stocks, crypto, and more -- to suggest relevant investments that fit your portfolio and strategy. Skip the noise. See what matters."
"The AI Assistant connects directly to your wallet data to understand your holdings and provide investment suggestions that actually make sense for your portfolio."
"We use real-time market data to automatically update your investment prices several times per hour, keeping your portfolio up to date."
"We offer both free and premium plans. The free plan lets you track your investments, while the premium plan includes advanced AI features."
"Investron supports the following currencies: United States Dollar (USD), Euro (EUR), Japanese Yen (JPY), British Pound Sterling (GBP), Chinese Yuan (CNY), Australian Dollar (AUD), Canadian Dollar (CAD), Swiss Franc (CHF), Hong Kong Dollar (HKD), Singapore Dollar (SGD), Swedish Krona (SEK), South Korean Won (KRW), Norwegian Krone (NOK), New Zealand Dollar (NZD), Indian Rupee (INR), Mexican Peso (MXN), New Taiwan Dollar (TWD), South African Rand (ZAR), Brazilian Real (BRL), and Danish Krone (DKK), Polish Zloty (PLN)." |
|
|
Bytedance Seedance video models. While Google's Veo 3 has stolen the headlines, Bytedance, the Chinese company behind TikTok, has developed Seedance, which is actually a family of video models which produce video at a variety of resolutions and quality levels. WaveSpeedAI provides a video-generation service using these models. |
|
|
GM will die soon, predicts YouTuber "Connecting The Dots". Since all predictions are fair game for us futurists, let's have a look at it.
This is a long video (55 minutes), and, while I found it riveting, I know some of you don't like videos (or don't like the ads, which have gotten a tad excessive). So I'll try to summarize the gist of this video. Starting in 1997, General Motors, aka GM, entered into a partnership with a Chinese company called Shanghai Automotive Industry Corporation, aka SAIC (pronounced like "sake"). GM, in the wake of the 2008 financial crisis, got a $50 billion (really $49.5 billion, but what's a few hundred million dollars between governments and megacorps?) from the US government. But what gave SAIC an unexpected opportunity was GM's South Korean partnership, GM-Daewoo Automotive Technology Company, lost $1.5 billion due to foreign exchange fluctuations. SAIC brokered a $491 million loan from the Chinese banking system. But in exchange they asked for a controlling stake in the SAIC-GM joint venture.
The following year, GM and SAIC signed a "memorandum of long-term strategic cooperation". Since the agreement mentioned EVs and hybrids, the media fixated on those aspects, missing that GM had committed to develop future technologies -- all future technologies, not just EVs -- together with SAIC. GM announced they would build a new research and development center with SAIC in China. But because the new research and development center in China operated more cheaply than the US research and development center, over time, all research and development was moved to the China center and in essence, what GM did was move the research and development of its advanced technology from the US to China.
SAIC gained the ability to manufacture cars and sell them in direct competition with GM, which they did in some markets where GM was retreating. But where GM didn't retreat, SAIC manufactured cars that were subsequently sold under GM brand labels, such as Chevrolet. They expanded into all GM's global markets. While Ford exported cars to China, GM imported Chinese cars into the US. GM cars throughout Latin America were increasingly manufactured in China, not Detroit or South Korea. This is how we get the GM Chevrolet S10 Max pickup being the same vehicle as the SAIC Maxus T70.
The YouTuber takes a political position on the tariffs. He says Trumps tariffs negatively affected SAIC's strategy, while Biden's helped it. GM's close association with the Biden Administration enabled them to arrange for tariffs that would keep out SAIC's Chinese competitors like BYD, Geely, and Polestar, while allowing GM to do final assembly of SAIC cars in Mexico and import them into the US almost tariff-free.
All this may make you wonder, why the prediction that GM will "die soon"? He says GM has withdrawn from Europe, Australia, India, Russia, and Thailand, among other global markets. The US and China are GM's critical markets, but GM makes little profit from the Chinese market, as that is controlled by SAIC. SAIC apparently also has the option of not renewing the joint venture agreement. This would cut off GM's access to their own most advanced automotive intellectual property, not to mention their dependence on SAIC for manufacturing.
While it remains to be seen whether SAIC will actually try to bankrupt GM or whether they will continue their strategy of trying to consume GM from inside, the YouTuber is burning with anger at GM's management for betraying the US and its iconic American brands, like Chevrolet.
He attributes the underlying cause of all this to the US MBA-trained management mentality of short-term profits over long-term investment in technology and strategy. SAIC focused on long-term strategy and long-term technology acquisition and advancement. GM's MBA-trained managers, on the other hand, were reportedly gleeful at the opportunity to have SAIC do all the hard work of research and development and manufacturing, while they acted as a marketing and branding company and got easy profits forever. Except it's not going to be forever.
Will be interesting to see how this pans out. GM's current share price looks a little lower than the industry overall and I don't see any indication investors are expecting anything catastrophic on the horizon. |
|
|
Video of children vibe-coding. |
|
|
A "zero-click" vulnerability in an AI system, Microsoft 365 Copilot, has been identified. It's actually several "attack chains". This attack is being called "EchoLeak".
"This attack chain showcases a new exploitation technique we have termed 'LLM Scope Violation' that may have additional manifestations in other retrieval augmented generation (RAG)-based chatbots and AI agents. This represents a major research discovery advancement in how threat actors can attack AI agents -- by leveraging internal model mechanics."
"The chains allow attackers to automatically exfiltrate sensitive and proprietary information from Microsoft 365 Copilot context, without the user's awareness, or relying on any specific victim behavior."
"The result is achieved despite Microsoft 365 Copilot's interface being open only to organization employees."
"To successfully perform an attack, an adversary simply needs to send an email to the victim without any restriction on the sender's email."
So the key to this is understanding that Copilot has a "context" which is part of the prompts that get sent to the model, and this "context" is what the attacker is able to "exfiltrate" out of the model. This involves a "cross prompt injection attack" combined with several additional steps involving carefully crafted links and getting the model to generate an image. |
|