Steve Randy Waldman
@interfluidity.com

#relatable

in reply to this
Steve Randy Waldman
@interfluidity.com

“you absolutely have to view LLM benchmarks from a position of default-distrust” @ouguoc.mastodon.online.ap.brid.gy describes how easily answers to benchmark problems can leak into the training set. https://seinmastudios.com/posts/llm-benchmarks-are-not-trustworthy/

Steve Randy Waldman
@interfluidity.com

why is the link not a link? let's try that again: seinmastudios.com/posts/llm-be...

LLM benchmarks like SWE-bench are not trustworthy

in reply to self
Steve Randy Waldman
@interfluidity.com

the doctor will see you now!

in reply to this
Steve Randy Waldman
@interfluidity.com

now don’t get crabby.

in reply to this
Steve Randy Waldman
@interfluidity.com

maybe in a US context, state governments could work? drafts.interfluidity.com/2025/03/09/v...

Voice of a Maryland

Steve Randy Waldman
@interfluidity.com

One thing the conservative movement has proven is that burying The New York Times in flak and pushback can be quite effective.

Steve Randy Waldman
@interfluidity.com

Where I am now it's already the 5th of July, but at home it is still July 4. I am posting what I traditionally post every July 4. Optimism of the will. "On the stairs I smoke a cigarette alone / Mexican kids are shootin' fireworks below." God bless America. www.youtube.com/watch?v=K_ty...

Link Preview: X - 4th of July: YouTube video by Craig Bostick
Steve Randy Waldman
@interfluidity.com

Your piece is great. As usual. Yeah, I think there's no getting around that we can't measure human welfare without a set of values to define what that means. There's not some "scientific" trick or shortcut that lets us authoritatively, universally, say that this is better than that. 1/

in reply to this
Steve Randy Waldman
@interfluidity.com

Technocrats pretended that GDP was that for a while, and it kind of worked because, broad brush, the correlation was pretty strong between GDP per capita and qualitative, intuitive, perceptions of prosperity and satisfaction. 2/

in reply to self
Steve Randy Waldman
@interfluidity.com

It was a great fit for "neoliberal" economics, whose ideological trick was largely to obscure methodological problems economists had long discussed and pretend something like economics 101 provided a scientific basis for policy to which intelligent people must defer. 3/

in reply to self
Steve Randy Waldman
@interfluidity.com

Besides the helpful casual empirics, it had this story: for market economies, GDP is the quantity of the highest value (because market optimized) basket of goods and services produced by the economy, and so that quantity, a simple number, should be a pretty good proxy for wealth. 4/

in reply to self
Steve Randy Waldman
@interfluidity.com

But real-life markets are imperfect optimizers, and even theoretical markets are arguably local rather than global optimizers that might get stuck in bad path dependencies (like automobile-dependent low density living, I'd argue). 5/

in reply to self
Steve Randy Waldman
@interfluidity.com

This trick of replacing an infinitely dimensioned "what" with a single number "how much" just doesn't work. 6/

in reply to self
Steve Randy Waldman
@interfluidity.com

So we are left with judgment calls to make, which invariably involve evaluating tradeoffs on dimensions we'd mostly all agree are valuable (say shipbuilding capacity vs health care services), but also require imposing contested values. 7/

in reply to self
Steve Randy Waldman
@interfluidity.com

Some people think the costs of low-density single-family infrastructure are totally worth it, resources spent in support of human flourishing. I think an auto-centric built environment is both costly and inferior in welfare terms to achievable alternatives. 8/

in reply to self
Steve Randy Waldman
@interfluidity.com

Each claim requires assertions about other people's preferences in a normative sense, what they should want (since it's indeterminate what they do want while the hypothetical choices, two very different social equilibria, are not remotely before them). 9/

in reply to self
Steve Randy Waldman
@interfluidity.com

There's no "scientifically" right or wrong answer other ppl must defer to. Only judgments, not pulled from nowhere but informed by marshaling evidence, yet judgments nonetheless, of which we have to persuade our fellow citizens, rather than truths we can discover and propound as incontrovertible. /fin

in reply to self
Steve Randy Waldman
@interfluidity.com

an unsurprisingly great summary, big picture, of how and where we have cornered ourselves economically the past few decades, by @sjshancoxli.liberalcurrents.com.

Steve Randy Waldman
@interfluidity.com

(amazing company, both of you!)

in reply to this
Steve Randy Waldman
@interfluidity.com

tech barons take note. this is the future you preferred to one that included musings on an unlikely unrealized capital gains tax. this is the future you may have delivered to all of us, but from which you will not be exempt. ht @alanbeattie.bsky.social

Steve Randy Waldman
@interfluidity.com

“Positive-sum solutions to multi-period stag hunts select for time consistency more than cost effectiveness.” @akhilrao.bsky.social akhilrao.org/blog/2025/07... remarkable insights into what leadership and coordination actually entail, in the dry language of game theory. except with death cults. 1/

Selecting for time consistency

Steve Randy Waldman
@interfluidity.com

project designs demanding “unnecessarily” expensive commitments, and work to ensure partners are highly committed to shared values, can augur success in ways that seem irrational, inefficient, dumb to reviewers who imagine a dictator with a Gantt chart can just get every participant to play their part. /fin

in reply to self
Steve Randy Waldman
@interfluidity.com

“The rise of Whatever” @eev.ee eev.ee/blog/2025/07... i’ve complicated, still mixed, views abt LLMs, whether what comes of them can be good despite much evident awfulness (“slop”). but the “whatever” thesis perfectly captures what happened to the web and crypto. and yeah, LLMs are whatever machines

Link Preview: The rise of Whatever: This was originally titled “I miss when computers were fun”. But in the course of writing it, I discovered that there is a reason computers became less fun, a dark thread woven through a number of eve...
Steve Randy Waldman
@interfluidity.com

i guess my lean towards (a) comes from a kind of qualitative empiricism, at best. year by year growth numbers aren’t reliable, but China’s share and increasing dominance in important sectors doesn’t require a well calibrated horserace to observe. 1/

in reply to this
Steve Randy Waldman
@interfluidity.com

that doesn’t mean China’s successes render its model superior in welfare terms! (there were lots of things the Soviet Union ill-advisedly produced a lot of, “dominated”). 2/

in reply to self
Steve Randy Waldman
@interfluidity.com

traditional growth measures are intended in a neoliberal context to obviate the question of “are we producing the right things?”: because markets optimize, adding up dollars spent on new production yields scalars, the value of “best uses”, that we can rank. 3/

in reply to self
Steve Randy Waldman
@interfluidity.com

but i think rentierism in the US, persistent high mark-ups embedded in those numbers we sum, has weakened the historical correlation between GDP and welfare. 4/

in reply to self
Steve Randy Waldman
@interfluidity.com

so i think on both sides, our numbers aren’t what we want. they are overtly massaged on one side. they are undermined by structural change despite consistent, earnest, econometrics on the other. 5/

in reply to self
Steve Randy Waldman
@interfluidity.com

so ultimately we have little choice but to rely on coarse, qualitative observations, and impose our own weightings on them. 6/

in reply to self
Steve Randy Waldman
@interfluidity.com

does the military heft and/or generalized physical-world capacity that comes with shipbuilding overwhelm the cost in all the services thousands of workers in shipbuilding might otherwise supply? 7/

in reply to self
Steve Randy Waldman
@interfluidity.com

i have a view, but it could be wrong! still, i don’t think there is anything we can straightforwardly measure to decide the question, even if the measures weren’t massaged. /fin

in reply to self
Steve Randy Waldman
@interfluidity.com

rule by law depends on it being a norm among those with the power to enforce laws and norms!

in reply to this
Steve Randy Waldman
@interfluidity.com

Thread by @sjshancoxli.liberalcurrents.com. (I lean towards (a), see drafts.interfluidity.com/2024/08/13/c... and the industrial policy pieces linked beneath it. but (b) could be right. it’s a continuing debate!) 👇

Steve Randy Waldman
@interfluidity.com

we are governed by laws or we aren’t. some small social media company should sue. will even this Supreme Court claim a President can not only refuse to enforce a law, but affirmatively impose a regime in defiance of it?

Steve Randy Waldman
@interfluidity.com

oh yes. where’s the unpleasant laxative that would drain our ugly mess?

in reply to this
Steve Randy Waldman
@interfluidity.com

probably the best i got, so thank goodness!

in reply to this
Steve Randy Waldman
@interfluidity.com

yes! a pleasant surprise!

in reply to this