Archive for October 2014

Rational regret

Suppose that you have a career choice to make:

  1. There is a “safe bet” available to you, which will yield a discounted lifetime income of $1,000,000.
  2. Alternatively, there is a risky bet, which will yield a discounted lifetime income of $100,000,000 with 10% probability, or a $200,000 lifetime income with 90% probability.

The expected value of Option 1 is $1,000,000. The expected value of Option 2 is (0.1 × $100,000,000) + (0.9 × $200,000) = $10,180,000. For a rational, risk-neutral agent, Option 2 is the right choice by a long-shot.

A sufficiently risk-averse agent, of course, would choose Option 1. But given these numbers, you’d have to be really risk-averse. For most people, taking the chance is the rational choice here.


Update: By “discounted lifetime income”, I mean the present value of all future income, not an annual amount. At a discount rate of 5%, Option 1 translates to a fixed payment of about $55K/year over a 50 year horizon, Option 2 “happy” becomes $5.5 million per year, Option 2 “sad” becomes about $11K per year. The absolute numbers don’t matter to the argument, but if you interpreted the “safe bet” as $1M per year, it is too easy to imagine yourself just opting out of the rat race. The choice here is intended to be between (1) a safe but thrifty middle class income or (2) a risky shot at great wealth that leaves one on a really tight budget if it fails. Don’t take the absolute numbers too seriously.
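
For concreteness, here is a minimal Python sketch of that arithmetic, assuming a 5% discount rate, a 50-year horizon, and a level ordinary annuity (those conventions are mine and purely illustrative):

    def annual_equivalent(present_value, rate=0.05, years=50):
        """Convert a discounted lifetime (present) value into a level annual payment."""
        annuity_factor = (1 - (1 + rate) ** -years) / rate
        return present_value / annuity_factor

    for label, pv in [("Option 1 (safe)", 1_000_000),
                      ("Option 2, happy branch", 100_000_000),
                      ("Option 2, sad branch", 200_000)]:
        print(f"{label}: ~${annual_equivalent(pv):,.0f}/year")

    # Expected present values of the two options:
    print("EV of Option 1:", 1_000_000)
    print("EV of Option 2:", 0.1 * 100_000_000 + 0.9 * 200_000)   # 10,180,000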


Suppose a lot of people face decisions like this, and suppose they behave perfectly rationally. They all go for Option 2. For 90% of the punters, the ex ante wise choice will turn out to have been an ex post mistake. A bloodless rational economic agent might just accept that and get on with things, consoling herself that she had made the right decision, that she would do the same again, and that her lived poverty is offset by the exorbitant wealth of a twin in an alternate universe where the contingencies worked out differently.

An actual human, however, would probably experience regret.

Most of us do not perceive our life histories as mere throws of the dice, even if we acknowledge a very strong role for chance. Most of us, if we have tried some unlikely career and failed, will either blame ourselves or blame others. We will look to decisions we have taken and wonder “if only”. If only I hadn’t screwed up that one opportunity, if only that producer had agreed to listen to my tape, if only I’d stuck with the sensible, safe career that was once before me rather than taking an unlikely shot at a dream.

Everybody behaves perfectly rationally in our little parable. But the composition of smart choices ensures that 90% of our agents will end up unhappy, poor, and full of regret, while 10% live a high life. Everyone will have done the right thing, but in doing so they will have created a depressed and depressing society.

You might argue that, once we introduce the possibility of painful regret, Option 2 is not the rational choice after all. But whatever (finite) negative value you want to attach to regret, there is some level of risky payoff that renders taking a chance rational under any conventional utility function. You might argue that outsized opportunities must be exhaustible, so it’s implausible that everyone could try the risky route without the probability of success collapsing. Sure, but if you add a bit of heterogeneity you get a more complex model in which those who are least likely to succeed drop out, increasing the probability of success until the marginal agent is indifferent and everyone more confident rationally goes for the gold. This is potentially a large group, if the number of opportunities and expected payoff differentials are large. 90% of the population may not be immiserated by regret, but a fraction still will be.
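
To see how the “marginal agent is indifferent” logic might play out, here is a toy simulation in Python. It is entirely my own construction, with made-up parameters (10,000 potential entrants, 100 outsize opportunities awarded by an ability-weighted lottery among entrants), but it captures the mechanism: low-ability agents drop out until the least able remaining entrant just weakly prefers the gamble.

    import numpy as np

    rng = np.random.default_rng(0)

    N = 10_000           # potential entrants (made up)
    K = 100              # outsize opportunities (made up)
    SAFE = 1_000_000     # lifetime value of the safe career
    WIN = 100_000_000    # payoff if the risky bet succeeds
    LOSE = 200_000       # payoff if it fails

    ability = np.sort(rng.uniform(0.1, 1.0, N))[::-1]   # descending

    def marginal_payoff(m):
        """Expected payoff of the least-able entrant when the top-m agents enter.
        Winners are chosen by an ability-weighted lottery among entrants."""
        p = min(1.0, K * ability[m - 1] / ability[:m].sum())
        return p * WIN + (1 - p) * LOSE

    # The marginal entrant's chances fall as more agents pile in, so the
    # equilibrium is the largest m whose marginal entrant still weakly
    # prefers entering to the safe bet.
    m_star = max(m for m in range(K, N + 1) if marginal_payoff(m) >= SAFE)

    print(f"{m_star} of {N} agents rationally enter")
    print(f"roughly {m_star - K} of them will lose, and regret it")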

It is perhaps counterintuitive that the size of that sad fraction will be proportionate to the number of unlikely outsize opportunities available. More opportunities mean more regret. If there is only one super-amazing gig, maybe only the top few potential contestants will compete for it, leaving as regretters only a tiny sliver of our society. But if there are very many amazing opportunities, lots of people will compete for them, increasing the poorer, sadder, wiser fraction of our hypothetical population.

Note that so far, we’ve presumed perfect information about individual capabilities and the stochastic distribution of outcomes. If we bring in error and behavioral bias — overconfidence in one’s own abilities, or overestimating the odds of succeeding due to the salience and prominence of “winners” — then it’s easy to imagine even more regret. But we don’t need to go there. Perfectly rational agents making perfectly good decisions will lead to a depressing society full of sadsacks, if there are a lot of great careers with long odds of success and serious opportunity cost to pursuing those careers rather than taking a safer route.

It’s become cliché to say that we’re becoming a “winner take all” society, or to claim that technological change means a relatively small population can leverage extraordinary skills at scale and so produce more efficiently than under older, labor-intensive production processes. If we are shifting from a flattish economy with very many moderately-paid managers to a new economy with fewer (but still many) stratospherically paid “supermanagers”, then we should expect a growing population of rational regretters where before people mostly landed in predictable places.

Focusing on true “supermanagers” suggests this would only be a phenomenon at the very top, a bunch of mopey master-of-the-universe wannabes surrounding a cadre of lucky winners. But if the distribution of outcomes is fractal or “scale invariant”, you might get the same game played across the whole distribution, where the not-masters-of-the-universe mope alongside the not-tenure-track-literature-PhDs, who mope alongside failed restaurateurs and the people who didn’t land that job tending the robots in the factory despite an expensive stint at technical college. The overall prevalence of regret would be a function of the steepness of the distribution of outcomes, and the uncertainty surrounding where one lands if one chooses ambition relative to the position the same individual would achieve if she opted for a safe course. It’s very comfortable for me to point out that a flatter, more equal distribution of outcomes would reduce the prevalence of depressed rational regretters. It is less comfortable, but not unintuitive, to point out that diminished potential mobility would also reduce the prevalence of rational regretters. If we don’t like that, we could hope for a society where the distribution of potential mobility is asymmetrical and right-skewed: If the “lose” branch of Option 2 is no worse than Option 1, then there’s never any reason to regret trying. But what we hope for might not be what we are able to achieve.

I could turn this into a rant against inequality, but I do plenty of that and I want a break. Putting aside big, normative questions, I think rational regret is a real issue, hard to deal with at both a micro and a macro level. Should a person who dreams of being a literature professor go into debt to pursue that dream? It’s odd but true that the right answer to that question might imply misery as the overwhelmingly probable outcome. When we act as advice givers, we are especially compromised. We’ll love our friend or family member just as much if he takes a safe gig as if he’s a hotshot professor, but we’ll feel his pain and regret — and have to put up with his nasty moods — if he tries and fails. Many of us are much more conservative in the advice we give to others than in the calculations we perform for ourselves. That may reflect a very plain agency problem. At a macro level, I do worry that we are evolving into a society where many, many people will experience painful regret in self-perception — and also judgments of failure in others’ eyes — for making choices that ex ante were quite reasonable and wise, but that simply didn’t work out.

Update History:

  • 29-Oct-2014, 12:45 a.m. PDT: Added bold update section clarifying the meaning of “discounted lifetime income”.
  • 29-Oct-2014, 1:05 a.m. PDT: Updated the figures in the update to use a 5% rather than 3% discount rate.
  • 29-Oct-2014, 1:25 a.m. PDT: “superamazing super-amazing“; “overconfidence is ones own abilities”

Econometrics, open science, and cryptocurrency

Mark Thoma wrote the wisest two paragraphs you will read about econometrics and empirical statistical research in general:

You are testing a theory you came up with, but the data are uncooperative and say you are wrong. But instead of accepting that, you tell yourself "My theory is right, I just haven't found the right econometric specification yet. I need to add variables, remove variables, take a log, add an interaction, square a term, do a different correction for misspecification, try a different sample period, etc., etc., etc." Then, after finally digging out that one specification of the econometric model that confirms your hypothesis, you declare victory, write it up, and send it off (somehow never mentioning the intense specification mining that produced the result).

Too much econometric work proceeds along these lines. Not quite this blatantly, but that is, in effect, what happens in too many cases. I think it is often best to think of econometric results as the best case the researcher could make for a particular theory rather than a true test of the model.

What Thoma is describing here cannot be fixed. Naive theories of statistical analysis presume a known, true model of the world whose parameters a researcher needs simply to estimate. But there is in fact no "true" model of the world, and a moralistic prohibition of the process Thoma describes would freeze almost all empirical work in its tracks. It is the practice of good researchers, not just of charlatans, to explore their data. If you want to make sense of the world, you have to look at it first, and try out various approaches to understanding what the data means. In practice, this means that long before any empirical research is published, its producers have played with lots and lots of potential models. They've examined bivariate correlations, added variables, omitted variables, considered various interactions and functional forms, tried alternative approaches to dealing with missing data and outliers, etc. It takes iterative work, usually, to find even the form of a model that will reasonably describe the space you are investigating. Only if your work is very close to past literature can you expect to be able to stick with a prespecified statistical model, and then you are simply relying upon other researchers' iterative groping.

The first implication of this practice is common knowledge: "statistical significance" never means what it claims to mean. When an effect is claimed to be statistically significant — p < 0.05 — that does not in fact mean that there is only a 1 in 20 chance that the effect would be observed by chance. That inference would be valid only if the researcher had estimated a unique, correctly specified model. If you are trying out tens or hundreds of models (which is not far-fetched, given the combinatorics that apply with even a few candidate variables), then even if your data is pure noise you are likely to generate statistically significant results. Statistical significance is a conventionally agreed low bar. If you can't overcome even that after all your exploring, you don't have much of a case. But determined researchers need rarely be deterred.
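
The combinatorics are easy to demonstrate. Below is a small, purely illustrative Python sketch (using numpy and statsmodels, with nothing drawn from any actual study): the outcome is pure noise, unrelated to any of eight candidate regressors, yet a mechanical search over subsets of regressors turns up plenty of "significant" coefficients at p < 0.05.

    import itertools
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(42)
    n_obs, n_candidates = 200, 8

    X = rng.normal(size=(n_obs, n_candidates))   # candidate regressors: pure noise
    y = rng.normal(size=n_obs)                   # outcome: unrelated noise

    n_models = significant_models = 0
    best_p = 1.0
    for k in range(1, n_candidates + 1):
        for cols in itertools.combinations(range(n_candidates), k):
            n_models += 1
            design = sm.add_constant(X[:, list(cols)])
            pvals = sm.OLS(y, design).fit().pvalues[1:]   # skip the intercept
            if (pvals < 0.05).any():
                significant_models += 1
            best_p = min(best_p, pvals.min())

    print(f"{n_models} specifications tried on pure noise")
    print(f"{significant_models} contain at least one 'significant' regressor")
    print(f"best p-value found: {best_p:.4f}")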

Ultimately, what we rely upon when we take empirical social science seriously are the ethics and self-awareness of the people doing the work. The tables that will be published in a journal article or research report represent a tiny slice of a much larger space of potential models researchers will have at least tentatively explored. An ethical researcher asks herself not just whether the table she is publishing meets formalistic validity criteria, but whether it is robust and representative of results throughout the reasonable regions of the model space. We have no other control than self-policing. Researchers often include robustness tests in their publications, but those are as flawed as statistical significance. Along whatever dimension robustness is going to be examined, in a large enough space of models there will be some to choose from that will pass. During the peer review process, researchers may be asked to perform robustness checks dreamed up by their reviewers. But those are shots in the dark at best. Smart researchers will have pretty good guesses about what they may be required to do, and can ensure they are prepared.

Most researchers perceive themselves as ethical, and don't knowingly publish bad results. But it's a fine line between taking a hypothesis seriously and imposing a hypothesis on the data. A good researcher should try to find specifications that yield results that conform to her expectations of reasonableness. But in doing so, she may well smuggle in her own hypothesis. So she should then subject those models to careful scrutiny: How weird or nonobvious were these "good" models? Were they rare? Does the effort it took to find them reflect a kind of violation of Occam's razor? Do the specifications that bear out the hypothesis represent a more reasonable description of the world than the specifications that don't?

These are subjective questions. Unsurprisingly, researchers' hypotheses can be affected by their institutional positions and personal worldviews, and those same factors are likely to affect judgment calls about reasonableness, robustness, and representativeness. As Milton Friedman taught us, in social science it's often not clear what is a result and what is an assumption; we can "flip" the model and let a result we believe to be true count as evidence for the usefulness of the reasoning that took us there. Researchers may sincerely believe that the models that bear out their hypothesis also provide useful insight into processes and mechanisms that might not have been obvious to them or others prior to their work. Individually or in groups as large as schools and disciplines, researchers may find a kind of consilience between the form of model they have converged upon, the estimates produced when the model is brought to data, and their own worldviews. Under these circumstances, it is very difficult for an outsider to distinguish a good result from a Rorschach test. And it is very difficult for a challenger, whose worldview may not resonate so well with the model and its results, to weigh in.

Ideally, the check against granting authority to questionable results should be reproduction. Replication is the first, simplest application of reproduction. By replicating work, we verify that a model has been correctly brought to the data, and yields the expected results. Replication is a guard against error or fraud, and can be a partial test of validity if we bring new data to the model. But replication alone is insufficient to resolve questions of model choice. To really examine empirical work, a reviewer needs to make an independent exploration of the potential model space, and ask whether the important results are robust to other choices about how to organize, prepare, and analyze the data. Do similarly plausible, equally robust, specifications exist that would challenge the published result, or is the result a consistent presence, rarely contradicted unless plainly unreasonable specifications are imposed? It may well be that alternative results are unrankable: under one family of reasonable choices, one result is regularly and consistently exonerated, while under another, equally reasonable region of the model space, a different result appears. One can say that neither result, then, deserves very much authority and neither should be dismissed. More likely, the argument would shift to questions about which set of modeling choices is superior, and we realize that we do not face an empirical question after all, but a theoretical one.
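
As a sketch of what that kind of review could look like in code (simulated data, arbitrary analyst choices, nothing drawn from any real study), a reviewer can sweep a grid of defensible specification choices and examine the whole distribution of the estimate of interest, rather than any single published table:

    import itertools
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 500

    # Simulated data: a treatment with a modest true effect, plus four controls.
    controls = rng.normal(size=(n, 4))
    treatment = rng.normal(size=n)
    y = 0.1 * treatment + controls @ np.array([0.5, -0.3, 0.2, 0.0]) + rng.normal(size=n)

    results = []
    trim_options = [None, 2.5]               # drop outcomes beyond 2.5 sd, or don't
    for k in range(0, 5):
        for cols in itertools.combinations(range(4), k):
            for trim in trim_options:
                keep = (np.ones(n, dtype=bool) if trim is None
                        else np.abs((y - y.mean()) / y.std()) < trim)
                design = sm.add_constant(np.column_stack(
                    [treatment[keep]] + [controls[keep, c] for c in cols]))
                fit = sm.OLS(y[keep], design).fit()
                results.append((fit.params[1], fit.pvalues[1]))   # treatment coefficient

    coefs, pvals = map(np.array, zip(*results))
    print(f"{len(results)} specifications estimated")
    print(f"treatment coefficient ranges from {coefs.min():.3f} to {coefs.max():.3f}")
    print(f"significant at 5% in {(pvals < 0.05).mean():.0%} of specifications")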

Reproduction is too rare in practice to serve as a sufficient check on misbegotten authority. Social science research is a high cost endeavor. Theoretically, any kid on a computer should be able to challenge any Nobelist's paper by downloading some data and running R or something. Theoretically any kid on a computer should be able to write an operating system too. In practice, data is often hard to find and expensive, the technical ability required to organize, conceive, and perform alternative analyses is uncommon, and the distribution of those skills is not orthogonal to the distribution of worldviews and institutional positions. Empirical work is time-consuming, and revisiting well-trodden ground is not well rewarded. For skilled researchers, reproducing other people's work to the point where alternative analyses can be explored entails a large opportunity cost.

But social science research has high stakes. It may serve to guide — or at least justify — policy. The people who have an interest in a skeptical vetting of research may not have the resources to credibly offer one. The inherent subjectivity and discretion that accompanies so-called empirical research means that the worldview and interests of the original researchers may have crept in, yet without a credible alternative, even biased research wins.

One way to remedy this, at least partially, would be to reduce the difficulty of reproducing an analysis. It has become more common for researchers to make available their data and sometimes even the code by which they have performed an empirical analysis. That is commendable and necessary, but I think we can do much better. Right now, the architecture of social science is atomized and isolated. Individual researchers organize data into desktop files or private databases, write code in statistical packages like Stata, SAS, or R, and publish results as tables in PDF files. To run variations on that work, one often literally needs access to the researcher's desktop, or else to reconstruct her desktop on one's own. There is no longer any reason for this. All of the computing, from the storage of raw data, to the transformation of isolated variables into normalized data tables that become the input to statistical models, to the estimation of those models, can and should be specified and performed in a public space. Conceptually, the tables and graphs at the heart of a research paper should be generated "live" when a reader views them. (If nothing has changed, cached versions can be provided.) The reader of an article ought to be able to generate sharable appendices by modifying the authors' specifications. A dead piece of paper, or a PDF file for that matter, should not be an acceptable way to present research.
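
To make the idea concrete, here is one hypothetical shape such a system could take, sketched in Python. Everything here is invented for illustration (the URL, the variable names, the particular libraries); the point is only that the published artifact becomes data plus a small, forkable specification rather than a static table.

    import pandas as pd
    import statsmodels.formula.api as smf

    # A toy "executable analysis spec". All names below are hypothetical.
    spec = {
        "data_url": "https://example.org/study.csv",       # hosted, versioned raw data
        "sample": "year >= 2000",                           # sample restriction
        "formula": "outcome ~ treatment + age + income",    # model specification
    }

    def run(spec):
        df = pd.read_csv(spec["data_url"]).query(spec["sample"])
        return smf.ols(spec["formula"], data=df).fit().summary()

    # A reader could fork the spec -- drop a control, change the sample filter,
    # swap in a different model -- and publish the regenerated table as a
    # shareable appendix, rather than emailing the author for her desktop.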

Ultimately, we should want to generate a reusable, distributed, permanent, and ever-expanding web of science, including conjectures, verifications, modifications, refutations, and reanalyses as new data arrives. Social science should become a reified public commons. It should be possible to build new analyses from any stage of old work, by recruiting raw data into new projects, by running alternative models on already cleaned-up or normalized data tables, by using an old model's estimates to generate inputs to simulations or new analyses.

Technologically, this sort of thing is becoming increasingly possible. Depending on your perspective, Bitcoin may be a path to freedom from oppressive central banks, a misconceived and cynically-flogged remake of the catastrophic gold standard, or a potentially useful competitor to MasterCard. But under the hood, what's interesting about Bitcoin has nothing to do with any of that. Bitcoin is a prototype of a kind of application whose data and computation are maintained by consensus, owned by no one, and yet reliably operated at a very large scale. Bitcoin is, in my opinion, badly broken. Its solution to the problem of ensuring consistency of computation provokes a wasteful arms-race of computing resources. Despite the wasted cycles, the scheme has proven insufficient at preventing a concentration of control which could undermine its promise to be "owned by no one", along with its guarantee of fair and consistent computation. Plus, Bitcoin's solution could not scale to accommodate the storage or processing needs of a public science platform.

But these are solvable technical problems. It is unfortunate that the kind of computing Bitcoin pioneered has been given the name "cryptocurrency", and has been associated with all sorts of technofinancial scheming. When you hear "cryptocurrency", don't think of Bitcoin or money at all. Think of Paul Krugman's babysitting co-op. Cryptocurrency applications deal with the problem of organizing people and their resources into a collaborative enterprise by issuing tokens to those who participate and do their part, redeemable for future services from the network. So they will always involve some kind of scrip. But, contra Bitcoin, the scrip need not be the raison d'être of the application. Like the babysitting co-op (and a sensible monetary economy), the rules for issue of scrip can be designed to maximize participation in the network, rather than to reward hoarding and speculation.

The current state of the art is probably best represented by Ethereum. Even there, the art remains in a pretty rudimentary state — it doesn't actually work yet! — but they've made a lot of progress in less than a year. Eventually, and by eventually I mean pretty soon, I think we'll have figured out means of defining public spaces for durable, large scale computing, controlled by dispersed communities rather than firms like Amazon or Google. When we do, social science should move there.

Update History:

  • 17-Oct-2014, 6:40 p.m. PDT: “already well-trodden”; “yet without a credible alternative alternative
  • 25-Oct-2014, 1:40 a.m. PDT: “whose parameters a researcher need needs simply to estimate”; “a determined researcher researchers need rarely be deterred.”; “In practice, that this means”; “as large as schools or and disciplines”; “write code in statical statistical packages”

Scale, progressivity, and socioeconomic cohesion

Today seems to be the day to talk about whether those of us concerned with poverty and inequality should focus on progressive taxation. Edward D. Kleinbard in the New York Times and Cathie Jo Martin and Alexander Hertel-Fernandez at Vox argue that focusing on progressivity can be counterproductive. Jared Bernstein, Matt Bruenig, and Mike Konczal offer responses that examine what “progressivity” really means and offer support for taxing the rich more heavily than the poor. This is an intramural fight. All of these writers presume a shared goal of reducing inequality and increasing socioeconomic cohesion. Me too.

I don’t think we should be very categorical about the question of tax progressivity. We should recognize that, as a political matter, there may be tradeoffs between the scale of benefits and the progressivity of the taxation that helps support them. Reducing inequality requires a large transfers footprint more than it requires steeply increasing tax rates. But, ceteris paribus, increasing tax rates do help. Also, high marginal tax rates may have indirect effects, especially on corporate behavior, that are socially valuable. So we should be willing sometimes to trade tax progressivity for scale, but we should drive a hard bargain.

First, let’s define some terms. As Konczal emphasizes, tax progressivity and the share of taxes paid by rich and poor are very different things. Here’s Lane Kenworthy, defining (italics added):

When those with high incomes pay a larger share of their income in taxes than those with low incomes, we call the tax system “progressive.” When the rich and poor pay a similar share of their incomes, the tax system is termed “proportional.” When the poor pay a larger share than the rich, the tax system is “regressive.”

It’s important to note that even with a very regressive tax system, the share of taxes paid by the rich will nearly always be much more than the share paid by the poor. Suppose we have a two animal economy. Piggy Poor earns only 10 corn kernels while Rooster Rich earns 1000. There is a graduated income tax that taxes 80% of the first 10 kernels and 20% of amounts above 10. Piggy Poor will pay 8 kernels of tax. Rooster Rich will pay (80% × 10) + (20% × 990) = 8 + 198 = 206 kernels. Piggy Poor pays 8/10 = 80% of his income, while Rooster Rich pays 206/1000 = 20.6% of his. This is an extremely regressive tax system! But of the total tax paid (214 kernels), Rooster Rich will have paid 206/214 = 96%, while Piggy Poor will have paid only 4%. That difference in the share of taxes paid reflects not the progressivity of the tax system, but the fact that Rooster Rich’s share of income is 1000/1010 = 99%! Typically, concentration in the share of total taxes paid is much more reflective of the inequality of the income distribution than it is of the progressivity or regressivity of the tax system. Claims that the concentration of the tax take amounts to “progressive taxation” should be met with lamentations about the declining quality of propaganda in this country.
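
The arithmetic of the parable is easy to verify; here are a few lines of Python just to make the shares explicit:

    def tax(income, bracket=10, low_rate=0.80, high_rate=0.20):
        """Graduated tax: 80% on the first 10 kernels, 20% on everything above."""
        return low_rate * min(income, bracket) + high_rate * max(income - bracket, 0)

    piggy, rooster = 10, 1000
    t_piggy, t_rooster = tax(piggy), tax(rooster)
    total_tax = t_piggy + t_rooster

    print(f"Piggy Poor pays {t_piggy:.0f} kernels = {t_piggy / piggy:.0%} of income")
    print(f"Rooster Rich pays {t_rooster:.0f} kernels = {t_rooster / rooster:.1%} of income")
    print(f"Rooster Rich's share of taxes paid: {t_rooster / total_tax:.0%}")
    print(f"Rooster Rich's share of income:     {rooster / (piggy + rooster):.0%}")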

Martin and Hertel-Fernandez offer the following striking graph:

[Figure: graph from Martin and Hertel-Fernandez, 10-Oct-2014]

The OECD data that Konczal cites as the likely source of Martin and Hertel-Fernandez’s claims includes measures of both tax concentration and progressivity. I think Konczal has Martin and Hertel-Fernandez’s number. If the researchers do use a measure of tax share on the axis they have labeled “Household Tax Progressivity”, that’s not so great, particularly since the same source includes two measures intended to capture actual tax progressivity (Table 4.5, Columns A3 and B3). Even if the “right” measure were used, there are devils in details. These are “household taxes” based on an “OECD income distribution questionnaire”. Do they take into account payroll taxes or sales taxes, or only income taxes? This OECD data shows the US tax system to be strongly progressive, but when all sources of tax are measured, Kenworthy finds that the US tax system is in fact roughly proportional. (ht Bruenig) The inverse correlation between tax progressivity and effective, inclusive welfare states is probably weaker than Martin and Hertel-Fernandez suggest with their misspecified graph. If they are capturing anything at all, it is something akin to Ezra Klein’s “doom loop”, that countries very unequal in market income — which almost mechanically become countries with very concentrated tax shares — have welfare states that are unusually poor at mitigating that inequality via taxes and transfers.

Although I think Martin and Hertel-Fernandez are overstating their case, I don’t think they are entirely wrong. US taxation may not be as progressive as it appears because of sales and payroll taxes, but European social democracies have payroll taxes too, and very large, probably regressive VATs. Martin and Hertel-Fernandez are trying to persuade us of the “paradox of redistribution”, which we’ve seen before. Universal taxation for universal benefits seems to work a lot better at building cohesive societies than taxes targeted at the rich that finance transfers to the poor, because universality engenders political support and therefore scale. And it is scale that matters most of all. Neither taxes nor benefits actually need to be progressive.

Let’s try a thought experiment. Imagine a program with regressive payouts. It pays low earners a poverty-line income, top earners 100 times the poverty line, and everyone else something in between, all financed with a 100% flat income tax. Despite the extreme regressivity of this program’s payouts and the nonprogressivity of its funding, this program would reduce inequality in America. After taxes and transfers, no one would have a below poverty income, and no one would earn more than a couple of million dollars a year. Scale down this program by half — take a flat tax of 50% of income, distribute the proceeds in the same relative proportions — and the program would still reduce inequality, but by somewhat less. The after-transfer income distribution would be an average of the very unequal market distribution and the less unequal payout distribution, yielding something less unequal than the market distribution alone. Even if the financing of this program were moderately regressive, it would still reduce overall inequality.
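
Here’s a quick simulation of that thought experiment, with a made-up lognormal “market” income distribution standing in for the real one (the particular parameters are arbitrary; only the direction of the result matters):

    import numpy as np

    def gini(x):
        """Gini coefficient of a nonnegative income array."""
        x = np.sort(x)
        n = len(x)
        cum = np.cumsum(x)
        return (n + 1 - 2 * (cum / cum[-1]).sum()) / n

    rng = np.random.default_rng(0)
    n = 100_000
    market = rng.lognormal(mean=10.8, sigma=1.0, size=n)   # made-up market incomes

    # Regressive payout weights: the top earner's payout is 100x the bottom's,
    # interpolated by income rank; total payouts exhaust the tax revenue.
    rank = np.argsort(np.argsort(market)) / (n - 1)
    weights = 1 + 99 * rank

    def after_program(tax_rate):
        revenue = tax_rate * market.sum()              # flat tax on all income
        payouts = weights * revenue / weights.sum()    # same relative proportions
        return (1 - tax_rate) * market + payouts

    print(f"market income Gini:       {gini(market):.3f}")
    print(f"full program (100% tax):  {gini(after_program(1.0)):.3f}")
    print(f"half program (50% tax):   {gini(after_program(0.5)):.3f}")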

How can a regressively financed program making regressive payouts reduce inequality? Easily, because no (overt) public sector program would ever offer net payouts as phenomenally, ridiculously concentrated as so-called “market income”. For a real-world example, consider Social Security. It is regressively financed: thanks to the cap on income subject to Social Security taxes, very high income people pay a smaller fraction of their wages into the program than modest and moderate earners. Payouts tend to covary with income: People getting the maximum Social Security payout typically have other sources of income and wealth (dividends and interest on savings), while people getting minimal payments often lack any supplemental income at all. Despite all this, Social Security helps to reduce inequality and poverty in America.

Eagle-eyed readers may complain that after making so big a deal of getting the definition of “tax progressivity” right, I’ve used “payout progressivity” informally and inconsistently with the first definition. True, true, bad me! I insisted on measuring tax progressivity based on pay-ins as a fraction of income, while I’m calling payouts “regressive” if they increase with the payee’s income, irrespective of how large they are as a percentage of payee income. If we adopt a consistent definition, then many programs have payouts that are nearly infinitely progressive. When other income is zero, how large a percentage of other income is a small Social Security check? Sometimes, to avoid these issues, the colorful terms “Robin Hood” and “Matthew” are used. “Robin Hood” programs give more to the poor than the rich; “Matthew” programs are named for the Matthew Effect — “For unto every one that hath shall be given, and he shall have abundance: but from him that hath not shall be taken even that which he hath.” Programs that give the same amount to everyone, like a UBI, are described less colorfully as “Beveridge”, after the recommendations of the Beveridge Report. The “paradox of redistribution” is that welfare states with a lot of Matthew-y programs, that pay more to the rich and may not be so progressively financed, tend to garner political support from the affluent “middle class” as well as the working class, and are able to scale to an effective size. Robin-Hood-y programs, on the other hand, tend to stay small, because they pit the poor against both the moderately affluent and the truly rich, which is a hard coalition to beat.

So, should progressives give up on progressivity and support modifying programs to emulate stronger welfare states with less progressive finance and more Matthew-y, income-covarying payouts? Of course not. That would be cargo-cultish and dumb. The correlation between lower progressivity and effective welfare states is the product of an independent third cause, scale. In developed countries, the primary determinant of socioeconomic cohesiveness (reduced inequality and poverty) is the size of the transfer state, full stop. Progressives should push for a large transfer state, and concede progressivity — either in finance or in payouts — only in exchange for greater scale. Conceding progressivity without an increase in scale is just losing. As “top inequality” increases, the political need to trade away progressivity in order to achieve program scale diminishes, because the objective circumstances of the rich and erstwhile middle class diverge.

Does this focus on scale mean progressives must be for “big government”? Not at all. Matt Bruenig has written this best. The size of the transfer state is not the size of the government. When the government arranges cash transfers, it recruits no real resources into projects wasteful or valuable. It builds nothing and squanders nothing. It has no direct economic cost at all (besides a de minimis cost of administration). Cash transfer programs may have indirect costs. The taxes that finance them may alter behavior counterproductively and so cause “deadweight losses”. But the programs also have indirect benefits, in utilitarian, communitarian, and macroeconomic terms. That, after all, is why we do them. Regardless, they do not “crowd out” use of any real economic resources.

Controversies surrounding the scope of government should be distinguished from discussions of the scale of the transfer state. A large transfer state can be consistent with “big government”, where the state provides a wide array of benefits “in-kind”, organizing and mobilizing real resources into the production of those benefits. A large transfer state can be consistent with “small government”, a libertarian’s “night watchman state” augmented by a lot of taxing and check-writing. As recent UBI squabbling reminds us, there is a great deal of disagreement on the contemporary left over what the scope of central government should be, what should be directly produced and provided by the state, what should be devolved to individuals and markets and perhaps local governments. But wherever on that spectrum you stand, if you want a more cohesive society, you should be interested in increasing the scale at which the government acts, whether it directly spends or just sends.

It may sometimes be worth sacrificing progressivity for greater scale. But not easily, and perhaps not permanently. High marginal tax rates at the very top are a good thing for reasons unrelated to any revenue they might raise or programs they might finance. During the postwar period when the US had very high marginal tax rates, American corporations were doing very well, but they behaved quite differently than they do today. The fact that wealthy shareholders and managers had little reason to disgorge the cash to themselves, since it would only be taxed away, arguably encouraged a speculative, long-term perspective by managers and let retained earnings accumulate where other stakeholders might claim it. In modern, orthodox finance, we’d describe all of this behavior as “agency costs”. Empire-building, “skunk-works” projects with no clear ROI, concessions to unions from the firm’s flush coffers, all of these are things mid-20th Century firms did that from a late 20th Century perspective “destroyed shareholder value”. But it’s unclear that these activities destroyed social value. We are better off, not worse off, that AT&T’s monopoly rents were not “returned to shareholders” via buybacks and were instead spent on Bell Labs. The high wages of unionized factory workers supported a thriving middle class economy. But would the concessions to unions that enabled those wages have happened if the alternative of bosses paying out funds to themselves had not been made unattractive by high tax rates? If consumption arms races among the wealthy had not been nipped in the bud by levels of taxation that amounted to an income ceiling? Matt Bruenig points out that, in fact, socioeconomically cohesive countries like Sweden do have pretty high top marginal tax rates, despite the fact that the rich pay a relatively small share of the total tax take. Perhaps that is the equilibrium to aspire to, a world with a lot of tax progressivity that is not politically contentious because so few people pay the top rates. Perhaps it would be best if the people who have risen to the “commanding heights” of the economy, in the private or the public sector, have little incentive to maximize their own (pre-tax) incomes, and so devote the resources they control to other things. In theory, this should be a terrible idea: Without the discipline of the market surely resources would be wasted! But in the real world, I’m not sure history bears out that theory.

Update History:

  • 12-Oct-2014, 7:10 p.m. PDT: “When the governments government arranges cash transfers…”
  • 21-Aug-2020, 3:0e p.m. EDT: Robin-Hood-y programs, on the other hand, tend to stay…