Econometrics, open science, and cryptocurrency

Mark Thoma wrote the wisest two paragraphs you will read about econometrics and empirical statistical research in general:

    You are testing a theory you came up with, but the data are uncooperative and say you are wrong. But instead of accepting that, you tell yourself "My theory is right, I just haven't found the right econometric specification yet. I need to add variables, remove variables, take a log, add an interaction, square a term, do a different correction for misspecification, try a different sample period, etc., etc., etc." Then, after finally digging out that one specification of the econometric model that confirms your hypothesis, you declare victory, write it up, and send it off (somehow never mentioning the intense specification mining that produced the result).

    Too much econometric work proceeds along these lines. Not quite this blatantly, but that is, in effect, what happens in too many cases. I think it is often best to think of econometric results as the best case the researcher could make for a particular theory rather than a true test of the model.

What Thoma is describing here cannot be fixed. Naive theories of statistical analysis presume a known, true model of the world whose parameters a researcher needs simply to estimate. But there is in fact no "true" model of the world, and a moralistic prohibition of the process Thoma describes would freeze almost all empirical work in its tracks. It is the practice of good researchers, not just of charlatans, to explore their data. If you want to make sense of the world, you have to look at it first, and try out various approaches to understanding what the data means. In practice, this means that long before any empirical research is published, its producers have played with lots and lots of potential models. They've examined bivariate correlations, added variables, omitted variables, considered various interactions and functional forms, tried alternative approaches to dealing with missing data and outliers, etc. It takes iterative work, usually, to find even the form of a model that will reasonably describe the space you are investigating. Only if your work is very close to past literature can you expect to be able to stick with a prespecified statistical model, and then you are simply relying upon other researchers' iterative groping.

The first implication of this practice is common knowledge: "statistical significance" never means what it claims to mean. When an effect is claimed to be statistically significant — p < 0.05 — that does not in fact mean there was only a 1 in 20 chance of observing so large an effect by chance. That inference would be valid only if the researcher had estimated a unique, correctly specified model. If you are trying out tens or hundreds of models (which is not far-fetched, given the combinatorics that apply with even a few candidate variables), then even if your data is pure noise you are likely to generate statistically significant results. Statistical significance is a conventionally agreed low bar. If you can't overcome even that after all your exploring, you don't have much of a case. But determined researchers need rarely be deterred.
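
To see how easily this happens, here is a minimal sketch (mine, not anything from Thoma or any actual study): generate data that is pure noise, let each simulated "study" search a modest space of control-variable combinations, and count how often at least one specification clears the p < 0.05 bar for the variable of interest.

```python
# A minimal sketch (not from the original post): simulate many pure-noise
# "studies", let each one search a modest space of control-variable
# combinations, and count how often some specification makes the variable
# of interest look statistically significant.
import itertools
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_obs, n_candidates, n_studies = 100, 8, 200
false_positive_studies = 0

for _ in range(n_studies):
    y = rng.normal(size=n_obs)                  # outcome: pure noise
    X = rng.normal(size=(n_obs, n_candidates))  # candidate regressors: pure noise
    found = False
    # Try x0 alone, then x0 with every set of one or two "controls".
    for k in range(3):
        for controls in itertools.combinations(range(1, n_candidates), k):
            cols = (0,) + controls
            fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
            if fit.pvalues[1] < 0.05:           # p-value on x0
                found = True
                break
        if found:
            break
    false_positive_studies += found

share = false_positive_studies / n_studies
print(f"Pure-noise 'studies' with a significant x0 in some specification: {share:.2f}")
```

Even this toy search, a small fraction of the exploration a real project involves, typically pushes the share of "successful" pure-noise studies noticeably above the nominal five percent.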

Ultimately, what we rely upon when we take empirical social science seriously are the ethics and self-awareness of the people doing the work. The tables that will be published in a journal article or research report represent a tiny slice of a much larger space of potential models researchers will have at least tentatively explored. An ethical researcher asks herself not just whether the table she is publishing meets formalistic validity criteria, but whether it is robust and representative of results throughout the reasonable regions of the model space. We have no other control than self-policing. Researchers often include robustness tests in their publications, but those are as flawed as statistical significance. Along whatever dimension robustness is going to be examined, in a large enough space of models there will be some to choose from that will pass. During the peer review process, researchers may be asked to perform robustness checks dreamed up by their reviewers. But those are shots in the dark at best. Smart researchers will have pretty good guesses about what they may be required to do, and can ensure they are prepared.

Most researchers perceive themselves as ethical, and don't knowingly publish bad results. But it's a fine line between taking a hypothesis seriously and imposing a hypothesis on the data. A good researcher should try to find specifications that yield results that conform to her expectations of reasonableness. But in doing so, she may well smuggle in her own hypothesis. So she should then subject those models to careful scrutiny: How weird or nonobvious were these "good" models? Were they rare? Does the effort it took to find them reflect a kind of violation of Occam's razor? Do the specifications that bear out the hypothesis represent a more reasonable description of the world than the specifications that don't?

These are subjective questions. Unsurprisingly, researchers' hypotheses can be affected by their institutional positions and personal worldviews, and those same factors are likely to affect judgment calls about reasonableness, robustness, and representativeness. As Milton Friedman taught us, in social science it's often not clear what is a result and what is an assumption; we can "flip" the model and let a result we believe to be true count as evidence for the usefulness of the reasoning that took us there. Researchers may sincerely believe that the models that bear out their hypothesis also provide useful insight into processes and mechanisms that might not have been obvious to them or others prior to their work. Individually or in groups as large as schools and disciplines, researchers may find a kind of consilience between the form of model they have converged upon, the estimates produced when the model is brought to data, and their own worldviews. Under these circumstances, it is very difficult for an outsider to distinguish a good result from a Rorschach test. And it is very difficult for a challenger, whose worldview may not resonate so well with the model and its results, to weigh in.

Ideally, the check against granting authority to questionable results should be reproduction. Replication is the first, simplest application of reproduction. By replicating work, we verify that a model has been correctly brought to the data, and yields the expected results. Replication is a guard against error or fraud, and can be a partial test of validity if we bring new data to the model. But replication alone is insufficient to resolve questions of model choice. To really examine empirical work, a reviewer needs to make an independent exploration of the potential model space, and ask whether the important results are robust to other choices about how to organize, prepare, and analyze the data. Do similarly plausible, equally robust specifications exist that would challenge the published result, or is the result a consistent presence, rarely contradicted unless plainly unreasonable specifications are imposed? It may well be that alternative results are unrankable: under one family of reasonable choices, one result is regularly and consistently borne out, while under another, equally reasonable region of the model space, a different result appears. One can say that neither result, then, deserves very much authority, and neither should be dismissed. More likely, the argument would shift to questions about which set of modeling choices is superior, and we realize that we do not face an empirical question after all, but a theoretical one.
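
One concrete way to make that independent exploration routine is a "specification curve": estimate the effect of interest under every combination of a few defensible analytic choices and look at the whole distribution of estimates rather than one table. Below is a minimal sketch along those lines; the column names (outcome, treatment, age, income, region) are hypothetical placeholders, and the particular choices are illustrative, not anyone's actual analysis.

```python
# A minimal "specification curve" sketch with hypothetical column names
# (outcome, treatment, age, income, region): estimate the same effect under
# every combination of a few defensible analytic choices, then inspect the
# whole distribution of estimates instead of a single published table.
import itertools
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def specification_curve(df, outcome="outcome", treatment="treatment"):
    control_sets = [[], ["age"], ["age", "income"], ["age", "income", "region"]]
    outlier_rules = {
        "keep_all": lambda d: d,
        "trim_1pct": lambda d: d[d[outcome].between(d[outcome].quantile(0.01),
                                                    d[outcome].quantile(0.99))],
    }
    transforms = {"level": outcome, "log": f"np.log({outcome})"}  # log assumes outcome > 0

    rows = []
    for controls, (o_name, o_rule), (t_name, lhs) in itertools.product(
            control_sets, outlier_rules.items(), transforms.items()):
        formula = lhs + " ~ " + " + ".join([treatment] + controls)
        fit = smf.ols(formula, data=o_rule(df)).fit()
        rows.append({"controls": "+".join(controls) or "none",
                     "outliers": o_name,
                     "transform": t_name,
                     "estimate": fit.params[treatment],
                     "p_value": fit.pvalues[treatment]})
    # Sorting by the estimate gives the familiar specification-curve view.
    return pd.DataFrame(rows).sort_values("estimate").reset_index(drop=True)
```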

Reproduction is too rare in practice to serve as a sufficient check on misbegotten authority. Social science research is a high-cost endeavor. Theoretically, any kid on a computer should be able to challenge any Nobelist's paper by downloading some data and running R or something. Theoretically any kid on a computer should be able to write an operating system too. In practice, data is often hard to find and expensive, the technical ability required to organize, conceive, and perform alternative analyses is uncommon, and the distribution of those skills is not orthogonal to the distribution of worldviews and institutional positions. Empirical work is time-consuming, and revisiting already trodden ground is not well rewarded. For skilled researchers, reproducing other people's work to the point where alternative analyses can be explored entails a large opportunity cost.

But social science research has high stakes. It may serve to guide — or at least justify — policy. The people who have an interest in a skeptical vetting of research may not have the resources to credibly offer one. The inherent subjectivity and discretion that accompanies so-called empirical research means that the worldview and interests of the original researchers may have crept in, yet without a credible alternative, even biased research wins.

One way to remedy this, at least partially, would be to reduce the difficulty of reproducing an analysis. It has become more common for researchers to make available their data and sometimes even the code by which they have performed an empirical analysis. That is commendable and necessary, but I think we can do much better. Right now, the architecture of social science is atomized and isolated. Individual researchers organize data into desktop files or private databases, write code in statistical packages like Stata, SAS, or R, and publish results as tables in PDF files. To run variations on that work, one often literally needs access to the researcher's desktop, or else to reconstruct her desktop on one's own. There is no longer any reason for this. All of the computing, from the storage of raw data, to the transformation of isolated variables into normalized data tables that become the input to statistical models, to the estimation of those models, can and should be specified and performed in a public space. Conceptually, the tables and graphs at the heart of a research paper should be generated "live" when a reader views them. (If nothing has changed, cached versions can be provided.) The reader of an article ought to be able to generate sharable appendices by modifying the authors' specifications. A dead piece of paper, or a PDF file for that matter, should not be an acceptable way to present research.
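
As a sketch of what "live" research outputs might look like, here is a toy pipeline in which a paper's headline table is just the output of a deterministic, publicly runnable script. The URL, file, and column names are invented for illustration; the point is that a reader could re-run it unchanged, or pass a different formula and share the result as an appendix.

```python
# A sketch of the "live table" idea, with a hypothetical URL and column names:
# the paper's headline table is just the output of a deterministic, publicly
# runnable pipeline, so a reader can re-run it, or fork it with a modified
# specification, instead of trusting a static PDF.
import hashlib
import io
import urllib.request
import pandas as pd
import statsmodels.formula.api as smf

RAW_DATA_URL = "https://example.org/study/raw_panel.csv"  # hypothetical

def fetch_raw(url=RAW_DATA_URL):
    """Download the raw data and print its hash so provenance is checkable."""
    blob = urllib.request.urlopen(url).read()
    print("sha256 of raw data:", hashlib.sha256(blob).hexdigest())
    return pd.read_csv(io.BytesIO(blob))

def normalize(raw):
    """Cleaning steps that usually live on a private desktop, made explicit."""
    df = raw.dropna(subset=["outcome", "treatment"])
    return df[df["outcome"] > 0]

def table_1(df, formula="outcome ~ treatment + age"):
    """Regenerate the headline table; a reader may pass a different formula."""
    return smf.ols(formula, data=df).fit().summary()

if __name__ == "__main__":
    print(table_1(normalize(fetch_raw())))
```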

Ultimately, we should want to generate a reusable, distributed, permanent, and ever-expanding web of science, including conjectures, verifications, modifications, and refutations, and reanalyses as new data arrives. Social science should become a reified public commons. It should be possible to build new analyses from any stage of old work, by recruiting raw data into new projects, by running alternative models on already cleaned-up or normalized data tables, by using an old model's estimates to generate inputs to simulations or new analyses.

Technologically, this sort of thing is becoming increasingly possible. Depending on your perspective, Bitcoin may be a path to freedom from oppressive central banks, a misconceived and cynically flogged remake of the catastrophic gold standard, or a potentially useful competitor to MasterCard. But under the hood, what's interesting about Bitcoin has nothing to do with any of that. Bitcoin is a prototype of a kind of application whose data and computation are maintained by consensus, owned by no one, and yet reliably operated at a very large scale. Bitcoin is, in my opinion, badly broken. Its solution to the problem of ensuring consistency of computation provokes a wasteful arms race of computing resources. Despite the wasted cycles, the scheme has proven insufficient at preventing a concentration of control, which could undermine its promise to be "owned by no one", along with its guarantee of fair and consistent computation. Plus, Bitcoin's solution could not scale to accommodate the storage or processing needs of a public science platform.
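
For readers unfamiliar with the mechanics, the arms race comes from proof-of-work: participants race to find, by brute force, a nonce whose hash clears a difficulty threshold, and whoever wins gets to extend the shared ledger, so influence scales with hashing power burned. A toy version, which is not Bitcoin's actual code and omits blocks, transactions, and difficulty adjustment, looks like this:

```python
# A toy proof-of-work loop (not Bitcoin's actual code; no blocks, transactions,
# or difficulty adjustment): keep hashing until the digest clears a difficulty
# threshold. Winning more often requires burning more hashing power, which is
# the arms race described above.
import hashlib

def mine(block_data: bytes, difficulty_bits: int = 18):
    """Search nonces until the double-SHA256 hash has roughly `difficulty_bits` leading zero bits."""
    target = 2 ** (256 - difficulty_bits)
    nonce = 0
    while True:
        payload = block_data + nonce.to_bytes(8, "big")
        digest = hashlib.sha256(hashlib.sha256(payload).digest()).hexdigest()
        if int(digest, 16) < target:
            return nonce, digest
        nonce += 1

nonce, digest = mine(b"some batch of transactions")
print(f"found nonce {nonce} after ~2^18 expected attempts; hash {digest[:16]}...")
```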

But these are solvable technical problems. It is unfortunate that the kind of computing Bitcoin pioneered has been given the name "cryptocurrency", and has been associated with all sorts of technofinancial scheming. When you hear "cryptocurrency", don't think of Bitcoin or money at all. Think of Paul Krugman's babysitting co-op. Cryptocurrency applications deal with the problem of organizing people and their resources into a collaborative enterprise by issuing tokens to those who participate and do their part, redeemable for future services from the network. So they will always involve some kind of scrip. But, contra Bitcoin, the scrip need not be the raison d'être of the application. Like the babysitting co-op (and a sensible monetary economy), the rules for issue of scrip can be designed to maximize participation in the network, rather than to reward hoarding and speculation.
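
A toy ledger makes the co-op analogy concrete: members earn scrip by contributing resources to the network (hosting data, running models) and spend it when they consume others'. The class and its issuance rules below are invented purely for illustration; the point is that the issue rule, like the co-op's, can be tuned to keep people participating rather than hoarding.

```python
# A toy scrip ledger in the spirit of the babysitting co-op. Names and issuance
# rules are invented for illustration, not a description of any real protocol.
from collections import defaultdict

class ScripLedger:
    def __init__(self, signup_grant=10):
        # A signup grant keeps scrip plentiful; the co-op's famous fix for its
        # recession was simply to issue more scrip.
        self.balances = defaultdict(int)
        self.signup_grant = signup_grant

    def join(self, member):
        self.balances[member] += self.signup_grant

    def credit_contribution(self, member, units_of_work):
        # Issue new scrip for work done for the network (hosting data, running models).
        self.balances[member] += units_of_work

    def redeem(self, member, provider, units_of_service):
        # Spend scrip to consume another member's resources.
        if self.balances[member] < units_of_service:
            raise ValueError("insufficient scrip")
        self.balances[member] -= units_of_service
        self.balances[provider] += units_of_service

ledger = ScripLedger()
for m in ("alice", "bob"):
    ledger.join(m)
ledger.credit_contribution("alice", 5)   # alice hosts a dataset
ledger.redeem("bob", "alice", 8)         # bob runs a model on alice's node
print(dict(ledger.balances))             # {'alice': 23, 'bob': 2}
```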

The current state of the art is probably best represented by Ethereum. Even there, the art remains in a pretty rudimentary state — it doesn't actually work yet! — but they've made a lot of progress in less than a year. Eventually, and by eventually I mean pretty soon, I think we'll have figured out means of defining public spaces for durable, large scale computing, controlled by dispersed communities rather than firms like Amazon or Google. When we do, social science should move there.

Update History:

  • 17-Oct-2014, 6:40 p.m. PDT: “already well-trodden” → “already trodden”; “yet without a credible alternative alternative” → “yet without a credible alternative”
  • 25-Oct-2014, 1:40 a.m. PDT: “whose parameters a researcher need simply to estimate” → “a researcher needs simply to estimate”; “a determined researcher need rarely be deterred” → “determined researchers need rarely be deterred”; “In practice, that means” → “In practice, this means”; “as large as schools or disciplines” → “as large as schools and disciplines”; “write code in statical packages” → “write code in statistical packages”
 
 

15 Responses to “Econometrics, open science, and cryptocurrency”

  1. Robert Link writes:

    I was right with you on everything in this post until we got to the point about Bitcoin. The guiding principle behind Bitcoin and its ilk is that there should be no trusted third parties in a transaction. But what’s wrong with trusted third parties in this context? We’ve relied on journals, universities, and learned societies to organize science for centuries now, and we’ve done pretty well for it. I don’t see the value in replacing them with a blockchain protocol.

  2. Philip W writes:

    Great, great post. Beginning of wisdom is to acknowledge that social science has an exceptionally difficult road to anything resembling objectivity, and has not traveled very far down that road to this point in its ~150 year history. Transparency is anathema to the profession, which wants very badly to be treated as strictly authoritative RIGHT NOW and must have pretensions to match. I want to say that in the long run transparency must win out as the superior way of achieving real understanding, but I get an uncomfortable utopian feeling. It’s the right fantasy to have, anyway–especially compared to the frankly creepy people who fantasize that economics and political science really have everything figured out already and just need to be empowered. PK’s admiration for Hari Seldon has always been more than a little unsettling to me.

  3. Gator writes:

    Your suggestion that researchers make their data and code public reminded me of the Reinhart and Rogoff Excel fiasco, and made me think that such a thing is unlikely, precisely because it could uncover embarrassing errors.

  4. stone writes:

    In the rest of science, predicting what is going to happen is the crucial mark of when you’re on the right track. Might a comprehensive program of forecast verification also be what economics needs? I realize it’s difficult, but if there is any point at all in economics, doesn’t it need to have some predictive power?
    http://directeconomicdemocracy.wordpress.com/2014/05/05/comprehensive-forecast-verification-as-a-way-to-hold-macroeconomics-to-account/

  5. Nice article from Mark Thoma, and a very good follow-up from you. I can only recall what my econometrics teacher (more methodological than the practical youngsters all around the world) once told me, or what I learnt from him:
    1) Never confuse a hypothesis with a theory.
    2) Remember that hypotheses can be proven or rejected; neither outcome is bad.
    3) Hypotheses should never be confused with conclusions, nor should you start from the conclusions. Conclusions are not goals, but the final outcome of the proven or rejected hypothesis in the econometric study.
    4) A hypothesis is basically an economic relationship to be proven, sustained by a theory that should be made explicit in the research. Changing the hypothesis also means changing the underlying theory.
    5) Econometric results depend on the model, and that model is human-made, so any deficiencies in accuracy, or artificial efficiencies, are caused by the modeler.
    6) A model specification is itself a study, theoretical and practical; it should never be an assumption nor a purely statistical decision.
    7) To get a good specification, you need to make a prior study, a “data study” or preliminary analysis of the data, in which you come to understand the behavior, characteristics, and other particularities of the data you want to use. This study is both numerical and visual (that means graphics).
    8) If there is something wrong in the preliminary results, first go back to the data to check for errors and “weird things”; never automatically adjust the specification.

    There may be more, but in any case, current econometric practice is getting too superficial, thanks to an abundance of teachers who seem to need a more “interesting” way to present econometrics, or who work for the results wanted by the person or institution that employs them. And strictly speaking, rejecting hypotheses is not a good way to get funds or salaries.

    Jose Manuel Martin
    EMECEP Consulting
    http://www.emecep-consultoria.com
    (Lima, PerĂș)

  6. N writes:

    You don’t mention holdback. Half the data should be in escrow somewhere. When the researchers think they have a good model, they commit to publish. If a journal accepts, only then is the model validated against the clean data. The result gets published whether validation passes or not.
    A secure data escrow system is impossible if the researchers are also the data collectors, but it’s hard to detect falsified data in that situation too.

  7. […] Econometrics, open science, and cryptocurrency – interfluidity […]

  8. […] Econometrics, open science, and cryptocurrency Interfluidity […]

  9. […] Randy Waldman has a nice thing about consensus forming applications in his most recent article on open science, econometrics and […]

  10. Anonymous writes:

    […] […]

  11. The goal of a future research infrastructure as described here is a worthy one. Efforts in that direction have been under way for some time, for example in a group of researchers and academic publishers called Force11. Practical implementations in that direction have come from, for example, the work of Prof. Carol Goble and others on implementing scientific workflows and preserving them for the future.

    But, alas, although the idea of implementing the infrastructure as a cryptocurrency-like peer-to-peer network has many advantages, it faces a very difficult problem. As Brian Arthur pointed out, technology markets generally, and P2P networks specifically, have increasing returns to scale. This means that the failure of Bitcoin to prevent concentration is not some minor problem that can be fixed with a tweak to the protocol; it is a fundamental problem. For more detailed analysis, see my blog post Economies of Scale in Peer-to-Peer Networks.

  12. Maynard Handley writes:

    “… Half the data should be in escrow somewhere. When the researchers think they have a good model, they commit to publish…”

    Something like this is ALREADY done in physics. (Perhaps a case where physics envy might actually USEFULLY be engaged in by the social sciences…)
    Here’s a brief description of how CERN does it.
    http://cms.web.cern.ch/news/blinding-and-unblinding-analyses

    Of course the technical details would differ depending on the discipline and the analysis, but the social structures behind the procedure should probably be universal.

  13. GSo writes:

    I think Feynman summed this up very well some years ago, talking about
    “a specific, extra type of integrity that is not lying, but bending over backwards to show how you’re maybe wrong, that you ought to have when acting as a scientist. And this is our responsibility as scientists, certainly to other scientists, and I think to laymen”

    http://neurotheory.columbia.edu/~ken/cargo_cult.html

  14. JW Mason writes:

    You’re right about the problem. To the credit of the economics profession, major journals do now require that every article have an online appendix that includes all data and all code used to generate the results. I don’t know how common this is in other social sciences.

    Of course, that doesn’t address the larger issue. One solution there, I think, is a heightened suspicion of “sophisticated” statistical technique. I sometimes think that graduate econometrics courses — and in fact much of the whole econometrics profession — are value-subtracting in that they encourage/reward use of more complex techniques, which both introduce additional degrees of freedom and make replication more challenging.

    I would love to know more about how the hard sciences avoid these problems. (Or do they?)

  15. […] nice balanced post on the necessity, and pitfalls, of exploratory data analysis. Resonates with Brian’s old post. […]