School of Information Blogs

May 22, 2017

Ph.D. student

hard realism about social structure

Sawyer’s (2000) investigations into the theory of downward causation of social structure are quite subtle. He points out several positions in the sociological debate about social structure:

  • Holists, who believe social structures have real, independent causal powers, sometimes through internalization by individuals.
  • Subjectivists, who believe that social structures are epiphenomenal, reducible to individuals.
  • Interactionists, who see patterns of interaction as primary, not the agents or the structures that may produce the interactions.
  • Hybrid theorists, who see an interplay between social structure and independent individual agency.

I’m most interested at the moment in the holist, subjectivist, and hybrid positions. This is not because I don’t see interaction as essential; I do. But I think that recognizing that interactions are the medium, if not the material, of social life does not answer the question of why social interactions seem to be structured the way they do. Or, more positively, the interactionist contributes to the discussion by opening up process theory and generative epistemology (cf. Cederman, 2005) as a way of getting at an answer to that question. It is up to us to take it from there.

The subjectivists, in positing only observable individuals and their actions, have Occam’s Razor on their side. To posit the unobservable entities of social forms is to “multiply entities unnecessarily”. This perhaps accounts for the durability of the subjectivist thesis. The scientific burden of proof is, in a significant sense, on the holist or hybrid theorist to show that positing social forms and structures offers in explanatory power what it lacks in parsimony.

Another reason for the subjectivist position is that it does ideological work. Margaret Thatcher famously said, “There is no such thing as society”, as a condemnation of the socialist government that she would dismantle in favor of free markets. Thatcher was highly influenced by Friedrich Hayek, who argued (Hayek, 1945) that free markets lead to more intelligent outcomes than planned economies because they are better at using local and distributed information in society. Whatever you think of the political consequences of his work, Hayek was an early theorist of society as a system of multiple agents with “bounded rationality”. A model similar to Hayek’s is developed and tested by Epstein and Axtell (1996).

On the other hand, our natural use of language, our social expectations, and our legal system all weigh in favor of social forms, institutions, and other structures. These are, naturally, all “socially constructed”, but these social constructs undeniably reproduce themselves; otherwise, they would not continue to exist. This process of self-reproduction is named autopoiesis (from ‘auto-’ (self-) and ‘-poiesis’ (-creation)) by Maturana and Varela (1991). The concept has been taken up by Luhmann (1995) in social theory and by Brier (2008) in Library and Information Sciences (LIS). As these later theorists argue, the phenomenon of language itself can be explained only as an autopoietic social system.

There is a gap between the positions of autopoiesis theorists and the sociological holists discussed by Sawyer. Autopoiesis is, in Varela’s formulation, a general phenomenon about the organization of matter. It is, in his view, the principle of organization of life on the cellular level.

Contrast this with the ‘holist’ social theorist who sees social structures as being reproduced by the “internalization” of the structure by the constituent agents. Social structures, in this view, depend at least in part on their being understood or “known” by the agents participating in them. This implies that the agents have certain cognitive powers that, e.g., strands of organic chemicals do not. [Sawyer refers to Castelfranchi, 1998 on this point; I have yet to read it.] Arguably, social norms are only norms because they are understood by the agents involved. This is the position of Habermas (1985), for example, whose whole ethical theory depends on the rational acceptance of norms in free discussion. (This is the legacy of Immanuel Kant.)

What I am arguing is that there is, in actuality, another position on the emergence of social structure, not identified by Sawyer (2000): one in which social structure does not depend on internalization but nevertheless has causal reality. Social forms may arise from individual activity in the same way that biological organization arises from unconscious chemical interactions. I suppose this is a form of holism.

I’d like to call this view the “hard realist” view of social structure, to contrast it with “soft realist” views that depend on internalization by agents. I don’t mean the term aggressively; I use it because I have a very concrete distinction in mind. If social structure depends on internalization by agents, then (by definition, really) there exists an intervention on the beliefs of agents that could dissolve the social structure and transform it into something else. For example, a left-wing anarchist might argue that money only has value because we all believe it has value. If we were all to just stop valuing money, we could have a free and equal society at last.

If social structures exist regardless of whether social actors recognize them, then the story is quite different. This means (by definition) that interventions on the beliefs of actors will not dissolve the structure. In other words, just because something is a social construct does not mean that it can be socially deconstructed by a process of reversal. Some social structures may truly have a life of their own. (I would expect this to be truer the more we delegate social moderation to technology.)

This story is complicated by the fact that social actors vary in their cognitive capacities, and this heterogeneity can materially affect social outcomes. Axtell and Epstein (2006) have a model of the formation of retirement age norms in which a small minority of actors make their decision rationally based on expected outcomes and the rest adopt the behavior of the majority of their neighbors. This results in dynamic adjustments to behavior that, under certain parameters, make the society as a whole look more individually rational than its members are in fact. This is encouraging to those of us who sometimes feel that our attempts to understand the world rationally are insignificant in the face of social inertia more broadly.
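The mechanism is easy to sketch. The Python below is a deliberately simplified stand-in, not Axtell and Epstein's model: the two candidate retirement ages, the random sample of five others standing in for each agent's social network, the synchronous updating, and every parameter value are assumptions made for illustration.

```python
import random


def simulate_norm(n=500, rational_frac=0.3, sample_size=5,
                  old_age=62, new_age=65, steps=50, seed=1):
    """Share of agents retiring at new_age after `steps` rounds of imitation."""
    rng = random.Random(seed)
    # Rational agents always pick new_age, assumed here to be the optimal choice.
    rational = [rng.random() < rational_frac for _ in range(n)]
    # Everyone else starts out following the old norm.
    ages = [new_age if r else old_age for r in rational]
    for _ in range(steps):
        updated = ages[:]  # synchronous update: read the old state, write a new one
        for i in range(n):
            if rational[i]:
                continue
            # Imitators observe a random sample of the population (a stand-in
            # for their social network) and copy the majority behavior.
            sample = rng.sample(ages, sample_size)
            majority_new = sample.count(new_age) > sample_size // 2
            updated[i] = new_age if majority_new else old_age
        ages = updated
    return ages.count(new_age) / n


print(simulate_norm(rational_frac=0.3))
print(simulate_norm(rational_frac=0.05))
```

Comparing the two runs shows the parameter dependence: with 30% rational agents the new behavior typically cascades to nearly everyone, so the population behaves far more "rationally" than the share of agents who actually reason about it; with 5% the old norm persists.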

But it also makes it difficult to judge empirically whether a “soft realist” or “hard realist” view of social structure is more accurate. For that matter, it also makes the empirical distinction between the holist and subjectivist positions difficult. Surveying individuals about their perceptions of their social world will tell you nothing about hard realist social structures. If there are heterogeneous views about what the social order actually is, that may or may not affect the actual social structure that’s there. Real social structure may indeed create systematic blindnesses in the agents that compose it.

Therefore, the only way to test for hard realist social structure is to look at aggregate social behavior (perhaps on the interactionist level of analysis) and identify where its regularities can be attributed to generative mechanisms. Multi-agent systems and complex adaptive systems look like the primary tools in the toolkit for modeling these kinds of dynamics. So far I haven’t seen an adequate discussion of how these theories can be empirically confirmed using real data.
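One common way to connect a generative model to aggregate data, sketched here under strong assumptions, is calibration: simulate the model across candidate parameter values and keep the values whose output best matches the observed series. The diffusion rule below (Bass-style, with a hypothetical "independent adoption" rate `p` and "imitation" rate `q`), the parameter grid, and the synthetic "observed" series are all illustrative. Real data would be noisy, the fit would not be exact, and a good fit shows only that a mechanism is sufficient to generate the pattern, not that it is the one at work.

```python
import itertools


def diffusion_curve(p, q, steps=20):
    """Deterministic Bass-style adoption: share of adopters after each step."""
    adopted = 0.0
    curve = []
    for _ in range(steps):
        # Non-adopters adopt independently (rate p) or by imitation (rate q * adopted).
        adopted += (1.0 - adopted) * (p + q * adopted)
        curve.append(adopted)
    return curve


def calibrate(observed, p_grid, q_grid):
    """Grid search for the (p, q) pair minimizing squared error against observed data."""
    def loss(params):
        simulated = diffusion_curve(*params, steps=len(observed))
        return sum((s - o) ** 2 for s, o in zip(simulated, observed))
    return min(itertools.product(p_grid, q_grid), key=loss)


# Hypothetical "observed" data generated from known parameters, to check
# that the procedure recovers them.
observed = diffusion_curve(0.03, 0.4)
grid = [round(0.01 * k, 2) for k in range(1, 51)]  # 0.01 .. 0.50
print(calibrate(observed, grid, grid))
```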


Axtell, Robert L., and Joshua M. Epstein. “Coordination in transient social networks: An agent-based computational model of the timing of retirement.” Generative social science: Studies in agent-based computational modeling (2006): 146.

Brier, Søren. Cybersemiotics: Why information is not enough!. University of Toronto Press, 2008.

Castelfranchi, Cristiano. “Simulating with cognitive agents: The importance of cognitive emergence.” International Workshop on Multi-Agent Systems and Agent-Based Simulation. Springer Berlin Heidelberg, 1998.

Cederman, Lars-Erik. “Computational models of social forms: Advancing generative process theory.” American Journal of Sociology 110.4 (2005): 864-893.

Epstein, Joshua M., and Robert Axtell. Growing artificial societies: social science from the bottom up. Brookings Institution Press, 1996.

Habermas, Jürgen, and Thomas McCarthy. The theory of communicative action. Vol. 2. Beacon Press, 1985.

Hayek, Friedrich August. “The use of knowledge in society.” The American economic review (1945): 519-530.

Luhmann, Niklas. Social systems. Stanford University Press, 1995.

Maturana, Humberto R., and Francisco J. Varela. Autopoiesis and cognition: The realization of the living. Vol. 42. Springer Science & Business Media, 1991.

Sawyer, R. Keith. “Simulating emergence and downward causation in small groups.” Multi-agent-based simulation. Springer Berlin Heidelberg, 2000. 49-67.

by Sebastian Benthall at May 22, 2017 03:16 PM

May 19, 2017

MIMS 2016

Why is it asking for gender and age? Not sure how that relates to recipes, but I also don’t cook…


I do think that if gender is absolutely necessary, it should give gender neutral options as well.

by Andrew Huang at May 19, 2017 01:30 AM

May 18, 2017

Ph.D. student

WannaCry as an example of the insecurity of legacy systems

CLTC’s Steve Weber and Betsy Cooper have written an Op-Ed about the recent WannaCry epidemic. The purpose of the article is clear: to argue that a possible future scenario CLTC developed in 2015, in which digital technologies become generally distrusted rather than trusted, is relevant and prescient. They then go on to elaborate on this scenario.

The problem with the Op-Ed is that the connection between WannaCry and this scenario is spurious. Here’s how they make the connection:

The latest widespread ransomware attack, which has locked up computers in nearly 150 countries, has rightfully captured the world’s attention. But the focus shouldn’t be on the scale of the attack and the immediate harm it is causing, or even on the source of the software code that enabled it (a previous attack against the National Security Agency). What’s most important is that British doctors have reverted to pen and paper in the wake of the attacks. They’ve given up on insecure digital technologies in favor of secure but inconvenient analog ones.

This “back to analog” moment isn’t just a knee-jerk, stopgap reaction to a short-term problem. It’s a rational response to our increasingly insecure internet, and we are going to see more of it ahead.

If you look at the article that they link to from The Register, which is the only empirical evidence they use to make their case, it does indeed reference the use of pen and paper by doctors.

Doctors have been reduced to using pen and paper, and closing A&E to non-critical patients, amid the tech blackout. Ambulances have been redirected to other hospitals, and operations canceled.

There is a disconnect between what the article says and what Weber and Cooper are telling us. The article is quite clear that doctors are using pen and paper amid the tech blackout. In other words, doctors are using pen and paper because their computers are currently locked up by ransomware.

Does that mean that “They’ve given up on insecure digital technologies in favor of secure but inconvenient analog ones.”? No. It means that since they are waiting to be able to use their computers again, they have no other recourse but to use pen and paper. Does the evidence warrant the claim that “This “back to analog” moment isn’t just a knee-jerk, stopgap reaction to a short-term problem. It’s a rational response to our increasingly insecure internet, and we are going to see more of it ahead.” No, not at all.

In their eagerness to show the relevance of their scenario, Weber and Cooper rush so quickly to say where the focus should be (on CLTC’s future scenario planning) that they ignore everything specific to WannaCry, most of which does not help their case. For example, the vulnerability exploited by WannaCry had been publicly known for two months before the attack, and Microsoft had already published a patch for it. The systems that were still vulnerable either had not applied the software update or were running an unsupported older version of Windows.

This paints a totally different picture of the problem than the one Weber and Cooper provide. It’s not that “new” internet infrastructure is insecure and “old” technologies are proven. Much of computing and the internet is already “old”. But there’s a life cycle to technology. “New” systems are more resilient (able to adapt to an attack or a discovered vulnerability) and are smaller targets. Older legacy systems with a large installed base, like Windows 7, become more globally vulnerable if their weaknesses are discovered and not addressed. And if they are in widespread use, they present a bigger target.

This isn’t just a problem for Windows. In this research paper, we show how similar principles are at work in the Python ecosystem. The riskiest projects are precisely those that are old, assumed to be secure, but no longer being actively maintained while the technical environment changes around them. The evidence of the WannaCry case further supports this view.
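A toy version of that risk profile, for illustration only: flag packages that are old, stale, and widely depended upon. The `Package` fields, the thresholds, and the records below are hypothetical; this is not the method or the data of the paper.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Package:
    name: str
    first_release: date
    last_release: date
    dependents: int  # number of other projects that rely on this one


def legacy_risk(pkg, today, min_age_years=5, stale_years=2, min_dependents=100):
    """Heuristic flag: old, no longer updated, and widely relied upon."""
    age = (today - pkg.first_release).days / 365.25
    staleness = (today - pkg.last_release).days / 365.25
    return (age >= min_age_years
            and staleness >= stale_years
            and pkg.dependents >= min_dependents)


today = date(2017, 5, 18)
packages = [  # hypothetical records
    Package("old_unmaintained", date(2008, 1, 1), date(2013, 6, 1), 5000),
    Package("old_maintained", date(2008, 1, 1), date(2017, 4, 1), 5000),
    Package("new_unmaintained", date(2015, 1, 1), date(2015, 3, 1), 10),
]
print([p.name for p in packages if legacy_risk(p, today)])
```

Only the first record is flagged: the second is old but still maintained, and the third is unmaintained but young and little used.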

by Sebastian Benthall at May 18, 2017 02:20 PM

May 17, 2017

Ph.D. student

Sawyer on downward causation in social systems

The work of R. Keith Sawyer (2000) is another example of computational social science literature that I wish I had encountered ten years ago. Sawyer’s work from the early 2000s is about the connections between sociological theory and multi-agent simulations (MAS).

Sawyer uses an example of an improvisational theater skit to demonstrate how emergence and downward causation work in a small group setting. Two actors in the skit exchange maybe ten lines, each building on the expectations set by the prior actions. The first line establishes the scene is a store, and one of the actors is the owner. The second actor approaches; the first greets her as if she is a customer. She acts in a childlike way and speaks haltingly, establishing that she needs assistance.

What changes in each step of the dialogue is the shared “frame” (in Sawyer’s usage) which defines the relationships and setting of the activity. Perhaps because it is improvisational theater, the frame is carefully shared between the actors. The “Yes, And…” rule applies and nobody is contradicted. This creates the illusion of a social reality, shared by the audience.

Reading this resonated with other reading and thinking I’ve done on ideology. I think about situations where I’ve been among people with a shared vision of the world, or where that vision of the world has been contested. Much of what is studied as framing in media studies is about codifying the relations between actors and the interpretation of actions.

Surely, for some groups to survive, they must maintain a shared frame among their members. This both provides a guide for collective action and also a motivation for cohesion. An example is an activist group at a protest. If one doesn’t share some kind of frame about the relationships between certain actors and the strategies being used, it doesn’t make sense to be part of that protest. The same is true for some (but maybe not all) academic disciplines. A shared social subtext, the frame, binds together members of the discipline and gives activity within it meaning. It also motivates the formation of boundaries.

I suppose the reification of Weird Twitter was an example of a viral framing. Or should I say enframing?! (Heidegger joke).

Getting back to Sawyer, his focus is on a particularly thorny aspect of social theory, the status of social structures and their causal efficacy. How do macro- social forms emerge from individual actors (or actions), and how do those macro- forms have micro- influence over individuals (if they do at all)? Broadly speaking in terms of theoretical poles, there are historically holists, like Durkheim and Parsons, who maintain that social structures are real and have causal power through, in one prominent variation, the internalization of the structure by individuals; subjectivists, like Max Weber, who see social structure as epiphenomenal and reduce it to individual subjective states; and interactionists, who focus on the symbolic interactions between agents and the patterns of activity. There are also hybrid theories that combine two or more of these views, most notably Giddens, who combines holist and subjectivist positions in his theory of structuration.

After explaining all this very clearly and succinctly, he goes on to talk about which paradigms of agent based modeling correspond to which classes of sociological theory.


Sawyer, R. Keith. “Simulating emergence and downward causation in small groups.” Multi-agent-based simulation. Springer Berlin Heidelberg, 2000. 49-67.

by Sebastian Benthall at May 17, 2017 07:06 PM

May 16, 2017

Ph.D. student

Similarities between the cognitive science/AI and complex systems/MAS fields

One of the things that made the research traditions of cognitive science and artificial intelligence so great was the duality between them.

Cognitive science tried to understand the mind at the same time that artificial intelligence tried to discover methods for reproducing the functions of cognition artificially. Artificial intelligence techniques became hypotheses for how the mind worked, and empirically confirmed theories of how the mind worked inspired artificial intelligence techniques.

There was a lot of criticism of these fields at one point. Writers like Hubert Dreyfus, Lucy Suchman, and Winograd and Flores especially heavily critiqued one paradigm that’s now called “Good Old Fashioned AI”: the kind of AI that used static, explicit representations of the world instead of machine learning.

That was a really long time ago and now machine learning and cognitive psychology (including cognitive neuroscience) are in happy conversation, with much more successful models of learning that by and large have absorbed the critiques of earlier times.

Some people think that these old critiques still apply to modern methods in AI. Isn’t AI still AI? I believe the main confusion is that lots of people don’t know that “computable” means something very precisely mathematical: a function is computable if it is a partial recursive function (equivalently, if some Turing machine computes it). It just so happens that computers, the devices we know and love, can compute any computable function, given enough time and memory.
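To make "partial recursive" concrete: the class is built from simple base functions by composition, primitive recursion, and the unbounded minimization (μ) operator, and it is μ that makes functions partial, since its search may never end. The Python below is an informal sketch of primitive recursion and μ; the helper names are illustrative, and Python loops stand in for the formal definitions.

```python
from itertools import count


def primitive_recursion(base, step):
    """Build h with h(0, *xs) = base(*xs) and h(n + 1, *xs) = step(n, h(n, *xs), *xs)."""
    def h(n, *xs):
        acc = base(*xs)
        for i in range(n):
            acc = step(i, acc, *xs)
        return acc
    return h


def mu(pred):
    """Unbounded search: the least n >= 0 with pred(n, *xs) true.

    If no such n exists the loop never terminates, so the resulting
    function is undefined (partial) at those arguments.
    """
    def f(*xs):
        for n in count():
            if pred(n, *xs):
                return n
    return f


# Addition and multiplication by primitive recursion on the first argument;
# `acc + 1` plays the role of the successor function.
add = primitive_recursion(lambda x: x, lambda i, acc, x: acc + 1)
mul = primitive_recursion(lambda x: 0, lambda i, acc, x: add(x, acc))

# Integer square root by minimization: the least n with (n + 1)^2 > x.
isqrt = mu(lambda n, x: mul(n + 1, n + 1) > x)

print(add(3, 4), mul(3, 4), isqrt(10))
```

Something like `mu(lambda n, x: False)` would be defined nowhere: calling it would search forever, which is exactly the partiality in "partial recursive".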

So what changed in AI was not that they were using computation to solve problems, but the way they used computation. Similarly, while there was a period where cognitive psychology tried to model mental processes using a particular kind of computable representation, and these models are now known to be inaccurate, that doesn’t mean that the mind doesn’t perform other forms of computation.

A similar kind of relationship is going on between the study of complex systems, especially complex social systems, and the techniques of multi-agent system modeling. Multi-agent system modeling is, as Epstein clarifies, about generative modeling of social processes that is computable in the mathematical sense, but the fact that physical computers are involved is incidental. Multi-agent systems are supposed to be a more realistic way of modeling agent interactions than, say, neoclassical game theory, in the same way that machine learning is a more realistic way of modeling cognition than GOFAI.

Given that, despite (or, more charitably, because of) the critiques leveled against them, cognitive science and artificial intelligence have developed into widely successful and highly respected fields, we should expect complex systems/multi-agent systems research to follow a similar trajectory.

by Sebastian Benthall at May 16, 2017 09:03 PM

May 13, 2017

Ph.D. student

Varian taught Miller

“The emerging tapestry of complex systems research is being formed by localized individual efforts that are becoming subsumed as part of a greater pattern that holds a beauty and coherence that belies the lack of an omniscient designer.” – John H. Miller and Scott Page, Complex Adaptive Systems: An Introduction to Computational Models of Social Life

I’ve been giving myself an exhilarating crash course in the complex systems literature. Through reading several books and articles on the matter, one gets a sense of the different authors, their biases and emphases. Cederman works carefully to ground his work in a deeper sociological tradition. Epstein is no-nonsense about the connection between mathematicity and computation and social scientific method. Holland is clear that social systems are, in his view, a special case of a more generalized object of scientific study, complex adaptive systems.

Perhaps the greatest challenge to any system, let alone a social system, is self-reference. The capacity of social science as a system (or systems) to examine itself is the subject of much academic debate and public concern. Miller and Page, in their Complex Adaptive Systems: An Introduction to Computational Models of Social Life, begin with their own comment on the emergence of complex systems research using a symbolic vocabulary drawn from their own field. They are conscious of their work as a self-reflective thesis that forms the basis of a broader and systematic education in their field of research.

As somebody who has attempted social scientific investigation of scientific fields (in my case, open source scientific software communities, along with some quasi-ethnographic work), my main emotions when reacting to this literature are an excitement about its awesome potential and a frustration that I have not been studying it sooner. I have been intellectually hungry for this material while studying at Berkeley, but it wasn’t in the zeitgeist of the places I was a part of to take this kind of work as the basis for study.

I think it’s fair to say that most of the professors there have heard of this line of work but are not experts in it. It is a relatively new field and UC Berkeley is a rather conservative institution. To some extent this explains this intellectual gap.

So then I discovered in the acknowledgements section of Miller and Page that Hal Varian taught John H. Miller when both were at the University of Michigan. Hal Varian would then go on to be the first dean of my own department, the School of Information, before joining Google as their “chief economist” in 2002.

Google in 2002. I believe he helped design the advertising auction system, which was the basis of their extraordinary business model.
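The post doesn't describe the auction, so as background only: a mechanism commonly discussed in this context, and one Varian analyzed in his academic work, is the generalized second-price (GSP) auction, in which advertisers are ranked by bid and each winner pays the bid of the advertiser ranked just below. The sketch ignores quality scores and reserve prices, uses hypothetical bidders, and is not a description of Google's production system.

```python
def gsp(bids, num_slots):
    """Generalized second-price auction (no quality scores, no reserve price).

    bids: dict mapping bidder -> per-click bid.
    Returns (bidder, price_per_click) for each slot, top slot first.
    """
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    results = []
    for slot in range(min(num_slots, len(ranked))):
        bidder, _ = ranked[slot]
        # Each winner pays the bid of the advertiser ranked just below,
        # or zero if nobody is below.
        price = ranked[slot + 1][1] if slot + 1 < len(ranked) else 0.0
        results.append((bidder, price))
    return results


print(gsp({"a": 3.0, "b": 2.0, "c": 1.0}, num_slots=2))
```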

I’ve had the opportunity to study a little of Varian’s work. It’s really good. Microeconomic theory pertinent to the information economy. It included theory relevant to information security, as Ross Anderson’s recent piece in Edge discusses. This was highly useful stuff that is at the foundation of the modern information economy, at the very least to the extent that Google is at the foundation of the modern information economy, which it absolutely is.

This leaves me with a few burning questions. The first is why isn’t Varian’s work taught to everybody in the School of Information like it’s the f—ing gospel? Here we have a person who founded the department and by all evidence discovered and articulated knowledge of great importance to any information enterprise or professional. So why is it not part of the core curriculum of a professional school aimed at preparing people for Silicon Valley management jobs?

The second question is why isn’t work descending from Varian’s held in higher esteem at Berkeley? Why is it that neoclassical economic modeling, however useful, is seen as passé, and complex systems work almost unheard of? It does not, it seems to me, reflect any lack of prestige accorded the field nationally. I’m seeing Carnegie Mellon, the University of Michigan, the Brookings Institution, Johns Hopkins, and Princeton all represented among the scholars studying complex systems. Berkeley is precisely the sort of place you would expect this work to flourish. But I know of only one professor there who teaches it with seriousness, a relatively new hire in the Geography department (whom I in no way intend to diminish by writing this post; on the contrary).

One explanation is, to put it bluntly, brain drain. Hal Varian left Berkeley for Google in 2002. That must have been a great move for him. Perhaps he assumed his legacy would be passed on through the education system he helped to found, but that is not exactly what happened. Rather, it seems he left a vacuum for others to fill. Those left to fill it were those with less capacity to join the leadership of the booming technology industry: qualitative researchers. Latourians. The eager ranks of the social studier. (Note the awkwardness of the rendering of ‘Studies’ as a discipline to its practitioner, a studier.) Engineering professors stayed on, and so the university churns out capable engineers who go on to lucrative careers. But something, some part of the rigorous strategic vision, was lost.

That’s a fable, of course. But one has to engage in some kind of sense-making to get through life. I wonder what somebody with a closer relationship to the administration of these institutions would say to any of this. For now, I have my story and know what it is I’m studying.

by Sebastian Benthall at May 13, 2017 02:03 PM

May 12, 2017

Ph.D. student

Hurray! Epstein’s ‘generative’ social science is ‘recursive’ or ‘effectively computable’ social science!

I’m finding recent reading on agent-based modeling profoundly refreshing. I’ve been discovering a number of writers with a level of sanity about social science and computation that I have been trying to find for years.

I’ve dipped into Joshua Epstein’s Generative Social Science: Studies in Agent-Based Computational Modeling (2007), which the author styles as a sequel to the excellent Growing Artificial Societies: Social Science from the Bottom Up (1996). Epstein explains that while the first book was a kind of “call to arms” for generative social science, the later book is a firmer and more mature theoretical argument, in the form of a compilation of research offering generative explanations for a wide variety of phenomena, including such highly pertinent ones as the emergence of social classes and norms.

What is so refreshing about reading this book is, I’ll say it again, the sanity of it.

First, it compares generative social science to other mathematical social sciences that use game theory. It notes that, though there are exceptions, the problem with these fields is their tendency to see explanation in terms of Nash equilibria of unboundedly rational agents. There are lots of interesting social phenomena that are not in such an equilibrium (the phenomenon might itself be a dynamic one), and no social phenomenon worth mentioning has unboundedly rational agents.
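A small illustration of the point about dynamics, using rock-paper-scissors as an assumed example (it is not taken from Epstein): the game has no pure-strategy Nash equilibrium (its only equilibrium mixes all three moves with equal probability), and two myopic agents who each best-respond to the other's last move never settle down. They cycle.

```python
# Best response to each move: the move that beats it.
BEST_RESPONSE = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


def best_response_path(a, b, steps):
    """Both players simultaneously best-respond to the other's previous move."""
    path = [(a, b)]
    for _ in range(steps):
        a, b = BEST_RESPONSE[b], BEST_RESPONSE[a]
        path.append((a, b))
    return path


# Starting from (rock, rock), the profile returns to its start every three steps.
path = best_response_path("rock", "rock", 6)
print(path)
```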

This is a correct critique of naive mathematical economic modeling. But Epstein does not throw the baby out with the bathwater. He’s advocating for agent-based modeling through computer simulations.

This leads him to respond preemptively to objections. One of these responses is “The Computer is not the point”. Yes, computers are powerful tools and simulations in particular are powerful instruments. But it’s not important to the content of the social science that the simulations are being run on computers. That’s incidental. What’s important is that the simulations are fundamentally translatable into mathematical equations. This follows from basic theory of computation: every program computes some mathematical function. Hence, “generative social science” might as well be called “recursive social science” or “effectively computable social science”, he says; he took the term “generative” from Chomsky (i.e. “generative grammar”).
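A toy illustration of why the computer is incidental: a simulation loop and a recursive mathematical definition of the same process, S(0) = x and S(t + 1) = F(S(t)), compute the same function. The transition rule here (three agents on a ring, each passing half its wealth to its neighbor) is an arbitrary assumption chosen only for illustration; exact fractions are used so the two results can be compared for equality.

```python
from fractions import Fraction


def transition(state):
    """One step F: each agent keeps half its wealth and passes half to the next agent (wrapping)."""
    n = len(state)
    return tuple(state[i] / 2 + state[i - 1] / 2 for i in range(n))


def simulate_loop(state, t):
    """Imperative simulation: apply the transition t times."""
    for _ in range(t):
        state = transition(state)
    return state


def simulate_recursive(state, t):
    """The same process as a recursive definition: S(0) = x, S(t + 1) = F(S(t))."""
    return state if t == 0 else transition(simulate_recursive(state, t - 1))


x0 = (Fraction(9), Fraction(0), Fraction(3))
assert simulate_loop(x0, 10) == simulate_recursive(x0, 10)
```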

Compare this with Cederman’s account of ‘generative process theory‘ in sociology. For Cederman, generative process theory is older than the theory of computation. He locates its origin in Simmel, a contemporary of Max Weber. The gist of it is that you try to explain social phenomena by explaining the processes that generate them. This is a triumphant position to take because it doesn’t have all the problems of positivism (theoretical blinders) or phenomenology (relativism).

So there is a sense in which the only thing Epstein is adding on top of this is the claim that proposed generative processes be computable. This is methodologically very open-ended, since computability is a very general mathematical property. Naturally the availability of computers for simulation makes this methodological requirement attractive, just as ‘analytic tractability’ was so important for neoclassical economic theory. But on top of its methodological attractiveness, there is also an ontological attractiveness to the theory. If one accepts what Charles Bennett calls the “physical Church theory” (the idea that the Church-Turing thesis applies not just to formal systems of computation but to all physical systems), then the foundational assumption of Epstein’s generative social science holds not just as a methodological assumption but as an ontological one.

This was all written in 2007, two years before Lazer et al.’s “Life in the network: the coming age of computational social science”. “Computational social science”, in their view, is about the availability of data, the Internet, and the ability to look at society with a new rigor known to the hard sciences. Naturally, this is an important phenomenon. But somehow in the hype this version of computational social science became about the computers, while the underlying scientific ambition to develop a generative theory of society was lost. Computability was an essential feature of the method, but the discovery (or conjecture) that society itself is computation was lost.

But it need not be. Even from a short dip into it, I can tell that Epstein’s Generative Social Science is a fine, accessible book. All we need to do is get everybody to read it so we can all get on the same page.


Cederman, Lars-Erik. “Computational models of social forms: Advancing generative process theory.” American Journal of Sociology 110.4 (2005): 864-893.

Epstein, Joshua M., and Robert L. Axtell. Growing artificial societies: Social science from the bottom up. Brookings Institution Press, 1996.

Epstein, Joshua M. Generative social science: Studies in agent-based computational modeling. Princeton University Press, 2006.

Lazer, David, et al. “Life in the network: the coming age of computational social science.” Science (New York, NY) 323.5915 (2009): 721.

by Sebastian Benthall at May 12, 2017 01:57 AM

May 05, 2017

Ph.D. student

Society as object of Data Science, as Multi-Agent System, and/or Complex Adaptive System

I’m drilling down into theory about the computational modeling of social systems. In just a short amount of time trying to take this task seriously, I’ve already run into some interesting twists.

A word about my trajectory so far: my background, such as it is, has been in cognitive science and artificial intelligence, and then software engineering. For the past several years I have been training to be a ‘data scientist’, and have been successful at that. This means getting a familiarity with machine learning techniques (a subset of AI), the underlying mathematical theory, software tooling, and research methodology to get valuable insights out of unstructured or complex observational data. The data sets I’m interested in are, as a rule, generated by some sort of sociotechnical process.

As much as the techniques of data science lead to a rigorous understanding of the data at hand, something has been missing from my toolbox: an appropriate modeling language for social processes, one that can encode the kinds of implicit theories that my analysis surfaces. Hence the transition I am attempting, from being a data scientist (a diluted term) to being a computational social scientist.

The difficulty, navigating as I am out of a very odd intellectual niche, is acquiring the theoretical vocabulary that bridges the gap between social theory and computational theory. In my training at Berkeley’s School of Information, frequently computational theory and social theory have been assumed to be at odds with each other, applying to distinct domains of inquiry. I gather that this is true elsewhere as well. I have found this division intellectually impossible to swallow myself. So now I am embarking on an independent expedition into the world of computational social theory.

One of the pieces grounding my study, as I’ve mentioned, is Cederman’s work outlining the relationship between generative process theory, multi-agent simulations (MAS), and computational sociology. It is great work for connecting more recent developments in computational sociology with earlier forms of sociology proper. Cederman cites interesting works by R. Keith Sawyer, who goes into depth about how MAS can shed light on some of the key challenges of social theory, above all: how does social order happen? The tricky part here is the relationship between ‘social forms’ at the ‘macro’ level and individual actions at the ‘micro’ level. I disagree with some of Sawyer’s analysis, but I think he does a great job of setting up the problem and its relationship to other sociological work, such as Giddens’s work on structuration.

This is, so far, all theory. As a concrete example of this method, I’ve been reading Epstein and Axtell’s Growing Artificial Societies (1996), which I gather is something of a classic in the field. Their Sugarscape model is very flexible and their simulations shed light on timeless questions of the relationship between economic activity and inequality. Their presentation is also inspiring.
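To make concrete what it means to ‘grow’ a society from the bottom up, here is a heavily simplified sketch in the spirit of Sugarscape’s movement, harvesting, and growback rules. The grid size, the two-peak landscape, the parameter ranges, and every function name are my own assumptions for illustration, not Epstein and Axtell’s specification. Tracking the Gini coefficient of agents’ sugar holdings is one way to watch inequality emerge; I make no claim about what value this toy version produces.

```python
import random

SIZE = 20  # hypothetical grid size (the book uses a larger lattice)


def make_capacity(size):
    """Hypothetical landscape: two sugar peaks, capacity falling off with distance."""
    cap = [[0] * size for _ in range(size)]
    for x in range(size):
        for y in range(size):
            d = min(abs(x - 5) + abs(y - 5), abs(x - 14) + abs(y - 14))
            cap[x][y] = max(0, 4 - d // 3)
    return cap


class Agent:
    def __init__(self, x, y, rng):
        self.x, self.y = x, y
        self.vision = rng.randint(1, 6)      # how far the agent can see
        self.metabolism = rng.randint(1, 4)  # sugar burned per step
        self.sugar = rng.randint(5, 25)      # initial endowment


def step(agents, sugar, cap, rng):
    size = len(sugar)
    rng.shuffle(agents)  # agents act in random order
    occupied = {(a.x, a.y) for a in agents}
    for a in agents:
        # Movement: look along the four lattice directions up to `vision`
        # cells; go to the richest unoccupied site, the nearest one on ties.
        best, best_d, target = sugar[a.x][a.y], 0, (a.x, a.y)
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            for d in range(1, a.vision + 1):
                x, y = (a.x + dx * d) % size, (a.y + dy * d) % size
                if (x, y) in occupied:
                    continue
                s = sugar[x][y]
                if s > best or (s == best and d < best_d):
                    best, best_d, target = s, d, (x, y)
        occupied.discard((a.x, a.y))
        a.x, a.y = target
        occupied.add(target)
        # Harvest everything at the site, then pay the metabolic cost.
        a.sugar += sugar[a.x][a.y] - a.metabolism
        sugar[a.x][a.y] = 0
    agents[:] = [a for a in agents if a.sugar > 0]  # starved agents die
    # Growback: each site regains one unit of sugar, up to its capacity.
    for x in range(size):
        for y in range(size):
            sugar[x][y] = min(cap[x][y], sugar[x][y] + 1)


def gini(values):
    """Gini coefficient of non-negative numbers (0 means perfect equality)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


def run(n_agents=100, steps=50, seed=0):
    rng = random.Random(seed)
    cap = make_capacity(SIZE)
    sugar = [row[:] for row in cap]  # every site starts at full capacity
    cells = [(x, y) for x in range(SIZE) for y in range(SIZE)]
    agents = [Agent(x, y, rng) for x, y in rng.sample(cells, n_agents)]
    for _ in range(steps):
        step(agents, sugar, cap, rng)
    return agents


if __name__ == "__main__":
    survivors = run()
    print(len(survivors), round(gini([a.sugar for a in survivors]), 2))
```

The point of the exercise is that nothing in the rules mentions inequality; any skew in holdings has to be generated by the interaction of heterogeneous agents with an uneven landscape.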

As a rule I’m finding the literature in this space far more accessible than I would have expected. It’s often written in very plain language and depends more on the power of illustration than scientific terminology laden with intellectual authority. What I have encountered so far is, perhaps as a consequence, a little unsatisfying intellectually. But it’s all quite promising.

Based on these leads, I was pointed to Daniel Little’s recent blog post about complexity in social science. He’s quite critical of the bolder claims of these scientists; I’d like to revisit these arguments later. But what was most valuable for me were his references. One was a book by Epstein, who I gather has gone on to do a lot more work since co-authoring Growing Artificial Societies. The book seems to continue in the vein of ‘generative’ modeling shared by Cederman.

But Little references two other sources: John Holland’s Complexity: A Very Short Introduction and Miller and Page’s Complex Adaptive Systems: An Introduction to Computational Models of Social Life.

This is actually a twist. Holland, as well as Miller and Page, appear to be concerned mainly with complex adaptive systems (CAS), which appear to be more general than MAS, at least in Holland’s rendition, which I’m now reading. MAS, Cederman and Sawyer both argue, is inspired in part by Object Oriented Programming (OOP), a programming paradigm that truly does lend itself to certain kinds of simulations. But Holland’s work seems more ambitious, tying CAS back to contributions made by von Neumann and Noam Chomsky. Holland is after a general scientific theory of complexity, not a specific science of modeling social phenomena. Perhaps for this reason his work echoes some work I’ve seen in systems ecology on autocatalysis and Varela’s work on autopoiesis.

Indeed, the thread of Varela may well lead where I’m going. One paper I’ve seen ties computational sociology to Luhmann’s theory of communication; Luhmann drew explicitly on Varela’s idea of autopoiesis. So there is likely a firm foundation for social theory somewhere in here.

These are fruitful investigations. What I’m wondering now is to what extent the literatures on MAS and CAS are divergent.



by Sebastian Benthall at May 05, 2017 02:38 PM

May 03, 2017

Ph.D. student

Responding to Kelkar on the study and politics of artificial intelligence

I quite like Shreeharsh Kelkar’s recent piece on artificial intelligence as a thoughtful comment on the meaning of the term today and what Science and Technology Studies (STS) has to offer the public debate about it.

When AI researchers (and today this includes people who label themselves machine learning researchers, data scientists, even statisticians) debate what AI really means, their purpose is clear: to legitimate particular programs of research. What agenda do we—as non-participants, yet interested bystanders—have in this debate, and how might it be best expressed through boundary work? STS researchers have argued that contemporary AI is best viewed as an assemblage that embodies a reconfigured version of human-machine relations where humans are constructed, through digital interfaces, as flexible inputs and/or supervisors of software programs that in turn perform a wide-variety of small-bore high-intensity computational tasks (involving primarily the processing of large amounts of data and computing statistical similarities). It is this reconfigured assemblage that promises to change our workplaces, rather than any specific technological advance. The STS agenda has been to concentrate on the human labor that makes this assemblage function, and to argue that it is precisely the invisibility of this labor that allows the technology to seem autonomous. And of course, STS scholars have argued that the particular AI assemblage under construction is disproportionately tilted towards benefiting Silicon Valley capitalists.

This is a compelling and well-stated critique. There are just a few ways in which I would contest Kelkar’s argument.

The first is to argue that the political thrust of the critique, that artificial intelligence often involves a reconfiguration of the relationship between labor and machines, is not in general one made firmly by STS scholars. In Kelkar’s own characterization, STS researchers are “non-participants, yet interested bystanders” in the debate about AI. This distancing maneuver brackets off how STS researchers’ own workplaces, as white-collar information workers, are constantly being reconfigured by artificial intelligence, while their funding is tied up with larger forces in the information economy. Therefore there’s always something disingenuous in the STS researcher’s claim to be a bystander, a posture which allows them to be provocative while taking no responsibility for the consequences of the provocation.

In contrast, one could consider the work of Nick Land, who is as far as I can tell not taken seriously by STS researchers though he’s by now a well-known theorist on similar subjects. I haven’t studied Land’s work much myself; I get my understanding mainly through S.C. Hickman’s excellent blogging. I also cannot really speak to Land’s connection with the alt-right; I just don’t know much about it. What I believe Land has done is try to develop social theory that takes into account the troubling relationship between artificial intelligence and labor, articulate that relationship, and become not just a bystander but a participant in the debate.

Essentially what I’m arguing is that if STS researchers don’t activate the authentic political tendency in their own work, which often is either a flavor of accelerationism or a reaction to it, they are being, to use an old phrase for which I can find no immediate substitute, namby pamby. If one has a sophomore-level understanding of Marxist theory and can make the connection between artificial intelligence and capital, it’s not clear what is added by the STS perspective besides a lot of particularization of the theory.

The other criticism of Kelkar’s argument is that it isn’t at all charitable to AI researchers. Somehow it collapses all discussion of AI into a “contemporary” debate with an underlying economic anxiety. Even the AI researchers are, in this narrative, driven by economic anxiety, as their own articulation of their research agenda exists only for its own legitimization. The natural tendency of STS researchers is to see scientists as engaged primarily in rhetorical practices aimed at legitimizing their own research. This tends to obscure any actual technological advances made by scientists. AI researchers are no exception. Let’s assume that artificial intelligence does indeed reconfigure the relationship between labor and capital, rendering much labor invisible and giving the illusion of autonomy to machines capable of intense computational tasks, for the ultimate benefit of Silicon Valley capitalists. STS researchers, at least those characterized by Kelkar, downplay the fact that specific technical advances make that reconfiguration possible, that these accomplishments are expensive and require an enormous amount of technical labor, and moreover that fundamental mathematical principles underlie the development of this technology. But these facts of the matter are extremely important to anybody who is an actual participant in the debates around AI, let alone in the economy that AI is always already reconfiguring.

The claim that AI researchers are mainly legitimizing themselves through the rhetoric of calling their work “artificial intelligence”, as opposed to accomplishing scientific and engineering feats, is totally unhelpful if one is interested in the political consequences of artificial intelligence. In my academic experience, this move is primarily one of projection: STS researchers are constantly engaged in rhetorical practices legitimizing themselves, so why shouldn’t scientists be as well? As long as one is a “bystander”, having no interest in praxis, there is no contest except rhetorical contest for legitimacy of research agendas. This is entirely a product of the effete conditions of academic research disengaged from all reality except courting funding agencies. If STS scholars turned themselves towards the task of legitimizing themselves through actual political gains, their understanding of artificial intelligence would be quite different indeed.

by Sebastian Benthall at May 03, 2017 09:06 PM

May 02, 2017

Ph.D. student

Civil liberties and liberalism in the EU’s General Data Protection Regulation (GDPR)

I’ve been studying the EU’s General Data Protection Regulation and reading the news.

In the news, I’m reading all the time about how the European Union is the last bastion of the “post-war liberal order”, threatened on all sides by ethnonationalism, including from the United States. Some writers have argued that the U.S. has simply moved on from the historical conditions of liberalism, with liberals as a class just having trouble getting over it. Brexit is somehow also framed as an ethnonationalist project. Whether as scapegoat or as actual change agent, Russia is drawing new attention; it has never been liberal, and it justified its action to take back Crimea based on the ethnic Russian-ness of that territory.

Though ethnonationalism is normal in many parts of the world, one thing about it that is upsetting to liberalism is the idea that the nation is rooted in an ethnicity, a form of social collective bound by genetic and family ties, and not in individual autonomy. From here it is a short step to having that ethnicity empowered in its command of the nation-state. And as history teaches us, when states act on behalf of certain ethnicities, they often treat other ethnicities in ways that are, from a liberal perspective, unjust. Among the first things to go are freedoms, especially political freedoms (the kinds of freedoms that lead directly or indirectly to political power).

This is just a preface, not intended in any particular way, to explain why I’m interested in some of the language in the General Data Protection Regulation (GDPR). I’m studying GDPR because I’m studying privacy engineering: how to design technical systems that preserve people’s privacy. For practical reasons this requires some review of the relevant legislation. Compliance with the law is, if nothing else, a business concern, and this makes it relevant to technologists. But the GDPR, one of the strongest privacy regulations on the horizon, is actually thick with political intent that goes well beyond the pragmatic and mundane concerns of technical design. Here is Recital 51; the Recitals discuss the motivation of the regulation and are intended to be used in interpreting the legally binding Articles in the second section (emphasis mine):

(51) Personal data which are, by their nature, particularly sensitive in relation to fundamental rights and freedoms merit specific protection as the context of their processing could create significant risks to the fundamental rights and freedoms.

Those personal data should include personal data revealing racial or ethnic origin, whereby the use of the term ‘racial origin’ in this Regulation does not imply an acceptance by the Union of theories which attempt to determine the existence of separate human races.

The processing of photographs should not systematically be considered to be processing of special categories of personal data as they are covered by the definition of biometric data only when processed through a specific technical means allowing the unique identification or authentication of a natural person.

Such personal data should not be processed, unless processing is allowed in specific cases set out in this Regulation, taking into account that Member States law may lay down specific provisions on data protection in order to adapt the application of the rules of this Regulation for compliance with a legal obligation or for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller.

In addition to the specific requirements for such processing, the general principles and other rules of this Regulation should apply, in particular as regards the conditions for lawful processing.

Derogations from the general prohibition for processing such special categories of personal data should be explicitly provided, inter alia, where the data subject gives his or her explicit consent or in respect of specific needs in particular where the processing is carried out in the course of legitimate activities by certain associations or foundations the purpose of which is to permit the exercise of fundamental freedoms.

It’s not light reading. What I find most significant about Recital 51 is that it explicitly makes the point that data concerning somebody’s racial and ethnic origin are particularly pertinent to “fundamental rights and freedoms” and potential risks to them. This is despite the fact that the EU is denying any theory of racial realism. Recital 51 is in effect saying that race is a social construct but that even though it’s just a social construct it’s so sensitive an issue that processing data about anybody’s race is prima facie seen as creating a risk for their fundamental rights and freedoms. Ethnicity, not denied in the same way as race, is treated similarly.

There are tons of legal exceptions to these prohibitions in the GDPR, and I expect that the full range of normal state activities is allowed once all those exceptions are taken into account. But it is curious that revealing race and ethnic origin is considered dangerous by the EU’s GDPR at the same time that there’s this narrative that ethnonationalists want to break up the EU in order to create states affording special privileges to national ethnicities. What it speaks to, among other things, is the point that the idea of a right to privacy is not politically neutral with respect to these questions of nationalism and globalism, which seem to define the most important dimensions of political difference today.

Assuming I’m right that the GDPR encodes a political liberalism that opposes ethnonationalism, this raises interesting questions about how it will affect geopolitical outcomes once it comes to be enforced. Because of the GDPR’s extra-territorial jurisdiction, it imposes on businesses all over the world policies that respect its laws, even if those businesses operate only partially in the EU. Suppose the EU holds together in some form while in other places some moderate form of ethnonationalism takes over. Would the GDPR and its enforcement be strong enough to normalize liberalism into technical and business design globally, even while ethnonationalist political forces erode civil liberties with respect to the state?

by Sebastian Benthall at May 02, 2017 06:47 PM

April 28, 2017

Ph.D. student

Highlights of Algorithms and Explanations (NYU April 27-28) #algoexpla17

I attended the Algorithms and Explanations workshop at NYU this week. In general, it addressed the problems raised by algorithmic opacity in decision-making. I wasn’t able to attend all the panels; in this post I’ll cover some highlights of what I found especially insightful or surprising.

Overall, I was impressed by the work presented. All of it rose above the naive positions on the related issues; much of it was targeted at debunking these naive positions. This may have been a function of the venue: hosted by the Information Law Institute at NYU Law, the intellectual encounter was primarily between lawyers and engineers. This focuses the conversation. It was not a conference on technology criticism, in a humanities or popular style, which is often too eager to conflate itself with technology policy. In my opinion, this conflation leads to the kinds of excesses Adam Elkus has addressed in his essay on technology policy, which I recommend. For the most part one did not get the sense that the speakers were in the business of creating problems; they were in the business of solving them.

At least this was the tone set by the first panel I attended, which was a collection of computer scientists, statisticians, and engineers who presented tools or conceptualizations that gave algorithmic systems legibility. Of these, I found Anupam Datta’s Quantitative Input Influence measure best motivated from a statistical perspective. I do believe that this measure essentially solves the problem that most vexes people when it comes to the opacity of machine learning systems, by giving a clear score for how much each input affects decision outcomes.
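The core intuition behind an input-influence score can be sketched in a few lines: intervene on one input by replacing it with an independent draw from its marginal distribution, and measure how often the decision changes. This is a simplified, QII-inspired measure under my own assumptions, not Datta et al.’s exact definition (their framework also covers sets of inputs and other quantities of interest); the function names, the toy credit rule, and the applicant records are all hypothetical.

```python
import random


def unary_influence(model, rows, feature, samples=200, seed=0):
    """Estimate how often `model`'s decision changes when `feature` is
    replaced by an independent draw from its empirical marginal distribution.

    Simplified and QII-inspired; not Datta et al.'s exact definition.
    """
    rng = random.Random(seed)
    column = [r[feature] for r in rows]  # empirical marginal of the feature
    flips = 0
    for _ in range(samples):
        for r in rows:
            intervened = dict(r)
            intervened[feature] = rng.choice(column)  # break its link to the row
            flips += model(intervened) != model(r)
    return flips / (samples * len(rows))


def toy_model(applicant):
    # Hypothetical decision rule: approve on income alone; zip code is ignored.
    return applicant["income"] >= 50


applicants = [{"income": inc, "zip": z}
              for inc, z in [(30, 1), (45, 2), (55, 1), (70, 2), (90, 3)]]

if __name__ == "__main__":
    print(unary_influence(toy_model, applicants, "zip"))     # 0.0: never matters
    print(unary_influence(toy_model, applicants, "income"))  # close to 12/25 = 0.48
```

For income, two of the five applicants are rejected and would flip with probability 3/5, while three are approved and would flip with probability 2/5, so the exact expected value is 12/25; the sampled estimate should land near it.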

I also enjoyed the presentation of Foster Provost, partly for the debunking force of the talk. He drew on his 25+ years of experience designing and deploying decision support systems and pointed out that ever since people started building these tools, the questions of interpretability and accountability have been a part of the job. As a person with technical and industry background who encountered the surge of ‘algorithmic accountability’ in an academic stage, I’ve found many of the questions that have been raised by the field to be baffling largely because the solutions have seemed either obvious or ingrained in engineering culture as among the challenges of dealing with clients. (This tree swing cartoon is a classic illustration of this).

Alexandra Chouldechova gave a very interesting talk on model comparison as a way of identifying bias in black-box algorithms which was new material for me.

In the next panel, dealing specifically with regulation, Deven Desai provided a related historical perspective: there’s a preexisting legal literature on bureaucratic transparency that is relevant to regulatory questions about algorithmic transparency. This awareness is shared, I believe, by those who hold what may be called a physicalist understanding of computation, or what Charles Bennett has called “physical Church’s thesis”: the position that the Church-Turing thesis, which is about how all formal computational systems are reducible to each other and share certain limits as to their power, applies to all physical information processing systems. In particular, this thesis leads to the conclusion that human bureaucratic and information technological systems are essentially up to the same thing when it comes to information processing (this is also the position of Beniger).

But the most galvanizing talk in the regulatory panel was by Sandra Wachter, who presented material relevant to her paper “Why a Right to Explanation of Automated Decision-Making Does Not Exist in the General Data Protection Regulation”. Companies and privacy scholars in the U.S. turn to the GDPR as a leading and challenging new regulation. It’s bold to show up at a conference on Algorithms and Explanations with an argument that the explainability of algorithms isn’t relevant to the next generation of privacy regulations. This is a space to watch.

The second day’s talks focused on algorithmic explainability in specific sectors. Of these I found the intellectually richest to be the panel on Health Care. Rich Caruana gave a warm and technically focused talk on how the complexity of functions used by a learning system can support or undermine its intelligibility, a topic I personally see as the crux of the problem.

I was especially charmed, however, by Federico Cabitza’s discussion of decision support in the medical context. I wish I could point to any associated papers, but I do not have them handy. What was most compelling about the talk was the way it made the case for studying algorithmic decision making in vivo, as part of a decision procedure that involves human experts and that learns, as a socio-technical system, over time. In my opinion, the perils of algorithmic opacity are too often framed in terms of a specific judgment or the faults of a specific model. As I try to steer my own work more towards sociological process theory, I’m on the lookout for technologists who see technology as part of a sociotechnical, evolutionary process and not in isolation. With this complex ontology in mind, Cabitza was then able to unpack “explanation” into dimensions that targeted different aspects of the decision making process: scrutability, comprehensibility, and interpretability. There was far too much in the talk for me to cover here.

The next panel was on algorithms in consumer credit. All three speakers were very good, though their talks worked in different directions and the tensions between them were never resolved in the questions. Dan Raviv of Lendbuzz explained how his company was bringing credit to those who otherwise have not had access to it: immigrants to the U.S. with firm professional qualifications but no U.S. credit history. Lendbuzz has essentially identified a prime credit population ignored by current FICO scores, and has started a bank to lend to them.

That’s an interesting business and technical accomplishment. Unfortunately, it was largely overlooked as attention moved to later talks in this section. Aaron Rieke of Upturn gave a very realistic picture of the use of big data in credit scoring (it isn’t used much in the U.S.; they mainly use conventional data sources like credit history). What he’s looking for, rather humbly, is ways to be a better advocate, especially for those who are adversely affected by the enormous disparity in credit access.

This disparity was the background to Frank Pasquale’s talk, which was broad in scope. I’m glad he dug into social science theory, presenting some material from “Two Narratives of Platform Capitalism”, which I wish I had read earlier. We seem to share an interest in alternative theories of social scientific explanation and its relationship to the tech economy. It was, as is typical of Pasquale’s work, rather polemical, calling for a critical examination of credit scoring and financial regulation with the aim of exposing exploitation. This exploitation reveals itself in the invasions of privacy suffered by those in poverty as well as the inability of those deemed credit-unworthy to access opportunity.

One cannot fault the political motivation of raising awareness of and supporting the disadvantaged in society. But where the discussion missed the mark, I’m afraid, was in tying these concerns about inequality back to questions of algorithmic transparency. I’m generally of the opinion that the disparities in society are the result of social forces and patterns much more forceful and comprehensive than the nuances of algorithmic credit scoring. It’s not clear how any interventions on these mechanisms can lead to better political outcomes. As Andrew Selbst pointed out in an insightful comment, the very idea of ‘credit worthiness’ sets the deck against those who do not have the reliable wealth to pay their debts. And as Raviv’s presentation revealed (before being eclipsed by other political concerns), for some, the problem is not enough algorithmic analysis of their financial situation, not too much.

There’s a broad and old literature in economics about moral hazard in insurance markets, markets for lemons, and other game theoretic understandings of the winners and losers in these kinds of two-sided markets, which is generally understated in ‘critical’ discussions of credit scoring algorithms. That’s too bad in my opinion, as it provides the best explanation of the political outcomes that are most concerning about credit markets. (These discussions of mechanism design use formal modeling but generally do not in and of themselves carry a neoclassical ideology.)
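As one example of the kind of formal reasoning this literature offers, here is a textbook-style version of Akerlof’s market for lemons, sketched under stated assumptions: each seller values a good at its quality, buyers cannot observe quality and so pay a fixed premium over the average quality still on offer, and sellers whose goods are worth more than the price withdraw. The premium, the evenly spaced qualities, and the function name are hypothetical choices for illustration, not a model of any actual credit market.

```python
def lemons_market(qualities, buyer_premium=1.5, max_rounds=100):
    """Iterate a simple lemons market to its resting point.

    Each seller values a good at its quality q; buyers cannot observe q and
    so pay buyer_premium times the average quality still on offer. Sellers
    whose q exceeds the price withdraw, which lowers the average, and so on.
    Returns the final price and the qualities still traded.
    """
    price = buyer_premium * sum(qualities) / len(qualities)
    for _ in range(max_rounds):
        offered = [q for q in qualities if q <= price]
        if not offered:
            return 0.0, []
        new_price = buyer_premium * sum(offered) / len(offered)
        if new_price == price:  # no seller enters or leaves: a resting point
            break
        price = new_price
    return price, [q for q in qualities if q <= price]


qualities = [i / 100 for i in range(101)]  # hypothetical qualities 0.00 .. 1.00

if __name__ == "__main__":
    print(lemons_market(qualities))               # (0.0, [0.0]): the market unravels
    print(len(lemons_market(qualities, 2.5)[1]))  # 101: with a large premium, all trade
```

With a premium of 1.5, each round’s price is roughly three quarters of the last, so every good above the worst one eventually leaves the market; no individual actor needs to behave badly for the aggregate outcome to exclude most participants.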

The last panel I attended was about algorithms in the media. Nick Diakopoulos gave a comprehensive review of the many issues at stake. The most famous speaker on this panel was Gilad Lotan, who presented a number of interesting (though, to me, familiar) data science results about media fragmentation and Buzzfeed’s Outside Your Bubble feature, which aims to counter it.

I wish Lotan had presented about something else: how Buzzfeed uses the engagement data it collects across its platforms and content to make editorial and strategic decisions. This is the kind of algorithmic decision-making that affects people’s lives. It is also precisely the kind of decision-making which is not generally transparent to the consumers of media. It would have been nice (and I feel, appropriate for the conference) if Lotan had taken the opportunity to explain Buzzfeed’s algorithms, especially in the sociotechnical context of the organization’s broader decision-making and strategy. But he didn’t.

The discussion then devolved into one about fake news. One good point in this discussion was made by Julia Powles: she’s learned in her work that one of the important and troubling consequences of technology’s role in media is that while Google, Facebook and the like cater to both journalists and media consumers, their market role is the disintermediation of publishers. But historically, journalists have had their editorial power through their relationships with publishers, who used to be the ones to control distribution.

I came away from this conference feeling well informed about innovations in machine learning and statistics in model interpretation and communication. But I’ve left confirmed in my view that much of the discussion of algorithms and their political effects per se is a red herring. Broader economic questions of industrial organization of the information economy dominate the algorithmic particulars, where political effects are concerned.

by Sebastian Benthall at April 28, 2017 08:14 PM

April 23, 2017

Ph.D. student

Process theory; generative epistemology; configurative ontology: notes on Cederman, part 1

I’ve recently had recommended to me the work of L.E. Cederman, who I’ve come to understand is a well-respected and significant figure in computational social science, especially agent based modeling. In particular, I’ve been referred to this paper on the theoretical foundations of computational sociology:

Cederman, L.E., 2005. Computational models of social forms: Advancing generative process theory. American Journal of Sociology, 110(4), pp.864-893. (link)

This is a paper I wish I had encountered years ago. I’ve written much here about my struggles with “interdisciplinary” research. In short: I’ve been trying to study social phenomena with scientific rigor. This is a very old problem fraught with division. On top of that, there’s been, it seems, an epistemological upset because of advances in data collection and processing that poses a practical challenge to a lot of established disciplines. Moreover, the social phenomena I’m most interested in tend to involve the interaction between people and technology, which brings with it an association with disciplines specialized to that domain (HCI, STS) that for me have not made my research any more straightforward. After trying for some time to do the work I wanted to do under the new heading of data science, I did not find what I was looking for intellectually in that emerging field, however important the practical skill-set involved has been to me.

Computational social science, I’ve convinced myself if not others, is where the answers lie. My hope for it is that as a new discipline, it’s able to break away from dogmas that limited other disciplines and trapped their ambitions in endless methodological debates. What is being offered, I’ve imagined, in computational social science is the possibility of a new paradigm, or at least a viable alternative one. Cederman’s 2005 paper holds out the promise for just that.

Let me address for now some highlights of his vision of social science and how they relate to each other. I hope to come to the rest in a later post.

Sociological process theory. This is a position in sociological theory that Cederman attributes to 19th century sociologist Georg Simmel. The core of this position is that social reality is not fixed, but rather the result of an ongoing process of social interactions that give rise to social forms.

“The large systems and the super-individual organizations that customarily come to mind when we think of society, are nothing but immediate interactions that occur among men constantly every minute, but that have become crystallized as permanent fields, as autonomous phenomena.” (Simmel quoted in Wolf 1950, quoted in Cederman 2005)

There is a lot to this claim. If one is coming from the field of Human Computer Interaction (HCI), what may seem most striking about it is how well it resonates with a scholarly tradition that is most frequently positioned as a countercurrent to an unthinking positivism in design. Lucy Suchman, Etienne Wenger, and Jean Lave are scholars that come to mind as representative of this way of thinking. Much of the intellectual thrust of Simmel can be found in Paul Dourish’s criticism of positivist understandings of “context” in HCI.

For Dourish, the intellectual ground of this position is phenomenological social science, often associated with ethnomethodology. Simmel predates phenomenology but is a neo-Kantian, a contemporary of Weber, and a critic of the positivism of his day (the original positivism). As a social scientific tradition, it has had its successors (maybe most notably George Herbert Mead) but has submerged under other theoretical traditions. From Cederman’s analysis, one gathers that this is largely due to process theory’s inability to ground itself in rigorous method. Its early proponents were fond of metaphorical writing in a way that didn’t age well. Cederman pays homage to the sociological process theory’s origins, but quickly moves to discuss an epistemological position that complements it. Notably, this position is neither positivist, nor phenomenological, nor critical (in the Frankfurt School sense), but something else: generative epistemology.

Generative epistemology. Cederman positions generative epistemology primarily in opposition to positivism, and particularly to a facet of positivism that he calls “nomothetic explanation”: explanation in terms of laws and regularities. Nomothetic explanation is considered the gold standard of the natural sciences and of the social sciences that attempt to mimic them. This tendency is independent of whether the inquiry is qualitative or quantitative: both comparative analysis and statistical control look for a conjunction of factors that is regularly predictive of some outcome. (Cederman’s sources on this are (Gary) King, Keohane, and Verba (1994) and Goldthorpe (1997). The Gary King cited is, I assume, the same Gary King who went on to run Harvard’s IQSS; I hope to return to this question of positivism in computational social science in later writing. I tend to disagree with the idea that ‘data science’ or ‘big data’ has a primarily positivist tendency.)

Cederman describes the ‘process theorist’s’ alternative as based on abduction, not induction. Recall that ‘abduction’ was Peirce’s term for ‘inference to the best explanation’. The goal is to take an observed sociological phenomenon and explain its generation by accounting for how it is socially produced. In Simmel, the preference for generative explanation comes in part from pessimism about isolating regularities in complex social systems. Theorizing in this way yields knowledge: a theoretical advance that makes a social phenomenon less ‘puzzling’.

“The construction of generative explanations based on abductive inference is an inherently theoretical endeavor (McMullin, 1964). Instead of subsuming observations under laws, the main explanatory goal is to make a puzzling phenomenon less puzzling, something that inevitably requires the introduction of new knowledge through theoretical innovation.”

The specifics of the associated method are less clear than the motivation for this epistemology. Many early process theorists resorted to metaphors. But where all this leads is to the construction of models, and especially computational models, as a way of presenting and testing generative theories. Models generate forms through logical operations based on a number of parameters. The logical form is then compared with the empirical form. If the comparison is favorable, the empirical form can be characterized as the result of a process described by the variables and the model (Barth, 1981).
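To make the generate-then-compare logic concrete, here is a minimal Python sketch. It is not Cederman’s or Barth’s procedure. It assumes a hypothetical toy process (each generation, agents either copy a random member of the previous generation or, with some probability, invent a new variant, loosely in the spirit of neutral models of cultural transmission), summarizes the generated form as a single number (how many distinct variants survive), and picks the parameter value whose simulated form lies closest to an “empirical” one. The function names, parameter grid, and observed value are all made up for illustration.

```python
import random
from statistics import fmean


def simulate_diversity(innovation_rate: float, n_agents: int = 100,
                       generations: int = 100, seed: int = 0) -> int:
    """Run a hypothetical copying-with-innovation process; return its 'form'.

    Each generation, every agent either invents a brand-new variant (with
    probability `innovation_rate`) or copies the variant of a randomly chosen
    agent from the previous generation. The generated form is summarized as
    the number of distinct variants present at the end.
    """
    rng = random.Random(seed)
    population = [0] * n_agents          # everyone starts with variant 0
    next_variant = 1
    for _ in range(generations):
        new_population = []
        for _ in range(n_agents):
            if rng.random() < innovation_rate:
                new_population.append(next_variant)           # innovation
                next_variant += 1
            else:
                new_population.append(rng.choice(population))  # imitation
        population = new_population
    return len(set(population))


def calibrate(observed: float, grid: list[float], runs: int = 10) -> float:
    """Return the grid value whose average simulated form is closest to `observed`."""
    def gap(rate: float) -> float:
        simulated = [simulate_diversity(rate, seed=s) for s in range(runs)]
        return abs(fmean(simulated) - observed)
    return min(grid, key=gap)


# Usage with a made-up "empirical" observation of 25 distinct variants.
grid = [0.01, 0.05, 0.2, 0.5]
print(calibrate(observed=25, grid=grid))
```

Matching a single averaged statistic is the crudest possible comparison; a more serious test would compare several features of the generated and observed forms and account for simulation noise.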

Cederman draws from Barth (1981) and Thomas Fararo (1989) to ally himself with ‘realist’ social science. The term is clarified later: ‘realism’ is opposed to ‘instrumentalism’, a distinction that cuts to one of the core epistemological debates in computational methods. An instrumental method, such as a machine learning ensemble, may provide a very useful model for purposes of prediction and control that nevertheless does not capture what’s really going on in the underlying process. Realist mathematical sociology, on the other hand, attempts to capture the reality of the process generating the social phenomenon in the precise languages of mathematics and computation. The underlying metaphysical point is one that many people would rather not attend to. For now, we will follow Cederman’s logic to a different ontological point.

Configurative ontology. Sociological process theory requires explanations to specify the process that generates the observed social form. The entities, relations, and mechanisms may be unobserved or even unobservable. Positivists, Cederman argues, will often take the social forms to be variables themselves and undertheorize how those variables have been generated, since they care only about predicting actual outcomes. Whereas positivists study ‘correlations’ among elements, Simmel studies ‘sociations’, the interactions that result in those elements. The ontology, then, is that social forms are “configurations of social interactions and actors that together constitute the structures in which they are embedded.”

In this view, variables, such as would be used in a more positivist social scientific study, “merely measure dimensions of social forms; they cannot represent the forms themselves except in very simple cases.” While a variable-based analysis detaches a social phenomenon from space and time, “social forms always possess a duration in time and an extension in space.”

Aside from a deep resonance with Dourish’s critique of ‘contextual computing’ (noted above), this argument once again recalls much of what now comes under the expansive notion of ‘criticism’ of the social sciences. Ethnomethodology, and ethnography more generally, are now often raised as alternatives to simplistic positivist methods. In my experience at Berkeley, and in my exposure so far to the important academic debates, the noisiest contest is between allegedly positivist or instrumentalist (they are different, surely) quantitative methods and phenomenological ethnographic methods. Indeed, it is the ethnographers who now more often claim the mantle of ‘realism’. What is different about Cederman’s case in this paper is that he is laying a foundation for realist sociology that is nevertheless mathematized and computational.

What I am looking for in this paper, and haven’t found yet, is an account of how these ‘realist’ models of social processes are tested for their correspondence to empirical social form. Here is where I believe there is an opportunity that I have not yet seen fully engaged.

by Sebastian Benthall at April 23, 2017 05:32 PM

April 22, 2017

MIMS 2014

Coffee: Productivity Fuel? Or Just an Excuse to Leave the Office?

An organic coffee farm near Salento, Colombia

Traveling through Colombia’s coffee region, my days have been spent drooling over roasted arabica beans on organic coffee fincas, or having religious experiences while sampling the remarkable brew at some of the region’s cafes. It all made me realize that I truly am addicted to the stuff. Without at least two cups of java in the morning, I am a morose, gelatinous, dreary-eyed, delirious blob. And that got me thinking: if coffee is such a crucial input into my own productivity, what about the world at large? Are countries that drink more coffee more productive?

I am not the first person to ask this question. There is a problem, however, when it comes to relating productivity with coffee consumption. On a country level, at least, productivity is generally measured as GDP per capita, i.e., the value of goods and services produced by a country divided by its population. That means that we’re comparing coffee consumption with productivity in terms of a country’s wealth, as opposed to something else, like the number of widgets produced or the number of snaps sent per day.


The issue with GDP, however, is that coffee consumption naturally grows when a country’s inhabitants are more wealthy. Thus, when we observe the positive correlation between coffee consumption per person and GDP per capita (see chart), it’s way more likely the arrow of causality is running in the other direction, i.e. wealth is driving coffee consumption, rather than the other way around.


So do we give up there? Not just yet. In my grand armchair theory about coffee, gains in productivity are (in part) reaped from the extra hours that the precious elixir enables us to pour into our livelihoods each day. It’s difficult to verify this theory empirically, given the issue re: comparing productivity and coffee consumption described above. Moreover, there is a separate debate over whether toiling away more hours adds or detracts from worker productivity. But setting that question aside for a moment, I wondered: do we at least observe that countries with higher coffee consumption also have workers who are more likely to burn the midnight oil at the office?

The answer is, surprisingly, not at all. There is, in fact, an unmistakably negative relationship between cups of coffee per day and the number of hours worked per person. So does this mean I need to totally flip my theory? Does coffee consumption actually make us lazier, because we’re so busy taking all those coffee breaks? Just look at the Netherlands way the eff out there on the bottom right. It makes total sense, since we know exactly what those Dutch are really up to during all those coffee breaks…

In reality, the story is not so simple. When you take a closer look at the countries that form the negative trend, something becomes quite apparent. The countries in the top-left are generally less wealthy than the countries in the bottom-right. Thus, my attempt to ignore country wealth by focusing instead on hours worked was all for naught, because it seems that country wealth is once again rearing its ugly head as a lurking variable.

Just as there is a very strong relationship between wealth and the consumption of coffee, there is also a strong relationship between a country’s wealth and the number of hours people work. It turns out that people in wealthier countries work fewer hours on average than those in less wealthy countries. This trend is pretty clear in the graph below, except for Singapore hanging out on the top right, slaving away but still making serious bank. You go, Singapore! Never change.

There are a lot of articles out there that try to explain why more productive countries work fewer hours (here’s one). Some conclude that because workers in richer countries are more productive, they need to work less. I think this line of thinking can be problematic, particularly if one equates productivity with efficiency. That could lead some to conclude that workers in poorer countries are lazier on the job, or perhaps incompetent. But we have to remember what productivity actually means in this context. Recall from above that it is the value of goods and services a country produces divided by its population. And when we talk about value here, we mean how the market rewards these goods and services, not the sweat that goes into making them.

My own take is that workers in richer countries aren’t necessarily working more productively (i.e. more efficiently) than their counterparts in poorer countries, but rather that the types of jobs in richer countries on average tend to be more highly paid than in poorer countries. If you live in Vietnam and weren’t fortunate enough to have decent access to an education like your Oxford-educated friend in England, you probably won’t earn as much per hour as she will. And in order to make ends meet, you’re gonna need to put in more hours on the job.

Anyway, this is starting to veer quite a ways from coffee, and stray closer to another interest of mine, economics. The main thing to remember is that a country’s wealth has a positive influence on coffee consumption and a negative influence on the number of hours worked. Because of this complex tangle of relationships, it can be misleading to rely only on graphs that look at two variables at a time. Luckily, a statistician’s toolbox isn’t limited to scatterplots. By using linear regression, we can actually examine the relationship between coffee consumption and hours worked while controlling for the effect of a country’s wealth.

We can, in fact, control for a host of other variables we might think are important as well. For example, since coffee is a hot beverage, we might expect it to be less popular in countries with higher average temperatures. We might also control for region of the world as a proxy for culture, since guzzling coffee isn’t quite as big of a thing in, say, India or China as it is in the West. Those countries seem to prefer tea.

So what happens once we control for all of these variables? Well, it all depends on whether you include Singapore or not. In statistical jargon, Singapore is what is referred to as an influential observation. In other words, it’s an outlier that messes everything up if it is included in the analysis, substantially shifting the estimated relationships. Whatever is going on in Singapore is clearly unique to Singapore. If we include it as part of our effort to describe a general trend, it will prove to be more of a distraction than anything else. Thus, we toss it out. Sorry Singapore. I know I said I loved you, but you gotta go. Stay golden.
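One common way to quantify “influential” is Cook’s distance, which combines how far a point sits from the fitted line (its residual) with how unusual its predictor value is (its leverage). The sketch below computes it for a one-predictor regression in plain Python. The wealth and hours numbers are invented, with the last point standing in for a rich country that works long hours; they are not the data used in this post, whose analysis involved several controls rather than a single predictor.

```python
from statistics import fmean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def cooks_distance(x, y):
    """Cook's distance for each point in a simple (one-predictor) regression."""
    n, p = len(x), 2                           # p = parameters (intercept, slope)
    intercept, slope = fit_line(x, y)
    x_bar = fmean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)   # residual variance estimate
    distances = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - x_bar) ** 2 / sxx    # leverage of this point
        distances.append(e * e / (p * s2) * h / (1 - h) ** 2)
    return distances


# Made-up numbers: a wealth index vs. annual hours worked. The last "country"
# is rich but works long hours (a stand-in for an outlier like Singapore).
wealth = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
hours = [2000, 1950, 1900, 1860, 1800, 1760, 1700, 1650, 1600, 2300]

d = cooks_distance(wealth, hours)
worst = max(range(len(d)), key=d.__getitem__)
print("most influential index:", worst, "Cook's D:", round(d[worst], 2))
print("slope with it:", round(fit_line(wealth, hours)[1], 1))
print("slope without it:", round(fit_line(wealth[:worst] + wealth[worst + 1:],
                                         hours[:worst] + hours[worst + 1:])[1], 1))
```

A frequently cited rule of thumb treats points with a Cook’s distance above 1 (or, more cautiously, above 4/n) as worth a closer look. On these made-up numbers, dropping the flagged point changes the slope substantially, which is exactly what makes an observation influential.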

Once Singapore is out of the picture—and we control for all the variables listed above—it turns out that coffee consumption has no statistically significant effect on the number of hours worked in a country. Thus, the answer to the title of this article is . . . Neither! On a country-level basis, coffee neither makes people work harder nor does it make them take more breaks out of the office. Sorry if that’s a boring conclusion, but don’t shoot the messenger. I’m just telling you what the numbers say.

Of course, this is not the final word on the subject. There may be more granular data out there, with consumption and productivity information recorded at a personal level (ideally as part of a randomized double-blind experiment using caffeine pills vs placebos). Such data would be much better suited to answering the question than the national-level data we’ve been looking at. But maybe you still managed to learn a thing or two about coffee, economics, or statistics in the process. Either way, it’s time for another cup of joe.


Data Links:


by dgreis at April 22, 2017 08:57 PM

April 19, 2017

Ph.D. student

The GDPR and the future of the EU

In privacy scholarship and ‘big data’ engineering circles, much is being made of the EU’s General Data Protection Regulation (GDPR). It is probably the strongest regulation yet passed protecting personal data in a world of large-scale, global digital services. What makes it particularly fearsome is its extraterritorial applicability. It applies to controllers and processors established in the EU whether or not the data processing itself is done in the EU, and it applies to the processing of data whose subjects are in the EU whether or not the controller or processor is in the EU. In short, it protects the data of people in the EU, no matter where the organization using the data is.

This is interesting in light of the fact that the news is full of intimations that the EU might collapse, depending on the result of the French election. Prediction markets currently favor Macron, but he faces a strong contender in Le Pen, who is against the Eurozone.

The GDPR is scheduled to go into effect in 2018. I wonder what its jurisdiction will be once it goes into effect. A lot can happen between now and then.

by Sebastian Benthall at April 19, 2017 09:39 PM

April 17, 2017

Ph.D. student

A big, sincere THANK YOU to the anonymous reviewer who rejected my IC2S2 submission

I submitted an abstract to IC2S2 this year. It was a risky abstract to submit: I was trying to enter into a new field; the extended abstract length was a maximum of three pages; and I had some sketches of an argument in mind that were far too large in scope and informed mainly by my dissatisfaction with other fields.

I got the most wonderful negative review from an anonymous reviewer. A careful dissection of my roughshod argument and firm pointers to literature (some of it quite old) where my naive intuitions had already been addressed. It was a brief and expertly written literature review of precisely the questions that I had been grasping at so poorly.

There have been moments in my brief research career where somebody has stepped in out of the blue and put me squarely on the right path. I can count them on one hand. This is one of them. I have enormous gratitude towards these people; my gratitude is not lessened by the anonymity of this reviewer. Likely this was a defining moment in my mental life. Thank you, wherever you are. You’ve set a high bar, and one day I hope to pay that favor forward.

by Sebastian Benthall at April 17, 2017 01:44 AM

April 15, 2017

Ph.D. student

Three possibilities of political agency in an economy of control

I wrote earlier about three modes of social explanation: functionality, which explains a social phenomenon in terms of what it optimizes; politics, which explains a social phenomenon in terms of multiple agents working to optimize different goals; and chaos, which explains a social phenomenon in terms of the happenings of chance, independent of the will of any agent.

A couple of notes on this before I go on. First, this view of social explanation is intentionally aligned with mathematical theories of agency widely used in what is broadly considered ‘artificial intelligence’ research and even more broadly acknowledged under the rubrics of economics, cognitive science, multi-agent systems research, and the like. I am willfully opting into the hegemonic paradigm here. If years in graduate school at Berkeley have taught me one pearl of wisdom, it’s this: it’s hegemonic for a reason.

A second note is that when I say “social explanation”, what I really mean is “sociotechnical explanation”. This is awkward, because the only reason I have to make this point is an artificial distinction between technology and society that exists much more as a social distinction between technologists and (what should one call them?) socialites than as an actual ontological distinction. Engineers can, must, and do constantly engage societal pressures; they must bracket these pressures in some aspects of their work to achieve the specific demands of engineering. Socialites can, must, and do adopt and use technologies in every aspect of their lives; they must bracket these technologies in some aspects of their lives in order to achieve the specific demands of mastering social fashions. The social scientist, qua socialite who masters specific social rituals, and the technologist, qua engineer who masters a specific aspect of nature, naturally advertise their mastery as autonomous and complete. The social scholar of technology, qua socialite engaged in arbitrage between communities of socialites and communities of technologists, naturally advertises their mastery as an enlightened view over and above the advertisements of the technologists. To the extent this is all mere advertising, it is all mere nonsense. Currency, for example, is surely a technology; it is also surely an artifact of socialization as much as, if not more than, it is a material artifact. Since the truly ancient invention of currency and its pervasion of the fabric of social life, there has been no society that is not sociotechnical, and there has been no technology that is not sociotechnical. A better word for the sociotechnical would be one that indicates its triviality, how it actually carries no specific meaning at all. It signals only that one has matured to the point that one disbelieves advertisements. We are speaking scientifically now.

With that out of the way…I have proposed three modes of explanation: functionality, politics, and chaos. They refer to specific distributions of control throughout a social system. The first refers to the capacity of the system for self-control. The second refers to the capacity of the components of the system for self-control. The third refers to the absence of control.

I’ve written elsewhere about my interest in the economy of control, or in economies of control, plurally. Perhaps the best way to go about studying this would be an in-depth review of the available literature on information economics. Sadly, I am at this point a bit removed from this literature, having gone down a number of other rabbit holes. Inasmuch as intellectual progress can be made by blazing novel trails through the wilderness of ideas, I’m intent on documenting my path back to the rationalistic homeland from which I’ve wandered. Perhaps I bring spices. Perhaps I bring disease.

One of the questions I bring with me is the question of political agency. Is there a mathematical operationalization of this concept? I don’t know of one. What I do know is that it is associated most with the political mode of explanation, because this mode of explanation allows for the existence of politics, by which I mean agents engaged in complex interactions for their individual and sometimes collective gain. Perhaps it is the emerging dynamics of individuals’ shifting constitution into collectives that best captures what is interesting about politics. These collectives serve functions, surely, but what function? Is it a function with any permanence or real agency? Or is it a specious functionality, only a compromise of the agents that compose it, ready to be sabotaged by a defector at any moment?

Another question I’m interested in is how chaos plays a role in such an economy of control. There is plenty of evidence to suggest that entropy in society, far from being a purely natural consequence of thermodynamics, is a deliberate consequence of political activity. Brunton and Nissenbaum have recently given the name ‘obfuscation’ to some kinds of political activity that are designed to mislead and misdirect. I believe this is not the only reason why agents in the economy of control work actively to undermine each other’s control. To some extent, the distribution of control over social outcomes is zero-sum. It is certainly so at the Pareto boundary of such distributions. But I posit that part of what makes economies of control interesting is that they have a non-Euclidean geometry that confounds the simple aggregations that make Pareto optimality a useful concept within it. Whether this hunch can be put persuasively remains to be seen.

What I may be able to say now is this: there is a sense in which political agency in an economy of control is self-referential, in that what is at stake for each agent is not utility defined exogenously to the economy, but rather agency defined endogenously to the economy. This gives economic activity within it a particularly political character. For purposes of explanation, this enables us to consider three different modes of political agency (or should I say political action), corresponding to the three modes of social explanation outlined above.

A political agent may concern itself with seizing control. It may take actions intended to make the functional orientation of the total social system of which it is a part responsive to its own functional orientation. One might see this narrowly as adapting the total system’s utility function to be in line with one’s own, but this is to partially miss the point. It is to align the agency of the total system with one’s own, or to make the total system a subsidiary of one’s agency. (This demands further formalization.)

A political agent may instead be concerned with interaction with other agents in a less commanding way. I’ll call this negotiation for now. The autonomy of other agents is respected, but the political agent attempts a coordination between itself and others for the purpose of advancing its own interests (its own agency, its own utility). This is not a coup d’etat. It’s business as usual.

A political agent can also attempt to actively introduce chaos into its own social system. This is sabotage. It is an essentially disruptive maneuver. It is action aimed at causing the death of function and bringing about emergence instead, which is the more positive way of characterizing the outcomes of chaos.

by Sebastian Benthall at April 15, 2017 04:24 PM

April 14, 2017

adjunct professor

DOC: No Records on Privacy Shield Removal Procedure

Back in November, I posted the Department of Commerce’s Privacy Shield checklist. The next logical step was to request DOC’s procedures for removing companies from the Privacy Shield (submitted Dec. 1). Today, DOC-International Trade Administration responded with a “no records” response. It is not clear to me on what date the search took place, and ITA is careful to say that its search did not include non-ITA Commerce elements. I’m following up on that.

by web at April 14, 2017 04:10 PM

April 12, 2017

Center for Technology, Society & Policy

Bug Bounty Programs as a Corporate Governance “Best Practice” Mechanism

by Amit Elazari Bar On, CTSP Fellow

Originally posted on the Berkeley Technology Law Journal Blog on March 22, 2017

In an economy where data is an emerging global currency, software vulnerabilities and security breaches are naturally a major area of concern. As society produces more lines of code, and everything, from cars to sex toys, becomes connected, vulnerabilities are produced daily.[1] The cost of data breaches is estimated at an average of $4 million per breach, and $3 trillion in total. While some reports suggest lower figures, there is no debate that such vulnerabilities could result in astronomical losses if left unattended. And as we recently learned from the Cloudflare breach, data breaches are becoming more prominent and less predictable,[2] and even security companies get hacked.

In light of these developments, it is no surprise that cybersecurity has become one of the major subjects regularly discussed in boardrooms. Recently, the U.S. National Association of Corporate Directors (NACD) reported that while directors do believe cyberattacks will affect their companies, many of them “acknowledge that their boards do not possess sufficient knowledge of this growing risk.” These findings suggest that directors should rethink their direct legal responsibility for the losses incurred due to unattended vulnerabilities.

The legal and business risks associated with data breaches are complex, and range from investigations by the FTC and other regulators[3] to M&A complications[4] and consumer class actions.[5] But usually, if executives aren’t named personally in the complaint or prosecuted by regulators, such costs are borne by corporations or their cyber insurance, not by the directors or managers themselves. However, shareholders’ derivative lawsuits for directors’ and managers’ liability are different. These suits target management personally.[6]

Experience shows that stock prices, even if affected by a data breach, will eventually recover.[7] Yet shareholder derivative lawsuits for directors’ liability are continually filed in cases of data breaches. In such cases, the shareholders of the company that suffered the data breach allege that, by neglecting to enforce internal controls and monitor security vulnerabilities, the managers breached their fiduciary duties towards the company.

Wyndham Hotels, Home Depot and, of course, Target are just a few companies in which data breaches were followed by such directors’ liability suits. More recently, Wendy’s, the popular fast food restaurant chain, was hit with such a suit,[8] and now Yahoo!’s management is being sued by a group of shareholders for breach of fiduciary duties following the company’s highly public data breach.[9] Until now, courts have dismissed these cases, applying U.S. corporate law’s high threshold under the Business Judgment Rule (BJR).[10] According to the court, directors’ duty of care to monitor security vulnerabilities is satisfied by enacting a reasonable system for reporting existing vulnerabilities, and their fiduciary duty is further fulfilled by doing something, anything, with these reports.[11] The view is that the board should put a “reasonable” security plan in place, not a perfect one.[12] It’s still not clear how the BJR reasonableness threshold differs from the FTC’s requirement to enact reasonable security practices under Section 5(a) of the FTC Act, but at least from the Wyndham case, it seems that the BJR’s reasonableness threshold, when it comes to cyber, is much lower.[13]

The result is that corporate fiduciary duties are perhaps not the most effective mechanism for promoting cybersecurity in the current legal environment. This is because, on the one hand, the BJR is highly deferential to any reasonable action a board might take, and on the other hand, especially in cybersecurity, reasonable actions are just not enough to provide adequate protection.

Yet, as Wong argued, even if shareholders’ derivative lawsuits often fail in the data breach context, directors should still be concerned with security vulnerabilities.[14] Data breaches involve personal reputational and economic costs for management, could jeopardize directors’ reelection, and cause consumer dissatisfaction.[15] We have recently learned that Yahoo!’s managers were not only sued for breach of fiduciary duties[16] but also asked to answer to a Senate Committee. Moreover, Yahoo!’s General Counsel has resigned, there were “management changes,” and Marissa Mayer, Yahoo!’s CEO, didn’t receive her annual bonus for 2016. All of this comes in addition to the $350 million drop in the Verizon-Yahoo M&A consideration price. It follows that managers and directors alike should continue to consider cybersecurity from a corporate governance perspective, but instead of focusing on minimizing liability, they should aspire to enact cybersecurity “best practices,”[17] as they do in other corporate areas.[18]

Introducing “Bug Bounty” Programs

As the economic, reputational and legal costs of data breaches grow rapidly, the practice of exposing cyber vulnerabilities and “bugs” has evolved from an internal quality assurance process into a booming industry: a “bug bounty economy” has emerged. Governments and companies enact vulnerability reward programs in which they pay millions to individual security experts worldwide for performing adversarial research and exposing critical vulnerabilities previously undetected by the organizations’ internal checks and quality assurance.[19] From cutting-edge Silicon Valley companies to traditional governmental organizations such as the Pentagon and the FTC, all are beginning to understand why we need the help of friendly hackers, as we face the big battle over who controls the vulnerability market. For regulators, Bug Bounty Programs offer the advantage of employing talent they might not be able to recruit through traditional employment tracks, and, as explained here, they facilitate an additional cost-effective, objective monitoring system, free of hierarchies and political boundaries.[20]

The recent news about one of the biggest breaches of 2017, the Cloudflare breach (ironically termed “Cloudbleed”), discovered by Tavis Ormandy of Google’s Project Zero bug-hunting team, teaches us that even a small software bug, left unattended, could result in great harm. The fact that this vulnerability was eventually exposed by a bug hunter emphasizes that in cyber, as with all other source code, “given enough eyeballs, all bugs are shallow.”[21] This means that if we can invite every security researcher in the world to join the “co-developer base,” bugs will be discovered and fixed faster.[22] This is exactly what Bug Bounty Programs aim to do.
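The intuition behind “given enough eyeballs” can be illustrated with a deliberately simple probability model. Assume, hypothetically, that each researcher independently has the same small chance p of spotting a particular bug. Then the chance that at least one of n researchers finds it is 1 - (1 - p)^n, which climbs quickly with n. Real researchers are neither independent nor equally skilled, so this sketch illustrates the direction of the effect, not its size; the function name and the 1% figure are illustrative assumptions.

```python
def prob_found(p_each: float, n_researchers: int) -> float:
    """Chance that at least one of n researchers finds a given bug.

    Hypothetical model: each researcher independently finds the bug
    with the same probability p_each.
    """
    return 1 - (1 - p_each) ** n_researchers


# Illustrative pools of researchers, each with an assumed 1% chance.
for n in (1, 10, 100, 1000):
    print(n, round(prob_found(0.01, n), 3))
```

Under these assumptions, a single researcher with a 1% chance is unlikely to find the bug, a pool of a hundred has roughly a 63% chance, and a pool of a thousand makes discovery nearly certain.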

Bug Bounty Programs proactively invite security researchers from around the world to expose the company’s vulnerabilities in exchange for monetary and, sometimes more importantly, reputational rewards. If adequate reporting mechanisms are in place, Bug Bounty Programs could serve as an additional security layer and an external monitoring system, and provide management and directors with essential information concerning cyber vulnerabilities. Indeed, “[b]ug bounty programs are moving from the realm of novelty towards becoming best practice,”[23] but they can also serve as a corporate governance best practice, by operating as an additional objective and independent reporting system for management. Naturally, this will require the company’s senior management and board to become more involved in the program and to demand timely reports, and it will require direct communication channels to be established. This is a higher standard in terms of both resources and time, but in the context of million-dollar breach damages, these preventative actions are worth the price.

Recognition of these advantages by senior management and directors will further contribute to the “bug bounty ecosystem,” while strengthening companies’ corporate governance practices. Bug Bounty Programs provide management with a relatively inexpensive yet effective independent monitoring system that could potentially reduce D&O liability and corporate litigation risks, while boosting the overall cybersecurity safeguards of the corporation.


[1] See Why everything is hackable: Computer security is broken from top to bottom, The Economist (Apr. 7, 2017) (explaining how technology, software development culture, economic incentives, governments’ divided interests and cyber-insurance all fuel the vulnerabilities’ “circus”).

[2] For example, New York State Attorney General Eric T. Schneiderman reported a 60% increase in data breaches affecting New York State residents in 2016. See Att’y Gen. Eric T. Schneiderman, A.G. Schneiderman Announces Record Number of Data Breach Notices for 2016, available at

[3] As of the end of 2016, the FTC brought over 60 cases related to information security against companies that were engaged in “unfair or deceptive” practices. See Fed. Trade Comm’n, Privacy & Data Sec. Update: 2016 (2016), available at For a recent, comprehensive analysis of the FTC efforts in this field (and others) see Chris Jay Hoofnagle, Federal Trade Commission Privacy Law and Policy ch. 8 (2016).

[4] As the Verizon-Yahoo! deal illustrates, data breaches can result in price reductions and renegotiations of M&As. Professor Steven Davidoff Solomon was an early observer of this consequence of the Yahoo! breach, claiming in September 2016 that the data breach would give Verizon “significant leverage to renegotiate the price”. See Steven Davidoff Solomon, How Yahoo’s Data Breach Could Affect Its Deal With Verizon, N.Y. Times (Sep. 23, 2016), (discussing the relationship between data breaches and “material adverse change” (MAC) clauses).

[5] For example, yet another class action was filed against Yahoo! on February 7, 2017, following the major data breaches the company disclosed in 2016. See Steven Trader, Yahoo Hit With Another User Class Action Over Data Breach, Law360 (Feb. 8, 2017), (Ridolfo v. Yahoo Inc., case number 3:17-cv-00619).

[6] A derivative lawsuit is brought by shareholders on behalf of the company, seeking a remedy for an injury that the company incurred. It allows shareholders to police the activities of directors and other managers, but as a procedural hurdle it also requires shareholders to exhaust available intracorporate remedies, such as demanding that the board take action. The derivative lawsuit differs significantly from the direct shareholder suit, which seeks a remedy for injuries suffered by the shareholders themselves. See, e.g., Tooley v. Donaldson, Lufkin, & Jenrette, Inc., 845 A.2d 1031, 1033 (Del. 2004). It is noteworthy that in some cases, where fiduciary duties are breached not in “good faith,” D&O insurance will not cover such suits and directors cannot be indemnified for their legal costs.

[7] For a more academic survey that reached similar conclusions, see Pierangelo Rosati et al., The effect of data breach announcements beyond the stock price: Empirical evidence on market activity, 49 Int’l Rev. Fin. Analysis 146 (2017), available at (surveying 74 data breaches of U.S. publicly traded firms from 2005 to 2014 and concluding that there is a positive short-term effect, but a quick return to normal market activity).

[8] Graham v. Peltz, 1:16-cv-1153 (S.D. Ohio Dec. 16, 2016).

[9] It is noteworthy that this Yahoo! claim focuses on “breach of fiduciary duty arising from the non-disclosure of data security breaches to Yahoo Inc.’s customers”, as opposed to failure to monitor security vulnerabilities. See Steven Trader, Yahoo Shareholders Sue Over Massive Data Breaches, Law360 (Feb. 21, 2017), (Oklahoma Firefighters Pension and Retirement System v. Brandt, 2017-0133) (Del. Ch. Feb. 21, 2017).

[10] For a helpful review of how the directors’ “duty to monitor,” as articulated in Caremark, was applied in the Target and Wyndham shareholder derivative lawsuits, see Victoria C. Wong, Cybersecurity, Risk Management, and How Boards Can Effectively Fulfill Their Monitoring Role, 15 U.C. Davis Bus. L.J. 201 (2015).

[11] See In re Home Depot S’holder Derivative Litig., 2016 U.S. Dist. LEXIS 164841, at *16 (N.D. Ga. Nov. 30, 2016) (citing Lyondell Chem. Co. v. Ryan, 970 A.2d 235, 243-44 (Del. 2009) and noting that “[u]nder Delaware law, … directors violate their duty of loyalty only ‘if they knowingly and completely failed to undertake their responsibilities’” and that “in other words, as long as the Outside Directors pursued any course of action that was reasonable, they would not have violated their duty of loyalty.”)

[12] Id. at *18.

[13] The boundaries of how the FTC reasonableness standard will be applied to cybersecurity are still unclear, although the FTC has issued statements regarding this standard. The newly initiated suit against D-Link will probably shed some light in this respect. See Federal Trade Commission, FTC Charges D-Link Put Consumers’ Privacy at Risk Due to the Inadequate Security of Its Computer Routers and Cameras (Jan. 5, 2017), and Federal Trade Commission, Data Security, (last visited Mar. 3, 2017)

[14] Wong, supra note 10.

[15] Id. at 211–214.

[16] See Trader, supra note 9.

[17] Id.

[18] See 1 Corporate Governance: Law & Practice § 1.03 (Amy L. Goodman & Steven M. Haas eds., 2016) (explaining that “many of the sources of guidance on corporate governance practices are not captured in rules and regulations but, rather, are set forth in statements, principles and white papers issued by bar associations, institutional investors, business groups and proxy voting advisory services, among others. These have come to be collectively referred to as recommended ‘best practices.’”).

[19] Cybersecurity Research: Addressing the Legal Barriers and Disincentives, at 5. See also Bugcrowd, The State of Bug Bounty: Bugcrowd’s second annual report on the current state of the bug bounty economy (June 2016), available at, at 8. A comprehensive list of bug bounty programs, enacted by leading companies such as Google and Facebook, is available here:

[20] Generally, Bug Bounty Programs generate value on multiple levels: they boost companies’ return on investment compared with the cost of employing highly qualified security researchers; they facilitate recruitment and talent acquisition; they produce reputational value; and they have a positive impact on the software development lifecycle. See, e.g., Keren Elazari, How hackers can be a force for corporate good, Financial Times (Apr. 10, 2017)

[21] This is Eric Raymond’s famous “Linus’s Law,” one of the cornerstones of open-source culture, coined in Eric S. Raymond, The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary 19 (1999).

[22] Id. at 30.

[23] See also Jeff Stone, in an age of digital insecurity, paying bug bounties becomes the norm, The Christian Science Monitor (Aug. 12, 2016),

by Jennifer King at April 12, 2017 05:37 AM

April 08, 2017

adjunct professor

On Edward Balleisen’s Fraud: An American History from Barnum to Madoff

“…fraud is endemic to modern capitalism,” said Professor Edward Balleisen at a National History Center talk on his excellent, comprehensive, and thoughtful Fraud: An American History from Barnum to Madoff. We need histories of consumer protection. Balleisen provides one such history, focusing on the idea of fraud, specifically frauds wrought by businesses against consumers and investors. The concept of “fraud” is complex and is defined differently through different disciplinary lenses, and when we think about FTC privacy and many other consumer protection efforts, we are addressing conduct that differs from Balleisen’s focus. Yet Balleisen’s book offers lessons for consumer protection more broadly, and I learned a great deal from it.

Balleisen’s observation of the policy pendulum of anti-fraud efforts is most clearly stated on page 309, and anyone involved in modern debates on the FTC will recognize it:

Forceful antifraud tactics tended to generate complaints about autocratic governance that ran roughshod over individual rights and American values, which then prompted adoption of procedural protections, which in turn limited the effectiveness of administrative remedies. Post–World War II proceduralism deepened the democratic legitimacy of antifraud regulation, but at the cost of extending the rights of accused businesses, whether in criminal or administrative contexts.

My copy of Balleisen’s book is heavily marked up. So here are two key questions answered by the book and some other reflections–

Why, despite our rich information environment and the seemingly greater accountability brought about by technology and institutions, do frauds still persist, largely in five basic forms (pump and dump, pyramid scheme, bait and switch, advance-fee fraud, control fraud)?

  • There are businesses committed to fraud. The proceduralism described by Balleisen allowed committed fraudsters (Holland Furnace, Fritzel Television) to slow down intervention.
  • Committed fraudsters keep a “squawk” fund to “cool off the mark” by paying the consumers who do complain.
  • Especially in areas where products/services are new and norms do not yet exist, new market entrants have more space for deception.
  • Concerns about the pace of innovation, and about creating breathing room for it, make tolerance for fraud part of a dynamic economy.
  • A turn to individualism in the 1970s caused institutions such as the BBB to embrace squawk-fund approaches: instead of pursuing big, collective actions, the BBB started remedying individual claims, thus leaving the target free to continue operating.
  • Frauds are often small scale and your typical collective action problems emerge in policing them (daunting costs of representation, limited recovery, risk of countersuit or retaliation, embarrassment, and the problem of “unclean hands”).
  • Information asymmetry still exists!
  • Fraudsters can take advantage of the biases and heuristic reasoning approaches that most of us use.
    • We are strongly moved by forms of social proof over more objective evidence.
    • We are overconfident, especially when we have a little knowledge of a subject. There is the problem that many of us cannot recognize our own incompetence (the Dunning-Kruger effect).
    • We reason through “available” examples—easily recallable fraud events. As old frauds (such as the lightning rod sales of the last century) are interdicted, we forget about them and their lessons.
    • We are vulnerable to anchoring, which skews our perception of price.
    • We are loss averse, and so when we anchor to a price, we act impulsively to capture discounts from the anchored price.
    • We are not good at separating bundles, and so sellers that engage in bundling can influence our perception of value (act now and get not one, but two non-stick pans!).
    • We are optimistic.
  • Gullibility, dreams of quickly-acquired wealth.
  • Only a small number of people need to fall for a fraud for the enterprise to be successful.
  • The holder in due course doctrine, obliterated by the FTC in the 1970s: the ability of a seller to transfer a debt obligation to a third party created intense incentives for fraudulent sales.
  • On some level, we admire the guile of fraudsters—think about our centuries-long fascination with stories such as Reynard. The OED has over 300 words to describe deception, deceit, and trickery.
  • And there are many, many ways of cheating. Balleisen covers the many ways 19th-century companies defrauded each other: wetting cotton to make it heavier, enclosing a low-value product within an envelope of high-quality material, and so on.
  • We are unwilling to criminally prosecute many consumer frauds, and when we do, convicted defendants receive laughably small sentences in light of the scale of their thefts.
  • On some level, we resent victims of fraud, and suspect that victims were somehow complicit in the scheme. The OED has 200 words for dupes.

Related to the above, what are the tensions/tactics that enable fraud today?

  • Product complexity. Complexity makes quality assessment difficult, leading us to fall back upon easily-manipulated signals, such as social proof.
    • This is, by the way, one reason why I think institutions such as Yelp will aid consumer protection little. Yelp—and even the BBB—are easily manipulated. There are even services that will do it for you, just like buying “puffs” from a 19th century newspaperman.
  • Economic complexity. As our economy becomes more complex, we have to rely on and trust people we do not know, even people outside our own country.
  • Agreement complexity. Many consumers cannot define basic business concepts such as compound interest.
  • Corporate secrecy.
  • The ability to quickly incorporate.
  • Being able to acquire the “trappings of success.” Ponzi was known to have bought the most expensive car in production; merely possessing it offered proof of his legitimacy. Balleisen shows other examples: the importance, for fraudsters, of claiming a prestigious address, a long history of operation, or trademarks and other signals of brand.
  • Disclosure pollution. If a regulatory regime requires disclosure of some fact pointing to a problem, “pollute” the communication by making tons and tons of disclosures. I suspect that drug companies do this with side effects of prescription medicines.

Some final reflections–

I was surprised to learn of the historical vigor of the Better Business Bureau. I had long thought of it as neither the most agile nor the most effective institution. But Balleisen recounts decades when it was a serious force for consumer protection enforcement. In its heyday, it was a key actor in big fraud investigations, and it assisted public authorities in prosecutions. Balleisen shows how a conservative faction asserted control over its priorities and defanged it, and in the process made it slouch into a kind of arbitration service for individual claims and an opponent of anything but self-regulatory approaches. Some of the problems that Balleisen describes in the 1970 takeover, such as adverse selection in BBB membership, have replicated themselves in the self-regulatory regimes for the internet.

Thoughts of “fraud” conjure images of Ponzi and Madoff. Conservatives and liberals alike disapprove of fraud as such. A problem arises because we use the same institutions and laws to pursue pure fraudsters as we do companies that fail to live up to their advertising promises. This brand of FTC target sees himself as an honest businessman, not to be painted with the same brush as hucksters. Balleisen gives the historical example of Macy’s and its promise that all of its prices were 6% lower than competitors’; we know that this claim cannot be true in every situation. Macy’s saw deviations from the 6% target as mere imperfections that did not amount to deception or wrongdoing. Today, when companies like LabMD react viscerally to FTC intervention, they act just as their forebears did. LabMD rightly sees itself as an honest business, so why is the federal government breathing down its neck? Businesses that read the situation this way always do the same thing: they accuse the FTC of pinkoism and of standing on an insecure constitutional foundation. Balleisen’s point is that these challenges introduce more and more proceduralism, but they rarely limit the substantive authority of consumer protection institutions.

Balleisen’s book does not end with a bang. He holds that there is no “silver bullet” for fraud, that many institutions and legal tools are needed to contain it, and that prevention (incentives for truthfulness, public education, consumer-friendly defaults) should be the strategy rather than ex post remedies. He does present the conservative reaction to the FTC carefully, but seems unconvinced of its cogency, or perhaps unconvinced that the critiques justify dismantling new institutions.

by web at April 08, 2017 11:41 PM

April 05, 2017

MIDS student

Privacy matters of nations … part 1

Disclaimer: This blog post contains excerpts from a paper that I wrote as part of an amazing course while doing my Master's at UC Berkeley. Prof. Nathan Good, you are an inspiration!

In this age of omnipresent, pervasive technologies, the privacy of individuals has been a focal point for various policies and laws worldwide. It's time to look at privacy concerns through the eyes of an aggregate such as a nation. Espionage has been a disturbing reality since time immemorial. Do nations have a right to privacy against such intrusive “eyes”? If so, are there any guidelines or frameworks that define privacy and rules of conduct at the level of a nation? If not, is it worth a discussion? To emphasise, I do not focus on cases of surveillance by a country of its own citizens.

As I have pointed out in earlier posts, and as you may have encountered yourself, privacy violations for individuals have almost become the norm nowadays. This is evident from the number of laws and principles in place in the US alone, covering multiple fields, to protect individuals' privacy, such as the Belmont Principles and the Children’s Online Privacy Protection Act of 1998, to name a few.

Privacy at an aggregate level

These principles can be extended to higher aggregates such as a family unit. For example, the concerns raised in Google’s “Wi-Fi Sniffing Debacle” were linked to the collection of Wi-Fi payload data from various homes as the Street View cars were driven around. The payload was linked to a computer and not necessarily to an individual. The Federal Communications Commission referred to the federal Electronic Communications Privacy Act (ECPA) in its report. Similar concerns were raised elsewhere in the world about this unconsented collection of data. Another incident that highlighted concerns about family-level privacy was the famous HeLa genome study. Henrietta Lacks was a woman from Baltimore suffering from cervical cancer. Her cells were taken in 1951 without her consent. Scientists have since been studying her genome sequences to solve some challenging medical problems. By publishing the genome sequence of her cells, the scientists had inadvertently exposed this private aspect of everyone connected to Henrietta by genes, i.e. her family. The study had to be taken down when it became clear that the family’s consent had not been sought. These cases highlight the fact that the guidelines that protect individuals can also serve as guiding principles for families as a unit.

As a next level of aggregation, we look at society as a unit. Society, as a concept, can be quite ambiguous. We assume that any group of people bound together by a common thread, such as residents of a given neighbourhood or consumers of a certain product, can be thought of as belonging to a society. For example, in the case of the Ashley Madison data breach, the whole user group’s privacy (or in this case, secrecy) was at stake. Hackers had threatened to release private information about many of its users unless the website was shut down. While this was related to the personally identifiable information of each individual, the issue escalated drastically because it affected a majority of the website's 36 million users. The Privacy Commissioner of Canada stated that the Toronto-based company had in fact breached many privacy laws in Canada and elsewhere. Thus, a privacy violation that is not specific to one particular individual, but affects a much larger group of which the individual is a member, is also viewed through the lens of the same privacy laws. There are many other instances of “us vs. the nosy corporates” that have been discussed recently. For example, due to the privacy setup and the inherent nature of the product, the location of all Foursquare users can be tracked in real time. Additionally, the concepts of society and privacy are quite intertwined, as sociologist Barrington Moore pointed out: “the need for privacy is a socially created need. Without society there would be no need for privacy.” (Barrington Moore, Jr., Privacy: Studies in Social and Cultural History, 1984). As an interesting observation, Dan Solove states, “Society is fraught with conflict and friction. Individuals, institutions, and governments can all engage in activities that have problematic effects on the lives of others.”

Let us now turn our attention to the next higher level of aggregation – nations. There are certain questions that arise in relation to this aggregation such as,

  • As in the case of family and society, can we assume that the principles behind privacy definition and privacy protection for individuals can be applied as easily to the privacy concerns of a nation as a unit?
  • Are the concerns related to a nation’s privacy the same as those of an individual?
  • Are the threats to a nation’s privacy different in form and intent from those we looked at earlier?

Definition ofprivacy of a nation”

It is difficult to provide a single all-encompassing definition of privacy even at the level of an individual. Thus, it is no surprise that defining such an “abstract” concept for a nation as a unit becomes even harder. This is especially so because we deal with a new class of data here, which is neither private nor public but classified. However, to understand the concept, we will start by drawing references and analogies from studies of individual privacy. As Dan Solove puts it, “Privacy seems to be about everything, and therefore it appears to be nothing.”

Privacy Harms

The concept of privacy, in reference to a nation, rests on the philosophy of secrecy and the ability to create an autonomous decision-making zone. These in turn have been equated with national security, economic development and social stability. The secrecy philosophy, as described by Dan Solove, holds that privacy is violated when previously concealed information is publicly disclosed. Dan Solove propounded the “Taxonomy of Privacy Harms” to bring out the kinds of privacy-related harms that people are trying to avoid. The harms hold true as is if the data subject in the taxonomy is a nation. For example, some of the possible privacy harms at various stages are as follows:

  • For the nation subjected to invasion: interference in decision-making; intrusion
  • At the stage of information processing: secondary use of the information, exclusion, aggregation, etc.
  • At the stage of information dissemination: breach of confidentiality, blackmail, disclosure, exposure

In the next part I will look at one of the biggest threats to the privacy of any nation – Espionage.

by arvinsahni at April 05, 2017 01:54 PM

April 03, 2017

Ph.D. student

Using python to explore Wikipedia pageview data for all current members of the U.S. Congress


By Stuart Geiger (@staeiou, User:Staeiou), licensed under the MIT license

Did you know that Wikipedia has been tracking aggregate, anonymized, hourly data about the number of times each page is viewed? There are data dumps, an API, and a web tool for exploring small sets of pages (see this blog post for more on those three). In this notebook, I show how to use python to get data on hundreds of pages at once -- every current member of the U.S. Senate and House of Representatives.


We're using mwviews for getting the pageview data, pandas for the dataframe, and seaborn/matplotlib for plotting. pywikibot is in here because I tried to use it to get titles programmatically, but gave up.

In [1]:
!pip install mwviews pywikibot seaborn pandas
Requirement already satisfied: mwviews in /home/staeiou/anaconda3/lib/python3.6/site-packages
Requirement already satisfied: pywikibot in /home/staeiou/anaconda3/lib/python3.6/site-packages
Requirement already satisfied: seaborn in /home/staeiou/anaconda3/lib/python3.6/site-packages
Requirement already satisfied: pandas in /home/staeiou/anaconda3/lib/python3.6/site-packages
Requirement already satisfied: requests in /home/staeiou/anaconda3/lib/python3.6/site-packages (from mwviews)
Requirement already satisfied: futures in /home/staeiou/anaconda3/lib/python3.6/site-packages (from mwviews)
Requirement already satisfied: httplib2>=0.9 in /home/staeiou/anaconda3/lib/python3.6/site-packages (from pywikibot)
Requirement already satisfied: python-dateutil>=2 in /home/staeiou/anaconda3/lib/python3.6/site-packages (from pandas)
Requirement already satisfied: pytz>=2011k in /home/staeiou/anaconda3/lib/python3.6/site-packages (from pandas)
Requirement already satisfied: numpy>=1.7.0 in /home/staeiou/anaconda3/lib/python3.6/site-packages (from pandas)
Requirement already satisfied: six>=1.5 in /home/staeiou/anaconda3/lib/python3.6/site-packages (from python-dateutil>=2->pandas)
In [2]:
import mwviews
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline


The .txt files are manually curated lists of titles, based on first copying and pasting the columns displaying the names of the members of Congress at List_of_United_States_Senators_in_the_115th_Congress_by_seniority and List_of_United_States_Representatives_in_the_115th_Congress_by_seniority. Then each of the article links was manually examined to make sure they match the linked page, and updated if, for example, the text said "Dan Sullivan" but the article was at "Dan Sullivan (U.S. Senator)". Much thanks to Amy Johnson who helped curate these lists.

I tried to get lists of all current members of Congress programmatically; my failed attempts can be found at the end.

The files have one title per line, so we read it in and split it into a list with .split("\n")

In [3]:
with open("senators.txt") as f:
    sen_txt = f.read()

sen_list = sen_txt.split("\n")
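One caveat with `.split("\n")`: if the file ends with a newline, the last element is an empty string, which would then be sent to the API as a blank title. A slightly more defensive sketch, assuming the same one-title-per-line layout (`parse_titles` and `read_titles` are hypothetical helper names, not part of the original notebook):

```python
def parse_titles(text):
    """Return non-empty, whitespace-stripped lines from a block of text."""
    # splitlines() also copes with Windows-style "\r\n" line endings
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_titles(path):
    """Read a one-title-per-line file such as senators.txt."""
    with open(path, encoding="utf-8") as f:
        return parse_titles(f.read())
```

For comparison, `"Richard Shelby\n".split("\n")` yields `['Richard Shelby', '']`, while `parse_titles` drops the trailing empty entry.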
In [4]:
sen_list[0:5]
['Richard Shelby',
 'Luther Strange',
 'Lisa Murkowski',
 'Dan Sullivan (U.S. Senator)',
 'John McCain']

Checking the length of the list, we see it has 100, which is good!

In [5]:
len(sen_list)
100

We do the same with the house list, and we get 431 because there are currently some vacancies.

In [6]:
with open("house_reps.txt") as f:
    house_txt = f.read()
house_list = house_txt.split("\n")
In [7]:
house_list[0:5]
['Bradley Byrne', 'Martha Roby', 'Mike Rogers', 'Robert Aderholt', 'Mo Brooks']
In [8]:
len(house_list)
431

Querying the pageviews API

mwviews makes it much easier to query the pageviews API, so we don't have to call the API directly. We can also pass in a (very long!) list of pages to get data. We get back a nested dictionary built from the API's JSON responses, which pandas can convert to a dataframe without any help.

The main way to interact via mwviews is the PageviewsClient object, which we will create as p for short.

In [9]:
from mwviews.api import PageviewsClient

p = PageviewsClient()
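For reference, mwviews wraps the Wikimedia REST pageviews endpoint. The sketch below shows how a single per-article request URL is put together, assuming the `per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}` path layout; `pageviews_url` is a hypothetical helper that only builds the string and makes no network call:

```python
from urllib.parse import quote

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(title, project="en.wikipedia", access="all-access",
                  agent="all-agents", granularity="monthly",
                  start="2016040100", end="2017033100"):
    """Build the per-article pageviews URL for one page title."""
    # Article titles use underscores for spaces; other characters such as
    # parentheses are percent-encoded so the title fits in one path segment
    article = quote(title.replace(" ", "_"), safe="")
    return "/".join([BASE, project, access, agent, article,
                     granularity, start, end])
```

This also explains why the dataframe rows below are labelled `Al_Franken` rather than `Al Franken`.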

When we query the API for the view data, we can set many variables in p.article_views(). We pass in sen_list as our list of articles. Granularity can be monthly or daily, and start and end dates are formatted as YYYYMMDDHH. You have to include precise start and end dates down to the hour, and it will not give very helpful error messages if you do things like set your end date before your start date. Also note that the pageview data only goes back a few years.
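Because the error messages are unhelpful, it can pay to build and check the date range before querying. A minimal sketch; `timestamp_range` is a hypothetical helper, not part of mwviews:

```python
from datetime import datetime


def timestamp_range(start, end):
    """Return (start, end) as YYYYMMDDHH strings, refusing a reversed range."""
    if end < start:
        raise ValueError("end {} is before start {}".format(end, start))
    fmt = "%Y%m%d%H"  # year, month, day, hour, e.g. 2016040100
    return start.strftime(fmt), end.strftime(fmt)


# The April 2016 to March 2017 window covered by the tables in this post
start, end = timestamp_range(datetime(2016, 4, 1), datetime(2017, 3, 31))
```

The two strings can then be passed as `start=` and `end=` to `p.article_views()`; a reversed range fails immediately with a readable message instead of an opaque API error.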

In [10]:
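# start/end cover April 2016 through March 2017, the months shown in the output below
sen_views = p.article_views(project='en.wikipedia',
                            articles=sen_list,
                            granularity='monthly',
                            start='2016040100',
                            end='2017033100')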

sen_df = pd.DataFrame(sen_views)

If we peek at the first five rows and columns in the dataframe, we see it is formatted with one row per page, and one column per month:

In [11]:
sen_df.iloc[0:5, 0:5]
2016-04-01 00:00:00 2016-05-01 00:00:00 2016-06-01 00:00:00 2016-07-01 00:00:00 2016-08-01 00:00:00
Al_Franken 43087.0 66366.0 53539.0 143641.0 37679.0
Amy_Klobuchar 19740.0 16663.0 19394.0 36931.0 10618.0
Angus_King 13951.0 13341.0 16458.0 16043.0 15773.0
Ben_Cardin 7733.0 5532.0 7198.0 6656.0 6384.0
Ben_Sasse 9943.0 78686.0 22201.0 21502.0 11996.0
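This layout follows from how pandas builds a DataFrame from a nested dictionary: outer keys become columns and inner keys become row labels. Assuming `article_views()` returns a mapping of timestamp to `{article: views}` (which is consistent with the output above), a toy version using two months of figures from the table reproduces the shape:

```python
from datetime import datetime

import pandas as pd

# Toy input shaped like the assumed article_views() result
# (timestamp -> {article: views}), using figures from the table above
views = {
    datetime(2016, 4, 1): {"Al_Franken": 43087.0, "Amy_Klobuchar": 19740.0},
    datetime(2016, 5, 1): {"Al_Franken": 66366.0, "Amy_Klobuchar": 16663.0},
}

# Outer keys (timestamps) become columns; inner keys (articles) become rows
df = pd.DataFrame(views)
```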

We transpose this (switching rows and columns), then set the index of each row to a more readable string, Year-Month:

In [12]:
sen_df = sen_df.transpose()
sen_df = sen_df.set_index(sen_df.index.strftime("%Y-%m")).sort_index()
sen_df.iloc[0:5, 0:5]
Al_Franken Amy_Klobuchar Angus_King Ben_Cardin Ben_Sasse
2016-04 43087.0 19740.0 13951.0 7733.0 9943.0
2016-05 66366.0 16663.0 13341.0 5532.0 78686.0
2016-06 53539.0 19394.0 16458.0 7198.0 22201.0
2016-07 143641.0 36931.0 16043.0 6656.0 21502.0
2016-08 37679.0 10618.0 15773.0 6384.0 11996.0
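The `strftime` relabelling works here because monthly granularity yields exactly one timestamp per month. If you query with `granularity='daily'` instead, several rows would share the same Year-Month label, so the days need to be aggregated first; one way is pandas' `resample`. A sketch with hypothetical daily numbers:

```python
import pandas as pd

# Hypothetical daily counts for one page (rows = days, columns = pages)
daily = pd.DataFrame(
    {"Al_Lawson": [10, 20, 30, 40]},
    index=pd.to_datetime(["2016-04-01", "2016-04-02",
                          "2016-05-01", "2016-05-02"]),
)

# Sum the days within each calendar month ("MS" = month-start labels)
monthly = daily.resample("MS").sum()

# Relabel rows as Year-Month strings, as in the notebook
monthly.index = monthly.index.strftime("%Y-%m")
```

Here `monthly` ends up with rows `2016-04` and `2016-05` holding 30 and 70 views respectively.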

We can get the sum for each page by running .sum(), and we can peek into the first five pages:

In [13]:
sen_sum = sen_df.sum()
sen_sum.head()
Al_Franken       1400454.0
Amy_Klobuchar     340114.0
Angus_King        281545.0
Ben_Cardin        135774.0
Ben_Sasse         384434.0
dtype: float64

We can get the sum for each month by transposing back and running .sum() on the dataframe:

In [14]:
sen_monthly_sum = sen_df.transpose().sum()
sen_monthly_sum
2016-04    3931109.0
2016-05    3493508.0
2016-06    3358614.0
2016-07    6661905.0
2016-08    2012990.0
2016-09    2000842.0
2016-10    3647561.0
2016-11    6361233.0
2016-12    2352725.0
2017-01    5803284.0
2017-02    4912876.0
2017-03    3882319.0
dtype: float64
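Transposing and then summing is equivalent to summing along the other axis. With the months-by-pages layout of `sen_df`, `sum()` adds down the rows (one total per page) and `sum(axis=1)` adds across the columns (one total per month). A toy check reusing two months of figures from the tables above:

```python
import pandas as pd

# Rows = months, columns = pages (same layout as sen_df after transposing)
df = pd.DataFrame(
    {"Al_Franken": [43087.0, 66366.0], "Amy_Klobuchar": [19740.0, 16663.0]},
    index=["2016-04", "2016-05"],
)

per_page = df.sum()            # one total per page
per_month = df.sum(axis=1)     # one total per month, same as df.transpose().sum()
grand_total = per_month.sum()  # equals per_page.sum()
```

Using `axis=1` avoids the double transpose, which may read more clearly in longer notebooks.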

And we can get the sum of all the months from 2016-04 to 2017-03 by summing the monthly sum, which gives us 48.4 million pageviews:

In [15]:
sen_monthly_sum.sum()

We can use the built-in plotting functionality in pandas dataframes to show a monthly plot. You can adjust kind to be many types, including bar, line, and area.

In [16]:
fig = plt.figure()
fig.suptitle("Monthly Wikipedia pageviews for current U.S. Senators")
plt.ticklabel_format(style = 'plain')

ax = sen_monthly_sum.plot(kind='barh', figsize=[12,6])
ax.set_xlabel("Monthly pageviews")

The House

We do the same thing for the House of Representatives, only with different variables. Recall that house_list is our list of titles:

In [17]:
house_list[0:5]
['Bradley Byrne', 'Martha Roby', 'Mike Rogers', 'Robert Aderholt', 'Mo Brooks']
In [18]:
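# start/end cover April 2016 through March 2017, the months shown in the output below
house_views = p.article_views(project='en.wikipedia',
                              articles=house_list,
                              granularity='monthly',
                              start='2016040100',
                              end='2017033100')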
house_df = pd.DataFrame(house_views)
house_df.iloc[0:5, 0:5]
2016-04-01 00:00:00 2016-05-01 00:00:00 2016-06-01 00:00:00 2016-07-01 00:00:00 2016-08-01 00:00:00
Adam_Kinzinger 6579.0 10515.0 12002.0 7217.0 22613.0
Adam_Schiff 6541.0 6649.0 12993.0 7501.0 4760.0
Adam_Smith_(politician) 2712.0 2400.0 2770.0 2939.0 2458.0
Adrian_Smith_(politician) 1368.0 1295.0 1285.0 1151.0 1432.0
Adriano_Espaillat 1296.0 1061.0 5591.0 5360.0 1729.0
In [19]:
house_df = house_df.transpose()
house_df = house_df.set_index(house_df.index.strftime("%Y-%m")).sort_index()
house_df.iloc[0:5, 0:5]
Adam_Kinzinger Adam_Schiff Adam_Smith_(politician) Adrian_Smith_(politician) Adriano_Espaillat
2016-04 6579.0 6541.0 2712.0 1368.0 1296.0
2016-05 10515.0 6649.0 2400.0 1295.0 1061.0
2016-06 12002.0 12993.0 2770.0 1285.0 5591.0
2016-07 7217.0 7501.0 2939.0 1151.0 5360.0
2016-08 22613.0 4760.0 2458.0 1432.0 1729.0
In [20]:
house_sum = house_df.sum()
house_sum.head()
Adam_Kinzinger               162674.0
Adam_Schiff                  406908.0
Adam_Smith_(politician)       40274.0
Adrian_Smith_(politician)     19851.0
Adriano_Espaillat             67980.0
dtype: float64
In [21]:
house_monthly_sum = house_df.transpose().sum()
house_monthly_sum
2016-04    1727960.0
2016-05    1940369.0
2016-06    1983199.0
2016-07    3009143.0
2016-08    1644636.0
2016-09    1609682.0
2016-10    2558133.0
2016-11    5095820.0
2016-12    2408666.0
2017-01    4190713.0
2017-02    3905450.0
2017-03    5931667.0
dtype: float64
In [22]:
house_monthly_sum.sum()

This gives us 36 million total pageviews for House reps.

In [23]:
fig = plt.figure()
fig.suptitle("Monthly Wikipedia pageviews for current U.S. House of Representatives")
plt.ticklabel_format(style = 'plain')

ax = house_monthly_sum.plot(kind='barh', figsize=[12,6])
ax.set_xlabel("Monthly pageviews")

Combining the datasets

We have to transpose each dataset back, then append one to the other:

In [24]:
congress_df = pd.concat([house_df.transpose(), sen_df.transpose()])
congress_df.iloc[0:10, 0:10]
2016-04 2016-05 2016-06 2016-07 2016-08 2016-09 2016-10 2016-11 2016-12 2017-01
Adam_Kinzinger 6579.0 10515.0 12002.0 7217.0 22613.0 6846.0 6869.0 19077.0 14200.0 18531.0
Adam_Schiff 6541.0 6649.0 12993.0 7501.0 4760.0 5068.0 8318.0 13906.0 17191.0 20320.0
Adam_Smith_(politician) 2712.0 2400.0 2770.0 2939.0 2458.0 2802.0 2841.0 4745.0 3301.0 5510.0
Adrian_Smith_(politician) 1368.0 1295.0 1285.0 1151.0 1432.0 1363.0 2004.0 2292.0 1481.0 1967.0
Adriano_Espaillat 1296.0 1061.0 5591.0 5360.0 1729.0 2754.0 2017.0 11937.0 4421.0 15559.0
Al_Green_(politician) 4527.0 3047.0 3243.0 3141.0 2028.0 2878.0 2915.0 3549.0 2382.0 6651.0
Al_Lawson 30.0 36.0 34.0 68.0 479.0 1070.0 1185.0 3856.0 2326.0 4971.0
Alan_Lowenthal 2164.0 2151.0 2575.0 1760.0 1597.0 1455.0 2278.0 3401.0 1985.0 3135.0
Albio_Sires 2348.0 2126.0 2467.0 1960.0 1679.0 3582.0 2483.0 4875.0 1993.0 3175.0
Alcee_Hastings 4795.0 5958.0 5533.0 9017.0 4581.0 4075.0 4711.0 6982.0 3866.0 8475.0
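Since house_df and sen_df are built in cells not shown here, the following is a small self-contained sketch of the same transpose-and-stack step on two made-up frames (house_toy and sen_toy are hypothetical stand-ins with invented numbers). It uses pd.concat, which replaces the DataFrame.append method removed in pandas 2.0.

```python
import pandas as pd

# Toy month-indexed frames (rows = months, columns = pages), shaped like
# house_df and sen_df after the earlier transpose; values are made up.
months = ["2016-04", "2016-05"]
house_toy = pd.DataFrame({"Rep_A": [10.0, 20.0], "Rep_B": [5.0, 1.0]}, index=months)
sen_toy = pd.DataFrame({"Sen_C": [7.0, 3.0]}, index=months)

# Transpose back so each row is a page and each column a month,
# then stack the Senate rows under the House rows.
congress_toy = pd.concat([house_toy.transpose(), sen_toy.transpose()])

# Column-wise sum gives total pageviews per month across all pages.
monthly_toy = congress_toy.sum()
print(congress_toy)
print(monthly_toy)
```

Because both frames share the same month columns, the concatenated frame lines up without any NaN padding; pages that were missing a month would get NaN there, which sum() skips by default.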
In [25]:
congress_monthly_sum = congress_df.sum()
2016-04     5659069.0
2016-05     5433877.0
2016-06     5341813.0
2016-07     9671048.0
2016-08     3657626.0
2016-09     3610524.0
2016-10     6205694.0
2016-11    11457053.0
2016-12     4761391.0
2017-01     9993997.0
2017-02     8818326.0
2017-03     9813986.0
dtype: float64

Then, to find the total pageviews, run sum() on the sum. This comes to 84.4 million pageviews from April 2016 to March 2017 for all U.S. Members of Congress:
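The cells that computed the grand totals did not survive in this post, so here is a sketch that re-enters the monthly sums printed above by hand and sums them. In the live notebook the same result would come from calling .sum() directly on house_monthly_sum and congress_monthly_sum.

```python
import pandas as pd

# Monthly totals copied from the notebook output above (April 2016 to March 2017)
months = ["2016-04", "2016-05", "2016-06", "2016-07", "2016-08", "2016-09",
          "2016-10", "2016-11", "2016-12", "2017-01", "2017-02", "2017-03"]
house_monthly_sum = pd.Series(
    [1727960.0, 1940369.0, 1983199.0, 3009143.0, 1644636.0, 1609682.0,
     2558133.0, 5095820.0, 2408666.0, 4190713.0, 3905450.0, 5931667.0],
    index=months)
congress_monthly_sum = pd.Series(
    [5659069.0, 5433877.0, 5341813.0, 9671048.0, 3657626.0, 3610524.0,
     6205694.0, 11457053.0, 4761391.0, 9993997.0, 8818326.0, 9813986.0],
    index=months)

# "Sum on the sum": the grand total over all pages and months
house_total = house_monthly_sum.sum()        # 36,005,438
congress_total = congress_monthly_sum.sum()  # 84,424,404
print(f"House: {house_total:,.0f}  Congress: {congress_total:,.0f}")
```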

In [27]:
# Create the figure and axes together so the requested size applies to this plot
fig, ax = plt.subplots(figsize=(12, 6))
fig.suptitle("Monthly Wikipedia pageviews for current U.S. Members of Congress")

congress_monthly_sum.plot(kind='barh', ax=ax)
ax.set_xlabel("Monthly pageviews")
# Show raw counts on the x-axis instead of scientific notation
ax.ticklabel_format(axis='x', style='plain')

Plotting a single page's views over time

We can query the dataframe by index for a specific page, then plot it:

In [31]:
fig, ax = plt.subplots()
fig.suptitle("Monthly Wikipedia pageviews for Al Lawson")

# .loc selects a row by its index label (the page title); .ix no longer exists in pandas
congress_df.loc['Al_Lawson'].plot(kind='barh', ax=ax)
ax.set_xlabel("Monthly pageviews")
ax.ticklabel_format(axis='x', style='plain')

Output data

We will export these to a folder called data, in csv and excel formats:
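The export cell itself is missing here. A minimal sketch, assuming the hypothetical file names congress_pageviews.csv and congress_pageviews.xlsx, and using a two-row stand-in for congress_df with values copied from the output above. DataFrame.to_excel needs an optional engine such as openpyxl, so the sketch skips the Excel step if that is not installed.

```python
import os
import pandas as pd

# Stand-in for congress_df (rows = pages, columns = months), values from the output above
congress_df = pd.DataFrame({"2016-04": [30.0, 6579.0], "2016-05": [36.0, 10515.0]},
                           index=["Al_Lawson", "Adam_Kinzinger"])

# Create the output folder if it does not already exist
os.makedirs("data", exist_ok=True)

csv_path = os.path.join("data", "congress_pageviews.csv")
congress_df.to_csv(csv_path)

# Excel export needs an optional engine (e.g. openpyxl); skip it if unavailable
try:
    congress_df.to_excel(os.path.join("data", "congress_pageviews.xlsx"))
except ImportError:
    print("openpyxl not installed; skipped Excel export")

# Round-trip check: read the CSV back with the page names as the index
reloaded = pd.read_csv(csv_path, index_col=0)
print(reloaded)
```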


Old code for trying to programmatically get lists of members of Congress

In [33]:
# used to stop "Restart and run all" execution 

assert False is True

In [ ]:
import pywikibot

site = pywikibot.Site(code="en")
In [ ]:
rep_page = pywikibot.Page(site, title="List_of_United_States_Representatives_in_the_115th_Congress_by_seniority")
In [ ]:
rep_list = []
for page in rep_page.linkedPages():
    has_from_cat = False
    has_births_cat = False
    for category in page.categories():
        title = category.title()
        if "Category:Members of the United States House of Representatives from" in title:
            has_from_cat = True
        if "births" in title:
            has_births_cat = True
    # Keep a linked page only if it has both a "Members ... from <state>" category
    # and a "<year> births" category. Assumed completion (the original cell was
    # cut off here): record the page title.
    if has_births_cat and has_from_cat:
        rep_list.append(page.title())
In [ ]:
senate_list = []
# Note: this loop reuses rep_page (the House list page) as its source of links
for page in rep_page.linkedPages():
    has_from_cat = False
    for category in page.categories():
        if "United States Senators" in category.title():
            has_from_cat = True
            break
    # Assumed completion (the original cell was cut off here): record the page title
    if has_from_cat:
        senate_list.append(page.title())

by stuart at April 03, 2017 07:00 AM

April 01, 2017

Ph.D. student

Varela’s modes of explanation and the teleonomic

I’m now diving deep into Francisco Varela’s Principles of Biological Autonomy (1979). Chapter 8 draws on his paper with Maturana, “Mechanism and biological explanation” (1972) (html). Chapter 9 draws heavily from his paper, “Describing the Logic of the Living: adequacies and limitations of the idea of autopoiesis” (1978) (html).

I am finding this work very enlightening. Somehow it bridges my interests in philosophy of science and my current work on privacy by design. I think I will find a way to work this into my dissertation after all.

Varela has a theory of different modes of explanation of phenomena.

One form of explanation is operational explanation. The categories used in these explanations are assumed to be components in the system that generated the phenomena. The components are related to each other in a causal and lawful (nomic) way. These explanations are valued by science because they are designed so that observers can best predict and control the phenomena under study. This corresponds roughly to what Habermas identifies as technical knowledge in Knowledge and Human Interests. In an operational explanation, the ideas of purpose or function have no explanatory value; rather the observer is free to employ the system for whatever purpose he or she wishes.

Another form of explanation is symbolic explanation, which is a more subtle and difficult idea. It is perhaps better associated with phenomenology and social scientific methods that build on it, such as ethnomethodology. Symbolic explanations, Varela argues, are complementary to operational explanations and are necessary for a complete description of “living phenomenology”, which I believe Varela imagines as a kind of observer-inclusive science of biology.

To build up to his idea of the symbolic explanation, Varela first discusses an earlier form of explanation, now out of fashion: teleological explanation. Teleological explanations do not support manipulation, but rather “understanding, communication of intelligible perspective in regard to a phenomenal domain”. Understanding the “what for” of a phenomenon, what its purpose is, does not tell you how to control the phenomenon. While it may help regulate one’s expectations, Varela does not see this as its primary purpose. Communicability motivates teleological explanation. This resonates with Habermas’s idea of hermeneutic knowledge, what is accomplished through intersubjective understanding.

Varela does not see these modes of explanation as exclusive. Operational explanations assume that “phenomena occur through a network of nomic (lawlike) relationships that follow one another. In the symbolic, communicative explanation the fundamental assumption is that phenomena occur through a certain order or pattern, but the fundamental focus of attention is on certain moments of such an order, relative to the inquiring community.” But these modes of explanation are fundamentally compatible.

“If we can provide a nomic basis to a phenomenon, an operational description, then a teleological explanation only consists of putting in parenthesis or conceptually abbreviating the intermediate steps of a chain of causal events, and concentrating on those patterns that are particularly interesting to the inquiring community. Accordingly, Pittendrich introduced the term teleonomic to designate those teleological explanations that assume a nomic structure in the phenomena, but choose to ignore intermediate steps in order to concentrate on certain events (Ayala, 1970). Such teleologic explanations introduce finalistic terms in an explanation while assuming their dependence in some nomic network, hence the name teleo-nomic.”

A symbolic explanation that is consistent with operational theory, therefore, is a teleonomic explanation: it chooses to ignore some of the operations in order to focus on relationships that are important to the observer. There are coherent patterns of behavior which the observer chooses to pay attention to. Varela does not use the word ‘abstraction’, but as a computer scientist I am tempted to. Varela’s domains of interest, however, are complex physical systems often represented as dynamic systems, not the kind of well-defined chains of logical operations familiar from computer programming. In fact, one of the upshots of Varela’s theory of the symbolic explanation is a criticism of naive uses of “information” in causal explanations that are typical of computer scientists.

“This is typical in computer science and systems engineering, where information and information processing are in the same category as matter and energy. This attitude has its roots in the fact that systems ideas and cybernetics grew in a technological atmosphere that acknowledged the insufficiency of the purely causalistic paradigm (who would think of handling a computer through the field equations of thousands of integrated circuits?), but had no awareness of the need to make explicit the change in perspective taken by the inquiring community. To the extent that the engineering field is prescriptive (by design), this kind of epistemological blunder is still workable. However, it becomes unbearable and useless when exported from the domain of prescription to that of description of natural systems, in living systems and human affairs.”

This form of critique makes its way into a criticism of artificial intelligence by Winograd and Flores, presumably through the Chilean connection.

by Sebastian Benthall at April 01, 2017 12:05 AM

March 28, 2017

Ph.D. student

More assessment of AI X-risk potential

I’ve been stimulated by Luciano Floridi’s recent article in Aeon “Should we be afraid of AI?”. I’m surprised that this issue hasn’t been settled yet, since it seems like “we” have the formal tools necessary to solve the problem decisively. But nevertheless this appears to be the subject of debate.

I was referred to Kaj Sotala’s rebuttal of an earlier work by Floridi which his Aeon article was based on. The rebuttal appears in this APA Newsletter on Philosophy and Computers. It is worth reading.

The issue that I’m most interested in is whether or not AI risk research should constitute a special, independent branch of research, or whether it can be approached just as well by pursuing a number of other more mainstream artificial intelligence research agendas. My primary engagement with these debates has so far been an analysis of Nick Bostrom’s argument in his book Superintelligence, which tries to argue in particular that there is an existential risk (or X-risk) to humanity from artificial intelligence. “Existential risk” means a risk to the existence of something, in this case humanity. And the risk Bostrom has written about is the risk of eponymous superintelligence: an artificial intelligence that gets smart enough to improve its own intelligence, achieve omnipotence, and end the world as we know it.

I’ve posted my rebuttal to this argument on arXiv. The one-sentence summary of the argument is: algorithms can’t just modify themselves into omnipotence because they will hit performance bounds due to data and hardware.

A number of friends have pointed out to me that this is not a decisive argument. They say: don’t you just need the AI to advance fast enough and far enough to be an existential threat?

There are a number of reasons why I don’t believe this is likely. In fact, I believe that it is provably vanishingly unlikely. This is not to say that I have a proof, per se. I suppose it’s incumbent on me to work it out and see if the proof is really there.

So: Herewith is my Sketch Of A Proof of why there’s no significant artificial intelligence existential risk.

Lemma: Intelligence advances due to purely algorithmic self-modification will always plateau due to data and hardware constraints, which advance more slowly.

Proof: This paper.

As a consequence, all artificial intelligence explosions will be sigmoid. That is, starting slow, accelerating, then decelerating, then growing so slowly as to be asymptotic. Let’s call the level of intelligence at which an explosion asymptotes the explosion bound.
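To make the sigmoid shape concrete, here is one conventional choice, the logistic curve I(t) = K / (1 + exp(-r (t - t0))), where the explosion bound plays the role of K. The logistic form and the parameter values are illustrative assumptions; the argument only needs some S-shaped curve that levels off.

```python
import math

def logistic(t, bound, rate, midpoint):
    """Logistic curve: slow start, fast middle, then levels off at `bound`."""
    return bound / (1.0 + math.exp(-rate * (t - midpoint)))

# Illustrative parameters (assumptions, not estimates of anything real)
BOUND, RATE, MIDPOINT = 100.0, 1.0, 10.0

levels = [logistic(t, BOUND, RATE, MIDPOINT) for t in range(0, 31)]
gains = [b - a for a, b in zip(levels, levels[1:])]

# Per-step gains are largest around the midpoint and shrink toward zero later,
# while the level keeps rising but stays below the explosion bound.
print(f"largest one-step gain: {max(gains):.2f}")
print(f"last one-step gain:    {gains[-1]:.2e}")
```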

There’s empirical support for this claim. Basically, we have never had a really big intelligence explosion due to algorithmic improvement alone. Looking at the impressive results of the last seventy years, most of the impressiveness can be attributed to advances in hardware and data collection. Notoriously, Deep Learning is largely just decades-old artificial neural network technology repurposed to GPUs on the cloud. Which is awesome and a little scary. But it’s not an algorithmic intelligence explosion. It’s a consolidation of material computing power and sensor technology by organizations. The algorithmic advances fill those material shoes really quickly, it’s true. This is precisely the point: it’s not the algorithms that are the bottleneck.

Observation: Intelligence explosions are happening all the time. Most of them are small.

Once we accept the idea that intelligence explosions are all bounded, it becomes rather arbitrary where we draw the line between an intelligence explosion and some lesser algorithmic intelligence advance. There is a real sense in which any significant intelligence advance is a sigmoid expansion in intelligence. This would include run-of-the-mill scientific discoveries and good ideas.

If intelligence explosions are anything like virtually every other interesting empirical phenomenon, then they are distributed according to a heavy-tailed distribution. This means a distribution with a lot of very small values and a diminishing probability of higher values that nevertheless assigns some probability to very high values. Assuming intelligence is something that can be quantified and observed empirically (a huge ‘if’ taken for granted in this discussion), we can (theoretically) take a good hard look at the ways intelligence has advanced. Look around you. Do you see people and computers getting smarter all the time, sometimes in leaps and bounds but most of the time minutely? That’s a confirmation of this hypothesis!

The big idea here is really just to assert that there is a probability distribution over intelligence explosion bounds that all actual intelligence explosions are being drawn from. This follows more or less directly from the conclusion that all intelligence explosions are bounded. Once we posit such a distribution, it becomes possible to take expected values of functions of its values.
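A quick way to see what a heavy-tailed distribution of explosion bounds looks like: draw many values from a Pareto distribution and compare the typical draw, the average, and the largest. Pareto is one standard heavy-tailed family; choosing it, and the shape parameter below, are assumptions for illustration. Python's standard library provides random.paretovariate(alpha), and the sample mean is a simple Monte Carlo estimate of an expected value.

```python
import random
import statistics

random.seed(0)  # reproducible illustration
ALPHA = 1.5     # Pareto shape parameter; smaller alpha means a heavier tail (assumption)

# Each draw stands in for the bound of one hypothetical intelligence explosion
bounds = [random.paretovariate(ALPHA) for _ in range(100_000)]

median = statistics.median(bounds)
mean = statistics.fmean(bounds)  # Monte Carlo estimate of the expected bound
share_above_100 = sum(b > 100 for b in bounds) / len(bounds)

# Most draws are small (the theoretical median is 2 ** (1 / ALPHA), about 1.59),
# but a few very large draws pull the mean well above the median.
print(f"median={median:.2f} mean={mean:.2f} max={max(bounds):.0f}")
print(f"fraction of draws above 100: {share_above_100:.4%}")
```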

Empirical claim: Hardware and sensing advances diffuse rapidly relative to their contribution to intelligence gains.

There’s a material, socio-technical analog to Bostrom’s explosive superintelligence. We could imagine a corporation that is working in secret on new computing infrastructure. Whenever it has an advance in computing infrastructure, the AI people (or, increasingly, the AI-writing-AI) develop programming that maximizes its use of this new technology. Then it uses that technology to enrich its own computer-improving facilities. When it needs more…minerals…or whatever it needs to further its research efforts, it finds a way to get them. It proceeds to take over the world.

This may presently be happening. But evidence suggests that this isn’t how the technology economy really works. No doubt Amazon (for example) is using Amazon Web Services internally to do its business analytics. But it also makes its business out of selling its computing infrastructure to other organizations as a commodity. That’s actually the best way it can enrich itself.

What’s happening here is the diffusion of innovation, which is a well-studied phenomenon in economics and other fields. Ideas spread. Technological designs spread. I’d go so far as to say that it is often (perhaps always?) the best strategy for some agent that has locally discovered a way to advance its own intelligence to figure out how to trade that intelligence to other agents. Almost always that trade involves the diffusion of the basis of that intelligence itself.

Why? Because since there are independent intelligence advances of varying sizes happening all the time, there’s actually a very competitive market for innovation that quickly devalues any particular gain. A discovery, if hoarded, will likely be discovered by somebody else. The race to get credit for any technological advance at all motivates diffusion and disclosure.

The result is that the distribution of innovation, rather than concentrating into very tall spikes, is constantly flattening and fattening itself. That’s important because…

Claim: Intelligence risk is not due to absolute levels of intelligence, but relative intelligence advantage.

The idea here is that since humanity is composed of lots of interacting intelligent sociotechnical organizations, any hostile intelligence is going to have a lot of intelligent adversaries. If the game of life can be won through intelligence alone, then it can only be won with a really big intelligence advantage over other intelligent beings. It’s not about absolute intelligence, it’s intelligence inequality we need to worry about.

Consequently, the more intelligence advances (i.e., technologies) diffuse, the less risk there is.

Conclusion: The chance of an existential risk from an intelligence explosion is small and decreasing all the time.

So consider this: globally, there’s tons of investment in technologies that, when discovered, allow for local algorithmic intelligence explosions.

But even if we assume these algorithmic advances are nearly instantaneous, they are still bounded.

Lots of independent bounded explosions are happening all the time. But they are also diffusing all the time.

Since the global intelligence distribution is always fattening, that means that the chance of any particular technological advance granting a decisive advantage over others is decreasing.

There is always the possibility of a fluke, of course. But if there was going to be a humanity destroying technological discovery, it would probably have already been invented and destroyed us. Since it hasn’t, we have a lot more resilience to threats from intelligence explosions, not to mention a lot of other threats.

This doesn’t mean that it isn’t worth trying to figure out how to make AI better for people. But it does diminish the need to think about artificial intelligence as an existential risk. It makes AI much more comparable to a biological threat. Biological threats could be really bad for humanity. But there’s also the organic reality that life is very resilient and human life in general is very secure precisely because it has developed so much intelligence.

I believe that thinking about the risks of artificial intelligence as analogous to the risks from biological threats is helpful for prioritizing where research effort in artificial intelligence should go. Just because AI doesn’t present an existential risk to all of humanity doesn’t mean it doesn’t kill a lot of people or make their lives miserable. On the contrary, we are in a world with both a lot of artificial and non-artificial intelligence and a lot of miserable and dying people. These phenomena are not causally disconnected. A good research agenda for AI could start with an investigation of these actually miserable people and what their problems are, and how AI is causing that suffering or alternatively what it could do to improve things. That would be an enormously more productive research agenda than one that aims primarily to reduce the impact of potential explosions which are diminishingly unlikely to occur.

by Sebastian Benthall at March 28, 2017 01:07 AM

March 26, 2017

adjunct professor

D-Link Updates

The seal has been lifted on the complaint in the D-Link case. This document highlights the previously redacted portions in yellow.

On April 3, 2017, D-Link filed a motion to dismiss that includes the initial hearing transcript.

by web at March 26, 2017 12:25 AM

March 24, 2017

MIMS 2014

Adventures in Sparkland (or… How I Learned that Michael Caine was the original Jason Bourne)

Ready, set, revive data blog! What better way to take advantage of the sketchy wifi I’ve encountered along my travels through South America than to do some data science?

For some time now, I’ve wanted to get my feet wet with Apache Spark, the open source software that has become a standard tool on the data scientist’s utility belt when it comes to dealing with “big data.” Specifically, I was curious how Spark can understand complex human-generated text (through topic or theme modeling), as well as its ability to make recommendations based on preferences we’ve expressed in the past (i.e. how Netflix decides what to suggest you should watch next). For this, it only seemed natural to focus my energies on something I am also quite passionate about: Movies!


Many people have already used the well-known and publicly available Movielens dataset (README, data) to test out recommendation engines before. To add my own twist on standard practice, I added a topic model based off of movie plot data that I scraped from Wikipedia. This blog post will go into detail about the whole process. It’s organized into the following sections:

Setting Up The Environment

To me, this is always the most boring part of doing a data project. Unfortunately, this yak-shaving is wholly necessary to ever do anything interesting. If you only came to read about how this all relates to movies, feel free to skip over this part…

I won’t go into huge depth here, but I will say I effin love Docker as a means to set-up my environment. The reason Docker is so great is that it makes a dev environment totally explicit and portable—which means anybody who’s actually interested in the gory details can go wild with them on my Github (and develop my project further, if they so please).

Another reason Docker is the awesomest is that it made the process of simulating a cluster on my little Macbook Air relatively straightforward. Spark might be meant to be run on a cluster of multiple computers, but being on a backpacker’s budget, I wasn’t keen on commandeering a crowd of cloud computers using Amazon Web Services. I wanted to see what I could do with what I had.

The flip side of this, of course, is that everything was constrained to my 5-year-old laptop’s single processor and the 4GB of RAM I could spare to be shared by the entire virtual cluster. I didn’t think this would be a problem since I wasn’t dealing with big data, but I did keep running up against some annoying memory issues that proved to be a pain. More about that later.


The first major step in my project was getting ahold of movie plot data for each of the titles in the Movielens dataset. For this, I wrote a scraper in Python using this handy Wikipedia Python library I found. The main idea behind my simple program was to: 1) search Wikipedia using the title of each movie, 2) use category tags to determine which search result was the article relating to the actual film in question, and 3) use BeautifulSoup and Wikipedia’s generally consistent HTML structure to extract the “plot” section from each article.
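Step 2 is the fiddliest part. Here is a rough sketch of one way to do it, with hypothetical names and hard-coded inputs: given the search results in rank order and the category names of each candidate page, keep the first candidate that carries a year-of-release category such as "1995 films". In a real scraper the categories would come from the Wikipedia library; the category pattern itself is an assumption about how Wikipedia tags film articles.

```python
import re

# Film articles on Wikipedia usually carry a category like "1995 films" (assumption)
FILM_CATEGORY = re.compile(r"^(Category:)?\d{4} films$")

def pick_film_article(search_results, categories_by_title):
    """Return the first search result whose categories mark it as a film, else None.

    search_results: page titles in search-rank order.
    categories_by_title: page title -> list of category names (hypothetical input shape).
    """
    for title in search_results:
        cats = categories_by_title.get(title, [])
        if any(FILM_CATEGORY.match(cat) for cat in cats):
            return title
    return None

# Invented search results and categories, for illustration only
results = ["Some Title", "Some Title (novel)", "Some Title (film)"]
categories = {
    "Some Title": ["Disambiguation pages"],
    "Some Title (novel)": ["1987 novels"],
    "Some Title (film)": ["1995 films", "English-language films"],
}
print(pick_film_article(results, categories))  # Some Title (film)
```

A category check alone cannot tell two different films apart, so a search that ranks the wrong film first would still return the wrong article.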

I wrapped these three steps in a bash script that would keep pinging Wikipedia until it had attempted to grab plots for all the films in the Movielens data. This was something I could let run overnight or while trying to learn to dance like these people (SPOILER ALERT: I still can’t).

The results of this automated strategy were fair overall. Out of the 3,883 movie titles in the Movielens data, I was able to extract plot information for 2,533 or roughly 2/3 of them. I was hoping for ≥ 80%, but what I got was definitely enough to get started.

As I would later find however, even what I was able to grab was sometimes of dubious quality. For example, when the scraper was meant to grab the plot for Kids, the risqué 90’s drama about sex/drug-fueled teens in New York City, it grabbed the plot for Spy Kids instead. Not. the. same. Or when it was meant to grab the plot for Wild Things, another risqué 90’s title (but otherwise great connector in the Kevin Bacon game), it grabbed the plot for Where The Wild Things Are. Again, not. the. same. When these movies popped up in the context of trying to find titles that are similar to Toy Story, it was definitely enough to raise an eyebrow…

All this points to the importance of eating your own dog food when it comes to working with new, previously un-vetted data. Yes, it is a time consuming process, but it’s very necessary (and at least for this movie project, mildly entertaining).

Model Dem Topics

So first, one might ask: why go through the trouble of using a topic model to describe movie plot data? Well for one thing, it’s kinda interesting to see how a computer would understand movie plots and relate them to one another using probability-based artificial intelligence. But topic models offer practical benefits as well.

For one thing, absent a topic model, a computer generally represents a plot summary (or any document for that matter) as a bag of the words contained in that summary. That can be a lot of words, especially because a computer has to keep track of the words in the summary of not just a single movie, but rather the union of all the words in all the summaries of all the movies in the whole dataset.
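A toy illustration of that bookkeeping, using made-up one-line "plots": the vocabulary is the union of the words across every summary, so each movie's vector has one slot per vocabulary word, most of them zero. The naive lowercase whitespace split stands in for real tokenization, an assumption for brevity.

```python
from collections import Counter

# Made-up one-line "plot summaries" for illustration
plots = {
    "Movie A": "the crew of the ship lands on an alien planet",
    "Movie B": "the team wins the big game for the school",
}

# Vocabulary = union of all words in all summaries, in a fixed (sorted) order
vocab = sorted({word for text in plots.values() for word in text.split()})

def bag_of_words(text):
    """Count vector with one entry per vocabulary word (zero if absent)."""
    counts = Counter(text.split())
    return [counts[word] for word in vocab]

vectors = {title: bag_of_words(text) for title, text in plots.items()}
print(len(vocab))
print(vectors["Movie A"])
```

Even with two short plots the vectors already have 15 slots each; over thousands of summaries the vocabulary, and therefore every vector, grows into the tens of thousands.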

Topic models reduce the complexity of representing a plot summary from a whole bag of words to a much smaller set of topics. This makes storing information about movies much more efficient in a computer’s memory. It also significantly speeds up calculations you might want to perform, such as seeing how similar one movie plot is to another. And finally, using a topic model can potentially help the computer describe the similarities between movies in a more sensible way. This increased accuracy can be used to improve the performance of other models, such as a recommendation engine.

Spark learns the topics across a set of plot summaries using a probabilistic process known as Latent Dirichlet Allocation or LDA. I won’t describe how LDA works in great depth (look here if you are interested in learning more), but after analyzing all the movie plots, it spits out a set of topics, i.e. lists of words that are supposed to be thematically related to each other if the algorithm did its job right. Each word within each topic has a weight proportional to its importance within the topic; words can repeat across topics but their weights will differ.
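If the pyspark.ml API was used (an assumption), a fitted LDAModel reports each topic through describeTopics(), as vocabulary indices (termIndices) with their weights (termWeights); those indices refer to the vocabulary of the CountVectorizerModel that built the input vectors. This stdlib sketch shows only the last step, turning one topic's weights into a ranked word list like the ones below. The vocabulary and weights are invented.

```python
# Hypothetical vocabulary and one topic's term weights (invented numbers, summing to 1)
vocab = ["ship", "game", "crew", "school", "alien", "team"]
topic_weights = [0.35, 0.05, 0.25, 0.03, 0.20, 0.12]

def top_terms(weights, vocabulary, n):
    """Return the n words with the largest weight in one topic, heaviest first."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return [vocabulary[i] for i in ranked[:n]]

print(top_terms(topic_weights, vocab, 3))  # ['ship', 'crew', 'alien']
```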

One somewhat annoying thing about using LDA is that you have to specify the number of topics before running the algorithm, which is an awkward thing to pinpoint a priori. How can you know exactly how many topics exist across a corpus of movies, especially without reading all of the summaries? Another wrinkle to LDA is how sensitive it can be to the degree of pre-processing performed upon a text corpus before feeding it to the model.

After settling on 16 topics and a slew of preprocessing steps (stop word removal, Porter stemming, and part-of-speech filtering), I started to see topics that made sense. For example, there was a topic that broadly described a “Space Opera”:

Top 20 most important tokens in the “Space Opera” topic:

[ship, crew, alien, creatur, planet, space, men, group, team, time, order, board, submarin, death, plan, mission, home, survivor, offic, bodi]

Another topic seemed to be describing the quintessential sports drama. BTW, the lopped-off words like submarin or creatur are a result of Porter stemming, which reduces words to their more essential root forms.

Top 20 most important tokens in the “Sports Drama” topic:

[team, famili, game, offic, time, home, friend, player, day, father, men, man, money, polic, night, film, life, mother, car, school]

To sanity check the topic model, I was curious to see how LDA would treat films that were not used in the training of the original model. For this, I had to get some more movie plot data, which I did based on this IMDB list of top movies since 2000. The titles in the Movielens data tend to run a bit on the older side, so I knew I could find some fresh material by searching for some post-2000 titles.

To eyeball the quality of the results, I compared the topic model with the simpler “bag of words” model I mentioned earlier. For a handful of movies in the newer post-2000 set, I asked both models to return the most similar movies they could find in the original Movielens set.

I was encouraged (though not universally) by the results. Take, for example, the results returned for V for Vendetta and Minority Report.

Similarity Rank: V for Vendetta

Similarity Rank | Bag of Words | Topic Model
1 | But I’m a Cheerleader | Candidate, The
2 | Life Is Beautiful | Dersu Uzala
3 | Evita | No Small Affair
4 | Train of Life | Terminator 2: Judgment Day
5 | Jakob the Liar | Schindler’s List
6 | Halloween | Mulan
7 | Halloween: H20 | Reluctant Debutante, The
8 | Halloween II | All Quiet on the Western Front
9 | Forever Young | Spartacus
10 | Entrapment | Grand Day Out, A

Similarity Rank: Minority Report

Similarity Rank | Bag of Words | Topic Model
1 | Blind Date | Seventh Sign, The
2 | Scream 3 | Crow: Salvation, The
3 | Scream | Crow, The
4 | Scream of Stone | Crow: City of Angels, The
5 | Man of Her Dreams | Passion of Mind
6 | In Dreams | Soylent Green
7 | Silent Fall | Murder!
8 | Eyes of Laura Mars | Hunchback of Notre Dame, The
9 | Waking the Dead | Batman: Mask of the Phantasm
10 | I Can’t Sleep | Phantasm

Thematically, it seems like for these two movies, the topic model gives broadly more similar/sensible results in the top ten than the baseline “bag of words” approach. (Technical note: the “bag of words” approach I refer to is more specifically a Tf-Idf transformation, a standard method used in the field of Information Retrieval and thus a reasonable baseline to use for comparison here.)
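A minimal stdlib sketch of how a Tf-Idf baseline and a similarity ranking like the ones in these tables fit together: weight each word by its count times log(N / df), then rank candidate plots by cosine similarity to the query plot. This particular Tf-Idf variant and the toy plots are assumptions; Spark's own IDF transformer uses a smoothed formula, log((N + 1) / (df + 1)).

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each title to a sparse {word: tf * idf} dict using idf = log(N / df)."""
    tokenized = {title: text.lower().split() for title, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter(word for words in tokenized.values() for word in set(words))
    return {
        title: {w: c * math.log(n_docs / doc_freq[w]) for w, c in Counter(words).items()}
        for title, words in tokenized.items()
    }

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy corpus: one query movie plus candidates (made-up one-line plots)
docs = {
    "query": "a lawyer uncovers corporate fraud at his law firm",
    "legal thriller": "a young lawyer joins a law firm hiding fraud",
    "space opera": "the crew of a ship fights an alien on a distant planet",
    "sports drama": "a coach leads the school team to the big game",
}
vectors = tfidf_vectors(docs)
ranking = sorted((t for t in docs if t != "query"),
                 key=lambda t: cosine(vectors["query"], vectors[t]), reverse=True)
print(ranking)
```

Words that appear in every plot (like "a" here) get an idf of zero and drop out, which is what lets shared distinctive words such as "lawyer" and "fraud" dominate the ranking.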

Although the topic model seemed to deliver in the case of these two films, that was not universally the case. In the case of Michael Clayton, there was no contest as to which model was better:

Similarity Rank: Michael Clayton

Similarity Rank | Bag of Words | Topic Model
1 | Firm, The | Low Down Dirty Shame, A
2 | Civil Action, A | Bonfire of the Vanities
3 | Boiler Room | Reindeer Games
4 | Maybe, Maybe Not | Raging Bull
5 | Devil’s Advocate, The | Chasers
6 | Devil’s Own, The | Mad City
7 | Rounders | Bad Lieutenant
8 | Joe’s Apartment | Killing Zoe
9 | Apartment, The | Fiendish Plot of Dr. Fu Manchu, The
10 | Legal Deceit | Grifters, The

In this case, it seems the Bag of Words model picked up on the legal theme while the topic model completely missed it. In the case of The Social Network, something else curious (and bad) happened:

Similarity Rank: The Social Network

Similarity Rank | Bag of Words | Topic Model
1 | Twin Dragons | Good Will Hunting
2 | Higher Learning | Footloose
3 | Astronaut’s Wife, The | Grease 2
4 | Substitute, The | Trial and Error
5 | Twin Falls Idaho | Love and Other Catastrophes
6 | Boiler Room | Blue Angel, The
7 | Birdcage, The | Lured
8 | Quiz Show | Birdy
9 | Reality Bites | Rainmaker, The
10 | Broadcast News | S.F.W.

With Good Will Hunting—another film about a gifted youth hanging around Cambridge, Massachusetts—it seemed like the topic model was off to a good start here. But then with Footloose and Grease 2 following immediately after, things start to deteriorate quickly. The crappy-ness of both result sets speaks to the overall low quality of the data we’re dealing with—both in terms of the limited set of movies available in the original Movielens data, as well as the quality of the Wikipedia plot data.

Still, when I saw Footloose, I was concerned that perhaps there might be a bug in my code. Digging a little deeper, I discovered that both movies did in fact share the highest score in a particular topic. However, the bulk of these scores are earned from different words within this same topic. This means that the words within the topics of the LDA model aren’t always very related to each other—a rather serious fault since that is exactly what it is meant to accomplish.

The fact is, it’s difficult to gauge the overall quality of the topic model even by eyeballing a handful of results as I’ve done. This is because like any clustering method, LDA is a form of unsupervised machine learning. That is to say, unlike a supervised machine learning method, there is no ground truth, or for-sure-we-know-it’s-right label, that we can use to objectively evaluate model performance.

However, what we can do is use the output from the topic model as input into the recommendation engine model (which is a supervised model). From there, we can see if the information gained from the topic model improves the performance of the recommendation engine. That was, in fact, my main motivation for using the topic model in the first place.

But before I get into that, I did want to share perhaps the most entertaining finding from this whole exercise (and the answer to the clickbait-y title of this blog post). The discovery occurred when I was comparing the bag of words and topic model results for The Bourne Ultimatum:

Similarity Rank: The Bourne Ultimatum

Similarity Rank | Bag of Words | Topic Model
1 | Pelican Brief, The | Three Days of the Condor
2 | Light of Day | Return of the Pink Panther, The
3 | Safe Men | Ipcress File, The
4 | JFK | Cop Land
5 | Blood on the Sun | Sting, The
6 | Three Days of the Condor | Great Muppet Caper, The
7 | Shadow Conspiracy | From Here to Eternity
8 | Universal Soldier | Man Who Knew Too Little, The
9 | Universal Soldier: The Return | Face/Off
10 | Mission: Impossible 2 | Third World Cop

It wasn’t the difference in the quality of the two result sets that caught my eye. In fact, with The Great Muppet Caper in there, the quality of the topic model seems a bit suspect, if anything.

What interested me was the emphasis the topic model placed on the similarity of some older titles, like Three Days of the Condor or The Return of the Pink Panther. But it was the 1965 gem The Ipcress File that took the cake. Thanks to the LDA topic model, I now know this movie exists, showcasing Michael Caine in all his ’60s badass glory. That link goes to the full trailer. Do yourself a favor and watch the whole thing. Or at the very least, watch this part, coz it makes me lol. They def don’t make ’em like they used to…

Rev Your Recommendation Engines

To incorporate the topic data into the recommendation engine, I first took the top-rated movies from each user in the Movielens dataset and created a composite vector for each user based on the max of each topic across their top-rated movies. In other words, I created a “profile” of sorts for each user that summarized their tastes based on the most extreme expressions of each topic across the movies they liked the most.
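The profile construction can be sketched in a few lines of Python. This is a minimal illustration, not the post's Spark code: it assumes each movie's topic mixture is a plain list of per-topic weights, and the `top_rated` helper's cutoff (movies sharing the user's highest rating) is a hypothetical choice, since the post does not say how "top-rated" was defined.

```python
def user_profile(top_movies, topic_vectors):
    """Element-wise max of the topic vectors of a user's top-rated movies.

    top_movies: IDs of the movies the user rated highest.
    topic_vectors: dict mapping movie ID -> list of per-topic weights.
    """
    vectors = [topic_vectors[m] for m in top_movies]
    # zip(*vectors) groups the weights of each topic across the movies.
    return [max(weights) for weights in zip(*vectors)]


def top_rated(user_ratings):
    """Movies that share the user's highest rating (one possible cutoff)."""
    best = max(user_ratings.values())
    return [m for m, r in user_ratings.items() if r == best]


topic_vectors = {
    "m1": [0.7, 0.2, 0.1],
    "m2": [0.1, 0.6, 0.3],
    "m3": [0.3, 0.3, 0.4],
}
ratings = {"m1": 5, "m2": 5, "m3": 2}
profile = user_profile(top_rated(ratings), topic_vectors)
print(profile)  # [0.7, 0.6, 0.3]
```

Taking the max rather than the mean lets a single movie that leans heavily into a topic set the user's weight for that topic; as a consequence the profile is no longer a probability distribution (here its weights sum to 1.6).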

After I had a profile for each user, I could get a similarity score for almost every movie/user pair in the Movielens dataset. Mixing these scores with the original Movielens ratings is a bit tricky, however, due to a wrinkle in Spark’s recommendation engine implementation. When training a recommendation engine with Spark, one must choose between using either explicit or implicit ratings as inputs, but not both. The Movielens data is based on explicit ratings, from 1 to 5, that users gave movies. The similarity scores, by contrast, are signals I infer from a user’s top-rated movies along with the independently trained topic model described above. In other words, the similarity scores are implicit data: not feedback that came directly from the user.
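The post does not say which similarity measure turns a profile and a movie's topic vector into a score. As an assumption, the sketch below uses cosine similarity, a common choice for comparing topic mixtures; `score_unrated` is a hypothetical helper that scores every movie a user has not already rated.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_unrated(profile, topic_vectors, rated):
    """Similarity between one user's profile and every movie the user has not rated."""
    return {
        movie: cosine_similarity(profile, vector)
        for movie, vector in topic_vectors.items()
        if movie not in rated
    }


scores = score_unrated([1.0, 0.0], {"a": [2.0, 0.0], "b": [0.0, 3.0], "c": [1.0, 1.0]}, {"a"})
print(scores)
```

Because every user can be compared with every movie this way, the number of scores grows with users × movies, which is what makes the expanded training data described later so large.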

To combine the two sources of data, therefore, I had to convert the explicit data into implicit data. In the paper that explains Spark’s implicit recommendation algorithm, training examples for the implicit model are based on the confidence one has that a user likes a particular item rather than on an explicit statement of preference. Given the original Movielens data, it makes sense to associate ratings of 4 or 5 with high confidence that a user liked a particular movie. One cannot, however, associate low ratings of 1, 2, or 3 with a negative preference, since in the implicit model there is no notion of negative feedback. Instead, low ratings for a film correspond only to low confidence that a user liked that particular movie.
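The confidence idea can be made concrete. In the implicit-feedback formulation of Hu, Koren and Volinsky ("Collaborative Filtering for Implicit Feedback Datasets", the paper Spark's implicit ALS cites), each raw value r becomes a binary preference p (1 if r > 0, else 0) and a confidence c = 1 + α·r. The Python sketch below assumes the 1 to 5 ratings are passed through unchanged as r; α corresponds to the `alpha` parameter of Spark's ALS (default 1.0, used with `implicitPrefs=True`). The function itself is illustrative, not Spark's code.

```python
def implicit_from_explicit(rating, alpha=1.0):
    """Map one explicit rating to the (preference, confidence) pair of the
    implicit-feedback formulation: p = 1 if r > 0 else 0, c = 1 + alpha * r.

    Higher ratings yield higher confidence; no rating is ever read as a
    negative preference, and an unobserved pair (r = 0) keeps the minimum
    confidence of 1.
    """
    preference = 1 if rating > 0 else 0
    confidence = 1.0 + alpha * rating
    return preference, confidence


for r in [0, 1, 3, 5]:
    print(r, implicit_from_explicit(r))
```

Under this mapping a 1-star rating still counts as a (weak) positive signal, which is exactly the information loss the next paragraph describes.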

Since we lose a fair amount of information in converting explicit data to implicit data, I wouldn’t expect the recommendation engine I am building to beat out the baseline Movielens model, seeing as explicit data is generally a superior basis upon which to train a recommendation engine. However, I am more interested in seeing whether a model that incorporates information about movie plots can beat a model that does not. Also, it’s worth noting that many if not most real-world recommendation engines don’t have the luxury of explicit data and must rely instead on less reliable implicit signals. So if anything, handicapping the Movielens data as I am doing makes the setting more realistic.


So does the movie topic data add value to the recommendation engine? Answering this question proved technically challenging, due to the limitations of my old Macbook Air :sad:.

One potential benefit of incorporating movie topic data is that scores can be generated for any (user, movie) pair that’s combinatorially possible given the underlying data. If the topic information did in fact add value to the recommendation engine, then the model could train upon a much richer set of data, including examples not directly observed in real life. But as I mentioned, my efforts to explore the potential benefit of this expanded data slammed against the memory limits I was confined to on my 5-year-old Macbook.

My constrained resources provided a lovely opportunity to learn all about Java Garbage Collection in Spark, but my efforts to tune the memory management of my program proved futile. I became convinced that an un-tunable hard memory limit was the culprit when I saw executors repeatedly fail after max-ing out their JVM heaps while running a series of full garbage collections. The Spark tuning guide says that if “a full GC is invoked multiple times for before a task completes, it means that there isn’t enough memory available for executing tasks.” I seemed to find myself in exactly this situation.

Since I couldn’t train on bigger data, I pretended I had less data instead. I trained two models. In one model, I pretended that I didn’t know anything about some of the ratings given to movies by users (in practice this meant setting a certain percentage of ratings to 0, since in the implicit model, 0 implies no confidence that a user prefers an item).  In a second model, I set these ratings to the similarity scores that came from the topic model.
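The two training sets can be sketched in Python, assuming ratings are held in memory as (user, movie, value) triples rather than in a Spark DataFrame; `mask_ratings` and the `similarity` lookup are hypothetical. Reusing the same random seed for both models means they hide the same ratings, so the only difference between them is what fills the gaps.

```python
import random


def mask_ratings(ratings, fraction, fill, seed=0):
    """Hide a fraction of (user, movie, value) triples.

    Each hidden triple's value is replaced by fill(user, movie). Returns the
    new list and the set of hidden indices, so both models hide the same rows.
    """
    rng = random.Random(seed)
    n_hidden = round(fraction * len(ratings))
    hidden = set(rng.sample(range(len(ratings)), n_hidden))
    masked = [
        (user, movie, fill(user, movie) if i in hidden else value)
        for i, (user, movie, value) in enumerate(ratings)
    ]
    return masked, hidden


ratings = [("u1", f"m{i}", 4) for i in range(8)]
similarity = {("u1", f"m{i}"): 0.1 * i for i in range(8)}  # hypothetical topic-model scores

# Model 1: hidden ratings become 0 (no confidence in the implicit model).
zeroed, hidden = mask_ratings(ratings, 0.75, lambda u, m: 0.0)
# Model 2: hidden ratings become topic-model similarity scores.
filled, _ = mask_ratings(ratings, 0.75, lambda u, m: similarity[(u, m)])
```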

The results of this procedure were mixed. When I covered up 25% of the data, the two recommendation engines performed roughly the same. However, when I covered up 75% of the data, there was about a 3% bump in performance for the topic model-based recommendation engine.

Although there might be some benefit (and at worst no harm) to using the topic model data, what I’d really like to do is map out a learning curve for my recommendation engine. In the context of machine learning, learning curves are curves that chart algorithm performance as a function of the number of training samples used to train the algorithm. Based on the two points I sampled, we cannot know for certain whether the benefit of including topic model data is always crowded out by the inclusion of more real world samples. We also cannot know whether using expanded data based on combinatorially generated similarity scores improves engine performance.
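A learning curve of the kind described above could be produced with a loop like the following Python sketch. The `train` and `evaluate` hooks are hypothetical stand-ins; in practice they might wrap a Spark ALS fit and an evaluation on held-out ratings. Shuffling once up front makes each larger training set a superset of the smaller ones.

```python
import random


def learning_curve(samples, fractions, train, evaluate, seed=0):
    """Train on growing prefixes of a shuffled sample list.

    train(subset) -> model and evaluate(model) -> score are caller-supplied
    hooks. Returns a list of (n_samples, score) pairs, smallest first.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    curve = []
    for fraction in sorted(fractions):
        n = int(len(shuffled) * fraction)
        model = train(shuffled[:n])
        curve.append((n, evaluate(model)))
    return curve


# Stand-in hooks: the "model" is just the training-set size, and the score
# is that size as a share of the full data.
curve = learning_curve(list(range(100)), [1.0, 0.25, 0.75], train=len, evaluate=lambda m: m / 100)
print(curve)
```

Running this once for the baseline model and once for the topic-model variant, and plotting both curves on the same axes, would show whether the gap between them closes as more real ratings are added.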

Given my hardware limits and my commitment to using only the resources in my backpack, I couldn’t map out this learning curve more methodically. I also couldn’t explore how using a different number of topics in the LDA model affects performance—something else I was curious to explore. In the end, my findings are only suggestive.

While I couldn’t explore everything I wanted, I ultimately learned a butt-load about how Spark works, which was my goal for starting this project in the first place. And of course, there was The Ipcress File discovery. Oh what’s that? You didn’t care much for The Ipcress File?  You didn’t even watch the trailer? Well, then I have to ask you:

by dgreis at March 24, 2017 12:36 AM

March 22, 2017

Ph.D. student

Lenin and Luxemburg

One of the interesting parts of Scott’s Seeing Like a State is a detailed analysis of Vladimir Lenin’s ideological writings juxtaposed with those of one of his contemporary critics, Rosa Luxemburg, who was a philosopher and activist in Germany.

Scott is critical of Lenin, pointing out that while his writings emphasize the role of a secretive intelligentsia commanding the raw material of an angry working class through propaganda and a kind of middle management tier of revolutionarily educated factory bosses, this is not how the revolution actually happened. The Bolsheviks took over an empty throne, so to speak, because the czars had already lost their power fighting Austria in World War I. This left Russia headless, with local regions ruled by local autonomous powers. Many of these powers were in fact peasant and proletarian collectives. But others may have been soldiers returning from war and seizing whatever control they could by force.

Luxemburg’s revolutionary theory was much more sensitive to the complexity of decentralized power. Rather than expecting the working class to submit unquestioningly to top-down control and to coordinate in mass strikes, she acknowledged the reality that decentralized groups would act in an uncoordinated way. This was good for the revolutionary cause, she argued, because it allowed the local energy and creativity of workers’ movements to move effectively and contribute spontaneously to the overall outcome. Whereas Lenin saw spontaneity in the working class as leading inevitably to its being co-opted by bourgeois ideology, Luxemburg believed the spontaneous, authentic action of autonomously acting working-class people was vital to keeping the revolution unified and responsive to working-class interests.

by Sebastian Benthall at March 22, 2017 02:00 AM

March 21, 2017

MIMS 2011

Towards software that supports interpretation rather than quantification

[Reblogged from the Software Sustainability Institute blog]

My research involves the study of the emerging relationships between data and society that are encapsulated by the fields of software studies, critical data studies and infrastructure studies, among others. These fields of research are primarily aimed at interpretive investigations into how software, algorithms and code have become embedded into everyday life, and how this has resulted in new power formations, new inequalities and new authorities of knowledge [1]. Some of the subjects of this research include the ways in which Facebook’s News Feed algorithm influences the visibility and power of different users and news sources (Bucher, 2012), how Wikipedia delegates editorial decision-making and moral agency to bots (Geiger and Ribes, 2010), or the effects of Google’s Knowledge Graph on people’s ability to control facts about the places in which they live (Ford and Graham, 2016).

As the only Software Sustainability Institute fellow working in this area, I set myself the goal of investigating what tools, methods and infrastructure researchers in these fields were using to conduct their research. Although Big Data is a challenge for every field of research, I found that the challenge for social scientists and humanities scholars doing interpretive research in this area is unique and perhaps even more significant. Two key challenges stand out. The first is that the data requiring interpretation tends to be much larger than the data traditionally analysed. This often requires at least some level of quantification in order to ‘zoom out’ to obtain a bigger picture of the phenomenon or issues under study. Researchers in this tradition often lack the skills to conduct such analyses, particularly at scale. The second challenge is that online data is subject to ethical and legal restrictions, particularly when the research is interpretive (as opposed to statistical research on anonymized data).

In many universities it seems that mathematics, engineering, physics and computer science departments have started to build internal infrastructure to deal with Big Data, and some universities have established good Digital Humanities programs that are largely about the quantitative study of large corpuses of images/films/videos or other cultural objects. But infrastructure and expertise are severely lacking for those wishing to do interpretive rather than quantitative research with online data, using mixed, experimental, ethnographic or qualitative methods. The software and infrastructure required for doing interpretive research is patchy, departments are typically ill-equipped to support researchers and students with the expertise required to conduct social media research, and significant ethical questions remain about doing social media research, particularly in the context of data protection laws.

Data Carpentry offers some promise here. I organized, with the support of the Software Sustainability Institute, a “Data Carpentry for the Social Sciences” workshop with Dr Brenda Moon (Queensland University of Technology) and Martin Callaghan (University of Leeds) in November 2016 at Leeds University. Data Carpentry workshops tend to be organized for quantitative work in the hard sciences, and there were no lesson plans for dealing with social media data. Brenda stepped in to develop some of these materials, based partly on the really good Library Carpentry resources, and both Martin and Brenda (with additional help from Dr Andy Evans, Joanna Leng and Dr Viktoria Spaiser) made an excellent start towards seeding the lessons database with some social-media-specific exercises.

The two-day workshop centered on examples from Twitter data, and participants worked with Python and other off-the-shelf tools to extract and analyze data. There were fourteen participants in the workshop, ranging from PhD students to professors and from media and communications to sociology and social policy, music to law, earth and environment to translation studies. At the end of the workshop participants said that they felt they had received a strong grounding in Python and that the course was useful, interactive, open and not intimidating. There were suggestions, however, to improve the Twitter lessons and perhaps to split up the group on the second day, moving on to more advanced programming for some and going over the foundations for beginners.

Also supported by the Institute was my participation in two conferences in Australia at the end of 2016. The first was a conference at the Queensland University of Technology in Brisbane exploring the impact of automation on everyday life; the second was the annual Crossroads in Cultural Studies conference in Sydney. Through my participation in these events (and via other information-gathering that I have been conducting in my travels) I have learned that many researchers in the social sciences and humanities suffer from a significant lack of local expertise and infrastructure. On multiple occasions I learned of PhD students and researchers running analyses of millions of tweets on their laptops, suffering from a lack of understanding when applying for ethical approval, and conducting analyses that lack a consistent approach.

Centers of excellence in digital methods around the world share code and learnings where they can. One such program is the Digital Methods Initiative (DMI) at the University of Amsterdam. The DMI hosts regular summer and winter schools to train researchers in using digital methods tools and provides free access to some of the open source software tools that it has developed for collecting and analyzing digital data. Queensland University of Technology’s Social Media Group also hosts summer schools and has contributed to methodological scholarship employing interpretive approaches to social media and internet research. The common characteristic of such programmes is that they are collaborative (sharing resources across university departments and between different universities) and innovative (breaking some of the rules that govern traditional research in the university).

Many researchers who handle data in more interpretive studies tend to rely on these global hubs in the few universities where infrastructure is being developed. The UK could benefit from a similar hub for researchers locally, especially since software and code need to be continually developed and maintained for a much wider variety of evolving methods. Alternatively, or alongside such hubs, Data Carpentry workshops could serve as an important virtual hub for sharing lesson plans and resources. Data Carpentry could, for example, host code that can be used to query APIs for doing social media research, and workshops could also be used to collaboratively explore or experiment with methods for iterative, grounded investigation of social media practices.

Due to the rapid increase in the scale and velocity of social media data and the lack of technical expertise to manage such data, social scientists and humanities scholars have taken a backseat to the hard sciences in explaining new dimensions of social life online. This is disappointing because it means that much of the research coming out about social media, Big Data and computation lacks a connection to important social questions about the world. Building on some of this momentum will be essential in the next few years if we are to see social scientists and humanities scholars adding their important insights into social phenomena online. Much more needs to be done to build flexible and agile resources for the rapidly advancing field of social media research if we are to benefit from the contributions of social science and humanities scholars in the field of digital cultures and politics.

[1] For an excellent introduction to the contribution of interpretive scholars to questions about data and the digital see ‘The Datafied Society’ just published by Amsterdam University Press

Pic: Martin Callaghan displays the ‘Geeks and repetitive tasks’ model during the November 2016 Data Carpentry for the Social Sciences workshop at Leeds University.

by Heather Ford at March 21, 2017 01:19 PM

March 20, 2017

Ph.D. student

artificial life, artificial intelligence, artificial society, artificial morality

“Everyone” “knows” what artificial intelligence is and isn’t and why it is and isn’t a transformative thing happening in society and technology and industry right now.

But the fact is that most of what “we” “call” artificial intelligence is really just increasingly sophisticated ways of solving a single class of problems: optimization.

Essentially, what’s happened in AI is that all empirical inference problems can be modeled as Bayesian problems, which are then solved using variational inference methods. These methods turn the Bayesian statistical problem into a tractable optimization problem, and then solve that.
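The step from Bayesian inference to optimization can be written out with the standard variational identity. For observed data x, latent variables z, and an approximating distribution q(z) chosen from a tractable family Q:

```latex
\begin{aligned}
\log p(x) &= \underbrace{\mathbb{E}_{q(z)}\big[\log p(x, z) - \log q(z)\big]}_{\mathrm{ELBO}(q)}
           + \mathrm{KL}\big(q(z) \,\|\, p(z \mid x)\big) \\
q^{*} &= \arg\max_{q \in \mathcal{Q}} \mathrm{ELBO}(q)
\end{aligned}
```

Because log p(x) does not depend on q and the KL divergence is non-negative, maximizing the ELBO over Q is the same as minimizing the KL divergence to the intractable posterior p(z | x). The inference problem has become an optimization problem over the parameters of q.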

Advances in optimization have greatly expanded the number of things computers can accomplish as part of a weak AI research agenda.

Frequently these remarkable successes in Weak AI are confused with an impending revolution in what used to be called Strong AI but which now is more frequently called Artificial General Intelligence, or AGI.

Recent interest in AGI has spurred a lot of interesting research. How could it not be interesting? It is also, for me, extraordinarily frustrating research because I find the philosophical precommitments of most AGI researchers baffling.

One insight that I wish made its way more frequently into discussions of AGI comes from the late Francisco Varela, who argued that you can’t really solve the problem of artificial intelligence until you have solved the problem of artificial life. This is for the simple reason that only living things are really intelligent in anything but the weak sense of being capable of optimization.

Once being alive is taken as a precondition for being intelligent, the problem of understanding AGI implicates a profound and fascinating problem of understanding the mathematical foundations of life. This is a really amazing research problem that for some reason is never ever discussed by anybody.

Let’s assume it’s possible to solve this problem in a satisfactory way. That’s a big If!

Then a theory of artificial general intelligence should be able to show how some artificial living organisms are and others are not intelligent. I suppose what’s most significant here is the shift from thinking of AI in terms of “agents”, a term so generic as to be perhaps at the end of the day meaningless, to thinking of AI in terms of “organisms”, which suggests a much richer set of preconditions.

I have similar grief over contemporary discussion of machine ethics. This is a field with fascinating, profound potential. But much of machine ethics today boils down to trolley problems, which are as insipid as they are troublingly intractable. There’s other, better machine ethics research out there, but I’ve yet to see something that really speaks to properly defining the problem, let alone solving it.

This is perhaps because for a machine to truly be ethical, as opposed to just being designed and deployed ethically, it must have moral agency. I don’t mean this in some bogus early Latourian sense of “wouldn’t it be fun if we pretended seatbelts were little gnomes clinging to our seats” but in an actual sense of participating in moral life. There’s a good case to be made that the latter is not something easily reducible to decontextualized action or function, but rather has to do with how one participates more broadly in social life.

I suppose this is a rather substantive metaethical claim to be making. It may be one that’s at odds with common ideological trainings in Anglophone countries where it’s relatively popular to discuss AGI as a research problem. It has more in common, intellectually and philosophically, with continental philosophy than analytic philosophy, whereas “artificial intelligence” research is in many ways a product of the latter. This perhaps explains why these two fields are today rather disjoint.

Nevertheless, I’d happily make the case that the continental tradition has developed a richer and more interesting ethical tradition than what analytic philosophy has given us. Among other reasons, this is because of how it is able to situate ethics as a function of a more broadly understood social and political life.

I postulate that what is characteristic of social and political life is that it involves the interaction of many intelligent organisms. Which of course means that to truly understand this form of life and how one might recreate it artificially, one must understand artificial intelligence and, transitively, artificial life.

Only once artificial society is sufficiently well understood could we approach the problem of artificial morality, or how to create machines that truly act according to moral or ethical ideals.

by Sebastian Benthall at March 20, 2017 02:40 AM

March 19, 2017

Ph.D. student

ideologies of capitals

A key idea of Bourdieusian social theory is that society’s structure is due to the distribution of multiple kinds of capital. Social fields have their roles and their rules, but they are organized around different forms of capital the way physical systems are organized around sources of force like mass and electrical charge. Being Kantian, Bourdieusian social theory is compatible with both positivist and phenomenological forms of social explanation. Phenomenological experience, to the extent that it repeats itself and so can be described aptly as a social phenomenon at all, is codified in terms of habitus. But habitus is indexed to its place within a larger social space (not unlike, it must be said, a Blau space) whose dimensions are the dimensions of the allocations of capital throughout it.

While perhaps not strictly speaking a corollary, this view suggests a convenient methodological reduction, according to which the characteristic beliefs of a habitus can be decomposed into components, each component representing the interests of a certain kind of capital. When I say “the interests of a capital”, I do mean the interests of the typical person who holds a kind of capital, but also the interests of a form of capital, apart from and beyond the interests of any individual who carries it. This is an ontological position that gives capital an autonomous social life of its own, much like we might attribute an autonomous social life to a political entity like a state. This is not the same thing as attributing to capital any kind of personhood; I’m not going near the contentious legal position that corporations are people, for example. Rather, I mean something like: if we admit that social life is dictated in part by the life cycle of a kind of psychic microorganism, the meme, then we should also admit abstractly of social macroorganisms, such as capitals.

What the hell am I talking about?

Well, the most obvious kind of capital worth talking about in this way is money. Money, in our late modern times, is a phenomenon whose existence depends on a vast global network of property regimes, banking systems, transfer protocols, trade agreements, and more. There’s clearly a naivete in referring to it as a singular or homogeneous phenomenon. But it is also possible to refer to it in a generic, globalized way because of the ways money markets have integrated. There is a sense in which money exists to make more money and to give money more power over other forms of capital that are not money, such as: social authority based on any form of seniority, expertise, or lineage; power local to an institution; or the persuasiveness of an autonomous ideal. Those who have a lot of money are likely to have an ideology very different from those without a lot of money. This is partly due to the fact that those who have a lot of money will be interested in promoting the value of that money over and above other capitals. Those without a lot of money will be interested in promoting forms of power that contest the power of money.

Another kind of capital worth talking about is cosmopolitanism. This may not be the best word for what I’m pointing at but it’s the one that comes to mind now. What I’m talking about is the kind of social capital one gets not by having a specific mastery of a local cultural form, but rather by having the general knowledge and cross-cultural competence to bridge across many different local cultures. This form of capital is loosely correlated with money but is quite different from it.

A diagnosis of recent shifts in U.S. politics, for example, could be done in terms of the way capital and cosmopolitanism have competed for control over state institutions.

by Sebastian Benthall at March 19, 2017 12:29 AM

March 16, 2017

Ph.D. student

equilibrium representation

We must keep in mind not only the capacity of state simplifications to transform the world but also the capacity of the society to modify, subvert, block, and even overturn the categories imposed upon it. Here it is useful to distinguish what might be called facts on paper from facts on the ground…. Land invasions, squatting, and poaching, if successful, represent the exercise of de facto property rights which are not represented on paper. Certain land taxes and tithes have been evaded or defied to the point where they have become dead letters. The gulf between land tenure facts on paper and facts on the ground is probably greatest at moments of social turmoil and revolt. But even in more tranquil times, there will always be a shadow land-tenure system lurking beside and beneath the official account in the land-records office. We must never assume that local practice conforms with state theory. – Scott, Seeing Like a State, 1998

I’m continuing to read Seeing Like a State and am finding in it a compelling statement of a state of affairs that is coded elsewhere into the methodological differences between social science disciplines. In my experience, much of the tension between the social sciences can be explained in terms of the differently interested uses of social science. Among these uses are the development of what Scott calls “state theory” and the articulation, recognition, and transmission of “local practice”. Contrast neoclassical economics with the anthropology of Jean Lave as examples of what I’m talking about. Most scholars are willing to stop here: they choose their side and engage in a sophisticated form of class warfare.

This is disappointing from the perspective of science per se, as a pursuit of truth. To see where there’s a place for such work in the social sciences, we only have to look at the very book in front of us, Seeing Like a State, which stands outside of both state theory and local practice to explain a perspective that is neither but is rather informed by a study of both.

In terms of the ways that knowledge is used in support of human interests, in the Habermasian sense (see some other blog posts), we can talk about Scott’s “state theory” as a form of technical knowledge, aimed at facilitating power over the social and natural world. What he discusses is the limitation of technical knowledge in mastering the social, due to complexity and differentiation in local practice. So much of this complexity is due to the politicization of language and representation that occurs in local practice. Standard units of measurement and standard terminology are tools of state power; efforts to guarantee them are confounded again and again in local interest. This disagreement is a rejection of the possibility of hermeneutic knowledge, which is to say linguistic agreement about norms.

In other words, Scott is pointing to a phenomenon where because of the interests of different parties at different levels of power, there’s a strategic local rejection of inter-subjective agreement. Implicitly, agreeing even on how to talk with somebody with power over you is conceding their power. The alternative is refusal in some sense. A second order effect of the complexity caused by this strategic disagreement is the confounding of technical mastery over the social. In Scott’s terminology, a society that is full of strategic lexical disagreement is not legible.

These are generalizations reflecting tendencies in society across history. Nevertheless, merely by asserting them I am arguing that they have a kind of special status that is not itself caught up in the strategic subversions of discourse that make other forms of expertise foolish. There must be some forms of representation that persist despite the verbal disagreements and differently motivated parties that use them.

I’d like to call these kinds of representations, which are somehow technically valid enough to be useful and yet robust to disagreement, even politicized disagreement, equilibrium representations. The idea here is that despite a lot of cultural and epistemic churn, there are still attractor states in the complex system of knowledge production. At equilibrium, these representations will be stable and serve as the basis for communication between different parties.

I’ve posited equilibrium representations hypothetically, without yet having a proof or an example of one that actually exists. My point is to have a useful concept that acknowledges the kinds of epistemic complexities raised by Scott but also the conditions under which a modernist epistemology could prevail despite those complexities.


by Sebastian Benthall at March 16, 2017 05:57 PM

appropriate information flow

Contextual integrity theory defines privacy as appropriate information flow.

Whether or not this is the right way to define privacy (which might, for example, be something much more limited), and whether or not contextual integrity as it is currently resourced as a theory is capable of capturing all considerations needed to determine the appropriateness of information flow, the very idea of appropriate information flow is a powerful one. It makes sense to strive to better our understanding of which information flows are appropriate, which others are inappropriate, to whom, and why.
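To make the idea of appropriate information flow concrete, here is a deliberately simplified Python sketch. It assumes the usual contextual integrity parameterization of a flow (sender, recipient, information subject, information type, transmission principle) and treats a context's norms as a list of allowed flow patterns, with None as a wildcard. Real contextual norms are far richer and often contested; the class names and the example health-care norm are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flow:
    sender: str
    recipient: str
    subject: str
    info_type: str
    principle: str


@dataclass(frozen=True)
class Norm:
    """An allowed flow pattern; a field left as None matches any value."""
    sender: Optional[str] = None
    recipient: Optional[str] = None
    subject: Optional[str] = None
    info_type: Optional[str] = None
    principle: Optional[str] = None

    def matches(self, flow: Flow) -> bool:
        pairs = [
            (self.sender, flow.sender),
            (self.recipient, flow.recipient),
            (self.subject, flow.subject),
            (self.info_type, flow.info_type),
            (self.principle, flow.principle),
        ]
        return all(expected is None or expected == actual for expected, actual in pairs)


def is_appropriate(flow: Flow, norms: list[Norm]) -> bool:
    """A flow is appropriate in a context if some norm of that context allows it."""
    return any(norm.matches(flow) for norm in norms)


# Hypothetical health-care context with a single norm.
health_norms = [Norm(sender="patient", recipient="doctor", info_type="medical",
                     principle="confidentiality")]

to_doctor = Flow("patient", "doctor", "patient", "medical", "confidentiality")
to_employer = Flow("doctor", "employer", "patient", "medical", "sale")
print(is_appropriate(to_doctor, health_norms), is_appropriate(to_employer, health_norms))
```

Returning the matching norm instead of a bare boolean would be one way to record why a given flow counts as appropriate.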


by Sebastian Benthall at March 16, 2017 01:38 AM

March 15, 2017

Ph.D. student

Seeing Like a State: problems facing the code rural

I’ve been reading James C. Scott’s Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed for, once again, Classics. It’s just as good as everyone says it is, and in many ways the counterpoint to James Beniger’s The Control Revolution that I’ve been looking for. It’s also highly relevant to work I’m doing on contextual integrity in privacy.

Here’s a passage I read on the subway this morning that talks about the resistance to codification of rural land use customs in Napoleonic France.

In the end, no postrevolutionary rural code attracted a winning coalition, even amid a flurry of Napoleonic codes in nearly all other realms. For our purposes, the history of the stalemate is instructive. The first proposal for a code, which was drafted in 1803 and 1807, would have swept away most traditional rights (such as common pasturage and free passage through others’ property) and essentially recast rural property relations in the light of bourgeois property rights and freedom of contract. Although the proposed code prefigured certain modern French practices, many revolutionaries blocked it because they feared that its hands-off liberalism would allow large landholders to recreate the subordination of feudalism in a new guise.

A reexamination of the issue was then ordered by Napoleon and presided over by Joseph Verneilh Puyrasseau. Concurrently, Depute Lalouette proposed to do precisely what I supposed, in the hypothetical example, was impossible. That is, he undertook to systematically gather information about all local practices, to classify and codify them, and then to sanction them by decree. The decree in question would become the code rural. Two problems undid this charming scheme to present the rural populace with a rural code that simply reflected its own practices. The first difficulty was in deciding which aspects of the literally “infinite diversity” of rural production relations were to be represented and codified. Even in a particular locality, practices varied greatly from farm to farm over time; any codification would be partly arbitrary and artificially static. To codify local practices was thus a profoundly political act. Local notables would be able to sanction their preferences with the mantle of law, whereas others would lose customary rights that they depended on. The second difficulty was that Lalouette’s plan was a mortal threat to all state centralizers and economic modernizers for whom a legible, national property regime was the precondition of progress. As Serge Aberdam notes, “The Lalouette project would have brought about exactly what Merlin de Douai and the bourgeois, revolutionary jurists always sought to avoid.” Neither Lalouette nor Verneilh’s proposed code was ever passed, because they, like their predecessor in 1807, seemed to be designed to strengthen the hand of the landowners.

(Emphasis mine.)

The moral of the story is that just as the codification of a land map will be inaccurate and politically contested for its biases, so too a codification of customs and norms will suffer the same fate. As Borges’ fable On Exactitude in Science mocks the ambition of physical science, we might see the French attempts at a code rural as a mockery of the ambition of computational social science.

On the other hand, Napoleonic France did not have the sweet ML we have today. So all bets are off.

by Sebastian Benthall at March 15, 2017 03:16 PM

March 14, 2017

Ph.D. student

industrial technology development and academic research

I now split my time between industrial technology (software) development and academic research.

There is a sense in which both activities are “scientific”. They both require the consistent use of reason and investigation to arrive at reliable forms of knowledge. My industrial and academic specializations are closely enough aligned that both aim to create some form of computational product. These activities are constantly informing one another.

What is the difference between these two activities?

One difference is that industrial work pays a lot better than academic work. This is probably the most salient difference in my experience.

Another difference is that academic work is more “basic” and less “applied”, allowing it to address more speculative questions.

You might think that the latter kind of work is more “fun”. But really, I find both kinds of work fun. Fun-factor is not an important difference for me.

What are other differences?

Here’s one: I find myself emotionally moved and engaged by my academic work in certain ways. I suppose that since my academic work straddles technology research and ethics research (I’m studying privacy-by-design), one thing I’m doing when I do this work is engaging and refining my moral intuitions. This is rewarding.

I do sometimes also feel that it is self-indulgent, because one thing that thinking about ethics isn’t is taking responsibility for real change in the world. And here I’ll express an opinion that is unpopular in academia, which is that being in industry is about taking responsibility for real change in the world. This change can benefit other people, and it’s good when people in industry get paid well because they are doing hard work that entails real risks. Part of the risk is the responsibility that comes with action in an uncertain world.

Another critically important difference between industrial technology development and academic research is that while the knowledge created by the former is designed foremost to be deployed and used, the knowledge created by the latter is designed to be taught. As I get older and more advanced as a researcher, I see that this difference is actually an essential one. Knowledge that is designed to be taught needs to be teachable to students, and students are generally coming from both a shallower and a narrower background than adult professionals. Knowledge that is designed to be deployed and used need only be truly shared by a small number of experienced practitioners. Most of the people affected by the knowledge will be affected by it indirectly, via artifacts. It can be opaque to them.

Industrial technology production changes the way the world works and makes the world more opaque. Academic research changes the way people work, and reveals things about the world that had been hidden or unknown.

When straddling both worlds, it becomes quite clear that while students are taught that academic scientists are at the frontier of knowledge, ahead of everybody else, they are actually far behind what’s being done in industry. The constraint that academic research must be taught actually drags its form of science far behind what’s being done regularly in industry.

This is humbling for academic science. But it doesn’t make it any less important. Rather, it makes it even more important, but not because of the heroic status of academic researchers being at the top of the pyramid of human knowledge. It’s because the health of the social system depends on its renewal through the education system. If most knowledge is held in secret and deployed but not passed on, we will find ourselves in a society that is increasingly mysterious and out of our control. Academic research is about advancing the knowledge that is available for education. Its effects can take half a generation or longer to come to fruition. Against this long-term signal, the oscillations that happen within industrial knowledge, which are very real, do fade into the background. Though not before having real and often lasting effects.

by Sebastian Benthall at March 14, 2017 02:27 AM

March 03, 2017

Ph.D. alumna

Failing to See, Fueling Hatred.

I was 19 years old when some configuration of anonymous people came after me. They got access to my email and shared some of the most sensitive messages on an anonymous forum. This was after some of my girl friends received anonymous voice messages describing how they would be raped. And after the black and Latinx high school students I was mentoring were subject to targeted racist messages whenever they logged into the computer cluster we were all using. I was ostracized for raising all of this to the computer science department’s administration. A year later, when I applied for an internship at Sun Microsystems, an alum known for his connection to the anonymous server that was used actually said to me, “I thought that they managed to force you out of CS by now.”

Needless to say, this experience hurt like hell. But in trying to process it, I became obsessed not with my own feelings but with the logics that underpinned why some individual or group of white male students privileged enough to be at Brown University would do this. (In investigations, the abusers were narrowed down to a small group of white men in the department but it was never going to be clear who exactly did it and so I chose not to pursue the case even though law enforcement wanted me to.)

My first breakthrough came when I started studying bullying, when I started reading studies about why punitive approaches to meanness and cruelty backfire. It’s so easy to hate those who are hateful, so hard to be empathetic to where they’re coming from. This made me double down on an ethnographic mindset that requires that you step away from your assumptions and try to understand the perspective of people who think and act differently than you do. I’m realizing more and more how desperately this perspective is needed as I watch researchers and advocates, politicians and everyday people judge others from their vantage point without taking a moment to understand why a particular logic might unfold.

The Local Nature of Wealth

A few days ago, my networks were on fire with condescending comments referencing an article in The Guardian titled “Scraping by on six figures? Tech workers feel poor in Silicon Valley’s wealth bubble.” I watched as all sorts of reasonably educated, modestly but sustainably paid people mocked tech folks for expressing frustration about how their well-paid jobs did not allow them to have the sustainable lifestyle that they wanted. For most, Silicon Valley is at a distance, a far off land of imagination brought to you by the likes of David Fincher and HBO. Progressive values demand empathy for the poor and this often manifests as hatred for the rich. But what’s missing from this mindset is an understanding of the local perception of wealth, poverty, and status. And, more importantly, the political consequences of that local perception.

Think about it this way. I live in NYC where the median household income is somewhere around $55K. My network primarily makes above the median and yet they all complain that they don’t have enough money to achieve what they want in NYC, whether they’re making $55K, $70K, or $150K. Complaining about not having enough money is ritualized alongside complaining about the rents. No one I know really groks that they’re making above the median income for the city (and, thus, that most people are much poorer than they are), let alone how absurd their complaints might sound to someone from a poorer country where a median income might be $1500 (e.g., India).

The reason for this is not simply that people living in NYC are spoiled, but that people’s understanding of prosperity is shaped by what they see around them. Historically, this has been understood through word-of-mouth and status markers. In modern times, those status markers are often connected to conspicuous consumption. “How could HE afford a new pair of Nikes!?!?”

The dynamics of comparison are made trickier by media. Even before yellow journalism, there has always been some version of Page Six or “Lifestyles of the Rich and Famous.” Stories of gluttonous and extravagant behaviors abound in ancient literature. Today, with Instagram and reality TV, the idea of haves and havenots is pervasive, shaping cultural ideas of privilege and suffering. Everyday people perform for the camera and read each other’s performances critically. And still, even as we watch rich people suffer depression or celebrities experience mental breakdowns, we don’t know how to walk in each other’s shoes. We collectively mock them for their privilege as a way to feel better for our own comparative struggles.

In other words, in a neoliberal society, we consistently compare ourselves to others in ways that make us feel as though we are less well off than we’d like. And we mock others who are more privileged who do the same. (And, horribly, we often blame others who are not for making bad decisions.)

The Messiness of Privilege

I grew up with identity politics, striving to make sense of intersectional politics and confused about what it meant to face oppression as a woman and privilege as a white person. I now live in a world of tech wealth while my family does not. I live with contradictions and I work on issues that make those contradictions visible to me on a regular basis. These days, I am surrounded by civil rights advocates and activists of all stripes. Folks who remind me to take my privilege seriously. And still, I struggle to be a good ally, to respond effectively to challenges to my actions. Because of my politics and ideals, I wake up each day determined to do better.

Yet, with my ethnographer’s hat on, I’m increasingly uncomfortable with how this dynamic is playing out. Not for me personally, but for effecting change. I’m nervous that the way that privilege is being framed and politicized is doing damage to progressive goals and ideals. In listening to white men who see themselves as “betas” or identify as NEETs (“Not in Education, Employment, or Training”) describe their hatred of feminists or social justice warriors, I hear the cost of this frame. They don’t see themselves as empowered or privileged and they rally against these frames. And they respond antagonistically in ways that further the divide, as progressives feel justified in calling them out as racist and misogynist. Hatred emerges on both sides and the disconnect produces condescension as everyone fails to hear where each other comes from, each holding onto their worldview that they are the disenfranchised, they are the oppressed. Power and wealth become othered and agency becomes understood through the lens of challenging what each believes to be the status quo.

It took me years to understand that the boys who tormented me in college didn’t feel powerful, didn’t see their antagonism as oppression. I was even louder and more brash back then than I am now. I walked into any given room performing confidence in ways that completely obscured my insecurities. I took up space, used my sexuality as a tool, and demanded attention. These were the survival skills that I had learned to harness as a ticket out. And these are the very same skills that have allowed me to succeed professionally and get access to tremendous privilege. I have paid a price for some of the games that I have played, but I can’t deny that I’ve gained a lot in the process. I have also come to understand that my survival strategies were completely infuriating to many geeky white boys that I encountered in tech. Many guys saw me as getting ahead because I was a token woman. I was accused of sleeping my way to the top on plenty of occasions. I wasn’t simply seen as an alpha; I was seen as the kind of girl that screwed boys over. And because I was working on diversity and inclusion projects in computer science to attract more women and minorities to the field, I was seen as being the architect of excluding white men. For so many geeky guys I met, CS was the place where they felt powerful and I stood for taking that away. I represented an oppressor to them even though I felt like it was they who were oppressing me.

Privilege is complicated. There is no static hierarchical structure of oppression. Intersectionality provides one tool for grappling with the interplay between different identity politics, but there’s no narrative for why beta white male geeks might feel excluded from these frames. There’s no framework for why white Christians might feel oppressed by rights-oriented activists. When we think about privilege, we talk about the historical nature of oppression, but we don’t account for the ways in which people’s experiences of privilege are local. We don’t account for the confounding nature of perception, except to argue that people need to wake up.

Grappling with Perception

We live in a complex interwoven society. In some ways, that’s intentional. After WWII, many politicians and activists wanted to make the world more interdependent, to enable globalization to prevent another world war. The stark reality is that we all depend on social, economic, and technical infrastructures that we can’t see and don’t appreciate. Sure, we can talk about how our food is affordable because we’re dependent on underpaid undocumented labor. We can take our medicine for granted because we fail to appreciate all of the regulatory processes that go into making sure that what we consume is safe. But we take lots of things for granted; it’s the only way to move through the day without constantly panicking about whether or not the building we’re in will collapse.

Without understanding the complex interplay of things, it’s hard not to feel resentful about certain things that we do see. But at the same time, it’s not possible to hold onto the complexity. I can appreciate why individuals are indignant when they feel as though they pay taxes for that money to be given away to foreigners through foreign aid and immigration programs. These people feel like they’re struggling, feel like they’re working hard, feel like they’re facing injustice. Still, it makes sense to me that people’s sense of prosperity is only as good as their feeling that they’re getting ahead. And when you’ve been earning $40/hour doing union work only to lose that job and feel like the only other option is a $25/hr job, the feeling is bad, no matter that this is more than most people make. There’s a reason that Silicon Valley engineers feel as though they’re struggling and it’s not because they’re comparing themselves to everyone in the world. It’s because the standard of living keeps dropping in front of them. It’s all relative.

It’s easy to say “tough shit” or “boo hoo hoo” or to point out that most people have it much worse. And, at some levels, this is true. But if we don’t account for how people feel, we’re not going to achieve a more just world — we’re going to stoke the fires of a new cultural war as society becomes increasingly polarized.

The disconnect between statistical data and perception is astounding. I can’t help but shake my head when I listen to folks talk about how life is better today than it ever has been in history. They point to increased lifespan, new types of medicine, decline in infant mortality, and decline in poverty around the world. And they shake their heads in dismay about how people don’t seem to get it, don’t seem to get that today is better than yesterday. But perception isn’t about statistics. It’s about a feeling of security, a confidence in one’s ecosystem, a belief that through personal effort and God’s will, each day will be better than the last. That’s not where the vast majority of people are at right now. To the contrary, they’re feeling massively insecure, as though their world is very precarious.

I am deeply concerned that the people whose values and ideals I share are achieving solidarity through righteous rhetoric that also produces condescending and antagonistic norms. I don’t fully understand my discomfort, but I’m scared that what I’m seeing around me is making things worse. And so I went back to some of Martin Luther King Jr.’s speeches for a bit of inspiration today and I started reflecting on his words. Let me leave this reflection with this quote:

The ultimate weakness of violence is that it is a descending spiral,
begetting the very thing it seeks to destroy.
Instead of diminishing evil, it multiplies it.
Through violence you may murder the liar,
but you cannot murder the lie, nor establish the truth.
Through violence you may murder the hater,
but you do not murder hate.
In fact, violence merely increases hate.
So it goes.
Returning violence for violence multiplies violence,
adding deeper darkness to a night already devoid of stars.
Darkness cannot drive out darkness:
only light can do that.
Hate cannot drive out hate: only love can do that.
— Dr. Martin Luther King, Jr.


by zephoria at March 03, 2017 09:19 PM

March 01, 2017

Ph.D. student

arXiv preprint of Refutation of Bostrom’s Superintelligence Argument released

I’ve written a lot of blog posts about Nick Bostrom’s book Superintelligence, presenting what I think is a refutation of its core argument.

Today I’ve released an arXiv preprint with a more concise and readable version of this argument. Here’s the abstract:

Don’t Fear the Reaper: Refuting Bostrom’s Superintelligence Argument

In recent years prominent intellectuals have raised ethical concerns about the consequences of artificial intelligence. One concern is that an autonomous agent might modify itself to become “superintelligent” and, in supremely effective pursuit of poorly specified goals, destroy all of humanity. This paper considers and rejects the possibility of this outcome. We argue that this scenario depends on an agent’s ability to rapidly improve its ability to predict its environment through self-modification. Using a Bayesian model of a reasoning agent, we show that there are important limitations to how an agent may improve its predictive ability through self-modification alone. We conclude that concern about this artificial intelligence outcome is misplaced and better directed at policy questions around data access and storage.

I invite any feedback on this work.

by Sebastian Benthall at March 01, 2017 02:18 PM

February 20, 2017

Ph.D. alumna

Heads Up: Upcoming Parental Leave

There’s a joke out there that when you’re having your first child, you tell everyone personally and update your family and friends about every detail throughout the pregnancy. With Baby #2, there’s an abbreviated notice that goes out about the new addition, all focused on how Baby #1 is excited to have a new sibling. And with Baby #3, you forget to tell people.

I’m a living instantiation of that. If all goes well, I will have my third child in early March and I’ve apparently forgotten to tell anyone since folks are increasingly shocked when I indicate that I can’t help out with XYZ because of an upcoming parental leave. Oops. Sorry!

As noted when I gave a heads up with Baby #1 and Baby #2, I plan on taking parental leave in stride. I don’t know what I’m in for. Each child is different and each recovery is different. What I know for certain is that I don’t want to screw over collaborators or my other baby – Data & Society. As a result, I will not be taking on new commitments and I will be actively working to prioritize my collaborators and team over the next six months.

In the weeks following birth, my response rates may become sporadic and I will probably not respond to non-mission-critical email. I also won’t be scheduling meetings. Although I won’t go completely offline in March (mostly for my own sanity), I am fairly certain that I will take an email sabbatical in July when my family takes some serious time off** to be with one another and travel.

A change in family configuration is fundamentally walking into the abyss. For as much as our culture around maternity leave focuses on planning, so much is unknown. After my first was born, I got a lot of work done in the first few weeks afterwards because he was sleeping all the time and then things got crazy just as I was supposedly going back to work. That was less true with #2, but with #2 I was going seriously stir crazy being home in the cold winter and so all I wanted was to go to lectures with him to get out of bed and soak up random ideas. Who knows what’s coming down the pike. I’m fortunate enough to have the flexibility to roll with it and I intend to do precisely that.

What’s tricky about being a parent in this ecosystem is that you’re kinda damned if you do, damned if you don’t. Women are pushed to go back to work immediately to prove that they’re serious about their work – or to take serious time off to prove that they’re serious about their kids. Male executives are increasingly publicly talking about taking time off, while they work from home.  The stark reality is that I love what I do. And I love my children. Life is always about balancing different commitments and passions within the constraints of reality (time, money, etc.).  And there’s nothing like a new child to make that balancing act visible.

So if you need something from me, let me know ASAP! And please understand and respect that I will be navigating a lot of unknowns and doing my best to achieve a state of balance in the upcoming months of uncertainty.


** July 2017 vacation. After a baby is born, the entire focus of a family is on adjustment. For the birthing parent, it’s also on recovery because babies kinda wreck your body no matter how they come out. Finding rhythms for sleep and food becomes key for survival. Folks talk about this time as precious because it can enable bonding. That hasn’t been my experience and so I’ve relished the opportunity with each new addition to schedule some full-family bonding time a few months after birth where we can do what our family likes best – travel and explore as a family. If all goes well in March, we hope to take a long vacation in mid-July where I intend to be completely offline and focused on family. More on that once we meet the new addition.

by zephoria at February 20, 2017 01:45 PM

February 15, 2017

Ph.D. alumna

When Good Intentions Backfire

… And Why We Need a Hacker Mindset

I am surrounded by people who are driven by good intentions. Educators who want to inform students, who passionately believe that people can be empowered through knowledge. Activists who have committed their lives to addressing inequities, who believe that they have a moral responsibility to shine a spotlight on injustice. Journalists who believe their mission is to inform the public, who believe that objectivity is the cornerstone of their profession. I am in awe of their passion and commitment, their dedication and persistence.

Yet, I’m existentially struggling as I watch them fight for what is right. I have learned that people who view themselves through the lens of good intentions cannot imagine that they could be a pawn in someone else’s game. They cannot imagine that the values and frames that they’ve dedicated their lives towards (free speech, media literacy, truth) could be manipulated or repurposed by others in ways that undermine their good intentions.

I find it frustrating to bear witness to good intentions getting manipulated, but it’s even harder to watch how those who are wedded to good intentions are often unwilling to acknowledge this, let alone start imagining how to develop the appropriate antibodies. Too many folks that I love dearly just want to double down on the approaches they’ve taken and the commitments they’ve made. On one hand, I get it: folks’ life-work and identities are caught up in these issues.

But this is where I think we’re going to get ourselves into loads of trouble.

The world is full of people with all sorts of intentions. Their practices and values, ideologies and belief systems collide in all sorts of complex ways. Sometimes, the fight is about combating horrible intentions, but often it is not. In college, my roommate used to pound a mantra into my head whenever I would get spun up about something: Do not attribute to maliciousness what you can attribute to stupidity. I return to this statement a lot when I think about how to build resilience and challenge injustices, especially when things look so corrupt and horribly intended, or when people who should be allies see each other as combatants. But as I think about how we should resist manipulation and fight prejudice, I also think that it’s imperative to move away from simply relying on “good intentions.”

I don’t want to undermine those with good intentions, but I also don’t want good intentions to be a tool that can be used against people. So I want to think about how good intentions get embedded in various practices and the implications of how we view the different actors involved.

The Good Intentions of Media Literacy

When I penned my essay “Did Media Literacy Backfire?”, I wanted to ask those who were committed to media literacy to think about how their good intentions, situated in a broader cultural context, might not play out as they would like. Folks who critiqued my essay on media literacy pushed back in all sorts of ways, both online and off. Many made me think, but some also reminded me that my way of writing was off-putting. I was accused of using the question “Did media literacy backfire?” to stoke clicks. Some snarkily challenged my suggestion that media literacy was even meaningfully in existence, asked me to be specific about which instantiations I meant (because I used the phrase “standard implementations”), and otherwise pushed for the need to double down on “good” or “high quality” media literacy. The reality is that I’m a huge proponent of their good intentions (and have long shared them), but I wrote this piece because I’m worried that good intentions can backfire.

While I was researching youth culture, I never set out to understand what curricula teachers used in the classroom. I wasn’t there to assess the quality of the teachers or the efficacy of their formal educational approaches. I simply wanted to understand what students heard and how they incorporated the lessons they received into their lives. Although the teens that I met had a lot of choice words to offer about their teachers, I’ve always assumed that most teachers entered the profession with the best of intentions, even if their students couldn’t see that. But I spent my days listening to students’ frustrations and misperceptions of the messages teachers offered.

I’ve never met an educator who thinks that the process of educating is easy or formulaic. (Heck, this is why most educators roll their eyes when they hear talk of computerized systems that can educate better than teachers.) So why do we assume that well-intended classroom lessons, or even well-designed curricula, will play out as we imagine? This isn’t simply about the efficacy of the lesson or the skill of the teacher, but the cultural context in which these conversations occur.

In many communities in which I’ve done research, the authority of teachers is often questioned. Nowhere is this more painfully visible than when well-intended highly educated (often white) teachers come to teach in poorer communities of color. Yet, how often are pedagogical interventions designed by researchers really taking into account the doubt that students and their parents have of these teachers? And how do we as educators and scholars grapple with how we might have made mistakes?

I’m not asking “Did Media Literacy Backfire?” to be a pain in the toosh, but to genuinely highlight how the ripple effects of good intentions may not play out as imagined on the ground for all sorts of reasons.

The Good Intentions of Engineers

From the outside, companies like Facebook and Google seem pretty evil to many people. They’re situated in a capitalist logic that many advocates and progressives despise. They’re opaque and they don’t engage the public in their decision-making processes, even when those decisions have huge implications for what people read and think. They’re extremely powerful and they’ve made a lot of people rich in an environment where financial inequality and instability is front and center. Primarily located in one small part of the country, they also seem like a monolithic beast.

As a result, it’s not surprising to me that many people assume that engineers and product designers have evil (or at least financially motivated) intentions. There’s an irony here because my experience is the opposite. Most product teams have painfully good intentions, shaped by utopic visions of how the ideal person would interact with the ideal system. Nothing is more painful than sitting through a product design session with design personae that have been plucked from a collection of clichés.

I’ve seen a lot of terribly naive product plans, with user experience mockups that lack any sense of how or why people might interact with a system in unexpected ways. I spent years tracking how people did unintended things with social media, such as the rise of “Fakesters,” or of teenagers who gamed Facebook’s system by inserting brand names into their posts, realizing that this would make their posts rise higher in the social network’s news feed. It has always boggled my mind how difficult it is for engineers and product designers to imagine how their systems would get gamed. I actually genuinely loved product work because I couldn’t help but think about how to break a system through unexpected social practices.

Most products and features that get released start with good intentions, but they too get munged by the system, framed by marketing plans, and manipulated by users. And then there’s the dance of chaos as companies seek to clean up PR messes (which often involves non-technical actors telling insane fictions about the product), patch bugs to prevent abuse, and throw band-aids on parts of the code that didn’t play out as intended. There’s a reason that no one can tell you exactly how Google’s search engine or Facebook’s news feed works. Sure, the PR folks will tell you that it’s proprietary code. But the ugly truth is that the code has been patched to smithereens to address countless types of manipulation and gaming (e.g., from SEO to bots). It’s quaint to read the original PageRank paper that Brin and Page wrote when they envisioned how a search engine could ideally work. That’s so not how the system works today.

The good intentions of engineers and product people, especially those embedded in large companies, are often dismissed as a sheen over a capitalist agenda. Yet, like many other well-intended actors, I often find that makers feel misunderstood and maligned, assumed to have evil thoughts. And I often think that when non-tech people start by assuming that they’re evil, we lose a significant opportunity to address problems.

The Good Intentions of Journalists

I’ve been harsh on journalists lately, mostly because I find it so infuriating that a profession that is dedicated to being a check to power could be so ill-equipped to be self-reflexive about its own practices.

Yet, I know that I’m being unfair. Their codes of conduct and idealistic visions of their profession help journalists and editors and publishers stay strong in an environment where they are accustomed to being attacked. It just kills me that the culture of journalism makes those who have an important role to play unable to see how they can be manipulated at scale.

Sure, plenty of top-notch journalists are used to negotiating deception and avoidance. You gotta love a profession that persistently bangs its head against a wall of “no comment.” But journalism has grown up as an individual sport; a competition for leads and attention that can get fugly in the best of configurations. Time is rarely on a journalist’s side, just as nuance is rarely valued by editors. Trying to find “balance” in this ecosystem has always been a pipe dream, but objectivity is a shared hallucination that keeps well-intended journalists going.

Powerful actors have always tried to manipulate the news media, especially State actors. This is why the fourth estate is seen as so important in the American context. Yet, the game has changed, in part because of the distributed power of the masses. Social media marketers quickly figured out that manufacturing outrage and spectacle would give them a pathway to attention, attracting news media like bees to honey. Most folks rolled their eyes, watching as monied people played the same games as State actors. But what about the long tail? How do we grapple with the long tail? How should journalists respond to those who are hacking the attention economy?

I am genuinely struggling to figure out how journalists, editors, and news media should respond in an environment in which they are getting gamed. What I do know from 12-step programs is that the first step is to admit that you have a problem. And we aren’t there yet. And sadly, that means that good intentions are getting gamed.

Developing the Hacker Mindset

I’m in awe of how many of the folks I vehemently disagree with are willing to align themselves with others they vehemently disagree with when they have a shared interest in the next step. Some conservative and hate groups are willing to be odd bedfellows because they’re willing to share tactics, even if they don’t share end goals. Many progressives can’t even imagine coming together with folks who have a slightly different vision, let alone a different end goal, to work out shared tactics. Why is that?

I’m writing these essays not because I know the solutions to some of the most complex problems that we face (I don’t), but because I think that we need to start thinking about these puzzles sideways, upside down, and from non-Euclidean spaces. In short, I keep thinking that we need more well-intended folks to start thinking like hackers.

Think just as much about how you build an ideal system as how it might be corrupted, destroyed, manipulated, or gamed. Think about unintended consequences, not simply to stop a bad idea but to build resilience into the model.

As a developer, I always loved the notion of “extensibility” because it was an ideal of building a system that could take unimagined future development into consideration. Part of why I love the notion is that it’s bloody impossible to implement. Sure, I (poorly) comment my code and build object-oriented structures that would allow for some level of technical flexibility. But, at the end of the day, I’d always end up kicking myself for not imagining a particular use case in my original design and, as a result, doing a lot more band-aiding than I’d like to admit. The masters of software engineering extensibility are inspiring because they don’t just hold onto the task at hand, but have a vision for all sorts of different future directions that may never come to fruition. That thinking is so key to building anything, whether it be software or a campaign or a policy. And yet, it’s not a muscle that we train people to develop.

If we want to address some of the major challenges in civil society, we need the types of people who think 10 steps ahead in chess, imagine innovative ways of breaking things, and think with extensibility at their core. More importantly, we all need to develop that sensibility in ourselves. This is the hacker mindset.

This post was originally posted on Points. It builds off of a series of essays on topics affecting the public sphere written by folks at Data & Society. As expected, my earlier posts ruffled some feathers, and I’ve been trying to think about how to respond in a productive manner. This is my attempt.

Flickr Image: CC BY 2.0-licensed image by DaveBleasdale.

by zephoria at February 15, 2017 05:51 PM

February 12, 2017

Ph.D. student

the “hacker class”, automation, and smart capital


I mentioned earlier that I no longer think hacker class consciousness is important.

As incongruous as this claim is now, I’ve explained that this is coming up as I go through old notes and discard them.

I found another page of notes that reminds me there was a little more nuance to my earlier position than I remembered, which has to do with the kind of labor done by “hackers”, a term I reserve the right to use in the MIT/Eric S. Raymond sense, without the political baggage that has since attached to the term.

The point was a response to Eric S. Raymond’s essay “How To Become A Hacker,” which holds that part of what it means to be a “hacker” is to hate drudgery. The whole point of programming a computer is so that you never have to do the same activity twice. Ideally, anything that’s repeatable about the activity gets delegated to the computer.

This is relevant in the contemporary political situation because we’re probably now dealing with the upshot of structural underemployment due to automation and the resulting inequalities. This remains a topic that scholars, technologists, and politicians seem systematically unable to address directly, even when they attempt to, because everybody who sees the writing on the wall is too busy trying to get the sweet end of that deal.

It’s a very old argument that those who own the means of production are able to negotiate for a better share of the surplus value created by their collaborations with labor. Those who own or invest in capital generally speaking would like to increase that share. So there’s market pressure to replace reliance on skilled labor, which is expensive, with reliance on less skilled labor, which is plentiful.

So what gets industrialists excited is smart capital, or a means of production that performs the “skilled” functions formerly performed by labor. Call it artificial intelligence. Call it machine learning. Call it data science. Call it “the technology industry”. That’s what’s happening and has been happening for some time.

This leaves good work for a single economic class of people, those whose skills are precisely those that produce this smart capital.

I never figured out what the end result of this process would be. I imagined at one point that the creation of the right open source technology would bring about a profound economic transformation. A far-fetched hunch.

by Sebastian Benthall at February 12, 2017 10:14 PM

three kinds of social explanation: functionalism, politics, and chaos

Roughly speaking, I think there are three kinds of social explanation. I mean “explanation” in a very thick sense; an explanation is an account of why some phenomenon is the way it is, grounded in some kind of theory that could be used to explain other phenomena as well. To say there are three kinds of social explanation is roughly equivalent to saying there are three ways to model social processes.

The first of these kinds of social explanation is functionalism. This explains some social phenomenon in terms of the purpose that it serves. Generally speaking, fulfilling this purpose is seen as necessary for the survival or continuation of the phenomenon. Maybe it simply is the continued survival of the social organism that is its purpose. A kind of agency, though probably very limited, is ascribed to the entire social process. The activity internal to the process is then explained by the purpose that it serves.

The second kind of social explanation is politics. Political explanations focus on the agencies of the participants within the social system and reject the unifying agency of the whole. Explanations based on class conflict or personal ambition are political explanations. Political explanations of social organization make it out to be the result of a complex of incentives and activity. Where there is social regularity, it is because of the political interests of some of its participants in the continuation of the organization.

The third kind of social explanation is hardly an explanation at all. It is explanation by chaos. This sort of explanation is quite rare, as it does not provide much of the psychological satisfaction we like from explanations. I mention it here because I think it is an underutilized mode of explanation. In large populations, much of the activity that happens will do so by chance. Even large organizations may form according to stochastic principles that do not depend on any real kind of coordinated or purposeful effort.
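The chaos mode of explanation can be made concrete with a toy simulation. The sketch below is not from the post: it borrows the Chinese restaurant process, a standard stochastic model in which each newcomer joins an existing group with probability proportional to that group's size, or founds a new group with probability proportional to a parameter `alpha`. The function name, the `alpha` and `seed` parameters, and the population size in the usage lines are illustrative assumptions. Under these assumptions, a few groups typically end up much larger than the rest even though no agent coordinates or plans anything.

```python
import random


def chinese_restaurant_process(n_agents, alpha=1.0, seed=0):
    """Assign agents to groups one at a time, with no coordination.

    Hypothetical toy model: each newcomer joins an existing group with
    probability proportional to that group's current size, or founds a
    new group with probability proportional to alpha.
    Returns the list of final group sizes.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    group_sizes = []
    for placed in range(n_agents):
        # `placed` agents are already in groups, so the total weight is placed + alpha.
        r = rng.uniform(0, placed + alpha)
        cumulative = 0.0
        for g, size in enumerate(group_sizes):
            cumulative += size
            if r < cumulative:
                group_sizes[g] += 1  # join an existing group
                break
        else:
            group_sizes.append(1)  # no existing group chosen: found a new one
    return group_sizes


if __name__ == "__main__":
    sizes = sorted(chinese_restaurant_process(10_000, alpha=2.0), reverse=True)
    print(f"{len(sizes)} groups; largest five: {sizes[:5]}")
```

Raising `alpha` makes new groups more likely. The concentration into large groups comes entirely from the proportional joining rule, which is the point of the illustration: regularity without purpose or strategy.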

It is important to consider chaotic explanation of social processes when we consider the limits of political expertise. If we have a low opinion of any particular person’s ability to understand their social environment and act strategically, then we must accept that much of their “politically” motivated actions will be based on misconceptions and therefore be, in an objective sense, random. At this point political explanations become facile, and social regularity has to be explained either in terms of the ability of social organizations qua organizations to survive, or the organization must be explained in a deflationary way: i.e., that the organization is not really there, but just in the eye of the beholder.

by Sebastian Benthall at February 12, 2017 02:36 AM

February 09, 2017

MIMS 2012

Artists don't distinguish between...

Artists don’t distinguish between the act of making something and the act of thinking about it — thinking and making evolve together in an emergent, concurrent fashion. As a result, when approaching a project, an artist often doesn’t seem to plan it out. She just goes ahead and begins, all the while collecting data that inform how she will continue. A large part of what drives her confidence to move forward is her faith in her ability to course correct and improvise as she goes.

— John Maeda, “Redesigning Leadership”

This quote from John Maeda’s book Redesigning Leadership really resonated with me. It captures my approach to problems and new challenges perfectly. I don’t stress too much about having every step planned out; I’ve learned to trust my intuition and follow new paths as they appear, having faith that they will lead me to a successful outcome.

“Improvise as she goes.” I never would have thought of it like that, but “improvising” is a great way to describe my approach.

by Jeff Zych at February 09, 2017 01:13 AM

February 06, 2017

Ph.D. student

immigration, automation, xenophobia, and jobs

“The divide is not between the left and right any more but between patriots and globalists.” – Marine Le Pen

I have been trying to get a grip on what’s going on with the global economy. This is hard because I get a lot of my news via Twitter and so can only comprehend arguments in 140 characters or less. Here are several that are floating around:

  • In UK, US, and France, there are those who blame globalization for their underemployment. They advocate for reduced immigration and import protectionism.
  • Economists like Larry Summers assure us that it is technological automation, not free trade, which has caused underemployment. That, and the actual emergence of emerging markets and their capacity to produce competitive goods.
  • Tech companies are rallying to fight Trump’s immigration ban. This is because they want top talent.

Though it pains me to say it, it looks like there is a missing link in the mainstream economic analysis, which is this: to the extent that highly talented immigrants help tech companies produce technology that automates work otherwise performed by non-immigrant labor, there is a real sense in which “immigrants have come to take [our/your] jobs.” It’s not through direct competition over low wages. It’s indirectly through automation.

That said, this is a drop in the bucket, as there’s plenty of domestic labor in the tech industry. If leaked memos and accounts of communications within U.S. leadership are to be believed, the xenophobic aspect of the new protectionism is due to the visibility of successful immigrants. If these successful immigrants are working in technology which has automated domestic jobs, then the racial or national otherness of the immigrants may be adding insult to injury, so to speak.

Please take all this with caveats about how all domestic labor is due to immigration, how the racial and cultural diversity of the countries in question is authentic to these nations’ identities, etc. I’m just trying to get at what the real sticking points are.

by Sebastian Benthall at February 06, 2017 04:51 PM

I’m no longer freaking out about societal collapse

I have been a little worried about societal collapse. I learned something new that made me less worried about it, which is that the 25th Amendment to the Constitution allows the Vice President and a majority of the cabinet to file for the removal of the President. The President can reverse this decision, but if the VP and a majority of the cabinet reassert their declaration, then Congress gets to vote on it.

I learned about this from FiveThirtyEight, which I suppose I should be paying more attention to. Their analysis reminds me of the chapter on Superteams from Tetlock’s Superforecasting: they sit around and critique each other’s views, adjusting their confidence in various hypotheses. Good for them!

In the specific case of the current Presidency and the Federal Government, what this new information does for me is significantly change what the options are for probable worst case scenarios. These worst case scenarios all involve the possibility that (a) Trump goes off the rails doing something truly terrible, possibly (b) trying to defy the authority of the Judicial branch entirely, essentially imposing martial law. This depends on (c) Congress being totally useless.

Earlier, I thought the only way to remove a standing President was impeachment, and given (c) that’s just not likely to happen.

However, given everything being said about the bitter infighting within the White House, it looks like a potential move by Pence and half the cabinet is totally within reason. The bet is largely on Pence’s ambition. He’s young enough to have a career ahead of him. He has more to gain from Defending the Constitution at the last minute than he does from following a lunatic into oblivion. The cabinet is shaping up to be full of ambitious rich people who benefit from having rule of law, as long as that law is not regulating any of their businesses.

I’m not saying that use of the 25th Amendment to depose President Trump is likely to happen. Rather, I think that it provides a check on his power that I hadn’t considered before. He can be reined in or threatened from within his own team, especially as it fills out.

This is no real comfort to all the people who would be disadvantaged by these policies. It tilts the odds in favor of the stability of the current government, with all of its vocal hostility to judges, immigrants, liberals, and so on.

My prediction is that the next four years are going to continue to be very uncomfortable for the public spirited. The federal government may not be the best place to find work in the public interest unless one is a social conservative, because government will mainly be serving private interests.

It was interesting that so many of the Super Bowl ads today were about inclusivity and other left-wing values. If the government is pulling back its support for certain causes, that does not necessarily leave these causes without champions. A forward-looking question is: how will civil society and industry compensate for the things the government is not doing?

by Sebastian Benthall at February 06, 2017 06:42 AM

February 05, 2017

Ph.D. student

metaphysics and politics

In almost any contemporary discussion of politics, today’s experts will tell you that metaphysics is irrelevant.

This is because we are discouraged today from taking a truly totalizing perspective–meaning, a perspective that attempts to comprehend the totality of what’s going on.

Academic work on politics is specialized. It focuses on a specific phenomenon, or issue, or site. This is partly due to the limits of what it is possible to work on responsibly. It is also partly due to the limitations of agency. A grander view of politics isn’t useful for any particular agent; they need only the perspective that best serves them. Blind spots are necessary for agency.

But universalist metaphysics is important for politics precisely because if there is a telos to politics, it is peace, and peace is a condition of the totality.

And while a situated agent may have no need for metaphysics because they are content with the ontology that suits them, situated agents cannot alone make any guarantees of peace.

In order for an agent to act effectively in the interest of total societal conditions, they require an ontology that is not confined by their situation, since a situated ontology will encode those habits of thought necessary for maintaining their situation as such.

What motivates the study of metaphysics, then? One motivation is that it provides one with freedom from one’s situation.

This freedom is a political accomplishment, and it also has political effects.

by Sebastian Benthall at February 05, 2017 04:28 PM

February 03, 2017

Ph.D. student

no, free speech was totally unaffected by the Berkeley violence

When I wrote the other day about anarchist tactics in resistance to perceived fascism, I had in mind non-violent tactics. I did not anticipate that, soon after, Black Bloc anarchists would cause violence in the otherwise peaceful protest of Milo Yiannopoulos’s talk.

There has since been a back and forth about what any of this means in terms of the big picture of the nation’s politics.

I would like to argue that it means nothing.

There has been some commentary about the First Amendment. Berkeley is the historical site of the Free Speech Movement. Right-wing commentators, including Yiannopoulos himself, are eager to paint the event as an ironic crisis of Free Speech. Donald J. Trump, President of the United States of America, has insinuated that UC Berkeley was complicit in the illegal silencing of Yiannopoulos. But these are stupid red herrings. The talk was canceled because a small minority of people who had nothing to do with UC Berkeley made the situation unmanageable, and public safety took priority. Moreover, Black Bloc anarchists are based in Oakland. So this has nothing to do with Berkeley.

Meanwhile, the whole conceit that somebody’s live speaking event at a college campus is a privileged moment in which Yiannopoulos could share his message is silly. This is somebody who has made a career through social media. Everyone who wanted to know what he was going to say could have looked up what he’s already said on-line. It’s because everybody already knew what he was going to say that people were pissed about him showing up.

There’s a lens on the event which is a familiar progressive refrain about the emotional powers of speech. Speech causes the transfer of hate, the triggering of traumas, it offends and causes emotional pain. When somebody says insensitive things, it can be painful. And there’s this idea that by maintaining a collective consciousness pure of bad thoughts, these painful ideas won’t spread.

But hasn’t politicized media already saturated the thoughts of anybody paying attention? The likelihood that the presence or non-presence of a speaker at UC Berkeley is going to be a student’s first encounter with an idea is small. To believe otherwise is nostalgia.

Since speech flows freely through social media, and has never been freer, the events of the protest were, in fact, all speech, including the violence. It was all performance. The speech of the Black Bloc was loud and clear: it said “F*** YOU FASCISTS.” It wasn’t directed at Yiannopoulos at all, obviously. It was a statement about everything else that’s going on. It turned subtext into text.

But it means nothing. It’s just politics of spectacle. The First Amendment is being invoked ignorantly and symbolically. Nobody is actually taking anybody else to court.

It’s a good question whether, how, and who can actually be taken to court over things that are being done in these crazy times.

by Sebastian Benthall at February 03, 2017 04:13 AM

February 01, 2017

Ph.D. student

Ohm and Post: Privacy as threats, privacy as dignity

I’m reading side by side two widely divergent law review articles about privacy.

One is Robert Post’s “The Social Foundations of Privacy: Community and Self in Common Law Tort” (1989).

The other is Paul Ohm’s “Sensitive Information” (2014).

They are very notably different. Post’s article diverges sharply from the intellectual milieu I’m used to. It starts with an exposition of Goffman’s view of the personal self as being constituted by ceremonies and rituals of human relationships. Privacy tort law is, in Post’s view, about repairing tears in the social fabric. The closest thing to this that I have ever encountered is Fingarette’s book on Confucianism.

Ohm’s article is much more recent and is in large part a reaction to the Snowden leaks. It’s an attempt to provide an account of privacy that can limit the problems associated with massive state (and corporate?) data collection. It attempts to provide a legally informed account of what information is sensitive, and then suggests that threat modeling strategies from computer security can be adapted to the privacy context. Privacy can be protected by identifying and mitigating privacy threats.

As I get deeper into the literature on Privacy by Design, and observe how privacy-related situations play out in the world and in my own life, I’m struck by the adaptability and indifference of the social world to shifting technological infrastructural conditions. A minority of scholars and journalists track major changes in it, but for the most part the social fabric adapts. Most people, probably necessarily, have no idea what the technological infrastructure is doing and don’t care to know. It can be coopted, or not, into social ritual.

If the swell of scholarship and other public activity on this topic was the result of surprising revelations or socially disruptive technological innovations, these same discomforts have also created an opportunity for the less technologically focused to reclaim spaces for purely social authority, based on all the classic ways that social power and significance play out.

by Sebastian Benthall at February 01, 2017 06:44 PM

January 31, 2017

Ph.D. student

gamers, collective intelligence, airport protests, democratic surrounds, and blue ooze

I was captivated and unnerved by Jordan Greenhall’s “Situational Assessment 2017: Trump Edition.” It is a kind of futurist writing I appreciate. I’m personally able to put aside the criticism that it sounds like he’s making the future of the country into a role-playing game because I’ve played a lot of Dungeons and Dragons and don’t pretend to not appreciate role-playing games as a flexible and effective cognitive frame. Also, it seems quite likely that gamers are an important political bloc in the United States right now.

As wretched as Gamer Gate was, the most wretched thing about it was how little light was shed on the Gamer demographic. We were led to believe that Gamers are mainly white, male, and lacking in progressive sophistication. The mainstream left wing critique of the Gamers of Gamer Gate was reductivist [see correction in footnote], focusing on the most visible and extreme actions and thereby alienating the probably much larger number of people who could be loosely identified as Gamers who didn’t fit the archetype constructed by their opposition.

This is a metaphor for all political opposition in this nonsense media environment. A little bit of effort and empathy goes a long way, but most people don’t care enough to bother.

Put yourself in the shoes of a white guy who spends a lot of time, you know, gaming. You’re probably underemployed and not very geographically mobile. Your primary source of entertainment is grand narrative driven virtual combat and conquest. There is the existential angst that comes with all your victories taking place in environments that don’t actually exist and the alienation that comes from being socialized into a mock military full of teenagers. Your actual lifestyle is fine, in an objective sense, but it is boring as hell. The smartest things you can find to read are written by coastal journalists or academics, but they are full of postmodern and multiculturalist sentiments that have nothing to do with your lived experience in, let’s say, the Rust Belt. So you start reading Alt Right materials because it’s a refreshing change. Now you’re strong for Trump.

This is what Max Weber would call an Ideal Type. I don’t have any data to back up this characterization of the Gamer. You could call it a hunch. And building on Greenhall’s futurist essay, these Gamers are the Red Insurgency. Being the Red Insurgency is very appealing to the Gamer, because it’s basically just like playing a video game except you play propaganda wars on the Internet and you’ve recently managed to take over the U.S. government.

The disturbing thing about Greenhall’s essay, for me, is his insistence that the Red Insurgency will win against the Deep State or “Blue Church” because of its superior adaptability and faster response loop. Essentially, Greenhall’s argument is that the collective intelligence architecture of the Red Insurgency is superior to the entrenched bureaucracy of the state and so in a kind of disruptive innovation the former will replace the latter.

This is far-fetched. Like many polemic arguments written across the political spectrum, it doesn’t take into account the horrific complexity of managing a broadly integrated society. (This is the same criticism I’ve had of Pasquale, who writes as if it’s time for a populist revolt against the Deep State of Google, etc.) Recent events surrounding Trump’s executive orders show where the video game stops and life begins. You can win an election on a platform of Gamer-baiting slogans, but if you try to write them into executive orders it turns out that they probably interact with existing laws. Barring the absolutely terrifying prospect of washing away the entire court system, it appears that the whole game being played between the Red Insurgency and the Blue Church does indeed have rules. And those rules are boring.

Meanwhile, there’s something else happening, which is the organic reformation of the coastal populist left, which never really went away and is after all more populous than Greenhall’s Red Insurgency. However unable they are to win seats in Congress, they are able to rally.

I went to SFO last weekend to see what the protest was like. It was totally different from playing a video game. There were lots of people there, you know, in person. It was interracial, it was multigenerational. There was a purposeful pluralism. A pluralism of everyone you’d expect, but a pluralism nonetheless.

Greenhall makes some observations about the fluidity and non-linearity of collective intelligence in groups that make use of the Internet for their main means of communication. While I believe his assessment of the properties of this kind of collective intelligence is true, it’s also an awkward articulation of what has been described elsewhere and in more depth. Castells’s theory of the Network Society (2000) was such a good account of what’s going on that the next generation of academics had to bury the theory so that they wouldn’t have to parrot it. One of its totally reasonable points (in The Power of Identity) is that in the Network Society there’s a politics of identity whereby state ideologies are challenged by social movements that are themselves a kind of actor in global politics. I forget what he has to say about liberalism. But he writes a lot about the Zapatistas, who were a left-wing Mexican revolutionary…collective intelligence.

There’s a sense in which non-violent protest advances in its tactics. The sociologist in me remembers Occupy with a kind of fondness because while it was probably ineffective at achieving its political aims, whatever they were, I was led to believe that that wasn’t actually the point of it. The point of Occupy was to keep the social technology of non-violent urban protest well-oiled and calibrated to changing media environments. There was a nation-wide general assembly of decentralized cells of protesters. There was the training of a generation of activists in the use of the people’s microphone. There was the mobilization of social media for the scandalization of police brutality. It was specifically an anarchist movement, the first of many political movements made against the U.S. establishment. It didn’t work.

At San Francisco Airport the protesters used a human microphone to announce that the airport was supporting the protest, allowing the demonstrators to block the gates. Airports, it turns out, are great places for protests. Tons of amenities. Also, the symbolism.

You may not believe it, but I do have a point. The point is about how I think the airport protests are like the binding together of neurons for a grassroots collective intelligence. It’s a grassroots collective intelligence that would be totally mundane in so many other decades, because it’s actually just liberalism.

But if we can believe Fred Turner’s argument in The Democratic Surround, liberalism didn’t just happen for no reason. Liberalism was an invention of American intellectuals in response to the rising threat of European Fascism. The story goes: Hitler mastered the use of mass media, and American intellectuals thought the technology itself was partly responsible for fascist politics. It allowed, perhaps for the first time, the direct witnessing of a charismatic crazy man by a population not accustomed to seeing such things. And they were enthralled.

President Roosevelt was already doing his Fireside Chats and there was a concern that this media strategy would turn the United States fascist as well. Saving the day, in Fred Turner’s telling, was an intellectual coalition of exiled Bauhaus artists as well as Gregory Bateson and Margaret Mead just back from an anthropological expedition in Bali. I think Adorno is in there somewhere. John Cage as well, though I find that part of the argument unconvincing.

Long story short, there’s a new kind of art installation that emerges in World War II called “The Democratic Surround” which is also a metaphor for Facebook. It’s an exhibition that features images showing the variety of people that there are in America, or the variety of consumer products available. It’s a celebration of variety. You walk through it and are invited to find your unique place within it. You see a picture of a family that reminds you of yours and you think, “Ah, I am part of something greater than myself, whose value is in its diversity.” This becomes your national identity. Then you go fight the fascists.

The Democratic Surround was a nationalist project. It had two major catches. The first is that it was a carefully managed experience. It was a curated art exhibit, after all. Later versions of it would be carefully instrumented to measure traffic through it and the psychological impact on its audience. This makes the Democratic Surround reminiscent of the Panopticon in a way that’s useful, since the whole point of The Democratic Surround is to try to put a more positive spin on the lush surveillance state Silicon Valley invented for us.

The second catch is that when it was being used as a national propaganda device, the picture of America being shown to citizens was largely premature. It was a picture of racial and gender equality and of social integration. But the Civil Rights movement, for example, hadn’t happened yet. So all this pluralism was aspirational. It was a promise of what America could become if it won the war against Fascism in Europe.

It’s more than seventy years later now and much of U.S. history since then has been making that pluralistic vision a reality. Of course, it’s only a reality in certain dense cities. But a lot of people live in those cities and so they are culturally dominant. To some extent the multiculturalism we have now is just what it is necessary to believe in order to survive, politically, in those diverse urban environments.

A lot of this very real diversity was on display at the SFO protest and I assume at other protests in other airports. It got me thinking about The Democratic Surround because it was (a) explicitly nationalistic–the signage was definitely about America, and (b) explicitly pluralistic. What made it significantly unlike The Democratic Surround was that (a) it was a lot of actual people, not some media or “communications” bullshit, and (b) it was managed very loosely. I mentioned the human microphone. There were people who were acting as organizers, but my sense was that these were organizers in the anarchist tradition. There were large stockpiles of food and water freely available for protesters. People with bullhorns invited people to come up and testify about the personal meaning of the event. It was a proper rally.

What I’ve been trying to argue throughout all this is that there is a new political identity emerging from this mess. Its collective intelligence architecture is that of a networked anarchist movement. Which is to say fast, messy, problematically inclusive, and fun. But its politics are actually quite traditionally American: it’s to stop Fascism. Or the specter of it. Whether or not there is a real threat of Fascism in America, the more there is the appearance of one, the more liberals are going to start using anarchist tactics.

If all goes well, this provides a counterbalance to what Greenhall calls the Red Insurgency. Let’s call it, for the sake of argument, Blue Ooze. Blue Ooze isn’t part of the Deep State; it’s a different intelligence structure. It’s conservative, in the sense of resisting radicalism or change. Its purpose is to cool off the whole political process by legitimizing the Deep State, which is mostly just fine. If successful, it will sustain bipartisan power and otherwise maintain the status quo.

Blue Ooze is always already coopted by global capital yadda yadda you’ve heard it all before.

Note: I stand corrected by a good friend and colleague. Part of the point of the original critiques that led to GamerGate was in fact the argument that gamers were not just white men with certain predictable tastes, but rather were a much more diverse group. I have fallen prey to the reductivism of the consequent journalism on the subject, i.e. the narrative that was being pushed afterwards by Gawker. My focus above has been on widening, ever so slightly, the conception of that Gamer. As a gamer myself who has never lived in a flyover state, I would have to say that I too am an exception to the ideal type presented above. If my tone is read accurately, the purpose of this blog post is to provide a countervailing view of activism as a way of playing the political game that is open to all.

by Sebastian Benthall at January 31, 2017 07:57 AM

January 30, 2017

MIMS 2012

Shifting from a Product-centric to a Service-centric Mindset

Over the past few months I’ve shifted from a product-centric mindset to a service-centric mindset. My focus used to be on building products that help people accomplish a task or goal. That meant I would try to understand the problem to solve, who it’s being solved for, and then design digital products to solve that problem.

But as I’ve grown as a designer, become a manager, and seen Optimizely move into the enterprise market, I’ve realized that a lot more goes into making a product successful than the product itself. Companies often offer additional services to make customers successful.

A service is a touchpoint or system provided by a company to fulfill a need. A touchpoint is how someone uses a service — a website, phone line, ticket kiosk, and so on.

Most digital products, for example, have additional online properties to help customers be successful, like a knowledge base. Companies can also provide non-digital services, such as a support line customers can call or email.

Even though a service may not have a visual interface, it can still be thoughtfully designed. To make good decisions about how these services work, you still need a solid understanding of your users and their goals. This is what product designers do when designing a product, with the only difference being the final deliverable is not a visual interface.

Shifting to a service mindset makes it obvious that new technologies that have invisible UIs, like Alexa and Operator, can be thought of as services and designed just like any other service. In its simplest form, design is the act of making thoughtful decisions. Having empathy and understanding a user’s goals, motivations, and context help designers make thoughtful decisions. These activities apply to services and invisible UIs just as much as creating visual interfaces.

On top of that, all of the products and services that a company offers its customers need to work in concert with each other. This means that it isn’t enough for each product and service to be well-designed on its own — they also need to be designed to seamlessly work together to make customers successful. Doing this also requires having a broad understanding of your customers.

When I had a product-centric mindset I was aware of the different touchpoints, but I hadn’t put much effort into designing them all as a cohesive, interrelated experience. Customers may use the knowledge base and email support while using the product, but that’s for the support team to manage. “I’m just going to make the product great because that’s all that customers need to be successful,” I used to think. I’ve since learned that isn’t true. It takes more than the product itself to make customers successful.

Learning about the discipline of service design has helped me connect all the different touchpoints customers use into one unified framework. Everything is a service — products included. And they can all be thoughtfully designed by using the core skills designers already have. By doing so, customers will have a better experience with your products and services, which will make them more successful, and that will ultimately make your company more successful.

If you’re interested in learning more about service design, these books and articles have taught me a lot:

by Jeff Zych at January 30, 2017 03:20 AM

January 27, 2017

Ph.D. alumna

The Information War Has Begun

Yesterday, Steve Bannon clearly articulated what many people have felt and known for quite some time when he told journalists, “You’re the opposition party. Not the Democratic Party… The media’s the opposition party.” This builds on earlier remarks by Trump, who said, “I have a running war with the media.”

Journalists have covered this with their “objective” voice as though it was another news story in the crazy first week of WTF moments. Many of those who value the media have looked at this with wide eyes, struggling to assess which of the many news stories they should be more horrified by. Far too few are getting the point:

The news media have become a pawn in a big chess game of an information war. 

News agencies, long trained to focus on reporting information and maintaining a conceptual model of standards, are ill-equipped to understand that they may have a role in this war, that their actions and decisions are shaping the way the war plays out.

When Kellyanne Conway argued that they were operating with “alternative facts,” the media mocked her. They tried to dismiss her comment that the media has a 14% approval rating by fact-correcting this to point out that this was only a Gallup poll concerning the media’s approval rating among Republicans. But they missed her greater point: there’s no cost to the administration in being unhelpful to the media because the people the Trump Administration cares about don’t trust the media anyhow.

CC-BY-NC-ND 2.0-licensed photo by Mark Deckers.

How many years did it take for the US military to learn that a war against tribal networks couldn’t be fought with traditional military strategies? How long will it take for the news media to wake up and recognize that they’re being played? And how long after that will it take for editors and publishers to start evolving their strategies?

As I wrote in “Hacking the Attention Economy,” manipulating the media for profit, ideology, and lulz has evolved over time. The strategies that hackers, hoaxers, and haters have taken have become more sophisticated. The campaigns have gotten more intense. And now many of the actors most set on undermining institutionalized information intermediaries are in the most powerful office in the land. They are waging war on the media and the media doesn’t know what to do other than to report on it.

We’ve built an information ecosystem where information can fly through social networks (both technical and personal). Folks keep looking to the architects of technical networks to solve the problem. I’m confident that these companies can do a lot to curb some of the groups who have capitalized on what’s happening to seek financial gain. But the battles over ideology and attention are going to be far trickier. What’s at stake isn’t “fake news.” What’s at stake is the increasing capacity of those committed to a form of isolationist and hate-driven tribalism that has been around for a very long time. They have evolved with the information landscape, becoming sophisticated in leveraging whatever tools are available to achieve power, status, and attention. And those seeking a progressive and inclusive agenda, those seeking to combat tribalism to form a more perfect union —  they haven’t kept up.

The information war has begun. Normative approaches to challenging the system will not work. What will it take for news media to wake up? What will it take for progressives to start developing skills to fight back?

by zephoria at January 27, 2017 04:54 PM

January 24, 2017

Ph.D. student

update: no longer think “hacker class consciousness” is important

I’m going through old papers and throwing them out. I came upon an early draft from my first year in graduate school titled “Hacker Class Consciousness”. It was the beginning of an argument that those who work on open source software needed to develop a kind of class consciousness recognizing that their work bears a special relationship to capitalist modes of production. Open source software is a form of capital (a means of production) that is not privately owned. Hence, it is actually quite disruptive to capitalism per se. À la early Marxist theory, a political identity or “class consciousness” of people working in this way was necessary to reform the government to make it more equitable, more environmentally friendly, less violent, or whatever your critique of capitalism (or neoliberalism, if you prefer) is.

I didn’t get very far past this basic economic logic, which I still think is correct. I no longer think that class consciousness is important though. And I don’t think there’s an inevitability to capitalism containing the seeds of its own revolution through the eventual triumph of open source production.

I think it’s a good practice to make oneself accountable when one changes one’s mind. There’s lots of evidence to say that when people publicly commit to some belief, they wind up sticking to it with more confidence than they ought to. Shame-related reasons, I suppose. A good alternative habit, I believe, is publicly admitting when you are wrong about something, with the reasons for the update.

So why did I change my mind on this? Well, one reason is that I took some shots at formally modelling the problem several years ago. While the model showed the robustness of open source software as a way of opening a market that had previously been dominated or locked in by a proprietary vendor or solution, it also showed that there is no profit motive driving open source production to be a first mover. So the natural pressures of the market make open source coexist alongside proprietary systems, providing a countervailing force to privatization but never dissolving it entirely.

Another reason I changed my mind was a more general shift away from Marxist to Bourdieusian modes of thinking, which I’ve talked about here. A key part of this change in perspective is that it sees many kinds of capital at work in society, including both economic and cultural forms, and populations are distributed across the resulting multidimensional spectrum of variation, not stratified into a one-dimensional class structure. In such a world, class consciousness is futile. This may explain the failure of the Marxist project in general, as there was never really the kind of global collective action of the proletariat that Marx predicted would end capitalism. There are always too many other kinds of population difference at work to allow for such a revolution. Race, for example.

It is good that a matured attitude has left me less eager to engage in a futile revolutionary project. There’s nothing like pursuing a doctorate for grinding that kind of idealism out of you. Now I can scintillate with cynicism, and would like to be much better at it. Which is to say, I’m beginning to regret ever turning away from the dismal science of economics, which now seems much more like the doctrine worth pursuing and improving.

One nice thing about economics is that it is quantitatively rigorous. This is not simply an intellectual gate-keeping statement designed to box out the innumerate. It’s rather a comment on how such a field has strictly more expressive power because of its capacity to represent a statistical distribution of variation. It’s not enough to say there’s black and white when there are shades of gray. And it’s not enough to say there are shades of gray when the particular variation in density of light across the field is what’s important.

A grayscale raster, from the OpenGeo Suite
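To make the point about expressive power concrete, here is a toy sketch in Python. It assumes 8-bit grayscale pixel values (0 to 255); the two pixel rows, the threshold of 128, and the four-level quantization are arbitrary illustrative choices, not anything from the original argument. Two different rows look identical in black and white, become distinguishable once shades of gray are allowed, and the raw values preserve still more detail.

```python
# Illustrative only: two hypothetical rows of 8-bit grayscale pixel values (0-255).
a = [10, 60, 200, 250]
b = [120, 127, 130, 140]


def binarize(pixels, threshold=128):
    """Black-and-white view: each pixel becomes 0 (dark) or 1 (light)."""
    return [1 if p >= threshold else 0 for p in pixels]


def shades(pixels, levels=4):
    """Shades-of-gray view: map 0-255 onto `levels` evenly sized bins."""
    step = 256 // levels
    return [p // step for p in pixels]


print(binarize(a), binarize(b))  # identical: the binary view cannot tell the rows apart
print(shades(a), shades(b))      # differ: [0, 0, 3, 3] vs [1, 1, 2, 2]
print(a == b)                    # False: the raw values carry more detail still
```

Even the four-level view loses information: 10 and 60 both land in bin 0, so only the full distribution of values captures the "particular variation in density of light across the field".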

It’s this kind of expressive power that gives computational social science much of its appeal. I forgot to even make this argument in my paper about the subject. That may be because this notion of the expressive power of different representational systems is part of what one learns in the course of one’s computer science education, and that argument was written primarily for people without a computer science education.

Which really brings the discussion back around to where I come down on the revolutionary economic potential of software development: really, it’s about educating people in the concepts and skills that allow them to make use of this incredible pool of openly available technical capital. That education is what gives people the “class consciousness” to act with it. Since late modern software development depends for its very existence on the great open wealth of collectivized logic already crystallized into free code, the “consciousness” is really just the habitus of the developer. I suppose I occasionally meet somebody who says they’ve been coding in .NET for their whole career, but they are rare and I think are not doing well in the greater information economy.

It is no coincidence that technical education and skills diffusion are, for Thomas Piketty, the way to counteract the inequality that results from disparate returns on wealth versus labor. This is a position one simply converges on if one studies it for long enough. Kindly, it stabilizes the role of the education system as one that is necessary for correcting other forms of societal destabilization and excess.

by Sebastian Benthall at January 24, 2017 02:14 AM

January 21, 2017

MIMS 2011

Human-bot relations at ICA 2017 in San Diego

News this week that a panel I contributed to on political bots has been accepted for the annual International Communication Association (ICA) conference in San Diego this May, with Amanda Clarke, Elizabeth Dubois, Jonas Kaiser and Cornelius Puschmann. Political bots are automated agents deployed on social media platforms like Twitter to perform a variety of functions, and they are having a significant impact on politics and public life. There is already some great work about the negative impact of bots that are used to “manipulate public opinion by megaphoning or repressing political content in various forms”, but we were interested in the types of bots these bots are often compared to: the so-called “good” bots that expose the actions of particular types of actors (usually governments) and thereby bring about greater transparency of government activity.

Elizabeth, Cornelius and I worked on a paper about WikiEdits bots for last year’s ICA pre-conference, “Algorithms, Automation, Politics” (“Keeping Ottawa Honest — One Tweet at a Time? Politicians, Journalists and their Twitter bots”), where we found that the impact of these bots isn’t as simple as bringing about greater transparency. The new work that we will present in May is a deeper investigation of the types of relationships that are catalysed by the existence and ongoing development of transparency bots on Twitter. I’ll be working on the relationship between bots and their creators in both Canada and South Africa, attempting to investigate the relationship between the bots and the transparency that they promise. Cornelius is looking at the relationship between journalists and bots, Elizabeth and Amanda are looking at the relationship between bots and political staff/government employees, and Jonas will be looking more closely at bots and users. The awesome Stuart Geiger, who has done some really great work on bots, has kindly agreed to be a respondent to the panel.

You can read more about the panel and each of the papers below.

Do people make good bots bad?

Political bots are not necessarily good or bad. We argue the impact of transparency bots (a particular kind of political bot) rests largely on the relationships bots have with their creators, journalists, government and political staff, and the general public. In this panel each of these relationships is highlighted using empirical evidence and a respondent guides wider discussion about how these relationships interact in the wider political and media system.

This panel challenges the notion that political bots are necessarily good or bad by highlighting relationships between political actors and transparency bots. Transparency bots are automated social media accounts which report behaviour of political players/institutions and are normally viewed as a positive force for democracy. In contrast, bot activity such as astroturfing and the creation of fake followers or friends on social media has been examined and critiqued as nefarious in academic and popular literature. We assert that the impact of transparency bots rests largely on the relationships bots have with their creators, journalists, government and political staff, and the general public. Each panelist highlights one of these relationships (noting related interactions with additional actors) in order to answer the question “How do human-bot relationships shape bots’ political impact?”

Through comparative analysis of the Canadian and South African Wikiedits bots, Ford shows that transparency is not a potential affordance of the technology but rather of the conditions in place between actors. Puschmann considers the ways bots are framed and used by journalists in a content analysis of news articles. Dubois and Clarke articulate the ways public servants and political staff respond to the presence of Wikiedits bots revealing that internal institutional policies mediate the relationships these actors can have with bots. Finally, Kaiser asks how users who are not political elite actors frame transparency bots making use of a quantitative and qualitative analysis of Reddit content.

Geiger (respondent) then poses questions which cut across the relationships and themes brought out by panelists. This promotes a holistic view of the bot within its actual communicative system. Cross-cutting questions illustrate that the impact of bots is seen not simply in dyadic relationships but also in the ways various actors interact with each other as well as the bots in question.

This panel is a needed opportunity to critically consider the political role and impact of transparency bots by considering the bot in context. Much current literature assumes political bots have significant agency; however, bots need to interact with other political actors in order to have an impact. A nuanced understanding of the different types of relationships that exist among political actors and bots is thus essential. The cohesive conversation presented by panelists allows for a comparison across the different kinds of bot-actor relationships, focusing in detail on particular types of actors and then zooming out to address the wider system inclusive of these relationships.

1. Bots and Their Creators
Heather Ford

Bots – particularly those with public functions such as government transparency – are often created and recreated collaboratively by communities of technologists who share a particular world view of democracy and of technology’s role in politics and social change. This paper will focus on the origins of bots in the motivations and practices of their creators focusing on a particular case of transparency bots. Wikipedia/Twitter bots are built to tweet every time an editor within a particular government IP range edits Wikipedia as a way of notifying others to check possible government attempts to manipulate facts on the platform. The outputs of Wikipedia/Twitter bots have been employed by journalists as sources in stories about governments manipulating information (Ford et al, 2016).
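The core check such a bot performs can be sketched in Python with the standard `ipaddress` module. This is a minimal sketch, not the code of any actual Wikiedits bot: it assumes the bot already receives each anonymous edit's page title and IP address from some feed of recent edits, and the institution names and address ranges below are hypothetical (RFC 5737 documentation blocks stand in for real government ranges).

```python
import ipaddress
from typing import Optional

# HYPOTHETICAL watch list: institution names and ranges are invented for
# illustration. The ranges are RFC 5737 documentation blocks, not real
# government address space.
WATCHED_RANGES = {
    "Example Ministry": [ipaddress.ip_network("192.0.2.0/24")],
    "Example Parliament": [ipaddress.ip_network("198.51.100.0/25")],
}


def match_institution(editor: str) -> Optional[str]:
    """Return the institution whose watched range contains the editor's IP, if any."""
    try:
        addr = ipaddress.ip_address(editor)
    except ValueError:
        # A registered username is not an IP address, so it cannot be matched.
        return None
    for name, networks in WATCHED_RANGES.items():
        # `in` is simply False when the address and network are different IP versions.
        if any(addr in net for net in networks):
            return name
    return None


def format_tweet(page_title: str, editor: str) -> Optional[str]:
    """Build the announcement text, or None if the edit is not from a watched range."""
    institution = match_institution(editor)
    if institution is None:
        return None
    return f"{page_title} Wikipedia article edited anonymously from {institution}"


# Assumed input: (page title, editor) pairs from a hypothetical edit feed.
for title, editor in [("Example article", "192.0.2.17"), ("Other article", "203.0.113.9")]:
    tweet = format_tweet(title, editor)
    if tweet:
        print(tweet)
```

Because only logged-out edits expose an IP address, a check like this says nothing about edits made from registered accounts.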

Investigating the relationship between bot creators and their bots in Canada and South Africa by following the bots and their networks using mixed methods, I ask: To what extent is transparency an affordance of the particular technology being employed? Or is transparency rather an affordance of the conditions in place between actors in the network? Building from theories of co-production (Jasanoff, 2004) and comparing the impact of Wikipedia/Twitter bots on the news media in Canada and South Africa, this paper begins to map out the relationships that seem to be required for bots to take on a particular function (such as government transparency). Findings indicate that bots can only become transparency bots through the enrolling of allies (Callon, 1986) and through particular local conditions that ensure success in achieving a particular outcome. This is a stark reminder of the connectedness of human-machine relations and the limitations on technologists to fully create the world they imagine when they build their bots.


2. Bots and Journalists
Cornelius Puschmann

Different social agents — human and non-human — compete for attention, spread information and contribute to political debates online. Journalism is impacted by digital automation in two distinct ways: through its potentially manipulative influence on reporting and thus public opinion (Woolley & Howard, 2016; Woolley, 2016), and by providing journalists with a set of new tools for providing insight, disseminating information, and connecting with audiences (Graefe, 2016; Lokot & Diakopoulos, 2015). This contribution focuses primarily on the first aspect, but also takes the second into account, because we argue that fears of automation in journalism may fuel reservations among journalists regarding the role of bots more generally.

To address the first aspect, we present the results of a quantitative content analysis of English-language mainstream media discourse on bots. Building on prior research on the reception of bots (Ford et al, 2016), we focus on the following aspects in particular:

– the context in which bots are discussed,

– the evaluation (“good” for furthering transparency, “bad” because they spread propaganda),

– the implications for public deliberation (if any).

Secondly, we discuss the usage of bots and automation for the news media, using a small set of examples from the context of automated journalism (Johri, Han & Mehta, 2016). Bots are increasingly used to automate particular aspects of journalism, such as the generation of news items and the dissemination of content. Building on these examples we point to the “myriad ways in which news bots are being employed for topical, niche, and local news, as well as for providing higher-order journalistic functions such as commentary, critique, or even accountability” (Lokot & Diakopoulos, 2015, p. 2).


3. Bots and Government/Political Staff
Elizabeth Dubois and Amanda Clarke

Wikiedits bots are thought to promote more transparent, accountable government because they expose the Wikipedia editing practices of public officials, especially important when those edits are part of partisan battles between political staff, or enable the spread of misinformation and propaganda by properly neutral public servants. However, far from bolstering democratic accountability, these bots may have a perverse effect on democratic governance. Early evidence suggests that the Canadian Wikiedits bot (@gccaedits) may be contributing to a chilling effect wherein public servants and political staff are editing Wikipedia less or editing in ways that are harder to track in order to avoid the scrutiny that these bots enable (Ford et al, 2016). The extent to which this chilling effect shapes public officials’ willingness to edit Wikipedia openly (or at all), and the role the bot plays in inducing this chilling effect, remain open questions ripe for investigation. Focusing on the bot tracking activity in the Government of Canada (@gccaedits), this paper reports on the findings of in-depth interviews with public and political officials responsible for Wikipedia edits as well as analysis of internal government documents related to the bot (retrieved through Access to Information requests).

We find that internal institutional policies, constraints of the Westminster system of democracy (which demands public servants remain anonymous, and that all communications be tightly managed in strict hierarchical chains of command), paired with primarily negative media reporting of the @gccaedits bot, have inhibited Wikipedia editing. This poses risks to the quality of democratic governance in Canada. First, many edits revealed by the bot are in fact useful contributions to knowledge, and reflect the elite and early insider insight of public officials. Second, at a larger level, these edits represent novel and significant disruptions to a public sector communications culture that has not kept pace with the networked models of information production and dissemination that characterize the digital age. In this sense, the administrative and journalistic response to the bot’s reporting sets back important efforts to bolster Open Government and digital era public service renewal. Detailing these costs, and analysing the role of the bot and human responses to it, this paper suggests how wikiedit bots shape digital era governance.

4. Bots and Users
Jonas Kaiser

Users interact online with bots on a daily basis. Bots tweet, upvote or comment; in short, they participate in many different communities and are involved in shaping users’ perceptions. Based on this experience, the users’ perspective on bots may differ significantly from that of journalists, bot creators or political actors. Yet it has so far been ignored in the literature. As such, we are missing an integral perspective on bots that may help us to understand how the societal discourse surrounding bots is structured. To analyze how and in which context users talk about transparency bots specifically, a content analysis and topic analysis of Reddit comments from 86 posts in 48 subreddits on the issue of Wikiedits bots will be conducted. This proposal’s research focuses on two major aspects: 1) how Reddit users frame transparency bots and 2) with what other topics they associate them.

Framing in this context is understood as “making sense of relevant events, suggesting what is at issue” (Gamson & Modigliani, 1989, p. 3). Even though some studies have shown, for example, how political actors frame bots (Ford, Dubois, & Puschmann, 2016), a closer look at the user’s side is missing. But this perspective is important, as non-elite users may have a different view from more elite political actors, one that can help us understand how they interpret bots. This overlooked perspective, then, could have meaningful implications for political actors or bot creators. At the same time it is important to understand the broader context of the user discourse on transparency bots to properly connect the identified frames with overarching topics. Hence an automated topic modeling approach (Blei, Ng & Jordan, 2003) is chosen to identify the underlying themes within the comments. By combining frame analysis with topic modeling, this project will highlight the way users talk about transparency bots and in which context they do so, and thus emphasize the role of the users within the broader public discourse on bots.


Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.

Callon, M. (1986). Some Elements of a Sociology of Translation: Domestication of the Scallops and the Fishermen of St Brieuc Bay. In J. Law (Ed.), Power, Action and Belief: A New Sociology of Knowledge. London: Routledge & Kegan Paul.

Ford, H., Dubois, E., & Puschmann, C. (2016). Automation, Algorithms, and Politics | Keeping Ottawa Honest—One Tweet at a Time? Politicians, Journalists, Wikipedians and Their Twitter Bots. International Journal of Communication, 10, 24.

Gamson, W. A., & Modigliani, A. (1989). Media Discourse and Public Opinion on Nuclear Power: A Constructionist Approach. American Journal of Sociology, 95(1), 1-37.

Graefe, A. (2016). Guide to automated journalism.

Jasanoff, S. (2004). States of Knowledge: The Co-Production of Science and the Social Order. London: Routledge Chapman & Hall.

Johri et al. (2016). Domain specific newsbots. Live automated reporting systems involving natural language communication. Paper presented at 2016 Computation + Journalism Symposium.

Lokot, T., & Diakopoulos, N. (2015). News bots: Automating news and information dissemination on Twitter. Digital Journalism. doi: 10.1080/21670811.2015.1081822

Woolley, S. C. (2016). Automating power: Social bot interference in global politics. First Monday. doi: 10.5210/fm.v21i4.6161

Woolley, S. C., & Howard, P. (2016). Bots unite to automate the presidential election. Retrieved Jun. 5, 2016, from

by Heather Ford at January 21, 2017 03:40 PM

Ph.D. student

consequences of scale

Here are some key things about an economy of control:

  • An economy of control is normally very stable. It’s punctuated equilibrium. But the mean size of disruptive events increases over time, because each of these events can cause a cascade through an increasingly complex system.
  • An economy of control has enormous inequalities of all kinds of scale. But there’s a kind of evenness to the inequality from an information theoretic perspective, because of a conservation of entropy principle.
  • An economy of control can be characterized adequately using third order cybernetics. It’s an unsolved research problem to determine whether third order cybernetics is reducible to second order cybernetics. There should totally be a big prize for the first person who figures this out. That prize is a very lucrative hedge fund.
  • An economy of control is, of course, characterized mainly by its titular irony: there is the minimum possible control necessary to maintain the system’s efficiency. It’s a totalizing economic model of freedom maximization.
  • Economics of control is to neoliberalism and computational social science what neoliberalism was to political liberalism and neoclassical economic theory.
  • The economy of control preserves privacy perfectly at equilibrium, barring externalities.
  • The economy of control internalizes all externalities in the long run.
  • In the economy of control, demand is anthropic.
  • In the economy of control, for any belief that needs to be shouted on television, there is a person who sincerely believes it who is willing to get paid to shout it. Journalism is replaced entirely by networks of trusted scholarship.
  • The economy of control is sociologically organized according to two diverging principles: the organizational evolutionary pressure familiar from structural functionalism, and entropy. It draws on Bataille’s theory of the general economy. But it borrows from Ulanowicz the possibility of life overcoming thermodynamics. So to speak.

Just brainstorming here.

by Sebastian Benthall at January 21, 2017 03:59 AM

January 19, 2017

Ph.D. student

what if computers don’t actually control anything important?

I’ve written a lot (here, informally) on the subject of computational control of society. I’m not the only one, of course. There has in the past few years been a growing fear that one day artificial intelligence might control everything. I’ve argued that this is akin to older fears that, under capitalism, instrumentality would run amok.

Recently, thinking a little more seriously about what’s implied by an economy of control, I’ve been coming around to a quite different conclusion. What if the general tendency of these algorithmic systems is not the enslavement of humanity but rather the opening up of freedom and opportunity? This is not a critical attitude and might be seen as a simple shilling for industrial powers, so let me pose the point slightly more controversially. What if the result of these systems is to provide so much freedom and opportunity that it undermines the structure that makes social action significant? The “control” of these systems could just be the result of our being exposed, at last, to our individual insignificance in the face of each other.

As a foil, I’ll refer again to Frank Pasquale’s The Black Box Society, which I’ve begun to read again at the prompting of Pasquale himself. It is a rare and wonderful thing for the author of a book you’ve written rude things about to write you and tell you you’ve misrepresented the work. So often I assume nobody’s actually reading what I write, making this a lonely vocation indeed. Now I know that at least somebody gives a damn.

In Chapter 3, Pasquale writes:

“The power to include, exclude, and rank [in search results] is the power to ensure which public impressions become permanent and which remain fleeting. That is why search services, social and not, are ‘must-have’ properties for advertisers as well as users. As such, they have made very deep inroads indeed into the sphere of cultural, economic, and political influence that was once dominated by broadcast networks, radio stations, and newspapers. But their dominance is so complete, and their technology so complex, that they have escaped pressures for transparency and accountability that kept traditional media answerable to the public.”

As a continuation of the “technics-out-of-control” meme, there’s an intuitive thrust to this argument. But looking at the literal meaning of the sentences, none of it is actually true!

Let’s look at some of the reasons why these claims are false:

  • There are multiple competing search engines, and switching costs are very low. There are Google and Bing and DuckDuckGo, but there are also more specialized search engines for particular kinds of things. Literally every branded shopping website has a search engine that includes only what it chooses to include. This market pressure for search drives search engines generally to provide people with the answers they are looking for.
  • While there is a certain amount of curation that goes into search results, the famous early ranking logic which made large scale search possible used mainly data created as part of the content itself (hyperlinks in the case of Google’s PageRank) or usage (engagement in the case of Facebook’s EdgeRank). To the extent that these algorithms have changed, much of it has been because they have had to cave to public pressure, in the form of market pressure. Many of these changes are based on dynamic socially created data as well (such as spam flagging). Far from being manipulated by a secret powerful force, search engine results are always a dynamic, social accomplishment that is a reflection of the public.
  • Alternative media forms, such as broadcast radio, print journalism, cable television, storefront advertising, and so on still exist and have an influence over people’s decisions. No single digital technology ensures anything! A new restaurant that opens up in a neighborhood is free to gain a local reputation in the old fashioned way. And then these same systems for ranking and search incentivize the discovery of these local gems by design. The information economy doesn’t waste opportunities like this!
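The point that early ranking relied on data “created as part of the content itself” can be made concrete with a textbook version of PageRank, in which a page’s score is fed by the scores of the pages that link to it. What follows is a simplified power-iteration sketch over a toy link graph. The damping factor of 0.85 is the value commonly used in the literature; the graph, the function name, and the choice to spread the score of pages without outlinks evenly across all pages are assumptions for illustration, not a description of any search engine’s production ranking.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Simplified PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to (hypothetical data).
    Returns a dict of scores that sum to 1.
    """
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Score held by pages with no outlinks ("dangling") is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Every page passes a share of its score to each page it links to.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In this toy graph, page `c`, which collects links from three other pages, ends up ranked highest, while `d`, which nobody links to, gets only the baseline score. The ranking comes entirely from the link structure that page authors themselves created.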

So what’s the problem? If algorithms aren’t controlling society, but rather are facilitating its self-awareness, maybe these kinds of polemics are just way off base.

by Sebastian Benthall at January 19, 2017 05:11 AM

January 17, 2017

Ph.D. student

economy of control

We call it a “crisis” when the predictions of our trusted elites are violated in one way or another. We expect, for good reason, things to more or less continue as they are. They’ve evolved to be this way, haven’t they? The older the institution, the more robust to change it must be.

I’ve gotten comfortable in my short life with the global institutions that appeared to be the apex of societal organization. Under these conditions, I found James Beniger‘s work to be particularly appealing, as it predicts the growth of information processing apparati (some combination of information worker and information technology) as formerly independent components of society integrate. I’m of the class of people that benefits from this kind of centralization of control, so I was happy to believe that this was an inevitable outcome according to physical law.

Now I’m not so sure.

I am not sure I’ve really changed my mind fundamentally. This extreme Beniger view is too much like Nick Bostrom’s superintelligence argument in form, and I’ve already thought hard about why that argument is not good. That reasoning stopped at the point of noting how superintelligence “takeoff” is limited by data collection. But I did not go to the next and probably more important step, which is the problem of aleatoric uncertainty in a world with multiple agents. We’re far more likely to get into a situation with multi-polar large intelligences that are themselves fraught with principal-agent problems, because that’s actually the status quo.

I’ve been prodded to revisit The Black Box Society, which I’ve dealt with inadequately. Its beefier chapters deal with a lot of the specific economic and regulatory recent history of the information economy of the United States, which is a good complement to Beniger and a good resource for the study of competing intelligences within a single economy, though I find this data a bit clouded by the polemical writing.

“Economy” is the key word here. Pure, Arendtian politics and technics have not blended easily, but what they’ve turned into is a self-regulatory system with structure and agency. More than that, the structure is for sale, and so is the agency. What is interesting about the information economy, and I guess I’m trying to coin a phrase here, is that it is an economy of control. The “good” being produced, sold, and bought, is control.

There’s a lot of interesting research about information goods, but I’ve never heard of a “control good”. Yet this is what we are talking about when we talk about software, data collection, managerial labor, and the conflicts and compromises they create.

I have a few intuitions about where this goes, but not as many as I’d like. I think this is because the economy of control is quite messy and hard to reason about.

by Sebastian Benthall at January 17, 2017 12:10 AM

January 13, 2017

Ph.D. student

habitus and citizenship

Just a quick thought… So in Bourdieu’s Science of Science and Reflexivity, he describes the habitus of the scientist. Being a scientist demands a certain adherence to the rules of the scientific game, certain training, etc. He winds up constructing a sociological explanation for the epistemic authority of science. The rules of the game are the conditions for objectivity.

When I was working on a now defunct dissertation, I was comparing this formulation of science with a formulation of democracy and the way it depends on publics. Habermasian publics, Fraserian publics, you get the idea. Within this theory, what was once a robust theory of collective rationality as the basis for democracy has deteriorated under what might be broadly construed as “postmodern” critiques of this rationality. One could argue that pluralistic multiculturalism, not collective reason, became the primary ideology for American democracy in the past eight years.

Pretty sure this backfired with e.g. the Alt-Right.

So what now? I propose that those interested in functioning democracy reconsider the habitus of citizenship and how it can be maintained through the education system and other civic institutions. It’s a bit old-school. But if the Alt-Right wanted a reversion to historical authoritarian forms of Western governance, we may be getting there. Suppose history moves in a spiral. It might be best to try to move forward, not back.

by Sebastian Benthall at January 13, 2017 12:29 AM

January 10, 2017

Ph.D. student

Loving Tetlock’s Superforecasting: The Art and Science of Prediction

I was a big fan of Philip Tetlock’s Expert Political Judgment (EPJ). I read it thoroughly; in fact a book review of it was my first academic publication. It was very influential on me.

EPJ is a book that is troubling to many political experts because it basically says that most so-called political expertise is bogus and that what isn’t bogus is fairly limited. It makes this argument with far more meticulous data collection and argumentation than I am able to do justice to here. I found it completely persuasive and inspiring. It wasn’t until I got to Berkeley that I met people who had vivid negative emotional reactions to this work. They seem to mainly have been political experts who do not like having their expertise assessed in terms of its predictive power.

Superforecasting: The Art and Science of Prediction (2016) is a much more accessible book that summarizes the main points from EPJ and then discusses the results of Tetlock’s Good Judgment Project, which was his answer to an IARPA challenge in forecasting political events.

Much of the book is an interesting history of the United States Intelligence Community (IC) and the way its attitudes towards political forecasting have evolved. In particular, the shock of the failure of the predictions around Weapons of Mass Destruction that led to the Iraq War was a direct cause of IARPA’s interest in forecasting and its funding of the Good Judgment Project, despite the possibility that the project’s results would be politically challenging. IARPA comes out looking like a very interesting and intellectually honest organization solving real problems for the people of the United States.

Reading this has been timely for me because: (a) I’m now doing what could be broadly construed as “cybersecurity” work, professionally, (b) my funding is coming from U.S. military and intelligence organizations, and (c) the relationship between U.S. intelligence organizations and cybersecurity has been in the news a lot lately in a very politicized way because of the DNC hacking aftermath.

Since so much of Tetlock’s work is really just about applying mathematical statistics to the psychological and sociological problem of developing teams of forecasters, I see the root of it as the same mathematical theory one would use for any scientific inference. Cybersecurity research, to the extent that it uses sound scientific principles (which it must, since it’s all about the interaction between society, scientifically designed technology, and risk), is grounded in these same principles. And at its best the U.S. intelligence community lives up to this logic in its public service.

The needs of the intelligence community with respect to cybersecurity can be summed up in one word: rationality. Tetlock’s work is a wonderful empirical study in rationality that’s a must-read for anybody interested in cybersecurity policy today.

by Sebastian Benthall at January 10, 2017 10:54 PM

Ph.D. alumna

Why America is Self-Segregating

The United States has always been a diverse but segregated country. This has shaped American politics profoundly. Yet, throughout history, Americans have had to grapple with divergent views and opinions, political ideologies, and experiences in order to function as a country. Many of the institutions that underpin American democracy force people in the United States to encounter difference. This does not inherently produce tolerance or result in healthy resolution. Hell, the history of the United States is fraught with countless examples of people enslaving and oppressing other people on the basis of difference. This isn’t about our past; this is about our present. And today’s battles over laws and culture are nothing new.

Ironically, in a world in which we have countless tools to connect, we are also watching fragmentation, polarization, and de-diversification happen en masse. The American public is self-segregating, and this is tearing at the social fabric of the country.

Many in the tech world imagined that the Internet would connect people in unprecedented ways, allow for divisions to be bridged and wounds to heal. It was the kumbaya dream. Today, those same dreamers find it quite unsettling to watch as the tools that were designed to bring people together are used by people to magnify divisions and undermine social solidarity. These tools were built in a bubble, and that bubble has burst.

Nowhere is this more acute than with Facebook. Naive as hell, Mark Zuckerberg dreamed he could build the tools that would connect people at unprecedented scale, both domestically and internationally. I actually feel bad for him as he clings to that hope while facing increasing attacks from people around the world about the role that Facebook is playing in magnifying social divisions. Although critics love to paint him as only motivated by money, he genuinely wants to make the world a better place and sees Facebook as a tool to connect people, not empower them to self-segregate.

The problem is not simply the “filter bubble,” Eli Pariser’s notion that personalization-driven algorithmic systems help silo people into segregated content streams. Facebook’s claim that content personalization plays a small role in shaping what people see compared to their own choices is accurate. And they have every right to be annoyed. I couldn’t imagine TimeWarner being blamed for who watches Duck Dynasty vs. Modern Family. And yet, what Facebook does do is mirror and magnify a trend that’s been unfolding in the United States for the last twenty years, a trend of self-segregation that is enabled by technology in all sorts of complicated ways.

The United States can only function as a healthy democracy if we find a healthy way to diversify our social connections, if we find a way to weave together a strong social fabric that bridges ties across difference.

Yet, we are moving in the opposite direction with serious consequences. To understand this, let’s talk about two contemporary trend lines and then think about the implications going forward.

Privatizing the Military

The voluntary US military is, in many ways, a social engineering project. The public understands the military as a service organization, dedicated to protecting the country’s interests. Yet, when recruits sign up, they are promised training and job opportunities. Individual motivations vary tremendously, but many are enticed by the opportunity to travel the world, participate in a cause with a purpose, and get the heck out of Dodge. Everyone expects basic training to be physically hard, but few recognize that some of the most grueling aspects of signing up have to do with the diversification project that is central to the formation of the American military.

When a soldier is in combat, she must trust her fellow soldiers with her life. And she must be willing to do what it takes to protect the rest of her unit. In order to make that possible, the military must wage war on prejudice. This is not an easy task. Plenty of generals fought hard against racial desegregation and to limit the role of women in combat. Yet, the US military was desegregated in 1948, six years before Brown v. Board forced desegregation of schools. And LGB individuals could openly serve in the military before the Supreme Court ruled that they could legally marry.

CC BY 2.0-licensed photo by The U.S. Army.

Morale is often raised as the main reason that soldiers should not be forced to entrust their lives to people who are different than them. Yet, time and again, this justification collapses under broader interests to grow the military. As a result, commanders are forced to find ways to build up morale across difference, to actively and intentionally seek to break down barriers to teamwork, and to find a way to gel a group of people whose demographics, values, politics, and ideologies are as varied as the country’s.

In the process, they build one of the most crucial social infrastructures of the country. They build the diverse social fabric that underpins democracy.

Tons of money was poured into defense after 9/11, but the number of people serving in the US military today is far lower than it was throughout the 1980s. Why? Starting in the 1990s and accelerating after 9/11, the US privatized huge chunks of the military. This means that private contractors and their employees play critical roles in everything from providing food services to equipment maintenance to military housing. The impact of this on the role of the military in society is significant. For example, this undermines recruits’ ability to get training to develop critical skills that will be essential for them in civilian life. Instead, while serving on active duty, they spend a much greater share of their time on the front lines and in high-risk battle, increasing the likelihood that they will be physically or psychologically harmed. The impact on skills development and job opportunities is tremendous, but so is the impact on the diversification of the social fabric.

Private vendors are not engaged in the same social engineering project as the military and, as a result, tend to hire and fire people based on their ability to work effectively as a team. Like many companies, they have little incentive to invest in helping diverse teams learn to work together as effectively as possible. Building diverse teams — especially ones in which members depend on each other for their survival — is extremely hard, time-consuming, and emotionally exhausting. As a result, private companies focus on “culture fit,” emphasize teams that get along, and look for people who already have the necessary skills, all of which helps reinforce existing segregation patterns.

The end result is that, in the last 20 years, we’ve watched one of our major structures for diversification collapse without anyone taking notice. And because of how it’s happened, it’s also connected to job opportunities and economic opportunity for many working- and middle-class individuals, seeding resentment and hatred.

A Self-Segregated College Life

If you ask a college admissions officer at an elite institution to describe how they build a class of incoming freshmen, you will quickly realize that the American college system is a diversification project. Unlike colleges in most parts of the world, the vast majority of freshmen at top-tier universities in the United States live on campus with roommates who are assigned to them. Colleges approach housing assignments as an opportunity to pair diverse strangers with one another to build social ties. This makes sense given how many friendships emerge out of freshman dorms. By pairing middle class kids with students from wealthier families, elite institutions help diversify the elites of the future.

This diversification project produces a tremendous amount of conflict. Although plenty of people adore their college roommates and relish the opportunity to get to know people from different walks of life as part of their college experience, there is an amazing amount of angst about dorm assignments and the troubles that brew once folks try to live together in close quarters. At many universities, residential life is often in the business of student therapy as students complain about their roommates and dormmates. Yet, just like in the military, learning how to negotiate conflict and diversity in close quarters can be tremendously effective in sewing the social fabric.

CC BY-NC-ND 2.0-licensed photo by Ilya Khurosvili.

In the spring of 2006, I was doing fieldwork with teenagers at a time when they had just received acceptances to college. I giggled at how many of them immediately wrote to the college in which they intended to enroll, begging for a campus email address so that they could join that school’s Facebook (before Facebook was broadly available). In the previous year, I had watched the previous class look up roommate assignments on MySpace so I was prepared for the fact that they’d use Facebook to do the same. What I wasn’t prepared for was how quickly they would all get on Facebook, map the incoming freshman class, and use this information to ask for a roommate switch. Before they even arrived on campus in August/September of 2006, they had self-segregated as much as possible.

A few years later, I watched another trend hit: cell phones. While these were touted as tools that allowed students to stay connected to parents (which prompted many faculty to complain about “helicopter parents” arriving on campus), they really ended up serving as a crutch to address homesickness, as incoming students focused on maintaining ties to high school friends rather than building new relationships.

Students go to elite universities to “get an education.” Few realize that the true quality product that elite colleges in the US have historically offered is social network diversification. Even when it comes to job acquisition, sociologists have long known that diverse social networks (“weak ties”) are what increase job prospects. By self-segregating on campus, students undermine their own potential while also helping fragment the diversity of the broader social fabric.

Diversity is Hard

Diversity is often touted as highly desirable. Indeed, in professional contexts, we know that more diverse teams often outperform homogeneous teams. Diversity also increases cognitive development, both intellectually and socially. And yet, actually encountering and working through diverse viewpoints, experiences, and perspectives is hard work. It’s uncomfortable. It’s emotionally exhausting. It can be downright frustrating.

Thus, given the opportunity, people typically revert to situations where they can be in homogeneous environments. They look for “safe spaces” and “culture fit.” And systems that are “personalized” are highly desirable. Most people aren’t looking to self-segregate, but they do it anyway. And, increasingly, the technologies and tools around us allow us to self-segregate with ease. Is your uncle annoying you with his political rants? Mute him. Tired of getting ads for irrelevant products? Reveal your preferences. Want your search engine to remember the things that matter to you? Let it capture data. Want to watch a TV show that appeals to your senses? Here are some recommendations.

Any company whose business model is based on advertising revenue and attention is incentivized to engage you by giving you what you want. And what you want in theory is different than what you want in practice.

Consider, for example, what Netflix encountered when it started its streaming offer. Users didn’t watch the movies that they had placed into their queue. Those movies were the movies they thought they wanted, movies that reflected their ideal self — 12 Years a Slave, for example. What they watched when they could stream whatever they were in the mood for at that moment was the equivalent of junk food — reruns of Friends, for example. (This completely undid Netflix’s recommendation infrastructure, which had been trained on people’s idealistic self-images.)

The divisions are not just happening through commercialism though. School choice has led people to self-segregate from childhood on up. The structures of American work life mean that fewer people work alongside others from different socioeconomic backgrounds. Our contemporary culture of retail and service labor means that there’s a huge cultural gap between workers and customers with little opportunity to truly get to know one another. Even many religious institutions are increasingly fragmented such that people have fewer interactions across diverse lines. (Just think about how there are now “family services” and “traditional services” which age-segregate.) In so many parts of public, civic, and professional life, we are self-segregating and the opportunities for doing so are increasing every day.

By and large, the American public wants to have strong connections across divisions. They see the value politically and socially. But they’re not going to work for it. And given the option, they’re going to renew their license remotely, try to get out of jury duty, and use available data to seek out housing and schools that are filled with people like them. This is the conundrum we now face.

Many pundits remarked that, during the 2016 election season, very few Americans were regularly exposed to people whose political ideology conflicted with their own. This is true. But it cannot be fixed by Facebook or news media. Exposing people to content that challenges their perspective doesn’t actually make them more empathetic to those values and perspectives. To the contrary, it polarizes them. What makes people willing to hear difference is knowing and trusting people whose worldview differs from their own. Exposure to content cannot make up for self-segregation.

If we want to develop a healthy democracy, we need a diverse and highly connected social fabric. This requires creating contexts in which the American public voluntarily struggles with the challenges of diversity to build bonds that will last a lifetime. We have been systematically undoing this, and the public has used new technological advances to make their lives easier by self-segregating. This has increased polarization, and we’re going to pay a heavy price for this going forward. Rather than focusing on what media enterprises can and should do, we need to focus instead on building new infrastructures for connection where people have a purpose for coming together across divisions. We need that social infrastructure just as much as we need bridges and roads.

This piece was originally published as part of a series on media, accountability, and the public sphere.

by zephoria at January 10, 2017 01:15 PM

January 09, 2017

MIMS 2018

Trump and the Strategy of Irrationality

I wrote this piece in November 2016 and sat on it for a while, unsure whether or not I wanted to publish it. Since then, the Washington Post and the Boston Globe have had great pieces making similar points to the one I made here: that Donald Trump’s unpredictability may, in certain situations, give him leverage in negotiations. The world has changed a lot in these two short months but many points I make here still stand. So please, enjoy.

Source: Wikimedia Commons

Donald Trump is not just the most controversial President-Elect in recent American history — he is also the most unpredictable. His lack of political experience, inconsistent views, and tendency towards outbursts leave even his most ardent supporters unsure of what a President Trump might do in a given situation. Yet counterintuitively, his unpredictability may help him in the international arena.

The reason is a basic tenet of game theory. In a conflict, a person’s bargaining power depends on their perceived willingness to go through with a threat, even at a cost to themselves. If an opponent sees a threatener as irrational, they will also see them as more willing to go through with a costly threat, either because they do not know or do not care about the consequences. Thus, the opponent is more likely to yield.
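The argument can be reduced to a stylized one-shot expected-cost comparison. The sketch below assumes a risk-neutral opponent who weighs the sure cost of yielding against the expected cost of resisting, where that expected cost depends on how likely the opponent believes the threat is to be carried out. The function names and payoff numbers are invented for illustration; this is a toy model, not Schelling’s own formalization.

```python
def expected_cost_of_resisting(p_execute: float, cost_if_executed: float) -> float:
    """Expected cost to the opponent of refusing, given the perceived probability
    that the threatener actually carries out the (costly) threat."""
    return p_execute * cost_if_executed


def opponent_yields(p_execute: float, cost_of_yielding: float, cost_if_executed: float) -> bool:
    """A risk-neutral opponent yields when the sure cost of conceding is lower
    than the expected cost of resisting."""
    return cost_of_yielding < expected_cost_of_resisting(p_execute, cost_if_executed)


def credibility_threshold(cost_of_yielding: float, cost_if_executed: float) -> float:
    """Perceived probability of execution above which the opponent prefers to yield."""
    return cost_of_yielding / cost_if_executed


if __name__ == "__main__":
    # Hypothetical payoffs: conceding costs the opponent 10, an executed threat costs 40.
    for label, p in [("seen as rational", 0.1), ("seen as unpredictable", 0.5)]:
        print(label, "-> yields" if opponent_yields(p, 10, 40) else "-> resists")
    print("threshold:", credibility_threshold(10, 40))
```

With these made-up numbers, the opponent yields only if it believes the threat will be carried out more than 25% of the time. The threatener’s own cost never enters the opponent’s calculation directly; it matters only through `p_execute`, which is exactly why appearing indifferent to one’s own losses raises bargaining power.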

This is where the irrationality of Trump shines.

For example, he may have an advantage over traditional politicians in renegotiating foreign trade deals because he is viewed as unstable enough to scrap them, even if it would hurt the American economy. A politician who has shown more nuanced views of America’s trade relations and economic interests would not have this same leverage.

Thomas Schelling. Source: Harvard Gazette

This strategy of irrationality is not new. It was popularized in 1960 by the Nobel Prize-winning economist Thomas Schelling in his book The Strategy of Conflict. During the Cold War, both American presidents and Soviet general secretaries used it. Even Voltaire said, “Be sure that your madness corresponds with the turn and temper of your age…and forget not to be excessively opinionated and obstinate.”

Of all the US presidents, Richard Nixon put the most faith in what he called the “madman strategy.” He tried to appear “mad” enough to use nuclear weapons in order to bring North Vietnam to the negotiation table. In a private conversation, Nixon told his Chief of Staff the following:

I want the North Vietnamese to believe I’ve reached the point where I might do anything to stop the war. We’ll just slip the word to them that “for God’s sake, you know Nixon is obsessed about Communism. We can’t restrain him when he’s angry — and he has his hand on the nuclear button.”

After four years, Nixon’s “madman strategy” had failed to end the war. He could only apply it intermittently. His “madness” in flying planes loaded with nuclear weapons over North Vietnam was tempered by his sanity in negotiations with the Soviet Union and China. Additionally, the repercussions of using nuclear weapons were so drastic that it was difficult to convince anyone he was willing to use them, especially after the Soviet Union achieved nuclear parity with the US.

President Richard Nixon. Source: Flickr

President Trump may have more success in applying the “madman strategy” because many people already see him as mad. Unlike Nixon, who tried to shift his perception from sane to insane, Trump has cultivated his unstable persona over almost a year and a half of campaigning and decades in the public eye. His perceived lack of knowledge regarding everything political may also cause opponents to see him as incapable of making rational decisions.

The strategy of irrationality rests on a number of assumptions. It assumes a somewhat rational opponent and a centralized decision-making authority, neither of which applies to America’s most virulent enemy, ISIS. It also assumes a channel through which threats can be communicated. Such a channel may be harder to find with countries that lack diplomatic relations with the US, like Iran and North Korea.

The utility of the strategy of irrationality is further complicated by the fact that most relationships the United States has with other countries are simultaneously oppositional and collaborative. For example, President Trump may consider France an opponent in environmental and NATO negotiations but an ally in trading. His perceived instability could give him leverage in negotiations but harm mutually beneficial relations with France.

The strategy also depends on whether President Trump is as unpredictable as candidate Trump. President-Elect Trump has already backed off from some of his more outlandish campaign trail promises. Global views of Trump are constantly shifting, especially as news comes out about his cabinet, and a method to his madness may become apparent as he makes more executive decisions.

The unpredictability of Donald Trump has brought about sleepless nights for many Americans. His perceived irrationality may damage allegiances within and without the country, but it may also give him leverage in future international conflicts. Donald Trump has always said he is a dealmaker and he might just be crazy enough to be right.

by Gabe Nicholas at January 09, 2017 04:50 PM

Ph.D. alumna

Did Media Literacy Backfire?

Anxious about the widespread consumption and spread of propaganda and fake news during this year’s election cycle, many progressives are calling for an increased commitment to media literacy programs. Others are clamoring for solutions that focus on expert fact-checking and labeling. Both of these approaches are likely to fail — not because they are bad ideas, but because they fail to take into consideration the cultural context of information consumption that we’ve created over the last thirty years. The problem on our hands is a lot bigger than most folks appreciate.

CC BY 2.0-licensed photo by CEA+ | Artist: Nam June Paik, “Electronic Superhighway. Continental US, Alaska & Hawaii” (1995).

What Are Your Sources?

I remember a casual conversation that I had with a teen girl in the Midwest while I was doing research. I knew her school took an abstinence-only approach to sex ed, but I don’t remember how the topic of pregnancy came up. What I do remember is her telling me that she and her friends talked a lot about pregnancy and “diseases” she could get through sex. As I probed further, she matter-of-factly explained a variety of “facts” she had heard that were completely inaccurate. You couldn’t get pregnant until you were 16. AIDS spreads through kissing. Etc. I asked her if she’d talked to her doctor about any of this, and she looked at me as though I had horns. She explained that she and her friends had done the research themselves, by which she meant that they’d identified websites online that “proved” their beliefs.

For years, that casual conversation has stuck with me as one of the reasons that we needed better Internet-based media literacy. As I detailed in my book It’s Complicated: The Social Lives of Networked Teens, too many students I met were being told that Wikipedia was untrustworthy and were, instead, being encouraged to do research. As a result, the message that many had taken home was to turn to Google and use whatever came up first. They heard that Google was trustworthy and Wikipedia was not.

Understanding what sources to trust is a basic tenet of media literacy education. When educators encourage students to focus on sourcing quality information, they encourage them to critically ask who is publishing the content. Is the venue a respected outlet? What biases might the author have? The underlying assumption in all of this is that there’s universal agreement that major news outlets like the New York Times, scientific journal publications, and experts with advanced degrees are all highly trustworthy.

Think about how this might play out in communities where the “liberal media” is viewed with disdain as an untrustworthy source of information…or in those where science is seen as contradicting the knowledge of religious people…or where degrees are viewed as a weapon of the elite to justify oppression of working people. Needless to say, not everyone agrees on what makes a trusted source.

Students are also encouraged to reflect on economic and political incentives that might bias reporting. Follow the money, they are told. Now watch what happens when they are given a list of names of major power players in the East Coast news media whose names are all clearly Jewish. Welcome to an opening for anti-Semitic ideology.

Empowered Individuals…with Guns

We’ve been telling young people that they are the smartest snowflakes in the world. From the self-esteem movement in the 1980s to the normative logic of contemporary parenting, young people are told that they are lovable and capable and that they should trust their gut to make wise decisions. This sets them up for another great American ideal: personal responsibility.

In the United States, we believe that worthy people lift themselves up by their bootstraps. This is our idea of freedom. What it means in practice is that every individual is supposed to understand finance so well that they can effectively manage their own retirement funds. And every individual is expected to understand their health risks well enough to make their own decisions about insurance. To take away the power of individuals to control their own destiny is viewed as anti-American by so much of this country. You are your own master.

Children are indoctrinated into this cultural logic early, even as their parents restrict their mobility and limit their access to social situations. But when it comes to information, they are taught that they are the sole proprietors of knowledge. All they have to do is “do the research” for themselves and they will know better than anyone what is real.

Combine this with a deep distrust of media sources. If the media is reporting on something, and you don’t trust the media, then it is your responsibility to question their authority, to doubt the information you are being given. If they expend tremendous effort bringing on “experts” to argue that something is false, there must be something there to investigate.

Now think about what this means for #Pizzagate. Across this country, major news outlets went to great effort to challenge conspiracy reports that linked John Podesta and Hillary Clinton to a child trafficking ring supposedly run out of a pizza shop in Washington, DC. Most people never heard the conspiracy stories, but their ears perked up when the mainstream press went nuts trying to debunk these stories. For many people who distrust “liberal” media and were already primed not to trust Clinton, the abundant reporting suggested that there was something to investigate.

Most people who showed up to the Comet Ping Pong pizzeria to see for their own eyes went undetected. But then a guy with a gun decided he “wanted to do some good” and “rescue the children.” He was the first to admit that “the intel wasn’t 100%,” but what he was doing was something that we’ve taught people to do — question the information they’re receiving and find out the truth for themselves.

Experience Over Expertise

Many marginalized groups are justifiably angry about the ways in which their stories have been dismissed by mainstream media for decades. This is most acutely felt in communities of color. And this isn’t just about the past. It took five days for major news outlets to cover Ferguson. It took months and a lot of celebrities for journalists to start discussing the Dakota Access Pipeline. But feeling marginalized from news media isn’t just about people of color. For many Americans who have watched their local newspaper disappear, major urban news reporting appears disconnected from reality. The issues and topics that they feel affect their lives are often ignored.

For decades, civil rights leaders have been arguing for the importance of respecting experience over expertise, highlighting the need to hear the voices of people of color who are so often ignored by experts. This message has taken hold more broadly, particularly among lower and middle class whites who feel as though they are ignored by the establishment. Whites also want their experiences to be recognized, and they too have been pushing for the need to understand and respect the experiences of “the common man.” They see “liberal” “urban” “coastal” news outlets as antithetical to their interests because they quote from experts, use cleaned-up pundits to debate issues, and turn everyday people (e.g., “red sweater guy”) into spectacles for mass enjoyment.

Consider what’s happening in medicine. Many people used to have a family doctor whom they knew for decades and trusted as individuals even more than as experts. Today, many people see doctors as arrogant and condescending, overly expensive and inattentive to their needs. Doctors lack the time to spend more than a few minutes with patients, and many people doubt that the treatment they’re getting is in their best interest. People feel duped into paying obscene costs for procedures that they don’t understand. Many economists can’t understand why so many people would be against the Affordable Care Act because they don’t recognize that this “socialized” medicine is perceived as experts over experience by people who don’t trust politicians who tell them what’s in their best interest any more than they trust doctors. And public trust in doctors is declining sharply.

Why should we be surprised that most people are getting medical information from their personal social network and the Internet? It’s a lot cheaper than seeing a doctor, and both friends and strangers on the Internet are willing to listen, empathize, and compare notes. Why trust experts when you have at your fingertips a crowd of knowledgeable people who may have had the same experience as you and can help you out?

Consider this dynamic in light of discussions around autism and vaccinations. First, an expert-produced journal article was published linking autism to vaccinations. This resonated with many parents’ experience. Then, other experts debunked the first report, challenged the motivations of the researcher, and engaged in a mainstream media campaign to “prove” that there was no link. What unfolded felt like a war on experience, and a network of parents coordinated to counter this new batch of experts who were widely seen as ignorant, moneyed, and condescending. The more that the media focused on waving away these networks of parents through scientific language, the more the public felt sympathetic to the arguments being made by anti-vaxxers.

Keep in mind that anti-vaxxers aren’t arguing that vaccinations definitively cause autism. They are arguing that we don’t know. They are arguing that experts are forcing children to be vaccinated against their will, which sounds like oppression. What they want is choice: the choice not to vaccinate. And they want information about the risks of vaccination, which they feel they are not being given. In essence, they are doing what we taught them to do: questioning information sources and raising doubts about the incentives of those who are pushing a single message. Doubt has become a tool.

Grappling with “Fake News”

Since the election, everyone has been obsessed with fake news, as experts blame “stupid” people for not understanding what is “real.” The solutionism around this has been condescending at best. More experts are needed to label fake content. More media literacy is needed to teach people how not to be duped. And if we just push Facebook to curb the spread of fake news, all will be solved.

I can’t help but laugh at the irony of folks screaming up and down about fake news and pointing to the story about how the Pope backs Trump. So many progressives know this story because it spread widely in liberal circles, where people cited it as appalling and fake. From what I can gather, it seems as though liberals were far more likely to spread this story than conservatives. What more could you want if you ran a fake news site whose goal was to make money by getting people to spread misinformation? Getting doubters to click on clickbait is far more profitable than getting believers, because doubters are far more likely to spread the content in an effort to debunk it. Win!

CC BY 2.0-licensed photo by Denis Dervisevic.

People believe in information that confirms their priors. In fact, if you present them with data that contradicts their beliefs, they will double down on their beliefs rather than integrate the new knowledge into their understanding. This is why first impressions matter. It’s also why asking Facebook to show content that contradicts people’s views will not only increase their hatred of Facebook but increase polarization among the network. And it’s precisely why so many liberals spread “fake news” stories in ways that reinforce their belief that Trump supporters are stupid and backwards.

Labeling the Pope story as fake wouldn’t have stopped people from believing that story if they were conditioned to believe it. Let’s not forget that the public may find Facebook valuable, but it doesn’t necessarily trust the company. So its “expertise” doesn’t mean squat to most people. Of course, it would be an interesting experiment to run; I do wonder how many liberals wouldn’t have forwarded it along if it had been clearly identified as fake. Would they have not felt the need to warn everyone in their network that conservatives were insane? Would they have not helped fuel a money-making fake news machine? Maybe.

But I think labeling would reinforce polarization, even if it would feel like something was done. Nonbelievers would use the label to reinforce their view that the information is fake (and minimize the spread, which is probably a good thing), while believers would simply ignore the label. But does that really get us to where we want to go?

Addressing so-called fake news is going to require a lot more than labeling. It’s going to require a cultural change about how we make sense of information, whom we trust, and how we understand our own role in grappling with information. Quick and easy solutions may make the controversy go away, but they won’t address the underlying problems.

What Is Truth?

As a huge proponent of media literacy for over a decade, I’m struggling with the ways in which I missed the mark. The reality is that my assumptions and beliefs do not align with most Americans. Because of my privilege as a scholar, I get to see how expert knowledge and information is produced and have a deep respect for the strengths and limitations of scientific inquiry. Surrounded by journalists and people working to distribute information, I get to see how incentives shape information production and dissemination and the fault lines of that process. I believe that information intermediaries are important, that honed expertise matters, and that no one can ever be fully informed. As a result, I have long believed that we have to outsource certain matters and to trust others to do right by us as individuals and society as a whole. This is what it means to live in a democracy, but, more importantly, it’s what it means to live in a society.

In the United States, we’re moving towards tribalism, and we’re undoing the social fabric of our country through polarization, distrust, and self-segregation. And whether we like it or not, our culture of doubt and critique, experience over expertise, and personal responsibility is pushing us further down this path.

Media literacy asks people to raise questions and be wary of information that they’re receiving. People are. Unfortunately, that’s exactly why we’re talking past one another.

The path forward is hazy. We need to enable people to hear different perspectives and make sense of a very complicated — and in many ways, overwhelming — information landscape. We cannot fall back on standard educational approaches because the societal context has shifted. We also cannot simply assume that information intermediaries can fix the problem for us, whether they be traditional news media or social media. We need to get creative and build the social infrastructure necessary for people to meaningfully and substantively engage across existing structural lines. This won’t be easy or quick, but if we want to address issues like propaganda, hate speech, fake news, and biased content, we need to focus on the underlying issues at play. No simple band-aid will work.

Special thanks to Amanda Lenhart, Claire Fontaine, Mary Madden, and Monica Bulger for their feedback!

This post was first published as part of a series on media, accountability, and the public sphere.

by zephoria at January 09, 2017 01:13 PM

January 08, 2017

MIMS 2012

Sol LeWitt - Wall Drawing

I recently saw Sol LeWitt’s Wall Drawing #273 at SFMOMA, and it stayed with me long after I left the museum. In particular, I like that it wasn’t drawn by the artist himself. Instead, he wrote instructions for draftspeople to draw the piece directly on the walls of the museum, embracing some amount of variability. From the museum’s description:

As his works are executed over and over again in different locations, they expand or contract according to the dimensions of the space in which they are displayed and respond to ambient light and the surfaces on which they are drawn. In some instances, as in this work, those involved in the installation make decisions impacting the final composition.

Sol LeWitt’s Wall Drawing #273

This embrace of variability reminds me of the web. People browse the web on different devices that have different sizes and capabilities. We can’t control how people will experience our websites. Since LeWitt left instructions for creating his pieces, I realized I could translate those instructions into code, and embrace the variability of the web in the process. The result is this CodePen.

See the Pen Sol LeWitt – Wall Drawing #273 by Jeff (@jlzych) on CodePen.

LeWitt left the following instructions:

A six-inch (15 cm) grid covering the walls. Lines from corners, sides, and center of the walls to random points on the grid.

1st wall: Red lines from the midpoints of four sides;

2nd wall: Blue lines from four corners;

3rd wall: Yellow lines from the center;

4th wall: Red lines from the midpoints of four sides, blue lines from four corners;

5th wall: Red lines from the midpoints of four sides, yellow lines from the center;

6th wall: Blue lines from four corners, yellow lines from the center;

7th wall: Red lines from the midpoints of four sides, blue lines from four corners, yellow lines from the center.

Each wall has an equal number of lines. (The number of lines and their length are determined by the draftsman.)

As indicated in the instructions, there are 7 separate walls with an equal number of lines, the number and length of which are determined by the draftsperson. To simulate the decisions the draftspeople make, I included controls to let people set how many lines should be drawn and toggle which walls to see. I made each color toggleable, as opposed to listing out walls 1-7, since each wall is just a different combination of the red, blue, and yellow lines.
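The CodePen isn’t reproduced here, so the following is only a hypothetical sketch of how LeWitt’s instructions might translate into code. It is not the author’s implementation. It is written in Python, and several details are assumptions: the wall size (120 by 90 inches), the function names, and the choice to cycle through every origin of the selected colors so that each wall gets exactly `num_lines` lines.

```python
import random
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass(frozen=True)
class Line:
    color: str
    start: Point
    end: Point


def grid_points(width: float, height: float, spacing: float) -> list[Point]:
    """Every intersection of a square grid laid over a width x height wall."""
    cols = int(width // spacing)
    rows = int(height // spacing)
    return [(i * spacing, j * spacing) for i in range(cols + 1) for j in range(rows + 1)]


def origins(color: str, width: float, height: float) -> list[Point]:
    """Starting points for each color, following LeWitt's instructions."""
    if color == "red":  # midpoints of the four sides
        return [(width / 2, 0.0), (width, height / 2), (width / 2, height), (0.0, height / 2)]
    if color == "blue":  # the four corners
        return [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height)]
    if color == "yellow":  # the center
        return [(width / 2, height / 2)]
    raise ValueError(f"unknown color: {color!r}")


def wall_drawing(colors, num_lines, width=120.0, height=90.0, spacing=6.0, rng=None):
    """Draw num_lines lines to random grid points, cycling through every origin of the chosen colors."""
    rng = rng or random.Random()
    points = grid_points(width, height, spacing)  # six-inch grid, per the instructions
    starts = [(color, o) for color in colors for o in origins(color, width, height)]
    lines = []
    for k in range(num_lines):
        color, start = starts[k % len(starts)]
        # Pick a random grid point, skipping the origin itself to avoid zero-length lines.
        end = rng.choice([p for p in points if p != start])
        lines.append(Line(color, start, end))
    return lines


# Wall 7: red, blue, and yellow lines on one wall.
wall7 = wall_drawing(("red", "blue", "yellow"), num_lines=36, rng=random.Random(273))
```

Cycling through origins is only one reading of “an equal number of lines.” On wall 7 it gives red and blue 16 lines each and yellow just 4, because yellow has a single starting point. Splitting the total evenly by color would be an equally defensible choice.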

The end result fits right in with how human draftspeople have turned these instructions into art. The most notable difference I see between a human and a program is the degree of randomness in the final drawing. From comparing the output of the program to versions done by people, the ones drawn by people seem less “random.” I get the sense that people have a tendency to more evenly distribute the lines to points throughout the grid, whereas the program can create clusters and lines that are really close to each other which a person would consider unappealing and not draw.
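One way a program might imitate the more even spacing the author attributes to human draftspeople is to reject candidate endpoints that land too close to ones already chosen. The sketch below is a hypothetical illustration of that idea, using simple rejection sampling (sometimes called “dart throwing”). `spread_endpoints`, the 12-inch minimum distance, and the wall size are all assumptions, and nothing here reflects how the CodePen actually works.

```python
import math
import random


def spread_endpoints(points, n, min_dist, rng, max_tries=10_000):
    """Pick up to n grid points, rejecting any closer than min_dist to one already picked.

    This is simple rejection sampling. It may return fewer than n points if
    the grid is too crowded to satisfy min_dist within max_tries attempts.
    """
    chosen = []
    for _ in range(max_tries):
        if len(chosen) == n:
            break
        p = rng.choice(points)
        if all(math.dist(p, q) >= min_dist for q in chosen):
            chosen.append(p)
    return chosen


# A hypothetical 120 x 90 inch wall with a six-inch grid.
grid = [(x * 6.0, y * 6.0) for x in range(21) for y in range(16)]
endpoints = spread_endpoints(grid, n=20, min_dist=12.0, rng=random.Random(273))
```

Raising `min_dist` spreads the endpoints out further. Setting it to 0 falls back to purely uniform picks, clusters and all.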

It makes me wonder how LeWitt would respond to programmatic versions of his art. Would he be okay with computers making art? Were his instructions specifically for people, or would he have embraced using machines to generate his work had the technology existed in his time? How “random” did he want people to make these drawings? Would he like that a program is more “random,” or did he expect and want people to make his wall drawings in a way that they would find visually pleasing? We’ll never know, but it was fun to interpret his work through the lens of today’s technology.

by Jeff Zych at January 08, 2017 11:10 PM

January 06, 2017

Ph.D. alumna

Hacking the Attention Economy

For most non-technical folks, “hacking” evokes the notion of using sophisticated technical skills to break through the security of a corporate or government system for illicit purposes. Of course, most folks who were engaged in cracking security systems weren’t necessarily in it for espionage and cruelty. In the 1990s, I grew up among teenage hackers who wanted to break into the computer systems of major institutions that were part of the security establishment, just to show that they could. The goal here was to feel a sense of power in a world where they felt pretty powerless. The rush was in being able to do something and feel smarter than the so-called powerful. It was fun and games. At least until they started getting arrested.

Hacking has always been about leveraging skills to push the boundaries of systems. Keep in mind that one early definition of a hacker (from the Jargon File) was “A person who enjoys learning the details of programming systems and how to stretch their capabilities, as opposed to most users who prefer to learn only the minimum necessary.” In another early definition (RFC 1392), a hacker is defined as “A person who delights in having an intimate understanding of the internal workings of a system, computers and computer networks in particular.” Both of these definitions highlight something important: violating the security of a technical system isn’t necessarily the primary objective.

Indeed, over the last 15 years, I’ve watched as countless hacker-minded folks have started leveraging a mix of technical and social engineering skills to reconfigure networks of power. Some are in it for the fun. Some see dollar signs. Some have a much more ideological agenda. But above all, what’s fascinating is how many people have learned to play the game. And in some worlds, those skills are coming home to roost in unexpected ways, especially as groups are seeking to mess with information intermediaries in an effort to hack the attention economy.

CC BY-NC 2.0-licensed photo by artgraff.

It all began with memes… (and porn…)

In 2003, a 15-year-old named Chris Poole started an image board site called 4chan, based on a Japanese trend. His goal was not political. Rather, like many of his male teenage peers, he simply wanted a place to share pornography and anime. But as his site’s popularity grew, he ran into a different problem: he couldn’t manage the traffic while storing all of the content. So he decided to delete older content as newer content came in. Users were frustrated that their favorite images disappeared, so they reposted them, often with slight modifications. This gave birth to a phenomenon now understood as “meme culture.” Lolcats are an example. These are images of cats captioned with a specific font and a consistent grammar for entertainment.

Those who produced meme-like images quickly realized that they could spread like wildfire thanks to new types of social media (as well as older tools like blogging). People began producing memes just for fun. But for a group of hacker-minded teenagers who were born a decade after I was, a new practice emerged. Rather than trying to hack the security infrastructure, they wanted to attack the emergent attention economy. They wanted to show that they could manipulate the media narrative, just to show that they could. This was happening at a moment when social media sites were skyrocketing, YouTube and blogs were challenging mainstream media, and pundits were pushing the idea that anyone could control the narrative by being their own media channel. Hell, “You” was TIME Magazine’s person of the year in 2006.

Campaigns with a humorous bent emerged within 4chan to “hack” mainstream media. For example, many inside 4chan felt that widespread anxieties about pedophilia were exaggerated and sensationalized. They decided to target Oprah Winfrey, who, they felt, was amplifying this fear-mongering. Trolling her online message board, they got her to talk on live TV about how “over 9,000 penises” were raping children. Amused by this success, they then created a broader campaign around a fake character known as Pedobear. In a different campaign, 4chan “b-tards” focused on gaming the TIME 100 list of “the world’s most influential people” by arranging it such that the first letter of each name on the list spelled out “Marblecake also the game,” an in-joke in this community. Many other campaigns emerged to troll major media and other cultural leaders. And frankly, it was hard not to laugh when everyone started scratching their heads about why Rick Astley’s 1987 song “Never Gonna Give You Up” suddenly became a phenomenon again.

By engaging in these campaigns, participants learned how to shape information within a networked ecosystem. They learned how to design information for it to spread across social media.

They also learned how to game social media, manipulate its algorithms, and mess with the incentive structure of both old and new media enterprises. They weren’t alone. I watched teenagers throw brand names and Buzzfeed links into their Facebook posts to increase the likelihood that their friends would see their posts in their News Feed. Consultants started working for companies to produce catchy content that would get traction and clicks. Justin Bieber fans ran campaign after campaign to keep Bieber-related topics in Twitter Trending Topics. And the activist group Invisible Children leveraged knowledge of how social media worked to architect the #Kony2012 campaign. All of this was seen as legitimate “social media marketing,” making it hard to detect where the boundaries were between those who were hacking for fun and those who were hacking for profit or other “serious” ends.

Running campaigns to shape what the public could see was nothing new, but social media created new pathways for people and organizations to get information out to wide audiences. Marketers discussed it as the future of marketing. Activists talked about it as the next frontier for activism. Political consultants talked about it as the future of political campaigns. And a new form of propaganda emerged.

The political side to the lulz

In her phenomenal account of Anonymous — “Hacker, Hoaxer, Whistleblower, Spy” — Gabriella Coleman describes the interplay between different networks of people playing similar hacker-esque games for different motivations. She describes the goofy nature of those “Anons” who created a campaign to expose Scientology, which many believed to be a farcical religion with too much power and political sway. But she also highlights how the issues became more political and serious as WikiLeaks emerged, law enforcement started going after hackers, and the Arab Spring began.

CC BY-SA 3.0-licensed photo by Essam Sharaf via Wikimedia Commons.

Anonymous was birthed out of 4chan, but because of the emergent ideological agendas of many Anons, the norms and tactics started shifting. Some folks were in it for fun and games, but the “lulz” started getting darker, and those seeking vigilante justice started using techniques like “doxing” to expose people who were seen as deserving of punishment. Targets changed over time, showcasing the divergent political agendas in play.

Perhaps the most notable turn involved “#GamerGate,” when issues of sexism in the gaming industry gave rise to a campaign of harassment targeted at a group of women. Doxing began being used to enable “swatting,” in which false reports called in by perpetrators would result in SWAT teams being sent to targets’ homes. The strategies and tactics that had been used to enable decentralized but coordinated campaigns were now being used by those seeking to use the tools of media and attention to do serious reputational, psychological, economic, and social harm to targets. Although 4chan had long been an “anything goes” environment (with notable exceptions), #GamerGate became taboo there for stepping over the lines.

As #GamerGate unfolded, men’s rights activists began using the situation to push forward a long-standing political agenda to counter feminist ideology, pushing for #GamerGate to be framed as a serious debate as opposed to being seen as a campaign of hate and harassment. In some ways, the resultant media campaign was quite successful: major conferences and journalistic enterprises felt the need to “hear both sides” as though there was a debate unfolding. Watching this, I couldn’t help but think of the work of Frank Luntz, a remarkably effective conservative political consultant known for reframing issues using politicized language.

As doxing and swatting have become more commonplace, another type of harassment has also started to emerge en masse: gaslighting. The term comes from the 1944 Ingrid Bergman film “Gaslight” (based on the 1938 play “Gas Light”). The film depicts psychological abuse in a domestic violence context, where the victim starts to doubt reality because of the various actions of the abuser. It is a form of psychological warfare that can work tremendously well in an information ecosystem, especially one where it’s possible to put up information in a distributed way to make it very unclear what is legitimate, what is fake, and what is propaganda. More importantly, as many autocratic regimes have learned, this tactic is fantastic for seeding the public’s doubt in institutions and information intermediaries.

The democratization of manipulation

In the early days of blogging, many of my fellow bloggers imagined that our practice could disrupt mainstream media. For many progressive activists, social media promised a tool to circumvent institutionalized censorship and enable a plethora of diverse voices to speak out and have their say. Civic-minded scholars were excited by “smart mobs” who leveraged new communications platforms to coordinate in a decentralized way to speak truth to power. Arab Spring. Occupy Wall Street. Black Lives Matter. These movements energized progressives, serving as “proof” that social technologies could make a new form of civil life possible.

I spent 15 years watching teenagers play games with powerful media outlets and attempt to achieve control over their own ecosystem. They messed with algorithms, coordinated information campaigns, and resisted attempts to curtail their speech. Like Chinese activists, they learned to hide their traces when it was to their advantage to do so. They encoded their ideas such that access to content didn’t mean access to meaning.

Of course, it wasn’t just progressive activists and teenagers who were learning how to mess with the media ecosystem that has emerged since social media unfolded. We’ve also seen the political establishment, law enforcement, marketers, and hate groups build capacity at manipulating the media landscape. Very little of what’s happening is truly illegal, but there’s no widespread agreement about which of these practices are socially and morally acceptable.

The techniques that are unfolding are hard to manage and combat. Some of them look like harassment, prompting people to self-censor out of fear. Others look like “fake news”, highlighting the messiness surrounding bias, misinformation, disinformation, and propaganda. There is hate speech that is explicit, but there’s also suggestive content that prompts people to frame the world in particular ways. Dog whistle politics have emerged in a new form of encoded content, where you have to be in the know to understand what’s happening. Companies that built tools to help people communicate are finding it hard to combat the ways their tools are being used by networks looking to skirt the edges of the law and content policies. Institutions and legal instruments designed to stop abuse are finding themselves ill-equipped to function in light of networked dynamics.

The Internet has long been used for gaslighting, and trolls have long targeted adversaries. What has shifted recently is the scale of the operation, the coordination of the attacks, and the strategic agenda of some of the players.

For many who are learning these techniques, it’s no longer simply about fun, nor is it even about the lulz. It has now become about acquiring power.

A new form of information manipulation is unfolding in front of our eyes. It is political. It is global. And it is populist in nature. The news media is being played like a fiddle, while decentralized networks of people are leveraging the ever-evolving networked tools around them to hack the attention economy.

I only wish I knew what happens next.

This post was first published as part of a series on media, accountability, and the public sphere.

This post was also translated into Portuguese.

by zephoria at January 06, 2017 09:12 AM