School of Information Blogs

July 05, 2019

Ph.D. student

CHI 2019 Annotated Bibliography (Part 1)

After the 2019 CHI conference (technically the ACM CHI Conference on Human Factors in Computing Systems) and blogging about our own paper on design approaches to privacy, I wanted to highlight other work that I found interesting or thought provoking in a sort of annotated bibliography. Listed in no particular order, though most relate to one or more themes that I’m interested in (privacy, design research, values in design practice, critical approaches, and speculative design).

(I’m still working through the stack of CHI papers that I downloaded to read, so hopefully this is part 1 of two or three posts).

  • James Pierce. 2019. Smart Home Security Cameras and Shifting Lines of Creepiness: A Design-Led Inquiry. Paper 45, 14 pages. https://doi.org/10.1145/3290605.3300275 — Pierce uses a design-led inquiry to illustrate and investigate three data practices of IoT products and services (digital leakage, hole-and-corner applications, and foot-in-the-door devices), providing some conceptual scaffolding for thinking about how privacy emerges differently in relation to varying technical (and social) configurations. Importantly, I like that Pierce is pushing design researchers to go beyond conceptualizing privacy as “creepiness”, through his exploration of three tropes of data practices.
  • Renee Noortman, Britta F. Schulte, Paul Marshall, Saskia Bakker, and Anna L. Cox. 2019. HawkEye – Deploying a Design Fiction Probe. Paper 422, 14 pages. https://doi.org/10.1145/3290605.3300652 — Building on Schulte’s concept of a “design probe,” Noortman et al. have participants interact with a (beautifully designed!) control panel in the home over three weeks, acting in the role of a caregiver in a design fiction about dementia care. The paper furthers the use of design fiction as a participatory and embodied experience, and as a data collection tool for research. The authors provide some useful reflections on the ways participants imagined and helped build out the fictional world in which they were participating.
  • Yaxing Yao, Justin Reed Basdeo, Smirity Kaushik, and Yang Wang. 2019. Defending My Castle: A Co-Design Study of Privacy Mechanisms for Smart Homes. Paper 198, 12 pages. https://doi.org/10.1145/3290605.3300428 — Yao et al. use co-design techniques to explore privacy concerns and potential privacy mechanisms with a range of participants (including diversity in age). Some interesting ideas arise from participants, such as creating an IoT “incognito mode,” as well as raising concerns about accessibility for these systems. Sometimes tensions arise, with participants wanting to trust IoT agents like Alexa as a ‘true friend’ who won’t spy on them, yet harboring some distrust of the companies creating these systems. I like that the authors point to a range of modalities for where we might place responsibility for IoT privacy – in the hardware, apps, platform policy, or operating modes. It’s a nice tie into questions others have asked about how responsibility for privacy is distributed, or what happens when we “handoff” responsibility for protecting values from one part of a sociotechnical system to another part.
  • Kristina Andersen and Ron Wakkary. 2019. The Magic Machine Workshops: Making Personal Design Knowledge. Paper 112, 13 pages. https://doi.org/10.1145/3290605.3300342 — Andersen and Wakkary outline a set of workshop techniques to help participants generate personal materials. I appreciate the commitments made in the paper, such as framing workshops as something that should benefit participants themselves, as well as researchers, in part by centering the workshop on the experience of individual participants. They propose a set of workshop elements; it’s nice to see these explicated here, as they help convey a lot of tacit knowledge about running workshops (the details of which are often abbreviated in most papers’ methods sections). I particularly like the “prompt” element to help provide a quick initial goal for participants to engage in while situating the workshop. While the example workshops used in the paper focus on making things out of materials, I’m curious if some of the outlined workshop elements might be useful in other types of workshop-like activities.
  • Laura Devendorf, Kristina Andersen, Daniela K. Rosner, Ron Wakkary, and James Pierce. 2019. From HCI to HCI-Amusement: Strategies for Engaging what New Technology Makes Old. Paper 35, 12 pages. https://doi.org/10.1145/3290605.3300265 – Devendorf et al. start by (somewhat provocatively) asking what it might be like to explore a “non-contribution” in HCI. The paper walks through a set of projects and works its way to a set of reflections about the norms of HCI research focusing on the “technological new,” asking what it might mean instead to take the present or the banal more seriously. The paper also starts to ask what types of epistemologies are seen as legitimate in HCI. The paper calls for “para-research” within HCI as a way to focus attention on what is left out or unseen through dominant HCI practices.
  • Colin M. Gray and Shruthi Sai Chivukula. 2019. Ethical Mediation in UX Practice. Paper 178, 11 pages. https://doi.org/10.1145/3290605.3300408 – Through a set of case study observations and interviews, Gray and Chivukula study how ethics is enacted in practice by UX designers. The paper provides a lot of good detail about the ways UX designers bring ethics to the forefront and some of the challenges they face. The authors contribute a set of relationships or mediators, connecting individual designers’ practices to organizational practices to applied ethics.
  • Sarah E. Fox, Kiley Sobel, and Daniela K. Rosner. 2019. Managerial Visions: Stories of Upgrading and Maintaining the Public Restroom with IoT. Paper 493, 15 pages. https://doi.org/10.1145/3290605.3300723 – Through interviews, participant observations, and analysis of media materials, Fox et al. investigate managerial labor in regulating access to public bathroom resources. They craft a story of regulation (in a broad sense), about how the bathroom’s management is entangled among local politics and on-the-ground moral beliefs, corporate values, imagined future efficiencies through technology, and strategic uses of interior and technological design. This entanglement allows for particular types of control, granting some people access to resources while making access harder for others.
  • William Gaver, Andy Boucher, Michail Vanis, Andy Sheen, Dean Brown, Liliana Ovalle, Naho Matsuda, Amina Abbas-Nazari, and Robert Phillips. 2019. My Naturewatch Camera: Disseminating Practice Research with a Cheap and Easy DIY Design. Paper 302, 13 pages. https://doi.org/10.1145/3290605.3300532 – Gaver et al. detail a DIY nature camera, shown in partnership with a BBC television series and built by over 1000 people. Interestingly, while similar tools could be used for citizen science efforts, the authors are clear that they are instead trying to create a type of public engagement with research that focuses on creating more intimate types of encounters, and engaging people with less technical expertise in making. The cameras help create intimate “encounters” with local wildlife (plus the paper includes some cute animal photos!).
  • Sandjar Kozubaev, Fernando Rochaix, Carl DiSalvo, and Christopher A. Le Dantec. 2019. Spaces and Traces: Implications of Smart Technology in Public Housing. Paper 439, 13 pages. https://doi.org/10.1145/3290605.3300669 — Kozubaev et al.’s work adds to a growing body of work questioning and reframing what the “home” means in relation to smart home technology. The authors conduct design workshops with residents (and some managers) in US public housing, providing insight into housing situations where (1) the “home” is not a single-family middle class grouping, and (2) the potential end users of smart home technologies may not have control or consent over the technologies used, and are already subject to various forms of state surveillance.
  • Shruthi Sai Chivukula, Chris Watkins, Lucca McKay, and Colin M. Gray. 2019. “Nothing Comes Before Profit”: Asshole Design In the Wild. Paper LBW1314, 6 pages. https://doi.org/10.1145/3290607.3312863 — This late-breaking work by Chivukula et al. investigates the /r/assholedesign subreddit to explore the concept of “asshole design,” particularly in comparison to the concept of “dark patterns.” They find that asshole design uses some dark pattern strategies, but that dark patterns tend to trick users into doing certain things, while asshole design often restricts uses of products and more often involves non-digital artifacts. I think there may be an interesting future regulatory discussion about asshole design (and dark patterns). On one hand, one might consider whether dark pattern or asshole design practices might fit under the FTC’s definition of “unfair and deceptive practices” for possible enforcement action against companies. On the other, as some legislators introduce bills to ban the use of dark patterns, it becomes very important to think carefully about how dark patterns are defined, and what might get included and excluded in those definitions; the way that this work suggests a set of practices related to, but distinct from, dark patterns could help inform future policy discussions.

by Richmond at July 05, 2019 04:35 AM

June 29, 2019

Ph.D. student

Life update: new AI job

I started working at a new job this month. It is at an Artificial Intelligence startup. I go to an office, use GitHub and Slack, and write software, manipulate data, and manage cloud computing instances for a living. Since I am now a relatively senior employee, I’m also involved in meetings of a managerial nature. There are lots of questions about how we organize ourselves and how we interact with other companies that I get to weigh in on.

This is a change from being primarily a postdoctoral researcher or graduate student. That change is apparent even though, during my time as the latter, I was doing similar industrial work on a part-time basis. Now, at the startup, the purpose of my work is more clearly oriented towards our company’s success.

There is something very natural about this environment for me. It is normal. I am struck by this normality because I have for years been interacting with academics who claim to be studying the very thing that I’m now doing.

I have written a fair bit here about “AI Ethics”. Much of this has been written with frustration at the way the topic is “studied”. In retrospect, a great deal of “AI Ethics” literature is about how people (the authors) don’t like the direction “the conversation” is going. My somewhat glib attitude towards it is that the problem is that most people talking about “AI Ethics” don’t know what they are talking about, and don’t feel like they have to know what they are talking about to have a good point of view on the subject. “AI Ethics” is often an expression of the point of view that while those that are “doing” AI are being somehow inscrutable and maybe dangerous, they should be tamed into accountability towards those who are not doing it, and therefore don’t really know about it. In other words, AI Ethics, as a field, is a way of articulating the interest of one class of people with one relationship to capital to another class of people with a different relationship to capital.

Perhaps I am getting ahead of myself. Artificial Intelligence is capital. I mean that in an economic sense. The very conceit that it is possible to join an “AI Startup”, whose purpose is to build an AI and thereby increase the productivity of its workers and its value to shareholders, makes the conclusion–“AI is capital”–a tautological one. Somehow, this insight rarely makes it into the “AI Ethics” literature.

I have not “left academia” entirely. I have some academic projects that I’m working on. One of these, in collaboration with Bruce Haynes, is a Bourdieusian take on Contextual Integrity. I’m glad to be able to do this kind of work.

However, one source of struggle for me in maintaining an academic voice in my new role, aside from the primary and daunting one of time management, is that many of the insights I would bring to bear on the discussion are drawn from experience. The irony of a training in qualitative and “ethnographic” research into use of technology, with all of its questions of how to provide an emic account based on the testimony of informants, is that I am now acutely aware of how my ability to communicate is limited, transforming me from a “subject” of observation into, in some sense, an “object”.

I enjoy and respect my new job and role. I appreciate that, being a real company trying to accomplish something and not a straw man used to drive a scholarly conversation, “AI” means in our context a wide array of techniques–NLP, convex optimization, simulation, to name a few–smartly deployed in order to best complement the human labor that’s driving things forward. We are not just slapping a linear regression on a problem and calling it “AI”.

I also appreciate, having done work on privacy for a few years, that we are not handling personal data. We are using AI technologies to solve problems that aren’t about individuals. A whole host of “AI Ethics” issues which have grown to, in some corners, change the very meaning of “AI” into something inherently nefarious, are irrelevant to the business I’m now a part of.

Those are the “Pros”. If there were any “Cons”, I wouldn’t be able to tell you about them. I am now contractually obliged not to. I expect this will cut down on my “critical” writing some, which to be honest I don’t miss. That this is part of my contract is, I believe, totally normal, though I’ve often worked in abnormal environments without this obligation.

Joining a startup has made me think hard about what it means to be part of a private organization, as opposed to a public one. Ironically, this public/private institutional divide rarely makes its way into academic conversations about personal privacy and the public sphere. That’s because, I’ll wager, academic conversations themselves are always in a sense public. The question motivating that discourse is “How do we, as a public, deal with privacy?”.

Working at a private organization, the institutional analogue of privacy is paramount. Our company’s DNA is its intellectual property. Our company’s face is its reputation. The spectrum of individual human interests and the complexity of their ordering has its analogs in the domain of larger sociotechnical organisms: corporations and the like.

Paradoxically, there is no way to capture these organizational dynamics through “thick description”. It is also difficult to capture them through scientific modes of visualization. Indeed, one economic reason to form an AI startup is to build computational tools for understanding the nature of private ordering among institutions. These tools allow for comprehension of a phenomenon that cannot be easily reduced to the modalities of sight or speech.

I’m very pleased to be working in this new way. It is in many ways a more honest line of work than academia has been for me. I am allowed now to use my full existence as a knowing subject: to treat technology as an instrument for understanding, to communicate not just in writing but through action. It is also quieter work.

by Sebastian Benthall at June 29, 2019 02:42 PM

June 01, 2019

MIMS 2012

Interface Lovers Interview

Last week I was featured on Interface Lovers, a site that “put[s] the spotlight on designers who are creating the future and touching the lives of many.” Read my response to their first question about what led me into design.


What led you into design?

In some ways, I feel like I’ve always been designing, and in other ways, I feel like I stumbled into it without realizing it. I’ve been into art and drawing since I could hold a pencil, taking art classes and doodling throughout my childhood. Then in high school, I signed up for a web design class. The summer before the class even started I was so excited that I bought a book on web development — “Learn HTML in 24 hours” — and taught myself how to build web pages. By the time the school year started, I had already put a website online. Being able to create something that anyone, anywhere in the world could immediately see was completely intoxicating to me.

From there, I went down a rabbit hole of learning Photoshop, Illustrator, 3D modeling, Flash, and any creative technologies even vaguely related to web design. That led me to get a degree in Graphic Communication at Cal Poly, San Luis Obispo, with a concentration in new media. Back then (early 2000s), there weren’t many web design programs, and the ones that existed were shoe-horned into graphic design and art programs. Cal Poly’s graphic communication program was the most technical of the bunch.

As part of my degree at Cal Poly, I took a computer science class and learned C and Java. I found programming to be super fun, too, and went deeper down the stack into backend technologies and database development. Basically, anything tangentially related to web development interested me, so I took every class I could.

After college, I went down the programming path and got a job as a data warehouse developer. I went technical because the analytical nature of it meant you know if your work is good — it either works, or it doesn’t. I found design to be very subjective and didn’t feel confident that my web design work was “good” (however that might be measured).

I joined a small company, so I was doing database design, ETLs, backend programming, frontend programming, and UI design. Over time I discovered that updating the interface, even minor updates, elicited strong positive reactions and gratitude from customers, whereas re-factoring a database to cut query times in half rarely did. I realized I wanted to work closer to the customer.

I started spending more time designing user interfaces and studying usability testing. I discovered it married the analytical, scientific part of my brain (which drew me to programming in the first place) to the subjective, intuitive part. This was the tool I needed to “prove” my designs were “right” (which I now know isn’t exactly true, but it felt this way back then).

This made me want to formally study design, so I got my master’s degree at UC Berkeley’s School of Information. The program is the study of technology, broadly speaking — how technology impacts society, how it changes people and their lives, and how to build technology with the needs of people at the center of it. The program was great. It only had a few required classes, then you could take basically whatever you wanted. So I took classes that sounded the most fun and interesting — design, programming, psychology, research, product development, business, and more. I learned a ton about product development and user-centered design while I was there.

One of my favorite classes was behavioral economics for the web, in which we explored how to apply behavioral economics principles to web sites and use A/B testing to measure their impact. That led me to join Optimizely after grad school, which at the time (2012) was just a simple A/B testing product for the web. I started out doing UI engineering, then switched into product design as the company grew. When I officially became a product designer I felt like I fell into it by accident. It was a result of what the company needed as it grew, not my specific career goal. But when I looked back over what led me there I realized I had always been designing in one way or another.

The company was growing fast, so I was presented the opportunity to move into management. I was resistant at first, but when I realized I could have a bigger impact in that position, I jumped on it. Eventually, my boss left, and I became the Head of Design, leading a team of product designers, user researchers, and UI engineers.

After 5 and a half years at Optimizely, I was ready for a break and new challenges, so I left and took some time off. I realized I wanted to be hands-on again and ended up joining Casetext as a product designer. They’re building legal research tools for lawyers, which pushed me to be a better designer because I was designing for people with expertise I don’t have and can’t acquire.

After a few months, it wasn’t the right fit, so now I’m at Gladly managing their product design team. It feels great to be in management again, working cross-functionally to deliver great experiences to our customers, while growing and nurturing the talents of my team.

Read the full interview on Interface Lovers.

by Jeff Zych at June 01, 2019 11:03 PM

May 27, 2019

MIMS 2014

My Take: WordPress vs. Shopify

WordPress (plus WooCommerce) and Shopify are two of the most popular out-of-the-box e-commerce website solutions out there. Both platforms let you sell products on the internet without needing to design your own website. But no matter which route you go, good analytics will always be necessary to spot trouble spots in user traffic, and to identify areas for potential growth. With their widespread use, I wanted to re-do my old analytics implementations (this post and this post) in both WordPress and Shopify and give my take on the two platforms.

Set-up/Installation

Winner: Shopify

In terms of getting off the ground with an e-commerce store, Shopify is the more straightforward of the two. It is specifically designed for e-commerce, whereas WordPress is a more wide-ranging platform that has been adapted for e-commerce through plug-ins, the most popular among them being WooCommerce. To get set up with WordPress + WooCommerce, you have to find somewhere to host your site, install WordPress on the server, and configure all the additional necessary plug-ins. Since I was just playing around, I ran everything through MAMP on localhost. Having set up my own server quite a few times at this point, I didn’t find this too difficult, but there’s no doubt that Shopify makes the process simpler by hosting the site for you.

Integrating with Google Analytics Suite

Winner: Close, but WordPress

Both WordPress and Shopify offer pretty easy integration of the Google Analytics Suite into your e-commerce site. The awesome plug-in, GTM4WP, is great for getting all three of the products I wanted (Tag Manager, Analytics, and Optimize) up and running on WordPress without having to touch any code. Shopify lets you add GA through a configuration on their preference panel. For GTM, you have to directly edit your site’s theme.liquid file. But even once you do this, GTM won’t fire on your checkout page unless you’re a Shopify Plus customer who is allowed to edit your checkout.liquid file as well.

Enter the workaround found by the very knowledgeable Julian Jeunemann (founder of the fantastic resource Measureschool) for getting GTM to fire properly on all Shopify pages (including the checkout page). The workaround involves adding the GTM container script as custom javascript to run as part of Shopify’s existing GA integration. The solution seems to work; the GTM tag fires on the checkout page (though you need to remove it from the head tag in the theme.liquid file so that GTM doesn’t fire twice on all pages). Meanwhile, Google Optimize can be implemented through GTM—though I found that the Optimize tag mysteriously wouldn’t fire on the checkout page. So using the workaround is fine as long as an A/B test doesn’t depend on the checkout page.

Given the somewhat hack-y workaround required to get the Google Analytics Suite going in Shopify, I’m gonna have to go with WordPress for more seamless Google integration with GTM4WP.

Flexibility/Customization

Winner: WordPress

I wanted to try a different A/B test this time round in place of the red button/green button experiment from my prior post. This time I imagined an A/B test where an e-commerce site would test the presence of a coupon code on its checkout page to see whether it would boost its revenue. Like before, I used selenium to simulate traffic to my site and make sure everything was being tracked properly in GA.

To make things a bit more challenging, I wanted to generate a random coupon code for each user rather than just use the same code for everyone. In WordPress, coupon codes are stored in a backend database; in order for a code to be properly applied, it must be written to the database when it’s generated. Luckily, I could use a plug-in, PHP Code Widget, to 1) randomly generate the coupon code when the checkout page loads, 2) write it to the database, and 3) display it to the user.
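To make those three steps concrete, here is a minimal sketch of what such a widget could look like. It assumes WooCommerce 3.x’s WC_Coupon CRUD API and WordPress’s built-in wp_generate_password() helper; the function name, discount type, and amount are illustrative rather than the actual code used in the experiment.

<?php
// Hypothetical checkout widget (e.g., dropped into a PHP Code Widget).
// A sketch only -- assumes WooCommerce 3.x's WC_Coupon CRUD API is available.
function my_generate_visitor_coupon() {
    // 1) Randomly generate an 8-character alphanumeric code.
    $coupon_code = strtoupper( wp_generate_password( 8, false ) );

    // 2) Write it to the database as a real WooCommerce coupon.
    $coupon = new WC_Coupon();
    $coupon->set_code( $coupon_code );
    $coupon->set_discount_type( 'percent' ); // illustrative: 10% off, single use
    $coupon->set_amount( 10 );
    $coupon->set_usage_limit( 1 );
    $coupon->save();

    return $coupon_code;
}

// 3) Display the code to the user on the checkout page.
if ( function_exists( 'WC' ) && is_checkout() ) {
    $visitor_code = my_generate_visitor_coupon();
    echo '<p>Your coupon code: <strong>' . esc_html( $visitor_code ) . '</strong></p>';
}
?>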

An extra step is required so that when a user enters the coupon code, the event can be properly tracked by Google Analytics and propagate forward to Google Optimize for the A/B test. Basically, each user’s coupon code must be passed to GTM so that GTM can verify the code was entered correctly. I pass the coupon code to GTM by pushing to the dataLayer via javascript. Fortunately, with the PHP code widget, I can execute javascript in PHP with a simple echo call:

echo "dataLayer.push({'coupon_code': '$coupon_code' });";

When WordPress renders the page, this script gets written into the HTML page source and executed by the browser. Now GTM knows the user’s unique coupon code. From there, a trigger is set in GTM that fires when a user submits the coupon code. A custom javascript variable configured in GTM verifies whether the code was entered correctly, and the resulting value (true or false) is passed on to GA as a Goal that is then used as the target metric in a Google Optimize A/B test.
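For completeness, below is a hedged sketch of how the widget might print both the visible code and that dataLayer push, assuming $coupon_code comes from the generation step sketched earlier; wp_json_encode() is used so that the value is quoted safely inside the JavaScript.

<?php
// Sketch only: $coupon_code is assumed to come from the generation step above.
echo '<p>Use coupon code <strong>' . esc_html( $coupon_code ) . '</strong> at checkout.</p>';

// Hand the code to Google Tag Manager via the dataLayer.
// wp_json_encode() quotes the value safely for use inside JavaScript.
echo '<script>window.dataLayer = window.dataLayer || [];'
   . 'dataLayer.push({"coupon_code": ' . wp_json_encode( $coupon_code ) . '});</script>';
?>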

Trying to replicate this same experiment in Shopify is a challenge. Displaying widgets in Shopify is an add-on feature you have to pay for (in WordPress it’s free). And trying to generate coupon codes individually for each user at page-load time seems to involve wading far too deep into Shopify’s closed source theme files (which they can change on you at any time). This level of customization/flexibility just isn’t what Shopify is built for. Personally, I prefer WordPress for the extra visibility it gives you under the hood. You might lose some convenience that way, but you gain much more control.

Cost

Winner: WordPress

This might be the most important consideration for most—and the area in which I think WordPress really comes out ahead. With WordPress, you still have to pay someone to host your site, but use of the platform itself is free. GoDaddy has introductory hosting plans as low as $7/month and yearly SSL certificates for $38/yr (SSL certificates are crucial for a secure store). Shopify’s most basic plan, by contrast, is $29/month—but then you’ll have to pay for any plug-ins you need to get the functionality you want. So if you want to keep your website administration costs slim, WordPress is the way to go.

by dgreis at May 27, 2019 09:58 AM

May 19, 2019

MIMS 2012

What to Expect if I'm Your Manager

This past January I started my new gig at Gladly, managing the product design team. Unlike at Optimizely, where I transitioned into managing people I already worked with, at Gladly I inherited a team who didn’t know me at all. Inspired by my new boss who did the same thing, I wrote a document to describe what my new team could expect from me as their manager. I decided to re-publish that document here. Enjoy.


This doc is an accelerator in building our relationship. It will take a little while for us to find our rhythm, but we can try to short-circuit the storming phase and get some things on the table from the get go. I look forward to learning similar things about you — at your time.

Some of these bullet points are aspirational. They are standards of behavior that I’m trying to hold myself accountable to. If you ever believe I’m falling short, please tell me.

  • My goal is to provide an environment where you can do your best work.
  • I will support and encourage you in doing your best work, not tell you what to do. I want each of you to be autonomous and to make your own decisions. This means you may occasionally make mistakes, which is perfectly fine. I’ll be there to help you pick up the pieces.
  • In supporting you doing your best work, I will help remove roadblocks that are getting in your way.
  • I like to listen and gather context, data, and understanding before making decisions or passing judgement. This means I may ask you a lot of questions to build my knowledge. It also means I will sometimes stay quiet and hold the space for you to keep speaking. It doesn’t mean I’m questioning you, your abilities, or your choices.
  • I don’t like to waste my time, and I don’t want to waste yours. If you ever feel like a meeting, project, etc., isn’t a good use of your time, please tell me.
  • I take a lot of notes, and write a lot of documents to codify conversations.
  • I’m biased towards action and shipping over perfection and analysis paralysis. There’s no better test of a product or feature than getting it in the hands of real users, then iterating and refining.
  • I will try to give you small, frequent feedback (positive and negative) in the moment, when it’s fresh, and in person. I don’t like batching up feedback for 1:1s or performance reviews, which turns those into dreadful affairs that no one enjoys, and leads to stale, ineffective feedback. If you have a preferred way of receiving feedback, please tell me.
  • My goal is to give you more positive feedback than critical feedback. There’s always positive feedback to give. And positive feedback helps tell you what you’re doing well, and to keep doing it. If you feel like I haven’t given you positive feedback recently, please tell me.
  • I like feedback to be a 2-way street, so if there’s anything I’m doing that you don’t like, annoys you, etc., please let me know. If there are things that I’m doing well that you want me to keep doing, also let me know! Feel free to pull me aside, or tell me in our 1:1s.
  • 1:1s are your meetings. You own the agenda. They can be as structured or unstructured as you want. I will occasionally have topics to discuss, but most of the time your items come first.
  • You own your career growth. I am there to support and encourage you and to help you find opportunities for growth, but ultimately you’re in control of your career.
  • I trust you to make good decisions, get your work done, and use your time wisely. I trust you to not abuse our unlimited vacation policy and to take time off when you need it. If you haven’t taken any time off in a while, I’ll probably encourage you to take a vacation :) I’m not particularly worried about what hours you work, or where you work, as long as you’re getting your work done (and aren’t working too much).
  • Finally, here’s a list of my beliefs about design.

by Jeff Zych at May 19, 2019 11:46 PM

May 10, 2019

Ph.D. student

Where’s the Rest of Design? Or, Bringing Design to the Privacy Table: Broadening “Design” in “Privacy by Design” Through HCI [Paper Talk]

This post is based on a talk given at the 2019 ACM CHI Conference on Human Factors in Computing Systems (CHI 2019), in Glasgow, UK. The full research paper by Richmond Wong and Deirdre Mulligan that the talk is based on, “Bringing Design to the Privacy Table: Broadening “Design” in “Privacy by Design” Through the Lens of HCI” can be found here: [Official ACM Version] [Open Access Pre-Print Version]

In our paper “Bringing Design to the Privacy Table: Broadening Design in Privacy by Design,” we conduct a curated literature review to make two conceptual arguments:

  1. There is a broad range of design practices used in human computer interaction (HCI) research which have been underutilized in Privacy By Design efforts.
  2. Broadening privacy by design’s notion of what “design” can do can help us more fully address privacy, particularly in situations where we don’t yet know what concepts or definitions of privacy are at stake.

But let me start with some background and motivation. I’m both a privacy researcher—studying how to develop technologies that respect privacy—and a design researcher, who designs things to learn about the world.

I was excited several years ago to hear about a growing movement called “Privacy By Design,” the idea that privacy protections should be embedded into products and organizational practice during the design of products, rather than trying to address privacy retroactively. Privacy By Design has been put forward in regulatory guidance from the US and other countries, and more recently by the EU’s General Data Protection Regulation. Yet these regulations don’t provide a lot of guidance about what Privacy By Design means in practice.

In interactions with and field observations of the interdisciplinary Privacy By Design community—including lawyers, regulators, academics, practitioners, and technical folks—I’ve found that there is a lot of recognition of the complexity of privacy: that it’s an essentially contested concept; there are many conceptualizations of privacy; privacy from companies is different from privacy from governments; there are different privacy harms; and so forth.


Privacy by Design conceptualizes “design” in a relatively narrow way

But the discussion of “design” seems much less complex. I had assumed Privacy By Design would have meant applying HCI’s rich breadth of design approaches toward privacy initiatives – user centered design, participatory design, value sensitive design, speculative design, and so on.

Instead, design seemed to be used narrowly, as either a way to implement the law via compliance engineering, or to solve specific privacy problems. Design was largely framed as a deductive way to solve a problem, using approaches such as encryption techniques or building systems to comply with fair information practices. These are all important and necessary privacy initiatives, but I kept finding myself asking, “where’s the rest of design?” Not just the deductive problem-solving aspects of design, but also its inductive, exploratory, and forward-looking aspects.

There’s an opportunity for Privacy By Design to make greater use of the breadth of design approaches used in HCI

There’s a gap here between the way the Privacy By Design community views design and the way the HCI community views design. Since HCI researchers and practitioners are in a position to help support or implement privacy by design initiatives, it’s important to try to help broaden the notion of design in Privacy By Design to more fully bridge this gap.

So our paper aims to fulfill 2 goals:

  1. Design in HCI is more than just solving problems. We as HCI privacy researchers can more broadly engage the breadth of design approaches in HCI writ large. And there are opportunities to build connections among the HCI privacy research community, the HCI design research community, and the research through design community to use design in relation to privacy in multiple ways.
  2. Privacy By Design efforts risk missing out on the full benefits that design can offer if they stick with a narrower solution and compliance orientation to design. From HCI, we can help build bridges with the interdisciplinary Privacy By Design community, and engage them in understanding a broader view of design.

So how might we characterize the breadth of ways that HCI uses design in relation to privacy? In the paper, we conduct a curated review of HCI research to explore the breadth and richness of how design practices are used in relation to privacy. We searched for HCI papers that use both the terms “privacy” and “design,” curating a corpus of 64 papers. Reading through each paper, we open-coded each one by asking a set of questions including: Why is design used; who is design done by; and for whom is design done? Using affinity diagramming on the open codes, we came up with a set of categories, or dimensions, which we used to re-code the corpus. In this post I’m going to focus on the dimensions that emerged when we looked at the “why design?” question, which we call the purposes of design.


We use 4 purposes to discuss the breadth of reasons why design might be used in relation to privacy

 We describe 4 purposes of design. They are:

  • Design to solve a privacy problem;
  • Design to inform or support privacy;
  • Design to explore people and situations; and
  • Design to critique, speculate, and present critical alternatives.

Note that we use these to talk about how design has been used in privacy research specifically, not about all design writ large (that would be quite a different and broader endeavor!). In practice these categories are not mutually exclusive, and are not the only way to look at the space, but looking at them separately helps give some analytical clarity.  Let’s briefly walk through each of these design purposes.

To Solve a Privacy Problem

First, design is seen as a way to solve a privacy problem – which occurred most often in the papers we looked at. And I think this is often how we think about design colloquially, as a set of practices to solve problems. This is often how design is discussed in Privacy By Design discussions as well.

When viewing design in this way, privacy is presented as a problem that has already been well-defined before the design process, and a solution is designed to address that definition of the problem. A lot of responsibility for protecting privacy here is thus placed in the technical system.

For instance, if a problem of privacy is defined as the harms that result from long term data processing and aggregation, we might design a system that limits data retention. If a problem of privacy is defined as not being identified, we might design a system to be anonymous.

To Inform or Support Privacy

Second, design is seen as a way to inform or support actors who must make privacy-relevant choices, rather than solving a privacy problem outright. This was also common in our set of papers.  Design to inform or support privacy views problems posed by privacy as an information or tools problem. If users receive information in better ways, or have better tools, then they can make more informed choices about how to act in privacy-preserving ways.

A lot of research has been done on how to design usable privacy policies or privacy notices – but it’s still up to the user to read the notice and make a privacy relevant decision. Other types of design work in this vein includes designing privacy icons, controls, dashboards, visualizations, as well as educational materials and activities.

In these approaches, a lot of responsibility for protecting privacy is placed in the choices that people make, informed by a design artifact. The protection of privacy doesn’t arise from the design of the system itself, but rather from how a person chooses to use the system. This orientation towards privacy fits well with US privacy regulations that rely on individuals to manage and control their own data.

To Explore People and Situations (Related to Privacy)

Third is using design to explore people and situations. Design is used as a mode of inquiry, to better understand what privacy or the experience of privacy means to certain people, in certain situations. Design here is not necessarily about solving an immediate problem.

Techniques like design probes or collaborative design workshops are some approaches here. For example, a project I presented at CSCW 2018 involved presenting booklets with conceptual designs of potentially invasive products to technology practitioners in training. We weren’t looking to gather feedback in order to develop these conceptual ideas into usable products. Instead, the goal was to use these conceptual design ideas as provocations to better understand the participants’ worldviews. How are they conceptualizing privacy when they see these designs? How do their reactions help us understand where they place responsibility for addressing privacy?

Here, privacy is understood as a situated experience, which emerges from practices from particular groups in specific contexts or situations. The goal is less about solving a privacy problem, and more about understanding how privacy gets enacted and experienced.

To Critique, Speculate, or Present Critical Alternatives About Privacy

Fourth is design to critique, speculate, or present critical alternatives. (By critical I don’t mean bad or mean; I mean critical in the sense of reflexive reflection or careful analysis.) Design here is not about exploring the world as it is, but focuses on how the world could be. Often this consists of creating conceptual designs that provoke, in order to create a space to surface and discuss social values. These help us discuss worlds we might strive to achieve or ones we want to avoid. Privacy in this case is situated in different possible sociotechnical configurations of the world, thinking about privacy’s social, legal, and technical aspects together.

For example, in a project I presented at DIS 2017, we created advertisements for fictional sensing products, like a bodily implant for workplace employees. This helped us raise questions beyond basic ones about data collection and use. The designs helped us ask how privacy is implicated in the workplace, or through employment law, and whether consent can really occur with these power dynamics. They also helped us ask normative questions, such as: Who gets to have privacy and who doesn’t? Who or what should be responsible for protecting privacy? Might we look to technical design, to regulations, to market mechanisms, or to individual choice to protect privacy?

Design Is a Political, Values-Laden Choice

So in summary these are the 4 purposes of design that we identified in this paper: using design to solve, to inform and support, to explore, and to critique and speculate. Again, in practice, they’re not discrete categories. Many design approaches, like user-centered design or participatory design, use design for multiple purposes.


Using design in different ways suggests different starting points for how we might think about privacy

But this variety of purposes for how design relates to privacy is also a reminder that design isn’t a neutral process, but is itself political and values-laden. (Not political in terms of liberal and conservative, but political in the sense that there are power dynamics and social implications in the choices we make about how to use design). Each design purpose suggests a different starting place for how we orient ourselves towards conceptualizing and operationalizing privacy. We might think about privacy as:

  • a technical property;
  • a user-made choice;
  • a situated experience;
  • sociotechnically situated.

Privacy can be many and all of these things at once, but the design methods we choose, and the reasons why we choose to use design, help to suggest or foreclose different orientations toward privacy. These choices also suggest that responsibility for privacy might be placed in different places — such as in a technical system, in a person’s choices, in a platform’s policies, in the law, in the market, and so forth.


Research using design to solve and design to inform and support appeared more often in the papers that we looked at

Now I’ve been discussing these 4 design purposes equally, but they weren’t equal in our corpus. Allowing each paper to be coded for multiple categories, a little over half the papers we looked at used design to solve a privacy problem and a little over half used design to inform or support. Less than a quarter used design to explore; even fewer used design to critique and speculate. We don’t claim that the exact percentages are representative of all the privacy literature, but there’s a qualitative difference here, where most of the work we reviewed uses design to solve privacy problems or support and inform privacy.

We are arguing for a big tent approach in privacy by design: using design in all of these ways helps us address a broader set of conceptions of privacy.

This suggests that there’s an opportunity for us to build bridges between the HCI privacy research community, which has rich domain expertise; and the HCI design research & research through design communities, which have rich design methods expertise, particularly using design in ways to explore, and to critique and speculate.

So that’s Argument 1, that we have the opportunity to build new bridges among HCI communities to more fully make use of each others’ expertise, and a broader range of design methods and purposes.

Argument 2 is that Privacy By Design has largely (with some exceptions) thought about design as a problem-solving process. Privacy By Design research and practice could expand that thinking to make use of the fuller breadth of uses of design reflected in HCI.

Implications for Design Collaboration

So what might some of these collaborations within and across fields look like, if we want to make use of more of design’s breadth? For example, if we as privacy researchers develop a set of usable privacy tools to inform and support most people’s privacy decision making, that might be complemented with design to explore, so that we can better understand the often marginalized populations for whom those tools don’t work. For instance, Diana Freed et al.’s work shows that social media privacy and security tools can be used against victims of intimate partner violence, violating their privacy and safety. Or, an emerging set of problems we face is thinking about privacy in physically instrumented spaces: how does consent work, and what conceptions of privacy and privacy risk are at play? We can complement design to solve and design to support efforts with design to critique and speculate, crafting future scenarios that try to understand what concepts of privacy might be at play, and how privacy can surface differently when technical, social, or legal aspects of the world change.

From a design research perspective, I think there’s growing interest in the design research community to create provocative artifacts to try to surface discussions about privacy, particularly in relation to new and emerging technologies. Critically reflecting on my own design research work, I think it can be tempting to just speak to other designers and resort to conceptions of privacy that say “surveillance is creepy” and not dig deeper into other approaches to privacy. But by collaborating with privacy researchers, we can bring more domain expertise and theoretical depth to these design explorations and speculations, and engage a broader set of privacy stakeholders.

Industry privacy practitioners working on privacy by design initiatives might consider incorporating more UX researchers and designers from their organizations, as privacy allies and as design experts. Approaches that use design to critique and speculate may also align well with privacy practitioners’ stated desire to find contextual and anticipatory privacy tools to help “think around corners”, as reported by Ken Bamberger and Deirdre Mulligan.

Privacy By Design regulators could incorporate more designers (in addition to engineers and computer scientists) in regulatory discussions about privacy by design, so that this richness of design practice isn’t lost when the words “by design” are written in the law.

Moreover, there’s an opportunity here for us as an HCI community to bring HCI’s rich notions of what design can mean to Privacy By Design, so that beyond being a problem-solving process, it is also seen as a process that makes use of the multi-faceted, inductive, and exploratory uses of design that this community engages in.


 

Paper Citation: Richmond Y. Wong and Deirdre K. Mulligan. 2019. Bringing Design to the Privacy Table: Broadening “Design” in “Privacy by Design” Through the Lens of HCI. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI ’19). ACM, New York, NY, USA, Paper 262, 17 pages. DOI: https://doi.org/10.1145/3290605.3300492

by Richmond at May 10, 2019 09:43 AM

April 29, 2019

Ph.D. student

Utilizing Design’s Richness in “Privacy by Design”

This post summarizes a research paper, Bringing Design to the Privacy Table, written by Richmond Wong and Deirdre Mulligan. The paper will be presented at the 2019 ACM Conference on Human Factors in Computing Systems (CHI 2019) on Wednesday, May 8 at the 4pm “Help Me, I’m Only Human” paper session.

How might the richness and variety in human computer interaction (HCI) design practices and approaches be utilized in addressing privacy during the development of technologies?

U.S. policy recommendations and the E.U.’s General Data Protection Regulation have helped the concept of privacy by design (PBD)—embedding privacy protections into products during the initial design phase, rather than retroactively—gain traction. Yet while championing “privacy by design,” these regulatory discussions offer little in the way of concrete guidance about what “by design” means in technical and design practice. Engineering communities have begun developing privacy engineering techniques to use design as a way to find privacy solutions. Many privacy engineering tools focus on design solutions that translate high-level principles into implementable engineering requirements. However, design in HCI has a much richer concept of what “design” might entail: it also includes thinking about design as a way to explore the world and to critique and speculate about the world. Embracing this richness of design approaches can help privacy by design more fully approach the privacy puzzle.

To better understand the richness of ways design practices can relate to privacy, we conducted a curated review of 64 HCI research papers that discuss both privacy and design. One thing we looked at was how each paper viewed the purpose of design in relation to privacy. (Papers could be classified into multiple categories, so percentages add up to over 100.) We found four main design purposes:

  • To Solve a Privacy Problem (56% of papers) – This aligns with the common perception of design, that design is used to solve problems. This includes creating system architectures and data management systems that collect and use data in privacy-preserving ways. The problems posed by privacy are generally well-defined before the design process; a solution is then designed to address those problems.
  • To Inform or Support Privacy (52%) – Design is also used to inform or support people who must make privacy-relevant choices, rather than solving a privacy problem outright. A lot of these papers use design to increase the usability of privacy notices and controls to allow end users to more easily make choices about their privacy. These approaches generally assume that if people have the “right” types of tools and information, then they will choose to act in more privacy-preserving ways.
  • To Explore People and Situations (22%) – Design can be used as a form of inquiry to understand people and situations. Design activities, probes, or conceptual design artifacts might be shared with users and stakeholders to understand their experiences and concerns about privacy. Privacy is thus viewed here as relating to different social and cultural contexts and practices; design is used as a way to explore what privacy means in these different situations.
  • To Critique, Speculate, or Present Critical Alternatives (11%) – Design can be used to create spaces in which people can discuss values, ethics, and morals—including privacy. Rather than creating immediately deployable design solutions, design here works like good science fiction: creating conceptual designs that try to provoke people into thinking about relationships among technical, social, and legal aspects of privacy and ask questions such as who gets (or doesn’t get) to have privacy, or who should be responsible for providing privacy.

One thing we found interesting is how some design purposes tend to narrowly define what privacy means or define privacy before the design process, whereas others view privacy as more socially situated and use the process of design itself to help define privacy.

For those looking towards how these dimensions might be useful in privacy by design practice, we mapped our dimensions onto a range of design approaches and methodologies common in HCI, in the table below.

Design Approach(es) | Dimensions of Design Purposes | How does design relate to privacy?
Software Engineering | Solve a problem; Inform and support | Conceptions of privacy and the problem to be solved are defined in advance. Lends itself well to problems related to data privacy, or privacy issues to be addressed at a system architecture level.
User-Centered Design | Solve a problem; Inform and support; Explore | May have a conception of privacy defined in advance, or it might surface from users. Lends itself well to individual-based conceptions of privacy.
Participatory Design; Value Centered Design | Solve a problem; Inform and support; Explore | Surfaces stakeholder conceptions of privacy; involves stakeholders in the design process.
Resistance, Re-Design, Re-Appropriation Practices | Solve a problem; Critique | Shows breakdown or contestation in current conceptions of privacy.
Speculative and Critical Design | Explore; Critique | Critiques current conceptions of privacy; explores and shows potential ways privacy might emerge in new situations.

These findings can be of use to several communities:

  • HCI privacy researchers and PBD researchers might use this work to reflect on dominant ways in which design has been used thus far (to solve privacy problems, and to inform or support privacy), and begin to explore a broader range of design purposes and approaches in privacy work.
  • HCI design researchers might use this work to see how expertise in research through design methods could be married with privacy domain expertise, suggesting potential new collaborations and engagements.
  • Industry Privacy Practitioners can begin reaching out to UX researchers and designers in their own organizations both as design experts and as allies in privacy by design initiatives. In particular, the forward-looking aspects of speculative and critical design approaches may also align well with privacy practitioners’ desire to find contextual and anticipatory privacy tools to help “think around corners”.
  • Policymakers should include designers (in addition to engineers and computer scientists) in regulatory discussions about privacy by design (or other “governance by design” initiatives). Many regulators seem to view “design” in “privacy by design” as a way to implement decisions made in law, or as a relatively straightforward way to solve privacy problems. However, this narrow view risks hiding the politics of design; what is left unexamined in these discussions is that different design approaches also suggest different orientations and conceptualizations of privacy. HCI design practices, which have already been used in relation to privacy, suggest a broader set of ways to approach privacy by design.

Our work aims to bridge privacy by design research and practice with HCI’s rich variety of design research. By doing so, we can help encourage more holistic discussions about privacy, drawing connections among privacy’s social, legal, and technical aspects.


Download a pre-print version of the full paper here.

Paper Citation:
Richmond Y. Wong and Deirdre K. Mulligan. 2019. Bringing Design to the Privacy Table: Broadening “Design” in “Privacy by Design” Through the Lens of HCI. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland UK. ACM, New York, NY, USA, 17 pages. https://doi.org/10.1145/3290605.3300492

This post is crossposted on Medium

by Richmond at April 29, 2019 06:49 AM

April 19, 2019

Ph.D. student

ethnography is not the only social science tool for algorithmic impact assessment

Quickly responding to Selbst, Elish, and Latonero’s “Accountable Algorithmic Futures“, Data and Society’s response to the Algorithmic Accountability Act of 2019…

The bill would empower the FTC to do “automated decision systems impact assessment” (ADSIA) of automated decision-making systems. The article argues that the devil is in the details and that the way the FTC goes about these assessments will determine their effectiveness.

The point of their article, which I found notable, is to assert the appropriate intellectual discipline for these impact assessments.

This is where social science comes in. To effectively implement the regulations, we believe that engagement with empirical inquiry is critical. But unlike the environmental model, we argue that social sciences should be the primary source of information about impact. Ethnographic methods are key to getting the kind of contextual detail, also known as “thick description,” necessary to understand these dimensions of effective regulation.

I want to flag this as weird.

There is an elision here between “the social sciences” and “ethnographic methods”, as if there were no social sciences that were not ethnographic. And then “thick description” is implied to be the only source of contextual detail that might be relevant to impact assessments.

This is a familiar mantra, but it’s also plainly wrong. There are many disciplines and methods within “the social sciences” that aren’t ethnographic, and many ways to get at contextual detail that do not involve “thick description”. There is a worthwhile and interesting intellectual question: what are the appropriate methods for algorithmic impact assessment? The authors of this piece assume an answer to that question without argument.

by Sebastian Benthall at April 19, 2019 05:50 PM

April 15, 2019

MIMS 2018

Google has a process for getting these scripts approved, but since you’re the developer in this…

Google has a process for getting these scripts approved, but since you’re the developer in this case, you can just go ahead and trust it.

by Gabe Nicholas at April 15, 2019 03:15 PM

Great tutorial!

Great tutorial! Quick update though: the Add Trigger button now appears under Current project’s triggers, not All your triggers.

by Gabe Nicholas at April 15, 2019 03:13 PM

March 26, 2019

Ph.D. student

Neutral, Autonomous, and Pluralistic conceptions of law and technology (Hildebrandt, Smart Technologies, sections 8.1-8.2)

Continuing notes and review of Part III of Hildebrandt’s Smart Technologies and the End(s) of Law, we begin chapter 8, “Intricate entanglements of law and technology”. This chapter culminates in some very interesting claims about the relationship between law and the printing press/text, which I anticipate will provide some very substantive conclusions.

But the chapter warms up with a review of philosophical/theoretical positions on law and technology more broadly. Section 8.2 is structured as a survey of these positions, and in an interesting way: Hildebrandt lays out Neutral, Autonomous, and Pluralistic conceptions of both technology and law in parallel. This approach is dialectical. The Neutral and Autonomous conceptions are, Hildebrandt argues, narrow and naive; the Pluralistic conception captures nuances necessary to understand not only what technology and law are, but how they relate to each other.

The Neutral Conception

This is the conception of law and technology as mere instruments. A particular technology is not good or bad, it all depends on how it’s used. Laws are enacted to reach policy aims.

Technologies are judged by their affordances. The goals for which they are used can be judged, separately, using deontology or some other basis for the evaluation of values. Hildebrandt has little sympathy for this view: “I believe that understanding technologies as mere means amounts to taking a naive and even dangerous position”. That’s because, for example, technology can impact the “in-between” of groups and individuals, thereby impacting privacy by its mere usage. This echoes the often cited theme of how artifacts have politics (Winner, 1980): by shaping the social environment by means of their affordances.

Law can also be thought of as a neutral instrument. In this case, it is seen as a tool of social engineering, evaluated for its effects. Hildebrandt says this view of law fits “the so-called regulatory paradigm”, which “reigns in policy circles, and also in policy science, which is a social science inclined to take an exclusively external perspective on the law”. The law regulates behavior externally, rather than the actions of citizens internally.

Hildebrandt argues that when law is viewed instrumentally, it is tempting to then propose that the same instrumental effects could be achieved by technical infrastructure. “Techno-regulation is a prime example of what rule by law ends up with; replacing legal regulation with technical regulation may be more efficient and effective, and as long as the default settings are a part of the hidden complexity people simply lack the means to contest their manipulation.” This view is aligned with Lessig’s (2009), which Hildebrandt says is “deeply disturbing”; as it is aligned with “the classical law and economics approach of the Chicago School”, it falls short…somehow. This argument will be explicated in later sections.

Comment

Hildebrandt’s criticism of the neutral conception of technology is that it does not register how technology (especially infrastructure) can have a regulatory effect on social life, and so have consequences that can be normatively evaluated without bracketing out the good or bad uses of it by individuals. This narrow view of technology is precisely the one that scholars like Lessig have triumphed over.

Hildebrandt’s criticism of the neutral conception of law is different. It is that understanding law primarily by its external effects (“rule by law”) diminishes the true normative force of a more robust legality that sees law as necessarily enacted and performed by people (“Rule of Law”). But nobody would seriously think that “rule by law” is not “neutral” in the same sense that some people think technology is neutral.

The misalignment of these two positions, which are presented as if they are equivalent, obscures a few alternative positions in the logical space of possibilities. There are actually two different views of the neutrality of technology: the naive one that Hildebrandt takes time to dismiss, and the more sophisticated view that technology should be judged by its social effects just as an externally introduced policy ought to be.

Hildebrandt shoots past this view, as developed by Lessig and others, in order to get to a more robust defense of Rule of Law. But it has to be noted that this argument for the equivalence of technology and law within the paradigm of regulation has beneficial implications if taken to its conclusion. For example, in Deirdre Mulligan’s FAT* 2019 keynote, she argued that public sector use of technology, if recognized as a form of policy, would be subject to transparency and accountability rules under laws like the Administrative Procedure Act.

The Autonomous Conception

In the autonomous conception of technology and law, there is no agent using technology or law for particular ends. Rather, Technology and Law (capitalized) act with their own abstract agency on society.

There are both optimistic and pessimistic views of Autonomous Technology. There is hyped up Big Data Solutionism (BDS), and there are dystopian views of Technology as the enframing, surveilling, overpowering danger (as in Heidegger). Hildebrandt argues that these are both naive and dangerous views that prevent us from taking seriously the differences between particular technologies. Hildebrandt maintains that particular design decisions in technology matter. We just have to think about the implications of those decisions in a way that doesn’t deny the continued agency involved in the continuous improvement, operation, and maintenance of the technology.

Hildebrandt associates the autonomous conception of law with legal positivism, the view of law as a valid, existing rule-set that is strictly demarcated from either (a) social or moral norms, or (b) politics. The law is viewed as legal conditions for legal effects, enforced by a sovereign with a monopoly on violence. Law, in this sense, legitimizes the power of the state. It also creates a class of lawyers whose job it is to interpret, but not make, the law.

Hildebrandt’s critique of the autonomous conception of law is that it gives the law too many blind spots. If Law is autonomous, it does not need to concern itself with morality, or with politics, or with sociology, and especially not with the specific technology of Information-Communications Infrastructure (ICI). She does not come out and say this outright, but the implication is that this view of Law is fragile given the way changes in the ICI are rocking the world right now. A more robust view of law would give better tools for dealing with the funk we’re in right now.

The Pluralistic Conception

The third view of technology and law, the one that Hildebrandt endorses, is the “pluralistic” or “relational” view. It does not come as a surprise after the exploration of the “neutral” and “autonomous” conceptions.

The way I like to think about this, the pluralistic conception of technology/law, is: imagine that you had to think about technology and law in a realistic way, unburdened by academic argument of any kind. Imagine, for example, a room in an apartment. Somebody built the room. As a consequence of the dimensions of the room, you can fit a certain amount of furniture in it. The furniture has affordances; you can sit at chairs and eat at tables. You might rearrange the furniture sometimes if you want a different lifestyle for yourself, and so on.

In the academic environment, there are branches of scholarship that like to pretend they discovered this totally obvious view of technology for the first time in, like, the 70’s or 80’s. But that’s obviously wrong. As Winner (1980) points out, when Ancient Greeks were building ships, they obviously had to think about how people would work together to row and command the ship, and built it to be functional. Civil engineering, transportation engineering, and architecture are fields that deal with socially impactful infrastructure, and they have to deal with the ways people react, collectively, to what was built. I can say from experience doing agile development of software infrastructure that software engineers, as well, think about their users when they build products.

So, we might call this the “realistic” view–the view held by engineers, who are best situated to understand the processes of producing and maintaining technology, since that’s their life.

I’ve never been a lawyer, but I believe one gets to the pluralistic, or relational, view of law in pretty much the same way. You look at how law has actually evolved, historically, and how it has always been wrapped up in politics and morality and ICI’s.

So, in these sections, Hildebrandt drives home in a responsible, scholarly way the fact that neither law nor technology (especially technological infrastructure, and especially ICI) is autonomous–both are historically situated creations of society–nor is either instrumentally neutral–both have a form of agency in their own right. As my comment above notes, to me the most interesting part of this chapter was the gaps and misalignment in the section on the Neutral Conception. This conception seems most aligned with an analytically clear, normative conception of what law and technology are supposed to be doing, which is what makes this perspective enduringly attractive to those who make them. The messiness of the pluralistic view, while more nuanced, does not provide a guide for design.

By sweeping away the Neutral conception of law as instrumental, Hildebrandt preempts arguments that the law might fail to attain its instrumental goals, or that the goals of law might sometimes be attained through infrastructure. In other words, Hildebrandt is trying to avoid a narrow instrumental comparison between law and technology, and highlights instead that they are relationally tied to each other in a way that prevents either from being a substitute for the other.

References

Hildebrandt, Mireille. Smart technologies and the end (s) of law: novel entanglements of law and technology. Edward Elgar Publishing, 2015.

Lessig, Lawrence. Code: And other laws of cyberspace. ReadHowYouWant.com, 2009.

Winner, Langdon. “Do artifacts have politics?” Daedalus (1980): 121-136.

by Sebastian Benthall at March 26, 2019 02:19 AM

March 25, 2019

Center for Technology, Society & Policy

Backstage Decisions, Front-stage Experts: Non-technical Experiences and the Political Engagement of Scientists

by Santiago Molina and Gordon Pherribo, CTSP Fellows

This is the second in a series of posts on the project “Democratizing” Technology: Expertise and Innovation in Genetic Engineering.

See the first post in the series: Backstage Decisions, Front-stage Experts: Interviewing Genome-Editing Scientists.

Since 2015, scientists, ethicists, and regulators have attempted to address the ethical, moral, and social concerns involving genetic modifications to the human germline. Discourse involving these concerns has focused on advancing a culture of responsibility and precaution within the scientific community, rather than the creation of new institutional policies and laws. Confidence in scientists’ ability to self-regulate has become increasingly tenuous with the recent news of the birth of genome-edited twins on November 26th, 2018, despite the scientific consensus that such experiments are medically and ethically unwarranted. In response, journalists, social scientists and critical researchers in the life sciences have posed the question: Who should be involved in deciding how genome-editing technologies should be used and for what aims?

In this post, we complicate the idea that technical expertise, which is usually narrowly defined on the basis of professional experience or knowledge, should be the main criterion for having a seat at the table during scientific decision-making. Drawing from eight interviews with scientists who participated in a small meeting held in Napa Valley in 2015, we highlight the role of non-technical experiences in shaping scientists’ views of decision-making about genome editing.

We identify three experiences that have influenced scientists’ views and deliberations about the application and potential consequences and benefits of genetic engineering technologies: 1) reading and group discussions outside of their academic disciplines, 2) direct engagement with patient communities, and 3) involvement in social movements. To wrap up, we make some modest suggestions for what these might mean in the context of STEM education.

1. Reading Outside of the Discipline and Group Discussions.

During our interviews we asked scientists how they shaped their viewpoints about biotechnology and its relationship to society. Respondents described their exposure to new viewpoints and reflected on the effect this exposure had on their decision-making. One of the sources of these exposures was reading outside of their academic discipline. We were surprised to hear about how the work of philosophers of science and sociologists of science did inform the decision making of one senior scientist at the Napa Valley meeting. This faculty member discussed their interest in finding opportunities to supplement their laboratory training with philosophical discussions about issues tangential to the science they were working on. With other graduate students, they created a small group that met regularly to discuss concepts and theories in philosophy of science, ethics and sociology of science.

We met- I don’t remember whether it was once a month or once every two weeks to discuss issues around the philosophy and societal issues of science. So we would find books, read books, um from you know – from Bertrand Russell, the philosopher, to Jacob Bronowski to Alfred Lord Whitehead, you know books on the philosophy and the applications of science,

The scientist described that this work added additional layers to their understanding of societal issues related to science. Even though this reading group was instrumental in developing their own awareness of the relationship between science and broader social, political and cultural issues, this respondent also lamented how the opportunity to delve into topics outside of a graduate student’s normal routine “was not encouraged by any of [their] mentors.” This theme came up in several of our interviews, reinforcing the importance of mentors in shaping how scientists make meaning of their discipline in relation to society, and what educational and professional development opportunities graduate students feel comfortable pursuing outside of their formal training.

2. Direct engagement through service.

The most distinctly communicated experiences our interviewees engaged in outside of their formal training were service-related learning experiences that involved direct interaction with communities that would medically benefit from the technology. These experiences appeared to give individuals a greater sense of civic responsibility, and afforded them a more expansive understanding of the relationship between their work and broader communities. For genome-editing researchers, this crucially meant being aware of the social and medical realities of patients that might be research subjects in clinical trials for CRISPR-based therapies.

In our interviews, scientists had direct engagement with people outside of their discipline within national scientific boards, federal organizations, health clinics, and the biotech and pharmaceutical industry. These types of experiences provide an opportunity to collaborate with stakeholders on pressing issues, learn and benefit from industry and market knowledge, and ensure that the outcome of decisions are both relevant and meaningful to community stakeholders outside of the lab.

One of our respondents reflected on how they learned important skills, such as active listening, through professional experiences with indigenous patient communities–which helped this respondent better serve the community’s needs.

I’ve learned a whole lot from the patients I’ve taken care of and the people I’ve met. I certainly learned a great deal from going to the Navajo reservation. I’m – just to be able to sit down in a very different culture and listen and I think it’s very important for doctors to listen to their patients.

This interviewee was additionally committed to modeling the listening behavior of physicians and teaching these listening skills to others. When we further asked “What what do you think was specific about the way that [your mentors] spoke with patients and interacted with them?” the interviewee responded with clarity:        

Sitting back and not speaking and letting them talk about what’s important to them.

The interviewee conveyed that if you listen, people will tell you what is most important to them. They further argued that as decision-makers guiding the usage of far-reaching technologies, it is important to not make assumptions about what a particular community needs.

Similarly, in another interview, a molecular biologist described their experience setting up clinical trials and discussing the risks and benefits of an experimental treatment. This experience not only gave them a more concrete sense of what was at stake in the discussions held at the Napa Meeting, but also helped sensitize them towards the lived experiences of the patient communities that may be affected (for better or worse) by genome editing technology. When asked if experiences during their doctoral program, postdoc or work at a biotech firm, had prepared them for discussing genome editing and its implications, the molecular biologist responded:

Having been involved in therapeutic programs in which you’re discussing the pluses and minuses of therapies that can have side effects can prepare you for that. […] To me that was very helpful because it was a very concrete discussion. That conversation was not a like, “oh, I’m an academic and I wanna write a paper and someone’s going to read it and then enough.” […] [In a therapeutic program] the conversation was like, “we have a molecule, are we going to put it in people?” And if the answer is “yes,” like there is a living person on the other end that is going to take that molecule and [they are] going to have to live with the consequences positive and negative. […] 

The distinction being drawn here, between scientific work with concrete outcomes for people and work with solely academic outcomes, suggests that there are practical experiences that researchers at universities may only have indirect knowledge of, and that are important for understanding how the products of science may affect others. As the interviewee further explained, the stakes of being unfamiliar with patients’ experiences are particularly high,

[My work at a biotech firm] has sort of prepared me at least a little bit for some of the discussion around therapeutic editing because different patient populations have wildly different ideas about gene editing. There are certain forms of inherited blindness where people are frankly insulted that you would call it a genetic disease, right? And I think rightly so. That’s their experience. It’s their disease. Why should we call this something that should be quote-unquote “corrected,” right?

In this case, prior experience with clinical trials alerted the researcher towards the heterogeneity of experiences of different patient populations. They further described how, in other interactions with patient advocates through public engagement, they were able to learn a great deal about the uniqueness of each patient group and their different views about genome editing. Here, the researcher additionally conveyed concern over the ableism that is often implicit in medical views of difference. They recounted how listening to perspectives from different patient communities led them to reflect on how procedurally safe genome editing can still cause harm in other ways.

3. Involvement in social movements.

The third non-technical form of expertise came from researchers’ political participation. While the recent fervor against the GOP’s “war on science” may give us ample evidence that politics and science don’t mix well, the role of social movements in the creation of scientific knowledge has been extensively documented by sociologists. For example, post-World War II environmental movements changed the content, form and meaning of ecological research (Jamison 2006), and Gay Rights and AIDS activists helped steer the direction of biomedical research (Epstein 1996). What is less emphasized in these studies, though, is how participation in social movements by scientists can impact their worldview and decision-making. When asked what personal experiences shaped how they thought of the process of decision-making around new biotech, one interviewee mentioned their engagement with political movements in the late 1960s during anti-Vietnam War protests:

So I was in Berkeley in the late 60s…This is a time of a lot of social activity. Protests that went on against the Vietnam War in favor of civil rights. There was a lot of protest activity going on and I was involved in that to some extent, you know, I went on marches. I went door-to-door one summer in opposition to the Vietnam War…Um, so I had to you know- I had sort of a social equity outlook on life. All the way from my upbringing from college- and then at Berkeley you really couldn’t avoid being involved in some of these social issues.

This respondent went on to discuss how their commitments towards social equity shaped their decision-making around emerging technologies. In another interview, a respondent described how taking time off of their graduate program to work on a local election campaign motivated them to participate in science policy forums later in their career.

However, these examples also suggest that how a scientist chooses to engage with social movements can have lasting effects on how they think of themselves as being part of a larger community. If scientists participate unreflexively, social movements can fail to challenge individuals to consider how the network building and activism they are doing affects themselves and may be excluding others from different communities.

To give a contemporary example, the March for Science (MfS) movement that emerged in early 2017 protested against the Trump administration’s anti-science policies and actions. While the issues about science funding were urgent, MfS organizers failed to address language issues in MfS that were dismissive of the experiences of marginalized communities in science. Whether or not a participant in MfS chose to critically engage in the movement will influence how that individual sees the world and whether they intentionally or unintentionally reproduce inequities in science. By asking scientists to think about both their role in society and about the community of science itself, social movements provide a wealth of knowledge and creativity that scientists can contribute to and use as a resource when making decisions and reflecting on the implications of emerging technologies.

The Value of Non-technical Expertise in Training

Many of the experiences that shaped our interviewees’ decision-making occurred during their early graduate and professional training. Despite the personal and professional value they found in these experiences, our interviewees noted the lack of support from their graduate mentors for their exploration of non-technical interests and a lack of incentives to participate more broadly in political endeavors during their training. While this may be changing for newer generations of scientists, it raises questions about how scientists in the natural and physical sciences are socialized into the broader scientific community, and about the impact of that socialization on what they think their political responsibilities are.

For example, a consensus study of the National Academies of Sciences, Engineering, and Medicine (2018) found that there is a lack of social and institutional support for activities located outside of the traditional realm of an individual’s discipline and argued for the creation of novel training pathways that could lead to holistic STEM research training. One way of creating more holistic STEM training programs, noted by the study and supported by our findings, would be to provide resources and structures to facilitate the connection between graduate training in the life sciences and fields such as STS, sociology and philosophy. Exposure to these disciplines can help aspiring researchers grapple with the social dimensions of their discipline and serve as additional tools for constructive debates around scientific issues. Promoting interdisciplinary collaboration may also help reduce stigma associated with non-traditional pathways to scientific training and provide easier channels to integrate professional development and internship opportunities into the curriculum.

The urgency of this current gap in training is apparent if you look at who is currently at the decision-making table. The committees and meetings for deliberation about the social and ethical issues of genome editing are almost exclusively constituted by senior scientists. These researchers are mainly conscripted into these roles because of their technical expertise and status in disciplinary networks. Historically, the academic institutions these scientists were trained in were not built to prepare scientists for making political decisions or for appreciating the social complexity and nuance that comes with the introduction of emergent technologies into society. In our third blog post we will explore the political stakes of this form of science governance, which are surprisingly high.


References:

Epstein, S. (1996). Impure science: AIDS, activism, and the politics of knowledge (Vol. 7). Univ of California Press.

Jamison, A. (2006). Social movements and science: Cultural appropriations of cognitive praxis. Science as Culture, 15(01), 45-59.

National Academies of Sciences, Engineering, and Medicine (2018) Graduate STEM Education for the 21st Century. Washington, DC: The National Academies Press. doi: https://doi.org/10.17226/25038.

by Daniel Griffin at March 25, 2019 04:33 PM

March 18, 2019

MIMS 2012

Adding clarity by removing information

“Where’s the clipboard?”

A customer wrote this in to our support team after using our Copy with cite feature. This feature allows customers to copy snippets of text from a court case, and paste them into the document they’re writing with the case’s citation appended to the end. It’s a huge time saver for lawyers when writing legal documents, and is Casetext’s most heavily used feature.

When I first saw this feedback, I assumed it was an anomaly. The success message says “Copied to clipboard,” but who doesn’t know what the clipboard is? How much clearer could we make the success message?

But then it came in again, and again, and again, until eventually the pattern was undeniable. It was only a small percentage of users overall who were writing in, but it was happening regularly enough that we knew we had to fix it.

The original, confusing toast message that pops up after the user clicks “Copy with cite.”

To debug this issue, I opened Fullstory (a service that lets you watch back user sessions, among other things) so I could observe the people who wrote in actually use the product. After watching a few sessions, a pattern emerged. People would click “Copy with cite,” zig-zag their mouse around the screen, opening and closing menus, then write in to support to ask, “Where’s the clipboard?” (or some variation thereof).

At first I didn’t understand why they were frantically throwing their cursor around the screen. What were they looking for? After watching many sessions, I finally realized what they were doing: they were looking for the clipboard in Casetext! They thought the “clipboard” was a feature in our product, as opposed to the clipboard on their OS.

Now that I understood the problem, the next challenge was how to fix it. How do we communicate that the clipboard we’re referring to is the system’s clipboard? I started rewriting the text, with options like, “Copied to clipboard. Press ctrl + V to paste.” “Copied to your system’s clipboard.” “Copied to your clipboard” [emphasis added].

But none of these options felt right. They were either too wordy, too technical, or just more confusing. I took a step back and re-examined the problem. The word “clipboard” is what was tripping people up. What if we just removed that word altogether? Could we get away with just saying “Copied” instead? For the people having trouble with this feature, it may prevent them from thinking the clipboard is a feature we offer. For people who aren’t getting confused in the first place, this should be just as clear as saying “Copied to clipboard.”

The refined toast message.

The change felt a little risky, but at the same time it felt right. To validate that this would work, I decided to just make the change and see what happened. An A/B test and usability testing would both be valid methods to test my hypothesis, but in this case neither tool was the right fit. An A/B test would have taken too long to get statistically valid results, since the conversion event of “failing” to use the feature was very low. It’s also a difficult conversion event to measure. And a usability test would have been more time-consuming and costly than just shipping the change.
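
To make the “too long” point concrete, here is a rough power-calculation sketch (my own illustration with made-up numbers, not Casetext’s actual traffic or rates) of how many users a two-arm A/B test would need when the failure event is rare:

# Back-of-the-envelope sample size for a two-proportion A/B test.
# All numbers are hypothetical, chosen only to show why a rare
# "user got confused" event makes an A/B test slow to conclude.
from scipy.stats import norm

def sample_size_per_arm(p1, p2, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect a change from rate p1 to p2."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return numerator / (p1 - p2) ** 2

# Hypothetical: 0.4% of users hit the confusion, and the new copy halves it.
print(round(sample_size_per_arm(0.004, 0.002)))  # roughly 12,000 users per arm

At realistic traffic for a single feature, gathering on the order of ten thousand users per arm could easily take longer than simply shipping the change and watching support volume.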

Since the solution was easy to implement (the code change was just removing 13 characters), and the impact if I was wrong was low (the change was unlikely to be worse, and it was easy to reverse course if it was), learning by making a change to the working product was the fastest, cheapest way to go.

After shipping the fix I kept an eye on the support messages coming in. In the following 2 months, only one person was confused about what to do after clicking “Copy with cite,” as compared to the 8 people who had written in in the previous 2 months. Not a perfect record, but an improvement nonetheless!

In this case, the best way to improve the clarity of the UI was to provide less information.

by Jeff Zych at March 18, 2019 02:54 AM

March 15, 2019

Center for Technology, Society & Policy

Symposium: “Governing Machines – Defining and Enforcing Public Policy Values in AI Systems”

CTSP is proud to be a co-sponsor of  the 23rd Annual BCLT/BTLJ Symposium: Governing Machines: Defining and Enforcing Public Policy Values in AI Systems

Algorithms that analyze data, predict outcomes, suggest solutions, and make decisions are increasingly embedded into everyday life. Machines automate content filtering, drive cars and fly planes, trade stocks, evaluate resumes, assist with medical diagnostics, and contribute to government decision-making. Given the growing role of artificial intelligence and machine learning in society, how should we define and enforce traditional legal obligations of privacy, non-discrimination, due process, liability, professional responsibility, and reasonable care?

This symposium will convene scholars and practitioners from law, policy, ethics, computer science, medicine, and social science to consider what roles we should allow machines to play and how to govern them in support of public policy goals.

Co-sponsored by: CTSP, the Center for Long-Term Cybersecurity, and the Algorithmic Fairness and Opacity Working Group (AFOG) at UC Berkeley.

Bonus!

Two 2017 CTSP fellows will be panelists:

  • Amit Elazari on “Trust but Verify – Validating and Defending Against Machine Decisions”
  • Uri Hacohen on “Machines of Manipulation”

by Daniel Griffin at March 15, 2019 05:57 PM

March 14, 2019

Ph.D. student

Antinomianism and purposes as reasons against computational law (Notes on Hildebrandt, Smart Technologies, Sections 7.3-7.4)

Many thanks to Jake Goldenfein for discussing this reading with me and coaching me through interpreting it in preparation for writing this post.

Following up on the discussion of sections 7.1-7.2 of Hildebrandt’s Smart Technologies and the End(s) of Law (2015), this post discusses the next two sections. The main questions left from the last section are:

  • How strong is Hildebrandt’s defense of the Rule of Law, as she explicates it, as worth preserving despite the threats to it that she acknowledges from smart technologies?
  • Is the instrumental power of smart technology (i.e., its predictive function, which for the sake of argument we will accept is more powerful than unassisted human prognostication) somehow a substitute for Law, as in its pragmatist conception?

In sections 7.3-7.4, Hildebrandt discusses the eponymous ends of law. These are not its functions as could be externally and sociologically validated, but rather its internally recognized goals or purposes. And these are not particular goals, such as environmental justice, that we might want particular laws to achieve. Rather, these are abstract goals that the law as an entire ‘regime of veridiction’ aims for. (“Veridiction” means “a statement that is true according to the worldview of a particular subject, rather than objectively true.” The idea is that the law has a coherent worldview of its own.)

Hildebrandt’s description of law is robust and interesting. Law “articulates legal conditions for legal effect.” Legal personhood (a condition) entails certain rights under the law (an effect). These causes-and-effects are articulated in language, and this language does real work. In Austin’s terminology, legal language is performative–it performs things at an institutional and social level. Relatedly, the law is experienced as a lifeworld, or Welt, but not a monolithic lifeworld that encompasses all experience; it is one of many worlds that we use to navigate reality, a ‘mode of existence’ that ‘affords specific roles, actors and actions while constraining others’. [She uses Latour to make this point, which in my opinion does not help.] It is interesting to compare this view of society with Nissenbaum’s (2009) view of society differentiated into spheres, constituted by actor roles and norms.

In section 7.3.2, Hildebrandt draws on Gustav Radbruch for his theory of law. Consistent with her preceding arguments, she emphasizes that for Radbruch, law is antinomian (a strange term), meaning that it is internally contradictory and unruly with respect to its aims. And there are three such aims that are in tension:

  • Justice. Here, justice is used rather narrowly to mean that equal cases should be treated equally. In other words, the law must be applied justly/fairly across cases. To use her earlier framing, justice/equality implied that legal conditions cause legal effects in a consistent way. In my gloss, I would say this is equivalent to the formality of law, in the sense that the condition-effect rules must address the form of a case, and not treat particular cases differently. More substantively, Hildebrandt argues that Justice breaks down into more specific values: distributive justice, concerning the fair distribution of resources across society, and corrective justice, concerning the righting of wrongs through, e.g., torts.
  • Legal certainty. Legal rules must be binding and consistent, whether or not they achieve justice or purpose. “The certainty of the law requires its positivity; if it cannot be determined what is just, it must be decided what is lawful, and this from a position that is capable of enforcing the decision.” (Radbruch). Certainty about how the law will be applied, whether or not the application of the law is just (which may well be debated), is a good in itself. [A good example of this is law in business, which is famously one of the conditions for the rise of capitalism.]
  • Purpose. Beyond just/equal application of the law across cases and its predictable positivity, the law aims at other purposes such as social welfare, redistribution of income, guarding individual and public security, and so on. None of these purposes is inherent in the law, for Radbruch; but in his conception of law, by its nature it is directed by democratically determined purposes and is instrumental to them. These purposes may flesh out the normative detail that’s missing in a more abstract view of law.

Two moves by Hildebrandt in this section seem particularly substantial to her broader argument and corpus of work.

The first is the emphasis on the contrast between the antinomian conflict among justice, certainty, and purpose and the principle of legal certainty itself. Law, at any particular point in time, may fall short of justice or purpose, and must nevertheless be predictably applied. It also needs to be able to evolve towards its higher ends. This, for Hildebrandt, reinforces the essentially ambiguous and linguistic character of law.

[Radbruch] makes it clear that a law that is only focused on legal certainty could not qualify as law. Neither can we expect the law to achieve legal certainty to the full, precisely because it must attend to justice and to purpose. If the attribution of legal effect could be automated, for instance by using a computer program capable of calculating all the relevant circumstances, legal certainty might be achieved. But this can only be done by eliminating the ambiguity that inheres in human language: it would reduce interpretation to mindless application. From Radbruch’s point of view this would fly in the face of the cultural, value-laden mode of existence of the law. It would refute the performative nature of law as an artificial construction that depends on the reiterant attribution of meaning and decision-making by mindful agents.

Hildebrandt, Smart Technologies, p. 149

The other move that seems particular to Hildebrandt is the connection she draws between purpose as one of the three primary ends of law and purpose-binding as a feature of governance. The latter has particular relevance to technology law through its use in data protection, such as in the GDPR (which she addresses elsewhere in work like Hildebrandt, 2014). The idea here is that purposes do not just imply a positive direction of action; they also restrict activity to only those actions that support the purpose. This allows for separate institutions to exist in tension with each other and with a balance of power that’s necessary to support diverse and complex functions. Hildebrandt uses a very nice classical mythology reference here:

The wisdom of the principle of purpose binding relates to Odysseus’s encounter with the Sirens. As the story goes, the Sirens lured passing sailors with the enchantment of their seductive voices, causing their ships to crash on the rocky coast. Odysseus wished to hear their song without causing a shipwreck; he wanted to have his cake and eat it too. While he has himself tied to the mast, his men have their ears plugged with beeswax. They are ordered to keep him tied tight, and to refuse any orders he gives to the contrary, while being under the spell of the Sirens as they pass their island. And indeed, though he is lured and would have caused death and destruction if his men had not been so instructed, the ship sails on. This is called self-binding. But it is more than that. There is a division of tasks that prevents him from untying himself. He is forced by others to live by his own rules. This is what purpose binding does for a constitutional democracy.

Hildebrandt, Smart Technologies, p. 156

I think what’s going on here is that Hildebrandt understands that actually getting the GDPR enforced over the whole digital environment is going to require a huge extension of the powers of law over business, organization, and individual practice. From some corners, there’s pessimism about the viability of the European data protection approach (Koops, 2014), arguing that it can’t really be understood or implemented well. Hildebrandt is making a big bet here, essentially saying: purpose-binding on data use is just a natural part of the power of law in general, as a socially performed practice. There’s nothing contingent about purpose-binding in the GDPR; it’s just the most recent manifestation of purpose as an end of law.

Commentary

It’s pretty clear what the agenda of this work is. Hildebrandt is defending the Rule of Law as a social practice of lawyers using admittedly ambiguous natural language over the ‘smart technologies’ that threaten it. This involves both a defense of law as being intrinsically about lawyers using ambiguous natural language, and the power of that law over businesses, etc. For the former, Hildebrandt invokes Radbruch’s view that law is antinomian. For the second point, she connects purpose-binding to purpose as an end of law.

I will continue to play the skeptic here. As is suggested in the quoted passage, if one takes legal certainty seriously, then one could easily argue that software code leads to more certain outcomes than natural language based rulings. Moreover, to the extent that justice is a matter of legal formality (attention to the form of cases, excluding irrelevant content from consideration), that too weighs in favor of articulating law in formal logic, which is relatively easy to translate into computer code.
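
To make the skeptic’s point concrete, here is a deliberately toy sketch (my own illustration, not anything Hildebrandt proposes and not any real legal system) of a condition-effect rule rendered as code. The mechanical translation is easy, which is exactly why the defense of natural-language law has to rest on something other than translatability:

# A toy "legal condition -> legal effect" rule. Everything here is
# hypothetical and grossly simplified; the interpretive work that the
# argument above is about is precisely what this encoding leaves out.
from dataclasses import dataclass

@dataclass
class Person:
    age: int
    is_legal_person: bool  # a legal condition

def may_enter_contract(p: Person) -> bool:
    """A legal effect (capacity to contract) follows mechanically from the conditions."""
    return p.is_legal_person and p.age >= 18

print(may_enter_contract(Person(age=30, is_legal_person=True)))  # True
print(may_enter_contract(Person(age=15, is_legal_person=True)))  # False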

Hildebrandt seems to think that there is something immutable about computer code, in a way that natural language is not. That’s wrong. Software is not built like bridges; software today is written by teams working rapidly to adapt it to many demands (Gürses and Hoboken, 2017). Recognizing this removes one of the major planks of Hildebrandt’s objection to computational law.

It could be argued that “legal certainty” implies a form of algorithmic interpretability: the key question is “certain for whom”. An algorithm that is opaque due to its operational complexity (Burrell, 2016) could, as an implementation of a legal decision, be less predictable to non-specialists than a simpler algorithm. So the tension in a lot of ‘algorithmic accountability’ literature between performance and interpretability would then play directly into the tension, within law, between purpose/instrumentality and certainty-to-citizens.

Overall, the argument here is not compelling yet as a refutation of the idea of law implemented as software code.

As for purpose-binding and the law, I think this may well be the true crux. I wonder if Hildebrandt develops it later in the book. There are not a lot of good computer science models of purpose binding. Tschantz, Datta, and Wing (2012) do a great job mapping out the problem but that research program has not resulted in robust technology for implementation. There may be deep philosophical/mathematical reasons why that is so. This is an angle I’ll be looking out for in further reading.
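
For a sense of why this is hard to operationalize, here is a deliberately naive sketch (my own illustration, not the Tschantz, Datta, and Wing formalism; the categories and purposes are invented) of a purpose-tagged access check. The policy lookup is the easy part; the open problem is verifying that downstream processing actually serves the declared purpose:

# A naive purpose-binding check: comparing a declared purpose against a
# policy is trivial, but whether the downstream code really acts "for"
# that purpose is taken on faith here, and that is the research problem.
ALLOWED_PURPOSES = {
    "location_data": {"navigation", "emergency_response"},
    "email_address": {"account_management"},
}

def access_allowed(data_category: str, declared_purpose: str) -> bool:
    return declared_purpose in ALLOWED_PURPOSES.get(data_category, set())

print(access_allowed("location_data", "navigation"))    # True
print(access_allowed("location_data", "ad_targeting"))  # False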

References

Burrell, Jenna. “How the machine ‘thinks’: Understanding opacity in machine learning algorithms.” Big Data & Society 3.1 (2016): 2053951715622512.

Gürses, Seda, and Joris Van Hoboken. “Privacy after the agile turn.” The Cambridge Handbook of Consumer Privacy. Cambridge Univ. Press, 2017. 1-29.

Hildebrandt, Mireille. “Location Data, Purpose Binding and Contextual Integrity: What’s the Message?.” Protection of Information and the Right to Privacy-A New Equilibrium?. Springer, Cham, 2014. 31-62.

Hildebrandt, Mireille. Smart technologies and the end (s) of law: novel entanglements of law and technology. Edward Elgar Publishing, 2015.

Koops, Bert-Jaap. “The trouble with European data protection law.” International Data Privacy Law 4.4 (2014): 250-261.

Nissenbaum, Helen. Privacy in context: Technology, policy, and the integrity of social life. Stanford University Press, 2009.

Tschantz, Michael Carl, Anupam Datta, and Jeannette M. Wing. “Formalizing and enforcing purpose restrictions in privacy policies.” 2012 IEEE Symposium on Security and Privacy. IEEE, 2012.

by Sebastian Benthall at March 14, 2019 11:41 PM

March 08, 2019

adjunct professor

Comments on the CCPA

I filed the following comments today on the CCPA to the CA AG.

March 8, 2019

VIA Email

California Department of Justice
ATTN: Privacy Regulations Coordinator
300 S. Spring St.
Los Angeles, CA 90013

Re: Comments on Assembly Bill 375, the California Consumer Privacy Act of 2018

Dear Attorney General Becerra,

I helped conceive of the high-level policy goals of the privacy initiative that was withdrawn from the ballot with passage of AB 375. Here I provide comment to give context and explain the high-level policy goals of the initiative, in hopes that it helps your office in contemplating regulations for the CCPA.

Strong policy support for the initiative

As you interpret the CCPA, please bear in mind that the initiative would have passed because Americans care about privacy. In multiple surveys, Americans have indicated support for stronger privacy law and dramatic enforcement. Americans have rarely been able to vote directly on privacy, but when they do, they overwhelmingly support greater protections. One example comes from a 2002 voter referendum in North Dakota where 73% of citizens voted in favor of establishing opt-in consent protections for the sale of financial records.[1]

A series of surveys performed at Berkeley found that Americans wanted strong penalties for privacy transgressions. When given options for possible privacy fines, 69% chose the largest option offered, “more than $2,500,” when “a company purchases or uses someone’s personal information illegally.” When probed for nonfinancial penalties, 38% wanted companies to fund efforts to help consumers protect their privacy, while 35% wanted executives to face prison terms for privacy violations.

Information is different

The CCPA is unusually stringent compared to other regulatory law because information is different from other kinds of services and products. When a seller makes an automobile or a refrigerator, the buyer can inspect it, test it, and so on. It is difficult for the seller to change a physical product. Information-intensive services, however, are changeable and abstract, and since we have no physical experience with information, consumers cannot easily see their flaws and hazards in the way one could see an imperfection in a car’s hood.

Because information services can be changed, privacy laws tend to become stringent. Information companies have a long history of changing digital processes to trick consumers and to evade privacy laws in ways that physical product sellers simply could not.[2] 

Some of the CCPA’s most derided provisions (e.g. application to household-level data) are responses to specific industry evasions, made possible because information is different from physical products as an object of regulation. Here are common examples:

  • Sellers claim not to sell personal data to third parties, but then go on to say we “may share information that our clients provide with specially chosen marketing partners.”[3] For this reason, the initiative tightened definitions and required more absolute statements about data selling. Companies shouldn’t use the word “partner” or “service provider” to describe third party marketers.
  • Companies have evaded privacy rules by mislabeling data “household-level information.” For instance, the DMA long argued that phone numbers were not personal data because they were associated with a household.
  • Many companies use misleading, subtle techniques to identify people. For instance, retailers asked consumers their zip code and used this in combination with their name from credit card swipes to do reverse lookups at data brokers.[4]
  • Information companies use technologies such as hash-matching to identify people using “non personal” data.[5]

Careful study of information-industry tricks informed the initiative and resulted in a definitional landscape that attempts to prevent guile. Those complaining about it need only look to the industry’s own actions to understand why these definitions are in place. For your office, this means that regulations must anticipate guile and opportunistic limitations of Californians’ rights.

The advantages of privacy markets

Creating markets for privacy services was a major goal of the initiative. The ability to delegate opt out rights, for instance, was designed so that Californians could pay a for-profit company (or even donate to a non-profit such as EFF) in order to obtain privacy services.

There are important implications of this: first, the market-establishing approach means that more affluent people will have more privacy. This sounds objectionable at first, but it is a pragmatic and ultimately democratizing pro-privacy strategy. A market for privacy cannot emerge without privacy regulation to set a floor for standards and to make choices enforceable. Once privacy services emerge, because they are information services and because they can scale, privacy services will become inexpensive very quickly. For instance, credit monitoring and fraud alert services are only available because of rights given to consumers in the Fair Credit Reporting Act that can be easily invoked by third party privacy services. These services have become very inexpensive and are used by tens of millions of Americans.

Some will argue that the CCPA will kill “free” business models and that this will be iniquitous. This reasoning underestimates the power of markets and presents “free” as the only viable model for news. The reality is much more complex. Digital advertising supported services do democratize news access; however, they also degrade quality. One cost of the no-privacy, digital advertising model is fake news. Enabling privacy will improve quality and this could have knock-on effects.

Second, the market strategy relieves pressure on your office. The market strategy means that the AG does not have to solve all privacy problems. (That is an impossible standard to meet, and perfection has become a standard that prevents us from having any privacy.)

Instead, the AG need only set ground rules that allow pro-privacy services to function effectively. A key ground rule that you should promote is a minimally burdensome verification procedure, so that pro-privacy services can scale and can easily deliver opt out requests. For instance, in the telemarketing context, the FTC made enrolling in the Do-Not-Call Registry simple because it understood that complexifying the process would result in lower enrollment.

There is almost no verification to enroll in the Do-Not-Call Registry and this is a deliberate policy choice. One can enroll by simply calling from the phone number to be enrolled, or by visiting a website and getting a round-trip email. What this means is that online, a consumer can enroll any phone number, even one that is not theirs, so long as they provide an email address. The FTC does not run email/phone number verification.

The low level of verification in the Do-Not-Call Registry is a reflection of two important policy issues: first, excessive verification imposes transaction costs on consumers, and these costs are substantial. Second, the harm of false registrations is so minimal that it is outweighed by the interest in lowering consumer transaction costs. Most people are honest and there is no evidence of systematic false registrations in the Do-Not-Call Registry. More than 200 million numbers are now enrolled.

The AG should look to the FTC’s approach and choose a minimally invasive verification procedure for opt out requests that assumes 1) that most Californians are honest people and will not submit opt out requests without authority, and 2) that verification stringency imposes a real, quantifiable cost on consumers. That cost to consumers is likely to outweigh the interest of sellers to prevent false registrations. In fact, excessive verification could kill the market for privacy services and deny consumers the benefit of the right to opt out. A reasonable opt out method would be one where a privacy service delivers a list of identifiable consumers to a business, for instance through an automated system, or simply a spreadsheet of names and email addresses.

The AG should look to Catalog Choice as a model for opt outs. Catalog Choice has carefully collected all the opt out mechanisms for paper mail marketing catalogs. A consumer can sign up on the site, identify catalogs to opt out from (9,000 of them!), and Catalog Choice sends either an automated email or a structured list of consumers to sellers to effectuate the opt out. This service is free. Data feeds from Catalog Choice are even recognized by data brokers as a legitimate way for consumers to stop unwanted advertising mail. Catalog Choice performs no verification of consumer identity. Again, this is acceptable, because the harm of a false opt-out is negligible, and because deterring that harm would make it impossible for anyone to opt out efficiently.

I served on the board of directors of Catalog Choice for years and recall no incidents of fraudulent opt outs. The bigger problem was with sellers who simply would not accept opt outs. A few would summarily deny them for no reason other than that allowing people to opt out harmed their business model, or they would claim that Catalog Choice needed a power of attorney to communicate a user’s opt out. The AG should make a specific finding that a power of attorney or any other burdensome procedure is not necessary for delivering verified opt out requests.

The AG should assume that sellers will use guile to impose costs on opt out requests and to deter them. Recall that when consumer reporting agencies were required to create a free credit report website, CRAs used technical measures to block people from linking to it, so that the consumer had to enter the URL to the website manually. CRAs also set up confusing, competing sites to draw consumers away from the free one. The FTC actually had to amend its rule to require this disclosure on all “free” report sites.

The definition of sell

The definition of sell in the CCPA reflects the initiative’s broad policy goal of stopping guile in data “sharing.”

From a consumer perspective, any transfer of personal information to a third party for consideration is a sale (subject to exceptions for transactional necessity, etc). But the information industry has interpreted “sale” to only mean transfers for money consideration. That is an unfounded, ahistorical interpretation.

The initiative sought to reestablish the intuitive contract law rule that any transfer for value is the “consideration” that makes a data exchange a sale. In the information industry’s case, that valuable consideration is often a barter exchange. For instance, in data cooperatives, sellers input their own customer list into a database in exchange for other retailers’ data.[6] Under the stilted definition of “sale” promoted by the information industry, that is not data selling. But from a consumer perspective, such cooperative “sharing” has the same effect as a “sale.”

Recent reporting about Facebook makes these dynamics clearer in the online platform context.[7] Properly understood, Facebook sold user data to application developers. If application developers enabled “reciprocity” or if developers caused “engagement” on the Facebook platform, Facebook would give developers access to personal data. From a consumer perspective, users gave their data to Facebook, and Facebook transferred user data to third parties, in exchange for activity that gave economic benefit to Facebook. That’s a sale. The AG should view transfers of personal information for value, including barter and other exchange, as “valuable consideration” under the CCPA. Doing so will make the marketplace more honest and transparent.

Disclosures that consumers understand

Over 60% of Americans believe that if a website has a privacy policy, it cannot sell data to third parties.[8]

I have come to the conclusion, based on a series of six large-scale consumer surveys and the extensive survey work of Alan Westin, that the term “privacy policy” is inherently misleading. Consumers do not read privacy policies. They see a link to the privacy policy, and they conclude “this website must have privacy.” My work is consonant with Alan Westin’s, who, over decades of surveys, repeatedly found that most consumers think businesses handle personal data in a “confidential way.” Westin’s findings imply that consumers falsely believe that there is a broad norm against data selling.

In writing consumer law, one cannot take a lawyer’s perspective. Consumers do not act, nor do they think, like lawyers. Lawyers think the issue is as simple as reading a disclosure. But to the average person, the mere presence of a “privacy policy” means something substantive. It looks more like a quality seal (e.g. “organic”) than an invitation to read.

This is why the initiative and the CCPA go to such extraordinary measures to inform consumers with “Do not sell my personal information” disclosures. Absent such a clear and dramatic disclosure, consumers falsely assume that sellers have confidentiality obligations.

The CCPA is trying to thread a needle between not violating commercial speech interests and disabusing consumers of data selling misconceptions. These competing interests explain why the CCPA is opt-out for data selling. CCPA attempts to minimize impingement on commercial free speech (in the form of data selling) while also informing consumers of businesses’ actual practices.

Let me state this again: the government interest in commanding the specific representation “Do not sell my personal information” is necessary both 1) to disabuse consumers of the false belief that services are prohibited from selling their data, and 2) to tell consumers directly that they have to take action and exercise the opt out under the CCPA. It would indeed make more sense from a consumer perspective for the CCPA to require affirmative consent. But since that may be constitutionally problematic, the CCPA has taken an opt out approach, along with a strong statement to help consumers understand their need to take action. Without a visceral, dramatic disclosure, consumers will not know that they need to act to protect their privacy. Your regulatory findings should recite these value conflicts, and the need for compelled speech in order to correct a widespread consumer misconception.

Data brokers and opting out

Vermont law now requires data brokers to register, and its registry should help Californians locate opt out opportunities. However, the AG can further assist in this effort by requiring a standardized textual disclosure that is easy to find using search engines. Standardization is important because businesses tend to develop arbitrary terminology that has no meaning outside the industry. Text is important because it is easier to search for words than for images, and because logo-based “buttons” carry arbitrary or even conflicting semiotic meaning.

Non-discrimination norms

Section 125 of the CCPA is the most perplexing, yet it is harmonious with the initiative’s overall intent to create markets. My understanding is that §125 seeks to do two things. First, it prevents platforms such as Facebook from offering a price that diverges widely from costs. For instance, Facebook claims its average revenue per user (ARPU) in North America is about $100/year, or roughly $8/month; the CCPA seeks to prevent Facebook from charging fees greatly in excess of that, on the order of $10/month. Thus, the AG could look to ARPU as a peg for defining unreasonable incentive practices. Second, §125 attempts to prevent the spread of surveillance capitalism business models into areas where information usually is not at play, for instance at bricks and mortar businesses.

One area to consider under §125 is the growing number of businesses that reject cash payment. These businesses are portrayed as progressive, but the practice is actually regressive: consumers spend more when they use plastic, the practice excludes the unbanked, it subjects consumers to more security breaches, and it imposes a roughly 3% fee on all transactions. Consumers probably do not understand that modern payment systems can reidentify them and build marketing lists. The privacy implications of digital payments are neither disclosed nor mitigated, and as such, bricks and mortar businesses that demand digital payment may be coercive under the CCPA.

Pro-privacy incentives

Privacy laws present a paradox: schemes like the GDPR can induce companies to use data more rather than less. This is because the GDPR’s extensive data mapping and procedural rules may end up highlighting unrealized information uses. The CCPA can avoid this by creating carrots for privacy-friendly business models, something that the GDPR does not do.

The most attractive carrot for companies is an exception that broadly relieves them of CCPA duties. The AG should make the short-term, transient-use exemption the most attractive and usable one. That exception should be interpreted broadly and be readily usable by those acting in good faith. For instance, short-term uses should be interpreted to include retention of up to 13 months so long as the data are not repurposed. The broad policy goals of the CCPA are met where an exception gives companies strong pro-privacy incentives. There is no better incentive than encouraging companies to collect only the data they need for transactions, and to keep it only for the time needed for anti-fraud, seasonal sales trend analysis, and other service-related purposes. For many businesses, this period is just in excess of one year.

Respectfully submitted,

/Chris Hoofnagle

Chris Jay Hoofnagle*
Adjunct full professor of information and of law
UC Berkeley
*Affiliation provided for identification purposes only

[1] North Dakota Secretary of State, Statewide Election Results, June 11, 2002.

[2] Hoofnagle et al., Behavioral Advertising: The Offer You Can’t Refuse, 6 Harv. L. & Pol’y Rev. 273 (2012).

[3] Jan Whittington & Chris Hoofnagle, Unpacking Privacy’s Price, 90 N.C. L. Rev. 1327 (2011).

[4] Pineda v. Williams Sonoma, 51 Cal.4th 524, 2011 WL 446921.

[5] https://www.clickz.com/what-acxiom-hash-figured-out/31429/ and https://developer.myacxiom.com/code/api/endpoints/hashed-entity

[6] From Nextmark.com: “co-operative (co-op) database

a prospecting database that is sourced from many mailing lists from many different sources. These lists are combined, de-duplicated, and sometimes enhanced to create a database that can then be used to select prospects. Many co-op operators require that you put your customers into the database before you can receive prospects from the database.”

[7] Chris Hoofnagle, Facebook and Google Are the New Data Brokers, Cornell Digital Life Initiative (2018) https://www.dli.tech.cornell.edu/blog/facebook-and-google-are-the-new-data-brokers

[8] Chris Jay Hoofnagle and Jennifer M. Urban, Alan Westin’s Privacy Homo Economicus, 49 Wake Forest Law Review 261 (2014).

by chris at March 08, 2019 11:47 PM

February 26, 2019

Ph.D. student

Response to Abdurahman

Abdurahman has responded to my response to her tweet about my paper with Bruce Haynes, and invited me to write a rebuttal. While I’m happy to do so–arguing with intellectuals on the internet is probably one of my favorite things to do–it is not easy to rebut somebody with whom you have so little disagreement.

Abdurahman makes a number of points:

  1. Our paper, “Racial categories in machine learning”, omits the social context in which algorithms are enacted.
  2. The paper ignores whether computational thinking “acolytes like [me]” should be in the position of determining civic decisions.
  3. That the ontological contributions of African American Vernacular English (AAVE) are not present in the FAT* conference and that constitutes a hermeneutic injustice. (I may well have misstated this point).
  4. The positive reception to our paper may be due to its appeal to people with a disingenuous, lazy, or uncommitted racial politics.
  5. “Participatory design” does not capture Abdurahman’s challenge of “peer” design. She has a different and more broadly encompassing set of concerns: “whose language is used, whose viewpoint and values are privileged, whose agency is extended, and who has the right to frame the “problem”.”
  6. That our paper misses the point about predictive policing, from the perspective of people most affected by disparities in policing. Machine learning classification is not the right frame of the problem. The problem is an unjust prison system and, more broadly the unequal distribution of power that is manifested in the academic discourse itself. “[T]he problem is framed wrongly — it is not just that classification systems are inaccurate or biased, it is who has the power to classify, to determine the repercussions / policies associated thereof and their relation to historical and accumulated injustice?”

I have to say that I am not a stranger to most of this line of thought and have great sympathy for the radical position expressed.

I will continue to defend our paper. Re: point 1, a major contribution of our paper was that it shed light on the political construction of race, especially race in the United States, which is absolutely part of “the social context in which algorithmic decision making is enacted”. Abdurahman must be referring to some other aspect of the social context. One problem we face as academic researchers is that the entire “social context” of algorithmic decision-making is the whole frickin’ world, and conference papers are about 12 pages or so. I thought we did a pretty good job of focusing on one, important and neglected aspect of that social context, the political formation of race, which as far as I know has never previously been addressed in a computer science paper. (I’ve written more about this point here).

Re: point 2, it’s true we omit a discussion of the relevance of computational thinking to civic decision-making. That is because this is a safe assumption to make in a publication to that venue. I happen to agree with that assumption, which is why I worked hard to submit a paper to that conference. If I didn’t think computational thinking was relevant, I probably would be doing something else with my time. That said, I think it’s wildly flattering and inaccurate to say that I, personally, have any control over “civic decision-making”. I really don’t, and I’m not sure why you’d think that, except for the erroneous myth that computer science research is, in itself, political power. It isn’t; that’s a lie that the tech companies have told the world.

I am quite aware (re: point 3) that my embodied and social “location” is quite different from Abdurahman’s. For example, unlike Abdurahman, it would be utterly pretentious for me to posture or “front” with AAVE. I simply have no access to its ontological wisdom, and could not be the conduit of it into any discourse, academic or informal. I have and use different resources; I am also limited by my positionality like anybody else. Sorry.

“Woke” white liberals potentially liking our argument? (Re: point 4) Fair. I don’t think that means our argument is bad or that the points aren’t worth making.

Re: point 5: I must be forgiven for not understanding the full depth of Abdurahman’s methodological commitments on the basis of a single tweet. There are a lot of different design methodologies, and their boundaries are disputed. I see now that the label of “participatory design” is not sufficiently critical or radical to capture what she has in mind. I’m pleased to see she is working with Tap Parikh on this, who has a lot of experience with critical/radical HCI methods. I’m personally not an expert on any of this stuff. I do different work.

Re: point 6: My personal opinions about the criminal justice system did not make it into our paper, which again was a focused scientific article trying to make a different point. Our paper was about how racial categories are formed, how they are unfair, and how a computational system designed for fairness might address that problem. I agree that this approach is unlikely to have much meaningful impact on the injustices of the cradle-to-prison system in the United States, the prison-industrial complex, or the like. Based on what I’ve heard so far, the problems there would be best solved by changing the ways judges are trained. I don’t have any say in that, though–I don’t have a law degree.

In general, while I see Abdurahman’s frustrations as valid (of course!), I think it’s ironic and frustrating that she targets our paper as an emblem of the problems with the FAT* conference, with computer science, and with the world at large. First, our paper was not a “typical” FAT* paper; it was a very unusual one, positioned to broaden the scope of what’s discussed there, motivated in part by my own criticisms of the conference the year before. It was also just one paper: there’s tons of other good work at that conference, and the conversation is quite broad. I expect the best solution to the problem is to write and submit different papers. But it may also be that other venues are better for addressing the problems raised.

I’ll conclude that many of the difficulties and misunderstandings that underlie our conversation are a result of a disciplinary collapse that is happening because of academia’s relationship with social media. Language’s meaning depends on its social context, and social media is notoriously a place where contexts collapse. It is totally unreasonable to argue that everybody in the world should be focused on what you think is most important. In general, I think battles over “framing” on the Internet are stupid, and that the fact that these kinds of battles have become so politically prominent is a big part of why our society’s politics are so stupid. The current political emphasis on the symbolic sphere is a distraction from more consequential problems of economic and social structure.

As I’ve noted elsewhere, one reason why I think Haynes’s view of race is refreshing (as opposed to a lot of what passes for “critical race theory” in popular discussion) is that it locates the source of racial inequality in structure–spatial and social segregation–and institutional power–especially, the power of law. In my view, this politically substantive view of race is, if taken seriously, more radical than one based on mere “discourse” or “fairness” and demands a more thorough response. Codifying that response, in computational thinking, was the goal of our paper.

This is a more concrete and specific way of dealing with the power disparities that are at the heart of Abdurahman’s critique. Vague discourse and intimations about “privilege”, “agency”, and “power”, without an account of the specific mechanisms of that power, are weak.

by Sebastian Benthall at February 26, 2019 04:14 PM

February 23, 2019

Ph.D. student

Beginning to read “Smart Technologies and the End(s) of Law” (Notes on: Hildebrandt, Smart Technologies, Sections 7.1-7.2)

I’m starting to read Mireille Hildebrandt‘s Smart Technologies and the End(s) of Law (2015) at the recommendation of several friends with shared interests in privacy and the tensions between artificial intelligence and the law. As has been my habit with other substantive books, I intend to blog my notes from reading as I get to it, in sections, in a perhaps too stream-of-consciousness, opinionated, and personally inflected way.

For reasons I will get to later, Hildebrandt’s book is a must-read for me. I’ve decided to start by jumping in on Chapter 7, because (a) I’m familiar enough with technology ethics, AI, and privacy scholarship to think I can skip that and come back as needed, and (b) I’m mainly reading because I’m interested in what a scholar of Hildebrandt’s stature says when she tackles the tricky problem of law’s response to AI head on.

I expect to disagree with Hildebrandt in the end. We occupy different social positions and, as I’ve argued before, people’s positions on various issues of technology policy appear to have a great deal to do with their social position or habitus. However, I know I have a good deal to learn about legal theory, while having enough background in philosophy and social theory to parse through what Hildebrandt has to offer. And based on what I’ve read so far, I expect the contours of the possible positions that she draws out to be totally groundbreaking.

Notes on: Hildebrandt, Smart Technologies, §7.1-7.2

“The third part of this book inquires into the implications of smart technologies and data-driven agency for the law.”

– Hildebrandt, Smart Technologies, p.133

Lots of people write about how artificial intelligence presents an existential threat. Normally, they are talking about how a superintelligence is posing an existential threat to humanity. Hildebrandt is arguing something else: she is arguing that smart technologies may pose an existential threat to the law, or the Rule of Law. That is because the law’s “mode of existence” depends on written text, which is a different technical modality, with different affordances, than smart technology.

My take is that the mode of existence of modern law is deeply dependent upon the printing press and the way it has shaped our world. Especially the binary character of legal rules, the complexity of the legal system and the finality of legal decisions are affordances of — amongst things — the ICI [information and communication infrastructure] of the printing press.

– Hildebrandt, Smart Technologies, p.133

This is just so on point, it’s hard to know what to say. I mean, this is obviously on to something. But what?

To make her argument, Hildebrandt provides a crash course in philosophy of law and legal theory, distinguishing a number of perspectives that braid together into an argument. She discusses several different positions:

  • 7.2.1 Law as an essentially contested concept (Gallie). The concept of “law” [1] denotes something valuable, [2] covers intricate complexities, that makes it [3] inherently ambiguous and [4] necessarily vague. This [5] leads interested parties into contest over conceptions. The contest is [6] anchored in past, agreed upon exemplars of the concept, and [7] the contest itself sustains and develops the concept going forward. This is the seven-point framework of an “essentially contested concept”.
  • 7.2.2 Formal legal positivism. Law as a set of legal rules dictated by a sovereign (as opposed to law as a natural moral order) (Austin). Law as a coherent set of rules, defined by its unity (Kelsen). A distinction between substantive rules and rules about rule-making (Hart).
  • 7.2.3 Hermeneutic conceptions. The practice of law is about the creative interpretation of texts (case law, statutes, etc.) and their application to new cases. The integrity of law (Dworkin) constrains this interpretation, but the projection of legal meaning into the future is part of the activity of legal practice. Judges “do things with words”–make performative utterances through their actions. Law is not just a system of rules, but a system of meaningful activity.
  • 7.2.3 Pragmatist conceptions (realist legal positivism). As opposed to the formal legal positivism discussed earlier, which sees law as rules, realist legal positivism sees law as a sociological phenomenon. Law is “prophecies of what the courts will do in fact, and nothing more pretentious” (Holmes). Pragmatism, as an epistemology, argues that the meaning of something is its practical effect; this approach could be seen as a constrained version of the hermeneutic concept of law.

To summarize Hildebrandt’s gloss on this material so far: Gallie’s “essentially contested concept” theory is doing the work of setting the stage for Hildebrandt’s self-aware intervention into the legal debate. Hildebrandt is going to propose a specific concept of the law, and of the Rule of Law. She is doing this well aware that this act of scholarship is engaging in contest.

Punchline

I detect in Hildebrandt’s writing a sympathy or preference for hermeneutic approaches to law. Indeed, by opening with Gallie, she sets up the contest about the concept of law as something internal to the hermeneutic processes of the law. These processes, and this contest, are about texts; the proliferation of texts is due to the role of the printing press in modern law. There is a coherent “integrity” to this concept of law.

The most interesting discussion, in my view, is loaded into what reads like an afterthought: the pragmatist conception of law. Indeed, even at the level of formatting, pragmatism is buried: hermeneutic and pragmatist conceptions of law are combined into one section (7.2.3), whereas Gallie and the formal positivists each get their own section (7.2.1 and 7.2.2).

This is odd, because the resonances between pragmatism and ‘smart technology’ are, in Hildebrandt’s admission, quite deep:

Basically, Holmes argued that law is, in fact, what we expect it to be, because it is this expectation that regulates our actions. Such expectations are grounded in past decisions, but if these were entirely deterministic of future decisions we would not need the law — we could settle for logic and simply calculate the outcome of future decisions. No need for interpretation. Holmes claimed, however, that ‘the life of law has not been logic. It has been experience.’ This correlates with a specific conception of intelligence. As we have seen in Chapter 2 and 3, rule-based artificial intelligence, which tried to solve problems by means of deductive logic, has been superseded by machine learning (ML), based on experience.

– Hildebrandt, Smart Technologies, p.142

Hildebrandt considers this connection between pragmatist legal interpretation and machine learning only to reject it summarily in a single paragraph at the end of the section.

If we translate [a maxim of classical pragmatist epistemology] into statistical forecasts we arrive at judgments resulting from ML. However, neither logic nor statistics can attribute meaning. ML-based court decisions would remove the fundamental ambiguity of human language from the centre stage of the law. As noted above, this ambiguity is connected with the value-laden aspect of the concept of law. It is not a drawback of natural language, but what saves us from acting like mindless agents. My take is that an approach based on statistics would reduce judicial and legislative decisions to administration, and thus collapse the Rule of Law. This is not to say that a number of administrative decisions could not be taken by smart computing systems. It is to confirm that such decisions should be brought under the Rule of Law, notably by making them contestable in a court of law.

– Hildebrandt, Smart Technologies, p.143

This is a clear articulation of Hildebrandt’s agenda (“My take is that…”). It also clearly aligns the practice of law with contest, ambiguity, and interpretation, as opposed to “mindless” activity. Natural language’s ambiguity is a feature, not a bug. Narrow pragmatism, which is aligned with machine learning, is a threat to the Rule of Law.

Some reflections

Before diving into the argument, I have to write a bit about my urgent interest in the book. Though I only heard about it recently, my interests have tracked the subject matter for some time.

For some time I have been interested in the connection between philosophical pragmatism and concerns about AI, which I believe can be traced back to Horkheimer. But I thought nobody was giving the positive case for pragmatism its due. At the end of 2015, totally unaware of “Smart Technologies” (my professors didn’t seem aware of it either…), I decided that I would write my doctoral dissertation defending the bold thesis that yes, we should have AI replace the government. A constitution written in source code. I was going to back the argument up with, among other things, pragmatist legal theory.

I had to drop the argument because I could not find faculty willing to be on the committee for such a dissertation! I have been convinced ever since that this is a line of argument that is actually rather suppressed. I was able to articulate the perspective in a philosophy journal in 2016, but had to abandon the topic.

This was probably good in the long run, since it meant I wrote a dissertation on privacy which addressed many of the themes I was interested in, but in greater depth. In particular, working with Helen Nissenbaum I learned about Hildebrandt’s articles comparing contextual integrity with purpose binding in the GDPR (Hildebrandt, 2013; Hildebrandt, 2014), which at the time my mentors at Berkeley seemed unaware of. I am still working on puzzles having to do with algorithmic implementation or response to the law, and likely will for some time.

Recently, I have been working at a law school and have reengaged with the interdisciplinary research community at venues like FAT*. This has led me, seemingly unavoidably, back to what I believe to be the crux of disciplinary tension today: the rising epistemic dominance of pragmatist computational statistics (“data science”) and its threat to humanistic legal authority, which is manifested in the clash of institutions based on each, e.g., iconically, “Silicon Valley” (or Seattle) and the European Union. Because of the explicitly normative aspects of humanistic legal authority, it asserts itself again and again as an “ethical” alternative to pragmatist technocratic power. This is the latest manifestation of a very old debate.

Hildebrandt is the first respectable scholar (a category from which I exclude myself) that I’ve encountered to articulate this point. I have to see where she takes the argument.

So far, however, I think her argument begs the question. Implicitly, the “essentially contested” character of law is due to the ambiguity of natural language and the way in which that necessitates contest over the meaning of words. And so we have a professional class of lawyers and scholars that debates the meaning of words. I believe the regulatory power of this class is what Hildebrandt refers to as “the Rule of Law”.

While it’s true that an alternative regulatory mechanism based on statistical prediction would be quite different from this sense of “Rule of Law”, it is not clear from Hildebrandt’s argument, yet, why her version of “Rule of Law” is better. The only hint of an argument is the problem of “mindless agents”. Is she worried about the deskilling of the legal profession, or the reduced need for elite contest over meaning? What is hermeneutics offering society, outside of the bounds of its own discourse?

References

Benthall, S. (2016). Philosophy of computational social science. Cosmos and History: The Journal of Natural and Social Philosophy, 12(2), 13-30.

Sebastian Benthall. Context, Causality, and Information Flow: Implications for Privacy Engineering, Security, and Data Economics. Ph.D. dissertation. Advisors: John Chuang and Deirdre Mulligan. University of California, Berkeley. 2018.

Hildebrandt, Mireille. “Slaves to big data. Or are we?.” (2013).

Hildebrandt, Mireille. “Location Data, Purpose Binding and Contextual Integrity: What’s the Message?.” Protection of Information and the Right to Privacy-A New Equilibrium?. Springer, Cham, 2014. 31-62.

Hildebrandt, Mireille. Smart technologies and the end (s) of law: novel entanglements of law and technology. Edward Elgar Publishing, 2015.

by Sebastian Benthall at February 23, 2019 04:39 PM

February 17, 2019

Ph.D. student

A few brief notes towards “Procuring Cybersecurity”

I’m shifting research focus a bit and wanted to jot down a few notes. The context for the shift is that I have the pleasure of organizing a roundtable discussion for NYU’s Center for Cybersecurity and Information Law Institute, working closely with Thomas Streinz of NYU’s Guarini Global Law and Tech.

The context for the workshop is the steady feed of news about global technology supply chains and how they are not just relevant to “cybersecurity”, but in some respects are constitutive of cyberinfrastructure and hence the field of its security.

I’m using “global technology supply chains” rather loosely here, but this includes:

  • Transborder personal data flows as used in e-commerce
  • Software- (and Infrastructure-) as-a-Service being marketed internationally (including, for example, Google used abroad)
  • Enterprise software import/export
  • Electronics manufacturing and distribution.

Many concerns about cybersecurity as a global phenomenon circulate around the imagined or actual supply chain. These are sometimes national security concerns that result in real policy, as when Australia recently banned Huawei and ZTE from supplying 5G network equipment for fear that it would provide a vector of interference from the Chinese government.

But the nationalist framing is certainly not the whole story. I’ve heard anecdotally that after the Snowden revelations, Microsoft internally began to see the U.S. government as a cybersecurity “adversary”. Corporate tech vendors naturally don’t want to be known as vectors for national surveillance, as this cuts down on their global market share.

Governments and corporations have different cybersecurity incentives and threat models. These models intersect and themselves create the dynamic cybersecurity field. For example, the Chinese government has viewed foreign software vendors as cybersecurity threats and has responded by mandating source code disclosure. But because such disclosure is a vector of potential IP theft, foreign vendors have balked, seeing the mandate itself as a threat (Ahmed and Weber, 2018). Complicating things further, a defensive “cybersecurity” measure can also serve the goal of protecting domestic technology innovation–which can be framed as providing a nationalist “cybersecurity” edge in the long run.

What, if anything, prevents a total cyberwar of all against all? One answer is trade agreements that level the playing field, or at least establish rules for the game. Another is open technology and standards, which provide an alternative field driven by the benefits of interoperability rather than proprietary interest and secrecy. Is it possible to capture any of this in an accurate model or theory?

I love having the opportunity to explore these questions, as they are at the intersection of my empirical work on software supply chains (Benthall et al., 2016; Benthall, 2017) and also theoretical work on data economics in my dissertation. My hunch for some time has been that there’s a dearth of solid economics theory for the contemporary digital economy, and this is one way of getting at that.

References

Ahmed, S., & Weber, S. (2018). China’s long game in techno-nationalism. First Monday, 23(5). 

Benthall, S., Pinney, T., Herz, J. C., Plummer, K., Benthall, S., & Rostrup, S. (2016). An ecological approach to software supply chain risk management. In 15th Python in Science Conference.

Benthall, S. (2017, September). Assessing software supply chain risk using public data. In 2017 IEEE 28th Annual Software Technology Conference (STC) (pp. 1-5). IEEE.

by Sebastian Benthall at February 17, 2019 01:27 AM

February 09, 2019

Ph.D. student

Why STS is not the solution to “tech ethics”

“Tech ethics” are in (1) (2) (3) and a popular refrain at FAT* this year was that sensitivity to social and political context is the solution to the problems of unethical technology. How do we bring this sensitivity to technical design? Using the techniques of Science and Technology Studies (STS), argue variously Dobbe and Ames, as well as Selbst et al. (2019). Value Sensitive Design (VSD) (Friedman and Bainbridge, 2004) is one typical STS-branded technique for bringing this political awareness into the design process. In general, there is broad agreement that computer scientists should be working with social scientists when developing socially impactful technologies.

In this blog post, I argue that STS is not the solution to “tech ethics” that it tries to be.

Encouraging computer scientists to collaborate with social science domain experts is a great idea. My paper with Bruce Haynes (1) (2) (3) is an example of this kind of work. In it, we drew from sociology of race to inform a technical design that addressed the unfairness of racial categories. Significantly, in my view, we did not use STS in our work. Because the social injustices we were addressing were due to broad reaching social structures and politically constructed categories, we used sociology to elucidate what was at stake and what sorts of interventions would be a good idea.

It is important to recognize that there are many different social sciences dealing with “social and political context”, and that STS, despite its interdisciplinarity, is only one of them. This is easily missed in an interdisciplinary venue in which STS is active, because STS is somewhat activist in asserting its own importance in these venues. STS frequently positions itself as a reminder to blindered technologists that there is a social world out there. “Let me tell you about what you’re missing!” That’s its shtick. Because of this positioning, STS scholars frequently get a seat at the table with scientists and technologists. It’s a powerful position, in a sense.

What STS scholars tend to ignore is how and when other forms of social scientists involve themselves in the process of technical design. For example, at FAT* this year there were two full tracks of Economic Models. Economic Models. Economics is a well-established social scientific discipline that has tools for understanding how a particular mechanism can have unintended effects when put into a social context. In economics, this is called “mechanism design”. It addresses what Selbst et al. might call the “Ripple Effect Trap”–the fact that a system in context may have effects that are different from the intention of designers. I’ve argued before that wiser economics are something we need to better address technology ethics, especially if we are talking about technology deployed by industry, which is most of it! But despite deep and systematic social scientific analysis of secondary and equilibrium effects at the conference, these peer-reviewed works are not acknowledged by STS interventionists. Why is that?

As usual, quantitative social scientists are completely ignored by STS-inspired critiques of technologists and their ethics. That is too bad, because at the scale at which these technologies are operating (mainly, we are discussing civic- or web-scale automated decision making systems that are inherently about large numbers of people), fuzzier debates about “values” and contextualized impact would surely benefit from quantitative operationalization.

The problem is that STS is, at its heart, a humanistic discipline, a subfield of anthropology. If and when STS does not deny the utility or truth or value of mathematization or quantification entirely, as a field of research it is methodologically skeptical about such things. In the self-conception of STS, this methodological relativism is part of its ethnographic rigor. This ethnographic relativism is more or less entirely incompatible with formal reasoning, which aspires to universal internal validity. At a moralistic level, it is this aspiration of universal internal validity that is so bedeviling to the STS scholar: the mathematics are inherently distinct from an awareness of the social context, because social context can only be understood in its ethnographic particularity.

This is a false dichotomy. There are other social sciences that address social and political context without the same restrictive assumptions as STS. Some of these are quantitative, but not all of them are. There are qualitative sociologists and political scientists with great insights into social context who are not disciplinarily allergic to the standard practices of engineering. In many ways, these kinds of social sciences are far more compatible with the process of designing technology than STS! For example, the sociology we draw on in our “Racial categories in machine learning” paper is variously: Gramscian racial hegemony theory, structuralist sociology, Bourdieusian theories of social capital, and so on. Significantly, these theories are not based exclusively on ethnographic method. They are based on disciplines that happily mix historical and qualitative scholarship with quantitative research. The object of study is the social world, and part of the purpose of the research is to develop politically useful abstractions from it that generalize and can be measured. This is the form of social science that is compatible with quantitative policy evaluation, the sort of thing you would want to use if, for example, you wanted to understand the impact of an affirmative action policy.

Given the widely acknowledged truism that public sector technology design often encodes and enacts real policy changes (a point made in Deirdre Mulligan’s keynote), it would make sense to understand the effects of these technologies using the methodologies of policy impact evaluation. That would involve enlisting the kinds of social scientific expertise relevant to understanding society at large!

But that is absolutely not what STS has to offer. STS is, at best, offering a humanistic evaluation of the social processes of technology design. The ontology of STS is flat, and its epistemology and ethics are immediate: the design decision comes down to a calculus of “values” of different “stakeholders”. Ironically, this is a picture of social context that often seems to neglect the political and economic context of that context. It is not an escape from empty abstraction. Rather, it insists on moving from clear abstractions to more nebulous ones, “values” like “fairness”, maintaining that if the conversation never ends and the design never gets formalized, ethics has been accomplished.

This has proven, again and again, to be a rhetorically effective position for research scholarship. It is quite popular among “ethics” researchers that are backed by corporate technology companies. That is quite possibly because the form of “ethics” that STS offers, for all of its calls for political sensitivity, is devoid of political substance. It offers an apples-to-apples comparison of “values”, without considering the social origins of those values and the way those values are grounded in political interests that are not merely about “what we think is important in life”, but real contests over resource allocation. The observation by Ames et al. (2011) that people’s values with respect to technology vary with socio-economic class is a terribly relevant, Bourdieusian lesson in how the standpoint of “values sensitivity” may, when taken seriously, run up against the hard realities of political agonism. I don’t believe STS researchers are truly naive about these points; however, in their rhetoric of design intervention, conducted in labs but isolated from the real conditions of technology firms, there is an idealism that can only survive under the self-imposed severity of STS’s own methodological restrictions.

Independent scholars can take up this position and publish daring pieces, winning the moral high ground. But that is not a serious position to take in an industrial setting, or when pursuing generalizable knowledge about the downstream impact of a design on a complex social system. Those empirical questions require different tools, albeit far more unwieldy ones. Complex survey instruments, skilled data analysis, and substantive social theory are needed to arrive at solid conclusions about the ethical impact of technology.

References

Ames, M. G., Go, J., Kaye, J. J., & Spasojevic, M. (2011, March). Understanding technology choices and values through social class. In Proceedings of the ACM 2011 conference on Computer supported cooperative work (pp. 55-64). ACM.

Friedman, B., & Bainbridge, W. S. (2004). Value sensitive design.

Selbst, A. D., Friedler, S., Venkatasubramanian, S., & Vertesi, J. (2019, January). Fairness and Abstraction in Sociotechnical Systems. In ACM Conference on Fairness, Accountability, and Transparency (FAT*).

by Sebastian Benthall at February 09, 2019 10:14 PM

February 03, 2019

Ph.D. student

A Machine for Being Frustrated


An exploration into new mechanisms for DIY jacquard weaving, together with an ongoing interest in asking how non-human materials or forces can be engaged as collaborators, resulted in the prototype of the wind loom: a modified tapestry loom in which every 4th warp is connected to a sail that moves the warp position in and out. The fabrication of the loom was led by Jen Mah and Rachel Bork, who iterated through several prototypes: laser-cut heddle/hooks that can be attached to the yarn, arms connected to umbrellas that move when the wind blows, easily attachable and detachable components to support easy travel, and so on. Nearly everything about this design process has been frustrating, from the difficulty of waiting for windy days to test to the stress and anticipation that such a wind loom produces. As I considered a redesign, I began to think about this experience of frustration and my almost reflexive response to design it away. It has made me wonder whether collaborating with the wind ought to be frustrating, and whether, if we stuck through the frustration a bit more, we might avoid some of the negative effects we see emerge from innovation. Rather than seeing this as a “wind loom,” I began to think of it as a kind of tool for becoming frustrated and learning how to deal with that emotion. In consolation, you will learn a great deal about the wind patterns in your local region.

More Information:
http://unstable.design/designing-machines-for-human-wind-collaboration/
https://www.instructables.com/id/Wind-Loom/

by admin at February 03, 2019 02:34 AM

February 02, 2019

Ph.D. student

All the problems with our paper, “Racial categories in machine learning”

Bruce Haynes and I were blown away by the reception to our paper, “Racial categories in machine learning“. This was a huge experiment in interdisciplinary collaboration for us. We are excited about the next steps in this line of research.

That includes engaging with criticism. One of our goals was to fuel a conversation in the research community about the operationalization of race. That isn’t a question that can be addressed by any one paper or team of researchers. So one thing we got out of the conference was great critical feedback on potential problems with the approach we proposed.

This post is an attempt to capture those critiques.

Need for participatory design

Khadijah Abdurahman, of Word to RI, issued a subtweeted challenge to us to present our paper to the hood. (RI stands for Roosevelt Island, in New York City, the location of the recently established Cornell Tech campus.)

One striking challenge, raised by Khadijah Abdurahman on Twitter, is that we should be developing peer relationships with the communities we research. I read this as a call for participatory design. It’s true this was not part of the process of the paper. In particular, Ms. Abdurahman points to a part of our abstract that uses jargon from computer science.

There are a lot of ways to respond to this comment. The first is to accept the challenge. I would personally love it if Bruce and I could present our research to folks on Roosevelt Island and get feedback from them.

There are other ways to respond that address the tensions of this comment. One is to point out that in addition to being an accomplished scholar of the sociology of race and how it forms, especially in urban settings, Bruce is a black man who is originally from Harlem. Indeed, Bruce’s family memoir shows his deep and well-researched familiarity with the life of marginalized people of the hood. So a “peer relationship” between an algorithm designer (me) and a member of an affected community (Bruce) is really part of the origin of our work.

Another is to point out that we did not research a particular community. Our paper was not human subjects research; it was about the racial categories that are maintained by the U.S. federal government and which pervade society in a very general way. Indeed, everybody is affected by these categories. When I and others who look like me are ascribed “white”, that is an example of these categories at work. Bruce and I were very aware of how different kinds of people at the conference responded to our work, and how it was an intervention in our own community, which is of course affected by these racial categories.

The last point is that computer science jargon is alienating to basically everybody who is not trained in computer science, whether they live in the hood or not. And the fact is we presented our work at a computer science venue. Personally, I’m in favor of universal education in computational statistics, but that is a tall order. If our work becomes successful, I could see it becoming part of, for example, a statistical demography curriculum that could be of popular interest. But this is early days.

The Quasi-Racial (QR) Categories are Not Interpretable

In our presentation, we introduced some terminology that did not make it into the paper. We named the vectors of segregation derived by our procedure “quasi-racial” (QR) vectors, to denote that we were trying to capture dimensions that were race-like, in that they captured the patterns of historic and ongoing racial injustice, without being the racial categories themselves, which we argued are inherently unfair categories of inequality.

First, we are not wedded to the name “quasi-racial” and are very open to different terminology if anybody has an idea for something better to call them.

More importantly, somebody pointed out that these QR vectors may not be interpretable. Given that the conference is not only about Fairness, but also Accountability and Transparency, this critique is certainly on point.

To be honest, I have not yet done the work of surveying the extensive literature on algorithm interpretability to give a nuanced response. I can give two informal responses. The first is that one assumption of our proposal is that there is something wrong with how race and racial categories are intuitively understood. Normal people’s understanding of race is, of course, ridden with stereotypes, implicit biases, false causal models, and so on. If we proposed an algorithm that was fully “interpretable” according to most people’s understanding of what race is, that algorithm would likely have racist or racially unequal outcomes. That’s precisely the problem that we are trying to get at with our work. In other words, when categories are inherently unfair, interpretability and fairness may be at odds.

The second response is that educating people about how the procedure works and why its motivated is part of what makes its outcomes interpretable. Teaching people about the history of racial categories, and how those categories are both the cause and effect of segregation in space and society, makes the algorithm interpretable. Teaching people about Principal Component Analysis, the algorithm we employ, is part of what makes the system interpretable. We are trying to drop knowledge; I don’t think we are offering any shortcuts.

Principal Component Analysis (PCA) may not be the right technique

An objection from the computer science end of the spectrum was that our proposed use of Principal Component Analysis (PCA) was not well motivated enough. PCA is just one of many dimensionality reduction techniques–why did we choose it in particular? PCA has many assumptions about the input embedded within it, including that the component vectors of interest are linear combinations of the inputs. What if the best QR representation is a non-linear combination of the input variables? And our use of unsupervised learning, as a general criticism, is perhaps lazy, since in order to validate its usefulness we will need to test it with labeled data anyway. We might be better off with a more carefully calibrated and better motivated alternative technique.

These are all fair criticisms. I am personally not satisfied with the technical component of the paper and presentation. I know the rigor of the analysis is not of the standard that would impress a machine learning scholar and can take full responsibility for that. I hope to do better in a future iteration of the work, and welcome any advice on how to do that from colleagues. I’d also be interested to see how more technically skilled computer scientists and formal modelers address the problem of unfair racial categories that we raised in the paper.
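For readers who haven’t worked with PCA, here is a minimal sketch of what the technique does, on entirely synthetic data with a made-up loadings matrix. This is illustrative only; it is not our paper’s data, variables, or exact pipeline.

```python
# Purely illustrative sketch: synthetic data and made-up loadings,
# not the data, variables, or procedure from our paper.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# 500 hypothetical spatial units, 4 observed columns, generated so that
# two latent dimensions drive the observations.
latent = rng.normal(size=(500, 2))
loadings = np.array([[2.0, 0.5, -1.5, 0.0],
                     [0.0, 1.0, 0.5, -2.0]])
observed = latent @ loadings + rng.normal(scale=0.3, size=(500, 4))

# PCA finds the linear combinations of the observed columns that explain
# the most variance; here they recover (up to sign and rotation) the
# subspace spanned by the two latent dimensions.
pca = PCA(n_components=2)
scores = pca.fit_transform(observed)  # the low-dimensional representation

print(pca.explained_variance_ratio_)  # variance captured by each component
print(pca.components_)                # the linear combinations (loadings)
```

The linearity criticism above is visible even in this toy: by construction, the components are linear combinations of the observed columns, so any non-linear structure in the inputs would be missed.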

I see our main contribution as the raising of this problem of unfair categories, not our particular technical solution to it. As a potential solution, I hope that it’s better than nothing, a step in the right direction, and provocative. I subscribe to the belief that science is an iterative process and look forward to the next cycle of work.

Please feel free to reach out if you have a critique of our work that we’ve missed. We do appreciate all the feedback!

by Sebastian Benthall at February 02, 2019 07:20 PM

January 16, 2019

Ph.D. student

Notes on O’Neil, Chapter 2, “Bomb Parts”

Continuing with O’Neil’s Weapons of Math Destruction on to Chapter 2, “Bomb Parts”. This is a popular book and these are quick chapters. But that’s no reason to underestimate them! This is some of the most lucid work I’ve read on algorithmic fairness.

This chapter talks about three kinds of “models” used in prediction and decision making, with three examples. O’Neil speaks highly of the kinds of models used in baseball to predict the trajectory of hits and determine the optimal placement of players in the field. (Ok, I’m not so good at baseball terms). These are good, O’Neil says, because they are transparent, they are consistently adjusted with new data, and the goals are well defined.

O’Neil then very charmingly writes about the model she uses mentally to determine how to feed her family. She juggles a lot of variables: the preferences of her kids, the nutrition and cost of ingredients, and time. This is all hugely relatable–everybody does something like this. Her point, it seems, is that this form of “model” encodes a lot of opinions or “ideology” because it reflects her values.

O’Neil then discusses recidivism prediction, specifically the LSI-R (Level of Service Inventory–Revised) tool. It asks questions like “How many previous convictions have you had?” and uses the answers to predict the likelihood of future recidivism. The problem is that (a) this is sensitive to overpolicing in neighborhoods, which has little to do with actual recidivism rates (as opposed to rearrest rates), and (b) black neighborhoods, for example, are more likely to be overpoliced, meaning that the tool, which is not very good at predicting recidivism, has disparate impact. This is an example of what O’Neil calls an (eponymous) weapon of math destruction (WMD).

She argues that the three qualities of a WMD are Scale, Opacity, and Damage. Which makes sense.

As I’ve said, I think this is a better take on algorithmic ethics than almost anything I’ve read on the subject before. Why?

First, it doesn’t use the word “algorithm” at all. That is huge, because 95% of the time the use of the word “algorithmic” in the technology-and-society literature is stupid. People use “algorithm” when they really mean “software”. Now, they use “AI System” to mean “a company”. It’s ridiculous.

O’Neil makes it clear in this chapter that what she’s talking about are different kinds of models. Models can be in ones head (as in her plan for feeding her family) or in a computer, and both kinds of models can be racist. That’s a helpful, sane view. It’s been the consensus of computer scientists, cognitive scientists, and AI types for decades.

The problem with WMDs, as opposed to other, better models, is that WMD models are unhinged from reality. O’Neil’s complaint is not with the use of models, but rather that models are being used without being properly trained using sound sampling and statistics. WMDs are not artificial intelligences; they are artificial stupidities.

In more technical terms, it seems like the problem with WMDs is not that they don’t properly trade off predictive accuracy with fairness, as some computer science literature would suggest is necessary. It’s that the systems have high error rates in the first place because the training and calibration systems are poorly designed. What’s worse, this avoidable error is disparately distributed, causing more harm to some groups than others.

This is a wonderful and eye-opening account of unfairness in the models used by automated decision-making systems (note the language). Why? Because it shows that there is a connection between statistical bias, the kind of bias that creates distortions in a quantitative predictive process, and social bias, the kind of bias people worry about politically (the term is used consistently in both senses). If there is statistical bias weighing against some social group, then that’s definitely, 100% a form of bias.
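To make the rearrest-versus-recidivism point concrete, here is a toy simulation with entirely made-up numbers (nothing here comes from O’Neil’s book or from LSI-R data): two groups with the same underlying reoffense rate, but different policing intensity, so that a model trained on rearrest labels sees different base rates.

```python
# Toy simulation, made-up numbers: rearrest as a proxy label for reoffense
# diverges from the underlying behavior when policing intensity differs.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
true_reoffense_rate = 0.30               # identical for both groups
detection_rate = {"A": 0.7, "B": 0.3}    # group A is policed more heavily

for group, detect in detection_rate.items():
    reoffends = rng.random(n) < true_reoffense_rate
    rearrested = reoffends & (rng.random(n) < detect)
    # A model trained on rearrest labels "sees" different base rates for the
    # two groups, even though the underlying behavior is identical.
    print(group,
          "true reoffense rate:", round(float(reoffends.mean()), 3),
          "rearrest rate:", round(float(rearrested.mean()), 3))
```

Group A’s rearrest rate comes out around 0.21 and group B’s around 0.09, so a model fit to these labels encodes the policing disparity rather than the behavior it claims to predict. That is statistical bias introduced by a badly chosen training label, and it falls unevenly across groups.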

Importantly, this kind of bias–statistical bias–is not something that every model must have. Only badly made models have it. It’s something that can be mitigated using scientific rigor and sound design. If we see the problem the way O’Neil sees it, then we can see clearly how better science, applied more rigorously, is also good for social justice.

As a scientist and technologist, it’s been terribly discouraging in the past years to be so consistently confronted with a false dichotomy between sound engineering and justice. At last, here’s a book that clearly outlines how the opposite is the case!

by Sebastian Benthall at January 16, 2019 04:44 AM

January 15, 2019

Ph.D. student

Researchers receive grant to study the invisible work of maintaining open-source software

Researchers at the UC Berkeley Institute for Data Science (BIDS), the University of California, San Diego, and the University of Connecticut have been awarded a grant of $138,055 from the Sloan Foundation and the Ford Foundation as part of a broad initiative to investigate the sustainability of digital infrastructures. The grant funds research into the maintenance of open-source software (OSS) projects, particularly focusing on the visible and invisible work that project maintainers do to support their projects and communities, as well as issues of burnout and maintainer sustainability. The research project will be led by BIDS staff ethnographer and principal investigator Stuart Geiger and will be conducted in collaboration with Lilly Irani and Dorothy Howard at UC San Diego, Alexandra Paxton at the University of Connecticut, and Nelle Varoquaux and Chris Holdgraf at UC Berkeley.

Many open-source software projects have become foundational components for many stakeholders and are now widely used behind-the-scenes to support activities across academia, the tech industry, government, journalism, and activism. OSS projects are often initially created by volunteers and provide immense benefits for society, but their maintainers can struggle with how to sustain and support their projects, particularly when widely used in increasingly critical contexts. Most OSS projects are maintained by only a handful of individuals, and community members often talk about how their projects might collapse if only one or two key individuals leave the project. Project leaders and maintainers must do far more than just write code to ensure a project’s long-term success: They resolve conflicts, perform community outreach, write documentation, review others’ code, mentor newcomers, coordinate with other projects, and more. However, many OSS project leaders and maintainers have publicly discussed the effects of burnout as they find themselves doing unexpected and sometimes thankless work.

The one-year research project — The Visible and Invisible Work of Maintaining Open-Source Digital Infrastructure — will study these issues in various software projects, including software libraries, collaboration platforms, and discussion platforms that have come to be used as critical digital infrastructure. The researchers will conduct interviews with project maintainers and contributors from a wide variety of projects, as well as analyze projects’ code repositories and communication platforms. The goal of the research is to better understand what project maintainers do, the challenges they face, and how their work can be better supported and sustained. This research on the invisible work of maintenance will help maintainers, contributors, users, and funders better understand the complexities within such projects, helping set expectations, develop training programs, and formulate evaluations.

by R. Stuart Geiger at January 15, 2019 08:00 AM

January 12, 2019

Ph.D. student

Reading O’Neil’s Weapons of Math Destruction

I probably should have already read Cathy O’Neil’s Weapons of Math Destruction. It was a blockbuster of the tech/algorithmic ethics discussion. It’s written by an accomplished mathematician, which I admire. I’ve also now seen O’Neil perform bluegrass music twice in New York City and think her band is great. At last I’ve found a copy and have started to dig in.

On the other hand, as is probably clear from other blog posts, I have a hard time swallowing a lot of the gloomy political work that puts the role of algorithms in society in such a negative light. I encounter it very frequently, and every time I feel that some misunderstanding must have happened; something seems off.

It's very clear that O'Neil can't be accused of mathophobia or of not understanding the complexity of the algorithms at play, which is an easy way to throw doubt on the arguments of some technology critics. Yet perhaps because it's a popular book and not an academic work of Science and Technology Studies, I haven't seen its arguments parsed through and analyzed in much depth.

This is a start. These are my notes on the introduction.

O’Neil describes the turning point in her career where she soured on math. After being an academic mathematician for some time, O’Neil went to work as a quantitative analyst for D.E. Shaw. She saw it as an opportunity to work in a global laboratory. But then the 2008 financial crisis made her see things differently.

The crash made it all too clear that mathematics, once my refuge, was not only deeply entangled in the world’s problems but also fueling many of them. The housing crisis, the collapse of major financial institutions, the rise of unemployment–all had been aided and abetted by mathematicians wielding magic formulas. What’s more, thanks to the extraordinary powers that I loved so much, math was able to combine with technology to multiply the chaos and misfortune, adding efficiency and scale to systems I now recognized as flawed.

O’Neil, Weapons of Math Destruction, p.2

As an independent reference on the causes of the 2008 financial crisis, which of course has been a hotly debated and disputed topic, I point to Sassen’s 2017 “Predatory Formations” article. Indeed, the systems that developed the sub-prime mortgage market were complex, opaque, and hard to regulate. Something went seriously wrong there.

But was it mathematics that was the problem? This is where I get hung up. I don't understand the mindset that would attribute a crisis in the financial system to the use of abstract, logical, rigorous thinking. Consider the fact that there would not have been a financial crisis if there had not been a functional financial services system in the first place. Getting a mortgage and paying it off, and the systems that allow this to happen, all require mathematics to function. When these systems operate normally, they are taken for granted. When they suffer a crisis, when the system fails, the mathematics takes the blame. But a system can't suffer a crisis if it didn't start working rather well in the first place–otherwise, nobody would depend on it. Meanwhile, the regulatory reaction to the 2008 financial crisis required, of course, more mathematicians working to prevent the same thing from happening again.

So in this case (and I believe others) the question can't be whether mathematics, but rather which mathematics. It is so sad to me that these two questions get conflated.

O'Neil goes on to describe a case where an algorithm results in a teacher losing her job for not adding enough value to her students one year. An analysis makes a good case that the cause of her students' scores not going up is that in the previous year, the students' scores were inflated by teachers cheating the system. This argument was not considered conclusive enough to change the administrative decision.

Do you see the paradox? An algorithm processes a slew of statistics and comes up with a probability that a certain person might be a bad hire, a risky borrower, a terrorist, or a miserable teacher. That probability is distilled into a score, which can turn someone’s life upside down. And yet when the person fights back, “suggestive” countervailing evidence simply won’t cut it. The case must be ironclad. The human victims of WMDs, we’ll see time and again, are held to a far higher standard of evidence than the algorithms themselves.

O’Neil, WMD, p.10

Now this is a fascinating point, and one that I don't think has been taken up enough in the critical algorithms literature. It resonates with a point that came up earlier, that traditional collective human decision making is often driven by agreement on narratives, whereas automated decisions can be a qualitatively different kind of collective action because they act on probabilistic judgments.

I have to wonder what O'Neil would argue the solution to this problem is. From her rhetoric, it seems like her recommendation must be to prevent automated decisions from making probabilistic judgments. In other words, one could raise the evidentiary standard for algorithms so that they were equal to the standards that people use with each other.

That’s an interesting proposal. I’m not sure what the effects of it would be. I expect that the result would be lower expected values of whatever target was being optimized for, since the system would not be able to “take bets” below a certain level of confidence. One wonders if this would be a more or less arbitrary system.
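
As a rough way to see the direction of that effect, here is a quick simulation sketch; the payoff model, thresholds, and numbers are my own assumptions for illustration, not anything from the book:

```python
# Purely illustrative: if a decision system may only "take bets" when its
# confidence exceeds a high evidentiary threshold, it forgoes many
# positive-expected-value decisions, lowering the total of whatever it is
# optimizing. Payoffs and distributions here are invented assumptions.
import random

random.seed(0)
N = 100_000
GAIN, LOSS = 1.0, 1.0   # payoff for a good bet, cost of a bad one

def total_payoff(threshold):
    total = 0.0
    for _ in range(N):
        p = random.random()        # the system's (assumed calibrated) confidence
        if p >= threshold:         # only act above the threshold
            outcome_good = random.random() < p
            total += GAIN if outcome_good else -LOSS
    return total

print("act whenever expected value is positive (p >= 0.5):", round(total_payoff(0.5)))
print("act only with near-ironclad confidence (p >= 0.9):", round(total_payoff(0.9)))
```

Under these assumptions the high-threshold policy earns far less in total, even though each individual bet it does take is safer; whether that trade is worth it is exactly the policy question.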

Sadly, in order to evaluate this proposal seriously, one would have to employ mathematics. Which is, in O’Neil’s rhetoric, a form of evil magic. So, perhaps it’s best not to try.

O'Neil attributes the problems of WMDs to the incentives of the data scientists building the systems. Maybe they know that their work affects people, especially the poor, in negative ways. But they don't care.

But as a rule, the people running the WMDs don't dwell on these errors. Their feedback is money, which is also their incentive. Their systems are engineered to gobble up more data and fine-tune their analytics so that more money will pour in. Investors, of course, feast on these returns and shower WMD companies with more money.

O’Neil, WMD, p.13

Calling out greed as the problem is effective and true in a lot of cases. I’ve argued myself that the real root of the technology ethics problem is capitalism: the way investors drive what products get made and deployed. This is a worthwhile point to make and one that doesn’t get made enough.

But the logical implications of this argument are off. Suppose it is true that, "as a rule", algorithms that do harm are made by people responding to the incentives of private capital. (IF harmful algorithm, THEN private capital created it.) That does not mean that there can't be good algorithms as well, such as those created in the public sector. In other words, there are algorithms that are not WMDs.

So the insight here has to be that private capital investment corrupts the process of designing algorithms, making them harmful. One could easily make the case that private capital investment corrupts and makes harmful many things that are not algorithmic as well. For example, the historic trans-Atlantic slave trade was a terribly evil manifestation of capitalism. It did not, as far as I know, depend on modern day computer science.

Capitalism here looks to be the root of all evil. The fact that companies are using mathematics is merely incidental. And O’Neil should know that!

Here’s what I find so frustrating about this line of argument. Mathematical literacy is critical for understanding what’s going on with these systems and how to improve society. O’Neil certainly has this literacy. But there are many people who don’t have it. There is a power disparity there which is uncomfortable for everybody. But while O’Neil is admirably raising awareness about how these kinds of technical systems can and do go wrong, the single-minded focus and framing risks giving people the wrong idea that these intellectual tools are always bad or dangerous. That is not a solution to anything, in my view. Ignorance is never more ethical than education. But there is an enormous appetite among ignorant people for being told that it is so.

References

O’Neil, Cathy. Weapons of math destruction: How big data increases inequality and threatens democracy. Broadway Books, 2017.

Sassen, Saskia. "Predatory Formations Dressed in Wall Street Suits and Algorithmic Math." Science, Technology and Society 22.1 (2017): 6-20.

by Sebastian Benthall at January 12, 2019 08:00 PM

January 11, 2019

Ph.D. student

"no photos please" and other broadcasts

We've spent a lot of collective time and effort on design and policy to support the privacy of the user of a piece of software, whether it's the Web or a mobile app or a device. But more current and more challenging is the privacy of the non-user of the app, the privacy of the bystander. With the ubiquity of sensors, we are increasingly observed, not just by giant corporations or government agencies, but by, as they say, little brothers.

Consider the smartphone camera. Taking digital photos is free, quick and easy; resolution and quality increase; metadata (like precise geolocation) is attached; sharing those photos is easy via online services. As facial recognition has improved, it has become easier to automatically identify the people depicted in a photo, whether they're the subject of a portrait or just in the background. If you don't want to share records of your precise geolocation and what you're doing in public places, with friends, family, strangers and law enforcement, it's no longer enough to be careful with the technology you choose to use; you'd also have to be constantly vigilant about the technology that everyone around you is using.

While it may be tempting to draw a "throw your hands up" conclusion from this -- privacy is dead, get over it, there's nothing we can easily do about it -- we actually have widespread experience with this kind of value and various norms to protect it. At conferences and public events, it's not uncommon to have a system of stickers on nametags to either opt-in or opt-out of photos. This is a help (not a hindrance) for event photographers: rather than asking everyone to pose in your photo, or asking everyone after the fact if they're alright with your posting a public photo, or being afraid of posting a photo and facing the anger of your attendees, you can just keep an eye on the red and green dots on those plastic nametags and feel confident that you're respecting the attendees at your event.

There are similar norms in other settings. Taking video in the movie theater violates legal protections, but there are also widespread and reasonably well-enforced norms against capturing video of live theater productions or comedians who test out new material in clubs, on grounds that may not be copyright. Art museums will often tell you whether photos are welcome or prohibited. In some settings the privacy of the people present is so essential that unwritten or written rules prohibit cameras altogether: at nude hot springs, for example, you just can't use a camera at all. You wouldn't take a photo in the waiting room of your doctor's office and you'll invite anger and social confrontation if you're taking photos of other people's children at your local playground.

And even in "public" or in contexts with friends, there are spoken or unspoken expectations. "Don't post that photo of me drinking, please." "Let me see how I look in that before you post it on Facebook." "Everyone knows that John doesn't like to have his photo taken."

As cameras become smaller and more widely used, as photos encompass depictions of more people and are shared more widely and easily, and as identifications of the depicted people can also be shared, our social norms and spoken discussions don't easily keep up. Checking with people before you post a photo of them is absolutely a good practice and I encourage you to follow it. But why not also use technology to facilitate checking others' preferences?

We have all the tools we need to make "no photos please" nametag stickers into unobtrusive and efficiently communicated messages. If you're attending a conference or party and don't want people to take your photo, just tap the "no photos please" setting on your smartphone before you walk in. And if you're taking photos at an event, your camera will show a warning when it knows that someone in the room doesn't want their photo taken, so that you can doublecheck with the people in your photo and make sure you're not inadvertently capturing someone in the background. And the venue can remind you that way too, in case you don't know the local norm that pictures shouldn't be taken in the church or museum.

Mockup of turning on No Photos Please mode. Camera icon by Mourad Mokrane from the Noun Project.

As a technical matter, I think we're looking at Bluetooth broadcast beacons, from smartphones or stationary devices. That could be a small Arduino-based widget on the wall of a commercial venue, or one day you might have a poker-chip-sized device in your pocket that you can click into private mode. When you're using a compatible camera app on your phone or a compatible handheld camera, your device regularly scans for nearby Bluetooth beacons and if it sees a "no photos please" message, it shows a (dismissable) warning.
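
To make the scanning side concrete, here is a rough sketch in Python using the cross-platform bleak BLE library; the "NO_PHOTOS_PLEASE" advertised name is a hypothetical convention of my own for illustration (a real design would presumably register a dedicated service UUID and payload format rather than overloading the device name):

```python
# Rough sketch of the camera-side check for a nearby "no photos please"
# beacon. The beacon name is a made-up convention for illustration only.
import asyncio
from bleak import BleakScanner

NO_PHOTOS_NAME = "NO_PHOTOS_PLEASE"  # hypothetical beacon name

async def no_photos_beacon_nearby(scan_seconds: float = 3.0) -> bool:
    # Scan for nearby BLE advertisements and look for the agreed-upon name.
    devices = await BleakScanner.discover(timeout=scan_seconds)
    return any(d.name == NO_PHOTOS_NAME for d in devices)

async def main():
    if await no_photos_beacon_nearby():
        # In a real camera app this would surface a dismissable warning,
        # not block the shutter: the system is discretionary, not DRM.
        print("Someone nearby prefers not to be photographed.")
    else:
        print("No 'no photos please' beacons detected.")

if __name__ == "__main__":
    asyncio.run(main())
```

The same scan could run periodically in the background of a camera app, so the warning is already on screen by the time you frame the shot.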

Mockup of camera showing no photos warning.

The discretionary communication of preferences is ideal in part because it isn't self-enforcing. For example, if the police show up at the political protest you're attending and broadcast a "no photos please" beacon, you can (and should) override your camera warning to take photos of their official activity, as a safeguard for public safety and accountability. An automatically-enforcing DRM-style system would be both infeasible to construct and, if it were constructed, inappropriately inviting to government censorship or aggressive copyright maximalism. Technological hints are also less likely to confusingly over-promise a protection: we can explain to people that the "no photos please" beacon doesn't prevent impolite or malicious people from surreptitiously taking your photo, just as people are extremely familiar with the fact that placards, polite requests and even laws are sometimes ignored.

Making preferences technically available could also help with legal compliance. If you're taking a photo at an event and get a "no photos" warning, your device UI can help you log why you might be taking the photo anyway. Tap "I got consent" and your camera can embed metadata in the file noting that you gathered consent from the depicted people. Tap "Important public purpose" at the protest and you'll have a machine-readable affirmation in place recording what you're doing, and your Internet-connected phone can also use that signal to make sure photos in this area are promptly backed up securely in case your device is confiscated.
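
As a minimal sketch of what that capture-time logging could look like, here is a stdlib-only example that writes the justification to a JSON "sidecar" file next to the photo; the file naming and fields are assumptions for illustration, and a production app would more likely embed this in EXIF/XMP metadata and timestamp or sign it:

```python
# Minimal sketch: record a machine-readable justification next to a photo
# taken despite a "no photos please" warning. Naming and fields are invented.
import json
from datetime import datetime, timezone
from pathlib import Path

def record_justification(photo_path: str, reason: str) -> Path:
    """Write a sidecar JSON note of why the photo was taken anyway,
    e.g. 'consent obtained' or 'important public purpose'."""
    sidecar = Path(str(photo_path) + ".consent.json")
    sidecar.write_text(json.dumps({
        "photo": Path(photo_path).name,
        "justification": reason,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }, indent=2))
    return sidecar

# Example: the user tapped "I got consent" after the warning appeared.
# record_justification("IMG_0042.jpg", "consent obtained from depicted people")
```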

People's preferences are of course more complicated than just "no photos please" or "sure, take my photo". While I, like many, have imagined that sticky policies could facilitate rules of how data is subsequently shared and used, there are good reasons to start with the simple capture-time question. For one, it's familiar, from these existing social and legal norms. For another, it can be a prompt for real-time in-person conversation. Rather than assuming an error-free technical-only system of preference satisfaction, this can be a quick reminder to check with the people right there in front of you for those nuances, and to do so prior to making a digital record.

Broadcast messages provide opportunities that I think we haven't fully explored or embraced in the age of the Internet and the (rightfully lauded) end-to-end principle. Some communications just naturally take the form of letting people in a geographic area know something relevant to the place. "The cafe is closing soon." "What's the history of that statue?" "What's the next stop on this train and when are we scheduled to arrive?" If WiFi routers included latitude and longitude in the WiFi network advertisement, your laptop could quickly and precisely geolocate even in areas where you don't have Internet access, and do so passively, without broadcasting your location to a geolocation provider. (That one is a little subtle; we wrote a paper on it back when we were evaluating the various privacy implications of WiFi geolocation databases at Berkeley.) What about, "Anyone up for a game of chess?" (See also, Grindr.) eBook readers could optionally broadcast the title of the current book to re-create the lovely serendipity of seeing the book cover a stranger is reading on the train. Music players could do the same.

The Internet is amazing for letting us communicate with people around the world around shared interests. We should see the opportunity for networking technology to also facilitate communications, including conversations about privacy, with those nearby.


Some end notes that my head wants to let go of: There is some prior art here that I don't want to dismiss or pass over; I just think we should push it further. A couple of examples:

  • Google folks have developed broadcast URLs that they call The Physical Web so that real-life places can share a Web page about them (over mDNS or Bluetooth Low Energy) and I hope one day we can get a link to the presenter's current slide using networking rather than everyone taking a picture of a projected URL and awkwardly typing it into our laptops later.
  • The Occupy movement showed an interest in geographically-located Web services, including forums and chatrooms that operate over WiFi but not connected to the Internet. Occupy Here:
    Anyone within range of an Occupy.here wifi router, with a web-capable smartphone or laptop, can join the network “OCCUPY.HERE,” load the locally-hosted website http://occupy.here, and use the message board to connect with other users nearby.

Getting a little further afield but still related, it would be helpful if the network provider could communicate directly with the subscriber using the expressive capability of the Web. Lacking this capability, we've seen frustrating abuses of interception: captive portals redirect and impersonate Web traffic; ISPs insert bandwidth warnings as JavaScript insecurely transplanted into HTTP pages. Why not instead provide a way for the network to push a message to the client, not by pretending to be a server you happen to connect to around that same time, but just as a clearly separate message? ICMP control messages are an existing but underused technology.

by nick@npdoty.name at January 11, 2019 06:33 AM

January 09, 2019

Ph.D. student

computational institutions as non-narrative collective action

Nils Gilman recently pointed to a book chapter that confirms the need for “official futures” in capitalist institutions.

Nils indulged me in a brief exchange that helped me better grasp at a bothersome puzzle.

There is a certain class of intellectuals that insist on the primacy of narratives as a mode of human experience. These tend to be, not too surprisingly, writers and other forms of storytellers.

There is a different class of intellectuals that insists on the primacy of statistics. Statistics does not make it easy to tell stories because it is largely about the complexity of hypotheses and our lack of confidence in them.

The narrative/statistic divide could be seen as a divide between academic disciplines. It has often been taken to be, I believe wrongly, the crux of the “technology ethics” debate.

I questioned Nils as to whether his generalization stood up to statistically driven allocation of resources; i.e., those decisions made explicitly on probabilistic judgments. He argued that in the end, management and collective action require consensus around narrative.

In other words, what keeps narratives at the center of human activity is that (a) humans are in the loop, and (b) humans are collectively in the loop.

The idea that communication is necessary for collective action is one I used to put great stock in when studying Habermas. For Habermas, consensus, and especially linguistic consensus, is how humanity moves together. Habermas contrasted this mode of knowledge aimed at consensus and collective action with technical knowledge, which is aimed at efficiency. Habermas envisioned a society ruled by communicative rationality, deliberative democracy; following this line of reasoning, this communicative rationality would need to be a narrative rationality. Even if this rationality is not universal, it might, in Habermas’s later conception of governance, be shared by a responsible elite. Lawyers and a judiciary, for example.

The puzzle that recurs again and again in my work has been the challenge of communicating how technology has become an alternative form of collective action. The claim made by some that technologists are a social “other” makes more sense if one sees them (us) as organizing around non-narrative principles of collective behavior.

It is, I believe, beyond serious dispute that well-constructed, statistically based collective decision-making processes perform better than many alternatives. In the field of future predictions, Philip Tetlock's work on superforecasting teams and his prior work on expert political judgment has long stood as an empirical challenge to the supposed primacy of narrative-based forecasting. This challenge has not been taken up; the debate seems rather one-sided. One reason for this may be that the rationale for the effectiveness of these techniques rests ultimately in the science of statistics.

It is now common to insist that Artificial Intelligence should be seen as a sociotechnical system and not as a technological artifact. I wholeheartedly agree with this position. However, it is sometimes implied that to understand AI as a social+ system, one must understand it in narrative terms. This is an error; it would imply that the collective actions taken to build an AI system and the technology itself are held together by narrative communication.

But if the whole purpose of building an AI system is to collectively act in a way that is more effective because of its facility with the nuances of probability, then the narrative lens will miss the point. The promise and threat of AI is that it delivers a different, often more effective, form of collective or institution. I've suggested that "computational institution" might be the best way to refer to such a thing.

by Sebastian Benthall at January 09, 2019 03:54 PM

January 08, 2019

MIMS 2012

My Yardstick for Empathy

A different perspective. Photo by Jamie Street on Unsplash

How do you know if you’re being empathetic? It’s easy to throw the term around, but difficult to actually apply. This is important to understand in my chosen field of design, but can also help anyone improve their interactions with other people.

My yardstick for being empathetic is imagining myself making the same decisions, in the same situation, that another person made.

If I look at someone’s behavior and think, “That doesn’t make sense,” or “Why did they do that?!” then I’m not being empathetic. I’m missing a piece of context — about their knowledge, experiences, skills, emotional state, environment, etc. — that led them to do what they did. When I feel that way, I push myself to keep searching for the missing piece that will make their actions become the only rational ones to take.

Is this always possible? No. Even armed with the same knowledge, operating in the same environment, and possessing the same skills as another person, I will occasionally make different decisions than them. Every individual is unique, and interprets and acts on stimuli differently.

Even so, imagining myself behaving the same way as another person is what I strive for. That's my yardstick for empathy.


If you want to learn more about empathy and how to apply it to your work and personal life, I highly recommend Practical Empathy by Indi Young.

by Jeff Zych at January 08, 2019 08:22 PM

January 04, 2019

MIMS 2012

Books Read 2018

In 2018 I read 23 books, which is a solid 9 more than last year's paltry 14 (and 1 more than 2016). I credit the improvement to the 4-month sabbatical I took in the spring. Not working really frees up time 😄

For the last 2 years I said I needed to read more fiction since I only read 3 in 2016 and 2 in 2017. So how did I do? I’m proud to say I managed to read 7 fiction books this year (if you can count My Dad Wrote a Porno as “fiction”…). My reading still skews heavily to non-fiction, and specifically design, but that’s what I’m passionate about and it helps me professionally, so I’m ok with it.

I also apparently didn't finish any books in January or February. I thought this might have been a mistake at first, but when I looked back on that time I realized it's because I was wrapping things up at Optimizely, and reading both Quicksilver by Neal Stephenson and Story by Robert McKee at the same time, which are long books that took a while to work through.

Highlights

Story: Substance, Structure, Style, and the Principles of Screenwriting

by Robert McKee

I’ve read next to nothing about writing stories before, but Robert McKee’s primer on the subject is excellent. Even though I’m not a fiction author, I found his principles for writing compelling narratives valuable beyond just the domain of screenwriting.

Handstyle Lettering

Published and edited by Victionary

There wasn’t much to “read” in this book, but it was full of beautiful hand-lettered pieces that continue to inspire me to be a better letterer.

The Baroque Cycle

by Neal Stephenson

Neal Stephenson’s Baroque Cycle is a broad, staggering, 3-volume and 2,500+ page opus of historical science fiction, making it no small feat to complete (I read the first 2 this year, and am almost done with the 3rd volume). It takes place during the scientific revolution of the 17th and 18th centuries when the world transitioned out of feudal rule towards a more rational and merit-based society that we would recognize as modern. It weaves together a story between fictional and non-fictional characters, including Newton, Leibniz, Hooke, Wren, royalty, and other persons-of-quality. Although the series can be slow and byzantine at times, Stephenson makes up for it with his attention to detail and the sheer amount of research and effort he put into accurately capturing the time period and bringing the story to life. Even just having the audacity to put yourself in Newton’s head to speak from his perspective, much less to do so convincingly, makes the series worth the effort.

Good Strategy, Bad Strategy

by Richard P. Rumelt

Strategy is a fuzzy concept, but Rumelt makes it concrete and approachable with many examples of good and bad strategy. Read my full notes here. Highly recommended.

Bird by Bird: Some Instructions on Writing and Life

by Anne Lamott

A great little meditation on the writing process (and life!), sprinkled with useful tips and tricks throughout.

Creative Selection: Inside Apple’s design process

by Ken Kocienda

Ken Kocienda was a software engineer during the “golden age of Steve Jobs,” and provides a fascinating insight into the company’s design process. I’m still chewing on what I read (and hope to publish more thoughts soon), but it’s striking how different it is from any process I’ve ever seen at any company, and different from best practices written about in books. It’s basically all built around Steve Jobs’ exacting taste, with designers and developers demoing their work to Steve with the hope of earning his approval. Very difficult to replicate, but the results speak for themselves.

Ogilvy on Advertising

by David Ogilvy

I hadn’t read much about advertising before, but Ogilvy’s book on the subject is great. It’s full of practical advice on how to write compelling headlines and ads that sell. Read my notes here.

Full List of Books Read

  • Story: Substance, Structure, Style, and the Principles of Screenwriting by Robert McKee (3/7/18)
  • The Color of Pixar by Tia Kratter (3/18/18)
  • Conversational Design by Erika Hall (3/27/18)
  • Quicksilver by Neal Stephenson (4/3/18)
  • Handstyle Lettering published and edited by Victionary (4/24/18)
  • Biomimicry: Innovation Inspired by Nature by Janine M. Benyus (5/4/18)
  • Design is Storytelling by Ellen Lupton (5/11/18)
  • Trip by Tao Lin (5/20/18)
  • Good Strategy, Bad Strategy: The Difference and Why it Matters by Richard P. Rumelt (5/27/18)
  • Bird by Bird: Some Instructions on Writing and Life by Anne Lamott (6/10/18)
  • The Inmates are Running the Asylum by Alan Cooper (6/13/18)
  • It Chooses You by Miranda July (6/13/18)
  • String Theory by David Foster Wallace (6/22/18)
  • Invisible Cities by Italo Calvino (6/28/18)
  • My Dad Wrote a Porno by Jamie Morton, James Cooper, Alice Levine, and Rocky Flintstone (7/1/18)
  • The User Experience Team of One by Leah Buley (7/8/18)
  • Change by Design by Tim Brown (9/3/18)
  • Darkness at Noon by Arthur Koestler (9/16/2018)
  • Creative Selection: Inside Apple’s design process during the golden age of Steve Jobs by Ken Kocienda (9/20/18)
  • The Confusion by Neal Stephenson (9/26/18)
  • How to Change Your Mind by Michael Pollan (10/27/18)
  • Ogilvy on Advertising by David Ogilvy (11/11/18)
  • Draft No. 4. On the writing process by John McPhee (11/14/18)

by Jeff Zych at January 04, 2019 12:26 AM

December 30, 2018

Ph.D. student

State regulation and/or corporate self-regulation

The dust from the recent debates about regulation versus industrial self-regulation in the data/tech/AI industry appears to be settling. The smart money is on regulation and self-regulation being complementary for attaining the goal of an industry dominated by responsible actors. This trajectory leads to centralized corporate power that is led from the top; it is a Hamiltonian, not Jeffersonian, solution, in Pasquale's terms.

I am personally not inclined towards this solution. But I have been convinced to see it differently after a conversation today about environmentally sustainable supply chains in food manufacturing. Nestle, for example, has been internally changing its sourcing practices to more sustainable chocolate. It's able to finance this change from its profits, and when it does change its internal policy, it operates on a scale that's meaningful. It is able to make this transition in part because non-profits, NGOs, and farmers' cooperatives laid the groundwork for sustainable sourcing external to the company. This lowers the barriers to having Nestle switch over to new sources–they have already been subsidized through philanthropy and international aid investments.

Supply chain decisions, ‘make-or-buy’ decisions, are the heart of transaction cost economics (TCE) and critical to the constitution of institutions in general. What this story about sustainable sourcing tells us is that the configuration of private, public, and civil society institutions is complex, and that there are prospects for agency and change in the reconfiguration of those relationships. This is no different in the ‘tech sector’.

However, this theory of economic and political change is not popular; it does not have broad intellectual or media appeal. Why?

One reason may be because while it is a critical part of social structure, much of the supply chain is in the private sector, and hence is opaque. This is not a matter of transparency or interpretability of algorithms. This is about the fact that private institutions, by virtue of being ‘private’, do not have to report everything that they do and, probably, shouldn’t. But since so much of what is done by the massive private sector is of public import, there’s a danger of the privatization of public functions.

Another reason why this view of political change through the internal policy-making of enormous private corporations is unpopular is because it leaves decision-making up to a very small number of people–the elite managers of those corporations. The real disparity of power involved in private corporate governance means that the popular attitude towards that governance is, more often than not, irrelevant. Even less so than political elites, corporate elites are not accountable to a constituency. They are accountable, I suppose, to their shareholders, which have material interests disconnected from political will.

This disconnected shareholder will is one of the main reasons why I’m skeptical about the idea that large corporations and their internal policies are where we should place our hopes for moral leadership. But perhaps what I’m missing is the appropriate intellectual framework for how this will is shaped and what drives these kinds of corporate decisions. I still think TCE might provide insights that I’ve been missing. But I am on the lookout for other sources.

by Sebastian Benthall at December 30, 2018 08:39 PM

December 24, 2018

Ph.D. student

Ordoliberalism and industrial organization

There’s a nice op-ed by Wolfgang Münchau in FT, “The crisis of modern liberalism is down to market forces”.

Among other things, it reintroduces the term “ordoliberalism“, a particular Germanic kind of enlightened liberalism designed to prevent the kind of political collapse that had precipitated the war.

In Münchau's account, the key insight of ordoliberalism is its attention to questions of social equality, but not through the mechanism of redistribution. Rather, ordoliberal interventions primarily affect industrial organization, favoring small to mid-sized companies.

As Germany’s economy remains robust and so far relatively politically stable, it’s interesting that ordoliberalism isn’t discussed more.

Another question that must be asked is to what extent the rise of computational institutions challenges the kind of industrial organization recommended by ordoliberalism. If computation induces corporate concentration, and there are not good policies for addressing that, then that’s due to a deficiency in our understanding of what ‘market forces’ are.

by Sebastian Benthall at December 24, 2018 02:32 PM

December 22, 2018

Ph.D. student

When *shouldn’t* you build a machine learning system?

Luke Stark raises an interesting question, directed at “ML practitioner”:

As an “ML practitioner” in on this discussion, I’ll have a go at it.

In short, one should not build an ML system for making a class of decisions if there is already a better system for making those decisions that does not use ML.

An example of a comparable system that does not use ML would be a team of human beings with spreadsheets, or a team of people employed to judge for themselves.
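
One concrete way to run that comparison is to evaluate a candidate ML model against the existing non-ML rule on the same held-out data. A minimal sketch, using synthetic data and a made-up one-feature rule purely as stand-ins:

```python
# Sketch: before building an ML system, check whether it actually beats the
# simple rule already in use. Data, rule, and threshold are synthetic
# assumptions for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = (X[:, 0] > 0.2).astype(int)          # outcome driven mostly by one feature

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Non-ML baseline: the kind of single-threshold rule a team with a
# spreadsheet might already be applying.
baseline_pred = (X_test[:, 0] > 0.0).astype(int)

ml_model = LogisticRegression().fit(X_train, y_train)
ml_pred = ml_model.predict(X_test)

print("baseline rule accuracy:", accuracy_score(y_test, baseline_pred))
print("ML model accuracy:     ", accuracy_score(y_test, ml_pred))
# If the ML model does not clearly beat the existing rule, after accounting
# for its added complexity and failure modes, that is a reason not to build it.
```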

There are a few reasons why a non-ML system could be superior in performance to an ML system:

  • The people involved could have access to more data, in the course of their lives, in more dimensions of variation, than is accessible by the machine learning system.
  • The people might have more sensitized ability to make semantic distinctions, such as in words or images, than an ML system
  • The problem to be solved could be a "wicked problem" that ranges over a very high-dimensional space of options, with very irregular outcomes, such that it is not amenable to various forms of, e.g., linear approximation
  • The people might be judging an aspect of their own social environment, such that the outcome’s validity is socially procedural (as in the outcome of a vote, or of an auction)

These are all fine reasons not to use an ML system. On the other hand, the term "ML" has been extended, as with "AI", to include many hybrid human-computer systems, which has led to some confusion. So, for example, crowdsourced labels of images provide useful input data to ML systems. This hybrid system might perform semantic judgments over a large scale of data, at a high speed, at a tolerable rate of accuracy. Does this system count as an ML system? Or is it a form of computational institution that rivals other ways of solving the problem, and just so happens to have a machine learning algorithm as part of its process?

Meanwhile, the research frontier of machine learning is all about trying to solve problems that previously haven't been solved, or solved as well, by alternative kinds of systems. This means there will always be a disconnect between machine learning research, which is trying to expand what it is possible to do with machine learning, and the question of which machine learning systems should, today, be deployed. Sometimes, research is done to develop technology that is not mature enough to deploy.

We should expect that a lot of ML research is done on things that should not ultimately be deployed! That’s because until we do the research, we may not understand the problem well enough to know the consequences of deployment. There’s a real sense in which ML research is about understanding the computational contours of a problem, whereas ML industry practice is about addressing the problems customers have with an efficient solution. Often this solution is a hybrid system in which ML only plays a small part; the use of ML here is really about a change in the institutional structure, not so much a part of what service is being delivered.

On the other hand, there have been a lot of cases–search engines and social media being important ones–where the scale of data and the use of ML for processing has allowed for a qualitatively different form of product or service. These are now the big deal companies we are constantly talking about. These are pretty clearly cases of successful ML.

by Sebastian Benthall at December 22, 2018 06:47 PM

computational institutions

As the “AI ethics” debate metastasizes in my newsfeed and scholarly circles, I’m struck by the frustrations of technologists and ethicists who seem to be speaking past each other.

While these tensions play out along disciplinary fault-lines, for example, between technologists and science and technology studies (STS), the economic motivations are more often than not below the surface.

I believe this is to some extent a problem of the nomenclature, which is again the function of the disciplinary rifts involved.

Computer scientists work, generally speaking, on the design and analysis of computational systems. Many see their work as bounded by the demands of the portability and formalizability of technology (see Selbst et al., 2019). That’s their job.

This is endlessly unsatisfying to critics of the social impact of technology. STS scholars will insist on changing the subject to “sociotechnical systems”, a term that means something very general: the assemblage of people and artifacts that are not people. This, fairly, removes focus from the computational system and embeds it in a social environment.

A goal of this kind of work seems to be to hold computational systems, as they are deployed and used socially, accountable. It must be said that once this happens, we are no longer talking about the specialized domain of computer science per se. It is a wonder why STS scholars are so often picking fights with computer scientists, when their true beef seems to be with businesses that use and deploy technology.

The AI Now Institute has attempted to rebrand the problem by discussing "AI Systems" as, roughly, those sociotechnical systems that use AI. This is on the one hand more specific–AI is a particular kind of technology, and perhaps it has particular political consequences. But their analysis of AI systems quickly overflows into sweeping claims about "the technology industry", and it's clear that most of their recommendations have little to do with AI, and indeed are trying, once again, to change the subject from discussion of AI as a technology (a computer science research domain) to a broader set of social and political issues that do, in fact, have their own disciplines where they have been researched for years.

The problem, really, is not that any particular conversation is not happening, or is being excluded, or is being shut down. The problem is that the engineering focused conversation about AI-as-a-technology has grown very large and become an awkward synecdoche for the rise of major corporations like Google, Apple, Amazon, Facebook, and Netflix. As these corporations fund and motivate a lot of research, there’s a question of who is going to get pieces of the big pie of opportunity these companies represent, either in terms of research grants or impact due to regulation, education, etc.

But there are so many aspects of these corporations that are addressed by neither the term "sociotechnical system", which is just so broad, nor "AI System", which is as broad and rarely means what you'd think it does (that the system uses AI is incidental if not unnecessary; what matters is that it's a company operating in a core social domain via primarily technological user interfaces). Neither of these gets at the unit of analysis that's really of interest.

An alternative: "computational institution". Computational, in the sense of computational cognitive science and computational social science: it denotes the essential role of theory of computation and statistics in explaining the behavior of the phenomenon being studied. "Institution", in the sense of institutional economics: the unit is a firm, which is composed of people, their equipment, and their economic relations to their suppliers and customers. An economic lens would immediately bring into focus "the data heist" and the "role of machines" that Nissenbaum is concerned are being left to the side.

by Sebastian Benthall at December 22, 2018 04:59 PM

December 20, 2018

Ph.D. student

Tensions of a Digitally-Connected World in Cricket Wireless’ Holiday Ad Campaign

In the spirit of taking a break over the holidays, this is more of a fun post with some very rough thoughts (though inspired by some of my prior work on paying attention to and critiquing narratives and futures portrayed by tech advertising). The basic version is that the Cricket Wireless 2018 holiday ad Four the Holidays (made by ad company Psyop) portrays a narrative that makes a slight critique of an always-connected world and suggests that physical face-to-face interaction is a more enjoyable experience for friends than digital sharing. While perhaps an over-simplistic critique of mobile technology use, the twin messages of "buy a wireless phone plan to connect with friends" and "try to disconnect to spend time with friends" highlight important tensions and contradictions present in everyday digital life.

But let’s look at the ad in a little more detail!

Last month, while streaming Canadian curling matches (it’s more fun than you might think, case in point, I’ve blogged about the sport’s own controversy with broom technology) there was a short Cricket ad playing with a holiday jingle. And I’m generally inclined to pay attention to an ad with a good jingle. Looking it up online brought up a 3 minute long short film version expanding upon the 15 second commercial (embedded above), which I’ll describe and analyze below.

It starts with Cricket’s animated characters Ramon (the green blob with hair), Dusty (the orange fuzzy ball), Chip (the blue square), and Rose (the green oblong shape) on a Hollywood set, “filming” the aforementioned commercial, singing their jingle:

The four, the merrier! Cricket keeps us share-ier!

Four lines of unlimited data, for a hundred bucks a month!

After their shoot is over, Dusty wants the group to watch fireworks from the Cricket water tower (which is really the Warner Brothers Studio water tower, though maybe we should call it Chekhov's water tower in this instance) on New Year's Eve. Alas, the crew has other plans, and everyone flies to their holiday destinations: Ramon to Mexico, Dusty to Canada, Chip to New York, and Rose to Aspen.

The video then shows each character enjoying the holidays in their respective locations with their smartphones. Ramon uses his phone to take pictures of food shared on a family table; Rose uses hers to take selfies on a ski lift.

The first hint that there might be a message critiquing an always-connected world is when the ad shows Dusty in a snowed-in, remote Canadian cabin. Presumably this tells us that he gets a cell signal up there, but in this scene, he is not using his phone. Rather, he’s making cookies with his two (human) nieces (not sure how that works, but I’ll suspend my disbelief), highlighting a face-to-face familial interaction using a traditional holiday group activity.

The second hint that something might not be quite right is the Dutch angle establishing shot of New York City in the next scene. The non-horizontal horizon line (which also evokes the off-balance establishing shot of New York from an Avengers: Infinity War trailer) visually puts the scene off balance. But the moment quickly passes, as we see Chip on the streets of New York taking Instagram selfies.

2 Dutch angles of New York

Dutch angle of New York from Cricket Wireless’ “Four the Holidays” (left) and Marvel’s Avengers Infinity War (right)

Then comes a rapid montage of photos and smiling selfies that the group is sending and sharing with each other, in a sort of digital self-presentation utopia. But as the short film has been hinting at, this utopia is not reflective of the characters’ lived experience.

The video cuts to Dusty, skating alone on a frozen pond, who successfully completes a trick but then realizes that he has no one to share the moment with. He then sings "The four the merrier, Cricket keeps us share-ier" in a minor key as he re-envisions clouds in the sky as the forms of the four friends. The minor key and Dusty's singing show skepticism toward the lyrics' claim that being share-ier is indeed merrier.

The minor key continues, as Ramon sings while envisioning a set of holiday lights as the four friends, and Rose sees a department store window display as the four friends. Chip attends a party where the Cricket commercial (from the start of the video) airs on a TV, but is still lonely. Chip then hails a cab, dramatically stating in a deep voice “Take me home.”

In the last scene, Chip sits atop the Cricket Water Tower (or, Chekhov's Water Tower returns!) at 11:57pm on New Year's Eve, staring alone at his phone, discontent. This is the clearest signal about the lack of fulfillment he finds from his phone, and by extension, the digitally mediated connection with his friends.

Immediately this is juxtaposed with Ramon singing with his guitar from the other side of the water tower, still in the minor key. Chip hears him and immediately becomes happier, and the music shifts to a major key as Rose and Dusty enter; the tempo picks up, and the drums and an orchestra of instruments join in. And the commercial ends with the four of them watching New Year's fireworks together. It's worth noting the lyrics at the end:

Ramon: The four the merrier…

Chip [spoken]: Ramon?! You’re here!

Rose: There’s something in the air-ier

All: That helps us connect, all the season through. The four, the merrier

Dusty: One’s a little harrier (So hairy!)

All: The holidays are better, the holidays are better, the holidays are better with your crew.

Nothing here is explicitly about Cricket Wireless, or the value of being digitally connected. It's also worth noting that the phone that Chip was previously staring at is nowhere to be found after he sees Ramon. There is some ambiguous use of the word "connect," which could refer to either a face-to-face interaction or a digitally mediated one, but the tone of the scene and the emotional storyline bringing the four friends physically together seem to suggest that "connect" refers to the value of face-to-face interaction.

So what might this all mean (beyond the fact that I’ve watched this commercial too many times and have the music stuck in my head)? Perhaps the larger and more important point is that the commercial/short film is emblematic of a series of tensions around connection and disconnection in today’s society. Being digitally connected is seen as a positive that allows for greater opportunity (and greater work output), but at the same time discontent is reflected in culture and media, ranging from articles on tech addiction, to guides on grayscaling iPhones to combat color stimulation, to disconnection camps. There’s also a moralizing force behind these tensions: to be a good employee/student/friend/family member/etc, we are told that we must be digitally connected and always-on, but at the same time, we are told that we must also be dis-connected or interact face-to-face in order to be good subjects.

In many ways, the tensions expressed in this video — an advertisement for a wireless provider trying to encourage customers to sign up for their wireless plans, while presenting a story highlighting the need to digitally disconnect — parallels the tensions that Ellie Harmon and Melissa Mazmanian find in their analysis of media discourse of smartphones: that there is both a push for individuals to integrate the smartphone into everyday life, and to dis-integrate the smartphone from everyday life. What is fascinating to me here is that this video from Cricket exhibits both of those ideas at the same time. As Harmon and Mazmanian write,

The stories that circulate about the smartphone in American culture matter. They matter for how individuals experience the device, the ways that designers envision future technologies, and the ways that researchers frame their questions.

While Four the Holidays doesn't tell the most complex or nuanced story about connectivity and smartphone use, the narrative that Cricket and Psyop created veers away from a utopian imagining of the world with tech, and instead begins to reflect some of the inherent tensions and contradictions of smartphone use and mobile connectivity that are experienced as a part of everyday life.

by Richmond at December 20, 2018 05:36 AM

December 19, 2018

Ph.D. student

The politics of AI ethics is a seductive diversion from fixing our broken capitalist system

There is a lot of heat these days in the tech policy and ethics discourse. There is an enormous amount of valuable work being done on all fronts. And yet there is also sometimes bitter disciplinary infighting and political intrigue about who has the moral high ground.

The smartest thing I’ve read on this recently is Irina Raicu’s “False Dilemmas” piece, where she argues:

  • "Tech ethics" research, including research exploring the space of ethics in algorithm design, is really code for industry self-regulation
  • Industry self-regulation and state regulation are complementary
  • Any claim that "the field" is dominated by one perspective or agenda or another is overstated

All this sounds very sane but it doesn’t exactly explain why there’s all this heated discussion in the first place. I think Luke Stark gets it right:

But what does it mean to say “the problem is mostly capitalism”? And why is it impolite to say it?

To say “the problem [with technology ethics and policy] is capitalism” is to note that most if not all of the social problems we associate with today’s technology have been problems with technology ever since the industrial revolution. For example, James Beniger‘s The Control Revolution, Horkheimer‘s Eclipse of Reason, and so on all speak to the tight link that there has always been between engineering and the capitalist economy as a whole. The link has persisted through the recent iterations of recognizing first data science, then later artificial intelligence, as disruptive triumphs of engineering with a variety of problematic social effects. These are old problems.

It’s impolite to say this because it cuts down on the urgency that might drive political action. More generally, it’s an embarrassment to anybody in the business of talking as if they just discovered something, which is what journalists and many academics do. The buzz of novelty is what gets people’s attention.

It also suggests that the blame for how technology has gone wrong lies with capitalists, meaning, venture capitalists, financiers, and early stage employees with stock options. But also, since it’s the 21st century, pension funds and university endowments are just as much a part of the capitalist investing system as anybody else. In capitalism, if you are saving, you are investing. Lots of people have a diffuse interest in preserving capitalism in some form.

There’s a lot of interesting work to be done on financial regulation, but it has very little to do with, say, science and technology studies and consumer products. So to acknowledge that the problem with technology is capitalism changes the subject to something remote and far more politically awkward than to say the problem is technology or technologists.

As I’ve argued elsewhere, a lot of what’s happening with technology ethics can be thought of as an extension of what Nancy Fraser called progressive neoliberalism: the alliance of neoliberalism with progressive political movements. It is still hegemonic in the smart, critical, academic and advocacy scene. Neoliberalism, or what is today perhaps better characterized as finance capitalism or surveillance capitalism, is what is causing the money to be invested in projects that design and deploy technology in certain ways. It is a system of economic distribution that is still hegemonic.

Because it’s hegemonic, it’s impolite to say so. So instead a lot of the technology criticism gets framed in terms of the next available moral compass, which is progressivism. Progressivism is a system of distribution of recognition. It calls for patterns of recognizing people for their demographic and, because it’s correlated in a sensitive way, professional identities. Nancy Fraser’s insight is that neoliberalism and progressivism have been closely allied for many years. One way that progressivism is allied with neoliberalism is that progressivism serves as a moral smokescreen for problems that are in part caused by neoliberalism, preventing an effective, actionable critique of the root cause of many technology-related problems.

Progressivism encourages political conflict to be articulated as an "us vs. them" problem of populations and their attitudes, rather than as a problem of institutions and their design. This "us versus them" framing is nowhere more baldly stated than in the 2018 AI Now Report:

The AI accountability gap is growing: The technology scandals of 2018 have shown that the gap between those who develop and profit from AI—and those most likely to suffer the consequences of its negative effects—is growing larger, not smaller. There are several reasons for this, including a lack of government regulation, a highly concentrated AI sector, insufficient governance structures within technology companies, power asymmetries between companies and the people they serve, and a stark cultural divide between the engineering cohort responsible for technical research, and the vastly diverse populations where AI systems are deployed. (Emphasis mine)

There are several institutional reforms called for in the report, but because of its focus on a particular sector that it constructs as "the technology industry", composed of many "AI systems", it cannot address broader economic issues such as unfair taxation or gerrymandering. Discussion of the overall economy is absent from the report; it is not the cause of anything. Rather, the root cause is a schism between kinds of people. The moral thrust of this claim hinges on the implied progressivism: the AI/tech people, who are developing and profiting, are a culture apart. The victims are "diverse", and yet paradoxically unified in their culture as not the developers. This framing depends on the appeal of progressivism as a unifying culture whose moral force is due in large part to its diversity. The AI developer culture is a threat in part because it is separate from diverse people–code for its being white and male.

This thread continues throughout the report as various critical perspectives are cited. For example:

A second problem relates to the deeper assumptions and worldviews of the designers of ethical codes in the technology industry. In response to the proliferation of corporate ethics initiatives, Greene et al. undertook a systematic critical review of high-profile “vision statements for ethical AI.” One of their findings was that these statements tend to adopt a technologically deterministic worldview, one where ethical agency and decision making was delegated to experts, “a narrow circle of who can or should adjudicate ethical concerns around AI/ML” on behalf of the rest of us. These statements often assert that AI promises both great benefits and risks to a universal humanity, without acknowledgement of more specific risks to marginalized populations. Rather than asking fundamental ethical and political questions about whether AI systems should be built, these documents implicitly frame technological progress as inevitable, calling for better building.

That systematic critical reviews find corporate policy statements to express self-serving views that ultimately promote the legitimacy of the corporate efforts is a surprise to no one; it is no more a surprise than the fact that critical research institutes staffed by lawyers and soft social scientists write reports recommending that their expertise is vitally important for society and justice. As has been the case in every major technology and ethics scandal for years, the first thing the commentariat does is publish a lot of pieces justifying their own positions and, if they are brave, arguing that other people are getting too much attention or money. But since everybody in either business depends on capitalist finance in one way or another, the economic system is not subject to critique. In other words, one can’t argue that industrial visions of ‘ethical AI’ are favorable to building new AI products because they are written in service to capitalist investors who profit from the sale of new AI products. Rather, one must argue that they are written in this way because the authors have a weird technocratic worldview that isn’t diverse enough. One can’t argue that commercial AI products neglect marginal populations because these populations have less purchasing power; one has to argue that the marginal populations are not represented or recognized enough.

And yet, the report paradoxically both repeatedly claims that AI developers are culturally and politically out of touch and lauds the internal protests at companies like Google that have exposed wrongdoing within those corporations. The actions of “technology industry” employees belie the idea that the problem is mainly cultural; there is a managerial profit-making impulse that is, in large, stable companies in particular, distinct from that of the rank-and-file engineer. This can be explained in terms of corporate incentives and so on, and indeed the report does in places call for whistleblower protections and labor organizing. But these calls for change cut against and contradict other politically loaded themes.

There are many different arguments contained in the long report; it is hard to find a reasonable position that has been completely omitted. But as a comprehensive survey of recent work on ethics and regulation in AI, its biases and blind spots are indicative of the larger debate. The report concludes with a call for a change in the intellectual basis for considering AI and its impact:

It is imperative that the balance of power shifts back in the public’s favor. This will require significant structural change that goes well beyond a focus on technical systems, including a willingness to alter the standard operational assumptions that govern the modern AI industry players. The current focus on discrete technical fixes to systems should expand to draw on socially-engaged disciplines, histories, and strategies capable of providing a deeper understanding of the various social contexts that shape the development and use of AI systems.

As more universities turn their focus to the study of AI’s social implications, computer science and engineering can no longer be the unquestioned center, but should collaborate more equally with social and humanistic disciplines, as well as with civil society organizations and affected communities. (Emphasis mine)

The “technology ethics” field is often construed, in this report but also in the broader conversation, as a site of tension between computer science on the one hand and socially engaged, humanistic disciplines on the other. For example, Selbst et al.’s “Fairness and Abstraction in Sociotechnical Systems” presents a thorough account of the pitfalls of computer science’s approach to fairness in machine learning, and proposes a Science and Technology Studies lens as the corrective. The refrain is that by considering more social context, more nuance, and so on, STS and the humanistic disciplines avoid the problems that engineers, who try to provide portable, formal solutions, don’t want to address. As the AI Now report frames it, a benefit of the humanistic approach is that it brings the diverse non-AI populations to the table, shifting the balance of power back to the public. STS and related disciplines claim the status of relevant expertise in matters of technology that is somehow not the kind of expertise that is alienating or inaccessible to the public, unlike engineering, which allegedly dominates the higher education system.

I am personally baffled by these arguments; so often they appear to conflate academic disciplines with business practices in ways that most practitioners I engage with would not endorse. (Try asking an engineer how much they learned in school, versus on the job, about what it’s like to work in a corporate setting.) But beyond the strange extrapolation from academic disciplinary disputes (which are so often about the internal bureaucracies of universities that it is, I’d argue after learning the hard way, unwise to take them seriously from either an intellectual or political perspective), there is also a profound absence of some fields from the debate, as framed in these reports.

I’m referring to the quantitative social sciences, such as economics and quantitative sociology, or what might more generally be described as the fields converging on computational social science. These are the disciplines one would need in order to understand the large-scale, systemic impact of technology on people, including the ways costs and benefits are distributed. These disciplines deal with social systems, technology included (there is a long tradition within economics of studying the relationship between people, goods, and capital that never once requires the term “sociotechnical”), in a systematic way that can be used to predict the impact of policy. They can also connect, through applications in business and finance, the ways that capital flows and investment drive technology design decisions and corporate competition.

But these fields are awkwardly placed in technology ethics and politics. They don’t fit into the engineering vs. humanities dichotomy that entrances so many graduate students in this field. They often invoke mathematics, which makes them another form of suspicious, alien, insufficiently diverse expertise. And yet, it may be that these fields are the only ones that can correctly diagnose the problems caused by technology in society. In a sense, the progressive framing makes technology’s ills a problem of social context because it is unequipped to address them as a problem of economic context, and it would not want to know that it is an economic problem anyway, for two somewhat opposed reasons: (a) acknowledging the underlying economic problems is taboo under hegemonic neoliberalism, and (b) it upsets the progressive view that the more popularly accessible (and, by virtue of how they are constituted, more diverse) humanistic fields need to be recognized as much as fields of narrow expertise. There is no credence given to the idea that narrow and mathematized expertise might actually be especially well-suited to understanding what the hell is going on, and that this is precisely why members of these fields are so highly sought after by investors to work at their companies. (Consider, for example, who would be best positioned to analyze the “full stack supply chain” of artificial intelligence systems, as called for by the AI Now report: sociologists, electrical engineers trained in the power use and design of computer chips, or management science/operations research types whose job is to optimize production given the many inputs and contingencies of chip manufacture?)

At the end of the day, the problem with the “technology ethics” debate is a dialectic cycle whereby (a) basic research is done by engineers, (b) that basic research is developed in a corporate setting as a product funded by capitalists, (c) that product raises political hackles and makes the corporations a lot of money, (d) humanities scholars escalate the political hackles, (e) basic researchers try to invent some new basic research because the politics have created more funding opportunities, (f) corporations do some PR work trying to CYA and engage in self-regulation to avoid litigation, (g) humanities scholars, loath to cede the moral high ground, insist the scientific research is inadequate and that the corporate PR is bull. But this cycle is not necessarily productive. Rather, it sustains itself as part of a larger capitalist system that is bigger than any of these debates, structures its terms, and controls all sides of the dialog. Meanwhile the experts on how that larger system works are silent or ignored.

References

Fraser, Nancy. “Progressive neoliberalism versus reactionary populism: A choice that feminists should refuse.” NORA-Nordic Journal of Feminist and Gender Research 24.4 (2016): 281-284.

Greene, Daniel, Anna Lauren Hoffmann, and Luke Stark. “Better, Nicer, Clearer, Fairer: A Critical Assessment of the Movement for Ethical Artificial Intelligence and Machine Learning.” Hawaii International Conference on System Sciences (HICSS). 2019.

Raicu, Irina. “False Dilemmas”. 2018.

Selbst, Andrew D., et al. “Fairness and Abstraction in Sociotechnical Systems.” ACM Conference on Fairness, Accountability, and Transparency (FAT*). 2018.

Whittaker, Meredith et al. “AI Now Report 2018”. 2018.

by Sebastian Benthall at December 19, 2018 04:54 AM

December 14, 2018

Ph.D. student

The secret to social forms has been in institutional economics all along?

A long-standing mystery for me has been about the ontology of social forms (1) (2): under what conditions is it right to call a particular assemblage of people a thing, and why? Most people don’t worry about this; in literatures I’m familiar with it’s easy to take a sociotechnical complex or assemblage, or a company, or whatever, as a basic unit of analysis.

A lot of the trickiness comes from thinking about this as a problem of identifying social structure (Sawyer, 2000; Cederman, 2005). This implies that people are in some sense together and obeying shared norms, and raises questions about whether those norms exist in their own heads or not, and so on. So far I haven’t seen a lot that really nails it.

But what if the answer has been lurking in institutional economics all along? The “theory of the firm” is essentially a question of why a particular social form–the firm–exists as opposed to a bunch of disorganized transactions. The answers that have come up are quite good.

Take for example Holmstrom (1982), who argues that in a situation where collective outcomes depend on individual efforts, individuals will be tempted to free-ride. That makes it beneficial to have somebody monitor the activities of the other people and have their utility be tied to the net success of the organization. That person becomes the owner of the company, in a capitalist firm.
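
To make the free-rider logic concrete, here is a minimal numeric sketch in Python. The payoffs and team size are my own illustrative assumptions, not Holmstrom’s formal model; the point is only that when output is shared equally and effort is privately costly, shirking is individually rational even when effort is socially efficient.

# Minimal sketch of free-riding in team production (illustrative numbers, not Holmstrom's model).
# Each of n workers chooses effort 0 or 1; output is shared equally; effort costs the worker c.

n = 4          # team size (assumption)
c = 1.0        # private cost of exerting effort (assumption)
value = 3.0    # output produced per unit of effort (assumption)

def payoff(my_effort, others_effort):
    """Equal-share payoff to one worker, net of their own effort cost."""
    total_effort = my_effort + others_effort
    return (value * total_effort) / n - c * my_effort

others = 3  # suppose all three teammates work
print("work: ", payoff(1, others))   # 3.0 * 4 / 4 - 1 = 2.0
print("shirk:", payoff(0, others))   # 3.0 * 3 / 4     = 2.25 -> shirking pays

Tying a residual claimant’s payoff to total output, rather than to an equal share, is the monitoring role the owner plays in Holmstrom’s account.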

What’s nice about this example is that it explains social structure based on an efficiency argument; we would expect organizations shaped like this to be bigger and command more resources than others that are less well organized. And indeed, we have many enormous hierarchical organizations in the wild to observe!

Another theory of the firm is Williamson’s transaction cost economics (TCE) theory, which is largely about the make-or-buy decision. If the transaction between a business and its supplier has “asset specificity”, meaning that the asset being traded is specific to the two parties and their transaction, then any investment from either party will induce a kind of ‘lock-in’ or ‘switching cost’ or, in Williamson’s language, a ‘bilateral dependence’. The more of that dependence, the more a free market relationship between the two parties will expose them to opportunistic hazards. Hence, complex contracts, or in the extreme case outright ownership and internalization, tie the firms together.

I’d argue: bilateral dependence and the complex ‘contracts’ that connect entities are very much the stuff of “social forms”. Cooperation between people is valuable; the relation between people who cooperate is valuable as a consequence; and so both parties are ‘structurated’ (to mangle a Giddens term) individually into maintaining the reality of the relation!

References

Cederman, L.E., 2005. Computational models of social forms: Advancing generative process theory 1. American Journal of Sociology, 110(4), pp.864-893.

Holmstrom, Bengt. “Moral hazard in teams.” The Bell Journal of Economics (1982): 324-340.

Sawyer, R. Keith. “Simulating emergence and downward causation in small groups.” Multi-agent-based simulation. Springer Berlin Heidelberg, 2000. 49-67.

Williamson, Oliver E. “Transaction cost economics.” Handbook of new institutional economics. Springer, Berlin, Heidelberg, 2008. 41-65.

by Sebastian Benthall at December 14, 2018 04:04 AM

December 09, 2018

Ph.D. student

Transaction cost economics and privacy: looking at Hoofnagle and Whittington’s “Free”

As I’ve been reading about transaction cost economics (TCE) and independently scrutinizing the business model of search engines, it stands to reason that I should look to the key paper holding down the connection between TCE and privacy, Hoofnagle and Whittington’s “Free: Accounting for the Costs of the Internet’s Most Popular Price” (2014).

I want to preface the topic by saying I stand by what I wrote earlier: that at the heart of what’s going on with search engines, you have a trade of attention; it requires imagining that the user has attention-time as a scarce resource. The user has a query and has the option to find material relevant to the query in a variety of ways (like going to a library). Often (!) they will do so in a way that costs them as little attention as possible: they use a search engine, which gives an almost instant and often high-quality response; they are also shown advertisements which consume some small amount of their attention, but less than they would expend searching through other means. Advertisers pay the search engine for this exposure to the user’s attention, which funds the service that is “free”, in dollars (but not in attention), to the users.

Hoofnagle and Whittington make a very different argument about what’s going on with “free” web services, which includes free search engines. They argue that the claim that these web services are “free” is deceptive because the user may incur costs after the transaction on account of potential uses of their personal data. An example:

The freemium business model Anderson refers to is popular among industries online. Among them, online games provide examples of free services with hidden costs. By prefacing play with the disclosure of personal identification, the firms that own and operate games can contact and monitor each person in ways that are difficult for the consumer to realize or foresee. This is the case for many games, including Disney’s “Club Penguin,” an entertainment website for children. After providing personal information to the firm, consumers of Club Penguin receive limited exposure to basic game features and can see numerous opportunities to enrich their play with additional features. In order to enrich the free service, consumers must buy all sort of enhancements, such as an upgraded igloo or pets for one’s penguin. Disney, like others in the industry, places financial value on the number of consumers it identifies, the personal information they provide, and the extent to which Disney can track consumer activity in order to modify the game and thus increase the rate of conversion of consumers from free players to paying customers.

There are a number of claims here. Let’s enumerate them:

  1. This is an example of a ‘free’ service with hidden costs to users.
  2. The consumer doesn’t know what the game company will do with their personal information.
  3. In fact, the game will use the personal information to personalize pitches for in-game purchases that ‘enrich’ the free service.
  4. The goal of the company is to convert free players to paying customers.

Working backwards, claim (4) is totally true. The company wants to make money by getting their customers to pay, and they will use personal information to make paying attractive to the customers (3). But this does not mean that the customer is always unwitting. Maybe children don’t understand the business model when they begin playing Penguin Club, but especially today parents certainly do. App Stores, for example, now label apps when they have “in-app purchases”, which is a pretty strong signal. Perhaps this is a recent change due to some saber rattling by the FTC, which to be fair would be attributable as a triumph to the authors if this article had influence on getting that to happen. On the other hand, this is a very simple form of customer notice.

I am not totally confident that, even if (2), (3), and (4) are true, they entail (1), that there are “hidden costs” to free services. Elsewhere, Hoofnagle and Whittington raise more convincing examples of “costs” of releasing PII, including being denied a job and resolving identity theft. But being convincingly sold an upgraded igloo for your digital penguin seems so trivial. Even if it’s personalized, how could it be a hidden cost? It’s a separate transaction, no? Do you or do you not buy the igloo?

Parsing this through requires, perhaps, a deeper look at TCE. According to TCE, agents are boundedly rational (they can’t know everything) and opportunistic (they will make an advantageous decision in the moment). Meanwhile, the world is complicated. These conditions imply that there’s a lot of uncertainty about future behavior, as agents will act strategically in ways that they can’t themselves predict. Nevertheless, agents engage in contracts with some kinds of obligations in them in the course of a transaction. TCE’s point is that these contracts are always incomplete, meaning that there are always uncertainties left unresolved in contracts that will need to be negotiated in certain contingent cases. All these costs of drafting, negotiating, and safeguarding the agreement are transaction costs.

Take an example of software contracting, which I happen to know about from personal experience. A software vendor gets a contract from a client to do some customization. The client and the vendor negotiated some sort of scope of work ex ante. But always(!), the client doesn’t actually know what they want, and if the vendor delivers on the specification literally the client doesn’t like it. Then begins the ex post negotiation as the client tries to get the vendor to tweak the system into something more usable.

Software contracting often resolves this by getting off the fixed-cost contracting model and onto a time-and-materials contract that allows billing by hours of developer time. Alternatively, the vendor can internalize the costs into the contract by inflating the cost “estimates” to cover for contingencies. In general, this all amounts to having more contract and a stronger relationship between the client and vendor, a “bilateral dependency” which TCE sees as a natural evolution of the incomplete contract under several common conditions, like “asset specificity”, which means that the asset is specialized to a particular transaction (or the two agents involved in it). Another term for this is lock-in, or the presence of high switching costs, though this way of thinking about it reintroduces the idea of a classical market for essentially comparable goods and services, an idea that TCE is designed to move beyond. This explains how technical dependencies of an organization become baked in more or less constitutionally as part of the organization, leading to the robustness of the installed base of a computing platform over time.

This ebb and flow of contract negotiation with software vendors was a bit unsettling to me when I first encountered it on the job, but I think it’s safe to say that most people working in the industry accept this as How Things Work. Perhaps it’s the continued influence of orthodox economics that makes this all seem inefficient somehow, and TCE is the right way to conceptualize things that makes better sense of reality.

But back to the Penguins…

Hoofnagle and Whittington make the case that sharing PII with a service that then personalizes its offerings to you creates a kind of bilateral dependence between service and user. They also argue that loss of privacy, due to the many possible uses of this personal information (some nefarious), is a hidden cost that can be thought of as an ex post transaction cost that is a hazard because it has not been factored into the price ex ante. The fact that this data is valuable to the platform/service for paying their production costs, which is not part of the “free” transaction, is an indication that this data is a lot more valuable than consumers think it is.

I am still on the fence about this.

I can’t get over the feeling that successfully selling a user a personalized, upgraded digital igloo is such an absurd example of a “hidden cost” that it belies the whole argument that these services have hidden costs.

Splitting hairs perhaps, it seems reasonable to say that Penguin Club has a free version, which is negotiated as one transaction. Then, conditional on the first transaction, it offers personalized igloos for real dollars. This purchase, if engaged in, would be another, different transaction, not an ex post renegotiation of the original contract with Disney. This small difference changes the cost of the igloo from a hidden transaction cost into a normal, transparent cost. So it’s no big deal!

Does the use of PII create a bilateral dependence between Disney and the users of Penguin Club? Yes, in a sense. Any application of attention to an information service, learning how to use it and getting it to be part of your life, is in a sense a bilateral dependence with a switching cost. But there are so many other free games to play on the internet that these costs seem hardly hidden. They could just be understood as part of the game. Meanwhile, we are basically unconcerned with Disney’s “dependence” on the consumer data, because Disney can get new users easily (unless the user is a “whale”, who actually pays the company). Any “dependence” Disney has on particular users is a hidden cost for Disney, not for the user, and who cares about Disney.

The cases of identity theft or job loss are strange cases that seem to have more to do with freaky data reuse than with what’s going on in a particular transaction. Purpose-binding notices and restrictions, which are becoming the norm through generalized GDPR compliance, seem adequate to deal with these cases.

So, I have two conclusions:

(1) Maybe TCE is the right lens for making an economic argument for why purpose binding restrictions are a good idea. They make transactions with platforms less incomplete, avoiding the moral hazard of ex post use of data in ways that incurs asymmetrically unknown effects on users.

(2) This TCE analysis of platforms doesn’t address the explanatorily powerful point that attention is part of the trade. In addition to being concretely what the user is “giving up” to the platform and directly explaining monetization in some circumstances, the fact that attention is “sticky” and creates some amount of asset-specific learning is a feature of the information economy more generally. Maybe it needs a closer look.

References

Hoofnagle, Chris Jay, and Jan Whittington. “Free: accounting for the costs of the internet’s most popular price.” UCLA L. Rev. 61 (2013): 606.

by Sebastian Benthall at December 09, 2018 09:01 PM

December 06, 2018

Ph.D. student

Data isn’t labor because using search engines is really easy

A theme I’ve heard raised in a couple places recently, including Arrieta-Ibarra et al.’s “Should We Treat Data As Labor?” and the AI Now 2018 Report, is that there is something wrong with how “data”, particularly data “produced” by people on the web, is conceptualized as part of the economy. Creating data, the argument goes, requires labor. And as the product of labor, it should be protected according to the values and practices of labor movements in the past. In particular, the current uses of data in, say, targeted advertising, social media, and search, are exploitative; the idea that consumers ‘pay’ for these services with their data is misleading and ultimately unfair to the consumer. Somehow the value created by the data should be reapportioned back to the user.

This is a sexy and popular argument among a certain subset of intellectuals who care about these things. I believe the core emotional appeal of this proposal is this: It is well known that a few search engine and social media companies, namely Google and Facebook, are rich. If the value added by user data were in part returned to the users, the users, who compared to Google and Facebook are not rich, would get something they otherwise would not get. I.e., the benefit of recognizing the labor involved in creating data is redistribution of surplus to The Rest of Us.

I don’t have a problem personally with that redistributive impulse. However, I don’t think the “data is labor” argument actually makes much sense.

Why not? Well, let’s take the example of a search engine. Here is the transaction between a user and a search engine:

  • Alice types a query, “avocado toast recipes”, into the search engine. This submits data to the company computers.
  • The company computers use that data to generate a list of results that it deems relevant to that query.
  • Alice sees the results, and maybe clicks on one or two of them, if they are good, in the process of navigating to the thing she was looking for in the first place.
  • The search engine records that click as well, in order to better calibrate how to respond to others making that query.

We might forget that the search engine is providing Alice a service and isn’t just a ubiquitous part of the infrastructure we should take for granted. The search engine has provided Alice with relevant search results. What this does is (dramatically) reduce Alice’s search costs; had she tried to find the relevant URL by asking her friends, organically surfing the web, or using the library, who knows what she would have found or how long it would take her. But we would assume that Alice is using the search engine because it gets her more relevant results, faster.

It is not clear how Alice could get this thing she wants without going through the motions of typing and clicking and submitting data. These actions all seem like a bare minimum of what is necessary to conduct this kind of transaction. Similarly, when I got to a grocery store and buy vegetables, I have to get out my credit card and swipe it at the machine. This creates data–the data about my credit card transaction. But I would never advocate for recognizing my hidden labor at the credit card machine is necessary to avoid the exploitation of the credit card companies, who then use that information to go about their business. That would be insane.

Indeed, it is a principle of user interface design that the most compelling user interfaces are those that require the least effort from their users. Using search engines is really, really easy because they are designed that way. The fact that oodles of data are collected from a person without that person exerting much effort may be problematic in a lot of ways. But it’s not problematic because it’s laborious for the user; it is designed and compelling precisely because it is labor-saving. The smart home device industry has taken this even further, building voice-activated products for people who would rather not use their hands to input data. That is, if anything, less labor for the user, but more data and more processing on the automated part of the transaction. That the data means more work for the company, and less work for the user, indicates that data is not the same thing as user labor.

There is a version of this argument that brings up feminism. Women’s labor, feminists point out, has long been insufficiently recognized and not properly remunerated. For example, domestic labor traditionally performed by women has been taken for granted, and emotional labor (the work of controlling one’s emotions on the job), which has often been feminized, has not been taken seriously enough. This is a problem, and the social cause of recognizing women’s labor and rewarding it is, ceteris paribus, a great thing. But, and I know I’m on dicey ground here, so bear with me, this does not mean that everything that women do that they are not paid to do is unrecognized labor in the sense that is relevant for feminist critiques. Case in point, both men and women use credit cards to buy things, and make telephone calls, and drive vehicles through toll booths, and use search engines, and do any number of things that generate “data”, and in most of these cases it is not remunerated directly; but this lack of remuneration isn’t gendered. I would say, perhaps controversially, that the feminist critique does not actually apply to the general case of user-generated data much at all! (Though it may apply in specific cases that I haven’t thought of.)

So in conclusion, data isn’t labor, and labor isn’t data. They are different things. We may want a better, more just, political outcome with respect to the distribution of surplus from the technology economy. But trying to get there through an analogy between data and labor is a kind of incoherent way to go about it. We should come up with a better, different way.

So what’s a better alternative? If the revenue streams of search engines are any indication, then it would seem that users “pay” for search engines through being exposed to advertising. So the “resource” that users are giving up in order to use the search engine is attention, or mental time; hence the term, attention economy.

Framing the user cost of search engines in terms of attention does not easily lend itself to an argument for economic reform. Why? Because search engines are already saving people a lot of that attention by making it so easy to look stuff up. Really the transaction looks like:

  • Alice pays some attention to Gob (the search engine).
  • Gob gives Alice some good search results back in return, and then…
  • Gob passes on some of Alice’s attention through to Bob, the advertiser, in return for money.

So Alice gives up attention but gets back search results and the advertisement. Gob gets money. Bob gets attention. The “data” that matters is not the data transmitted from Alice’s computer up to Gob. Rather, the valuable data is the data that Alice receives through her eyes: of this data, the search results are positively valued, the advertisement is negatively valued, but the value of the bundled good is net positive.
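
A toy accounting makes the bundling point concrete. All of the numbers below are assumptions of mine, chosen only to show how the bundle can be net positive for Alice while still funding Gob:

# Toy accounting of the attention trade; every number is an assumption.
value_of_results  = 5.0   # what the relevant results are worth to Alice
attention_cost_ad = 0.5   # attention Alice gives up to the advertisement
ad_price          = 0.3   # what Bob pays Gob for that exposure

alice_surplus = value_of_results - attention_cost_ad  # 4.5 > 0: the bundle is net positive
gob_revenue   = ad_price                               # money that funds the "free" service
print(alice_surplus, gob_revenue)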

If there is something unjust about this economic situation, it has to be due to the way consumers’ attention is being managed by Gob. Interestingly, those who have studied the value of ‘free’ services in attentional terms have chalked up a substantial consumer surplus due to saved attention (Brynjolfsson and Oh, 2012). This appears to be the perspective of management scientists, who tend to be pro-business, and is not a point repeated often by legal scholars, who tend to be more litigious in outlook. For example, legal scholarship has detailed how attention could be abused through digital market manipulation (Calo, 2013).

Ironically for data-as-labor theorists, the search-engine-as-liberator-of-attention argument could be read as the view that what people get from using search engines is more time, or more ability to do other things with their time. In other words, we would use a search engine instead of some other, more laborious discovery mechanism precisely because it would cost us net negative labor. That absolutely throws a wrench in any argument that the users of search engines should be rewarded on dignity of labor grounds. Instead, what’s happened is that search engines are ubiquitous because consumers have undergone a phase transition in their willingness to work to discover things, and now very happily use search engines which, on the whole, seem like a pretty good deal! (The cost of being-advertised-to is small compared to the benefits of the search results.)

If we start seeing search engines as a compelling labor-saving device rather than an exploiter of laborious clickwork, then some of the disregard consumers have for privacy on search engines becomes more understandable. People are willing to give up their data, even if they would rather not, because search engines are saving them so much time. The privacy harms that come as a consequence, then, can be seen as externalities to what is essentially a healthy transaction, rather than a perverse matter of a business model that is evil to the bone.

This is, I wager, on the whole a common sense view, one that I’d momentarily forgotten because of my intellectual milieu but now am ashamed to have overlooked. It is, on the whole, far more optimistic than other attempts to characterize the zeitgeist of the new technology economy.

Somehow, this rubric for understanding the digital economy appears to have fallen out of fashion. Davenport and Beck (2001) wrote a business book declaring attention to be “the new currency of business”, which if the prior analysis is correct makes more sense than data being the new currency (or oil) of business. The term appears to have originated in an article by Goldhaber (1997). Ironically, the term appears to have had no uptake in the economics literature, despite it being the key to everything! The concept was understood, however, by Herbert Simon, in 1971 (see also Terranova, 2012):

In an information-rich world, the wealth of information means a dearth of something else: a scarcity of whatever it is that information consumes. What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.

(A digitized version of this essay, which amazingly appears to be set by a typewriter and then hand-edited (by Simon himself?) can be found here.)

This is where I bottom out–the discovery that the line of thought I’ve been on all day starts with Herbert Simon, that the sciences of the artificial are not new, they are just forgotten (because of the glut of other information) and exhaustingly hyped. The attention economy discovered by Simon explains why each year we are surrounded with new theories about how to organize ourselves with technology, when perhaps the wisest perspectives on these topics are ones that will not hype themselves because their authors cannot tweet from the grave.

References

Arrieta-Ibarra, Imanol, et al. “Should We Treat Data as Labor? Moving Beyond ‘Free’.” AEA Papers and Proceedings. Vol. 108. 2018.

Brynjolfsson, Erik, and JooHee Oh. “The attention economy: measuring the value of free digital services on the Internet.” (2012).

Calo, Ryan. “Digital market manipulation.” Geo. Wash. L. Rev. 82 (2013): 995.

Davenport, Thomas H., and John C. Beck. The attention economy: Understanding the new currency of business. Harvard Business Press, 2001.

Goldhaber, Michael H. “The attention economy and the net.” First Monday 2.4 (1997).

Simon, Herbert A. “Designing organizations for an information-rich world.” (1971): 37-72.

Terranova, Tiziana. “Attention, economy and the brain.” Culture Machine 13 (2012).

by Sebastian Benthall at December 06, 2018 08:23 PM

December 04, 2018

Ph.D. student

the make-or-buy decision (TCE) in software and cybersecurity

The paradigmatic case of transaction cost economics (TCE) is the make-or-buy decision. A firm, F, needs something, C. Do they make it in-house or do they buy it from somewhere else?

If the firm makes it in-house, they will incur some bureaucratic overhead costs in addition to the costs of production. But they will also be able to specialize C for their purposes. They can institute their own internal quality controls. And so on.

If the firm buys it on the open market from some other firm, say, G, they don’t pay the overhead costs. They do lose the benefits of specialization, and the quality controls are only those based on economic competitive pressure on suppliers.

There is an intermediate option, which is a contract between F and G which establishes an ongoing relationship between the two firms. This contract creates a field in which C can be specialized for F, and there can be assurances of quality, while the overhead is distributed efficiently between F and G.

This situation is both extremely common in business practice and not well handled by neoclassical, orthodox economics. It’s the case that TCE is tremendously preoccupied with.
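
Here is a minimal sketch of the trade-off, just to make the three options comparable. All of the parameters, and the simple way the hazard scales with asset specificity, are my own illustrative assumptions rather than anything from Williamson:

# Toy make-or-buy comparison for firm F needing component C.
# All parameters are illustrative assumptions, not Williamson's formal model.

def make(production_cost, overhead, specialization_benefit):
    """Produce C in-house: pay bureaucratic overhead, keep specialization."""
    return production_cost + overhead - specialization_benefit

def buy(market_price, asset_specificity, hazard_rate):
    """Buy C on the spot market: cheap, but the opportunism hazard grows
    with asset specificity (lock-in / switching costs)."""
    return market_price + asset_specificity * hazard_rate

def contract(market_price, asset_specificity, hazard_rate, governance_cost):
    """Ongoing contract with supplier G: governance costs something,
    but it dampens the opportunism hazard."""
    return market_price + governance_cost + asset_specificity * hazard_rate * 0.25

for specificity in (0.0, 5.0, 20.0):
    options = {
        "make": make(production_cost=10, overhead=6, specialization_benefit=2),
        "buy": buy(market_price=8, asset_specificity=specificity, hazard_rate=1.0),
        "contract": contract(market_price=8, asset_specificity=specificity,
                             hazard_rate=1.0, governance_cost=3),
    }
    best = min(options, key=options.get)
    print(f"asset specificity {specificity:>4}: choose {best} -> {options}")

With these made-up numbers, the best option moves from buy to contract to make as asset specificity rises, which is the qualitative pattern TCE predicts.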


My background and research are in the software industry, which is rife with cases like these.

Developers are constantly faced with a decision to make-or-buy software components. In principle, they can develop any component themselves. In practice, this is rarely cost-effective.

In software, open source components are a prevalent solution to this problem. This can be thought of as a very strange market where all the prices are zero. The most popular open source libraries are very generic, having little “asset specificity” in TCE terms.

The lack of contract between developers and open source components/communities is sometimes seen as a source of hazard in using open source components. The recent event-stream hack, where an upstream component was injected with malicious code by a developer who had taken over maintaining the package, illustrates the problems of outsourcing technical dependencies without a contract. In this case, the quality problem is manifest as a supply chain cybersecurity problem.
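
As a rough way to see how much of a codebase is outsourced without any contract, one can simply count the transitive dependencies in a lockfile. This is my own sketch, assuming npm’s older package-lock.json layout with a nested dependencies object:

# Count transitive npm dependencies from a package-lock.json.
# Assumes the older lockfile layout with a nested "dependencies" object.
import json

def walk(dep_tree, seen=None):
    """Recursively collect package names from a nested dependencies dict."""
    seen = set() if seen is None else seen
    for name, info in (dep_tree or {}).items():
        seen.add(name)
        walk(info.get("dependencies"), seen)
    return seen

with open("package-lock.json") as f:
    lock = json.load(f)

packages = walk(lock.get("dependencies", {}))
print(f"{len(packages)} upstream packages, none of them under contract")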

In Williamson’s analysis, these kinds of hazards are what drive firms away from purchasing on spot markets and towards contracting or in-house development. In practice, open source support companies fill the role of a responsible entity G that firm F can build a relationship with.

by Sebastian Benthall at December 04, 2018 04:51 PM

December 03, 2018

MIMS 2012

My Beliefs About Design

Photo by Spencer Backman on Unsplash

  • Design doesn’t own the customer experience. A great customer experience is the emergent outcome of the contributions of every department.
  • Design is not the center of the universe. Design is one function of many at an organization.
  • Other departments have more customer contact than you. Listen to them.
  • Don’t hand your work down from on high and expect everyone to worship its genius. You need to bring people along for the ride so that they see how you got to your solution, and can get there themselves.
  • Everyone can improve the customer’s experience, not just designers. Foster an environment where everyone applies a user-centered mindset to their work.
  • There’s no perfect, one-size-fits-all design process. Skilled designers have a variety of tools in their tool belt, and know when to use each one.
  • Done is better than perfect.
  • No design is perfect. Always be iterating.
  • Don’t fall in love with your designs. Be willing to kill your darlings.
  • You should always feel a little uncomfortable showing your work to peers. If you don’t, you’ve waited too long.
  • The only thing that matters to customers is what ships. Not your prototypes, wireframes, user journeys, or any other artifact of the design process.
  • The only true measure of your design’s success is the response from customers.
  • Stay curious. Regularly seek out new ideas, experiences, and perspectives.
  • Stay humble. You don’t know what’s best just because the word “designer” is in your title.
  • Don’t hide behind jargon and the cloak of the “creative.”
  • Great design is rooted in empathy. Empathy not just for the end user, but also for your coworkers, company, and society.
  • Having empathy for your customers means actually talking to them.
  • Don’t automatically ignore someone’s feedback because they’re more junior than you, or don’t have “designer” in their title. Don’t automatically listen to someone’s feedback because they’re more senior than you.
  • Design needs to be aligned to the needs of the business, and deliver measurable business value. Don’t design for design’s sake.

by Jeff Zych at December 03, 2018 09:12 PM

Ph.D. student

Williamson on four injunctions for good economics

Williamson (2008) (pdf) concludes with a description of four injunctions for doing good economics, which I will quote verbatim.

Robert Solow’s prescription for doing good economics is set out in three injunctions: keep it simple; get it right; make it plausible (2001, p. 111). Keeping it simple entails stripping away the inessentials and going for the main case (the jugular). Getting it right “includes translating economic concepts into accurate mathematics (or diagrams, or words) and making sure that further logical operations are correctly performed and verified” (Solow, 2001, p. 112). Making it plausible entails describing human actors in (reasonably) veridical ways and maintaining meaningful contact with the phenomena of interest (contractual or otherwise).

To this, moreover, I would add a fourth injunction: derive refutable implications to which the relevant (often microanalytic) data are brought to bear. Nicholas Georgescu-Roegen has a felicitous way of putting it: “The purpose of science in general is not prediction, but knowledge for its own sake,” yet prediction is “the touchstone of scientific knowledge” (1971, p. 37).

Why the fourth injunction? This is necessitated by the need to choose among alternative theories that purport to deal with the same phenomenon—say vertical integration—and (more or less) satisfy the first three injunctions. Thus assume that all of the models are tractable, that the logic of each hangs together, and that agreement cannot be reached as to what constitutes veridicality and meaningful contact with the phenomena. Does each candidate theory then have equal claims for our attention? Or should we be more demanding? This is where refutable implications and empirical testing come in: ask each would-be theory to stand up and be counted.

Why more economists are not insistent upon deriving refutable implications and submitting these to empirical tests is a puzzle. One possibility is that the world of theory is set apart and has a life of its own. A second possibility is that some economists do not agree that refutable implications and testing are important. Another is that some theories are truly fanciful and their protagonists would be discomfited by disclosure. A fourth is that the refutable implications of favored theories are contradicted by the data. And perhaps there are still other reasons. Be that as it may, a multiplicity of theories, some of which are vacuous and others of which are fanciful, is an embarrassment to the pragmatically oriented members of the tribe. Among this subset, insistence upon the fourth injunction—derive refutable implications and submit these to the data—is growing.

References

Williamson, Oliver E. “Transaction cost economics.” Handbook of new institutional economics. Springer, Berlin, Heidelberg, 2008. 41-65.

by Sebastian Benthall at December 03, 2018 04:41 PM

December 02, 2018

Ph.D. student

Discovering transaction cost economics (TCE)

I’m in the process of discovering transaction cost economics (TCE), the branch of economics devoted to the study of transaction costs, which include bargaining and search costs. Oliver Williamson, who is a professor at UC Berkeley, won the Nobel Prize for his work on TCE in 2009. I’m starting with the Williamson, 2008 article (in the References) which seems like a late-stage overview of what is a large body of work.

Personally, this is yet another time when I’ve discovered that the answers, or the proper theoretical language for understanding something I am struggling with, have simply been Somewhere Else all along. Delight and frustration are pretty much evening each other out at this point.

Why is TCE so critical (to me)?

  • I think the real story about how the Internet and AI have changed things, which is the topic constantly reiterated in so many policy and HCI studies about platforms, is that they reduced search costs. However, it’s hard to make the case for that without a respectable theorization of search costs and how they matter to the economy. This, I think, is what transaction cost economics is about.
  • You may recall I wrote my doctoral dissertation about “data economics” on the presumption (which was, truly, presumptuous) that a proper treatment of the role of data in the economy had not yet been done. This was due mainly to the deficiencies of the discussion of information in neoclassical economic theory. But perhaps I was a fool, because it may be that this missing-link work on information economics has been in transaction cost economics all along! Interestingly, Pat Bajari, who is Chief Economist at Amazon, has done some TCE work, suggesting that like Hal Varian’s economics, this is stuff that actually works in a business context, which is more or less the epistemic standard you want economics to meet. (I would argue that economics should be seen, foremost, as a discipline of social engineering.)
  • A whole other line of research I’ve worked on over the years has been trying to understand the software supply chain, especially with respect to open source software (Benthall, 2016; Benthall, 2017). That’s a tricky topic because the ideas of “supply” and “chain” in that domain are both highly metaphorical and essentially inaccurate. Yet there are clearly profound questions about the relationships between sociotechnical organizations, their internal and external complexity, and so on to be found there, along with (and this is really what’s exciting about it) ample empirical basis to support arguments about it, just by the nature of it. Well, it turns out that the paradigmatic case for transaction cost economics is vertical integration, or the “make-or-buy” decision wherein a firm decides to (A) purchase something on an open market, (B) produce it in-house, or (C) (and this is the case that transaction cost economics really tries to develop) engage with the supplier in a contract which creates an ongoing and secure relationship between them. Labor contracts are all, for reasons that I may go into later, of this (C) kind.

So, here comes TCE, with its firm roots in organization theory, Hayekian theories of the market, Coase’s and other theories of the firm, and firm emphasis on the supply chain relation between sociotechnical organizations. And I HAVEN’T STUDIED IT. There is even solid work on its relation to privacy done by Whittington and Hoofnagle (2011; 2013). How did I not know about this? Again, if I were not so delighted, I would be livid.

Please expect a long series of posts as I read through the literature on TCE and try to apply it to various cases of interest.

References

Benthall, S. (2017) Assessing Software Supply Chain Risk Using Public Data. IEEE STC 2017 Software Technology Conference.

Benthall, S., Pinney, T., Herz, J., Plummer, K. (2016) An Ecological Approach to Software Supply Chain Risk Management. Proceedings of the 15th Python in Science Conference. p. 136-142. Ed. Sebastian Benthall and Scott Rostrup.

Hoofnagle, Chris Jay, and Jan Whittington. “Free: accounting for the costs of the internet’s most popular price.” UCLA L. Rev. 61 (2013): 606.

Whittington, Jan, and Chris Jay Hoofnagle. “Unpacking Privacy’s Price.” NCL Rev. 90 (2011): 1327.

Williamson, Oliver E. “Transaction cost economics.” Handbook of new institutional economics. Springer, Berlin, Heidelberg, 2008. 41-65.

by Sebastian Benthall at December 02, 2018 10:31 PM

November 30, 2018

adjunct professor

Amsterdam Privacy Conference 2018

Opening keynote talk on The Tethered Economy, 87(4) Geo. Wash. L. Rev. ___ (2019) (with Aaron Perzanowski and Aniket Kesari), Amsterdam Privacy Conference, Oct. 2018.

by chris at November 30, 2018 05:55 AM

November 29, 2018

Ph.D. student

For fairness in machine learning, we need to consider the unfairness of racial categorization

Pre-prints of papers accepted to this coming 2019 Fairness, Accountability, and Transparency conference are floating around Twitter. From the looks of it, many of these papers add a wealth of historical and political context, which I feel is a big improvement.

A noteworthy paper, in this regard, is Hutchinson and Mitchell’s “50 Years of Test (Un)fairness: Lessons for Machine Learning”, which puts recent ‘fairness in machine learning’ work in the context of very analogous debates from the 60’s and 70’s that concerned the use of testing that could be biased due to cultural factors.

I like this paper a lot, in part because it is very thorough and in part because it tees up a line of argument that’s dear to me. Hutchinson and Mitchell raise the question of how to properly think about fairness in machine learning when the protected categories invoked by nondiscrimination law are themselves social constructs.

Some work on practically assessing fairness in ML has tackled the problem of using race as a construct. This echoes concerns in the testing literature that stem back to at least 1966: “one stumbles immediately over the scientific difficulty of establishing clear yardsticks by which people can be classified into convenient racial categories” [30]. Recent approaches have used Fitzpatrick skin type or unsupervised clustering to avoid racial categorizations [7, 55]. We note that the testing literature of the 1960s and 1970s frequently uses the phrase “cultural fairness” when referring to parity between blacks and whites.

They conclude that this is one of the areas where there can be a lot more useful work:

This short review of historical connections in fairness suggest several concrete steps forward for future research in ML fairness: Diving more deeply into the question of how subgroups are defined, suggested as early as 1966 [30], including questioning whether subgroups should be treated as discrete categories at all, and how intersectionality can be modeled. This might include, for example, how to quantify fairness along one dimension (e.g., age) conditioned on another dimension (e.g., skin tone), as recent work has begun to address [27, 39].

This is all very cool to read, because this is precisely the topic that Bruce Haynes and I address in our FAT* paper, “Racial categories in machine learning” (arXiv link). The problem we confront in this paper is that the racial categories we are used to using in the United States (White, Black, Asian) originate in the white supremacy that was enshrined into the Constitution when it was formed and perpetuated since then through the legal system (with some countervailing activity during the Civil Rights Movement, for example). This puts “fair machine learning” researchers in a bind: either they can use these categories, which have always been about perpetuating social inequality, or they can ignore the categories and reproduce the patterns of social inequality that prevail in fact because of the history of race.

In the paper, we propose a third option. First, rather than reify racial categories, we propose breaking race down into the kinds of personal features that get inscribed with racial meaning. Phenotype properties like skin type and ocular folds are one such set of features. Another set are events that indicate position in social class, such as being arrested or receiving welfare. Another set are facts about the national and geographic origin of ones ancestors. These facts about a person are clearly relevant to how racial distinctions are made, but are themselves more granular and multidimensional than race.

The next step is to detect race-like categories by looking at who is segregated from each other. We propose an unsupervised machine learning technique that works with the distribution of the phenotype, class, and ancestry features across spatial tracts (as in when considering where people physically live) or across a social network (as in when considering people’s professional networks, for example). Principal component analysis can identify what race-like dimensions capture the greatest amounts of spatial and social separation. We hypothesize that these dimensions will encode the ways racial categorization has shaped the social structure in tangible ways; these effects may include both politically recognized forms of discrimination as well as forms of discrimination that have not yet been surfaced. These dimensions can then be used to classify people in race-like ways as input to fairness interventions in machine learning.
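
A minimal sketch of the kind of pipeline this suggests, with stand-in data and scikit-learn as an assumed tool (nothing here is prescribed by the paper itself):

# Sketch: find "race-like" dimensions as the directions of greatest
# between-tract separation in phenotype / class / ancestry features.
# All data, feature counts, and the use of scikit-learn are assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

person_features = np.random.rand(10_000, 12)        # stand-in: skin type, arrest record, ancestry indicators, ...
tract_ids = np.random.randint(0, 200, size=10_000)  # stand-in: spatial tract (or network community) per person

# Aggregate to tract-level means so the components capture between-tract
# (segregation-driven) variation rather than individual variation.
tracts = np.array([person_features[tract_ids == t].mean(axis=0)
                   for t in np.unique(tract_ids)])

scaler = StandardScaler().fit(tracts)
pca = PCA(n_components=2).fit(scaler.transform(tracts))

# Project individuals onto the learned race-like dimensions, for use as
# input to downstream fairness interventions.
person_scores = pca.transform(scaler.transform(person_features))
print(person_scores.shape)  # (10000, 2)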

A key part of our proposal is that race-like classification depends on the empirical distribution of persons in physical and social space, and so are not fixed. This operationalizes the way that race is socially and politically constructed without reifying the categories in terms that reproduce their white supremacist origins.

I’m quite stoked about this research, though obviously it raises a lot of serious challenges in terms of validation.

by Sebastian Benthall at November 29, 2018 03:05 PM

November 27, 2018

Ph.D. student

directions to migrate your WebFaction site to HTTPS

Hiya friends using WebFaction,

Securing the Web, even our little websites, is important — to set a good example, to maintain the confidentiality and integrity of our visitors, to get the best Google search ranking. While secure Web connections had been difficult and/or costly in the past, more recently, migrating a site to HTTPS has become fairly straightforward and costs $0 a year. It may get even easier in the future, but for now, the following steps should do the trick.

Hope this helps, and please let me know if you have any issues,
Nick

P.S. Yes, other friends, I recommend WebFaction as a host; I’ve been very happy with them. Services are reasonably priced and easy to use and I can SSH into a server and install stuff. Sign up via this affiliate link and maybe I get a discount on my service or something.

P.S. And really, let me know if and when you have issues. Encrypting access to your website has gotten easier, but it needs to become much easier still, and one part of that is knowing which parts of the process prove to be the most cumbersome. I’ll make sure your feedback gets to the appropriate people who can, for realsies, make changes as necessary to standards and implementations.

Updated 27 November 2018: As of Fall 2018, WebFaction's control panel now handles installing and renewing Let's Encrypt certificates, and that functionality also breaks the scripts described below by default (you'll likely start getting email errors regarding a 404 error in loading .well-known/acme-challenge). I recommend using WebFaction's Let's Encrypt support; review their simple one-button documentation. This blog post contains the full documentation in case it still proves useful, but if you want to run these scripts, you'll also want to review this issue regarding nginx configuration.

Updated 16 July 2016: to fix the cron job command, which may not have always worked depending on environment variables

Updated 2 December 2016: to use new letsencrypt-webfaction design, which uses WebFaction's API and doesn't require emails and waiting for manual certificate installation.


WebFaction now supports installing and renewing certificates with Let's Encrypt just by clicking a button in the control panel, so most of these steps are no longer necessary. The configuring and testing, though, are still something you have to do manually in pretty much any case. While the full instructions are still included here, you should mostly only need to follow my directions for Create a secure version of your website in the WebFaction Control Panel, Test your website over HTTPS, and Redirect your HTTP site. You should be able to complete all of this in an hour some evening.

Create a secure version of your website in the WebFaction Control Panel

Log in to the WebFaction Control Panel, choose the “DOMAINS/WEBSITES” tab and then click “Websites”.

Click “Add new website” and create one that will correspond to one of your existing websites; I suggest choosing a name like existingname-secure. Choose “Encrypted website (https)”. For Domains, testing will be easiest if you choose both your custom domain and a subdomain of yourusername.webfactional.com. (If you don’t have one of those subdomains set up, switch to the Domains tab and add it real quick.) So, for my site, I chose npdoty.name and npdoty.npd.webfactional.com.

Finally, for “Contents”, click “Re-use an existing application” and select whatever application (or multiple applications) you’re currently using for your http:// site.

Click “Save” and this step is done. This shouldn’t affect your existing site one whit.

Test to make sure your site works over HTTPS

Now you can test how your site works over HTTPS, even before you’ve created any certificates, by going to https://subdomain.yourusername.webfactional.com in your browser. Hopefully everything will load smoothly, but it’s reasonably likely that you’ll have some mixed content issues. The debug console of your browser should show them to you: that’s Apple-Option-K in Firefox or Apple-Option-J in Chrome. You may see some warnings like this, telling you that an image, a stylesheet or a script is being requested over HTTP instead of HTTPS:

Mixed Content: The page at ‘https://npdoty.name/’ was loaded over HTTPS, but requested an insecure image ‘http://example.com/blah.jpg’. This content should also be served over HTTPS.

Change these URLs so that they point to https://example.com/blah.jpg (you could also use a scheme-relative URL, like //example.com/blah.jpg) and update the files on the webserver and re-test.

Good job! Now, https://subdomain.yourusername.webfactional.com should work just fine, but https://yourcustomdomain.com shows a really scary message. You need a proper certificate.

Get a free certificate for your domain

Let’s Encrypt is a new, free, automated certificate authority from a bunch of wonderful people. But getting it to set up certificates on WebFaction is a little tricky, so we’ll use the letsencrypt-webfaction utility (thanks, will-in-wi!).

SSH into the server with ssh yourusername@yourusername.webfactional.com.

To install, run this command:

GEM_HOME=$HOME/.letsencrypt_webfaction/gems RUBYLIB=$GEM_HOME/lib gem2.2 install letsencrypt_webfaction

(Run the same command to upgrade; necessary if you followed these instructions before Fall 2016.)

For convenience, you can add this as a function to make it easier to call. Edit ~/.bash_profile to include:

function letsencrypt_webfaction {
    PATH=$PATH:$GEM_HOME/bin GEM_HOME=$HOME/.letsencrypt_webfaction/gems RUBYLIB=$GEM_HOME/lib ruby2.2 $HOME/.letsencrypt_webfaction/gems/bin/letsencrypt_webfaction $*
}

Now, let’s test the certificate creation process. You’ll need your email address, the domain you’re getting a certificate for, the path to the files for the root of your website on the server (e.g. /home/yourusername/webapps/sitename/), and the WebFaction username and password you use to log in. Filling those in as appropriate, run this command:

letsencrypt_webfaction --letsencrypt_account_email you@example.com --domains yourcustomdomain.com --public /home/yourusername/webapps/sitename/ --username webfaction_username --password webfaction_password

If all went well, you’ll see nothing on the command line. To confirm that the certificate was created successfully, check the SSL certificates tab on the WebFaction Control Panel. ("Aren't these more properly called TLS certificates?" Yes. So it goes.) You should see a certificate listed that is valid for your domain yourcustomdomain.com; click on it and you can see the expiry date and a bunch of gobbledygook which actually is the contents of the certificate.

To actually apply that certificate, head back to the Websites tab, select the -secure version of your website from the list and in the Security section, choose the certificate you just created from the dropdown menu.

Test your website over HTTPS

This time you get to test it for real. Load https://yourcustomdomain.com in your browser. (You may need to force refresh to get the new certificate.) Hopefully it loads smoothly and without any mixed content warnings. Congrats, your site is available over HTTPS!

You are not done. You might think you are done, but if you think so, you are wrong.

Set up automatic renewal of your certificates

Certificates from Let’s Encrypt expire in no more than 90 days. (Why? There are two good reasons.) Your certificates aren’t truly set up until you’ve set them up to renew automatically. You do not want to do this manually every few months; you will forget, I promise.

Cron lets us run code on WebFaction’s server automatically on a regular schedule. If you haven’t set up a cron job before, it’s just a fancy way of editing a special text file. Run this command:

EDITOR=nano crontab -e

If you haven’t done this before, this file will be empty, and you’ll want to test it to see how it works. Paste the following line of code exactly, and then hit Ctrl-O and Ctrl-X to save and exit.

* * * * * echo "cron is running" >> $HOME/logs/user/cron.log 2>&1

This will output to that log every single minute; not a good cron job to have in general, but a handy test. Wait a few minutes and check ~/logs/user/cron.log to make sure it’s working.

Rather than including our username and password in our cron job, we'll set up a configuration file with those details. Create a file config.yml, perhaps at the location ~/le_certs. (If necessary, mkdir le_certs, touch le_certs/config.yml, nano le_certs/config.yml.) In this file, paste the following, and then customize with your details:

letsencrypt_account_email: 'you@example.com'
api_url: 'https://api.webfaction.com/'
username: 'webfaction_username'
password: 'webfaction_password'

(Ctrl-O and Ctrl-X to save and close it.) Now, let’s edit the crontab to remove the test line and add the renewal line, being sure to fill in your domain name, the path to your website’s directory, and the path to the configuration file you just created:

0 4 15 */2 * PATH=$PATH:$GEM_HOME/bin GEM_HOME=$HOME/.letsencrypt_webfaction/gems RUBYLIB=$GEM_HOME/lib /usr/local/bin/ruby2.2 $HOME/.letsencrypt_webfaction/gems/bin/letsencrypt_webfaction --domains example.com --public /home/yourusername/webapps/sitename/ --config /home/yourusername/le_certs/config.yml >> $HOME/logs/user/cron.log 2>&1

You’ll probably want to create the line in a text editor on your computer and then copy and paste it to make sure you get all the substitutions right. Paths must be fully specified as above; don't use ~ for your home directory. Ctrl-O and Ctrl-X to save and close it. Check with crontab -l that it looks correct. As a test to make sure the config file setup is correct, you can run the command part directly; if it works, you shouldn't see any error messages on the command line. (Copy and paste the line below, making the same substitutions as you just did for the crontab.)

PATH=$PATH:$GEM_HOME/bin GEM_HOME=$HOME/.letsencrypt_webfaction/gems RUBYLIB=$GEM_HOME/lib /usr/local/bin/ruby2.2 $HOME/.letsencrypt_webfaction/gems/bin/letsencrypt_webfaction --domains example.com --public /home/yourusername/webapps/sitename/ --config /home/yourusername/le_certs/config.yml

With that cron job configured, you'll automatically get a new certificate at 4am on the 15th of alternating months (January, March, May, July, September, November). New certificates every two months is fine, though one day in the future we might change this to get a new certificate every few days; before then WebFaction will have taken over the renewal process anyway. Debugging cron jobs can be tricky (I've had to update the command in this post once already); I recommend adding an alert to your calendar for the day after the first time this renewal is supposed to happen, to remind yourself to confirm that it worked. If it didn't work, any error messages should be stored in the cron.log file.

Redirect your HTTP site (optional, but recommended)

Now you’re serving your website in parallel via http:// and https://. You can keep doing that for a while, but everyone who follows old links to the HTTP site won’t get the added security, so it’s best to start permanently re-directing the HTTP version to HTTPS.

WebFaction has very good documentation on how to do this, and I won’t duplicate it all here. In short, you’ll create a new static application named “redirect”, which just has a .htaccess file with, for example, the following:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
RewriteRule ^(.*)$ https://%1/$1 [R=301,L]
RewriteCond %{HTTP:X-Forwarded-SSL} !on
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

This particular variation will both redirect any URLs that have www to the “naked” domain and make all requests HTTPS. And in the Control Panel, make the redirect application the only one on the HTTP version of your site. You can re-use the “redirect” application for different domains.

Test to make sure it’s working! http://yourcustomdomain.com, http://www.yourcustomdomain.com, https://www.yourcustomdomain.com and https://yourcustomdomain.com should all end up at https://yourcustomdomain.com. (You may need to force refresh a couple of times.)
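If you'd rather script that last check than click through a browser, here's a small sketch in Python using the requests library (assuming you have Python 3 and requests available somewhere handy); the domain is a placeholder to fill in.

# Quick sketch: confirm all four URL variants end up at the HTTPS site.
# Assumes the `requests` library is installed; substitute your own domain.
import requests

DOMAIN = "yourcustomdomain.com"  # placeholder
for url in (f"http://{DOMAIN}", f"http://www.{DOMAIN}",
            f"https://www.{DOMAIN}", f"https://{DOMAIN}"):
    final = requests.get(url, timeout=10).url  # follows redirects by default
    print(url, "->", final)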

by nick@npdoty.name at November 27, 2018 09:01 PM

November 26, 2018

Ph.D. student

Is competition good for cybersecurity?

A question that keeps coming up in various forms, for example in response to recent events around the ‘trade war’ between the U.S. and China and its impact on technology companies, is whether market competition is good or bad for cybersecurity.

Here is a simple argument for why competition could be good for cybersecurity: the security of technical products is a quality that consumers value. Market competition is what gets producers to make higher quality products at lower cost. Therefore, competition is good for security.

Here is an argument for why competition could be bad for cybersecurity: security is a hard thing for any consumer to evaluate, and since most won’t, we have an information asymmetry and therefore a ‘market for lemons’ kind of market failure. Therefore, competition is bad for security. It would be better to have a well-regulated monopoly.

This argument echoes, though it doesn’t exactly parallel, some of the arguments in Pasquale’s work on Hamiltonians and Jeffersonians in technology platform regulation.

by Sebastian Benthall at November 26, 2018 12:36 AM

November 18, 2018

Ph.D. student

“the privatization of public functions”

An emerging theme from the conference on Trade Secrets and Algorithmic Systems was that legal scholars have become concerned about the privatization of public functions. For example, the use of proprietary risk assessment tools in place of the discretion of judges, who are supposed to be publicly accountable, is a problem. More generally, the use of “trade secrecy” in court settings to prevent inquiry into software systems is bogus and moves more societal control into the realm of private ordering.

Many remedies were proposed. Most involved some kind of disclosure and audit to experts. The most extreme form of disclosure is making the software and, where it’s a matter of public record, training data publicly available.

It is striking to me to be encountering the call for government use of open source systems because…this is not a new issue. The conversation about federal use of open source software was alive and well over five years ago. Then, the arguments were about vendor lock-in; now, they are about accountability of AI. But the essential problem of whether core governing logic should be available to public scrutiny, and the effects of its privatization, have been the same.

If we are concerned with the reliability of a closed and large-scale decision-making process of any kind, we are dealing with problems of credibility, opacity, and complexity. The prospects of an efficient market for these kinds of systems are dim. These market conditions are the conditions of sustainability of open source infrastructure. Failures in sustainability are manifest as software vulnerabilities, which are one of the key reasons why governments are warned against OSS now, though measuring and evaluating OSS vulnerabilities against proprietary ones is methodologically fraught.

by Sebastian Benthall at November 18, 2018 05:25 PM

November 16, 2018

Ph.D. student

Trade secrecy, “an FDA for algorithms”, and software bills of materials (SBOMs) #SecretAlgos

At the Conference on Trade Secrets and Algorithmic Systems at NYU today, the target of most critiques is the use of trade secrecy by proprietary technology providers to prevent courts and the public from seeing the inner workings of algorithms that determine people’s credit scores, health care, criminal sentencing, and so on. The overarching theme is that sometimes companies will use trade secrecy to hide the ways that their software is bad, and that that is a problem.

In one panel, the question came up of whether an “FDA for Algorithms” is on the table, referring to the Food and Drug Administration’s approval process for pharmaceuticals. It was not dealt with in much depth, which is too bad, because it is a nice example of how government oversight of potentially dangerous technology is managed in a way that respects trade secrecy.

According to this article, when filing for FDA approval, a company can declare some of their ingredients to be trade secrets. The upshot of that is that those trade secrets are not subject to FOIA requests. However, these ingredients are still considered when approval is granted by the FDA.

It so happens that in the cybersecurity policy conversation (more so than in privacy), the question of openness of “ingredients” to inspection has been coming up in a serious way. NTIA has been hosting multistakeholder meetings about standards and policy around Software Component Transparency. In particular, they are encouraging standardization of Software Bills of Materials (SBOMs) like the Linux Foundation’s Software Package Data Exchange (SPDX). SPDX (and SBOMs more generally) describes the “ingredients” in a software package without exposing the full source code, but at a level specific enough to be useful for security audits.
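As a toy illustration of the level of resolution an SBOM works at, here is a short sketch using only Python's standard library importlib.metadata; this is not SPDX and not any tool discussed at the conference, just the general idea of listing components rather than source code.

# Toy illustration of "ingredient"-level transparency: list the components of
# the current Python environment (name, version, declared license) without
# exposing any source code. Not SPDX; just the general idea of an SBOM.
from importlib.metadata import distributions

for dist in sorted(distributions(), key=lambda d: (d.metadata["Name"] or "").lower()):
    name = dist.metadata["Name"]
    version = dist.version
    license_ = dist.metadata.get("License") or "unknown"
    print(f"{name}=={version}  (license: {license_})")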

It’s possible that a similar method could be used for algorithmic audits with fairness (i.e., nondiscrimination compliance) and privacy (i.e., information sharing with third parties) in mind. Particular components could be audited (perhaps in a way that protects trade secrecy), and then those components could be listed as “ingredients” by other vendors.

by Sebastian Benthall at November 16, 2018 08:56 PM

The paradox of ‘data markets’

We often hear that companies are “selling our data”, or that we are “paying for services” with our data. Data brokers literally buy and sell data about people. There are also other expensive data sources and data sets. There are, undoubtedly, one or more data markets.

We know that classically, perfect competition in markets depends on perfect information. Buyers and sellers on the market need to have equal and instantaneous access to information about utility curves and prices in order for the market to price things efficiently.

Since the bread and butter of the data market is information asymmetry, we know that data markets can never be perfectly competitive. If they were, the data market would cease to exist, because the perfect information condition would entail that there is nothing left to buy and sell.

Data markets therefore have to be imperfectly competitive. But since these are the markets that perfect information in other markets might depend on, this imperfection is viral. The vicissitudes of the data market are the vicissitudes of the economy in general.

The upshot is that the challenges of information economics are not only those that appear in special sectors like insurance markets. They are at the heart of all economic activity, and there are no equilibrium guarantees.

by Sebastian Benthall at November 16, 2018 04:50 PM

November 15, 2018

Center for Technology, Society & Policy

Using Crowdsourcing to address Disparities in Police Reported Data: Addressing Challenges in Technology and Community Engagement

This is a project update from a CTSP project from 2017: Assessing Race and Income Disparities in Crowdsourced Safety Data Collection (with Kate Beck, Aditya Medury, and Jesus M. Barajas)

Project Update

This work has led to the development of Street Story, a community engagement tool that collects street safety information from the public, through UC Berkeley SafeTREC.

The tool collects qualitative and quantitative information, and then creates maps and tables that can be publicly viewed and downloaded. The Street Story program aims to collect information that can create a fuller picture of transportation safety issues, and make community-provided information publicly accessible.


The Problem

Low-income groups, people with disabilities, seniors and racial minorities are at higher risk of being injured while walking and biking, but experts have limited information on what these groups need to reduce these disparities. Transportation agencies typically rely on statistics about transportation crashes aggregated from police reports to decide where to make safety improvements. However, police-reported data is limited in a number of ways. First, crashes involving pedestrians or cyclists are significantly under-reported to police, with reports finding that up to 60% of pedestrian and bicycle crashes go unreported. Second, some demographic groups, including low-income groups, people of color and undocumented immigrants, have histories of contentious relationships with police. Therefore, they may be less likely to report crashes to the police when they do occur. Third, crash data doesn’t include locations where near-misses have happened, or locations where individuals feel unsafe but an issue has not yet happened. In other words, the data allow professionals to react to safety issues, but don’t necessarily allow them to be proactive about them.

One solution to improve and augment the data agencies use to make decisions and allocate resources is to provide a way for people to report transportation safety issues themselves. Some public agencies and private firms are developing apps and websites where people can report issues for this purpose. But one concern is that the people who are likely to use these crowdsourcing platforms are those who have access to smart phones or the internet and who trust that government agencies will use the data to make changes, biasing the data toward the needs of these privileged groups.

Our Initial Research Plan

We chose to examine whether crowdsourced traffic safety data reflected similar patterns of underreporting and potential bias as police-reported safety data. To do this, we created an online mapping tool that people could use to report traffic crashes, near-misses and general safety issues. We planned to work with a city to release this tool to, and collect data from, the general public, then work directly with a historically marginalized community, under-represented in police-reported data, to target data collection in a high-need neighborhood. We planned to reduce barriers to entry for this community, including meeting the participants in person to explain the tool, providing them with in-person and online training, providing participants with cell phones, and compensating their data plans for the month. By crowdsourcing data from the general public and from this specific community, we planned to analyze whether there were any differences in the types of information reported by different demographics.

This plan seemed to work well with the research question and with community engagement best practices. However, we came up against a number of challenges. Although many municipal agencies and community organizations found our work interesting and were working to address similar transportation safety issues to those we were focusing on, many organizations and agencies seemed daunted by the prospect of using technology to address underlying issues of under-reporting. We also found that a year was not enough time to build trusting relationships with the organizations and agencies we had hoped to work with. Nevertheless, we were able to release a web-based mapping tool to collect some crowdsourced safety data from the public.

Changing our Research Plan

To better understand how more well-integrated digital crowdsourcing platforms perform, we pivoted our research project to explore how different neighborhoods engage with government platforms to report non-emergency service needs. We assumed some of these non-emergency service reports would mirror the negative perceptions of bicycle and pedestrian safety we were interested in collecting via our crowdsourcing safety platform. The City of Oakland relies on SeeClickFix, a smartphone app, to allow residents to request service for several types of issues: infrastructure issues, such as potholes, damaged sidewalks, or malfunctioning traffic signals; and non-infrastructure issues, such as illegal dumping or graffiti. The city also provides phone, web, and email-based platforms for reporting the same types of service requests. These alternative platforms are collectively known as 311 services. We looked at 45,744 SeeClickFix reports and 35,271 311 reports made between January 2013 and May 2016. We classified Oakland neighborhoods by their status as communities of concern: in the city of Oakland, 69 neighborhoods meet the definition for communities of concern, while 43 do not. Because we did not have data on the characteristics of each person reporting a service request, we assumed that people reporting requests also lived in the neighborhood where the request was needed.

How did communities of concern interact with the SeeClickFix and 311 platforms to report service needs? Our analysis highlighted two main takeaways. First, we found that communities of concern were more engaged in reporting than other communities, but had different reporting dynamics based on the type of issue they were reporting. About 70 percent of service issues came from communities of concern, even though they represent only about 60 percent of the communities in Oakland. They were nearly twice as likely to use SeeClickFix as to report via the 311 platforms overall, but only for non-infrastructure issues. Second, we found that even though communities of concern were more engaged, the level of engagement was not equal for everyone in those communities. For example, neighborhoods with higher proportions of limited-English proficient households were less likely to report any type of incident via 311 or SeeClickFix.
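As a rough sketch of the kind of comparison described above (toy data and hypothetical column names, not the actual dataset of roughly 81,000 reports), the analysis boils down to grouping requests by community-of-concern status, issue type, and platform:

# Sketch of the reporting-dynamics comparison with toy data. Columns are
# hypothetical: 'platform', 'issue_type', and 'coc' (True if the request's
# neighborhood is a community of concern).
import pandas as pd

requests_df = pd.DataFrame({
    "platform":   ["SeeClickFix", "311", "SeeClickFix", "311", "SeeClickFix", "311"],
    "issue_type": ["non-infrastructure", "infrastructure", "non-infrastructure",
                   "non-infrastructure", "infrastructure", "infrastructure"],
    "coc":        [True, True, True, False, False, False],
})

# Share of all requests coming from communities of concern (~70% in our data).
print(requests_df["coc"].mean())

# Platform mix by COC status and issue type: are COCs relatively more likely
# to use SeeClickFix than 311, and does that depend on the issue type?
platform_mix = (requests_df
                .groupby(["coc", "issue_type"])["platform"]
                .value_counts(normalize=True)
                .unstack(fill_value=0))
print(platform_mix)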

Preliminary Findings from Crowdsourcing Transportation Safety Data

We deployed the online tool in August 2017. The crowdsourcing platform was aimed at collecting transportation safety-related concerns pertaining to pedestrian and bicycle crashes, near misses, perceptions of safety, and incidents of crime while walking and bicycling in the Bay Area. We disseminated the link to the crowdsourcing platform primarily through Twitter and some email lists. Examples of organizations that were contacted through Twitter-based outreach and also subsequently interacted with the tweet (through likes and retweets) include Transform Oakland, Silicon Valley Bike Coalition, Walk Bike Livermore, California Walks, Streetsblog CA, and Oakland Built. By December 2017, we had received 290 responses from 105 respondents. Half of the responses corresponded to perceptions of traffic safety concerns (“I feel unsafe walking/cycling here”), while 34% corresponded to near misses (“I almost got into a crash but avoided it”). In comparison, 12% of responses reported an actual pedestrian or bicycle crash, and 4% reported a crime while walking or bicycling. The sample size of the responses is too small to report any statistical differences.

Figure 1 shows the spatial patterns of the responses in the Bay Area aggregated to census tracts. Most of the responses were concentrated in Oakland and Berkeley. Oakland was specifically targeted as part of the outreach efforts since it has significant income and racial/ethnic diversity.

Figure 1 Spatial Distribution of the Crowdsourcing Survey Responses


In order to assess the disparities in the crowdsourced data collection, we compared responses between census tracts that are classified as communities of concern or not. A community of concern (COC), as defined by the Metropolitan Transportation Commission, a regional planning agency, is a census tract that ranks highly on several markers of marginalization, including proportion of racial minorities, low-income households, limited-English speakers, and households without vehicles, among others.

Table 1 shows the comparison between the census tracts that received at least one crowdsourcing survey response. The average number of responses received in COCs versus non-COCs across the entire Bay Area was similar and statistically indistinguishable. However, when focusing on Oakland-based tracts, the results reveal that the average number of crowdsourced responses in non-COCs was statistically higher. To compare the trends in self-reported pedestrian/cyclist concerns with police-reported crashes, we also examined pedestrian and bicycle-related police-reported crashes (from 2013-2016); more police-reported pedestrian/bicycle crashes were observed on average in COCs across the Bay Area as well as in Oakland. The difference in trends between the crowdsourced concerns and police-reported crashes suggests that either walking/cycling concerns are greater in non-COCs (and thus underrepresented in police crashes), or that participation from COCs is relatively underrepresented.

Table 1 Comparison of crowdsourced concerns and police-reported pedestrian/bicycle crashes in census tracts that received at least 1 response
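For readers curious what the per-tract comparison behind Table 1 looks like in code, here is a minimal sketch with toy numbers; the column names are hypothetical and SciPy's Welch t-test stands in for whatever statistical test one prefers.

# Minimal sketch of the Table 1 comparison with toy per-tract counts.
# Column names and numbers are hypothetical.
import pandas as pd
from scipy import stats

tracts = pd.DataFrame({
    "is_coc":          [True, True, True, False, False, False],
    "crowdsourced":    [2, 1, 1, 4, 3, 5],   # survey responses per tract
    "police_ped_bike": [6, 4, 5, 2, 3, 1],   # police-reported crashes, 2013-2016
})

coc, non_coc = tracts[tracts["is_coc"]], tracts[~tracts["is_coc"]]
for col in ("crowdsourced", "police_ped_bike"):
    t, p = stats.ttest_ind(coc[col], non_coc[col], equal_var=False)  # Welch's t-test
    print(f"{col}: COC mean={coc[col].mean():.2f}, "
          f"non-COC mean={non_coc[col].mean():.2f}, p={p:.3f}")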

Table 2 compares the self-reported income and race/ethnicity characteristics of the respondents with the locations where the responses were reported. For reference, the Bay Area’s median household income in 2015 was estimated to be $85,000 (Source: http://www.vitalsigns.mtc.ca.gov/income), and the Bay Area’s population was estimated to be 58% White per the 2010 Census (Source: http://www.bayareacensus.ca.gov/bayarea.htm).

Table 2 Distribution of all Bay Area responses based on the location of response and the self-reported income and race/ethnicity of respondents

The results reveal that White, medium-to-high-income respondents reported more walking/cycling-related safety issues in our survey, and more so in non-COCs. This trend is also consistent with the definition of COCs, which tend to have a higher representation of low-income people and people of color. However, if digital crowdsourcing without widespread community outreach is more likely to attract responses from medium-to-high-income groups, and, more importantly, if those respondents only live, work, or play in a small portion of the region being investigated, the aggregated results will reflect a biased picture of a region’s transportation safety concerns. Thus, while the scalability of digital crowdsourcing provides an opportunity for capturing underrepresented transportation concerns, it may require greater collaboration with low-income, diverse neighborhoods to ensure uniform adoption of the platform.

Lessons Learned

From our attempts to work directly with community groups and agencies and our subsequent decision to change our research focus, we learned a number of lessons:

  1. Develop a research plan in partnership with communities and agencies. This would have ensured that we began with a research plan that community groups and agencies were better able to partner with us on, and that the partners were on board with the topic of interest and the methods we hoped to use.
  2. Recognize the time it takes to build relationships. We found that building relationships with agencies and communities was more time intensive and took longer than we had hoped. These groups often have limitations on the time they can dedicate to unfunded projects. Next time, we should account for this in our initial research plan.
  3. Use existing data sources to supplement research. We found that using SeeClickFix and 311 data was a way to collect and analyze information that added context to our research question. Although the data did not have all the demographic information we had hoped to analyze, this data source added context to the data we collected.
  4. Speak in a language that the general public understands. We found that when we used the term self-reporting, rather than crowdsourcing, when talking to potential partners and members of the public, these individuals were more willing to see the use of technology to collect safety information from the public as legitimate. Using vocabulary and phrasing that people are familiar with is crucial when attempting to use technology to benefit the social good.

by Daniel Griffin at November 15, 2018 05:44 PM

Ph.D. student

The Crevasse: a meditation on accountability of firms in the face of opacity as the complexity of scale

To recap:

(A1) Beneath corporate secrecy and user technical illiteracy, a fundamental source of opacity in “algorithms” and “machine learning” is the complexity of scale, especially scale of data inputs. (Burrell, 2016)

(A2) The opacity of the operation of companies using consumer data makes those consumers unable to engage with them as informed market actors. The consequence has been a “free fall” of market failure (Strandburg, 2013).

(A3) Ironically, this “free” fall has been “free” (zero price) for consumers; they appear to get something for nothing without knowing what has been given up or changed as a consequence (Hoofnagle and Whittington, 2013).

Comments:

(B1) The above line of argument conflates “algorithms”, “machine learning”, “data”, and “tech companies”, as is common in the broad discourse. That this conflation is possible speaks to the ignorance of the scholarly position on these topics, an ignorance that is implied by corporate secrecy, technical illiteracy, and complexity of scale simultaneously. We can, if we choose, distinguish between these factors analytically. But because, from the standpoint of the discourse, the internals are unknown, the general indication of a ‘black box’ organization is intuitively compelling.

(B1a) Giving in to the lazy conflation is an error because it prevents informed and effective praxis. If we do not distinguish between a corporate entity and its multiple internal human departments and technical subsystems, then we may confuse ourselves into thinking that a fair and interpretable algorithm can give us a fair and interpretable tech company. Nothing about the former guarantees the latter because tech companies operate in a larger operational field.

(B2) Opacity as the complexity of scale, a property of the functioning of machine learning algorithms, is also a property of the functioning of sociotechnical organizations more broadly. Universities, for example, are often opaque to themselves, because of their own internal complexity and scale. This is because the mathematics governing opacity as a function of complexity and scale are the same in both technical and sociotechnical systems (Benthall, 2016).

(B3) If we discuss the complexity of firms, as opposed to the complexity of algorithms, we should conclude that firms that are complex due to scale of operations and data inputs (including number of customers) will be opaque and therefore have a strategic advantage in the market against less complex market actors (consumers) with stiffer bounds on rationality.

(B4) In other words, big, complex, data rich firms will be smarter than individual consumers and outmaneuver them in the market. That’s not just “tech companies”. It’s part of the MO of every firm to do this. Corporate entities are “artificial general intelligences” and they compete in a complex ecosystem in which consumers are a small and vulnerable part.

Twist:

(C1) Another source of opacity in data is that the meaning of data comes from the causal context that generates it. (Benthall, 2018)

(C2) Learning causal structure from observational data is hard, both in terms of being data-intensive and being computationally complex (NP). (c.f. Friedman et al., 1998)

(C3) Internal complexity, for a firm, is not sufficient to be “all-knowing” about the data that is coming into it; the firm has epistemic challenges of secrecy, illiteracy, and scale with respect to external complexity.

(C4) This is why many applications of machine learning are overrated and so many “AI” products kind of suck.

(C5) There is, in fact, an epistemic crevasse between all autonomous entities, each containing its own complexity and constituting a larger ecological field that is the external/being/environment for any other autonomy.

To do:

The most promising direction based on this analysis is a deeper read into transaction cost economics as a ‘theory of the firm’. That is where one should find a formalization of the idea that what the Internet changed most were search costs (a kind of transaction cost).

It would be nice if those insights could be expressed in the mathematics of “AI”.

There’s still a deep idea in here that I haven’t yet found the articulation for, something to do with autopoiesis.

References

Benthall, Sebastian. “The Human is the Data Science.” Workshop on Developing a Research Agenda for Human-Centered Data Science, Computer Supported Cooperative Work 2016. (link)

Benthall, Sebastian. Context, Causality, and Information Flow: Implications for Privacy Engineering, Security, and Data Economics. Ph.D. dissertation, University of California, Berkeley, 2018. Advisors: John Chuang and Deirdre Mulligan.

Burrell, Jenna. “How the machine ‘thinks’: Understanding opacity in machine learning algorithms.” Big Data & Society 3.1 (2016): 2053951715622512.

Friedman, Nir, Kevin Murphy, and Stuart Russell. “Learning the structure of dynamic probabilistic networks.” Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., 1998.

Hoofnagle, Chris Jay, and Jan Whittington. “Free: accounting for the costs of the internet’s most popular price.” UCLA L. Rev. 61 (2013): 606.

Strandburg, Katherine J. “Free fall: The online market’s consumer preference disconnect.” U. Chi. Legal F. (2013): 95.

by Sebastian Benthall at November 15, 2018 04:10 PM

open source sustainability and autonomy, revisited

Some recent chats with Chris Holdgraf and colleagues at NYU interested in “critical digital infrastructure” have gotten me thinking again about the sustainability and autonomy of open source projects.

I’ll admit to having had naive views about this topic in the past. Certainly, doing empirical data science work on open source software projects has given me a firmer perspective on things. Here are what I feel are the hardest earned insights on the matter:

  • There is tremendous heterogeneity in open source software projects. Almost all quantitative features of these projects fall in log-normal distributions. This suggests that the keys to open source software success are myriad and exogenous (how the technology fits in the larger ecosystem, how outside funding and recognition is accomplished, …) rather than endogenous (community policies, etc.). While many open source projects start as hobby or unpaid academic projects, those that go on to be successful find one or more funding sources. This funding is an exogenous factor.
  • The most significant exogenous factors to an open source software project’s success are the industrial organization of private tech companies. Developing an open technology is part of the strategic repertoire of these companies: for example, to undermine the position of a monopolist, developing an open source alternative decreases barriers to market entry and allows for a more competitive field in that sector. Another example: Google funded Mozilla for so long arguably to deflect antitrust action over Google Chrome.
  • There is some truth to Chris Kelty’s idea of open source communities as recursive publics, cultures that have autonomy that can assert political independence at the boundaries of other political forces. This autonomy comes from: the way developers of OSS get specific and valuable human capital in the process of working with the software and their communities; the way institutions begin to depend on OSS as part of their technical stack, creating an installed base; and how many different institutions may support the same project, creating competition for the scarce human capital of the developers. Essentially, at the point where the software and the skills needed to deploy it effectively and the community of people with those skills is self-organized, the OSS community has gained some economic and political autonomy. Often this autonomy will manifest itself in some kind of formal organization, whether a foundation, a non-profit, or a company like Redhat or Canonical or Enthought. If the community is large and diverse enough it may have multiple organizations supporting it. This is in principle good for the autonomy of the project but may also reflect political tensions that can lead to a schism or fork.
  • In general, since OSS development is internally most often very fluid, with the primary regulatory mechanism being the fork, the shape of OSS communities is more determined by exogenous factors than endogenous ones. When exogenous demand for the technology rises, the OSS community can find itself with a ‘surplus’, which can be channeled into autonomous operations.

by Sebastian Benthall at November 15, 2018 02:45 PM

November 13, 2018

MIMS 2012

How to Write Effective Advertisements, according to David Ogilvy

David Ogilvy is known as the “Father of Advertising.” He earned that moniker by pioneering the use of research to come up with effective ads and measure their impact. This was decades before the internet and the deluge of data we have available to us now. I can only imagine how much more potent he would be today.

He breaks down his methods in his book Ogilvy on Advertising, which is just as relevant today as it was when it was written in 1983. Since I’ve found his techniques useful, I’m publishing my notes here so I can easily refer back to them and share them.

How to Write Headlines That Sell

Headlines are the most important part of your advertisements. According to research done by Ogilvy, “five times as many people read the headlines as read the body copy. It follows that unless your headline sells your product, you have wasted 90 per cent of your money.”

  • Promise a benefit. Make sure the benefit is important to your customer. For example, “whiter wash, more miles per gallon, freedom from pimples, fewer cavities.”
  • Make it persuasive, and make it unique. Persuasive headlines that aren’t unique, which your competitors can claim, aren’t effective.
  • Make it specific. Use percentages, time elapsed, dollars saved.
  • Personalize it to your audience, such as the city they’re in. (Or the words in their search query)
  • Include the brand and product name.
  • Make it as long or as short as it needs to be. Ogilvy’s research found that, “headlines with more than ten words get less readership than short headlines. On the other hand, a study of retail advertisements found that headlines of ten words sell more merchandise than short headlines. Conclusion: if you need a long headline, go ahead and write one, and if you want a short headline, that’s all right too.”
  • Make it clear and to the point, not clever or tricky.
  • Don’t use superlatives like, “Our product is the best in the world.” Market researcher George Gallup calls this “Brag and Boast.” They convince nobody.

Ideas for Headlines

  • Headlines that contain news are surefire. The news can be announcing a new product, or a new way to use an existing product. “And don’t scorn tried-and-true words like amazing, introducing, now, suddenly.”
  • Include information that’s useful to the reader, provided the information involves your product.
  • Try including a quote, such as from an expert or customers.

How to Write Persuasive Body Copy

According to Ogilvy, body copy is seldom read by more than 10% of people. But the 10% who read it are prospects. What you say determines the success of your ad, so it’s worth spending the time to get it right.

  • Address readers directly, as if you are speaking to them. “One human being to another, second person singular.”
  • Write short sentences and short paragraphs. Avoid complicated words. Use plain, everyday language.
  • Don’t write long-winded, philosophical essays. “Tell your reader what your product will do for him or her, and tell it with specifics.”
  • Write your copy in the form of a story. The headline can be a hook.
  • Avoid analogies. People often misunderstand them.
  • Just like with headlines, stay away from superlatives like, “Our product is the best in the world.”
  • Use testimonials from customers or experts (also known as “social proof”). Avoid celebrity testimonials. Most people forget the product and remember the celebrity. Further, people assume the celebrity has been bought, which is usually true.
  • Coupons and special offers work.
  • Always include the price of your products. “You may see a necklace in a jeweler’s window, but you don’t consider buying it because the price is not shown and you are too shy to go in and ask. It is the same way with advertisements. When the price of the product is left out, people have a way of turning the page.”
  • Long copy sells more than short. “I believe, without any research to support me, that advertisements with long copy convey the impression that you have something important to say, whether people read the copy or not.”
  • Stick to the facts about what your product is and can do.
  • Make the first paragraph a grabber to draw people into reading your copy.
  • Sub-headlines make copy more readable and scannable.
  • People often skip from the headline to the coupon to see the offer, so make the coupons mini-ads, complete with brand name, promise, and a mini photo of the product.
  • To keep prospects on the hook, try “limited edition,” “limited supply,” “last time at this price,” or “special price for promptness.”

Suggestions for Images

After headlines, images are the most important part of advertisements. They draw people in. Here’s what makes imagery effective:

  • The best images arouse the viewer’s curiosity. They look at it and ask, “What’s going on here?” This leads them to read the copy to find out. This is called “Story Appeal.”
  • If you don’t have a good story to tell, make your product the subject.
  • Show the end result of using your product. Before-and-after photographs are highly effective.
  • Photographs attract more readers, are more believable, and better remembered than illustrations.
  • Human faces that are larger than life size repel readers. Don’t use them.
  • Historical subjects bore people.
  • If your picture includes people, it’s most effective if it uses people your audience can identify with. Doctors if you’re trying to sell to doctors, men if you’re trying to appeal to men, and so on.
  • Include captions under your photographs. More people read captions than body copy, so make the caption a mini-advertisement.

Layout

  • KISS – Keep It Simple, Stupid.
  • “Readers look first at the illustration, then at the headline, then at the copy. So put these elements in that order.” This also follows the normal order of scanning.
  • More people read captions of images than body copy, so always include a caption under it. Captions should be mini-advertisements, so include the brand name and promise.

A Few More Tips for Effective Ads

These are some other principles I picked up from the book, which can be useful in many different types of ads.

  • Demonstrations of how well your product works are effective. Try coming up with a demonstration that your reader can perform.
  • Don’t name competitors. The ad is less believable and more confusing. People often think the competitor is the hero.
  • Problem-solution is a tried-and-true ad technique.
  • Give people a reason why they should buy.
  • Emotion can be highly effective. Nostalgia, charm, sentimentality, etc. Consumers need a rational excuse to justify their emotional decisions.
  • Cartoons don’t sell well to adults.
  • The most successful products and services are differentiated from their competitors. This is most effective if you can differentiate via low cost or highest quality. A differentiator doesn’t need to be relevant to the product’s performance, however, to be effective. For example, Owens-Corning differentiated their insulation by advertising the color of the product, which has nothing to do with how the product performs.

Ogilvy’s principles are surprisingly evergreen, despite the technological changes. Towards the end of the book he quotes Bill Bernbach, another advertising giant, on why this is:

Human nature hasn’t changed for a billion years. It won’t even vary in the next billion years. Only the superficial things have changed. It is fashionable to talk about changing man. A communicator must be concerned with unchanging man – what compulsions drive him, what instincts dominate his every action, even though his language too often camouflages what really motivates him. For if you know these things about a man, you can touch him at the core of his being. One thing is unchangingly sure. The creative man with an insight into human nature, with the artistry to touch and move people, will succeed. Without them he will fail.

Human nature hasn’t changed much, indeed.


Get the book here: Ogilvy on Advertising

by Jeff Zych at November 13, 2018 06:07 AM

November 12, 2018

Ph.D. student

What proportion of data protection violations are due to “dark data” flows?

“Data protection” refers to the aspect of privacy concerned with the use and misuse of personal data by those who process it. Though the details are widely debated, scholars continue to converge (e.g.) on an ideal of data protection consisting of alignment between the purposes for which the data processor will use the data and the expectations of the user, along with collection limitations that reduce exposure to misuse. Through its extraterritorial enforcement mechanism, the GDPR has threatened to make these standards global.

The implication of these trends is that there will be a global field of data flows regulated by these kinds of rules. Many of the large and important actors that process user data can be held accountable to the law. Privacy violations by these actors will be due to a failure to act within the bounds of the law that applies to them.

On the other hand, there is also cybercrime, an economy of data theft and information flows that exists “outside the law”.

I wonder what proportion of data protection violations are due to dark data flows–flows of personal data that are handled by organizations operating outside of any effective regulation.

I’m trying to draw an analogy to a global phenomenon that I know little about but which strikes me as perhaps more pressing than data protection: the interrelated problems of money laundering, off-shore finance, and dark money contributions to election campaigns. While surely oversimplifying the issue, my impression is that the network of financial flows can be divided into those that are more and less regulated by effective global law. Wealth seeks out these opportunities in the dark corners.

How much personal data flows in these dark networks? And how much is it responsible for privacy violations around the world? Versus how much is data protection effectively in the domain of accountable organizations (that may just make mistakes here and there)? Or is the dichotomy false, with truly no firm boundary between licit and illicit data flow networks?

by Sebastian Benthall at November 12, 2018 01:37 PM

November 11, 2018

Ph.D. student

Eliciting Values Reflections by Engaging Privacy Futures Using Design Workbooks [Talk]

This blog post is a version of a talk I gave at the 2018 ACM Computer Supported Cooperative Work and Social Computing (CSCW) Conference based on a paper written with Deirdre Mulligan, Ellen Van Wyk, John Chuang, and James Pierce, entitled Eliciting Values Reflections by Engaging Privacy Futures Using Design Workbooks, which was honored with a best paper award. Find out more on our project page, our summary blog post, or download the paper: [PDF link] [ACM link]

In the work described in our paper, we created a set of conceptual speculative designs to explore privacy issues around emerging biosensing technologies, technologies that sense human bodies. We then used these designs to help elicit discussions about privacy with students training to be technologists. We argue that this approach can be useful for Values in Design and Privacy by Design research and practice.


Image from publicintelligence.net. Note the middle bullet point in the middle column – “avoids all privacy issues.”

Let me start with a motivating example, which I’ve discussed in previous talks. In 2007, the US Department of Homeland Security proposed a program to try to predict criminal behavior in advance of the crime itself, using thermal sensing, computer vision, eye tracking, gait sensing, and other physiological signals. And supposedly it would “avoid all privacy issues.” But it seems pretty clear that privacy was not fully thought through in this project. Homeland Security projects actually do go through privacy impact assessments, and I would guess that in this case the assessment would find that the system doesn’t store the biosensed data, and conclude that privacy is protected. But while this might address one conception of privacy related to storing data, there are other conceptions of privacy at play. There are still questions here about consent and movement in public space, about data use and collection, or about fairness and privacy from algorithmic bias.

While that particular imagined future hasn’t come to fruition, a lot of these types of sensors are now becoming available as consumer devices, used in applications ranging from health and quantified self, to interpersonal interactions, to tracking and monitoring. And it often seems like privacy isn’t fully thought through before new sensing devices and services are publicly announced or released.

A lot of existing privacy approaches, like privacy impact assessments, are deductive, checklist-based, or assume that privacy problems are already known and well-defined in advance, which often isn’t the case. Furthermore, the term “design” in discussions of Privacy by Design is often seen as a way of providing solutions to problems identified by law, rather than viewing design as a generative set of practices useful for understanding what privacy issues might need to be considered in the first place. We argue that speculative design-inspired approaches can help explore and define problem spaces of privacy in inductive, situated, and contextual ways.

Design and Research Approach

We created a design workbook of speculative designs. Workbooks are collections of conceptual designs drawn together to allow designers to explore and reflect on a design space. Speculative design is a practice of using design to ask social questions, by creating conceptual designs or artifacts that help create or suggest a fictional world. We can create speculative designs to explore different configurations of the world, and to imagine and understand possible alternative futures, which helps us think through issues that have relevance in the present. So rather than starting with design solutions for privacy, we wanted to use design workbooks and speculative designs together to create a collection of designs to help us explore what the problem space of privacy might look like with emerging biosensing technologies.


A sampling of the conceptual designs we created as part of our design workbook

In our prior work, we created a design workbook to do this exploration and reflection. Inspired by recent research, science fiction, and trends from the technology industry, we created a couple dozen fictional products, interfaces, and webpages of biosensing technologies. These included smart camera enabled neighborhood watch systems, advanced surveillance systems, implantable tracking devices, and non-contact remote sensors that detect people’s heartrates. This process is documented in a paper from Designing Interactive Systems. These were created as part of a self-reflective exercise, for us as design researchers to explore the problem space of privacy. However, we wanted to know how non-researchers, particularly technology practitioners might discuss privacy in relation to these conceptual designs.

A note on how we’re approaching privacy and values.  Following other values in design work and privacy research, we want to avoid providing a single universalizing definition of privacy as a social value. We recognize privacy as inherently multiple – something that is situated and differs within different contexts and situations.

Our goal was to use our workbook as a way to elicit values reflections and discussion about privacy from our participants – rather than looking for “stakeholder values” to generate design requirements for privacy solutions. In other words, we were interested in how technologists-in-training would use privacy and other values to make sense of the designs.

Growing regulatory calls for “Privacy by Design” suggest that privacy should be embedded into all aspects of the design process, and at least partially done by designers and engineers. Because of this, the ability for technology professionals to surface, discuss, and address privacy and related values is vital. We wanted to know how people training for those jobs might use privacy to discuss their reactions to these designs. We conducted an interview study, recruiting 10 graduate students from a West Coast US University who are training to go into technology professions, most of whom had prior tech industry experience via prior jobs or internships. At the start of the interview, we gave them a physical copy of the designs and explained that the designs were conceptual, but didn’t tell them that the designs were initially made to think about privacy issues. In the following slides, I’ll show a few examples of the speculative design concepts we showed – you can see more of them in the paper. And then I’ll discuss the ways in which participants used values to make sense of or react to some of the designs.

Design examples


 This design depicts an imagined surveillance system for public spaces like airports that automatically assigns threat statuses to people by color-coding them. We intentionally left it ambiguous how the design makes its color-coding determinations to try to invite questions about how the system classifies people.

Conceptual TruWork design – “An integrated solution for your office or workplace!”

In our designs, we also began to iterate on ideas relating to tracking implants and the different types of social contexts in which they could be used. Here’s a scenario advertising a workplace implantable tracking device called TruWork: employers can subscribe to the service and make their employees implant these devices to keep track of their whereabouts and work activities, with the aim of improving efficiency.

Conceptual CoupleTrack infographic depicting an implantable tracking chip for couples

We also re-imagined the implant as “CoupleTrack,” an implantable tracking chip for couples to use, as shown in this infographic.

Findings

We found that participants centered values in their discussions of the designs – predominantly privacy, but also related values such as trust, fairness, security, and due process. We found eight themes in how participants interacted with the designs in ways that surfaced discussion of values; I’ll highlight three here: imagining the designs as real, seeing one’s self as multiple users, and seeing one’s self as a technology professional. The rest are discussed in more detail in the paper.

Imagining the Designs as Real

Conceptual product page for a small, hidden, wearable camera

Even though participants were aware that the designs were imagined, some treated the designs as seemingly real by thinking about long-term effects in the fictional world of the design. The design pictured above is an easily hideable, wearable, live-streaming HD camera. One participant imagined what could happen to social norms if these became widely adopted, saying “If anyone can do it, then the definition of wrong-doing would be questioned, would be scrutinized.” He suggests that previously unmonitored activities would become open for surveillance and tracking, like “are the nannies picking up my children at the right time or not? The definition of wrong-doing will be challenged”. Participants became actively involved in fleshing out and creating the worlds in which these designs might exist. This reflection is also interesting because it begins to consider some secondary implications of widespread adoption, highlighting potential changes in social norms with increasing data collection.

Seeing One’s Self as Multiple Users

Second, participants took multiple user subject positions in relation to the designs. One participant read the webpage for TruWork and laughed at the design’s claim to create a “happier, more efficient workplace,” saying, “This is again, positioned to the person who would be doing the tracking, not the person who would be tracked.” She noted that the website is really aimed at the employer. She then imagined herself as an employee using the system, saying:

If I called in sick to work, it shouldn’t actually matter if I’m really sick. […] There’s lots of reasons why I might not wanna say, “This is why I’m not coming to work.” The idea that someone can check up on what I said—it’s not fair.

This participant put herself in the viewpoint of both an employer using the system and an employee using the system, bringing up issues of workplace surveillance and fairness. Taking different subject positions or stakeholder viewpoints in this way allowed participants to see the values implications of the designs.

Seeing One’s Self as a Technology Professional

Third, participants also looked at the designs through the lens of being a technology practitioner, relating the designs to their own professional practices. Looking at the design that automatically detects and flags supposedly suspicious people, one participant reflected on his self-identification as a data scientist and the values implications of predicting criminal behavior with data when he said:

the creepy thing, the bad thing is, like—and I am a data scientist, so it’s probably bad for me too, but—the data science is predicting, like Minority Report… [and then half-jokingly says] …Basically, you don’t hire data scientists.

Here he began to reflect on how his own practices as a data scientist might be implicated in this product’s creepiness – that his initial propensity to use the data to predict whether subjects are criminals might not be a good way to approach this problem, and might have implications for due process.

Another participant compared the CoupleTrack design to a project he was working on. He said:

[CoupleTrack] is very similar to our idea. […] except ours is not embedded in your skin. It’s like an IOT charm which people [in relationships] carry around. […] It’s voluntary, and that makes all the difference. You can choose to keep it or not to keep it.

By comparing the fictional CoupleTrack product to the product he was working on in his own technical practice, this participant saw very clearly the value of consent, and how one might revoke consent. Again, we thought it was compelling that the designs led some participants to begin reflecting on the privacy implications of their own technical practices.

Reflections and Takeaways

Given the workbooks’ ability to help elicit reflections on and discussion of privacy in multiple ways, we see this approach as useful for future Values in Design and Privacy by Design work.

The speculative workbooks helped open up discussions about values, similar to some of what Katie Shilton identifies as “values levers,” activities that foreground values and cause them to be viewed as relevant and useful to design. Participants’ seeing themselves as users to reflect on privacy harms is similar to prior work showing how self-testing can lead to discussion of values. Participants looking at the designs from multiple subject positions evokes value sensitive design’s foregrounding of multiple stakeholder perspectives. Participants reflected on the designs both from stakeholder subject positions and through the lenses of their professional practices as technology practitioners in training.

While Shilton identifies a range of people who might surface values discussions, we see the workbook itself as an actor that helps surface values discussions. By depicting provocative designs that elicited visceral and affective reactions, the workbooks brought attention to questions about potential sociotechnical configurations of biosensing technologies. Future values in design work might consider creating and sharing speculative design workbooks for eliciting values reflections with experts and technology practitioners.

More specifically, with this project’s focus on privacy, we think that this approach might be useful for “Privacy by Design”, particularly for technologists trying to surface discussions about the nature of the privacy problem at play for an emerging technology. We analyzed participants’ responses using Mulligan et al.’s privacy analytic framework. The paper discusses this in more detail, but the important thing is that participants went beyond just saying that privacy and other values are important to think about. They began to grapple with specific, situated, and contextual aspects of privacy – such as considering different ways to consent to data collection, or noting different types of harms that might emerge when the same technology is used in a workplace setting compared to an intimate relationship. Privacy professionals are looking for tools to help them “look around corners,” to understand what new types of privacy problems might occur in emerging technologies and contexts. This provides a potential new tool for privacy professionals in addition to many of the current top-down, checklist approaches – which assume that the concepts of privacy at play are well known in advance. Speculative design practices can be particularly useful here – not to predict the future, but to help open up and explore the space of possibilities.

Thank you to my collaborators, our participants, and the anonymous reviewers.

Paper citation: Richmond Y. Wong, Deirdre K. Mulligan, Ellen Van Wyk, James Pierce, and John Chuang. 2017. Eliciting Values Reflections by Engaging Privacy Futures Using Design Workbooks. Proc. ACM Hum.-Comput. Interact. 1, CSCW, Article 111 (December 2017), 26 pages. DOI: https://doi.org/10.1145/3134746

by Richmond at November 11, 2018 11:06 PM

November 07, 2018

Ph.D. student

the resilience of agonistic control centers of global trade

This post is merely notes; I’m fairly confident that I don’t know what I’m writing about. However, I want to learn more. Please recommend anything that could fill me in about this! I owe most of this to discussion with a colleague who I’m not sure would like to be acknowledged.

Following the logic of James Beniger, an increasingly integrated global economy requires more points of information integration and control.

Bourgeois (in the sense of ‘capitalist’) legal institutions exist precisely for the purpose of arbitrating between merchants.

Hence, on the one hand we would expect international trade law to be Habermasian. However, international trade need not rest on a foundation of German idealism (which increasingly strikes me as the core of European law). Rather, it is an evolved mechanism.

A key part of this mechanism, as I’ve heard, is that it is decentered. Multiple countries compete to be the sites of transnational arbitration, much like multiple nations compete to be tax havens. Sovereignty and discretion are factors of production in the economy of control.

This means, effectively, that one cannot defeat capitalism by chopping off its head. It is rather much more like a hydra: the “heads” are the creation of two-sided markets. These heads have no internalized sense of the public good. Rather, they are optimized to be attractive to the transnational corporations in bilateral negotiation. The plaintiffs and defendants in these cases are corporations and states – social forms and institutions of complexity far beyond that of any individual person. This is where, so to speak, the AIs clash.

by Sebastian Benthall at November 07, 2018 01:38 PM

October 31, 2018

Ph.D. student

Best Practices Team Challenges

By Stuart Geiger and Dan Sholler, based on a conversation with Aaron Culich, Ciera Martinez, Fernando Hoces, Francois Lanusse, Kellie Ottoboni, Marla Stuart, Maryam Vareth, Sara Stoudt, and Stéfan van der Walt. This post first appeared on the BIDS Blog.

This post is a summary of the first BIDS Best Practices lunch, in which we bring people together from across the Berkeley campus and beyond to discuss a particular challenge or issue in doing data-intensive research. The goal of the series is to informally share experiences and ideas on how to do data science well (or at least better) from many disciplines and contexts. The topic for this week was doing data-intensive research in teams, labs, and other groups. For this first meeting, we focused on just identifying and diagnosing the many different kinds of challenges. In future meetings, we will dive deeper into some of these specific issues and try to identify best practices for dealing with them.

We began planning for this series by reviewing many of the published papers and series around “best practices” in scientific computing (e.g. Wilson et al, 2014), “good enough practices” (Wilson et al, 2017) and PLOS Computational Biology’s “ten simple rules” series (e.g. Sandve et al, 2013; Goodman et al, 2014). We also see this series as an intellectual successor to the collection of case studies in reproducible research published by several BIDS fellows (Kitzes, Turek, and Deniz, 2018). One reason we chose to identify issues with doing data science in teams and groups is that many of us felt we understood how to practice data-intensive research well individually, but struggled with how to do it well in teams and groups.

Compute and data challenges

Getting on the same stack

Some of the major challenges in doing data-intensive research in teams are around technology use, particularly getting everyone to use the same tools. Today’s computational researchers have an overwhelming number of options to choose from in terms of programming languages, software libraries, data formats, operating systems, compute infrastructures, version control systems, collaboration platforms, and more. One of the major challenges we discussed was that members of a team have often been trained to work with different technologies, which also often come with their own ways of working on a problem. Getting everyone on the same technical stack often takes far more time than anticipated, and new members can spend much time learning to work in a new stack.

One of the biggest divides our group had experienced was in the choice of programming language, as many of us were more comfortable with either R or Python. These languages have their own extensive software libraries, like the tidyverse versus the numpy/pandas/matplotlib stack. There are also many different software environments to choose from at various layers of the stack, from development environments like Jupyter notebooks versus RStudio and RMarkdown, to the many options for package and dependency management. While most of the people in the room were committed to open source languages and environments, many people are trained to use proprietary software like MATLAB or SPSS, which raises an additional challenge in teams and groups.

Another major issue is where the actual computing and data storage will take place. Members of a team often come in knowing how to run code on their own laptops, but there are many options for where a group can work, including a lab’s own shared physical server, campus clusters, national grid and supercomputer infrastructures, corporate cloud services, and more.

Workflow and pipeline management

Getting everyone to use an interoperable software and hardware environment is as much of a social challenge as it is a technical one, and we had a great discussion about whether a group leader should (or could) require members to use the same language, environment, or infrastructure. One of the technical solutions to this issue — working in staged data analysis pipelines — comes with its own set of challenges. In a staged pipeline, data processing and analysis are separated into modular tasks that each individual can solve in their own way, as long as they write their output to a standardized file for the next stage of the pipeline to take as input.

The ideal end goal is often imagined to be a fully-automated (or ‘one click’) data processing and analysis pipeline, but this is difficult to achieve and maintain in practice. Several people in our group said they personally spend substantial amounts of time setting up these pipelines and making sure that each person’s piece works with everyone else’s. Even in groups that had formalized detailed data management plans, a common theme was that someone had to constantly make sure that team members were actually following these standards so that the pipeline keeps running.
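
To make the staged-pipeline idea concrete, here is a minimal sketch in Python using only the standard library. The file names, the two stages, and the "value" column are hypothetical placeholders rather than anything from the projects discussed at the lunch; a real pipeline would typically be orchestrated by a Makefile or a dedicated workflow manager rather than a single script.

# A minimal sketch of a staged data analysis pipeline. Each stage reads a
# standardized file produced by the previous stage and writes a standardized
# file for the next one, so team members can own their stage independently
# as long as the file "contracts" between stages hold.
import csv
import json
from pathlib import Path

# Hypothetical file locations; any real project would define its own layout.
RAW = Path("data/raw_measurements.csv")      # e.g. as delivered by an external partner
CLEAN = Path("data/clean_measurements.csv")  # output of the cleaning stage
SUMMARY = Path("results/summary.json")       # output of the analysis stage

def clean_stage() -> None:
    """Stage 1: drop rows with missing fields and write a standardized CSV."""
    CLEAN.parent.mkdir(parents=True, exist_ok=True)
    with RAW.open() as src, CLEAN.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if all(value.strip() for value in row.values()):
                writer.writerow(row)

def analysis_stage() -> None:
    """Stage 2: read the cleaned CSV and write summary statistics as JSON."""
    with CLEAN.open() as src:
        # "value" is a placeholder column name for whatever is being measured.
        values = [float(row["value"]) for row in csv.DictReader(src)]
    summary = {"n": len(values),
               "mean": sum(values) / len(values) if values else None}
    SUMMARY.parent.mkdir(parents=True, exist_ok=True)
    SUMMARY.write_text(json.dumps(summary, indent=2))

if __name__ == "__main__":
    # The "one click" ideal: run every stage in order.
    clean_stage()
    analysis_stage()

The point of this structure is that the cleaning and analysis stages can be owned by different people, even working in different languages, as long as the intermediate files keep their agreed format; the maintenance burden described above comes largely from keeping those agreements honored.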

External handoffs to and from the team

Many of the research projects we discussed involved not only handoffs between members of the team, but also handoffs between the team and external groups. The “raw” data a team begins with is often the final output of another research team, government agency, or company. In these cases, our group discussed issues that ranged from technical to social, from data formats that are technically difficult to integrate at scale (like Excel spreadsheets) to not having adequate documentation to be able to interpret what the data actually means. Similarly, teams often must deliver data to external partners, who may have very different needs, expectations, and standards than the team has for itself. Finally, some teams have sensitive data privacy issues and requirements, which makes collaboration even more difficult. How can these external relationships be managed in mutually beneficial ways?

Team management challenges

Beyond technical challenges, a number of management issues face research groups aspiring to implement best practices for data-intensive research. Our discussion highlighted the difficulties of composing a well-balanced team, of dealing with fluid membership, and of fostering generative coordination and communication among group members.

Composing a well-balanced team

Data-intensive research groups require a team with varied expertise. A consequence of varied expertise is varied capabilities and end goals, so project leads must devote attention to managing team composition. Whereas one or two members might be capable of carrying out tasks across the various stages of research, others might specialize in a particular area. How, then, can research groups ensure both that the departure of any one member would not collapse the project and that the team holds the necessary expertise to accomplish the shared research goal? Furthermore, some members may participate simply to acquire skills, while others seek to establish or build an academic track record. How might groups achieve alignment between personal and team goals?

Dealing with voluntary and fluid membership

A practical management problem also relates to the quasi-voluntary and fluid nature of research groups. Research groups rely extensively on students and postdocs, with an expectation that they join the team temporarily to gain new skills and experience, then leave. Turnover becomes a problem when processes, practices, and tacit institutional knowledge are difficult to standardize or document. What strategies might project leads employ to alleviate the difficulties associated with voluntary, fluid membership?

Fostering coordination and communication

The issues of team composition and voluntary or fluid membership raise a third challenge: fostering open communication among group members. Previous research and guidelines for managing teams (Edmondson, 1999; Google re:Work, 2017) emphasize the vital role of psychological safety in ensuring that team members share knowledge and collaborate effectively. Adequate psychological safety ensures that team members are comfortable speaking up about their ideas and welcoming of others’ feedback. Yet fostering psychological safety is a difficult task when research groups comprise members with varying levels of expertise and career experience who, increasingly, come from different communities of practice (as in the case of data scientists working with domain experts). How can projects establish avenues for open communication between diverse members?

Not abandoning best practices when deadlines loom

One of the major issues that resonated across our group was the tendency for a team to stop following various best practices when deadlines rapidly approach. In the rush to do everything that is needed to get a publication submitted, it is easy to accrue what software engineers call “technical debt.” For example, substantial “collaboration debt” or “reproducibility debt” can be foisted on a team when a member works outside of the established workflow to produce a figure or fails to document their changes to analysis code. These stressful moments can also be difficult for the team’s psychological safety, particularly if there is an expectation to work late hours to make the deadline.

Concluding thoughts and plans

Are there universal best practices for all cases and contexts?

At the conclusion of our first substantive meeting, we began to evaluate topics for future discussions that might help us identify potential solutions to the challenges faced by data-intensive research groups. In doing so, we were quickly confronted with the diversity of technologies, research agendas, disciplinary norms, team compositions, governance structures, and other factors that characterize scientific research groups. Are solutions that work for large teams appropriate for smaller teams? Do cross-institutional or interdisciplinary teams face different problems than those working in the same institution or discipline? Are solutions that work in astronomy or physics appropriate for ecology or the social sciences? Dealing with such diversity and contextuality, then, might require adjusting our line of inquiry to the following question: at what level should we attempt to generalize best practices?

Our future plans

The differences within and between research groups are meaningful and deserve adequate attention, but commonalities do exist. This semester, our group will aggregate and develop input from a diverse community of practitioners to construct sets of thoughtful, grounded recommendations. For example, we’ll aim to provide recommendations on issues such as how to build and maintain pipelines and workflows, as well as strategies for achieving diversity and inclusion in teams. In our next post, we’ll offer some insights on how to manage the common problem of perpetual turnover in team membership. On all topics, we welcome feedback and recommendations.

Combatting impostor syndrome

Finally, many people who attended told us afterwards how positive and valuable it was to share these kinds of issues and experiences, particularly for combatting the “impostor syndrome” that many of us often feel. We typically only present the final end-product of research. Even sharing one’s final code and data in perfectly reproducible pipelines can still hide all the messy, complex, and challenging work that goes into the research process. People deeply appreciated hearing others talk openly about the difficulties and challenges that come with doing data-intensive research and how they tried to deal with them. The format of sharing challenges followed by strategies for dealing with those challenges may be a meta-level best practice for this kind of work, versus the more standard approach of listing more abstract rules and principles. Through these kinds of conversations, we hope to continue to shed light on the doing of data science in ways that will be constructive and generative across the many fields, areas, and contexts in which we all work.

by R. Stuart Geiger at October 31, 2018 07:00 AM

October 23, 2018

Ph.D. student

For a more ethical Silicon Valley, we need a wiser economics of data

Kara Swisher’s NYT op-ed about the dubious ethics of Silicon Valley and Nitasha Tiku’s WIRED article reviewing books with alternative (and perhaps more cynical than otherwise stated) stories about the rise of Silicon Valley has generated discussion and buzz among the tech commentariat.

One point of debate is whether the focus should be on “ethics” or on something more substantively defined, such as human rights. Another point is whether the emphasis should be on “ethics” or on something more substantively enforced, like laws which impose penalties between 1% and 4% of profits, referring of course to the GDPR.

While I’m sympathetic to the European approach (laws enforcing human rights with real teeth), I think there is something naive about it. We have not yet seen whether it’s ever really possible to comply with the GDPR; it could wind up being a kind of heavy tax on Big Tech companies operating in the EU, but one that doesn’t truly wind up changing how people’s data are used. In any case, the broad principles of European privacy are based on individual human dignity, and so they do not take into account the ways that corporations are social structures, i.e. sociotechnical organizations that transcend individual people. The European regulations address the problem of individual privacy while leaving mystified the question of why the current corporate organization of the world’s personal information is what it is. This sets up the fight over ‘technology ethics’ to be a political conflict between different kinds of actors whose positions are defined as much by their social habitus as by their intellectual reasons.

My own (unpopular!) view is that the solution to our problems of technology ethics is going to have to rely on a better adapted technology economics. We often forget today that economics was originally a branch of moral philosophy. Adam Smith wrote The Theory of Moral Sentiments (1759) before An Inquiry into the Nature and Causes of the Wealth of Nations (1776). Since then the main purpose of economics has been to intellectually grasp the major changes to society due to production, trade, markets, and so on in order to better steer policy and business strategy towards more fruitful equilibria. The discipline has a bad reputation among many “critical” scholars due to its role in supporting neoliberal ideology and policies, but it must be noted that this ideology and policy work is not entirely cynical; it was a successful centrist hegemony for some time. Now that it is under threat, partly due to the successes of the big tech companies that benefited under its regime, it’s worth considering what new lessons we have to learn to steer the economy in an improved direction.

The difference between an economic approach to the problems of the tech economy and either an ‘ethics’ or a ‘law’ based approach is that it inherently acknowledges that there are a wide variety of strategic actors co-creating social outcomes. Individual “ethics” will not be able to settle the outcomes of the economy because the outcomes depend on collective and uncoordinated actions. A fundamentally decent person may still do harm to others due to their own bounded rationality; “the road to hell is paved with good intentions”. Meanwhile, regulatory law is not the same as command; it is at best a way of setting the rules of a game that will be played, faithfully or not, by many others. Putting regulations in place without a good sense of how the game will play out differently because of them is just as irresponsible as implementing a sweeping business practice without thinking through the results, if not more so because the relationship between the state and citizens is coercive, not voluntary as the relationship between businesses and customers is.

Perhaps the biggest obstacle to shifting the debate about technology ethics to one about technology economics is that it requires a change in register. It drains the conversation of the pathos which is so instrumental in surfacing it as an important political topic. Sound analysis often ruins parties like this. Nevertheless, it must be done if we are to progress towards a more just solution to the crises technology gives us today.

by Sebastian Benthall at October 23, 2018 03:04 PM

October 17, 2018

Ph.D. student

Engaging Technologists to Reflect on Privacy Using Design Workbooks

This post summarizes a research paper, Eliciting Values Reflections by Engaging Privacy Futures Using Design Workbooks, co-authored with Deirdre Mulligan, Ellen Van Wyk, John Chuang, and James Pierce. The paper will be presented at the ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) on Monday November 5th (in the afternoon Privacy in Social Media session). Full paper available here.

Recent wearable and sensing devices, such as Google Glass, Strava, and internet-connected toys, have raised questions about ways in which privacy and other social values might be implicated by their development, use, and adoption. At the same time, legal, policy, and technical advocates for “privacy by design” have suggested that privacy should be embedded into all aspects of the design process, rather than being addressed after a product is released, or rather than being addressed as just a legal issue. By advocating that privacy be addressed through technical design processes, the ability of technology professionals to surface, discuss, and address privacy and other social values becomes vital.

Companies and technologists already use a range of tools and practices to help address privacy, including privacy engineering practices, or making privacy policies more readable and usable. But many existing privacy mitigation tools are either deductive or assume that privacy problems are already known and well-defined in advance. However, we often don’t have privacy concerns well-conceptualized in advance when creating systems. Our research shows that design approaches (drawing on a set of techniques called speculative design and design fiction) can help better explore, define, and perhaps even anticipate what we mean by “privacy” in a given situation. Rather than trying to look at a single, abstract, universal definition of privacy, these methods help us think about privacy as relations among people, technologies, and institutions in different types of contexts and situations.

Creating Design Workbooks

We created a set of design workbooks — collections of design proposals or conceptual designs, drawn together to allow designers to investigate, explore, reflect on, and expand a design space. We drew on speculative design practices: in brief, our goal was to create a set of slightly provocative conceptual designs to help engage people in reflections or discussions about privacy (rather than propose specific solutions to problems posed by privacy).

A set of sketches that comprise the design workbook

Inspired by science fiction, technology research, and trends from the technology industry, we created a couple dozen fictional products, interfaces, and webpages of biosensing technologies, or technologies that sense people. These included smart-camera-enabled neighborhood watch systems, advanced surveillance systems, implantable tracking devices, and non-contact remote sensors that detect people’s heart rates. In earlier design work, we reflected on how putting the same technologies in different types of situations, scenarios, and social contexts would vary the types of privacy concerns that emerged (such as the different concerns that would emerge if advanced miniature cameras were used by the police, by political advocates, or by the general public). However, we wanted to see how non-researchers might react to and discuss the conceptual designs.

How Did Technologists-In-Training View the Designs?

Through a series of interviews, we shared our workbook of designs with master’s students in an information technology program who were training to go into the tech industry. We found several ways in which they brought up privacy-related issues while interacting with the workbooks, and highlight three of those ways here.

TruWork — A product webpage for a fictional system that uses an implanted chip allowing employers to keep track of employees’ location, activities, and health, 24/7.

First, our interviewees discussed privacy by taking on multiple user subject positions in relation to the designs. For instance, one participant looked at the fictional TruWork workplace implant design by imagining herself in the positions of an employer using the system and an employee using the system, noting how the product’s claim of creating a “happier, more efficient workplace” was a value proposition aimed at the employer rather than the employee. While the system promises to tell employers whether or not their employees are lying about why they need a sick day, the participant noted that there might be many reasons why an employee might need to take a sick day, and that those reasons should be private from their employer. These reflections are valuable, as prior work has documented how considering the viewpoints of direct and indirect stakeholders is important for addressing social values in design practices.

CoupleTrack — an advertising graphic for a fictional system that uses an implanted chip that people in a relationship wear in order to keep track of each other’s location and activities.

A second way privacy reflections emerged was when participants discussed the designs in relation to their professional technical practices. One participant compared the fictional CoupleTrack implant to a wearable device for couples that he was building, in order to discuss different ways in which consent to data collection can be obtained and revoked. CoupleTrack’s embedded nature makes it much more difficult to revoke consent, while a wearable device can be more easily removed. This is useful because we’re looking for ways workbooks of speculative designs can help technologists discuss privacy in ways that they can relate back to their own technical practices.

Airport Tracking System — a sketch of an interface for a fictional system that automatically detects and flags “suspicious people” by color-coding people in surveillance camera footage.

A third theme that we found was that participants discussed and compared multiple ways in which a design could be configured or implemented. Our designs tend to describe products’ functions but do not specify technical implementation details, allowing participants to imagine multiple implementations. For example, a participant looking at the fictional automatic airport tracking and flagging system discussed the privacy implications of two possible implementations: one where the system only identifies and flags people with a prior criminal history (which might create extra burdens for people who have already served their time for a crime and have been released from prison), and one where the system uses behavioral predictors to try to identify “suspicious” behavior (which might go against the notion of “innocent until proven guilty”). The designs were useful at provoking conversations about the privacy and values implications of different design decisions.

Thinking About Privacy and Social Values Implications of Technologies

This work provides a case study showing how design workbooks and speculative design can be useful for thinking about the social values implications of technology, particularly privacy. In the time since we made these designs, some (sometimes eerily) similar technologies have been developed or released, such as workers at a Swedish company embedding RFID chips in their hands, or Logitech’s Circle Camera.

But our design work isn’t meant to predict the future. Instead, what we tried to do is take some technologies that are emerging or on the near horizon, and think seriously about ways in which they might get adopted, used and misused, or interact with existing social systems — such as the workplace, government surveillance, or school systems. How might privacy and other values be at stake in those contexts and situations? We aim for these designs to help shed light on the space of possibilities, in an effort to help technologists make more socially informed design decisions in the present.

We find it compelling that our design workbooks helped technologists-in-training discuss emerging technologies in relation to everyday, situated contexts. These workbooks don’t depict far-off speculative science fiction with flying cars and spaceships. Rather, they imagine future uses of technologies by having someone look at a product website, an amazon.com page, or an interface, and think about the real and diverse ways in which people might experience those technology products. Using techniques that focus on the potential adoptions and uses of emerging technologies in everyday contexts helps raise issues which might not be immediately obvious if we only think about the positive social implications of technologies, and it also helps surface issues that we might not see if we only think about social implications in terms of “worst case scenarios” or dystopias.

Paper Citation:

Richmond Y. Wong, Deirdre K. Mulligan, Ellen Van Wyk, James Pierce, and John Chuang. 2017. Eliciting Values Reflections by Engaging Privacy Futures Using Design Workbooks. Proc. ACM Hum.-Comput. Interact. 1, CSCW, Article 111 (December 2017), 26 pages. DOI: https://doi.org/10.1145/3134746


This post is crossposted with the ACM CSCW Blog

by Richmond at October 17, 2018 04:40 PM

October 15, 2018

Ph.D. student

Privacy of practicing high-level martial artists (BJJ, CI)

Continuing my somewhat lazy “ethnographic” study of Brazilian Jiu Jitsu: an interesting occurrence the other day illustrates something about BJJ that is reflective of privacy as contextual integrity.

Spencer (2016) has accounted for the changes in martial arts culture, and especially Brazilian Jiu Jitsu, due to the proliferation of video on-line. Social media is now a major vector for skill acquisition in BJJ. It is also, in my gym, part of the social experience. There are a few dedicated accounts on social media platforms that share images and video from practice. There is a group chat where gym members cheer each other on, share BJJ culture (memes, tips), and communicate with the instructors.

Several members have been taking pictures and videos of others in practice and sharing them to the group chat. These are generally met with enthusiastic acclaim and acceptance. The instructors have also been inviting in very experienced (black belt) players for one-off classes. These classes are opportunities for the less experienced folks to see another perspective on the game. Because it is a complex sport, there is a wide variety of styles, and in general it is exciting and beneficial to see the moves and attitudes of masters besides the ones we normally train with.

After some videos of a new guest instructor were posted to the group chat, one of the permanent instructors (“A”) asked members not to do this:

A: “As a general rule of etiquette, you need permission from a black belt and esp if two black belts are rolling to record them training, be it drilling not [sic] rolling live.”

A: “Whether you post it somewhere or not, you need permission from both to record then [sic] training.”

B: “Heard”

C: “That’s totally fine by me, but im not really sure why…?”

B: “I’m thinking it’s a respect thing.”

A: “Black belt may not want footage of him rolling or training. as a general rule if two black belts are training together it’s not to be recorded unless expressly asked. if they’re teaching, that’s how they pay their bills so you need permission to record them teaching. So either way, you need permission to record a black belt.”

A: “I’m just clarifying for everyone in class on etiquette, and for visiting other schools. Unless told by X, Y, [other gym staff], etc., or given permission at a school you’re visiting, you’re not to record black belts and visiting upper belts while rolling and potentially even just regular training or class. Some schools take it very seriously.”

C: “OK! Totally fine!”

D: “[thumbs up emoji] gots it :)”

D: “totally makes sense”

A few observations on this exchange.

First, there is the intriguing point that for martial arts black belts, teaching is part of their livelihood. The knowledge of the expert martial arts practitioner is hard-earned and valuable “intellectual property”, and it is exchanged through being observed. Training at a gym with high-rank players is a privilege that lower ranks pay for. The use of video recording has changed the economy of martial arts training. This has in many ways opened up the sport; it also opens up potential opportunities for the black belt in producing training videos.

Second, this is framed as etiquette, not as a legal obligation. I’m not sure what the law would say about recordings in this case. It’s interesting that, as a point of etiquette, it applies only to videos of high belt players. Recording low belt players doesn’t seem to be a problem according to the agreement in the discussion. (I personally have asked not to be recorded at one point at the gym, when an instructor explicitly asked to be recorded in order to create demo videos. This was out of embarrassment at my own poor skills; I was also feeling badly because I was injured at the time. This sort of consideration does not, it seems, currently operate as privacy etiquette within the BJJ community. Perhaps these norms are currently being negotiated or are otherwise in flux.)

Third, there is a sense in which high rank in BJJ comes with authority and privileges that do not require any justification. The “trainings are livelihood” argument does not apply directly to general practice rolls; the argument is not airtight. There is something else about the authority and gravitas of the black belt that is being preserved here. There is a sense of earned respect. Somehow this translates into a different form of privacy (information flow) norm.

References

Spencer, D. C. (2016). From many masters to many students: YouTube, Brazilian Jiu Jitsu, and communities of practice. JOMEC Journal, (5).

by Sebastian Benthall at October 15, 2018 05:11 PM