November 15, 2018

#### Using Crowdsourcing to address Disparities in Police Reported Data: Addressing Challenges in Technology and Community Engagement

This is a project update from a 2017 CTSP project: Assessing Race and Income Disparities in Crowdsourced Safety Data Collection (with Kate Beck, Aditya Medury, and Jesus M. Barajas)

Project Update

This work has led to the development of Street Story, a community engagement tool that collects street safety information from the public, through UC Berkeley SafeTREC.

The tool collects qualitative and quantitative information, and then creates maps and tables that can be publicly viewed and downloaded. The Street Story program aims to collect information that can create a fuller picture of transportation safety issues, and make community-provided information publicly accessible.

The Problem

Low-income groups, people with disabilities, seniors and racial minorities are at higher risk of being injured while walking and biking, but experts have limited information on what these groups need to reduce these disparities. Transportation agencies typically rely on statistics about transportation crashes aggregated from police reports to decide where to make safety improvements. However, police-reported data is limited in a number of ways. First, crashes involving pedestrians or cyclists are significantly under-reported to police, with reports finding that up to 60% of pedestrian and bicycle crashes go unreported. Second, some demographic groups, including low-income groups, people of color and undocumented immigrants, have histories of contentious relationships with police. Therefore, they may be less likely to report crashes to the police when they do occur. Third, crash data doesn’t include locations where near misses have happened, or locations where individuals feel unsafe but no incident has yet occurred. In other words, the data allows professionals to react to safety issues, but doesn’t necessarily allow them to be proactive about them.

One solution to improve and augment the data agencies use to make decisions and allocate resources is to provide a way for people to report transportation safety issues themselves. Some public agencies and private firms are developing apps and websites where people can report issues for this purpose. But one concern is that the people who are likely to use these crowdsourcing platforms are those who have access to smartphones or the internet and who trust that government agencies will use the data to make changes, biasing the data toward the needs of these privileged groups.

Our Initial Research Plan

We chose to examine whether crowdsourced traffic safety data reflected similar patterns of underreporting and potential bias as police-reported safety data. To do this, we created an online mapping tool that people could use to report traffic crashes, near misses and general safety issues. We planned to work with a city to release this tool to, and collect data from, the general public, then work directly with a historically marginalized community, under-represented in police-reported data, to target data collection in a high-need neighborhood. We planned to reduce barriers to entry for this community, including meeting the participants in person to explain the tool, providing them with in-person and online training, providing participants with cell phones, and compensating their data plans for the month. By crowdsourcing data from the general public and from this specific community, we planned to analyze whether there were any differences in the types of information reported by different demographics.

This plan seemed to work well with the research question and with community engagement best practices. However, we came up against a number of challenges with our research plan. Although many municipal agencies and community organizations found the work we were doing interesting and were working to address transportation safety issues similar to those we were focusing on, many seemed daunted by the prospect of using technology to address underlying issues of under-reporting. We also found that a year was not enough time to build trusting relationships with the organizations and agencies we had hoped to work with. Nevertheless, we were able to release a web-based mapping tool to collect some crowdsourced safety data from the public.

Changing our Research Plan

To better understand how better-integrated digital crowdsourcing platforms perform, we pivoted our research project to explore how different neighborhoods engage with government platforms to report non-emergency service needs. We assumed some of these non-emergency services would mirror the negative perceptions of bicycle and pedestrian safety we were interested in collecting via our crowdsourcing safety platform. The City of Oakland relies on SeeClickFix, a smartphone app, to allow residents to request service for several types of issues: infrastructure issues, such as potholes, damaged sidewalks, or malfunctioning traffic signals; and non-infrastructure issues such as illegal dumping or graffiti. The city also provides phone, web, and email-based platforms for reporting the same types of service requests. These alternative platforms are collectively known as 311 services. We looked at 45,744 SeeClickFix reports and 35,271 311 reports made between January 2013 and May 2016. We classified Oakland neighborhoods by whether they qualify as communities of concern. In the city of Oakland, 69 neighborhoods meet the definition for communities of concern, while 43 do not. Because we did not have data on the characteristics of each person reporting a service request, we made the assumption that people reporting requests also lived in the neighborhood where the service was requested.

How did communities of concern interact with the SeeClickFix and 311 platforms to report service needs? Our analysis highlighted two main takeaways. First, we found that communities of concern were more engaged in reporting than other communities, but had different reporting dynamics based on the type of issue they were reporting. About 70 percent of service issues came from communities of concern, even though they represent only about 60 percent of the communities in Oakland. They were nearly twice as likely to use SeeClickFix as to report via the 311 platforms overall, but only for non-infrastructure issues. Second, we found that even though communities of concern were more engaged, the level of engagement was not equal for everyone in those communities. For example, neighborhoods with higher proportions of limited-English proficient households were less likely to report any type of incident by 311 or SeeClickFix.
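As a rough check on the figures above, the share of Oakland neighborhoods classified as communities of concern can be recomputed from the counts given earlier (69 and 43). The sketch below is only illustrative: the variable names are ours, and the 70 percent figure is the approximate share reported in the text, not a value recomputed from the underlying data.

```python
# Counts taken from the text; names are illustrative, not from the project code.
coc_neighborhoods = 69      # neighborhoods meeting the community-of-concern definition
other_neighborhoods = 43    # neighborhoods that do not

coc_share = coc_neighborhoods / (coc_neighborhoods + other_neighborhoods)

# Approximate share of service issues from communities of concern ("about 70 percent").
reported_issue_share = 0.70

# A ratio above 1 means more reports than the neighborhood share alone would suggest.
representation_ratio = reported_issue_share / coc_share

print(f"Share of neighborhoods that are COCs: {coc_share:.1%}")
print(f"Representation ratio: {representation_ratio:.2f}")
```

The neighborhood share works out to a little over 60 percent, consistent with the figure quoted above, so a 70 percent share of reports is somewhat more than proportional.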

Preliminary Findings from Crowdsourcing Transportation Safety Data

We deployed the online tool in August 2017. The crowdsourcing platform was aimed at collecting transportation safety-related concerns pertaining to pedestrian and bicycle crashes, near misses, perceptions of safety, and incidents of crime while walking and bicycling in the Bay Area. We disseminated the link to the crowdsourcing platform primarily through Twitter and some email lists. Examples of organizations that were contacted through Twitter-based outreach and subsequently interacted with the tweet (through likes and retweets) include Transform Oakland, Silicon Valley Bike Coalition, Walk Bike Livermore, California Walks, Streetsblog CA, and Oakland Built. By December 2017, we had received 290 responses from 105 respondents. Half of the responses corresponded to perceptions of traffic safety concerns (“I feel unsafe walking/cycling here”), while 34% corresponded to near misses (“I almost got into a crash but avoided it”). In comparison, 12% of responses reported an actual pedestrian or bicycle crash, and 4% of responses reported a crime while walking or bicycling. The sample size of the responses is too small to report any statistical differences.
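To give a sense of scale, the percentages above can be converted into approximate response counts. This is a minimal sketch under stated assumptions: the category labels and variable names are ours, and the counts are only approximations because the text reports rounded percentages rather than raw tallies.

```python
# Totals and percentages come from the text; counts are approximations because
# only rounded percentages were reported.
total_responses = 290
total_respondents = 105

reported_percentages = {
    "perceived safety concern": 50,  # "half of the responses"
    "near miss": 34,
    "crash": 12,
    "crime": 4,
}

# Convert each rounded percentage into an approximate number of responses.
approx_counts = {
    kind: round(total_responses * pct / 100)
    for kind, pct in reported_percentages.items()
}
responses_per_respondent = total_responses / total_respondents

for kind, count in approx_counts.items():
    print(f"{kind}: ~{count} responses")
print(f"Average responses per respondent: {responses_per_respondent:.1f}")
```

Because the percentages are rounded, the approximate counts need not sum exactly to 290. With only a few dozen crash and crime reports, it is easy to see why the sample is too small to support statistical comparisons.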

Figure 1 shows the spatial patterns of the responses in the Bay Area aggregated to census tracts. Most of the responses were concentrated in Oakland and Berkeley. Oakland was specifically targeted as part of the outreach efforts since it has significant income and racial/ethnic diversity.

Figure 1 Spatial Distribution of the Crowdsourcing Survey Responses

In order to assess the disparities in the crowdsourced data collection, we compared responses between census tracts that are classified as communities of concern or not. A community of concern (COC), as defined by the Metropolitan Transportation Commission, a regional planning agency, is a census tract that ranks highly on several markers of marginalization, including proportion of racial minorities, low-income households, limited-English speakers, and households without vehicles, among others.

Table 1 shows the comparison between the census tracts that received at least one crowdsourcing survey response. The average numbers of responses received in COCs and non-COCs across the entire Bay Area were similar and statistically indistinguishable. However, when focusing on Oakland-based tracts, the results reveal that the average number of crowdsourced responses in non-COCs was statistically significantly higher. To see how these self-reported pedestrian/cyclist concerns compare with police-reported crashes, we also assessed pedestrian and bicycle-related police-reported crashes from 2013 to 2016: on average, more police-reported pedestrian/bicycle crashes were observed in COCs, both across the Bay Area and in Oakland. The difference between the trends observed in the crowdsourced concerns and in the police-reported crashes suggests that either walking/cycling concerns are greater in non-COCs (and thus underrepresented in police-reported crashes), or that participation from COCs is relatively underrepresented.

Table 1 Comparison of crowdsourced concerns and police-reported pedestrian/bicycle crashes in census tracts that received at least 1 response

Table 2 compares the self-reported income and race/ethnicity characteristics of the respondents with the locations where the responses were reported. For reference purposes, the Bay Area’s median household income in 2015 was estimated to be $85,000 (Source: http://www.vitalsigns.mtc.ca.gov/income), and the Bay Area’s population was estimated to be 58% White, per the 2010 Census (Source: http://www.bayareacensus.ca.gov/bayarea.htm).

Table 2 Distribution of all Bay Area responses based on the location of response and the self-reported income and race/ethnicity of respondents

The results reveal that White, medium-to-high income respondents were observed to report more walking/cycling-related safety issues in our survey, and more so in non-COCs. This trend is also consistent with the definition of COCs, which tend to have a higher representation of low-income people and people of color. However, if digital crowdsourcing without widespread community outreach is more likely to attract responses from medium-to-high income groups, and more importantly, if they only live, work, or play in a small portion of the region being investigated, the aggregated results will reflect a biased picture of a region’s transportation safety concerns. Thus, while the scalability of digital crowdsourcing provides an opportunity for capturing underrepresented transportation concerns, it may require greater collaboration with low-income, diverse neighborhoods to ensure uniform adoption of the platform.

Lessons Learned

From our attempts to work directly with community groups and agencies and our subsequent decision to change our research focus, we learned a number of lessons:

1. Develop a research plan in partnership with communities and agencies.
This would have allowed us to begin with a research plan on which community groups and agencies were better able to partner with us, and it would have ensured that the partners were on board with the topic of interest and the methods we hoped to use.

2. Recognize the time it takes to build relationships. We found that building relationships with agencies and communities was more time intensive and took longer than we had hoped. These groups often have limitations on the time they can dedicate to unfunded projects. Next time, we should plan for this in our initial research plan.

3. Use existing data sources to supplement research. We found that using SeeClickFix and 311 data was a way to collect and analyze information to add context to our research question. Although the data did not have all the demographic information we had hoped to analyze, this data source added context to the data we collected.

4. Speak in a language that the general public understands. We found that when we used the term self-reporting, rather than crowdsourcing, when talking to potential partners and to members of the public, these individuals were more willing to consider the use of technology to collect information on safety issues from the public as legitimate. Using vocabulary and phrasing that people are familiar with is crucial when attempting to use technology to benefit the social good.

Ph.D. student

#### The Crevasse: a meditation on accountability of firms in the face of opacity as the complexity of scale

To recap:

(A1) Beneath corporate secrecy and user technical illiteracy, a fundamental source of opacity in “algorithms” and “machine learning” is the complexity of scale, especially scale of data inputs. (Burrell, 2016)
(A2) The opacity of the operation of companies using consumer data makes those consumers unable to engage with them as informed market actors. The consequence has been a “free fall” of market failure (Strandburg, 2013).
(A3) Ironically, this “free” fall has been “free” (zero price) for consumers; they appear to get something for nothing without knowing what has been given up or changed as a consequence (Hoofnagle and Whittington, 2013).

Comments:

(B1) The above line of argument conflates “algorithms”, “machine learning”, “data”, and “tech companies”, as is common in the broad discourse. That this conflation is possible speaks to the ignorance of the scholarly position on these topics, an ignorance that is implied by corporate secrecy, technical illiteracy, and complexity of scale simultaneously. We can, if we choose, distinguish between these factors analytically. But because, from the standpoint of the discourse, the internals are unknown, the general indication of a ‘black box’ organization is intuitively compelling.
(B1a) Giving in to the lazy conflation is an error because it prevents informed and effective praxis. If we do not distinguish between a corporate entity and its multiple internal human departments and technical subsystems, then we may confuse ourselves into thinking that a fair and interpretable algorithm can give us a fair and interpretable tech company. Nothing about the former guarantees the latter because tech companies operate in a larger operational field.
(B2) The opacity as the complexity of scale, a property of the functioning of machine learning algorithms, is also a property of the functioning of sociotechnical organizations more broadly. Universities, for example, are often opaque to themselves, because of their own internal complexity and scale. This is because the mathematics governing opacity as a function of complexity and scale are the same in both technical and sociotechnical systems (Benthall, 2016).
(B3) If we discuss the complexity of firms, as opposed to the complexity of algorithms, we should conclude that firms that are complex due to scale of operations and data inputs (including number of customers) will be opaque and therefore have strategic advantage in the market against less complex market actors (consumers) with stiffer bounds on rationality.
(B4) In other words, big, complex, data rich firms will be smarter than individual consumers and outmaneuver them in the market. That’s not just “tech companies”. It’s part of the MO of every firm to do this. Corporate entities are “artificial general intelligences” and they compete in a complex ecosystem in which consumers are a small and vulnerable part.

Twist:

(C1) Another source of opacity in data is that the meaning of data comes from the causal context that generates it. (Benthall, 2018)
(C2) Learning causal structure from observational data is hard, both in terms of being data-intensive and being computationally complex (NP). (cf. Friedman et al., 1998)
(C3) Internal complexity, for a firm, is not sufficient to be “all-knowing” about the data that is coming into it; the firm has epistemic challenges of secrecy, illiteracy, and scale with respect to external complexity.
(C4) This is why many applications of machine learning are overrated and so many “AI” products kind of suck.
(C5) There is, in fact, an epistemic crevasse between all autonomous entities, each containing its own complexity and constituting a larger ecological field that is the external/being/environment for any other autonomy.

To do:

The most promising direction based on this analysis is a deeper read into transaction cost economics as a ‘theory of the firm’. This is where the formalization of the idea that what the Internet changed most are search costs (a kind of transaction cost) should be. It would be nice if those insights could be expressed in the mathematics of “AI”.
There’s still a deep idea in here that I haven’t yet found the articulation for, something to do with autopoiesis.

References

Benthall, Sebastian. (2016) The Human is the Data Science. Workshop on Developing a Research Agenda for Human-Centered Data Science. Computer Supported Cooperative Work 2016.

Benthall, Sebastian. (2018) Context, Causality, and Information Flow: Implications for Privacy Engineering, Security, and Data Economics. Ph.D. dissertation. Advisors: John Chuang and Deirdre Mulligan. University of California, Berkeley.

Burrell, Jenna. “How the machine ‘thinks’: Understanding opacity in machine learning algorithms.” Big Data & Society 3.1 (2016): 2053951715622512.

Friedman, Nir, Kevin Murphy, and Stuart Russell. “Learning the structure of dynamic probabilistic networks.” Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., 1998.

Hoofnagle, Chris Jay, and Jan Whittington. “Free: accounting for the costs of the internet’s most popular price.” UCLA L. Rev. 61 (2013): 606.

Strandburg, Katherine J. “Free fall: The online market’s consumer preference disconnect.” U. Chi. Legal F. (2013): 95.

#### open source sustainability and autonomy, revisited

Some recent chats with Chris Holdgraf and colleagues at NYU interested in “critical digital infrastructure” have gotten me thinking again about the sustainability and autonomy of open source projects. I’ll admit to having had naive views about this topic in the past. Certainly, doing empirical data science work on open source software projects has given me a firmer perspective on things. Here are what I feel are the hardest earned insights on the matter:

• There is tremendous heterogeneity in open source software projects. Almost all quantitative features of these projects fall in log-normal distributions.
This suggests that the keys to open source software success are myriad and exogenous (how the technology fits in the larger ecosystem, how outside funding and recognition are accomplished, …) rather than endogenous factors (community policies, etc.). While many open source projects start as hobby and unpaid academic projects, those that go on to be successful find one or more funding sources. This funding is an exogenous factor.
• The most significant exogenous factors to an open source software project’s success are the industrial organization of private tech companies. Developing an open technology is part of the strategic repertoire of these companies: for example, to undermine the position of a monopolist, developing an open source alternative decreases barriers to market entry and allows for a more competitive field in that sector. Another example: Google funded Mozilla for so long arguably to deflect antitrust action over Google Chrome.
• There is some truth to Chris Kelty’s idea of open source communities as recursive publics, cultures that have autonomy that can assert political independence at the boundaries of other political forces. This autonomy comes from: the way developers of OSS get specific and valuable human capital in the process of working with the software and their communities; the way institutions begin to depend on OSS as part of their technical stack, creating an installed base; and how many different institutions may support the same project, creating competition for the scarce human capital of the developers. Essentially, at the point where the software and the skills needed to deploy it effectively and the community of people with those skills is self-organized, the OSS community has gained some economic and political autonomy. Often this autonomy will manifest itself in some kind of formal organization, whether a foundation, a non-profit, or a company like Red Hat or Canonical or Enthought.
If the community is large and diverse enough it may have multiple organizations supporting it. This is in principle good for the autonomy of the project but may also reflect political tensions that can lead to a schism or fork.
• In general, since OSS development is internally most often very fluid, with the primary regulatory mechanism being the fork, the shape of OSS communities is more determined by exogenous factors than endogenous ones. When exogenous demand for the technology rises, the OSS community can find itself with a 'surplus', which can be channeled into autonomous operations.

According to research done by Ogilvy, “five times as many people read the headlines as read the body copy. It follows that unless your headline sells your product, you have wasted 90 per cent of your money.”

• Promise a benefit. Make sure the benefit is important to your customer. For example, “whiter wash, more miles per gallon, freedom from pimples, fewer cavities.”
• Make it persuasive, and make it unique. Persuasive headlines that aren’t unique, which your competitors can claim, aren’t effective.
• Make it specific. Use percentages, time elapsed, dollars saved.
• Personalize it to your audience, such as the city they’re in. (Or the words in their search query)
• Include the brand and product name.
• Make it as long or as short as it needs to be. Ogilvy’s research found that, “headlines with more than ten words get less readership than short headlines. On the other hand, a study of retail advertisements found that headlines of ten words sell more merchandise than short headlines. Conclusion: if you need a long headline, go ahead and write one, and if you want a short headline, that’s all right too.”
• Make it clear and to the point, not clever or tricky.
• Don’t use superlatives like, “Our product is the best in the world.” Market researcher George Gallup calls this “Brag and Boast.” They convince nobody.

### Ideas for Headlines

• Headlines that contain news are surefire.
The news can be announcing a new product, or a new way to use an existing product. “And don’t scorn tried-and-true words like amazing, introducing, now, suddenly.”
• Include information that’s useful to the reader, provided the information involves your product.
• Try including a quote, such as from an expert or customers.

## How to Write Persuasive Body Copy

According to Ogilvy, body copy is seldom read by more than 10% of people. But the 10% who read it are prospects. What you say determines the success of your ad, so it’s worth spending the time to get it right.

• Address readers directly, as if you are speaking to them. “One human being to another, second person singular.”
• Write short sentences and short paragraphs. Avoid complicated words. Use plain, everyday language.
• Don’t write long-winded, philosophical essays. “Tell your reader what your product will do for him or her, and tell it with specifics.”
• Write your copy in the form of a story. The headline can be a hook.
• Avoid analogies. People often misunderstand them.
• Just like with headlines, stay away from superlatives like, “Our product is the best in the world.”
• Use testimonials from customers or experts (also known as “social proof”). Avoid celebrity testimonials. Most people forget the product and remember the celebrity. Further, people assume the celebrity has been bought, which is usually true.
• Coupons and special offers work.
• Always include the price of your products. “You may see a necklace in a jeweler’s window, but you don’t consider buying it because the price is not shown and you are too shy to go in and ask. It is the same way with advertisements. When the price of the product is left out, people have a way of turning the page.”
• Long copy sells more than short.
“I believe, without any research to support me, that advertisements with long copy convey the impression that you have something important to say, whether people read the copy or not.”
• Stick to the facts about what your product is and can do.
• Make the first paragraph a grabber to draw people into reading your copy.
• Sub-headlines make copy more readable and scannable.
• People often skip from the headline to the coupon to see the offer, so make the coupons mini-ads, complete with brand name, promise, and a mini photo of the product.
• To keep prospects on the hook, try “limited edition,” “limited supply,” “last time at this price,” or “special price for promptness.”

## Suggestions for Images

After headlines, images are the most important part of advertisements. They draw people in. Here’s what makes imagery effective:

• The best images arouse the viewer’s curiosity. They look at it and ask, “What’s going on here?” This leads them to read the copy to find out. This is called “Story Appeal.”
• If you don’t have a good story to tell, make your product the subject.
• Show the end result of using your product. Before-and-after photographs are highly effective.
• Photographs attract more readers, are more believable, and better remembered than illustrations.
• Human faces that are larger than life size repel readers. Don’t use them.
• Historical subjects bore people.
• If your picture includes people, it’s most effective if it uses people your audience can identify with. Doctors if you’re trying to sell to doctors, men if you’re trying to appeal to men, and so on.
• Include captions under your photographs. More people read captions than body copy, so make the caption a mini-advertisement.

## Layout

• KISS – Keep It Simple, Stupid.
• “Readers look first at the illustration, then at the headline, then at the copy. So put these elements in that order.” This also follows the normal order of scanning.
• More people read captions of images than body copy, so always include a caption under it. Captions should be mini-advertisements, so include the brand name and promise.

## A Few More Tips for Effective Ads

These are some other principles I picked up from the book, which can be useful in many different types of ads.

• Demonstrations of how well your product works are effective. Try coming up with a demonstration that your reader can perform.
• Don’t name competitors. The ad is less believable and more confusing. People often think the competitor is the hero.
• Problem-solution is a tried-and-true ad technique.
• Give people a reason why they should buy.
• Emotion can be highly effective. Nostalgia, charm, sentimentality, etc. Consumers need a rational excuse to justify their emotional decisions.
• Cartoons don’t sell well to adults.
• The most successful products and services are differentiated from their competitors. This is most effective if you can differentiate via low cost or highest quality. A differentiator doesn’t need to be relevant to the product’s performance, however, to be effective. For example, Owens-Corning differentiated their insulation by advertising the color of the product, which has nothing to do with how the product performs.

Ogilvy’s principles are surprisingly evergreen, despite the technological changes. Towards the end of the book he quotes Bill Bernbach, another advertising giant, on why this is:

Human nature hasn’t changed for a billion years. It won’t even vary in the next billion years. Only the superficial things have changed. It is fashionable to talk about changing man. A communicator must be concerned with unchanging man – what compulsions drive him, what instincts dominate his every action, even though his language too often camouflages what really motivates him. For if you know these things about a man, you can touch him at the core of his being. One thing is unchangingly sure.
The creative man with an insight into human nature, with the artistry to touch and move people, will succeed. Without them he will fail.

Human nature hasn’t changed much, indeed.

Get the book here: Ogilvy on Advertising

## November 12, 2018

Ph.D. student

#### What proportion of data protection violations are due to “dark data” flows?

“Data protection” refers to the aspect of privacy that is concerned with the use and misuse of personal data by those that process it. Though widely debated, scholars continue to converge (e.g.) on ideal data protection consisting of alignment between the purposes the data processor will use the data for and the expectations of the user, along with collection limitations that reduce exposure to misuse. Through its extraterritorial enforcement mechanism, the GDPR has threatened to make these standards global.

The implication of these trends is that there will be a global field of data flows regulated by these kinds of rules. Many of the large and important actors that process user data can be held accountable to the law. Privacy violations by these actors will be due to a failure to act within the bounds of the law that applies to them.

On the other hand, there is also cybercrime, an economy of data theft and information flows that exists “outside the law”.

I wonder what proportion of data protection violations are due to dark data flows: flows of personal data that are handled by organizations operating outside of any effective regulation.

I’m trying to draw an analogy to a global phenomenon that I know little about but which strikes me as perhaps more pressing than data protection: the interrelated problems of money laundering, off-shore finance, and dark money contributions to election campaigns. While surely oversimplifying the issue, my impression is that the network of financial flows can be divided into those that are more and less regulated by effective global law. Wealth seeks out these opportunities in the dark corners.
How much personal data flows in these dark networks? And how much is it responsible for privacy violations around the world? Versus how much is data protection effectively in the domain of accountable organizations (that may just make mistakes here and there)? Or is the dichotomy false, with truly no firm boundary between licit and illicit data flow networks?

## November 11, 2018

Ph.D. student

#### Eliciting Values Reflections by Engaging Privacy Futures Using Design Workbooks [Talk]

This blog post is a version of a talk I gave at the 2018 ACM Computer Supported Cooperative Work and Social Computing (CSCW) Conference based on a paper written with Deirdre Mulligan, Ellen Van Wyk, John Chuang, and James Pierce, entitled Eliciting Values Reflections by Engaging Privacy Futures Using Design Workbooks, which was honored with a best paper award. Find out more on our project page, our summary blog post, or download the paper: [PDF link] [ACM link]

In the work described in our paper, we created a set of conceptual speculative designs to explore privacy issues around emerging biosensing technologies, technologies that sense human bodies. We then used these designs to help elicit discussions about privacy with students training to be technologists. We argue that this approach can be useful for Values in Design and Privacy by Design research and practice.

Image from publicintelligence.net. Note the middle bullet point in the middle column – “avoids all privacy issues.”

Let me start with a motivating example, which I’ve discussed in previous talks. In 2007, the US Department of Homeland Security proposed a program to try to predict criminal behavior in advance of the crime itself, using thermal sensing, computer vision, eye tracking, gait sensing, and other physiological signals. And supposedly it would “avoid all privacy issues.” But it seems pretty clear that privacy was not fully thought through in this project.
Now Homeland Security projects actually do go through privacy impact assessments, and I would guess that in this case, they would probably go through the impact assessment process and conclude that, because the system doesn’t store the biosensed data, privacy is protected. But while this might address one conception of privacy related to storing data, there are other conceptions of privacy at play. There are still questions here about consent and movement in public space, about data use and collection, or about fairness and privacy from algorithmic bias.

While that particular imagined future hasn’t come to fruition, a lot of these types of sensors are now becoming available as consumer devices, used in applications ranging from health and quantified self, to interpersonal interactions, to tracking and monitoring. And it often seems like privacy isn’t fully thought through before new sensing devices and services are publicly announced or released.

A lot of existing privacy approaches, like privacy impact assessments, are deductive, checklist-based, or assume that privacy problems are already known and well-defined in advance, which often isn’t the case. Furthermore, the term “design” in discussions of Privacy by Design is often seen as a way of providing solutions to problems identified by law, rather than viewing design as a generative set of practices useful to understanding what privacy issues might need to be considered in the first place. We argue that speculative design-inspired approaches can help explore and define problem spaces of privacy in inductive, situated, and contextual ways.

# Design and Research Approach

We created a design workbook of speculative designs. Workbooks are collections of conceptual designs drawn together to allow designers to explore and reflect on a design space. Speculative design is a practice of using design to ask social questions, by creating conceptual designs or artifacts that help create or suggest a fictional world.
We can create speculative designs to explore different configurations of the world and to imagine and understand possible alternative futures, which helps us think through issues that have relevance in the present. So rather than start with trying to find design solutions for privacy, we wanted to use design workbooks and speculative designs together to create a collection of designs to help us explore what the problem space of privacy might look like with emerging biosensing technologies.

A sampling of the conceptual designs we created as part of our design workbook

In our prior work, we created a design workbook to do this exploration and reflection. Inspired by recent research, science fiction, and trends from the technology industry, we created a couple dozen fictional products, interfaces, and webpages of biosensing technologies. These included smart camera enabled neighborhood watch systems, advanced surveillance systems, implantable tracking devices, and non-contact remote sensors that detect people’s heartrates. This process is documented in a paper from Designing Interactive Systems. These were created as part of a self-reflective exercise, for us as design researchers to explore the problem space of privacy. However, we wanted to know how non-researchers, particularly technology practitioners, might discuss privacy in relation to these conceptual designs.

A note on how we’re approaching privacy and values. Following other values in design work and privacy research, we want to avoid providing a single universalizing definition of privacy as a social value. We recognize privacy as inherently multiple – something that is situated and differs within different contexts and situations.

Our goal was to use our workbook as a way to elicit values reflections and discussion about privacy from our participants – rather than looking for “stakeholder values” to generate design requirements for privacy solutions.
In other words, we were interested in how technologists-in-training would use privacy and other values to make sense of the designs. Growing regulatory calls for “Privacy by Design” suggest that privacy should be embedded into all aspects of the design process, and at least partially done by designers and engineers. Because of this, the ability for technology professionals to surface, discuss, and address privacy and related values is vital. We wanted to know how people training for those jobs might use privacy to discuss their reactions to these designs. We conducted an interview study, recruiting 10 graduate students from a West Coast US University who are training to go into technology professions, most of whom had prior tech industry experience via prior jobs or internships. At the start of the interview, we gave them a physical copy of the designs and explained that the designs were conceptual, but didn’t tell them that the designs were initially made to think about privacy issues. In the following slides, I’ll show a few examples of the speculative design concepts we showed – you can see more of them in the paper. And then I’ll discuss the ways in which participants used values to make sense of or react to some of the designs. # Design examples This design depicts an imagined surveillance system for public spaces like airports that automatically assigns threat statuses to people by color-coding them. We intentionally left it ambiguous how the design makes its color-coding determinations to try to invite questions about how the system classifies people. Conceptual TruWork design – “An integrated solution for your office or workplace!” In our designs, we also began to iterate on ideas relating to tracking implants, and different types of social contexts they could be used in. Here’s a scenario advertising a workplace implantable tracking device called TruWork. 
Employers can subscribe to the service and make their employees implant these devices to keep track of their whereabouts and work activities to improve efficiency.

Conceptual CoupleTrack infographic depicting an implantable tracking chip for couples

We also re-imagined the implant as “CoupleTrack,” an implantable tracking chip for couples to use, as shown in this infographic.

# Findings

We found that participants centered values in their discussions when looking at the designs – predominantly privacy, but also related values such as trust, fairness, security, and due process. We found eight themes of how participants interacted with the designs in ways that surfaced discussion of values, but I’ll highlight three here: imagining the designs as real; seeing one’s self as multiple users; and seeing one’s self as a technology professional. The rest are discussed in more detail in the paper.

## Imagining the Designs as Real

Conceptual product page for a small, hidden, wearable camera

Even though participants were aware that the designs were imagined, some participants imagined the designs as seemingly real by thinking about long term effects in the fictional world of the design. This design (pictured above) is an easily hideable, wearable, live streaming HD camera. One participant imagined what could happen to social norms if these became widely adopted, saying “If anyone can do it, then the definition of wrong-doing would be questioned, would be scrutinized.” He suggests that previously unmonitored activities would become open for surveillance and tracking like “are the nannies picking up my children at the right time or not? The definition of wrong-doing will be challenged”. Participants became actively involved in fleshing out and creating the worlds in which these designs might exist.
This reflection is also interesting, because it begins to consider some secondary implications of widespread adoption, highlighting potential changes in social norms with increasing data collection. ## Seeing One’s Self as Multiple Users Second, participants took multiple user subject positions in relation to the designs. One participant read the webpage for TruWork and laughed at the design’s claim to create a “happier, more efficient workplace,” saying, “This is again, positioned to the person who would be doing the tracking, not the person who would be tracked.” She notes that the website is really aimed at the employer. She then imagines herself as an employee using the system, saying: If I called in sick to work, it shouldn’t actually matter if I’m really sick. […] There’s lots of reasons why I might not wanna say, “This is why I’m not coming to work.” The idea that someone can check up on what I said—it’s not fair. This participant put herself in both the viewpoint of an employer using the system and as an employee using the system, bringing up issues of workplace surveillance and fairness. This allowed participants to see values implications of the designs from different subject positions or stakeholder viewpoints. ## Seeing One’s Self as a Technology Professional Third, participants also looked at the designs through the lens of being a technology practitioner, relating the designs to their own professional practices. Looking at the design that automatically flags and detects supposedly suspicious people, one participant reflected on his self-identification as a data scientist and the values implications of predicting criminal behavior with data when he said: the creepy thing, the bad thing is, like—and I am a data scientist, so it’s probably bad for me too, but—the data science is predicting, like Minority Report… [and then half-jokingly says] …Basically, you don’t hire data scientists. 
Here he began to reflect on how his practices as a data scientist might be implicated in this product’s creepiness – that his initial propensity to want to use the data to predict if subjects are criminals or not might not be a good way to approach this problem and might have implications for due process.

Another participant compared the CoupleTrack design to a project he was working on. He said:

[CoupleTrack] is very similar to our idea. […] except ours is not embedded in your skin. It’s like an IOT charm which people [in relationships] carry around. […] It’s voluntary, and that makes all the difference. You can choose to keep it or not to keep it.

In comparing the fictional CoupleTrack product to the product he’s working on in his own technical practice, the value of consent, and how one might revoke consent, became very clear to this participant. Again, we thought it was compelling that the designs led some participants to begin reflecting on the privacy implications in their own technical practices.

# Reflections and Takeaways

Given the workbooks’ ability to help elicit reflections on and discussion of privacy in multiple ways, we see this approach as useful for future Values in Design and Privacy by Design work.

The speculative workbooks helped open up discussions about values, similar to some of what Katie Shilton identifies as “values levers,” activities that foreground values, and cause them to be viewed as relevant and useful to design. Participants’ seeing themselves as users to reflect on privacy harms is similar to prior work showing how self-testing can lead to discussion of values. Participants looking at the designs from multiple subject positions evokes value sensitive design’s foregrounding of multiple stakeholder perspectives. Participants reflected on the designs both from stakeholder subject positions and through the lenses of their professional practices as technology practitioners in training.
While Shilton identifies a range of people who might surface values discussions, we see the workbook as an actor to help surface values discussions. By depicting some provocative designs that raised some visceral and affective reactions, the workbooks brought attention to questions about potential sociotechnical configurations of biosensing technologies. Future values in design work might consider creating and sharing speculative design workbooks for eliciting values reflections with experts and technology practitioners. More specifically, with this project’s focus on privacy, we think that this approach might be useful for “Privacy by Design”, particularly for technologists trying to surface discussions about the nature of the privacy problem at play for an emerging technology. We analyzed participants’ responses using Mulligan et al’s privacy analytic framework. The paper discusses this in more detail, but the important thing is that participants went beyond just saying privacy and other values are important to think about. They began to grapple with specific, situated, and contextual aspects of privacy – such as considering different ways to consent to data collection, or noting different types of harms that might emerge when the same technology is used in a workplace setting compared to an intimate relationship. Privacy professionals are looking for tools to help them “look around corners,” to help understand what new types of problems related to privacy might occur in emerging technologies and contexts. This provides a potential new tool for privacy professionals in addition to many of the current top-down, checklist approaches–which assume that the concepts of privacy at play are well known in advance. Speculative design practices can be particularly useful here – not to predict the future, but in helping to open and explore the space of possibilities. Thank you to my collaborators, our participants, and the anonymous reviewers. Paper citation: Richmond Y. 
Wong, Deirdre K. Mulligan, Ellen Van Wyk, James Pierce, and John Chuang. 2017. Eliciting Values Reflections by Engaging Privacy Futures Using Design Workbooks. Proc. ACM Hum.-Comput. Interact. 1, CSCW, Article 111 (December 2017), 26 pages. DOI: https://doi.org/10.1145/3134746

## November 07, 2018

Ph.D. student

#### the resilience of agonistic control centers of global trade

This post is merely notes; I’m fairly confident that I don’t know what I’m writing about. However, I want to learn more. Please recommend anything that could fill me in about this! I owe most of this to discussion with a colleague who I’m not sure would like to be acknowledged.

Following the logic of James Beniger, an increasingly integrated global economy requires more points of information integration and control.

Bourgeois (in the sense of ‘capitalist’) legal institutions exist precisely for the purpose of arbitrating between merchants.

Hence, on the one hand we would expect international trade law to be Habermasian. However, international trade need not rest on a foundation of German idealism (which increasingly strikes me as the core of European law). Rather, it is an evolved mechanism.

A key part of this mechanism, as I’ve heard, is that it is decentered. Multiple countries compete to be the sites of transnational arbitration, much like multiple nations compete to be tax havens. Sovereignty and discretion are factors of production in the economy of control.

This means, effectively, that one cannot defeat capitalism by chopping off its head. It is rather much more like a hydra: the “heads” are the creation of two-sided markets. These heads have no internalized sense of the public good. Rather, they are optimized to be attractive to the transnational corporations in bilateral negotiation.

The plaintiffs and defendants in these cases are corporations and states: social forms and institutions of complexity far beyond that of any individual person. This is where, so to speak, the AIs clash.
## October 31, 2018

Ph.D. student

#### Best Practices Team Challenges

By Stuart Geiger and Dan Sholler, based on a conversation with Aaron Culich, Ciera Martinez, Fernando Hoces, Francois Lanusse, Kellie Ottoboni, Marla Stuart, Maryam Vareth, Sara Stoudt, and Stéfan van der Walt. This post first appeared on the BIDS Blog.

This post is a summary of the first BIDS Best Practices lunch, in which we bring people together from across the Berkeley campus and beyond to discuss a particular challenge or issue in doing data-intensive research. The goal of the series is to informally share experiences and ideas on how to do data science well (or at least better) from many disciplines and contexts. The topic for this week was doing data-intensive research in teams, labs, and other groups. For this first meeting, we focused on just identifying and diagnosing the many different kinds of challenges. In future meetings, we will dive deeper into some of these specific issues and try to identify best practices for dealing with them.

We began planning for this series by reviewing many of the published papers and series around “best practices” in scientific computing (e.g. Wilson et al, 2014), “good enough practices” (Wilson et al, 2017) and PLOS Computational Biology’s “ten simple rules” series (e.g. Sandve et al, 2013; Goodman et al, 2014). We also see this series as an intellectual successor to the collection of case studies in reproducible research published by several BIDS fellows (Kitzes, Turek, and Deniz, 2018).

One reason we chose to identify issues with doing data science in teams and groups is that many of us felt like we understood how to best practice data-intensive research individually, but struggled with how to do this well in teams and groups.

## Compute and data challenges

### Getting on the same stack

One of the major challenges in doing data-intensive research in teams is technology use, particularly using the same tools.
Today’s computational researchers have an overwhelming number of options to choose in terms of programming languages, software libraries, data formats, operating systems, compute infrastructures, version control systems, collaboration platforms, and more. One of the major challenges we discussed was that members of a team often have been trained to work with different technologies, which also often come with their own ways of working on a problem. Getting everyone on the same technical stack often takes far more time than is anticipated, and new members can spend much time learning to work in a new stack. One of the biggest divides our group had experienced was in the choice of using programming languages, as many of us were more comfortable with either R or Python. These programming languages have their own extensive software libraries, like the tidyverse vs. the numpy/pandas/matplotlib stack. There are also many different software environments to choose from at various layers of the stack, from development environments like Jupyter notebooks versus RStudio and RMarkdown to the many options for package and dependency management. While most of the people in the room were committed to open source languages and environments, many people are trained to use proprietary software like MATLAB or SPSS, which raises an additional challenge in teams and groups. Another major issue is where the actual computing and data storage will take place. Members of a team often come in knowing how to run code on their own laptops, but there are many options for groups to work, including a lab’s own shared physical server, campus clusters, national grid/supercomputer infrastructures, corporate cloud services, and more. 
### Workflow and pipeline management

Getting everyone to use an interoperable software and hardware environment is as much of a social challenge as it is a technical one, and we had a great discussion about whether a group leader should (or could) require members to use the same language, environment, or infrastructure. One of the technical solutions to this issue, working in staged data analysis pipelines, comes with its own set of challenges. With staged pipelines, data processing and analysis tasks are separated into modular tasks that an individual can solve in their own way, then output their work to a standardized file for the next stage of the pipeline to take as input.

The ideal end goal is often imagined to be a fully-automated (or ‘one click’) data processing and analysis pipeline, but this is difficult to achieve and maintain in practice. Several people in our group said they personally spend substantial amounts of time setting up these pipelines and making sure that each person’s piece works with everyone else’s. Even with groups that had formalized detailed data management plans, a common theme was that someone had to constantly make sure that team members were actually following these standards so that the pipeline keeps running.

### External handoffs to and from the team

Many of the research projects we discussed involved not only handoffs between members of the team, but also handoffs between the team and external groups. The “raw” data a team begins with is often the final output of another research team, government agency, or company. In these cases, our group discussed issues that ranged from technical to social, from data formats that are technically difficult to integrate at scale (like Excel spreadsheets) to not having adequate documentation to be able to interpret what the data actually means.
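To make the handoff idea concrete, whether between pipeline stages or from an external provider, here is a minimal Python sketch of two stages that exchange a standardized CSV file and check its header on every read. Everything named here is an assumption for illustration only: the column names, file names, and the `stage_clean` and `stage_summarize` functions are hypothetical and do not come from any group in the discussion.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical schema that every stage agrees to write and read.
HANDOFF_COLUMNS = ["sample_id", "value"]


def write_handoff(path, rows):
    """Write rows (dicts) to a CSV file using the agreed column order."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=HANDOFF_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


def read_handoff(path):
    """Read a handoff file, failing loudly if the header does not match."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != HANDOFF_COLUMNS:
            raise ValueError(f"unexpected columns in {path}: {reader.fieldnames}")
        return list(reader)


def stage_clean(raw_path, out_path):
    """Stage 1 (hypothetical): drop rows with a missing value."""
    rows = [r for r in read_handoff(raw_path) if r["value"] != ""]
    write_handoff(out_path, rows)


def stage_summarize(clean_path):
    """Stage 2 (hypothetical): compute the mean of the cleaned values."""
    values = [float(r["value"]) for r in read_handoff(clean_path)]
    return sum(values) / len(values)


def run_pipeline(workdir):
    """Run both stages in order, passing data only through files."""
    workdir = Path(workdir)
    raw, clean = workdir / "raw.csv", workdir / "clean.csv"
    # Made-up input standing in for data received from another team.
    write_handoff(raw, [
        {"sample_id": "a", "value": "1.0"},
        {"sample_id": "b", "value": ""},
        {"sample_id": "c", "value": "3.0"},
    ])
    stage_clean(raw, clean)
    return stage_summarize(clean)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(run_pipeline(d))
```

Because each stage touches only the agreed file format, one person could rewrite their stage in another language without affecting the rest, and the header check turns a silently broken handoff into an immediate error. It does not remove the social work of agreeing on the format and keeping everyone to it.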
Similarly, teams often must deliver data to external partners, who may have very different needs, expectations, and standards than the team has for itself. Finally, some teams have sensitive data privacy issues and requirements, which makes collaboration even more difficult. How can these external relationships be managed in mutually beneficial ways?

## Team management challenges

Beyond technical challenges, a number of management issues face research groups aspiring to implement best practices for data-intensive research. Our discussion highlighted the difficulties of composing a well-balanced team, of dealing with fluid membership, and of fostering generative coordination and communication among group members.

### Composing a well-balanced team

Data-intensive research groups require a team with varied expertise. A consequence of varied expertise is varied capabilities and end goals, so project leads must devote attention to managing team composition. Whereas one or two members might be capable of carrying out tasks across the various stages of research, others might specialize in a particular area. How then can research groups ensure that no one member of the team departing would collapse the project and that the team holds the necessary expertise to accomplish the shared research goal? Furthermore, some members may participate simply to acquire skills, while others seek to establish or build an academic track record. How might groups achieve alignment between personal and team goals?

### Dealing with voluntary and fluid membership

A practical management problem also relates to the quasi-voluntary and fluid nature of research groups. Research groups rely extensively on students and postdocs, with an expectation that they join the team temporarily to gain new skills and experience, then leave. Turnover becomes a problem when processes, practices, and tacit institutional knowledge are difficult to standardize or document.
What strategies might project leads employ to alleviate the difficulties associated with voluntary, fluid membership? ### Fostering coordination and communication The issues of team composition and voluntary or fluid membership raise a third challenge: fostering open communication among group members. Previous research and guidelines for managing teams (Edmondson, 1999; Google re:Work, 2017) emphasize the vital role of psychological safety in ensuring that team members share knowledge and collaborate effectively. Adequate psychological safety ensures that team members are comfortable speaking up about their ideas and welcoming of others’ feedback. Yet fostering psychological safety is a difficult task when research groups comprise members with various levels of expertise, career experience, and, increasingly, communities of practice (as in the case of data scientists working with domain experts). How can projects establish avenues for open communication between diverse members? ### Not abandoning best practices when deadlines loom One of the major issues that resonated across our group was the tendency for a team to stop following various best practices when deadlines rapidly approach. In the rush to do everything that is needed to get a publication submitted, it is easy to accrue what software engineers call “technical debt.” For example, substantial “collaboration debt” or “reproducibility debt” can be foisted on a team when a member works outside of the established workflow to produce a figure or fails to document their changes to analysis code. These stressful moments can also be difficult for the team’s psychological safety, particularly if there is an expectation to work late hours to make the deadline. ## Concluding thoughts and plans ### Are there universal best practices for all cases and contexts? 
At the conclusion of our first substantive meeting, we began to evaluate topics for future discussions that might help us identify potential solutions to the challenges faced by data-intensive research groups. In doing so, we were quickly confronted with the diversity of technologies, research agendas, disciplinary norms, team compositions, governance structures, and other factors that characterize scientific research groups. Are solutions that work for large teams appropriate for smaller teams? Do cross-institutional or inter-disciplinary teams face different problems than those working in the same institution or discipline? Are solutions that work in astronomy or physics appropriate for ecology or social sciences? Dealing with such diversity and contextuality, then, might require adjusting our line of inquiry to the following question: At what level should we attempt to generalize best practices?

### Our future plans

The differences within and between research groups are meaningful and deserve adequate attention, but commonalities do exist. This semester, our group will aggregate and develop input from a diverse community of practitioners to construct sets of thoughtful, grounded recommendations. For example, we’ll aim to provide recommendations on issues such as how to build and maintain pipelines and workflows, as well as strategies for achieving diversity and inclusion in teams. In our next post, we’ll offer some insights on how to manage the common problem of perpetual turnover in team membership. On all topics, we welcome feedback and recommendations.

### Combatting impostor syndrome

Finally, many people who attended told us afterwards how positive and valuable it was to share these kinds of issues and experiences, particularly for combatting the “impostor syndrome” that many of us often feel. We typically only present the final end-product of research.
Even sharing one’s final code and data in perfectly reproducible pipelines can still hide all the messy, complex, and challenging work that goes into the research process. People deeply appreciated hearing others talk openly about the difficulties and challenges that come with doing data-intensive research and how they tried to deal with them. The format of sharing challenges followed by strategies for dealing with those challenges may be a meta-level best practice for this kind of work, versus the more standard approach of listing more abstract rules and principles. Through these kinds of conversations, we hope to continue to shed light on the doing of data science in ways that will be constructive and generative across the many fields, areas, and contexts in which we all work.

## October 23, 2018

Ph.D. student

#### For a more ethical Silicon Valley, we need a wiser economics of data

Kara Swisher’s NYT op-ed about the dubious ethics of Silicon Valley and Nitasha Tiku’s WIRED article reviewing books with alternative (and perhaps more cynical than otherwise stated) stories about the rise of Silicon Valley have generated discussion and buzz among the tech commentariat.

One point of debate is whether the focus should be on “ethics” or on something more substantively defined, such as human rights. Another point is whether the emphasis should be on “ethics” or on something more substantively enforced, like laws which impose penalties of up to 4% of annual global turnover, referring of course to the GDPR.

While I’m sympathetic to the European approach (laws enforcing human rights with real teeth), I think there is something naive about it. We have not yet seen whether it’s ever really possible to comply with the GDPR; it could wind up being a kind of heavy tax on Big Tech companies operating in the EU, but one that doesn’t truly wind up changing how people’s data are used.
In any case, the broad principles of European privacy are based on individual human dignity, and so they do not take into account the ways that corporations are social structures, i.e. sociotechnical organizations that transcend individual people. The European regulations address the problem of individual privacy while leaving mystified the question of why the current corporate organization of the world’s personal information is what it is. This sets up the fight over ‘technology ethics’ to be a political conflict between different kinds of actors whose positions are defined as much by their social habitus as by their intellectual reasons.

My own (unpopular!) view is that the solution to our problems of technology ethics is going to have to rely on a better adapted technology economics. We often forget today that economics was originally a branch of moral philosophy. Adam Smith wrote The Theory of Moral Sentiments (1759) before An Inquiry into the Nature and Causes of the Wealth of Nations (1776). Since then the main purpose of economics has been to intellectually grasp the major changes to society due to production, trade, markets, and so on in order to better steer policy and business strategy towards more fruitful equilibria. The discipline has a bad reputation among many “critical” scholars due to its role in supporting neoliberal ideology and policies, but it must be noted that this ideology and policy work is not entirely cynical; it was a successful centrist hegemony for some time. Now that it is under threat, partly due to the successes of the big tech companies that benefited under its regime, it’s worth considering what new lessons we have to learn to steer the economy in an improved direction.

The difference between an economic approach to the problems of the tech economy and either an ‘ethics’ or a ‘law’ based approach is that it inherently acknowledges that there are a wide variety of strategic actors co-creating social outcomes.
Individual “ethics” will not be able to settle the outcomes of the economy because the outcomes depend on collective and uncoordinated actions. A fundamentally decent person may still do harm to others due to their own bounded rationality; “the road to hell is paved with good intentions”. Meanwhile, regulatory law is not the same as command; it is at best a way of setting the rules of a game that will be played, faithfully or not, by many others. Putting regulations in place without a good sense of how the game will play out differently because of them is just as irresponsible as implementing a sweeping business practice without thinking through the results, if not more so because the relationship between the state and citizens is coercive, not voluntary as the relationship between businesses and customers is. Perhaps the biggest obstacle to shifting the debate about technology ethics to one about technology economics is that it requires a change in register. It drains the conversation of the pathos which is so instrumental in surfacing it as an important political topic. Sound analysis often ruins parties like this. Nevertheless, it must be done if we are to progress towards a more just solution to the crises technology gives us today. ## October 17, 2018 Ph.D. student #### Engaging Technologists to Reflect on Privacy Using Design Workbooks This post summarizes a research paper, Eliciting Values Reflections by Engaging Privacy Futures Using Design Workbooks, co-authored with Deirdre Mulligan, Ellen Van Wyk, John Chuang, and James Pierce. The paper will be presented at the ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) on Monday November 5th (in the afternoon Privacy in Social Media session). Full paper available here. 
Recent wearable and sensing devices, such as Google Glass, Strava, and internet-connected toys, have raised questions about ways in which privacy and other social values might be implicated by their development, use, and adoption. At the same time, legal, policy, and technical advocates for “privacy by design” have suggested that privacy should be embedded into all aspects of the design process, rather than being addressed after a product is released, or rather than being addressed as just a legal issue. By advocating that privacy be addressed through technical design processes, the ability of technology professionals to surface, discuss, and address privacy and other social values becomes vital.

Companies and technologists already use a range of tools and practices to help address privacy, including privacy engineering practices, or making privacy policies more readable and usable. But many existing privacy mitigation tools are either deductive, or assume that privacy problems are already known and well-defined in advance. However, we often don’t have privacy concerns well-conceptualized in advance when creating systems. Our research shows that design approaches (drawing on a set of techniques called speculative design and design fiction) can help better explore, define, and perhaps even anticipate what we mean by “privacy” in a given situation. Rather than trying to look at a single, abstract, universal definition of privacy, these methods help us think about privacy as relations among people, technologies, and institutions in different types of contexts and situations.

## Creating Design Workbooks

We created a set of design workbooks: collections of design proposals or conceptual designs, drawn together to allow designers to investigate, explore, reflect on, and expand a design space.
We drew on speculative design practices: in brief, our goal was to create a set of slightly provocative conceptual designs to help engage people in reflections or discussions about privacy (rather than propose specific solutions to problems posed by privacy).

A set of sketches that comprise the design workbook

Inspired by science fiction, technology research, and trends from the technology industry, we created a couple dozen fictional products, interfaces, and webpages of biosensing technologies, or technologies that sense people. These included smart camera enabled neighborhood watch systems, advanced surveillance systems, implantable tracking devices, and non-contact remote sensors that detect people’s heartrates. In earlier design work, we reflected on how putting the same technologies in different types of situations, scenarios, and social contexts would vary the types of privacy concerns that emerged (such as the different types of privacy concerns that would emerge if advanced miniature cameras were used by the police, by political advocates, or by the general public). However, we wanted to see how non-researchers might react to and discuss the conceptual designs.

## How Did Technologists-In-Training View the Designs?

Through a series of interviews, we shared our workbook of designs with masters students in an information technology program who were training to go into the tech industry. We found several ways in which they brought up privacy-related issues while interacting with the workbooks, and highlight three of those ways here.

TruWork: a product webpage for a fictional system that uses an implanted chip allowing employers to keep track of employees’ location, activities, and health, 24/7.

First, our interviewees discussed privacy by taking on multiple user subject positions in relation to the designs.
For instance, one participant looked at the fictional TruWork workplace implant design by imagining herself in the positions of an employer using the system and an employee using the system, noting how the product’s claim of creating a “happier, more efficient workplace,” was a value proposition aimed at the employer rather than the employee. While the system promises to tell employers whether or not their employees are lying about why they need a sick day, the participant noted that there might be many reasons why an employee might need to take a sick day, and those reasons should be private from their employer. These reflections are valuable, as prior work has documented how considering the viewpoints of direct and indirect stakeholders is important for considering social values in design practices. A second way privacy reflections emerged was when participants discussed the designs in relation to their professional technical practices. One participant compared the fictional CoupleTrack implant to a wearable device for couples that he was building, in order to discuss different ways in which consent to data collection can be obtained and revoked. CoupleTrack’s embedded nature makes it much more difficult to revoke consent, while a wearable device can be more easily removed. This is useful because we’re looking for ways workbooks of speculative designs can help technologists discuss privacy in ways that they can relate back to their own technical practices. A third theme that we found was that participants discussed and compared multiple ways in which a design could be configured or implemented. Our designs tend to describe products’ functions but do not specify technical implementation details, allowing participants to imagine multiple implementations. 
For example, a participant looking at the fictional automatic airport tracking and flagging system discussed the privacy implication of two possible implementations: one where the system only identifies and flags people with a prior criminal history (which might create extra burdens for people who have already served their time for a crime and have been released from prison); and one where the system uses behavioral predictors to try to identify “suspicious” behavior (which might go against a notion of “innocent until proven guilty”). The designs were useful at provoking conversations about the privacy and values implications of different design decisions.

## Thinking About Privacy and Social Values Implications of Technologies

This work provides a case study showing how design workbooks and speculative design can be useful for thinking about the social values implications of technology, particularly privacy. In the time since we’ve made these designs, some (sometimes eerily) similar technologies have been developed or released, such as workers at a Swedish company embedding RFID chips in their hands, or Logitech’s Circle Camera.

But our design work isn’t meant to predict the future. Instead, what we tried to do is take some technologies that are emerging or on the near horizon, and think seriously about ways in which they might get adopted, or used and misused, or interact with existing social systems, such as the workplace, or government surveillance, or school systems. How might privacy and other values be at stake in those contexts and situations? We aim for these designs to help shed light on the space of possibilities, in an effort to help technologists make more socially informed design decisions in the present.

We find it compelling that our design workbooks helped technologists-in-training discuss emerging technologies in relation to everyday, situated contexts.
These workbooks don’t depict far off speculative science fiction with flying cars and spaceships. Rather they imagine future uses of technologies by having someone look at a product website, an amazon.com page, or an interface and think about the real and diverse ways in which people might experience those technology products. Using these techniques that focus on the potential adoptions and uses of emerging technologies in everyday contexts helps raise issues which might not be immediately obvious if we only think about positive social implications of technologies, and they also help surface issues that we might not see if we only think about social implications of technologies in terms of “worst case scenarios” or dystopias.

Paper Citation: Richmond Y. Wong, Deirdre K. Mulligan, Ellen Van Wyk, James Pierce, and John Chuang. 2017. Eliciting Values Reflections by Engaging Privacy Futures Using Design Workbooks. Proc. ACM Hum.-Comput. Interact. 1, CSCW, Article 111 (December 2017), 26 pages. DOI: https://doi.org/10.1145/3134746

This post is crossposted with the ACM CSCW Blog

## October 15, 2018

Ph.D. student

#### Privacy of practicing high-level martial artists (BJJ, CI)

Continuing my somewhat lazy “ethnographic” study of Brazilian Jiu Jitsu: something happened the other day that illustrates how BJJ reflects privacy as contextual integrity.

Spencer (2016) has accounted for the changes in martial arts culture, and especially Brazilian Jiu Jitsu, due to the proliferation of video online. Social media is now a major vector for skill acquisition in BJJ. It is also, in my gym, part of the social experience. There are a few dedicated accounts on social media platforms that share images and video from the practice. There is a group chat where gym members cheer each other on, share BJJ culture (memes, tips), and communicate with the instructors.
Several members have been taking pictures and videos of others in practice and sharing them to the group chat. These are generally met with enthusiastic acclaim and acceptance. The instructors have also been inviting in very experienced (black belt) players for one-off classes. These classes are opportunities for the less experienced folks to see another perspective on the game. Because it is a complex sport, there are a wide variety of styles and in general it is exciting and beneficial to see moves and attitudes of masters besides the ones we normally train with.

After some videos of a new guest instructor were posted to the group chat, one of the permanent instructors (“A”) asked not to do this:

A: “As a general rule of etiquette, you need permission from a black belt and esp if two black belts are rolling to record them training, be it drilling not [sic] rolling live.”

A: “Whether you post it somewhere or not, you need permission from both to record then [sic] training.”

B: “Heard”

C: “That’s totally fine by me, but im not really sure why…?”

B: “I’m thinking it’s a respect thing.”

A: “Black belt may not want footage of him rolling or training. as a general rule if two black belts are training together it’s not to be recorded unless expressly asked. if they’re teaching, that’s how they pay their bills so you need permission to record them teaching. So either way, you need permission to record a black belt.”

A: “I’m just clarifying for everyone in class on etiquette, and for visiting other schools. Unless told by X, Y, [other gym staff], etc., or given permission at a school you’re visiting, you’re not to record black belts and visiting upper belts while rolling and potentially even just regular training or class. Some schools take it very seriously.”

C: “OK! Totally fine!”

D: “[thumbs up emoji] gots it :)”

D: “totally makes sense”

A few observations on this exchange.
First, there is the intriguing point that for martial arts black belts teaching, their instruction is part of their livelihood. The knowledge of the expert martial arts practitioner is hard-earned and valuable “intellectual property”, and it is exchanged through being observed. Training at a gym with high-rank players is a privilege that lower ranks pay for. The use of video recording has changed the economy of martial arts training. This has in many ways opened up the sport; it also opens up potential opportunities for the black belt in producing training videos.

Second, this is framed as etiquette, not as a legal obligation. I’m not sure what the law would say about recordings in this case. It’s interesting that as a point of etiquette, it applies only to videos of high belt players. Recording low belt players doesn’t seem to be a problem according to the agreement in the discussion. (I personally have asked not to be recorded at one point at the gym when an instructor explicitly asked to be recorded in order to create demo videos. This was out of embarrassment at my own poor skills; I was also feeling bad because I was injured at the time. This sort of consideration does not, it seems, currently operate as privacy etiquette within the BJJ community. Perhaps these norms are currently being negotiated or are otherwise in flux.)

Third, there is a sense in which high rank in BJJ comes with authority and privileges that do not require any justification. The “trainings are livelihood” argument does apply directly to general practice roles; the argument is not airtight. There is something else about the authority and gravitas of the black belt that is being preserved here. There is a sense of earned respect. Somehow this translates into a different form of privacy (information flow) norm.

References

Spencer, D. C. (2016). From many masters to many students: YouTube, Brazilian Jiu Jitsu, and communities of practice. Jomec Journal, (5).
## September 27, 2018

Center for Technology, Society & Policy

#### CTSP Alumni Updates

### We’re thrilled to highlight some recent updates from our fellows:

Gracen Brilmyer, now a PhD student at UCLA, has published a single-authored work in one of the leading journals in archival studies, Archival Science: “Archival Assemblages: Applying Disability Studies’ Political/Relational Model to Archival Description”. They have also presented their work on archives, disability, and justice at a number of events over the past two years, including The Archival Education and Research Initiative (AERI), the Allied Media Conference, and the International Communications Association (ICA) Preconference, Disability as Spectacle. Their research will also be presented at the upcoming Community Informatics Research Network (CIRN).

CTSP Funded Project 2016: Vision Archive

Originating in the 2017 project “Assessing Race and Income Disparities in Crowdsourced Safety Data Collection” done by Fellows Kate Beck, Aditya Medury, and Jesus Barajas, the Safe Transportation and Research Center will launch a new project, Street Story, in October 2018. Street Story is an online platform that allows community groups and agencies to collect community input about transportation collisions, near-misses, general hazards and safe locations to travel. The platform will be available throughout California and is funded through the California Office of Traffic Safety.

CTSP Funded Project 2017: Assessing Race and Income Disparities in Crowdsourced Safety Data Collection

Fellow Roel Dobbe has begun a postdoctoral scholar position at the new AI Now Institute. Inspired by his 2018 CTSP project, he has co-authored a position paper with Sarah Dean, Tom Gilbert and Nitin Kohli titled A Broader View on Bias in Automated Decision-Making: Reflecting on Epistemology and Dynamics.
CTSP Funded Project 2018: Unpacking the Black Box of Machine Learning Processes

We are also looking forward to a CTSP Fellow-filled Computer Supported Cooperative Work conference in November this year! CTSP affiliated papers include:

• A paper by multi-year fellow and CSTMS Associate Director Morgan Ames, who with this piece bridges her previous work on CSCW and new interest in the moral visions of AI/ML.
• “People Tend to Wind Down, Not Up, When They Browse Social Media,” by former CTSP Co-Director and Fellow Galen Panger (now at Google), based on his award-winning CTSP/CLTC funded dissertation work

We also look forward to seeing CTSP affiliates presenting other work, including 2018 Fellows Richmond Wong, Noura Howell, Sarah Fox, and more!

## September 25, 2018

Center for Technology, Society & Policy

#### October 25th: Digital Security Crash Course

Thursday, October 25, 5-7pm, followed by reception
UC Berkeley, South Hall Room 210
Open to the public! RSVP is required.

Understanding how to protect your personal digital security is more important than ever. Confused about two factor authentication options? Which messaging app is the most secure? What happens if you forget your password manager password, or lose the phone you use for 2 factor authentication? How do you keep your private material from being shared or stolen? And how do you help your friends and family consider the potential dangers and work to prevent harm, especially given increased threats to vulnerable communities and unprecedented data breaches?

Whether you are concerned about snooping family and friends, bullies and exes who are out to hack and harass you, thieves who want to impersonate you and steal your funds, or government and corporate spying, we can help you with this fun, straightforward training in how to protect your information and communications.

Join us for a couple hours of discussion and hands-on set up. We'll go over various scenarios you might want to protect against, talk about good tools and best practices, and explore trade offs between usability and security. This training is designed for people at all levels of expertise, and those who want both personal and professional digital security protection.

Refreshments and hardware keys provided! Bring your laptop or other digital device. Take home a hardware key and better digital security practices.

This crash course is sponsored by the Center for Technology, Society & Policy and generously funded by the Charles Koch Foundation.

Jessy Irwin will be our facilitator and guide. Jessy is Head of Security at Tendermint, where she excels at translating complex cybersecurity problems into relatable terms, and is responsible for developing, maintaining and delivering comprehensive security strategy that supports and enables the needs of her organization and its people. Prior to her role at Tendermint, she worked to solve security obstacles for non-expert users as a strategic advisor, security executive and former Security Empress at 1Password. She regularly writes and presents about human-centric security, and believes that people should not have to become experts in technology, security or privacy to be safe online.

RSVP here!

## September 09, 2018

Ph.D. student

#### Brazilian Jiu Jitsu (BJJ) and the sociology of martial knowledge

Maybe 15 months ago, I started training in Brazilian Jiu Jitsu (BJJ), a martial art that focuses on grappling and ground-fighting. Matches are won through points based on position (e.g., “mount”, where you are sitting on somebody else) and through submission, when a player taps out due to hyperextension under a joint lock or asphyxiation by choking. I recommend it heartily to anybody as a fascinating, smart workout that also has a vibrant and supportive community around it.

One of the impressive aspects of BJJ, which differentiates it from many other martial arts, is its emphasis on live drilling and sparring (“rolling”), which can make up a third or more of a training session. In the context of sparring, there is opportunity for experimentation and rapid feedback about technique. In addition to being good fun and practice, regular sparring continually reaffirms the hierarchical ranking of skill.
As in some other martial arts, rank is awarded as different colored “belts”: white, blue, purple, brown, black. Intermediary progress is given as “stripes” on the belt. White belts can spar with higher belts; more often than not, when they do so they get submitted.

BJJ also has tournaments, which allow players from different dojos to compete against each other. I attended my first tournament in August and thought it was a great experience. There is nothing like meeting a stranger for the first time and then engaging them in single combat to kindle a profound respect for the value of sportsmanship. Off the mat, I’ve had some of the most courteous encounters with anybody I have ever met in New York City.

At tournaments, hundreds of contestants are divided into brackets. The brackets are determined by belt (white, blue, etc.), weight (up to 155 lbs, up to 170 lbs, etc.), sex (men and women), and age (kids age groups, adult, 30+ adult). There is an “absolute” bracket for those who would rise above the division of weight classes. There are “gi” and “no gi” variants of BJJ; the former requires wearing a special uniform of jacket and pants, which are used in many techniques. Overall, it is an efficient system for training a skill.

The few readers of this blog will recall that for some time I studied sociology of science and engineering, especially through the lens of Bourdieu’s Science of Science and Reflexivity. This was in turn a reaction to a somewhat startling exposure to sociology of science and education, an intellectual encounter that I never intended to have. I have been interested for a long time in the foundations of science. It was a rude shock, and one that I mostly regret, to have gone to grad school to become a better data scientist and find myself having to engage with the work of Bruno Latour.
I did not know how to respond intellectually to the attack on scientific legitimacy on the basis that its self-understanding is insufficiently sociological until encountering Bourdieu, who refuted the Latourian critique and provides a clear-sighted view of how social structure under-girds scientific objectivity, when it works. Better was my encounter with Jean Lave, who introduced me to more phenomenological methods for understanding education through her class and works (Chaiklin and Lave, 1996). This made me more aware of the role of apprenticeship as well as the nuances of culture, framing, context, and purpose in education. Had I not encountered this work, I would likely never have found my way to Contextual Integrity, which draws more abstract themes about privacy from such subtle observations. Now it’s impossible for me to do something as productive and enjoyable as BJJ without considering it through these kinds of lenses. One day I would like to do more formal work along these lines, but as has been my habit I have a few notes to jot down at the moment. The first point, which is a minor one, is that there is something objectively known by experienced BJJ players, and that this knowledge is quintessentially grounded in intersubjective experience. The sparring encounter is the site at which technique is tested and knowledge is confirmed. Sparring simulates conditions of a fight for survival; indeed, if a choke is allowed to progress, a combatant can lose consciousness on the mat. This recalls Hegel’s observation that it is in single combat that a human being is forced to see the limits of their own solipsism. When the Other can kill you, that is an Other that you must see as, in some sense, equivalent in metaphysical status to oneself. This is a sadly forgotten truth in almost every formal academic environment I’ve found myself in, and that, I would argue, is why there is so much bullshit in academia. But now I digress. 
The second point, which is perhaps more significant, is that BJJ has figured out how to be an inclusive field of knowledge despite the pervasive and ongoing politics of what I have called in another post body agonism. We are at a point where political conflict in the United States and elsewhere seems to be at root about the fact that people have different kinds of bodies, and these differences are upsetting for liberalism. How can we have functioning liberal society when, for example, some people have male bodies and other people have female bodies? It’s an absurd question, perhaps, but nevertheless it seems to be the question of the day. It is certainly a question that plagues academic politics. BJJ provides a wealth of interesting case studies in how to deal productively with body agonism. BJJ is an unarmed martial art. The fact that there are different body types is an intrinsic aspect of the sport. Interestingly, in the dojo practices I’ve seen, trainings are co-ed and all body types (e.g., weight classes) train together. This leads to a dynamic and irregular practice environment that perhaps is better for teaching BJJ as a practical form of self-defense. Anecdotally, self-defense is an important motivation for why especially women are interested in BJJ, and in the context of a gym, sparring with men is a way to safely gain practical skill in defending against male assailants. On the other hand, as far as ranking progress is concerned, different bodies are considered in relation to other similar bodies through the tournament bracket system. While I know a badass 40-year-old who submitted two college kids in the last tournament, that was extra. For the purposes of measuring my improvement in the discipline, I will be in the 30+ men’s bracket, compared with other guys approximately my weight.
The general sense within the community is that progress in BJJ is a function of time spent practicing (something like the mantra that it takes 10,000 hours to master something), not any other intrinsic talent. Some people who are more dedicated to their training advance faster, and others advance slower.

Training in BJJ has been a positive experience for me, and I often wonder whether other social systems could be more like BJJ. There are important lessons to be learned from it, as it is a mental discipline, full of subtlety and intellectual play, in its own right.

References

Bourdieu, Pierre. Science of science and reflexivity. Polity, 2004.

Chaiklin, Seth, and Jean Lave, eds. Understanding practice: Perspectives on activity and context. Cambridge University Press, 1996.

## September 08, 2018

Ph.D. student

#### On Hill’s work on ‘Greater Male Variability Hypothesis’ (GMVH)

I’m writing in response to Ted Hill’s recent piece describing the acceptance and subsequent removal of a paper about the ‘Greater Male Variability Hypothesis’, the controversial idea that there is more variability in male intelligence than female intelligence, i.e. “that there are more idiots and more geniuses among men than among women.”

I have no reason to doubt Hill’s account of events (his collaboration, his acceptance to a journal, and the mysterious political barriers to publication) and assume them for the purposes of this post. If these are refuted by future controversy somehow, I’ll stand corrected.

The few of you who have followed this blog for some time will know that I’ve devoted some energy to understanding the controversy around gender and STEM. One post, criticizing how Donna Haraway, widely used in Science and Technology Studies, can be read as implying that women should not become ‘hard scientists’ in the mathematical mode, has gotten a lot of hits (and some pushback). Hill’s piece makes me revisit the issue.
The paper itself is quite dry and the following quote is its main thesis: SELECTIVITY-VARIABILITY PRINCIPLE. In a species with two sexes A and B, both of which are needed for reproduction, suppose that sex A is relatively selective, i.e., will mate only with a top tier (less than half ) of B candidates. Then from one generation to the next, among subpopulations of B with comparable average attributes, those with greater variability will tend to prevail over those with lesser variability. Conversely, if A is relatively non-selective, accepting all but a bottom fraction (less than half ) of the opposite sex, then subpopulations of B with lesser variability will tend to prevail over those with comparable means and greater variability. This mathematical thesis is supported in the paper by computational simulations and mathematical proofs. From this, one can get the GMVH if one assumes that: (a) (human) males are less selective in their choice of (human) females when choosing to mate, and (b) traits that drive variability in intelligence are intergenerationally heritable, whether biologically or culturally. While not uncontroversial, neither of these are crazy ideas. In fact, if they weren’t both widely accepted, then we wouldn’t be having this conversation. Is this the kind of result that should be published? This is the controversy. I am less interested in the truth or falsehood of broad implications of the mathematical work than I am in the arguments for why the mathematical work should not be published (in a mathematics journal). As far as I can tell from Hill’s account and also from conversations and cultural osmosis on the matter, there are a number of reasons why research of this kind should not be published. The first reason might be that there are errors in the mathematical or simulation work. In other words, the Selectivity-Variability Principle may be false, and falsely supported. 
If that is the case, then the reviewers should have rejected the paper on those grounds. However, the principle is intuitively plausible and the reviewers accepted it. Few of Hill’s critics (though some) attacked the piece on mathematical grounds. Rather, the objections were of a social and political nature. I want to focus on these latter objections, though if there is a mathematical refutation of the Selectivity-Variability Principle I’m not aware of, I’ll stand corrected. The crux of the problem seems to be this: the two assumptions (a) and (b) are both so plausible that publishing a defense of (c) the Selectivity-Variability Principle would imply (d) the Greater Male Variability Hypothesis (GMVH). And if GMVH is true, then (e) there is a reason why more of the celebrated high-end of the STEM professions are male. It is because at the high-end, we’re looking at the thin tails of the human distribution, and the male tail is longer. (It is also longer at the low end, but nobody cares about the low end.) The argument goes that if this claim (e) were widely known by aspiring females in STEM fields, then they will be discouraged from pursuing these promising careers, because “women have a lesser chance to succeed in mathematics at the very top end”, which would be a biased, sexist view. (e) could be used to defend the idea that (f) normatively, there’s nothing wrong with men having most success at the top end of mathematics, though there is a big is/ought distinction there. My concern with this argument is that it assumes, at its heart, the idea that women aspiring to be STEM professionals are emotionally vulnerable to being dissuaded by this kind of mathematical argument, even when it is neither an empirical case (it is a mathematical model, not empirically confirmed within the paper) nor does it reflect on the capacity of any particular woman, and especially not after she has been selected for by the myriad social sorting mechanisms available. 
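To make the intuition behind the Selectivity-Variability Principle concrete, here is a minimal numerical sketch. It is not Hill's model or simulation code. It assumes, purely for illustration, two equally sized subpopulations of B whose attribute is normally distributed with the same mean and different standard deviations (the names `sd_low` and `sd_high` and the 25%/75% acceptance fractions are hypothetical choices), and it looks at a single round of selection rather than generational dynamics.

```python
from statistics import NormalDist


def accepted_share(accept_fraction, sd_low=1.0, sd_high=2.0):
    """Share of accepted B candidates that come from the high-variability
    subpopulation, when sex A accepts the top `accept_fraction` of a pool
    made of two equally sized, equal-mean subpopulations (an assumption)."""
    low = NormalDist(0.0, sd_low)
    high = NormalDist(0.0, sd_high)

    def pooled_accept(threshold):
        # Fraction of the whole pool whose attribute exceeds the threshold.
        return 0.5 * (1 - low.cdf(threshold)) + 0.5 * (1 - high.cdf(threshold))

    # Bisection for the cutoff that accepts exactly `accept_fraction`.
    lo, hi = -10 * sd_high, 10 * sd_high
    for _ in range(200):
        mid = (lo + hi) / 2
        if pooled_accept(mid) > accept_fraction:
            lo = mid  # too many accepted: raise the cutoff
        else:
            hi = mid
    threshold = (lo + hi) / 2
    return 0.5 * (1 - high.cdf(threshold)) / pooled_accept(threshold)


selective = accepted_share(0.25)      # A accepts only the top quarter of B
non_selective = accepted_share(0.75)  # A rejects only the bottom quarter of B
print(f"selective A: high-variability share = {selective:.3f}")
print(f"non-selective A: high-variability share = {non_selective:.3f}")
```

Under these assumptions the high-variability subpopulation is over-represented among those accepted when A is selective and under-represented when A is non-selective, which is the direction the principle states. Whether that one-step effect carries over generations is exactly what the paper's proofs and simulations are about, and this sketch says nothing on that.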
The argument that GMVH is professionally discouraging assumes many other hypotheses about human professional motivation, for example, the idea that it is only worth taking on a profession if one can expect to have a higher-than-average chance of achieving extremely high relative standing in that field. Given that extremely high relative standing in any field is going to be rare, it’s hard to say this is a good motivation for any profession, for men or for women, in the long run. In general, those that extrapolate from population level gender tendencies to individual cases are committing the ecological fallacy. It is ironic that under the assumption of the critics, potential female entrants into STEM might be screened out precisely because of their inability to understand a mathematical abstraction, along with its limitations and questionable applicability, through a cloud of political tension. Whereas if one were really interested in teaching mathematics in an equitable way, that would require teaching the capacity to see through political tension to the precise form of a mathematical abstraction. That is precisely what top performance in the STEM field should be about, and it should be unflinchingly encouraged as part of the educational process for both men and women.

My point, really, is this: the argument that publishing and discussing GMVH is detrimental to the career aspirations of women, because of how individual women will internalize the result, depends on a host of sexist assumptions that are as pernicious as GMVH, if not more so. It is based on the idea that women as a whole need special protection from mathematical ideas in order to pursue careers in mathematics, which is self-defeating crazy talk if I’ve ever heard it. The whole point of academic publication is to enable a debate of defeasible positions on their intellectual merits. In the case of mathematics research, the standards of merit are especially clear.
If there’s a problem with Hill’s model, that’s a great opportunity for another, better model, on a topic that is clearly politically and socially relevant. (If the reviewers ignored a lot of prior work that settled the scientific relevance of the question, then that’s a different story. One gathers that is not what happened.)

As a caveat, there are other vectors through which GMVH could lead to bias against women pursuing STEM careers. For example, it could bias their less smart families or colleagues into believing less in their potential on the basis of their sex. But GMVH is about the variance, not the mean, of mathematical ability. So the only population that it’s relevant to is that in the very top tier of performers. That nuance is itself probably beyond the reach of most people who do not have at least some training in STEM, and indeed if somebody is reasoning from GMVH to an assumption about women’s competency in math then they are almost certainly conflating it with a dumber hypothesis about population means which is otherwise irrelevant.

This is perhaps the most baffling thing about this debate: that it boils down to a very rarefied form of elite conflict. “Should a respected mathematics journal publish a paper that implies that there is greater variance in mathematical ability between sexes based on their selectivity and therefore…” is a sentence that already selects for a very small segment of the population, a population that should know better than to censor a mathematical proof rather than engage it as an opportunity to educate people about STEM and why it is an interesting field.

Nobody is objecting to the publication of support for GMVH on the grounds that it implies that more men are grossly incompetent and stupid than women, and it’s worth considering why that is.
If our first reaction to GMVH is “but can no woman ever be the best off?”, we are showing that our concerns lie with who gets to be on top, not the welfare of those on bottom.

## September 07, 2018

Ph.D. student

#### Note on Austin’s “Cyber Policy in China”: on the emphasis on ‘ethics’

I’ve had recommended to me Greg Austin’s “Cyber Policy in China” (2014) as a good, recent work. I am not sure what I was expecting: something about facts and numbers, how companies are being regulated, etc. Just looking at the preface, it looks like this book is about something else.

The preface frames the book in the discourse, beginning in the 20th century, about the “information society”. It explicitly mentions the UN’s World Summit on the Information Society (WSIS) as a touchstone of international consensus about what the information society is: a society ‘where everyone can create, access, utilise and share information and knowledge’ to ‘achieve their full potential’ in ‘improving their quality of life’. It is ‘people-centered’.

In Chinese, the word for information society is xinxi shehui. (Please forgive me: I’ve got little to no understanding of the Chinese language and that includes not knowing how to put the appropriate diacritics into transliterations of Chinese terms.) It is related to a term “informatization” (xinxihua) that is compared to industrialization. It means “the historical process by which information technology is fully used, information resources are developed and utilized, the exchange of information and knowledge sharing are promoted, the quality of economic growth is improved, and the transformation of economic and social development is promoted”. Austin’s interesting point is that this is “less people-centered than the UN vision and more in the mould of the materialist and technocratic traditions that Chinese Communists have preferred.”

This is an interesting statement on the difference between policy articulations by the United Nations and the CCP.
It does not come as a surprise.

What did come as a surprise is how Austin chooses to orient his book.

“On the assumption that outcomes in the information society are ethically determined, the analytical framework used in the book revolves around ideal policy values for achieving an advanced information society. This framework is derived from a study of ethics. Thus, the analysis is not presented as a work of social science (be that political science, industry policy or strategic studies). It is more an effort to situate the values of China’s leaders within an ethical framework implied by their acceptance of the ambition to become an advanced information society.”

This comes as a surprise to me because what I expected from a book titled “Cyber Policy in China” is really something more like industry policy or strategic studies. I was not ready for, and am frankly a bit disappointed by, the idea that this is really a work of applied philosophy.

Why? I do love philosophy as a discipline and have studied it carefully for many years. I’ve written and published about ethics and technological design. But my conclusion after so much study is that “the assumption that outcomes in the information society are ethically determined” is totally incorrect. I have been situated for some time in discussions of “technology ethics” and my main conclusion from them is that (a) “ethics” in this space are more often than not an attempt to universalize what are more narrow political and economic interests, and that (b) “ethics” are constantly getting compromised by economic motivations as well as the mundane difficulty of getting information technology to work as it is intended to in a narrow, functionally defined way. The real world is much bigger and more complex than any particular ethical lens can take in.
Attempts to define technological change in terms of “ethics” are almost always a political maneuver of some kind, for good or for ill, that reduces the real complexity of technological development into a soundbite. A true ethical analysis of cyber policy would need to address industrial policy and strategic aspects, as this is what drives the “cyber” part of it.

The irony is that there is something terribly un-emic about this approach. By Austin’s own admission, the CCP cyber policy is motivated by material concerns about the distribution of technology and economic growth. Austin could have approached China’s cyber policy in the technocratic terms they see themselves in. But instead Austin’s approach is “human-centered”, with a focus on leaders and their values. I already doubt the research on anthropological grounds because of the distance between the researcher and the subjects.

So I’m not sure what to do about this book. The preface makes it sound like it belongs to a genre of scholarship that reads well, and maybe does important ideological translation work, but does not provide something like scientific knowledge of China’s cyber policy, which is what I’m most interested in. Perhaps I should move on, or take other recommendations for reading on this topic.

## September 04, 2018

Center for Technology, Society & Policy

#### Backstage Decisions, Front-stage Experts: Interviewing Genome-Editing Scientists

by Santiago Molina and Gordon Pherribo, CTSP Fellows

This is the first in a series of posts on the project “Democratizing” Technology: Expertise and Innovation in Genetic Engineering.

When we think about who is making decisions that will impact the future health and wellbeing of society, we would hope that these individuals would wield their expertise in a way that addresses the social and economic issues affecting our communities.
Scientists often fill this role: for example, an ecologist advising a state environmental committee on river water redistribution [1], a geologist consulting for an architectural team building a skyscraper [2], an oncologist discussing the best treatment options based on the patient’s diagnosis and values [3], or an economist brought in by a city government to help develop a strategy for allocating grants to elementary schools. Part of the general contract between technical experts and their democracies is that they inform relevant actors so that decisions are made with the strongest possible factual basis.

The examples above describe scientists going outside of the boundaries of their disciplines to present to people outside of the scientific community “on stage” [4]. But what about decisions made by scientists behind the scenes about new technologies that could affect more than daily laboratory life? In the 1970s, genetic engineers used their technical expertise to make a call about an exciting new technology, recombinant DNA (rDNA). This technology allowed scientists to mix and add DNA from different organisms, later giving rise to engineered bacteria that could produce insulin and eventually transgenic crops. The expert decision making process and outcome, in this case, had little to do with the possibility of commercializing biotechnology or the economic impacts of GMO seed monopolies. This happened before the patenting of whole biological organisms [5], and the use of rDNA in plants in 1982. Instead, the emerging issues surrounding rDNA were dealt with as a technical issue of containment. Researchers wanted to ensure that anything tinkered with genetically stayed not just inside the lab, but inside specially marked and isolated rooms in the lab, eventually giving rise to the well-established institution of biosafety. A technical fix, for a technical issue.
Today, scientists are similarly engaged in a process of expert decision making around another exciting new technology, the CRISPR-Cas9 system. This technology allows scientists to make highly specific changes, “edits”, to the DNA of virtually any organism. Following the original publication that showed that CRISPR-Cas9 could be used to modify DNA in a “programmable” way, scientists have developed the system into a laboratory toolbox and laboratories across the life sciences are using it to tinker away at bacteria, butterflies, corn, frogs, fruit flies, human liver cells, nematodes, and many other organisms. Maybe because most people do not have strong feelings about nematodes, most of the attention in both popular news coverage and in expert circles about this technology has had to do with whether modifications that could affect human offspring (i.e. germline editing) are moral.

We have been interviewing faculty members directly engaged in these critical conversations about the potential benefits and risks of new genome editing technologies. As we continue to analyze these interviews, we want to better understand the nature of these backstage conversations and learn how the experiences and professional development activities of these experts influenced their decision-making. In subsequent posts we’ll be sharing some of our findings from these interviews, which so far have highlighted the role of a wide range of technical experiences and skills for the individuals engaged in these discussions, the strength of personal social connections and reputation in getting you a seat at the table, and the dynamic nature of expert decision making.

[1] Scoville, C. (2017). “We Need Social Scientists!” The Allure and Assumptions of Economistic Optimization in Applied Environmental Science. Science as Culture, 26(4), 468-480.

[2] Wildermuth and Dineen (2017). “How ready will Bay Area be for next Quake?” SF Chronicle.

[3] Sprangers, M. A., & Aaronson, N. K. (1992).
The role of health care providers and significant others in evaluating the quality of life of patients with chronic disease: a review. Journal of clinical epidemiology, 45(7), 743-760.

[4] Hilgartner, S. (2000). Science on stage: Expert advice as public drama. Stanford University Press.

[5] Diamond v. Chakrabarty (1980) upheld the first patent on a whole organism (a bacterium that could digest crude oil).

## September 03, 2018

Ph.D. student

#### How trade protection can increase labor wages (the Stolper-Samuelson theorem)

I’m continuing a look into trade policy using Corden’s (1997) book on the topic.

Picking up where the last post left off, I’m operating on the assumption that any reader is familiar with the arguments for free trade that are an extension of those arguments of laissez-faire markets. I will assume that these arguments are true as far as they go: that the economy grows with free trade, that tariffs create a deadweight loss, that subsidies are expensive, but that both tariffs and subsidies do shift the market towards imports.

The question raised by Corden is why, despite its deleterious effects on the economy as a whole, protectionism enjoys political support by some sectors of the economy. He hints, earlier in Chapter 5, that this may be due to income distribution effects. He clarifies this with reference to an answer to this question that was given as early as 1941 by Stolper and Samuelson; their result is now celebrated as the Stolper-Samuelson theorem.

The mathematics of the theorem can be read in many places. Like any economic model, it depends on some assumptions that may or may not be the case. Its main advantage is that it articulates how it is possible for protectionism to benefit a class of the population, and not just in relative but in absolute terms. It does this by modeling the returns to different factors of production, which classically have been labor, land, and capital.
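The mechanism can be illustrated with a toy calculation. This is only a minimal sketch, not Corden’s or Stolper and Samuelson’s own presentation: it assumes two goods, two factors (labor and land), fixed input requirements per unit of output, and competitive pricing in which each good’s price exactly covers its factor costs. The function name `factor_returns` and every number below are hypothetical, chosen so that the import-competing good is the labor-intensive one.

```python
def factor_returns(p_import, p_export, a_import, a_export):
    """Solve the two zero-profit conditions for the wage w and land rent r.

    a_import and a_export are (labor, land) requirements per unit of output.
    The conditions are:
        labor_m * w + land_m * r = p_import
        labor_x * w + land_x * r = p_export
    """
    labor_m, land_m = a_import
    labor_x, land_x = a_export
    det = labor_m * land_x - land_m * labor_x
    if det == 0:
        raise ValueError("both goods use the same factor ratio; no unique solution")
    # Cramer's rule for the 2x2 linear system above.
    w = (p_import * land_x - land_m * p_export) / det
    r = (labor_m * p_export - labor_x * p_import) / det
    return w, r


# Hypothetical technology: the import-competing good is labor-intensive.
A_IMPORT = (2.0, 1.0)  # 2 units of labor, 1 of land per unit of output
A_EXPORT = (1.0, 2.0)  # 1 unit of labor, 2 of land per unit of output

# Before protection both goods sell at the same price.
w_free, r_free = factor_returns(3.0, 3.0, A_IMPORT, A_EXPORT)

# A tariff raises the domestic price of the import-competing good by 10%.
w_prot, r_prot = factor_returns(3.3, 3.0, A_IMPORT, A_EXPORT)

print(f"wage: {w_free:.2f} -> {w_prot:.2f}")
print(f"rent: {r_free:.2f} -> {r_prot:.2f}")
```

Under these assumptions the wage rises by a larger proportion than the protected good’s price while the rent on land falls, which is the sense in which labor can gain in absolute and not just relative terms. Different input requirements would change the magnitudes, and reversing which good is labor-intensive would reverse who gains.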
Roughly, the argument goes like this. Suppose an economy has two commodities, one for import and one for export. Suppose that the imported good is produced with a higher labor to land ratio than the export good. Suppose a protectionist policy increases the amount of the import good produced relative to the export good. Then the return on labor will increase (because more labor is used in supply), and the return on land will decrease (because less land is used in supply). Wages will increase and rent on land will decrease.

These breakdowns of the economy into “factors of production” feel very old school. You rarely read economists discuss the economy in these terms now, which is itself interesting. One reason why (and I am only speculating here) is that these models clarify how laborers, land-owners, and capital-owners have different political interests in economic intervention, and that can lead to the kind of thinking that was flushed out of the American academy during the McCarthy era. Another reason may be that “capital” has changed meaning from being about ownership of machine goods into being about having liquid funds available for financial investment.

I’m interested in these kinds of models today partly because I’m interested in the political interests in various policies, and also because I’m interested in particular in the economics of supply chain logistics. The “factors of production” approach is a crude way to model the ‘supply chain’ in a broad sense, but one that has proven to be an effective source of insights in the past.

References

Corden, W. Max. “Trade policy and economic welfare.” OUP Catalogue (1997).

Stolper, Wolfgang F., and Paul A. Samuelson. “Protection and real wages.” The Review of Economic Studies 9.1 (1941): 58-73.

## August 30, 2018

Ph.D. student

#### trade policy and income distribution effects

I am going to start researching trade policy, meaning policies around trade between different countries; imports and exports. Why?
• It is politically relevant in the U.S. today.
• It is a key component to national cybersecurity strategy, both defensive and offensive, which hinges in many cases on supply chain issues.
• It maybe ought to be a component of national tech regulation and privacy policy, if e-commerce is seen as a trade activity. (This could be seen as ‘cybersecurity’ policy, more broadly writ.)
• Formal models from trade policy may be informative in other domains as well.

In general, years of life experience and study have taught me that economics, however much it is maligned, is a wise and fundamental social science without which any other understanding of politics and society is incomplete, especially when considering the role of technology in society.

Plenty of good reasons! Onward!

As a starting point, I’m working through Max Corden’s Trade policy and economic welfare (1997), which appears to be a well regarded text on the subject. In it, he sets out to describe a normative theory of trade policy. Here are two notable points based on a first perusal.

1. (from Chapter 1, “Introduction”) Corden identifies three “stages of thought” about trade policy. The first is the discovery of the benefits of free trade with the great original economists Adam Smith and David Ricardo. Here, the new appreciation of free trade was simultaneous with the new appreciation of the free market in general. “Indeed, the case for free trade was really a special case of the argument for laissez-faire.”

In the second phase, laissez-faire policies came into question. These policies may not lead to full employment, and the income distribution effects (which Corden takes seriously throughout the book, by the way) may not be desirable. Parallel to this, the argument for free trade was challenged. Some of these challenges were endorsed by John Stuart Mill. One argument is that tariffs might be necessary to protect “infant industries”.
As time went on, the favorability of free trade more or less tracked the favorability of laissez-faire. Both were popular in Western Europe and failed to get traction in most other countries (almost all of which were ‘developing’).

Corden traces the third stage of thought to Meade’s (1955) Trade and welfare. “In the third stage the link between the case for free trade and the case for laissez-faire was broken.” The normative case for free trade, in this stage, did not depend on a normative case for laissez-faire, but existed despite normative reasons for government intervention in the economy. The point made in this approach, called the theory of domestic distortions, is that it is generally better for the kinds of government intervention made to solve domestic problems to be domestic interventions, not trade interventions.

This third stage came with a much more sophisticated toolkit for comparing the effects of different kinds of policies, which is the subject of exposition for a large part of Corden’s book.

2. (from Chapter 5, “Protection and Income Distribution”) Corden devotes at least one whole chapter to an aspect of the trade policy discussion that is very rarely addressed in, say, the mainstream business press. This is the fact that trade policy can have an effect on internal income distribution, and that this has been throughout history a major source of the political momentum for protectionist policies. This explains why the domestic politics of protectionism and free trade can be so heated and are really often independent from arguments about the effect of trade policy on the economy as a whole, which, it must be said, few people realize they have a real stake in.

Corden’s examples involve the creation of fledgling industries under the conditions of war, which often cut off foreign supplies. When the war ends, those businesses that flourished during war exert political pressure to protect themselves from erosion from market forces.
“Thus the Napoleonic Wars cut off supplies of corn (wheat) to Britain from the Continent and led to expansion of acreage and higher prices of corn. When the war was over, the Corn Law of 1815 was designed to maintain prices, with an import prohibition as long as the domestic price was below a certain level.” It goes almost without saying that this served the interests of a section of the community, the domestic corn farmers, and not of others. This is what Corden means by an “income distribution effect”. “Any history book will show that these income distribution effects are the very stuff of politics. The great free trade versus protection controversies of the nineteenth century in Great Britain and in the United States brought out the conflicting interests of different sections of the community. It was the debate about the effects of the Corn Laws which really stimulated the beginnings of the modern theory of international trade.” Extending this argument a bit, one might say that a major reason why economics gets such a bad rap as a social science is that nobody really cares about Pareto optimality except for those sections of the economy that are well served by a policy that can be justified as being Pareto optimal (in practice, this would seem to be correlated with how much somebody has invested in mutual funds, as these track economic growth). The “stuff of politics” is people using political institutions to change their income outcomes, and the potential for this makes trade policy a very divisive topic. 
Implication for future research:

The two key takeaways for trade policy in cybersecurity are:

1) The trade policy discussion need not remain within the narrow frame of free trade versus protectionism, but rather a more nuanced set of policy analysis tools should be brought to bear on the problem, and

2) An outcome of these policy analyses should be the identification not just of total effects on the economy, or security posture, or what have you, but of the particular effects on different sections of the economy and population.

References

Corden, W. Max. “Trade policy and economic welfare.” OUP Catalogue (1997).

Meade, James Edward. Trade and welfare. Vol. 2. Oxford University Press, 1955.

## August 21, 2018

Center for Technology, Society & Policy

#### Standing up for truth in the age of disinformation

Professor Deirdre K. Mulligan and PhD student (and CTSP Co-Director) Daniel Griffin have an op-ed in The Guardian considering how Google might consider its human rights obligations in the face of state censorship demands:

If Google goes to China, will it tell the truth about Tiananmen Square?

The op-ed advances a line of argument developed in a recent article of theirs in the Georgetown Law Technology Review: "Rescripting Search to Respect the Right to Truth".

On Thursday, October 4th at 5:30pm the Center for Technology, Society & Policy (CTSP) and the School of Information's Information Management Student Association (IMSA) are co-hosting their third annual Social Impact Un-Pitch Day!

Join CTSP and IMSA to brainstorm ideas for projects that address the challenges of technology, society, and policy. We welcome students, community organizations, local municipal partners, faculty, and campus initiatives to discuss discrete problems that project teams can take on over the course of this academic year. Teams will be encouraged to apply to CTSP to fund their projects.

Location: Room 202, in South Hall.

You can share slides and/or a description of your ideas even if you aren’t able to attend. Deadline to share materials: midnight October 1st, 2018.

## Funding Opportunities

The next application round for fellows will open in November. CTSP’s fellowship program will provide small grants to individuals and small teams of fellows for 2019. CTSP also has a recurring offer of small project support.

## Prior Projects & Collaborations

Here are several examples of projects that members of the I School community have pursued as MIMS final projects or CTSP Fellow projects (see more projects from 2016, 2017, and 2018).

## Skills & Interests of Students

The above projects demonstrate a range of interests and skills of the I School community. Students here and more broadly on the UC Berkeley campus are interested and skilled in all aspects of where information and technology meets people, from design and data science to user research and information policy.

## RSVP here!

#### August 30th, 5:30pm: Habeas Data Panel Discussion

## Location: South Hall Rm 202

### Time: 5:30-7pm (followed by light refreshments)

### CTSP’s first event of the semester!

### Co-Sponsored with the Center for Long-Term Cybersecurity

Please join us for a panel discussion featuring award-winning tech reporter Cyrus Farivar, whose new book, Habeas Data, explores how the explosive growth of surveillance technology has outpaced our understanding of the ethics, mores, and laws of privacy. Habeas Data explores ten historic court decisions that defined our privacy rights and matches them against the capabilities of modern technology. Mitch Kapor, co-founder, Electronic Frontier Foundation, said the book was “Essential reading for anyone concerned with how technology has overrun privacy.”

The panel will be moderated by 2017 and 2018 CTSP Fellow Steve Trush, a MIMS 2018 graduate and now a Research Fellow at the Center for Long-Term Cybersecurity (CLTC).
He was on a CTSP project starting in 2017 that provided a report to the Oakland Privacy Advisory Commission; read an East Bay Express write-up on their work here.

The panelists will discuss what public governance models can help local governments protect the privacy of citizens, and what role citizen technologists can play in shaping these models. The discussion will showcase the ongoing collaboration between the UC Berkeley School of Information and the Oakland Privacy Advisory Commission (OPAC). Attendees will learn how they can get involved in addressing issues of governance, privacy, fairness, and justice related to state surveillance.

### Panel:

• Cyrus Farivar, Author, Habeas Data: Privacy vs. the Rise of Surveillance Tech
• Deirdre Mulligan, Associate Professor in the School of Information at UC Berkeley, Faculty Director, UC Berkeley Center for Law & Technology
• Catherine Crump, Assistant Clinical Professor of Law, UC Berkeley; Director, Samuelson Law, Technology & Public Policy Clinic
• Camille Ochoa, Coordinator, Grassroots Advocacy; Electronic Frontier Foundation
• Moderated by Steve Trush, Research Fellow, UC Berkeley Center for Long-Term Cybersecurity

The panel will be followed by a reception with light refreshments. Building is wheelchair accessible: wheelchair users can enter through the ground floor level and take the elevator to the second floor.

This event will not be taped or live-streamed.

## RSVP here to attend.

### Panelist Bios:

Cyrus [“suh-ROOS”] Farivar is a Senior Tech Policy Reporter at Ars Technica, and is also an author and radio producer. His second book, Habeas Data, about the legal cases over the last 50 years that have had an outsized impact on surveillance and privacy law in America, is out now from Melville House. His first book, The Internet of Elsewhere, about the history and effects of the Internet on different countries around the world, including Senegal, Iran, Estonia and South Korea, was published in April 2011.
He previously was the Sci-Tech Editor, and host of “Spectrum”, at Deutsche Welle English, Germany’s international broadcaster. He has also reported for the Canadian Broadcasting Corporation, National Public Radio, Public Radio International, The Economist, Wired, The New York Times and many others. His PGP key and other secure channels are available here.

Deirdre K. Mulligan is an Associate Professor in the School of Information at UC Berkeley, a faculty Director of the Berkeley Center for Law & Technology, and an affiliated faculty member of the Center for Long-Term Cybersecurity. Mulligan’s research explores legal and technical means of protecting values such as privacy, freedom of expression, and fairness in emerging technical systems. Her book, Privacy on the Ground: Driving Corporate Behavior in the United States and Europe, a study of privacy practices in large corporations in five countries, conducted with UC Berkeley Law Prof. Kenneth Bamberger, was recently published by MIT Press. Mulligan and Bamberger received the 2016 International Association of Privacy Professionals Leadership Award for their research contributions to the field of privacy protection.

Catherine Crump: Catherine Crump is an Assistant Clinical Professor of Law and Director of the Samuelson Law, Technology & Public Policy Clinic. An experienced litigator specializing in constitutional matters, she has represented a broad range of clients seeking to vindicate their First and Fourth Amendment rights. She also has extensive experience litigating to compel the disclosure of government records under the Freedom of Information Act. Professor Crump’s primary interest is the impact of new technologies on civil liberties.
Representative matters include serving as counsel in the ACLU’s challenge to the National Security Agency’s mass collection of Americans’ call records; representing artists, media outlets and others challenging a federal internet censorship law; and representing a variety of clients seeking to invalidate the government’s policy of conducting suspicionless searches of laptops and other electronic devices at the international border. Prior to coming to Berkeley, Professor Crump served as a staff attorney at the ACLU for nearly nine years. Before that, she was a law clerk for Judge M. Margaret McKeown at the United States Court of Appeals for the Ninth Circuit.

Camille Ochoa: Camille promotes the Electronic Frontier Foundation’s grassroots advocacy initiative (the Electronic Frontier Alliance) and coordinates outreach to student groups, community groups, and hacker spaces throughout the country. She has very strong opinions about food deserts, the school-to-prison pipeline, educational apartheid in America, the takeover of our food system by chemical companies, the general takeover of everything in American life by large conglomerates, and the right to not be spied on by governments or corporations.

## August 12, 2018

Ph.D. student

#### “the politicization of the social” and “politics of identity” in Omi and Winant, Ch. 6

A confusing debate in my corner of the intellectual Internet is about (a) whether the progressive left has a coherent intellectual stance that can be articulated, (b) what to call this stance, (c) whether the right-wing critics of this stance have the intellectual credentials to refer to it and thereby land any kind of rhetorical punch. What may be true is that both “sides” reflect social movements more than they reflect coherent philosophies as such, and so trying to bridge between them intellectually is fruitless.
Happily, I am reading through Omi and Winant, which among other things outlines a history of what I think of as the progressive left, or the “social justice”, “identity politics” movement in the United States. They address this in their Chapter 6: “The Great Transformation”. They use “the Great Transformation” to refer to “racial upsurges” in the 1950’s and 1960’s.

They are, as far as I can tell, the only people who ever use “The Great Transformation” to refer to this period. I don’t think it is going to stick. They name it this because they see this period as a great victorious period for democracy in the United States. Omi and Winant refer to previous periods in the United States as “racial despotism”, meaning that the state was actively treating nonwhites as second class citizens and preventing them from engaging in democracy in a real way. “Racial democracy”, which would involve true integration across race lines, is an ideal future or political trajectory that was approached during the Great Transformation but not realized fully.

The story of the civil rights movements in the mid-20th century is textbook material and I won’t repeat Omi and Winant’s account, which is interesting for a lot of reasons. One reason why it is interesting is how explicitly influenced by Gramsci their analysis is. As the “despotic” elements of United States power structures fade, the racial order is maintained less by coercion and more by consent. A power disparity in social order maintained by consent is a hegemony, in Gramscian theory.

They explain the Great Transformation as being due to two factors. One was the decline of the ethnicity paradigm of race, which had perhaps naively assumed that racial conflicts could be resolved through assimilation and recognition of ethnic differences without addressing the politically entrenched mechanisms of racial stratification.
The other factor was the rise of new social movements characterized by, in alliance with second-wave feminism, the politicization of the social, whereby social identity and demographic categories were made part of the public political discourse, rather than something private. This is the birth of “politics of identity”, or “identity politics”, for short. These were the original social justice warriors. And they attained some real political victories. The reason why these social movements are not exactly normalized today is that there was a conservative reaction to resist changes in the 70’s. The way Omi and Winant tell it, the “colorblind ideology” of the early 00’s was the culmination of a kind of political truce between “racial despotism” and “racial democracy”–a “racial hegemony”. Gilman has called this “racial liberalism”. So what does this mean for identity politics today? It means it has its roots in political activism which was once very radical. It really is influenced by Marxism, as these movements were. It means that its co-option by the right is not actually new, as “reverse racism” was one of the inventions of the groups that originally resisted the Civil Rights movement in the 70’s. What’s new is the crisis of hegemony, not the constituent political elements that were its polar extremes, which have been around for decades. What it also means is that identity politics has been, from its start, a tool for political mobilization. It is not a philosophy of knowledge or about how to live the good life or a world view in a richer sense. It serves a particular instrumental purpose. Omi and Winant talk about how the politics of identity is “attractive”, that it is a contagion. These are positive terms for them; they are impressed at how anti-racism spreads. These days I am often referred to Phillips’ report, “The Oxygen of Amplification”, which is about preventing the spread of extremist views by reducing the amount of reporting on them in ‘disgust’. 
It must be fair to point out that identity politics as a left-wing innovation were at one point an “extremist” view, and that proponents of that view do use media effectively to spread it. This is just how media-based organizing tactics work, now.

## August 07, 2018

Ph.D. student

#### Racial projects and racism (Omi and Winant, 2014; Jeong case study)

Following up on earlier posts on Omi and Winant, I’ve gotten to the part where they discuss racial projects and racism. Because I use Twitter, I have not been able to avoid the discussion of Sarah Jeong’s tweets. I think it provides a useful case study in Omi and Winant’s terminology. I am not a journalist or a particularly with-it person, so I have encountered this media event mainly through articles about it. Here are some. To recap, for Omi and Winant, race is a “master category” of social organization, but nevertheless one that is unstable and politically contested. The continuity of racial classification is due to a historical, mutually reinforcing process that includes both social structures that control the distribution of resources and social meanings and identities that have been acquired by properties of people’s bodies. The fact that race is sustained through this historical and semiotically rich structuration (to adopt a term from Giddens), means that “To identify an individual or group racially is to locate them within a socially and historically demarcated set of demographic and cultural boundaries, state activities, “life-chances”, and tropes of identity/difference/(in)equality.” “We cannot understand how racial representations set up patterns of residential segregation, for example, without considering how segregation reciprocally shapes and reinforces the meaning of race itself.” This is totally plausible. Identifying the way that racial classification depends on a relationship between meaning and social structure opens the possibility of human political agency in the (re)definition of race. 
Omi and Winant’s term for these racial acts is racial projects. A racial project is simultaneously an interpretation, representation, or explanation of racial identities and meanings, and an effort to organize and distribute resources (economic, political, cultural) along particular racial lines. … Racial projects connect the meaning of race in discourse and ideology with the way that social structures are racially organized. “Racial project” is a broad category that can include both large state and institutional interventions and individual actions, “even the decision to wear dreadlocks”. What makes them racial projects is how they reflect and respond to broader patterns of race, whether to reproduce it or to subvert it. Prevailing stereotypes are one of the main ways we can “read” the racial meanings of society, and so the perpetuation or subversion of stereotypes is a form of “racial project”. Racial projects are often in contest with each other; the racial formation process is the interaction and accumulation of these projects. Racial project is a useful category partly because it is key to Omi and Winant’s definition of racism. They acknowledge that the term itself is subject to “enormous debate”, at times inflated to be meaningless and at other times deflated to be too narrow. They believe the definition of racism as “racial hate” is too narrow, though it has gained legal traction as a category, as when “hate crimes” are considered an offense with enhanced sentencing, or universities institute codes against “hate speech”. I’ve read “racial animus” as another term that means something similar, though perhaps more subtle, than ‘racial hate’. 
The narrow definition of racism as racial hate is rejected due to an argument O&W attribute to David Theo Goldberg (1997), which is that by narrowly focusing on “crimes of passion” (I would gloss this more broadly to ‘psychological states’), the interpretation of racism misses the ideologies, policies, and practices that “normalize and reproduce racial inequality and domination”. In other words, racism, as a term, has to reference the social structure that is race in order to be adequate. Omi and Winant define racism thus: A racial project can be defined as racist if it creates or reproduces structures of domination based on racial significations and identities. A key implication of their argument is that not all racial projects are racist. Recall that Omi and Winant are very critical of colorblindness as (they allege) a political hegemony. They want to make room for racial solidarity and agency despite the hierarchical nature of race as a social fact. This allows them to answer two important questions. Are there anti-racist projects? Yes. “[w]e define anti-racist projects as those that undo or resist structures of domination based on racial significations and identities.” Note that the two definitions are not exactly parallel in construction. To “create and reproduce structure” is not entirely the opposite of “undo or resist structure”. Given O&W’s ontology, and the fact that racial structure is always the accumulation of a long history of racial projects, projects that have been performed by (bluntly) both the right and the left, and given that social structure is not homogeneous across location (consider how race is different in the United States and in Brazil, or different in New York City and in Dallas), and given that an act of resistance is also an act of creation, implicitly, one could easily get confused trying to apply these definitions. The key word, “domination”, is not defined precisely, and everything hinges on this. 
It’s clear from the writing that Omi and Winant subscribe to the “left” view of how racial domination works; this orients their definition of racism concretely. But they also note that the political agency of people of color in the United States over the past hundred years or so has gained them political power. Isn’t the key to being racist having power? This leads O&W to the second question, which is Can Groups of Color Advance Racist Projects? O&W’s answer is, yes, they can. There are exceptions to the hierarchy of white supremacy, and in these exceptions there can be racial conflicts where a group of color is racist. Their example is in cases where blacks and Latinos are in contest over resources. O&W do not go so far as to say that it is possible to be racist against white people, because they believe all racial relations are shaped by the overarching power of white supremacy.

#### Case Study: Jeong’s tweets

That is the setup. So what about Sarah Jeong? Well, she wrote some tweets mocking white people, and specifically white men, in 2014, which was by the way the heyday of obscene group conflict on Twitter. That was the year of Gamergate. A whole year of tweets that are probably best forgotten. She compared white people to goblins, she compared them to dogs. She said she wished ill on white men. As has been pointed out, if any other group besides white men were talked about, her tweets would be seen as undeniably racist, etc. They are, truth be told, similar rhetorically to the kinds of tweets that the left media have been so appalled at for some time. They have surfaced again because Jeong was hired by the New York Times, and right wing activists (or maybe just trolls, I’m a little unclear about which) surfaced the old tweets. In the political climate of 2018, when Internet racism feels like it’s gotten terribly real, these struck a chord and triggered some reflection. What should we make of these tweets, in light of racial formation theory? 
First, we should acknowledge that the New York Times has some really great lawyers working for it. Their statement was that, at the time, (a) Jeong was being harassed, (b) she responded to the harassers in the same rhetorical manner as the harassment, and (c) that’s regrettable, but also, it’s long past and not so bad. Sarah Jeong’s own statement makes this point, acknowledges that the tweets may be hurtful out of context, and says that she didn’t mean them the way others could take them. “Harassment” is actually a relatively neutral term; you can harass somebody, legally speaking, on the basis of their race without invoking a reaction from anti-racist sociologists. This is all perfectly sensible, IMO, and the case is pretty much closed. But that’s not where the discussion on the Internet ended. Why? Because the online media is where the contest of racial formation is happening. We can ask: Were Sarah Jeong’s tweets a racial project? The answer seems to be, yes, they were. They were a representation of racial identity (whiteness) “to organize and distribute resources (economic, political, cultural) along particular racial lines”. Jeong is a journalist and scholar, and these arguments are happening in social media, which are always-already part of the capitalist attention economy. Jeong’s success is partly due to her confrontation of on-line harassers and responses to right-wing media figures. And her activity is the kind that rallies attention along racial lines–anti-racist, racist, etc. Confusingly, the language she used in these tweets reads as hateful. “Dumbass fucking white people marking up the internet with their opinions like dogs pissing on fire hydrants” does, reasonably, sound like it expresses some racial animus. If we were to accept the definition of racism as merely the possession of ill will towards a race, which seems to be Andrew Sullivan’s definition, then we would have to say those were racist tweets. We could invoke a defense here. Were the tweets satire? 
Did Jeong not actually have any ill will towards white people? One might wonder, similarly, whether 4chan anti-Semites are actually anti-Semitic or just trolling. The whole question of who is just trolling and who should be taken seriously on the Internet is such an interesting one. But it’s one I had to walk away from long ago after the heat got turned up on me one time. So it goes. What everyone knows is at stake, though, is the contention that the ‘racial animus’ definition is not the real definition of racism, but rather that something like O&W’s definition is. By their account, (a) a racial project is only racist if it aligns with structures of racial domination, and (b) the structure of racial domination is a white supremacist one. Ergo, by this account, Jeong’s tweets are not racist, because insulting white people does not create or reproduce structures of white supremacist domination. It’s worth pointing out that there are two different definitions of a word here and that neither one is inherently more correct of a definition. I’m hesitant to label the former definition “right” and the latter definition “left” because there’s nothing about the former definition that would make you, say, not want to abolish the cradle-to-prison system or any number of other real, institutional reforms. But the latter definition is favored by progressives, who have a fairly coherent world view. O&W’s theorizing is consistent with it. The helpful thing about this worldview is that it makes it difficult to complain about progressive rhetorical tactics without getting mired into a theoretical debate about their definitions, which makes it an excellent ideology for getting into fights on the Internet. This is largely what Andrew Sullivan was getting at in his critique. What Jeong and the NYT seem to get, which some others don’t, is that comments that insult an entire race can be hurtful and bothersome even if they are not racist in the progressive sense of the term. 
It is not clear what we should call a racial project that is hurtful and bothersome to white people if we do not call it racist. A difficulty with the progressive definition of racism is that agreement on the application of the term is going to depend on agreement about what the dominant racial structures are. What we’ve learned in the past few years is that the left-wing view of what these racial structures are is not as widely shared as it was believed to be. For example, there are far more people who believe in anti-Semitic conspiracies, in which the dominant race is the Jews, active in American political life than was supposed. Given O&W’s definition of racism, if it were, factually, the case that Jews ran the world, then anti-Semitic comments would not be racist in the meaningful sense. Which means that the progressive definition of racism, to be effective, depends on widespread agreement about white supremacist hegemony, which is a much, much more complicated thing to try to persuade somebody of than a particular person’s racial animus. A number of people have been dismissing any negative reaction to the resurfacing of Jeong’s tweets, taking the opportunity to disparage that reaction as misguided and backwards. As far as I can tell, there is an argument that Jeong’s tweets are actually anti-racist. This article argues that casually disparaging white men is just something anti-racists do lightly to call attention to the dominant social structures and also the despicable behavior of some white men. Naturally, these comments are meant humorously, and not intended to refer to all white men (to assume they do is to distract from the structural issues at stake). They are jokes that should be celebrated, because the progressives have already won this argument over #notallmen, also in 2014. Understood properly as progressive, anti-racist, social justice idiom, there is nothing offensive about Jeong’s tweets. 
I am probably in a minority on this one, but I do not agree with this assessment, for a number of reasons. First, the idea that you can have a private, in-group conversation on Twitter is absurd. Second, the idea that a whole community of people casually expresses racial animus because of representative examples of wrongdoing by members of a social class can be alarming whether or not it’s Trump voters talking about Mexicans or anti-racists talking about white people. That alarm, as an emotional reaction, is a reality whether or not the dominant racial structures are being reproduced or challenged. Third, I’m not convinced that as a racial project, tweets simply insulting white people really counts as “anti-racist” in a substantive sense. Anti-racist projects are “those that undo or resist structures of domination based on racial significations and identities.” Is saying “white men are bullshit” undoing a structure of domination? I’m pretty sure any white supremacist structures of domination have survived that attack. Does it resist white supremacist domination? The thrust of wise sociology of race is that what’s more important than the social meanings are the institutional structures that maintain racial inequality. Even if this statement has a meaning that is degrading to white people, it doesn’t seem to be doing any work of reorganizing resources around (anti-)racial lines. It’s just a crass insult. It may well have actually backfired, or had an effect on the racial organization of attention that neither harmed nor supported white supremacy, but rather just made its manifestation on the Internet more toxic (in response to other, much greater, toxicity, of course). I suppose what I’m arguing for is greater nuance than either the “left” or “right” position has offered on this case. I’m saying that it is possible to engage in a racial project that is neither racist nor anti-racist. 
You could have a racial project that is amusingly absurd, or toxic, or cleverly insightful. Moreover, there is a complex of ethical responsibilities and principles that intersects with racial projects but is not contained by the logic of race. There are greater standards of decency that can be invoked. These are not simply constraints on etiquette. They also are relevant to the contest of racial projects and their outcomes. ## August 05, 2018 Ph.D. student #### From social movements to business standards Matt Levine has a recent piece discussing how discovering the history of sexual harassment complaints about a company’s leadership is becoming part of standard due diligence before an acquisition. Implicitly, the threat of liability, and presumably the costs of a public relations scandal, are material to the value of the company being acquired. Perhaps relatedly, the National Venture Capital Association has added to its Model Legal Documents a slew of policies related to harassment and discrimination, codes of conduct, attracting and retaining diverse talent, and family friendly policies. Rumor has it that venture capitalists will now encourage companies they invest in to adopt these tested versions of the policies, much as an organization would adopt a tested and well-understood technical standard. I have in various researcher roles studied social movements and political change, but these studies have left me with the conclusion that changes to culture are rarely self-propelled, but rather are often due to more fundamental changes in demographics or institutions. State legislation is very slow to move and limited in its range, and so often trails behind other amassing of power and will. Corporate self-regulation, on the other hand, through standards, contracts, due diligence, and the like, seems to be quite adaptive. 
This is leading me to the conclusion that a best-kept secret of cultural change is that some of the main drivers of it are actually deeply embedded in corporate law. Corporate law has the reputation of being a dry subject which sucks recent law grads into soulless careers. But what if that wasn’t what corporate law was? What if corporate law was really where the action is? In broader terms, the adaptivity of corporate policy to changing demographics and social needs perhaps explains the paradox of “progressive neoliberalism”, or the idea that the emerging professional business class seems to be socially liberal, whether or not it is fiscally conservative. Professional culture requires, due to antidiscrimination law and other policies, the compliance of its employees with a standard of ‘political correctness’. People can’t be hostile to each other in the workplace or else they will get fired, and they especially can’t be hostile to anybody on the basis of their being part of a protected category. This was enshrined into law long ago. Part of the role of educational institutions is to teach students a coherent story about why these rules are what they are and how they are not just legally mandated, but morally compelling. So the professional class has an ideology of inclusivity because it must.

## July 30, 2018

Ph.D. student

#### How the Internet changed everything: a grand theory of AI, etc.

I have read many a think piece and critical take about AI, the Internet, and so on. I offer a new theory of What Happened, the best I can come up with based on my research and observations to date. Consider this article, “The death of Don Draper”, as a story that represents the changes that occur more broadly. In this story, advertising was once a creative field that any company with capital could hire out to increase their chances of getting noticed and purchased, albeit in a noisy way. 
Because everything was very uncertain, those that could afford it blew a lot of money on it (“Half of advertising is useless; the problem is knowing which half”). A similar story could be told about access to the news–dominated by big budgets that hid quality–and political candidates–whose activities were largely not exposed to scrutiny and could follow a similarly noisy pattern of hype and success. Then along came the Internet and targeted advertising, which did a number of things:

• It reduced search costs for people looking for particular products, because Google searches the web and Amazon indexes all the products (and because of lots of smaller versions of Google and Amazon).
• It reduced the uncertainty of advertising effectiveness because it allowed for fine-grained measurement of conversion metrics. This reduced the search costs of producers to advertisers, and from advertisers to audiences.
• It reduced the search costs of people finding alternative media and political interest groups, leading to a reorganization of culture. The media and cultural landscape could more precisely reflect the exogenous factors of social difference.
• It reduced the cost of finding people based on their wealth, social influence, and so on, implicitly creating a kind of ‘social credit system’ distributed across various web services. (Gandy, 1993; Fourcade and Healy, 2016)

What happens when you reduce search costs in markets? Robert Jensen’s (2007) study of the introduction of mobile phones to fish markets in Kerala is illustrative here. Fish prices were very noisy due to bad communication until mobile phones were introduced. After that, the prices stabilized, owing to swifter communication between fishermen and markets. Suddenly able to preempt prices rather than subject to the vagaries of them, fishermen could then choose to go to the market that would give them the best price. Reducing search costs makes markets more efficient and larger. 
In doing so, it increases inequality, because whereas a lot of lower quality goods and services can survive in a noisy economy, when consumers are more informed and more efficient at searching, they can cut out less useful services. They can then standardize on “the best” option available, which can be produced with economies of scale. So inefficient, noisy parts of the economy were squeezed out and the surplus amassed in the hands of a big few intermediaries, who we now see as Big Tech leveraging AI. Is AI an appropriate term? I have always liked this definition of AI: “Anything that humans still do better than computers.” Most recently I’ve seen this restated in an interview with Andrew Moore, quoted by Zachary Lipton: Artificial intelligence is the science and engineering of making computers behave in ways that, until recently, we thought required human intelligence. The use of technical platforms to dramatically reduce search costs. “Searching” for people, products, and information is something that used to require human intelligence. Now it is assisted by computers. And whether or not the average user knows what they are doing when they search (Mulligan and Griffin, 2018), as a commercial function, the panoply of search engines and recommendation systems and auctions that occupy the central places in the information economy outperform human intelligence largely by virtue of having access to more data–a broader perspective–than any individual human could ever accomplish. The comparison between the Google search engine and a human’s intelligence is therefore ill-posed. The kinds of functions tech platforms are performing are things that have only ever been solved by human organizations, especially bureaucratic ones. 
And while the digital user interfaces of these services hide the people “inside” the machines, we know that of course there’s an enormous amount of ongoing human labor involved in the creation and maintenance of any successful “AI” that’s in production. In conclusion, the Internet changed everything for a mundane reason that could have been predicted from neoclassical economic theory. It reduced search costs, creating economic efficiency and inequality, by allowing for new kinds of organizations based on broad digital connectivity. “AI” is a distraction from these accomplishments, as is most “critical” reaction to these developments, which does not do justice to the facts of the matter because, by taking up a humanistic lens, it tends not to address how decisions by individual humans and changes to their experience are due to large-scale aggregate processes and strategic behaviors by businesses.

References

Gandy Jr, Oscar H. The Panoptic Sort: A Political Economy of Personal Information. Critical Studies in Communication and in the Cultural Industries. Westview Press, 1993.

Fourcade, Marion, and Kieran Healy. “Seeing like a market.” Socio-Economic Review 15.1 (2016): 9-29.

Jensen, Robert. “The digital provide: Information (technology), market performance, and welfare in the South Indian fisheries sector.” The Quarterly Journal of Economics 122.3 (2007): 879-924.

Mulligan, Deirdre K. and Griffin, Daniel S. “Rescripting Search to Respect the Right to Truth.” 2 GEO. L. TECH. REV. 557 (2018)

## July 10, 2018

Ph.D. student

#### search engines and authoritarian threats

I’ve been intrigued by Daniel Griffin’s tweets lately, which have been about situating some upcoming work of his and Deirdre Mulligan’s regarding the experience of using search engines. 
There is a lively discussion lately about the experience of those searching for information and the way they respond to misinformation or extremism that they discover through organic use of search engines and media recommendation systems. This is apparently how the concern around “fake news” has developed in the HCI and STS world since it became an issue shortly after the 2016 election. I do not have much to add to this discussion directly. Consumer misuse of search engines is, to me, analogous to consumer misuse of other forms of print media. I would assume the best solution to it is education in the complete sense, and the problems with the U.S. education system are, despite all good intentions, not HCI problems. Wearing my privacy researcher hat, however, I have become interested in a different aspect of search engines and the politics around them that is less obvious to the consumer and therefore less popularly discussed, but I fear is more pernicious precisely because it is not part of the general imaginary around search. This is the aspect that is around the tracking of search engine activity, and what it means for this activity to be in the hands of not just such benevolent organizations as Google, but also such malevolent organizations as Bizarro World Google*. Here is the scenario, so to speak: for whatever reason, we begin to see ourselves in a more adversarial relationship with search engines. I mean “search engine” here in the broad sense, including Siri, Alexa, Google News, YouTube, Bing, Baidu, Yandex, and all the more minor search engines embedded in web services and appliances that do something more focused than crawl the whole web. By ‘search engine’ I mean the entire UX paradigm of the query into the vast unknown of semantic and semiotic space that contemporary information access depends on. In all these cases, the user is at a systematic disadvantage in the sense that their query is a data point among many others. 
The task of the search engine is to predict the desired response to the query and provide it. In return, the search engine gets the query, tied to the identity of the user. That is one piece of a larger mosaic; to be a search engine is to have a picture of a population and their interests and the mandate to categorize and understand those people. In Western neoliberal political systems the central function of the search engine is realized as a commercial transaction facilitating other commercial transactions. My “search” is a consumer service; I “pay” for this search by giving my query to the adjoined advertising function, which allows other commercial providers to “search” for me, indirectly, through the ad auction platform. It is a market with more than just two sides. There’s the consumer who wants information and may be tempted by other information. There are the primary content providers, who satisfy consumer content demand directly. And there are secondary content providers who want to intrude on consumer attention in a systematic and successful way. The commercial, ad-enabled search engine reduces transaction costs for the consumer’s search and sells a fraction of that attentional surplus to the advertisers. Striking the right balance, the consumer is happy enough with the trade. Part of the success of commercial search engines is the promise of privacy in the sense that the consumer’s queries are entrusted secretly with the engine, and this data is not leaked or sold. Wise people know not to write into email things that they would not want in the worst case exposed to the public. Unwise people are more common than wise people, and ill-considered emails are written all the time. Most unwise people do not come to harm from this, because privacy in email is a de facto standard; it is the very security of email that makes the possibility of its being leaked alarming. So too with search engine queries. 
“Ask me anything,” suggests the search engine, “I won’t tell”. “Well, I will reveal your data in an aggregate way; I’ll expose you to selective advertising. But I’m a trusted intermediary. You won’t come to any harms besides exposure to a few ads.” That is all a safe assumption until it isn’t, at which point we must reconsider the role of the search engine. Suppose that, instead of living in a neoliberal democracy where the free search for information was sanctioned as necessary for the operation of a free market, we lived in an authoritarian country organized around the principle that disloyalty to the state should be crushed. Under these conditions, the transition of a society into one that depends for its access to information on search engines is quite troubling. The act of looking for information is a political signal. Suppose you are looking for information about an extremist, subversive ideology. To do so is to flag yourself as a potential threat to the state. Suppose that you are looking for information about a morally dubious activity. To do so is to make yourself vulnerable to kompromat. Under an authoritarian regime, curiosity and free thought are a problem, and a problem that is readily identified by one’s search queries. Further, an authoritarian regime benefits if the risks of searching for the ‘wrong’ thing are widely known, since it suppresses inquiry. Hence, the very vaguely announced and, in fact, implausible to implement Social Credit System in China does not need to exist to be effective; people need only believe it exists for it to have a chilling and organizing effect on behavior. That is the lesson of the Foucauldian panopticon: it doesn’t need a guard sitting in it to function. Do we have a word for this function of search engines in an authoritarian system? We haven’t needed one in our liberal democracy, which perhaps we take for granted. “Censorship” does not apply, because what’s at stake is not speech but the ability to listen and learn. 
“Surveillance” is too general. It doesn’t capture the specific constraints on acquiring information, on being curious. What is the right term for this threat? What is the term for the corresponding liberty?

I’ll conclude with a chilling thought: when at war, all states are authoritarian, to somebody. Every state has an extremist, subversive ideology that it watches out for and tries in one way or another to suppress. Our search queries are always of strategic or tactical interest to somebody. Search engine policies are always an issue of national security, in one way or another.

Ph.D. student

#### Exploring Implications of Everyday Brain-Computer Interface Adoption through Design Fiction

This blog post is a version of a talk I gave at the 2018 ACM Designing Interactive Systems (DIS) Conference based on a paper written with Nick Merrill and John Chuang, entitled When BCIs have APIs: Design Fictions of Everyday Brain-Computer Interface Adoption. Find out more on our project page, or download the paper: [PDF link] [ACM link]

In recent years, brain computer interfaces, or BCIs, have shifted from far-off science fiction, to medical research, to the realm of consumer-grade devices that can sense brainwaves and EEG signals. Brain computer interfaces have also featured more prominently in corporate and public imaginations, such as Elon Musk’s project that has been said to create a global shared brain, or fears that BCIs will result in thought control.

Most of these narratives and imaginings about BCIs tend to be utopian, or dystopian, imagining radical technological or social change. However, we instead aim to imagine futures that are not radically different from our own. In our project, we use design fiction to ask: how can we graft brain computer interfaces onto the everyday and mundane worlds we already live in? How can we explore how BCI uses, benefits, and labor practices may not be evenly distributed when they get adopted?

Brain computer interfaces allow the control of a computer from neural output. In recent years, several consumer-grade brain-computer interface devices have come to market. One example is the Neurable – it’s a headset used as an input device for virtual reality systems. It detects when a user recognizes an object that they want to select. It uses a phenomenon called the P300 – when a person either recognizes a stimulus, or receives a stimulus they are not expecting, electrical activity in their brain spikes approximately 300 milliseconds after the stimulus. This electrical spike can be detected by an EEG, and by several consumer BCI devices such as the Neurable. Applications utilizing the P300 phenomenon include hands-free ways to type or click.

Demo video of a text entry system using the P300

Neurable demonstration video

We base our analysis on this already-existing capability of brain computer interfaces, rather than the more fantastical narratives (at least for now) of computers being able to clearly read humans’ inner thoughts and emotions. Instead, we create a set of scenarios that makes use of the P300 phenomenon in new applications, combined with the adoption of consumer-grade BCIs by new groups and social systems.

Stories about BCI’s hypothetical future as a device to make life easier for “everyone” abound, particularly in Silicon Valley, as shown in recent research. These tend to be very totalizing accounts, neglecting the nuance of multiple everyday experiences. However, past research shows that the introductions of new digital technologies end up unevenly shaping practices and arrangements of power and work – from the introduction of computers in workplaces in the 1980s, to the introduction of email, to forms of labor enabled by algorithms and digital platforms. We use a set of design fictions to interrogate these potential arrangements in BCI systems, situated in different types of workers’ everyday experiences.
# Design Fictions

Design fiction is a practice of creating conceptual designs or artifacts that help create a fictional reality. We can use design fiction to ask questions about possible configurations of the world and to think through issues that have relevance and implications for present realities. (I’ve written more about design fiction in prior blog posts).

We build on Lindley et al.’s proposal to use design fiction to study the “implications for adoption” of emerging technologies. They argue that design fiction can “create plausible, mundane, and speculative futures, within which today’s emerging technologies may be prototyped as if they are domesticated and situated,” which we can then analyze with a range of lenses, such as those from science and technology studies. For us, this lets us think about technologies beyond ideal use cases. It lets us be attuned to the experiences of power and inequalities that people experience today, and interrogate how emerging technologies might get taken up, reused, and reinterpreted in a variety of existing social relations and systems of power.

To explore this, we created a set of interconnected design fictions that exist within the same fictional universe, showing different sites of adoptions and interactions. We build on Coulton et al.’s insight that design fiction can be a “world-building” exercise; design fictions can simultaneously exist in the same imagined world and provide multiple “entry points” into that world.

We created 4 design fictions that exist in the same world: (1) a README for a fictional BCI API, (2) a question on Stack Overflow from a programmer who is working with the API, (3) an internal business memo from an online dating company, (4) a set of forum posts by crowdworkers who use BCIs to do content moderation tasks. These are downloadable at our project page if you want to see them in more detail.
(I’ll also note that we conducted our work in the United States, and that our authorship of these fictions, as well as interpretations and analysis are informed by this sociocultural context.)

Design Fiction 1: README documentation of an API for identifying P300 spikes in a stream of EEG signals

First, this is README documentation of an API for identifying P300 spikes in a stream of EEG signals. The P300 response, or “oddball” response is a real phenomenon. It’s a spike in brain activity when a person is either surprised, or when they see something that they’re looking for. This fictional API helps identify those spikes in EEG data. We made this fiction in the form of a GitHub page to emphasize the everyday nature of this documentation, from the viewpoint of a software developer.

In the fiction, the algorithms underlying this API come from a specific set of training data from a controlled environment in a university research lab. The API discloses and openly links to the data that its algorithms were trained on. In our creation and analysis of this fiction, it surfaces for us an ambiguity and a tension about how generalizable the system’s model of the brain is. The API with a README implies that the system is meant to be generalizable, despite some indications based on its training dataset that it might be more limited.

This fiction also gestures more broadly toward the involvement of academic research in larger technical infrastructures. The documentation notes that the API started as a research project by a professor at a university before becoming hosted and maintained by a large tech company. For us, this highlights how collaborations between research and industry may produce artifacts that move into broader contexts. Yet researchers may not be thinking about the potential effects or implications of their technical systems in these broader contexts.
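
To make the idea of "identifying P300 spikes in a stream of EEG signals" a little more concrete, here is a minimal Python sketch. It is not the fictional API from our README, whose internals we leave unspecified. The function name `find_p300_candidates`, the 250 Hz sampling rate, the 250 to 500 ms search window, and the z-score threshold are all illustrative assumptions. Real P300 detection typically averages over many trials and relies on trained classifiers rather than a single threshold.

```python
from statistics import mean, pstdev


def find_p300_candidates(samples, stimulus_indices, sample_rate_hz=250,
                         window_ms=(250, 500), z_threshold=3.0):
    """Return the stimulus indices that are followed by a large positive peak.

    All parameters are hypothetical defaults chosen for illustration only.
    """
    # Use the whole recording as a crude baseline for "normal" activity.
    baseline_mean = mean(samples)
    baseline_sd = pstdev(samples) or 1.0  # avoid dividing by zero on flat data

    # Convert the search window from milliseconds to sample offsets.
    start_offset = int(window_ms[0] * sample_rate_hz / 1000)
    end_offset = int(window_ms[1] * sample_rate_hz / 1000)

    hits = []
    for stim in stimulus_indices:
        window = samples[stim + start_offset: stim + end_offset]
        if not window:
            continue  # stimulus too close to the end of the recording
        # Flag the stimulus if the peak in the window stands far above baseline.
        peak_z = (max(window) - baseline_mean) / baseline_sd
        if peak_z >= z_threshold:
            hits.append(stim)
    return hits
```

Even in this toy form, the tension we discuss is visible: the window and threshold encode assumptions about whose brains, and which stimuli, the detector was tuned on.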
Design Fiction 2: A question on Stack Overflow

Second, a developer, Jay, is working with the BCI API to develop a tool for content moderation. He asks a question on Stack Overflow, a real website for developers to ask and answer technical questions. He questions the API’s applicability beyond lab-based stimuli, asking “do these ‘lab’ P300 responses really apply to other things? If you are looking over messages to see if any of them are abusive, will we really see the ‘same’ P300 response?” The answers from other developers suggest that they predominantly believe the API is generalizable to a broader class of tasks, with the most agreed-upon answer saying “The P300 is a general response, and should apply perfectly well to your problem.”

This fiction helps us explore how and where contestation may occur in technical communities, and where discussion of social values or social implications could arise. We imagine the first developer, Jay, as someone who is sensitive to the way the API was trained, and questions its applicability to a new domain. However, he encounters the commenters who believe that physiological signals are always generalizable, and don’t engage with questions of broader applicability. The community’s answers reinforce notions not just of what the technical artifacts can do, but what the human brain can do. The Stack Overflow answers draw on a popular, though critiqued, notion of the “brain-as-computer,” framing the brain as a processing unit with generic processes that take inputs and produce outputs. Here, this notion is reinforced in the social realm on Stack Overflow.

Design Fiction 3: An internal business memo for a fictional online dating company

Meanwhile, SparkTheMatch.com, a fictional online dating service, is struggling to moderate and manage inappropriate user content on their platform. SparkTheMatch wants to utilize the P300 signal to tap into people’s tacit “gut feelings” to recognize inappropriate content.
They are planning to implement a content moderation process using crowdsourced workers wearing BCIs.

In creating this fiction, we use the memo to provide insight into some of the practices and labor supporting the BCI-assisted review process from the company’s perspective. The memo suggests that the use of BCIs with Mechanical Turk will “help increase efficiency” for crowdworkers while still giving them a fair wage. The crowdworkers sit and watch a stream of flashing content while wearing a BCI, and the P300 response will subconsciously identify when workers recognize supposedly abnormal content. Yet we find it debatable whether or not this process improves the material conditions of the Turk workers. The amount of content to look at in order to make the supposedly fair wage may not actually be reasonable.

SparkTheMatch employees creating the Mechanical Turk tasks don’t directly interact with the BCI API. Instead they use pre-defined templates created by the company’s IT staff, a much more mediated interaction compared to the programmers and developers reading documentation and posting on Stack Overflow. By this point, the research lab origins of the P300 API underlying the service and questions about its broader applicability are hidden. From the viewpoint of SparkTheMatch staff, the BCI aspects of their service just “work,” allowing managers to design their workflows around it, obfuscating the inner workings of the P300 API.

Design Fiction 4: A crowdworker forum for workers who use BCIs

Fourth, the Mechanical Turk workers who do the SparkTheMatch content moderation work share their experiences on a crowdworker forum. These crowd workers’ experiences of and relationships to the P300 API are strikingly different from those of the people and organizations described in the other fictions; notably, the API is something that they do not get to explicitly see. Aspects of the system are blackboxed or hidden away.
While one poster discusses some errors that occurred, there’s ambiguity about whether fault lies with the BCI device or the data processing. EEG signals are not easily human-comprehensible, making feedback mechanisms difficult. Other posters blame the user for the errors, which is problematic given the precariousness of these workers’ positions, as crowd workers tend to have few forms of recourse when encountering problems with tasks.

For us, these forum accounts are interesting because they describe a situation in which the BCI user is not the person who obtains the real benefits of its use. It’s the company SparkTheMatch, not the BCI end-users, that is obtaining the most benefit from BCIs.

# Some Emergent Themes and Reflections

From these design fictions, several salient themes arose for us. First, by looking at BCIs from the perspective of several everyday experiences, we can see different types of work done in relation to BCIs – whether that’s doing software development, being a client for a BCI-service, or using the BCI to conduct work. Our fictions are inspired by others’ research on the existing labor relationships and power dynamics in crowdwork and distributed content moderation (in particular work by scholars Lilly Irani and Sarah T. Roberts). Here we also critique utopian narratives of brain-controlled computing that suggest BCIs will create new efficiencies, seamless interactions, and increased productivity. We investigate a set of questions on the role of technology in shaping and reproducing social and economic inequalities.

Second, we use the design fiction to surface questions about the situatedness of brain sensing, questioning how generalizable and universal physiological signals are. Building on prior accounts of situated actions and extended cognition, we note that the specific and the particular should be taken into account in the design of supposedly generalizable BCI systems.

These themes arose iteratively, and were somewhat surprising for us, particularly just how different the BCI system looks from each of the different perspectives in the fictions. We initially set out to create a rather mundane fictional platform or infrastructure, an API for BCIs. With this starting point we brainstormed other types of direct and indirect relationships people might have with our BCI API to create multiple “entry points” into our API’s world. We iterated on various types of relationships and artifacts: there are end-users, but also clients, software engineers, and app developers, each of whom might interact with an API in different ways, directly or indirectly. Through iterations of different scenarios (a BCI-assisted tax filing service was thought of at one point), and through discussions with our colleagues (some of whom posed questions about what labor in higher education might look like with BCIs), we slowly began to think that looking at the work practices implicated in these different relationships and artifacts would be a fruitful way to focus our designs.

## Toward “Platform Fictions”

In part, we think that creating design fictions in mundane technical forms like documentation or Stack Overflow posts might help the artifacts be legible to software engineers and technical researchers. More generally, this leads us to think more about what it might mean to put platforms and infrastructures at the center of design fiction (as well as build on some of the insights from platform studies and infrastructure studies). Adoption and use do not occur in a vacuum. Rather, technologies get adopted into and by existing sociotechnical systems. We can use design fiction to open the “black boxes” of emerging sociotechnical systems. Given that infrastructures are often relegated to the background in everyday use, surfacing and focusing on an infrastructure helps us situate our design fictions in the everyday and mundane, rather than dystopia or utopia.

We find that using a digital infrastructure as a starting point helps surface multiple subject positions in relation to the system at different sites of interaction, beyond those of end-users. From each of these subject positions, we can see where contestation may occur, and how the system looks different. We can also see how assumptions, values, and practices surrounding the system at a particular place and time can be hidden, adapted, or changed by the time the system reaches others. Importantly, we also try to surface ways the system gets used in potentially unintended ways – we don’t think that the academic researchers who developed the API to detect brain signal spikes imagined that it would be used in a system of arguably exploitative crowd labor for content moderation.

Our fictions try to blur clear distinctions that might suggest what happens in “labs” is separate from “the outside world,” instead highlighting their entanglements. Given that much of BCI research currently exists in research labs, we raise this point to argue that BCI researchers and designers should also be concerned about the implications of adoption and application. This helps give us insight into the responsibilities (and complicity) of researchers and builders of technical systems. Some of the recent controversies around Cambridge Analytica’s use of Facebook’s API point to ways in which the building of platforms and infrastructures isn’t neutral, and suggest that it’s incumbent upon designers, developers, and researchers to raise issues related to social concerns and potential inequalities related to adoption and appropriation by others.

## Concluding Thoughts

This work isn’t meant to be predictive. The fictions and analysis present our specific viewpoints by focusing on several types of everyday experiences. One can read many themes into our fictions, and we encourage others to do so.
But we find that focusing on potential adoptions of an emerging technology in the everyday and mundane helps surface contours of debates that might occur, which might not be immediately obvious when thinking about BCIs – and might not be immediately obvious if we think about social implications in terms of “worst case scenarios” or dystopias. We hope that this work can raise awareness among BCI researchers and designers about social responsibilities they may have for their technology’s adoption and use.

In future work, we plan to use these fictions as research probes to understand how technical researchers envision BCI adoptions and their social responsibilities, building on some of our prior projects. And for design researchers, we show that using a fictional platform in design fiction can help raise important social issues about technology adoption and use from multiple perspectives beyond those of end-users, and help surface issues that might arise from unintended or unexpected adoption and use. Using design fiction to interrogate sociotechnical issues present in the everyday can better help us think about the futures we desire.

Crossposted with the UC Berkeley BioSENSE Blog

Ph.D. student

#### The California Consumer Privacy Act of 2018: a deep dive

I have given the California Consumer Privacy Act of 2018 a close read. In summary, the Act grants consumers a right to request that businesses disclose the categories of information about them that they collect and sell, and gives consumers the right to have businesses delete their information and to opt out of its sale.

What follows are points I found particularly interesting. Quotations from the Act (that’s what I’ll call it) will be in bold. Questions (meaning, questions that I don’t have an answer to at the time of writing) will be in italics.

#### Privacy rights

SEC. 2.
The Legislature finds and declares that:

(a) In 1972, California voters amended the California Constitution to include the right of privacy among the “inalienable” rights of all people. …

I did not know that. I was under the impression that in the United States, the ‘right to privacy’ was a matter of legal interpretation, derived from other more explicitly protected rights. A right to privacy is enumerated in Article 12 of the Universal Declaration of Human Rights, adopted in 1948 by the United Nations General Assembly. There’s something like a right to privacy in Article 8 of the 1950 European Convention on Human Rights. California appears to have followed their lead on this.

In several places in the Act, it specifies that exceptions may be made in order to be compliant with federal law. Is there an ideological or legal disconnect between privacy in California and privacy nationally? Consider the Snowden/Schrems/Privacy Shield issue: exchanges of European data to the United States are given protections from federal surveillance practices. This presumably means that the U.S. federal government agrees to respect EU privacy rights. Can California negotiate for such treatment from the U.S. government?

These are the rights specifically granted by the Act:

[SEC. 2.] (i) Therefore, it is the intent of the Legislature to further Californians’ right to privacy by giving consumers an effective way to control their personal information, by ensuring the following rights:

(1) The right of Californians to know what personal information is being collected about them.

(2) The right of Californians to know whether their personal information is sold or disclosed and to whom.

(3) The right of Californians to say no to the sale of personal information.

(4) The right of Californians to access their personal information.

(5) The right of Californians to equal service and price, even if they exercise their privacy rights.

It has been only recently that I’ve been attuned to the idea of privacy rights. Perhaps this is because I am from a place that apparently does not have them. A comparison that I believe should be made more often is the comparison of privacy rights to property rights. Clearly privacy rights have become as economically relevant as property rights. But currently, property rights enjoy a widespread acceptance and enforcement that privacy rights do not.

#### Personal information defined through example categories

“Information” is a notoriously difficult thing to define. The Act gets around the problem of defining “personal information” by repeatedly providing many examples of it. The examples are themselves rather abstract and are implicitly “categories” of personal information. Categorization of personal information is important to the law because under several conditions businesses must disclose the categories of personal information collected, sold, etc. to consumers.

SEC. 2. (e) Many businesses collect personal information from California consumers. They may know where a consumer lives and how many children a consumer has, how fast a consumer drives, a consumer’s personality, sleep habits, biometric and health information, financial information, precise geolocation information, and social networks, to name a few categories.

[1798.140.] (o) (1) “Personal information” means information that identifies, relates to, describes, is capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household. Personal information includes, but is not limited to, the following:

(A) Identifiers such as a real name, alias, postal address, unique personal identifier, online identifier Internet Protocol address, email address, account name, social security number, driver’s license number, passport number, or other similar identifiers.

(B) Any categories of personal information described in subdivision (e) of Section 1798.80.

(C) Characteristics of protected classifications under California or federal law.

(D) Commercial information, including records of personal property, products or services purchased, obtained, or considered, or other purchasing or consuming histories or tendencies.

Note that protected classifications (1798.140.(o)(1)(C)) include race, which is a socially constructed category (see Omi and Winant on racial formation). The Act appears to be saying that personal information includes the race of the consumer. Contrast this with information as identifiers (see 1798.140.(o)(1)(A)) and information as records (1798.140.(o)(1)(D)). So “personal information” in one case is the property of a person (and a socially constructed one at that); in another case it is the specific syntactic form; in another case it is a document representing some past action. The Act is very ontologically confused.

Other categories of personal information include (continuing this last section):

(E) Biometric information.

(F) Internet or other electronic network activity information, including, but not limited to, browsing history, search history, and information regarding a consumer’s interaction with an Internet Web site, application, or advertisement.

Devices and Internet activity will be discussed in more depth in the next section.

(G) Geolocation data.

(H) Audio, electronic, visual, thermal, olfactory, or similar information.

(I) Professional or employment-related information.

(J) Education information, defined as information that is not publicly available personally identifiable information as defined in the Family Educational Rights and Privacy Act (20 U.S.C. section 1232g, 34 C.F.R. Part 99).

(K) Inferences drawn from any of the information identified in this subdivision to create a profile about a consumer reflecting the consumer’s preferences, characteristics, psychological trends, preferences, predispositions, behavior, attitudes, intelligence, abilities, and aptitudes.

Given that the main use of information is to support inferences, it is notable that inferences are dealt with here as a special category of information, and that sensitive inferences are those that pertain to behavior and psychology. This may be narrowly interpreted to exclude some kinds of inferences that may be relevant and valuable but not so immediately recognizable as ‘personal’. For example, one could infer from personal information the ‘position’ of a person in an arbitrary multi-dimensional space that compresses everything known about a consumer, and use this representation for targeted interventions (such as advertising). Or one could interpret it broadly: since almost all personal information is relevant to ‘behavior’ in a broad sense, any inference from it is also ‘about behavior’, and is therefore protected.

#### Device behavior

The Act focuses on the rights of consumers and deals somewhat awkwardly with the fact that most information collected about consumers is done indirectly through machines. The Act acknowledges that sometimes devices are used by more than one person (for example, when they are used by a family), but it does not deal easily with other forms of sharing arrangements (e.g., an open Wi-Fi hotspot) and the problems associated with identifying which person a particular device’s activity is “about”.

[1798.140.] (g) “Consumer” means a natural person who is a California resident, as defined in Section 17014 of Title 18 of the California Code of Regulations, as that section read on September 1, 2017, however identified, including by any unique identifier. [SB: italics mine.]

[1798.140.]
(x) “Unique identifier” or “Unique personal identifier” means a persistent identifier that can be used to recognize a consumer, a family, or a device that is linked to a consumer or family, over time and across different services, including, but not limited to, a device identifier; an Internet Protocol address; cookies, beacons, pixel tags, mobile ad identifiers, or similar technology; customer number, unique pseudonym, or user alias; telephone numbers, or other forms of persistent or probabilistic identifiers that can be used to identify a particular consumer or device. For purposes of this subdivision, “family” means a custodial parent or guardian and any minor children over which the parent or guardian has custody.

Suppose you are a business that collects traffic information and website behavior connected to IP addresses, but you don’t go through the effort of identifying the ‘consumer’ who is doing the behavior. In fact, you may collect a lot of traffic behavior that is not connected to any particular ‘consumer’ at all, but is rather the activity of a bot or crawler operated by a business. Are you on the hook to disclose personal information to consumers if they ask for their traffic activity? Does the answer depend on whether or not they provide their IP address?

Incidentally, while the Act seems comfortable defining a Consumer as a natural person identified by a machine address, it also happily defines a Person as “proprietorship, firm, partnership, joint venture, syndicate, business trust, company, corporation, …” etc. in addition to “an individual”. Note that “personal information” is specifically information about a consumer, not a Person (e.g., a business).

This may make you wonder what a Business is, since these are the entities that are bound by the Act.

#### Businesses and California

The Act mainly details the rights that consumers have with respect to businesses that collect, sell, or lose their information. But what is a business?

[1798.140.]
(c) “Business” means: (1) A sole proprietorship, partnership, limited liability company, corporation, association, or other legal entity that is organized or operated for the profit or financial benefit of its shareholders or other owners, that collects consumers’ personal information, or on the behalf of which such information is collected and that alone, or jointly with others, determines the purposes and means of the processing of consumers’ personal information, that does business in the State of California, and that satisfies one or more of the following thresholds: (A) Has annual gross revenues in excess of twenty-five million dollars ($25,000,000), as adjusted pursuant to paragraph (5) of subdivision (a) of Section 1798.185.

(B) Alone or in combination, annually buys, receives for the business’ commercial purposes, sells, or shares for commercial purposes, alone or in combination, the personal information of 50,000 or more consumers, households, or devices.

(C) Derives 50 percent or more of its annual revenues from selling consumers’ personal information.

This is not a generic definition of a business, just as the earlier definition of ‘consumer’ is not a generic definition of consumer. This definition of ‘business’ is a sui generis definition for the purposes of consumer privacy protection, as it defines businesses in terms of their collection and use of personal information. The definition explicitly thresholds the applicability of the law to businesses over certain limits.

There does appear to be a lot of wiggle room and potential for abuse here. Consider: the Mirai botnet had by one estimate 2.5 million devices compromised. Say you are a small business that collects site traffic. Suppose the Mirai botnet targets your site with a DDOS attack. Suddenly, your business collects information of millions of devices, and the Act comes into effect. Now you are liable for disclosing consumer information. Is that right?
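
The literal reading behind this scenario can be sketched in a few lines of Python. This illustrates the three thresholds as quoted above and is not legal analysis: the class and function names are hypothetical, the dollar figure ignores the adjustment mechanism in Section 1798.185, and threshold (C) is computed against gross revenue as a simplification.

```python
from dataclasses import dataclass

# Figures taken from the quoted thresholds (A), (B), and (C).
REVENUE_THRESHOLD_USD = 25_000_000
RECORDS_THRESHOLD = 50_000
SALE_REVENUE_SHARE = 0.5


@dataclass
class BusinessProfile:
    """Hypothetical summary of a business for threshold checking."""
    annual_gross_revenue_usd: float
    consumers_households_devices: int  # count named in threshold (B)
    revenue_from_selling_pi_usd: float


def thresholds_met(profile):
    """Return the letters of the thresholds a profile satisfies, read literally."""
    met = []
    if profile.annual_gross_revenue_usd > REVENUE_THRESHOLD_USD:
        met.append("A")
    if profile.consumers_households_devices >= RECORDS_THRESHOLD:
        met.append("B")
    revenue = profile.annual_gross_revenue_usd
    if revenue > 0 and profile.revenue_from_selling_pi_usd / revenue >= SALE_REVENUE_SHARE:
        met.append("C")
    return met


small_shop = BusinessProfile(
    annual_gross_revenue_usd=200_000,
    consumers_households_devices=1_200,
    revenue_from_selling_pi_usd=0,
)
under_attack = BusinessProfile(
    annual_gross_revenue_usd=200_000,
    consumers_households_devices=2_500_000,  # botnet traffic counted as devices
    revenue_from_selling_pi_usd=0,
)
print(thresholds_met(small_shop))    # []
print(thresholds_met(under_attack))  # ['B']
```

On this reading, nothing about the business itself changes between the two profiles; only the device count does, and that alone is enough to meet one of the thresholds.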

An alternative reading of this section would recall that the definition (!) of consumer, in this law, is a California resident. So maybe the thresholds in 1798.140.(c)(1)(B) and 1798.140.(c)(1)(C) refer specifically to Californian consumers. Of course, for any particular device, information about where that device’s owner lives is personal information.

Having 50,000 California customers or users is a decent threshold for defining whether or not a business “does business in California”. Given the size and demographics of California, you would expect many major Chinese technology companies, Tencent for example, to have 50,000 Californian users. This brings up the question of extraterritorial enforcement, which gave the GDPR so much leverage.

#### Extraterritoriality and financing

In a nutshell, it looks like the Act is intended to allow Californians to sue foreign companies. How big a deal is this? The penalties for noncompliance are civil penalties and a price per violation (presumably individual violation), not a ratio of profit, but you could imagine them adding up:


Ph.D. student

#### So you want to start a data science institute? Achieving sustainability

This is a post that first appeared on the Software Sustainability Institute’s blog and was co-authored by myself, Alejandra Gonzalez-Beltran, Robert Haines, James Hetherington, Chris Holdgraf, Heiko Mueller, Martin O’Reilly, Tomas Petricek, Jake VanderPlas (authors in alphabetical order) during a workshop at the Alan Turing Institute.

## Introduction: Sustaining Data Science and Research Software Engineering

Data and software have enmeshed themselves in the academic world, and are a growing force in most academic disciplines (many of which are not traditionally seen as “data-intensive”). Many universities wish to improve their ability to create software tools, enable efficient data-intensive collaborations, and spread the use of “data science” methods in the academic community.

The fundamentally cross-disciplinary nature of such activities has led to a common model: the creation of institutes or organisations not bound to a particular department or discipline, focusing on the skills and tools that are common across the academic world. However, creating institutes with a cross-university mandate and non-standard academic practices is challenging. These organisations often do not fit into the “traditional” academic model of institutes or departments, and involve work that is not incentivised or rewarded under traditional academic metrics. To add to this challenge, the combination of quantitative and qualitative skills needed is also highly in-demand in non-academic sectors. This raises the question: how do you create such institutes so that they attract top-notch candidates, sustain themselves over time, and provide value both to members of the group as well as the broader university community?

In recent years many universities have experimented with organisational structures aimed at achieving this goal. They focus on combining research software, data analytics, and training for the broader academic world, and intentionally cut across scientific disciplines. Two such groups are the Moore-Sloan Data Science Environments based in the USA and the Research Software Engineer groups based in the UK. Representatives from both countries recently met at the Alan Turing Institute in London for the RSE4DataScience18 Workshop to discuss their collective experiences in creating successful data science and research software institutes.

This article synthesises the collective experience of these groups, with a focus on challenges and solutions around the topic of sustainability. To put it bluntly: a sustainable institute depends on sustaining the people within it. This article focuses on three topics that have proven crucial.

1. Creating consistent and competitive funding models.
2. Building a positive culture and an environment where all members feel valued.
3. Defining career trajectories that cater to the diverse goals of members within the organisation.

We’ll discuss each of these points below, and provide some suggestions, tips, and lessons-learned in accomplishing each.

### An Aside on Nomenclature

The terms Research Software Engineer (i.e. RSE; most often used by UK partners) and Data Scientist (most often used by USA partners) have slightly different connotations, but we will not dwell on those aspects here (see Research Software Engineers and Data Scientists: More in Common for some more thoughts on this). In the current document, we will mostly use the terms RSE and Data Scientist interchangeably, to denote the broad range of positions that focus on software-intensive and data-intensive research within academia. In practice, we find that most people flexibly operate in both worlds simultaneously.

## Challenges & Proposed Solutions

### Challenge: Financial sustainability

How can institutions find the financial support to run an RSE program?

The primary challenge for sustainability of this type of program is often financial: how do you raise the funding necessary to hire data scientists and support their research? While this doesn’t require paying industry-leading rates for similar work, it does require resources to compensate people comfortably. In practice, institutions have come at this from a number of angles:

Private Funding: Funding from private philanthropic organisations has been instrumental in getting some of these programs off the ground: for example, the Moore-Sloan Data Science Initiative funded these types of programs for five years at the University of Washington (UW), UC Berkeley, and New York University (NYU). This is probably best viewed as seed funding to help the institutions get on their feet, with the goal of seeking other funding sources for the long term.

Organisational Grants: Many granting organisations (such as the NSF or the UK Research Councils) have seen the importance of software to research, and are beginning to make funding available specifically for cross-disciplinary software-related and data science efforts. Examples are the Alan Turing Institute, mainly funded by the UK Engineering and Physical Sciences Research Council (EPSRC) and the NSF IGERT grant awarded to UW, which funded the interdisciplinary graduate program centered on the data science institute there.

Project-based Grants: There are also opportunities to gain funding for the development of software or to carry out scientific work that requires creating new tools. For example, several members of UC Berkeley were awarded a grant from the Sloan Foundation to hire developers for the NumPy software project. The grant provided enough funding to pay wages competitive with the broader tech community in the Bay Area.

Individual Grants: For organisations that give their RSEs principal investigator status, grants to individuals’ research programs can be a route to sustainable funding, particularly as granting organisations become more aware of and attuned to the importance of software in science. In the UK, the EPSRC has run two rounds of Research Software Engineer Fellowships, supporting leaders in the research software field for a period of five years to establish their RSE groups. Another example of a small grant for individuals promoting and supporting RSE activities is the Software Sustainability Institute fellowship.

Paid Consulting: Some RSE organisations have adopted a paid consulting model, in which they fund their institute by consulting with groups both inside and outside the university. This requires finding common goals with non-academic organisations, and agreeing to create open tools in order to accomplish those goals. An example is at Manchester, where as part of their role in research IT, RSEs provide paid on-demand technical research consulting services for members of the University community. Having a group of experts on campus able to do this sort of work is broadly beneficial to the University as a whole.

University Funding: Universities generally spend part of their budget on in-house services for students and researchers; a prime example is IT departments. When RSE institutes establish themselves as providing a benefit to the University community, the University administration may see fit to support those efforts: this has been the case at UW, where the University funds faculty positions within the data science institute. In addition, several RSE groups perform on-demand training sessions to research groups on campus in exchange for proceeds from research grants.

Information Technology (IT) Connections: IT organisations in universities are generally well-funded, and their present-day role is often far removed from their original mission of supporting computational research. One vision for sustainability is to reimagine RSE programs as the “research wing” of university IT, to make use of the relatively large IT funding stream to help enable more efficient computational research. This model has been implemented at the University of Manchester, where Research IT sits directly within the Division of IT Services. Some baseline funding is provided to support things like research application support and training, and RSE projects are funded via cost recovery.

Professors of Practice: Many U.S. universities have the notion of “professors of practice” or “clinical professors,” which often exist in professional schools like medicine, public policy, business, and law. In these positions, experts with specialised fields are recruited as faculty for their experience outside of traditional academic research. Such positions are typically salaried, but not tenure-track, with these faculty evaluated on different qualities than traditional faculty. Professors of practice are typically able to teach specialised courses, advise students, influence the direction of their departments, and get institutional support for various projects. Such a model could be applied to support academic data science efforts, perhaps by adopting the “Professor of practice” pattern within computational science departments.

Research Librarians: We also see similarities in how academic libraries have supported stable, long-term career paths for their staff. Many academic librarians are experts in both a particular domain specialty and in library science, and spend much of their time helping members of the community with their research. At some universities, librarians have tenure-track positions equivalent to those in academic departments, while at others, librarians are on a distinct administrative or staff track that often has substantial long-term job security and career progression. These types of institutions and positions provide a precedent for the kinds of flexible, yet stable academic careers that our data science institutes support.

### Challenge: Community cohesion and personal value

How to create a successful environment where people feel valued?

From our experience, there are four main points that help create an enjoyable and successful environment that makes people feel valued in their role.

Physical Space. The physical space that hosts the group plays an important role in creating an enjoyable working environment. In most cases there will be a lot of collaboration going on between people within the group but also with people from other departments within the university. Having facilities (e.g. meeting spaces) that support collaborative work on software projects will be a big facilitator for successful outputs.

Get Started Early. Another important aspect of creating a successful environment is to connect the group to other researchers within the university early on. It is important to inform people about the tasks and services the group provides, and to involve people early on who are well connected and respected within the university so that they can promote and champion the group. This helps get the efforts off the ground early, spreads the word, and brings on further opportunities.

Celebrate Each Other’s Work. While it may not be possible to convince the broader academic community to treat software as first-class research output, data science organisations should explicitly recognise many forms of scientific output, including tools and software, analytics workflows, or non-standard written communication. This is especially true for projects where there is no “owner”, such as major open-source projects. Just because your name isn’t “first” doesn’t mean you can’t make a valuable contribution to science. Creating a culture that celebrates these efforts makes individuals feel that their work is valued.

Allow Free Headspace. The roles of individuals should (i) enable them to work in collaboration with researchers from other domains (e.g., in a support role on their research projects) and (ii) also allow them to explore their own ‘research’ ideas. Involvement in research projects not only helps these projects develop reliable and reproducible results but can be an important source to help identify areas and tasks that are currently poorly supported by existing research software. Having free headspace allows individuals to further pursue ideas that help solve the identified tasks. There are a lot of examples of successful open source software projects that have started as small side projects.

### Challenge: Preparing members for a diversity of careers

How do we establish career trajectories that value people’s skills and experience in this new inter-disciplinary domain?

The final dimension that we consider is that of the career progression of data scientists. Their career path generally differs from the traditional academic progression, and the traditional academic incentives and assessment criteria do not necessarily apply to the work they perform.

Professional Development. A data science institute should prepare its staff both in technical skills (such as software development best practices and data-intensive activities) as well as soft skills (such as team work and communication skills) that would allow them to be ready for their next career step in multiple interdisciplinary settings. Whether it is in academia or industry, data science is inherently collaborative, and requires working with a team composed of diverse skillsets.

Where Next. Most individuals will not spend their entire careers within a data science institute, which means their time must be seen as adequately preparing them for their next step. We envision that a data scientist could progress in their career either staying in academia, or moving to industry positions. For the former, career progression might involve moving to new supervisory roles, attaining PI status, or building research groups. For the latter, the acquired technical and soft skills are valuable in industrial settings and should allow for a smooth transition. Members should be encouraged to collaborate or communicate with industry partners in order to understand the roles that data analytics and software play in those organisations.

The Revolving Door. The career trajectory from academia to industry has traditionally been mostly a one-way street, with academic researchers and industry engineers living in different worlds. However, the value of data analytic methods cuts across both groups, and offers opportunities to learn from one another. We believe a Data Science Institute should encourage strong collaborations and a bi-directional and fluid interchange between academic and industrial endeavours. This will enable a more rapid spread of tools and best-practices, and support the intermixing of career paths between research and industry. We see the institute as ‘the revolving door’ with movement of personnel between different research and commercial roles, rather than a one-time commitment where members must choose one or the other.

## Final Thoughts

Though these efforts are still young, we have already seen the dividends of supporting RSEs and Data Scientists within our institutions in the USA and the UK. We hope this document can provide a roadmap for other institutions to develop sustainable programs in support of cross-disciplinary software and research.

#### Research Software Engineers and Data Scientists: More in Common

This is a post that first appeared on the Software Sustainability Institute’s blog and was co-authored by Matthew Archer, Stephen Dowsland, Rosa Filgueira, R. Stuart Geiger, Alejandra Gonzalez-Beltran, Robert Haines, James Hetherington, Christopher Holdgraf, Sanaz Jabbari Bayandor, David Mawdsley, Heiko Mueller, Tom Redfern, Martin O’Reilly, Valentina Staneva, Mark Turner, Jake VanderPlas, Kirstie Whitaker (authors in alphabetical order) during a workshop at the Alan Turing Institute.

In our institutions, we employ multidisciplinary research staff who work with colleagues across many research fields to use and create software to understand and exploit research data. These researchers collaborate with others across the academy to create software and models to understand, predict and classify data not just as a service to advance the research of others, but also as scholars with opinions about computational research as a field, making supportive interventions to advance the practice of science.

Some of us use the term “data scientist” to refer to our team members, others use “research software engineer” (RSE), and some use both. Where both terms are used, the difference seems to be that data scientists in an academic context focus more on using software to understand data, while research software engineers more often make software libraries for others to use. However, in some places, one or other term is used to cover both, according to local tradition.

### What we have in common

Regardless of job title, we hold in common many of the skills involved and the goal of driving the use of open and reproducible research practices.

Shared skill focuses include:

• Literate programming: writing code to be read by humans.
• Performant programming: the time or memory used by the code really matters.
• Algorithmic understanding: you need to know what the maths of the code you’re working with actually does.
• Coding for a product: software and scripts need to live beyond the author, being used by others.
• Verification and testing: it’s important that the script does what you think it does.
• Scaling beyond the laptop: because performance matters, cloud and HPC skills are important.
• Data wrangling: parsing, managing, linking and cleaning research data in an arcane variety of file formats.
• Interactivity: the visual display of quantitative information.

Shared attitudes and approaches to work are also important commonalities:

• Multidisciplinary agility: the ability to learn what you need from a new research domain as you begin a collaboration.
• Navigating the research landscape: learning the techniques, languages, libraries and algorithms you need as you need them.
• Managing impostor syndrome: as generalists, we know we don’t know the detail of our methods quite as well as the focused specialists, and we know how to work with experts when we need to.

### Our differences emerge from historical context

The very close relationship thus seen between the two professional titles is not an accident. In different places, different tactics have been tried to resolve a common set of frustrations seen as scholars struggle to make effective use of information technology.

In the UK, the RSE Groups have tried to move computational research forward by embracing a service culture while retaining participation in the academic community, sometimes described as being both a “craftsperson and a scholar”, or science-as-a-service. We believe we make a real difference to computational research as a discipline by helping individual research groups use and create software more effectively for research, and that this helps us to create genuine value for researchers rather than to build and publish tools that are not used by researchers to do research.

The Moore-Sloan Data Science Environments (MSDSE) in the US are working to establish Data Science as a new academic interdisciplinary field, bringing together researchers from domain and methodology fields to collectively develop best practices and software for academic research. While these institutes also facilitate collaboration across academia, their funding models are less based on a service model than in UK RSE groups and more based on bringing graduate students, postdocs, research staff, and faculty from across academia together in a shared environment.

Although these approaches differ strongly, we nevertheless see that the skills, behaviours and attitudes used by the people struggling to make this work are very similar. Both movements are tackling similar issues, but in different institutional contexts. We took diverging paths from a common starting point, but now find ourselves envisaging a shared future.

The Alan Turing Institute in the UK straddles the two models, with both a Research Engineering Group following a science-as-a-service model and comprising both Data Scientists and RSEs, and a wider collaborative academic data science engagement across eleven partner universities.

### Recommendations

Observing this convergence, we recommend:

• Create adverts and job descriptions that are welcoming to people who identify as one or the other title: the important thing is to attract and retain the right people.
• Standardised nomenclature is important, but over-specification is harmful. Don’t try too hard to delineate the exact differences in the responsibilities of the two roles: people can and will move between projects and focuses, and this is a good thing.
• These roles, titles, groups, and fields are emerging and defined differently across institutions. It is important to have clear messaging to various stakeholders about the responsibilities and expectations of people in these roles.
• Be open to evolving roles for team members, and ensure that stable, long-term career paths exist to support those who have taken the risk to work in emerging roles.
• Don’t restrict your recruitment drive to people who have worked with one or other of these titles: the skills you need could be found in someone whose earlier roles used the other term.
• Don’t be afraid to embrace service models to allow financial and institutional sustainability, but always maintain the genuine academic collaboration needed for research to flourish.

## April 16, 2018

Ph.D. student

#### Keeping computation open to interpretation: Ethnographers, step right in, please

This is a post that first appeared on the ETHOSLab Blog, written by myself, Bastian Jørgensen (PhD fellow at Technologies in Practice, ITU), Michael Hockenhull (PhD fellow at Technologies in Practice, ITU), Mace Ojala (Research Assistant at Technologies in Practice, ITU).

## Introduction: When is data science?

We recently held a workshop at ETHOS Lab and the Data as Relation project at ITU Copenhagen, as part of Stuart Geiger’s seminar talk on “Computational Ethnography and the Ethnography of Computation: The Case for Context” on 26th of March 2018. Tapping into his valuable experience, and position as a staff ethnographer at Berkeley Institute for Data Science, we wanted to think together about the role that computational methods could play in ethnographic and interpretivist research. Over the past decade, computational methods have exploded in popularity across academia, including in the humanities and interpretive social sciences. Stuart’s talk made an argument for a broad, collaborative, and pluralistic approach to the intersection of computation and ethnography, arguing that ethnography has many roles to play in what is often called “data science.”

Based on Stuart’s talk the previous day, we began the workshop with three different distinctions about how ethnographers can work with computation and computational data: First, the “ethnography of computation” is using traditional qualitative methods to study the social, organizational, and epistemic life of computation in a particular context: how do people build, produce, work with, and relate to systems of computation in their everyday life and work? Ethnographers have been doing such ethnographies of computation for some time, and many frameworks, from actor-network theory (Callon 1986; Law 1992) to “technography” (Jansen and Vellema 2011; Bucher 2012), have been useful to think about how to put computation at the center of these research projects.

Second, “computational ethnography” involves extending the traditional qualitative toolkit of methods to include the computational analysis of data from a fieldsite, particularly when working with trace or archival data that ethnographers have not generated themselves. Computational ethnography is not replacing methods like interviews and participant-observation with such methods, but supplementing them. Frameworks like “trace ethnography” (Geiger and Ribes 2010) and “computational grounded theory” (Nelson 2017) have been useful ways of thinking about how to integrate these new methods alongside traditional qualitative methods, while upholding the particular epistemological commitments that make ethnography a rich, holistic, situated, iterative, and inductive method. Stuart walked through a few Jupyter notebooks from a recent paper (Geiger and Halfaker, 2017) in which they replicated and extended a previously published study about bots in Wikipedia. In this project, they found computational methods quite useful in identifying cases for qualitative inquiry, and they also used ethnographic methods to inform a set of computational analyses in ways that were more specific to Wikipedians’ local understandings of conflict and cooperation than previous research.

Finally, the “computation of ethnography” (thanks to Mace for this phrasing) involves applying computational methods to the qualitative data that ethnographers generate themselves, like interview transcripts or typed fieldnotes. Qualitative researchers have long used software tools like NVivo, Atlas.TI, or MaxQDA to assist in the storage and analysis of data, but what are the possibilities and pitfalls of storing and analyzing our qualitative data in various computational ways? Even ethnographers who use more standard word processing tools like Google Docs or Scrivener for fieldnotes and interviews can use computational methods to organize, index, tag, annotate, aggregate and analyze their data. From topic modeling of text data to semantic tagging of concepts to network analyses of people and objects mentioned, there are many possibilities. As multi-sited and collaborative ethnography are also growing, what tools let us collect, store, and analyze data from multiple ethnographers around the world? Finally, how should ethnographers deal with the documents and software code that circulate in their fieldsites, which often need to be linked to their interviews, fieldnotes, memos, and manuscripts?
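
As one small illustration of what indexing, tagging, and network analysis of fieldnotes could look like, here is a minimal sketch using only the Python standard library. Everything in it is hypothetical: the fieldnote excerpts, the names, and the `codebook` are invented for the example, and simple keyword matching stands in for the richer semantic tagging mentioned above.

```python
from collections import Counter
from itertools import combinations

# Hypothetical fieldnote excerpts; in practice these would be loaded from files.
fieldnotes = [
    "Met Anna and Bo to discuss the dashboard before the budget meeting.",
    "Bo showed me the dashboard logs; Anna was not present.",
    "Budget meeting: Anna, Bo and Carla argued about data quality.",
]

# Hypothetical codebook mapping each tag to the words that trigger it.
codebook = {
    "anna": {"anna"},
    "bo": {"bo"},
    "carla": {"carla"},
    "dashboard": {"dashboard"},
    "budget": {"budget"},
}


def tags_in(note):
    """Return the set of codebook tags whose trigger words appear in a note."""
    words = {w.strip(".,;:!?").lower() for w in note.split()}
    return {tag for tag, triggers in codebook.items() if words & triggers}


# Index: which tags appear in which note.
tag_index = {i: tags_in(note) for i, note in enumerate(fieldnotes)}

# Network: how often two tags are mentioned in the same note.
edge_counts = Counter()
for tags in tag_index.values():
    for pair in combinations(sorted(tags), 2):
        edge_counts[pair] += 1

for (a, b), n in edge_counts.most_common(3):
    print(f"{a} -- {b}: {n}")
```

The sketch also shows a pitfall: the second note is counted as a co-mention of Anna and Bo even though it records that Anna was absent. Counts like these can point the ethnographer towards passages worth rereading, but they do not replace the reading.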

These are not hard-and-fast distinctions, but instead should be seen as sensitizing concepts that draw our attention to different aspects of the computation / ethnography intersection. In many cases, we spoke about doing all three (or wanting to do all three) in our own projects. Like all definitions, they blur as we look closer at them, but this does not mean we should abandon the distinctions. For example, computation of ethnography can also strongly overlap with computational ethnography, particularly when thinking about how to analyze unstructured qualitative data, as in Nelson’s computational grounded theory. Yet it was productive to have different terms to refer to particular scopings: our discussion of using topic modeling of interview transcripts to help identify common themes was different than our discussion of analyzing activity logs to see how prevalent a particular phenomenon is, and both were different than our discussion of a situated investigation of the invisible work of code and data maintenance.

We then worked through these issues in the specific context of two cases from ETHOS Lab and Data as Relation project, where Bastian and Michael are both studying public sector organizations in Denmark that work with vast quantities and qualities of data and are often seeking to become more “data-driven.” In the Danish tax administration (SKAT) and the Municipality of Copenhagen’s Department of Cultural and Recreational Activities, there are many projects that are attempting to leverage data further in various ways. For Michael, the challenge is to be able to trace how method assemblages and sociotechnical imaginaries of data travel between private organisations and sites to public organisations, and influence the way data is worked with and what possibilities data are associated with. Whilst doing participant-observation, Michael suggested that a “computation of ethnography” approach might make it easier to trace connections between disparate sites and actors.

## The ethnographer enters the perfect information organization

In one group, we explored the idea of the Perfect Information Organisation, or PIO, in which there are traces available of all workplace activity. This nightmarish panopticon construction would include video and audio surveillance of every meeting and interaction, detailed traces of every activity online, and detailed minutes on meetings and decisions. All of this would be available for the ethnographer, as she went about her work.

The PIO is of course a thought experiment designed to provoke the common desire or fantasy for more data. This is something we all often feel in our fieldwork, but we felt this raised many implicit risks if one combined and extended the three types of ethnography detailed earlier on. By thinking about the PIO, ludicrous though it might be, we would challenge ourselves to look at what sort of questions we could and should ask in such a situation. We came up with the following questions, although there are bound to be many more:

1. What do members know about the data being collected?
2. Does it change their behaviour?
3. What takes place outside of the “surveilled” space? I.e. what happens at the bar after work?
4. What spills out of the organisation, like when members of the organization visit other sites as part of their work?
5. How can such a system be slowed down and/or “disconcerted” (a concept from Helen Verran that we have found useful in thinking about data in context)?
6. How can such a system even exist as an assemblage of many surveillance technologies, and would not the weight of the labour sustaining it outstrip its ability to function?

What the list shows is that although the PIO may come off as a wet dream of the data-obsessed or fetishistic researcher, even it has limits as a hypothetical thought experiment. Information is always situated in a context, often defined in relation to where and what information is not available. Yet as we often see in our own fieldwork (and constantly in the public sphere), the fantasies of total or perfect information persist for powerful reasons. Our suggestion was that such a thought experiment would be a good initial exercise for the researcher about to embark on a mixed-methods/ANT/trace ethnography inspired research approach in a site heavily infused with many data sources. The challenge of what topics and questions to ask in ethnography is always as difficult as asking what kind of data to work with, even if we put computational methods and trace data aside. We brought up many tradeoffs in our own fieldwork, such as when getting access to archival data means that the ethnographer is not spending as much time in interviews or participant observation.

This also touches on some of the central questions which the workshop provoked but didn’t answer: what is the phenomenon we are studying, in any given situation? Is it the social life in an organisation, that life distributed over a platform and “real life” social interactions or the platform’s affordances and traces itself? While there is always a risk of making problematic methodological trade-offs in trying to get both digital and more classic ethnographic traces, there is also, perhaps, a methodological necessity in paying attention to the many different types of traces available when the phenomenon we are interested in takes place both online, at the bar and elsewhere. We concluded that ethnography’s intentionally iterative, inductive, and flexible approach to research applies to these methodological tradeoffs as well: as you get access to new data (either through traditional fieldwork or digitized data) ask what you are not focusing on as you see something new.

In the end, these reflections bear a distinct risk of indulging in fantasy: the belief that we can ever achieve a full view (the view from nowhere), or a holistic or even total view of social life in all its myriad forms, whether digital or analog. The principles of ethnography are most certainly not about exhausting the phenomenon, so we do well to remain wary of this fantasy. Today, ethnography is often theorized as documentation of an encounter between an ethnographer and people in a particular context, with the partial perspectives to be embraced. However, we do believe that it is productive to think through the PIO and to not write off in advance traces which do not correspond with an orthodox view of what ethnography might consider proper material or data.

## The perfect total information ethnographers

The second group’s conversation originated from the wish of an ethnographer to gain access to a document-sharing platform from the organization in which the ethnographer is doing fieldwork. Of course, it is not just one platform, but a loose collection of platforms in various stages of construction, adoption, and acceptance. As we know, ethnographers are not only careful about the wishes of others but also of their own wishes: how would it change their ethnography if they had access to countless internal documents, records, archives, and logs? So rather than “just doing (something)”, the ethnographer took a step back and became puzzled over wanting such a strange thing in the first place.

In the group, we speculated about what would happen if the ethnographer got their wish to get access to as much data as possible from the field. Would a “Google Street View” recorded from head-mounted 360° cameras into the site be too much? Probably. On highly mediated sites (Wikipedia served as an example during the workshop), plenty of traces are publicly left by design. Such archival completeness is a property of some media in some organizations, but not others. In ethnographies of computation, the wish of total access brings some particular problems (or opportunities) as a plenitude of traces and documents are being shared on digital platforms. We talked about three potential problems, the first and most obvious being that the ethnographer drowns in the available data. A second problem is that the ethnographer may come to believe that getting more access will provide them with a more “whole” or full picture of the situation. The final problem we discussed was whether the ethnographer would end up replicating the problems of the people in the organization they are studying, namely working out how to deal with a multitude of heterogeneous data in their work.

Besides discussing these problems, we also asked why the ethnographer would want access to the many documents and traces in the first place. What ideas of ethnography and epistemology does such a desire imply? Would the ethnographer want to “power up” their analysis by mimicking the rhetoric of “the more data the better”? Would the ethnographer add their own data (in the form of field notes and pictures) and, through visualisations, show a different perspective on the situation? Even though we reject the notion of a panoptic view on various grounds, we are still left with the question of how much data we need or should want as ethnographers. Imagine that we are puzzled by a particular discussion: would we benefit from having access to a large pile of documents or logs that we could computationally search through for further information? Or would more traditional ethnographic methods like interviews actually be better for the goals of ethnography?

### Bringing data home

“Bringing data home” is an idea and phrase that originates from the fieldsite and captures something about the intentions that are playing out. One must wonder what is implied by that idea, and what does the idea do. A straightforward reading would be that it describes a strategic and managerial struggle to cut off a particular data intermediary — a middleman — and restore a more direct data-relationship between the agency and actors using the data they provide. A product/design struggle, so to say. Pushing the speculations further, what might that homecoming, that completion of the re-redesign of data products be like? As ethnographers, and participants in the events we write about, when do we say “come home, data”, or “go home, data”? What ethnography or computation will be left to do, when data has arrived home? In all, we found a common theme in ethnographic fieldwork — that our own positionalities and situations often reflect those of the people in our fieldsites.

## Concluding thoughts – why this was interesting/a good idea

It is interesting that our two groups did not explicitly coordinate our topics – we split up and independently arrived at very similar thought experiments and provocations. We reflected that this is likely because all of us attending the workshop were in similar kinds of situations, as we are all struggling with the dual problem of studying computation as an object and working with computation as a method. We found that these kinds of speculative thought experiments were useful in helping us define what we mean by ethnography. What are the principles, practices, and procedures that we mean when we use this term, as opposed to any number of others that we could also use to describe this kind of work? We did not want to do too much boundary work or policing what is and isn’t “real” ethnography, but we did want to reflect on how our positionality as ethnographers is different than, say, digital humanities or computational social science.

We left with no single, simple answers, but more questions — as is probably appropriate. Where do contributions of ethnography of computation, computational ethnography, or computation of ethnography go in the future? We instead offer a few next steps:

Of all the various fields and disciplines that have taken up ethnography in a computational context, what are their various theories, methods, approaches, commitments, and tools? For example, how is work that has more of a home in STS different from that in CSCW or anthropology? Should ethnographies of computation, computational ethnography, and computation of ethnography look the same across fields and disciplines, or different?

Of all the various ethnographies of computation taking place in different contexts, what are we finding about the ways in which people relate to computation? Ethnography is good at coming up with case studies, but we often struggle (or hesitate) to generalize across cases. Our workshop brought together a diverse group of people who were studying different kinds of topics, cases, sites, peoples, and doing so from different disciplines, methods, and epistemologies. Not everyone at the workshop primarily identified as an ethnographer, which was also productive. We found this mixed group was a great way to force us to make our assumptions explicit, in ways we often get away with when we work closer to home.

Of computational ethnography, did we propose some new, operationalizable mathematical approaches to working with trace data in context? How much should the analysis of trace data depend on the ethnographer’s personal intuition about how to collect and analyze data? How much should computational ethnography involve the integration of interviews and fieldnotes alongside computational analyses?

Of computation of ethnography, what does “tooling up” involve? What do our current tools do well, and what do we struggle to do with them? How do their affordances shape the expectations and epistemologies we have of ethnography? How can we decouple the interfaces from their data, such as exporting the back-end database used by a more standard QDA program and analyzing it programmatically using text analysis packages, and find useful cuts to intervene in, in an ethnographic fashion, without engineering everything from some set of first principles? What skills would be useful in doing so?

#### Syllabi

I’m getting a lot of requests for my syllabi. Here are links to my most recent courses. Please note that we changed our LMS in 2014 and so some of my older course syllabi are missing. I’m going to round those up.

• Cybersecurity in Context (Fall 2018)
• Cybersecurity Reading Group (Spring 2018, Fall 2017, Spring 2017)
• Privacy and Security Lab (Spring 2018, Spring 2017)
• Technology Policy Reading Group (AI & ML; Free Speech: Private Regulation of Speech; CRISPR) (Spring 2017)
• Privacy Law for Technologists (Fall 2017, Fall 2016)
• Problem-Based Learning: The Future of Digital Consumer Protection (Fall 2017)
• Problem-Based Learning: Educational Technology: Design Policy and Law (Spring 2016)
• Computer Crime Law (Fall 2015, Fall 2014, Fall 2013, Fall 2012, Fall 2011)
• FTC Privacy Seminar (Spring 2015, Spring 2010)
• Internet Law (Spring 2013)
• Information Privacy Law (Spring 2012, Spring 2009)
• Samuelson Law, Technology & Public Policy Clinic (Fall 2014, Spring 2014, Fall 2013, Spring 2011, Fall 2010, Fall 2009)

MIMS 2014

#### I Googled Myself (Part 2)

In my last post, I set up an A/B test through Google Optimize and learned Google Tag Manager (GTM), Google Analytics (GA) and Google Data Studio (GDS) along the way. When I was done, I wanted to learn how to integrate Enhanced E-commerce and Adwords into my mock-site, so I set that as my next little project.

As the name suggests, Enhanced E-commerce works best with an e-commerce site, which I don’t quite have. Fortunately, I was able to find a bunch of different mock e-commerce website source code repositories on GitHub which I could use to bootstrap my own. After some false starts, I found one that worked well for my purposes, based on this repository that made a mock e-commerce site using the “MEAN” stack (MongoDB, Express.js, AngularJS, and Node.js).

Forking this repository gave me an opportunity to learn a bit more about modern front-end / back-end website building technologies, which was probably overdue. It was also a chance to brush up on my JavaScript skills. Tackling this new material would have been much more difficult without the use of WebStorm, the JavaScript IDE from the makers of my favorite Python IDE, PyCharm.

Properly implementing Enhanced E-commerce does require some back end development—specifically to render static values on a page that can then be passed to GTM (and ultimately to GA) via the dataLayer. In the source code I inherited, this was done through the nunjucks templating library, which was well suited to the task.

Once again, I used Selenium to simulate traffic to the site. I wanted to have semi-realistic traffic to test the GA pipes, so I modeled consumer preferences off of the beta distribution with $\alpha = 2.03$ and $\beta = 4.67$. That looks something like this:

The $x$ value of the beta distribution is normally constrained to the (0,1) interval, but I multiplied it by the number of items in my store to simulate preferences for my customers. So in the graph, the 6th item (according to an arbitrary indexing of the store items) is the most popular, while the 22nd and 23rd items are the least popular.
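As a rough illustration of the scaling described above, here is a minimal Python sketch that uses only the standard library. The store size `N_ITEMS`, the helper name `draw_item_index`, and the seed are assumptions made for the example; the post does not give the actual item count or the original code.

```python
import random
from collections import Counter

ALPHA, BETA = 2.03, 4.67  # beta shape parameters from the post
N_ITEMS = 24              # hypothetical store size (not stated in the post)


def draw_item_index(rng, n_items=N_ITEMS):
    """Map a Beta(ALPHA, BETA) draw on (0, 1) to an integer item index."""
    x = rng.betavariate(ALPHA, BETA)
    # Drop the decimals; min() guards the edge case where x rounds up to 1.0.
    return min(int(x * n_items), n_items - 1)


rng = random.Random(0)  # fixed seed so the tally is repeatable
counts = Counter(draw_item_index(rng) for _ in range(50_000))

# Print a crude text histogram of item popularity.
for index in range(N_ITEMS):
    print(f"{index:2d} {'#' * (counts[index] // 200)}")
```

Tallying many draws this way should give a right-skewed shape like the one in the graph, with the peak sitting roughly a fifth of the way through the index range.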

For the customer basket size, I drew from a Poisson distribution with $\lambda = 3$. That looks like this:

Although the two graphs look quite similar, the distributions differ in important ways. For one thing, the Poisson distribution is discrete while the beta distribution is continuous, though I do end up dropping all decimal figures when drawing samples from the beta distribution since the items are also discrete. The two distributions also serve different purposes in the simulation: the $x$ axis in the beta distribution represents an arbitrary item index, while in the Poisson distribution it represents the number of items in a customer’s basket.

So putting everything together, the simulation process goes like this: for every customer, we first draw from the Poisson distribution with $\lambda = 3$ to determine $q$, i.e. how many items that customer will purchase. Then we draw $q$ times from the beta distribution to see which items the customer will buy. Then, using Selenium, these items are added to the customer’s basket and the purchase is executed, while sending the Enhanced Ecommerce data to GA via GTM and the dataLayer.
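The per-customer loop described above can be sketched offline, without Selenium or GA. This is only an illustration under stated assumptions: `N_ITEMS`, the function names, and the seed are hypothetical, and the Poisson sampler uses Knuth's multiplication method because the Python standard library has no built-in Poisson draw. The browser automation and dataLayer steps are left out.

```python
import math
import random

ALPHA, BETA = 2.03, 4.67  # beta shape parameters from the post
LAMBDA = 3.0              # Poisson rate from the post
N_ITEMS = 24              # hypothetical store size (not stated in the post)


def draw_basket_size(rng, lam=LAMBDA):
    """Draw from Poisson(lam) using Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def draw_item_index(rng, n_items=N_ITEMS):
    """Scale a Beta(ALPHA, BETA) draw to an item index, dropping decimals."""
    return min(int(rng.betavariate(ALPHA, BETA) * n_items), n_items - 1)


def simulate_customer(rng):
    """Return the item indices that one simulated customer buys."""
    q = draw_basket_size(rng)
    # q can be 0; how the original simulation treats empty baskets is not stated.
    return [draw_item_index(rng) for _ in range(q)]


rng = random.Random(42)
baskets = [simulate_customer(rng) for _ in range(20_000)]
mean_basket_size = sum(len(b) for b in baskets) / len(baskets)
print(f"mean basket size: {mean_basket_size:.2f}")
```

In a full run, each returned list would be handed to the browser automation step that adds the items to the basket and completes the purchase.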

When it came to implementing Adwords, my plan had been to bid on uber-obscure keywords that would be super cheap (think “idle giraffe” or “bellicose baby”), but unfortunately Google requires that your ad links point to live, properly hosted websites. Since my website is running on my localhost, Adwords wouldn’t let me create a campaign with my mock e-commerce website.

As a workaround, I created a mock search engine results page that my users would navigate to before going to my mock e-commerce site’s homepage. 20% of users would click on my ‘Adwords ad’ for hoody sweatshirts on that page (that’s one of the things my store sells, BTW). The ad link was encoded with the same UTM parameters that would be used in Google Adwords to make sure the ad click is attributed to the correct source, medium, and campaign in GA. After imposing a 40% bounce probability on these users, the remaining ones buy a hoody.
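To make the funnel concrete, here is a small Python sketch of the two pieces described above: a UTM-tagged ad link and the click/bounce/purchase branching. The base URL, the UTM values, the outcome labels, and the function names are all hypothetical; only the 20% and 40% probabilities and the use of source, medium, and campaign parameters come from the post.

```python
import random
from urllib.parse import urlencode

CLICK_PROB = 0.20   # share of users who click the mock ad (from the post)
BOUNCE_PROB = 0.40  # share of ad clickers who bounce (from the post)


def build_ad_url(base_url, source, medium, campaign):
    """Append the standard UTM query parameters to a landing-page URL."""
    query = urlencode({
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
    })
    return f"{base_url}?{query}"


def simulate_visit(rng):
    """Return the outcome of one visit to the mock results page."""
    if rng.random() >= CLICK_PROB:
        return "no_ad_click"
    if rng.random() < BOUNCE_PROB:
        return "bounced"
    return "bought_hoody"


# Hypothetical landing page and campaign values, for illustration only.
ad_url = build_ad_url("http://localhost:3000/", "google", "cpc", "hoody_promo")
print(ad_url)

rng = random.Random(7)
outcomes = [simulate_visit(rng) for _ in range(50_000)]
clicks = sum(outcome != "no_ad_click" for outcome in outcomes)
purchases = outcomes.count("bought_hoody")
conversion_rate = purchases / clicks
print(f"ad clicks: {clicks}, conversion rate among clickers: {conversion_rate:.3f}")
```

Because 40% of ad clickers bounce, the conversion rate among clickers should settle near 60% over many simulated visits.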

It seemed like I might as well use this project as another opportunity to work with GDS, so I went ahead and made another dashboard for my e-commerce website (live link):

If you notice that the big bar graph in the dashboard above looks a little like the beta distribution from before, that’s not an accident. Seeing the Hoody Promo Conv. Rate (implemented as a Goal in GA) hover around 60% was another sign things were working as expected, since a 40% bounce probability leaves 60% of ad clickers to complete the purchase.

In my second go-around with GDS, however, I did come up against a few more frustrating limitations. One thing I really wanted to do was create a scorecard element that would tell you the name of the most popular item in the store, but GDS won’t let you do that.

I also wanted to make a histogram, but that is also not supported in GDS. Using my own log data, I did manage to generate the histogram I wanted—of the average order value.

I’m pretty sure we’re seeing evidence of the Central Limit Theorem kicking in here. The CLT says that the distribution of sample means—even when drawn from a distribution that is not normal—will tend towards normality as the sample size gets larger.

A few things have me wondering here, however. In this simulation, the sample size is itself a random variable which is never that big. The rule of thumb says that 30 counts as a large sample size, but if you look at the Poisson graph above you’ll see the sample size rarely goes above 8. I’m wondering whether this is mitigated by a large number of samples (i.e. simulated users); the histogram above is based on 50,000 simulated users. Also, because average order values can never be negative, we can only have at best a truncated normal distribution, so unfortunately we cannot graphically verify the symmetry typical of the normal distribution in this case.
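One way to probe these questions is to rerun the idea offline and look at summary statistics instead of a chart. The sketch below is not the post's simulation: the item prices are invented, empty baskets are skipped, `N_ITEMS` is a guess, and "order value" is read as the mean item price within an order. It groups orders by basket size and prints the sample skewness of the per-order means; a value near zero would be consistent with a symmetric distribution.

```python
import math
import random
import statistics
from collections import defaultdict

ALPHA, BETA = 2.03, 4.67  # beta shape parameters from the post
LAMBDA = 3.0              # Poisson rate from the post
N_ITEMS = 24              # hypothetical store size (not stated in the post)


def draw_basket_size(rng, lam=LAMBDA):
    """Draw from Poisson(lam) using Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def draw_item_index(rng, n_items=N_ITEMS):
    """Scale a Beta(ALPHA, BETA) draw to an item index, dropping decimals."""
    return min(int(rng.betavariate(ALPHA, BETA) * n_items), n_items - 1)


def skewness(values):
    """Population skewness: the mean cubed z-score of the values."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


rng = random.Random(3)
# Invented prices; the real store's prices are not given in the post.
prices = [round(rng.uniform(5.0, 80.0), 2) for _ in range(N_ITEMS)]

means_by_size = defaultdict(list)
for _ in range(50_000):
    q = draw_basket_size(rng)
    if q == 0:
        continue  # assumption: an empty basket produces no order
    basket = [prices[draw_item_index(rng)] for _ in range(q)]
    means_by_size[q].append(sum(basket) / q)

for q in sorted(means_by_size):
    values = means_by_size[q]
    if len(values) >= 1000:  # skip rare basket sizes with noisy estimates
        print(f"basket size {q}: n={len(values)}, skewness={skewness(values):.3f}")
```

Splitting by basket size separates the two effects the paragraph above wonders about: the number of simulated users changes how precisely each skewness is estimated, while the basket size is what plays the role of the sample size in the CLT.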

But anyway, that’s just me trying to inject a bit of probability/stats into an otherwise implementation-heavy analytics project. Next I might try to re-implement the mock e-commerce site through something like Shopify or WordPress. We’ll see.

MIMS 2012

#### Discovery Kanban 101: My Newest Skillshare Class

I just published my first Skillshare class — Discovery Kanban 101: How to Integrate User-Centered Design with Agile. From the class description:

Learn how to make space for designers and researchers to do user-centered design in an Agile/scrum engineering environment. By creating an explicit Discovery process to focus on customer needs before committing engineers to shipping code, you will unlock design’s potential to deliver great user experiences to your customers.

By the end of this class, you will have built a Discovery Kanban board and learned how to use it to plan and manage the work of your team.

While I was at Optimizely, I implemented a Discovery kanban process to improve the effectiveness of my design team (which I blogged about previously here and here, and spoke about here). I took the lessons I learned from doing that and turned them into a class on Skillshare to help any design leader implement an explicit Discovery process at their organization.

Whether you’re a design manager, a product designer, a program manager, a product manager, or just someone who’s interested in user-centered design, I hope you find this course valuable. If you have any thoughts or questions, don’t hesitate to reach out: @jlzych