PORTIA Workshop

on Sensitive Data in Medical, Financial, and Content-Distribution Systems


Alessandro Acquisti (Carnegie Mellon University)
Privacy, Rationality, and the Economics of Immediate Gratification

Dichotomies between privacy attitudes and behavior have been noted in the literature but not yet fully explained. We apply lessons from the research on behavioral economics to understand the individual decision making process with respect to privacy in electronic commerce. We show that it is unrealistic to expect individual rationality in this context. Models of incomplete information, bounded rationality, and immediate gratification offer more realistic descriptions of the decision process and are more consistent with currently available data. In particular, we present a model that shows why individuals who may genuinely want to protect their privacy might not do so because of psychological distortions well documented in the behavioral literature. The model shows that these distortions may affect not only 'naive' individuals but also 'sophisticated' ones, and that this may occur also when individuals perceive the risks from not protecting their privacy as significant. Finally, we present preliminary evidence from an ongoing series of surveys and experiments aimed at testing the model's predictions.

Gagan Aggarwal, Tomas Feder, Krishnaram Kenthapadi, Rajeev Motwani, Rina Panigrahy, Dilys Thomas, and An Zhu (Stanford University)
Anonymizing Tables for Privacy Protection

Download PDF here.

Carol Coye Benson (Glenbrook Partners)
Preventing Identity Theft: Consumer Credit Files, Banks and Privacy

The current credit-granting infrastructure is one of the great drivers of the American economic engine, and the envy of many other countries. This same infrastructure, unfortunately, is also the unwitting enabler of identity theft. Stopping this crime will require "locking down" the infrastructure and giving consumers more control over their credit files. The challenge is in authenticating the consumer. Banks may play a critical role in resolving this issue.

Joan Feigenbaum (Yale University)
Are "Trusted Systems" Important for Privacy Protection?

Trusted-platform initiatives such as Microsoft's Next-Generation Secure-Computing Base and the industry-wide Trusted Computing Group project are currently the subject of significant research and development. The goal of these initiatives is to change a fundamental fact about networked, general-purpose computers that is often viewed as a barrier to security: Once data are sent from one machine to another, the sender loses control over them. Trusted-platform designs offer hardware-based, cryptographic support for proofs that a potential receiver's machine is running an approved software stack. By making such proofs prerequisites for the transfer of sensitive data, owners of these data can ensure that only authorized applications will be run and only authorized actions will be taken by users.
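The gating step described above can be sketched abstractly: before releasing sensitive data, the sender checks a measurement of the receiver's reported software stack against an allowlist of approved configurations. The names below (`APPROVED_STACKS`, `attest`, `send_if_trusted`) are illustrative, not part of any real trusted-computing API, and a real design would use a hardware-rooted signed quote rather than a bare hash.

```python
import hashlib

# Hypothetical allowlist of approved software-stack measurements.
APPROVED_STACKS = {
    hashlib.sha256(b"bootloader-1.2|kernel-5.0|viewer-3.1").hexdigest(),
}

def attest(software_stack: str) -> str:
    """Measure the software stack (a stand-in for a hardware-rooted quote)."""
    return hashlib.sha256(software_stack.encode()).hexdigest()

def send_if_trusted(data: bytes, measurement: str):
    """Release sensitive data only to an approved configuration."""
    if measurement in APPROVED_STACKS:
        return data   # approved stack: transfer proceeds
    return None       # unapproved stack: sender retains control
```

The essential property is that the transfer decision depends on what software the receiver will run, not merely on who the receiver is.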

The best-publicized motivation for this type of "remote control" of networked computers is copyright enforcement for entertainment content, but many people have claimed that these mechanisms are much more widely applicable. In particular, the claim is often made that, in application domains in which sensitive data abound, such as healthcare and finance, data protection would be greatly aided by widespread adoption of trusted systems.

The purpose of this talk is to examine the validity of this claim. Is circumvention of data-protection systems (the type of attack that might be thwarted by trusted systems) a significant barrier to secure information systems in healthcare and finance, or are there more mundane barriers that are actually more significant, to wit:
  1. Many sensitive data objects are small (e.g., names, social-security numbers, or one-bit answers to medical tests) and hence easy to transmit via low-tech channels such as phone calls;
  2. Regulatory regimes (e.g., HIPAA or Gramm-Leach-Bliley) are non-deterministic and, more generally, hard to automate;
  3. Critical personnel are poorly trained, and enterprises often lack proper (human) procedures that must complement information systems in order for end-to-end handling of data to go smoothly;
  4. Many regulatory regimes *permit* data flows that data subjects would consider to be "leaks"!

The talk will conclude with an attempt to characterize application domains in which trusted systems could be most helpful in security-policy enforcement.

Robert Grimm (New York University)
Security Challenges for Rich-Media Educational Environments

Medicine faces a major and growing chasm between scientific knowledge and medical practice. On one side, rapid advances in molecular biology are reshaping medical science. On the other side, managed care has resulted in drastically reduced lengths of stay in hospitals and a general compartmentalization of medical practice. As a result, it is becoming increasingly difficult to train physicians who can provide state-of-the-art medical care, as medical practitioners cannot keep up with the rapidly changing basic sciences, do not have enough context to make appropriate diagnoses, and may rely on outdated procedures or drug regimens.

The premise of the Infrastructure for Rich-Media Educational Environments (IRMEE) project at New York University is that a sustainable solution requires the integration of medical knowledge across specializations, between theory and practice, and across geographical boundaries and time. The chosen approach is to create a web-based rich-media environment that (1) provides ubiquitous and lifelong access to educational and scientific materials, (2) structures educational content along narrative lines to re-establish missing context, and (3) fosters a community of students and practitioners not bound by geography. Experiences at NYU's medical school with a set of prototypes support the general approach, demonstrating that rich-media educational environments have advantages over textbooks, educational videos, and lectures alike.

Unfortunately, the straightforward multi-tier web architecture used for these prototypes has serious scalability constraints and does not provide an adequate basis for realizing the larger vision of IRMEE. To overcome this major deficiency, we are building a more scalable content delivery infrastructure. The goal is to combine the usability of familiar web content management systems with the scalability of peer-to-peer content distribution networks (CDNs) built on distributed hash tables. We aim to achieve this goal by allowing for the execution of application-specific services, which are expressed through scripts, within the content distribution network instead of only on the server. Our architecture leaves both clients and servers unchanged, thus letting us track any advances in web functionality. Furthermore, its scripting-based programming model is already familiar to web developers, thus significantly reducing the barrier to entry in developing applications.

While we believe that a scripting-enhanced CDN provides an appropriate solution for scaling IRMEE, our architecture also raises two important security challenges. First, since the CDN is implemented as a peer-to-peer system, content integrity becomes an important issue. Without additional safeguards, CDN nodes can modify or replace content with their own, arbitrary versions. Some replaced content may be obvious (consider spam-like advertisements), but other content may be considerably less obvious and consequently more dangerous (consider falsified medical research reports). To make matters worse, established solutions for ensuring content integrity, such as cryptographic hashes, are ineffective in our architecture, as scripting-enabled CDN nodes, by definition, may modify or even create content.
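The limitation of conventional hash-based integrity in this setting can be illustrated concretely (the function names are ours, not IRMEE's): a publisher-supplied digest detects any modification by a CDN node, which is exactly the property that breaks once nodes are supposed to run scripts that legitimately rewrite content.

```python
import hashlib

def publish(content: bytes) -> str:
    # The publisher computes a digest that clients can check end-to-end.
    return hashlib.sha256(content).hexdigest()

def verify(content: bytes, digest: str) -> bool:
    return hashlib.sha256(content).hexdigest() == digest

original = b"<p>Case study: cardiology lecture</p>"
digest = publish(original)

# A malicious node's substitution fails verification, which is the point...
spam_ok = verify(b"<p>spam advertisement</p>", digest)
# ...but a legitimate in-network script's rewrite fails in exactly the same
# way, so plain hashes cannot separate authorized from unauthorized changes.
edited_ok = verify(b"<p>Case study: cardiology lecture (updated)</p>", digest)
```

Both checks fail identically, which is why a scripting-enabled CDN needs an integrity mechanism that can vouch for *approved transformations* rather than byte-for-byte equality.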

Second, some content, such as students' contributions to discussion groups, may refer to actual patients' case histories. As medical data must be kept private, access to user-generated content should be restricted to authorized users. However, as the peer-to-peer CDN is generally untrusted, authorization can only be performed by the original servers. The simplest solution is to partition all content into two categories, public and private. Public content will be accessible through the CDN, while private content can only be accessed directly through the corresponding server, which is protected through SSL and proper authentication. However, this solution also has the disadvantage of eschewing any scalability advantages of the CDN for private content. Overall, from a security perspective, the issue is to provide strong security guarantees for a relatively untrusted network, while also remaining compatible with the existing web-based infrastructure.

Rachel Greenstadt and Jean Francois Raymond (Harvard University)
Applications of Trusted Computing for Medical Privacy

Download PDF here.

Benjamin Grosof (MIT Sloan School of Management)
Rules Knowledge Representation for Privacy Policies: RuleML, Semantic Web Services, and their Research Frontiers

We give an overview of how the field of rules knowledge representation (KR) bears on the policy aspect of privacy, including current techniques, theory, standards, and research frontiers. Our own previous contributions to rules KR include the Situated Courteous Logic Programs KR (SCLP); the emerging RuleML standard for Semantic Web rules, which is based on SCLP and which we co-founded; and their applications to e-contracting, financial information integration, and trust policy management in the Semantic Web and Web Services.

Privacy policies can be viewed as a broad special case of trust authorization policies -- those in which authorization decisions are made about access to information. Such policies in today's commercially deployed systems are usually well represented as rules, but those systems' designs do not yet exploit the last decade's research about rules KR, notably SCLP and RuleML. This creates a major set of opportunities for privacy research. First, today's leading Web standards for access control policies (XACML) and client privacy policies (P3P) are based on rules. Second, the policy aspect of Web Services, particularly for security, is now a major focus of industry efforts in Web Services overall. Third, a small number of vertical industry domains appear to be suitable as early adopters/investors for this technology direction. These verticals include financial services (we give examples), health, and police/military.
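To convey the flavor of policies-as-rules (a toy illustration, not the SCLP/RuleML formalism itself, with request attributes invented for the example), a disclosure decision can be expressed as an ordered list of rules over request attributes:

```python
# Each rule pairs a condition over request attributes with a decision;
# the first matching rule wins, a crude stand-in for the prioritized
# conflict handling that courteous logic programs provide.
RULES = [
    (lambda r: r["purpose"] == "treatment", "permit"),
    (lambda r: r["purpose"] == "marketing" and not r["opt_in"], "deny"),
    (lambda r: r["purpose"] == "marketing" and r["opt_in"], "permit"),
]

def decide(request: dict, default: str = "deny") -> str:
    """Evaluate the rule list in order; fall back to a default decision."""
    for condition, decision in RULES:
        if condition(request):
            return decision
    return default
```

Real rule-based policy languages such as XACML and P3P add, among other things, standardized vocabularies and combining algorithms on top of this basic pattern.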

Stanislaw Jarecki (University of California, Irvine)
Patrick Lincoln (SRI International)
Vitaly Shmatikov (SRI International)
Handcuffing Big Brother: An Abuse-Resilient Transaction Escrow Scheme

We propose a new approach for privacy-preserving transaction escrow that balances citizens' desire for privacy and the need of government agencies to collect accurate information about financial, commercial and other transactions, and to quickly identify certain patterns of activities. Our escrow scheme provides a provable anonymity and privacy guarantee to transaction participants unless their transactions match a pre-specified pattern, or are subpoenaed by a court warrant.

Neither selective disclosure nor efficient subpoena can be implemented using conventional public-key escrow mechanisms. Moreover, traditional escrow schemes for protecting keys, identities, and data assume that the escrow agency is trusted not to perform unauthorized searches on the data, not to leak the keys to third parties, and so on. They are vulnerable to the insider threat: a malicious or careless employee can exploit or disclose citizens' personal data without authorization. By contrast, our transaction escrow scheme is provably secure against malicious misbehavior by the escrow agency's employees.

The key innovation underlying our technology is "verifiable transaction escrow." We propose to equip existing commercial and governmental databases and other information processing centers with transaction escrowing capabilities. Transaction participants will encrypt the data themselves, but correctness of the escrows will be verified, in a privacy-preserving way, using efficient zero-knowledge protocols. This will guarantee that the escrow agent can de-anonymize the entries and remove the encryption *if and only if* the data match a certain pattern or one of the transaction participants has been subpoenaed. For example, a national security agency may collect encrypted passenger itineraries from commercial airlines and require automatic disclosure for the records of any passenger who traveled to the Middle East 5 times or more within a year. In another application, a financial regulator may require automatic disclosure of all transfers to a particular group of accounts as soon as the total amount of these transfers exceeds $10,000 - even if the transfers are performed using different banks and wire services! The transfers not matching this pattern will remain completely anonymous and undecipherable even while stored in government-controlled databases, thus alleviating concerns of privacy advocates. Until their creator is subpoenaed, it is provably infeasible even to determine whether two entries refer to the same individual or not.

The key features of our transaction escrow scheme are (i) selective disclosure for transaction records that match certain patterns, (ii) complete anonymity and privacy for all other records without requiring the escrow agency to trust the subjects of monitoring or vice versa, and without involving a trusted intermediary in every transaction, and (iii) practical efficiency. Our approach also obviates the need for independent auditing of database access. We provide strong cryptographic guarantees that it is simply impossible to access the database in any manner other than that explicitly permitted by the selective disclosure policy.
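One way to get a feel for pattern-triggered disclosure is a threshold construction (our simplification for illustration, not the authors' actual protocol): the user's records are encrypted under a key that is the constant term of a polynomial, and each transaction matching the monitored pattern deposits one point on that polynomial alongside its escrow entry. Only after the threshold number of matching transactions can the agency interpolate the key.

```python
import random

P = 2**61 - 1  # prime modulus for the sketch's finite field

def make_polynomial(secret: int, threshold: int) -> list:
    """Random degree-(threshold-1) polynomial with f(0) = secret."""
    return [secret] + [random.randrange(P) for _ in range(threshold - 1)]

def eval_poly(coeffs: list, x: int) -> int:
    y = 0
    for c in reversed(coeffs):  # Horner's rule
        y = (y * x + c) % P
    return y

def interpolate_at_zero(points: list) -> int:
    """Lagrange interpolation of f(0) over GF(P)."""
    secret = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

# A user's record key opens only after 5 pattern-matching transactions,
# each of which escrows one share (x, f(x)) alongside its ciphertext.
key = random.randrange(P)
poly = make_polynomial(key, threshold=5)
shares = [(x, eval_poly(poly, x)) for x in range(1, 6)]
```

With fewer than five shares, any key value remains equally likely, mirroring the guarantee that records below the disclosure threshold stay undecipherable; the actual scheme additionally makes the shares verifiable and unlinkable.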

The overall objective of our project is to provide a cryptographically protected balance between citizens' privacy and the need of authorities to collect certain well-defined information. In health reporting, law enforcement, anti-terror, secure audit, and other applications, if honest users were assured of the privacy of their data, there would be higher levels of compliance and less need for privacy-by-obscurity.

Bret Kiraly, Andy Podgurski, and Sharona Hoffman (Case Western Reserve University)
Security Vulnerabilities and Conflicts of Interest in the Provider-Clearinghouse*-Payer Model

Download the PDF.

Rick Luce (Los Alamos National Laboratory)
Seeking a Sustainable Balance for Governmental Technical Reports: Public Access vs. Security

What happens in a world where the events of September 11 turn access to governmental reports from a shining example of public reporting and accountability into a safety and security liability? When the definitions of what counts as sensitive information and who the legitimate users are change literally overnight, what are the requirements for systems that support and deliver such information? As early as 1994, Los Alamos National Laboratory led the nation in making its technical report literature available via the Web to the global community, only to face new definitions of what is or is not appropriate for Web access, and of what constitutes sensitive information. Building on the LANL experience, this talk will outline some of the issues that governmental agencies face in publishing reports that on one hand contain information that legally is required to be widely disseminated, yet on the other hand may be of use in unforeseeable ways by malicious actors.

Daniel R. Masys (University of California, San Diego)
Medical Data: It's Only Sensitive if It Hurts When You Touch It

Effective health care is built on a foundation of trust between provider and patient.  Trust, in turn, requires that confidentiality of personal information be maintained, a principle that has been a cornerstone of health care since the time of Hippocrates.  Professional codes of ethics regarding confidentiality became state regulations in the 20th Century, and public concern over medical data privacy gave rise to the federal Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule, which established uniform requirements for protecting the confidentiality of medical data effective in 2003.  Among the provisions of the Privacy Rule are new rights of individuals to inspect and copy their medical records, obtain a record of disclosures of their data, request amendments to their records, and request restrictions on how their medical data is used and who can see it.  A companion HIPAA Security Rule will become effective in 2005.  The Security Rule addresses policy and technology requirements for protecting health data contained in computers and transmitted over networks; these requirements are similar to best practices used in other industries.

Healthcare provides unique challenges with respect to data security.  Conventional role-based security is difficult to implement due to the many (often underspecified) roles played by individuals, institutions and processes, including primary and specialist providers, support services, billing and payment arrangements, public health and regulatory agencies.  Society upholds the notion of confidentiality but reserves the right to pre-empt confidentiality protections in cases of communicable diseases and other threats to public health.  Special protections are written into law for certain types of medical data, such as mental health records, substance abuse, adoption, abortion, and HIV status.  Creating technology that recognizes these types of data in narrative records of care is an unsolved challenge.  Perhaps most importantly, simple models of information security fail for two reasons:  the healthcare system is so complex that individuals generally cannot comprehend the full effect of their decisions to withhold or release medical data, and their preferences regarding confidentiality may change over time in ways they cannot predict.  When an individual most needs to change a prior preference, they may be unconscious or otherwise cognitively impaired and unable to do so. 

Currently available data suggest that far more harm (in the form of medical errors) results from lack of accessibility of medical data than from breaches of confidentiality, and evidence is accumulating that HIPAA is stifling clinical research, epidemiology, and the advancement of health science. Practical measures such as the assignment of a healthcare-specific unique identifier for each person have underpinned national health systems in other countries, but have been blocked in this country by privacy rights advocates, contributing to an inefficient and sometimes life-threatening fragmentation of care. As a nation, we seem to be incapable of achieving consensus on the appropriate uses of personal health data.

New models exist for “e-consent” and healthcare-specific role-based security, but these technologies have yet to be tested and widely accepted.  Whatever the pathway of future innovation, it is clear that an electronic security infrastructure for medical data will need to have great flexibility to adapt to context-specific preferences and policies, and will need to incorporate understanding of the semantic content as well as the structure of healthcare records. 

Nina Mishra (HP Labs/Stanford) and Kobbi Nissim (Microsoft)
How Auditors May Inadvertently Compromise Your Privacy

Download PDF here.

Prakash Nadkarni, Rohit Gadagkar, Charles Lu, Aniruddha Deshpande, Kexin Sun, and Cynthia Brandt (Yale University)
Security in the context of a Generic Clinical Study Data Management System

TrialDB is a generic clinical study data management system (CSDMS) that is used at Yale by several departments, as well as by several centers nationally. In a system that is intended to facilitate the logistics of prospective clinical trials, authorities such as Daniel Masys have pointed out that there is a conflict between the need for maintaining patient anonymity and the support of automation for tasks such as patient appointment and follow-up: if a policy is made not to store any form of personal health information in the database, such automation becomes impossible. Also, in situations such as cancer chemotherapy, where the decision as to whether or not to escalate the dose of highly toxic drugs is made based on patient response and occurrence of adverse effects, the immediate consequences of accidentally escalating dosage for the wrong patient - namely, death or disability - far outweigh the risks of identity disclosure.

Our first step towards implementing secure practices was to have a policy in place that specifies the appropriate level of confidentiality for a given type of clinical study (e.g., a retrospective study vs. a prospective one, or a survey vs. one that involves major therapeutic interventions which are themselves associated with significant risk). Second, patient-related data security is not the only kind that must be considered: one must define what the various types of participants in a particular study - investigators, administrators, data entry personnel - need to know in order to function and what they do not; the definition of standard "roles" greatly assists system design and implementation. The technical aspects of implementing security are considerably easier than they were a few years ago, with mature toolkits such as the Microsoft .NET framework shielding the developer from the low-level details of particular encryption or message-digest algorithms.
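The standard-roles idea can be sketched as a simple permission map; the role and permission names here are illustrative, not TrialDB's actual schema:

```python
# Hypothetical roles and permissions for a clinical study data
# management system; each role gets only what it needs to function.
ROLE_PERMISSIONS = {
    "investigator": {"view_identified", "view_deidentified", "edit_protocol"},
    "data_entry": {"view_deidentified", "enter_data"},
    "administrator": {"manage_users", "view_deidentified"},
}

def authorized(role: str, action: str) -> bool:
    """Least-privilege check: unknown roles and actions are denied."""
    return action in ROLE_PERMISSIONS.get(role, set())
```

For example, under this map data-entry personnel can record results against de-identified records but never see identified patient information.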

Zachary N. J. Peterson, Randal Burns, and Adam Stubblefield (The Johns Hopkins University)
Limiting Liability in a Federally Compliant File System

Download PDF here.

Daniel Schutzer (Citigroup)
Financial Services Viewpoint: Towards the Management and Handling of Sensitive Data

The needs of society, financial institutions, and the individual regarding the use, handling, and management of sensitive data (especially financially related data) are complex and often in conflict with one another. The need to audit financial transactions, to detect anomalous behavior, and to obtain evidence that can hold up in court is necessary in support of fraud and threat risk management, dispute and error handling, incident response and forensic analysis, and regulatory reporting requirements. This need often conflicts with the individual's concern for privacy and desire for anonymity. Customized customer service, sales, and product requirements often introduce further conflict with individual privacy needs. The introduction of strong authentication (e.g., multi-factor), access controls, and information protection technology (e.g., cryptography and trusted-agent technology) could serve the dual needs of personal privacy protection and enhanced security if they could be designed with all these competing needs in mind. Unfortunately, seven key design-requirement hurdles need to be overcome in order to simultaneously achieve these complex interacting needs. They are:
  1. Be implementable and operated at affordable cost to both customer and FI.
  2. Be easy, convenient, and intuitive.
  3. Be compatible with prevailing accepted customer behavior.
  4. Be able to support efficient targeted marketing, product and service customization and fraud and threat risk management information needs.
  5. Be able to preserve a customer's sense of privacy and control over their personal information.
  6. Be able to support trusted communications, not only between the financial institution and the customer (assured mutual authentication), but also with third parties (e.g., auditors, regulators, merchants, distributors, IT product providers, and trusted intermediaries).
  7. Allow for all of the above to work in the presence of real-world software that exists with known and unknown bugs and vulnerabilities, in an environment of continuous patching, and in the presence of social-engineering fraud schemes such as phishing.

Anna Slomovic (Electronic Privacy Information Center)
Health Data Flows: Where PETs Can Help

Patients and physicians have increasing privacy concerns as health information moves into electronic form. Patients are concerned because, despite Notices of Privacy Practices, they do not know who sees their health information in the normal course of business and have no mechanism for controlling access. Physicians are concerned because electronic health information systems raise the possibility that their actions can be tracked and made available for analysis by third parties, such as accreditation agencies, insurers, and licensing authorities. In 1997, the National Research Council report For the Record: Protecting Electronic Health Information identified two categories of privacy and security concerns: concerns about inappropriate releases from individual organizations and concerns about systemic flows of information throughout the health care and related industries.

This talk examines permissible data flows within and between health care organizations and recommends areas in which technical solutions may create a more privacy-friendly environment. Although health care organizations are creating policies and procedures to minimize risks of inappropriate disclosures, many "authorized" users do not need individually identifiable information to do their jobs. In fact, health care organizations often have special procedures for VIPs or celebrities to prevent them from becoming subjects of curiosity. There are several areas in which technical solutions can help limit information flows within health care organizations without compromising quality of care. Risks in information flows between organizations arise because the health care system is complex, fragmented, and includes many different types of organizations. Many data flows fall into the categories permitted under the HIPAA Privacy Rule, such as treatment, payment, health care operations, public health, and disclosures required or permitted by law. There are areas in which technology can play a role in limiting disclosures of individually identifiable information in ways that would protect patients and physicians while still accommodating researchers, public health authorities, and others.