NEWS: UC San Diego Health Information Security Team wins the 2024 UC Tech Golden Award for IT Security

The UC San Diego Health Information Security Team won Gold in the IT Security category at the 2024
UC Tech Awards. This top honor recognizes their groundbreaking “PHI in Email Cleanup
Project,” a comprehensive effort to dramatically reduce the risk of privacy breaches by
proactively scanning and eliminating vast stores of sensitive patient data from email systems,
protecting UC San Diego Health’s patients, staff and reputation.

Summary

Phishing is one of the most common, and successful, threats to healthcare organizations. Given
that email is commonly used for communication and collaboration, often including patient
information, a successful phish can result in a breach of protected information. While many
tools are in place to reduce the likelihood of a successful phishing attack, a reduction in the
amount of protected data in email can substantially reduce the impact when phishing does
occur.

Narrative

Phishing is the most common type of attack used by threat actors against all organizations,
including healthcare. Phishing is a type of social engineering attack that attempts to trick users
into clicking a link or downloading a file, with the hope of stealing credentials or spreading
malware. Organizations invest significantly in measures to reduce the risk of this type of attack,
using email security tools that can detect and block malicious emails (UCSD Health blocks
approximately 10 million emails per month), but no protection is 100% effective, and as a result
phishing continues to represent a major risk.

The most common type of phishing is credential phishing, the goal of which is to steal
usernames and passwords to gain access to internal systems. Given the use of cloud email
systems, the attacker will often first attempt to access user emails using the stolen credentials,
combined with various techniques to bypass Multi-Factor Authentication. Given the broad use of
email systems in healthcare for document sharing, collaboration, reporting, and other types
of communication, there is a high likelihood that user emails will contain some amount of
sensitive information that can be stolen once an attacker has access. This stolen information
can lead to a breach of sensitive data, and in the case of Protected Health Information (PHI) this
can result in a Privacy Breach that can have a significant impact on the organization, including
fines, penalties, litigation, and reputational harm.

In most organizations, significant effort is put into deploying tools to reduce the likelihood of a
phishing attack being successful, including training, phishing simulations, technological controls,
and others. While these tools can be very effective, the impact when a phish is successful can
be significant given the potential for large volumes of data contained in email. Because of the risk
of a significant breach resulting from a successful phish, UC San Diego Health has implemented
a program to significantly reduce the impact of a loss of data by scanning emails for potentially
sensitive data and implementing processes to delete this data from user email accounts.

To combat the risk of phishing-related data breaches, UC San Diego Health launched a first-of-its-kind
risk mitigation effort in early 2023. Recognizing that over time email systems have become
an archive of sensitive data, UC San Diego Health began the process of scanning all accounts
within our email tenant to identify any email that contained large amounts of PHI and other
sensitive data (PII, credit cards, etc.). Using a variety of tools from vendor partners, including
Spirion and Microsoft, over 40,500 email accounts were scanned using both static rules (exact
data match) and dynamic rules (combinations of different data types). Data types scanned for
included:

Social Security Number
Medical Record Number
Epic Patient ID (different from MRN)
Credit Card Number
Date of Birth
Passwords
Driver’s License Number or Government-Issued ID
Insurance Numbers/Subscriber Number
Payor Name (e.g., Humana, Tri-Care, Blue Cross)
Patient e-mail (private or work)
Patient Name
Account Number / Hospital Account Record
Admission Date, Discharge Date, Procedure Date
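
The difference between the two rule styles mentioned above can be illustrated with a minimal Python sketch. This is not the Spirion or Microsoft implementation, whose internals are not described here. It assumes that a static rule compares values against a known list (exact data match) and that a dynamic rule requires several different data types to appear together. The patterns, the KNOWN_MRNS set, and the function names are all hypothetical.

```python
import re

# Hypothetical patterns for illustration only; production scanners use
# far more robust detection than these simple regular expressions.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "dob": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
    "mrn": re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
}

# Hypothetical list of known identifiers used for an exact data match.
KNOWN_MRNS = {"12345678", "87654321"}


def static_match(text: str) -> bool:
    """Static rule (assumed): a number in the text exactly equals a known value."""
    tokens = set(re.findall(r"\d+", text))
    return bool(tokens & KNOWN_MRNS)


def dynamic_match(text: str, min_types: int = 2) -> bool:
    """Dynamic rule (assumed): at least `min_types` different data types co-occur."""
    found = {name for name, pattern in PATTERNS.items() if pattern.search(text)}
    return len(found) >= min_types


sample = "Patient MRN: 12345678, DOB 01/02/1980"
print(static_match(sample), dynamic_match(sample))
```

Requiring a combination of data types is one common way to cut false positives, since a lone date or a lone number is rarely sensitive by itself.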

Since the most significant and impactful risk associated with a data breach is the amount of
data stolen, it was determined that the scan would be configured to look for ‘bulk’ data. To
balance the risk reduction against the time required to scan, only emails that contained more
than 200 records per email (in either the body or an attachment) were included in the results.
Given this scope, the number of accounts, the age of many accounts (some 20+ years old), and
the volume of data within our email tenant, the scanning required approximately 20 virtual
servers running the scanning application and took close to nine months to complete.
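
The ‘bulk’ threshold can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the scanning application's logic: the record detector is a hypothetical stand-in that counts only SSN-shaped strings, attachments are assumed to be already extracted to text, and records are assumed to be totaled across the body and attachments of one email. Only the 200-record threshold comes from the description above.

```python
import re
from dataclasses import dataclass, field

# From the text: only emails with MORE than 200 records are reported.
BULK_THRESHOLD = 200

# Hypothetical stand-in for record detection: counts SSN-shaped strings.
RECORD_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class Email:
    message_id: str
    body: str
    # Attachment contents, assumed to be already extracted to plain text.
    attachments: list[str] = field(default_factory=list)


def record_count(email: Email) -> int:
    """Total records found in the body plus all attachments (an assumption)."""
    parts = [email.body, *email.attachments]
    return sum(len(RECORD_PATTERN.findall(part)) for part in parts)


def is_bulk(email: Email) -> bool:
    """True when the email exceeds the bulk threshold and belongs in the results."""
    return record_count(email) > BULK_THRESHOLD


small = Email("msg-1", "SSN 123-45-6789")
large = Email("msg-2", "see attached", ["123-45-6789\n" * 250])
print(is_bulk(small), is_bulk(large))
```

A strict "greater than" comparison is used so that an email with exactly 200 records is not flagged, matching the "more than 200 records" wording.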

At the conclusion of the scan, we identified over 6,500 users who had at least one email with
more than 200 records of sensitive data, over 154,000 individual emails, and approximately 1.5
billion individual pieces of sensitive data.

As email is commonly a secondary data source, with the primary record being in other systems,
it was determined, in consultation with Legal, Privacy, and Compliance, that the data identified
would be deleted. Certain data was exempted from this, such as any emails or mailboxes on
legal hold, and a process was defined to allow for other emails required for business or
operational purposes to be retained as well. This process allowed each user to self-identify
any email that they felt should not be deleted, followed by an approval workflow that required
both a Director/DBO/Chair approval and, ultimately, an Executive Director/Dean approval.

Certain users, in particular those with a large number of emails or those with very large
volumes of data, were reviewed outside of this process, but similar approvals were required in
order to accept the risk associated with the retention of the data. Given the risk of continuing
to store exempted emails in user inboxes, those emails approved for retention are moved to
the user’s OneDrive account.

Given the combination of tools that were used to complete the scan, it was determined that
there was no commercially available technology to allow for the consolidation of the scan data,
to show individuals their scan results, and to support a deletion exception process, so the
UCSD Web Services team took on the task of building such a tool. In collaboration with the
scanning vendor, a web application was developed to allow users to review the emails identified
in the scan, and to select any that they believed should not be deleted through this process.

Further, the web app supported the exemption approval workflow, and was ultimately
integrated with the deletion script to flag any emails that had been approved for retention.
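
How exemption flags might feed a deletion step can be sketched in Python as follows. This is a hypothetical illustration, not the actual web application or deletion script. The FlaggedEmail class, its field names, and plan_deletion are invented for the example, and it assumes an email is retained only when it is on legal hold or has received both levels of approval described earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlaggedEmail:
    """One email from the scan results (hypothetical structure)."""
    message_id: str
    on_legal_hold: bool = False
    director_approved: bool = False   # Director/DBO/Chair level
    executive_approved: bool = False  # Executive Director/Dean level


def is_retained(email: FlaggedEmail) -> bool:
    """Retain when on legal hold, or when both approval levels were granted."""
    return email.on_legal_hold or (
        email.director_approved and email.executive_approved
    )


def plan_deletion(flagged: list[FlaggedEmail]) -> tuple[list[str], list[str]]:
    """Split scan results into message IDs to delete and message IDs to retain."""
    to_delete = [e.message_id for e in flagged if not is_retained(e)]
    to_retain = [e.message_id for e in flagged if is_retained(e)]
    return to_delete, to_retain


results = [
    FlaggedEmail("msg-1"),
    FlaggedEmail("msg-2", on_legal_hold=True),
    FlaggedEmail("msg-3", director_approved=True),
    FlaggedEmail("msg-4", director_approved=True, executive_approved=True),
]
print(plan_deletion(results))
```

In this sketch an email with only the first-level approval is still deleted, reflecting the requirement that both approvals be obtained.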

With such a large number of users impacted, and the complexity of the data sets, a detailed
communication plan was developed to ensure organization and user awareness of the process.
A series of emails were sent to all users, to those users with emails in the scan results, and to
the approvers to inform them of the process and what their actions were. Presentations were
delivered in organization-wide town halls, team meetings, team huddles, and to executive and
leadership committees. In addition to communication about the cleanup and deletion process,
resources and documentation outlining best practices for collaboration, data sharing, and the
use of email containing sensitive data were shared by email, on internal web pages, and through
a variety of in-person forums and presentations.

As of May 1, 2024, the final wave of users in the process has begun, and all users with emails in
the scan have been notified of the emails and the required actions. The full process for each
wave runs for 3 weeks, with one additional week for the approver workflow, after which all
non-exempted emails will be deleted. The final wave is expected to be completed, and emails
deleted, by May 29, 2024.
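
The wave timeline can be checked with a few lines of Python date arithmetic. The sketch assumes the final wave started on May 1, 2024, which the text implies but does not state outright. The 3-week and 1-week durations come from the description above, and the function name is hypothetical.

```python
from datetime import date, timedelta

USER_REVIEW = timedelta(weeks=3)        # user review period per wave
APPROVER_WORKFLOW = timedelta(weeks=1)  # additional week for approvers


def deletion_date(wave_start: date) -> date:
    """Date on which non-exempted emails become eligible for deletion."""
    return wave_start + USER_REVIEW + APPROVER_WORKFLOW


# Assumed start date for the final wave.
print(deletion_date(date(2024, 5, 1)))  # 2024-05-29
```

Under that assumption, four weeks from May 1 lands on May 29, 2024, which agrees with the expected completion date.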