Microsoft’s AI team “accidentally” exposed 38 terabytes of company data, including over 30,000 internal Teams messages.
As first spotted by cloud security platform Wiz, Microsoft’s AI researchers inadvertently leaked the data while uploading training data for colleagues building image recognition models. Alongside internal Teams chats, the exposed data included secrets, private keys and passwords.
Microsoft released an official statement, noting that no customer data was exposed and that no action is required by customers.
The statement read:
“The information that was exposed consisted of information unique to two former Microsoft employees and these former employees’ workstations. No customer data was exposed, and no other Microsoft services were put at risk because of this issue. Customers do not need to take any additional action to remain secure.”
The leak stemmed from an Azure Storage feature called “SAS tokens” (shared access signatures), which generate shareable links to data in a storage account. A token’s access can be scoped down to specific files, but this link was configured to grant access to the entire storage account.
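For illustration, here is a minimal sketch of how a narrowly scoped SAS link can be generated with the azure-storage-blob Python SDK, rather than one covering a whole account. The account, container and blob names are placeholders, not details from the Microsoft incident.

```python
# Sketch only: creating a read-only, time-boxed SAS link to a single blob.
# Account name, key, container and blob names below are placeholders.
from datetime import datetime, timedelta, timezone

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

ACCOUNT_NAME = "exampleaccount"   # placeholder storage account
ACCOUNT_KEY = "<account-key>"     # placeholder; keep keys out of source control

# Narrow scope: read-only access to one blob, expiring after 24 hours,
# rather than a token covering the full storage account.
sas_token = generate_blob_sas(
    account_name=ACCOUNT_NAME,
    container_name="training-data",           # placeholder container
    blob_name="image-models/dataset.zip",     # placeholder blob
    account_key=ACCOUNT_KEY,
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=24),
)

share_url = (
    f"https://{ACCOUNT_NAME}.blob.core.windows.net/"
    f"training-data/image-models/dataset.zip?{sas_token}"
)
print(share_url)  # shareable link limited to one file, read-only, time-limited
```

The key design choice is scoping: limiting the token to a single blob with read-only permission and a short expiry means that even if the link leaks, it cannot expose everything else in the storage account.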
Wiz found the link was accessible in June and notified Microsoft of the exposure; the SAS token was revoked the next day, with Microsoft saying it had addressed the problem. Microsoft has also made changes to SAS tokens with the prevention of similar leaks in mind.
“Like any secret, SAS tokens need to be created and handled appropriately,” Microsoft added. “As always, we highly encourage customers to follow our best practices when using SAS tokens to minimize the risk of unintended access or abuse.”
Wiz argued that these types of data breaches could occur more frequently as AI systems are more widely trained and deployed.
“This case is an example of the new risks organizations face when starting to leverage the power of AI more broadly, as more of their engineers now work with massive amounts of training data,” Wiz said. “As data scientists and engineers race to bring new AI solutions to production, the massive amounts of data they handle require additional security checks and safeguards.”
AI’s Security (and Legal) Impact
Microsoft’s data leak adds to the growing concern over AI’s potentially transformative impact on cybersecurity.
Microsoft launched its own AI product, Bing Chat Enterprise, this summer with security at the centre of its pitch, stressing that the solution is designed to provide greater data security for businesses concerned about privacy and data breaches.
That pitch responds to the close scrutiny in recent months of how the most widely available generative AI solutions handle the privacy and security of business data.
For example, OpenAI’s ChatGPT, the most widely used generative AI service, saves user prompts to develop and improve its model unless users deliberately opt out. This has catalysed worries that employees might accidentally include proprietary or confidential data in their prompts, which ChatGPT then retains and uses to inform future responses. There is no clear safeguard preventing such leaks.
In March, OpenAI revealed a bug in ChatGPT had resulted in data leaks. In June, the company was subject to a class action lawsuit filed in California federal court, alleging it extracted “massive amounts of personal data from the internet”. The suit claimed that OpenAI stole and misappropriated millions of people’s data from the internet to improve its AI models.
In July, it was announced that the US Federal Trade Commission (FTC) had opened an investigation into OpenAI over the risks to consumers from ChatGPT generating false information or information partly informed by leaked confidential data. The FTC is also assessing OpenAI’s approach to data privacy and how it collects data to train and develop its AI.
AI is also redefining copyright law, with questions over the legal risks of how AI systems ingest copyright-protected IP continuing to grow.
Again, Microsoft has proactively looked to address these concerns, having introduced several AI customer commitments ahead of the launch of Bing Chat Enterprise and its flagship AI tool, Copilot. Earlier this month, Microsoft announced a Copilot Copyright Commitment intended to assuage worries around IP infringement among those planning to sign up for Microsoft’s AI-powered productivity tool.