import { Helmet } from 'react-helmet'
import { Link } from 'react-router-dom'

import { CSS } from '../../../utils/domUtils'
import { PATHS } from '../../../utils/paths'
import BackToResources from '../../common/BackToResources'
import YouMightBeInterested from '../../common/YouMightBeInterested'
import { modifiedArticleItems } from '../../data/articles'
import TitleCard from './TitleCard'

const Blog3Route: React.FC = () => (
	<div className={CSS.BLOG}>
		<Helmet>
			<title>How Privacy-Enhancing Technologies Keep Data Safe</title>
		</Helmet>
		<TitleCard />
		<div className={CSS.LIGHT_BACKGROUND}>
			<div className={CSS.CONTAINER}>
				<div className={CSS.BLOG_CONTENT}>
					<section>
						<div className={CSS.LEAD}>
							As artificial intelligence continues to advance, the volume of data required to train models
							has grown exponentially. This data-driven approach to AI, although very powerful, raises
							significant privacy concerns. From inadequate consent to the inclusion of sensitive
							information, the training process for AI models often involves substantial risks to data
							privacy.
						</div>
						<p>
							<Link to={PATHS.PRIVACY_ENHANCING_TECHNOLOGIES}>Privacy-Enhancing Technologies (PETs)</Link>{' '}
							- a suite of technological approaches designed to maximize the privacy of data while
							retaining its usefulness - are a leading tool for addressing these challenges, providing a
							variety of methods for safeguarding personal and sensitive information throughout the AI
							lifecycle.
						</p>
					</section>
					<section>
						<h2>Key Data Privacy Concerns in AI Model Training</h2>
						<h3>Data Collection and Consent</h3>
						<p>
							AI models are trained on large datasets that, depending on the purpose and industry, may be
							collected from a variety of sources or scraped from the internet. Many people may therefore
							be unaware that their data is being used - or they may be aware of its potential use without
							having provided explicit consent. This lack of informed consent is a significant ethical and
							legal issue.
						</p>
						<h3>Inclusion of Personal Data</h3>
						<p>
							In many cases, personally identifiable information (PII) may be present in datasets used for
							training - exposing that sensitive information to risk if the datasets are not properly
							protected.
						</p>
						<h3>Data Security</h3>
						<p>
							The vast amounts of data used to train AI models are often stored in centralized
							repositories, making them vulnerable to breaches and unauthorized access.
						</p>
						<h3>Bias and Discrimination</h3>
						<p>
							If trained on biased data, or on data from a range of sources that does not accurately
							reflect the diversity of available information, AI can perpetuate negative biases -
							reinforcing social inequalities, and potentially leading to further discrimination.
						</p>
						<h3>Lack of Transparency</h3>
						<p>
							The specifics of how data is used during the training process can be unclear - making it
							difficult to assess or audit training in a way that reliably confirms that privacy
							regulations are being followed.
						</p>
					</section>
					<section>
						<h2>The Role of Privacy-Enhancing Technologies</h2>
						<p>
							Privacy-Enhancing Technologies are increasingly being applied to the field of AI as a means
							of allowing model training to use the maximum amount of valuable data, while ensuring that
							privacy and security are not compromised. With different approaches yielding different
							results, here are some of the most valuable techniques and their applications.
						</p>
					</section>
					<section>
						<h3>Homomorphic Encryption (HE): Secure Data Processing</h3>
						<p>
							Homomorphic Encryption is a cryptographic technique which allows data to remain encrypted
							during processing and analysis - meaning that no raw information is ever revealed at any
							stage of the process.
						</p>
						<h5>Application</h5>
						<p>
							HE can be used to encrypt datasets before they are shared for use in AI model training.
							Because encryption takes place while the data is still held locally by the data controller,
							no raw data needs to be transferred or processed at any stage of training - so, even if the
							training infrastructure is compromised, the data remains encrypted and secure.
						</p>
						<p>
							HE is also valuable in the case of generative AI models, which can sometimes be trained on
							private documents or personal conversations. In this case, HE allows the model to learn from
							data without revealing its raw value and potentially exposing sensitive information.
						</p>
						<h3>Secure Multi-Party Computation: Collaborative Data Analysis</h3>
						<p>
							Secure Multi-Party Computation (SMPC) is a cryptographic protocol that allows multiple
							collaborating parties to jointly run computations based on all of their combined data inputs
							- while keeping the contents and origins of each input private.
						</p>
						<p>
							This is achieved by breaking down each contributor’s data into multiple parts (called
							shares), which are then randomly distributed between the participants in such a way that no
							collaborator can piece together any original dataset based only on the shares they hold.
							Each participant then performs computations on its own shares independently, and the final
							result is generated by combining the partial results.
						</p>
						<p>
							SMPC is particularly powerful when combined with other PETs, such as homomorphic encryption
							- in which case, the data inputs would also remain fully encrypted throughout the process.
						</p>
						<h5>Application</h5>
						<p>
							SMPC is particularly useful for training AI models in industries such as healthcare, where
							different organizations (e.g., multiple hospitals or research institutions) want to train a
							model on a combination of their data, so that the model benefits from the broadest and most
							diverse possible range of information. In this case, SMPC allows all data sources to be
							used without compromising patient privacy.
						</p>
						<h3>Federated Learning: Decentralized Data Training</h3>
						<p>
							Federated Learning is an approach to training AI models in which data remains on local
							devices at all times during training. Each device trains the model locally on its own data,
							and only the resulting model updates are shared with a central server, where they are
							combined to create a complete model. This decentralized approach ensures that raw data
							never leaves any user&apos;s device, reducing the risks of data breaches and unauthorized
							access.
						</p>
						<h5>Application</h5>
						<p>
							Federated learning is particularly valuable in collaborative model training projects, and
							when data is distributed across multiple devices or locations - for example, when training
							data is drawn from mobile applications or IoT devices. The federated approach enables AI
							models to be trained on vast amounts of data without having to share sensitive information
							in a more vulnerable centralized location.
						</p>
						<h3>Differential Privacy: Protecting Data During Training</h3>
						<p>
							Differential Privacy is a mathematical approach which protects against data points being
							reverse-engineered or re-identified from analysis results. This is achieved by adding
							carefully calibrated random ‘noise’ to the data or to the results computed from it, so that
							no-one viewing the output could reliably tell whether any one individual’s data was
							included. The amount of noise is tightly controlled, so that the results remain useful with
							minimal distortion.
						</p>
						<h5>Application</h5>
						<p>
							Differential privacy is particularly valuable when AI models are being trained on sensitive
							datasets - for example, those which contain financial or medical information. By
							incorporating differential privacy, models can be trained which provide accurate predictions
							and insights without exposing any individual’s data.
						</p>
						<p>
							In the context of generative AI, differential privacy can help prevent a model from
							reproducing specific data points - for example, generating content that closely resembles a
							single input, rather than an amalgamation of many. This significantly reduces the risk of
							sensitive information being accidentally leaked in the model’s output.
						</p>
						<h3>What’s next?</h3>
						<p>
							The integration of Privacy-Enhancing Technologies into AI model training is a powerful step
							towards an approach to AI that also addresses many significant privacy concerns. In many
							cases, a combination of multiple techniques is the most effective way to achieve maximum
							privacy. Although this can result in significant computational overheads, a lot of effort
							within the PETs space is currently being devoted to improving the speed and efficiency of
							these approaches.
						</p>
						<p>
							As AI continues to evolve, it seems that PETs will remain crucial in ensuring that its
							development and deployment are carried out in a way that complies with data protection laws
							and regulations, and protects individual privacy.
						</p>
					</section>
				</div>
				<YouMightBeInterested items={modifiedArticleItems} />
				<BackToResources />
			</div>
		</div>
	</div>
)

export default Blog3Route
