This is part 2 of our 2-part series on why AI can’t replace our volunteers: professional translators working in teams to ensure the best and safest results for our non-profit clients. If you haven’t already, you can read the first part here and then come back.
3. Large Language Models may ‘feed’ off our input
There are ethical and practical limits to what kinds of information should be fed into AI models, and these need to be addressed.
‘Sensitive’ data is characterised as information that, if exposed or misused, can lead to significant harm to individuals or organisations. It includes: Personally Identifiable Information (PII); Protected Health Information (PHI); Financial Information; Legal and Confidential Documents; and Sensitive Personal Characteristics.
LLMs, in particular, are built on the principle of continuous learning and improvement. When you interact with a public AI model (like many free-to-use chatbots), your inputs, prompts, and even the generated responses are often collected and analysed. This ‘mining’ of data serves several purposes:
- Model Improvement: The primary reason is to fine-tune the model, improve its understanding of nuances, expand its knowledge base, and correct its errors. The more diverse and realistic the data it learns from, the better it becomes at predicting relevant outputs.
- Feature Development: Identifying common user patterns and needs can inform the development of new features or specialised AI models.
- Research and Development: Companies use this data for ongoing research into AI capabilities, pushing the boundaries of what these models can do.
Many public LLMs state in their terms of service that data submitted by users may be used to train and improve their models. This means our clients’ sensitive information could become permanently embedded in the model’s knowledge base. Even if not directly accessible, there’s a risk of it being inadvertently ‘regurgitated’ in future responses to other users or through complex prompt injection attacks.
Once data is ingested by an LLM for training, it’s incredibly difficult, if not impossible, to guarantee its complete removal (the ‘right to erasure’ appears to represent a huge technical challenge). This directly conflicts with privacy regulations like GDPR, which grant individuals the right to have their data deleted. Therefore, sharing regulated data (like PHI or PII) with a general-purpose LLM provider that doesn’t offer specific compliance guarantees (e.g., a Business Associate Agreement for HIPAA) can lead to severe legal penalties, fines, and reputational damage for the organisation sharing the data.
Before any sensitive data is fed into an AI, it should ideally undergo rigorous human review and anonymisation/pseudonymisation to remove PII and other identifying markers.
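To make this concrete, here is a minimal, purely illustrative sketch in Python (standard library only) of what a first automated redaction pass might look like before any text leaves an organisation’s own systems. The patterns and placeholder tokens are hypothetical examples, not a recommended rule set:

```python
import re

# Illustrative only: a crude regex pass over text before it is sent anywhere.
# The patterns and placeholders below are hypothetical examples; real
# anonymisation needs NER-based tooling, domain-specific rules and human review.
PII_PATTERNS = {
    "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "[PHONE]": re.compile(r"\+?\d(?:[\s().-]?\d){9,}"),   # long, phone-like digit runs
    "[ID]":    re.compile(r"\b\d{6,}\b"),                 # account/case-style numbers
}

def redact(text: str) -> str:
    """Replace each matched pattern with its placeholder token."""
    for placeholder, pattern in PII_PATTERNS.items():
        text = pattern.sub(placeholder, text)
    return text

sample = "Contact Jane at jane.doe@example.org or +44 20 7946 0958, case 20231145."
print(redact(sample))
# Contact Jane at [EMAIL] or [PHONE], case [ID].
```

Note that the name ‘Jane’ survives untouched: names, addresses and context-dependent identifiers are exactly the kind of PII a pattern-based filter misses, which is why human review remains essential before anything sensitive goes near a general-purpose model.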
While AI offers transformative potential, its reliance on data mining means that we must all exercise extreme caution when handling sensitive information. Organisations like OLI and our translators must prioritise data privacy and security, understanding that not all data is suitable for processing by general-purpose AI models, and that human oversight or specialised, secure AI solutions (on-premise AI deployments or private cloud-based LLM instances) are often required.
4. AI and IP
The rise of AI in translation, particularly with Large Language Models (LLMs), presents a complex web of ethical challenges, especially concerning the Intellectual Property (IP) rights of both original authors and human translators. These challenges stem from how AI models are trained, how they generate content, and the evolving legal landscape surrounding AI creativity (Generative AI).
The most fundamental challenge arises from the training data of LLMs. These models learn by ingesting vast amounts of text from the internet, digitised books, articles, and other sources. Much of this data is copyrighted. When LLMs ‘mine’ this data, they are essentially making copies and creating derivative works in the form of their learned statistical patterns. The question is whether this constitutes copyright infringement. Authors have argued that their works are being used without permission or compensation, potentially undermining their economic rights. On February 11, 2025, a federal court in the USA issued the first major decision concerning the use of copyrighted material to train AI, but it is not the final word on the issue of copyright infringement in the AI training context.
More generally, authors and their representatives contend that the scale of copying is immense, and the resulting AI models directly compete with human creators, thus exceeding the bounds of fair use. There’s a growing demand for clear ‘opt-out’ mechanisms that allow authors to prevent their works from being used for AI training. Without such mechanisms, authors feel they have no control over how their intellectual property is exploited. While LLMs don’t typically ‘cite’ sources, there’s a risk that their output might inadvertently or intentionally reproduce copyrighted phrases, styles, or even entire passages from their training data. This raises questions of plagiarism and proper attribution, which are central to academic and creative integrity.
Translators face a distinct set of IP challenges. A human translation is generally considered a ‘derivative work’ and is protected by copyright, with the translator typically holding the copyright to their specific translation (unless assigned otherwise by contract). AI blurs these lines. On the one hand, AI is trained on vast swaths of data, some of which may include copyrighted translations; on the other, to what extent can a translator who post-edits an AI translation claim ‘authorship’ of the result?
There is a long way to go before all of these ethical dilemmas are resolved. As long as the legal framework remains unclear, a human translation will always be safer than an AI translation, at least from a copyright standpoint.
5. AI is very resource-intensive
The natural resources consumed by AI pose another significant challenge to its widespread adoption and sustainable development. The cost isn’t just financial: it encompasses vast quantities of energy, specialised hardware, and even water, with substantial environmental implications.
The core of AI, particularly deep learning and Large Language Models (LLMs), relies on immense computational power. Training a cutting-edge LLM involves optimising hundreds of billions, or even trillions, of parameters over petabytes of data (one petabyte equals one quadrillion bytes). This requires:
- Thousands of GPUs/TPUs: Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs) are specialised processors optimised for the parallel computations needed for neural networks. Thousands of these powerful chips run continuously, often for weeks or even months, to train a single large model. For instance, training OpenAI’s GPT-4 reportedly cost over $100 million in computing alone, with Google’s Gemini Ultra training costs estimated at $191 million.
- Massive Data Centers: These computations occur in data centres: large, climate-controlled warehouses filled with racks of servers. These facilities consume incredible amounts of electricity, not just for the computational work but also for crucial cooling systems. A single large AI data centre can consume as much power as 100,000 households.
- High Energy Consumption: The energy footprint is staggering. While the figures vary, estimates suggest that data centres could account for 20% of global electricity use by 2030–2035, with AI being the most significant driver. A single AI image generation can consume as much energy as fully charging a smartphone.
Running AI models also consumes significant energy. Every time a user interacts with an AI (e.g., asking ChatGPT a question, generating an image), energy is expended. With billions of daily AI interactions, the cumulative energy consumption adds up dramatically. A ChatGPT query, for instance, can consume 5–10 times more electricity than a simple web search.
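As a rough, back-of-envelope illustration only, here is what that cumulative effect could look like, assuming the often-cited (and dated) estimate of roughly 0.3 Wh per conventional web search, the 5–10 times multiplier quoted above, and a purely hypothetical figure of one billion AI queries per day:

```python
# Back-of-envelope sketch only: every figure below is an assumption, not a measurement.
web_search_wh = 0.3                  # often-cited rough estimate per web search (Wh)
queries_per_day = 1_000_000_000      # hypothetical: one billion AI queries per day

for multiplier in (5, 10):           # the 5-10x range mentioned above
    wh_per_query = web_search_wh * multiplier
    daily_mwh = wh_per_query * queries_per_day / 1_000_000   # Wh -> MWh
    print(f"{multiplier}x: ~{wh_per_query:.1f} Wh per query, ~{daily_mwh:,.0f} MWh per day")
# 5x: ~1.5 Wh per query, ~1,500 MWh per day
# 10x: ~3.0 Wh per query, ~3,000 MWh per day
```

Even under these assumptions, that works out to roughly 1.5–3 GWh per day just to answer queries, before any training runs are counted.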
The demand for AI has caused the need for high-performance computing hardware to surge, leading to greater resource extraction, more complex supply chains, and increased electronic waste (e-waste). The manufacturing of GPUs, TPUs, and other AI-specific components requires significant amounts of rare earth minerals and other raw materials, often extracted through environmentally damaging processes. The global supply chain for these components is intricate, involving substantial energy consumption for manufacturing, transportation, and assembly. The rapid pace of AI development leads to the quick obsolescence of hardware. As newer, more powerful chips are released, older ones are discarded, contributing to a growing e-waste problem that contains hazardous materials.
The intense heat generated by thousands of continuously running processors necessitates sophisticated cooling systems in data centres. Many data centres rely on chilled water systems to dissipate heat and prevent server overheating. This requires vast quantities of fresh water. Some estimates suggest that Microsoft’s water consumption increased by 34% due to AI-related cooling demands. A simple 20–50 question conversation with ChatGPT can consume the equivalent of a 500 ml bottle of water. Many data centres are also built in areas with cheap land, which can often be water-stressed regions. This places additional pressure on local water supplies and ecosystems, particularly in drought-prone regions.
The ‘carbon footprint’ of AI is a growing concern, contributing to greenhouse gas emissions and climate change. The immense cost means that only a handful of highly funded organisations (e.g., Google, Microsoft, OpenAI, Amazon) can afford to develop and train frontier AI models. This creates a highly centralised AI landscape, limiting access for smaller research institutions, startups, and developing nations. The current trajectory of AI resource consumption is not sustainable in the long term without significant advancements in energy efficiency, renewable energy integration for data centres, and more efficient AI architectures.
Addressing these issues is crucial for the responsible and sustainable evolution of AI. At the Open Language Initiative, we aim for a reasoned use of AI: whenever possible, we shall continue to rely on our expert human intelligence (our qualified translators) for a sustainable future for all.