Using LLMs to Develop Software: Ethics, Risks, and Responsible Practice

A living framework for NGOs, civil society organizations, and mission-driven teams navigating AI adoption in software development.

First draft - due for update May 2026


Collaborators

This framework was built in collaboration with, and has been adopted by, the following partners:

Introduction

AI coding tools have moved from novelty to daily workflow in under two years. Andrej Karpathy coined the term "vibe coding" in early 2025 - describing developers who prompt AI, accept all suggestions, and barely read the output. By early 2026, he had already moved on, calling the practice outdated and advocating instead for "agentic engineering": careful, supervised AI-assisted development with full human oversight 1. While early-2025 AI models were shown in some cases to have a net negative impact on developer productivity 21, models have improved significantly by early 2026, alongside growing efforts within open-source communities to establish appropriate governance and usage policies.

This trajectory tells us something important: the tools are real and improving rapidly, but the hype cycle consistently outpaces responsible adoption. For any organization working in the public interest, the current prevalence of LLM use in software development demands a governance approach that gives deliberate attention to the ethics of that use.

This document provides a framework in three parts: the ethical concerns AI adoption raises and how to mitigate them; the specific responsibilities that arise when AI intersects with open-source practice; and the human dimensions - learning, craft, and cognition - that must be protected as these tools become pervasive.


1. Ethical Concerns

AI adoption is not just a tooling decision. It is a values decision. Below are the primary ethical risks, alongside practical mitigation strategies.

1.1 Data Privacy and Security

Prompts sent to proprietary AI services may be stored or reused. Pasting sensitive data - beneficiary records, donor information, strategy documents, personnel details - into a commercial AI tool creates privacy and security exposure. This risk is less acute during routine software development, which rarely requires such data, but code, configuration, and logs can still contain secrets or personal information.

Research published in Nature Scientific Reports highlights the cybersecurity risks inherent in AI-generated code, including injection vulnerabilities, insecure templates, and insufficient input validation 2.

Mitigation approaches:

1.2 Bias and Discrimination

AI models are trained on internet-scale data that reflects existing societal biases. Research has shown LLMs associating specific ethnic groups with violence, reproducing gender stereotypes, and skewing outputs toward Western perspectives 3. For organizations serving marginalized communities globally, this is not an abstract concern - it is an operational risk. AI-generated content, code, or analysis may silently encode assumptions that undermine the very populations an organization exists to serve.

In coding, bias can surface in less obvious ways: culturally narrow test data, dataset assumptions, internationalization blind spots, and biased evaluation criteria in synthetic datasets.

Mitigation approaches:

1.3 Environmental Cost

Training large language models requires enormous computational resources. Each query consumes energy. For organizations with environmental or sustainability principles, uncritical adoption of AI tools creates a tension between productivity gains and ecological impact.

Mitigation approaches:

1.4 Labour and Exploitation

The refinement of AI models often relies on low-paid human labor for data labeling and content moderation, frequently in low- and middle-income economies. The training data itself was often collected without consent from its creators. Using these tools means participating in a supply chain with unresolved ethical questions about consent, compensation, and intellectual property 20.

Mitigation approaches:

1.5 Intellectual Property and Copyright

Current AI models raise unresolved questions about copyright. The LLVM Project's AI policy states it clearly: using AI tools to regenerate copyrighted material does not remove the copyright, and contributors remain responsible for ensuring nothing infringing enters their work 4. The risk includes inadvertently incorporating copyrighted code or text into publicly released outputs.

Unfortunately, there are few ways to avoid the underlying ethical concern of stolen intellectual property, aside from not using the models at all. Even 'open' models will generally contain traces of stolen training material.

Mitigation approaches:

1.6 Digital Divide and Equity

AI coding tools are already more accessible to people in wealthy countries, and as the technology industry attempts to recoup its enormous capital investments, prices are likely to rise. At the same time, AI tools are eroding the equitable commons of free and open-source knowledge and universally accessible knowledge bases like Stack Exchange. There is a real risk of a two-tier system developing: massively powerful tools running in corporate data centers for the well-resourced, much less capable local instances for everyone else, and a diminished shared commons between the two 5.

Mitigation approaches:


2. AI in Open Source: Responsibility, Pressure, and Maintenance

AI affects not just how we code, but how we participate in the commons.

2.1 Asymmetric Pressure and Extractive Contributions

Dries Buytaert, lead of the Drupal project, describes the core problem precisely: AI makes it cheaper to contribute, but it does not make it cheaper to review 6. More contributions are flowing into open-source projects, but the burden of evaluating them still falls on the same small group of maintainers. This creates asymmetric pressure that risks burning out the people who hold projects together.

The LLVM Project introduced the concept of an "extractive contribution" - one where the cost to maintainers of reviewing it exceeds the benefit to the project 4. Before AI, posting a change for review signalled genuine interest from a potential long-term contributor. AI has decoupled effort from intent. A drive-by contributor can now generate a large patch in minutes and shift hours of review work onto volunteers.

Daniel Stenberg, maintainer of curl, canceled the project's bug bounty program after AI-generated reports flooded his seven-person security team - fewer than one in twenty turned out to be real bugs. Yet in the same period, an AI security startup used AI well and found all 12 zero-day vulnerabilities in a recent OpenSSL security release, some of which had gone undetected for over 25 years 7. The difference was not whether AI was used. It was expertise and intent.

AI-generated code also frequently reinvents the wheel - producing custom implementations rather than leveraging well-tested community libraries. This creates fragmentation and shifts maintenance burden onto the ecosystem 8.

Mitigation: review discipline and contribution hygiene. Good engineering practice matters more than ever. Organizations should formalize policies addressing AI in contributions. For practical guidance, see Working with AI Tools as a Developer and Repo Checklist.

Regardless of whether AI is used:

2.2 Long-Term Maintenance

While AI can be effective for quickly getting something up and running, it creates a significant pain point when it comes to maintaining or upgrading that code. If the people responsible for a codebase do not understand how it was built, they will eventually hit a wall - making maintenance, debugging, and upgrades extremely difficult. This can ultimately restrict an organization's ability to build anything new, because it is trapped by code it cannot confidently modify 9.

AI tools are demonstrably helpful when assisting someone who already understands the codebase and the broader technical landscape, but they are far less reliable as a substitute for that understanding.

See AI-Assisted Coding Guide for details on appropriate usage of LLMs.

2.3 What Leading Projects Are Doing

Project responses range from cautious acceptance to outright bans. The landscape is moving fast, but the following represent the most significant approaches as of early 2026. Notably, the platforms hosting open-source projects have been slow to provide maintainer tooling for filtering or flagging AI-generated contributions - several projects cite this as a direct driver of their restrictive policies 10. OSS foundations, meanwhile, have largely focused on licensing questions rather than the quality and burnout crisis maintainers are facing now 10.

Disclosure and accountability:

Still navigating:

Restrictive approaches or bans:

The common thread: human accountability, transparent AI use, respect for maintainer time, and protection of the commons. Notably, even projects that ban external AI contributions often use AI internally - the issue is not the tool itself but the absence of understanding, accountability, and genuine engagement behind the contribution.


3. Sustaining Human Skill, Judgment, and Craft

AI tools are powerful, but they interact with human cognition in ways that require deliberate management.

3.1 Cognitive Risks

Research confirms what many developers suspect: how AI is used matters as much as whether it is used. In a randomized study, participants who relied solely on generated code scored just 24–39% on follow-up comprehension tests, while those who asked for explanations scored 65–86% 22. The delegation group finished fastest - but retained the least.

Several patterns can erode skill if left unchecked:

For small teams, this is a serious risk. If developers stop deeply understanding the systems they maintain, there is no safety net - and the people least equipped to debug AI-written code may be those whose skills were eroded by relying on it.

Mitigation approaches:

3.2 Preserving the Craft of Engineering

AI can generate syntactically correct code quickly. But framing the right problem, designing architecture, evaluating trade-offs, aligning with stakeholder needs, and mentoring others - these remain deeply human tasks. As AI handles more of the mechanical work of coding, it becomes more important, not less, for human interaction to focus on problem framing, approach discussion, and alignment before setting AI to do the implementation work.

This applies with particular force to junior developers. The errors and dead ends that feel frustrating during independent work are also where the deepest learning happens. Skipping that struggle in favor of AI-generated solutions can create a gap between apparent productivity and actual competence - one that may not become visible until something breaks in production.

Practical commitments:


Guiding Principles

  1. Human accountability is non-negotiable. AI assists; humans decide, review, and own the output.
  2. Transparency is mandatory. When AI is used, it should be disclosed - in code commits, in documents, in reports.
  3. Protect your maintainers. Never allow AI to increase the burden on those who review and maintain code without providing corresponding relief.
  4. Prioritize learning over speed. An organization's greatest asset is its people. If AI adoption undermines their ability to learn and grow, the short-term productivity gain is not worth it.
  5. Never input sensitive data into commercial AI tools. Beneficiary data, personnel information, strategic documents, and donor details must not enter commercial AI systems without clear data governance.
  6. Interrogate bias actively. Every AI output that touches the communities you serve should be critically evaluated for embedded assumptions.
  7. Respect the open-source commons. Ensure AI-assisted contributions are high quality, transparent, and do not extract more from maintainers than they give back.
  8. Champion equitable access. Advocate for and invest in open-source models that can run locally, ensuring communities are not left behind.
  9. Use fit-for-purpose models. Match the tool to the task; do not default to the largest available model.
  10. Always favor small, reviewable changes. Good engineering discipline is the best defense against AI-generated complexity.

Living Document Commitment

AI capabilities, norms, and risks evolve rapidly. This document should be reviewed and updated at least every three months. Responsible AI adoption is not about maximizing automation - it is about responsibly augmenting human capacity while protecting beneficiaries, contributors, the open-source ecosystem, and the long-term capability of teams.

This framework is intended as a starting point for consultation among NGOs, civil society organizations, and mission-driven teams. Contributions, critique, and adaptation are welcome.


References

  1. Karpathy, A. (2025–2026). From "vibe coding" to "agentic engineering."
  2. Nature Scientific Reports (2026). Cybersecurity risks in AI-generated code.
  3. Queen Margaret University Library. Generative AI: Ethics.
  4. LLVM Project. AI Tool Policy.
  5. Stack Overflow Blog (2025). Whether AI is a bubble or revolution, how does software survive?
  6. Buytaert, D. (2025). AI creates asymmetric pressure on open source.
  7. AI finds 12 of 12 OpenSSL zero-days while curl cancelled its bug bounty.
  8. Mapscaping Podcast. Vibe coding and the fragmentation of open source.
  9. Caimito (2025). The recurring dream of replacing developers.
  10. Holterhoff, K. (2026). AI Slopageddon and the OSS Maintainers. RedMonk.
  11. Nair, K. (2026). AI usage in popular open source projects.
  12. Linux kernel mailing list AI Policy.
  13. cURL contribution policy: On AI use in curl.
  14. QGIS Enhancement Proposal. AI Tool Policy.
  15. GDAL AI Tool Policy.
  16. OpenDroneMap AI contribution policy discussion.
  17. Debian AI General Resolution withdrawn.
  18. Hicks, C. Cognitive helmets for the AI bicycle.
  19. Cloud Native PG. AI Usage Policy.
  20. Regilme, S. S. F. (2024). Artificial Intelligence Colonialism.
  21. Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity.
  22. Shen, H. S., & Tamkin, A. (2026). How AI Impacts Skill Formation.
  23. Epoch AI. How much energy does ChatGPT use?
  24. OpenAI. Data Residency Docs.
  25. International Energy Agency. Global Emissions Report.
  26. Founders Pledge. Climate and Lifestyle Report.
  27. Effective Environmentalism. Climate Charity Recommendations.

Additional Sources

The following sources informed the development of this framework but are not directly cited above.


Disclaimer: Initial content summarized by Claude Opus 4.6 from the sources listed above, then manually reviewed and edited. This document is released for consultation and collaborative refinement.


Appendix A: Methodology for Estimating LLM Energy & CO₂ Emissions and Donation Proxy

There is no easy way to estimate energy usage of LLM queries.

Below are some simple 'back-of-the-envelope' calculations to give a rough estimation of the potential magnitude of energy consumption.

1. Approximate LLM Usage

The most accurate approach would be to average token use per team member on a given provider.

However, as we do not have a prescriptive usage policy, and developers can use open models, we need approximations:

6400 prompts per month

2. Convert Queries to Electricity Usage

Energy per query: ~0.018 kWh

115.2 kWh per month (for a 5-person team)

3. Convert Electricity to CO₂ Emissions

tCO₂e = kWh_total × gCO₂_per_kWh / 1,000,000

0.051 tonnes CO₂ equivalent produced per month

4. Convert Emissions to Donation Proxy

Recommendation: ~26 USD donation per month, for a team of 5 devs using LLM assistance.
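The four steps above can be collected into a minimal back-of-the-envelope sketch. The prompt count and per-query energy come from this appendix; the grid carbon intensity (~442 gCO₂e/kWh) and the donation rate (~510 USD per tonne) are not stated explicitly here and are back-solved from the appendix's results, so all four constants should be treated as rough assumptions to be replaced with your own figures.

```python
# Back-of-the-envelope estimate of monthly LLM energy use, CO2e
# emissions, and a donation proxy for a 5-person development team.
# All constants are assumptions taken or derived from this appendix.

PROMPTS_PER_MONTH = 6400          # team total, from step 1
KWH_PER_QUERY = 0.018             # assumed energy per query, from step 2
GRID_G_CO2E_PER_KWH = 442         # assumed grid intensity (back-solved)
USD_PER_TONNE_CO2E = 510          # assumed donation rate (back-solved)

# Step 2: queries -> electricity
kwh_total = PROMPTS_PER_MONTH * KWH_PER_QUERY

# Step 3: electricity -> emissions (grams -> tonnes via 1e6)
tco2e = kwh_total * GRID_G_CO2E_PER_KWH / 1_000_000

# Step 4: emissions -> donation proxy
donation_usd = tco2e * USD_PER_TONNE_CO2E

print(f"{kwh_total:.1f} kWh, {tco2e:.3f} tCO2e, ~${donation_usd:.0f}/month")
```

Swapping in your own team's prompt volume and your provider's published energy figures (where available) will give a correspondingly better estimate.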