In this white paper, we explore using large language models and on-premises AI to transcribe and analyze pilot- controller communications, enhancing aviation safety and efficiency through real-time insights and secure, customizable deployment.
Introduction
As aviation operations become increasingly complex, the ability to capture, interpret, and act on communication between pilots and air traffic controllers in real time is vital for maintaining airspace safety. MosaicATM’s Sky-for-All Foundational Model Feasibility Study explores how large language models (LLMs) and foundational AI technologies can revolutionize voice analytics in aviation. By transcribing and analyzing controller-pilot communications, these models can detect anomalies, identify procedural deviations, and provide critical insights into safety events—either in real time or during post-flight analysis.
This whitepaper investigates the various use cases for a novel two-step approach that leverages voice-to-text transcription followed by contextual language analysis using LLMs. The goal: to unlock scalable, high-fidelity communication intelligence that improves both operational responsiveness and long-term safety assessment. Beyond cloud deployments, we assess the potential and benefits of on-premises implementations—particularly valuable in aviation, where data privacy, regulatory compliance, and real-time integration are paramount.

Use Case: Air Traffic Controller Training
LLMs can be trained using historic pilot-controller communications and learn to communicate as a
pilot or controller. In this workflow, an existing LLM, such as LLaMA, Mistral, etc, can be fine-tuned on a specific corpus of historical ATC communication data. In a simulation system, the LLM can be integrated as a pilot or controller chatbot and be trained to simulate complex scenarios and emergency situations. The LLM can provide contextual information, suggest responses, and evaluate the trainee’s decisions based on established protocols and best practices and FAA ATC communications manuals. The LLM trained with historical data can simulate realistic scenarios and generate off-nominal events to evaluate the trainee under a range of scenarios and ensure safety is maintained.
In this use case, due to specific behavior required by the training system, knowledge around
emergency situations, and strict safety requirements, the general LLM knowledge might be
insufficient to ensure an LLM meets the needs of the use case. Fine-tuning the selected LLM with
hours of controller-pilot communications relevant to the use cases would boost performance and
ensure the fine-tuned LLM meets the needs of the controller evaluation tool.

Use Case: Safety Incident Report Analysis
Incident reporting is a critical component of aviation safety management. However, safety reports
are often provided in free text form, and an in-depth analysis of the events and factors leading to the safety event often requires time-consuming manual review. LLMs can automate or act as a humanin-the loop companion for analyzing large volumes of incident reports by extracting key information, identifying trends, and summarizing the root causes. LLMs can help aviation safety analysts focus on high-priority issues by categorizing incidents based on severity, potential risk factors, and recurring themes. LLMs and systematic information extraction for all reports as they enter the system facilitate aggregated analysis of large amounts of historical data, as well as more in-depth analysis and understanding of the underlying factors.
A thorough analysis of safety reports requires two main steps. The first step consists of extracting the key entities defined by aviation SMEs. This knowledge is often documented as a taxonomy with aviation concepts and relationships. The second step is making this information available to analysts and answering questions like “Show me reports with human factor-related issues”, which require filtering, selecting reports of interest, and leveraging the defined taxonomy. A closely related foundational model technology (Graph-RAG) can solve this problem by first creating a graph-based taxonomy provided by SMEs, where each report is mapped to the graph and nodes represent classes in the taxonomy. Secondly, Graph-RAG can find an answer to a user query by leveraging the information in the reports and the taxonomy graph structure previously extracted from each report.
Use Case: Aviation Information Parsing
The aviation ecosystem generates large amounts of unstructured data, such as weather reports like METARs (Meteorological Aerodrome) or TAFs (Terminal Area Forecast), NOTAMs (Notices to Airmen), PIREPs (Pilot Reports), and many more. These reports feed a variety of systems, analysis tools, and research supporting ATM operations. Historically, parsers have been coded in different languages to extract the details from text reports, e.g., wind value from a METAR report. However, LLMs can translate unstructured text to a structured format with no need for complex parsing of information. Leveraging LLMs to process aviation textual information reduces development time, leading to more effective data processing, and is robust to format or syntax errors.
Fine-tuning of the LLM may be required if the general LLM was not exposed in the training process to enough information for the data source of interest. However, many of the current GPT-class models can execute such functionality as-is. If necessary, few-shot or other prompting approaches can be used to fill the knowledge gap.

Use Case: Regulatory Compliance
As drones become more integrated into controlled airspace, operators often need to apply for waivers to fly beyond visual line-of-sight (BVLOS) or in restricted areas. Mosaic ATM’s EASEL-AI uses LLMs to streamline the FAA waiver application approval process by parsing complex regulatory documents and operators’ applications to identify whether the waiver application is ready for approval or if additional information/changes need to be made. This is currently a time-consuming process in which the FAA reviews waiver applications manually. While not intended to be an automated replacement, integrating LLMs into the approval process would reduce manual effort on repetitive tasks. Through this, approval authority would focus on the high-level review and verification of the findings provided by the LLM, leading to a more efficient and thorough process.
In this LLM application, it is necessary to first extract all the relevant requirements from regulatory documentation in a textual form the LLM can understand. These requirements may need to be expanded with additional knowledge provided by waiver approvers, which would help the LLM to better align with the current approval process. The LLM would need to review each of the requirements and the waiver application of interest. This can be done via prompt engineering where the requirements are matched to waiver content, and the LLM is asked whether a requirement is met or not, and the reasoning behind the provided answer.
Use Case: Maintenance Manuals and Field Repair
Lengthy, complex manuals are common in aviation, such as aircraft systems (e.g. Instrument Landing System ILS manuals). Technicians in the field or the office need to navigate thousands of pages to locate the specific information they need. For new technicians in training, finding the correct information can be particularly challenging. LLMs can help by quickly parsing and summarizing the content of these manuals, providing technicians with concise, relevant instructions or troubleshooting steps. This capability reduces the time spent searching for information, enhances accuracy, and supports faster, more efficient maintenance operations, ultimately ensuring higher system reliability and safety. This use case falls under the search category for which RAG is well suited. In RAG, documents are first ingested as embeddings in a vector database, and the most relevant content to a user query is later extracted in real time.
Including multi-modal visual language model (VLM) capabilities would further enhance the abilities of this approach. Many aviation manuals, including images, are critical to system operation such as component illustrations, wiring diagrams, and system layouts. Using multi-modal RAG via VLMs would allow technicians to receive comprehensive aid and responses informed not only from pure text, but also visual context describing the relationships between components. Additionally, multi-modal VLMs enable advanced functionality such as image querying. In this example, a technician could upload a photo of a component, the retriever can identify this component against images and diagrams within the vector database, then provide informed answers in response to a technician’s questions such as “What is the repair procedure for this fuel line: {photo}?”. By considering both the textual and visual domains, this system closely aligns with human processes for repair and maintenance.

Use Case: Aircraft Inspections
Aircraft visual inspections are critical to ensure safety and cover key components such as wings,
control surfaces, the tail, propellers, jet engines, and other systems. Airlines’ quick turnarounds to
increase aircraft utilization lead to short time windows to perform aircraft inspections. Traditional
inspection methods lean heavily on experienced technicians to identify visible issues, but such processes are time-consuming and prone to human error, especially under tight deadlines. A multi-modal Vision Language Model (VLM) can significantly enhance aircraft pre-checks by integrating text and images and leveraging the flexibility of the VLMs to perform extensive checks quickly and improve safety.
In this example, VLMs can be leveraged by integrating both textual and visual inputs from inspections. For example, a technician could take images of critical components during a walkaround inspection. The VLM would then compare these images to a database of known aircraft conditions, identifying problems such as icing, corrosion, or damaged components. Going one step further, the VLM could also provide specific actions or instructions for additional areas that may require closer inspection based on the observations. While not a replacement for technicians, this would serve as an additional layer of safety and reduce human fatigue.
While out-of-the-box VLMs can identify the most visually noticeable issues, fine-tuning might be required for small defects with limited exposure during general training. Just as LLMs can be fine-tuned with additional textual content, VLMs such as LLaVA can be fine-tuned on additional imagecaption pairs. By fine-tuning VLMs to this task, it may be possible to quickly and reliably identify a variety of issues that require attention, such as cracks, corrosion, or delamination of materials. This would allow aviation stakeholders to maximize both the speed and thoroughness of inspections.

Use Case: Systems Engineering Applications
Integrating foundational models such as LLMs into systems engineering (SE) applications has the potential to revolutionize aviation projects at agencies such as NASA and the FAA. As projects continue to grow in complexity and technological demands, it is possible to leverage LLMs to advance and enhance SE workflows by bridging the gap across structured and unstructured data domains and modalities. By parsing SysML diagrams, such models can help both engineers and stakeholders better understand system architecture, components, and interactions. Such models can also be applied to map regulatory policies and project requirements directly into models-based systems engineering (MBSE), ensuring compliance. Additionally, this has the potential to automate the generation of SE artifacts such as specifications, requirement traceability matrices, and visual representations.
Integrating foundational model capabilities into MBSE workflows would likely utilize both VLMs for understanding graphical relationships between components and code-based LLMs for parsing and the underlying SysML code of such diagrams. While potentially powerful, challenges include technical integration with existing SE tools and the potential cultural adoption barriers of new technologies that need to be addressed to fully realize the advantages of integrating LLMs into the MBSE projects.
On-Premise Deployment
The above-described use cases leverage LLMs and closely related models and technologies such as VLMs and RAG for document storage, search, and retrieval. While commonly deployed in a cloud infrastructure such as AWS or Azure, many models are available open source for fine-tuning and can be deployed internally either on a server or even a single workstation with some “lightweight” models. In the aviation industry, on-premises deployment offers a critical advantage when dealing with sensitive, proprietary, or even classified data. These sectors often handle highly confidential information for mission-critical operations, such as flight systems or system integration data. While all such cloud providers offer assurances in data management and security for enterprises, onpremises services are an absolute guarantee in terms of data governance.
In addition to security and compliance, on-premises deployment allows aviation organizations to fully customize model applications to fit specific needs and use cases. With on-site control, such models can be fine-tuned on proprietary datasets, which may include operational data, maintenance records, and others. By bringing such capabilities directly on-premises, such models can interface directly with existing infrastructure without interference or additional layers of regulation included with cloud-based providers
Conclusion: LLMs for Air Traffic Communication Safety
The Sky-for-All Foundational Model Feasibility Study demonstrates the transformative potential of LLMs and related AI technologies in aviation communication analytics. From real-time alerting on protocol deviations to deep post-flight safety investigations, foundational models provide scalable solutions for extracting actionable intelligence from unstructured voice data.
By enabling secure on-premises deployment, aviation organizations can harness these AI capabilities while maintaining full control over proprietary and sensitive data. Whether deployed on a local server or a lightweight workstation, these models can be tailored to interface directly with existing systems, providing a powerful, cost-effective, and privacy-preserving alternative to traditional cloud-based solutions. MosaicATM’s study paves the way for the next generation of intelligent, adaptive, and mission critical tools in the national airspace system.
 
													 
													