Peach Pilot — Principal QA Engineer


<p><strong>Peach Pilot — Principal QA Engineer (AI Systems & Platform)</strong></p>
<p><strong>Remote — US-Based | Full-Time | Early Engineering Team</strong></p>
<p><strong>The Mission: Trust Has to Be Earned — Every Release</strong></p>
<p>95% of enterprise AI pilots fail — not because the technology is broken, but because users don't trust it. At Peach Pilot, we are building an enterprise AI operating system where trust is the product. That means every feature we ship must work exactly as the user expects, every time. One broken interaction at the wrong moment can undo months of adoption. You are the last line of defense before our platform reaches a CFO's desk.</p>
<p>We are a funded startup co-founded by <strong>Mario Montag</strong> (ex-McKinsey; founder of Predikto, which was acquired by a Fortune 50 company) and <strong>JP James</strong> (Georgia Tech alum, US patent holder in AI/ML, Senior Fellow at the National War College, four-time Atlanta 500 honoree).</p>
<p><strong>The Role</strong></p>
<p>This is a player-coach hire. You will build and own the QA function at Peach Pilot — writing test code, designing eval pipelines, and setting the quality bar — while also standing up and growing a QA team as the company scales. We are not looking for someone who manages spreadsheets and delegates everything. We are looking for someone who can do the work, knows what good looks like, and builds a team around that standard.</p>
<p><strong>The Challenge: QA for AI Is a Different Problem</strong></p>
<p>Traditional QA assumes deterministic outputs. LLMs don't give you that. You will be building a quality function from scratch in an environment where:</p>
<ul>
<li>Multi-model routing (Claude, GPT-4o, Grok, Gemini) means the same input can produce different outputs depending on which model handled it.</li>
<li>Agent orchestration and governance agents must maintain a structurally separate audit trail — any drift between execution and governance is a critical failure.</li>
<li>The file ingestion pipeline (Word, Excel, PowerPoint, PDF) must survive edge cases that enterprise clients will find within the first week of deployment.</li>
<li>Your users are CEOs and operations leaders who have never used a terminal. A confusing error state isn't a minor bug — it kills adoption.</li>
</ul>
<p>This is not a ticket-closing role. This is a quality architecture and team leadership role.</p>
<p><strong>What You Will Own & Build</strong></p>
<p><strong>Build the QA Foundation (First 90 Days)</strong></p>
<ul>
<li>Establish the testing framework from zero: unit, integration, end-to-end, and LLM-specific evaluation pipelines.</li>
<li>Define quality standards, test coverage requirements, and documentation practices in partnership with the Lead Engineer.</li>
<li>Audit the existing platform and identify the highest-risk surfaces before the next major customer deployment.</li>
<li>Define the team structure you will need — onshore vs. offshore mix, roles, and a hiring roadmap — and begin executing against it.</li>
</ul>
<p><strong>Build and Lead the QA Team</strong></p>
<ul>
<li>Recruit, hire, and onboard QA engineers as the team grows, setting clear expectations, working standards, and a bar for technical excellence from day one.</li>
<li>Mentor junior and mid-level QA engineers — building their ability to own test domains independently rather than creating dependency on you.</li>
<li>Act as the quality culture carrier across the full engineering team — QA is not a department, it is everyone's responsibility, and you will make that real.</li>
<li>Report directly to the Lead Engineer and participate in product planning to ensure quality is designed in, not bolted on.</li>
</ul>
<p><strong>AI & Agent Testing</strong></p>
<ul>
<li>Design evaluation frameworks for non-deterministic LLM outputs — including prompt regression testing, model drift detection, and output quality scoring across Claude, GPT-4o, Grok, and Gemini.</li>
<li>Build automated test suites for the agent orchestration layer, including governance agent audit trail integrity and human-override behavior.</li>
<li>Validate the Enterprise Knowledge Graph (Neo4j + vector search) for data accuracy, retrieval quality, and failure modes under real enterprise data conditions.</li>
</ul>
<p><strong>Platform & Integration Testing</strong></p>
<ul>
<li>Own end-to-end testing of the file ingestion pipeline across document types (Word, Excel, PowerPoint, PDF), including encryption, formatting edge cases, and audit trail continuity.</li>
<li>Validate streaming response handling, latency thresholds, and graceful degradation when a model is unavailable or slow.</li>
<li>Test multi-model routing logic to confirm cost-optimized task allocation behaves correctly across LLM providers.</li>
</ul>
<p><strong>UX Quality & FDE Support</strong></p>
<ul>
<li>Partner with the Full-Stack Engineer to define and test trust-layer UX standards — onboarding flows, progressive disclosure, uncertainty states, and real-time document viewers.</li>
<li>Build reusable test playbooks for Forward Deployed Engineers (FDEs) to use in new customer deployments and agent configurations.</li>
<li>Act as the internal advocate for the non-technical enterprise user — if a CEO would be confused by it, it doesn't ship.</li>
</ul>
<p><strong>Who You Are</strong></p>
<ul>
<li>Player-Coach: You have 7+ years of QA engineering experience, with at least 3 years in a lead or senior role where you both wrote test code and managed or mentored other engineers. You do not delegate the hard problems — you model how to solve them.</li>
<li>Team Builder: You have experience leading and growing QA teams. You know how to hire for technical excellence, set up engineers to own domains independently, and build accountability without micromanagement.</li>
<li>AI-Native Tester: You have hands-on experience testing LLM-powered applications — you understand prompt sensitivity, output variance, and how to build eval pipelines that catch regressions across model updates.</li>
<li>Automation-First: You write test code. Python is your primary tool. You have built and maintained CI/CD-integrated test suites and you don't wait for someone to file a bug to find one.</li>
<li>Integration Expert: You are comfortable testing complex API chains, async/streaming responses, and multi-service workflows. Document processing pipelines and knowledge graph outputs don't intimidate you.</li>
<li>0-to-1 Mindset: You have built a QA function from the ground up in an early-stage environment. You know when to move fast and when to go deep, and you can make that call without being told.</li>
<li>Enterprise Empathy: You understand that your end users are not developers. You test for confusion and trust failure, not just broken functionality.</li>
</ul>
<p><strong>The Stack You'll Test Against</strong></p>
<ul>
<li>AI/LLM: Anthropic Claude, OpenAI GPT-4o, xAI Grok, Google Gemini, OpenClaw</li>
<li>Frontend: React/Next.js, TypeScript, Tailwind CSS</li>
<li>Backend: Python, Node.js/TypeScript (FastAPI/Express)</li>
<li>Data & Graph: Neo4j, Snowflake, Azure Cosmos DB, Azure AI Search</li>
<li>Infrastructure: Azure (Functions, Key Vault), CI/CD pipelines</li>
<li>Visualization: Plotly, D3, Recharts, Mermaid</li>
</ul>
<p><strong>Even Better If</strong></p>
<ul>
<li>You have experience with LLM evaluation frameworks (e.g., LangSmith, PromptFlow, or custom eval pipelines).</li>
<li>You have tested agent frameworks such as LangChain or CrewAI.</li>
<li>You have a background in enterprise software or regulated industries where audit trail integrity is non-negotiable.</li>
<li>You have worked alongside Forward Deployed or solutions engineering teams and understand field deployment risk.</li>
</ul>
<p><strong>Why This Is Different</strong></p>
<ul>
<li>You are building the QA function and the team — not inheriting either. Your decisions will define how this company ships software for the next five years.</li>
<li>You will work directly with the founding engineering team. Your findings shape the roadmap, not a backlog queue.</li>
<li>Real enterprise data, real deployments, real consequences. No toy environments.</li>
<li>Meaningful equity as a founding-team engineering hire.</li>
</ul>
<p><strong>Compensation & Benefits</strong></p>
<ul>
<li>Base Salary: Broad range, commensurate with experience</li>
<li>Equity: Meaningful founding-team equity package</li>
<li>Benefits: Comprehensive medical, dental, and vision; 401(k); flexible PTO</li>
<li>Location: Fully Remote — US-Based</li>
</ul>
<p><strong>The Clincher</strong></p>
<p>Tell us about a quality failure — one you caught before it shipped, or one that got through. What did you build or change after it, and how did you make sure your team could catch the next one without you?</p>

