A hands-on role for someone who wants to engineer production prompts, evaluate AI agents rigorously, and ship LLM features at a real startup.

We're a small team building two products. Skillify runs career workshops and mentorship programs for students — we've reached thousands of students across 220+ schools. Kinship Labs (kinshipcomms.ai) is the AI-powered relationship management platform that powers Skillify and helps other organizations maintain personalized, human connections at scale across SMS and email. The AI agents you'll work on power both.

You'll work directly with the engineering and product teams to build, optimize, and evaluate the AI agents at the heart of our platforms. This is hands-on work — prompt engineering, LLM integration, conversation analysis, and building evaluation frameworks to make sure our AI systems perform reliably at scale.

Work environment: full-time office setting with a small, collaborative team. You'll have a dedicated workstation, access to professional AI tooling (Claude, OpenAI, etc.), and direct mentorship from senior engineers.

What you'll actually do

Prompt engineering & optimization (50%)

Engineer and iterate on system prompts to improve AI agent response quality
Design context window strategies for multi-turn conversations
Optimize prompts for specific use cases: coaching, follow-ups, scheduling, FAQs
Document prompt strategies and build a library of effective patterns
Experiment with different models (Claude, GPT-5, etc.) for various tasks

AI evaluation & testing (35%)

Design and execute evaluation frameworks (evals) to measure AI performance
Analyze AI agent conversations to identify failure modes and areas for improvement
Build automated testing pipelines for prompt regression testing
Apply NLP techniques to assess conversation quality, tone, and sentiment
Create benchmarks and dashboards to track AI performance over time

AI feature development (15%)

Assist in building new AI-powered features (summarization, classification, extraction)
Integrate LLM APIs into production systems
Prototype new AI capabilities and present findings to the team

What you'll learn

By the end of this internship, you'll be able to:

Engineer production prompts — design, test, and iterate on prompts that perform reliably across diverse inputs and edge cases
Evaluate AI systems rigorously — build systematic evaluation frameworks to measure LLM performance on real-world tasks
Apply NLP in production — translate academic NLP knowledge into practical conversation analysis and quality assessment
Optimize for cost and latency — make informed tradeoffs between model capability, response time, and API costs
Use modern development workflows — Git/GitHub, code review, CI/CD, and collaborative software development
Communicate technical findings — present AI performance insights and recommendations to non-technical stakeholders

What we're looking for

Required

Currently enrolled in, or recently completed, a Master's program in Computer Science, Data Science, or a related field (exceptional Bachelor's candidates considered)
Coursework in NLP, machine learning, or AI
Familiarity with Python or JavaScript programming
Strong written and verbal communication skills

Nice to have

Experience with prompt engineering or building LLM-powered applications
Familiarity with evaluation frameworks (e.g., RAGAS, custom evals)
Understanding of embeddings, RAG architectures, or vector databases
Experience with Git and GitHub
Personal projects, research, or coursework involving LLMs or conversational AI

Tech you'll work with

AI APIs: Claude API (Anthropic), OpenAI API
Languages: Python, JavaScript, SQL
Backend: Node.js, Express, Prisma ORM
Database: PostgreSQL via Supabase
Version control: Git/GitHub
Communication: Slack

Mentorship & path to full-time

You'll get daily check-ins with progress reviews, weekly 1:1 mentorship sessions, direct access to senior engineers and our CTO for technical guidance, and code review on all your work.

This internship is designed with a potential transition to a full-time role in mind. At the end of the 3-month program, strong performers may be offered a full-time position as a Data Scientist or AI Engineer. Final hiring depends on mutual fit, business needs, and available funding at the time of evaluation.

Sound like you?

If you read this and thought "I was literally born for this" — we want to hear from you.
