A hands-on role for someone who wants to engineer production prompts, evaluate AI agents rigorously, and ship LLM features at a real startup.
We're a small team building two products. Skillify runs career workshops and mentorship programs for students — we've reached thousands of students across 220+ schools. Kinship Labs (kinshipcomms.ai) is the AI-powered relationship management platform that powers Skillify and helps other organizations maintain personalized, human connections at scale across SMS and email. The AI agents you'll work on power both.
You'll work directly with the engineering and product teams to build, optimize, and evaluate the AI agents at the heart of our platforms. This is hands-on work — prompt engineering, LLM integration, conversation analysis, and building evaluation frameworks to make sure our AI systems perform reliably at scale.
Work environment: full-time office setting with a small, collaborative team. You'll have a dedicated workstation, access to professional AI tooling (the Claude and OpenAI APIs, among others), and direct mentorship from senior engineers.
What you'll actually do
Prompt engineering & optimization (50%)
- Engineer and iterate on system prompts to improve AI agent response quality
- Design context window strategies for multi-turn conversations
- Optimize prompts for specific use cases: coaching, follow-ups, scheduling, FAQs
- Document prompt strategies and build a library of effective patterns
- Experiment with different models (Claude, GPT-5, etc.) for various tasks
AI evaluation & testing (35%)
- Design and execute evaluation frameworks (evals) to measure AI performance
- Analyze AI agent conversations to identify failure modes and areas for improvement
- Build automated testing pipelines for prompt regression testing
- Apply NLP techniques to assess conversation quality, tone, and sentiment
- Create benchmarks and dashboards to track AI performance over time
AI feature development (15%)
- Assist in building new AI-powered features (summarization, classification, extraction)
- Integrate LLM APIs into production systems
- Prototype new AI capabilities and present findings to the team
What you'll learn
By the end of this internship, you'll be able to:
- Engineer production prompts — design, test, and iterate on prompts that perform reliably across diverse inputs and edge cases
- Evaluate AI systems rigorously — build systematic evaluation frameworks to measure LLM performance on real-world tasks
- Apply NLP in production — translate academic NLP knowledge into practical conversation analysis and quality assessment
- Optimize for cost and latency — make informed tradeoffs between model capability, response time, and API costs
- Use modern development workflows — Git/GitHub, code review, CI/CD, and collaborative software development
- Communicate technical findings — present AI performance insights and recommendations to non-technical stakeholders
What we're looking for
Required
- Currently enrolled in, or a recent graduate of, a Master's program in Computer Science, Data Science, or a related field (exceptional Bachelor's candidates considered)
- Coursework in NLP, machine learning, or AI
- Familiarity with Python or JavaScript programming
- Strong written and verbal communication skills
Nice to have
- Experience with prompt engineering or building LLM-powered applications
- Familiarity with evaluation frameworks (e.g., RAGAS, custom evals)
- Understanding of embeddings, RAG architectures, or vector databases
- Experience with Git and GitHub
- Personal projects, research, or coursework involving LLMs or conversational AI
Tech you'll work with
- AI APIs: Claude API (Anthropic), OpenAI API
- Languages: Python, JavaScript, SQL
- Backend: Node.js, Express, Prisma ORM
- Database: PostgreSQL via Supabase
- Version control: Git/GitHub
- Communication: Slack
Mentorship & path to full-time
You'll get daily check-ins with progress reviews, weekly 1:1 mentorship sessions, direct access to senior engineers and our CTO for technical guidance, and code review on all your work.
This internship is designed with a potential transition to a full-time role in mind. At the end of the 3-month program, strong performers may be offered a full-time position as a Data Scientist or AI Engineer. Final hiring depends on mutual fit, business needs, and available funding at the time of evaluation.
Sound like you?
If you read this and thought "I was literally born for this," we want to hear from you.