Employment Type

Full Time

Experience

0-2 years

Research Intern - Human Intelligence

11/27/2025

Research Interns will collaborate with mentors and fellow interns to advance research in human intelligence. They are expected to present findings and contribute to the community during the internship.

Working Hours

40 hours/week

Company Size

10,001+ employees

Language

English

Visa Sponsorship

No

About The Company
Every company has a mission. What's ours? To empower every person and every organization to achieve more. We believe technology can and should be a force for good and that meaningful innovation contributes to a brighter world in the future and today. Our culture doesn’t just encourage curiosity; it embraces it. Each day we make progress together by showing up as our authentic selves. We show up with a learn-it-all mentality. We show up cheering on others, knowing their success doesn't diminish our own. We show up every day open to learning our own biases, changing our behavior, and inviting in differences. Because impact matters. Microsoft operates in 190 countries and is made up of approximately 228,000 passionate employees worldwide.
About the Role
Research Interns put inquiry and theory into practice. Alongside fellow doctoral candidates and some of the world's best researchers, Research Interns learn, collaborate, and network for life. Research Interns not only advance their own careers, but they also contribute to exciting research and development strides. During the 12-week internship, Research Interns are paired with mentors and expected to collaborate with other Research Interns and researchers, present findings, and contribute to the vibrant life of the community. Research internships are available in all areas of research and are offered year-round, though they typically begin in the summer.

Responsibilities

Research and prototype methods in areas such as:
- Unified face normalization across modalities (e.g., RGB↔NIR), with joint prototype + feature learning and cross-modal alignment.
- Multimodal face recognition (fusion across RGB, NIR, depth/IR, and audio cues where appropriate), with robustness/fairness under distribution shift.
- LLM-aided face verification: explore Vision Language Model (VLM) / Large Language Model (LLM) pipelines that (i) use visual context in the photo (attributes, scene cues, spatiotemporal hints) to assist verification; (ii) provide interpretable rationales; and (iii) improve failure detection and human-in-the-loop triage.
- Efficiency and reliability: distillation/quantization/pruning, lightweight encoders/normalizers, calibration and uncertainty, liveness/anti-spoofing integration.

In addition, Research Interns will:
- Evaluate thoroughly: define datasets and protocols; run ablations and benchmarks (ROC, EER, TPR@FAR, latency/memory, fairness/robustness); an illustrative metric-computation sketch follows at the end of this section.
- Gain production immersion: learn Windows Hello-style pipelines (signals, constraints, on-device considerations) to align research with deployment.
- Publish: communicate results via talks, internal tech reports, and submissions to top venues.

Required Qualifications
- Currently enrolled in a Master's or Ph.D. program in CS, EE, Applied Math, or a related field, with a focus in vision/graphics/ML.

In addition to the qualifications below, you'll need to submit a minimum of two reference letters for this position, as well as a cover letter and any relevant work or research samples. After you submit your application, a request for letters may be sent to your list of references on your behalf. Note that reference letters cannot be requested until after you have submitted your application, and that they might not be automatically requested for all candidates. You may wish to alert your letter writers in advance so they will be ready to submit your letter.

Preferred Qualifications
- Publications in CVPR/ICCV/ECCV/NeurIPS/ICLR/ICML/SIGGRAPH or related journals.
- Experience in face recognition/verification, multimodal learning/fusion, metric learning, representation or generative modeling.
- Depth in multimodal normalization (pre-FR normalizers, prototype learning) and RGB↔NIR FR.
- Experience with VLMs/LLMs (prompting, fine-tuning, tool use) for visual reasoning, explainability, or safety.
- Scalable training (DDP/multi-node), dataset curation, reproducible MLOps; familiarity with liveness/FAS and fairness/robustness evaluation.
- Proficient PyTorch/JAX background; ability to implement and reproduce SOTA methods.
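For candidates brushing up on the evaluation metrics named above, the following is a minimal, illustrative Python sketch (not part of the official posting) of how ROC-derived TPR@FAR and EER are typically computed from verification similarity scores. The synthetic genuine/impostor scores and the FAR operating point of 1e-3 are assumptions chosen purely for demonstration; it relies only on NumPy and scikit-learn.

# Illustrative only: compute ROC, TPR@FAR, and EER from verification scores.
# The synthetic genuine/impostor similarity scores below are assumptions for demonstration.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
genuine = rng.normal(0.7, 0.1, 1_000)    # similarity scores for same-identity pairs
impostor = rng.normal(0.3, 0.1, 10_000)  # similarity scores for different-identity pairs
scores = np.concatenate([genuine, impostor])
labels = np.concatenate([np.ones_like(genuine), np.zeros_like(impostor)])

# ROC curve: false-accept rate (FAR/FPR) vs. true-positive rate (TPR) over all thresholds.
fpr, tpr, _ = roc_curve(labels, scores)

# TPR at a fixed false-accept rate, e.g. FAR = 1e-3 (fpr from roc_curve is sorted ascending).
tpr_at_far = np.interp(1e-3, fpr, tpr)

# Equal Error Rate: operating point where the false-accept rate equals the false-reject rate (1 - TPR).
idx = np.argmin(np.abs(fpr - (1.0 - tpr)))
eer = (fpr[idx] + (1.0 - tpr[idx])) / 2.0

print(f"TPR@FAR=1e-3: {tpr_at_far:.3f}   EER: {eer:.3f}")

EER is a threshold-free summary of verification accuracy, while TPR@FAR reflects performance at the strict operating points typical of on-device face unlock.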
Key Skills
Research, Collaboration, Face Recognition, Multimodal Learning, Metric Learning, Representation Modeling, Generative Modeling, Vision Language Models, Large Language Models, Scalable Training, Dataset Curation, MLOps, PyTorch, JAX, Fairness Evaluation, Robustness Evaluation
Categories
Science & Research, Technology, Education
Apply Now

Please let Microsoft know you found this job on PrepPal. This helps us grow!

Get Ready for the Interview!

Did you know that we have a special program that includes interview questions asked by Microsoft?

Elevate your application

Generate a resume or cover letter, or prepare with our AI mock interviewer, all tailored to this job's requirements.