Senior Data Engineer / Data Curator

6/25/2025

Design and implement data pipelines for processing and curating large datasets used in model training. Collaborate with model teams to ensure data aligns with model requirements and performance goals.

Working Hours

40 hours/week

Company Size

10,001+ employees

Language

English

Visa Sponsorship

No

About The Company
Established in 1987, TSMC is the world's first dedicated semiconductor foundry. As the founder and a leader of the Dedicated IC Foundry segment, TSMC has built its reputation by offering advanced and "More-than-Moore" wafer production processes and unparalleled manufacturing efficiency. From its inception, TSMC has consistently offered the foundry segment's leading technologies and TSMC COMPATIBLE® design services. TSMC has experienced strong growth by building solid partnerships with its customers, large and small. IC suppliers from around the world trust TSMC with their manufacturing needs, thanks to its unique integration of cutting-edge process technologies, pioneering design services, manufacturing productivity, and product quality. The company's total managed capacity exceeded 9 million 12-inch equivalent wafers in 2015. TSMC operates three advanced 12-inch wafer fabs, four eight-inch wafer fabs, one six-inch wafer fab (Fab 2), and two backend fabs (Advanced Backend Fab 1 and 2). TSMC also manages two eight-inch fabs at wholly owned subsidiaries: WaferTech in the United States and TSMC China Company Limited. In addition, TSMC obtains eight-inch wafer capacity from other companies in which it holds an equity interest. TSMC is listed on the Taiwan Stock Exchange (TWSE) under ticker number 2330, and its American Depositary Shares trade on the New York Stock Exchange (NYSE) under the symbol "TSM".

About the Role

A job at TSMC Arizona offers an opportunity to work at the most advanced semiconductor fab in the United States. TSMC Arizona’s first fab will operate its leading-edge N4 process technology, starting production in the first half of 2025. The second fab will use its leading-edge N3 and N2 process technologies and be operational in 2028. The recently announced third fab will manufacture chips using 2nm or even more advanced process technology, with production starting by the end of the decade. America’s leading technology companies are ready to rely on TSMC Arizona for the next generations of chips that will power the digital future.

As a Senior Data Engineer in the AI Data Curation track, you will ensure that the data powering our AI models is high-quality, well-organized, and fit for use in model training and deployment. You will play a key role in designing and maintaining scalable data pipelines, ensuring that data is clean, relevant, and aligned with ethical and compliance standards.

Responsibilities:

  • Design and implement data pipelines for processing, cleaning, and curating large datasets used in model training and fine-tuning.
  • Automate data cleaning processes (e.g., removing noise, duplicates, irrelevant content) and ensure datasets are appropriately labeled and structured.
  • Collaborate with model teams to ensure data aligns with model requirements and performance goals.
  • Assess and mitigate bias in datasets, ensuring that models are trained on diverse and representative data.
  • Manage data storage and retrieval strategies, ensuring scalability and data consistency across different environments.
  • Conduct regular audits to ensure data integrity, privacy, and security compliance.

Minimum Qualifications/Requirements:

Education: Bachelor's degree (minimum) in Computer Science, Data Science, or a related field.

Technical Skills:

  • 5+ years of experience in data engineering, data wrangling, or data curation, particularly in machine learning or AI-driven environments.
  • Strong proficiency in Python (Pandas, NumPy) and SQL for data manipulation and querying.
  • Familiarity with cloud-based data storage (AWS S3, Google Cloud Storage, etc.) and distributed systems for managing large datasets.
  • Experience with data annotation tools and platforms for manual or semi-automated labeling.
  • Experience with NLP data formats, such as JSONL, text, or embeddings, and an understanding of tokenization.
  • Experience managing data pipelines with tools like Apache Kafka, Apache Airflow, or similar streaming and orchestration tools.
  • Strong knowledge of AI ethics, data privacy, and compliance standards (GDPR, CCPA, etc.).
  • Bonus: Experience with vector databases and indexing for LLMs (e.g., FAISS, Pinecone).

Interpersonal Skills:

  • Communication
  • Computer proficiency
  • Presentation skills
  • Listening
  • Teamwork

Candidates must be willing and able to work on-site at our Phoenix, Arizona facility.

As a valued member of the TSMC family, you will benefit from our significant focus on your health and well-being. When you are at your best (physically, mentally, and financially), our company is at its best. We offer a comprehensive and competitive benefits program that provides the resources you need to manage your health and achieve your goals across many areas of your life. This includes a variety of medical, dental, and vision plans, from which you can choose the options that best fit your and your family’s needs. Additionally, TSMC provides income-protection programs to financially assist you should you experience an injury or illness, and a 401(k) retirement savings plan to help you secure your financial future. TSMC also offers competitive paid time-off programs and paid holidays, allowing you to recharge and spend time with your family and loved ones.

Work Location: 5088 W. Innovation Circle, Phoenix, AZ 85083

TSMC is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. All qualified applicants will receive consideration for employment without regard to race, color, religion, age, sex, sexual orientation, gender identity, national origin, disability, veteran status, or any other protected characteristic. We encourage all qualified individuals to apply, and we welcome applications from individuals with diverse backgrounds and experiences. Candidates must be able to perform the essential functions of the job with or without a reasonable accommodation. If you need a reasonable accommodation as part of this application process, please contact P_LOA@tsmc.com.

Key Skills

Data Engineering, Data Wrangling, Data Curation, Machine Learning, AI, Python, SQL, Cloud Storage, Data Annotation, NLP, Data Pipelines, AI Ethics, Data Privacy, Compliance Standards, Vector Databases

Benefits

Medical, Dental, Vision, Income Protection Programs, 401(k) Retirement Savings Plan, Paid Time Off, Paid Holidays