LLM Evaluation Expert

Belcan Corporation
United States, Washington, Redmond
Apr 02, 2025
Details: #NowHiring #SoftwareDevEngineer

Job Title: Software Dev Engineer II
Contract: 6 Months
Compensation: This position offers an hourly rate of $57 to $59.

Belcan is a leading provider of professional IT, Engineering, Workforce Solutions and staffing in the United States, Canada, UK, Europe, and India. A remote Software Dev Engineer II job is currently available through Belcan. In this role you will play a crucial role in assessing and improving our language models' coding capabilities. If you are interested in this role, apply today!

Job Description:
Actual Job Title: LLM Evaluation Expert - Domain: Coding
The role will follow the qualifications and job duties below. Only 10 hours per week, no wiggle room on this.

Company Overview: Artificial General Intelligence (AGI) Data Services is at the forefront of AI innovation, specializing in the development and refinement of large language models (LLMs). Our mission is to create AI systems that can understand and generate high-quality code, revolutionizing the software development landscape.

Job Description: As an LLM Evaluation Expert specializing in Coding, you will play a crucial role in assessing and improving our language models' coding capabilities. Your expertise will be instrumental in evaluating LLM-generated code responses, making high-level judgments, and setting the standard for what constitutes excellent AI-assisted coding.

Key Responsibilities:
* Critically analyze and evaluate code responses generated by our LLMs across various programming languages and paradigms
* Exercise expert judgment to select the most appropriate and efficient code solutions from multiple LLM-generated options
* Make informed decisions on behalf of our customers, ensuring that selected code meets industry standards, best practices, and specific client needs
* Develop and write coding demonstrations to illustrate "what good looks like" in AI-generated code, setting benchmarks for quality and efficiency
* Provide detailed feedback and explanations for your evaluations, helping to refine and improve the LLM's understanding and output
* Collaborate with the AI research team to identify areas for improvement in the LLM's coding capabilities
* Stay abreast of the latest developments in software engineering, coding standards, and AI to ensure our evaluations remain cutting-edge

Required Qualifications:
* Advanced degree in Computer Science, Software Engineering, or a related field
* Extensive experience (5+ years) in software development across multiple programming languages and paradigms
* Demonstrated ability to critically evaluate code quality, efficiency, and adherence to best practices
* Strong analytical and decision-making skills, with the ability to make complex judgments under ambiguous circumstances
* Excellent written and verbal communication skills, with the ability to explain technical concepts clearly
* Experience in technical writing, particularly in creating coding examples or tutorials

Preferred Qualifications:
* Previous experience working with or evaluating AI systems, particularly in the context of code generation
* Familiarity with a wide range of software development methodologies and architectural patterns
* Understanding of machine learning concepts, particularly as they apply to natural language processing and code generation
* Experience in creating or contributing to coding standards or style guides

This role requires a unique blend of technical expertise, critical thinking, and communication skills. You will be the bridge between advanced AI technology and practical, real-world coding applications. Your work will directly influence the development of next-generation AI coding assistants, shaping the future of software development. If you're passionate about code quality, have a keen eye for detail, and are excited about the potential of AI in software engineering, we encourage you to apply for this pivotal role at AGI Data Services.

Desired coding languages for this role: Python, PHP, Java, Ruby, JavaScript, TypeScript, C++, Go, Cypher, SQL. Candidates must have experience with several of these to be considered for the role.

Please ensure that you are giving candidates the information included in the intake form attached in the request, including the actual job title listed in the intake form in the job description section.

If a candidate is selected for an interview, the interview will test their ability to execute the duties in the job description, including practical exercises based on the job description. In the interviews and job role, candidates will be presented with 2 to 3 examples of user prompts to an LLM and the model's response. All prompts will be coding-related questions. Candidates will need to understand the user's request (e.g., generate code, evaluate code, or explain code) and evaluate the model's response based on that understanding, on dimensions like correctness, logic, coherence, and applicability to the user's question. After taking time to evaluate the example, candidates must be able to summarize their findings in a few sentences, explaining any issues found and how they would adjust the model's answer to make it better.

In the interviews and job role, candidates will not be designing, writing, or developing code from scratch. They will only be evaluating model responses as described above.

Location: Remote
Keywords: #Remotejobs; #SoftwareDevEngineerjobs
Start Date: Right Away
#ZR

If you are interested in this role, please apply via the Apply Now link provided.

Our overriding goal is to provide quality staffing solutions that help people, organizations, and communities succeed. Belcan is a leading provider of qualified personnel to many of the world's most respected enterprises. We offer excellent opportunities for contract, temporary, temp-to-hire, and direct assignments. We are the employer of choice for thousands worldwide. For more information, please visit our website at Belcan.com.

EOE/F/M/D/V