Data Scientist

M Science · New York City Metropolitan Area

Title: Data Scientist
Location: New York, NY

About M Science:
M Science is a data-driven research and analytics firm, uncovering new insights for leading financial institutions and corporations. M Science is revolutionizing research, discovering new data sets, and pioneering methodologies to provide actionable intelligence. Our research teams have decades of experience working with massive amounts of unstructured data in near real-time to discern critical insights that help clients make smarter, more informed decisions. We combine the best of finance, data, and technology to create a truly unique value proposition for both financial services firms and major corporations.

Job Overview:
We are seeking a highly skilled Data Scientist to design and develop pipelines and AI/ML models and workflows on our 50+ alternative data panels. The ideal candidate will have deep expertise in mathematical and statistical modeling and will have experience building data pipelines and models using Python, SQL, and PySpark. This person will test new data assets for the firm, solve complex data problems, contribute to the firm’s analytics library, and use traditional machine learning and statistical methods to improve panel data.
M Science expects its data scientists to implement production code, so the ideal candidate will have experience writing well-tested, performant, object-oriented code.

Responsibilities:
Solve complex data science problems using appropriate statistical and/or ML models; present results to stakeholders
Process, cleanse, and verify the integrity of data used for analysis
Create automated alerting and notification systems for deviations in data quality, validation failures, or unusual patterns
Evaluate new datasets for the firm
Contribute to the firm’s documented, unit-tested analytics library
Design, develop, and optimize scalable and fault-tolerant data pipelines using Databricks, Airflow, Python, and Spark
Build resilient data pipelines that handle vendor-related issues such as delayed deliveries, schema changes, incomplete records, and data corruption
Use agentic workflows to automate repetitive work

Qualifications:
Advanced Python for data processing, scripting, and automation
Fluency in PySpark/Spark or SQL
Bachelor's or higher degree, or significant experience in Statistics, Mathematics, Computer Science, Information Science, or a similar quantitative discipline
Excellent knowledge of multivariate statistical analysis, including but not limited to ordinary least squares, principal component analysis, factor analysis, LDA, and panel methods
Excellent knowledge of other ML methods, including additive modeling and ensemble modeling
Experience using gen-AI tools for workflow improvement
Experience with named entity resolution methods a strong plus
Familiarity with cloud data platforms (AWS) and cloud-based storage solutions
Strong troubleshooting skills to diagnose and resolve performance bottlenecks in data pipelines

Primary Location: New York, NY
Salary Range: $90,000-$175,000 USD/Annual

The salary offered will take into consideration an individual’s experience level and qualifications.
In addition to salary, M Science offers eligible employees an annual discretionary incentive bonus and competitive employee benefits, including medical, dental, and vision coverage; 401(k); life, accident, and disability insurance; wellness programs, including discounted and flexible gym memberships; and a robust employee discount program. M Science also offers paid time off packages that include planned time off (vacation), unplanned time off (sick leave), paid holidays, and paid parental leave.