Your primary focus will be to apply data mining techniques, perform feature engineering, optimize algorithms, and scale machine learning models in a highly parallelized, big data environment. In addition, you will lead database integrations and the cleaning of messy, unstructured data sets. You will work closely with data scientists to determine which data are needed for analysis, and with data architects to determine requirements for scalable solutions.
This is a unique opportunity to join a new, multidisciplinary team of creative and passionate individuals. The team's mission is to develop, launch, and operate new businesses that leverage T-Mobile network data, retail, and platform assets.
We’re a lean, flat and experimental team with profit and loss responsibility that feeds straight into T-Mobile’s bottom line. Come join us!
What you’ll do in your role.
- Build and optimize high-performance algorithms. Scale existing or proposed algorithms on terabytes of data. Improve time complexity and space complexity for data pre-processing. Recommend ways to improve data pipeline reliability, pipeline efficiency, and data quality.
- Process structured and unstructured data, validate data quality, and help design automated data quality tests in a big data environment.
- Assist in the development and support of data products. Develop automated processes for data mining, modeling, and enrichment.
- Work closely with engineers and data scientists. Help the data science team and engineers improve Spark performance. Help lead data product deployment.
- Create custom software components (e.g. specialized UDFs) and analytics applications using Hive, Spark, or PySpark.
- Help create source control and visualization tools for tracking model progress, model performance and data quality. Integrate new data management technologies and software engineering tools into existing structures.
- Collaborate with data architects, engineers, data scientists, and business team members on project planning and goals.
The experience you’ll bring.
- BS degree in a quantitative field such as statistics, operations research, computer science, mathematics, physics, electrical engineering, industrial engineering
- 2 years of relevant work experience in big data analysis or a related field (engineer/developer)
- Expert in Spark, Python or R, PySpark or SparkR, Scala, SQL/Hive
- Familiar with Spark MLlib, SparkSQL
- Accomplished in Hadoop-based data mining frameworks……