Data infrastructure is the foundation of everything we do. As a data engineer, you’ll have the authority to design and maintain infrastructure for collecting, storing, processing, and analyzing terabyte-scale data sets such as large document corpora, large volumes of metadata, application data, and more. You will be responsible for familiarizing yourself with the available datasets, taking the lead in understanding the ins and outs of the data sources we use, and building a robust infrastructure that makes this data readily available to our web application and data science teams.
This is an exciting opportunity to be part of a startup that is applying deep learning to a real-world problem at global scale. We’re always looking for leaders, and there is room for the right person to grow and take on increasing responsibility as part of a small, dynamic team.
Locations: Tokyo, Melbourne, or San Francisco (remote work is also negotiable)
- Design and implement data infrastructure.
- Develop and maintain performant ETL pipelines for complex and large data sets from a variety of sources.
- Accelerate development and improve machine learning performance by ensuring that complex and large data sets are clean and readily accessible in the formats required.
- Enable customer-facing real-time data analytics by optimizing database design and query performance.
- Develop or implement an internal dashboard with descriptive statistics about the data we have and the performance of our databases.
- Automate workflows and data pipelines using replicable and extensible code with good version control practices and testing.
Minimum Qualifications and Education Requirements:
- BSc/BEng degree in computer science, mathematics, machine learning or equivalent (MSc/MEng preferable)
- Experience designing and administering a terabyte-scale Postgres database
- Experience implementing a variety of ETL techniques and frameworks
- Experience with distributed computing for data processing (Hadoop, Spark)
- Experience with unstructured data sources
- Experience with XML (especially proficiency with XPath)
- Knowledge of machine learning and/or statistics
- Experience with Google Cloud Platform
- Passion for AI and excitement about new developments
- Contributions to open source projects
To apply, please contact us at email@example.com with your CV.