Data Engineer

Data is the foundation of everything we do. As a senior data engineer you’ll have authority over (and responsibility for) designing and maintaining infrastructure for collecting, storing, processing, and analyzing terabyte-scale datasets, including large document corpora, metadata, and application data.

We are looking for a Data Engineer to collect, manage, and deploy massive global patent datasets and related data. Your role will be to ensure that our dataset of over 100 million patents is readily available to our web application and data science teams. You’ll also be responsible for identifying and integrating additional datasets that allow us to expand our product features and AI capabilities.

This is an exciting opportunity to engage with cutting-edge technology and work on a real-world problem at global scale. In addition to competitive compensation and benefits, there is room for the right person to take on increased responsibilities. And it’s a lot of fun (although fast-paced and even chaotic at times) working as part of a small, passionate team.


Location:

Tokyo, Melbourne, San Francisco (remote work is possible in some cases)

Responsibilities:

  • Take ownership of understanding the landscape of innovation- and technology-related datasets, starting with global patents

  • Write and automate pipelines for data cleansing, ingestion of raw data from multiple sources, ingestion of machine-learning results, aggregation, and more

  • Architect and manage data infrastructure to optimize for machine learning and large-scale data exploration

  • Ensure reliable access to the clean data our client-facing web application depends on

  • Seek out and integrate new sources of data related to our core business

Minimum Qualifications and Education Requirements:

  • BSc/BEng degree in computer science or equivalent

  • Strong relational database fundamentals, preferably with Postgres

  • The ability to communicate high-level information about datasets, preferably using data visualization

  • Experience writing performant data pipelines at scale, e.g., with Spark

  • The ability to use a modern, low-level, typed language with a strong concurrency model (such as Rust or Go) for fast data processing

  • Experience with queue systems such as RabbitMQ or Kafka


Preferred Qualifications:

  • Passion for AI and excitement about new developments

  • Contributions to open source projects

  • Experience with machine learning

  • Experience with data visualization

  • MSc/MEng degree in computer science or equivalent

To apply, please contact us at info@amplified.ai with your CV and cover letter.