Data is the foundation of everything we do. As a senior data engineer, you’ll have the authority (and responsibility) to design and maintain infrastructure for collecting, storing, processing, and analyzing terabyte-scale datasets, including large document corpora, machine learning results, metadata, and application data.
We are looking for a Data Engineer to collect, manage, and deploy massive sets of global patent data and more. Your role will be to ensure that our dataset of over 100 million patents is readily available to our web application and data science teams. You’ll also be responsible for identifying and integrating additional datasets that allow us to expand our product features and AI capabilities.
This is an exciting opportunity to engage with cutting-edge technology and work on a real-world problem at global scale. In addition to competitive compensation and benefits, there is also room for the right person to take on increased responsibilities. And it’s a lot of fun (although fast-paced and at times chaotic) working as part of a small, passionate team.
Locations: Tokyo, Melbourne, San Francisco (remote work is possible in some cases)
Responsibilities:
Take ownership of understanding, acquiring, and managing innovation- and technology-related datasets, starting with global patents
Write and automate pipelines for data cleansing, ingestion of machine learning results, ingestion of raw data from multiple sources, aggregation, and more
Architect and manage data infrastructure optimized for machine learning and large-scale data exploration
Ensure fast, reliable access to the clean data our client-facing web application depends on
Seek and integrate new sources of data related to our core business
Communicate the extent and performance of our datasets internally
Minimum Qualifications and Education Requirements:
BSc/BEng degree in computer science or equivalent
Strong relational database experience, preferably with Postgres
The ability to communicate high-level information about datasets, preferably using data visualization
Experience writing performant data pipelines at scale, e.g. with Spark or Airflow
The ability to use a modern language with a strong concurrency model for fast data processing, such as Elixir, Rust, or Go
Preferred Qualifications:
Passion for AI and excitement about new developments
Contributions to open source projects
Experience with machine learning
Experience with data visualization
MSc/MEng degree in computer science or equivalent
To apply, please contact us at email@example.com with your CV.