Senior Data Infrastructure Engineer, Autonomous Driving
At Ridecell
We are building Nemo - a search engine for automotive data. Nemo extracts relevant events and scenarios from the petabytes of sensor data generated by Connected and Automated Vehicles (CAVs), including those with ADAS and fully autonomous driving systems.
We are looking for experienced engineers to design and build our next-generation data and cloud infrastructure to meet the data processing needs of large-scale vehicle fleets. Responsibilities include:
- Data pipeline architecture & refactoring
- Data schema optimization (e.g. moving from CSV files to time-indexed columnar storage)
- Designing and implementing a highly scalable data querying and visualization system (e.g. using Elasticsearch and Kibana), and architecting data structures for low-latency operation
- Database engineering and development of data access libraries
- Automating and scripting database management tasks
- Designing and building an API (REST/Python/CLI) for data handling, job management, serving processed artifacts, etc.
- Porting local and on-premises software to cloud infrastructure
- Dockerization of data processing software
- Tuning Python scripts for performance and efficiency
- CI/CD setup
- Moving a proof-of-concept (POC) product to production, and everything that entails
This position requires expertise in data pipeline engineering for potentially massive amounts of structured and unstructured data. Requirements include:
- Experience setting up and configuring Docker and container orchestration systems such as Kubernetes, EKS, or ECS
- Experience setting up Airflow
- Experience with infrastructure-as-code (Terraform and CloudFormation) deployment and maintenance
- Experience with large-scale storage systems (cloud and on-prem, e.g. NAS)
- Experience deploying applications to heterogeneous environments: cloud, on-prem (private cloud), and end-user (developer, robot).
- Experience working with Big Data and strong knowledge of big data infrastructure (e.g., Hadoop, Hive, HDFS, Spark, Elasticsearch, AWS Athena)
- Experience with Apache Parquet columnar file storage
- Good knowledge of common ETL packages/libraries and data ingestion
- Experience setting up the ELK stack (Elasticsearch, Logstash, Kibana) to process data from multiple sources, filter and index the required data, and build interactive Kibana dashboards (required)
- Experience maintaining systems in AWS (experience with other clouds is a plus), with an understanding of and adherence to the AWS Well-Architected Framework
- Programming/scripting experience in Python
- Apache Airflow, NiFi
- Apache Parquet files (columnar storage)
- Codebase primarily in Python, with NumPy, Pandas
- PostgreSQL with GIS and time-series extensions
- AWS S3, EC2
- Candidate technologies for scaling up: Apache Hive, Elasticsearch/Kibana, Amazon Athena/EMR
- Full health, vision, and dental benefits
- Unlimited PTO