Job Description

Title: Principal Data Engineer

Location: Remote or Madrid (HQ)

About the positions

These are excellent opportunities to work in a company with a highly technological product that generates hundreds of thousands of events per second. A vast sea of data that is not only stored and organized but also consumed to improve all aspects of the operation: pricing, dispatching, marketing, governance, and many others.

In Data Engineering, we have dozens of services (Scala, Golang, Python), pipelines (Apache Beam, Airflow), and our in-house developed Machine Learning platform. We are a hands-on team: we manage our own infrastructure (GCE and AWS) and several Kubernetes clusters. Our ETL platform has more than 300 processes (Python, Airflow, Redshift, S3, Spectrum, Glue, RDS) that we visualize with our Tableau server cluster (Tableau Server Linux, EC2, Python).

Cabify is a global company with a very complex product, but at the same time with the perfect size to allow you to have a tangible impact on the final product. You will be able to build and improve the platform that provides trusted data at scale to the rest of the company. And you will do it as part of a team of experienced data engineers, helping each other grow technically and professionally.

You will (within Platform and Machine Learning group):

  • Design and develop end-to-end data solutions and modern data architectures for Cabify products and teams (streaming ingestion, data lake, data warehouse…).
  • Evolve and maintain Lykeion, a Machine Learning platform developed along with the Data Science team, to take care of the whole lifecycle of models and features. It includes a feature store, which allows other groups inside Cabify to make better decisions based on data, and a prediction platform to serve ML models.
  • Design and maintain complex APIs exposing data at scale that help other teams make better decisions.
  • Provide the company with data discoverability and governance.
  • Collaborate with other technical teams to define, execute and release new services and features.
  • Manage and evolve our infrastructure. Continuously identify, evaluate, and implement new tools and approaches to maximize development speed and cost efficiency.
  • Extract data from internal and external sources to empower our Analytics team.

You will (within Analytics Engineering group):

  • Maintain & evolve our data warehouse, making sure the data is easily accessible, reliable & accurate.
  • Create data models, applying business logic to data. Evaluate all proposals and requests to improve the structure of the data warehouse.
  • Coordinate with other data stakeholders to ensure overall health and performance of the data warehouse environment.
  • Design, develop, test, monitor, manage, and validate data warehouse activity, including defining standards for the data warehouse as well as troubleshooting ETL processes and resolving issues effectively.

What we’re looking for (within Platform and Machine Learning group):

We are looking for experienced data engineers with excellent know-how in large-scale distributed systems:

  • 5+ years of experience coding and delivering complex data engineering projects.
  • Fluency in different programming languages (we work with Python, Scala, and Go; you don’t need to master all three of them).
  • Deep understanding of:
    • Message delivery systems and stream processing (Kafka, RabbitMQ, Akka Streams, Apache Beam)
    • Data processing technology stacks and distributed processing (Hadoop, Spark, Apache Beam, Apache Flink…)
    • Storage technologies (file-based, relational, columnar, document-based, key-value…)
    • Orchestration tools such as Airflow, Luigi, or Dagster.
    • Cloud infrastructures (GCP, AWS, Azure)
    • Automation/IaC tools (Terraform, Puppet, Ansible)
    • MLOps

What we’re looking for (within Analytics Engineering group):

  • Great alignment with our principles; we take this very seriously.
  • You continuously find ways to derive more value from our raw data and reduce the effort our end users spend getting answers.
  • You can take a complex concept and make it sound simple. You’re accomplished at orienting business users within the data domain, understanding their needs, and translating them into technical requirements to ultimately design effective data solutions.
  • You have experience integrating data from multiple sources including DBs, product tracking, and APIs. You get excited by seeing your jobs run like clockwork.
  • Proven track record in Data Modeling.
  • At least 5 years of experience coding and delivering complex data projects.
  • At least 5 years of work experience in Python.
  • You have high-level SQL skills that allow you to implement, understand, and improve complex queries.
  • You have solid DBA skills, with which you ensure high standards of DWH data accessibility, integrity, security, and performance monitoring.