About the Role

Title: Principal Software Engineer – Data Engineering

Location: USA – San Francisco, CA / USA – Remote

Job Description:

Why project44?

At project44, we revolutionize supply chains with our High-Velocity Supply Chain Platform. As the connective tissue of the supply chain, project44 optimizes global product movement, delivering unparalleled resiliency, sustainability, and value for our customers. We operate the world’s most trusted end-to-end visibility platform, tracking over 1 billion shipments annually for 1,300 leading brands across various industries, including manufacturing, automotive, retail, life sciences, food & beverage, and oil, chemical & gas. Our High-Velocity platform eliminates supply chain friction, enabling sophisticated inventory control, exceptional customer experience, and predictive analytics through machine learning and automation.

If you’re eager to be part of a winning team that works together every day to solve some of the most complex supply chain challenges, let’s talk.

About the role:

As a Principal Software Engineer – Data Engineering at project44, you’ll work with the latest technologies to streamline Machine Learning & AI operations, build scalable data infrastructure, and democratize data access.

What you’ll be doing:

· Own software architecture and design. Leverage and institute best practices from distributed systems, databases, data platforms, infrastructure and platform software, manageability, and observability

· Provide guidance on new technologies and continuously improve best practices. Research, implement, and develop software development tools

· Build systems in a multi-cloud environment – we use AWS and GCP, but value experience in other cloud environments such as Azure

· Build complex metrics solutions with data visualization support for actionable business insights.

· Leverage expertise in the latest Gen AI tools and methodologies, such as RAG, vector databases, and embeddings, to architect and build automated data access and interpretation solutions

· Design and develop ETL/ELT pipelines using Python/Java with Snowflake, Postgres, and other data stores. Develop and automate a project through its entire lifecycle

· Apply knowledge of data warehouse/data mart design and implementation

· Build distributed, reusable, and efficient backend ETLs. Implement security and data protection

· Establish repeatable, automated processes for building, testing, documenting, and deploying applications at scale

· Collaborate with insights and data science teams to understand end-user requirements, provide technical solutions, and implement new features and data pipelines

· Establish quality processes to deliver stable, reliable solutions

· Write complex SQL and stored procedures efficiently in Snowflake, Postgres, and BigQuery

· Prepare documentation (data mappings, technical specifications, production support guides, data dictionaries, test cases, etc.) for all projects

· Coach junior team members and help your team to continuously improve by contributing to tooling, documentation, and development practices

You could be a great fit if you have:

Experience & Education

· 8+ years of experience leading Data Engineering efforts

· 3+ years of experience with Snowflake and Oracle, plus knowledge of NoSQL databases such as MongoDB

· 3+ years of experience with Python/Java

· 3+ years of experience in an ETL developer role, with deep knowledge of data processing tools such as Airflow and Argo Workflows

· 4+ years of experience with data engineering and operations, including administering production-level, always-on, high-throughput, complex OLTP RDBMSs

· Experience delivering software solutions in distributed systems

· Experience working with neural networks and Gen AI methodologies

· Strong experience building data warehouse solutions and data modeling

· Strong ETL performance-tuning skills and the ability to analyze and optimize production volumes and batch schedules

· Experience with ETL, GCP, Unix/Linux, Helm charts, and Git or other version control systems

· Experience with PII redaction in traditional ETL pipelines as well as in GenAI solutions

· Expertise in operational data stores and real-time data integration

· Expert-level skill in modeling, managing, scaling, and performance-tuning high-volume transactional databases

· Bachelor’s degree in Computer Science or equivalent experience

Technical Skills

· Strong programming/scripting knowledge for building and maintaining ETL using Java, SQL, Python, Bash, and Go

· In-depth, hands-on knowledge of public clouds – GCP (preferred)/AWS – as well as PostgreSQL (version 9.6+), Elasticsearch, MongoDB, MySQL/MariaDB, Snowflake, and BigQuery

· Willingness to participate in an on-call rotation to mitigate data pipeline failures

· Strong experience with Kafka or equivalent event/streaming based systems

· Experience with Docker, Kubernetes

· Experience with RAG, vector databases, embeddings, etc.

· Experience developing and deploying CI/CD pipelines for data engineering

· Experience optimizing database performance and capacity utilization to provide high availability and redundancy

· Proficiency with high volume OLTP Databases and large data warehouse environments

· Ability to work in a fast-paced, rapidly changing environment

· Understanding of Agile and its implementation for Data Warehouse Development

Professional Skills/Competency

· Focuses on developing and improving frameworks that support repeatable, scalable solutions

· Demonstrates excellent communication and interpersonal skills; communicates clearly and concisely

· Takes initiative to recommend and develop innovative approaches to getting things done

· Is a team player and encourages collaboration

APPLY HERE