Bangalore, IN
In this role, you will be part of a growing, global team of data engineers who collaborate in DevOps mode to enable the business with state-of-the-art technology, leveraging data as an asset to make better-informed decisions.
The Enabling Functions Data Office Team is responsible for designing, developing, testing, and supporting automated end-to-end data pipelines and applications on the Enabling Functions data management and analytics platform (Palantir Foundry, AWS and other components).
Developing pipelines and applications on a cloud platform requires:
* Hands-on experience with Terraform or CloudFormation and other infrastructure-automation tools
* Proven track record in setting up CI/CD pipelines and automating cloud infrastructure
* Strong understanding of cloud infrastructure, with experience in AWS or other cloud providers
* Experience with Azure DevOps and a GitOps approach to automation
* Experience automating dbt orchestration using Azure DevOps pipelines
* Experience working with services like Glue, EC2, ELB, RDS, DynamoDB and S3 (see the sketch after this list)
* Ability to work independently, troubleshoot issues, and optimize performance
* Practical experience is valued more than certifications
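By way of illustration, the sketch below shows the kind of AWS automation this work involves: staging a file in S3 and triggering a Glue job with boto3. All bucket, job and file names are hypothetical placeholders, not part of the actual platform.

```python
# Minimal sketch, assuming boto3 is installed and AWS credentials are configured.
# Bucket, job and file names are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")
glue = boto3.client("glue")

# Upload a raw file to a landing zone in S3.
s3.upload_file("orders.csv", "example-landing-bucket", "raw/orders.csv")

# Kick off a pre-existing Glue ETL job against the new data.
run = glue.start_job_run(
    JobName="example-orders-etl",
    Arguments={"--input_path": "s3://example-landing-bucket/raw/orders.csv"},
)
print("Started Glue job run:", run["JobRunId"])
```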
This position is project-based; you may work across multiple smaller projects or a single large project using an agile methodology.
*Roles & Responsibilities:*
* B.Tech/B.Sc./M.Sc. in Computer Science or a related field and 6+ years of overall industry experience
* Strong experience in Big Data & Data Analytics
* Experience in building robust ETL pipelines for batch as well as streaming ingestion
* Firm grounding in object-oriented programming, with advanced knowledge of and commercial experience in Python, PySpark and SQL (see the sketch after this list)
* Experience interacting with RESTful APIs, including authentication via SAML and OAuth2
* Experience with test-driven development and CI/CD workflows
* Knowledge of Git for source control management
* Agile experience in Scrum environments, using tools such as Jira
* Knowledge of container technologies such as Docker and Kubernetes is an advantage
* Experience in Palantir Foundry, AWS or Snowflake is an advantage
* Problem-solving abilities
* Proficient in English with strong written and verbal communication
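To make the ingestion skills above concrete, here is a minimal sketch of a batch ingestion step: pulling JSON from a RESTful API via the OAuth2 client-credentials flow and transforming it with PySpark. All URLs, credentials, column names and paths are hypothetical.

```python
# Minimal sketch: REST ingestion with OAuth2, then a PySpark transformation.
# All URLs, credentials, column names and paths are hypothetical placeholders.
import requests
from pyspark.sql import SparkSession

# Obtain a bearer token (OAuth2 client-credentials flow).
token = requests.post(
    "https://auth.example.com/oauth2/token",
    data={"grant_type": "client_credentials",
          "client_id": "my-client", "client_secret": "my-secret"},
).json()["access_token"]

# Fetch the source records from the API.
records = requests.get(
    "https://api.example.com/v1/orders",
    headers={"Authorization": f"Bearer {token}"},
).json()

# Load into Spark, filter, and persist the curated result.
spark = SparkSession.builder.appName("orders-ingest").getOrCreate()
df = spark.createDataFrame(records)
df.filter(df.status == "COMPLETED") \
  .write.mode("overwrite").parquet("s3://example-curated/orders/")
```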
* Primary Responsibilities
o Design, develop, test and support data pipelines and applications
o Industrialize data pipelines
o Establish a continuous quality improvement process to systematically optimize data quality (see the sketch after this list)
o Collaborate with various stakeholders, including business and IT
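As one possible shape for such a quality process, the sketch below applies simple rule-based checks with PySpark and fails the pipeline when a rule is violated. The dataset path, column names and rules are hypothetical.

```python
# Minimal sketch of an automated data quality gate in a pipeline.
# Dataset path, column names and rules are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()
df = spark.read.parquet("s3://example-curated/orders/")

# Rule 1: the primary key must be non-null and unique.
null_keys = df.filter(F.col("order_id").isNull()).count()
dupe_keys = df.count() - df.select("order_id").distinct().count()

# Rule 2: amounts must be non-negative.
bad_amounts = df.filter(F.col("amount") < 0).count()

failures = {"null_keys": null_keys, "dupe_keys": dupe_keys,
            "bad_amounts": bad_amounts}
if any(v > 0 for v in failures.values()):
    # Failing fast keeps bad data out of downstream datasets.
    raise ValueError(f"Data quality checks failed: {failures}")
```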
*Education*
* Bachelor's (or higher) degree in Computer Science, Engineering, Mathematics, Physical Sciences or related fields
*Professional Experience*
* 6+ years of experience in system engineering or software development
* 3+ years of engineering experience, including ETL-type work with databases and cloud platforms
*Skills*
|*Big Data General*|Deep knowledge of distributed file system concepts, MapReduce principles and distributed computing. Knowledge of Spark and the differences between Spark and MapReduce. Familiarity with encryption and security in a Hadoop cluster.|
|*Data management / data structures*|Must be proficient in technical data management tasks, i.e. writing code to read, transform and store data
XML/JSON knowledge
Experience working with REST APIs|
|*Spark*|Experience in launching Spark jobs in client mode and cluster mode. Familiarity with the property settings of Spark jobs and their implications for performance (see the sketch below this table).|
|*SCC/Git*|Must be experienced in the use of source code control systems such as Git|
|*ETL* |Experience in developing ELT/ETL processes, including loading data from enterprise-scale RDBMSs such as Oracle, DB2, MySQL, etc.|
|*Authorization*|Basic understanding of user authorization (Apache Ranger preferred)|
|*Programming* |Must be able to code in Python, or be expert in at least one high-level language such as Python, Java or Scala.
Must have experience in using REST APIs|
|*SQL* |Must be an expert in manipulating database data using SQL. Familiarity with views, functions, stored procedures and exception handling.|
|*AWS* |General knowledge of AWS Stack (EC2, S3, EBS, …)|
|*IT Process Compliance*|SDLC experience and formalized change controls
Working in DevOps teams, based on Agile principles (e.g. Scrum)
ITIL knowledge (especially incident, problem and change management)|
|*Languages* |Fluent English skills|
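For reference against the Spark row above, here is a minimal sketch of a PySpark job and the spark-submit launch modes it distinguishes; paths, resource settings and the shuffle-partition value are hypothetical.

```python
# Minimal sketch of a Spark job; paths and settings are hypothetical.
# Launched via spark-submit, e.g.:
#
#   # client mode: the driver runs on the submitting machine (handy for debugging)
#   spark-submit --master yarn --deploy-mode client job.py
#
#   # cluster mode: the driver runs inside the cluster (typical for production)
#   spark-submit --master yarn --deploy-mode cluster \
#       --executor-memory 4g --num-executors 8 job.py
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("example-job")
    # Property settings like this one directly affect performance.
    .config("spark.sql.shuffle.partitions", "200")
    .getOrCreate()
)

df = spark.read.parquet("s3://example-curated/orders/")
df.groupBy("status").count().show()
```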
*Specific information related to the position:*
* Physical presence in primary work location (Bangalore)
* Flexibility to work CEST and US EST time zones (according to the team rotation plan)
* Willingness to travel to Germany, US and potentially other locations (as per project demand)