Data Engineer (m/f)
City of London (Greater London) IT development
Job description
3118026
Additional Cities
Cramlington, Stutensee
Career Level
Experienced
Relocation Assistance
No
Business
GE Oil & Gas
Business Segment
Oil & Gas Headquarters
Function
Digital Technology
Country/Territory
Germany, United Kingdom
Postal Code
NE23 1WW
Role Summary/Purpose
You will be a member of an integrated team of data engineers, software engineers, data scientists and a product owner, delivering successful outcomes that drive efficiency and create new revenue streams.
Essential Responsibilities
· Collaborating with system engineers, data scientists, frontend developers and software developers to implement solutions that are aligned with our stakeholders' and our own strategic directions
· Implementing, documenting and supporting data engineering solutions in an agile environment
· Creating data visualizations using the latest development methods and infrastructure
· Creating custom software components (e.g. specialized UDFs) and analytics applications
· Using Spark 2.x APIs such as DataFrames, Datasets, Spark SQL and Spark sessions, and applying the latest Spark features to scale algorithm development
· Installing Python and other libraries required for data engineering in a Hadoop environment
· Carrying out the following tasks independently:
· Configuring AWS EMR and Spark; monitoring Java heap usage, cluster utilization and job execution through the Spark UI
· Using AWS S3 buckets to execute Spark jobs; knowledge of the Spark parameters for remote code and application JAR execution
· Running ETL operations in Spark and storing intermediate data optimally in AWS S3 buckets; knowledge of the Parquet format and of querying and reading these files in Spark jobs
· Creating bootstrap scripts to install libraries on all clusters
· General monitoring of Spark clusters using Ganglia: reporting memory usage and CPU statistics for Spark jobs so that the data engineering and data science teams can optimize the scaled Spark code
Qualifications/Requirements
· Experience with relational databases (preferably Oracle, Postgres or Greenplum)
· Significant experience writing complex SQL queries, strong PL/SQL skills
· Experience with at least one programming language (preferably Scala, Java or Python)
· Experience in Unix/Linux environments
· Good English skills (written and spoken)
· Positive attitude and team player
Applications from job seekers who require sponsorship to work in the UK are welcome and will be considered alongside all other applications. However, non-EU/EEA candidates may not be appointed to a post if a suitably qualified, experienced and skilled EU/EEA candidate is available to take up the post, as the employing body is unlikely, in these circumstances, to satisfy the Resident Labour Market Test. For further information please visit the UK Border Agency website
http://www.ukba.homeoffice.gov.uk/visas-immigration/working
Desired Characteristics
· Experience with a variety of languages and tools (e.g. scripting languages) to integrate systems
· Integrating libraries into current algorithm and Spark jobs, e.g. JUnit and a Spark unit-testing framework in the Spark code
· Good knowledge of Big Data querying tools, such as Pig, Hive, Impala and Phoenix
· Programming skills in MapReduce and Spark jobs: heavy-lifting programming skills will be required for custom or specialized implementations and use cases based on algorithms and machine learning models built or configured by the data science and algorithm teams
· Developing MapReduce and Spark jobs and pipelines in a Hadoop distributed environment in Python and Java (PySpark and Java Spark jobs)
About Us
GE is the world's Digital Industrial Company, transforming industry with software-defined machines and solutions that are connected, responsive and predictive. Through our people, leadership development, services, technology and scale, GE delivers better outcomes for global customers by speaking the language of industry.
Primary Country
United Kingdom