Software Developer (HPC/LSF/MPI/dask)
Initial 12-month contract + extensions
Hamburg / Remote Working (3 days per week onsite in Hamburg, can be reduced to 1 day per week onsite later on in project)
Goal of the project:
- Development and maintenance of a high-performance computing program that uses massively parallel computations
- Our client has an existing (self-written) SW project, which processes massive amounts of data (between 2 GB and 2 TB) and computes statistics (averages, moments, correlations, etc.).
- The compute methods are implemented in Cython and the general framework in Python.
- To speed up computations, they use a large compute cluster, with LSF to schedule jobs and currently MPI for communication between jobs.
- The current software has become complex and nearly unmaintainable. They have two options now:
1. Refactor and clean up the code so that it can be maintained and extended, or
2. Rewrite it from scratch, possibly using a different parallelization framework (e.g. Python Dask).
- The SW runs in a secure datacenter (Red Hat Enterprise Linux 6 and 7), which only allows access from a certified location, so at least partial on-site presence in Hamburg is required.
- They also have other datacenters, which are accessible remotely via VPN.
- The SW can also run on normal shared-memory workstations under Linux/Windows.
- They need support in the decision (cleanup vs. rewrite), possibly in defining a new SW architecture concept, and in implementing, testing, and benchmarking either the old or the new SW. They are looking for a SW developer with deep knowledge of high-performance computing (HPC), especially LSF/MPI (or Dask).
- Several years of experience in Python and Cython are required.
- Besides the actual implementation, they also expect some architectural guidance from the candidate (they are users, not SW experts) and the ability to work self-driven.
- Writing documentation, training users, debugging issues are also part of the tasks.
Tasks:
- Understanding the current SW and helping to decide whether a cleanup or a rewrite is the better option
- Depending on this decision, either restructuring or rewriting the HPC SW
- Document, implement, test, benchmark SW
- Train users
Required skills:
- Deep experience with LSF, MPI, Dask, Python, Cython
- Ability to design SW components with a focus on performance, scaling, maintainability
- Ability to work self-driven without constant guidance
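
For candidates who want a feel for the kind of computation involved: the statistics named in the project goal (averages, moments) can be computed per data chunk and then merged, which is the pattern both an MPI reduction and a Dask task graph would follow. The sketch below is only an illustration in plain Python, not the client's code. The names `Moments`, `chunk_moments`, `merge`, and `parallel_moments` are hypothetical, and the merge step uses the standard pairwise update formula for mean and variance.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Moments:
    """Partial statistics of one chunk: count, mean, sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    @property
    def variance(self) -> float:
        # Population variance; only meaningful when n > 0.
        return self.m2 / self.n


def chunk_moments(values):
    """Compute partial moments for one chunk (the work a single worker would do)."""
    n = len(values)
    if n == 0:
        return Moments()
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return Moments(n, mean, m2)


def merge(a, b):
    """Combine two partial results without touching the raw data again."""
    n = a.n + b.n
    if n == 0:
        return Moments()
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Moments(n, mean, m2)


def parallel_moments(chunks):
    """Map over chunks, then reduce. On a cluster, the map step would run on
    separate workers and the reduce step would be the inter-job communication."""
    return reduce(merge, map(chunk_moments, chunks), Moments())


if __name__ == "__main__":
    # Hypothetical toy data split into three unequal chunks.
    chunks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0]]
    total = parallel_moments(chunks)
    print(total.n, total.mean, total.variance)
```

Because `merge` is associative, the partial results can be combined in any grouping, which is what makes the same logic portable between a scheduler-driven MPI job and a Dask-style rewrite.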