Software Development Intern [gn] - Ilmenau / Germany

Sorry, this job was removed at 12:15 p.m. (CST) on Thursday, May 08, 2025
Hiring Remotely in Ilmenau, Thüringen, DEU
In-Office or Remote
Cloud • Software • Database • Analytics
The Role

We offer a position as Software Development Intern with the opportunity to gain experience on a cutting-edge data processing system. This is your chance to become a member of the team behind the fastest analytical database system on the market. You will learn about high-performance implementations in the database kernel and in its distributed version based on the Hadoop environment.


You will be part of our ActianX/Vector team, engineering a family of products in C that are provided for a variety of OS distributions. We are looking for team players who integrate well into our distributed development team. You will be mentored by senior developers and introduced to our software development process, with responsibilities that cover design and implementation challenges.


 

KEY RESPONSIBILITIES:  

Be curious and eager to learn more about state-of-the-art database development

Contribute to the design and implementation of enhancements that improve the performance, stability and scalability of our high-performance data processing kernel


Create tests for and maintain the implemented functionality in our continuous integration management environment 


ESSENTIAL QUALIFICATIONS:   

Enrolled in a Bachelor’s or Master’s degree program in computer science or equivalent at a German university

Good programming skills (C family) 

Well-founded algorithm-design skills 


OPTIONAL SKILLS: 

Experience in software development, e.g., advanced academic studies or in a commercial setting

Experience with large-scale systems development 

Competent script programming skills (Python, Bash) 

Experience with concurrent, parallel and network programming techniques 


Experience in the areas of business intelligence, high-performance data processing, computer architecture and related fields

Knowledge of operating system internals (memory management, I/O, scheduling, etc.)

Knowledge of database concepts and technology 

A working knowledge of SQL 


WE OFFER:


-        Internships – An internship with us may last from 3 to 12 months. For each internship, we provide a tailored project to research, design and implement new functionality in our Vector database.

-        Master topics – In coordination with your university’s examination office and your collaborating professor, we will define a master project tailored to your needs and based on our available thesis topics.

-        Part-time jobs – Part-time jobs with us provide you with the opportunity to gain your first work experience in a program-related field. Your contribution will help improve an already outstanding database product. Working hours and times are flexible and can be discussed when you decide to start a project with us.

 

Below you will find a list of topics, each with a short explanation. Each topic is marked with (I)nternship, (M)aster topic and/or (J)ob.

 

Vector cloud deployment.

Providing databases as a hybrid of on-premise and cloud deployments is a promising and already growing business. Our goal is to bring an on-premise Vector installation partially to the cloud. Within this project, the task is to take the cloud storage architecture used in Avalanche and bring it to the Vector product. (I,M)


Load balanced query execution in a clustered environment.

Load balancing in a cluster is hard, because normally you cannot offload work to another node if that node does not have the data to work on. However, as the HDFS integration of VectorH controls its replication policy, there are opportunities to shift work to other cluster nodes that already have the data. This requires developing strategies for data placement, data processing and work shifting. (I,M)
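
As a rough illustration of the work-shifting idea, the following Python sketch assigns each data block to the least-loaded node among those that already hold a replica of it. The function name, the block and node identifiers, and the unit-cost model are assumptions made for this example; they do not describe VectorH internals.

```python
def assign_blocks(replicas):
    """Greedily assign each block to the least-loaded node holding a replica.

    replicas: dict mapping block id -> list of node names that store the block.
    Returns a dict mapping block id -> chosen node.
    Assumption: every block costs one unit of work.
    """
    load = {}        # node name -> number of blocks assigned so far
    assignment = {}
    for block in sorted(replicas):
        # Only nodes that already have the data are candidates.
        node = min(replicas[block], key=lambda n: (load.get(n, 0), n))
        assignment[block] = node
        load[node] = load.get(node, 0) + 1
    return assignment


example_replicas = {
    "b1": ["node1", "node2"],
    "b2": ["node1", "node3"],
    "b3": ["node1", "node2"],
}
print(assign_blocks(example_replicas))
```

A real strategy would also have to weigh block sizes, query cost and the replication policy itself; the greedy choice here only shows why knowing the replica locations makes shifting work possible at all.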


External tables in Spark.

VectorH supports reading data from external data sources such as Spark. The performance of queries accessing such external tables could be improved greatly, e.g., by pushing down selections or even subtrees of the query plan into Spark. The feature could also be extended with support for more sources and data types. (I,M)


Compact hash tables.

Smaller hash tables can be significantly faster, thanks to fewer CPU cache and TLB misses. The goal of this project is to find such compact representations by bit-packing multiple columns and using dictionaries for string data. (M)
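
To make the two techniques concrete, here is a minimal Python sketch of dictionary-encoding a string column and bit-packing it together with a second integer column into a single key. The column names (country, year), the 12-bit width and all function names are assumptions chosen for the example, not the representation used in Vector.

```python
def dictionary_encode(values):
    """Map each distinct string to a small integer code."""
    dictionary = {}
    codes = []
    for v in values:
        # A new string gets the next free code; a known one reuses its code.
        codes.append(dictionary.setdefault(v, len(dictionary)))
    return codes, dictionary


def pack_key(country_code, year, year_bits=12):
    """Pack two small integer columns into one integer key."""
    assert 0 <= year < (1 << year_bits)
    return (country_code << year_bits) | year


def unpack_key(key, year_bits=12):
    """Recover the two column values from a packed key."""
    return key >> year_bits, key & ((1 << year_bits) - 1)


countries = ["DE", "US", "DE"]
years = [2024, 2024, 2023]
codes, dictionary = dictionary_encode(countries)

# Count rows per (country, year) group using the packed key.
counts = {}
for code, year in zip(codes, years):
    key = pack_key(code, year)
    counts[key] = counts.get(key, 0) + 1
```

In a C implementation the packed key would typically fit into one machine word, so each hash table entry touches fewer cache lines than a layout with separate string and integer fields.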


PDTs on flash.

The goal of this project is to modify our structure for differential updates (Positional Delta Trees, PDTs) so that it can expand to disk. This requires the addition of a layer that resides on disk, most likely a flash disk. Expanding PDTs to flash would make it possible to store many more updates, hence reduce the checkpoint frequency (checkpoints are where PDT updates are merged into the main data storage structures), and allow the system to sustain much higher update workloads. This is a current research project with TU Ilmenau DBIS. (I,M)


Collations.

The goal is to understand how Ingres currently uses character sets, including the way these character sets collate data, and to make these rules available in Vector as well. Providing good performance in that case is very difficult, since some mechanisms rely heavily on expanding characters before processing them. Finding cache-efficient algorithms for these cases is also part of the project. As an example, consider the order of “a”, “b” and “ä”. While a comparison by character code would order these three letters “abä”, the German language typically requires “aäb”. (I,M)
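
The “a”, “b”, “ä” example can be reproduced with a short Python sketch. The sort key below strips diacritics to obtain a primary key and uses the original string as a tie-breaker; this is a simplified approximation of dictionary-style German ordering, assumed here for illustration only, and not the collation logic of Ingres or Vector.

```python
import unicodedata


def german_sort_key(text):
    """Primary key: text without combining marks; tie-breaker: original text."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base, text)


letters = ["a", "b", "ä"]
by_code_point = sorted(letters)                     # plain character-code order
by_collation = sorted(letters, key=german_sort_key) # simplified German order
print(by_code_point, by_collation)
```

Note that the key function has to expand every character before comparing, which is exactly the kind of per-character work that makes collation-aware processing expensive.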


Spatial data type support.

The goal of this project is the integration of geospatial datatype support into Vector. This requires the definition of new Vector datatypes and the integration into all stages of query execution. (I,M)


Tuple layout planning.

In this project, we want to challenge the way data is stored during query processing. In principle, any mix between horizontal and vertical storage (NSM vs. DSM) can be chosen. Some columns may actually be processed in vertical vectors, while other columns are processed in a tuple layout. Horizontal storage of data inside hash tables is already supported but needs to be extended to other operators. (M)
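
The difference between the layouts can be sketched in a few lines of Python. The helper names and the sample columns (id, tag, price) are assumptions for illustration; the actual in-memory formats used during query processing in Vector are not shown here.

```python
def nsm_to_dsm(rows, column_names):
    """Convert row-wise tuples (NSM) into one list per column (DSM)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


def dsm_to_nsm(columns, column_names):
    """Convert per-column lists (DSM) back into row-wise tuples (NSM)."""
    return list(zip(*(columns[name] for name in column_names)))


names = ["id", "tag", "price"]
rows = [(1, "x", 1.5), (2, "y", 2.5)]
columns = nsm_to_dsm(rows, names)

# A mixed layout: "id" and "tag" travel together as tuples,
# while "price" stays in its own vertical vector.
mixed = {
    "id_tag": dsm_to_nsm(columns, ["id", "tag"]),
    "price": columns["price"],
}
```

Which columns to group is the planning question: columns that are always accessed together benefit from the tuple layout, while columns scanned on their own benefit from staying vertical.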


RDF in Vector.

In principle, it should be possible to turn Vector into a highly efficient engine for RDF storage and query evaluation. This entails the storage of quads in a compressed PAX format, and a basic translation of SPARQL to SQL or even direct Vector algebra. (M)


Exploiting co-processors for Vector.

The most powerful piece of hardware in today’s average PC is the GPU, not the CPU. There have been studies on how to express database operations of almost every conceivable type on GPUs. However, what is missing is a framework where complex queries consisting of many such operations could work together. (M)


Maintenance of our testing infrastructure.

For our number-one-scoring TPC-H experiments, we need to constantly stay up to date. Test numbers for our own improvements need to be recorded and maintained. In addition, all tests and comparisons need to be kept up to date with our competition (Impala, HAWQ, SparkSQL and Hive). (J)


Adaptation of conversion functions.

There are many built-in datatype conversion functions that are slow in comparison to an optimized implementation. Replacing these functions will directly impact affected queries and lead to a noticeable performance improvement. (J)



We value diversity at our company. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other applicable legally protected characteristics in the location in which the candidate is applying. 


The Company
HQ: Round Rock, TX
365 Employees
Year Founded: 2005

What We Do

Actian enables some of the most data-intensive enterprises on earth to run their most mission-critical analytics and data management workloads. Thousands of forward-thinking organizations around the globe like Bloomberg, Intuit, Lufthansa, and Citibank trust Actian to help them solve the toughest data challenges and transform how they run their businesses... with data. Actian is majority owned by HCL Technologies (HCL), a next-generation global technology company that helps enterprises reimagine their businesses for the digital age. HCL serves leading enterprises across key industries, including 250 of the Fortune 500 and 650 of the Global 2000.
