A Day in the Life of 6 Data Engineers

Adam Calica
February 15, 2021
Updated: May 28, 2021

Being a data engineer isn’t confined to one set of responsibilities. 

For example, Dennis Hume, a data engineer at alcohol e-commerce platform Drizly, is currently responsible for transitioning the company's data orchestration over to Dagster. That project ladders up to what Hume believes to be the purpose of a data engineer. 

“Ultimately, the role of data engineering is to support data stakeholders across the organization and not be the bottleneck for data work,” Hume said. 

Cross-team collaboration is similarly practiced at software development company MORSE Corp. Lead Data Scientist Lena Bartell is responsible for managing a team of data scientists, software engineers and data engineers — all of whom are constantly working together to create algorithms and data processors. 

Built In caught up with six data engineers to get a better pulse on their day-to-day and how they are enabled to be successful in their roles. 

 

MORSE Corp

Lena Bartell

LEAD DATA SCIENTIST

Lena Bartell

MORSE Corp. is an artificial intelligence company that provides customer-focused algorithm and software development services. Lead Data Scientist Lena Bartell said that the company’s engineering teams are designed to be multidisciplinary in order to create diverse teams where they can continuously learn something new from one another.

 

Typically, what does your role as a data engineer entail?  

As the lead of a data science and data engineering project, I have a team of about 10 people who have expertise in data science, software engineering and data engineering. Together, we primarily build Python-based algorithms and data processors that are wrapped and deployed into Linux-based systems and AWS cloud instances, and run in real-time environments using Docker containers.

My typical day-to-day tasks as a data engineer:

  • Go through email and reply to messages.
  • Check-in with the team for 15-30 minutes to review progress and issues.
  • Review work completed by other folks on the team to make sure it meets best practices and works as expected.
  • Meet with individuals ad-hoc to work through any bugs or blockers.
  • Write/test/run code and algorithms on the data to make sure they run and work as expected.
  • Plan tasks for the team for the upcoming days and weeks, and review decisions with management.

     

    Teams are built with engineers who have a mixture of skills, so you’re always working with people from both similar and different backgrounds.”


    What’s one thing that might surprise people about your role as a data engineer at your company? 

    One thing that might surprise people about the roles at MORSE is that data engineers (or any particular discipline) are not separated off into their own group. Teams are built with engineers who have a mixture of skills, so you’re always working with people from both similar and different backgrounds. This is great for building a diverse and talented project team, and it allows all of us to gain knowledge on new topics. 

     

    H-E-B

    Amogh Antarkar

    DATA ENGINEER

    Amogh Antarkar

    Give us a little insight into a typical day for you.

    My day typically starts with a daily team sync and coffee. I spend time in design, architecture and development of the data engineering patterns and solutions. My role involves working closely with fellow data platform engineers and architects to design and model solutions for the business and data science teams that are our key stakeholders. Other times, I am modernizing data platforms and implementing frameworks to establish better standards. I occasionally sync up with leaders and product for sprint and roadmap-planning to collaborate on new initiatives.

    For our tech stack, I use services on AWS and Azure because that is where the data lake resides. Python, Spark, Databricks, Datadog, Kafka, GitLab and CI/CD tools are some of the other tech we use. Many source systems reside on Google Cloud and on-premise tools, so we integrate with these as well. H-E-B is a large progressive company with an increasingly digital presence. Its engineering teams reside in multi-cloud environments with a variety of technologies and tools.

     

    Tell us about a project you’re working on right now that you’re really excited about.

    I designed and implemented a data engineering service that integrates Google Analytics clickstream data across H-E-B’s e-commerce apps and sites. This helps us listen to the pulse of the business and run experimentation to improve customer experience. The sheer volume of data generated daily made it an interesting large-scale data problem. Add in the multi-cloud nature of the platforms, the faster analytics, and the business and data science needs, and this project was demanding. It forced the team and me to architect a well-thought-out, cloud-native data processing pipeline. 

    A project like this was possible because of the flat organizational team structure at H-E-B and the people-centric culture. It gave me an opportunity to drive a key initiative and build a reusable and reliable engineering pattern from the ground up that had a positive impact in the engineering chapter.

    In order to be a strong data engineer, you need a diverse skill set.”

     

    What’s the most important skill (hard or soft) a data engineer needs to be successful in their role?

    In order to be a strong data engineer, you need a diverse skill set. This includes programming, SQL, cloud, data and software engineering, design and data science. In addition, an engineer needs to understand the business and have the ability to communicate with stakeholders effectively. Data is core to everything. It is essential to be curious and to learn about state-of-the-art technologies. Loving data is the answer!

     

    SPINS LLC

    Mark Dai

    SQL DEVELOPER

    Mark Dai

    What does a typical day look like for you? 

    During a scheduled data release week, a typical day involves executing, monitoring and troubleshooting workflows in our on-premises and cloud environments. There is also excitement around our day-to-day projects outside of data release week, as we constantly look for ways to enhance our current workflows. For instance, we problem-solve; identify operating procedures; create new automated quality assurance checks; and keep up with new functionality released for our cloud computing platform. 

    As a data operation team member, it is important to communicate with a wide range of stakeholders. We need to know about upcoming features and changes to our product library, then evaluate whether there’s anything that needs to be altered in the operation procedure. We also need to understand common questions or requests received by customer success so we can prioritize our work. 

    I mainly use Python and SQL for projects. We also need a strong understanding of various Google Cloud Platform tools and workflow processing engines like Airflow and Azkaban. Occasionally, we need to perform troubleshooting with MySQL, Hadoop Distributed File System and Hadoop logs.

    As a data ops team member, it’s important to communicate with a wide range of stakeholders.”

     

    Tell us about a project you’re working on right now that you’re really excited about. What about this project specifically do you find rewarding or challenging?

    We are always excited about adding new technologies to our workflow that simplify our daily operations and improve efficiency. One of the current projects I am very excited about is migrating our scripts from our on-premises server to Airflow. This new system will be more efficient than repeatedly deploying code, running the environment and scheduling jobs with cron. Airflow also enhances collaboration, making it easier to trigger processes continuously and share the tools with others.
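The shift this migration makes, from independent cron entries to tasks declared once with explicit dependencies, can be sketched in plain Python. The task names, bodies and tiny runner below are illustrative stand-ins built on the standard library's `graphlib`; they are not SPINS code or the Airflow API:

```python
from graphlib import TopologicalSorter

# Illustrative only: a miniature stand-in for what a DAG scheduler like
# Airflow provides. Tasks are declared once with explicit dependencies,
# instead of separate cron entries whose ordering is implicit.

def extract():
    return [1, 2, 3]                      # pretend: pull rows from a source

def transform(rows):
    return [r * 10 for r in rows]         # pretend: clean and reshape

def load(rows):
    return f"loaded {len(rows)} rows"     # pretend: write to the warehouse

# The pipeline as a DAG: each task maps to the set of tasks it depends on.
dag = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
tasks = {"extract": extract, "transform": transform, "load": load}

def run(dag, tasks):
    """Execute tasks in dependency order, feeding each its upstream results."""
    results = {}
    for name in TopologicalSorter(dag).static_order():
        upstream = [results[dep] for dep in sorted(dag[name])]
        results[name] = tasks[name](*upstream)
    return results

print(run(dag, tasks)["load"])  # → loaded 3 rows
```

A real scheduler layers scheduling, retries, logging and a UI on top of this same dependency-graph idea.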

     

    What’s the most important skill a data engineer needs to be successful in their role?

    Although many skills are needed for this role, the most important ones are staying focused and task management under pressure. The bottom line is that data will go directly to customers after our process, so we must make sure that we have good data quality in a limited amount of time. When operation procedures do not go as scheduled, we may need to execute some things out of order to ensure high-priority needs are met.

     

    Young Alfred

    Kang Cao

    SOFTWARE AND DATA ENGINEER

    Kang Cao

    Kang Cao, describe a typical day for you. 

    When working on a data science project, we adapt the general data-mining methodology CRISP-DM (cross-industry standard process for data mining) into a spiral development process. The six stages of that process are as follows: business understanding, data understanding, data preparation, modeling, evaluation and deployment.

    After defining the scope and brainstorming, data engineers are required to collect data for exploration. Data sources are varied and scattered across many corners of production environments, not to mention metadata. Then engineers can work on data integration, including data standardization, data categorization and persistence. It is hard to know how to standardize data, including table naming, label categorization, table usage and whether to load data incrementally. It may require assistance from business intelligence. Once we have figured out what our data warehouse looks like, it is easy to transform and serve data for an application. 

    From there, we ask ourselves whether we should provide a RESTful API or a streaming engine such as Kafka. It can be flexible based on the needs of our analyst teammates. When building the platform, we value performance, clean code and well-crafted queries. 

    When it comes to deployment, engineers have an important role in converting the model into a product. We might need to package the model into a binary when the research and production environments use different languages. If the model does not provide a scalable way to extract and process data, we might need extra layers, such as wrapping Python functions into a cron or streaming job with a cache layer. In most cases, monitoring mechanisms are required to locate problems missed in the development stage and detect covariate shift, meaning any change in the distribution of the running model's source data.
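The distribution-shift monitoring described above can be sketched with a two-sample Kolmogorov-Smirnov statistic that compares a reference window of one feature against a live window. The threshold, window sizes and synthetic data below are illustrative assumptions, not Young Alfred's actual monitoring setup:

```python
import random

# Hypothetical sketch: flag covariate shift by comparing a live window of a
# feature against a reference window. The KS statistic is the maximum gap
# between the two empirical CDFs, computed with a standard two-pointer merge.

def ks_statistic(reference, live):
    """Maximum absolute difference between the two empirical CDFs."""
    ref, liv = sorted(reference), sorted(live)
    i = j = 0
    d = 0.0
    while i < len(ref) and j < len(liv):
        if ref[i] <= liv[j]:
            i += 1
        else:
            j += 1
        d = max(d, abs(i / len(ref) - j / len(liv)))
    return d

def shifted(reference, live, threshold=0.2):
    """True when the live window's distribution drifts past the threshold."""
    return ks_statistic(reference, live) > threshold

random.seed(0)
baseline = [random.gauss(0, 1) for _ in range(500)]  # training-time feature
same     = [random.gauss(0, 1) for _ in range(500)]  # live data, no drift
drifted  = [random.gauss(2, 1) for _ in range(500)]  # live data, mean moved

print(shifted(baseline, same), shifted(baseline, drifted))
```

In production, a library routine such as SciPy's two-sample KS test (`scipy.stats.ks_2samp`) could replace the hand-rolled statistic and also supply a p-value.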

     

    HOW YOUNG ALFRED USES DATA

    Young Alfred’s home insurance platform allows clients to make informed homebuying decisions. Its technology analyzes billions of data points for every customer to find the best-priced insurance options available.


     

    Tell us about a project you’re working on right now that you’re excited about. 

    Last month, we released Young Alfred’s home insurance calculator to estimate a home insurance premium from a limited number of inputs. A comprehensive home insurance calculation requires over 100 factors, but people often don’t want to enter 100 pieces of information to get a ballpark figure. Adding to the complexity, those 100 factors can change over time and from one ZIP code to another. There are more than 42,000 ZIP codes in America, and each one can have a different custom calculator to estimate home insurance premiums. 

    Rather than starting with a clustering algorithm, we chose a simpler model as a baseline to improve on over time. We built a simple model framework with 42,000 sets of model coefficients. 

    We settled on a model that still performs when not all 10 input factors are available, while minimizing squared error. While we evaluated over a dozen types of models to arrive at our first implementation, it’s fun to know that the one we implemented was grounded in simplicity and still leaves room for improvement. As a starting point, it is still more powerful than anything else out there.
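A toy version of such a framework, with one set of linear coefficients per ZIP code and ZIP-level averages filling in for missing factors, might look like the following sketch. Every name, coefficient and the single ZIP entry are invented for illustration; they are not Young Alfred's actual factors or model:

```python
# Hypothetical per-ZIP model framework. Missing input factors fall back to
# that ZIP code's historical averages, so a partial set of inputs still
# yields an estimate.

ZIP_MODELS = {
    "02139": {
        "intercept": 3500.0,
        "coef":  {"sqft": 0.25, "year_built": -1.5, "num_claims": 120.0},
        "means": {"sqft": 1500, "year_built": 1960, "num_claims": 0.4},
    },
}

def estimate_premium(zip_code, inputs):
    """Linear estimate using the ZIP's coefficients; impute missing factors."""
    model = ZIP_MODELS[zip_code]
    total = model["intercept"]
    for factor, coef in model["coef"].items():
        value = inputs.get(factor, model["means"][factor])  # impute if absent
        total += coef * value
    return total

# Full inputs vs. only one known factor: both produce an estimate.
full    = estimate_premium("02139", {"sqft": 2000, "year_built": 2000, "num_claims": 0})
partial = estimate_premium("02139", {"sqft": 2000})
print(full, partial)
```

Fitting each ZIP's coefficients by least squares would match the squared-error objective mentioned above, while keeping every per-ZIP model trivially interpretable.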

    Drawing insights and predictions from this unmined data is similar to discovering a whole new world.”

     

    What’s one thing that might surprise people about your role as a data engineer at your company?

    Understand that not all data is equal. Datasets are as diverse as people. Most of the established tech and finance industries work with beautifully cleaned and normalized data because the industry is mature and the value of a basis point in that industry is so high. However, many industries are filled with disorganized data that is highly categorical and incredibly sparse. Drawing insights and predictions from this unmined data is similar to discovering a whole new world. 

    More than data engineers, we are data explorers. We handle the data everyone else said was too hard to parse or model, and that is what excites the best data engineers.

     

    Drizly

    Dennis Hume

    DATA ENGINEER

    Dennis Hume

    Data Engineer Dennis Hume said that in his role at Drizly, he works closely with the analytics and DevOps teams to ensure they’re all on the same page and are able to leverage each other’s work on any given project. 

     

    Typically, what does your role as a data engineer entail? 

    I work very closely with the analytics team, which includes data scientists and analysts. Within our data stack, we have a lot of shared spaces (dbt, Snowflake, etc.) and we want to make sure that everything is flexible enough for the range of jobs on the team, but also that there are enough guardrails that no single process can prevent other people from doing their job. 

    Outside of analytics, I work a lot with our DevOps team. The infrastructure requirements for analytics are usually different from the core application, so we always want to make sure we’re all on the same page and can support each other. I want to be able to leverage all the work they do but think of how it applies to analytics use cases.

    Ultimately, the role of data engineering is to support data stakeholders across the organization.”


    What’s an interesting project you’re currently working on?

    We are transitioning our data orchestration over to Dagster. As we grew the data science team, Dagster has been something the team can coalesce around. We treat the work done by analytics like a core application and Dagster helps ensure that when we build something it has predefined environments, tests and workflows that align with all the roles on the team. We want as little bespoke work as possible. If we have done something successfully in the past, it should be trivial for someone else to spin up a similar process.

     

    What’s an important skill a data engineer needs to be successful in their role?

    The most important thing is trying to stay informed about the data landscape and know how you can evolve your data stack with your organization. Many of the technologies we use did not exist a few years ago and now I can’t imagine life without them. When you evaluate a new piece of technology, you want to think about how it complements what exists within your stack and what it would potentially replace. 

    Ultimately, the role of data engineering is to support data stakeholders across the organization and not be the bottleneck for data work. That’s why it’s important to see what other people are doing in the data world and how other organizations are solving problems. Building a solution from scratch should rarely be your first thought — even if you have a clever pun for your project name.

     

    MobilityWare

    Grace Ge

    ASSOCIATE DIRECTOR OF DATA ENGINEERING

    Grace Ge

    What’s a typical day like for you as a data engineer at MobilityWare? 

    A typical day for me begins with checking the alerts for the data integration automation jobs to see if I need to troubleshoot any issues or backfill data. If not, I will proceed to peruse the Jira tickets to identify tasks to add to my to-do list, such as data pipeline implementation, code review, production deployment, query performance tuning or data quality testing. Next is our team’s daily 15-minute sync to share development plans and identify any blockers that require immediate attention. 

    I work very closely with our CTO and CRO, along with the product, marketing, monetization and engineering teams. I do my best to truly understand each department’s requirements and business goals so that I can design the most efficient systems possible. For the rest of the day, I focus on coding for our data lake and data warehouse projects or using Python for API integration projects. Once a week, we will have a design and code review meeting that helps us share fresh ideas and best practices. Whenever possible, I also schedule some time for self-improvement by learning new management and engineering techniques.
     

    The opportunity to solve some of the challenges facing our teams is what excites me about coming to work (virtually) every day.”


    What’s an interesting project you’re currently working on? 

    As a mobile gaming company, we have a lot of in-game events and third-party data across our titles and platforms. Having a 360-degree view of customer behavior with standardized KPI metrics is key to making informed business decisions. I’ve standardized our infrastructure and I’m working on a data-unification project with my team in which we are creating a standard framework to consolidate our customers’ data on a daily basis. 

    The challenge of this project is solving both the technical and process requirements simultaneously. We’re required to design an efficient system that is stable, scalable and secure to handle trillions of rows of data across in-house and cloud data platforms. Also, we need to streamline the processes among various departments to ensure that the newly created systems and frameworks are adopted properly. This project will not only provide a single source of truth for our customer view across our marketing, sales and finance departments, but it will also reduce hundreds of labor hours for our data and business intelligence teams. The opportunity to solve some of the challenges facing our teams is what excites me about coming to work (virtually) every day.

     

    What’s the most important skill a data engineer needs to be successful in their role?

    There are many skills required to be a successful data engineer at MobilityWare. Creating batch and real-time data pipelines, understanding data warehouse and data mart design and implementation, API data integration, communication and critical thinking are just some of the skills needed to succeed here. Above all though, I think the ability to troubleshoot in both technical and non-technical areas is the most vital. 

    At MobilityWare, you will have the opportunity to create a brand-new data pipeline and you will also be the first person to see it in production. Due to the complexity of the data sources, there will be unexpected data issues that come up frequently. You need to be able to troubleshoot on the fly to quickly figure out the root cause, via error logs or other means, to keep the data flowing and minimize any interruptions to our business. One of our company values is to encourage team members to do the right thing. To do so, you need to keep your eyes open for potential issues and look for new ways to improve our systems so that MobilityWare may continue to bring joy to others one game at a time.

     
