At Data Society Group, we provide the highest-quality, leading-edge, industry-tailored data and AI training and solutions for Fortune 1000 companies and federal, state, and local government organizations. We partner with our clients to educate, equip, and empower their workforces with the skills they need to achieve their goals and expand their impact. We are empowering the workforces of the future, supporting engineers and scientists as they train on the most complex AI solutions and machine learning skills.
Role Overview
We are seeking a capable and resourceful Data Engineer with expertise in cloud-based, text-focused AI systems to join our technology and solutions team. In this role, you will be a key individual contributor, applying your expertise to build robust, scalable, and complex data and AI solutions for our external clients. You will work within a cross-functional team, collaborating closely with UX Designers, Engineers, and Project Managers to translate client requirements into high-quality technical deliverables.
Note: Due to the confidential nature of our federal government clients, this role requires the ability to pass a United States federal government Public Trust background check and is open exclusively to U.S. citizens located within the United States.
Responsibilities
- Design, build, and maintain scalable data pipelines for structured and unstructured data ingestion, transformation, and processing.
- Architect, build, and deploy LLM-based solutions on cloud platforms, including prompt pipelines, orchestration layers, embeddings, vector databases, and evaluation workflows.
- Design and implement retrieval-augmented generation (RAG) systems end-to-end, including document ingestion, chunking and embedding, indexing, retrieval, grounding, and model integration.
- Architect and enforce data models, governance, cataloging, and schema design to support both analytics and AI workloads.
- Build and optimize cloud-native data architectures to support compute, storage, and orchestration for high-throughput, production-grade AI workloads.
- Implement reliable and efficient ETL patterns, leveraging best practices for data quality, lineage, versioning, and cataloging.
- Instrument monitoring and observability for data pipelines and AI systems, tracking latency, error rates, and schema drift, with alerting, automated remediation where possible, and ongoing performance optimization.
- Operate effectively within Agile workflows, contributing to sprint planning, estimation, backlog refinement, and continuous improvement.
- Work closely with clients to gather requirements, provide technical guidance, and present solutions and implementation plans.
- Communicate complex technical information to both technical and non-technical stakeholders.
- Work cross-functionally with UX, engineering, and PM teams to deliver client-facing solutions.
- Translate complex technical needs into clear development requirements and implementation plans.
- Stay current with emerging technologies and recommend improvements to our engineering practices, architecture patterns, and cloud ecosystem.
Qualifications
- Hands-on experience deploying LLM-based applications, including RAG or similar retrieval systems.
- Proven experience deploying systems on AWS or Azure (AWS preferred).
- Strong understanding of embeddings, chunking strategies, retrieval optimization, and evaluation.
- 5+ years of data and analytics engineering experience in cloud environments.
- Expertise in SQL, Python, and schema design, with experience using data cataloging and governance tools.
- Demonstrated experience building robust and maintainable data architectures, including real-time or streaming pipelines.
- Experience working in Agile/Scrum development processes.
- Excellent communication skills and ability to work cross-functionally with non-technical teams.
Please note this job description is not designed to cover or contain a comprehensive listing of activities, duties, or responsibilities that are required of the employee for this job. Duties, responsibilities, and activities may change at any time with or without notice.
This position is remote within the U.S. but based out of the Washington, DC area, with travel to client sites in DC as needed.
What We Do
WHO WE ARE:
Data Society provides industry-tailored, high-quality data science training and advisory services for corporations and government agencies that want to make data-driven decisions. We believe that modern organizations need to be equipped to make data-driven decisions that significantly amplify the effects of their existing subject matter expertise. These skills result in increased efficiency and more accurate analyses for research and operations. For organizations to become more data-driven, they must develop a common data vocabulary across leadership, analysts, and generalists. They must also empower their managers and technical staff with appropriate data science skills. Data Society addresses both of these needs through customized data science workshops and online learning that help organizations stay competitive and accomplish their missions.
OUR VISION:
We are passionate about increasing data literacy on a global scale so that organizations and their employees can be smarter about data. We've seen how data analysis can solve problems and provide insights that lead to better business and policy decisions. We are dedicated to giving others the tools to approach problems strategically and to building a community in which we answer the question, "How can data science change the world?"
OUR SOLUTIONS:
Our expert data scientists and engineers work alongside your teams to make more informed decisions, automate time-consuming manual processes, solve your most complex data challenges, and build data systems to ensure the enduring impact of your work.
OUR TRAINING:
We teach data science in an intuitive, engaging, and applicable way so that you can apply the methods you learn in class right back in the workplace. We provide training through a combination of in-person, virtual, and online instruction with supplemental coaching and support. We don't just provide data science courses; we create communities of professionals who think about data differently.