About NewsBreak
NewsBreak is the content intelligence platform shaping the future of local information. With over 40 million monthly active users, our flagship platform delivers highly personalized local news and information experiences powered by advanced AI, recommendation systems, and adtech.
We're proud to be a Great Place to Work®-certified company, home to a dynamic team of technologists, product innovators, and business leaders who are passionate about solving meaningful challenges at scale.
If you’re a Dreamer, a Builder, or an Innovator, we’d love to hear from you!
For more information, visit www.newsbreak.com/about
We're seeking a founding engineer to lead the design and development of our next-generation web crawling and dynamic indexing infrastructure. Your mission will be to create an adaptive, real-time crawling system that not only integrates seamlessly with external search providers but also evolves rapidly based on user queries and interactions, continuously expanding and refining NewsBreak’s proprietary content index and recommendation knowledge graph.
This is far beyond a traditional web crawling role. You’ll architect and implement sophisticated crawling strategies informed directly by real user search patterns, enabling our AI agents to provide fresh, accurate, and hyper-localized responses. You will build infrastructure capable of dynamically responding to user queries, proactively crawling and indexing content within minutes, rather than days or weeks.
Your work will directly empower our AI-driven question-answering and recommendation systems, creating a closed-loop feedback mechanism where user queries trigger real-time crawling and indexing tasks, continuously improving our content quality and comprehensiveness. This is an opportunity to rethink web crawling as a foundational intelligence layer, rather than a static data collection tool.
Responsibilities
- Design, develop, and deploy a real-time, adaptive web crawling and indexing infrastructure that responds proactively to user-generated queries and integrates external search results.
- Architect dynamic crawling strategies that rapidly prioritize, fetch, parse, and index web pages based on real-time demand signals.
- Implement scalable crawling systems supporting millions of URLs per day with minutes-level latency from discovery to indexing.
- Collaborate closely with AI, search, and recommendation teams to build a tightly coupled feedback loop between user queries, crawling decisions, and content indexing.
- Own the full lifecycle of the crawler infrastructure, from discovery algorithms through URL state management, garbage collection, deduplication, and storage optimization to downstream indexing integration.
- Optimize crawler performance, reliability, and resource utilization through rigorous profiling, monitoring, and tooling.
- Mentor junior engineers and help build out a high-performing infrastructure team with deep expertise in intelligent web crawling systems.
Qualifications
- Bachelor's degree or higher in Computer Science, Engineering, or a related technical field.
- 5+ years of proven experience designing and operating large-scale web crawling and indexing infrastructure at major technology companies or innovative startups.
- Extensive experience with distributed systems, crawler frontier design, real-time URL prioritization, and high-QPS crawling infrastructure.
- Strong system-level coding skills in Python, Go, or C++.
- Demonstrated ability to integrate web crawling systems with downstream indexing, search engines, or NLP/AI pipelines.
- Solid understanding of web technologies, web protocols (HTTP/HTTPS), JavaScript rendering, and anti-scraping countermeasures.
- Experience building responsive crawling systems driven by real-time user signals or query logs is a strong plus.
- Excellent problem-solving, analytical, and communication skills, with a proactive attitude towards system improvement and innovation.
We offer a competitive benefits package:
- Health, dental, and vision care for you and your family (100% coverage for employees)
- Top-tier 401(k) plan with company matching
- Paid time off and paid holidays
- FSA, HSA and commuter benefits programs
- Team activity budget
CPRA Privacy Notice for California Candidates
What We Do
NewsBreak is the leading platform for local news and information, with more than 40 million users across America. By using new technology, NewsBreak provides community-focused news and information from over 10,000 sources in a timely and accessible way. NewsBreak is bridging the gap between new technology and traditional local media, offering an innovative digital solution that allows users to get the information they need to live safer, more vibrant, and connected lives.
Based in Mountain View, California, NewsBreak connects users with local information, national publishers, and targeted advertising from local businesses, with increased traffic and revenue that helps strengthen local communities.
We are always looking for great talent. Contact us via [email protected]
Our website:
https://www.newsbreak.com/about
Creator Program:
https://www.newsbreak.com/creators
Publishing Platform:
https://mp.newsbreakapp.com
Android App Download:
https://play.google.com/store/apps/details?id=com.particlenews.newsbreak&hl=en
iOS App Download:
https://itunes.apple.com/us/app/news-break-personal-local/id1132762804?mt=8