We are looking for a skilled and motivated Mid-level Java Developer to join our dynamic data acquisition team. In this role, you will be responsible for building and maintaining robust web scrapers that form the backbone of our data-driven insights.
You will work on extracting both structured and unstructured data from a wide array of web sources, ensuring the efficiency, accuracy, and reliability of our data pipelines. This position requires a strong foundation in Java development, a good understanding of web technologies, and a passion for solving the unique challenges presented by web scraping at scale.
Develop and Maintain Scrapers: Build, deploy, and maintain efficient and reliable web scrapers using Java and its core libraries to extract data from diverse websites and online sources.
Automate and Schedule: Design and implement scripts to automate repetitive scraping tasks, scheduling jobs using tools like cron or enterprise schedulers (e.g., Airflow) to ensure timely data collection.
Data Storage and Management: Store and manage scraped data effectively in various databases, including SQL and NoSQL solutions, as well as cloud-based storage platforms.
Overcome Scraping Hurdles: Employ various tools and techniques to successfully navigate and bypass common web scraping obstacles such as CAPTCHAs, dynamic content loading, and IP blocking.
Optimize for Performance: Ensure scrapers are optimized for performance and scalability, capable of handling large-scale data extraction tasks without compromising system stability.
Data Processing and Cleansing: Transform raw scraped data into clean, structured formats like CSV and JSON. Implement data validation and cleansing processes to guarantee data quality and integrity.
Ensure Compliance: Adhere to web scraping best practices and ensure all data acquisition activities are in compliance with legal and ethical standards, including website terms of service.
Collaborate Effectively: Work closely with data analysts, product managers, and other developers to understand data requirements and deliver high-quality, actionable data.
A Bachelor’s degree in Computer Science, Software Engineering, Information Technology, or a related technical field.
3+ years of professional experience in software engineering with a strong focus on Java development and proven experience writing Java code to extract data from websites, ensuring efficiency, accuracy, and adherence to best practices.
2+ years of experience with web technologies, including a solid understanding of JavaScript, HTML, CSS, and XML for effective entity extraction, and hands-on experience designing, querying, and managing data in both SQL and NoSQL databases.
2+ years of experience with core Java web scraping libraries, such as Jsoup for HTML parsing, and with browser automation tools like Selenium or HtmlUnit for dynamic, JavaScript-rendered content, plus experience handling data formats like JSON and CSV and applying data cleaning and validation techniques.
English proficiency of B2 or higher.
Experience with cloud platforms such as AWS, Google Cloud, or Azure for deploying and managing scraping infrastructure.
A foundational understanding of network traffic analysis.
Familiarity with the full Software Development Life Cycle (SDLC), including testing and quality assurance.
Proficiency with version control systems, particularly Git, for collaborative development.
Experience with CI/CD pipelines and associated tools.
A keen understanding of the importance of respecting website terms of service and practicing ethical scraping.
Primary Location: CRI-Sabana
Function: Function - Tech Dev and Client Services
Schedule: Full time
What We Do
At Equifax (NYSE: EFX), we believe knowledge drives progress. As a global data, analytics, and technology company, we play an essential role in the global economy by helping financial institutions, companies, employers, and government agencies make critical decisions with greater confidence. Our unique blend of differentiated data, analytics, and cloud technology drives insights to power decisions to move people forward.
Headquartered in Atlanta and supported by nearly 15,000 employees worldwide, Equifax operates or has investments in 24 countries in North America, Central and South America, Europe, and the Asia Pacific region.
For more information, visit Equifax.com.