Staff Data Engineer



Software Engineering, Data Science
United States · Remote
Posted on Monday, September 25, 2023

Predictive analytics and machine learning power Socure’s groundbreaking technology and fuel our mission to verify 100% of good identities in real time and completely eliminate identity fraud on the internet.

Socure is the world leader in digital identity verification and fraud prevention. Our recent awards include Forbes 2022 America’s Best Startup Employers, The Forbes Cloud 100, The Deloitte Technology Fast 500, and Inc. 5000’s fastest-growing companies.

Socure is a data science company focused on digital identity verification and fraud prevention. We’re on a mission to provide equitable and seamless access to the products that people love most, and it starts with a platform where businesses and consumers can manage their identity from anywhere in the world. Socure's approach is forward-looking and demonstrates how machine learning and AI can be applied to complex problems in a variety of industries including fintech, public sector, gaming, gig, e-commerce and more.

We are relentlessly focused on delivering an exceptional experience to our customers that helps them solve their most important problems. We achieve this by hiring the most talented data scientists, engineers, product managers, and go-to-market teams, empowering them to make smart decisions to solve real customer needs. You’ll have an outsized impact on the company's revenue and we’re excited to share exponential wins together.

Our Culture:

  • We don't care where you work from, as long as you are highly productive and collaborative. #NoPolitics #NoBureaucracy

  • We prize a growth mindset and continuous learning. If you think you've stopped learning, you will not thrive here.

  • We value outcomes and impact, not hours worked or adherence to arbitrary schedules. Do what it takes to achieve your key results, and take the time you need to recharge.

  • We trust teams and give them full autonomy and accountability. They are passionate professionals, not cogs in a machine.

  • Transparency builds trust and empowers teams to make the best decisions. Share everything.

  • Failure is learning. We create a safe environment where risk-taking is encouraged and mistakes are tolerated.

  • Teams over individuals. We promote a collaborative culture where silos are broken down and groups work together with a shared purpose. Solving hard problems requires partnership.

  • Remote-first company.

As a Staff Data Engineer, you will:

  • Apply advanced algorithms and techniques to perform entity resolution, matching, and linking, enabling the identification and connection of related entities within the dataset.

  • Build an identity-based entity resolution graph by incorporating various identity attributes and factors to accurately link and disambiguate entities.

  • Utilize the entity graph in the development of data-driven applications, such as fraud detection systems and identity profiling.

  • Orchestrate and tune extremely large-scale data processing across hundreds of nodes using Apache Spark.

  • Collaborate closely with cross-functional teams, including data scientists, analysts, and software engineers, to understand requirements and integrate entity resolution capabilities into data-driven solutions.

  • Optimize graph-based algorithms and data structures to efficiently handle large volumes of data, ensuring the process remains performant, cost-efficient, and scalable.

  • Conduct data profiling and analysis to understand data quality issues and develop strategies for data cleansing and normalization.

  • Evaluate and integrate external data sources and APIs to enrich the entity resolution process, expanding the scope and accuracy of entity identification.

  • Stay up to date with the latest research and advancements in entity resolution and graph-based data processing techniques.

  • Provide technical leadership, guidance, and mentorship to junior team members, fostering their growth and ensuring best practices are followed.

  • Collaborate with stakeholders to understand business requirements and leverage the entity resolution graph to provide valuable insights and drive data-informed decisions.

  • Architect and develop scalable data processing pipelines for extracting, transforming, and loading data from multiple sources into the entity resolution graph.

  • Document technical specifications, data processes, and best practices, ensuring knowledge sharing and facilitating efficient collaboration across teams.

We’re looking for someone who has:

  • 7-9 years of experience as a data engineer.

  • Extensive hands-on experience with Apache Spark, including building and optimizing Spark-based data processing pipelines for large-scale datasets.

  • Strong programming skills and hands-on development experience in Scala, Python, and other languages.

  • Strong problem-solving and analytical skills, with the ability to design and implement efficient algorithms for entity resolution.

  • Proficiency in SQL and experience with both relational and non-relational databases.

  • Experience with data quality assessment and data cleansing techniques is a plus.

  • Excellent communication and collaboration skills, with the ability to work effectively in a team environment.

  • Solid understanding of data engineering principles, including data extraction, transformation, and loading (ETL) processes.

  • Proven experience as a Data Engineer, specifically working on entity resolution or graph-based data processing projects.

  • Experience with graph databases and frameworks such as Neo4j, Apache Giraph, or Apache Flink is a plus.

  • Comfortable working with large datasets and a lean team to solve complex problems.

  • Ability to envision new ideas and products through sophisticated data analysis.

Technologies we use:

  • Data Processing Frameworks: We leverage powerful data processing frameworks such as Apache Spark and Snowflake to handle large-scale data processing and distributed computing, enabling efficient entity resolution across massive datasets.

  • Programming Languages: Our data engineering team predominantly uses Scala and Python for implementing data processing pipelines, developing matching algorithms, and building scalable solutions for entity resolution.

  • Data Integration Tools: We utilize data integration tools like Apache Airflow along with our proprietary data platform to efficiently collect, transform, and stream data from various sources into our entity resolution pipeline. These tools enable seamless integration and data flow management across different systems.

  • Cloud Platforms: We leverage the capabilities of cloud platforms such as Amazon Web Services (AWS) to enable scalability, flexibility, and cost-effectiveness in our entity resolution infrastructure. These platforms offer a wide range of managed services and storage options for processing and storing large volumes of data.

  • Machine Learning Libraries: We employ machine learning libraries like scikit-learn or PyTorch to incorporate advanced machine learning techniques into our entity resolution process. These libraries facilitate feature engineering, model training, and prediction for improved entity matching and linking.

  • Version Control and Collaboration: We follow best practices in version control and collaboration using GitLab. GitLab enables seamless collaboration, code review, and version tracking, ensuring a smooth and efficient development process.

If you are passionate about entity resolution and graph-based data processing, enjoy working with large datasets, and want to be part of a dynamic team that is driving innovation through data, we would love to hear from you. This position could be based anywhere in the United States.

Salary Disclosure:

Base Salary range: $180,000 - $195,000

This represents the expected salary range for this job requisition. Final offers may vary from the amount listed based on factors including geography, candidate experience and expertise, and other job-related factors. Socure's compensation and rewards package for full-time roles includes a market-competitive salary, equity, comprehensive benefits, and, for applicable roles, commission plans or an annual discretionary performance bonus.

Socure is all about encouraging people to push the boundaries of what’s possible through top-tier performance, innovation, ownership, and shared expertise.

We empower excellence by providing great perks and benefits to both our fully remote employees in North America and our hybrid teams in India.

To learn more, check out Socure’s Career Page: https://www.socure.com/company/careers

Socure is an equal opportunity employer and values diversity of all kinds at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
