Automation & Site Reliability Engineer

Wells Fargo
April 26, 2021
Concord, CA
Job Description

Important Note: During the application process, ensure your contact information (email and phone number) is up to date and upload your current resume when submitting your application for consideration. To participate in some selection activities you will need to respond to an invitation. The invitation can be sent by both email and text message.  In order to receive text message invitations, your profile must include a mobile phone number designated as 'Personal Cell' or 'Cellular' in the contact information of your application.

At Wells Fargo, we are looking for talented people who will put our customers at the center of everything we do. We are seeking candidates who embrace diversity, equity and inclusion in a workplace where everyone feels valued and inspired.

Help us build a better Wells Fargo. It all begins with outstanding talent. It all begins with you.

Technology sets IT strategy; enhances the design, development, and operations of our systems; optimizes the Wells Fargo infrastructure; provides information security; and enables Wells Fargo global customers to have 24 hours a day, 7 days a week banking access through in-branch, online, ATMs, and other channels.

Our mission is to deliver stable, secure, scalable, and innovative services at speeds that delight and satisfy our customers and unleash the skills potential of our employees.

Platform Management Engineering Services, within Enterprise Functions Technology (EFT), focuses on scaled horizontal enterprise solutions that are stable, secure, and always on. EFT Engineering Services is seeking an Automation Engineer and a Site Reliability Engineer (SRE) to be part of a newly embedded Site Reliability Engineering practice within EFT supporting multiple technology divisions. We believe that "Hope is not a Strategy," and we solve operational issues through code.

We are looking for an SRE who thrives on solving complex problems through innovation and driving change at scale in a diverse environment. You will join a focused team of SREs introducing and advancing the SRE discipline across several hundred applications and multiple vertical lines of business supporting the entire firm. The team will drive technology transformation and adoption of SRE-aligned enterprise capabilities and products, launch new tooling, automate away complex issues, and integrate with the latest technology. Site Reliability Engineers leverage their experience as software and systems engineers to ensure that applications onboarded to SRE are available, have full-stack observability, are continuously tested, and are integrated with CI/CD. They introduce continuous improvement through code and automation, provide operational insight through analytics, and work with application teams to ensure the products and services we provide are always on.

The Automation Engineer will work within the Wells Fargo Platform Management team, partnering across platform teams, development teams, product owners, scrum masters, and other technology centers of excellence. The engineer is responsible for engineering new solutions (automated and procedural) to improve platform and application stability, performance, staff productivity, metrics, and reporting, while ensuring all availability, architecture, quality, security, support, and risk/compliance standards are met. The work includes both collaborating with other teams to deliver solutions and, in other cases, creating the solution directly.

The Automation Engineer will be responsible for the following:

  • Develop and oversee the engineering of automated solutions to improve platform and application stability, performance, staff productivity, metrics and reporting.
  • Design, build, deploy and maintain engineered solutions through collaborative efforts with team members and third-party vendors.
  • Collaborate with other teams within the Enterprise to design and create effective solutions.
  • Engage in service capacity planning and demand forecasting, software performance analysis and system tuning.
  • Perform advanced troubleshooting of incidents in mission-critical systems (on call support as necessary) and participate in preventative problem management activities.
  • Partner with EFT technology partners and managers to influence and support innovation and a continued drive toward automation, with touchless operational sustainment as a design and architecture construct.

This role is posted as an Automation Engineer; the Wells Fargo job title is Systems Operations Engineer.

  • Sustain operations and reduce risk in the ecosystem by aggressively pursuing safety-and-soundness actions, including but not limited to vulnerability remediation, patching, end-of-life management, and resiliency.
  • Manage and coordinate production change requests and release management.
  • Manage continuous service improvement and drive innovation to ensure SLAs, KPIs, and OLAs are met for critical business processes, applications, and partner interfaces.

The Site Reliability Engineer will be responsible for the following:

  • Instantiate the Site Reliability Engineering practice at Wells Fargo EFT, igniting its practices, principles, and culture and leading by example. Assist in training skilled peer engineers by growing the practice within EFT and partnering with peer platform-embedded SRE teams.
  • Onboard 16 critical customer journeys and applications to Site Reliability Engineering, working within EFT and the Lines of Business to assess the availability of critical business flows, identify service level objectives and indicators, instrument applications for observability, onboard to the CI/CD pipeline to take advantage of continuous testing, introduce continuous inspection and continuous improvement, and conduct destructive testing. The goal is 99.99% availability for the firm's critical products and services, leading to higher customer satisfaction and a better customer experience.
  • Introduce enterprise capabilities, tools, and innovation that improve availability in a multi-cloud ecosystem by evolving observability, monitoring, logging, CI/CD integration, and continuous testing (performance, smoke, regression, functional, chaos), and by introducing continuous improvement, standardization and automation, and capabilities for destructive and resiliency testing.
  • Evolve AIOps, ChatOps, and NoOps, introducing self-healing and autonomic capabilities that solve complex operational and systemic issues with precision. This includes building and training models, automating cognitive processes, and leveraging Robotic Process Automation, Unified Communication, and AI/ML to improve the availability of the products we provide to customers.
  • Automate key SRE metrics and IT Service Operations processes, including customer impact, percentage availability of critical business flows, SLO/SLI adherence, and error budget. Automate the incident process for IT Service Operations through data, integrating with unified communications and alerting/notification systems, and evolve ChatOps to reduce time to recovery.
  • Share support responsibilities for critical applications and customer journeys onboarded to SRE, including remediating issues through Agile, conducting blameless postmortems and root cause analysis, and introducing continuous improvement that solves problems once and for all, with the goal of no repeats.

Proven Technical Expertise with one or more of the following:

  • Software Development: Java, Go, C/C++, Scala, R
  • OS and Platform: AWS, Lambda, PCF, Kubernetes, OpenShift, Linux, Azure, Windows, VMware
  • CI/CD and Automation: Jenkins, GitLab, SonarQube, Artifactory, Ansible, Puppet, Apigee
  • Observability and AIOps: DataDog, Grafana, Prometheus, ELK, Elastic, Kibana, Kafka, CloudWatch, Jaeger, Zipkin, Kinesis, Apache Airflow, AppDynamics, Splunk

Experience in one or more of the following areas is desired:

  • AIOps: Moogsoft, BigPanda, UiPath, Robotic Process Automation, Artificial Intelligence (AI) and Machine Learning (ML) Frameworks
  • Operations Tools: ServiceNow, PagerDuty, Microsoft Teams, Symphony/Slack, Remedy, IBM Netcool
  • Data/Data Structures: Oracle, SQL, Mongo, Hadoop, Cloudera, Spark, Teradata
  • Testing: Gremlin, Chaos Monkey, Selenium, JMeter, BlazeMeter, Performance Center, Perfecto, Gherkin, DevTest
  • Capacity Management: Turbonomic, BMC TrueSight

Required Qualifications

  • 5+ years of software engineering experience
  • 5+ years of development experience with languages such as Python, Java, Scala, or R
  • 1+ year of build-deploy automation and configuration experience within the Linux and Unix environment

Desired Qualifications

  • An industry-standard technology certification
  • Strong verbal, written, and interpersonal communication skills
  • Scripting and automation experience
  • Experience with the Ansible automation tool
  • Incident Management System experience
  • Configuration Management Tools experience
  • Experience with Agile Scrum (Daily Standup, Sprint Planning and Sprint Retrospective meetings) and Kanban

Other Desired Qualifications

  • 2+ years working with configuration and monitoring technologies such as Ansible, Telegraf, Grafana
  • 3+ years of Unix or Linux administration experience
  • 2+ years of design, implementation and governance experience with Artificial Intelligence, Natural Language Processing or Machine Learning architecture
  • 3+ years of experience with Cloud technologies
  • Experience with Ansible automation
  • Experience with mass vulnerability remediation automation
  • Experience with system administration across multiple platforms
  • Experience with one or more Technology Platforms (Cloud, OS, etc.): Pivotal Cloud Foundry (PCF), AWS, Azure, Linux, VMware
  • Experience with Observability/Monitoring technologies: Splunk, DataDog, Elastic Stack/ELK, Grafana, Prometheus, Kafka, CloudWatch
  • Experience with Container technologies: Kubernetes, Docker, PKS
  • Experience with Site Reliability Engineering (SRE)
  • 5+ years of experience with Agile, Kanban, or Lean methodology

Job Expectations

  • Willingness to work on-site at the location stated in the job opening
  • Ability to travel up to 15% of the time

Street Address

NC-Raleigh: 1100 Corporate Center Dr - Raleigh, NC
NY-New York: 100 Park Ave - New York, NY
AZ-Chandler: 2600 S Price Rd - Chandler, AZ
MN-Minneapolis: 255 2nd Ave S - Minneapolis, MN
CA-Concord: 1755 Grant Street - Concord, CA
TX-DAL-Downtown Dallas: 1445 Ross Ave - Dallas, TX


All offers for employment with Wells Fargo are contingent upon the candidate having successfully completed a criminal background check. Wells Fargo will consider qualified candidates with criminal histories in a manner consistent with the requirements of applicable local, state and Federal law, including Section 19 of the Federal Deposit Insurance Act.

Relevant military experience is considered for veterans and transitioning service men and women.

Wells Fargo is an Affirmative Action and Equal Opportunity Employer, Minority/Female/Disabled/Veteran/Gender Identity/Sexual Orientation.

Benefits Summary


Visit for benefits information.
