The audit is conducted in two stages:
I. Audit of monitoring and critical problems
- Operational analysis of the current architecture to identify and address critical problems and, if necessary, make changes to the architecture
- Audit of monitoring systems in the existing infrastructure
- Improvement of monitoring systems based on identified infrastructure features
- Audit of the failure alert system
- Improvement and adjustment of the failure alert system based on identified infrastructure features
- Audit of how on-call duty shifts are organized, with proposals for improvement based on our own company’s experience
- Audit of the escalation plan within the technical support department
- Documentation based on the audit results
II. Audit of infrastructure
- Analysis of infrastructure stability
- Audit of the reliability of backup systems, suggestions for improvement
- Analysis of how the project's backup systems operate, proposals for improving them, and a plan for regular switchovers to the backup site
- Analysis of DBMS operation, suggestions for improvement
- Consultations on working with technologies from the Apache Software Foundation and their applicability in the company's infrastructure
- Documentation with recommendations for applying the proposed changes to the operation of the company's infrastructure
Technology stack:
The technologies we apply at DevOpsProdigy include, but are not limited to, the following:
- Monitoring and visualization systems: Prometheus, Grafana, TICK Stack, Zabbix, Nagios, Icinga, Datadog, New Relic
- Incident response systems: PagerDuty, Amixr
- Logging, error tracking: ELK, EFK, Grafana Loki, Graylog, Sentry
- Tracing systems: Jaeger, Zipkin
- Web, ingress and application servers: Nginx, Envoy, Linkerd, Traefik, Apache, HAProxy, Jetty, Tomcat, Node.js
- Programming languages: Python, TypeScript, JavaScript, Go, Java, PHP, Ruby, Erlang
- Cloud computing platforms and services: Amazon Web Services (AWS), Google Cloud Platform, Microsoft Azure, Rackspace, Alibaba Cloud
- Container systems: Docker, CRI-O, LXC, LXD
- Container Orchestration: Kubernetes, Nomad, Docker Swarm, Red Hat OpenShift, Mesos/Marathon
- Automation tools: Jenkins, GitLab CI, CircleCI, Travis CI, Bitbucket Pipelines, TeamCity, GoCD, ArgoCD, Spinnaker
- Cloud automation tools: AWS CodePipeline, AWS CodeDeploy, AWS CodeCommit, Google Cloud Build, Spinnaker
- Cloud databases: AWS RDS and other DBs, Google Cloud SQL and other DBs, Firebase, MongoDB Atlas
- IaC: Terraform, Pulumi, Ansible, Chef, Puppet
- Service Mesh: Istio, Maesh, Linkerd
- Service Discovery: Consul, etcd, ZooKeeper, MongoDB
- On-Premises Databases: PostgreSQL, MongoDB, MySQL, Cassandra, Elasticsearch, Redis, MemSQL, Galera, Aerospike, MSSQL, ClickHouse, InfluxDB
- Messaging systems: Apache Kafka, RabbitMQ, Apache Beam, Mosquitto
- Storage for Big Data Applications: Apache Hadoop, Apache HBase, Greenplum, Cassandra
- Query Engines: Apache Hive, Apache Impala
- Data processing and transformation: Apache Spark, Apache Flink, Apache Airflow, Apache NiFi, Kafka Streams