System Administration

This part of the documentation describes the structure and environment of the system parts deployed on the UniApp servers, as well as the tools used to operate the UniApp. The first section, “Server Infrastructure”, describes the server landscape. The second section, “Operations”, describes how to maintain the servers. In particular, it covers the tools for administration and maintenance and provides some guides for these tasks. These guides can be useful for newcomers in situations where quick action is needed.

Server Infrastructure

As mentioned above, this section describes the server architecture. To do that, I will first answer the two most important questions: what and where?

What?

System

The main part of the hosted system is the backend (web) server. This is an Apache web server hosting PHP code implemented with the Laravel framework. This server is responsible for the public homepage and the RESTful API. The latter depends on the databases, which are also part of the server system. A crucial part of our system is that we support multiple versions of the backend server at the same time. This is needed as we cannot ensure that all users update their clients immediately. Furthermore, for development purposes we also deploy a stable beta environment and an unstable dev environment. That is why the backend server is hosted at least three times (dev, beta and at least one prod version) and the database is hosted exactly three times (dev, beta and the prod database). For the provisioning of our system we also run a reverse proxy, which is responsible for all the routing, TLS termination and similar tasks. Another part of the system is the documentation that lives outside the code.

In the summer semester 2020 we introduced a centralized documentation for the UniApp using the documentation generator Sphinx. This documentation is hosted publicly on our servers. The second part of the documentation is the API documentation. Actually, this part is more a specification than a (pure) documentation. Therefore, it is written using OAS3 (the OpenAPI Specification 3) and hosted using Swagger UI via our Laravel backend. Like before, we also need multiple environments (dev, beta and at least one prod env) for the OAS, as they should match the hosted versions of the REST API.
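To give a feeling for what such a specification looks like, here is a minimal OAS3 sketch. The /api/news path (mentioned again below as an example of a dynamic route) and the response schema are purely illustrative and not taken from the real UniApp specification:

Minimal OAS3 example (illustrative only)
 openapi: 3.0.3
 info:
   title: UniApp REST API      # illustrative metadata, not the real spec
   version: 1.0.0
 paths:
   /api/news:                  # hypothetical endpoint, used only as an example
     get:
       summary: List news entries
       responses:
         '200':
           description: A list of news entries
           content:
             application/json:
               schema:
                 type: array
                 items:
                   type: object
                   properties:
                     id:
                       type: integer
                     title:
                       type: string

Swagger UI renders exactly this kind of file; how it is hosted is described in the backend section below.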

As a last part, we host another little interface on our servers. This API is used to query whether a selected version is currently active or deprecated/non-existent. This interface runs independently of our other REST APIs, as it should help to determine whether a backend is down by accident or whether the requested version is simply not supported anymore.

CI-Runner

A big part of our operations toolset are the CI/CD pipelines. These are used to build, test and deploy our services and for a lot of other stuff (e.g. scheduled jobs or archiving meeting protocols). To be able to run resource-intensive pipelines we host our own pipeline runners (as we use GitLab, we use the gitlab-runner).

Operation Tools

Last but not least, we also host some tools needed for the operation and maintenance of the servers. The main tools for monitoring are Sentry.io and Grafana, the latter backed by a Prometheus stack.

Where?

TL;DR

We host our stuff in the bwCloud, a cloud/IaaS operated by multiple universities in Baden-Württemberg.

Let me start with a history lesson. *opens a paper roll* 📜 Once upon a time… Joke. Until the summer semester 2020 the UniApp was hosted directly on a VM provided by the institute for software engineering and programming languages. The VM was running an Ubuntu 16.04 installation using Apache2 as web server (and a GitLab shell runner). Running the whole environment directly on the VM was pretty basic and worked back then. The separation of the different environments was (theoretically, as it was never fully implemented) done with DocumentRoots/locations in Apache2. There was one MySQL server running, hosting one database for each environment. Deploying to this server was done by connecting to the server via SSH and using git to clone/pull the latest version.

But this setup was strongly tied to this one server. It was really easy to screw something up because nothing deployed on this server was using a sandbox or similar. For example, we repeatedly ran into problems with untracked files in the git repo and similar issues. Another risk of this installation was that we could not easily replicate it anywhere else, as the whole configuration had to be done manually. After we ran into several problems with this setup (and the Ubuntu version was rather old) we decided to rework this environment using a containerized setup. Instead of upgrading the old server and migrating the environment, we used a new Ubuntu 20.04 server to build the new environment (the details are described below).

This transition was successful and the new environment was running for some time. But… again we ran into some problems: a resource-hungry CI pipeline (Android) ran for an hour before being killed by a timeout and used all of the memory of our server. This caused several downtimes of the backend. Another problem was that we students had no chance to recover the server if it was not reachable (via SSH). Therefore, at the end of the summer semester 2020 we decided to migrate our environment again, this time to the bwCloud, which should solve all the mentioned problems. As the environment was (almost completely) containerized, this transition was quite easy: install some dependencies like docker and docker-compose, configure some stuff on the host (SSH, users, etc.), copy some docker-compose files (only for the components not deployed by the pipeline) to the server, start them and rerun the deployment jobs of our pipeline.

And that, kids, is how I met your mother… ahh… environment.

Present

As mentioned above, the UniApp is hosted in the bwCloud. The bwCloud is a cloud system / IaaS for Science and Education operated by several universities in BW, one of them being Ulm University. The bwCloud is based on OpenStack and Ceph. Before I get into the details of the deployed environment(s), I will explain some stuff around the instances. All our instances are located in the project “ulm_softeng_projects” (ask the supervisor for access). For the network configuration we need two security groups. The first one (“uniapp-databases”) allows ingress traffic on the ports 3306, 3307 and 3308, which are used by the databases. The other security group is used to allow some traffic for our ops tools (like Prometheus, Grafana and Sentry.io). For creating instances in cloud systems an SSH key pair is needed; ours is stored in the bwCloud.

Now that all this stuff is out of the way, I will describe our instances. (As my teacher always said:) The attentive observer has noticed that I split the What? section into three parts: system, CI runner and operation tools. This separation was done on purpose, as it reflects our actual server environment: we run three instances in the bwCloud, each representing one of the mentioned sections. The CI runner may be the most straightforward one. As described, this instance is only responsible for running the gitlab-runner. First, some basic information on the bwCloud configuration:

  • instance name: uniapp-ci

  • description: CI/CD for project UniApp

  • image: Ubuntu 20.04 (more precisely, a snapshot based on Ubuntu 20.04)

  • volume: 64GiB as boot volume (mounted to /)

  • flavor: m1.large (4 VCPUs, 8 GB RAM)

  • network: public-ulm (default)

  • security groups: no special one

The basic installation of gitlab-runner is really simple: just follow this guide (also for updates). A guide for the configuration can be found here. The configuration file is located at /etc/gitlab-runner/config.toml. Here is our configuration:

Gitlab runner configuration
 concurrent = 3
 check_interval = 0

 [session_server]
   session_timeout = 1800

 [[runners]]
   name = "uniapp-ci"
   url = "https://gitlab.uni-ulm.de/"
   token = "<< redacted >>"
   executor = "docker"
   [runners.custom_build_dir]
   [runners.cache]
     [runners.cache.s3]
     [runners.cache.gcs]
     [runners.cache.azure]
   [runners.docker]
     tls_verify = false
     image = "alpine:latest"
     privileged = false
     disable_entrypoint_overwrite = false
     oom_kill_disable = false
     disable_cache = false
     volumes = ["/cache"]
     shm_size = 0
The last things configured for the CI runner are two cronjobs, which are used to clean up the instance. We use the gitlab-runner with a docker executor under the hood. During the execution of a job the gitlab-runner pulls the specified image. These images (especially the android-ci image) use a lot of storage! Therefore, we prune all unused docker images and volumes once a week (every Monday at 02:00). For that we use two cronjobs:

0 2 * * 1       /usr/bin/docker image prune -a -f
0 2 * * 1       /usr/bin/docker volume prune -f

Next, I will talk about the operations instance. As described, this instance is responsible for hosting several operation tools. The configuration of the bwCloud instance is pretty similar to the one described before:

  • instance name: uniapp-ops

  • description: None

  • image: Ubuntu 20.04

  • volume: 25GiB as boot volume (mounted to /)

  • flavor: m1.large (4 VCPUs, 8 GB RAM)

  • network: public-ulm (default)

  • security groups: uniapp-devops (opens up ingress tcp traffic on port 3000)

Sentry.io provides an easy way to install its on-premise version. This version offers nearly the full feature set of the paid, hosted Sentry service; the only difference is that the on-premise version supports only one organization. For the installation Sentry provides a guide here. The cloned repository can be found in /home/uniapp/sentry-onpremise. For the configuration the following files are used: sentry/config.yml, sentry/sentry.conf.py and sentry/requirements.txt (the last one because we use a third-party plugin for the GitLab auth integration). The YAML file mostly uses the default values; the only keys configured by hand are the ones for the Slack integration: slack.client-id, slack.client-secret, slack.signing-secret and slack.legacy-app. The latter is set to False. The values for the ID and the secrets can also be found in the ‘back-end’ part of the KeePass file. The Python configuration had to be adjusted for the GitLab auth integration. First we removed the lines organizations:incidents, organizations:metric-alert-builder-aggregate and organizations:advanced-search from the features list (as they caused some problems), and we added the GITLAB_APP_ID, GITLAB_APP_SECRET and GITLAB_BASE_DOMAIN variables to the file. NOTE: The first login uses the default Sentry login page with the admin user created during setup. You will then have to activate the GitLab auth integration in the Sentry settings. The GitLab user used for this will be associated with the admin@sentry.local user (the password for this user is in KeePass). Other users will log in with their uni-ulm mail. Lastly, we have to add the plugin sentry-auth-gitlab-v2 to the requirements file.

The last instance hosts the main system components. First, here is the bwCloud configuration:

  • instance name: uniapp

  • description: Backend for UniApp

  • image: Ubuntu 20.04

  • volume: 200GiB as boot volume (mounted to /)

  • flavor: m1.medium (2 VCPUs, 4 GB RAM)

  • network: public-ulm (default)

  • security groups: uniapp-databases

The most important component hosted on this instance may be the actual Laravel backend. As described above, we host multiple environments of the backend, and each environment runs in its own docker container. So let me first describe the container itself by explaining the Dockerfile. The image is based on the php:7.4-apache image, a Debian-based image with handy stuff for PHP and Apache as the web server. All the magic in the backend container happens in the APP_HOME directory (/var/www/html). The image has some additional packages installed (you can find them in the Dockerfile). It also configures some dependencies (MySQL, graphics library, PHP extensions). It then installs composer, installs the dependencies and copies our code and some additional stuff. When the container starts, some stuff gets initialized (e.g. copying the .env file or running the migrations & seeds) and finally the Apache server is started. The Apache configuration defines a virtual host for the web server. This host runs on port 80 (the TLS stuff is terminated in the reverse proxy/edge router and not handled by the backend container). The config first sets some basic stuff:

  • Sets the server name to uniapp.informatik.uni-ulm.de

  • Sets the document root to /var/www/html/public

  • Sets error log to /tmp/uniapp_error.log

  • Sets the custom (access) log to ${APACHE_LOG_DIR}/access.log with the combined format

The requests arriving at this server have a path prefix like /dev, /beta or /vX. As Laravel wants to do a lot of the routing itself (for dynamic routes like /api/news) through public/index.php, we have to reroute all calls to this PHP file. All of this was done according to the Laravel documentation plus some tricks found in the Apache docs by Leander Nachreiner. Using the AliasMatch directive we define that /var/www/html/public is responsible for all requests while stripping away the mentioned prefixes. Using some Apache-typical rewrite rules we rewrite the paths of all dynamic routes (routes not pointing to a file or directory) to index.php. All configs mentioned above (Dockerfile and Apache config) are generic, therefore we can use them in all environments without any changes.

The rest of the configuration for the backend container (and its dependencies) is done in the docker-compose files. There is one compose file that defines the commonly used configuration and another compose file for each environment. The common compose file defines one service (the Laravel service) with the image mentioned before. The image is referenced using the GitLab container registry with the project name (backend) and the git hash or git tag of the image. Furthermore, it defines some variables (e.g. for the DB), the health check and similar. Lastly, it also configures some stuff used later for the configuration of our reverse proxy (the labels and the network config). Let me go on with the compose file for prod. I will discuss it and point out some other stuff that is important for setting up the host server (like filesystem dependencies). First of all, this compose file overrides some env variables. LARAVEL_ENV_FILE is the path to the env file that will be used by Laravel (it is copied to the project root during startup). APP_URL and ASSET_URL are used by Laravel to resolve absolute paths (for downloads etc.). Next, the compose file specifies some bind mounts. These are used to make some files or directories available in the container (ATTENTION: the paths on the host must exist before starting the containers!). In particular, the following directories are defined (inherited from the GitLab CI variables):

  • feedback screenshot directory: /srv/feedback

  • map files directory: /srv/maps

  • ads images directory: /srv/ads

  • news images directory: /srv/news

  • log files directory: /srv/logs/vX

All of these directories are used by multiple containers (all running prod versions). Lastly, some stuff for the reverse proxy is again configured via labels. The prod compose file does not specify any database; the prod database is hosted independently, as the same DB must be used by multiple environments (multiple prod versions).
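To make this more tangible, here is a heavily condensed sketch of the common compose file and the prod override. The file names, the image reference, the container-side mount paths and the health check are assumptions for illustration only; the real values come from the repository and the GitLab CI variables:

Condensed compose sketch: common file plus prod override (values illustrative)
 # docker-compose.common.yml (hypothetical name) -- shared Laravel service definition
 services:
   laravel:
     image: "<registry>/backend:<git-hash-or-tag>"   # pulled from the GitLab container registry
     environment:
       DB_DATABASE: uniapp                           # placeholder variable
     healthcheck:
       test: ["CMD", "curl", "-f", "http://localhost/"]   # placeholder health check
       interval: 30s
     labels:
       - "traefik.enable=true"                       # expose the service via the reverse proxy
     networks:
       - web                                         # assumed shared network with traefik

 # docker-compose.prod.yml (hypothetical name) -- prod-specific overrides
 services:
   laravel:
     environment:
       LARAVEL_ENV_FILE: .env.prod                   # placeholder path, copied to the project root on startup
       APP_URL: https://uniapp.informatik.uni-ulm.de/vX
       ASSET_URL: https://uniapp.informatik.uni-ulm.de/vX
     volumes:                                        # host paths must exist beforehand;
       - /srv/feedback:/var/www/html/storage/feedback   # container paths are assumptions
       - /srv/maps:/var/www/html/storage/maps
       - /srv/ads:/var/www/html/storage/ads
       - /srv/news:/var/www/html/storage/news
       - /srv/logs/vX:/var/www/html/storage/logs

The dev and beta files follow the same pattern; a sketch of the beta variant follows a bit further below.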

In contrast, the dev and beta compose files define a second service: a MySQL database. This service uses the mysql:8.0.20 image. As Laravel 7 has some problems with the authentication mechanism recommended by recent MySQL versions, we have to specify another mechanism by passing --default-authentication-plugin=mysql_native_password as the command for that service. These services also specify some environment variables used by the database (passwords, default database and similar). Both environments also specify some reverse-proxy specific stuff (labels and network). There is one important difference between the configuration of the beta DB and the dev DB: the beta DB specifies a (read-only) bind mount to a dump directory (/srv/sqldump). On startup the newest SQL dump file in this directory will be used to seed the database (as this directory contains nightly SQL dumps of our prod DB (more on that later), we get a copy of the production data in our beta environment).

Let me continue with the Laravel service in the compose files for beta and dev. Both are pretty much the same: both specify the appropriate LARAVEL_ENV_FILE, DB_HOST, APP_URL and ASSET_URL. One difference is that the dev environment overwrites the MIGRATE_COMMAND specified in the Dockerfile. By default this command (php artisan migrate, as used in beta and prod) only executes new migrations (transforming the existing database structure). In the dev environment this command (php artisan migrate:fresh --seed) drops all tables, re-runs all migrations and seeds the database. Therefore we get a clean state on dev after each deployment. The dev and the beta environment mount the following directories:

  • map files directory: /srv/maps

  • log files directory: /srv/logs/dev or /srv/logs/beta

The files in these directories should be persistent and may be used by multiple environments (maps are used by all environments, as the map file transformation is a resource-heavy job).
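Under the same caveats as before (service names, container paths and credentials are placeholders), the beta compose file looks roughly like this; the dev file is almost identical, apart from the missing dump mount and the MIGRATE_COMMAND override:

Condensed compose sketch for beta (dev differs as noted, values illustrative)
 services:
   laravel:
     environment:
       LARAVEL_ENV_FILE: .env.beta                   # placeholder path
       DB_HOST: db                                   # points to the service defined below
       APP_URL: https://uniapp.informatik.uni-ulm.de/beta
       ASSET_URL: https://uniapp.informatik.uni-ulm.de/beta
       # dev only: MIGRATE_COMMAND: "php artisan migrate:fresh --seed"
     volumes:
       - /srv/maps:/var/www/html/storage/maps        # container paths are assumptions
       - /srv/logs/beta:/var/www/html/storage/logs
   db:
     image: mysql:8.0.20
     command: --default-authentication-plugin=mysql_native_password
     environment:
       MYSQL_ROOT_PASSWORD: "<redacted>"
       MYSQL_DATABASE: uniapp                        # placeholder database name
     volumes:
       # beta only: seed the database from the newest prod dump on first startup
       - /srv/sqldump:/docker-entrypoint-initdb.d:ro
     labels:
       - "traefik.enable=true"                       # exposed via the mysql-beta TCP entrypoint (see traefik section)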

As I mentioned before, our prod database is hosted in a separate container instance, as we want to use it from several other containers (multiple prod versions). Nevertheless, we still host this database in a docker container using docker-compose. Like before, the mysql:8.0.20 image is used, we must specify mysql_native_password, we must provide some environment variables (for initializing passwords, user and a default database) and we define some stuff for the reverse proxy. Similar to the beta environment, we specify a directory for SQL dumps (/srv/sqldump). The SQL dump in this directory will be loaded into the database when the container starts (needed in the rare case that our system crashes and the volumes get deleted… and backups can never be wrong). Unlike in the beta environment, the bind mount for the SQL dumps is not mounted read-only (obviously… as we want to create dumps…). For creating these dumps we use a GitLab CI schedule (as this is the easiest to control without doing SSH or similar). This CI job uses docker exec in combination with DOCKER_HOST=ssh://ourdockeruser@ourdocker.host to execute the dump command in our prod DB container.

As mentioned before, the Laravel containers are also used to host the API specification using Swagger UI. For that we host a static webpage using Laravel views. This page then loads the Swagger UI from a CDN and the specification from the server. The Swagger UI is hosted at /${env}/swagger and the specification is loaded from /${env}/openapi.yaml (which is a merged/resolved version of our multi-file specification).
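As a rough illustration of that scheduled dump job, here is a minimal .gitlab-ci.yml sketch. The job name, the container name, the credentials variable and the dump path inside the container are assumptions; the real job also has to make the SSH key for the docker user available to the runner:

Sketch of a scheduled dump job in .gitlab-ci.yml (names and paths are assumptions)
 dump-prod-db:
   rules:
     - if: '$CI_PIPELINE_SOURCE == "schedule"'       # only run when triggered by the CI schedule
   variables:
     DOCKER_HOST: ssh://ourdockeruser@ourdocker.host # talk to the docker daemon on the DB host via SSH
   script:
     # write the dump into the bind-mounted dump directory, which maps to /srv/sqldump on the host
     - docker exec uniapp-db-prod sh -c 'mysqldump -uroot -p"$MYSQL_ROOT_PASSWORD" uniapp > /docker-entrypoint-initdb.d/dump.sql'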

Now that we have covered the backend containers and the related databases, there is one last API running. This is a little interface to check whether a major version is (or should be) running/supported. Again, this is hosted using an image based on the php:7.4-apache image. Like before, we use a Dockerfile to do some configuration (PHP MySQL extension), define /var/www/html as the directory in which the magic happens and copy the relevant files. The Apache config is straightforward this time: we define the hostname, the logs, the document root (/var/www/html) and an alias (/version is an alias for /var/www/html). The docker-compose file specifies only one service this time: versionAPI. This service specifies the mentioned image, health checks, database environment variables (as this API uses our prod DB) and some stuff for the reverse proxy.
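A minimal sketch of that compose file might look like this (the image reference, variable names and values are placeholders):

Condensed compose sketch for the version API (values illustrative)
 services:
   versionAPI:
     image: "<registry>/version-api:<tag>"           # placeholder image reference
     environment:
       DB_HOST: "<prod-db-host>"                     # placeholder; the API reads from our prod DB
       DB_PASSWORD: "<redacted>"
     healthcheck:
       test: ["CMD", "curl", "-f", "http://localhost/version/"]   # placeholder health check
     labels:
       - "traefik.enable=true"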

The main documentation is hosted based on Sphinx. All the relevant files are located here. Sphinx is a toolset that compiles reStructuredText (or similar) files to static HTML files. Therefore, we can host this documentation with a simple web server. To host the Sphinx doc we use the official Apache image httpd:2.4-alpine. The Dockerfile only copies the compiled files into the image. The docker-compose file defines only the mentioned image and some stuff used by the reverse proxy.
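The corresponding compose service is similarly small. A sketch (service name and label values are illustrative; the routing itself is explained in the next paragraphs):

Condensed compose sketch for the documentation (values illustrative)
 services:
   docs:
     image: "<registry>/docs:<tag>"                  # built from httpd:2.4-alpine plus the compiled HTML
     labels:
       - "traefik.enable=true"
       - "traefik.http.routers.docs.rule=Host(`uniapp.informatik.uni-ulm.de`) && PathPrefix(`/docs`)"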

The last hosted part is the reverse proxy. It is responsible for routing the requests arriving at the server to the responsible service. For this task we use the Traefik proxy. Traefik is a simple, cloud-native and dynamic (though not greatly documented) application proxy. Cloud-native means that we are able to run Traefik as a container and that it can manage routes for other containers based on their specification (more on that later). A benefit of Traefik are additional features like TLS termination (in combination with Let’s Encrypt for providing the certificates), rate limits and other middlewares (auth and similar). We used docker-compose for the setup of Traefik and stored the compose file (and other configuration files) in our git repo (similar to the DB config). Attention: Traefik is not part of our continuous deployment. Traefik splits the configuration into two parts: the dynamic configuration (the routing configuration) and the static configuration (Traefik describes it as the startup configuration) (more on that here).

First of all I will talk about the static configuration. It is located at /etc/traefik/traefik.yml [>> repo]. The static configuration defines five entrypoints: one for each database, plus HTTP and HTTPS. The HTTP entrypoint listens on port 80 and redirects any connection to the HTTPS entrypoint. The HTTPS entrypoint listens on port 443. The DB entrypoints listen on 3306 (prod), 3307 (beta) and 3308 (dev). Next, the configuration specifies two providers for the dynamic configuration: docker (configured using docker container labels) and file (a YAML file located at /etc/traefik/config/dynamic.yml [>>repo]). As mentioned above, we use Traefik for the TLS termination; for that we define a certificatesResolver in the static configuration. Our static configuration also enables the Traefik API, the dashboard (accessible here) and the ping feature. Lastly, the static configuration enables the access log (/etc/traefik/log/access.log; we implemented a log rotation using a cronjob) and the error log (/etc/traefik/log/traefik.log).

The dynamic config file defines a router and a redirection middleware for redirecting requests to the “root” to the latest prod version (/ -> /vX). The other dynamic configurations are done using docker labels, which are defined in the compose files. Generally, all services (as they are called in docker-compose) use the label traefik.enable=true, which tells Traefik that the service should be exposed. Next, I will show the dynamic configuration of the DBs, as they are special. To enable connections to our databases we expose them using TCP (not HTTP) routers with a wildcard rule (*). Rules (in the context of Traefik) specify the hostname (and, for HTTP routers, also paths) the service should listen for. Furthermore, the DB services specify which entrypoints they are assigned to (mysql-dev, mysql-beta & mysql). The other docker-compose services are exposed using HTTP routers. The common configuration (for all services using HTTP) includes the rule, the entrypoint and the certresolver. The rule uses the host (uniapp.informatik.uni-ulm.de) and a path prefix (/$ENV and /docs, with dev, beta or the major prod version as $ENV).
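To make this more concrete, here are two condensed sketches: one of the static configuration and one of the labels for a database and an HTTP service. The HTTP/HTTPS entrypoint names, the resolver name, the router names and all values are assumptions derived from the description above, not copies of the real files:

Sketch of /etc/traefik/traefik.yml (names and values are assumptions)
 entryPoints:
   web:                          # HTTP, redirects everything to HTTPS
     address: ":80"
     http:
       redirections:
         entryPoint:
           to: websecure
           scheme: https
   websecure:                    # HTTPS
     address: ":443"
   mysql:                        # prod database
     address: ":3306"
   mysql-beta:
     address: ":3307"
   mysql-dev:
     address: ":3308"
 providers:
   docker:
     exposedByDefault: false     # only containers with traefik.enable=true are routed
   file:
     filename: /etc/traefik/config/dynamic.yml
 certificatesResolvers:
   letsencrypt:                  # resolver name is an assumption
     acme:
       email: "<admin-mail>"     # placeholder
       storage: /etc/traefik/acme.json
       httpChallenge:
         entryPoint: web         # challenge type is an assumption
 api:
   dashboard: true
 ping: {}
 accessLog:
   filePath: /etc/traefik/log/access.log
 log:
   filePath: /etc/traefik/log/traefik.log

And the label-based dynamic configuration, shown for the beta database and the beta backend:

Sketch of the label-based dynamic configuration (names and values are assumptions)
 services:
   db:                           # beta database: TCP router with a wildcard rule
     labels:
       - "traefik.enable=true"
       - "traefik.tcp.routers.db-beta.rule=HostSNI(`*`)"
       - "traefik.tcp.routers.db-beta.entrypoints=mysql-beta"
       - "traefik.tcp.services.db-beta.loadbalancer.server.port=3306"
   laravel:                      # beta backend: HTTP router with host + path-prefix rule
     labels:
       - "traefik.enable=true"
       - "traefik.http.routers.backend-beta.rule=Host(`uniapp.informatik.uni-ulm.de`) && PathPrefix(`/beta`)"
       - "traefik.http.routers.backend-beta.entrypoints=websecure"
       - "traefik.http.routers.backend-beta.tls.certresolver=letsencrypt"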

The Sphinx service (/docs) uses two middlewares: one for stripping the /docs prefix and the other to redirect requests to the root (/docs) without a trailing slash to the same location with a trailing slash. The second middleware is required because relative paths in the HTML would otherwise not respect the prefix (the relative path script.js would be resolved to example.com/script.js instead of example.com/docs/script.js). The most complicated dynamic configuration is the one of the Traefik service itself. First of all, its routers listen to the path prefixes /api and /dashboard (used by Traefik internally). Furthermore, we have to specify the exact Traefik service the router should be assigned to, as this service is the internal service api@internal and does not reflect the Traefik container itself. This router also uses two middlewares: a redirection if the trailing slash is missing (as described before) and a basic auth middleware, providing basic authentication for the dashboard and the Traefik API. We use the dynamic configuration in the compose file of Traefik for two more things: the definition of a rate-limit middleware referenced by the static configuration and the configuration of the Traefik ping feature. The ping feature uses the same common router configuration as the other HTTP routers (with the path prefix /ping) and the Traefik service ping@internal. The rate-limit middleware specifies an average of 100 and a burst of 200. The average specifies the number of requests forwarded per minute; the burst defines the buffer size for short peaks.
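Again only as a sketch (middleware and router names are made up, the htpasswd hash and regexes are illustrative, and $ signs have to be doubled so docker-compose does not try to interpolate them):

Sketch of the middlewares described above (names and values are assumptions)
 services:
   docs:
     labels:
       # strip the /docs prefix before forwarding the request to the httpd container
       - "traefik.http.middlewares.docs-strip.stripprefix.prefixes=/docs"
       # redirect /docs (without trailing slash) to /docs/ so relative paths keep the prefix
       - "traefik.http.middlewares.docs-slash.redirectregex.regex=^(https?://[^/]+/docs)$$"
       - "traefik.http.middlewares.docs-slash.redirectregex.replacement=$${1}/"
       - "traefik.http.routers.docs.middlewares=docs-slash,docs-strip"
   traefik:
     labels:
       # dashboard & API: routed to the internal service and protected with basic auth
       - "traefik.http.routers.traefik.rule=Host(`uniapp.informatik.uni-ulm.de`) && (PathPrefix(`/api`) || PathPrefix(`/dashboard`))"
       - "traefik.http.routers.traefik.service=api@internal"
       - "traefik.http.middlewares.dashboard-auth.basicauth.users=admin:<htpasswd-hash>"
       - "traefik.http.routers.traefik.middlewares=dashboard-auth" # plus a trailing-slash redirect analogous to docs-slash
       # ping endpoint
       - "traefik.http.routers.ping.rule=Host(`uniapp.informatik.uni-ulm.de`) && PathPrefix(`/ping`)"
       - "traefik.http.routers.ping.service=ping@internal"
       # rate-limit middleware referenced by the static configuration
       - "traefik.http.middlewares.rate-limit.ratelimit.average=100"
       - "traefik.http.middlewares.rate-limit.ratelimit.burst=200"
       # traefik's default period is one second; a period of 1m would match the "per minute" description (assumption)
       - "traefik.http.middlewares.rate-limit.ratelimit.period=1m"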

Now that every component of our server infrastructure has been described, I want to point to the monitoring chapter, which describes our monitoring tools; they may be helpful for debugging the described infrastructure (and the backend).