3 posts tagged with "docker"

· 6 min read
TheBidouilleur

Since the DevOps movement started (or rather platform engineering), high availability has been pushed to the forefront. One of the most versatile ways to achieve it is to build application clusters, and therefore to use containers.

I ran a Swarm cluster for a few years and recently switched to Kubernetes (k3s, to be precise). With clusters holding several hundred containers, it is easy to forget about maintenance and updates.

In this article, we will talk about updates.

Out-of-cluster container upgrade solutions

Watchtower

I think the best-known solution is Watchtower.

Watchtower is easy to use and, like many others, relies on labels. A label lets you set parameters and enable (or disable) update monitoring for a container.

Updating is not always good...

Be careful not to automatically update sensitive programs! We cannot check what an update contains or whether it will break something. It is up to you to choose which applications to monitor and whether to trigger an update.
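As a minimal sketch of the label-based opt-in, assuming Watchtower is started with its `--label-enable` flag and that containers are selected through the `com.centurylinklabs.watchtower.enable` label. The function names, the container name and the image passed to `start_monitored` are hypothetical placeholders, not part of any tool:

```shell
# Label that Watchtower reads to include ("true") or exclude ("false") a container.
watchtower_label() {
  printf 'com.centurylinklabs.watchtower.enable=%s' "$1"
}

# Start a container that opts in to automatic updates.
# $1: container name, $2: image (both placeholders chosen by the caller).
start_monitored() {
  docker run -d --name "$1" --label "$(watchtower_label true)" "$2"
}

# Start Watchtower in opt-in mode: with --label-enable it only updates
# containers that carry the label above.
start_watchtower() {
  docker run -d --name watchtower \
    -v /var/run/docker.sock:/var/run/docker.sock \
    containrrr/watchtower --label-enable
}
```

Nothing runs until a function is called, for example `start_watchtower` followed by `start_monitored blog nginx`. Sensitive applications simply stay unlabeled and are left alone.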

Watchtower can notify you in several ways:

  • email
  • slack
  • msteams
  • gotify
  • shoutrrr

These methods are not all proprietary: you are free to self-host Gotify, go through Shoutrrr, or use your own SMTP server, so that this information does not leave your information system. (I am very critical of using msteams, Slack or Discord to receive notifications.)

Watchtower scans for updates at a regular, configurable interval.
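A hedged sketch of what the interval and a self-hosted notification target could look like, assuming Watchtower's `WATCHTOWER_POLL_INTERVAL`, `WATCHTOWER_NOTIFICATIONS` and `WATCHTOWER_NOTIFICATION_URL` environment variables and a Shoutrrr URL of the form `gotify://host/token`. The Gotify host, the token and the function names are placeholders:

```shell
# Build a Shoutrrr URL for a self-hosted Gotify server.
# $1: Gotify host (placeholder), $2: application token (placeholder).
gotify_url() {
  printf 'gotify://%s/%s' "$1" "$2"
}

# Start Watchtower with a polling interval and Gotify notifications.
# $1: interval in seconds, $2: Gotify host, $3: Gotify token.
start_watchtower_notifying() {
  docker run -d --name watchtower \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -e WATCHTOWER_POLL_INTERVAL="$1" \
    -e WATCHTOWER_NOTIFICATIONS=shoutrrr \
    -e WATCHTOWER_NOTIFICATION_URL="$(gotify_url "$2" "$3")" \
    containrrr/watchtower
}
```

Calling `start_watchtower_notifying 3600 gotify.example.lan TOKEN` would check once an hour and keep the notifications on your own server.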

container-updater (from @PAPAMICA)

The most feature-rich or complex solution is not always the best. Papamica wrote a bash script to meet his specific needs (which many other people probably share): an update system that notifies him through Discord and Zabbix.

This one is also based on labels and also handles the case where you want to update through docker-compose (instead of pulling the image and recreating the container, as Watchtower does).

labels:
  - "autoupdate=true"
  - "autoupdate.docker-compose=/link/to/docker-compose.yml"

Although I don't use it today, there was a time when I ran Zabbix and needed update notifications to arrive there (Zabbix then notified me by mail or Gotify).

Papamica states that he plans to add private registry support (for now only the GitHub registry and Docker Hub are supported) as well as other notification methods.

Solutions for Swarm

Swarm is probably the container orchestrator I enjoyed the most: it's **simple**! You learn fast, you discover fast and you get quick results. But I've already written about Swarm in another article...

Shepherd

What I like in Papamica's program (and this also applies to Shepherd) is that bash remains the central language: a language most of us know, mainly thanks to Linux, and that we can read and modify if we take the time.

Shepherd's code is only ~200 lines and works fine like that.

version: "3"
services:
  ...
  shepherd:
    build: .
    image: mazzolino/shepherd
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      placement:
        constraints:
          - node.role == manager

Shepherd accepts several private registries, which gives it a nice advantage over the other solutions presented. Example:

    deploy:
      labels:
        - shepherd.enable=true
        - shepherd.auth.config=blog

Shepherd does not include a notification system by default. That's why its creator offers an Apprise sidecar as an alternative, which can forward to many targets such as Telegram, SMS, Gotify, mail, Slack, msteams, etc.

I think this is the simplest and most versatile solution, and I hope it will be adopted in other contexts (I won't go into too much detail here; I'd like to write a separate article about it).

I used Shepherd for a long time and I had no problems.

Solutions for Kubernetes

With Kubernetes, we start to lose simplicity, even though the imagePullPolicy: Always option means that restarting a pod is enough to pull the latest image for the same tag. For a long time, I used ArgoCD to update my configurations and redeploy my images on every Git update.

But ArgoCD is only meant to update the configuration, not the image. Using it this way is the wrong approach, so a dedicated tool is needed.
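For reference, the manual route mentioned above (restart the pods so that imagePullPolicy: Always pulls the tag again) can be sketched as follows. This assumes `kubectl rollout restart` is available on your cluster; the namespace `default`, the deployment name `blog` and the helper name are placeholders:

```shell
# Print the command that restarts all pods of a deployment, so that
# imagePullPolicy: Always pulls the current image for the tag again.
# $1: namespace, $2: deployment name (both placeholders).
rollout_restart_cmd() {
  printf 'kubectl rollout restart deployment/%s --namespace %s\n' "$2" "$1"
}

# Review the command first, then run it by piping it to sh:
#   rollout_restart_cmd default blog | sh
rollout_restart_cmd default blog
```

The helper only prints the command, which leaves the decision to actually restart in your hands. That is precisely the manual step a dedicated tool takes over.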

Keel.sh

Keel is a tool that meets the same need: updating pod images. But it includes several features not found elsewhere.

If you want to keep the same behavior as the alternatives (i.e. regularly checking for updates), that is possible:

metadata:
  annotations:
    keel.sh/policy: force
    keel.sh/trigger: poll
    keel.sh/pollSchedule: "@every 3m"

But where Keel excels is that it offers triggers and approvals.

A trigger is an event that makes Keel start an update. For example, a webhook coming from GitHub, Docker Hub or Gitea can trigger the update. (This avoids polling on a schedule and saves resources, traffic and time.) Since webhooks are now widespread in CI/CD systems, this fits many use cases.

Approvals are the little gem that was missing from the other tools. As I said earlier, updating images is risky and you should not include sensitive applications in automatic updates. Keel developed approvals precisely in response to that.

The idea is to require explicit permission before Keel updates the pod. We can choose the moment and check manually.

I think it's a pity that Slack or MSTeams is imposed for approvals, so this is a feature I won't use.

A UI

For now, I use Keel without its web interface. It may bring new features, but I would like to avoid yet another interface to manage.

Conclusion

Updating a container is not that easy when you want both automation and security. Although Keel meets my needs today, I have the impression that these tools are all similar and offer no real innovation. (I'm thinking of tackling the canary idea one day.) I hope to discover new solutions soon that fit my needs even better.

· 8 min read
TheBidouilleur

[ This article comes from my old blog; it will also be available in the "Documentation" section of the site ]

Introduction

Since I started working in IT (a little less than ten years ago), I never cared about how I looked at my logs. A little view here, a big grep there... but no advanced management.

I built my monitoring on Zabbix and Grafana, which show me the metrics of each virtual machine individually. Convenient as that is, I have almost no visibility into the state of my applications! So I decided to look into Graylog and Elasticsearch, which offer a fairly reliable stack that is easy to set up. Then, seeing the resources required, I postponed this need to "later", and postponed "later" to next year... and so on!

2 years later…

Today (December 2021), a major 0-day vulnerability affecting Log4j has been disclosed, and this is no "small" flaw: it's a big fat RCE, just the way we like them!

I'm not affected by Log4j: it isn't used in Jenkins, and I have no other Java-based application exposed to the internet. But I would have liked to know whether my server was scanned by the same IPs found on the ban lists. That event is what made me decide to look into "How do you centralize and visualize your logs?".

Choosing the stack

A stack is a group of software that together fulfils a function. A classic example is the "G.I.T." stack (not to be confused with the versioning tool!):

  • Grafana
  • InfluxDB
  • Telegraf

This stack lets you visualize metrics from different machines: InfluxDB is the database storing the data, Telegraf is the agent that lets machines send their metrics, and Grafana is the web service used to visualize them.

As mentioned in the introduction, I use Zabbix to monitor and collect metrics, and I paired it with Grafana to display them with plenty of customization.

For log centralization (and visualization), the following stack is often mentioned:

ELK:

  • ElasticSearch
  • Logstash
  • Kibana

But this stack is not suited to every environment: it is effective, but very heavy.

In my quest for a log centralization stack, I would like to reuse services I already have.
And here comes the trendy miracle of 2021: the GLP stack (Grafana, Loki, Promtail).

GLP stack

What I particularly like about this stack is that it is light. Much lighter than ELK which, although very effective, demands a lot.

The same goes for Graylog2 + Elasticsearch (a very good alternative), which almost needs a low-cost bare-metal server of its own.

Grafana and Loki, on the other hand, need only 2 GB to run efficiently and without constraints. (That is an upper bound: at my scale, I will use much less than 2 GB.)

Installing our stack

I assume everyone knows how to install Grafana; it is often the service people start self-hosting with (then again, Grafana's graphs are super sexy!).

But if you haven't installed Grafana yet (in that case, leave the room and come back later), here is a link that will let you do it fairly quickly.

For simplicity, I won't use Docker in this installation.

Loki

I installed Loki in an LXC container by following the guide on the official site here. I use systemd to launch the executable, and I created a configuration file beforehand with the bare minimum (available on Grafana's GitHub):

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

I didn't bother enabling authentication, since I'm on a LAN with only my own virtual machines. I don't consider my Loki a sensitive part of my infrastructure.

After only 2-3 minutes of configuration, our Loki is already available!
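A quick way to confirm that Loki is up, sketched under the assumption that it listens on port 3100 as in the configuration above and exposes its `/ready` HTTP endpoint. The function names are hypothetical helpers:

```shell
# Build a URL on the Loki HTTP API.
# $1: host, $2: port (3100 is http_listen_port in the config above), $3: path.
loki_url() {
  printf 'http://%s:%s%s' "$1" "$2" "$3"
}

# Ask Loki whether it is ready; -f makes curl fail on HTTP errors.
# $1: host, $2: port.
loki_is_ready() {
  curl -fsS "$(loki_url "$1" "$2" /ready)"
}
```

Running `loki_is_ready localhost 3100` on the Loki machine should succeed once the service has finished starting.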

We can now add it as a data source in Grafana:

(I use localhost because the machine running Grafana also hosts Loki.)

Grafana may complain a little because our Loki database is empty.

Promtail

Promtail is the agent that will send our logs to Loki. I wrote a fairly simple Ansible role that installs the agent on many machines and watches the logs coming from Docker, /var/log and syslog.

Here is my Jinja2 template for the configuration:

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
{% if loki_url is defined %}
  - url: {{ loki_url }}
{% endif %}

scrape_configs:

- job_name: authlog
  static_configs:
  - targets:
      - localhost
    labels:
{% if ansible_hostname is defined %}
      host: {{ ansible_hostname }}
{% endif %}
      job: authlog
      __path__: /var/log/auth.log

- job_name: syslog
  static_configs:
  - targets:
      - localhost
    labels:
{% if ansible_hostname is defined %}
      host: {{ ansible_hostname }}
{% endif %}
      job: syslog
      __path__: /var/log/syslog

- job_name: Containers
  static_configs:
  - targets:
      - localhost
    labels:
{% if ansible_hostname is defined %}
      host: {{ ansible_hostname }}
{% endif %}
      job: containerslogs
      __path__: /var/lib/docker/containers/*/*-json.log

- job_name: DaemonLog
  static_configs:
  - targets:
      - localhost
    labels:
{% if ansible_hostname is defined %}
      host: {{ ansible_hostname }}
{% endif %}
      job: daemon
      __path__: /var/log/daemon.log

If you are not comfortable with Jinja2 templates, you will find a "pure" version of the config here.

You can of course adapt this template to your needs. My main idea is to have a "base" that I can put on every machine (knowing that if no log is available, as with Docker, Promtail will not raise an error when it cannot find the files).

Once Promtail is configured, we can start it, either with the executable directly:

/opt/promtail/promtail -config.file /opt/promtail/promtail-local-config.yaml

or via systemd (automatic if you use my playbook):

systemctl start promtail

Once this agent is installed more or less everywhere, let's go straight to Grafana and have some fun!

Querying Loki from Grafana

We are going to do something fairly counter-intuitive: we won't start by building a dashboard, we will first test our queries! Don't scroll away, I swear this is the most fun part!

In Grafana, there is an "Explore" tab. It gives us access to Loki by writing queries. These are fairly simple, especially with the point-and-click tool you get by expanding the Log Browser.

Sorry, I got a little ahead of you...

With the template I gave you, you will have 4 jobs:

  • daemon
  • authlog
  • syslog
  • containerslogs

These jobs let us sort the logs; let's test that together. We will select the machine "Ansible", then ask for the job "authlog". I start by clicking on Ansible, then Authlog. Grafana then asks whether I want to choose a specific file. If no file (filename) is specified, Grafana takes all the files (so it doesn't matter if we only have one file).

(You will notice later that, from our first selection onward, Grafana hides the jobs, hosts and files that don't match the beginning of our query.)

Once we validate the query (the "Show logs" button):

We get the result of the Loki query for the time range configured in Grafana (1 hour for me). My authlog isn't very interesting, and my syslog is polluted by many not-so-relevant messages.

So let's start filtering our logs!

By clicking the little "?" above our query, we get a cheat sheet summarizing Loki's basic functions. We learn how to match lines containing an exact string with |=, how to exclude lines with !=, and how to use a regular expression with |~.
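To make these three operators concrete, here is a small sketch that builds a LogQL query from the `job` and `host` labels set in the Promtail template and, optionally, sends it to Loki's `/loki/api/v1/query_range` HTTP endpoint. The helper names, the host value `ansible` and the search strings are illustrative assumptions:

```shell
# Build a LogQL query: a label selector followed by a line filter.
# $1: job label, $2: host label, $3: operator (|=, != or |~), $4: text or regex.
logql() {
  printf '{job="%s",host="%s"} %s "%s"' "$1" "$2" "$3" "$4"
}

# Send the query to Loki's HTTP API (range query over the default time window).
# $1: Loki base URL, $2: LogQL query.
query_loki() {
  curl -fsS -G "$1/loki/api/v1/query_range" --data-urlencode "query=$2"
}
```

For example, `logql authlog ansible '|=' 'Failed password'` yields a query that can be pasted into Explore, and `query_loki http://localhost:3100 "$(logql syslog ansible '!=' 'CRON')"` would run a filtered query from the command line.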

I'm also sharing a slightly more complete cheat sheet that I found on a blog: here

That way, we can directly get slightly more colorful logs that let us focus on what matters!

(The idea is to pick out the interesting logs, with the colors that go with them.)

Conclusion

Just because we often hear about the ELK suite doesn't mean we must use it at all costs! Loki is a good alternative offering basic features that will be enough for most people.

· 5 min read
TheBidouilleur

[ This article is from my old blog; it will also be available in the "Documentation" section of the site ]

Docker Swarm

Introduction

Containerization has brought many things to system administration and has refreshed the concept of DevOps. But one of the main things that containers (and especially Docker) bring us is automation.

Docker already covers service deployment well, but we can go a little further by automating container management. For that, Docker Inc. offers a tool for automatic orchestration: Docker Swarm.

What is Docker Swarm?

As previously stated, Docker Swarm is an orchestration tool. With it, we can manage our containers automatically, with rules favoring the high availability and scalability of our services. We can therefore imagine two scenarios that are entirely compatible:

  • Your site has a peak load and requires several containers: Docker Swarm manages replication and load balancing
  • A machine hosting your containers goes down: Docker Swarm reschedules your containers on other machines.

So we'll see how to configure that, and take a little look at the state of play of the features on offer.

Create Swarm Cluster

For testing, I will use PWD (Play With Docker) to avoid setting this up on my own infra :)

So I have 4 machines running Alpine on which I will start a Swarm cluster.

The first step is to define a manager, which will be the head of the cluster as well as the access point to the different machines. In our case, we will keep it simple: the manager will be Node1.

To start the Swarm on the manager, simply use the 'docker swarm init' command. But if your system has more than one network interface (quite common on a server), you must specify the address to advertise. In my case, the IP of the LAN interface (where the VMs communicate) is 192.168.0.8, so the command I'm going to run is:

docker swarm init --advertise-addr 192.168.0.8

Docker says:

Swarm initialized: current node (cdbgbq3q4jp1e6espusj48qm3) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-5od5zuquln0kgkxpjybvcd45pctp4cp0l12srhdqe178ly8s2m-046hmuczuim8oddmk08gjd1fp 192.168.0.8:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

In summary: the cluster has started correctly, and Docker gives us the exact command to join it from the other machines! Since Node1 is the manager, I just need to run the docker swarm join command on Node2 to Node4.

docker swarm join --token SWMTKN-1-5od5zuquln0kgkxpjybvcd45pctp4cp0l12srhdqe178ly8s2m-046hmuczuim8oddmk08gjd1fp 192.168.0.8:2377

Once completed, you can view the result on the manager with the command 'docker node ls'
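If the join command has scrolled away, it can be rebuilt. The sketch below assumes `docker swarm join-token -q worker`, which prints only the token when run on the manager; the helper name and the example token are placeholders:

```shell
# Assemble the join command from a token and the manager address.
# $1: join token, $2: manager IP; 2377 is the port shown in the output above.
swarm_join_cmd() {
  printf 'docker swarm join --token %s %s:2377\n' "$1" "$2"
}

# On the manager, the real token can be read with:
#   swarm_join_cmd "$(docker swarm join-token -q worker)" 192.168.0.8
swarm_join_cmd SWMTKN-example 192.168.0.8
```

The printed line is what you would then run on each worker node.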

Deploy a simple service

If you are a docker run user and you refuse docker-compose, you should know one thing: I don't like you. But since you are nice to me, here is a piece of information that won't help: the equivalent of 'docker run' in Swarm is 'docker service create'. But we're not going to get into that in this article.

Instead, we will use the docker-compose equivalent, which is docker stack. So first of all, here's the .yml file:

version: "3"
services:
  viz:
    image: dockersamples/visualizer
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
    ports:
      - "8080:8080"
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role == manager

Before starting it, you will probably notice the deploy section, which lets you give instructions to Swarm. There we can add constraints to deploy the service on the manager(s), limit resource usage, or manage replicas for load balancing.
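The same kind of instruction can also be applied to a service that is already running. As a sketch, assuming the `--limit-cpu` and `--limit-memory` flags of `docker service update`; the service name `my-stack_viz`, the limit values and the helper name are placeholders:

```shell
# Print the command that caps CPU and memory for an existing service.
# $1: service name, $2: CPU limit (number of CPUs), $3: memory limit.
service_limits_cmd() {
  printf 'docker service update --limit-cpu %s --limit-memory %s %s\n' "$2" "$3" "$1"
}

# Review the output, then pipe it to sh on a manager node to apply it.
service_limits_cmd my-stack_viz 0.5 128M
```

Declaring the limits in the deploy section of the compose file keeps them versioned with the rest of the stack, so the command line is mostly useful for quick experiments.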

This first container provides a simple dashboard showing where the containers are placed, so we don't need the CLI just for that.

We will deploy this compose with the following command:

docker stack deploy --compose-file docker-compose.yml swarm-visualize

Once the command is complete, you simply open the manager's web server at port 8080.

So we now have a web panel to follow where the containers are running.

Simplified management of replicas

When you access a service, you go through the manager. But nothing prevents the manager from redirecting you to node 3 or node 4. Swarm can therefore balance the load in a way similar to HAProxy, i.e. by sending users to a different container each time a page is loaded.

Here is a docker-compose automatically creating replicas:

version: '3.3'
services:
  hello-world:
    container_name: web-test  # ignored by docker stack deploy in swarm mode
    ports:
      - '80:8000'
    image: crccheck/hello-world
    deploy:
      replicas: 4

And the result is surprising:

We can also adjust the number of replicas. By decreasing it:

docker service scale hello-world_hello-world=2

Or by increasing it:

docker service scale hello-world_hello-world=20
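The name `hello-world_hello-world` in these commands follows the pattern Swarm uses for services deployed with docker stack: the stack name, an underscore, then the service name from the compose file. A minimal sketch, with hypothetical helper names and assuming the stack was deployed under the name `hello-world`:

```shell
# Name Swarm gives to a service deployed with docker stack: <stack>_<service>.
# $1: stack name, $2: service name from the compose file.
stack_service_name() {
  printf '%s_%s' "$1" "$2"
}

# Print the scale command for a given stack, service and replica count.
scale_cmd() {
  printf 'docker service scale %s=%s\n' "$(stack_service_name "$1" "$2")" "$3"
}

scale_cmd hello-world hello-world 2
```

With the visualizer deployed earlier, the same helper would give `swarm-visualize_viz` as the service name.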

What about High Availability?

I focused this article on the features of Swarm and how to use them. If I did not address high availability first, it is because every container created in this post is already managed in HA! For example, I will forcibly stop the 10th replica of the "Hello world" container, which is on Node1. It is immediately restarted.

Okay, but Docker could already restart containers automatically in case of a problem, so how is Swarm different?

To answer that, I'm going to stop node4.

Notice that the stopped containers are automatically redistributed across the other nodes, without any intervention. And since we only access services through the managers, they only redirect to containers that are running. One of the servers can therefore catch fire: the service will still be redundant, balanced and accessible.
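A gentler way to reproduce this experiment than stopping a machine is to drain the node, which asks Swarm to move its tasks elsewhere. This sketch assumes the `--availability` option of `docker node update` (values `active`, `pause`, `drain`); the helper name is hypothetical:

```shell
# Print the command that changes how Swarm schedules tasks on a node.
# $1: node name, $2: availability (active, pause or drain).
node_availability_cmd() {
  case "$2" in
    active|pause|drain)
      printf 'docker node update --availability %s %s\n' "$2" "$1" ;;
    *)
      echo "unknown availability: $2" >&2
      return 1 ;;
  esac
}

# Empty node4, then bring it back with: node_availability_cmd node4 active
node_availability_cmd node4 drain
```

Run the printed command on a manager, watch the containers move in the visualizer, then set the node back to `active`.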

Conclusion

Docker Swarm is a gateway to application clusters, which are incredibly complex without a suitable tool. Swarm makes it easy to meet specific needs without deep technical expertise. In a production environment, it is advisable to switch to Kubernetes or Nomad, which are much more complete and powerful alternatives.

I encourage you to try this kind of technology that will govern our world of tomorrow!

Thanks for reading