2022-08-16: Observability Reports, Metrics and OpenTelemetry, Securing Kubernetes, Git Auto Setup Remote, Learning eBPF and WebAssembly, viddy
Thanks for reading the web version. You can subscribe to the Ops In Dev newsletter to receive it in your inbox.
👋 Hey, lovely to see you again
It's summer, and hopefully, y'all are taking some time off for vacation. After seeing the chaos with flights in Valencia in early May, I decided to stay local in my area, maybe expanding towards Austria and the mountains. A spontaneous Google Maps tour later, I found a lovely spot near Bad Aussee for hiking adventures. Highly recommended for recharging your batteries in nature, surrounded by mountains, rivers, and lakes.
What did I miss? Lots of great updates on Observability, securing Kubernetes, and handy tools to make our lives easier :-)
☕ Hot Topics
- Observability with OpenTelemetry is a great learning series by Thomas Stringer in 6 parts, covering Introduction, Instrumentation, Exporting, Collector, Propagation, Ecosystem.
- How we improved on-call life by reducing pager noise, avoiding alert fatigue.
- Remember the `git push` errors that suggest `git push --set-upstream origin branchname` to set up the remote branch tracking? Using the HEAD pointer and short command options, this can be shortened to `git push -u origin HEAD`. Still, why not automatically use the local branch name and set up the tracking? Git 2.37.0 to the rescue: `git config --global --add --bool push.autoSetupRemote true`
🎯 Release speed-run
The first release of Cilium Service Mesh is available following the release of Cilium 1.12. "eBPF-Native When Possible" - besides the option to remove sidecars, Cilium Service Mesh can perform a variety of service mesh features directly in eBPF to reduce the overhead even further. Prometheus 2.37 is the first LTS release, bringing a longer support lifecycle of at least 6 months and HashiCorp Nomad Service Discovery as a new feature. Litmus Chaos 2.11.0 and 2.12.0 bring more HTTP chaos experiments for Kubernetes clusters: status code, modify body/header, and reset peer. Tracee 0.8.x improved the documentation to help with builds and quickstarts. Gitpod announced SSH support for workspaces.
🛡️ The Sec in Ops in Dev
Struggling to understand SLSA (Supply chain Levels for Software Artifacts) and its purpose? The CloudSecDocs on SLSA provides a great introduction and in-depth explanation of supply chain threats, SLSA levels, and more. Related, Microsoft open-sourced its SBOM generation tool.
Datadog released Threatest, a framework for end-to-end testing of threat detection rules. It supports creating so-called detonation attacks that expect to trigger an alert in an external platform such as Datadog. Another project to keep an eye out for: bomtool, an early proof of concept that uses libpkgconf to generate Software Bills of Materials (SBOMs).
Switching roles to Ops and architecture in Dev: Redis explained provides a thorough deep dive and touches on performance challenges with forking and copy-on-write, clusters and replication, data storage types, and much more. I liked the illustrations, which help verify what you learned after each section.
⛅ Cloud Native
The new Pod Security Admission feature in Kubernetes competes with existing integrations such as Kyverno. This article dives into the provided features, evaluates the pros (easy to set up, integrated, version pinning) and cons (only pods, no enforcement of pod controllers, no pipeline support) and compares them to Kyverno pros (easy cluster-wide policy, audit results, CLI) and cons (add-on, blocked resources aren't in audits). A good summary for planning future security measures in your Kubernetes clusters.
What does the ImagePullBackOff status mean in Kubernetes? A pod could not start on the assigned node because pulling the container image failed, and Kubernetes will retry with an increasing delay (back-off). Great blog post that explains the error and the root causes to debug.
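To make the "increasing delay" more tangible, here is a small sketch of a capped exponential back-off schedule. The initial delay of 10 seconds, the doubling factor, and the 300 second cap are assumptions chosen for illustration and are not taken from the linked post; the script only prints a schedule and does not talk to a cluster.

```shell
#!/bin/sh
set -eu

delay=10        # assumed initial delay in seconds (illustrative)
max_delay=300   # assumed upper limit in seconds (illustrative)
attempt=1
schedule=""

# Double the delay after each failed pull until the cap is reached.
while [ "$attempt" -le 7 ]; do
    echo "retry $attempt: wait ${delay}s before the next image pull"
    schedule="$schedule$delay "
    delay=$((delay * 2))
    if [ "$delay" -gt "$max_delay" ]; then
        delay=$max_delay
    fi
    attempt=$((attempt + 1))
done
```

The takeaway for debugging: the longer a pod sits in ImagePullBackOff, the longer it takes until a fixed image name or registry credential is picked up by the next retry.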
The Kubernetes Network Model is a thorough deep dive into different topologies: local, pod-to-pod, multi-pod service abstraction, ingress, and egress communication, mixed with DNS, IPv6, and network policy problems. A local lab setup supports the learning steps. Speaking of DNS, did you know that DNS responses can be larger than 512 bytes? It's true; we left the space of small UDP packets with EDNS, TCP, and DNSSEC.
Learn about the first four threat vectors in Kubernetes: initial access, execution, persistence, and privilege escalation in MITRE ATT&CK Matrix for Kubernetes: Tactics & Techniques Part 1. The Trivy Operator now exports security metrics for Prometheus; the demo video provides more hints and shows the integration with the Prometheus Operator and kube-prometheus' ServiceMonitor CRD.
Splunk’s report focuses on Mean-time-to-resolve, AIOps, Observability driven development, and unknown unknowns, with leaders finding interesting new job offers and areas. There is still work to do to onboard teams and to close the talent gap: finding good people is hard. Observability platforms that integrate all observability data types help with unified views and correlate data. We will see better reliability from deployments, which means faster innovation, confidence with SLOs, and code to production workflows with CI/CD.
VMware Tanzu’s report dives into observability gaining momentum, with folks seeing the benefit, and into Ops going hybrid and multi-cloud, which increases complexity. “The pace of development drives the need for Observability” is a key statement. There are existing visibility challenges with limited team access and a lack of metrics in cloud environments. Disparate monitoring tools are identified as the root cause: separate app and infrastructure monitoring doesn't work. Often there is no consensus on how to rationalize the toolset.
TL;DR - the point where everyone sees value in Observability has arrived. Onboarding and learning observability practices are the next steps, with OpenTelemetry being identified as a key emerging technology by Gartner. Open Source for better Observability by Dotan Horovits provides a great introduction to and overview of the why and how of Observability. I enjoyed watching the recording.
Looking to dive into eBPF and need more learning resources? This Twitter thread provides great resources such as Aya (Rust for eBPF), which was also explained at eBPF Day at KubeCon EU 2022 and shown with Parca for profiling. Related, Polar Signals published a nice getting started tutorial for Parca, diving into the benefits of continuous profiling.
Timescale published an awesome article comparing Prometheus and OpenTelemetry metrics. Prometheus metrics are a good starting point because they are a subset of OpenTelemetry metrics. Mixing both standards can make it harder to get the same visibility into Observability data - we again have more things to evaluate and learn.
🔍 The inner Dev
We've started learning WebAssembly with AssemblyScript in the #EveryoneCanContribute cafe meetup, and since then I have kept following more folks to learn about ideas and innovative thoughts. One of them is to understand how WebAssembly can help as a Universal Binary Format. You might have seen that Vercel ran PHP compiled as WebAssembly on serverless edge functions.
C++23 is in feature-freeze mode - `std::generator` providing usable coroutines in the standard library caught my attention. More changes are explained in this blog post. Following the C++ announcements, Carbon as the experimental successor of C++ was announced at CppNorth. It promises to be easy to learn and extend, with safer fundamentals and a memory-safe subset. I suggest following the project repository and inspecting the language features.
I haven't touched recursion in Rust yet (and to be honest, I avoid this design pattern unless the performance gain and code quality can be justified) - the article Elegant and performant recursion in Rust provides a gentle introduction and borrows Haskell recursion schemes to dive deep into optimization steps.
📈 Your next project could be ...
- Running Renovate on GitLab.com to automate dependency updates.
- Instrumenting your CI scripts with OpenTelemetry Shell.
📚 Tools and tips for your daily use
- viddy is a modern watch command that supports viewing output in a pager, time machine mode (rewind, go to the past or future), and more.
- tproxy is a CLI tool to proxy and analyze TCP connections, for example gRPC connections, MySQL connection pools, etc.
- Laurel: Linux Audit - Usable, Robust, Easy Logging, is an event post-processing plugin for auditd(8) to improve its usability in modern security monitoring setups.
- lensm is a tool for viewing assembly and source in Go.
- Docusaurus is a project for building, deploying, and maintaining open source project websites easily (playground).
- Git push options to create merge requests in GitLab, shared in this tweet and on Hacker News.
- Deploy to AWS Amplify from GitLab CI/CD Self Managed.
- SRE book recommendations with pictures in replies on Twitter.
- Awesome OpenTelemetry, a curated collection of tools, APIs, and SDKs.
- Awesome Embedded Rust, a curated list of resources for embedded and low-level development in the Rust programming language.
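The Git push options for GitLab merge requests mentioned in the list above can be wrapped into a Git alias. The following is a hedged sketch: the alias name `mr`, the remote name `origin`, and the target branch `main` are assumptions, and the script only defines the alias inside a throwaway repository. Actually creating a merge request requires pushing to a GitLab remote.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the alias does not touch your global Git config.
sandbox="$(mktemp -d)"
git init --quiet "$sandbox/work"
cd "$sandbox/work"

# Hypothetical alias "mr": push the current branch, set up tracking, and ask
# GitLab to create a merge request against "main" (assumed default branch).
git config alias.mr \
    'push -u origin HEAD -o merge_request.create -o merge_request.target=main'

# Show what "git mr" expands to.
alias_value="$(git config --get alias.mr)"
echo "git mr -> git $alias_value"

# Clean up the sandbox.
cd / && rm -rf "$sandbox"
```

To use it for real, define the alias with `--global` and run `git mr` on a feature branch in a repository whose `origin` points to GitLab.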
🎥 Events and CfPs
- Sep 5-7: Container Days EU in Hamburg, Germany. Join me there!
- Sep 13-16: OS Summit EU in Dublin, Ireland
- Sep 28-29: eBPF Summit, virtual
- Oct 12-13: Kubernetes Community Days Munich in Munich, Germany
- Oct 24-25: KubeCon NA co-located events
- Oct 24-28: KubeCon NA in Detroit, Michigan. Join me there!
- Nov 10-11: All Day DevOps, virtual
- Nov 16-17: Continuous Lifecycle / Container Conf in Mannheim, Germany. Join me there!
👋 CfPs due soon
- Apr 17-21: KubeCon EU 2023 in Amsterdam, CfP opens Aug 29.
Looking for more CfPs? Try CFP Land.
Shoutout to Aleksandr for a laugh on "When the project is not ready, but the client wants a demo."
Thanks for reading! If you are viewing the website archive, make sure to subscribe to stay in the loop!
PS: If you want to share items for the next newsletter, please check out the contributing guide - tag me in tweet replies or send me a DM. Thanks!