
2023-03-20: Polar Signals Let's Profile, learning eBPF and AI/ML cont., Gitpod signing commits, Docker Hub changes, Observability in practice, OpenTelemetry in use, otel-desktop-viewer

Thanks for reading the web version, you can subscribe to the Ops In Dev newsletter here to receive it in your mail inbox.

👋 Hey, lovely to see you again

Many things happened in the past month - this issue is written from my bookmarks, finally taking time to reflect on the amazing learning opportunities with eBPF and AI. I have to admit - currently, the technology movement feels too fast, with too much information, and no breaks in between. Stephanie Stimac shared a good read with bullet points about mental health. I constantly remind friends, and myself, to take time off as much as possible. Recharge your batteries because you cannot work 24/7. Ignore private messages and email requests that flood your inbox - and decide what has the most impact.

Flat things got sorted, new furniture assembled. I also reflected on my third year all-remote. My DM backlog is huge - sorry if you are waiting for responses. The easiest way to catch me for a chat is at events - QCon London next week, KubeCon EU in Amsterdam in April, Cloud Expo Frankfurt in May, and CloudLand near Cologne, Germany, in June (spoiler: 2 talks accepted, one about eBPF). In between, LEGO Rivendell needs to be built :-)

Before we dive in, here's my personal highlight this month: Polar Signals started a great new live-learning session "Let's Profile!". The third session focussed on profiling Kubernetes and discussed Go code optimizations. Although I did not understand everything, following the thought process and explanations of possible code changes is invaluable for learning. I'm so looking forward to the next episode!

Polar Signals - Let's Profile episode 3

🌱 The Inner Dev learning ...

Everyone can contribute. To ensure that, we need to learn in public together, and share thoughts, ideas, and research strategies. The following sections collect my eBPF and AI/ML learnings this month, great stuff :-)

🐝 The Inner Dev learning eBPF

Let's start with a self-paced learning overview of eBPF enabling cloud-native innovation and eBPF - Everything You Need to Know in 5 Minutes.

Testing eBPF programs, like any other code project, can be challenging. The article about One year of testing eBPF programs is on point. It starts with a manual debugging story about a DNS query resolution problem, UDP message buffers, and code execution race conditions. Pulsar can be used to monitor Linux devices at runtime, using eBPF. Without tests, too much time is wasted on production debugging. The newly created test suite executes each test case by loading the eBPF program into the kernel, spawning tasks to read all events, triggering the events, collecting the received events, and stopping the eBPF program. The final report pretty-prints the test results. The second part of the blog series will look into the build environment for testing - looking forward to learning how they solve the CI/CD verifier problem.
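To make the test flow more tangible, here is a minimal Python sketch of the same steps (load, read, trigger, collect, stop). It is not the code from the article: `FakeProbe`, `run_test_case` and `open_file_trigger` are hypothetical names, and the probe only simulates an eBPF program with an in-memory queue, so nothing is loaded into the kernel.

```python
import queue
import threading


class FakeProbe:
    """Hypothetical stand-in for an eBPF program (no kernel involved)."""

    def __init__(self):
        self.events = queue.Queue()
        self.loaded = False

    def load(self):
        self.loaded = True

    def emit(self, event):
        # Only a loaded probe reports events, similar to an attached hook.
        if self.loaded:
            self.events.put(event)

    def unload(self):
        self.loaded = False
        self.events.put(None)  # sentinel that tells the reader to stop


def run_test_case(trigger, expected):
    probe = FakeProbe()
    probe.load()                      # 1. load the program
    received = []

    def reader():                     # 2. task that reads all events
        while True:
            event = probe.events.get()
            if event is None:
                break
            received.append(event)

    task = threading.Thread(target=reader)
    task.start()
    trigger(probe)                    # 3. trigger the events
    probe.unload()                    # 4. stop the program
    task.join()                       # 5. wait until everything is collected
    return received == expected, received


def open_file_trigger(probe):
    probe.emit({"type": "file_opened", "path": "/tmp/demo"})


ok, events = run_test_case(
    open_file_trigger, [{"type": "file_opened", "path": "/tmp/demo"}]
)
print("PASS" if ok else "FAIL", events)
```

A real suite would replace `FakeProbe` with the loader of the eBPF library in use, and the trigger with an actual syscall or network request.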

"Turn YAML into eBPF bytecode" sounds too good to be true (thanks for sharing, Bill Mulligan). The Cilium Tetragon documentation now shows how to take advantage of the TracingPolicy custom resource definition (CRD) in Kubernetes. You can trace arbitrary events in the kernel, and optionally define actions to take on a match. It supports two types of events: kprobes and tracepoints. The Tetragon maintainers consider this CRD low-level because it requires kernel knowledge; a more user-friendly RuntimeSecurityPolicy is being considered for the future.
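For a rough idea of what such a policy looks like, the following Python sketch builds a TracingPolicy manifest as a dict and prints it as JSON, which `kubectl apply -f` accepts just like YAML. The field names follow my reading of the examples in the Tetragon documentation (a kprobe on `fd_install` with an argument filter and a match action); treat them as assumptions and verify against the Tetragon version you run. The helper `tracing_policy`, the policy name and the path prefix are made up for illustration.

```python
import json


def tracing_policy(name, kernel_function, path_prefix):
    """Build a TracingPolicy manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "cilium.io/v1alpha1",
        "kind": "TracingPolicy",
        "metadata": {"name": name},
        "spec": {
            "kprobes": [
                {
                    # Kernel function to hook; not a syscall in this example.
                    "call": kernel_function,
                    "syscall": False,
                    "args": [
                        {"index": 0, "type": "int"},
                        {"index": 1, "type": "file"},
                    ],
                    "selectors": [
                        {
                            # Only match files below the given path prefix.
                            "matchArgs": [
                                {
                                    "index": 1,
                                    "operator": "Prefix",
                                    "values": [path_prefix],
                                }
                            ],
                            # Optional action on a match (assumed name).
                            "matchActions": [{"action": "Sigkill"}],
                        }
                    ],
                }
            ]
        },
    }


policy = tracing_policy("fd-install-demo", "fd_install", "/tmp/demo")
print(json.dumps(policy, indent=2))
```

Leaving out `matchActions` gives a policy that only observes, which is the safer starting point when experimenting.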

eBPF on Windows, running Linux-based eBPF programs? Microsoft wrote about their findings with the eBPF for Windows project. The project already provides two getting started guides: a basic eBPF tutorial and a tutorial on debugging eBPF verification failures, something to bookmark for when loading an eBPF program throws cryptic errors.

Innovation with eBPF: Keyval uses eBPF for automated instrumentation in Odigos, and shared a new way to make instrumentation as easy and accessible as possible, without modifying containers. The blog post shows how to use Kubernetes Device Plugins instead of ConfigMaps, Secrets and InitContainers with shared volumes. The benefits are shorter YAML configuration, support for security contexts, and scheduler friendliness. Odigos v0.1.42 added support for virtual device instrumentation.
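The "shorter YAML" benefit is easier to see with an example. In Kubernetes, a container asks for a device advertised by a device plugin through an extended resource in `resources.limits`. The sketch below builds such a Pod manifest in Python; the resource name `instrumentation.example.com/go`, the image and the helper are hypothetical and not taken from Odigos.

```python
import json

# Hypothetical extended resource name; a real device plugin advertises its own.
VIRTUAL_DEVICE = "instrumentation.example.com/go"


def instrumented_container(name, image):
    """Container spec that requests one virtual instrumentation device."""
    return {
        "name": name,
        "image": image,
        "resources": {"limits": {VIRTUAL_DEVICE: 1}},
    }


pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "demo"},
    "spec": {
        "containers": [
            instrumented_container("app", "registry.example.com/app:1.0")
        ]
    },
}
print(json.dumps(pod, indent=2))
```

The single `limits` entry replaces the extra volumes, init containers and mounts, and the scheduler only places the Pod on nodes that advertise the device.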

Platform innovation where eBPF can run: "After three+ years at AMD and Xilinx, our compiler that turns plain eBPF XDP C code into a high-performance, fully-custom hardware packet processing pipeline (for FPGA, in HLS C++) via LLVM IR and a bouquet of passes is now open-source", wrote Stephan Diestelhorst on LinkedIn when sharing the proof-of-concept. "Nanotube is a collection of compiler passes, libraries, and an API to facilitate execution of EBPF XDP and similar networking code on an FPGA in a SmartNIC." In my studies 20 years ago (Hardware/Software Systems Engineering), we learned programming VHDL on a field programmable gate array (FPGA) using Xilinx Spartan III. Super excited to come back to my development learnings with new technology using eBPF :-)

🤖 The Inner Dev learning AI/ML

Learning to code with the help of AI? The creators of a quest-like learning platform that teaches you backend development added Boots, an artificially intelligent bear that explains code (and comes with a personality, too). I did not know the platform yet. Tried the demo with Python, very addictive to continue and verify my knowledge (I stopped to finish writing this newsletter ;)).

AWS announced a partnership with Hugging Face, which is developing a ChatGPT equivalent. AWS cloud customers get access to Hugging Face products, including a language generation tool, as a building block for their own applications. The AI tools will be available through Amazon SageMaker. A demo of SageMaker is available in the #EveryoneCanContribute cafe meetup from 2021.

This illustration of ChatGPT3 defines key terms (Generative, Pre-trained, Transformer, ChatGPT), shows how the technology progressed over time, and explains how it was trained. A good cheatsheet to get started. Legitify added support for OpenAI ChatGPT to find common misconfigurations in GitLab/GitHub setups (SaaS, self-managed), in v0.2.3. The implementation PR shows how GPT3 is added into the Golang source.

The folks from GroupThink performed an interesting experiment: They asked their AI to bookmark the GitLab Developer Evangelism handbook, which contains too much data to summarize the way a ChatGPT question would. Instead, the author asked the AI to share the team KPIs, and let the AI analyze and suggest KPIs for the next quarter. Additionally, they asked for good KPI frameworks for DevRel. (KPIs are definitely the hardest part about KPIs imho).

Raycast is also investing in AI - the shortcut commands to generate Git repository changelogs look very promising. Looking forward to using it in production!

πŸ›‘οΈ The Sec in Ops in Dev

Securing application code and its memory usage is challenging, with differences across the languages used. How to secure memory-safe vs. manually managed languages dives deep into memory safe languages which provide garbage collection (Python, Go, Java) and unsafe memory languages (C, C++), and which security scanners to use to detect potential vulnerabilities.

Gitpod added support for signing commits with 1Password using SSH keys. The technical background is 1Password using SSH agent forwarding, and so far it only works in a Gitpod supported desktop IDE. This implements security best practices with commit verification support in GitLab/GitHub, to help prevent supply chain attacks.

Why you should offload your PostgreSQL analytical workloads to ClickHouse uses YouTube video archive data for analytics in both backends, and benchmarks the results. Very interesting read, following my learnings from this blog post. ClickHouse Inc. is also massively investing in their SaaS offering, and shared how they built ClickHouse Cloud From Scratch in a Year. Architecture design, reliability challenges, benchmarks, pricing model, UI and product analytics - everything to learn from.

Flowchart design is hard: How much detail is helpful, does it need refactoring later, and which tools can help? The Mermaid Charts team shared a practical article evaluating all the pros and cons of complex flowcharts.

⛅ Cloud Native

Felipe Cruz shared their learning story to run your own WebAssembly Cloud in Hetzner Cloud, following a tutorial guide by David Flanagan on Rawkode Academy. The cloud provisioning involves the Hashistack (Terraform, Nomad, Consul, Vault), Traefik and Fermyon on Hetzner Cloud (PR). Fermyon Spin is a framework for building and running event-driven microservice applications with WebAssembly. The spin CLI provides tutorial-like application quickstarts, which can then be built and deployed using Fermyon, using a cloud provider setup. The spin applications are packaged as OCI images signed with cosign - trying it out yourself can help understand two technologies and their use cases: WebAssembly deployments and Sigstore cosign.

Kunal Kushwaha shared an excellent tutorial for Kubernetes beginners on YouTube. Saiyam Pathak teased the next wave of the KubeSimplify learning series, while sharing the existing workshops on YouTube including Kubernetes 101, GitOps with ArgoCD, Container & Security, Troubleshooting, Observability, and much more.

πŸ‘οΈ Observability

Lili Cosic wrote a fantastic piece about Observability in practice. It walks you through the definitions and where to get started (kube-prometheus in Kubernetes environments), clears up the myth of perfect observability, and guides you in defining your service level objectives. Alert fatigue, overrated dashboards, and logging should die ... runbooks everywhere, instrument everything, put profiling on your priority list, and tracing on your backlog. Lili also compiles a list of interesting observability projects.

Michael Hausenblas announced their "OpenTelemetry in use" monthly podcast focussing on OpenTelemetry usage, where practitioners can learn from end-users, contributors and vendors. Dotan Horovits discussed "FinOps Observability: Monitoring Kubernetes Cost with OpenCost" with Matt Ray. A 2023 theme is efficiency, and reducing operational cost is part of this strategy. The recording for my talk "From Monitoring to Observability: eBPF Chaos" at Config Management Camp is on YouTube. Dive into my eBPF learning story, learn about production usage, and how eBPF inspires reliability and DevSecOps ideas.

"You know, I'd consider working at Netflix again just to use the internal time series metrics system (Atlas) and the metric dashboard system (Lumen)..." sparked my interest in this Twitter thread. Atlas was first announced as a telemetry platform in 2014, at a time when Prometheus was two years old (10 years interview). It is open-source, with in-memory storage for fast real-time analytics, and provides code instrumentation - ideas that inspired a common specification with OpenTelemetry. Lumen, announced in 2018, on the other hand is a self-service dashboarding platform for Netflix. Its design principles with data sources, visualizations, mappers and variables follow Grafana, and it might get inspired by Perses providing dashboards-as-code - who knows, Lumen is not open source yet. The challenges to build observability at scale never change, and a look-back in time can be a good inspiration :-)

📚 Tools and tips for your daily use

  • Broot, a new way to see and navigate directory trees (and much faster than tree).
  • otel-desktop-viewer is a CLI tool for receiving OpenTelemetry traces while working on your local machine that helps you visualize and explore your trace data without needing to send it on to a telemetry vendor. What a time saver!
  • otel-cli is an OpenTelemetry command-line tool for sending events from shell scripts & similar environments. Can be useful in CI/CD environments for example.
  • neovim, as an alternative to the vanilla vim editor. It supports Lua plugins next to vimscript, and provides faster and more accurate syntax highlighting.
  • Visual Studio 2022 17.5 comes with an integrated HTTP client, allowing you to send and debug HTTP requests from the UI.
  • Chrome 112 brings a new headless mode which uses native functionality (and is not a separate browser implementation anymore). Headless browsers can be used for end-to-end tests in CI/CD for example.
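To try a local trace viewer without installing an SDK, the sketch below builds a single-span payload in the OTLP/HTTP JSON encoding using only the Python standard library. The endpoint `http://localhost:4318/v1/traces` is the default OTLP/HTTP port and path; that otel-desktop-viewer listens there is my assumption, so check its README. The function names, service name and span name are made up for illustration.

```python
import json
import secrets
import time
import urllib.request


def build_trace_payload(service_name, span_name, duration_ms):
    """Build a one-span payload in the OTLP/HTTP JSON encoding."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    span = {
        "traceId": secrets.token_hex(16),  # 16 random bytes, hex encoded
        "spanId": secrets.token_hex(8),    # 8 random bytes, hex encoded
        "name": span_name,
        "kind": 1,                         # SPAN_KIND_INTERNAL
        # 64-bit integers are encoded as strings in OTLP JSON.
        "startTimeUnixNano": str(start_ns),
        "endTimeUnixNano": str(end_ns),
    }
    resource = {
        "attributes": [
            {"key": "service.name", "value": {"stringValue": service_name}}
        ]
    }
    scope_spans = [{"scope": {"name": "manual-sketch"}, "spans": [span]}]
    return {"resourceSpans": [{"resource": resource, "scopeSpans": scope_spans}]}


def send(payload, endpoint="http://localhost:4318/v1/traces"):
    """POST the payload to an OTLP/HTTP receiver (endpoint is an assumption)."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


payload = build_trace_payload("newsletter-demo", "build-step", 250)
print(json.dumps(payload, indent=2))
```

With a viewer running locally, calling `send(payload)` should make the span show up; in shell scripts and CI/CD jobs, otel-cli covers the same use case without custom code.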

🔖 Book'mark

🎯 Release speed-run

Flux CD v0.40.0 brings observability improvements for image resources, flux stats -A now prints a report of custom resources and the amount of cumulative storage, and much more.

docker-compose v1 in Docker Desktop will be removed in June 2023. The Homebrew formula on macOS already provides docker-compose v2, Linux distributions using the Docker repositories need to install compose as a CLI plugin. Docker also sent an email announcing that the free team organization tier on Docker Hub is deprecated. It was not clear whether container images would be deleted when maintainers do not upgrade to a paid tier, or apply for the open-source program. A later blog post including a FAQ, and Twitter discussions shed more light - team organization maintainers can migrate to a personal account, open-source program, or paid tier to continue updating their images. My guess is that we will see certain container images becoming outdated quite fast, implying security issues. To help everyone, I've quickly written an article on the GitLab blog which includes analysis of CI/CD images, Kubernetes manifests, and custom built container sources. It also shows ways to use the dependency proxy and the container registry to mitigate the problems, and observability hints to verify that CI/CD pipelines and Kubernetes environments stay healthy.
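A quick way to get a first overview of affected images is to scan CI/CD configuration and Kubernetes manifests for `image:` lines that point to a Docker Hub namespace. The Python sketch below is not the analysis from the GitLab blog article, just one possible approach with hypothetical helper names. It only handles the single-line `image: name` form, and it cannot tell a team organization from a personal account, so the result is a list of candidates to check manually.

```python
import re

# Matches "image: name" and "- image: name", with optional quotes.
IMAGE_LINE = re.compile(
    r"^[ \t]*-?[ \t]*image:[ \t]*[\"']?([^\s\"'#]+)", re.MULTILINE
)


def docker_hub_namespace(image):
    """Return the Docker Hub namespace of an image, or None if not applicable."""
    parts = image.split("@")[0].split("/")
    first = parts[0]
    # A first component with a dot or port, or "localhost", is a registry host.
    if len(parts) > 1 and ("." in first or ":" in first or first == "localhost"):
        if first not in ("docker.io", "index.docker.io"):
            return None  # hosted on another registry
        parts = parts[1:]
    # Single-component names and "library/..." are official images.
    if len(parts) < 2 or parts[0] == "library":
        return None
    return parts[0]


def find_hub_namespace_images(text):
    """List images in the text that live in a Docker Hub user/org namespace."""
    images = IMAGE_LINE.findall(text)
    return sorted({image for image in images if docker_hub_namespace(image)})


sample = """
build:
  image: python:3.11
test:
  image: acme/ci-tools:latest
containers:
  - image: registry.gitlab.com/group/app:1.0
  - image: "docker.io/acme/web:2"
"""
print(find_hub_namespace_images(sample))
```

The `acme` images and the sample configuration are made up. For real projects, feed the function the contents of `.gitlab-ci.yml`, Dockerfile-independent manifests, or the output of `kubectl get pods -A -o yaml`.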

OpenSearch 2.6 introduces a simple schema for Observability, to standardize accessing metrics, traces and unstructured data such as logs. The schema conforms to the OpenTelemetry standards. Zabbix 6.4 brings just-in-time user provisioning, supporting SAML/LDAP, adds support for instant propagation of config changes in distributed monitoring environments, zero-downtime upgrades, SNMP discovery/bulk data improvements, and real-time streaming of metrics and events over HTTP to external consumers. Impressive!

Hurl 2.0.0 adds support for GraphQL API queries, allows processing data with filters, and much more. I wrote about Hurl for continuous testing in this blog post.

And more: PyTorch 2.0, TypeScript 5.0

🎥 Events and CFPs

👋 CFPs due soon

Looking for more CfPs?

🎀 Shoutouts

"I Made the Same Game in 8 Engines" is challenging, entertaining and fun to learn from. (in my second life, I'll be a game designer/developer)


Thanks for reading! If you are viewing the website archive, make sure to subscribe to stay in the loop!

See you next month - let me know what you think on LinkedIn, Twitter, Mastodon.



PS: If you want to share items for the next newsletter, please check out the contributing guide - tag me in the comments, send me a DM or submit this form. Thanks!