2023-06-08: GitOps energy efficiency, DevPod development environments, Learning eBPF & generative AI, Bridgekeeper Kubernetes policies, OpenObserve, Grafana Pyroscope, Coroot distributed tracing

Thanks for reading the web version. You can subscribe to the Ops In Dev newsletter to receive it in your inbox.

👋 Hey, lovely to see you again

May brought fresh ideas for AI efficiency in DevSecOps workflows, learning progress on eBPF (and many walls I ran into), insights into remote development environments, and great initiatives touching energy efficiency and reducing CO2 emissions. There is more focus on learning technology and how to adopt it, and so many ideas. Great to see so many tutorials and free learning resources for AI and eBPF. Thanks, everyone!

There are also great Observability stories with OpenObserve, Grafana Pyroscope, and a cloud-native highlight with Bridgekeeper for Kubernetes policies in Python, next to OPA and Kyverno. There are more release highlights with tracee, trivy, and Coroot adding distributed tracing support with visual heatmaps.

Tip: If you plan to join KubeCon NA in Chicago in November, be quick with CFP submissions - the due date is June 18th. Kubernetes Community Days UK and Australia will also close their CFPs soon. I will be at Cloudland in Phantasialand, Germany on June 19-23, and later in September, at Container Days in Hamburg.

🌱 The Inner Dev learning ...

🐝 The Inner Dev learning eBPF

I wrote about my eBPF learning story in a new InfoQ.com article: "Learning eBPF for better Observability". It tackles the getting started experience, which tools to try, and which programming languages and demos to look at, through to finding use cases and developing and testing eBPF in CI/CD workflows. My learning story continues: a second article will dive into tools for debugging in production. I'm also working on a learning eBPF workshop, and will share more ideas and insights in future talks (excited for CloudLand and Container Days!).

Learning eBPF - need help?

Chilling in a hammock, surrounded by nature, after resetting my busy brain, I have finally read Liz Rice's great book "Learning eBPF" and gained many more insights and ways to understand eBPF. The book follows a great learning curve, and combined with my experience as an Ops-focused developer, I could dive into many of the C examples to verify them and see something working. I now better understand the challenges with Compile-Once-Run-Everywhere (CO-RE) and how BTF is used to handle changes between kernel versions - just wow! (Explaining that here would be too much. Get the book, it is worth the money. You can skip complex topics if you want to dive directly into the concepts with Cilium and Tetragon at the end.) I need time to write a thorough book review; for now, know that the "Learning eBPF" book ranks as high as "Hacking Kubernetes" in my recommendations for fantastic learning resources.

"There is a tooling gap to be filled between something like cosign or notary validating that an OCI image can be trusted and the kernel being able to say that an executable file from that image can be trusted to run eBPF programs." -- Liz Rice on Twitter. A very interesting idea indeed, eCHO Episode 93: BPF Signing. Watch this space.

More learning resources have been added to the eBPF topic on o11y.love, here are a few more recommendations:

🤖 The Inner Dev learning AI/ML

If you are wondering what Google Bard does differently to ChatGPT, this Twitter thread sheds some light: Internet access for real-time information, plugins for everyone, code suggestions, images in prompt results, image search, export data into Google docs or send via Gmail, website/article summary, multiple draft support, voice prompting, SEO companion.

Google published a learning path series for generative AI. The courses include an introduction to generative AI, Large Language Models (LLMs), Responsible AI, Generative AI Fundamentals, Image Generation, Encoder-Decoder Architecture, Attention Mechanism, Transformer Models/BERT Model, Image Captioning Models, Generative AI Studio, and Generative AI Explorer - Vertex AI. Thanks Emilio Salvador for sharing.

Elastic announced their Elasticsearch Relevance Engine™, claiming it will be advanced search for the AI revolution (LinkedIn post). Philipp Krenn also shared talk slides for a deep dive into Elasticsearch: Vector and Hybrid Search. Very interesting to see where AI, LLMs, and search engines are heading.

ProjectDiscovery.io released aix, a CLI tool to interact with OpenAI GPT-3.5 and GPT-4 (LinkedIn post).

slogpt.ai can generate a Service Level Objective based on a monitoring graph screenshot. It invites you to ask follow-up questions about the suggested SLO in the prompt, and it can even sing an SLO song.

You can read more about my AI learning stories soon on the GitLab blog; meanwhile, I recommend following the AI/ML category and diving into code suggestion enhancements, AI-assisted merge request review summaries, and many more efficiency insights. Cannot wait to apply them to my workflows :-) (This newsletter is written in Markdown using the GitLab Web IDE, which will soon have code suggestions, btw.)

🛡️ The Sec in Ops in Dev

Loft introduced DevPod, which empowers users to control where they run their development environment. It does not host or manage the dev environments; instead, it lets you define the environment, which can then run on cloud infrastructure or on localhost with Docker or Kubernetes. DevPod describes dev environments using the open DevContainer standard.

It is great to see that "bring your own infrastructure" for development environments resonates well with users. Last month, I looked into provisioning Kubernetes clusters to integrate with the cloud-based development environments beta in GitLab (DevRel use cases group, blog post soon). Platform engineers can take advantage of securing the workspaces, controlling where the data is stored, and overall providing developers with a better development experience.

Learning from production incidents is great. I highly recommend subscribing to the Netflix Technology blog for great deep dives; this month's is about: Debugging a FUSE deadlock in the Linux kernel.

👁️ Observability

Gergely Orosz shared an interesting observation regarding SaaS vs self-managed Observability:

“Our {popular observability vendor} bill is out of control so we decided to move this in-house, using Prometheus+Grafana.” I am hearing this a lot, these days. And teams are following through. “The UX is not as good, but we expect to save $$” is feedback from teams that did it.

Some responses highlight what I see too: You do not want to push raw Observability data to SaaS vendors, but rather filter and process it in an ingestion pipeline before sending it anywhere. The OpenTelemetry Collector supports transforming telemetry before sending it on to local or SaaS services. Also, a choice has to be made whether the data is only useful for the duration of an incident or needs long-term storage, which can be very expensive.
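The filtering idea can be sketched in a few lines of plain Python. This is not the OpenTelemetry Collector's configuration or API; the record shape (`severity`, `body`, `attributes`), the function name `prefilter`, and the severity threshold are hypothetical and only illustrate dropping noisy records and trimming attributes before anything is shipped to a paid backend.

```python
# Hypothetical severity ranking; real pipelines define their own levels.
SEVERITY_ORDER = {"DEBUG": 0, "INFO": 1, "WARN": 2, "ERROR": 3}


def prefilter(records, min_severity="WARN", drop_attributes=("user.email",)):
    """Yield only records at or above min_severity, without the listed attributes."""
    threshold = SEVERITY_ORDER[min_severity]
    for record in records:
        # Missing or unknown severities are treated as DEBUG.
        level = SEVERITY_ORDER.get(record.get("severity", "DEBUG"), 0)
        if level < threshold:
            continue  # drop low-value records before they leave the host
        attributes = {
            key: value
            for key, value in record.get("attributes", {}).items()
            if key not in drop_attributes
        }
        # Return a copy so the original record is left untouched.
        yield {**record, "attributes": attributes}


logs = [
    {"severity": "DEBUG", "body": "cache hit", "attributes": {}},
    {
        "severity": "ERROR",
        "body": "payment failed",
        "attributes": {"user.email": "a@example.com", "region": "eu"},
    },
]
kept = list(prefilter(logs))
```

In this sketch only the error record survives, and its `user.email` attribute is removed. In a real setup, the same decisions would live in the collector or pipeline configuration rather than in application code.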

OpenObserve (LinkedIn post) launched this week as a new open source alternative to Elastic/Splunk/Datadog. While its primary focus seems to be log management, additional features support ingesting other Observability data (metrics, traces). OpenObserve stores data in a columnar format to support logs, metrics, and traces. The query language is based on SQL. Real-time and scheduled alerts with different targets can also be sent to Prometheus AlertManager. PromQL support for metrics is in development. OpenTelemetry support covers instrumenting code to send traces, and metrics can be sent to OpenObserve via Prometheus remote write. The announcement blog post shares more insights on the reasons and the future roadmap. You can try the free tier SaaS, or install OpenObserve in your infrastructure.

Pyroscope was acquired by Grafana earlier this year, and now the first milestone has been announced: Continuous Profiling is in public preview in Grafana Cloud. In the background, engineering work was done to merge Grafana's own project Phlare with Pyroscope, and to keep features like horizontal scaling. The Grafana UI was enhanced to visualize and analyze profiling data.

More Observability learning resources this month:

🌤️ Cloud Native

Bridgekeeper is a new Kubernetes policy enforcement framework using Python as its configuration language. It aims to be easier than Rego (Open Policy Agent). Bridgekeeper provides a custom resource definition called Policy, which defines the rules with inline Python scripts. An example policy shows how to check for container images using the latest tag, using Python dictionaries and string manipulation methods in 7 lines of code. I recommend bookmarking and trying Bridgekeeper! Thanks to Christian Heckelmann for sharing.
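To make the idea concrete, here is a minimal sketch of such a check in plain Python. The function names `uses_latest_tag` and `validate`, the shape of the `request` dictionary (modeled on a Kubernetes admission request carrying a Pod), and the return convention are assumptions for illustration, not Bridgekeeper's documented API; the project's own example policy is the authoritative version.

```python
from typing import Optional, Tuple


def uses_latest_tag(image: str) -> bool:
    """Return True if the image reference has no tag or the 'latest' tag."""
    # An image pinned by digest (name@sha256:...) is not treated as 'latest'.
    if "@" in image:
        return False
    # Inspect only the last path segment, so a registry port
    # (registry.example.com:5000/app) is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return True  # no tag given, which defaults to 'latest'
    return last_segment.rsplit(":", 1)[-1] == "latest"


def validate(request: dict) -> Tuple[bool, Optional[str]]:
    """Hypothetical policy entry point: reject Pods with 'latest' images."""
    spec = request.get("object", {}).get("spec", {})
    for container in spec.get("containers", []):
        image = container.get("image", "")
        if uses_latest_tag(image):
            name = container.get("name")
            return False, f"Container '{name}' uses the latest tag: {image}"
    return True, None
```

Calling `validate` with a Pod whose only container uses `nginx:1.25` returns `(True, None)`, while an untagged image such as `localhost:5000/nginx` is rejected with a message naming the container.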

The talk "Evaluating the Energy Footprint of GitOps Architectures: A Benchmark Analysis" compares energy consumption and CO2 emissions of different GitOps architectures with a series of experiments. The research uses Kepler, a lightweight, low-level power consumption metrics exporter, and provides insights into decision making for energy efficiency in the cloud-native ecosystem.

More cloud native learning resources:

📚 Tools and tips for your daily use

🔖 Book'mark

🎯 Release speed-run

Kepler (Kubernetes Efficient Power Level Exporter) released 0.5, and was onboarded as a CNCF sandbox project. Prometheus 2.44.0 adds health and readiness checks to the promtool CLI, and improvements for remote-write/read. OpenSearch 2.7.0 and 2.8.0 bring GA for searchable snapshots and segment replication, multiple data sources support for OpenSearch Dashboards, and much more. Coroot 0.17.0 supports distributed tracing through OpenTelemetry. The Coroot agent can auto-instrument protocols including HTTP, PostgreSQL, MySQL, Redis, etc. to collect more tracing insights. The traces are stored in ClickHouse. The UI provides great visualization features, such as heatmaps that make it easier to pinpoint specific traces such as slow requests or errors.

tracee v0.15.0 allows configuring tracee using Kubernetes ConfigMaps, adds new policy actions (webhook, forward), introduces experimental event data sources, and supports capturing read operations. trivy v0.42.0 allows converting JSON reports into different formats, shows digests for OS packages, and adds support for scanning Terraform Plan files. Kyverno v1.10 increases scalability with service decomposition, and extensibility via external service call support. It also adds support for CNCF Notary to help with software supply chain security. Chaos Mesh v2.6.0 adds more stability, and by default runs the Chaos DNS server for simulating DNS faults.

Rust 1.70.0 adds stable support for OnceCell, OnceLock, IsTerminal and now enforces stability in the test CLI. GitLab 16.0 brings code suggestions and remote development environments in beta, GPU enabled SaaS runners on Linux, comment templates, fork sync on the GitLab UI, error tracking GA, reusable CI/CD components, and a new menu navigation as opt-in (default in 16.1 on June 22).

🎥 Events and CFPs

👋 CFPs due soon

Looking for more CfPs?

🎤 Shoutouts

🌐

Thanks for reading! If you are viewing the website archive, make sure to subscribe to stay in the loop!

See you next month - let me know what you think on LinkedIn, Twitter, Mastodon.

Cheers,

Michael

PS: If you want to share items for the next newsletter, please check out the contributing guide - tag me in the comments, send me a DM or submit this form. Thanks!