Just relax, take it easy..


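I recently watched the film 《首尔之春》 (12.12: The Day), which planted the idea of a trip to South Korea. The May Day holiday came at just the right time, so I set off on a six-day, five-night adventure.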

Departing from Shanghai, a two-hour flight brought me to Seoul's Incheon International Airport; for the last three days, I took the KTX to Busan.

When switching from Android to iOS, I was unable to find a lightweight yet handy habit-tracking app, so I decided to build one myself :)

The project's name comes from a game called "Against the Storm" (on which I have spent over 100 hours; highly recommended). My favorite species in that city builder is the beaver, and I hope this web app works like a beaver to save the precious moments of your fleeting life.

Recently, we discovered some unexpected autoscaling EC2_INSTANCE_TERMINATE events in our Scala system: "instance was taken out of service in response to an ELB system health check failure".

After checking the error logs, we found the issue was caused by "Too many open files", which led to failed DNS resolution, which in turn produced errors when accessing the AWS endpoint and finally caused the server to hang.
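A "Too many open files" error means a process has hit its file descriptor limit, and every new socket (including those used for DNS lookups) then fails. As a rough illustration only, the sketch below compares a process's open descriptors with its soft limit. It is written in Python for the current process rather than for the Scala service in question; the helper names `fd_usage` and `is_near_limit` and the 80% threshold are assumptions made for this example, and the descriptor count relies on Linux's `/proc` filesystem.

```python
import os
import resource


def fd_usage(fd_dir: str = "/proc/self/fd"):
    """Return (open_fds, soft_limit) for the current process.

    open_fds is None when the /proc filesystem is unavailable (non-Linux).
    """
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        open_fds = len(os.listdir(fd_dir))
    except OSError:
        open_fds = None
    return open_fds, soft_limit


def is_near_limit(open_fds, soft_limit, threshold: float = 0.8) -> bool:
    """True when descriptor usage reaches `threshold` of the soft limit."""
    if open_fds is None or soft_limit == resource.RLIM_INFINITY:
        return False  # usage unknown or limit unlimited: nothing to compare
    return open_fds >= threshold * soft_limit


if __name__ == "__main__":
    used, limit = fd_usage()
    print(f"open fds: {used}, soft limit: {limit}")
    if is_near_limit(used, limit):
        print("warning: close to the file descriptor limit")
```

Alerting on this ratio before it reaches 100% gives a chance to find the leak while DNS resolution still works.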

A system will never go down without "changes", e.g. a code release, a traffic overload, or an external dependency outage.

Of all the kinds of changes above, human changes are responsible for over 80% of incidents, as humans are not machines and make mistakes all the time :)

So when planning a change request for production operation (without a perfect and automated pipeline), how can we leverage strategies to minimize the risk and impact on our customers?

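Home network quality, like water, electricity, and gas, is vital to everyday happiness. On a whim, I wrote a Prometheus exporter for the HTTP proxy that acts as my home network hub, and set up an observability dashboard for it.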

I took the opportunity to learn how to write a custom exporter, as well as how the rate/irate functions in PromQL are implemented.
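To make the difference between the two functions concrete, here is a simplified Python sketch, not Prometheus's actual implementation. It assumes the samples are `(timestamp_seconds, counter_value)` pairs already selected by the range window, and the names `simple_rate` and `simple_irate` are made up for illustration. Both treat a drop in the counter as a reset. The real `rate` additionally extrapolates toward the window boundaries, which this sketch omits.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Average per-second increase over all samples in the window."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A lower value means the counter restarted from zero,
        # so the whole current value counts as increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


def simple_irate(samples: List[Sample]) -> float:
    """Per-second increase between the last two samples only."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    delta = v_last - v_prev if v_last >= v_prev else v_last
    return delta / (t_last - t_prev)


if __name__ == "__main__":
    window = [(0.0, 0.0), (15.0, 30.0), (30.0, 60.0), (45.0, 150.0)]
    print(simple_rate(window))   # 150 / 45, about 3.33 per second
    print(simple_irate(window))  # 90 / 15 = 6.0 per second
```

With a burst in the final scrape interval, the windowed average smooths it out while the instant variant reports only the latest slope, which is why `irate` suits fast-moving graphs and `rate` suits alerting.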


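By a stroke of luck, I had the good fortune to join Autodesk. After three months of first-hand experience at a foreign company, I want to share some personal impressions. I hope they offer a little reference and help, whether on SRE or on career choices :)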

Friends who know me may be aware that I recently happened to change jobs (going from "大牙" at an internet company to "Henry" at a foreign company :))

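My new company's week-long Christmas break (Recharge Days) happens to have arrived, so let's take this chance to look back on 2023, a year as calm as the sea's waves yet full of undercurrents.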