
I eat words

@saint@group.lt

Posts
198
Comments
63
Joined
4 yr. ago

Linuxoid

Matrix - @saint:group.lt

  • heh, like other models are safe and reliable ;-)

  • no, no, and no, but either way you will have to find an answer to whether your decision to have kids or not was the right one.

  • It would be interesting to read what actually happened there and how it was handled, but a Cloudflare-level post-mortem analysis is probably too much to hope for.

  • Sysadmins for sysadmins @group.lt

    India Is Building an Open-Source Cloud Computing Effort

    spectrum.ieee.org /cloud-computing-in-india
  • Sysadmins for sysadmins @group.lt

    Improving platform resilience at Cloudflare through automation

    blog.cloudflare.com /improving-platform-resilience-at-cloudflare/
  • Sysadmins for sysadmins @group.lt

    Amazon.com, Amazon App, AWS Outage Reported Across U.S. on Friday | Frequent Business Traveler

    www.frequentbusinesstraveler.com /2024/10/amazon-com-amazon-app-aws-outage-reported-across-u-s-on-friday/
  • They cut all such scenes and pasted them into The Boys, Mark Twain style: "Sprinkle these around as you see fit!"

  • Biology @mander.xyz

    Most Life on Earth Is Dormant, After Pulling an ‘Emergency Brake’ | Quanta Magazine

    www.quantamagazine.org /most-life-on-earth-is-dormant-after-pulling-an-emergency-brake-20240605/
  • Sysadmins for sysadmins @group.lt

    Finnish Startup Wants to Build 100x Faster CPUs

    spectrum.ieee.org /parallel-processing-unit
  • Science @lemmy.ml

    Doom scrolling - Works in Progress

    worksinprogress.co /issue/doom-scrolling/
  • Science @lemmy.ml

    The Physics of Cold Water May Have Jump-Started Complex Life | Quanta Magazine

    www.quantamagazine.org /the-physics-of-cold-water-may-have-jump-started-complex-life-20240724/
  • Science @lemmy.ml

    With ‘Digital Twins,’ The Doctor Will See You Now | Quanta Magazine

    www.quantamagazine.org /with-digital-twins-the-doctor-will-see-you-now-20240726/
  • Science @beehaw.org

    The S-Matrix Is the Oracle Physicists Turn To in Times of Crisis | Quanta Magazine

    www.quantamagazine.org /the-s-matrix-is-the-oracle-physicists-turn-to-in-times-of-crisis-20240523/
  • Sysadmins for sysadmins @group.lt

    Enable build system on macOS hosts - Daniel Gomez via B4 Relay

    lore.kernel.org /dri-devel/20240906-macos-build-support-v2-0-06beff418848@samsung.com/
  • Science @lemmy.ml

    Across a Continent, Trees Sync Their Fruiting to the Sun | Quanta Magazine

    www.quantamagazine.org /across-a-continent-trees-sync-their-fruiting-to-the-sun-20240618/
  • Sysadmins for sysadmins @group.lt

    UCLA's Leonard Kleinrock on packet switching, early Internet

  • Sysadmins for sysadmins @group.lt

    How We Built the Internet

    every.to /p/how-we-built-the-internet
  • no

  • Sysadmins for sysadmins @group.lt

    Incantations

    josvisser.substack.com /p/incantations
  • Science @beehaw.org

    Redefining the scientific method: as the use of sophisticated scientific methods that extend our mind

    academic.oup.com /pnasnexus/article/3/4/pgae112/7626940
  • Reread today again, with some highlights:

    Lessons Learned from Twenty Years of Site Reliability Engineering


    Highlights

    The riskiness of a mitigation should scale with the severity of the outage

    We, here in SRE, have had some interesting experiences in choosing a mitigation with more risks than the outage it's meant to resolve.

    We learned the hard way that during an incident, we should monitor and evaluate the severity of the situation and choose a mitigation path whose riskiness is appropriate for that severity.

    Recovery mechanisms should be fully tested before an emergency

    An emergency fire evacuation in a tall city building is a terrible opportunity to use a ladder for the first time.

    Testing recovery mechanisms has a fun side effect of reducing the risk of performing some of these actions. Since this messy outage, we've doubled down on testing.

    We were pretty sure that it would not lead to anything bad. But pretty sure is not 100% sure.

    A "Big Red Button" is a unique but highly practical safety feature: it should kick off a simple, easy-to-trigger action that reverts whatever triggered the undesirable state to (ideally) shut down whatever's happening.

    Unit tests alone are not enough - integration testing is also needed

    This lesson was learned during a Calendar outage in which our testing didn't follow the same path as real use, resulting in plenty of testing... that didn't help us assess how a change would perform in reality.

    Teams were expecting to be able to use Google Hangouts and Google Meet to manage the incident. But when 350M users were logged out of their devices and services... relying on these Google services was, in retrospect, kind of a bad call.

    It's easy to think of availability as either "fully up" or "fully down" ... but being able to offer a continuous minimum functionality with a degraded performance mode helps to offer a more consistent user experience.

    This next lesson is a recommendation to ensure that your last-line-of-defense system works as expected in extreme scenarios, such as natural disasters or cyber attacks, that result in loss of productivity or service availability.

    A useful activity can also be sitting your team down and working through how some of these scenarios could theoretically play out—tabletop game style. This can also be a fun opportunity to explore those terrifying "What Ifs", for example, "What if part of your network connectivity gets shut down unexpectedly?".

    In such instances, you can reduce your mean time to resolution (MTTR), by automating mitigating measures done by hand. If there's a clear signal that a particular failure is occurring, then why can't that mitigation be kicked off in an automated way? Sometimes it is better to use an automated mitigation first and save the root-causing for after user impact has been avoided.

    Having long delays between rollouts, especially in complex, multiple component systems, makes it extremely difficult to reason out the safety of a particular change. Frequent rollouts—with the proper testing in place— lead to fewer surprises from this class of failure.

    Having only one particular model of device to perform a critical function can make for simpler operations and maintenance. However, it means that if that model turns out to have a problem, that critical function is no longer being performed.

    Latent bugs in critical infrastructure can lurk undetected until a seemingly innocuous event triggers them. Maintaining a diverse infrastructure, while incurring costs of its own, can mean the difference between a troublesome outage and a total one.
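The highlight about automating hand-done mitigations can be sketched as a rule engine: if a clear signal fires, kick off a pre-tested mitigation first and root-cause afterwards. A minimal sketch, with hypothetical names and a plain dict of metrics standing in for a real alerting pipeline:

```python
# Sketch of signal-driven automated mitigation (hypothetical names; a real
# system would hook into an alerting pipeline rather than inspect a dict).
from dataclasses import dataclass
from typing import Callable

@dataclass
class MitigationRule:
    name: str
    is_firing: Callable[[dict], bool]   # clear signal that a failure is occurring
    mitigate: Callable[[], str]         # pre-tested, low-risk automated action

def run_rules(metrics: dict, rules: list[MitigationRule]) -> list[str]:
    """Kick off matching mitigations automatically; save root-causing for later."""
    actions = []
    for rule in rules:
        if rule.is_firing(metrics):
            actions.append(rule.mitigate())
    return actions

# Example rule: drain a region when its error rate crosses a threshold.
rules = [
    MitigationRule(
        name="drain-region",
        is_firing=lambda m: m.get("error_rate", 0.0) > 0.05,
        mitigate=lambda: "traffic drained from affected region",
    )
]

print(run_rules({"error_rate": 0.12}, rules))
```

The point from the article survives even in this toy form: the riskiness of the automated action should match the severity it responds to, and the mitigation itself must be tested before the emergency, not during it.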

  • This is what you get when you are not sleeping during biology classes.

  • the source code of a game ;))

  • i am all for normalizing raiding embassies for [put the cause you support] as well

  • woah, so nothing is sacred now? 😱🤔😐

  • looks interesting, but not this one.

  • can do, if you could provide the link to the debunking source - would be great!

  • nice, thank you.

  • a lot of things are possible if you are lucky enough ;)

  • well, this is probably PR, as no such system exists, nor can one be built, that has 100% uptime. Not to mention that network engineers rarely work with servers :)