• magguzu@midwest.social · 7 days ago

    So much talking out of ass in these comments.

    Federation/decentralization is great. It’s why we’re here on Lemmy.

    It also means you expect everyone involved, people you’ve never met or vetted, to be competent and able to shell out the cash and time to commit to a certain level of uptime. That’s unacceptable for a high-SLA product like Signal. Hell, midwest.social, the Lemmy instance I’m on, is very often quite slow. I and others put up with it because we know it’s run by one person on one server that he’s presumably paying for himself. But that doesn’t reflect Lemmy as a whole.

    AWS isn’t just a bunch of servers. They have dedicated services for database clusters, cache stores, data warehousing, load balancing, container clusters, Kubernetes clusters, CDN, web application firewall, to name just a few. Every region has multiple datacenters, and the largest region by far is Northern Virginia’s. By default most people use one region, and while multi-region is a huge, expensive lift, AWS already has tools to assist with it. Also, and maybe most importantly, AWS, Azure, and GCP run their own backbones between their datacenters rather than relying on the shared one that you, I, and most other smaller DCs are using.

    I’m a DevOps engineer, but I’m no Big Tech fan. I run my own hobby server too. Amazon is an evil company. But the claim that “multi-cloud is easy, smaller CSPs are just as good” is naive at best.

    Ideally some legislation comes along and forces these companies to simplify the process of adopting multi-cloud, because right now you have to build it all yourself, and it still ends up very imperfect once you factor in things like databases and DNS, which is exactly what they lean on for vendor lock-in.
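
    To make the “build it all yourself” point concrete, here’s a toy sketch of the health-check-and-failover glue you end up writing. The endpoints are hypothetical, and the DNS repointing is left as a comment because every provider’s API is different:

    ```python
    # Toy multi-cloud failover check -- the glue you have to write yourself.
    # Both endpoints are hypothetical.
    import urllib.request

    ENDPOINTS = [
        ("aws", "https://app-aws.example.com/healthz"),
        ("hetzner", "https://app-hetzner.example.com/healthz"),
    ]

    def healthy(url: str, timeout: float = 2.0) -> bool:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.status == 200
        except OSError:
            return False

    def pick_target() -> str:
        # Return the first provider whose health endpoint answers.
        for provider, url in ENDPOINTS:
            if healthy(url):
                return provider
        raise RuntimeError("all providers down")

    # In real life you'd now call your DNS provider's API to repoint records,
    # mind the TTLs, and sort out database failover. None of that is standard.
    print("traffic should go to:", pick_target())
    ```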

    • shalafi@lemmy.world · 6 days ago

      Can’t find a screenshot, but when you’re logged in and open the screen that lists all AWS products, holy shit. AWS is far more than most people think.

    • douglasg14b@lemmy.world · 7 days ago

      Not to mention that the grand majority of federated services have performance characteristics so unsustainable that they’re effectively impossible to scale beyond hobby projects.

    • Dragonstaff@leminal.space · 6 days ago

      AWS needs to be broken up far more than Ma Bell ever was. We need open protocols developed so that there can be actual competition.

      • jfrnz@lemmy.world · 6 days ago

        There is actual competition though, from Google and Microsoft at a minimum.

        • Dragonstaff@leminal.space · 5 days ago

          3-5 companies in a sector is an oligopoly, which acts nearly the same as a monopoly. This is not “actual competition”.

          All of these companies cornered their own markets, and now they own the backbone of the internet.

          If we broke up all of them and required open standards and interoperability then other companies could innovate.

            • Dragonstaff@leminal.space · 5 days ago

              How much of the economy in the 60s was telecommunications vs how much of the economy today relies on the internet?

    • rumba@lemmy.zip · 6 days ago

      DevOps here too. I’ve been starting to slide my smaller redundant services into k8s. I had to really defend my position not to use ECS.

      No, we’re using kubeadm because I don’t want to give a damn whether it’s running in the office, on Google, on Amazon, or in my house. It’s WAY harder and more expensive than setting up EKS and an EC/Aurora cluster, but I can bypass vendor lock-in. Setting up my own clusters and replicas is a never-ending source of work.
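
      For a taste of what provider-agnostic looks like day to day, here’s a minimal sketch using the official kubernetes Python client; the kubeconfig context names are made up:

      ```python
      # Sketch: poll node health across clusters on different providers.
      # Assumes kubeconfig contexts with these (made-up) names already exist.
      from kubernetes import client, config

      CONTEXTS = ["office", "gcp", "aws", "home"]

      for ctx in CONTEXTS:
          config.load_kube_config(context=ctx)  # same client, any provider
          v1 = client.CoreV1Api()
          for node in v1.list_node().items:
              ready = any(
                  c.type == "Ready" and c.status == "True"
                  for c in node.status.conditions
              )
              print(f"{ctx}/{node.metadata.name}: {'Ready' if ready else 'NotReady'}")
      ```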

  • Axum@lemmy.blahaj.zone · 7 days ago

    SimpleX literally solves the messaging problem. You can bounce through their default relay nodes or run your own to use exclusively or add to the mix. It’s all very transparent to end users.

    At most, the AWS outage would only have affected chats relayed through servers hosted on AWS.

    SimpleX also doesn’t require a fukkin phone number.

  • I Cast Fist@programming.dev · 6 days ago

    Tangent: Jami is p2p, so the only risk of going offline is everyone in a group going offline. It does lack several quality-of-life features, though.

  • net00@lemmy.today · 7 days ago

    Didn’t only one AWS region go down? Maybe before even thinking about anything else they should focus on redundancy within AWS.

    • shalafi@lemmy.world · 6 days ago

      us-east-1 went down. Problem is that IAM services all run through that region. Any code relying on an IAM role would not be able to authenticate. Think of it as a username in a Windows domain. IAM encompasses all that you are allowed to view, change, launch, etc.

      I hardly touched AWS at my last job, but listening to my teammates and seeing their code led me to believe IAM is used everywhere.
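
      For anyone who hasn’t touched it, here’s roughly what that dependency looks like with boto3. Nearly every SDK call is signed with credentials that trace back to IAM/STS, so when that path breaks, even a trivial read fails before your data is ever touched:

      ```python
      # Sketch of why "IAM is used everywhere": every AWS API call below is
      # authorized through IAM. If auth can't be resolved, nothing works.
      import boto3

      sts = boto3.client("sts")
      print(sts.get_caller_identity()["Arn"])  # "who am I" -- fails if IAM/STS is unreachable

      s3 = boto3.client("s3")
      s3.list_buckets()  # even a simple read is rejected without valid credentials
      ```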

      • amzd@lemmy.world · 6 days ago

        How is that even legal? I thought there were data-export laws in the EU.

        • shalafi@lemmy.world · 6 days ago

          Nothing to do with moving data. But you can’t move data without authentication.

          I want my service to do a $thing. It won’t do $thing without knowing who I am and what permissions I have. The data doesn’t have to cross borders, the service simply needs to function.

          Does that make sense? As I said, I didn’t do much in AWS, but the principles are sound.

    • magguzu@midwest.social · 7 days ago

      This is the actual realistic change a lot of people are missing. Multi-cloud is hard and imperfect and brings its own new potential issues. But AWS does give you tools to go multi-region. It’s just very expensive.

      Unfortunately, DNS transcends regions, so that can’t really be escaped.

    • Evotech@lemmy.world · 7 days ago

      Apparently even if you’re fully redundant, there are a lot of core services in us-east-1 that you rely on.

    • lando55@lemmy.zip · 7 days ago

      This has been my biggest pet peeve in the wake of the AWS outage. If you’d built for high availability and continuity, this event would have been at most a minor blip in your services.

      • shalafi@lemmy.world · 6 days ago

        Yeah, but if you want real redundancy, you pay double. My team looked into it. Even our CEO, no tightwad, just laughed and shook his head when we told him.

  • majster@lemmy.zip · 7 days ago

    They’re serving 1-on-1 chats and group chats. That practically partitions itself. There are many server-lease options all over the world. My assumption is that they use some AWS service and now can’t migrate off it. But you need an on-call team anyway, so you aren’t buying that much convenience.

    • boonhet@sopuli.xyz · 7 days ago

      There are many server lease options all over the world

      It increases complexity a lot to go with a bunch of separate server leases. There’s a reason global companies use hyperscalers instead of getting VPSes in 30 or 40 different countries.

      I hate the centralization as much as everyone else, but for some things it’s just not feasible to go on-prem. I do know of one exception: I used to work at a company with a pretty large, widely spread-out customer base (big corps on multiple continents) that had its own k8s cluster in a super-secure colocation space. But our backend was always slow to some degree (in multiple cases I optimized multi-second API endpoints down to 10-200 ms), we used asynchronous processing for the truly slow things instead of making the user wait on a multi-minute API request, and it just wasn’t the sort of application that needs to be super fast anyway, so the extra milliseconds of latency didn’t matter much, whether it was 50 or 500.

      But with a chat app, users want it to be fast. They expect their message to be sent as soon as they hit the send button. It might take longer to actually reach the other people in the conversation, but it needs to be fast enough that if the user hits send and then immediately closes the app, the message has already gone out. Otherwise it’s bad UX.
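
      The usual way to get that feel is an on-device outbox: persist the message locally, show it as “sent”, and deliver in the background. A toy sketch of the pattern, not Signal’s actual implementation:

      ```python
      # Toy outbox pattern: "sent" means durably queued on-device; delivery
      # happens in the background. Not Signal's actual implementation.
      import sqlite3
      import threading
      import time

      db = sqlite3.connect("outbox.db", check_same_thread=False)
      db.execute("CREATE TABLE IF NOT EXISTS outbox"
                 " (id INTEGER PRIMARY KEY, body TEXT, delivered INTEGER DEFAULT 0)")
      lock = threading.Lock()

      def send(body: str) -> None:
          # Returns as soon as the message is safely on disk: the UI can show
          # "sent" immediately, even if the app is closed right after.
          with lock:
              db.execute("INSERT INTO outbox (body) VALUES (?)", (body,))
              db.commit()

      def deliver_loop() -> None:
          while True:
              with lock:
                  rows = db.execute(
                      "SELECT id, body FROM outbox WHERE delivered = 0").fetchall()
              for msg_id, body in rows:
                  # A real upload to the server, with retries, would go here.
                  with lock:
                      db.execute("UPDATE outbox SET delivered = 1 WHERE id = ?", (msg_id,))
                      db.commit()
              time.sleep(1)

      threading.Thread(target=deliver_loop, daemon=True).start()
      send("hello")
      ```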

      • vacuumflower@lemmy.sdf.org · 7 days ago

        It’s weird for Signal not to be able to do what Telegram does. Yes, for this particular purpose they’re no different.

          • vacuumflower@lemmy.sdf.org · 7 days ago

            For the purpose of “shoot a message, go offline and be certain it’s sent” it’s the same service.

            • Jean-luc Peak-hard@piefed.social · 6 days ago

              If sending a message is the only requirement, email fits the bill and has worked for half a century. If we’re being real, the reason Signal “can’t do what Telegram does” is because Telegram doesn’t even attempt to do what Signal does. Signal is tackling a much bigger problem.

              • vacuumflower@lemmy.sdf.org · 6 days ago

                What are you talking about?

                I’m saying that the parts of the infrastructure needed to accept a message from the client application, encrypted or not, associated with a user or not, are under the same requirements for Signal and Telegram.

                I don’t know if you realize that every big service is basically its own self-contained ’90s Internet, and the part that accepts your messages is pretty similar to an SMTP server in their architecture.

  • sugar_in_your_tea@sh.itjust.works · 7 days ago

    Why is it that only the larger cloud providers are acceptable? What’s wrong with one of the smaller providers like Linode/Akamai? There are a lot of crappy options, but also plenty of decent ones. If you build your infrastructure over a few different providers, you’ll pay more upfront in engineering time, but you’ll get a lot more flexibility.

    For something like Signal, it should be pretty easy to build this type of redundancy since data storage is minimal and sending messages probably doesn’t need to use that data storage.

      • sugar_in_your_tea@sh.itjust.works · 7 days ago

        It is, compared to AWS, Azure, and Google Cloud. Here’s 2024 revenue to give an idea of scale:

        • Akamai - $4B, Linode itself is ~$100M
        • AWS - $107B
        • Azure - ~$75B
        • Google Cloud - ~$43B

        The smallest on this list has more than 10x the revenue of Akamai.

        Here are a few other providers for reference:

        • Hetzner (what I use) - €367M
        • Digital Ocean - $692.9M
        • Vultr (my old host) - not public, but estimates are ~$37M

        I’m arguing they could put together a solution with these smaller providers. That takes more work, but you’re rewarded with more resilience and probably lower hosting costs. Once you have two providers in your infra, it’s easier to add another. Maybe start with using them for disaster recovery, then slowly diversify the hosting portfolio.

        • Squizzy@lemmy.world · 6 days ago

          10% the size of Google is decent. If I had ten percent of a tech giant’s reach in any particular sector I’d consider myself significant, but I get where you are coming from.

    • Encrypt-Keeper@lemmy.world · 7 days ago

      Also, you know… building your own data centers / co-locating. Even with the added man-hours required, it ends up being far cheaper.

      • sugar_in_your_tea@sh.itjust.works · 7 days ago

        But far less reliable. If your data center has a power outage or internet disruption, you’re screwed. Signal isn’t big enough to have several data centers for geographic diversity and redundancy; they’re maybe a few racks total.

        Colo is more feasible, but who is going to travel to the various parts of the world to swap drives or whatever? If there’s an outage, you’re talking hours to days to get another server up, vs minutes for rented hosting.

        For the scale that signal operates at and the relatively small processing needs, I think you’d want lots of small instances. To route messages, you need very little info, and messages don’t need to be stored. I’d rather have 50 small replicas than 5 big instances for that workload.

        For something like Lemmy, colo makes a ton of sense though.

        • Encrypt-Keeper@lemmy.world · 7 days ago

          It’s plenty reliable. AWS is just somebody else’s datacenter.

          Colo is more feasible, but who is going to travel to the various parts of the world to swap drives or whatever?

          Most Colo DCs offer ad hoc remote hands, but that’s beside the point. What do you mean here by “Various parts of the world”? In Signal’s case even Amazon didn’t need anyone in “various parts of the world” because the Signal infra on AWS was evidently in exactly one part of the world.

          If there’s an outage, you’re talking hours to days to get another server up, vs minutes for rented hosting.

          You mean like the hours it took for Signal to recover on AWS, meanwhile it would have been minutes if it was their own infrastructure?

          • sugar_in_your_tea@sh.itjust.works · 6 days ago

            the Signal infra on AWS was evidently in exactly one part of the world.

            We don’t necessarily know that. All I know is that AWS’s load balancers had issues in one region. It could be that they use that region for a critical load balancer, but they have local instances in other parts of the world to reduce latency.

            I’m not talking about how Signal is currently set up (maybe it is that fragile), I’m talking about how it could be set up. If their issue is merely w/ the load balancer, they could have a bit of redundancy in the load balancer w/o making their config that much more complex.

            You mean like the hours it took for Signal to recover on AWS, meanwhile it would have been minutes if it was their own infrastructure?

            No, I mean if they had a proper distributed network of servers across the globe and were able to reroute traffic to other regions when one has issues, there could be minimal disruption to the service overall, with mostly local latency spikes for the impacted region.

            My company uses AWS, and we had a disaster recovery mechanism almost trigger that would move our workload to a different region. The only reason we didn’t trigger it is because we only need the app to be responsive during specific work hours, and AWS recovered by the time we needed our production services available. A normal disaster recovery takes well under an hour.

            With a self-hosted datacenter/server room, if there’s a disruption, there is usually no backup, so you’re out until the outage is resolved. I don’t know if Signal has disaster recovery or if they used it, I didn’t follow their end of things very closely, but it’s not difficult to do when you’re using cloud services, whereas it is difficult to do when you’re self-hosting. Colo is a bit easier since you can have hot spares in different regions/overbuild your infra so any node can go down.
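
            For what it’s worth, one common way to wire up that kind of failover on AWS itself is a Route 53 health-checked failover record pair. A rough boto3 sketch; the zone ID, domain, and IPs are made up, and I’m not claiming this is how Signal is set up:

            ```python
            # Sketch: Route 53 DNS failover. The primary record carries a health
            # check; the secondary takes over when the check fails.
            import boto3

            r53 = boto3.client("route53")

            check_id = r53.create_health_check(
                CallerReference="primary-check-1",
                HealthCheckConfig={
                    "Type": "HTTPS",
                    "FullyQualifiedDomainName": "us.chat.example.com",  # made up
                    "Port": 443,
                    "ResourcePath": "/healthz",
                },
            )["HealthCheck"]["Id"]

            r53.change_resource_record_sets(
                HostedZoneId="Z0000000000000",  # made up
                ChangeBatch={"Changes": [
                    {"Action": "UPSERT", "ResourceRecordSet": {
                        "Name": "chat.example.com", "Type": "A",
                        "SetIdentifier": "primary", "Failover": "PRIMARY",
                        "TTL": 60, "HealthCheckId": check_id,
                        "ResourceRecords": [{"Value": "198.51.100.10"}],
                    }},
                    {"Action": "UPSERT", "ResourceRecordSet": {
                        "Name": "chat.example.com", "Type": "A",
                        "SetIdentifier": "secondary", "Failover": "SECONDARY",
                        "TTL": 60,
                        "ResourceRecords": [{"Value": "203.0.113.10"}],
                    }},
                ]},
            )
            ```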

            • Encrypt-Keeper@lemmy.world · 6 days ago

              It was a DNS issue with DynamoDB; the load balancer issue was a knock-on effect after the DNS issue was resolved. But the problem is it was a ~15 hour outage, and a big reason behind that was the fact that the load in that region is massive. Signal could very well have had their infrastructure in more than one availability zone, but since the outage affected the entire region, they were screwed.

              You’re right that this can be somewhat mitigated by having infrastructure in multiple regions, but if they don’t, the reason is cost. Multi-region redundancy costs an arm and a leg. You can accomplish that same redundancy via Colo DCs for a fraction of the cost, and when you do fix the root issue, you won’t then have your load balancers fail on you because in addition to your own systems you have half the internet all trying to pass its backlog of traffic at once.

              • sugar_in_your_tea@sh.itjust.works · 6 days ago

                Multi-region redundancy costs an arm and a leg

                Yes, if you buy an off-the-shelf solution, it’ll be expensive.

                I’m suggesting treating VPS instances like you would a colo setup. Let cloud providers manage the hardware, and keep the load balancing in house. For Signal, this can be as simple as client-side latency/load checks. You can still colo in locations with heavier load; that’s how some Linux distros handle repo mirrors, and it works well. Signal’s data needs should be so low that simple DB replicas should be sufficient.
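
                By “client-side latency/load checks” I mean something as simple as this sketch, with made-up hosts on different providers:

                ```python
                # Sketch: dumb client-side server selection. Probe each provider's
                # endpoint and use whichever answers fastest. Hosts are made up.
                import time
                import urllib.request

                HOSTS = [
                    "https://msg1.example.net/healthz",  # e.g. a Hetzner box
                    "https://msg2.example.net/healthz",  # e.g. a DigitalOcean droplet
                    "https://msg3.example.net/healthz",  # e.g. a Vultr instance
                ]

                def probe(url: str) -> float:
                    start = time.monotonic()
                    try:
                        urllib.request.urlopen(url, timeout=2).read()
                        return time.monotonic() - start
                    except OSError:
                        return float("inf")  # unreachable hosts sort last

                best = min(HOSTS, key=probe)
                print("sending through:", best)
                ```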

  • balance8873@lemmy.myserv.one · 7 days ago

    The phrasing of the quotes is very “I sure hope someone comes along and fixes this for me because I’m not going to”

  • PieMePlenty@lemmy.world · 7 days ago

    Matrix solved this with decentralization and federation. Don’t tell me it’s not possible.

    • menas@lemmy.wtf · 7 days ago

      Decentralized Matrix isn’t at production quality. The only Matrix users with no issues are the ones who have their account, and all of their contacts, on matrix.org. In other words, they use it as a centralized app.

      • carrylex@lemmy.world · 6 days ago

        And your source?

        I’ve been running, and been part of, a Matrix server for years and have experienced near-zero problems so far.

        • magguzu@midwest.social · 6 days ago

          Great. Can you reference an SLA to prove that, and what’s the size of that server?

          Apples and oranges.

          • carrylex@lemmy.world · 6 days ago
            1. We don’t have an SLA because SLAs are worthless pinky promises
            2. 10-200 people
            • magguzu@midwest.social · 6 days ago
              1. Where do people keep getting this “SLAs are pinky promises” idea? Your SLA has to be presented to potential clients, and whether you’ve been successful in maintaining it can make or break their signing on. It’s also audited for things like SOC 2 compliance.

              2. That’s a microscopic fraction of what a product the size of Signal is dealing with, and unimaginably small compared to AWS.

  • Tiger_Man_@szmer.info · 6 days ago

    There’s always another, better choice, and it’s called using your own fucking servers.

    • PrettyFlyForAFatGuy@feddit.uk · 6 days ago

      There was a big exodus to Signal for a while a couple of years ago when Meta was fucking with WhatsApp’s privacy policy, similar to the exodus from Reddit to Lemmy.

      Having your infrastructure on a cloud provider lets you keep your costs in line with your current number of users. If you get a big influx you can immediately scale up to accommodate them, and when that spike in users dies off, as it invariably does, you can scale back down instead of being left with a load of hardware you just bought for your new users (who have since fucked off) and now aren’t using.
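
      On AWS, that “scale up, then scale back down” is often just a target-tracking policy on an autoscaling group. A rough boto3 sketch; the group name is made up:

      ```python
      # Sketch: scale capacity with demand via EC2 Auto Scaling target tracking.
      import boto3

      asg = boto3.client("autoscaling")
      asg.put_scaling_policy(
          AutoScalingGroupName="chat-app-workers",  # made-up group name
          PolicyName="track-cpu-50",
          PolicyType="TargetTrackingScaling",
          TargetTrackingConfiguration={
              "PredefinedMetricSpecification": {
                  "PredefinedMetricType": "ASGAverageCPUUtilization",
              },
              # Add instances when average CPU rises above ~50%, and remove
              # them again as the user spike dies off.
              "TargetValue": 50.0,
          },
      )
      ```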

    • MangoCats@feddit.it · 6 days ago

      using your own fucking servers

      And/or peer-to-peer mesh. Personally, I WANT a system that has peak performance AND multiple fallbacks to prevent single-point-of-failure blackouts.

  • carrylex@lemmy.world · 6 days ago

    Just read through the Bluesky thread and it’s obvious that she’s a CEO and has no idea how to code or design infrastructure.

    It’s leasing access to a whole sprawling, capital-intensive, technically-capable system that must be just as available in Cairo as in Capetown, just as functional in Bangkok as Berlin.

    Yeah, then why was Signal completely down when a single region (us-east-1) failed and all the others were working perfectly?

    Did it ever occur to your brilliant mind that your system design might be the problem?

    Jump over your own shadow for once: say you screwed up, tell people you’re no longer going to rely on a single S3 bucket in us-east-1, and stop the finger-pointing.

    But you can’t even manage to host a properly working status page or technically explain your outages, so I guess that train is long gone…

    • Fjdybank@lemmy.ca · 6 days ago

      Way to shoot the messenger there. Or are you also taking that pitchfork after Jassy?

  • Galactose@sopuli.xyz · 6 days ago

    Excuse me, but I don’t believe this BS. EDIT: As in Signal’s excuse. (Sorry, I should’ve been clear)

  • zr0@lemmy.dbzer0.com · 7 days ago

    Wrong. It’s actually quite easy to use multiple clouds with the help of OpenTofu, so this is just a cheap excuse.

  • Mubelotix@jlai.lu · 7 days ago

    I call bullshit too. If it’s too expensive for them, just decentralize the project. Self-hosters all around the world would help. I alone have better uptime than AWS and probably wouldn’t even notice the load from a few hundred thousand users.

    • 1984@lemmy.today · 7 days ago

      You can’t run a professional service on self-hosters’ hardware…

      I think you guys don’t really have experience building these global, low-latency apps and don’t know the challenges that come with them…

    • magguzu@midwest.social · 7 days ago

      You, on a single ISP, relying on the world’s shared backbone rather than your own links between multiple DCs within a region and multiple regions around the world, have better uptime than AWS?

      Stop.

      I’m all for decentralizing so that no single entity controls everything, but not for the sake of uptime. Uptime is one thing you give up with services like Matrix or Lemmy.

      AWS actually has an SLA it’s contractually committed to when you pay them, with thousands of engineers working to maintain it.

      • Mubelotix@jlai.lu · 7 days ago

        Well yes, considering the downtime they had. An SLA is just words on paper; you also need to not fuck your infrastructure up. Even if every self-hoster had only 99% uptime, which is bad, it’s easy to build a system that replicates data across a few of them to achieve resiliency. People need to stop assuming they can be 100% reliant on a single host and actually design their systems to take downtime into account and recover from it.
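
        Toy sketch of the kind of replication I mean, with made-up self-hoster endpoints: a write counts once a majority acks it, so any single host can be down without anyone noticing:

        ```python
        # Toy quorum write across independent self-hosted nodes. Endpoints are
        # made up; a real system also needs reads, repair, and auth.
        import urllib.request

        HOSTS = [
            "https://node-a.example.org/store",
            "https://node-b.example.org/store",
            "https://node-c.example.org/store",
        ]

        def store(payload: bytes) -> bool:
            acks = 0
            for host in HOSTS:
                try:
                    req = urllib.request.Request(host, data=payload, method="POST")
                    with urllib.request.urlopen(req, timeout=2) as resp:
                        acks += int(resp.status == 200)
                except OSError:
                    pass  # one host being down is fine; that's the point
            return acks >= len(HOSTS) // 2 + 1  # majority quorum

        print("stored durably:", store(b"hello"))
        ```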

        • magguzu@midwest.social · 7 days ago

          It’s not just words on paper. It’s a level of service you commit to, and you owe repercussions when it’s broken.

  • shortwavesurfer@lemmy.zip · 7 days ago

    I’m going to call bullshit: there are several networks that might be capable of doing this, such as various blockchain networks or IPFS.

    • JoshuaFalken@lemmy.world · 7 days ago

      I’m going to call bullshit on the underlying assertion that Signal is using Amazon services for the sake of lining Jeff’s pocket instead of considering the “several” alternatives. As if they don’t have staff to consider such a thing and just hit Buy Now on the Amazon Smile.

      In any monopoly, there are going to be smaller, less versatile, less reliable options. Fine and dandy for Mr Joe Technology to hop on the niche wagon and save a few bucks, but that’s not going to work for anyone casting a net encompassing the world.