HTTP 403, 503, and 504 status codes dominated the last few weeks as multiple companies experienced application degradations and outages. These incidents at companies like Okta, Twitch, Reddit, and GitHub leave important lessons for IT teams on how to navigate similar issues and minimize downtime for users.

Looking at overall outage trends, we also saw global and U.S. outage numbers continue the downward trend seen over the previous two weeks, with global outages dropping 33% over the two-week period; U.S.-centric outages accounted for 34% of all observed outages. See the By the Numbers section below to learn more. Read on for our analysis of these events and global outage trends, or use the links below to jump to the sections that most interest you.

On March 12, Okta users in some geographies, including North America, experienced problems accessing their corporate applications when Okta's single sign-on (SSO) service encountered issues. The problems initially manifested as 504 gateway timeout errors in one "cell" (Okta groups its public-facing infrastructure into a series of cells, isolated from one another). These issues were fixed after 30 minutes, but then the same cell, and others, started presenting 403 forbidden errors in response to user authentication requests.

*Figure: HTTP 403 forbidden errors in response to user authentication requests.*

According to a post-incident report, a bug in Okta's internal tooling prolonged the 403 issue. The bug caused network rules implemented as part of a fix to be "incorrectly set to block requests," manifesting as 403s on the front end.

While users could still sign in and access their Okta dashboard, some visual elements of the application did not appear as they normally do, impacting its usability. A subset of the application's icons that are usually displayed on the page didn't render properly. As a result, users didn't have full access to some of the applications they normally use during their workday.

The disruption was officially categorized as a "service degradation," not a full outage. The application remained available and accessible, even if it did not function fully as intended for a subset of users. In total, the 403 problem lasted for an hour, according to the post-incident report, which matched ThousandEyes' observations.

Given the critical "front door" nature of the Okta service, the incident has given users, and Okta itself, some pause for thought on redundancy. A post in the Okta Developer Community forum notes that it's possible "to run multiple authorization servers in the same organization," which "could be used for failover if one authorization server went down," although this wasn't the intended purpose of the capability. More interestingly, Okta flagged the possibility of a future product enhancement to enable automatic failover between cells: "I think there is an interesting product enhancement out of your question, which is how Okta could allow you to have the same organization in multiple cells, and if one goes down, we can elegantly funnel traffic to the other cell."
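The failover workaround mentioned in the Developer Community post could be sketched as client-side retry logic that falls back to a second authorization server when the first returns an error such as a 403 or a 5xx. Everything below is a hypothetical illustration under that assumption: the URLs, the retry policy, and the `token_with_failover` helper are inventions for this sketch, not Okta's documented API or recommended practice.

```python
# Hypothetical sketch: client-side failover between two authorization
# servers in the same organization. The endpoint URLs are placeholders.
from urllib import request, error

AUTH_SERVERS = [
    "https://example.okta.com/oauth2/aus-primary/v1/token",    # hypothetical
    "https://example.okta.com/oauth2/aus-secondary/v1/token",  # hypothetical
]

def token_with_failover(servers, fetch=None):
    """Try each authorization server in order; fall back to the next one
    when a server answers with a non-200 status (e.g. 403 or 504)."""
    if fetch is None:
        def fetch(url):
            # Real HTTP call; tests can inject a stub instead.
            with request.urlopen(url, timeout=5) as resp:
                return resp.status, resp.read()
    last_status = None
    for url in servers:
        try:
            status, body = fetch(url)
        except error.HTTPError as exc:   # urllib raises on 4xx/5xx
            status, body = exc.code, None
        if status == 200:
            return url, body
        last_status = status  # 403, 504, etc.: try the next server
    raise RuntimeError(f"all authorization servers failed (last status: {last_status})")
```

The injectable `fetch` parameter keeps the failover decision (retry on any non-200 response) separate from the transport, so the policy can be unit-tested without touching the network. A production client would also want timeouts, backoff, and health checks rather than naive sequential retries.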