Solved: Rocket.Chat stops working with 1000 active users

Description:

For us, Rocket.Chat does not work with more than 1000 active users. Rebooting a server, restarting Apache or restarting Rocket.Chat after an update causes all clients to face serious issues connecting to the chat.

Steps to reproduce:

  1. Set up a chat with 1000 simultaneous active users.
  2. Restart all instances at once.

Expected behavior:

Clients can reconnect to the chat.

Actual behavior:

While clients reconnect, the server sends an enormous number of the following messages over the websocket:

{"msg":"added","collection":"users","id":"$userId","fields":{"name":"$displayName","status":"online","username":"$username","utcOffset":2}}
{"msg":"changed","collection":"users","id":"$userId","fields":{"status":"online"}}
{"msg":"removed","collection":"users","id":"$userId"}

This continues until the server closes the websocket. I assume this is due to the lack of ping-pong messages during this time. The client then instantly requests a new websocket, starting the whole cycle over again.
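To put a number on the flood described above, the captured frames can be counted per message type and collection. This is only a minimal sketch: `tallyDdpMessages` is a hypothetical helper, not part of Rocket.Chat or Meteor, and it assumes each frame has already been extracted from the websocket capture as one JSON string per line.

```javascript
// Hypothetical helper: counts DDP frames by "msg" type and collection.
// Assumes every input line is one JSON-encoded DDP message; lines that
// cannot be parsed or carry no "msg" field are skipped.
function tallyDdpMessages(lines) {
  const counts = {};
  for (const line of lines) {
    let frame;
    try {
      frame = JSON.parse(line);
    } catch (err) {
      continue; // not JSON, ignore
    }
    if (!frame || typeof frame.msg !== 'string') {
      continue; // not a DDP message
    }
    const key = frame.collection ? `${frame.msg}:${frame.collection}` : frame.msg;
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}
```

A result dominated by `added:users`, `changed:users` and `removed:users` would match the presence churn described in this report.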

The only effective way to get the cluster up and working again is to force-logout all users by deleting their loginTokens directly from MongoDB.
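The report does not show the exact command used. As a rough sketch, assuming the tokens live in the standard Meteor accounts field `services.resume.loginTokens` of the `users` collection, the force-logout could look like this. `buildForceLogoutUpdate` and `forceLogoutAll` are hypothetical names, and `usersCollection` is assumed to be any object exposing a MongoDB-style `updateMany(filter, update)`.

```javascript
// Hypothetical helper: builds the filter and update documents that empty
// the login token list. The field path is an assumption based on Meteor's
// accounts system, not something stated in the report.
function buildForceLogoutUpdate() {
  return {
    // Match only users that still hold at least one login token.
    filter: { 'services.resume.loginTokens.0': { $exists: true } },
    // Replace the token list with an empty array, which logs the user out.
    update: { $set: { 'services.resume.loginTokens': [] } },
  };
}

// Applies the update through a MongoDB-style collection object and returns
// whatever updateMany returns (a Promise with the official Node.js driver).
function forceLogoutAll(usersCollection) {
  const { filter, update } = buildForceLogoutUpdate();
  return usersCollection.updateMany(filter, update);
}
```

The same two documents can be passed to `db.users.updateMany(...)` in the mongo shell. Try it on a copy of the database first, since every client has to log in again afterwards.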

Server Setup Information:

  • Version of Rocket.Chat Server: 0.65.2
  • Operating System: Debian 8.11
  • Deployment Method: tar with pm2
  • Number of Running Instances: 8 virtual machines with 3 instances each (24 instances)
  • DB Replicaset Oplog: On
  • NodeJS Version: 8.9.4
  • MongoDB Version: 3.4.9

Additional context:

The high number of instances we operate is a direct result of this issue. When we first ran into it with about 700 users, we assumed we might need to scale the cluster accordingly, but we are not willing to add another server to the cluster for every 40 new users. We planned to support around 8000 users, approximately half of them active.
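To make the scaling concern concrete, here is a back-of-the-envelope calculation. It assumes capacity scales linearly with instance count, which the report does not confirm, and uses only the figures given above: 24 instances for about 1000 active users, 3 instances per virtual machine, and a target of about 4000 active users. `scaleInstancesLinearly` is a hypothetical helper.

```javascript
// Hypothetical helper: instances needed if capacity scaled linearly with
// the number of instances (an assumption, not a measured property).
function scaleInstancesLinearly(currentInstances, currentActiveUsers, targetActiveUsers) {
  return Math.ceil((targetActiveUsers * currentInstances) / currentActiveUsers);
}

const plannedActiveUsers = 8000 / 2; // about half of 8000 users active
const instancesNeeded = scaleInstancesLinearly(24, 1000, plannedActiveUsers); // 96
const virtualMachinesNeeded = Math.ceil(instancesNeeded / 3); // 32, at 3 instances per VM

console.log(`${instancesNeeded} instances on ${virtualMachinesNeeded} virtual machines`);
```

Under that assumption the planned load would need four times the current cluster, which is why scaling out alone was not considered acceptable.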

For now, we do not allow mobile clients. We would really love to, but with the current state of the cluster this won't happen soon.

58 Answers

✔️ Accepted Answer

Thanks for all of your suggestions. We could now prove that the UserPresenceMonitor was responsible for the denial of service we faced.

We disabled it on all but two separate instances and can restart the cluster now without causing tons of status updates.

We did so by patching the source and setting the USER_PRESENCE_MONITOR environment variable on the instances that should still run the monitor:

--- rocket.chat/programs/server/app/app.js	2018-07-04 18:07:36.917547890 +0200
+++ app.js	2018-07-04 18:10:12.273401726 +0200
@@ -7753,7 +7753,10 @@
 
   InstanceStatus.registerInstance('rocket.chat', instance);
   UserPresence.start();
-  return UserPresenceMonitor.start();
+
+  if (process.env['USER_PRESENCE_MONITOR']) {
+    return UserPresenceMonitor.start();
+  }
 });
 /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
 

I would still like to have an official fix for this rather than patching the source with every update.
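One detail of the patch above is worth spelling out: values in `process.env` are always strings, so the check treats any non-empty value as enabled, including `"false"` or `"0"`. The sketch below illustrates that and shows one way the variable might be set for only some pm2 processes. `presenceMonitorEnabled` and `buildPm2Apps` are hypothetical helpers, and the app names, `main.js` entry point and port numbers are assumptions, not taken from our setup.

```javascript
// Mirrors the truthiness check used in the patch. Any non-empty string
// enables the monitor; leave the variable unset (or empty) to disable it.
function presenceMonitorEnabled(env) {
  return Boolean(env.USER_PRESENCE_MONITOR);
}

// Hypothetical helper: builds pm2 app definitions for one machine where
// only the first `monitorCount` processes get USER_PRESENCE_MONITOR set.
function buildPm2Apps(instanceCount, monitorCount, basePort) {
  const apps = [];
  for (let i = 0; i < instanceCount; i += 1) {
    const env = { PORT: String(basePort + i) };
    if (i < monitorCount) {
      env.USER_PRESENCE_MONITOR = '1';
    }
    apps.push({ name: `rocketchat-${i}`, script: 'main.js', env });
  }
  return apps;
}

// In a pm2 ecosystem file this could be exported as:
// module.exports = { apps: buildPm2Apps(3, 1, 3000) };
```

With this layout, the machines that should not run the monitor would use a `monitorCount` of 0.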

@magicbelette We still use the slow database engine, but we do not observe high CPU load or memory usage on the database servers yet.

Other Answers:

@sampaiodiego Thanks to COVID-19 we had a lot of trouble scaling beyond 1650 active users. With v3.0.4 we now reach a maximum of 2250 users per day. Thank you for further improving on this issue.

Hi @kaiiiiiiiii,
I am the admin of @AmShaegar13's Rocket.Chat setup. He kindly asked me to post this here:

001-rs:PRIMARY> db.serverStatus().connections
{ "current" : 182, "available" : 51018, "totalCreated" : 3234457 }

root@rocketchatdb:~# lsof -i | grep mongodb | wc -l
186

So this shouldn't be a thing…

Best,
qchn
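For readers who want the numbers above in relative terms, a small sketch follows. It assumes that `current + available` approximates the configured connection limit, and `connectionUtilization` is a hypothetical helper.

```javascript
// Hypothetical helper: fraction of the connection limit in use, treating
// current + available as the limit (an approximation).
function connectionUtilization(connections) {
  const limit = connections.current + connections.available;
  return connections.current / limit;
}

// Values copied from the db.serverStatus().connections output above.
const connectionStats = { current: 182, available: 51018, totalCreated: 3234457 };
const percentInUse = (connectionUtilization(connectionStats) * 100).toFixed(2);

console.log(`${percentInUse}% of ${connectionStats.current + connectionStats.available} connections in use`);
```

That is well under 1 percent of the limit, so connection exhaustion on the database side looks unlikely here.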

This is becoming a major annoyance. More and more users complain about broken notifications. This issue is a major obstacle to acceptance in our company.

Just wanted to follow up here. We are working through another case like this. So this is definitely on our radar.
