Solved: Rocket.Chat stops working with 1000 active users
✔️Accepted Answer
Thanks for all of your suggestions. We have now been able to prove that the UserPresenceMonitor was responsible for the denial of service we faced.
We disabled it on all but two instances and can now restart the cluster without causing a flood of status updates.
We did so by patching the source so that UserPresenceMonitor only starts when the USER_PRESENCE_MONITOR environment variable is set:
```diff
--- rocket.chat/programs/server/app/app.js 2018-07-04 18:07:36.917547890 +0200
+++ app.js 2018-07-04 18:10:12.273401726 +0200
@@ -7753,7 +7753,10 @@
 InstanceStatus.registerInstance('rocket.chat', instance);
 UserPresence.start();
- return UserPresenceMonitor.start();
+
+ if (process.env['USER_PRESENCE_MONITOR']) {
+   return UserPresenceMonitor.start();
+ }
 });
 /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
```
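Note that `process.env` values are always strings, so the check in the patch starts the monitor for any non-empty value, including `USER_PRESENCE_MONITOR=false` or `USER_PRESENCE_MONITOR=0`. If the variable should act as a real on/off switch, a stricter parse could look like the minimal sketch below. The helper names `isFlagEnabled` and `startPresenceMonitorIfEnabled`, and the set of accepted "on" values, are hypothetical and not part of Rocket.Chat.

```javascript
// Sketch: decide whether this instance should run UserPresenceMonitor.
// ASSUMPTION: only these values count as "on"; anything else
// (unset, "", "0", "false", "no", ...) leaves the monitor disabled.
const TRUTHY_VALUES = new Set(['1', 'true', 'yes', 'on']);

// Hypothetical helper, not part of Rocket.Chat.
function isFlagEnabled(rawValue) {
  if (typeof rawValue !== 'string') {
    return false; // variable not set at all
  }
  return TRUTHY_VALUES.has(rawValue.trim().toLowerCase());
}

// Hypothetical wrapper mirroring the patched startup code: `monitor`
// stands in for UserPresenceMonitor (any object with a start() method).
function startPresenceMonitorIfEnabled(env, monitor) {
  if (isFlagEnabled(env.USER_PRESENCE_MONITOR)) {
    monitor.start();
    return true;
  }
  return false;
}
```

On the two instances that should keep the monitor, one would set `USER_PRESENCE_MONITOR=true` and call `startPresenceMonitorIfEnabled(process.env, UserPresenceMonitor)` in place of `UserPresenceMonitor.start()`, leaving the variable unset everywhere else.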
I would still like to have an official fix for this rather than patching the source with every update.
@magicbelette We are still using the slow database engine, but we have not yet observed high CPU load or memory usage on the database servers.
Other Answers:
@sampaiodiego Because of COVID-19, we had a lot of trouble scaling beyond 1650 active users. With v3.0.4 we now reach a maximum of 2250 users per day. Thank you for continuing to improve on this issue.
Hi @kaiiiiiiiii,
I am the admin of @AmShaegar13's Rocket.Chat setup; he kindly asked me to post this here:
```
001-rs:PRIMARY> db.serverStatus().connections
{ "current" : 182, "available" : 51018, "totalCreated" : 3234457 }
root@rocketchatdb:~# lsof -i | grep mongodb | wc -l
186
```
So running out of MongoDB connections shouldn't be the problem here.
Best,
qchn
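As a quick sanity check on the serverStatus output posted above: MongoDB reports `available` as the number of unused incoming connections, so the effective limit is `current + available`. The sketch below (the function name `connectionUtilization` is hypothetical) computes how much of that limit was in use.

```javascript
// Sketch: how close is mongod to its incoming-connection limit?
// Input is the object printed by db.serverStatus().connections.
// `available` counts unused incoming connections, so the effective
// limit is current + available.
function connectionUtilization({ current, available }) {
  const limit = current + available;
  return {
    limit,
    percentUsed: (current / limit) * 100,
  };
}

// Values copied from the serverStatus output in the comment above.
const stats = connectionUtilization({ current: 182, available: 51018, totalCreated: 3234457 });
console.log(`limit=${stats.limit}, used=${stats.percentUsed.toFixed(2)}%`);
// limit=51200, used=0.36%
```

With well under 1% of the limit in use (and `lsof` showing a similar count of 186 sockets), connection exhaustion on the database side is very unlikely to explain the outage.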
This is becoming a major annoyance. More and more users are complaining about broken notifications, and this issue is a major obstacle to acceptance in our company.
Just wanted to follow up here: we are working through another case like this, so this is definitely on our radar.
Description:
For us, Rocket.Chat does not work with more than 1000 active users. Rebooting a server, restarting Apache, or restarting Rocket.Chat after an update causes all clients to have serious problems connecting to the chat.
Steps to reproduce:
Expected behavior:
Clients can reconnect to the chat.
Actual behavior:
While reconnecting, the server sends an enormous number of the following messages over the websocket:
This continues until the server closes the websocket. I assume this is due to the absence of ping/pong messages during this period. The client immediately requests a new websocket, starting the whole cycle over and over again.
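To see why the flood grows so quickly with the number of users, here is a rough back-of-the-envelope model. It assumes (this is not confirmed anywhere in the thread) that each user who comes back online triggers one status-change message to every other connected client; the function name `reconnectStormMessages` is hypothetical.

```javascript
// Back-of-the-envelope model of one reconnect round after a full restart.
// ASSUMPTION (not confirmed in this thread): each user who comes back online
// triggers one status-change message to every other connected client.
function reconnectStormMessages(activeUsers) {
  // activeUsers senders, each fanned out to (activeUsers - 1) receivers.
  return activeUsers * (activeUsers - 1);
}

for (const users of [700, 1000, 2250]) {
  console.log(`${users} users -> ${reconnectStormMessages(users)} status messages`);
}
// 700 users -> 489300 status messages
// 1000 users -> 999000 status messages
// 2250 users -> 5060250 status messages
```

Under this model, doubling the user count roughly quadruples the messages per round, and because clients reconnect immediately after the socket is closed, a new round can begin before the previous one has drained.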
The only effective way to get the cluster up and running again is to force-log-out all users by deleting their loginTokens directly from MongoDB.
Server Setup Information:
Additional context:
The high number of instances we operate is a direct result of this issue. When we first ran into it at about 700 users, we assumed we might need to scale the cluster accordingly, but we are not willing to add another server to the cluster for every 40 new users. We planned to support around 8000 users, approximately half of them active.
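To put the "one more server per 40 users" trend into numbers, here is a small worked calculation. It assumes the ratio applies to active users, stays linear, and starts from the 1000 active users at which the cluster currently fails; `extraServersNeeded` is a hypothetical helper.

```javascript
// Rough capacity arithmetic for the "one extra server per 40 users" trend.
// ASSUMPTION: the ratio applies to active users and stays linear.
const USERS_PER_EXTRA_SERVER = 40; // figure quoted in the report

function extraServersNeeded(currentActiveUsers, targetActiveUsers) {
  const growth = Math.max(0, targetActiveUsers - currentActiveUsers);
  return Math.ceil(growth / USERS_PER_EXTRA_SERVER);
}

// ~8000 planned users, roughly half of them active.
console.log(extraServersNeeded(1000, 8000 / 2)); // 75
```

Under these assumptions, reaching the planned load would take about 75 additional servers, which illustrates why scaling out is not an acceptable fix.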
For now, we do not allow mobile clients. We would really love to, but given the current state of the cluster this won't happen soon.
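The force-logout workaround from the description (deleting all users' loginTokens directly in MongoDB) could look roughly like the sketch below. It assumes Rocket.Chat's users live in the `users` collection and that Meteor stores resume tokens in the standard `services.resume.loginTokens` array; it uses `Collection.updateMany` from the official `mongodb` Node.js driver. The helper names are hypothetical. Running it logs out every user on every device.

```javascript
// Sketch of the force-logout workaround. ASSUMPTIONS: Rocket.Chat users
// are stored in the `users` collection, and Meteor keeps resume tokens
// in the `services.resume.loginTokens` array of each user document.
const TOKEN_FIELD = 'services.resume.loginTokens';

// Hypothetical helper: filter and update documents for updateMany().
function buildForceLogoutUpdate() {
  return {
    // Only touch users that currently have at least one resume token.
    filter: { [`${TOKEN_FIELD}.0`]: { $exists: true } },
    // Emptying the array removes every stored login token.
    update: { $set: { [TOKEN_FIELD]: [] } },
  };
}

// Hypothetical helper; `db` is a Db instance from the `mongodb` driver.
async function forceLogoutAllUsers(db) {
  const { filter, update } = buildForceLogoutUpdate();
  const result = await db.collection('users').updateMany(filter, update);
  return result.modifiedCount; // number of user documents changed
}
```

It would be called with `client.db(...)` pointing at the Rocket.Chat database after `MongoClient.connect(...)`; afterwards, every client has to log in again.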