The Discourse AI plugin provides four self-hosted features: NSFW detection, toxic content detection, sentiment analysis, and text embedding. Before you can use these features, you first need to run their corresponding models.
The Discourse team publishes ready-made Docker images for these models, so running them on your own server is straightforward.
Note: The Chinese-capable embedding model multilingual-e5-large is extremely memory-hungry (on my setup, its container uses over 30 GB of memory); each of the other containers uses roughly 2 GB.
Next, run the following commands to start the toxic content detection, NSFW detection, and sentiment analysis containers provided by Discourse:
# Disable SELinux first, or the model download will fail with a permission error
mkdir /opt/tei-cache
docker run -itd --restart always -e "API_KEYS=xxx" --name detoxify -e BIND_HOST=0.0.0.0 -p 8082:80 ghcr.io/discourse/detoxify:latest
docker run -itd --restart always -e "API_KEYS=xxx" --name nsfw -e BIND_HOST=0.0.0.0 -p 8083:80 ghcr.io/discourse/nsfw-service:latest
docker run -itd --restart always --name sentiment --shm-size 1g -p 8084:80 -v /opt/tei-cache:/data ghcr.io/huggingface/text-embeddings-inference:cpu-latest --model-id cardiffnlp/twitter-roberta-base-sentiment-latest --revision refs/pr/30 --api-key xxx
Replace the xxx in API_KEYS with a secret key of your own choosing so that your endpoints cannot be abused.
If your server is located in mainland China and you experience slow connection speeds to ghcr.io, you can change the domain to ghcr.dockerproxy.com to speed up downloads.
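Before wiring anything into Discourse, it is worth confirming that a container actually answers. The sentiment container runs HuggingFace's text-embeddings-inference (TEI), which serves classification models on its /predict route with Bearer authentication matching the --api-key flag. Here is a minimal Python sketch of such a call; the host, port 8084, and the xxx key are the placeholders from the command above, so adjust them to your setup:

```python
import json
import urllib.request

# Placeholders from the sentiment command above: port 8084, key "xxx".
SENTIMENT_ENDPOINT = "http://127.0.0.1:8084"
API_KEY = "xxx"

# TEI serves classification models on /predict; the Bearer header
# carries the value passed to the container via --api-key.
req = urllib.request.Request(
    f"{SENTIMENT_ENDPOINT}/predict",
    data=json.dumps({"inputs": "This forum is great"}).encode(),
    headers={"Content-Type": "application/json",
             "Authorization": f"Bearer {API_KEY}"},
    method="POST",
)
print(req.full_url)  # → http://127.0.0.1:8084/predict
# With the container running, sending the request returns label/score pairs:
# print(urllib.request.urlopen(req).read().decode())
```

The Discourse-built detoxify and nsfw containers expose their own APIs; check their routes with docker logs if you want to probe them the same way.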
If your server’s configuration meets the requirements to run the embedding model, you can run the following command to start the embedding model container provided by HuggingFace:
docker run -itd --restart always --name embeddings --shm-size 1g -p 8081:80 -v /opt/tei-cache:/data ghcr.io/huggingface/text-embeddings-inference:cpu-latest --model-id intfloat/multilingual-e5-large --api-key xxx
There are three models to choose from; you can find the full list in the ai embeddings model setting in the admin dashboard. For Chinese language support, use multilingual-e5-large.
You also need to set the --api-key parameter here to prevent abuse.
If your server is located in mainland China and you experience slow connection speeds to HuggingFace, you can add -e "http_proxy=http://192.168.x.x:xxxx" -e "https_proxy=http://192.168.x.x:xxxx" to the command to use a proxy for faster speeds.
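Discourse talks to this container over TEI's standard HTTP API, and you can reproduce such a call yourself as a sanity check. A minimal Python sketch follows; the port 8081 and xxx key are the placeholders from the command above. One detail worth knowing: the multilingual-e5 family expects a "query: " or "passage: " prefix on input text to produce good embeddings.

```python
import json
import urllib.request

TEI_ENDPOINT = "http://127.0.0.1:8081"  # port from the docker command above
API_KEY = "xxx"                         # the value passed via --api-key

def build_embed_request(text: str) -> urllib.request.Request:
    """Build a request for TEI's /embed route.

    multilingual-e5-* models expect a "query: "/"passage: " prefix
    on the input text.
    """
    body = json.dumps({"inputs": f"query: {text}"}).encode()
    return urllib.request.Request(
        f"{TEI_ENDPOINT}/embed",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {API_KEY}"},
        method="POST",
    )

req = build_embed_request("Discourse 论坛")
print(req.full_url)  # → http://127.0.0.1:8081/embed
# With the container running, this returns one embedding vector per input:
# vec = json.loads(urllib.request.urlopen(req).read())[0]
```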
If everything is running correctly (you can check a container's logs with docker logs <container-name>), you can move on to configuring the plugin in the Discourse admin dashboard.
For the three containers provided by Discourse, you only need to configure the corresponding API endpoints and secret keys; you can select the model yourself.
For the HuggingFace embedding model, configure the ai hugging face tei endpoint and ai hugging face tei api key settings, then select the model you are using in the ai embeddings model setting; it must match the model ID you passed when starting the container.
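Putting it all together, the admin settings end up mirroring the ports and keys chosen when starting the containers. The sketch below assumes the plugin's setting names at the time of writing; verify the exact names in your own dashboard, and substitute your server's address and keys:

```
ai toxicity inference service api endpoint  = http://<server-ip>:8082
ai toxicity inference service api key       = xxx
ai nsfw inference service api endpoint      = http://<server-ip>:8083
ai nsfw inference service api key           = xxx
ai sentiment inference service api endpoint = http://<server-ip>:8084
ai hugging face tei endpoint                = http://<server-ip>:8081
ai hugging face tei api key                 = xxx
ai embeddings model                         = multilingual-e5-large
```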
Finally, simply enable the master switch for each feature you want to use.