A capturing system using playwright, as a web service.
This install method is not well supported. Use it only if you know what you are doing and are ready to debug any issues you encounter.
To run lacus using docker or podman, you need either docker, or podman together with podman-compose, installed:
podman-compose build # or docker compose build
podman-compose up # or docker compose up
# go to http://localhost:7100/
You need poetry installed; see the install guide. The poetry shell plugin is not strictly required, but will make your life easier. You can install it this way.
Lacus supports valkey or redis, but valkey is now preferred because of the redis license change. We use valkey below, but as of now, redis also works.
System dependencies:
sudo apt-get update
sudo apt install build-essential
# To run the tests
sudo apt install tcl
You need to have valkey cloned and built in the same parent directory as the lacus repository:
lacus and valkey must sit side by side in the same directory; do not clone valkey inside the lacus directory.
git clone https://github.com/valkey-io/valkey.git
Compile valkey:
cd valkey
git checkout 8.0
make
# Optionally, you can run the tests:
make test
cd ..
Clone this repository if you haven't done it already:
git clone https://github.com/ail-project/lacus.git
The directory tree must look like this:
.
├── valkey => cloned valkey
└── lacus => cloned lacus
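The layout requirement above can be checked before going further. The following is a minimal sketch, not part of lacus: the function name `check_layout` and its messages are hypothetical, and it only tests that a `valkey` directory exists next to (and not inside) the lacus checkout.

```python
from pathlib import Path


def check_layout(lacus_dir: Path) -> list[str]:
    """Return a list of layout problems (empty when the layout looks right).

    Hypothetical helper: it only checks directory placement, not whether
    valkey was actually compiled.
    """
    problems = []
    lacus_dir = lacus_dir.resolve()
    # valkey must not be cloned inside the lacus directory
    if (lacus_dir / "valkey").is_dir():
        problems.append("valkey is cloned inside the lacus directory")
    # valkey must be a sibling of the lacus directory
    if not (lacus_dir.parent / "valkey").is_dir():
        problems.append("valkey not found next to the lacus directory")
    return problems
```

Calling `check_layout(Path.cwd())` from the lacus directory should return an empty list when the tree matches the one shown above.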
From the lacus directory, run:
poetry install
Install the system dependencies required by playwright (will call sudo):
poetry run playwright install-deps
# for pydub:
sudo apt install ffmpeg libavcodec-extra
Initialize the .env file:
echo LACUS_HOME="`pwd`" >> .env
Initialize the config and install playwright browsers:
poetry run update --init
It will launch the instance if you answer yes to the "restart" question.
Edit the config file config/generic.json, and configure it according to your needs.
Start the tool (as usual, from the lacus directory):
poetry run start
You can stop it with:
poetry run stop
With the default configuration, you can access the web interface on http://0.0.0.0:7100,
where you will find the API and can start playing with it.
If you see recurring messages like the ones below, you can remove the UUID from the queue as follows. Note that they are probably due to an unclean stop of lacus, and the entries will be removed automatically after a while.
2023-11-17 08:00:59,936 LacusCore WARNING:[ef7f653d-4cfd-4e7b-9b91-58c9c2658868] Attempted to clear capture that is still being processed.
2023-11-17 08:01:00,939 LacusCore WARNING:[ef7f653d-4cfd-4e7b-9b91-58c9c2658868] Attempted to clear capture that is still being processed.
2023-11-17 08:01:01,941 LacusCore WARNING:[ef7f653d-4cfd-4e7b-9b91-58c9c2658868] Attempted to clear capture that is still being processed.
2023-11-17 08:01:02,944 LacusCore WARNING:[ef7f653d-4cfd-4e7b-9b91-58c9c2658868] Attempted to clear capture that is still being processed.
2023-11-17 08:01:03,947 LacusCore WARNING:[ef7f653d-4cfd-4e7b-9b91-58c9c2658868] Attempted to clear capture that is still being processed.
...
While valkey is running, connect to it via its socket and zrem the entry:
ail@ail-tokyo:~$ cd lacus/
ail@ail-tokyo:~/lacus$ ../valkey/src/valkey-cli -s cache/cache.sock
valkey cache/cache.sock> zrem lacus:ongoing ef7f653d-4cfd-4e7b-9b91-58c9c2658868
(integer) 1
valkey cache/cache.sock>
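If several captures are stuck, the UUIDs can be collected from the log before removing them. The sketch below is an illustration only: the helper names and the `min_repeats` threshold are assumptions, and it relies solely on the warning format and the `lacus:ongoing` key shown above. It prints nothing to valkey itself; it only produces the `zrem` commands to paste into `valkey-cli`.

```python
import re
from collections import Counter

# Matches the warning line format shown in the log excerpt above.
STUCK_WARNING = re.compile(
    r"WARNING:\[([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\] "
    r"Attempted to clear capture that is still being processed\."
)


def stuck_uuids(log_lines, min_repeats=3):
    """Return UUIDs whose warning appears at least min_repeats times.

    min_repeats is an arbitrary threshold chosen for this sketch.
    """
    counts = Counter(
        match.group(1)
        for line in log_lines
        if (match := STUCK_WARNING.search(line))
    )
    return [uuid for uuid, seen in counts.items() if seen >= min_repeats]


def zrem_commands(uuids):
    """Build the valkey-cli commands that remove each UUID from the queue."""
    return [f"zrem lacus:ongoing {uuid}" for uuid in uuids]
```

Since the entries expire on their own after a while, this is only worth doing when the warnings are a nuisance.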
On an initial install, we tell you to run playwright install-deps. After updating an existing lacus instance, you may have to do that again if playwright requires new dependencies. This call isn't executed automatically because it uses sudo, and if the user requires a password, it would block the update script.
If that's the case, run the following command from the lacus directory:
poetry run playwright install-deps
Lacus supports interactive capture sessions powered by xpra. In an interactive session the browser is displayed inside a virtual X display managed by xpra. You can connect to it via a browser-based HTML5 client, interact with the page (log in, solve CAPTCHAs, etc.), and finalize the capture once the page is in the desired state.
On Ubuntu 24.04, add the xpra.org repository and install the dependencies:
curl -s https://xpra.org/gpg.asc | sudo apt-key add -
echo "deb https://xpra.org/ noble main" | sudo tee /etc/apt/sources.list.d/xpra.list
sudo apt update
sudo apt install xpra xvfb
Warning: On desktop systems, installing xpra enables and starts a system-wide xpra service and socket. These are not needed by Lacus (which manages its own per-session xpra servers) and should be disabled:
sudo systemctl disable --now xpra-server.socket
Each interactive capture starts its own short-lived xpra server bound to a private unix socket. Lacus remains the capture controller: it enqueues interactive captures, reports session state, and accepts the final finish signal. The Lacus interactive interface (nicknamed tactus) proxies /interactive/<uuid>/view/ traffic to the matching xpra socket so you can view the HTML5 client in a browser, inside an iframe.
This keeps the project boundaries clear:
- LacusCore owns interactive session lifecycle and xpra transport details.
- lacus exposes the API.
- Tactus handles end-user browser traffic for the HTML5 client.
By default, Tactus listens on 127.0.0.1:7101 and serves /interactive/<uuid>/view/, including the nested xpra transport under /interactive/<uuid>/view/session/.
The bundled wrapper assumes a same-origin deployment for its own UI assets and uses view-local convenience endpoints under /interactive/<uuid>/view/ for status polling and finish. Those Tactus-local wrapper endpoints proxy to the Lacus API routes such as GET /interactive/<uuid> and POST /interactive/<uuid>/finish, so Flask remains the source of truth while the browser keeps a same-origin path for the panel controls.
You need to configure the key remote_headed_settings in config/generic.json.
"remote_headed_settings": {
    "allow_remote_headed": true,
    "tactus_listen_ip": "127.0.0.1",
    "tactus_listen_port": 7101,
    "backend_type": "xpra",
    "public_base_url": "http://127.0.0.1:7101"
}
Set allow_remote_headed to true to enable the interactive interface. tactus_listen_ip and tactus_listen_port are the address and port lacus uses to connect to tactus.
public_base_url is the base URL you'll use to open the interactive page. It is what the user will get when they request the interactive page.
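To make the role of these settings concrete, here is a minimal sketch that reads the block and derives the page a user would open. It is not lacus code: the helper names are hypothetical, treating all five keys as required is an assumption, and so is the idea that the view URL is simply public_base_url followed by the /interactive/<uuid>/view/ route.

```python
import json

# Assumption: every key from the sample block is treated as required here.
REQUIRED_KEYS = {
    "allow_remote_headed",
    "tactus_listen_ip",
    "tactus_listen_port",
    "backend_type",
    "public_base_url",
}


def load_remote_headed_settings(config_text: str) -> dict:
    """Extract remote_headed_settings from the JSON text of config/generic.json."""
    config = json.loads(config_text)
    settings = config.get("remote_headed_settings", {})
    missing = REQUIRED_KEYS - settings.keys()
    if missing:
        raise ValueError(f"missing keys in remote_headed_settings: {sorted(missing)}")
    return settings


def view_url(settings: dict, uuid: str) -> str:
    """Build the user-facing URL (assumed: base URL + documented view route)."""
    base = settings["public_base_url"].rstrip("/")
    return f"{base}/interactive/{uuid}/view/"
```

With the sample values above, `view_url(settings, "<uuid>")` would point at port 7101 on 127.0.0.1; behind a reverse proxy, public_base_url would be the externally reachable address instead.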
Sample reverse-proxy configurations are available in:
- /etc/nginx/lacus.conf.sample
- /etc/apache2/lacus.conf.sample
These examples route only /interactive/<uuid>/view/ to Tactus and send the rest of the API traffic to the main Lacus application. The bundled wrapper uses Tactus-local helper endpoints under /interactive/<uuid>/view/ for its panel controls, and those helpers proxy to the canonical Lacus API routes for session metadata and finish.
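The routing rule those samples implement can be expressed in a few lines. This is an illustrative sketch, not the content of the sample files: the function name is hypothetical, the upstream addresses are the defaults mentioned in this document, and sending the route without a trailing slash to Tactus is an assumption.

```python
import re

# Default addresses taken from this document.
TACTUS_UPSTREAM = "127.0.0.1:7101"
LACUS_UPSTREAM = "127.0.0.1:7100"

# /interactive/<uuid>/view/ and anything nested below it (e.g. .../view/session/).
VIEW_ROUTE = re.compile(r"^/interactive/[^/]+/view(/.*)?$")


def upstream_for(path: str) -> str:
    """Pick the backend for a request path: Tactus for the view route, Lacus otherwise."""
    return TACTUS_UPSTREAM if VIEW_ROUTE.match(path) else LACUS_UPSTREAM
```

Note that GET /interactive/<uuid> and POST /interactive/<uuid>/finish fall through to Lacus, which matches the split described above.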
For systemd deployments, a sample Tactus unit is available in:
/etc/systemd/system/lacus-tactus.service.sample
For supervisord deployments, a sample configuration that includes Tactus is available in:
/supervisord/supervisord-tactus.conf.sample
The internal xpra unix sockets are not meant to be exposed directly to end-users.
Enqueue an interactive capture (or use PyLacus):
UUID=$(curl -s -X POST http://localhost:7100/enqueue \
-H 'Content-Type: application/json' \
-d '{"url": "https://example.com", "remote_headfull": true, "general_timeout_in_sec": 600}')
echo "Session UUID: $UUID"
Poll until the session status is ready. Possible status values are starting, ready, capture_requested, stopped, expired, and error.
curl -s http://localhost:7100/interactive/$UUID
# {"status": "ready", "view_url": "http://127.0.0.1:8080/interactive/<uuid>/view/", ...}
If view_url is present, open it in a browser to interact with the page. When ready, trigger the capture:
curl -s -X POST http://localhost:7100/interactive/$UUID/finish
Retrieve the result (poll until status is not "unknown"):
curl -s http://localhost:7100/capture_result/$UUID
Tactus exists to make interactive sessions testable and deployable without turning the main Lacus web app into a full HTML5/WebSocket proxy. If you already have a third-party application (e.g. LookyLoo, AIL, Pandora) that fronts Lacus, that application can provide the same /interactive/<uuid>/view/ route itself and proxy to the Lacus-managed unix socket instead of using Tactus.
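Whichever component serves the view route, a client still has to poll GET /interactive/<uuid> until the session is ready, as in the curl workflow earlier. A minimal sketch of that loop follows. The function names, the polling interval, and the poll limit are hypothetical, and treating stopped, expired, and error as failures is an assumption about the status values listed earlier. PyLacus may already provide an equivalent.

```python
import json
import time
import urllib.request

# Assumption: these statuses mean the session will never become ready.
FAILED_STATUSES = {"stopped", "expired", "error"}


def fetch_session(uuid, base_url="http://localhost:7100"):
    """GET /interactive/<uuid> and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/interactive/{uuid}") as response:
        return json.load(response)


def wait_until_ready(uuid, fetch=fetch_session, interval=1.0, max_polls=60,
                     sleep=time.sleep):
    """Poll the session until its status is "ready" and return the session dict."""
    for _ in range(max_polls):
        session = fetch(uuid)
        status = session.get("status")
        if status == "ready":
            return session
        if status in FAILED_STATUSES:
            raise RuntimeError(f"session {uuid} ended with status {status!r}")
        sleep(interval)
    raise TimeoutError(f"session {uuid} not ready after {max_polls} polls")
```

The `fetch` and `sleep` parameters are injectable so the loop can be exercised without a running instance; in normal use the defaults talk to the API on localhost:7100.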
If a third-party application fronts Lacus, this is the approach to use:
- GET /interactive/<uuid> on Lacus returns session state plus an optional deployment-facing view_url.
- The third-party app serves /interactive/<uuid>/view/ to end-users.
- The wrapper page, if used, may expose same-origin helper routes under /interactive/<uuid>/view/, but those helpers should proxy to the canonical Lacus API routes (GET /interactive/<uuid> and POST /interactive/<uuid>/finish) rather than reimplementing the control plane.
- That route must proxy both HTTP and WebSocket traffic to the xpra server bound to the session's internal unix socket.
- The raw unix socket path should stay internal to trusted infrastructure. It should not be exposed to normal end-users.
The bundled tactus sidecar is a reference implementation.
One clean deployment model is:
- lacus.service runs the main Lacus control-plane service on 127.0.0.1:7100
- lacus-tactus.service runs Tactus on 127.0.0.1:7101
- nginx listens on the host-facing interface and routes:
  - /interactive/<uuid>/view/ to 127.0.0.1:7101
  - everything else to 127.0.0.1:7100
Then enable and start both services:
sudo systemctl enable lacus.service lacus-tactus.service
sudo systemctl start lacus.service lacus-tactus.service
Finally, install the nginx sample and adjust server_name, TLS, and any access-control rules as needed for your environment.
