Please see the set of transform project conventions for details on general project conventions, transform configuration, testing, and IDE setup.
This filter scans the 'contents' column of an input table using ClamAV and outputs a corresponding table containing a 'virus_detection' column (both column names are defaults and can be configured).
If a virus is detected, the 'virus_detection' column contains the detected virus signature name; otherwise, it is null.
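The per-row behavior described above can be sketched as follows. This is a minimal illustration, not the transform's code: it assumes a hypothetical scan callable that returns a signature name or None (ClamAV plays that role in the real transform), and the names detect_column and fake_scan, the marker string, and the treatment of null cells are all assumptions made for the example.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical scanner type: raw bytes in, detected signature name out
# (None when the content is clean). ClamAV fills this role in the real transform.
Scanner = Callable[[bytes], Optional[str]]


def detect_column(contents: Iterable[Optional[str]], scan: Scanner) -> List[Optional[str]]:
    """Return one 'virus_detection' value per row of the 'contents' column."""
    detections: List[Optional[str]] = []
    for text in contents:
        if text is None:
            # Assumption: a null cell is reported as clean (null detection).
            detections.append(None)
        else:
            detections.append(scan(text.encode("utf-8")))
    return detections


def fake_scan(data: bytes) -> Optional[str]:
    """Stand-in scanner for illustration only: flags a made-up marker string."""
    return "Fake-Test-Signature" if b"MALICIOUS-MARKER" in data else None


rows = ["hello world", "payload with MALICIOUS-MARKER inside", None]
print(detect_column(rows, fake_scan))
# prints [None, 'Fake-Test-Signature', None]
```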
For testing and running this transform locally, we use a unix socket shared with a docker container.
However, Docker for Mac doesn't support a shared unix socket.
For Mac users, ClamAV will be set up by running make venv.
If that script doesn't work for you, please ensure that you have installed the clamd command and that it runs with the local unix socket /var/run/clamav/clamd.ctl, for example by following these steps:
- Install ClamAV with Homebrew
    brew install clamav
- Copy and edit the config files
    cp $(brew --prefix)/etc/clamav/clamd.conf.sample $(brew --prefix)/etc/clamav/clamd.conf
    sed -i '' -e 's/^Example/# Example/' $(brew --prefix)/etc/clamav/clamd.conf
    echo "DatabaseDirectory /var/lib/clamav" >> $(brew --prefix)/etc/clamav/clamd.conf
    echo "LocalSocket /var/run/clamav/clamd.ctl" >> $(brew --prefix)/etc/clamav/clamd.conf
    cp $(brew --prefix)/etc/clamav/freshclam.conf.sample $(brew --prefix)/etc/clamav/freshclam.conf
    sed -i '' -e 's/^Example/# Example/' $(brew --prefix)/etc/clamav/freshclam.conf
    echo "DatabaseDirectory /var/lib/clamav" >> $(brew --prefix)/etc/clamav/freshclam.conf
- Create a directory for the local unix socket
    sudo mkdir -p /var/run/clamav
    sudo chown $(id -u):$(id -g) /var/run/clamav
- Create a directory for the ClamAV database
    sudo mkdir -p /var/lib/clamav
    sudo chown $(id -u):$(id -g) /var/lib/clamav
- Update the ClamAV database
    freshclam
- Edit venv/bin/activate and add the following lines, so that clamd is started when you run source venv/bin/activate
    if [ ! -e /var/run/clamav/clamd.ctl ]; then
      clamd --config-file=$(brew --prefix)/etc/clamav/clamd.conf
    fi
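To check that clamd is actually listening on the socket configured above, you can talk to it directly. The sketch below uses only the Python standard library and clamd's PING and INSTREAM commands in their null-terminated "z" form (each INSTREAM chunk is prefixed with its 4-byte big-endian length, and a zero-length chunk ends the stream). It is an illustration under the assumption that the socket path is /var/run/clamav/clamd.ctl; it is not necessarily how the transform itself connects to ClamAV, and all function names are hypothetical.

```python
import socket
import struct
from typing import List, Optional

# Socket path set via LocalSocket in clamd.conf above (assumed unchanged).
CLAMD_SOCKET = "/var/run/clamav/clamd.ctl"


def build_instream_payload(data: bytes, chunk_size: int = 8192) -> bytes:
    """Frame bytes for clamd's zINSTREAM command.

    Every chunk is prefixed with its length as a 4-byte big-endian integer;
    a zero-length chunk tells clamd the stream is finished.
    """
    parts: List[bytes] = [b"zINSTREAM\x00"]
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        parts.append(struct.pack("!L", len(chunk)) + chunk)
    parts.append(struct.pack("!L", 0))
    return b"".join(parts)


def parse_scan_reply(reply: bytes) -> Optional[str]:
    """Map b'stream: OK' to None and b'stream: <name> FOUND' to '<name>'."""
    text = reply.rstrip(b"\x00").decode("utf-8", errors="replace").strip()
    body = text.split(": ", 1)[-1]
    if body == "OK":
        return None
    if body.endswith(" FOUND"):
        return body[: -len(" FOUND")]
    raise RuntimeError(f"unexpected clamd reply: {text!r}")


def _recv_reply(sock: socket.socket) -> bytes:
    """Read until clamd's null terminator or until the peer closes."""
    chunks: List[bytes] = []
    while True:
        buf = sock.recv(4096)
        if not buf:
            break
        chunks.append(buf)
        if buf.endswith(b"\x00"):
            break
    return b"".join(chunks)


def clamd_ping(path: str = CLAMD_SOCKET, timeout: float = 2.0) -> bool:
    """Return True if clamd answers PONG on the given unix socket."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect(path)
            sock.sendall(b"zPING\x00")
            return _recv_reply(sock).rstrip(b"\x00") == b"PONG"
    except OSError:
        # Missing socket file, refused connection, or timeout.
        return False


def clamd_scan(data: bytes, path: str = CLAMD_SOCKET, timeout: float = 30.0) -> Optional[str]:
    """Scan bytes via INSTREAM; return the signature name, or None if clean."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        sock.sendall(build_instream_payload(data))
        return parse_scan_reply(_recv_reply(sock))
```

After sourcing venv/bin/activate, clamd_ping() should return True; if it returns False, the socket file is missing or clamd is not running.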
The set of dictionary keys holding MalwareTransform configuration values is as follows:
- malware_input_column - specifies the name of the input column to scan (default: contents).
- malware_output_column - specifies the name of the output column that receives the detected virus signature name (default: virus_detection).
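As a small illustration of these keys, the sketch below merges caller overrides onto the documented defaults. The helper name malware_config is hypothetical; how the transform itself receives and validates this dictionary is not shown here.

```python
from typing import Dict, Optional

# Documented defaults for the MalwareTransform configuration keys.
MALWARE_DEFAULTS: Dict[str, str] = {
    "malware_input_column": "contents",
    "malware_output_column": "virus_detection",
}


def malware_config(overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: documented defaults, with caller overrides applied on top."""
    config = dict(MALWARE_DEFAULTS)  # copy so the defaults are never mutated
    config.update(overrides or {})
    return config


# Scan a column named "text" and keep the default output column name.
print(malware_config({"malware_input_column": "text"}))
# prints {'malware_input_column': 'text', 'malware_output_column': 'virus_detection'}
```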
As shown in the output of the local run of the malware transform, the metadata contains several statistics:
- Global statistics:
  - infected: total number of documents (rows) in which any malware was detected.
  - clean: total number of documents (rows) in which no malware was detected.
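Assuming each row's detection is either a signature name or null, the two counters can be sketched as below. The function names are hypothetical, and computing the global totals as a sum of per-table counts is an assumption about how the statistics are aggregated, not something taken from the transform's code.

```python
from typing import Dict, Iterable, List, Optional


def table_stats(detections: Iterable[Optional[str]]) -> Dict[str, int]:
    """Count infected and clean rows in one table's virus_detection column."""
    stats = {"infected": 0, "clean": 0}
    for name in detections:
        stats["clean" if name is None else "infected"] += 1
    return stats


def global_stats(per_table: List[Dict[str, int]]) -> Dict[str, int]:
    """Sum per-table counts into global totals (assumed aggregation)."""
    total = {"infected": 0, "clean": 0}
    for stats in per_table:
        for key in total:
            total[key] += stats.get(key, 0)
    return total


tables = [[None, "Sig-A", None], [None, None]]
print(global_stats([table_stats(t) for t in tables]))
# prints {'infected': 1, 'clean': 4}
```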
The following command line arguments are available in addition to the options provided by the python launcher.
  --malware_input_column MALWARE_INPUT_COLUMN
                        input column name
  --malware_output_column MALWARE_OUTPUT_COLUMN
                        output column name
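The two options above can be mirrored with the standard argparse module, as in the sketch below. This covers only the malware options; the real launcher adds its own options, and the defaults here are assumed to match the configuration defaults documented earlier rather than taken from the launcher's code. The name build_parser is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring only the two malware options listed above."""
    parser = argparse.ArgumentParser(description="malware transform options (sketch)")
    # Assumed defaults, matching the documented configuration defaults.
    parser.add_argument("--malware_input_column", default="contents",
                        help="input column name")
    parser.add_argument("--malware_output_column", default="virus_detection",
                        help="output column name")
    return parser


args = build_parser().parse_args(["--malware_input_column", "text"])
print(args.malware_input_column, args.malware_output_column)
# prints: text virus_detection
```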
To run the samples, use the following make targets:
- run-cli-sample - runs src/malware_transform_python.py using command line args
- run-local-sample - runs src/malware_local.py
- run-local-python-sample - runs src/malware_local_python.py
These targets will activate the virtual environment and set up any configuration needed.
Use the -n option of make to see the details of what is done to run the sample.
For example,
    make run-cli-sample
    ...
Then run
    ls output
to see the results of the transform.
To use the transform image to transform your data, please refer to the running images quickstart, substituting the name of this transform image and runtime as appropriate.