Beyond the goals set by the task, I aimed for simple & agnostic code that anyone can use to deploy the environment quickly & easily.
Initially, I wanted to stay local: deploy the environment on VirtualBox through a provider & supply a Vagrant 'toolbox' VM. However, I ran into technical issues caused by the use of the musl standard library instead of glibc.
I settled on Azure instead, since an Active Directory service is required & Azure is a Microsoft product, so I expected superb integration.
I chose NGINX as a webserver due to familiarity.
In the current scenario, the choice between Terraform & Pulumi is a matter of personal preference; I chose Pulumi.
Terraform is mature, so documentation is plentiful, & it uses a DSL, which suits folks with an 'Ops' background.
Pulumi is more dynamic, since it is consumed as a library from a general-purpose programming language, & caters to 'Devs'.
The end goal is to provide access to static content served by a webserver & a Splunk dashboard via HTTPS with a certificate generated by Active Directory. The access & integration between the webserver & Splunk dashboard will be developed first since it's the focal point of the task. The Active Directory certificate is only secondary to a functioning integration between Splunk & NGINX.
I began by setting up the Linux VM on Azure. Pulumi provides handy templates with boilerplate code, as well as a web-app AI code generator, to make this process quicker.
SSH keys for accessing the VMs will be generated locally & uploaded to Azure to be consumed by the rest of the infrastructure.
The Ansible code is structured around the services & infrastructure: a Linux host running Splunk, Syslog & NGINX, & a Windows host running AD & CA.
First, I wanted to achieve public access to the 'Hello World' page. I defined the required network rules, created an HTML file, wrote the required NGINX configuration & installed Docker as a dependency. NGINX runs in a container rather than directly on the host, which makes local development & testing easier, both as a standalone service & when integrated with Splunk.
During the research, I discovered that my initial idea of sending the NGINX access logs directly to Splunk wasn't an option; an intermediary service, Syslog, is required.
Initially, I deployed a Splunk container locally to get familiar with it: the GUI &, primarily, the API & data collection. Referencing the API docs, I used the curl command to authenticate, create an HEC (HTTP Event Collector) & send a sample payload. I chose HEC for its JSON support.
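The sample-payload step can be sketched in Python as follows. This is an illustration under assumptions, not the original curl commands: the base URL, the all-zero token & the '_json' sourcetype are placeholders. It relies only on the documented HEC conventions of posting to /services/collector/event with an 'Authorization: Splunk <token>' header. The request is built but deliberately not sent.

```python
import json
import urllib.request


def build_hec_request(base_url: str, token: str, event: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for Splunk's HTTP Event Collector."""
    # HEC expects the payload wrapped in an "event" key; sourcetype is optional.
    body = json.dumps({"event": event, "sourcetype": "_json"}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/services/collector/event",
        data=body,
        headers={
            # HEC authenticates with the token in this header format.
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values, assumed for illustration only.
sample_request = build_hec_request(
    "https://localhost:8088",
    "00000000-0000-0000-0000-000000000000",
    {"message": "hello from the sample payload"},
)
```

Sending it would be a matter of passing sample_request to urllib.request.urlopen, together with a TLS context that matches how the Splunk certificate is set up.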
Next, I looked into integrating NGINX with Syslog, in this case, a containerized instance of Syslog-ng.
First, I looked for the NGINX directives that control the log format & destination. For the format I defined a JSON structure, & for the destination I pointed the access log at the Syslog server as hostname:514.
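To make the two directives concrete, here is a small Python sketch that renders them as text. The field selection, the format name 'json_access' & the host name 'syslog-ng' are assumptions for illustration, not the configuration actually used. The directives themselves (log_format with escape=json, access_log with a syslog:server= target) are standard NGINX.

```python
def render_nginx_logging(syslog_host: str, port: int = 514,
                         format_name: str = "json_access") -> str:
    """Render NGINX directives that log JSON access lines to a syslog server."""
    # Hypothetical field selection; the right-hand sides are built-in NGINX variables.
    fields = {
        "time": "$time_iso8601",
        "remote_addr": "$remote_addr",
        "request": "$request",
        "status": "$status",
    }
    pairs = ",".join(f'"{key}":"{var}"' for key, var in fields.items())
    return (
        # escape=json makes NGINX escape variable values for JSON strings.
        f"log_format {format_name} escape=json '{{{pairs}}}';\n"
        # The syslog: prefix sends access logs to a remote syslog server.
        f"access_log syslog:server={syslog_host}:{port} {format_name};\n"
    )


# Assumed host name, for illustration only.
print(render_nginx_logging("syslog-ng"))
```

The output is meant to be placed in the http or server context of the NGINX configuration.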
For testing purposes, I used Netcat to listen on port 514 & accessed the webserver to see what the sent messages looked like. I also checked the NGINX & Syslog logs for debugging.
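The Netcat check can be approximated with a few lines of Python. This is a minimal sketch that assumes NGINX sends its syslog messages over UDP (its default for a host:port target); the function names are made up for this example. Binding to port 514 normally requires elevated privileges, so a higher port may be more convenient for local tests.

```python
import socket


def open_udp_listener(host: str = "0.0.0.0", port: int = 514) -> socket.socket:
    """Open a UDP socket bound to host:port, similar to listening with Netcat in UDP mode."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_one_datagram(sock: socket.socket, bufsize: int = 65535) -> str:
    """Block until one datagram arrives and return it decoded as text."""
    data, _sender = sock.recvfrom(bufsize)
    return data.decode("utf-8", errors="replace")


def listen_forever(port: int = 514) -> None:
    """Print every received message; stop with Ctrl+C."""
    sock = open_udp_listener(port=port)
    try:
        while True:
            print(receive_one_datagram(sock))
    finally:
        sock.close()
```

Calling listen_forever() & then requesting a page from the webserver should print one syslog-framed JSON line per request.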
Next came setting up Syslog to listen for messages from NGINX & pass them to Splunk. Listening was straightforward, while passing the messages to Splunk required effort. First, the message had to stay identical, since NGINX was already handling the formatting. Second, the authentication header required by Splunk's HEC had to be added. Finally, I had to locate the flag for disabling TLS validation, since Splunk used a self-signed certificate.
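The three concerns above can be illustrated in Python. This is a sketch of the idea only, not the Syslog-ng configuration: the function names & the hec_url parameter are hypothetical, & the sample line assumes the usual '<PRI>timestamp host tag: message' syslog framing. Disabling certificate validation is only reasonable for a test setup with a self-signed certificate.

```python
import ssl
import urllib.request


def extract_json_body(syslog_line: str) -> str:
    """Drop the syslog header and return the JSON payload exactly as NGINX wrote it."""
    start = syslog_line.index("{")  # raises ValueError if there is no JSON body
    return syslog_line[start:]


def insecure_tls_context() -> ssl.SSLContext:
    """TLS context that skips validation, for a self-signed test certificate."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before verify_mode is relaxed
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def forward_to_hec(syslog_line: str, hec_url: str, token: str) -> int:
    """Send the untouched JSON body to an HEC endpoint and return the HTTP status."""
    request = urllib.request.Request(
        url=hec_url,  # hypothetical: whichever HEC endpoint the setup uses
        data=extract_json_body(syslog_line).encode("utf-8"),
        headers={"Authorization": f"Splunk {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request, context=insecure_tls_context(), timeout=5) as resp:
        return resp.status
```

For example, extract_json_body('<190>Oct 11 22:14:15 web nginx: {"status":"200"}') returns '{"status":"200"}', leaving the formatting done by NGINX unchanged.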
Once again I used Netcat, this time listening on a different port for messages from Syslog, to verify the message header & body. I also looked into the Syslog & Splunk logs to track down the TLS issue.
With all of this done manually & documented, I proceeded to develop the required automation code. Once done, I tested a deployment of this segment.
The missing piece was the certificate issued by a Windows Certificate Authority.
The first step was an automated deployment of an SSH-accessible Windows Server. After failing to set the same SSH key as on the Linux host, I resorted to generating a password & passing it to Ansible.
The next step was setting up Active Directory & a Certificate Authority to produce a signed certificate for Splunk. My initial Windows experience was that of a 'fish out of water'. When I tried the certreq command, it simply hung. After a day without any progress, I undeployed the cloud environment & deployed a Windows server locally. To my surprise, the command was hanging because of a pop-up GUI dialog, which I couldn't interact with over SSH.
I finally resorted to PowerShell & the community module 'PSCertificateEnrollment'. After a long process of trial & error, a certificate was produced manually & tested locally, & all the steps were translated into Ansible automation.
It was a good challenge & I learned a lot. I had a hard time estimating the number of unknowns I would face along the way: I assumed the certificates part would be quick, & it turned out to be the biggest blocker.