I’ve been setting up a self-hosted CI pipeline — Jenkins running in Docker on a QNAP NAS, tunneled out to GitHub via Cloudflare. It’s not a trivial project. There are a lot of moving pieces: Docker Compose configs, Jenkinsfiles, webhook setup, network routing, MobSF integration. Normally this is the kind of thing where you spend a lot of time jumping between documentation tabs and Stack Overflow, stitching things together slowly.

I decided to do most of it inside Cursor using Claude as the AI backend, leaning into what people are calling “agentic” mode — where instead of asking one question at a time, you describe a goal and let the AI work through it across multiple steps. Here’s what I noticed.

What’s different about agentic flow

Normal AI-assisted coding looks like this: you ask a question, get an answer, copy something, ask a follow-up, repeat. It’s useful, but you’re still doing most of the driving.

Agentic flow is different. You describe what you’re trying to accomplish — “set up a Jenkins pipeline that builds an Android APK, runs Espresso tests, and passes the result to MobSF for a security scan” — and the model starts working through it. It reads your existing files, figures out what’s missing, writes what’s needed, and checks its own output. You’re more like a reviewer than a driver.

The thing that surprised me is how much context it holds across the work. When I was setting up the Cloudflare Tunnel container, it already knew from earlier in the session how Jenkins was configured, which port it was on, and what the Docker network was called. It didn’t need me to repeat myself.

What it was actually like for the QNAP setup

The CI pipeline involved a few distinct problems that would normally each require separate research sessions:

Docker Compose on QNAP — Container Station has some quirks compared to a standard Docker host. Getting the volume mounts right for the Android SDK inside the Jenkins container took some back and forth, but Claude flagged the QNAP-specific paths without me having to look them up.

The Jenkinsfile — Writing a declarative pipeline with three stages (build, test, security scan) and the right failure conditions is the kind of thing where the syntax is fiddly and easy to get wrong. Claude wrote a working first draft, I told it what I wanted to change, and it updated it. I didn’t have to look at the Jenkins documentation once.

MobSF integration — This was the part I was least sure about. MobSF has an API, but wiring it into a Jenkins stage — uploading the APK, polling for the scan result, failing the build above a certain severity score — is not something I’d done before. Claude worked through the API calls, wrote the shell script that handles the integration, and explained what each part was doing as it went.
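To make the poll-and-gate part concrete, here is a minimal Python sketch of that logic. It is not the shell script from this setup, and it deliberately leaves out the MobSF HTTP calls: `fetch` stands in for whatever function asks MobSF for the report and returns `None` while the scan is still running. The names `poll_until_ready`, `gate`, and the `severity_score` field are hypothetical, chosen for illustration, and the "higher score is worse" convention is an assumption taken from the description above.

```python
import time
from typing import Callable, Optional


def poll_until_ready(
    fetch: Callable[[], Optional[dict]],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch() until it returns a report dict, or raise TimeoutError.

    fetch is a placeholder for the real "get scan report" request; it is
    assumed to return None while the scan is still in progress.
    """
    deadline = clock() + timeout_s
    while True:
        report = fetch()
        if report is not None:
            return report
        if clock() >= deadline:
            raise TimeoutError("scan did not finish in time")
        sleep(interval_s)


def gate(report: dict, max_score: float, field: str = "severity_score") -> int:
    """Return an exit code: 0 to pass the stage, 1 to fail it.

    The field name is hypothetical; adjust it to whatever the real
    report actually contains.
    """
    score = float(report[field])
    return 1 if score > max_score else 0


if __name__ == "__main__":
    # Stand-in for a real scan: "not ready" twice, then a report.
    fake_responses = iter([None, None, {"severity_score": 7.5}])
    report = poll_until_ready(lambda: next(fake_responses), interval_s=0.1)
    raise SystemExit(gate(report, max_score=5.0))
```

Returning an exit code keeps the gate easy to wire into a pipeline, since a Jenkins `sh` step fails the stage when the command it runs exits non-zero. Passing `sleep` and `clock` in as parameters is only there so the loop can be tested without actually waiting.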

The Cloudflare Tunnel piece was probably the easiest because the concept is straightforward once you understand it, and Claude explained the “why” well (the tunnel is an outbound connection from your machine, so there are no inbound ports to open, and you get a stable public URL), which made it easier to debug when something wasn’t routing correctly.

Where it still needs you

It’s not a replacement for understanding what you’re building. A few times Claude produced something that looked right but had an assumption baked in that didn’t match my setup. If I hadn’t known enough to catch it, I would have had a broken pipeline and no idea why.

It also doesn’t know what it doesn’t know. If you’re exploring something genuinely novel or obscure, it’ll sometimes fill in gaps with plausible-sounding things that aren’t quite right. You still need to verify.

Agentic mode also works best when you give it a clear target. Vague prompts produce vague results. The more specific you are about what “done” looks like, the better the output.

Would I use it this way again?

Yes, without hesitation. The CI pipeline would have taken me significantly longer to piece together manually — not because any individual part is that hard, but because there are a lot of parts, and keeping them all in your head while also looking things up gets tiring fast.

Having something that can hold the full context of what you’re building, write the boilerplate confidently, and explain its reasoning as it goes makes the whole thing faster and less frustrating. It doesn’t replace the thinking — you still have to know what you want and whether you’re getting it — but it removes a lot of the friction in between.

For infrastructure work especially, where the reward is a thing that runs quietly in the background and never needs attention again, that trade-off feels well worth it.