I bought a £50 security camera off Amazon to improve security at our home.
For a low-cost unit it’s fairly capable: it works at night thanks to an array of IR LEDs, and the companion software lets you control and view the feed from your phone, even outside your home network (which makes me slightly worried!).
It supports recording a video feed to a micro SD card, but I couldn’t actually prise the unit open to install one.
You can set a region of the image that you want the camera to monitor. If it detects movement above a defined threshold it can alert you.
The camera can email you (I couldn’t get that to work) or send you a push notification, but the notification doesn’t include an image of what it saw, so it’s next to useless.
FTP
Not a problem: the camera can transfer images to an FTP site when movement is detected.
I set up a Raspberry Pi as an FTP server and started collecting images.
There are several how-tos for this on the web.
This is a great start, but my Pi will quickly fill up with pictures of my driveway, so we should upload these images to the cloud.
Azure Blob Storage
I wrote a node.js application in TypeScript which picks up new files saved to the FTP site (by listening to the file system) and uploads them as blobs to Azure Blob Storage.
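In outline, it looked something like the sketch below (written here with Node’s built-in fs.watch and the @azure/storage-blob package; the FTP path, container name and connection string variable are illustrative rather than the exact values I used):

```typescript
import { watch } from "fs";
import { basename, join } from "path";
import { BlobServiceClient } from "@azure/storage-blob";

// Illustrative paths and names, not necessarily the ones from my setup.
const ftpRoot = "/home/pi/ftp/upload";
const containerClient = BlobServiceClient
  .fromConnectionString(process.env.AZURE_STORAGE_CONNECTION_STRING!)
  .getContainerClient("camera-images");

// Watch the FTP drop folder and push anything new up to Blob Storage.
watch(ftpRoot, async (eventType, filename) => {
  if (!filename || !filename.endsWith(".jpg")) {
    return;
  }
  const localPath = join(ftpRoot, filename);
  const blobClient = containerClient.getBlockBlobClient(basename(filename));
  try {
    await blobClient.uploadFile(localPath);
    console.log(`Uploaded ${filename}`);
  } catch (err) {
    console.error(`Upload failed for ${filename}`, err);
  }
});
```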
That approach was highly unreliable. The Linux file system raises multiple events for file writes, and my code would pick up partially written files, despite numerous attempts to add delays and debounce code.
ONVIF
ONVIF is an industry forum that promotes standards for IP based security systems.
It basically means the camera exposes a SOAP endpoint (remember that dead end?) which allows you to control it. The standard covers zooming, panning, adjusting the lens, etc., but my camera only lets you take photos and capture the URL for the live video feed.
This means I can use the FTP upload as a signal that the camera has detected movement, and then capture my own photo(s) for upload.
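Wired together it looks roughly like this (a sketch built on the onvif npm package; the camera address, credentials, port and output path are placeholders):

```typescript
import { createWriteStream } from "fs";
import { get } from "http";
const { Cam } = require("onvif"); // the onvif npm package doesn't ship typings

// Placeholder address and credentials for the camera.
const cam = new Cam(
  { hostname: "192.168.1.50", username: "admin", password: "secret", port: 8899 },
  (err: Error | null) => {
    if (err) throw err;

    // Ask the camera, via its ONVIF/SOAP endpoint, where the snapshot URI lives.
    cam.getSnapshotUri((err: Error | null, result: { uri: string }) => {
      if (err) throw err;

      // Pull a fresh JPEG from that URI and save it locally, ready for upload.
      // (Some cameras require credentials on this request too.)
      get(result.uri, (res) => res.pipe(createWriteStream("/tmp/snapshot.jpg")));
    });
  }
);
```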
Nice! I now have complete, clean photos from the camera when movement is detected (or about 500ms after!) uploaded to the cloud.
Slack
But how do I view these images?
I started writing a static web app that can be installed as an icon in iOS. It retrieves the photos directly from blob storage via CORS requests. This worked, but it wasn’t good enough.
Slack has a mobile (and desktop) app and easy integration points. It supports notifications and will display images. Let’s do it.
You need to register a web hook for a channel, which will give you a URL you can post messages to.
How do I securely share the image to slack though? Azure Blob Storage allows you to create a Shared Access Signature, which gives you a URL allowing you to download files from blob storage, whilst keeping the files otherwise private.
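Putting the two together looks something like this sketch (the account name, container, environment variables and expiry are placeholders, and it assumes Node 18+ for the global fetch):

```typescript
import {
  BlobSASPermissions,
  StorageSharedKeyCredential,
  generateBlobSASQueryParameters,
} from "@azure/storage-blob";

// Placeholder account, key and webhook settings.
const credential = new StorageSharedKeyCredential("myaccount", process.env.STORAGE_KEY!);
const webhookUrl = process.env.SLACK_WEBHOOK_URL!;

async function postToSlack(blobName: string): Promise<void> {
  // Create a short-lived, read-only SAS so Slack can fetch the image
  // while the container itself stays private.
  const sas = generateBlobSASQueryParameters(
    {
      containerName: "camera-images",
      blobName,
      permissions: BlobSASPermissions.parse("r"),
      expiresOn: new Date(Date.now() + 60 * 60 * 1000), // valid for an hour
    },
    credential
  ).toString();

  const imageUrl = `https://myaccount.blob.core.windows.net/camera-images/${blobName}?${sas}`;

  // Incoming webhooks take a simple JSON payload; image_url renders inline in the channel.
  await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      attachments: [{ title: "Movement detected", image_url: imageUrl }],
    }),
  });
}
```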
Now I get the photos posted to Slack whenever movement is detected. Cool!
Azure Cognitive Services
There’s a problem. At dusk moths enjoy fluttering around in front of my camera. Spiders like to continually trip the motion detection (every 30 seconds) by building webs in front of it. The wind blows the sunflowers growing in the raised bed.
There are a lot of false positives.
I tried employing Azure’s Custom Vision Service, part of Cognitive Services. You can make use of Microsoft’s pre-trained image recognition models, and provide just a small set of your own classified images for it to learn from.
I uploaded 50 example images, half containing a person and half not, and let it think about it for a couple of hours. After it had trained its model you get an API endpoint which you can then use to classify images.
The endpoint will take either the image in the request body or a link to it, so I might as well use the same URL I use for Slack.
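Something like this sketch does the job (the region, project id, iteration name and tag name are placeholders for my setup, and again it assumes Node 18+ fetch):

```typescript
interface Prediction {
  tagName: string;
  probability: number;
}

// Placeholder endpoint; the real one includes your region, project id and published iteration name.
const predictionUrl =
  "https://westeurope.api.cognitive.microsoft.com/customvision/v3.0/Prediction" +
  "/<project-id>/classify/iterations/<iteration-name>/url";

async function containsPerson(imageUrl: string): Promise<boolean> {
  const response = await fetch(predictionUrl, {
    method: "POST",
    headers: {
      "Prediction-Key": process.env.CUSTOM_VISION_KEY!,
      "Content-Type": "application/json",
    },
    // Pass the same SAS URL that Slack uses, so the service can fetch the image itself.
    body: JSON.stringify({ Url: imageUrl }),
  });

  const result: { predictions: Prediction[] } = await response.json();
  const person = result.predictions.find((p) => p.tagName === "person");
  return !!person && person.probability > 0.5;
}
```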
Now I get my results filtered to those that feature humans (mostly). It’s probably about 90% accurate.
To begin with I just posted the probability along with the image to Slack so I could keep an eye on performance.
Next Steps
I might try to consume the live video feed and do my own movement detection. I’m also thinking of uploading this feed to blob storage and building a UI to watch it back.
I’d like to improve the classification, perhaps running the model locally rather than calling out to the Custom Vision API (you can export the trained models).
I’d also like to see how far I could get by adding more cameras; perhaps they could cooperate?
It was fun pulling this code together. Perhaps I’ll do some refining and release it as open source…