Aesthetics is what happens when our brains interact with content and go, “ooh pretty, give me more of that please.” Whether it’s a starry night or The Starry Night, the sound of a scenic seashore or the latest single from Megan Thee Stallion, the question of how the sensory experiences that move us most deeply manage to do so has spawned an entire branch of philosophy devoted to studying art, in all its forms, as well as how it is devised, produced and consumed. While what constitutes “good” art varies between people as much as what constitutes porn, the appreciation of life’s finer things is an intrinsically human endeavor (sorry, Suda) — or at least it was until we taught computers how to do it too.
The study of computational aesthetics seeks to quantify beauty as expressed in human creative endeavors, essentially using mathematical formulas and machine learning algorithms to appraise a specific piece against existing criteria, reaching (hopefully) the same opinion a human would after performing the same inspection. The field was founded in the early 1930s, when American mathematician George David Birkhoff devised his theory of aesthetics, M = O/C, where M is the aesthetic measure (think: a numerical score), O is order and C is complexity. Under this metric, simple, orderly pieces rank higher — i.e. are more aesthetically pleasing — than complex and chaotic scenes.
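For a feel of how Birkhoff’s ratio behaves, here is a toy sketch in Python; the order and complexity values are invented for illustration and have nothing to do with his actual scoring rules for polygons and vases.

```python
def birkhoff_measure(order: float, complexity: float) -> float:
    """Birkhoff's aesthetic measure M = O / C: more order and less
    complexity yield a higher, i.e. 'more pleasing', score."""
    if complexity <= 0:
        raise ValueError("complexity must be positive")
    return order / complexity

# Hypothetical scores: a tidy, symmetric composition vs. a cluttered one.
print(birkhoff_measure(order=8, complexity=2))   # 4.0 -- ranks higher
print(birkhoff_measure(order=5, complexity=10))  # 0.5 -- ranks lower
```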
German philosopher Max Bense and French engineer Abraham Moles each independently formalized Birkhoff’s initial work into a reliable scientific method for gauging aesthetics in the 1950s. By the ’90s, the International Society for Mathematical and Computational Aesthetics had been founded and, over the past 30 years, the field has evolved further, spreading into AI and computer graphics, with the ultimate goal of developing computational systems capable of judging art with the same objectivity and sensitivity as humans, if not superior sensibilities. As such, these computer vision systems have found use in augmenting human appraisers’ judgments, automating rote image analysis much as we’re seeing in medical diagnostics, and grading video and photographs to help amateur shutterbugs improve their craft.
Recently, a team of researchers from Cornell University took a state-of-the-art computational aesthetics system one step further, enabling the AI not only to determine the most pleasing picture in a given dataset, but to capture new, original — and most importantly, good — shots on its own. They’ve dubbed it AutoPhoto, and their study was presented last fall at the International Conference on Intelligent Robots and Systems. This robo-photographer consists of three parts: the image evaluation algorithm, which assigns an aesthetic score to any image it’s shown; a Clearpath Jackal wheeled robot upon which the camera is affixed; and the AutoPhoto algorithm itself, which serves as a sort of firmware, translating the results of the image grading process into drive commands for the physical robot and effectively automating the search for an optimal shot.
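The paper doesn’t publish its control loop in this form, but the division of labor described above implies a cycle roughly like the sketch below; score_image, propose_move and the robot interface are hypothetical stand-ins, not Cornell’s actual code.

```python
def autophoto_loop(robot, score_image, propose_move, max_steps=20, target=0.9):
    """Sketch of an AutoPhoto-style capture cycle: score the current view,
    move the robot, and repeat until the shot scores well enough."""
    image = robot.capture()
    score = score_image(image)                 # learned aesthetic estimator
    for _ in range(max_steps):
        if score >= target:
            break                              # good enough, stop moving
        command = propose_move(image, score)   # e.g. translate or rotate the base
        robot.drive(command)                   # physical motion via the Jackal
        image = robot.capture()
        score = score_image(image)             # re-grade the new viewpoint
    return image, score
```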
For its image evaluation algorithm, the Cornell team, led by second-year master’s student Hadi AlZayer, leveraged an existing learned aesthetic estimation model, which had been trained on a dataset of more than a million human-ranked photographs. AutoPhoto itself was trained virtually on dozens of 3D images of interior room scenes to spot the optimally composed angle before the team attached it to the Jackal.
When let loose in a building on campus, as you can see in the video above, the robot starts off with a slew of bad takes, but as the AutoPhoto algorithm gets its bearings, its shot selection steadily improves until the images rival those of local Zillow listings. On average it takes about a dozen iterations to optimize each shot, and the whole process takes just a few minutes to complete.
“You can essentially take incremental improvements to the current commands,” AlZayer told Engadget. “You can do it one step at a time, meaning you can formulate it as a reinforcement learning problem.” This way, the algorithm doesn’t have to conform to traditional heuristics like the rule of thirds, AlZayer explained; because it was taught to match the look and feel of its shots to the highest-ranked pictures in its training data, it already knows what people will like.
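In code, that reinforcement learning framing might look something like the sketch below, where camera views are states, small movement commands are actions and the reward is the change in the learned aesthetic score; the environment, scorer and action set are assumed placeholders rather than the team’s real setup.

```python
import random

# Hypothetical action set for incremental camera moves.
ACTIONS = ["forward", "back", "turn_left", "turn_right"]

def step(env, scorer, action):
    """Apply one incremental command; reward is the change in aesthetic score."""
    before = scorer(env.render())
    env.apply(action)                  # small, incremental camera move
    after = scorer(env.render())
    return after - before              # positive reward = the shot got prettier

def random_rollout(env, scorer, steps=12):
    """A dozen-or-so steps, mirroring the iteration count reported above."""
    return sum(step(env, scorer, random.choice(ACTIONS)) for _ in range(steps))
```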
“The most challenging part was the fact there was no existing baseline number we were trying to improve,” AlZayer noted to the Cornell Press. “We had to define the entire process and the problem.”
Looking ahead, AlZayer hopes to adapt the AutoPhoto system for outdoor use, potentially swapping out the terrestrial Jackal for a UAV. “Simulating high quality realistic outdoor scenes is very hard,” AlZayer said, “just because it’s harder to perform reconstruction of a controlled scene.” To get around that issue, he and his team are currently investigating whether the AutoPhoto model can be trained on video or still images rather than 3D scenes.