Blog
The Heidelberg University Library’s new image database, which has been integrated into prometheus for a few days now, comprises 2,343 data records. The aim of this HeidICON pool “UB Anatomische Illustrationen” is the complete formal and content-related indexing of the illustrations in selected 19th-century anatomical plates from the library’s holdings. The plates come from a cooperation with the Institute of Anatomy and Cell Biology of the University of Heidelberg. The selected textbooks, illustrations and records reflect the content of the former teaching and learning collection and document the research focus of the time.
Today we would like to take another look at the image similarity search integrated in prometheus, which creates image vectors based on the SwAV (Swapping Assignments between Views) self-supervised learning algorithm. The vectors are limited to 80 dimensions, which is sufficient for the results, and relate, for example, to color properties, the brightness of pixels and the structural composition of the image. The image vectors are pre-calculated for the images in the image archive and stored in the index, so that a search query only has to calculate the distance between these stored vectors. Vectors have not yet been created and indexed for all images in prometheus, but this is done at regular intervals. If images are deleted from the original databases and are therefore “not available”, they remain in the index until the next update.
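To make the idea of "calculating the distance between vectors" more concrete, here is a minimal Python sketch. It is not the prometheus implementation: the Euclidean distance, the function names, the image ids and the three-dimensional toy vectors (instead of 80 dimensions) are all assumptions chosen for illustration.

```python
import math


def euclidean(a, b):
    """Distance between two image vectors (assumed metric: Euclidean)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(query_id, index, k=4):
    """Return the k (distance, image_id) pairs closest to the query image.

    `index` is a hypothetical stand-in for the search index: a dict that
    maps image ids to their pre-calculated vectors.
    """
    query = index[query_id]
    scored = [
        (euclidean(query, vector), image_id)
        for image_id, vector in index.items()
        if image_id != query_id
    ]
    scored.sort()  # smallest distance first
    return scored[:k]


# Toy index with made-up 3-dimensional vectors.
index = {
    "winter_landscape": [0.9, 0.1, 0.5],
    "summer_dunes": [0.8, 0.2, 0.5],
    "asparagus": [0.7, 0.1, 0.4],
    "night_portrait": [0.1, 0.9, 0.2],
}

results = most_similar("winter_landscape", index, k=2)
for distance, image_id in results:
    print(f"{image_id}: {distance:.3f}")
```

Because only the vectors are compared, an image of sand can end up closer to an image of snow than another winter scene with a different composition. Titles and keywords play no role in this ranking.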
We occasionally receive feedback on the image similarity search from users who are not convinced by the results. For one of our examples, a winter landscape by Witsen, the results include many summer landscapes and even “Asparagus” by Edouard Manet.
What do you think? Are the images similar or not similar?
We do see the similarity in the images, for instance between snow and sand, a similarity that is not evaluated on the basis of the metadata associated with the image. And yes, this way we find some surprising, astonishing and sometimes inexplicable results, especially further down the results list, where the calculated distance is greater.
But there are also fascinating results, as in the case of the „Madonna and Child“ by Giovanni Bellini.
Most of the time, however, we do targeted rather than exploratory searches in order to get fewer unexpected results. In that case we search for “winter landscape”, or use the Advanced Search to look for the keywords “winter landscape”, “winter” or “landscape” in the title.
Have you already tried the image similarity search?
Inspired by the graphic „Dependency“, today we briefly present the most important facts about the development process for the prometheus software.
Our current main development stack consists of Ruby on Rails 7.1, Ruby 3.2, Elasticsearch 8.7, MariaDB 10.11 and Apache 2.4, alongside ImageMagick for processing images, FFmpeg for processing videos and Nokogiri for processing most metadata imports. We test all changes and new features with our test suite, which consists of two parts. First, we maintain a unit test suite with Minitest to test important components of our application in isolation, such as the authorization model and image processing. Second, our e2e suite with selenium-webdriver simulates real users launching a browser and using the prometheus application. No code is ever deployed to our servers without passing all tests first.
To ensure that we can easily onboard new team members while maintaining a consistent coding style, we use RuboCop during our test runs to enforce a few rules. Similarly, we perform security audits with tools like Brakeman. During development, we use a number of debuggers and profilers to isolate bottlenecks and fix hard-to-find bugs.
We operate the image archive on three servers with a total of 12 CPUs and 48 GB of RAM. Recently, these and our other servers were migrated to Debian 12; Debian is the basis for many popular Linux distributions such as Ubuntu or Mint.
Every week the top image bar on the homepage of prometheus changes and gives a first visual impression of the image series of the week. The topics are mostly inspired by current exhibitions, for example this week’s „Anna Oppermann. A Retro Perspective“ in the Bundeskunsthalle in Bonn. We often take an aspect of the exhibition or the artist’s work, such as Anna Oppermann’s “Ensembles” in this case, and look for suitable images in the prometheus image archive. We cannot always rely on a research database with 2,191 data sets on the artist’s work, as we can here.
However, there is always a public image collection at prometheus that you can click on directly (see Fig. “1.”) and where you can find more material on the topic. As of today, you can also click on the thumbnails directly (see Fig. “2.”) and the associated data record will be displayed in the image archive.
We would be happy to accept your topics for a #pictureSeriesOfTheWeek, for an exhibition, but also for projects or campaigns. Get in touch with us and see how it can be implemented.
This year we will once again begin our information section in the picture archive with a look at the annual list of the artists you most frequently searched for last year.
Paula Modersohn-Becker made it to the top in 2022 but this year she came in 9th place.
She was replaced at the top by Pablo Picasso, followed by Vincent van Gogh and Max Ernst. Right behind this trio, Hannah Höch is the most searched-for woman artist of 2023. Along with her, seven other women artists made it into the top 20.
All top 20 in 2023:
1. Pablo Picasso
2. Vincent van Gogh
3. Max Ernst
4. Hannah Höch
5. René Magritte
6. Claude Monet
7. Gabriele Münter
8. Caspar David Friedrich
9. Paula Modersohn-Becker
10. Hilma af Klint
11. Caravaggio
12. Albrecht Dürer
13. Otto Dix
14. Frida Kahlo
15. Nan Goldin
16. Henri Matisse
17. Gerhard Richter
18. Kandinsky
19. Rebecca Horn
20. Cindy Sherman
Compared to last year, a lot has also changed in the list of the ten living artists who attracted the most interest on Google, which internet service providers compiled for Monopol magazine. Last year’s number 1, Banksy, is no longer in the top 10, and neither are Jeff Koons, Cindy Sherman, Damien Hirst and Wolfgang Tillmans.
1. Gerhard Richter
2. Yoko Ono
3. Marina Abramović
4. Anselm Kiefer
5. Leon Löwentraut
6. David Hockney
7. Yayoi Kusama
8. Isa Genzken
9. Kaws
10. Georg Baselitz
Images in prometheus are always displayed within a set size frame in the first and second magnification levels. Portrait or landscape format can be seen there, but how big is the image in reality?
The “size” field provides information about the dimensions of the original.
In our example it is 29.6 × 23.6 cm.
To give you a visual idea of how big or how small the object is directly from the image in prometheus, a size comparison is integrated into the image archive in the form of a group of human figures 175 cm tall. It is shown for all images where height and width are specified.
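The arithmetic behind such a size comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the prometheus code: the function names, the 350-pixel frame height and the reading of the “size” field as height × width are hypothetical.

```python
import re

REFERENCE_HEIGHT_CM = 175.0  # height of the comparison figures


def parse_size(size_field):
    """Parse a size like '29.6 × 23.6 cm' into (height_cm, width_cm).

    Assumption: the field lists height first, then width, in centimetres.
    Returns None if the field does not have this form.
    """
    match = re.fullmatch(r"\s*([\d.]+)\s*[×x]\s*([\d.]+)\s*cm\s*", size_field)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def comparison_scale(size_field, frame_height_px=350):
    """Scale the object and the 175 cm figure to one shared frame.

    Both are drawn at the same pixels-per-centimetre ratio, chosen so that
    the taller of the two fills the (hypothetical) frame height.
    """
    parsed = parse_size(size_field)
    if parsed is None:
        return None  # no comparison without height and width
    height_cm, width_cm = parsed
    px_per_cm = frame_height_px / max(height_cm, REFERENCE_HEIGHT_CM)
    return {
        "object_px": (round(height_cm * px_per_cm), round(width_cm * px_per_cm)),
        "figure_height_px": round(REFERENCE_HEIGHT_CM * px_per_cm),
    }


print(comparison_scale("29.6 × 23.6 cm"))
```

With these assumed values, the 29.6 × 23.6 cm example would be drawn at about 59 × 47 pixels next to a 350-pixel figure, which conveys at a glance that it is a small-format work.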
Around 75% of all data sets in the 124 image databases integrated in prometheus are dated and available for filtering search results by dating.
For example, if you search for “Christmas” in the advanced search, you will get 809 records in the results list.
Are you more interested in depictions of Christmas at a specific time? For example, around 1920? Under “filter by dating”, limit the results from 1920 to 1920. You will receive 30 data sets, both with the exact date “1920” and with time periods that include this year, such as “around 1915”, “1876 – 1924” or “20th c.”.
Want another time period instead? Maybe 100 to 1,000 A.D.?
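The filter behaviour described above, where a search for 1920 also returns “around 1915” or “20th c.”, can be pictured as an overlap test between year ranges. The following Python sketch is an assumption about how such a filter could work, not the actual prometheus query. The records, the field names and in particular the reading of “around 1915” as 1910 to 1920 are made up for illustration.

```python
def overlaps(record_range, query_range):
    """True if two inclusive (start_year, end_year) ranges share a year."""
    record_start, record_end = record_range
    query_start, query_end = query_range
    return record_start <= query_end and query_start <= record_end


def filter_by_dating(records, start, end):
    """Keep the records whose dating range overlaps the range start..end."""
    return [r for r in records if overlaps(r["years"], (start, end))]


# Hypothetical records: "years" is the dating normalized to a year range.
records = [
    {"title": "A", "dating": "1920", "years": (1920, 1920)},
    {"title": "B", "dating": "around 1915", "years": (1910, 1920)},  # assumed +/- 5 years
    {"title": "C", "dating": "1876 – 1924", "years": (1876, 1924)},
    {"title": "D", "dating": "20th c.", "years": (1901, 2000)},
    {"title": "E", "dating": "1850", "years": (1850, 1850)},
]

hits = filter_by_dating(records, 1920, 1920)
print([r["title"] for r in hits])
```

Only record E is dropped here, because its range shares no year with the query. A query from 100 to 1,000 A.D. against the same toy records would return nothing.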
For about two years now, there has been an image similarity search in prometheus that allows you to find similar images within the image inventory based on one image.
It was developed and integrated within Task Area 3 of the NFDI4Culture project by Francisco Mondaca and Jörg Koch.
On the basis of the self-supervised learning algorithm SwAV (Swapping Assignments between Views), image vectors were created that are pre-calculated for all images in the image archive and stored in the index, so that the search engine’s queries are reduced to calculating the distance between these stored vectors. For all new images in prometheus, additional image vectors are created at regular intervals (this started just recently) and stored in the index.
You will find four similar images under the single image view.
By clicking on „Show all“ you can access the view of all similar images of the winter landscape we selected.
Wikidata is a free, shared database and a project of the Wikimedia Foundation with the goal of centralizing structured data and making it usable.
Last year we first integrated Wikidata search links into prometheus and a few months later the possibility to add the associated Wikidata ID to each image in the artist fields.
Clicking on “Add Wikidata ID” opens a window in which you first enter the name and then select the corresponding Wikidata entry. After saving, the Wikidata ID is added. If necessary, you can correct it by clicking on the pen icon.
These Wikidata IDs, the existing ones and the created ones, take you directly from the image archive to the corresponding authority data in Wikidata. And you can search for the Wikidata IDs in prometheus.
So far, 150 entries have been added this way. But there should be more to come in the next few weeks and months. Try it out, too!
With the institute database “Historical Photo Collection” of the Institute for Art and Visual History at the Humboldt University Berlin, another image database has been integrated into prometheus.
Most of the approximately 1,500 older photographs that exist in the institute’s photographic collection today were acquired from antiquarian sources around 1950. Some of them date back to the early days of photography. Currently, around two thirds of the older photographs have been digitally indexed and the first 758 data sets are available for your research in the image archive.