Learning about digital preservation and people around it

Posted: 2009-06-03 23:05:59

IASSIST conference reception at the Tampere old city hall.

Half a year ago I wrote about my new job with environmental and cultural data. During the last seven months I've learned a great deal about ensuring long term access to digital data (or digital preservation, as it is commonly called), about museums, libraries and archives, and about the people who work in them.

My main focus has been on the Finnish National Digital Library project (Kansallinen digitaalinen kirjasto in Finnish), and in particular the long term preservation of digital and digitized cultural works. Those who can read Finnish can take a look at the preliminary functional requirements document, which also gives a good overview of what the project is all about. The information is already slightly outdated, as a working group has been drafting the overall system architecture during April and May, but it's still good for taking a more in-depth look and sending comments if you have any.

One of the best aspects has been meeting people from museums, libraries and archives, and learning at least a little bit about what these cultural institutions are doing behind the scenes. The National Digital Library project seems to be a strong motivator for the previously rather isolated sectors to work together. Interesting things are being revealed even within each sector: for example, how two archives can arrive at two perfectly logical but semantically incompatible descriptions of an object although both are using the same metadata standard.

Finland is not alone: in particular the Europeana portal for cultural works has inspired many countries to set up similar projects during the last couple of years. One of the objectives of the Finnish National Digital Library is to bring content online so that it can be found not only via the Finnish interface but also through Europeana, Google and other search engines. The outlook for online access to cultural works within a few years therefore looks rather bright, at least from a historian's point of view. Old works whose copyright has already expired will be made available, but it's another question how much of the more recent content will be easily accessible.

Being assigned to a national project, I've traveled less than earlier when I was representing CSC in Nordic and EU projects. I haven't missed it much: flying back and forth between cities and spending most of the time in meetings wouldn't satisfy my travel appetite, and it feels bad from the environmental point of view. Nevertheless, I spent a week in Rome in March learning about data repository risk analysis before CouchSurfing with four lively Italians, and last week I was in Tampere at the IASSIST 2009 conference. IASSIST was surprisingly a lot of fun in addition to the information content: jokes were flying around, people liked to party and the photo session at the end wrapped up everything perfectly. The most inspiring presentation was given by Dr. Michael Batty on data visualization, in particular on what can be done using GMap, Image Cutter and other tools developed by the Centre for Advanced Spatial Analysis at University College London.

My work contract has been extended until the end of the year. As before, it's 80 hours per month, approximately half time. As I've been working more than that during the winter and spring, I'll start the summer with a two month vacation, getting back to work in August. Travel destinations will this time include Canada, the United States, Finland and Norway, but that's already the topic of another post.

By the way, the Ministry of Education also finally got out the survey on the current state of geographic information related data in Finland, to which I contributed a little bit. I have no idea whether the information and suggestions in the report will really be used or whether they'll become buried and forgotten. We'll see. If you want to read more about my thoughts on the topic in general, take a look at my earlier article.

Turku 1881. Picture from the Finnish National Archives, Senaatin kartasto IX: 16, copyright expired.

My half-time work experiment felt good enough that I'm doing it again. Same employer (CSC), 80 hours per month contract until the end of May 2009. This time I'm working on services providing storage and access to environmental and cultural data. There are opportunities to study topics I care about and participate in making design choices, which makes the deal even more attractive than last time.

Geographic information wants to be free

A lot of geographic information and data about natural resources is gathered around the world by governmental research institutes. The U.S. has been fairly open in providing access to such data, while in Europe most institutes have been sitting on their databases and selling information under very restrictive terms. Now the situation is slowly changing, partly due to an OECD recommendation which suggests open access to publicly funded research data. The community-maintained OpenStreetMap project has challenged closed models, and the increasing popularity of partly open, privately funded services such as Google Maps plays a role as well. The INSPIRE EU directive (see also the Ministry of Agriculture and Forestry INSPIRE page in Finnish) aims towards interoperability and sharing of geographical data, although its level of required openness falls behind the OECD recommendation.

In Finland, there's a lot of high quality data but it is scattered and little used outside the organizations collecting it. Many voices are raised in support of open access, for example in the Pätevä seminar two weeks ago. In practice progress is rather slow. The public research institutes are pointing towards the Ministry of Finance and the law about fees of public services (Maksuperustelaki) which their funding models are partly based on. Curiously, the key reason for fees in the law is to avoid causing harm to private companies competing in the same domain. However, many sources show (meteorology example) that business overall benefits from freely available public information. A fundamental change of government policy is needed in order to have open access by default also in Finland.

My first task in October was to contribute a little bit to a survey which reviews the current state of geographic information related data in Finland and gives suggestions on what should be done. The survey focused on what data exists, how to make it available and usable for officials, researchers and politicians, and interoperability issues between different datasets. For example, different coordinate systems and semantics are a big hindrance to cross analysis. I personally believe that increased openness will gradually help to improve lower level data compatibility as well. Fully open access raises strong opinions both in favor and against, but there seems to be a more general consensus that at least researchers should have convenient access to data.

Preserving cultural data for the next 100 years

Since the beginning of November I've been participating in the National Digital Library project, which is about access, usability and long term preservation of Finnish cultural data. The Finnish National Archives and the National Library are digitizing old books, newspapers and other documents. In this case, open access happily seems to be the default, at least for old works whose copyrights have expired. You can already check out 18th and 19th century newspapers with full text search, or municipal documents dating back to the Middle Ages. The picture of this blog entry is part of a map of Turku in 1881, retrieved from the National Archives (see the full map). Several Finnish museums are also digitizing their collections. Many new documents, photos, movies and modern art works are already digital when they are created.

Preserving all this material reliably for tens and hundreds of years is a challenging task. The lifetime of computer and storage systems is around five or at most a couple of dozen years. Text in a paper book stays readable for centuries, but digital data will have to be continuously transferred to new, as yet unseen storage systems. Current file formats and the software to access them will become outdated over time. Human error or an attack can have a much greater impact in a digital archive than spilling coffee over one book in a physical library.
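To make the "continuously transferred" part a bit more concrete, here is a minimal Python sketch of one common safeguard when moving files to a new storage system: compute a checksum before and after the copy and refuse to trust the copy if they differ. This is only an illustration under my own assumptions, not something specified by the project; the function names `sha256_of` and `migrate_with_fixity_check` are made up for this example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_with_fixity_check(source: Path, target: Path) -> str:
    """Copy source to target and verify that the copy is bit-identical.

    Hypothetical helper for illustration: raises IOError if the digests
    differ, otherwise returns the shared digest so it can be recorded.
    """
    before = sha256_of(source)
    shutil.copyfile(source, target)
    after = sha256_of(target)
    if before != after:
        raise IOError(f"fixity check failed for {target}")
    return before
```

A real archive would also store the digests separately and re-check them periodically, since a checksum only detects damage; it does nothing about obsolete file formats.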

CSC does not take part in the digitization, but we are currently working on a preliminary requirements specification for the long term storage. Finland is not the first country thinking about it so there's a lot of material available. However, nobody has a complete and definitive solution to the problem yet. There are chances to do pioneering work and contribute to best practices also internationally.

On a personal level, I find projects on environmental and cultural data both very interesting. One challenge is where to focus energy in order to make a difference instead of getting lost between committee meetings and bureaucracy. Another challenge will be to keep work from ruling life, by reserving enough time for hobbies and rest. In November I already surpassed my 80 hours by 50%, not counting when work topics were in my thoughts during free time. However, that's still less than full time and I don't mind working hard if it feels important and rewarding. The half time contract has been a good starting point.

The half time work experiment

Posted: 2008-06-14 21:04:16

A corner of the CSC building with the company logo in morning light.

Eight months ago, I made a contract for working 80 hours per month at CSC. Back then, I signed up until April 2008, and it was extended by one month because the project I worked on had an important testing phase in May. Now I'm off again to enjoy the summer and it's time to review how everything worked out. In short, it has been a good experiment.

My working hours varied between 49 and 105 per month, the average being 82. There was occasionally an urgent task to finish or a problem to fix, but never too much pressure or stress. I was able to put a little bit of effort into a couple of side projects while focusing more than 80% of the hours on my main task, setting up the Finnish part of data storage for the CERN LHC particle accelerator. It has been exciting to have a tiny role in one of the largest projects ever undertaken by humans. The real test will come when the accelerator starts in August, but the data storage installation project reached its goals with positive feedback, so I'd call it a success. A big thanks for that goes, of course, to my colleagues, who did their parts of the job competently and were great to work with.

I generally went to the workplace on Mondays and Thursdays and more randomly on other days, in particular skipping most Fridays. Whenever I had activities in the clubs I belong to or just didn't feel like working, I could cut those days short or stay out of the office. Compared to my previous full time employment, I'd say the hours I spent at work were more efficient. While working half time I probably got about 60% of the work done compared to being a full timer. On the other hand, further cutting down the number of hours per month would have probably lowered efficiency again, because there is always some overhead due to meetings, company events, emails and administrative work.

On the hobbies side, I had time for more or less what I planned to do in the Finnish Linux User Group: no major new projects, but at least helping to bring the group back to life after a couple of problematic years. I also continued going to Chinese lessons, although I didn't progress too much. If I really want to learn to communicate in Chinese I'll have to go to China, or at least put much more effort into studying it than I've done so far.

In Japania ry, a Finnish-Japanese friendship society, I didn't get done nearly as much as I wanted, so most of the things which would need my computer skills are still hanging in the same state they were 8 months ago. A few more personal projects I had in mind also didn't progress at all which annoys me a bit. However, most importantly, I had time to go out with friends, enjoy concerts and parties, read a few books, relax and get enough sleep. April and May were a bit on the busy side, but overall it has certainly been more balanced than a few years ago when I tried to do all the same while being employed full time.

Compared to life before my one year on the road, surprisingly little changed. Same employer, mostly same hobbies, only taking a bit more time through a non-standard work contract. However, that was an important difference. It was like being inside the rat race but looking around and observing instead of rushing full speed to win the race. I could compare the situation before, during and after my year out. There were some interesting discussions with colleagues and friends which would have never taken place had I stayed away for good.

As I wrote in the beginning, I'm again without a job following my own decision not to extend the fixed term contract. This time the leave is less about desire to travel and see the world, although I plan to do a bit of that too. It's more like a step out of the routine leaving room for new ideas.

Books of my homeless friends

Posted: 2007-11-24 16:10:27

Platinainen pilvenreuna books at the Helsinki book fair, October 2007.

A few months ago I wrote about meeting Päivi and Santeri in Phnom Penh, Cambodia. They describe themselves as homeless loiterers and claim not to be doing much of anything, but they've turned out to be quite active in writing books. They started with La Habanera (available in Finnish, in English and even in Hebrew), which tells their story of quitting their jobs and leaving Finland to escape the rat race. The more recently published Platinainen pilvenreuna (in Finnish) describes Santeri's life as an entrepreneur in more detail through the rise and fall of Finnish Software Engineering SOT Oy, his open source software company.

Platinainen pilvenreuna was particularly interesting for me, as during the good days of SOT I was the press secretary of the Finnish Linux User Group FLUG ry and collaborated with Santeri quite often. This is also mentioned in the book. Our relationship obviously changed when he left, but the friendship stayed. I helped a little with the book by reviewing draft versions of it during spring 2007. It was actually quite fun to read about familiar events in the recent history of information technology in Finland while relaxing at a bamboo hut by the Indian Ocean.

Many bits and pieces of information in Platinainen pilvenreuna are made public for the first time. Facts are at least mostly correct. The story is told from Santeri's point of view, which may raise some different opinions on how the more private events actually went, since the relations between him and some other main players were rocky at times. The main author of the book is actually Santeri's wife Päivi, which was probably good both for the balance and fluency of the text.

If you'd like to check out the book without buying it, it should be available in some libraries in Finland, and I have two copies which I'll be happy to lend (one of them is out right now). I personally liked the book and can therefore recommend it. See also Päivi's and Santeri's other books (in English | in Finnish) and Päivi's blog (in Finnish) about literature, reading and writing.

Working again for HIP

Posted: 2007-11-22 00:59:10

CSC announced last week the collaboration between Helsinki Institute of Physics (HIP) and CSC to build the Finnish part of the data storage and computing environment required to observe what happens to minuscule particles when they hit each other really hard. Digitoday also picked up the story (in Finnish).

I worked for HIP during the summers of 1999-2002 (the first two being physically located at CERN) and additionally part time alongside my studies during 2001-2002. After my exchange year in Japan 2002-2003, I joined CSC. Now, my task there is setting up the data storage system of this experiment (with the help of a couple of colleagues, of course). In other words, I'm again working for HIP, although not directly employed by them. Funny how you always end up doing pretty much the same things... :)

