Blog software upgrade helped with comment spam

Posted: 2013-09-30 18:41:00, Categories: General, Free software, 487 words (permalink)

I recently upgraded my blog software to the most recent version of b2evolution. The main reason was simply to stay current with security and bug fixes. Another reason was the new "mass delete" option for quickly getting rid of comment spam. As a pleasant surprise, most of the daily load of spam disappeared without my even needing to go through and remove it.

My blog doesn't have a large number of followers who regularly comment on the articles. For a long time, over 90% of the comments have been spam: senseless junk messages trying to advertise some product. I have never really understood why the spammers have been so persistent, since the comments are moderated, which means that their messages never reach the public anyway. Besides, a Captcha plugin at the bottom of the comment form at least attempts to make sure that commenters are humans and not spam-generating robots. I don't use the centralized blacklist of b2evolution, because I don't want to block domains or content based on keywords unless I have selected the blocking criteria myself.

In any case, by the summer of 2013 the situation had gotten pretty bad. The blog was collecting more than a thousand spam comments per month, the traffic of my website exceeded 10 GB per month, and over 70% of hits were coming from China. That seemed strange: my site has no content about China or in the Chinese language. A couple of Chinese sites link to a few of my photos, but that didn't explain the huge number of visits. The logs showed that most of the traffic consisted of requests for blog articles and of comments being posted on them.

After the upgrade, the number of spam comments immediately dropped to about one per day. The share of hits coming from China dropped to less than half of the total, although at 40% it's still surprisingly high. In terms of transferred data, connections from the United States lead with about 30% of the total, while Chinese traffic is less than 10%. Before the upgrade, the amount of traffic from China was equal to that from the U.S. Finland comes third, but far behind the top two. That sounds reasonable: since I write in English, it's no big surprise to have more visitors from the U.S. than from Finland.

The most likely explanation that comes to mind is that the old version of the software had a bug that allowed automated scripts to post comments without going through the Captcha. Although I didn't change the Captcha plugin itself, it now seems to keep most of the spam out again. Alternatively, the spammers' scripts simply haven't been updated yet to match the new b2evo. Whatever the reason, let's hope it stays like this at least for a while.

Non-spam comments are welcome. :-) In particular, if you notice that something has broken due to the upgrade, I'd be happy to hear about it.

Turku 1881. Picture from the Finnish National Archives, Senaatin kartasto IX: 16, copyright expired.

My half-time work experiment felt good enough that I'm doing it again: same employer (CSC), with an 80-hours-per-month contract until the end of May 2009. This time I'm working on services that provide storage of and access to environmental and cultural data. There are opportunities to study topics I care about and to participate in making design choices, which makes the deal even more attractive than last time.

Geographic information wants to be free

A lot of geographic information and data about natural resources is gathered around the world by governmental research institutes. The U.S. has been fairly open in providing access to such data, while in Europe most institutes have been sitting on their databases and selling information under very restrictive terms. Now the situation is slowly changing, partly due to an OECD recommendation that suggests open access to research data produced with public funding. The community-maintained OpenStreetMap project has challenged closed models, and the increasing popularity of partly open, privately funded services such as Google Maps plays a role as well. The INSPIRE EU directive (see also the Ministry of Agriculture and Forestry INSPIRE page in Finnish) aims at interoperability and sharing of geographic data, although the level of openness it requires falls short of the OECD recommendation.

In Finland, there's a lot of high-quality data, but it is scattered and little used outside the organizations collecting it. Many voices have been raised in support of open access, for example at the Pätevä seminar two weeks ago. In practice, progress is rather slow. The public research institutes point to the Ministry of Finance and the law on fees for public services (Maksuperustelaki), on which their funding models are partly based. Curiously, the key reason for fees in the law is to avoid harming private companies competing in the same domain. However, many sources show (meteorology being one example) that business overall benefits from freely available public information. A fundamental change in government policy is needed to make open access the default in Finland too.

My first task in October was to contribute a little to a survey that reviews the current state of geographic information data in Finland and suggests what should be done. The survey focused on what data exists, how to make it available and usable for officials, researchers and politicians, and interoperability issues between different datasets. For example, differing coordinate systems and semantics are a big obstacle to cross-analysis. I personally believe that increased openness will gradually help to improve lower-level data compatibility as well. Fully open access provokes strong opinions both for and against, but there seems to be a broader consensus that at least researchers should have convenient access to the data.

(Added a link to the survey on geographic information data on May 31, 2009.)

Preserving cultural data for the next 100 years

Since the beginning of November I've been participating in the National Digital Library project, which is about access, usability and long-term preservation of Finnish cultural data. The Finnish National Archives and the National Library are digitizing old books, newspapers and other documents. Happily, in this case open access seems to be the default, at least for old works whose copyright has expired. You can already browse 18th- and 19th-century newspapers with full-text search, or municipal documents dating back to the Middle Ages. The picture in this blog entry is part of an 1881 map of Turku, retrieved from the National Archives (see the full map). Several Finnish museums are also digitizing their collections. Many new documents, photos, movies and modern artworks are already digital when they are created.

Preserving all this material reliably for tens or hundreds of years is a challenging task. The lifetime of computer and storage systems is around five years, or at most a couple of decades. Text in a paper book stays readable for centuries, but digital data will have to be continually transferred to new, as yet unseen storage systems. Current file formats and the software needed to access them will become outdated over time. Human error or an attack can have a much greater impact on a digital archive than coffee spilled over a single book in a physical library.

CSC does not take part in the digitization, but we are currently working on a preliminary requirements specification for the long-term storage. Finland is not the first country to think about this, so a lot of material is available. However, nobody has a complete and definitive solution to the problem yet. There are opportunities to do pioneering work and to contribute to best practices internationally as well.

On a personal level, I find both the environmental and the cultural data projects very interesting. One challenge is deciding where to focus my energy in order to make a difference, instead of getting lost in committee meetings and bureaucracy. Another challenge will be keeping work from ruling my life by reserving enough time for hobbies and rest. In November I already exceeded my 80 hours by 50%, working about 120 hours, not counting the times work was on my mind during free time. However, that's still less than full time, and I don't mind working hard if it feels important and rewarding. The half-time contract has been a good starting point.

Books of my homeless friends

Posted: 2007-11-24 16:10:27, Categories: Travel, Work, Free software, Literature, 352 words (permalink)

Platinainen pilvenreuna books at the Helsinki book fair, October 2007.

A few months ago I wrote about meeting Päivi and Santeri in Phnom Penh, Cambodia. They describe themselves as homeless loiterers and claim not to be doing much of anything, but they've turned out to be quite prolific writers. They started with La Habanera (available in Finnish, in English and even in Hebrew), which tells the story of how they quit their jobs and left Finland to escape the rat race. The more recently published Platinainen pilvenreuna (in Finnish) describes Santeri's life as an entrepreneur in more detail, through the rise and fall of his open source software company, Finnish Software Engineering SOT Oy.

Platinainen pilvenreuna was particularly interesting for me because, during SOT's good days, I was the press secretary of the Finnish Linux User Group (FLUG ry) and collaborated with Santeri quite often. This is also mentioned in the book. Our relationship obviously changed when he left, but the friendship remained. I helped a little with the book by reviewing draft versions of it in spring 2007. It was actually quite fun to read about familiar events in the recent history of information technology in Finland while relaxing in a bamboo hut by the Indian Ocean.

Many bits and pieces of information in Platinainen pilvenreuna are made public for the first time. The facts are, at least, mostly correct. The story is told from Santeri's point of view, so opinions may differ on how the more private events actually went; the relations between him and some of the other main players were rocky at times. The main author of the book is actually Santeri's wife Päivi, which was probably good for both the balance and the fluency of the text.

If you'd like to check out the book without buying it, it should be available in some libraries in Finland, and I have two copies that I'll be happy to lend (one of them is on loan right now). I personally liked the book and can recommend it. See also Päivi's and Santeri's other books (in English and in Finnish) and Päivi's blog (in Finnish) about literature, reading and writing.

Kindle and the future of electronic books

Posted: 2007-11-21 01:34:00, Categories: Free software, Literature, 526 words (permalink)

Picture of the Kindle ebook reader by ShakataGaNai.

A couple of days ago, Amazon unveiled its electronic book reader, called Kindle. Newsweek published an extensive story about it. In short, Kindle is a roughly A5-sized, 300-gram electronic device with a daylight-readable grayscale display and the ability to buy electronic books from Amazon.com. See also Gizmodo's hands-on test and Wired's critical comparison between Kindle and Sony Reader for more information.

There have been several attempts in the e-book arena before, none of them hugely successful. That's interesting, considering that most other forms of content, such as photos, music and videos, have already gone digital. The Net is used for many kinds of text content, such as email and news, but books and magazines are still mainly read on paper.

The most important new feature in Kindle compared to other similar readers is its mobile Internet connection. Data transfer fees are included in the price, at least in the U.S. That makes it easy to retrieve a new book at any time, without having to connect the reader to a computer. Compared to PDAs, the biggest differences are the daylight-readable screen and the longer battery life. Kindle is advertised to last up to one week on a single charge, or two days with the wireless connection enabled. That finally sounds long enough not to have to worry about charging all the time.

Books are delivered to the device in a proprietary, DRM-protected Amazon format. Kindle supports some unprotected formats as well, but most of the content will be delivered as protected files. Fortunately, it seems that purchased books can at least be backed up to a computer and still be read even if Amazon decides to discontinue the service. However, copying them and reading them freely on any device doesn't seem to be possible. That's not a surprise, but it's still disappointing.

A lightweight, daylight-readable, Internet-connected device with long battery life for reading books and other information on the web (Kindle provides access to Wikipedia and some kind of limited web browsing) sounds attractive. I'm certainly a potential customer. However, I hate being locked into a proprietary format for the content I buy. Therefore I'm rather unwilling to go shopping for DRM-crippled books; unprotected PDFs would be much better. Similarly, I've already bought plenty of music online in MP3, Ogg Vorbis and FLAC formats, but not a single copy-protected song. I might reconsider if the restrictions are circumvented first; after all, I did buy DVDs after DeCSS came out.

It's too early to say whether Kindle will succeed. However, backed by a company as large as Amazon, which is able to provide a huge selection of content, it will certainly not be ignored in the market. Most people seem to be less strict about DRM than I am, as iTunes taking over a large chunk of music sales has shown. Whether the killer device turns out to be the newly released Kindle or one of its successors, I do believe that the real transition from paper to electronic books has now begun.

(The picture in this article is from Wikipedia, taken by user ShakataGaNai, see the picture page for details.)

Open source discussion in Tietokone magazine

Posted: 2007-11-17 13:45:45, Categories: Work, Free software, 178 words (permalink)

I was recently invited to a panel discussion about open source, organized by Tietokone, the largest IT magazine in Finland aimed at business users. An article about the discussion appeared in the latest issue (Tietokone 13/2007), and the full recording is available online (in Finnish).

The discussion focused on commonly heard claims, such as "Open source is socialism", "One cannot make money with open source", "There is no innovation in open source" and "Open source software is difficult to use". The premise wasn't that the claims were all true; rather, the idea was to debunk some common false assumptions and present a more balanced view of the topic.

One of the invited participants didn't make it, so in the end it was just me and Janne Pikkarainen, one of the admins of the MBnet website, with Tietokone editor Kari Haakana throwing the claims at us from the other side of the table. It still made for a good discussion, so if you're interested in Free / open source software (and understand Finnish), I recommend listening to the recording.

