Saturday, 2 January 2010

RESOURCES FOR DITA POSTINGS

My module lecture materials were the backbone of my learning in this module and therefore the basis of the content of these blog posts. Wherever I cite (Lecture materials), I am referring to the lecture materials from the session I am writing about.

During my learning I also referenced:
http://www.city.ac.uk/tsg/unix/DoingMore.pdf
http://www.getty.com/
http://www.google.com/
http://homepage.mac.com/pmsexton/Germany/PhotoAlbum14.html
http://www.webopedia.com/DidYouKnow/Internet/2002/JPG_GIF_PNG.asp
http://www.w3.org/
http://www.w3.org/People/Berners-Lee/
http://www.w3schools.com/Css/default.asp
http://www.wikipedia.com/ (entries on HTML, XML, SQL)

My programming work can be viewed at:
http://www.student.city.ac.uk/~abhp725/indexsession4rvsd2.html
http://www.student.city.ac.uk/~abhp725/mariaefstathioufirstjavascript.html

My blog can be viewed at:
http://www.thesmilinglibrarian.blogspot.com/

Hmmm … My Information Architect Ex-Boyfriend in 1997 Was Actually Cool?? (Session 10)

Information architecture and information technology are like architecture and construction: each relies on the other. It is a symbiotic relationship, and it will always affect my work as an information manager. A website is like a building or, as in today's lecture, like a grocery store, which is quite ‘Godard-esque’ given that many people actually do their grocery shopping online.

To identify the mystery vegetable I first entered my own descriptors ('vegetable, large white bulb, multiple green stems'), which returned pictures of spring onions. I then searched for ‘vegetable pictures’, which returned pictures of veggie cornucopias. Then I searched 'vegetable descriptions', which returned child-oriented sites. Then 'guide to vegetables' or 'vegetable guide' yielded a site listing vegetables, where I had to click on each vegetable's name to see its picture and description. I chose names I couldn't identify until I found a picture that vaguely resembled the mystery vegetable. I entered that vegetable's name into Google Image Search and the mystery picture came up third: a kohlrabi. This took over 5 minutes, but it would have taken me over an hour at the library.

The Tesco site has excellent architecture. The 'shopping list' feature gives the user fantastic control. The lack of graphics and the copy-heavy product descriptions work well: it is a practical architectural choice that favors speed over prettiness, though the user can still choose to see images. Amazon is busy and annoying, with too many choices, but I can create a wish list (a library of things I want).

Information architecture has a wide-reaching impact. Data centers will proliferate as e-books become de rigueur and applications, many of them delivered through JavaScript in the browser, move from our desktops into the cloud. Information architects must manage this tidal wave of information on the WWW, and our world will look very different. The learning materials and tools in this program are one example: they are entirely digital, and a significant amount of information architecture was needed to put our learning experience together comprehensively.

Developing Applications Rather Than Applying My Make-Up (Session 9)

I learned that nothing I have done to date is actually programming: it is just ‘declarative’ code that describes the digital world. JavaScript is a programming language that allows user interaction and makes online experiences more personalized. (Lecture materials)

Here is my sad JavaScript:

Maria's JavaScript

JavaScript is embedded in HTML (inside a script element, or in a separate file that the page links to) and is run by the browser as it processes the page, which is why it is written within an HTML file.

Durer’s Drunk Bunny Must Feel Inverted Too (Session 8)

I used Boolean search terms in Bing. With the 'all of these terms' option, the query "arts & crafts movement" AND "architecture" AND "best" produced great results, even better than the same query without that option. When I replaced AND with OR in exactly the same query, it gave me results I have no interest in (most were about cooking). No coherent answer was returned when I asked "Who is the best architect in the arts and crafts movement?", which is not surprising, as the answer is widely debated.
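
To see why AND narrowed my results while OR let the cooking pages in, here is a minimal Python sketch. The four ‘pages’ and their term sets are made up for illustration, and a real search engine does far more than this (ranking, phrase matching, stemming):

```python
# Hypothetical "web pages", each reduced to the set of terms it contains.
pages = {
    "page1": {"arts & crafts movement", "architecture", "best", "morris"},
    "page2": {"arts & crafts movement", "furniture"},
    "page3": {"best", "recipes", "cooking"},
    "page4": {"architecture", "bauhaus"},
}

def search_and(terms):
    """Boolean AND: keep only pages that contain every term."""
    return sorted(p for p, words in pages.items() if all(t in words for t in terms))

def search_or(terms):
    """Boolean OR: keep any page that contains at least one term."""
    return sorted(p for p, words in pages.items() if any(t in words for t in terms))

query = ["arts & crafts movement", "architecture", "best"]
print(search_and(query))  # ['page1']
print(search_or(query))   # ['page1', 'page2', 'page3', 'page4']
```

OR matches any page with even one term, so the made-up cooking page (page3) gets in just because it contains ‘best’.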

To find a photo of the Durer statue of the Drunken Bunny in Nuremberg, I created two inverted files:
DOC 1: Durer and the Nuremberg rabbit statue
DOC 2: Gothic sculpture photos and archived in a ‘collection’

Inverted files allow full-text searching of the terms in documents. They are the central data structure that makes typical search engines work. Rather than Google searching through a forward index, which lists the words in each document and would take significant power and time to scan, an inverted index lists, for each word, the documents that contain it. So instead of searching the word list of every document on the WWW, the engine looks up a word such as ‘Durer’ once and gets back the list of documents that contain it. (Lecture materials) Inverted files don’t work as well for images, which must be specifically tagged by the creator or user. This is why Flickr is better than Google for finding photos: users throughout the WWW tag photos with descriptive terms on Flickr. On Web 1.0 sites, users cannot tag photos to make them more searchable.
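
A minimal Python sketch of an inverted index, assuming simple whitespace tokenization and lower-casing (real engines also strip punctuation, stem words and store word positions). The two documents are loosely based on DOC 1 and DOC 2 above:

```python
from collections import defaultdict

# Two hypothetical documents, loosely based on DOC 1 and DOC 2.
docs = {
    1: "Durer and the Nuremberg rabbit statue",
    2: "Gothic sculpture photos archived in a collection",
}

def build_inverted_index(documents):
    """Map each lower-cased word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

index = build_inverted_index(docs)

# One lookup replaces a scan through every document's word list.
print(sorted(index["durer"]))                       # [1]
# An AND query intersects the document lists of each term.
print(sorted(index["durer"] & index["statue"]))     # [1]
print(sorted(index["gothic"] & index["statue"]))    # []
```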


My Se-Qu-eL To Basic Algebra, But Actually Useful This Time Around (Session 7)

Doing complicated things on a black screen makes me feel smart. Following the directions, I am rushing through the exercises, and I love it. It reminds me of algebra class, like I am writing math. SQL commands are fantastic. This is how we find things, how we impose order on heaps of information. I am pleased that I can query effectively within a given set of rules.

SQL is a language for querying and managing data held in a DATABASE. You can create tables that hold the relevant data, organized into columns, so that the database is useful to the people who need to get information from it. (Lecture materials)

I searched for SQL on Google and learned that it is based on relational algebra. It is satisfying that my first reaction to SQL was the same as when I learned basic algebra. I attempted to make a database table containing my favorite magazines, their publishers and category names. It didn’t work. But if someone else were writing the code, I believe I could guide them well on which publications are important within a given subject and on each publisher’s name.
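
A sketch of what that magazine table might look like, using Python's built-in sqlite3 module and an in-memory database. The table name, columns and rows are illustrative assumptions, not lecture material:

```python
import sqlite3

# In-memory database; the layout and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE magazines (
        title     TEXT PRIMARY KEY,
        publisher TEXT NOT NULL,
        category  TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO magazines (title, publisher, category) VALUES (?, ?, ?)",
    [
        ("Vogue", "Condé Nast", "Fashion"),
        ("Wired", "Condé Nast", "Technology"),
        ("The Economist", "The Economist Group", "Current affairs"),
    ],
)

# Select: every title from one publisher, alphabetically.
rows = conn.execute(
    "SELECT title, category FROM magazines WHERE publisher = ? ORDER BY title",
    ("Condé Nast",),
).fetchall()
print(rows)  # [('Vogue', 'Fashion'), ('Wired', 'Technology')]

# Group: how many titles each publisher has in the table.
counts = conn.execute(
    "SELECT publisher, COUNT(*) FROM magazines GROUP BY publisher ORDER BY publisher"
).fetchall()
print(counts)  # [('Condé Nast', 2), ('The Economist Group', 1)]
```

The WHERE and GROUP BY clauses are where the ‘algebra’ feeling comes from: each query describes the set of rows wanted rather than the steps to find them.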

Oooh, I Just Loooove Your New Cascading Style Sheet … Did You Get It On Net-A-Porter? (Session 6)

Cascading Style Sheets are my answer to how websites look good. I understand why those who are good at writing CSS are compensated handsomely. It is important to work closely with colleagues who have mastered these style sheets, because your resources should be presented in a user-friendly, engaging manner. Style is the key.

CSS does not always produce consistent styles across browsers, because browsers interpret the code differently, and this creates user dissatisfaction. CSS work is necessary to create efficient, useful and popular information resources, but I am frustrated by my inability to rattle off code. This can't be learned in a heartbeat.

I applied a style sheet that I found on W3Schools to my website:
Maria's HTML

"THIS IS LIKE OPENING THE BONNET TO SEE WHAT IS INSIDE." - Professor Andy MacFarlane

What A Coincidence! I Love To Consider Myself Valid And Well-Formed As Well … (Session 5)

XML is a more flexible relative of HTML. XML and CSS are two technologies agreed upon by the W3C (World Wide Web Consortium) to support the exchange of information on the WWW. Where HTML has a fixed set of tags, XML lets authors define their own tags to describe what the content means (its semantics). XML is a simplified subset of SGML, an older and more complex mark-up language, so it keeps much of SGML's expressive power while requiring far less processing. The structure of an XML document, its elements and attributes, can be defined by a collection of declarations called a DOCUMENT TYPE DEFINITION (DTD). (Lecture Materials)

My colleague and I flew through the XML code review exercises, spotting every sample that was not well-formed or not valid. It was great fun, like playing a word game. Grammar and syntax aren't just concepts in books anymore; they are actually tools. XML is a language that more robustly serves information search and retrieval. It is a stack of virtual shelves upon which information lives.
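
The well-formed half of that exercise can be checked mechanically. This Python sketch uses the standard library's ElementTree parser, which rejects text that is not well-formed; the sample image records are hypothetical. Checking validity against a DTD needs a validating parser (for example the third-party lxml library), which this sketch does not attempt:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """True if the text parses as XML. This checks well-formedness only,
    not validity against a DTD."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical image records.
good = "<image><title>Santorini</title><location>Greece</location></image>"
bad = "<image><title>Santorini</location></image>"  # mismatched closing tag

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```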

An example of XML at work in my field, which is media and marketing, is GETTY IMAGES, one of the leading image banks in the world. See my search for images of Greece, both editorial and creative:
Getty Search - Images from Greece

As I understand it, Getty uses XML so that searches return images tagged with the terms the user specifies. The structure of that metadata follows rules set by DTDs, while TCP/IP governs how it travels across the internet.

Issues arise when metadata has not been written into the XML in the way a user expects. I entered the word ‘Tsangarada’ into the Getty search and it did not return any images. It is unlikely that they have no images taken there, because it is notoriously beautiful; more likely, they have not attached that place name as metadata to any images in their archive.

Many advertising banners on the WWW also draw on XML, and this blog publishes its feed in XML. I actively use XML every day.

Graphic Designers Have Painstaking Jobs (Session 4)

Amazing - I have updated my webpage in my own HTML with a picture of my mother and me in Paris and an image of David Hockney from The New York Times, because today was all about images and graphics.

To view my altogether pathetic work please click on the following: Maria's HTML

I followed course instructions with images – manipulated them, changed sizes and colors with IrfanView 4.23. It was straightforward but I don’t enjoy the work. It’s great to know what is available, if needed, but I will post pictures on my blog with the tools available through Blogspot, and leave graphic design to the designers. For my marketing work I use MailingManager which allows images to be uploaded and manipulated (size, color, and quality).

The difference between a GIF and a JPG lies in how much information each stores (bits per pixel) and how it is compressed. A JPG stores millions of colors (24 bits per pixel) but uses lossy compression, which suits photographs. A GIF displays at most 256 colors (8 bits per pixel), which keeps files small and makes it well suited to simple graphics such as logos and icons. A PNG uses lossless compression, so no information is lost when the file is compressed, and it supports both limited palettes and full color, which makes it a good choice for web graphics. (Reference)
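
The bits-per-pixel arithmetic behind those numbers, as a small Python sketch. The 640 by 480 image size is an arbitrary example, and the sizes are raw pixel data before any GIF, PNG or JPG compression:

```python
def colors(bits_per_pixel):
    """Number of distinct colors a pixel can take at this bit depth."""
    return 2 ** bits_per_pixel

def raw_size_bytes(width, height, bits_per_pixel):
    """Uncompressed pixel data in bytes (8 bits per byte)."""
    return width * height * bits_per_pixel // 8

print(colors(8))                     # 256, the GIF palette limit
print(colors(24))                    # 16777216, the "millions of colors"
print(raw_size_bytes(640, 480, 8))   # 307200
print(raw_size_bytes(640, 480, 24))  # 921600
```

Tripling the bits per pixel triples the raw data, which is why compression, lossy or lossless, matters so much on the WWW.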

Of Poets And Physicists … (Session 3)

This week’s graphics gave a bird’s eye view of a digital world map. One showed different computers linked together, and the next showed the Domain Name System (DNS), which illustrated how the data associated with these names is stored digitally in different places. There are now billions of computers in the world. Data stored on them is identified both by names that are more intuitive to use and by numeric addresses that are exchanged during requests. (Lecture materials)
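
A toy Python sketch of the name-to-number idea. The names and addresses are hypothetical (the .example domain and the 192.0.2.0/24 block are reserved for documentation), and a single dictionary stands in for what real DNS spreads across many servers:

```python
import ipaddress

# Toy name table: hypothetical names mapped to documentation-only addresses.
name_table = {
    "library.example": "192.0.2.10",
    "catalogue.library.example": "192.0.2.11",
}

def resolve(name):
    """Return the numeric address for a human-friendly name, or None."""
    address = name_table.get(name)
    return ipaddress.ip_address(address) if address else None

addr = resolve("library.example")
print(addr)       # 192.0.2.10
print(int(addr))  # 3221225994, the same IPv4 address as one 32-bit number
```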

URLs (Uniform Resource Locators) identify files, within folders, within domains, within a country. The information is transferred with HTTP (Hypertext Transfer Protocol), which is ‘how’ we make requests for information on the WWW. (Lecture materials)
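
Python's standard urllib.parse module can pull my own URL apart into those layers. The request text at the end only sketches the first lines of an HTTP/1.1 GET request; nothing is actually sent over the network:

```python
from urllib.parse import urlparse

url = "http://www.student.city.ac.uk/~abhp725/indexsession4rvsd2.html"
parts = urlparse(url)

print(parts.scheme)  # http (the protocol used for the request)
print(parts.netloc)  # www.student.city.ac.uk (the domain)
print(parts.path)    # /~abhp725/indexsession4rvsd2.html (folder and file)

# Read the domain right to left: country (uk), academic sector (ac),
# institution (city), then sub-domains.
print(list(reversed(parts.netloc.split("."))))
# ['uk', 'ac', 'city', 'student', 'www']

# The start of the HTTP request a browser might build for this URL.
request = f"GET {parts.path} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"
print(request.splitlines()[0])  # GET /~abhp725/indexsession4rvsd2.html HTTP/1.1
```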

Berners-Lee understood that, as in OLE (object linking and embedding), information can be either file-centered (embedding) or document-centered (linking). The WWW emerged through DOCUMENT-CENTERED THINKING. Software that could ‘read’ images and graphics within documents helped the WWW grow. The organization of information on the WWW is based on the linking of documents, which is non-linear and reflexive. That is the basis of HTML, a mark-up language written specifically to be ‘document centered’ rather than ‘file centered’ for this system.

HTML is the language the WWW uses to describe documents; without agreed formats and protocols, computers would not know how to interpret the binary code they exchange. Data travels in packets, and HTTP, FTP and Telnet all run on top of a suite of protocols called TCP/IP. (Lecture materials)

‘Mark-up’ is particularly important on the WWW, and this is why we need to learn to ‘code’ in HTML: we are ‘marking up’ text. This grew out of Berners-Lee’s use of ‘hypertext’, which is a means of “adding value to information”. (Lecture materials) Hypertext is text within a document that incorporates links to other parts of the same document or to other documents. (Lecture materials)
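
Hypertext in miniature: this Python sketch uses the standard library's HTMLParser to collect the links in a snippet of HTML. The snippet and its #session3 anchor are hypothetical; one link points to another document and the other to a part of the same page:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> element, the 'hyper' part of hypertext."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

# Hypothetical snippet: one link to another document, one within this page.
page = (
    '<p>Read about <a href="http://www.w3.org/People/Berners-Lee/">Berners-Lee</a> '
    'or jump to <a href="#session3">Session 3</a> on this page.</p>'
)

collector = LinkCollector()
collector.feed(page)
collector.close()
print(collector.links)  # ['http://www.w3.org/People/Berners-Lee/', '#session3']
```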

When I was an undergraduate, the Creative Writing department was buzzing about this new thing called ‘hypertext’, and everyone was including this ‘tool’ in their dissertations. Poetry students showed me work that linked to all sorts of places. I was stunned by it 13 years ago. Little did I know I was looking at the same idea that powers HTML.

HTML allows text and images to be formatted appropriately and then exchanged on the WWW. This explains why I had problems embedding links into blog posts: I needed to understand HTML code to do it properly. This is empowering for a budding information manager, because I don't want to call IT every time I have a problem.

I must build my HTML skills at least to a rudimentary level so I can post my own URLs when I need to share information quickly. When we 'published' our HTML, it had to go through a specific server, not our workstation's hard drive and not the general internet. This illustrates how the WWW works: other workstations at the University cannot read my workstation's hard drive, so code must be uploaded to a specific server before my colleagues can see it.
Maria's HTML

‘Byte’ My Metadata, Murdoch! (Session 2)

‘Bits’ explain in real terms how much information a computer needs in order to be a useful tool for ordering and retrieving information. I remember pop-up notes in the dial-up days that told you whether you were connected at 56 or 128 kilobits per second. ‘Bits’ make sense when I recall why that information mattered. The way data is stored, transmitted, represented and managed is based on bits, so they are the foundation of how information is used, shared and developed today. The printing press looks lame.
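
To put those dial-up numbers in real terms, here is a quick Python calculation of how long an idealized 1 MB download takes at each speed. It assumes a decimal megabyte and ignores protocol overhead and line noise, so real downloads were slower:

```python
def transfer_seconds(file_bytes, kilobits_per_second):
    """Idealized transfer time: total bits divided by bits per second."""
    bits = file_bytes * 8
    return bits / (kilobits_per_second * 1000)

one_megabyte = 1_000_000  # decimal megabyte, for round numbers
print(round(transfer_seconds(one_megabyte, 56), 1))   # 142.9 seconds at 56 kbit/s
print(round(transfer_seconds(one_megabyte, 128), 1))  # 62.5 seconds at 128 kbit/s
```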

Adding more bits affords exponentially more possibilities, but hardware needs extra processing power to handle binary code with more bits in a chain. (Lecture materials) I am peeking through the door of mathematics and am amazed. The data that binary code represents only means what the author wants it to mean, and that is why we have coding rules and standards. ‘It is a human decision to apply this meaning’. (Lecture materials) Code is meaningless unless its context is defined and agreed by a community of users.
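
Two of those ideas in a few lines of Python: each added bit doubles the number of values a chain of bits can represent, and the same byte means different things depending on the rule we agree to read it by:

```python
# Each extra bit doubles the number of distinct values.
for n in (1, 2, 8, 16):
    print(n, "bits ->", 2 ** n, "values")
# 1 bits -> 2 values
# 2 bits -> 4 values
# 8 bits -> 256 values
# 16 bits -> 65536 values

# The same eight bits, read by two different agreed rules.
raw = bytes([0b01000001])
print(raw[0])               # 65, read as an unsigned number
print(raw.decode("ascii"))  # A, read as an ASCII character
```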

Bits are stored in files, and a file's FORMAT defines how those bits should be read. For text, ASCII provides shared rules that map English characters to numbers, so that all computers can communicate (think of it as the ‘alphabet’ of a computer language), and data files help us organize the information on all computers effectively. (Lecture materials) The user manipulates a file as a distinct entity: it is a fundamental unit of information, and its FORMAT defines how it can be used. This is the system by which the ‘library’ of a computer is organized.
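
A small Python sketch of ASCII as a shared ‘alphabet’: each character maps to a number, which is stored as bits. The word and the accented example are arbitrary choices:

```python
word = "DITA"
codes = list(word.encode("ascii"))  # the number ASCII assigns to each letter
bits = " ".join(format(c, "08b") for c in codes)

print(codes)  # [68, 73, 84, 65]
print(bits)   # 01000100 01001001 01010100 01000001

# ASCII only covers 128 characters; accented letters need a wider
# encoding such as UTF-8.
try:
    "café".encode("ascii")
except UnicodeEncodeError:
    print("é is not in ASCII")
print("café".encode("utf-8"))  # b'caf\xc3\xa9'
```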

Metadata is data about data: it describes the data contained in files and makes it useful, and the programmer decides which metadata is important. Metadata is either semantic (describing what the content means) or presentational (describing how it should look), and semantic metadata proves more useful. (Lecture materials) This is the ‘Dewey’ system of a computer ‘library’. We are learning about metadata because this is how the internet works with ‘mark-up languages’. I remember seeing the word ‘mark-up’ in a programmer's book once, and it makes sense that mark-up is built into everyday PC software (like Word), because the computer needs structure to store and find files. The same principles apply to a mark-up language written with defined metadata as to physical document filing: a file pathname mirrors the structure of a document filing system.

There is an academic theoretical model of this ‘root’ architecture in library science. Other academics say that information is actually organized like tangled, folding ‘ganglia’, but all files in the digital world are still organized in a root structure, otherwise we would never be able to find anything. What varies is the information stored in those files. Data can be stored in a manner that is either ‘File-Centered’ or ‘Document-Centered’. (Lecture materials)
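
The root structure of a file pathname is easy to see with Python's pathlib module. The path below is hypothetical; each component is one level down the tree from the root ‘/’:

```python
from pathlib import PurePosixPath

# Hypothetical path: each component is one level of the folder tree.
path = PurePosixPath("/home/maria/dita/session2/metadata-notes.txt")

print(path.parts)   # ('/', 'home', 'maria', 'dita', 'session2', 'metadata-notes.txt')
print(path.parent)  # /home/maria/dita/session2 (the folder holding the file)
print(path.name)    # metadata-notes.txt
print(path.suffix)  # .txt (the extension hints at the file's format)
```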

There is a sense of ‘lossless-ness’ surrounding digital information. Because data can be retrieved, replicated, modified and referenced infinitely without degrading, intellectual property rights (IPR) issues arise. (Lecture materials) This is demonstrated by the fight between Google and News Corp.: the information publishers/creators versus the search engines that let us access and retrieve that information.

Carefully Wading Into “The Messy Middle” (Session 1)

“Information exists in the messy middle.” (Rosenfeld and Morville, 2007) This is the essence of DITA. Information technology and architecture enable and make use of data and knowledge. (Lecture materials)

Blogs are proliferating as a core form of digital communication used to organize and publish information. They are Web 2.0 technology, not static Web 1.0 websites. Facebook can be viewed as a shared mini-blog platform where users do not move between URLs. It is assumed that users sort blogs into “credible versus rubbish”, so the rubbish becomes irrelevant; this is the democratic nature of Web 2.0. (Lecture materials) But blogs can also be used by influential propagandists, e.g. to spread lies about Obama being born in Kenya. Does democracy always get it right?

Setting up the blog was challenging because many programming languages and file formats are involved. I hope to use the labeling system efficiently, but I can't see an option to create a robust labeling system in advance, which is frustrating. Tags are imperative to make your blog interesting and give it an ‘identity’, but they also have to be useful for the end reader. My blog is aimed at friends and family, colleagues and prospective employers, and tags will be listed alongside posts so readers can sort by interest. In essence it is an online diary of my personal learning and self-discovery as I embark on a new career.

The UNIX form of information access is the foundation of information storage: the skeleton, the backbone of the “house of knowledge and information storage and access”. (Lecture materials) I had no dexterity with the system: I know what it is, but I can’t use it. I went to http://www.city.ac.uk/tsg/unix/DoingMore.pdf and it is clear that I will not regularly command a computer through UNIX.

Certain keywords are fascinating: Graphical Information, Presentation of Representation (CSS), Information Retrieval (Google), Applications Development (JavaScript), Wikis, trackbacks (particularly cool), blogroll, Archive including Label/Tag List, Syndication (includes RSS feeds – rich site summary feeds in XML for tracking blogs). I subscribed to Google Reader. The feeds are overwhelming – must filter with keywords.
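
Filtering feeds with keywords can be sketched in Python with the standard ElementTree parser. The feed below is a made-up, minimal RSS 2.0 document (real feeds carry more fields), and matching on titles only is a simplifying assumption:

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 feed.
feed = """<rss version="2.0"><channel>
  <title>Example library blogs</title>
  <item><title>Cataloguing with XML</title><link>http://blog.example/xml</link></item>
  <item><title>My holiday photos</title><link>http://blog.example/holiday</link></item>
  <item><title>Metadata for image banks</title><link>http://blog.example/metadata</link></item>
</channel></rss>"""

def filter_items(rss_text, keywords):
    """Return (title, link) for items whose title mentions any keyword."""
    channel = ET.fromstring(rss_text).find("channel")
    matches = []
    for item in channel.findall("item"):
        title = item.findtext("title", default="")
        if any(k.lower() in title.lower() for k in keywords):
            matches.append((title, item.findtext("link", default="")))
    return matches

selected = filter_items(feed, ["xml", "metadata"])
print(selected)
# [('Cataloguing with XML', 'http://blog.example/xml'),
#  ('Metadata for image banks', 'http://blog.example/metadata')]
```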

The purpose of DITA is to understand all the methods and tools available for CREATING/FINDING/ORGANISING digital information. There are industry standards and rules and teachable conventional systems. I did not know this before DITA.