The code for creating a single TSV-file of the “Dispatch” with each item being a record with date, type, header and the text was essentially the same as for the clean copies of the items of the last assignment with only minor changes:

  • The target folder was created and named differently (TSV_files)
  • The list where the items were collected was put outside of the loop
source_location = "./practice3/"   # the folder in which the "dispatch" is
target_location = "./TSV_files/" # the folder in which the new files will be saved 

lof=os.listdir(source_location) # getting all files from a folder
list_items= [] # to append list of items at the end, now out of loop 
  • All line breaks were exchanged by “;;;” so that it is easier to recreate them later from the TSV-format. This was done in the “cleaning part”
clean = re.sub("<[^<]+>", "", r) #get rid of all < >
clean = re.sub(" +\n|\n +","\n",clean) #get rid of all line breaks with spacing
clean = clean.strip() #get rid of all spacing in beginning and end
clean = re.sub("\n+",";;;",clean) #get rid of all additional line breaks
header = re.sub("<[^<]+>", "", header_section) #cleaning header of < > 
  • The items were now called records and the needed information was seperated with a tabular (“\t”) as the TSV format requires. Furthermore, the end of a record was marked with 6 ;. Compared to the code for the clean copies, this was the only thing created (no extra final variables).
if len(re.sub("\W","", clean)) != 0: #noticed that some don't have text but still something -> this way don't include them
    record= "record:" + issue_date +"\t"+ item_type + "\t" + header + "\t" + clean + ";;;;;;;;;" #create record information
    list_items.append(record)    #append to list of all items in this issue
  • The part were the information is saved in a file was moved outside of the loop to create one single TSV-file. The file was called “dispatch.tsv”, making it clear by its extension that it is a TSV-file.
new_issue ="".join(list_items) #combines the parts

new_file="dispatch.tsv" # need to add the extension to work with it
with open(target_location+new_file, "w", encoding ="utf8") as f2: #open in target folder
    f2.write(new_issue) #write in one file all the information