The code for creating a single TSV file of the “Dispatch”, in which each item is a record containing its date, type, header and text, was essentially the same as the code for the clean copies of the items in the last assignment, with only minor changes:
- The target folder was created under a different name (`TSV_files`).
- The list in which the items are collected was moved outside the loop, so that it gathers the items of all files instead of only those of one file:
```python
import os
import re

source_location = "./practice3/"  # the folder containing the "Dispatch" files
target_location = "./TSV_files/"  # the folder in which the new file will be saved
lof = os.listdir(source_location)  # list of all files in the source folder
list_items = []                    # collects the records of all items; now outside the loop
```
- All line breaks were replaced with “;;;” so that they are easy to recreate later from the TSV format. This was done in the cleaning step:
```python
clean = re.sub(r"<[^<]+>", "", r)                 # remove all <...> tags from the item text r
clean = re.sub(r" +\n|\n +", "\n", clean)         # remove spaces before or after line breaks
clean = clean.strip()                             # remove whitespace at the beginning and end
clean = re.sub(r"\n+", ";;;", clean)              # replace each run of line breaks with ";;;"
header = re.sub(r"<[^<]+>", "", header_section)   # remove all <...> tags from the header
```
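A minimal sketch of the cleaning step as a standalone function, together with the reverse step that turns “;;;” back into line breaks. The names `clean_text` and `restore_line_breaks` and the sample string are hypothetical illustrations, not part of the original code.

```python
import re

def clean_text(raw: str) -> str:
    """Apply the cleaning steps described above to one item's raw text."""
    clean = re.sub(r"<[^<]+>", "", raw)        # remove <...> tags
    clean = re.sub(r" +\n|\n +", "\n", clean)  # remove spaces around line breaks
    clean = clean.strip()                      # trim leading/trailing whitespace
    return re.sub(r"\n+", ";;;", clean)        # encode each run of line breaks as ";;;"

def restore_line_breaks(encoded: str) -> str:
    """Hypothetical reverse step: turn each ";;;" back into a line break."""
    return encoded.replace(";;;", "\n")

sample = "<p>First line  \n\n  second line</p>\n"  # made-up input
encoded = clean_text(sample)
print(encoded)  # First line;;;second line
print(restore_line_breaks(encoded))
```

Because runs of line breaks are collapsed into a single “;;;”, restoring yields single line breaks; blank lines between paragraphs are not recovered.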
- The items were now called records, and their fields were separated by a tab character (“\t”), as the TSV format requires. In addition, the end of each record was marked with a longer run of semicolons (`;;;;;;;;;`) to distinguish it from the “;;;” line-break marker. Unlike the code for the clean copies, only this record string was built; no extra final variables were created.
```python
if len(re.sub(r"\W", "", clean)) != 0:  # some items contain no real text (only non-word characters); skip them
    record = "record:" + issue_date + "\t" + item_type + "\t" + header + "\t" + clean + ";;;;;;;;;"  # build the record
    list_items.append(record)  # append to the list of records from all issues
```
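The record-building step can be sketched as a small function. This assumes the four fields are already cleaned strings; `make_record`, `RECORD_END` and the sample items are hypothetical names and data chosen for illustration.

```python
import re
from typing import Optional

RECORD_END = ";;;;;;;;;"  # end-of-record marker, as in the snippet above

def make_record(issue_date: str, item_type: str, header: str, clean: str) -> Optional[str]:
    """Build one record, or return None if the text has no word characters."""
    if not re.sub(r"\W", "", clean):  # only non-word characters (or nothing) left
        return None
    # "\t".join gives the same string as concatenating the fields with "\t" in between
    return "record:" + "\t".join([issue_date, item_type, header, clean]) + RECORD_END

# Made-up sample items: (date, type, header, cleaned text)
items = [
    ("date-1", "article", "A header", "Some text;;;more text"),
    ("date-1", "notice", "Empty item", " ... "),
]
list_items = []
for item in items:
    record = make_record(*item)
    if record is not None:  # skip items without real text
        list_items.append(record)
print(len(list_items))  # 1
```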
- The part where the information is saved to a file was moved outside the loop, so that one single TSV file is created. The file was named “dispatch.tsv”; its extension makes clear that it is a TSV file.
```python
new_issue = "".join(list_items)  # combine all records into one string
new_file = "dispatch.tsv"        # the file name needs the .tsv extension
with open(target_location + new_file, "w", encoding="utf8") as f2:  # open the file in the target folder
    f2.write(new_issue)          # write all records into the one file
```
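To check the format, the single file can be split back into records. The sketch below assumes that the end marker `;;;;;;;;;` and tab characters never occur inside the date, type or header fields; `parse_dispatch` and the sample string are hypothetical.

```python
RECORD_END = ";;;;;;;;;"  # end-of-record marker used when writing dispatch.tsv
PREFIX = "record:"        # prefix at the start of every record

def parse_dispatch(content: str) -> list[dict[str, str]]:
    """Split the contents of dispatch.tsv back into a list of records."""
    records = []
    for chunk in content.split(RECORD_END):
        if not chunk:  # the text after the last marker is empty
            continue
        if chunk.startswith(PREFIX):
            chunk = chunk[len(PREFIX):]  # drop the "record:" prefix
        date, item_type, header, text = chunk.split("\t", 3)  # four tab-separated fields
        records.append({
            "date": date,
            "type": item_type,
            "header": header,
            "text": text.replace(";;;", "\n"),  # restore the line breaks
        })
    return records

# Made-up sample in the same format as dispatch.tsv
sample = ("record:date-1\tarticle\tHeader one\tLine one;;;line two" + RECORD_END
          + "record:date-2\tad\tHeader two\tJust one line" + RECORD_END)
records = parse_dispatch(sample)
for rec in records:
    print(rec["date"], rec["type"], repr(rec["text"]))
```

On the real file, `parse_dispatch(f.read())` could be called on the file opened with `encoding="utf8"`. Since the records are joined with `""` and end with the semicolon marker rather than a newline, a generic TSV reader such as Python's `csv` module with `delimiter="\t"` would not see one row per record, so a custom split like this one is needed.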