Pandas Read Date Yy Instead of Yyyy

10 Tricks for Converting Numbers and Strings to Datetime in Pandas

Pandas tips and tricks to help you get started with data analysis

B. Chen

When doing data analysis, it is important to ensure correct data types. Otherwise, you may get unexpected results or errors. Datetime is a common data type in data science projects, and the data is often saved as numbers or strings. During data analysis, you will likely need to explicitly convert them to a datetime type.

This article will discuss how to convert numbers and strings to a datetime type. More specifically, you will learn how to use the Pandas built-in methods to_datetime() and astype() to deal with the following common problems:

  1. Converting numbers to datetime
  2. Converting strings to datetime
  3. Handling day first format
  4. Dealing with custom datetime format
  5. Handling parse error
  6. Handling missing values
  7. Assembling datetime from multiple columns
  8. Converting multiple columns at once
  9. Parsing date column when reading a CSV file
  10. Difference between astype() and to_datetime()

Please check out the Notebook for the source code.

1. Converting numbers to datetime

Pandas has two built-in methods, astype() and to_datetime(), that can be used to convert numbers to datetime. For instance, to convert numbers denoting seconds to datetime:

    import pandas as pd

    df = pd.DataFrame({'date': [1470195805, 1480195805, 1490195805],
                       'value': [2, 3, 4]})

When using to_datetime(), we need to call it from Pandas and set the argument unit='s':

    >>> pd.to_datetime(df['date'], unit='s')
    0   2016-08-03 03:43:25
    1   2016-11-26 21:30:05
    2   2017-03-22 15:16:45
    Name: date, dtype: datetime64[ns]

When using astype(), we need to call it from a Series (the date column) and pass in 'datetime64[s]':

    >>> df['date'].astype('datetime64[s]')
    0   2016-08-03 03:43:25
    1   2016-11-26 21:30:05
    2   2017-03-22 15:16:45
    Name: date, dtype: datetime64[ns]

Similarly, we can convert numbers denoting other units (D, s, ms, us, ns) to datetime. For instance, numbers denoting days:

    df = pd.DataFrame({'date': [1470, 1480, 1490],
                       'value': [2, 3, 4]})
    >>> pd.to_datetime(df['date'], unit='D')
    0   1974-01-10
    1   1974-01-20
    2   1974-01-30
    Name: date, dtype: datetime64[ns]
    >>> df['date'].astype('datetime64[D]')
    0   1974-01-10
    1   1974-01-20
    2   1974-01-30
    Name: date, dtype: datetime64[ns]
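
The same idea carries over to the other units. Here is a minimal sketch for millisecond timestamps, assuming the values are simply the seconds from the first example multiplied by 1000 (the Series name ms and the values are made up for illustration):

```python
import pandas as pd

# Hypothetical epoch timestamps in milliseconds
# (the seconds from the first example, times 1000).
ms = pd.Series([1470195805000, 1480195805000, 1490195805000], name='date')

# unit='ms' tells to_datetime() how to interpret the integers.
converted = pd.to_datetime(ms, unit='ms')
print(converted)
```

If the unit does not match the data (for example, passing unit='s' for millisecond values), the result will be far off or out of range, so it is worth checking one value by hand.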

2. Converting strings to datetime

Often, you'll find that dates are represented as strings. In Pandas, strings are shown as object, which is the internal Pandas lingo for the string.

    >>> df = pd.DataFrame({'date': ['3/10/2015', '3/11/2015', '3/12/2015'],
    ...                    'value': [2, 3, 4]})
    >>> df.dtypes
    date     object
    value     int64
    dtype: object

Both to_datetime() and astype() can be used to convert strings to datetime.

    >>> pd.to_datetime(df['date'])
    0   2015-03-10
    1   2015-03-11
    2   2015-03-12
    Name: date, dtype: datetime64[ns]
    >>> # pandas 2.0+ rejects the unit-less 'datetime64', so spell out the unit
    >>> df['date'].astype('datetime64[ns]')
    0   2015-03-10
    1   2015-03-11
    2   2015-03-12
    Name: date, dtype: datetime64[ns]
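
Once the column is a real datetime, the .dt accessor becomes available. The sketch below is only an illustration of checking the conversion; the variable names is_dt, months and days are ours, not part of the original notebook:

```python
import pandas as pd

df = pd.DataFrame({'date': ['3/10/2015', '3/11/2015', '3/12/2015'],
                   'value': [2, 3, 4]})

# Convert the string column in place.
df['date'] = pd.to_datetime(df['date'])

# Confirm the dtype and pull out date parts with the .dt accessor.
is_dt = pd.api.types.is_datetime64_any_dtype(df['date'])
months = df['date'].dt.month.tolist()
days = df['date'].dt.day.tolist()
print(is_dt, months, days)
```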

3. Handling day first format

By default, to_datetime() will parse strings with month first (MM/DD, MM DD, or MM-DD) format, and this arrangement is relatively unique in the United States.

In most of the rest of the world, the day is written first (DD/MM, DD MM, or DD-MM). If you would like Pandas to consider day first instead of month, you can set the argument dayfirst to True.

    df = pd.DataFrame({'date': ['3/10/2000', '3/11/2000', '3/12/2000'],
                       'value': [2, 3, 4]})
    df['date'] = pd.to_datetime(df['date'], dayfirst=True)

Alternatively, you can pass a custom format to the argument format.
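
As a rough sketch of the two options side by side, the snippet below parses the same day-first strings once with dayfirst=True and once with an explicit format string. The format '%d/%m/%Y' is our assumption about how these particular strings are laid out:

```python
import pandas as pd

df = pd.DataFrame({'date': ['3/10/2000', '3/11/2000', '3/12/2000']})

# Option 1: let Pandas infer the layout, but read the day first.
by_flag = pd.to_datetime(df['date'], dayfirst=True)

# Option 2: state the layout explicitly (day/month/4-digit year).
by_format = pd.to_datetime(df['date'], format='%d/%m/%Y')

# Both should give 3 October, 3 November and 3 December 2000.
print(by_flag.equals(by_format))
print(by_flag.dt.month.tolist())
```

An explicit format is stricter: a string that does not match the layout raises an error instead of being silently reinterpreted.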

4. Handling custom datetime format

By default, strings are parsed using the Pandas built-in parser from dateutil.parser.parse. Sometimes, your strings might be in a custom format, for example, YYYY-d-m HH:MM:SS. Pandas to_datetime() has an argument called format that allows you to pass a custom format:

    df = pd.DataFrame({'date': ['2016-6-10 20:30:0',
                                '2016-7-1 19:45:30',
                                '2013-10-12 4:5:1'],
                       'value': [2, 3, 4]})
    df['date'] = pd.to_datetime(df['date'], format="%Y-%d-%m %H:%M:%S")
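
To make the effect of the format string concrete, here is a small sketch that checks what the strings above turn into. It also shows a two-digit year (yy instead of yyyy) handled with %y; that second Series and its values are a made-up illustration, not data from the original article:

```python
import pandas as pd

raw = pd.Series(['2016-6-10 20:30:0', '2016-7-1 19:45:30', '2013-10-12 4:5:1'])

# %d comes before %m, so '2016-6-10' is read as day 6, month 10.
parsed = pd.to_datetime(raw, format='%Y-%d-%m %H:%M:%S')
print(parsed)

# Hypothetical day/month/2-digit-year strings: use lowercase %y.
short = pd.Series(['10/03/15', '11/03/15'])
short_parsed = pd.to_datetime(short, format='%d/%m/%y')
print(short_parsed)
```

Note that %Y expects a four-digit year and %y a two-digit one, so picking the wrong directive for your data will raise an error or give the wrong century.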

5. Handling parse error

If a date does not meet the timestamp limitations, we will get a parsing error (typically a ValueError subclass such as dateutil's ParserError) when converting. For example, an invalid string a/11/2000:

    df = pd.DataFrame({'date': ['3/10/2000', 'a/11/2000', '3/12/2000'],
                       'value': [2, 3, 4]})
    # Raises a parsing error
    df['date'] = pd.to_datetime(df['date'])

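to_datetime() has an argument called errors that allows you to ignore the error or force an invalid value to NaT. Note that errors='ignore' is deprecated in recent Pandas releases (2.2 and later), so errors='coerce' is the safer choice for new code.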

    df['date'] = pd.to_datetime(df['date'], errors='ignore')
    df

And to force an invalid value to NaT:

    df['date'] = pd.to_datetime(df['date'], errors='coerce')
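
Coercing is most useful when you also look at what got coerced. The following is one possible sketch: it keeps the raw strings, parses them into a separate Series, and uses isna() to list the rows that could not be parsed (the names parsed and bad_rows are ours):

```python
import pandas as pd

df = pd.DataFrame({'date': ['3/10/2000', 'a/11/2000', '3/12/2000'],
                   'value': [2, 3, 4]})

# Parse into a separate Series so the raw strings stay available.
parsed = pd.to_datetime(df['date'], errors='coerce')

# Rows where parsing failed show up as NaT.
bad_rows = df[parsed.isna()]
print(bad_rows)
```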

6. Handling missing values

In Pandas, missing values are given the value NaN, short for "Not a Number".

    import numpy as np

    df = pd.DataFrame({'date': ['3/10/2000', np.nan, '3/12/2000'],
                       'value': [2, 3, 4]})

When converting a column with missing values to datetime, both to_datetime() and astype() change Numpy's NaN to Pandas' NaT, and this allows it to be a datetime.

    >>> # pandas 2.0+ rejects the unit-less 'datetime64', so spell out the unit
    >>> df['date'].astype('datetime64[ns]')
    0   2000-03-10
    1          NaT
    2   2000-03-12
    Name: date, dtype: datetime64[ns]
    >>> pd.to_datetime(df['date'])
    0   2000-03-10
    1          NaT
    2   2000-03-12
    Name: date, dtype: datetime64[ns]

Alternatively, we can replace Numpy NaN with another value (for example replacing NaN with '3/11/2000')

    df = pd.DataFrame({'date': ['3/10/2000', np.nan, '3/12/2000'],
                       'value': [2, 3, 4]})
    df['date'] = df['date'].fillna('3/11/2000').astype('datetime64[ns]')
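
Another way to go about it, sketched here under the assumption that you would rather convert first and fill afterwards, is to count the NaT values and then fill them with a pd.Timestamp (the fill date is just the one used in the example above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': ['3/10/2000', np.nan, '3/12/2000'],
                   'value': [2, 3, 4]})

# Convert first: the NaN becomes NaT.
dates = pd.to_datetime(df['date'])
n_missing = int(dates.isna().sum())

# Then fill the gaps with a real Timestamp instead of a string.
filled = dates.fillna(pd.Timestamp('2000-03-11'))
print(n_missing)
print(filled)
```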

7. Assembling a datetime from multiple columns

to_datetime() can be used to assemble a datetime from multiple columns as well. The keys (column labels) can be common abbreviations like ['year', 'month', 'day', 'minute', 'second', 'ms', 'us', 'ns'] or plurals of the same.

    df = pd.DataFrame({'year': [2015, 2016],
                       'month': [2, 3],
                       'day': [4, 5],
                       'hour': [10, 11]
                       })

To create a datetime column from a subset of columns

    >>> pd.to_datetime(df[['month', 'day', 'year']])
    0   2015-02-04
    1   2016-03-05
    dtype: datetime64[ns]

To create a datetime column from the entire DataFrame

    >>> pd.to_datetime(df)
    0   2015-02-04 10:00:00
    1   2016-03-05 11:00:00
    dtype: datetime64[ns]
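
Real datasets rarely use exactly these column names. A minimal sketch, assuming hypothetical columns called yr, mo and dy, is to rename them to the names to_datetime() recognizes before assembling:

```python
import pandas as pd

# Hypothetical column names that to_datetime() would not recognize as-is.
parts = pd.DataFrame({'yr': [2015, 2016],
                      'mo': [2, 3],
                      'dy': [4, 5]})

# Rename to the expected keys, then assemble.
renamed = parts.rename(columns={'yr': 'year', 'mo': 'month', 'dy': 'day'})
assembled = pd.to_datetime(renamed)
print(assembled)
```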

8. Converting multiple columns at once

So far, we have been converting data type one column at a time. There is a DataFrame method also called astype() that allows us to convert multiple column data types at once. It is time-saving when you have a bunch of columns you want to change.

    # pandas 2.0+ rejects the unit-less 'datetime64', so spell out the unit
    df = df.astype({
        'date_start': 'datetime64[ns]',
        'date_end': 'datetime64[ns]'
    })
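
Here is a self-contained sketch of the same idea. The column names date_start and date_end come from the snippet above, but the values are made up. It also shows an alternative using apply(pd.to_datetime), which is one option if you want to_datetime()'s arguments for every column:

```python
import pandas as pd

df = pd.DataFrame({'date_start': ['2021-01-01', '2021-02-01'],
                   'date_end': ['2021-01-31', '2021-02-28'],
                   'value': [2, 3]})

# Option 1: a dict of column -> dtype passed to DataFrame.astype().
via_astype = df.astype({'date_start': 'datetime64[ns]',
                        'date_end': 'datetime64[ns]'})

# Option 2: apply to_datetime() column by column.
cols = ['date_start', 'date_end']
via_apply = df.copy()
via_apply[cols] = via_apply[cols].apply(pd.to_datetime)

print(via_astype.dtypes)
```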

9. Parsing date column when reading a CSV file

If you want to set the data type for each column when reading a CSV file, you can use the argument parse_dates when loading data with read_csv():

Note the data type datetime64 is not supported by dtype, and we should use the parse_dates argument instead.

    df = pd.read_csv(
        'dataset.csv',
        dtype={
            # datetime64[ns] is not supported
            'value': 'float16'
        },
        parse_dates=['date']
    )
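
If you do not have dataset.csv at hand, the behaviour can be tried out with an in-memory file. This is only a sketch: the CSV text is invented, and we use float32 for the value column here simply as a conservative choice for the example:

```python
import io

import pandas as pd

# A tiny, made-up CSV held in memory instead of a file on disk.
csv_text = "date,value\n2000-03-10,2.5\n2000-03-11,3.5\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={'value': 'float32'},   # non-date columns go through dtype
    parse_dates=['date'],         # date columns go through parse_dates
)
print(df.dtypes)
```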

10. Difference between astype('datetime64') and to_datetime()

astype() is the common method to convert data type from one to another. The method is supported by both Pandas DataFrame and Series. If you need to convert a bunch of columns, astype() should be the first choice as it:

  • can convert multiple columns at once
  • has the best performance (shown in the screenshot below)

However, astype() won't work for a column with invalid data. For instance, an invalid date string a/11/2000. If we try to use astype() we would get a parsing error. As of Pandas 0.20.0, this error can be suppressed by setting the argument errors='ignore', but your original data will be returned untouched.

The Pandas to_datetime() function can handle these values more gracefully. Rather than fail, we can set the argument errors='coerce' to coerce invalid values to NaT.

In addition, it can be very difficult to use astype() when dealing with custom datetime format. The Pandas to_datetime() has an argument called format and offers more possibility in the way of custom conversion.
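
The difference in error handling can be sketched in a few lines. The snippet below assumes the invalid string from section 5 and simply records whether astype() raised; the flag name astype_failed is ours:

```python
import pandas as pd

s = pd.Series(['3/10/2000', 'a/11/2000', '3/12/2000'])

# astype() has no way to coerce bad values, so it raises.
try:
    s.astype('datetime64[ns]')
    astype_failed = False
except (ValueError, TypeError):
    astype_failed = True

# to_datetime() can turn the bad value into NaT instead.
coerced = pd.to_datetime(s, errors='coerce')

print(astype_failed)
print(coerced)
```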

Conclusion

We have seen how we can convert a Pandas data column to a datetime type with astype() and to_datetime(). to_datetime() is the simplest way and offers error handling and more possibility in the way of custom conversion, while astype() has better performance and can convert multiple columns at once.

I hope this article will help you to save time in learning Pandas. I recommend you check out the documentation for the astype() and to_datetime() API to learn about other things you can do.

Thank you for reading. Please check out the notebook for the source code and stay tuned if you are interested in the practical aspect of machine learning.

You may be interested in some of my other Pandas articles:

  • 10 tricks to convert data to a numeric type in Pandas
  • Pandas json_normalize() you should know for flattening JSON
  • All Pandas cut() you should know for transforming numerical data into categorical data
  • Using Pandas method chaining to improve code readability
  • How to do a Custom Sort on Pandas DataFrame
  • All the Pandas shift() you should know for data analysis
  • When to use Pandas transform() function
  • Pandas concat() tricks you should know
  • All the Pandas merge() you should know
  • Working with datetime in Pandas DataFrame
  • Pandas read_csv() tricks you should know
  • 4 tricks you should know to parse date columns with Pandas read_csv()

More tutorials can be found on my Github

Source: https://towardsdatascience.com/10-tricks-for-converting-numbers-and-strings-to-datetime-in-pandas-82a4645fc23d
